Remote Data Mining And Management Job In Data Science And Analytics

Filter price from order book level 2 summary data

Data:
- The raw data is level 2 order book snapshots and incremental updates at tick frequency (each update = new observation) for a single asset.
- To simplify the task and limit the scope, you need to work with a summarized dataset in the form of CSV files with the following columns: timestamp, bidPrice_x1, bidPrice_x2, ..., askPrice_x1, askPrice_x2, ..., where bidPrice_x1 = the average price at which a market sell order of size x1 would be executed if it arrived at this instant.
- The scope of this task is limited to the summarized dataset. If you believe that you could do much better by calculating different features from the raw order book data, we could discuss it as a separate job.
- Expect to work with 10M-100M rows, 10-20 columns with possible subsampling.
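
A minimal sketch of how the summarized CSV files could be streamed row by row with optional subsampling, so that 10M-100M rows never have to sit in memory at once. The function name `iter_subsampled_rows` and the `every` parameter are hypothetical, and the sketch assumes that every column, including timestamp, is numeric (for example epoch seconds); the real files may need different parsing.

```python
import csv
import io


def iter_subsampled_rows(fileobj, every=1):
    """Yield every `every`-th data row of a CSV file as a dict of floats.

    Assumption: all columns (timestamp included) parse as numbers.
    """
    reader = csv.DictReader(fileobj)
    for i, row in enumerate(reader):
        if i % every == 0:
            yield {name: float(value) for name, value in row.items()}


# Tiny in-memory stand-in for a real file; the column names follow the
# pattern described above, with "x1" as a placeholder size.
sample = (
    "timestamp,bidPrice_x1,askPrice_x1\n"
    "0,99.5,100.5\n"
    "1,99.6,100.6\n"
    "2,99.7,100.7\n"
)
rows = list(iter_subsampled_rows(io.StringIO(sample), every=2))
```

For a real file, the same generator would be fed with `open(path, newline="")` instead of the in-memory buffer.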

Goal:
For each row, output a summary price P_t such that P_t = E[ mid(t + dt) | data available at t ], where mid(s) = (best bid price at s + best ask price at s) / 2. The horizon dt is on the scale of 1-10 minutes, tbd.
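
To fit any model for P_t, a realized target is needed for each row: the mid price that actually prevailed at t + dt. The sketch below is one way to build it, assuming sorted numeric timestamps and assuming that the book state at time s is given by the last row at or before s. The function name `forward_mid_targets` is hypothetical, and using the smallest-size bid/ask columns as a stand-in for the best bid and ask is also an assumption.

```python
from bisect import bisect_right


def forward_mid_targets(timestamps, best_bid, best_ask, dt):
    """Return, for each row, the mid price in force at timestamp + dt.

    The book state at a given time is taken to be the last row at or
    before that time. Rows whose horizon runs past the end of the data
    get None, because their future mid price is not observed.
    """
    mids = [(b + a) / 2.0 for b, a in zip(best_bid, best_ask)]
    last_time = timestamps[-1]
    targets = []
    for t in timestamps:
        horizon = t + dt
        if horizon > last_time:
            targets.append(None)
        else:
            j = bisect_right(timestamps, horizon) - 1
            targets.append(mids[j])
    return targets


# Timestamps in seconds, dt = 60 seconds.
ts = [0, 30, 60, 90, 150]
bids = [99.0, 100.0, 101.0, 102.0, 103.0]
asks = [101.0, 102.0, 103.0, 104.0, 105.0]
targets = forward_mid_targets(ts, bids, asks, dt=60)
# targets == [102.0, 103.0, 103.0, 104.0, None]
```

The target is only used for fitting; the summary price for a row must still be computed from data available at that row's timestamp.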

For example, the simplest summary price of the order book would be the mid price between the best bid and ask, but it misses the information content of the order book imbalance (if there is more volume on the bid than on the ask, the price will on average go up) and the momentum/mean-reversion time series dynamics. You need to take the shape of the order book and the time series into account in some basic fashion. The goal is not to outperform the market with such a prediction, but just to reasonably summarize 80% of the information content in the level 2 order book dynamics that is essentially common knowledge to market participants. Obviously, you can only use past data for prediction.
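
As a rough illustration of the idea, the sketch below regresses the future change in mid price on two features with plain OLS: an imbalance proxy and a momentum term. Everything here is an assumption rather than a prescribed method. The summarized data has no volumes, so imbalance is proxied by a "tilt": the midpoint of the large-size execution prices minus the midpoint of the small-size ones, which is positive when the bid side is deeper than the ask side. The function names, the `lag` parameter, and the choice of features are hypothetical.

```python
import numpy as np


def build_features(bid_small, ask_small, bid_large, ask_large, lag):
    """Return (mid, X) where X has columns [1, tilt, momentum].

    tilt     : deep midpoint minus shallow midpoint (imbalance proxy)
    momentum : change in mid over the previous `lag` rows (past data only)
    """
    mid = (bid_small + ask_small) / 2.0
    deep_mid = (bid_large + ask_large) / 2.0
    tilt = deep_mid - mid
    momentum = np.zeros_like(mid)
    momentum[lag:] = mid[lag:] - mid[:-lag]
    X = np.column_stack([np.ones_like(mid), tilt, momentum])
    return mid, X


def fit_ols(X, y):
    """Least-squares coefficients for y ~ X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def summary_price(mid, X, beta):
    """Current mid plus the predicted change in mid over the horizon."""
    return mid + X @ beta
```

Here `y` would be the realized future mid minus the current mid. To respect the past-data-only rule, the coefficients would be fitted on an earlier stretch of rows and then applied to later rows.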

Deliverable:
You should deliver a script that reads the data and outputs the summarized price for each input row, and explain to me how it works. You can use R (preferred) or Python on a single server, no cluster solutions. Please stick to the simplest and fastest algorithms, essentially linear models only, and discuss with me if you go for anything more complicated than OLS/Kalman filter.
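
For reference, the simplest Kalman filter that could apply here is a scalar local-level model, which treats the observed mid price as a noisy reading of a slowly moving level. The sketch below is only one possible reading of "Kalman filter" in this context; the function name and the two variance parameters are hypothetical and would have to be tuned or estimated from the data.

```python
def local_level_filter(observations, process_var, obs_var):
    """Filtered level estimates from a scalar local-level Kalman filter.

    Model: level_t = level_{t-1} + process noise,
           observation_t = level_t + observation noise.
    Each estimate uses only observations up to and including its own row.
    """
    level = observations[0]
    var = obs_var
    filtered = [level]
    for y in observations[1:]:
        var_pred = var + process_var            # predict step
        gain = var_pred / (var_pred + obs_var)  # Kalman gain
        level = level + gain * (y - level)      # update step
        var = (1.0 - gain) * var_pred
        filtered.append(level)
    return filtered
```

A larger `process_var` relative to `obs_var` makes the filter follow the observations more closely; a smaller one smooths more.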

I'll provide access to an RStudio Server for R, tbd for Python.

About you:
You have experience working with order book level 2 and time series data, or at least have a solid understanding of the relevant methods. You value simplicity and don't throw all the fancy machine learning stuff at the solution just because it is cool and makes you look more sophisticated.

I would like to hire several people for this job for different assets and exchanges. Feel free to ask questions and discuss the task and conditions.
About the recruiter
Member since Mar 14, 2020
Ali Imron
from Flevoland, Netherlands

Skills & Expertise Required

Quantitative Analysis R Python 

Candidate shortlisted and hired

Hiring open till Jul 13, 2022

Work from Anywhere

40 hrs / week

Fixed Type

Remote Job

$347.60

Cost

Similar Projects

Database designer (PostgreSQL/OpenStack, long-term cooperation, job offer)

You need to design the database tables.
Configure the databases (multi-head databases or OpenStack).

You can start immediately.

Generate key phrases from text using python / spark

Looking for assistance with an NLP project. I need to extract key phrases from Amazon reviews based on a set of grammar rules. The data is preprocessed, and dependency parsing and POS tagging are already implemented. Current data is available in JSON form...

Help moving existing Scikit-Learn Model to AWS SageMaker

We have a basic neural network model that is already running using Scikit-Learn and Python. We have been working to get off of EC2 machines and migrate to AWS SageMaker. We have very little experience with SageMaker and are looking for someone to help an...

Convert email SAS code to python

I have a small section of SAS code which is to be converted to Python.
You don't need to run it, just provide the substitute code accordingly.

Scrape 10 websites

Looking to scrape 10 websites. The scrapers must work on a Windows client and must upload/paste data into Google Sheets in near real time.