Stock Price Movement Prediction Using The Deutsche Börse Public Dataset & Machine Learning
Introduction
We use neural networks applied to stock market data from the Deutsche Börse Public Dataset (PDS) to make predictions about future price movements for each stock.
Specifically, we make a prediction on the direction of the next minute's price change using information from the previous ten minutes. We use this to power a simplified trading strategy to show potential returns.
This is intended as a demonstration of how this dataset can be applied.
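The prediction task described above can be sketched as a sliding window over a per-minute price series. This is a minimal illustration, not the notebooks' actual code: the function name `make_windows`, the use of raw prices as features, and the encoding of the direction as +1, -1 or 0 are all assumptions made for the example.

```python
def make_windows(prices, lookback=10):
    """Pair each `lookback`-minute price window with the direction of the next move.

    Hypothetical helper: features are the last `lookback` prices up to minute t,
    the label is the sign of prices[t + 1] - prices[t].
    """
    features, labels = [], []
    for t in range(lookback - 1, len(prices) - 1):
        window = prices[t - lookback + 1 : t + 1]   # previous ten minutes, inclusive of t
        change = prices[t + 1] - prices[t]          # next minute's price change
        features.append(window)
        labels.append((change > 0) - (change < 0))  # +1 up, -1 down, 0 unchanged
    return features, labels


# Made-up prices for illustration only.
prices = [10.0, 10.1, 10.2, 10.1, 10.1, 10.3, 10.4, 10.2, 10.2, 10.5, 10.6, 10.4, 10.4]
features, labels = make_windows(prices)
print(labels)  # [1, -1, 0]
```

A model would then be trained to map each window in `features` to its entry in `labels`.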
The Deutsche Börse Public Dataset
The Deutsche Börse PDS project provides minute-by-minute statistics over trading data from the XETRA and EUREX engines.
We focus on XETRA only, which comprises a variety of equities, funds and derivative securities. The PDS reports trading activity per security and per minute, including the high, low, first and last prices within each minute.
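To make the four per-minute prices concrete, the sketch below shows how they would be derived from individual trades. The PDS already publishes the aggregated values, so this step is not needed in practice. The function name `minute_bars` and the keys `first`, `high`, `low` and `last` are hypothetical and are not the dataset's column names.

```python
def minute_bars(trades):
    """Aggregate (minute, price) trades, given in time order, into per-minute bars."""
    bars = {}
    for minute, price in trades:
        bar = bars.get(minute)
        if bar is None:
            # The first trade of the minute sets all four values.
            bars[minute] = {"first": price, "high": price, "low": price, "last": price}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["last"] = price  # overwritten until the minute's final trade
    return bars


# Made-up trades for illustration only.
trades = [("09:00", 50.0), ("09:00", 50.4), ("09:00", 49.9), ("09:01", 50.1)]
bars = minute_bars(trades)
print(bars["09:00"])  # {'first': 50.0, 'high': 50.4, 'low': 49.9, 'last': 49.9}
```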
Getting Started
Ensure you have Docker installed before completing the following steps.
- Run ./build.sh in the main repo folder to build the Docker image.
- Run ./run-notebook.sh to receive the notebook URL. Copy and paste this into your browser to access the notebook.
- Work through the notebooks in order. Notebook 02 prepares the data for the other notebooks.
Additionally, rerun the first step (./build.sh) after each pull in which the Dockerfile has changed, so that your local image is rebuilt against the latest version.
Project Structure
The work here is divided across three notebooks:
- Notebook 1: Obtain, Clean & Understand Data
- We obtain 1 day's worth of data to understand its structure and behaviour.
- Notebook 2: Create Test Dataset
- Using the understanding from Notebook 1, we create the full test dataset.
- Notebook 3: Applying A Neural Network
- We create and apply a neural network approach to the test dataset and the challenge of price movement prediction, and assess its performance.
Additional notebooks
- What prices are predictable
- We find that it matters whether you predict an EndPrice, a MeanPrice or a MedianPrice in the next interval. We show how to normalize the prices to improve the prediction.
- Clustering Stocks
- We cluster 100 stocks from the dataset using data from 60 days.
- Simpler Linear Model
- We show a well-performing linear model with hand-engineered features on a single stock. We predict the average price of the next day for a single stock. This is intended to get started easily with the dataset and price modeling.
- Large-scale linear model predicting 20 minutes ahead
- We run a linear model on the 50 most liquid stocks with proper training and test sets. We predict the direction of the average price in the next 20 minutes.
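The target used in the last notebook, the direction of the average price in the next 20 minutes, can be sketched as follows. This is a hedged illustration only: the function name `direction_of_next_average` is hypothetical, and comparing the future average against the current minute's price is an assumption. The notebook may use a different reference price.

```python
from statistics import mean


def direction_of_next_average(prices, horizon=20):
    """Label each minute with the direction of the mean price over the next `horizon` minutes.

    Assumption: the future average is compared with the current price prices[t].
    """
    labels = []
    for t in range(len(prices) - horizon):
        future_avg = mean(prices[t + 1 : t + 1 + horizon])
        diff = future_avg - prices[t]
        labels.append((diff > 0) - (diff < 0))  # +1 up, -1 down, 0 unchanged
    return labels


# Made-up prices and a short horizon, for illustration only.
print(direction_of_next_average([1.0, 2.0, 3.0, 2.0, 1.0], horizon=2))  # [1, 1, -1]
```

Minutes that do not have a full `horizon` of later prices are dropped rather than labeled with a partial average.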
Documentation
General project documentation can be found in the project wiki.
Authors
- Stefan Savev (Originate)
- Rey Farhan (Originate)