Attempting to replicate "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem" by Jiang et al. 2017 [1].

Note 2 (2019-05-25): vermouth1992 improved this environment during their final project; I recommend you start with their repo. Also check out the SageMaker tutorial, which is based on vermouth1992's work.

Note 1 (2018): the paper's authors have since put up the official code for the paper, and it works well.

tl;dr: I managed to get 8% growth on training data, but it disappeared on test data, so I couldn't replicate the paper's results. However, RL papers can be very difficult to replicate due to bugs, framework differences, and hyperparameter sensitivity.

About

The paper trains an agent to choose a good portfolio of cryptocurrencies. It reports 4-fold returns in 50 days and seems to do all the right things, so I wanted to see if I could achieve the same results.

This repo includes an environment for portfolio management (with unit tests). Hopefully others will find this useful, as I am not aware of any other implementations (as of 2017-07-17).

Author: wassname

License: AGPLv3

[1] Jiang, Zhengyao, Dixing Xu, and Jinjun Liang. "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem." arXiv preprint arXiv:1706.10059 (2017).

Results

I have managed to overfit to the training data with no trading costs, but the model could not generalise to the test data. So far the results have been poor. I have not yet tried hyperparameter optimisation, so it could be that parameter tweaking will allow the model to fit, or I may have subtle bugs.

  • VPG model:
    • training: 190% portfolio growth in 50 days
    • testing: 100% portfolio growth in 50 days

The test period is directly after the training period, and it looks like the usefulness of the model's learned knowledge may decay as it moves away from its training interval.

There are other experiments stored as notebooks in past commits.

Installing

  • git clone https://github.com/wassname/rl-portfolio-management.git
  • cd rl-portfolio-management
  • pip install -r requirements/requirements.txt
  • jupyter-notebook
    • Then open tensorforce-VPG.ipynb in jupyter
    • Or try an alternative agent with tensorforce-PPO.ipynb and train

Using the environment

These environments are derived from the OpenAI Gym environment class, which you can learn about in their documentation.

These environments come with 47k steps of training data and 8k test steps. Each step represents 30 minutes. Thanks to reddit user ARRRBEEE for sharing the data.

There are three output options which you can use as follows:

import gym
import rl_portfolio_management.environments  # this registers them

env = gym.envs.spec('CryptoPortfolioEIIE-v0').make()
print("CryptoPortfolioEIIE has a history shape suitable for an EIIE model (see https://arxiv.org/abs/1706.10059)")
observation = env.reset()
print("shape =", observation["history"].shape)
# shape = (5, 50, 3)

env = gym.envs.spec('CryptoPortfolioMLP-v0').make()
print("CryptoPortfolioMLP history has a flat shape for a dense/multi-layer perceptron model")
observation = env.reset()
print("shape =", observation["history"].shape)
# shape = (750,)

env = gym.envs.spec('CryptoPortfolioAtari-v0').make()
print("CryptoPortfolioAtari history has been padded to represent an image so you can reuse models tuned on Atari games")
observation = env.reset()
print("shape =", observation["history"].shape)
# shape = (50, 50, 3)

Or define your own:

import pandas as pd
from rl_portfolio_management.environments import PortfolioEnv
df_test = pd.read_hdf('./data/poloniex_30m.hf', key='test')
env_test = PortfolioEnv(
  df=df_test,
  steps=256,
  scale=True,
  augment=0.00,
  trading_cost=0.0025,
  time_cost=0.00,
  window_length=50,
  output_mode='mlp'
)

Let's try it with a random agent and plot the results:

import numpy as np
import gym
import rl_portfolio_management.environments  # this registers them

env = gym.envs.spec('CryptoPortfolioMLP-v0').make()
steps = 150
state = env.reset()
for _ in range(steps):
    # The observation contains price history and portfolio weights
    old_portfolio_weights = state["weights"]

    # the action is an array with the new portfolio weights
    # for our action, let's change the weights by around a 20th each step
    action = old_portfolio_weights + np.random.normal(loc=0, scale=1/20., size=(4,))

    # clip and normalize since the portfolio weights should sum to one
    action = np.clip(action, 0, 1)
    action /= action.sum()

    # step the environment and update the state so the next action uses the new weights
    state, reward, done, info = env.step(action)

    if done:
        break

# plot
env.render('notebook')

Unsurprisingly, a random agent doesn't perform well at portfolio management. If it had chosen to bet on blue and then black, it could have outperformed any single asset, but hindsight is 20/20.

Plotting

You can run env.render('notebook') or extract a pandas dataframe and plot it however you like. To get a dataframe, use pd.DataFrame(env.unwrapped.infos).
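
For example, a minimal sketch (the info keys logged by the environment vary; "portfolio_value" and "market_value" below are assumptions, so inspect df_info.columns for the real names):

import pandas as pd

# the unwrapped environment keeps a list of per-step info dicts
df_info = pd.DataFrame(env.unwrapped.infos)

# plot whichever columns you care about, e.g. portfolio value vs. the market
# (column names are assumptions - check df_info.columns)
df_info[["portfolio_value", "market_value"]].plot()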

Tests

We have partial test coverage of the environment; just run:

  • python -m pytest
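
If you want to add checks of your own, here is a minimal sketch of the kind of property a test might assert. It assumes the PortfolioEnv constructor arguments not shown have sensible defaults; the data file and asset count follow the examples above:

import numpy as np
import pandas as pd
from rl_portfolio_management.environments import PortfolioEnv

def test_random_rollout_weights_sum_to_one():
    # hypothetical test: after every step the portfolio weights should still sum to ~1
    df = pd.read_hdf('./data/poloniex_30m.hf', key='test')
    env = PortfolioEnv(df=df, steps=10, output_mode='mlp')
    obs = env.reset()
    for _ in range(10):
        action = np.random.rand(4)
        action /= action.sum()
        obs, reward, done, info = env.step(action)
        assert np.isclose(obs["weights"].sum(), 1.0, atol=1e-3)
        if done:
            break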

Files

  • environments/portfolio.py - contains an OpenAI Gym environment for portfolio trading
  • tensorforce-PPO-IEET.ipynb - notebook to try a policy gradient agent

Differences in implementation

The main differences from Jiang et al. 2017 are:

  • The first step in a deep learning project should be to make sure the model can overfit; this provides a sanity check. So I am first trying to achieve good results with no trading costs.
  • I have not used portfolio vector memory. For ease of implementation I made the information available by using the last weights.
  • Instead of DPG (deterministic policy gradient), I tried DDPG (deep deterministic policy gradient) and VPG (vanilla policy gradient) with generalized advantage estimation, as well as PPO (a minimal GAE sketch is shown after this list).
  • I tried to replicate the best performing CNN model from the paper and haven't attempted the LSTM or RNN models.
  • Instead of selecting 12 assets for each window, I chose the 3 assets that have existed for the longest time.
  • My topology had an extra layer (see issue #3, now fixed).
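
For reference, here is a minimal numpy sketch of generalized advantage estimation as mentioned above (the standard GAE(λ) recursion over temporal-difference residuals, not code from this repo; rewards, values and dones are per-step arrays collected from a rollout):

import numpy as np

def gae_advantages(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Standard GAE(lambda): discounted sum of temporal-difference residuals."""
    T = len(rewards)
    advantages = np.zeros(T)
    next_value = last_value      # value estimate for the state after the last step
    next_advantage = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]   # stop bootstrapping across episode ends
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        next_advantage = delta + gamma * lam * nonterminal * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages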

TODO

See issues #4 and #2 for ideas on where to go from here.
