PyStore - Fast data store for Pandas timeseries data

PyStore is a simple (yet powerful) datastore for Pandas dataframes, and while it can store any Pandas object, it was designed with storing timeseries data in mind.

It's built on top of Pandas, Numpy, Dask, and Parquet (via Fastparquet) to provide an easy-to-use datastore for Python developers that can query millions of rows per second per client.

==> Check out this Blog post for the reasoning and philosophy behind PyStore, as well as a detailed tutorial with code examples.

==> Follow this PyStore tutorial in Jupyter notebook format.

Quickstart

Install PyStore

Install using pip:

$ pip install pystore --upgrade --no-cache-dir

Install using conda:

$ conda install -c ranaroussi pystore

INSTALLATION NOTE: If you don't have Snappy (a compression/decompression library) installed, you'll need to install it first.
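
To check whether the Python Snappy bindings are already available before installing, something like the following may help. It is a minimal sketch and not part of PyStore: it assumes the bindings come from the python-snappy package, which installs a module named `snappy`, and the helper name `has_python_snappy` is hypothetical.

```python
import importlib.util


def has_python_snappy() -> bool:
    """Return True if a module named `snappy` can be found on the import path."""
    # find_spec returns None when no top-level module with this name is installed.
    return importlib.util.find_spec("snappy") is not None


if __name__ == "__main__":
    if has_python_snappy():
        print("python-snappy appears to be installed")
    else:
        print("python-snappy not found; install Snappy and its Python bindings first")
```

This only confirms that the Python module can be located; it does not verify that the underlying Snappy C library works.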

Using PyStore

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import pystore
import quandl

# Set storage path (optional)
# Defaults to `~/pystore` or `PYSTORE_PATH` environment variable (if set)
pystore.set_path("~/pystore")

# List stores
pystore.list_stores()

# Connect to a datastore (created if it does not exist)
store = pystore.store('mydatastore')

# List existing collections
store.list_collections()

# Access a collection (created if it does not exist)
collection = store.collection('NASDAQ')

# List items in collection
collection.list_items()

# Load some data from Quandl
aapl = quandl.get("WIKI/AAPL", authtoken="your token here")

# Store the first 100 rows of the data in the collection under "AAPL"
collection.write('AAPL', aapl[:100], metadata={'source': 'Quandl'})

# Reading the item's data
item = collection.item('AAPL')
data = item.data  # <-- Dask dataframe (see dask.pydata.org)
metadata = item.metadata
df = item.to_pandas()

# Append the rest of the rows to the "AAPL" item
collection.append('AAPL', aapl[100:])

# Reading the item's data
item = collection.item('AAPL')
data = item.data
metadata = item.metadata
df = item.to_pandas()


# --- Query functionality ---

# Query available symbols based on metadata
collection.list_items(some_key='some_value', other_key='other_value')


# --- Snapshot functionality ---

# Snapshot a collection
# (Point-in-time named reference for all current symbols in a collection)
collection.create_snapshot('snapshot_name')

# List available snapshots
collection.list_snapshots()

# Get a version of a symbol given a snapshot name
collection.item('AAPL', snapshot='snapshot_name')

# Delete a collection snapshot
collection.delete_snapshot('snapshot_name')


# ...


# Delete the item from the current version
collection.delete_item('AAPL')

# Delete the collection
store.delete_collection('NASDAQ')
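
The metadata query shown above selects items by the metadata stored with them. As a rough illustration of that matching rule, the sketch below filters an in-memory dictionary instead of a real collection. It is not PyStore's implementation: it assumes an item is returned only when every key/value pair matches, and `filter_items` and the sample metadata are hypothetical.

```python
def filter_items(items, **criteria):
    """Return the sorted names of items whose metadata matches every criterion."""
    return sorted(
        name
        for name, metadata in items.items()
        if all(
            key in metadata and metadata[key] == value
            for key, value in criteria.items()
        )
    )


# Hypothetical metadata, keyed by item name.
items = {
    "AAPL": {"source": "Quandl", "exchange": "NASDAQ"},
    "MSFT": {"source": "Quandl", "exchange": "NASDAQ"},
    "IBM": {"source": "Other", "exchange": "NYSE"},
}

# One criterion: every item whose source is Quandl.
quandl_items = filter_items(items, source="Quandl")

# Two criteria: both must match, so nothing is selected here.
nyse_quandl_items = filter_items(items, source="Quandl", exchange="NYSE")

print(quandl_items)
print(nyse_quandl_items)
```

With no criteria, the helper returns every item name, which mirrors calling `collection.list_items()` without arguments.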

Using Dask schedulers

PyStore 0.1.18+ supports using Dask distributed.

To use a local Dask scheduler, add this to your code:

from dask.distributed import LocalCluster
pystore.set_client(LocalCluster())

To use a distributed Dask scheduler, add this to your code:

pystore.set_client("tcp://xxx.xxx.xxx.xxx:xxxx")
pystore.set_path("/path/to/shared/volume/all/workers/can/access")

Concepts

PyStore provides namespaced collections of data. These collections allow bucketing data by source, user or some other metric (for example frequency: End-Of-Day; Minute Bars; etc.). Each collection (or namespace) maps to a directory containing partitioned parquet files for each item (e.g. symbol).

A good practice is to create collections that look something like this:

  • collection.EOD
  • collection.ONEMINUTE
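
To make the mapping from collections to directories more concrete, the sketch below builds the paths that such a layout implies. It assumes a store/collection/item nesting under the storage path; the exact on-disk layout PyStore uses may differ, and `item_path` and the `/data/pystore` root are hypothetical.

```python
from pathlib import Path


def item_path(root, store, collection, item):
    """Build the directory assumed to hold one item's Parquet files."""
    return Path(root).expanduser() / store / collection / item


# One collection per data frequency, as suggested above.
eod = item_path("/data/pystore", "mydatastore", "EOD", "AAPL")
minute = item_path("/data/pystore", "mydatastore", "ONEMINUTE", "AAPL")

print(eod.as_posix())
print(minute.as_posix())
```

Under this assumption, the same symbol can live in several collections without conflict, because each collection has its own directory.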

Requirements

  • Python 2.7 or Python > 3.5
  • Pandas
  • Numpy
  • Dask
  • Fastparquet
  • Snappy (Google's compression/decompression library)
  • multitasking

PyStore was tested to work on *nix-like systems, including macOS.

Dependencies:

PyStore uses Snappy, a fast and efficient compression/decompression library from Google. You'll need to install Snappy on your system before installing PyStore.

* See the python-snappy GitHub repo for more information.

*nix Systems:

  • APT: sudo apt-get install libsnappy-dev
  • RPM: sudo yum install libsnappy-devel

macOS:

First, install Snappy's C library using Homebrew:

$ brew install snappy

Then, install Python's snappy using conda:

$ conda install python-snappy -c conda-forge

...or, using pip:

$ CPPFLAGS="-I/usr/local/include -L/usr/local/lib" pip install python-snappy

Windows:

Windows users should check out Snappy for Windows and this Stack Overflow post for help on installing Snappy and python-snappy.

Roadmap

PyStore currently supports the local filesystem (including attached network drives). I plan on adding support for Amazon S3 (via s3fs), Google Cloud Storage (via gcsfs) and Hadoop Distributed File System (via hdfs3) in the future.

Acknowledgements

PyStore is hugely inspired by Man AHL's Arctic, which uses MongoDB for storage and allows for versioning and other features. I highly recommend you check it out.

License

PyStore is licensed under the Apache License, Version 2.0. A copy is included in LICENSE.txt.


I'm very interested in your experience with PyStore. Please drop me a note with any feedback you have.

Contributions welcome!

- Ran Aroussi
