  • Stars: 100
  • Rank: 338,708 (Top 7%)
  • Language: Python
  • License: Apache License 2.0
  • Created: about 8 years ago
  • Updated: almost 2 years ago


Repository Details

A Machine Learning API with native redis caching and model export + import using S3. Analyze entire datasets using an API for building, training, testing, analyzing, extracting, importing, and archiving models. This repository can run from a docker container or directly from a checkout of the repository.

Sci-Pype - A Machine Learning Framework for Sharing Models and Analysis

This is now deployed under the https://redten.io cloud service for analyzing datasets.

Sci-Pype is a framework for analyzing datasets using Python 2.7. It extends the Jupyter Scipy-Notebook and also ships a supported command line version (no docker or Jupyter required). It was built to make data analysis easier by providing an API to build, train, test, predict, validate, analyze, extract, archive, and import Models and Analysis datasets using S3 and redis (Kafka support is coming soon). After the requested Models are built and trained on a dataset, they are cached in redis along with their respective Analysis. Once cached, they can be extracted and shared using S3, and from S3 the Models can be imported back into redis for making new predictions using the same API.


Analyzing the IRIS dataset with Sci-Pype

Common use cases for this framework are sharing Analysis notebooks and then automating new predictions with email delivery using AWS SES. With this native caching + deployment layer, you can build, train, and use the supported Machine Learning Algorithms and Models across multiple environments (including multi-tenant ones). Once trained, you can extract the Models as a compressed, serialized Model file (like a build artifact) that is uploaded to S3. Importing a Model file decompresses it and stores the Pickle-serialized Models + Analysis objects in redis. In production, it can be useful to house larger Models in something like a load-balanced redis cluster so a team (or automation) can share them and make new predictions.
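
The full round trip described above is easiest to see in miniature. Below is a minimal, hypothetical sketch of that cache -> extract -> S3 -> import loop using redis, pickle, zlib, and boto3; the bucket name, file path, and the exact set of redis keys are placeholders for illustration, not the repository's actual API.

    # Hypothetical sketch of the Sci-Pype model-file round trip (not the repo's API).
    import pickle
    import zlib

    import boto3
    import redis

    r = redis.Redis(host="localhost", port=6000, db=0)
    s3 = boto3.client("s3")

    # Placeholder leaf-node keys; the notebooks create one per trained column
    model_keys = ["_MD_IRIS_REGRESSOR_SepalLength", "_MD_IRIS_REGRESSOR_SepalWidth"]

    # Extract: gather every cached Model + Analysis object into one dictionary
    extracted = {key: pickle.loads(r.get(key)) for key in model_keys}

    # Compress the serialized dictionary and write the Model file to disk
    model_file = "/tmp/iris_regressor.cache.pickle.zlib"
    with open(model_file, "wb") as f:
        f.write(zlib.compress(pickle.dumps(extracted)))

    # Share it through S3 (placeholder bucket and key)
    s3.upload_file(model_file, "my-models-bucket", "iris_regressor.cache.pickle.zlib")

    # Import: download, decompress, and cache each object back into redis
    s3.download_file("my-models-bucket", "iris_regressor.cache.pickle.zlib", model_file)
    with open(model_file, "rb") as f:
        imported = pickle.loads(zlib.decompress(f.read()))
    for key, obj in imported.items():
        r.set(key, pickle.dumps(obj))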

Please note this is a large docker container, so it may take some time to download; it extracts to ~8.1 GB on disk.

Notebook Examples

Please refer to the examples directory for the latest notebooks. Most of the notebooks and command line tools require running with a redis server listening on port 6000 (<repo base dir>/dev-start.sh will start one).

  1. ML-IRIS-Analysis-Workflow-Classification.ipynb

    Build a unique Machine Learning Classifier (parameterized XGB by default) for each column in the IRIS dataset. After training and testing the Models, perform a general analysis on each column and save + display images generated during each step. After running, the Models + Analysis are Pickled into a set of objects stored in a set of unique redis cache keys. These leaf nodes are organized into a set of redis keys contained in the manifest node for retrieval as needed in the future (like a tree of Machine Learning Algorithm Models with their associated pre-computed Analysis in memory).

  2. ML-IRIS-Analysis-Workflow-Regression.ipynb

    Build a unique Machine Learning Regressor (parameterized XGB by default) for each column in the IRIS dataset. After training and testing the Models, perform a general analysis on each column and save + display images generated during each step. After running, the Models + Analysis are Pickled into a set of objects stored in a set of unique redis cache keys. These leaf nodes are organized into a set of redis keys contained in the manifest node for retrieval as needed in the future (like a tree of Machine Learning Algorithm Models with their associated pre-computed Analysis in memory).

  3. ML-IRIS-Extract-Models-From-Cache.ipynb

    Extract all Models and Analysis records from redis and compile them into a large Pickle-serialized dictionary. Create a manifest for decoupling Model + Analysis nodes, compress the dictionary object with zlib, and write it to disk as a Model file (*.cache.pickle.zlib). After creating the file on disk, upload it to the configured S3 Bucket and Key.

    Once uploaded to the S3 Bucket you should be able to view, download and share the Model files:

    ./examples/images/scipype_s3_bucket_with_xgb_classifier_and_regressor_models_as_pickled_object_files.png

    S3 Bucket containing the IRIS Model Files

  4. ML-IRIS-Import-and-Cache-Models-From-S3.ipynb

    Download the IRIS Model file from the configured S3 Bucket + Key, decompress the previously-built Analysis and Models using Pickle, and store them all in the redis cache according to the manifest. This example uses the IRIS sample dataset and requires that you have a valid S3 Bucket storing the Models and are comfortable paying the download costs to retrieve the Model file from S3 (https://aws.amazon.com/s3/pricing/).

  5. ML-IRIS-Predict-From-Cache-for-New-Predictions-and-Analysis-Classifier.ipynb

    This notebook shows how to make new predictions with cached IRIS Classifier Models + Analysis housed in redis.

  6. ML-IRIS-Predict-From-Cache-for-New-Predictions-and-Analysis-Regressor.ipynb

    This notebook shows how to make new predictions with cached IRIS Regressor Models + Analysis housed in redis (a minimal sketch of this prediction-from-cache pattern follows this list).
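
As mentioned in the two prediction notebooks above, making new predictions only requires pulling the cached objects back out of redis. Here is a minimal sketch of that pattern; the cache key, the "Model" field name inside the cached object, and the feature columns are assumptions for illustration, not the notebooks' exact code.

    # Sketch: load a cached, pickled IRIS model from redis and score new rows.
    import pickle

    import pandas as pd
    import redis

    r = redis.Redis(host="localhost", port=6000, db=0)

    # Each leaf node holds the pickled Model + Analysis for one target column
    cached = pickle.loads(r.get("_MD_IRIS_REGRESSOR_PetalWidth"))
    model = cached["Model"]  # hypothetical field name inside the cached object

    new_rows = pd.DataFrame(
        [[5.1, 3.5, 1.4], [6.2, 2.9, 4.3]],
        columns=["SepalLength", "SepalWidth", "PetalLength"],
    )
    print(model.predict(new_rows))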

Command Line Examples

Most of the notebooks and command line tools require running with a redis server listening on port 6000 (<repo base dir>/dev-start.sh will start one). The command line versions that do not require docker or Jupyter can be found below:

<repo base dir>
├── bins
│   ├── demo-running-locally.py - Simple validate env is working test
│   ├── ml
│   │   ├── builders - Build and Train Models then Analyze Predictions without displaying any plotted images (automation examples)
│   │   │   ├── build-classifier-iris.py
│   │   │   ├── build-regressor-iris.py
│   │   │   ├── rl-build-regressor-iris.py
│   │   │   └── secure-rl-build-regressor-iris.py
│   │   ├── demo-ml-classifier-iris.py - Command line version of: ML-IRIS-Analysis-Workflow-Classification.ipynb
│   │   ├── demo-ml-regressor-iris.py - Command line version of: ML-IRIS-Analysis-Workflow-Regression.ipynb
│   │   ├── demo-rl-regressor-iris.py - Command line version of: ML-IRIS-Redis-Labs-Cache-XGB-Regressors.ipynb
│   │   ├── demo-secure-ml-regressor-iris.py - Demo with a Password-Required Redis Server running locally
│   │   ├── demo-secure-rl-regressor-iris.py - Demo with a Password-Required Redis Labs Cloud endpoint
│   │   ├── downloaders
│   │   │   ├── download_boston_house_prices.py
│   │   │   └── download_iris.py - Command line tool for downloading + preparing the IRIS dataset
│   │   ├── extractors
│   │   │   ├── extract_and_upload_iris_classifier.py - Command line version of: ML-IRIS-Extract-Models-From-Cache.ipynb (Classifier)
│   │   │   ├── extract_and_upload_iris_regressor.py - Command line version of: ML-IRIS-Extract-Models-From-Cache.ipynb (Regressor)
│   │   │   ├── rl_extract_and_upload_iris_regressor.py - Command line version of:  ML-IRIS-Redis-Labs-Extract-From-Cache.ipynb
│   │   │   └── secure_rl_extract_and_upload_iris_regressor.py - Command line version with a password for: ML-IRIS-Redis-Labs-Extract-From-Cache.ipynb
│   │   ├── importers
│   │   │   ├── import_iris_classifier.py - ML-IRIS-Import-and-Cache-Models-From-S3.ipynb (Classifier)
│   │   │   ├── import_iris_regressor.py - ML-IRIS-Import-and-Cache-Models-From-S3.ipynb (Regressor)
│   │   │   ├── rl_import_iris_regressor.py - Command line version of: ML-IRIS-Redis-Labs-Import-From-S3.ipynb
│   │   │   └── secure_rl_import_iris_regressor.py - Command line version with a password for: ML-IRIS-Redis-Labs-Import-From-S3.ipynb
│   │   └── predictors
│   │       ├── predict-from-cache-iris-classifier.py - ML-IRIS-Predict-From-Cache-for-New-Predictions-and-Analysis-Classifier.ipynb (Classifier)
│   │       ├── predict-from-cache-iris-regressor.py - ML-IRIS-Predict-From-Cache-for-New-Predictions-and-Analysis-Regressor.ipynb (Regressor)
│   │       ├── rl-predict-from-cache-iris-regressor.py - Command line version of: ML-IRIS-Redis-Labs-Predict-From-Cached-XGB.ipynb
│   │       └── secure-rl-predict-from-cache-iris-regressor.py - Command line version with a password for: ML-IRIS-Redis-Labs-Predict-From-Cached-XGB.ipynb

Now you can share, test, and deploy Models and their respective Analysis from a file in S3 with other Sci-Pype users running in different environments.

Overview

The docker container runs a Jupyter web application. The web application runs Jupyter Notebooks as kernels. For now the examples and core included in this repository will only work with Python 2.

This container can run in four modes:

  1. Default development

    This mode will mount your changes from the repository into the container at runtime for local testing.

    To start the local development version run: dev-start.sh

    ./dev-start.sh
    

    You can login to the container with: ./ssh.sh

  2. Docker Run Single Container

    To start a single docker container run: start.sh

    ./start.sh
    

    You can login to the container with: ./ssh.sh

  3. Full Stack

    To start the full stack mode run: compose-start-full.sh

    ./compose-start-full.sh
    

    The full-stack-compose.yml will deploy three docker containers (stocksdb, jupyter, and redis-server) using docker compose.

  4. Standalone Testing

    To start the standalone testing mode run: compose-start-jupyter.sh

    ./compose-start-jupyter.sh
    

    The jupyter-docker-compose.yml is used to deploy a single Jupyter container.

Running Locally without Docker

Here is how to run locally without using docker (and Lambda deployments in the future).

  1. Clone the repo without the dash character in the name

    $ git clone [email protected]:jay-johnson/sci-pype.git scipype
    
  2. Go to the base dir of the repository

    dev$ cd scipype
    
  3. Set up a local virtual environment using the installer

    This will take some time and may fail due to missing packages on your host. Please refer to the Coming Soon and Known Issues section for help getting past these issues.

    scipype$ ./setup-new-dev.sh
    

    After this finishes you should see the lines:

    ---------------------------------------------------------
    Activate the new Scipype virtualenv with:
    
    source ./dev-properties.sh"
       or:
    source ./properties.sh
    
  4. Activate the scipype virtual environment for development:

    $ source ./dev-properties.sh
    
  5. Confirm your virtual environment is ready for use

    (scipype) scipype$ pip list --format=columns | grep -E -i "tensorflow|pandas|redis|kafka|xgboost|scipy|scikit"
    confluent-kafka                    0.9.2
    kafka-python                       1.3.1
    pandas                             0.19.2
    pandas-datareader                  0.2.2
    pandas-ml                          0.4.0
    redis                              2.10.5
    scikit-image                       0.12.3
    scikit-learn                       0.18.1
    scikit-neuralnetwork               0.7
    scipy                              0.18.1
    tensorflow                         0.12.0
    xgboost                            0.6a2
    (scipype) scipype$
    
  6. Setup the /opt/work symlink

    When running outside docker, I find it easiest to just symlink the repo's base dir to /opt/work to emulate the container's internal directory deployment structure. In a future release, a local-properties.sh file will set all the environment variables relative to the repository, but for now this works.

    scipype$ ln -s $(pwd) /opt/work
    
  7. Confirm the symlink is setup

    scipype$ ll /opt/work
    lrwxrwxrwx 1 driver driver 32 Mar  6 22:38 /opt/work -> /home/driver/dev/scipype/
    scipype$
    
  8. If you want to always use this virtual environment add this to your ~/.bashrc

    echo 'source /opt/venv/scipype/bin/activate' >> ~/.bashrc
    
  9. Confirm the Demo downloader works using the Virtual Environment

    Please note: this assumes running from a new terminal to validate the virtual environment activation

    Activate it

    scipype$ source ./dev-properties.sh
    

    Run the Demo

    (scipype) scipype$ ./bins/demo-running-locally.py
    Downloading(SPY) Dates[Jan, 02 2016 - Jan, 02 2017]
    Storing CSV File(/opt/scipype/data/src/spy.csv)
    Done Downloading CSV for Ticker(SPY)
    Success File exists: /opt/scipype/data/src/spy.csv
    

    Deactivate it

    (scipype) scipype$ deactivate
    scipype$
    
  10. If you want to automatically load the full Scipype environment (properties.sh) for any new shell terminal, add this to your user's ~/.bashrc

    echo 'source /opt/work/properties.sh' >> ~/.bashrc
    

Authenticated Redis Examples

You can lock redis down with a password by setting it in the redis.conf before starting the redis server (https://redis.io/topics/security#authentication-feature). Here is how to use the machine learning API with a password-locked Redis Labs endpoint or a local one.

Environment Variables

If you are running sci-pype in a docker container, it will load the following environment variables to ensure the redis clients are set up with the password and database:

# Redis Password where Empty = No Password like:
# ENV_REDIS_PASSWORD=
ENV_REDIS_PASSWORD=2603648a854c4f3ba7c93e8449319380
ENV_REDIS_DB_ID=0

You can run without a password by either not defining the ENV_REDIS_PASSWORD environment variable or setting it to an empty string.
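
As a rough illustration of how those variables map onto a client connection (a sketch of the idea, not the repository's internal loader):

    # Sketch: build a redis client from the same environment variables.
    # An unset or empty ENV_REDIS_PASSWORD means the server is not password-locked.
    import os

    import redis

    password = os.environ.get("ENV_REDIS_PASSWORD", "") or None
    db_id = int(os.environ.get("ENV_REDIS_DB_ID", "0"))

    r = redis.Redis(host="localhost", port=6400, db=db_id, password=password)
    r.ping()  # fails with a NOAUTH error if the server wants a password and none was given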

Using a Password-locked Redis Labs Cloud endpoint

  1. Run the Secure Redis Labs Cloud Demo

    bins/ml$ ./demo-secure-rl-regressor-iris.py
    
  2. Connect to the Redis Labs Cloud endpoint

    After running it you can verify the models were stored on the secured endpoint:

    bins/ml$ redis-cli -h pub-redis-12515.us-west-2-1.1.ec2.garantiadata.com -p 12515
    
  3. Verify the server is enforcing the password

    pub-redis-12515.us-west-2-1.1.ec2.garantiadata.com:12515> KEYS *
    (error) NOAUTH Authentication required
    
  4. Authenticate with the password

    pub-redis-12515.us-west-2-1.1.ec2.garantiadata.com:12515> auth 2603648a854c4f3ba7c93e8449319380
    OK
    
  5. View the redis keys

    pub-redis-12515.us-west-2-1.1.ec2.garantiadata.com:12515> KEYS *
    1) "_MD_IRIS_REGRESSOR_PetalWidth"
    2) "_MD_IRIS_REGRESSOR_PredictionsDF"
    3) "_MD_IRIS_REGRESSOR_SepalWidth"
    4) "_MODELS_IRIS_REGRESSOR_LATEST"
    5) "_MD_IRIS_REGRESSOR_ResultTargetValue"
    6) "_MD_IRIS_REGRESSOR_Accuracy"
    7) "_MD_IRIS_REGRESSOR_PetalLength"
    8) "_MD_IRIS_REGRESSOR_SepalLength"
    pub-redis-12515.us-west-2-1.1.ec2.garantiadata.com:12515> exit
    bins/ml$
    

Local

  1. You can run a password-locked, standalone redis server with docker compose using this script:

    https://github.com/jay-johnson/sci-pype/blob/master/bins/redis/auth-start.sh

  2. Once the redis server is started you can run the local secure demo with the script:

    bins/ml$ ./demo-secure-ml-regressor-iris.py
    
  3. After the demo finishes you can authenticate with the local redis server and view the cached models:

    bins/ml$ redis-cli -p 6400
    127.0.0.1:6400> KEYS *
    (error) NOAUTH Authentication required.
    127.0.0.1:6400> AUTH 2603648a854c4f3ba7c93e8449319380
    OK
    127.0.0.1:6400> KEYS *
    1) "_MD_IRIS_REGRESSOR_PetalWidth"
    2) "_MD_IRIS_REGRESSOR_PetalLength"
    3) "_MD_IRIS_REGRESSOR_PredictionsDF"
    4) "_MD_IRIS_REGRESSOR_SepalWidth"
    5) "_MODELS_IRIS_REGRESSOR_LATEST"
    6) "_MD_IRIS_REGRESSOR_Accuracy"
    7) "_MD_IRIS_REGRESSOR_ResultTargetValue"
    8) "_MD_IRIS_REGRESSOR_SepalLength"
    127.0.0.1:6400> exit
    bins/ml$
    
  4. If you want to stop the redis server run:

    https://github.com/jay-johnson/sci-pype/blob/master/bins/redis/stop.sh

Previous Examples

Version 1 Examples

  1. example-core-demo.ipynb

    How to use the python core from a Jupyter notebook. It also shows how to debug the JSON application configs which are used to connect to external database(s) and redis server(s).

    https://jaypjohnson.com/_images/image_2016-08-01_core-integration.png
  2. example-spy-downloader.ipynb

    Jupyter + Downloading the SPY Pricing Data

    Download the SPY ETF Pricing Data from Google Finance and store it in the shared ENV_PYTHON_SRC_DIR directory that is mounted from the host and into the Jupyter container. It uses a script that downloads the SPY daily pricing data as a csv file.

    https://jaypjohnson.com/_images/image_2016-08-01_download-spy-pricing-data.png
  3. example-plot-stock-data.ipynb

    Download SPY and use Pandas + matplotlib to Plot Pricing by the Close

    This shows how to download the SPY daily prices from Google Finance as a csv, then load it with Pandas and plot the Close prices using matplotlib.

    https://jaypjohnson.com/_images/image_2016-08-01_plot-spy-by-close-prices.png
  4. example-redis-cache-demo.ipynb

    Building a Jupyter + Redis Data Pipeline

    This extends the previous SPY pricing demo and publishes + retrieves the pricing data using a targeted CACHE redis server (that runs inside the Jupyter container). It stores the Pandas dataframe as JSON in the LATEST_SPY_DAILY_STICKS redis key (a minimal sketch of this caching pattern follows this list).

    https://jaypjohnson.com/_images/image_2016-08-01_redis-data-pipeline-with-spy-prices.png
  5. example-db-extract-and-cache.ipynb

    Building a Jupyter + MySQL + Redis Data Pipeline

    This requires running the Full Stack, which uses https://github.com/jay-johnson/sci-pype/blob/master/full-stack-compose.yml to deploy three docker containers (stocksdb, jupyter, and redis-server) on the same host.

    How it works

    https://jaypjohnson.com/_images/image_2016-08-01_using-jupyter-for-stock-analysis.png
    1. Extract the IBM stock data from the MySQL stocks database and store it as a csv at /opt/work/data/src/ibm.csv

    2. Load the IBM pricing data with Pandas

    3. Plot the pricing data with matplotlib

    4. Publish the Pandas Dataframe as JSON to Redis

    5. Retrieve the Pandas Dataframe from Redis

    6. Test the cached pricing data exists outside the Jupyter container with:

      $ ./redis.sh
      SSH-ing into Docker image(redis-server)
      [root@redis-server container]# redis-cli -h localhost -p 6000
      localhost:6000> LRANGE LATEST_IBM_DAILY_STICKS 0 0
      1) "(dp0\nS'Data'\np1\nS'{\"Date\":{\"49\":971136000000,\"48\":971049600000,\"47\":970790400000,\"46\":970704000000,\"45\":970617600000,\"44\":970531200000,\"43\":970444800000,\"42\":970185600000,\"41\":970099200000,\"40\":970012800000,\"39\":969926400000,\"38\":969
      
       ... removed for docs ...
      
      localhost:6000> exit
      [root@redis-server container]# exit
      exit
      $
      
  6. example-slack-debugging.ipynb

    Jupyter + Slack Driven Development

    This example shows how environment variables allow the python core to publish a message into Slack, notifying the associated user with the line number and source code that threw the exception.

    https://jaypjohnson.com/_images/image_2016-08-01_slack-debugging.png
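
The caching step in the redis data pipeline example (item 4 above) boils down to serializing a dataframe and pushing it onto a redis list. Here is a minimal sketch of that pattern, assuming a redis server on port 6000; the real notebook wraps the JSON payload inside a pickled dictionary, so treat this as a simplification rather than the notebook's exact code.

    # Sketch: publish a pandas dataframe to redis as JSON and read it back.
    import pandas as pd
    import redis

    r = redis.Redis(host="localhost", port=6000, db=0)

    prices = pd.DataFrame(
        {"Date": ["2016-01-04", "2016-01-05"], "Close": [201.02, 201.36]}
    )

    # Publish the latest snapshot to the head of the list
    r.lpush("LATEST_SPY_DAILY_STICKS", prices.to_json())

    # Retrieve the most recent snapshot and rebuild the dataframe
    latest = r.lrange("LATEST_SPY_DAILY_STICKS", 0, 0)[0]
    restored = pd.read_json(latest.decode("utf-8"))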

Components

  1. Python 2 Core

    The PyCore uses a JSON config file for connecting to redis servers and configurable databases (MySQL and Postgres) using SQLAlchemy. It has only been tested with the Python 2.7 kernel.

  2. Local Redis Server

    When starting the container with ENV_DEPLOYMENT_TYPE set to anything other than JustDB, the container will start a local redis server on port 6000 for iterating on your pipeline analysis, Model deployment, and caching strategies.

  3. Loading Database and Redis Applications

    By default the jupyter.json config supports multiple environments for integrating notebooks with external resources. Here is a table of what they define (a minimal config-selection sketch follows this components list):

    Name       Purpose                                            Redis Applications   Database Applications
    Local      Use the internal redis server with the stock db    local-redis.json     db.json
    NoApps     Run the core without redis servers or databases    empty-redis.json     empty-db.json
    JustRedis  Run with just the redis servers and no databases   local-redis.json     empty-db.json
    JustDB     Run without redis servers and load the databases   empty-redis.json     db.json
    Test       Connect to external redis servers and databases    redis.json           db.json
    Live       Connect to external redis servers and databases    redis.json           db.json

    Inside a notebook you can target a different environment before loading the core with:

    • Changing to the JustRedis Environment:

      import os
      os.environ["ENV_DEPLOYMENT_TYPE"] = "JustRedis"
      core = PyCore()
      
    • Changing to the NoApps Environment:

      import os
      os.environ["ENV_DEPLOYMENT_TYPE"] = "NoApps"
      core = PyCore()
      
  4. Customize the Jupyter Container Lifecycle

    The following environment variables can be used for defining pre-start, start, and post-start Jupyter actions as needed.

    Environment Variable    Default Value                                Purpose
    ENV_PRESTART_SCRIPT     /opt/containerfiles/pre-start-notebook.sh    Run custom actions before starting Jupyter
    ENV_START_SCRIPT        /opt/containerfiles/start-notebook.sh        Start Jupyter
    ENV_POSTSTART_SCRIPT    /opt/containerfiles/post-start-notebook.sh   Run custom actions after starting Jupyter

  5. Slack Debugging

    The core supports publishing exceptions into Slack based on the environment variables passed in using docker or docker compose.

  6. Tracking Installed Dependencies for Notebook Sharing

    This docker container uses these files for tracking Python 2 and Python 3 pips:

    • /opt/work/pips/python2-requirements.txt
    • /opt/work/pips/python3-requirements.txt
  7. Shared Volumes

    These are the mounted volumes and directories, which can be changed as needed. The core also exposes them as environment variables.

    Host Mount                  Container Mount             Purpose
    /opt/project                /opt/project                Sharing a project from the host machine
    /opt/work/data              /opt/work/data              Sharing a common data dir between host and containers
    /opt/work/data/src          /opt/work/data/src          Passing data source files into the container
    /opt/work/data/dst          /opt/work/data/dst          Passing processed data files outside the container
    /opt/work/data/bin          /opt/work/data/bin          Exchanging data binaries from the host into the container
    /opt/work/data/synthesize   /opt/work/data/synthesize   Sharing files used for synthesizing data
    /opt/work/data/tidy         /opt/work/data/tidy         Sharing files used to tidy and marshall data
    /opt/work/data/analyze      /opt/work/data/analyze      Sharing files used for data analysis and processing
    /opt/work/data/output       /opt/work/data/output       Sharing processed files and analyzed output
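
To make the deployment-type table in the Loading Database and Redis Applications component concrete, here is a rough sketch of how a loader could pick the redis and database application configs from ENV_DEPLOYMENT_TYPE; the selection logic and the /opt/work/configs path are assumptions for illustration, not the PyCore implementation.

    # Sketch: choose redis/db application config files from ENV_DEPLOYMENT_TYPE.
    import json
    import os

    CONFIGS = {
        "Local":     ("local-redis.json", "db.json"),
        "NoApps":    ("empty-redis.json", "empty-db.json"),
        "JustRedis": ("local-redis.json", "empty-db.json"),
        "JustDB":    ("empty-redis.json", "db.json"),
        "Test":      ("redis.json", "db.json"),
        "Live":      ("redis.json", "db.json"),
    }

    deployment_type = os.environ.get("ENV_DEPLOYMENT_TYPE", "Local")
    redis_file, db_file = CONFIGS[deployment_type]

    # Assumed config directory for illustration only
    with open(os.path.join("/opt/work/configs", redis_file)) as f:
        redis_apps = json.load(f)
    with open(os.path.join("/opt/work/configs", db_file)) as f:
        db_apps = json.load(f)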

Getting Started

Local Jupyter

  1. Start the Container in Local development mode

    $ ./start.sh
    Starting new Docker image(docker.io/jayjohnson/jupyter)
    4275447ef6a3aa06fb06097837deeb202bd80b15969a9c1269a5ee042d8df13d
    $
    
  2. Browse to the local Jupyter website

    http://localhost:82/

Full Stack

The full-stack-compose.yml patches the Jupyter and redis containers to ensure the MySQL database is listening on port 3306 before they start. It does this by defining a custom entrypoint wrapper for each container in the wait-for-its tools directory.
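
The shipped wrappers are shell scripts, but the idea is simply to block until the database port accepts connections before launching the container's real entrypoint. A minimal Python sketch of the same check (host, port, and timeout are placeholders):

    # Sketch of the wait-for-it idea: block until MySQL accepts TCP connections.
    import socket
    import time

    def wait_for_port(host, port, timeout=60):
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                socket.create_connection((host, port), timeout=2).close()
                return True
            except socket.error:
                time.sleep(1)
        return False

    wait_for_port("stocksdb", 3306)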

  1. Start the Composition

    This can take around 20 seconds for MySQL to set up the seed pricing records, and it requires assigning the shared data directory permissions for read/write access from inside the Jupyter container.

    $ ./compose-start-full.sh
    Before starting changing permissions with:
       chown -R driver:users /opt/work/data/*
    [sudo] password for driver:
    Starting Composition: full-stack-compose.yml
    Starting stocksdb
    Starting jupyter
    Starting redis-server
    Done
    $
    
  2. Check the Composition

    $ docker ps
    CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS              PORTS                                        NAMES
    1fd9bd22987f        jayjohnson/redis-single-node:1.0.0   "/wait-for-its/redis-"   12 minutes ago      Up 25 seconds       0.0.0.0:6000->6000/tcp                       redis-server
    2bcb6b8d2994        jayjohnson/jupyter:1.0.0             "/wait-for-its/jupyte"   12 minutes ago      Up 25 seconds       0.0.0.0:8888->8888/tcp                       jupyter
    b7bce846b9af        jayjohnson/schemaprototyping:1.0.0   "/root/start_containe"   25 minutes ago      Up 25 seconds       0.0.0.0:81->80/tcp, 0.0.0.0:3307->3306/tcp   stocksdb
    $
    
    • Optional - Login to the database container
    $ ./db.ssh
    SSH-ing into Docker image(stocksdb)
    [root@stocksdb db-loaders]# ps auwwx | grep mysql | grep -v grep
    root        28  0.0  0.0  11648  2752 ?        S    17:00   0:00 /bin/sh /usr/bin/mysqld_safe
    mysql      656  1.3 12.0 1279736 474276 ?      Sl   17:00   0:01 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/log/mysql/error.log --pid-file=/var/lib/mysql/mysqld.pid --socket=/var/lib/mysql/mysqld.sock --port=3306
    [root@stocksdb db-loaders]# exit
    

    View the Stocks Database with phpMyAdmin: http://localhost:81/phpmyadmin/sql.php?db=stocks&table=stocks

    Note

    By default the login to this sample db is: dbadmin / dbadmin123 which can be configured in the db.env

    • Optional - Login to the Redis container
    $ ./redis.sh
    SSH-ing into Docker image(redis-server)
    [root@redis-server container]# ps auwwx | grep redis
    root         1  0.0  0.0  11644  2616 ?        Ss   17:00   0:00 bash /wait-for-its/redis-wait-for-it.sh
    root        28  0.0  0.2 114800 11208 ?        Ss   17:00   0:00 /usr/bin/python /usr/bin/supervisord -c /etc/supervisor.d/rediscluster.ini
    root        30  0.3  0.0  37268  3720 ?        Sl   17:00   0:00 redis-server *:6000
    root        47  0.0  0.0   9044   892 ?        S+   17:02   0:00 grep --color=auto redis
    [root@redis-server container]# exit
    
    • Optional - Login to the Jupyter container
    $ ./ssh.sh
    SSH-ing into Docker image(jupyter)
    jovyan:/opt/work$ ps auwwx | grep jupyter
    jovyan       1  0.0  0.0  13244  2908 ?        Ss   17:00   0:00 bash /wait-for-its/jupyter-wait-for-it.sh
    jovyan      38  0.3  1.2 180564 48068 ?        S    17:00   0:00 /opt/conda/bin/python /opt/conda/bin/jupyter-notebook
    jovyan:/opt/work$ exit
    
  3. Run the Database Extraction Jupyter Demo

    Open the notebook with this url: http://localhost:82/notebooks/examples/example-db-extract-and-cache.ipynb

  4. Click the Run Button

    This example will connect to the stocksdb MySQL container and pull 50 records from IBM's pricing data. It will then render plot lines for Open, Close, High, and Low using Pandas and matplotlib. Next it will cache the IBM records in the redis-server container and then verify those records were cached correctly by retrieving them again.

  5. From outside the Jupyter container confirm the redis key holds the processed IBM data

    $ ./redis.sh
    SSH-ing into Docker image(redis-server)
    [root@redis-server container]# redis-cli -h localhost -p 6000
    localhost:6000> LRANGE LATEST_IBM_DAILY_STICKS 0 0
    1) "(dp0\nS'Data'\np1\nS'{\"Date\":{\"49\":971136000000,\"48\":971049600000,\"47\":970790400000,\"46\":970704000000,\"45\":970617600000,\"44\":970531200000,\"43\":970444800000,\"42\":970185600000,\"41\":970099200000,\"40\":970012800000,\"39\":969926400000,\"38\":969
    
     ... removed for docs ...
    
    localhost:6000> exit
    [root@redis-server container]# exit
    exit
    $
    
  6. Stop the Composition

    $ ./compose-stop-full.sh
    Stopping Composition: full-stack-compose.yml
    Stopping redis-server ... done
    Stopping jupyter ... done
    Stopping stocksdb ... done
    Done
    $
    

Standalone

  1. Start Standalone

    Start the standalone Jupyter container using the jupyter-docker-compose.yml file. This compose file requires access to the /opt/work/data host directory, like the Full Stack version, for sharing files between the container and the host.

    $ ./compose-start-jupyter.sh
    Before starting changing permissions with:
       chown -R driver:users /opt/work/data/*
    [sudo] password for driver:
    Starting Composition: jupyter-docker-compose.yml
    Starting jupyter
    Done
    $
    
  2. Stop Standalone

    Stop the standalone Jupyter composition with:

    $ ./compose-stop-jupyter.sh
    Stopping Composition: jupyter-docker-compose.yml
    Stopping jupyter ... done
    Done
    $
    

Deleting the Containers

Remove the containers with the command:

$ docker rm jupyter redis-server stocksdb
jupyter
redis-server
stocksdb
$

Delete the images from the host with:

$ docker rmi jayjohnson/schemaprototyping
$ docker rmi jayjohnson/jupyter
$ docker rmi jayjohnson/redis-single-node

Sharing between the Host and the Jupyter Container

By default, the host will have this directory structure available for passing files in and out of the container:

$ tree /opt/work
/opt/work
└── data
    ├── analyze
    ├── bin
    ├── dst
    ├── output
    ├── src
    │   └── spy.csv
    ├── synthesize
    └── tidy

8 directories, 1 file

From inside the container here is where the directories are mapped:

$ ./ssh.sh
SSH-ing into Docker image(jupyter)
driver:/opt/work$ tree data/
data/
├── analyze
├── bin
├── dst
├── output
├── src
│   └── spy.csv
├── synthesize
└── tidy

7 directories, 1 file

Coming Soon and Known Issues

  1. Missing xattr.h

    If you see this error:

    xattr.c:29:24: fatal error: attr/xattr.h: No such file or directory
    

    Install RPM:

    sudo yum install -y libattr-devel
    

    Install Deb:

    sudo apt-get install -y libattr1-dev
    

    Retry the install

  2. Local Install Confluent:

    If you're trying to set up the local development environment and are missing the kafka headers, you may see:

    In file included from confluent_kafka/src/confluent_kafka.c:17:0:
    confluent_kafka/src/confluent_kafka.h:21:32: fatal error: librdkafka/rdkafka.h: No such file or directory
    #include <librdkafka/rdkafka.h>
    

    Please install Kafka by adding their repository and then installing:

    $ sudo yum install confluent-platform-oss-2.11
    $ sudo yum install librdkafka1 librdkafka-devel
    

    Official RPM Guide: http://docs.confluent.io/3.1.1/installation.html#rpm-packages-via-yum

    Official DEB Guide: http://docs.confluent.io/3.1.1/installation.html#deb-packages-via-apt

    For Fedora 24/RHEL 7/CentOS 7 users here's a tool to help:

    scipype/python2$ sudo ./install_confluent_platform.sh
    
  3. Install PyQt4 for ImportError: No module named PyQt4 errors:

    (python2) jovyan:/opt/work/bins$ conda install -y pyqt=4.11
    Fetching package metadata .........
    Solving package specifications: ..........
    
    Package plan for installation in environment /opt/conda/envs/python2:
    
    The following packages will be downloaded:
    
            package                    |            build
            ---------------------------|-----------------
            qt-4.8.7                   |                3        31.3 MB  conda-forge
            pyqt-4.11.4                |           py27_2         3.5 MB  conda-forge
            ------------------------------------------------------------
                                                                                    Total:        34.8 MB
    
    The following NEW packages will be INSTALLED:
    
            pyqt: 4.11.4-py27_2 conda-forge
            qt:   4.8.7-3       conda-forge (copy)
    
    Pruning fetched packages from the cache ...
    Fetching packages ...
    qt-4.8.7-3.tar 100% |##########################################################################################################################################| Time: 0:00:06   5.23 MB/s
    pyqt-4.11.4-py 100% |##########################################################################################################################################| Time: 0:00:02   1.28 MB/s
    Extracting packages ...
    [      COMPLETE      ]|#############################################################################################################################################################| 100%
    Linking packages ...
    [      COMPLETE      ]|#############################################################################################################################################################| 100%
    

    Now try running a script from the shell:

    (python2) jovyan:/opt/work/bins$ ./download-spy-csv.py
    Downloading(SPY) Dates[Jan, 02 2016 - Jan, 02 2017]
    Storing CSV File(/opt/work/data/src/spy.csv)
    Done Downloading CSV for Ticker(SPY)
    Success File exists: /opt/work/data/src/spy.csv
    (python2) jovyan:/opt/work/bins$
    
  4. How to build a customized Python Core mounted from outside the Jupyter container

  5. Fixing the docker compose networking so the stocksdb container does not need to know the compose-generated docker network.

    Right now it defines sci-pype_datapype as the expected docker network. This may not work on older versions of docker.

  6. Building Jupyter containers that are smaller and only run one kernel to reduce the overall size of the image

  7. Testing on an older docker version

    This was tested with 1.12.1

    $ docker -v
    Docker version 1.12.1, build 23cf638
    $
    
  8. Setting up the Jupyter wait-for-it to ensure the stocks database is loaded before starting... not just that the port is up

    For now, just shut down the notebook kernel if you see an error related to the stocks database not being there when running the full stack.


License

This project is not related to SciPy.org or the scipy library. It was originally built for exchanging and loading datasets using Redis to create near-realtime data pipelines for streaming analysis (like a scientific pypeline).

This repo is licensed under the Apache 2.0 License: https://github.com/jay-johnson/sci-pype/blob/master/LICENSE

Jupyter - BSD: https://github.com/jupyter/jupyter/blob/master/LICENSE

Please refer to the Conda Licenses for individual Python libraries: https://docs.continuum.io/anaconda/pkg-docs

Redis - https://redis.io/topics/license

zlib - https://opensource.org/licenses/zlib-license.php

More Repositories

  1. deploy-to-kubernetes (Shell, 79 stars) - Deploy a distributed AI stack to a multi-host or single-host Kubernetes cluster on CentOS 7 and also works on AWS - and comes with: cert-manager + redis-cluster + rook-ceph for persistent storage + minio s3 object store + splunk + optional external dns server + affinity examples - validated with K8 version 1.13.4 🔨 🔧 ☁️
  2. train-ai-with-django-swagger-jwt (Python, 70 stars) - Train AI (Keras + Tensorflow) to defend apps with Django REST Framework + Celery + Swagger + JWT - deploys to Kubernetes and OpenShift Container Platform
  3. owasp-jenkins (Shell, 54 stars) - Want to test your applications using the latest OWASP security toolchains and the NIST National Vulnerability Database using Jenkins, Ansible and docker? 🐳 🛡️ 🔒
  4. docker-redis-cluster (Shell, 49 stars) - Running a distributed 6-node Redis Cluster with Docker Swarm, Docker Compose, and Supervisor
  5. docker-redis-haproxy-cluster (Python, 48 stars) - A Redis Replication Cluster accessible through HAProxy running across a Docker Composed-Swarm with Supervisor and Sentinel
  6. network-pipeline (Python, 47 stars) - Network traffic data pipeline for real-time predictions and building datasets for deep neural networks
  7. celery-connectors (Python, 41 stars) - Want to handle 100,000 messages in 90 seconds? Celery and Kombu are that awesome - Multiple publisher-subscriber demos for processing json or pickled messages from Redis, RabbitMQ or AWS SQS. Includes Kombu message processors using native Producer and Consumer classes as well as ConsumerProducerMixin workers for relay publish-hook or caching
  8. metalnetes (Shell, 39 stars) - Create and manage multiple Kubernetes clusters using KVM on a bare metal Fedora 29 server. Includes helm + rook-ceph + nginx ingress + the stock analysis engine (jupyter + redis cluster + minio + automated cron jobs for data collection) - works on Kubernetes version v1.16.0 - 1.16.3 was not working
  9. docker-django-nginx-slack-sphinx (Python, 27 stars) - Django + nginx using Docker Compose with Slack + uWSGI + Sphinx + Bootstrap + Bootswatch + AJAX ➡️ 🐳 + 🐍 = 💥
  10. antinex-core (Jupyter Notebook, 20 stars) - Network exploit detection using highly accurate pre-trained deep neural networks with Celery + Keras + Tensorflow + Redis
  11. nerfball (Python, 19 stars) - Want to see how something like Internet Chemotherapy works without bricking your own vms? This is a jail to reduce the python runtime from doing bad things on the host when running untrusted code. Nerf what you do not need 👾 + 🐛 ⚽ 🏈 🐳
  12. restapi (Rust, 18 stars) - A secure-by-default, async Rest API with hyper, tokio, bb8, kafka-threadpool, postgres and prometheus for monitoring. Includes: a working user management and authentication backend written for postgres, async s3 uploading/downloading, async publishing to kafka with mTLS for encryption in transit
  13. spylunking (Python, 12 stars) - Drill down into your python logs using JSON logs stored in Splunk - supports sending over TCP or the Splunk HEC REST API handlers (using threads or multiprocessing) - includes a pre-configured Splunk sandbox in a docker container
  14. docker-redis-sentinel-replication-cluster (Shell, 11 stars) - A Redis Replication Cluster running across a Docker Swarm using Compose, Supervisor, and Sentinel
  15. kombu-and-pika-pub-sub-examples (Python, 10 stars) - Simple publisher and subscriber examples for Kombu and Pika with a RabbitMQ broker
  16. sec-rss-read-and-convert-to-json (Python, 8 stars) - Using Python, read the SEC RSS Feed and convert to JSON dictionary
  17. antinex-datasets (Python, 7 stars) - Datasets for training deep neural networks to defend software applications
  18. docker-schema-prototyping-with-mysql (Python, 6 stars) - Prototyping a MySQL Schema with Docker and phpMyAdmin
  19. convert-stock-ticker-into-cik-json (Python, 6 stars) - Quickly Convert Stock Tickers into the associated SEC CIK with Company Name and dump to JSON
  20. antinex-client (Python, 5 stars) - AntiNex python client for training and using pre-trained deep neural networks with JWT authentication
  21. ruby-luhn-checker-for-credit-card-validation (Ruby, 4 stars) - 💳 Process credit card transactions and maintain an account balance with the Luhn algorithm for fraud detection and credit card number validation. Written in Ruby and uses RSpec for testing.
  22. redten-python (Python, 4 stars) - Python client and docker image for the red10 machine learning api
  23. docker-nginx-sphinx-bootstrap (Python, 3 stars) - Host your own Technical Blog with Docker + nginx + Sphinx Bootstrap
  24. docker-sphinx-bootstrap (Shell, 3 stars) - A containerized version of ryan-roemer's sphinx-bootstrap-theme repository so that on startup it will convert any rst files mounted from a host volume directory into themed, mobile-ready html.
  25. celery-loaders (Python, 3 stars) - Examples for Celery applications and task loading
  26. antinex-utils (Python, 3 stars) - Manage and use pre-trained deep neural networks with a common interface for build, compile, fit, evaluate, kfold, cross validate, and predict lifecycle phases using Keras and Tensorflow
  27. slack-driven-development (Python, 3 stars) - Tired of crawling through logs looking for errors? This is a simple Slack bot that publishes exceptions + line number + environment name into a channel to help develop and find bugs faster
  28. docker-rails-app (Ruby, 2 stars) - Rails App - Phase 2 - With Docker Integration for Demonstrating Travis CI
  29. python-that-runs-c-plus-plus (C++, 2 stars) - Bind C++ objects, methods and classes for use with Python 2.7, and it builds with cmake
  30. network-pipeline-datasets (2 stars) - CSV datasets for ML/AI models from captured network traffic during ZAP scanning with web applications like Django, Flask, React, Vue and Spring - Anti-Nex training datasets
  31. rust-with-strimzi-kafka-and-tls (Rust, 2 stars) - Rust messaging with a Strimzi Kafka cluster secured with self-signed tls assets for encryption in transit with mtls for client authentication
  32. datanode (Python, 2 stars) - A python 2 container runtime for processing data science tasks and workloads (used by https://github.com/jay-johnson/sci-pype for distributed analysis)
  33. docker-springxd-container (Shell, 1 star) - Docker container for Spring XD Container that uses Travis CI to auto-push passing builds into Docker Hub
  34. antinex-docs (1 star) - Docs on readthedocs.org
  35. docker-nginx (Shell, 1 star) - A configurable docker nginx container running on CentOS 7
  36. rust-kafka-threadpool (Rust, 1 star) - An async rust threadpool for publishing messages to kafka using SSL (mTLS) or PLAINTEXT protocols.
  37. lets-encrypt-nginx (Shell, 1 star) - Another nginx docker container that automates registering and renewing Let's Encrypt x509 ssl certificates
  38. message-simulator (Python, 1 star) - An open source tool for testing and hardening clusters for high availability