  • Language
    Go
  • License
    Apache License 2.0
  • Created about 10 years ago
  • Updated over 9 years ago

Repository Details

Scalable REST API for the Caffe deep learning framework

Scalable REST API wrapper for the Caffe deep learning framework.

The problem

Caffe is an awesome deep learning framework, but running it on a single laptop or desktop computer isn't nearly as productive as running it in the cloud at scale.

ElasticThought gives you the ability to:

  • Run multiple Caffe training jobs in parallel
  • Queue up training jobs
  • Tune the number of workers that process jobs on the queue
  • Interact with it via a REST API (and later build Web/Mobile apps on top of it)
  • Support multiple users (multi-tenancy), each with access to only their own data
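
As a rough illustration of the "queue plus tunable workers" idea in the list above, here is a minimal Go sketch. It is not ElasticThought's actual implementation: the `Job` type and `runJobs` function are hypothetical names, and the worker body only records the job ID where a real worker would invoke Caffe.

```go
package main

import (
	"fmt"
	"sync"
)

// Job is a hypothetical stand-in for a Caffe training job.
type Job struct {
	ID int
}

// runJobs puts every job on a queue, drains the queue with numWorkers
// goroutines, and returns the IDs of the processed jobs.
// The order of the returned IDs is not guaranteed.
func runJobs(jobs []Job, numWorkers int) []int {
	queue := make(chan Job, len(jobs))
	for _, j := range jobs {
		queue <- j
	}
	close(queue)

	var (
		mu   sync.Mutex
		done []int
		wg   sync.WaitGroup
	)
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range queue {
				// A real worker would run the training job here.
				mu.Lock()
				done = append(done, j.ID)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return done
}

func main() {
	jobs := []Job{{ID: 1}, {ID: 2}, {ID: 3}, {ID: 4}}
	done := runJobs(jobs, 2) // numWorkers is the knob to tune
	fmt.Println("jobs processed:", len(done))
}
```

Changing the second argument of `runJobs` changes how many jobs are processed concurrently without touching the queueing code.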

Components

[Figure: ElasticThought components]

Deployment Architecture

Here is what a typical cluster might look like:

[Figure: ElasticThought deployment]

If running on AWS, each CoreOS instance would be running on its own EC2 instance.

Although not shown, all components would be running inside of Docker containers.

It would be possible to start more nodes which only had Caffe GPU workers running.

Roadmap

Current Status: everything under heavy construction, not ready for public consumption yet

  1. [done] Working end-to-end with IMAGE_DATA caffe layer using a single test set with a single training set, and ability to query trained set.
  2. [done] Support LEVELDB / LMDB data formats, to run mnist example.
  3. [in progress] Package everything up to make it easy to deploy locally or in the cloud
  4. Support the majority of caffe use cases
  5. Ability to auto-scale worker instances up and down based on how many jobs are in the message queue.
  6. Attempt to add support for other deep learning frameworks: pylearn2, cuda-convnet, etc.
  7. Build a Web App on top of the REST API that leverages PouchDB
  8. Build Android and iOS mobile apps on top of the REST API that leverages Couchbase Mobile
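
Roadmap item 5 (auto-scaling workers from the queue depth) could be driven by a rule like the one sketched below. This is only an assumption about how such a policy might look; `desiredWorkers` and its parameters are hypothetical and do not come from the ElasticThought code base.

```go
package main

import "fmt"

// desiredWorkers returns ceil(queuedJobs / jobsPerWorker), clamped to the
// range [minWorkers, maxWorkers]. queuedJobs is assumed to be non-negative.
func desiredWorkers(queuedJobs, jobsPerWorker, minWorkers, maxWorkers int) int {
	if jobsPerWorker < 1 {
		jobsPerWorker = 1 // guard against division by zero
	}
	n := (queuedJobs + jobsPerWorker - 1) / jobsPerWorker
	if n < minWorkers {
		n = minWorkers
	}
	if n > maxWorkers {
		n = maxWorkers
	}
	return n
}

func main() {
	for _, queued := range []int{0, 7, 100} {
		fmt.Printf("queued=%d -> workers=%d\n", queued, desiredWorkers(queued, 2, 1, 10))
	}
}
```

The lower bound keeps at least one worker alive when the queue is empty, and the upper bound caps cost when the queue spikes.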

Design goals

  • 100% Open Source (Apache 2 / BSD), including all components used.
  • Architected to enable warehouse scale computing
  • No IaaS lock-in -- easily migrate between AWS, GCE, or your own private data center
  • Ability to scale down as well as up

Documentation

System Requirements

ElasticThought requires CoreOS to run.

If you want to access the GPU, you will need to do extra work to get CoreOS working with the Nvidia CUDA GPU drivers.

Installing elastic-thought on a single CoreOS host (Development mode)

If you are on OS X, you'll first need to install Vagrant, VirtualBox, and CoreOS. See CoreOS on Vagrant for instructions.

Here's what will be created:

           ┌─────────────────────────────────────────────────────────┐
           │                       CoreOS Host                       │
           │  ┌──────────────────────────┐  ┌─────────────────────┐  │
           │  │     Docker Container     │  │  Docker Container   │  │
           │  │   ┌───────────────────┐  │  │    ┌────────────┐   │  │
           │  │   │  Elastic Thought  │  │  │    │Sync Gateway│   │  │
           │  │   │      Server       │  │  │    │  Database  │   │  │
           │  │   │   ┌───────────┐   │  │  │    │            │   │  │
           │  │   │   │In-process │   │◀─┼──┼───▶│            │   │  │
           │  │   │   │   Caffe   │   │  │  │    │            │   │  │
           │  │   │   │  worker   │   │  │  │    │            │   │  │
           │  │   │   └───────────┘   │  │  │    └────────────┘   │  │
           │  │   └───────────────────┘  │  └─────────────────────┘  │
           │  └──────────────────────────┘                           │
           └─────────────────────────────────────────────────────────┘

Run the following commands on your CoreOS box. (To get in, you may need to run vagrant ssh core-01.)

Start Sync Gateway Database

$ docker run -d --name sync-gateway -P couchbase/sync-gateway:1.1.0-forestdb_bucket sync_gateway https://gist.githubusercontent.com/tleyden/8051567cf62dfa8f89ca/raw/43d4abc9ef64cef7b4bbbdf6cb8ce80c456efd1f/gistfile1.txt

Start ElasticThought REST API server

$ docker run -d --name elastic-thought -p 8080:8080 --link sync-gateway:sync-gateway tleyden5iwx/elastic-thought-cpu-develop bash -c 'refresh-elastic-thought; elastic-thought --sync-gw http://sync-gateway:4984/elastic-thought'

It's also a good idea to check the logs of both containers to look for any errors:

$ docker logs sync-gateway 
$ docker logs -f elastic-thought

At this point you can test the API via curl.

Installing elastic-thought on AWS (Production mode)

It should be possible to install elastic-thought anywhere that CoreOS is supported. Currently, there are instructions for AWS and Vagrant (below).

Launch EC2 instances via CloudFormation script

Note: the instance will launch in us-east-1. If you want to launch in another region, please file an issue.

Verify CoreOS cluster

Run:

$ fleetctl list-machines

This should show all the CoreOS machines in your cluster. (It uses etcd under the hood, so it also validates that etcd is set up correctly.)

Kick off ElasticThought

Ssh into one of the machines (doesn't matter which): ssh -A core@<public-ip-of-instance>

$ wget https://raw.githubusercontent.com/tleyden/elastic-thought/master/docker/scripts/elasticthought-cluster-init.sh
$ chmod +x elasticthought-cluster-init.sh
$ ./elasticthought-cluster-init.sh -v 3.0.1 -n 3 -u "user:passw0rd" -p gpu 

Once it launches, verify your cluster by running fleetctl list-units.

It should look like this:

UNIT						MACHINE				ACTIVE	SUB
[email protected]                         2340c553.../10.225.17.229       active	running
[email protected]                         fbd4562e.../10.182.197.145      active	running
[email protected]                         0f5e2e11.../10.168.212.210      active	running
[email protected]                             2340c553.../10.225.17.229       active	running
[email protected]                             fbd4562e.../10.182.197.145      active	running
[email protected]                             0f5e2e11.../10.168.212.210      active	running
couchbase_bootstrap_node.service                0f5e2e11.../10.168.212.210      active	running
couchbase_bootstrap_node_announce.service       0f5e2e11.../10.168.212.210      active	running
couchbase_node.1.service                        2340c553.../10.225.17.229       active	running
couchbase_node.2.service                        fbd4562e.../10.182.197.145      active	running
[email protected]                   2340c553.../10.225.17.229       active	running
[email protected]                   fbd4562e.../10.182.197.145      active	running
[email protected]                   0f5e2e11.../10.168.212.210      active	running
[email protected]                      2340c553.../10.225.17.229       active	running
[email protected]                      fbd4562e.../10.182.197.145      active	running
[email protected]                      0f5e2e11.../10.168.212.210      active	running
[email protected]                          2340c553.../10.225.17.229       active	running
[email protected]                          fbd4562e.../10.182.197.145      active	running
[email protected]                          0f5e2e11.../10.168.212.210      active	running
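
If you would rather check the listing programmatically than by eye, the sketch below scans `fleetctl list-units` output and reports any unit that is not `active` / `running`. It assumes the four-column layout shown above (UNIT, MACHINE, ACTIVE, SUB); `unhealthyUnits` is a hypothetical helper, not part of fleet or ElasticThought.

```go
package main

import (
	"fmt"
	"strings"
)

// unhealthyUnits returns the names of units whose last two columns
// (ACTIVE and SUB) are not "active" and "running".
func unhealthyUnits(output string) []string {
	var bad []string
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 4 || fields[0] == "UNIT" {
			continue // skip the header row and blank or short lines
		}
		active, sub := fields[len(fields)-2], fields[len(fields)-1]
		if active != "active" || sub != "running" {
			bad = append(bad, fields[0])
		}
	}
	return bad
}

func main() {
	// Two rows copied from the listing above.
	out := `UNIT	MACHINE	ACTIVE	SUB
couchbase_node.1.service	2340c553.../10.225.17.229	active	running
couchbase_node.2.service	fbd4562e.../10.182.197.145	active	running`

	if bad := unhealthyUnits(out); len(bad) == 0 {
		fmt.Println("all units are active and running")
	} else {
		fmt.Println("units needing attention:", bad)
	}
}
```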

At this point you should be able to access the REST API on the public IP of any of the three Sync Gateway machines.

Installing elastic-thought on Vagrant (Staging mode)

This mode tries to replicate the Production mode described above, but on Vagrant instead of AWS.

Update Vagrant

Make sure you're running a current version of Vagrant; otherwise the plugin install below may fail.

$ vagrant -v
1.7.1

Install CoreOS on Vagrant

Clone the coreos/vagrant fork that has been customized for running ElasticThought.

$ cd ~/Vagrant 
$ git clone git@github.com:tleyden/coreos-vagrant.git
$ cd coreos-vagrant
$ cp config.rb.sample config.rb
$ cp user-data.sample user-data

By default this will run a two-node cluster. To change this, update the $num_instances variable in the config.rb file.

Run CoreOS

$ vagrant up

Ssh in:

$ vagrant ssh core-01 -- -A

If you see:

Failed Units: 1
  user-cloudinit@var-lib-coreos\x2dvagrant-vagrantfile\x2duser\x2ddata.service

Jump to Workaround CoreOS + Vagrant issues below.

Verify things started up correctly:

core@core-01 ~ $ fleetctl list-machines

If you get errors like:

2015/03/26 16:58:50 INFO client.go:291: Failed getting response from http://127.0.0.1:4001/: dial tcp 127.0.0.1:4001: connection refused
2015/03/26 16:58:50 ERROR client.go:213: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 100ms

Jump to Workaround CoreOS + Vagrant issues below.
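
The "connection refused" lines above mean that nothing accepted a TCP connection on 127.0.0.1:4001, the address fleetctl uses to reach etcd. The sketch below runs the same check in isolation. `portOpen` is a hypothetical helper, and the example assumes you build the binary elsewhere and copy it to the host.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// portOpen reports whether a TCP connection to addr can be established
// within timeoutMillis milliseconds.
func portOpen(addr string, timeoutMillis int) bool {
	timeout := time.Duration(timeoutMillis) * time.Millisecond
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	// 127.0.0.1:4001 is the address taken from the error message above.
	if portOpen("127.0.0.1:4001", 500) {
		fmt.Println("127.0.0.1:4001 is accepting connections")
	} else {
		fmt.Println("could not connect to 127.0.0.1:4001")
	}
}
```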

Workaround CoreOS + Vagrant issues:

First exit out of CoreOS:

core@core-01 ~ $ exit

On your OSX workstation, try the following workaround:

$ sed -i '' 's/420/0644/' user-data
$ sed -i '' 's/484/0744/' user-data
$ vagrant reload --provision
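
The two sed substitutions appear to replace decimal file-mode values with their octal equivalents: 420 in decimal is 0644 in octal, and 484 is 0744. The short check below confirms the arithmetic; `toOctalMode` is a hypothetical helper used only for illustration.

```go
package main

import "fmt"

// toOctalMode renders a decimal permission value as a zero-padded,
// four-digit octal string.
func toOctalMode(decimal int) string {
	return fmt.Sprintf("%04o", decimal)
}

func main() {
	fmt.Println(toOctalMode(420)) // prints 0644
	fmt.Println(toOctalMode(484)) // prints 0744
}
```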

Ssh back in:

$ vagrant ssh core-01 -- -A

Verify it worked:

core@core-01 ~ $ fleetctl list-machines

You should see:

MACHINE		IP		METADATA
ce0fec18...	172.17.8.102	-
d6402b24...	172.17.8.101	-

I filed CoreOS cloudinit issue 328 to figure out why this error is happening. (Possibly related issues: CoreOS cloudinit issue 261 and CoreOS cloudinit issue 190.)

Continue steps above

Scroll up to the "Installing elastic-thought on AWS" section and continue from "Verify CoreOS cluster".

FAQ

  • Is this useful for grid computing / distributed computation? Ans: No, this is not trying to be a grid computing (aka distributed computation) solution. You may want to check out Caffe Issue 876 or ParameterServer.

Related Projects

License

Apache 2
