  • Stars: 162
  • Rank: 232,284 (Top 5 %)
  • Language: Python
  • License: MIT License
  • Created almost 6 years ago
  • Updated over 5 years ago


Repository Details

massively parallel experimentation with Jupyter and AWS Lambda 🐑🌩📒

eigensheep


Eigensheep is a Python package with a very easy setup process that lets you effortlessly run Jupyter Notebook cells on AWS Lambda, enabling massive parallelism. To instantly provision and run your code on 1000 tiny VMs, prefix a cell with %%eigensheep -n 1000.

Eigensheep gives your Lambda code full access both to packages from PyPI and to Lambda Layers, including typically tricky-to-install things like Z3, ffmpeg, and puppeteer.

Features

  • Just prefix a cell with %%eigensheep to run it on AWS Lambda
  • Automatically generates Lambda deployment packages with pre-installed dependencies via pip
  • Supports Lambda Layers for easily including external libraries like Z3, FFmpeg, Puppeteer/Chromium, LibreOffice, Tesseract OCR, YOLOv3 on Darknet, and Spacy
  • Automatically caches Lambda configurations
  • Supports response sizes over 6MB by saving results to S3
  • Integrates tqdm for interactively displaying progress
  • Easy setup and configuration powered by AWS CloudFormation
  • Automatically copies variables from notebook scope
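
To illustrate the 6MB point in the list above: AWS Lambda caps synchronous response payloads at 6 MB, so a larger result has to travel some other way, such as through S3. The sketch below only shows the size check. Serializing with JSON and the helper names `payload_size` and `needs_s3` are assumptions made for illustration, not a description of how Eigensheep itself encodes or routes results.

```python
import json

# Lambda caps synchronous response payloads at 6 MB.
LAMBDA_RESPONSE_LIMIT_BYTES = 6 * 1024 * 1024


def payload_size(result):
    """Size in bytes of a result once serialized as UTF-8 JSON (assumed format)."""
    return len(json.dumps(result).encode("utf-8"))


def needs_s3(result, limit=LAMBDA_RESPONSE_LIMIT_BYTES):
    """True when the serialized result would not fit in a direct response."""
    return payload_size(result) > limit


small = {"status": "ok", "values": list(range(10))}
large = "x" * (7 * 1024 * 1024)  # a 7 MB string, over the limit
print(needs_s3(small), needs_s3(large))
```

A result flagged by `needs_s3` would be written to a bucket and fetched by the caller instead of being returned inline.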

Sequentially opening 50 websites with Puppeteer and taking screenshots takes 105.6 seconds, while the same task split into 50 concurrent Lambda invocations finishes in 9.8 seconds

Here we compare capturing screenshots of the 50 most popular websites (according to Moz) with Pyppeteer. In the first bar, we do this sequentially with a Python for loop. In the second, each website is handled by a separate Lambda invocation. The estimated cost of the full sequential test (at the current us-east-1 price) is $0.0051 (or 0.07% of the monthly free quota). The estimated cost of the full parallel test is $0.0073 (or 0.11% of the monthly free quota).
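
As a rough check, the speedup and the share of the monthly free quota can be recomputed from the numbers quoted here. The price per GB-second below is an assumption about the us-east-1 rate and is not stated in this README. The 400,000 GB-second quota comes from the cost question in the FAQ.

```python
# Sanity check of the benchmark figures quoted above.
# ASSUMPTION: price per GB-second in USD; not stated in this README.
PRICE_PER_GB_SECOND = 0.0000166667
FREE_GB_SECONDS = 400_000  # monthly free tier, from the FAQ


def speedup(sequential_s, parallel_s):
    """How many times faster the parallel run finished."""
    return sequential_s / parallel_s


def percent_of_free_quota(cost_usd):
    """Cost as a percentage of the dollar value of the monthly free tier."""
    free_value_usd = FREE_GB_SECONDS * PRICE_PER_GB_SECOND
    return 100.0 * cost_usd / free_value_usd


print("speedup: %.1fx" % speedup(105.6, 9.8))
print("sequential: %.3f%% of free quota" % percent_of_free_quota(0.0051))
print("parallel:   %.3f%% of free quota" % percent_of_free_quota(0.0073))
```

Under that assumed price the parallel run is a little under 11 times faster, and both percentages come out close to the quoted 0.07% and 0.11%.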

Getting Started

Open your terminal and install eigensheep with pip:

pip install eigensheep

Start Jupyter with the jupyter notebook command and create a new Python notebook. Eigensheep supports both Python 2 and Python 3. Run the following code in a cell:

import eigensheep

Follow the on-screen instructions to configure AWS credentials. AWS credentials will be saved to ~/.aws/config under the eigensheep profile for subsequent invocations. Eigensheep uses AWS CloudFormation so you only need a few clicks to get started (see our guided video walkthrough).


Once Eigensheep is set up, you can run any code on Lambda by prefixing the cell with %%eigensheep. You can include dependencies from pip by typing %%eigensheep <list of package names>, for example %%eigensheep requests numpy. You can invoke a cell multiple times concurrently with the -n parameter, for example %%eigensheep -n 100.


Frequently Asked Questions

Q: Why is this library called Eigensheep?

A: The name comes from the classic math joke:

What do you call a baby eigensheep?

A lamb, duh.

Q: Does this work on Python 2 and Python 3?

A: Both Python 2 and Python 3 are supported. If the library is imported from a Python 2.x notebook, the Lambda runtime defaults to "python2.7". If the library is imported from a Python 3.x notebook, the Lambda runtime defaults to "python3.6". This can be manually overridden with the "--runtime" option.
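
The default selection described in this answer can be pictured with a small sketch. This is an illustration of the rule, not Eigensheep's actual source. The function name `default_runtime` is hypothetical, and the Python 2 value is the "python2.7" runtime listed in the usage text.

```python
import sys


def default_runtime(version_info=None):
    """Pick a Lambda runtime string from the host Python's major version.

    Hypothetical helper: mirrors the documented defaults only.
    """
    if version_info is None:
        version_info = sys.version_info
    major = version_info[0]
    return "python2.7" if major == 2 else "python3.6"


print(default_runtime())           # depends on the interpreter running the notebook
print(default_runtime((2, 7, 18)))
```

Passing "--runtime" on the %%eigensheep line would take precedence over whatever this kind of default resolves to.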

Q: Can I use this to do GPU stuff?

A: Currently the AWS Lambda execution environment does not expose access to any GPU acceleration. Eigensheep probably won't be that useful for training deep neural nets.

Q: How much does it cost to run stuff on AWS Lambda?

A: Unlike a traditional VM, you don't get charged while you're idling and not actively computing. You don't have to worry about accidentally forgetting to turn off a machine, and provisioning a VM takes only milliseconds rather than minutes.

AWS provides a pretty generous Free Tier for Lambda which does not expire after 12 months. It's 400,000 GB-seconds/month. That's about 37 continuous hours of a single maxed out 3008MB Lambda job for free every month. Alternatively, it's about 20 minutes of 100 concurrent maxed out instances. After that it's about $7 for every subsequent free-tier equivalent.
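
The free-tier arithmetic can be reproduced in a few lines. This sketch uses the 3008 MB maximum from the usage text, assumes 1 GB = 1024 MB, and ignores per-request charges, so treat the output as an estimate. The helper name `free_seconds` is hypothetical.

```python
FREE_GB_SECONDS = 400_000  # monthly Lambda free tier
MAX_MEMORY_MB = 3008       # largest --memory value in the usage text


def free_seconds(memory_mb, concurrency=1):
    """Wall-clock seconds of free compute at a given memory size and concurrency."""
    gb = memory_mb / 1024.0  # ASSUMPTION: 1 GB = 1024 MB
    return FREE_GB_SECONDS / (gb * concurrency)


hours_single = free_seconds(MAX_MEMORY_MB) / 3600
minutes_100 = free_seconds(MAX_MEMORY_MB, concurrency=100) / 60
print("%.1f hours for one job, %.1f minutes for 100 jobs" % (hours_single, minutes_100))
```

With these inputs the result is roughly 37.8 hours for a single job and 22.7 minutes for 100 concurrent jobs.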

Q: Can this be used for web scraping?

A: Yes, Eigensheep can be used for web scraping. However, note that different Lambda VM instances often share the same IP address.

Q: Can Eigensheep be used for long running computations?

A: The maximum allowed duration of any Lambda job is 15 minutes. Eigensheep works best for tasks which can be broken up into smaller chunks.
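
One way to break a job into smaller chunks is to split the input list into near-equal batches so that each invocation stays well under the 15 minute limit. The `chunk` helper below is a hypothetical illustration and not part of Eigensheep.

```python
def chunk(items, n_chunks):
    """Split items into at most n_chunks contiguous, near-equal batches."""
    items = list(items)
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size, extra = divmod(len(items), n_chunks)
    batches, start = [], 0
    for i in range(n_chunks):
        # The first `extra` batches take one additional item each.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty batches when there are few items
            batches.append(items[start:end])
        start = end
    return batches


urls = ["site-%d.example" % i for i in range(10)]
for batch in chunk(urls, 4):
    print(len(batch), batch[0])
```

Each batch could then become one element of the list passed to `eigensheep.map` (see Usage). How the named function receives its input is not described in this README, so that step is left out of the sketch.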

Q: What are the security implications of using Eigensheep?

A: The Eigensheep CloudFormation stack creates an IAM User, Access Key, and Lambda Role with as few permissions as possible. If the access keys are compromised, the attacker only has access to a bucket containing Eigensheep-specific content, and cannot use it to access any of your other AWS resources.

The IAM User can only read/write from a specific bucket earmarked for use with Eigensheep, and can only update a specific Lambda function (all the different variants are stored as different versions of a single Lambda function). The Lambda function only has access to the specific bucket and the ability to write to CloudWatch logs and X-Ray tracing streams.

All of the access keys can be revoked and all of the resources can be removed simply by deleting the CloudFormation stack from the AWS console.

Q: Where does Eigensheep store its configuration?

A: Eigensheep stores its access keys and configuration in the ~/.aws/config file under the eigensheep profile.

Q: Can I use Eigensheep without installing the CloudFormation Stack?

A: Yes, although it's a bit more complicated to set up. You can use any AWS access key and secret, so long as it has the ability to modify/invoke a Lambda named "EigensheepLambda" (which must be manually created). You must also create an S3 bucket named "eigensheep-YOUR_ACCOUNT_ID", where YOUR_ACCOUNT_ID is your numerical AWS account ID.

Usage

usage: %%eigensheep [-h] [-n N] [--memory MEMORY] [--timeout TIMEOUT]
                    [--runtime RUNTIME] [--layer LAYER] [--reinstall]
                    [--no_install] [--clean] [--rm] [--name NAME] [--verbose]
                    [deps [deps ...]]

Jupyter cell magic to invoke cell on AWS Lambda

positional arguments:
  deps               dependencies to be installed via pip

optional arguments:
  -h, --help         show this help message and exit
  -n N               number of parallel lambdas to invoke
  --memory MEMORY    amount of memory in 64MB increments from 128 up to 3008
  --timeout TIMEOUT  lambda execution timeout in seconds up to 900 (15
                     minutes)
  --runtime RUNTIME  lambda runtime (python3.7, python2.7) defaults configured
                     based on host environment
  --layer LAYER      ARNs of lambda layers to include
  --reinstall        regenerate lambda configuration and dependencies
  --no_install       do not install dependencies if configuration not found
  --clean            clear all deployed lambda configurations
  --rm               remove a specific lambda configuration
  --name NAME        store the lambda for later use with `eigensheep.map` or
                     `eigensheep.invoke`
  --verbose          show additional information from lambda invocation

eigensheep.map("do_stuff", [1, 2, 3, 4])

eigensheep.invoke("do_stuff")

%eigensheep --clean
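
For readers who want to see how the options above fit together, the help text can be mirrored with a small argparse parser. This is a sketch reconstructed from the usage message, not Eigensheep's actual source. The default for -n, the repeatable --layer flag, and the helper names `memory_mb` and `build_parser` are assumptions.

```python
import argparse
import shlex


def memory_mb(value):
    """Validate --memory: 128 to 3008 in 64MB increments, per the help text."""
    mb = int(value)
    if not 128 <= mb <= 3008 or mb % 64 != 0:
        raise argparse.ArgumentTypeError("memory must be 128..3008 in 64MB steps")
    return mb


def build_parser():
    """Build a parser that mirrors the documented %%eigensheep options."""
    p = argparse.ArgumentParser(prog="%%eigensheep")
    p.add_argument("deps", nargs="*", help="dependencies to be installed via pip")
    p.add_argument("-n", type=int, default=1)        # ASSUMPTION: default of 1
    p.add_argument("--memory", type=memory_mb)
    p.add_argument("--timeout", type=int)
    p.add_argument("--runtime")
    p.add_argument("--layer", action="append")       # ASSUMPTION: repeatable flag
    p.add_argument("--name")
    for flag in ("--reinstall", "--no_install", "--clean", "--rm", "--verbose"):
        p.add_argument(flag, action="store_true")
    return p


args = build_parser().parse_args(shlex.split("-n 100 --memory 1024 requests numpy"))
print(args.n, args.memory, args.deps)
```

Parsing the example line yields 100 invocations, 1024 MB of memory, and the two pip dependencies requests and numpy.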

Acknowledgements

This library was written by Kevin Kwok and Guillermo Webster. It is based on Jupyter/IPython, tqdm, boto3, and countless Stack Overflow answers.

If you're interested in this project, you should also check out PyWren by Eric Jonas, and ExCamera from Sadjad Fouladi, et al.
