  • Stars
    207
  • Rank 188,685 (Top 4 %)
  • Language
    Shell
  • License
    MIT License
  • Created over 8 years ago
  • Updated about 6 years ago

Repository Details

Build the numpy/scipy/scikit-learn packages and strip them down to run in Lambda

sklearn-build-lambda

Building scikit-learn for AWS Lambda

This repo contains a build.sh script that's intended to be run in an Amazon Linux docker container, and build scikit-learn, numpy, and scipy for use in AWS Lambda. For more info about how the script works, and how to use it, see my blog post on deploying sklearn to Lambda.

An older version of this repo, now archived in the ec2-build-process branch, used an EC2 instance to perform the build and an Ansible playbook to execute it. That version still works, but the new dockerized version doesn't require you to launch a remote instance.

To build the zipfile, pull the Amazon Linux image and run the build script in it.

$ docker pull amazonlinux:2016.09
$ docker run -v "$(pwd)":/outputs -it amazonlinux:2016.09 \
      /bin/bash /outputs/build.sh

That will make a file called venv.zip in the local directory that's around 40MB.

Once you run this, you'll have a zipfile containing sklearn and its dependencies. To use them, add your handler file to the zip and have it load the shared libraries from the lib directory before importing sklearn. A minimum viable sklearn handler would thus look like:

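import os
import ctypes

# Preload the shared libraries bundled in lib/ so that the compiled
# extensions in numpy, scipy, and sklearn can resolve them at import time.
for d, _, files in os.walk('lib'):
    for f in files:
        # Static archives (.a) cannot be loaded dynamically, so skip them.
        if f.endswith('.a'):
            continue
        ctypes.cdll.LoadLibrary(os.path.join(d, f))

# Import sklearn only after the shared libraries have been loaded.
import sklearn

def handler(event, context):
    # do sklearn stuff here
    return {'yay': 'done'}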
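
Adding the handler file to the zip can be done with any zip tool. As a minimal sketch, assuming the handler lives in a file named handler.py (a hypothetical name, as is the add_handler_to_zip helper) and that the build produced venv.zip as described above, Python's standard zipfile module can append it:

```python
import zipfile
from pathlib import Path


def add_handler_to_zip(zip_path, handler_path):
    """Append a handler file to the root of an existing zip archive.

    Both arguments are illustrative; nothing here is part of build.sh.
    """
    handler_path = Path(handler_path)
    # Mode "a" appends to the existing archive instead of overwriting it.
    with zipfile.ZipFile(zip_path, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname stores the file at the archive root, next to the packages,
        # regardless of where it sits on the local filesystem.
        zf.write(handler_path, arcname=handler_path.name)
    return handler_path.name
```

Calling add_handler_to_zip("venv.zip", "handler.py") from the directory holding both files would leave handler.py at the top level of the archive, alongside the packages and the lib directory.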

Extra Packages

To add extra packages to the build, create a requirements.txt file alongside the build.sh in this repo. All packages listed there will be installed in addition to sklearn, numpy, and related dependencies.

Sizing and Future Work

With just compression and stripped binaries, the full sklearn stack weighs in at 39 MB, and could probably be reduced further by:

  1. Pre-compiling all .pyc files and deleting their source
  2. Removing test files
  3. Removing documentation
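
The first two ideas could be sketched roughly as follows. This is not part of build.sh: the slim_package_tree helper is hypothetical, it assumes Python 3 (for the legacy flag of compileall), and it assumes that nothing imports from a directory named tests at runtime. The .pyc files must also be produced by the same Python version that will run them, since bytecode is version-specific.

```python
import compileall
import shutil
from pathlib import Path


def slim_package_tree(root):
    """Byte-compile a package tree, then delete sources and test directories.

    Returns the number of .py source files that were removed.
    """
    root = Path(root)

    # Drop test directories first so they are not compiled needlessly.
    # The list is materialized before deleting to avoid walking a changing tree.
    for test_dir in [p for p in root.rglob("tests") if p.is_dir()]:
        shutil.rmtree(test_dir, ignore_errors=True)

    # legacy=True writes module.pyc next to module.py instead of into
    # __pycache__; only that layout stays importable once the source is gone.
    compileall.compile_dir(str(root), quiet=1, legacy=True)

    removed = 0
    for source in list(root.rglob("*.py")):
        # Only delete a source file if its compiled counterpart exists.
        if source.with_suffix(".pyc").exists():
            source.unlink()
            removed += 1
    return removed
```

Whether this is safe depends on the packages involved: code that reads its own source at runtime will break once the .py files are gone, so the result should be tested before deploying.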

For my purposes, 39 MB is sufficiently small. If you have any improvements to share, pull requests or issues are welcome.

License

This project is MIT Licensed. For license info on the numpy, scipy, and sklearn packages, see their respective sites. Full text of the MIT license is in LICENSE.txt.
