  • Language
    JavaScript
  • License
    MIT License
  • Created about 9 years ago
  • Updated over 7 years ago


Lambda functions over S3 objects with concurrency control (each, map, reduce, filter)

s3-lambda

s3-lambda enables you to run lambda functions over a context of S3 objects. It has a stateless architecture with concurrency control, allowing you to process a large number of files very quickly. This is useful for quickly prototyping complex data jobs without infrastructure like Hadoop or Spark.

At Littlstar, we use s3-lambda for all sorts of data pipelining and analytics.

Disclaimer: This module does not interact with the AWS Lambda service. The name s3-lambda refers to lambda functions in computer science, and all S3 file processing happens locally.

Install

npm install s3-lambda --save

Quick Example

const S3Lambda = require('s3-lambda')

// example options
const lambda = new S3Lambda({
  accessKeyId: 'aws-access-key',       // Optional. (falls back on local AWS credentials)
  secretAccessKey: 'aws-secret-key',   // Optional. (falls back on local AWS credentials)
  showProgress: true,                  // Optional. Show progress bar in stdout
  verbose: true,                       // Optional. Show all S3 operations in stdout (GET, PUT, DELETE)
  signatureVersion: 'v4',              // Optional. Signature Version used in Authentication. Defaults to "v4"
  maxRetries: 10,                      // Optional. Maximum request retries on an S3 object. Defaults to 10.
  timeout: 10000                       // Optional. Amount of time for request to timeout. Defaults to 10000 (10s)
})

const context = {
  bucket: 'my-bucket',
  prefix: 'path/to/files/'
}

lambda
  .context(context)
  .forEach(object => {
    // do something with object
  })
  .then(_ => console.log('done!'))
  .catch(console.error)

Setting Context

Before running a lambda expression, you must tell s3-lambda which files to operate over by calling context. A context is defined by an options object with the following properties: bucket, prefix, marker, endPrefix, match, limit, and reverse.

lambda.context({
  bucket: 'my-bucket',       // The S3 bucket to use
  prefix: 'prefix/',         // The prefix of the files to use - s3-lambda will operate over every file with this prefix.
  marker: 'prefix/file1',    // Optional. Start at the first file with this prefix. If it is a full file path, starts with next file. Defaults to null.
  endPrefix: 'prefix/file3', // Optional. Process files up to (not including) this prefix. Defaults to null.
  match: /2017/i,            // Optional. Process files matching this regex / string. Defaults to null.
  limit: 1000,               // Optional. Limit the # of files operated over. Default is Infinity.
  reverse: false             // Optional. If true, operate over all files in reverse. Defaults to false.
})
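To make the context options concrete, the sketch below approximates how they could narrow a list of keys in memory. It is not s3-lambda's implementation: `selectKeys` is a hypothetical helper, and it assumes S3-style marker semantics (only keys sorting strictly after the marker are kept), an exclusive `endPrefix`, a string `match` treated as a plain substring, and `limit` applied after `reverse`.

```javascript
// Hypothetical helper: an in-memory approximation of how the context options
// could narrow a list of S3 keys. Not s3-lambda's actual code.
function selectKeys (keys, ctx) {
  const {
    prefix = '',
    marker = null,
    endPrefix = null,
    match = null,
    limit = Infinity,
    reverse = false
  } = ctx

  let selected = keys
    .filter(key => key.startsWith(prefix))
    // as with S3's list marker, only keys that sort strictly after it are kept,
    // so a full key is skipped and a partial prefix starts at its first match
    .filter(key => marker === null || key > marker)
    // every key that starts with endPrefix sorts at or after it, so this excludes them
    .filter(key => endPrefix === null || key < endPrefix)
    // assumption: a string `match` is treated as a plain substring
    .filter(key => match === null ||
      (match instanceof RegExp ? match.test(key) : key.includes(match)))
    .sort() // code-unit order; matches S3's byte order for ASCII keys

  if (reverse) selected = selected.reverse()
  // assumption: limit is applied after ordering, so reverse + limit keeps the last keys
  return selected.slice(0, limit)
}

const keys = ['prefix/file1', 'prefix/file2-2017', 'prefix/file3', 'prefix/file4', 'other/file']
console.log(selectKeys(keys, { prefix: 'prefix/', marker: 'prefix/file1', endPrefix: 'prefix/file3' }))
// [ 'prefix/file2-2017' ]
```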

You can also provide an array of context options, which tells s3-lambda to operate over all the files in each.

const ctx1 = {
  bucket: 'my-bucket',
  prefix: 'path/to/files/',
  marker: 'path/to/files/2017'
}
const ctx2 = {
  bucket: 'my-other-bucket',
  prefix: 'path/to/other/logs/',
  limit: 100
}

lambda.context([ctx1, ctx2])

Modifiers

After setting context, you can chain several other functions that modify the operation. Each returns a Request object, so they can be chained. All of these are optional.

.concurrency(c)

{Number} Set the request concurrency level (default is Infinity).

.exclude(e)

{Function} Sets the exclude function to use before getting objects from S3. This function will be called with the key and should return true if the object should be excluded.
Example: exclude png files

lambda
  .context(context)
  .exclude(key => /\.png$/.test(key))
  .each(object => { /* do something with object */ })
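An exclude function is just a predicate on the key. This small check, using hypothetical `isPng` and `looseIsPng` predicates, shows why the dot in the pattern should be escaped: an unescaped `.` matches any character, so keys that merely end in `png` are excluded too.

```javascript
// Hypothetical exclude predicate: returning true means "skip this key"
const isPng = key => /\.png$/.test(key)

// Unescaped dot: matches any character before "png", so it over-excludes
const looseIsPng = key => /.png$/.test(key)

console.log(isPng('images/logo.png'))     // true
console.log(isPng('reports/q1-png'))      // false
console.log(looseIsPng('reports/q1-png')) // true, because '-png' matches /.png$/
```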

.transform(f)

{Function} Sets the transformation function to use when getting objects. This transformer will be called with the raw object returned by the S3#getObject() method in the AWS SDK and the key, and should return the transformed object. When a transformer function is provided, objects are not automatically converted to strings, and the encoding parameter is ignored.
Example: unzipping compressed S3 files before each operation

const zlib = require('zlib')

lambda
  .context(context)
  .transform((object) => {
    return zlib.gunzipSync(object.Body).toString('utf8')
  })
  .each(object => { /* do something with the decompressed string */ })
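Because the transformer only sees the raw response, it can be exercised without S3. The sketch below assumes, as in the Node.js AWS SDK v2, that `Body` is a Buffer, and uses a hypothetical `fakeResponse` object in place of a real S3#getObject() result.

```javascript
const zlib = require('zlib')

// Same idea as the transformer above: gunzip the raw Body and decode it as UTF-8
const gunzipTransform = object => zlib.gunzipSync(object.Body).toString('utf8')

// Hypothetical stand-in for an S3#getObject() response with a gzipped body
const fakeResponse = {
  Body: zlib.gzipSync(Buffer.from('line one\nline two\n', 'utf8')),
  ContentEncoding: 'gzip'
}

console.log(gunzipTransform(fakeResponse)) // prints the two decompressed lines
```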

.encode(e)

{String} Sets the string encoding to use when getting objects. This setting is ignored if a transformer function is used.

.limit(l)

{Number} Limit the number of files operated over.

.reverse(r)

{Boolean} Reverse the order of files operated over.

.async()

Lets the resolver know that your function is async (returns a Promise).

Lambda Functions

Perform synchronous or asynchronous functions over each file in the set context.

  • each
  • forEach
  • map
  • reduce
  • filter

each

each(fn[, isasync])

Performs fn on each S3 object in parallel. You can set the concurrency level (defaults to Infinity). If isasync is true, fn should return a Promise.

lambda
  .context({ bucket, prefix })
  .concurrency(5) // operates on 5 objects at a time
  .each(object => console.log(object))
  .then(_ => console.log('done!'))
  .catch(console.error)
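The concurrency level caps how many objects are processed at once. The following is a minimal promise-pool sketch of that idea over an in-memory array; `eachLimit` is a hypothetical helper, the sleep stands in for an S3 request, and s3-lambda is not necessarily implemented this way.

```javascript
// Hypothetical helper illustrating bounded concurrency (a small promise pool);
// not s3-lambda's implementation. At most `limit` calls to `fn` run at once.
function eachLimit (items, limit, fn) {
  let next = 0
  // each worker claims the next unprocessed index until none are left
  const worker = async () => {
    while (next < items.length) {
      const i = next++
      await fn(items[i], i)
    }
  }
  const workerCount = Math.min(limit, items.length)
  return Promise.all(Array.from({ length: workerCount }, () => worker()))
}

// Usage: track how many calls are in flight at the same time
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))
let inFlight = 0
let peak = 0

eachLimit([1, 2, 3, 4, 5, 6, 7, 8], 3, async () => {
  inFlight++
  peak = Math.max(peak, inFlight)
  await sleep(10) // stands in for an S3 GET
  inFlight--
}).then(() => console.log('peak concurrency:', peak)) // peak concurrency: 3
```

With `limit` set to 1, the helper handles one item at a time, mirroring the sequential behavior described for forEach; with `Infinity`, every call starts immediately.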

forEach

forEach(fn[, isasync])

Same as each, but operates sequentially, one file at a time. Setting concurrency for this function is superfluous.

lambda
  .context({ bucket, prefix })
  .forEach(object => { /* do something with object */ })
  .then(_ => console.log('done!'))
  .catch(console.error)

map

map(fn[, isasync])

Maps fn over each file in an S3 directory, replacing each file with what is returned from the mapper function. If isasync is true, fn should return a Promise.

This is a destructive action: whatever you return from fn replaces the contents of the S3 object. For your protection, you must call inplace() to map over the existing files. Alternatively, you can call output() to write the results of the mapper function elsewhere (as demonstrated below). output() accepts an optional third argument, a function that receives each key and returns the renamed key to write under the output bucket and prefix.

const addSmiley = object => object + ':)'

lambda
  .context({ bucket, prefix })
  .inplace()
  .map(addSmiley)
  .then(_ => console.log('done!'))
  .catch(console.error)

Make this non-destructive by specifying an output directory.

const outputBucket = 'my-bucket'
const outputPrefix = 'path/to/output/'

lambda
  .context({ bucket, prefix })
  .output(outputBucket, outputPrefix, (key) => key.replace('-', '/'))
  .map(addSmiley)
  .then(_ => console.log('done!'))
  .catch(console.error)
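To illustrate the difference between inplace() and output() without touching S3, here is a sketch over a hypothetical in-memory store (a `Map` from key to contents). `mapStore` and its `output` option are illustrative only; buckets are ignored, and composing the output key as the output prefix plus the renamed remainder of the key is an assumption, not documented s3-lambda behavior.

```javascript
// Hypothetical in-memory model of map(); `store` is a Map of key -> contents.
// Assumptions (not taken from s3-lambda's source): buckets are ignored, and with
// an output target the new key is output.prefix + rename(key minus the input prefix).
function mapStore (store, prefix, mapper, output = null) {
  // snapshot the entries so keys written during the loop are not revisited
  for (const [key, contents] of [...store]) {
    if (!key.startsWith(prefix)) continue
    const result = mapper(contents)
    if (output === null) {
      // like inplace(): overwrite the original object
      store.set(key, result)
    } else {
      // like output(): leave the original alone and write a renamed copy
      const rename = output.rename || (k => k)
      store.set(output.prefix + rename(key.slice(prefix.length)), result)
    }
  }
  return store
}

const addSmiley = object => object + ':)'
const store = new Map([['in/a-1', 'hi'], ['in/b-2', 'yo']])

mapStore(store, 'in/', addSmiley, { prefix: 'out/', rename: k => k.replace('-', '/') })
// the in/ entries are unchanged; out/a/1 and out/b/2 now hold 'hi:)' and 'yo:)'
```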

reduce

reduce(func[, isasync])

Reduces the objects in the working context to a single value.

// concatenates all the files
const reducer = (previousValue, currentValue, key) => {
  return previousValue + currentValue
}

lambda
  .context({ bucket, prefix })
  .reduce(reducer)
  .then(result => { /* do something with result */ })
  .catch(console.error)
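The fold order and reducer signature can be sketched in memory as well. `reduceStore` is a hypothetical helper that visits keys in ascending order and takes an explicit initial value; the README does not say which initial value s3-lambda uses, so treat that part as an assumption.

```javascript
// Hypothetical in-memory model of reduce(); `store` is a Map of key -> contents.
// Values are folded in ascending key order. The explicit initialValue is an
// assumption made for this sketch, not documented s3-lambda behavior.
function reduceStore (store, prefix, reducer, initialValue) {
  const keys = [...store.keys()].filter(k => k.startsWith(prefix)).sort()
  let accumulator = initialValue
  for (const key of keys) {
    accumulator = reducer(accumulator, store.get(key), key)
  }
  return accumulator
}

const files = new Map([['logs/b', 'world'], ['logs/a', 'hello '], ['other/c', '!']])

// same shape as the reducer above: (previousValue, currentValue, key)
const concat = (previousValue, currentValue, key) => previousValue + currentValue
console.log(reduceStore(files, 'logs/', concat, '')) // hello world

// the key argument lets a reducer build per-file results
const sizes = reduceStore(files, 'logs/', (acc, body, key) => ({ ...acc, [key]: body.length }), {})
console.log(sizes) // { 'logs/a': 6, 'logs/b': 5 }
```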

filter

filter(func[, isasync])

Destructive. Filters (deletes) files in S3. func should return true to keep the object and false to delete it. If isasync is true, func should return a Promise.

This is a destructive action: if func returns false, the object is deleted from S3. For your protection, you must call inplace() to filter the existing files. Alternatively, you can call output() to write the results of the filter function elsewhere (as demonstrated below). As with map, you can pass a function as the third argument to output() to rename the output key.

// filters empty files
const fn = object => object.length > 0

lambda
  .context({ bucket, prefix })
  .inplace()
  .filter(fn)
  .then(_ => console.log('done!'))
  .catch(console.error)

Make this non-destructive by specifying an output directory.

lambda
  .context({ bucket, prefix })
  .output(outputBucket, outputPrefix, (key) => key.replace('-', '/'))
  .filter(fn)
  .then(_ => console.log('done!'))
  .catch(console.error)
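An in-memory sketch of the two filter modes follows. `filterStore` is hypothetical: in place it deletes objects the predicate rejects, and with an output prefix it assumes that kept objects are copied there while the originals stay untouched.

```javascript
// Hypothetical in-memory model of filter(); `store` is a Map of key -> contents.
// In place, objects the predicate rejects are deleted. With an output prefix,
// kept objects are copied there and nothing is deleted (an assumption about
// how output() behaves, not taken from s3-lambda's source).
function filterStore (store, prefix, predicate, outputPrefix = null) {
  for (const [key, contents] of [...store]) {
    if (!key.startsWith(prefix)) continue
    const keep = predicate(contents)
    if (outputPrefix === null) {
      if (!keep) store.delete(key)
    } else if (keep) {
      store.set(outputPrefix + key.slice(prefix.length), contents)
    }
  }
  return store
}

const isNotEmpty = object => object.length > 0
const objects = new Map([['logs/a', 'data'], ['logs/b', ''], ['logs/c', 'more']])

filterStore(objects, 'logs/', isNotEmpty)
console.log([...objects.keys()]) // [ 'logs/a', 'logs/c' ]
```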

S3 Functions

Promise-based wrapper around common S3 methods.

  • list
  • keys
  • get
  • put
  • copy
  • delete

list

list(bucket, prefix[, marker])

List all keys in s3://bucket/prefix. If you use a marker, s3-lambda will start listing alphabetically from there.

lambda
  .list(bucket, prefix)
  .then(list => console.log(list))
  .catch(console.error)

keys

keys(bucket, prefix[, marker])

Returns an array of keys for the given bucket and prefix.

lambda
  .keys(bucket, prefix)
  .then(keys => console.log(keys))
  .catch(console.error)

get

get(bucket, key[, encoding[, transformer]])

Gets an object in S3, calling toString(encoding) on the object.

lambda
  .get(bucket, key)
  .then(object => { /* do something with object */ })
  .catch(console.error)

Optionally you can supply your own transformer function to use when retrieving objects. This transformer will be called with the raw object that is returned by the S3#getObject() method in the AWS SDK, and should return the transformed object. When a transformer function is provided, objects are not automatically converted to strings, and the encoding parameter is ignored.

const zlib = require('zlib')

const transformer = object => {
  return zlib.gunzipSync(object.Body).toString('utf8')
}

lambda
  .get(bucket, key, null, transformer)
  .then(object => { /* do something with object */ })
  .catch(console.error)
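The encoding and transformer rules above can be summarized in a few lines. `decodeBody` is a hypothetical function modeling how a getObject() response might be post-processed under those rules, with `'utf8'` assumed as the fallback encoding; `response` is a stand-in object, not a real SDK result.

```javascript
// Hypothetical model of how get() post-processes a getObject() response,
// following the rules described above; not s3-lambda's actual source.
// Assumption: 'utf8' is used when no encoding is given.
function decodeBody (response, encoding, transformer) {
  if (typeof transformer === 'function') {
    return transformer(response) // encoding is ignored; no automatic toString
  }
  return response.Body.toString(encoding || 'utf8')
}

const response = { Body: Buffer.from('hello world!', 'utf8') } // stand-in response
console.log(decodeBody(response))           // hello world!
console.log(decodeBody(response, 'base64')) // aGVsbG8gd29ybGQh
```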

put

put(bucket, key, object[, encoding])

Puts an object in S3. Default encoding is utf8.

lambda
  .put(bucket, key, 'hello world!')
  .then(_ => console.log('done!')).catch(console.error)

copy

copy(sourceBucket, sourceKey, targetBucket, targetKey)

Copies an object in S3 from s3://sourceBucket/sourceKey to s3://targetBucket/targetKey.

lambda
  .copy(sourceBucket, sourceKey, targetBucket, targetKey)
  .then(_ => console.log('done!')).catch(console.error)

delete

delete(bucket, key)

Deletes an object in S3 (s3://bucket/key).

lambda
  .delete(bucket, key)
  .then(_ => console.log('done!')).catch(console.error)
