  • Stars: 143
  • Rank: 257,007 (Top 6 %)
  • Language: Python
  • License: MIT License
  • Created almost 6 years ago
  • Updated over 4 years ago

Repository Details

Serverless execution of R code on AWS Lambda

aws-lambda-r-runtime

This project makes it easy to run AWS Lambda Functions written in R.

Example

To run the example, you need to create an IAM role that executes the Lambda function. This role should have the following properties:

  • Trusted entity – Lambda.
  • Permissions – AWSLambdaBasicExecutionRole.

Furthermore, you need a current version of the AWS CLI.

Then create a lambda function which uses the R runtime layer:

cd example/
chmod 755 script.R
zip function.zip script.R
# current region
region=$(aws configure get region)
# latest runtime layer ARN for R 3.6.0 in most regions
# for an accurate list, please have a look at the deploy section of the travis ci build log
# https://travis-ci.com/bakdata/aws-lambda-r-runtime
runtime_layer=arn:aws:lambda:$region:131329294410:layer:r-runtime-3_6_0:13
aws lambda create-function --function-name r-example \
    --zip-file fileb://function.zip --handler script.handler \
    --runtime provided --timeout 60 \
    --layers ${runtime_layer} \
    --role <role-arn>

The function simply increments 'x' by 1. Invoke the function:

aws lambda invoke --function-name r-example \
    --payload '{"x":1}' response.txt
cat response.txt

The expected result should look similar to this:

2
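
If you use AWS CLI version 2, a raw JSON payload is expected to be base64-encoded by default, so the invocation above may fail with an invalid base64 error. The following is a minimal sketch of a wrapper for that case. It assumes AWS CLI v2 and the r-example function created above; the helper name invoke_r_example is hypothetical and not part of this project.

```shell
# Hypothetical helper: invoke the example function with a raw JSON payload.
# Assumes AWS CLI v2, where --cli-binary-format raw-in-base64-out lets
# --payload accept plain JSON instead of base64-encoded input.
invoke_r_example() {
    # $1: JSON payload, e.g. '{"x":1}'
    # $2: output file (defaults to response.txt)
    aws lambda invoke --function-name r-example \
        --cli-binary-format raw-in-base64-out \
        --payload "$1" "${2:-response.txt}"
}

# Usage (requires AWS credentials and the deployed function):
#   invoke_r_example '{"x":1}' && cat response.txt
```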

Using packages

We also provide a layer which ships with some recommended R packages, such as Matrix. This example lambda shows how to use them:

cd example/
chmod 755 matrix.R
zip function.zip matrix.R
# current region
region=$(aws configure get region)
# latest runtime layer ARN for R 3.6.0 in most regions
# for an accurate list, please have a look at the deploy section of the travis ci build log
# https://travis-ci.com/bakdata/aws-lambda-r-runtime
runtime_layer=arn:aws:lambda:$region:131329294410:layer:r-runtime-3_6_0:13
# latest recommended layer ARN for R 3.6.0 in most regions
# for an accurate list, please have a look at the deploy section of the travis ci build log
# https://travis-ci.com/bakdata/aws-lambda-r-runtime
recommended_layer=arn:aws:lambda:$region:131329294410:layer:r-recommended-3_6_0:13
aws lambda create-function --function-name r-matrix-example \
    --zip-file fileb://function.zip --handler matrix.handler \
    --runtime provided --timeout 60 --memory-size 3008 \
    --layers ${runtime_layer} ${recommended_layer} \
    --role <role-arn>

The function returns the second column of a static matrix. Invoke the function:

aws lambda invoke --function-name r-matrix-example response.txt
cat response.txt

The expected result should look similar to this:

[4,5,6]

Provided layers

Layers are only accessible in the AWS region in which they were published. We provide the following layers:

r-runtime

R, httr, jsonlite, aws.s3, logging

Available AWS regions:

  • ap-northeast-1
  • ap-northeast-2
  • ap-south-1
  • ap-southeast-1
  • ap-southeast-2
  • ca-central-1
  • eu-central-1
  • eu-north-1
  • eu-west-1
  • eu-west-2
  • eu-west-3
  • sa-east-1
  • us-east-1
  • us-east-2
  • us-west-1
  • us-west-2

Available R versions:

  • 3_5_1
  • 3_5_3
  • 3_6_0

The latest ARN can be retrieved from the Travis CI build log. In general, it looks like this:

arn:aws:lambda:$region:131329294410:layer:r-runtime-$r_version:$layer_version
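
As a small illustration of the pattern above, the sketch below assembles an ARN from its parts. The helper name layer_arn is hypothetical, and the region and layer version in the usage line are only illustrative; check the build log for the versions actually published in your region.

```shell
# Hypothetical helper: assemble a layer ARN from its parts.
# 131329294410 is the publishing account ID that appears in the ARNs above.
layer_arn() {
    # $1: layer name without the "r-" prefix, e.g. runtime
    # $2: AWS region, e.g. eu-central-1
    # $3: R version with underscores, e.g. 3_6_0
    # $4: layer version number, e.g. 13
    printf 'arn:aws:lambda:%s:131329294410:layer:r-%s-%s:%s\n' \
        "$2" "$1" "$3" "$4"
}

layer_arn runtime eu-central-1 3_6_0 13
# prints arn:aws:lambda:eu-central-1:131329294410:layer:r-runtime-3_6_0:13
```

The same helper covers the other provided layers by passing a different layer name as the first argument.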

The automated command for retrieving the ARN currently does not work:

aws lambda list-layer-versions --max-items 1 --no-paginate  \
    --layer-name arn:aws:lambda:${region}:131329294410:layer:r-runtime-${r_version} \
    --query 'LayerVersions[0].LayerVersionArn' --output text

r-recommended

The recommended packages that ship with R: boot, class, cluster, codetools, foreign, KernSmooth, lattice, MASS, Matrix, mgcv, nlme, nnet, rpart, spatial, survival

Available AWS regions:

  • ap-northeast-1
  • ap-northeast-2
  • ap-south-1
  • ap-southeast-1
  • ap-southeast-2
  • ca-central-1
  • eu-central-1
  • eu-north-1
  • eu-west-1
  • eu-west-2
  • eu-west-3
  • sa-east-1
  • us-east-1
  • us-east-2
  • us-west-1
  • us-west-2

Available R versions:

  • 3_5_1
  • 3_5_3
  • 3_6_0

The latest ARN can be retrieved from the Travis CI build log. In general, it looks like this:

arn:aws:lambda:$region:131329294410:layer:r-recommended-$r_version:$layer_version

The automated command for retrieving the ARN currently does not work:

aws lambda list-layer-versions --max-items 1 --no-paginate  \
    --layer-name arn:aws:lambda:${region}:131329294410:layer:r-recommended-${r_version} \
    --query 'LayerVersions[0].LayerVersionArn' --output text

r-awspack

The aws.s3 package. This layer used to contain the awspack package, but that package has been retired. You can still find it in old versions of the layer that were published before 2020.

Available AWS regions:

  • ap-northeast-1
  • ap-northeast-2
  • ap-south-1
  • ap-southeast-1
  • ap-southeast-2
  • ca-central-1
  • eu-central-1
  • eu-north-1
  • eu-west-1
  • eu-west-2
  • eu-west-3
  • sa-east-1
  • us-east-1
  • us-east-2
  • us-west-1
  • us-west-2

Available R versions:

  • 3_5_1
  • 3_5_3
  • 3_6_0

The latest ARN can be retrieved from the Travis CI build log. In general, it looks like this:

arn:aws:lambda:$region:131329294410:layer:r-awspack-$r_version:$layer_version

The automated command for retrieving the ARN currently does not work:

aws lambda list-layer-versions --max-items 1 --no-paginate  \
    --layer-name arn:aws:lambda:${region}:131329294410:layer:r-awspack-${r_version} \
    --query 'LayerVersions[0].LayerVersionArn' --output text

Documentation

The lambda handler is used to determine both the file name of the R script and the function to call. The two parts must be separated by a period (.), e.g., script.handler.
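
To make the naming convention concrete, the sketch below splits a handler string into a script file name and a function name. It is only an illustration and not the runtime's actual code; in particular, splitting at the last period is an assumption.

```shell
# Illustrative only: split a handler string such as "script.handler"
# into the R script file and the name of the R function to call.
handler="script.handler"
script_file="${handler%.*}.R"     # part before the last "." plus ".R"
function_name="${handler##*.}"    # part after the last "."
echo "$script_file $function_name"
# prints: script.R handler
```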

The lambda payload is unwrapped as named arguments to the R function to call, e.g., {"x":1} is unwrapped to handler(x=1).
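
The contents of example/script.R are not reproduced in this README. As a hedged sketch, a handler matching the description of the first example (incrementing x by 1) might look like the R code below. It is written to a hypothetical file my_script.R so that it does not overwrite the shipped example; the matching handler string would then be my_script.handler.

```shell
# Hypothetical my_script.R, not copied from the repository: the payload
# field "x" arrives as a named argument, and the return value is sent
# back as the function's response.
cat > my_script.R <<'EOF'
handler <- function(x) {
    return(x + 1)
}
EOF
```

With the payload {"x":1}, this handler would be called as handler(x=1).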

The lambda function returns whatever is returned by the R function as a JSON object.

Building custom layers

In order to install additional R packages, you can create a lambda layer containing the libraries, just as in the second example. You must use the compiled package files. The easiest way is to install the package with install.packages() and copy the resulting folder found in $R_LIBS. Using only the package sources does not suffice. The file structure must be R/library/<my-library>. If your package requires system libraries, place them in R/lib/.
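
The required layout can be sketched as a small staging step. The helper name stage_package, the package name, and the directory names in the usage comment are hypothetical and not part of this project.

```shell
# Hypothetical helper: copy an installed (compiled) package folder into
# the layout described above, i.e. <layer-dir>/R/library/<package>.
stage_package() {
    # $1: directory containing installed packages, e.g. "$R_LIBS"
    # $2: name of the installed package folder
    # $3: staging directory for the layer contents
    mkdir -p "$3/R/library" &&
        cp -R "$1/$2" "$3/R/library/"
}

# Usage, assuming the package was already installed with install.packages():
#   stage_package "$R_LIBS" mypackage layer
#   (cd layer && zip -r ../my-layer.zip R)
```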

You can use Docker for building your layer. You need to run ./docker_build.sh first. Then you can install your packages inside the container and copy the files to your machine. See awspack/ for an example. The build.sh script is used to run the docker container and copy sources to your machine. The entrypoint.sh script is used for installing packages inside the container.

Debugging

In order to make the runtime log debugging messages, you can set the environment variable LOGLEVEL to DEBUG.
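
One way to set this variable on an already deployed function is through the AWS CLI. The sketch below assumes the r-example function from the first example; the helper name enable_debug_logging is hypothetical.

```shell
# Hypothetical helper: set LOGLEVEL=DEBUG on an existing function.
# Note: --environment replaces the function's whole set of environment
# variables, so include any other variables the function needs.
enable_debug_logging() {
    # $1: name of the Lambda function
    aws lambda update-function-configuration \
        --function-name "$1" \
        --environment "Variables={LOGLEVEL=DEBUG}"
}

# Usage (requires AWS credentials):
#   enable_debug_logging r-example
```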

Limitations

AWS Lambda functions are limited to 3 GB of RAM and must finish within 15 minutes. It is therefore not feasible to execute long-running R scripts with this runtime. Furthermore, only the /tmp/ directory is writable on AWS Lambda. This must be considered when writing to the local disk.

Building

To build the layer yourself, you first need to build R from source. We provide a Docker image which uses the great docker-lambda project. Just run ./build.sh <version> and everything should be built properly.

If you plan to publish the runtime, you need a recent version of the AWS CLI (>=1.16). Then run the <layer>/deploy.sh script. This creates a lambda layer named r-<layer>-<version> in your AWS account. You can use it as shown in the example.

Compiling on EC2

In case the Docker image does not properly represent the lambda environment, we also provide a script which launches an EC2 instance, compiles R, and uploads the zipped distribution to S3. You need to specify the R version, e.g., 3.6.0, as well as the S3 bucket to upload the distribution to. Finally, you need to create an EC2 instance profile which is capable of uploading to the S3 bucket. See the AWS documentation for details. With everything prepared, you can run the script:

./remote_compile_and_deploy.sh <version> <bucket-name> <instance-profile>

The script will also take care of terminating the launched EC2 instance.

To manually build R from source, follow these steps:

Start an EC2 instance which uses the Lambda AMI:

aws ec2 run-instances --image-id ami-657bd20a --count 1 --instance-type t2.medium --key-name <my-key-pair>

Now run the compile.sh script in r/. You must pass the R version as a parameter to the script, e.g., 3.6.0. The script produces a zip containing a functional R installation in /opt/R/. The relevant files can be found in r/build/bin/. Use this R distribution for building the layers.

Testing

After building all layers, you can test them locally with SAM CLI and Docker. Install it via pipenv install --dev. Then run python3 -m unittest. This spawns a local lambda server via Docker and invokes the lambdas defined in template.yaml.
