• This repository has been archived on 07/Mar/2020

An AWS Lambda function in Node.js reading events from Amazon Kinesis and writing event counts to DynamoDB

AWS Lambda Node.js Example Project

DEPRECATED: This project is no longer maintained. If you wish to process a Kinesis stream of Snowplow events using an AWS Lambda application, we recommend using the Snowplow JavaScript and TypeScript Analytics SDK.

Introduction

This is an example AWS Lambda application for processing a Kinesis stream of events (introductory blog post). It reads the stream of simple JSON events generated by our event generator. Our AWS Lambda function aggregates and buckets events and stores them in DynamoDB.

This was built by the Data Science team at Snowplow Analytics, who use AWS Lambda in their projects.

Running this requires an Amazon AWS account, and will incur charges.

See also: Spark Streaming Example Project | Spark Example Project

Overview

We have implemented a super-simple analytics-on-write stream processing job using AWS Lambda. Our AWS Lambda function, written in JavaScript, reads a Kinesis stream containing events in a JSON format:

{
  "timestamp": "2015-06-05T12:54:43.064528",
  "type": "Green",
  "id": "4ec80fb1-0963-4e35-8f54-ce760499d974"
}
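When a Kinesis stream triggers a Lambda function, each record's payload arrives base64-encoded in `Records[].kinesis.data`. The sketch below shows one way to turn an invocation payload back into event objects like the one above. It is not the project's actual handler code. The helper name `decodeKinesisEvents` is hypothetical, and skipping malformed records (instead of failing the whole batch) is our own design choice.

```javascript
// Hypothetical helper: decode the JSON events carried in a Kinesis-triggered
// Lambda invocation. Assumes each payload is a UTF-8 JSON document.
function decodeKinesisEvents(lambdaEvent) {
  const events = [];
  for (const record of lambdaEvent.Records || []) {
    // Kinesis delivers the record payload base64-encoded in record.kinesis.data
    const json = Buffer.from(record.kinesis.data, "base64").toString("utf8");
    try {
      events.push(JSON.parse(json));
    } catch (err) {
      // Assumption: skip malformed payloads rather than failing the whole batch
      console.error("Skipping malformed record:", err.message);
    }
  }
  return events;
}

// Build a fake invocation payload to exercise the helper locally
const sample = {
  timestamp: "2015-06-05T12:54:43.064528",
  type: "Green",
  id: "4ec80fb1-0963-4e35-8f54-ce760499d974",
};
const fakeInvocation = {
  Records: [
    { kinesis: { data: Buffer.from(JSON.stringify(sample)).toString("base64") } },
  ],
};
console.log(decodeKinesisEvents(fakeInvocation));
```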

Our job counts the events by type and aggregates these counts into 1-minute buckets. The job then takes these aggregates and saves them into a table in DynamoDB:

(Screenshot: DynamoDB table of event counts per 1-minute bucket and event type)
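As a rough illustration of the "count by type, bucket by minute" step, the sketch below rounds each timestamp down to the start of its minute and tallies events per bucket and type. It assumes timestamps carry a UTC designator, as in the `grunt events` output further down (for example `2015-06-29T20:12:21.625Z`). The function names `bucketStart` and `countByBucket`, and the `"BucketStart|EventType"` map key, are hypothetical and not taken from the project.

```javascript
// Hypothetical sketch: count events by type within 1-minute buckets.
// Assumes timestamps are ISO 8601 strings with a UTC designator ("Z").
const BUCKET_MS = 60 * 1000; // 1-minute buckets

// Round a timestamp down to the start of its 1-minute bucket (ISO 8601, UTC)
function bucketStart(timestamp) {
  const ms = Date.parse(timestamp);
  if (Number.isNaN(ms)) {
    throw new Error(`Unparseable timestamp: ${timestamp}`);
  }
  return new Date(ms - (ms % BUCKET_MS)).toISOString();
}

// Aggregate events into a Map keyed by "BucketStart|EventType" -> count
function countByBucket(events) {
  const counts = new Map();
  for (const { timestamp, type } of events) {
    const key = `${bucketStart(timestamp)}|${type}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// The three events printed by `grunt events` in step 5 of the tutorial
const events = [
  { timestamp: "2015-06-29T20:12:21.625Z", type: "Red" },
  { timestamp: "2015-06-29T20:12:22.200Z", type: "Red" },
  { timestamp: "2015-06-29T20:12:22.708Z", type: "Green" },
];
console.log(countByBucket(events));
```

All three sample events fall into the `2015-06-29T20:12:00.000Z` bucket, giving a count of 2 for Red and 1 for Green.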

Developer Quickstart

Assuming git, Vagrant and VirtualBox are installed:

 host$ git clone https://github.com/snowplow/aws-lambda-nodejs-example-project.git
 host$ cd aws-lambda-nodejs-example-project
 host$ vagrant up && vagrant ssh
guest$ cd /vagrant
guest# npm install -g grunt-cli
guest$ npm install
guest$ grunt --help

Tutorial

You can follow along in the release blog post to get the project up and running yourself.

The following steps assume that you are running inside Vagrant, as per the Developer Quickstart above.

1. Setting up AWS

First we need to configure a default AWS profile:

$ aws configure
AWS Access Key ID [None]: ...
AWS Secret Access Key [None]: ...
Default region name [None]: us-east-1
Default output format [None]: json

Now we can create our DynamoDB table, Kinesis stream and IAM role. We will be using CloudFormation to make our new role. Using Grunt, we can create all three like so:

$ grunt init
Running "dynamo:default" (dynamo) task
{ TableDescription:
   { AttributeDefinitions: [ [Object], [Object], [Object] ],
     CreationDateTime: Sun Jun 28 2015 13:04:02 GMT-0700 (PDT),
     ItemCount: 0,
     KeySchema: [ [Object], [Object] ],
     LocalSecondaryIndexes: [ [Object] ],
     ProvisionedThroughput:
      { NumberOfDecreasesToday: 0,
        ReadCapacityUnits: 20,
        WriteCapacityUnits: 20 },
     TableName: 'my-table',
     TableSizeBytes: 0,
     TableStatus: 'CREATING' } }

Running "createRole:default" (createRole) task
{ ResponseMetadata: { RequestId: 'd29asdff0-1dd0-11e5-984e-35a24700edda' },
  StackId: 'arn:aws:cloudformation:us-east-1:84asdf429716:stack/kinesisDynamo/d2af8730-1dd0-11e5-854a-50d5017c76e0' }

Running "kinesis:default" (kinesis) task
{}

Done, without errors.

2. Connecting the AWS Lambda service to the new role and building the project

Wait a minute to ensure our IAM service role has been created. Next we give the new service role access to Kinesis, CloudWatch, Lambda and DynamoDB; to keep things simple, we attach an admin policy to the Lambda execution role. Grunt also assembles our AWS Lambda function into a zip file for upload to the AWS Lambda service. Once it's zipped, we attach the service role to it:

$ grunt role
Running "attachRole:default" (attachRole) task
{ ResponseMetadata: { RequestId: '36ac7877-1dca-11e5-b439-d1da60d122be' } }

Running "packaging:default" (packaging) task
...
Created package at dist/aws-lambda-example-project_0-1-0_latest.zip
...

3. Deploying the zip file to the AWS Lambda service

We deploy this project to Lambda with the grunt deploy command:

$ grunt deploy
Running "deployLambda:default" (deployLambda) task
Trying to create AWS Lambda Function...
Created AWS Lambda Function...

4. Connecting Kinesis to Lambda

The final step to getting this project ready to start processing events is to associate our Kinesis stream with the Lambda function using this command:

$ grunt connect
Running "associateStream:default" (associateStream) task
arn:aws:kinesis:us-east-1:844709429716:stream/my-stream
{ BatchSize: 100,
  EventSourceArn: 'arn:aws:kinesis:us-east-1:2349429716:stream/my-stream',
  FunctionArn: 'arn:aws:lambda:us-east-1:2349429716:function:ProcessKinesisRecordsDynamo',
  LastModified: Sun Jun 28 2015 12:38:37 GMT-0700 (PDT),
  LastProcessingResult: 'No records processed',
  State: 'Creating',
  StateTransitionReason: 'User action',
  UUID: 'f4efc-fe72-4337-9907-89d4e64c' }

Done, without errors.
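Associating a stream with a function corresponds to an event source mapping in AWS Lambda. The sketch below builds request parameters for Lambda's CreateEventSourceMapping call. It is not the Grunt task's source. `BatchSize: 100` matches the output above, but `StartingPosition: "TRIM_HORIZON"` is our assumption, the ARN uses a placeholder account ID, and `buildEventSourceMapping` is a hypothetical helper.

```javascript
// Hypothetical helper: request parameters for Lambda's CreateEventSourceMapping.
// StartingPosition is an assumption; the Grunt task may use a different value.
function buildEventSourceMapping(streamArn, functionName, batchSize = 100) {
  if (!streamArn.startsWith("arn:aws:kinesis:")) {
    throw new Error(`Not a Kinesis stream ARN: ${streamArn}`);
  }
  return {
    EventSourceArn: streamArn,
    FunctionName: functionName, // a function name or a full function ARN
    BatchSize: batchSize, // maximum number of records per Lambda invocation
    StartingPosition: "TRIM_HORIZON", // start from the oldest available record
    Enabled: true,
  };
}

// Placeholder account ID; substitute your own stream ARN
const params = buildEventSourceMapping(
  "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
  "ProcessKinesisRecordsDynamo"
);
console.log(params);
// With the AWS SDK for JavaScript v2, these params could be passed to
// new AWS.Lambda().createEventSourceMapping(params, callback).
```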

5. Sending events to Kinesis

We need to start sending events to our new Kinesis stream. We have created a helper task to do this; run the command below and leave it running in a separate terminal tab:

$ grunt events
Writing Kineis Event: {"timestamp":"2015-06-29T20:12:21.625Z","type":"Red"}
{ SequenceNumber: '49552099319153062484931809176874704852938278389141209090',
  ShardId: 'shardId-000000000000' }
Writing Kineis Event: {"timestamp":"2015-06-29T20:12:22.200Z","type":"Red"}
{ SequenceNumber: '49552099319153062484931809176875913778757893018315915266',
  ShardId: 'shardId-000000000000' }
Writing Kineis Event: {"timestamp":"2015-06-29T20:12:22.708Z","type":"Green"}
{ SequenceNumber: '49552099319153062484931809176877122704577507716210098178',
  ShardId: 'shardId-000000000000' }
...
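For readers who want to see what one write involves, the sketch below builds a random event in the same shape as the overview's example and wraps it in Kinesis PutRecord parameters (`StreamName`, `Data`, `PartitionKey`). This is not the project's event generator. Only "Red" and "Green" appear in the output above, so the other three type names are hypothetical placeholders, as are the function names `makeEvent` and `makePutRecordParams`.

```javascript
// Hypothetical sketch of an event generator feeding Kinesis PutRecord.
const { randomUUID } = require("crypto");

// "Red" and "Green" come from the tutorial output; the rest are placeholders
const EVENT_TYPES = ["Red", "Green", "Blue", "Yellow", "Orange"];

// Create one event shaped like the JSON example in the overview
function makeEvent(now = new Date()) {
  const type = EVENT_TYPES[Math.floor(Math.random() * EVENT_TYPES.length)];
  return { timestamp: now.toISOString(), type, id: randomUUID() };
}

// Wrap an event in PutRecord parameters for the given stream
function makePutRecordParams(streamName, event) {
  return {
    StreamName: streamName,
    Data: JSON.stringify(event), // serialized JSON payload
    PartitionKey: event.id, // random keys spread records across shards
  };
}

const params = makePutRecordParams("my-stream", makeEvent());
console.log(params);
// With the AWS SDK for JavaScript v2: new AWS.Kinesis().putRecord(params, callback)
```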

6. Monitoring your job

First head over to the AWS Lambda service console, then review the logs in CloudWatch.

Finally, let's check the data in our DynamoDB table. Make sure you are in the correct AWS region, then click on my-table and hit the Explore Table button:

(Screenshot: DynamoDB table of event counts per 1-minute bucket and event type)

For each BucketStart and EventType pair, we see a Count, plus some CreatedAt and UpdatedAt metadata for debugging purposes. Our bucket size is 1 minute, and we have 5 discrete event types, hence the matrix of rows that we see.
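One way to keep a Count, CreatedAt and UpdatedAt per BucketStart and EventType pair is a single DynamoDB UpdateItem call per pair, using `ADD` for an atomic increment and `if_not_exists` so CreatedAt is only set on the first write. The sketch below builds DocumentClient-style parameters for such a call. Treating BucketStart and EventType as the table's key is an assumption inferred from the table view, and `buildCountUpdate` is a hypothetical helper, not the project's code.

```javascript
// Hypothetical helper: UpdateItem parameters (DocumentClient style) that add a
// batch's count to one BucketStart/EventType row. Key layout is an assumption.
function buildCountUpdate(tableName, bucketStart, eventType, increment, now = new Date()) {
  return {
    TableName: tableName,
    Key: { BucketStart: bucketStart, EventType: eventType },
    // ADD increments atomically (creating Count if missing);
    // if_not_exists keeps the original CreatedAt on later updates.
    UpdateExpression:
      "SET #updated = :now, #created = if_not_exists(#created, :now) ADD #count :inc",
    ExpressionAttributeNames: {
      "#count": "Count", // COUNT is a DynamoDB reserved word, so alias it
      "#created": "CreatedAt",
      "#updated": "UpdatedAt",
    },
    ExpressionAttributeValues: { ":inc": increment, ":now": now.toISOString() },
  };
}

const update = buildCountUpdate("my-table", "2015-06-29T20:12:00.000Z", "Red", 2);
console.log(JSON.stringify(update, null, 2));
// With the AWS SDK for JavaScript v2:
// new AWS.DynamoDB.DocumentClient().update(update, callback)
```

Because `ADD` is applied server-side, concurrent Lambda invocations writing to the same bucket do not overwrite each other's counts.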

Roadmap

  • Various improvements for the 0.2.0 release
  • Expanding our analytics-on-write thinking into our new Icebucket project

Credits

Copyright and license

AWS Lambda Example Project is copyright 2015 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
