  • Stars: 393
  • Rank: 109,518 (Top 3 %)
  • Language: JavaScript
  • License: MIT No Attribution
  • Created over 9 years ago
  • Updated over 5 years ago

Repository Details

Data ingestion for Amazon Elasticsearch Service from S3 and Amazon Kinesis, using AWS Lambda: Sample code

Streaming Data to Amazon Elasticsearch Service

Using AWS Lambda: Sample Node.js Code

Introduction

It is often useful to stream data, as it is generated, into an Amazon Elasticsearch Service domain for indexing. This makes fresh data available for search and analytics. Doing so requires:

  1. A way of knowing when new data is available.
  2. Code to pick up the data, parse it into JSON documents, and add them to an Amazon Elasticsearch Service (henceforth ES) domain.
  3. Scalable and fully managed infrastructure to host this code.

Lambda is an AWS service that takes care of these requirements. Put simply, it is an "event handling" service in the cloud. Lambda lets us implement the event handler (in Node.js or Java), which it hosts and invokes in response to an event.

The handler can be triggered by a "push" or a "pull" approach. Certain event sources (such as S3) push an event notification to Lambda. Others (such as Kinesis) require Lambda to poll for events and pull them when available.
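
As a rough illustration of how the two event sources look from inside a handler, the sketch below summarizes the records of an incoming event. It assumes the standard `Records` layout that S3 notifications and Kinesis batches use; the function name `describeRecords` and the two sample events are hypothetical and are not taken from the sample code.

```javascript
'use strict';

// Summarize the records of a Lambda event from either source.
// S3 (push) records only reference the new object; Kinesis (pull)
// records carry the payload itself, base64-encoded.
function describeRecords(event) {
  return (event.Records || []).map((record) => {
    if (record.eventSource === 'aws:s3') {
      return {
        source: 's3',
        bucket: record.s3.bucket.name,
        // Object keys arrive URL-encoded, with spaces sent as '+'.
        key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' ')),
      };
    }
    if (record.eventSource === 'aws:kinesis') {
      const text = Buffer.from(record.kinesis.data, 'base64').toString('utf8');
      return { source: 'kinesis', document: JSON.parse(text) };
    }
    return { source: 'unknown' };
  });
}

// Hypothetical, trimmed-down events for illustration only.
const s3Event = {
  Records: [{
    eventSource: 'aws:s3',
    s3: { bucket: { name: 'my-log-bucket' }, object: { key: 'logs/access+1.log' } },
  }],
};
const kinesisEvent = {
  Records: [{
    eventSource: 'aws:kinesis',
    kinesis: {
      data: Buffer.from(JSON.stringify({ user: 'alice', action: 'login' })).toString('base64'),
    },
  }],
};

console.log(describeRecords(s3Event));
console.log(describeRecords(kinesisEvent));
```

In the S3 case the handler still has to fetch the object before it can parse anything, whereas a Kinesis record can be decoded and indexed directly.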

For more details on AWS Lambda, please see the documentation.

This package contains sample Lambda code (in Node.js) to stream data to ES from two common AWS data sources: S3 and Kinesis. The S3 sample takes Apache log files, parses them into JSON documents, and adds them to ES. The Kinesis sample reads JSON data from the stream and adds it to ES.
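
To make the "parse into JSON documents" step concrete, here is a minimal sketch that turns one Apache access log line into a plain object. It assumes the Common Log Format; the sample code's own parser and field names may differ, and `parseLogLine` is a hypothetical name.

```javascript
'use strict';

// Common Log Format: host ident authuser [timestamp] "method path protocol" status bytes
const CLF_PATTERN =
  /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-)$/;

// Returns a JSON-ready object, or null if the line does not match.
function parseLogLine(line) {
  const m = CLF_PATTERN.exec(line.trim());
  if (!m) {
    return null;
  }
  return {
    host: m[1],
    ident: m[2],
    authuser: m[3],
    timestamp: m[4],
    method: m[5],
    path: m[6],
    protocol: m[7],
    status: Number(m[8]),
    // A '-' in the size field means no body was sent.
    bytes: m[9] === '-' ? 0 : Number(m[9]),
  };
}

const sample =
  '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326';
console.log(JSON.stringify(parseLogLine(sample)));
```

Lines that do not match are returned as `null` so the caller can decide whether to skip or log them.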

Note that the sample code has been kept simple for clarity. It does not handle ES document batching, eventual consistency issues for S3 updates, and so on.
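
For readers who want to add batching themselves, the following sketch shows one possible way to build a request body for the Elasticsearch `_bulk` API, which expects newline-delimited JSON: an action line followed by the document, with a trailing newline. This is not part of the sample code; `buildBulkBody` is a hypothetical helper, and the `_type` field only applies to older ES versions that still use document types.

```javascript
'use strict';

// Build a newline-delimited body for POST /_bulk.
// Each document is preceded by an "index" action line.
function buildBulkBody(docs, index, type) {
  const lines = [];
  for (const doc of docs) {
    lines.push(JSON.stringify({ index: { _index: index, _type: type } }));
    lines.push(JSON.stringify(doc));
  }
  // The bulk API requires the body to end with a newline.
  return lines.length > 0 ? lines.join('\n') + '\n' : '';
}

const body = buildBulkBody(
  [{ status: 200, path: '/a' }, { status: 404, path: '/b' }],
  'logs',
  'apache'
);
process.stdout.write(body);
```

Sending one such body per invocation, instead of one request per document, reduces the number of round trips to the domain.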

Setup Overview

While some detailed instructions are covered later in this file and elsewhere (in the Lambda documentation), this section aims to show the larger picture that the individual steps work to accomplish. We assume that the data source (an S3 bucket or a Kinesis stream, in this case) and an ES domain are already set up.

  1. Deployment Package: The "Deployment Package" consists of the event handler code files and their dependencies, packaged as a zip file. The first step in creating a new Lambda function is to prepare and upload this zip file.

  2. Lambda Configuration:

    1. Handler: The name of the main code file in the deployment package, with the file extension replaced with a .handler suffix.
    2. Memory: The memory limit, which determines the EC2 instance type used. For now, the default should suffice.
    3. Timeout: The default timeout value (3 seconds) is quite low for our use case. 10 seconds might work better, but please adjust based on your testing.
  3. Authorization: Because several AWS services need to call each other here, appropriate authorization is required. This takes the form of an IAM role to which the necessary authorization policies are attached. The Lambda function assumes this role when running.
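
The Handler setting described above can be sketched as follows. For Node.js functions the value has the form file name (without extension), a dot, and the name of the exported function. The file name `s3_lambda_es.js` and the helper `handlerSetting` are hypothetical and used only for illustration.

```javascript
'use strict';

const path = require('path');

// Derive the Lambda "Handler" setting from a code file name,
// assuming the file exports its entry point under `exportName`.
function handlerSetting(fileName, exportName = 'handler') {
  const base = path.basename(fileName, path.extname(fileName));
  return `${base}.${exportName}`;
}

// A minimal entry point of the kind the setting refers to.
function handler(event, context, callback) {
  const count = (event.Records || []).length;
  callback(null, `received ${count} record(s)`);
}
exports.handler = handler;

console.log(handlerSetting('s3_lambda_es.js'));
```

If the exported function were named something other than `handler`, the setting would need to use that name instead.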

Note:

  • The AWS Console is simpler to use for configuration than other methods.
  • Lambda is currently available only in a few regions (us-east-1, us-west-2, eu-west-1, ap-northeast-1).
  • Once the setup is complete and tested, enable the data source in the Lambda console, so that data may start streaming in.
  • The code is kept simple for purposes of illustration. It doesn't batch documents when loading the ES domain, or (for S3 updates) handle eventual consistency cases.

Deployment Package Creation

  1. On your development machine, download and install Node.js.

  2. In any location, create a directory structure similar to the following:

    eslambda (place sample code here)
    |
    +-- node_modules (dependencies will go here)
    
  3. Modify the sample code with the correct ES endpoint, region, index and document type.

  4. Install each dependency imported by the sample code (with the require() call), as follows:

    npm install <dependency>
    

    Verify that these are installed within the node_modules subdirectory.

  5. Create a zip file to package the code and the node_modules subdirectory:

    zip -r eslambda.zip *
    

The zip file thus created is the Lambda Deployment Package.
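
The settings edited in step 3 might be grouped as in the sketch below, so that a missing value is caught before the package is zipped. The object shape, the placeholder values, and the helpers `validateConfig` and `documentPath` are all assumptions for illustration; the sample code may name and store these settings differently.

```javascript
'use strict';

// Hypothetical settings block; replace the placeholder values with your own.
const esDomain = {
  endpoint: 'search-example-domain.us-east-1.es.amazonaws.com',
  region: 'us-east-1',
  index: 'logs',
  doctype: 'apache',
};

// Fail early if any required setting is missing or empty.
function validateConfig(cfg) {
  for (const key of ['endpoint', 'region', 'index', 'doctype']) {
    if (typeof cfg[key] !== 'string' || cfg[key].length === 0) {
      throw new Error(`Missing ES setting: ${key}`);
    }
  }
  return cfg;
}

// Request path for adding a document with an auto-generated ID
// (index/type style paths apply to older ES versions).
function documentPath(cfg) {
  return '/' + [cfg.index, cfg.doctype].map(encodeURIComponent).join('/');
}

validateConfig(esDomain);
console.log(documentPath(esDomain));
```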

S3-Lambda-ES

Set up the Lambda function and the S3 bucket as described in the Lambda-S3 Walkthrough. Please keep in mind the following notes and configuration overrides:

  • The walkthrough uses the AWS CLI for configuration, but it's probably more convenient to use the AWS Console (web UI).

  • The S3 bucket must be created in the same region as the Lambda function, so that it can push events to Lambda.

  • When registering the S3 bucket as the data source in Lambda, add a filter for files with a .log suffix, so that Lambda picks up only Apache log files.

  • The following authorizations are required:

    1. Lambda permits S3 to push event notifications to it
    2. S3 permits Lambda to fetch the created objects from a given bucket
    3. ES permits Lambda to add documents to the given domain

    The Lambda console provides a simple way to create an IAM role with policies for (1). For (2), when creating the IAM role, choose the "S3 execution role" option; this will load the role with permissions to read from the S3 bucket. For (3), add the following access policy to the role to permit ES operations.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "es:*"
                ],
                "Effect": "Allow",
                "Resource": "*"
            }
        ]
    }
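
The policy above allows every ES action on every domain, which keeps the sample simple. As a hedged alternative, the sketch below generates a narrower policy document that permits only HTTP POST and PUT calls against a single domain. The helper name `esWritePolicy`, the account ID, and the domain name are hypothetical; check that the listed actions cover the requests your function actually makes.

```javascript
'use strict';

// Build an IAM policy document limited to write-style HTTP calls
// on one ES domain, instead of "es:*" on all resources.
function esWritePolicy(region, accountId, domainName) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Action: ['es:ESHttpPost', 'es:ESHttpPut'],
        Resource: `arn:aws:es:${region}:${accountId}:domain/${domainName}/*`,
      },
    ],
  };
}

// Placeholder values for illustration only.
const policy = esWritePolicy('us-east-1', '123456789012', 'my-domain');
console.log(JSON.stringify(policy, null, 4));
```

The printed JSON can be pasted into the role's policy editor in place of the broader policy.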
    

Kinesis-Lambda-ES

Set up the Lambda function and the Kinesis stream as described in the Lambda-Kinesis Walkthrough. Please keep in mind the following notes and configuration overrides:

  • The walkthrough uses the AWS CLI, but it's probably more convenient to use the AWS Console (web UI) for Lambda configuration.

  • To the IAM role assigned to the Lambda function, add the following access policy to permit ES operations.

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Action": [
                      "es:*"
                  ],
                  "Effect": "Allow",
                  "Resource": "*"
              }
          ]
      }
    
  • For testing: If you have a Kinesis client, use it to put a record on the stream. If not, the AWS CLI can be used to push a JSON document to the stream, from which Lambda will pick it up.

    aws kinesis put-record --stream-name <stream name> --data "<JSON document>" --region <region> --partition-key shardId-000000000000
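
Before putting records on a real stream, it can help to exercise the handler logic locally with a hand-built event. The sketch below wraps JSON documents in the shape Lambda passes for Kinesis batches, where each payload is base64-encoded under `kinesis.data`. The helper names and the sample document are hypothetical, and only the fields used here are included.

```javascript
'use strict';

// Wrap documents in a minimal Kinesis-style Lambda event.
function makeKinesisEvent(docs, partitionKey = 'test-key') {
  return {
    Records: docs.map((doc, i) => ({
      eventSource: 'aws:kinesis',
      kinesis: {
        partitionKey,
        sequenceNumber: String(i),
        data: Buffer.from(JSON.stringify(doc), 'utf8').toString('base64'),
      },
    })),
  };
}

// Decode the payloads again, as a handler would before indexing them.
function decodeKinesisEvent(event) {
  return event.Records.map((record) =>
    JSON.parse(Buffer.from(record.kinesis.data, 'base64').toString('utf8'))
  );
}

const testEvent = makeKinesisEvent([{ sensor: 'a1', value: 42 }]);
console.log(decodeKinesisEvent(testEvent));
```

The same event object can be passed to the function's exported handler in a local script or pasted into the Lambda console's test dialog as JSON.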
    

Copyright

Copyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved.

SPDX-License-Identifier: MIT-0
