  • Stars: 143
  • Rank 256,076 (Top 6 %)
  • Language: JavaScript
  • License: MIT License
  • Created about 6 years ago
  • Updated 8 months ago

Use AWS Lambda to perform free-text search on documents - With SAM Template

Lambda Serverless Search

I love Elasticsearch. I love serverless functions. But I love serverless functions more because they're cheaper to run. The purpose of this project is to provide the benefits of free-text search while running and scaling at minimal cost.

The search algorithm powering the system is Lunr.js.

Limitations

Remember, this is a poor man's Elasticsearch.

  • Great for exposing search for sets of new data and existing data
  • You only get the index id, not the entire document
  • Use as a lite API before migrating to a full-scale search solution
  • More documents can mean slower performance; see the Performance section below for my observations
  • AWS Lambda memory requirements might need to be adjusted to fit the dataset
  • This is not a database, it is a search service. You will get results with the reference id only, not the entire document.
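
Since results carry only the reference id, callers usually look the full document up in their own store. The sketch below shows one way to do that. It assumes a hypothetical in-memory Map standing in for your database; `hydrateResults` and `documentsById` are illustrative names and not part of this project.

```javascript
// Hypothetical helper: join search hits (which carry only `ref` and `score`)
// with full documents kept in your own store. A Map stands in for a database.
function hydrateResults(hits, documentsById) {
  return hits
    .filter((hit) => documentsById.has(hit.ref)) // skip refs we cannot resolve
    .map((hit) => ({ score: hit.score, document: documentsById.get(hit.ref) }));
}
```

In practice the Map lookup would likely be replaced by a batched read against whatever store holds the original documents.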

AWS Components

  • S3
  • Lambda (256 MB)
  • API Gateway

Getting Started

You may head over to the Serverless Application Repository and deploy the service.

You will have to provide two parameters when you deploy:

TargetBucket - The name of the S3 bucket that should be created. This is where all the documents will sit.

Note: remember the S3 bucket naming conventions: lowercase and alphanumeric only

InternalAPIKey - This API key is a secret string. Do not share it with anyone; it allows you to change your index configuration.

You may test the API in Postman. Be sure to update the BaseURL. See below for route docs and design.

After deploying, here are some things you might want to do:

  • Change the default internal API key
  • Add Auth to your routes to restrict access

Design

[Figure: design diagram]

API Routes

After you deploy, you will end up with a base URL:

https://${myapi}.execute-api.${region}.amazonaws.com/Prod/


POST /internal/config

Creates one or more indexes for the articles. You may update this whenever you want to.

| body parameter | definition |
|---|---|
| apikey | An internal auth string that restricts this request to people with access. Keep it secret; don't make this request from a client |
| config | Array of index config objects. See the table below |

Config Body

| body parameter | definition | required |
|---|---|---|
| fields | Array of strings naming the document attributes to be indexed | yes |
| name | The name of the index | yes |
| ref | The one field that will be returned. Most people use an ID that they can later look up in a DB or other store | yes |
| shards | Sets the number of records per shard. If individual documents are large, you want smaller shards. By default, an index is sharded at 2000 records per shard | no |

Input
{
    "apikey": "supersecretkey",
    "configs": [
        {
            "name": "movies",
            "fields": ["title", "year", "director", "genre", "tldr"],
            "ref": "id",
            "shards": 1000
        },
        {
            "name": "movies-title",
            "fields": ["title", "year", "director", "genre", "tldr"],
            "ref": "title"
        }
    ]
}
Response
{
	"msg":"Index Config Updated"
}
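
As a rough illustration of calling this route from Node.js, the sketch below builds the request without sending it. The body keys (`apikey`, `configs`) follow the example input above. The function name `buildConfigRequest` and the base URL are placeholders, not part of the project.

```javascript
// Build the request for POST /internal/config. The body shape mirrors the
// example input above; the base URL and key are placeholders to replace.
// Note: baseUrl must end with "/" so the relative path is appended to it.
function buildConfigRequest(baseUrl, apikey, configs) {
  return {
    url: new URL("internal/config", baseUrl).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ apikey, configs }),
    },
  };
}
```

The returned pair can be passed straight to `fetch(url, options)` on Node 18 or later. Run this from a trusted backend only, since the request carries the internal API key.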

POST /add

Adds new articles to the search index. You may upload either an array or a single object.

Input
[
    {
        "id": "112233",
        "title": "Titanic",
        "year": 1997,
        "director": "James Cameron",
        "genre": "Romance",
        "tldr": "An Amazing love story"
    },
    {
        "id": "115566",
        "title": "Shawshank Redemption",
        "year": 1994,
        "director": "Frank Darabont",
        "genre": "Misc.",
        "tldr": "Story of friendship"
    }
]
Response
{
	"msg":"Article Added"
}
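
Before uploading, it can help to check that each article has the field the index uses as its `ref`, along with the indexed fields. The following is a hypothetical pre-upload check, not something the service performs. It assumes a config object shaped like the entries in the /internal/config example.

```javascript
// Hypothetical pre-upload check: report which articles lack the index's `ref`
// field or any of its indexed `fields`. Accepts an array or a single object,
// the same two shapes POST /add is documented to accept.
function findIncompleteArticles(articles, config) {
  const list = Array.isArray(articles) ? articles : [articles];
  const required = [...new Set([config.ref, ...config.fields])];
  return list
    .map((article, position) => ({
      position, // index of the article within the upload
      missing: required.filter((key) => article[key] === undefined),
    }))
    .filter((entry) => entry.missing.length > 0);
}
```

An empty array means every article carries all configured fields. Whether a missing indexed field should block an upload is a judgment call for your dataset.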

GET /search

Searches all the articles

Input
| query parameter | required | definition | example |
|---|---|---|---|
| q | yes | query string to be searched | /Prod/search?q=titan&index=movies |
| index | yes | index to be used | /Prod/search?q=get&index=movies |
| count | no | number of search results to return. Default: 25 | /Prod/search?q=get&index=movies&count=50 |

Both q and index are required; count is optional.

You may tweak the search algorithm here. LunrJS docs will also help.

Response
    [
        {
            "ref": "112233",
            "score": 1.992,
            "matchData": {
                "metadata": {
                    "titan": {
                        "title": {}
                    }
                }
            }
        }
    ]
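
A small sketch of assembling the search URL from the documented query parameters. `buildSearchUrl` is an illustrative name, and the base URL is whatever your deployment produced.

```javascript
// Build a GET /search URL from the documented query parameters.
// `q` and `index` are required; `count` is optional (the service defaults to 25).
function buildSearchUrl(baseUrl, { q, index, count }) {
  if (!q || !index) throw new Error("q and index are required");
  const url = new URL("search", baseUrl); // baseUrl must end with "/"
  url.searchParams.set("q", q); // searchParams handles URL encoding
  url.searchParams.set("index", index);
  if (count !== undefined) url.searchParams.set("count", String(count));
  return url.toString();
}
```

The result can be passed to `fetch` and the JSON response parsed into the array of hits shown above.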

GET /internal/config

Returns the schema that is being used to index the documents

Response
    {
        "fields": ["title", "year", "director", "genre", "tldr"],
        "ref": "id"
    }

Performance

Here are some graphs from the performance tests I have run. It's not going to win any races, or even come close to Algolia or Elasticsearch. The real killer is network latency, which is a non-negotiable ~2s depending on the index size. Querying with Athena might be a better approach and could speed things along.

[Figure: performance graph]

Lambda memory allocation has a huge impact!

[Figure: performance by Lambda memory allocation]

DocumentSearchFunction:

  • All search indexes are loaded in parallel rather than one after another
  • The more memory a Lambda function has, the more compute power it gets, and hence the faster the index search (you can see up to a 75% increase in speed). Just adjust the memory slider.
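
The parallel-loading idea can be sketched as follows. This is not the project's actual code. `searchShard` is a hypothetical async function that, in a real deployment, would fetch one index shard from S3 and run the Lunr query against it.

```javascript
// Sketch: query every shard concurrently, then merge hits by descending score.
// `searchShard` is a hypothetical async function (shardId, query) -> hits[].
async function searchAllShards(shardIds, query, searchShard, count = 25) {
  // Promise.all starts all shard lookups at once instead of awaiting each in turn.
  const perShard = await Promise.all(
    shardIds.map((shardId) => searchShard(shardId, query))
  );
  return perShard
    .flat()
    .sort((a, b) => b.score - a.score) // highest score first
    .slice(0, count);
}
```

One caveat with this kind of merge: scores computed against separate indexes may not be strictly comparable, so the combined ranking should be treated as approximate.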

DocumentIndexingFunction:

  • The fewer individual articles there are, the faster the indexing
  • You may see my scale test folder in the repository. It checks how long it takes for a record to appear in the results after it's uploaded. Indexing latency degrades by about 30x over the course of 12K records.
  • Bulk uploads tend to reduce the total indexing time
  • The more memory a Lambda function has, the more compute power it gets, and hence the faster the index building
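
To take advantage of bulk uploads, a client can group articles into batches and send each batch as one POST /add array. A minimal sketch follows. The default batch size of 100 is an arbitrary illustrative value, not a documented limit of the service.

```javascript
// Split articles into batches so each POST /add carries many records at once.
// The default of 100 is an arbitrary illustrative value, not a documented limit.
function toBatches(articles, batchSize = 100) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < articles.length; i += batchSize) {
    batches.push(articles.slice(i, i + batchSize));
  }
  return batches;
}
```

The right batch size probably depends on document size and the API Gateway and Lambda payload limits, so it is worth measuring against your own data.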

Next Steps, Optimizations and Future

  • Add pagination for large sets of results
  • Upload the index directly to the Lambda function (this would radically improve performance)
  • Update to get all S3 articles and content via AWS Athena
  • Use CloudFront with S3 to cache the index document
  • Add a cache to keep track of the most popular results in order to dynamically perform result boosts
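
Until pagination exists in the service, one possible stopgap is to request a larger count and page through the hits on the client. The helper below is a hypothetical sketch of that idea and not a planned API.

```javascript
// Hypothetical client-side pagination over an array of search hits.
// Pages are 1-based; an out-of-range page yields an empty `items` array.
function paginate(hits, page = 1, pageSize = 10) {
  const totalPages = Math.ceil(hits.length / pageSize);
  const start = (page - 1) * pageSize;
  return { page, totalPages, items: hits.slice(start, start + pageSize) };
}
```

This only slices results that were already returned, so it does not reduce the work done by the search function itself.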
