• Stars
    star
    381
  • Rank 108,343 (Top 3 %)
  • Language
    C
  • License
    Apache License 2.0
  • Created over 8 years ago
  • Updated 5 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Adds query routing and rewriting extensions to pgbouncer

Have you ever wanted to split your database load across multiple servers or clusters without impacting the configuration or code of your client applications? Or perhaps you have wished for a way to intercept and modify application queries, so that you can make them use optimized tables (sorted, pre-joined, pre-aggregated, etc.), add security filters, or hide changes you have made in the schema?

The pgbouncer-rr project is based on pgbouncer, an open source PostgreSQL connection pooler. It adds two new significant features:

  1. Routing: intelligently send queries to different database servers from one client connection; use it to partition or load balance across multiple servers/clusters.
  2. Rewrite: intercept and programmatically change client queries before they are sent to the server: use it to optimize or otherwise alter queries without modifying your application.

Diagram1

Pgbouncer-rr works the same way as pgbouncer; any target application can be connected to pgbouncer-rr as if it were an Amazon Redshift or PostgreSQL server, and pgbouncer-rr creates a connection to the actual server, or reuses an existing connection.

You can elect to deploy multiple instances of pgbouncer-rr to avoid throughput bottlenecks or single points of failure, or to support multiple configurations. It can live in an Auto Scaling group, and behind an Elastic Load Balancing load balancer. It can be deployed to a public subnet while your servers reside in private subnets. You can choose to run it as a bastion server using SSH tunneling, or you can use pgbouncer's recently introduced SSL support for encryption and authentication.

Documentation and community support for pgbouncer can be easily found online; pgbouncer-rr is a superset of pgbouncer.

Now I’d like to talk about the query routing and query rewrite feature enhancements.

Query Routing

The routing feature maps client connections to server connections using a Python routing function which you provide. Your function is called for each query, with the client username and the query string as parameters. Its return value must identify the target database server; how it does this is entirely up to you.

For example, you might want to run two Amazon Redshift clusters, each optimized to host a distinct data warehouse subject area. You can determine the appropriate cluster for any given query based on the names of the schemas or tables used in the query. This can be extended to support multiple Amazon Redshift clusters or PostgreSQL instances.

In fact, you can even mix and match Amazon Redshift and PostgreSQL, taking care to ensure that your Python functions correctly handle any server-specific grammar in your queries; your database will throw errors if your routing function sends queries it can’t process. And, of course, any query must run entirely on the server to which it is routed; cross-database joins or multi-server transactions do not work!

Here’s another example: you might want to implement controlled load balancing (or A/B testing) across replicated clusters or servers. Your routing function can choose a server for each query based on any combination of the client username, the query string, random variables, or external input. The logic can be as simple or as sophisticated as you want.

Your routing function has access to the full power of the python language and the myriad of available Python modules. You can use the regular expression module (re) to match words and patterns in the query string, or use the SQL parser module (sqlparse) to support more sophisticated/robust query parsing. You may also want to use the AWS SDK module (boto) to read your routing table configurations from Amazon DynamoDB.

The Python routing function is dynamically loaded by pgbouncer-rr, from the file you specify in the configuration:
routing_rules_py_module_file = /etc/pgbouncer-rr/routing_rules.py

The file should contain the following Python function:
def routing_rules(username, query):

  • The function parameters will provide the username associated with the client, and a query string
  • The function may take optonally a database key name parameter:
    • In this case the file should contain the following Python function: def routing_rules(username, query, db_key): the db_key argument will be the name of the database key used for the query.
  • The function return value must be a valid database key name (dbkey) as specified in the configuration file, or None:
    • When a valid dbkey is returned by the routing function, the client connection will be routed to a connection in the specified server connection pool.
    • When None is returned by the routing function, the client remains routed to its current server connection.

The route function is called only for query and prepare packets, with the following restrictions:

  • All queries must run wholly on the assigned server; cross-server joins do not work
  • Ideally queries should autocommit each statement. Set pool_mode = statement in the configuration.
  • Multi-statement transactions will work correctly only if statements are not rerouted by the routing_rules function to a different server pool mid-transaction. Set pool_mode = transaction in the configuration.
  • If your application uses database catalog tables to discover the schema, then the routing_rules function should direct catalog table queries to a database server that has all the relevant schema objects created.

Simple Query Routing Example

Amazon Redshift cluster 1 has data in table 'tablea'. Amazon Redshift cluster 2 has data in table 'tableb'. You want a client to be able to issue queries against either tablea or tableb without needing to know which table resides on which cluster.

Create a (default) entry with a key, say, 'dev' in the [databases] section of the pgbouncer configuration. This entry determines the default cluster used for client connections to database 'dev'. You can make either redshift1 or redshift2 the default, or even specify a third 'default' cluster. Create additional entries in the pgbouncer [databases] section for each cluster; give these unique key names such as 'dev.1', and 'dev.2'.

[databases]
dev = host=<redshift1> port=5439 dbname=dev
dev.1 = host=<redshift1> port=5439 dbname=dev
dev.2 = host=<redshift2> port=5439 dbname=dev

Ensure the configuration file setting routing_rules_py_module_file specifies the path to your python routing function file, such as ~/routing_rules.py.

The code in the file could look like the following:

def routing_rules(username, query):
	if "tablea" in query:
		return "dev.1"
	elif "tableb" in query:
		return "dev.2"
	else:
		return None

Diagram2-Routing

This is a toy example, but it illustrates the concept.
If a client sends the query SELECT * FROM tablea, it matches the first rule, and is assigned to server pool 'dev.1' (redshift1).
If a client (and it could be the same client in the same session) sends the query SELECT * FROM tableb, it matches the second rule, and is assigned to server pool 'dev.2' (redshift2). Any query that does not match either rule results in None being returned, and the server connection remains unchanged.

Here is an alternative function for the same use case, but the routing logic is defined in a separate extensible data structure using regular expressions to find the table matches. The routing table structure could easily be externalized in a DynamoDB table.

# ROUTING TABLE
# ensure all dbkey values are defined in [database] section of the pgbouncer ini file 
routingtable = {
	'route' : [{
			'usernameRegex' : '.*',
			'queryRegex' : '.*tablea.*',
			'dbkey' : 'dev.1'
		}, {
			'usernameRegex' : '.*',
			'queryRegex' : '.*tableb.*',
			'dbkey' : 'dev.2'
		}
	],
	'default' : None
}

# ROUTING FN - CALLED FROM PGBOUNCER-RR - DO NOT CHANGE NAME
# IMPLEMENTS REGEX RULES DEFINED IN ROUTINGTABLE OBJECT
# RETURNS FIRST MATCH FOUND
import re
def routing_rules(username, query):
	for route in routingtable['route']:
		u = re.compile(route['usernameRegex'])
		q = re.compile(route['queryRegex'])
		if u.search(username) and q.search(query):
			return route['dbkey']
	return routingtable['default']

You will most likely want to implement more robust and sophisticated rules, taking care to avoid unintended matches. Write test cases to call your function with different inputs and validate the output dbkey values.

Query Rewrite

The rewrite feature provides the opportunity to manipulate application queries en route to the server, without modifying application code. You might want to do this to:

  • Optimize an incoming query to use the best physical tables, when you have:
    • Replicated tables with alternative sort/dist keys and column subsets (emulate projections).
    • Stored pre-joined or pre-aggregated data (emulate 'materialized views')
  • Apply query filters to support row level data partitioning/security
  • Roll out new schemas, resolve naming conflicts and the like, by changing identifier names on the fly.

The rewrite function is also implemented in a fully configurable python function, dynamically loaded from an external module specified in the configuration: rewrite_query_py_module_file = /etc/pgbouncer-rr/rewrite_query.py

The file should contain the python function:
def rewrite_query(username, query):

  • The function parameters will provide the username associated with the client, and a query string.
  • The function return value must be a valid SQL query string which will return the result set you want the client application to receive.

Implementing a query rewrite function is straightforward when the incoming application queries have fixed formats that are easily detectable and easily manipulated, perhaps using regular expression search/replace logic in the Python function. It is much more challenging to build a robust rewrite function to handle SQL statements with arbitrary format and complexity.

Enabling the query rewrite function triggers pgbouncer-rr to enforce that a complete query is contained in the incoming client socket buffer. Long queries are often split across multiple network packets; they should all be in the buffer before the rewrite function is called. This requires that the buffer size be large enough to accommodate the largest query. The default buffer size (2048) is likely too small, so specify a much larger size in the configuration: pkt_buf = 32768

If a partially received query is detected, and there is room in the buffer for the remainder of the query, pgbouncer-rr waits for the remaining packets to be received before processing the query. If the buffer is not large enough for the incoming query, or if it is not large enough to hold the re-written query (which may be longer than the original), then the rewrite function will fail. By default, the failure is logged, and the original query string will be passed to the server unchanged. You can force the client connection to terminate instead, by setting: rewrite_query_disconnect_on_failure = true.

Simple Query Rewrite Example

Scenario:
You have a star schema with a large fact table in Redshift (say, 'sales') with two related dimension tables (say 'store' and 'product'). You want to optimize equally for two different queries:

1> SELECT storename, SUM(total) FROM sales JOIN store USING (storeid) GROUP BY storename ORDER BY storename
2> SELECT prodname, SUM(total) FROM sales JOIN product USING (productid) GROUP BY prodname ORDER BY prodname

By experimenting, you have determined that the best possible solution is to have two additional tables, each optimized for one of the queries:

  1. store_sales: store and sales tables denormalized, pre-aggregated by store, and sorted and distributed by store name.
  2. product_sales: product and sales tables denormalized, pre-aggregated by product, sorted and distributed by product name.

So you implement the new tables, and take care of their population in your ETL processes. But, you'd like to avoid directly exposing these new tables to your reporting or analytic client applications. This might be the best optimization today, but who knows what the future holds? Maybe you'll come up with a better optimization later, or maybe Redshift will introduce cool new features that provide a simpler alternative.

So, you implement a pgbouncer-rr rewrite function to change the original queries on the fly.

Ensure the configuration file setting rewrite_query_py_module_file specifies the path to your python function file, say ~/rewrite_query.py.

The code in the file could look like this:

import re
def rewrite_query(username, query):
    q1="SELECT storename, SUM\(total\) FROM sales JOIN store USING \(storeid\) GROUP BY storename ORDER BY storename"
    q2="SELECT prodname, SUM\(total\) FROM sales JOIN product USING \(productid\) GROUP BY prodname ORDER BY prodname"
    if re.match(q1, query):
        new_query = "SELECT storename, SUM(total) FROM store_sales GROUP BY storename ORDER BY storename;"
    elif re.match(q2, query):
        new_query = "SELECT prodname, SUM(total) FROM product_sales GROUP BY prodname ORDER BY prodname;"
    else:
        new_query = query
    return new_query

Diagram3-Rewrite

Again, this is a toy example to illustrate the concept. In any real application, your python function will need to employ more robust query pattern matching and substitution.

Your reports and client applications use the same join query as before:

SELECT prodname, SUM(total) FROM sales JOIN product USING (productid) GROUP BY prodname ORDER BY prodname;

But now, when you look in the Redshift console 'Queries' tab, you will see that the query received by Redshift is the rewritten version that uses the new product_sales table - leveraging your pre-joined, pre-aggregated data and the targeted sort and dist keys:

SELECT prodname, SUM(total) FROM product_sales GROUP BY prodname ORDER BY prodname;

Getting Started

Install
Download and install pgbouncer-rr by running the following commands (Amazon Linux/RHEL/CentOS):

# install required packages - see https://github.com/pgbouncer/pgbouncer#building
sudo yum install libevent-devel openssl-devel python-devel libtool git patch make -y

# download the latest tested pgbouncer distribution - 1.12
git clone https://github.com/pgbouncer/pgbouncer.git --branch "pgbouncer_1_12_0"

# download pgbouncer-rr extensions
git clone https://github.com/awslabs/pgbouncer-rr-patch.git

# merge pgbouncer-rr extensions into pgbouncer code
cd pgbouncer-rr-patch
./install-pgbouncer-rr-patch.sh ../pgbouncer

# build and install
cd ../pgbouncer
git submodule init
git submodule update
./autogen.sh
./configure ...
make
sudo make install

Configure
Create a configuration file, using ./pgbouncer-example.ini as a starting point, adding your own database connections and python routing rules / rewrite query functions.

Set up user authentication - see authentication file format. NOTE: the recently added pgbouncer auth_query feature will unfortunately not work with Amazon Redshift.

By default, pgbouncer-rr does not support SSL/TLS connections. However, you can experiment with pgbouncer's newest TLS/SSL feature. Just add a private key and certificate to your pgbouncer-rr configuration:

client_tls_sslmode=allow
client_tls_key_file = ./pgbouncer-rr-key.key
client_tls_cert_file = ./pgbouncer-rr-key.crt

Hint: Here's how to easily generate a test key with a self-signed certificate using openssl:

openssl req -newkey rsa:2048 -nodes -keyout pgbouncer-rr-key.key -x509 -days 365 -out pgbouncer-rr-key.crt

Configure firewall
Configure your linux firewall to enable incoming connections on the configured pgbouncer-rr listening port. Example:

sudo firewall-cmd --zone=public --add-port=5439/tcp --permanent
sudo firewall-cmd --reload

If you are running pgbouncer-rr on an Amazon EC2 instance, the instance Security Group must also be configured to allow incoming TCP connections on the listening port.

Launch
Run pgbouncer-rr as a daemon using the commandline pgbouncer <config_file> -d.
See pgbouncer --help for commandline options. Hint: use -v to enable verbose logging. If you look carefully in the logfile you will see evidence of the query routing and query rewrite features in action.

Connect
Configure your client application as though you were connecting directly to a Redshift or PostgreSQL database, but be sure to use the pgbouncer-rr hostname and listening port.
Example – using psql

psql -h pgbouncer-dnshostname -U dbuser -d dev -p 5439

Example – JDBC driver URL (Redshift driver)

jdbc:redshift://pgbouncer-dnshostname:5439/dev

Deploy pgbouncer in Elastic Kubernetes Service (EKS)

Dockerization and deployment considerations

We choose to deploy PGBouncer as a container on Amazon Elastic Kubernetes Service (EKS) to allow it to horizontally scale to the Redshift or Postgresql client load. We first containerized and stored the image in Amazon Elastic Container Registry (ECR); then we deployed Kubernetes (1) Deployment and (2) Service. The Kubernetes Deployment,pgbouncer-deploy.yaml defines the PGBouncer configuration, such as logging or the location of routing rules, as well as Redshift and Postgresql database cluster endpoint credentials. The startup script start.sh generates the pgbouncer.ini upon the PGBouncer container init (so don't look for it here)

          envFrom:
            - secretRef:
               name: pgbouncer
          env:
           - name: default_pool_size
             value: "20"
           - name: log_connections
             value: "1"
           - name: log_disconnections
             value: "1"
           - name: log_pooler_errors
             value: "1"
           - name: log_stats
             value: "1"
           - name: routing_rules_py_module_file
             value: "/home/pgbouncer/routing_rules.py"

Security wise, we run the PGBouncer process in a non-root user to limit the process scope.

...
RUN useradd -ms /bin/bash pgbouncer && \
    chown pgbouncer /home/pgbouncer && \
    chown pgbouncer /

USER pgbouncer
WORKDIR /home/pgbouncer
...

We also use Kubernetes secrets to allow secure credentials loading at runtime only. The secrets are created with create-secrets.sh that issue kubectl create secret generic with a local secret file, pgbouncer.secrets. Make sure you avoid loading the file to your git repository by adding *.secrets to your .gitignore. Here is an example of a secret file:

PGB_DATABASE1=dev = host= <redshift1>port=5432 dbname=dev
PGB_DATABASE2=dev.1 = host=<redshift1> port=5432 dbname=dev
PGB_DATABASE3=dev.2 = host=<redshift2> port=5432 dbname=dev
PGB_ADMIN_USERS=myrsuser
PGB_ADMIN_PASSWORDS=mym0stsecurepass0wrd

The Kubernetes Service,pgbouncer-svc.yaml uses Network Load Balancer that points to the PGBouncer containers deployed in the Kubernetes cluster public subnets. We choose to allow public access to the PGBouncer endpoint for demonstration purposes but we can limit the endpoint to be internal for production systems. Note, you need to deploy the AWS Load Balancer Controller to automate the integration between EKS and NLB. The AWS Load Balancer Controller uses annotations like:

  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-name: "pgbouncer"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"

aws-load-balancer-scheme:"internet-facing" exposes the PGBouncer service publicly. aws-load-balancer-nlb-target-type: "ip" uses the PGBouncer pods as target rather than the EC2 instance.

Deployment steps

The EKS option automates the configuration and installation sections above. The deployment steps with EKS are:

./build.sh
  • Deploy PGBouncer replicas
kubectl apply -f pgbouncer-deploy.yaml
  • Deploy PGBouncer NLB
kubectl apply -f pgbouncer-svc.yaml
  • Discover the NLB endpoint
kubectl get svc pgbouncer
NAME        TYPE           CLUSTER-IP      EXTERNAL-IP                                              PORT(S)          AGE
pgbouncer   LoadBalancer   10.100.190.30   pgbouncer-14d32ab567b83e8f.elb.us-west-2.amazonaws.com   5432:31005/TCP   2d1h

Use the EXTERNAL-IP value, pgbouncer-14d32ab567b83e8f.elb.us-west-2.amazonaws.com as the endpoint to connect the database

How to view the pgbouncer-rr logs?

Pgbouncer default logging writes the process logs to stdout and stderr as well as a preconfigured log file. One can view the log in three ways:

[pgbouncer-rr-patch]$kubectl get po 
NAME                           READY   STATUS    RESTARTS   AGE
appload-6cb95bb44b-s6kgb       1/1     Running   0          41h
appselect-7678cc87b6-cdfq9     1/1     Running   0          41h
fluentbit-976pf                1/1     Running   0          3d2h
fluentbit-b8st4                1/1     Running   0          191d
pgbouncer-5dc7498984-f6js2     1/1     Running   0          41h

Use the pgbouncer-5dc7498984-f6js2 for getting the container shell

[pgbouncer-rr-patch]$kubectl exec -it pgbouncer-5dc7498984-f6js2 -- /bin/bash
[pgbouncer@pgbouncer-5dc7498984-f6js2 ~]$ vi pgbouncer.log
  • Get the container stdout and stderr. Viewing the container's stdout and stderr is preferred if the pgbouncer process is the only one running.
[pgbouncer-rr-patch]$kubectl logs pgbouncer-5dc7498984-f6js2

How to make configuration changes?

You must push changes to EKS if you need to modify the pgbouncer configuration or binaries. Changing configurations is done using the kubectl tool, but updating binaries requires rebuilding the docker image and pushing it to the image registry (ECR). Below we describe both methods.

kubectl apply -f ./pgbouncer-deploy.yaml
  • Making binaries changes The pgbouncer-rr docker specification is stored in Dockerfile. Say you want to upgrade the pgbouncer version from 1.15 to 1.16. You need to modify --branch "pgbouncer_1_15_0". Instead of:
RUN git clone https://github.com/pgbouncer/pgbouncer.git --branch "pgbouncer_1_15_0" && \

use

RUN git clone https://github.com/pgbouncer/pgbouncer.git --branch "pgbouncer_1_16_0" && \

Then you rebuild and push the changes to the docker image registry and rollout the changes in Kubernetes

./build.sh
kubectl rollout restart deploy pgbouncer

Other uses for pgbouncer-rr

It can be used for lots of things, really. In addition to the examples shown above, here are some other use cases suggested by colleagues:

  • Serve small or repetitive queries from PostgreSQL tables consisting of aggregated results.
  • Parse SQL for job tracking tablenames to implement workload management with job tracking tables on PostgreSQL, simplifies application development.
  • Leverage multiple Redshift clusters to serve dashboarding workloads with heavy concurrency requirements.
    • determine the appropriate route based on the current workload/state of cluster resources (always route to the cluster with least running queries, etc).
  • Use Query rewrite to parse SQL for queries which do not leverage the nuances of Redshift query design or query best practices either block these queries or re-write them for performance.
  • Use SQL parse to limit end users ability to access tables with adhoc queries. e.g. identify users scanning N+ years of data and instead issue a query which blocks them with a re-write: SELECT 'WARNING: scans against v_all_sales must be limited to no more than 30 days' as alert;
  • Use SQL parse to identify queries which filter on certain criteria to re-write the query to direct the query towards a specific table containing data matching that filter.

Actually, your use cases don't need to be limited to just routing and query rewriting! You could design a routing function that leaves the route unchanged, but which instead implements purposeful side effects, such as:

  • Publish custom CloudWatch metrics, enabling you to monitor specific query patterns and/or user interactions with your database(s).
  • Capture SQL DDL and COPY/Write Statements and wrap them into Kinesis put-records as input to the method described in Erik Swensson's most excellent post: Building Multi-AZ or Multi-Region Amazon Redshift Clusters

We'd love to hear your thoughts and ideas for pgbouncer-rr functions.

Legal Notice

Copyright 2015-2015 Amazon.com, Inc. or its affiliates. All Rights Reserved.

Licensed under the Amazon Software License (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/asl/ or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions and limitations under the License.

More Repositories

1

git-secrets

Prevents you from committing secrets and credentials into git repositories
Shell
11,616
star
2

llrt

LLRT (Low Latency Runtime) is an experimental, lightweight JavaScript runtime designed to address the growing demand for fast and efficient Serverless applications.
JavaScript
7,555
star
3

aws-shell

An integrated shell for working with the AWS CLI.
Python
7,116
star
4

autogluon

AutoGluon: AutoML for Image, Text, and Tabular Data
Python
4,348
star
5

aws-cloudformation-templates

A collection of useful CloudFormation templates
Python
4,302
star
6

mountpoint-s3

A simple, high-throughput file client for mounting an Amazon S3 bucket as a local file system.
Rust
3,986
star
7

gluonts

Probabilistic time series modeling in Python
Python
3,686
star
8

deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Scala
2,871
star
9

aws-lambda-rust-runtime

A Rust runtime for AWS Lambda
Rust
2,829
star
10

aws-sdk-rust

AWS SDK for the Rust Programming Language
2,754
star
11

amazon-redshift-utils

Amazon Redshift Utils contains utilities, scripts and view which are useful in a Redshift environment
Python
2,643
star
12

diagram-maker

A library to display an interactive editor for any graph-like data.
TypeScript
2,359
star
13

amazon-ecr-credential-helper

Automatically gets credentials for Amazon ECR on docker push/docker pull
Go
2,261
star
14

amazon-eks-ami

Packer configuration for building a custom EKS AMI
Shell
2,164
star
15

aws-lambda-powertools-python

A developer toolkit to implement Serverless best practices and increase developer velocity.
Python
2,148
star
16

aws-well-architected-labs

Hands on labs and code to help you learn, measure, and build using architectural best practices.
Python
1,834
star
17

aws-config-rules

[Node, Python, Java] Repository of sample Custom Rules for AWS Config.
Python
1,473
star
18

smithy

Smithy is a protocol-agnostic interface definition language and set of tools for generating clients, servers, and documentation for any programming language.
Java
1,356
star
19

aws-support-tools

Tools and sample code provided by AWS Premium Support.
Python
1,290
star
20

open-data-registry

A registry of publicly available datasets on AWS
Python
1,199
star
21

sockeye

Sequence-to-sequence framework with a focus on Neural Machine Translation based on PyTorch
Python
1,181
star
22

aws-lambda-powertools-typescript

Powertools is a developer toolkit to implement Serverless best practices and increase developer velocity.
TypeScript
1,179
star
23

dgl-ke

High performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings.
Python
1,144
star
24

aws-sdk-ios-samples

This repository has samples that demonstrate various aspects of the AWS SDK for iOS, you can get the SDK source on Github https://github.com/aws-amplify/aws-sdk-ios/
Swift
1,038
star
25

aws-sdk-android-samples

This repository has samples that demonstrate various aspects of the AWS SDK for Android, you can get the SDK source on Github https://github.com/aws-amplify/aws-sdk-android/
Java
1,018
star
26

aws-solutions-constructs

The AWS Solutions Constructs Library is an open-source extension of the AWS Cloud Development Kit (AWS CDK) that provides multi-service, well-architected patterns for quickly defining solutions
TypeScript
1,013
star
27

aws-cfn-template-flip

Tool for converting AWS CloudFormation templates between JSON and YAML formats.
Python
981
star
28

amazon-kinesis-video-streams-webrtc-sdk-c

Amazon Kinesis Video Streams Webrtc SDK is for developers to install and customize realtime communication between devices and enable secure streaming of video, audio to Kinesis Video Streams.
C
975
star
29

aws-lambda-go-api-proxy

lambda-go-api-proxy makes it easy to port APIs written with Go frameworks such as Gin (https://gin-gonic.github.io/gin/ ) to AWS Lambda and Amazon API Gateway.
Go
967
star
30

eks-node-viewer

EKS Node Viewer
Go
947
star
31

multi-model-server

Multi Model Server is a tool for serving neural net models for inference
Java
936
star
32

ec2-spot-labs

Collection of tools and code examples to demonstrate best practices in using Amazon EC2 Spot Instances.
Jupyter Notebook
905
star
33

aws-mobile-appsync-sdk-js

JavaScript library files for Offline, Sync, Sigv4. includes support for React Native
TypeScript
902
star
34

aws-saas-boost

AWS SaaS Boost is a ready-to-use toolset that removes the complexity of successfully running SaaS workloads in the AWS cloud.
Java
901
star
35

fargatecli

CLI for AWS Fargate
Go
891
star
36

aws-api-gateway-developer-portal

A Serverless Developer Portal for easily publishing and cataloging APIs
JavaScript
879
star
37

ecs-refarch-continuous-deployment

ECS Reference Architecture for creating a flexible and scalable deployment pipeline to Amazon ECS using AWS CodePipeline
Shell
842
star
38

fortuna

A Library for Uncertainty Quantification.
Python
836
star
39

dynamodb-data-mapper-js

A schema-based data mapper for Amazon DynamoDB.
TypeScript
818
star
40

goformation

GoFormation is a Go library for working with CloudFormation templates.
Go
812
star
41

flowgger

A fast data collector in Rust
Rust
796
star
42

aws-js-s3-explorer

AWS JavaScript S3 Explorer is a JavaScript application that uses AWS's JavaScript SDK and S3 APIs to make the contents of an S3 bucket easy to browse via a web browser.
HTML
771
star
43

aws-icons-for-plantuml

PlantUML sprites, macros, and other includes for Amazon Web Services services and resources
Python
737
star
44

aws-devops-essential

In few hours, quickly learn how to effectively leverage various AWS services to improve developer productivity and reduce the overall time to market for new product capabilities.
Shell
674
star
45

aws-apigateway-lambda-authorizer-blueprints

Blueprints and examples for Lambda-based custom Authorizers for use in API Gateway.
C#
660
star
46

amazon-ecs-nodejs-microservices

Reference architecture that shows how to take a Node.js application, containerize it, and deploy it as microservices on Amazon Elastic Container Service.
Shell
650
star
47

amazon-kinesis-client

Client library for Amazon Kinesis
Java
621
star
48

aws-deployment-framework

The AWS Deployment Framework (ADF) is an extensive and flexible framework to manage and deploy resources across multiple AWS accounts and regions based on AWS Organizations.
Python
617
star
49

aws-lambda-web-adapter

Run web applications on AWS Lambda
Rust
610
star
50

dgl-lifesci

Python package for graph neural networks in chemistry and biology
Python
594
star
51

aws-security-automation

Collection of scripts and resources for DevSecOps and Automated Incident Response Security
Python
585
star
52

aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
Python
565
star
53

python-deequ

Python API for Deequ
Python
535
star
54

aws-athena-query-federation

The Amazon Athena Query Federation SDK allows you to customize Amazon Athena with your own data sources and code.
Java
507
star
55

data-on-eks

DoEKS is a tool to build, deploy and scale Data & ML Platforms on Amazon EKS
HCL
504
star
56

shuttle

Shuttle is a library for testing concurrent Rust code
Rust
465
star
57

ami-builder-packer

An example of an AMI Builder using CI/CD with AWS CodePipeline, AWS CodeBuild, Hashicorp Packer and Ansible.
465
star
58

route53-dynamic-dns-with-lambda

A Dynamic DNS system built with API Gateway, Lambda & Route 53.
Python
461
star
59

aws-servicebroker

AWS Service Broker
Python
461
star
60

amazon-ecs-local-container-endpoints

A container that provides local versions of the ECS Task Metadata Endpoint and ECS Task IAM Roles Endpoint.
Go
456
star
61

datawig

Imputation of missing values in tables.
JavaScript
454
star
62

aws-jwt-verify

JS library for verifying JWTs signed by Amazon Cognito, and any OIDC-compatible IDP that signs JWTs with RS256, RS384, and RS512
TypeScript
452
star
63

amazon-dynamodb-lock-client

The AmazonDynamoDBLockClient is a general purpose distributed locking library built on top of DynamoDB. It supports both coarse-grained and fine-grained locking.
Java
447
star
64

ecs-refarch-service-discovery

An EC2 Container Service Reference Architecture for providing Service Discovery to containers using CloudWatch Events, Lambda and Route 53 private hosted zones.
Go
444
star
65

ssosync

Populate AWS SSO directly with your G Suite users and groups using either a CLI or AWS Lambda
Go
443
star
66

handwritten-text-recognition-for-apache-mxnet

This repository lets you train neural networks models for performing end-to-end full-page handwriting recognition using the Apache MXNet deep learning frameworks on the IAM Dataset.
Jupyter Notebook
442
star
67

awscli-aliases

Repository for AWS CLI aliases.
437
star
68

aws-config-rdk

The AWS Config Rules Development Kit helps developers set up, author and test custom Config rules. It contains scripts to enable AWS Config, create a Config rule and test it with sample ConfigurationItems.
Python
436
star
69

snapchange

Lightweight fuzzing of a memory snapshot using KVM
Rust
427
star
70

aws-security-assessment-solution

An AWS tool to help you create a point in time assessment of your AWS account using Prowler and Scout as well as optional AWS developed ransomware checks.
423
star
71

lambda-refarch-mapreduce

This repo presents a reference architecture for running serverless MapReduce jobs. This has been implemented using AWS Lambda and Amazon S3.
JavaScript
422
star
72

aws-lambda-cpp

C++ implementation of the AWS Lambda runtime
C++
409
star
73

aws-cloudsaga

AWS CloudSaga - Simulate security events in AWS
Python
389
star
74

amazon-kinesis-producer

Amazon Kinesis Producer Library
C++
385
star
75

soci-snapshotter

Go
383
star
76

serverless-photo-recognition

A collection of 3 lambda functions that are invoked by Amazon S3 or Amazon API Gateway to analyze uploaded images with Amazon Rekognition and save picture labels to ElasticSearch (written in Kotlin)
Kotlin
378
star
77

amazon-sagemaker-workshop

Amazon SageMaker workshops: Introduction, TensorFlow in SageMaker, and more
Jupyter Notebook
378
star
78

serverless-rules

Compilation of rules to validate infrastructure-as-code templates against recommended practices for serverless applications.
Go
378
star
79

logstash-output-amazon_es

Logstash output plugin to sign and export logstash events to Amazon Elasticsearch Service
Ruby
374
star
80

kinesis-aggregation

AWS libraries/modules for working with Kinesis aggregated record data
Java
370
star
81

smithy-rs

Code generation for the AWS SDK for Rust, as well as server and generic smithy client generation.
Rust
369
star
82

syne-tune

Large scale and asynchronous Hyperparameter and Architecture Optimization at your fingertips.
Python
363
star
83

aws-sdk-kotlin

Multiplatform AWS SDK for Kotlin
Kotlin
359
star
84

dynamodb-transactions

Java
354
star
85

amazon-kinesis-client-python

Amazon Kinesis Client Library for Python
Python
354
star
86

aws-serverless-data-lake-framework

Enterprise-grade, production-hardened, serverless data lake on AWS
Python
349
star
87

threat-composer

A simple threat modeling tool to help humans to reduce time-to-value when threat modeling
TypeScript
346
star
88

amazon-kinesis-agent

Continuously monitors a set of log files and sends new data to the Amazon Kinesis Stream and Amazon Kinesis Firehose in near-real-time.
Java
342
star
89

rds-snapshot-tool

The Snapshot Tool for Amazon RDS automates the task of creating manual snapshots, copying them into a different account and a different region, and deleting them after a specified number of days
Python
337
star
90

amazon-kinesis-scaling-utils

The Kinesis Scaling Utility is designed to give you the ability to scale Amazon Kinesis Streams in the same way that you scale EC2 Auto Scaling groups – up or down by a count or as a percentage of the total fleet. You can also simply scale to an exact number of Shards. There is no requirement for you to manage the allocation of the keyspace to Shards when using this API, as it is done automatically.
Java
333
star
91

amazon-kinesis-video-streams-producer-sdk-cpp

Amazon Kinesis Video Streams Producer SDK for C++ is for developers to install and customize for their connected camera and other devices to securely stream video, audio, and time-encoded data to Kinesis Video Streams.
C++
332
star
92

landing-zone-accelerator-on-aws

Deploy a multi-account cloud foundation to support highly-regulated workloads and complex compliance requirements.
TypeScript
330
star
93

route53-infima

Library for managing service-level fault isolation using Amazon Route 53.
Java
326
star
94

aws-automated-incident-response-and-forensics

326
star
95

mxboard

Logging MXNet data for visualization in TensorBoard.
Python
326
star
96

aws-sigv4-proxy

This project signs and proxies HTTP requests with Sigv4
Go
325
star
97

statelint

A Ruby gem that provides a command-line validator for Amazon States Language JSON files.
Ruby
324
star
98

graphstorm

Enterprise graph machine learning framework for billion-scale graphs for ML scientists and data scientists.
Python
317
star
99

ecs-nginx-reverse-proxy

Reference architecture for deploying Nginx on ECS, both as a basic static resource server, and as a reverse proxy in front of a dynamic application server.
Nginx
317
star
100

simplebeerservice

Simple Beer Service (SBS) is a cloud-connected kegerator that streams live sensor data to AWS.
JavaScript
316
star