• Stars
    star
    3,537
  • Rank 11,988 (Top 0.3 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created about 5 years ago
  • Updated 9 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

AWS SDK for pandas (awswrangler)

AWS Data Wrangler is now AWS SDK for pandas (awswrangler). We’re changing the name we use when we talk about the library, but everything else will stay the same. You’ll still be able to install using pip install awswrangler and you won’t need to change any of your code. As part of this change, we’ve moved the library from AWS Labs to the main AWS GitHub organisation but, thanks to the GitHub’s redirect feature, you’ll still be able to access the project by its old URLs until you update your bookmarks. Our documentation has also moved to aws-sdk-pandas.readthedocs.io, but old bookmarks will redirect to the new site.

Pandas on AWS

Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

AWS SDK for pandas tracker

An AWS Professional Service open source initiative | [email protected]

Release Python Version Code style: black License

Checked with mypy Static Checking Documentation Status

Source Downloads Installation Command
PyPi PyPI Downloads pip install awswrangler
Conda Conda Downloads conda install -c conda-forge awswrangler

⚠️ Starting version 3.0, optional modules must be installed explicitly:
➡️pip install 'awswrangler[redshift]'

Powered By

Table of contents

Quick Start

Installation command: pip install awswrangler

⚠️ Starting version 3.0, optional modules must be installed explicitly:
➡️pip install 'awswrangler[redshift]'

import awswrangler as wr
import pandas as pd
from datetime import datetime

df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})

# Storing data on Data Lake
wr.s3.to_parquet(
    df=df,
    path="s3://bucket/dataset/",
    dataset=True,
    database="my_db",
    table="my_table"
)

# Retrieving the data directly from Amazon S3
df = wr.s3.read_parquet("s3://bucket/dataset/", dataset=True)

# Retrieving the data from Amazon Athena
df = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")

# Get a Redshift connection from Glue Catalog and retrieving data from Redshift Spectrum
con = wr.redshift.connect("my-glue-connection")
df = wr.redshift.read_sql_query("SELECT * FROM external_schema.my_table", con=con)
con.close()

# Amazon Timestream Write
df = pd.DataFrame({
    "time": [datetime.now(), datetime.now()],   
    "my_dimension": ["foo", "boo"],
    "measure": [1.0, 1.1],
})
rejected_records = wr.timestream.write(df,
    database="sampleDB",
    table="sampleTable",
    time_col="time",
    measure_col="measure",
    dimensions_cols=["my_dimension"],
)

# Amazon Timestream Query
wr.timestream.query("""
SELECT time, measure_value::double, my_dimension
FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
""")

At scale

AWS SDK for pandas can also run your workflows at scale by leveraging Modin and Ray. Both projects aim to speed up data workloads by distributing processing over a cluster of workers.

The quickest way to get started is to use AWS Glue with Ray. Read our docs, our blog, or head to our latest tutorials to discover even more features.

Read The Docs

Getting Help

The best way to interact with our team is through GitHub. You can open an issue and choose from one of our templates for bug reports, feature requests... You may also find help on these community resources:

Community Resources

Please send a Pull Request with your resource reference and @githubhandle.

Logging

Enabling internal logging examples:

import logging
logging.basicConfig(level=logging.INFO, format="[%(name)s][%(funcName)s] %(message)s")
logging.getLogger("awswrangler").setLevel(logging.DEBUG)
logging.getLogger("botocore.credentials").setLevel(logging.CRITICAL)

Into AWS lambda:

import logging
logging.getLogger("awswrangler").setLevel(logging.DEBUG)

Who uses AWS SDK for pandas?

Knowing which companies are using this library is important to help prioritize the project internally. If you would like us to include your company’s name and/or logo in the README file to indicate that your company is using the AWS SDK for pandas, please raise a "Support Us" issue. If you would like us to display your company’s logo, please raise a linked pull request to provide an image file for the logo. Note that by raising a Support Us issue (and related pull request), you are granting AWS permission to use your company’s name (and logo) for the limited purpose described here and you are confirming that you have authority to grant such permission.

More Repositories

1

aws-cli

Universal Command Line Interface for Amazon Web Services
Python
14,304
star
2

aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
JavaScript
10,440
star
3

chalice

Python Serverless Microframework for AWS
Python
10,191
star
4

amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
Jupyter Notebook
9,297
star
5

serverless-application-model

The AWS Serverless Application Model (AWS SAM) transform is a AWS CloudFormation macro that transforms SAM templates into CloudFormation templates.
Python
9,235
star
6

aws-sdk-js

AWS SDK for JavaScript in the browser and Node.js
JavaScript
7,476
star
7

aws-sam-cli

CLI tool to build, test, debug, and deploy Serverless applications using AWS SAM
Python
6,443
star
8

aws-sdk-php

Official repository of the AWS SDK for PHP (@awsforphp)
PHP
5,886
star
9

containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
Shell
5,119
star
10

karpenter

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
Go
4,615
star
11

s2n-tls

An implementation of the TLS/SSL protocols
C
4,447
star
12

aws-sdk-java

The official AWS SDK for Java 1.x. The AWS SDK for Java 2.x is available here: https://github.com/aws/aws-sdk-java-v2/
4,064
star
13

aws-lambda-go

Libraries, samples and tools to help Go developers develop AWS Lambda functions.
Go
3,498
star
14

aws-sdk-ruby

The official AWS SDK for Ruby.
Ruby
3,462
star
15

copilot-cli

The AWS Copilot CLI is a tool for developers to build, release and operate production ready containerized applications on AWS App Runner or Amazon ECS on AWS Fargate.
Go
3,274
star
16

amazon-freertos

DEPRECATED - See README.md
C
2,535
star
17

aws-sdk-js-v3

Modularized AWS SDK for JavaScript.
TypeScript
2,476
star
18

jsii

jsii allows code in any language to naturally interact with JavaScript classes. It is the technology that enables the AWS Cloud Development Kit to deliver polyglot libraries from a single codebase!
TypeScript
2,371
star
19

aws-sdk-go-v2

AWS SDK for the Go programming language.
Go
2,298
star
20

amazon-vpc-cni-k8s

Networking plugin repository for pod networking in Kubernetes using Elastic Network Interfaces on AWS
Go
2,071
star
21

sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
Python
2,038
star
22

amazon-ecs-agent

Amazon Elastic Container Service Agent
Go
2,005
star
23

lumberyard

Amazon Lumberyard is a free AAA game engine deeply integrated with AWS and Twitch – with full source.
C++
1,965
star
24

aws-sdk-net

The official AWS SDK for .NET. For more information on the AWS SDK for .NET, see our web site:
1,945
star
25

eks-anywhere

Run Amazon EKS on your own infrastructure 🚀
Go
1,899
star
26

aws-eks-best-practices

A best practices guide for day 2 operations, including operational excellence, security, reliability, performance efficiency, and cost optimization.
Python
1,853
star
27

aws-sdk-java-v2

The official AWS SDK for Java - Version 2
Java
1,822
star
28

aws-sdk-cpp

AWS SDK for C++
1,779
star
29

amazon-ecs-cli

The Amazon ECS CLI enables users to run their applications on ECS/Fargate using the Docker Compose file format, quickly provision resources, push/pull images in ECR, and monitor running applications on ECS/Fargate.
Go
1,725
star
30

aws-sdk-php-laravel

A Laravel 5+ (and 4) service provider for the AWS SDK for PHP
PHP
1,589
star
31

aws-node-termination-handler

Gracefully handle EC2 instance shutdown within Kubernetes
Go
1,443
star
32

serverless-java-container

A Java wrapper to run Spring, Spring Boot, Jersey, and other apps inside AWS Lambda.
Java
1,439
star
33

aws-lambda-dotnet

Libraries, samples and tools to help .NET Core developers develop AWS Lambda functions.
C#
1,430
star
34

aws-fpga

Official repository of the AWS EC2 FPGA Hardware and Software Development Kit
VHDL
1,380
star
35

eks-distro

Amazon EKS Distro (EKS-D) is a Kubernetes distribution based on and used by Amazon Elastic Kubernetes Service (EKS) to create reliable and secure Kubernetes clusters.
Shell
1,263
star
36

aws-toolkit-vscode

CodeWhisperer, CodeCatalyst, Local Lambda debug, SAM/CFN syntax, ECS Terminal, AWS resources
TypeScript
1,150
star
37

eks-charts

Amazon EKS Helm chart repository
Mustache
1,142
star
38

s2n-quic

An implementation of the IETF QUIC protocol
Rust
1,066
star
39

opsworks-cookbooks

Chef Cookbooks for the AWS OpsWorks Service
Ruby
1,058
star
40

aws-codebuild-docker-images

Official AWS CodeBuild repository for managed Docker images http://docs.aws.amazon.com/codebuild/latest/userguide/build-env-ref.html
Dockerfile
1,032
star
41

amazon-ssm-agent

An agent to enable remote management of your EC2 instances, on-premises servers, or virtual machines (VMs).
Go
975
star
42

aws-iot-device-sdk-js

SDK for connecting to AWS IoT from a device using JavaScript/Node.js
JavaScript
957
star
43

aws-iot-device-sdk-embedded-C

SDK for connecting to AWS IoT from a device using embedded C.
C
926
star
44

aws-health-tools

The samples provided in AWS Health Tools can help users to build automation and customized alerting in response to AWS Health events.
Python
887
star
45

aws-app-mesh-examples

AWS App Mesh is a service mesh that you can use with your microservices to manage service to service communication.
Shell
844
star
46

deep-learning-containers

AWS Deep Learning Containers (DLCs) are a set of Docker images for training and serving models in TensorFlow, TensorFlow 2, PyTorch, and MXNet.
Python
800
star
47

aws-graviton-getting-started

Helping developers to use AWS Graviton2 and Graviton3 processors which power the 6th and 7th generation of Amazon EC2 instances (C6g[d], M6g[d], R6g[d], T4g, X2gd, C6gn, I4g, Im4gn, Is4gen, G5g, C7g[d][n], M7g[d], R7g[d]).
Python
788
star
48

aws-parallelcluster

AWS ParallelCluster is an AWS supported Open Source cluster management tool to deploy and manage HPC clusters in the AWS cloud.
Python
782
star
49

aws-lambda-runtime-interface-emulator

Go
771
star
50

aws-toolkit-jetbrains

AWS Toolkit for JetBrains - a plugin for interacting with AWS from JetBrains IDEs
Kotlin
723
star
51

aws-iot-device-sdk-python

SDK for connecting to AWS IoT from a device using Python.
Python
670
star
52

graph-notebook

Library extending Jupyter notebooks to integrate with Apache TinkerPop, openCypher, and RDF SPARQL.
Jupyter Notebook
663
star
53

amazon-chime-sdk-js

A JavaScript client library for integrating multi-party communications powered by the Amazon Chime service.
TypeScript
655
star
54

amazon-ec2-instance-selector

A CLI tool and go library which recommends instance types based on resource criteria like vcpus and memory
Go
642
star
55

studio-lab-examples

Example notebooks for working with SageMaker Studio Lab. Sign up for an account at the link below!
Jupyter Notebook
566
star
56

aws-sdk-rails

Official repository for the aws-sdk-rails gem, which integrates the AWS SDK for Ruby with Ruby on Rails.
Ruby
554
star
57

aws-mwaa-local-runner

This repository provides a command line interface (CLI) utility that replicates an Amazon Managed Workflows for Apache Airflow (MWAA) environment locally.
Shell
553
star
58

amazon-eks-pod-identity-webhook

Amazon EKS Pod Identity Webhook
Go
534
star
59

event-ruler

Event Ruler is a Java library that allows matching many thousands of Events per second to any number of expressive and sophisticated rules.
Java
516
star
60

aws-lambda-base-images

506
star
61

aws-lambda-java-libs

Official mirror for interface definitions and helper classes for Java code running on the AWS Lambda platform.
C++
502
star
62

aws-appsync-community

The AWS AppSync community
HTML
495
star
63

dotnet

GitHub home for .NET development on AWS
487
star
64

aws-cdk-rfcs

RFCs for the AWS CDK
JavaScript
476
star
65

sagemaker-training-toolkit

Train machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.
Python
468
star
66

aws-elastic-beanstalk-cli-setup

Simplified EB CLI installation mechanism.
Python
453
star
67

aws-sam-cli-app-templates

Python
435
star
68

amazon-cloudwatch-agent

CloudWatch Agent enables you to collect and export host-level metrics and logs on instances running Linux or Windows server.
Go
403
star
69

secrets-store-csi-driver-provider-aws

The AWS provider for the Secrets Store CSI Driver allows you to fetch secrets from AWS Secrets Manager and AWS Systems Manager Parameter Store, and mount them into Kubernetes pods.
Go
393
star
70

amazon-braket-examples

Example notebooks that show how to apply quantum computing in Amazon Braket.
Python
376
star
71

aws-for-fluent-bit

The source of the amazon/aws-for-fluent-bit container image
Shell
375
star
72

aws-extensions-for-dotnet-cli

Extensions to the dotnet CLI to simplify the process of building and publishing .NET Core applications to AWS services
C#
346
star
73

aws-sdk-php-symfony

PHP
346
star
74

aws-app-mesh-roadmap

AWS App Mesh is a service mesh that you can use with your microservices to manage service to service communication
344
star
75

aws-iot-device-sdk-python-v2

Next generation AWS IoT Client SDK for Python using the AWS Common Runtime
Python
335
star
76

constructs

Define composable configuration models through code
TypeScript
332
star
77

aws-lambda-builders

Python library to compile, build & package AWS Lambda functions for several runtimes & framework
Python
325
star
78

aws-codedeploy-agent

Host Agent for AWS CodeDeploy
Ruby
316
star
79

aws-sdk-ruby-record

Official repository for the aws-record gem, an abstraction for Amazon DynamoDB.
Ruby
313
star
80

aws-ops-wheel

The AWS Ops Wheel is a randomizer that biases for options that haven’t come up recently; you can also outright cheat and specify the next result to be generated.
JavaScript
308
star
81

aws-xray-sdk-python

AWS X-Ray SDK for the Python programming language
Python
304
star
82

sagemaker-inference-toolkit

Serve machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.
Python
303
star
83

aws-pdk

The AWS PDK provides building blocks for common patterns together with development tools to manage and build your projects.
TypeScript
288
star
84

amazon-ivs-react-native-player

A React Native wrapper for the Amazon IVS iOS and Android player SDKs.
TypeScript
283
star
85

sagemaker-spark

A Spark library for Amazon SageMaker.
Scala
282
star
86

pg_tle

Framework for building trusted language extensions for PostgreSQL
C
282
star
87

apprunner-roadmap

This is the public roadmap for AWS App Runner.
280
star
88

graph-explorer

React-based web application that enables users to visualize both property graph and RDF data and explore connections between data without having to write graph queries.
TypeScript
278
star
89

aws-xray-sdk-go

AWS X-Ray SDK for the Go programming language.
Go
274
star
90

aws-toolkit-eclipse

(End of life: May 31, 2023) AWS Toolkit for Eclipse
Java
273
star
91

elastic-beanstalk-roadmap

AWS Elastic Beanstalk roadmap
272
star
92

aws-logging-dotnet

.NET Libraries for integrating Amazon CloudWatch Logs with popular .NET logging libraries
C#
271
star
93

sagemaker-tensorflow-training-toolkit

Toolkit for running TensorFlow training scripts on SageMaker. Dockerfiles used for building SageMaker TensorFlow Containers are at https://github.com/aws/deep-learning-containers.
Python
267
star
94

elastic-load-balancing-tools

AWS Elastic Load Balancing Tools
Java
262
star
95

aws-step-functions-data-science-sdk-python

Step Functions Data Science SDK for building machine learning (ML) workflows and pipelines on AWS
Python
261
star
96

efs-utils

Utilities for Amazon Elastic File System (EFS)
Python
257
star
97

amazon-braket-sdk-python

A Python SDK for interacting with quantum devices on Amazon Braket
Python
254
star
98

aws-xray-sdk-node

The official AWS X-Ray SDK for Node.js.
JavaScript
248
star
99

Trusted-Advisor-Tools

The sample functions provided help to automate AWS Trusted Advisor best practices using Amazon Cloudwatch events and AWS Lambda.
Python
242
star
100

amazon-chime-sdk-component-library-react

Amazon Chime React Component Library with integrations with the Amazon Chime SDK.
TypeScript
240
star