
Amazon Simple Storage Service (S3) API Client

AWS S3 Client Package

aws.s3 is a simple client package for the Amazon Web Services (AWS) Simple Storage Service (S3) REST API. While other packages currently connect R to S3, they do so incompletely (mapping only some of the API endpoints to R) and most implementations rely on the AWS command-line tools, which users may not have installed on their system.

To use the package, you will need an AWS account and to enter your credentials into R. Your keypair can be generated on the IAM Management Console under the heading Access Keys. Note that the secret key is shown only once, when it is generated, so save it in a secure location. New keypairs can be generated at any time if yours has been lost, stolen, or forgotten. The aws.iam package provides tools for working with IAM, including creating roles, users, groups, and credentials programmatically; it is not needed to use IAM credentials.

A detailed description of how credentials can be specified is provided at: https://github.com/cloudyr/aws.signature/. The easiest way is to set environment variables on the command line prior to starting R, or via an Renviron.site or .Renviron file, which R reads to set environment variables during startup (see ?Startup). They can also be set within R:

Sys.setenv("AWS_ACCESS_KEY_ID" = "mykey",
           "AWS_SECRET_ACCESS_KEY" = "mysecretkey",
           "AWS_DEFAULT_REGION" = "us-east-1",
           "AWS_SESSION_TOKEN" = "mytoken")

Remarks:

  • To use the package with S3-compatible storage provided by other cloud platforms, set the AWS_S3_ENDPOINT environment variable to the appropriate host name. By default, the package uses the AWS endpoint: s3.amazonaws.com. Note that you may have to set region="" in the request as well if the back-end uses only a single server with no concept of regions.
  • To use the package from an EC2 instance, you need to install aws.ec2metadata. This way, credentials will be obtained from the machine's role.
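
As a sketch of the first remark, the endpoint can be set like any other environment variable. The host name storage.example.com and the bucket name are placeholders, list_custom_bucket() is a hypothetical wrapper (not part of aws.s3), and region = "" is only needed for back-ends that have no concept of regions.

```r
# Placeholder host name; replace with your provider's S3-compatible endpoint
Sys.setenv("AWS_S3_ENDPOINT" = "storage.example.com")

# Hypothetical wrapper, defined so that nothing is requested until it is called.
# region = "" follows the remark above for single-server back-ends.
list_custom_bucket <- function(bucket = "my_bucket") {
  aws.s3::get_bucket(bucket = bucket, region = "")
}
```

Calling list_custom_bucket() then sends the request to the configured host instead of s3.amazonaws.com.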

Code Examples

The package can be used to examine publicly accessible S3 buckets and publicly accessible S3 objects without registering an AWS account. If credentials have been generated in the AWS console and made available in R, you can find your available buckets using:

library("aws.s3")
bucketlist()

If your credentials are incorrect, this function will return an error. Otherwise, it will return a list of information about the buckets you have access to.
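
If a script should keep running when the listing fails, the call can be wrapped in tryCatch(). This is a minimal sketch that assumes the failure surfaces as an ordinary R error condition; try_listing() is a hypothetical helper, not part of aws.s3.

```r
# Hypothetical helper: run a listing function and turn an error
# (for example, one caused by bad credentials) into NULL plus a message.
try_listing <- function(list_fun) {
  tryCatch(
    list_fun(),
    error = function(e) {
      message("Request failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Usage with aws.s3 (requires the package, credentials, and network access):
# buckets <- try_listing(aws.s3::bucketlist)
```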

Buckets

To get a listing of all objects in a public bucket, simply call

get_bucket(bucket = '1000genomes')

Amazon maintains a listing of Public Data Sets on S3.

To get a listing of all objects in a private bucket, pass your AWS key and secret in as parameters. (As described above, all functions in aws.s3 look for your keys in environment variables by default, which greatly simplifies making an S3 request.)

# specify keys in-line (replace the placeholder strings with your own values)
get_bucket(
  bucket = 'my_bucket',
  key = "YOUR_AWS_ACCESS_KEY",
  secret = "YOUR_AWS_SECRET_ACCESS_KEY"
)

# specify keys as environment variables
Sys.setenv("AWS_ACCESS_KEY_ID" = "mykey",
           "AWS_SECRET_ACCESS_KEY" = "mysecretkey")
get_bucket("my_bucket")
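
When relying on environment variables, as in the second form above, a quick check that they are visible to the R session can save debugging time. This is a minimal sketch using base R only; missing_aws_vars() is a hypothetical helper, not part of aws.s3, and it reports only variable names, never the secret values.

```r
# Hypothetical helper (not part of aws.s3): report which of the
# credential environment variables are unset or empty.
missing_aws_vars <- function(vars = c("AWS_ACCESS_KEY_ID",
                                      "AWS_SECRET_ACCESS_KEY")) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

# An empty result means both variables are set in this session
missing_aws_vars()
```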

S3 can be a bit picky about region specifications. bucketlist() will return buckets from all regions, but all other functions require specifying a region. A default of "us-east-1" is relied upon if none is specified explicitly and the correct region can't be detected automatically. (Note: using an incorrect region is one of the most common - and hardest to figure out - errors when working with S3.)
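
One way to avoid surprises is to decide on the region explicitly before each call. The sketch below only illustrates the fallback order described above (explicit value, then AWS_DEFAULT_REGION, then "us-east-1"); it is not the package's internal detection logic, and resolve_region() is a hypothetical helper.

```r
# Hypothetical helper: pick a region using an explicit value first,
# then the AWS_DEFAULT_REGION environment variable, then "us-east-1".
resolve_region <- function(region = NULL) {
  if (!is.null(region)) return(region)
  env_region <- Sys.getenv("AWS_DEFAULT_REGION", unset = "")
  if (nzchar(env_region)) env_region else "us-east-1"
}

# Usage with aws.s3 (requires the package, credentials, and network access):
# get_bucket("my_bucket", region = resolve_region("eu-west-1"))
```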

Objects

This package contains many functions. The following are those that will be useful for working with objects in S3:

  1. bucketlist() returns a data frame of the buckets to which the user has access.
  2. get_bucket() and get_bucket_df() provide a list and data frame, respectively, of objects in a given bucket.
  3. object_exists() provides a logical for whether an object exists. bucket_exists() provides the same for buckets.
  4. s3read_using() provides a generic interface for reading from S3 objects using a user-defined function. s3write_using() provides a generic interface for writing to S3 objects using a user-defined function.
  5. get_object() returns a raw vector representation of an S3 object. This might then be parsed in a number of ways, such as rawToChar(), xml2::read_xml(), jsonlite::fromJSON(), and so forth depending on the file format of the object. save_object() saves an S3 object to a specified local file without reading it into memory.
  6. s3connection() provides a binary readable connection to stream an S3 object into R. This can be useful for reading very large files. get_object() also allows reading byte ranges of objects (see the documentation for examples).
  7. put_object() stores a local file into an S3 bucket. The multipart = TRUE argument can be used to upload large files in pieces.
  8. s3save() saves one or more in-memory R objects to an .Rdata file in S3 (analogously to save()). s3saveRDS() is an analogue for saveRDS(). s3load() loads one or more objects into memory from an .Rdata file stored in S3 (analogously to load()). s3readRDS() is an analogue for readRDS().
  9. s3source() sources an R script directly from S3.

They behave as you would probably expect:

# save an in-memory R object into S3
s3save(mtcars, bucket = "my_bucket", object = "mtcars.Rdata")

# `load()` R objects from the file
s3load("mtcars.Rdata", bucket = "my_bucket")

# get file as raw vector
get_object("mtcars.Rdata", bucket = "my_bucket")
# alternative 'S3 URI' syntax:
get_object("s3://my_bucket/mtcars.Rdata")

# save file locally
save_object("mtcars.Rdata", file = "mtcars.Rdata", bucket = "my_bucket")

# put local file into S3
put_object(file = "mtcars.Rdata", object = "mtcars2.Rdata", bucket = "my_bucket")
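
The raw vector returned by get_object() still has to be parsed, and s3read_using() / s3write_using() can do the read or write in one step. This is a minimal sketch, assuming a CSV object named "data.csv" in "my_bucket" (both placeholders); parse_csv_raw() is a hypothetical helper, and the local raw vector stands in for what get_object() would return.

```r
# Hypothetical helper: parse a raw vector holding CSV text into a data frame
parse_csv_raw <- function(raw_obj) {
  read.csv(text = rawToChar(raw_obj), stringsAsFactors = FALSE)
}

# Local stand-in for the raw vector returned by get_object()
raw_obj <- charToRaw("x,y\n1,2\n3,4\n")
df <- parse_csv_raw(raw_obj)

# With aws.s3, credentials, and network access, the equivalent calls would be:
# df <- parse_csv_raw(aws.s3::get_object("data.csv", bucket = "my_bucket"))
# df <- aws.s3::s3read_using(FUN = read.csv, object = "data.csv", bucket = "my_bucket")
# aws.s3::s3write_using(df, FUN = write.csv, object = "data.csv", bucket = "my_bucket")
```

For other formats, swap read.csv for a suitable reader, such as jsonlite::fromJSON() for JSON text.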

Installation

Latest stable release from CRAN:

install.packages("aws.s3", repos = "https://cloud.R-project.org")

Latest development version from RForge.net:

install.packages("aws.s3", repos = c("https://RForge.net", "https://cloud.R-project.org"))

On Windows, you may need to add INSTALL_opts = "--no-multiarch" to the install.packages() call.
