  • Stars: 312
  • Rank: 134,133 (Top 3%)
  • Language: Python
  • License: MIT License
  • Created: over 4 years ago
  • Updated: over 1 year ago

Repository Details

Python pathlib-style classes for cloud storage services such as Amazon S3, Azure Blob Storage, and Google Cloud Storage.

Our goal is to be the meringue of file management libraries: the subtle sweetness of pathlib working in harmony with the ethereal lightness of the cloud.

A Python library with classes that mimic pathlib.Path's interface for URIs from different cloud storage services.

from cloudpathlib import CloudPath

with CloudPath("s3://bucket/filename.txt").open("w+") as f:
    f.write("Send my changes to the cloud!")

Why use cloudpathlib?

  • Familiar: If you know how to interact with Path, you know how to interact with CloudPath. All of the cloud-relevant Path methods are implemented.
  • Supported clouds: AWS S3, Google Cloud Storage, and Azure Blob Storage are implemented. FTP is on the way.
  • Extensible: The base classes do most of the work generically, so implementing two small classes MyPath and MyClient is all you need to add support for a new cloud storage service.
  • Read/write support: Reading just works. Writing with write_text, write_bytes, or .open('w') uploads your changes to cloud storage with no additional file management on your part.
  • Seamless caching: Files are downloaded locally only when necessary. You can also easily pass a persistent cache folder so that across processes and sessions you only re-download what is necessary.
  • Tested: Comprehensive test suite and code coverage.
  • Testability: Local filesystem implementations that can be used to easily mock cloud storage in your unit tests.
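
The prefix-based dispatch behind CloudPath, and the "two small classes" extension point mentioned above, can be pictured with a toy registry. This is only an illustrative sketch: PATH_CLASSES, register, ToyPath, ToyS3Path, ToyGSPath, and dispatch are hypothetical names invented here, and the code is not cloudpathlib's actual implementation.

```python
# Toy illustration only: hypothetical names, not cloudpathlib's real internals.
PATH_CLASSES = {}  # maps a URI prefix such as "s3://" to a path class


def register(prefix):
    """Class decorator that records which class handles a URI prefix."""
    def decorator(cls):
        cls.prefix = prefix
        PATH_CLASSES[prefix] = cls
        return cls
    return decorator


class ToyPath:
    """Minimal base class that just stores the URI it was given."""
    prefix = ""

    def __init__(self, uri):
        self.uri = uri

    def __repr__(self):
        return f"{type(self).__name__}({self.uri!r})"


@register("s3://")
class ToyS3Path(ToyPath):
    pass


@register("gs://")
class ToyGSPath(ToyPath):
    pass


def dispatch(uri):
    """Return an instance of the class registered for the URI's prefix."""
    for prefix, cls in PATH_CLASSES.items():
        if uri.startswith(prefix):
            return cls(uri)
    raise ValueError(f"No path class registered for {uri!r}")


print(dispatch("s3://bucket/filename.txt"))
```

In this sketch, supporting another service would mean registering one more subclass under its prefix; the real base classes and registration hooks are described in the cloudpathlib documentation.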

Installation

cloudpathlib depends on the cloud services' SDKs (e.g., boto3, google-cloud-storage, azure-storage-blob) to communicate with their respective storage services. If you try to use cloud paths for a service whose dependencies are not installed, cloudpathlib will raise an error that tells you what you need to install.

To install a cloud service's SDK dependency when installing cloudpathlib, you need to specify it using pip's "extras" specification. For example:

pip install cloudpathlib[s3,gs,azure]

With some shells, you may need to use quotes:

pip install "cloudpathlib[s3,gs,azure]"

Currently supported cloud storage services are: azure, gs, s3. You can also use all to install all available services' dependencies.

If you do not specify any extras or separately install any cloud SDKs, you will only be able to develop with the base classes for rolling your own cloud path class.
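
If you want to check up front which SDKs are importable in the current environment, a small standard-library sketch like the following can help. The helper names (SDK_MODULES, sdk_available, missing_extras) are made up for this example and are not part of cloudpathlib; the module names are the import names of the SDK packages listed above.

```python
import importlib.util

# Extra name -> module provided by the corresponding SDK package.
SDK_MODULES = {
    "s3": "boto3",
    "gs": "google.cloud.storage",
    "azure": "azure.storage.blob",
}


def sdk_available(module_name):
    """Return True if the module can be located in this environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "google") is itself missing.
        return False


def missing_extras():
    """List the extras whose SDK module cannot be found."""
    return [extra for extra, module in SDK_MODULES.items()
            if not sdk_available(module)]


missing = missing_extras()
if missing:
    print(f'Missing SDKs; try: pip install "cloudpathlib[{",".join(missing)}]"')
else:
    print("All cloud SDKs are importable.")
```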

conda

cloudpathlib is also available using conda from conda-forge. Note that to install the necessary cloud service SDK dependency, you should include the appropriate suffix in the package name. For example:

conda install cloudpathlib-s3 -c conda-forge

If no suffix is used, only the base classes will be usable. See the conda-forge/cloudpathlib-feedstock for all installation options.

Development version

You can get the latest development version from GitHub:

pip install "cloudpathlib[all] @ git+https://github.com/drivendataorg/cloudpathlib.git"

Note that you similarly need to specify cloud service dependencies, such as all in the above example command.

Quick usage

Here's an example to get the gist of using the package. By default, cloudpathlib authenticates with the environment variables supported by each respective cloud service SDK. For more details and advanced authentication options, see the "Authentication" documentation.

from cloudpathlib import CloudPath

# dispatches to S3Path based on prefix
root_dir = CloudPath("s3://drivendata-public-assets/")
root_dir
#> S3Path('s3://drivendata-public-assets/')

# there's only one file, but globbing works in nested folders
for f in root_dir.glob('**/*.txt'):
    text_data = f.read_text()
    print(f)
    print(text_data)
#> s3://drivendata-public-assets/odsc-west-2019/DATA_DICTIONARY.txt
#> Eviction Lab Data Dictionary
#>
#> Additional information in our FAQ evictionlab.org/help-faq/
#> Full methodology evictionlab.org/methods/
#>
#> ... (additional text output truncated)

# use / to join paths (and, in this case, build the path for a new file)
new_file_copy = root_dir / "nested_dir/copy_file.txt"
new_file_copy
#> S3Path('s3://drivendata-public-assets/nested_dir/copy_file.txt')

# show things work and the file does not exist yet
new_file_copy.exists()
#> False

# writing text data to the new file in the cloud
new_file_copy.write_text(text_data)
#> 6933

# file now listed
list(root_dir.glob('**/*.txt'))
#> [S3Path('s3://drivendata-public-assets/nested_dir/copy_file.txt'),
#>  S3Path('s3://drivendata-public-assets/odsc-west-2019/DATA_DICTIONARY.txt')]

# but, we can remove it
new_file_copy.unlink()

# no longer there
list(root_dir.glob('**/*.txt'))
#> [S3Path('s3://drivendata-public-assets/odsc-west-2019/DATA_DICTIONARY.txt')]
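
For comparison, here is a sketch of the same sequence of steps run against a temporary local directory, using only the standard library's pathlib. It is not cloudpathlib code, and the directory layout and file contents are made up for illustration; the point is that the method names match those in the walkthrough above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root_dir = Path(tmp)

    # set up one nested text file (a made-up stand-in for the bucket contents)
    source = root_dir / "odsc-west-2019" / "DATA_DICTIONARY.txt"
    source.parent.mkdir(parents=True)
    source.write_text("Eviction Lab Data Dictionary\n")

    # globbing works in nested folders
    text_data = ""
    for f in root_dir.glob("**/*.txt"):
        text_data = f.read_text()

    # use / to join paths; nothing exists at the new path until it is written
    new_file_copy = root_dir / "nested_dir" / "copy_file.txt"
    existed_before = new_file_copy.exists()

    # a local filesystem needs the parent directory to exist first
    # (the walkthrough above has no such step)
    new_file_copy.parent.mkdir()
    chars_written = new_file_copy.write_text(text_data)
    names_after_write = sorted(p.name for p in root_dir.glob("**/*.txt"))

    # remove the copy again
    new_file_copy.unlink()
    names_after_unlink = sorted(p.name for p in root_dir.glob("**/*.txt"))

print(existed_before, chars_written, names_after_write, names_after_unlink)
```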

Supported methods and properties

Most methods and properties from pathlib.Path are supported, except for the ones that don't make sense in a cloud context. There are also a few additional methods and properties that either relate to specific cloud services or are specific to cloud paths.

Methods + properties  AzureBlobPath  S3Path  GSPath
absolute              ✅             ✅      ✅
anchor                ✅             ✅      ✅
as_uri                ✅             ✅      ✅
drive                 ✅             ✅      ✅
exists                ✅             ✅      ✅
glob                  ✅             ✅      ✅
is_absolute           ✅             ✅      ✅
is_dir                ✅             ✅      ✅
is_file               ✅             ✅      ✅
is_relative_to        ✅             ✅      ✅
iterdir               ✅             ✅      ✅
joinpath              ✅             ✅      ✅
match                 ✅             ✅      ✅
mkdir                 ✅             ✅      ✅
name                  ✅             ✅      ✅
open                  ✅             ✅      ✅
parent                ✅             ✅      ✅
parents               ✅             ✅      ✅
parts                 ✅             ✅      ✅
read_bytes            ✅             ✅      ✅
read_text             ✅             ✅      ✅
relative_to           ✅             ✅      ✅
rename                ✅             ✅      ✅
replace               ✅             ✅      ✅
resolve               ✅             ✅      ✅
rglob                 ✅             ✅      ✅
rmdir                 ✅             ✅      ✅
samefile              ✅             ✅      ✅
stat                  ✅             ✅      ✅
stem                  ✅             ✅      ✅
suffix                ✅             ✅      ✅
suffixes              ✅             ✅      ✅
touch                 ✅             ✅      ✅
unlink                ✅             ✅      ✅
with_name             ✅             ✅      ✅
with_stem             ✅             ✅      ✅
with_suffix           ✅             ✅      ✅
write_bytes           ✅             ✅      ✅
write_text            ✅             ✅      ✅
as_posix              ❌             ❌      ❌
chmod                 ❌             ❌      ❌
cwd                   ❌             ❌      ❌
expanduser            ❌             ❌      ❌
group                 ❌             ❌      ❌
home                  ❌             ❌      ❌
is_block_device       ❌             ❌      ❌
is_char_device        ❌             ❌      ❌
is_fifo               ❌             ❌      ❌
is_mount              ❌             ❌      ❌
is_reserved           ❌             ❌      ❌
is_socket             ❌             ❌      ❌
is_symlink            ❌             ❌      ❌
lchmod                ❌             ❌      ❌
link_to               ❌             ❌      ❌
lstat                 ❌             ❌      ❌
owner                 ❌             ❌      ❌
readlink              ❌             ❌      ❌
root                  ❌             ❌      ❌
symlink_to            ❌             ❌      ❌
clear_cache           ✅             ✅      ✅
cloud_prefix          ✅             ✅      ✅
copy                  ✅             ✅      ✅
copytree              ✅             ✅      ✅
download_to           ✅             ✅      ✅
etag                  ✅             ✅      ✅
fspath                ✅             ✅      ✅
is_valid_cloudpath    ✅             ✅      ✅
rmtree                ✅             ✅      ✅
upload_from           ✅             ✅      ✅
validate              ✅             ✅      ✅
blob                  ✅             ❌      ✅
bucket                ❌             ✅      ✅
container             ✅             ❌      ❌
key                   ❌             ✅      ❌
md5                   ✅             ❌      ❌
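
For readers less familiar with pathlib, the sketch below shows what a few of the supported pure-path properties and methods return on a standard-library PurePosixPath. It uses only pathlib, not cloudpathlib, and the path is a made-up example. The table above indicates that the cloud path classes expose members with the same names, though the exact values for a cloud URI (for example anchor, drive, or parts) may differ from the local-path values.

```python
from pathlib import PurePosixPath

# A made-up POSIX-style path, used only to illustrate the member names.
p = PurePosixPath("/odsc-west-2019/DATA_DICTIONARY.txt")

print(p.name)                            # final path component
print(p.stem)                            # final component without its suffix
print(p.suffix)                          # file extension, including the dot
print(p.parent)                          # containing directory
print(p.parts)                           # tuple of path components
print(p.with_suffix(".md"))              # same path with a different suffix
print(p.with_name("README.txt"))         # same parent, different file name
print(p.relative_to("/odsc-west-2019"))  # path relative to a given prefix
print(p.match("*.txt"))                  # glob-style pattern match
```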

Icon made by srip from www.flaticon.com.
Sample code block generated using the reprexpy package.
