piggyback
Because larger (> 50 MB) data files cannot easily be committed to git, a different approach is required to manage data associated with an analysis in a GitHub repository. This package provides a simple work-around by allowing larger (up to 2 GB per file) data files to piggyback on a repository as assets attached to individual GitHub releases. These files are not handled by git in any way, but instead are uploaded, downloaded, or edited directly by calls through the GitHub API. These data files can be versioned manually by creating different releases. This approach works equally well with public or private repositories. Data can be uploaded and downloaded programmatically from scripts. No authentication is required to download data from public repositories.
Installation
Install from CRAN with:
install.packages("piggyback")
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("ropensci/piggyback")
Quickstart
See the piggyback vignette for details on authentication and additional package functionality.
Piggyback can download data attached to a release on any repository:
library(piggyback)
pb_download("iris.tsv.gz", repo = "cboettig/piggyback-tests", dest = tempdir())
#> Warning in pb_download("iris.tsv.gz", repo = "cboettig/piggyback-tests", :
#> file(s) iris.tsv.gz not found in repo cboettig/piggyback-tests
Downloading from private repos or uploading to any repo requires authentication, so be sure to set a GITHUB_TOKEN (or GITHUB_PAT) environment variable, or include the .token argument. Omit the file name to download all attached objects. Omit the repository name to default to the current repository (the GitHub remote of the git project you are working in). See the introductory vignette or function documentation for details.
We can also upload data to any existing release (defaults to latest):
## We'll need some example data first.
## Pro tip: compress your tabular data to save space and speed up uploads/downloads
readr::write_tsv(mtcars, "mtcars.tsv.gz")
pb_upload("mtcars.tsv.gz", repo = "cboettig/piggyback-tests")
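The compression tip is easy to check locally before uploading anything. This base-R sketch (no readr, no network; the temporary file names are arbitrary) writes mtcars once as a plain TSV and once through a gzip connection, compares the on-disk sizes, and confirms that reading the compressed copy back recovers the same values.

```r
## Write the same table uncompressed and gzip-compressed
plain_path <- tempfile(fileext = ".tsv")
gz_path    <- tempfile(fileext = ".tsv.gz")

write.table(mtcars, plain_path, sep = "\t", quote = FALSE, row.names = FALSE)

gz_con <- gzfile(gz_path, open = "w")
write.table(mtcars, gz_con, sep = "\t", quote = FALSE, row.names = FALSE)
close(gz_con)

## Compare on-disk sizes in bytes
plain_size <- file.size(plain_path)
gz_size    <- file.size(gz_path)
c(plain = plain_size, gzip = gz_size)

## Read the compressed file back; row names were not written, so ignore them
roundtrip <- read.table(gzfile(gz_path), sep = "\t", header = TRUE)
same_values <- isTRUE(all.equal(as.matrix(roundtrip), as.matrix(mtcars),
                                check.attributes = FALSE))
same_values
```

On a 32-row table the savings are modest; the point is that a .gz file is still an ordinary, readable TSV once downloaded.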
Git LFS and other alternatives
piggyback acts like a poor soul’s Git LFS. Git LFS is not only expensive, it also breaks GitHub’s collaborative model: if someone wants to submit a PR with a simple edit to your docs, they cannot fork your repository, since that would count against your Git LFS storage. Unlike Git LFS, piggyback doesn’t take over your standard git client; it just perches comfortably on the shoulders of your existing GitHub API. Data can be versioned by piggyback, but relative to Git LFS, versioning is less strict: uploads can be set as a new version or allowed to overwrite previously uploaded data.
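Both versioning styles can be sketched as follows. This assumes you have write access to the repository: "your-user/your-repo" and the tag "v0.0.2" are placeholders, and the release is created with piggyback's pb_new_release() (newer piggyback versions may provide pb_release_create() for this; check the function reference for your installed version).

```r
library(piggyback)

repo <- "your-user/your-repo"   # placeholder: a repository you can write to

## Stricter versioning: create a new release (tag) and upload into it,
## leaving the assets attached to earlier releases untouched
pb_new_release(repo, tag = "v0.0.2")
pb_upload("mtcars.tsv.gz", repo = repo, tag = "v0.0.2")

## Looser versioning: replace the asset already attached to that release
pb_upload("mtcars.tsv.gz", repo = repo, tag = "v0.0.2", overwrite = TRUE)

## Inspect what is now attached to the release
pb_list(repo = repo, tag = "v0.0.2")
```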
But what will GitHub think of this?
GitHub documentation at the time of writing endorses the use of release attachments as a solution for distributing large files as part of your project.
Of course, it will be up to GitHub to decide if this use of release attachments is acceptable in the long term.
Also see our vignette comparing alternatives.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.