
R package for statistical tools with big matrices stored on disk.

bigstatsr

R package {bigstatsr} provides functions for fast statistical analysis of large-scale data encoded as matrices. The package can handle matrices that are too large to fit in memory thanks to memory-mapping to binary files on disk. This is very similar to the format big.matrix provided by R package {bigmemory}, which is no longer used by this package (see the corresponding vignette). As inputs, package {bigstatsr} uses Filebacked Big Matrices (FBM).
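To make the file-backed storage concrete, here is a minimal sketch, assuming a small matrix and the default temporary backing file (the dimensions and values are made up for illustration). It shows that the R object is a light handle on a binary file, and that standard matrix indexing reads from and writes to that file.

```r
library(bigstatsr)

# Create a small FBM; by default the data lives in a temporary backing file
X <- FBM(10, 5, type = "double", init = 0)

# The R object is only a handle; the values are stored in this file on disk
X$backingfile
file.exists(X$backingfile)

# Standard matrix syntax reads from and writes to the memory-mapped file
X[1:2, 1:3] <- matrix(1:6, 2, 3)
X[1:2, 1:3]
dim(X)
```

Accessing `X[]` would load the whole matrix into memory, so on large data it is usually preferable to index only the block that is needed.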

LIST OF FEATURES

Note that most of the algorithms of this package don't handle missing values.
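Since most algorithms do not handle missing values, it can help to check for them first. The following is one possible sketch, not a function provided by the package: it counts missing values per column by blocks with `big_apply()`, on a small made-up FBM in which one `NA` has been planted.

```r
library(bigstatsr)

# Small FBM backed by a temporary file (hypothetical data for illustration)
X <- FBM(100, 20, init = 0)
X[3, 5] <- NA_real_  # plant a single missing value

# Count missing values per column, reading one block of columns at a time
nb_na <- big_apply(X, a.FUN = function(X, ind) {
  colSums(is.na(X[, ind, drop = FALSE]))
}, a.combine = 'c')

# Indices of the columns that contain at least one missing value
which(nb_na > 0)
```

Columns flagged this way can then be imputed or excluded (for example through an `ind.col` argument, where a function offers one) before running an analysis.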

Installation

# For the CRAN version
install.packages("bigstatsr")
# For the latest version
remotes::install_github("privefl/bigstatsr")

Small example

library(bigstatsr)

# Create the data on disk
X <- FBM(5e3, 10e3, backingfile = "test")$save()
# If you open a new session you can do
X <- big_attach("test.rds")

# Fill it by chunks with random values
U <- matrix(0, nrow(X), 5); U[] <- rnorm(length(U))
V <- matrix(0, ncol(X), 5); V[] <- rnorm(length(V))
NCORES <- nb_cores()
# Model: X = U V^T + E, where E is standard Gaussian noise
big_apply(X, a.FUN = function(X, ind, U, V) {
  # Fill the block of columns 'ind' in place
  X[, ind] <- tcrossprod(U, V[ind, ]) + rnorm(nrow(X) * length(ind))
  NULL  # return nothing, since the result is written directly to X
}, a.combine = 'c', ncores = NCORES, U = U, V = V)
# Check some values
X[1:5, 1:5]

# Compute first 10 PCs
obj.svd <- big_randomSVD(X, fun.scaling = big_scale(), 
                         k = 10, ncores = NCORES)
plot(obj.svd)

# Cleanup
unlink(paste0("test", c(".bk", ".rds")))
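
As a complement to the example above, the following sketch shows one way to inspect the result of `big_randomSVD()`. It assumes a small made-up matrix that also fits in memory, so that the singular values can be compared with base R's `svd()` on the same centered and scaled data.

```r
library(bigstatsr)

set.seed(1)
# Small in-memory matrix (hypothetical data), copied into an FBM
mat <- matrix(rnorm(200 * 50), 200, 50)
X <- as_FBM(mat)

# Partial SVD of the centered and scaled matrix, first 3 components
obj.svd <- big_randomSVD(X, fun.scaling = big_scale(), k = 3)

obj.svd$d                   # singular values
scores <- predict(obj.svd)  # PC scores (U times d), one row per sample
dim(scores)

# Sanity check against base R on the same scaled data
d_ref <- svd(scale(mat))$d[1:3]
all.equal(obj.svd$d, d_ref, tolerance = 1e-3)
```

The comparison with `svd()` is only feasible here because the data is tiny; on a matrix too large for memory, only the `big_randomSVD()` part applies.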

Learn more with this introduction to package {bigstatsr}.

If you want to use Rcpp code, look at this tutorial.

Some use cases

Parallelization

Package {bigstatsr} uses package {foreach} for its parallelization tasks. Learn more on parallelism with {foreach} with this tutorial.
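
For readers new to {foreach}, here is a minimal sketch of a parallel loop. It assumes package {doParallel} as the backend and a cluster of 2 workers; both are illustrative choices, not a description of what {bigstatsr} does internally.

```r
library(foreach)

# Register a small local cluster ({doParallel} is one possible backend)
cl <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl)

# Each iteration runs on a worker; results are combined with c()
res <- foreach(i = 1:4, .combine = 'c') %dopar% {
  i^2
}

# Always release the workers when done
parallel::stopCluster(cl)
res
```

In {bigstatsr}, the same idea is exposed through the `ncores` argument of functions such as `big_apply()`, so an explicit loop like this is usually not needed.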

Large datasets

Bug report / Help

How to make a great R reproducible example?

Please open an issue if you find a bug.

If you want help using {bigstatsr}, please open an issue as well or post on Stack Overflow with the tag bigstatsr.

I will always redirect you to GitHub issues if you email me, so that others can benefit from our discussion.

References

  • Privé, Florian, et al. "Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr." Bioinformatics 34.16 (2018): 2781-2787.

  • Privé, Florian, Hugues Aschard, and Michael GB Blum. "Efficient implementation of penalized regression for genetic risk prediction." Genetics 212.1 (2019): 65-74.

