R package for the analysis of massive SNP arrays.

bigsnpr

{bigsnpr} is an R package for the analysis of massive SNP arrays, primarily designed for human genetics. It enhances the features of package {bigstatsr} for the purpose of analyzing genotype data.

To get you started:

Installation

In R, run

# install.packages("remotes")
remotes::install_github("privefl/bigsnpr")

or for the CRAN version

install.packages("bigsnpr")

Input formats

This package reads bed/bim/fam files (PLINK's preferred format) using the functions snp_readBed() and snp_readBed2(). Before reading data into this package's own format, quality control and conversion can be done with PLINK, which can be called directly from R using snp_plinkQC() and snp_plinkKINGQC().
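
As a rough illustration of the reading step, the sketch below converts the small example bed file bundled with the package and attaches the result in the R session. Using a temporary backing file is only a choice made for this example; for real data you would normally keep the backing files next to your bed file.

```r
library(bigsnpr)

# Example bed/bim/fam fileset shipped with the package
bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")

# Convert to the package's on-disk format; the path of the created .rds file is returned.
# A temporary backing file is used here only to keep the example self-contained.
rdsfile <- snp_readBed(bedfile, backingfile = tempfile())

# Attach the bigSNP object in the current R session
obj <- snp_attach(rdsfile)
class(obj)
```

In later sessions, only the snp_attach() call is needed, since the conversion has already been written to disk.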

This package can also read UK Biobank BGEN files using function snp_readBGEN(). This function takes around 40 minutes to read 1M variants for 400K individuals using 15 cores.

This package uses a class called bigSNP for representing SNP data. A bigSNP object is a list with some elements:

  • $genotypes: An FBM.code256. Rows are samples and columns are variants. This stores genotype calls or dosages (rounded to 2 decimal places).
  • $fam: A data.frame with some information on the individuals.
  • $map: A data.frame with some information on the variants.
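
To make the three elements listed above more concrete, here is a minimal sketch that inspects them on the example dataset shipped with the package, loaded with snp_attachExtdata(). The subsets shown are arbitrary and only meant for a quick look.

```r
library(bigsnpr)

# Example bigSNP object shipped with the package (copied to a temporary directory)
obj <- snp_attachExtdata()

G <- obj$genotypes
dim(G)            # number of samples x number of variants
G[1:5, 1:8]       # a small block of genotype calls, read from disk

str(obj$fam)      # one row per individual
str(obj$map)      # one row per variant
```

The genotype matrix lives on disk, so only the accessed block is loaded into memory.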

Note that most of the algorithms of this package don't handle missing values. You can use snp_fastImpute() (taking a few hours for a chip of 15K x 300K) and snp_fastImputeSimple() (taking a few minutes only) to impute missing values of genotyped variants.
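
A minimal sketch of the quicker imputation route, using the example dataset with missing values that ships with the package. The choice of method = "mode" is just for illustration, and reading the whole matrix with G[] is only reasonable here because the example is small.

```r
library(bigsnpr)

# Example dataset that contains missing genotypes
obj <- snp_attachExtdata("example-missing.bed")
G <- obj$genotypes

# Count missing values before imputation (fine for a small example matrix)
n_missing_before <- sum(is.na(G[]))

# Simple imputation; returns a genotype matrix that decodes the imputed values
G2 <- snp_fastImputeSimple(G, method = "mode")

n_missing_after <- sum(is.na(G2[]))
c(before = n_missing_before, after = n_missing_after)
```

The returned object (G2 here) is the one to pass to downstream functions, since it is the one that decodes the imputed values.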

Package {bigsnpr} also provides functions that work directly on bed files with a few missing values (the bed_*() functions). See paper "Efficient toolkit implementing..".

Polygenic scores

Polygenic scores are one of the main focuses of this package. There are 3 main methods currently available:

  • Penalized regressions with individual-level data (see paper and tutorial)

  • Clumping and Thresholding (C+T) and Stacked C+T (SCT) with summary statistics and individual-level data (see paper and tutorial)

  • LDpred2 with summary statistics (see paper and tutorial)

Possible upcoming features

You can request a feature by opening an issue.

Bug report / Support

How to make a great R reproducible example?

Please open an issue if you find a bug.

If you want help using {bigstatsr} (the big_*() functions), please open an issue on {bigstatsr}'s repo, or post on Stack Overflow with the tag bigstatsr.

I will always redirect you to GitHub issues if you email me, so that others can benefit from our discussion.
