• Stars: 125
• Rank: 286,335 (Top 6%)
• Language: R
• License: Artistic License 2.0
• Created over 8 years ago
• Updated 7 months ago

Repository Details

Curated Metagenomic Data of the Human Microbiome

curatedMetagenomicData

The curatedMetagenomicData package provides standardized, curated human microbiome data for novel analyses. It includes gene families, marker abundance, marker presence, pathway abundance, pathway coverage, and relative abundance for samples collected from different body sites. The bacterial, fungal, and archaeal taxonomic abundances for each sample were calculated with MetaPhlAn3, and metabolic functional potential was calculated with HUMAnN3. The manually curated sample metadata and standardized metagenomic data are available as (Tree)SummarizedExperiment objects.

Installation

To install curatedMetagenomicData from Bioconductor, use BiocManager as follows.

BiocManager::install("curatedMetagenomicData")

To install curatedMetagenomicData from GitHub, use BiocManager as follows.

BiocManager::install("waldronlab/curatedMetagenomicData", dependencies = TRUE, build_vignettes = TRUE)

Most users should simply install curatedMetagenomicData from Bioconductor.

Examples

To access curated metagenomic data, call the curatedMetagenomicData() function, which both queries and returns resources. Its first argument is a pattern that is matched against resource titles, so multiple resources can be queried or returned with a single call. By default, only the titles of the matching resources are returned.

curatedMetagenomicData("AsnicarF_20.+")
## 2021-03-31.AsnicarF_2017.gene_families
## 2021-03-31.AsnicarF_2017.marker_abundance
## 2021-03-31.AsnicarF_2017.marker_presence
## 2021-03-31.AsnicarF_2017.pathway_abundance
## 2021-03-31.AsnicarF_2017.pathway_coverage
## 2021-03-31.AsnicarF_2017.relative_abundance
## 2021-10-14.AsnicarF_2017.gene_families
## 2021-10-14.AsnicarF_2017.marker_abundance
## 2021-10-14.AsnicarF_2017.marker_presence
## 2021-10-14.AsnicarF_2017.pathway_abundance
## 2021-10-14.AsnicarF_2017.pathway_coverage
## 2021-10-14.AsnicarF_2017.relative_abundance
## 2021-03-31.AsnicarF_2021.gene_families
## 2021-03-31.AsnicarF_2021.marker_abundance
## 2021-03-31.AsnicarF_2021.marker_presence
## 2021-03-31.AsnicarF_2021.pathway_abundance
## 2021-03-31.AsnicarF_2021.pathway_coverage
## 2021-03-31.AsnicarF_2021.relative_abundance
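
The query string above behaves like a regular expression: `.+` matches one or more characters, and an unescaped `.` matches any single character. The following base R sketch illustrates that matching on a few of the titles listed above. It assumes the behavior is equivalent to `grepl()`; the helper `match_titles()` is hypothetical and not part of the package.

```r
# A handful of resource titles copied from the listing above.
titles <- c(
    "2021-03-31.AsnicarF_2017.relative_abundance",
    "2021-10-14.AsnicarF_2017.relative_abundance",
    "2021-03-31.AsnicarF_2021.relative_abundance",
    "2021-03-31.AsnicarF_2021.gene_families"
)

# Hypothetical helper: keep the titles that match a regular expression.
match_titles <- function(pattern, titles) {
    titles[grepl(pattern, titles)]
}

# Matches every title from either AsnicarF study.
all_matches <- match_titles("AsnicarF_20.+", titles)

# Matches only relative_abundance resources, across both studies.
abundance_matches <- match_titles("AsnicarF_20.+.relative_abundance", titles)

# Matches both dated versions of the 2017 relative_abundance resource.
asnicar_2017 <- match_titles("AsnicarF_2017.relative_abundance", titles)

print(abundance_matches)
```

Narrowing the pattern before setting dryrun = FALSE avoids requesting more resources than intended.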

When the dryrun argument is set to FALSE, a list of SummarizedExperiment and/or TreeSummarizedExperiment objects is returned. The rownames argument determines the type of row names to use for relative_abundance resources: either "long" (the default), "short" (species name), or "NCBI" (NCBI Taxonomy ID). When a single resource is requested, a single-element list is still returned.

curatedMetagenomicData("AsnicarF_2017.relative_abundance", dryrun = FALSE, rownames = "short")
## $`2021-10-14.AsnicarF_2017.relative_abundance`
## class: TreeSummarizedExperiment 
## dim: 296 24 
## metadata(1): agglomerated_by_rank
## assays(1): relative_abundance
## rownames(296): Escherichia coli Bifidobacterium bifidum ...
##   Streptococcus gordonii Abiotrophia sp. HMSC24B09
## rowData names(7): superkingdom phylum ... genus species
## colnames(24): MV_FEI1_t1Q14 MV_FEI2_t1Q14 ... MV_MIM5_t2M14
##   MV_MIM5_t3F15
## colData names(22): study_name subject_id ... lactating curator
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (296 rows)
## rowTree: 1 phylo tree(s) (10430 leaves)
## colLinks: NULL
## colTree: NULL
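
Because the return value is a list even for a single resource, the object usually has to be extracted before further analysis. The following is a minimal sketch of that step. It assumes that curatedMetagenomicData and its SummarizedExperiment dependency are installed and that the resource can be downloaded; the variable names `resources` and `tse` are illustrative only.

```r
library(curatedMetagenomicData)

# Request a single resource; the result is a named, single-element list.
resources <- curatedMetagenomicData(
    "AsnicarF_2017.relative_abundance",
    dryrun = FALSE,
    rownames = "short"
)

# The list names are the resource titles.
names(resources)

# Extract the TreeSummarizedExperiment object itself.
tse <- resources[[1]]

# Abundance matrix: taxa in rows, samples in columns.
abundance <- SummarizedExperiment::assay(tse, "relative_abundance")
abundance[1:5, 1:3]

# Curated sample metadata and taxonomic annotation.
sample_metadata <- SummarizedExperiment::colData(tse)
taxonomy <- SummarizedExperiment::rowData(tse)
table(sample_metadata$study_name)
```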

When the counts argument is set to TRUE, relative abundance proportions are multiplied by read depth and rounded to the nearest integer prior to being returned. Also, when multiple resources are requested, the list will contain named elements corresponding to each SummarizedExperiment and/or TreeSummarizedExperiment object.

curatedMetagenomicData("AsnicarF_20.+.relative_abundance", dryrun = FALSE, counts = TRUE, rownames = "short")
## $`2021-10-14.AsnicarF_2017.relative_abundance`
## class: TreeSummarizedExperiment 
## dim: 296 24 
## metadata(1): agglomerated_by_rank
## assays(1): relative_abundance
## rownames(296): Escherichia coli Bifidobacterium bifidum ...
##   Streptococcus gordonii Abiotrophia sp. HMSC24B09
## rowData names(7): superkingdom phylum ... genus species
## colnames(24): MV_FEI1_t1Q14 MV_FEI2_t1Q14 ... MV_MIM5_t2M14
##   MV_MIM5_t3F15
## colData names(22): study_name subject_id ... lactating curator
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (296 rows)
## rowTree: 1 phylo tree(s) (10430 leaves)
## colLinks: NULL
## colTree: NULL
## 
## $`2021-03-31.AsnicarF_2021.relative_abundance`
## class: TreeSummarizedExperiment 
## dim: 633 1098 
## metadata(1): agglomerated_by_rank
## assays(1): relative_abundance
## rownames(633): Phocaeicola vulgatus Bacteroides stercoris ...
##   Pyramidobacter sp. C12-8 Brevibacterium aurantiacum
## rowData names(7): superkingdom phylum ... genus species
## colnames(1098): SAMEA7041133 SAMEA7041134 ... SAMEA7045952 SAMEA7045953
## colData names(24): study_name subject_id ... family treatment
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (633 rows)
## rowTree: 1 phylo tree(s) (10430 leaves)
## colLinks: NULL
## colTree: NULL
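
The conversion performed when counts = TRUE can be illustrated with a small base R sketch. This is not the package's own code: the matrix values, taxon names, sample names, and read depths are invented for illustration, and the abundances are assumed to be proportions that sum to 1 per sample (values stored as percentages would first need dividing by 100).

```r
# Toy relative abundance proportions: taxa in rows, samples in columns.
# All values are invented for illustration.
proportions <- matrix(
    c(0.50, 0.30, 0.20,   # sample_A
      0.10, 0.60, 0.30),  # sample_B
    nrow = 3,
    dimnames = list(
        c("taxon_1", "taxon_2", "taxon_3"),
        c("sample_A", "sample_B")
    )
)

# Hypothetical read depth (number of reads) for each sample.
read_depth <- c(sample_A = 1000, sample_B = 2500)

# Multiply each column by its sample's read depth, then round
# to the nearest integer to obtain approximate counts.
counts <- round(sweep(proportions, 2, read_depth, "*"))

print(counts)
```

Because of rounding, the column sums of the resulting counts can differ slightly from the read depth in real data.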

Analyses

See curatedMetagenomicAnalyses for analyses in R and Python using curatedMetagenomicData.

Contributing

To contribute to the curatedMetagenomicData R/Bioconductor package, first read the contributing guidelines and then open an issue. Also, note that in contributing you agree to abide by the code of conduct.


Pasolli E, Schiffer L, Manghi P, Renson A, Obenchain V, Truong D, Beghini F, Malik F, Ramos M, Dowd J, Huttenhower C, Morgan M, Segata N, Waldron L (2017). Accessible, curated metagenomic data through ExperimentHub. Nat. Methods, 14 (11), 1023-1024. ISSN 1548-7091, 1548-7105, doi: 10.1038/nmeth.4468.
