American Gut open-access data and IPython notebooks

American-Gut

American Gut open-access code and IPython notebooks

A note about data

American Gut sequences and metadata are deposited at the European Bioinformatics Institute (EBI) under the accession ERP012803.

Bloom sequences found in the data repository are correct and up to date.

The OTU tables and mapping files hosted in this repository reflect the state of the project in May 2015 and before. This includes an earlier version of the American Gut survey and dietary questionnaire. Data on GitHub have been scrubbed of PHI. A listing of processed data with the new survey can be found at ftp://ftp.microbio.me/AmericanGut.

The latest OTU tables and precalculated diversity comparisons generated by the primary processing notebook set can be found at ftp://ftp.microbio.me/AmericanGut/latest.


INSTALL

Basics

The American-Gut repository is intended to be used in place as a project/repo, so there is no need to install it (ignore setup.py for now).

After cloning the repository and before using the scripts, install the necessary dependencies. Two approaches are currently supported.

Conda based

If your package manager of choice is conda, dependencies can be installed with

$ conda install --file ./conda_requirements.txt
$ pip install -r ./pip_requirements.txt

If you would like to install dependencies within a conda environment, be sure to switch to the appropriate environment before installing them.

Note: With pip, some libraries have to be compiled from source, so the appropriate system libraries should be installed before running the pip command. For more details, see the Supported Operating Systems / Distributions section.

PIP based

$ pip install numpy==1.9.2
$ pip install -r ./pip_requirements.txt

If you would like to install dependencies within a virtualenv environment, be sure to switch to the appropriate environment before installing them.

Note: With pip, some libraries have to be compiled from source, so the appropriate system libraries should be installed before running the pip command. For more details, see the Supported Operating Systems / Distributions section.

Supported Operating Systems / Distributions

Debian 8

Tested with Debian 8.3.0 (amd64).

To compile dependencies from source, the appropriate libraries can be installed (as root/sudo) with

(root/sudo)$ aptitude install pkg-config libxslt1-dev libxml2 libfreetype6 \
    build-essential python-pip python-dev liblapack-dev liblapack3 \
    libfreetype6-dev libblas-dev libblas3 gfortran libhdf5-serial-dev libsm6

RUN

Basics

Although the American-Gut repo provides separate scripts (scripts folder) and a package (americangut folder), it is primarily intended to be used through notebooks (ipynb folder).

There are a few environment variables that can be used to customize the run:

  • AG_TESTING: if set to True, scripts will not download the American Gut EBI data (ERP012803) but will instead work with test data (a subset of the original EBI data). This is useful for testing.
  • AG_CPU_COUNT: number of processes to use when parallelizing code (defaults to the number of cores).
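
As a minimal sketch, the two variables listed above could be set in the current shell session before starting the notebooks. The value 4 is only an illustrative choice, not a recommendation from the project.

```shell
# Work with the bundled test subset instead of downloading ERP012803 from EBI.
export AG_TESTING=True

# Limit parallelized code to 4 processes (illustrative value; if unset,
# the number of cores is used according to the description above).
export AG_CPU_COUNT=4
```

Exported variables only apply to the current shell and the processes started from it, so the notebook server should be launched from the same session.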

To generate reports (PDFs), a TeX distribution should be installed on the system.

Adjusting environment on POSIX systems

Since the American-Gut repo contains scripts and packages, PYTHONPATH and PATH need to be adjusted to reflect this. Therefore, before working with the notebooks, execute the following from within the American-Gut repo:

$ REPO=`pwd`
$ export PYTHONPATH=$REPO/:$PYTHONPATH
$ export PATH=$REPO/scripts:$PATH

If needed, adjust the AG_* environment variables from the Basics section.
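
The PYTHONPATH and PATH adjustment above can also be written a little more defensively. The sketch below is a variant, not the project's own script: it quotes the paths so directories containing spaces work, and uses the POSIX `${VAR:+...}` expansion so that no empty entry is added when a variable was previously unset (in PATH, an empty entry is interpreted as the current directory).

```shell
# Run from the root of the cloned American-Gut repository.
REPO=$(pwd)

# ${VAR:+:$VAR} expands to ":$VAR" only when VAR is set and non-empty,
# so an unset PYTHONPATH does not leave a trailing ":" behind.
export PYTHONPATH="$REPO${PYTHONPATH:+:$PYTHONPATH}"
export PATH="$REPO/scripts${PATH:+:$PATH}"

# Quick check: print the first entry of each variable.
echo "PYTHONPATH starts with: ${PYTHONPATH%%:*}"
echo "PATH starts with: ${PATH%%:*}"
```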

Run notebooks

Notebooks are written in two formats and therefore require different profiles.

Markdown based notebooks

Markdown-based notebooks can be found in the ./ipynb/primary-processing/ folder and have the extension md. To use these notebooks, first create a profile named ag_ipymd with

$ ipython profile create ag_ipymd

and adjust the newly created /path/to/.ipython/profile_ag_ipymd/ipython_notebook_config.py by adding

#------------------------
# ipymd
#------------------------
c.NotebookApp.contents_manager_class = 'ipymd.IPymdContentsManager'

to the end of the file.
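
The manual edit described above could also be scripted. The following is a minimal sketch that defines a hypothetical helper, `append_ipymd_config` (not part of the repository), which appends the setting to a given config file only if it is not already there, so running it twice does no harm. The path in the usage comment is the placeholder from the text and must be replaced with the real profile location.

```shell
# Hypothetical helper (not part of the repository): append the ipymd setting
# to the given config file unless it is already present.
append_ipymd_config() {
    config_file="$1"
    setting="c.NotebookApp.contents_manager_class = 'ipymd.IPymdContentsManager'"
    # -F: match a fixed string, -q: no output, -s: stay quiet if the file is missing
    if grep -qsF "$setting" "$config_file"; then
        echo "ipymd already configured in $config_file"
    else
        printf '\n# ipymd\n%s\n' "$setting" >> "$config_file"
        echo "ipymd setting appended to $config_file"
    fi
}

# Example (adjust the path to your own profile directory):
# append_ipymd_config /path/to/.ipython/profile_ag_ipymd/ipython_notebook_config.py
```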

Now, we can start ipython with

$ ipython notebook --profile=ag_ipymd

and visit the newly started notebook server by going to http://localhost:8888

Jupyter/IPython based notebooks

Notebooks in the native notebook format can be found in the ./ipynb/ folder and have the extension ipynb. To use these notebooks, first create a profile named ag_default with

$ ipython profile create ag_default

Now, we can start ipython with

$ ipython --profile=ag_default notebook

and visit the newly started notebook server by going to http://localhost:8888
