American-Gut
American Gut open-access code and IPython notebooks
A note about data
American Gut sequences and metadata are deposited at the European Bioinformatics Institute (EBI) under the accession ERP012803.
Bloom sequences found in the data repository are correct and up to date.
OTU tables and mapping files hosted in this repository reflect the state of the project in May 2015 and before. This includes an earlier version of the American Gut survey and dietary questionnaire. Data on GitHub has been scrubbed of protected health information (PHI). A listing of processed data with the new survey can be found at ftp://ftp.microbio.me/AmericanGut.
The latest OTU tables and precalculated diversity comparisons generated by the primary processing notebook set can be found at ftp://ftp.microbio.me/AmericanGut/latest.
INSTALL
Basics
The American-Gut repository is intended to be used as a project/repo, meaning there is no need to install it (ignore setup.py for the moment).
After cloning the repository and before using the scripts, install the necessary dependencies. Two approaches are currently supported.
Conda based
If your package manager of choice is conda, dependencies can be installed with
$ conda install --file ./conda_requirements.txt
$ pip install -r ./pip_requirements.txt
If you would like to install dependencies within a conda environment, be sure to switch to the appropriate environment before installing the dependencies.
Note: With pip, some libraries have to be compiled from source, so the appropriate system libraries should be installed before running the pip command. For more details, see the Supported Operating Systems / Distributions section.
PIP based
$ pip install numpy==1.9.2
$ pip install -r ./pip_requirements.txt
If you would like to install dependencies within a virtualenv environment, be sure to switch to the appropriate environment before installing the dependencies.
Note: With pip, some libraries have to be compiled from source, so the appropriate system libraries should be installed before running the pip command. For more details, see the Supported Operating Systems / Distributions section.
Supported Operating Systems / Distributions
Debian 8
Tested with Debian 8.3.0 (amd64).
To compile dependencies from source, the appropriate libraries can be installed (as root/sudo) with
(root/sudo)$ aptitude install pkg-config libxslt1-dev libxml2 libfreetype6 \
build-essential python-pip python-dev liblapack-dev liblapack3 \
libfreetype6-dev libblas-dev libblas3 gfortran libhdf5-serial-dev libsm6
RUN
Basics
Although the American-Gut repo provides separate scripts (scripts folder) and a package (americangut folder), it is primarily intended to be used through notebooks (ipynb folder).
There are a few environment variables that can be used to customize the run:
- AG_TESTING: if set to True, scripts will not download the American Gut EBI data (ERP012803) but will instead work with test data (a subset of the original EBI data). This is useful for testing.
- AG_CPU_COUNT: number of processes to use when parallelizing code (defaults to the number of cores).
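As a rough illustration of how the two variables above might be interpreted, here is a minimal Python sketch. The helper name read_ag_settings is hypothetical, and treating only the literal string True as enabling test mode is an assumption; the repository's own parsing may differ.

```python
import multiprocessing
import os


def read_ag_settings(environ=None):
    """Return (testing, cpu_count) derived from the AG_* variables.

    Hypothetical helper for illustration; not part of the repository.
    """
    environ = os.environ if environ is None else environ
    # Assumption: only the literal string "True" enables test mode.
    testing = environ.get("AG_TESTING") == "True"
    # Fall back to the number of cores when AG_CPU_COUNT is unset or empty.
    raw = environ.get("AG_CPU_COUNT")
    cpu_count = int(raw) if raw else multiprocessing.cpu_count()
    return testing, cpu_count
```

Calling read_ag_settings() with no arguments reads the current process environment, so it can be used to confirm what a notebook would see before a long run.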
To generate reports (PDFs), a TeX distribution should be installed on the system.
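Before generating reports, it can help to confirm that a TeX engine is actually on PATH. The sketch below is one way to do that in Python 3; the helper name find_tex and the list of engine names are assumptions, since this README does not say which TeX binary the report scripts invoke.

```python
import shutil


def find_tex(candidates=("pdflatex", "xelatex", "lualatex")):
    """Return the path of the first TeX engine found on PATH, or None.

    Hypothetical helper; the candidate names are assumptions.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

A result of None suggests that no TeX distribution is installed or that its binaries are not on PATH. On Python 2, distutils.spawn.find_executable is a rough equivalent of shutil.which.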
Adjusting environment on POSIX systems
Since the American-Gut repo contains scripts and packages, we need to adjust PYTHONPATH and PATH to reflect this. Therefore, before working with the notebooks, execute the following from within the American-Gut repo:
$ REPO=`pwd`
$ export PYTHONPATH=$REPO/:$PYTHONPATH
$ export PATH=$REPO/scripts:$PATH
If needed, adjust the AG_* environment variables described in the Basics section.
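When the notebook server is launched from a Python script rather than an interactive shell, the same adjustment can be applied to a copy of the environment. This is a minimal sketch that mirrors the export commands above; the helper name ag_environment is hypothetical.

```python
import os


def ag_environment(repo, base=None):
    """Return a copy of the environment with the repo on PYTHONPATH and PATH.

    Hypothetical helper that mirrors the shell exports shown above.
    """
    env = dict(os.environ if base is None else base)
    # Mirrors: export PYTHONPATH=$REPO/:$PYTHONPATH
    env["PYTHONPATH"] = repo + "/" + os.pathsep + env.get("PYTHONPATH", "")
    # Mirrors: export PATH=$REPO/scripts:$PATH
    env["PATH"] = os.path.join(repo, "scripts") + os.pathsep + env.get("PATH", "")
    return env
```

The returned dictionary can be passed as the env argument of subprocess.call, which leaves the calling process's own environment unchanged.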
Run notebooks
Notebooks are written in two formats and therefore require different profiles.
Markdown based notebooks
Markdown based notebooks can be found in the ./ipynb/primary-processing/ folder and have the extension md.
To use these notebooks, we first need to create a profile named ag_ipymd with
$ ipython profile create ag_ipymd
and adjust the newly created /path/to/.ipython/profile_ag_ipymd/ipython_notebook_config.py by adding
#------------------------
# ipymd
#------------------------
c.NotebookApp.contents_manager_class = 'ipymd.IPymdContentsManager'
to the end of the file.
Now, we can start ipython with
$ ipython notebook --profile=ag_ipymd
and visit the newly started notebook server by going to http://localhost:8888
Jupyter/IPython based notebooks
Notebooks in the native notebook format (ipynb) can be found in the ./ipynb/ folder and have the extension ipynb.
To use these notebooks, we first need to create a profile named ag_default with
$ ipython profile create ag_default
Now, we can start ipython with
$ ipython notebook --profile=ag_default
and visit the newly started notebook server by going to http://localhost:8888