metaGEM
Note
An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data.
metaGEM is a Snakemake workflow that integrates an array of existing bioinformatics and metabolic modeling tools to predict metabolic interactions within the bacterial communities of microbiomes. Metagenome-assembled genomes (MAGs) are reconstructed from whole-metagenome shotgun datasets and then converted into genome-scale metabolic models (GEMs) for in silico simulations. Additional outputs include abundance estimates, taxonomic assignments, growth rate estimates, pangenome analyses, and eukaryotic MAG identification.
Installation
You can start using metaGEM on your cluster with a single command using the mamba package manager:
mamba create -n metagem -c bioconda metagem
This will create an environment called metagem and start installing dependencies. Please consult the config/README.md page for more detailed setup instructions.
Usage
Clone this repo:
git clone https://github.com/franciscozorrilla/metaGEM.git && cd metaGEM/workflow
Run metaGEM without any arguments to see usage instructions:
bash metaGEM.sh
Usage: bash metaGEM.sh [-t|--task TASK]
                       [-j|--nJobs NUMBER OF JOBS]
                       [-c|--cores NUMBER OF CORES]
                       [-m|--mem GB RAM]
                       [-h|--hours MAX RUNTIME]
                       [-l|--local]

Options:
  -t, --task     Specify task to complete:

                   SETUP
                     createFolders
                     downloadToy
                     organizeData
                     check

                   CORE WORKFLOW
                     fastp
                     megahit
                     crossMapSeries
                     kallistoIndex
                     crossMapParallel
                     kallisto2concoct
                     concoct
                     metabat
                     maxbin
                     binRefine
                     binReassemble
                     extractProteinBins
                     carveme
                     memote
                     organizeGEMs
                     smetana
                     extractDnaBins
                     gtdbtk
                     abundance

                   BONUS
                     grid
                     prokka
                     roary
                     eukrep
                     eukcc

                   VISUALIZATION (in development)
                     stats
                     qfilterVis
                     assemblyVis
                     binningVis
                     taxonomyVis
                     modelVis
                     interactionVis
                     growthVis

  -j, --nJobs    Specify number of jobs to run in parallel
  -c, --nCores   Specify number of cores per job
  -m, --mem      Specify memory in GB required for job
  -h, --hours    Specify number of hours to allocated to job runtime
  -l, --local    Run jobs on local machine for non-cluster usage
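Each task listed above is run with one metaGEM.sh call, and the resource options attach to that call. The sketch below is a minimal illustration of how they fit together: a hypothetical Python helper, `build_metagem_command`, which is not part of metaGEM, checks a task name against the list printed in the usage text and assembles the matching arguments using only the documented short flags. The resource values in the example are arbitrary placeholders, not recommendations.

```python
import shlex

# Task names copied from the usage text, grouped as listed there.
TASKS = {
    "SETUP": ["createFolders", "downloadToy", "organizeData", "check"],
    "CORE WORKFLOW": [
        "fastp", "megahit", "crossMapSeries", "kallistoIndex",
        "crossMapParallel", "kallisto2concoct", "concoct", "metabat",
        "maxbin", "binRefine", "binReassemble", "extractProteinBins",
        "carveme", "memote", "organizeGEMs", "smetana",
        "extractDnaBins", "gtdbtk", "abundance",
    ],
    "BONUS": ["grid", "prokka", "roary", "eukrep", "eukcc"],
    "VISUALIZATION": [
        "stats", "qfilterVis", "assemblyVis", "binningVis",
        "taxonomyVis", "modelVis", "interactionVis", "growthVis",
    ],
}
VALID_TASKS = {task for group in TASKS.values() for task in group}


def build_metagem_command(task, n_jobs=None, cores=None, mem_gb=None,
                          hours=None, local=False):
    """Return the argv list for one metaGEM.sh call (hypothetical helper)."""
    if task not in VALID_TASKS:
        raise ValueError(f"unknown metaGEM task: {task!r}")
    argv = ["bash", "metaGEM.sh", "-t", task]
    # Append only the short flags documented in the usage text, skipping unset ones.
    for flag, value in (("-j", n_jobs), ("-c", cores),
                        ("-m", mem_gb), ("-h", hours)):
        if value is not None:
            argv += [flag, str(value)]
    if local:
        argv.append("-l")
    return argv


if __name__ == "__main__":
    # Placeholder resources for a quality-filtering run.
    cmd = build_metagem_command("fastp", n_jobs=2, cores=4, mem_gb=20, hours=2)
    print(shlex.join(cmd))
```

The helper only builds the command and runs nothing. From inside the workflow folder, the returned list could be handed to `subprocess.run(cmd, check=True)`.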
Try it now
You can set up and use metaGEM in the cloud by following along with the Google Colab notebook.
Please note that Google Colab does not provide the computational resources needed to fully run metaGEM on a real dataset. The notebook demonstrates how to set up and use metaGEM by performing the first steps of the workflow on a toy dataset.
Tutorials
metaGEM can be used to explore your own gut microbiome sequencing data from at-home test kit services such as Unseen Bio. The following tutorial showcases the metaGEM workflow on two Unseen Bio samples.
For an introductory metabolic modeling tutorial, refer to the resources compiled for the EMBOMicroCom: Metabolite and species dynamics in microbial communities workshop in 2022.
For a more advanced tutorial, check out the resources we put together for the SymbNET: from metagenomics to metabolic interactions course in 2022.
Wiki
Refer to the wiki for additional usage tips, frequently asked questions, and implementation details.
Datasets
- You can access the metaGEM-generated results for the publication here.
  - Small communities of gut microbes from lab cultures
  - Real gut microbiome samples from Swedish diabetes paper
  - Plant-associated soil samples from Chinese rhizobiome study
  - Bulk-soil samples from Australian biodiversity analysis
  - Ocean water samples from global TARA Oceans expeditions
- Additionally, you can access metaGEM-generated results from a reanalysis of recently published ancient metagenomes here.
Workflow
Core
- Quality filter reads with fastp
- Assembly with megahit
- Draft bin sets with CONCOCT, MaxBin2, and MetaBAT2
- Refine & reassemble bins with metaWRAP
- Taxonomic assignment with GTDB-tk
- Relative abundances with bwa and samtools
- Reconstruct & evaluate genome-scale metabolic models with CarveMe and memote
- Species metabolic coupling analysis with SMETANA
Bonus
- Growth rate estimation with GRiD, SMEG or CoPTR
- Pangenome analysis with roary
- Eukaryotic draft bins with EukRep and EukCC
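The core workflow above computes relative abundances from read mappings made with bwa and samtools. A common way to turn per-MAG mapped read counts into relative abundances is to divide each count by MAG length and then rescale the results so they sum to 1. The sketch below assumes that definition and uses hypothetical input dictionaries. metaGEM's own normalization may differ, so treat the Snakefile's abundance rule as the authoritative calculation.

```python
def relative_abundance(mapped_reads, bin_length_bp):
    """Length-normalized relative abundance per MAG (illustrative sketch).

    mapped_reads: dict of MAG id -> reads mapped to that MAG (assumed input).
    bin_length_bp: dict of MAG id -> total MAG length in base pairs.
    """
    # Reads per base pair: at equal cell abundance, a longer genome
    # attracts proportionally more reads, so divide that effect out.
    density = {mag: mapped_reads[mag] / bin_length_bp[mag] for mag in mapped_reads}
    total = sum(density.values())
    if total == 0:
        raise ValueError("no reads mapped to any MAG")
    # Rescale so that the abundances of all MAGs in a sample sum to 1.
    return {mag: d / total for mag, d in density.items()}


# Hypothetical counts: bin1 receives 3x the reads of bin2 but is also 3x longer.
example = relative_abundance(
    mapped_reads={"bin1": 300, "bin2": 100},
    bin_length_bp={"bin1": 3_000_000, "bin2": 1_000_000},
)
print(example)
```

In this example both MAGs have the same read density, so each ends up with a relative abundance of 0.5. A raw read fraction would instead have ranked bin1 three times higher.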
Active Development
If you would like to see additional or alternative tools incorporated into the metaGEM workflow, please raise an issue or create a pull request. Snakemake makes workflows very flexible, so adding a new rule is as easy as filling out the following template and adding it to the Snakefile:
# Rule names must be valid Python identifiers, so use underscores, not hyphens
rule package_name:
    input:
        rules.rulename.output
    output:
        f'{config["path"]["root"]}/{config["folder"]["X"]}/{{IDs}}/output.file'
    message:
        """
        Helpful and descriptive message detailing goal of this rule/package.
        """
    shell:
        """
        # Well documented command line instructions go here

        # Load conda environment
        set +u;source activate {config[envs][package]};set -u;

        # Run tool
        package-name -i {input} -o {output}
        """
Publications
The metaGEM workflow has been used in the following publications:
Plastic-degrading potential across the global microbiome correlates with recent pollution trends
Zrimec J, Kokina M, Jonasson S, Zorrilla F, Zelezniak A
mBio, 2021
Competition-cooperation in the chemoautotrophic ecosystem of Movile Cave: first metagenomic approach on sediments
Chiciudean I, Russo G, Bogdan DF, et al.
Environmental Microbiome, 2022
The National Ecological Observatory Network's soil metagenomes: assembly and basic analysis
Werbin ZR, Hackos B, Lopez-Nava J, et al.
F1000Research, 2022
Please cite
metaGEM: reconstruction of genome scale metabolic models directly from metagenomes
Francisco Zorrilla, Filip Buric, Kiran R Patil, Aleksej Zelezniak
Nucleic Acids Research, 2021; gkab815, https://doi.org/10.1093/nar/gkab815
Contact
Please reach out with any comments, concerns, or discussions regarding metaGEM.