TPMCalculator


TPMCalculator quantifies mRNA abundance directly from alignments by parsing BAM files. The inputs are the same GTF file used to generate the alignments and one or more BAM files containing either single-end or paired-end sequencing reads. The output consists of four files per sample reporting the TPM values and raw read counts for genes, transcripts, exons and introns, respectively.
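As a rough illustration of what a TPM value expresses, the textbook definition divides each feature's read count by its length and rescales the results so that they sum to one million per sample. The sketch below follows that general definition only. The function name `tpm` and the toy numbers are hypothetical, and TPMCalculator's own implementation may differ in detail, for example in how feature lengths are derived.

```python
def tpm(counts, lengths):
    """Return TPM values for parallel lists of read counts and feature lengths.

    Assumes every length is a positive number of bases.
    """
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same size")
    # Reads per base for each feature.
    rates = [count / length for count, length in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        # No reads at all: every feature gets a TPM of zero.
        return [0.0] * len(rates)
    # Rescale so that the values of one sample sum to one million.
    return [rate / total * 1e6 for rate in rates]


if __name__ == "__main__":
    # Hypothetical toy data: read counts and lengths for three features.
    for value in tpm([100, 200, 0], [1000, 4000, 500]):
        print(round(value, 2))
```

Because the values are rescaled per sample, a feature's TPM depends on the counts of all other features in the same sample, not only on its own reads.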

Reference

  • Roberto Vera Alvarez, Lorinc Sandor Pongor, Leonardo Mariño-Ramírez, David Landsman; TPMCalculator: one-step software to quantify mRNA abundance of genomic features, Bioinformatics, bty896, https://doi.org/10.1093/bioinformatics/bty896

Conda/Bioconda

TPMCalculator is available on Bioconda: https://bioconda.github.io/recipes/tpmcalculator/README.html

NIH Biowulf

NIH Biowulf users can load TPMCalculator as a module: https://hpc.nih.gov/apps/TPMCalculator.html

Requirements

BAMTools

Clone the BAMTools repository from GitHub: https://github.com/pezmaster31/bamtools

Compile it as follows and set the environment variables for TPMCalculator:

cd bamtools
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=../ ..
make
make install
cd ..
export BAMTOOLS_DIR=`pwd`
export CPPFLAGS="-I $BAMTOOLS_DIR/include/bamtools/"
export LDFLAGS="-L $BAMTOOLS_DIR/lib64 -Wl,-rpath,$BAMTOOLS_DIR/lib64"

That's it. BAMTools is now compiled, and the environment variables needed to compile TPMCalculator are set.

Installation

After installing BAMTools, go to the TPMCalculator folder and run make:

make

A bin folder will be created with the TPMCalculator executable.

Docker

Use the provided Dockerfile, which is based on the BioContainers base image.

docker build -t biocontainers/tpmcalculator:0.0.1 https://raw.githubusercontent.com/ncbi/TPMCalculator/master/Dockerfile

docker run -v /path_to_data:/data --user=yourUID:yourGID biocontainers/tpmcalculator:0.0.1 TPMCalculator -g /data/path_to_GTF/genes.gtf -b /data/path_to_bam/sample1.bam

CWL

A CWL tool definition is also provided: tpmcalculator.cwl

Use it like this:

cwl-runner tpmcalculator.cwl --out_stderr=test.stderr --out_stdout=test.stdout -g genes.gtf -b sample_1.bam

Usage

Usage: ./bin/TPMCalculator -g GTF_file [-d BAM_files_directory|-b BAM_file]

./bin/TPMCalculator options:

    -v    Print info
    -h    Display this usage information.
    -g    GTF file
    -d    Directory with the BAM files
    -b    BAM file
    -k    Gene key to use from GTF file. Default: gene_id
    -t    Transcript key to use from GTF file. Default: transcript_id
    -c    Smallest size allowed for an intron created for genes. Default: 16. We recommend using the read length.
    -p    Use only properly paired reads. Default: No. Recommended for paired-end reads.
    -q    Minimum MAPQ value to filter out reads. Default: 0. This value depends on the aligner MAPQ value.
    -o    Minimum overlap between a read and a feature. Default: 8.
    -e    Extended output. This will include transcript level TPM values. Default: No.
    -a    Print out all features with read counts equal to zero. Default: No.
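The options above can be combined in a script that launches TPMCalculator once per sample. The sketch below only assembles the argument list from the flags documented in the usage text. The helper name `build_command`, the file names, and the chosen values are hypothetical, and it assumes that `-p` and `-e` are switches that take no value, which the usage text suggests but does not state.

```python
import shlex


def build_command(gtf, bam, properly_paired=False, min_mapq=0,
                  min_overlap=8, min_intron=16, extended=False,
                  binary="./bin/TPMCalculator"):
    """Assemble a TPMCalculator argument list from the documented options."""
    cmd = [binary, "-g", gtf, "-b", bam,
           "-c", str(min_intron),
           "-q", str(min_mapq),
           "-o", str(min_overlap)]
    # Assumption: -p and -e are plain switches without an argument.
    if properly_paired:
        cmd.append("-p")
    if extended:
        cmd.append("-e")
    return cmd


if __name__ == "__main__":
    # Hypothetical paired-end sample with 100 bp reads; values are examples only.
    cmd = build_command("genes.gtf", "sample1.bam",
                        properly_paired=True, min_intron=100)
    # Print the command; pass the list to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Keeping the arguments as a list rather than one string avoids quoting problems when paths contain spaces.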

Description

The gene model describing the genomic features is built from the GTF provided by the user. TPMCalculator applies two transformations to the genomic coordinates, producing regions for each gene that comprise the exons and “pure” intron regions, as shown in Figure S1. The first transformation merges the overlapping exons of all alternatively spliced forms of a gene into a single gene model with unique exons and introns that covers the sequence of all exonic regions. The second transformation creates a list of pure intron regions that replaces the introns generated by the first transformation. Only the intron regions are modified in this step, so that they are not overlapped by exons of other genes. Reporting TPM values for these unique introns allows further identification of alternative splicing events such as intron retention. Additionally, a set of non-overlapping gene features (exons and introns) is generated and used for TPM calculation.
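The two transformations can be pictured with plain interval arithmetic. The following is a minimal sketch under stated assumptions, not TPMCalculator's actual code: the function names and coordinates are hypothetical, intervals are closed 1-based ranges, and the `min_size` filter is only assumed to correspond to the `-c` option.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent closed intervals [(start, end), ...]."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gaps(intervals):
    """Return the regions between consecutive merged intervals (candidate introns)."""
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(intervals, intervals[1:])]


def subtract(regions, blockers, min_size=16):
    """Cut blockers out of regions, keeping pieces of at least min_size bases."""
    blockers = merge_intervals(blockers)
    pieces = []
    for start, end in regions:
        cursor = start
        for b_start, b_end in blockers:
            if b_end < cursor or b_start > end:
                continue  # This blocker does not touch the remaining region.
            if b_start > cursor:
                pieces.append((cursor, b_start - 1))
            cursor = max(cursor, b_end + 1)
        if cursor <= end:
            pieces.append((cursor, end))
    return [(s, e) for s, e in pieces if e - s + 1 >= min_size]


if __name__ == "__main__":
    # Hypothetical gene A with two transcripts, and one exon of a second gene B.
    gene_a_exons = [(100, 200), (400, 500), (150, 250), (400, 500), (700, 800)]
    gene_b_exons = [(300, 350)]

    unique_exons = merge_intervals(gene_a_exons)   # first transformation
    introns = gaps(unique_exons)
    pure_introns = subtract(introns, gene_b_exons)  # second transformation

    print(unique_exons)
    print(introns)
    print(pure_introns)
```

In this toy case the exon of gene B splits the first intron of gene A into two shorter pure intron regions, while the exons of gene A are left untouched.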

Gene model

Validation

For a more detailed description and installation guidelines, see https://github.com/ncbi/TPMCalculator/wiki/

Credits

Roberto Vera Alvarez

Lorinc Pongor

Leonardo Mariño-Ramírez

David Landsman

Public Domain notice

National Center for Biotechnology Information.

This software is a "United States Government Work" under the terms of the United States Copyright Act. It was written as part of the authors' official duties as United States Government employees and thus cannot be copyrighted. This software is freely available to the public for use. The National Library of Medicine and the U.S. Government have not placed any restriction on its use or reproduction.

Although all reasonable efforts have been taken to ensure the accuracy and reliability of the software and data, the NLM and the U.S. Government do not and cannot warrant the performance or results that may be obtained by using this software or data. The NLM and the U.S. Government disclaim all warranties, express or implied, including warranties of performance, merchantability or fitness for any particular purpose.

Please cite NCBI in any work or product based on this material.
