AGAT

Another Gtf/Gff Analysis Toolkit

Suite of tools to handle gene annotations in any GTF/GFF format.
>>docs<<



What can AGAT do for you?

AGAT can check, fix, and pad missing information (features/attributes) in any kind of GTF or GFF file to create a complete, sorted, and standardised GFF3 file. Over the years it has been enriched with many tools that perform just about any task related to GTF/GFF files (sanitizing, conversions, merging, modifying, filtering, FASTA sequence extraction, adding information, etc.). Compared to other methods, AGAT is robust to even the most despicable GTF/GFF files.

  • Standardize/sanitize any GTF/GFF file into a comprehensive GFF3 format (script with _sp_ prefix)

    See standardization/sanitization tool
    task | tool
    check, fix, pad missing information into sorted and standardised gff3 | agat_convert_sp_gxf2gxf.pl
    • add missing parent features (e.g. gene and mRNA if only CDS/exon exists).
    • add missing features (e.g. exon and UTR).
    • add missing mandatory attributes (i.e. ID, Parent).
    • fix identifiers to be unique.
    • fix feature locations.
    • remove duplicated features.
    • group related features (if spread in different places in the file).
    • sort features (tabix optional).
    • merge overlapping loci into one single locus (only if option activated).
  • Convert many formats

    See conversion tools
    task | tool
    convert any GTF/GFF into BED format | agat_convert_sp_gff2bed.pl
    convert any GTF/GFF into GTF format | agat_convert_sp_gff2gtf.pl
    convert any GTF/GFF into tabulated format | agat_sp_gff2tsv.pl
    convert any BAM from minimap2 into GFF format | agat_convert_sp_minimap2_bam2gff.pl
    convert any GTF/GFF into ZFF format | agat_sp_gff2zff.pl
    convert any GTF/GFF into any GTF/GFF (bioperl) format | agat_convert_sp_gxf2gxf.pl
    convert BED format into GFF3 format | agat_convert_bed2gff.pl
    convert EMBL format into GFF3 format | agat_convert_embl2gff.pl
    convert genscan format into GFF3 format | agat_convert_genscan2gff.pl
    convert mfannot format into GFF3 format | agat_convert_mfannot2gff.pl
  • Perform numerous tasks (Just about anything that is possible)

    See tools
    task | tool
    make feature statistics | agat_sp_statistics.pl
    make function statistics | agat_sp_functional_statistics.pl
    extract any type of sequence | agat_sp_extract_sequences.pl
    extract attributes | agat_sp_extract_attributes.pl
    complement annotations (non-overlapping loci) | agat_sp_complement_annotations.pl
    merge annotations | agat_sp_merge_annotations.pl
    filter gene models by ORF size | agat_sp_filter_by_ORF_size.pl
    filter to keep only longest isoforms | agat_sp_keep_longest_isoform.pl
    create introns features | agat_sp_add_introns.pl
    fix cds phases | agat_sp_fix_cds_phases.pl
    manage IDs | agat_sp_manage_IDs.pl
    manage UTRs | agat_sp_manage_UTRs.pl
    manage introns | agat_sp_manage_introns.pl
    manage functional annotation | agat_sp_manage_functional_annotation.pl
    specificity sensitivity | agat_sp_sensitivity_specificity.pl
    fusion / split analysis between two annotations | agat_sp_compare_two_annotations.pl
    analyze differences between BUSCO results | agat_sp_compare_two_BUSCOs.pl
    ... and much more ... | ... see here ...

About the GTF/GFF format
The GTF/GFF formats are 9-column text formats used to describe and represent genomic features. The formats have evolved considerably since 1997, and although well-defined specifications now exist, they remain flexible enough to hold a wide variety of information. This flexibility has a drawback: there is an incredible number of flavours of the formats, which can cause problems for downstream programs.
For a complete overview of the GTF/GFF formats have a look here.
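
To make the 9-column layout concrete, here is a minimal Python sketch that splits one GFF3 record into its columns. It is an illustration only, not AGAT code: the `Feature` type and `parse_gff3_line` function are hypothetical names, and the sketch only understands GFF3-style `key=value` attributes (GTF uses a different attribute syntax).

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """The nine GFF3 columns (hypothetical container, not an AGAT type)."""
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: dict


def parse_gff3_line(line: str) -> Feature:
    """Split one tab-separated GFF3 record into its nine columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    attributes = {}
    # Column 9 is a ';'-separated list of key=value pairs in GFF3
    for pair in cols[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    return Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                   cols[5], cols[6], cols[7], attributes)


record = parse_gff3_line(
    "Tob1_contig1\tProdigal:2.60\tCDS\t476\t670\t.\t-\t0\t"
    "ID=Tob1_00001;locus_tag=Tob1_00001;product=hypothetical protein"
)
print(record.type, record.start, record.end, record.attributes["locus_tag"])
```

The record used here is the first CDS line of example 8 further down in this document.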

Installation

Using Docker

See details

First, you must have Docker installed and running.
Second, have a look at the available AGAT biocontainers at quay.io.

Then:

# get the chosen AGAT container version
docker pull quay.io/biocontainers/agat:0.8.0--pl5262hdfd78af_0
# use an AGAT tool, e.g. agat_convert_sp_gxf2gxf.pl
docker run quay.io/biocontainers/agat:0.8.0--pl5262hdfd78af_0 agat_convert_sp_gxf2gxf.pl --help

Using Singularity

See details

First, you must have Singularity installed and running.
Second, have a look at the available AGAT biocontainers at quay.io.

Then:

# get the chosen AGAT container version
singularity pull docker://quay.io/biocontainers/agat:1.0.0--pl5321hdfd78af_0
# run the container
singularity run agat_1.0.0--pl5321hdfd78af_0.sif

You are now in the container. You can use an AGAT tool, e.g. agat_convert_sp_gxf2gxf.pl, by running:

agat_convert_sp_gxf2gxf.pl --help

Using Bioconda

See details

Install AGAT

conda install -c bioconda agat

Update AGAT

conda update agat

Uninstall AGAT

conda uninstall agat  

Old school - Manually

See details

You will have to install all prerequisites and AGAT manually.

Install prerequisites

  • R (optional)
    You can install it with conda (conda install r-base), through CRAN (See here for a nice tutorial) or using your package management tool (e.g. apt for Debian, Ubuntu, and related Linux distributions). R is optional and is used to produce some plots. You will also need to install the Perl dependency Statistics::R.

  • Perl >= 5.8
    It should already be available on your computer. If you are unlucky perl.org is the place to go.

  • Perl modules
    They can be installed in different ways:

    • using cpan or cpanm
    cpanm bioperl Clone Graph::Directed LWP::UserAgent Carp Sort::Naturally File::Share File::ShareDir::Install Moose YAML LWP::Protocol::https
    
    • using conda

      • using the provided yaml file
      conda env create -f conda_environment_AGAT.yml
      conda activate agat
      
      • manually
      conda install perl-bioperl perl-clone perl-graph perl-lwp-simple perl-carp perl-sort-naturally perl-file-share perl-file-sharedir-install perl-moose perl-yaml perl-lwp-protocol-https
      
    • using your package management tool (e.g apt for Debian, Ubuntu, and related Linux distributions)

    apt install libbio-perl-perl libclone-perl libgraph-perl liblwp-useragent-determined-perl libstatistics-r-perl libcarp-clan-perl libsort-naturally-perl libfile-share-perl libfile-sharedir libfile-sharedir-install-perl libyaml-perl liblwp-protocol-https-perl
    
  • Optional: some scripts can produce plots. You will need R and Statistics::R, which are not included by default.

    • R
      You can install it by conda (conda install r-base), through CRAN (See here for a nice tutorial) or using your package management tool (e.g apt for Debian, Ubuntu, and related Linux distributions).

    • Statistics::R
      You can install it through conda (conda install perl-statistics-r), using cpan/cpanm (cpanm Statistics::R), or your package management tool (apt install libstatistics-r-perl).

Install AGAT

git clone https://github.com/NBISweden/AGAT.git # Clone AGAT
cd AGAT                                         # move into AGAT folder
perl Makefile.PL                                # Check all the dependencies*
make                                            # Compile
make test                                       # Test
make install                                    # Install

*If dependencies are missing you will be warned. Please refer to the Install prerequisites section.

Remark: On MS Windows, instead of make you will probably have to use dmake or nmake, depending on the toolchain you have.

Update AGAT

From the folder where the repository is located.

git pull                                        # Update to the latest AGAT
perl Makefile.PL                                # Check all the dependencies*
make                                            # Compile
make test                                       # Test
make install                                    # Install

*If dependencies are missing you will be warned. Please refer to the Install prerequisites section.

Change to a specific version

From the folder where the repository is located.

git pull                                        # Update the code
git checkout v0.1                               # use version v0.1 (See releases tab for a list of available versions)
perl Makefile.PL                                # Check all the dependencies*
make                                            # Compile
make test                                       # Test
make install                                    # Install

*If dependencies are missing you will be warned. Please refer to the Install prerequisites section.

Uninstall AGAT

perl uninstall_AGAT

Usage

script_name.pl -h

List of tools

As AGAT is a toolkit, it contains a lot of tools. The main one is agat_convert_sp_gxf2gxf.pl, which checks, fixes, and pads missing information (features/attributes) in any kind of GTF or GFF file to create a complete, sorted, and standardised GFF3 file. All the installed scripts have the agat_ prefix.

There are several ways to see the available tools:

  • agat --tools
  • Type agat_ in your terminal followed by the <TAB> key to trigger autocompletion, which displays the complete list of available tools.
  • The documentation.

More about the tools

with _sp_ prefix => Means SLURP

The GFF file is loaded into memory in a specific data structure that gives access to any feature at any time. This has a memory cost but makes life smoother: it allows complicated tasks to be performed in a more time-efficient way. Moreover, it allows all potential errors to be fixed, within the limits of what the format itself permits. See the AGAT parser section for more information.

with _sq_ prefix => Means SEQUENTIAL

The GFF file is read and processed line by line, from top to bottom, without any sanity check. This is memory efficient.
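
The trade-off between the two styles can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how AGAT is implemented: `records`, `count_types_sequential` and `slurp` are hypothetical helpers, and the three-line GFF text is made up for the demonstration.

```python
import io

GFF_TEXT = (
    "##gff-version 3\n"
    "chr1\tdemo\tgene\t1\t100\t.\t+\t.\tID=gene1\n"
    "chr1\tdemo\tmRNA\t1\t100\t.\t+\t.\tID=mrna1;Parent=gene1\n"
    "chr1\tdemo\texon\t1\t100\t.\t+\t.\tID=exon1;Parent=mrna1\n"
)


def records(handle):
    """Yield the nine columns of each non-comment, non-empty line."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        yield line.rstrip("\n").split("\t")


def count_types_sequential(handle):
    """Sequential style: one pass, only a small counter stays in memory."""
    counts = {}
    for cols in records(handle):
        counts[cols[2]] = counts.get(cols[2], 0) + 1
    return counts


def slurp(handle):
    """Slurp style: keep every record, indexed by ID, for later random access."""
    by_id = {}
    for cols in records(handle):
        attrs = dict(pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair)
        by_id[attrs["ID"]] = (cols, attrs)
    return by_id


counts = count_types_sequential(io.StringIO(GFF_TEXT))
index = slurp(io.StringIO(GFF_TEXT))

# With everything in memory, walking from an exon up to its gene is two lookups
parent_of_exon = index["exon1"][1]["Parent"]
gene_of_exon = index[parent_of_exon][1]["Parent"]
print(counts, gene_of_exon)
```

The sequential function never needs more than one line at a time, whereas the slurped index can answer questions that span the whole file, such as which gene an exon belongs to.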

The AGAT parser - Standardisation to create GXF files compliant to any tool

All tools with the agat_sp_ prefix parse and slurp the entire data into a specific data structure. Below you will find more information about the peculiarities of the data structure and the parsing approach used.

the data structure

The method creates a hash structure containing all the data in memory. We can call it OMNISCIENT. The OMNISCIENT structure has three levels:

$omniscient{level1}{tag_l1}{level1_id} = feature <= tag could be gene, match  
$omniscient{level2}{tag_l2}{idY} = @featureListL2 <= tag could be mRNA, rRNA, tRNA, etc. idY is a level1_id (known as the Parent attribute within the level2 feature). @featureListL2 is a list so that isoforms can be handled.  
$omniscient{level3}{tag_l3}{idZ} = @featureListL3 <= tag could be exon, cds, utr3, utr5, etc. idZ is the ID of a level2 feature (known as the Parent attribute within the level3 feature). @featureListL3 is a list so that all features with the same tag are kept together.  
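
For readers more familiar with Python than Perl, the same three-level layout can be sketched with nested dictionaries. This is a hypothetical stand-in and not AGAT's implementation: `new_omniscient` is an invented helper, features are plain dicts, and the lowercase tag keys are an assumption made for the illustration.

```python
from collections import defaultdict


def new_omniscient():
    """Hypothetical Python stand-in for the three-level hash described above."""
    return {
        "level1": defaultdict(dict),                       # tag -> level1_id -> feature
        "level2": defaultdict(lambda: defaultdict(list)),  # tag -> level1_id -> [features]
        "level3": defaultdict(lambda: defaultdict(list)),  # tag -> level2_id -> [features]
    }


omniscient = new_omniscient()

gene = {"type": "gene", "start": 337818, "end": 343277, "ID": "CLUHARG00000005458"}
mrna = {"type": "mRNA", "start": 337818, "end": 343277,
        "ID": "CLUHART00000008717", "Parent": "CLUHARG00000005458"}
exons = [
    {"type": "exon", "start": 337818, "end": 337971, "Parent": "CLUHART00000008717"},
    {"type": "exon", "start": 340733, "end": 340841, "Parent": "CLUHART00000008717"},
]

# Level 1 is keyed by its own ID; levels 2 and 3 are keyed by their Parent
omniscient["level1"]["gene"][gene["ID"]] = gene
omniscient["level2"]["mrna"][mrna["Parent"]].append(mrna)
for exon in exons:
    omniscient["level3"]["exon"][exon["Parent"]].append(exon)

# Walk from a gene down to its exons
for transcript in omniscient["level2"]["mrna"][gene["ID"]]:
    for exon in omniscient["level3"]["exon"][transcript["ID"]]:
        print(gene["ID"], transcript["ID"], exon["start"], exon["end"])
```

Keying level 2 and level 3 by the parent identifier is what makes it cheap to go from a gene to its transcripts and from a transcript to its exons, CDS and UTRs.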

How does the AGAT parser work

The AGAT parser philosophy:

    1. Parse by Parent/child relationship or gene_id/transcript_id relationship.
    2. ELSE parse by a common tag (an attribute value shared by features that must be grouped together. By default we use locus_tag, but this can be set by parameter).
    3. ELSE parse sequentially (features are grouped in a bucket; the bucket changes at each level2 feature, and buckets are joined under a common tag at each new level1 feature).

/!\ Case with only level3 features (e.g. RAST or some Prokka files): sequential parsing will not work as expected, because all features will be children of a single newly created parent. To create one parent per feature or group of features, a common tag must be used to group them correctly. We use gene_id and locus_tag by default, but you can set the one of your choice.

To summarize, the parsing priority is: Parent/child relationship > locus_tag > sequential.
The parser may use only one or a mix of these approaches, depending on the peculiarities of the GTF/GFF file you provide.
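
The priority order can be pictured as a small decision function. The sketch below is a deliberately simplified reading of the three rules, not the parser's actual logic: `choose_strategy` and its `common_tags` parameter are hypothetical names, and a feature is reduced to a dict of its column-9 attributes.

```python
def choose_strategy(attributes, common_tags=("locus_tag",)):
    """Pick the grouping approach for one feature, in the priority order above.

    `attributes` is a plain dict of the feature's column-9 attributes.
    """
    # 1. An explicit relationship (GFF Parent, or GTF gene_id/transcript_id) wins
    if any(name in attributes for name in ("Parent", "gene_id", "transcript_id")):
        return "relationship"
    # 2. Otherwise fall back to a shared attribute value (locus_tag by default)
    if any(tag in attributes for tag in common_tags):
        return "common_tag"
    # 3. Otherwise only the position in the file is left
    return "sequential"


exon = {"ID": "exon1", "Parent": "mrna1"}
lone_cds = {"ID": "Tob1_00001", "locus_tag": "Tob1_00001"}
sig_peptide = {"note": "predicted cleavage at residue 33"}

print(choose_strategy(exon))         # relationship
print(choose_strategy(lone_cds))     # common_tag
print(choose_strategy(sig_peptide))  # sequential
```

The `lone_cds` and `sig_peptide` cases mirror example 8 below, where each CDS is grouped through its locus_tag and each sig_peptide, which has neither Parent nor locus_tag, is attached to the feature group that precedes it in the file.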

What can the AGAT parser do for you

  • It creates missing parental features (e.g. if a level2 or level3 feature does not have parental feature(s), we create the missing level2 and/or level1 feature(s)).
  • It creates missing mandatory attributes (ID and/or Parent).
  • It fixes identifiers to be unique.
  • It removes duplicated features (same position, same ID, same Parent).
  • It expands level3 features sharing multiple parents (e.g. if one exon lists multiple parent mRNAs in its Parent attribute, one exon per parent with a unique ID will be created).
  • It fixes feature location errors (e.g. if an mRNA extends beyond its gene location, we fix the gene location).
  • It adds UTRs if possible (CDS and exon present).
  • It adds exons if possible (CDS has to be present).
  • It groups features together (if related features are spread at different places in the file).
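
Two of the fixes listed above are easy to picture in code: expanding a level3 feature that lists several parents, and widening a parent so that it covers its children. The Python sketch below is an illustration under assumptions, not AGAT's code: `expand_multi_parent` and `stretch_parent` are hypothetical helpers, features are plain dicts, and the `-1`, `-2` ID suffixes are an invented naming scheme.

```python
def expand_multi_parent(feature):
    """Return one copy of `feature` per parent listed in its Parent attribute."""
    parents = feature["Parent"].split(",")  # GFF3 separates multiple parents with commas
    if len(parents) == 1:
        return [dict(feature)]
    copies = []
    for index, parent in enumerate(parents, start=1):
        copy = dict(feature)
        copy["Parent"] = parent
        copy["ID"] = f"{feature['ID']}-{index}"  # hypothetical naming scheme
        copies.append(copy)
    return copies


def stretch_parent(parent, children):
    """Widen `parent` in place so that it covers every child location."""
    parent["start"] = min([parent["start"]] + [c["start"] for c in children])
    parent["end"] = max([parent["end"]] + [c["end"] for c in children])
    return parent


shared_exon = {"ID": "exon1", "Parent": "mrna1,mrna2", "start": 100, "end": 200}
expanded = expand_multi_parent(shared_exon)
print([(e["ID"], e["Parent"]) for e in expanded])

gene = {"ID": "gene1", "start": 100, "end": 200}
mrna = {"ID": "mrna1", "Parent": "gene1", "start": 90, "end": 250}
print(stretch_parent(gene, [mrna]))
```

In the second call the mRNA extends beyond the gene on both sides, so the gene is widened to 90-250 rather than the mRNA being trimmed, which matches the behaviour described in the list.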

examples

AGAT was tested on 42 different types of GTF/GFF files of different flavours and/or containing errors. A few are listed below, but you can find the full list in the t/gff_syntax directory.

example 8 - only CDS defined
See example
##gff-version 3
Tob1_contig1	Prodigal:2.60	CDS	476	670	.	-	0	ID=Tob1_00001;locus_tag=Tob1_00001;product=hypothetical protein
Tob1_contig1	Prodigal:2.60	CDS	34266	35222	.	+	0	ID=Tob1_00024;locus_tag=Tob1_00024;product=hypothetical protein
Tob1_contig1	SignalP:4.1	sig_peptide	34266	34298	.	+	0	inference=ab initio prediction:SignalP:4.1;note=predicted cleavage at residue 33;product=putative signal peptide
Tob1_contig1	Prodigal:2.60	CDS	35267	37444	.	-	0	ID=Tob1_00025;locus_tag=Tob1_00025;
Tob1_contig1	SignalP:4.1	sig_peptide	37420	37444	.	-	0	inference=ab initio prediction:SignalP:4.1;note=predicted cleavage at residue 25;product=putative signal peptide
Tob1_contig1	Prodigal:2.60	CDS	38304	39338	.	-	0	ID=Tob1_00026;locus_tag=Tob1_00026;

agat_convert_sp_gxf2gxf.pl --gff 8_test.gff

See result
##gff-version 3
Tob1_contig1	Prodigal:2.60	gene	476	670	.	-	0	ID=nbis_NEW-gene-1;locus_tag=Tob1_00001;product=hypothetical protein
Tob1_contig1	Prodigal:2.60	mRNA	476	670	.	-	0	ID=nbis_nol2id-cds-1;Parent=nbis_NEW-gene-1;locus_tag=Tob1_00001;product=hypothetical protein
Tob1_contig1	Prodigal:2.60	exon	476	670	.	-	.	ID=nbis_NEW-exon-1;Parent=nbis_nol2id-cds-1;locus_tag=Tob1_00001;product=hypothetical protein
Tob1_contig1	Prodigal:2.60	CDS	476	670	.	-	0	ID=Tob1_00001;Parent=nbis_nol2id-cds-1;locus_tag=Tob1_00001;product=hypothetical protein
Tob1_contig1	Prodigal:2.60	gene	34266	35222	.	+	0	ID=nbis_NEW-gene-2;locus_tag=Tob1_00024;product=hypothetical protein
Tob1_contig1	Prodigal:2.60	mRNA	34266	35222	.	+	0	ID=nbis_nol2id-cds-2;Parent=nbis_NEW-gene-2;locus_tag=Tob1_00024;product=hypothetical protein
Tob1_contig1	Prodigal:2.60	exon	34266	35222	.	+	.	ID=nbis_NEW-exon-2;Parent=nbis_nol2id-cds-2;locus_tag=Tob1_00024;product=hypothetical protein
Tob1_contig1	Prodigal:2.60	CDS	34266	35222	.	+	0	ID=Tob1_00024;Parent=nbis_nol2id-cds-2;locus_tag=Tob1_00024;product=hypothetical protein
Tob1_contig1	SignalP:4.1	sig_peptide	34266	34298	.	+	0	ID=sig_peptide-1;Parent=nbis_nol2id-cds-2;inference=ab initio prediction:SignalP:4.1;note=predicted cleavage at residue 33;product=putative signal peptide
Tob1_contig1	Prodigal:2.60	gene	35267	37444	.	-	0	ID=nbis_NEW-gene-3;locus_tag=Tob1_00025
Tob1_contig1	Prodigal:2.60	mRNA	35267	37444	.	-	0	ID=nbis_nol2id-cds-3;Parent=nbis_NEW-gene-3;locus_tag=Tob1_00025
Tob1_contig1	Prodigal:2.60	exon	35267	37444	.	-	.	ID=nbis_NEW-exon-3;Parent=nbis_nol2id-cds-3;locus_tag=Tob1_00025
Tob1_contig1	Prodigal:2.60	CDS	35267	37444	.	-	0	ID=Tob1_00025;Parent=nbis_nol2id-cds-3;locus_tag=Tob1_00025
Tob1_contig1	SignalP:4.1	sig_peptide	37420	37444	.	-	0	ID=sig_peptide-2;Parent=nbis_nol2id-cds-3;inference=ab initio prediction:SignalP:4.1;note=predicted cleavage at residue 25;product=putative signal peptide
Tob1_contig1	Prodigal:2.60	gene	38304	39338	.	-	0	ID=nbis_NEW-gene-4;locus_tag=Tob1_00026
Tob1_contig1	Prodigal:2.60	mRNA	38304	39338	.	-	0	ID=nbis_nol2id-cds-4;Parent=nbis_NEW-gene-4;locus_tag=Tob1_00026
Tob1_contig1	Prodigal:2.60	exon	38304	39338	.	-	.	ID=nbis_NEW-exon-4;Parent=nbis_nol2id-cds-4;locus_tag=Tob1_00026
Tob1_contig1	Prodigal:2.60	CDS	38304	39338	.	-	0	ID=Tob1_00026;Parent=nbis_nol2id-cds-4;locus_tag=Tob1_00026
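
The padding visible in this result can be imitated with a short Python sketch. It is a simplified illustration and not AGAT's code: `pad_cds_only` is a hypothetical helper, features are plain dicts, and the identifier patterns are copied from the output above only to make the comparison easy.

```python
def pad_cds_only(cds, n):
    """Build gene, mRNA and exon records around a lone CDS (simplified)."""
    # Attributes other than ID are propagated to the created features
    shared = {k: v for k, v in cds["attributes"].items() if k != "ID"}
    gene_id = f"nbis_NEW-gene-{n}"
    mrna_id = f"nbis_nol2id-cds-{n}"
    gene = {**cds, "type": "gene", "attributes": {"ID": gene_id, **shared}}
    mrna = {**cds, "type": "mRNA",
            "attributes": {"ID": mrna_id, "Parent": gene_id, **shared}}
    exon = {**cds, "type": "exon", "phase": ".",
            "attributes": {"ID": f"nbis_NEW-exon-{n}", "Parent": mrna_id, **shared}}
    # The CDS keeps its own ID and gains a Parent pointing to the new mRNA
    fixed_cds = {**cds, "attributes": {"ID": cds["attributes"]["ID"],
                                       "Parent": mrna_id, **shared}}
    return [gene, mrna, exon, fixed_cds]


cds = {
    "seqid": "Tob1_contig1", "source": "Prodigal:2.60", "type": "CDS",
    "start": 476, "end": 670, "score": ".", "strand": "-", "phase": "0",
    "attributes": {"ID": "Tob1_00001", "locus_tag": "Tob1_00001",
                   "product": "hypothetical protein"},
}

padded = pad_cds_only(cds, 1)
for feature in padded:
    print(feature["type"], feature["start"], feature["end"], feature["attributes"]["ID"])
```

Every created feature inherits the coordinates of the CDS, which is why gene, mRNA, exon and CDS all span 476-670 in the first block of the result.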
example 9 - level2 feature missing (mRNA) and level3 features missing (UTRs)
See example
##gff-version 3
#!gff-spec-version 1.14
#!source-version NCBI C++ formatter 0.2
##Type DNA NC_003070.9
NC_003070.9	RefSeq	source	1	30427671	.	+	.	organism=Arabidopsis thaliana;mol_type=genomic DNA;db_xref=taxon:3702;chromosome=1;ecotype=Columbia
NC_003070.9	RefSeq	gene	3631	5899	.	+	.	ID=NC_003070.9:NAC001;locus_tag=AT1G01010;
NC_003070.9	RefSeq	exon	3631	3913	.	+	.	ID=NM_099983.2;Parent=NC_003070.9:NAC001;gbkey=mRNA;locus_tag=AT1G01010;
NC_003070.9	RefSeq	exon	3996	4276	.	+	.	ID=NM_099983.2;Parent=NC_003070.9:NAC001;gbkey=mRNA;locus_tag=AT1G01010;
NC_003070.9	RefSeq	exon	4486	4605	.	+	.	ID=NM_099983.2;Parent=NC_003070.9:NAC001;gbkey=mRNA;locus_tag=AT1G01010;
NC_003070.9	RefSeq	exon	4706	5095	.	+	.	ID=NM_099983.2;Parent=NC_003070.9:NAC001;gbkey=mRNA;locus_tag=AT1G01010;
NC_003070.9	RefSeq	exon	5174	5326	.	+	.	ID=NM_099983.2;Parent=NC_003070.9:NAC001;gbkey=mRNA;locus_tag=AT1G01010;
NC_003070.9	RefSeq	exon	5439	5899	.	+	.	ID=NM_099983.2;Parent=NC_003070.9:NAC001;gbkey=mRNA;locus_tag=AT1G01010;
NC_003070.9	RefSeq	CDS	3760	3913	.	+	0	ID=NM_099983.2;Parent=NC_003070.9:NAC001;locus_tag=AT1G01010;
NC_003070.9	RefSeq	CDS	3996	4276	.	+	2	ID=NM_099983.2;Parent=NC_003070.9:NAC001;locus_tag=AT1G01010;
NC_003070.9	RefSeq	CDS	4486	4605	.	+	0	ID=NM_099983.2;Parent=NC_003070.9:NAC001;locus_tag=AT1G01010;
NC_003070.9	RefSeq	CDS	4706	5095	.	+	0	ID=NM_099983.2;Parent=NC_003070.9:NAC001;locus_tag=AT1G01010;
NC_003070.9	RefSeq	CDS	5174	5326	.	+	0	ID=NM_099983.2;Parent=NC_003070.9:NAC001;locus_tag=AT1G01010;
NC_003070.9	RefSeq	CDS	5439	5627	.	+	0	ID=NM_099983.2;Parent=NC_003070.9:NAC001;locus_tag=AT1G01010;
NC_003070.9	RefSeq	start_codon	3760	3762	.	+	0	ID=NM_099983.2;Parent=NC_003070.9:NAC001;locus_tag=AT1G01010;
NC_003070.9	RefSeq	stop_codon	5628	5630	.	+	0	ID=NM_099983.2;Parent=NC_003070.9:NAC001;locus_tag=AT1G01010;

agat_convert_sp_gxf2gxf.pl --gff 9_test.gff

See result
##gff-version 3
#!gff-spec-version 1.14
#!source-version NCBI C++ formatter 0.2
##Type DNA NC_003070.9
NC_003070.9	RefSeq	source	1	30427671	.	+	.	ID=source-1;chromosome=1;db_xref=taxon:3702;ecotype=Columbia;mol_type=genomic DNA;organism=Arabidopsis thaliana
NC_003070.9	RefSeq	gene	3631	5899	.	+	.	ID=nbis_NEW-gene-1;locus_tag=AT1G01010
NC_003070.9	RefSeq	mRNA	3631	5899	.	+	.	ID=NC_003070.9:NAC001;Parent=nbis_NEW-gene-1;locus_tag=AT1G01010
NC_003070.9	RefSeq	exon	3631	3913	.	+	.	ID=NM_099983.2;Parent=NC_003070.9:NAC001;gbkey=mRNA;locus_tag=AT1G01010
NC_003070.9	RefSeq	exon	3996	4276	.	+	.	ID=nbis_NEW-exon-1;Parent=NC_003070.9:NAC001;gbkey=mRNA;locus_tag=AT1G01010
NC_003070.9	RefSeq	exon	4486	4605	.	+	.	ID=nbis_NEW-exon-2;Parent=NC_003070.9:NAC001;gbkey=mRNA;locus_tag=AT1G01010
NC_003070.9	RefSeq	exon	4706	5095	.	+	.	ID=nbis_NEW-exon-3;Parent=NC_003070.9:NAC001;gbkey=mRNA;locus_tag=AT1G01010
NC_003070.9	RefSeq	exon	5174	5326	.	+	.	ID=nbis_NEW-exon-4;Parent=NC_003070.9:NAC001;gbkey=mRNA;locus_tag=AT1G01010
NC_003070.9	RefSeq	exon	5439	5899	.	+	.	ID=nbis_NEW-exon-5;Parent=NC_003070.9:NAC001;gbkey=mRNA;locus_tag=AT1G01010
NC_003070.9	RefSeq	CDS	3760	3913	.	+	0	ID=nbis_NEW-cds-1;Parent=NC_003070.9:NAC001;locus_tag=AT1G01010
NC_003070.9	RefSeq	CDS	3996	4276	.	+	2	ID=nbis_NEW-cds-1;Parent=NC_003070.9:NAC001;locus_tag=AT1G01010
NC_003070.9	RefSeq	CDS	4486	4605	.	+	0	ID=nbis_NEW-cds-1;Parent=NC_003070.9:NAC001;locus_tag=AT1G01010
NC_003070.9	RefSeq	CDS	4706	5095	.	+	0	ID=nbis_NEW-cds-1;Parent=NC_003070.9:NAC001;locus_tag=AT1G01010
NC_003070.9	RefSeq	CDS	5174	5326	.	+	0	ID=nbis_NEW-cds-1;Parent=NC_003070.9:NAC001;locus_tag=AT1G01010
NC_003070.9	RefSeq	CDS	5439	5627	.	+	0	ID=nbis_NEW-cds-1;Parent=NC_003070.9:NAC001;locus_tag=AT1G01010
NC_003070.9	RefSeq	five_prime_UTR	3631	3759	.	+	.	ID=nbis_NEW-five_prime_utr-1;Parent=NC_003070.9:NAC001;gbkey=mRNA;locus_tag=AT1G01010
NC_003070.9	RefSeq	start_codon	3760	3762	.	+	0	ID=nbis_NEW-start_codon-1;Parent=NC_003070.9:NAC001;locus_tag=AT1G01010
NC_003070.9	RefSeq	stop_codon	5628	5630	.	+	0	ID=nbis_NEW-stop_codon-1;Parent=NC_003070.9:NAC001;locus_tag=AT1G01010
NC_003070.9	RefSeq	three_prime_UTR	5628	5899	.	+	.	ID=nbis_NEW-three_prime_utr-1;Parent=NC_003070.9:NAC001;gbkey=mRNA;locus_tag=AT1G01010
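
The UTR coordinates in this result follow directly from the exon and CDS coordinates. The Python sketch below shows the arithmetic under simplifying assumptions (a single transcript, 1-based inclusive coordinates); `derive_utrs` is a hypothetical helper, not an AGAT function.

```python
def derive_utrs(exons, cds, strand):
    """Return (five_prime, three_prime) lists of (start, end) intervals.

    `exons` and `cds` are lists of 1-based inclusive (start, end) tuples
    belonging to one transcript.
    """
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)
    left, right = [], []
    for start, end in sorted(exons):
        if start < cds_start:
            # Exonic bases located before the first coding base
            left.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            # Exonic bases located after the last coding base
            right.append((max(start, cds_end + 1), end))
    # On the minus strand the 5' end is on the right-hand side
    return (left, right) if strand == "+" else (right, left)


exons = [(3631, 3913), (3996, 4276), (4486, 4605),
         (4706, 5095), (5174, 5326), (5439, 5899)]
cds = [(3760, 3913), (3996, 4276), (4486, 4605),
       (4706, 5095), (5174, 5326), (5439, 5627)]

five_prime, three_prime = derive_utrs(exons, cds, "+")
print(five_prime, three_prime)
```

With the exon and CDS coordinates of example 9 this gives 3631-3759 for the five_prime_UTR and 5628-5899 for the three_prime_UTR, the same intervals as in the result above.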
example 18 - related features spread within the file
See example
##gff-version 3
scaffold625	maker	gene	337818	343277	.	+	.	ID=CLUHARG00000005458;Name=TUBB3_2
scaffold625	maker	mRNA	337818	343277	.	+	.	ID=CLUHART00000008717;Parent=CLUHARG00000005458
scaffold625	maker	exon	337818	337971	.	+	.	ID=CLUHART00000008717:exon:1404;Parent=CLUHART00000008717
scaffold625	maker	exon	340733	340841	.	+	.	ID=CLUHART00000008717:exon:1405;Parent=CLUHART00000008717
scaffold789	maker	three_prime_UTR	564589	564780	.	+	.	ID=CLUHART00000006146:three_prime_utr;Parent=CLUHART00000006146
scaffold789	maker	mRNA	558184	564780	.	+	.	ID=CLUHART00000006147;Parent=CLUHARG00000003852
scaffold625	maker	CDS	337915	337971	.	+	0	ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
scaffold625	maker	CDS	340733	340841	.	+	0	ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
scaffold625	maker	CDS	341518	341628	.	+	2	ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
scaffold625	maker	CDS	341964	343033	.	+	2	ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
scaffold625	maker	five_prime_UTR	337818	337914	.	+	.	ID=CLUHART00000008717:five_prime_utr;Parent=CLUHART00000008717
scaffold625	maker	three_prime_UTR	343034	343277	.	+	.	ID=CLUHART00000008717:three_prime_utr;Parent=CLUHART00000008717
scaffold789	maker	gene	558184	564780	.	+	.	ID=CLUHARG00000003852;Name=PF11_0240
scaffold789	maker	mRNA	558184	564780	.	+	.	ID=CLUHART00000006146;Parent=CLUHARG00000003852
scaffold789	maker	exon	558184	560123	.	+	.	ID=CLUHART00000006146:exon:995;Parent=CLUHART00000006146
scaffold789	maker	exon	561401	561519	.	+	.	ID=CLUHART00000006146:exon:996;Parent=CLUHART00000006146
scaffold789	maker	exon	564171	564235	.	+	.	ID=CLUHART00000006146:exon:997;Parent=CLUHART00000006146
scaffold789	maker	exon	564372	564780	.	+	.	ID=CLUHART00000006146:exon:998;Parent=CLUHART00000006146
scaffold789	maker	CDS	558191	560123	.	+	0	ID=CLUHART00000006146:cds;Parent=CLUHART00000006146
scaffold789	maker	CDS	561401	561519	.	+	2	ID=CLUHART00000006146:cds;Parent=CLUHART00000006146
scaffold625	maker	exon	341518	341628	.	+	.	ID=CLUHART00000008717:exon:1406;Parent=CLUHART00000008717
scaffold625	maker	exon	341964	343277	.	+	.	ID=CLUHART00000008717:exon:1407;Parent=CLUHART00000008717
scaffold789	maker	CDS	564171	564235	.	+	0	ID=CLUHART00000006146:cds;Parent=CLUHART00000006146
scaffold789	maker	CDS	564372	564588	.	+	1	ID=CLUHART00000006146:cds;Parent=CLUHART00000006146
scaffold789	maker	five_prime_UTR	558184	558190	.	+	.	ID=CLUHART00000006146:five_prime_utr;Parent=CLUHART00000006146
scaffold789	maker	exon	558184	560123	.	+	.	ID=CLUHART00000006147:exon:997;Parent=CLUHART00000006147
scaffold789	maker	exon	561401	561519	.	+	.	ID=CLUHART00000006147:exon:998;Parent=CLUHART00000006147
scaffold789	maker	exon	562057	562121	.	+	.	ID=CLUHART00000006147:exon:999;Parent=CLUHART00000006147
scaffold789	maker	exon	564372	564780	.	+	.	ID=CLUHART00000006147:exon:1000;Parent=CLUHART00000006147
scaffold789	maker	CDS	558191	560123	.	+	0	ID=CLUHART00000006147:cds;Parent=CLUHART00000006147
scaffold789	maker	CDS	561401	561519	.	+	2	ID=CLUHART00000006147:cds;Parent=CLUHART00000006147
scaffold789	maker	CDS	562057	562121	.	+	0	ID=CLUHART00000006147:cds;Parent=CLUHART00000006147
scaffold789	maker	CDS	564372	564588	.	+	1	ID=CLUHART00000006147:cds;Parent=CLUHART00000006147
scaffold789	maker	five_prime_UTR	558184	558190	.	+	.	ID=CLUHART00000006147:five_prime_utr;Parent=CLUHART00000006147
scaffold789	maker	three_prime_UTR	564589	564780	.	+	.	ID=CLUHART00000006147:three_prime_utr;Parent=CLUHART00000006147

agat_convert_sp_gxf2gxf.pl --gff 18_test.gff

See result
##gff-version 3
scaffold625	maker	gene	337818	343277	.	+	.	ID=CLUHARG00000005458;Name=TUBB3_2
scaffold625	maker	mRNA	337818	343277	.	+	.	ID=CLUHART00000008717;Parent=CLUHARG00000005458
scaffold625	maker	exon	337818	337971	.	+	.	ID=CLUHART00000008717:exon:1404;Parent=CLUHART00000008717
scaffold625	maker	exon	340733	340841	.	+	.	ID=CLUHART00000008717:exon:1405;Parent=CLUHART00000008717
scaffold625	maker	exon	341518	341628	.	+	.	ID=CLUHART00000008717:exon:1406;Parent=CLUHART00000008717
scaffold625	maker	exon	341964	343277	.	+	.	ID=CLUHART00000008717:exon:1407;Parent=CLUHART00000008717
scaffold625	maker	CDS	337915	337971	.	+	0	ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
scaffold625	maker	CDS	340733	340841	.	+	0	ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
scaffold625	maker	CDS	341518	341628	.	+	2	ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
scaffold625	maker	CDS	341964	343033	.	+	2	ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
scaffold625	maker	five_prime_UTR	337818	337914	.	+	.	ID=CLUHART00000008717:five_prime_utr;Parent=CLUHART00000008717
scaffold625	maker	three_prime_UTR	343034	343277	.	+	.	ID=CLUHART00000008717:three_prime_utr;Parent=CLUHART00000008717
scaffold789	maker	gene	558184	564780	.	+	.	ID=CLUHARG00000003852;Name=PF11_0240
scaffold789	maker	mRNA	558184	564780	.	+	.	ID=CLUHART00000006146;Parent=CLUHARG00000003852
scaffold789	maker	exon	558184	560123	.	+	.	ID=CLUHART00000006146:exon:995;Parent=CLUHART00000006146
scaffold789	maker	exon	561401	561519	.	+	.	ID=CLUHART00000006146:exon:996;Parent=CLUHART00000006146
scaffold789	maker	exon	564171	564235	.	+	.	ID=CLUHART00000006146:exon:997;Parent=CLUHART00000006146
scaffold789	maker	exon	564372	564780	.	+	.	ID=CLUHART00000006146:exon:998;Parent=CLUHART00000006146
scaffold789	maker	CDS	558191	560123	.	+	0	ID=CLUHART00000006146:cds;Parent=CLUHART00000006146
scaffold789	maker	CDS	561401	561519	.	+	2	ID=CLUHART00000006146:cds;Parent=CLUHART00000006146
scaffold789	maker	CDS	564171	564235	.	+	0	ID=CLUHART00000006146:cds;Parent=CLUHART00000006146
scaffold789	maker	CDS	564372	564588	.	+	1	ID=CLUHART00000006146:cds;Parent=CLUHART00000006146
scaffold789	maker	five_prime_UTR	558184	558190	.	+	.	ID=CLUHART00000006146:five_prime_utr;Parent=CLUHART00000006146
scaffold789	maker	three_prime_UTR	564589	564780	.	+	.	ID=CLUHART00000006146:three_prime_utr;Parent=CLUHART00000006146
scaffold789	maker	mRNA	558184	564780	.	+	.	ID=CLUHART00000006147;Parent=CLUHARG00000003852
scaffold789	maker	exon	558184	560123	.	+	.	ID=CLUHART00000006147:exon:997;Parent=CLUHART00000006147
scaffold789	maker	exon	561401	561519	.	+	.	ID=CLUHART00000006147:exon:998;Parent=CLUHART00000006147
scaffold789	maker	exon	562057	562121	.	+	.	ID=CLUHART00000006147:exon:999;Parent=CLUHART00000006147
scaffold789	maker	exon	564372	564780	.	+	.	ID=CLUHART00000006147:exon:1000;Parent=CLUHART00000006147
scaffold789	maker	CDS	558191	560123	.	+	0	ID=CLUHART00000006147:cds;Parent=CLUHART00000006147
scaffold789	maker	CDS	561401	561519	.	+	2	ID=CLUHART00000006147:cds;Parent=CLUHART00000006147
scaffold789	maker	CDS	562057	562121	.	+	0	ID=CLUHART00000006147:cds;Parent=CLUHART00000006147
scaffold789	maker	CDS	564372	564588	.	+	1	ID=CLUHART00000006147:cds;Parent=CLUHART00000006147
scaffold789	maker	five_prime_UTR	558184	558190	.	+	.	ID=CLUHART00000006147:five_prime_utr;Parent=CLUHART00000006147
scaffold789	maker	three_prime_UTR	564589	564780	.	+	.	ID=CLUHART00000006147:three_prime_utr;Parent=CLUHART00000006147

How to cite?

This work has not been published (I will think about it). But if you wish to cite AGAT, you could probably do it as follows (adapt the version to the one you have used):

Dainat J. AGAT: Another Gff Analysis Toolkit to handle annotations in any GTF/GFF format.  
(Version v0.8.0). Zenodo. https://www.doi.org/10.5281/zenodo.3552717

Publications using AGAT

Some examples of publications that have used AGAT

See publications
Journal | Title
Genome Biology and Evolution | Ancestral Physical Stress and Later Immune Gene Family Expansions Shaped Bivalve Mollusc Evolution
Preprint | A long read optimized de novo transcriptome pipeline reveals novel ocular developmentally regulated gene isoforms and disease targets
G3 Genes Genomes Genetics | A telomere to telomere assembly of Oscheius tipulae and the evolution of rhabditid nematode chromosomes
BMC genomics | In vitro resynthesis of lichenization reveals the genetic background of symbiosis-specific fungal-algal interaction in Usnea hakonensis
G3 Genes Genomes Genetics | Application of an optimized annotation pipeline to the Cryptococcus deuterogattii genome reveals dynamic primary metabolic gene clusters and genomic impact of RNAi loss
Mol. Biol. Evol. | Genomics of an avian neo-sex chromosome reveals the evolutionary dynamics of recombination suppression and sex-linked genes
Virology | Four novel Picornaviruses detected in Magellanic Penguins (Spheniscus magellanicus) in Chile
DNA Research | The Crown Pearl: a draft genome assembly of the European freshwater pearl mussel Margaritifera margaritifera (Linnaeus, 1758)
BMC genomics | Investigating the impact of reference assembly choice on genomic analyses in a cattle breed
Plos pathogens | Two novel loci underlie natural differences in Caenorhabditis elegans abamectin responses
Preprint | Butterfly eyespots evolved via co-option of the antennal gene-regulatory network
Preprint | Transcript- and annotation-guided genome assembly of the European starling
Microbiol Resour Announc. | LGAAP: Leishmaniinae Genome Assembly and Annotation Pipeline
Genome Biology and Evolution | A Chromosome-level Genome Assembly of the Reed Warbler (Acrocephalus scirpaceus)
Preprint | Barcoded RH-seq illuminates the complex genetic basis of yeast thermotolerance
GigaByte | A high-quality draft genome for Melaleuca alternifolia (tea tree): a new platform for evolutionary genomics of myrtaceous terpene-rich species
Nature | Chromosome-scale genome sequencing, assembly and annotation of six genomes from subfamily Leishmaniinae
Preprint | High quality, phased genomes of Phytophthora ramorum clonal lineages NA1 and EU1
Elife | Analysis of meiosis in Pristionchus pacificus reveals plasticity in homolog pairing and synapsis in the nematode lineage
MDPI | Transcriptome Comparison of Secondary Metabolite Biosynthesis Genes Expressed in Cultured and Lichenized Conditions of Cladonia rangiferina
MDPI | FA-nf: A Functional Annotation Pipeline for Proteins from Non-Model Organisms Implemented in Nextflow
Preprint | De Novo Whole Genome Assembly of the Roborovski Dwarf Hamster (Phodopus roborovskii) Genome, an Animal Model for Severe/Critical COVID-19
Preprint | Using historical museum samples to examine divergent and parallel evolution in the invasive starling
GBE | A Chromosome-Level Genome Assembly of the Reed Warbler (Acrocephalus scirpaceus)
Preprint | A genome assembly of the Atlantic chub mackerel (Scomber colias): a valuable teleost fishing resource
Current Protocols | BUSCO: Assessing Genomic Data Quality and Beyond
[...] | [...]

Troubleshooting

See the Troubleshooting section from the doc here.

HTML
13
star
24

nbis-meta

A snakemake workflow for metagenomic projects
Python
13
star
25

TransposonPSI

TransposonPSI involves a PSI-blast search of a protein or nucleotide sequence against a set of profiles of proteins corresponding to major clades/families of transposon Open Reading Frames.
Pep8
12
star
26

Earth-Biogenome-Project-pilot

Assembly and Annotation workflows for analysing data in the Earth Biogenome Project pilot project.
Nextflow
11
star
27

project_template

An example of how to organize files for a research project to increase reproducibility
Dockerfile
9
star
28

workshop-genome_annotation

This is the NBIS repository for the Genome Annotation Workshop.
9
star
29

workshop-python

7
star
30

sauron

Website is at:
R
7
star
31

workshop-r

Workshop β€’ R Programming Foundations for Life Scientists β€’ 5 days
CSS
7
star
32

workshop-advanced-python

Workshop on advanced python lead by Sergio Netotea
Jupyter Notebook
6
star
33

pipelines

Pipelines developed within NBIS through different pipeline frameworks.
6
star
34

contigtax

Taxonomic classification of metagenomic contigs
Python
6
star
35

workshop-spatial

Jupyter Notebook
6
star
36

sda-cli

User command line interface for the SDA
Go
5
star
37

workflow-tools-evaluation

Evaluating workflow tools and formats, such as the Common Workflow Language (CWL) for possible use within NBIS
Python
5
star
38

workshop-pgip

Population Genomics in Practice main website
TeX
5
star
39

workshop_omicsint_ISMBECCB

Repository for the ISMB/ECCB workshop "A practical introduction to multi-omics integration and network analysis"
HTML
5
star
40

raukr-2024

NBIS Summer School β€’ Advanced R for Bioinformatics
JavaScript
5
star
41

fungal-trans

A Snakemake workflow for fungal metatranscriptomics
Python
4
star
42

snakemake_best_practice

Small snakemake best practice workflow
Python
4
star
43

LocalEGA

Please go to to https://github.com/EGA-archive/LocalEGA instead
Python
4
star
44

K9-WGS-Pipeline

Nextflow pipeline for standardised variant calls on canine genomes
Shell
4
star
45

pathfindr

R code for variant prioritization, intended for use with Sarek
R
3
star
46

RaukR-2019

RaukR 2019 teaching materials
3
star
47

workshop-snakemake-byoc

Python
3
star
48

data-submission-documentation

Documentation for data submissions from life science research in public repositories
Shell
3
star
49

rke-openstack

A simple CLI for automating RKE deployments on Openstack using Terraform
Python
3
star
50

ReprRes_G_Arnqvist_1305

Reproducible research repository for NBIS project G Arnqvist 1305
Perl
3
star
51

assemblyeval-smk

Snakemake workflow for evaluation of assemblies
Python
3
star
52

workshop-genome_annotation_elixir

genome annotation course within ELIXIR
HTML
3
star
53

xferticket

transient storage service
Ruby
3
star
54

Training-Tech-shorts

Short lessons, training various technologies useful to our work.
TeX
3
star
55

vcf2cytosure

Convert VCF with structural variations to CytoSure format
Python
2
star
56

workshop-plotting-in-r

HTML
2
star
57

sda-services-backup

Go
2
star
58

single-cell-pbl

Single Cell Project Based Learning
HTML
2
star
59

redmine_anonymous_watchers

Update of Redmine Anonymous Watchers plugin to work with redmine v3+.
Ruby
2
star
60

redmine_timetables

Redmine plugin for displaying users schedules.
HTML
2
star
61

workshop-conda

How to use Conda and how to contribute to Bioconda
CSS
2
star
62

encam

Encyclopedia of Cancer Microenvironment
TypeScript
2
star
63

module-open-science-dm-practices

Open Science module for Data Management course (carpentries style)
Ruby
2
star
64

workshop-bioinformatics-for-PIs

NBIS course Bioinformatics for Principal Investigators
2
star
65

synteny_plots_with_jcvi

Step-by-step tutorial to create synteny plots for pairs and trios of genomes using JCVI's MCScan
2
star
66

cytomine-bp-helm

Helm charts for deploying cytomine (https://doc.cytomine.org/) to a kubernetes environment.
Smarty
2
star
67

homepage_rshiny.nbis.se

homepage for rshiny.nbis.se
HTML
2
star
68

NBIS-template-support-reports

2
star
69

raukr-2023

NBIS Summer School β€’ Advanced R for Bioinformatics
JavaScript
2
star
70

workshop-data-visualization-r

Materials for NBIS workshop on data visulalization
HTML
2
star
71

NBIS-UtilityCode

NBIS-UtilityCode: a place for bioinformatics tools and libraries, written and maintained by NBIS, the National Bioinformatics Infrastructure Sweden.
C++
2
star
72

a_johansson_2020

Γ…sa Johansson partner project 2020
Python
2
star
73

assembly-project-template

This is a template for all assembly projects. Please read the instructions below on how to use it.
Shell
2
star
74

SMS_4882_19_Prostate_Bulk_RNA_Seq

HTML
1
star
75

workshop-in-the-cloud

A cloud solution for providing workshops
Shell
1
star
76

dicom-data-visualizer

A set of Jupyter Notebooks highlighting how to parse DICOM data using pydicom.
Jupyter Notebook
1
star
77

theherdbook

The Herd Book
TypeScript
1
star
78

cython-talk

Cython talk held August 2019
Jupyter Notebook
1
star
79

rmclient

Simple redmine command line client.
Ruby
1
star
80

LocalEGA-deploy-terraform

Terraform deployment of LocalEGA on openstack
Shell
1
star
81

excelerate-demonstrator-4.3

Elixir Excelerate Demonstrator 4.3
HCL
1
star
82

redmine_surveys_notifier

This plugin allows to send a survey when a support issue is closed
Ruby
1
star
83

workshop-kmer-analysis

Introduction into kmer analysis
Shell
1
star
84

canvas-integrations

1
star
85

gc-bp-helm

Helm chart for grand challenge
Python
1
star
86

web-grav-content

JavaScript
1
star
87

Rcourse

R Programming Foundations for Life Scientists
HTML
1
star
88

sms6012_CZIomicsint

HTML
1
star
89

NGScourse

Introduction to Bioinformatics Using NGS Data
HTML
1
star
90

course_chipseq

HTML
1
star
91

predictprotein-webserver-topcons2

web-server implementation for topcons2
HTML
1
star
92

ejprd

Shell
1
star
93

CMK-NBIS-PRT-project-template

NBIS peer review track project template
1
star
94

pipelines-bpipe

NBIS bpipe pipelines framework
Groovy
1
star
95

aida-data-hub

AIDA Data Hub Scrum team board
1
star
96

data-management-gym

Repository for Data Management training session modules
1
star
97

beacon-api-tests

Compliance tester and test cases for Beacons.
Python
1
star
98

ENA-submission

Tool for easing the submission of data to the ENA
Shell
1
star
99

confluence-server-to-cloud-utils

Various scripts used when migrating a Confluence Server instance to a Confluence Cloud instance.
Python
1
star
100

workshop-common

Common configuration of githubpages for workshops
HTML
1
star