Download files from NCBI Entrez by accession

NCBI accession download script

A companion to the popular ncbi-genome-download script, ncbi-acc-download lets you download sequences from GenBank/RefSeq by accession through the NCBI Entrez API.

Installation

pip install ncbi-acc-download

Alternatively, clone this repository from GitHub, then run the following (in a Python virtual environment):

pip install .

If this fails on older versions of Python, try updating your pip tool first:

pip install --upgrade pip

and then rerun the ncbi-acc-download install.

ncbi-acc-download is only developed and tested on Python releases that are still under active support by the Python project. At the moment, this means versions 3.6, 3.7, 3.8, and 3.9. In particular, no testing is done on Python versions older than 3.6.

ncbi-acc-download 0.2.6 was the last version to support Python 2.7.

If your system is stuck on an older version of Python, consider using a tool like Homebrew or Linuxbrew to obtain a more up-to-date version.

Usage

To download a nucleotide record AB_12345 in GenBank format, run

ncbi-acc-download AB_12345

To download a nucleotide record AB_12345 in FASTA format, run

ncbi-acc-download --format fasta AB_12345

To download a protein record WP_12345 in FASTA format, run

ncbi-acc-download --molecule protein --format fasta WP_12345

To only generate a list of download URLs, for example to run the actual download elsewhere, run

ncbi-acc-download --url AB_12345
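To give a rough idea of what such a download URL contains, the sketch below builds an NCBI E-utilities efetch URL for a single accession. It is an illustration under assumptions, not ncbi-acc-download's own code: the helper name build_efetch_url and the mapping from molecule and format to efetch's db and rettype values are hypothetical, and the tool's real URLs may use different parameters.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities efetch endpoint. Whether ncbi-acc-download builds
# exactly these URLs is an assumption; this is only an illustration.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Hypothetical mapping from (molecule, format) to efetch "db" and "rettype".
DATABASES = {"nucleotide": "nuccore", "protein": "protein"}
RETTYPES = {
    ("nucleotide", "genbank"): "gbwithparts",  # GenBank flat file incl. sequence
    ("nucleotide", "fasta"): "fasta",
    ("protein", "genbank"): "gp",  # GenPept flat file
    ("protein", "fasta"): "fasta",
}


def build_efetch_url(accession, molecule="nucleotide", fmt="genbank"):
    """Return an efetch URL for a single accession (hypothetical helper)."""
    params = {
        "db": DATABASES[molecule],
        "id": accession,
        "rettype": RETTYPES[(molecule, fmt)],
        "retmode": "text",
    }
    # urlencode keeps the dict's insertion order for the query string.
    return EFETCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    for acc in ["AB_12345", "AB_23456"]:
        print(build_efetch_url(acc))
```

Printing the URLs rather than fetching them mirrors the idea behind --url: the list can be handed to any other download tool.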

If you want to concatenate multiple sequences into a single file, run

ncbi-acc-download --out two_genomes.gbk AB_12345 AB_23456

To print the downloaded data to standard output instead of writing a file, for example to chain ncbi-acc-download with other command line tools, use /dev/stdout as the file name:

ncbi-acc-download --out /dev/stdout --format fasta AB_12345 AB_23456 | gzip > two_genomes.fa.gz

If you want to download all records covered by a WGS master record instead of the master record itself, run

ncbi-acc-download --recursive NZ_EXMP01000000
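Conceptually, a WGS master record such as NZ_EXMP01000000 refers to its contig records through an accession range on its WGS line. Assuming that line holds a first-last range like NZ_EXMP01000001-NZ_EXMP01000003, the hypothetical helpers below extract and expand it. This only illustrates the idea; it is not how ncbi-acc-download implements --recursive.

```python
import re


def wgs_range_from_record(record_text):
    """Return the accession range from a GenBank "WGS" line, or None."""
    for line in record_text.splitlines():
        # "WGS_SCAFLD" lines do not match because of the required space.
        if line.startswith("WGS "):
            return line.split()[1]
    return None


def expand_wgs_range(wgs_range):
    """Expand "PREFIX01000001-PREFIX01000003" into individual accessions."""
    if "-" not in wgs_range:
        return [wgs_range]
    first, last = wgs_range.split("-")
    # Split each accession into a non-numeric prefix and its trailing digits.
    m_first = re.fullmatch(r"(.*?)(\d+)", first)
    m_last = re.fullmatch(r"(.*?)(\d+)", last)
    if not (m_first and m_last) or m_first.group(1) != m_last.group(1):
        raise ValueError(f"unsupported WGS range: {wgs_range!r}")
    prefix, start_digits = m_first.groups()
    width = len(start_digits)  # keep the original zero padding
    start, stop = int(start_digits), int(m_last.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + 1)]


if __name__ == "__main__":
    record = "LOCUS       NZ_EXMP01000000\nWGS         NZ_EXMP01000001-NZ_EXMP01000003\n"
    print(expand_wgs_range(wgs_range_from_record(record)))
```

Each expanded accession could then be downloaded like any other record.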

You can supply a genomic range to the accession download using --range

ncbi-acc-download NC_007194 --range 1001:9000
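As a sketch of how a START:STOP range might be interpreted, the hypothetical parse_range helper below turns "1001:9000" into the seq_start and seq_stop parameters that NCBI's efetch service accepts. It assumes 1-based, inclusive coordinates; whether ncbi-acc-download validates or forwards the range exactly like this is an assumption.

```python
def parse_range(range_spec):
    """Turn a "start:stop" string such as "1001:9000" into efetch parameters.

    Assumes 1-based, inclusive coordinates, matching efetch's seq_start and
    seq_stop parameters (hypothetical helper, not ncbi-acc-download code).
    """
    start_text, sep, stop_text = range_spec.partition(":")
    if not sep:
        raise ValueError(f"expected START:STOP, got {range_spec!r}")
    start, stop = int(start_text), int(stop_text)
    # Reject ranges that start before position 1 or end before they start.
    if start < 1 or stop < start:
        raise ValueError(f"invalid range: {range_spec!r}")
    return {"seq_start": start, "seq_stop": stop}


if __name__ == "__main__":
    print(parse_range("1001:9000"))  # {'seq_start': 1001, 'seq_stop': 9000}
```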

Cutting a record with a range like this can leave partial features at both ends of the record. To remove those partial features, combine the range download with the "correct" extended validator:

ncbi-acc-download NC_007194 --range 1001:9000 --extended-validation correct
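The exact rules of the "correct" validator are not described here. As a rough illustration only, assuming partial features are those whose GenBank location carries a "<" or ">" marker, a filter could look like the sketch below. The function names are hypothetical, and locations are treated as plain strings; a real implementation would more likely use a GenBank parser such as Biopython.

```python
def is_partial(location):
    """GenBank locations mark partial ends with "<" (lower) or ">" (upper)."""
    return "<" in location or ">" in location


def drop_partial_features(features):
    """Keep only (feature_type, location) pairs whose location is complete."""
    return [(kind, loc) for kind, loc in features if not is_partial(loc)]


if __name__ == "__main__":
    features = [
        ("gene", "<1..300"),
        ("CDS", "450..1200"),
        ("gene", "complement(7800..>8000)"),
    ]
    print(drop_partial_features(features))  # [('CDS', '450..1200')]
```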

You can get more detailed information on the download progress by using the --verbose or -v flag.

To get an overview of all options, run

ncbi-acc-download --help

License

All code is available under the Apache License version 2, see the LICENSE file for details.