  • Stars: 103
  • Rank: 333,046 (Top 7 %)
  • Language: Python
  • License: MIT License
  • Created over 7 years ago
  • Updated about 5 years ago

Repository Details

Python package to download files on any topic in bulk.

content-downloader

content-downloader, a.k.a. ctdl, is a Python package with a command-line utility and a desktop GUI for downloading files on any topic in bulk.

Features

  • ctdl can be used as a command line utility as well as a desktop GUI.

  • ctdl fetches file links related to a search query from Google Search.

  • Files can be downloaded in parallel using multithreading.

  • ctdl is compatible with both Python 2 and Python 3.
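
The multithreaded download feature can be illustrated with a minimal sketch built on the standard library's thread pool. This is not ctdl's actual implementation; the function names and the fetch callable are hypothetical and stand in for whatever code downloads a single link.

```python
from concurrent.futures import ThreadPoolExecutor


def download_series_sketch(urls, fetch):
    """Hypothetical sequential version: call fetch(url) one link at a time."""
    return [fetch(url) for url in urls]


def download_parallel_sketch(urls, fetch, max_workers=4):
    """Hypothetical parallel version: call fetch(url) on a pool of threads.

    `fetch` is any callable that downloads one URL and returns a result.
    Results come back in the same order as `urls`.
    """
    if not urls:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the input order even when calls finish out of order.
        return list(pool.map(fetch, urls))
```

Threads suit this kind of work because each download spends most of its time waiting on the network rather than using the CPU.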

Installation

  • To install content-downloader, run:

    $ pip install ctdl
    
  • There seem to be some issues with parallel progress bars in tqdm, which have been resolved in a tqdm pull request that is not yet merged. Until it is merged, please use my patch by running this command:

    $ pip install -U git+https://github.com/nikhilkumarsingh/tqdm
    

Desktop GUI usage

To use the ctdl desktop GUI, open a terminal and run this command:

$ ctdl-gui

Command line usage

$ ctdl [-h] [-f FILE_TYPE] [-l LIMIT] [-d DIRECTORY] [-p] [-a] [-t]
       [-minfs MIN_FILE_SIZE] [-maxfs MAX_FILE_SIZE] [-nr]
       [query]

Optional arguments are:

  • -f FILE_TYPE : set the file type (for example ppt, pdf, or xml).

               Default value: pdf
    
  • -l LIMIT : specify the number of files to download.

           Default value: 10
    
  • -d DIRECTORY : specify the directory where files will be stored.

               Default: A directory in the current directory with the same name as the search query.
    
  • -p : download files in parallel.

  • -minfs MIN_FILE_SIZE : specify the minimum file size to download, in kilobytes (KB).

               Default: 0
    
  • -maxfs MAX_FILE_SIZE : specify the maximum file size to download, in kilobytes (KB).

               Default: -1 (represents no maximum file size)
    
  • -nr : prevent download redirects.

               Default: False
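
For readers who want to see how an interface like this fits together, here is a minimal argparse sketch that mirrors the usage line and the defaults listed above. It is an illustration, not ctdl's own source: the attribute names (file_type, no_redirects, and so on) are assumptions chosen for readability.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented ctdl options (sketch only)."""
    parser = argparse.ArgumentParser(prog="ctdl")
    parser.add_argument("query", nargs="?", help="search query")
    parser.add_argument("-f", dest="file_type", default="pdf",
                        help="file type, e.g. ppt, pdf, xml")
    parser.add_argument("-l", dest="limit", type=int, default=10,
                        help="number of files to download")
    parser.add_argument("-d", dest="directory", default=None,
                        help="directory where files will be stored")
    parser.add_argument("-p", dest="parallel", action="store_true",
                        help="download in parallel")
    parser.add_argument("-a", dest="available", action="store_true",
                        help="list available file types")
    parser.add_argument("-t", dest="threats", action="store_true",
                        help="list potentially high-threat file types")
    parser.add_argument("-minfs", dest="min_file_size", type=int, default=0,
                        help="minimum file size in KB")
    parser.add_argument("-maxfs", dest="max_file_size", type=int, default=-1,
                        help="maximum file size in KB (-1 means no maximum)")
    parser.add_argument("-nr", dest="no_redirects", action="store_true",
                        help="prevent download redirects")
    return parser


# Parse the equivalent of: ctdl -f ppt -l 3 health
args = build_parser().parse_args(["-f", "ppt", "-l", "3", "health"])
print(args.file_type, args.limit, args.query)
```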
    

Examples

  • To get the list of available file types:

    $ ctdl -a
    
  • To get the list of potentially high-threat file types:

    $ ctdl -t
    
  • To download PDF files on the topic 'python':

    $ ctdl python
    

    This is the default behaviour: it downloads 10 PDF files into a folder named 'python' in the current directory.

  • To download 3 PPT files on the topic 'health':

    $ ctdl -f ppt -l 3 health
    
  • To explicitly specify the download folder:

    $ ctdl -d /home/nikhil/Desktop/ml-pdfs machine-learning
    
  • To download files in parallel:

    $ ctdl -f pdf -p python
    
  • To search for and download, in parallel, 10 PDF files matching "python" and "algorithm", without allowing any URL redirects, and with a file size between 10,000 KB (10 MB) and 100,000 KB (100 MB):

    $ ctdl -f pdf -l 10 -minfs 10000 -maxfs 100000 -nr -p "python algorithm"
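
The size limits in the last example can be read as a simple range check in kilobytes, with -1 meaning "no maximum". The sketch below shows one way to express that rule. The function name is hypothetical, and treating both bounds as inclusive is an assumption; ctdl's own check may differ.

```python
def within_size_limits(size_kb, min_kb=0, max_kb=-1):
    """Return True if a file of size_kb kilobytes passes the size filter.

    Defaults follow the documented ones: minimum 0, maximum -1 (no maximum).
    Both bounds are treated as inclusive, which is an assumption.
    """
    if size_kb < min_kb:
        return False
    if max_kb != -1 and size_kb > max_kb:
        return False
    return True


# Limits from the example: -minfs 10000 -maxfs 100000
print(within_size_limits(50000, min_kb=10000, max_kb=100000))
print(within_size_limits(500, min_kb=10000, max_kb=100000))
```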
    

Usage in Python files

from ctdl import ctdl

ctdl.download_content(
    file_type='ppt',
    limit=5,
    directory='/home/nikhil/Desktop/ml-pdfs',
    query='machine learning using python')
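
When several topics need to be downloaded from a script, it can help to prepare the keyword arguments first and then loop over them. The helper below is a hypothetical sketch that uses only the four parameters shown above (file_type, limit, directory, query); the per-topic folder naming is an assumption made for illustration.

```python
import os


def build_jobs(topics, file_type="pdf", limit=10, base_dir="."):
    """Return one dict of download_content keyword arguments per topic.

    Each topic gets its own folder under base_dir, with spaces replaced
    by hyphens (an illustrative naming choice, not ctdl behaviour).
    """
    return [
        {
            "file_type": file_type,
            "limit": limit,
            "directory": os.path.join(base_dir, topic.replace(" ", "-")),
            "query": topic,
        }
        for topic in topics
    ]


jobs = build_jobs(["machine learning", "python"], file_type="ppt", limit=5)

# With ctdl installed and network access, each job could then be run as:
#     from ctdl import ctdl
#     for job in jobs:
#         ctdl.download_content(**job)
```

Building the argument dicts separately keeps the network calls in one place and makes the job list easy to inspect before anything is downloaded.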

TODO

  • Prompt the user before downloading potentially dangerous files

  • Create ctdl GUI

  • Implement unit testing

  • Use the DuckDuckGo API as an option

Want to contribute?

  • Clone the repository

    $ git clone http://github.com/nikhilkumarsingh/content-downloader
    
  • Install dependencies

    $ pip install -r requirements.txt
    

    Note: There seem to be some issues with the current version of tqdm. If you do not get the expected progress bar behaviour, try this patch:

    $ pip uninstall tqdm
    $ pip install git+https://github.com/nikhilkumarsingh/tqdm
    
  • In ctdl/ctdl.py, remove the . prefix from .downloader and .utils for the following imports, so it changes from:

    from .downloader import download_series, download_parallel
    from .utils import FILE_EXTENSIONS, THREAT_EXTENSIONS

    to:

    from downloader import download_series, download_parallel
    from utils import FILE_EXTENSIONS, THREAT_EXTENSIONS

  • Run the Python file directly, instead of using the ctdl command:

    $ python ctdl/ctdl.py ___
