datacollect
A collection of tools for gathering and downloading various kinds of data.
Often, I write simple scripts and tools to collect data for various "data science" tasks. I thought that it might be worthwhile to collect them in a central repository since they might be useful to others!
Contents
- Collect Lyrics
- Twitter Timeline
- Collect Popular Music Tags
- PDB Info Table
- ZINC Molecule Downloader
- Collect English Premier League Soccer Data
Important Note
Please note that I developed and tested these tools in Python 3.x. The scripts may not work flawlessly in Python 2.7.x, where Unicode handling is more error-prone.
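To illustrate the Unicode issue, here is a minimal sketch of writing and reading non-ASCII text so that it behaves the same on Python 2.7 and 3.x. It is not taken from the scripts in this repository; the helper names `save_text` and `load_text` are hypothetical.

```python
import io
import os
import tempfile


def save_text(path, text):
    """Write text to path as UTF-8 (hypothetical helper)."""
    # io.open takes an explicit encoding on both Python 2.7 and 3.x,
    # whereas the built-in open() in Python 2 does not.
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)


def load_text(path):
    """Read UTF-8 text from path and return it as a Unicode string."""
    with io.open(path, 'r', encoding='utf-8') as f:
        return f.read()


if __name__ == '__main__':
    # The escape \u00f6 is "o with diaeresis"; the u prefix keeps the
    # literal a Unicode string on Python 2 as well.
    title = u'Bj\u00f6rk'
    path = os.path.join(tempfile.mkdtemp(), 'title.txt')
    save_text(path, title)
    print(load_text(path) == title)
```

Passing the encoding explicitly avoids depending on the platform default, which is a common source of `UnicodeEncodeError` in Python 2.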
Collect Lyrics
A command line tool to download song lyrics given artist names and song titles.
Twitter Timeline
A command line tool that downloads your personal Twitter timeline in CSV format, with an optional keyword filter.
Tutorial for turning your Twitter timeline into a word cloud.
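The keyword filter and CSV export can be sketched roughly as follows (Python 3). This is only an illustration and not the tool's actual code: it assumes tweets are already available as dictionaries with hypothetical `created_at` and `text` keys, and it does not contact Twitter.

```python
import csv
import io


def filter_by_keyword(tweets, keyword=None):
    """Keep tweets whose text contains keyword (case-insensitive).

    With no keyword, all tweets are returned unchanged.
    """
    if not keyword:
        return list(tweets)
    needle = keyword.lower()
    return [t for t in tweets if needle in t['text'].lower()]


def to_csv(tweets, fieldnames=('created_at', 'text')):
    """Serialize tweet dictionaries to a CSV string with a header row."""
    # newline='' leaves the csv module's own line endings untouched.
    buf = io.StringIO(newline='')
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(tweets)
    return buf.getvalue()


# Made-up sample records, for illustration only.
sample = [
    {'created_at': '2015-01-01', 'text': 'Learning Python today'},
    {'created_at': '2015-01-02', 'text': 'Coffee first'},
]

print(to_csv(filter_by_keyword(sample, 'python')))
```

Using `csv.DictWriter` rather than joining strings by hand means commas and quotes inside tweet text are escaped correctly.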
Collect Popular Music Tags
A command line tool to download popular tags for a list of songs from last.fm, e.g., for various data mining projects.
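As a rough sketch of what a tag lookup might involve, the snippet below builds a request URL for the `track.getTopTags` method of the public Last.fm web API. Whether the tool in this repository uses this endpoint is an assumption; the function name `top_tags_url` and the placeholder API key are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Root URL of the Last.fm web API (assumed; check the Last.fm docs).
API_ROOT = 'http://ws.audioscrobbler.com/2.0/'


def top_tags_url(artist, track, api_key):
    """Build the URL for a track.getTopTags request (hypothetical helper)."""
    params = [
        ('method', 'track.gettoptags'),
        ('artist', artist),
        ('track', track),
        ('api_key', api_key),
        ('format', 'json'),
    ]
    # urlencode percent-escapes the values and encodes spaces as '+'.
    return API_ROOT + '?' + urlencode(params)


print(top_tags_url('Radiohead', 'Karma Police', 'YOUR_API_KEY'))
```

Looping this over a list of (artist, title) pairs and fetching each URL would then yield the tags per song, subject to the API's rate limits.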
PDB Info Table
A command line tool that creates an info table from a list of PDB files.
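One way such a table could be assembled is by reading the fixed-width `HEADER` record of each PDB file. The sketch below is an assumption about the approach, not the tool's actual implementation; `parse_header` and `info_table` are hypothetical names, and the sample record is made up.

```python
def parse_header(pdb_text):
    """Extract ID, classification and deposition date from a PDB HEADER record.

    Column positions follow the PDB file format: classification in
    columns 11-50, deposition date in 51-59, ID code in 63-66.
    Returns None if the text has no HEADER record.
    """
    for line in pdb_text.splitlines():
        if line.startswith('HEADER'):
            return {
                'id': line[62:66].strip(),
                'classification': line[10:50].strip(),
                'deposited': line[50:59].strip(),
            }
    return None


def info_table(pdb_texts):
    """Build a tab-separated table, one row per PDB text with a HEADER."""
    rows = ['id\tclassification\tdeposited']
    for text in pdb_texts:
        rec = parse_header(text)
        if rec is not None:
            rows.append('\t'.join(
                [rec['id'], rec['classification'], rec['deposited']]))
    return '\n'.join(rows)


# A made-up HEADER line, assembled so the fields land in the right columns.
sample = ('HEADER' + ' ' * 4 + 'HYDROLASE'.ljust(40)
          + '01-JAN-00' + ' ' * 3 + '0XXX')

print(info_table([sample]))
```

Slicing by column rather than splitting on whitespace matters here because classifications can contain spaces.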
ZINC Molecule Downloader
A command line tool for downloading 3D structures of small chemical molecules from http://zinc.docking.org.
Collect English Premier League Soccer Data
A command line tool to collect fantasy soccer data from the English Premier League.