Twitter-Archive
A CLI Python application to download all media (and hopefully more) from bookmarked tweets. For now only bookmarks are supported; eventually I hope to make this a general archive utility for Twitter, allowing users to download and archive all kinds of tweets.
Originally, before the V2 Twitter API existed, this app used Selenium to scrape the contents of a user's bookmarks page. Since the release of the V2 API, the application has been rewritten to use it, and the new version is much faster and more robust.
Installation and Setup
Installation
Twitter-Archive can be installed with pip:
$ pip install twitter-archive
Alternatively, you can clone this repository and install from source instead of from PyPI:
$ git clone https://github.com/jarulsamy/Twitter-Archive
$ cd Twitter-Archive
$ pip install .
To properly authenticate with the Twitter API, you will have to create a developer application. This will provide you with a client ID and client secret.
Twitter Developer App Setup
Refer to these docs to set up your Twitter developer account and project.
Authentication and Usage
There are several options for passing the client ID and client secret to the application. Only one of the following is required.
Option 1: Environment Variables
Set the relevant environment variables like so:
$ export TWITTER_ARCHIVE_CLIENT_ID="YOUR_CLIENT_ID_HERE"
$ export TWITTER_ARCHIVE_CLIENT_SECRET="YOUR_CLIENT_SECRET_HERE"
You can now use the application. These variables only last for the current shell session, so you will need to set them again after you restart your shell.
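As an illustration of how a Python program can pick these variables up, here is a minimal sketch. It assumes a plain `os.environ` lookup; the helper name `read_credentials_from_env` is hypothetical and not part of twitter-archive.

```python
import os


def read_credentials_from_env(environ=os.environ):
    """Return (client_id, client_secret) read from the environment.

    Hypothetical helper for illustration; twitter-archive's own lookup may differ.
    Unset or empty variables are returned as None.
    """
    client_id = environ.get("TWITTER_ARCHIVE_CLIENT_ID") or None
    client_secret = environ.get("TWITTER_ARCHIVE_CLIENT_SECRET") or None
    return client_id, client_secret


if __name__ == "__main__":
    cid, secret = read_credentials_from_env()
    # Report only whether each value is present, never the secret itself.
    print("client ID set:", cid is not None)
    print("client secret set:", secret is not None)
```

Running this in the same shell where you ran the `export` commands is a quick way to confirm that the variables are visible to Python processes.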
Option 2: Dotenv Variables
As an alternative to environment variables, you can save your credentials in a .env file in your current working directory. twitter-archive automatically reads this file at runtime to load the necessary variables. An example .env file looks like this:
TWITTER_ARCHIVE_CLIENT_ID=YOUR_CLIENT_ID_HERE
TWITTER_ARCHIVE_CLIENT_SECRET=YOUR_CLIENT_SECRET_HERE
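How twitter-archive parses .env internally is not documented here (it may rely on a library such as python-dotenv). The following is a hypothetical, stdlib-only sketch of a loader for the simple KEY=VALUE format shown above. It assumes values already present in the environment should win, which matches python-dotenv's default `load_dotenv` behavior. The names `parse_dotenv` and `load_dotenv_file` are illustrative only.

```python
import os
from pathlib import Path


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Hypothetical minimal parser: no variable expansion or multi-line values.
    Matching single or double quotes around a value are stripped.
    """
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def load_dotenv_file(path=".env", environ=os.environ):
    """Load a .env file into environ without overriding existing variables."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}  # A missing .env file is not an error.
    values = parse_dotenv(env_path.read_text(encoding="utf-8"))
    for key, value in values.items():
        environ.setdefault(key, value)
    return values
```

Because existing variables take precedence in this sketch, an `export` in your shell overrides the value stored in .env.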
Option 3: CLI Flags
The credentials can also be passed in as CLI flags, but this is generally discouraged: most shells keep a history of entered commands, so this risks leaking your keys. For example:
$ twitter-archive --client-id="YOUR_CLIENT_ID_HERE" --client-secret="YOUR_CLIENT_SECRET_HERE"
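When a credential can come from several sources, one of them has to take priority. The sketch below assumes CLI flags win over environment (or .env-loaded) variables. This is a common convention, but this README does not document twitter-archive's actual precedence. The flag names match the help output further down; `resolve_credentials` is a hypothetical helper.

```python
import argparse
import os


def build_parser():
    # Only the two credential flags from the help output are modeled here.
    parser = argparse.ArgumentParser(prog="twitter-archive")
    parser.add_argument("--client-id", metavar="ID", default=None)
    parser.add_argument("--client-secret", metavar="ID", default=None)
    return parser


def resolve_credentials(argv=None, environ=os.environ):
    """Hypothetical: prefer CLI flags, then fall back to environment variables."""
    args = build_parser().parse_args(argv)
    client_id = args.client_id or environ.get("TWITTER_ARCHIVE_CLIENT_ID")
    client_secret = args.client_secret or environ.get("TWITTER_ARCHIVE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise SystemExit("Missing client ID or client secret.")
    return client_id, client_secret
```

For example, with both variables exported, `resolve_credentials(["--client-id=other"])` would return the flag's ID together with the secret from the environment.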
Usage
You can then invoke the app with:
$ twitter-archive
By default, the app prints a URL prompting you to authorize the application with Twitter's official API. Once you open that link and log in with Twitter, the app fetches a manifest of all your bookmarked tweets and begins saving any photos and videos to disk.
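Reading bookmarks through the V2 API requires OAuth 2.0 user-context authorization, typically the Authorization Code flow with PKCE. The sketch below shows how an authorization URL like the one the app prints can be built. The scope list and the redirect URI are assumptions, and twitter-archive may construct its URL differently (for example through a library).

```python
import base64
import hashlib
import secrets
from urllib.parse import quote, urlencode

AUTHORIZE_URL = "https://twitter.com/i/oauth2/authorize"
# Assumed scopes for reading bookmarks; check the current Twitter API docs.
SCOPES = ["tweet.read", "users.read", "bookmark.read", "offline.access"]


def make_code_verifier():
    # 32 random bytes encode to 43 URL-safe characters, the RFC 7636 minimum.
    return secrets.token_urlsafe(32)


def code_challenge_s256(verifier):
    """S256 challenge: base64url(sha256(verifier)) with '=' padding removed."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_authorize_url(client_id, redirect_uri, verifier, state):
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(SCOPES),
        "state": state,
        "code_challenge": code_challenge_s256(verifier),
        "code_challenge_method": "S256",
    }
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return f"{AUTHORIZE_URL}?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    verifier = make_code_verifier()
    state = secrets.token_urlsafe(16)
    # Hypothetical redirect URI; it must match one registered for your app.
    print(build_authorize_url("YOUR_CLIENT_ID_HERE", "http://127.0.0.1:8080/callback", verifier, state))
```

The verifier stays on your machine. After you approve the app, the authorization code Twitter returns is exchanged for an access token together with that verifier, so an intercepted code alone is not enough to obtain a token.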
You can view the built-in CLI help menu for more info:
$ twitter-archive --help
Usage: twitter-archive [--client-id ID] [--client-secret ID] [--headless] [--no-clobber] [--num-download-threads N] [--quiet]
                       [-o FILE] [-i FILE | -m FILE] [-v] [--version] [--help]
A CLI Tool to archive tweets v0.0.7
Options:
  --client-id ID        Specify the client ID. (default: None)
  --client-secret ID    Specify the client secret. (default: None)
  --headless            Don't use interactive authentication. (default: False)
  --no-clobber          Don't redownload/overwrite existing media. (default: False)
  --num-download-threads N
                        Number of threads to use while downloading media. (default: 8)
  --quiet               Disable download progress bars (default: False)
  -o FILE, --media-output FILE
                        Path to output downloaded media. (default: media)
  -i FILE, --manifest-input FILE
                        Use an existing manifest and download all media. (default: None)
  -m FILE, --manifest-output FILE
                        Path to output bookmark manifest. (default: bookmark-manifest.json)
  -v, --verbose
  --version             show program's version number and exit
  --help                Show this help message and exit.
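The help output lists `--media-output`, `--num-download-threads` and `--no-clobber`. One plausible way these options fit together is a thread pool that writes each media file into the output directory and optionally skips files already on disk. The sketch below illustrates that idea under stated assumptions: the function names, the filename scheme (last segment of the URL path) and the use of a plain list of URLs instead of the real manifest format are all hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url):
    # Plain HTTP GET with the standard library; no retries or auth.
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_media(urls, output_dir="media", num_threads=8, no_clobber=False, fetch=fetch_bytes):
    """Download each URL into output_dir using a pool of num_threads threads.

    Hypothetical sketch of what --media-output, --num-download-threads and
    --no-clobber control; twitter-archive's real logic may differ.
    Returns the list of paths that were actually written.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    def worker(url):
        # Name the file after the last segment of the URL path.
        target = out / os.path.basename(urlparse(url).path)
        if no_clobber and target.exists():
            return None  # Leave the existing file untouched.
        target.write_bytes(fetch(url))
        return target

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(worker, urls))
    return [path for path in results if path is not None]
```

In this sketch a file is skipped purely because a file with the same name exists, so a download that was interrupted partway through would not be retried while `no_clobber` is on.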
Acknowledgment
The Twitter developer team did an excellent job on the new APIs. The new APIs are substantially more intuitive and allow us to interact with many more features of Twitter. While it did take two years, the openness, transparency, and attention to feedback are much appreciated!
The relevant forum post is available here.