Twitter-Archive

A Python CLI application that downloads all media from your bookmarked tweets. For now it handles only bookmarks; eventually I hope to make it a general archive utility for Twitter that lets users download and archive all kinds of tweets.

Originally, before the V2 Twitter API, this app used Selenium to try to scrape the contents of a user's bookmarks page. Since the release of the V2 API, the application has been rewritten. The new version is much faster and more robust.

Installation and Setup

Installation

Twitter-Archive can be installed with pip:

$ pip install twitter-archive

Alternatively, you can clone this repository and install from source instead of from PyPI:

$ git clone https://github.com/jarulsamy/Twitter-Archive
$ cd Twitter-Archive
$ pip install .

To properly authenticate with the Twitter API, you will have to create a developer application. This will provide you with a client ID and client secret.

Twitter Developer App Setup

Refer to these docs to set up your Twitter developer account and project.

Authentication and Usage

There are several options for passing the client ID and client secret to the application. Only one of the following is required.

Option 1: Environment Variables

Set the relevant environment variables like so:

$ export TWITTER_ARCHIVE_CLIENT_ID="YOUR_CLIENT_ID_HERE"
$ export TWITTER_ARCHIVE_CLIENT_SECRET="YOUR_CLIENT_SECRET_HERE"

You can now use the application. These variables last only for the current shell session, so set them again after restarting your shell.

Option 2: Dotenv Variables

As an alternative to environment variables, you can save your tokens in a .env file in your current working directory. twitter-archive reads this file automatically at runtime and loads the necessary variables from it. An example .env file looks like this:

TWITTER_ARCHIVE_CLIENT_ID=YOUR_CLIENT_ID_HERE
TWITTER_ARCHIVE_CLIENT_SECRET=YOUR_CLIENT_SECRET_HERE
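
The lookup described in Options 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not twitter-archive's actual loader: the helper names `parse_dotenv` and `resolve_credentials` are hypothetical, and the precedence shown (real environment variables win over .env values) is an assumption.

```python
import os
from pathlib import Path

# Variable names taken from the examples above.
KEYS = ("TWITTER_ARCHIVE_CLIENT_ID", "TWITTER_ARCHIVE_CLIENT_SECRET")


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_credentials(environ=None, dotenv_path=".env"):
    """Return (client_id, client_secret).

    Assumed lookup order: environment variables first, then the .env file.
    """
    environ = os.environ if environ is None else environ
    path = Path(dotenv_path)
    file_values = parse_dotenv(path.read_text()) if path.is_file() else {}
    found = [environ.get(key) or file_values.get(key) for key in KEYS]
    if not all(found):
        raise RuntimeError("Missing client ID or client secret")
    return tuple(found)
```

Calling `resolve_credentials()` with no arguments reads `os.environ` and a `.env` file in the current directory, and raises an error if either value is still missing.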

Option 3: CLI Flags

The tokens can also be passed in as CLI flags, but this is generally discouraged: most shells keep a history of the commands you enter, which risks leaking your keys. For example:

$ twitter-archive --client-id="YOUR_CLIENT_ID_HERE" --client-secret="YOUR_CLIENT_SECRET_HERE"

Usage

You can then invoke the app with:

$ twitter-archive

By default, the app prints a URL that prompts you to authorize the application with Twitter's official APIs. Once you navigate to that link and log in with Twitter, the app fetches a manifest of all your bookmarked tweets and begins saving any photos and videos to disk.
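
For readers curious about what such an authorization URL contains, here is a rough sketch of building one for Twitter's OAuth 2.0 authorization code flow with PKCE. It is not taken from twitter-archive's source: the function names are hypothetical, the redirect URI is a placeholder you would replace with the one registered for your developer app, and the scope list is an assumption about what reading bookmarks requires.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode

# Twitter's OAuth 2.0 authorization endpoint.
AUTHORIZE_URL = "https://twitter.com/i/oauth2/authorize"


def make_pkce_pair():
    """Return (code_verifier, code_challenge) using the S256 method."""
    verifier = secrets.token_urlsafe(64)
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # PKCE uses URL-safe base64 without the trailing '=' padding.
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def build_authorize_url(client_id, redirect_uri, challenge, state):
    """Assemble the URL the user opens in a browser to grant access."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,  # placeholder; must match your app settings
        "scope": "tweet.read users.read bookmark.read",  # assumed scopes
        "state": state,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    verifier, challenge = make_pkce_pair()
    state = secrets.token_urlsafe(16)
    print(build_authorize_url("YOUR_CLIENT_ID_HERE", "http://localhost:8000/callback", challenge, state))
```

The code verifier stays on your machine and is sent only later, when the authorization code is exchanged for an access token.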

You can view the built-in CLI help menu for more info:

$ twitter-archive --help
Usage: twitter-archive [--client-id ID] [--client-secret ID] [--headless] [--no-clobber] [--num-download-threads N] [--quiet]
                       [-o FILE] [-i FILE | -m FILE] [-v] [--version] [--help]

A CLI Tool to archive tweets v0.0.7

Options:
  --client-id ID        Specify the client ID. (default: None)
  --client-secret ID    Specify the client ID. (default: None)
  --headless            Don't use interactive authentication. (default: False)
  --no-clobber          Don't redownload/overwrite existing media. (default: False)
  --num-download-threads N
                        Number of threads to use while downloading media. (default: 8)
  --quiet               Disable download progress bars (default: False)
  -o FILE, --media-output FILE
                        Path to output downloaded media. (default: media)
  -i FILE, --manifest-input FILE
                        Use an existing manifest and download all media. (default: None)
  -m FILE, --manifest-output FILE
                        Path to output bookmark manifest. (default: bookmark-manifest.json)
  -v, --verbose
  --version             show program's version number and exit
  --help                Show this help message ane exit.
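
To illustrate what --num-download-threads and --no-clobber control, the following is a small sketch of a threaded downloader that skips files already on disk. It is an assumption about the general technique, not the app's real implementation: `fetch_bytes`, `download_one`, and `download_all` are hypothetical names, and naming each file after the last segment of its URL is a simplification.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url):
    """Download one URL into memory (the default fetcher)."""
    with urlopen(url) as response:
        return response.read()


def download_one(url, out_dir, no_clobber=False, fetch=fetch_bytes):
    """Save one file; return its path, or None if it was skipped."""
    target = Path(out_dir) / Path(urlparse(url).path).name
    if no_clobber and target.exists():
        return None  # keep the existing file, do not redownload
    target.write_bytes(fetch(url))
    return target


def download_all(urls, out_dir="media", threads=8, no_clobber=False, fetch=fetch_bytes):
    """Download every URL using a pool of worker threads."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns results in the same order as the input URLs.
        return list(pool.map(lambda u: download_one(u, out_dir, no_clobber, fetch), urls))
```

Threads suit this workload because downloads spend most of their time waiting on the network rather than using the CPU.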

Acknowledgment

The Twitter developer team did an excellent job on the new APIs. They are substantially more intuitive and allow us to interact with many more features of Twitter. While it did take two years, the openness, transparency, and attention to feedback are much appreciated!

The relevant forum post is available here.
