  • Stars: 1,320
  • Rank: 35,601 (Top 0.8%)
  • Language: Python
  • License: MIT License
  • Created almost 12 years ago
  • Updated over 1 year ago

Repository Details

A command line tool (and Python library) for archiving Twitter JSON

twarc

twarc is a command line tool and Python library for collecting and archiving Twitter JSON data via the Twitter API. It provides two separate commands: twarc for the older v1.1 API, and twarc2 for the newer v2 API and Academic Access.

twarc has been developed with generous support from the Mellon Foundation.

Contributing

New features are welcome and encouraged for twarc. However, to keep the core twarc library and command line tool sustainable, we will evaluate new functionality with the following principles in mind:

  1. Purpose: twarc is for collection and archiving of Twitter data via the Twitter API.
  2. Sustainability: keeping the surface area of twarc and its dependencies small enough to ensure high quality.
  3. Utility: what is exposed by twarc should be applicable to different people, projects and domains, and not specific use cases.
  4. API consistency: as far as is sensible, we aim to make twarc consistent with the Twitter API and with itself, so commands in core twarc should work similarly to each other, and twarc functionality should align with the Twitter API.

For features and approaches that fall outside of this, twarc enables external packages to hook into the twarc2 command line tool via click-plugins. This means that if you want to propose new functionality, you can create your own package without coordinating with core twarc.
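As a rough illustration of how this kind of plugin hook works, the sketch below lists the entry points that installed packages have registered under a plugin group, using only the Python standard library. The group name "twarc.plugins" and the helper name list_plugins are assumptions made for this example; check the twarc plugin documentation for the group name your package should register under.

```python
from importlib.metadata import entry_points


def list_plugins(group="twarc.plugins"):
    """Return the sorted names of entry points registered under `group`.

    The default group name is an assumption for illustration only.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are selectable by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


if __name__ == "__main__":
    names = list_plugins()
    print("Registered plugins:", ", ".join(names) if names else "(none found)")
```

A plugin package would typically declare its command under that group in its packaging metadata, so that installing the package is enough for the host command line tool to discover it.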

Documentation

The documentation is managed at ReadTheDocs. If you would like to improve the documentation you can edit the Markdown files in docs or add new ones. Then send a pull request and we can add it.

To preview the documentation locally, you should be able to run:

pip install -r requirements-mkdocs.txt
pip install -e .
mkdocs serve
open http://127.0.0.1:8000/

If you prefer, you can create a page on the wiki to workshop the documentation, and then, when you think it's ready to be merged with the documentation, create an issue. Please feel free to create whatever documentation is useful in the wiki area.

Code

If you are interested in adding functionality to twarc or fixing something that's broken, here are the steps for setting up your development environment:

git clone https://github.com/docnow/twarc
cd twarc
pip install -r requirements.txt

Create a .env file that includes Twitter App keys to use during testing:

BEARER_TOKEN=CHANGEME
CONSUMER_KEY=CHANGEME
CONSUMER_SECRET=CHANGEME
ACCESS_TOKEN=CHANGEME
ACCESS_TOKEN_SECRET=CHANGEME
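Before running the tests, it can help to confirm that every key in the .env file has been replaced with a real value. The following is a minimal sketch using only the standard library; the helper names parse_env and unset_keys are hypothetical and are not part of twarc, and the parser handles only simple KEY=VALUE lines.

```python
from pathlib import Path

# Keys listed in the .env template above.
REQUIRED_KEYS = (
    "BEARER_TOKEN",
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def unset_keys(values, placeholder="CHANGEME"):
    """Return required keys that are missing, empty, or still the placeholder."""
    return [k for k in REQUIRED_KEYS if values.get(k, "") in ("", placeholder)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    missing = unset_keys(parse_env(text))
    if missing:
        print("Still unset:", ", ".join(missing))
    else:
        print("All keys set.")
```

Run from the repository root, the script prints the keys that still need attention, or confirms that all five are set.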

Now run the tests:

python setup.py test

Add your code and some new tests, and send a pull request!

More Repositories

  1. hydrator (JavaScript, 420 stars): Turn Tweet IDs into Twitter JSON & CSV from your desktop!
  2. diffengine (Python, 172 stars): track changes to the news, where news is anything with an RSS feed
  3. docnow (JavaScript, 48 stars): A Twitter data collection and appraisal application.
  4. twarc-csv (Python, 31 stars): A plugin for twarc2 for converting tweet JSON into DataFrames and exporting to CSV.
  5. unshrtn (JavaScript, 31 stars): A LevelDB backed URL unshortening microservice written in JavaScript
  6. catalog (JavaScript, 28 stars): A simple catalog of Twitter ID Datasets
  7. twarc-network (Python, 18 stars): Generate network visualizations from Twitter data.
  8. awesome-social-media-archiving (17 stars): Tools for helping you work with web platform archive downloads.
  9. tweet-archive (JavaScript, 15 stars): A tool for working with tweet archives.
  10. dnflow (Python, 14 stars): A design prototype for DocNow to learn with
  11. code-of-conduct (13 stars): Code of Conduct for the Documenting the Now Community
  12. waybackprov (Python, 13 stars): utility to fetch provenance information from Internet Archive's Wayback Machine
  13. tweet-viewer (JavaScript, 7 stars): Generates a single "infinite scroll" page to show tweets collected with the DocNow App
  14. storified (Python, 6 stars): archive Storify stories
  15. foaf (Python, 5 stars): a microservice for generating friend-of-a-friend networks for Twitter
  16. twarc-timeline-archive (Python, 4 stars): Download timelines for a set of users
  17. notebooks (Jupyter Notebook, 4 stars): sketches, ideas and experiments
  18. twitter-archive-unshorten (JavaScript, 4 stars): Convert the t.co URLs in your Twitter archive back to their full form
  19. social-humans (HTML, 3 stars): social media labels for more ethical archives
  20. twarc-edits (Python, 3 stars): Add a twarc2 command to get edited tweets
  21. docnow-ansible (Jinja, 3 stars): Ansible Installer for Docnow App
  22. twarc-hashtags (Python, 3 stars): Report on hashtags in tweet data.
  23. docnow.github.io (HTML, 3 stars): docnow.io website
  24. twarc-ids (Python, 2 stars): A plugin for twarc2 to extract tweet ids from tweet JSON.
  25. twarc-text (Python, 2 stars): A twarc plugin to print tweets to the console
  26. flatten-tweet (JavaScript, 2 stars): Make Twitter v2 API response data easier to work with
  27. roadmap (1 star): A roadmap of development on the Documenting the Now project.
  28. twarcd (Python, 1 star): a microservice for twarc
  29. sfm-ansible (1 star): Ansible playbooks for setting up Social Feed Manager
  30. tweetgresql (JavaScript, 1 star): A little playground for testing modeling tweets in Node & PostgreSQL
  31. twarc-statistics (1 star): A plugin for getting counts and basic statistics from a dataset
  32. twarc-videos (Python, 1 star)
  33. dnflow-ansible (Shell, 1 star): Ansible playbooks for setting up dnflow