memory.lol


Overview

This project is a tiny web service that provides historical information about social media accounts.

It can currently be used to look up 542 million historical screen names for 443 million Twitter accounts. Most of this data has been scraped from either the Twitter Stream Grab or the Wayback Machine (both published by the Internet Archive).

Coverage should be fairly good (for non-protected accounts) going back to 2011, which is when the Twitter Stream Grab was launched.

Please note that this software is not "open source", but the source is available for use and modification by individuals, non-profit organizations, and worker-owned businesses (see the license section below for details).

Safety

All information provided by this service has been gathered from public archives, and in most cases it can easily be found through other means (such as a Twitter search for replies to an account). The goal of the service is to make it easier for researchers or journalists to identify directions for further investigation, and more generally to indicate to users that an account may be operating a scam, spreading disinformation, etc. If you have concerns about safety or privacy, you can contact me (via Twitter DM or email) and your request will be handled privately.

Current access restrictions

In most cases public access to the tool is currently limited to historical facts that have been observed in the past 60 days. There are two exceptions to this rule:

  • Some accounts are excluded at the request of the account owner.
  • Full histories are provided for a set of accounts compiled from several "bad actor" lists.
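The date restriction can be pictured as a simple check on each screen name's last observed day. The Rust sketch below is an illustration only, not the service's code. It assumes a screen name is shown publicly when its last observed day falls within 60 days of the current date, which is consistent with the restricted and full `USForcesKorea` results later in this README. It also ignores the two exceptions listed above. `days_from_civil`, `parse_day`, and `is_publicly_visible` are hypothetical helpers, and the day arithmetic follows Howard Hinnant's civil-date algorithm.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's `days_from_civil` algorithm).
fn days_from_civil(y: i64, m: u32, d: u32) -> i64 {
    let y = if m <= 2 { y - 1 } else { y }; // treat years as starting in March
    let era = y.div_euclid(400);
    let yoe = y - era * 400; // year of era, [0, 399]
    let m = m as i64;
    let mp = if m > 2 { m - 3 } else { m + 9 }; // March-based month, [0, 11]
    let doy = (153 * mp + 2) / 5 + d as i64 - 1; // day of year, [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era, [0, 146096]
    era * 146_097 + doe - 719_468
}

/// Parse a "YYYY-MM-DD" date (the format used in API responses) into epoch days.
fn parse_day(s: &str) -> Option<i64> {
    let mut parts = s.split('-');
    let y: i64 = parts.next()?.parse().ok()?;
    let m: u32 = parts.next()?.parse().ok()?;
    let d: u32 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    Some(days_from_civil(y, m, d))
}

/// Hypothetical public-access rule: visible if the last observation is at most
/// `window_days` before `today`. Names with no date (`null`) never pass.
fn is_publicly_visible(last_observed: Option<&str>, today: &str, window_days: i64) -> bool {
    match (last_observed.and_then(parse_day), parse_day(today)) {
        (Some(last), Some(now)) => (0..=window_days).contains(&(now - last)),
        _ => false,
    }
}

fn main() {
    // "Today" is an assumed date chosen for illustration.
    let today = "2022-08-01";
    let samples = [
        ("USForcesKorea", Some("2022-07-29")),
        ("usforceskorea_", Some("2018-04-24")),
        ("chayaraichik", None),
    ];
    for (name, last) in samples {
        println!("{name}: visible = {}", is_publicly_visible(last, today, 60));
    }
}
```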

The full twelve years and half a billion screen names (minus requested exclusions) are available to a trusted group of researchers, journalists, and activists.

The service currently supports authenticating via a GitHub or Google account.

The service only uses GitHub (or Google) for authentication, doesn't require any non-public or write access to the user's accounts, will never request any kind of password, and stores only the user's public information on its servers.

The service does not currently log requests in a way that would allow anyone with access to the server to link individual queries to specific authorized users, but I reserve the right to implement such logging in the future if there's any suggestion of abuse.

To log in, visit https://api.memory.lol/v1/login/github, which will take you to a GitHub "Authorize memory.lol" page that asks you to authorize "Limited access to your public data". Click "Authorize" and you'll be taken to a status URL that shows your current access level (which will be empty unless your account has been specifically approved for access). If your account has been approved, the full index will then be available.

To log out, go to https://api.memory.lol/v1/logout.

It's possible to use the full version of the service from the command line via GitHub's device flow, but this currently isn't very convenient (see the instructions below). I'll be providing a client that makes command-line use a little easier.

If you're interested in having your account approved for non-date-restricted access, please contact me.

Use cases

Accounts that engage in hate speech, scams, harassment, etc. on social media platforms sometimes try to obscure their identities by changing their screen names, and they often also have really bad opsec (for example using real names or other identifying information on accounts that they later intend to use anonymously).

Being able to look up historical social media profiles often makes it possible to identify the offline identities of these people (or at least to trace connections between their activities).

Here are a few examples off the top of my head (the first three are examples of the service in action, and the last two show how it can be used to confirm the work of others):

In many cases the information provided by the service won't be enough to identify a person, but may provide hints about where to look next (for example looking up deleted tweets for old screen names with ✨cancel-culture✨ is often a reasonable second step).

Detailed example

If you visit https://api.memory.lol/v1/tw/libsoftiktok in your browser, you'll see the following data:

{
  "accounts": [
    {
      "id": 1326229737551912960,
      "screen-names": {
        "chayaraichik": null,
        "cuomomustgo": null,
        "houseplantpotus": null,
        "shaya69830552": [
          "2020-11-10"
        ],
        "shaya_ray": [
          "2020-11-27",
          "2020-12-17"
        ],
        "libsoftiktok": [
          "2021-08-18",
          "2022-06-16"
        ]
      }
    }
  ]
}

Note that for some screen names we don't currently have information about when they were observed (e.g. the ones with null values above). If a screen name was observed on only one day in our data sets, there will be a single date. If there are two dates, they indicate the first and last day that the screen name was observed.

These date ranges will not generally represent the entire time that the screen name has been used (they just indicate when the account appears with that screen name in our data sets).
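The three cases described above map naturally onto a small type. This Rust sketch is hypothetical: the `Observed` type and `classify` function are not part of the service, and it assumes each screen name's value has already been decoded from JSON as either `null` or a list of date strings.

```rust
/// How a screen name's observation dates are reported in a response.
#[derive(Debug, PartialEq)]
enum Observed<'a> {
    /// `null`: the screen name is known, but no observation date is available.
    Unknown,
    /// One date: the screen name was observed on a single day.
    Once(&'a str),
    /// Two dates: first and last day observed, not the full period of use.
    Range { first: &'a str, last: &'a str },
}

/// Interpret a decoded value, where `None` stands for JSON `null`.
/// Returns `None` for shapes the README doesn't describe (e.g. three dates).
fn classify<'a>(dates: Option<&[&'a str]>) -> Option<Observed<'a>> {
    match dates {
        None => Some(Observed::Unknown),
        Some(&[day]) => Some(Observed::Once(day)),
        Some(&[first, last]) => Some(Observed::Range { first, last }),
        Some(_) => None,
    }
}

fn main() {
    // Values taken from the libsoftiktok example above.
    println!("{:?}", classify(None));
    println!("{:?}", classify(Some(&["2020-11-10"][..])));
    println!("{:?}", classify(Some(&["2020-11-27", "2020-12-17"][..])));
}
```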

Other features

The service is very minimal. One of the few things it does support is querying multiple screen names via a comma-separated list (for example: https://api.memory.lol/v1/tw/jr_majewski,MayraFlores2022). It also supports searching for a screen name prefix (currently limited to 100 results; for example: https://api.memory.lol/v1/tw/tradwife*).
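Both query forms are plain URLs, so building them is simple string work. In this Rust sketch, `names_url` and `prefix_url` are hypothetical helpers. The check that names use only ASCII letters, digits, and underscores is an assumption based on Twitter's screen-name rules; it is there so that a stray comma or `*` can't change the meaning of a query.

```rust
const BASE: &str = "https://api.memory.lol/v1/tw/";

/// Assumed screen-name alphabet: ASCII letters, digits and underscore.
fn is_plain_name(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Look up several screen names at once via a comma-separated list.
fn names_url(names: &[&str]) -> Option<String> {
    if names.is_empty() || !names.iter().all(|n| is_plain_name(n)) {
        return None;
    }
    Some(format!("{}{}", BASE, names.join(",")))
}

/// Search by screen-name prefix (the service returns at most 100 results).
fn prefix_url(prefix: &str) -> Option<String> {
    is_plain_name(prefix).then(|| format!("{}{}*", BASE, prefix))
}

fn main() {
    println!("{:?}", names_url(&["jr_majewski", "MayraFlores2022"]));
    println!("{:?}", prefix_url("tradwife"));
}
```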

It currently only supports JSON output, but if you want a spreadsheet, for example, you can convert the JSON to CSV using a tool like gojq:

$ curl -s https://api.memory.lol/v1/tw/jr_majewski,MayraFlores2022 |
> gojq -r '.[].accounts | .[] | .id as $id | ."screen-names" | keys | [$id] + . | @csv'
89469296,"LaRepublicana86","MayraFlores2022","MayraNohemiF"
726873022603362304,"JRMajewski","jr_majewski"
1533878962455293953,"jr_majewski"

Or if you want one screen name per row:

$ curl -s https://api.memory.lol/v1/tw/jr_majewski,MayraFlores2022 |
> gojq -r '.[].accounts | .[] | .id as $id | ."screen-names" | keys | .[] | [$id, .] | @csv'
89469296,"LaRepublicana86"
89469296,"MayraFlores2022"
89469296,"MayraNohemiF"
726873022603362304,"JRMajewski"
726873022603362304,"jr_majewski"
1533878962455293953,"jr_majewski"
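If you would rather do this conversion in Rust than with gojq, the formatting step is short. This sketch assumes the JSON has already been decoded into (account ID, screen names) pairs, for example with a JSON crate such as serde_json (not shown). `per_account_row` and `per_name_rows` are hypothetical names. Like gojq's `keys`, it sorts screen names by code point (so uppercase sorts first), and like `@csv`, it leaves numbers bare and wraps strings in double quotes, doubling any embedded quotes.

```rust
/// Quote a string field the way `@csv` does.
fn csv_str(s: &str) -> String {
    format!("\"{}\"", s.replace('"', "\"\""))
}

/// Screen names sorted like gojq's `keys` (byte order, so uppercase sorts first).
fn sorted<'a>(names: &[&'a str]) -> Vec<&'a str> {
    let mut v = names.to_vec();
    v.sort_unstable();
    v
}

/// One row per account: the ID followed by every screen name.
fn per_account_row(id: u64, names: &[&str]) -> String {
    let mut fields = vec![id.to_string()];
    fields.extend(sorted(names).into_iter().map(csv_str));
    fields.join(",")
}

/// One row per (account ID, screen name) pair.
fn per_name_rows(id: u64, names: &[&str]) -> Vec<String> {
    sorted(names)
        .into_iter()
        .map(|n| format!("{},{}", id, csv_str(n)))
        .collect()
}

fn main() {
    // Accounts from the example output above.
    let accounts: [(u64, &[&str]); 3] = [
        (89469296, &["MayraFlores2022", "LaRepublicana86", "MayraNohemiF"]),
        (726873022603362304, &["jr_majewski", "JRMajewski"]),
        (1533878962455293953, &["jr_majewski"]),
    ];
    for (id, names) in accounts {
        println!("{}", per_account_row(id, names));
    }
    for (id, names) in accounts {
        for row in per_name_rows(id, names) {
            println!("{row}");
        }
    }
}
```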

Note that screen name queries are case-insensitive, but the results distinguish case (which can be useful for archives such as Archive Today, which only provide case-sensitive search).
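One way to get case-insensitive queries with case-preserving results is to index each observed spelling under a normalized key. The Rust sketch below is hypothetical (the `NameIndex` type is not the service's storage layer). It assumes ASCII lowercasing is enough, which holds if screen names are limited to ASCII letters, digits, and underscores.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory index: keys are lowercased for matching, but every
/// observed spelling is stored so results still distinguish case.
#[derive(Default)]
struct NameIndex {
    by_lowercase: HashMap<String, BTreeSet<(u64, String)>>,
}

impl NameIndex {
    /// Record that `account_id` was observed using `screen_name` (exact case).
    fn insert(&mut self, account_id: u64, screen_name: &str) {
        self.by_lowercase
            .entry(screen_name.to_ascii_lowercase())
            .or_default()
            .insert((account_id, screen_name.to_string()));
    }

    /// Case-insensitive lookup returning (account ID, original spelling) pairs.
    fn lookup(&self, query: &str) -> Vec<(u64, &str)> {
        self.by_lowercase
            .get(&query.to_ascii_lowercase())
            .map(|set| set.iter().map(|(id, name)| (*id, name.as_str())).collect::<Vec<_>>())
            .unwrap_or_default()
    }
}

fn main() {
    // Spellings from the USForcesKorea example in this README.
    let mut index = NameIndex::default();
    index.insert(26847645, "usforceskorea");
    index.insert(4749974413, "usforceskorea");
    index.insert(4749974413, "USForcesKorea");
    index.insert(4749974413, "usforceskorea_");
    println!("{:?}", index.lookup("USFORCESKOREA"));
}
```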

Other endpoints

You can also look up an account's history by account ID (e.g. https://api.memory.lol/v1/tw/id/1326229737551912960 also shows the screen names for Raichik's account).

Authorized access via device flow

There are currently several steps if you want to access the full index from the command line. By default you will receive date-restricted results:

$ curl -s https://api.memory.lol/v1/tw/USForcesKorea | jq
{
  "accounts": [
    {
      "id": 4749974413,
      "id_str": "4749974413",
      "screen_names": {
        "USForcesKorea": [
          "2018-06-08",
          "2022-07-29"
        ]
      }
    }
  ]
}

To access the full index (assuming you have an approved account), you'll first need to get a device code and user code, using exactly this command:

$ curl -X POST -d 'client_id=b8ab5a8c1a2745d514b7' https://github.com/login/device/code
device_code=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&expires_in=898&interval=5&user_code=ABCD-0123&verification_uri=https%3A%2F%2Fgithub.com%2Flogin%2Fdevice
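GitHub's responses in this flow are `application/x-www-form-urlencoded`, so a script that automates these steps needs to split and decode them. This is a minimal Rust sketch with hypothetical helper names (`form_decode`, `parse_form`), run against the response shown above with its placeholder device code. It only illustrates the decoding and is not a replacement for a full HTTP client.

```rust
use std::collections::HashMap;

/// Decode one form-urlencoded component: `+` becomes a space, `%XX` becomes a byte.
fn form_decode(s: &str) -> Option<String> {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'+' => {
                out.push(b' ');
                i += 1;
            }
            b'%' => {
                // Exactly two hex digits must follow; anything else is malformed.
                let hex = s.get(i + 1..i + 3)?;
                if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                    return None;
                }
                out.push(u8::from_str_radix(hex, 16).ok()?);
                i += 3;
            }
            b => {
                out.push(b);
                i += 1;
            }
        }
    }
    String::from_utf8(out).ok()
}

/// Split a `key=value&key=value` body into a map of decoded pairs.
fn parse_form(body: &str) -> Option<HashMap<String, String>> {
    body.trim()
        .split('&')
        .filter(|pair| !pair.is_empty())
        .map(|pair| {
            let (k, v) = pair.split_once('=').unwrap_or((pair, ""));
            Some((form_decode(k)?, form_decode(v)?))
        })
        .collect()
}

fn main() {
    let body = "device_code=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&expires_in=898&interval=5\
                &user_code=ABCD-0123&verification_uri=https%3A%2F%2Fgithub.com%2Flogin%2Fdevice";
    let fields = parse_form(body).expect("well-formed response");
    println!("Enter {} at {}", fields["user_code"], fields["verification_uri"]);
}
```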

Next, visit https://github.com/login/device in a browser and enter the user code you just received when prompted.

Lastly, you need to get your bearer token (replacing device_code below with the one you were given, but again using the client_id shown here):

$ curl -X POST -d 'device_code=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&client_id=b8ab5a8c1a2745d514b7&grant_type=urn:ietf:params:oauth:grant-type:device_code' https://github.com/login/oauth/access_token
access_token=gho_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&scope=&token_type=bearer
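In GitHub's device flow, the token request only succeeds after the user code has been entered. Before that, the endpoint answers with an error such as `authorization_pending`, and polling faster than the `interval` from the first response yields `slow_down`, which under the OAuth device flow specification (RFC 8628) means adding 5 seconds to the interval. The Rust sketch below builds the request body used above and classifies the form-encoded reply. `token_request_body`, `classify_token_response`, and `next_interval` are hypothetical names, and the sketch assumes the values it reads need no percent-decoding (true of the sample token above).

```rust
/// Outcome of one poll of the token endpoint.
#[derive(Debug, PartialEq)]
enum Poll {
    /// Success: the bearer token to send to memory.lol.
    Token(String),
    /// The user code hasn't been entered yet; wait `interval` seconds and retry.
    Pending,
    /// Polling too fast; keep polling with a longer interval.
    SlowDown,
    /// Any other error (e.g. an expired device code); start over.
    Failed(String),
}

/// Form body for the token request (same fields as the curl command above).
fn token_request_body(device_code: &str, client_id: &str) -> String {
    format!(
        "device_code={device_code}&client_id={client_id}\
         &grant_type=urn:ietf:params:oauth:grant-type:device_code"
    )
}

/// Classify a form-encoded token response; values are used as-is.
fn classify_token_response(body: &str) -> Poll {
    let mut token = None;
    let mut error = None;
    for pair in body.trim().split('&') {
        match pair.split_once('=') {
            Some(("access_token", v)) if !v.is_empty() => token = Some(v.to_string()),
            Some(("error", v)) => error = Some(v.to_string()),
            _ => {}
        }
    }
    match (token, error.as_deref()) {
        (Some(t), _) => Poll::Token(t),
        (None, Some("authorization_pending")) => Poll::Pending,
        (None, Some("slow_down")) => Poll::SlowDown,
        (None, Some(e)) => Poll::Failed(e.to_string()),
        (None, None) => Poll::Failed("unrecognized response".to_string()),
    }
}

/// Seconds to wait before the next poll; RFC 8628 adds 5 seconds on `slow_down`.
fn next_interval(current: u64, poll: &Poll) -> u64 {
    if matches!(poll, Poll::SlowDown) { current + 5 } else { current }
}

fn main() {
    println!("{}", token_request_body("xxxxxxxx", "b8ab5a8c1a2745d514b7"));
    println!("{:?}", classify_token_response("error=authorization_pending"));
    println!("{:?}", classify_token_response("access_token=gho_XXXX&scope=&token_type=bearer"));
}
```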

You can then use this token to make authenticated queries:

$ curl -s -X POST -d 'token=gho_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' https://api.memory.lol/v1/tw/USForcesKorea | jq
{
  "accounts": [
    {
      "id": 26847645,
      "id_str": "26847645",
      "screen_names": {
        "USFKPAO": [
          "2011-10-19",
          "2016-06-19"
        ],
        "usforceskorea": [
          "2017-02-20",
          "2018-03-27"
        ]
      }
    },
    {
      "id": 4749974413,
      "id_str": "4749974413",
      "screen_names": {
        "usforceskorea": [
          "2016-02-26",
          "2017-02-07"
        ],
        "usforceskorea_": [
          "2017-02-19",
          "2018-04-24"
        ],
        "USForcesKorea": [
          "2018-06-08",
          "2022-07-29"
        ]
      }
    }
  ]
}

Eventually this process will be bundled up into a command-line client, but for now this approach will work with existing tools like curl.

Importing data

The application currently supports importing data in two file formats. The first requires one Twitter user object per line (in JSON format with an additional snapshot field representing the observation time as an epoch second). The second is a CSV format with at least three columns (Twitter user ID, screen name, and observation time as epoch second).
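To make the CSV format concrete, here is a hypothetical Rust sketch that reads it and collapses the observations into the first and last days reported in API responses. It assumes comma-separated rows with no header or quoting, ignores any columns after the third, and converts epoch seconds to UTC dates with Howard Hinnant's civil-date algorithm. The application's actual importer may differ in any of these respects.

```rust
use std::collections::BTreeMap;

/// (year, month, day) for a count of days since 1970-01-01 (Hinnant's `civil_from_days`).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z - era * 146_097; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, [0, 365]
    let mp = (5 * doy + 2) / 153; // March-based month, [0, 11]
    let d = doy - (153 * mp + 2) / 5 + 1; // [1, 31]
    let m = if mp < 10 { mp + 3 } else { mp - 9 }; // [1, 12]
    let y = yoe + era * 400 + (if m <= 2 { 1 } else { 0 });
    (y, m as u32, d as u32)
}

/// Format an epoch second as a UTC "YYYY-MM-DD" date.
fn epoch_to_date(secs: i64) -> String {
    let (y, m, d) = civil_from_days(secs.div_euclid(86_400));
    format!("{:04}-{:02}-{:02}", y, m, d)
}

/// One CSV observation: user ID, screen name, epoch second (extra columns ignored).
fn parse_row(line: &str) -> Option<(u64, String, i64)> {
    let mut cols = line.trim().split(',');
    let id = cols.next()?.parse().ok()?;
    let name = cols.next()?.to_string();
    let ts = cols.next()?.parse().ok()?;
    if name.is_empty() {
        return None;
    }
    Some((id, name, ts))
}

/// Earliest and latest observation per (user ID, screen name); malformed rows are skipped.
fn aggregate(csv: &str) -> BTreeMap<(u64, String), (i64, i64)> {
    let mut ranges = BTreeMap::new();
    for (id, name, ts) in csv.lines().filter_map(parse_row) {
        let range = ranges.entry((id, name)).or_insert((ts, ts));
        range.0 = range.0.min(ts);
        range.1 = range.1.max(ts);
    }
    ranges
}

fn main() {
    let csv = "1326229737551912960,shaya_ray,1608163200\n\
               1326229737551912960,shaya_ray,1606435200\n\
               1326229737551912960,shaya69830552,1605000000\n";
    for ((id, name), (first, last)) in aggregate(csv) {
        println!("{id} {name}: {} to {}", epoch_to_date(first), epoch_to_date(last));
    }
}
```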

Future

Anything about the web service is subject to change at any time, including its availability.

There are non-public endpoints that I'm likely to open up at some point. These provide full historical user profiles, information about suspension or deactivation status, etc.

Terms of service compliance

This web service simply provides an interface to an index for content that is hosted in public archives, and the project aims to be compliant with the terms of service of all platforms that were accessed in generating this index.

This repository does not contain data from any social media platform.

License

This software is published under the Anti-Capitalist Software License (v. 1.4).
