Archive Team (@ArchiveTeam)

Top repositories

1

grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Python
1,330
star
2

wpull

Wget-compatible web downloader and crawler.
HTML
551
star
3

ArchiveBot

ArchiveBot, an IRC bot for archiving websites
Python
325
star
4

warrior-dockerfile

A Dockerfile for the ArchiveTeam Warrior
Dockerfile
244
star
5

parler-grab

Archiving Parler.
Lua
229
star
6

Ubuntu-Warrior

Scripts to build and boot a Warrior virtual machine containing Docker
Shell
114
star
7

wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
C
91
star
8

IA.BAK

We back up a lot of stuff from around the web; now it's time to back up the Internet Archive, just in case.
Shell
87
star
9

seesaw-kit

Making a reusable toolkit for writing seesaw scripts
Python
67
star
10

terroroftinytown

URLTeam's second generation of URL shortener archiving tools
Python
63
star
11

reddit-grab

Grabbing everything from reddit.
Lua
60
star
12

NewsGrabber

Grabbing all news.
Python
59
star
13

yahooanswers-grab

Saving all questions and answers from Yahoo! Answers.
Lua
48
star
14

tumblr-grab

Archiving all to-be-deleted NSFW tumblr blogs.
Lua
47
star
15

imgur-grab

Archiving imgur.
Lua
42
star
16

universal-tracker

A configurable, reusable tracker with dashboard
JavaScript
30
star
17

googleplus-grab

Archiving Google+.
Lua
26
star
18

terroroftinytown-client-grab

The Seesaw pipeline grab script for the URLTeam (terroroftinytown) project
Python
25
star
19

ludios_wpull

wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved
HTML
25
star
20

warrior-code2

Boot scripts for the ArchiveTeam Warrior 2
Shell
22
star
21

ftp-gov-grab

Archiving government FTPs.
Python
21
star
22

warrior-code

Shell
19
star
23

WebArchiver

Decentralized web archiving
Python
19
star
24

soundcloud-grab

Lua
18
star
25

500px-grab

Archiving https://500px.com/creativecommons
Lua
17
star
26

tinyback

A tiny web scraper
Python
17
star
27

gamemaker-sandbox-items

Gamemaker Sandbox Tracker items
15
star
28

youtube-grab

Archiving all metadata from YouTube (everything except videos themselves due to size)
Lua
14
star
29

youtube-dislikes-grab

Archiving general YouTube video metadata through innertube, in response to the removal of dislikes.
Lua
14
star
30

youtube-dislikes-items

Managing items for youtube-dislikes-grab.
11
star
31

VideoBot

Specialised bot for periodical grabs and video/audio/etc. webpage scrapes.
Python
11
star
32

urlteam-stuff

Urlteam website, code, ... also, PONIES
C
10
star
33

urls-grab

Archiving URLs (outlinks) from a variety of sources.
Lua
9
star
34

NewsGrabber-Warrior

Python
8
star
35

google-sites-grab

Archiving Google Sites Classic.
Lua
8
star
36

flickr-grab

Grabbing Flickr images.
Lua
7
star
37

pastebin-grab

Archiving pastebin
Lua
7
star
38

youtube-items

Managing items for youtube-grab
7
star
39

wget-lua-forum-scripts

Downloading forum posts with Wget+Lua
Lua
6
star
40

greader-grab

http://www.archiveteam.org/index.php?title=Google_Reader
Python
6
star
41

ftp-grab

Save all FTP sites!
Python
6
star
42

mediafire-items

Managing items for mediafire-grab.
Roff
6
star
43

citeseerxpdf-grab

Grabbing all sources of CiteSeerX.
Lua
6
star
44

twitchtv-grab

Grabbing twitch.tv videos
Python
6
star
45

mobileme-grab

Downloading MobileMe
Shell
6
star
46

warrior-preseed

Constructing a new warrior VM
Shell
5
star
47

ftp-nab

Thinger to download FTP sites
Shell
5
star
48

coursera-grab

Saving courses from Coursera.
Lua
5
star
49

tinyarchive

Software behind tracker.tinyarchive.org - Warning: Very hacky code
Python
5
star
50

formspring-grab

Downloading Formspring
Lua
5
star
51

yahoomessages-grab

Archiving Yahoo Messages
Python
5
star
52

splinder-grab

Python
5
star
53

telegram-grab

Archiving public Telegram messages.
Lua
5
star
54

ffnet-grab

Fanfictioning
Python
5
star
55

archiveteam-megawarc-factory

Some scripts to process ArchiveTeam uploads
Shell
5
star
56

roblox-grab

Archiving Roblox forums.
Lua
4
star
57

gamemaker-sandbox-grab

Grabbing sandbox.yoyogames.com
Python
4
star
58

justintv-grab

Grabbing as much of justin.tv's archives as possible
Python
4
star
59

sourceforge-grab

Archiving SourceForge.
Lua
4
star
60

grab-base-df

Base Dockerfile for warrior project grab scripts
Dockerfile
4
star
61

wikis-grab

Grabbing all wikis.
Python
4
star
62

liveleak-grab

Archiving liveleak.com
Lua
4
star
63

tencent-weibo-grab

Archiving Tencent Weibo (t.qq.com), 腾讯微博
Lua
4
star
64

imdb-grab

Archiving IMDb.
Lua
4
star
65

reddit-items

Managing items for reddit-grab.
4
star
66

flashdomains-grab

Copy of domains-grab for Flash sites.
Lua
4
star
67

ftp-queue

Create queue items for ftp-grab.
NewLisp
4
star
68

tumblr-grab-test

Archiving Tumblr blogs (an ArchiveTeam Warrior testing project)
Python
4
star
69

heroku-buildpack-archiveteam-warrior

Heroku buildpack with the Archive Team Warrior
Shell
4
star
70

mobileme-index

An index of the MobileMe downloads
Ruby
3
star
71

twitchtv-items

Managing twitch.tv items.
Python
3
star
72

Universal-tracker-2

A better tracker with more features for ArchiveTeam
Python
3
star
73

eroshare-grab

Lua
3
star
74

panoramio-grab

Grabbing everything from Panoramio
Lua
3
star
75

blingee-grab

Saving all images and content from Blingee.
Lua
3
star
76

vidme-grab

Archiving all videos from vid.me.
Python
3
star
77

livejournal-discovery

Discovering items for livejournal-grab.
Python
3
star
78

github-grab

Archiving GitHub
Lua
3
star
79

mediafire-grab

Archiving mediafire.com URLs.
Lua
3
star
80

furaffinity-grab

Grabbing all images and other stuff from Fur Affinity.
Python
3
star
81

yahoogroups-grab

Archiving Yahoo! Groups.
Lua
3
star
82

webs-grab

Archiving webs.com
Lua
3
star
83

puush-grab

Python
3
star
84

twitchtv-discovery-grab

Discovering twitch.tv content
Python
3
star
85

vlive-grab

Archiving vlive.tv.
Lua
3
star
86

NewsGrabber-Services

The services for NewsGrabber.
Python
3
star
87

parler-items

Managing items for parler-grab.
3
star
88

furaffinity-items

Python
3
star
89

standalone-readme-template

Readme instructions template for manually running pipeline grab scripts outside the warrior
3
star
90

ArchiveBot-agents

Site-specific agents that work with ArchiveBot
Ruby
3
star
91

ua-grab

Archiving all of .ua.
Lua
2
star
92

googlecode-grab

Saving the full Google Code site!
Lua
2
star
93

pixiv-2-grab

Archiving pixiv2 images
Lua
2
star
94

miiverse-grab

Archiving Miiverse
Lua
2
star
95

orkut-grab

Download all of Orkut
Lua
2
star
96

dpreview-grab

Archiving DPReview
Lua
2
star
97

bottle

A statistics monitor for the listerine download project @ Archive Team. Massive hack, no tests.
Ruby
2
star
98

halo-new-grab

Archiving Halo (round 2)
Lua
2
star
99

googleplus-items

Managing items for googleplus-grab and googleplus2-grab.
2
star
100

furaffinity-discovery

Python
2
star