• Stars
    3
  • Rank 3,963,521 (Top 79 %)
  • Language
  • License
    Creative Commons ...
  • Created almost 11 years ago
  • Updated over 1 year ago

Repository Details

Readme instructions template for manually running pipeline grab scripts outside the warrior

More Repositories

1

grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Python
1,330
star
2

wpull

Wget-compatible web downloader and crawler.
HTML
551
star
3

ArchiveBot

ArchiveBot, an IRC bot for archiving websites
Python
325
star
4

warrior-dockerfile

A Dockerfile for the ArchiveTeam Warrior
Dockerfile
244
star
5

parler-grab

Archiving Parler.
Lua
229
star
6

Ubuntu-Warrior

Scripts to build and boot warrior virtual machine containing Docker
Shell
114
star
7

wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
C
91
star
8

IA.BAK

We back up a lot of stuff from around the web; now it's time to back up the Internet Archive, just in case.
Shell
87
star
9

seesaw-kit

Making a reusable toolkit for writing seesaw scripts
Python
67
star
10

terroroftinytown

URLTeam's second generation of URL shortener archiving tools
Python
63
star
11

reddit-grab

Grabbing everything from reddit.
Lua
60
star
12

NewsGrabber

Grabbing all news.
Python
59
star
13

yahooanswers-grab

Saving all questions and answers from Yahoo! Answers.
Lua
48
star
14

tumblr-grab

Archiving all to-be-deleted NSFW tumblr blogs.
Lua
47
star
15

imgur-grab

Archiving imgur.
Lua
42
star
16

universal-tracker

A configurable, reusable tracker with dashboard
JavaScript
30
star
17

googleplus-grab

Archiving Google+.
Lua
26
star
18

terroroftinytown-client-grab

The Seesaw pipeline grab script for the URLTeam (terroroftinytown) project
Python
25
star
19

ludios_wpull

wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved
HTML
25
star
20

warrior-code2

Boot scripts for the ArchiveTeam Warrior 2
Shell
22
star
21

ftp-gov-grab

Archiving government FTPs.
Python
21
star
22

warrior-code

Shell
19
star
23

WebArchiver

Decentralized web archiving
Python
19
star
24

soundcloud-grab

Lua
18
star
25

500px-grab

Archiving https://500px.com/creativecommons
Lua
17
star
26

tinyback

A tiny web scraper
Python
17
star
27

gamemaker-sandbox-items

Gamemaker Sandbox Tracker items
15
star
28

youtube-grab

Archiving all metadata from YouTube (everything except videos themselves due to size)
Lua
14
star
29

youtube-dislikes-grab

Archiving general youtube video metadata through innertube for dislikes removal.
Lua
14
star
30

youtube-dislikes-items

Managing items for youtube-dislikes-grab.
11
star
31

VideoBot

Specialised bot for periodical grabs and video/audio/etc. webpage scrapes.
Python
11
star
32

urlteam-stuff

Urlteam website, code, ... also, PONIES
C
10
star
33

urls-grab

Archiving URLs (outlinks) from a variety of sources.
Lua
9
star
34

NewsGrabber-Warrior

Python
8
star
35

google-sites-grab

Archiving Google Sites Classic.
Lua
8
star
36

flickr-grab

Grabbing Flickr images.
Lua
7
star
37

pastebin-grab

Archiving pastebin
Lua
7
star
38

youtube-items

Managing items for youtube-grab
7
star
39

wget-lua-forum-scripts

Downloading forums posts with Wget+Lua
Lua
6
star
40

greader-grab

http://www.archiveteam.org/index.php?title=Google_Reader
Python
6
star
41

ftp-grab

Save all FTP sites!
Python
6
star
42

mediafire-items

Managing items for mediafire-grab.
Roff
6
star
43

citeseerxpdf-grab

Grabbing all sources of CiteSeerX.
Lua
6
star
44

twitchtv-grab

Grabbing twitch.tv videos
Python
6
star
45

mobileme-grab

Downloading MobileMe
Shell
6
star
46

warrior-preseed

Constructing a new warrior VM
Shell
5
star
47

ftp-nab

Thinger to download FTP sites
Shell
5
star
48

coursera-grab

Saving courses from Coursera.
Lua
5
star
49

tinyarchive

Software behind tracker.tinyarchive.org - Warning: Very hacky code
Python
5
star
50

formspring-grab

Downloading Formspring
Lua
5
star
51

yahoomessages-grab

Archiving Yahoo Messages
Python
5
star
52

splinder-grab

Python
5
star
53

telegram-grab

Archiving public telegram messages.
Lua
5
star
54

ffnet-grab

Fanfictioning
Python
5
star
55

archiveteam-megawarc-factory

Some scripts to process ArchiveTeam uploads
Shell
5
star
56

roblox-grab

Archiving roblox forums.
Lua
4
star
57

gamemaker-sandbox-grab

Grabbing sandbox.yoyogames.com
Python
4
star
58

justintv-grab

Grabbing as much of justin.tv's archives as possible
Python
4
star
59

sourceforge-grab

Archiving SourceForge.
Lua
4
star
60

grab-base-df

Base Dockerfile for warrior project grab scripts
Dockerfile
4
star
61

wikis-grab

Grabbing all wikis.
Python
4
star
62

liveleak-grab

Archiving liveleak.com
Lua
4
star
63

tencent-weibo-grab

Archiving Tencent Weibo (t.qq.com), 腾讯微博
Lua
4
star
64

imdb-grab

Archiving IMDb.
Lua
4
star
65

reddit-items

Managing items for reddit-grab.
4
star
66

flashdomains-grab

Copy of domains-grab for Flash sites.
Lua
4
star
67

ftp-queue

Create queue items for ftp-grab.
NewLisp
4
star
68

tumblr-grab-test

Archiving Tumblr blogs (an ArchiveTeam Warrior testing project)
Python
4
star
69

heroku-buildpack-archiveteam-warrior

Heroku buildpack with the Archive Team Warrior
Shell
4
star
70

mobileme-index

An index of the MobileMe downloads
Ruby
3
star
71

twitchtv-items

Managing twitch.tv items.
Python
3
star
72

Universal-tracker-2

A better tracker with more features for ArchiveTeam
Python
3
star
73

eroshare-grab

Lua
3
star
74

panoramio-grab

Grabbing everything from panoramio
Lua
3
star
75

blingee-grab

Saving all images and content from Blingee.
Lua
3
star
76

vidme-grab

Archiving all videos from vid.me.
Python
3
star
77

livejournal-discovery

Discovering items for livejournal-grab.
Python
3
star
78

github-grab

Archiving GitHub
Lua
3
star
79

mediafire-grab

Archiving mediafire.com URLs.
Lua
3
star
80

furaffinity-grab

Grabbing all images and other stuff from Fur Affinity.
Python
3
star
81

yahoogroups-grab

Archiving Yahoo! Groups.
Lua
3
star
82

webs-grab

Archiving webs.com
Lua
3
star
83

puush-grab

Python
3
star
84

twitchtv-discovery-grab

Discovering twitch.tv content
Python
3
star
85

vlive-grab

Archiving vlive.tv.
Lua
3
star
86

NewsGrabber-Services

The services for NewsGrabber.
Python
3
star
87

parler-items

Managing items for parler-grab.
3
star
88

furaffinity-items

Python
3
star
89

ArchiveBot-agents

Site-specific agents that work with ArchiveBot
Ruby
3
star
90

ua-grab

Archiving all of .ua.
Lua
2
star
91

googlecode-grab

Saving the full Google Code site!
Lua
2
star
92

pixiv-2-grab

Archiving pixiv2 images
Lua
2
star
93

miiverse-grab

Archiving miiverse
Lua
2
star
94

orkut-grab

Download all of Orkut
Lua
2
star
95

dpreview-grab

Archiving DPReview
Lua
2
star
96

bottle

A statistics monitor for the listerine download project @ Archive Team. Massive hack, no tests.
Ruby
2
star
97

halo-new-grab

Archiving Halo (round 2)
Lua
2
star
98

googleplus-items

Managing items for googleplus-grab and googleplus2-grab.
2
star
99

furaffinity-discovery

Python
2
star
100

scrapy-thingy

Archiving Thingiverse
Shell
2
star