  • Stars: 25
  • Rank: 957,573 (Top 19 %)
  • Language: HTML
  • License: GNU General Publi...
  • Created about 6 years ago
  • Updated 5 months ago

Repository Details

wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved

More Repositories

1

grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Python
1,330
star
2

wpull

Wget-compatible web downloader and crawler.
HTML
551
star
3

ArchiveBot

ArchiveBot, an IRC bot for archiving websites
Python
325
star
4

warrior-dockerfile

A Dockerfile for the ArchiveTeam Warrior
Dockerfile
244
star
5

parler-grab

Archiving Parler.
Lua
229
star
6

Ubuntu-Warrior

Scripts to build and boot warrior virtual machine containing Docker
Shell
114
star
7

wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
C
91
star
8

IA.BAK

We back up a lot of stuff from around the web; now it's time to back up the Internet Archive, just in case.
Shell
87
star
9

seesaw-kit

Making a reusable toolkit for writing seesaw scripts
Python
67
star
10

terroroftinytown

URLTeam's second generation of URL shortener archiving tools
Python
63
star
11

reddit-grab

Grabbing everything from reddit.
Lua
60
star
12

NewsGrabber

Grabbing all news.
Python
59
star
13

yahooanswers-grab

Saving all questions and answers from Yahoo! Answers.
Lua
48
star
14

tumblr-grab

Archiving all to-be-deleted NSFW tumblr blogs.
Lua
47
star
15

imgur-grab

Archiving imgur.
Lua
42
star
16

universal-tracker

A configurable, reusable tracker with dashboard
JavaScript
30
star
17

googleplus-grab

Archiving Google+.
Lua
26
star
18

terroroftinytown-client-grab

The Seesaw pipeline grab script for the URLTeam (terroroftinytown) project
Python
25
star
19

warrior-code2

Boot scripts for the ArchiveTeam Warrior 2
Shell
22
star
20

ftp-gov-grab

Archiving government FTPs.
Python
21
star
21

warrior-code

Shell
19
star
22

WebArchiver

Decentralized web archiving
Python
19
star
23

soundcloud-grab

Lua
18
star
24

500px-grab

Archiving https://500px.com/creativecommons
Lua
17
star
25

tinyback

A tiny web scraper
Python
17
star
26

gamemaker-sandbox-items

Gamemaker Sandbox Tracker items
15
star
27

youtube-grab

Archiving all metadata from YouTube (everything except videos themselves due to size)
Lua
14
star
28

youtube-dislikes-grab

Archiving general youtube video metadata through innertube for dislikes removal.
Lua
14
star
29

youtube-dislikes-items

Managing items for youtube-dislikes-grab.
11
star
30

VideoBot

Specialised bot for periodical grabs and video/audio/etc. webpage scrapes.
Python
11
star
31

urlteam-stuff

Urlteam website, code, ... also, PONIES
C
10
star
32

urls-grab

Archiving URLs (outlinks) from a variety of sources.
Lua
9
star
33

NewsGrabber-Warrior

Python
8
star
34

google-sites-grab

Archiving Google Sites Classic.
Lua
8
star
35

flickr-grab

Grabbing Flickr images.
Lua
7
star
36

pastebin-grab

Archiving pastebin
Lua
7
star
37

youtube-items

Managing items for youtube-grab
7
star
38

wget-lua-forum-scripts

Downloading forums posts with Wget+Lua
Lua
6
star
39

greader-grab

http://www.archiveteam.org/index.php?title=Google_Reader
Python
6
star
40

ftp-grab

Save all FTP sites!
Python
6
star
41

mediafire-items

Managing items for mediafire-grab.
Roff
6
star
42

citeseerxpdf-grab

Grabbing all sources of CiteSeerX.
Lua
6
star
43

twitchtv-grab

Grabbing twitch.tv videos
Python
6
star
44

mobileme-grab

Downloading MobileMe
Shell
6
star
45

warrior-preseed

Constructing a new warrior VM
Shell
5
star
46

ftp-nab

Thinger to download FTP sites
Shell
5
star
47

coursera-grab

Saving courses from Coursera.
Lua
5
star
48

tinyarchive

Software behind tracker.tinyarchive.org - Warning: Very hacky code
Python
5
star
49

formspring-grab

Downloading Formspring
Lua
5
star
50

yahoomessages-grab

Archiving Yahoo Messages
Python
5
star
51

splinder-grab

Python
5
star
52

telegram-grab

Archiving public telegram messages.
Lua
5
star
53

ffnet-grab

Fanfictioning
Python
5
star
54

archiveteam-megawarc-factory

Some scripts to process ArchiveTeam uploads
Shell
5
star
55

roblox-grab

Archiving roblox forums.
Lua
4
star
56

gamemaker-sandbox-grab

Grabbing sandbox.yoyogames.com
Python
4
star
57

justintv-grab

Grabbing as much of justin.tv's archives as possible
Python
4
star
58

sourceforge-grab

Archiving SourceForge.
Lua
4
star
59

grab-base-df

Base Dockerfile for warrior project grab scripts
Dockerfile
4
star
60

wikis-grab

Grabbing all wikis.
Python
4
star
61

liveleak-grab

Archiving liveleak.com
Lua
4
star
62

tencent-weibo-grab

Archiving Tencent Weibo (t.qq.com), 腾讯微博
Lua
4
star
63

imdb-grab

Archiving IMDb.
Lua
4
star
64

reddit-items

Managing items for reddit-grab.
4
star
65

flashdomains-grab

Copy of domains-grab for Flash sites.
Lua
4
star
66

ftp-queue

Create queue items for ftp-grab.
NewLisp
4
star
67

tumblr-grab-test

Archiving Tumblr blogs (an ArchiveTeam Warrior testing project)
Python
4
star
68

heroku-buildpack-archiveteam-warrior

Heroku buildpack with the Archive Team Warrior
Shell
4
star
69

mobileme-index

An index of the MobileMe downloads
Ruby
3
star
70

twitchtv-items

Managing twitch.tv items.
Python
3
star
71

Universal-tracker-2

A better tracker with more features for ArchiveTeam
Python
3
star
72

eroshare-grab

Lua
3
star
73

panoramio-grab

Grabbing everything from panoramio
Lua
3
star
74

blingee-grab

Saving all images and content from Blingee.
Lua
3
star
75

vidme-grab

Archiving all videos from vid.me.
Python
3
star
76

livejournal-discovery

Discovering items for livejournal-grab.
Python
3
star
77

github-grab

Archiving GitHub
Lua
3
star
78

mediafire-grab

Archiving mediafire.com URLs.
Lua
3
star
79

furaffinity-grab

Grabbing all images and other stuff from Fur Affinity.
Python
3
star
80

yahoogroups-grab

Archiving Yahoo! Groups.
Lua
3
star
81

webs-grab

Archiving webs.com
Lua
3
star
82

puush-grab

Python
3
star
83

twitchtv-discovery-grab

Discovering twitch.tv content
Python
3
star
84

vlive-grab

Archiving vlive.tv.
Lua
3
star
85

NewsGrabber-Services

The services for NewsGrabber.
Python
3
star
86

parler-items

Managing items for parler-grab.
3
star
87

furaffinity-items

Python
3
star
88

standalone-readme-template

Readme instructions template for manually running pipeline grab scripts outside the warrior
3
star
89

ArchiveBot-agents

Site-specific agents that work with ArchiveBot
Ruby
3
star
90

ua-grab

Archiving all of .ua.
Lua
2
star
91

googlecode-grab

Saving the full Google Code site!
Lua
2
star
92

pixiv-2-grab

Archiving pixiv2 images
Lua
2
star
93

miiverse-grab

Archiving miiverse
Lua
2
star
94

orkut-grab

Download all of Orkut
Lua
2
star
95

dpreview-grab

Archiving DPReview
Lua
2
star
96

bottle

A statistics monitor for the listerine download project @ Archive Team. Massive hack, no tests.
Ruby
2
star
97

halo-new-grab

Archiving Halo (round 2)
Lua
2
star
98

googleplus-items

Managing items for googleplus-grab and googleplus2-grab.
2
star
99

furaffinity-discovery

Python
2
star
100

scrapy-thingy

Archiving Thingiverse
Shell
2
star