Archive Team Warrior Dockerfile

A Dockerfile for the Archive Team Warrior

Build, run, grab the container IP and access the web interface on port 8001.

Getting Started

A prebuilt image is available at atdr.meo.ws/archiveteam/warrior-dockerfile.

The following example:

  • Runs the Warrior in the background.
  • Configures the Warrior to start again automatically after a machine reboot.
  • Configures Watchtower to automatically update the Warrior (optional, but recommended).

docker run --detach \
  --name watchtower \
  --restart=on-failure \
  --volume /var/run/docker.sock:/var/run/docker.sock \
  containrrr/watchtower --label-enable --cleanup --interval 3600

docker run --detach \
  --name archiveteam-warrior \
  --label=com.centurylinklabs.watchtower.enable=true \
  --restart=on-failure \
  --publish 8001:8001 \
  atdr.meo.ws/archiveteam/warrior-dockerfile

On Windows (CMD), replace \ with ^ like so:

docker run --detach ^
  --name watchtower ^
  --restart=on-failure ^
  --volume /var/run/docker.sock:/var/run/docker.sock ^
  containrrr/watchtower --label-enable --cleanup --interval 3600

docker run --detach ^
  --name archiveteam-warrior ^
  --label=com.centurylinklabs.watchtower.enable=true ^
  --restart=on-failure ^
  --publish 8001:8001 ^
  atdr.meo.ws/archiveteam/warrior-dockerfile

On Windows (PowerShell), replace \ (in the Linux example) with `.

To reach the web interfaces of multiple Warrior containers, bind a different host port for each additional container by incrementing the host side of --publish. Each container also needs its own --name, because Docker rejects duplicate container names. For example, a second Warrior could be started like so:

docker run --detach \
  --env DOWNLOADER="your name" \
  --env SELECTED_PROJECT="auto" \
  --name archiveteam-warrior-2 \
  --label=com.centurylinklabs.watchtower.enable=true \
  --restart=on-failure \
  --publish 8002:8001 \
  atdr.meo.ws/archiveteam/warrior-dockerfile
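
The port-incrementing pattern can also be scripted. The following is a minimal sketch and not part of this repository: the helper name warrior_cmd and the archiveteam-warrior-N naming scheme are assumptions made for illustration. It only prints each docker run command, so the output can be reviewed before anything is started.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with warrior-dockerfile): print the
# docker run command for the Nth Warrior. Container N is published on host
# port 8000+N and gets a unique name, since Docker rejects duplicates.
warrior_cmd() {
  n="$1"
  # printf reuses the '%s ' format for every argument that follows it.
  printf '%s ' docker run --detach \
    --name "archiveteam-warrior-$n" \
    --label=com.centurylinklabs.watchtower.enable=true \
    --restart=on-failure \
    --publish "$((8000 + $n)):8001"
  printf '%s\n' atdr.meo.ws/archiveteam/warrior-dockerfile
}

# Print the commands for three Warriors (host ports 8001, 8002, 8003).
for i in 1 2 3; do
  warrior_cmd "$i"
done
```

To start the containers rather than just print the commands, the loop output can be piped to sh once it looks right.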

Configuration

Configuration of Warrior can be done in one of three ways:

  • Manually via the web interface.
  • Via environment variables.
  • Or via a configuration file (projects/config.json).

Manually Using the Web Interface

To access the web interface, get the container IP from docker inspect and point your browser to http://IP:8001. If you are running this container on a headless machine, be sure to bind the Docker container's port to a port on that machine (e.g. -p 8001:8001) so that you can access the web interface from your LAN.

You can stop and resume the Warrior with docker stop and docker start.
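
As a rough sketch of those steps, the snippet below looks up the container IP and builds the web interface URL. The function names warrior_ip and warrior_url are made up for this example, and the container name archiveteam-warrior is taken from the docker run commands earlier in this README; adjust it if yours differs.

```shell
#!/bin/sh
# Print the container's IP address on its Docker network. If the container
# is attached to several networks, the addresses are printed back to back.
warrior_ip() {
  docker inspect \
    --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' \
    "$1"
}

# Build the web interface URL from an IP address. The port defaults to 8001
# and can be overridden with a second argument.
warrior_url() {
  printf 'http://%s:%s\n' "$1" "${2:-8001}"
}

# Usage (requires a running container named archiveteam-warrior):
#   warrior_url "$(warrior_ip archiveteam-warrior)"
#   docker stop archiveteam-warrior     # stop the Warrior
#   docker start archiveteam-warrior    # resume it later
```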

Using Environment Variables

If you don't mount a projects/config.json configuration, you can provide seed settings using environment variables. Once a projects/config.json file exists, environment variables will be ignored.

For example, to specify environment variables, modify your docker run command for the Warrior like so:

docker run --detach \
  --env DOWNLOADER="your name" \
  --env SELECTED_PROJECT="auto" \
  --name archiveteam-warrior \
  --label=com.centurylinklabs.watchtower.enable=true \
  --restart=on-failure \
  --publish 8001:8001 \
  atdr.meo.ws/archiveteam/warrior-dockerfile

Configuration Mapping

ENV                  | JSON key             | Example      | Default
---------------------|----------------------|--------------|--------
DOWNLOADER           | downloader           |              |
HTTP_PASSWORD        | http_password        |              |
HTTP_USERNAME        | http_username        |              |
SELECTED_PROJECT     | selected_project     | auto, tumblr |
SHARED_RSYNC_THREADS | shared:rsync_threads |              | 20
WARRIOR_ID           | warrior_id           |              |
CONCURRENT_ITEMS     | concurrent_items     |              | 3
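
The naming pattern in the table above can be expressed in a few lines of shell. This is an illustrative sketch only: env_to_json_key is a hypothetical helper, and it is not a claim about how the image performs the conversion internally. It encodes what the table shows, namely that each JSON key is the lowercase variable name, except SHARED_RSYNC_THREADS, which maps to shared:rsync_threads.

```shell
#!/bin/sh
# Hypothetical helper: map an environment variable name from the table to
# its config.json key.
env_to_json_key() {
  case "$1" in
    # The one irregular entry: a colon replaces the first underscore.
    SHARED_RSYNC_THREADS) printf '%s\n' 'shared:rsync_threads' ;;
    # Every other key is simply the lowercase variable name.
    *) printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' ;;
  esac
}

# Print the mapping for every variable listed in the table.
for name in DOWNLOADER HTTP_PASSWORD HTTP_USERNAME SELECTED_PROJECT \
            SHARED_RSYNC_THREADS WARRIOR_ID CONCURRENT_ITEMS; do
  printf '%s -> %s\n' "$name" "$(env_to_json_key "$name")"
done
```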

Other Ways to Run

Kubernetes

Edit the environment variable DOWNLOADER inside k8s-warrior.yml and set it to your name. This name will be used on the leaderboards.

kubectl create namespace archive
kubectl apply -n archive -f k8s-warrior.yml

If everything works out, you should be able to connect to the IP of any of your Kubernetes nodes on port 30163 to view the web interface.
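
One possible way to find the addresses to try is sketched below. The function names node_ips and node_urls are assumptions for this example, and it assumes the web interface is served over plain HTTP on the node port, as it is on port 8001 in the Docker setup. Depending on your cluster, you may need ExternalIP instead of InternalIP.

```shell
#!/bin/sh
# List the InternalIP of every node (requires kubectl access to the cluster).
node_ips() {
  kubectl get nodes \
    -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'
}

# Turn a list of IP addresses into Warrior URLs on node port 30163.
node_urls() {
  for ip in "$@"; do
    printf 'http://%s:30163\n' "$ip"
  done
}

# Usage (word splitting of the IP list is intentional):
#   node_urls $(node_ips)
#   kubectl get pods -n archive    # check that the Warrior pod is running
```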

You can build the image for other platforms by using docker buildx, e.g.:

docker buildx build -t <yourusername>/archive-team-warrior:latest --platform linux/arm/v7 --push .

Docker Compose

First edit the docker-compose.yml file with any configuration keys (as described above). When configured to your liking, use docker compose to start both Warrior and Watchtower.

cd examples
docker compose up -d
