• Stars
    star
    229
  • Rank 174,666 (Top 4 %)
  • Language
    Lua
  • License
    The Unlicense
  • Created about 4 years ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Archiving Parler.

parler-grab

More information about the archiving project can be found on the ArchiveTeam wiki: Parler

DOES NOT WORK IN THE WARRIOR

Setup instructions

Be sure to replace YOURNICKHERE with the nickname that you want to be shown as, on the tracker. You don't need to register it, just pick a nickname you like.

In most of the below cases, there will be a web interface running at http://localhost:8001/. If you don't know or care what this is, you can just ignore it—otherwise, it gives you a fancy view of what's going on.

If anything goes wrong while running the commands below, please scroll down to the bottom of this page. There's troubleshooting information there.

Running with a warrior

Follow the instructions on the ArchiveTeam wiki for installing the Warrior, and select the "Parler" project in the Warrior interface.

Running without a warrior

To run this outside the warrior, clone this repository, cd into its directory and run:

python3 -m pip install setuptools wheel
python3 -m pip install --upgrade seesaw zstandard requests
./get-wget-lua.sh

then start downloading with:

run-pipeline3 pipeline.py --concurrent 2 YOURNICKHERE

For more options, run:

run-pipeline3 --help

If you don't have root access and/or your version of pip is very old, you can replace "pip install --upgrade seesaw" with:

wget https://raw.github.com/pypa/pip/master/contrib/get-pip.py ; python3 get-pip.py --user ; ~/.local/bin/pip3 install --upgrade --user seesaw

so that pip and seesaw are installed in your home, then run

~/.local/bin/run-pipeline3 pipeline.py --concurrent 2 YOURNICKHERE

Running multiple instances on different IPs

This feature requires seesaw version 0.0.16 or greater. Use pip install --upgrade seesaw to upgrade.

Use the --context-value argument to pass in bind_address=123.4.5.6 (replace the IP address with your own).

Example of running 2 threads, no web interface, and Wget binding of IP address:

run-pipeline3 pipeline.py --concurrent 2 YOURNICKHERE --disable-web-server --context-value bind_address=123.4.5.6

Distribution-specific setup

For Debian/Ubuntu:

Package libzstd-dev version 1.4.4 is required which is currently available from buster-backports.

adduser --system --group --shell /bin/bash archiveteam
echo deb http://deb.debian.org/debian buster-backports main contrib > /etc/apt/sources.list.d/backports.list
apt-get update \
&& apt-get install -y git-core libgnutls-dev lua5.1 liblua5.1-0 liblua5.1-0-dev screen bzip2 zlib1g-dev flex autoconf autopoint texinfo gperf lua-socket rsync automake pkg-config python3-dev python3-pip build-essential \
&& apt-get -t buster-backports install zstd libzstd-dev libzstd1
python3 -m pip install setuptools wheel
python3 -m pip install --upgrade seesaw zstandard requests
su -c "cd /home/archiveteam; git clone https://github.com/ArchiveTeam/parler-grab.git; cd parler-grab; ./get-wget-lua.sh" archiveteam
screen su -c "cd /home/archiveteam/parler-grab/; run-pipeline3 pipeline.py --concurrent 2 --address '127.0.0.1' YOURNICKHERE" archiveteam
[... ctrl+A D to detach ...]

In Debian Jessie, Ubuntu 18.04 Bionic and above, the libgnutls-dev package was renamed to libgnutls28-dev. So, you need to do the following instead:

adduser --system --group --shell /bin/bash archiveteam
echo deb http://deb.debian.org/debian buster-backports main contrib > /etc/apt/sources.list.d/backports.list
apt-get update \
&& apt-get install -y git-core libgnutls28-dev lua5.1 liblua5.1-0 liblua5.1-0-dev screen bzip2 zlib1g-dev flex autoconf autopoint texinfo gperf lua-socket rsync automake pkg-config python3-dev python3-pip build-essential \
&& apt-get -t buster-backports install zstd libzstd-dev libzstd1
[... pretty much the same as above ...]

Wget-lua is also available on ArchiveTeam's PPA for Ubuntu.

For CentOS:

Ensure that you have the CentOS equivalent of bzip2 installed as well. You will need the EPEL repository to be enabled.

yum -y groupinstall "Development Tools"
yum -y install gnutls-devel lua-devel python-pip zlib-devel zstd libzstd-devel git-core gperf lua-socket luarocks texinfo git rsync gettext-devel
pip install --upgrade seesaw
[... pretty much the same as above ...]

Tested with EL7 repositories.

For Fedora:

The same as CentOS but with "dnf" instead of "yum". Did not successfully test compiling, so far.

For openSUSE:

zypper install liblua5_1 lua51 lua51-devel screen python-pip libgnutls-devel bzip2 python-devel gcc make
pip install --upgrade seesaw
[... pretty much the same as above ...]

For OS X:

You need Homebrew. Ensure that you have the OS X equivalent of bzip2 installed as well.

brew install python lua gnutls
pip install --upgrade seesaw
[... pretty much the same as above ...]

There is a known issue with some packaged versions of rsync. If you get errors during the upload stage, parler-grab will not work with your rsync version.

This supposedly fixes it:

alias rsync=/usr/local/bin/rsync

For Arch Linux:

Ensure that you have the Arch equivalent of bzip2 installed as well.

  1. Make sure you have python2-pip installed.
  2. Install the wget-lua package from the AUR.
  3. Run pip2 install --upgrade seesaw.
  4. Modify the run-pipeline script in seesaw to point at #!/usr/bin/python2 instead of #!/usr/bin/python.
  5. useradd --system --group users --shell /bin/bash --create-home archiveteam
  6. screen su -c "cd /home/archiveteam/parler-grab/; run-pipeline pipeline.py --concurrent 2 --address '127.0.0.1' YOURNICKHERE" archiveteam

For Alpine Linux:

apk add lua5.1 git python bzip2 bash rsync gcc libc-dev lua5.1-dev zlib-dev gnutls-dev autoconf flex make
python -m ensurepip
pip install -U seesaw
git clone https://github.com/ArchiveTeam/parler-grab
cd parler-grab; ./get-wget-lua.sh
run-pipeline pipeline.py --concurrent 2 --address '127.0.0.1' YOURNICKHERE

For FreeBSD:

Honestly, I have no idea. ./get-wget-lua.sh supposedly doesn't work due to differences in the tar that ships with FreeBSD. Another problem is the apparent absence of Lua 5.1 development headers. If you figure this out, please do let us know on IRC (irc.hackint.org #archiveteam).

Troubleshooting

Broken? These are some of the possible solutions:

wget-lua was not successfully built

If you get errors about wget.pod or something similar, the documentation failed to compile - wget-lua, however, compiled fine. Try this:

cd get-wget-lua.tmp
mv src/wget ../wget-lua
cd ..

The get-wget-lua.tmp name may be inaccurate. If you have a folder with a similar but different name, use that instead and please let us know on IRC what folder name you had!

Optionally, if you know what you're doing, you may want to use wgetpod.patch.

Problem with gnutls or openssl during get-wget-lua

Please ensure that gnutls-dev(el) and openssl-dev(el) are installed.

ImportError: No module named seesaw

If you're sure that you followed the steps to install seesaw, permissions on your module directory may be set incorrectly. Try the following:

chmod o+rX -R /usr/local/lib/python2.7/dist-packages

run-pipeline: command not found

Install seesaw using pip2 instead of pip.

pip2 install seesaw

Issues in the code

If you notice a bug and want to file a bug report, please use the GitHub issues tracker.

Are you a developer? Help write code for us! Look at our developer documentation for details.

Other problems

Have an issue not listed here? Join us on IRC and ask! We can be found at hackint IRC #neparlepas.

More Repositories

1

grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Python
1,330
star
2

wpull

Wget-compatible web downloader and crawler.
HTML
551
star
3

ArchiveBot

ArchiveBot, an IRC bot for archiving websites
Python
325
star
4

warrior-dockerfile

A Dockerfile for the ArchiveTeam Warrior
Dockerfile
244
star
5

Ubuntu-Warrior

Scripts to build and boot warrior virtual machine containing Docker
Shell
114
star
6

wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
C
91
star
7

IA.BAK

We back up a lot of stuff from around the web; now it's time to back up the Internet Archive, just in case.
Shell
87
star
8

seesaw-kit

Making a reusable toolkit for writing seesaw scripts
Python
67
star
9

terroroftinytown

URLTeam's second generation of URL shortener archiving tools
Python
63
star
10

reddit-grab

Grabbing everything from reddit.
Lua
60
star
11

NewsGrabber

Grabbing all news.
Python
59
star
12

yahooanswers-grab

Saving all questions and answers from Yahoo! Answers.
Lua
48
star
13

tumblr-grab

Archiving all to-be-deleted NSFW tumblr blogs.
Lua
47
star
14

imgur-grab

Archiving imgur.
Lua
42
star
15

universal-tracker

A configurable, reusable tracker with dashboard
JavaScript
30
star
16

googleplus-grab

Archiving Google+.
Lua
26
star
17

terroroftinytown-client-grab

The Seesaw pipeline grab script for the URLTeam (terroroftinytown) project
Python
25
star
18

ludios_wpull

wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved
HTML
25
star
19

warrior-code2

Boot scripts for the ArchiveTeam Warrior 2
Shell
22
star
20

ftp-gov-grab

Archiving government FTPs.
Python
21
star
21

warrior-code

Shell
19
star
22

WebArchiver

Decentralized web archiving
Python
19
star
23

soundcloud-grab

Lua
18
star
24

500px-grab

Archiving https://500px.com/creativecommons
Lua
17
star
25

tinyback

A tiny web scraper
Python
17
star
26

gamemaker-sandbox-items

Gamemaker Sandbox Tracker items
15
star
27

youtube-grab

Archiving all metadata from YouTube (everything except videos themselves due to size)
Lua
14
star
28

youtube-dislikes-grab

Archiving general youtube video metadata through innertube for dislikes removal.
Lua
14
star
29

youtube-dislikes-items

Managing items for youtube-dislikes-grab.
11
star
30

VideoBot

Specialised bot for periodical grabs and video/audio/etc. webpage scrapes.
Python
11
star
31

urlteam-stuff

Urlteam website, code, ... also, PONIES
C
10
star
32

urls-grab

Archiving URLs (outlinks) from a variety of sources.
Lua
9
star
33

NewsGrabber-Warrior

Python
8
star
34

google-sites-grab

Archiving Google Sites Classic.
Lua
8
star
35

flickr-grab

Grabbing Flickr images.
Lua
7
star
36

pastebin-grab

Archiving pastebin
Lua
7
star
37

youtube-items

Managing items for youtube-grab
7
star
38

wget-lua-forum-scripts

Downloading forums posts with Wget+Lua
Lua
6
star
39

greader-grab

http://www.archiveteam.org/index.php?title=Google_Reader
Python
6
star
40

ftp-grab

Save all FTP sites!
Python
6
star
41

mediafire-items

Managing items for mediafire-grab.
Roff
6
star
42

citeseerxpdf-grab

Grabbing all sources of CiteSeerX.
Lua
6
star
43

twitchtv-grab

Grabbing twitch.tv videos
Python
6
star
44

mobileme-grab

Downloading MobileMe
Shell
6
star
45

warrior-preseed

Constructing a new warrior VM
Shell
5
star
46

ftp-nab

Thinger to download FTP sites
Shell
5
star
47

coursera-grab

Saving courses from Coursera.
Lua
5
star
48

tinyarchive

Software behind tracker.tinyarchive.org - Warning: Very hacky code
Python
5
star
49

formspring-grab

Downloading Formspring
Lua
5
star
50

yahoomessages-grab

Archiving Yahoo Messages
Python
5
star
51

splinder-grab

Python
5
star
52

telegram-grab

Archiving public telegram messages.
Lua
5
star
53

ffnet-grab

Fanfictioning
Python
5
star
54

archiveteam-megawarc-factory

Some scripts to process ArchiveTeam uploads
Shell
5
star
55

roblox-grab

Archiving roblox forums.
Lua
4
star
56

gamemaker-sandbox-grab

Grabbing sandbox.yoyogames.com
Python
4
star
57

justintv-grab

Grabbing as much of justin.tv's archives as possible
Python
4
star
58

sourceforge-grab

Archiving SourceForge.
Lua
4
star
59

grab-base-df

Base Dockerfile for warrior project grab scripts
Dockerfile
4
star
60

wikis-grab

Grabbing all wikis.
Python
4
star
61

liveleak-grab

Archiving liveleak.com
Lua
4
star
62

tencent-weibo-grab

Archiving Tencent Weibo (t.qq.com), 腾讯微博
Lua
4
star
63

imdb-grab

Archiving IMDb.
Lua
4
star
64

reddit-items

Managing items for reddit-grab.
4
star
65

flashdomains-grab

Copy of domains-grab for Flash sites.
Lua
4
star
66

ftp-queue

Create queue items for ftp-grab.
NewLisp
4
star
67

tumblr-grab-test

Archiving Tumblr blogs (an ArchiveTeam Warrior testing project)
Python
4
star
68

heroku-buildpack-archiveteam-warrior

Heroku buildpack with the Archive Team Warrior
Shell
4
star
69

mobileme-index

An index of the MobileMe downloads
Ruby
3
star
70

twitchtv-items

Managing twitch.tv items.
Python
3
star
71

Universal-tracker-2

A better tracker with more features for ArchiveTeam
Python
3
star
72

eroshare-grab

Lua
3
star
73

panoramio-grab

Grabbing everything from panoramio
Lua
3
star
74

blingee-grab

Saving all images and content from Blingee.
Lua
3
star
75

vidme-grab

Archiving all videos from vid.me.
Python
3
star
76

livejournal-discovery

Discovering items for livejournal-grab.
Python
3
star
77

github-grab

Archiving GitHub
Lua
3
star
78

mediafire-grab

Archiving mediafire.com URLs.
Lua
3
star
79

furaffinity-grab

Grabbing all images and other stuff from Fur Affinity.
Python
3
star
80

yahoogroups-grab

Archiving Yahoo! Groups.
Lua
3
star
81

webs-grab

Archiving webs.com
Lua
3
star
82

puush-grab

Python
3
star
83

twitchtv-discovery-grab

Discovering twitch.tv content
Python
3
star
84

vlive-grab

Archiving vlive.tv.
Lua
3
star
85

NewsGrabber-Services

The services for NewsGrabber.
Python
3
star
86

parler-items

Managing items for parler-grab.
3
star
87

furaffinity-items

Python
3
star
88

standalone-readme-template

Readme instructions template for manually running pipeline grab scripts outside the warrior
3
star
89

ArchiveBot-agents

Site-specific agents that work with ArchiveBot
Ruby
3
star
90

ua-grab

Archiving all of .ua.
Lua
2
star
91

googlecode-grab

Saving the full Google Code site!
Lua
2
star
92

pixiv-2-grab

Archiving pixiv2 images
Lua
2
star
93

miiverse-grab

Archiving miiverse
Lua
2
star
94

orkut-grab

Download all of Orkut
Lua
2
star
95

dpreview-grab

Archiving DPReview
Lua
2
star
96

bottle

A statistics monitor for the listerine download project @ Archive Team. Massive hack, no tests.
Ruby
2
star
97

halo-new-grab

Archiving Halo (round 2)
Lua
2
star
98

googleplus-items

Managing items for googleplus-grab and googleplus2-grab.
2
star
99

furaffinity-discovery

Python
2
star
100

scrapy-thingy

Archiving Thingiverse
Shell
2
star