  • Stars: 135
  • Rank 269,240 (Top 6 %)
  • Language: Python
  • Created almost 11 years ago
  • Updated about 5 years ago

Repository Details

Python-based Imageboard (4chan) complete thread archiver.

BASC Archiver

The BASC Archiver is a Python library (packaged with the thread-archiver script) used to archive imageboard threads. It uses the 4chan API with the py4chan wrapper. Developers are free to use the BASC-Archiver library for some interesting third-party applications, as it is licensed under the LGPLv3.
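
As a rough illustration of the kind of data the 4chan API exposes, the sketch below derives full-size image and thumbnail URLs from a thread's JSON. It uses only the standard library, not py4chan or the BASC-Archiver internals. The function name `file_urls` and the sample data are hypothetical, and the `i.4cdn.org` URL layout is assumed from the public 4chan API notes.

```python
import json

def file_urls(board, thread_json, https=True):
    """Return (image_url, thumbnail_url) pairs for every post with a file."""
    scheme = "https" if https else "http"
    pairs = []
    for post in thread_json.get("posts", []):
        # Posts without an attachment have no "tim" field.
        if "tim" not in post:
            continue
        base = f"{scheme}://i.4cdn.org/{board}/{post['tim']}"
        pairs.append((base + post["ext"], base + "s.jpg"))
    return pairs

# Hypothetical sample shaped like a 4chan API thread response.
sample = json.loads("""
{"posts": [
  {"no": 12345, "com": "first post", "tim": 1400000000000, "ext": ".png"},
  {"no": 12346, "com": "text-only reply"}
]}
""")

for image, thumb in file_urls("etc", sample):
    print(image, thumb)
```

Only the first post in the sample carries a file, so a single pair is printed; the text-only reply is skipped.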

It comes with a command-line tool for archiving threads, the thread-archiver; a GUI is under development.

The thread-archiver is designed to archive all content from a 4chan thread:

  • Download all images and/or thumbnails in given threads.
  • Download all child threads (threads referred to in a post).
  • Download a JSON dump of thread comments using the 4chan API.
  • Download the HTML page.
  • Convert links in HTML to use the downloaded images.
  • Download CSS/JS and convert HTML to use them.
  • Keep downloading until 404 (with a user-set delay).
  • Can be restarted at any time.
  • Threaded downloading to download multiple files at the same time.
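
The "keep downloading until 404" behaviour in the list above can be sketched as a simple polling loop. This is a minimal illustration, not the archiver's actual code: `watch_thread`, `fetch`, and `on_update` are hypothetical names, and `fetch` is assumed to return a `(status_code, data)` tuple.

```python
import time

def watch_thread(fetch, on_update, delay=90.0, run_once=False, sleep=time.sleep):
    """Call fetch() repeatedly until it reports that the thread is gone.

    fetch() is a hypothetical callable returning (status_code, data).
    Returns the number of successful fetches.
    """
    successes = 0
    while True:
        status, data = fetch()
        if status == 404:
            # The thread has been pruned or deleted: stop watching.
            break
        if status == 200:
            on_update(data)
            successes += 1
        if run_once:
            break
        # Any other status is treated as temporary; wait and retry.
        sleep(delay)
    return successes

# Simulated thread that answers twice and then returns 404.
responses = iter([(200, "v1"), (200, "v2"), (404, None)])
seen = []
count = watch_thread(lambda: next(responses), seen.append,
                     delay=0, sleep=lambda seconds: None)
print(count, seen)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting, and `run_once=True` mirrors the idea of taking a single snapshot and exiting.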

The thread-archiver replaces the typical “Right-click Save As, Web Page Complete” action, which does not save full-sized images or JSON. It works as a guerilla, static HTML alternative to Fuuka.
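
To make the idea of a static HTML copy concrete, here is a small sketch of rewriting remote image links so a saved page points at local files. It is a regex-based approximation under stated assumptions: the function name `localize_image_links`, the `images` directory, and the `i.4cdn.org` link format are illustrative and not taken from the archiver itself.

```python
import re

def localize_image_links(html, board, image_dir="images"):
    """Point i.4cdn.org file links for `board` at a local directory."""
    pattern = re.compile(
        r'(?:https?:)?//i\.4cdn\.org/' + re.escape(board) + r'/(\d+s?\.\w+)'
    )
    # Keep only the file name and prefix it with the local directory.
    return pattern.sub(lambda m: f"{image_dir}/{m.group(1)}", html)

page = ('<a href="//i.4cdn.org/etc/1400000000000.png">'
        '<img src="//i.4cdn.org/etc/1400000000000s.jpg"></a>')
print(localize_image_links(page, "etc"))
```

A real implementation would more likely use an HTML parser; the regular expression is used here only to keep the sketch short.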

Usage

Usage:
  thread-archiver <url>... [options]
  thread-archiver -h | --help
  thread-archiver -V | --version

Options:
  --path=<string>                Path to folder where archives will be saved [default: ./archive]
  --runonce                      Downloads the thread as it is presently, then exits
  --thread-check-delay=<float>   Delay between checks of the same thread [default: 90]
  --delay=<float>                Delay between file downloads [default: 0]
  --poll-delay=<float>           Delay between thread checks [default: 20]
  --dl-threads-per-site=<int>    Download threads to use per site [default: 5]
  --dl-thread-wait=<float>       Seconds to wait between downloads on each thread [default: 0.1]
  --nothumbs                     Don't download thumbnails
  --thumbsonly                   Download thumbnails, no images
  --nojs                         Don't download javascript
  --nocss                        Don't download css
  --ssl                          Download using HTTPS
  --follow-children              Follow threads linked in downloaded threads
  --follow-to-other-boards       Follow linked threads, even if from other boards
  --silent                       Suppresses mundane printouts, prints what's important
  -v --verbose                   Printout more information than normal
  -h --help                      Show help
  -V --version                   Show version
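
The `--dl-threads-per-site` and `--dl-thread-wait` options describe a pool of download workers that each pause between files. The sketch below shows one way such a pool could look using the standard library. It is an assumption-laden illustration, not the archiver's implementation: `download_all` and `fetch` are hypothetical names, and the defaults simply echo the option defaults listed above.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def download_all(urls, fetch, workers=5, wait=0.1):
    """Fetch every URL using a small pool of worker threads.

    fetch(url) is a hypothetical callable that downloads one file and
    returns its content. Results come back in the same order as `urls`.
    """
    def task(url):
        result = fetch(url)
        time.sleep(wait)  # pause this worker before it takes the next file
        return result

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, urls))

# Stand-in fetch function so the sketch runs without network access.
def fake_fetch(url):
    return f"contents of {url}"

print(download_all(["a.png", "b.png", "c.png"], fake_fetch, workers=2, wait=0))
```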

Example

thread-archiver http://boards.4chan.org/b/res/423861837 --delay 5 --thumbsonly
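
Thread URLs in this README appear in two forms, the `/res/` form shown above and the `/thread/` form used in the installation steps. The following sketch extracts the board name and thread number from either form. The pattern and the function name `parse_thread_url` are illustrative assumptions and may not match how the archiver itself parses URLs.

```python
import re

THREAD_URL = re.compile(
    r'^https?://boards\.4chan\.org/(?P<board>\w+)/(?:res|thread)/(?P<thread>\d+)'
)

def parse_thread_url(url):
    """Return (board, thread_id), or raise ValueError for other URLs."""
    match = THREAD_URL.match(url)
    if match is None:
        raise ValueError(f"not a recognised thread URL: {url}")
    return match.group("board"), int(match.group("thread"))

print(parse_thread_url("http://boards.4chan.org/b/res/423861837"))
print(parse_thread_url("http://boards.4chan.org/etc/thread/12345"))
```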

Installation

The BASC-Archiver is designed for Python 3.x, and can be installed on Windows, Linux, or Mac OS X.

(Python 2 has intractable ASCII-to-Unicode conversion errors, whereas Python 3.x stores all strings as Unicode, so we strongly recommend using 3.x.)

New stable releases can be found on our Releases page, or installed with the PyPI package BASC-Archiver.

Linux and OSX

  1. Make sure you have Python 3 and pip3 installed. On Debian/Ubuntu or Fedora/Red Hat/CentOS, install the packages python3 and python3-pip. For Mac OS X, see the Mac OS X Installation Guide.
  2. Run pip3 install basc-archiver
    • Linux users must run this command as root, or prefix the command with sudo.
  3. Run thread-archiver http://boards.4chan.org/etc/thread/12345

Threads will be saved in ./archive, but you can change that by supplying a directory with the --path= argument.
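
When driving thread-archiver from another Python script, the command line can be assembled as a list of arguments. The helper below is a hypothetical convenience wrapper (`build_command` is not part of the package); it only uses flags that appear in the usage text above.

```python
import shlex

def build_command(urls, path="./archive", run_once=False, delay=None, thumbs_only=False):
    """Assemble a thread-archiver argument list from keyword options."""
    args = ["thread-archiver", *urls, f"--path={path}"]
    if run_once:
        args.append("--runonce")
    if delay is not None:
        args.append(f"--delay={delay}")
    if thumbs_only:
        args.append("--thumbsonly")
    return args

cmd = build_command(["http://boards.4chan.org/etc/thread/12345"],
                    path="/tmp/threads", run_once=True)
print(shlex.join(cmd))
```

Once thread-archiver is installed, the resulting list could be handed to `subprocess.run(cmd, check=True)`.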

Windows

  1. Download the latest release from our Releases page.
  2. Open up a command prompt window (cmd.exe), and move to the directory with thread-archiver.exe
  3. Run thread-archiver.exe http://boards.4chan.org/etc/thread/12345

Using the Windows version will become simpler once we finish writing the GUI.

Android (CLI)

Note: This is a temporary solution until we put together some kind of Android GUI app.

Thanks to the QPython interpreter, you can effortlessly run the BASC-Archiver on your Android phone.

  1. Install the QPython app from Google Play.

  2. Open the QPython app, and swipe left to reach the menu.

  3. Tap Package Index. Then scroll down and tap Pip Console.

  4. Run the following commands (after starting the pip_install.py script):

    pip install requests
    pip install basc-archiver
    

Now you can just open QPython, tap My QPython, tap pip_console, and run the following command with your own thread URL:

thread-archiver --path=/sdcard/ http://boards.4chan.org/qa/thread/23839

To run the script in the background, press the back button, and tap OK at the Run in Background prompt. You can stop the script anytime using Vol Down + C.

  • Note: On Android (CLI), it is important to set the path to /sdcard/, so the thread dump can be accessed from the /sdcard/archives/4chan/ folder.
  • Note: To update the BASC-Archiver on Android (CLI), you must open QPython, press the 3-dot menu button, scroll down and tap Reset Private Space. Then just reinstall the BASC-Archiver.

License

Bibliotheca Anonoma Imageboard Thread Archiver (BASC Archiver)

Copyright (C) 2014 Antonizoon Overtwater, Daniel Oaks. Licensed under the GNU Lesser General Public License v3.

More Repositories

1

bibanon

The Bibliotheca Anonoma: A wikified library of the internet's treasures. Researching Something Awful, 2channel, 4chan, and other imageboard/textboard communities.
HTML
1,325
star
2

tubeup

Use yt-dlp to download video and upload to the Internet Archive with metadata.
Python
407
star
3

Coreboot-ThinkPads

A (formerly) comprehensive guide to installing Coreboot on various laptops.
278
star
4

android-development-codex

A Wiki containing guides to modding many different consumer electronic devices.
HTML
99
star
5

BASC-py4chan

Python wrapper for 4chan API. The BA's vastly improved fork of Edgeworth's original.
Python
55
star
6

webcache-scraper

The Bibliotheca Anonoma's own Bing Cache and Google Cache scraper scripts. Unlike most of the other ones you've seen, these actually work.
Python
26
star
7

neofuuka-scraper

Asagi-like yotsuba scraper. WARNING: Currently has issues because of a recent 4chan change; contact Bakugo on the bibanon chat if you need to set up a new instance.
Python
19
star
8

itabashi

Itabashi (板橋) is a bridging bot that syncs messages between a Discord and an IRC channel.
Python
18
star
9

ayase

Ayase is a 4chan Archiver API middleware and HTML frontend based on Python, as a replacement for FoolFuuka, supporting both Asagi and Ayase SQL Schema compatible scrapers.
Python
17
star
10

eve

Asagi replacement written in Python
Python
16
star
11

BASC-eBookGenerator

Create EPUB and MOBI ebooks from Markdown pages, with all necessary pages, images, fonts, and CSS stylesheets kept in a source code folder.
Shell
15
star
12

py8chan

Python wrapper for the 8chan API, based on BASC-py4chan.
Python
14
star
13

prntscr-scraper

A web scraper designed to link prntscr.com URLs with their associated Imgur images, and archive them.
Python
13
star
14

everything-shii-knows

An archive of Shii's Wiki (a major source for and the inspiration for the Bibliotheca Anonoma), which was unceremoniously deleted by the man himself due to personal concerns.
HTML
12
star
15

4chan.doc

A decompiled version of ThrustVect's impressive 4chan.doc report, to Markdown and Mediawiki. Meant for eventual integration into the 4chan Chronicle (Wikibook).
9
star
16

mitsuba

4chan archiver written in Rust
JavaScript
8
star
17

a-tsundere-christmas-carol

A Tsundere Christmas Carol, now in Visual Novel Format. Archived for future anons by the Bibliotheca Anonoma.
Ren'Py
8
star
18

world4ch

(under construction) A publicly viewable archive of 4chan's old textboards.
JavaScript
7
star
19

pyvichan

Python wrapper for the vichan API, based on BASC-py4chan. Not all features have been tested yet, but there's enough to browse and archive a thread.
Python
6
star
20

py420chan

Python wrapper for the 420chan API, based on BASC-py4chan.
Python
5
star
21

macrochan-scraper

A scraper designed to archive Macrochan.org's 45175+ image archive.
HTML
4
star
22

PB_Spade

Photobucket Archiver. Spiritual successor to PB_Shovel
JavaScript
4
star
23

assorted-archival-scripts

An assortment of bash/python scripts that make it easy to archive data or upload data to the Internet Archive.
Python
4
star
24

Tanasinn-Kopipe

A git backup of all Kopipe (copypasta) from Tanasinn.info.
4
star
25

docker-swfdec-thumbnailer

swfdec-thumbnailer: uses the swfdec program to generate thumbnails for swf (Flash) files. Uses an Arch Linux Docker image.
Shell
3
star
26

asagi_schema

The Asagi schema standard versioned as per the histories of various FoolFuuka/Asagi SQL dumps. Notice that no major SQL changes occurred past https://github.com/eksopl/asagi (tag 1.0.0), only some tweaks to Mysql/trigger.sql (tag 1.3.0).
TSQL
3
star
27

asagi_archive_image_exporter

Tool to dump a range of images from an asagi/foolfuuka archive
Python
3
star
28

chan.arc

Imageboard Archive File Format Specification
Python
3
star
29

archives

BASC Website Archives. To preserve old sites for public viewing.
HTML
3
star
30

vyrd

Vyrd was the personal website of one of the great contemporary 4chan Archivists. His website was full of links to dying 4chan pages, publicly viewable versions of the Penfifteen and Yotsuba Society thread archives, and lots of info that is otherwise no longer extant.
HTML
3
star
31

Neglected-Mario-Characters

An archive of Neglected Mario Characters, the first video game sprite-based webcomic, and the most memorable. Based on MetalMan88's lost revision, but fixed missing comix
3
star
32

dagobah-scraper

A paginated gallery scraper designed to archive flash files and metadata from Dagobah.
HTML
2
star
33

genmaicha

An IRC frontend to senchado and grab-site. Still in the pre-alpha/planning stage.
Python
2
star
34

BASC-WARC

Library for creating and managing WARC files. Currently in planning / pre-alpha stage.
Python
2
star
35

pyFuuka

A plan for a FoolFuuka API wrapper for Python. Because even the archivers themselves need to be archived (as seen from archive.moe, which lost 50% of the thumbnails!)
2
star
36

roverfetcher

Lua
2
star
37

Twitch-Plays

A wiki and webpage detailing the bewilderingly amazing hivemind growing around the "Twitch Plays" streams.
2
star
38

bibanon.github.io

Bibliotheca Anonoma Website
HTML
1
star
39

scraping-everyboty

A Node.js tool to scrape Everyboty's API
JavaScript
1
star
40

bing-cache-scraper

A collection of node.js scripts for scraping Bing
JavaScript
1
star
41

asagi_archive_auto_failover

Script to detect when an asagi / foolfuuka archive breaks
Python
1
star
42

eientei

An imageboard HTML/CSS template as used in eientei.xyz, specifically tuned for 4chan archives. Comes with a mustache-based template variant.
CSS
1
star