  • Stars: 154
  • Rank: 240,586 (Top 5 %)
  • Language: Python
  • License: MIT License
  • Created over 10 years ago
  • Updated 2 months ago


Repository Details

Turn a story on certain websites into an ebook for convenient reading

Leech

Let's say you want to read some sort of fiction. You're a fan of it, perhaps. But mobile websites are kind of non-ideal, so you'd like a proper ebook made from whatever you're reading.

Setup

You need Python 3.7+ and poetry.

My recommended setup process is:

$ pip install poetry
$ poetry install
$ poetry shell

...adjust as needed. Just make sure the dependencies from pyproject.toml get installed somehow.

Usage

Basic

$ python3 leech.py [[URL]]

A new file will appear named Title of the Story.epub.

This is equivalent to the slightly longer

$ python3 leech.py download [[URL]]

Flushing the cache

$ python3 leech.py flush

If you want to put it on a Kindle you'll have to convert it. I'd recommend Calibre, though you could also try using kindlegen directly.

Supports

  • Fanfiction.net
  • FictionPress
  • ArchiveOfOurOwn
    • Yes, it has its own built-in EPUB export, but the formatting is horrible
  • Various XenForo-based sites: SpaceBattles and SufficientVelocity, most notably
  • RoyalRoad
  • Fiction.live (Anonkun)
  • DeviantArt galleries/collections
  • Sta.sh
  • Completely arbitrary sites, with a bit more work (see below)

Configuration

A very small amount of configuration is possible by creating a file called leech.json in the project directory. Currently you can define login information for sites that support it, and some options for book covers.

Example:

{
    "logins": {
        "QuestionableQuesting": ["username", "password"]
    },
    "cover": {
        "fontname": "Comic Sans MS",
        "fontsize": 30,
        "bgcolor": [20, 120, 20],
        "textcolor": [180, 20, 180],
        "cover_url": "https://website.com/image.png"
    },
    "output_dir": "/tmp/ebooks",
    "site_options": {
        "RoyalRoad": {
            "output_dir": "/tmp/litrpg_isekai_trash"
        }
    }
}
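As a rough illustration of how the nested options above could be read, here is a minimal sketch in Python. It assumes that a per-site output_dir under site_options takes precedence over the top-level output_dir; that precedence, and the helper names load_config and resolve_output_dir, are assumptions made for this example and not leech's actual code.

```python
import json
from pathlib import Path


def load_config(path="leech.json"):
    """Return the parsed config, or an empty dict if the file is absent."""
    config_path = Path(path)
    if not config_path.exists():
        return {}
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)


def resolve_output_dir(config, site_name, default="."):
    """Pick the per-site output_dir if set, else the global one, else default.

    Assumption: site-specific settings win over top-level settings.
    """
    site_options = config.get("site_options", {}).get(site_name, {})
    return site_options.get("output_dir", config.get("output_dir", default))


if __name__ == "__main__":
    config = load_config()
    # "RoyalRoad" is the site key used in the example config above.
    print(resolve_output_dir(config, "RoyalRoad"))
```

With the example config, this sketch would send RoyalRoad stories to /tmp/litrpg_isekai_trash and everything else to /tmp/ebooks.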

Arbitrary Sites

If you want to just download a one-off story from a site, you can create a definition file to describe it. This requires investigation and understanding of things like CSS selectors, which may take some trial and error.

Example practical.json:

{
    "url": "https://practicalguidetoevil.wordpress.com/table-of-contents/",
    "title": "A Practical Guide To Evil: Book 1",
    "author": "erraticerrata",
    "chapter_selector": "#main .entry-content > ul:nth-of-type(1) > li > a",
    "content_selector": "#main .entry-content",
    "filter_selector": ".sharedaddy, .wpcnt, style",
    "cover_url": "https://gitlab.com/Mikescher2/A-Practical-Guide-To-Evil-Lyx/raw/master/APGTE_1/APGTE_front.png"
}

Run as:

$ ./leech.py practical.json

This tells leech to load url, follow the links matched by chapter_selector, extract the content of each linked page using content_selector, and remove anything within that content that matches filter_selector. Optionally, cover_url replaces the default cover with an image of your choice.

If chapter_selector isn't given, it'll create a single-chapter book by applying content_selector to url.
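Since getting a definition file right can take some trial and error, a small pre-flight check may save a few failed runs. The sketch below is not part of leech. It assumes that url and content_selector are needed in every mode (both appear in every example in this README), and the names REQUIRED_KEYS, check_definition and check_definition_file are hypothetical.

```python
import json

# Assumption: these two keys are needed whichever way leech finds chapters.
REQUIRED_KEYS = ("url", "content_selector")


def check_definition(definition):
    """Return a list of problems; an empty list means nothing obvious is wrong."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in definition:
            problems.append(f"missing required key: {key}")
        elif not isinstance(definition[key], str) or not definition[key].strip():
            problems.append(f"{key} must be a non-empty string")
    return problems


def check_definition_file(path):
    """Parse a definition file and check it; JSON syntax errors are reported too."""
    try:
        with open(path, encoding="utf-8") as f:
            definition = json.load(f)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    if not isinstance(definition, dict):
        return ["top level must be a JSON object"]
    return check_definition(definition)


if __name__ == "__main__":
    for problem in check_definition_file("practical.json"):
        print(problem)
```

This only catches structural mistakes such as a missing key or an unescaped quote inside a selector. Whether the selectors actually match anything on the page still has to be checked by running leech.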

This is a fairly viable way to extract a story from, say, a random WordPress installation with a convenient table of contents. It's relatively likely to get you at least most of the way to the ebook you want, with maybe some manual editing needed.

A more advanced example with JSON would be:

{
    "url": "https://practicalguidetoevil.wordpress.com/2015/03/25/prologue/",
    "title": "A Practical Guide To Evil: Book 1",
    "author": "erraticerrata",
    "content_selector": "#main .entry-wrapper",
    "content_title_selector": "h1.entry-title",
    "content_text_selector": ".entry-content",
    "filter_selector": ".sharedaddy, .wpcnt, style",
    "next_selector": "a[rel=\"next\"]:not([href*=\"prologue\"])",
    "cover_url": "https://gitlab.com/Mikescher2/A-Practical-Guide-To-Evil-Lyx/raw/master/APGTE_1/APGTE_front.png"
}

Because there's no chapter_selector here, leech starts at url and keeps following whatever link next_selector matches until no such link is found. This example also shows more advanced metadata acquisition: content_title_selector and content_text_selector are used to find specific elements within the content.

If multiple matches for content_selector are found, leech will assume multiple chapters are present on one page, and will handle that. If you find a story that you want on a site which has all the chapters in the right order and next-page links, this is a notably efficient way to download it. See examples/dungeonkeeperami.json for this being used.
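The next-link behavior described above can be pictured as a simple loop. This is a minimal sketch of the idea and not leech's implementation: fetch is a hypothetical callable that returns the chapters found on a page plus the URL of the next page (or None), and the guard against revisiting a page and the max_pages limit are assumptions added here for safety.

```python
def follow_next_links(start_url, fetch, max_pages=1000):
    """Collect chapters by walking next-page links from start_url.

    fetch(url) is a hypothetical callable returning (chapters, next_url):
    chapters is a list, since one page may hold several chapters, and
    next_url is None when no next link is found on the page.
    """
    chapters = []
    seen = set()
    url = start_url
    # Stop when there is no next link, when a link leads back to a page
    # already visited, or when the page limit is reached.
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page_chapters, url = fetch(url)
        chapters.extend(page_chapters)
    return chapters


if __name__ == "__main__":
    # An in-memory stand-in for a site: page "c" links back to "a".
    fake_site = {
        "a": (["Chapter 1"], "b"),
        "b": (["Chapter 2", "Chapter 3"], "c"),
        "c": (["Chapter 4"], "a"),
    }
    print(follow_next_links("a", fake_site.__getitem__))
```

In the fake site above, the second page contributes two chapters, and the walk stops when the third page points back to the first.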

If you need more advanced behavior, consider looking at...

Adding new site handlers

To add support for a new site, create a file in the sites directory that implements the Site interface. Take a look at ao3.py for a minimal example of what you have to do.

Docker

You can build the project's Docker container like this:

docker build . -t kemayo/leech:snapshot

The container's entrypoint runs leech directly and sets the current working directory to /work, so you can mount any directory there:

docker run -it --rm -v "${DIR}":/work kemayo/leech:snapshot download [[URL]]

Contributing

If you submit a pull request to add support for another reasonably-general-purpose site, I will nigh-certainly accept it.

Run EpubCheck on epubs you generate to make sure they're not breaking.
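Before reaching for EpubCheck, a quick structural check can catch the most basic breakage. The sketch below is not a replacement for EpubCheck and is not part of leech. It only tests a few container rules from the EPUB specification: the file is a ZIP archive, its first entry is an uncompressed file named mimetype containing application/epub+zip, and META-INF/container.xml exists. The function name quick_epub_check is made up for this example.

```python
import zipfile


def quick_epub_check(path_or_file):
    """Return a list of basic container problems; empty means none were found."""
    try:
        archive = zipfile.ZipFile(path_or_file)
    except zipfile.BadZipFile:
        return ["not a valid ZIP archive"]

    problems = []
    with archive:
        names = archive.namelist()
        if not names or names[0] != "mimetype":
            problems.append("first entry is not 'mimetype'")
        else:
            info = archive.getinfo("mimetype")
            if info.compress_type != zipfile.ZIP_STORED:
                problems.append("mimetype entry is compressed")
            if archive.read("mimetype") != b"application/epub+zip":
                problems.append("mimetype content is not application/epub+zip")
        if "META-INF/container.xml" not in names:
            problems.append("META-INF/container.xml is missing")
    return problems


if __name__ == "__main__":
    import sys

    for problem in quick_epub_check(sys.argv[1]):
        print(problem)
```

Run it with the path to a generated epub as the only argument. A clean result here says nothing about the XHTML or metadata inside the book, which is what EpubCheck is for.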
