• Stars
    star
    182
  • Rank 211,154 (Top 5 %)
  • Language
    Go
  • License
    GNU General Publi...
  • Created about 5 years ago
  • Updated about 3 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Demeter is a tool for scraping the calibre web ui

demeter

demeter is a tool for downloading the .epub files you don't have from a Calibre library. It does this by building a database of books it has seen based on some clever algorithms. At least, that's the idea.

(Demeter only allows scraping a host every 12 hours to prevent overloading the server.)

Installation and Usage

Download the appropriate demeter binary for your platform from the releases page.

This is a standalone binary, there's no need to install any dependencies.

Move it somewhere in your $PATH so you can call it with demeter

Add a Host

demeter host add http://example.com:8080

Scrape all hosts and store results in the directory ./books and only download the extension pdf

demeter scrape run -d books -e pdf

For the rest, use the built in help.

This tool can be used for whatever you want, enjoy.

important note regarding extensions

The -e flag on the scrape run command only affects that specific run, the books are stored without any extension information in the database. In general that means that if you switch from the -e epub (default) to -e mobi, you will only download new books in the mobi extension. Books that were already present will not be re-downloaded in a different extension.

Database

Demeter builds an internal database that is stored in ~/.demeter/demeter.db

Scraping

When scraping a host, demeter does the following:

  • Use the API to collect all book ids
  • Check if there a new book ids since the previous scrape
  • Use the API to get the details for all the new book ids
  • Check the internal db if a book has already been downloaded
  • Download the book if it isn't and add it to the internal db
  • Mark the host as scraped so it won't do it again within 12 hours
  • If the host failed, mark it as failed and disable it after a while

all commands

$ demeter -h
demeter is CLI application for scraping calibre hosts and
retrieving books in epub format that are not in your local library.

Usage:
  demeter [command]

Available Commands:
  dl          download related commands
  help        Help about any command
  host        all host related commands
  scrape      all scrape related commands

$ demeter dl -h
download related commands

Usage:
  demeter dl [command]

Aliases:
  dl, download, downloads, dls

Available Commands:
  add          add a number of hashes to the database
  deleterecent delete all downloads from this time period
  list         list all downloads

$ demeter host -h
all host related commands

Usage:
  demeter host [command]

Available Commands:
  add         add a host to the scrape list
  disable     disable a host
  enabled     make a host active
  list        list all hosts
  rm          delete a host
  stats       Get host stats

$ demeter scrape -h
all scrape related commands

Usage:
  demeter scrape [command]

Available Commands:
  run         run all scrape jobs

$ demeter scrape run -h
demeter scrape run -h
Go over all defined hosts and if the last scrape
is old enough it will scrape that host.

Usage:
  demeter scrape run [flags]

Flags:
  -h, --help               help for run
  -d, --outputdir string   path to downloaded books to (default "books")
  -n, --stepsize int       number of books to request per query (default 50)
  -u, --useragent string   user agent used to identify to calibre hosts (default "demeter / v1")
  -w, --workers int        number of workers to concurrently download books (default 10)