scrapy/quotesbot

Language: Python
License: MIT License
Created over 7 years ago; updated 6 months ago


This is a sample Scrapy project for educational purposes.

QuotesBot

This is a Scrapy project that scrapes quotes by famous people from http://quotes.toscrape.com (GitHub repo).

This project is only meant for educational purposes.

Extracted data

This project extracts quotes, combined with the respective author names and tags. The extracted data looks like this sample:

{
    'author': 'Douglas Adams',
    'text': '“I may not have gone where I intended to go, but I think I ...”',
    'tags': ['life', 'navigation']
}
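The sample above is a plain dictionary with three keys. As a hedged sketch, the same record shape can be written down as a dataclass with a small validity check. The name Quote and the from_item helper are hypothetical and not part of this project, whose spiders yield plain dicts; recent Scrapy releases also accept dataclass instances as items through itemadapter.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Quote:
    """Hypothetical container mirroring one scraped record."""
    author: str
    text: str
    tags: List[str] = field(default_factory=list)

    @classmethod
    def from_item(cls, item: Dict[str, Any]) -> "Quote":
        # Reject records whose author/text are not strings
        if not isinstance(item.get("author"), str) or not isinstance(item.get("text"), str):
            raise ValueError(f"author and text must be strings: {item!r}")
        # Tags must be a real list of strings (a bare string is rejected)
        tags = item.get("tags", [])
        if not isinstance(tags, list) or not all(isinstance(tag, str) for tag in tags):
            raise ValueError(f"tags must be a list of strings: {item!r}")
        return cls(author=item["author"], text=item["text"], tags=list(tags))


sample = {
    "author": "Douglas Adams",
    "text": "“I may not have gone where I intended to go, but I think I ...”",
    "tags": ["life", "navigation"],
}

quote = Quote.from_item(sample)
# Converting back with asdict() gives the same dictionary as the sample
print(asdict(quote) == sample)
```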

Spiders

This project contains two spiders, which you can list using the list command:

$ scrapy list
toscrape-css
toscrape-xpath

Both spiders extract the same data from the same website, but toscrape-css employs CSS selectors, while toscrape-xpath employs XPath expressions.
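To make the XPath approach concrete without installing Scrapy, here is a rough stand-in that uses the limited XPath subset supported by Python's xml.etree.ElementTree. The markup is an assumed, simplified version of one quote block on quotes.toscrape.com, and the expressions are illustrative rather than copied from the toscrape-xpath spider. Real pages are usually not well-formed XML, which is one reason Scrapy relies on its own selectors (built on parsel and lxml) instead.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Assumed, simplified markup for one listing page; it must be well-formed XML here
PAGE = """
<html><body>
  <div class="quote">
    <span class="text">“I may not have gone where I intended to go, but I think I ...”</span>
    <span>by <small class="author">Douglas Adams</small></span>
    <div class="tags">Tags:
      <a class="tag" href="/tag/life/">life</a>
      <a class="tag" href="/tag/navigation/">navigation</a>
    </div>
  </div>
</body></html>
"""


def extract_quotes(markup: str) -> List[Dict[str, object]]:
    root = ET.fromstring(markup.strip())
    items = []
    # ElementTree only supports exact attribute matches such as [@class='quote'];
    # full XPath 1.0 would also allow contains(@class, 'quote')
    for quote in root.iterfind(".//div[@class='quote']"):
        items.append({
            "text": quote.findtext(".//span[@class='text']"),
            "author": quote.findtext(".//small[@class='author']"),
            "tags": [a.text for a in quote.iterfind(".//div[@class='tags']/a[@class='tag']")],
        })
    return items


for item in extract_quotes(PAGE):
    print(item)
```

A CSS-based spider would express the same lookups as selectors like div.quote and small.author; the extracted records end up identical, which is the point the two spiders in this project illustrate.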

You can learn more about the spiders by going through the Scrapy Tutorial.

Running the spiders

You can run a spider using the scrapy crawl command, for example:

$ scrapy crawl toscrape-css

If you want to save the scraped data to a file, you can pass the -o option:

$ scrapy crawl toscrape-css -o quotes.json
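Note that -o appends to an existing file, so running the command above twice leaves quotes.json as invalid JSON; newer Scrapy releases also provide -O, which overwrites the file instead. The sketch below wraps the command in Python via subprocess. The helper names crawl_argv and run_crawl are hypothetical, and it assumes the scrapy executable is on PATH and that it is run from the project root (the directory containing scrapy.cfg).

```python
import shlex
import subprocess
from typing import List, Optional


def crawl_argv(spider: str, output: Optional[str] = None, overwrite: bool = False) -> List[str]:
    """Build the argument list for `scrapy crawl` (hypothetical helper)."""
    argv = ["scrapy", "crawl", spider]
    if output is not None:
        # -o appends to an existing file; -O (newer Scrapy releases) overwrites it
        argv += ["-O" if overwrite else "-o", output]
    return argv


def run_crawl(spider: str, output: Optional[str] = None, overwrite: bool = False) -> None:
    # check=True raises CalledProcessError if the crawl exits with an error
    subprocess.run(crawl_argv(spider, output, overwrite), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so no network access is needed
    print(shlex.join(crawl_argv("toscrape-css", "quotes.json")))
```

Building the argument list separately from running it keeps the command easy to inspect and test without starting a crawl.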

Related projects

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

scrapyd

A service daemon to run Scrapy spiders

scrapely

A pure-python HTML screen-scraping library

dirbot

Scrapy project to scrape public web directories (educational) [DEPRECATED]

parsel

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

scrapyd-client

Command line client for Scrapyd server

w3lib

Python library of web-related functions

cssselect

CSS Selectors for Python

loginform

Fill HTML login forms automatically

queuelib

Collection of persistent (disk-based) and non-persistent (memory-based) queues for Python

slybot

scrapy.org

The scrapy.org website

itemadapter

Common interface for data container classes

protego

A pure-Python robots.txt parser with support for modern conventions.


itemloaders

Library to populate items using XPath and CSS with a convenient API

scrapy-bench

A CLI for benchmarking Scrapy.

scurl

Performance-focused replacement for Python urllib

pypydispatcher

A fork of http://pydispatcher.sourceforge.net/ with PyPy support

xtractmime

https://mimesniff.spec.whatwg.org/ implementation for Python

base-chromium

base component forked from Chromium source https://chromium.googlesource.com/chromium/src/base/

scrapy-itemloader

[Archived] Library to populate Scrapy items using XPath and CSS with a convenient API

gsoc2014-integration-tests

GSoC2014 - Scrapy Integration tests project

url-chromium

url component from Chromium source code, forked from https://chromium.googlesource.com/chromium/src/url