scrapy/scrapy

Stars: 50,821
Rank 209 (Top 0.01 %)
Language: Python
License: BSD 3-Clause "New...
Created about 14 years ago
Updated 11 days ago

Scrapy, a fast high-level web crawling & scraping framework for Python.

Scrapy

Overview

Scrapy is a fast, high-level, BSD-licensed web crawling and web scraping framework used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Scrapy is maintained by Zyte (formerly Scrapinghub) and many other contributors.

Check the Scrapy homepage at https://scrapy.org for more information, including a list of features.

Requirements

Python 3.8+
Works on Linux, Windows, macOS, BSD

Install

The quick way:

pip install scrapy

See the install section in the documentation at https://docs.scrapy.org/en/latest/intro/install.html for more details.

Documentation

Documentation is available online at https://docs.scrapy.org/ and in the docs directory.

Releases

You can check https://docs.scrapy.org/en/latest/news.html for the release notes.

Community (blog, twitter, mail list, IRC)

See https://scrapy.org/community/ for details.

Contributing

See https://docs.scrapy.org/en/master/contributing.html for details.

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct.

By participating in this project you agree to abide by its terms. Please report unacceptable behavior to [email protected].

Companies using Scrapy

See https://scrapy.org/companies/ for a list.

Commercial Support

See https://scrapy.org/support/ for details.

scrapyd

A service daemon to run Scrapy spiders

scrapely

A pure-Python HTML screen-scraping library

dirbot

Scrapy project to scrape public web directories (educational) [DEPRECATED]

quotesbot

This is a sample Scrapy project for educational purposes

parsel

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

scrapyd-client

Command line client for Scrapyd server

w3lib

Python library of web-related functions

cssselect

CSS Selectors for Python

loginform

Fill HTML login forms automatically

queuelib

Collection of persistent (disk-based) and non-persistent (memory-based) queues for Python

slybot

itemadapter

Common interface for data container classes

scrapy.org

The scrapy.org website

protego

A pure-Python robots.txt parser with support for modern conventions.

itemloaders

Library to populate items using XPath and CSS with a convenient API

scrapy-bench

A CLI for benchmarking Scrapy.

scurl

Performance-focused replacement for Python urllib

pypydispatcher

A fork of http://pydispatcher.sourceforge.net/ with PyPy support

xtractmime

https://mimesniff.spec.whatwg.org/ implementation for Python

base-chromium

base component forked from Chromium source https://chromium.googlesource.com/chromium/src/base/

scrapy-itemloader

[Archived] Library to populate Scrapy items using XPath and CSS with a convenient API

gsoc2014-integration-tests

GSoC2014 - Scrapy Integration tests project

url-chromium

url component from Chromium source code, forked from https://chromium.googlesource.com/chromium/src/url