  • Stars: 50,821
  • Rank: 209 (Top 0.01%)
  • Language: Python
  • License: BSD 3-Clause "New...
  • Created about 14 years ago
  • Updated 11 days ago

Repository Details

Scrapy, a fast high-level web crawling & scraping framework for Python.

Scrapy

Overview

Scrapy is a BSD-licensed, fast, high-level web crawling and web scraping framework for crawling websites and extracting structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Scrapy is maintained by Zyte (formerly Scrapinghub) and many other contributors.

Check the Scrapy homepage at https://scrapy.org for more information, including a list of features.

Requirements

  • Python 3.8+
  • Works on Linux, Windows, macOS, BSD
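
As a small sketch, the Python version requirement above can be checked before installing. The `meets_requirement` helper and the `MIN_PYTHON` constant are hypothetical names introduced here for illustration; they are not part of Scrapy.

```python
import sys

# Minimum version taken from the Requirements list above.
MIN_PYTHON = (3, 8)


def meets_requirement(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if meets_requirement():
        print("This interpreter satisfies the Python 3.8+ requirement.")
    else:
        print("Python 3.8 or newer is required.")
```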

Install

The quick way:

pip install scrapy

See the install section in the documentation at https://docs.scrapy.org/en/latest/intro/install.html for more details.

Documentation

Documentation is available online at https://docs.scrapy.org/ and in the docs directory.

Releases

See https://docs.scrapy.org/en/latest/news.html for the release notes.

Community (blog, Twitter, mailing list, IRC)

See https://scrapy.org/community/ for details.

Contributing

See https://docs.scrapy.org/en/master/contributing.html for details.

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct.

By participating in this project you agree to abide by its terms. Please report unacceptable behavior to [email protected].

Companies using Scrapy

See https://scrapy.org/companies/ for a list.

Commercial Support

See https://scrapy.org/support/ for details.

More Repositories

1. scrapyd (Python, 2,825 stars): A service daemon to run Scrapy spiders
2. scrapely (HTML, 1,843 stars): A pure-python HTML screen-scraping library
3. dirbot (Python, 1,629 stars): Scrapy project to scrape public web directories (educational) [DEPRECATED]
4. quotesbot (Python, 1,259 stars): This is a sample Scrapy project for educational purposes
5. parsel (Python, 1,074 stars): Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
6. scrapyd-client (Python, 751 stars): Command line client for Scrapyd server
7. w3lib (Python, 381 stars): Python library of web-related functions
8. cssselect (Python, 282 stars): CSS Selectors for Python
9. loginform (Python, 261 stars): Fill HTML login forms automatically
10. queuelib (Python, 256 stars): Collection of persistent (disk-based) and non-persistent (memory-based) queues for Python
11. slybot (224 stars)
12. itemadapter (Python, 60 stars): Common interface for data container classes
13. scrapy.org (HTML, 57 stars): The scrapy.org website
14. protego (DIGITAL Command Language, 48 stars): A pure-Python robots.txt parser with support for modern conventions.
15. itemloaders (Python, 41 stars): Library to populate items using XPath and CSS with a convenient API
16. scrapy-bench (Python, 30 stars): A CLI for benchmarking Scrapy.
17. scurl (Python, 21 stars): Performance-focused replacement for Python urllib
18. pypydispatcher (Python, 15 stars): A fork of http://pydispatcher.sourceforge.net/ with PyPy support
19. xtractmime (Python, 13 stars): https://mimesniff.spec.whatwg.org/ implementation for Python
20. base-chromium (C++, 7 stars): base component forked from Chromium source https://chromium.googlesource.com/chromium/src/base/
21. scrapy-itemloader (Python, 6 stars): [Archived] Library to populate Scrapy items using XPath and CSS with a convenient API
22. gsoc2014-integration-tests (Shell, 3 stars): GSoC2014 - Scrapy Integration tests project
23. url-chromium (C++, 2 stars): url component from Chromium source code, forked from https://chromium.googlesource.com/chromium/src/url