
Scrapy extension to control spiders using JSON-RPC

scrapy-jsonrpc

scrapy-jsonrpc is an extension to control a running Scrapy web crawler via JSON-RPC. The service provides access to the main Crawler object via the JSON-RPC 2.0 protocol.
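For reference, a JSON-RPC 2.0 call is a JSON object with the members jsonrpc, method, params (optional) and id, and the reply carries either a result or an error object. The sketch below builds and decodes such messages using only the standard library. It follows the JSON-RPC 2.0 specification rather than this extension's code, and the method names used with it are placeholders, not documented parts of the extension's API.

```python
import json
from itertools import count

_ids = count(1)  # each request gets a fresh id so replies can be matched to calls


def make_request(method, *args, **kwargs):
    """Encode a JSON-RPC 2.0 request; params are positional or named, not both."""
    if args and kwargs:
        raise ValueError("pass positional or keyword params, not both")
    payload = {"jsonrpc": "2.0", "method": method, "id": next(_ids)}
    params = list(args) if args else kwargs
    if params:  # the spec allows params to be omitted entirely
        payload["params"] = params
    return json.dumps(payload)


class JsonRpcError(Exception):
    """Raised when the server answers with a JSON-RPC error object."""


def parse_response(text):
    """Decode a JSON-RPC 2.0 response and return its result, or raise on error."""
    response = json.loads(text)
    if response.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 response")
    if "error" in response:
        err = response["error"]
        raise JsonRpcError(f"{err.get('code')}: {err.get('message')}")
    return response["result"]
```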

Installation

Install scrapy-jsonrpc using pip:

$ pip install scrapy-jsonrpc

Configuration

First, add the extension to the EXTENSIONS dict in your settings.py, for example:

EXTENSIONS = {
    'scrapy_jsonrpc.webservice.WebService': 500,
}

Then, enable the web service by setting JSONRPC_ENABLED to True.

The web server will listen on a port taken from JSONRPC_PORT (by default, it tries port 6080 first) and will log HTTP requests to the file specified in JSONRPC_LOGFILE, or to the standard Scrapy log if that setting is unset.
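Putting these steps together, a project's settings.py might contain something like the sketch below. The host and port values simply restate the defaults listed under Settings; the log file name jsonrpc.log is an illustrative assumption, and leaving JSONRPC_LOGFILE out sends the log to the standard Scrapy log instead.

```python
# settings.py (sketch): enable the scrapy-jsonrpc web service

EXTENSIONS = {
    'scrapy_jsonrpc.webservice.WebService': 500,
}

JSONRPC_ENABLED = True          # required: the web service is disabled by default
JSONRPC_HOST = '127.0.0.1'      # listen on the loopback interface only (the default)
JSONRPC_PORT = [6080, 7030]     # port range to try (the default)
JSONRPC_LOGFILE = 'jsonrpc.log' # hypothetical file name; omit to use the Scrapy log
```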

The endpoint for accessing the crawler object is:

http://localhost:6080/crawler
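A client only needs to POST a JSON-RPC 2.0 body to that URL. The sketch below uses the standard urllib.request module; it assumes the server accepts a JSON body sent with a Content-Type of application/json, and the method name passed to call() must be replaced with something actually exposed by the crawler object (none are named here).

```python
import json
import urllib.request

ENDPOINT = "http://localhost:6080/crawler"  # default host and port from this README


def build_call(url, method, *params, request_id=1):
    """Build (but do not send) a POST request carrying a JSON-RPC 2.0 call."""
    body = json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": list(params),
        "id": request_id,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(url, method, *params, timeout=10.0):
    """Send the call and return the decoded result; raise if the reply is an error."""
    with urllib.request.urlopen(build_call(url, method, *params), timeout=timeout) as resp:
        reply = json.loads(resp.read().decode("utf-8"))
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]
```

With a crawl running, call(ENDPOINT, "<method>") would return the method's result; splitting request construction from sending keeps the payload easy to inspect without a live server.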

Example client

A command-line tool, example-client.py, is provided to illustrate how to build a client. It supports a few basic commands, such as listing the running spiders.
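The overall shape of such a client can be sketched as a table mapping command names to JSON-RPC calls, plus an argument parser for the server address. Every command name, path and method name in COMMANDS below is a hypothetical placeholder, not what example-client.py actually uses; check that file for the real ones.

```python
import argparse
import json

# Hypothetical command table: command -> (path under the server root, JSON-RPC method).
COMMANDS = {
    "list-running": ("crawler/engine", "list_running"),  # placeholder names
    "stop": ("crawler/engine", "close_spider"),          # placeholder names
}


def build_parser():
    parser = argparse.ArgumentParser(description="Toy JSON-RPC client sketch")
    parser.add_argument("--host", default="localhost")
    parser.add_argument("--port", type=int, default=6080)
    parser.add_argument("command", choices=sorted(COMMANDS))
    parser.add_argument("args", nargs="*", help="positional params for the call")
    return parser


def plan_call(argv):
    """Turn command-line arguments into (url, JSON-RPC body) without sending anything."""
    opts = build_parser().parse_args(argv)
    path, method = COMMANDS[opts.command]
    url = f"http://{opts.host}:{opts.port}/{path}"
    body = json.dumps({"jsonrpc": "2.0", "method": method, "params": opts.args, "id": 1})
    return url, body
```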

Settings

These are the settings that control the web service behaviour:

JSONRPC_ENABLED

Default: False

A boolean which specifies if the web service will be enabled (provided its extension is also enabled).
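In other words, two switches must both be on. The sketch below expresses that check over a plain dict standing in for Scrapy's settings object; the helper name webservice_active is hypothetical. It relies on the Scrapy convention that an extension mapped to None in EXTENSIONS is disabled.

```python
EXTENSION_PATH = "scrapy_jsonrpc.webservice.WebService"


def webservice_active(settings):
    """Return True only if the extension is registered and JSONRPC_ENABLED is truthy.

    Scrapy's own Settings.getbool also accepts strings such as "0" or "False";
    this simplified sketch just applies bool() to the value.
    """
    extensions = settings.get("EXTENSIONS", {})
    registered = extensions.get(EXTENSION_PATH) is not None  # None disables an extension
    return registered and bool(settings.get("JSONRPC_ENABLED", False))
```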

JSONRPC_LOGFILE

Default: None

A file to use for logging HTTP requests made to the web service. If unset, the log is sent to the standard Scrapy log.

JSONRPC_PORT

Default: [6080, 7030]

The port range to use for the web service; the first free port in the range is used. If set to None or 0, a dynamically assigned port is used.
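The port selection rule can be illustrated with plain sockets: try each port of the inclusive range in order until one binds, and fall back to port 0 (an OS-assigned port) when the setting is None or 0. This is an independent sketch using the standard socket module, modelled on the behaviour described above rather than taken from the extension; the function name bind_in_range is hypothetical.

```python
import socket


def bind_in_range(portrange, host="127.0.0.1"):
    """Return a listening TCP socket on the first free port in portrange."""
    if not portrange:                       # None or 0: let the OS pick a port
        ports = [0]
    elif isinstance(portrange, int):        # a single fixed port
        ports = [portrange]
    elif len(portrange) == 1:
        ports = [portrange[0]]
    elif len(portrange) == 2:               # inclusive [low, high] range
        ports = range(portrange[0], portrange[1] + 1)
    else:
        raise ValueError("port range must have at most two values")

    last_error = None
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError as exc:              # port busy: remember why and try the next one
            sock.close()
            last_error = exc
            continue
        sock.listen()
        return sock
    raise OSError(f"no free port in {portrange!r}") from last_error
```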

JSONRPC_HOST

Default: '127.0.0.1'

The interface the web service should listen on.
