• Stars: 112
• Rank: 312,240 (Top 7%)
• Language: Python
• License: MIT License
• Created: almost 13 years ago
• Updated: almost 2 years ago

Repository Details

A decorator to write coroutine-like spider callbacks.

Scrapy Inline Requests

A decorator for writing coroutine-like spider callbacks.

Quickstart

The spider below shows a simple use case of scraping a page and following a few links:

from inline_requests import inline_requests
from scrapy import Spider, Request

class MySpider(Spider):
    name = 'myspider'
    start_urls = ['http://httpbin.org/html']

    @inline_requests
    def parse(self, response):
        urls = [response.url]
        for i in range(10):
            next_url = response.urljoin('?page=%d' % i)
            try:
                # the yield suspends the callback until the response arrives
                next_resp = yield Request(next_url)
                urls.append(next_resp.url)
            except Exception:
                self.logger.info("Failed request %s", i, exc_info=True)

        yield {'urls': urls}

See the examples/ directory for a more complex spider.
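
Assuming the Quickstart snippet above is saved as myspider.py (file and output names here are only examples), it can be run standalone with Scrapy's runspider command:

scrapy runspider myspider.py -o urls.json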

Warning

The generator resumes its execution when a request's response is processed; this means the generator won't be resumed after yielding an item or a request with its own callback.
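
For illustration, here is a minimal sketch (the spider name and URLs are made up) of where execution stops according to the warning above:

from inline_requests import inline_requests
from scrapy import Spider, Request

class DetailSpider(Spider):
    name = 'detailspider'
    start_urls = ['http://httpbin.org/html']

    @inline_requests
    def parse(self, response):
        # A plain Request yield suspends the callback and resumes it
        # with the downloaded response.
        detail = yield Request(response.urljoin('/links/2'))
        # Per the warning: yielding an item (or a Request with its own
        # callback) does not resume the generator, so keep such yields last.
        yield {'url': detail.url}
        self.logger.debug("this line is never reached")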

Known Issues

  • Middlewares can drop or ignore non-200 status responses, preventing the callback from continuing its execution. This can be overcome by using the handle_httpstatus_all flag; see the httperror middleware documentation and the sketch after this list.
  • High concurrency and large responses can cause higher memory usage.
  • This decorator assumes your method has the signature (self, response).
  • Wrapped requests may not be serializable by persistent backends.
  • Unless you know what you are doing, the decorated method must be a spider method and return a generator instance.
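
As a sketch of the first point (spider name and URLs are illustrative), the handle_httpstatus_all flag can be passed through the request meta so the httperror middleware does not filter out non-200 responses:

from inline_requests import inline_requests
from scrapy import Spider, Request

class StatusSpider(Spider):
    name = 'statusspider'
    start_urls = ['http://httpbin.org/html']

    @inline_requests
    def parse(self, response):
        # Ask the httperror middleware to pass non-200 responses through,
        # so the inline callback keeps running instead of being dropped.
        resp = yield Request(
            response.urljoin('/status/503'),
            meta={'handle_httpstatus_all': True},
        )
        yield {'url': resp.url, 'status': resp.status}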

More Repositories

1. scrapy-redis: Redis-based components for Scrapy. (Python, 5,503 stars)
2. dirbot-mysql: Scrapy project based on dirbot to show how to use Twisted's adbapi to store the scraped data in MySQL. (Python, 117 stars)
3. django-dummyimage: Dynamic Dummy Image Generator For Django! (Python, 55 stars)
4. scrapy-boilerplate: Small set of utilities to simplify writing Scrapy spiders. (Python, 49 stars)
5. scrapydo: Crochet-based blocking API for Scrapy. (Jupyter Notebook, 46 stars)
6. databrewer: The missing datasets manager. Like homebrew, but for datasets. A CLI tool to search and discover datasets! (Python, 41 stars)
7. databrewer-recipes: DataBrewer Recipes Repository. (Python, 21 stars)
8. django-on-tornado: Run Django on the Tornado web server. (Python, 15 stars)
9. webfaction-stuff: Random stuff to manage your own WebFaction hosting. (Python, 9 stars)
10. parsel-cli: Parsel Command Line Interface. (Python, 9 stars)
11. leveldict: LevelDB dict-like wrappers. (Python, 7 stars)
12. cookiecutter-scrapycloud: A bare-minimum Scrapy project template ready for Scrapinghub's Scrapy Cloud service. (Python, 7 stars)
13. Facebook-Hacker-Cup-Results (C++, 6 stars)
14. txrho: Misc stuff on top of Twisted/Cyclone. (Python, 6 stars)
15. Django-Dash-2010: Repository for Django Dash 2010. (JavaScript, 6 stars)
16. awesome-codename: Generate awesome codenames. (Makefile, 5 stars)
17. Random-Code: Random code. (Python, 4 stars)
18. dask-avro: Avro reader for Dask. (Python, 4 stars)
19. mit-ocw-crawler: MIT's OCW Crawler. (Python, 4 stars)
20. anaconda-manylinux-builder: Scripts to build manylinux wheels in Travis CI and upload them to Anaconda.org. (Shell, 3 stars)
21. persistent-homology-examples: Examples of computing the persistent homology of miscellaneous data sets. (3 stars)
22. yatiri (Python, 3 stars)
23. programming-challenges: My attempt to improve my algorithm skills, starting from the basics. (C++, 3 stars)
24. dask-kafka: Dask-Kafka reader. (Python, 2 stars)
25. dotfiles: My dot files. DEPRECATED. Go -> https://github.com/rmax/dotfiles-ng (Vim Script, 2 stars)
26. dockerfiles: Collection of dockerfiles. (Shell, 2 stars)
27. scrapy-slidebot: A collection of spiders to download slides as PDFs from popular sites like SlideShare and Speaker Deck. (Python, 2 stars)
28. gyst: A Pythonic tool to post gists. (Python, 2 stars)
29. haanga-benchs: Haanga's benchmarks ported to the Tornado framework. (PHP, 2 stars)
30. scrapyorg-infinit-crawler (Python, 1 star)
31. rmax.github.io (CSS, 1 star)
32. code-katas: My code katas. (Python, 1 star)
33. fastavro-codecs (1 star)
34. login_signup: Friendly login + signup form. (JavaScript, 1 star)
35. lmbot (1 star)
36. cookiecutter-datapackage (Makefile, 1 star)
37. rmax (1 star)
38. ipynb: Assorted collection of IPython notebooks. (1 star)
39. django-ipcountry (Python, 1 star)
40. dask-elasticsearch: An Elasticsearch reader for Dask. (Python, 1 star)
41. python-benchmarks: Assorted Python-based benchmarks. (Python, 1 star)
42. binary-repr: Converts integers to binary representation. (Python, 1 star)
43. django_inline_example: Django dynamic inline example. (Python, 1 star)
44. yammh3: Yet another MurmurHash3 bindings. (Python, 1 star)
45. my-django-project-template (CSS, 1 star)
46. pmwiki-authelgg (PHP, 1 star)
47. omp-thread-count: A small Python module to get the actual number of threads used by OMP, via Cython bindings. (Python, 1 star)
48. zend-ajax-form-test (PHP, 1 star)
49. rho-blogs-crawler: A Scrapy project to export my legacy blogs. (Python, 1 star)
50. dotfiles-ng: YADM-managed dot files. (Vim Script, 1 star)