  • Stars: 208
  • Rank: 189,015 (Top 4 %)
  • Language: Python
  • License: BSD 2-Clause "Sim...
  • Created over 14 years ago
  • Updated 3 months ago

Repository Details

⛏ a library for scraping unreliable pages

scrapelib is a library for making requests to less-than-reliable websites.

Source: https://github.com/jamesturk/scrapelib

Documentation: https://jamesturk.github.io/scrapelib/

Issues: https://github.com/jamesturk/scrapelib/issues

Features

scrapelib originated as part of the Open States project, which scrapes the websites of all 50 state legislatures. As a result, it was designed with features that are useful when dealing with sites that have intermittent errors or require rate limiting.

Advantages of using scrapelib over using requests as-is:

  • HTTP(S) and FTP requests via an identical API
  • support for simple caching with pluggable cache backends
  • highly configurable request throttling
  • configurable retries for non-permanent site failures
  • all of the power of the superb requests library
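
To make "configurable retries for non-permanent site failures" concrete, here is a minimal standard-library sketch of the general technique: call a function again after a transient error, waiting longer each time. This is not scrapelib's implementation; the `retry` helper, its parameters, and the choice of a doubling wait are assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    attempts: int = 3,
    wait_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying up to `attempts` extra times on ConnectionError.

    Hypothetical helper: the wait doubles after each failure
    (wait_seconds, 2 * wait_seconds, 4 * wait_seconds, ...).
    """
    for attempt in range(attempts + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            sleep(wait_seconds * (2 ** attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays. Only `ConnectionError` is retried here, so a permanent failure raised as a different exception type propagates immediately.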

Installation

scrapelib is on PyPI, and can be installed via any standard package management tool:

poetry add scrapelib

or:

pip install scrapelib

Example Usage

  import scrapelib
  s = scrapelib.Scraper(requests_per_minute=10)

  # Grab Google front page
  s.get('http://google.com')

  # Will be throttled to 10 HTTP requests per minute
  while True:
      s.get('http://example.com')
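
One way to read `requests_per_minute=10` is as a minimum gap of 60 / 10 = 6 seconds between consecutive requests. The sketch below illustrates that idea with the standard library only. It is not scrapelib's actual code; the `Throttle` class and its `clock` and `sleep` parameters are hypothetical names introduced for this example.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Hypothetical rate limiter: enforce a minimum gap between calls."""

    def __init__(
        self,
        requests_per_minute: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        # 10 requests per minute means at least 6 seconds between requests.
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay:
                self._sleep(delay)
        # Record when this request is considered to have started.
        self._last = now + delay
        return delay
```

Calling `wait()` before each request spaces requests out evenly. The first call never sleeps, and a call made after a long idle period is not delayed.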

More Repositories

  1. jellyfish (Jupyter Notebook, 2,025 stars): 🪼 a python library for doing approximate and phonetic matching of strings.
  2. scrapeghost (Python, 1,421 stars): 👻 Experimental library for scraping websites using OpenAI's GPT API.
  3. django-honeypot (Python, 360 stars): 🍯 Generic honeypot utilities for use in django projects.
  4. spatula (Python, 244 stars): A modern Python library for writing maintainable web scrapers.
  5. django-markupfield (Python, 194 stars): 📑 a MarkupField for Django
  6. django-brainstorm (Python, 59 stars): ❌ deprecated brainstorm idea voting app
  7. django-layar (Python, 34 stars): ❌ deprecated helper for publishing data to Layar augmented reality browser from Django
  8. saucebrush (Python, 18 stars): experiment in writing a simple data processing toolkit in python
  9. glftfont (C++, 16 stars): 🔡 simple library/example for using Freetype fonts within OpenGL
  10. cjellyfish (C, 14 stars): 🎐 C implementations of Jellyfish's algorithms [deprecated]
  11. django-markupwiki (Python, 10 stars): ❌ deprecated version of a simple django wiki based on django-markupfield
  12. polipoly (Python, 9 stars): ❌ deprecated simple library for dealing with political boundaries as defined by census.gov shapefiles
  13. mongoprof (Python, 6 stars): 🕵 command line mongo profiling utility
  14. oyster (Python, 6 stars): ❌ deprecated attempt to build proactive document cache
  15. jellyfish-testdata (3 stars): 🎐 cross-language test data for string comparison/encoding algorithms
  16. gcr-cli (Python, 3 stars): CLI for working with GitHub classroom repositories.
  17. go-jellyfish (Go, 3 stars): 🎐 a Go library for doing approximate and phonetic matching of strings
  18. graveyard (Python, 2 stars): ⚰ pieces of code that accumulate along the way
  19. dotfiles (Shell, 2 stars)
  20. ansible-django-uwsgi-nginx (2 stars): simple django-uwsgi-nginx ansible role
  21. django-simplekeys (Python, 1 star): 🔑 simple but flexible API keys
  22. scad-designs (OpenSCAD, 1 star)
  23. slack-render (JavaScript, 1 star): render slack backups as static HTML
  24. cookiecutters (CSS, 1 star): template for creating a python package to my liking
  25. photon (Python, 1 star): ❌ obsolete ctypes+SDL experiment
  26. cpp_photon (1 star): ❌ obsolete C++ API for development of OpenGL accelerated applications/games
  27. rust-jellyfish (Rust, 1 star): 🎐 a Rust library for doing approximate and phonetic matching of strings, based on Python library of the same name
  28. zengine-gewi (1 star): ❌ deprecated GUI library written to use ZEngine
  29. zengine (1 star): ❌ obsolete 2D game API using OpenGL for fast 2D drawing and SDL for everything else
  30. tripod-lambda (Python, 1 star): really lightweight scaffolding for AWS Lambda
  31. python-disqus (Python, 1 star): ❌ obsolete python client library for Disqus 1.1 API