• Stars
    star
    244
  • Rank 165,885 (Top 4 %)
  • Language
    Python
  • License
    MIT License
  • Created almost 8 years ago
  • Updated 5 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A modern Python library for writing maintainable web scrapers.

Overview

spatula is a modern Python library for writing maintainable web scrapers.

Source: https://github.com/jamesturk/spatula

Documentation: https://jamesturk.github.io/spatula/

Issues: https://github.com/jamesturk/spatula/issues

PyPI badge Test badge

Features

  • Page-oriented design: Encourages writing understandable & maintainable scrapers.
  • Not Just HTML: Provides built in handlers for common data formats including CSV, JSON, XML, PDF, and Excel. Or write your own.
  • Fast HTML parsing: Uses lxml.html for fast, consistent, and reliable parsing of HTML.
  • Flexible Data Model Support: Compatible with dataclasses, attrs, pydantic, or bring your own data model classes for storing & validating your scraped data.
  • CLI Tools: Offers several CLI utilities that can help streamline development & testing cycle.
  • Fully Typed: Makes full use of Python 3 type annotations.

More Repositories

1

jellyfish

πŸͺΌ a python library for doing approximate and phonetic matching of strings.
Jupyter Notebook
2,025
star
2

scrapeghost

πŸ‘» Experimental library for scraping websites using OpenAI's GPT API.
Python
1,421
star
3

django-honeypot

🍯 Generic honeypot utilities for use in django projects.
Python
360
star
4

scrapelib

⛏ a library for scraping unreliable pages
Python
208
star
5

django-markupfield

πŸ“‘ a MarkupField for Django
Python
194
star
6

django-brainstorm

❌ deprecated brainstorm idea voting app
Python
59
star
7

django-layar

❌ deprecated helper for publishing data to Layar augmented reality browser from Django
Python
34
star
8

saucebrush

experiment in writing a simple data processing toolkit in python
Python
18
star
9

glftfont

πŸ”‘ simple library/example for using Freetype fonts within OpenGL
C++
16
star
10

cjellyfish

🎐 C implementations of Jellyfish's algorithms [deprecated]
C
14
star
11

django-markupwiki

❌ deprecated version of a simple django wiki based on django-markupfield
Python
10
star
12

polipoly

❌ deprecated simple library for dealing with political boundaries as defined by census.gov shapefiles
Python
9
star
13

mongoprof

πŸ•΅ command line mongo profiling utility
Python
6
star
14

oyster

❌ deprecated attempt to build proactive document cache
Python
6
star
15

jellyfish-testdata

🎐 cross-language test data for string comparison/encoding algorithms
3
star
16

gcr-cli

CLI for working with GitHub classroom repositories.
Python
3
star
17

go-jellyfish

🎐 a Go library for doing approximate and phonetic matching of strings
Go
3
star
18

graveyard

⚰ pieces of code that accumulate along the way
Python
2
star
19

dotfiles

βš™
Shell
2
star
20

ansible-django-uwsgi-nginx

simple django-uwsgi-nginx ansible role
2
star
21

django-simplekeys

πŸ”‘ simple but flexible API keys
Python
1
star
22

scad-designs

OpenSCAD
1
star
23

slack-render

render slack backups as static HTML
JavaScript
1
star
24

cookiecutters

template for creating a python package to my liking
CSS
1
star
25

photon

❌ obsolete ctypes+SDL experiment
Python
1
star
26

cpp_photon

❌ obsolete C++ API for development of OpenGL accelerated applications/games
1
star
27

rust-jellyfish

🎐 a Rust library for doing approximate and phonetic matching of strings, based on Python library of the same name
Rust
1
star
28

zengine-gewi

❌ deprecated GUI library written to use ZEngine
1
star
29

zengine

❌ obsolete 2D game API using OpenGL for fast 2D drawing and SDL for everything else
1
star
30

tripod-lambda

really lightweight scaffolding for AWS Lambda
Python
1
star
31

python-disqus

❌ obsolete python client library for Disqus 1.1 API
Python
1
star