portia
Visual scraping for Scrapy
splash
Lightweight, scriptable browser as a service with an HTTP API
dateparser
Python parser for human-readable dates
frontera
A scalable frontier for web crawlers
slackbot
A chat bot for Slack (https://slack.com).
scrapyrt
HTTP API for Scrapy spiders
extruct
Extract embedded metadata from HTML markup
python-crfsuite
A Python binding for crfsuite
spidermon
Scrapy extension for monitoring spider execution.
price-parser
Extract price amount and currency symbol from a raw text string
webstruct
NER toolkit for HTML data
article-extraction-benchmark
Article extraction benchmark: dataset and evaluation scripts
python-scrapinghub
A client interface for Scrapinghub's API
adblockparser
Python parser for Adblock Plus filters
js2xml
Convert JavaScript code to an XML document
testspiders
Useful test spiders for Scrapy
scrapy-training
Scrapy Training companion code
skinfer
Skinfer is a tool for inferring and merging JSON schemas
sample-projects
Sample projects showcasing Scrapinghub tech
shub
Scrapinghub Command Line Client
python-simhash
An efficient simhash implementation for Python
scrapy-poet
Page Object pattern for Scrapy
mdr
A Python library to detect and extract listing data from HTML pages.
number-parser
Parse numbers written in natural language
web-poet
Web scraping Page Objects core library
aile
Automatic Item List Extraction
wappalyzer-python
UNMAINTAINED Python wrapper for Wappalyzer (utility that uncovers the technologies used on websites)
pydepta
A Python implementation of DEPTA
scrapinghub-stack-scrapy
Software stack with latest Scrapy and updated deps
scrapy-autounit
Automatic unit test generation for Scrapy.
learn.scrapinghub.com
Scrapinghub Learning Center. Report issues in Jira: https://scrapinghub.atlassian.net/projects/WEB
scrapy-autoextract
Zyte Automatic Extraction integration for Scrapy
aduana
Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even when making big crawls (one billion pages).
portia2code
arche
Analyze scraped data
scmongo
MongoDB extensions for Scrapy
exporters
Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations
page_clustering
A simple algorithm for clustering web pages, suitable for crawlers
webpager
Paginating the web
scrapy-frontera
More flexible and featured Frontera scheduler for Scrapy
scaws
Extensions for using Scrapy on Amazon AWS
flatson
Tool to flatten stream of JSON-like objects, configured via schema
docker-images
scrapylib
Collection of Scrapy utilities (extensions, middlewares, pipelines, etc)
pycon-speakers
Speakers Spider (PyCon 2014 sprint)
docker-devpi
PyPI caching service using devpi and Docker
crawlera-tools
Crawlera tools
scrapinghub-entrypoint-scrapy
Scrapy entrypoint for Scrapinghub job runner
scrapy-mosquitera
Restrict crawl and scraping scope using matchers.
kafka-scanner
High Level Kafka Scanner
andi
Library for annotation-based dependency injection
autoextract-spiders
Pre-built Scrapy spiders for AutoExtract
python-cld2
Python bindings for CLD2.
python-hubstorage
Deprecated HubStorage client library - please use python-scrapinghub>=1.9.0 instead
shublang
Pluggable DSL that uses pipes to perform a series of linear transformations to extract data
product-extraction-benchmark
shubc
Go bindings for Scrapinghub HTTP API and a sweet command line tool for Scrapy Cloud
shub-workflow
navscraper
Vanguard ETF NAV scraper
tutorials
pastebin
hcf-backend
Crawl Frontier HCF backend
varanus
A command line spider monitoring tool
pydatanyc
autoextract-poet
web-poet definitions for AutoExtract
collection-scanner
HubStorage collection scanner library
locode
autoextract-examples
webstruct-demo
HTTP demo for https://github.com/scrapinghub/webstruct
shub-image
Deprecated client side tool to prepare docker images to run crawlers in Scrapinghub - please use shub>=2.5.0 instead
docker-cloudera-manager
Run Cloudera Manager in Docker
adblockgoparser
Golang parser for Adblock Plus filters
hubstorage-frontera
Hubstorage crawl frontier backend for Frontera
httpation
xpathcsstutorial
[Work in progress] XPath & CSS for web scraping tutorial
custom-images-examples
Examples of custom images running on Scrapinghub platform
epmdless_dist
egraylog
scrapinghub-conda-recipes
Conda packages for scrapinghub channel
pydaybot
Demo bot for Python Day Uruguay 2011
erl-iputils
jupyterhub-stacks
Docker images for a jhub cluster
scrapinghub-stack-hworker
[DEPRECATED] Software stack fully compatible with Scrapy Cloud 1.0
cld2
Compact Language Detector 2
pkg-opengrok
Ubuntu packaging for OpenGrok
crawlera.com
crawlera.com website
discourse-sso-google
Use Google as Single-Sign-On provider for Discourse