portia
Visual scraping for Scrapy
splash
Lightweight, scriptable browser as a service with an HTTP API
dateparser
python parser for human readable dates
frontera
A scalable frontier for web crawlers
slackbot
A chat bot for Slack (https://slack.com).
extruct
Extract embedded metadata from HTML markup
scrapyrt
HTTP API for Scrapy spiders
python-crfsuite
A python binding for crfsuite
spidermon
Scrapy Extension for monitoring spiders execution.
price-parser
Extract price amount and currency symbol from a raw text string
article-extraction-benchmark
Article extraction benchmark: dataset and evaluation scripts
webstruct
NER toolkit for HTML data
python-scrapinghub
A client interface for Scrapinghub's API
adblockparser
Python parser for Adblock Plus filters
js2xml
Convert Javascript code to an XML document
testspiders
Useful test spiders for Scrapy
scrapy-training
Scrapy Training companion code
skinfer
Skinfer is a tool for inferring and merging JSON schemas
sample-projects
Sample projects showcasing Scrapinghub tech
shub
Scrapinghub Command Line Client
python-simhash
An efficient simhash implementation for python
scrapy-poet
Page Object pattern for Scrapy
number-parser
Parse numbers written in natural language
mdr
A Python library to detect and extract listing data from HTML pages.
web-poet
Web scraping Page Objects core library
aile
Automatic Item List Extraction
wappalyzer-python
UNMAINTAINED Python wrapper for Wappalyzer (utility that uncovers the technologies used on websites)
pydepta
A python implementation of DEPTA
scrapinghub-stack-scrapy
Software stack with latest Scrapy and updated deps
aduana
Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even when making big crawls (one billion pages).
scrapy-autoextract
Zyte Automatic Extraction integration for Scrapy
scrapy-autounit
Automatic unit test generation for Scrapy.
learn.scrapinghub.com
Scrapinghub Learning Center. Report issues in Jira: https://scrapinghub.atlassian.net/projects/WEB
portia2code
arche
Analyze scraped data
scmongo
MongoDB extensions for Scrapy
exporters
Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations
webpager
Paginating the web
scrapy-frontera
More flexible and featured Frontera scheduler for Scrapy
page_clustering
A simple algorithm for clustering web pages, suitable for crawlers
flatson
Tool to flatten stream of JSON-like objects, configured via schema
scaws
Extensions for using Scrapy on Amazon AWS
docker-images
scrapylib
Collection of Scrapy utilities (extensions, middlewares, pipelines, etc)
pycon-speakers
Speakers Spider (PyCon 2014 sprint)
docker-devpi
pypi caching service using devpi and docker
crawlera-tools
Crawlera tools
scrapinghub-entrypoint-scrapy
Scrapy entrypoint for Scrapinghub job runner
scrapy-mosquitera
Restrict crawl and scraping scope using matchers.
andi
Library for annotation-based dependency injection
kafka-scanner
High Level Kafka Scanner
autoextract-spiders
Pre-built Scrapy spiders for AutoExtract
python-cld2
Python bindings for CLD2.
product-extraction-benchmark
python-hubstorage
Deprecated HubStorage client library - please use python-scrapinghub>=1.9.0 instead
shublang
Pluggable DSL that uses pipes to perform a series of linear transformations to extract data
shub-workflow
shubc
Go bindings for Scrapinghub HTTP API and a sweet command line tool for Scrapy Cloud
scrapinghub-stack-portia
Software stack used to run Portia spiders in Scrapinghub cloud
tutorials
pastebin
navscraper
Vanguard ETF NAV scraper
varanus
A command line spider monitoring tool
hcf-backend
Crawl Frontier HCF backend
pydatanyc
autoextract-poet
web-poet definitions for AutoExtract
collection-scanner
HubStorage collection scanner library
locode
adblockgoparser
Golang parser for Adblock Plus filters
autoextract-examples
webstruct-demo
HTTP demo for https://github.com/scrapinghub/webstruct
shub-image
Deprecated client side tool to prepare docker images to run crawlers in Scrapinghub - please use shub>=2.5.0 instead
docker-cloudera-manager
Run Cloudera Manager in docker
custom-images-examples
Examples of custom images running on Scrapinghub platform
hubstorage-frontera
Hubstorage crawl frontier backend for Frontera
xpathcsstutorial
[Work in progress] XPath & CSS for web scraping tutorial
epmdless_dist
egraylog
scrapinghub-conda-recipes
Conda packages for scrapinghub channel
pydaybot
Demo bot for Python Day Uruguay 2011
erl-iputils
jupyterhub-stacks
Docker images for a jhub cluster
cld2
Compact Language Detector 2
scrapinghub-stack-hworker
[DEPRECATED] Software stack fully compatible with Scrapy Cloud 1.0
crawlera.com
crawlera.com website
discourse-sso-google
Use Google as Single-Sign-On provider for Discourse
pkg-opengrok
Ubuntu packaging for OpenGrok