Zyte (Formerly Scrapinghub) Sample Projects
This repo contains a few sample projects demonstrating the capabilities of Zyte (formerly Scrapinghub) technologies.
There is not much to see here yet, but stay tuned, we're just getting started!
portia
Visual scraping for Scrapy

splash
Lightweight, scriptable browser as a service with an HTTP API

dateparser
Python parser for human-readable dates

frontera
A scalable frontier for web crawlers

slackbot
A chat bot for Slack (https://slack.com).

extruct
Extract embedded metadata from HTML markup

scrapyrt
HTTP API for Scrapy spiders

python-crfsuite
A Python binding for crfsuite

spidermon
Scrapy extension for monitoring spider execution.

price-parser
Extract price amount and currency symbol from a raw text string

article-extraction-benchmark
Article extraction benchmark: dataset and evaluation scripts

webstruct
NER toolkit for HTML data

python-scrapinghub
A client interface for Scrapinghub's API

adblockparser
Python parser for Adblock Plus filters

js2xml
Convert JavaScript code to an XML document

testspiders
Useful test spiders for Scrapy

scrapy-training
Scrapy Training companion code

skinfer
Skinfer is a tool for inferring and merging JSON schemas

shub
Scrapinghub Command Line Client

python-simhash
An efficient simhash implementation for Python

scrapy-poet
Page Object pattern for Scrapy

number-parser
Parse numbers written in natural language

mdr
A Python library to detect and extract listing data from HTML pages.

web-poet
Web scraping Page Objects core library

aile
Automatic Item List Extraction

wappalyzer-python
UNMAINTAINED Python wrapper for Wappalyzer (utility that uncovers the technologies used on websites)

pydepta
A Python implementation of DEPTA

scrapinghub-stack-scrapy
Software stack with latest Scrapy and updated deps

aduana
Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even when making big crawls (one billion pages).

scrapy-autoextract
Zyte Automatic Extraction integration for Scrapy

scrapy-autounit
Automatic unit test generation for Scrapy.

learn.scrapinghub.com
Scrapinghub Learning Center. Report issues in Jira: https://scrapinghub.atlassian.net/projects/WEB

portia2code

arche
Analyze scraped data

scmongo
MongoDB extensions for Scrapy

exporters
Exporters is an extensible export pipeline library that supports filters, transforms, and several sources and destinations

webpager
Paginating the web

scrapy-frontera
More flexible and featured Frontera scheduler for Scrapy

page_clustering
A simple algorithm for clustering web pages, suitable for crawlers

flatson
Tool to flatten a stream of JSON-like objects, configured via schema

scaws
Extensions for using Scrapy on Amazon AWS

docker-images

scrapylib
Collection of Scrapy utilities (extensions, middlewares, pipelines, etc.)

pycon-speakers
Speakers Spider (PyCon 2014 sprint)

docker-devpi
PyPI caching service using devpi and Docker

crawlera-tools
Crawlera tools

scrapinghub-entrypoint-scrapy
Scrapy entrypoint for Scrapinghub job runner

scrapy-mosquitera
Restrict crawl and scraping scope using matchers.

andi
Library for annotation-based dependency injection

kafka-scanner
High Level Kafka Scanner

autoextract-spiders
Pre-built Scrapy spiders for AutoExtract

python-cld2
Python bindings for CLD2.

product-extraction-benchmark

python-hubstorage
Deprecated HubStorage client library - please use python-scrapinghub>=1.9.0 instead

shublang
Pluggable DSL that uses pipes to perform a series of linear transformations to extract data

shub-workflow

shubc
Go bindings for Scrapinghub HTTP API and a sweet command line tool for Scrapy Cloud

scrapinghub-stack-portia
Software stack used to run Portia spiders in Scrapinghub cloud

tutorials

pastebin

navscraper
Vanguard ETF NAV scraper

varanus
A command line spider monitoring tool

hcf-backend
Crawl Frontier HCF backend

pydatanyc

autoextract-poet
web-poet definitions for AutoExtract

collection-scanner
HubStorage collection scanner library

locode

adblockgoparser
Golang parser for Adblock Plus filters

autoextract-examples

webstruct-demo
HTTP demo for https://github.com/scrapinghub/webstruct

shub-image
Deprecated client side tool to prepare docker images to run crawlers in Scrapinghub - please use shub>=2.5.0 instead

docker-cloudera-manager
Run Cloudera Manager in Docker

custom-images-examples
Examples of custom images running on Scrapinghub platform

hubstorage-frontera
Hubstorage crawl frontier backend for Frontera

httpation

xpathcsstutorial
[Work in progress] XPath & CSS for web scraping tutorial

epmdless_dist

egraylog

scrapinghub-conda-recipes
Conda packages for scrapinghub channel

pydaybot
Demo bot for Python Day Uruguay 2011

erl-iputils

jupyterhub-stacks
Docker images for jhub cluster

cld2
Compact Language Detector 2

scrapinghub-stack-hworker
[DEPRECATED] Software stack fully compatible with Scrapy Cloud 1.0

crawlera.com
crawlera.com website

discourse-sso-google
Use Google as Single-Sign-On provider for Discourse

pkg-opengrok
Ubuntu packaging for OpenGrok