Scrapers

A list of scrapers from around the web.

Use the Table of Contents to find your way around. It lists every scraper, with easy navigation to each one's pros and cons and links to their respective websites.

Please contribute by adding links, adding pros/cons, titles, or anything else you think would be helpful! Please help maintain alphabetical order.

Table Of Contents

Apifier

Description: Cloud-based scraper for JavaScript.

Applicable Language(s)

  • JavaScript

Beautiful Soup

Description: A Python library for navigating and parsing results from the Web. It allows searching the HTML tree to find various tags.

Applicable Language(s)

  • Python

Browse AI

Description: Browse AI is a cloud-based SaaS that lets you extract and monitor structured data from any website with no code, through a click-and-extract interface. It also comes with a REST API, webhooks, and native integrations with tools like Google Sheets.

Applicable Language(s)

  • C
  • Clojure
  • C#
  • Go
  • Java
  • Node
  • Objective-C
  • Ocaml
  • PHP
  • Python
  • Ruby
  • Shell
  • Swift

Cheerio

Description: Fast, flexible & lean implementation of core jQuery designed specifically for the server.

Applicable Language(s)

  • JavaScript

Clearbit

Description: Service for looking up company and people information.

Applicable Language(s)


Common Crawl

Description: Open dataset of crawled websites.

Applicable Language(s)


Crawly

Description: Automatic service that turns a website into structured data in the form of JSON or CSV.

Applicable Language(s)


Dexi.io

Description: Website data extraction using a visual programming language.

Applicable Language(s)


Diffbot

Description: Automated tool for extracting structured information from pages, crawling websites, and turning a website into an API.

Applicable Language(s)


Diggernaut

Description: Cloud-based web scraping platform.

Applicable Language(s)

  • SML
  • JavaScript

Pros

  • Scrapers can be built using a visual tool and a scraping meta-language
  • Can execute JS snippets inside a scraper
  • Supports Selenium (optionally) and OCR
  • Automated data validation and export to any text-based format
  • Scrapers can be run manually or on a schedule in the cloud, or compiled and run locally
  • Full automation using the API and integrations with other APIs

Cons

  • Currently in beta
  • Doesn't support PDF parsing yet

eLink

Description: Tool to mine LinkedIn profiles based on keywords.

Applicable Language(s)


EliteProxySwitcher

Description: Local software that can download a proxy list and let users choose which one to use.

Applicable Language(s)


Email Hunter

Description: API to find e-mail addresses for a given domain name.

Applicable Language(s)


FiveFilters

Description: Provides various website extraction and transformation tools, such as Full-Text RSS and Term Extraction, as services.

Applicable Language(s)


FMiner

Description: Local software for web scraping using a recorder and a visual programming language.

Applicable Language(s)


FullContact

Description: API to retrieve more information on a person.

Applicable Language(s)


Grabby

Description: Service that searches a website for e-mails.
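
As a rough illustration of the general idea (finding e-mail addresses in page text), here is a minimal Python sketch using only the standard library. It is not Grabby's implementation. The function name `find_emails` and the deliberately simple pattern are assumptions made for this example, and the pattern will miss some valid addresses.

```python
import re

# A deliberately simple pattern (an assumption for this sketch); it does not
# cover every address allowed by the e-mail standards.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return unique e-mail addresses found in text, lowercased, in first-seen order."""
    found = []
    for match in EMAIL_RE.findall(text):
        address = match.lower()
        if address not in found:
            found.append(address)
    return found


page_text = "Contact Sales@Example.com or support@example.org. Again: sales@example.com"
print(find_emails(page_text))
```

In practice the text would come from pages fetched from the target site; fetching is left out here to keep the sketch self-contained.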

Applicable Language(s)


HrefScrap

Description: A Chrome extension that scrapes all the hrefs from a web page.
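
To make the idea concrete, here is a minimal Python sketch of collecting every href from a page, using only the standard library's `html.parser`. It is not the extension's code. The names `HrefCollector` and `extract_hrefs` are hypothetical, and the sample HTML is made up for the example.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hrefs(html):
    """Return all anchor hrefs found in an HTML string, in document order."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


sample = '<p><a href="https://example.com">Home</a> <a href="/about">About</a> <a>no link</a></p>'
print(extract_hrefs(sample))
```

Anchors without an href are skipped, and relative links such as `/about` are returned as written rather than resolved against a base URL.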

Applicable Language(s)


Import.io

Description: Automated tool to extract structured information from websites.

Applicable Language(s)


Kimonolabs

Description: Kimono was acquired by Palantir. This was a cloud-based service for turning websites into structured APIs. Now they offer a desktop-based alternative for continuing to use their tools.

Applicable Language(s)


lxml

Description: lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language.

Pros

Applicable Language(s)

  • Python

Mozenda

Description: Extract structured information from HTML, PDF, Excel, and Word by clicking on document elements.

Applicable Language(s)


Morph.io

Description: Based on ScraperWiki, run scrapers in Python, Ruby, R, Perl or Node.js.

Applicable Language(s)

  • Node.js
  • Perl
  • Python
  • R
  • Ruby

Node-Crawler

Description: Web Crawler/Spider for NodeJS + server-side jQuery

Applicable Language(s)

  • Node.js

Nutch

Description: Web crawler that can be combined with the Hadoop ecosystem to run in a cluster.

Applicable Language(s)


Outwit Hub

Description: Application that can extract information from a website and turn it into structured data (CSV, Excel, etc.).

Applicable Language(s)


Octoparse

Description: The free web scraping tool for extracting all the web page data into several structured file formats easily and effectively.

Applicable Language(s)


rvest

Description: R package to scrape information from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, inspired by libraries like Beautiful Soup.

Applicable Language(s)

  • R

scrape-it

Description: A Node.js scraper for humans.

Applicable Language(s)

  • JavaScript (Node.js)

Scraper.AI

Description: Scraper.AI is an automated scraping SaaS that makes extracting data from any webpage as simple as clicking and selecting what you want. With a few clicks you can gather thousands of records.

Best of all, changes to the selections are monitored as often as you want. Updates are pushed to a consumable API for you to build on.

Applicable Language(s)

  • Any, through a JSON API and (optional) webhook

ScraperAPI

Description: ScraperAPI is a tool for developers building web scrapers. It handles proxies, browsers, and CAPTCHAs so developers can get the raw HTML from any website with a simple API call.

It’s the ultimate web scraping service for developers, with special pools of proxies for ecommerce price scraping, search engine scraping, social media scraping, sneaker scraping, ticket scraping and more.

Applicable Language(s)

  • Python
  • NodeJS
  • PHP
  • Ruby
  • Java

ScraperWiki

Description: Write a scraper in the browser and run it on their cloud-based service. This is used by many news organisations.

Applicable Language(s)


ScrapingAnt

Description: ScrapingAnt is a Headless Chrome scraping API and free checked proxies service. ScrapingAnt supports JavaScript rendering, premium rotating proxies, and CAPTCHA-avoidance tools. Free plans are available.

Applicable Language(s)

  • Any, through a JSON API

Scrapinghub

Description: Scraper cloud hosting as a service. Allows developers to deploy their own scrapers on their platform and benefit from their existing infrastructure.

Applicable Language(s)


Scrapper

Description: Scrapper is a powerful web scraping tool with a built-in headless browser and Read mode for parsing. It has a simple and beautiful web interface, a REST API, and can search for news links on websites. Other features include stealth mode, caching results, page screenshots, proxy support, and full customization. Scrapper is delivered as a Docker image and is free to use.

Applicable Language(s)

  • Any, through a JSON API

Screen Scraper

Description: Local tool for scraping websites.

Applicable Language(s)


Toofr

Description: Service for looking up business e-mails.

Applicable Language(s)


UBot Studio

Description: Web automation software using a visual programming language and recorder.

Applicable Language(s)


UiPath

Description: Visual tool for GUI automation by recording.

Applicable Language(s)


Venom

Description: Venom is an open source focused crawler for the Deep Web.

Features

  • Multi-threaded
  • Structured crawling
  • Page Validation
  • Automatic Retries
  • Proxy support

Applicable Language(s)

  • Java

Web Robots

Description: Data as a Service platform for web scraping.

Pros

  • Scraping dynamic, JavaScript-heavy websites
  • Login and form fill on websites
  • Data normalization and validation
  • Data uploads

Cons

  • Currently in beta
  • Possible payment model in the future

Applicable Language(s)


Web Scraper

Description: Extension that downloads websites and turns them into structured data. Data is selected by element or by specialised selectors (e.g., for tables).

Applicable Language(s)


WrapAPI

Description: Turn a website into an API. The structure of the data is defined by clicking elements or regular expressions.

Applicable Language(s)


X-Ray

Description: NPM module for scraping structured data via jQuery-like selectors.

Applicable Language(s)

  • JavaScript (Node.js)

ZenRows

Description: Web scraping API and proxy server designed to bypass anti-bot solutions while offering JavaScript rendering, rotating proxies, and geotargeting.

Applicable Language(s)

  • Any, using an API or proxy
  • JavaScript (Node.js SDK available)
