• Stars: 6,049
• Rank: 6,656 (Top 0.2%)
• Language: Makefile
• License: Other
• Created: over 9 years ago
• Updated: about 1 year ago


Repository Details

List of libraries, tools and APIs for web scraping and data processing.

Awesome Web Scraping

A list of tools, programming libraries, and web services used for web scraping and data processing.

Feel free to give feedback or ask web scraping questions in the Telegram groups @grablab (English) and @grablab_ru (Russian).
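Much of what these libraries automate is extracting structured data from fetched HTML. Below is a minimal sketch of that step using only Python's standard-library `html.parser`. The `LinkExtractor` class and `extract_links` helper are hypothetical names and are not the API of any library in this list. The input is an in-memory string, so fetching the page and detecting its encoding are deliberately left out.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect (href, text) pairs for every <a> element."""

    def __init__(self) -> None:
        # convert_charrefs=True (the default) delivers text with entities decoded
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value is None for bare attributes
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only buffer text while inside an <a> that has an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: return the document's links in order of appearance."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<ul><li><a href="/grab">grab</a></li><li><a href="/selection">selection</a></li></ul>'
    print(extract_links(page))
```

Text inside nested tags (for example `<a><b>...</b></a>`) is concatenated into the link text, because `handle_data` keeps buffering until the closing `</a>`.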

Programming Libraries

Other Things

Captcha Solving Services

These two links point to the same captcha service; they are just different language versions.

Proxy Server Marketplaces

The largest marketplaces in the world, with offers from hundreds of sellers and services:

How to Contribute to this List

See the Contributing guide.

Credits

The list was initially based on data from these sources: awesome-python, awesome-php, awesome-ruby, ruby-nlp, and awesome-javascript.

More Repositories

1. grab: Web Scraping Framework (Python, 2,330 stars)
2. user_agent: Generator of User-Agent header (Python, 317 stars)
3. captcha_solver: Universal python API to captcha solving services (Python, 229 stars)
4. awesome-osint: Yet another list of OSINT tools (99 stars)
5. awesome-pastebin: List of pastebin services (77 stars)
6. ru-osint-infosec-map: Graph of OSINT and InfoSec resources in Russian language (JavaScript, 33 stars)
7. awesome-anti-captcha: Curated list of captcha solving software, libraries and API (16 stars)
8. proxylist: Python library to work with proxy server items loaded from local file or network document (Python, 16 stars)
9. awesome-python-dev: List of tools for debugging, profiling and analyzing python programs (12 stars)
10. selection: API to extract data from HTML and XML documents (Python, 10 stars)
11. learning-web-scraping: A list of articles and books teaching web scraping (9 stars)
12. runscript: Simple script launcher (Python, 8 stars)
13. procstat: A tool to count runtime metrics (Python, 6 stars)
14. pyproject: Python Project Template for Cookiecutter (Makefile, 6 stars)
15. badserver: Bad Bad Server (Python, 4 stars)
16. awesome-geoint: Tools for GEOINT (4 stars)
17. 3proxy_confgen: 3proxy config generator to use upstream proxies (Python, 4 stars)
18. unicodec: Tools to detect encoding and convert HTML bytes content to Unicode (Python, 3 stars)
19. mongodb_toolbox: Tools to automate mongodb read/write operations (Python, 3 stars)
20. test_server: Server to test HTTP clients (Python, 3 stars)
21. iohub: Dashboard to monitor ioweb crawlers (Python, 1 star)
22. rucaptcha: Python library to access rucaptcha/twocaptcha API (Python, 1 star)
23. mongoenum: Script to enumerate sizes of mongodb databases, collections and indexes (Python, 1 star)
24. captcha_solution: A simple interface to multiple captcha solving services (Python, 1 star)
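The `unicodec` entry describes a recurring scraping chore: detecting a page's encoding and decoding its bytes to Unicode. The sketch below does not use or reproduce `unicodec`'s API. It is plain standard-library Python, and `detect_encoding` and `html_to_unicode` are hypothetical names.

It assumes roughly the precedence browsers apply:

1. a byte-order mark;
2. then the `charset` parameter of the HTTP `Content-Type` header;
3. then a `charset` declared in the markup.

The UTF-8 fallback and the 4 KB scan window are assumptions. The HTML standard's prescan, for comparison, looks at the first 1024 bytes.

```python
import codecs
import re
from typing import Optional, Tuple

# Byte-order marks, longest first so a UTF-32 LE BOM is not read as UTF-16 LE
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches charset=... in a Content-Type header or in a <meta> tag
_CHARSET_RE = re.compile(rb"""charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)


def _normalize(label: bytes) -> Optional[str]:
    """Return Python's canonical codec name, or None if the label is unknown."""
    try:
        return codecs.lookup(label.decode("ascii")).name
    except LookupError:
        return None


def detect_encoding(content: bytes, content_type: Optional[str] = None,
                    default: str = "utf-8") -> Tuple[str, int]:
    """Hypothetical helper: return (encoding, number of BOM bytes to skip)."""
    # 1. A byte-order mark wins over any declaration
    for bom, name in _BOMS:
        if content.startswith(bom):
            return name, len(bom)
    # 2. charset parameter of the HTTP Content-Type header
    if content_type:
        match = _CHARSET_RE.search(content_type.encode("ascii", "ignore"))
        if match:
            enc = _normalize(match.group(1))
            if enc:
                return enc, 0
    # 3. charset declared in the markup; only the first 4 KB are scanned (assumed limit)
    match = _CHARSET_RE.search(content[:4096])
    if match:
        enc = _normalize(match.group(1))
        if enc:
            return enc, 0
    # 4. Assumed fallback when nothing usable was found
    return default, 0


def html_to_unicode(content: bytes, content_type: Optional[str] = None) -> str:
    """Hypothetical helper: decode HTML bytes, replacing undecodable bytes with U+FFFD."""
    enc, skip = detect_encoding(content, content_type)
    return content[skip:].decode(enc, errors="replace")
```

Unknown charset labels are ignored rather than raising, and decoding uses `errors="replace"`. A single mislabeled page therefore yields replacement characters instead of aborting a crawl.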