  • Stars: 2,330
  • Rank 19,757 (Top 0.4 %)
  • Language: Python
  • License: MIT License
  • Created over 11 years ago
  • Updated over 1 year ago

Repository Details

Web Scraping Framework

Grab Framework Project

Status of Project

I have not used Grab myself for many years, and I am not sure anybody is using it at present. Nonetheless, I decided to refactor the project, just for fun. I have annotated the whole code base with type hints checked by mypy in strict mode. The whole code base also complies with pylint and flake8 requirements. There are a few exceptions: very large methods, and classes with too many local attributes and variables. I will refactor them eventually.

The only network backend at present is urllib3.

I have moved a few components into external packages: proxylist, procstat, selection, unicodec, user_agent.

Feel free to give feedback in Telegram groups: @grablab and @grablab_ru

Things to be done next

  • Refactor source code to remove all pylint disable comments like:
    • too-many-instance-attributes
    • too-many-arguments
    • too-many-locals
    • too-many-public-methods
  • Reach 100% test coverage (it is about 95% now)
  • Release a new version to PyPI
  • Refactor more components into external packages
  • More abstract interfaces
  • More data structures and types
  • Decouple connections between internal components
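
As an illustration of the first item, here is a minimal sketch of how a `too-many-arguments` disable comment can be removed by grouping parameters into a data structure. It is not taken from the Grab code base: `build_request_before`, `build_request_after` and `RequestOptions` are hypothetical names made up for this example.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


# Before: six parameters exceed pylint's default max-args of 5,
# so the check has to be silenced with a disable comment.
def build_request_before(  # pylint: disable=too-many-arguments
    url: str,
    method: str = "GET",
    timeout: float = 10.0,
    proxy: Optional[str] = None,
    user_agent: Optional[str] = None,
    retries: int = 0,
) -> Dict[str, Any]:
    return {
        "url": url,
        "method": method,
        "timeout": timeout,
        "proxy": proxy,
        "user_agent": user_agent,
        "retries": retries,
    }


# After: the options travel together in one typed, immutable object
# (hypothetical class, not part of Grab).
@dataclass(frozen=True)
class RequestOptions:
    url: str
    method: str = "GET"
    timeout: float = 10.0
    proxy: Optional[str] = None
    user_agent: Optional[str] = None
    retries: int = 0


# One parameter, so no disable comment is needed.
def build_request_after(options: RequestOptions) -> Dict[str, Any]:
    return asdict(options)
```

Both functions return the same dictionary for the same inputs. The second form also touches the "more data structures and types" item from the list above.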

Installation

The following command installs the old Grab released in 2018: pip install -U grab

The updated Grab available in the GitHub repository is not compatible at all with spiders and crawlers written for the Grab released in 2018.

Documentation

Updated documentation is here: https://grab.readthedocs.io/en/latest/ Most updates are removals of content related to features I have removed from Grab since 2018.

Documentation for the old Grab version 0.6.41 (released in 2018) is here: https://grab.readthedocs.io/en/v0.6.41-doc/
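
Because the 2018 release and the repository version are not compatible, it can help to check which one is installed before picking a documentation link. The sketch below is only an illustration. It assumes that any installed version up to and including 0.6.41 is the old release; the version number of the updated Grab is not stated in this README. The helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional, Tuple

# Version named in this README as the old 2018 release.
LEGACY_VERSION = (0, 6, 41)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.6.41' into (0, 6, 41); parsing stops at the first non-numeric part."""
    parts = []
    for chunk in text.split("."):
        if not chunk.isdigit():
            break
        parts.append(int(chunk))
    return tuple(parts)


def is_legacy(version_text: str) -> bool:
    """Assumption: everything up to 0.6.41 counts as the old release."""
    return parse_version(version_text) <= LEGACY_VERSION


def installed_grab_version() -> Optional[str]:
    """Return the installed version of the 'grab' distribution, or None."""
    try:
        return metadata.version("grab")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_grab_version()
    if version is None:
        print("grab is not installed")
    elif is_legacy(version):
        print(f"grab {version}: old release, see the v0.6.41 documentation")
    else:
        print(f"grab {version}: see the latest documentation")
```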

More Repositories

  1. awesome-web-scraping (Makefile, 6,049 stars): List of libraries, tools and APIs for web scraping and data processing.
  2. user_agent (Python, 317 stars): Generator of User-Agent header
  3. captcha_solver (Python, 229 stars): Universal python API to captcha solving services
  4. awesome-osint (99 stars): Yet another list of OSINT tools
  5. awesome-pastebin (77 stars): List of pastebin services
  6. ru-osint-infosec-map (JavaScript, 33 stars): Graph of OSINT and InfoSec resources in Russian language
  7. awesome-anti-captcha (16 stars): Curated list of captcha solving software, libraries and API.
  8. proxylist (Python, 16 stars): Python library to work with proxy server items loaded from local file or network document.
  9. awesome-python-dev (12 stars): List of tools for debugging, profiling and analyzing python programs.
  10. selection (Python, 10 stars): API to extract data from HTML and XML documents
  11. learning-web-scraping (9 stars): A list of articles and books teaching web scraping
  12. runscript (Python, 8 stars): Simple script launcher
  13. procstat (Python, 6 stars): A tool to count runtime metrics
  14. pyproject (Makefile, 6 stars): Python Project Template for Cookiecutter
  15. badserver (Python, 4 stars): Bad Bad Server
  16. awesome-geoint (4 stars): Tools for GEOINT
  17. 3proxy_confgen (Python, 4 stars): 3proxy config generator to use upstream proxies
  18. unicodec (Python, 3 stars): Tools to detect encoding and convert HTML bytes content to Unicode.
  19. mongodb_toolbox (Python, 3 stars): Tools to automate mongodb read/write operations.
  20. test_server (Python, 3 stars): Server to test HTTP clients
  21. iohub (Python, 1 star): Dashboard to monitor ioweb crawlers
  22. rucaptcha (Python, 1 star): Python library to access rucaptcha/twocaptcha API
  23. mongoenum (Python, 1 star): Script to enumerate sizes of mongodb databases, collections and indexes.
  24. captcha_solution (Python, 1 star): A simple interface to multiple captcha solving services