  • Stars
    4
  • Rank 3,291,711 (Top 66 %)
  • Language
    Python
  • Created over 7 years ago
  • Updated over 7 years ago

Repository Details

Splits a URL into its scheme, service, host domain, top-level domain, port, path, parameters, and query fields.
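
As a rough illustration of the kind of split described above, here is a minimal sketch built on Python's standard `urllib.parse`. It is not the repository's own code: the function name `split_url`, the dictionary keys, and the reading of "service" as the leftmost host label are all assumptions made for this example. The top-level domain is taken naively as the last dot-separated label, so multi-part suffixes such as `co.uk` and bare IP addresses are not handled.

```python
from urllib.parse import urlparse, parse_qs


def split_url(url):
    """Split a URL into named parts (hypothetical helper, not the repo's API)."""
    parts = urlparse(url)
    host = parts.hostname or ""
    labels = host.split(".") if host else []
    return {
        "scheme": parts.scheme,
        # Assumption: "service" means the leftmost label (e.g. "www")
        # when the host has at least three labels.
        "service": labels[0] if len(labels) > 2 else "",
        "host": host,
        # Naive: last label only; does not consult a public suffix list.
        "top_domain": labels[-1] if len(labels) > 1 else "",
        "port": parts.port,          # None when the URL has no explicit port
        "path": parts.path,
        "params": parts.params,      # the ";..." part of the last path segment
        "query": parse_qs(parts.query),
    }


if __name__ == "__main__":
    for key, value in split_url("http://www.example.com:8080/a/b;p=1?x=1&y=2").items():
        print(key, "=", value)
```

Returning a plain dictionary keeps the sketch short; a real implementation would likely need a public suffix list to identify the registrable domain correctly.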

More Repositories

1

MiniORM-MySQL

A lightweight ORM built on MySQLdb that provides formatted MySQL operations.
Python
8 stars
2

HashTree

Builds a hash directory tree from URLs, used by a crawler to store and retrieve web page information locally.
Python
5 stars
3

HtmlExtract-Python

Extracts information from HTML: all text, Chinese text, keywords, title, ICP number, links and the ratio of internal to external links, forms, alerts, meta tags, redirects, sensitive words, and more.
Python
4 stars
4

Eggtart

Eggtart, a distributed framework for processing web page information, covering page crawling, analysis, and business-level handling of the results.
Python
3 stars
5

HtmlExtract-Java

Extracts information from HTML: all text, Chinese text, keywords, title, ICP number, links and the ratio of internal to external links, forms, alerts, meta tags, redirects, sensitive words, and more.
Java
3 stars
6

Daemonize-Manage

A base class for daemon management, providing daemon creation and termination, logging, and child process management.
Python
2 stars
7

MiniORM-beanstalk

A lightweight ORM built on beanstalkc for working with beanstalk.
Python
2 stars
8

JsonExtractor

An extractor for loosely JSON-formatted data that uses a stack and regular expressions to pull out the value of a key at a specified nesting level.
Java
1 star
9

Threadpool

A reusable thread pool that accepts functions, arguments, and callback functions, can terminate all threads immediately, and recycles threads to save time and resources.
Python
1 star