• Stars
    star
    390
  • Rank 110,242 (Top 3 %)
  • Language
    Java
  • License
    Apache License 2.0
  • Created about 6 years ago
  • Updated about 4 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Android 本地网络小说爬虫,基于jsoup及xpath

CrawlerForReader

Android 本地网络小说爬虫,基于 jsoup 与 xpath,通过模版解析网页。

阅读器实现:https://github.com/smuyyh/BookReader

支持书源

/**
 * 所有书源
 */
public static final SparseArray<Source> SOURCES = new SparseArray<Source>() {
    {
        put(SourceID.LIEWEN, new Source(SourceID.LIEWEN, "猎文网", "https://www.liewen.cc/search.php?keyword=%s"));
        put(SourceID.CHINESE81, new Source(SourceID.CHINESE81, "八一中文网", "https://www.zwdu.com/search.php?keyword=%s"));
        put(SourceID.ZHUISHU, new Source(SourceID.ZHUISHU, "追书网", "https://www.zhuishu.tw/search.aspx?keyword=%s"));
        put(SourceID.BIQUG, new Source(SourceID.BIQUG, "笔趣阁", "http://zhannei.baidu.com/cse/search?s=1393206249994657467&q=%s"));
        put(SourceID.WENXUEMI, new Source(SourceID.WENXUEMI, "文学迷", "http://www.wenxuemi.com/search.php?keyword=%s"));
        put(SourceID.CHINESEXIAOSHUO, new Source(SourceID.CHINESEXIAOSHUO, "小说中文网", "http://www.xszww.com/s.php?ie=gbk&s=10385337132858012269&q=%s"));
        put(SourceID.DINGDIAN, new Source(SourceID.DINGDIAN, "顶点小说", "http://zhannei.baidu.com/cse/search?s=1682272515249779940&q=%s"));
        put(SourceID.BIQUGER, new Source(SourceID.BIQUGER, "笔趣阁2", "http://zhannei.baidu.com/cse/search?s=7928441616248544648&ie=utf-8&q=%s"));
        put(SourceID.CHINESEZHUOBI, new Source(SourceID.CHINESEZHUOBI, "着笔中文网", "http://www.zbzw.com/s.php?ie=utf-8&s=4619765769851182557&q=%s"));
        put(SourceID.DASHUBAO, new Source(SourceID.DASHUBAO, "大书包", "http://zn.dashubao.net/cse/search?s=9410583021346449776&entry=1&ie=utf-8&q=%s"));
        put(SourceID.CHINESEWUZHOU, new Source(SourceID.CHINESEWUZHOU, "梧州中文台", "http://www.gxwztv.com/search.htm?keyword=%s"));
        put(SourceID.UCSHUMENG, new Source(SourceID.UCSHUMENG, "UC书盟", "http://www.uctxt.com/modules/article/search.php?searchkey=%s", 4));
        put(SourceID.QUANXIAOSHUO, new Source(SourceID.QUANXIAOSHUO, "全小说", "http://qxs.la/s_%s"));
        put(SourceID.YANMOXUAN, new Source(SourceID.YANMOXUAN, "衍墨轩", "http://www.ymoxuan.com/search.htm?keyword=%s"));
        put(SourceID.AIQIWENXUE, new Source(SourceID.AIQIWENXUE, "爱奇文学", "http://m.i7wx.com/?m=book/search&keyword=%s"));
        put(SourceID.QIANQIANXIAOSHUO, new Source(SourceID.QIANQIANXIAOSHUO, "千千小说", "http://www.xqqxs.com/modules/article/search.php?searchkey=%s", 4));
        put(SourceID.PIAOTIANWENXUE, new Source(SourceID.PIAOTIANWENXUE, "飘天文学网", "http://www.piaotian.com/modules/article/search.php?searchtype=articlename&searchkey=%s"));
        put(SourceID.SUIMENGXIAOSHUO, new Source(SourceID.SUIMENGXIAOSHUO, "随梦小说网", "http://m.suimeng.la/modules/article/search.php?searchkey=%s", 4));
        put(SourceID.DAJIADUSHUYUAN, new Source(SourceID.DAJIADUSHUYUAN, "大家读书苑", "http://www.dajiadu.net/modules/article/searchab.php?searchkey=%s"));
        put(SourceID.SHUQIBA, new Source(SourceID.SHUQIBA, "书旗吧", "http://www.shuqiba.com/modules/article/search.php?searchkey=%s", 4));
        put(SourceID.XIAOSHUO52, new Source(SourceID.XIAOSHUO52, "小说52", "http://m.xs52.com/search.php?searchkey=%s"));
    }
};

模版示例

例如针对猎文网:

{
    "id": 1, //对应书源id
    "search": { //搜索页解析规则
      "charset": "UTF-8",
      "xpath": "//div[@class='result-item result-game-item']",
      "coverXpath": "//div[@class='result-game-item-pic']//a//img/@src",
      "titleXpath": "//div[@class='result-game-item-detail']//h3//a/@title",
      "linkXpath": "//div[@class='result-game-item-detail']//h3//a/@href",
      "authorXpath": "//div[@class='result-game-item-detail']//div[@class='result-game-item-info']//p[1]/span[2]/text()",
      "descXpath": "//div[@class='result-game-item-detail']//p[@class='result-game-item-desc']/text()"
    },
    "catalog": { //目录列表解析规则
      "xpath": "//div[@id=list]//dl//dd",
      "titleXpath": "//a/text()",
      "linkXpath": "//a/@href"
    },
    "content": { //文章内容解析规则
      "xpath": "//div[@id='content']/text()"
    }
  }

调用方式

目前虽然请求结果是通过 Callback形式,因为搜多个源是分批返回结果。内部当仍是同步请求,没有做线程调度。

Crawler.search("你好", new SearchCallback() {
    @Override
    public void onResponse(String keyword, List<SearchBook> appendList) {
    }

    @Override
    public void onFinish() {

    }

    @Override
    public void onError(String msg) {

    }
});
Crawler.catalog(new SearchBook.SL("https://www.liewen.cc/b/24/24934/", SourceManager.SOURCES.get(1)), new ChapterCallback() {
    @Override
    public void onResponse(List<Chapter> chapters) {

    }

    @Override
    public void onError(String msg) {

    }
});

Crawler.content(new SearchBook.SL("https://www.liewen.cc/b/24/24934/", SourceManager.SOURCES.get(1)), "/b/24/24934/12212511.html", new ContentCallback() {
    @Override
    public void onResponse(String content) {

    }

    @Override
    public void onError(String msg) {

    }
});

ScreenShot

Search

SearchResult

BookDetail

ChangeSource

License

   Copyright 2016 smuyyh, All right reserved.

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

More Repositories

1

BookReader

📕 "任阅" 网络小说阅读器,3D翻页效果、txt/pdf/epub书籍阅读、Wifi传书~
Java
6,562
star
2

ImageSelector

🌁 Android 图片选择器。充分自由定制,极大程度简化使用,支持图库多选/图片预览/单选/照片裁剪/拍照/自定义图片加载方式/自定义色调/沉浸式状态栏
Java
1,599
star
3

SprintNBA

🏀 NBA客户端
Java
632
star
4

BubblePopupWindow

Android 实现各个方向的气泡弹窗,可控制气泡尖角偏移量。
Java
345
star
5

BankCardFormat

💳 自动格式化银行卡号的EditText,卡号格式化、归属银行及卡别判断
Java
282
star
6

EasyGuideView

Android app新手引导,任意View高亮提示,简单易用
Java
197
star
7

JsonViewer

Android json viewer, to convert json strings to a friendly readable format, it supports expend&collapsed json objects.
Java
195
star
8

IncrementallyUpdate

Android 应用增量更新
C
179
star
9

EasyAdapter

Android 轻量级适配器,简化使用,适应所有的AbsListView、RecyclerView。支持HeaderView与FooterView~
Java
168
star
10

CommonLibary

CommonLibrary主要是自己整理的一些项目开发中常用的工具类、通用UI的集合,尽可能的覆盖Android开发中通用的一些东西
Java
139
star
11

AutoInstall

Android 无需Root实现APK的静默安装
Java
135
star
12

NestedRecyclerView

RecyclerView 嵌套 多Tab 吸顶容器,类似 盒马、朴朴超市 首页。
Java
111
star
13

StickyHeaderListView

仿滴滴打车开具发票页,ListView粘性Header
Java
89
star
14

HardwareTest

Android 各个硬件模块自动化测试。包括LCD、摄像头、键盘、闪光灯、声音、磁盘存储、震动、触摸屏、NFC及各类传感器的测试。
Java
87
star
15

StickyHeaderRecyclerView

RecyclerView 悬浮吸顶 Header,支持阴影、点击事件与状态绑定
Java
75
star
16

easyDAO

easyDAO is a light-weight&fast ORM library for Android that maps objects to SQLite , it becomes much easier to operate the SQLite database.
Java
35
star
17

ReactNativeTools

IntelliJ plugin, to make it easier to execute react-native commands
Java
26
star
18

GitHubWidgets

GitHub html widget, include User Widget、Repo Widget and Activity Widget.
JavaScript
18
star
19

SimpleMail

一个基于POP3和SMTP协议的邮件收发库
Java
10
star
20

ReflectionDemo

从Java反射机制到Android自定义注解框架
Java
10
star
21

AndroidLayoutIDConverter

Android Studio插件。快速生成findViewById代码/ButtefKnife注解/AndroidAnnotations注解~
Java
4
star
22

JustTest

contributions
Java
2
star