  • Stars: 2,480
  • Rank 18,539 (Top 0.4%)
  • Language: HTML
  • Created over 12 years ago
  • Updated over 3 years ago

Repository Details

📚 Turn any web page into a clean view

Readability

Turn any web page into a clean view. This module is based on arc90's readability project.

Features

  1. Optimized for more websites.
  2. Supporting HTML5 tags (article, section) and Microdata API.
  3. Focusing on both accuracy and performance. 4x faster than arc90's version.
  4. Supporting encodings such as GBK and GB2312.
  5. Converting relative URLs to absolute ones for images and links automatically (thanks to Guillermo Baigorria & Tom Sutton).
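
To illustrate the idea behind feature 5, relative URLs can be resolved against the page URL with the URL class built into current Node.js versions. This is only a sketch of the general technique, not the module's internal code; the helper name toAbsolute and the example.com addresses are made up for illustration.

```javascript
// Hypothetical helper (not part of node-readability): resolve a possibly
// relative href against the URL of the page it was found on.
function toAbsolute(href, pageUrl) {
  try {
    return new URL(href, pageUrl).href;
  } catch (e) {
    // Leave values that cannot be parsed as URLs untouched.
    return href;
  }
}

console.log(toAbsolute('/images/a.png', 'http://example.com/posts/1'));
console.log(toAbsolute('../b.html', 'http://example.com/posts/1'));
```

Links that are already absolute pass through unchanged, so the same helper can be applied to every href and src attribute on a page.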

Example

Before -> After

Install

$ npm install node-readability

Note that from v2.0.0, this module only works with Node.js >= 2.0. In the meantime you are still welcome to install a release in the 1.x series (by npm install node-readability@1) if you use an older Node.js version.

Usage

read(html [, options], callback)

Where

  • html is a URL or HTML code.
  • options is an optional options object.
  • callback is the callback to run: callback(error, article, meta).

Example

var read = require('node-readability');

read('http://howtonode.org/really-simple-file-uploads', function(err, article, meta) {
  // Main Article
  console.log(article.content);
  // Title
  console.log(article.title);

  // HTML Source Code
  console.log(article.html);
  // DOM
  console.log(article.document);

  // Response Object from Request Lib
  console.log(meta);

  // Close article to clean up jsdom and prevent leaks
  article.close();
});

NB: If the page is marked with a charset other than utf-8, it will be converted automatically. Charsets such as GBK and GB2312 are also supported.
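
If you prefer Promises over callbacks, the read(html [, options], callback) signature can be wrapped by hand. The sketch below is an assumption-laden illustration, not part of the module: readAsync and fakeRead are hypothetical names, and the read function is passed in as an argument so the example runs without the package installed. In real use you would pass require('node-readability') instead of fakeRead.

```javascript
// Hypothetical helper: wraps a callback-style read(html [, options], callback)
// function in a Promise that resolves with both callback results.
function readAsync(read, html, options) {
  return new Promise(function(resolve, reject) {
    read(html, options || {}, function(err, article, meta) {
      if (err) return reject(err);
      resolve({ article: article, meta: meta });
    });
  });
}

// Stand-in for the real module, used only to demonstrate the wrapper.
function fakeRead(html, options, callback) {
  var article = { title: 'Demo', content: '<p>Hello</p>', close: function() {} };
  callback(null, article, { statusCode: 200 });
}

readAsync(fakeRead, '<html>...</html>').then(function(result) {
  console.log(result.article.title);
  // Closing the article is still the caller's job.
  result.article.close();
});
```

A hand-written wrapper is used here rather than util.promisify because the callback delivers two values (article and meta), and a Promise can only resolve with one.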

Options

node-readability passes the options directly to request. See the request library's documentation for all available options.

node-readability has two additional options:

  • cleanRulers, which lets you set your own validation rules for tags.

Each rule is a function callback(obj, tagName). If it returns true, the rule is valid for that tag; otherwise it is not. options.cleanRulers = [callback(obj, tagName)]

read(url, {
  cleanRulers: [
    function(obj, tag) {
      if (tag === 'object') {
        if (obj.getAttribute('class') === 'BrightcoveExperience') {
          return true;
        }
      }
    }
  ]
}, function(err, article, response) {
  //...
});

  • preprocess, which should be a function that checks or modifies the downloaded source before it is passed to readability.

options.preprocess = callback(source, response, contentType, callback);

read(url, {
  preprocess: function(source, response, contentType, callback) {
    // maxBodySize must be defined by the caller
    if (source.length > maxBodySize) {
      return callback(new Error('too big'));
    }
    callback(null, source);
  }
}, function(err, article, response) {
  //...
});
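
Both options take plain functions, so they can be built once and reused across calls. The following is a minimal sketch under that assumption; ruleForClass and limitBodySize are hypothetical helper names, not part of the module, and the 1 MiB limit is an arbitrary example value.

```javascript
// Hypothetical factory for a cleanRulers entry: the returned rule reports
// true for elements with the given tag name and class attribute.
function ruleForClass(tagName, className) {
  return function(obj, tag) {
    return tag === tagName && obj.getAttribute('class') === className;
  };
}

// Hypothetical factory for a preprocess function: rejects sources longer
// than maxBodySize characters and passes everything else through unchanged.
function limitBodySize(maxBodySize) {
  return function(source, response, contentType, callback) {
    if (source.length > maxBodySize) {
      return callback(new Error('too big'));
    }
    callback(null, source);
  };
}

// Options object that could be passed as the second argument to read().
var options = {
  cleanRulers: [ruleForClass('object', 'BrightcoveExperience')],
  preprocess: limitBodySize(1024 * 1024)
};
```

Because the factories return ordinary functions, they can be tested with stub elements and stub callbacks without fetching any page.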

article object

content

The article content of the web page. It is false if extraction failed.

title

The article title of the web page. It may not be the same as the text in the <title> tag.

textBody

A string containing all the text found on the page.

html

The original HTML of the web page.

document

The document of the web page generated by jsdom. You can use it to access the DOM directly (for example, article.document.getElementById('main')).
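
As a sketch of how these fields might be consumed together, the helper below builds a short summary and checks for the false value that content takes on failure. The name summarize, the maxLength parameter, and the demo object are all hypothetical; the demo object only mimics the fields described above.

```javascript
// Hypothetical helper: builds a small summary from an article object.
function summarize(article, maxLength) {
  // content is false when extraction failed.
  if (article.content === false) {
    return { ok: false, title: article.title, excerpt: '' };
  }
  // Collapse whitespace in textBody before truncating it.
  var text = String(article.textBody || '').replace(/\s+/g, ' ').trim();
  return {
    ok: true,
    title: article.title,
    excerpt: text.length > maxLength ? text.slice(0, maxLength) + '...' : text
  };
}

// Plain object standing in for a real article.
var demo = { title: 'Demo', content: '<p>Hello   world</p>', textBody: 'Hello   world' };
console.log(summarize(demo, 5));
```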

meta object

The response object from the request library. It is useful if you need the final URL after all redirects or want to inspect response headers.
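
A rough sketch of reading those values follows. It assumes that meta has the shape of a request library response, with the fetched URL available at meta.request.uri.href and lowercased header names in meta.headers; treat both as assumptions to verify against the request version you use. describeResponse and fakeMeta are hypothetical names.

```javascript
// Hypothetical helper: pulls a few values out of the meta (response) object.
// Missing fields are reported as null instead of throwing.
function describeResponse(meta) {
  var uri = meta && meta.request && meta.request.uri;
  var headers = (meta && meta.headers) || {};
  return {
    finalUrl: uri ? uri.href : null,
    statusCode: meta ? meta.statusCode : null,
    contentType: headers['content-type'] || null
  };
}

// Stand-in object that mimics the assumed response shape.
var fakeMeta = {
  statusCode: 200,
  headers: { 'content-type': 'text/html; charset=gbk' },
  request: { uri: { href: 'http://example.com/final' } }
};
console.log(describeResponse(fakeMeta));
```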

Why not Cheerio

This library uses jsdom instead of cheerio to parse HTML because some data, such as image size and element visibility, cannot be obtained with cheerio, and its absence would significantly affect the result.

Contributors

https://github.com/luin/node-readability/graphs/contributors

License

This code is under the Apache License 2.0. http://www.apache.org/licenses/LICENSE-2.0
