This repository has been archived on 10/Nov/2017.

Node proxy server attempting to fetch readable contents from any provided URL.

readable-proxy

Proxy server to retrieve a readable version of any provided URL, powered by Node, PhantomJS and Readability.js.

Installation

$ git clone https://github.com/n1k0/readable-proxy
$ cd readable-proxy
$ npm install

Run

Start the server on localhost:3000:

$ npm start

Note about CORS: by design, the server allows requests from any origin, so browsers can consume it from pages hosted on a different domain.
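The allow-any-origin policy comes down to one response header. Below is a minimal sketch of how it can be applied in a plain Node `http` server. The `withCors` wrapper is a hypothetical name used for illustration; it is not code taken from readable-proxy itself.

```javascript
const http = require("http");

// Hypothetical wrapper: adds a wildcard CORS header to every response
// before delegating to the wrapped request handler.
function withCors(handler) {
  return function (req, res) {
    // "*" lets pages served from any origin read the response.
    res.setHeader("Access-Control-Allow-Origin", "*");
    return handler(req, res);
  };
}

// Example JSON handler wrapped with the CORS header.
// The server is created here but not started.
const server = http.createServer(
  withCors(function (req, res) {
    res.setHeader("Content-Type", "application/json");
    res.end(JSON.stringify({ ok: true }));
  })
);
```

The API only serves plain `GET` requests without custom headers. Browsers treat those as simple cross-origin requests and send no preflight, so this single header is enough for them.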

Configuration

By default, the proxy server uses the Readability.js version it ships with. To use a different version, set the READABILITY_LIB_PATH environment variable to the absolute path of the library file on your local system:

$ READABILITY_LIB_PATH=/path/to/my/own/version/of/Readability.js npm start

Usage

Web UI

Just head to http://localhost:3000/, enter a URL and start enjoying both original and readable renderings side by side.

REST/JSON API

The HTTP REST API is available under /api.

Disclaimer: this is probably far from a truly RESTful implementation.

GET /api/get

Required parameters
  • url: The URL to retrieve readable contents from, e.g. https://nicolas.perriault.net/code/2013/get-your-frontend-javascript-code-covered/.
Optional parameters
  • sanitize: A boolean string to enable HTML sanitization (valid truthy boolean strings: "1", "on", "true", "yes", "y"; everything else is considered falsy).
  • userAgent: A custom User-Agent string. Defaults to the PhantomJS User-Agent.
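The `sanitize` parsing rule can be expressed as a small helper. `parseBooleanString` is a hypothetical name. Exact, case-sensitive matching is also an assumption of this sketch, because the list above does not say whether values such as `"Y"` or `"TRUE"` are accepted.

```javascript
// Truthy values listed for the `sanitize` parameter.
const TRUTHY_STRINGS = new Set(["1", "on", "true", "yes", "y"]);

// Returns true only for the listed strings. Anything else is falsy,
// including a missing parameter (null from URLSearchParams.get).
// Exact, case-sensitive matching is an assumption of this sketch.
function parseBooleanString(value) {
  return typeof value === "string" && TRUTHY_STRINGS.has(value);
}

const query = new URLSearchParams("sanitize=y&url=https://example.com/");
console.log(parseBooleanString(query.get("sanitize"))); // true
console.log(parseBooleanString(query.get("missing")));  // false
```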

Note: Enabling content sanitization discards the Readability.js-specific HTML semantics, but it is probably safer if you plan to publish retrieved contents on a public website.

Example

Content sanitization enabled:

$ curl "http://localhost:3000/api/get?sanitize=y&url=https://nicolas.perriault.net/code/2013/get-your-frontend-javascript-code-covered/"
{
  "byline":"Nicolas Perriault —",
  "content":"<p><strong>So finally you&#39;re <a href=\"https://nicolas.perriault.net/code/2013/testing-frontend-javascript-code-using-mocha-chai-and-sinon/\">testing",
  "length":2867,
  "title":"Get your Frontend JavaScript Code Covered | Code",
  "uri":"https://nicolas.perriault.net/code/2013/get-your-frontend-javascript-code-covered/",
  "isProbablyReaderable": true
}

Content sanitization disabled (default):

$ curl "http://localhost:3000/api/get?url=https://nicolas.perriault.net/code/2013/get-your-frontend-javascript-code-covered/"
{
  "byline":"Nicolas Perriault —",
  "content":"<div id=\"readability-page-1\" class=\"page\"><section class=\"\">\n<p><strong>So finally you're…",
  "length":3851,
  "title":"Get your Frontend JavaScript Code Covered | Code",
  "uri":"https://nicolas.perriault.net/code/2013/get-your-frontend-javascript-code-covered/",
  "isProbablyReaderable": true
}

Note: the isProbablyReaderable property indicates whether Readability considers the page contents parseable.
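Below is a sketch of a JavaScript client for `GET /api/get`. Building the query with `URL` and `URLSearchParams` percent-encodes the target URL, which matters when the target URL itself contains `?`, `&` or `=`. The following are all assumptions rather than documented behavior:

  • `buildGetUrl` and `fetchReadable` are hypothetical helpers.
  • The global `fetch` requires Node 18 or later.
  • Rejecting when `isProbablyReaderable` is false is a design choice of this sketch, not documented server behavior.

```javascript
// Builds a /api/get request URL. URLSearchParams percent-encodes the target
// URL, so its own "?", "&" or "=" characters cannot leak into the proxy's query.
function buildGetUrl(baseUrl, targetUrl, options) {
  const opts = options || {};
  const endpoint = new URL("/api/get", baseUrl);
  endpoint.searchParams.set("url", targetUrl);
  // "1" is one of the documented truthy strings for `sanitize`.
  if (opts.sanitize) endpoint.searchParams.set("sanitize", "1");
  if (opts.userAgent) endpoint.searchParams.set("userAgent", opts.userAgent);
  return endpoint.toString();
}

// Fetches the readable version and rejects on HTTP errors or when
// Readability does not consider the page parseable (requires global fetch).
async function fetchReadable(baseUrl, targetUrl, options) {
  const response = await fetch(buildGetUrl(baseUrl, targetUrl, options));
  if (!response.ok) throw new Error("HTTP " + response.status);
  const result = await response.json();
  if (!result.isProbablyReaderable) throw new Error("Page is probably not readerable");
  return result;
}
```

For example, `fetchReadable("http://localhost:3000", url, { sanitize: true })` should resolve to an object shaped like the sanitized JSON response shown above.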

Usage from Node

scrape() function

The scrape function scrapes a URL and returns a Promise resolving to the JSON result object described above:

var scrape = require("readable-proxy").scrape;
var url = "https://nicolas.perriault.net/code/2013/get-your-frontend-javascript-code-covered/";

// Resolves with the same JSON object as GET /api/get; errors are logged.
scrape(url, {sanitize: true, userAgent: "My custom User-Agent string"})
  .then(console.log.bind(console))
  .catch(console.error.bind(console));

Tests

$ npm test

License

MPL 2.0.
