tokio
Web scraping made simple.
Features
- Built on top of jsdom.
- It runs inline and external scripts on the page.
- You can add a resource filter to skip loading certain external resources.
- Simple and fast: only 100 SLOC, and it does not require Electron or Chromium.
Install
yarn add tokio
Usage
const Tokio = require('tokio')
const tokio = new Tokio({
  url: 'https://some-website.com'
})
tokio.fetch().then(html => {
  console.log(html) //=> string
  // Query HTML with cheerio (server-side jQuery)
  // https://github.com/cheeriojs/cheerio
  const $ = tokio.query(html)
})
API
new Tokio(options)
options
options.url
- Type: string
- Required: yes
The URL to fetch.
options.wait
- Type: number | string
- Default: 50
Wait for a certain amount of time (in milliseconds) or for a DOM element to show up.
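As a small illustration, the two forms of options.wait might look like the sketch below. The element form assumes the string is interpreted as a CSS selector, which the description implies but does not state; the URL and the selector are placeholders.

```javascript
// Fixed delay: wait 500 ms before capturing the HTML.
const waitByDelay = {
  url: 'https://some-website.com',
  wait: 500
}

// Element-based: wait until an element shows up.
// Assumption: the string is treated as a CSS selector; '#app' is a placeholder.
const waitByElement = {
  url: 'https://some-website.com',
  wait: '#app'
}
```

Either object would be passed to new Tokio(options).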
options.manually
- Type: boolean | string
Instead of using options.wait, you can manually call window.__tokio_ready__() in your website to tell us that it's ready to be captured.
It can also be a string like i_am_ready so that you can call window.i_am_ready() instead.
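On the page side, the ready signal could be sent with a small helper like the one below. This is only a sketch and not part of tokio: signalReady is a hypothetical name, and it assumes the callback is installed on window under the configured name before the page's scripts run.

```javascript
// Hypothetical page-side helper (not part of tokio).
// Calls the ready callback if it exists on the given window-like object and
// reports whether it was called; does nothing when the callback is absent,
// e.g. in a normal browser session.
function signalReady(win, name = '__tokio_ready__') {
  if (typeof win[name] === 'function') {
    win[name]()
    return true
  }
  return false
}

// Scraper side (for reference):
//   { url: '...', manually: true }          pairs with signalReady(window)
//   { url: '...', manually: 'i_am_ready' }  pairs with signalReady(window, 'i_am_ready')
```

Guarding on the callback's existence keeps the same page code safe to ship to real visitors.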
options.resourceFilter
- Type: resource => boolean
A function that decides whether a given resource is loaded. Check out the resource type.
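A filter along these lines could skip analytics scripts and stylesheets. It is a sketch under two assumptions that should be checked against the resource type: that returning true means "load this resource", and that the resource exposes a url that is either a string or a URL-like object with an href field. The blocked host list is a placeholder.

```javascript
// Placeholder list of hosts whose resources should not be loaded.
const BLOCKED_HOSTS = ['www.google-analytics.com']

// Assumption: `resource.url` is a string or an object with an `href` field.
// Assumption: returning true loads the resource, false skips it.
function resourceFilter(resource) {
  const href = typeof resource.url === 'string' ? resource.url : resource.url.href
  const { hostname, pathname } = new URL(href)
  if (BLOCKED_HOSTS.includes(hostname)) return false // skip trackers
  if (pathname.endsWith('.css')) return false // skip stylesheets
  return true // load everything else
}
```

The function would be passed as options.resourceFilter alongside options.url.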
options.requestOptions
- proxy: string. A URL for an HTTP proxy to use for the requests.
- agent: http(s).Agent instance to use.
- agentOptions: The agent options; defaults to { keepAlive: true, keepAliveMsecs: 115000 }, see http api for more details.
- strictSSL: If true, requires SSL certificates be valid; defaults to true, see request module for more details.
- userAgent: The user agent string used in requests; defaults to Node.js (#process.platform#; U; rv:#process.version#)
- headers: An object giving any headers that will be used while loading the HTML from options.url, if applicable.
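Put together, a set of request options might look like the following sketch. Only the documented field names are used; the proxy address, user agent string and header value are placeholders chosen for illustration.

```javascript
// All concrete values below are placeholders.
const tokioOptions = {
  url: 'https://some-website.com',
  requestOptions: {
    proxy: 'http://127.0.0.1:8080', // route requests through a local HTTP proxy
    strictSSL: true, // reject invalid SSL certificates (the documented default)
    userAgent: 'my-scraper/1.0', // replaces the default Node.js user agent
    headers: { 'Accept-Language': 'en-US' } // sent when loading options.url
  }
}
```

The object would be passed to new Tokio(options).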
options.variables
Inject variables into the global scope (window).
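For instance, flags that page scripts read from window could be supplied as below. This sketch assumes variables is a plain object whose keys become properties of window; the names and values are placeholders, and the last line only imitates the effect on a window-like object rather than showing tokio's own code.

```javascript
// Assumption: each key of `variables` becomes a property on the page's window.
const variables = {
  __SCRAPING__: true, // placeholder flag a page script could check
  API_BASE: 'https://some-website.com/api' // placeholder value
}

const scrapeOptions = {
  url: 'https://some-website.com',
  variables
}

// Illustration only (not tokio's implementation): what a page script would see.
const fakeWindow = Object.assign({}, scrapeOptions.variables)
```

A page script could then branch on window.__SCRAPING__, for example to skip animations.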
tokio.fetch()
- Type: () => Promise<string>
Fetch URL and return corresponding HTML. (JavaScript on this page will be evaluated.)
tokio.query(html, opts)
This is basically cheerio.load(html, opts).
Contributing
- Fork it!
- Create your feature branch: git checkout -b my-new-feature
- Commit your changes: git commit -am 'Add some feature'
- Push to the branch: git push origin my-new-feature
- Submit a pull request :D
Author
tokio © egoist, Released under the MIT License.
Authored and maintained by egoist with help from contributors (list).
github.com/egoist · GitHub @egoist · Twitter @_egoistlily