  • Stars
    1,393
  • Rank 33,729 (Top 0.7 %)
  • Language
    JavaScript
  • License
    MIT License
  • Created about 7 years ago
  • Updated over 1 year ago


browserless is an efficient way to interact with a headless browser built on top of puppeteer.

Highlights

Installation

You can install it via npm:

$ npm install browserless puppeteer --save

browserless is backed by puppeteer, so you need to install it as well.

You can use it with puppeteer, puppeteer-core, or puppeteer-firefox interchangeably.

Usage

Here is a full example showcasing the browserless capabilities:

const createBrowser = require('browserless')
const termImg = require('term-img')

// First, create a browserless factory
// that will keep a singleton browser process running
const browser = createBrowser()

// After that, you can create as many browser contexts
// as you need. The browser contexts won't share cookies/cache
// with other browser contexts.
const browserless = await browser.createContext()

// Perform the action you want, e.g., taking a screenshot
const buffer = await browserless.screenshot('http://example.com', {
  device: 'iPhone 6'
})

console.log(termImg(buffer))

// After your task is done, destroy your browser context
await browserless.destroyContext()

// At the end, gracefully shutdown the browser process
await browser.close()

As you can see, browserless uses a single browser process and creates/destroys isolated browser contexts on top of it.
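Because every task creates and destroys its own context on the shared browser process, it helps to guarantee that destroyContext() runs even when the task throws. The sketch below is a hypothetical helper, withContext, that is not part of browserless; it only assumes an object exposing the createContext() and destroyContext() methods used above.

```javascript
// Hypothetical helper (not part of browserless): runs `task` with a fresh
// browser context and always destroys that context afterwards.
async function withContext (browser, task, contextOpts = {}) {
  const browserless = await browser.createContext(contextOpts)
  try {
    return await task(browserless)
  } finally {
    // Runs on success and on failure, so contexts never leak
    await browserless.destroyContext()
  }
}

// Usage sketch, assuming `const createBrowser = require('browserless')`:
// const browser = createBrowser()
// const html = await withContext(browser, ctx => ctx.html('https://example.com'))
// await browser.close()
```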

If you're already using puppeteer, you can switch to browserless with almost no effort.

Additionally, you can use individual browserless packages in your codebase and interact with them from puppeteer.

CLI

With the command-line interface (CLI) you can interact with browserless methods using a terminal, or through an automated system:

Just install @browserless/cli globally in your system using your favorite package manager:

npm install -g @browserless/cli

Initializing a browser

The main browserless method creates a headless browser.

const createBrowser = require('browserless')

const browser = createBrowser({
  timeout: 25000,
  lossyDeviceName: true,
  ignoreHTTPSErrors: true
})

Once the browser is initialized, some high-level browser methods are available:

// Now, just call `createContext` for creating a browser tab
const browserless = await browser.createContext({ retry: 2 })

const buffer = await browserless.screenshot('https://example.com')

// Call `destroyContext` to close the browser tab.
await browserless.destroyContext()

The browser keeps running until you explicitly close it:

// At the end, gracefully shutdown the browser process
await browser.close()

.constructor(options)

You can pass any puppeteer.launch#options.

Additionally, you can set up:

defaultDevice

type: string
default: 'Macbook Pro 13'

Sets a consistent device viewport for each page.

lossyDeviceName

type: boolean
default: false

It enables lossy detection over the device descriptor input.

const browserless = require('browserless')({ lossyDeviceName: true })

browserless.getDevice({ device: 'macbook pro 13' })
browserless.getDevice({ device: 'MACBOOK PRO 13' })
browserless.getDevice({ device: 'macbook pro' })
browserless.getDevice({ device: 'macboo pro' })

This setting helps find the device even if the descriptor name doesn't match exactly.
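To give an idea of what "lossy" matching can mean, here is a hypothetical sketch that normalizes case and whitespace and then falls back to the closest name by Levenshtein edit distance. This is an assumption for illustration only: browserless may use a different fuzzy-matching strategy, and findDevice is a made-up name.

```javascript
// Hypothetical lossy device lookup; NOT the matching algorithm browserless uses.

// Lowercase and drop whitespace so 'MACBOOK PRO 13' and 'macbook pro 13' match
const normalize = name => name.toLowerCase().replace(/\s+/g, '')

// Classic dynamic-programming Levenshtein (edit) distance
function editDistance (a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i])
  for (let j = 1; j <= b.length; j++) dp[0][j] = j
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + cost // substitution
      )
    }
  }
  return dp[a.length][b.length]
}

// Return the exact (normalized) match if any, otherwise the closest name
function findDevice (input, deviceNames) {
  const needle = normalize(input)
  const exact = deviceNames.find(name => normalize(name) === needle)
  if (exact) return exact
  let best = null
  let bestDistance = Infinity
  for (const name of deviceNames) {
    const distance = editDistance(needle, normalize(name))
    if (distance < bestDistance) {
      best = name
      bestDistance = distance
    }
  }
  return best
}
```

Ties are resolved in favor of the first name in the list, so with ['Macbook Pro 13', 'Macbook Pro 15'] an input such as 'macboo pro' resolves to 'Macbook Pro 13'.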

mode

type: string
default: launch
values: 'launch' | 'connect'

It defines whether the browser should be started with puppeteer.launch or attached with puppeteer.connect.
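The choice between the two modes can be sketched as a small dispatcher. spawnBrowser is hypothetical and not taken from the browserless source; it only relies on puppeteer.launch(options) and puppeteer.connect(options), where connect typically receives a browserWSEndpoint.

```javascript
// Hypothetical dispatcher for a `mode` option: 'launch' spawns a new browser,
// 'connect' attaches to an already running one. Not the browserless source.
function spawnBrowser (puppeteer, { mode = 'launch', ...opts } = {}) {
  if (mode === 'launch') return puppeteer.launch(opts)
  if (mode === 'connect') return puppeteer.connect(opts) // e.g. { browserWSEndpoint }
  throw new TypeError(`Unknown mode: ${mode}`)
}
```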

timeout

type: number
default: 30000

This setting will change the default maximum navigation time.

puppeteer

type: Puppeteer
default: puppeteer|puppeteer-core|puppeteer-firefox

It's automatically detected from your dependencies; the supported packages are puppeteer, puppeteer-core, and puppeteer-firefox.

.createContext(options)

After initializing the browser, you can create a browser context, which is equivalent to opening a tab:

const browserless = await browser.createContext({
  retry: 2
})

Every browser context is isolated: contexts won't share cookies/cache with each other. Each context can also have its own options.

options

Any browser.createIncognitoBrowserContext#options can be passed.

Additionally, you can set up:

retry

type: number
default: 2

The number of retries that can be performed before considering a navigation as failed.
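Assuming retry counts extra attempts on top of the first one (so retry: 2 means up to three navigations), the behaviour can be sketched with a generic helper. withRetry is hypothetical and not the browserless implementation.

```javascript
// Hypothetical retry helper: with `retries = 2` the action is attempted up to
// 3 times (1 try + 2 retries) before the last error is rethrown.
async function withRetry (action, retries = 2) {
  let lastError
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await action(attempt)
    } catch (error) {
      lastError = error // remember the failure and try again if allowed
    }
  }
  throw lastError
}
```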

.browser()

It returns the internal Browser instance.

const headlessBrowser = await browser.browser()

console.log('My headless browser PID is', headlessBrowser.process().pid)

.respawn()

It will respawn the internal browser.

const getPID = async promise => (await promise).process().pid

console.log('Process PID:', await getPID(browser.browser()))

await browser.respawn()

console.log('Process PID:', await getPID(browser.browser()))

This method is an implementation detail; normally you don't need to call it.

.close()

It will close the internal browser.

const { onExit } = require('signal-exit')
// automatically tear down resources after
// `process.exit` is called
onExit(browser.close)

Built-in

.html(url, options)

It serializes the content from the target url into HTML.

const html = await browserless.html('https://example.com')

console.log(html)
// => "<!DOCTYPE html><html><head>…"

options

See browserless.goto for all the supported options and values.

.text(url, options)

It serializes the content from the target url into plain text.

const text = await browserless.text('https://example.com')

console.log(text) 
// => "Example Domain\nThis domain is for use in illustrative…"

options

See browserless.goto for all the supported options and values.

.pdf(url, options)

It generates the PDF version of the website at the target URL.

const buffer = await browserless.pdf('https://example.com')

console.log(`PDF generated in ${buffer.byteLength} bytes`)

options

This method uses the following options by default:

{
  margin: '0.35cm',
  printBackground: true,
  scale: 0.65
}

See browserless.goto for all the supported options and values.

Also, any page.pdf option is supported.

Additionally, you can set up:

margin

type: string | string[]
default: '0.35cm'

It sets paper margins. All possible units are:

  • px for pixel.
  • in for inches.
  • cm for centimeters.
  • mm for millimeters.

You can pass an object specifying each side of the paper:

const buffer = await browserless.pdf(url.toString(), {
  margin: {
    top: '0.35cm',
    bottom: '0.35cm',
    left: '0.35cm',
    right: '0.35cm'
  }
})

Or, if you pass a string, it will be used for all sides:

const buffer = await browserless.pdf(url.toString(), {
  margin: '0.35cm'
})
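Both shapes can be reduced to the per-side object that page.pdf expects. The normalizeMargin helper below is a hypothetical sketch; in particular, filling missing sides with '0' is an assumption, not documented browserless behaviour.

```javascript
// Hypothetical helper: a single string applies to every side, while an
// object is used per side. Missing sides fall back to '0' (an assumption).
function normalizeMargin (margin) {
  const sides = ['top', 'bottom', 'left', 'right']
  if (typeof margin === 'string') {
    return Object.fromEntries(sides.map(side => [side, margin]))
  }
  return Object.fromEntries(sides.map(side => [side, margin[side] ?? '0']))
}
```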

.screenshot(url, options)

It takes a screenshot from the target url.

const buffer = await browserless.screenshot('https://example.com')

console.log(`Screenshot taken in ${buffer.byteLength} bytes`)

options

This method uses the following options by default:

{
  device: 'macbook pro 13'
}

See browserless.goto for all the supported options and values.

Also, any page.screenshot option is supported.

Additionally, you can set up:

codeScheme

type: string
default: 'atom-dark'

When this value is present and the response 'Content-Type' header is JSON, it beautifies the content into highlighted HTML markup using Prism.

The syntax highlighting theme can be customized by setting:

  • A prism-themes identifier (e.g., 'dracula').
  • A remote URL (e.g., 'https://unpkg.com/prism-theme-night-owl').

element

type: string

Capture the DOM element matching the given CSS selector. It will wait for the element to appear in the page and to be visible.

overlay

type: object

After the screenshot has been taken, this option lets you place it into a fancy overlay.

You can configure the overlay specifying:

  • browser: the browser image overlay to use; supported values are light and dark.
  • background: the background to use. It accepts:
    • A hexadecimal/rgb/rgba color code, e.g., #c1c1c1.
    • A CSS gradient, e.g., linear-gradient(225deg, #FF057C 0%, #8D0B93 50%, #321575 100%).
    • An image URL, e.g., https://source.unsplash.com/random/1920x1080.

const buffer = await browserless.screenshot(url.toString(), {
  styles: [
    '.crisp-client, #cookies-policy { display: none; }'
  ],
  overlay: {
    browser: 'dark',
    background:
      'linear-gradient(45deg, rgba(255,18,223,1) 0%, rgba(69,59,128,1) 66%, rgba(69,59,128,1) 100%)'
  }
})

.destroyContext(options)

It will destroy the current browser context.

const browserless = await browser.createContext({ retry: 0 })

const content = await browserless.html('https://example.com')

await browserless.destroyContext()

options

force

type: string
default: 'force'

When force is passed, it avoids recreating the context in case a browser action is being executed.

.getDevice(options)

Given a specific device descriptor, this method returns the device settings for it.

browserless.getDevice({ device: 'Macbook Pro 15' })

// => {
//   userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36',
//   viewport: {
//     width: 1440,
//     height: 900,
//     deviceScaleFactor: 2,
//     isMobile: false,
//     hasTouch: false,
//     isLandscape: false
//   }
// }

It extends puppeteer.devices, adding some devices that are missing there.

options

device

type: string

The device descriptor name. It's used to find the rest of the presets associated with it.

When lossyDeviceName is enabled, a fuzzy search is performed instead of a strict one, to maximize the chance of getting a result back.

viewport

type: object

Extra viewport settings that will be merged with the device presets.

browserless.getDevice({
  device: 'iPad',
  viewport: {
    isLandscape: true
  }
})

headers

type: object

Extra headers that will be merged with the device presets.

browserless.getDevice({
  device: 'iPad',
  headers: {
    'user-agent': 'googlebot'
  }
})
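The merge described for viewport and headers can be sketched as a shallow object merge over the preset that leaves the preset itself untouched. mergeDevice is hypothetical, and the preset shape ({ userAgent, viewport }) is an assumption; real presets come from getDevice().

```javascript
// Hypothetical sketch: caller-supplied `viewport` and `headers` are
// shallow-merged over a device preset without mutating the preset.
function mergeDevice (preset, { viewport = {}, headers = {} } = {}) {
  return {
    ...preset,
    viewport: { ...preset.viewport, ...viewport },
    headers: { ...preset.headers, ...headers }
  }
}
```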

.evaluate(fn, gotoOpts)

It exposes an interface for creating your own evaluate function, passing you the page and response.

The fn will receive page and response as arguments:

const ping = browserless.evaluate((page, response) => ({
  statusCode: response.status(),
  url: response.url(),
  redirectUrls: response.request().redirectChain()
}))

await ping('https://example.com')
// {
//   "statusCode": 200,
//   "url": "https://example.com/",
//   "redirectUrls": []
// }

You don't need to close the page; it will be closed automatically.

Internally, the method performs a browserless.goto, so you can pass extra goto options as the second parameter:

const serialize = browserless.evaluate(
  page => page.evaluate(() => document.body.innerText),
  {
    waitUntil: 'domcontentloaded'
  }
)

await serialize('https://example.com')
// => 'Example Domain…'
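The page lifecycle behind evaluate() (create a page, navigate, run fn, always close the page) can be sketched with injected dependencies. makeEvaluate, createPage, and goto are hypothetical stand-ins for browserless.page() and browserless.goto(), not the library's internals.

```javascript
// Hypothetical re-implementation of the evaluate() pattern. `createPage` and
// `goto` stand in for browserless.page() and browserless.goto(); the page is
// always closed, even if `fn` throws.
function makeEvaluate ({ createPage, goto }) {
  return (fn, gotoOpts = {}) => async url => {
    const page = await createPage()
    try {
      const { response } = await goto(page, { ...gotoOpts, url })
      return await fn(page, response)
    } finally {
      await page.close()
    }
  }
}
```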

.goto(page, options)

It performs a page.goto with a lot of extra capabilities:

const page = await browserless.page()
const { response, device } = await browserless.goto(page, { url: 'http://example.com' })

options

Any option passed here will be passed through to page.goto.

Additionally, you can set up:

abortTypes

type: array
default: []

It aborts requests based on their ResourceType.
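A handler implementing this kind of filtering could look like the sketch below. createAbortHandler is hypothetical; it uses puppeteer's HTTPRequest resourceType(), abort(), and continue() methods, which require request interception to be enabled on the page.

```javascript
// Hypothetical request handler for the abortTypes idea: requests whose
// resource type is listed are aborted, everything else continues.
function createAbortHandler (abortTypes = []) {
  const blocked = new Set(abortTypes)
  return request =>
    blocked.has(request.resourceType()) ? request.abort() : request.continue()
}

// Usage with a puppeteer page (not run here):
// await page.setRequestInterception(true)
// page.on('request', createAbortHandler(['image', 'font']))
```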

adblock

type: boolean
default: true

It enables the built-in adblocker by Cliqz, which aborts unnecessary third-party requests associated with ad services.

animations

type: boolean
default: false

When disabled, CSS animations and transitions are turned off and prefers-reduced-motion is set accordingly.

click

type: string | string[]

Click the DOM element matching the given CSS selector.

colorScheme

type: string
default: 'no-preference'

Sets the prefers-color-scheme CSS media feature, used to detect whether the user has requested a 'light' or 'dark' color theme.

device

type: string
default: 'macbook pro 13'

It specifies the device descriptor to use in order to retrieve userAgent and viewport.

headers

type: object

An object containing additional HTTP headers to be sent with every request.

const createBrowser = require('browserless')

const browser = createBrowser()
const browserless = await browser.createContext()

const page = await browserless.page()
await browserless.goto(page, {
  url: 'http://example.com',
  headers: {
    'user-agent': 'googlebot',
    cookie: 'foo=bar; hello=world'
  }
})

This sets visibility: hidden on the matched elements.

html

type: string

If you provide HTML markup, it's loaded with page.setContent, avoiding fetching the content from the target URL.

javascript

type: boolean
default: true

When it's false, it disables JavaScript on the current page.

mediaType

type: string
default: 'screen'

Changes the CSS media type of the page using page.emulateMediaType.

modules

type: string | string[]

Injects <script type="module"> into the browser page.

It can accept:

  • Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/@microlink/[email protected]/src/browser.js').
  • Local file (e.g., 'local-file.js').
  • Inline code (e.g., "document.body.style.backgroundColor = 'red'").

const buffer = await browserless.screenshot(url.toString(), {
  modules: [
    'https://cdn.jsdelivr.net/npm/@microlink/[email protected]/src/browser.js',
    'local-file.js',
    "document.body.style.backgroundColor = 'red'"
  ]
})

onPageRequest

type: function

Associates a handler with every request in the page.

scripts

type: string | string[]

Injects <script> into the browser page.

It can accept:

  • Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/@microlink/[email protected]/src/browser.js').
  • Local file (e.g., 'local-file.js').
  • Inline code (e.g., "document.body.style.backgroundColor = 'red'").

const buffer = await browserless.screenshot(url.toString(), {
  scripts: [
    'https://cdn.jsdelivr.net/npm/jquery/dist/jquery.min.js',
    'local-file.js',
    "document.body.style.backgroundColor = 'red'"
  ]
})

Prefer to use modules whenever possible.
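One way to tell the three input forms apart and map them onto puppeteer's page.addScriptTag() options (url, path, content, and type) is sketched below. The classification rules are assumptions for illustration, not how browserless decides; classifyInjection and toScriptTag are hypothetical.

```javascript
// Hypothetical rules: an http(s) prefix means a remote URL, a name without
// whitespace ending in .js, .mjs, or .css means a local file, and anything
// else is treated as inline code.
function classifyInjection (value) {
  if (/^https?:\/\//i.test(value)) return 'url'
  if (/^\S+\.(m?js|css)$/i.test(value)) return 'file'
  return 'inline'
}

// Map a value onto the option object accepted by page.addScriptTag():
// { url }, { path }, or { content }, plus type: 'module' for ES modules.
function toScriptTag (value, { asModule = false } = {}) {
  const kind = classifyInjection(value)
  const key = kind === 'url' ? 'url' : kind === 'file' ? 'path' : 'content'
  return asModule ? { [key]: value, type: 'module' } : { [key]: value }
}
```

The same classification would map onto page.addStyleTag() for styles, which accepts url, path, and content.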

scroll

type: string

Scroll to the DOM element matching the given CSS selector.

styles

type: string | string[]

Injects <style> into the browser page.

It can accept:

  • Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/[email protected]/dist/dark.css').
  • Local file (e.g., 'local-file.css').
  • Inline code (e.g., "body { background: red; }").

const buffer = await browserless.screenshot(url.toString(), {
  styles: [
    'https://cdn.jsdelivr.net/npm/[email protected]/dist/dark.css',
    'local-file.css',
    'body { background: red; }'
  ]
})

timezone

type: string

It changes the timezone of the page.

url

type: string

The target URL.

viewport

It sets up a custom viewport using the page.setViewport method.

waitForSelector

type: string

Wait for the DOM element matching the given CSS selector to appear, using page.waitForSelector.

waitForTimeout

type: number

Wait for the given amount of time, in milliseconds, using page.waitForTimeout.

waitUntil

type: string | string[]
default: 'auto'
values: 'auto' | 'load' | 'domcontentloaded' | 'networkidle0' | 'networkidle2'

When to consider navigation succeeded.

If you provide an array of event strings, navigation is considered to be successful after all events have been fired.

Events can be either:

  • 'auto': A combination of 'load' and 'networkidle2', applied smartly to wait the minimum time necessary.
  • 'load': Consider navigation to be finished when the load event is fired.
  • 'domcontentloaded': Consider navigation to be finished when the DOMContentLoaded event is fired.
  • 'networkidle0': Consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.
  • 'networkidle2': Consider navigation to be finished when there are no more than 2 network connections for at least 500 ms.
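The networkidle0 and networkidle2 rules come down to counting in-flight requests over time. The class below is a hypothetical, timestamp-driven model of that rule, not puppeteer's implementation; it is mainly useful for reasoning about why a page that keeps polling may never become idle.

```javascript
// Hypothetical model of the networkidle rule (not puppeteer's code):
// navigation counts as idle once there have been no more than `maxInflight`
// open requests for at least `idleMs` milliseconds. Timestamps are passed in
// explicitly, and the clock is assumed to start at 0 with no requests.
class NetworkIdleTracker {
  constructor (maxInflight, idleMs = 500) {
    this.maxInflight = maxInflight // 0 for networkidle0, 2 for networkidle2
    this.idleMs = idleMs
    this.inflight = 0
    this.quietSince = 0 // when the count last dropped to <= maxInflight, null while busy
  }

  requestStarted () {
    this.inflight++
    if (this.inflight > this.maxInflight) this.quietSince = null
  }

  requestFinished (now) {
    this.inflight--
    if (this.inflight <= this.maxInflight && this.quietSince === null) {
      this.quietSince = now
    }
  }

  isIdle (now) {
    return this.quietSince !== null && now - this.quietSince >= this.idleMs
  }
}
```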

.context()

It returns the BrowserContext associated with your instance.

const browserContext = await browserless.context()

console.log({ isIncognito: browserContext.isIncognito() })
// => { isIncognito: true }

.page()

It returns a standalone Page associated with the current browser context.

const page = await browserless.page()
await page.content()
// => '<html><head></head><body></body></html>'

Extended

function

The @browserless/function package provides an isolated vm scope to run arbitrary JavaScript code with runtime access to a browser page:

const createFunction = require('@browserless/function')

const code = async ({ page }) => page.evaluate('jQuery.fn.jquery')

const version = createFunction(code)

const { isFulfilled, isRejected, value } = await version('https://jquery.com')

// => {
//   isFulfilled: true,
//   isRejected: false,
//   value: '1.13.1'
// }

options

Besides the following properties, any other argument provided will be available during the code execution.

vmOpts

The hosted code is also running inside a secure sandbox created via vm2.
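The idea of exposing extra arguments to the hosted code can be sketched with Node's built-in vm module. This swaps in node:vm for vm2 purely for illustration, and runWithArgs is hypothetical; node:vm is not a security sandbox, so it must not be used for untrusted code.

```javascript
const vm = require('node:vm')

// Hypothetical sketch: evaluate `code` with every extra argument exposed as a
// global variable. Uses node:vm in place of vm2; node:vm is NOT a security
// boundary, so never run untrusted code like this.
function runWithArgs (code, args = {}, { timeout = 1000 } = {}) {
  const context = vm.createContext({ ...args })
  return vm.runInContext(code, context, { timeout })
}
```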

gotoOpts

Any goto#options can be passed for tuning the internal URL resolution.

lighthouse

The @browserless/lighthouse package provides you the setup for running Lighthouse reports backed by browserless.

const createLighthouse = require('@browserless/lighthouse')
const createBrowser = require('browserless')
const { writeFile } = require('fs/promises')
const { onExit } = require('signal-exit')

const browser = createBrowser()
onExit(browser.close)

const lighthouse = createLighthouse(async teardown => {
  const browserless = await browser.createContext()
  teardown(() => browserless.destroyContext())
  return browserless
})

const report = await lighthouse('https://microlink.io')
await writeFile('report.json', JSON.stringify(report, null, 2))

The report will be generated for the given URL, extending the lighthouse:default settings, which are the same settings used by the Google Chrome Audits reports in Developer Tools.

options

The Lighthouse configuration that will extend 'lighthouse:default' settings:

const report = await lighthouse(url, { 
  onlyAudits: ['accessibility'] 
})

Also, you can extend from a different preset of settings:

const report = await lighthouse(url, { 
  preset: 'desktop', 
  onlyAudits: ['accessibility'] 
})

The Lighthouse execution runs in a worker thread, so any worker#options are supported.

Additionally, you can set up:

logLevel

type: string
default: 'error'
values: 'silent' | 'error' | 'info' | 'verbose'

The level of logging to enable.

output

type: string | string[]
default: 'json'
values: 'json' | 'csv' | 'html'

The type(s) of report output to be produced.

timeout

type: number
default: browserless.timeout

This setting will change the default maximum navigation time.

screencast

The @browserless/screencast package allows you to automate browser actions and produce a video recording as output.

const screencast = require('@browserless/screencast')
const execa = require('execa') // assumes execa v5, which exposes execa.command()

const buffer = await screencast({
  getBrowserless: () => browserless,
  format: 'webm',
  ffmpegPath: await execa.command('which ffmpeg').then(({ stdout }) => stdout),
  gotoOpts: {
    url: 'https://vercel.com',
    animations: true,
    abortTypes: [],
    waitUntil: 'load'
  },
  withPage: async page => {
    await page.waitForTimeout(7000)
  }
})

options

ffmpegPath

type: string

The path to the ffmpeg binary.

format

type: string
values: 'mp4' | 'gif' | 'webm'
default: 'webm'

The video output format.

frames

These options will be passed to Page.startScreencast.

gotoOpts

type: object

These options are passed as goto#options to resolve the page before the recording starts.

timeout

type: number
default: 30000

Sets the maximum navigation time.

tmpPath

type: string
default: os.tmpdir()

The temporary directory for writing the video. ffmpeg needs it, and it will be cleaned up before the function finishes.
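The create-then-clean-up behaviour of tmpPath can be sketched with fs.mkdtemp() and fs.rm(). withTmpDir and the 'screencast-' prefix are hypothetical; the sketch only shows the pattern of removing the directory in a finally block.

```javascript
const { mkdtemp, rm } = require('node:fs/promises')
const os = require('node:os')
const path = require('node:path')

// Hypothetical helper: create a unique working directory (e.g. for ffmpeg
// frames), run `task` in it, and always remove the directory afterwards,
// even if `task` throws.
async function withTmpDir (task, tmpPath = os.tmpdir()) {
  const dir = await mkdtemp(path.join(tmpPath, 'screencast-'))
  try {
    return await task(dir)
  } finally {
    await rm(dir, { recursive: true, force: true })
  }
}
```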

withPage(page)

type: function

It sets the in-page browser action to perform during the video recording.

Packages

browserless is internally divided into multiple packages, so you only use the minimum amount of code necessary for your use case.

Package Version
browserless npm
@browserless/benchmark npm
@browserless/cli npm
@browserless/devices npm
@browserless/errors npm
@browserless/examples npm
@browserless/function npm
@browserless/goto npm
@browserless/lighthouse npm
@browserless/pdf npm
@browserless/screencast npm
@browserless/screenshot npm

FAQ

Q: Why use browserless over puppeteer?

browserless doesn't replace puppeteer; it complements it. It's just a syntactic sugar layer over the official Headless Chrome, oriented toward production scenarios.

Q: Why do you block ad scripts by default?

Headless navigation is expensive compared with just fetching the content from a website.

To speed up the process, we block ad scripts by default, since they add a lot of bloat.

Q: My output is different from what I expected

Probably browserless was too smart and it blocked a request that you need.

You can activate debug mode using the DEBUG=browserless environment variable to see what is happening under the hood.

Consider opening an issue with the debug trace.

Q: I want to use browserless with my AWS Lambda like project

Yes, check chrome-aws-lambda to set up AWS Lambda with a compatible binary.

License

browserless © Microlink, released under the MIT License.
Authored and maintained by Microlink with help from contributors.

The logo has been designed by xinh studio.

microlink.io · GitHub microlinkhq · Twitter @microlinkhq
