  • Language: JavaScript
  • License: MIT License
  • Created about 8 years ago
  • Updated 4 months ago

Repository Details

NPM module: Request a url and scrape the metadata from its HTML using Node.js or the browser.

url-metadata

Request an http(s) url and scrape its html metadata. Includes Open Graph Protocol (og:) and Twitter Card meta tags.

JSON-LD is also supported.

Under the hood, this package does some post-request processing on top of the JavaScript-native fetch API.
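As a rough illustration of what "post-request processing on top of fetch" can look like, here is a minimal sketch. It is not the package's actual parser: the function names `extractMetaTags` and `scrapeMetaTags` are hypothetical, and the regular expressions only handle double-quoted attributes on simple `<meta>` tags.

```javascript
// Illustrative only: not url-metadata's real implementation.
// Collects <meta name="..."> and <meta property="..."> tags into a flat object.
function extractMetaTags (html) {
  const metadata = {}
  const tagPattern = /<meta\s+[^>]*>/gi
  const attrPattern = /([\w:-]+)\s*=\s*"([^"]*)"/g
  for (const tag of html.match(tagPattern) || []) {
    const attrs = {}
    for (const [, name, value] of tag.matchAll(attrPattern)) {
      attrs[name.toLowerCase()] = value
    }
    // Open Graph tags use `property`, most others use `name`
    const key = attrs.property || attrs.name
    if (key && attrs.content !== undefined) {
      metadata[key] = attrs.content
    }
  }
  return metadata
}

// Hypothetical wrapper: fetch the page, then post-process the HTML.
// Assumes a global fetch (Node.js >=18 or a browser).
async function scrapeMetaTags (url) {
  const response = await fetch(url)
  return extractMetaTags(await response.text())
}
```

A real parser would also need to handle single-quoted attributes, JSON-LD script blocks, and malformed HTML, which this sketch deliberately ignores.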

To report a bug or request a feature, please open an issue or pull request on GitHub.

Usage

Works with Node.js version >=18.0.0 or in the browser when bundled (with browserify or webpack for example).

If the JavaScript-native fetch API is not available in your target environment, use the previous version 2.5.0, which relies on the (now-deprecated) request module instead.

Install:

$ npm install url-metadata --save

In your project file:

const urlMetadata = require('url-metadata')

urlMetadata('https://www.npmjs.com/package/url-metadata')
.then((metadata) => {
  console.log(metadata)
  // do stuff with the metadata
},
(err) => {
  console.log(err)
})

To override the default options (see below), pass in a second argument:

const urlMetadata = require('url-metadata')

urlMetadata('https://www.npmjs.com/package/url-metadata', {
  requestHeaders: {
    'User-Agent': 'foo',
    'From': '[email protected]'
  }
}).then((metadata) => {
  console.log(metadata)
  // do stuff with the metadata
}).catch((err) => {
  console.log(err)
})

Options & Defaults

The module's default options are shown below. Any of them can be overridden:

{
  // custom request headers
  requestHeaders: {
    'User-Agent': 'url-metadata/3.0 (npm module)',
    'From': '[email protected]',
  },

  // `fetch` API cache setting for request
  cache: 'no-cache',

  // `fetch` API mode (ex: `cors`, `no-cors`, `same-origin`, etc)
  mode: 'cors',

  // timeout in milliseconds, default is 10 seconds
  timeout: 10000,

  // number of characters to truncate description to
  descriptionLength: 750,

  // force image urls in selected tags to use https,
  // valid for 'image', 'og:image' and 'og:image:secure_url' tags
  ensureSecureImageRequest: true,

  // return raw response body as string
  includeResponseBody: false
}
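To make the `descriptionLength` and `ensureSecureImageRequest` options more concrete, the sketch below shows one way such behavior could be implemented. The helper names `truncateDescription` and `ensureSecureUrl` are hypothetical and are not part of the package's API; the package's own logic may differ.

```javascript
// Illustrative helpers; the names are hypothetical and not part of url-metadata.

// Cut a description down to at most `maxLength` characters.
function truncateDescription (description, maxLength = 750) {
  return description.length > maxLength
    ? description.slice(0, maxLength)
    : description
}

// Rewrite an http:// image URL to https://, leaving other URLs untouched.
function ensureSecureUrl (imageUrl) {
  return imageUrl.replace(/^http:\/\//i, 'https://')
}
```

For example, `ensureSecureUrl` would be applied to the values of the 'image', 'og:image' and 'og:image:secure_url' tags when `ensureSecureImageRequest` is true.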

Returns

Returns a promise that resolves with an object if the response is successful. Note that the url field returned will be the last hop in the request chain, so if you pass in a url generated by a url shortener, you'll get back the final destination as the url.
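The "last hop" behavior follows naturally from the fetch API, where `Response.url` holds the final URL after redirects have been followed. A minimal sketch, assuming a global fetch; `resolveFinalUrl` and its `fetchImpl` parameter are hypothetical names used only for illustration:

```javascript
// Sketch: resolve the final URL of a redirect chain with fetch.
// `resolveFinalUrl` and `fetchImpl` are hypothetical, not part of url-metadata.
async function resolveFinalUrl (url, fetchImpl = fetch) {
  const response = await fetchImpl(url, { redirect: 'follow' })
  // Response.url is the URL after any redirects were followed
  return response.url
}
```

Passing the fetch implementation as a parameter keeps the sketch easy to exercise with a stub instead of a live network request.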