• Stars: 119
• Rank: 297,930 (Top 6 %)
• Language: JavaScript
• License: MIT License
• Created about 10 years ago
• Updated over 7 years ago

HTML to JSON

Parses HTML strings into objects using flexible, composable filters.

Installation

npm install html-to-json

htmlToJson.parse(html, filter, [callback]) -> promise

The parse() method takes a string of HTML and a filter, and resolves with the filtered data. It supports both callbacks and promises.

var promise = htmlToJson.parse('<div>content</div>', {
  'text': function ($doc) {
    return $doc.find('div').text();
  }
}, function (err, result) {
  console.log(result);
});

promise.done(function (result) {
  //Works as well
});
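
The combination of callback and promise support described above can be sketched as follows. This is a minimal illustration under stated assumptions, not html-to-json's implementation: parseLike and applyFilter are hypothetical names, the input is treated as a plain string rather than a parsed DOM, and a native Promise stands in for whatever promise library the module uses.

```javascript
// Hypothetical stand-in for the real filtering step: call each function in
// the filter with the input and collect the return values under the same keys.
function applyFilter(input, filter) {
  var result = {};
  Object.keys(filter).forEach(function (key) {
    result[key] = filter[key](input);
  });
  return result;
}

// Hypothetical dual-style API: always returns a promise, and additionally
// invokes a Node-style callback when one is supplied.
function parseLike(input, filter, callback) {
  var promise = new Promise(function (resolve) {
    // An exception thrown by a filter becomes a rejection.
    resolve(applyFilter(input, filter));
  });

  if (typeof callback === 'function') {
    promise.then(function (result) {
      callback(null, result);
    }, function (err) {
      callback(err);
    });
  }

  return promise;
}

var lengthFilter = {
  'length': function (text) {
    return text.length;
  }
};

// Callback style
parseLike('<div>content</div>', lengthFilter, function (err, result) {
  console.log(result.length); // 18
});

// Promise style
parseLike('<div>content</div>', lengthFilter).then(function (result) {
  console.log(result.length); // 18
});
```

Returning the promise even when a callback is given lets callers choose either style without the function needing two code paths.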

htmlToJson.request(requestOptions, filter, [callback]) -> promise

The request() method takes options for a call to the request library and a filter, then returns the filtered response body.

var promise = htmlToJson.request('http://prolificinteractive.com/team', {
  'images': ['img', function ($img) {
    return $img.attr('src');
  }]
}, function (err, result) {
  console.log(result);
});

htmlToJson.batch(html, dictionary, [callback]) -> promise

Performs many parsing operations against one HTML string. The HTML is transformed into a DOM only once, rather than once for each filter in the dictionary, which would quickly become expensive. This also lets you break your filters into more granular components and mix and match them as you please.

The values in the dictionary can be htmlToJson.Parser objects, generated methods from htmlToJson.createMethod, or naked filters that you might normally pass into htmlToJson.parse. For example:

return getProlificHomepage().then(function (html) {
  return htmlToJson.batch(html, {
    sections: htmlToJson.createParser(['#primary-nav a', {
      'name': function ($section) {
        return $section.text();
      },
      'link': function ($section) {
        return $section.attr('href');
      }
    }]),
    offices: htmlToJson.createMethod(['.office', {
      'location': function ($office) {
        return $office.find('.location').text();
      },
      'phone': function ($office) {
        return $office.find('.phone').text();
      }
    }]),
    socialInfo: ['#footer .social-link', {
      'name': function ($link) {
        return $link.text();
      },
      'link': function ($link) {
        return $link.attr('href');
      }
    }]
  });
});
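
The benefit of parsing once and sharing the result can be illustrated with a small dependency-free sketch. Everything here is an assumption made for illustration: toyParse is a hypothetical stand-in for DOM construction (it only collects opening tag names), batchLike is a hypothetical helper rather than the library's batch(), and the dictionary holds plain functions only.

```javascript
var parseCount = 0;

// Hypothetical stand-in for DOM construction: collects the names of all
// opening tags. A real parser would build a full tree here.
function toyParse(html) {
  parseCount += 1;
  var tags = [];
  var pattern = /<([a-z][a-z0-9]*)\b/gi;
  var match;
  while ((match = pattern.exec(html)) !== null) {
    tags.push(match[1].toLowerCase());
  }
  return tags;
}

// Hypothetical batch helper: parse once, then hand the same parsed
// structure to every entry in the dictionary.
function batchLike(html, dictionary) {
  var doc = toyParse(html);
  var results = {};
  Object.keys(dictionary).forEach(function (key) {
    results[key] = dictionary[key](doc);
  });
  return results;
}

var results = batchLike('<ul><li>a</li><li>b</li></ul><p>c</p>', {
  itemCount: function (tags) {
    return tags.filter(function (tag) {
      return tag === 'li';
    }).length;
  },
  hasParagraph: function (tags) {
    return tags.indexOf('p') !== -1;
  }
});

console.log(results);    // { itemCount: 2, hasParagraph: true }
console.log(parseCount); // 1
```

However many entries the dictionary has, the counter stays at 1, which is the saving the batch approach is after.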

htmlToJson.createMethod(filter) -> function (html, [callback])

Generates a method that wraps the passed filter argument. The generated method takes an HTML string and processes it against that filter.

var parseFoo = htmlToJson.createMethod({
  'foo': function ($doc) {
    return $doc.find('#foo').text();
  }
});

htmlToJson.createParser(filter), new htmlToJson.Parser(filter)

For the sake of reusability, creates an object with .parse and .request helper methods, which use the passed filter. For example:

var linkParser = htmlToJson.createParser(['a[href]', {
  'text': function ($a) {
    return $a.text();
  },
  'href': function ($a) {
    return $a.attr('href');
  }
}]);

linkParser.request('http://prolificinteractive.com').done(function (links) {
  //Do stuff with links
});

is equivalent to:

htmlToJson.request('http://prolificinteractive.com', ['a[href]', {
  'text': function ($a) {
    return $a.text();
  },
  'href': function ($a) {
    return $a.attr('href');
  }
}]).done(function (links) {
  //Do stuff with links
});

The former allows you to easily reuse the filter (and make it testable), while the latter is a one-off.
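
The reuse pattern behind a parser object and a generated method can be sketched without the library. ToyParser and createToyMethod are hypothetical names, the filter is reduced to a single function over a raw string, and a native Promise is assumed in place of the library's promises.

```javascript
// Hypothetical parser object: stores a filter once and reuses it.
function ToyParser(filter) {
  this.filter = filter;
}

// Applies the stored filter (a single function in this sketch) to the input.
ToyParser.prototype.parse = function (input) {
  return Promise.resolve(this.filter(input));
};

// Hypothetical counterpart to a "create method" helper: returns a plain
// function bound to a parser, so it can be passed around freely.
function createToyMethod(filter) {
  var parser = new ToyParser(filter);
  return parser.parse.bind(parser);
}

var countLinks = createToyMethod(function (html) {
  return (html.match(/<a\b/g) || []).length;
});

// The same function can be reused on live input and on test fixtures.
countLinks('<a href="/one">1</a><a href="/two">2</a>').then(function (count) {
  console.log(count); // 2
});
```

Because the filter is captured once, a test can call countLinks with a fixture string and never touch the network.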

parser.parse(html, [callback])

Parses the passed html argument against the parser's filter.

parser.method(html, [callback])

Returns a method that wraps parser.parse()

parser.request(requestOptions, [callback])

Makes a request with the request options, then runs the response body through the parser's filter.

Filter Types

Functions

The return values of functions are mapped to their corresponding keys. Function filters are passed cheerio objects, which give you a jQuery-like interface to work with.

htmlToJson.parse('<div id="foo">foo</div>', {
  'foo1': function ($doc, $) {
    return $doc.find('#foo').text(); //foo
  }
}, callback);

Arrays

Arrays of data can be parsed out by either using the .map() method within a filter function or using the shorthand [selector, filter] syntax:

.map(selector, filter)

The filter is applied to each matched element in turn, and the results are returned in an array.

var html = '<div id="items"><div class="item">1</div><div class="item">2</div></div>';

htmlToJson.parse(html, function () {
  return this.map('.item', function ($item) {
    return $item.text();
  });
}).done(function (items) {
  // Items should be: ['1','2']
}, function (err) {
  // Handle error
});

[selector, filter, after]

This is essentially a shorthand alias for .map(), making the filter look more like its output:

var html = '<div id="items"><div class="item">1</div><div class="item">2</div></div>';

htmlToJson
  .parse(html, ['.item', function ($item) {
    return $item.text();
  }])
  .done(function (items) {
    // Items should be: ['1','2']
  }, function (err) {
    // Handle error
  });

As an added convenience, you can pass a third argument to the array filter, which lets you manipulate the results. It may return a promise if you need to perform an asynchronous operation.

var html = '<div id="items"><div class="item">1</div><div class="item">2</div></div>';

htmlToJson
  .parse(html, ['.item', function ($item) {
    return +$item.text();
  }, function (items) {
    // Native Array.prototype.map, so no utility library is required
    return items.map(function (item) {
      return item * 3;
    });
  }])
  .done(function (items) {
    // Items should be: [3,6]
  }, function (err) {
    // Handle error
  });
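
How a three-part array filter might be resolved can be sketched with a toy document. This is an illustration only: the flat elements list, the select helper (which understands only ".class" selectors), and resolveArrayFilter are all hypothetical, and the library itself operates on cheerio objects.

```javascript
// Toy "document": a flat list of elements. This is an assumption made to
// keep the sketch dependency-free.
var elements = [
  { className: 'item', text: '1' },
  { className: 'item', text: '2' },
  { className: 'note', text: 'ignored' }
];

// Hypothetical selector support: only ".class" selectors are understood.
function select(elements, selector) {
  var className = selector.slice(1);
  return elements.filter(function (el) {
    return el.className === className;
  });
}

// Hypothetical resolver for the [selector, filter, after] shorthand.
function resolveArrayFilter(elements, arrayFilter) {
  var selector = arrayFilter[0];
  var filter = arrayFilter[1];
  var after = arrayFilter[2];

  var items = select(elements, selector).map(function (el) {
    return filter(el);
  });

  // Promise.resolve() accepts plain values and promises alike, so "after"
  // may be synchronous or asynchronous.
  return Promise.resolve(after ? after(items) : items);
}

var tripled = resolveArrayFilter(elements, ['.item', function (el) {
  return +el.text;
}, function (items) {
  return items.map(function (item) {
    return item * 3;
  });
}]);

tripled.then(function (items) {
  console.log(items); // [ 3, 6 ]
});
```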

Asynchronous filters

Filter functions may also return promises, which get resolved asynchronously.

function getProductDetails (id, callback) {
  return htmlToJson.request({
    uri: 'http://store.prolificinteractive.com/products/' + id
  }, {
    'id': function ($doc) {
      return $doc.find('#product-details').attr('data-id');
    },
    'colors': ['.color', {
      'id': function ($color) {
        return $color.attr('data-id');
      },
      'hex': function ($color) {
        return $color.css('background-color');
      }
    }]
  }, callback);
}

function getProducts (callback) {
  return htmlToJson.request({
    uri: 'http://store.prolificinteractive.com'
  }, ['.product', {
    'id': function ($product) {
      return $product.attr('data-id');
    },
    'image': function ($product) {
      return $product.find('img').attr('src');
    },
    'colors': function ($product) {
      // This is where we use a promise to get the colors asynchronously
      return this
        .get('id')
        .then(function (id) {
          return getProductDetails(id).get('colors');
        });
    }
  }], callback);
}

Dependencies on other values

Filter functions may call .get(propertyName) to use a value from another key in the same filter. This returns a promise for the value rather than the value itself.

function getProducts (callback) {
  return htmlToJson.request('http://store.prolificinteractive.com', ['.product', {
    'id': function ($product) {
      return $product.attr('data-id');
    },
    'image': function ($product) {
      return $product.find('img').attr('src');
    },
    'colors': function ($product) {
      // Resolve 'id' then get product details with it
      return this
        .get('id')
        .then(function (id) {
          return getProductDetails(id).get('colors');
        });
    }
  }], callback);
}
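
One way such inter-key dependencies could work is to compute each key lazily and cache its promise, so that a value requested through get() is evaluated only once. The sketch below assumes this design; resolveWithDependencies is a hypothetical name, the input is a plain object instead of a cheerio selection, native promises are used, and there is no detection of circular dependencies.

```javascript
// Hypothetical resolver: each key is computed at most once, and filter
// functions can request sibling values through this.get(name).
function resolveWithDependencies(input, filter) {
  var pending = Object.create(null); // cache of promises, keyed by name

  var context = {
    get: function (name) {
      if (!Object.prototype.hasOwnProperty.call(filter, name)) {
        return Promise.reject(new Error('Unknown key: ' + name));
      }
      if (!pending[name]) {
        // Wrapping in a promise lets the filter return a value or a promise.
        pending[name] = new Promise(function (resolve) {
          resolve(filter[name].call(context, input));
        });
      }
      return pending[name];
    }
  };

  var keys = Object.keys(filter);
  var promises = keys.map(function (key) {
    return context.get(key);
  });

  return Promise.all(promises).then(function (values) {
    var result = {};
    keys.forEach(function (key, index) {
      result[key] = values[index];
    });
    return result;
  });
}

var product = { id: '42' };

resolveWithDependencies(product, {
  'id': function (p) {
    return p.id;
  },
  'label': function () {
    // Resolve 'id' first, then build a value from it
    return this.get('id').then(function (id) {
      return 'Product #' + id;
    });
  }
}).then(function (result) {
  console.log(result); // { id: '42', label: 'Product #42' }
});
```

Because get() returns the cached promise, the order of keys in the filter does not matter.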

Objects

Nested objects within a filter are run against the same HTML context as the parent filter.

var html = '<div id="foo"><div id="bar">foobar</div></div>';

htmlToJson.parse(html, {
  'foo': {
    'bar': function ($doc) {
      return $doc.find('#bar').text();
    }
  }
});

$container modifier

You may narrow the DOM context by setting the $container property on the object filter:

var html = '<div id="foo"><div id="bar">foobar</div></div>';

htmlToJson.parse(html, {
  'foo': {
    $container: '#foo',
    'bar': function ($foo) {
      return $foo.find('#bar').text();
    }
  }
});

Constants

Strings, numbers, and null values are simply used as the filter's value. This especially comes in handy for incrementally converting from mock data to parsed data.

htmlToJson.parse('<div id="nada"></div>', {
  x: 1,
  y: 'string value',
  z: null
});
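
The way functions, nested objects, the $container modifier, and constants fit together can be sketched as one recursive resolver. This is a simplified, synchronous illustration and not the library's code: resolveFilter is a hypothetical name, the "document" is a plain nested object, $container is assumed to name a property of that object rather than a selector, and array filters are left out.

```javascript
// Hypothetical synchronous resolver covering three filter types:
// functions, nested objects (with an optional $container), and constants.
function resolveFilter(context, filter) {
  if (typeof filter === 'function') {
    return filter(context);
  }

  if (filter !== null && typeof filter === 'object' && !Array.isArray(filter)) {
    // Assumption: in this toy version $container names a property of the
    // context, whereas the library treats it as a selector.
    var scope = filter.$container ? context[filter.$container] : context;
    var result = {};
    Object.keys(filter).forEach(function (key) {
      if (key !== '$container') {
        result[key] = resolveFilter(scope, filter[key]);
      }
    });
    return result;
  }

  // Strings, numbers, and null are used as-is.
  return filter;
}

var doc = { foo: { bar: 'foobar' } };

var output = resolveFilter(doc, {
  'nested': {
    'bar': function (d) {
      return d.foo.bar; // same context as the parent
    }
  },
  'scoped': {
    $container: 'foo',
    'bar': function (foo) {
      return foo.bar; // narrowed context
    }
  },
  x: 1,
  y: 'string value',
  z: null
});

console.log(output);
```

Since constants pass straight through, a filter can start out as mock data and have its keys replaced by functions one at a time.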

Contributing

Running Tests

Tests are written in mocha and located in the test directory. Run them with:

npm test

This script also runs jshint against the lib/ and test/ directories.

Style

Please read the existing code in order to learn the conventions.
