  • Stars: 37
  • Rank: 720,807 (Top 15 %)
  • Language: HTML
  • License: MIT License
  • Created over 11 years ago
  • Updated over 3 years ago

Repository Details

A Ruby gem for web scraping and screen scraping.

DocParser

DocParser is a web scraping/screen scraping tool.

You can use it to easily scrape information out of HTML documents.

The gem is called docparser. You can find the documentation here.

Features

  • XPath and CSS support through Nokogiri
  • Support for parallel processing of the documents
  • 6 Output formats:
    • CSV
    • XLSX
    • HTML
    • YAML
    • JSON
    • Screen (for debugging and development)
    • And more! (easy to extend)
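
The text-based output formats in the list above all serialize the same tabular data: a header plus one row per scraped record. The following is a minimal sketch of that idea using only Ruby's standard library. It does not use DocParser's own output classes, and the names `header`, `rows`, `rows_to_csv` and `rows_to_records` are hypothetical, chosen for illustration only.

```ruby
require 'csv'
require 'json'
require 'yaml'

# Hypothetical scraped data: a header and one array per record.
header = %w[title url]
rows = [
  ['First post', 'http://example.com/1'],
  ['Second post', 'http://example.com/2']
]

# Render the rows as CSV text, with the header as the first line.
def rows_to_csv(header, rows)
  CSV.generate do |csv|
    csv << header
    rows.each { |row| csv << row }
  end
end

# Turn each row into a hash keyed by the header names,
# which is a convenient shape for JSON and YAML.
def rows_to_records(header, rows)
  rows.map { |row| header.zip(row).to_h }
end

records   = rows_to_records(header, rows)
csv_text  = rows_to_csv(header, rows)
json_text = JSON.pretty_generate(records)
yaml_text = records.to_yaml

puts csv_text
puts json_text
puts yaml_text
```

Keeping the rows in one neutral structure and converting at the end is what makes adding another format cheap: a new format only needs one more conversion function.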

Installation

Add this line to your application's Gemfile:

gem 'docparser'

And then execute:

bundle

Or install it yourself using:

gem install docparser

Usage

See example.rb

Todo

  • Better examples and documentation

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new pull request

Contributors

Thanks