• Stars: 326
  • Rank: 129,027 (Top 3 %)
  • Language: Elixir
  • License: GNU Lesser General Public License v3 (LGPLv3)
  • Created over 9 years ago
  • Updated over 4 years ago


Repository Details

Scrape any website, article or RSS/Atom Feed with ease!

Scrape


Structured data extraction from common web resources, using information-retrieval techniques. See the package documentation for details.

Installation

The package can be installed by adding scrape to your list of dependencies in mix.exs:

def deps do
  [
    {:scrape, "~> 3.0.0"}
  ]
end

Known Issues

  • This package depends on an outdated version of httpoison because its dependency keepcosmos/readability requires it. You can override this in your app by declaring httpoison yourself with override: true, and everything should work.
  • The current version, 3.x, is a complete rewrite from scratch, so new issues may occur and the API has changed. When submitting an issue, please include the URL of an HTML page or feed that reproduces it, so I can investigate and fix it.
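A minimal sketch of the httpoison override described above, as it might look in your app's mix.exs. The httpoison version requirement shown is a placeholder assumption, not a version the package recommends; use whichever version your app needs.

```elixir
def deps do
  [
    {:scrape, "~> 3.0.0"},
    # Placeholder requirement: pick the httpoison version your app actually needs.
    # override: true makes Mix use this entry instead of the version
    # requested transitively through keepcosmos/readability.
    {:httpoison, "~> 1.8", override: true}
  ]
end
```

After changing the dependency, running mix deps.get fetches the overriding version.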

Usage

  • Scrape.domain!(url) -> get structured data from a domain-type URL (such as https://bbc.com)
  • Scrape.feed!(url) -> get structured data from an RSS/Atom feed
  • Scrape.article!(url) -> get structured data from an article-type URL
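By Elixir naming convention, functions ending in ! raise an exception on failure rather than returning an error tuple. This README does not document the exact error behaviour, so the following is only a sketch: a hypothetical wrapper module, SafeScrape (not part of the package), that calls one of the three functions listed above and converts any raised exception into {:error, exception}. The scraper module is injectable so the wrapper can be exercised with a stub instead of hitting the network.

```elixir
defmodule SafeScrape do
  @moduledoc """
  Hypothetical helper (not part of the scrape package) that wraps
  Scrape.domain!/1, Scrape.feed!/1 and Scrape.article!/1 and returns
  {:ok, data} or {:error, exception} instead of raising.
  """

  # Maps each resource kind to the bang function named in the README.
  @functions %{domain: :domain!, feed: :feed!, article: :article!}

  # `scraper` defaults to the real Scrape module; tests can pass a stub module.
  # An unknown `kind` fails the guard and raises FunctionClauseError.
  def fetch(kind, url, scraper \\ Scrape) when kind in [:domain, :feed, :article] do
    {:ok, apply(scraper, Map.fetch!(@functions, kind), [url])}
  rescue
    # Any exception raised by the bang function becomes an error tuple.
    error -> {:error, error}
  end
end
```

With the real package installed, SafeScrape.fetch(:article, url) returns whatever structured data Scrape.article!/1 produces, wrapped in {:ok, ...}. Rescuing every exception keeps the example short; a production version might rescue only the specific exceptions it expects.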

License

LGPLv3. You can use this package any way you want (including commercially), but I want bugfixes and improvements to flow back into this package for everyone's benefit.