  • Stars: 135
  • Rank: 269,297 (Top 6 %)
  • Language: Crystal
  • License: MIT License
  • Created almost 8 years ago
  • Updated over 4 years ago

Repository Details

An HTML parser library for Crystal (like Nokogiri for Ruby)

Crystagiri

An HTML parser library for Crystal like the amazing Nokogiri Ruby gem.

I won't pretend that Crystagiri does as much as Nokogiri. All help is welcome! :)

Installation

Add this to your application's shard.yml:

dependencies:
  crystagiri:
    github: madeindjs/crystagiri

and then run

$ shards install

Usage

require "crystagiri"

Then you can simply instantiate a Crystagiri::HTML object from an HTML String like this:

doc = Crystagiri::HTML.new "<h1>Crystagiri is awesome!!</h1>"

... or directly load it from a Web URL or a pathname:

doc = Crystagiri::HTML.from_file "README.md"
doc = Crystagiri::HTML.from_url "http://example.com/"

You can also pass the follow: true flag when loading from a URL if you want redirects to be followed.

Then you can search all XML::Nodes from the Crystagiri::HTML instance. The tags found will be Crystagiri::Tag objects with the .node property:

  • CSS query
doc.css("li > strong.title") { |tag| puts tag.node }
# => <strong class="title"> .. </strong>
# => <strong class="title"> .. </strong>

Known limitation: CSS queries with complex selectors such as :nth-child are not currently supported.

  • HTML tag
doc.where_tag("h2") { |tag| puts tag.content }
# => Development
# => Contributing
  • HTML id
puts doc.at_id("main-content").tagname
# => div
  • HTML class attribute
doc.where_class("summary") { |tag| puts tag.node }
# => <div class="summary"> .. </div>
# => <div class="summary"> .. </div>
# => <div class="summary"> .. </div>
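
For the :nth-child limitation mentioned in the CSS query item above, one possible workaround is an XPath query. This is only a sketch: it uses Crystal's standard XML module directly rather than any Crystagiri method, and the HTML snippet and variable names are made up for illustration. The XPath expression //ul/*[2][self::li] plays the role of the CSS selector ul > li:nth-child(2).

```crystal
require "xml"

# Hypothetical sample document, only for illustration.
html = <<-DOC
  <ul>
    <li>first</li>
    <li>second</li>
    <li>third</li>
  </ul>
  DOC

# Parse with the standard library (not with Crystagiri).
doc = XML.parse_html(html)

# Select the second child of each <ul>, but only if it is an <li>.
nodes = doc.xpath_nodes("//ul/*[2][self::li]")

nodes.each { |node| puts node.content }
```

Whether this fits your case depends on how you obtained the document; the sketch assumes you are free to parse the same HTML string with the standard library.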

Benchmark

I know you love benchmarks between Ruby & Crystal, so here's one:

require "nokogiri"
t1 = Time.now
doc = Nokogiri::HTML File.read("spec/fixture/HTML.html")
# Run both lookups 100,000 times
100_000.times do
  doc.at_css("h1")
  doc.css(".step-title"){ |tag| tag }
end
# Time.now - t1 is a Float number of seconds in Ruby
puts "executed in #{Time.now - t1} seconds"

Executed in 11.10 seconds (00:00:11.10) with Ruby 2.6.0 (via RVM) on an old Mac.

require "crystagiri"
t = Time.now
doc = Crystagiri::HTML.from_file "./spec/fixture/HTML.html"
# Run both lookups 100,000 times
100_000.times do
  doc.at_css("h1")
  doc.css(".step-title") { |tag| tag }
end
# Time.now - t is a Time::Span, printed as hh:mm:ss.fraction
puts "executed in #{Time.now - t}"

Executed in 3.09 seconds (00:00:03.09) with Crystal 0.27.2 on LLVM 6.0.1, built with the release flag.

Crystagiri is more than three times faster than Nokogiri in this benchmark (11.10 s versus 3.09 s, roughly 3.6x)!!
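
If you want to repeat this kind of timing on a recent Crystal release, where (as far as I know) Time.now has been replaced by Time.local, the standard Benchmark module is one option. The sketch below is an assumption-laden stand-in, not the benchmark above: it times XPath lookups from the standard XML module on a tiny made-up document, so swap in the Crystagiri calls and the real fixture to compare like with like.

```crystal
require "benchmark"
require "xml"

# Hypothetical miniature fixture, only for illustration.
html = "<html><body><h1>Title</h1><p class=\"step-title\">Step</p></body></html>"
doc = XML.parse_html(html)

# Benchmark.realtime returns the elapsed wall-clock time as a Time::Span.
elapsed = Benchmark.realtime do
  10_000.times do
    doc.xpath_node("//h1")                       # stand-in for doc.at_css("h1")
    doc.xpath_nodes("//*[@class='step-title']")  # stand-in for doc.css(".step-title")
  end
end

puts "executed in #{elapsed.total_seconds} seconds"
```

Remember to build with the release flag (crystal build --release) before comparing numbers, as in the run reported above.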

Development

Clone this repository and navigate to it:

$ git clone https://github.com/madeindjs/crystagiri.git
$ cd crystagiri

You can generate all documentation with

$ crystal doc

And run spec tests to ensure everything works correctly

$ crystal spec

Contributing

Do you like this project? The issue tracker lists some issues to get you started.

Contributing is simple:

  1. Fork it ( https://github.com/madeindjs/crystagiri/fork )
  2. Create your feature branch ( git checkout -b my-new-feature )
  3. Commit your changes ( git commit -am "Add some feature" )
  4. Push to the branch ( git push origin my-new-feature )
  5. Create a new Pull Request

Contributors

See the list on GitHub
