orf/html-query

Language: HTML
License: MIT License
Created almost 2 years ago
Updated 5 months ago


hq

jq, but for HTML. Try it in your browser here.

hq reads HTML and converts it into a JSON object based on a series of CSS selectors. Queries are written in a JSON-like syntax in which the values are CSS selectors. For example:

{posts: .athing | [ {title: .titleline > a, url: .titleline > a | @(href)} ] }

This selects all .athing elements and creates an array (| [{...}]) containing one object per selected element. For each element, it takes the text of the .titleline > a element as title, and that element's href attribute (| @(href)) as url.

The end result is the following structure:

{
  "posts": [
    {
      "title": "...",
      "url": "..."
    }
  ]
}
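To make the structure of a query concrete, here is a minimal Rust sketch that models the example query as a small syntax tree and renders it back to text. The `Query` enum, its variants, and the `sel`, `pipe` and `posts_query` helpers are hypothetical illustrations. They are not hq's actual parser or types. Rendering also normalizes whitespace around brackets.

```rust
// Hypothetical AST for hq-style queries; hq's real parser and types may differ.
#[derive(Debug, Clone)]
enum Query {
    /// A CSS selector such as `.athing` or `.titleline > a`.
    Selector(String),
    /// `@(name)`: read an attribute of the current element.
    Attr(String),
    /// `lhs | rhs`: apply `rhs` to what `lhs` selected.
    Pipe(Box<Query>, Box<Query>),
    /// `[q]`: apply `q` to every selected element, producing an array.
    Array(Box<Query>),
    /// `{key: q, ...}`: build an object from named sub-queries.
    Object(Vec<(String, Query)>),
}

impl Query {
    /// Render the tree back into hq's surface syntax (whitespace normalized).
    fn render(&self) -> String {
        match self {
            Query::Selector(s) => s.clone(),
            Query::Attr(name) => format!("@({})", name),
            Query::Pipe(lhs, rhs) => format!("{} | {}", lhs.render(), rhs.render()),
            Query::Array(inner) => format!("[{}]", inner.render()),
            Query::Object(fields) => {
                let parts: Vec<String> = fields
                    .iter()
                    .map(|(key, value)| format!("{}: {}", key, value.render()))
                    .collect();
                format!("{{{}}}", parts.join(", "))
            }
        }
    }
}

fn sel(s: &str) -> Query {
    Query::Selector(s.to_string())
}

fn pipe(lhs: Query, rhs: Query) -> Query {
    Query::Pipe(Box::new(lhs), Box::new(rhs))
}

/// The `posts` query from the example above, built by hand.
fn posts_query() -> Query {
    let post = Query::Object(vec![
        ("title".to_string(), sel(".titleline > a")),
        ("url".to_string(), pipe(sel(".titleline > a"), Query::Attr("href".to_string()))),
    ]);
    Query::Object(vec![(
        "posts".to_string(),
        pipe(sel(".athing"), Query::Array(Box::new(post))),
    )])
}

fn main() {
    println!("{}", posts_query().render());
}
```

Modelling the query as a tree makes the nesting explicit. The outer object's `posts` value is a pipe. The pipe feeds every `.athing` match into an array stage, and that stage evaluates the inner object once per element.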

Install

cargo install html-query

Examples

Full Hacker News story extraction

{posts: .athing | [{href: .titleline > a | @(href), title: .titleline > a, meta: @sibling(1) | {user: .hnuser, posted: .age | @(title) }}]}

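This selects each .athing element and extracts the story title along with the URL from the href attribute. It then moves to the element immediately after each .athing element (@sibling(1)), which holds the story metadata. From that element it extracts the user (.hnuser) and the post time (the title attribute of .age):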

{
  "posts": [
    {
      "title": "...",
      "href": "...",
      "meta": {
        "posted": "...",
        "user": "..."
      }
    }
  ]
}

Special query syntax

Selecting attributes

.foo | @(href)

This will select the href attribute of the first element matching .foo.

Parents

.foo | @parent

This will return the parent of the first element matching .foo.

Siblings

.foo | @sibling(1)

This will return the sibling element one position after the first element matching .foo.
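The three operators above can be modelled on a toy, in-memory element tree. This is a sketch under several assumptions. The `Dom` arena, its methods and the sample document are hypothetical. Class matching covers only the `.foo` form of selector. `sibling(id, n)` assumes that `@sibling(n)` counts forward over sibling elements, so `@sibling(1)` is the next one; this matches how the Hacker News example uses it. hq itself works on a real HTML parse tree.

```rust
use std::collections::HashMap;

/// A toy element node (hypothetical; a real tool works on a full HTML parse tree).
struct Node {
    tag: String,
    attrs: HashMap<String, String>,
    parent: Option<usize>,
    children: Vec<usize>,
}

/// Arena of nodes; index 0 is a synthetic document root.
struct Dom {
    nodes: Vec<Node>,
}

impl Dom {
    fn new() -> Self {
        let root = Node { tag: "#root".to_string(), attrs: HashMap::new(), parent: None, children: Vec::new() };
        Dom { nodes: vec![root] }
    }

    /// Append a child element under `parent` and return its index.
    fn add(&mut self, parent: usize, tag: &str, attrs: &[(&str, &str)]) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node {
            tag: tag.to_string(),
            attrs: attrs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
            parent: Some(parent),
            children: Vec::new(),
        });
        self.nodes[parent].children.push(id);
        id
    }

    /// `.foo`: first element in document order whose class list contains `class`.
    fn first_with_class(&self, class: &str) -> Option<usize> {
        let mut stack = vec![0];
        while let Some(id) = stack.pop() {
            let matches = self.nodes[id]
                .attrs
                .get("class")
                .map_or(false, |c| c.split_whitespace().any(|name| name == class));
            if matches {
                return Some(id);
            }
            // Push children in reverse so the leftmost child is visited first.
            stack.extend(self.nodes[id].children.iter().rev());
        }
        None
    }

    /// `@(name)`: the value of attribute `name`, if present.
    fn attr(&self, id: usize, name: &str) -> Option<&str> {
        self.nodes[id].attrs.get(name).map(|v| v.as_str())
    }

    /// `@parent`: the enclosing element (the synthetic root is never returned).
    fn parent(&self, id: usize) -> Option<usize> {
        self.nodes[id].parent.filter(|&p| p != 0)
    }

    /// `@sibling(n)`: the element `n` positions after `id` under the same parent.
    fn sibling(&self, id: usize, n: usize) -> Option<usize> {
        let parent = self.nodes[id].parent?;
        let siblings = &self.nodes[parent].children;
        let pos = siblings.iter().position(|&c| c == id)?;
        siblings.get(pos + n).copied()
    }
}

/// <div><a class="foo" href="/first"></a><span class="bar"></span></div>
/// <p><a class="foo" href="/second"></a></p>
fn sample_dom() -> Dom {
    let mut dom = Dom::new();
    let div = dom.add(0, "div", &[]);
    dom.add(div, "a", &[("class", "foo"), ("href", "/first")]);
    dom.add(div, "span", &[("class", "bar")]);
    let p = dom.add(0, "p", &[]);
    dom.add(p, "a", &[("class", "foo"), ("href", "/second")]);
    dom
}

fn main() {
    let dom = sample_dom();
    let foo = dom.first_with_class("foo").expect("no .foo element");
    println!(".foo | @(href)     -> {:?}", dom.attr(foo, "href"));
    let parent = dom.parent(foo).map(|p| dom.nodes[p].tag.as_str());
    println!(".foo | @parent     -> {:?}", parent);
    let sibling = dom.sibling(foo, 1).map(|s| dom.nodes[s].tag.as_str());
    println!(".foo | @sibling(1) -> {:?}", sibling);
}
```

In this sketch, each step returns an `Option`. A query that matches nothing, such as a sibling past the end of its parent's children, therefore yields `None` rather than panicking.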