  • Stars: 379
  • Rank: 113,004 (Top 3 %)
  • Language: OCaml
  • License: MIT License
  • Created about 9 years ago
  • Updated 4 months ago

Repository Details

Functional HTML scraping and rewriting with CSS in OCaml

Lambda Soup

Lambda Soup is a functional HTML scraping and manipulation library for OCaml aimed at being easy to use.

[GIF: Lambda Soup usage example]

Lambda Soup is simple. It provides a set of elementary traversals for getting from node to node, familiar functional combinators such as filter, map, and fold, and support for all CSS selectors that still make sense when not running in a browser (and a few obvious extensions on top of that).

Here is a trivial self-contained example:

(parse "<p class='Hello'>World!</p>") $ ".Hello" |> R.leaf_text;;
- : string = "World!"

And, a mutation:

let soup = parse "<p class='Hello'>World!</p>" in
wrap (soup $ ".Hello" |> R.child) (create_element "strong");
soup |> to_string;;
- : string = "<p class=\"Hello\"><strong>World!</strong></p>"
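
The combinators mentioned above compose with selectors in the usual pipeline style. The following is a minimal sketch, assuming the Soup module is opened and OCaml 4.08 or later (for List.filter_map); the HTML fragment and the names hrefs, externals, and labels are made up for illustration:

```ocaml
open Soup

(* Hypothetical input document, used only for this sketch. *)
let soup =
  parse
    {|<ul>
        <li><a href="/a" class="ext">A</a></li>
        <li><a href="/b">B</a></li>
        <li><a>no link</a></li>
      </ul>|}

(* Select with a CSS attribute selector, then collect the attribute values. *)
let hrefs : string list =
  soup $$ "a[href]" |> to_list |> List.filter_map (attribute "href")

(* filter keeps the nodes satisfying a predicate; count consumes the result. *)
let externals : int =
  soup $$ "a" |> filter (fun a -> List.mem "ext" (classes a)) |> count

(* fold accumulates over a node sequence; here it concatenates link text. *)
let labels : string =
  soup $$ "a" |> fold (fun acc a -> acc ^ R.leaf_text a) ""

let () =
  List.iter print_endline hrefs;
  Printf.printf "%d external link(s), labels: %s\n" externals labels
```

The R submodule raises on missing values, while the top-level functions such as attribute return options, so the sketch uses whichever fits each step.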

For some more examples, see the Lambda Soup postprocessor that runs on Lambda Soup's own documentation after it is generated by ocamldoc.

The library is tested thoroughly.

Lambda Soup is based on Markup.ml. As a consequence, it resolves entity references, detects character encodings automatically, and converts everything to UTF-8. You can also use Lambda Soup on XML by parsing the XML with Markup.ml and feeding the resulting signals to Lambda Soup.
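
Feeding XML through Markup.ml might look roughly like this. It is a sketch, assuming both the markup and lambdasoup packages are available; the XML string and the names doc and ids are invented for illustration:

```ocaml
open Soup

(* Parse XML with Markup.ml, then hand the signal stream to Lambda Soup. *)
let doc =
  Markup.string
    "<feed><entry id='1'>first</entry><entry id='2'>second</entry></feed>"
  |> Markup.parse_xml
  |> Markup.signals
  |> from_signals

(* From here on, the usual selectors and accessors apply. *)
let ids : string list =
  doc $$ "entry" |> to_list |> List.filter_map (attribute "id")

let () = List.iter print_endline ids
```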


Installing

opam install lambdasoup

Starting from scratch

To use Lambda Soup interactively as in the GIF at the top of this README, you need to have done something like this:

your-package-manager install ocaml opam
opam init
eval `opam config env`          # Or restart your shell
opam install lambdasoup

and make sure your ~/.ocamlinit file looks something like this:

let () =
  try Topdirs.dir_directory (Sys.getenv "OCAML_TOPLEVEL_PATH")
  with Not_found -> ()
;;

#use "topfind";;

Then, run ocaml -short-paths to start the top-level, and scrape away!
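
Once the top-level is running, a first session could start along these lines. This is a sketch that assumes findlib was loaded by the ~/.ocamlinit shown above, which is what makes the #require directive available; the name text is only illustrative:

```ocaml
(* Load the library into the top-level, then bring Soup into scope. *)
#require "lambdasoup";;
open Soup;;

(* The same query as in the introductory example, bound to a name. *)
let text = parse "<p class='Hello'>World!</p>" $ ".Hello" |> R.leaf_text;;
```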


Depending

Lambda Soup uses semantic versioning, but is currently in 0.x.x. For now, the minor version number will be incremented on breaking changes. So, to give yourself a chance to review the changelog before your code breaks, put the following constraint on Lambda Soup: lambdasoup {< "0.7.0"}.
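
In an opam file, that constraint would typically sit in the depends field. The fragment below is only a sketch; the surrounding "ocaml" entry is an assumption about a typical package, not something Lambda Soup requires you to write:

```
depends: [
  "ocaml"
  "lambdasoup" {< "0.7.0"}
]
```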


Documentation

Lambda Soup's interface consists of one module Soup, whose signature is documented here.


Developing

See CONTRIBUTING. All feedback is welcome – open an issue on GitHub, or send me an email at [email protected]. If you find yourself repeatedly writing the same helper on top of Lambda Soup's functions, perhaps we should add it to Lambda Soup.


History

Lambda Soup was originally written to answer a Stack Overflow question in November 2015.

More Repositories

 1. better-enums (C++, 1,638 stars): C++ compile-time enum to string, iteration, in a single header file
 2. dream (OCaml, 1,596 stars): Tidy, feature-complete Web framework
 3. promise (Reason, 341 stars): Light and type-safe binding to JS promises
 4. bisect_ppx (OCaml, 303 stars): Code coverage for OCaml and ReScript
 5. luv (OCaml, 275 stars): Cross-platform asynchronous I/O and system calls
 6. markup.ml (OCaml, 146 stars): Error-recovering streaming HTML5 and XML parsers
 7. namespaces (OCaml, 71 stars): Sane file naming for OCaml projects.
 8. hyper (OCaml, 68 stars): OCaml Web client, composable with Dream [unannounced]
 9. dream-serve (OCaml, 50 stars): Live-reloading server for static sites (eventually also dynamic)
10. repromise_lwt (OCaml, 13 stars)
11. faster-map (Assembly, 13 stars): A tail-recursive list map with good performance for all list sizes. Not actually written in assembly.
12. reason-native-hello (Reason, 11 stars): The smallest possible Reason Native project
13. bisect-starter-dune (OCaml, 6 stars): Bisect_ppx + Dune starter repo
14. bisect-starter-rescript (ReScript, 5 stars): Bisect_ppx + ReScript starter repo
15. binaries (Shell, 5 stars): OCaml binaries for all the platforms
16. promise-example-binding (Reason, 4 stars): reason-promise binding to node-fetch
17. bisect-starter-esy (OCaml, 3 stars): Bisect_ppx + esy starter repo
18. promise-example-bsb (Reason, 3 stars): Hello world using reason-promise
19. promise-example-esy (OCaml, 3 stars): Using native reason-promise with esy
20. bisect-ci-integration-megatest (OCaml, 2 stars): Bisect_ppx web integrations testing
21. bisect-starter-jsoo (OCaml, 1 star): Bisect_ppx + Js_of_ocaml starter repo
22. ocamlformat-binary (1 star): Just storage for an Ocamlformat to use in Bisect_ppx's CIs
23. lwt-manual (OCaml, 1 star)
24. bisect-starter-jest (ReScript, 1 star): Bisect_ppx + Jest starter repo
25. dream-branches (OCaml, 1 star): Archived branches from the main Dream repo