• Stars
    star
    130
  • Rank 267,638 (Top 6 %)
  • Language
    TypeScript
  • License
    MIT License
  • Created over 8 years ago
  • Updated 3 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

An HTML parser recognizes content and string placeholders and allows JavaScript expressions as attribute values


htmljs-parser
Styled with prettier Build status Code Coverage NPM version Downloads

An HTML parser with super powers used by Marko.

Installation

npm install htmljs-parser

Creating A Parser

First we must create a parser instance and pass it some handlers for the various parse events shown below.

Each parse event is called a Range and is an object with start and end properties which are zero-based offsets from the beginning of th parsed code.

Additional meta data and nested ranges are exposed on some events shown below.

You can get the raw string from any range using parser.read(range).

import { createParser, ErrorCode, TagType } from "htmljs-parser";

const parser = createParser({
  /**
   * Called when the parser encounters an error.
   *
   * @example
   * 1╭─ <a><b
   *  ╰─     ╰─ error(code: 19, message: "EOF reached while parsing open tag")
   */
  onError(range) {
    range.code; // An error code id. You can see the list of error codes in ErrorCode imported above.
    range.message; // A human readable (hopefully) error message.
  },

  /**
   * Called when some static text is parsed within some body content.
   *
   * @example
   * 1╭─ <div>Hi</div>
   *  ╰─      ╰─ text "Hi"
   */
  onText(range) {},

  /**
   * Called after parsing a placeholder within body content.
   *
   * @example
   * 1╭─ <div>${hello} $!{world}</div>
   *  │       │ │      │  ╰─ placeholder.value "world"
   *  │       │ │      ╰─ placeholder "$!{world}"
   *  │       │ ╰─ placeholder:escape.value "hello"
   *  ╰─      ╰─ placeholder:escape "${hello}"
   */
  onPlaceholder(range) {
    range.escape; // true for ${} placeholders and false for $!{} placeholders.
    range.value; // Another range that includes only the placeholder value itself without the wrapping braces.
  },

  /**
   * Called when we find a comment at the root of the document or within a tags contents.
   * It will not be fired for comments within expressions, such as attribute values.
   *
   * @example
   * 1╭─ <!-- hi -->
   *  │  │   ╰─ comment.value " hi "
   *  ╰─ ╰─ comment "<!-- hi -->"
   * 2╭─ // hi
   *  │  │ ╰─ comment.value " hi"
   *  ╰─ ╰─ comment "// hi"
   */
  onComment(range) {
    range.value; // Another range that only includes the contents of the comment.
  },

  /**
   * Called after parsing a CDATA section.
   * // https://developer.mozilla.org/en-US/docs/Web/API/CDATASection
   *
   * @example
   * 1╭─ <![CDATA[hi]]>
   *  │  │        ╰─ cdata.value "hi"
   *  ╰─ ╰─ cdata "<![CDATA[hi]]>"
   */
  onCDATA(range) {
    range.value; // Another range that only includes the contents of the CDATA.
  },

  /**
   * Called after parsing a DocType comment.
   * https://developer.mozilla.org/en-US/docs/Web/API/DocumentType
   *
   * @example
   * 1╭─ <!DOCTYPE html>
   *  │  │ ╰─ doctype.value "DOCTYPE html"
   *  ╰─ ╰─ doctype "<!DOCTYPE html>"
   */
  onDoctype(range) {
    range.value; // Another range that only includes the contents of the DocType.
  },

  /**
   * Called after parsing an XML declaration.
   * https://developer.mozilla.org/en-US/docs/Web/XML/XML_introduction#xml_declaration
   *
   * @example
   * 1╭─ <?xml version="1.0" encoding="UTF-8"?>
   *  │  │ ╰─ declaration.value "xml version=\"1.0\" encoding=\"UTF-8\""
   *  ╰─ ╰─ declaration "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
   */
  onDeclaration(range) {
    range.value; // Another range that only includes the contents of the declaration.
  },

  /**
   * Called after parsing a scriptlet (new line followed by a $).
   *
   * @example
   * 1╭─ $ foo();
   *  │   │╰─ scriptlet.value "foo();"
   *  ╰─  ╰─ scriptlet " foo();"
   * 2╭─ $ { bar(); }
   *  │   │ ╰─ scriptlet:block.value " bar(); "
   *  ╰─  ╰─ scriptlet:block " { bar(); }"
   */
  onScriptlet(range) {
    range.block; // true if the scriptlet was contained within braces.
    range.value; // Another range that includes only the value itself without the leading $ or surrounding braces (if applicable).
  },

  /**
   * Called when we're about to begin an HTML open tag (before the tag name).
   * Note: This is only called for HTML mode tags and can be used to track if you are in concise mode.
   *
   * @example
   * 1╭─ <div>Hi</div>
   *  ╰─ ╰─ openTagStart
   */
  onOpenTagStart(range) {},

  /**
   * Called when a tag name, which can include placeholders, has been parsed.
   *
   * @example
   * 1╭─ <div/>
   *  ╰─  ╰─ openTagName "div"
   * 2╭─ <hello-${test}-again/>
   *  │   │     │      ╰─ openTagName.quasis[1] "-again"
   *  │   │     ╰─ openTagName.expressions[0] "${test}"
   *  │   ├─ openTagName.quasis[0] "hello-"
   *  ╰─  ╰─ openTagName "hello-${test}-again"
   */
  onOpenTagName(range) {
    range.quasis; // An array of ranges that indicate the string literal parts of the tag name.
    range.expressions; // A list of placeholder ranges (similar to whats emitted via onPlaceholder).

    // Return a different tag type enum value to enter a different parse mode.
    // Below is approximately what Marko uses:
    switch (parser.read(range)) {
      case "area":
      case "base":
      case "br":
      case "col":
      case "embed":
      case "hr":
      case "img":
      case "input":
      case "link":
      case "meta":
      case "param":
      case "source":
      case "track":
      case "wbr":
        // TagType.void makes this a void element (cannot have children).
        return TagType.void;
      case "html-comment":
      case "script":
      case "style":
      case "textarea":
        // TagType.text makes the child content text only (with placeholders).
        return TagType.text;
      case "class":
      case "export":
      case "import":
      case "static":
        // TagType.statement makes this a statement tag where the content following the tag name will be parsed as script code until we reach a new line, eg for `import x from "y"`).
        return TagType.statement;
    }

    // TagType.html is the default which allows child content as html with placeholders.
    return TagType.html;
  },

  /**
   * Called when a shorthand id, which can include placeholders, has been parsed.
   *
   * @example
   * 1╭─ <div#hello-${test}-again/>
   *  │      ││     │       ╰─ tagShorthandId.quasis[1] "-again"
   *  │      ││     ╰─ tagShorthandId.expressions[0] "${test}"
   *  │      │╰─ tagShorthandId.quasis[0] "hello-"
   *  ╰─     ╰─ tagShorthandId "#hello-${test}-again"
   */
  onTagShorthandId(range) {
    range.quasis; // An array of ranges that indicate the string literal parts of the shorthand id name.
    range.expressions; // A list of placeholder ranges (similar to whats emitted via onPlaceholder).
  },

  /**
   * Called when a shorthand class name, which can include placeholders, has been parsed.
   * Note there can be multiple of these.
   *
   * @example
   * 1╭─ <div.hello-${test}-again/>
   *  │      ││     │       ╰─ tagShorthandClassName.quasis[1] "-again"
   *  │      ││     ╰─ tagShorthandClassName.expressions[0] "${test}"
   *  │      │╰─ tagShorthandClassName.quasis[0] "hello-"
   *  ╰─     ╰─ tagShorthandClassName "#hello-${test}-again"
   */
  onTagShorthandClass(range) {
    range.quasis; // An array of ranges that indicate the string literal parts of the shorthand id name.
    range.expressions; // A list of placeholder ranges (similar to whats emitted via onPlaceholder).
  },

  /**
   * Called after the type arguments for a tag have been parsed.
   *
   * @example
   * 1╭─ <foo<string>>
   *  │      │╰─ tagTypeArgs.value "string"
   *  ╰─     ╰─ tagTypeArgs "<string>"
   */
  onTagTypeArgs(range) {
    range.value; // Another range that includes only the type arguments themselves and not the angle brackets.
  },

  /**
   * Called after a tag variable has been parsed.
   *
   * @example
   * 1╭─ <div/el/>
   *  │      │╰─ tagVar.value "el"
   *  ╰─     ╰─ tagVar "/el"
   */
  onTagVar(range) {
    range.value; // Another range that includes only the tag var itself and not the leading slash.
  },

  /**
   * Called after tag arguments have been parsed.
   *
   * @example
   * 1╭─ <if(x)>
   *  │     │╰─ tagArgs.value "x"
   *  ╰─    ╰─ tagArgs "(x)"
   */
  onTagArgs(range) {
    range.value; // Another range that includes only the args themselves and not the outer parenthesis.
  },

  /**
   * Called after type parameters for the tag parameters have been parsed.
   *
   * @example
   * 1╭─ <tag<T>|input: { name: T }|>
   *  │      │╰─ tagTypeParams.value
   *  ╰─     ╰─ tagTypeParams "<T>"
   */
  onTagTypeParams(range) {
    range.value; // Another range that includes only the type params themselves and not the angle brackets.
  },

  /**
   * Called after tag parameters have been parsed.
   *
   * @example
   * 1╭─ <for|item| of=list>
   *  │      │╰─ tagParams.value "item"
   *  ╰─     ╰─ tagParams "|item|"
   */
  onTagParams(range) {
    range.value; // Another range that includes only the params themselves and not the outer pipes.
  },

  /**
   * Called after an attribute name as been parsed.
   * Note this may be followed by the related AttrArgs, AttrValue or AttrMethod. It can also be directly followed by another AttrName, AttrSpread or the OpenTagEnd if this is a boolean attribute.
   *
   * @example
   * 1╭─ <div class="hi">
   *  ╰─      ╰─ attrName "class"
   */
  onAttrName(range) {},

  /**
   * Called after attr arguments have been parsed.
   *
   * @example
   * 1╭─ <div if(x)>
   *  │         │╰─ attrArgs.value "x"
   *  ╰─        ╰─ attrArgs "(x)"
   */
  onAttrArgs(range) {
    range.value; // Another range that includes only the args themselves and not the outer parenthesis.
  },

  /**
   * Called after an attr value has been parsed.
   *
   * @example
   * 1╭─ <input name="hi" value:=x>
   *  │             ││         │ ╰─ attrValue:bound.value
   *  │             ││         ╰─ attrValue:bound ":=x"
   *  │             │╰─ attrValue.value "\"hi\""
   *  ╰─            ╰─ attrValue "=\"hi\""
   */
  onAttrValue(range) {
    range.bound; // true if the attribute value was preceded by :=.
    range.value; // Another range that includes only the value itself without the leading = or :=.
  },

  /**
   * Called after an attribute method shorthand has been parsed.
   *
   * @example
   * 1╭─ <div onClick(ev) { foo(); }>
   *  │              ││   │╰─ attrMethod.body.value " foo(); "
   *  │              ││   ╰─ attrMethod.body "{ foo(); }"
   *  │              │╰─ attrMethod.params.value "ev"
   *  │              ├─ attrMethod.params "(ev)"
   *  ╰─             ╰─ attrMethod "(ev) { foo(); }"
   */
  onAttrMethod(range) {
    range.typeParams; // Another range which includes the type params for the method.
    range.typeParams.value; // Another range which includes the type params without outer angle brackets.

    range.params; // Another range which includes the params for the method.
    range.params.value; // Another range which includes the method params without outer parenthesis.

    range.body; // Another range which includes the entire body block.
    range.body.value; // Another range which includes the body block without outer braces.
  },

  /**
   * Called after we've parsed a spread attribute.
   *
   * @example
   * 1╭─ <div ...attrs>
   *  │       │  ╰─ attrSpread.value "attrs"
   *  ╰─      ╰─ attrSpread "...attrs"
   */
  onAttrSpread(range) {
    range.value; // Another range that includes only the value itself without the leading ...
  },

  /**
   * Called once we've completed parsing the open tag.
   *
   * @example
   * 1╭─ <div><span/></div>
   *  │      │     ╰─ openTagEnd:selfClosed "/>"
   *  ╰─     ╰─ openTagEnd ">"
   */
  onOpenTagEnd(range) {
    range.selfClosed; // true if this tag was self closed (the onCloseTag* handlers will not be called if so).
  },

  /**
   * Called when we start parsing and html closing tag.
   * Note this is not emitted for concise, selfClosed, void or statement tags.
   *
   * @example
   * 1╭─ <div><span/></div>
   *  ╰─             ╰─ closeTagStart "</"
   */
  onCloseTagStart(range) {},

  /**
   * Called after the content within the brackets of an html closing tag has been parsed.
   * Note this is not emitted for concise, selfClosed, void or statement tags.
   *
   * @example
   * 1╭─ <div><span/></div>
   *  ╰─               ╰─ closeTagName "div"
   */
  onCloseTagName(range) {},

  /**
   * Called once the closing tag has finished parsing, or in concise mode we hit an outdent or eof.
   * Note this is not called for selfClosed, void or statement tags.
   *
   * @example
   * 1╭─ <div><span/></div>
   *  ╰─                  ╰─ closeTagEnd ">"
   */
  onCloseTagEnd(range) {},
});

Finally after setting up the parser with it's handlers, it's time to pass in some source code to parse.

parser.parse("<div></div>");

Parser Helpers

The parser instance provides a few helpers to make it easier to work with the parsed content.

// Pass any range object into this method to get the raw string from the source for the range.
parser.read(range);

// Given an zero based offset within the source code, returns a position object that contains line and column properties.
parser.positionAt(offset);

// Given a range object returns a location object with start and end properties which are each position objects as returned from the "positionAt" api.
parser.locationAt(range);

Code of Conduct

This project adheres to the eBay Code of Conduct. By participating in this project you agree to abide by its terms.

More Repositories

1

marko

A declarative, HTML-based language that makes building web apps fun
JavaScript
13,035
star
2

isomorphic-ui-benchmarks

Client/server benchmarks for various UI libraries
JavaScript
243
star
3

templating-benchmarks

Benchmarking framework for multiple templating engines
HTML
135
star
4

run

The Marko application framework
TypeScript
127
star
5

cli

command-line tools for Marko
JavaScript
94
star
6

writable-dom

Utility to stream HTML content into a live document.
TypeScript
85
star
7

examples

53
star
8

vite

A Marko plugin for Vite
TypeScript
52
star
9

micro-frame

A Marko tag for building SSR friendly micro frontends.
HTML
49
star
10

website

The markojs.com website
JavaScript
36
star
11

language-server

Marko autocomplete, intellisense and editor support.
TypeScript
34
star
12

atom-language-marko

Syntax highlighting in Atom for the Marko templating language
JavaScript
33
star
13

testing-library

Simple and complete Marko testing utilities that encourage good testing practices.
TypeScript
33
star
14

tags

Not quite core components.
JavaScript
22
star
15

tags-api-preview

Preview the Marko 6 "Tags API" in Marko 5!
TypeScript
22
star
16

marko-tmbundle

TextMate bundle for the Marko templating language
Marko
20
star
17

x

experiments. do not use.
TypeScript
19
star
18

rollup

A Marko plugin for Rollup.
TypeScript
15
star
19

markojs.com-legacy

Source code for markojs.com
JavaScript
14
star
20

prettier

A prettier plugin for parsing and printing Marko files.
Marko
13
star
21

webpack

A loader for marko files and a plugin for universal bundling
JavaScript
13
star
22

fastify

Render Marko templates in a Fastify application.
TypeScript
9
star
23

ssr-to-html

Crawl a server rendered application and output html files.
TypeScript
9
star
24

express

Render Marko templates in an express application.
TypeScript
9
star
25

jest

A preprocessor and preset to use Marko with Jest
TypeScript
8
star
26

urql

An isomorphic URQL GraphQL client for Marko
TypeScript
7
star
27

tutorial-blog

Blog site tutorial to demonstrate basic concepts in Marko
Marko
6
star
28

community

Meetings, proposals & other community related discussions
5
star
29

fixture-snapshots

A tool for loading fixtures and creating snapshot tests
TypeScript
5
star
30

marko-migrate

Tool for migrating a project using marko v2 to marko v3
JavaScript
5
star
31

branding

Logos and other graphics for Marko
4
star
32

utils

A set of utilities used by the Marko ecosystem. 🛠
JavaScript
4
star
33

marko-js.github.io

Statically generated files for markojs.com
JavaScript
4
star
34

screencasts

Project source code associated with Marko screencasts
JavaScript
3
star
35

markojs-website-v3

Marko v3 website (legacy)
JavaScript
1
star
36

compat

Marko compat taglibs
JavaScript
1
star
37

relative-import-path

Like path.relative for generating short require'able paths.
TypeScript
1
star
38

migrate-v3-widget

Migrates a Marko v3 widget to a v4 component.
TypeScript
1
star
39

marko.run

Marko Run website
Marko
1
star