A light markup language

Djot

Djot is a light markup syntax. It derives most of its features from commonmark, but it fixes a few things that make commonmark's syntax complex and difficult to parse efficiently. It is also much fuller-featured than commonmark, with support for definition lists, footnotes, tables, several new kinds of inline formatting (insert, delete, highlight, superscript, subscript), math, smart punctuation, attributes that can be applied to any element, and generic containers for block-level, inline-level, and raw content.

The project began as an attempt to implement some of the ideas I suggested in my essay Beyond Markdown. (See Rationale, below.)

This repository contains a Syntax Description, a Cheatsheet, and a Quick Start for Markdown Users that outlines the main differences between djot and Markdown.

You can try djot on the djot playground without installing anything locally.

Rationale

Here are some design goals:

  1. It should be possible to parse djot markup in linear time, with no backtracking.

  2. Parsing of inline elements should be "local" and not depend on what references are defined later. This is not the case in commonmark: [foo][bar] might be "[foo]" followed by a link with text "bar", or "[foo][bar]", or a link with text "foo", or a link with text "foo" followed by "[bar]", depending on whether the references [foo] and [bar] are defined elsewhere (perhaps later) in the document. This non-locality makes accurate syntax highlighting nearly impossible.

  3. Rules for emphasis should be simpler. The fact that doubled characters are used for strong emphasis in commonmark leads to many potential ambiguities, which are resolved by a daunting list of 17 rules. It is hard to form a good mental model of these rules. Most of the time they interpret things the way a human would most naturally interpret them---but not always.

  4. Expressive blind spots should be avoided. In commonmark, you're out of luck if you want to produce the HTML a<em>?</em>b, because the flanking rules classify the first asterisk in a*?*b as right-flanking. There is a way around this, but it's ugly (using a numerical entity instead of a). In djot there should not be expressive blind spots of this kind.

  5. Rules for what content belongs to a list item should be simple. In commonmark, content under a list item must be indented as far as the first non-space content after the list marker (or five spaces after the marker, in case the list item begins with indented code). Many people get confused when their indented content is not indented far enough and does not get included in the list item.

  6. Parsers should not be forced to recognize unicode character classes, HTML tags, or entities, or perform unicode case folding. That adds a lot of complexity.

  7. The syntax should be friendly to hard-wrapping: hard-wrapping a paragraph should not lead to different interpretations, e.g. when a number followed by a period ends up at the beginning of a line. (I anticipate that many will ask, why hard-wrap at all? Answer: so that your document is readable just as it is, without conversion to HTML and without special editor modes that soft-wrap long lines. Remember that source readability was one of the prime goals of Markdown and Commonmark.)

  8. The syntax should compose uniformly, in the following sense: if a sequence of lines has a certain meaning outside a list item or block quote, it should have the same meaning inside it. This principle is articulated in the commonmark spec, but the spec doesn't completely abide by it (see commonmark/commonmark-spec#634).

  9. It should be possible to attach arbitrary attributes to any element.

  10. There should be generic containers for text, inline content, and block-level content, to which arbitrary attributes can be applied. This allows for extensibility using AST transformations.

  11. The syntax should be kept as simple as possible, consistent with these goals. Thus, for example, we don't need two different styles of headings or code blocks.
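Goal 2 can be made concrete with a small sketch. Under the rule djot adopts (described among the decisions below), whether a bracketed construct is a link is decided from its characters alone, so no table of reference definitions is ever consulted. The Python below is a toy written for this explanation, not code from djot.js or djot.lua. The name classify_brackets is hypothetical, and the sketch looks at a single construct with no nesting, escapes, or attributes.

```python
import re

# Toy classifier illustrating djot's "local" link rule: the answer depends only
# on the characters present, never on which reference definitions exist
# elsewhere in the document. Hypothetical helper; no nesting or escapes.
INLINE_LINK = re.compile(r"\[([^\]]*)\]\(([^)]*)\)")
REFERENCE_LINK = re.compile(r"\[([^\]]*)\]\[([^\]]*)\]")


def classify_brackets(text: str) -> str:
    if INLINE_LINK.fullmatch(text):
        return "inline link"
    m = REFERENCE_LINK.fullmatch(text)
    if m:
        # [foo][] uses the link text itself as the reference label.
        label = m.group(2) or m.group(1)
        return f"reference link (label {label!r})"
    # There are no shortcut links: a lone [like this] is ordinary text.
    return "plain text"
```

Because the result never depends on definitions that appear later in the document, a syntax highlighter can classify [foo][bar] as soon as it reads it.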

These goals motivated the following decisions:

  • Block-level elements can't interrupt paragraphs (or headings), because of goal 7. So in djot the following is a single paragraph, not (as commonmark sees it) a paragraph followed by an ordered list followed by a block quote followed by a section heading:

    My favorite number is probably the number
    1. It's the smallest natural number that is
    > 0. With pencils, though, I prefer a
    # 2.
    

    Commonmark does make some concessions to goal 7, by forbidding lists that begin with markers other than "1." to interrupt paragraphs. But this is a compromise and a sacrifice of regularity and predictability in the syntax. Better just to have a general rule.

  • An implication of the last decision is that, although "tight" lists are still possible (without blank lines between items), a sublist must always be preceded by a blank line. Thus, instead of

    - Fruits
      - apple
      - orange
    

    you must write

    - Fruits
    
      - apple
      - orange
    

    (This blank line doesn't count against "tightness.") reStructuredText makes the same design decision.

  • Also to promote goal 7, we allow headings to "lazily" span multiple lines:

    ## My excessively long section heading is too
    long to fit on one line.
    

    While we're at it, we'll simplify by removing setext-style (underlined) headings. We don't really need two heading syntaxes (goal 11).

  • To meet goal 5, we have a very simple rule: anything that is indented beyond the start of the list marker belongs in the list item.

    1. list item
    
      > block quote inside item 1
    
    2. second item
    

    In commonmark, this would be parsed as two separate lists with a block quote between them, because the block quote is not indented far enough. What kept us from using this simple rule in commonmark was indented code blocks. If list items are going to contain an indented code block, we need to know at what column to start counting the indentation, so we fixed on the column that makes the list look best (the first column of non-space content after the marker):

    1.  A commonmark list item with an indented code block in it.
    
            code!
    

    In djot, we just get rid of indented code blocks. Most people prefer fenced code blocks anyway, and we don't need two different ways of writing code blocks (goal 11).

  • To meet goal 6 and to avoid the complex rules commonmark adopted for handling raw HTML, we simply do not allow raw HTML, except in explicitly marked contexts, e.g. `<a id="foo">`{=html} or

    ``` =html
    <table>
    <tr><td>foo</td></tr>
    </table>
    ```
    

    Unlike Markdown, djot is not HTML-centric. Djot documents might be rendered to a variety of different formats, so although we want to provide the flexibility to include raw content in any output format, there is no reason to privilege HTML. For similar reasons we do not interpret HTML entities, as commonmark does.

  • To meet goal 2, we make reference link parsing local. Anything that looks like [foo][bar] or [foo][] gets treated as a reference link, regardless of whether [foo] is defined later in the document. A corollary is that we must get rid of shortcut link syntax, with just a single bracket pair, [like this]. It must always be clear what is a link without needing to know the surrounding context.

  • In support of goal 6, reference links are no longer case-insensitive. Supporting this beyond an ASCII context would require building in unicode case folding to every implementation, and it doesn't seem necessary.

  • A space or newline is required after > in block quotes, to avoid the violations of the principle of uniformity noted in goal 8:

    >This is not a
    >block quote in djot.
    
  • To meet goal 3, we avoid using doubled characters for strong emphasis. Instead, we use _ for emphasis and * for strong emphasis. Emphasis can begin with one of these characters, as long as it is not followed by a space, and will end when a similar character is encountered, as long as it is not preceded by a space and some different characters have occurred in between. In the case of overlap, the first one to be closed takes precedence. (This simple rule also avoids the need we had in commonmark to determine unicode character classes---goal 6.)

  • Taken just by itself, this last change would introduce a number of expressive blind spots. For example, given the simple rule,

    _(_foo_)_
    

    parses as

    <em>(</em>foo<em>)</em>

    rather than

    <em>(<em>foo</em>)</em>

    If you want the latter interpretation, djot allows you to use the syntax

    _({_foo_})_
    

    The {_ is a _ that can only open emphasis, and the _} is a _ that can only close emphasis. The same can be done with * or any other inline formatting marker that is ambiguous between an opener and closer. These curly braces are required for certain inline markup, e.g. {=highlighting=}, {+insert+}, and {-delete-}, since the characters =, +, and - are found often in ordinary text.

  • In support of goal 1, code span parsing does not backtrack. So if you open a code span and don't close it, it extends to the end of the paragraph. That is similar to the way fenced code blocks work in commonmark.

    This is `inline code.
    
  • In support of goal 9, a generic attribute syntax is introduced. Attributes can be attached to any block-level element by putting them on the line before it, and to any inline-level element by putting them directly after it.

    {#introduction}
    This is the introductory paragraph, with
    an identifier `introduction`.
    
    {.important color="blue" #heading}
    ## heading
    
    The word *atelier*{weight="600"} is French.
    
  • Since we are going to have generic attributes, we no longer support quoted titles in links. One can add a title attribute if needed, but this isn't very common, so we don't need a special syntax for it:

    [Link text](url){title="Click me!"}
    
  • Fenced divs and bracketed spans are introduced in order to allow attributes to be attached to arbitrary sequences of block-level or inline-level elements. For example,

    {#warning .sidebar}
    ::: Warning
    This is a warning.
    Here is a word in [français]{lang=fr}.
    :::
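The list-item rule adopted for goal 5 fits in a few lines. This Python sketch is illustrative only: item_body is a hypothetical helper, indentation is counted in spaces, and every other djot block rule (including how paragraph continuation lines are handled) is ignored. A non-blank line belongs to the current item exactly when it is indented past the column where the list marker starts; blank lines are kept only if more item content follows them.

```python
# Toy sketch of djot's list-item membership rule (goal 5). Hypothetical helper,
# not from djot.js or djot.lua; indentation is counted in spaces only.
def item_body(lines: list[str], marker_col: int) -> list[str]:
    """Return the lines following a list marker that belong to the same item.

    marker_col is the column where the list marker (e.g. "1.") starts.
    """
    body: list[str] = []
    pending_blank = 0  # blank lines are kept only if item content follows
    for line in lines:
        if not line.strip():
            pending_blank += 1
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent <= marker_col:
            break  # not indented past the marker: the item has ended
        body.extend([""] * pending_blank)
        pending_blank = 0
        body.append(line)
    return body
```

Applied to the "1. list item" example with the marker in column 0, the block quote indented two spaces stays inside item 1 and "2. second item" ends it. Commonmark would instead require the block quote to reach column 3, where the content after "1. " begins.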
    
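The emphasis rule for goal 3, together with the {_ and _} escape hatch, can also be sketched briefly. The Python below is a toy written for this explanation, not the djot.js or djot.lua algorithm: render_emphasis is a hypothetical name, and it handles only _ and *, ignoring escapes, code spans, and all other inline syntax. A delimiter closes the nearest open delimiter of the same kind if it is not preceded by a space and some content lies between them; otherwise it opens if it is not followed by a space. Openers skipped over by a closer remain literal text, which is how "the first one to be closed takes precedence."

```python
# Toy sketch of djot's emphasis rule (goal 3), not the djot.js or djot.lua
# algorithm: only "_" (emphasis) and "*" (strong) are handled, with no
# escapes, code spans, or other inline syntax.
TAGS = {"_": "em", "*": "strong"}


def render_emphasis(text: str) -> str:
    out: list[str] = []                  # rendered pieces, in order
    openers: list[tuple[int, str]] = []  # (index in out, delimiter) still open
    i = 0
    while i < len(text):
        c = text[i]
        # "{_" or "{*": a delimiter that can only open.
        force_open = c == "{" and text[i + 1:i + 2] in ("_", "*")
        if force_open:
            i += 1
            c = text[i]
        if c not in TAGS:
            out.append(c)
            i += 1
            continue
        # "_}" or "*}": a delimiter that can only close.
        force_close = not force_open and text[i + 1:i + 2] == "}"
        prev = text[i - 1] if i > 0 else " "
        nxt = text[i + 1] if i + 1 < len(text) else " "

        # Closing: not preceded by a space, and the nearest open delimiter of
        # the same kind must have some content between it and this one.
        match = None
        if not force_open and not prev.isspace():
            for k in range(len(openers) - 1, -1, -1):
                if openers[k][1] == c:
                    if openers[k][0] < len(out) - 1:
                        match = k
                    break

        if match is not None:
            j = openers[match][0]
            del openers[match:]  # openers skipped over stay literal text
            out[j] = f"<{TAGS[c]}>"
            out.append(f"</{TAGS[c]}>")
        elif not force_close and not nxt.isspace():
            openers.append((len(out), c))
            out.append("{" + c if force_open else c)  # literal if never closed
        else:
            out.append(("{" if force_open else "") + c + ("}" if force_close else ""))
        i += 2 if force_close else 1
    return "".join(out)
```

With this sketch, render_emphasis("_(_foo_)_") returns <em>(</em>foo<em>)</em>, while render_emphasis("_({_foo_})_") returns <em>(<em>foo</em>)</em>, matching the two readings contrasted in the emphasis discussion.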

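The non-backtracking treatment of code spans (goal 1) can be sketched the same way. In this toy Python version, which is not the djot.js or djot.lua implementation, code_spans is a hypothetical name; it recognizes only backticks and ignores djot's finer rules about spaces next to the backticks. An opening run of backticks is closed by the next run of exactly the same length, and if there is none, the code span simply runs to the end of the paragraph.

```python
# Toy sketch of djot's code-span rule (goal 1), not the djot.js or djot.lua
# implementation: only backticks are recognized, and djot's finer rules about
# spaces next to the backticks are ignored.
def code_spans(paragraph: str) -> list[tuple[str, str]]:
    """Split a paragraph into ("text", s) and ("code", s) pieces."""
    pieces: list[tuple[str, str]] = []
    n = len(paragraph)
    i = text_start = 0
    while i < n:
        if paragraph[i] != "`":
            i += 1
            continue
        j = i
        while j < n and paragraph[j] == "`":  # measure the opening run
            j += 1
        if text_start < i:
            pieces.append(("text", paragraph[text_start:i]))
        # Scan forward for a closing run of exactly the same length.
        k, close = j, None
        while k < n and close is None:
            if paragraph[k] != "`":
                k += 1
                continue
            m = k
            while m < n and paragraph[m] == "`":
                m += 1
            if m - k == j - i:
                close = (k, m)
            k = m
        if close is None:
            # Unclosed: the span runs to the end of the paragraph, so the
            # scanner never goes back to re-read this text as ordinary inlines.
            pieces.append(("code", paragraph[j:]))
            return pieces
        pieces.append(("code", paragraph[j:close[0]]))
        i = text_start = close[1]
    if text_start < n:
        pieces.append(("text", paragraph[text_start:]))
    return pieces
```

Because an unclosed span consumes the rest of the paragraph, the scan only ever moves forward. In commonmark, by contrast, an unmatched backtick run is literal text, so the characters after it have to be parsed again as ordinary inline content.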
Syntax

For a full syntax reference, see the syntax description.

A vim syntax highlighting definition for djot is provided in editors/vim/.

Implementations

There are currently two complete djot implementations:

  • djot.js is a JavaScript (actually TypeScript) library and command-line tool. It includes a djot renderer and a converter between pandoc and djot ASTs, allowing conversion between djot and many other formats.

  • djot.lua is a Lua library and command-line tool, with no dependencies. It includes a Makefile that can produce a static C library that can be linked with liblua in a standalone executable. It also includes a custom reader and writer for djot that can be used for interoperability with pandoc, allowing conversion between djot and many other formats.

Both implementations support filters, small programs that can alter the AST after parsing, allowing djot to be customized to your needs.
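The general idea of a filter can be illustrated without either library. The Python sketch below does not use the djot.js or djot.lua filter APIs: the dict-based tree (with "tag", "children", and "text" keys, loosely modeled on djot's JSON AST), apply_filter, and shout are all assumptions made for illustration. Here a filter is a table mapping node tags to functions, and the walker calls the matching function on every node of the already-parsed tree.

```python
from typing import Any, Callable

# Hypothetical AST: nested dicts with a "tag", optional "children", and
# optional "text". Loosely modeled on djot's JSON AST; an assumption here.
Node = dict[str, Any]
Filter = dict[str, Callable[[Node], None]]


def apply_filter(node: Node, flt: Filter) -> None:
    """Walk the tree depth-first, calling flt[tag](node) on matching nodes."""
    action = flt.get(node["tag"])
    if action is not None:
        action(node)
    for child in node.get("children", []):
        apply_filter(child, flt)


def shout(node: Node) -> None:
    # Example action: upper-case every plain text node in place.
    node["text"] = node["text"].upper()


doc: Node = {
    "tag": "doc",
    "children": [
        {"tag": "para", "children": [
            {"tag": "str", "text": "hello "},
            {"tag": "emph", "children": [{"tag": "str", "text": "world"}]},
        ]},
    ],
}

apply_filter(doc, {"str": shout})
```

Since the filter operates on the AST rather than on rendered output, the same transformation applies whichever format the document is later converted to.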

djot.lua was the original reference implementation, but current development is focused on djot.js, and it is possible that djot.lua will not be kept up to date with the latest syntax changes.

File extension

The extension .dj may be used to indicate that the contents of a file are djot-formatted text.

License

The code and documentation are released under the MIT license.
