An implementation of markdown in C, using a PEG grammar

Note: this package is unmaintained.

What is this?

This is an implementation of John Gruber's markdown in C. It uses a parsing expression grammar (PEG) to define the syntax. This should allow easy modification and extension. It currently supports output in HTML, LaTeX, ODF, or groff_mm formats, and adding new formats is relatively easy.

It is pretty fast. A 179K text file that takes 5.7 seconds for Markdown.pl (v. 1.0.1) to parse takes less than 0.2 seconds for this markdown. It does, however, use a lot of memory (up to 4M of heap space while parsing the 179K file, and up to 80K for a 4K file). (Note that the memory leaks in earlier versions of this program have now been plugged.)

Both a library and a standalone program are provided.

peg-markdown is written and maintained by John MacFarlane (jgm on github), with significant contributions by Ryan Tomayko (rtomayko). It is released under both the GPL and the MIT license; see LICENSE for details.

Installing

On a linux or unix-based system

This program is written in portable ANSI C. It requires glib2. Most *nix systems will have this installed already. The build system requires GNU make.

The other required dependency, Ian Piumarta's peg/leg PEG parser generator, is included in the source directory. It will be built automatically. (However, it is not as portable as peg-markdown itself, and seems to require gcc.)

To make the 'markdown' executable:

    make

(Or, on some systems, gmake.) Then, for usage instructions:

    ./markdown --help

To run John Gruber's Markdown 1.0.3 test suite:

    make test

The test suite will fail on one of the list tests. Here's why. Markdown.pl encloses "item one" in the following list in <p> tags:

    1.  item one
        * subitem
        * subitem

    2.  item two

    3.  item three

peg-markdown does not enclose "item one" in <p> tags unless it has a following blank line. This is consistent with the official markdown syntax description, and lets the author of the document choose whether <p> tags are desired.

Cross-compiling for Windows with MinGW on a linux box

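Prerequisites:

  • A Linux system with the MinGW cross compiler installed (the examples below use /usr/bin/i586-mingw32msvc-cc).
  • The Windows glib-2.0 headers and libraries, unpacked into the cross-compiler directory tree (the examples below assume /usr/i586-mingw32msvc).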

Steps:

  1. Create the markdown parser using Linux-compiled leg from peg-0.1.4:

    ./peg-0.1.4/leg markdown_parser.leg >markdown_parser.c
    

    (Note: The same thing could be accomplished by cross-compiling leg, executing it on Windows, and copying the resulting C file to the Linux cross-compiler host.)

  2. Run the cross compiler with include flag for the Windows glib-2.0 headers: for example,

    /usr/bin/i586-mingw32msvc-cc -c \
    -I/usr/i586-mingw32msvc/include/glib-2.0 \
    -I/usr/i586-mingw32msvc/lib/glib-2.0/include -Wall -O3 -ansi markdown*.c
    
  3. Link against the Windows glib-2.0 libraries: for example,

    /usr/bin/i586-mingw32msvc-cc markdown*.o \
    -Wl,-L/usr/i586-mingw32msvc/lib/glib,--dy,--warn-unresolved-symbols,-lglib-2.0 \
    -o markdown.exe
    

The resulting executable depends on the glib DLL, so make sure that DLL is available on the Windows host (for example, in the same directory as markdown.exe or in a directory on the PATH).

Compiling with MinGW on Windows

These directions assume that MinGW is installed in c:\MinGW and glib-2.0 is installed in the MinGW directory hierarchy (with the mingw bin directory in the system path).

Unzip peg-markdown in a temp directory. From the directory with the peg-markdown source, execute:

    cd peg-0.1.4
    make PKG_CONFIG=c:/path/to/glib/bin/pkg-config.exe

Extensions

peg-markdown supports extensions to standard markdown syntax. These can be turned on using the command line flag -x or --extensions. -x by itself turns on all extensions. Extensions can also be turned on selectively, using individual command-line options. To see the available extensions:

    ./markdown --help-extensions

The --smart extension provides "smart quotes", dashes, and ellipses.

The --notes extension provides a footnote syntax like that of Pandoc or PHP Markdown Extra.

The --strike extension provides a strike-through syntax like that of Redcarpet. For strike-through support in LaTeX documents the sout macro from the ulem package is used. Add \usepackage[normalem]{ulem} to your document's preamble to load it.

Using the library

The library exports two functions:

    GString * markdown_to_g_string(char *text, int extensions, int output_format);
    char * markdown_to_string(char *text, int extensions, int output_format);

The only difference between these is that markdown_to_g_string returns a GString (glib's automatically resizable string), while markdown_to_string returns a regular character pointer. The calling program must free the returned memory: use g_string_free() for the GString returned by markdown_to_g_string, and free() for the string returned by markdown_to_string.

text is the markdown-formatted text to be converted. Note that tabs will be converted to spaces, using a four-space tab stop. Character encodings are ignored.
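
To make the four-space tab stop concrete, here is a minimal sketch of tab expansion in C. It is not the library's own code: the function name expand_tabs and the TABSTOP macro are made up for this example. Columns are counted in bytes, in keeping with the note that character encodings are ignored.

```c
#include <stdlib.h>
#include <string.h>

#define TABSTOP 4  /* hypothetical name; the text only says "four-space tab stop" */

/* Return a newly allocated copy of text in which each tab is replaced by
   enough spaces to reach the next multiple of TABSTOP columns.  The column
   counter restarts after every newline.  The caller frees the result.
   Returns NULL if allocation fails. */
char *expand_tabs(const char *text)
{
    size_t len = strlen(text);
    size_t i, o = 0, col = 0;
    /* Worst case: every input byte is a tab that expands to TABSTOP spaces. */
    char *out = malloc(len * TABSTOP + 1);

    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        char c = text[i];
        if (c == '\t') {
            /* Always emit at least one space, then pad to the tab stop. */
            do {
                out[o++] = ' ';
                col++;
            } while (col % TABSTOP != 0);
        } else {
            out[o++] = c;
            col = (c == '\n') ? 0 : col + 1;
        }
    }
    out[o] = '\0';
    return out;
}
```

With this sketch, a tab after one character expands to three spaces, and a tab at the start of a line expands to four.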

extensions is a bit-field specifying which syntax extensions should be used. If extensions is 0, no extensions will be used. If it is 0xFFFFFF, all extensions will be used. To set extensions selectively, combine the following constants with the bitwise | (OR) operator:

  • EXT_SMART turns on smart quotes, dashes, and ellipses.
  • EXT_NOTES turns on footnote syntax. Pandoc's footnote syntax is used here.
  • EXT_FILTER_HTML filters out raw HTML (except for styles).
  • EXT_FILTER_STYLES filters out styles in HTML.
  • EXT_STRIKE turns on strike-through syntax.
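
The sketch below shows how such a bit-field is typically built and queried in C. Flags are combined with bitwise OR (|), while bitwise AND (&) tests whether a flag is set. The DEMO_ constants and their numeric values are assumptions made for this example; the real constants and values are defined in markdown_lib.h. The two helper functions are likewise hypothetical.

```c
/* Stand-in flags for illustration only.  Each one occupies its own bit,
   which is what makes OR-ing them together safe.  The real values live
   in markdown_lib.h and may differ. */
enum demo_extensions {
    DEMO_EXT_SMART         = 0x01,
    DEMO_EXT_NOTES         = 0x02,
    DEMO_EXT_FILTER_HTML   = 0x04,
    DEMO_EXT_FILTER_STYLES = 0x08,
    DEMO_EXT_STRIKE        = 0x10
};

/* Build a bit-field from individual yes/no choices (hypothetical helper). */
int demo_select_extensions(int want_smart, int want_notes, int want_strike)
{
    int extensions = 0;  /* 0 means "no extensions" */

    if (want_smart)
        extensions |= DEMO_EXT_SMART;
    if (want_notes)
        extensions |= DEMO_EXT_NOTES;
    if (want_strike)
        extensions |= DEMO_EXT_STRIKE;
    return extensions;
}

/* Return 1 if flag is set in extensions, 0 otherwise (hypothetical helper). */
int demo_has_extension(int extensions, int flag)
{
    return (extensions & flag) != 0;
}
```

Note that AND-ing two distinct flags together gives 0, that is, no extensions at all, which is why OR is the operator to use when combining them.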

output_format is either HTML_FORMAT, LATEX_FORMAT, ODF_FORMAT, or GROFF_MM_FORMAT.

To use the library, include markdown_lib.h. See markdown.c for an example.
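
For a quick picture of the calling pattern, and in particular of who frees the result, here is a small sketch. So that it compiles without the library installed, the converter is passed in as a function pointer with the same shape as markdown_to_string. The names md_convert_fn, demo_passthrough, and convert_and_write are invented for this example, and demo_passthrough is only a stand-in that copies its input; it does no markdown conversion.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same parameter and return types as the markdown_to_string prototype. */
typedef char *(*md_convert_fn)(char *text, int extensions, int output_format);

/* Stand-in converter for illustration: returns a malloc'd copy of text. */
char *demo_passthrough(char *text, int extensions, int output_format)
{
    size_t n = strlen(text) + 1;
    char *copy = malloc(n);

    (void)extensions;     /* unused in the stand-in */
    (void)output_format;  /* unused in the stand-in */
    if (copy != NULL)
        memcpy(copy, text, n);
    return copy;
}

/* Convert text, write the result to out, and release it.
   Returns the number of bytes written, or -1 on failure. */
long convert_and_write(md_convert_fn convert, char *text,
                       int extensions, int output_format, FILE *out)
{
    char *result = convert(text, extensions, output_format);
    long written;

    if (result == NULL)  /* defensive check */
        return -1;
    written = (long)strlen(result);
    if (fputs(result, out) == EOF)
        written = -1;
    free(result);  /* a plain char * result is released with free() */
    return written;
}
```

With the real library, after including markdown_lib.h, the same helper could presumably be called as convert_and_write(markdown_to_string, text, EXT_SMART | EXT_NOTES, HTML_FORMAT, stdout).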

Hacking

It should be pretty easy to modify the program to produce other formats, and to parse syntax extensions. A quick guide:

  • markdown_parser.leg contains the grammar itself.

  • markdown_output.c contains functions for printing the Element structure in various output formats.

  • To add an output format, add the format to markdown_formats in markdown_lib.h. Then modify print_element in markdown_output.c, and add functions print_XXXX_string, print_XXXX_element, and print_XXXX_element_list. Also add an option in the main program that selects the new format. Don't forget to add it to the list of formats in the usage message.

  • To add syntax extensions, define them in the PEG grammar (markdown_parser.leg), using existing extensions as a guide. New inline elements will need to be added to Inline =; new block elements will need to be added to Block =. (Note: the order of the alternatives does matter in PEG grammars.)

  • If you need to add new types of elements, modify the keys enum in markdown_peg.h.

  • By using &{ } rules one can selectively disable extensions depending on command-line options. For example, &{ extension(EXT_SMART) } succeeds only if the EXT_SMART bit of the global syntax_extensions is set. Add your option to markdown_extensions in markdown_lib.h, and add an option in markdown.c to turn on your extension.

  • Note: Avoid using [^abc] character classes in the grammar, because they cause problems with non-ascii input. Instead, use: ( !'a' !'b' !'c' . )
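
The &{ } gating described in the list above can be pictured in plain C. In the sketch below, extension() checks a bit of the global syntax_extensions, as the text describes, and match_ellipsis models a rule of the hypothetical form &{ extension(EXT_SMART) } "..." by checking the predicate before trying to match any input. The constant DEMO_EXT_SMART, its value, and match_ellipsis are assumptions made for this example; this is not the code that leg generates.

```c
#include <string.h>

enum { DEMO_EXT_SMART = 0x01 };  /* stand-in value, for illustration only */

/* Global bit-field of enabled extensions; set from command-line options. */
int syntax_extensions = 0;

/* Nonzero only if the given extension bit is set in syntax_extensions. */
int extension(int ext)
{
    return (syntax_extensions & ext) != 0;
}

/* Models:  &{ extension(EXT_SMART) } "..."
   Returns the number of bytes consumed, or 0 if the rule fails.
   The predicate itself consumes no input; if it fails, the rest of the
   sequence is never tried. */
size_t match_ellipsis(const char *input)
{
    if (!extension(DEMO_EXT_SMART))
        return 0;
    return strncmp(input, "...", 3) == 0 ? 3 : 0;
}
```

With the bit cleared, the rule fails even on input that would otherwise match, so the parser falls through to the next alternative and the text is treated as ordinary characters.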

Acknowledgements

Support for ODF output was added by Fletcher T. Penney.
