  • Language: Python
  • License: BSD 3-Clause "New...

A Python module for writing pandoc filters, with a collection of examples

pandocfilters

A Python module for writing pandoc filters

What are pandoc filters?

Pandoc filters are pipes that read a JSON serialization of the Pandoc AST from stdin, transform it in some way, and write it to stdout. They can be used with pandoc (>= 1.12) either using pipes:

pandoc -t json -s | ./caps.py | pandoc -f json

or using the --filter (or -F) command-line option:

pandoc --filter ./caps.py -s

For more on pandoc filters, see the pandoc documentation under --filter and the tutorial on writing filters.

For an alternative library for writing pandoc filters, with a more "Pythonic" design, see panflute.
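To make the pipe model above concrete, here is a minimal sketch of a filter written with only the standard library. It assumes the newer JSON format, in which the document is a top-level object with a `blocks` list and every element is an object with a type tag `t` and optional contents `c`. The names `upper_strs` and `run_filter` are made up for this illustration and are not part of pandocfilters.

```python
import io
import json


def upper_strs(node):
    """Recursively return a copy of `node` with every Str element uppercased."""
    if isinstance(node, list):
        return [upper_strs(child) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Str":
            return {"t": "Str", "c": node["c"].upper()}
        return {key: upper_strs(val) for key, val in node.items()}
    return node  # plain strings, numbers, None: leave untouched


def run_filter(infile, outfile):
    """Read a JSON document from `infile`, transform it, write it to `outfile`."""
    doc = json.load(infile)
    json.dump(upper_strs(doc), outfile)


# Hand-written stand-in for the output of `pandoc -t json`.
sample = {
    "pandoc-api-version": [1, 22],
    "meta": {},
    "blocks": [{"t": "Para", "c": [{"t": "Str", "c": "hello"},
                                   {"t": "Space"},
                                   {"t": "Code", "c": [["", [], []], "x = 1"]}]}],
}
out = io.StringIO()
run_filter(io.StringIO(json.dumps(sample)), out)
result = json.loads(out.getvalue())
```

In a real filter the call would be `run_filter(sys.stdin, sys.stdout)`. The point of a helper library is to take over the traversal, so that a filter only has to supply the per-element action.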

Compatibility

Pandoc 1.16 added attributes to links and images, alongside the existing caption and target arguments. This required a change in pandocfilters that breaks backwards compatibility. Consequently, you should use:

  • pandocfilters version <= 1.2.4 for pandoc versions 1.12 to 1.15, and
  • pandocfilters version >= 1.3.0 for pandoc versions >= 1.16.

Pandoc 1.17.3 (pandoc-types 1.17.*) introduced a new JSON format. pandocfilters 1.4.0 should work with both the old and the new format.
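The difference between the two JSON formats is easiest to see at the top level. The sketch below is an illustration, not the library's code. It assumes the old format is a two-element list `[{"unMeta": ...}, blocks]` and the new format is an object with `meta` and `blocks` keys. The helper name `split_document` is hypothetical.

```python
import json


def split_document(doc):
    """Return (meta, blocks) for a parsed pandoc JSON document.

    Handles both layouts assumed in the text above: the newer top-level
    object and the older two-element list.
    """
    if isinstance(doc, dict):          # newer format
        return doc.get("meta", {}), doc.get("blocks", [])
    header, blocks = doc               # older format: [header, blocks]
    return header.get("unMeta", {}), blocks


old_doc = json.loads('[{"unMeta": {}}, [{"t": "Para", "c": []}]]')
new_doc = json.loads(
    '{"pandoc-api-version": [1, 22], "meta": {}, "blocks": [{"t": "Para", "c": []}]}'
)

old_meta, old_blocks = split_document(old_doc)
new_meta, new_blocks = split_document(new_doc)
```

A filter written against the helper functions normally never needs this distinction, since the element-level `t`/`c` structure that actions see is the same in both layouts.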

Installing

To install from a source checkout, run this in the repository's root directory:

python setup.py install

Or install from PyPI:

pip install pandocfilters

Available functions

The main functions pandocfilters exports are

  • walk(x, action, format, meta)

    Walk a tree, applying an action to every object. Returns a modified tree. An action is a function of the form action(key, value, format, meta), where:

    • key is the type of the pandoc object (e.g. 'Str', 'Para')
    • value is the contents of the object (e.g. a string for 'Str', a list of inline elements for 'Para')
    • format is the target output format (as supplied by the format argument of walk)
    • meta is the document's metadata

    The return of an action is either:

    • None: this means that the object should remain unchanged
    • a pandoc object: this will replace the original object
    • a list of pandoc objects: these replace the original object; the list is spliced into the list the original object belongs to, merging with its neighbors; returning an empty list deletes the object

  • toJSONFilter(action)

    Like toJSONFilters, but takes a single action as argument.

  • toJSONFilters(actions)

    Generate a JSON-to-JSON filter from stdin to stdout.

    The filter:

    • reads a JSON-formatted pandoc document from stdin
    • transforms it by walking the tree and performing the actions
    • writes a new JSON-formatted pandoc document to stdout

    The argument actions is a list of functions of the form action(key, value, format, meta), as described in more detail under walk.

    This function calls applyJSONFilters, with the format argument provided by the first command-line argument, if present. (Pandoc sets this by default when calling filters.)

  • applyJSONFilters(actions, source, format="")

    Walk through a JSON structure and apply filters.

    This:

    • reads a JSON-formatted pandoc document from a source string
    • transforms it by walking the tree and performing the actions
    • returns a new JSON-formatted pandoc document as a string

    The actions argument is a list of functions (see walk for a full description).

    The argument source is a string-encoded JSON object.

    The argument format is a string describing the output format.

    Returns a new JSON-formatted pandoc document.

  • stringify(x)

    Walks the tree x and returns concatenated string content, leaving out all formatting.

  • attributes(attrs)

    Returns an attribute list, constructed from the dictionary attrs.
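The return conventions for actions described under walk can be illustrated with a small stand-in. The `simple_walk` function below is a simplified sketch written for this example, not the library's implementation. It exists only to show how None, a single object, and a list are treated, and `demo_action` is a made-up action.

```python
def simple_walk(x, action, format, meta):
    """Illustrative stand-in for walk: apply `action` to every element of `x`."""
    if isinstance(x, list):
        out = []
        for item in x:
            if isinstance(item, dict) and "t" in item:
                res = action(item["t"], item.get("c"), format, meta)
                if res is None:                 # unchanged: keep, but descend
                    out.append(simple_walk(item, action, format, meta))
                elif isinstance(res, list):     # list: splice into the parent
                    out.extend(simple_walk(r, action, format, meta) for r in res)
                else:                           # single object: replace
                    out.append(simple_walk(res, action, format, meta))
            else:
                out.append(simple_walk(item, action, format, meta))
        return out
    if isinstance(x, dict):
        return {k: simple_walk(v, action, format, meta) for k, v in x.items()}
    return x


def demo_action(key, value, format, meta):
    if key == "Emph":
        return value                            # list: unwrap the emphasis
    if key == "Space":
        return []                               # empty list: delete
    if key == "Code":
        return {"t": "Str", "c": value[1]}      # single replacement object
    return None                                 # everything else unchanged


inlines = [
    {"t": "Str", "c": "a"},
    {"t": "Space"},
    {"t": "Emph", "c": [{"t": "Str", "c": "b"}, {"t": "Str", "c": "c"}]},
    {"t": "Code", "c": [["", [], []], "x"]},
]
result = simple_walk(inlines, demo_action, "html", {})
```

Here the Space disappears, the two Str elements inside Emph are spliced into the surrounding list, and the Code element is replaced by a single Str.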

How to use

Most users will only need toJSONFilter. Here is a simple example of its use:

#!/usr/bin/env python

"""
Pandoc filter to convert all regular text to uppercase.
Code, link URLs, etc. are not affected.
"""

from pandocfilters import toJSONFilter, Str

def caps(key, value, format, meta):
  if key == 'Str':
    return Str(value.upper())

if __name__ == "__main__":
  toJSONFilter(caps)
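Because an action also receives the output format, a filter can behave differently per target. The sketch below is in the spirit of the myemph.py example listed under Examples, but it is written here with plain dicts so that it runs without the library installed. The helper `raw_latex` and the action name `custom_emph` are hypothetical, and the sketch assumes a RawInline element carries a format name followed by the raw text.

```python
def raw_latex(text):
    """Build a RawInline element as a plain dict (format name, then raw text)."""
    return {"t": "RawInline", "c": ["latex", text]}


def custom_emph(key, value, format, meta):
    """Wrap emphasized text in \\myemph{...} for LaTeX output only."""
    if key == "Emph" and format == "latex":
        # Returning a list splices these inlines in place of the Emph element.
        return [raw_latex("\\myemph{")] + value + [raw_latex("}")]
    # Falling through returns None: every other element is left unchanged.


emph_content = [{"t": "Str", "c": "note"}]
latex_result = custom_emph("Emph", emph_content, "latex", {})
html_result = custom_emph("Emph", emph_content, "html", {})

# To use this as a filter, pass the action to toJSONFilter as in the
# caps example:
#     from pandocfilters import toJSONFilter
#     toJSONFilter(custom_emph)
```

For any format other than latex the action returns None, so the emphasis is rendered as usual.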

Examples

The examples subdirectory in the source repository contains the following filters. These filters should provide a useful starting point for developing your own pandoc filters.

abc.py
Pandoc filter to process code blocks with class abc containing ABC notation into images. Assumes that abcm2ps and ImageMagick's convert are in the path. Images are put in the abc-images directory.
caps.py
Pandoc filter to convert all regular text to uppercase. Code, link URLs, etc. are not affected.
blockdiag.py
Pandoc filter to process code blocks with class "blockdiag" into generated images. Needs utils from http://blockdiag.com.
comments.py
Pandoc filter that causes everything between <!-- BEGIN COMMENT --> and <!-- END COMMENT --> to be ignored. The comment lines must appear on lines by themselves, with blank lines surrounding them.
deemph.py
Pandoc filter that causes emphasized text to be displayed in ALL CAPS.
deflists.py
Pandoc filter to convert definition lists to bullet lists with the defined terms in strong emphasis (for compatibility with standard markdown).
gabc.py
Pandoc filter to convert code blocks with class "gabc" to LaTeX \gabcsnippet commands in LaTeX output, and to images in HTML output.
graphviz.py
Pandoc filter to process code blocks with class graphviz into graphviz-generated images.
lilypond.py
Pandoc filter to process code blocks with class "ly" containing Lilypond notation.
metavars.py
Pandoc filter to allow interpolation of metadata fields into a document. %{fields} will be replaced by the field's value, assuming it is of the type MetaInlines or MetaString.
myemph.py
Pandoc filter that causes emphasis to be rendered using the custom macro \myemph{...} rather than \emph{...} in LaTeX. Other output formats are unaffected.
plantuml.py
Pandoc filter to process code blocks with class plantuml to images. Needs plantuml.jar from http://plantuml.com/.
ditaa.py
Pandoc filter to process code blocks with class ditaa to images. Needs ditaa.jar from http://ditaa.sourceforge.net/.
theorem.py
Pandoc filter to convert divs with class="theorem" to LaTeX theorem environments in LaTeX output, and to numbered theorems in HTML output.
tikz.py
Pandoc filter to process raw LaTeX TikZ environments into images. Assumes that pdflatex is in the path, and that the standalone package is available. Also assumes that ImageMagick's convert is in the path. Images are put in the tikz-images directory.
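
Several of the filters above share one pattern: find code blocks carrying a given class and replace them with something else. Below is a minimal sketch of the recognition step, assuming a CodeBlock's value has the shape `[[identifier, classes, key_value_pairs], code]`. The class name `demo` and the placeholder replacement are invented for illustration, and no image is actually rendered.

```python
def replace_demo_blocks(key, value, format, meta):
    """Replace code blocks of class 'demo' with a placeholder paragraph."""
    if key != "CodeBlock":
        return None
    (ident, classes, keyvals), code = value
    if "demo" not in classes:
        return None
    # A real filter would render `code` to a file here and return an
    # image element; this sketch only leaves a marker naming the block.
    marker = "[demo:%s]" % ident
    return {"t": "Para", "c": [{"t": "Str", "c": marker}]}


block = [["fig1", ["demo"], [["width", "50%"]]], "A -> B"]
replaced = replace_demo_blocks("CodeBlock", block, "html", {})
untouched = replace_demo_blocks(
    "CodeBlock", [["", ["python"], []], "x = 1"], "html", {}
)
```

Code blocks without the class, and all other elements, yield None and pass through unchanged.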

API documentation

By default, most filters use get_filename4code to create a directory ...-images in which to save temporary files. This directory is not removed, because it can serve as a cache: later pandoc runs don't have to recreate files that already exist. The directory is created in the current directory.

If you prefer to have a clean directory after running pandoc filters, set the environment variable PANDOCFILTER_CLEANUP to any non-empty value, such as 1. This forces the code to create a temporary directory that is removed at the end of execution.
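
The caching behavior described above can be sketched as follows. This is an illustration of the idea, not the source of get_filename4code: the function names `cached_path` and `needs_render`, the SHA-1 naming scheme, and the `base_dir` parameter are assumptions made for the example.

```python
import atexit
import hashlib
import os
import shutil
import tempfile


def cached_path(module, content, ext, base_dir="."):
    """Return a file path for `content`, creating its directory if needed.

    Hypothetical helper: the file name is derived from a hash of the
    content, so identical input maps to the same file and can act as a cache.
    """
    if os.getenv("PANDOCFILTER_CLEANUP"):
        # Throwaway directory, removed when the interpreter exits.
        directory = tempfile.mkdtemp(prefix=module)
        atexit.register(shutil.rmtree, directory, ignore_errors=True)
    else:
        # Persistent directory, reused across runs.
        directory = os.path.join(base_dir, module + "-images")
        os.makedirs(directory, exist_ok=True)
    digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
    return os.path.join(directory, digest + "." + ext)


def needs_render(path):
    """A filter can skip regeneration when the cached file already exists."""
    return not os.path.isfile(path)
```

With the variable unset, two runs over the same content resolve to the same path, so the second run can skip rendering. With it set, every run starts from an empty temporary directory and nothing is reused.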
