  • Stars: 447
  • Rank: 97,403 (Top 2 %)
  • Language: Java
  • License: Apache License 2.0
  • Created over 13 years ago
  • Updated about 4 years ago

Repository Details

Yet another markdown processor for the JVM

Txtmark - Java markdown processor

Copyright (C) 2011-2015 René Jeschke
See LICENSE.txt for licensing information.


Txtmark is yet another markdown processor for the JVM.

  • It is easy to use:

    String result = txtmark.Processor.process("This is ***TXTMARK***");
    
  • It is fast (see below)
    ... well, it is the fastest markdown processor on the JVM right now. (This might be outdated, but txtmark is still flippin' fast.)

  • It does not depend on other libraries, so adding txtmark.jar to the classpath is sufficient to use Txtmark in your project.

For an in-depth explanation of Markdown, have a look at the original Markdown Syntax.


Maven repository

Txtmark is available on Maven Central.


Txtmark extensions

To enable Txtmark's extended markdown parsing you can use the $PROFILE$ mechanism:

[$PROFILE$]: extended

This seemed to me to be the easiest and safest way to enable different behaviours. Just put this line into your Txtmark file, the same way you would define a reference link.
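
Because the profile is selected by a line in the text itself, it can also be added before the text is handed to the processor. The following is a minimal sketch using only the JDK. The names `ProfileLines` and `withExtendedProfile` are hypothetical and not part of Txtmark, and the line check is a simple pattern match, not Txtmark's own parsing.

```java
import java.util.regex.Pattern;

public class ProfileLines
{
    // Hypothetical helper: matches a line such as "[$PROFILE$]: extended",
    // allowing up to three leading spaces. This is an assumption, not Txtmark's parser.
    private static final Pattern PROFILE_LINE =
            Pattern.compile("^ {0,3}\\[\\$PROFILE\\$\\]:\\s*extended\\s*$", Pattern.MULTILINE);

    /** Returns the text unchanged if it already has the profile line, otherwise prepends it. */
    public static String withExtendedProfile(String markdown)
    {
        if (PROFILE_LINE.matcher(markdown).find())
        {
            return markdown;
        }
        return "[$PROFILE$]: extended\n\n" + markdown;
    }

    public static void main(String[] args)
    {
        // The resulting text starts with the profile line, followed by the original content.
        System.out.println(withExtendedProfile("This is a paragraph\n* and this is a list"));
    }
}
```

The result of `withExtendedProfile(...)` would then be passed to the processor in place of the raw text.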

Behavior changes when using [$PROFILE$]: extended

  • Lists and code blocks end a paragraph

    In normal markdown the following:

    This is a paragraph
    * and this is not a list
    

    Will produce:

    <p>This is a paragraph
    * and this is not a list</p>
    

    When using Txtmark extensions this changes to:

    <p>This is a paragraph</p>
    <ul>
    <li>and this is not a list</li>
    </ul>
    
  • Text anchors

    Headlines and list items may receive an ID which you can refer to using links.

    ## Headline with ID ##     {#headid}
    
    Another headline with ID   {#headid2}
    ------------------------
    
    * List with ID             {#listid}
    
    Links: [Foo] (#headid)
    

    This will produce:

    <h2 id="headid">Headline with ID</h2>
    <h2 id="headid2">Another headline with ID</h2>
    <ul>
    <li id="listid">List with ID</li>
    </ul>
    <p>Links: <a href="#headid">Foo</a></p>
    

    The ID must be the last thing on the first line.

    All spaces before {# get removed, so you can't use an ID and a manual line break in the same line.

  • Auto HTML entities

    • (C) becomes &copy; - ©
    • (R) becomes &reg; - ®
    • (TM) becomes &trade; - ™
    • -- becomes &ndash; - –
    • --- becomes &mdash; - —
    • ... becomes &hellip; - …
    • << becomes &laquo; - «
    • >> becomes &raquo; - »
    • "Hello" becomes &ldquo;Hello&rdquo; - “Hello”
  • Underscores (Emphasis)

    Underscores in the middle of a word don't result in emphasis.

    Con_cat_this
    

    normally produces the following, whereas with Txtmark extensions it is left unchanged:

    Con<em>cat</em>this
    
  • Superscript

    You can use ^ to mark a span as superscript.

    2^2^ = 4
    

    turns into

    2<sup>2</sup> = 4
    
  • Abbreviations

    Abbreviations are defined like reference links, but use a * in place of the link target, and the definition must fit on a single line.

    [Git]: * "Fast distributed revision control system"
    

    and used like this

    This is [Git]!
    

    which will produce

    This is <abbr title="Fast distributed revision control system">Git</abbr>!
    
  • Fenced code blocks

    ```
    This is code!
    ```
    
    ~~~
    Another code block
    ~~~
    
    ~~~
    You can also mix flavours
    ```
    

    Fenced code block delimiter lines start with at least three ` or ~ characters.

    It is possible to add meta data to the beginning line. Everything trailing after the ` or ~ characters is then considered meta data. These are all valid meta lines:

    ```python
    ~ ~ ~ ~ ~java
    ``` ``` ``` this is even more meta
    

    The meta information that you provide here can be used with a BlockEmitter to include e.g. syntax highlighted code blocks. Here's an example:

    public class CodeBlockEmitter implements BlockEmitter
    {
        private static void append(StringBuilder out, List<String> lines)
        {
            out.append("<pre class=\"pre_no_hl\">");
            for (final String l : lines)
            {
                Utils.escapedAdd(out, l);
                out.append('\n');
            }
            out.append("</pre>");
        }
    
        @Override
        public void emitBlock(StringBuilder out, List<String> lines, String meta)
        {
            if (Strings.isEmpty(meta))
            {
                append(out, lines);
            }
            else
            {
                try
                {
                    // Utils#highlight(...) is not included with txtmark; its sole purpose
                    // is to show what the meta can be used for
                    out.append(Utils.highlight(lines, meta));
                    out.append('\n');
                }
                catch (final IOException e)
                {
                    // Ignore or do something, still, pump out the lines
                    append(out, lines);
                }
            }
        }
    }
    

    You can then set the BlockEmitter in the txtmark Configuration using Configuration.Builder#setCodeBlockEmitter(BlockEmitter emitter).
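
To make the fence rule above concrete, here is a minimal JDK-only sketch of how a delimiter line could be split into fence characters and meta data. `FenceLines` and `fenceMeta` are hypothetical names, and the sketch is not Txtmark's implementation: it assumes that spaces between fence characters are skipped and that ` and ~ may be mixed on one line, which is inferred from the examples above.

```java
import java.util.Optional;

public class FenceLines
{
    /**
     * Returns the meta text of a fence line, or an empty Optional if the line
     * has fewer than three fence characters. Assumption: spaces between fence
     * characters are ignored, and ` and ~ both count as fence characters.
     */
    public static Optional<String> fenceMeta(String line)
    {
        int pos = 0;
        int fenceChars = 0;
        while (pos < line.length())
        {
            final char c = line.charAt(pos);
            if (c == '`' || c == '~')
            {
                fenceChars++;
            }
            else if (c != ' ')
            {
                // First character that is neither a fence character nor a space: meta starts here.
                break;
            }
            pos++;
        }
        if (fenceChars < 3)
        {
            return Optional.empty();
        }
        return Optional.of(line.substring(pos).trim());
    }

    public static void main(String[] args)
    {
        System.out.println(fenceMeta("```python").orElse("<not a fence>"));     // prints "python"
        System.out.println(fenceMeta("~ ~ ~ ~ ~java").orElse("<not a fence>")); // prints "java"
        System.out.println(fenceMeta("`` text").orElse("<not a fence>"));       // prints "<not a fence>"
    }
}
```

The string returned here corresponds to the `meta` argument that a BlockEmitter receives in `emitBlock`.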


Markdown conformity

Txtmark passes all tests inside MarkdownTest_1.0_2007-05-09 except for two:

  1. Images.text

    Fails because Txtmark doesn't produce empty 'title' image attributes.
    (IMHO: Images ... OK)

  2. Literal quotes in titles.text

    What the frell ... this test will continue to FAIL.
    Sorry, but using unescaped " in a title which should be surrounded by " is unacceptable for me ;)

    Change:

    Foo [bar](/url/ "Title with "quotes" inside").
    [bar]: /url/ "Title with "quotes" inside"
    

    to:

    Foo [bar](/url/ "Title with \"quotes\" inside").
    [bar]: /url/ "Title with \"quotes\" inside"
    

    and Txtmark will produce the correct result.
    (IMHO: Literal quotes in titles ... OK)
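
If titles are generated by a program, the escaping shown above can be applied before the link is written out. This is a minimal sketch under that assumption. `TitleQuotes` and `escapeQuotes` are hypothetical names and not part of Txtmark.

```java
public class TitleQuotes
{
    /** Escapes every double quote that is not already part of a backslash escape. */
    public static String escapeQuotes(String title)
    {
        final StringBuilder out = new StringBuilder(title.length() + 8);
        for (int i = 0; i < title.length(); i++)
        {
            final char c = title.charAt(i);
            if (c == '\\' && i + 1 < title.length())
            {
                // Copy an existing escape sequence (backslash plus next character) untouched.
                out.append(c).append(title.charAt(++i));
            }
            else if (c == '"')
            {
                out.append("\\\"");
            }
            else
            {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args)
    {
        final String title = escapeQuotes("Title with \"quotes\" inside");
        // prints: Foo [bar](/url/ "Title with \"quotes\" inside").
        System.out.println("Foo [bar](/url/ \"" + title + "\").");
    }
}
```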


Where Txtmark is not like Markdown

  • Txtmark does not produce empty title attributes in link and image tags.

  • Unescaped " in link titles starting with " are not recognized and result in unexpected behaviour.

  • Due to a different list parsing approach some things get interpreted differently:

    * List
    > Quote
    

    will produce when processed with Markdown:

    <p><ul>
    <li>List</p>
    
    <blockquote>
     <p>Quote</li>
    </ul></p>
    </blockquote>
    

    and this when processed with Txtmark:

    <ul>
    <li>List<blockquote><p>Quote</p>
    </blockquote>
    </li>
    </ul>
    

    Another one:

    * List
    ====
    

    will produce when processed with Markdown:

    <h1>* List</h1>
    

    and this when processed with Txtmark:

    <ul>
    <li><h1>List</h1>
    </li>
    </ul>
    
  • List of escapable characters:

    \   [   ]   (   )   {   }   #
    "   '   .   <   >   +   -   _
    !   `   ^
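
When literal text is inserted into a Txtmark document by a program, the characters listed above can be prefixed with a backslash so they are not read as markup. The following is a minimal sketch. `MarkdownEscaper` and `escape` are hypothetical names, and the character set is copied from the list above, nothing more.

```java
public class MarkdownEscaper
{
    // The characters from the list above, in the same order.
    private static final String ESCAPABLE = "\\[](){}#\"'.<>+-_!`^";

    /** Prefixes every escapable character in the text with a backslash. */
    public static String escape(String text)
    {
        final StringBuilder out = new StringBuilder(text.length() * 2);
        for (int i = 0; i < text.length(); i++)
        {
            final char c = text.charAt(i);
            if (ESCAPABLE.indexOf(c) >= 0)
            {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args)
    {
        // prints: 2\^2 = 4 \[not a link\]
        System.out.println(escape("2^2 = 4 [not a link]"));
    }
}
```

This escapes more than is strictly needed in most positions. A narrower escaper would have to know the surrounding context.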
    

Performance comparison of markdown processors for the JVM

Remarks: These benchmarks are too old to be of any value. I leave them here as a reference, though.

Based on this benchmark suite.

Excerpt from the original post concerning this benchmark suite:

Most of these tests are of course unrealistic: Who would write a text where each word is a link? Yet they serve an important use: It makes it possible for the developer to pinpoint the parts of the parser where there is most room for improvement. Also, it explains why certain texts might render much faster in one Processor than in another.

Benchmark system:

  • Ubuntu Linux 10.04 32 Bit
  • Intel(R) Core(TM) 2 Duo T7500 @ 2.2GHz
  • Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
  • Java HotSpot(TM) Server VM (build 19.1-b02, mixed mode)

| Test | Actuarius 1st run (ms) | Actuarius 2nd run (ms) | PegDown 1st run (ms) | PegDown 2nd run (ms) | Knockoff 1st run (ms) | Knockoff 2nd run (ms) | Txtmark 1st run (ms) | Txtmark 2nd run (ms) |
|---|---|---|---|---|---|---|---|---|
| Plain Paragraphs | 1127 | 577 | 1273 | 1037 | 740 | 400 | 157 | 64 |
| Every Word Emphasized | 1562 | 1001 | 1523 | 1513 | 13982 | 13221 | 54 | 46 |
| Every Word Strong | 1125 | 997 | 1115 | 1114 | 9543 | 9647 | 44 | 41 |
| Every Word Inline Code | 382 | 277 | 1058 | 1052 | 9116 | 9074 | 51 | 39 |
| Every Word a Fast Link | 2257 | 1600 | 537 | 531 | 3980 | 3410 | 109 | 55 |
| Every Word Consisting of Special XML Chars | 4045 | 4270 | 2985 | 3044 | 312 | 377 | 778 | 775 |
| Every Word wrapped in manual HTML tags | 3334 | 2919 | 901 | 896 | 3863 | 3736 | 73 | 62 |
| Every Line with a manual line break | 510 | 588 | 1445 | 1440 | 1527 | 1130 | 56 | 56 |
| Every word with a full link | 452 | 246 | 1045 | 996 | 1884 | 1819 | 86 | 55 |
| Every word with a full image | 268 | 150 | 1140 | 1132 | 1985 | 1908 | 38 | 36 |
| Every word with a reference link | 9847 | 9082 | 18956 | 18719 | 121136 | 115416 | 1525 | 1380 |
| Every block a quote | 445 | 206 | 1312 | 1301 | 478 | 457 | 50 | 45 |
| Every block a codeblock | 70 | 87 | 373 | 376 | 161 | 175 | 60 | 22 |
| Every block a list | 920 | 912 | 1720 | 1725 | 622 | 651 | 55 | 55 |
| All tests together | 3281 | 2885 | 5184 | 5196 | 10130 | 10460 | 206 | 196 |

Benchmarked versions:

Actuarius version: 0.2
PegDown version: 0.8.5.4
Knockoff version: 0.7.3-15


Mentioned/related projects

Markdown is Copyright (C) 2004 by John Gruber
SmartyPants is Copyright (C) 2003 by John Gruber
Actuarius is Copyright (C) 2010 by Christoph Henkelmann
Knockoff is Copyright (C) 2009-2011 by Tristan Juricek
PegDown is Copyright (C) 2010 by Mathias Doenitz
PHP Markdown & Extra is Copyright (C) 2009 Michel Fortin


Project link: https://github.com/rjeschke/txtmark