  • Stars
    1,867
  • Rank 24,668 (Top 0.5 %)
  • Language
    Rust
  • License
    MIT License
  • Created over 8 years ago
  • Updated about 2 months ago

Repository Details

Rust library for syntax highlighting using Sublime Text syntax definitions.

syntect

syntect is a syntax highlighting library for Rust that uses Sublime Text syntax definitions. It aims to be a good solution for any Rust project that needs syntax highlighting, including deep integration with text editors written in Rust. It's used in production by at least two companies, and by many open source projects.

If you are writing a text editor (or something else needing highlighting) in Rust and this library doesn't fit your needs, I consider that a bug and you should file an issue or email me. I consider this project mostly complete. I still maintain it and review PRs, but it's not under heavy development.

Getting Started

syntect is available on crates.io. You can install it by adding this line to your Cargo.toml:

syntect = "5.0"

After that take a look at the documentation and the examples.

If you've cloned this repository, be sure to run

git submodule update --init

to fetch all the required dependencies for running the tests.

Features/Goals

  • Work with many languages (accomplished through using existing grammar formats)
  • Highlight super quickly, faster than nearly all text editors
  • Include easy to use API for basic cases
  • API allows use in fancy text editors with piece tables and incremental re-highlighting and the like.
  • Expose internals of the parsing process so text editors can do things like cache parse states and use semantic info for code intelligence
  • High quality highlighting, supporting things like heredocs and complex syntaxes (like Rust's).
  • Include a compressed dump of all the default syntax definitions in the library binary so users don't have to manage a folder of syntaxes.
  • Well documented: I've tried to add a useful documentation comment to everything that isn't utterly self-explanatory.
  • Built-in output to coloured HTML <pre> tags or 24-bit colour ANSI terminal escape sequences.
  • Nearly complete compatibility with Sublime Text 3, including lots of edge cases. Passes nearly all of Sublime's syntax tests; see issue 59.
  • Load up quickly, currently in around 23ms but could potentially be even faster.

Screenshots

There's currently an example program called syncat that prints one of the source files using hard-coded themes and syntaxes using 24-bit terminal escape sequences supported by many newer terminals. These screenshots don't look as good as they could for two reasons: first the sRGB colours aren't corrected properly, and second the Rust syntax definition uses some fancy labels that these themes don't have highlighting for.

Screenshots (images omitted here): nested languages, Base 16 Ocean Dark, Solarized Light, InspiredGithub.

Example Code

Prints highlighted lines of a string to the terminal. See the easy and html module docs for more basic use case examples.

use syntect::easy::HighlightLines;
use syntect::parsing::SyntaxSet;
use syntect::highlighting::{ThemeSet, Style};
use syntect::util::{as_24_bit_terminal_escaped, LinesWithEndings};

// Load these once at the start of your program
let ps = SyntaxSet::load_defaults_newlines();
let ts = ThemeSet::load_defaults();

let syntax = ps.find_syntax_by_extension("rs").unwrap();
let mut h = HighlightLines::new(syntax, &ts.themes["base16-ocean.dark"]);
let s = "pub struct Wow { hi: u64 }\nfn blah() -> u64 {}";
for line in LinesWithEndings::from(s) {
    let ranges: Vec<(Style, &str)> = h.highlight_line(line, &ps).unwrap();
    let escaped = as_24_bit_terminal_escaped(&ranges[..], true);
    print!("{}", escaped);
}

Performance

Currently syntect is one of the faster syntax highlighting engines, but not the fastest. The following perf features are done:

  • Pre-link references between languages (e.g. <script> tags) so there are no tree traversal string lookups in the hot-path
  • Compact binary representation of scopes to allow quickly passing and copying them around
  • Determine if a scope is a prefix of another scope using bit manipulation in only a few instructions
  • Cache regex matches to reduce number of times oniguruma is asked to search a line
  • Accelerate scope lookups to reduce how much selector matching has to be done to highlight a list of scope operations
  • Lazily compile regexes so startup time isn't taken compiling a thousand regexes for ActionScript that nobody will use
  • Optionally use the fancy-regex crate. Unfortunately this isn't yet faster than oniguruma on our benchmarks but it might be in the future.
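The bit-manipulation prefix check in the list above can be illustrated with a small sketch. This is not syntect's actual scope type: `PackedScope`, its 8-atom limit, the 16-bit atom ids, and the convention that atom id 0 means "empty" are all assumptions made for this example. It only shows how a prefix test can reduce to a mask and a compare once scopes are packed into an integer.

```rust
/// Hypothetical packed scope: up to 8 atoms of 16 bits each, stored from the
/// most significant bits down. Atom id 0 is reserved to mean "no atom".
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PackedScope(u128);

impl PackedScope {
    pub const MAX_ATOMS: usize = 8;

    /// Packs atom ids such as [source, rust] into one integer.
    /// Returns None if there are too many atoms or an id is 0.
    pub fn from_atoms(atoms: &[u16]) -> Option<PackedScope> {
        if atoms.len() > Self::MAX_ATOMS || atoms.contains(&0) {
            return None;
        }
        let mut bits = 0u128;
        for (i, &atom) in atoms.iter().enumerate() {
            bits |= (atom as u128) << (112 - 16 * i);
        }
        Some(PackedScope(bits))
    }

    /// Number of atoms in use, derived from the count of empty low slots.
    pub fn len(self) -> u32 {
        8 - self.0.trailing_zeros() / 16
    }

    /// True if every atom of `self` matches the leading atoms of `other`.
    pub fn is_prefix_of(self, other: PackedScope) -> bool {
        let used_bits = 16 * self.len();
        // Mask covering only the slots `self` uses; the empty scope
        // gets an all-zero mask and is a prefix of everything.
        let mask = if used_bits == 0 {
            0
        } else {
            u128::MAX << (128 - used_bits)
        };
        (self.0 ^ other.0) & mask == 0
    }
}
```

With atom ids chosen arbitrarily as 1 for `source` and 2 for `rust`, `from_atoms(&[1])` is a prefix of `from_atoms(&[1, 2])` but not the other way round. The whole check is a trailing-zero count, a shift, an XOR and an AND, with no string comparison.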

The current perf numbers are below. These numbers may get better if more of the things above are implemented, but they're better than many other text editors. All measurements were taken on a mid 2012 15" retina MacBook Pro; my new 2019 MacBook takes about 70% of these times.

  • Highlighting 9200 lines/247kb of jQuery 2.1 takes 600ms. For comparison:
    • Textmate 2, Spacemacs and Visual Studio Code all take around 2 seconds (measured by hand with a stopwatch, hence approximate).
    • Atom takes 6 seconds
    • Sublime Text 3 dev build takes 98ms (highlighting only, takes ~200ms click to pixels), despite having a super fancy javascript syntax definition.
    • Vim is instantaneous but that isn't a fair comparison since vim's highlighting is far more basic than the other editors. Compare vim's grammar to Sublime's.
    • These comparisons aren't totally fair, except the one to Sublime Text since that is using the same theme and the same complex definition for ES6 syntax.
  • Simpler syntaxes are faster; JS is one of the most complex. Highlighting a 1700 line, 62kb XML file takes only 34ms, or about 50,000 lines/sec.
  • ~138ms to load and link all the syntax definitions in the default Sublime package set.
    • but only ~23ms to load and link all the syntax definitions from an internal pre-made binary dump with lazy regex compilation.
  • ~1.9ms to parse and highlight the 30 line 791 character testdata/highlight_test.erb file. This works out to around 16,000 lines/second or 422 kilobytes/second.
  • ~250ms end to end for syncat to start, load the definitions, highlight the test file and shut down. This is mostly spent loading.

Feature Flags

Syntect makes heavy use of cargo features, to support users who require only a subset of functionality. In particular, it is possible to use the highlighting component of syntect without the parser (for instance when hand-rolling a higher performance parser for a particular language), by adding default-features = false to the syntect entry in your Cargo.toml.

For more information on available features, see the features section in Cargo.toml.

Pure Rust fancy-regex mode, without onig

Since 4.0, syntect offers an alternative pure-Rust regex engine based on the fancy-regex crate, which extends the awesome regex crate with support for the fancier regex features that Sublime syntaxes need, such as lookaheads.

The advantage of fancy-regex is that it does not require the onig crate, which requires building and linking the Oniguruma C library. Many users experience difficulty building the onig crate, especially on Windows and WebAssembly.

As far as our tests can tell this new engine is just as correct, but it hasn't been tested as extensively in production. It also currently seems to be about half the speed of the default Oniguruma engine, although further testing and optimization (perhaps by you!) may eventually see it surpass Oniguruma's speed and become the default.

To use the fancy-regex engine with syntect, add it to your Cargo.toml like so:

syntect = { version = "5.0", default-features = false, features = ["default-fancy"] }

If you want to run examples with the fancy-regex engine you can use a command line like the following:

cargo run --features default-fancy --no-default-features --release --example syncat testdata/highlight_test.erb

Due to the way Cargo features work, if any crate you depend on depends on syntect without enabling fancy-regex then you'll get the default onig mode.

Note: The fancy-regex engine is absurdly slow in debug mode, because the regex engine (the main hot spot of highlighting) is now in Rust instead of C that's always built with optimizations. Consider using release mode or onig when testing.

Caching

Because syntect's API exposes internal cacheable data structures, there is a caching strategy that text editors can use that allows the text on screen to be re-rendered instantaneously regardless of the file size when a change is made after the initial highlight.

Basically, on the initial parse every 1000 lines or so copy the parse state into a side-buffer for that line. When a change is made to the text, because of the way Sublime Text grammars work (and languages in general), only the highlighting after that change can be affected. Thus when a change is made to the text, search backwards in the parse state cache for the last state before the edit, then kick off a background task to start re-highlighting from there. Once the background task highlights past the end of the current editor viewport, render the new changes and continue re-highlighting the rest of the file in the background.

This way from the time the edit happens to the time the new colouring gets rendered in the worst case only 999+length of viewport lines must be re-highlighted. Given the speed of syntect even with a long file and the most complicated syntax and theme this should take less than 100ms. This is enough to re-highlight on every key-stroke of the world's fastest typist in the worst possible case. And you can reduce this asymptotically to the length of the viewport by caching parse states more often, at the cost of more memory.

Any time the file is changed the latest cached state is found, the cache is cleared after that point, and a background job is started. Any already running jobs are stopped because they would be working on old state. This way you can just have one thread dedicated to highlighting that is always doing the most up-to-date work, or sleeping.
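The checkpointing part of this strategy can be sketched in a few lines. The code below does not use syntect's types: `CheckpointCache`, `parse_line` and `parse_from` are hypothetical names, and the "parser" is a toy that only tracks brace depth. In a real editor the state would be whatever parse and highlight state the library exposes, and the re-parse would run in a background task that stops or restarts on each edit, which is omitted here.

```rust
/// Checkpoints of a parser state, taken at the start of every
/// `interval`-th line (hypothetical helper, not a syntect type).
pub struct CheckpointCache<S> {
    interval: usize,
    /// (line index, state before that line is parsed), sorted by line.
    checkpoints: Vec<(usize, S)>,
}

impl<S: Clone> CheckpointCache<S> {
    pub fn new(interval: usize) -> Self {
        assert!(interval > 0, "interval must be at least 1");
        CheckpointCache { interval, checkpoints: Vec::new() }
    }

    /// Call just before parsing `line`, with the state about to be used.
    pub fn record(&mut self, line: usize, state: &S) {
        let is_new = self.checkpoints.last().map_or(true, |(l, _)| *l < line);
        if line % self.interval == 0 && is_new {
            self.checkpoints.push((line, state.clone()));
        }
    }

    /// After an edit on `edited_line`, drop the checkpoints it invalidates
    /// and return the latest one that is still valid, if any.
    pub fn resume_after_edit(&mut self, edited_line: usize) -> Option<(usize, S)> {
        self.checkpoints.retain(|(line, _)| *line <= edited_line);
        self.checkpoints.last().cloned()
    }
}

/// Toy "parser": the state is the brace nesting depth carried across lines.
pub fn parse_line(depth: &mut i32, line: &str) {
    for ch in line.chars() {
        match ch {
            '{' => *depth += 1,
            '}' => *depth -= 1,
            _ => {}
        }
    }
}

/// Parses `lines[start..]` from `state`, recording checkpoints on the way.
pub fn parse_from(
    cache: &mut CheckpointCache<i32>,
    lines: &[&str],
    start: usize,
    mut state: i32,
) -> i32 {
    for (i, line) in lines.iter().enumerate().skip(start) {
        cache.record(i, &state);
        parse_line(&mut state, line);
    }
    state
}
```

A checkpoint stored for line n holds the state before line n is parsed, so it depends only on earlier lines and stays valid when line n itself is edited. That is why `resume_after_edit` keeps checkpoints at or before the edited line and discards only the later ones.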

Parallelizing

Since 3.0, syntect can be used to do parsing/highlighting in parallel. SyntaxSet is both Send and Sync and so can easily be used from multiple threads. It is also Clone, which means you can construct a syntax set and then clone it to use for other threads if you prefer.

Compared to older versions, there's nothing preventing the serialization of a SyntaxSet either. So you can directly deserialize a fully linked SyntaxSet and start using it for parsing/highlighting. Before, it was always necessary to do linking first.

It is worth mentioning that regex compilation is done lazily, only when the regexes are actually needed. Once a regex has been compiled, the compiled version is used for all threads after that. Note that this is done using interior mutability, so if multiple threads happen to encounter the same uncompiled regex at the same time, compiling might happen multiple times. After that, one of the compiled regexes will be used. Currently, when a SyntaxSet is cloned, the regexes in the cloned set need to be recompiled.
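The lazy, race-tolerant compilation described above can be sketched with the standard library's `OnceLock`. This is an illustration of the pattern, not syntect's implementation: `LazyPattern`, `Compiled` and `compile` are hypothetical stand-ins, and "compiling" here just copies the pattern string.

```rust
use std::sync::OnceLock;

/// Stand-in for a compiled regex; a real engine would hold its program here.
#[derive(Debug)]
pub struct Compiled {
    pub source: String,
}

/// Stand-in for an expensive compilation step.
fn compile(source: &str) -> Compiled {
    Compiled { source: source.to_string() }
}

/// A pattern that is compiled on first use and shared afterwards.
pub struct LazyPattern {
    source: String,
    compiled: OnceLock<Compiled>,
}

impl LazyPattern {
    pub fn new(source: &str) -> Self {
        LazyPattern { source: source.to_string(), compiled: OnceLock::new() }
    }

    pub fn is_compiled(&self) -> bool {
        self.compiled.get().is_some()
    }

    /// Returns the compiled form, compiling it if nobody has yet.
    pub fn get(&self) -> &Compiled {
        if let Some(existing) = self.compiled.get() {
            return existing;
        }
        // Several threads may reach this point at once and each compile
        // their own copy; only the first `set` succeeds, the rest are dropped.
        let fresh = compile(&self.source);
        let _ = self.compiled.set(fresh);
        self.compiled.get().expect("a compiled value was just stored")
    }
}
```

Compiling outside the cell and then calling `set` reproduces the behaviour in the text: duplicate work is possible under contention, but every caller ends up using the single stored value. Using `OnceLock::get_or_init` instead would make other threads wait rather than compile twice.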

For adding parallelism to a previously single-threaded program, the recommended thread pool library is rayon. However, if you're working in an already-threaded context where there might be more threads than you want (such as writing a handler for an Iron request), the recommendation is to force all highlighting to be done within a fixed-size thread pool using rust-scoped-pool. An example of the former is in examples/parsyncat.rs.
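As a rough illustration of sharing one immutable set of definitions across threads, here is a sketch that uses only the standard library's scoped threads rather than rayon or rust-scoped-pool. `Definitions`, `count_keywords` and `count_all` are hypothetical stand-ins for a syntax set and a highlighting call. For the real thing, see examples/parsyncat.rs.

```rust
use std::thread;

/// Hypothetical stand-in for an immutable, shareable set of definitions.
pub struct Definitions {
    pub keywords: Vec<&'static str>,
}

/// Toy "highlighter": counts how many words of `text` are keywords.
pub fn count_keywords(defs: &Definitions, text: &str) -> usize {
    text.split_whitespace()
        .filter(|word| defs.keywords.iter().any(|k| *k == *word))
        .count()
}

/// Processes every input on its own thread, borrowing `defs` from all of them.
pub fn count_all(defs: &Definitions, texts: &[&str]) -> Vec<usize> {
    thread::scope(|s| {
        // Spawn all workers first so they run concurrently...
        let handles: Vec<_> = texts
            .iter()
            .map(|&text| s.spawn(move || count_keywords(defs, text)))
            .collect();
        // ...then collect results in input order.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

The shared value is only ever borrowed, so nothing is cloned or locked. The sketch spawns one thread per input, which is fine for a handful of files; for larger workloads a fixed-size pool, as recommended above, avoids oversubscribing the machine.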

Examples Available

There are a number of examples of programs that use syntect in the examples folder and some code outside the repo:

  • syncat prints a highlighted file to the terminal using 24-bit colour ANSI escape codes. It demonstrates a simple file highlighting workflow.
  • synhtml prints an HTML file that will display the highlighted code. Demonstrates how syntect could be used by web servers and static site generators.
  • synstats collects a bunch of statistics about the code in a folder. Includes basic things like line count but also fancier things like number of functions. Demonstrates how syntect can be used for code analysis in addition to highlighting, and how to use the APIs to parse out the semantic tokenization.
  • faiyels is a little code minimap visualizer I wrote that uses syntect for highlighting.
  • parsyncat is like syncat, but accepts multiple files and highlights them in parallel. It demonstrates how to use syntect from multiple threads.

Here are the stats that synstats extracts from syntect's codebase (not including examples and test data) as of this commit:

################## Stats ###################
File count:                               19
Total characters:                     155504

Function count:                          165
Type count (structs, enums, classes):     64

Code lines (traditional SLOC):          2960
Total lines (w/ comments & blanks):     4011
Comment lines (comment but no code):     736
Blank lines (lines-blank-comment):       315

Lines with a documentation comment:      646
Total words written in doc comments:    4734
Total words written in all comments:    5145
Characters of comment:                 41099

Projects using Syntect

Below is a list of projects using Syntect, in approximate order by how long they've been using syntect (feel free to send PRs to add to this list):

  • bat, a cat(1) clone, uses syntect for syntax highlighting.
  • Bolt, a desktop application for building and testing APIs, uses syntect for syntax highlighting.
  • catmark, a console markdown printer, uses syntect for code blocks.
  • Cobalt, a static site generator that uses syntect for highlighting code snippets.
  • crowbook, a Markdown book generator, uses syntect for code blocks.
  • delta, a syntax-highlighting pager for Git.
  • Docket, a documentation site generator that uses syntect for highlighting.
  • hors, instant coding answers via command line, uses syntect for highlighting code blocks.
  • mdcat, a console markdown printer, uses syntect for code blocks.
  • Scribe, a Rust text editor framework which uses syntect for highlighting.
  • syntect_server, an HTTP server for syntax highlighting.
  • tokio-cassandra, CQL shell in Rust, uses syntect for shell colouring.
  • xi-editor, a text editor in Rust which uses syntect for highlighting.
  • Zola, a static site generator that uses syntect for highlighting code snippets.
  • The Way, a code snippets manager for your terminal that uses syntect for highlighting.
  • Broot, a terminal file manager, uses syntect for file previews.
  • Rusty Slider, a markdown slideshow presentation application, uses syntect for code blocks.

License and Acknowledgements

Thanks to Robin Stocker, Keith Hall and Martin Nordholts for making substantial contributions, including the most important improvements syntect has had post-v1.0! They deserve lots of credit for where syntect is today. For example, @robinst implemented fancy-regex support and a massive refactor to enable parallel highlighting using an arena. @keith-hall found and fixed many bugs and implemented Sublime syntax test support.

Thanks to Textmate 2 and @defuz's sublimate for the existing open source code I used as inspiration and in the case of sublimate's tmTheme loader, copy-pasted. All code (including defuz's sublimate code) is released under the MIT license.
