JavaScript implementation of the Knuth-Plass linebreaking algorithm

tex-linebreak

tex-linebreak is a JavaScript library for laying out justified text as you would find in a newspaper, book or technical paper. It implements the Knuth-Plass line-breaking algorithm, as used by TeX.

Introduction

Most text on the web is presented with "ragged-right" margins, as opposed to the justified text you would find in e.g. a scientific paper or newspaper. Text can be justified in web pages using text-align: justify. However, this option alone tends to result in large   spaces    between words, which is distracting to read. This is due to the use of "first fit" line-breaking algorithms, where the browser considers only the current line when finding the next breakpoint. Some browsers support hyphenation via hyphens: auto, which reduces this effect. However, the first-fit approach can still produce lines with wide spacing, and it can also produce more hyphenated lines than necessary.
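To make the "first fit" behaviour concrete, here is a minimal sketch of a greedy line breaker. It is not taken from any browser or from this library; the function name firstFitBreak and its parameters are made up for illustration, and it ignores hyphenation entirely.

```javascript
// Greedy "first fit" line breaking: keep adding words to the current line
// until the next word would overflow, then start a new line. Each decision
// looks only at the current line, never at the rest of the paragraph.
function firstFitBreak(words, lineWidth, measure, spaceWidth) {
  const lines = [];
  let current = [];
  let currentWidth = 0;
  for (const word of words) {
    const w = measure(word);
    const needed = current.length === 0 ? w : currentWidth + spaceWidth + w;
    if (current.length > 0 && needed > lineWidth) {
      // The word does not fit: close the current line and start a new one.
      lines.push(current);
      current = [word];
      currentWidth = w;
    } else {
      current.push(word);
      currentWidth = needed;
    }
  }
  if (current.length > 0) lines.push(current);
  return lines;
}

// Assumed measurement: every character is 5 units wide.
const measure = word => word.length * 5;
const lines = firstFitBreak('a bb ccc dddd'.split(' '), 30, measure, 5);
console.log(JSON.stringify(lines)); // [["a","bb"],["ccc"],["dddd"]]
```

When such lines are justified, all leftover space on a line is distributed between its words, which is where the large gaps come from.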

The Knuth-Plass algorithm, on the other hand, optimizes the spacing between words over the whole paragraph, seeking to minimize the overall "badness" of the layout. This measure depends on the amount by which spaces have been shrunk or stretched and the number of hyphenated lines. The benefits of this approach are greater when rendering narrower columns of text (e.g. on small screens).
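The idea of optimizing over the whole paragraph can be sketched with a small dynamic program. This is a deliberately simplified stand-in, not the Knuth-Plass algorithm as implemented by this library: it has no stretchable or shrinkable glue, no penalties and no hyphenation, and its cost function (squared leftover space on every line except the last) is an assumption chosen for brevity. The name totalFitBreak is hypothetical.

```javascript
// Simplified "total fit": pick the set of breakpoints that minimizes the sum
// of squared leftover space over all lines except the last one.
function totalFitBreak(words, lineWidth, measure, spaceWidth) {
  const n = words.length;
  // best[i] is the minimal cost of laying out words[i..n-1];
  // next[i] is the index of the first word on the following line.
  const best = new Array(n + 1).fill(Infinity);
  const next = new Array(n + 1).fill(n);
  best[n] = 0;
  for (let i = n - 1; i >= 0; i--) {
    let width = 0;
    for (let j = i; j < n; j++) {
      width += measure(words[j]) + (j > i ? spaceWidth : 0);
      // Stop once the line overflows, but always allow a single overlong word.
      if (width > lineWidth && j > i) break;
      const isLastLine = j === n - 1;
      const slack = Math.max(lineWidth - width, 0);
      const cost = (isLastLine ? 0 : slack * slack) + best[j + 1];
      if (cost < best[i]) {
        best[i] = cost;
        next[i] = j + 1;
      }
    }
  }
  const lines = [];
  for (let i = 0; i < n; i = next[i]) lines.push(words.slice(i, next[i]));
  return lines;
}

// Assumed measurement: one unit per character, one unit per space.
const lines = totalFitBreak(['aaa', 'bb', 'cc', 'ddddd'], 6, w => w.length, 1);
console.log(JSON.stringify(lines)); // [["aaa"],["bb","cc"],["ddddd"]]
```

A greedy breaker would put "aaa bb" on the first line and leave "cc" alone on a very loose second line. The sketch instead accepts a slightly looser first line so that the second line is tighter, which lowers the total cost.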

This table compares the same text rendered in the same environment (font, font size, device width, margins) using CSS justification, CSS justification + hyphenation and this library:

Columns (screenshots not reproduced here): Safari with text-align: justify | Chrome with text-align: justify; hyphens: auto | tex-linebreak
CSS justification produces large spaces on the second and penultimate lines. Enabling hyphenation using `hyphens: auto` in browsers that support it (as of 2018-04-07 this appears to be only Chrome) produces better output but still produces wide lines. The TeX algorithm in contrast hyphenates fewer lines and avoids excessive spacing between words.

tex-linebreak has no dependencies on a particular JS environment (browser, Node) or render target (<canvas>, HTML elements, PDF).

Try it out

The easiest way to see what the library can do is to install the bookmarklet and activate it on an existing web page, such as this Medium article.

It will justify and apply hyphenation to the content of any paragraph (<p>) elements on the page. The improvement is more noticeable on smaller screens, so try it in your browser's responsive design mode.

Note that the bookmarklet does not work on sites that use Content Security Policy to restrict where scripts can be loaded from.

Usage

First, add the tex-linebreak package to your dependencies:

npm install tex-linebreak

The library has low-level APIs which implement the core line-breaking and positioning algorithm, as well as higher-level APIs that provide a convenient way to justify existing HTML content.

Low-level APIs

The low-level APIs breakLines and positionItems work with generic "box" (typeset material), "glue" (spaces with flexible sizing) and "penalty" items. Typically, "boxes" are words, "glue" items are spaces and "penalty" items represent hyphenation points or the end of a paragraph. However, you can use them to lay out arbitrary content.
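As a rough picture of what such an item list might look like, the sketch below builds one from an array of words. The object shapes (type, width, stretch, shrink, cost, flagged, text), the stretch and shrink ratios and the helper name itemsFromWords are assumptions for illustration, not a statement of this library's exact types; check the TypeScript annotations in src/ before relying on them. The infinite stretch and the negative-infinity penalty follow the conventions of the Knuth-Plass paper, and a real implementation may use finite sentinel values instead.

```javascript
// Build an illustrative box/glue/penalty list from words.
function itemsFromWords(words, measure, spaceWidth) {
  const items = [];
  words.forEach((word, i) => {
    if (i > 0) {
      // A space that may stretch by half or shrink by a third of its width
      // (illustrative ratios only).
      items.push({
        type: 'glue',
        width: spaceWidth,
        stretch: spaceWidth / 2,
        shrink: spaceWidth / 3,
      });
    }
    items.push({ type: 'box', width: measure(word), text: word });
  });
  // End of paragraph: glue that can absorb any leftover space, followed by a
  // penalty whose cost forces a break.
  items.push({ type: 'glue', width: 0, stretch: Infinity, shrink: 0 });
  items.push({ type: 'penalty', width: 0, cost: -Infinity, flagged: false });
  return items;
}

const items = itemsFromWords(['ab', 'c'], word => word.length * 5, 5);
console.log(items.map(item => item.type).join(','));
// box,glue,box,glue,penalty
```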

import { layoutItemsFromString, breakLines, positionItems } from 'tex-linebreak';

// Convert your text to a set of "box", "glue" and "penalty" items used by the
// line-breaking process.
//
// "Box" items are things (typically words) to typeset.
// "Glue" items are spaces that can stretch or shrink or be a breakpoint.
// "Penalty" items are possible breakpoints (hyphens, end of a paragraph etc.).
//
// `layoutItemsFromString` is a helper that takes a string and a function to
// measure the width of a piece of that string and returns a suitable set of
// items.
// Rough stand-in that assumes every character is 5 units wide. Replace it with
// a real measurement for your font and render target.
const measureText = text => text.length * 5;
const items = layoutItemsFromString(yourText, measureText);

// Find where to insert line-breaks in order to optimally lay out the text.
const lineWidth = 200;
const breakpoints = breakLines(items, lineWidth);

// Compute the (xOffset, line number) at which to draw each box item.
const positionedItems = positionItems(items, lineWidth, breakpoints);

positionedItems.forEach(pi => {
  // `pi.item` is the index of the positioned box in `items`.
  const item = items[pi.item];

  // Add code to draw `item.text` at `(pi.xOffset, pi.line)` to whatever output
  // you want, e.g. `<canvas>`, HTML elements with spacing created using CSS,
  // WebGL, ...
});
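One way to fill in the drawing step is to first convert each positioned item into a target-independent draw command. The sketch below assumes positioned items shaped like { item, line, xOffset }, as used in the loop above. The helper name toDrawCommands, the lineHeight parameter and the sample data are hypothetical.

```javascript
// Turn positioned items into { text, x, y } commands that any renderer
// (canvas, absolutely positioned HTML elements, SVG, ...) can consume.
function toDrawCommands(items, positionedItems, lineHeight) {
  return positionedItems.map(pi => ({
    text: items[pi.item].text,
    x: pi.xOffset,
    y: pi.line * lineHeight, // the line number becomes a vertical offset
  }));
}

// Hand-written sample data standing in for real library output.
const sampleItems = [
  { type: 'box', width: 25, text: 'Hello' },
  { type: 'glue', width: 5, stretch: 2.5, shrink: 1.5 },
  { type: 'box', width: 25, text: 'world' },
];
const samplePositions = [
  { item: 0, line: 0, xOffset: 0 },
  { item: 2, line: 1, xOffset: 0 },
];

const commands = toDrawCommands(sampleItems, samplePositions, 20);
console.log(commands);
```

With a canvas 2D context, each command could then be drawn with ctx.fillText(cmd.text, cmd.x, cmd.y).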

High-level APIs

The high-level APIs provide convenience methods for justifying content in existing HTML elements and laying out justified lines for rendering to HTML, canvas or other outputs. This includes support for hyphenation using the hypher library.

Justifying existing HTML content

The contents of an existing HTML element can be justified using the justifyContent function.

import enUsPatterns from 'hyphenation.en-us';
import { createHyphenator, justifyContent } from 'tex-linebreak';

const hyphenate = createHyphenator(enUsPatterns);
const paragraphs = Array.from(document.querySelectorAll('p'));
justifyContent(paragraphs, hyphenate);

After an element is justified, its layout will remain fixed until justifyContent is called again. In order to re-justify content in response to window size changes or other events, your code will need to listen for the appropriate events and re-invoke justifyContent.
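A common way to re-justify on resize without redoing the layout for every intermediate event is to debounce the handler. The sketch below is one possible approach, not something provided by the library. The debounce helper and the 100 ms delay are assumptions, and the justifyContent call is left as a comment so that the snippet runs outside a browser.

```javascript
// Return a wrapper that postpones `fn` until `delayMs` milliseconds have
// passed without another call.
function debounce(fn, delayMs) {
  let timer = null;
  return (...args) => {
    if (timer !== null) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = null;
      fn(...args);
    }, delayMs);
  };
}

const rejustify = debounce(() => {
  // In a real page, with `paragraphs` and `hyphenate` set up as shown earlier:
  // justifyContent(paragraphs, hyphenate);
}, 100);

// Only attach the listener when running in a browser.
if (typeof window !== 'undefined') {
  window.addEventListener('resize', rejustify);
}
```

Since justifying existing content modifies the DOM, a longer delay trades responsiveness for less repeated work on large pages.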

Rendering text

For rendering justified text into a variety of targets (HTML, canvas, SVG, WebGL etc.), the layoutText helper can be used to lay out justified text and obtain the positions at which each word should be drawn.

import { createHyphenator, layoutText } from 'tex-linebreak';

import enUsPatterns from 'hyphenation.en-us';

const hyphenate = createHyphenator(enUsPatterns);
const measure = word => word.length * 5;

const { items, positions } = layoutText(text, lineWidth, measure, hyphenate);

positions.forEach(pos => {
  // Draw text as in the above example for the low-level APIs
});
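The examples use word.length * 5 as a stand-in for real text measurement. When the target is a browser, one option is to measure with a canvas 2D context, whose measureText method returns a TextMetrics object with a width property. The createMeasure helper and its cache are assumptions for illustration and are not part of this library.

```javascript
// Create a measuring function backed by a canvas 2D context (or any object
// with a compatible `font` property and `measureText` method). Results are
// cached because the same words tend to be measured repeatedly.
function createMeasure(ctx, font) {
  const cache = new Map();
  return text => {
    let width = cache.get(text);
    if (width === undefined) {
      ctx.font = font;
      width = ctx.measureText(text).width;
      cache.set(text, width);
    }
    return width;
  };
}

// In a browser:
//   const ctx = document.createElement('canvas').getContext('2d');
//   const measure = createMeasure(ctx, '16px serif');
//   const { items, positions } = layoutText(text, lineWidth, measure, hyphenate);
```

The font string must match the font actually used for rendering, otherwise the computed positions will not line up with the drawn text.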

API reference

The source files in src/ have documentation in the form of TypeScript annotations.

Examples

For working code showing different ways to use this library, see the demos. You can build and run the demos using:

npm i -g http-server

git clone https://github.com/robertknight/tex-linebreak.git
cd tex-linebreak
yarn
yarn build-dev
http-server -c-1

Then navigate to http://127.0.0.1:8080/src/demos/layout.html (note that http-server may choose a different port).

Caveats

The library currently has a number of caveats:

  • It is not aware of floated content, which can affect the space available in a paragraph for laying out text. In the presence of floats, lines can exceed the width of the paragraph.
  • Justification of existing HTML content relies on modifying the DOM to insert linebreaks and wrap text nodes in order to adjust inter-word spacing on each line. This can be slow in large documents. Test it on your content to decide whether the overhead is acceptable for your use case. Also, limit the number of elements to which you apply justification.

References

[1] D. E. Knuth and M. F. Plass, “Breaking paragraphs into lines,” Softw. Pract. Exp., vol. 11, no. 11, pp. 1119–1184, Nov. 1981.
