hocrjs

Working with hOCR in Javascript

Showcase

Demo

See this demo document: Demo

Video

video of hocrjs

Screenshots

background image, transparent text

text only, scaled font

Usage

Simple Usage

To add the interface to a plain hOCR file, add this line just before the closing </body> tag:

<script src="https://unpkg.com/hocrjs"></script>

In addition, your web server must serve the file with a Content-Type that allows scripts to load, such as text/html. If your hOCR file has an .html or .htm extension, the media type should already be set correctly.

For files with a .hocr extension (e.g. generated by tesseract), you will need to add a mapping from extension to media type:

  • Apache: Add the following to your server configuration or .htaccess file:

    AddType text/html hocr
  • nginx: Add to mime.types:

    text/html    hocr;
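For local testing without Apache or nginx, the same extension-to-media-type mapping can be sketched with Node.js's built-in http module. This is a minimal sketch, not part of hocrjs: the names contentTypeFor and createHocrServer, the directory and the port are assumptions made for illustration.

```javascript
// Sketch: serve a directory so that .hocr files are sent as text/html.
// All names here are illustrative assumptions, not hocrjs APIs.
const http = require('http');
const fs = require('fs');
const path = require('path');

// Map file extensions to media types; .hocr is treated like .html.
const MEDIA_TYPES = {
  '.hocr': 'text/html; charset=utf-8',
  '.html': 'text/html; charset=utf-8',
  '.htm': 'text/html; charset=utf-8',
};

function contentTypeFor(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return MEDIA_TYPES[ext] || 'application/octet-stream';
}

function createHocrServer(rootDir) {
  const root = path.resolve(rootDir);
  return http.createServer((req, res) => {
    let urlPath;
    try {
      urlPath = decodeURIComponent(new URL(req.url, 'http://localhost').pathname);
    } catch (err) {
      res.writeHead(400);
      res.end('Bad request');
      return;
    }
    // Resolve the request inside root and reject path traversal.
    const filePath = path.join(root, urlPath);
    if (filePath !== root && !filePath.startsWith(root + path.sep)) {
      res.writeHead(403);
      res.end('Forbidden');
      return;
    }
    fs.readFile(filePath, (err, data) => {
      if (err) {
        res.writeHead(404);
        res.end('Not found');
        return;
      }
      res.writeHead(200, { 'Content-Type': contentTypeFor(filePath) });
      res.end(data);
    });
  });
}

// Usage (not run here): createHocrServer('./ocr-output').listen(8080);
```

The server is only created, not started, so the listen call can be placed wherever it suits your setup.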
    

User script

Also bundled is a browser extension that lets you add the hocrjs interface to any hOCR document you browse on the web.

  • Tampermonkey
  • Greasemonkey

Command line interface

hocrjs comes with a command line tool, hocrjs-inject, that inserts the necessary <script> tag into a local hOCR document. To use it, first install hocrjs system-wide:

npm install -g hocrjs

Then run hocrjs-inject /path/to/ocr-doc.hocr. The resulting file will be /path/to/ocr-doc.hocrjs.html which you can open in a browser.
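Conceptually, the injection amounts to placing the script tag just before the closing </body> tag. The following is a rough illustration of that idea, not the actual hocrjs-inject implementation; the function name injectScript and the fallback for documents without a </body> tag are assumptions.

```javascript
// Illustration only: insert the hocrjs <script> tag before the closing </body>.
const SCRIPT_TAG = '<script src="https://unpkg.com/hocrjs"></script>';

function injectScript(html) {
  // Find every closing body tag (case-insensitive) and use the last one.
  const matches = [...html.matchAll(/<\/body\s*>/gi)];
  if (matches.length === 0) {
    // Assumption: with no closing body tag, append the script at the end.
    return html + '\n' + SCRIPT_TAG + '\n';
  }
  const idx = matches[matches.length - 1].index;
  return html.slice(0, idx) + SCRIPT_TAG + '\n' + html.slice(idx);
}
```

The function works on the document as a string, so it can be combined with any file reading and writing you prefer.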

Development

To build hocrjs from source, you need Node.js and make.

Clone the repository and run make for a list of targets:

Targets

bootstrap  lerna bootstrap
dist       webpack all
clean      Remove built targets
test       Run unit tests
link       link
publish    publish packages

Variables

VERSION  Version of the latest git tag

Layout

The hOCR elements are positioned with position: fixed. The trick is that they sit inside a container element that has a CSS transform applied. This makes the fixed positions relative to the container element instead of the viewport.
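To make the positioning concrete, here is a minimal sketch of how a bounding box from an hOCR title attribute could be turned into style values. The helper bboxToStyle is hypothetical and not taken from the hocrjs source; it only assumes the standard hOCR "bbox x0 y0 x1 y1" property.

```javascript
// Hypothetical helper: turn an hOCR title attribute into inline style values.
function bboxToStyle(title) {
  // hOCR stores the bounding box as "bbox x0 y0 x1 y1" in the title attribute.
  const match = /\bbbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/.exec(title);
  if (!match) return null;
  const [x0, y0, x1, y1] = match.slice(1).map(Number);
  return {
    // Resolved against the transformed container, not the viewport.
    position: 'fixed',
    left: `${x0}px`,
    top: `${y0}px`,
    width: `${x1 - x0}px`,
    height: `${y1 - y0}px`,
  };
}
```

In a browser, the returned values could be copied onto an element with Object.assign(element.style, bboxToStyle(element.title)), provided an ancestor has a transform set.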

Features and SASS

A feature is behavior that can be enabled or disabled and possibly configured, such as displaying the background image (BackgroundImage) or whether to disable <strong>/<em> display of text (DisableEmStrong).

If a feature is enabled, a class hocr-viewer-feature-<NAME-OF-FEATURE> will be added to the root container.

These classes are used in the SCSS stylesheet to implement the desired behavior using CSS, if possible.
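The naming scheme can be sketched as a small function that maps enabled features to class names. The function featureClasses and the shape of its argument are assumptions for illustration; only the hocr-viewer-feature- prefix comes from the description above.

```javascript
// Sketch: derive root-container class names from a map of feature flags.
const FEATURE_CLASS_PREFIX = 'hocr-viewer-feature-';

function featureClasses(enabledFeatures) {
  return Object.keys(enabledFeatures)
    .filter((name) => enabledFeatures[name]) // keep only enabled features
    .map((name) => FEATURE_CLASS_PREFIX + name);
}
```

A stylesheet rule scoped to one of these classes then applies only while the corresponding feature is enabled.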

Adding a feature

Add an enableMyFeature property to the HocrViewer component.

In hocr-viewer.scss add rules for .hocr-viewer-feature-myFeature as necessary.

If the behavior requires modifying the hOCR (e.g. ScaleFont), create a class ./src/components/hocr-viewer/feature/MyFeature.js whose constructor receives the component and which implements an apply(dom) method to modify the HTML. Use the methods provided by hocr-dom to access hOCR-specific features such as properties.
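A feature class following this shape might look like the sketch below. It is hypothetical: the class body, the my-feature-line class name and the use of the standard querySelectorAll call (rather than hocr-dom helpers, whose API is not shown here) are assumptions.

```javascript
// Hypothetical feature following the constructor/apply(dom) shape described above.
class MyFeature {
  constructor(component) {
    // Keep a reference to the viewer component, e.g. to read its configuration.
    this.component = component;
  }

  apply(dom) {
    // Standard DOM call; hocr-dom helpers could be used here instead.
    dom.querySelectorAll('.ocr_line').forEach((line) => {
      line.classList.add('my-feature-line');
    });
  }
}
```

Because apply only relies on querySelectorAll and classList, it can be exercised with a small stub object in unit tests.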
