• Stars
    3,451
  • Rank 12,331 (Top 0.3 %)
  • Language
    JavaScript
  • License
    GNU General Publi...
  • Created over 10 years ago
  • Updated over 3 years ago

Repository Details

OCR in Javascript via Emscripten

ocrad.js

OCR in Javascript via Emscripten by Kevin Kwok

As with any minor stepping stone on the road to hell (or rather, the relentless trajectory of Atwood's Law), I probably don't need to justify the existence of yet another "x, but now in Javascript!", but I might as well try. After all, we would all like to think that there's some ulterior motive to fulfilling that prophecy.

On tablets and other touchscreen devices, of which there are quite a number nowadays (as this is the New Year's Eve post, I am obliged to include conjecture about the technological zeitgeist), a library such as Ocrad.js might be used to add handwriting input in a device- and operating-system-agnostic manner. Oftentimes, capturing the strokes and sending them over to a server to process might entail unacceptably high latency. Maybe you're working on an offline-capable note-taking app, or a browser extension which indexes all the doge memes that you stumble upon while prowling the dark corners of the internet.

If you've been following my trail of blog posts recently, you'd probably be able to tell that I've been scrambling to finish the program that I prototyped many months ago overnight at a Hackathon. The idea of the extension was kind of simple and also kind of magical: a browser extension that allowed users to highlight, copy, and paste text from any image as if it were plain text. Of course the implementation is a bit difficult and actually relies on the advent of a number of newfangled technologies.

If you try to search for some open source text recognition engine, the first thing that comes up is Tesseract. That isn't a mistake, because it turns out that the competition is worlds away in terms of accuracy. It's actually pretty sad that the state of the art hasn't progressed substantially since the mid-nineties.

A month ago, I tried compiling Tesseract using Emscripten. Perhaps it was a bad thing to try first, but soon I learned that even if it did work out, it probably wouldn't have been practical anyway. I had figured that all OCR engines had been powered by artificial neural networks, support vector machines, k-nearest-neighbors and their machine learning kin. It turns out that this is hardly the norm except in the realm of the actually-accurate, whose open source provinces live under the protection of Lord Tesseract.

GOCR and Ocrad are essentially the only other open source OCR engines (there's technically also Cuneiform, but the source code is in a really really big zip file from some website in Russian, and it's also really slow according to benchmarks). And something I didn't realize until I had peered into the source code is that they are powered by (presumably) painstakingly written rules for each and every detectable glyph and variation. This kind of blew my mind.

Anyway, I tried to compile GOCR first and was immediately struck by how easy and painless it was. I was on a roll, and decided to do Ocrad as well. It wasn't particularly hard; sure, it was slightly more involved, but still hardly anything.

If you know me in person, you'll probably know that I'm not a terribly decisive person. Oftentimes, I'll delay the decision until there isn't a choice left for me to make. Anyway, serially-indecisive-me strikes again, so I alternated between the development of GOCR.js and Ocrad.js, leading up to a simultaneous release.

But in the back of my mind, I knew that eventually I would have to pick one for building my image highlighting project.

What consistently amazes me about Optical Character Recognition isn't its astonishing quality or lack thereof. Rather, it's how utterly unpredictable the results can be. Sometimes there'll be some barely legible block of text that comes through absolutely pristine, and some other time there will be a perfectly clean input which outputs complete garbage. Maybe this is a testament to the sheer difficulty of computer vision or the incredible and underappreciated abilities of the human visual cortex.

At one point, I was talking to someone and I distinctly remembered (I know, all the best stories start this way) a sense of surprise when the person indicated that he had heard of Tesseract, the open source OCR engine. I had appraised it as somewhat more obscure than it evidently was. Some time later, I recounted the incident to a friend, and he said something along the lines of "OCR is one of those fields that everyone comes across once".

I guess I've kind of held onto that thought for a while now, and it certainly seems to have at least a grain of truth. Text embedded into the physical world is more or less the primary means we have for communication and expression. Technology is about building tools that augment human capacity, and that inevitably entails supplanting some human capability. Data input is a huge bottleneck, and while we're kind of sidestepping the problem with things like QR codes by bringing the digital world into the physical, OCR is just one of those fundamental enabling technologies which ought to be as broad in scope as the set of humans who have interacted with a keyboard.

I can't help but feel that the rather large set of people who have interacted with the problem of character recognition have surveyed the available tools and reached the same conclusion as your miniature Magic 8 Ball desk ornament: "Try again later". It doesn't take long for one to discover an instance of perfectly crisp and legible type which results in line noise of such entropy that it'd give DUAL_EC_DRBG a run for its money. "No, there really isn't any way for this to be the state of the art." "Well, I guess if it is, then maybe it'll improve in a few years. Technology improves quickly, right?"

You would think that some analogue of Linus's Law would hold true: "given enough eyeballs, all bugs are shallow", especially if you're dealing with literal eyeballs reading letters. But incidentally, the engine that absolutely everyone uses was developed three decades ago (it's older than I am!) and abandoned for a decade before being acquired and released to the world (by our favorite benevolent overlords, Google).

In fact, what's absolutely stunning is the sheer universality of Tesseract. Just about everything which claims to have text recognition as a feature is backed by it. At one point, I was hoping that Mathematica had some clever routine using morphology and symbolic new kinds of sciences and evolved automata pattern recognition. Nope! Nestled deep within the gigabytes of code lies the Chuck Testa of textadermies: Tesseract.

More Repositories

1. jsgif (JavaScript, 1,052 stars): Save a HTML5 Canvas to GIF and Animations. A port of as3gif GIFPlayer to JS
2. whammy (JavaScript, 992 stars): A real time javascript webm encoder based on a canvas hack
3. player (JavaScript, 276 stars): Almost certainly the first MP3 player of its kind.
4. cloudsave (JavaScript, 168 stars): Save to the cloud.
5. eigensheep (Python, 160 stars): massively parallel experimentation with Jupyter and AWS Lambda 🐑🌩📒
6. rgb-lab (JavaScript, 155 stars): convert between rgb and L*a*b color spaces in javascript
7. tesseract-rs (Rust, 113 stars): Rust bindings for Tesseract
8. weppy (JavaScript, 111 stars): Javascript WebP Library
9. gocr.js (C, 95 stars): OCR in Javascript via Emscripten
10. inpaint.js (JavaScript, 86 stars): Telea Inpainting Algorithm in JS
11. drag2up (JavaScript, 83 stars): Drag a file from your computer to any text field to upload and add link
12. surplus (JavaScript, 68 stars): Google+ Chrome Extension
13. summerTorrent (JavaScript, 63 stars): A bit torrent client written in JavaScript, on top of node.js
14. breadloaf (JavaScript, 53 stars): A draggable, dockable, notebook-style layout engine for React
15. bzip2.js (JavaScript, 37 stars): a bunzip implementation in pure javascript
16. obvious-rpc (TypeScript, 32 stars): fully strongly typed client-server communication that is so obvious you'll wonder why it hasn't always been like this
17. evm (JavaScript, 31 stars): Eulerian Video Magnification in the Browser with JSFeat
18. js-typed-array-sha1 (JavaScript, 29 stars): sha1 with js typed arrays
19. swipe-gesture (JavaScript, 28 stars): Quick multitouch back/forward gesture for Chromebooks
20. js-id3v2 (JavaScript, 28 stars): A Javascript implementation of ID3v2
21. autocircle (Ruby, 23 stars): how to create a magical circle which adds people automagically
22. google-music-protocol (Python, 22 stars): reverse engineered google music protocol
23. microwave (JavaScript, 21 stars): Mobile-friendly Javascript Data API based Google Wave Client
24. musicalpha (JavaScript, 20 stars): Upload songs to Google Music Beta on Linux
25. cloudfall (JavaScript, 20 stars): A simple text editor that syncs to dropbox
26. js-wikireader (JavaScript, 19 stars): An Offline Wikipedia Dump Reader in Javascript that probably only works on Chrome
27. jstorrent (JavaScript, 17 stars): A pure JavaScript BitTorrent 1.0 Implementation
28. heapqueue.js (JavaScript, 17 stars): A simple binary heap priority queue
29. boa (Python, 15 stars): "its like OAB in python because snake"
30. distributed-pi (JavaScript, 14 stars): Calculate Pi using distributed computing with JavaScript on Appengine
31. stick2 (JavaScript, 13 stars): a simple stick figure animator with html5
32. chrome-dropbox (JavaScript, 13 stars): Dropbox + Chrome
33. hideelements (12 stars): Chrome Extension. Background Page + Context Menu + Content Script
34. awesomeness (JavaScript, 12 stars): HTTP based federated protocol for real time hierarchical message manipulation
35. scratchpad (JavaScript, 12 stars): scratchpad used in khan academy
36. codemirror-jsx (JavaScript, 11 stars): CodeMirror Mode for React E4X/JSX
37. 3d-sculpt (10 stars): A simple 3D digital sculpting tool made with JS and HTML5 Canvas
38. antimatter15 (JavaScript, 10 stars): Tiny projects of antimatter15
39. chromesearch (JavaScript, 10 stars): Desktop Search Engine Chrome Extension
40. zui (JavaScript, 9 stars): A zooming user interface
41. antimatter15.github.io (HTML, 9 stars): I can't think of a description so I'm describing my inability to think of a description
42. js-potrace (8 stars): A JS port of the C# Vectorize port of the C Potrace
43. 2d-thin-plate-spline (JavaScript, 8 stars): javascript thin plate spline in 2d
44. derpsacola (Swift, 8 stars): use mac accessibility api to scrape screen contents
45. chromecorder (CoffeeScript, 8 stars): Encode screencasts in a cool way copied off of sublimetext.com
46. gmailwave (JavaScript, 8 stars): Integrated Gmail and Wave Chrome Extension
47. js-ebml (JavaScript, 7 stars): a simple ebml parser in JS for no good reason
48. gayfish (JavaScript, 7 stars): experimental notebook programming environment
49. jsvectoreditor (JavaScript, 7 stars): a new version of vectoreditor
50. wave.js (6 stars): A Node.JS implementation of the Wave Robot API
51. k5 (JavaScript, 6 stars): differentiable graphics for react
52. untar.js (JavaScript, 6 stars): untar salvaged from bitjs
53. readability-iframe (JavaScript, 5 stars): Chrome extension for sites that want to use Readability
54. creamie (JavaScript, 5 stars): Chrome + Streamie (port of both client and server to Chrome)
55. pinball (CoffeeScript, 5 stars): coffeescript pinball game
56. w2_embed (JavaScript, 5 stars): Deep Integration Wave Embed API
57. autograph (TypeScript, 5 stars): the best most easiest way to graphql
58. surplus-lite (JavaScript, 5 stars): Google+ notifications in Chrome without colossal memory usage.
59. omeglebot (JavaScript, 5 stars): A simple Omegle robot that repeats previous conversation phrases semi-intelligently
60. py-wikireader (Python, 5 stars): A simple offline Wikipedia dump reader
61. pepper (JavaScript, 5 stars): Use face.com api and canvas to interactively, fancily and automagically add the casually pepper spraying cop to any picture
62. derp (JavaScript, 4 stars): kinda like version control or something
63. pdftotext-wasm (Dockerfile, 4 stars): poppler pdftotext compiled with emscripten
64. facebook-export (CoffeeScript, 4 stars): Export facebook phone and other data with a screen scraper into CSV format
65. exthub (4 stars): A self updating, collaborative extension platform
66. venn-google (JavaScript, 4 stars): Venn Diagrams using Google Suggest
67. sqlite-vfs-js (TypeScript, 4 stars)
68. espkey (C++, 4 stars): A portable hyperlocal wireless social experiment
69. x-no-wiretap (JavaScript, 4 stars): Aid the NSA's unwitting collection of domestic internet traffic!
70. hqx.js (JavaScript, 4 stars): hqx in js
71. jove (JavaScript, 4 stars): ipython notebook for node.js
72. franchise-client (JavaScript, 4 stars): database connectors for franchise
73. speed (4 stars): Read in a subtitles track and speed up parts of TV shows which don't have talking
74. wsl (JavaScript, 4 stars): pipe to websocket
75. fluidizer (JavaScript, 4 stars): Bookmarklet which converts arbitrary fixed-width layouts into fluid layouts
76. d3-pinch-zoom (JavaScript, 4 stars): pinch to zoom for d3 on desktop browsers
77. vx-comet (JavaScript, 3 stars): A lightweight implementation of the Bayeux protocol
78. anodize (JavaScript, 3 stars): New Chrome Packaged App BitTorrent Client, mostly just a lot of NodeJS modules stuck together
79. bitjs (JavaScript, 3 stars): Binary Tools for JavaScript
80. crossave (JavaScript, 3 stars): Chrome OS File Manager Handler powered by Cloud Save that uploads to a bucketload of services.
81. evilmeter (3 stars): chrome extension that detects user agent sniffing
82. sublime-autobuild (Python, 3 stars): Automatically build on save in Sublime Text 2
83. fb-grapher (JavaScript, 3 stars): Make purty graphs out of fb data!
84. dropsync (3 stars): dropbox syncing for chrome os
85. rsvgshim (JavaScript, 3 stars): A SVG Shim that renders with RaphaelJS
86. groebner.js (JavaScript, 3 stars): javascript implementation of buchberger's algorithm for computing a polynomial groebner basis
87. sprite-codec (3 stars): A fast screen media optimized codec for embedding in websites
88. doge (Python, 3 stars): wow. such commit. very push.
89. identicon-login (PHP, 3 stars): A new approach to fighting phishing
90. wordless (JavaScript, 3 stars): extract plain text from a word document
91. kindlespark (2 stars): Sparknotes -> Kindle via YQL
92. tensorflow-renderer (Jupyter Notebook, 2 stars): first steps toward trying to build a mesh renderer in tensorflow
93. retcon (TypeScript, 2 stars)
94. articles (ASP, 2 stars): hopefully dis gon b gud
95. react-use-nav (2 stars): a simple routing system for react
96. timeliner (2 stars): automatically enable timeline for facebook
97. facetex (JavaScript, 2 stars): TeX for Facebook Chat
98. progressive-json (JavaScript, 2 stars): Parse JSON before all of it is loaded
99. wave-unread-navigator (JavaScript, 2 stars): Show gmail-like arrows listing if unread blips in an open wave are above or below.
100. keyboard (2 stars): some failed experiment from a while ago