• Stars
    317
  • Rank 131,497 (Top 3 %)
  • Language
    Ruby
  • License
    MIT License
  • Created over 14 years ago
  • Updated over 3 years ago

Repository Details

Adds text to PDF files using the cuneiform OCR software

pdfocr

pdfocr adds an OCR text layer to scanned PDF files so that they can be searched. It requires Ruby 1.8.7 or later and uses ocropus, cuneiform, or tesseract to perform the OCR.

Using

To use, run:

pdfocr -i input.pdf -o output.pdf

For more details, see the manpage.
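
To process a whole folder of scans, the single-file command above can be wrapped in a short Ruby script. This is only a sketch: the `-i` and `-o` flags are the ones shown above, while the helper names (`pdfocr_command`, `ocr_output_name`, `ocr_directory`) and the `-ocr.pdf` naming scheme are assumptions made for illustration and are not part of pdfocr.

```ruby
# Sketch: batch-run pdfocr over a folder of scanned PDFs.
# Only the -i and -o flags come from the README; all helper names
# and the "-ocr.pdf" suffix are hypothetical.

# Build the argument vector for one pdfocr run.
def pdfocr_command(input, output)
  ["pdfocr", "-i", input, "-o", output]
end

# Derive an output name such as "scans/a-ocr.pdf" from "scans/a.pdf".
def ocr_output_name(input)
  dir = File.dirname(input)
  base = File.basename(input, ".pdf")
  File.join(dir, "#{base}-ocr.pdf")
end

# Run pdfocr on every PDF in dir that does not already have an OCR copy.
def ocr_directory(dir)
  Dir.glob(File.join(dir, "*.pdf")).sort.each do |input|
    next if input.end_with?("-ocr.pdf")   # skip our own earlier output
    output = ocr_output_name(input)
    next if File.exist?(output)           # already processed
    # system returns false on a non-zero exit and nil if pdfocr is not found.
    ok = system(*pdfocr_command(input, output))
    warn "pdfocr failed for #{input}" unless ok
  end
end

ocr_directory(ARGV[0]) if __FILE__ == $PROGRAM_NAME && ARGV[0]
```

Saved under a name of your choice (for example `ocr_all.rb`), it would be run as `ruby ocr_all.rb /path/to/scans`. Passing the arguments to `system` as separate strings avoids shell quoting problems with file names that contain spaces.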

Dependencies

pdfocr requires tesseract and hocr2pdf. Both are available from your distribution: install the packages tesseract-ocr, tesseract-ocr-eng (or the packages for the other languages you need), and exactimage.
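
Before running pdfocr, it can help to confirm that both programs are actually on the search path. The following is a minimal sketch rather than anything shipped with pdfocr: `find_executable` and `missing_tools` are hypothetical helper names, and the lookup assumes a Unix-like system where executables have no file extension.

```ruby
# Sketch: check that the external tools named in the README are on PATH.
# find_executable and missing_tools are hypothetical helper names.

# Return the full path of the named program, or nil if it is not found
# in any directory of the given search path.
def find_executable(name, search_path = ENV["PATH"].to_s)
  search_path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Return the subset of tools that could not be found.
def missing_tools(tools, search_path = ENV["PATH"].to_s)
  tools.reject { |tool| find_executable(tool, search_path) }
end

if __FILE__ == $PROGRAM_NAME
  missing = missing_tools(%w[tesseract hocr2pdf])
  if missing.empty?
    puts "All pdfocr dependencies found."
  else
    puts "Missing: #{missing.join(', ')}"
  end
end
```

If a tool is reported as missing, installing the corresponding package listed above (tesseract-ocr for tesseract, exactimage for hocr2pdf) should provide it.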

Credits

pdfocr was written by Geza Kovacs

pdfocr is hosted at http://github.com/gkovacs/pdfocr

Christian Pietsch added tesseract support.

More Repositories

1

rime-japanese

日语输入法 Input method for typing Japanese with RIME
168
star
2

remove_miner_fees

Removes miner fees on claymore ethereum miner. Donation: ETH 0xb70fc6f9865ce18c20d90ebf067d9951918f8933
Python
127
star
3

self-signed-https

Creates a self-signed https server
LiveScript
39
star
4

smart-subtitles-extract-subtitle

Subtitle extraction system for Smart Subtitles http://up.csail.mit.edu/other-pubs/chi2014-smartsubs.pdf
Python
29
star
5

cantodict-kindle-mobi

CC-CEDICT Chinese-English Dictionary for Kindle with Mandarin and Cantonese (mobi format)
Python
20
star
6

udev-key-remapping

remap keys using udev
Shell
12
star
7

syslinux-mac

Port of syslinux to Mac OS X
C
9
star
8

cc-cedict-kindle-mobi

CC-CEDICT Chinese-English Dictionary for Kindle (mobi format)
Python
9
star
9

smart-subtitles-system-chi2014

Tool to assist in understanding foreign-language videos
HTML
8
star
10

ffmpeg-concat

FFmpeg with concatenation patches (GSoC 2009 project)
C
7
star
11

cs230-cryptocurrency-trading-lstm

deep learning (cs 230) final project
Jupyter Notebook
7
star
12

rime-double-jyutping-extra

粤语双拼输入法 Input method for typing Chinese using Cantonese pronunciations with 2-3 keys per character, based on RIME
7
star
13

webpack-stream-watch

JavaScript
7
star
14

rime-spanish

RIME keyboard layout for typing Spanish, French, German, Portuguese, and Hungarian. Supports QWERTY and colemak
7
star
15

rime-korean

韩语输入法 RIME IME schema for typing Korean Hangul and Hanja
6
star
16

rime-td-pinyin-flypy-extra

小鹤双拼输入法(带声调) Input method for typing Chinese using Flypy Double Pinyin (Xiaohe Shuangpin) with Tones, for RIME
6
star
17

japanese-morphology

morphology analyzer for Japanese
Python
6
star
18

lson

LSON: LiveScript Object Notation Parser
LiveScript
5
star
19

LiveScriptConsole

A Chrome Extension for running LiveScript from the Web Inspector
HTML
5
star
20

rime-vietnamese

RIME IME schema for inputting Vietnamese
Python
5
star
21

fcitx-rime-config

config for rime, place in ~/.config/fcitx/rime (linux) or ~/Library/Rime (macos)
Python
5
star
22

rime-td-pinyin-flypy

小鹤双拼输入法(带声调) Input method for typing Chinese using Flypy Double Pinyin (Xiaohe Shuangpin) with Tones, for RIME
Python
5
star
23

python-livescript

Call LiveScript from Python
Python
4
star
24

rime-tongwen-config

Config for RIME on Android (trime, 同文输入法) optimized for td-pinyin-flypy and double-jyutping input methods
4
star
25

trime-config

configuration files for rime (android version 同文输入法) put into /storage/emulated/0/rime
4
star
26

textmatch

Matches translatable messages to text extracted from program screenshots
Java
4
star
27

sudowin

sudowin - sudo for windows powershell
PowerShell
4
star
28

pianotutor

Provide a piece you want to learn, will generate a set of exercises to help you learn it.
CoffeeScript
4
star
29

translationsense

Translation sense disambiguation (for Chinese -> English word-level translations). Final project for 6.864.
Python
4
star
30

psetparty

Organize Pset Parties (6.470 Web Programming Competition 2013)
HTML
3
star
31

ipython-export

Python
3
star
32

habitlab-motivation-analysis

Analysis code for motivation analysis and user behavior prediction on HabitLab
Jupyter Notebook
3
star
33

node-tutorials

Sample code for my node tutorial series
JavaScript
2
star
34

praatinvoke

Praat interface using PortAudio
C#
2
star
35

read-each-line-sync

Read file line by line, synchronously.
JavaScript
2
star
36

feedlearn-mongoexport

Mongo related export and tooling for feedlearn
LiveScript
2
star
37

feedlearn2

Learn vocabulary as you browse your Facebook feed
JavaScript
2
star
38

speechsynth

Speech synthesis service in nodejs that proxies and caches Google Translate results
JavaScript
2
star
39

youku_series

Downloads video series from Youku
Python
2
star
40

ocr_service_onenote

Exposes OneNote's OCR engine as a web service
C#
2
star
41

decompress_lzstring

python port of the lz-string decompression routines at https://github.com/pieroxy/lz-string/blob/master/libs/lz-string.js
Python
2
star
42

add_utils_to_windows_path

adds utilities to the windows path
PowerShell
2
star
43

feedlearn-extension

Chrome extension for helping you learn as you browse your Facebook feed
JavaScript
2
star
44

grammarvis

Grammar Visualizer described in UIST 2013 demo: "Foreign Manga Reader: Learn Grammar and Pronunciation while Reading Comics"
CoffeeScript
2
star
45

gkovacs.github.com

Geza Kovacs
HTML
2
star
46

instantkaraoke

InstantKaraoke - a multiplayer game that generates karaoke for your favorite songs! (Boston Music Hack Day 2012)
CoffeeScript
2
star
47

liveocr

Provides an overlay on the computer screen which performs real-time OCR. Basis of ScreenMatch: Providing Context to Software Translators by Displaying Screenshots
C++
2
star
48

rime-colemak

Colemak layout for the RIME input engine
2
star
49

pdfocr-debian

Debian packaging for pdfocr
Ruby
2
star
50

enable-webcomponents-and-shadow-dom-in-content-scripts

Enables use of Web Components Custom Elements and Shadow DOM in Chrome content scripts.
JavaScript
2
star
51

findus

Fast Location Sharing
CoffeeScript
1
star
52

rna-reconstruct

6.047 final project
Python
1
star
53

karaokemaker

CoffeeScript
1
star
54

getsong

download via you-get then transcode into m4a
JavaScript
1
star
55

pillowpal

PillowPal - wake up your friends by sending music to their pillows! (HackMIT 2013)
CoffeeScript
1
star
56

split-utils

utilities for splitting and unsplitting files
Ruby
1
star
57

livescript-async

LiveScript with async/await patch added
LiveScript
1
star
58

phd_thesis

LaTeX source for my PhD Thesis
TeX
1
star
59

smart-subtitles-paper-chi2014

LaTeX sources for CHI 2014 paper: Smart Subtitles for Vocabulary Learning
TeX
1
star
60

effic

Improves your efficiency
CSS
1
star
61

dropbox-media-server

Node.js server that serves a public folder of media files on Dropbox
LiveScript
1
star
62

d3-network-visualization

HTML
1
star
63

shuffled

returns a shuffled copy of an array
LiveScript
1
star
64

transgame

Online game where users competitively translate webpages into foreign languages
JavaScript
1
star
65

code-reader

Tool to assist in reading code
JavaScript
1
star
66

publications

My academic publications
HTML
1
star
67

force-ssl

Express middleware for forcing ssl. Redirects http requests to https. Tested on Express 4.x
LiveScript
1
star
68

creativity_browsing_survey_analysis

Jupyter Notebook
1
star
69

feedlearn1

Learn vocabulary as you browse your Facebook feed
JavaScript
1
star
70

feedlearn-firefox

Firefox extension for feedlearn
LiveScript
1
star
71

xrff-utils

A set of utilities to manipulate files in Weka’s XRFF format.
C#
1
star
72

uglify-inplace

Runs uglify to minify javascript code in-place. See https://github.com/gkovacs/babili-inplace if you need ES6 features
JavaScript
1
star
73

cluebotng-mirror

A mirror of https://cobihome.external.cluenet.org:8443/svn/cluebotng r185
C++
1
star
74

read-each-line

Read file line by line, synchronously.
JavaScript
1
star
75

foreign-manga-reader

Chrome extension for reading foreign-language manga
JavaScript
1
star
76

parser-service

Web service that provides various parsers
LiveScript
1
star
77

iparead

Helps you learn to read IPA
JavaScript
1
star
78

flashcontrol

Source code of flash control, extracted from the chrome extension for reverse engineering purposes
JavaScript
1
star
79

publishresults

Publishes results of a script run on a port
LiveScript
1
star
80

cfy

Use generators and yield to write regular callback-based functions
LiveScript
1
star
81

reading-practice

6.813 final project - practice reading foreign sentences appropriate to your learning level
C#
1
star
82

textmatch-old-python

Obsolete, please see http://github.com/gkovacs/textmatch
Python
1
star
83

iChrome-personal

JavaScript
1
star
84

squirrel-rime-config

Configuration for squirrel (rime for macos, 鼠须管输入法) place in ~/Library/Rime
Python
1
star
85

google-play-music-export

Exports your Google Play Music collection from Android and creates mp3 tags for them
JavaScript
1
star
86

sweetjs-min

Sweet.js repackaged to run inside the browser and require minimal dependencies
1
star
87

feedlearn-emailer

SendGrid utils for sending emails for FeedLearn
LiveScript
1
star
88

maslab

MIT Mobile Autonomous System Laboratory (Maslab 2010 Competition) Team 9 Code
Java
1
star
89

sceneparse

Provides descriptions of scenes by composing object-generating grammars.
C#
1
star
90

rime-international

RIME schema for inputting several languages based on the Latin alphabet
1
star
91

ppa-multiseries

Given a Debian package, builds source packages for multiple Ubuntu series for upload to a PPA
Ruby
1
star
92

python-getsecret

Reads credentials from a file .getsecret.yaml
Python
1
star
93

do-not-automatically-add-other-search-engines

Chrome automatically adds sites to your search engines list when you visit them, which this extension prevents
JavaScript
1
star
94

extract_attachments

Command-line utility to extract attachments from an email message
JavaScript
1
star
95

browsing-behavior-reconstuction-analysis

Machine learning models for reconstructing detailed browsing behaviors (exact time spent on each site, when tabs were switched, site next visited) from browsing logs collected in Chrome extensions
Jupyter Notebook
1
star
96

gmail_status_unread_count

A script to set your gmail status based on the number of unread messages in your gmail inbox.
Python
1
star
97

superuser-recommendation-system-cs224w

CS224W final project: Personalized Recommendation System for Questions to Answer on SuperUser
Python
1
star
98

sysexec

equivalent to os.system in python, executes command synchronously, prints output to console as it executes, and returns the exit code
LiveScript
1
star