Converts VobSub subtitles (.idx/.sub format) into .srt subtitles.

VobSub2SRT is a simple command line program to convert .idx / .sub subtitles into .srt text subtitles by using OCR. It is based on code from the MPlayer project - a really really great movie player. Some minor parts are copied from ffmpeg/avutil headers. Tesseract is used as OCR software.

vobsub2srt is released under the GPL3+ license. The MPlayer code included is GPL2+ licensed.

The quality of the OCR depends on the text in the subtitles. The code does not yet preprocess the subtitle images, but I’m looking into adding filters and scaling options to improve the OCR. You can correct mistakes in the .srt files with a text editor or a dedicated subtitle editor.

Building

You need tesseract. You also need cmake and gcc to build the program. On Ubuntu 12.10 you can install the dependencies with

sudo apt-get install libtiff5-dev libtesseract-dev tesseract-ocr-eng build-essential cmake pkg-config

You should also install the tesseract data for the languages you want to use! Note that the support for tesseract 2 is deprecated and will be removed in the future!

./configure
make
sudo make install

This should install the program vobsub2srt to /usr/local/bin. You can uninstall vobsub2srt with sudo make uninstall.

Static binary

I recommend using the dynamic binary! However, if you really need a static binary you can add the flag -DBUILD_STATIC=ON to the ./configure call. But be aware that building static binaries can be quite troublesome. You need the static library files for tesseract, libtiff, libavutil, and for their dependencies as well. On Ubuntu 12.04 the static libraries are only included in the dev packages! You probably also need the Gold linker.

For Ubuntu 12.04 you need the following extra packages:

sudo apt-get install libleptonica-dev libpng12-dev libwebp-dev libgif-dev zlib1g-dev libjpeg-dev binutils-gold

If linking fails with undefined references then checking what other dependencies your version of leptonica has is a good starting point. You can do this by running ldd /usr/lib/liblept.so (or whatever the path to leptonica is on your system). Add those dependencies to CMakeModules/FindTesseract.cmake.
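
To turn the ldd output into a plain list of library names, a small filter can help. This is a minimal sketch and not part of the vobsub2srt build: the function name parse_ldd is made up for illustration, and it assumes the usual "name => path (address)" layout that ldd prints on Linux.

```shell
# parse_ldd: read ldd output on stdin and print only the names of the
# shared libraries that were resolved to a path ("name => path (addr)").
# Lines without "=>" (for example the vdso or the dynamic loader) are skipped.
parse_ldd() {
    awk '$2 == "=>" { print $1 }'
}

# Typical use (the path is the example from the text above; adjust it
# to wherever leptonica lives on your system):
#   ldd /usr/lib/liblept.so | parse_ldd
```

Each name printed is a candidate to add to CMakeModules/FindTesseract.cmake if the static link reports undefined references from that library.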

Ubuntu PPA and .deb packages

I have created a PPA (Personal Package Archive) to make installation on Ubuntu easy. Simply add the PPA to your apt-get sources and run an update and you can install the vobsub2srt package:

sudo add-apt-repository ppa:ruediger-c-plusplus/vobsub2srt
sudo apt-get update
sudo apt-get install vobsub2srt

.deb (Debian/Ubuntu)

You can build a *.deb package (Debian/Ubuntu) with make package. The package is created in the build directory.

You can also create a source package and upload it to your own PPA by using the UploadPPA.cmake. But this is only recommended for people experienced with cmake and creating Debian packages.

Homebrew

Vobsub2srt contains a formula for Homebrew (a package manager for OS X). It can be installed by using the following commands:

brew install --with-all-languages tesseract
brew install --HEAD https://github.com/ruediger/VobSub2SRT/raw/master/packaging/vobsub2srt.rb

Gentoo ebuild

An ebuild for Gentoo Linux is also available. You can make it available to emerge with the following steps

sudo mkdir -p /usr/local/portage/media-video/vobsub2srt/
wget https://github.com/ruediger/VobSub2SRT/raw/master/packaging/vobsub2srt-9999.ebuild
sudo mv vobsub2srt-9999.ebuild /usr/local/portage/media-video/vobsub2srt/
cd /usr/local/portage/media-video/vobsub2srt/
sudo ebuild vobsub2srt-9999.ebuild digest

You should be able to install vobsub2srt with emerge vobsub2srt now. If you want to use a newer version (3+) of tesseract you have to use layman. See #13 for details.

Arch AUR

There is also a PKGBUILD file for Arch Linux in the AUR: https://aur.archlinux.org/packages/vobsub2srt-git

Usage

vobsub2srt converts subtitles in VobSub (.idx / .sub) format into subtitles in .srt format. VobSub subtitles consist of two or three files called Filename.idx, Filename.sub and an optional Filename.ifo. To convert subtitles simply call

vobsub2srt Filename

with Filename being the file name of the subtitle files WITHOUT the extension (.idx / .sub). vobsub2srt writes the subtitles to a file called Filename.srt.
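
If a directory holds several subtitle sets, the call above can be wrapped in a loop that strips the extension for you. The following is a minimal sketch, not something shipped with vobsub2srt: the function name convert_all and the VOBSUB2SRT override variable are assumptions made for illustration (the override only exists so the loop can be dry-run).

```shell
# convert_all: run vobsub2srt on every .idx/.sub pair in the current directory.
# VOBSUB2SRT is a hypothetical override, e.g. VOBSUB2SRT=echo for a dry run.
convert_all() {
    for idx in *.idx; do
        [ -e "$idx" ] || continue      # no .idx files: the glob stayed literal
        base=${idx%.idx}               # file name WITHOUT the extension
        if [ ! -e "$base.sub" ]; then
            echo "skipping $base: no matching .sub file" >&2
            continue
        fi
        "${VOBSUB2SRT:-vobsub2srt}" "$base"
    done
}
```

Running VOBSUB2SRT=echo convert_all only prints the base names that would be converted, which is a cheap way to check the file matching before starting the OCR.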

If a subtitle file contains more than one language you can use the --lang parameter to set the correct language (use --langlist to find out about the languages in the file). For some languages you might need to set the tesseract language yourself (e.g., chi_tra/chi_sim for traditional or simplified Chinese characters). You can use --tesseract-lang to do this. In most cases, however, the language should be autodetected.
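
The language options can be combined in a small wrapper. This is a sketch under assumptions: the function name convert_lang and the VOBSUB2SRT override variable are hypothetical, and it assumes that --lang and --tesseract-lang each take their value as the next argument (check --help for the exact syntax of your build).

```shell
# convert_lang BASE LANG [TESSERACT_LANG]
# Build the argument list for one conversion; the tesseract language is
# only passed when a third argument is given (otherwise autodetection applies).
# VOBSUB2SRT is a hypothetical override, e.g. VOBSUB2SRT=echo for a dry run.
convert_lang() {
    base=$1
    lang=$2
    tess=${3:-}
    set -- --lang "$lang"
    if [ -n "$tess" ]; then
        set -- "$@" --tesseract-lang "$tess"
    fi
    "${VOBSUB2SRT:-vobsub2srt}" "$@" "$base"
}

# Example calls (Filename and the language codes are placeholders):
#   vobsub2srt --langlist Filename     # see which languages the file contains
#   convert_lang Filename en
#   convert_lang Filename zh chi_sim
```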

If you want to dump the subtitles as images (e.g. to check for correct OCR) you can use the --dump-images flag.

Use --help or read the manpage to get more information about the options of vobsub2srt.

Bug reports

Please submit bug reports or feature requests to the issue tracker on GitHub. If you do not have a GitHub account and feel uncomfortable creating one then feel free to send an e-mail to <[email protected]> instead.

If you have problems with a specific subtitle file then please check if it works in mplayer first. If it does not then please report the bug to mplayer as well and link to the mplayer bug report.

For bug reports please run vobsub2srt with the --verbose option and copy and paste the full output to the bug report.
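
One way to capture the full output for a report is to redirect it into a file. This is a minimal sketch: the function name verbose_log, the log file name and the VOBSUB2SRT override variable are assumptions for illustration, and both stdout and stderr are captured because it is not stated here which stream the verbose messages use.

```shell
# verbose_log BASE: run the conversion with --verbose and keep everything
# it prints in BASE.vobsub2srt.log (hypothetical file name) for the report.
# VOBSUB2SRT is a hypothetical override, e.g. VOBSUB2SRT=echo for a dry run.
verbose_log() {
    base=$1
    "${VOBSUB2SRT:-vobsub2srt}" --verbose "$base" > "$base.vobsub2srt.log" 2>&1
}

# Example (Filename is a placeholder):
#   verbose_log Filename
```

The resulting log file can then be pasted into the issue as-is.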

Contributors

Most code is from the MPlayer project.

  • Armin Häberling <[email protected]> wrote a patch to fix an issue with multiple instances of the same subtitle in result file (21af426)
  • James Harris <[email protected]> wrote the formula for Homebrew (54f311d6)
  • Leo Koppelkamm reported and fixed issue #5 and problems with long filenames (b903074c, 36ec8da, d3602d6)
  • Till Korten <[email protected]> wrote the ebuild script (#13)
  • Andreasf fixed missing libavutil include path (3a175eb, #15)
  • Michal Gawlik fixed the overlapping issue (5b2ccabc55f, #29, #32)
  • “bit” made sure no trailing whitespace is written to the SRT (3a59dc278abc2, #38)
  • Baudouin Raoult contributed various fixes (028f742, #44, b722a03, #42, 7293ac2, #40)
  • Justyn Butler added the y-threshold support (f873761, #43)
  • James Laird-Wah added min-width/height support and fixed other issues (41c6844, #48, #46)
  • Filirom1 fixed a minor issue (4ed58c2, #49)

To Do

  • implement preprocessing (first step scaling. Code available in spudec.c)
