imdb-rename

A command line tool to rename media files based on titles from IMDb. imdb-rename downloads the official IMDb data set and creates a local index to use for fast fuzzy searching.

Dual-licensed under MIT or the UNLICENSE.

Installation

Archives of precompiled binaries for imdb-rename are available for Windows, macOS and Linux.

Otherwise, users are expected to compile imdb-rename from source:

$ git clone https://github.com/BurntSushi/imdb-rename
$ cd imdb-rename
$ cargo build --release
$ ./target/release/imdb-rename --help

Alternatively, if you have Cargo installed, then you can install imdb-rename directly from crates.io:

$ cargo install imdb-rename

imdb-rename's minimum supported Rust version is 1.28.0.

Arch Linux

An AUR package is available: imdb-rename.

Quick example

Ever since Season 1 of The Simpsons came out on DVD, I've been collecting them and ripping them onto my hard drive. My process is somewhat manual, but I wind up with a directory that looks like this:

S18E01.mkv  S18E05.mkv  S18E09.mkv  S18E13.mkv  S18E17.mkv  S18E21.mkv
S18E02.mkv  S18E06.mkv  S18E10.mkv  S18E14.mkv  S18E18.mkv  S18E22.mkv
S18E03.mkv  S18E07.mkv  S18E11.mkv  S18E15.mkv  S18E19.mkv
S18E04.mkv  S18E08.mkv  S18E12.mkv  S18E16.mkv  S18E20.mkv

It would be much nicer if these files had their proper episode titles. imdb-rename can rename these files automatically using episode titles from IMDb:

$ imdb-rename -q 'the simpsons {show}' *.mkv

This command ran a query with the -q flag to identify the TV show, provided the files to rename, and... presto!

S18E01 - The Mook, the Chef, the Wife and Her Homer.mkv
S18E02 - Jazzy & The Pussycats.mkv
S18E03 - Please Homer, Don't Hammer 'Em.mkv
S18E04 - Treehouse of Horror XVII.mkv
S18E05 - G.I. (Annoyed Grunt).mkv
S18E06 - Moe'N'a Lisa.mkv
S18E07 - Ice Cream of Margie: With the Light Blue Hair.mkv
S18E08 - The Haw-Hawed Couple.mkv
S18E09 - Kill Gil, Vol. 1 & 2.mkv
S18E10 - The Wife Aquatic.mkv
S18E11 - Revenge Is a Dish Best Served Three Times.mkv
S18E12 - Little Big Girl.mkv
S18E13 - Springfield Up.mkv
S18E14 - Yokel Chords.mkv
S18E15 - Rome-old and Juli-eh.mkv
S18E16 - Homerazzi.mkv
S18E17 - Marge Gamer.mkv
S18E18 - The Boys of Bummer.mkv
S18E19 - Crook and Ladder.mkv
S18E20 - Stop or My Dog Will Shoot.mkv
S18E21 - 24 Minutes.mkv
S18E22 - You Kent Always Say What You Want.mkv
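The renaming step above depends on pulling the season and episode numbers out of each file name. The following Rust sketch shows one way to do that with only the standard library. The names parse_season_episode, read_digits and episode_file_name, and the small hard-coded title table, are hypothetical and used for illustration only; this is not imdb-rename's actual parser, and the real tool looks titles up in its IMDb index rather than in a table.

```rust
/// Hypothetical helper: find an "S<digits>E<digits>" marker (case-insensitive)
/// anywhere in a file name and return (season, episode).
fn parse_season_episode(name: &str) -> Option<(u32, u32)> {
    let bytes = name.as_bytes();
    for start in 0..bytes.len() {
        if !bytes[start].eq_ignore_ascii_case(&b's') {
            continue;
        }
        let (season, after) = match read_digits(bytes, start + 1) {
            Some(found) => found,
            None => continue,
        };
        if after < bytes.len() && bytes[after].eq_ignore_ascii_case(&b'e') {
            if let Some((episode, _)) = read_digits(bytes, after + 1) {
                return Some((season, episode));
            }
        }
    }
    None
}

/// Read a run of ASCII digits starting at `start`; return the number and the
/// index just past the last digit.
fn read_digits(bytes: &[u8], start: usize) -> Option<(u32, usize)> {
    let mut end = start;
    while end < bytes.len() && bytes[end].is_ascii_digit() {
        end += 1;
    }
    if end == start {
        return None;
    }
    // The slice is all ASCII digits, so it is valid UTF-8; parsing fails only on overflow.
    let digits = std::str::from_utf8(&bytes[start..end]).ok()?;
    digits.parse::<u32>().ok().map(|n| (n, end))
}

/// Build a name in the "S18E01 - Title.mkv" style shown above.
fn episode_file_name(season: u32, episode: u32, title: &str, ext: &str) -> String {
    format!("S{:02}E{:02} - {}.{}", season, episode, title, ext)
}

fn main() {
    // Hypothetical stand-in for the IMDb episode lookup.
    let titles = [
        ((18, 1), "The Mook, the Chef, the Wife and Her Homer"),
        ((18, 14), "Yokel Chords"),
    ];
    for file in ["S18E01.mkv", "S18E14.mkv"] {
        if let Some((s, e)) = parse_season_episode(file) {
            if let Some((_, title)) = titles.iter().find(|(key, _)| *key == (s, e)) {
                println!("{} -> {}", file, episode_file_name(s, e, title, "mkv"));
            }
        }
    }
}
```

Running it prints "S18E01.mkv -> S18E01 - The Mook, the Chef, the Wife and Her Homer.mkv" followed by the line for S18E14.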

Fancier example

imdb-rename isn't limited to just renaming TV episodes based on season/episode numbers. It can also perform a fuzzy match based on the contents of the file name. For example, given this file:

Thor.Ragnarok.2017.1080p.WEB-DL.DD5.1.H264-FGT.mkv

We can "clean it up" and rename it to a nice title like so:

$ imdb-rename Thor.Ragnarok.2017.1080p.WEB-DL.DD5.1.H264-FGT.mkv

which gives us:

Thor: Ragnarok (2017).mkv
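One plausible way to turn a release-style name like the one above into a search query is to split it on separators and stop at the first four-digit year. The Rust sketch below assumes that heuristic; title_query_and_year and movie_file_name are hypothetical helpers, the year range 1880..=2100 is an arbitrary choice, and the fuzzy IMDb lookup that maps "thor ragnarok" to Thor: Ragnarok is not shown.

```rust
/// Hypothetical heuristic: turn a release-style file name into a lowercase
/// title query plus the first plausible year found in it.
fn title_query_and_year(file_name: &str) -> (String, Option<u32>) {
    // Assumes the name ends in an extension such as ".mkv" and drops it.
    let stem = match file_name.rfind('.') {
        Some(i) => &file_name[..i],
        None => file_name,
    };
    let mut words = Vec::new();
    let mut year = None;
    for token in stem.split(|c: char| c == '.' || c == '_' || c == ' ') {
        // Treat the first four-digit number in the chosen range as the release
        // year and ignore everything after it (resolution, source, codec, ...).
        if token.len() == 4 && token.chars().all(|c| c.is_ascii_digit()) {
            let n: u32 = token.parse().unwrap();
            if (1880..=2100).contains(&n) {
                year = Some(n);
                break;
            }
        }
        if !token.is_empty() {
            words.push(token.to_lowercase());
        }
    }
    (words.join(" "), year)
}

/// Format an IMDb title and year as a file name, e.g. "Thor: Ragnarok (2017).mkv".
fn movie_file_name(title: &str, year: u32, ext: &str) -> String {
    format!("{} ({}).{}", title, year, ext)
}

fn main() {
    let name = "Thor.Ragnarok.2017.1080p.WEB-DL.DD5.1.H264-FGT.mkv";
    let (query, year) = title_query_and_year(name);
    println!("query = {:?}, year = {:?}", query, year);
    // Pretend the fuzzy search returned this IMDb title for the query.
    if let Some(year) = year {
        println!("{}", movie_file_name("Thor: Ragnarok", year, "mkv"));
    }
}
```

A movie whose title is itself a year, such as 1917, would be misread by a rule like this, which is one reason a fuzzy search over the index is more robust than hand-written rules alone.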

Freeform searching

We can also use imdb-rename to search IMDb, which is the default behavior when a -q/--query is provided without any file names:

$ imdb-rename -q 'homey loves flanders'
#     score  id         kind       title                   year  tv
1     1.000  tt0773646  tvEpisode  Homer Loves Flanders    1994  S05E16 The Simpsons
2     0.646  tt2101691  tvEpisode  Tiny Loves Flowers      N/A   S02E08 Dinosaur Train
3     0.568  tt3203408  tvEpisode  Courtney Loves Love     2014  S01E05 Courtney Loves Dallas
4     0.561  tt1722576  short      In Flanders Fields      2010
5     0.561  tt2253780  tvSeries   In Vlaamse Velden       2014
6     0.555  tt4528474  video      My Lovely Homeland      2011
7     0.551  tt0220646  tvMovie    Moll Flanders           1975
[... results truncated ...]

Notice that our query had a typo in it. imdb-rename does its best to find the most relevant results anyway. It is also fast: even though the above query searches through all 6 million names in IMDb, it runs in under 100ms. This is thanks to an inverted index that is memory-mapped from disk.
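To see why a typo like "homey" barely hurts, it helps to compare the character trigrams of the query with those of the intended title. This Rust sketch uses Jaccard similarity over trigram sets purely as an illustration; it is not necessarily one of the similarity functions imdb-rename implements.

```rust
use std::collections::HashSet;

/// Distinct character trigrams of a lowercased string.
fn trigrams(s: &str) -> HashSet<String> {
    let chars: Vec<char> = s.to_lowercase().chars().collect();
    chars.windows(3).map(|w| w.iter().collect::<String>()).collect()
}

/// Jaccard similarity of two strings' trigram sets: |A ∩ B| / |A ∪ B|.
fn jaccard(a: &str, b: &str) -> f64 {
    let (x, y) = (trigrams(a), trigrams(b));
    let union = x.union(&y).count();
    if union == 0 {
        return 0.0;
    }
    x.intersection(&y).count() as f64 / union as f64
}

fn main() {
    let query = "homey loves flanders";
    for title in ["Homer Loves Flanders", "Tiny Loves Flowers"] {
        println!("{:.3}  {}", jaccard(query, title), title);
    }
}
```

The one-letter typo only disturbs the trigrams that touch it: 15 of the 21 distinct trigrams are shared with "homer loves flanders" (about 0.714), while "tiny loves flowers" shares 9 of 25 (0.360).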

How does it work?

imdb-rename works by downloading approved datasets from IMDb and creating an inverted index based on ngrams extracted from the names in IMDb's data. The inverted index provides a quick way to search and rank results using techniques from information retrieval such as Okapi BM25.
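Here is a toy, in-memory version of the same idea: character trigrams as terms, an inverted index from each trigram to the names containing it, and Okapi BM25 to rank candidates. The parameters k1 = 1.2 and b = 0.75 are common textbook defaults, not values taken from imdb-rename, and the real index lives on disk and is memory-mapped rather than held in HashMaps. Index, build and search are hypothetical names.

```rust
use std::collections::HashMap;

/// Character n-grams of a lowercased name.
fn ngrams(s: &str, n: usize) -> Vec<String> {
    let chars: Vec<char> = s.to_lowercase().chars().collect();
    chars.windows(n).map(|w| w.iter().collect::<String>()).collect()
}

/// Toy inverted index: ngram -> postings list of (document id, term frequency).
struct Index {
    n: usize,
    postings: HashMap<String, Vec<(usize, u32)>>,
    doc_lens: Vec<usize>,
}

impl Index {
    fn build(names: &[&str], n: usize) -> Index {
        let mut postings: HashMap<String, Vec<(usize, u32)>> = HashMap::new();
        let mut doc_lens = Vec::with_capacity(names.len());
        for (id, name) in names.iter().enumerate() {
            let grams = ngrams(name, n);
            doc_lens.push(grams.len());
            // Count how often each ngram occurs in this name.
            let mut tf: HashMap<String, u32> = HashMap::new();
            for g in grams {
                *tf.entry(g).or_insert(0) += 1;
            }
            for (g, count) in tf {
                postings.entry(g).or_default().push((id, count));
            }
        }
        Index { n, postings, doc_lens }
    }

    /// Rank names sharing at least one ngram with the query by Okapi BM25.
    fn search(&self, query: &str) -> Vec<(usize, f64)> {
        if self.doc_lens.is_empty() {
            return Vec::new();
        }
        let (k1, b) = (1.2, 0.75); // common textbook defaults
        let num_docs = self.doc_lens.len() as f64;
        let avg_len = self.doc_lens.iter().sum::<usize>() as f64 / num_docs;
        let mut scores: HashMap<usize, f64> = HashMap::new();
        // A query ngram that repeats contributes once per occurrence.
        for g in ngrams(query, self.n) {
            let list = match self.postings.get(&g) {
                Some(list) => list,
                None => continue,
            };
            let df = list.len() as f64;
            // BM25 idf; the +1 inside the log keeps it positive.
            let idf = ((num_docs - df + 0.5) / (df + 0.5) + 1.0).ln();
            for &(id, tf) in list {
                let tf = f64::from(tf);
                let len_norm = 1.0 - b + b * self.doc_lens[id] as f64 / avg_len;
                *scores.entry(id).or_insert(0.0) += idf * tf * (k1 + 1.0) / (tf + k1 * len_norm);
            }
        }
        let mut ranked: Vec<(usize, f64)> = scores.into_iter().collect();
        ranked.sort_by(|x, y| y.1.partial_cmp(&x.1).unwrap());
        ranked
    }
}

fn main() {
    let names = ["Homer Loves Flanders", "Tiny Loves Flowers", "Moll Flanders", "In Flanders Fields"];
    let index = Index::build(&names, 3);
    for (id, score) in index.search("homey loves flanders") {
        println!("{:.3}  {}", score, names[id]);
    }
}
```

With these four sample names, "Homer Loves Flanders" ranks first for the misspelled query: it shares the most trigrams, including rare ones like "hom" that BM25's idf term weights heavily, while common trigrams like "ers" contribute little.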

Motivation

My motivation for building this tool is somewhat idiosyncratic, but three-fold:

  1. I find it very convenient to have a tool to rename media files automatically. imdb-rename is my third iteration on this tool. The first was an unpublished hodgepodge of Python scripts and a MySQL database. The second was a Go program with a PostgreSQL database. The Go program served me well, but IMDb retired their old data format, which required me to build a new tool to adapt.
  2. I've been working on a low-level information retrieval library off-and-on for a couple of years, and initially built this tool on top of that library as a form of dogfooding. It didn't work out as well as I'd hoped, so I scrapped the generic library and built out a specific solution tailored to IMDb. I'm no longer dogfooding directly, but I've established a useful baseline.
  3. I want more people to learn about information retrieval, and I believe this tool can serve to teach others. In particular, imdb-rename is a complete end-to-end information retrieval system that is fast, solves a real problem, is only a few thousand lines of code and comes with a built-in evaluation that is easy to run.

This tool is perhaps a bit over-engineered, but I had fun with it. Believe it or not, parts of imdb-rename are intentionally simple at the cost of both query speed and size on disk!

Evaluation

It is possible to run an evaluation to compare the various parameters available for searching. The evaluation system is available as a separate tool called imdb-eval, which is included in this repository. To use it, we must first build it:

$ git clone https://github.com/BurntSushi/imdb-rename
$ cd imdb-rename
$ cargo build --release --all
$ ./target/release/imdb-eval --help

Running an evaluation is simple. We can run an evaluation on all combinations of scorer and similarity function, along with ngram sizes of 3 and 4, like so (this uses the truth data that is built into the imdb-eval binary):

$ ./target/release/imdb-eval --ngram-size 3 --ngram-size 4 | tee eval.csv

This will output the results of running a search on every item in the truth data. The results include the rank of the expected answer. The results can be summarized into a single score called the Mean Reciprocal Rank (which is itself a specific instance of MAP, or mean average precision) with the --summarize flag like so:

$ ./target/release/imdb-eval --summarize eval.csv
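For intuition about the summary score, here is a minimal Rust sketch of Mean Reciprocal Rank: each query contributes 1/rank of its expected answer, and those contributions are averaged. It takes ranks directly rather than parsing eval.csv, whose column layout is not described here, and it assumes the usual convention that a query whose answer never appears contributes 0; whether imdb-eval does exactly this is not confirmed.

```rust
/// Mean Reciprocal Rank. `ranks[i]` is the 1-based rank of query i's expected
/// answer, or `None` if the answer never appeared (assumed to contribute 0).
fn mean_reciprocal_rank(ranks: &[Option<u32>]) -> f64 {
    if ranks.is_empty() {
        return 0.0;
    }
    let total: f64 = ranks
        .iter()
        .map(|rank| match rank {
            Some(r) if *r > 0 => 1.0 / f64::from(*r),
            _ => 0.0,
        })
        .sum();
    total / ranks.len() as f64
}

fn main() {
    // Ranks for four hypothetical queries: found at 1, 2, not found, and 4.
    let ranks = [Some(1), Some(2), None, Some(4)];
    println!("MRR = {}", mean_reciprocal_rank(&ranks));
}
```

For those four ranks the result is (1 + 1/2 + 0 + 1/4) / 4 = 0.4375. Because each query has a single expected answer, this is the same value mean average precision would give, which is why MRR can be viewed as a special case of MAP.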

If you have xsv installed, then the results can be easily sorted and formatted:

$ ./target/release/imdb-eval --summarize eval.csv | xsv sort -R -s mrr | xsv table

If you want to tweak the truth data, then you might consider starting with the bundled truth data (assuming you're at the root of the imdb-rename repository):

$ $EDITOR data/eval/truth.toml
$ ./target/release/imdb-eval --ngram-size 3 --ngram-size 4 --truth data/eval/truth.toml

What does this tool not do?

imdb-rename is a tool for renaming media files, and to the extent that searching IMDb facilitates renaming files, it is also a search tool. There is no intent to develop this further to explore all IMDb data, such as cast/crew information.

Folks interested in building a different type of IMDb tool may be interested in the imdb-index crate, which provides programmatic access to the index created by imdb-rename.

IMDb licensing

The data used by imdb-rename is retrieved from IMDb datasets. In particular, imdb-rename will never scrape imdb.com, and only uses the data provided by IMDb in the TSV files.

Additionally, imdb-rename must only be used for non-commercial and personal uses.
