unibits | Reveal the Unicode
Ruby library and CLI command that visualizes various Unicode and ASCII/single-byte encodings in the terminal:
- Makes analyzing encodings easier
- Helps you with debugging strings
- Highlights invalid/special/blank bytes/characters/codepoints
- Supports UTF-8, UTF-16LE/UTF-16BE, UTF-32LE/UTF-32BE, ISO-8859-X, Windows-125X, IBMX, CP85X, macX, TIS-620/Windows-874, KOI8-R/KOI8-U, 7-bit ASCII/GB1988, and arbitrary BINARY data
Color Coding
Each byte of the given string is highlighted in a color determined by the character (codepoint) it belongs to:
- Red for invalid bytes
- Light blue for blanks
- Blue for control characters
- Pink for non-control formatting characters
- Green for marks (Unicode only)
- Orange for unassigned codepoints
- Lighter orange for unassigned codepoints which are also ignorable
- Random color for all other codepoints
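To illustrate the categories above, here is a minimal sketch that maps a single character to one of them using Ruby's built-in Unicode property regexes. It is not unibits' actual implementation: the helper name `char_category` is hypothetical, "blank" is assumed to mean a space separator (`\p{Zs}`), and the "ignorable" and random-color distinctions are left out.

```ruby
# Hypothetical helper approximating the color categories listed above.
# Not taken from unibits; the category boundaries are assumptions.
def char_category(char)
  # Check invalid bytes first: matching a regex against them would raise.
  return :invalid unless char.valid_encoding?

  case char
  when /\p{Zs}/ then :blank      # assumption: blanks = space separators
  when /\p{Cc}/ then :control    # e.g. "\n", "\t", "\0"
  when /\p{Cf}/ then :format     # e.g. U+200D ZERO WIDTH JOINER
  when /\p{M}/  then :mark       # combining marks, e.g. U+0301
  when /\p{Cn}/ then :unassigned # no character assigned to this codepoint
  else :other                    # unibits would pick a random color here
  end
end

["a", " ", "\n", "\u200D", "\u0301", "\u0378", "\x80"].each do |char|
  puts "#{char.inspect}: #{char_category(char)}"
end
```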
The same colors are used in the higher-level companion tool uniscribe.
Setup
Make sure Ruby is installed and that installing gems works. Then run:
$ gem install unibits
Usage
Pass the string you want to debug to unibits:
From CLI
$ unibits "🌫 Idiosyncrätic ℜսᖯʏ"
From Ruby
require 'unibits/kernel_method'
unibits "🌫 Idiosyncrätic ℜսᖯʏ"
Advanced Options
unibits accepts the following options:
- encoding (e): The encoding of the given string (uses the string's default encoding if none given)
- convert (c): An encoding the string should be converted to before visualizing it
- stats: Whether to show a short stats header (default: true); deactivate it on the CLI with --no-stats
- wide-ambiguous: Treat characters of ambiguous width as 2 spaces instead of 1 (more info)
- width (w): Set a custom column width; if not set, unibits retrieves it from the terminal or falls back to 80
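The difference between `encoding` and `convert` mirrors two core Ruby methods: reinterpreting the same bytes (`String#force_encoding`) versus transcoding them into a new byte sequence (`String#encode`). The sketch below shows that difference in plain Ruby without unibits; that the two options map onto exactly these methods is an assumption, not something stated here.

```ruby
original = "ä" # U+00E4, stored as two bytes in UTF-8

# Like `encoding` (assumed): keep the bytes, change how they are interpreted.
reinterpreted = original.dup.force_encoding("BINARY")

# Like `convert` (assumed): produce a new byte sequence in the target encoding.
converted = original.encode("UTF-16LE")

puts original.bytes.inspect      # the UTF-8 bytes
puts reinterpreted.bytes.inspect # identical bytes, now 2 "characters"
puts converted.bytes.inspect     # different bytes, still 1 character
```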
Examples of Valid Encodings
UTF-8
CLI: $ unibits -e utf-8 -c utf-8 "🌫 Idiosyncrätic ℜսᖯʏ"
Ruby: unibits "🌫 Idiosyncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-8'
UTF-16LE
CLI: $ unibits -e utf-8 -c utf-16le "🌫 Idiosyncrätic ℜսᖯʏ"
Ruby: unibits "🌫 Idiosyncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-16le'
UTF-32BE
CLI: $ unibits -e utf-8 -c utf-32be "🌫 Idiosyncrätic ℜսᖯʏ"
Ruby: unibits "🌫 Idiosyncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-32be'
BINARY
CLI: $ unibits -e binary "🌫 Idiosyncrätic ℜսᖯʏ"
Ruby: unibits "🌫 Idiosyncrätic ℜսᖯʏ", encoding: 'binary'
ASCII
CLI: $ unibits -e utf-8 -c ascii "ascii"
Ruby: unibits "ascii", encoding: 'utf-8', convert: 'ascii'
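Each of the examples above starts with 🌫 (U+1F32B FOG). To cross-check what unibits displays, the raw bytes of that single character in each encoding can be printed with plain Ruby; `hex_bytes` is a hypothetical helper, not part of unibits.

```ruby
# Hypothetical helper: the bytes of `str` after converting it to `enc`, as hex.
def hex_bytes(str, enc)
  str.encode(enc).bytes.map { |b| format("%02X", b) }.join(" ")
end

fog = "\u{1F32B}" # 🌫

%w[UTF-8 UTF-16LE UTF-32BE].each do |enc|
  puts "#{enc.ljust(8)} #{hex_bytes(fog, enc)}"
end
```

UTF-16LE needs a surrogate pair (D83C DF2B, each unit stored low byte first) because the codepoint lies outside the Basic Multilingual Plane, while UTF-32BE stores the codepoint directly as 0001F32B.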
Examples of Invalid Encodings
UTF-8
Example in Ruby: unibits "unexpected \x80 | not enough \xF0\x9F\x8C | overlong \xE0\x81\x81 | surrogate \xED\xA0\x80 | too large \xF5\x8F\xBF\xBF"
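Each segment of that example is rejected by Ruby's own UTF-8 validation, which can be confirmed with `String#valid_encoding?`. The sketch checks the byte sequences one by one (labels copied from the example) and then adds the missing continuation byte that turns the "not enough" case into a valid 🌫.

```ruby
# Byte sequences from the invalid UTF-8 example above.
broken = {
  "unexpected" => "\x80",             # continuation byte without a lead byte
  "not enough" => "\xF0\x9F\x8C",     # 4-byte sequence cut off after 3 bytes
  "overlong"   => "\xE0\x81\x81",     # U+0041 encoded in 3 bytes instead of 1
  "surrogate"  => "\xED\xA0\x80",     # U+D800, reserved for UTF-16
  "too large"  => "\xF5\x8F\xBF\xBF", # would decode to U+14FFFF > U+10FFFF
}

broken.each do |label, bytes|
  puts "#{label}: valid_encoding? => #{bytes.valid_encoding?}"
end

complete = "\xF0\x9F\x8C\xAB" # the missing fourth byte added
puts complete.valid_encoding? # true
puts complete == "\u{1F32B}"  # true (🌫)
```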
ASCII
Example in Ruby: unibits "🌫 Idiosyncrätic ℜսᖯʏ", encoding: 'ascii'
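The ASCII example fails because 7-bit ASCII only covers the bytes 0x00 to 0x7F, so every byte of a multibyte UTF-8 character is invalid there, whereas BINARY (Ruby's ASCII-8BIT) accepts any byte value. A minimal plain-Ruby check of that difference, independent of unibits:

```ruby
word = "Idiosyncrätic" # "ä" takes two bytes (0xC3 0xA4) in UTF-8

as_ascii  = word.dup.force_encoding("US-ASCII")
as_binary = word.dup.force_encoding("BINARY")

puts as_ascii.valid_encoding?  # false: 0xC3 and 0xA4 are above 0x7F
puts as_binary.valid_encoding? # true: every byte value is allowed
puts as_binary.bytesize == word.bytesize # true: only the label changed
```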
Notes
More info
- Ruby's Encoding class
- UTF-8 (Wikipedia)
- UTF-16 (Wikipedia)
- UTF-32 (Wikipedia)
- Difference between BINARY and ASCII
Related gems
Lots of thanks to @damienklinnert for the motivation and inspiration required to build this!
Copyright (C) 2017-2022 Jan Lelis https://janlelis.com. Released under the MIT license.