uniscribe | Describe the Unicode
Describes Unicode characters with their name and shows compositions. UNICODE 15.0*
- Helps you understand how glyphs and codepoints are structured within the data
- Gives you the names of glyphs and codepoints, which can be used for further research
- Highlights invalid/special/blank codepoints
Uses color coding similar to that of its lower-level companion tool, unibits.
Setup
Make sure you have Ruby installed and installing gems works properly. Then do:
$ gem install uniscribe
Usage
Pass the string to debug to uniscribe:
From CLI
$ uniscribe "test strı̈ng"
From Ruby
require "uniscribe/kernel_method"
uniscribe "test strı̈ng"
Output
0074 ├─ t ├─ LATIN SMALL LETTER T
0065 ├─ e ├─ LATIN SMALL LETTER E
0073 ├─ s ├─ LATIN SMALL LETTER S
0074 ├─ t ├─ LATIN SMALL LETTER T
0020 ├─ ] [ ├─ SPACE
0073 ├─ s ├─ LATIN SMALL LETTER S
0074 ├─ t ├─ LATIN SMALL LETTER T
0072 ├─ r ├─ LATIN SMALL LETTER R
---- ├┬ ı̈ ├┬ Composition
0131 │├─ ı │├─ LATIN SMALL LETTER DOTLESS I
0308 │└─ ◌̈ │└─ COMBINING DIAERESIS
006E ├─ n ├─ LATIN SMALL LETTER N
0067 ├─ g ├─ LATIN SMALL LETTER G
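The grouping in the output above (single codepoints versus a composition of several) can be approximated in plain Ruby. The following is only a minimal sketch using the standard String#grapheme_clusters and String#codepoints methods; it is not uniscribe's actual implementation, the helper name describe_clusters is made up for illustration, and codepoint names are left out because they require a Unicode name database.

```ruby
# Hypothetical helper (not part of uniscribe): split a string into
# grapheme clusters and list the hex codepoints of each cluster.
def describe_clusters(string)
  string.grapheme_clusters.map do |cluster|
    hex = cluster.codepoints.map { |cp| format("%04X", cp) }
    # A cluster made of more than one codepoint is a "composition".
    { cluster: cluster, codepoints: hex, composition: hex.size > 1 }
  end
end

# "str" + DOTLESS I + COMBINING DIAERESIS + "ng"
describe_clusters("str\u0131\u0308ng").each do |entry|
  label = entry[:composition] ? "Composition" : "Single codepoint"
  puts "#{entry[:codepoints].join(' ')}  #{label}"
end
```

String#grapheme_clusters needs Ruby 2.5 or later, and how clusters are split follows the Unicode data bundled with the running Ruby.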
Examples
Tamil
>> uniscribe "நகரத்தில்"
Thai
>> uniscribe "ม้าลายหกตัว"
Ideographic Variations
>> uniscribe "่พป๓ ใ๓ "
(the variation is not visible in the screenshot, because my system does not render it correctly)
Emoji Sequences
>> uniscribe "3๏ธโฃ๐คธโโ"
Lots of Combining Marks
>> uniscribe "องฬพอฌฬงฬถฬจฬฑฬนฬญฬฏCอญฬอฅอฎอฬทฬฬฒฬอOอฎอฬฎฬชฬอ"
Random Sequences of some Special Unicode Codepoints
>> uniscribe "\0A\u{E01D7}\x7F\r\n\u{D0000}\u{81}\u{FFF9}B\u{FFFB}🏴\u{E0061}\u{E007F}\u{10FFFF}"
Some Blanks
>> uniscribe "ยญแ
โโฌ๏ปฟ๐
ธ"
*Notes
Although the gem is generally up to date with Unicode 15.0, the proper detection of compositions / graphemes / combined characters depends on your Ruby version.
You can run uniscribe -v to check for the Unicode level of your uniscribe version.
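To see which Unicode version your Ruby itself was built with, you can look at RbConfig. This is a small sketch, assuming a reasonably recent Ruby where RbConfig::CONFIG["UNICODE_VERSION"] is set; the helper names ruby_unicode_version and ruby_unicode_at_least? are made up for illustration and are not part of uniscribe.

```ruby
require "rbconfig"
require "rubygems" # for Gem::Version; loaded by default in most setups

# Unicode version of the running Ruby's bundled data, e.g. "15.0.0".
def ruby_unicode_version
  RbConfig::CONFIG["UNICODE_VERSION"]
end

# Hypothetical helper: true if Ruby's Unicode data is at least `required`.
def ruby_unicode_at_least?(required)
  Gem::Version.new(ruby_unicode_version) >= Gem::Version.new(required)
end

puts "Ruby #{RUBY_VERSION} ships Unicode #{ruby_unicode_version}"
puts "Unicode 15.0 data available: #{ruby_unicode_at_least?('15.0')}"
```

If the second line prints false, newer compositions (for example recent emoji sequences) may be split differently than on a newer Ruby.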
Also see
- CLI: unibits - visualizes Unicode encodings
- CLI: unicopy - copy codepoints to clipboard
- Website: character.construction - lists notable codepoints
- Ruby Library: symbolify - used for safely printing individual codepoints
- Ruby Library: characteristics - used for detecting blanks and similar
- Unicode® Standard Annex #29: Unicode Text Segmentation
- Talk: Ten Unicode Characters You Should Know About as a Programmer
Copyright (C) 2017-2022 Jan Lelis https://janlelis.com. Released under the MIT license.