node.js iconv bindings - text recoding for fun and profit!

node-iconv

Text recoding in JavaScript for fun and profit!

Supported encodings

European languages
    ASCII, ISO-8859-{1,2,3,4,5,7,9,10,13,14,15,16},
    KOI8-R, KOI8-U, KOI8-RU,
    CP{437,737,775,850,852,853,855,857,858,860,861,863,865,866,869}
    CP{1125,1250,1251,1252,1253,1254,1257}
    Mac{Roman,CentralEurope,Iceland,Croatian,Romania},
    Mac{Cyrillic,Ukraine,Greek,Turkish},
    Macintosh
Semitic languages
    ISO-8859-{6,8}, CP{1255,1256}, CP862, CP864, Mac{Hebrew,Arabic}
Japanese
    EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP, ISO-2022-JP-2, ISO-2022-JP-1
    EUC-JISX0213, Shift_JISX0213, ISO-2022-JP-3
Chinese
    EUC-CN, HZ, GBK, CP936, GB18030, EUC-TW, BIG5, CP950, BIG5-HKSCS,
    BIG5-HKSCS:2004, BIG5-HKSCS:2001, BIG5-HKSCS:1999, ISO-2022-CN,
    ISO-2022-CN-EXT, BIG5-2003 (experimental)
Korean
    EUC-KR, CP949, ISO-2022-KR, JOHAB
Turkmen
    TDS565
Armenian
    ARMSCII-8
Georgian
    Georgian-Academy, Georgian-PS
Tajik
    KOI8-T
Kazakh
    PT154, RK1048
Thai
    ISO-8859-11, TIS-620, CP874, MacThai
Laotian
    MuleLao-1, CP1133
Vietnamese
    VISCII, TCVN, CP1258
Platform specifics
    HP-ROMAN8, NEXTSTEP, ATARIST, RISCOS-LATIN1
Full Unicode
    UTF-8
    UCS-2, UCS-2BE, UCS-2LE
    UCS-4, UCS-4BE, UCS-4LE
    UTF-16, UTF-16BE, UTF-16LE
    UTF-32, UTF-32BE, UTF-32LE
    UTF-7
    C99, JAVA
Full Unicode, in terms of `uint16_t` or `uint32_t`
    (with machine dependent endianness and alignment)
    UCS-2-INTERNAL, UCS-4-INTERNAL
Locale dependent, in terms of `char` or `wchar_t`
    (with machine dependent endianness and alignment, and with OS and
    locale dependent semantics)
    char, wchar_t
    The empty encoding name "" is equivalent to "char": it denotes the
    locale dependent character encoding.

If you don't need the full gamut of encodings, consider using iconv-lite. It supports most common encodings and doesn't require a compiler to install.

Installing with npm

$ npm install iconv

Note that you do not need to have a copy of libiconv installed to use this module.

Compiling from source

$ git clone https://github.com/bnoordhuis/node-iconv.git
$ cd node-iconv
$ npm install

If you have a specific node.js source checkout that you want to build against, replace the last command with:

$ npm install --nodedir=/path/to/node

Usage

Encode from one character encoding to another:

// convert from UTF-8 to ISO-8859-1
var Buffer = require('buffer').Buffer;
var Iconv  = require('iconv').Iconv;
var assert = require('assert');

var iconv = new Iconv('UTF-8', 'ISO-8859-1');
var buffer = iconv.convert('Hello, world!');
var buffer2 = iconv.convert(Buffer.from('Hello, world!'));
assert.equal(buffer.inspect(), buffer2.inspect());
// do something useful with the buffers

A simple ISO-8859-1 to UTF-8 conversion TCP service:

var net = require('net');
var Iconv = require('iconv').Iconv;
var server = net.createServer(function(conn) {
  var iconv = new Iconv('latin1', 'utf-8');
  conn.pipe(iconv).pipe(conn);
});
server.listen(8000);
console.log('Listening on tcp://0.0.0.0:8000/');

Look at test/test-basic.js and test/test-stream.js for more examples and node-iconv's behaviour under error conditions.

Notes

Things to keep in mind when you work with node-iconv.

Chunked data

Say you are reading data in chunks from an HTTP stream. The logical input is a single document (the full POST request data) but the physical input will be spread over several buffers (the request chunks).

You must accumulate the small buffers into a single large buffer before performing the conversion. If you don't, you will get unexpected results with multi-byte and stateful character sets like UTF-8 and ISO-2022-JP.

The above only applies when you are calling Iconv#convert() yourself. If you use the streaming interface, node-iconv takes care of stitching partial character sequences together again.
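The accumulate-then-convert approach can be sketched as follows. This is a minimal illustration, not part of node-iconv: `convertChunks` and `utf8ToLatin1` are hypothetical names, and the stand-in converter uses only Buffer built-ins so the snippet runs without the native module. With node-iconv you would pass something like `(buf) => iconv.convert(buf)` instead.

```javascript
// Join all chunks into one buffer, then convert exactly once.
// `convert` is any function that takes a Buffer and returns a Buffer.
function convertChunks(chunks, convert) {
  const whole = Buffer.concat(chunks);
  return convert(whole);
}

// 'ç' is two bytes in UTF-8 (0xC3 0xA7); split the input in the middle of it.
const input = Buffer.from('ça va', 'utf8');
const chunks = [input.subarray(0, 1), input.subarray(1)];

// Stand-in converter (assumption): UTF-8 to ISO-8859-1 via Buffer built-ins.
const utf8ToLatin1 = (buf) => Buffer.from(buf.toString('utf8'), 'latin1');

// Converting the joined buffer handles the split character correctly.
const joined = convertChunks(chunks, utf8ToLatin1);

// Converting chunk by chunk mangles the character that straddles the boundary.
const piecewise = Buffer.concat(chunks.map(utf8ToLatin1));

console.log(joined.equals(piecewise)); // the two results differ
```

The same reasoning applies to any multi-byte or stateful encoding: a converter that sees only part of a character cannot know what the remaining bytes will be.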

Dealing with untranslatable characters

Characters are not always translatable to another encoding. The UTF-8 string "ça va が", for example, cannot be represented in plain 7-bit ASCII without some loss of fidelity.

By default, node-iconv throws EILSEQ when untranslatable characters are encountered, but this can be customized. Quoting the iconv_open(3) man page:

//TRANSLIT
When  the  string  "//TRANSLIT"  is appended to tocode, transliteration is
activated. This means that when a character cannot be represented in the
target character set, it can be approximated through one or several
similarly looking characters.

//IGNORE
When the string "//IGNORE" is appended to tocode, characters that cannot be
represented in the target character set will be silently discarded.

Example usage:

var iconv = new Iconv('UTF-8', 'ASCII');
iconv.convert('ça va'); // throws EILSEQ

var iconv = new Iconv('UTF-8', 'ASCII//IGNORE');
iconv.convert('ça va'); // returns "a va"

var iconv = new Iconv('UTF-8', 'ASCII//TRANSLIT');
iconv.convert('ça va'); // "ca va"

var iconv = new Iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE');
iconv.convert('ça va が'); // "ca va "

EINVAL

EINVAL is raised when the input ends in a partial character sequence. This is a feature, not a bug.
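One way to tell EILSEQ and EINVAL apart from unrelated failures is to inspect the thrown error. The sketch below assumes the error object carries a `code` property holding the errno name; check this against the node-iconv version you use. `tryConvert` and `fakeConvert` are hypothetical helpers, and `fakeConvert` only mimics a truncated-input failure so the snippet runs without the native module.

```javascript
// Run a conversion and classify iconv-style failures by error code.
// Assumption: thrown errors expose `code` as 'EILSEQ' or 'EINVAL'.
function tryConvert(convert, input) {
  try {
    return { ok: true, buffer: convert(input) };
  } catch (err) {
    if (err && (err.code === 'EILSEQ' || err.code === 'EINVAL')) {
      return { ok: false, code: err.code };
    }
    throw err; // unrelated failure, do not swallow it
  }
}

// Stand-in converter (hypothetical): rejects input whose last byte is a
// UTF-8 lead byte, i.e. input that ends in a partial character sequence.
function fakeConvert(buf) {
  if (buf.length > 0 && buf[buf.length - 1] >= 0xc0) {
    const err = new Error('Incomplete character sequence.');
    err.code = 'EINVAL';
    throw err;
  }
  return buf;
}

const truncated = tryConvert(fakeConvert, Buffer.from([0x61, 0xc3]));
const complete = tryConvert(fakeConvert, Buffer.from('abc'));

console.log(truncated); // ok is false, code is 'EINVAL'
console.log(complete.ok); // true
```

With a real Iconv instance, the first argument would be something like `(buf) => iconv.convert(buf)`. An EINVAL result usually means more input is still to come, so the caller can keep accumulating before retrying.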
