

Elixir string encoding conversion - like iconv but pure Elixir

Codepagex


Codepagex is an Elixir library for converting strings between UTF-8 and other encodings. Like iconv, but written in pure Elixir.

All the encodings are fetched from unicode.org tables and conversion functions are generated from these at compile time.

Note on the built-in :unicode module

Note that Erlang's built-in :unicode module has some provisions for converting between UTF-8 and Latin-1. If that is all you need, consider relying on this simpler alternative rather than on Codepagex.

Compared to this functionality codepagex provides:

  • More codepage mapping options
  • The ability to handle illegal encoding with custom logic
  • A simpler interface

But please remember that codepagex is comparatively a lot more complex, making extensive use of macro programming.
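
For the Latin-1-only case mentioned above, the built-in route might look roughly like the following sketch. It uses only :unicode.characters_to_binary/3 from OTP; the module name Latin1Sketch and the error atoms are hypothetical and chosen purely for illustration.

```elixir
defmodule Latin1Sketch do
  # Hypothetical helper module for illustration; not part of Codepagex or OTP.

  # Encode a UTF-8 string as Latin-1 (ISO 8859-1) bytes.
  # :unicode.characters_to_binary/3 returns a plain binary on success and
  # an {:error, ...} or {:incomplete, ...} tuple otherwise.
  def to_latin1(string) when is_binary(string) do
    case :unicode.characters_to_binary(string, :utf8, :latin1) do
      binary when is_binary(binary) -> {:ok, binary}
      {:error, _converted, _rest} -> {:error, :unrepresentable}
      {:incomplete, _converted, _rest} -> {:error, :incomplete}
    end
  end

  # Decode Latin-1 bytes into a UTF-8 string.
  # Every byte value is a valid Latin-1 character, so this always yields a binary.
  def from_latin1(bytes) when is_binary(bytes) do
    :unicode.characters_to_binary(bytes, :latin1, :utf8)
  end
end
```

Unlike the Codepagex handlers shown later, this sketch offers no hook for substituting a replacement for unrepresentable characters; it only reports the failure.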

Examples

The package is meant to be used only through the Codepagex module. The examples below call its functions unqualified, as they would be after importing Codepagex.

    iex> from_string("æøåÆØÅ", :iso_8859_1)
    {:ok, <<230, 248, 229, 198, 216, 197>>}

    iex> to_string(<<230, 248, 229, 198, 216, 197>>, :iso_8859_1)
    {:ok, "æøåÆØÅ"}

    iex> from_string!("æøåÆØÅ", :iso_8859_1)
    <<230, 248, 229, 198, 216, 197>>

    iex> to_string!(<<230, 248, 229, 198, 216, 197>>, :iso_8859_1)
    "æøåÆØÅ"

When a string or encoded binary contains invalid byte sequences, the functions will not succeed. If you still want to convert such input, you may specify a function to handle these cases. For example:

    iex> from_string("Hello æøå!", :ascii, replace_nonexistent("_"))
    {:ok, "Hello ___!", 3}

    iex> iso = "Hello æøå!" |> from_string!(:iso_8859_1)
    iex> to_string!(iso, :ascii, use_utf_replacement())
    "Hello ���!"

Encodings

A full list of encodings is found by running encoding_list/1.

Encodings are best supplied as atoms. A string is also accepted and is converted to an atom for you, but with a somewhat less efficient function lookup. For example:

    iex> from_string("æøå", "ISO8859/8859-9")
    {:ok, <<230, 248, 229>>}

    iex> from_string("æøå", :"ISO8859/8859-9")
    {:ok, <<230, 248, 229>>}

For some encodings, an alias is set up for easier dispatch. The list of aliases is found by running aliases/1. The code looks like:

    iex> from_string!("Hello æøåÆØÅ!", :iso_8859_1)
    <<72, 101, 108, 108, 111, 32, 230, 248, 229, 198, 216, 197, 33>>

Encoding selection

By default, all ISO-8859 encodings and ASCII are included. A few more are available, and these must be specified in the config/config.exs file. The specified encodings are then compiled. Adding many encodings may increase compilation time, in particular for the largest ones.

To specify the encodings to use, add the following lines to your config/config.exs and recompile:

    use Mix.Config
    config :codepagex, :encodings, [:ascii]

This will add only the ASCII encoding, as specified by its shorthand alias. Any number of encodings may be specified in the list this way. The list may contain strings, atoms or regular expressions that match either an alias or a full encoding name, e.g.:

    use Mix.Config
    config :codepagex, :encodings, [
      :ascii,           # by alias name
      ~r[iso8859]i,     # by a regex matching the full name
      "ETSI/GSM0338",   # by the full name as a string
      :"MISC/CP856"     # by a full name as an atom
    ]
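
On Elixir 1.9 and later, use Mix.Config is deprecated in favor of import Config. The following is a sketch of the same selection in the newer style. It assumes that Codepagex reads the :encodings key identically under either form; the source only shows the Mix.Config variant.

```elixir
# config/config.exs
# Assumption: Codepagex picks up :encodings the same way as with Mix.Config.
import Config

config :codepagex, :encodings, [
  :ascii,           # by alias name
  ~r[iso8859]i,     # by a regex matching the full name
  "ETSI/GSM0338",   # by the full name as a string
  :"MISC/CP856"     # by a full name as an atom
]
```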

After modifying the encodings list in the configuration, always make sure to run the following or the encodings you specified will not be compiled in:

    mix deps.compile codepagex --force

This is necessary because changes to Codepagex's configuration are not picked up automatically when it is a dependency of another project. Credit for the find goes to @michalmuskala here: https://elixirforum.com/t/sharing-with-the-community-text-transcoding-libraries/17962/2

The encodings that are known to require very long compile times are:

  • VENDORS/MISC/KPS9566
  • VENDORS/MICSFT/WINDOWS/CP932
  • VENDORS/MICSFT/WINDOWS/CP936
  • VENDORS/MICSFT/WINDOWS/CP949
  • VENDORS/MICSFT/WINDOWS/CP950

TODO

  • A few encodings are not yet supported, for various reasons. In particular the Asian and Arabic ones with left-right and up-down variations.
  • Test Elixir function specs
  • Benchmarking vs iconv native libraries
  • Support for iolists
  • When converting sections of a string that are unchanged, return the original input. Consider using iolists to return the values so that chunks may be saved continuously
  • Lazy converter to get n characters / codepoints
  • Function to drop n characters and take n characters (and slice?)
