• Stars: 134
• Rank: 270,908 (Top 6 %)
• Language: Erlang
• License: Apache License 2.0
• Created almost 9 years ago
• Updated 3 months ago

Repository Details

Fast Expat based Erlang XML parsing library

Erlang and Elixir XML Parsing

Fast Expat based Erlang XML parsing and manipulation library, with a strong focus on XML stream parsing from network.

It supports:

  • Full XML structure parsing: Suitable for small but complete XML chunks.
  • XML stream parsing: Suitable for large XML documents or infinite network XML streams such as XMPP.

This module can parse files much faster than the built-in xmerl module. Depending on file complexity and size, fxml_stream:parse_element/1 can be 8-18 times faster than calling xmerl_scan:string/2.

This application was previously called p1_xml and was renamed after major optimisations to put emphasis on the fact it is damn fast.

Building

Erlang XML parser can be built as follows:

./configure && make

Erlang XML parser is a rebar-compatible OTP application. Alternatively, you can build it with rebar:

rebar compile

Dependencies

Erlang XML parser depends on the Expat XML parser. You need the development headers for the Expat library to build it.

You can use configure options to pass a custom path to the Expat libraries and headers:

--with-expat=[ARG]      use Expat XML Parser from given prefix (ARG=path);
                        check standard prefixes (ARG=yes); disable (ARG=no)
--with-expat-inc=[DIR]  path to Expat XML Parser headers
--with-expat-lib=[ARG]  link options for Expat XML Parser libraries

xmlel record and types

XML elements are provided as Erlang xmlel records.

The record format allows defining a simple tree-like structure. The xmlel record has the following fields:

  • name :: binary()
  • attrs :: [attr()]
  • children :: [xmlel() | cdata()]

The cdata type is a tuple of the form:

{xmlcdata, CData::binary()}

The attr type is a tuple of the form:

{Name::binary(), Value::binary()}
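
As a minimal sketch of how such a tree can be built and walked, the module below declares a local copy of the record from the field list above, so that it compiles on its own. The module name xmlel_sketch and the helpers example/0, text/1 and attr/2 are hypothetical and not part of the library; a real project would normally use the record definition shipped with the library instead of redeclaring it.

```erlang
-module(xmlel_sketch).
-export([example/0, text/1, attr/2]).

%% Local copy of the record, written from the field list above
%% (assumption: declared here only to keep the sketch self-contained).
-record(xmlel, {name = <<>>   :: binary(),
                attrs = []    :: [attr()],
                children = [] :: [xmlel() | cdata()]}).

-type attr()  :: {Name :: binary(), Value :: binary()}.
-type cdata() :: {xmlcdata, CData :: binary()}.
-type xmlel() :: #xmlel{}.

%% Build <message to="alice">hello <b>world</b></message> by hand.
-spec example() -> xmlel().
example() ->
    #xmlel{name = <<"message">>,
           attrs = [{<<"to">>, <<"alice">>}],
           children = [{xmlcdata, <<"hello ">>},
                       #xmlel{name = <<"b">>,
                              children = [{xmlcdata, <<"world">>}]}]}.

%% Concatenate all character data in document order,
%% descending into child elements.
-spec text(xmlel()) -> binary().
text(#xmlel{children = Children}) ->
    iolist_to_binary([child_text(C) || C <- Children]).

child_text({xmlcdata, Data}) -> Data;
child_text(#xmlel{} = El)    -> text(El).

%% Look up an attribute value by name; returns undefined when absent.
-spec attr(binary(), xmlel()) -> binary() | undefined.
attr(Name, #xmlel{attrs = Attrs}) ->
    case lists:keyfind(Name, 1, Attrs) of
        {Name, Value} -> Value;
        false         -> undefined
    end.
```

Because records are plain tagged tuples at runtime, these helpers also accept a literal such as {xmlel, <<"test">>, [], [{xmlcdata, <<"content cdata">>}]}.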

XML full structure parsing

You can definitely parse a complete XML structure with fast_xml:

$ erl -pa ebin
Erlang/OTP 17 [erts-6.3] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]

Eshell V6.3  (abort with ^G)
1> application:start(fast_xml).
ok
2> rr(fxml).
[xmlel]
3> fxml_stream:parse_element(<<"<test>content cdata</test>">>).
#xmlel{name = <<"test">>,attrs = [],
       children = [{xmlcdata,<<"content cdata">>}]}

XML Stream parsing example

You can also parse a continuous stream. The design makes it easy to decouple the process receiving the raw XML from the process receiving the parsed content.

The workflow is as follows:

state1 = new(CallbackPID); state2 = parse(state1, data); state3 = parse(state2, moredata); ...

and the parsed XML fragments (stanzas) are sent to CallbackPID.

With that approach you can be very flexible on how you architect your own application.

Here is an example of XML stream parsing:

$ erl -pa ebin
Erlang/OTP 17 [erts-6.3] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]

Eshell V6.3  (abort with ^G)

% Start the application:
1> application:start(fast_xml).
ok

% Create a new stream, using the shell's own PID to receive XML parsing events:
2> S1 = fxml_stream:new(self()).
<<>>

% Start feeding content to the XML parser.
3> S2 = fxml_stream:parse(S1, <<"<root>">>).
<<>>

% Receive the Erlang message sent to the shell process:
4> flush().
Shell got {'$gen_event',{xmlstreamstart,<<"root">>,[]}}
ok

% Feed more content:
5> S3 = fxml_stream:parse(S2, <<"<xmlelement>content cdata</xmlelement>">>).
<<>>

% Receive more messages:
6> flush().
Shell got {'$gen_event',
              {xmlstreamelement,
                  {xmlel,<<"xmlelement">>,[],
                      [{xmlcdata,<<"content cdata">>}]}}}
ok

% Feed more content:
7> S4 = fxml_stream:parse(S3, <<"</root>">>).
<<>>

% Receive messages:
8> flush().
Shell got {'$gen_event',{xmlstreamend,<<"root">>}}
ok

9> fxml_stream:close(S4).
true

Note that the root element matters: it marks the boundaries of the stream, producing the stream start and stream end events. The elements directly below it are then delivered as stream elements.
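
On the receiving side, a callback process can consume these events with an ordinary receive loop. The sketch below is only an illustration: the module name stream_events_sketch and the functions collect/1 and demo/0 are hypothetical, and demo/0 simulates the parser by sending itself the same three messages shown in the shell session above, so the module runs without the library. It handles only the event shapes that appear in that session.

```erlang
-module(stream_events_sketch).
-export([collect/1, demo/0]).

%% Wait for a stream start event, then gather child elements until the
%% matching stream end event arrives. Gives up if no message arrives
%% within Timeout milliseconds.
-spec collect(timeout()) -> {ok, binary(), [tuple()]} | {error, timeout}.
collect(Timeout) ->
    receive
        {'$gen_event', {xmlstreamstart, Root, _Attrs}} ->
            collect(Root, [], Timeout)
    after Timeout ->
        {error, timeout}
    end.

collect(Root, Acc, Timeout) ->
    receive
        {'$gen_event', {xmlstreamelement, El}} ->
            %% One complete child of the root element.
            collect(Root, [El | Acc], Timeout);
        {'$gen_event', {xmlstreamend, Root}} ->
            %% Root is already bound, so only the matching end tag ends the loop.
            {ok, Root, lists:reverse(Acc)}
    after Timeout ->
        {error, timeout}
    end.

%% Simulated input (assumption: these tuples are copied from the shell
%% output above and sent by hand instead of coming from the parser).
demo() ->
    El = {xmlel, <<"xmlelement">>, [], [{xmlcdata, <<"content cdata">>}]},
    self() ! {'$gen_event', {xmlstreamstart, <<"root">>, []}},
    self() ! {'$gen_event', {xmlstreamelement, El}},
    self() ! {'$gen_event', {xmlstreamend, <<"root">>}},
    collect(1000).
```

In a real application, the PID of a process running such a loop would be the one passed to fxml_stream:new/1, while another process (for example the socket owner) feeds the raw data to fxml_stream:parse/2.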

How does this module relate to exmpp?

This module is a low level fast XML parser. It is not an XMPP client library like exmpp.

References

This module is used at large scale for parsing massive XML content in the ejabberd XMPP server project. It is used in production in thousands of real-life deployments.

Development

Test

Unit test

You can run the eunit tests with the command:

$ rebar eunit

Elixir / Quickcheck test

You can run the tests written with Elixir / Quickcheck using the mix command:

MIX_EXS=test/elixir/mix.exs mix test
