• Stars
    star
    356
  • Rank 115,483 (Top 3 %)
  • Language
    Elixir
  • License
    MIT License
  • Created almost 10 years ago
  • Updated about 2 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

SweetXml

Build Status Module Version Hex Docs Total Download License Last Updated

SweetXml is a thin wrapper around :xmerl. It allows you to convert a char_list or xmlElement record as defined in :xmerl to an elixir value such as map, list, string, integer, float or any combination of these.

Installation

Add dependency to your project's mix.exs:

def deps do
  [{:sweet_xml, "~> 0.7.1"}]
end

SweetXml depends on :xmerl. On some Linux systems, you might need to install the package erlang-xmerl.

Examples

Given an XML document such as below:

<?xml version="1.05" encoding="UTF-8"?>
<game>
  <matchups>
    <matchup winner-id="1">
      <name>Match One</name>
      <teams>
        <team>
          <id>1</id>
          <name>Team One</name>
        </team>
        <team>
          <id>2</id>
          <name>Team Two</name>
        </team>
      </teams>
    </matchup>
    <matchup winner-id="2">
      <name>Match Two</name>
      <teams>
        <team>
          <id>2</id>
          <name>Team Two</name>
        </team>
        <team>
          <id>3</id>
          <name>Team Three</name>
        </team>
      </teams>
    </matchup>
    <matchup winner-id="1">
      <name>Match Three</name>
      <teams>
        <team>
          <id>1</id>
          <name>Team One</name>
        </team>
        <team>
          <id>3</id>
          <name>Team Three</name>
        </team>
      </teams>
    </matchup>
  </matchups>
</game>

We can do the following:

import SweetXml
doc = "..." # as above

Get the name of the first match:

result = doc |> xpath(~x"//matchup/name/text()") # `sigil_x` for (x)path
assert result == 'Match One'

Get the XML record of the name of the first match:

result = doc |> xpath(~x"//matchup/name"e) # `e` is the modifier for (e)ntity
assert result == {:xmlElement, :name, :name, [], {:xmlNamespace, [], []},
        [matchup: 2, matchups: 2, game: 1], 2, [],
        [{:xmlText, [name: 2, matchup: 2, matchups: 2, game: 1], 1, [],
          'Match One', :text}], [],
        ...}

Get the full list of matchup name:

result = doc |> xpath(~x"//matchup/name/text()"l) # `l` stands for (l)ist
assert result == ['Match One', 'Match Two', 'Match Three']

Get a list of winner-id by attributes:

result = doc |> xpath(~x"//matchup/@winner-id"l)
assert result == ['1', '2', '1']

Get a list of matchups with different map structure:

result = doc |> xpath(
  ~x"//matchups/matchup"l,
  name: ~x"./name/text()",
  winner: [
    ~x".//team/id[.=ancestor::matchup/@winner-id]/..",
    name: ~x"./name/text()"
  ]
)
assert result == [
  %{name: 'Match One', winner: %{name: 'Team One'}},
  %{name: 'Match Two', winner: %{name: 'Team Two'}},
  %{name: 'Match Three', winner: %{name: 'Team One'}}
]

Or directly return a mapping of your liking:

result = doc |> xmap(
  matchups: [
    ~x"//matchups/matchup"l,
    name: ~x"./name/text()",
    winner: [
      ~x".//team/id[.=ancestor::matchup/@winner-id]/..",
      name: ~x"./name/text()"
    ]
  ],
  last_matchup: [
    ~x"//matchups/matchup[last()]",
    name: ~x"./name/text()",
    winner: [
      ~x".//team/id[.=ancestor::matchup/@winner-id]/..",
      name: ~x"./name/text()"
    ]
  ]
)
assert result == %{
  matchups: [
    %{name: 'Match One', winner: %{name: 'Team One'}},
    %{name: 'Match Two', winner: %{name: 'Team Two'}},
    %{name: 'Match Three', winner: %{name: 'Team One'}}
  ],
  last_matchup: %{name: 'Match Three', winner: %{name: 'Team One'}}
}

The ~x Sigil

Warning ! Because we use xmerl internally, only XPath 1.0 paths are handled.

In the above examples, we used the expression ~x"//some/path" to define the path. The reason is it allows us to more precisely specify what is being returned.

  • ~x"//some/path"

    without any modifiers, xpath/2 will return the value of the entity if the entity is of type xmlText, xmlAttribute, xmlPI, xmlComment as defined in :xmerl

  • ~x"//some/path"e

    e stands for (e)ntity. This forces xpath/2 to return the entity with which you can further chain your xpath/2 call

  • ~x"//some/path"l

    'l' stands for (l)ist. This forces xpath/2 to return a list. Without l, xpath/2 will only return the first element of the match

  • ~x"//some/path"k

    'k' stands for (k)eyword. This forces xpath/2 to return a Keyword instead of a Map.

  • ~x"//some/path"el - mix of the above

  • ~x"//some/path"s

    's' stands for (s)tring. This forces xpath/2 to return the value as string instead of a char list.

  • ~x"//some/path"S

    'S' stands for soft (S)tring. This forces xpath/2 to return the value as string instead of a char list, but if node content is incompatible with a string, set "".

  • ~x"//some/path"o

    'o' stands for (o)ptional. This allows the path to not exist, and will return nil.

  • ~x"//some/path"sl - string list.

  • ~x"//some/path"i

    'i' stands for (i)nteger. This forces xpath/2 to return the value as integer instead of a char list.

  • ~x//some/path"I

    'I' stands for soft (I)nteger. This forces xpath/2 to return the value as integer instead of a char list, but if node content is incompatible with an integer, set 0.

  • ~x"//some/path"f

    'f' stands for (f)loat. This forces xpath/2 to return the value as float instead of a char list.

  • ~x//some/path"F

    'F' stands for soft (F)loat. This forces xpath/2 to return the value as float instead of a char list, but if node content is incompatible with a float, set 0.0.

  • ~x"//some/path"il - integer list.

If you use the optional modifier o together with a soft cast modifier (uppercase), then the value is set to nil when the value is not compatible for instance ~x//some/path/text()"Fo return nil if the text is not a number.

Also in the examples section, we always import SweetXml first. This makes x_sigil available in the current scope. Without it, instead of using ~x, you can use the %SweetXpath struct

assert ~x"//some/path"e == %SweetXpath{path: '//some/path', is_value: false, is_list: false, cast_to: false}

Note the use of char_list in the path definition.

Namespace support

Given a XML document such as below

<?xml version="1.05" encoding="UTF-8"?>
<game xmlns="http://example.com/fantasy-league" xmlns:ns1="http://example.com/baseball-stats">
  <matchups>
    <matchup winner-id="1">
      <name>Match One</name>
      <teams>
        <team>
          <id>1</id>
          <name>Team One</name>
          <ns1:runs>5</ns1:runs>
        </team>
        <team>
          <id>2</id>
          <name>Team Two</name>
          <ns1:runs>2</ns1:runs>
        </team>
      </teams>
    </matchup>
  </matchups>
</game>

We can do the following:

import SweetXml
xml_str = "..." # as above
doc = parse(xml_str, namespace_conformant: true)

Note the fact that we explicitly parse the XML with the namespace_conformant: true option. This is needed to allow nodes to be identified in a prefix independent way.

We can use namespace prefixes of our preference, regardless of what prefix is used in the document:

result = doc
  |> xpath(~x"//ff:matchup/ff:name/text()"
           |> add_namespace("ff", "http://example.com/fantasy-league"))

assert result == 'Match One'

We can specify multiple namespace prefixes:

result = doc
  |> xpath(~x"//ff:matchup//bb:runs/text()"
           |> add_namespace("ff", "http://example.com/fantasy-league")
           |> add_namespace("bb", "http://example.com/baseball-stats"))

assert result == '5'

From Chaining to Nesting

Here's a brief explanation to how nesting came about.

Chaining

Both xpath and xmap can take an :xmerl XML record as the first argument. Therefore you can chain calls to these functions like below:

doc
|> xpath(~x"//li"l)
|> Enum.map fn (li_node) ->
  %{
    name: li_node |> xpath(~x"./name/text()"),
    age: li_node |> xpath(~x"./age/text()")
  }
end

Mapping to a structure

Since the previous example is such a common use case, SweetXml allows you just simply do the following

doc
|> xpath(
  ~x"//li"l,
  name: ~x"./name/text()",
  age: ~x"./age/text()"
)

Nesting

But what you want is sometimes more complex than just that, SweetXml thus also allows nesting

doc
|> xpath(
  ~x"//li"l,
  name: [
    ~x"./name",
    first: ~x"./first/text()",
    last: ~x"./last/text()"
  ],
  age: ~x"./age/text()"
)

Transform By

Sometimes we need to transform the value to what we need, SweetXml supports that via transform_by/2

doc = "<li><name><first>john</first><last>doe</last></name><age>30</age></li>"

result = doc |> xpath(
  ~x"//li"l,
  name: [
    ~x"./name",
    first: ~x"./first/text()"s |> transform_by(&String.capitalize/1),
    last: ~x"./last/text()"s |> transform_by(&String.capitalize/1)
  ],
  age: ~x"./age/text()"i
)

^result = [%{age: 30, name: %{first: "John", last: "Doe"}}]

The same can be used to break parsing code into reusable functions that can be used in nesting:

doc = "<li><name><first>john</first><last>doe</last></name><age>30</age></li>"

parse_name = fn xpath_node ->
  xpath_node |> xmap(
    first: ~x"./first/text()"s |> transform_by(&String.capitalize/1),
    last: ~x"./last/text()"s |> transform_by(&String.capitalize/1)
  )
end

result = doc |> xpath(
  ~x"//li"l,
  name: ~x"./name" |> transform_by(parse_name),
  age: ~x"./age/text()"i
)

^result = [%{age: 30, name: %{first: "John", last: "Doe"}}]

For more examples, please take a look at the tests and help.

Streaming

SweetXml now also supports streaming in various forms. Here's a sample XML doc. Notice the certain lines have XML tags that span multiple lines:

<?xml version="1.05" encoding="UTF-8"?>
<html>
  <head>
    <title>XML Parsing</title>
    <head><title>Nested Head</title></head>
  </head>
  <body>
    <p>Neato €</p><ul>
      <li class="first star" data-index="1">
        First</li><li class="second">Second
      </li><li
            class="third">Third</li>
    </ul>
    <div>
      <ul>
        <li>Forth</li>
      </ul>
    </div>
    <special_match_key>first star</special_match_key>
  </body>
</html>

Working with File.stream!/1

Working with streams is exactly the same as working with binaries:

File.stream!("file_above.xml") |> xpath(...)

SweetXml element streaming

Once you have a file stream, you may not want to work with the entire document to save memory:

file_stream = File.stream!("file_above.xml")

result = file_stream
|> stream_tags([:li, :special_match_key])
|> Stream.map(fn
    {_, doc} ->
      xpath(doc, ~x"./text()")
  end)
|> Enum.to_list

assert result == ['\n        First', 'Second\n      ', 'Third', 'Forth', 'first star']

Warning: In case of large document, you may want to use the discard option to avoid memory leak.

result = file_stream
|> stream_tags([:li, :special_match_key], discard: [:li, :special_match_key])

Security

Whenever you have to deal with some XML that was not generated by your system (untrusted document), it is highly recommended that you separate the parsing step from the mapping step, in order to be able to prevent some default behavior through options. You can check the doc for SweetXml.parse/2 for more details. The current recommendations are:

doc |> parse(dtd: :none) |> xpath(spec, subspec)
enum |> stream_tags(tags, dtd: :none)

Copyright and License

Copyright (c) 2014, Frank Liu

SweetXml source code is licensed under the MIT License.

CONTRIBUTING

Hi, and thank you for wanting to contribute. Please refer to the centralized informations available at: https://github.com/kbrw#contributing

More Repositories

1

reaxt

Use React template into your Elixir application for server rendering
Elixir
372
star
2

elixir.nvim

Vim Completion/Doc/Eval for Elixir (nvim), compile https://github.com/awetzel/neovim-elixir and https://github.com/awetzel/nvim-rplugin
Vim Script
103
star
3

ewebmachine

The HTTP decision tree as a plug (full elixir rewriting of basho/webmachine with improvements)
CSS
97
star
4

exos

Exos is a simple Port Wrapper : a GenServer which forwards cast and call to a linked Port.
Elixir
77
star
5

node_erlastic

Node library to make nodejs gen_server in Erlang/Elixir through Port connection
JavaScript
74
star
6

neovim-elixir

Neovim host plugin for Elixir (use https://github.com/awetzel/elixir.nvim for a packaged version)
Elixir
69
star
7

gitex

Elixir implementation of the Git object storage, but with the goal to implement the same semantic with other storage and topics
Elixir
68
star
8

mailibex

Library containing Email related implementations in Elixir : dkim, spf, dmark, mimemail, smtp
Elixir
60
star
9

nano_ring

NanoRing is a very very small Cluster management System in Elixir.
Elixir
35
star
10

clojure-erlastic

Micro lib making use of erlang JInterface lib to decode and encode Binary Erlang Term and simple erlang port interface with core.async channel. So you can communicate with erlang coroutine with clojure abstraction
Clojure
33
star
11

reaxt-example

An example and test application for "reaxt", the react integration for elixir
JavaScript
30
star
12

delayed_otp

Delay death of supervisor children or gen_server : for instance Erlang supervisor with exponential backoff restart strategy
Elixir
24
star
13

plug_forwarded_peer

Very simple plug which reads `X-Forwarded-For` or `Forwarded` header according to rfc7239 and fill `conn.remote_ip` with the root client ip.
Elixir
24
star
14

supervisorring

otp/supervisor-like interface to supervise distributed processes
Elixir
16
star
15

adap

Create a data stream across your information systems to query, augment and transform data according to Elixir matching rules.
Elixir
16
star
16

calibex

Elixir ICal : bijective coding/decoding for ICal transformation, ICal email request and responses.
Elixir
15
star
17

rulex

This tiny library (2 macros only) allows you to define very simple rule handler using Elixir pattern matching.
Elixir
13
star
18

exfsm

Simple elixir library to define a static FSM.
Elixir
11
star
19

nvim-rplugin

Nvim Elixir plugin with autocompletion and eval commands
Elixir
9
star
20

dotx

Dotx is a library for DOT file parsing and generation. The whole spec https://www.graphviz.org/doc/info/lang.html is implemented.
Elixir
9
star
21

json_stream

Small but useful wrapper above erlang `jsx` to stream json elements from an Elixir binary stream.
Elixir
8
star
22

stemex

Stemex is a NIF wrapper above the snowball language (http://snowball.tartarus.org)
C
7
star
23

zip_stream

Library to read zip file in a stream. Zip file binary stream -> stream of {:new_file,name} or uncompressed_bin Erlang zlib library only allows deflate decompress stream. But Erlang zip library does not allow content streaming.
HTML
7
star
24

nox

Node / npm wrapper
Elixir
5
star
25

babel-plugin-transform-jsxz

Write your JSX for your HTML components using successive transformations targeted with CSS selector at compilation time - in the same way that enlive templates work. As a Babel transform plugin.
JavaScript
4
star
26

bert_view

Chrome Extension to have a pretty view of binary erlang term in HTTP service
JavaScript
2
star
27

arcore

Arcore OS, the very simple Kbrw Immutable OS - linux distribution based on Systemd and Runc.
Shell
2
star
28

jsxz-loader

JSXZ Loader allows you to Precompile your JSX HTML components from static HTML templates using CSS selectors.
JavaScript
1
star
29

elixir-sandbox

Elixir
1
star
30

qrcode

Erlang
1
star