

data.xml

data.xml is a Clojure library for reading and writing XML data. This library is the successor to lazy-xml. data.xml has the following features:

  • Parses XML documents into Clojure data structures
  • Emits XML from Clojure data structures
  • No additional dependencies if using JDK >= 1.6
  • Uses StAX internally
  • Lazy: should allow parsing and emitting of large XML documents
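
As a rough sketch of what the laziness above means in practice: the children of a parsed element form a lazy sequence, so anything you need from them has to be realized while the underlying reader is still open. The small in-memory document and the name first-two-items are illustrative assumptions, standing in for a large file.

```clojure
(require '[clojure.data.xml :as xml])

;; A small in-memory document stands in for a large file here.
(def first-two-items
  (with-open [rdr (java.io.StringReader.
                    "<items><item>1</item><item>2</item><item>3</item></items>")]
    (->> (xml/parse rdr)
         :content                        ; lazy seq of child elements
         (take 2)                        ; only the first two children are needed
         (mapv (comp first :content))))) ; mapv realizes the result before rdr closes
```

Using mapv (or doall) inside with-open is the important part; returning an unrealized lazy sequence from the with-open body would try to read from a closed reader.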

API Reference

Generated API docs for data.xml are available here.

Bugs

Please report bugs using JIRA here.

Installation

Latest stable release: 0.0.8

Latest preview release: 0.2.0-alpha8

(The main features of the 0.2.0 series are XML Namespace support and Clojurescript support)

Maven

For Maven projects, add the following XML in your pom.xml's <dependencies> section:

For stable:

<dependency>
  <groupId>org.clojure</groupId>
  <artifactId>data.xml</artifactId>
  <version>0.0.8</version>
</dependency>

For preview:

<dependency>
  <groupId>org.clojure</groupId>
  <artifactId>data.xml</artifactId>
  <version>0.2.0-alpha8</version>
</dependency>

Leiningen

Add the following to the project.clj dependencies:

For stable:

[org.clojure/data.xml "0.0.8"]

For preview:

[org.clojure/data.xml "0.2.0-alpha8"]

CLI/deps.edn

Add the following to the deps.edn dependencies:

;; for stable version:
org.clojure/data.xml {:mvn/version "0.0.8"}

;; for preview version:
org.clojure/data.xml {:mvn/version "0.2.0-alpha8"}

Examples

The examples below assume you have required data.xml under the alias xml:

(require '[clojure.data.xml :as xml])

data.xml supports parsing and emitting XML. The parsing functions will read XML from a Reader or InputStream.

(let [input-xml (java.io.StringReader. "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
                                        <foo><bar><baz>The baz value</baz></bar></foo>")]
  (xml/parse input-xml))

#xml/element{:tag :foo,
             :content [#xml/element{:tag :bar,
                                    :content [#xml/element{:tag :baz,
                                                           :content ["The baz value"]}]}]}

The data is returned as defrecords and can be manipulated using the normal Clojure data structure functions. Additional parsing options can be passed as key/value pairs:

(xml/parse-str "<a><![CDATA[\nfoo bar\n]]><![CDATA[\nbaz\n]]></a>" :coalescing false)
#xml/element{:tag :a, :content ["\nfoo bar\n" "\nbaz\n"]}
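
A minimal sketch of manipulating parsed data with ordinary Clojure functions, as described above. The names doc, baz-text and all-tags are illustrative only; xml-seq is the tree walker from clojure.core, not part of data.xml.

```clojure
(require '[clojure.data.xml :as xml])

(def doc (xml/parse-str "<foo><bar><baz>The baz value</baz></bar></foo>"))

;; Elements support keyword lookup on :tag, :attrs and :content.
(def baz-text (-> doc :content first :content first :content first))

;; xml-seq walks every node; keep drops the text nodes, whose :tag is nil.
(def all-tags (keep :tag (xml-seq doc)))
```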

XML elements can be created using the typical defrecord constructor functions, the element function used below, or a plain map with :tag, :attrs and :content keys. They are written using a java.io.Writer:

(let [tags (xml/element :foo {:foo-attr "foo value"}
             (xml/element :bar {:bar-attr "bar value"}
               (xml/element :baz {} "The baz value")))]
  (with-open [out-file (java.io.FileWriter. "/tmp/foo.xml")]
    (xml/emit tags out-file)))

;;-> Writes XML to /tmp/foo.xml
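
The plain-map form mentioned above could look like the following sketch. It assumes a release that accepts plain maps for emitting (the namespace examples later in this document, written for the 0.2.0 preview series, emit plain maps); the names as-map and round-tripped are illustrative.

```clojure
(require '[clojure.data.xml :as xml])

;; A plain map with the same three keys an element carries.
(def as-map
  {:tag :foo
   :attrs {:foo-attr "foo value"}
   :content [{:tag :bar :attrs {} :content ["The bar value"]}]})

;; Emit the map to a string, then parse it back into elements.
(def round-tripped (xml/parse-str (xml/emit-str as-map)))

(:tag round-tripped)
(:attrs round-tripped)
```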

The same can also be expressed using a more Hiccup-like style of defining the elements using sexp-as-element:

(= (xml/element :foo {:foo-attr "foo value"}
     (xml/element :bar {:bar-attr "bar value"}
       (xml/element :baz {} "The baz value")))
   (xml/sexp-as-element
      [:foo {:foo-attr "foo value"}
       [:bar {:bar-attr "bar value"}
        [:baz {} "The baz value"]]]))
;;-> true

Comments and CDATA can also be emitted as an S-expression with the special tag names :-cdata and :-comment:

(= (xml/element :tag {:attr "value"}
     (xml/element :body {} (xml/cdata "not parsed <stuff")))
   (xml/sexp-as-element [:tag {:attr "value"} [:body {} [:-cdata "not parsed <stuff"]]]))
;;-> true
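
The :-comment tag works along the same lines. A small sketch, with with-comment and comment-xml as illustrative names:

```clojure
(require '[clojure.data.xml :as xml]
         '[clojure.string :as str])

(def with-comment
  (xml/sexp-as-element
    [:foo {}
     [:-comment "generated file"]
     [:bar {} "text"]]))

(def comment-xml (xml/emit-str with-comment))

;; The comment is serialized between the usual XML comment delimiters.
(str/includes? comment-xml "<!--generated file-->")
```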

XML can be "round tripped" through the library:

(let [tags (xml/element :foo {:foo-attr "foo value"}
             (xml/element :bar {:bar-attr "bar value"}
               (xml/element :baz {} "The baz value")))]
  (with-open [out-file (java.io.FileWriter. "/tmp/foo.xml")]
    (xml/emit tags out-file))
  (with-open [input (java.io.FileInputStream. "/tmp/foo.xml")]
    (xml/parse input)))

#xml/element{:tag :foo, :attrs {:foo-attr "foo value"}...}

There are also some string based functions that are useful for debugging.

(let [tags (xml/element :foo {:foo-attr "foo value"}
             (xml/element :bar {:bar-attr "bar value"}
               (xml/element :baz {} "The baz value")))]
  (= tags (xml/parse-str (xml/emit-str tags))))

true

Indentation is supported, but should be treated as a debugging feature as it's likely to be pretty slow:

(print (xml/indent-str (xml/element :foo {:foo-attr "foo value"}
                         (xml/element :bar {:bar-attr "bar value"}
                           (xml/element :baz {} "The baz value1")
                           (xml/element :baz {} "The baz value2")
                           (xml/element :baz {} "The baz value3")))))

<?xml version="1.0" encoding="UTF-8"?>
<foo foo-attr="foo value">
  <bar bar-attr="bar value">
    <baz>The baz value1</baz>
    <baz>The baz value2</baz>
    <baz>The baz value3</baz>
  </bar>
</foo>

CDATA can be emitted:

(xml/emit-str (xml/element :foo {}
                (xml/cdata "<non><escaped><info><here>")))

;; newlines added for readability, not in actual output
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
 <foo><![CDATA[<non><escaped><info><here>]]></foo>"

But will be read as regular character data:

(xml/parse-str (xml/emit-str (xml/element :foo {}
                 (xml/cdata "<non><escaped><info><here>"))))

#xml/element{:tag :foo, :content ["<non><escaped><info><here>"]}

Comments can also be emitted:

(xml/emit-str
  (xml/element :foo {}
    (xml/xml-comment "Just a <comment> goes here")
    (xml/element :bar {} "and another element")))

;; newlines added for readability, not in actual output
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
 <foo><!--Just a <comment> goes here--><bar>and another element</bar></foo>"

But are ignored when read:

(xml/emit-str
  (xml/parse-str
    (xml/emit-str (xml/element :foo {}
                    (xml/xml-comment "Just a <comment> goes here")
                    (xml/element :bar {} "and another element")))))

;; newlines added for readability, not in actual output
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
 <foo><bar>and another element</bar></foo>"

Namespace Support

XML namespaced names (QNames) are encoded into Clojure keywords by percent-encoding the (XML) namespace and prefixing it with xmlns.: {http://www.w3.org/1999/xhtml}head is encoded in data.xml as :xmlns.http%3A%2F%2Fwww.w3.org%2F1999%2Fxhtml/head.

Below is an example of parsing an XHTML document:

(xml/parse-str "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
                <foo:html xmlns:foo=\"http://www.w3.org/1999/xhtml\"/>")

#xml/element{:tag :xmlns.http%3A%2F%2Fwww.w3.org%2F1999%2Fxhtml/html}
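
To get from an encoded keyword back to its parts, something like the following should work. This is a sketch that assumes the 0.2.0 preview series, where qname-uri and qname-local are exported from clojure.data.xml alongside qname; html-tag is an illustrative name.

```clojure
(require '[clojure.data.xml :as xml])

(def html-tag
  (:tag (xml/parse-str
          "<foo:html xmlns:foo=\"http://www.w3.org/1999/xhtml\"/>")))

(xml/qname-uri html-tag)   ; the decoded namespace URI
(xml/qname-local html-tag) ; the local name, without prefix

;; qname builds the same keyword from a URI and a local name.
(= html-tag (xml/qname "http://www.w3.org/1999/xhtml" "html"))
```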

Emitting namespaced XML is usually done by using alias-uri in combination with Clojure's built-in ::kw-ns/shorthands:

;; this needs to be at the top level of your code (parallel to defns)
;; or subsequent ::xh/ ... will throw "Invalid token"
(xml/alias-uri 'xh "http://www.w3.org/1999/xhtml")

(xml/emit-str {:tag ::xh/html
               :content [{:tag ::xh/head} {:tag ::xh/body :content ["DOCUMENT"]}]})

<?xml version="1.0" encoding="UTF-8"?>
<a:html xmlns:a="http://www.w3.org/1999/xhtml">
  <a:head/>
  <a:body>DOCUMENT</a:body>
</a:html>

To emit namespaced tags without prefixes, you can also set the default xmlns at the root. The URI in the xmlns attribute must match the aliased URI exactly:

;; at top level
(xml/alias-uri 'xh "http://www.w3.org/1999/xhtml")

;; top-level element should set xmlns that matches
(xml/emit-str
  (xml/element ::xh/html
               {:xmlns "http://www.w3.org/1999/xhtml"}
               (xml/element ::xh/head)
               (xml/element ::xh/body {} "DOCUMENT")))

;; newlines and indents added for readability, not in actual output
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
 <html xmlns=\"http://www.w3.org/1999/xhtml\">
   <head/>
   <body>DOCUMENT</body>
 </html>"

Same example, but using the more concise hiccup style (same output):

;; at top level
(xml/alias-uri 'xh "http://www.w3.org/1999/xhtml")

(xml/emit-str
  (xml/sexp-as-element
    [::xh/html {:xmlns "http://www.w3.org/1999/xhtml"}
     [::xh/head]
     [::xh/body "DOCUMENT"]]))

You can also use javax.xml.namespace.QName instances, as well as strings in the informal {ns}n encoding:

(xml/emit-str {:tag (xml/qname "http://www.w3.org/1999/xhtml" "html")})
(xml/emit-str {:tag "{http://www.w3.org/1999/xhtml}html"})

;; newlines added for readability, not in actual output
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
 <a:html xmlns:a=\"http://www.w3.org/1999/xhtml\"></a:html>"

Namespace Prefixes

Prefixes are mostly an artifact of xml serialisation. They can be customized by explicitly declaring them as attributes in the xmlns kw-namespace:

(xml/emit-str
  (xml/element (xml/qname "http://www.w3.org/1999/xhtml" "title")
               {:xmlns/foo "http://www.w3.org/1999/xhtml"}
               "Example title"))

;; newlines added for readability, not in actual output
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
 <foo:title xmlns:foo=\"http://www.w3.org/1999/xhtml\">Example title</foo:title>"

Not specifying a namespace prefix results in a prefix being generated:

(xml/emit-str
  (xml/element ::xh/title
           {}
           "Example title"))

;; newlines added for readability, not in actual output
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
 <a:title xmlns:a=\"http://www.w3.org/1999/xhtml\">Example title</a:title>"

The above example automatically assigns a prefix for the namespace used; in this case the emitter named it a. Emitting several nested tags with the same namespace will use a single prefix:

(xml/emit-str
  (xml/element ::xh/html
               {}
               (xml/element ::xh/head
                            {}
                            (xml/element ::xh/title
                                         {}
                                         "Example title"))))

;; newlines and indents added for readability, not in actual output
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
 <a:html xmlns:a=\"http://www.w3.org/1999/xhtml\">
   <a:head>
     <a:title>Example title</a:title></a:head></a:html>"

Note that the JDK QName ignores namespace prefixes for equality, but allows them to be preserved for emitting.

(= (xml/parse-str "<foo:title xmlns:foo=\"http://www.w3.org/1999/xhtml2\">Example title</foo:title>")
   (xml/parse-str "<bar:title xmlns:bar=\"http://www.w3.org/1999/xhtml2\">Example title</bar:title>"))

In data.xml prefix mappings are (by default) retained in metadata on a tag record. If there is no metadata, new prefixes will be generated when emitting.

(xml/emit-str (xml/parse-str "<foo:element xmlns:foo=\"FOO:\" />"))

Location information as meta

By default the parser attaches location information as element metadata. :character-offset, :column-number and :line-number are available under the :clojure.data.xml/location-info key:

(deftest test-location-meta
  (let [input "<a><b/>\n<b/></a>"
        location-meta (comp :clojure.data.xml/location-info meta)]
    (is (= 1 (-> input xml/parse-str location-meta :line-number)))))

To elide location information, pass :location-info false to the parser:

(xml/parse-str your-input :location-info false)
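
Outside of a test, the two settings can be compared directly. In this sketch, location-meta is the same helper as in the test above, and root-line and elided are illustrative names.

```clojure
(require '[clojure.data.xml :as xml])

(def input "<a><b/>\n<b/></a>")
(def location-meta (comp :clojure.data.xml/location-info meta))

;; Default: location info is attached to the root element's metadata.
(def root-line (-> input xml/parse-str location-meta :line-number))

;; With :location-info false, the key is absent from the metadata.
(def elided (-> (xml/parse-str input :location-info false) location-meta))
```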

Clojurescript support

The Clojurescript implementation uses the same namespace as the Clojure one: clojure.data.xml.

Native DOM support

data.xml can work directly with native DOM nodes.

  • To parse into DOM objects, call parse with :raw true
  • To use DOM objects like regular persistent maps, call (extend-dom-as-data!). This extends the native DOM node prototypes to Clojurescript collection protocols, such that you can treat them as data.xml parse trees.
  • To coerce to native DOM, use element-node
  • To coerce to records, use element-data

Missing Features, Patches Welcome

Streaming

data.xml on Clojurescript doesn't currently support streaming, hence only the *-str variants of parse/emit are implemented. Those are just wrappers for the browser's native XML parsing/printing.

Pull parsing doesn't seem the right solution for Clojurescript, because when code cannot block, the parser has no way of waiting on its input. For this reason, parsing in Clojurescript cannot be based around event-seq.

Push parsing, on the other hand, should not pose a problem, because when data arrives in a callback, it can be pushed on into the parser. Fortunately, Clojure already has a nice push-based counterpart to lazy sequences: transducers.
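
To illustrate the push style only (this is not a parser and not part of data.xml), here is a sketch using plain clojure.core. The doseq stands in for asynchronous callbacks, and xf, step, received and result are hypothetical names.

```clojure
(require '[clojure.string :as str])

;; Transformation applied to each chunk as it arrives: drop blank chunks.
(def xf (remove str/blank?))

;; Applying the transducer to conj yields a push-style step function.
(def step (xf conj))

;; Hypothetical stand-in for callbacks delivering chunks one at a time.
(def received (atom []))
(doseq [chunk ["<a>" "  " "text" "</a>"]]
  (swap! received step chunk))

;; Completion arity, called once when the input has ended.
(def result (step @received))
```

The producer decides when each chunk is handed to step, so nothing ever has to block waiting for input, which is the property the paragraph above relies on.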

Utilities

Some utilities, such as process/*-xmlns, prxml/sexp-as-* and indent, aren't yet implemented.

Immutable updates for dom types

Make extend-dom-as-data! also support assoc, ... on DOM nodes.

Feel free to pick a ticket to work on.

License

Licensed under the Eclipse Public License.

Developer Information

Contributing

All contributions need to be made via patches attached to tickets in JIRA. Check the Contributing to Clojure page for more information.
