Assembly for XML: an imperative language for modifying XML documents

Xembly is an Assembly-like imperative programming language for data manipulation in XML documents. It is a much simpler alternative to DOM, XSLT, and XQuery. Read this blog post for a more detailed explanation: Xembly, an Assembly for XML. You may also want to watch this webinar.

You need this dependency:

<dependency>
  <groupId>com.jcabi.incubator</groupId>
  <artifactId>xembly</artifactId>
  <version>0.28.1</version>
</dependency>

Here is a command-line implementation (as a Ruby gem): xembly-gem

For example, you have an XML document:

<orders>
  <order id="553">
    <amount>$45.00</amount>
  </order>
</orders>

Then, you want to change the amount of order #553 from $45.00 to $140.00. The Xembly script would look like this:

XPATH "orders/order[@id=553]";
XPATH "amount";
SET "$140.00";

As you see, it's much simpler and more compact than DOM, XSLT, or XQuery.
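For comparison, the same change can be sketched with the plain JDK DOM and XPath APIs. This only illustrates the work that the three-line script replaces; the class name AmountUpdate and its update() method are made up for this example and are not part of Xembly.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Xembly: plain JDK DOM + XPath.
final class AmountUpdate {
  // Parses the XML and sets the text of <amount> inside order #553.
  static Document update(String xml, String value) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new InputSource(new StringReader(xml)));
    Node amount = (Node) XPathFactory.newInstance().newXPath().evaluate(
      "/orders/order[@id=553]/amount", doc, XPathConstants.NODE
    );
    if (amount == null) {
      throw new IllegalStateException("order #553 has no amount");
    }
    amount.setTextContent(value);
    return doc;
  }

  public static void main(String[] args) throws Exception {
    Document doc = update(
      "<orders><order id=\"553\"><amount>$45.00</amount></order></orders>",
      "$140.00"
    );
    // Prints the text of the only <amount> element after the update
    System.out.println(
      doc.getElementsByTagName("amount").item(0).getTextContent()
    );
  }
}
```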

This Java package implements Xembly:

Document document = DocumentBuilderFactory.newInstance()
  .newDocumentBuilder().newDocument();
new Xembler(
  new Directives(
    "ADD 'orders'; ADD 'order'; ATTR 'id', '553'; SET '$140.00';"
  )
).apply(document);

Since version 0.9 you can directly transform directives to XML:

String xml = new Xembler(
  new Directives()
    .add("root")
    .add("order")
    .attr("id", "553")
    .set("$140.00")
).xml();

This code will produce the following XML document:

<root>
  <order id="553">$140.00</order>
</root>

Directives

This is a full list of supported directives, in the current version:

  • ADD: adds new node to all current nodes
  • ADDIF: adds new node, if it's absent
  • ATTR: sets an attribute on all current nodes
  • SET: sets text value of current node
  • XSET: sets text value, calculating it with XPath
  • XATTR: sets attribute value, calculating it with XPath
  • CDATA: same as SET, but makes CDATA
  • UP: moves cursor one node up
  • XPATH: moves cursor to the nodes found by XPath
  • REMOVE: removes all current nodes
  • STRICT: throws an exception if cursor is missing nodes
  • PI: adds processing instruction
  • PUSH: saves cursor in stack
  • POP: retrieves cursor from stack
  • NS: sets namespace of all current nodes
  • COMMENT: adds XML comment

The "cursor" or "current nodes" is where we're currently located in the XML document. When an Xembly script starts, the cursor is empty: it simply points to the highest level in the XML hierarchy. Note that it doesn't point to the root node; it points to one level above the root. Remember, when a document is empty, there is no root node.

Then, we start executing directives one by one. After each directive, the cursor may move. There may be many nodes under the cursor, or just one, or none. For example, let's assume we're starting with this simple document <car/>:

ADD 'hello';        // Nothing happens, since the cursor is empty
XPATH '/car';       // There is one node <car> under the cursor
ADD 'make';         // The result is "<car><make/></car>",
                    // the cursor has one node "<make/>"
ATTR 'name', 'BMW'; // The result is "<car><make name='BMW'/></car>",
                    // the cursor still points to one node "<make/>"
UP;                 // The cursor has one node "<car>"
ADD 'mileage';      // The result is "<car><make name='BMW'/><mileage/></car>",
                    // the cursor has one node "<mileage/>"
UP;                 // The cursor has one node "<car>" again
XPATH '*';          // The cursor has two nodes "<make name='BMW'/>"
                    // and "<mileage/>"
REMOVE;             // The result is "<car/>", since all nodes under
                    // the cursor are removed
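The cursor rules can be modelled in a few lines of plain JDK code. The sketch below is a toy model written for this text, not Xembly's implementation: the class CursorSketch and its methods are hypothetical, and it assumes that ADD moves the cursor to the nodes it adds, that REMOVE moves it to the parents, and that XPATH is evaluated from each current node (or from the document when the cursor is empty).

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Toy model of the cursor (hypothetical, not Xembly's implementation).
final class CursorSketch {
  private final Document doc;
  private final XPath engine = XPathFactory.newInstance().newXPath();
  private List<Node> cursor = new ArrayList<>(); // empty when the script starts

  CursorSketch(Document doc) {
    this.doc = doc;
  }

  // Builds the starting document <car/>.
  static Document car() throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder().newDocument();
    doc.appendChild(doc.createElement("car"));
    return doc;
  }

  // ADD: append a child to every current node, move to the new children.
  CursorSketch add(String name) {
    List<Node> added = new ArrayList<>();
    for (Node node : cursor) {
      added.add(node.appendChild(doc.createElement(name)));
    }
    cursor = added;
    return this;
  }

  // ATTR: set an attribute on every current node, the cursor stays.
  CursorSketch attr(String name, String value) {
    for (Node node : cursor) {
      ((Element) node).setAttribute(name, value);
    }
    return this;
  }

  // UP: replace every current node with its parent (no duplicates).
  CursorSketch up() {
    List<Node> parents = new ArrayList<>();
    for (Node node : cursor) {
      Node parent = node.getParentNode();
      if (parent != null && !parents.contains(parent)) {
        parents.add(parent);
      }
    }
    cursor = parents;
    return this;
  }

  // XPATH: evaluate from each current node, or from the document if empty.
  CursorSketch xpath(String expr) throws Exception {
    List<Node> roots = cursor.isEmpty() ? List.of((Node) doc) : cursor;
    List<Node> found = new ArrayList<>();
    for (Node root : roots) {
      NodeList list = (NodeList) engine.evaluate(
        expr, root, XPathConstants.NODESET
      );
      for (int idx = 0; idx < list.getLength(); idx++) {
        found.add(list.item(idx));
      }
    }
    cursor = found;
    return this;
  }

  // REMOVE: move the cursor to the parents, then detach the old nodes.
  CursorSketch remove() {
    List<Node> removed = cursor;
    up();
    for (Node node : removed) {
      node.getParentNode().removeChild(node);
    }
    return this;
  }

  // Number of nodes currently under the cursor.
  int size() {
    return cursor.size();
  }
}
```

A chain such as new CursorSketch(CursorSketch.car()).xpath("/car").add("make").attr("name", "BMW").up().add("mileage").up().xpath("*").remove() follows the same path through the document as the script above; calling size() after each step shows how many nodes the cursor holds.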

You can create a collection of directives either from text or via supplementary methods, one per directive. In both cases, you need to use the Directives class:

import org.xembly.Directives;
new Directives("XPATH '//car'; REMOVE;");
new Directives().xpath("//car").remove();

The second option is preferable because it is faster: there is no parsing involved.

ADD

The ADD directive adds a new node to every node in the current node set. ADD expects exactly one mandatory argument, which is the name of a new node to be added (case sensitive):

ADD 'orders';
ADD 'order';

Even if a node with the same name already exists, a new node will be added. Use ADDIF if you need to add only if the same-name node is absent.

After the execution, the ADD directive moves the cursor to the nodes just added.

ADDIF

The ADDIF directive adds a new node to every node of the current set, only if it is absent. ADDIF expects exactly one argument, which is the name of the node to be added (case sensitive):

ADD 'orders';
ADDIF 'order';

After the execution, the ADDIF directive moves the cursor to the nodes just added.

SET

The SET directive changes text content of all current nodes, and expects exactly one argument, which is the text content to set:

ADD "employee";
SET "John Smith";

SET doesn't move the cursor anywhere.

XSET

The XSET directive changes text content of all current nodes to a value calculated with the provided XPath expression:

ADD "product-1";
ADD "price";
XSET "sum(/products/price) div count(/products)";

XSET doesn't move the cursor anywhere.
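To see what such an XPath computation yields, here is a sketch that uses the JDK XPath engine directly. The document layout (products/product/price plus an average element), the class name XsetSketch, and the method average() are assumptions made for the example; Xembly itself is not involved.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration of "set text to an XPath result" with the JDK.
final class XsetSketch {
  // Computes the average price and writes it into /products/average.
  static String average(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new InputSource(new StringReader(xml)));
    XPath xpath = XPathFactory.newInstance().newXPath();
    // Evaluate the expression to its XPath string value
    String value = xpath.evaluate(
      "sum(/products/product/price) div count(/products/product)", doc
    );
    Node target = (Node) xpath.evaluate(
      "/products/average", doc, XPathConstants.NODE
    );
    if (target == null) {
      throw new IllegalStateException("no <average> element to fill");
    }
    target.setTextContent(value);
    return target.getTextContent();
  }
}
```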

XATTR

The XATTR directive changes the value of an attribute of all current nodes to a value calculated with the provided XPath expression:

ADD "product-1";
ADD "price";
XATTR "s", "sum(/products/price) div count(/products)";

XATTR doesn't move the cursor anywhere.

UP

The UP directive moves the cursor from all current nodes to their parents.

XPATH

The XPATH directive re-points the cursor to the nodes found by the provided XPath expression:

XPATH "//employee[@id='234' and name='John Smith']/name";
SET "John R. Smith";

REMOVE

The REMOVE directive removes current nodes under the cursor and moves the cursor to their parents:

ADD "employee";
REMOVE;

STRICT

The STRICT directive checks that there is a certain number of current nodes:

XPATH "//employee[name='John Doe']";  // Move the cursor to the employee
STRICT "1";                           // Throw an exception if there
                                      // is not exactly one node under
                                      // the cursor

This is a very effective mechanism for validating your script in production mode. It is similar to the assert statement in Java. It is recommended to use STRICT regularly, to make sure your cursor has the correct number of nodes and to avoid unexpected modifications.

STRICT doesn't move the cursor anywhere.
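The check itself is simple. A minimal sketch of the idea, with a generic list standing in for the cursor (the class StrictSketch is hypothetical and not taken from Xembly):

```java
import java.util.List;

// Hypothetical stand-in for the STRICT check; not Xembly's code.
final class StrictSketch {
  // Returns the cursor unchanged, or throws if its size is unexpected.
  static <T> List<T> strict(List<T> cursor, int expected) {
    if (cursor.size() != expected) {
      throw new IllegalStateException(
        String.format(
          "Expected %d node(s) under the cursor, found %d",
          expected, cursor.size()
        )
      );
    }
    return cursor;
  }
}
```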

PI

The PI directive adds a new processing instruction to the XML:

PI "xsl-stylesheet" "href='http://example.com'";

PI doesn't move the cursor anywhere.

PUSH and POP

The PUSH and POP directives save the current DOM position to a stack and restore it from there.

Let's say you start your Xembly manipulations from a place in the DOM whose location is unknown to you. After your manipulations are done, you want to get back to exactly the same place. Use PUSH to save your current location and POP to restore it when the manipulations are finished, for example:

PUSH;                        // Doesn't matter where we are
                             // We just save the location to stack
XPATH '//user[@id="123"]';   // Move the cursor to a completely
                             // different location in the XML
ADD 'name';                  // Add "<name/>" to all nodes under the cursor
SET 'Jeff';                  // Set text value to the nodes
POP;                         // Get back to where we were before the PUSH

PUSH saves the cursor onto the stack and POP restores it from there. This technique is very similar to the PUSH/POP instructions in Assembly. The stack has no size limit: you can push multiple times and pop the cursors back in reverse order. It is a stack, which is why it is First-In-Last-Out (FILO).

This operation is fast, and it is highly recommended to use it everywhere, to be sure you're not making unexpected changes to the XML document.
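The FILO behaviour can be sketched with a java.util.Deque. In this toy model the nodes are plain strings and the class CursorStack is hypothetical; it only shows the order in which saved cursors come back.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical model of PUSH/POP; node names stand in for DOM nodes.
final class CursorStack {
  private final Deque<List<String>> stack = new ArrayDeque<>();
  private List<String> cursor = List.of();

  // Re-points the cursor, as XPATH or ADD would.
  void moveTo(String... nodes) {
    cursor = List.of(nodes);
  }

  // PUSH: remember the current cursor.
  void push() {
    stack.push(cursor);
  }

  // POP: restore the most recently saved cursor;
  // throws NoSuchElementException if nothing was pushed.
  void pop() {
    cursor = stack.pop();
  }

  List<String> cursor() {
    return cursor;
  }
}
```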

NS

The NS directive adds a namespace attribute to a node:

XPATH '/garage/car';                // Move the cursor to "<car/>" node(s)
NS "http://www.w3.org/TR/html4/";   // Set the namespace over there

If an original document was like this:

<garage>
  <car>BMW</car>
  <car>Toyota</car>
</garage>

After applying those two directives, it will look like this:

<garage xmlns:a="http://www.w3.org/TR/html4/">
  <a:car>BMW</a:car>
  <a:car>Toyota</a:car>
</garage>

The namespace prefix may not necessarily be a:.

NS doesn't move the cursor anywhere.

XML Collections

Let's say you want to build an XML document with a collection of names:

package org.xembly.example;
import org.xembly.Directives;
import org.xembly.Xembler;
public class XemblyExample {
  public static void main(String[] args) throws Exception {
    String[] names = new String[] {
      "Jeffrey Lebowski",
      "Walter Sobchak",
      "Theodore Donald 'Donny' Kerabatsos",
    };
    Directives directives = new Directives().add("actors");
    for (String name : names) {
      directives.add("actor").set(name).up();
    }
    System.out.println(new Xembler(directives).xml());
  }
}

The standard output will contain this text:

<?xml version="1.0" encoding="UTF-8"?>
<actors>
  <actor>Jeffrey Lebowski</actor>
  <actor>Walter Sobchak</actor>
  <actor>Theodore Donald &apos;Donny&apos; Kerabatsos</actor>
</actors>

Merging Documents

When you need to add an entire XML document, you can convert it first into Xembly directives and then add them all together:

Iterable<Directive> dirs = new Directives()
  .add("garage")
  .append(Directives.copyOf(node))
  .add("something-else");

This static utility method copyOf() converts an instance of class org.w3c.dom.Node into a collection of Xembly directives. Then, the append() method adds them all together to the main list.

Unfortunately, not every valid XML document can be parsed by copyOf(). For example, this one will lead to a runtime exception: <car>2015<name>BMW</name></car>. Read more about Xembly limitations, a few paragraphs below.
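If you want to detect such documents before converting them, a pre-check along the following lines may help. It is a sketch using only the JDK DOM API; the class MixedContent is hypothetical, and the rule it applies (an element that has both child elements and non-blank text) is an assumption about what counts as mixed content, not the exact condition checked inside copyOf().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical pre-check for mixed content; not part of Xembly.
final class MixedContent {
  // True if the node, or any descendant, mixes elements with non-blank text.
  static boolean isMixed(Node node) {
    boolean text = false;
    boolean elements = false;
    NodeList kids = node.getChildNodes();
    for (int idx = 0; idx < kids.getLength(); idx++) {
      Node kid = kids.item(idx);
      short type = kid.getNodeType();
      if (type == Node.ELEMENT_NODE) {
        elements = true;
        if (isMixed(kid)) {
          return true;
        }
      } else if ((type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE)
        && !kid.getNodeValue().trim().isEmpty()) {
        text = true;
      }
    }
    return text && elements;
  }

  // Convenience overload: parses the XML and checks its root element.
  static boolean isMixed(String xml) throws Exception {
    return isMixed(
      DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)))
        .getDocumentElement()
    );
  }
}
```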

Escaping Invalid XML Text

XML, as a standard, doesn't allow certain characters in its body. For example, this code will throw an exception:

String xml = new Xembler(
  new Directives().add("car").set("\u0000")
).xml();

The character \u0000 is not allowed in XML. Actually, these ranges are also not allowed: \u0000..\u0008, \u000B..\u000C, \u000E..\u001F, \u007F..\u0084, and \u0086..\u009F.

This means that you should validate everything and make sure you're setting only "valid" text values to your XML nodes. Sometimes, it's not feasible to always check them. Sometimes you may simply need to save whatever is possible and call it a day. There is a static utility method, Xembler.escape(), to help you do that:

String xml = new Xembler(
  new Directives().add("car").set(Xembler.escape("\u0000"))
).xml();

This code won't throw an exception. The Xembler.escape() method will convert the character "\u0000" to the six-character text "\\u0000" (both written here as Java string literals). It is recommended to use this method everywhere, if you are not sure about the quality of the content.
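The ranges listed above can be turned into a small filter. The sketch below is a stand-alone approximation written for this text, not the source of Xembler.escape(); the class EscapeSketch is hypothetical. It replaces each disallowed character with a visible sequence made of a backslash, the letter u, and four hex digits.

```java
// Hypothetical re-implementation of the escaping idea; not Xembly's code.
final class EscapeSketch {
  // True for characters in the disallowed ranges listed in the text.
  static boolean illegal(char chr) {
    return chr <= 0x08
      || chr >= 0x0B && chr <= 0x0C
      || chr >= 0x0E && chr <= 0x1F
      || chr >= 0x7F && chr <= 0x84
      || chr >= 0x86 && chr <= 0x9F;
  }

  // Replaces every disallowed character with a printable escape sequence.
  static String escape(String text) {
    StringBuilder out = new StringBuilder(text.length());
    for (char chr : text.toCharArray()) {
      if (illegal(chr)) {
        // Backslash, letter u, then the code point as four hex digits
        out.append('\\').append('u').append(String.format("%04x", (int) chr));
      } else {
        out.append(chr);
      }
    }
    return out.toString();
  }
}
```

Tab (0x09), line feed (0x0A), carriage return (0x0D), and 0x85 fall outside the listed ranges, so this sketch leaves them untouched.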

Shaded Xembly JAR With Dependencies

Usually, you're supposed to use this dependency in your pom.xml:

<dependency>
  <groupId>com.jcabi.incubator</groupId>
  <artifactId>xembly</artifactId>
</dependency>

However, if you have conflicts between dependencies, you can use our "shaded" JAR, that includes all dependencies:

<dependency>
  <groupId>com.jcabi.incubator</groupId>
  <artifactId>xembly</artifactId>
  <classifier>jar-with-dependencies</classifier>
</dependency>

Known Limitations

Xembly is not intended to be a replacement of XSL or XQuery. It is a lightweight (!) instrument for XML manipulations. There are a few things that can't be done by means of Xembly:

  • DTD section can't be modified

  • Elements and text content can't be mixed, e.g. this structure is not supported: <test>hello <b>friend</b></test>

Some of these limitations may be removed in the next versions. Please, submit an issue.

How To Contribute

Fork the repository, make changes, and send us a pull request. We will review your changes and apply them to the master branch shortly, provided they don't violate our quality standards. To avoid frustration, before sending us your pull request, please run the full Maven build:

$ mvn clean install -Pqulice

You must fix all static analysis issues, otherwise we won't be able to merge your pull request. The build must be "clean".

Delivery Pipeline

Git master branch is our cutting edge of development. It always contains the latest version of the product, always in a -SNAPSHOT suffixed version. Nobody is allowed to commit directly to master; this branch is basically read-only. Everybody contributes changes via pull requests. We are using rultor, a hosted chatbot, in order to merge pull requests into master. Only our architect is allowed to send pull requests to @rultor for merge, using the merge command. Before that happens, a mandatory code review must be performed for the pull request.

After each successful merge of a pull request, our project manager gives deploy command to @rultor. The code from master branch is tested, packaged, and deployed to Sonatype, in version *-SNAPSHOT.

Every once in a while, the architect may decide that it's time to release a new minor/major version of the product. When it happens, he gives release command to @rultor. The code from master branch is tested, versioned, packaged, and deployed to Sonatype and Maven Central. A new Git tag is created. A new GitHub release is created and briefly documented. All this is done automatically by @rultor.

Got questions?

If you have questions or general suggestions, don't hesitate to submit a new GitHub issue. But keep these Five Principles of Bug Tracking in mind.
