  • Stars: 121
  • Rank: 288,073 (Top 6 %)
  • Language: Python
  • License: MIT License
  • Created almost 9 years ago
  • Updated over 1 year ago

Repository Details

xmljson

This library is not actively maintained. Alternatives are xmltodict and untangle. Use only if you need to parse using specific XML to JSON conventions.

xmljson converts XML into Python dictionary structures (trees, like in JSON) and vice-versa.

About

XML can be converted to a data structure (such as JSON) and back. For example:

<employees>
    <person>
        <name value="Alice"/>
    </person>
    <person>
        <name value="Bob"/>
    </person>
</employees>

can be converted into this data structure (which is also a valid JSON object):

{
    "employees": {
        "person": [{
            "name": {
                "@value": "Alice"
            }
        }, {
            "name": {
                "@value": "Bob"
            }
        }]
    }
}

This uses the BadgerFish convention that prefixes attributes with @. The conventions supported by this library are:

  • Abdera: Use "attributes" for attributes, "children" for nodes
  • BadgerFish: Use "$" for text content, @ to prefix attributes
  • Cobra: Use "attributes" for sorted attributes (even when empty), "children" for nodes, values are strings
  • GData: Use "$t" for text content, attributes added as-is
  • Parker: Use tail nodes for text content, ignore attributes
  • Yahoo: Use "content" for text content, attributes added as-is
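
To make the differences between three of these conventions concrete, here is a minimal standard-library sketch. The helper to_data is hypothetical and is not part of xmljson. It only handles an element with attributes and text, ignores child nodes, and keeps every value as a string:

```python
from xml.etree.ElementTree import fromstring


def to_data(elem, text_key, attr_prefix=""):
    """Hypothetical helper: convert a childless element into a dict.

    text_key is the key used for text content; attr_prefix is prepended
    to every attribute name. Child nodes are ignored in this sketch.
    """
    value = {attr_prefix + name: val for name, val in elem.attrib.items()}
    if elem.text and elem.text.strip():
        value[text_key] = elem.text
    return {elem.tag: value}


elem = fromstring('<p id="main">Hello</p>')

badgerfish_like = to_data(elem, text_key="$", attr_prefix="@")
gdata_like = to_data(elem, text_key="$t")
yahoo_like = to_data(elem, text_key="content")

print(badgerfish_like)  # {'p': {'@id': 'main', '$': 'Hello'}}
print(gdata_like)       # {'p': {'id': 'main', '$t': 'Hello'}}
print(yahoo_like)       # {'p': {'id': 'main', 'content': 'Hello'}}
```

The only things that change between these three are the key used for text and whether attribute names are prefixed.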

Convert data to XML

To convert from a data structure to XML using the BadgerFish convention:

>>> from xmljson import badgerfish as bf
>>> bf.etree({'p': {'@id': 'main', '$': 'Hello', 'b': 'bold'}})

This returns a list of etree.Element structures. In this case, the result is identical to:

>>> from xml.etree.ElementTree import fromstring
>>> [fromstring('<p id="main">Hello<b>bold</b></p>')]

The result can be inserted into any existing root etree.Element:

>>> from xml.etree.ElementTree import Element, tostring
>>> result = bf.etree({'p': {'@id': 'main'}}, root=Element('root'))
>>> tostring(result)
'<root><p id="main"/></root>'
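
For readers curious how such a conversion might work, the following is a simplified standard-library sketch, not the library's implementation. The function build is hypothetical. It assumes BadgerFish-style keys ("@" for attributes, "$" for text), converts values with str, and does not handle lists:

```python
from xml.etree.ElementTree import Element, SubElement, tostring


def build(parent, data):
    """Hypothetical helper: append BadgerFish-style data to parent."""
    for key, value in data.items():
        if key.startswith("@"):
            # "@name" keys become attributes on the current element
            parent.set(key[1:], str(value))
        elif key == "$":
            # "$" holds the text content of the current element
            parent.text = str(value)
        else:
            # any other key becomes a child element
            child = SubElement(parent, key)
            if isinstance(value, dict):
                build(child, value)
            else:
                # plain values are treated as node text
                child.text = str(value)
    return parent


root = build(Element("root"), {"p": {"@id": "main", "$": "Hello", "b": "bold"}})
print(tostring(root, encoding="unicode"))
# <root><p id="main">Hello<b>bold</b></p></root>
```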

This includes lxml.html as well:

>>> from lxml.html import Element, tostring
>>> result = bf.etree({'p': {'@id': 'main'}}, root=Element('html'))
>>> tostring(result, doctype='<!DOCTYPE html>')
'<!DOCTYPE html>\n<html><p id="main"></p></html>'

For ease of use, strings are treated as node text. For example, both the following are the same:

>>> bf.etree({'p': {'$': 'paragraph text'}})
>>> bf.etree({'p': 'paragraph text'})

By default, non-string values are converted to strings using Python's str, except for booleans, which are converted into true and false (lower case). Override this behaviour using xml_tostring:

>>> tostring(bf.etree({'x': 1.23, 'y': True}, root=Element('root')))
'<root><y>true</y><x>1.23</x></root>'
>>> from xmljson import BadgerFish              # import the class
>>> bf_str = BadgerFish(xml_tostring=str)       # convert using str()
>>> tostring(bf_str.etree({'x': 1.23, 'y': True}, root=Element('root')))
'<root><y>True</y><x>1.23</x></root>'
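
The default conversion described above can be sketched as a small function. The name to_xml_text is hypothetical and this is not the library's own code. The example above passes str as xml_tostring, so a one-argument function of this shape should be usable in its place:

```python
def to_xml_text(value):
    """Hypothetical helper: render a Python value as XML node text."""
    # bool is checked first because bool is a subclass of int
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


print(to_xml_text(True))   # true
print(to_xml_text(1.23))   # 1.23
```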

If the data contains invalid XML keys, these can be dropped via invalid_tags='drop' in the constructor:

>>> bf_drop = BadgerFish(invalid_tags='drop')
>>> data = bf_drop.etree({'$': '1', 'x': '1'}, root=Element('root'))    # Drops invalid <$> tag
>>> tostring(data)
'<root>1<x>1</x></root>'

Convert XML to data

To convert from XML to a data structure using the BadgerFish convention:

>>> bf.data(fromstring('<p id="main">Hello<b>bold</b></p>'))
{"p": {"$": "Hello", "@id": "main", "b": {"$": "bold"}}}

To convert this to JSON, use:

>>> from json import dumps
>>> dumps(bf.data(fromstring('<p id="main">Hello<b>bold</b></p>')))
'{"p": {"b": {"$": "bold"}, "@id": "main", "$": "Hello"}}'

To preserve the order of attributes and children, specify the dict_type as OrderedDict (or any other dictionary-like type) in the constructor:

>>> from collections import OrderedDict
>>> from xmljson import BadgerFish              # import the class
>>> bf = BadgerFish(dict_type=OrderedDict)      # pick dict class

By default, values are parsed into boolean, int or float where possible (except in the Yahoo method). Override this behaviour using xml_fromstring:

>>> dumps(bf.data(fromstring('<x>1</x>')))
'{"x": {"$": 1}}'
>>> bf_str = BadgerFish(xml_fromstring=False)   # Keep XML values as strings
>>> dumps(bf_str.data(fromstring('<x>1</x>')))
'{"x": {"$": "1"}}'
>>> bf_str = BadgerFish(xml_fromstring=repr)    # Custom string parser
>>> dumps(bf_str.data(fromstring('<x>1</x>')))
'{"x": {"$": "\'1\'"}}'

xml_fromstring can be any custom function that takes a string and returns a value. In the example below, only the integer 1 is converted to an integer. Everything else is retained as a string:

>>> def convert_only_int(val):
...     return int(val) if val.isdigit() else val
>>> bf_int = BadgerFish(xml_fromstring=convert_only_int)
>>> dumps(bf_int.data(fromstring('<p><x>1</x><y>2.5</y><z>NaN</z></p>')))
'{"p": {"x": {"$": 1}, "y": {"$": "2.5"}, "z": {"$": "NaN"}}}'
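
As a further illustration, here is a sketch of a parser that tries boolean, then int, then float, and falls back to the original string. The name parse_scalar is hypothetical, and keeping NaN and infinity as strings (so the result stays valid JSON) is an assumption of this sketch rather than documented library behaviour:

```python
import math


def parse_scalar(text):
    """Hypothetical parser: str -> bool, int or float where possible."""
    lowered = text.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    try:
        return int(text)
    except ValueError:
        pass
    try:
        number = float(text)
    except ValueError:
        return text
    # Assumption: non-finite floats are kept as strings
    return number if math.isfinite(number) else text


print([parse_scalar(s) for s in ["1", "2.5", "true", "NaN", "hello"]])
# [1, 2.5, True, 'NaN', 'hello']
```

A function like this takes a string and returns a value, which is the shape xml_fromstring expects.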

Conventions

To use a different conversion method, replace BadgerFish with one of the other classes. Currently, these are supported:

>>> from xmljson import abdera          # == xmljson.Abdera()
>>> from xmljson import badgerfish      # == xmljson.BadgerFish()
>>> from xmljson import cobra           # == xmljson.Cobra()
>>> from xmljson import gdata           # == xmljson.GData()
>>> from xmljson import parker          # == xmljson.Parker()
>>> from xmljson import yahoo           # == xmljson.Yahoo()

Options

Conventions may support additional options.

The Parker convention absorbs the root element by default. parker.data(preserve_root=True) preserves the root element:

>>> from xmljson import parker, Parker
>>> from xml.etree.ElementTree import fromstring
>>> from json import dumps
>>> dumps(parker.data(fromstring('<x><a>1</a><b>2</b></x>')))
'{"a": 1, "b": 2}'
>>> dumps(parker.data(fromstring('<x><a>1</a><b>2</b></x>'), preserve_root=True))
'{"x": {"a": 1, "b": 2}}'
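
The effect of preserve_root can be illustrated with a simplified standard-library sketch. The function parker_like is hypothetical and is not the library's implementation. It ignores attributes, groups repeated child tags into lists, and, unlike the library output above, keeps leaf values as strings:

```python
from xml.etree.ElementTree import fromstring


def parker_like(root, preserve_root=False):
    """Hypothetical Parker-style conversion (attributes are ignored)."""

    def convert(node):
        children = list(node)
        if not children:
            return node.text  # leaf: text content only
        result = {}
        for child in children:
            value = convert(child)
            if child.tag in result:
                # repeated tags are collected into a list
                if not isinstance(result[child.tag], list):
                    result[child.tag] = [result[child.tag]]
                result[child.tag].append(value)
            else:
                result[child.tag] = value
        return result

    data = convert(root)
    return {root.tag: data} if preserve_root else data


doc = fromstring('<x><a>1</a><b>2</b></x>')
print(parker_like(doc))                      # {'a': '1', 'b': '2'}
print(parker_like(doc, preserve_root=True))  # {'x': {'a': '1', 'b': '2'}}
```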

Installation

This is a pure-Python package built for Python 2.7+ and Python 3.0+. To set up:

pip install xmljson

Simple CLI utility

After installation, you can also use this package as a simple CLI utility. Currently, only XML to JSON conversion is supported. Example:

$ python -m xmljson -h
usage: xmljson [-h] [-o OUT_FILE]
            [-d {abdera,badgerfish,cobra,gdata,parker,xmldata,yahoo}]
            [in_file]

positional arguments:
in_file               defaults to stdin

optional arguments:
-h, --help            show this help message and exit
-o OUT_FILE, --out_file OUT_FILE
                        defaults to stdout
-d {abdera,badgerfish,...}, --dialect {...}
                        defaults to parker

$ python -m xmljson -d parker tests/mydata.xml
{
  "foo": "spam",
  "bar": 42
}

This is a typical UNIX filter program: it reads a file (or stdin), processes it (converts XML to JSON in this case), then prints the result to stdout (or a file). Example with a pipe:

$ some-xml-producer | python -m xmljson | some-json-processor
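
The filter pattern itself is easy to reproduce in Python. The sketch below is not the package's CLI code. The names leaf_dict and run_filter are hypothetical, and the default converter only handles flat documents, mapping each direct child tag to its text:

```python
import io
import json
from xml.etree.ElementTree import fromstring


def leaf_dict(root):
    """Hypothetical converter: map each direct child tag to its text."""
    return {child.tag: child.text for child in root}


def run_filter(in_stream, out_stream, convert=leaf_dict):
    """Read XML from in_stream and write indented JSON to out_stream."""
    root = fromstring(in_stream.read())
    json.dump(convert(root), out_stream, indent=2)
    out_stream.write("\n")


# In a real filter the streams would be sys.stdin and sys.stdout;
# in-memory streams are used here so the example is self-contained.
out = io.StringIO()
run_filter(io.StringIO("<r><foo>spam</foo><bar>42</bar></r>"), out)
print(out.getvalue())
```

Passing the streams as arguments keeps the function testable and lets the same code serve files, pipes and in-memory buffers.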

There is also a console script entry point installed by pip, so you can call this utility as xml2json:

$ xml2json -d abdera mydata.xml

Roadmap

  • Test cases for Unicode
  • Support for namespaces and namespace prefixes
  • Support XML comments

More Repositories

  1. ipython-notebooks (Jupyter Notebook, 61 stars)
  2. pyconindia2020: My keynote at PyCon India 2020 https://in.pycon.org/2020/ (Python, 58 stars)
  3. pincode: PIN Code mapping database (JavaScript, 55 stars)
  4. minecraft-websocket: Control Minecraft using websockets in JavaScript and Python (JavaScript, 33 stars)
  5. beautiful-visualisations: Talk at jsFoo Bangalore on 20 Oct 2012 (JavaScript, 25 stars)
  6. fifadata: FIFA data (21 stars)
  7. markdress: Serve Markdown files as web pages (PHP, 19 stars)
  8. orderedattrdict: An ordered Python dictionary with attribute-style access. (Python, 16 stars)
  9. benchmarks: Various benchmark tests (Python, 13 stars)
  10. protectstatic: Protect static files via PHP using OpenID, Google authentication, etc. (PHP, 12 stars)
  11. pyconindia2012-autolysis: Automated Data Analysis in Python: Talk at Pycon India 2012 (Python, 7 stars)
  12. euler: Project Euler solutions in Python (Python, 7 stars)
  13. datascience: A Data Science Curriculum (6 stars)
  14. mixamail: Twitter via e-mail (Python, 5 stars)
  15. actornetwork: Clustering the network of actors on IMDb (Jupyter Notebook, 4 stars)
  16. whatnext: What should I do next? A prioritisation matrix (HTML, 4 stars)
  17. radar: ThoughtWorks Radar data (4 stars)
  18. data.gov.in: Exploring the datasets (Python, 3 stars)
  19. py-pretty: Formats dates, numbers, etc. in a pretty, human readable format. (Python, 3 stars)
  20. chatgpt-to-markdown: Convert ChatGPT exported conversations.json to Markdown (JavaScript, 3 stars)
  21. forms: Customisable forms (JavaScript, 3 stars)
  22. reportbee-dashboard (Python, 2 stars)
  23. indian-song-database: Automatically exported from code.google.com/p/indian-song-database (Python, 2 stars)
  24. vizpack: Collaboratively designing a design taxonomy (2 stars)
  25. dilbert-search: Automatically exported from code.google.com/p/dilbert-search (HTML, 1 star)
  26. pincode-shapes: A PIN code boundary editor (HTML, 1 star)
  27. onething: A Chrome app to create a pinned note to remind you of what you were doing (JavaScript, 1 star)
  28. less: LESS / CSS mixins (Python, 1 star)
  29. statistically-improbable-phrases: Automatically exported from code.google.com/p/statistically-improbable-phrases (HTML, 1 star)
  30. talk: My Talks (Python, 1 star)
  31. two-fifty: Automatically exported from code.google.com/p/two-fifty (Python, 1 star)
  32. chargeback: Automatically exported from code.google.com/p/chargeback (HTML, 1 star)
  33. text-analysis: Text analysis bookmarklets (JavaScript, 1 star)
  34. lok-sabha-attendance: Scrape and visualisation attendance of MPs (JavaScript, 1 star)
  35. vis-cricket: Cricket visualisation (Python, 1 star)