• Stars
    star
    61
  • Rank 497,051 (Top 10 %)
  • Language
    Go
  • License
    BSD 3-Clause "New...
  • Created over 12 years ago
  • Updated about 5 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Unmarshal an anonymous XML doc to map[string]interface{} and JSON, and extract values (using wildcards, if necessary) [deprecated for clbanning/mxj].
x2j.go - Unmarshal arbitrary XML docs to map[string]interface{} or JSON and extract values (using wildcards, if necessary).

NOTICE:

AS OF 3-FEB-2014 THE x2j PACKAGE IS NO LONGER SUPPORTED.

IT HAS BEEN DEPRECATED IN FAVOR OF mxj/x2j: https://github.com/clbanning/mxj/tree/master/x2j.
**** Use the github.com/clbannig/mxj/x2j-wrapper sub-package to port your old x2j code. ****

ANNOUNCEMENTS

22 April 2016:

People keep cloning this "DEPRECATED" repo at the rate of about 100 per day, so I've pulled in the github.com/clbanning/mxj decoder that is orders of magnitude faster to speed things up. (It'd be great if everyone would just use the mxj package.)

23 January 2014:

Extract path(s) for a tag/key value from an anonymous XML or map[string]interface{} value.  Result can be used with ValuesForKeyPath() / ValuesAtKeyPath().  Similar to ValuesFor...() functions but PathsFor...() and PathFor...Shortest() expose the tree navigation information.

20 December 2013:

Non-UTF8 character sets supported via the X2jCharsetReader variable.

12 December 2013:

For symmetry, the package j2x has functions that marshal JSON strings and
map[string]interface{} values to XML encoded strings: http://godoc.org/github.com/clbanning/j2x.

Also, ToTree(), ToMap(), ToJson(), ToJsonIndent(), ReaderValuesFromTagPath() and ReaderValuesForTag() use io.Reader instead of string or []byte.

If you want to process a stream of XML messages check out XmlMsgsFromReader().

USAGE

The package is fairly well self-documented. (http://godoc.org/github.com/clbanning/x2j)  
The one really useful function is:

    - Unmarshal(doc []byte, v interface{}) error  
      where v is a pointer to a variable of type 'map[string]interface{}', 'string', or
      any other type supported by xml.Unmarshal().

To retrieve a value for specific tag use: 

    - DocValue(doc, path string, attrs ...string) (interface{},error) 
    - MapValue(m map[string]interface{}, path string, attr map[string]interface{}, recast ...bool) (interface{}, error)

The 'path' argument is a period-separated tag hierarchy - also known as dot-notation.
It is the program's responsibility to cast the returned value to the proper type; possible 
types are the normal JSON unmarshaling types: string, float64, bool, []interface, map[string]interface{}.  

To retrieve all values associated with a tag occurring anywhere in the XML document use:

    - ValuesForTag(doc, tag string) ([]interface{}, error)
    - ValuesForKey(m map[string]interface{}, key string) []interface{}

    Demos: http://play.golang.org/p/m8zP-cpk0O
           http://play.golang.org/p/cIteTS1iSg
           http://play.golang.org/p/vd8pMiI21b

Returned values should be one of map[string]interface, []interface{}, or string.

All the values assocated with a tag-path that may include one or more wildcard characters - 
'*' - can also be retrieved using:

    - ValuesFromTagPath(doc, path string, getAttrs ...bool) ([]interface{}, error)
    - ValuesFromKeyPath(map[string]interface{}, path string, getAttrs ...bool) []interface{}

    Demos: http://play.golang.org/p/kUQnZ8VuhS
           http://play.golang.org/p/l1aMHYtz7G

NOTE: care should be taken when using "*" at the end of a path - i.e., "books.book.*".  See
the x2jpath_test.go case on how the wildcard returns all key values and collapses list values;
the same message structure can load a []interface{} or a map[string]interface{} (or an interface{}) 
value for a tag.

See the test cases in "x2jpath_test.go" and programs in "example" subdirectory for more.

XML PARSING CONVENTIONS

   - Attributes are parsed to map[string]interface{} values by prefixing a hyphen, '-',
     to the attribute label. [See https://godoc.org/github.com/clbanning/mxj#SetAttrPrefix for options.]
   - If the element is a simple element and has attributes, the element value
     is given the key '#text' for its map[string]interface{} representation.  (See
     the 'atomFeedString.xml' test data, below.)

BULK PROCESSING OF MESSAGE FILES

Sometime messages may be logged into files for transmission via FTP (e.g.) and subsequent
processing. You can use the bulk XML message processor to convert files of XML messages into 
map[string]interface{} values with custom processing and error handler functions.  See
the notes and test code for:

   - XmlMsgsFromFile(fname string, phandler func(map[string]interface{}) bool, ehandler func(error) bool,recast ...bool) error
     [See https://godoc.org/github.com/clbanning/mxj#HandleXmlReader for general version.]

IMPLEMENTATION NOTES

Nothing fancy here, just brute force.

   - Use xml.Decoder to parse the XML doc and build a tree.
   - Walk the tree and load values into a map[string]interface{} variable, 'm', as
     appropriate.
   - Use json.Marshaler to convert 'm' to JSON.

As for testing:

   - Copy an XML doc into 'x2j_test.xml'.
   - Run "go test" and you'll get a full dump.
     ("pathTestString.xml" and "atomFeedString.xml" are test data from "read_test.go"
     in the encoding/xml directory of the standard package library.)

MOTIVATION

I make extensive use of JSON for messaging and typically unmarshal the messages into
map[string]interface{} variables.  This is easily done using json.Unmarshal from the
standard Go libraries.  Unfortunately, many legacy solutions use structured
XML messages; in those environments the applications would have to be refitted to
interoperate with my components.

The better solution is to just provide an alternative HTTP handler that receives
XML messages and parses it into a map[string]interface{} variable and then reuse
all the JSON-based code.  The Go xml.Unmarshal() function does not provide the same
option of unmarshaling XML messages into map[string]interface{} variables. So I wrote
a couple of small functions to fill this gap.

Of course, once the XML doc was unmarshal'd into a map[string]interface{} variable it
was just a matter of calling json.Marshal() to provide it as a JSON string.  Hence 'x2j'
rather than just 'x2m'.

USES

   - putting a XML API on our message hub middleware (http://tamgroupinc.com/jsonhub/)
   - loading XML data into NoSQL database, such as, mongoDB