  • Stars: 668
  • Rank: 64,939 (Top 2 %)
  • Language: Go
  • License: GNU General Publi...
  • Created over 6 years ago
  • Updated 11 months ago

Repository Details

Generate a Go struct from XML.

zek

Zek is a prototype for creating a Go struct from an XML document. The resulting struct works best for reading XML (see also #14); to create XML, you might want to use something else.

It was developed at Leipzig University Library to shorten the time it takes to go from raw XML to a struct that gives access to the XML data in Go programs.

Skip the fluff, just the code.

Given some XML, run:

$ curl -s https://raw.githubusercontent.com/miku/zek/master/fixtures/e.xml | zek -e
// Rss was generated 2018-08-30 20:24:14 by tir on sol.
type Rss struct {
    XMLName xml.Name `xml:"rss"`
    Text    string   `xml:",chardata"`
    Rdf     string   `xml:"rdf,attr"`
    Dc      string   `xml:"dc,attr"`
    Geoscan string   `xml:"geoscan,attr"`
    Media   string   `xml:"media,attr"`
    Gml     string   `xml:"gml,attr"`
    Taxo    string   `xml:"taxo,attr"`
    Georss  string   `xml:"georss,attr"`
    Content string   `xml:"content,attr"`
    Geo     string   `xml:"geo,attr"`
    Version string   `xml:"version,attr"`
    Channel struct {
        Text          string `xml:",chardata"`
        Title         string `xml:"title"`         // ESS New Releases (Display...
        Link          string `xml:"link"`          // http://tinyurl.com/ESSNew...
        Description   string `xml:"description"`   // New releases from the Ear...
        LastBuildDate string `xml:"lastBuildDate"` // Mon, 27 Nov 2017 00:06:35...
        Item          []struct {
            Text        string `xml:",chardata"`
            Title       string `xml:"title"`       // Surficial geology, Aberde...
            Link        string `xml:"link"`        // https://geoscan.nrcan.gc....
            Description string `xml:"description"` // Geological Survey of Cana...
            Guid        struct {
                Text        string `xml:",chardata"` // 304279, 306212, 306175, 3...
                IsPermaLink string `xml:"isPermaLink,attr"`
            } `xml:"guid"`
            PubDate       string   `xml:"pubDate"`      // Fri, 24 Nov 2017 00:00:00...
            Polygon       []string `xml:"polygon"`      // 64.0000 -98.0000 64.0000 ...
            Download      string   `xml:"download"`     // https://geoscan.nrcan.gc....
            License       string   `xml:"license"`      // http://data.gc.ca/eng/ope...
            Author        string   `xml:"author"`       // Geological Survey of Cana...
            Source        string   `xml:"source"`       // Geological Survey of Cana...
            SndSeries     string   `xml:"SndSeries"`    // Bedford Institute of Ocea...
            Publisher     string   `xml:"publisher"`    // Natural Resources Canada,...
            Edition       string   `xml:"edition"`      // prelim., surficial data m...
            Meeting       string   `xml:"meeting"`      // Geological Association of...
            Documenttype  string   `xml:"documenttype"` // serial, open file, serial...
            Language      string   `xml:"language"`     // English, English, English...
            Maps          string   `xml:"maps"`         // 1 map, 5 maps, Publicatio...
            Mapinfo       string   `xml:"mapinfo"`      // surficial geology, surfic...
            Medium        string   `xml:"medium"`       // on-line; digital, digital...
            Province      string   `xml:"province"`     // Nunavut, Northwest Territ...
            Nts           string   `xml:"nts"`          // 066B, 095J; 095N; 095O; 0...
            Area          string   `xml:"area"`         // Aberdeen Lake, Mackenzie ...
            Subjects      string   `xml:"subjects"`
            Program       string   `xml:"program"`       // GEM2: Geo-mapping for Ene...
            Project       string   `xml:"project"`       // Rae Province Project Mana...
            Projectnumber string   `xml:"projectnumber"` // 340521, 343202, 340557, 3...
            Abstract      string   `xml:"abstract"`      // This new surficial geolog...
            Links         string   `xml:"links"`         // Online - En ligne (PDF, 9...
            Readme        string   `xml:"readme"`        // readme | https://geoscan....
            PPIid         string   `xml:"PPIid"`         // 34532, 35096, 35438, 2563...
        } `xml:"item"`
    } `xml:"channel"`
}
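
Once generated, the struct can be handed to encoding/xml directly. The following is a minimal sketch, not part of zek itself: the struct is a trimmed-down copy of the output above, and the sample document and the helper name decodeRss are made up for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
	"strings"
)

// Rss is a trimmed-down copy of the generated struct above; most fields
// are left out to keep the sketch short.
type Rss struct {
	XMLName xml.Name `xml:"rss"`
	Version string   `xml:"version,attr"`
	Channel struct {
		Title string `xml:"title"`
		Item  []struct {
			Title string `xml:"title"`
			Link  string `xml:"link"`
			Guid  struct {
				Text        string `xml:",chardata"`
				IsPermaLink string `xml:"isPermaLink,attr"`
			} `xml:"guid"`
		} `xml:"item"`
	} `xml:"channel"`
}

// sample is a made-up document for illustration, not the real fixture.
const sample = `<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First</title>
      <link>https://example.org/1</link>
      <guid isPermaLink="false">1</guid>
    </item>
    <item>
      <title>Second</title>
      <link>https://example.org/2</link>
      <guid isPermaLink="false">2</guid>
    </item>
  </channel>
</rss>`

// decodeRss (hypothetical helper) parses an RSS document into the struct.
func decodeRss(s string) (*Rss, error) {
	var doc Rss
	if err := xml.NewDecoder(strings.NewReader(s)).Decode(&doc); err != nil {
		return nil, err
	}
	return &doc, nil
}

func main() {
	doc, err := decodeRss(sample)
	if err != nil {
		log.Fatal(err)
	}
	// Attributes, children and text are all plain struct fields.
	for _, item := range doc.Channel.Item {
		fmt.Println(item.Guid.Text, item.Title, item.Link)
	}
}
```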

Online

About

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Upsides:

  • works fine for non-recursive structures,
  • does not need XSD or DTD,
  • makes it relatively convenient to access attributes, children and text,
  • generates a single struct, which makes for a quite compact representation,
  • simple user interface,
  • comments with examples,
  • schema inference across multiple files.

Downsides:

  • experimental, early, buggy, unstable prototype,
  • no support for recursive types (similar to Russian Doll strategy, [1])
  • no type inference, everything is accessible as string (without a schema, type inference may fail if the type guess is wrong)
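
For documents that do nest an element inside itself, one workaround is to write the type by hand. The sketch below is not zek output; the Node type and the element and attribute names are hypothetical. It only shows that encoding/xml itself can decode into a named, self-referencing type.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Node is a hand-written (not zek-generated) recursive type: a node may
// contain further nodes. Element and attribute names are hypothetical.
type Node struct {
	Name     string `xml:"name,attr"`
	Children []Node `xml:"node"`
}

// tree is a made-up document with three levels of nesting.
const tree = `<node name="root"><node name="a"><node name="a1"/></node><node name="b"/></node>`

// parseTree decodes a document into a Node.
func parseTree(s string) (Node, error) {
	var n Node
	err := xml.Unmarshal([]byte(s), &n)
	return n, err
}

// depth returns the number of levels in the tree rooted at n.
func depth(n Node) int {
	deepest := 0
	for _, c := range n.Children {
		if d := depth(c); d > deepest {
			deepest = d
		}
	}
	return deepest + 1
}

func main() {
	n, err := parseTree(tree)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(n.Name, len(n.Children), depth(n))
}
```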

Bugs:

Mapping between XML elements and data structures is inherently flawed: an XML element is an order-dependent collection of anonymous values, while a data structure is an order-independent collection of named values.

https://golang.org/pkg/encoding/xml/#pkg-note-BUG
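
A small sketch of what this means in practice, using a made-up document and a hand-written struct: when two kinds of children alternate, decoding sorts them into separate slices, and encoding the struct again does not restore the original interleaving.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// A collects <b> and <c> children into separate slices; the relative order
// between a <b> and a <c> is not recorded anywhere.
type A struct {
	XMLName xml.Name `xml:"a"`
	B       []string `xml:"b"`
	C       []string `xml:"c"`
}

// interleaved is a made-up document in which <b> and <c> alternate.
const interleaved = `<a><b>1</b><c>x</c><b>2</b></a>`

// roundTrip decodes s into A and encodes the result again.
func roundTrip(s string) (string, error) {
	var doc A
	if err := xml.Unmarshal([]byte(s), &doc); err != nil {
		return "", err
	}
	out, err := xml.Marshal(doc)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := roundTrip(interleaved)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("in: ", interleaved)
	fmt.Println("out:", out) // fields are written in struct order
}
```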

Related projects:

And other awesome XML utilities.

Presentations:

Install

$ go install github.com/miku/zek/cmd/zek@latest

Debian and RPM packages:

It's in AUR, too.

Usage

$ zek -h
Usage of zek:
  -B    use a fixed banner string (e.g. for CI)
  -C    emit less compact struct
  -F    skip formatting
  -P string
        if set, write out struct within a package with the given name
  -S int
        read at most this many tags, approximately (0=unlimited)
  -c    emit more compact struct (noop, as this is the default since 0.1.7)
  -d    debug output
  -e    add comments with example
  -j    add JSON tags
  -m    omit empty Text fields
  -max-examples int
        limit number of examples (default 10)
  -n string
        use a different name for the top-level struct
  -o string
        if set, write to output file, not stdout
  -p    write out an example program
  -s    strict parsing and writing
  -t string
        emit struct for tag matching this name
  -u    filter out duplicated examples
  -version
        show version
  -x int
        max chars for example (default 25)

Examples:

$ cat fixtures/a.xml
<a></a>

$ zek -C < fixtures/a.xml
type A struct {
    XMLName xml.Name `xml:"a"`
    Text    string   `xml:",chardata"`
}

Debug output dumps the internal tree as JSON to stdout.

$ zek -d < fixtures/a.xml
{"name":{"Space":"","Local":"a"}}

Example program:

package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
	"log"
	"os"
)

// A was generated 2017-12-05 17:35:21 by tir on apollo.
type A struct {
	XMLName xml.Name `xml:"a"`
	Text    string   `xml:",chardata"`
}

func main() {
	dec := xml.NewDecoder(os.Stdin)
	var doc A
	if err := dec.Decode(&doc); err != nil {
		log.Fatal(err)
	}
	b, err := json.Marshal(doc)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(b))
}

$ zek -C -p < fixtures/a.xml > sample.go && go run sample.go < fixtures/a.xml | jq . && rm sample.go
{
  "XMLName": {
    "Space": "",
    "Local": "a"
  },
  "Text": ""
}

More complex example:

$ zek < fixtures/d.xml
// Root was generated 2019-06-11 16:27:04 by tir on hayiti.
type Root struct {
        XMLName xml.Name `xml:"root"`
        Text    string   `xml:",chardata"`
        A       []struct {
                Text string `xml:",chardata"`
                B    []struct {
                        Text string `xml:",chardata"`
                        C    string `xml:"c"`
                        D    string `xml:"d"`
                } `xml:"b"`
        } `xml:"a"`
}

$ zek -p < fixtures/d.xml > sample.go && go run sample.go < fixtures/d.xml | jq . && rm sample.go
{
  "XMLName": {
    "Space": "",
    "Local": "root"
  },
  "Text": "\n\n\n\n",
  "A": [
    {
      "Text": "\n  \n  \n",
      "B": [
        {
          "Text": "\n    \n  ",
          "C": "Hi",
          "D": ""
        },
        {
          "Text": "\n    \n    \n  ",
          "C": "World",
          "D": ""
        }
      ]
    },
    {
      "Text": "\n  \n",
      "B": [
        {
          "Text": "\n    \n  ",
          "C": "Hello",
          "D": ""
        }
      ]
    },
    {
      "Text": "\n  \n",
      "B": [
        {
          "Text": "\n    \n  ",
          "C": "",
          "D": "World"
        }
      ]
    }
  ]
}
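
The Text values in the output above consist of whitespace only, because a `,chardata` field collects all character data of an element, including the indentation between its children. A minimal sketch with a made-up document (not fixtures/d.xml) and a hypothetical isBlank helper:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
	"strings"
)

// Root is a cut-down struct with a chardata field and a single child.
type Root struct {
	Text string `xml:",chardata"`
	C    string `xml:"c"`
}

// indented is a made-up, pretty-printed document.
const indented = "<root>\n  <c>Hi</c>\n</root>"

// decodeRoot parses a document into Root.
func decodeRoot(s string) (Root, error) {
	var r Root
	err := xml.Unmarshal([]byte(s), &r)
	return r, err
}

// isBlank reports whether s contains nothing but whitespace.
func isBlank(s string) bool {
	return strings.TrimSpace(s) == ""
}

func main() {
	r, err := decodeRoot(indented)
	if err != nil {
		log.Fatal(err)
	}
	// Text holds the indentation around <c>, C holds the actual value.
	fmt.Printf("Text=%q blank=%v C=%q\n", r.Text, isBlank(r.Text), r.C)
}
```

If those fields are not needed, the -m flag listed under Usage omits empty Text fields from the generated struct.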

Annotate with comments:

$ zek -e < fixtures/l.xml
// Records was generated 2019-06-11 16:29:35 by tir on hayiti.
type Records struct {
        XMLName xml.Name `xml:"Records"`
        Text    string   `xml:",chardata"` // \n
        Xsi     string   `xml:"xsi,attr"`
        Record  []struct {
                Text   string `xml:",chardata"`
                Header struct {
                        Text       string `xml:",chardata"`
                        Status     string `xml:"status,attr"`
                        Identifier string `xml:"identifier"` // oai:ojs.localhost:article...
                        Datestamp  string `xml:"datestamp"`  // 2009-06-24T14:48:23Z, 200...
                        SetSpec    string `xml:"setSpec"`    // eppp:ART, eppp:ART, eppp:...
                } `xml:"header"`
                Metadata struct {
                        Text    string `xml:",chardata"`
                        Rfc1807 struct {
                                Text           string   `xml:",chardata"`
                                Xmlns          string   `xml:"xmlns,attr"`
                                Xsi            string   `xml:"xsi,attr"`
                                SchemaLocation string   `xml:"schemaLocation,attr"`
                                BibVersion     string   `xml:"bib-version"`  // v2, v2, v2...
                                ID             string   `xml:"id"`           // http://jou...
                                Entry          string   `xml:"entry"`        // 2009-06-24...
                                Organization   []string `xml:"organization"` // Proceeding...
                                Title          string   `xml:"title"`        // Introducti...
                                Type           string   `xml:"type"`
                                Author         []string `xml:"author"`       // KRAMPEN, G..
                                Copyright      string   `xml:"copyright"`    // Das Urhebe...
                                OtherAccess    string   `xml:"other_access"` // url:http:/...
                                Keyword        string   `xml:"keyword"`
                                Period         []string `xml:"period"`
                                Monitoring     string   `xml:"monitoring"`
                                Language       string   `xml:"language"` // en, en, en, e...
                                Abstract       string   `xml:"abstract"` // After a short...
                                Date           string   `xml:"date"`     // 2009-06-22 12...
                        } `xml:"rfc1807"`
                } `xml:"metadata"`
                About string `xml:"about"`
        } `xml:"Record"`
}
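
Since every field is a string, typed values have to be parsed after decoding. The following sketch assumes the datestamps follow the layout of the example in the comment above (2009-06-24T14:48:23Z, which matches time.RFC3339); the sample document and the helper name parseDatestamps are made up.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
	"time"
)

// Records is a trimmed-down copy of the generated struct above.
type Records struct {
	Record []struct {
		Header struct {
			Identifier string `xml:"identifier"`
			Datestamp  string `xml:"datestamp"`
		} `xml:"header"`
	} `xml:"Record"`
}

// sample is a made-up document; only the datestamp value is taken from
// the example comment in the generated struct.
const sample = `<Records>
  <Record>
    <header>
      <identifier>oai:example:1</identifier>
      <datestamp>2009-06-24T14:48:23Z</datestamp>
    </header>
  </Record>
</Records>`

// parseDatestamps decodes the document and converts each datestamp string
// into a time.Time, assuming the RFC 3339 layout.
func parseDatestamps(s string) ([]time.Time, error) {
	var doc Records
	if err := xml.Unmarshal([]byte(s), &doc); err != nil {
		return nil, err
	}
	var stamps []time.Time
	for _, rec := range doc.Record {
		t, err := time.Parse(time.RFC3339, rec.Header.Datestamp)
		if err != nil {
			return nil, err
		}
		stamps = append(stamps, t)
	}
	return stamps, nil
}

func main() {
	stamps, err := parseDatestamps(sample)
	if err != nil {
		log.Fatal(err)
	}
	for _, t := range stamps {
		fmt.Println(t.Year(), t.Month(), t.Day())
	}
}
```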

Only consider a nested element

$ zek -t metadata fixtures/z.xml
// Metadata was generated 2019-06-11 16:33:26 by tir on hayiti.
type Metadata struct {
        XMLName xml.Name `xml:"metadata"`
        Text    string   `xml:",chardata"`
        Dc      struct {
                Text  string `xml:",chardata"`
                Xmlns string `xml:"xmlns,attr"`
                Title struct {
                        Text  string `xml:",chardata"`
                        Xmlns string `xml:"xmlns,attr"`
                } `xml:"title"`
                Identifier struct {
                        Text  string `xml:",chardata"`
                        Xmlns string `xml:"xmlns,attr"`
                } `xml:"identifier"`
                Rights struct {
                        Text  string `xml:",chardata"`
                        Xmlns string `xml:"xmlns,attr"`
                        Lang  string `xml:"lang,attr"`
                } `xml:"rights"`
                AccessRights struct {
                        Text  string `xml:",chardata"`
                        Xmlns string `xml:"xmlns,attr"`
                } `xml:"accessRights"`
        } `xml:"dc"`
}
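
A struct for a nested element can be filled without decoding the whole document: scan tokens with xml.Decoder and call DecodeElement once the matching start tag shows up. This is a sketch of the reading side only, with a trimmed struct, a made-up sample document and hypothetical helper names; it says nothing about how zek handles -t internally.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"log"
	"strings"
)

// Metadata is a trimmed-down copy of the generated struct above.
type Metadata struct {
	Dc struct {
		Title struct {
			Text string `xml:",chardata"`
		} `xml:"title"`
		Identifier struct {
			Text string `xml:",chardata"`
		} `xml:"identifier"`
	} `xml:"dc"`
}

// sample is a made-up document; the metadata element sits below the root.
const sample = `<record>
  <header><identifier>x</identifier></header>
  <metadata><dc><title>T</title><identifier>id-1</identifier></dc></metadata>
</record>`

// findMetadata walks the token stream and decodes every metadata element.
func findMetadata(r io.Reader) ([]Metadata, error) {
	dec := xml.NewDecoder(r)
	var result []Metadata
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return result, nil
		}
		if err != nil {
			return nil, err
		}
		se, ok := tok.(xml.StartElement)
		if !ok || se.Name.Local != "metadata" {
			continue
		}
		var m Metadata
		if err := dec.DecodeElement(&m, &se); err != nil {
			return nil, err
		}
		result = append(result, m)
	}
}

// findMetadataString is a convenience wrapper for string input.
func findMetadataString(s string) ([]Metadata, error) {
	return findMetadata(strings.NewReader(s))
}

func main() {
	ms, err := findMetadataString(sample)
	if err != nil {
		log.Fatal(err)
	}
	for _, m := range ms {
		fmt.Println(m.Dc.Title.Text, m.Dc.Identifier.Text)
	}
}
```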

Inference across files

$ zek fixtures/a.xml fixtures/b.xml fixtures/c.xml
// A was generated 2017-12-05 17:40:14 by tir on apollo.
type A struct {
	XMLName xml.Name `xml:"a"`
	Text    string   `xml:",chardata"`
	B       []struct {
		Text string `xml:",chardata"`
	} `xml:"b"`
}
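
The combined struct is meant to fit every input file. The sketch below decodes three documents with it: the first two match fixtures/a.xml and fixtures/b.xml as printed in this README, while the third is only an assumption about what a file with a repeated element might look like.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// A is the struct inferred across the three files above.
type A struct {
	XMLName xml.Name `xml:"a"`
	Text    string   `xml:",chardata"`
	B       []struct {
		Text string `xml:",chardata"`
	} `xml:"b"`
}

// docs holds one document per input file. The last entry is an assumed
// example of a repeated <b>, not the verified content of fixtures/c.xml.
var docs = []string{
	`<a></a>`,
	`<a><b></b></a>`,
	`<a><b></b><b></b></a>`,
}

// countB decodes a document and returns the number of <b> children.
func countB(s string) (int, error) {
	var doc A
	if err := xml.Unmarshal([]byte(s), &doc); err != nil {
		return 0, err
	}
	return len(doc.B), nil
}

func main() {
	for _, d := range docs {
		n, err := countB(d)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s has %d <b> element(s)\n", d, n)
	}
}
```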

This is also useful if you deal with archives containing XML files:

$ unzip -p 4082359.zip '*.xml' | zek -e

Given a directory full of zip files, you can combine find, unzip and zek:

$ for i in $(find ftp/b571 -type f -name "*zip"); do unzip -p $i '*xml'; done | zek -e
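
These pipelines produce several XML documents back to back on a single stream. A program that wants to read such a stream with a generated struct can call Decode repeatedly on one xml.Decoder until it returns io.EOF. A minimal sketch with a made-up two-document stream and hypothetical helper names:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"log"
	"strings"
)

// A is a small struct of the kind zek generates.
type A struct {
	XMLName xml.Name `xml:"a"`
	B       []struct {
		Text string `xml:",chardata"`
	} `xml:"b"`
}

// stream is a made-up input: two documents concatenated, as produced by
// something like "unzip -p" over several files.
const stream = "<a></a>\n<a><b></b></a>\n"

// decodeAll reads documents from r until the stream is exhausted.
func decodeAll(r io.Reader) ([]A, error) {
	dec := xml.NewDecoder(r)
	var docs []A
	for {
		var doc A
		err := dec.Decode(&doc)
		if err == io.EOF {
			return docs, nil
		}
		if err != nil {
			return nil, err
		}
		docs = append(docs, doc)
	}
}

// decodeAllString is a convenience wrapper for string input.
func decodeAllString(s string) ([]A, error) {
	return decodeAll(strings.NewReader(s))
}

func main() {
	docs, err := decodeAllString(stream)
	if err != nil {
		log.Fatal(err)
	}
	for i, doc := range docs {
		fmt.Printf("document %d: %d <b> element(s)\n", i, len(doc.B))
	}
}
```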

Another example (tarball with thousands of XML files, seemingly MARC):

$ tar -xOzf /tmp/20180725.125255.tar.gz | zek -e
// OAIPMH was generated 2018-09-26 15:03:29 by tir on sol.
type OAIPMH struct {
        XMLName        xml.Name `xml:"OAI-PMH"`
        Text           string   `xml:",chardata"`
        Xmlns          string   `xml:"xmlns,attr"`
        Xsi            string   `xml:"xsi,attr"`
        SchemaLocation string   `xml:"schemaLocation,attr"`
        ListRecords    struct {
                Text   string `xml:",chardata"`
                Record struct {
                        Text   string `xml:",chardata"`
                        Header struct {
                                Text       string `xml:",chardata"`
                                Identifier struct {
                                        Text string `xml:",chardata"` // aleph-pub:000000001, ...
                                } `xml:"identifier"`
                        } `xml:"header"`
                        Metadata struct {
                                Text   string `xml:",chardata"`
                                Record struct {
                                        Text           string `xml:",chardata"`
                                        Xmlns          string `xml:"xmlns,attr"`
                                        Xsi            string `xml:"xsi,attr"`
                                        SchemaLocation string `xml:"schemaLocation,attr"`
                                        Leader         struct {
                                                Text string `xml:",chardata"` // 00001nM2.01200024
                                        } `xml:"leader"`
                                        Controlfield []struct {
                                                Text string `xml:",chardata"` // 00001nM2.01200024
                                                Tag  string `xml:"tag,attr"`
                                        } `xml:"controlfield"`
                                        Datafield []struct {
                                                Text     string `xml:",chardata"`
                                                Tag      string `xml:"tag,attr"`
                                                Ind1     string `xml:"ind1,attr"`
                                                Ind2     string `xml:"ind2,attr"`
                                                Subfield []struct {
                                                        Text string `xml:",chardata"` // KM0000002
                                                        Code string `xml:"code,attr"`
                                                } `xml:"subfield"`
                                        } `xml:"datafield"`
                                } `xml:"record"`
                        } `xml:"metadata"`
                } `xml:"record"`
        } `xml:"ListRecords"`
}
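
With a struct like this, looking up a field comes down to a pair of nested loops. The sketch below works on the inner record only; the sample record, the tag and code values and the helper names are made up for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Record is a trimmed-down copy of the inner record struct above.
type Record struct {
	Datafield []struct {
		Tag      string `xml:"tag,attr"`
		Subfield []struct {
			Text string `xml:",chardata"`
			Code string `xml:"code,attr"`
		} `xml:"subfield"`
	} `xml:"datafield"`
}

// marc is a made-up record for illustration.
const marc = `<record>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
    <subfield code="b">a subtitle</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Examples</subfield>
  </datafield>
</record>`

// parseRecord decodes a single record.
func parseRecord(s string) (Record, error) {
	var rec Record
	err := xml.Unmarshal([]byte(s), &rec)
	return rec, err
}

// subfields returns the values of all subfields with the given code
// inside datafields with the given tag.
func subfields(rec Record, tag, code string) []string {
	var values []string
	for _, df := range rec.Datafield {
		if df.Tag != tag {
			continue
		}
		for _, sf := range df.Subfield {
			if sf.Code == code {
				values = append(values, sf.Text)
			}
		}
	}
	return values
}

func main() {
	rec, err := parseRecord(marc)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(subfields(rec, "245", "a"))
}
```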

Generate a package

If you want to include the generated file in the build process, e.g. with go generate, you may find -P and -o helpful.

$ cat fixtures/b.xml
<a><b></b></a>

Run on the command line or via go generate:

$ zek -P mypkg -o data.go < fixtures/b.xml

This would write the following to data.go:

// Code generated by zek; DO NOT EDIT.

package mypkg

import "encoding/xml"

// A was generated 2021-09-16 11:23:06 by tir on trieste.
type A struct {
        XMLName xml.Name `xml:"a"`
        Text    string   `xml:",chardata"`
        B       string   `xml:"b"`
}

Note that any existing file will be overwritten, without any warning.
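
A sketch of how this might be wired up. go generate does not run commands through a shell, so the `<` redirect is not available there; the directive in the comment below passes the file as an argument instead, on the assumption that this works with -P and -o the same way it does in the other examples. The type is inlined only to keep the sketch self-contained, and decodeA is a hypothetical helper.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// In a real project this type would live in the generated data.go (package
// mypkg) and be refreshed by a directive such as the following, placed in a
// hand-written file of the same package:
//
//	//go:generate zek -P mypkg -o data.go fixtures/b.xml
//
// It is inlined here only to keep the sketch self-contained.
type A struct {
	XMLName xml.Name `xml:"a"`
	Text    string   `xml:",chardata"`
	B       string   `xml:"b"`
}

// decodeA (hypothetical helper) parses a document into the generated type.
func decodeA(s string) (A, error) {
	var doc A
	err := xml.Unmarshal([]byte(s), &doc)
	return doc, err
}

func main() {
	doc, err := decodeA("<a><b>hello</b></a>")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%q\n", doc.B)
}
```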

Misc

As a side effect, zek seems to be useful for debugging. Example:

This record is emitted from a typical OAI server (OJS, not even uncommon), yet one can quickly spot the flaw in the structure.

Over 30 different structs were generated manually in the course of a few hours (around five minutes per source): https://git.io/vbTDo.

-- Current extent leader: 1532 lines struct
