  • Stars: 205
  • Rank 191,226 (Top 4 %)
  • Language: JavaScript
  • License: Other
  • Created over 9 years ago
  • Updated almost 3 years ago

Repository Details

JavaScript parser for the Archie Markup Language (ArchieML)

ArchieML

Parse Archie Markup Language (ArchieML) documents into JavaScript objects.

Read about the ArchieML specification at archieml.org.

The current version is v0.5.0.

Installation

npm install archieml

Usage

In the browser:

<script src="archieml.js"></script>

<script type="text/javascript">
  var parsed = archieml.load("key: value");
  // parsed is {"key": "value"}
</script>

In Node.js:

var archieml = require('archieml');
var parsed = archieml.load("key: value");
// parsed is {"key": "value"}

Parser options

Inline comments are now deprecated in ArchieML. They will continue to be supported until 1.0, but are now disabled by default. They can be enabled by passing an options object as the second parameter in load:

archieml.load("key: value [comment]");
// returns {"key": "value [comment]"}

archieml.load("key: value [comment]", {comments: true});
// returns {"key": "value"}
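If you need the same options in many places, one way to centralize them is a small wrapper, sketched below. The name makeLoader is hypothetical and not part of archieml. The parser is passed in as an argument, so the sketch assumes only that it exposes load(text, options) as described above.

```javascript
// Hypothetical wrapper; `parser` is any object with a load(text, options)
// method, such as the module returned by require('archieml').
function makeLoader(parser, defaults = {}) {
  return function load(text, overrides = {}) {
    // Per-call overrides win over the shared defaults.
    return parser.load(text, Object.assign({}, defaults, overrides));
  };
}

// Usage with the real module (requires `npm install archieml`):
// const load = makeLoader(require('archieml'), { comments: true });
// load('key: value [comment]');
```

Keeping the flag in one place should make it easier to switch inline comments off everywhere once they are removed.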

Using with Google Documents

We use archieml at The New York Times to parse Google Documents containing AML. This requires a little upfront work to download the document and convert it into text that archieml can load.

The first step is authenticating with the Google Drive API, and accessing the document. For this, you will need a user account that is authorized to view the document you wish to download.

For this example, I'm going to use a simple Node app built on Google's official googleapis npm package, but you can use another library or authentication method if you like. Whatever mechanism you choose, you'll need to be able to export the document as either text or HTML, and then run some of the post-processing shown in the example file at examples/google_drive.js.
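To give a rough idea of what such post-processing can involve, here is a minimal sketch that turns a small HTML fragment into plain text. It is not the code in examples/google_drive.js: the function name htmlToArchieText is hypothetical, it uses regular expressions rather than a real HTML parser, and it decodes only a handful of entities.

```javascript
// Hypothetical helper, for illustration only.
// Converts a simple HTML export into text an ArchieML parser can read.
function htmlToArchieText(html) {
  // The only entities this sketch decodes; a real implementation would
  // use a complete entity decoder.
  const entities = {
    '&amp;': '&', '&lt;': '<', '&gt;': '>',
    '&quot;': '"', '&#39;': "'", '&nbsp;': ' '
  };

  return html
    // Paragraph ends and line breaks become newlines.
    .replace(/<\/p>|<br\s*\/?>/gi, '\n')
    // Drop every remaining tag.
    .replace(/<[^>]+>/g, '')
    // Decode the entities listed above in a single pass.
    .replace(/&(amp|lt|gt|quot|#39|nbsp);/g, (match) => entities[match])
    // Replace typographic quotes with plain ASCII quotes.
    .replace(/[\u201C\u201D]/g, '"')
    .replace(/[\u2018\u2019]/g, "'")
    .trim();
}
```

The resulting string can then be passed to archieml.load. Tags are stripped before entities are decoded so that escaped markup such as &lt;b&gt; survives as literal text.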

You will need to set up a Google API application in order to authenticate yourself. Full instructions are available here. When you create your Client ID, list http://127.0.0.1:3000 as an authorized origin and http://127.0.0.1:3000/oauth2callback as the callback URL.

Then open examples/google_drive.js and enter the CLIENT_ID and CLIENT_SECRET from the API account you created. Next, install the dependencies and run the server:

$ npm install archieml
$ npm install express
$ npm install googleapis
$ npm install htmlparser2
$ npm install html-entities

$ node examples/google_drive.js

You should then be able to go to http://127.0.0.1:3000/KEY, where KEY is the file id of the Google Drive document you want to parse. Make sure that the account you created has access to that document.

To start, you can use a test document that is public to everyone. The server will ask you to authenticate your current session and then return a JSON representation of the document. View the source of examples/google_drive.js for step-by-step instructions on what's being done.

http://127.0.0.1:3000/1JjYD90DyoaBuRYNxa4_nqrHKkgZf1HrUj30i3rTWX1s
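If you only have the document's sharing link, the file id is the path segment after /document/d/. The sketch below builds the local URL from such a link. Both function names are hypothetical, and the default base assumes the example server is listening on 127.0.0.1:3000.

```javascript
// Hypothetical helper: extracts the file id from a Google Docs URL,
// or returns null if the URL does not look like a document link.
function fileIdFromDocsUrl(url) {
  const match = /\/document\/d\/([A-Za-z0-9_-]+)/.exec(url);
  return match ? match[1] : null;
}

// Hypothetical helper: builds the URL to request from the local example
// server. The default base is an assumption about where it is running.
function localParseUrl(docsUrl, base = 'http://127.0.0.1:3000') {
  const id = fileIdFromDocsUrl(docsUrl);
  if (id === null) {
    throw new Error('Not a Google Docs document URL: ' + docsUrl);
  }
  return base + '/' + id;
}
```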

Tests

A full shared test suite is included from the archieml.org repository, under /test. After running npm install, initialize the shared test submodules (git submodule init && git submodule update) and run npm run test to execute the tests.

Changelog

  • 0.5.0 - Added support for implicit object nesting.
  • 0.4.2 - Fixes bug #19.
  • 0.4.1 - Fixes bug #21.
  • 0.4.0 - Updates to how dot-notation is handled in freeform arrays. Added Unicode key support.
  • 0.3.1 - Added support for freeform arrays.
  • 0.3.0 - Added support for nested arrays. Follows modifications in ArchieML CR-20150509.
  • 0.2.0 - Arrays that are redefined now overwrite the previous definition. Skips within multi-line values break up the value. Follows modifications in ArchieML CR-20150306.
  • 0.1.2 - More consistent handling of newlines. Fixes issue #4, around detecting the scope of multi-line values.
  • 0.1.1 - Fixes issue #1, removing comment backslashes.
  • 0.1.0 - Initial release supporting the first version of the ArchieML spec, published 2015-03-06.

More Repositories

1

ai2html

A script for Adobe Illustrator that converts your Illustrator artwork into an html page.
JavaScript
897
star
2

who-the-hill

Who The Hill: An MMS-based facial recognition service for members of Congress.
Python
224
star
3

about-int

Meet the Interactive News Technology (INT) desk at The New York Times
180
star
4

elex

A wrapper for the AP v2 Elections API.
Python
171
star
5

stevedore

search document dumps: ingest and explore in one extensible framework
JavaScript
124
star
6

archieml.org

The archieml.org website and hub for specification development
HTML
81
star
7

driveshaft

Google Drive → JSON → S3
Ruby
74
star
8

foialawya

an app for keeping track of your FOIAs and getting alerts when they're (over) due
Python
53
star
9

adcom

Admin Components
JavaScript
53
star
10

ai-scripts

A collection of useful Illustrator scripts
JavaScript
39
star
11

nyt-clerk

A set of Python modules for downloading, parsing, and outputting data related to the Supreme Court.
Python
39
star
12

archieml-ruby

Ruby parser for the Archie Markup Language (ArchieML)
Ruby
30
star
13

rocketdocket

The fastest, cleanest, most reproducible ways to OCR a document.
Shell
27
star
14

nyt-pyfec

A Python library for downloading, parsing and cleaning Federal Election Commission filings.
Python
27
star
15

odf

Scripts and tools to help in parsing the Olympic Data Feed from the International Olympic Committee
Ruby
25
star
16

nyt-entity-service

A web service for disambiguating and canonically storing entities.
Python
25
star
17

compstat_parser

Parse the NYPD's weekly per-precinct crime complaints stats to CSV or MySQL
Ruby
24
star
18

elex-loader

The NYT AP election loader scripts
Shell
22
star
19

nyt-forcible-entry

Deaths resulting from forcible-entry search warrant raids from 2010 to 2016
21
star
20

nyt-fec

a smaller, cleaner, campaign finance app that complements the new FEC site
Python
21
star
21

geoip-legacy

A simple, Node-based service for providing geolocation data based on a user's IP address. Also includes a client-side implementation to help use geolocation information in client-side apps.
JavaScript
16
star
22

nyt-docket

A Python client for parsing SCOTUS cases from the granted/noted and orders dockets. https://pypi.python.org/pypi/nyt-docket
Python
15
star
23

apfake

A command-line tool for generating AP API JSON files for testing elections applications.
Python
15
star
24

capital_git

Use git as a database. Wrapper around https://github.com/libgit2/rugged
Ruby
13
star
25

int-newsapps-template

A template for creating new INT News Apps applications in Django or Flask
HTML
13
star
26

context

Securely stores and conveniently retrieves environment variables in etcd or Redis.
Go
12
star
27

nyt-scotus

A Django app for accessing and editing Supreme Court data.
Python
11
star
28

fec2json

turn fec files into json
Python
10
star
29

ap-deja-vu

A small web service that will replay captured JSON from an AP election test.
HTML
8
star
30

nyt-scotusbot

A SlackBot for notifying NYTimes reporters and editors about changes to the Supreme Court's docket, grants and orders.
Python
8
star
31

nyt-campfinbot

A SlackBot for notifying NYTimes reporters and editors about filings to the Federal Election Commission's web site.
Python
8
star
32

lambda-gem-builder

Build Ruby Gems using AWS Lambda and host them statically on S3
JavaScript
8
star
33

nyt-nj-campfin

Scrapers for NJ campaign finance data
Python
8
star
34

fec-csv-sources

CSV headers and column sources for parsing FEC filings.
7
star
35

elex-ftp-loader

A simple loader for AP FTP elections results.
Python
7
star
36

moving_summonses_parser

Ruby
7
star
37

readme_templates

We want our repos to have more and better developer documentation. Here are some templates for READMEs for different kinds of projects that you can copy (and edit, and contribute to).
7
star
38

nyt_inmates

Methodology notes and data from the series on discipline and parole in New York State
7
star
39

ap-precinct-parser

Parses precinct-level AP election results
Python
6
star
40

datasettr

A Python library for wrangling CSVs into SQLite databases for serving with Datasette.
6
star
41

elex-admin

A CRUD admin for editing AP election results data, including names and race calls.
HTML
6
star
42

longshore

Build server for Docker.
Go
6
star
43

archieml-loader

A very quick and simple Webpack loader for ArchieML files.
JavaScript
5
star
44

SEC

Data for S.E.C. Enforcement Story
5
star
45

kubernetes-dns-reverse-proxy

Proxy server to route traffic to the right kubernetes local hostname
Go
4
star
46

promise

An active HTTP reverse-proxy backed by etcd.
Go
4
star
47

docker-rails

The base Ruby image with additions to support Rails.
Dockerfile
4
star
48

nyt-pyiap

A set of Python functions and middlewares for common frameworks for validating JWT tokens set by Google IAP.
Python
4
star
49

euro

Our XML parser for the 2016 Euro data from Opta sports.
PHP
4
star
50

nyt-entity-uploader

A Python wrapper for making requests to the NYT Entity Service API
Python
4
star
51

nyt-elections-admin

A simple Django-based administration interface for an election loader.
Python
4
star
52

nyt-screenshot-service

A lightweight screenshotting service backed by Google Cloud Storage.
Python
3
star
53

nyt-scotus-loader

A set of bash scripts for loading Supreme Court data into a Postgres database.
Shell
3
star
54

campfin-loader

archived on 2022-05-31 as a cleanup of old campaign finance code
Python
2
star
55

simple-wind

JavaScript
2
star
56

elex-micro

Everything you like about Elex, only less.
Python
2
star
57

replay-ap

An UPDATED, RE-NAMED engine for recording, storing, and replaying Associated Press elections.
Python
2
star
58

nyt-prb-scraper

A scraper and parser for the Periodic Review Secretariat's web pages for Guantanamo detainees.
Python
1
star
59

stevedore-uploader

the uploader for stevedore (github.com/newsdev/stevedore)
Ruby
1
star
60

remora

A utility for tracking a Docker container using etcd.
Go
1
star
61

newsapps-scraper-txair

An air emissions events scraper for Climate.
Python
1
star
62

kube-test-app

Simple dockerized Node.js app
JavaScript
1
star