• This repository has been archived on 25/Mar/2019
  • Stars
    star
    107
  • Rank 323,587 (Top 7 %)
  • Language
  • Created over 10 years ago
  • Updated over 9 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Defining a standard JavaScript CST (concrete syntax tree) to complement ASTs

Update: This effort has moved to estree, so this repo has now been shut down and deprecated.

Concrete Syntax Tree

Defining a standard JavaScript CST (concrete syntax tree) to complement ASTs.

Problem

The standard SpiderMonkey AST format holds only information which is considered necessary for producing valid execution of a JS program. Parsing to this format explicitly discards certain "unnecessary" information such as comments, whitespace, and perhaps ignored syntactic forms like extra ( ) pairs, extra ;, etc.

Unfortunately, there's a class of emerging tools for which the AST is insufficient as an intermediate representation format, because loss of this information defeats the purpose of the tools. Whitespace, comments, etc need to be representable in the tree, so that a tool which wants to produce or use them, can.

More back-reading on this topic can be found in this really long thread.

Examples:

  • a JSDoc interpreter needs the comments, and in particular, exactly where and how they appear in the code is strictly important information.
  • a code formatter may only concern itself with modifying part of a JS program's formatting, and would need to preserve other code exactly as it originally appeared.

Goal

Our goal is to define an alternate tree format, called a CST (concrete syntax tree), which holds the additional information these other tools need. Of course, any tool could produce a CST, not just parsers: transpilers from other languages, code formatters, etc.

We need one standard format that all the various tools can agree to use as an intermediate form, so that processing flows can integrate multiple tools in the handling of a snippet of source code.

Plan

PLEASE GO READ THE CHARTER/MISSION FIRST

This is an open group, an ad hoc working committee if you will, where all interested parties can register their interest to participate. Ultimately, it is the tools makers (who must participate to make this work!) who will decide if what we come up with is suitable. The success of this group effort is if tools makers agree and implement what we propose.

We will conduct all discussions on this repo, in the open, or at least post transcribed notes here from any teleconference/online meetings. It seems like a good idea to have at least one initial meeting to kick things off, but we will self-organize here and make such decisions as we go.

"Register" Interest

Please go to this issue thread and add a comment to "register" your interest in this "committee".

We'll all figure out next steps after people have a chance to join.

Design

Some stated design biases (to be validated):

  • a CST should be pretty similar to the standard AST, as much as possible, so as to make conversion between CST and AST (or vice versa?) straightforward.
  • the CST must be a format that a code parser could practically produce while parsing, either by default or by option, as the main use-case of this format is to prevent information-loss during parsing.
  • most importantly, a CST format must be suitable for code generators to use while producing standard JS code.

Current Proposals

There were some discussions on this topic awhile back, and two rough ideas for CST format emerged. These should serve to inform the design process, but not limit/control it.

To illustrate the two proposals' approaches, consider this sample piece of code:

/*1*/ function /*2*/ foo /*3*/ ( /*4*/ ) /*5*/ {
  // ..
}

This program is parsed into the following AST:

{
    "type": "Program",
    "body": [
        {
            "type": "FunctionDeclaration",
            "id": {
                "type": "Identifier",
                "name": "foo"
            },
            "params": [],
            "defaults": [],
            "body": {
                "type": "BlockStatement",
                "body": []
            },
            "rest": null,
            "generator": false,
            "expression": false
        }
    ]
}

As you can see, there's whitespace and comments appearing in a several different syntactic "positions" in that function declaration header. Now, if you consider the current standard AST, not all of those syntactic positions have natural AST representations where these "extras" (comments, whitespace, etc) can be attached.

For example, /*4*/ appears inside an empty parameters-list. But how could we represent the whitespace/comments there in the above AST? The params list is an array, and in JSON/object-literal format, an array cannot also have properties.

We need a CST to do that.

So, briefly, a CST could look like:

  1. These "extras" annotations could be represented by an "extras" property attached to any node, with three standard position names as properties of the "extras" annotation: "before", "inside", "after".

    Since those names aren't sufficient to describe all syntactic forms with just the standard AST, such as /*4*/ above, we can add additional synthetic/virtual nodes to the tree (parent or child nodes) that act as anchors to hang these "extras" annotations in the proper positions.

    We could solve this by adding a node called paramList in the function declaration, which is an object. This node could have as a child property the params array seen above. Or it could just be a sibling to params in the function declaration. Either way, paramList as an object can have a property called extras, and in that object, the inside position descriptor could be an array to hold the whitespace and comments items that appear in between the ( ).

    Another option (which isn't as obvious here but would end up helping in a lot of other places) is to define a special virtual node type like EmptyExpression, which is a node that acts as a positional placeholder, but which would strictly imply no actual code when being processed. This node could be a child of params only if params was otherwise empty. That may help some cases, but it could make some tools' jobs a little harder in that they are in the habit of checking the length of the array only, not its contents.

    Regardless of how or where these virtual nodes sit, it underscores the idea that a CST would probably need to be "converted" (that is, have all the virtual nodes stripped) to a standard AST before some AST-only-aware tools could work with the tree. That would be an explicit, and fairly trivial to implement, conversion step, provided by tools like parsers for instance, like makeASTFromCST(..).

    The upside is that we only have 3 position names to know about and handle ("before", "inside", "after"). That makes tooling logic easier in the generation and consumption phases. The downside is that it creates extra nodes, which means current tools that deal with ASTs either need a "conversion" which strips the CST down to an AST, or have to change to be able to handle the presence of virtual nodes it should ignore in its processing. It could make some code paths a bit harder, while making other code paths much easier. Tradeoffs abound.

  2. Instead of adding extra virtual nodes, we can just limit which nodes can have "extras" annotations to those which are already objects. This means that we would need to have potentially lots more "position descriptors", as "before", "inside", and "after" would not be sufficient in all the various different JS syntactic forms.

"inside" when attached to a function declaration could always mean "inside the param list", and so whether there are params or not, the whitespace/comments extras can always be represented. In the above example, "inside" kinda works, but there are lots of other syntactic forms where different position names would have to be used. Instead of knowing/handling just 3 positional labels, we might have 5-10 different labels (not all of which made sense in different nodes).

The upside is that a CST is structurally the same as an AST (simplifying some existing tools handling ASTs), with just extra properties which could theoretically be either used or ignored easily. The downside is that producing and consuming these extras is a fair bit more complicated as there are quite a few different positional names to handle.

Chicken Or The Egg?

Another way to boil down the contrast between these two proposals is: Is an AST a CST, or is a CST an AST? Or, which comes first, the AST or the CST?

The first proposal is that all ASTs are CSTs, but not vice versa. Any (future) tool which could take a (future) CST could just take an AST, and work just fine (without extras, of course!). Any existing tool which expects ASTs would probably need you to "convert" (that is, strip out the virtual nodes) a CST down to a base AST, so that they didn't need to handle all those virtual nodes.

The second proposal is essentially the reverse: all CSTs are ASTs, but not vice versa. Any tool which currently takes ASTs can start taking CSTs (without conversion) and not have to change (as long as it's tolerant to ignore properties it doesn't care about, which is sort of an implicit "conversion" from CST to AST). Any tool which would take a (future) CST could take an AST (without extras, of course!).

Neither proposal is perfect. They both imply different tradeoffs. They each imply that some code paths will be simpler and some will be harder. It's not totally clear that either is the right choice, or that another option wouldn't be better. These are things that should be explored and debated.

License & Assignment

By participating in this group, you are explicitly granting/assigning any and all ideas you present to the group to be licensed under CC0 public domain, and you agree that neither you nor anyone else who participates in any of these discussions will retain any intellectual property rights (copyright, patentability, etc) over what is submitted publicly to our discussions.

Simply do not participate if you have any potential or real conflict that prevents you from agreeing to this process under the spirit of those terms.

More Repositories

1

You-Dont-Know-JS

A book series on JavaScript. @YDKJS on twitter.
178,746
star
2

Functional-Light-JS

Pragmatic, balanced FP in JavaScript. @FLJSBook on twitter.
JavaScript
16,593
star
3

LABjs

Loading And Blocking JavaScript: On-demand parallel loader for JavaScript with execution order dependencies
HTML
2,277
star
4

asynquence

Asynchronous flow control (promises, generators, observables, CSP, etc)
JavaScript
1,743
star
5

CAF

Cancelable Async Flows (CAF)
JavaScript
1,332
star
6

monio

The most powerful IO monad implementation in JS, possibly in any language!
JavaScript
1,049
star
7

TNG-Hooks

Provides React-inspired 'hooks' like useState(..) for stand-alone functions
JavaScript
1,011
star
8

native-promise-only

A polyfill for native ES6 Promises as close as possible (no extensions) to the strict spec definitions.
JavaScript
723
star
9

A-Tale-Of-Three-Lists

Comparing various async patterns for a single demo
JavaScript
651
star
10

fasy

FP iterators that are both eager and asynchronous
JavaScript
546
star
11

FPO

FP library for JavaScript. Supports named-argument style methods.
JavaScript
449
star
12

youperiod.app

YouPeriod.app -- the privacy-first period tracking app
JavaScript
442
star
13

JSON.minify

Simple minifier for JSON to remove comments and whitespace
409
star
14

TypL

The Type Linter for JS
JavaScript
374
star
15

h5ive-DEPRECATED

**DEPRECATED** A collection of thin facade APIs wrapped around HTML5 JavaScript features.
JavaScript
324
star
16

eslint-plugin-proper-arrows

ESLint rules to ensure proper arrow function definitions
JavaScript
305
star
17

foi-lang

Foi: a different kind of functional programming language
JavaScript
303
star
18

grips

Simple-logic templates
JavaScript
287
star
19

moduloze

Convert CommonJS (CJS) modules to UMD and ESM formats
JavaScript
207
star
20

ES-Feature-Tests

Feature Tests for JavaScript
JavaScript
199
star
21

let-er

DEPRECATED: Transpile non-ES6 let-blocks into ES6 (or ES3)
JavaScript
192
star
22

Incomplete-JS

"The Incomplete Guide to JavaScript" (book). @IncompleteJS on twitter.
190
star
23

revocable-queue

Specialized async queue data structure, supports revocation of values
JavaScript
176
star
24

deePool

highly-efficient pool for JavaScript objects
JavaScript
118
star
25

getify.github.io

JavaScript
113
star
26

eslint-plugin-proper-ternary

ESLint rules to ensure proper usage of ternary/conditional expressions
JavaScript
96
star
27

cloud-sweeper

A casual game built for the web.
JavaScript
94
star
28

BikechainJS

JavaScript VM engine (powered by V8); server-side environment modules; server-side synchronous web app controllers
C++
82
star
29

wepuzzleit

Demo PoC game for various advanced HTML5 js APIs
JavaScript
79
star
30

workshop-periodic-table

66
star
31

elasi

EL/ASI: Encrypt Locally, Account Secure Interchange
JavaScript
62
star
32

remote-csp-channel

Remote bridge for CSP channels
JavaScript
55
star
33

ScanTree

Scan a JS file tree to build an ordered and grouped dependency listing
JavaScript
51
star
34

dwordly-game

A game where words dwindle down to the shortest possible
JavaScript
42
star
35

stable-timers

timers with less race conditions
JavaScript
38
star
36

emdash

Simple blogging with node/iojs + GitHub.
36
star
37

domio

DOM and Event helpers for Monio
JavaScript
30
star
38

eslint-plugin-no-optional-call

ESLint plugin to disallow the optional-call operator
JavaScript
30
star
39

eslint-plugin-arrow-require-this

DEPRECATED: ESLint rule to require arrow functions to reference the 'this' keyword
JavaScript
28
star
40

import-remap

Rewrite ES module import specifiers using an import-map.
JavaScript
25
star
41

gum-try-hd

Try to enforce HD (16:9) camera aspect ratio for web-video calls
JavaScript
25
star
42

Mock-DOM-Resources

A mock of (parts of) the DOM API to simulate resource preloading and loading
JavaScript
25
star
43

make-a-game

some project files for a tutorial on making a simple web game
JavaScript
25
star
44

mpAjax

framework plugin for handling multi-part Ajax responses
JavaScript
23
star
45

asyncGate.js

DEPRECATED. asynchronous gate for javascript
JavaScript
21
star
46

tic-tac-toe-workshop

Workshop files for building tic-tac-toe in JS and <canvas>
21
star
47

esre

esre: fully configurable JS code formatting
20
star
48

workshop-chess-diagonals

17
star
49

featuretests.io

JavaScript Feature Tests... as a service
JavaScript
16
star
50

FoilScript

FoilScript: a new dialect of JS that fixes the sucky parts but still looks and feels like JS
16
star
51

literalizer

Specialized heuristic lexer for JS to identify complex literals
JavaScript
16
star
52

normalize-selector

Normalize CSS selectors
JavaScript
14
star
53

DOMEventBridge

Bridge DOM events to a JS event hub (for pubsub)
JavaScript
14
star
54

pong-around-workshop

Workshop files for building a pong-variant game in JS and <canvas>
12
star
55

workshop-wordy-unscrambler

11
star
56

shortie.me

Proof-of-concept demo for server-side JavaScript driven "middle-end" architecture
JavaScript
11
star
57

workshop-knights-dialer

8
star
58

middleend-boilerplate

Boilerplate Starting Point for Middle-end Web Architecture
JavaScript
8
star
59

demo-app-weatheround

JavaScript
7
star
60

the-economy-of-keystrokes-slides

Slides code built for "The Economy of Keystrokes" talk
JavaScript
6
star
61

santa-connect-app

Santa Connect: keeping track of your kids' nice/naughty status
JavaScript
5
star
62

nyc-bug-demo

bug demo for NYC code coverage tool
JavaScript
4
star
63

unnamed

unnamed (for now). nothing to see here. ;-)
JavaScript
2
star
64

my-lifesheets

CSS
1
star
65

breakthewebforward.com

JavaScript
1
star