The first early access release of ANTLR Yggdrasil is out; please download, comment, and report defects and misfeatures. ANTLR Yggdrasil supports rapid development for language processors, including grammar generation for multi-pass translation, an attribute syntax with templated actions to support full separation of concerns, target-language-independent grammars (although Java is currently the only target), and an “output grammar” mapping of grammar/templates for output generation. Unlike ANTLR 4, ANTLR Yggdrasil is geared towards grammar-driven multi-pass language transformation. Sorcerer and ANTLR tree grammars were an important advance in language processing; Yggdrasil builds on Ter’s pioneering efforts, and is as much about building data structures as it is about language recognition. Early versions will not support graph parsing (I have not yet worked out attributes for graph rewrites), graph transformation, and some other patent pending graph technology, but the commercial versions will.
Early access versions are “free” for evaluation purposes, but the end product will be commercial. I intend the release implementations to be inexpensive for personal and educational use. (Much of the code will be released BSD license eventually, but the recognizer generation code will remain proprietary.) This is a consequence of LGLL: it has a wide range of potential applications beyond the usual formal language processing, and I intend to take advantage of that. [I decided to file for patent protection as a consequence of Micron’s Automata processor; LGLL is more capable, and after finding an algorithmic solution to the problem of excessive code generation, I am even more convinced.]
ANTLR Yggdrasil incorporates a revolutionary parsing algorithm (patent pending). Linear GLL recognition coupled with PEG-style ambiguity resolution provides makes for fast parsing, while semantic and syntactic predicates deal with context-sensitivity. Linear GLL handles both left- and right-recursive grammars.
The major stumbling block in getting a version of the LGLL algorithm for practical use was in getting generated code down to a manageable size—early tests were generating 7-10 MB files, and one of the grammars for the grammar generator produced 32 MB of output. It took awhile, but I discovered a collection of graph processing algorithms that define a (patent pending) process that limits the size of generated parsers. The current version generates parsers that are bulky (a few hundred KB, like previous versions of ANTLR) rather than enormous.
ANTLR Yggdrasil is heavily dependent on templates, and releases will include my StringTemplate compiler. It is currently usable (I’ve not yet shifted ANTLR Yggdrasil from ST4) and has both Java and C++ targets, but needs more work. Then again, the point of early access releases is tool maturation, and bug reports and other feedback will be much appreciated.
The ea1 release includes source code for the runtime, StringTemplate compiler, and GrammarGen (the grammar generator). That is to make up for deficiencies in documentation—no doxygen-generated API docs, for example. There is a manual (in the doc directory). The lib directory (or rather, directories since STC is provided with its own directory tree) contains the supporting jar files as well as the Yggdrasil jar.
ANTLR Yggdrasil implements the philosophy that language processing is about transformation. It rejects the notion that language processing is about semantics; language semantics can change depending on the problem to be solved. The point of using a domain-specific language is that the semantics of domain specification is made implicit and need not explicitly appear in any programs written in that language. Similar languages applied to different problem domains can easily differ semantically. Consider, for example, the expression A + B. If A is 1 and B is 2, then A+B = 3. If A is “a” and B is “b”, then A+B = “ab” in many modern programming languages. If the context is “numerical processing”, you get one result; if the context is “string processing”, you get another. In strongly typed languages, that context is provided by the declarations that define A and B; in typeless (or “duck typed”) languages, that context is provided by most recent assignments to A and B. Rather than focusing on “semantics”, it is more practical to focus on “context capture”. Instead of the context-free semantics of Knuth’s attribute grammars to compute synthetic attribute values for non-terminals, ANTLR Yggdrasil focuses on the decoration of syntax trees and grammar-level attributes and to hide the actual computation of synthetic attributes. Just as semantics is an issue of context, so too the choice of synthetic attributes and their computation may be a function of the problem being solved. At the grammar level, attributes are only used for context-dependent decisions (semantic predicates); the computation of a synthetic attribute value is a separable concern.
The ANTLR Yggdrasil approach to language processing is an outgrowth of the principles that Parr built Sorcerer on: annotation-guided tree construction and predicated context-free grammars for processing syntax trees. It began with algorithms for automating output grammar construction to match the results of an annotated input grammar. Then token construction was added to provide finer control over output tree construction with first-class syntax in order to support recognition for grammar generation. That implementation led to the realization that an attribute type system could be mapped directly onto target-language types with manipulation limited to instantiation and get/set operations. Altogether, this technology leads to annotated grammars which map input token stream to output data structure; tokenized data structure provides the token stream for the next pass. Having gotten this far, the “last” issue for rapid application development was textual output; that problem was solved by adding an additional annotation system to map grammar annotations onto templates for output.
Only—this was not quite the last issue. Just as it is possible to parse syntax trees via serialization to token stream from preorder traversal and the introduction of UP/DOWN token types, it is also possible to parse graphs through preorder traversal and the introduction of another set of token types. The machinery for that is working, but not integrated. Instead, that technology led to the patents described on the “Technology” page. With the technology present in the early access versions, ANTLR Yggdrasil can make short work of the typical language processing problem; with the added graph technology, the commercial versions should be able to make short work of any compiler problem.
While it is not yet integrated with ANTLR Yggdrasil for code generation, the StringTemplate compiler, STC, is part of the release, along with the grammar generator. Neither of these are mature—the grammar generator does not propagate attributes, and STC is halfway through a conversion to supporting globally scoped values—but I have found both quite useful. Both are documented in the manual (in the doc directory), although in a somewhat spartan manner.