Our Technology
The research behind ANTLR Yggdrasil began with the goal of providing DSL solutions to problems of parallel and distributed computing. Development of parallel applications is time-consuming and diverts talent away from solving problems of interest and into structuring programs so that they can operate in parallel. Send and receive code can be generated from the same specification; data layouts can be determined algorithmically, and code can be generated for optimal cache usage. Rapid development of domain specific languages looked to be a promising solution, although I have since come to believe that parallelization is better done by compiler. Nevertheless, DSLs have a significant place in composing large-scale bodies of software.
Early research focused on automating the process of tree grammar generation from annotated precursor grammars. This led to adding support for token creation; then it seemed desirable to have a first-class syntax for tree/token decoration. The result was published through the Open Channel Foundation (NASA’s then approach to releasing Open Source software) in 2004 as ANTLR 2.8 (with the permission of ANTLR’s author, Dr. Terence Parr). The experience with developing this first grammar generator led to the Yggdrasil type mapping scheme and pre-defined types (or interfaces, which seems more accurate). One important consideration of the Yggdrasil syntax was minimalism, and arithmetic proved unnecessary—all arithmetic operations could be hidden behind assignments to data structures and the get/set methods that implement them. Where Knuth’s attribute grammar concept focused on computing attributes, ANTLR Yggdrasil decorates grammars and tokens and hides the computation. Parr’s introduction of StringTemplate for code generation made it possible to replace actions using target code with templates that were independent of the target language.
Automated grammar generation sped up the development process considerably; multi-pass translation became a matter of routine instead of a time-consuming chore. This left output generation as the limiting step in the process, apart from grammar design. Terence Parr had introduced StringTemplate as a code generation tool; during the planning for ANTLR 3 he had the concept for an output grammar that integrated the template capabilities of StringTemplate, but the idea never quite gelled into an implementable design. The mapping concept developed for implementing Yggdrasil attributes proved to be a viable approach: instead of mapping types and tokens, templates could be mapped to productions and tokens to template arguments. This significantly reduces development time, as much from reducing errors as from accelerating code development.
The next step was to incorporate graph technology—graphs are fundamental to optimizing compilers. In ANTLR 3, Parr demonstrated that abstract syntax trees could be implemented in terms of UP/DOWN structuring tokens. Graphs are a bit more complicated, but—provided that a graph has a spanning tree—it is possible to tokenize a graph via pre-order depth-first traversal, and use a small set of token types to provide graph structuring. Graphs could be reconstructed from the resulting token stream: this made it possible to incorporate graph parsing into the development process. Unfortunately, it took time to implement bug-free GraphPayloadStream and GraphBuilder classes; some experimentation is still needed to develop a syntax for rewriting output graphs, and graph parsing will not be available until the first commercial release.
Despite the incomplete implementation, the effort to incorporate graph technology into ANTLR Yggdrasil led to the discovery of three sets of algorithms for language processing that are in various stages of the national and international patent processes. The first set of algorithms make it possible to automatically parallelize programs during compilation (US patents 9,182,957 and 9,864,590); the second is the linear GLL recognition algorithm at the heart of the current ANTLR Yggdrasil implementation (US patent 10,055,300 and patent pending); the third set is a collection of graph processing tools (patent pending) that made it possible to implement LGLL efficiently, and has obvious application to other language problems. The graph processing tools will become available in the commercial release; the current implementation requires some adaptation for library use.