ART V4 summary
==============
This is a concise description of ART's operation: this version is dated 8 August 2022 and describes ART V4
For more detailed discussion, see the appendices and main content of the 'Software Language Engineering' textbook.
For notes on interworking with V3, see the final section of this document.
The ART static pipeline
-----------------------
ART specifications comprise declarative rules and imperative directives whose names begin with a shriek (!) character
ART merges the initial specification and any dependencies, and constructs from them a set of modules
containing rules, and a directives list. This stage is called the static pipeline.
Static pipeline:
Parse initial specification
Recursively process !merge directives to create a final specification
Parse final specification
Construct modules according to !module, !import and !export directives
Concatenate all other directives onto the directive list which forms a sequential imperative program.
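The !merge expansion step of the static pipeline can be sketched as below. The in-memory files dictionary, the regular expression for the quoted-name !merge syntax, and the cycle guard are illustrative assumptions, not ART's actual implementation:

```python
import re

def expand_merges(spec, files, seen=None):
    """Recursively inline !merge 'name' directives, mimicking the static
    pipeline's construction of a final specification.  `files` maps names
    to specification text (a stand-in filesystem); `seen` guards against
    circular merges."""
    seen = set() if seen is None else seen

    def inline(match):
        name = match.group(1)
        if name in seen:
            return ''            # already merged: skip to avoid a cycle
        seen.add(name)
        return expand_merges(files[name], files, seen)

    return re.sub(r"!merge\s+'([^']*)'", inline, spec)
```

For example, `expand_merges("!merge 'base' S ::= 'a'", {"base": "B ::= 'b'"})` inlines the merged text in place of the directive.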
ART command line processing
---------------------------
The command line interface constructs the initial specification string from the command line arguments as follows
Begin with an empty string
If the first command line argument begins with a shriek (!) then append it followed by a space
else append !merge 'arg' followed by a space, where arg is the first argument
If the second command line argument begins with a shriek then append it followed by a space
else append !input 'arg' followed by a space, where arg is the second argument
For each subsequent argument, append it followed by a space
If no !try argument has been appended, then append a final !try
The effect of this is that a command line such as
java -jar art.jar spec.art input.str !option1
yields the initial ART specification
!merge 'spec.art' !input 'input.str' !option1 !try
which will process input.str using the rules in spec.art under option !option1.
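The construction above can be sketched as a small Python function. This is a sketch of the stated rules, not ART's actual command line code:

```python
def initial_specification(args):
    """Construct the initial ART specification string from a list of
    command line arguments, following the rules described above."""
    parts = []
    # First argument: a specification file unless it is a directive
    if args and not args[0].startswith('!'):
        parts.append("!merge '%s'" % args[0])
    elif args:
        parts.append(args[0])
    # Second argument: an input file unless it is a directive
    if len(args) > 1 and not args[1].startswith('!'):
        parts.append("!input '%s'" % args[1])
    elif len(args) > 1:
        parts.append(args[1])
    # Remaining arguments are appended verbatim
    parts.extend(args[2:])
    # Ensure the pipeline is triggered at least once
    if '!try' not in parts:
        parts.append('!try')
    return ' '.join(parts)
```

Running it on the arguments from the example above reproduces the initial specification shown.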
The ART dynamic pipeline
------------------------
The dynamic pipeline connects a lexer, a parser, a term rewriter and an attribute evaluator in that order.
(Input) -> lexer -> (TWE set) -> choose -> (TWE set) ->
parser -> (ESPPF) -> choose -> (Term) ->
term rewriter -> (Term) ->
Attribute evaluator -> (Values)
Parenthesised elements are data structures; non-parenthesised elements are processes.
In more detail, the dynamic pipeline, when triggered by a !try directive, performs these actions:
Read current input
Lexicalise to current TWE set
Modify the current TWE set by applying TWE choosers and optionally selecting a single lexicalisation
Parse TWE set to the current ESPPF and GSS
Modify the current ESPPF by applying SPPF choosers and optionally selecting a single derivation
Create current term from the ESPPF with any remaining ambiguities expanded via the ARTAMBIG constructor
Modify the current term by applying rewrite rules according to a rewrite strategy
Evaluate the current term's attributes
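The staged shape of the pipeline can be illustrated by threading a datum through an ordered list of stage functions. The stand-in stages here are toys chosen only to show the data flow from structure to structure; the real lexer, parser, choosers and rewriter are far richer:

```python
def run_pipeline(input_text, stages):
    """Thread a datum through the dynamic pipeline's stages in order.
    Each stage is a function from the previous structure to the next
    (input -> TWE set -> ESPPF -> term -> values)."""
    datum = input_text
    for stage in stages:
        datum = stage(datum)
    return datum

# Toy stand-ins for the real processes, just to show the data flow
stages = [
    lambda s: s.split(),                     # "lexer": TWE set stand-in
    lambda twe: ('root', twe),               # "parser": ESPPF stand-in
    lambda sppf: sppf[1],                    # "choose": term stand-in
    lambda term: [t.upper() for t in term],  # "rewriter"
]
```

The point is only that each process consumes the previous data structure and produces the next.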
Directives
==========
Tracing, creating and accessing pipeline data structures
--------------------------------------------------------
!verbosity n set the console verbosity threshold to n (DEFAULT 5)
!trace n set the console trace threshold for processes which support tracing (DEFAULT 0)
!log n set the logging threshold to n
For x in input, TWE, ESPPF, GSS, tree, term:
!xCounts print summary statistics
!xPrint print contents
!xPrintFull print detailed contents
!xWrite 'f' write detailed contents as text to file f
!xShow create a graphical visualisation
!xDump 'f' write binary representation to file 'f'
!xUndump 'f' read binary representation from file 'f'
!main m select m as main module
!start N set the start nonterminal for the parser to nonterminal N
!start R set the initial rewrite relation to relation R
!try trigger pipeline execution
Input control
-------------
!input "s" set current input to the literal string s
!input 'f' set current input to the string contents of file named f
!inputPrint print the current input
!inputCounts print the number of lines and the overall number of characters for current input
Lexer/parser interface control
------------------------------
!paraterminal n declare nonterminal n as a paraterminal (n may also be a comma delimited list of nonterminals)
!wsAbsorbLeft absorb prefix whitespace into left extent
!wsAbsorbRight absorb postfix whitespace into right extent (DEFAULT)
!wsAbsorb absorb default whitespace (` | `t | `n | `r )* after each paraterminal is recognised
!wsAbsorb N absorb whitespace defined by nonterminal N after each paraterminal is recognised
!wsInject inject (` | `t | `n | `r )* after each right hand side instance of a paraterminal
!wsInject N inject nonterminal N after each right hand side instance of a paraterminal
Lexer control
-------------
!lexCounts print final lexer statistics
!lexDisable disable lexing, using current TWE set for parser input
!lexDFA select DFA based recogniser (DEFAULT)
!lexGLL select GLL recogniser
!lexPlugin select hand-crafted recognisers defined by lexer plugin code
TWE set management
------------------
!tweShortest enable suppression of TWE element (s, i, j) if s >> t and there
exists (t, i, k) with k > j (s, t tokens; i, j, k extents)
!twePriority enable suppression of TWE element (s, i, j) if s > t and there
exists (t, i, j) (s, t tokens; i, j extents)
!tweDead enable suppression of TWE element if it is not on a path that spans the input string
!tweSelectOne arbitrarily select a single lexicalisation within the TWE
set if there is one after chooser application
!twePrint print non-suppressed contents of TWE set
!twePrintFull print all TWE set elements
!tweCounts print TWE set cardinalities
!tweAmbiguityClasses collect and print current TWE set's ambiguity classes
!tweLexicalisations compute lexicalisation counts and report: the next
three directives enable different aspects of lexicalisation counting
!tweExtents in tweCounts, use extents
!tweSegments in tweCounts, use segments
!tweRecursive in tweCounts, exhaustively count lexicalisations (only useful for small TWE sets)
!tweDump dump TWE set to ARTTWE.twe in a format that can be loaded by V3 MGLL parsers
!tweTokenWrite 'f' write selected lexicalisation out as a token string to file 'f'
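The suppression rules above can be sketched over a TWE set modelled as (token, leftExtent, rightExtent) triples. This is one reading of the !tweShortest rule as described, with the longer-than relation supplied as a set of pairs; the real directive operates on ART's internal TWE representation:

```python
def twe_shortest(twe, longer):
    """Apply the !tweShortest rule as described above: suppress element
    (s, i, j) when s >> t for some token t and the set also contains
    (t, i, k) with k > j.  `twe` is a set of (token, left, right)
    triples; `longer` is the set of (s, t) pairs with s >> t."""
    suppressed = {
        (s, i, j)
        for (s, i, j) in twe
        for (t2, i2, k) in twe
        if (s, t2) in longer and i2 == i and k > j
    }
    return twe - suppressed
```

With `longer = {('id', 'kw')}`, an `('id', 0, 2)` element is suppressed when a `('kw', 0, 5)` element starts at the same extent and reaches further.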
Parser control
--------------
!parseCounts print final parser statistics
select parser mode: for x in mgll gll gllClustered gllTWERecognise cnp lcnp mcnp lr glr rnglr brnglr earley earley2007 OSBRD RD
!xTermAPI use algorithm x reading term based grammar and using standard API functions for support
!xIndexedAPI use algorithm x reading lookup table based grammar and using standard API functions for support
!xIndexedPool use algorithm x reading lookup table based grammar and Hash Pool memory management for support
!xGeneratorPool write out standalone parser which uses algorithm x reading lookup table based grammar and Hash Pool
memory management for support
SPPF management
---------------
!sppfChooseCounts Report on node numbers before and after SPPF choosers run
!sppfShortest suppress SPPF packed node (l, j) if l >> m and there exists sibling (m, k) with k > j (l, m slots; j, k pivots)
!sppfPriority suppress SPPF packed node (l, j) if l > m and there exists sibling (m, k) (l, m slots; j, k pivots)
!sppfDead suppress SPPF packed node if it is unreachable from the root node
!sppfOrderedLongest suppress SPPF packed node (l, j) if there exists sibling (m, k) with k > j OR if m appears before l in the specification ** Issue
!sppfSelectOne arbitrarily select a single derivation from the SPPF
if there is one after chooser application
!sppfCountArities compute a histogram of the arities of all symbol/intermediate nodes reachable from the SPPF root
!sppfCountDerivations attempt to count all of the derivations in an SPPF (warning: time exponential in arities)
!sppfCountSentences attempt to enumerate all of the sentences in an SPPF by constructing each yield and adding
it to a set of sentences (warning: time exponential in arities and potentially space exponential too)
!sppfToTWE (was !tweFromSPPF) construct a TWE set containing the yields for all of the unsuppressed derivations in the SPPF
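The derivation count that !sppfCountDerivations attempts can be illustrated on a toy SPPF model: each symbol or intermediate node carries a list of packed alternatives, each alternative a list of children, and the count is the sum over alternatives of the product of child counts (hence the exponential warning above). The tuple encoding is an assumption for illustration only:

```python
from math import prod

def count_derivations(node, memo=None):
    """Count the derivations rooted at an SPPF-style node.  A node is
    modelled as ('sym', [packed, ...]) where each packed alternative is
    a list of child nodes; anything else is a leaf.  Each alternative
    contributes the product of its children's counts; ambiguity (more
    than one alternative) multiplies up through the shared structure."""
    memo = {} if memo is None else memo
    key = id(node)
    if key in memo:
        return memo[key]
    if isinstance(node, tuple) and len(node) == 2 and isinstance(node[1], list):
        count = sum(prod(count_derivations(c, memo) for c in packed)
                    for packed in node[1])
    else:
        count = 1          # leaf: exactly one derivation
    memo[key] = count
    return count
```

Memoising on node identity keeps the traversal linear in the DAG even though the derivation count itself can be exponential.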
Rewrite management
------------------
!rewriteDisable disable the term rewriter
!rewriteAtRoot rewrite strategy: match rules at the root node only
!rewritePreOrderOne rewrite strategy: search preorder for the first reducible term, rewrite and stop
!rewritePostOrderOne rewrite strategy: search postorder for the first reducible term, rewrite and stop
!rewritePreOrderAll rewrite strategy: search preorder for the first reducible term, rewrite then
recursively traverse the rewritten term
!rewritePostOrderAll rewrite strategy: search postorder for the first reducible term, rewrite and then
recursively traverse the rewritten term
!rewriteIterateAtRoot iterate until normalised rewrite strategy: match rules at the root node only
!rewriteIteratePreOrderOne iterate until normalised rewrite strategy: search preorder for the first reducible term, rewrite and stop
!rewriteIteratePostOrderOne iterate until normalised rewrite strategy: search postorder for the first reducible term, rewrite and stop
!rewriteIteratePreOrderAll iterate until normalised rewrite strategy: search preorder for the first reducible term, rewrite then
recursively traverse the rewritten term
!rewriteIteratePostOrderAll iterate until normalised rewrite strategy: search postorder for the first reducible term, rewrite and then
recursively traverse the rewritten term
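The single-step and iterate-until-normalised strategies can be sketched on terms modelled as nested tuples of the form (constructor, child, ...), with the rule set supplied as a function returning a rewritten term or None. The function names and term encoding are illustrative assumptions, not ART's rewriter:

```python
def rewrite_pre_order_one(term, rules):
    """!rewritePreOrderOne as described above: search the term in
    preorder for the first subterm that a rule matches, rewrite it,
    and stop.  Returns (new_term, rewritten_flag)."""
    rewritten = rules(term)
    if rewritten is not None:
        return rewritten, True
    if isinstance(term, tuple):
        children = list(term)
        for n, child in enumerate(children[1:], start=1):
            new_child, done = rewrite_pre_order_one(child, rules)
            if done:
                children[n] = new_child
                return tuple(children), True
    return term, False

def rewrite_iterate(term, rules, step=rewrite_pre_order_one):
    """An iterate-until-normalised strategy: apply `step` repeatedly
    until no rule matches anywhere in the term."""
    done = True
    while done:
        term, done = step(term, rules)
    return term
```

A postorder variant would recurse into the children before trying `rules` at the node itself.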
Attribute evaluator management
------------------------------
!evalDisable disable the attribute evaluator
Standalone tools
----------------
!termTool start the term rewriting tutorial tool
!grammarWrite write out parse, lexer, character level, token and pretty printed grammars
!generateDepth generate strings from grammar depth first
!generateBreadth generate strings from grammar breadth first
!generateRandom generate strings by randomly selecting the expansion instance and right hand side
!extractJLS extract Java Language Specification grammar from text snippet document
!compressWhitespaceJava compress Java file whitespace runs to single character
Rules
-----
ART uses three kinds of rule: context free grammar rules, chooser rules and rewrite inference rules. CFG rules and rewrite rules may optionally contain attribute equations which are evaluated after the tree stabilises.
Context Free Grammar rules
--------------------------
A CFG rule has the form nonterminal ::= expression
CFG rules are composed from nonterminals, terminals, the empty string symbol, EBNF operators and the ::= symbol
A nonterminal is denoted by an alphanumeric symbol or a string delimited by $ characters: adrian123 $;^^\$xyz$. An alphanumeric identifier may not begin with ART, art or any other mix of cases
Terminals have four subclasses: 'case sensitive' "case insensitive" &builtin `c where c is a single character
The empty string is denoted by #
EBNF operators are: postfix * Kleene star, postfix + positive closure, postfix ? optional, ( ) do-first, infix | alternation, infix concatenation, infix \ difference, prefix \ not
adrian ::= 'xb'* C C ::= ('c' | 'C')+ | #
Choosers
--------
The choice mechanism utilises named sets of three-tuples (higher, longer, shorter) where higher, longer and shorter are
sets of ordered pairs of grammar elements which may include terminals, nonterminals, productions and slots.
A chooser declaration has this form:
!choose chooserSetName? ( L OP R )*
that is, the directive !choose followed by an optional name followed by zero or more choosers, where:
L and R are set expressions over grammar elements and the operators | \ and () representing union, difference and do-first respectively
Grammar elements may be terminals, nonterminals, productions and slots, or one of these keywords:
anyLiteralTerminal, anyBuiltinTerminal, anyParaterminal, anyTerminal
OP is one of >> (longer) << (shorter) > (L higher than R) < (R higher than L)
chooserSetName is an optional named set. If chooserSetName is omitted, the set with the empty name "" is selected
In the grammarPrint phase, chooser relations are computed, and the individual sets are written out to files named ARTChooseXyz.art
where Xyz is a chooser set name. Since the default name is empty, the unnamed chooser set is written to ARTChoose.art
The batch files lexGLL and parseMGLL expect to find files ARTChooseLexTWE.art, ARTChooseParseSPPF.art and ARTChooseParseTWE.art,
so specifications to be used with these batch files must include !choose directives for those three names. This is only a convention, and
the names could be changed if the batch files were changed to match.
The relations are written into the generated lexer and parser, so any changes to them will require regeneration. However, ambiguity
reduction is only actually enabled by the directives !tweLongest, !twePriority, !sppfLongest and !sppfPriority applied to the
corresponding ARTV3TestGenerated instance.
Rewrite rules
-------------
A rewrite rule has the form conditions --- transition
Notes on V3 and V4
==================
The notes above are for V4. Version 3 does not have the pipeline and associated directive list. Instead,
ART is activated once for each
Top level batch files
--------------------
1. grammarWrite output ART specification files derived from :
ARTParserGrammar.art, ARTLexerGrammar.art, ARTChoose*.art,
ARTCharacterGrammar.art, ARTPrettyGrammar.art, ARTTokenGrammar.art
2. lexGLL use EBNF GLL to lexicalise the contents of and dump the TWE set to file ARTTWE.twe
3. parseMGLL use BNF-only MGLL to parse ARTTWE.twe - argument identical to the lexGLL run that built ARTTWE.twe
4. validateTokens run steps 1-3 and then perform a token level parse using a lexicalisation derived from the SPPF
5. clean removes intermediate files created by steps 1-4
Utility batch files used by the top level batch files
-----------------------------------------------------
art Run ART V4
artV3 Run ARTV3
artV3CompileGenerated Run the Java compiler on the most recently produced ARTGeneratedLexer.java and ARTGeneratedParser.java
artV3TestGenerated Exercise the most recently compiled ARTGeneratedParser and ARTGeneratedLexer