ART V4 summary
==============

This is a concise description of ART's operation. This version is dated 8 August 2022 and describes ART V4. For more detailed discussion, see the appendices and main content of the 'Software Language Engineering' textbook. For notes on interworking with V3, see the final section of this document.

The ART static pipeline
-----------------------

ART specifications comprise declarative rules and imperative directives whose names begin with a shriek (!) character. ART merges the initial specification and any dependencies, and constructs from them a set of modules containing rules, and a directives list. This stage is called the static pipeline.

Static pipeline:

  Parse the initial specification
  Recursively process !merge directives to create a final specification
  Parse the final specification
  Construct modules according to !module, !import and !export directives
  Concatenate all other directives onto the directive list, which forms a sequential imperative program

ART command line processing
---------------------------

The command line interface constructs the initial specification string from the command line arguments as follows:

  Begin with an empty string
  If the first command line argument begins with a shriek (!) then append it followed by a space, else append !merge '<argument>'
  If the second command line argument begins with a shriek then append it followed by a space, else append !input '<argument>'
  For each subsequent argument, append it followed by a space
  If no !try argument has been appended, then append a final !try

The effect of this is that a command line such as

  java -jar art.jar spec.art input.str !option1

yields the initial ART specification

  !merge 'spec.art' !input 'input.str' !option1 !try

which will process input.str using the rules in spec.art under option !option1.

The ART dynamic pipeline
------------------------

The dynamic pipeline connects a lexer, a parser, a term rewriter and an attribute evaluator in that order.
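The stage ordering can be made concrete as a stub pipeline. This is an illustrative sketch only, not ART's own code: every stage below is an invented placeholder that merely records its name, so only the documented sequencing of processes is meaningful.

```python
def run_pipeline(input_text):
    """Stub dynamic pipeline that records the documented stage order.

    Real ART stages transform data (TWE sets, ESPPFs, terms); here each
    stage is a placeholder so only the sequencing is illustrated.
    """
    trace = []
    stages = [
        "lexer",                # (Input) -> (TWE set)
        "TWE choosers",         # (TWE set) -> (TWE set)
        "parser",               # (TWE set) -> (ESPPF)
        "SPPF choosers",        # (ESPPF) -> (Term)
        "term rewriter",        # (Term) -> (Term)
        "attribute evaluator",  # (Term) -> (Values)
    ]
    data = input_text
    for stage in stages:
        trace.append(stage)  # a real stage would transform `data` here
    return data, trace
```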
  (Input) -> lexer -> (TWE set) -> choose -> (TWE set) -> parser -> (ESPPF) -> choose -> (Term) -> term rewriter -> (Term) -> attribute evaluator -> (Values)

Parenthesised elements are data structures; non-parenthesised elements are processes.

In more detail, the dynamic pipeline, when triggered by a !try directive, performs these actions:

  Read the current input
  Lexicalise to the current TWE set
  Modify the current TWE set by applying TWE choosers and optionally selecting a single lexicalisation
  Parse the TWE set to the current ESPPF and GSS
  Modify the current ESPPF by applying SPPF choosers and optionally selecting a single derivation
  Create the current term from the ESPPF, with any remaining ambiguities expanded via the ARTAMBIG constructor
  Modify the current term by applying rewrite rules according to a rewrite strategy
  Evaluate the current term's attributes

Directives
==========

Tracing, creating and accessing pipeline data structures
--------------------------------------------------------

!verbosity n       set the console verbosity threshold to n (DEFAULT 5)
!trace n           set the console trace threshold for processes which support tracing (DEFAULT 0)
!log n             set the logging threshold to n

For x in input, TWE, ESPPF, GSS, tree, term:

!xCounts           print summary statistics
!xPrint            print contents
!xPrintFull        print detailed contents
!xWrite 'f'        write detailed contents as text to file f
!xShow             create a graphical visualisation
!xDump 'f'         write binary representation to file f
!xUndump 'f'       read binary representation from file f

!main m            select m as the main module
!start N           set the start nonterminal for the parser to nonterminal N
!start R           set the initial rewrite relation to relation R
!try               trigger pipeline execution

Input control
-------------

!input "s"         set the current input to the literal string s
!input 'f'         set the current input to the string contents of the file named f
!inputPrint        print the current input
!inputCounts       print the number of lines and the overall number of characters for the current input

Lexer/parser
interface control
------------------------------

!paraterminal n    declare nonterminal n as a paraterminal (n may also be a comma-delimited list of nonterminals)
!wsAbsorbLeft      absorb prefix whitespace into the left extent
!wsAbsorbRight     absorb postfix whitespace into the right extent (DEFAULT)
!wsAbsorb          absorb default whitespace (` | `t | `n | `r )* after each paraterminal is recognised
!wsAbsorb N        absorb whitespace defined by nonterminal N after each paraterminal is recognised
!wsInject          inject (` | `t | `n | `r )* after each right hand side instance of a paraterminal
!wsInject N        inject nonterminal N after each right hand side instance of a paraterminal

Lexer control
-------------

!lexCounts         print final lexer statistics
!lexDisable        disable lexing, using the current TWE set as parser input
!lexDFA            select the DFA based recogniser (DEFAULT)
!lexGLL            select the GLL recogniser
!lexPlugin         select hand-crafted recognisers defined by lexer plugin code

TWE set management
------------------

!tweShortest       enable suppression of TWE element (t, i, j) if s << t and there exists (s, i, k) with k < j (t, s tokens; i, j, k extents)
!tweLongest        enable suppression of TWE element (t, i, j) if s >> t and there exists (s, i, k) with k > j (t, s tokens; i, j, k extents)
!twePriority       enable suppression of TWE element (t, i, j) if s > t and there exists (s, i, j) (t, s tokens; i, j extents)
!tweDead           enable suppression of a TWE element if it is not on a path that spans the input string
!tweSelectOne      arbitrarily select a single lexicalisation within the TWE set if there is one after chooser application
!twePrint          print non-suppressed contents of the TWE set
!twePrintFull      print all TWE set elements
!tweCounts         print TWE set cardinalities
!tweAmbiguityClasses  collect and print the current TWE set's ambiguity classes
!tweLexicalisations   compute lexicalisation counts and report; the next three directives enable different aspects of lexicalisation counting
!tweExtents        in tweCounts, use extents
!tweSegments       in tweCounts, use segments
!tweRecursive      in tweCounts, exhaustively count lexicalisations (only useful for small TWE sets)
!tweDump           dump the TWE set to ARTTWE.twe in a format that can be loaded by V3
MGLL parsers

!tweTokenWrite 'f' write the selected lexicalisation out as a token string to file f

Parser control
--------------

!parseCounts       print final parser statistics

Select parser mode: for x in mgll gll gllClustered gllTWERecognise cnp lcnp mcnp lr glr rnglr brnglr earley earley2007 OSBRD RD

!xTermAPI          use algorithm x reading a term based grammar and using standard API functions for support
!xIndexedAPI       use algorithm x reading a lookup table based grammar and using standard API functions for support
!xIndexedPool      use algorithm x reading a lookup table based grammar and Hash Pool memory management for support
!xGeneratorPool    write out a standalone parser which uses algorithm x reading a lookup table based grammar and Hash Pool memory management for support

SPPF management
---------------

!sppfChooseCounts  report on node numbers before and after SPPF choosers run
!sppfShortest      suppress SPPF packed node (m, j) if l << m and there exists a sibling (l, k) with k < j (l, m slots; j, k pivots)
!sppfLongest       suppress SPPF packed node (m, j) if l >> m and there exists a sibling (l, k) with k > j (l, m slots; j, k pivots)
!sppfPriority      suppress SPPF packed node (m, k) if l > m and there exists a sibling (l, j) (l, m slots; j, k pivots)
!sppfDead          suppress an SPPF packed node if it is unreachable from the root node
!sppfOrderedLongest  suppress SPPF packed node (l, j) if there exists a sibling (m, k) with k > j OR if m appears before l in the specification ** Issue
!sppfSelectOne     arbitrarily select a single derivation from the SPPF if there is one after chooser application
!sppfCountArities  compute a histogram of the arities of all symbol/intermediate nodes reachable from the SPPF root
!sppfCountDerivations  attempt to count all of the derivations in an SPPF (warning: time exponential in arities)
!sppfCountSentences    attempt to enumerate all of the sentences in an SPPF by constructing each yield and adding it to a set of sentences (warning: time exponential in arities and potentially space exponential too)
!sppfToTWE         (was !tweFromSPPF) construct a TWE set containing the yields of all of the unsuppressed derivations in the SPPF

Rewrite management
------------------
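ART's rewrite strategies (the directives below) differ along three axes: traversal order (preorder versus postorder), whether rewriting stops after the first reduction, and whether the whole process iterates until the term is normalised. As an illustrative sketch only, here is a preorder-one step and its iterated variant over terms represented as nested Python tuples; the rule plus(zero, X) -> X and the example term are invented, and this is not ART's own rewrite engine.

```python
def rewrite_pre_order_one(term, rule):
    """Search preorder for the first reducible subterm, rewrite it, stop.

    term : nested tuple (label, child, child, ...)
    rule : function returning a replacement term, or None if irreducible
    Returns (new term, whether a rewrite happened).
    """
    replacement = rule(term)
    if replacement is not None:
        return replacement, True
    label, *children = term
    out, done = [], False
    for child in children:
        if not done:
            child, done = rewrite_pre_order_one(child, rule)
        out.append(child)
    return (label, *out), done

def rewrite_iterate_pre_order_one(term, rule):
    """Apply the preorder-one step repeatedly until the term is normalised."""
    changed = True
    while changed:
        term, changed = rewrite_pre_order_one(term, rule)
    return term

def plus_zero(term):
    """Invented example rule: plus(zero, X) rewrites to X."""
    if term[0] == "plus" and term[1] == ("zero",):
        return term[2]
    return None
```

For example, on the term plus(zero, plus(zero, one)) the one-shot strategy performs a single rewrite at the root, while the iterated variant normalises all the way down to one.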
!rewriteDisable              disable the term rewriter
!rewriteAtRoot               rewrite strategy: match rules at the root node only
!rewritePreOrderOne          rewrite strategy: search preorder for the first reducible term, rewrite and stop
!rewritePostOrderOne         rewrite strategy: search postorder for the first reducible term, rewrite and stop
!rewritePreOrderAll          rewrite strategy: search preorder for the first reducible term, rewrite, then recursively traverse the rewritten term
!rewritePostOrderAll         rewrite strategy: search postorder for the first reducible term, rewrite, then recursively traverse the rewritten term
!rewriteIterateAtRoot        iterate-until-normalised rewrite strategy: match rules at the root node only
!rewriteIteratePreOrderOne   iterate-until-normalised rewrite strategy: search preorder for the first reducible term, rewrite and stop
!rewriteIteratePostOrderOne  iterate-until-normalised rewrite strategy: search postorder for the first reducible term, rewrite and stop
!rewriteIteratePreOrderAll   iterate-until-normalised rewrite strategy: search preorder for the first reducible term, rewrite, then recursively traverse the rewritten term
!rewriteIteratePostOrderAll  iterate-until-normalised rewrite strategy: search postorder for the first reducible term, rewrite, then recursively traverse the rewritten term

Attribute evaluator management
------------------------------

!evalDisable           disable the attribute evaluator

Standalone tools
----------------

!termTool              start the term rewriting tutorial tool
!grammarWrite          write out parser, lexer, character level, token and pretty printed grammars
!generateDepth         generate strings from the grammar depth first
!generateBreadth       generate strings from the grammar breadth first
!generateRandom        generate strings by randomly selecting the expansion instance and right hand side
!extractJLS            extract the Java Language Specification grammar from a text snippet document
!compressWhitespaceJava  compress whitespace runs in a Java file to single characters

Rules
-----

ART uses three kinds of rule:
context free grammar rules, chooser rules and rewrite inference rules. CFG rules and rewrite rules may optionally contain attribute equations which are evaluated after the tree stabilises.

Context Free Grammar rules
--------------------------

A CFG rule has the form

  N ::= alpha

where N is a nonterminal and alpha is a sequence of grammar elements. CFG rules are composed from nonterminals, terminals, the empty string symbol, EBNF operators and the ::= symbol.

A nonterminal is denoted by an alphanumeric symbol or a string delimited by $ characters:

  adrian123   $;^^\$xyz$

An alphanumeric identifier may not begin with ART, art or any other mix of cases.

Terminals have four subclasses:

  'case sensitive'   "case insensitive"   &builtin   `c where c is a single character

The empty string is denoted by #

The EBNF operators are: postfix * Kleene star, postfix + positive closure, postfix ? optional, ( ) do-first, infix | alternation, infix concatenation, infix \ difference, prefix \ not

  adrian ::= 'xb'* C
  C ::= ('c' | 'C')+ | #

Choosers
--------

The choice mechanism utilises named sets of three-tuples (higher, longer, shorter) where higher, longer and shorter are sets of ordered pairs of grammar elements which may include terminals, nonterminals, productions and slots.

A chooser declaration has this form:

  !choose chooserSetName? ( L OP R )*

that is, the directive !choose followed by an optional name followed by zero or more choosers, where:

  L and R are set expressions over grammar elements and the operators | \ and ( ), representing union, difference and do-first respectively
  Grammar elements may be terminals, nonterminals, productions and slots, or one of these keywords: anyLiteralTerminal, anyBuiltinTerminal, anyParaterminal, anyTerminal
  OP is one of >> (longer), << (shorter), > (L higher than R), < (R higher than L)
  chooserSetName is the optional name of a chooser set.
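For illustration only, a chooser declaration of this shape might read as follows; the terminal names are invented, and the set name LexTWE follows the file naming convention (ARTChooseLexTWE.art) expected by the batch files.

```
!choose LexTWE
  'if' > &ID
  anyLiteralTerminal >> anyBuiltinTerminal
```

Under the reading of the operators above, the first pair gives the literal 'if' priority over the assumed builtin identifier terminal &ID when both lexicalise the same substring, and the second declares literal terminals longer-preferred over builtin terminals.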
If chooserSetName is omitted, the set with the empty name "" is selected.

In the grammarPrint phase, chooser relations are computed, and the individual sets are written out to files named ARTChooseXyz.art where Xyz is a chooser set name. Since the default name is empty, the unnamed chooser set is written to ARTChoose.art.

The batch files lexGLL and parseMGLL expect to find the files ARTChooseLexTWE.art, ARTChooseParseSPPF.art and ARTChooseParseTWE.art, so specifications to be used with these batch files must include !choose directives for those three names. This is only a convention, and the names could be changed if the batch files were changed to match.

The relations are written into the generated lexer and parser, so any changes to them will require regeneration. However, ambiguity reduction is only actually enabled by the directives !tweLongest, !twePriority, !sppfLongest and !sppfPriority applied to the corresponding ARTV3TestGenerated instance.

Rewrite rules
-------------

A rewrite rule has the form

  conditions
  ---
  transition

Notes on V3 and V4
==================

The notes above are for V4. Version 3 does not have the pipeline and associated directive list. Instead, ART is activated once for each

Top level batch files
---------------------

1. grammarWrite    output ART specification files derived from the given specification: ARTParserGrammar.art, ARTLexerGrammar.art, ARTChoose*.art, ARTCharacterGrammar.art, ARTPrettyGrammar.art, ARTTokenGrammar.art

2. lexGLL          use EBNF GLL to lexicalise the contents of the given input and dump the TWE set to file ARTTWE.twe

3. parseMGLL       use BNF-only MGLL to parse ARTTWE.twe - arguments identical to the lexGLL run that built ARTTWE.twe

4. validateTokens  run steps 1-3 and then perform a token level parse using a lexicalisation derived from the SPPF

5.
clean removes intermediate files created by steps 1-4

Utility batch files used by the top level batch files
-----------------------------------------------------

art                    Run ART V4
artV3                  Run ART V3
artV3CompileGenerated  Run the Java compiler on the most recently produced ARTGeneratedLexer.java and ARTGeneratedParser.java
artV3TestGenerated     Exercise the most recently compiled ARTGeneratedParser and ARTGeneratedLexer