Nondeterministic parser generators will perform lookahead or backtracking in an effort to resolve conflicts that cannot be handled by their underlying (usually deterministic) parser technologies. This gives extra parsing power, but at the cost of (worst case) exponentially increasing parse times.
Backtracking and lookahead algorithms are suitable for parsing languages which have a large deterministic core and only need small amounts of localised backtracking, such as C++. You should forget all about using these tools to handle very general grammars though (for instance for applications in natural language parsing) where you need parser generators based on algorithms such as the Early or Tomita parsers which display worst case cubic time (or quadratic time for unambiguous grammars).
Another thing to watch is that adding backtracking to a parser does not necessarily make it able to handle completely general grammars. None of the top-down parser generators described here are able to handle left-recursion, for instance, and there are more subtle effects that can catch you out if you do not fully understand the parser's backtracking behaviour.
ANTLR2 is the Java rewrite of PCCTS (see below). Its syntactic
predicates allow user controlled lookahead to resolve ambiguities which the
(otherwise LL(k)) parser algorithm cannot resolve.
BtYACC (Backtracking YACC) is Berkley YACC modified to perform exhaustive
backtracking when it comes across a conflict in the table by first shifting and
then backing up and making all possible reductions until it finds a successful
parse or until it has tried every possibility. I have used BTYACC on Windows
95 and I find it robust and easy to use. The documentation is extremely sparse,
but the authors (Chris Dodd and Vadim Maslov) are friendly.
Cocktail
is a compiler writer's toolkit that includes the LARK parser generator
which generates backtracking LR parsers. Evaluation versions for
Windows 95 and NT may be downloaded but the full kit is commercial. An
earlier, free, version of Cocktail which does not include LARK is
available here
Information supplied to me by the author, Josef Grosch
COCOM also known as
Russian armoury
is a compiler writer's toolkit that includes the MSTA parser generator
which generates LR(k) and LALR(k) parsers using a modified DeRemer
algorithm. The author is working on a version with Tomita-style
backtracking.
Information supplied to me by the author, Vladimir Makarov who tells me that
MSTA's status is `experimental' and that the documentation needs work. Other
parts of the toolkit are more battle hardened.
Our own
GRDP is based on a new backtracking recursive descent parsing
algorithm that can handle grammars requiring arbitrary amounts of
lookahead and which even behaves correctly when presented with
ambiguous grammars. It is presently a prototype but we are building
the algorithms into our new project PGT (Permutation Grammar Tool)
which will be available Real Soon Now.
LADE is a compiler writer's toolkit that contains a bottom-up
backtracking parser. It was the subject of several messages in
comp.compilers during the years 1992-1995.
I believe that it is commercial, and I am
unable at present to track down a current link. One user reports that it is
very slow compared to Cocktail's LARK, but does not give the test grammar. Given that
both tools will be exponential on some grammars, this observation may not
be helpful - it's possible that the grammar triggered exponential behaviour in LADE but not in LARK.
LALL is a LookAhead LL parser generator that accepts YACC-like BNF. Actually the current release is LL(1) only, but the web site contains a thesis paper from 1994 which describes the lookahead functionality that will be available `shortly'.
Information supplied to me by the author, George W Freas II.
I haven't tried this yet and I'd be interested in users' comments.
LPM is a backtracking pattern matcher.
Information supplied to me by the author, Quinn Tyler Jackson.
I haven't tried this yet and I'd be interested in users' comments.
JAVACC
is Sun's own parser generator for Java users. It is very much in the spirit
of PCCTS and ANTLER2 although it has more of a `programming language' feel than either of those. It too uses syntactic predicates to specify local lookahead.
It's underlying parser technology is LL(1) rather than the LL(k) of PCCTS
and ANTLR 2.
PCCTS is a compiler writer's toolkit that includes DLG, a lexical analyser generator, ANTLR, a predicated LL(k) parser generator and Sorcerer, a tree walker generator. ANTLER's syntactic
predicates allow user controlled lookahead to resolve ambiguities which the
(otherwise LL(k)) parser algorithm cannot resolve.
PRECCX (PREttier Compiler-Compiler) performs limited backtrack recursive
descent parsing.
I've used PRECCX on MS-DOS under Turbo C. It has some limitations which
are not made clear in the (sparse) documentation. It also has some very powerful capabilities arising from its ability to pass a parser to a rule as an
attribute. I've found the authors helpful.
Ratatosk is a parser generator for use within the Gofer functional programming environment. It uses SLR with backtracking.
Information supplied to me by the author Torben Mogensen.
I haven't tried this because I don't use Gofer, but I'd be interested in users' comments.
YAY (Yet Another YACC) is an LR(2) parser generator with constructs for
disambiguating grammars.
Information supplied to me by Salvador Cavadini.
I haven't tried this yet and I'd be interested in users' comments.
TCLL1 and TCLLk are
parser generators that generate tables for programs written in
Icon. Little (no?) support for semantic actions. TCLL1 is an LL(1)
generator. TCLL(k) trys to put a grammar into LL(1) form and if that
fails resorts to lookahead by building a a `look-ahead' tree to
determine the correct right hand side to choose for a nonterminal and
then backup up its input. TCLLk is currently alpha-release but has
been used successfully in class.
Information supplied to me by Thomas Christopher.
I haven't tried this yet and I'd be interested in users' comments.
YACC++ is an object oriented parser generator in the YACC tradition. It
supports user-controlled lookahead via syntactic predicates.Information supplied by the author. Single user licenses cost $995 or $500 to academic users.
I haven't tried it but some users are very pleased with it.
TXL is a functional tree manipulation language. The full version is
commercial: a `light' version may be downloaded free.
ASF+SDF uses Tomita-style parsing and term rewriting.