ANTLR predicates

Created 7/2/2007, updated 3/12/2020

This page provides an overview of the different types of predicate available in ANTLR:

Syntactic predicates

Specify the syntactic validity of applying an alternative
Indicate syntactic context that must be satisfied in order for an alternative to match
Automatically look ahead at future input to help make decisions (arbitrary lookahead)
Used to order a rule’s alternatives, that is, to specify precedence among alternatives that would otherwise be ambiguous. This suggests that it makes no sense to apply a syntactic predicate in a rule with only one alternative, and experience seems to confirm this (see PEG-style predicates in ANTLR, for example)
First alternative that matches wins
Put another way, syntactic predicates indicate where backtracking may occur
Take the form: (...)=> where ... is a grammar fragment
May only appear on the extreme left edge of an alternative
In a rule with many alternatives, the last alternative does not need an explicit predicate (it is assumed to be the default if nothing else matches)
Similarly, in a rule with many alternatives, if some of the alternatives are not mutually ambiguous then they don’t need explicit predicates either
The backtrack = true option may be used to have ANTLR automatically insert syntactic predicates on alternatives that do not already have a user-specified predicate
The overhead of backtracking may be amortized by turning on memoizing, preferably on a per-rule basis (the memoize = true option)
Actions are not executed during backtracking
Under the covers, syntactic predicates use a "pushdown machine" rather than a DFA (as in normal LL(*) parsing), so they are more powerful recognizers: they can recognize context-free structures, whereas DFAs are equivalent in power to regular expressions and cannot recognize recursive structures
Explicitly specified syntactic predicates are implemented in ANTLR using gated semantic predicates behind the scenes
Implicitly added syntactic predicates (added for auto-backtracking) are implemented using normal disambiguating semantic predicates and are therefore only evaluated when LL(*) fails to predict an alternative
Documented in Chapter 14 of the ANTLR book (starting on page 331)

Example

Here is an example lexer rule that uses a syntactic predicate to order two ambiguous alternatives:

CRLF
  : ('\r'? '\n')=> '\r'? '\n'
  | '\r'
  ;

This will match \r\n, \r or \n, even when a single file mixes line endings. Without the syntactic predicate, ANTLR doesn’t know what to do when faced with input such as \r\n (is it two CRLF tokens or one?). With the predicate, ANTLR doesn’t issue an ambiguity warning and always chooses the longest match (\r\n) if possible.
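
The backtrack and memoize options listed above can be sketched at rule level as follows. This is a minimal illustration, assuming ANTLR 3 syntax and the Java target; the grammar name, rule names and tokens are all hypothetical. Both alternatives of member begin with a recursive declarator, so a DFA cannot scan past the common prefix and ANTLR has to fall back on backtracking.

```antlr
grammar Decls; // hypothetical grammar name

// Per-rule options: ANTLR inserts the syntactic predicates itself,
// and memoization avoids re-parsing the shared prefix on a rewind.
member
options { backtrack = true; memoize = true; }
  : declaration
  | definition // last alternative, tried if declaration does not match
  ;

declaration : type declarator ';' ;
definition  : type declarator '{' '}' ;

// Recursive rule: this is what defeats plain LL(*) lookahead.
declarator
  : ID
  | '*' declarator
  | '(' declarator ')'
  ;

type : 'int' | 'void' ;

ID : ('a'..'z' | 'A'..'Z' | '_')+ ;
WS : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

Setting the options on the one rule that needs them, rather than in the grammar-level options block, keeps the cost of backtracking and the memory used by memoization confined to that decision.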

Semantic predicates

Specify the semantic validity of applying an alternative
Are boolean expressions in the target language that are evaluated at runtime
Used to guide recognition
Enforce non-syntactic rules (rules for which mere syntax is not enough to describe a valid sentence)
Help to recognize context-sensitive language constructs
Are "hoisted" up into other rules when necessary to inform the decision making process
Must be free of side effects
- Must be possible to repeatedly evaluate them and get the same result
- Evaluation order must not matter
Should not reference local variables or parameters (because this would break hoisting)
Documented in Chapter 13 of the ANTLR book (starting on page 317)

Validating semantic predicates

Look like normal actions but are followed immediately by a question mark: {...}?
... is a boolean expression written in the target language; if it evaluates to false then a FailedPredicateException is thrown
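
A validating predicate might look like the sketch below, which assumes ANTLR 3 with the Java target; the grammar name and the octet rule are hypothetical. The predicate sits after the token it checks, so it plays no part in prediction and only validates what has already been matched.

```antlr
grammar Octet; // hypothetical grammar name

// Syntax alone only says "an integer"; the predicate adds the range check.
// The length test keeps Integer.parseInt from overflowing on long digit runs.
octet
  : INT { $INT.text.length() <= 3 && Integer.parseInt($INT.text) <= 255 }?
  ;

INT : '0'..'9'+ ;
```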

Gated semantic predicates

Precede alternatives and are written as {...}?=> where ... is a boolean expression written in the target language
When evaluating to false, effectively disable the alternative, making it invisible to the recognizer (no exception is thrown)
Put another way, they dynamically "turn on" or "turn off" portions of a grammar
Always hoisted, even when decisions are deterministic
Useful when you want to distinguish between alternatives that are not syntactically ambiguous
May only appear in rules that actually have multiple alternatives
May be used in lexer and parser rules
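
As a rough illustration of switching part of a grammar on and off, the sketch below gates an assert statement behind a flag. It assumes ANTLR 3 with the Java target; the grammar name, the rules and the allowAssert field are hypothetical.

```antlr
grammar Stmts; // hypothetical grammar name

@members {
  // Hypothetical flag, set by the host application before parsing.
  boolean allowAssert = false;
}

// The two alternatives are not syntactically ambiguous, yet the gated
// predicate is still hoisted into the decision: while allowAssert is
// false, the first alternative is invisible to the parser.
stat
  : {allowAssert}?=> 'assert' expr ';'
  | 'return' expr ';'
  ;

expr : ID ;

ID : ('a'..'z' | 'A'..'Z' | '_')+ ;
WS : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

Note that the predicate reads only a parser field, not a local variable or rule parameter, so it remains valid wherever ANTLR evaluates it.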

Disambiguating semantic predicates

Precede alternatives and are written as {...}?
Used in making prediction decisions
Are only used (evaluated) when syntax alone is insufficient to distinguish between alternatives
Put another way, they disambiguate syntactically identical alternatives
Unlike gated semantic predicates, may be used in rules which have only one alternative
Are hoisted into rules higher up in the decision chain when LL(*) lookahead alone is not sufficient to distinguish between alternatives
Unlike gated semantic predicates, are not hoisted for deterministic decisions
Only predicates reachable from the left edge without consuming an input symbol are hoisted
To fully resolve a non-determinism, all alternatives must be covered by a disambiguating predicate
As a special case, if the last (and only the last) conflicting alternative is not covered then ANTLR implicitly covers it as a "default"
Predicated alternatives specified earlier have precedence over those specified later; that is, the semantic predicates are evaluated in the order specified in the alternatives in the grammar
Multiple predicates may be specified for a single alternative:

rule
  : {pred1}? {pred2}? TOK // both pred1 and pred2 must evaluate to true
  | {pred3}? TOK
  ;
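
A typical use of a disambiguating predicate is to consult a symbol table when two alternatives are syntactically identical. The sketch below assumes ANTLR 3 with the Java target; the grammar name, the types set and the isTypeName helper are hypothetical.

```antlr
grammar CastOrCall; // hypothetical grammar name

@members {
  // Hypothetical symbol table holding the names known to be types.
  java.util.Set<String> types = new java.util.HashSet<String>();
  boolean isTypeName(String name) { return types.contains(name); }
}

// "T(x)" is a cast if T names a type, otherwise a function call.
// Both alternatives match the same tokens, so syntax alone cannot decide.
expr
  : {isTypeName(input.LT(1).getText())}? ID '(' ID ')' // cast
  | ID '(' ID ')' // call: uncovered last alternative, the implicit default
  ;

ID : ('a'..'z' | 'A'..'Z' | '_')+ ;
WS : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

The predicate inspects the next token through input.LT(1) instead of a label or parameter, so it stays meaningful if ANTLR hoists it into a decision in another rule.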
