PEG-style predicates in ANTLREdit

The 11 May 2006 entry from this page describes one way of getting PEG-like predicates in ANTLR.

PEG "not" predicates

In a PEG a "not" predicate is used to indicate *match X as long as it is not followed by Y*. A PEG-like syntax for this would involve using ! as a prefix operator:

// match any "bar" not followed by "baz"
foo
  : bar !baz
  ;

Apparently, ANTLR can achieve the equivalent effect using a combination of syntactic and semantic predicates, although it is much less readable than the PEG syntax (I say "apparently" because I haven’t actually tested this):

foo
  : (bar ((baz)=>{false}? | ))=> bar
  ;

Paraphrased, this means:

  • At the outermost level we will match bar only if the syntactic predicate succeeds; the syntactic predicate consists of:
    • First trying to match bar
    • Then try to match baz using a nested syntactic predicate
      • If the nested predicate succeeds (not what we wanted), must fail; we do that using a validating semantic predicate which always evaluates to false
      • If the nested predicate fails (which is what we wanted), fall through to the alternative subrule, which is an empty match (the | followed by nothing) and will always succeed

Terence Parr notes the following about "not" predicates:

They are really only useful in the lexer and seemingly only for single elements (all examples so far have been "not semicolon" or something similar). In ANTLR, you say ~';' so I don’t think we need them.

PEG "and" predicates

In a PEG an "and" predicate is used to indicate *match X as long as it is followed by Y*. Although the Y must be present it is not actually included in the match. A standard PEG notation for this would involve using & as a prefix operator:

// match any "bar" followed by "baz" (the "baz" is not consumed)
foo
  : bar &baz
  ;

Using a similar trick to that already shown above, ANTLR can achieve the same effect using a syntactic predicate; this version is considerably simpler:

foo
  : (bar baz)=> bar
  ;

Evaluation

It seems that this technique cannot be used for the reasons discussed in this mailing list post:

In the thread Jim Idle suggests the following workaround for the C target:

foo
   : bar { MARK(); } baz { REWINDLAST(); }
   ;

See also