|
Parsing Expression Grammars: A Recognition-Based Syntactic Foundation
Bryan Ford
For decades we have been using Chomsky's generative system of
grammars, particularly context-free grammars (CFGs) and regular
expressions (REs), to express the syntax of programming languages and
protocols. The power of generative grammars to express ambiguity is
crucial to their original purpose of modelling natural languages, but
this very power makes it unnecessarily difficult both to express and
to parse machine-oriented languages using CFGs. Parsing Expression
Grammars (PEGs) provide an alternative, recognition-based formal
foundation for describing machine-oriented syntax, which solves the
ambiguity problem by not introducing ambiguity in the first
place. Where CFGs express nondeterministic choice between
alternatives, PEGs instead use prioritized choice. PEGs address
frequently felt expressiveness limitations of CFGs and REs,
simplifying syntax definitions and making it unnecessary to separate
their lexical and hierarchical components. A linear-time parser can be
built for any PEG, avoiding both the complexity and fickleness of LR
parsers and the inefficiency of generalized CFG parsing. While PEGs
provide a rich set of operators for constructing grammars, they are
reducible to two minimal recognition schemas developed around 1970,
TS/TDPL and gTS/GTDPL, which are here proven equivalent in effective
recognition power.
|