This is a very simple tutorial for the definition and use of grammars. To view a complete example, visit Examples - Expression Grammar.

Defining the grammar

To define a grammar, simply create an instance of the Grammar class or the Grammar<T> class (differences will be explained in a minute):

var G = new Grammar();

After you have a grammar instance, you can begin defining non-terminals:

var S = G.StartSymbol;  // The starting symbol.
var A = G.Rule();
var B = G.Rule();

... and terminals:

var a = G.Token('a');
var b = G.Token('b');

All symbols can be given a name for debugging reasons, like this:

var A = G.Rule("A");

Non-terminals by default are named S0, S1, and so on, and terminals are named after the character that defines them. By the way, terminals can also be defined by regular expressions, and named:

var CONST = G.Token("CONST", @"\d+");

Following up with our AB grammar, we can begin defining the productions. The operator % is used for each non-terminal definition. The operator + is used for the concatenation of symbols (since we cannot simply put them side by side separated by spaces) and the operator | is used to separate several definitions.

S %= A + S + B + S | B + S + A + S | A + B | B + A;
A %= a;
B %= b;

Some might recognize this as the grammar for strings over { a, b } with the same number of a's and b's. Of course we could drop the A and B symbols and simply declare it all in one line.

Making a parser

After we have a grammar, we can make a parser with the Parser static class. So far we have only implemented the LR parser generator (which is very powerful), but many more are on the way.

var parser = Parsers.BuildLr(G); 

The LR parser also provides the set of states for the LR automaton, for debugging purposes, as an out parameter.

Once we have a parser, we can build a TokenStream instance from a source path, a Stream or a TextReader, or we can call one of the extension methods.

var n1 = parser.ParseString("aaabbbccc");
var n2 = parser.ParseString("aaabbcccc"); // Parsing exception

Note that the LR parser throws and exception if it finds an error in the input.

The result of the parser.Parse family of methods is a Node type. This node by default contains no important data, but it can be used to store grammar attributes, as we explain in the following section.

Adding semantic rules

For languages that are not context-free (or cannot be easily written as context-free grammars) we can use semantic rules. These rules define attributes of the non-terminals that can be used after parsing. Semantic rules are what Yacc and ANTLR use for the AST construction. In the Examples - Expression Grammar page, semantic rules are used for the implicit evaluation of the expression during parsing.

Here we will use semantic rules to recognize a context-dependent language: strings over { a, b, c } with the same number of a's, b's and c's. Some might remember that this is the paradigm of context-dependent grammars.

To set attributes to a grammar, we apply the .With method after a production, and pass in a delegate. For example:

S %= (A + B + C).With(x => x.Set("ok", A.Node.Get<int>("count") == B.Node.Get<int>("count") && A.Node.Get<int>("count") == C.Node.Get<int>("count"));

The .Set<T> and .Get<T> allow to set and get attributes with an specific type. The argument of the lambda is the attributes node that will be set to the S symbol after the production is applied. However, this is a somewhat clumsy syntax, with all those strings around. We can work around this by defining a class that inherits from Node and add all the attributes we want to it as fields or properties.

class MyNode: Node
    public int Count { get; set; }
    public bool Ok { get; set; }

Then we make our grammar generic in the node type:

var G = new Grammar<MyNode>();

var S = G.StartSymbol;
var A = G.Rule();
var B = G.Rule();
var C = G.Rule();

var a = G.Token('a');
var b = G.Token('b');
var c = G.Token('c');

And we can write the semantic rules with a more clear syntax:

S %= (A + B + C).With(x => x.Ok = A.Node.Count == B.Node.Count && A.Node.Count == C.Node.Count);
A %= (A + a).With(x => x.Count = 1 + A.Node.Count) | a.With(x => x.Count = 1);
B %= (B + b).With(x => x.Count = 1 + B.Node.Count) | b.With(x => x.Count = 1);
C %= (C + c).With(x => x.Count = 1 + C.Node.Count) | c.With(x => x.Count = 1);

If we now parse a string, we can check the "Ok" property of the return node and say if the string was correct or not. This example can be seen in the Examples - ABC Grammar page.

Last edited Apr 2, 2012 at 4:01 PM by AlejandroPiad, version 8


No comments yet.