| [/============================================================================== |
| Copyright (C) 2001-2015 Joel de Guzman |
| Copyright (C) 2001-2011 Hartmut Kaiser |
| |
| Distributed under the Boost Software License, Version 1.0. (See accompanying |
| file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) |
| ===============================================================================/] |
| |
| [section:roman Roman Numerals] |
| |
| This example demonstrates: |
| |
| * The Symbol Table |
| * Non-terminal rules |
| |
| [heading Symbol Table] |
| |
| The symbol table holds a dictionary of symbols where each symbol is a sequence |
| of characters. The template class, can work efficiently with 8, 16, 32 and even |
| 64 bit characters. Mutable data of type T are associated with each symbol. |
| |
| Traditionally, symbol table management is maintained separately outside the BNF |
| grammar through semantic actions. Contrary to standard practice, the Spirit |
| symbol table class `symbols` is a parser. An object of which may be used |
| anywhere in the EBNF grammar specification. It is an example of a dynamic |
| parser. A dynamic parser is characterized by its ability to modify its behavior |
| at run time. Initially, an empty symbols object matches nothing. At any time, |
| symbols may be added or removed, thus, dynamically altering its behavior. |
| |
| Each entry in a symbol table may have an associated mutable data slot. In this |
| regard, one can view the symbol table as an associative container (or map) of |
| key-value pairs where the keys are strings. |
| |
| The symbols class expects one template parameter to specify the data type |
| associated with each symbol: its attribute. There are a couple of |
| namespaces in X3 where you can find various versions of the symbols class |
| for handling different character encoding including ascii, standard, |
| standard_wide, iso8859_1, and unicode. The default symbol parser type in |
| the main x3 namespace is standard. |
| |
| Here's a parser for roman hundreds (100..900) using the symbol table. Keep in |
| mind that the data associated with each slot is the parser's attribute (which is |
| passed to attached semantic actions). |
| |
| struct hundreds_ : x3::symbols<unsigned> |
| { |
| hundreds_() |
| { |
| add |
| ("C" , 100) |
| ("CC" , 200) |
| ("CCC" , 300) |
| ("CD" , 400) |
| ("D" , 500) |
| ("DC" , 600) |
| ("DCC" , 700) |
| ("DCCC" , 800) |
| ("CM" , 900) |
| ; |
| } |
| |
| } hundreds; |
| |
| Here's a parser for roman tens (10..90): |
| |
| struct tens_ : x3::symbols<unsigned> |
| { |
| tens_() |
| { |
| add |
| ("X" , 10) |
| ("XX" , 20) |
| ("XXX" , 30) |
| ("XL" , 40) |
| ("L" , 50) |
| ("LX" , 60) |
| ("LXX" , 70) |
| ("LXXX" , 80) |
| ("XC" , 90) |
| ; |
| } |
| |
| } tens; |
| |
| and, finally, for ones (1..9): |
| |
| struct ones_ : x3::symbols<unsigned> |
| { |
| ones_() |
| { |
| add |
| ("I" , 1) |
| ("II" , 2) |
| ("III" , 3) |
| ("IV" , 4) |
| ("V" , 5) |
| ("VI" , 6) |
| ("VII" , 7) |
| ("VIII" , 8) |
| ("IX" , 9) |
| ; |
| } |
| |
| } ones; |
| |
| Now we can use `hundreds`, `tens` and `ones` anywhere in our parser expressions. |
| They are all parsers. |
| |
| [heading Rules] |
| |
| Up until now, we've been inlining our parser expressions, passing them directly |
| to the `phrase_parse` function. The expression evaluates into a temporary, |
| unnamed parser which is passed into the `phrase_parse` function, used, and then |
| destroyed. This is fine for small parsers. When the expressions get complicated, |
| you'd want to break the expressions into smaller easier-to-understand pieces, |
| name them, and refer to them from other parser expressions by name. |
| |
| A parser expression can be assigned to what is called a "rule". There are |
| various ways to declare rules. The simplest form is: |
| |
| rule<ID> const r = "some-name"; |
| |
| [heading Rule ID] |
| |
| At the very least, the rule needs an identification tag. This ID can be any |
| struct or class type and need not be defined. Forward declaration would suffice. |
| In subsequent tutorials, we will see that the rule ID can have additional |
| functionalities for error handling and annotation. |
| |
| [heading Rule Name] |
| |
| The name is optional, but is useful for debugging and error handling, as |
| we'll see later. Notice that rule `r` is declared `const`. Rules are |
| immutable and are best declared as `const`. Rules are lightweight and can be |
| passed around by value. Its only member variable is a `std::string`: its |
| name. |
| |
| [note Unlike Qi (Spirit V2), X3 rules can be used with both `phrase_parse` and |
| `parse` without having to specify the skip parser] |
| |
| [heading Rule Attributes] |
| |
| For our next example, there's one more rule form you should know about: |
| |
| rule<ID, Attribute> const r = "some-name"; |
| |
| The Attribute parameter specifies the attribute type of the rule. You've seen |
| that our parsers can have an attribute. Recall that the `double_` parser has |
| an attribute of `double`. To be precise, these are /synthesized/ attributes. |
| The parser "synthesizes" the attribute value. If the parser is a function, |
| think of them as function return values. |
| |
| [heading Rule Definition] |
| |
| After having declared a rule, you need a definition for the rule. Example: |
| |
| auto const r_def = double_ >> *(',' >> double_); |
| |
| By convention, rule definitions have a _def suffix. Like rules, rule definitions |
| are immutable and are best declared as `const`. |
| |
| [#__tutorial_spirit_define__] |
| [heading BOOST_SPIRIT_DEFINE] |
| |
| Now that we have a rule and its definition, we tie the rule with a rule |
| definition using the `BOOST_SPIRIT_DEFINE` macro: |
| |
| BOOST_SPIRIT_DEFINE(r); |
| |
| Behind the scenes, what's actually happening is that we are defining a `parse_rule` |
| function in the client namespace that tells X3 how to invoke the rule. |
| And so for each rule defined using `BOOST_SPIRIT_DEFINE`, there is an |
| overloaded `parse_rule` function. At parse time, Spirit X3 recursively calls |
| the appropriate `parse_rule` function. |
| |
| [note `BOOST_SPIRIT_DEFINE` is variadic and may be used for one or more rules. |
| Example: `BOOST_SPIRIT_DEFINE(r1, r2, r3);`] |
| |
| [heading Grammars] |
| |
| Unlike Qi (Spirit V2), X3 discards the notion of a grammar as a concrete |
| entity for encapsulating rules. In X3, a grammar is simply a logical group of |
| rules that work together, typically with a single top-level start rule which |
| serves as the main entry point. X3 grammars are grouped using namespaces. |
| The roman numeral grammar is a very nice and simple example of a grammar: |
| |
| namespace parser |
| { |
| using x3::eps; |
| using x3::lit; |
| using x3::_val; |
| using x3::_attr; |
| using ascii::char_; |
| |
| auto set_zero = [&](auto& ctx){ _val(ctx) = 0; }; |
| auto add1000 = [&](auto& ctx){ _val(ctx) += 1000; }; |
| auto add = [&](auto& ctx){ _val(ctx) += _attr(ctx); }; |
| |
| x3::rule<class roman, unsigned> const roman = "roman"; |
| |
| auto const roman_def = |
| eps [set_zero] |
| >> |
| ( |
| -(+lit('M') [add1000]) |
| >> -hundreds [add] |
| >> -tens [add] |
| >> -ones [add] |
| ) |
| ; |
| |
| BOOST_SPIRIT_DEFINE(roman); |
| } |
| |
| Things to take notice of: |
| |
| * The start rule's attribute is `unsigned`. |
| |
| * `_val(ctx)` gets a reference to the rule's synthesized attribute. |
| |
| * `_attr(ctx)` gets a reference to the parser's synthesized attribute. |
| |
| * `eps` is a special spirit parser that consumes no input but is always |
| successful. We use it to initialize the rule's synthesized |
| attribute, to zero before anything else. The actual parser starts at |
| `+lit('M')`, parsing roman thousands. Using `eps` this way is good |
| for doing pre and post initializations. |
| |
| * The rule `roman` and the definition `roman_def` are const objects. |
| |
| * The rule's ID is `class roman`. C++ allows you to declare the class |
| in the actual template declaration as you can see in the example: |
| |
| x3::rule<class roman, unsigned> const roman = "roman"; |
| |
| [heading Let's Parse!] |
| |
| bool r = parse(iter, end, roman, result); |
| |
| if (r && iter == end) |
| { |
| std::cout << "-------------------------\n"; |
| std::cout << "Parsing succeeded\n"; |
| std::cout << "result = " << result << std::endl; |
| std::cout << "-------------------------\n"; |
| } |
| else |
| { |
| std::string rest(iter, end); |
| std::cout << "-------------------------\n"; |
| std::cout << "Parsing failed\n"; |
| std::cout << "stopped at: \": " << rest << "\"\n"; |
| std::cout << "-------------------------\n"; |
| } |
| |
| `roman` is our roman numeral parser. This time around we are using the |
| no-skipping version of the parse functions. We do not want to skip any spaces! |
| We are also passing in an attribute, `unsigned result`, which will receive the |
| parsed value. |
| |
| The full cpp file for this example can be found here: |
| [@../../../example/x3/roman.cpp roman.cpp] |
| |
| [endsect] |