| [/============================================================================== |
| Copyright (C) 2001-2015 Hartmut Kaiser |
| Copyright (C) 2001-2011 Joel de Guzman |
| |
| Distributed under the Boost Software License, Version 1.0. (See accompanying |
| file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) |
| ===============================================================================/] |
| |
| [/////////////////////////////////////////////////////////////////////////////] |
| [section:primitive_attributes Attributes of Primitive Components] |
| |
| Parsers in __spirit__ are fully attributed. __x3__ parsers always /expose/ an |
| attribute specific to their type. This is called /synthesized attribute/ as it |
| is returned from a successful match representing the matched input sequence. For |
| instance, numeric parsers, such as `int_` or `double_`, return the `int` or |
| `double` value converted from the matched input sequence. Other primitive parser |
| components have other intuitive attribute types, such as for instance `int_` |
| which has `int`, or `ascii::char_` which has `char`. Primitive parsers apply the |
| normal C++ convertibility rules: you can use any C++ type to receive the parsed |
| value as long as the attribute type of the parser is convertible to the type |
| provided. The following example shows how a synthesized parser attribute (the |
| `int` value) is extracted by calling the API function `x3::parse`: |
| |
| int value = 0; |
| std::string str("123"); |
| std::string::iterator strbegin = str.begin(); |
| x3::parse(strbegin, str.end(), int_, value); // value == 123 |
| |
| For a full list of available parser primitives and their attribute types please |
| see the sections __sec_x3_primitive__. |
| |
| [endsect] |
| |
| [/////////////////////////////////////////////////////////////////////////////] |
| [section:compound_attributes Attributes of Compound Components] |
| |
| __x3__ implement well defined attribute type propagation rules for all compound |
| parsers, such as sequences, alternatives, Kleene star, etc. The main attribute |
| propagation rule for a sequences is for instance: |
| |
| a: A, b: B --> (a >> b): tuple<A, B> |
| |
| which reads as: |
| |
| [:Given `a` and `b` are parsers, and `A` is the attribute type of `a`, and `B` |
| is the attribute type of `b`, then the attribute type of `a >> b` (`a << b`) |
| will be `tuple<A, B>`.] |
| |
| [note The notation `tuple<A, B>` is used as a placeholder expression for any |
| fusion sequence holding the types A and B, such as `boost::fusion::tuple<A, B>` |
| or `std::pair<A, B>` (for more information see __fusion__).] |
| |
| As you can see, in order for a type to be compatible with the attribute type |
| of a compound expression it has to |
| |
| * either be convertible to the attribute type, |
| * or it has to expose certain functionalities, i.e. it needs to conform to a |
| concept compatible with the component. |
| |
| Each compound component implements its own set of attribute propagation rules. |
| For a full list of how the different compound parsers consume attributes |
| see the sections __sec_x3_compound__. |
| |
| [heading The Attribute of Sequence Parsers] |
| |
| Sequences require an attribute type to expose the concept of a fusion sequence, |
| where all elements of that fusion sequence have to be compatible with the |
| corresponding element of the component sequence. For example, the expression: |
| |
| double_ >> double_ |
| |
| is compatible with any fusion sequence holding two types, where both types have |
| to be compatible with `double`. The first element of the fusion sequence has to |
| be compatible with the attribute of the first `double_`, and the second element |
| of the fusion sequence has to be compatible with the attribute of the second |
| `double_`. If we assume to have an instance of a `std::pair<double, double>`, |
| we can directly use the expressions above to do both, parse input to fill the |
| attribute: |
| |
| // the following parses "1.0 2.0" into a pair of double |
| std::string input("1.0 2.0"); |
| std::string::iterator strbegin = input.begin(); |
| std::pair<double, double> p; |
| x3::phrase_parse(strbegin, input.end(), |
| x3::double_ >> x3::double_, // parser grammar |
| x3::space, // delimiter grammar |
| p); // attribute to fill while parsing |
| |
| [tip *For sequences only:* To keep it simple, unlike __Spirit.qi__, __x3__ does |
| not support more than one attribute anymore in the `parse` and `phrase_parse` function. |
| Just use `std:tuple'. Be sure to include `boost/fusion/adapted/std_tuple.hpp' in this case. |
| |
| ] |
| |
| [heading The Attribute of Alternative Parsers] |
| |
| Alternative parsers are all about - well - alternatives. In |
| order to store possibly different result (attribute) types from the different |
| alternatives we use the data type __boost_variant__. The main attribute |
| propagation rule of these components is: |
| |
| a: A, b: B --> (a | b): variant<A, B> |
| |
| Alternatives have a second very important attribute propagation rule: |
| |
| a: A, b: A --> (a | b): A |
| |
| often simplifying things significantly. If all sub expressions of |
| an alternative expose the same attribute type, the overall alternative |
| will expose exactly the same attribute type as well. |
| |
| [endsect] |
| |
| [/////////////////////////////////////////////////////////////////////////////] |
| [section:more_compound_attributes More About Attributes of Compound Components] |
| |
| While parsing input, it is often desirable to combine some |
| constant elements with variable parts. For instance, let us look at the example |
| of parsing or formatting a complex number, which is written as `(real, imag)`, |
| where `real` and `imag` are the variables representing the real and imaginary |
| parts of our complex number. This can be achieved by writing: |
| |
| '(' >> double_ >> ", " >> double_ >> ')' |
| |
| Literals (such as `'('` and `", "`) do /not/ expose any attribute |
| (well actually, they do expose the special type `unused_type`, but in this |
| context `unused_type` is interpreted as if the component does not expose any |
| attribute at all). It is very important to understand that the literals don't |
| consume any of the elements of a fusion sequence passed to this component |
| sequence. As said, they just don't expose any attribute and don't produce |
| (consume) any data. The following example shows this: |
| |
| // the following parses "(1.0, 2.0)" into a pair of double |
| std::string input("(1.0, 2.0)"); |
| std::string::iterator strbegin = input.begin(); |
| std::pair<double, double> p; |
| x3::parse(strbegin, input.end(), |
| '(' >> x3::double_ >> ", " >> x3::double_ >> ')', // parser grammar |
| p); // attribute to fill while parsing |
| |
| where the first element of the pair passed in as the data to generate is still |
| associated with the first `double_`, and the second element is associated with |
| the second `double_` parser. |
| |
| This behavior should be familiar as it conforms to the way other input and |
| output formatting libraries such as `scanf`, `printf` or `boost::format` are |
| handling their variable parts. In this context you can think about __x3__'s |
| primitive components (such as the `double_` above) as of being |
| type safe placeholders for the attribute values. |
| |
| [tip *For sequences only:* To keep it simple, unlike __Spirit.qi__, __x3__ does |
| not support more than one attribute anymore in the `parse` and `phrase_parse` function. |
| Just use `std:tuple'. Be sure to include `boost/fusion/adapted/std_tuple.hpp' in this case. |
| |
| ] |
| |
| Let's take a look at this from a more formal perspective: |
| |
| a: A, b: Unused --> (a >> b): A |
| |
| which reads as: |
| |
| [:Given `a` and `b` are parsers, and `A` is the attribute type of |
| `a`, and `unused_type` is the attribute type of `b`, then the attribute type |
| of `a >> b` (`a << b`) will be `A` as well. This rule applies regardless of |
| the position the element exposing the `unused_type` is at.] |
| |
| This rule is the key to the understanding of the attribute handling in |
| sequences as soon as literals are involved. It is as if elements with |
| `unused_type` attributes 'disappeared' during attribute propagation. Notably, |
| this is not only true for sequences but for any compound components. For |
| instance, for alternative components the corresponding rule is: |
| |
| a: A, b: Unused --> (a | b): A |
| |
| again, allowing to simplify the overall attribute type of an expression. |
| |
| [endsect] |
| |
| [/////////////////////////////////////////////////////////////////////////////] |
| [section:nonterminal_attributes Attributes of Nonterminals] |
| |
| Nonterminals are the main means of constructing more complex parsers out of |
| simpler ones. The nonterminals in the parser world are very similar to functions |
| in an imperative programming language. They can be used to encapsulate parser |
| expressions for a particular input sequence. After being defined, the |
| nonterminals can be used as 'normal' parsers in more complex expressions |
| whenever the encapsulated input needs to be recognized. Parser nonterminals in |
| __x3__ usually return a value (the synthesized attribute). |
| |
| The type of the synthesized attribute as to be explicitly specified while |
| defining the particular nonterminal. Example (ignore ID for now): |
| |
| x3::rule<ID, int> r; |
| |
| [endsect] |
| |