[/==============================================================================
Copyright (C) 2001-2015 Hartmut Kaiser
Copyright (C) 2001-2011 Joel de Guzman
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]
[/////////////////////////////////////////////////////////////////////////////]
[section:primitive_attributes Attributes of Primitive Components]
Parsers in __spirit__ are fully attributed. __x3__ parsers always /expose/ an
attribute specific to their type. This is called the /synthesized attribute/, as it
is returned from a successful match and represents the matched input sequence. For
instance, numeric parsers such as `int_` or `double_` return the `int` or
`double` value converted from the matched input sequence. Other primitive parser
components have similarly intuitive attribute types; for instance, `ascii::char_`
has the attribute type `char`. Primitive parsers apply the
normal C++ convertibility rules: you can use any C++ type to receive the parsed
value as long as the attribute type of the parser is convertible to the type
provided. The following example shows how a synthesized parser attribute (the
`int` value) is extracted by calling the API function `x3::parse`:
    int value = 0;
    std::string str("123");
    std::string::iterator strbegin = str.begin();
    x3::parse(strbegin, str.end(), x3::int_, value); // value == 123
For a full list of available parser primitives and their attribute types please
see the sections __sec_x3_primitive__.
[endsect]
[/////////////////////////////////////////////////////////////////////////////]
[section:compound_attributes Attributes of Compound Components]
__x3__ implements well-defined attribute type propagation rules for all compound
parsers, such as sequences, alternatives, Kleene star, etc. For instance, the main
attribute propagation rule for sequences is:
    a: A, b: B --> (a >> b): tuple<A, B>
which reads as:
[:Given `a` and `b` are parsers, and `A` is the attribute type of `a`, and `B`
is the attribute type of `b`, then the attribute type of `a >> b`
will be `tuple<A, B>`.]
[note The notation `tuple<A, B>` is used as a placeholder expression for any
fusion sequence holding the types A and B, such as `boost::fusion::tuple<A, B>`
or `std::pair<A, B>` (for more information see __fusion__).]
In order for a type to be compatible with the attribute type
of a compound expression, it has to either
* be convertible to the attribute type, or
* expose certain functionality, i.e. conform to a
concept compatible with the component.
Each compound component implements its own set of attribute propagation rules.
For a full list of how the different compound parsers consume attributes,
see the sections __sec_x3_compound__.
[heading The Attribute of Sequence Parsers]
Sequences require an attribute type to expose the concept of a fusion sequence,
where all elements of that fusion sequence have to be compatible with the
corresponding element of the component sequence. For example, the expression:
    double_ >> double_
is compatible with any fusion sequence holding two types, where both types have
to be compatible with `double`. The first element of the fusion sequence has to
be compatible with the attribute of the first `double_`, and the second element
of the fusion sequence has to be compatible with the attribute of the second
`double_`. Given an instance of `std::pair<double, double>`,
we can directly use the expression above to parse input and fill the
attribute:
    // the following parses "1.0 2.0" into a pair of double
    std::string input("1.0 2.0");
    std::string::iterator strbegin = input.begin();
    std::pair<double, double> p;
    x3::phrase_parse(strbegin, input.end(),
        x3::double_ >> x3::double_,     // parser grammar
        x3::space,                      // skipper grammar
        p);                             // attribute to fill while parsing
[tip *For sequences only:* To keep it simple, unlike __Spirit.qi__, __x3__ no longer
supports passing more than one attribute to the `parse` and `phrase_parse` functions.
Just use a `std::tuple` instead, and be sure to include `boost/fusion/adapted/std_tuple.hpp` in this case.
]
[heading The Attribute of Alternative Parsers]
Alternative parsers are all about, well, alternatives. In
order to store the possibly different result (attribute) types of the different
alternatives, we use the data type __boost_variant__. The main attribute
propagation rule of these components is:
    a: A, b: B --> (a | b): variant<A, B>
Alternatives have a second very important attribute propagation rule:
    a: A, b: A --> (a | b): A
This often simplifies things significantly: if all subexpressions of
an alternative expose the same attribute type, the overall alternative
exposes exactly that attribute type as well.
[endsect]
[/////////////////////////////////////////////////////////////////////////////]
[section:more_compound_attributes More About Attributes of Compound Components]
While parsing input, it is often desirable to combine some
constant elements with variable parts. For instance, consider
parsing a complex number written as `(real, imag)`,
where `real` and `imag` are the variables representing the real and imaginary
parts of the number. This can be achieved by writing:
    '(' >> double_ >> ", " >> double_ >> ')'
Literals (such as `'('` and `", "`) do /not/ expose any attribute
(strictly speaking, they expose the special type `unused_type`, but in this
context `unused_type` is interpreted as if the component does not expose any
attribute at all). It is very important to understand that literals don't
consume any of the elements of a fusion sequence passed to this component
sequence. As noted, they simply don't expose any attribute and thus neither
produce nor consume any data. The following example shows this:
    // the following parses "(1.0, 2.0)" into a pair of double
    std::string input("(1.0, 2.0)");
    std::string::iterator strbegin = input.begin();
    std::pair<double, double> p;
    x3::parse(strbegin, input.end(),
        '(' >> x3::double_ >> ", " >> x3::double_ >> ')',  // parser grammar
        p);                                                // attribute to fill while parsing
Here, the first element of the pair passed in as the attribute is still
associated with the first `double_`, and the second element is associated with
the second `double_` parser.
This behavior should be familiar, as it matches the way other input and
output formatting libraries, such as `scanf`, `printf` or `boost::format`,
handle their variable parts. In this context you can think of __x3__'s
primitive components (such as the `double_` above) as
type-safe placeholders for the attribute values.
Let's take a look at this from a more formal perspective:
    a: A, b: Unused --> (a >> b): A
which reads as:
[:Given `a` and `b` are parsers, and `A` is the attribute type of
`a`, and `unused_type` is the attribute type of `b`, then the attribute type
of `a >> b` will be `A` as well. This rule applies regardless of
the position of the element exposing the `unused_type`.]
This rule is the key to understanding attribute handling in
sequences whenever literals are involved. It is as if elements with
`unused_type` attributes 'disappeared' during attribute propagation. Notably,
this holds not only for sequences but for any compound component. For
instance, for alternative components the corresponding rule is:
    a: A, b: Unused --> (a | b): A
again allowing the overall attribute type of an expression to be simplified.
[endsect]
[/////////////////////////////////////////////////////////////////////////////]
[section:nonterminal_attributes Attributes of Nonterminals]
Nonterminals are the main means of constructing more complex parsers out of
simpler ones. The nonterminals in the parser world are very similar to functions
in an imperative programming language. They can be used to encapsulate parser
expressions for a particular input sequence. After being defined, the
nonterminals can be used as 'normal' parsers in more complex expressions
whenever the encapsulated input needs to be recognized. Parser nonterminals in
__x3__ usually return a value (the synthesized attribute).
The type of the synthesized attribute has to be explicitly specified when
defining the particular nonterminal. Example (ignore `ID` for now):
    x3::rule<ID, int> r;
[endsect]