| <html> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII"> |
| <title>Parsers in Depth</title> |
| <link rel="stylesheet" href="../../../../../../../doc/src/boostbook.css" type="text/css"> |
| <meta name="generator" content="DocBook XSL Stylesheets V1.75.0"> |
| <link rel="home" href="../../../index.html" title="Spirit 2.5"> |
| <link rel="up" href="../indepth.html" title="In Depth"> |
| <link rel="prev" href="../indepth.html" title="In Depth"> |
| <link rel="next" href="../customize.html" title="Customization of Spirit's Attribute Handling"> |
| </head> |
| <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> |
| <table cellpadding="2" width="100%"><tr> |
| <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../../boost.png"></td> |
| <td align="center"><a href="../../../../../../../index.html">Home</a></td> |
| <td align="center"><a href="../../../../../../../libs/libraries.htm">Libraries</a></td> |
| <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td> |
| <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td> |
| <td align="center"><a href="../../../../../../../more/index.htm">More</a></td> |
| </tr></table> |
| <hr> |
| <div class="spirit-nav"> |
| <a accesskey="p" href="../indepth.html"><img src="../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../indepth.html"><img src="../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../index.html"><img src="../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="../customize.html"><img src="../../../../../../../doc/src/images/next.png" alt="Next"></a> |
| </div> |
| <div class="section"> |
| <div class="titlepage"><div><div><h4 class="title"> |
| <a name="spirit.advanced.indepth.parsers_indepth"></a><a class="link" href="parsers_indepth.html" title="Parsers in Depth">Parsers in |
| Depth</a> |
| </h4></div></div></div> |
| <p> |
| This section is not for the faint of heart. Distilled here are the |
| inner workings of <span class="emphasis"><em>Spirit.Qi</em></span> parsers, using real code |
| from the <a href="http://boost-spirit.com" target="_top">Spirit</a> library as |
| examples. There is no reason to fear reading on, though: we have tried |
| to explain things step by step while highlighting the important |
| insights. |
| </p> |
| <p> |
| The <code class="computeroutput"><a class="link" href="../../qi/reference/parser_concepts/parser.html" title="Parser"><code class="computeroutput"><span class="identifier">Parser</span></code></a></code> class is the base |
| class for all parsers. |
| </p> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Derived</span><span class="special">></span> |
| <span class="keyword">struct</span> <span class="identifier">parser</span> |
| <span class="special">{</span> |
| <span class="keyword">struct</span> <span class="identifier">parser_id</span><span class="special">;</span> |
| <span class="keyword">typedef</span> <span class="identifier">Derived</span> <span class="identifier">derived_type</span><span class="special">;</span> |
| <span class="keyword">typedef</span> <span class="identifier">qi</span><span class="special">::</span><span class="identifier">domain</span> <span class="identifier">domain</span><span class="special">;</span> |
| |
| <span class="comment">// Requirement: p.parse(f, l, context, skip, attr) -> bool |
| </span> <span class="comment">// |
| </span> <span class="comment">// p: a parser |
| </span> <span class="comment">// f, l: first/last iterator pair |
| </span> <span class="comment">// context: enclosing rule context (can be unused_type) |
| </span> <span class="comment">// skip: skipper (can be unused_type) |
| </span> <span class="comment">// attr: attribute (can be unused_type) |
| </span> |
| <span class="comment">// Requirement: p.what(context) -> info |
| </span> <span class="comment">// |
| </span> <span class="comment">// p: a parser |
| </span> <span class="comment">// context: enclosing rule context (can be unused_type) |
| </span> |
| <span class="comment">// Requirement: P::template attribute<Ctx, Iter>::type |
| </span> <span class="comment">// |
| </span> <span class="comment">// P: a parser type |
| </span> <span class="comment">// Ctx: A context type (can be unused_type) |
| </span> <span class="comment">// Iter: An iterator type (can be unused_type) |
| </span> |
| <span class="identifier">Derived</span> <span class="keyword">const</span><span class="special">&</span> <span class="identifier">derived</span><span class="special">()</span> <span class="keyword">const</span> |
| <span class="special">{</span> |
| <span class="keyword">return</span> <span class="special">*</span><span class="keyword">static_cast</span><span class="special"><</span><span class="identifier">Derived</span> <span class="keyword">const</span><span class="special">*>(</span><span class="keyword">this</span><span class="special">);</span> |
| <span class="special">}</span> |
| <span class="special">};</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| The <code class="computeroutput"><a class="link" href="../../qi/reference/parser_concepts/parser.html" title="Parser"><code class="computeroutput"><span class="identifier">Parser</span></code></a></code> class does not really |
| know how to parse anything but instead relies on the template parameter |
| <code class="computeroutput"><span class="identifier">Derived</span></code> to do the actual |
| parsing. This technique is known as the "Curiously Recurring Template |
| Pattern" (CRTP) in template meta-programming circles. This inheritance strategy |
| gives us the power of polymorphism without the virtual function overhead. |
| In essence, this is a way to implement compile-time polymorphism. |
| </p> |
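| <p> |
| The pattern can be sketched outside of Spirit as follows. This is a minimal, |
| self-contained illustration, not library code: the namespace sketch, the toy |
| char_parser and the helpers parse_with and matches_prefix are all hypothetical |
| names introduced here, and the parse signature is reduced to an iterator pair. |
| </p> |

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Hypothetical stand-in for the parser base class: it knows nothing about
// parsing and only offers a way back to the concrete parser type.
template <typename Derived>
struct parser
{
    Derived const& derived() const
    {
        return *static_cast<Derived const*>(this);
    }
};

// A toy parser (not part of Spirit) that matches one fixed character.
struct char_parser : parser<char_parser>
{
    explicit char_parser(char ch) : ch(ch) {}

    bool parse(std::string::const_iterator& first,
               std::string::const_iterator const& last) const
    {
        if (first != last && *first == ch)
        {
            ++first;
            return true;
        }
        return false;
    }

    char ch;
};

// Generic code accepts any parser<Derived> and dispatches statically:
// the call to parse() is resolved at compile time, with no virtual call.
template <typename Derived>
bool parse_with(parser<Derived> const& p,
                std::string::const_iterator& first,
                std::string::const_iterator const& last)
{
    return p.derived().parse(first, last);
}

// Convenience wrapper: true if p matches at the start of input.
template <typename Derived>
bool matches_prefix(parser<Derived> const& p, std::string const& input)
{
    std::string::const_iterator first = input.begin();
    return parse_with(p, first, input.end());
}

} // namespace sketch
```

| <p> |
| Under these assumptions, generic code such as parse_with never needs to know |
| the concrete parser type up front; it recovers it through derived(). |
| </p> |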
| <p> |
| The Derived parsers, <code class="computeroutput"><a class="link" href="../../qi/reference/parser_concepts/primitiveparser.html" title="PrimitiveParser"><code class="computeroutput"><span class="identifier">PrimitiveParser</span></code></a></code>, <code class="computeroutput"><a class="link" href="../../qi/reference/parser_concepts/unaryparser.html" title="UnaryParser"><code class="computeroutput"><span class="identifier">UnaryParser</span></code></a></code>, <code class="computeroutput"><a class="link" href="../../qi/reference/parser_concepts/binaryparser.html" title="BinaryParser"><code class="computeroutput"><span class="identifier">BinaryParser</span></code></a></code> and <code class="computeroutput"><a class="link" href="../../qi/reference/parser_concepts/naryparser.html" title="NaryParser"><code class="computeroutput"><span class="identifier">NaryParser</span></code></a></code> provide the |
| necessary facilities for parser detection, introspection, transformation |
| and visitation. |
| </p> |
| <p> |
| Derived parsers must support the following: |
| </p> |
| <div class="variablelist"> |
| <p class="title"><b>bool parse(f, l, context, skip, attr)</b></p> |
| <dl> |
| <dt><span class="term"><code class="computeroutput"><span class="identifier">f</span></code>, <code class="computeroutput"><span class="identifier">l</span></code></span></dt> |
| <dd><p> |
| first/last iterator pair |
| </p></dd> |
| <dt><span class="term"><code class="computeroutput"><span class="identifier">context</span></code></span></dt> |
| <dd><p> |
| enclosing rule context (can be unused_type) |
| </p></dd> |
| <dt><span class="term"><code class="computeroutput"><span class="identifier">skip</span></code></span></dt> |
| <dd><p> |
| skipper (can be unused_type) |
| </p></dd> |
| <dt><span class="term"><code class="computeroutput"><span class="identifier">attr</span></code></span></dt> |
| <dd><p> |
| attribute (can be unused_type) |
| </p></dd> |
| </dl> |
| </div> |
| <p> |
| <span class="emphasis"><em>parse</em></span> is the main parser entry point. The skipper argument, <span class="emphasis"><em>skip</em></span>, |
| can be an <code class="computeroutput"><span class="identifier">unused_type</span></code>, |
| a type used everywhere in <a href="http://boost-spirit.com" target="_top">Spirit</a> |
| to signify "don't-care". The pre-skip function has an overload |
| for <code class="computeroutput"><span class="identifier">unused_type</span></code> that is |
| simply a no-op. That way, we do not have to write multiple parse functions |
| for phrase and character level parsing. |
| </p> |
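| <p> |
| The no-op overload idea can be sketched as below. Everything in this block is |
| an assumption made for illustration: the local unused_type tag, the |
| space_skipper, and the functions pre_skip and skipped are hypothetical names, |
| not the library's own definitions. |
| </p> |

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

namespace sketch {

// Hypothetical "don't care" tag, playing the role of Spirit's unused_type.
struct unused_type {};

// Hypothetical skipper that consumes one whitespace character at a time.
struct space_skipper
{
    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last) const
    {
        if (first != last && std::isspace(static_cast<unsigned char>(*first)))
        {
            ++first;
            return true;
        }
        return false;
    }
};

// General case: apply the skipper until it stops matching.
template <typename Iterator, typename Skipper>
void pre_skip(Iterator& first, Iterator const& last, Skipper const& skipper)
{
    while (skipper.parse(first, last))
        ;
}

// Overload for the "don't care" skipper: a no-op. Being more specialized,
// it is preferred whenever the skipper is an unused_type.
template <typename Iterator>
void pre_skip(Iterator&, Iterator const&, unused_type const&)
{
}

// Helper for demonstration: how many characters does pre-skipping consume?
template <typename Skipper>
std::size_t skipped(std::string const& input, Skipper const& skipper)
{
    std::string::const_iterator first = input.begin();
    std::string::const_iterator const last = input.end();
    pre_skip(first, last, skipper);
    return static_cast<std::size_t>(first - input.begin());
}

} // namespace sketch
```

| <p> |
| With this arrangement a single parse function can call the pre-skip step |
| unconditionally; whether anything is skipped is decided by the skipper type. |
| </p> |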
| <p> |
| Here are the basic rules for parsing: |
| </p> |
| <div class="itemizedlist"><ul class="itemizedlist" type="disc"> |
| <li class="listitem"> |
| The parser returns <code class="computeroutput"><span class="keyword">true</span></code> |
| if successful, <code class="computeroutput"><span class="keyword">false</span></code> otherwise. |
| </li> |
| <li class="listitem"> |
| If successful, <code class="computeroutput"><span class="identifier">first</span></code> |
| is incremented N times, where N is the number of characters |
| parsed. N can be zero, which is an empty (epsilon) match. |
| </li> |
| <li class="listitem"> |
| If successful, the parsed attribute is assigned to <span class="emphasis"><em>attr</em></span>. |
| </li> |
| <li class="listitem"> |
| If unsuccessful, <code class="computeroutput"><span class="identifier">first</span></code> |
| is reset to its position before entering the parser function. <span class="emphasis"><em>attr</em></span> |
| is untouched. |
| </li> |
| </ul></div> |
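| <p> |
| The rules above can be sketched with a toy free function. It is a hypothetical |
| example (parse_terminated_uint is not a Spirit function) that matches one or |
| more decimal digits followed by a semicolon; overflow is deliberately ignored |
| to keep the sketch short. |
| </p> |

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Toy parser (not Spirit code): matches one or more decimal digits followed
// by a ';' and stores the number in attr. Overflow is not checked.
template <typename Iterator>
bool parse_terminated_uint(Iterator& first, Iterator const& last, unsigned& attr)
{
    Iterator it = first;        // work on a copy so failure needs no undo
    unsigned value = 0;
    bool got_digit = false;

    while (it != last && *it >= '0' && *it <= '9')
    {
        value = value * 10 + static_cast<unsigned>(*it - '0');
        got_digit = true;
        ++it;
    }

    if (!got_digit || it == last || *it != ';')
        return false;           // first and attr are left untouched

    ++it;                       // consume the ';'
    attr = value;               // assign the attribute only on success
    first = it;                 // advance first past the consumed characters
    return true;
}

} // namespace sketch
```

| <p> |
| Working on a copy of the iterator is one simple way to guarantee that a failed |
| match leaves both the input position and the attribute as they were. |
| </p> |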
| <div class="variablelist"> |
| <p class="title"><b>info what(context)</b></p> |
| <dl> |
| <dt><span class="term"><code class="computeroutput"><span class="identifier">context</span></code></span></dt> |
| <dd><p> |
| enclosing rule context (can be <code class="computeroutput"><span class="identifier">unused_type</span></code>) |
| </p></dd> |
| </dl> |
| </div> |
| <p> |
| The <span class="emphasis"><em>what</em></span> function should be obvious. It provides some |
| information about <span class="quote">“<span class="quote">what</span>”</span> the parser is. It is used as a debugging |
| aid, for example. |
| </p> |
| <div class="variablelist"> |
| <p class="title"><b>P::template attribute<context, iterator>::type</b></p> |
| <dl> |
| <dt><span class="term"><code class="computeroutput"><span class="identifier">P</span></code></span></dt> |
| <dd><p> |
| a parser type |
| </p></dd> |
| <dt><span class="term"><code class="computeroutput"><span class="identifier">context</span></code></span></dt> |
| <dd><p> |
| a context type (can be unused_type) |
| </p></dd> |
| <dt><span class="term"><code class="computeroutput"><span class="identifier">iterator</span></code></span></dt> |
| <dd><p> |
| an iterator type (can be unused_type) |
| </p></dd> |
| </dl> |
| </div> |
| <p> |
| The <span class="emphasis"><em>attribute</em></span> metafunction returns the expected attribute |
| type of the parser. In some cases, this is context dependent. |
| </p> |
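| <p> |
| As a rough sketch of such a nested metafunction, the block below defines two |
| toy parser types. All names in namespace sketch (int_parser, repeat_parser, |
| the local unused_type and is_same) are hypothetical and stand in for the |
| library's own machinery; only the shape of the nested attribute template |
| follows the requirement stated above. |
| </p> |

```cpp
#include <cassert>
#include <vector>

namespace sketch {

struct unused_type {};

// Toy primitive: its attribute is always int, regardless of context.
struct int_parser
{
    template <typename Context, typename Iterator>
    struct attribute
    {
        typedef int type;
    };
};

// Toy composite: its attribute is computed from the subject's attribute.
template <typename Subject>
struct repeat_parser
{
    template <typename Context, typename Iterator>
    struct attribute
    {
        typedef std::vector<
            typename Subject::template attribute<Context, Iterator>::type
        > type;
    };
};

// Minimal type-equality check so the sketch does not depend on C++11.
template <typename T, typename U>
struct is_same { static const bool value = false; };
template <typename T>
struct is_same<T, T> { static const bool value = true; };

// The attribute lookup, spelled out for both toy parsers.
typedef int_parser::attribute<unused_type, char const*>::type int_attr;
typedef repeat_parser<int_parser>::attribute<unused_type, char const*>::type
    repeat_attr;

} // namespace sketch
```

| <p> |
| Because the attribute type is a compile-time computation, a composite parser |
| can derive its own attribute from that of its subject without ever running it. |
| </p> |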
| <p> |
| In this section, we will dissect two parser types: |
| </p> |
| <div class="variablelist"> |
| <p class="title"><b>Parsers</b></p> |
| <dl> |
| <dt><span class="term"><code class="computeroutput"><a class="link" href="../../qi/reference/parser_concepts/primitiveparser.html" title="PrimitiveParser"><code class="computeroutput"><span class="identifier">PrimitiveParser</span></code></a></code></span></dt> |
| <dd><p> |
| A parser for primitive data (e.g. integer parsing). |
| </p></dd> |
| <dt><span class="term"><code class="computeroutput"><a class="link" href="../../qi/reference/parser_concepts/unaryparser.html" title="UnaryParser"><code class="computeroutput"><span class="identifier">UnaryParser</span></code></a></code></span></dt> |
| <dd><p> |
| A parser that has a single subject (e.g. the Kleene star). |
| </p></dd> |
| </dl> |
| </div> |
| <a name="spirit.advanced.indepth.parsers_indepth.primitive_parsers"></a><h6> |
| <a name="spirit.advanced.indepth.parsers_indepth.primitive_parsers-heading"></a> |
| <a class="link" href="parsers_indepth.html#spirit.advanced.indepth.parsers_indepth.primitive_parsers">Primitive |
| Parsers</a> |
| </h6> |
| <p> |
| For our dissection study, we will use a <a href="http://boost-spirit.com" target="_top">Spirit</a> |
| primitive, the <code class="computeroutput"><span class="identifier">any_int_parser</span></code> |
| in the boost::spirit::qi namespace. |
| </p> |
| <p> |
| [primitive_parsers_any_int_parser] |
| </p> |
| <p> |
| The <code class="computeroutput"><span class="identifier">any_int_parser</span></code> is derived |
| from a <code class="computeroutput"><a class="link" href="../../qi/reference/parser_concepts/primitiveparser.html" title="PrimitiveParser"><code class="computeroutput"><span class="identifier">PrimitiveParser</span></code></a><span class="special"><</span><span class="identifier">Derived</span><span class="special">></span></code>, |
| which in turn derives from <code class="computeroutput"><span class="identifier">parser</span><span class="special"><</span><span class="identifier">Derived</span><span class="special">></span></code>. Therefore, it supports the following |
| requirements: |
| </p> |
| <div class="itemizedlist"><ul class="itemizedlist" type="disc"> |
| <li class="listitem"> |
| The <code class="computeroutput"><span class="identifier">parse</span></code> member function |
| </li> |
| <li class="listitem"> |
| The <code class="computeroutput"><span class="identifier">what</span></code> member function |
| </li> |
| <li class="listitem"> |
| The nested <code class="computeroutput"><span class="identifier">attribute</span></code> |
| metafunction |
| </li> |
| </ul></div> |
| <p> |
| <span class="emphasis"><em>parse</em></span> is the main entry point. For primitive parsers, |
| the first thing to do is call: |
| </p> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="identifier">qi</span><span class="special">::</span><span class="identifier">skip_over</span><span class="special">(</span><span class="identifier">first</span><span class="special">,</span> <span class="identifier">last</span><span class="special">,</span> <span class="identifier">skipper</span><span class="special">);</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| to do a pre-skip. After pre-skipping, the parser proceeds with the actual parsing, |
| which is delegated to <code class="computeroutput"><span class="identifier">extract_int</span><span class="special"><</span><span class="identifier">T</span><span class="special">,</span> <span class="identifier">Radix</span><span class="special">,</span> <span class="identifier">MinDigits</span><span class="special">,</span> <span class="identifier">MaxDigits</span><span class="special">>::</span><span class="identifier">call</span><span class="special">(</span><span class="identifier">first</span><span class="special">,</span> <span class="identifier">last</span><span class="special">,</span> <span class="identifier">attr</span><span class="special">)</span></code>. |
| </p> |
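| <p> |
| The two-step protocol, pre-skip and then extract, might look like the sketch |
| below. This is not the library's integer parser: skip_spaces, extract_digits |
| and toy_uint_parser are hypothetical names, the context and skipper arguments |
| are omitted, what returns a plain string, and overflow is ignored. The sketch |
| also pre-skips on a copy of the iterator so that a failed match leaves |
| first unchanged, which is a choice made here for simplicity. |
| </p> |

```cpp
#include <cassert>
#include <cctype>
#include <string>

namespace sketch {

// Hypothetical stand-in for the pre-skip step: drop leading whitespace.
template <typename Iterator>
void skip_spaces(Iterator& first, Iterator const& last)
{
    while (first != last && std::isspace(static_cast<unsigned char>(*first)))
        ++first;
}

// Hypothetical stand-in for the extraction step: read decimal digits into
// attr. Overflow is ignored to keep the sketch short.
template <typename T>
struct extract_digits
{
    template <typename Iterator>
    static bool call(Iterator& first, Iterator const& last, T& attr)
    {
        Iterator it = first;
        T value = T();
        for (; it != last && *it >= '0' && *it <= '9'; ++it)
            value = value * 10 + static_cast<T>(*it - '0');
        if (it == first)
            return false;       // no digits: nothing consumed, attr untouched
        attr = value;
        first = it;
        return true;
    }
};

// Toy primitive parser exposing parse, what and the attribute metafunction.
template <typename T>
struct toy_uint_parser
{
    template <typename Context, typename Iterator>
    struct attribute { typedef T type; };

    template <typename Iterator>
    bool parse(Iterator& first, Iterator const& last, T& attr) const
    {
        Iterator it = first;
        skip_spaces(it, last);                          // 1. pre-skip
        if (!extract_digits<T>::call(it, last, attr))   // 2. extract
            return false;
        first = it;
        return true;
    }

    std::string what() const { return "unsigned-integer"; }
};

} // namespace sketch
```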
| <p> |
| This simple no-frills protocol is one of the reasons why <a href="http://boost-spirit.com" target="_top">Spirit</a> |
| is fast. If you know the internals of <a href="../../../../../../../libs/spirit/classic/index.html" target="_top"><span class="emphasis"><em>Spirit.Classic</em></span></a>, |
| and perhaps even wrote some parsers with it, you will find this simple <a href="http://boost-spirit.com" target="_top">Spirit</a> |
| mechanism a joy to work with. There are no scanners and none of the machinery that came with them. |
| </p> |
| <p> |
| The <span class="emphasis"><em>what</em></span> function just tells us that it is an integer |
| parser. Simple. |
| </p> |
| <p> |
| The <span class="emphasis"><em>attribute</em></span> metafunction returns the T template |
| parameter. We associate the <code class="computeroutput"><span class="identifier">any_int_parser</span></code> |
| with the placeholders <code class="computeroutput"><span class="identifier">short_</span></code>, |
| <code class="computeroutput"><span class="identifier">int_</span></code>, <code class="computeroutput"><span class="identifier">long_</span></code> |
| and <code class="computeroutput"><span class="identifier">long_long</span></code>. But |
| first, we enable these placeholders in namespace boost::spirit: |
| </p> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">template</span> <span class="special"><></span> <span class="comment">// enables short_ |
| </span><span class="keyword">struct</span> <span class="identifier">use_terminal</span><span class="special"><</span><span class="identifier">qi</span><span class="special">::</span><span class="identifier">domain</span><span class="special">,</span> <span class="identifier">tag</span><span class="special">::</span><span class="identifier">short_</span><span class="special">></span> <span class="special">:</span> <span class="identifier">mpl</span><span class="special">::</span><span class="identifier">true_</span> <span class="special">{};</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">template</span> <span class="special"><></span> <span class="comment">// enables int_ |
| </span><span class="keyword">struct</span> <span class="identifier">use_terminal</span><span class="special"><</span><span class="identifier">qi</span><span class="special">::</span><span class="identifier">domain</span><span class="special">,</span> <span class="identifier">tag</span><span class="special">::</span><span class="identifier">int_</span><span class="special">></span> <span class="special">:</span> <span class="identifier">mpl</span><span class="special">::</span><span class="identifier">true_</span> <span class="special">{};</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">template</span> <span class="special"><></span> <span class="comment">// enables long_ |
| </span><span class="keyword">struct</span> <span class="identifier">use_terminal</span><span class="special"><</span><span class="identifier">qi</span><span class="special">::</span><span class="identifier">domain</span><span class="special">,</span> <span class="identifier">tag</span><span class="special">::</span><span class="identifier">long_</span><span class="special">></span> <span class="special">:</span> <span class="identifier">mpl</span><span class="special">::</span><span class="identifier">true_</span> <span class="special">{};</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">template</span> <span class="special"><></span> <span class="comment">// enables long_long |
| </span><span class="keyword">struct</span> <span class="identifier">use_terminal</span><span class="special"><</span><span class="identifier">qi</span><span class="special">::</span><span class="identifier">domain</span><span class="special">,</span> <span class="identifier">tag</span><span class="special">::</span><span class="identifier">long_long</span><span class="special">></span> <span class="special">:</span> <span class="identifier">mpl</span><span class="special">::</span><span class="identifier">true_</span> <span class="special">{};</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| Notice that <code class="computeroutput"><span class="identifier">any_int_parser</span></code> |
| is placed in the namespace boost::spirit::qi while these <span class="emphasis"><em>enablers</em></span> |
| are in namespace boost::spirit. The reason is that these placeholders are |
| shared by other <a href="http://boost-spirit.com" target="_top">Spirit</a> <span class="emphasis"><em>domains</em></span>. |
| <span class="emphasis"><em>Spirit.Qi</em></span>, the parser, is one domain. <span class="emphasis"><em>Spirit.Karma</em></span>, |
| the generator, is another domain. Other parser technologies may be developed |
| and placed in yet another domain. Yet, all these can potentially share |
| the same placeholders for interoperability. The interpretation of these |
| placeholders is domain-specific. |
| </p> |
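| <p> |
| The enabler idea can be sketched with a small trait. The domains, tags and the |
| true_ and false_ helpers below are hypothetical stand-ins defined locally; only |
| the technique, one full specialization per domain and tag pair, mirrors the |
| enablers shown above. |
| </p> |

```cpp
#include <cassert>

namespace sketch {

// Hypothetical domains and placeholder tags, mirroring the roles of a
// parser domain, a generator domain and an integer placeholder tag.
struct parser_domain {};
struct generator_domain {};
namespace tag { struct int_ {}; struct only_parsed {}; }

struct true_  { static const bool value = true; };
struct false_ { static const bool value = false; };

// Primary template: a placeholder is disabled for a domain by default.
template <typename Domain, typename Tag>
struct use_terminal : false_ {};

// Enablers: one full specialization per (domain, tag) pair.
template <>
struct use_terminal<parser_domain, tag::int_> : true_ {};
template <>
struct use_terminal<generator_domain, tag::int_> : true_ {};
template <>
struct use_terminal<parser_domain, tag::only_parsed> : true_ {};

} // namespace sketch
```

| <p> |
| In this sketch the same tag is enabled in two domains, while another tag is |
| enabled in only one; each domain remains free to interpret a tag its own way. |
| </p> |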
| <p> |
| Now that we have enabled the placeholders, we have to write generators for them. |
| These are the make_xxx function objects (in the boost::spirit::qi namespace): |
| </p> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">template</span> <span class="special"><</span> |
| <span class="keyword">typename</span> <span class="identifier">T</span> |
| <span class="special">,</span> <span class="keyword">unsigned</span> <span class="identifier">Radix</span> <span class="special">=</span> <span class="number">10</span> |
| <span class="special">,</span> <span class="keyword">unsigned</span> <span class="identifier">MinDigits</span> <span class="special">=</span> <span class="number">1</span> |
| <span class="special">,</span> <span class="keyword">int</span> <span class="identifier">MaxDigits</span> <span class="special">=</span> <span class="special">-</span><span class="number">1</span><span class="special">></span> |
| <span class="keyword">struct</span> <span class="identifier">make_int</span> |
| <span class="special">{</span> |
| <span class="keyword">typedef</span> <span class="identifier">any_int_parser</span><span class="special"><</span><span class="identifier">T</span><span class="special">,</span> <span class="identifier">Radix</span><span class="special">,</span> <span class="identifier">MinDigits</span><span class="special">,</span> <span class="identifier">MaxDigits</span><span class="special">></span> <span class="identifier">result_type</span><span class="special">;</span> |
| <span class="identifier">result_type</span> <span class="keyword">operator</span><span class="special">()(</span><span class="identifier">unused_type</span><span class="special">,</span> <span class="identifier">unused_type</span><span class="special">)</span> <span class="keyword">const</span> |
| <span class="special">{</span> |
| <span class="keyword">return</span> <span class="identifier">result_type</span><span class="special">();</span> |
| <span class="special">}</span> |
| <span class="special">};</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| The one above is our main generator. It's a simple function object with |
| two (unused) arguments. These arguments are: |
| </p> |
| <div class="orderedlist"><ol class="orderedlist" type="1"> |
| <li class="listitem"> |
| The actual terminal value obtained by Proto. In this case, it is one of |
| short_, int_, long_ or long_long. We don't care about this. |
| </li> |
| <li class="listitem"> |
| Modifiers. We also don't care about this. This allows directives such |
| as <code class="computeroutput"><span class="identifier">no_case</span><span class="special">[</span><span class="identifier">p</span><span class="special">]</span></code> |
| to pass information to inner parser nodes. We'll see how that works |
| later. |
| </li> |
| </ol></div> |
| <p> |
| Now: |
| </p> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Modifiers</span><span class="special">></span> |
| <span class="keyword">struct</span> <span class="identifier">make_primitive</span><span class="special"><</span><span class="identifier">tag</span><span class="special">::</span><span class="identifier">short_</span><span class="special">,</span> <span class="identifier">Modifiers</span><span class="special">></span> |
| <span class="special">:</span> <span class="identifier">make_int</span><span class="special"><</span><span class="keyword">short</span><span class="special">></span> <span class="special">{};</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Modifiers</span><span class="special">></span> |
| <span class="keyword">struct</span> <span class="identifier">make_primitive</span><span class="special"><</span><span class="identifier">tag</span><span class="special">::</span><span class="identifier">int_</span><span class="special">,</span> <span class="identifier">Modifiers</span><span class="special">></span> |
| <span class="special">:</span> <span class="identifier">make_int</span><span class="special"><</span><span class="keyword">int</span><span class="special">></span> <span class="special">{};</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Modifiers</span><span class="special">></span> |
| <span class="keyword">struct</span> <span class="identifier">make_primitive</span><span class="special"><</span><span class="identifier">tag</span><span class="special">::</span><span class="identifier">long_</span><span class="special">,</span> <span class="identifier">Modifiers</span><span class="special">></span> |
| <span class="special">:</span> <span class="identifier">make_int</span><span class="special"><</span><span class="keyword">long</span><span class="special">></span> <span class="special">{};</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Modifiers</span><span class="special">></span> |
| <span class="keyword">struct</span> <span class="identifier">make_primitive</span><span class="special"><</span><span class="identifier">tag</span><span class="special">::</span><span class="identifier">long_long</span><span class="special">,</span> <span class="identifier">Modifiers</span><span class="special">></span> |
| <span class="special">:</span> <span class="identifier">make_int</span><span class="special"><</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">long_long_type</span><span class="special">></span> <span class="special">{};</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| These specialize <code class="computeroutput"><span class="identifier">qi</span><span class="special">::</span><span class="identifier">make_primitive</span></code> for specific tags. They |
| all inherit from <code class="computeroutput"><span class="identifier">make_int</span></code> |
| which does the actual work. |
| </p> |
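| <p> |
| How a tag might be turned into a parser object through these specializations |
| can be sketched as follows. The block is self-contained and simplified: |
| toy_int_parser, the compile function, the local unused_type and is_same are |
| hypothetical, and the make_int and make_primitive templates here are reduced |
| local versions written for this illustration, not the library's definitions. |
| </p> |

```cpp
#include <cassert>

namespace sketch {

struct unused_type {};
namespace tag { struct short_ {}; struct int_ {}; }

// Toy parser type produced by the generators; it only remembers its radix.
template <typename T, unsigned Radix>
struct toy_int_parser
{
    typedef T attribute_type;
    static unsigned radix() { return Radix; }
};

// The main generator: a function object taking two (unused) arguments.
template <typename T, unsigned Radix = 10>
struct make_int
{
    typedef toy_int_parser<T, Radix> result_type;
    result_type operator()(unused_type, unused_type) const
    {
        return result_type();
    }
};

// Primary template is left undefined: a tag without a specialization
// cannot be turned into a parser.
template <typename Tag, typename Modifiers>
struct make_primitive;

template <typename Modifiers>
struct make_primitive<tag::short_, Modifiers> : make_int<short> {};

template <typename Modifiers>
struct make_primitive<tag::int_, Modifiers> : make_int<int> {};

// Hypothetical compile step: look up the generator for a tag and call it.
template <typename Tag>
typename make_primitive<Tag, unused_type>::result_type
compile(Tag)
{
    return make_primitive<Tag, unused_type>()(unused_type(), unused_type());
}

// Minimal type-equality check so the sketch does not depend on C++11.
template <typename T, typename U>
struct is_same { static const bool value = false; };
template <typename T>
struct is_same<T, T> { static const bool value = true; };

} // namespace sketch
```

| <p> |
| Each specialization only selects the integer type; the shared generator it |
| inherits from carries the call operator that builds the parser. |
| </p> |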
| <a name="spirit.advanced.indepth.parsers_indepth.composite_parsers"></a><h6> |
| <a name="spirit.advanced.indepth.parsers_indepth.composite_parsers-heading"></a> |
| <a class="link" href="parsers_indepth.html#spirit.advanced.indepth.parsers_indepth.composite_parsers">Composite |
| Parsers</a> |
| </h6> |
| <p> |
| Let me present the Kleene star (also in namespace boost::spirit::qi): |
| </p> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Subject</span><span class="special">></span> |
| <span class="keyword">struct</span> <span class="identifier">kleene</span> <span class="special">:</span> <span class="identifier">unary_parser</span><span class="special"><</span><span class="identifier">kleene</span><span class="special"><</span><span class="identifier">Subject</span><span class="special">></span> <span class="special">></span> |
| <span class="special">{</span> |
| <span class="keyword">typedef</span> <span class="identifier">Subject</span> <span class="identifier">subject_type</span><span class="special">;</span> |
| |
| <span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Context</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">Iterator</span><span class="special">></span> |
| <span class="keyword">struct</span> <span class="identifier">attribute</span> |
| <span class="special">{</span> |
| <span class="comment">// Build a std::vector from the subject's attribute. Note |
| </span> <span class="comment">// that build_std_vector may return unused_type if the |
| </span> <span class="comment">// subject's attribute is an unused_type. |
| </span> <span class="keyword">typedef</span> <span class="keyword">typename</span> |
| <span class="identifier">traits</span><span class="special">::</span><span class="identifier">build_std_vector</span><span class="special"><</span> |
| <span class="keyword">typename</span> <span class="identifier">traits</span><span class="special">::</span> |
| <span class="identifier">attribute_of</span><span class="special"><</span><span class="identifier">Subject</span><span class="special">,</span> <span class="identifier">Context</span><span class="special">,</span> <span class="identifier">Iterator</span><span class="special">>::</span><span class="identifier">type</span> |
| <span class="special">>::</span><span class="identifier">type</span> |
| <span class="identifier">type</span><span class="special">;</span> |
| <span class="special">};</span> |
| |
| <span class="identifier">kleene</span><span class="special">(</span><span class="identifier">Subject</span> <span class="keyword">const</span><span class="special">&</span> <span class="identifier">subject</span><span class="special">)</span> |
| <span class="special">:</span> <span class="identifier">subject</span><span class="special">(</span><span class="identifier">subject</span><span class="special">)</span> <span class="special">{}</span> |
| |
| <span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">F</span><span class="special">></span> |
| <span class="keyword">bool</span> <span class="identifier">parse_container</span><span class="special">(</span><span class="identifier">F</span> <span class="identifier">f</span><span class="special">)</span> <span class="keyword">const</span> |
| <span class="special">{</span> |
| <span class="keyword">while</span> <span class="special">(!</span><span class="identifier">f</span> <span class="special">(</span><span class="identifier">subject</span><span class="special">))</span> |
| <span class="special">;</span> |
| <span class="keyword">return</span> <span class="keyword">true</span><span class="special">;</span> |
| <span class="special">}</span> |
| |
| <span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Iterator</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">Context</span> |
| <span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">Skipper</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">Attribute</span><span class="special">></span> |
| <span class="keyword">bool</span> <span class="identifier">parse</span><span class="special">(</span><span class="identifier">Iterator</span><span class="special">&</span> <span class="identifier">first</span><span class="special">,</span> <span class="identifier">Iterator</span> <span class="keyword">const</span><span class="special">&</span> <span class="identifier">last</span> |
| <span class="special">,</span> <span class="identifier">Context</span><span class="special">&</span> <span class="identifier">context</span><span class="special">,</span> <span class="identifier">Skipper</span> <span class="keyword">const</span><span class="special">&</span> <span class="identifier">skipper</span> |
| <span class="special">,</span> <span class="identifier">Attribute</span><span class="special">&</span> <span class="identifier">attr</span><span class="special">)</span> <span class="keyword">const</span> |
| <span class="special">{</span> |
| <span class="comment">// ensure the attribute is actually a container type |
| </span> <span class="identifier">traits</span><span class="special">::</span><span class="identifier">make_container</span><span class="special">(</span><span class="identifier">attr</span><span class="special">);</span> |
| |
| <span class="keyword">typedef</span> <span class="identifier">detail</span><span class="special">::</span><span class="identifier">fail_function</span><span class="special"><</span><span class="identifier">Iterator</span><span class="special">,</span> <span class="identifier">Context</span><span class="special">,</span> <span class="identifier">Skipper</span><span class="special">></span> |
| <span class="identifier">fail_function</span><span class="special">;</span> |
| |
| <span class="identifier">Iterator</span> <span class="identifier">iter</span> <span class="special">=</span> <span class="identifier">first</span><span class="special">;</span> |
| <span class="identifier">fail_function</span> <span class="identifier">f</span><span class="special">(</span><span class="identifier">iter</span><span class="special">,</span> <span class="identifier">last</span><span class="special">,</span> <span class="identifier">context</span><span class="special">,</span> <span class="identifier">skipper</span><span class="special">);</span> |
| <span class="identifier">parse_container</span><span class="special">(</span><span class="identifier">detail</span><span class="special">::</span><span class="identifier">make_pass_container</span><span class="special">(</span><span class="identifier">f</span><span class="special">,</span> <span class="identifier">attr</span><span class="special">));</span> |
| |
| <span class="identifier">first</span> <span class="special">=</span> <span class="identifier">f</span><span class="special">.</span><span class="identifier">first</span><span class="special">;</span> |
| <span class="keyword">return</span> <span class="keyword">true</span><span class="special">;</span> |
| <span class="special">}</span> |
| |
| <span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Context</span><span class="special">></span> |
| <span class="identifier">info</span> <span class="identifier">what</span><span class="special">(</span><span class="identifier">Context</span><span class="special">&</span> <span class="identifier">context</span><span class="special">)</span> <span class="keyword">const</span> |
| <span class="special">{</span> |
| <span class="keyword">return</span> <span class="identifier">info</span><span class="special">(</span><span class="string">"kleene"</span><span class="special">,</span> <span class="identifier">subject</span><span class="special">.</span><span class="identifier">what</span><span class="special">(</span><span class="identifier">context</span><span class="special">));</span> |
| <span class="special">}</span> |
| |
| <span class="identifier">Subject</span> <span class="identifier">subject</span><span class="special">;</span> |
| <span class="special">};</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| It looks similar in form to its primitive cousin, the <code class="computeroutput"><span class="identifier">int_parser</span></code>. |
| And, again, it has the same basic ingredients required by <code class="computeroutput"><span class="identifier">Derived</span></code>. |
| </p> |
| <div class="itemizedlist"><ul class="itemizedlist" type="disc"> |
| <li class="listitem"> |
| The nested attribute metafunction |
| </li> |
| <li class="listitem"> |
| The parse member function |
| </li> |
| <li class="listitem"> |
| The what member function |
| </li> |
| </ul></div> |
| <p> |
| kleene is a composite parser. It is a parser that composes another parser, |
| its <span class="quote">“<span class="quote">subject</span>”</span>. It is a <code class="computeroutput"><a class="link" href="../../qi/reference/parser_concepts/unaryparser.html" title="UnaryParser"><code class="computeroutput"><span class="identifier">UnaryParser</span></code></a></code> and subclasses |
| from it. Like <code class="computeroutput"><a class="link" href="../../qi/reference/parser_concepts/primitiveparser.html" title="PrimitiveParser"><code class="computeroutput"><span class="identifier">PrimitiveParser</span></code></a></code>, <code class="computeroutput"><a class="link" href="../../qi/reference/parser_concepts/unaryparser.html" title="UnaryParser"><code class="computeroutput"><span class="identifier">UnaryParser</span></code></a><span class="special"><</span><span class="identifier">Derived</span><span class="special">></span></code> |
| derives from <code class="computeroutput"><span class="identifier">parser</span><span class="special"><</span><span class="identifier">Derived</span><span class="special">></span></code>. |
| </p> |
| <p> |
| unary_parser<Derived> has these expression requirements on Derived: |
| </p> |
| <div class="itemizedlist"><ul class="itemizedlist" type="disc"> |
| <li class="listitem"> |
| p.subject -> subject parser ( <span class="emphasis"><em>p</em></span> is a <a class="link" href="../../qi/reference/parser_concepts/unaryparser.html" title="UnaryParser"><code class="computeroutput"><span class="identifier">UnaryParser</span></code></a> parser.) |
| </li> |
| <li class="listitem"> |
| P::subject_type -> subject parser type ( <span class="emphasis"><em>P</em></span> |
| is a <a class="link" href="../../qi/reference/parser_concepts/unaryparser.html" title="UnaryParser"><code class="computeroutput"><span class="identifier">UnaryParser</span></code></a> type.) |
| </li> |
| </ul></div> |
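| <p> |
| The two requirements above can be illustrated with a minimal, self-contained sketch. It does not use the real Spirit headers: <code class="computeroutput"><span class="identifier">parser_sketch</span></code>, <code class="computeroutput"><span class="identifier">unary_parser_sketch</span></code>, <code class="computeroutput"><span class="identifier">char_sketch</span></code>, <code class="computeroutput"><span class="identifier">kleene_sketch</span></code> and <code class="computeroutput"><span class="identifier">subject_of</span></code> are hypothetical names chosen for illustration only. |
| </p> |

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the library's base classes (not the real Spirit code).
template <typename Derived>
struct parser_sketch
{
    // CRTP: recover the concrete parser from its base.
    Derived const& derived() const
    {
        return *static_cast<Derived const*>(this);
    }
};

template <typename Derived>
struct unary_parser_sketch : parser_sketch<Derived> {};

// A trivial subject that just remembers one character.
struct char_sketch : parser_sketch<char_sketch>
{
    explicit char_sketch(char c) : ch(c) {}
    char ch;
};

// A unary composite exposing the two things the concept asks for.
template <typename Subject>
struct kleene_sketch : unary_parser_sketch<kleene_sketch<Subject> >
{
    typedef Subject subject_type;   // P::subject_type
    explicit kleene_sketch(Subject const& s) : subject(s) {}
    Subject subject;                // p.subject
};

// Generic code may rely on both expressions for any unary parser.
template <typename P>
typename P::subject_type const& subject_of(unary_parser_sketch<P> const& p)
{
    return p.derived().subject;
}

inline void check_unary_sketch()
{
    kleene_sketch<char_sketch> k((char_sketch('a')));
    static_assert(
        std::is_same<kleene_sketch<char_sketch>::subject_type, char_sketch>::value,
        "subject_type names the subject parser type");
    assert(subject_of(k).ch == 'a');
}
```

| <p> |
| Calling the hypothetical <code class="computeroutput"><span class="identifier">check_unary_sketch</span><span class="special">()</span></code> exercises both expressions: the nested type is checked at compile time and the stored subject at run time. |
| </p> |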
| <p> |
| <span class="emphasis"><em>parse</em></span> is the main parser entry point. Since this is |
| not a primitive parser, we do not need to call <code class="computeroutput"><span class="identifier">qi</span><span class="special">::</span><span class="identifier">skip</span><span class="special">(</span><span class="identifier">first</span><span class="special">,</span> <span class="identifier">last</span><span class="special">,</span> <span class="identifier">skipper</span><span class="special">)</span></code>. The <span class="emphasis"><em>subject</em></span>, if |
| it is a primitive, will do the pre-skip. If it is another composite |
| parser, it will eventually call a primitive parser somewhere down the line |
| which will do the pre-skip. This makes it a lot more efficient than <a href="../../../../../../../libs/spirit/classic/index.html" target="_top"><span class="emphasis"><em>Spirit.Classic</em></span></a>. |
| <a href="../../../../../../../libs/spirit/classic/index.html" target="_top"><span class="emphasis"><em>Spirit.Classic</em></span></a> |
| puts the skipping business into the so-called "scanner" which |
| blindly attempts a pre-skip every time we increment the iterator. |
| </p> |
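| <p> |
| The division of labor described above can be sketched without any library support. This is only an illustration under assumed names (<code class="computeroutput"><span class="identifier">skip_spaces</span></code>, <code class="computeroutput"><span class="identifier">parse_digit</span></code>, <code class="computeroutput"><span class="identifier">parse_digits</span></code> and <code class="computeroutput"><span class="identifier">digits_of</span></code> are hypothetical, and whitespace is assumed to be the skipper): the primitive pre-skips, while the composite only drives its subject. |
| </p> |

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

typedef std::string::const_iterator iter_t;

// Hypothetical skipper: consumes leading whitespace.
inline void skip_spaces(iter_t& first, iter_t const& last)
{
    while (first != last && std::isspace(static_cast<unsigned char>(*first)))
        ++first;
}

// Primitive: does its own pre-skip, then matches a single digit.
inline bool parse_digit(iter_t& first, iter_t const& last, int& attr)
{
    skip_spaces(first, last);
    if (first == last || !std::isdigit(static_cast<unsigned char>(*first)))
        return false;
    attr = *first - '0';
    ++first;
    return true;
}

// Composite: never skips by itself; it only calls its subject repeatedly.
inline bool parse_digits(iter_t& first, iter_t const& last, std::vector<int>& attr)
{
    int val = 0;
    while (parse_digit(first, last, val))
        attr.push_back(val);
    return true;    // zero matches still counts as success
}

// Convenience wrapper for trying the sketch on a string.
inline std::vector<int> digits_of(std::string const& input)
{
    iter_t first = input.begin();
    std::vector<int> out;
    parse_digits(first, input.end(), out);
    return out;
}

inline void check_skip_sketch()
{
    assert(digits_of(" 1 2  3").size() == 3);
    assert(digits_of("x").empty());
}
```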
| <p> |
| What is the <span class="emphasis"><em>attribute</em></span> of the kleene? In general, it |
| is a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">vector</span><span class="special"><</span><span class="identifier">T</span><span class="special">></span></code> |
| where <code class="computeroutput"><span class="identifier">T</span></code> is the attribute |
| of the subject. There is a special case though. If <code class="computeroutput"><span class="identifier">T</span></code> |
| is an <code class="computeroutput"><span class="identifier">unused_type</span></code>, then |
| the attribute of kleene is also <code class="computeroutput"><span class="identifier">unused_type</span></code>. |
| <code class="computeroutput"><span class="identifier">traits</span><span class="special">::</span><span class="identifier">build_std_vector</span></code> takes care of that minor |
| detail. |
| </p> |
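| <p> |
| As a rough sketch of the attribute rule just described, a metafunction with one specialization is enough. The names <code class="computeroutput"><span class="identifier">unused_sketch</span></code> and <code class="computeroutput"><span class="identifier">build_vector_sketch</span></code> are hypothetical stand-ins; this is not the actual definition of <code class="computeroutput"><span class="identifier">traits</span><span class="special">::</span><span class="identifier">build_std_vector</span></code>. |
| </p> |

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for Spirit's unused_type.
struct unused_sketch {};

// General case: the kleene attribute is a vector of the subject's attribute.
template <typename T>
struct build_vector_sketch
{
    typedef std::vector<T> type;
};

// Special case: an unused subject attribute stays unused.
template <>
struct build_vector_sketch<unused_sketch>
{
    typedef unused_sketch type;
};

inline void check_attribute_sketch()
{
    assert((std::is_same<build_vector_sketch<int>::type,
                         std::vector<int> >::value));
    assert((std::is_same<build_vector_sketch<unused_sketch>::type,
                         unused_sketch>::value));
}
```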
| <p> |
| So, let's parse. First, we need to provide a local attribute for the |
| subject: |
| </p> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">typename</span> <span class="identifier">traits</span><span class="special">::</span><span class="identifier">attribute_of</span><span class="special"><</span><span class="identifier">Subject</span><span class="special">,</span> <span class="identifier">Context</span><span class="special">>::</span><span class="identifier">type</span> <span class="identifier">val</span><span class="special">;</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| <code class="computeroutput"><span class="identifier">traits</span><span class="special">::</span><span class="identifier">attribute_of</span><span class="special"><</span><span class="identifier">Subject</span><span class="special">,</span> <span class="identifier">Context</span><span class="special">></span></code> |
| simply calls the subject's <code class="computeroutput"><span class="keyword">struct</span> |
| <span class="identifier">attribute</span><span class="special"><</span><span class="identifier">Context</span><span class="special">></span></code> |
| nested metafunction. |
| </p> |
| <p> |
| <span class="emphasis"><em>val</em></span> starts out default initialized. This val is the |
| one we'll pass to the subject's parse function. |
| </p> |
| <p> |
| The kleene repeats indefinitely while the subject parser is successful. |
| On each successful parse, we <code class="computeroutput"><span class="identifier">push_back</span></code> |
| the parsed attribute to the kleene's attribute, which is expected to be, |
| at the very least, compatible with a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">vector</span></code>. |
| In other words, although we say that we want our attribute to be a <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">vector</span></code>, we try to be more lenient than |
| that. The caller of kleene's parse may pass a different attribute type. |
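| As long as it is also a conforming STL container with <code class="computeroutput"><span class="identifier">push_back</span></code>, we are ok. Here is the kleene |
| loop in simplified form (the listing above expresses the same loop through <code class="computeroutput"><span class="identifier">parse_container</span></code>): |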
| </p> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">while</span> <span class="special">(</span><span class="identifier">subject</span><span class="special">.</span><span class="identifier">parse</span><span class="special">(</span><span class="identifier">first</span><span class="special">,</span> <span class="identifier">last</span><span class="special">,</span> <span class="identifier">context</span><span class="special">,</span> <span class="identifier">skipper</span><span class="special">,</span> <span class="identifier">val</span><span class="special">))</span> |
| <span class="special">{</span> |
| <span class="comment">// push the parsed value into our attribute |
| </span> <span class="identifier">traits</span><span class="special">::</span><span class="identifier">push_back</span><span class="special">(</span><span class="identifier">attr</span><span class="special">,</span> <span class="identifier">val</span><span class="special">);</span> |
| <span class="identifier">traits</span><span class="special">::</span><span class="identifier">clear</span><span class="special">(</span><span class="identifier">val</span><span class="special">);</span> |
| <span class="special">}</span> |
| <span class="keyword">return</span> <span class="keyword">true</span><span class="special">;</span> |
| </pre> |
| <p> |
| Take note that we didn't call attr.push_back(val). Instead, we called a |
| Spirit-provided function: |
| </p> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="identifier">traits</span><span class="special">::</span><span class="identifier">push_back</span><span class="special">(</span><span class="identifier">attr</span><span class="special">,</span> <span class="identifier">val</span><span class="special">);</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| This is a recurring pattern. The reason why we do it this way is because |
| attr <span class="bold"><strong>can</strong></span> be <code class="computeroutput"><span class="identifier">unused_type</span></code>. |
| <code class="computeroutput"><span class="identifier">traits</span><span class="special">::</span><span class="identifier">push_back</span></code> takes care of that detail. |
| The overload for unused_type is a no-op. Now, you can imagine why <a href="http://boost-spirit.com" target="_top">Spirit</a> is fast! The parsers are so |
| simple and the generated code is as efficient as a hand rolled loop. All |
| these parser compositions and recursive parse invocations are extensively |
| inlined by a modern C++ compiler. In the end, you get a tight loop when |
| you use the kleene. No more excess baggage. If the attribute is unused, |
| then there is no code generated for that. That's how <a href="http://boost-spirit.com" target="_top">Spirit</a> |
| is designed. |
| </p> |
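| <p> |
| The overload trick can be sketched in a few lines. This is an illustration only, with hypothetical names (<code class="computeroutput"><span class="identifier">unused_sketch</span></code>, <code class="computeroutput"><span class="identifier">traits_sketch</span><span class="special">::</span><span class="identifier">push_back</span></code>, <code class="computeroutput"><span class="identifier">collect</span></code>); it is not the library's own implementation. The same loop compiles for a real container and for the unused placeholder, and in the unused case the call has an empty body. |
| </p> |

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Spirit's unused_type.
struct unused_sketch {};

namespace traits_sketch
{
    // Container case: forward to the container's own push_back.
    template <typename Container, typename T>
    inline void push_back(Container& c, T const& val)
    {
        c.push_back(val);
    }

    // Unused case: nothing to store, so nothing to do.
    template <typename T>
    inline void push_back(unused_sketch&, T const&) {}
}

// A loop written once, usable with or without a real attribute.
template <typename Attribute>
inline int collect(int const* first, int const* last, Attribute& attr)
{
    int count = 0;
    for (; first != last; ++first, ++count)
        traits_sketch::push_back(attr, *first);
    return count;
}

inline void check_push_back_sketch()
{
    int const data[] = {1, 2, 3};
    std::vector<int> v;
    unused_sketch u;
    assert(collect(data, data + 3, v) == 3 && v.size() == 3);
    assert(collect(data, data + 3, u) == 3);   // no storage involved
}
```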
| <p> |
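| The <span class="emphasis"><em>what</em></span> function simply wraps the output of the subject's |
| <span class="emphasis"><em>what</em></span> in an <code class="computeroutput"><span class="identifier">info</span></code> object tagged <code class="computeroutput"><span class="string">"kleene"</span></code>. |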
| </p> |
| <p> |
| Ok, now, like the <code class="computeroutput"><span class="identifier">int_parser</span></code>, |
| we have to hook our parser to the <span class="underline">qi</span> |
| engine. Here's how we do it: |
| </p> |
| <p> |
| First, we enable the prefix star operator. In proto, it's called the "dereference": |
| </p> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">template</span> <span class="special"><></span> |
| <span class="keyword">struct</span> <span class="identifier">use_operator</span><span class="special"><</span><span class="identifier">qi</span><span class="special">::</span><span class="identifier">domain</span><span class="special">,</span> <span class="identifier">proto</span><span class="special">::</span><span class="identifier">tag</span><span class="special">::</span><span class="identifier">dereference</span><span class="special">></span> <span class="comment">// enables *p |
| </span> <span class="special">:</span> <span class="identifier">mpl</span><span class="special">::</span><span class="identifier">true_</span> <span class="special">{};</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| This is done in namespace <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">spirit</span></code> |
| like its friend, the <code class="computeroutput"><span class="identifier">use_terminal</span></code> |
| specialization for our <code class="computeroutput"><span class="identifier">int_parser</span></code>. |
| Obviously, we use <span class="emphasis"><em>use_operator</em></span> to enable the dereference |
| for the qi::domain. |
| </p> |
| <p> |
| Then, we need to write our generator (in namespace qi): |
| </p> |
| <p> |
| |
| </p> |
| <pre class="programlisting"><span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Elements</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">Modifiers</span><span class="special">></span> |
| <span class="keyword">struct</span> <span class="identifier">make_composite</span><span class="special"><</span><span class="identifier">proto</span><span class="special">::</span><span class="identifier">tag</span><span class="special">::</span><span class="identifier">dereference</span><span class="special">,</span> <span class="identifier">Elements</span><span class="special">,</span> <span class="identifier">Modifiers</span><span class="special">></span> |
| <span class="special">:</span> <span class="identifier">make_unary_composite</span><span class="special"><</span><span class="identifier">Elements</span><span class="special">,</span> <span class="identifier">kleene</span><span class="special">></span> |
| <span class="special">{};</span> |
| </pre> |
| <p> |
| </p> |
| <p> |
| This essentially says: for every expression of the form <code class="computeroutput"><span class="special">*</span><span class="identifier">p</span></code>, build a kleene parser. Elements |
| is a <a href="../../../../../../../libs/fusion/doc/html/index.html" target="_top">Boost.Fusion</a> |
| sequence. For the kleene, which is a unary operator, expect only one element |
| in the sequence. That element is the subject of the kleene. |
| </p> |
| <p> |
| We still don't care about the Modifiers. We'll see what the modifiers are |
| all about when we get to deep directives. |
| </p> |
| </div> |
| <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr> |
| <td align="left"></td> |
| <td align="right"><div class="copyright-footer">Copyright © 2001-2011 Joel de Guzman, Hartmut Kaiser<p> |
| Distributed under the Boost Software License, Version 1.0. (See accompanying |
| file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>) |
| </p> |
| </div></td> |
| </tr></table> |
| <hr> |
| <div class="spirit-nav"> |
| <a accesskey="p" href="../indepth.html"><img src="../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../indepth.html"><img src="../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../index.html"><img src="../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="../customize.html"><img src="../../../../../../../doc/src/images/next.png" alt="Next"></a> |
| </div> |
| </body> |
| </html> |