| [/============================================================================== |
| Copyright (C) 2001-2018 Joel de Guzman |
| |
| Distributed under the Boost Software License, Version 1.0. (See accompanying |
| file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) |
| |
| I would like to thank Rainbowverse, llc (https://primeorbial.com/) |
| for sponsoring this work and donating it to the community. |
| ===============================================================================/] |
| |
| [section:annotation Annotations - Decorating the ASTs] |
| |
| As a prerequisite in understanding this tutorial, please review the previous |
| [tutorial_employee employee example]. This example builds on top of that |
| example. |
| |
| Stop and think about it... We're actually generating ASTs (abstract syntax |
| trees) in our previoius examples. We parsed a single structure and generated |
| an in-memory representation of it in the form of a struct: the struct |
| employee. If we changed the implementation to parse one or more employees, |
| the result would be a std::vector<employee>. We can go on and add more |
| hierarchy: teams, departments, corporations, etc. We can have an AST |
| representation of it all. |
| |
| This example shows how to annotate the AST with the iterator positions for |
| access to the source code when post processing using a client supplied |
| `on_success` handler. The example will show how to get the position in input |
| source stream that corresponds to a given element in the AST. |
| |
| In addition, This example also shows how to "inject" client data, using the |
| "with" directive, that the `on_success` handler can access as it is called |
| within the parse traversal through the parser's context. |
| |
| The full cpp file for this example can be found here: |
| [@../../../example/x3/annotation.cpp annotation.cpp] |
| |
| [heading The AST] |
| |
| First, we'll update our previous employee struct, this time separating the |
| person into its own struct. So now, we have two structs, the `person` and the |
| `employee`. Take note too that we now inherit `person` and `employee` from |
| `x3::position_tagged` which provides positional information that we can use |
| to tell the AST's position in the input stream anytime. |
| |
| namespace client { namespace ast |
| { |
| struct person : x3::position_tagged |
| { |
| person( |
| std::string const& first_name = "" |
| , std::string const& last_name = "" |
| ) |
| : first_name(first_name) |
| , last_name(last_name) |
| {} |
| |
| std::string first_name, last_name; |
| }; |
| |
| struct employee : x3::position_tagged |
| { |
| int age; |
| person who; |
| double salary; |
| }; |
| }} |
| |
| Like before, we need to tell __fusion__ about our structs to make them |
| first-class fusion citizens that the grammar can utilize: |
| |
| BOOST_FUSION_ADAPT_STRUCT(client::ast::person, |
| first_name, last_name |
| ) |
| |
| BOOST_FUSION_ADAPT_STRUCT(client::ast::employee, |
| age, who, salary |
| ) |
| |
| [heading x3::position_cache] |
| |
| Before we proceed, let me introduce a helper class called the |
| `position_cache`. It is a simple class that collects iterator ranges that |
| point to where each element in the AST are located in the input stream. Given |
| an AST, you can query the position_cache about AST's position. For example: |
| |
| auto pos = positions.position_of(my_ast); |
| |
| Where `my_ast` is the AST, `positions` and is the `position_cache`, |
| `position_of` returns an iterator range that points to the start and end |
| (`pos.begin()` and `pos.end()`) positions where the AST was parsed from. |
| `positions.begin()` and `positions.end()` points to the start and end of the |
| entire input stream. |
| |
| [heading on_success] |
| |
| The `on_success` gives you everything you want from semantic actions without |
| the visual clutter. Declarative code can and should be free from imperative |
| code. `on_success` as a concept and mechanism is an important departure from |
| how things are done in Spirit's previous version: Qi. |
| |
| As demonstrated in the previous [tutorial_employee employee example], the |
| preferred way to extract data from an input source is by having the parser |
| collect the data for us into C++ structs as it traverses the input stream. |
| Ideally, Spirit X3 grammars are fully attributed and declared in such a way |
| that you do not have to add any imperative code and there should be no need |
| for semantic actions at all. The parser simply works as declared and you get |
| your data back as a result. |
| |
| However, there are certain cases where there's no way to avoid introducing |
| imperative code. But semantic actions mess up our clean declarative grammars. |
| If we care to keep our code clean, `on_success` handlers are alternative |
| callback hooks to client code that are executed by the parser after a |
| successful parse without polluting the grammar. Like semantic actions, |
| `on_success` handlers have access to the AST, the iterators, and context. |
| But, unlike semantic actions, `on_success` handlers are cleanly separated |
| from the actual grammar. |
| |
| [heading Annotation Handler] |
| |
| As discussed, we annotate the AST with its position in the input stream with |
| our `on_success` handler: |
| |
| // tag used to get the position cache from the context |
| struct position_cache_tag; |
| |
| struct annotate_position |
| { |
| template <typename T, typename Iterator, typename Context> |
| inline void on_success(Iterator const& first, Iterator const& last |
| , T& ast, Context const& context) |
| { |
| auto& position_cache = x3::get<position_cache_tag>(context).get(); |
| position_cache.annotate(ast, first, last); |
| } |
| }; |
| |
| `position_cache_tag` is a special tag we will use to get a reference to the |
| actual `position_cache`, client data that we will inject at very start, when |
| we call parse. More on that later. |
| |
| Our `on_success` handler gets a reference to the actual `position_cache` and |
| calls its `annotate` member function, passing in the AST and the iterators. |
| `position_cache.annotate(ast, first, last)` annotates the AST with |
| information required by `x3::position_tagged`. |
| |
| [heading The Parser] |
| |
| Now we'll write a parser for our employee. To simplify, inputs will be of the |
| form: |
| |
| { age, "forename", "surname", salary } |
| |
| [#__tutorial_annotated_employee_parser__] |
| Here we go: |
| |
| namespace parser |
| { |
| using x3::int_; |
| using x3::double_; |
| using x3::lexeme; |
| using ascii::char_; |
| |
| struct quoted_string_class; |
| struct person_class; |
| struct employee_class; |
| |
| x3::rule<quoted_string_class, std::string> const quoted_string = "quoted_string"; |
| x3::rule<person_class, ast::person> const person = "person"; |
| x3::rule<employee_class, ast::employee> const employee = "employee"; |
| |
| auto const quoted_string_def = lexeme['"' >> +(char_ - '"') >> '"']; |
| auto const person_def = quoted_string >> ',' >> quoted_string; |
| |
| auto const employee_def = |
| '{' |
| >> int_ >> ',' |
| >> person >> ',' |
| >> double_ |
| >> '}' |
| ; |
| |
| auto const employees = employee >> *(',' >> employee); |
| |
| BOOST_SPIRIT_DEFINE(quoted_string, person, employee); |
| } |
| |
| [heading Rule Declarations] |
| |
| struct quoted_string_class; |
| struct person_class; |
| struct employee_class; |
| |
| x3::rule<quoted_string_class, std::string> const quoted_string = "quoted_string"; |
| x3::rule<person_class, ast::person> const person = "person"; |
| x3::rule<employee_class, ast::employee> const employee = "employee"; |
| |
| Go back and review the original [link __tutorial_employee_parser__ employee parser]. |
| What has changed? |
| |
| * We split the single employee rule into three smaller rules: `quoted_string`, |
| `person` and `employee`. |
| * We're using forward declared rule classes: `quoted_string_class`, `person_class`, |
| and `employee_class`. |
| |
| [heading Rule Classes] |
| |
| Like before, in this example, the rule classes, `quoted_string_class`, |
| `person_class`, and `employee_class` provide statically known IDs for the |
| rules required by X3 to perform its tasks. In addition to that, the rule |
| class can also be extended to have some user-defined customization hooks that |
| are called: |
| |
| * On success: After a rule successfully parses an input. |
| * On Error: After a rule fails to parse. |
| |
| By subclassing the rule class from a client supplied handler such as our |
| `annotate_position` handler above: |
| |
| struct person_class : annotate_position {}; |
| struct employee_class : annotate_position {}; |
| |
| The code above tells X3 to check the rule class if it has an `on_success` or |
| `on_error` member functions and appropriately calls them on such events. |
| |
| [#__tutorial_with_directive__] |
| [heading The with Directive] |
| |
| For any parser `p`, one can inject supplementary data that semantic actions |
| and handlers can access later on when they are called. The general syntax is: |
| |
| with<tag>(data)[p] |
| |
| For our particular example, we use to inject the `position_cache` into the |
| parse for our `annotate_position` on_success handler to have access to: |
| |
| auto const parser = |
| // we pass our position_cache to the parser so we can access |
| // it later in our on_sucess handlers |
| with<position_cache_tag>(std::ref(positions)) |
| [ |
| employees |
| ]; |
| |
| Typically this is done just before calling `x3::parse` or `x3::phrase_parse`. |
| `with` is a very lightwight operation. It is possible to inject as much data |
| as you want, even multiple `with` directives: |
| |
| with<tag1>(data1) |
| [ |
| with<tag2>(data2)[p] |
| ] |
| |
| Multiple `with` directives can (perhaps not obviously) be injected from |
| outside the called function. Here's an outline: |
| |
| template <typename Parser> |
| void bar(Parser const& p) |
| { |
| // Inject data2 |
| auto const parser = with<tag2>(data2)[p]; |
| x3::parse(first, last, parser); |
| } |
| |
| void foo() |
| { |
| // Inject data1 |
| auto const parser = with<tag1>(data1)[my_parser]; |
| bar(p); |
| } |
| |
| [heading Let's Parse] |
| |
| Now we have the complete parse mechanism with support for annotations: |
| |
| using iterator_type = std::string::const_iterator; |
| using position_cache = boost::spirit::x3::position_cache<std::vector<iterator_type>>; |
| |
| std::vector<client::ast::employee> |
| parse(std::string const& input, position_cache& positions) |
| { |
| using boost::spirit::x3::ascii::space; |
| |
| std::vector<client::ast::employee> ast; |
| iterator_type iter = input.begin(); |
| iterator_type const end = input.end(); |
| |
| using boost::spirit::x3::with; |
| |
| // Our parser |
| using client::parser::employees; |
| using client::parser::position_cache_tag; |
| |
| auto const parser = |
| // we pass our position_cache to the parser so we can access |
| // it later in our on_sucess handlers |
| with<position_cache_tag>(std::ref(positions)) |
| [ |
| employees |
| ]; |
| |
| bool r = phrase_parse(iter, end, parser, space, ast); |
| |
| // ... Some error checking here |
| |
| return ast; |
| } |
| |
| Let's walk through the code. |
| |
| First, we have some typedefs for 1) The iterator type we are using for the |
| parser, `iterator_type` and 2) For the `position_cache` type. The latter is a |
| template that accepts the type of container it will hold. In this case, a |
| `std::vector<iterator_type>`. |
| |
| The main parse function accepts an input, a std::string and a reference to a |
| position_cache, and returns an AST: `std::vector<client::ast::employee>`. |
| |
| Inside the parse function, we first create an AST where parsed data will be |
| stored: |
| |
| std::vector<client::ast::employee> ast; |
| |
| Then finally, we create a parser, injecting a reference to the `position_cache`, |
| and call phrase_parse: |
| |
| using client::parser::employees; |
| using client::parser::position_cache_tag; |
| |
| auto const parser = |
| // we pass our position_cache to the parser so we can access |
| // it later in our on_sucess handlers |
| with<position_cache_tag>(std::ref(positions)) |
| [ |
| employees |
| ]; |
| |
| bool r = phrase_parse(iter, end, parser, space, ast); |
| |
| On successful parse, the AST, `ast`, will contain the actual parsed data. |
| |
| [heading Getting The Source Positions] |
| |
| Now that we have our main parse function, let's have an example sourcefile to |
| parse and show how we can obtain the position of an AST element, returned |
| after a successful parse. |
| |
| Given this input: |
| |
| std::string input = R"( |
| { |
| 23, |
| "Amanda", |
| "Stefanski", |
| 1000.99 |
| }, |
| { |
| 35, |
| "Angie", |
| "Chilcote", |
| 2000.99 |
| }, |
| { |
| 43, |
| "Dannie", |
| "Dillinger", |
| 3000.99 |
| }, |
| { |
| 22, |
| "Dorene", |
| "Dole", |
| 2500.99 |
| }, |
| { |
| 38, |
| "Rossana", |
| "Rafferty", |
| 5000.99 |
| } |
| )"; |
| |
| We call our parse function after instantiating a `position_cache` object that |
| will hold the source stream positions: |
| |
| position_cache positions{input.begin(), input.end()}; |
| auto ast = parse(input, positions); |
| |
| We now have an AST, `ast`, that contains the parsed results. Let us get the |
| source positions of the 2nd employee: |
| |
| auto pos = positions.position_of(ast[1]); // zero based of course! |
| |
| `pos` is an iterator range that contains iterators to the start and end of |
| `ast[1]` in the input stream. |
| |
| [heading Config] |
| |
| If you read the previous [tutorial_minimal Program Structure] tutorial where |
| we separated various logical modules of the parser into separate cpp and |
| header files, and you are wondering how to provide the context configuration |
| information (see [link tutorial_configuration Config Section]), we need to |
| supplement the context like this: |
| |
| using phrase_context_type = x3::phrase_parse_context<x3::ascii::space_type>::type; |
| |
| typedef x3::context< |
| error_handler_tag |
| , std::reference_wrapper<position_cache> |
| , phrase_context_type> |
| context_type; |
| |
| [endsect] |