| // Copyright 2022 The Kythe Authors. All rights reserved. |
| // |
| // Licensed under the Apache License, Version 2.0 (the "License"); |
| // you may not use this file except in compliance with the License. |
| // You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, software |
| // distributed under the License is distributed on an "AS IS" BASIS, |
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| // See the License for the specific language governing permissions and |
| // limitations under the License. |
| |
| = Adding Support for Write References |
| |
| We recently updated the schema with the `ref/writes` edge, which is |
| intended to allow users to distinguish between ordinary references and |
| references that have some lasting effect on a program. Consider this |
| basic example: |
| |
| [source, c] |
| ---- |
| //- @n defines/binding VarN |
| int n; |
| //- @n ref/writes VarN |
| n = 1; |
| ---- |
| |
| It turns out that filtering by writes can make certain tasks much |
| easier. In this document, we'll discuss the changes that need to be |
| made to an indexer (or other Kythe-supporting software, such as the |
| protobuf generator) to support the `ref/writes` distinction. |
| |
| == Goals and non-goals |
| |
| Throughout this document, we emphasize that we should only distinguish |
| writes that are *almost certainly* writes. Indexer authors are not |
| expected to do alias analysis or track dataflow, except in some narrow |
| circumstances (such as supporting an immediate dereference-and-assignment, |
| like `*x = 3;`). |
| |
| As users interact with the data, we may add or remove guidance. |
| |
| == Core language features |
| |
| These features should be implemented for write references to be useful. |
| |
| === Basic assignment |
| |
| The basic set of assignment operators (`=`, `+=`, `--`, etc) should be |
| supported. The minimal set of lvalues should include: |
| |
| * Ordinary variables. |
| * Fields. |
| * Arrays, in which the array variable is the thing that is written. |
| |
| This means that we don't need to worry about most calculated lvalues. |
| We should walk through member specifiers. For example, `(*x + 3) = 5` |
| doesn't get special treatment. Where possible, writes to containers |
| (i.e., any data type that has an open-ended field count) should use |
| the `ref/writes/partial` edge: |
| |
| [source, c] |
| ---- |
| //- @a ref VarA |
| //- @foo ref/writes/partial FieldFoo |
| a.foo[x] = 3; |
| ---- |
| |
| Note that `*x = 0;` should use ref/writes; `x[0]` should use |
| `ref/writes/partial`. `0[x]` is left as an exercise to the reader. |
| |
| === Property assignment |
| |
| The `property/reads` and `property/writes` edges point from semantic |
| nodes to semantic nodes; this is in distinction to `ref/writes/*`, which |
| points from anchors to semantic nodes. These edges are still useful |
| for code generators, as it is not always the case that an indexer |
| has access to the metadata for generated code (e.g., javac might |
| receive an interface jar instead of the full source in a build system). |
| |
| == Extensions and refinements |
| |
| === Assignment through pointers |
| |
| Our goal is to mark writes that are *almost certainly* writes. |
| Fortunately, we can still support many basic patterns that appear |
| frequently in code. For example: |
| |
| [source, c] |
| ---- |
| //- @mutable_foo ref/writes FieldFoo |
| *(proto_x.mutable_foo()) = 3; |
| ... |
| int& bar() { return bar_; } |
| ... |
| //- @bar ref/writes FieldBar |
| bar() = 4; |
| ---- |
| |
| === Getters and setters |
| |
| Many languages encourage the use of getter and setter functions. |
| These typically look like some small variation of: |
| |
| [source, c] |
| ---- |
| int foo() const { return foo_; } |
| void set_foo(int foo) { foo_ = foo; } |
| ---- |
| |
| There is some value in being able to detect the structure of these |
| functions and to treat `set_foo` as a ref/write to `foo` (as |
| well as a ref to `set_foo` itself): |
| |
| [source, c] |
| ---- |
| //- @set_foo ref/writes FooField |
| //- @set_foo ref SetFoo |
| //- ... |
| x.set_foo(1); |
| ---- |
| |
| It may also be possible in a target language to provide a library |
| of annotations so that end-users can describe the semantics of |
| non-trivial library functions. |
| |
| === Builder-like classes |
| |
| Setters on builder classes should appear as writes to the relevant |
| fields: |
| |
| [source, java] |
| ---- |
| //- @setFoo ref/writes FieldFoo |
| //- @setBar ref/writes FieldBar |
| Thing t = Thing.builder().setFoo(1).setBar(2).build(); |
| ---- |
| |
| == Code generators |
| |
| Kythe's metadata facility supports attaching semantics to generated code. |
| This is used, for example, to turn calls into protocol buffer setters into |
| writes to the relevant fields (in C++, where metadata is available) or |
| to emit property/writes and property/reads edges for languages like Java, |
| where it is not available downstream. |
| |
| The current C++ implementation for proto uses a name-based heuristic with |
| https://github.com/kythe/kythe/blob/dfb439a63de909e9eb67d0f36fc2471fe0693afb/kythe/cxx/common/protobuf_metadata_file.cc#L97[obvious flaws]. |
| (A more precise version is in development.) Still, the basic structure is |
| visible: various metadata rules are instantiated with particular semantics. |
| When a reference is made, the indexer consults a table of offsets and |
| declarations to check whether to change its behavior when adding edges. |