The incremental compilation scheme is, in essence, a surprisingly simple extension to the overall query system. We'll start by describing a slightly simplified variant of the real thing – the “basic algorithm” – and then describe some possible improvements.
The basic algorithm is called the red-green algorithm[^salsa]. The high-level idea is that, after each run of the compiler, we will save the results of all the queries that we do, as well as the query DAG. The query DAG is a DAG that indexes which queries executed which other queries. So, for example, there would be an edge from a query Q1 to another query Q2 if computing Q1 required computing Q2 (note that because queries cannot depend on themselves, this results in a DAG and not a general graph).
On the next run of the compiler, then, we can sometimes reuse these query results to avoid re-executing a query. We do this by assigning every query a color:
There are two key insights here:
At the core of incremental compilation is an algorithm called “try-mark-green”. It has the job of determining the color of a given query Q (which must not have yet been executed). In cases where Q has red inputs, determining Q‘s color may involve re-executing Q so that we can compare its output, but if all of Q’s inputs are green, then we can conclude that Q must be green without re-executing it or inspecting its value at all. In the compiler, this allows us to avoid deserializing the result from disk when we don't need it, and in fact enables us to sometimes skip serializing the result as well (see the refinements section below).
Try-mark-green works as follows:
reads(Q)
vector from the query DAG. The “reads” is the set of queries that Q executed during its execution.reads(Q)
, we recursively demand the color of R using try-mark-green.reads(Q)
in same order as they occurred in the original compilation. See the section on the query DAG below.reads(Q)
wind up colored red, then Q is dirty.reads(Q)
must be green. In that case, we can color Q as green and return.The query DAG code is stored in src/librustc/dep_graph
. Construction of the DAG is done by instrumenting the query execution.
One key point is that the query DAG also tracks ordering; that is, for each query Q, we not only track the queries that Q reads, we track the order in which they were read. This allows try-mark-green to walk those queries back in the same order. This is important because once a subquery comes back as red, we can no longer be sure that Q will continue along the same path as before. That is, imagine a query like this:
fn main_query(tcx) { if tcx.subquery1() { tcx.subquery2() } else { tcx.subquery3() } }
Now imagine that in the first compilation, main_query
starts by executing subquery1
, and this returns true. In that case, the next query main_query
executes will be subquery2
, and subquery3
will not be executed at all.
But now imagine that in the next compilation, the input has changed such that subquery1
returns false. In this case, subquery2
would never execute. If try-mark-green were to visit reads(main_query)
out of order, however, it might visit subquery2
before subquery1
, and hence execute it. This can lead to ICEs and other problems in the compiler.
In the description of the basic algorithm, we said that at the end of compilation we would save the results of all the queries that were performed. In practice, this can be quite wasteful – many of those results are very cheap to recompute, and serializing and deserializing them is not a particular win. In practice, what we would do is to save the hashes of all the subqueries that we performed. Then, in select cases, we also save the results.
This is why the incremental algorithm separates computing the color of a node, which often does not require its value, from computing the result of a node. Computing the result is done via a simple algorithm like so:
The initial design document can be found at https://github.com/nikomatsakis/rustc-on-demand-incremental-design-doc/blob/master/0000-rustc-on-demand-and-incremental.md, which expands on the memoization details, provides more high-level overview and motivation for this system.
[^salsa]: I have long wanted to rename it to the Salsa algorithm, but it never caught on. -@nikomatsakis