MIR is Rust's Mid-level Intermediate Representation. It is constructed from HIR. MIR was introduced in RFC 1211. It is a radically simplified form of Rust that is used for certain flow-sensitive safety checks – notably the borrow checker! – and also for optimization and code generation.
If you'd like a very high-level introduction to MIR, as well as some of the compiler concepts that it relies on (such as control-flow graphs and desugaring), you may enjoy the rust-lang blog post that introduced MIR.
MIR is defined in the src/librustc/mir/ module, but much of the code that manipulates it is found in src/librustc_mir.
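Some of the key characteristics of MIR are that it is based on a control-flow graph, it does not have nested expressions, and all types in MIR are fully explicit.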
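This section introduces the key concepts of MIR, summarized here:
- Basic blocks: units of the control-flow graph, each consisting of a series of statements followed by a final terminator. Statements have a single successor; terminators can have more than one.
- Locals: memory locations such as function arguments, user variables, and temporaries. They are identified by an index, written with a leading underscore, like _1. There is also a special “local” (_0) allocated to store the return value.
- Places: expressions that identify a location in memory, like _1 or _1.f.
- Rvalues: expressions that produce a value; they appear on the right-hand side of an assignment.
- Operands: the arguments to an rvalue, which can be either a constant (like 22) or a place (like _1).

You can get a feeling for how MIR is structured by translating simple programs into MIR and reading the pretty-printed output. In fact, the playground makes this easy, since it supplies a MIR button that will show you the MIR for your program. Try putting this program into play, and then clicking the “MIR” button on the top: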
fn main() {
    let mut vec = Vec::new();
    vec.push(1);
    vec.push(2);
}
You should see something like:
// WARNING: This output format is intended for human consumers only
// and is subject to change without notice. Knock yourself out.
fn main() -> () {
    ...
}
This is the MIR format for the main function.
Variable declarations. If we drill in a bit, we'll see it begins with a bunch of variable declarations. They look like this:
let mut _0: ();                      // return place
let mut _1: std::vec::Vec<i32>;      // in scope 0 at src/main.rs:2:9: 2:16
let mut _2: ();
let mut _3: &mut std::vec::Vec<i32>;
let mut _4: ();
let mut _5: &mut std::vec::Vec<i32>;
You can see that variables in MIR don't have names; they have indices, like _0 or _1. We also intermingle the user's variables (e.g., _1) with temporary values (e.g., _2 or _3). You can tell user-defined variables apart because they have debuginfo associated with them (see below).
User variable debuginfo. Below the variable declarations, we find the only hint that _1 represents a user variable:
scope 1 {
    debug vec => _1;                 // in scope 1 at src/main.rs:2:9: 2:16
}
Each debug <Name> => <Place>; annotation describes a named user variable, and where (i.e. the place) a debugger can find the data of that variable. Here the mapping is trivial, but optimizations may complicate the place, or lead to multiple user variables sharing the same place. Additionally, closure captures are described using the same system, and so they're complicated even without optimizations, e.g.: debug x => (*((*_1).0: &T));.
The “scope” blocks (e.g., scope 1 { .. }) describe the lexical structure of the source program (which names were in scope when), so any part of the program annotated with // in scope 0 would be missing vec, if you were stepping through the code in a debugger, for example.
Basic blocks. Reading further, we see our first basic block (naturally it may look slightly different when you view it, and I am ignoring some of the comments):
bb0: {
    StorageLive(_1);
    _1 = const <std::vec::Vec<T>>::new() -> bb2;
}
A basic block is defined by a series of statements and a final terminator. In this case, there is one statement:
StorageLive(_1);
This statement indicates that the variable _1 is “live”, meaning that it may be used later. This persists until we encounter a StorageDead(_1) statement, which indicates that the variable _1 is done being used. These “storage statements” are used by LLVM to allocate stack space.
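To watch storage statements for a variable whose scope ends before the function does, a small program like the one below can be pasted into the playground. This is only a sketch: the function name scoped and its body are made up for illustration, and the exact MIR you see (local numbering, where the StorageDead statements land) depends on the compiler version and build mode.

```rust
// Hypothetical example for experimenting with the playground's MIR button.
// `inner` exists only inside the nested block, so one would expect its
// storage to be marked dead once that block ends.
pub fn scoped(outer: i32) -> i32 {
    let doubled = {
        let inner = outer + 1; // storage for `inner` should be live from here
        inner * 2
    }; // ...and is expected to be dead after this point
    doubled + outer
}

fn main() {
    println!("{}", scoped(3));
}
```

Comparing the position of StorageLive and StorageDead for inner against those for doubled should show how the statements follow the source scopes.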
The terminator of the block bb0 is the call to Vec::new:
_1 = const <std::vec::Vec<T>>::new() -> bb2;
Terminators are different from statements because they can have more than one successor; that is, control may flow to different places. Function calls like the call to Vec::new are always terminators because of the possibility of unwinding, although in the case of Vec::new we are able to see that unwinding is not possible, and hence we list only one successor block, bb2.
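The idea that a terminator may have several successors can be sketched with a toy model. The types below (BlockId, ToyTerminator) are invented for this illustration and are much simpler than the compiler's real definitions; they only show how a call with and without an unwind edge differs in its successor list.

```rust
// Toy model, not rustc's real types: a basic block is named by an index.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BlockId(pub usize);

// A simplified terminator: the last action of a block, which decides
// where control flows next.
#[derive(Debug)]
pub enum ToyTerminator {
    Goto { target: BlockId },
    // A call returns to `target`; if it may unwind, control can also
    // flow to the `unwind` block.
    Call { target: BlockId, unwind: Option<BlockId> },
    Return,
}

impl ToyTerminator {
    /// Lists every block that control may flow to from this terminator.
    pub fn successors(&self) -> Vec<BlockId> {
        match self {
            ToyTerminator::Goto { target } => vec![*target],
            ToyTerminator::Call { target, unwind } => {
                let mut out = vec![*target];
                out.extend(unwind.iter().copied());
                out
            }
            ToyTerminator::Return => Vec::new(),
        }
    }
}

fn main() {
    // Mirrors `... -> bb2`: a call known not to unwind.
    let new_call = ToyTerminator::Call { target: BlockId(2), unwind: None };
    // Mirrors `... -> [return: bb3, unwind: bb4]`.
    let push_call = ToyTerminator::Call { target: BlockId(3), unwind: Some(BlockId(4)) };
    println!("{:?}", new_call.successors());
    println!("{:?}", push_call.successors());
}
```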
If we look ahead to bb2, we will see it looks like this:
bb2: {
    StorageLive(_3);
    _3 = &mut _1;
    _2 = const <std::vec::Vec<T>>::push(move _3, const 1i32) -> [return: bb3, unwind: bb4];
}
Here there are two statements: another StorageLive, introducing the _3 temporary, and then an assignment:
_3 = &mut _1;
Assignments in general have the form:
<Place> = <Rvalue>
A place is an expression like _3, _3.f or *_3; it denotes a location in memory. An Rvalue is an expression that creates a value: in this case, the rvalue is a mutable borrow expression, which looks like &mut <Place>. So we can sketch a grammar for rvalues like so:
<Rvalue>  = & (mut)? <Place>
          | <Operand> + <Operand>
          | <Operand> - <Operand>
          | ...

<Operand> = Constant
          | copy Place
          | move Place
As you can see from this grammar, rvalues cannot be nested; they can only reference places and constants. Moreover, when you use a place, we indicate whether we are copying it (which requires that the place have a type T where T: Copy) or moving it (which works for a place of any type). So, for example, if we had the expression x = a + b + c in Rust, that would get compiled to two statements and a temporary:
TMP1 = a + b
x = TMP1 + c
(Try it and see, though you may want to do release mode to skip over the overflow checks.)
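The flattening of nested expressions into temporaries can be illustrated with a small toy lowering. This is a sketch under simplifying assumptions, not how the compiler builds MIR: Expr, lower_assign, lower_operand, and var are hypothetical names, statements are plain strings, and only addition is handled.

```rust
/// A source-level expression, which may nest arbitrarily.
pub enum Expr {
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

/// Lowers `dest = expr` into flat statements in which every addition
/// has plain names as operands.
pub fn lower_assign(dest: &str, expr: &Expr) -> Vec<String> {
    let mut stmts = Vec::new();
    let mut next_tmp = 1;
    match expr {
        Expr::Var(name) => stmts.push(format!("{} = {}", dest, name)),
        Expr::Add(lhs, rhs) => {
            let l = lower_operand(lhs, &mut stmts, &mut next_tmp);
            let r = lower_operand(rhs, &mut stmts, &mut next_tmp);
            stmts.push(format!("{} = {} + {}", dest, l, r));
        }
    }
    stmts
}

/// Reduces `expr` to a plain name, emitting a temporary for each nested addition.
fn lower_operand(expr: &Expr, stmts: &mut Vec<String>, next_tmp: &mut u32) -> String {
    match expr {
        Expr::Var(name) => name.clone(),
        Expr::Add(lhs, rhs) => {
            let l = lower_operand(lhs, stmts, next_tmp);
            let r = lower_operand(rhs, stmts, next_tmp);
            let tmp = format!("TMP{}", next_tmp);
            *next_tmp += 1;
            stmts.push(format!("{} = {} + {}", tmp, l, r));
            tmp
        }
    }
}

/// Convenience constructor for a boxed variable reference.
pub fn var(name: &str) -> Box<Expr> {
    Box::new(Expr::Var(name.to_string()))
}

fn main() {
    // x = (a + b) + c, which is how `a + b + c` parses.
    let expr = Expr::Add(Box::new(Expr::Add(var("a"), var("b"))), var("c"));
    for stmt in lower_assign("x", &expr) {
        println!("{}", stmt);
    }
}
```

Running this prints the same two statements as in the text: first the temporary, then the assignment to x.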
The MIR data types are defined in the src/librustc/mir/ module. Each of the key concepts mentioned in the previous section maps in a fairly straightforward way to a Rust type.
The main MIR data type is Mir. It contains the data for a single function (along with sub-instances of Mir for “promoted constants”, but you can read about those below).
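- Basic blocks: the basic blocks are stored in the field basic_blocks; this is a vector of BasicBlockData structures. Nobody ever references a basic block directly: instead, we pass around BasicBlock values, which are newtype'd indices into this vector.
- Statements are represented by the type Statement.
- Terminators are represented by the type Terminator.
- Locals are represented by a newtype'd index type Local. The data for a local variable is found in the Mir (the local_decls vector). There is also a special constant RETURN_PLACE identifying the special “local” representing the return value.
- Places are identified by the type Place. There are a few variants:
  - Local variables like _1
  - Static variables like FOO
  - Projections, which are fields or other things that “project out” from a base place. For example, the place _1.f is a projection, with f being the “projection element” and _1 being the base path. *_1 is also a projection, with the * being represented by the ProjectionElem::Deref element.
- Rvalues are represented by the type Rvalue.
- Operands are represented by the type Operand.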
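As a rough illustration of how these pieces fit together, here is a toy model of newtype'd indices and places. It borrows names from the text (BasicBlock, Local, Place, ProjectionElem, RETURN_PLACE) but the definitions are heavily simplified assumptions, not the compiler's actual ones; ToyMir, the string-based statements, and the Field variant are made up for the example.

```rust
use std::fmt;

/// Newtype'd index into `ToyMir::basic_blocks`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BasicBlock(pub usize);

/// Newtype'd index identifying a local such as `_1`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Local(pub usize);

/// The local that holds the return value, printed as `_0`.
pub const RETURN_PLACE: Local = Local(0);

pub enum ProjectionElem {
    Deref,
    Field(String),
}

pub enum Place {
    Local(Local),
    Static(String),
    Projection(Box<Place>, ProjectionElem),
}

impl fmt::Display for Place {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Place::Local(local) => write!(f, "_{}", local.0),
            Place::Static(name) => write!(f, "{}", name),
            // Parentheses keep nested output such as `(*_1).f` unambiguous.
            Place::Projection(base, ProjectionElem::Deref) => write!(f, "(*{})", base),
            Place::Projection(base, ProjectionElem::Field(name)) => {
                write!(f, "{}.{}", base, name)
            }
        }
    }
}

/// Statements and the terminator are kept as strings to keep the toy small.
pub struct BasicBlockData {
    pub statements: Vec<String>,
    pub terminator: String,
}

pub struct ToyMir {
    pub basic_blocks: Vec<BasicBlockData>,
}

impl ToyMir {
    /// Blocks are always looked up through their index, never held directly.
    pub fn block(&self, bb: BasicBlock) -> &BasicBlockData {
        &self.basic_blocks[bb.0]
    }
}

fn main() {
    let mir = ToyMir {
        basic_blocks: vec![BasicBlockData {
            statements: vec!["StorageLive(_1)".to_string()],
            terminator: "return".to_string(),
        }],
    };
    let bb0 = BasicBlock(0);
    println!("bb0 has {} statement(s)", mir.block(bb0).statements.len());

    let field = Place::Projection(
        Box::new(Place::Local(Local(1))),
        ProjectionElem::Field("f".to_string()),
    );
    println!("{}", field);
    println!("{}", Place::Local(RETURN_PLACE));
}
```

Passing small copyable indices around instead of references is a common way to sidestep borrowing issues when many parts of a program need to refer to the same blocks or locals.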
to be written
to be written