jit

The jit directory contains infrastructure for a just-in-time compiler for PyTorch and the associated ‘script’ subset of Python that it can execute directly.

The JIT compiler has several phases.

Parsing - An AST (defined in tree_views.h) is generated either by parsing a string of Python-like code (jit/script/parser.h) or by translation from the Python AST (jit/frontend.py). This phase checks only for syntactic correctness and that the code stays within the syntactic subset of Python that script supports.
Semantic Checking/Specialization - We lower the AST into an IR Graph object. In this phase we check that variables are in scope and resolve any free variables to Python objects. When we find free variables that are Python objects, or references to non-first-class values such as modules, we temporarily represent them as SugaredValue objects. This phase then de-sugars these values, for example by inserting a PythonOp into the graph to call a Python function.
Optimizations - A GraphExecutor works on an initial Graph object, performing optimizations, possibly differentiating it, and possibly specializing it to a particular size.
Translation to Instructions - To execute a graph, the interpreter lowers it into a linear list of Instruction objects.
Execution - The interpreter reads the instruction stream, executing ATen operations and any generated code fragments.
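
To make the phases above concrete, here is a minimal sketch of the same pipeline shape on a toy language, using only the Python standard library. It is not the JIT's implementation: the names parse, lower, and execute, the tuple-based instruction format, and the supported subset (names, constants, + and *) are all assumptions made for illustration, and the optimization phase is omitted.

```python
import ast

# Toy stand-in for the 'script' subset: only names, constants, + and *.
ALLOWED = (ast.Expression, ast.BinOp, ast.Add, ast.Mult,
           ast.Name, ast.Load, ast.Constant)

def parse(source):
    """Parsing: build an AST and reject syntax outside the toy subset."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise SyntaxError(f"unsupported syntax: {type(node).__name__}")
    return tree

def lower(tree, env):
    """Semantic checking and translation: check that variables are in
    scope, then emit a linear list of (opcode, argument) instructions."""
    instructions = []

    def emit(node):
        if isinstance(node, ast.Constant):
            instructions.append(("const", node.value))
        elif isinstance(node, ast.Name):
            if node.id not in env:
                raise NameError(f"variable not in scope: {node.id}")
            instructions.append(("load", node.id))
        else:  # ast.BinOp; parse() already restricted the operators
            emit(node.left)
            emit(node.right)
            opcode = "add" if isinstance(node.op, ast.Add) else "mul"
            instructions.append((opcode, None))

    emit(tree.body)
    return instructions

def execute(instructions, env):
    """Execution: a stack machine that reads the instruction stream."""
    stack = []
    for opcode, arg in instructions:
        if opcode == "const":
            stack.append(arg)
        elif opcode == "load":
            stack.append(env[arg])
        else:
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs if opcode == "add" else lhs * rhs)
    return stack.pop()

env = {"x": 3, "y": 4}
program = lower(parse("x * 2 + y"), env)
result = execute(program, env)  # 3 * 2 + 4 == 10
```

In this sketch an unsupported construct such as "x - y" is rejected during parsing, while a reference to an unknown variable is rejected only during lowering, mirroring the split between the syntactic and semantic phases.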

Well-known functions

Ordinarily, when defining a compiler you want the set of functions to be user-extensible; e.g., a user can add to the set of defined functions by defining an appropriate autograd Function. However, there are some functions whose semantics we want to make assumptions about, because we are going to write optimizations over them or insert them into the program. Such functions are “well-known” functions: the JIT compiler knows about them, and a user implementation must abide by the contract (sometimes implicit) specified by the compiler.

A well-known function is usually implemented in several parts:

First, we pre-intern the string (interned_strings.h) that identifies the node. This allows us to more conveniently refer to these operators without having to first do a lookup through the intern table.
If we generate this operator during optimizations, we will often have a helper function in Graph (ir.h) for creating the operator. This is the easiest way to find out, in code, what attributes we assume for an operator.
There is a runtime interpretation of the operator in torch/csrc/autograd/functions/interpreter.cpp, which specifies how we actually interpret programs that contain such an operator.
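
The three parts listed above can be sketched in miniature as follows. This is an illustrative model in plain Python, not the code in interned_strings.h, ir.h, or the interpreter: InternTable, Node, Graph, create_constant, interpret, and the choice of "Constant" and "Add" as the well-known operators are all hypothetical names picked for the example.

```python
from dataclasses import dataclass

class InternTable:
    """Maps operator names to small integer symbols."""
    def __init__(self, pre_interned=()):
        self._ids = {}
        for name in pre_interned:
            self.intern(name)

    def intern(self, name):
        # Return the existing symbol, or assign the next free integer.
        return self._ids.setdefault(name, len(self._ids))

# Part 1: pre-intern the names of well-known operators so code can refer
# to them through constants instead of repeating string lookups.
SYMBOLS = InternTable(pre_interned=("Constant", "Add"))
K_CONSTANT = SYMBOLS.intern("Constant")
K_ADD = SYMBOLS.intern("Add")

@dataclass
class Node:
    kind: int       # interned operator symbol
    inputs: tuple   # names of the input values
    attrs: dict     # operator attributes
    output: str     # name of the produced value

class Graph:
    def __init__(self):
        self.nodes = []

    def create(self, kind, inputs=(), **attrs):
        """Generic node creation; returns the name of the output value."""
        node = Node(kind, tuple(inputs), attrs, output=f"%{len(self.nodes)}")
        self.nodes.append(node)
        return node.output

    # Part 2: a helper for a well-known operator. Its signature documents
    # the attribute we assume ("value") in one obvious place.
    def create_constant(self, value):
        return self.create(K_CONSTANT, value=value)

# Part 3: the runtime interpretation of each well-known operator.
def interpret(graph, inputs):
    values = dict(inputs)
    for node in graph.nodes:
        if node.kind == K_CONSTANT:
            values[node.output] = node.attrs["value"]
        elif node.kind == K_ADD:
            lhs, rhs = (values[name] for name in node.inputs)
            values[node.output] = lhs + rhs
        else:
            raise NotImplementedError(f"no interpretation for symbol {node.kind}")
    return values

graph = Graph()
five = graph.create_constant(5)
total = graph.create(K_ADD, ["x", five])
answer = interpret(graph, {"x": 2})[total]  # 2 + 5 == 7
```

The point of the helper is discoverability: anyone who needs to know which attributes a well-known operator carries can read create_constant rather than hunting for every call site.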

So, whence the specifications? For the most part, we follow the ONNX operator specification to determine the semantics of our operators. However, there are a few other well-known functions that are specific to PyTorch.

FusionGroup
A fusion group takes some number of input tensors and applies a graph, Subgraph, to them, producing the tensors returned by the subgraph. Operationally, operators inside a FusionGroup are fused into a single kernel, so that their intermediate results are never materialized. Not all operators support fusion.
- attribute: Subgraph (the graph of fused operators)
- input: 1 - ∞ (same as inputs of Subgraph)
- output: 1 - ∞ (same as outputs of Subgraph)
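
The difference between running a subgraph operator by operator and running it as a fused group can be illustrated with a small sketch. It assumes tensors are flat Python lists and that every operator is elementwise and binary; the SUBGRAPH layout and the functions run_unfused and run_fused are hypothetical, and a real fuser generates a kernel rather than looping in Python.

```python
import operator

# Hypothetical subgraph computing (a * b) + c. Each node is
# (function, lhs name, rhs name, output name).
SUBGRAPH = {
    "inputs": ["a", "b", "c"],
    "nodes": [
        (operator.mul, "a", "b", "t0"),
        (operator.add, "t0", "c", "t1"),
    ],
    "outputs": ["t1"],
}

def run_unfused(subgraph, *tensors):
    """One pass per operator: every intermediate (here t0) is
    materialized as a full list before the next operator runs."""
    env = dict(zip(subgraph["inputs"], tensors))
    for fn, lhs, rhs, out in subgraph["nodes"]:
        env[out] = [fn(x, y) for x, y in zip(env[lhs], env[rhs])]
    return [env[name] for name in subgraph["outputs"]]

def run_fused(subgraph, *tensors):
    """One pass over the elements: intermediates exist only as
    per-element scalars, never as whole tensors."""
    outputs = [[] for _ in subgraph["outputs"]]
    for elems in zip(*tensors):
        scalars = dict(zip(subgraph["inputs"], elems))
        for fn, lhs, rhs, out in subgraph["nodes"]:
            scalars[out] = fn(scalars[lhs], scalars[rhs])
        for buf, name in zip(outputs, subgraph["outputs"]):
            buf.append(scalars[name])
    return outputs

a, b, c = [1, 2, 3], [4, 5, 6], [7, 8, 9]
fused = run_fused(SUBGRAPH, a, b, c)
unfused = run_unfused(SUBGRAPH, a, b, c)
```

Both functions take as many inputs and return as many outputs as the subgraph declares, matching the input and output counts listed above; only the fused version avoids building the intermediate list for t0.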