This README describes the details of how the profiler is implemented.
The profiler instruments PyTorch to collect information about the model's execution. Its main features are:
TODO
RecordFunction
/aten/src/ATen/record_function.h
`RecordFunction` is used by the profiler to instrument CPU-side events.
`RecordFunction` is a general method of instrumenting function calls in PyTorch. It can be used for other general applications, e.g. see Features for Large-Scale Deployments. In PyTorch, it is already included at some important locations; notably, in the dispatcher, surrounding every op.
Users (or PyTorch itself) can register callbacks that will be executed whenever a `RecordFunction` guard is encountered. The profiler uses this mechanism to record the start and end times for each op call, as well as user-provided `RecordFunction` annotations. The `RecordFunction` machinery is designed to have relatively low overhead, especially when there are no callbacks registered. Nevertheless, there can still be some overhead.
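The guard-and-callback idea described above can be illustrated with a small pure-Python toy model. This is only a sketch of the concept, not PyTorch's implementation (which lives in C++): the names `add_callback`, `remove_callback`, and `record_scope` are hypothetical and chosen for this example, and the op names are just labels.

```python
import time
from contextlib import contextmanager

# Hypothetical registry for this sketch; PyTorch's real one lives in C++.
_callbacks = []


def add_callback(on_start, on_end):
    """Register a (start, end) callback pair and return a handle for removal."""
    pair = (on_start, on_end)
    _callbacks.append(pair)
    return pair


def remove_callback(handle):
    """Unregister a previously registered callback pair."""
    _callbacks.remove(handle)


@contextmanager
def record_scope(name):
    """Toy guard: runs start callbacks on entry and end callbacks on exit."""
    if not _callbacks:
        # Fast path: with nothing registered, the guard does almost no work.
        yield
        return
    active = list(_callbacks)  # snapshot so start/end calls stay paired
    contexts = [on_start(name) for on_start, _ in active]
    try:
        yield
    finally:
        for (_, on_end), ctx in zip(active, contexts):
            on_end(name, ctx)


# A profiler-like observer that records start and end times for each scope.
events = []


def on_start(name):
    return time.perf_counter()


def on_end(name, start):
    events.append((name, start, time.perf_counter()))


handle = add_callback(on_start, on_end)
with record_scope("aten::add"):
    total = 1 + 2
remove_callback(handle)

with record_scope("aten::mul"):  # no callbacks registered, so nothing is recorded
    product = 2 * 3
```

Only the first scope produces an event, because the callback pair is removed before the second scope runs. The early return in `record_scope` mirrors the design goal that an uninstrumented guard should cost very little.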
There is also a Python binding for `RecordFunction` (`with torch.profiler.record_function`); users often use it to annotate module-level events.
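As a minimal usage sketch of the Python binding, the example below wraps a module call in a `record_function` annotation while the profiler is running on the CPU. The model, the input shape, and the label `"linear_forward"` are arbitrary choices made for this example.

```python
import torch
from torch.profiler import ProfilerActivity, profile, record_function

model = torch.nn.Linear(8, 4)
inputs = torch.randn(2, 8)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    # "linear_forward" is an arbitrary label chosen for this example.
    with record_function("linear_forward"):
        output = model(inputs)

# The annotation shows up alongside the ops that ran inside it.
event_names = [event.key for event in prof.key_averages()]
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

The printed table should list `linear_forward` as its own row, which makes it easier to attribute the op-level events to the module call that triggered them.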
The autograd engine is responsible for automatically computing gradients.
The profiler records two pieces of information from the autograd engine:
(*) Note that only op invocations whose inputs require gradients are assigned a sequence number.
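The rule in the note above can be sketched with a toy model. This is not the autograd engine's code: the function `next_sequence_number` is hypothetical, the per-thread counter is an assumption of this sketch, and `-1` is used here only as a placeholder meaning "no sequence number assigned".

```python
import threading

_state = threading.local()  # assumption: each thread keeps its own counter


def next_sequence_number(inputs_require_grad):
    """Return a per-thread sequence number, or -1 when no input needs gradients."""
    if not any(inputs_require_grad):
        return -1  # placeholder for "not assigned" in this sketch
    current = getattr(_state, "counter", 0)
    _state.counter = current + 1
    return current


# Three op calls on the main thread; the second has no input requiring gradients.
main_thread = [
    next_sequence_number([True, False]),
    next_sequence_number([False, False]),
    next_sequence_number([True]),
]

# An op call on another thread starts from that thread's own counter.
worker_thread = []
worker = threading.Thread(
    target=lambda: worker_thread.append(next_sequence_number([True]))
)
worker.start()
worker.join()
```

In this sketch the op call without gradient-requiring inputs is skipped and does not consume a number, so the numbers that are assigned stay contiguous within each thread.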
TODO
TODO
TODO