This document discusses the design of the ExecuTorch runtime, which executes ExecuTorch program files on edge devices like smartphones, wearables, and embedded devices. The code for the main execution API is under executorch/runtime/executor/.
Before reading this document we recommend that you read How ExecuTorch Works.
At the highest level, the ExecuTorch runtime is responsible for:

- Loading binary .pte program files that were generated by the to_executorch() step of the model-lowering process.
- Executing the instructions that implement the lowered model.

Note that as of late 2023, the ExecuTorch runtime only supports model inference, and does not yet support training.
This diagram shows the high-level flow of, and components involved with, exporting and executing an ExecuTorch program:
The runtime is also responsible for:

- Mapping symbolic operator names like "aten::add.out" to concrete C++ functions or kernels that implement the semantics of those operators.

The ExecuTorch runtime was designed to run on a wide variety of edge devices, from modern smartphone CPUs to resource-constrained microcontrollers and DSPs. It has first-class support for delegating execution to one or more backends to take advantage of architecture-specific optimizations and modern heterogeneous architectures. It is small and portable enough to run directly in bare-metal embedded environments with no operating system, dynamic memory, or threads.
Constant tensors point directly into the .pte file data, avoiding copies of that data. The alignment of these data chunks can be adjusted at .pte creation time.

ExecuTorch is a first-class component of the PyTorch stack, and reuses APIs and semantics whenever possible.
ExecuTorch's core types do not depend on core PyTorch's c10:: and at:: libraries, and ExecuTorch provides aten_bridge to convert between the two. This can be helpful for projects that already use PyTorch C++ types.

The semantics of operators like aten::add and aten::sigmoid are identical between ExecuTorch and core PyTorch. ExecuTorch provides a testing framework to ensure this, and to help test future implementations of these operators.

The ExecuTorch runtime is implemented with portability in mind, so that users can build it for a wide variety of target systems.
The runtime makes no direct system calls. All access to memory, files, logging, and clocks is abstracted through the runtime Platform Abstraction Layer (PAL) and injected interfaces like DataLoader and MemoryAllocator. See the runtime API reference to learn more.
Applications can control all memory allocation through the MemoryManager, MemoryAllocator, HierarchicalAllocator, and DataLoader classes. The core runtime makes no direct calls to malloc() or new, and uses no types like std::vector that allocate under the hood. This makes it possible to run the runtime in heap-less environments and to place all of its working memory in application-provided buffers.
However, please note that specific kernel or backend implementations may use arbitrary runtime or operating system features. Users should double-check the docs for the kernel and backend libraries that they use.
The core runtime does no threading or locking and does not use thread-local variables, but it plays well with higher-level synchronization.
- A loaded Program instance is immutable and therefore fully thread-safe. Multiple threads may concurrently access a single Program instance.
- A loaded Method instance is mutable but self-contained, and therefore conditionally thread-safe. Multiple threads can concurrently access and execute independent Method instances, but access and execution of a single instance must be serialized.

However, please note:
- Two global tables may be read during Program::load_method(): the kernel registration table and the backend registration table. These tables are typically only modified before the first Program is loaded, but some applications may need to be aware of them, especially if they manually mutate them after process/system load time.

For more details about the ExecuTorch runtime, please see the runtime API reference.