| # ExecuTorch Runtime Overview |
| |
| This document discusses the design of the ExecuTorch runtime, which executes |
| ExecuTorch program files on edge devices like smartphones, wearables, and |
| embedded devices. The code for the main execution API is under |
| [`executorch/runtime/executor/`](https://github.com/pytorch/executorch/tree/main/runtime/executor). |
| |
| Before reading this document we recommend that you read [How ExecuTorch |
| Works](intro-how-it-works.md). |
| |
| At the highest level, the ExecuTorch runtime is responsible for: |
| |
| * Loading binary `.pte` program files that were generated by the |
| [`to_executorch()`](./tutorials/export-to-executorch-tutorial) step of the |
| model-lowering process. |
| * Executing the series of instructions that implement a lowered model. |
| |
| Note that as of late 2023, the ExecuTorch runtime only supports model inference, |
| and does not yet support training. |
| |
| This diagram shows the high-level flow of, and components involved with, |
| exporting and executing an ExecuTorch program: |
| |
|  |
| |
| The runtime is also responsible for: |
| |
| * Managing the memory used during load and execution, potentially across |
| multiple memory banks like SRAM and DRAM. |
| * Mapping symbolic operator names like `"aten::add.out"` to concrete C++ |
| functions or [_kernels_](kernel-library-overview.md) that implement the |
| semantics of those operators. |
| * Dispatching predetermined sections of the model to [backend |
| delegates](compiler-delegate-and-partitioner.md) for acceleration. |
| * Optionally gathering [profiling data](sdk-profiling.md) during load and |
| execution. |
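
To make the operator-name mapping above concrete, here is a minimal sketch of a lookup table from symbolic names to C++ function pointers. The `KernelFn` signature, `KernelEntry`, `kKernelTable`, and `find_kernel` are hypothetical names invented for this illustration; the real kernel registration API is different and is described in the kernel library documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical kernel signature for this sketch: elementwise op over n floats.
using KernelFn = void (*)(const float* a, const float* b, float* out,
                          std::size_t n);

// One row of the table: a symbolic operator name and its implementation.
struct KernelEntry {
  const char* name;
  KernelFn fn;
};

void add_out(const float* a, const float* b, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

void mul_out(const float* a, const float* b, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] * b[i];
}

// Fixed-size table (no heap), populated before any program is loaded.
static const KernelEntry kKernelTable[] = {
    {"aten::add.out", add_out},
    {"aten::mul.out", mul_out},
};

// Returns the kernel registered under `name`, or nullptr if there is none.
KernelFn find_kernel(const char* name) {
  assert(name != nullptr);
  for (const KernelEntry& entry : kKernelTable) {
    if (std::strcmp(entry.name, name) == 0) return entry.fn;
  }
  return nullptr;
}
```

In this sketch the lookup happens once per operator; the resolved function pointer can then be stored and called directly during execution.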
| |
| ## Design Goals |
| |
| The ExecuTorch runtime was designed to run on a wide variety of edge devices, |
| from modern smartphone CPUs to resource-constrained microcontrollers and DSPs. |
| It has first-class support for |
| [delegating](compiler-delegate-and-partitioner.md) execution to one or more |
| backends to take advantage of architecture-specific optimizations and modern |
| heterogeneous architectures. It is small and portable enough to run directly in |
| bare-metal embedded environments with no operating system, dynamic memory, or |
| threads. |
| |
| ### Low Execution Overhead |
| |
| #### Memory |
| |
| * The core runtime library is less than 50 kB when built without kernels or |
| backends. |
| * Constant tensors point directly into the `.pte` file data, avoiding copies of |
| that data. The alignment of these data chunks can be adjusted at `.pte` |
| creation time. |
| * Backend delegates can choose to unload their precompiled data after model |
| initialization, reducing peak memory usage. |
| * Mutable tensor memory layout is planned ahead of time and packed into a small |
| set of user-allocated buffers, providing fine-grained control over memory |
| location. This is especially useful on systems with heterogeneous memory |
| hierarchies, allowing placement onto (e.g.) SRAM or DRAM close to the core |
| that will operate on the data. |
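
The idea of an ahead-of-time memory plan packed into user-allocated buffers can be sketched as follows. `PlannedTensor`, `Buffer`, and `resolve` are hypothetical names used only for this illustration, and the plan format is an assumption; it is not the layout stored in a `.pte` file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical plan entry, computed before the program ever runs:
// which buffer a mutable tensor lives in, and where inside that buffer.
struct PlannedTensor {
  std::size_t buffer_id;
  std::size_t offset;
  std::size_t nbytes;
};

// A memory region provided by the application, e.g. one in SRAM, one in DRAM.
struct Buffer {
  std::uint8_t* data;
  std::size_t size;
};

// Turns a plan entry into an address inside the matching buffer.
// Returns nullptr if the entry names a missing buffer or does not fit.
void* resolve(const PlannedTensor& t, const Buffer* buffers,
              std::size_t num_buffers) {
  assert(buffers != nullptr);
  if (t.buffer_id >= num_buffers) return nullptr;
  const Buffer& b = buffers[t.buffer_id];
  if (t.offset > b.size || t.nbytes > b.size - t.offset) return nullptr;
  return b.data + t.offset;
}
```

Because every address is derived from a buffer the application supplied, the application decides which physical memory each group of tensors ends up in.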
| |
| #### CPU |
| |
| * Model execution is a simple loop over an array of instructions, most of which |
| are function pointers to kernels and backend delegates. This keeps the |
| execution overhead small, on the order of microseconds to nanoseconds per |
| operation. |
| * The implementation of an operation (like "add" or "conv3d") can be fully |
| customized for a particular target system without needing to modify the |
| original model or generated `.pte` file. |
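
As a rough illustration of the execution loop described above, the sketch below runs an array of instructions whose main payload is a function pointer. `Context`, `Instruction`, `op_add`, `op_mul`, and `execute` are hypothetical names for this example; the real instruction format and kernel calling convention differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical execution state: a few value slots shared by all instructions.
struct Context {
  float values[4];
};

// An instruction is a function pointer plus the slots it reads and writes.
struct Instruction {
  void (*fn)(Context&, const Instruction&);
  int in0;
  int in1;
  int out;
};

void op_add(Context& c, const Instruction& i) {
  c.values[i.out] = c.values[i.in0] + c.values[i.in1];
}

void op_mul(Context& c, const Instruction& i) {
  c.values[i.out] = c.values[i.in0] * c.values[i.in1];
}

// Runs the instructions in order. The loop does one indirect call per
// instruction and nothing else.
void execute(Context& ctx, const Instruction* program, std::size_t count) {
  assert(program != nullptr);
  for (std::size_t pc = 0; pc < count; ++pc) {
    program[pc].fn(ctx, program[pc]);
  }
}
```

In this sketch, swapping `op_add` for a target-specific implementation changes only which function the table points at; the instruction sequence itself stays the same.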
| |
| ### Familiar PyTorch Semantics |
| |
| ExecuTorch is a first-class component of the PyTorch stack, and reuses APIs and |
| semantics whenever possible. |
| |
| * The C++ types used by ExecuTorch are source-compatible with the corresponding |
| types from core PyTorch's `c10::` and `at::` libraries, and ExecuTorch |
| provides |
| [`aten_bridge`](https://github.com/pytorch/executorch/blob/main/extension/aten_util/aten_bridge.h) |
| to convert between the two. This can be helpful for projects that already use |
| PyTorch C++ types. |
| * The semantics of operators like `aten::add` and `aten::sigmoid` are identical |
| between ExecuTorch and core PyTorch. ExecuTorch provides a testing framework |
| to ensure this, and to help test future implementations of these operators. |
| |
| ### Portable Code and Architecture |
| |
| The ExecuTorch runtime is implemented with portability in mind, so that users |
| can build it for a wide variety of target systems. |
| |
| #### C++ Language Considerations |
| |
| * The code is C++11-compatible to work with older toolchains. |
| * The runtime does not use exceptions or RTTI, although it is not antagonistic |
| to them. |
| * The code is compatible with GCC and Clang, and has also been built with |
| several proprietary embedded toolchains. |
| * The repo provides both CMake and buck2 build systems to make integration |
| easier. |
| |
| #### Operating System Considerations |
| |
| The runtime makes no direct system calls. All access to memory, files, logging, |
| and clocks is abstracted through the [_Runtime Platform Abstraction Layer |
| (PAL)_](runtime-platform-abstraction-layer.md) and injected interfaces like |
| `DataLoader` and `MemoryAllocator`. See the |
| [runtime API reference](executorch-runtime-api-reference.rst) to learn more. |
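
The injected-interface pattern can be sketched with a small abstract loader. `SimpleLoader`, `BufferLoader`, and `read_header` are hypothetical and deliberately simplified; the real `DataLoader` interface has a different signature, so treat this only as an illustration of how core code can read program bytes without touching files or system calls itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, simplified loader interface supplied by the application.
class SimpleLoader {
 public:
  virtual ~SimpleLoader() {}
  // Copies `size` bytes starting at `offset` into `dst`.
  // Returns false if the requested range is out of bounds.
  virtual bool load(std::size_t offset, std::size_t size, void* dst) const = 0;
  virtual std::size_t size() const = 0;
};

// Serves bytes from memory the application already owns.
class BufferLoader : public SimpleLoader {
 public:
  BufferLoader(const void* data, std::size_t size)
      : data_(static_cast<const unsigned char*>(data)), size_(size) {}

  bool load(std::size_t offset, std::size_t size, void* dst) const override {
    assert(dst != nullptr);
    if (offset > size_ || size > size_ - offset) return false;
    std::memcpy(dst, data_ + offset, size);
    return true;
  }

  std::size_t size() const override { return size_; }

 private:
  const unsigned char* data_;
  std::size_t size_;
};

// Core code depends only on the interface, not on where the bytes come from.
bool read_header(const SimpleLoader& loader, char out[4]) {
  return loader.load(0, 4, out);
}
```

A file-backed or flash-backed loader could implement the same interface without any change to `read_header`.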
| |
| Applications can control all memory allocation through the `MemoryManager`, |
| `MemoryAllocator`, `HierarchicalAllocator`, and `DataLoader` classes. The core |
| runtime makes no direct calls to `malloc()` or `new`, or to types like |
| `std::vector` that allocate under the hood. This makes it possible to: |
| |
| * Run in environments without a heap, but still use the heap if desired. |
| * Avoid synchronization on the heap during model load and execution. |
| * Control which memory region to use for different types of data. For example, |
| one set of mutable tensors could live in SRAM while another lives in DRAM. |
| * Easily monitor how much memory the runtime uses. |
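
The properties listed above follow from allocating out of buffers that the application owns. The sketch below shows one way this can work: a bump allocator over a caller-provided buffer that never calls `malloc()` or `new` and can report its usage. `BumpAllocator` is a hypothetical class written for this example, not the runtime's `MemoryAllocator`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump allocator over a caller-owned buffer.
class BumpAllocator {
 public:
  BumpAllocator(std::uint8_t* base, std::size_t size)
      : base_(base), size_(size), used_(0) {}

  // Returns `nbytes` of storage aligned to `alignment` (a power of two),
  // or nullptr when the remaining space is too small.
  void* allocate(std::size_t nbytes,
                 std::size_t alignment = alignof(std::max_align_t)) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t start =
        reinterpret_cast<std::uintptr_t>(base_) + used_;
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    const std::uintptr_t aligned = (start + mask) & ~mask;
    const std::size_t padding = static_cast<std::size_t>(aligned - start);
    const std::size_t remaining = size_ - used_;
    if (padding > remaining || nbytes > remaining - padding) return nullptr;
    used_ += padding + nbytes;
    return reinterpret_cast<void*>(aligned);
  }

  // Bytes consumed so far, including alignment padding.
  std::size_t used() const { return used_; }

 private:
  std::uint8_t* base_;
  std::size_t size_;
  std::size_t used_;
};
```

The buffer can be a static array, a linker-placed region in SRAM, or heap memory the application allocated itself; the allocator does not need to know which.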
| |
| However, please note that specific kernel or backend implementations may use |
| arbitrary runtime or operating system features. Users should double-check the |
| docs for the kernel and backend libraries that they use. |
| |
| #### Threading Considerations |
| |
| The core runtime does no threading or locking, and does not use thread-local |
| variables, but it works well with higher-level synchronization. |
| |
| * Each `Program` instance is immutable and therefore _[fully |
| thread-safe](https://faithlife.codes/blog/2008/03/degrees_of_thread_safety/#thread-safe)_. |
| Multiple threads may concurrently access a single `Program` instance. |
| * Each `Method` instance is mutable but self-contained, and therefore |
| _[conditionally |
| thread-safe](https://faithlife.codes/blog/2008/03/degrees_of_thread_safety/#conditionally-thread-safe)_. |
| Multiple threads can concurrently access and execute independent `Method` |
| instances, but access and execution of a single instance must be serialized. |
| |
| However, please note: |
| |
| * There are two global tables that may be read during `Program::load_method()`: |
| the kernel registration table and the backend registration table. |
| * In practice, these tables are only modified at process/system load time, |
| and are effectively frozen before the first `Program` is loaded. But some |
| applications may need to be aware of these tables, especially if they |
| manually mutate them after process/system load time. |
| * Specific kernel or backend implementations may have their own threading |
| restrictions. Users should double-check the docs for the kernel and backend |
| libraries that they use. |
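
The `Program` and `Method` threading rules above can be illustrated with stand-in types: one immutable object shared by all threads, and one mutable object per thread. The `Program` and `Method` types below are hypothetical simplifications that only mirror the sharing pattern; they are not the runtime classes, and `run_concurrently` is a helper invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Stand-in for an immutable program: read-only after construction.
struct Program {
  const float* weights;
  std::size_t num_weights;
};

// Stand-in for a method: owns its mutable state, reads the shared program.
class Method {
 public:
  explicit Method(const Program& program)
      : program_(program), result_(0.0f) {}

  // Mutates only this instance, so calls on one instance must be serialized.
  void execute(float input) {
    result_ = 0.0f;
    for (std::size_t i = 0; i < program_.num_weights; ++i) {
      result_ += program_.weights[i] * input;
    }
  }

  float result() const { return result_; }

 private:
  const Program& program_;
  float result_;
};

// Runs two independent methods on two threads against one shared program.
// No locking is needed because the threads share only read-only data.
void run_concurrently(const Program& program, float in_a, float in_b,
                      float* out_a, float* out_b) {
  assert(out_a != nullptr && out_b != nullptr);
  Method a(program);
  Method b(program);
  std::thread ta([&a, in_a] { a.execute(in_a); });
  std::thread tb([&b, in_b] { b.execute(in_b); });
  ta.join();
  tb.join();
  *out_a = a.result();
  *out_b = b.result();
}
```

If two threads needed to share a single method instance instead, the application would have to add its own synchronization, such as a mutex around each call.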
| |
| ## Further Reading |
| |
| For more details about the ExecuTorch runtime, please see: |
| |
| * [Runtime API Tutorial](running-a-model-cpp-tutorial.md) |
| * [Runtime Build and Cross Compilation](runtime-build-and-cross-compilation.md) |
| * [Runtime Platform Abstraction Layer](runtime-platform-abstraction-layer.md) |
| * [Runtime Profiling](sdk-profiling.md) |
| * [Backends and Delegates](compiler-delegate-and-partitioner.md) |
| * [Backend Delegate Implementation](runtime-backend-delegate-implementation-and-linking.md) |
| * [Kernel Library Overview](kernel-library-overview.md) |