Mark Kernel and KernelKey constructors constexpr (#17909)

Enables constant initialization of static Kernel arrays (e.g. the
prim_ops array in register_prim_ops.cpp). With constexpr constructors,
the compiler places the fully initialized Kernel data directly into
.data.rel.ro instead of generating a static initialization function to
construct each Kernel object at startup.
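
For illustration, a minimal sketch of the constant-initialization pattern (the member names and the OpFunction alias are simplified stand-ins, not the actual ExecuTorch definitions):

// Illustrative only: the real Kernel/KernelKey have different members.
struct KernelKey {
  constexpr KernelKey() : data_(nullptr) {}
  constexpr explicit KernelKey(const char* data) : data_(data) {}
  const char* data_;
};

using OpFunction = void (*)();

struct Kernel {
  constexpr Kernel(const char* name, KernelKey key, OpFunction op)
      : name_(name), key_(key), op_(op) {}
  const char* name_;
  KernelKey key_;
  OpFunction op_;
};

// With every constructor constexpr, this array is constant-initialized:
// the compiler emits the bytes into .data.rel.ro and no _GLOBAL__sub_I_*
// static initializer is generated to build it at startup.
static const Kernel prim_ops[] = {
    Kernel("aten::sym_size.int", KernelKey(), nullptr),
    Kernel("aten::sym_numel", KernelKey(), nullptr),
};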

This shrinks _GLOBAL__sub_I_register_prim_ops.cpp from 1,010 bytes
to 35 bytes (just the register_kernels call), and also eliminates
startup latency from Kernel construction.

Reduces stripped size_test by 3,488 bytes and stripped
size_test_all_ops by 2,656 bytes.

---------

Co-authored-by: Github Executorch <github_executorch@arm.com>

README.md

ExecuTorch is PyTorch's unified solution for deploying AI models on-device—from smartphones to microcontrollers—built for privacy, performance, and portability. It powers Meta's on-device AI across Instagram, WhatsApp, Quest 3, Ray-Ban Meta Smart Glasses, and more.

Deploy LLMs, vision, speech, and multimodal models with the same PyTorch APIs you already know—accelerating research to production with seamless model export, optimization, and deployment. No manual C++ rewrites. No format conversions. No vendor lock-in.

Why ExecuTorch?

  • 🔒 Native PyTorch Export — Direct export from PyTorch. No .onnx, .tflite, or intermediate format conversions. Preserve model semantics.
  • ⚡ Production-Proven — Powers billions of users at Meta with real-time on-device inference.
  • 💾 Tiny Runtime — 50KB base footprint. Runs on microcontrollers to high-end smartphones.
  • 🚀 12+ Hardware Backends — Open-source acceleration for Apple, Qualcomm, ARM, MediaTek, Vulkan, and more.
  • 🎯 One Export, Multiple Backends — Switch hardware targets with a single line change. Deploy the same model everywhere.

How It Works

ExecuTorch uses ahead-of-time (AOT) compilation to prepare PyTorch models for edge deployment:

  1. 🧩 Export — Capture your PyTorch model graph with torch.export()
  2. ⚙️ Compile — Quantize, optimize, and partition to hardware backends → .pte
  3. 🚀 Execute — Load .pte on-device via lightweight C++ runtime

Models use a standardized Core ATen operator set. Partitioners delegate subgraphs to specialized hardware (NPU/GPU) with CPU fallback.

Learn more: How ExecuTorch Works • Architecture Guide

Quick Start

Installation

pip install executorch

For platform-specific setup (Android, iOS, embedded systems), see the Quick Start documentation.

Export and Deploy in 3 Steps

import torch
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# 1. Export your PyTorch model
model = MyModel().eval()
example_inputs = (torch.randn(1, 3, 224, 224),)
exported_program = torch.export.export(model, example_inputs)

# 2. Optimize for target hardware (switch backends with one line)
program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[XnnpackPartitioner()]  # CPU | CoreMLPartitioner() for iOS | QnnPartitioner() for Qualcomm
).to_executorch()

# 3. Save for deployment
with open("model.pte", "wb") as f:
    f.write(program.buffer)

# Test locally via ExecuTorch runtime's pybind API (optional)
from executorch.runtime import Runtime
runtime = Runtime.get()
method = runtime.load_program("model.pte").load_method("forward")
outputs = method.execute([torch.randn(1, 3, 224, 224)])

Run on Device

C++

#include <executorch/extension/module/module.h>
#include <executorch/extension/tensor/tensor.h>

using namespace ::executorch::extension;

Module module("model.pte");
auto tensor = make_tensor_ptr({2, 2}, {1.0f, 2.0f, 3.0f, 4.0f});
auto outputs = module.forward(tensor);
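
To actually read the result, a hedged follow-up sketch (assuming forward() returns a Result wrapping a std::vector of EValues, as in the Module extension API):

// Check for success, then view the first output EValue as a Tensor.
if (outputs.ok()) {
  const auto& values = outputs.get();   // std::vector<EValue>
  const auto& out = values[0].toTensor();
  const float* data = out.const_data_ptr<float>();
  printf("first output element: %f\n", data[0]);
}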

Swift (iOS)

import ExecuTorch

let module = Module(filePath: "model.pte")
let input = Tensor<Float>([1.0, 2.0, 3.0, 4.0], shape: [2, 2])
let outputs = try module.forward(input)

Kotlin (Android)

val module = Module.load("model.pte")
val inputTensor = Tensor.fromBlob(floatArrayOf(1.0f, 2.0f, 3.0f, 4.0f), longArrayOf(2, 2))
val outputs = module.forward(EValue.from(inputTensor))

LLM Example: Llama

Export Llama models using the export_llm script or Optimum-ExecuTorch:

# Using export_llm
python -m executorch.extension.llm.export.export_llm --model llama3_2 --output llama.pte

# Using Optimum-ExecuTorch
optimum-cli export executorch \
  --model meta-llama/Llama-3.2-1B \
  --task text-generation \
  --recipe xnnpack \
  --output_dir llama_model

Run on-device with the LLM runner API:

C++

#include <executorch/extension/llm/runner/text_llm_runner.h>

auto runner = create_llama_runner("llama.pte", "tiktoken.bin");
executorch::extension::llm::GenerationConfig config{
    .seq_len = 128, .temperature = 0.8f};
runner->generate("Hello, how are you?", config);

Swift (iOS)

import ExecuTorchLLM

let runner = TextRunner(modelPath: "llama.pte", tokenizerPath: "tiktoken.bin")
try runner.generate("Hello, how are you?", Config {
    $0.sequenceLength = 128
}) { token in
    print(token, terminator: "")
}

Kotlin (Android) • API Docs • Demo App

val llmModule = LlmModule("llama.pte", "tiktoken.bin", 0.8f)
llmModule.load()
llmModule.generate("Hello, how are you?", 128, object : LlmCallback {
    override fun onResult(result: String) { print(result) }
    override fun onStats(stats: String) { }
})

For multimodal models (vision, audio), use the MultiModal runner API which extends the LLM runner to handle image and audio inputs alongside text. See Llava and Voxtral examples.

See examples/models/llama for complete workflow including quantization, mobile deployment, and advanced options.

Next Steps:

Platform & Hardware Support

Platform          Supported Backends
Android           XNNPACK, Vulkan, Qualcomm, MediaTek, Samsung Exynos
iOS               XNNPACK, MPS, CoreML (Neural Engine)
Linux / Windows   XNNPACK, OpenVINO, CUDA (experimental)
macOS             XNNPACK, MPS, Metal (experimental)
Embedded / MCU    XNNPACK, ARM Ethos-U, NXP, Cadence DSP

See Backend Documentation for detailed hardware requirements and optimization guides. For desktop/laptop GPU inference with CUDA and Metal, see the Desktop Guide. For Zephyr RTOS integration, see the Zephyr Guide.

Production Deployments

ExecuTorch powers on-device AI at scale across Meta's family of apps, VR/AR devices, and partner deployments. View success stories →

Examples & Models

LLMs: Llama 3.2/3.1/3, Qwen 3, Phi-4-mini, LiquidAI LFM2

Multimodal: Llava (vision-language), Voxtral (audio-language), Gemma (vision-language)

Vision/Speech: MobileNetV2, DeepLabV3, Whisper

Resources: examples/ directory • executorch-examples out-of-tree demos • Optimum-ExecuTorch for HuggingFace models • Unsloth for fine-tuned LLM deployment

Key Features

ExecuTorch provides advanced capabilities for production deployment:

  • Quantization — Built-in support via torchao for 8-bit, 4-bit, and dynamic quantization
  • Memory Planning — Optimize memory usage with ahead-of-time allocation strategies
  • Developer Tools — ETDump profiler, ETRecord inspector, and model debugger
  • Selective Build — Strip unused operators to minimize binary size
  • Custom Operators — Extend with domain-specific kernels (a hedged sketch follows this list)
  • Dynamic Shapes — Support variable input sizes with bounded ranges
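
A rough sketch of the custom-operator path (the EXECUTORCH_LIBRARY macro, header paths, and kernel signature here are assumptions drawn from the kernel_util extension and may not match the current API exactly):

#include <executorch/extension/kernel_util/make_boxed_from_unboxed_functor.h>
#include <executorch/runtime/kernel/kernel_includes.h>

using executorch::aten::Tensor;
using executorch::runtime::KernelRuntimeContext;

// Out-variant kernel: writes its result into the preallocated `out` tensor.
Tensor& my_relu_out(KernelRuntimeContext& ctx, const Tensor& in, Tensor& out) {
  (void)ctx;  // unused in this sketch
  const float* src = in.const_data_ptr<float>();
  float* dst = out.mutable_data_ptr<float>();
  const auto n = in.numel();
  for (decltype(n) i = 0; i < n; ++i) {
    dst[i] = src[i] > 0.0f ? src[i] : 0.0f;
  }
  return out;
}

// Register under the "my_ops::my_relu.out" schema so a .pte referencing
// that op can dispatch to the kernel above (assumed registration macro).
EXECUTORCH_LIBRARY(my_ops, "my_relu.out", my_relu_out);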

See Advanced Topics for quantization techniques, custom backends, and compiler passes.

Documentation

Community & Contributing

We welcome contributions from the community!

License

ExecuTorch is BSD licensed, as found in the LICENSE file.