Use lazy descriptor pool allocation (#2285)

Summary:
Pull Request resolved: https://github.com/pytorch/executorch/pull/2285

## Context

In Vulkan, memory for descriptor sets (which are used to bind data to shader arguments) must be pre-allocated. Previously, the convention was to allocate a large number of descriptor sets upon creation of a Vulkan Context. While this worked well in Lite Interpreter, where only a single global Vulkan context is used, it leads to over-allocating descriptor sets in the Vulkan Delegate, where every `ComputeGraph` has its own dedicated Context.

https://github.com/pytorch/pytorch/pull/121134 allows the descriptor set pool to be initialized in a deferred fashion. This means that a `ComputeGraph` can count the total number of descriptors needed across all the compute shaders that will be encoded, and then allocate a descriptor pool of the appropriate size.
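
At the raw Vulkan level, allocating a pool of a known size boils down to filling out a `VkDescriptorPoolCreateInfo` from the counts. A minimal sketch, assuming illustrative function and parameter names (the actual pool setup is wrapped by `api::Context`):

```cpp
// Sketch: create a descriptor pool sized from counts gathered ahead of time.
// Function and parameter names are illustrative, not the ExecuTorch API.
#include <vulkan/vulkan.h>

#include <array>

VkDescriptorPool create_pool_from_counts(
    VkDevice device,
    uint32_t max_sets,         // total descriptor sets that will be allocated
    uint32_t uniform_buffers,  // per-type totals counted from the shaders
    uint32_t storage_buffers,
    uint32_t sampled_images,
    uint32_t storage_images) {
  const std::array<VkDescriptorPoolSize, 4> sizes = {{
      {VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, uniform_buffers},
      {VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, storage_buffers},
      {VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, sampled_images},
      {VK_DESCRIPTOR_TYPE_STORAGE_IMAGE, storage_images},
  }};

  VkDescriptorPoolCreateInfo info{};
  info.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
  info.maxSets = max_sets;
  info.poolSizeCount = static_cast<uint32_t>(sizes.size());
  info.pPoolSizes = sizes.data();

  VkDescriptorPool pool = VK_NULL_HANDLE;
  vkCreateDescriptorPool(device, &info, nullptr, &pool);
  return pool;
}
```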

## Implementation Overview

1. When constructing a `ComputeGraph`, make sure that the descriptor pool config specifies 0 for the maximum number of sets. This ensures that no descriptor pool is initialized when constructing the graph's `api::Context` instance.
2. When building the graph, `ExecuteNode` and `PrepackNode` call `graph.update_descriptor_counts(shader)` upon construction, which allows the `ComputeGraph` to count the total number of descriptor sets needed.
3. Separate descriptor count objects are kept for prepack and execute, since they correspond to different command buffers.
4. Before encoding any command buffers, call `graph.prepare()`, which constructs a descriptor pool config of the appropriate size from the accumulated descriptor counts (see the sketch after this list).
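
A minimal sketch of the counting flow from steps 2 and 3, assuming hypothetical type and member names (the real logic lives in `ComputeGraph` and the node constructors):

```cpp
#include <vulkan/vulkan.h>

#include <cstdint>
#include <vector>

// Hypothetical aggregate. Per step 3, ComputeGraph would keep one of these
// for prepack and a separate one for execute, since they correspond to
// different command buffers.
struct DescriptorCounts {
  uint32_t descriptor_pool_max_sets = 0;
  uint32_t descriptor_uniform_buffer_count = 0;
  uint32_t descriptor_storage_buffer_count = 0;
  uint32_t descriptor_combined_sampler_count = 0;
  uint32_t descriptor_storage_image_count = 0;
};

// Step 2: each node constructor reports the descriptor types its shader
// binds. `binding_types` is a hypothetical stand-in for the shader's
// layout information.
void update_descriptor_counts(
    DescriptorCounts& counts,
    const std::vector<VkDescriptorType>& binding_types) {
  counts.descriptor_pool_max_sets += 1;  // one descriptor set per dispatch
  for (const VkDescriptorType t : binding_types) {
    switch (t) {
      case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER:
        counts.descriptor_uniform_buffer_count++;
        break;
      case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER:
        counts.descriptor_storage_buffer_count++;
        break;
      case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER:
        counts.descriptor_combined_sampler_count++;
        break;
      case VK_DESCRIPTOR_TYPE_STORAGE_IMAGE:
        counts.descriptor_storage_image_count++;
        break;
      default:
        break;
    }
  }
}
```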

## Notes

One interesting finding is that I had to apply a safety factor to the descriptor counts to prevent the pool from running out of memory. This was reproducible on both Linux and Android.
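
For illustration, such padding can be as simple as scaling each count before it is written into the pool config. A minimal sketch; the factor value below is a placeholder, not the one used in this change:

```cpp
#include <cmath>
#include <cstdint>

// Placeholder value; the point is only that some factor > 1 was needed
// empirically to keep descriptor set allocation from failing.
constexpr float kDescriptorSafetyFactor = 2.0f;

uint32_t apply_safety_factor(uint32_t count) {
  return static_cast<uint32_t>(
      std::ceil(static_cast<float>(count) * kDescriptorSafetyFactor));
}
```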

A more robust design, as discussed [here](https://www.reddit.com/r/vulkan/comments/17v66fi/question_about_descriptor_pool_allocations/), may be to maintain a separate descriptor pool for each layout type. We should revisit this refactor at a later time.
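
A rough sketch of what that alternative could look like, with one pool per descriptor set layout (all names below are hypothetical):

```cpp
#include <vulkan/vulkan.h>

#include <unordered_map>

// Hypothetical registry: one appropriately-sized pool per set layout, so
// each pool only ever serves sets of a single, known shape and exact
// sizing becomes straightforward.
class PerLayoutPools {
 public:
  explicit PerLayoutPools(VkDevice device) : device_(device) {}

  VkDescriptorSet allocate(VkDescriptorSetLayout layout) {
    // Assumes a pool for this layout was registered earlier.
    VkDescriptorPool pool = pools_.at(layout);

    VkDescriptorSetAllocateInfo info{};
    info.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
    info.descriptorPool = pool;
    info.descriptorSetCount = 1;
    info.pSetLayouts = &layout;

    VkDescriptorSet set = VK_NULL_HANDLE;
    vkAllocateDescriptorSets(device_, &info, &set);
    return set;
  }

 private:
  VkDevice device_;
  std::unordered_map<VkDescriptorSetLayout, VkDescriptorPool> pools_;
};
```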

bypass-github-export-checks

Reviewed By: jorgep31415

Differential Revision: D54603935

fbshipit-source-id: eb04403b5f0967d69b390153c778b58bd940004e

# ExecuTorch

ExecuTorch is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers. It is part of the PyTorch Edge ecosystem and enables efficient deployment of PyTorch models to edge devices.

Key value propositions of ExecuTorch are:

  • Portability: Compatibility with a wide variety of computing platforms, from high-end mobile phones to highly constrained embedded systems and microcontrollers.
  • Productivity: Enabling developers to use the same toolchains and SDK from PyTorch model authoring and conversion, to debugging and deployment to a wide variety of platforms.
  • Performance: Providing end users with a seamless and high-performance experience thanks to a lightweight runtime that utilizes full hardware capabilities such as CPUs, NPUs, and DSPs.

For a comprehensive technical overview of ExecuTorch and step-by-step tutorials, please visit our documentation website.

## Important: This is a preview release

This is a preview version of ExecuTorch and should be used for testing and evaluation purposes only. It is not recommended for use in production settings. We welcome any feedback, suggestions, and bug reports from the community to help us improve the technology. Please use the PyTorch Forums for discussion and feedback about ExecuTorch using the ExecuTorch category, and our GitHub repository for bug reporting.

The ExecuTorch code and APIs are still changing quickly, and there are not yet any guarantees about forward/backward source compatibility. We recommend using the latest v#.#.# release tag from the Releases page when experimenting with this preview release.

## Directory Structure

executorch
├── backends                        #  Backend delegate implementations.
├── build                           #  Utilities for managing the build system.
├── bundled_program                 #  Utilities for attaching reference inputs and outputs to models. TODO move to extension
├── codegen                         #  Tooling to autogenerate bindings between kernels and the runtime. TODO move to tool
├── configurations                  #  TODO delete this
├── docs                            #  Static docs tooling
├── examples                        #  Examples of various user flows, such as model export, delegates, and runtime execution.
├── exir                            #  Ahead-of-time library: model capture and lowering APIs.
|   ├── _serialize                  #  Serialize final export artifact.
|   ├── backend                     #  Backend delegate ahead of time APIs
|   ├── capture                     #  Program capture.
|   ├── dialects                    #  Op sets for various dialects in the export process.
|   ├── emit                        #  Conversion from ExportedProgram to ExecuTorch execution instructions.
|   ├── passes                      #  Built-in compiler passes.
|   ├── program                     #  Export artifacts.
|   ├── verification                #  IR verification.
├── extension                       #  Extensions built on top of the runtime.
|   ├── aten_util
|   ├── data_loader                 #  1st party data loader implementations.
|   ├── memory_allocator            #  1st party memory allocator implementations.
|   ├── pybindings                  #  Python API for the ExecuTorch runtime.
|   ├── pytree                      #  C++ and Python flattening and unflattening lib for pytrees.
|   ├── testing_util
├── kernels                         #  1st party kernel implementations.
|   ├── aten
|   ├── optimized
|   ├── portable                    #  Reference implementations of ATen operators.
|   ├── prim_ops                    #  Special ops used in executorch runtime for control flow and symbolic primitives.
|   ├── quantized
├── profiler                        #  Utilities for profiling. TODO delete in favor of ETDump in sdk/
├── runtime                         #  Core C++ runtime of ExecuTorch.
|   ├── backend                     #  Backend delegate runtime APIs
|   ├── core                        #  Core structures used across all levels of the runtime
|   ├── executor                    #  Model loading, initialization, and execution.
|   ├── kernel                      #  Kernel registration and management.
|   ├── platform                    #  Layer between architecture specific code and user calls.
├── schema                          #  ExecuTorch program definition, TODO move under serialization/
├── scripts                         #  Utility scripts for size management, dependency management, etc.
├── sdk                             #  Model profiling, debugging, and introspection.
├── shim                            #  Compatibility layer between OSS and Internal builds
├── test                            #  Broad scoped end2end tests
├── third-party                     #  third-party dependencies
├── util                            #  TODO delete this

## License

ExecuTorch is BSD licensed, as found in the LICENSE file.