remove transpose addmm weights hack (#358)

Summary:
Pull Request resolved: https://github.com/pytorch/executorch/pull/358

### Background

A common pattern when encountering addmm is that the weights are permuted before being given to addmm. This is because, for torch.nn.Linear, the input shape and weight shape are generally:
```
input: (*, in_features)
weight: (out_features, in_features)
```

while the input shape and weight shape of addmm are the following:
```
input1 (input): (*, in_features)
input2 (weight): (in_features, out_features)
```

So when decomposing nn.Linear into addmm, the weights go through a permute node to comply with addmm's expected shapes.
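The shape relationship above can be sketched in plain Python (illustrative only, not ExecuTorch or PyTorch code): a minimal addmm over nested lists, and a linear implemented as addmm of the input with the permuted (transposed) weight.

```python
# Illustrative sketch: why decomposing nn.Linear into addmm requires
# permuting the weight. Shapes follow the convention above:
#   nn.Linear weight: (out_features, in_features)
#   addmm input2:     (in_features, out_features)

def transpose(m):
    """Swap rows and columns of a 2-D matrix given as nested lists."""
    return [list(row) for row in zip(*m)]

def addmm(bias, mat1, mat2):
    """Minimal addmm semantics: bias + mat1 @ mat2, over nested lists."""
    rows, inner, cols = len(mat1), len(mat2), len(mat2[0])
    assert len(mat1[0]) == inner, "inner dimensions must match"
    return [
        [bias[j] + sum(mat1[i][k] * mat2[k][j] for k in range(inner))
         for j in range(cols)]
        for i in range(rows)
    ]

def linear(x, weight, bias):
    """nn.Linear semantics: y = x @ weight^T + bias."""
    return addmm(bias, x, transpose(weight))

x = [[1.0, 2.0, 3.0]]            # (*, in_features) = (1, 3)
w = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]            # (out_features, in_features) = (2, 3)
b = [0.5, -0.5]

print(linear(x, w, b))           # [[4.5, 1.5]]
```

Without the transpose, the (2, 3) weight could not be multiplied on the right of the (1, 3) input, which is exactly why the decomposition inserts the permute node.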

### XNNPACK Status
XNNPACK can handle both the transposed and the normal weight shape; however, it requires a flag indicating whether the weights are transposed. So an easy optimization is to skip the permute node and set the flag instead.

### Change and Motivation
Currently, some of this optimization logic is hardcoded directly into serialization. Serialization should not be aware of these optimizations, which is why this logic is being removed from serialization. Instead, it should be performed entirely by the addmm --> linear pass, which recomposes permute + addmm into a single linear. We should no longer rely on serialization to perform this optimization (right now it is erroneous and causing a bug).
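The recomposition described above can be sketched as a simple pattern-matching pass. The Node class and graph representation here are hypothetical stand-ins, not the real torch.fx or ExecuTorch pass infrastructure; the point is only the shape of the rewrite: addmm(bias, x, permute(w)) becomes linear(x, w, bias).

```python
# Hypothetical sketch of a permute + addmm --> linear recomposition
# pass. Node/graph types are illustrative, not ExecuTorch internals.
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                            # e.g. "permute", "addmm", "linear"
    inputs: list = field(default_factory=list)

def recompose_addmm_to_linear(nodes):
    """Replace addmm(bias, x, permute(w)) with linear(x, w, bias)."""
    out = []
    for node in nodes:
        if (node.op == "addmm"
                and len(node.inputs) == 3
                and isinstance(node.inputs[2], Node)
                and node.inputs[2].op == "permute"):
            bias, x, perm = node.inputs
            weight = perm.inputs[0]
            # The permute is absorbed into the fused node, because
            # linear already expects the (out_features, in_features)
            # weight layout that the permute was undoing.
            out.append(Node("linear", [x, weight, bias]))
        else:
            out.append(node)
    return out

w = Node("weight")
perm = Node("permute", [w])
g = [w, perm, Node("addmm", [Node("bias"), Node("x"), perm])]
print([n.op for n in recompose_addmm_to_linear(g)])
# The now-dead permute node would be removed by a later
# dead-code-elimination step in a real pass pipeline.
```

With the recomposition living entirely in this pass, the serializer never needs to inspect weight layouts, which is the separation of concerns this change restores.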

Reviewed By: kirklandsign

Differential Revision: D49129704

fbshipit-source-id: 1134c33f76eb27ac05a90b29c6dc057c8c647b58

ExecuTorch

A unified ML software stack within the PyTorch platform for edge devices. It defines new compiler entry points as well as a state-of-the-art runtime.

Why ExecuTorch?

Compared to the legacy Lite Interpreter, there are some major benefits:

  • Performance wins compared to Lite Interpreter
    • Faster (orders of magnitude lower framework tax in both DSP and CPU)
    • Much smaller binary size: 1.5 MB (Lite Interpreter) vs 30 KB (ExecuTorch), without operators.
    • Smaller memory footprint, because ExecuTorch does ahead-of-time memory planning and gives clear, granular control over where runtime allocations are done.
  • Long term alignment with the direction of PyTorch infrastructure
    • Lite Interpreter relies on TorchScript, which is being phased out; ExecuTorch is the planned replacement for Lite Interpreter.
  • Model Authoring & Productivity gains
    • More and better defined entry points to perform model, device, and/or use-case specific optimizations (e.g. better backend delegation, user-defined compiler transformations, default or user-defined memory planning, etc)
    • Ability to lower constructs like dynamic control flow to run on device.

Design goals

  • Minimal binary size (< 50KB not including kernels)
  • Minimal framework tax: loading program, initializing executor, kernel and backend-delegate dispatch, runtime memory utilization
  • Portable (cross-compile across many toolchains)
  • Executes ATen kernels (or ATen custom kernels)
  • Executes custom op kernels
  • Supports inter-op asynchronous execution
  • Supports static memory allocation (heapless)
  • Supports custom allocation across memory hierarchies
  • Supports control flow needed by models
  • Allows selective build of kernels
  • Allows backend delegation with lightweight interface

Quick Links

Quick Links for Partners

Directory Structure [WIP]

executorch
├── backends                        #  1st party backend implementations.
|   ├── xnnpack
|   ├── vulkan
├── build                           #  Utilities for managing the build system.
├── bundled_program                 #  Utilities for attaching reference inputs and outputs to models. TODO move to extension
├── codegen                         #  Tooling to autogenerate bindings between kernels and the runtime. TODO move to tool
├── configurations                  #  TODO delete this
├── docs                            #  Static docs tooling
├── examples                        #  Examples of various user flows, such as model export, delegates, and runtime execution.
|   ├── executor_runner
|   ├── export
|   ├── models
├── exir                            #  Ahead of time library, model capture and lowering apis.
|   ├── backend                     #  Backend delegate ahead of time APIs
|   ├── capture                     #  Program capture.
|   ├── dialects                    #  Op sets for various dialects in the export process.
|   ├── emit                        #  Conversion from ExportedProgram to Executorch execution instructions.
|   ├── program                     #  Export artifacts.
|   ├── serialize                   #  Serialize final export artifact.
├── extension                       #  Extensions built on top of the runtime.
|   ├── aten_util
|   ├── data_loader                 # 1st party data loader implementations.
|   ├── memory_allocator            # 1st party memory allocator implementations.
|   ├── pybindings                  # Python api for executorch runtime.
|   ├── pytree                      # C++ and Python flattening and unflattening lib for pytrees.
|   ├── testing_util
├── kernels                         #  1st party kernel implementations.
|   ├── aten
|   ├── optimized
|   ├── portable                    #  Reference implementations of ATen operators.
|   ├── prim_ops                    #  Special ops used in executorch runtime for control flow and symbolic primitives.
|   ├── quantized
├── profiler                        #  Utilities for profiling. TODO delete in favor of ETDump in sdk/
├── runtime                         #  Core C++ runtime of ExecuTorch.
|   ├── backend                     #  Backend delegate runtime APIs
|   ├── core                        #  Core structures used across all levels of the runtime
|   ├── executor                    #  Model loading, initialization, and execution.
|   ├── kernel                      #  Kernel registration and management.
|   ├── platform                    #  Layer between architecture specific code and user calls.
├── schema                          #  Executorch program definition, TODO move under serialization/
├── scripts                         #  Utility scripts for size management, dependency management, etc.
├── sdk                             #  Model profiling, debugging, and introspection: NOT READY YET FOR OSS USE
├── shim                            #  Compatibility layer between OSS and Internal builds
├── test                            #  Broad scoped end2end tests
├── third-party                     #  third-party dependencies
├── util                            #  TODO delete this

License

ExecuTorch is BSD licensed, as found in the LICENSE file.