This subtree contains operator implementations that ExecuTorch clients can use and contribute to.

- `kernels`: Contains implementations and tests for the operators defined in the YAML files.
- `kernels/portable/cpu`: Pure C++ implementations of the operators defined in the YAML files.
- `kernels/optimized/cpu`: Optimized C++ implementations of the operators defined in the YAML files, for specific hardware platforms.
- `kernels/aten`: A thin wrapper layer that hooks the ATen library up to ExecuTorch.
- `kernels/test`: Tests for all operator implementations. Since all implementations should behave identically, the same tests should pass for all target types.

If you have problems or questions, or have suggestions for ways to make implementation and testing better, please contact Dave Bort, Mengwei Liu, or Martin Yuan on the PyTorch Edge team.
Please follow these steps and guidelines when adding a new operator implementation to this library.
ExecuTorch does not use `at::Tensor`, `at::ScalarType`, `c10::Scalar`, or any of the other types defined by PyTorch core in the `at` or `c10` namespaces. To retain tighter control over CPU and memory runtime behavior, ExecuTorch reimplements compatible but restricted subsets of those types.

`//runtime/core/exec_aten/exec_aten.h` contains the mapping between the ATen/c10 types and the ExecuTorch types. The ExecuTorch types themselves are defined in `//runtime/core/portable_type/`.

The ExecuTorch types are source-compatible with the ATen/c10 types: if you write code that works with the ExecuTorch types, that same code should work when built against ATen/c10. But there are features of `at::Tensor` and other ATen/c10 types that may not be present. In many cases this is intentional, but in other cases we can consider adding the missing features.
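For instance, here is a minimal sketch (the function is illustrative, not part of the library) of code written against the ExecuTorch types; because the `exec_aten` aliases map to the ATen types in ATen mode, the same code should also compile against `at::Tensor`:

```cpp
#include <executorch/runtime/core/exec_aten/exec_aten.h>

using exec_aten::ScalarType;
using exec_aten::Tensor;

// Uses only the restricted, source-compatible subset of the Tensor API,
// so it builds against both the portable types and ATen/c10.
bool is_float_matrix(const Tensor& t) {
  return t.scalar_type() == ScalarType::Float && t.dim() == 2;
}
```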
Although ExecuTorch is mapped into both `xplat` and `fbcode`, we recommend setting up the initial targets while working from `fbcode`. Once everything's in place, you should be able to build from either spot.

The most important thing is to consistently work out of one root or the other. And, if you're getting weird build failures, `hg commit` your edited files locally to make sure that both `xplat` and `fbcode` are in sync with each other.
We use YAML files to declare the ATen operators or custom operators being implemented by this kernel library.

Before implementing, the operator must be declared in exactly one of the operator YAML files:

- `//kernels/portable/functions.yaml` if the operator (e.g. `op: add.out`) appears in the core PyTorch file `pytorch/aten/src/ATen/native/native_functions.yaml`. Also add the operator to `//kernels/aten/functions.yaml` for test coverage.
- `//kernels/portable/custom_ops.yaml` if the operator does not appear in `native_functions.yaml`.

The next sections describe how to add a YAML entry.
This YAML file schema is a DSL that describes the operators and the kernels that implement them. The YAML files are a contract between AOT model export and runtime execution: if followed correctly, they ensure that the ExecuTorch runtime can link the C++ implementation of an operator to the exported model artifact. Here are some rules for writing your own YAML entries.
Out variants only

ExecuTorch only supports out-style operators, where:

- The caller provides the output Tensor, named `out`.
- The C++ function modifies and returns the same `out` argument. If the return type in the YAML file is `()` (which maps to void), the C++ function should still modify `out` but does not need to return anything.
- The `out` argument must be keyword-only, which means it needs to follow an argument named `*` like in the `add.out` example below.
- Out operators are named using the pattern `<name>.out` or `<name>.<overload>_out`.

Since all output values are returned via an `out` parameter, ExecuTorch ignores the actual C++ function return value. But, to be consistent, functions should always return `out` when the return type is non-`void`.
Can only return Tensor or ()

ExecuTorch only supports operators that return a single `Tensor`, or the unit type `()` (which maps to `void`). It does not support returning any other types, including lists, optionals, tuples, or scalars like `bool`.
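In C++ terms, these rules mean every kernel has roughly the following shape. This is only an illustrative sketch (the operator and body are made up), not a real entry in this library:

```cpp
#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
namespace executor {
namespace native {

// Out-style: the output tensor arrives as the final `out` parameter...
Tensor& mul_out(const Tensor& self, const Tensor& other, Tensor& out) {
  // ... the kernel writes its result into `out` ...
  // (element-wise computation elided in this sketch)

  // ... and returns that same `out` reference, never a new value.
  return out;
}

} // namespace native
} // namespace executor
} // namespace torch
```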
Supported argument types

ExecuTorch does not support all of the argument types that core PyTorch supports. See this spreadsheet for the list of supported and unsupported types.

Functions only, no methods

ExecuTorch does not support Tensor methods, and assumes `variants: function` for all operators. Entries like `variants: method` or `variants: function, method` will be ignored.
Some examples of operator entries:

ATen operator with a default kernel:

```yaml
- op: add.out
  kernels:
    - arg_meta: null
      kernel_name: torch::executor::add_out
```
ATen operator with a dtype/dim order specialized kernel (works for the `Double` dtype, and the dim order needs to be (0, 1, 2, 3)):

```yaml
- op: add.out
  type_alias:
    T0: [Double]
  dim_order_alias:
    D0: [[0, 1, 2, 3]]
  kernels:
    - arg_meta:
        self: [T0, D0]
        other: [T0, D0]
        out: [T0, D0]
      kernel_name: torch::executor::add_out
```
Custom operator with a default kernel:

```yaml
- func: allclose.out(Tensor self, Tensor other, float rtol=1e-05, float atol=1e-08, bool equal_nan=False, bool dummy_param=False, *, Tensor(a!) out) -> Tensor(a!)
  kernels:
    - arg_meta: null
      kernel_name: torch::executor::allclose_out
```
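For reference, here is a sketch of the C++ declaration that the custom-operator example above would bind to. This is an illustration, not the real implementation: `float` in the schema maps to `double` in C++, and (as described under `kernel_name` below) codegen inserts `native::` into the last level of the kernel's namespace:

```cpp
namespace torch {
namespace executor {
namespace native {

// Parameter order follows the schema string; `out` comes last.
Tensor& allclose_out(
    const Tensor& self,
    const Tensor& other,
    double rtol,
    double atol,
    bool equal_nan,
    bool dummy_param,
    Tensor& out);

} // namespace native
} // namespace executor
} // namespace torch
```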
Top level attributes:

- `op` (if the operator appears in `native_functions.yaml`) or `func` (for a custom operator). The value for the `op` key is the full operator name, including the overload name; the value for `func` is a full operator schema (namespace, operator name, operator overload name, and schema string). For schema syntax, please refer to this instruction.
- `kernels`: defines kernel information. It consists of `arg_meta` and `kernel_name`, which are bound together to describe "for input tensors with these metadata, use this kernel".
- `type_alias` (optional): gives aliases to possible dtype options. `T0: [Double, Float]` means `T0` can be either `Double` or `Float`.
- `dim_order_alias` (optional): similar to `type_alias`, gives names to possible dim order options.
Attributes under `kernels`:

- `arg_meta`: a list of "tensor arg name" entries. The values for these keys are the dtype and dim order aliases implemented by the corresponding `kernel_name`. If this is `null`, the kernel will be used for all types of input.
- `kernel_name`: the expected name of the C++ function that will implement this operator. You can put whatever you want here, but you should follow the convention of replacing the `.` in the overload name with an underscore, and lowercasing all characters. In this example, `add.out` uses the C++ function named `add_out`. `add.Scalar_out` would become `add_scalar_out`, with a lowercase `s`. We support namespaces for kernels, but note that we will be inserting `native::` into the last level of the namespace. So `custom::add_out` in the `kernel_name` will point to `custom::native::add_out`.

The base name is the part of the operator name before the `.`, excluding any trailing underscores. The rest of this document refers to this as `<name>`.
E.g., these operator overloads all have a base name of `add`:

- `add.Scalar`
- `add.Tensor`
- `add.out`
- `add_.Tensor`

So, if you were implementing `add.out`, your operator base name would be `add`, and you would replace `<name>` with `add` everywhere below.
When using macros that require a NAME argument, e.g. `#define ET_SWITCH_REAL_TYPES_AND(ADDITIONAL, TYPE, CONTEXT, NAME, CTYPE_ALIAS, ...)`, make sure to pass in the same operator name defined in `functions.yaml`. This is the base name + variant, e.g. `add.out` or `add.Scalar_out`. The operator name is required for dtype selective build, which matches against the operator names and dtypes present in a model.
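For example, here is a sketch of how the NAME argument is threaded through one of these macros. The kernel body and the context parameter are illustrative; the macros themselves live in `executorch/runtime/core/exec_aten/util/scalar_type_util.h`:

```cpp
#include <executorch/runtime/core/exec_aten/util/scalar_type_util.h>
#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
namespace executor {
namespace native {

Tensor& add_out(RuntimeContext& ctx, const Tensor& self, const Tensor& other, Tensor& out) {
  // NAME is "add.out", exactly as declared in functions.yaml, so that dtype
  // selective build can match this switch against the model's operators.
  ET_SWITCH_REAL_TYPES(self.scalar_type(), ctx, "add.out", CTYPE, [&]() {
    // CTYPE is aliased to the C++ type for the runtime dtype of `self`.
    const CTYPE* a = self.const_data_ptr<CTYPE>();
    const CTYPE* b = other.const_data_ptr<CTYPE>();
    CTYPE* o = out.mutable_data_ptr<CTYPE>();
    for (size_t i = 0; i < static_cast<size_t>(out.numel()); ++i) {
      o[i] = a[i] + b[i];
    }
  });
  return out;
}

} // namespace native
} // namespace executor
} // namespace torch
```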
For the operator base name `<name>`, you should work with these files and Buck targets. Sections below give more details about what they should contain.

- `./kernels/portable/cpu/op_<name>.cpp`: The implementations of operator overloads with base name `<name>`. This is the file that clients will link into their runtimes.
- `//executorch/kernels/portable/cpu:op_<name>`: The build target for `op_<name>.cpp`, defined in `targets.bzl` in the same directory.
- `./kernels/test/op_<name>_test.cpp`: Unit tests for the operator overloads with base name `<name>`. Note that the tests do not live under `cpu`; tests should be implementation-agnostic. This will let us run the same tests against all implementations of a given operator, which should behave identically.
- `//executorch/kernels/portable/test:op_<name>_test`: The test target for `op_<name>_test.cpp`, defined in `targets.bzl` in the same directory.

For an example, see the `add` operator (note that these are slightly different from the `add` examples in this doc):
- `//executorch/kernels/portable/cpu/op_add.cpp`: Implementations.
- `//executorch/kernels/portable/cpu/targets.bzl`: Definition of the `:op_add` target.
- `//executorch/kernels/portable/test/op_add_test.cpp`: Unit tests.
- `//executorch/kernels/portable/test/targets.bzl`: Definition of the `:op_add_test` target.

Define a build target by adding an entry to `//executorch/kernels/portable/cpu/targets.bzl`, inside `define_common_targets()`, in sorted order with other `_op_target` entries:

```
_op_target(name = "op_<name>")
```
If your operator overload group is ATen-compatible, its `_op_target` entry belongs in the `_ATEN_OPS` list; otherwise it belongs in the `_CUSTOM_OPS` list. Note that this means that a given `op_<name>` cannot implement both ATen-compatible and non-ATen-compatible (i.e., custom) operators. We suggest adding the suffix `_custom` if necessary: e.g., `op_add` for ATen-compatible overloads of the `add` operator, and `op_add_custom` for non-ATen-compatible overloads.
By default, this target will depend on the core ExecuTorch types, but you can add additional deps if you want to.
NOTE: An `op_<name>` target may not depend on another `op_<name>` target. If two `op_` targets need to share code, define a separate `runtime.cxx_library` target under `//executorch/kernels/portable/cpu/lib` that they both depend on. This keeps the dependencies more manageable, especially for selective builds where only a subset of operators are used.

NOTE: An `op_<name>` target may not depend on targets outside of `//executorch`. This library is intended to be portable, open-sourceable, and self-contained.
If not already present, create the file `//executorch/kernels/portable/cpu/op_<name>.cpp`, which should follow the pattern:

```cpp
// Copyright (c) Meta Platforms, Inc. and affiliates.

#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
namespace executor {
namespace native {

namespace {
// <helper code>
} // namespace

// <operator overload implementations>

} // namespace native
} // namespace executor
} // namespace torch
```
With the target and cpp file in place, you should be able to build it:

```
cd ${HOME}/fbsource/fbcode/executorch
buck build fbcode//executorch/kernels/portable/cpu:op_<name>
```
When you add an entry to the YAML file, the codegen tools will generate an expected function signature for you to implement in a file called `NativeFunctions.h`.

To build and find that generated header, run the script `fbsource/fbcode/executorch/kernels/portable/find_op_header.sh`. It will print output like:

```
===== Generating header files =====
File changed: fbcode//executorch/kernels/portable/functions.yaml
Buck UI: https://www.internalfb.com/buck2/e5a6f22a-5b6e-4931-9a7f-df18bdf97ab6
RE Session: reSessionID-4b735cfa-e66f-43d8-a73b-94f22d5936c5
Jobs completed: 3. Time elapsed: 0.2s.
Cache hits: 100%. Commands: 1 (cached: 1, remote: 0, local: 0)
BUILD SUCCEEDED

Header file: /data/users/USER/fbsource/buck-out/v2/gen/fbcode/d839c731f5505c62/executorch/codegen/__generated_lib_generate__/out/NativeFunctions.h
```

The path will be different in your environment, so be sure to use the output from the script instead of copy-pasting this path. And, since this header is generated from the YAML files, re-run the script if you have modified your operator's entry in those files.
Open that file and look for the function whose name matches the `kernel_name` that you earlier added to the YAML file. For `add_out`, this might look like:

```cpp
TORCH_API torch::executor::Tensor & add_out(const at::Tensor & self, const at::Tensor & other, at::Tensor & out);
```

This is the function signature that you will need to implement.
Now that you have your function signature, add a stub to the `op_<name>.cpp` file that just returns the `out` argument. For example:

```cpp
Tensor& add_out(const Tensor& self, const Tensor& other, Tensor& out) {
  return out;
}
```

Note that you should drop the `TORCH_API` attribute, and should drop the `at::` prefixes.
Try building again with:

```
cd ${HOME}/fbsource/fbcode/executorch
buck build fbcode//executorch/kernels/portable/cpu:op_<name>
```
Define a test build target by adding an entry to `//executorch/kernels/portable/test/targets.bzl`, inside `define_common_targets()`, in sorted order with other `_op_test` entries:

```
_op_target(name = "op_<name>_test")
```
By default, this target will depend on `//executorch/kernels/portable/cpu:op_<name>`, the core ExecuTorch types, and some helper test utilities (see headers), but you can add additional deps if you want to.
If not already present, create the file `//executorch/kernels/portable/test/op_<name>_test.cpp`. Here's a suggested starting point:

```cpp
// Copyright (c) Meta Platforms, Inc. and affiliates.

#include <executorch/kernels/test/FunctionHeaderWrapper.h> // Declares the operator
#include <executorch/runtime/core/exec_aten/exec_aten.h>
#include <executorch/runtime/core/exec_aten/testing_util/tensor_factory.h>
#include <executorch/runtime/core/exec_aten/testing_util/tensor_util.h>

#include <gtest/gtest.h>

using namespace ::testing;
using exec_aten::ScalarType;
using exec_aten::Tensor;
using torch::executor::native::<operator_function_name>;
using torch::executor::testing::IsCloseTo;
using torch::executor::testing::TensorFactory;

TEST(Op<Name>Test, SmokeTest) {
  TensorFactory<ScalarType::Int> tf;

  Tensor a = tf.make(/*sizes=*/{2, 2}, /*data=*/{1, 1, 1, 1});
  Tensor b = tf.ones(/*sizes=*/{2, 2});
  Tensor z = tf.zeros(/*sizes=*/{2, 2});

  EXPECT_EQ(a, b); // Exact equality
  EXPECT_THAT(a, IsCloseTo(b)); // For floating-point tensors

  EXPECT_NE(a, z);
  EXPECT_THAT(a, Not(IsCloseTo(z)));
}
```
Try running the test:

```
cd ${HOME}/fbsource/fbcode/executorch
buck test fbcode//executorch/kernels/test:op_<name>_test
```
You should now be able to implement and test your operator. It's helpful to see how other operators do it, so take a look at `op_add`:

Check out how it uses helper macros like `ET_CHECK_SAME_SHAPE_AND_DTYPE` and `ET_FORALL_REAL_TYPES` when implementing the operator, and test helpers like `TensorFactory` and `IsCloseTo()` when testing.
To reduce dependencies and size, to ensure portability, and to conform to the restrictions of embedded environments, your operator implementations:

- Must not use C++ stdlib types or headers: `string`/`basic_string`, `vector`, `unordered_map`, `cout`, `unique_pointer` must not be used.
- Must not dynamically allocate memory, or cause memory to be allocated; all non-stack memory must come from the `out` parameter or another tensor parameter to be used as scratch space. This rules out direct calls to `new`, `malloc`, `realloc`, etc., as well as operations that allocate under the hood like `make_unique`, or the creation of a `vector` or `string`, for example.
- Must not print to `stdout`, `stderr`, or other file/stream IO via `printf`/`cout` etc.; instead, use `ET_LOG` from `executorch/runtime/platform/log.h`.
- Must not use `assert()`. Instead use `ET_CHECK` and other macros from `executorch/runtime/platform/assert.h`.
- Must not raise exceptions. Instead use `ET_CHECK` and other macros from `executorch/runtime/platform/assert.h`.

Note that not all of these apply to every ExecuTorch-compatible operator implementation, only those included in this portable library. For example, a target-specific custom operator that initiates a DMA copy would be stateful, and would probably modify global memory, but it would need to use target-specific APIs to do so. But, since this library is only for portable operator implementations, the operators it contains can't depend on target-specific APIs like that.
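For a concrete feel of the logging and assertion replacements, here is a short hedged sketch (the validation helper is made up; the macros come from the headers named above):

```cpp
#include <executorch/runtime/core/exec_aten/exec_aten.h>
#include <executorch/runtime/platform/assert.h>
#include <executorch/runtime/platform/log.h>

using exec_aten::Tensor;

// Illustrative input check: aborts with a message instead of throwing or
// calling assert(), and logs through the platform layer instead of stdout.
void check_same_numel(const Tensor& self, const Tensor& out) {
  ET_CHECK_MSG(
      self.numel() == out.numel(),
      "self.numel() %zd != out.numel() %zd",
      self.numel(),
      out.numel());
  ET_LOG(Info, "inputs validated");
}
```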
The portable kernel implementation and its corresponding tests can be used as a reference for other kernels. We can also share the test cases in `//executorch/kernels/test`, which contains common resources for kernel testing.
`util.bzl` contains common BUCK targets for other test libs to include.

`targets.bzl` has targets shared by other kernels' tests:

- `_codegen_function_header_wrapper` generates a header `FunctionHeaderWrapper.h`, which simply includes the corresponding `Functions.h` file for the specified kernel: `#include <executorch/kernels/{}/Functions.h>`. With that, the test sources don't need to know which kernel we are testing or which `Functions.h` we should use.
- With `_common_op_test` we use a single test source file (`op_<name>_test.cpp`) in this directory. We automatically find the corresponding registered dispatch function through `Functions.h`, so it can be used to test multiple kernels.
In `/test/` we can put kernel-specific test cases.

`SupportedFeatures` is used to distinguish between different kernel features. For example, ATen supports mixing input and output dtypes while portable doesn't. When we expect a death test for such a case in portable testing, we can check the supported features of the running kernel and bypass the test if the feature is supported.
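A hedged sketch of that bypass pattern (the field and accessor names here are assumptions based on how the kernels/test sources use `SupportedFeatures`; check `supported_features.h` for the real API):

```cpp
#include <executorch/kernels/test/supported_features.h>
#include <gtest/gtest.h>

TEST(OpAddTest, MixedOutputDtypeDies) {
  // Assumed accessor: if the kernel under test supports mixed input/output
  // dtypes (as ATen does), there is nothing to expect death from, so skip.
  if (torch::executor::testing::SupportedFeatures::get()->is_aten) {
    GTEST_SKIP() << "Mixed dtypes are supported; no death expected";
  }
  // ... construct tensors with mismatched dtypes and expect the kernel
  // to fail its dtype check (e.g., via an EXPECT_DEATH-style assertion) ...
}
```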