Portable C++ Programming

NOTE: This document covers the code that needs to build for and execute in target hardware environments. This applies to the core execution runtime, as well as kernel and backend implementations in this repo. These rules do not necessarily apply to code that only runs on the development host, like authoring or build tools.

The ExecuTorch runtime code is intended to be portable, and should build for a wide variety of systems, from servers to mobile phones to DSPs, from POSIX to Windows to bare-metal environments.

This means that it can't assume the existence of:

Files
Threads
Exceptions
stdout, stderr
printf(), fprintf()
POSIX APIs and concepts in general

It also can't assume:

64-bit pointers
The size of a given integer type
The signedness of char

To keep the binary size to a minimum, and to keep tight control over memory allocation, the code may not use:

malloc(), free()
new, delete
Most C++ standard library types; especially container types that manage their own memory like string and vector, or memory-management wrapper types like unique_ptr and shared_ptr.

And to help reduce complexity, the code may not depend on any external dependencies except:

flatbuffers (for .pte file deserialization)
flatcc (for event trace serialization)
Core PyTorch (only for ATen mode)

Platform Abstraction Layer (PAL)

To avoid assuming the capabilities of the target system, the ExecuTorch runtime lets clients override low-level functions in its Platform Abstraction Layer (PAL), defined in //executorch/runtime/platform/platform.h, to perform operations like:

Getting the current timestamp
Printing a log message
Panicking the system
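
To illustrate the idea of overridable low-level hooks, here is a minimal sketch that uses a table of function pointers. Everything in it (PlatformHooks, platform_hooks(), log_message(), counting_emit_log()) is a hypothetical name invented for this example; the actual PAL declarations are in platform.h and may be shaped quite differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical hook table for illustration only. The real PAL lives in
// //executorch/runtime/platform/platform.h and may be shaped differently.
struct PlatformHooks {
  uint64_t (*current_ticks)();
  void (*emit_log)(const char* message);
};

// Defaults that assume nothing about the target: no clock, no output.
inline uint64_t default_ticks() { return 0; }
inline void default_emit_log(const char* /*message*/) {}

// Single table that runtime-side code calls through.
inline PlatformHooks& platform_hooks() {
  static PlatformHooks hooks = {default_ticks, default_emit_log};
  return hooks;
}

// Runtime-side helper: never touches stdout, files, or POSIX directly.
inline void log_message(const char* message) {
  platform_hooks().emit_log(message);
}

// Client-side override: count messages instead of printing them.
int g_messages_seen = 0;
void counting_emit_log(const char* /*message*/) { ++g_messages_seen; }

// Example usage: messages are dropped until the client installs its hook.
void example_platform_hooks() {
  log_message("dropped by the default hook");
  assert(g_messages_seen == 0);
  platform_hooks().emit_log = counting_emit_log;
  log_message("seen by the client hook");
  assert(g_messages_seen == 1);
}
```

The point of the indirection is that the runtime-side helper compiles on any target, while the client decides what a log message or a timestamp means on its own hardware.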

Memory Allocation

Instead of using malloc() or new, the runtime code should allocate memory using the MemoryManager (//executorch/runtime/executor/memory_manager.h) provided by the client.
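
As a rough sketch of what allocating from client-provided memory can look like, the following bump allocator hands out slices of a buffer that the caller owns. BumpAllocator is a hypothetical class written for this example; it is not the MemoryManager API, whose actual interface is defined in memory_manager.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump allocator for illustration; this is NOT the ExecuTorch
// MemoryManager API. It hands out slices of a buffer that the client owns.
class BumpAllocator {
 public:
  BumpAllocator(uint8_t* base, size_t size)
      : base_(base), size_(size), used_(0) {}

  // Returns `size` bytes aligned to `alignment` (which must be a power of
  // two), or nullptr if the buffer is exhausted. Never calls malloc() or new.
  void* allocate(size_t size, size_t alignment = alignof(std::max_align_t)) {
    const uintptr_t start = reinterpret_cast<uintptr_t>(base_) + used_;
    const uintptr_t mask = static_cast<uintptr_t>(alignment) - 1;
    const uintptr_t aligned = (start + mask) & ~mask;
    const size_t padding = static_cast<size_t>(aligned - start);
    const size_t remaining = size_ - used_;
    if (padding > remaining || size > remaining - padding) {
      return nullptr;
    }
    used_ += padding + size;
    return reinterpret_cast<void*>(aligned);
  }

  size_t used() const { return used_; }

 private:
  uint8_t* base_;
  size_t size_;
  size_t used_;
};

// Example usage with a client-owned, 16-byte-aligned pool.
void example_bump_allocator() {
  alignas(16) uint8_t pool[64];
  BumpAllocator allocator(pool, sizeof(pool));
  void* first = allocator.allocate(10, 8);
  void* second = allocator.allocate(4, 8);
  assert(first == pool);
  assert(second == pool + 16);  // 6 bytes of padding restore 8-byte alignment
  assert(allocator.allocate(64, 8) == nullptr);  // exhausted: no heap fallback
  assert(allocator.used() == 20);
}
```

Because the pool is supplied by the caller, the client keeps full control over where the memory lives and how much of it the runtime may consume.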

File Loading

Instead of loading files directly, clients should provide buffers with the data already loaded, or wrapped in types like DataLoader.

Integer Types

ExecuTorch runtime code should not assume anything about the sizes or signedness of primitive types like int, short, or char. For example, the C++ standard only guarantees that int will be at least 16 bits wide. Likewise, the signedness of char is implementation-defined: ARM toolchains typically treat char as unsigned, while many other toolchains treat it as signed.

Instead, the runtime APIs use a set of more predictable, but still standard, integer types:

<cstdint> types like uint64_t, int32_t; these types guarantee the bit width and signedness, regardless of the architecture. Use these types when you need a very specific integer width.
size_t for counts of things, or memory offsets. size_t is guaranteed to be big enough to represent any memory byte offset; i.e., it will be as wide as the native pointer type for the target system. Prefer using this instead of uint64_t for counts/offsets so that 32-bit systems don't need to pay for the unnecessary overhead of a 64-bit value.
ssize_t for some ATen-compatibility situations where Tensor returns a signed count. Prefer size_t when possible.
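
The guidelines above can be sketched as follows. The sum_bytes() helper is a hypothetical function written for this example, not part of ExecuTorch; it uses uint8_t for byte values so the result does not depend on the signedness of char, and size_t for the element count.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Fixed-width types have the same width on every target that provides them.
static_assert(sizeof(int32_t) * CHAR_BIT == 32, "int32_t is exactly 32 bits");
static_assert(sizeof(uint64_t) * CHAR_BIT == 64, "uint64_t is exactly 64 bits");

// Plain int is only guaranteed to hold at least 16 bits.
static_assert(sizeof(int) * CHAR_BIT >= 16, "int is at least 16 bits");

// Hypothetical helper for illustration. Summing uint8_t values gives the
// same answer whether the toolchain treats plain char as signed or unsigned,
// and size_t matches the native width used for counts and offsets.
uint32_t sum_bytes(const uint8_t* data, size_t size) {
  uint32_t total = 0;
  for (size_t i = 0; i < size; ++i) {
    total += data[i];
  }
  return total;
}

// Example usage: 0xFF and 0x80 would be negative if read as signed char.
void example_sum_bytes() {
  const uint8_t bytes[] = {0x01, 0xFF, 0x80};
  assert(sum_bytes(bytes, sizeof(bytes)) == 0x180u);
}
```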

Floating Point Arithmetic

Not every system has support for floating point arithmetic: some don't even enable floating point emulation in their toolchains. Therefore, the core runtime code must not perform any floating point arithmetic at runtime, although it is ok to simply create or manage float or double values (e.g., in an EValue).

Kernels, being outside of the core runtime, are allowed to perform floating point arithmetic, though some kernels may choose not to so that they can run on systems without floating point support.

Logging

Instead of using printf(), fprintf(), cout, cerr, or a library like folly::logging or glog, the ExecuTorch runtime provides the ET_LOG interface in //executorch/runtime/platform/log.h and the ET_CHECK interface in //executorch/runtime/platform/assert.h. The messages are printed using a hook in the PAL, which means that clients can redirect them to any underlying logging system, or just print them to stderr if available.

Logging Format Portability

Fixed-Width Integers

When you have a log statement like

int64_t value;
ET_LOG(Error, "Value %??? is bad", value);

what should you put for the %??? part, to match the int64_t? On different systems, the int64_t typedef might be int, long int, or long long int. Picking a format like %d, %ld, or %lld might work on one target, but break on the others.

To be portable, the runtime code uses the standard (but admittedly awkward) helper macros from <cinttypes>. Each portable integer type has a corresponding PRIn## macro, like

int32_t -> PRId32
uint32_t -> PRIu32
int64_t -> PRId64
uint64_t -> PRIu64
See https://en.cppreference.com/w/cpp/header/cinttypes for the full list.

These macros are literal strings that can concatenate with other parts of the format string, like

int64_t value;
ET_LOG(Error, "Value %" PRId64 " is bad", value);

Note that this requires chopping up the literal format string (the extra double quotes). It also requires the leading % before the macro.

But, by using these macros, you're guaranteed that the toolchain will use the appropriate format pattern for the type.
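
The following self-contained sketch shows the same pattern outside of ET_LOG. It uses std::snprintf only as a stand-in that accepts the same printf-style format string, so it can be tried on a development host; format_values() is a hypothetical helper, and portable runtime code would call ET_LOG instead.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// std::snprintf stands in for ET_LOG here only because both take a
// printf-style format string; portable runtime code would call ET_LOG.
int format_values(char* buf, size_t buf_size, int64_t big, uint32_t small) {
  // Adjacent string literals are concatenated by the compiler, so each
  // PRI macro splices the right conversion for its type into the format.
  return std::snprintf(
      buf, buf_size, "big=%" PRId64 " small=%" PRIu32, big, small);
}

// Example usage with a value that does not fit in 32 bits.
void example_format_values() {
  char buf[64];
  const int written =
      format_values(buf, sizeof(buf), INT64_C(-9000000000), UINT32_C(42));
  assert(written > 0);
  assert(std::strcmp(buf, "big=-9000000000 small=42") == 0);
}
```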

`size_t`, `ssize_t`

Unlike the fixed-width integer types, format strings already have a portable way to handle size_t and ssize_t:

size_t -> %zu
ssize_t -> %zd
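
A short sketch of the size_t case, again using std::snprintf as a host-side stand-in for ET_LOG. The format_count() helper is hypothetical. The same approach applies to ssize_t with %zd on toolchains that provide that type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// %zu matches size_t on every target, whether size_t is 32 or 64 bits wide,
// so no PRI macro and no cast is needed.
int format_count(char* buf, size_t buf_size, size_t count) {
  return std::snprintf(buf, buf_size, "Loaded %zu tensors", count);
}

// Example usage.
void example_format_count() {
  char buf[32];
  const int written = format_count(buf, sizeof(buf), static_cast<size_t>(3));
  assert(written > 0);
  assert(std::strcmp(buf, "Loaded 3 tensors") == 0);
}
```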

Casting

Sometimes, especially in code that straddles ATen and lean mode, the type of the value itself might be different across build modes. In those cases, cast the value to the lean mode type, like:

ET_CHECK_MSG(
    input.dim() == output.dim(),
    "input.dim() %zd not equal to output.dim() %zd",
    (ssize_t)input.dim(),
    (ssize_t)output.dim());

In this case, Tensor::dim() returns ssize_t in lean mode, while at::Tensor::dim() returns int64_t in ATen mode. Since they both conceptually return (signed) counts, ssize_t is the most appropriate integer type. int64_t would work, but it would unnecessarily require 32-bit systems to deal with a 64-bit value in lean mode.

Casting should be necessary only in this situation, where lean and ATen modes disagree on the type. Otherwise, use the format pattern that matches the type.
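
To make the build-mode difference concrete, the sketch below formats dim() from two stand-in tensor types through one code path. LeanTensor and AtenLikeTensor are hypothetical structs that only mimic the two return types described above; they are not the real Tensor classes. The example assumes a host toolchain that provides ssize_t through <sys/types.h>, and uses std::snprintf in place of ET_CHECK_MSG.

```cpp
#include <sys/types.h>  // ssize_t (POSIX; assumed available on the host)

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical stand-ins for the two Tensor flavors; only dim() matters here.
struct LeanTensor {
  ssize_t dim() const { return 4; }
};
struct AtenLikeTensor {
  int64_t dim() const { return 4; }
};

// One format string serves both modes because the argument is always cast
// to ssize_t before it reaches the %zd conversion.
template <typename TensorT>
int format_dim(char* buf, size_t buf_size, const TensorT& tensor) {
  return std::snprintf(
      buf, buf_size, "dim() is %zd", static_cast<ssize_t>(tensor.dim()));
}

// Example usage: both tensor flavors produce the same message.
void example_format_dim() {
  char lean_buf[32];
  char aten_buf[32];
  format_dim(lean_buf, sizeof(lean_buf), LeanTensor{});
  format_dim(aten_buf, sizeof(aten_buf), AtenLikeTensor{});
  assert(std::strcmp(lean_buf, "dim() is 4") == 0);
  assert(std::strcmp(aten_buf, lean_buf) == 0);
}
```

The sketch uses static_cast rather than the C-style cast shown above; both perform the same conversion here.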