 # Portable C++ Programming

 NOTE: This document covers the code that needs to build for and execute in
 target hardware environments. This applies to the core execution runtime, as
 well as kernel and backend implementations in this repo. These rules do not
 necessarily apply to code that only runs on the development host, like authoring
 or build tools.

 The ExecuTorch runtime code is intended to be portable, and should build for a
 wide variety of systems, from servers to mobile phones to DSPs, from POSIX to
 Windows to bare-metal environments.

 This means that it can't assume the existence of:
 - Files
 - Threads
 - Exceptions
 - `stdout`, `stderr`
 - `printf()`, `fprintf()`
 - POSIX APIs and concepts in general

 It also can't assume:
 - 64-bit pointers
 - The size of a given integer type
 - The signedness of `char`

 To keep the binary size to a minimum, and to keep tight control over memory
 allocation, the code may not use:
 - `malloc()`, `free()`
 - `new`, `delete`
 - Most C++ standard library types; especially container types that manage their
   own memory like `string` and `vector`, or memory-management wrapper types like
   `unique_ptr` and `shared_ptr`.

 And to help reduce complexity, the code may not depend on any external
 libraries except:
 - `flatbuffers` (for `.pte` file deserialization)
 - `flatcc` (for event trace serialization)
 - Core PyTorch (only for ATen mode)

 ## Platform Abstraction Layer (PAL)

 To avoid assuming the capabilities of the target system, the ExecuTorch runtime
 lets clients override low-level functions in its Platform Abstraction Layer
 (PAL), defined in `//executorch/runtime/platform/platform.h`, to perform operations
 like:
 - Getting the current timestamp
 - Printing a log message
 - Panicking the system
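
One common way to make low-level functions overridable is a table of function pointers with do-nothing defaults. The sketch below illustrates that idea only; it is not the interface declared in `platform.h`, and the names `PlatformHooks`, `set_platform_hooks`, `platform_ticks`, and `platform_log` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical hook table, NOT the real ExecuTorch PAL API.
struct PlatformHooks {
  uint64_t (*current_ticks)();            // returns the current timestamp
  void (*emit_log)(const char* message);  // prints one log message
  void (*abort_system)();                 // panics the system
};

namespace {
// Defaults that assume nothing about the target: no clock and no output.
uint64_t default_ticks() { return 0; }
void default_emit_log(const char*) {}
void default_abort() {
  volatile int spin = 1;  // volatile read keeps the loop well-defined
  while (spin) {}
}

PlatformHooks g_hooks = {default_ticks, default_emit_log, default_abort};
}  // namespace

// Called by the client to install its own implementations.
// Null entries leave the currently installed function in place.
void set_platform_hooks(const PlatformHooks& hooks) {
  if (hooks.current_ticks != nullptr) g_hooks.current_ticks = hooks.current_ticks;
  if (hooks.emit_log != nullptr) g_hooks.emit_log = hooks.emit_log;
  if (hooks.abort_system != nullptr) g_hooks.abort_system = hooks.abort_system;
}

// Runtime code calls these wrappers and never touches the OS directly.
uint64_t platform_ticks() { return g_hooks.current_ticks(); }
void platform_log(const char* message) { g_hooks.emit_log(message); }
```

With this shape, a bare-metal client could install a hook that reads a hardware timer, while a desktop client could install one backed by its OS clock, without the runtime code changing.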

 ## Memory Allocation

 Instead of using `malloc()` or `new`, the runtime code should allocate memory
 using the `MemoryManager` (`//executorch/runtime/executor/memory_manager.h`)
 provided by the client.
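
To illustrate allocating from client-provided memory without `malloc()` or `new`, here is a minimal bump-allocator sketch. It is an assumption-laden illustration, not the `MemoryManager` interface: the class name `BumpArena` and its methods are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical arena over a client-owned buffer; not the real MemoryManager.
class BumpArena {
 public:
  BumpArena(uint8_t* buffer, size_t size)
      : base_(buffer), size_(size), used_(0) {}

  // Returns `bytes` of storage aligned to `alignment` (a power of two), or
  // nullptr when the buffer is exhausted. Never calls malloc() or new.
  void* allocate(size_t bytes, size_t alignment = alignof(std::max_align_t)) {
    const uintptr_t start = reinterpret_cast<uintptr_t>(base_) + used_;
    const uintptr_t mask = static_cast<uintptr_t>(alignment) - 1;
    const uintptr_t aligned = (start + mask) & ~mask;
    const size_t padding = static_cast<size_t>(aligned - start);
    const size_t remaining = size_ - used_;
    if (padding > remaining || bytes > remaining - padding) {
      return nullptr;
    }
    used_ += padding + bytes;
    return reinterpret_cast<void*>(aligned);
  }

  // Total bytes consumed so far, including alignment padding.
  size_t used() const { return used_; }

 private:
  uint8_t* base_;
  size_t size_;
  size_t used_;
};
```

Because the client owns the buffer, it decides where the memory lives (static storage, a stack array, or a special memory region) and how large it may grow.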

 ## File Loading

 Instead of loading files directly, clients should provide buffers with the data
 already loaded, or wrapped in types like `DataLoader`.
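
As a rough sketch of what "a buffer with the data already loaded" can look like to runtime code, the hypothetical `BufferSource` below hands out bounds-checked views into memory the client supplied. It is not the `DataLoader` interface, whose actual methods may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a loader type; the real DataLoader may differ.
class BufferSource {
 public:
  BufferSource(const uint8_t* data, size_t size) : data_(data), size_(size) {}

  // Returns a pointer to `size` bytes starting at `offset`, or nullptr if the
  // requested range does not fit inside the buffer. No file APIs are used.
  const uint8_t* load(size_t offset, size_t size) const {
    if (offset > size_ || size > size_ - offset) {
      return nullptr;
    }
    return data_ + offset;
  }

  size_t size() const { return size_; }

 private:
  const uint8_t* data_;
  size_t size_;
};
```

How the bytes got into memory (read from a file, linked into the binary, received over a bus) stays entirely on the client side.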

 ## Integer Types

 ExecuTorch runtime code should not assume anything about the sizes of primitive
 types like `int`, `short`, or `char`. For example, the C++ standard only
 guarantees that `int` will be at least 16 bits wide. Similarly, ARM toolchains
 typically treat `char` as unsigned, while other toolchains often treat it as
 signed.

 Instead, the runtime APIs use a set of more predictable, but still standard,
 integer types:
 - `<cstdint>` types like `uint64_t`, `int32_t`; these types guarantee the bit
   width and signedness, regardless of the architecture. Use these types when you
   need a very specific integer width.
 - `size_t` for counts of things, or memory offsets. `size_t` is guaranteed to be
   big enough to represent the size of any object; in practice, it is as wide as
   the native pointer type for the target system. Prefer using this instead of
   `uint64_t` for counts/offsets so that 32-bit systems don't need to pay for the
   unnecessary overhead of a 64-bit value.
 - `ssize_t` for some ATen-compatibility situations where `Tensor` returns a
   signed count. Prefer `size_t` when possible.
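
The following sketch shows these choices in a small function. The helper name `count_high_bytes` is hypothetical and exists only for illustration.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Fixed-width types have the same width on every target that provides them.
static_assert(sizeof(int32_t) * CHAR_BIT == 32, "int32_t is exactly 32 bits");
static_assert(sizeof(uint64_t) * CHAR_BIT == 64, "uint64_t is exactly 64 bits");

// True on toolchains where plain `char` is signed, false where it is unsigned.
constexpr bool kCharIsSigned = CHAR_MIN < 0;

// Counts the bytes whose high bit is set. The length and the loop index are
// size_t, so they are only as wide as the target needs. Converting each byte
// to uint8_t first gives the same answer whether `char` is signed or not.
size_t count_high_bytes(const char* data, size_t length) {
  size_t count = 0;
  for (size_t i = 0; i < length; ++i) {
    if (static_cast<uint8_t>(data[i]) >= 0x80) {
      ++count;
    }
  }
  return count;
}
```

Comparing `data[i] >= 0x80` directly would never be true on a target where `char` is signed, which is the kind of silent difference the explicit conversion avoids.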

 ## Floating Point Arithmetic

 Not every system has support for floating point arithmetic: some don't even enable
 floating point emulation in their toolchains. Therefore, the core runtime code
 must not perform any floating point arithmetic at runtime, although it is ok to
 simply create or manage `float` or `double` values (e.g., in an `EValue`).

 Kernels, being outside of the core runtime, are allowed to perform floating point
 arithmetic, though some kernels may choose not to so that they can run on systems
 without floating point support.

 ## Logging

 Instead of using `printf()`, `fprintf()`, `cout`, `cerr`, or a library like
 `folly::logging` or `glog`, the ExecuTorch runtime provides the `ET_LOG`
 interface in `//executorch/runtime/platform/log.h` and the `ET_CHECK` interface in
 `//executorch/runtime/platform/assert.h`. The messages are printed using a hook in the PAL,
 which means that clients can redirect them to any underlying logging system, or
 just print them to `stderr` if available.

 ### Logging Format Portability

 #### Fixed-Width Integers

 When you have a log statement like
 ```
 int64_t value;
 ET_LOG(Error, "Value %??? is bad", value);
 ```
 what should you put for the `%???` part, to match the `int64_t`? On different
 systems, the `int64_t` typedef might be `int`, `long int`, or `long long int`.
 Picking a format like `%d`, `%ld`, or `%lld` might work on one target, but break
 on the others.

 To be portable, the runtime code uses the standard (but admittedly awkward)
 helper macros from `<cinttypes>`. Each portable integer type has a corresponding
 `PRIn##` macro, like
 - `int32_t` -> `PRId32`
 - `uint32_t` -> `PRIu32`
 - `int64_t` -> `PRId64`
 - `uint64_t` -> `PRIu64`
 - See https://en.cppreference.com/w/cpp/header/cinttypes for more

 These macros are literal strings that can concatenate with other parts of the
 format string, like
 ```
 int64_t value;
 ET_LOG(Error, "Value %" PRId64 " is bad", value);
 ```
 Note that this requires chopping up the literal format string (the extra double
 quotes). It also requires the leading `%` before the macro.

 But, by using these macros, you're guaranteed that the toolchain will use the
 appropriate format pattern for the type.
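
To see the macros at work outside of `ET_LOG`, the sketch below formats two fixed-width values into a caller-provided buffer. It uses `std::snprintf` purely as a host-side stand-in for the logging hook, and the helper name `format_values` is hypothetical.

```cpp
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Writes "big=<int64> small=<uint32>" into `out` and returns the number of
// characters the full message needs, as std::snprintf does. Each PRI macro
// expands to the right length modifier and conversion for the current target.
int format_values(char* out, size_t out_size, int64_t big, uint32_t small) {
  return std::snprintf(
      out, out_size, "big=%" PRId64 " small=%" PRIu32, big, small);
}
```

Calling `format_values(buf, sizeof(buf), INT64_C(-9000000000), UINT32_C(4000000000))` yields `big=-9000000000 small=4000000000` regardless of whether `int64_t` is `long` or `long long` on the target.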

 #### `size_t`, `ssize_t`

 Unlike the fixed-width integer types, format strings already have a portable
 way to handle `size_t` and `ssize_t`:
 - `size_t` -> `%zu`
 - `ssize_t` -> `%zd`
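
A short sketch of both patterns, again with `std::snprintf` standing in for the logging hook. It assumes `ssize_t` comes from the POSIX header `<sys/types.h>`; other targets need their own definition. The helper name `format_sizes` is hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <sys/types.h>  // ssize_t; assumes a POSIX-like host

// Writes "count=<size_t> delta=<ssize_t>" into `out` using the portable
// `z` length modifier, with no PRI macros or casts needed.
int format_sizes(char* out, size_t out_size, size_t count, ssize_t delta) {
  return std::snprintf(out, out_size, "count=%zu delta=%zd", count, delta);
}
```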

 #### Casting

 Sometimes, especially in code that straddles ATen and lean mode, the type of the
 value itself might be different across build modes. In those cases, cast the
 value to the lean mode type, like:
 ```
 ET_CHECK_MSG(
     input.dim() == output.dim(),
     "input.dim() %zd not equal to output.dim() %zd",
     (ssize_t)input.dim(),
     (ssize_t)output.dim());
 ```
 In this case, `Tensor::dim()` returns `ssize_t` in lean mode, while
 `at::Tensor::dim()` returns `int64_t` in ATen mode. Since they both conceptually
 return (signed) counts, `ssize_t` is the most appropriate integer type.
 `int64_t` would work, but it would unnecessarily require 32-bit systems to deal
 with a 64-bit value in lean mode.

 Casting should only be necessary in this situation, where lean and ATen modes
 disagree on the type. Otherwise, use the format pattern that matches the type.
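
The sketch below mimics the two build modes with hypothetical stand-in types, `LeanTensor` and `AtenLikeTensor`, whose `dim()` methods return different integer types. One format string serves both because the value is cast to `ssize_t` first. `std::snprintf` stands in for `ET_CHECK_MSG`, and `ssize_t` is assumed to come from the POSIX header `<sys/types.h>`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <sys/types.h>  // ssize_t; assumes a POSIX-like host

// Hypothetical stand-ins for the tensor type in each build mode.
struct LeanTensor {
  ssize_t dim() const { return 2; }
};
struct AtenLikeTensor {
  int64_t dim() const { return 4; }
};

// Formats the dimension count of either tensor type. The cast makes the
// argument match "%zd" no matter which type dim() returns.
template <typename TensorT>
int describe_dim(char* out, size_t out_size, const TensorT& tensor) {
  return std::snprintf(
      out, out_size, "dim=%zd", static_cast<ssize_t>(tensor.dim()));
}
```

Without the cast, the `AtenLikeTensor` instantiation would pass an `int64_t` to `%zd`, which is a mismatch on any target where `ssize_t` is narrower than 64 bits.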