*This document was originally written for a broad audience, and it was*
*determined that it'd be good to hold in Bionic's docs, too. Due to the*
*ever-changing nature of code, it tries to link to a stable tag of*
*Bionic's libc, rather than the live code in Bionic. Same for Clang.*
*Reader beware. :)*

# The Anatomy of Clang FORTIFY
| |
| ## Objective |
| |
| The intent of this document is to run through the minutiae of how Clang FORTIFY |
| actually works in Bionic at the time of writing. Other FORTIFY implementations |
| that target Clang should use very similar mechanics. This document exists in part |
| because many Clang-specific features serve multiple purposes simultaneously, so |
| getting up-to-speed on how things function can be quite difficult. |
| |
| ## Background |
| |
| FORTIFY is a broad suite of extensions to libc aimed at catching misuses of |
| common library functions. Textually, these extensions exist purely in libc, but |
| all implementations of FORTIFY rely heavily on C language extensions in order |
| to function at all. |
| |
| Broadly, FORTIFY implementations try to guard against many misuses of C |
| standard(-ish) libraries: |
| - Buffer overruns in functions where pointers+sizes are passed (e.g., `memcpy`, |
| `poll`), or where sizes exist implicitly (e.g., `strcpy`). |
| - Arguments with incorrect values passed to libc functions (e.g., |
| out-of-bounds bits in `umask`). |
| - Missing arguments to functions (e.g., `open()` with `O_CREAT`, but no mode |
| bits). |
| |
| FORTIFY is traditionally enabled by passing `-D_FORTIFY_SOURCE=N` to your |
| compiler. `N==0` disables FORTIFY, whereas `N==1`, `N==2`, and `N==3` enable |
| increasingly strict versions of it. In general, FORTIFY doesn't require user |
| code changes; that said, some code patterns |
| are [incompatible with stricter versions of FORTIFY checking]. This is largely |
| because FORTIFY has significant flexibility in what it considers to be an |
| "out-of-bounds" access. |
| |
| FORTIFY implementations use a mix of compiler diagnostics and runtime checks to |
| flag and/or mitigate the impacts of the misuses mentioned above. |
| |
| Further, given FORTIFY's design, the effectiveness of FORTIFY is a function of |
| -- among other things -- the optimization level you're compiling your code at. |
| Many FORTIFY implementations are implicitly disabled when building with `-O0`, |
| since FORTIFY's design for both Clang and GCC relies on optimizations in order |
| to provide useful run-time checks. For the purpose of this document, all |
| analysis of FORTIFY functions and commentary on builtins assume that code is |
| being built with some optimization level > `-O0`. |
| |
| ### A note on GCC |
| |
| This document talks specifically about Bionic's FORTIFY implementation targeted |
| at Clang. While GCC also provides a set of language extensions necessary to |
| implement FORTIFY, these tools are different from what Clang offers. This |
| divergence is an artifact of Clang and GCC's differing architecture as |
| compilers. |
| |
| Textually, quite a bit can be shared between a FORTIFY implementation for GCC |
| and one for Clang (e.g., see [ChromeOS' Glibc patch]), but this kind of sharing |
| requires things like macros that expand to unbalanced braces depending on your |
| compiler: |
| |
| ```c |
| /* |
| * Highly simplified; if you're interested in FORTIFY's actual implementation, |
| * please see the patch linked above. |
| */ |
| #ifdef __clang__ |
| # define FORTIFY_PRECONDITIONS |
| # define FORTIFY_FUNCTION_END |
| #else |
| # define FORTIFY_PRECONDITIONS { |
| # define FORTIFY_FUNCTION_END } |
| #endif |
| |
| /* |
| * FORTIFY_WARNING_ONLY_IF_SIZE_OF_BUF_LESS_THAN is not defined, due to its |
| * complexity and irrelevance. It turns into a compile-time warning if the |
| * compiler can determine `*buf` has fewer than `size` bytes available. |
| */ |
| |
| char *getcwd(char *buf, size_t size) |
| FORTIFY_PRECONDITIONS |
| FORTIFY_WARNING_ONLY_IF_SIZE_OF_BUF_LESS_THAN(buf, size, "`buf` is too smol.") |
| { |
| // Actual shared function implementation goes here. |
| } |
| FORTIFY_FUNCTION_END |
| ``` |
| |
| All talk of GCC-focused implementations and how to merge Clang and GCC |
| implementations is out-of-scope for this doc, however. |
| |
| ## The Life of a Clang FORTIFY Function |
| |
| As referenced in the Background section, FORTIFY performs many different checks |
| for many functions. This section intends to go through real-world examples of |
FORTIFY functions in Bionic, breaking down how each part of these functions
works, and how the pieces fit together to provide FORTIFY-like functionality.
| |
| While FORTIFY implementations may differ between stdlibs, they broadly follow |
| the same patterns when implementing their checks for Clang, and they try to |
| make similar promises with respect to FORTIFY compiling to be zero-overhead in |
| some cases, etc. Moreover, while this document specifically examines Bionic, |
| many stdlibs will operate _very similarly_ to Bionic in their Clang FORTIFY |
| implementations. |
| |
| **In general, when reading the below, be prepared for exceptions, subtlety, and |
| corner cases. The individual function breakdowns below try to not offer |
| redundant information. Each one focuses on different aspects of FORTIFY.** |
| |
| ### Terminology |
| |
| Because FORTIFY should be mostly transparent to developers, there are inherent |
| naming collisions here: `memcpy(x, y, z)` turns into fundamentally different |
| generated code depending on the value of `_FORTIFY_SOURCE`. Further, said |
| `memcpy` call with `_FORTIFY_SOURCE` enabled needs to be able to refer to the |
| `memcpy` that would have been called, had `_FORTIFY_SOURCE` been disabled. |
| Hence, the following convention is followed in the subsections below for all |
| prose (namely, multiline code blocks are exempted from this): |
| |
| - Standard library function names preceded by `__builtin_` refer to the use of |
| the function with `_FORTIFY_SOURCE` disabled. |
| - Standard library function names without a prefix refer to the use of the |
| function with `_FORTIFY_SOURCE` enabled. |
| |
| This convention also applies in `clang`. `__builtin_memcpy` will always call |
| `memcpy` as though `_FORTIFY_SOURCE` were disabled. |
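A minimal sketch of that distinction, using hypothetical helper names
(`copy_via_library_name`, `copy_via_builtin`, `copies_agree`) that do not come
from Bionic. Both helpers copy the same bytes; they differ only in whether a
FORTIFY wrapper gets a chance to sit in between:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Uses whatever `memcpy` the headers expose. With _FORTIFY_SOURCE enabled,
 * this is the spelling a FORTIFY'ed wrapper can intercept. */
static void copy_via_library_name(char *dst, const char *src, size_t n) {
  memcpy(dst, src, n);
}

/* Behaves like `memcpy` with _FORTIFY_SOURCE disabled. */
static void copy_via_builtin(char *dst, const char *src, size_t n) {
  __builtin_memcpy(dst, src, n);
}

/* Returns 1 if both spellings copied identical bytes. */
int copies_agree(void) {
  const char src[] = "fortify";
  char a[sizeof(src)];
  char b[sizeof(src)];
  copy_via_library_name(a, src, sizeof(src));
  copy_via_builtin(b, src, sizeof(src));
  assert(a[sizeof(src) - 1] == '\0');  /* The terminator was copied too. */
  return memcmp(a, b, sizeof(src)) == 0;
}
```

For a correct call like this one, the only observable difference is in the
generated code, not in the result.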
| |
| ## Breakdown of `mempcpy` |
| |
| The [FORTIFY'ed version of `mempcpy`] is a full, featureful example of a |
| FORTIFY'ed function from Bionic. From the user's perspective, it supports a few |
| things: |
| - Producing a compile-time error if the number of bytes to copy trivially |
| exceeds the number of bytes available at the destination pointer. |
| - If the `mempcpy` has the potential to write to more bytes than what is |
| available at the destination, a run-time check is inserted to crash the |
| program if more bytes are written than what is allowed. |
| - Compiling away to be zero overhead when none of the buffer sizes can be |
| determined at compile-time[^1]. |
| |
| The declaration in Bionic's headers for `__builtin_mempcpy` is: |
| ```c |
| void* mempcpy(void* __dst, const void* __src, size_t __n) __INTRODUCED_IN(23); |
| ``` |
| |
| Which is annotated with nothing special, so it will be ignored. |
| |
The [source for `mempcpy`] in Bionic's headers is:
| ```c |
| __BIONIC_FORTIFY_INLINE |
| void* mempcpy(void* const dst __pass_object_size0, const void* src, size_t copy_amount) |
| __overloadable |
| __clang_error_if(__bos_unevaluated_lt(__bos0(dst), copy_amount), |
| "'mempcpy' called with size bigger than buffer") { |
| #if __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED |
| size_t bos_dst = __bos0(dst); |
| if (!__bos_trivially_ge(bos_dst, copy_amount)) { |
| return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst); |
| } |
| #endif |
| return __builtin_mempcpy(dst, src, copy_amount); |
| } |
| ``` |
| |
| Expanding some of the important macros here, this function expands to roughly: |
| ```c |
| static |
| __inline__ |
| __attribute__((no_stack_protector)) |
| __attribute__((always_inline)) |
| void* mempcpy( |
| void* const dst __attribute__((pass_object_size(0))), |
| const void* src, |
| size_t copy_amount) |
| __attribute__((overloadable)) |
        __attribute__((diagnose_if(
          __builtin_object_size(dst, 0) != -1 && __builtin_object_size(dst, 0) < copy_amount,
          "'mempcpy' called with size bigger than buffer",
          "error"))) {
| #if __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED |
| size_t bos_dst = __builtin_object_size(dst, 0); |
| if (!(__bos_trivially_ge(bos_dst, copy_amount))) { |
| return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst); |
| } |
| #endif |
| return __builtin_mempcpy(dst, src, copy_amount); |
| } |
| ``` |
| |
| So let's walk through this step by step, to see how FORTIFY does what it says on |
| the tin here. |
| |
| [^1]: "Zero overhead" in a way [similar to C++11's `std::unique_ptr`]: this will |
| turn into a direct call `__builtin_mempcpy` (or an optimized form thereof) with |
| no other surrounding checks at runtime. However, the additional complexity may |
| hinder optimizations that are performed before the optimizer can prove that the |
| `if (...) { ... }` can be optimized out. Depending on how late this happens, |
| the additional complexity may skew inlining costs, hide opportunities for e.g., |
| `memcpy` coalescing, etc etc. |
| |
| ### How does Clang select `mempcpy`? |
| |
| First, it's critical to notice that `mempcpy` is marked `overloadable`. This |
| function is a `static inline __attribute__((always_inline))` overload of |
| `__builtin_mempcpy`: |
| - `__attribute__((overloadable))` allows us to perform overloading in C. |
| - `__attribute__((overloadable))` mangles all calls to functions marked with |
| `__attribute__((overloadable))`. |
| - `__attribute__((overloadable))` allows exactly one function signature with a |
| given name to not be marked with `__attribute__((overloadable))`. Calls to |
| this overload will not be mangled. |
| |
| Second, one might note that this `mempcpy` implementation has the same C-level |
| signature as `__builtin_mempcpy`. `pass_object_size` is a Clang attribute that |
| is generally needed by FORTIFY, but it carries the side-effect that functions |
| may be overloaded simply on the presence (or lack of presence) of |
| `pass_object_size` attributes. Given two overloads of a function that only |
| differ on the presence of `pass_object_size` attributes, the candidate with |
| `pass_object_size` attributes is preferred. |
| |
| Finally, the prior paragraph gets thrown out if one tries to take the address of |
| `mempcpy`. It is impossible to take the address of a function with one or more |
| parameters that are annotated with `pass_object_size`. Hence, |
| `&__builtin_mempcpy == &mempcpy`. Further, because this is an issue of overload |
| resolution, `(&mempcpy)(x, y, z);` is functionally identical to |
| `__builtin_mempcpy(x, y, z);`. |
| |
| All of this accomplishes the following: |
| - Direct calls to `mempcpy` should call the FORTIFY-protected `mempcpy`. |
| - Indirect calls to `&mempcpy` should call `__builtin_mempcpy`. |
| |
| ### How does Clang offer compile-time diagnostics? |
| |
| Once one is convinced that the FORTIFY-enabled overload of `mempcpy` will be |
| selected for direct calls, Clang's `diagnose_if` and `__builtin_object_size` do |
| all of the work from there. |
| |
| Subtleties here primarily fall out of the discussion in the above section about |
| `&__builtin_mempcpy == &mempcpy`: |
| ```c |
| #define _FORTIFY_SOURCE 2 |
| #include <string.h> |
| void example_code() { |
| char buf[4]; // ...Assume sizeof(char) == 1. |
| const char input_buf[] = "Hello, World"; |
| mempcpy(buf, input_buf, 4); // Valid, no diagnostic issued. |
| |
| mempcpy(buf, input_buf, 5); // Emits a compile-time error since sizeof(buf) < 5. |
| __builtin_mempcpy(buf, input_buf, 5); // No compile-time error. |
| (&mempcpy)(buf, input_buf, 5); // No compile-time error, since __builtin_mempcpy is selected. |
| } |
| ``` |
| |
| Otherwise, the rest of this subsection is dedicated to preliminary discussion |
| about `__builtin_object_size`. |
| |
| Clang's frontend can do one of two things with `__builtin_object_size(p, n)`: |
| - Evaluate it as a constant. |
| - This can either mean declaring that the number of bytes at `p` is definitely |
| impossible to know, so the default value is used, or the number of bytes at |
| `p` can be known without optimizations. |
| - Declare that the expression cannot form a constant, and lower it to |
| `@llvm.objectsize`, which is discussed in depth later. |
| |
| In the examples above, since `diagnose_if` is evaluated with context from the |
| caller, Clang should be able to trivially determine that `buf` refers to a |
| `char` array with 4 elements. |
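As a rough illustration of those two outcomes, the sketch below compares a
pointer whose storage is named directly with one that arrives through a
parameter. The function names are hypothetical, and `noinline` is there only to
keep the optimizer from learning more about `p` than the frontend knows.

```c
#include <assert.h>
#include <stddef.h>

/* The array is named directly, so its 4 bytes are visible without any
 * optimization. */
size_t size_of_named_array(void) {
  char buf[4];
  return __builtin_object_size(buf, 0);
}

/* `p` could point at anything, so the answer is the default "unknown" value. */
__attribute__((noinline))
size_t size_of_unknown_pointer(char *p) {
  return __builtin_object_size(p, 0);
}

void check_frontend_visibility(void) {
  char buf[4];
  assert(size_of_named_array() == 4);
  assert(size_of_unknown_pointer(buf) == (size_t)-1);
}
```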
| |
| The primary consequence of the above is that diagnostics can only be emitted if |
| no optimizations are required to detect a broken code pattern. To be specific, |
Clang's constexpr evaluator must be able to determine the logical object that
| any given pointer points to in order to fold `__builtin_object_size` to a |
| constant, non-default answer: |
| |
| ```c |
| #define _FORTIFY_SOURCE 2 |
| #include <string.h> |
| void example_code() { |
| char buf[4]; // ...Assume sizeof(char) == 1. |
| const char input_buf[] = "Hello, World"; |
| mempcpy(buf, input_buf, 4); // Valid, no diagnostic issued. |
| mempcpy(buf, input_buf, 5); // Emits a compile-time error since sizeof(buf) < 5. |
| char *buf_ptr = buf; |
| mempcpy(buf_ptr, input_buf, 5); // No compile-time error; `buf_ptr`'s target can't be determined. |
| } |
| ``` |
| |
| ### How does Clang insert run-time checks? |
| |
| This section expands on the following statement: FORTIFY has zero runtime cost |
| in instances where there is no chance of catching a bug at run-time. Otherwise, |
| it introduces a tiny additional run-time cost to ensure that functions aren't |
| misused. |
| |
| In prior sections, the following was established: |
| - `overloadable` and `pass_object_size` prompt Clang to always select this |
| overload of `mempcpy` over `__builtin_mempcpy` for direct calls. |
| - If a call to `mempcpy` was trivially broken, Clang would produce a |
| compile-time error, rather than producing a binary. |
| |
| Hence, the case we're interested in here is one where Clang's frontend selected |
| a FORTIFY'ed function's implementation for a function call, but was unable to |
| find anything seriously wrong with said function call. Since the frontend is |
| powerless to detect bugs at this point, our focus shifts to the mechanisms that |
| LLVM uses to support FORTIFY. |
| |
Going back to Bionic's `mempcpy` implementation, we have the following (ignoring
`diagnose_if` and assuming run-time checks are enabled):
| ```c |
| static |
| __inline__ |
| __attribute__((no_stack_protector)) |
| __attribute__((always_inline)) |
| void* mempcpy( |
| void* const dst __attribute__((pass_object_size(0))), |
| const void* src, |
| size_t copy_amount) |
| __attribute__((overloadable)) { |
| size_t bos_dst = __builtin_object_size(dst, 0); |
| if (bos_dst != -1 && |
| !(__builtin_constant_p(copy_amount) && bos_dst >= copy_amount)) { |
| return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst); |
| } |
| return __builtin_mempcpy(dst, src, copy_amount); |
| } |
| ``` |
| |
| In other words, we have a `static`, `always_inline` function which: |
| - If `__builtin_object_size(dst, 0)` cannot be determined (in which case, it |
| returns -1), calls `__builtin_mempcpy`. |
| - Otherwise, if `copy_amount` can be folded to a constant, and if |
| `__builtin_object_size(dst, 0) >= copy_amount`, calls `__builtin_mempcpy`. |
| - Otherwise, calls `__builtin___mempcpy_chk`. |
| |
| |
| How can this be "zero overhead"? Let's focus on the following part of the |
| function: |
| |
| ```c |
| size_t bos_dst = __builtin_object_size(dst, 0); |
| if (bos_dst != -1 && |
| !(__builtin_constant_p(copy_amount) && bos_dst >= copy_amount)) { |
| ``` |
| |
| If Clang's frontend cannot determine a value for `__builtin_object_size`, Clang |
| lowers it to LLVM's `@llvm.objectsize` intrinsic. The `@llvm.objectsize` |
| invocation corresponding to `__builtin_object_size(p, 0)` is guaranteed to |
| always fold to a constant value by the time LLVM emits machine code. |
| |
Hence, `bos_dst` is guaranteed to be a constant; if it's -1, the above branch
can be eliminated entirely, since it folds to `if (false && ...)`. Further, the
RHS of the `&&` in this branch has us call `__builtin_mempcpy` if `copy_amount`
is a known value no greater than `bos_dst` (yet another constant value).
Therefore, the entire condition is always knowable when LLVM is done with LLVM
IR-level optimizations, so no condition is ever emitted to machine code in
practice.
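To make the two surviving paths observable, here is a Clang-only sketch built
around the same branch. `demo_copy`, `demo_copy_chk`, and the call counter are
hypothetical names for illustration; they are not Bionic's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int checked_calls;  /* Counts trips through the checked path. */

/* Hypothetical stand-in for a `_chk` function such as __mempcpy_chk. */
void *demo_copy_chk(void *dst, const void *src, size_t n, size_t dst_size) {
  ++checked_calls;
  if (n > dst_size) abort();
  return __builtin_memcpy(dst, src, n);
}

static inline __attribute__((always_inline))
void *demo_copy(void *const dst __attribute__((pass_object_size(0))),
                const void *src, size_t n) {
  size_t bos_dst = __builtin_object_size(dst, 0);
  if (bos_dst != (size_t)-1 && !(__builtin_constant_p(n) && bos_dst >= n)) {
    return demo_copy_chk(dst, src, n, bos_dst);
  }
  return __builtin_memcpy(dst, src, n);
}

/* Destination size is unknowable here, so there is nothing to check. */
__attribute__((noinline))
void copy_into_unknown(char *dst, const char *src, size_t n) {
  demo_copy(dst, src, n);
}

/* Destination size is known (16 bytes), but `n` is not a constant. */
__attribute__((noinline))
void copy_into_known(const char *src, size_t n) {
  char buf[16];
  demo_copy(buf, src, n);
}

int checked_call_count(void) { return checked_calls; }

void check_run_time_paths(void) {
  char buf[16];
  volatile size_t n = 3;  /* volatile: keep `n` opaque to the optimizer. */
  checked_calls = 0;
  copy_into_unknown(buf, "hi", n);
  assert(checked_calls == 0);
  copy_into_known("hi", n);
  assert(checked_calls == 1);
}
```

Only the call with a known destination size and a non-constant count pays for
a check; the call with an unknown destination goes straight to the builtin.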
| |
| #### Why is "zero overhead" in quotes? Why is `unique_ptr` relevant? |
| |
| `__builtin_object_size` and `__builtin_constant_p` are forced to be constants |
| after most optimizations take place. Until LLVM replaces both of these with |
| constants and optimizes them out, we have additional branches and function calls |
| in our IR. This can have negative effects, such as distorting inlining costs and |
| inhibiting optimizations that are conservative around branches in control-flow. |
| |
| So FORTIFY is free in these cases _in isolation of any of the code around it_. |
| Due to its implementation, it may impact the optimizations that occur on code |
| around the literal call to the FORTIFY-hardened libc function. |
| |
| `unique_ptr` was just the first thing that came to the author's mind for "the |
| type should be zero cost with any level of optimization enabled, but edge-cases |
| might make it only-mostly-free to use." |
| |
| ### How is checking actually performed? |
| |
| In cases where checking can be performed (e.g., where we call |
| `__builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);`), Bionic provides [an |
| implementation for `__mempcpy_chk`]. This is: |
| |
| ```c |
| extern "C" void* __mempcpy_chk(void* dst, const void* src, size_t count, size_t dst_len) { |
| __check_count("mempcpy", "count", count); |
| __check_buffer_access("mempcpy", "write into", count, dst_len); |
| return mempcpy(dst, src, count); |
| } |
| ``` |
| This function itself boils down to a few small branches which abort the program |
| if they fail, and a direct call to `__builtin_mempcpy`. |
| |
| ### Wrapping up |
| |
| In the above breakdown, it was shown how Clang and Bionic work together to: |
| - represent FORTIFY-hardened overloads of functions, |
| - report misuses of stdlib functions at compile-time, and |
| - insert run-time checks for uses of functions that might be incorrect, but only |
| if we have the potential of proving the incorrectness of these. |
| |
| ## Breakdown of open |
| |
| In Bionic, the [FORTIFY'ed implementation of `open`] is quite large. Much like |
| `mempcpy`, the `__builtin_open` declaration is simple: |
| |
| ```c |
| int open(const char* __path, int __flags, ...); |
| ``` |
| |
| With some macros expanded, the FORTIFY-hardened header implementation is: |
| ```c |
| int __open_2(const char*, int); |
int __open_real(const char*, int, ...) __asm__("open");
| |
| #define __open_modes_useful(flags) (((flags) & O_CREAT) || ((flags) & O_TMPFILE) == O_TMPFILE) |
| |
| static |
| int open(const char* pathname, int flags, mode_t modes, ...) __overloadable |
    __attribute__((diagnose_if(1, "too many arguments", "error")));
| |
| static |
| __inline__ |
| __attribute__((no_stack_protector)) |
| __attribute__((always_inline)) |
| int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags) |
| __attribute__((overloadable)) |
| __attribute__((diagnose_if( |
| __open_modes_useful(flags), |
          "'open' called with O_CREAT or O_TMPFILE, but missing mode",
          "error"))) {
| #if __ANDROID_API__ >= 17 && __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED |
| return __open_2(pathname, flags); |
| #else |
| return __open_real(pathname, flags); |
| #endif |
| } |
| static |
| __inline__ |
| __attribute__((no_stack_protector)) |
| __attribute__((always_inline)) |
| int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags, mode_t modes) |
| __attribute__((overloadable)) |
| __clang_warning_if(!__open_modes_useful(flags) && modes, |
| "'open' has superfluous mode bits; missing O_CREAT?") { |
| return __open_real(pathname, flags, modes); |
| } |
| ``` |
| |
| Which may be a lot to take in. |
| |
| Before diving too deeply, please note that the remainder of these subsections |
| assume that the programmer didn't make any egregious typos. Moreover, there's no |
| real way that Bionic tries to prevent calls to `open` like |
| `open("foo", 0, "how do you convert a const char[N] to mode_t?");`. The only |
| real C-compatible solution the author can think of is "stamp out many overloads |
| to catch sort-of-common instances of this very uncommon typo." This isn't great. |
| |
| More directly, no effort is made below to recognize calls that, due to |
| incompatible argument types, cannot go to any `open` implementation other than |
| `__builtin_open`, since it's recognized right here. :) |
| |
| ### Implementation breakdown |
| |
| This `open` implementation does a few things: |
| - Turns calls to `open` with too many arguments into a compile-time error. |
| - Diagnoses calls to `open` with missing modes at compile-time and run-time |
| (both cases turn into errors). |
| - Emits warnings on calls to `open` with useless mode bits, unless the mode bits |
| are all 0. |
| |
| One common bit of code not explained below is the `__open_real` declaration above: |
| ```c |
int __open_real(const char*, int, ...) __asm__("open");
| ``` |
| |
This exists as a way for us to call `__builtin_open` without needing Clang to
have a pre-defined `__builtin_open` function.
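The same renaming trick can be tried on a simpler function. This sketch assumes
an ELF target such as Linux or Android, where the C name `strlen` and the
linker symbol `strlen` are spelled identically; `real_strlen` is a hypothetical
name chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* `real_strlen` is only a C-level name. The asm label makes the compiler emit
 * references to the symbol `strlen` instead. */
size_t real_strlen(const char *s) __asm__("strlen");

size_t length_via_renamed_symbol(const char *s) {
  return real_strlen(s);
}

void check_renamed_symbol(void) {
  assert(length_via_renamed_symbol("open") == 4);
}
```

Since `real_strlen` is not a name the compiler or any header knows about,
nothing can overload or wrap it.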
| |
| #### Compile-time error on too many arguments |
| |
| ```c |
| static |
| int open(const char* pathname, int flags, mode_t modes, ...) __overloadable |
    __attribute__((diagnose_if(1, "too many arguments", "error")));
| ``` |
| |
This overload matches most calls to `open` that supply too many arguments, since
`int(const char *, int, ...)` matches less strongly than
`int(const char *, int, mode_t, ...)` for calls where the 3rd arg can be
converted to `mode_t` without too much effort. Because of the `diagnose_if`
attribute, all of these calls turn into compile-time errors.
| |
| #### Compile-time or run-time error on missing arguments |
| The following overload handles all two-argument calls to `open`. |
| ```c |
| static |
| __inline__ |
| __attribute__((no_stack_protector)) |
| __attribute__((always_inline)) |
| int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags) |
| __attribute__((overloadable)) |
| __attribute__((diagnose_if( |
| __open_modes_useful(flags), |
          "'open' called with O_CREAT or O_TMPFILE, but missing mode",
          "error"))) {
| #if __ANDROID_API__ >= 17 && __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED |
| return __open_2(pathname, flags); |
| #else |
| return __open_real(pathname, flags); |
| #endif |
| } |
| ``` |
| |
| Like `mempcpy`, `diagnose_if` handles emitting a compile-time error if the call |
| to `open` is broken in a way that's visible to Clang's frontend. This |
| essentially boils down to "`open` is being called with a `flags` value that |
| requires mode bits to be set." |
| |
| If that fails to catch a bug, we [unconditionally call `__open_2`], which |
| performs a run-time check: |
| ```c |
| int __open_2(const char* pathname, int flags) { |
| if (needs_mode(flags)) __fortify_fatal("open: called with O_CREAT/O_TMPFILE but no mode"); |
| return FDTRACK_CREATE_NAME("open", __openat(AT_FDCWD, pathname, force_O_LARGEFILE(flags), 0)); |
| } |
| ``` |
| |
| #### Compile-time warning if modes are pointless |
| |
Finally, we have the following `open` overload:
| ```c |
| static |
| __inline__ |
| __attribute__((no_stack_protector)) |
| __attribute__((always_inline)) |
| int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags, mode_t modes) |
| __attribute__((overloadable)) |
| __clang_warning_if(!__open_modes_useful(flags) && modes, |
| "'open' has superfluous mode bits; missing O_CREAT?") { |
| return __open_real(pathname, flags, modes); |
| } |
| ``` |
| |
This simply issues a warning if Clang's frontend can determine that `modes`
isn't necessary, i.e., that `flags` contains neither `O_CREAT` nor `O_TMPFILE`.
Due to conventions in existing code, a `modes` value of `0` is not diagnosed.
| |
| #### What about `&open`? |
| One yet-unaddressed aspect of the above is how `&open` works. This is thankfully |
| a short answer: |
| - It happens that `open` takes a parameter of type `const char*`. |
| - It happens that `pass_object_size` -- an attribute only applicable to |
| parameters of type `T*` -- makes it impossible to take the address of a |
| function. |
| |
Since Clang doesn't support a "this function should never have its address
taken" attribute, Bionic uses the next best thing: `pass_object_size`. :)
| |
| ## Breakdown of poll |
| |
| (Preemptively: at the time of writing, Clang has no literal `__builtin_poll` |
| builtin. `__builtin_poll` is referenced below to remain consistent with the |
| convention established in the Terminology section.) |
| |
| Bionic's `poll` implementation is closest to `mempcpy` above, though it has a |
| few interesting aspects worth examining. |
| |
| The [full header implementation of `poll`] is, with some macros expanded: |
| ```c |
#define __bos_fd_count_trivially_safe(bos_val, fds, fd_count)              \
  (((bos_val) == -1) ||                                                    \
   (__builtin_constant_p(fd_count) &&                                      \
    (bos_val) >= sizeof(*(fds)) * (fd_count)))
| |
| static |
| __inline__ |
| __attribute__((no_stack_protector)) |
| __attribute__((always_inline)) |
| int poll(struct pollfd* const fds __attribute__((pass_object_size(1))), nfds_t fd_count, int timeout) |
| __attribute__((overloadable)) |
    __attribute__((diagnose_if(
       __builtin_object_size(fds, 1) != -1 && __builtin_object_size(fds, 1) < sizeof(*fds) * fd_count,
       "in call to 'poll', fd_count is larger than the given buffer",
       "error"))) {
| size_t bos_fds = __builtin_object_size(fds, 1); |
| if (!__bos_fd_count_trivially_safe(bos_fds, fds, fd_count)) { |
| return __poll_chk(fds, fd_count, timeout, bos_fds); |
| } |
| return (&poll)(fds, fd_count, timeout); |
| } |
| ``` |
| |
| To get the commonality with `mempcpy` and `open` out of the way: |
- This function is an overload of `__builtin_poll`.
| - The signature is the same, modulo the presence of a `pass_object_size` |
| attribute. Hence, for direct calls, overload resolution will always prefer it |
| over `__builtin_poll`. Taking the address of `poll` is forbidden, so all |
| references to `&poll` actually reference `__builtin_poll`. |
| - When `fds` is too small to hold `fd_count` `pollfd`s, Clang will emit a |
| compile-time error if possible using `diagnose_if`. |
| - If this can't be observed until run-time, `__poll_chk` verifies this. |
- When `fd_count` is a constant according to `__builtin_constant_p`, this always
  compiles into `__poll_chk` for always-broken calls to `poll`, or
  `__builtin_poll` for always-safe calls to `poll`.
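The size comparison at the heart of these checks can be sketched as a plain
function. `fds_fit` is a hypothetical helper, and it divides rather than
multiplies so that a huge `fd_count` cannot overflow; the header code shown
above uses a multiplication instead.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Returns 1 if a buffer with `bos` visible bytes can hold `fd_count` pollfds.
 * (size_t)-1 means "size unknown", which has to be treated as safe. */
int fds_fit(size_t bos, nfds_t fd_count) {
  if (bos == (size_t)-1) return 1;
  return fd_count <= bos / sizeof(struct pollfd);
}

void check_fds_fit(void) {
  assert(fds_fit(2 * sizeof(struct pollfd), 2) == 1);
  assert(fds_fit(2 * sizeof(struct pollfd), 3) == 0);
  assert(fds_fit((size_t)-1, 1000) == 1);
}
```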
| |
| The critical bits to highlight here are on this line: |
| ```c |
| int poll(struct pollfd* const fds __attribute__((pass_object_size(1))), nfds_t fd_count, int timeout) |
| ``` |
| |
| And this line: |
| ```c |
| return (&poll)(fds, fd_count, timeout); |
| ``` |
| |
| Starting with the simplest, we call `__builtin_poll` with `(&poll)(...);`. As |
| referenced above, taking the address of an overloaded function where all but one |
| overload has a `pass_object_size` attribute on one or more parameters always |
| resolves to the function without any `pass_object_size` attributes. |
| |
The other line deserves a section. The subtlety of it is almost entirely in the
use of `pass_object_size(1)` instead of `pass_object_size(0)` on the `fds`
parameter, and the corresponding use of `__builtin_object_size(fds, 1);` in the
body of `poll`.
| |
| ### Subtleties of __builtin_object_size(p, N) |
| |
| Earlier in this document, it was said that a full description of each |
| attribute/builtin necessary to power FORTIFY was out of scope. This is... only |
| somewhat the case when we talk about `__builtin_object_size` and |
| `pass_object_size`, especially when their second argument is `1`. |
| |
| #### tl;dr |
| `__builtin_object_size(p, N)` and `pass_object_size(N)`, where `(N & 1) == 1`, |
| can only be accurately determined by Clang. LLVM's `@llvm.objectsize` intrinsic |
| ignores the value of `N & 1`, since handling `(N & 1) == 1` accurately requires |
| data that's currently entirely inaccessible to LLVM, and that is difficult to |
| preserve through LLVM's optimization passes. |
| |
| `pass_object_size`'s "lifting" of the evaluation of |
| `__builtin_object_size(p, N)` to the caller is critical, since it allows Clang |
| full visibility into the expression passed to e.g., `poll(&foo->bar, baz, qux)`. |
| It's not a perfect solution, but it allows `N == 1` to be fully accurate in at |
| least some cases. |
| |
| #### Background |
| Clang's implementation of `__builtin_object_size` aims to be compatible with |
| GCC's, which has [a decent bit of documentation]. Put simply, |
| `__builtin_object_size(p, N)` is intended to evaluate at compile-time how many |
| bytes can be accessed after `p` in a well-defined way. Straightforward examples |
| of this are: |
| ```c |
| char buf[8]; |
| assert(__builtin_object_size(buf, N) == 8); |
| assert(__builtin_object_size(buf + 1, N) == 7); |
| ``` |
| |
| This should hold for all values of N that are valid to pass to |
| `__builtin_object_size`. The `N` value of `__builtin_object_size` is a mask of |
| settings. |
| |
| ##### (N & 2) == ? |
| |
This is mostly for completeness' sake; in Bionic's FORTIFY implementation, N is
always either 0 or 1.
| |
If there are multiple possible values of `p` in a call to
`__builtin_object_size(p, N)`, the second bit in `N` determines the behavior of
the compiler. If `(N & 2) == 0`, `__builtin_object_size` should return the
greatest size among the possible values of `p`. Otherwise, it should return the
least such size. For example:
| |
| ```c |
| char smol_buf[7]; |
| char buf[8]; |
| char *p = rand() ? smol_buf : buf; |
| assert(__builtin_object_size(p, 0) == 8); |
| assert(__builtin_object_size(p, 2) == 7); |
| ``` |
| |
| ##### (N & 1) == 0 |
| |
`__builtin_object_size(p, 0)` is more or less as simple as the example in the
Background section directly above. When Clang attempts to evaluate
`__builtin_object_size(p, 0);` and when LLVM tries to determine the result of a
corresponding `@llvm.objectsize` call, they search for the storage underlying
the pointer in question. If that can be determined, Clang or LLVM can provide an
answer; otherwise, they cannot.
| |
| ##### (N & 1) == 1, and the true magic of pass_object_size |
| |
| `__builtin_object_size(p, 1)` has a less uniform implementation between LLVM and |
| Clang. According to GCC's documentation, "If the least significant bit [of |
| __builtin_object_size's second argument] is clear, objects are whole variables, |
| if it is set, a closest surrounding subobject is considered the object a pointer |
| points to." |
| |
The "closest surrounding subobject" wording means that `(N & 1) == 1` depends on
type information in order to operate in many cases. Consider the following
examples:
| ```c |
| struct Foo { |
| int a; |
| int b; |
| }; |
| |
| struct Foo foo; |
| assert(__builtin_object_size(&foo, 0) == sizeof(foo)); |
| assert(__builtin_object_size(&foo, 1) == sizeof(foo)); |
assert(__builtin_object_size(&foo.a, 0) == sizeof(foo));
assert(__builtin_object_size(&foo.a, 1) == sizeof(int));
| |
| struct Foo foos[2]; |
| assert(__builtin_object_size(&foos[0], 0) == 2 * sizeof(foo)); |
| assert(__builtin_object_size(&foos[0], 1) == sizeof(foo)); |
| assert(__builtin_object_size(&foos[0].a, 0) == 2 * sizeof(foo)); |
| assert(__builtin_object_size(&foos[0].a, 1) == sizeof(int)); |
| ``` |
| |
| ...And perhaps somewhat surprisingly: |
| ```c |
| void example(struct Foo *foo) { |
| // (As a reminder, `-1` is "I don't know" when `(N & 2) == 0`.) |
| assert(__builtin_object_size(foo, 0) == -1); |
| assert(__builtin_object_size(foo, 1) == -1); |
| assert(__builtin_object_size(&foo->a, 0) == -1); |
| assert(__builtin_object_size(&foo->a, 1) == sizeof(int)); |
| } |
| ``` |
| |
| In Clang, [this type-aware requirement poses problems for us]: Clang's frontend |
| knows everything we could possibly want about the types of variables, but |
| optimizations are only performed by LLVM. LLVM has no reliable source for C or |
| C++ data types, so calls to `__builtin_object_size(p, N)` that cannot be |
| resolved by Clang are lowered to the equivalent of |
| `__builtin_object_size(p, N & ~1)` in LLVM IR. |
| |
| Moreover, Clang's frontend is the best-equipped part of the compiler to |
| accurately determine the answer for `__builtin_object_size(p, N)`, given we know |
| what `p` is. LLVM is the best-equipped part of the compiler to determine the |
| value of `p`. This ordering issue is unfortunate. |
| |
| This is where `pass_object_size(N)` comes in. To summarize [the docs for |
| `pass_object_size`], it evaluates `__builtin_object_size(p, N)` within the |
| context of the caller of the function annotated with `pass_object_size`, and |
| passes the value of that into the callee as an invisible parameter. All calls to |
| `__builtin_object_size(parameter, N)` are substituted with references to this |
| invisible parameter. |
| |
| Putting this plainly, Clang's frontend struggles to evaluate the following: |
| ```c |
| int foo(void *p) { |
| return __builtin_object_size(p, 1); |
| } |
| |
| void bar() { |
| struct { int i, j; } k; |
| // The frontend can't figure this interprocedural objectsize out, so it gets lowered to |
| // LLVM, which determines that the answer here is sizeof(k). |
| int baz = foo(&k.i); |
| } |
| ``` |
| |
| However, with the magic of `pass_object_size`, we get one level of inlining to |
| look through: |
| ```c |
| int foo(void *const __attribute__((pass_object_size(1))) p) { |
| return __builtin_object_size(p, 1); |
| } |
| |
| void bar() { |
| struct { int i, j; } k; |
| // Due to pass_object_size, this is equivalent to: |
| // int baz = foo(&k.i, __builtin_object_size(&k.i, 1)); |
| // ...and `int foo(void *)` is actually equivalent to: |
| // int foo(void *const, size_t size) { |
| // return size; |
| // } |
| int baz = foo(&k.i); |
| } |
| ``` |
| |
| So we can obtain an accurate result in this case. |
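
For readers who want something to compile, here is a self-contained variant of the example above. It is a sketch under a few assumptions: `struct Pair`, `field_size`, and the `HAVE_PASS_OBJECT_SIZE` macro are hypothetical names, `noinline` is added only to show that the result does not depend on inlining, and the macro fallback for compilers without `pass_object_size` simply evaluates `__builtin_object_size` at the call site by hand, which is the work the attribute automates.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__has_attribute)
#if __has_attribute(pass_object_size)
#define HAVE_PASS_OBJECT_SIZE 1
#endif
#endif

struct Pair {
  int i, j;
};

#ifdef HAVE_PASS_OBJECT_SIZE
// Clang: each caller evaluates __builtin_object_size(p, 1) and hands the
// result over as a hidden argument; the body just reads that argument.
__attribute__((noinline)) size_t field_size(
    void *const p __attribute__((pass_object_size(1)))) {
  return __builtin_object_size(p, 1);
}
#else
// Assumed fallback: evaluate at the call site explicitly.
#define field_size(p) __builtin_object_size((p), 1)
#endif

size_t first_field_size(void) {
  struct Pair k;
  return field_size(&k.i);  // closest surrounding subobject is `k.i`
}

size_t whole_struct_size(void) {
  struct Pair k;
  return field_size(&k);  // closest surrounding subobject is `k` itself
}

void check_field_size(void) {
  assert(first_field_size() == sizeof(int));
  assert(whole_struct_size() == sizeof(struct Pair));
}
```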
| |
| ##### What about pass_object_size(0)? |
| It's sort of tangential, but if you find yourself wondering about the utility of |
| `pass_object_size(0)` ... it's somewhat split. `pass_object_size(0)` in Bionic's |
| FORTIFY exists mostly for visual consistency, simplicity, and as a useful way to |
| have e.g., `&mempcpy` == `&__builtin_mempcpy`. |
| |
| Outside of these fringe benefits, all of the functions with |
| `pass_object_size(0)` on parameters are marked with `always_inline`, so |
| "lifting" the `__builtin_object_size` call isn't ultimately very helpful. In |
| theory, users can always have something like: |
| |
| ```c |
| // In some_header.h |
| // This function does cool and interesting things with the `__builtin_object_size` of its parameter, |
| // and is able to work with that as though the function were defined inline. |
| void out_of_line_function(void *__attribute__((pass_object_size(0)))); |
| ``` |
| |
| That said, the author isn't aware of uses like this in practice, beyond a few |
| folks on LLVM's mailing list who seemed interested in trying it someday. |
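
If someone did want to try it, an out-of-line function along those lines might look roughly like the sketch below. Everything named here (`fill_within`, `checked_fill`, `HAVE_PASS_OBJECT_SIZE`) is hypothetical and not part of Bionic; the fallback macro for compilers that lack `pass_object_size` is likewise an assumption made so the example builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__has_attribute)
#if __has_attribute(pass_object_size)
#define HAVE_PASS_OBJECT_SIZE 1
#endif
#endif

// Shared logic: refuse writes that exceed a known capacity. A capacity of
// (size_t)-1 means "unknown", so nothing can be checked in that case.
static int fill_within(void *buf, size_t cap, int byte, size_t n) {
  if (cap != (size_t)-1 && n > cap) return -1;  // would overflow
  memset(buf, byte, n);
  return 0;
}

#ifdef HAVE_PASS_OBJECT_SIZE
// Never inlined, yet it still learns the caller's view of the buffer size.
__attribute__((noinline)) int checked_fill(
    void *const buf __attribute__((pass_object_size(0))), int byte, size_t n) {
  return fill_within(buf, __builtin_object_size(buf, 0), byte, n);
}
#else
#define checked_fill(buf, byte, n) \
  fill_within((buf), __builtin_object_size((buf), 0), (byte), (n))
#endif

int fill_exact(void) {
  char small[4];
  return checked_fill(small, 0, sizeof(small));
}

int fill_too_much(void) {
  char small[4];
  return checked_fill(small, 0, sizeof(small) + 1);
}

void check_checked_fill(void) {
  assert(fill_exact() == 0);       // fits: buffer is zeroed
  assert(fill_too_much() == -1);   // rejected before any write happens
}
```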
| |
| #### Wrapping up |
| In the (long) section above, two things were covered: |
| - The use of `(&poll)(...);` is a convenient shorthand for calling |
| `__builtin_poll`. |
| - `__builtin_object_size(p, N)` with `(N & 1) == 1` is not easy for Clang to |
| answer accurately, since it relies on type info only available in the |
| frontend, and it sometimes relies on optimizations only available in the |
| middle-end. `pass_object_size` helps mitigate this. |
| |
| ## Miscellaneous Notes |
| The above should be a roughly comprehensive view of how FORTIFY works in the |
| real world. The main thing it fails to mention is the use of |
| [the `diagnose_as_builtin` attribute] in Clang. |
| |
| As time has moved on, Clang has increasingly gained support for emitting |
| warnings that were previously emitted by FORTIFY machinery. |
| `diagnose_as_builtin` allows us to remove the `diagnose_if`s from some of the |
| `static inline` overloads of stdlib functions above, so Clang may diagnose them |
| instead. |
| |
| Clang's built-in diagnostics are often better than `diagnose_if` diagnostics, |
| since Clang can format its diagnostics to include e.g., information about the |
| sizes of buffers in a suspect call to a function. `diagnose_if` can only have |
| the compiler output constant strings. |
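
As a hedged sketch of what this looks like on a declaration: the hypothetical `my_memset` wrapper below asks Clang to apply its `__builtin_memset` diagnostics to calls of the wrapper, with `1, 2, 3` mapping the wrapper's parameters onto the builtin's in order. `DIAGNOSE_AS_MEMSET` is a made-up macro that expands to nothing on compilers without the attribute, and none of this is taken from Bionic.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__has_attribute)
#if __has_attribute(diagnose_as_builtin)
#define DIAGNOSE_AS_MEMSET \
  __attribute__((diagnose_as_builtin(__builtin_memset, 1, 2, 3)))
#endif
#endif
#ifndef DIAGNOSE_AS_MEMSET
#define DIAGNOSE_AS_MEMSET  // attribute unavailable: no extra diagnostics
#endif

// Hypothetical wrapper; no diagnose_if is needed for the overflow case.
DIAGNOSE_AS_MEMSET
static inline void *my_memset(void *s, int c, size_t n) {
  return __builtin_memset(s, c, n);
}

void check_my_memset(void) {
  char buf[4];
  my_memset(buf, 'x', sizeof(buf));
  assert(buf[0] == 'x' && buf[3] == 'x');
  // my_memset(buf, 'x', 8);  // Clang would be expected to warn about this
}
```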
| |
| [ChromeOS' Glibc patch]: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/90fa9b27731db10a6010c7f7c25b24028145b091/sys-libs/glibc/files/local/glibc-2.33/0007-glibc-add-clang-style-FORTIFY.patch |
| [FORTIFY'ed implementation of `open`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/include/bits/fortify/fcntl.h#41 |
| [FORTIFY'ed version of `mempcpy`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/include/bits/fortify/string.h#45 |
| [a decent bit of documentation]: https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html |
| [an implementation for `__mempcpy_chk`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/bionic/fortify.cpp#501 |
| [full header implementation of `poll`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/include/bits/fortify/poll.h#43 |
| [incompatible with stricter versions of FORTIFY checking]: https://godbolt.org/z/fGfEYxfnf |
| [similar to C++11's `std::unique_ptr`]: https://stackoverflow.com/questions/58339165/why-can-a-t-be-passed-in-register-but-a-unique-ptrt-cannot |
| [source for `mempcpy`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/include/string.h#55 |
| [the `diagnose_as_builtin` attribute]: https://releases.llvm.org/14.0.0/tools/clang/docs/AttributeReference.html#diagnose-as-builtin |
| [the docs for `pass_object_size`]: https://releases.llvm.org/14.0.0/tools/clang/docs/AttributeReference.html#pass-object-size-pass-dynamic-object-size |
| [this type-aware requirement poses problems for us]: https://github.com/llvm/llvm-project/issues/55742 |
| [unconditionally call `__open_2`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/bionic/open.cpp#70 |