# Clang Migration Notes
NDK r17 was the last version to include GCC. If you're upgrading from an old NDK
and need to migrate to Clang, this doc can help.
If you maintain a custom build system, see the [Build System Maintainers]
documentation.
[Build System Maintainers]: ./BuildSystemMaintainers.md
## `-Oz` versus `-Os`
[Clang Optimization Flags](https://clang.llvm.org/docs/CommandGuide/clang.html#code-generation-options)
has the full details, but if you used `-Os` to optimize your
code for size with GCC, you probably want `-Oz` when using
Clang. Although `-Os` attempts to make code small, it still
enables some optimizations that will increase code size (based on
https://stackoverflow.com/a/15548189/632035). For the smallest possible
code with Clang, prefer `-Oz`. With `-Oz`, Chromium actually saw both
size *and* performance improvements when moving to Clang compared to
`-Os` with GCC.
## `__attribute__((__aligned__))`
Normally the `__aligned__` attribute is given an explicit alignment, but
with no value it means “maximum alignment”. The interpretation of
“maximum” differs between GCC and Clang: Clang also takes vector types
into account, so for ARM GCC thinks the maximum alignment is 8 (for
`uint64_t`), but Clang thinks it’s 16 (because there are NEON instructions
that require 16-byte alignment). Normally this shouldn’t matter because
malloc is always at least 16-byte aligned, and mmap regions are page
(4096-byte) aligned. Most code should either specify an explicit alignment
or use [alignas](http://en.cppreference.com/w/cpp/language/alignas) instead.
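As a minimal sketch of the difference, assuming a GCC- or Clang-compatible compiler, the following compares the bare attribute with the two explicit spellings. The struct names are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// Bare attribute: the alignment is whatever the compiler considers the
// "maximum" for the target, so it can differ between GCC and Clang.
struct ImplicitMax {
  uint8_t bytes[32];
} __attribute__((__aligned__));

// Explicit value: the same alignment with either compiler.
struct AttrAligned16 {
  uint8_t bytes[32];
} __attribute__((__aligned__(16)));

// Standard C++11 spelling of the same request.
struct alignas(16) StdAligned16 {
  uint8_t bytes[32];
};

static_assert(alignof(AttrAligned16) == 16, "explicit attribute value is honored");
static_assert(alignof(StdAligned16) == 16, "alignas is honored");
// Deliberately no assertion on alignof(ImplicitMax): it is compiler-dependent.
```

Printing `alignof(ImplicitMax)` from builds with both compilers is a quick way to check whether a given type is affected.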
## `-Bsymbolic`
When targeting Android (but no other platform), GCC passed
[-Bsymbolic](ftp://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_3.html)
to the linker by default. This is not a good default, so Clang does not
do that. `-Bsymbolic` causes the following behavior change:
```c++
// foo.cpp
#include <iostream>

void foo() {
  std::cout << "Goodbye, world" << std::endl;
}

void bar() {
  foo();
}
```
```c++
// main.cpp
#include <iostream>

extern void bar();

void foo() {
  std::cout << "Hello, world\n";
}

int main(int, char**) {
  foo();  // Prints "Hello, world".
  bar();  // Without -Bsymbolic, prints "Hello, world". With -Bsymbolic, prints "Goodbye, world".
}
```
In addition to differing from the default behavior expected on all other
platforms, `-Bsymbolic` prevents symbol interposition (used by tools such
as ASan).
You might, however, wish to add `-Bsymbolic` back manually, because it can
result in smaller ELF files since fewer relocations are needed. If you
do want the non-`-Bsymbolic` behavior but would like fewer relocations,
you can use `-fvisibility=hidden` and manually export the symbols you
want to be public, using the `JNIEXPORT` macro in JNI code or
`__attribute__ ((visibility("default")))` otherwise. Linker
version scripts are an even more powerful mechanism for controlling
exported symbols, but harder to use.
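The `-fvisibility=hidden` approach might look something like the sketch below. The macro `MYLIB_EXPORT`, the function names, and the build command in the comment are hypothetical, chosen only for illustration.

```cpp
// Assumed build command (illustrative only):
//   clang++ -shared -fPIC -fvisibility=hidden mylib.cpp -o libmylib.so

// Hypothetical export macro for non-JNI code.
#define MYLIB_EXPORT __attribute__((visibility("default")))

// Not annotated: with -fvisibility=hidden this symbol is not exported from
// the shared library, so it cannot be interposed from outside.
int internal_helper(int x) { return x * 2; }

// Explicitly exported: remains visible to users of the library.
extern "C" MYLIB_EXPORT int mylib_double(int x) { return internal_helper(x); }
```

Only `mylib_double` stays in the library's dynamic symbol table, so it remains interposable while the internal call to `internal_helper` is bound within the library.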
## Assembler issues
For many years the problem of adjusting inline assembler to work with
LLVM could be punted down the road by using `-fno-integrated-as` to fall
back to the GNU Assembler (GAS). With the removal of GNU binutils from
the NDK, such issues will now need to be addressed. We’ve collected
some of the most common issues and their solutions/workarounds here.
### `.arch` or `.arch_extension` scope with `__asm__`
GAS doesn’t scope `.arch` or `.arch_extension`, so you can have a global
`__asm__(".arch foo")` that applies to the whole C/C++ source file,
just like a bare `.arch` or `.arch_extension` directive would in a .S
file. LLVM scopes these to the specific `__asm__` in which they occur,
so you’ll need to adapt your inline assembler, or build the whole file
for the relevant arch variant.
### ARM `ADRL`
GAS lets you use the `ADRL` pseudoinstruction to get the address of
something too far away for a regular `ADR` to reference. `ADRL` expands
to two instructions, and LLVM doesn’t support it, so you’ll need to use
a macro something like this instead:
```
.macro ADRL reg:req, label:req
  add \reg, pc, #((\label - .L_adrl_\@) & 0xff00)
  add \reg, \reg, #((\label - .L_adrl_\@) - ((\label - .L_adrl_\@) & 0xff00))
.L_adrl_\@:
.endm
```
### ARM assembler syntactical strictness
While GAS supports the older divided and newer unified syntax (selectable
via `.syntax unified` and `.syntax divided`), LLVM only supports the
newer unified syntax.
As an example of where this matters, `LDR` takes an optional type suffix in
addition to the optional condition code allowed on all instructions. GAS allows these
to come in either order when using divided syntax, but LLVM only allows
them in the canonical order given in the ARM instruction reference (which
is what “unified” syntax means). So continuing this example, GAS
accepts both `LDRBEQ` and `LDREQB`, but LLVM only accepts `LDRBEQ` (with
the condition code at the end, as the instruction appears in the manual).
Most humans usually use this order anyway, but you’ll have to rearrange
any instructions that don’t use the canonical order.
### ARM assembler implicit operands
Some ARM instructions have restrictions that make some operands
implicit. For example, the two target registers supplied to `LDREXD`
must be consecutive. GAS would allow you to write `LDREXD R1, [R4]`
because the other register _must_ be `R2`, but LLVM requires both
registers to be explicitly stated, in this case `LDREXD R1, R2, [R4]`.
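In inline assembly the same rule applies. The sketch below is one way to spell out both registers without hard-coding them, assuming the GCC/Clang `%H` operand modifier for the second register of a 64-bit pair on 32-bit ARM. The helper name `load_exclusive_u64` is hypothetical, and the non-ARM branch exists only so the sketch builds on other architectures.

```cpp
#include <cstdint>

// Hypothetical helper: reads a 64-bit value with LDREXD on 32-bit ARM.
inline uint64_t load_exclusive_u64(const volatile uint64_t* p) {
#if defined(__arm__)
  uint64_t value;
  // %0 names the first register of the pair and %H0 the second, so both
  // target registers are stated explicitly, as LLVM requires.
  __asm__ volatile("ldrexd %0, %H0, [%1]"
                   : "=&r"(value)
                   : "r"(p)
                   : "memory");
  return value;
#else
  // Plain load so the sketch still builds on other architectures.
  return *p;
#endif
}
```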
### ARM `.arm` or `.code 32` alignment
Switching from Thumb to ARM mode implicitly forces 4-byte alignment
with GAS but doesn’t with LLVM. You may need to use an explicit
`.align`/`.balign`/`.p2align` directive in such cases.
### No `--defsym` command-line option
GAS and LLVM implement their own conditional assembly mechanism with
`.if`...`.endif` rather than the C preprocessor’s `#if`...`#endif`. The
equivalent of `-DA=B` for `.if` is `-Wa,-defsym,A=B`, but GAS allowed
`--defsym` instead of `-defsym`. LLVM requires `-defsym`.
You might also prefer to just use the C preprocessor. If your assembly
is in a `.S` file it is already being preprocessed. If your assembly
is in a file with any other extension (including `.s`, which is the
difference between `.s` and `.S`), you’ll need to either rename it to
`.S` or use the `-x assembler-with-cpp` flag to the compiler to override
the file extension-based guess.
### No `.func`/`.endfunc`
GAS ignores a request for obsolete STABS debugging information to be
emitted using `.func` and `.endfunc`. Neither GAS nor LLVM actually
supports STABS, but LLVM rejects these meaningless directives. The fix
is simply to remove them.