| <html devsite> |
| <head> |
| <title>Diagnosing Native Crashes</title> |
| <meta name="project_path" value="/_project.yaml" /> |
| <meta name="book_path" value="/_book.yaml" /> |
| </head> |
| <body> |
| <!-- |
| Copyright 2017 The Android Open Source Project |
| |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <p> |
| The following sections include common types of native crash, an analysis of a |
| sample crash dump, and a discussion of tombstones. Each crash type includes |
| example <code>debuggerd</code> output with key evidence highlighted to help |
| you distinguish the specific kind of crash. |
| </p> |
| |
| <aside class=tip> |
| <strong>Tip:</strong> If you've never seen a native crash before, start with |
| <a href="/devices/tech/debug/index.html">Debugging Native Android Platform |
| Code</a>. |
| </aside> |
| |
| <h2 id=abort>Abort</h2> |
| |
| <p> |
| Aborts are interesting because they are deliberate. There are many different |
| ways to abort (including calling |
| <code><a href="http://man7.org/linux/man-pages/man3/abort.3.html" class="external">abort(3)</a></code>, |
| failing an |
| <code><a href="http://man7.org/linux/man-pages/man3/assert.3.html" class="external">assert(3)</a></code>, |
| or using one of the Android-specific fatal logging types), but all involve |
| calling <code>abort</code>. A call to <code>abort</code> signals the calling |
| thread with SIGABRT, so to recognize this case, look for SIGABRT plus a frame |
| showing "abort" in <code>libc.so</code> in the <code>debuggerd</code> output. |
| </p> |
| |
| <p> |
| There may be an explicit "abort message" line. You should also look in the |
| <code>logcat</code> output to see what this thread logged before deliberately |
| killing itself, because unlike <code>assert(3)</code> or high-level fatal |
| logging facilities, <code>abort(3)</code> doesn't accept a message. |
| </p> |
| |
| <p> |
| Current versions of Android inline the |
| <code><a href="http://man7.org/linux/man-pages/man2/tgkill.2.html" class="external">tgkill(2)</a></code> |
| system call, so their stacks are the easiest to read, with the call to |
| <code>abort(3)</code> at the very top: |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| pid: 4637, tid: 4637, name: crasher >>> crasher <<< |
| signal 6 (<em style="color:Orange">SIGABRT</em>), code -6 (SI_TKILL), fault addr -------- |
| <em style="color:Orange">Abort message</em>: 'some_file.c:123: some_function: assertion "false" failed' |
| r0 00000000 r1 0000121d r2 00000006 r3 00000008 |
| r4 0000121d r5 0000121d r6 ffb44a1c r7 0000010c |
| r8 00000000 r9 00000000 r10 00000000 r11 00000000 |
| ip ffb44c20 sp ffb44a08 lr eace2b0b pc eace2b16 |
| backtrace: |
| #00 pc 0001cb16 /system/lib/<em style="color:Orange">libc.so</em> (<em style="color:Orange">abort</em>+57) |
| #01 pc 0001cd8f /system/lib/libc.so (__assert2+22) |
| #02 pc 00001531 /system/bin/crasher (do_action+764) |
| #03 pc 00002301 /system/bin/crasher (main+68) |
| #04 pc 0008a809 /system/lib/libc.so (__libc_init+48) |
| #05 pc 00001097 /system/bin/crasher (_start_main+38) |
| </pre> |
| |
| <p> |
| Older versions of Android followed a convoluted path between the original |
| abort call (frame 4 here) and the actual sending of the signal (frame 0 here). |
| This was especially true on 32-bit ARM, which added |
| <code>__libc_android_abort</code> (frame 3 here) to the other platforms' |
| sequence of <code>raise</code>/<code>pthread_kill</code>/<code>tgkill</code>: |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| pid: 1656, tid: 1656, name: crasher >>> crasher <<< |
| signal 6 (<em style="color:Orange">SIGABRT</em>), code -6 (SI_TKILL), fault addr -------- |
| <em style="color:Orange">Abort message</em>: 'some_file.c:123: some_function: assertion "false" failed' |
| r0 00000000 r1 00000678 r2 00000006 r3 f70b6dc8 |
| r4 f70b6dd0 r5 f70b6d80 r6 00000002 r7 0000010c |
| r8 ffffffed r9 00000000 sl 00000000 fp ff96ae1c |
| ip 00000006 sp ff96ad18 lr f700ced5 pc f700dc98 cpsr 400b0010 |
| backtrace: |
| #00 pc 00042c98 /system/lib/libc.so (tgkill+12) |
| #01 pc 00041ed1 /system/lib/libc.so (pthread_kill+32) |
| #02 pc 0001bb87 /system/lib/libc.so (raise+10) |
| #03 pc 00018cad /system/lib/libc.so (__libc_android_abort+34) |
| #04 pc 000168e8 /system/lib/<em style="color:Orange">libc.so</em> (<em style="color:Orange">abort</em>+4) |
| #05 pc 0001a78f /system/lib/libc.so (__libc_fatal+16) |
| #06 pc 00018d35 /system/lib/libc.so (__assert2+20) |
| #07 pc 00000f21 /system/xbin/crasher |
| #08 pc 00016795 /system/lib/libc.so (__libc_init+44) |
| #09 pc 00000abc /system/xbin/crasher |
| </pre> |
| |
| <p> |
| You can reproduce an instance of this type of crash using <code>crasher |
| abort</code>. |
| </p> |
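<p>
As a rough illustration of why SIGABRT is the signature of this crash type,
the sketch below runs a deliberate abort and a failing assert in a forked
child and reports the signal that killed it. The helper names
(<code>call_abort</code>, <code>call_failed_assert</code>,
<code>terminating_signal</code>) are hypothetical and not part of
<code>crasher</code>; the sketch assumes a POSIX system and a build without
<code>NDEBUG</code>.
</p>

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Deliberate abort: the libc entry point that every abort path ends in. */
void call_abort(void) {
    abort();
}

/* A failing assert(3) logs a message and then calls abort for us. */
void call_failed_assert(void) {
    assert(!"deliberate assertion failure");
}

/*
 * Runs fn in a forked child so the caller survives, then returns the number
 * of the signal that killed the child (0 if it exited normally, -1 on error).
 */
int terminating_signal(void (*fn)(void)) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        fn();
        _exit(0);  /* only reached if fn returns */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

<p>
Under those assumptions, <code>terminating_signal(call_abort)</code> and
<code>terminating_signal(call_failed_assert)</code> both report SIGABRT, which
is the signal shown on the signal line of the dumps above.
</p>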
| |
| <h2 id=nullpointer>Pure null pointer dereference</h2> |
| |
| <p> |
| This is the classic native crash, and although it's just a special case of the |
| next crash type, it's worth mentioning separately because it usually requires |
| the least thought. |
| </p> |
| |
| <p> |
| In the example below, the crashing function is in <code>libc.so</code>, but |
| because the string functions just operate on the pointers they're given, you |
| can infer that |
| <code><a href="http://man7.org/linux/man-pages/man3/strlen.3.html" class="external">strlen(3)</a></code> |
| was called with a null pointer, so this crash should go straight to the |
| author of the calling code. In this case, frame #01 is the bad caller. |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| pid: 25326, tid: 25326, name: crasher >>> crasher <<< |
| signal 11 (<em style="color:Orange">SIGSEGV</em>), code 1 (SEGV_MAPERR), <em style="color:Orange">fault addr 0x0</em> |
| r0 00000000 r1 00000000 r2 00004c00 r3 00000000 |
| r4 ab088071 r5 fff92b34 r6 00000002 r7 fff92b40 |
| r8 00000000 r9 00000000 sl 00000000 fp fff92b2c |
| ip ab08cfc4 sp fff92a08 lr ab087a93 pc efb78988 cpsr 600d0030 |
| |
| backtrace: |
| #00 pc 00019988 /system/lib/libc.so (strlen+71) |
| #01 pc 00001a8f /system/xbin/crasher (strlen_null+22) |
| #02 pc 000017cd /system/xbin/crasher (do_action+948) |
| #03 pc 000020d5 /system/xbin/crasher (main+100) |
| #04 pc 000177a1 /system/lib/libc.so (__libc_init+48) |
| #05 pc 000010e4 /system/xbin/crasher (_start+96) |
| </pre> |
| |
| <p> |
| You can reproduce an instance of this type of crash using <code>crasher |
| strlen-NULL</code>. |
| </p> |
| |
| <h2 id=lowaddress>Low-address null pointer dereference</h2> |
| |
| <p> |
| In many cases the fault address won't be 0, but some other low number. Two- or |
| three-digit addresses in particular are very common, whereas a six-digit |
| address is almost certainly not a null pointer dereference, because that would |
| require a 1 MiB offset. A low fault address usually occurs when code |
| dereferences a null pointer as if it were a valid struct. Common functions are |
| <code><a href="http://man7.org/linux/man-pages/man3/fprintf.3.html" class="external">fprintf(3)</a></code> |
| (or any other function taking a FILE*) and |
| <code><a href="http://man7.org/linux/man-pages/man3/readdir.3.html" class="external">readdir(3)</a></code>, |
| because code often fails to check that the |
| <code><a href="http://man7.org/linux/man-pages/man3/fopen.3.html" class="external">fopen(3)</a></code> |
| or |
| <code><a href="http://man7.org/linux/man-pages/man3/opendir.3.html" class="external">opendir(3)</a></code> |
| call actually succeeded first. |
| </p> |
| |
| <p> |
| Here's an example of <code>readdir</code>: |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| pid: 25405, tid: 25405, name: crasher >>> crasher <<< |
| signal 11 (<em style="color:Orange">SIGSEGV</em>), code 1 (SEGV_MAPERR), <em style="color:Orange">fault addr 0xc</em> |
| r0 0000000c r1 00000000 r2 00000000 r3 3d5f0000 |
| r4 00000000 r5 0000000c r6 00000002 r7 ff8618f0 |
| r8 00000000 r9 00000000 sl 00000000 fp ff8618dc |
| ip edaa6834 sp ff8617a8 lr eda34a1f pc eda618f6 cpsr 600d0030 |
| |
| backtrace: |
| #00 pc 000478f6 /system/lib/libc.so (pthread_mutex_lock+1) |
| #01 pc 0001aa1b /system/lib/libc.so (readdir+10) |
| #02 pc 00001b35 /system/xbin/crasher (readdir_null+20) |
| #03 pc 00001815 /system/xbin/crasher (do_action+976) |
| #04 pc 000021e5 /system/xbin/crasher (main+100) |
| #05 pc 000177a1 /system/lib/libc.so (__libc_init+48) |
| #06 pc 00001110 /system/xbin/crasher (_start+96) |
| </pre> |
| |
| <p> |
| Here the direct cause of the crash is that |
| <code><a href="http://man7.org/linux/man-pages/man3/pthread_mutex_lock.3p.html" class="external">pthread_mutex_lock(3)</a></code> |
| has tried to access address 0xc (frame 0). But the first thing |
| <code>pthread_mutex_lock</code> does is dereference the <code>state</code> |
| element of the <code>pthread_mutex_t*</code> it was given. If you look at the |
| source, you can see that element is at offset 0 in the struct, which tells you |
| that <code>pthread_mutex_lock</code> was given the invalid pointer 0xc. From |
| frame 1 you can see that it was given that pointer by <code>readdir</code>, |
| which extracts the <code>mutex_</code> field from the <code>DIR*</code> it's |
| given. Looking at that structure, you can see that <code>mutex_</code> is at |
| offset <code>sizeof(int) + sizeof(size_t) + sizeof(dirent*)</code> into |
| <code>struct DIR</code>, which on a 32-bit device is 4 + 4 + 4 = 12 = 0xc, so |
| you found the bug: <code>readdir</code> was passed a null pointer by the |
| caller. At this point you can paste the stack into the stack tool to find out |
| <em>where</em> in logcat this happened. |
| </p> |
| |
| <pre class="prettyprint"> |
| struct DIR { |
| int fd_; |
| size_t available_bytes_; |
| dirent* next_; |
| pthread_mutex_t mutex_; |
| dirent buff_[15]; |
| long current_pos_; |
| }; |
| </pre> |
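<p>
The offset arithmetic above can be checked mechanically. This sketch models
the first fields of <code>struct DIR</code> with fixed-width types so that the
32-bit layout holds on any host. <code>struct dir_model32</code> and both
helper functions are hypothetical names, and the single 32-bit word standing
in for <code>pthread_mutex_t</code> is an assumption that only its first
element matters here.
</p>

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the start of struct DIR on a 32-bit ABI, where int,
 * size_t, and pointers are all 4 bytes wide.
 */
struct dir_model32 {
    int32_t  fd_;
    uint32_t available_bytes_;  /* size_t on a 32-bit ABI */
    uint32_t next_;             /* dirent* on a 32-bit ABI */
    uint32_t mutex_state_;      /* first word of pthread_mutex_t */
};

/* Offset of the mutex within the modeled 32-bit struct. */
size_t dir_mutex_offset32(void) {
    return offsetof(struct dir_model32, mutex_state_);
}

/* Address that is touched when a field at `offset` is read via a null base. */
uintptr_t fault_addr_from_null(size_t offset) {
    return (uintptr_t)0 + offset;
}
```

<p>
With this model, <code>fault_addr_from_null(dir_mutex_offset32())</code>
evaluates to 0xc, the fault address reported in the dump.
</p>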
| |
| <p> |
| In most cases you can actually skip this analysis. A sufficiently low fault |
| address usually means you can just skip any <code>libc.so</code> frames in the |
| stack and directly accuse the calling code. But not always, and this is how |
| you would present a compelling case. |
| </p> |
| |
| <p> |
| You can reproduce instances of this kind of crash using <code>crasher |
| fprintf-NULL</code> or <code>crasher readdir-NULL</code>. |
| </p> |
| |
| <h2 id=fortify>FORTIFY failure</h2> |
| |
| <p> |
| A FORTIFY failure is a special case of an abort that occurs when the C library |
| detects a problem that might lead to a security vulnerability. Many C library |
| functions are <em>fortified</em>; they take an extra argument that tells them |
| how large a buffer actually is and check at run time whether the operation |
| you're trying to perform actually fits. Here's an example where the code tries |
| to <code>read(fd, buf, 32)</code> into a buffer that's actually only 10 bytes |
| long: |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| pid: 25579, tid: 25579, name: crasher >>> crasher <<< |
| signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr -------- |
| Abort message: '<em style="color:Orange">FORTIFY: read: prevented 32-byte write into 10-byte buffer</em>' |
| r0 00000000 r1 000063eb r2 00000006 r3 00000008 |
| r4 ff96f350 r5 000063eb r6 000063eb r7 0000010c |
| r8 00000000 r9 00000000 sl 00000000 fp ff96f49c |
| ip 00000000 sp ff96f340 lr ee83ece3 pc ee86ef0c cpsr 000d0010 |
| |
| backtrace: |
| #00 pc 00049f0c /system/lib/libc.so (tgkill+12) |
| #01 pc 00019cdf /system/lib/libc.so (abort+50) |
| #02 pc 0001e197 /system/lib/libc.so (<em style="color:Orange">__fortify_fatal</em>+30) |
| #03 pc 0001baf9 /system/lib/libc.so (__read_chk+48) |
| #04 pc 0000165b /system/xbin/crasher (do_action+534) |
| #05 pc 000021e5 /system/xbin/crasher (main+100) |
| #06 pc 000177a1 /system/lib/libc.so (__libc_init+48) |
| #07 pc 00001110 /system/xbin/crasher (_start+96) |
| </pre> |
| |
| <p> |
| You can reproduce an instance of this type of crash using <code>crasher |
| fortify</code>. |
| </p> |
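<p>
The check that a fortified function performs can be sketched as follows.
<code>sketch_read_chk</code> and <code>fortify_would_trip</code> are
hypothetical names, and the real C library code may differ. The sketch only
assumes what is described above: the function receives the true buffer size as
an extra argument and aborts when the request does not fit.
</p>

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* True when a count-byte write cannot fit in a buf_size-byte buffer. */
bool fortify_would_trip(size_t count, size_t buf_size) {
    return count > buf_size;
}

/*
 * Hypothetical stand-in for a fortified read(): the usual read(2) arguments
 * plus the real size of the destination buffer.
 */
ssize_t sketch_read_chk(int fd, void *buf, size_t count, size_t buf_size) {
    if (fortify_would_trip(count, buf_size)) {
        /* Same wording as the abort message in the dump above. */
        fprintf(stderr,
                "FORTIFY: read: prevented %zu-byte write into %zu-byte buffer\n",
                count, buf_size);
        abort();
    }
    return read(fd, buf, count);
}
```

<p>
Calling <code>sketch_read_chk(fd, buf, 32, 10)</code> takes the abort path,
which is why the backtrace shows <code>abort</code> directly beneath the
fortify frames.
</p>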
| |
| <h2 id=stackcorruption>Stack corruption detected by -fstack-protector</h2> |
| |
| <p> |
| The compiler's <code>-fstack-protector</code> option inserts checks into |
| functions with on-stack buffers to guard against buffer overruns. This option |
| is on by default for platform code but not for apps. When this option is |
| enabled, the compiler adds instructions to the |
| <a href="https://en.wikipedia.org/wiki/Function_prologue" class="external">function |
| prologue</a> to write a random value just past the last local on the stack and |
| to the function epilogue to read it back and check that it hasn't changed. If |
| that value changed, it was overwritten by a buffer overrun, so the epilogue |
| calls <code>__stack_chk_fail</code> to log a message and abort. |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| pid: 26717, tid: 26717, name: crasher >>> crasher <<< |
| signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr -------- |
| <em style="color:Orange">Abort message: 'stack corruption detected'</em> |
| r0 00000000 r1 0000685d r2 00000006 r3 00000008 |
| r4 ffd516d8 r5 0000685d r6 0000685d r7 0000010c |
| r8 00000000 r9 00000000 sl 00000000 fp ffd518bc |
| ip 00000000 sp ffd516c8 lr ee63ece3 pc ee66ef0c cpsr 000e0010 |
| |
| backtrace: |
| #00 pc 00049f0c /system/lib/libc.so (tgkill+12) |
| #01 pc 00019cdf /system/lib/libc.so (abort+50) |
| #02 pc 0001e07d /system/lib/libc.so (__libc_fatal+24) |
| #03 pc 0004863f /system/lib/libc.so (<em style="color:Orange">__stack_chk_fail</em>+6) |
| #04 pc 000013ed /system/xbin/crasher (smash_stack+76) |
| #05 pc 00001591 /system/xbin/crasher (do_action+280) |
| #06 pc 00002219 /system/xbin/crasher (main+100) |
| #07 pc 000177a1 /system/lib/libc.so (__libc_init+48) |
| #08 pc 00001144 /system/xbin/crasher (_start+96) |
| </pre> |
| |
| <p> |
| You can distinguish this from other kinds of abort by the presence of |
| <code>__stack_chk_fail</code> in the backtrace and the specific abort message. |
| </p> |
| |
| <p> |
| You can reproduce an instance of this type of crash using <code>crasher |
| smash-stack</code>. |
| </p> |
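<p>
To make the prologue and epilogue mechanism concrete, here is a minimal model.
It is not the compiler's actual instrumentation: <code>struct
frame_model</code>, <code>SKETCH_GUARD</code>, and
<code>guard_survives_copy</code> are hypothetical, and the guard is a fixed
value for readability, whereas the real value is random.
</p>

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative guard value; a real stack protector uses a random one. */
#define SKETCH_GUARD ((uintptr_t)0xdeadbeef)

/* Model of a stack frame: an 8-byte local buffer with the guard after it. */
struct frame_model {
    char buf[8];
    uintptr_t guard;
};

/*
 * The "prologue" stores the guard, the body copies n bytes into buf without a
 * bounds check, and the "epilogue" reports whether the guard is intact.
 * n is clamped to the size of the model so the sketch stays well defined.
 */
bool guard_survives_copy(size_t n) {
    struct frame_model frame;
    unsigned char src[sizeof frame];

    memset(src, 'A', sizeof src);
    memset(&frame, 0, sizeof frame);
    frame.guard = SKETCH_GUARD;          /* prologue */

    if (n > sizeof frame) {
        n = sizeof frame;
    }
    memcpy(&frame, src, n);              /* overruns buf once n > 8 */

    return frame.guard == SKETCH_GUARD;  /* epilogue check */
}
```

<p>
A copy of 8 bytes leaves the guard intact, while a copy of 9 or more bytes
changes it. That is the condition under which the real epilogue calls
<code>__stack_chk_fail</code>.
</p>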
| |
| <h2 id="seccomp">Seccomp SIGSYS from a disallowed system call</h2> |
| |
| <p> |
| The <a href="https://en.wikipedia.org/wiki/Seccomp" class="external">seccomp</a> |
| system (specifically seccomp-bpf) restricts access to system calls. For more |
| information about seccomp for platform developers, see the blog post |
| <a href="https://android-developers.googleblog.com/2017/07/seccomp-filter-in-android-o.html" class="external">Seccomp |
| filter in Android O</a>. A thread that calls a restricted system call |
| receives a SIGSYS signal with code SYS_SECCOMP. The cause line shows the |
| system call number along with the architecture. System call numbers vary |
| between architectures: for example, the <code>readlinkat(2)</code> system |
| call is number 305 on x86 but 267 on x86-64, and the number is different |
| again on both arm and arm64. For that reason, it's usually easier to use the |
| stack trace to find out which system call was disallowed than to look up the |
| system call number in the headers. |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| pid: 11046, tid: 11046, name: crasher >>> crasher <<< |
| signal 31 (SIGSYS), code 1 (<em style="color:Orange">SYS_SECCOMP</em>), fault addr -------- |
| <em style="color:Orange">Cause: seccomp prevented call to disallowed arm system call 99999</em> |
| r0 cfda0444 r1 00000014 r2 40000000 r3 00000000 |
| r4 00000000 r5 00000000 r6 00000000 r7 0001869f |
| r8 00000000 r9 00000000 sl 00000000 fp fffefa58 |
| ip fffef898 sp fffef888 lr 00401997 pc f74f3658 cpsr 600f0010 |
| |
| backtrace: |
| #00 pc 00019658 /system/lib/libc.so (syscall+32) |
| #01 pc 00001993 /system/bin/crasher (do_action+1474) |
| #02 pc 00002699 /system/bin/crasher (main+68) |
| #03 pc 0007c60d /system/lib/libc.so (__libc_init+48) |
| #04 pc 000011b0 /system/bin/crasher (_start_main+72) |
| </pre> |
| |
| <p> |
| You can distinguish disallowed system calls from other crashes by the presence of |
| <code>SYS_SECCOMP</code> on the signal line and the description on the cause line. |
| </p> |
| <p> |
| You can reproduce an instance of this type of crash using <code>crasher |
| seccomp</code>. |
| </p> |
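<p>
When triaging many dumps, pulling the architecture and system call number out
of the cause line can be scripted. This is a minimal sketch that assumes the
cause line always uses the wording shown in the dump above;
<code>seccomp_cause_syscall</code> is a hypothetical helper.
</p>

```c
#include <stdio.h>
#include <string.h>

/*
 * Extracts the system call number from a "Cause:" line of the form shown in
 * the dump above. Returns -1 if the line does not match. If arch is not NULL,
 * it receives the architecture name (at most 15 characters plus the NUL).
 */
long seccomp_cause_syscall(const char *line, char arch[16]) {
    char name[16];
    long nr = -1;

    if (sscanf(line,
               "Cause: seccomp prevented call to disallowed %15s system call %ld",
               name, &nr) != 2) {
        return -1;
    }
    if (arch != NULL) {
        strcpy(arch, name);  /* name holds at most 15 characters */
    }
    return nr;
}
```

<p>
For the dump above this yields 99999 for the <code>arm</code> architecture.
The same value appears in <code>r7</code> (0x1869f), the register that carries
the system call number on 32-bit ARM.
</p>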
| |
| <h2 id="fdsan">Error detected by fdsan</h2> |
| |
| <p> |
| Android's fdsan file descriptor sanitizer helps catch common mistakes with file descriptors such |
| as use-after-close and double-close. See the |
| <a |
| href="https://android.googlesource.com/platform/bionic/+/master/docs/fdsan.md" class="external">fdsan |
| documentation</a> |
| for more details about debugging (and avoiding) this class of errors. |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| pid: 32315, tid: 32315, name: crasher64 >>> crasher64 <<< |
| signal 35 (&lt;debuggerd signal&gt;), code -1 (SI_QUEUE), fault addr -------- |
| <em style="color:Orange">Abort message: 'attempted to close file descriptor 3, expected to be unowned, actually owned by FILE* 0x7d8e413018'</em> |
| x0 0000000000000000 x1 0000000000007e3b x2 0000000000000023 x3 0000007fe7300bb0 |
| x4 3033313465386437 x5 3033313465386437 x6 3033313465386437 x7 3831303331346538 |
| x8 00000000000000f0 x9 0000000000000000 x10 0000000000000059 x11 0000000000000034 |
| x12 0000007d8ebc3a49 x13 0000007fe730077a x14 0000007fe730077a x15 0000000000000000 |
| x16 0000007d8ec9a7b8 x17 0000007d8ec779f0 x18 0000007d8f29c000 x19 0000000000007e3b |
| x20 0000000000007e3b x21 0000007d8f023020 x22 0000007d8f3b58dc x23 0000000000000001 |
| x24 0000007fe73009a0 x25 0000007fe73008e0 x26 0000007fe7300ca0 x27 0000000000000000 |
| x28 0000000000000000 x29 0000007fe7300c90 |
| sp 0000007fe7300860 lr 0000007d8ec2f22c pc 0000007d8ec2f250 |
| |
| backtrace: |
| #00 pc 0000000000088250 /bionic/lib64/libc.so (fdsan_error(char const*, ...)+384) |
| #01 pc 0000000000088060 /bionic/lib64/libc.so (android_fdsan_close_with_tag+632) |
| #02 pc 00000000000887e8 /bionic/lib64/libc.so (close+16) |
| #03 pc 000000000000379c /system/bin/crasher64 (do_action+1316) |
| #04 pc 00000000000049c8 /system/bin/crasher64 (main+96) |
| #05 pc 000000000008021c /bionic/lib64/libc.so (_start_main) |
| </pre> |
| |
| <p> |
| You can distinguish this from other kinds of abort by the presence of |
| <code>fdsan_error</code> in the backtrace and the specific abort message. |
| </p> |
| |
| <p> |
| You can reproduce an instance of this type of crash using |
| <code>crasher fdsan_file</code> or <code>crasher fdsan_dir</code>. |
| </p> |
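<p>
The abort message above reflects an ownership check: <code>close</code>
expected the descriptor to be unowned, but a <code>FILE*</code> owned it. The
following toy model illustrates that idea and is not bionic's implementation.
Each descriptor has an owner tag, with 0 meaning unowned, and a close is only
accepted from the expected owner. All names are hypothetical.
</p>

```c
#include <stdint.h>

#define MODEL_MAX_FDS 16

/* Owner tag per descriptor; 0 means "unowned". Hypothetical model only. */
static uint64_t model_owner[MODEL_MAX_FDS];

/* Records that fd now belongs to the object identified by tag. */
int model_set_owner(int fd, uint64_t tag) {
    if (fd < 0 || fd >= MODEL_MAX_FDS) {
        return -1;
    }
    model_owner[fd] = tag;
    return 0;
}

/*
 * Closes fd on behalf of the caller identified by expected_tag. Returns 0 on
 * success and -1 on an ownership mismatch, which is the point where fdsan
 * would report an error instead.
 */
int model_close_with_tag(int fd, uint64_t expected_tag) {
    if (fd < 0 || fd >= MODEL_MAX_FDS) {
        return -1;
    }
    if (model_owner[fd] != expected_tag) {
        return -1;
    }
    model_owner[fd] = 0;  /* the descriptor is unowned again */
    return 0;
}
```

<p>
In this model, a descriptor tagged by its <code>FILE*</code> rejects a plain
close (tag 0) and accepts the close issued by its owner, mirroring the
mismatch described in the abort message.
</p>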
| |
| |
| <h2 id=crashdump>Investigating crash dumps</h2> |
| |
| <p> |
| If you don't have a specific crash that you're investigating right now, the |
| platform source includes a tool for testing <code>debuggerd</code> called |
| crasher. If you <code>mm</code> in <code>system/core/debuggerd/</code> you'll |
| get both a <code>crasher</code> and a <code>crasher64</code> on your path (the |
| latter allowing you to test 64-bit crashes). Crasher can crash in a large |
| number of interesting ways based on the command line arguments you provide. |
| Use <code>crasher --help</code> to see the currently supported selection. |
| </p> |
| |
| <p> |
| To introduce the different pieces in a crash dump, let's work through this |
| example crash dump: |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** |
| Build fingerprint: 'Android/aosp_flounder/flounder:5.1.51/AOSP/enh08201009:eng/test-keys' |
| Revision: '0' |
| ABI: 'arm' |
| pid: 1656, tid: 1656, name: crasher >>> crasher <<< |
| signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr -------- |
| Abort message: 'some_file.c:123: some_function: assertion "false" failed' |
| r0 00000000 r1 00000678 r2 00000006 r3 f70b6dc8 |
| r4 f70b6dd0 r5 f70b6d80 r6 00000002 r7 0000010c |
| r8 ffffffed r9 00000000 sl 00000000 fp ff96ae1c |
| ip 00000006 sp ff96ad18 lr f700ced5 pc f700dc98 cpsr 400b0010 |
| backtrace: |
| #00 pc 00042c98 /system/lib/libc.so (tgkill+12) |
| #01 pc 00041ed1 /system/lib/libc.so (pthread_kill+32) |
| #02 pc 0001bb87 /system/lib/libc.so (raise+10) |
| #03 pc 00018cad /system/lib/libc.so (__libc_android_abort+34) |
| #04 pc 000168e8 /system/lib/libc.so (abort+4) |
| #05 pc 0001a78f /system/lib/libc.so (__libc_fatal+16) |
| #06 pc 00018d35 /system/lib/libc.so (__assert2+20) |
| #07 pc 00000f21 /system/xbin/crasher |
| #08 pc 00016795 /system/lib/libc.so (__libc_init+44) |
| #09 pc 00000abc /system/xbin/crasher |
| Tombstone written to: /data/tombstones/tombstone_06 |
| *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** |
| </pre> |
| |
| <p> |
| The line of asterisks with spaces is helpful if you're searching a log |
| for native crashes. The string "*** ***" rarely shows up in logs other than |
| at the beginning of a native crash. |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| Build fingerprint: |
| 'Android/aosp_flounder/flounder:5.1.51/AOSP/enh08201009:eng/test-keys' |
| </pre> |
| |
| <p> |
| The fingerprint lets you identify exactly which build the crash occurred on. |
| It is the same value as the <code>ro.build.fingerprint</code> system |
| property. |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| Revision: '0' |
| </pre> |
| |
| <p> |
| The revision refers to the hardware rather than the software. This is usually |
| unused but can be useful to help you automatically ignore bugs known to be |
| caused by bad hardware. This is exactly the same as the |
| <code>ro.revision</code> system property. |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| ABI: 'arm' |
| </pre> |
| |
| <p> |
| The ABI is one of arm, arm64, mips, mips64, x86, or x86-64. This is mostly |
| useful for the <code>stack</code> script mentioned above, so that it knows |
| what toolchain to use. |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| pid: 1656, tid: 1656, name: crasher >>> crasher <<< |
| </pre> |
| |
| <p> |
| This line identifies the specific thread in the process that crashed. In this |
| case, it was the process's main thread, so the process ID and thread ID match. |
| The first name is the thread name, and the name surrounded by &gt;&gt;&gt; and |
| &lt;&lt;&lt; is the process name. For an app, the process name is typically |
| the fully-qualified package name (such as com.facebook.katana), which is |
| useful when filing bugs or trying to find the app in Google Play. The pid and |
| tid can also be useful in finding the relevant log lines preceding the crash. |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr -------- |
| </pre> |
| |
| <p> |
| This line tells you which signal (SIGABRT) was received, and more about how it |
| was received (SI_TKILL). The signals reported by <code>debuggerd</code> |
| include SIGABRT, SIGBUS, SIGFPE, SIGILL, SIGSEGV, SIGSYS, and SIGTRAP. The |
| available codes depend on the signal. |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| Abort message: 'some_file.c:123: some_function: assertion "false" failed' |
| </pre> |
| |
| <p> |
| Not all crashes will have an abort message line, but aborts usually do. This is |
| automatically gathered from the last line of fatal logcat output for this |
| pid/tid, and in the case of a deliberate abort is likely to give an |
| explanation of why the program killed itself. |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| r0 00000000 r1 00000678 r2 00000006 r3 f70b6dc8 |
| r4 f70b6dd0 r5 f70b6d80 r6 00000002 r7 0000010c |
| r8 ffffffed r9 00000000 sl 00000000 fp ff96ae1c |
| ip 00000006 sp ff96ad18 lr f700ced5 pc f700dc98 cpsr 400b0010 |
| </pre> |
| |
| <p> |
| The register dump shows the content of the CPU registers at the time the |
| signal was received. (This section varies wildly between ABIs.) How useful |
| these are will depend on the exact crash. |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| backtrace: |
| #00 pc 00042c98 /system/lib/libc.so (tgkill+12) |
| #01 pc 00041ed1 /system/lib/libc.so (pthread_kill+32) |
| #02 pc 0001bb87 /system/lib/libc.so (raise+10) |
| #03 pc 00018cad /system/lib/libc.so (__libc_android_abort+34) |
| #04 pc 000168e8 /system/lib/libc.so (abort+4) |
| #05 pc 0001a78f /system/lib/libc.so (__libc_fatal+16) |
| #06 pc 00018d35 /system/lib/libc.so (__assert2+20) |
| #07 pc 00000f21 /system/xbin/crasher |
| #08 pc 00016795 /system/lib/libc.so (__libc_init+44) |
| #09 pc 00000abc /system/xbin/crasher |
| </pre> |
| |
| <p> |
| The backtrace shows you where in the code the thread was at the time of the |
| crash. The first column is the frame number (matching gdb's style where the |
| deepest frame is 0). The PC values are relative to the location of the shared |
| library rather than absolute addresses. The next column is the name of the |
| mapped region |
| (which is usually a shared library or executable, but might not be for, say, |
| JIT-compiled code). Finally, if symbols are available, the symbol that the PC |
| value corresponds to is shown, along with the offset into that symbol in |
| bytes. You can use this in conjunction with <code>objdump(1)</code> to find |
| the corresponding assembler instruction. |
| </p> |
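<p>
As a sketch of how the backtrace columns relate to the register dump, the
helpers below read the relative PC from a backtrace line and subtract it from
the absolute <code>pc</code> register to estimate where the library was
loaded. The function names are hypothetical, and the sketch assumes 32-bit
addresses and a relative PC measured from the start of the library's mapping.
</p>

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Returns the relative PC from a backtrace line such as
 * "#00 pc 00042c98 /system/lib/libc.so (tgkill+12)",
 * or UINT32_MAX if the line is not a backtrace frame.
 */
uint32_t frame_relative_pc(const char *line) {
    unsigned frame;
    unsigned long pc;

    if (sscanf(line, " #%u pc %lx", &frame, &pc) != 2) {
        return UINT32_MAX;
    }
    return (uint32_t)pc;
}

/* Estimated load address: absolute PC minus the PC relative to the library. */
uint32_t load_address(uint32_t absolute_pc, uint32_t relative_pc) {
    return absolute_pc - relative_pc;
}
```

<p>
For frame 0 of the example, the register dump shows <code>pc f700dc98</code>
and the backtrace shows <code>00042c98</code>, so the estimate is 0xf700dc98 -
0x42c98 = 0xf6fcb000. The relative value is the one to look up in the
library's disassembly.
</p>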
| |
| <h2 id=tombstones>Reading tombstones</h2> |
| |
| <pre class="devsite-click-to-copy"> |
| Tombstone written to: /data/tombstones/tombstone_06 |
| </pre> |
| |
| <p> |
| This tells you where <code>debuggerd</code> wrote extra information. |
| <code>debuggerd</code> will keep up to 10 tombstones, cycling through the |
| numbers 00 to 09 and overwriting existing tombstones as necessary. |
| </p> |
| |
| <p> |
| The tombstone contains the same information as the crash dump, plus a few |
| extras. For example, it includes backtraces for <em>all</em> threads (not |
| just the crashing thread), the floating point registers, raw stack dumps, |
| and memory dumps around the addresses in registers. Most usefully it also |
| includes a full memory map (similar to <code>/proc/<var>pid</var>/maps</code>). |
| Here's an annotated example from a 32-bit ARM process crash: |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| memory map: (fault address prefixed with --->) |
| --->ab15f000-ab162fff r-x 0 4000 /system/xbin/crasher (BuildId: |
| b9527db01b5cf8f5402f899f64b9b121) |
| </pre> |
| |
| <p> |
| There are two things to note here. The first is that this line is prefixed |
| with "--->", which marks the mapping that contains the fault address. The |
| maps are most useful when your crash isn't just a null |
| pointer dereference. If the fault address is small, it's probably some variant |
| of a null pointer dereference. Otherwise looking at the maps around the fault |
| address can often give you a clue as to what happened. Some possible issues |
| that can be recognized by looking at the maps include: |
| </p> |
| |
| <ul> |
| <li>Reads/writes past the end of a block of memory.</li> |
| <li>Reads/writes before the beginning of a block of memory.</li> |
| <li>Attempts to execute non-code.</li> |
| <li>Running off the end of a stack.</li> |
| <li>Attempts to write to code (as in the example above).</li> |
| </ul> |
| |
| <p> |
| The second thing to note is that executable and shared library files |
| show the BuildId (if present) in Android 6.0 and higher, so you can see exactly |
| which version of your code crashed. Platform binaries include a BuildId by |
| default since Android 6.0; NDK r12 and higher automatically pass |
| <code>-Wl,--build-id</code> to the linker too. |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| ab163000-ab163fff r-- 3000 1000 /system/xbin/crasher |
| ab164000-ab164fff rw- 0 1000 |
| f6c80000-f6d7ffff rw- 0 100000 [anon:libc_malloc] |
| </pre> |
| |
| <p> |
| On Android the heap isn't necessarily a single region. Heap regions will |
| be labeled <code>[anon:libc_malloc]</code>. |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| f6d82000-f6da1fff r-- 0 20000 /dev/__properties__/u:object_r:logd_prop:s0 |
| f6da2000-f6dc1fff r-- 0 20000 /dev/__properties__/u:object_r:default_prop:s0 |
| f6dc2000-f6de1fff r-- 0 20000 /dev/__properties__/u:object_r:logd_prop:s0 |
| f6de2000-f6de5fff r-x 0 4000 /system/lib/libnetd_client.so (BuildId: 08020aa06ed48cf9f6971861abf06c9d) |
| f6de6000-f6de6fff r-- 3000 1000 /system/lib/libnetd_client.so |
| f6de7000-f6de7fff rw- 4000 1000 /system/lib/libnetd_client.so |
| f6dec000-f6e74fff r-x 0 89000 /system/lib/libc++.so (BuildId: 8f1f2be4b37d7067d366543fafececa2) (load base 0x2000) |
| f6e75000-f6e75fff --- 0 1000 |
| f6e76000-f6e79fff r-- 89000 4000 /system/lib/libc++.so |
| f6e7a000-f6e7afff rw- 8d000 1000 /system/lib/libc++.so |
| f6e7b000-f6e7bfff rw- 0 1000 [anon:.bss] |
| f6e7c000-f6efdfff r-x 0 82000 /system/lib/libc.so (BuildId: d189b369d1aafe11feb7014d411bb9c3) |
| f6efe000-f6f01fff r-- 81000 4000 /system/lib/libc.so |
| f6f02000-f6f03fff rw- 85000 2000 /system/lib/libc.so |
| f6f04000-f6f04fff rw- 0 1000 [anon:.bss] |
| f6f05000-f6f05fff r-- 0 1000 [anon:.bss] |
| f6f06000-f6f0bfff rw- 0 6000 [anon:.bss] |
| f6f0c000-f6f21fff r-x 0 16000 /system/lib/libcutils.so (BuildId: d6d68a419dadd645ca852cd339f89741) |
| f6f22000-f6f22fff r-- 15000 1000 /system/lib/libcutils.so |
| f6f23000-f6f23fff rw- 16000 1000 /system/lib/libcutils.so |
| f6f24000-f6f31fff r-x 0 e000 /system/lib/liblog.so (BuildId: e4d30918d1b1028a1ba23d2ab72536fc) |
| f6f32000-f6f32fff r-- d000 1000 /system/lib/liblog.so |
| f6f33000-f6f33fff rw- e000 1000 /system/lib/liblog.so |
| </pre> |
| |
| <p> |
| Typically, a shared library has three adjacent entries. One is readable and |
| executable (code), one is read-only (read-only data), and one is read-write |
| (mutable data). The first column shows the address ranges for the mapping, the |
| second column the permissions (in the usual Unix <code>ls(1)</code> style), |
| the third column the offset into the file (in hex), the fourth column the size |
| of the region (in hex), and the fifth column the file (or other region name). |
| </p> |
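<p>
The column layout described above is regular enough to parse. This is a
minimal sketch that assumes 32-bit addresses and the five leading columns
shown in the example; <code>struct map_entry</code> and the helper functions
are hypothetical names.
</p>

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct map_entry {
    uint32_t start;   /* first address in the region */
    uint32_t end;     /* last address in the region (inclusive) */
    char perms[4];    /* "r-x", "rw-", "---", ... */
    uint32_t offset;  /* offset into the file, from the hex column */
    uint32_t size;    /* size of the region, from the hex column */
};

/* Parses one memory map line; a leading "--->" marker is skipped. */
bool parse_map_line(const char *line, struct map_entry *out) {
    unsigned long start, end, offset, size;
    char perms[4] = {0};

    if (strncmp(line, "--->", 4) == 0) {
        line += 4;
    }
    if (sscanf(line, "%lx-%lx %3s %lx %lx",
               &start, &end, perms, &offset, &size) != 5) {
        return false;
    }
    out->start = (uint32_t)start;
    out->end = (uint32_t)end;
    memcpy(out->perms, perms, sizeof out->perms);
    out->offset = (uint32_t)offset;
    out->size = (uint32_t)size;
    return true;
}

/* True if the line parses and its address range contains addr. */
bool map_line_contains(const char *line, uint32_t addr) {
    struct map_entry e;
    return parse_map_line(line, &e) && addr >= e.start && addr <= e.end;
}

/* True if the size column agrees with the address range. */
bool map_line_size_consistent(const char *line) {
    struct map_entry e;
    return parse_map_line(line, &e) && e.size == e.end - e.start + 1;
}
```

<p>
Running <code>map_line_contains</code> over each line with the fault address
finds the mapping that a tombstone marks with "--->", and the permissions of
that entry show whether the access was, for example, a write to read-only
code.
</p>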
| |
| <pre class="devsite-click-to-copy"> |
| f6f34000-f6f53fff r-x 0 20000 /system/lib/libm.so (BuildId: 76ba45dcd9247e60227200976a02c69b) |
| f6f54000-f6f54fff --- 0 1000 |
| f6f55000-f6f55fff r-- 20000 1000 /system/lib/libm.so |
| f6f56000-f6f56fff rw- 21000 1000 /system/lib/libm.so |
| f6f58000-f6f58fff rw- 0 1000 |
| f6f59000-f6f78fff r-- 0 20000 /dev/__properties__/u:object_r:default_prop:s0 |
| f6f79000-f6f98fff r-- 0 20000 /dev/__properties__/properties_serial |
| f6f99000-f6f99fff rw- 0 1000 [anon:linker_alloc_vector] |
| f6f9a000-f6f9afff r-- 0 1000 [anon:atexit handlers] |
| f6f9b000-f6fbafff r-- 0 20000 /dev/__properties__/properties_serial |
| f6fbb000-f6fbbfff rw- 0 1000 [anon:linker_alloc_vector] |
| f6fbc000-f6fbcfff rw- 0 1000 [anon:linker_alloc_small_objects] |
| f6fbd000-f6fbdfff rw- 0 1000 [anon:linker_alloc_vector] |
| f6fbe000-f6fbffff rw- 0 2000 [anon:linker_alloc] |
| f6fc0000-f6fc0fff r-- 0 1000 [anon:linker_alloc] |
| f6fc1000-f6fc1fff rw- 0 1000 [anon:linker_alloc_lob] |
| f6fc2000-f6fc2fff r-- 0 1000 [anon:linker_alloc] |
| f6fc3000-f6fc3fff rw- 0 1000 [anon:linker_alloc_vector] |
| f6fc4000-f6fc4fff rw- 0 1000 [anon:linker_alloc_small_objects] |
| f6fc5000-f6fc5fff rw- 0 1000 [anon:linker_alloc_vector] |
| f6fc6000-f6fc6fff rw- 0 1000 [anon:linker_alloc_small_objects] |
| f6fc7000-f6fc7fff rw- 0 1000 [anon:arc4random _rsx structure] |
| f6fc8000-f6fc8fff rw- 0 1000 [anon:arc4random _rs structure] |
| f6fc9000-f6fc9fff r-- 0 1000 [anon:atexit handlers] |
| f6fca000-f6fcafff --- 0 1000 [anon:thread signal stack guard page] |
| </pre> |
| |
| <p> |
| As of Android 5.0, the C library names most of its anonymous mapped regions so |
| there are fewer mystery regions. |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| f6fcb000-f6fccfff rw- 0 2000 [stack:5081] |
| </pre> |
| |
| <p> |
| Regions named <code>[stack:<var>tid</var>]</code> are the stacks for the given |
| threads. |
| </p> |
| |
| <pre class="devsite-click-to-copy"> |
| f6fcd000-f702afff r-x 0 5e000 /system/bin/linker (BuildId: 84f1316198deee0591c8ac7f158f28b7) |
| f702b000-f702cfff r-- 5d000 2000 /system/bin/linker |
| f702d000-f702dfff rw- 5f000 1000 /system/bin/linker |
| f702e000-f702ffff rw- 0 2000 |
| f7030000-f7030fff r-- 0 1000 |
| f7031000-f7032fff rw- 0 2000 |
| ffcd7000-ffcf7fff rw- 0 21000 |
| ffff0000-ffff0fff r-x 0 1000 [vectors] |
| </pre> |
| |
| <p> |
| Whether you see <code>[vectors]</code> or <code>[vdso]</code> depends on the |
| architecture. ARM uses <code>[vectors]</code>, while all other architectures use |
| <a href="http://man7.org/linux/man-pages/man7/vdso.7.html" class="external"><code>[vdso]</code></a>. |
| </p> |
| |
| </body> |
| </html> |