btf loader: Support raw BTF as available in /sys/kernel/btf/vmlinux

This happens either automatically, when no -F option is passed and
/sys/kernel/btf/vmlinux is available, or explicitly, when
/sys/kernel/btf/vmlinux is passed as the filename to the tool, i.e.:

  $ pahole -C list_head
  struct list_head {
  	struct list_head *         next;                 /*     0     8 */
  	struct list_head *         prev;                 /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* last cacheline: 16 bytes */
  };
  $ strace -e openat pahole -C list_head |& grep /sys/kernel/btf/
  openat(AT_FDCWD, "/sys/kernel/btf/vmlinux", O_RDONLY) = 3
  $
  $ pahole -C list_head /sys/kernel/btf/vmlinux
  struct list_head {
  	struct list_head *         next;                 /*     0     8 */
  	struct list_head *         prev;                 /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* last cacheline: 16 bytes */
  };
  $

If one wants to grab the matching vmlinux to use its DWARF info instead,
which is useful to compare the results with what we have from BTF, for
instance, it's just a matter of using '-F dwarf'.
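
A minimal sketch of such a comparison, assuming pahole is in the PATH and
that a matching vmlinux with DWARF can be found. The helper names
(pahole_output, diff_layouts) are hypothetical and not part of pahole; only
the '-F' and '-C' options shown in this message are used:

```python
import difflib
import subprocess


def pahole_output(fmt, struct_name):
    """Run 'pahole -F <fmt> -C <struct_name>' and return its stdout."""
    result = subprocess.run(
        ["pahole", "-F", fmt, "-C", struct_name],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def diff_layouts(btf_text, dwarf_text):
    """Return unified diff lines between two layouts; empty if identical."""
    return list(
        difflib.unified_diff(
            btf_text.splitlines(),
            dwarf_text.splitlines(),
            fromfile="btf",
            tofile="dwarf",
            lineterm="",
        )
    )
```

Calling diff_layouts(pahole_output("btf", "list_head"),
pahole_output("dwarf", "list_head")) should then return an empty list when
both loaders agree on the layout.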

Comparing the two loaders shows something that at first came as a
surprise, but has a simple explanation:

For very common data structures that will probably appear in all of the
DWARF CUs (Compilation Units), like 'struct list_head', using '-F dwarf'
is faster, about 3x fewer cycles in the run below:

  [acme@quaco pahole]$ perf stat -e cycles pahole -F btf -C list_head > /dev/null

   Performance counter stats for 'pahole -F btf -C list_head':

          45,722,518      cycles:u

         0.023717300 seconds time elapsed

         0.016474000 seconds user
         0.007212000 seconds sys

  [acme@quaco pahole]$ perf stat -e cycles pahole -F dwarf -C list_head > /dev/null

   Performance counter stats for 'pahole -F dwarf -C list_head':

          14,170,321      cycles:u

         0.006668904 seconds time elapsed

         0.005562000 seconds user
         0.001109000 seconds sys

  [acme@quaco pahole]$

But for something that is more specific to a subsystem, the DWARF loader
has to process far more CUs before it gets to that struct:

  $ perf stat -e cycles pahole -F dwarf -C tcp_sock > /dev/null

   Performance counter stats for 'pahole -F dwarf -C tcp_sock':

      31,579,795,238      cycles:u

         8.332272930 seconds time elapsed

         8.032124000 seconds user
         0.286537000 seconds sys

  $

With the BTF loader the time should be roughly constant, as it always
loads everything from /sys/kernel/btf/vmlinux:

  $ perf stat -e cycles pahole -F btf -C tcp_sock > /dev/null

   Performance counter stats for 'pahole -F btf -C tcp_sock':

          48,823,488      cycles:u

         0.024102760 seconds time elapsed

         0.012035000 seconds user
         0.012046000 seconds sys

  $

Above I used '-F btf' just to show that it can be used, but it's not
really needed, i.e. these two are equivalent:

  $ strace -e openat pahole -F btf -C list_head |& grep /sys/kernel/btf/vmlinux
  openat(AT_FDCWD, "/sys/kernel/btf/vmlinux", O_RDONLY) = 3
  $ strace -e openat pahole -C list_head |& grep /sys/kernel/btf/vmlinux
  openat(AT_FDCWD, "/sys/kernel/btf/vmlinux", O_RDONLY) = 3
  $

The btf_raw__load() function that ends up being grafted into the
preexisting btf_elf routines was based on libbpf's btf_load_raw().
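
For reference, a raw BTF blob starts with 'struct btf_header' from the
kernel's include/uapi/linux/btf.h (magic 0xeB9F, version, flags, hdr_len,
then the type and string section offsets and lengths). The sketch below
only illustrates that layout and how a raw blob can be told apart from an
ELF file. It is not the btf_raw__load() code, and the function names are
hypothetical:

```python
import struct

BTF_MAGIC = 0xEB9F

# struct btf_header: __u16 magic; __u8 version; __u8 flags; __u32 hdr_len;
#                    __u32 type_off; __u32 type_len; __u32 str_off; __u32 str_len;
_HDR_FMT = "HBBIIIII"
_HDR_FIELDS = ("magic", "version", "flags", "hdr_len",
               "type_off", "type_len", "str_off", "str_len")
HDR_SIZE = struct.calcsize("<" + _HDR_FMT)  # 24 bytes, no padding


def parse_raw_btf_header(data):
    """Return the header fields as a dict, or None if 'data' is not raw BTF."""
    if len(data) < HDR_SIZE:
        return None
    # Try both byte orders; the magic tells us which one the blob uses.
    for endian, name in (("<", "little"), (">", "big")):
        fields = struct.unpack_from(endian + _HDR_FMT, data)
        if fields[0] == BTF_MAGIC:
            hdr = dict(zip(_HDR_FIELDS, fields))
            hdr["endian"] = name
            return hdr
    return None


def looks_like_raw_btf(path="/sys/kernel/btf/vmlinux"):
    """Check whether the file at 'path' starts with a raw BTF header."""
    with open(path, "rb") as f:
        return parse_raw_btf_header(f.read(HDR_SIZE)) is not None
```

An ELF file starts with "\x7fELF", so parse_raw_btf_header() returns None
for it, which is the kind of check that lets a loader decide between the
raw and the ELF paths.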

Acked-by: Alexei Starovoitov <ast@fb.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 files changed