Merge "Add default choice to HandleInput class" into main
diff --git a/simpleperf/doc/AFDOFlow.png b/simpleperf/doc/AFDOFlow.png
new file mode 100644
index 0000000..134e8fc
--- /dev/null
+++ b/simpleperf/doc/AFDOFlow.png
Binary files differ
diff --git a/simpleperf/doc/collect_lbr_data_for_autofdo.md b/simpleperf/doc/collect_lbr_data_for_autofdo.md
index 08f6148..1b6bc5b 100644
--- a/simpleperf/doc/collect_lbr_data_for_autofdo.md
+++ b/simpleperf/doc/collect_lbr_data_for_autofdo.md
@@ -1,158 +1,391 @@
-# Collect LBR (x86 Architectures) data for AutoFDO
-
-[TOC]
-
-## Introduction
-
-Intel's Performance Monitoring Unit (PMU) is a hardware feature built into their processors to measure various
-performance parameters. These parameters include instruction cycles, cache hits, cache misses, branch misses,
-and more1. The PMU helps in understanding how effectively code uses hardware resources and provides insights
-for optimization.The Last Branch Record (LBR) is indeed a part of Intel's Performance Monitoring Unit (PMU).
-The PMU includes various performance monitoring features, and LBR is one of them.
-
-The Last Branch Record (LBR) is an advanced CPU feature designed to meticulously log the source and destination
-addresses of recently executed branch instructions. This capability serves as a vital tool for performance
-monitoring and debugging, allowing developers to track the intricate control flow of their programs. By analyzing
-the data captured through LBR, we can gain valuable insights into how applications navigate through their execution
-paths and pinpoint the areas where the program spends most of its time-often referred to as "hot paths."
-
-Branch Statistics:
-One of the remarkable applications of LBR is its ability to gather comprehensive branch statistics in C++ programs.
-This data can be pivotal in understanding the behavior of conditional decisions in the code.
-
-Virtual Calls:
-LBR proves particularly useful for analyzing the outcomes of indirect branches and virtual calls, key components in
-object-oriented programming that can significantly influence performance.
-
-LBR entries are rich with information, typically consisting of `FROM_IP` and `TO_IP`, which denote the source and
-destination addresses of the branching instructions. This detailed logging offers a clear view of the program's
-execution flow.
-
-Model Specific Registers (MSRs):
-The configuration of LBR relies on Model Specific Registers (MSRs) specific to Intel CPUs. These registers play
-a crucial role in enabling and managing LBR functionalities.
-
-IA32_DEBUGCTL: To initiate LBR recording, one must set bit 0 of the IA32_DEBUGCTL register to 1, effectively
-activating this powerful feature.
-
-MSR_LASTBRANCH_x_FROM_IP:
-This particular register is responsible for storing the originating addresses of the most recent branch instructions,
-preserving a trail of execution paths.
-
-MSR_LASTBRANCH_x_TO_IP: Conversely, this register captures the destination addresses of those most recent branches,
-creating a comprehensive map of transitions within the program.
-
-Clearing LBRs: A noteworthy aspect of LBR is that it gets cleared when the CPU enters certain low-power sleep states
-deeper than C2. To maintain the integrity of the recorded data, it may be necessary to keep the CPU in an awake state.
-
-Stopping LBR: Ceasing LBR recording can present challenges and might require invoking performance monitoring
-interrupts (PMIs), introducing additional complexity to the management of this feature.
-
-Advantages:
-Overhead: One of the standout benefits of LBR is its minimal overhead; it provides nearly zero performance degradation
-compared to traditional software-based branch recording methods, making it an efficient choice in performance-sensitive applications.
-
-Accuracy: Although manual code instrumentation might yield slightly better precision in certain scenarios, this advantage
-comes at the significant cost of increased runtime performance overhead, making LBR a more appealing alternative in many cases.
-
-Scenarios: The utility of LBR shines particularly in situations where the source code is not readily accessible or when
-the software builds process remains shrouded in mystery. In such cases, LBR becomes an invaluable ally in uncovering insights
-into program behavior, allowing developers and analysts to make informed decisions based on the recorded execution paths.
-
-
-Simpleperf supports collecting LBR data and converting it to input files for AutoFDO, which can then be used for
-Feedback Directed Optimization during compilation.
-
-## Examples
-
-Below are examples collecting LBR data for AutoFDO. It has two steps: first recording LBR data,second converting LBR data to
-AutoFDO input files.
-
-Record LBR data:
-
-# preparation: we need to be root the device to record LBR data
-$ adb root
-$ adb shell
-brya:/ \# cd data/local/tmp
-brya:/data/local/tmp \#
-
-# Do a system wide collection, it writes output to perf.data.
-# If only want LBR data for kernel, use `-e BR_INST_RETIRED.NEAR_TAKEN:k`.
-# If only want LBR data for userspace, use `-e BR_INST_RETIRED.NEAR_TAKEN:u`.
-# If want LBR data for system wide collection, use `-e BR_INST_RETIRED.NEAR_TAKEN -a`.
-
-brya:/data/local/tmp \# simpleperf record -b -e BR_INST_RETIRED.NEAR_TAKEN:u -c 100003
-
-simpleperf record:
-The simpleperf record command is used to profile processes and store the profiling data in a file (usually�perf.data).
-
--b:
-This option enables branch recording. It uses the Last Branch Record (LBR) feature of the CPU to capture the
-most recent branches taken by the processor. This is useful for understanding the control flow of a program.
-
--a:
-This option tells perf to record system-wide. It collects performance data from all CPUs, not just the one
-where the command is run. This is useful for capturing a comprehensive view of system performance.
-
--e:
-This option specifies the event (BR_INST_RETIRED.NEAR_TAKEN in this case) to record.
-
-# To reduce file size and time converting to AutoFDO input files, we recommend converting LBR data into an intermediate branch-list format.
-
-brya:/data/local/tmp \# simpleperf inject -i perf.data --output branch-list -o branch_list.data
-```
-Converting LBR data to AutoFDO input files needs to read binaries.
-So for userspace and kernel libraries, it needs to be converted on host, with vmlinux and kernel modules available.
-
-Convert LBR data for userspace libraries:
-
-```sh
-# Injecting LBR data on device. It writes output to perf_inject.data.
-# perf_inject.data is a text file, containing branch counts for each library.
-```
-
-Convert LBR data for Userspace & kernel:
-
-```sh
-# pull LBR data to host.
-host $ adb pull /data/local/tmp/branch_list.data
-# download vmlinux and kernel modules to <binary_dir>
-# host simpleperf is in <aosp-top>/aosp/out/host/linux-x86/bin/simpleperf,
-# or you can build simpleperf by `mmma system/extras/simpleperf`.
-host $ simpleperf inject --symdir <binary_dir> -i branch_list.data
-simpleperf inject -i branch_list.data --binary <userspacelibrary> --symdir <symboldir> -o perf_inject.data
-
-```
-The generated perf_inject.data may contain branch info for multiple binaries. But AutoFDO only
-accepts one at a time. So we need to split perf_inject.data.
-The format of perf_inject.data is below:
-
-```perf_inject.data format
-
-executed range with count info for binary1
-branch with count info for binary1
-// name for binary1
-
-executed range with count info for binary2
-branch with count info for binary2
-// name for binary2
-
-...
-```
-
-We need to split perf_inject.data, and make sure one file only contains info for one binary.
-
-Then we can use [AutoFDO](https://github.com/google/autofdo) to create profile. Follow README.md
-in AutoFDO to build create_llvm_prof, then use `create_llvm_prof` to create profiles for clang.
-
-```sh
-# perf_inject_binary1.data is split from perf_inject.data, and only contains branch info for binary1.
-host $ create_llvm_prof -profile perf_inject_binary1.data -profiler text -binary path_of_binary1 -out a.prof -format extbinary
-
-# perf_inject_kernel.data is split from perf_inject.data, and only contains branch info for [kernel.kallsyms].
-host $ create_llvm_prof -profile perf_inject_kernel.data -profiler text -binary vmlinux -out a.prof -format extbinary
-```
-
-Then we can use a.prof for AFDO during compilation, via `-fprofile-sample-use=a.prof`.
-[Here](https://clang.llvm.org/docs/UsersManual.html#using-sampling-profilers) are more details.
-
+# Collect LBR (x86 Architectures) data for AutoFDO
+
+# Table of Contents
+
+- Introduction
+- AFDO Compiler Optimization
+  - Sampling Profiler
+  - Execution Profiles
+  - Limitations for Code Coverage
+  - Generating Sampling Profiles
+  - AFDO Flow Diagram
+- Intel's Performance Monitoring Unit (PMU)
+- Examples
+  - A complete example: autofdo_inline_test.cpp
+- Related docs
+
+## Introduction
+
+The following user guide provides an overview of AFDO compiler
+optimizations, details on Intel Performance Monitoring Units (PMU), and
+instructions for collecting Last Branch Record (LBR) related profiles on
+x86 platforms.
+
+## AFDO Compiler Optimization
+
+**AutoFDO compiler optimizations** are a set of techniques that
+compilers use to improve the performance of software applications.
+These optimizations are driven by profiles collected from hardware
+performance events, such as `BR_INST_RETIRED.NEAR_TAKEN` and
+`cpu-cycles`.
+
+### Sampling Profiler
+
+A sampling profiler can generate a performance profile with very low
+runtime overhead. This profile is crucial for optimization purposes but
+is not suitable for code coverage analysis. The profiler collects data
+by periodically sampling the program's execution, which provides a
+statistical representation of where time is being spent in the code.
+
+### Execution Profiles
+
+Compilers utilize **execution profiles** that consist of basic block and
+edge frequency counts. These profiles guide various optimizations,
+including:
+
+- **Instruction Scheduling**: Reordering instructions to minimize
+ delays and improve pipeline efficiency.
+- **Basic Block Re-ordering**: Rearranging basic blocks to enhance
+ cache performance and reduce branch mispredictions.
+- **Function Splitting**: Moving rarely executed (cold) code out of hot
+ functions to improve instruction cache locality.
+- **Register Allocation**: Efficiently assigning variables to CPU
+ registers to minimize memory access.
+
+These optimizations aim to improve execution speed, reduce resource
+consumption, and enhance the overall efficiency of applications on
+specific hardware configurations.
+
+### Limitations for Code Coverage
+
+While it is technically possible to use sampling profiles for code
+coverage, they are generally too coarse-grained for this purpose.
+Sampling profiles provide a statistical view rather than a precise
+execution trace, leading to poor results in code coverage analysis.
+
+### Generating Sampling Profiles
+
+Sampling profiles must be generated by an external tool; this guide
+uses simpleperf. Once generated, the profile must be converted into a
+format that LLVM can read, using the `create_llvm_prof` tool.
+
+### AFDO Flow Diagram
+
+![AFDO flow diagram](AFDOFlow.png)
+
+## Intel's Performance Monitoring Unit (PMU)
+
+Intel's Performance Monitoring Unit (PMU) is a hardware feature built
+into Intel processors to measure various performance parameters. These
+parameters include instruction cycles, cache hits, cache misses, branch
+misses, and more. The PMU helps in understanding how effectively code
+uses hardware resources and provides insights for optimization.
+
+The PMU includes various performance monitoring features, and the Last
+Branch Record (LBR) is one of them.
+
+The Last Branch Record (LBR) is a CPU feature that logs the source and
+destination addresses of recently executed branch instructions. This
+makes it a useful tool for performance monitoring and debugging,
+because it lets developers track the control flow of their programs.
+By analyzing the data captured through LBR, we can see how
+applications move through their execution paths and pinpoint the
+areas where the program spends most of its time, often referred to
+as "hot paths".
+
+**Branch Statistics**: One of the remarkable applications of LBR is its
+ability to gather comprehensive branch statistics in C++ programs. This
+data can be pivotal in understanding the behavior of conditional
+decisions in the code.
+
+**Virtual Calls**: LBR proves particularly useful for analyzing the
+outcomes of indirect branches and virtual calls, key components in
+object-oriented programming that can significantly influence
+performance.
+
+LBR entries are rich with information, typically consisting of `FROM_IP`
+and `TO_IP`, which denote the source and destination addresses of the
+branching instructions. This detailed logging offers a clear view of the
+program's execution flow.
+
+**Model Specific Registers (MSRs)**: The configuration of LBR relies on
+Model Specific Registers (MSRs) specific to Intel CPUs. These registers
+enable and manage LBR:
+
+- **IA32_DEBUGCTL**: To start LBR recording, set bit 0 of the
+  IA32_DEBUGCTL register to 1.
+- **MSR_LASTBRANCH_x_FROM_IP**: Stores the source addresses of the most
+  recent branch instructions.
+- **MSR_LASTBRANCH_x_TO_IP**: Stores the destination addresses of those
+  branches. Together with the FROM_IP registers, it describes each
+  recorded control transfer.
+
+**Clearing LBRs**: A noteworthy aspect of LBR is that it gets cleared
+when the CPU enters certain low-power sleep states deeper than C2. To
+maintain the integrity of the recorded data, it may be necessary to keep
+the CPU in an awake state.
+
+**Stopping LBR**: Ceasing LBR recording can present challenges and might
+require invoking performance monitoring interrupts (PMIs), introducing
+additional complexity to the management of this feature.
+
+**Advantages**:
+
+- **Overhead**: LBR has minimal overhead; it causes nearly zero
+  performance degradation compared to traditional software-based branch
+  recording methods, which makes it suitable for performance-sensitive
+  applications.
+- **Accuracy**: Manual code instrumentation might yield slightly better
+  precision in certain scenarios, but at the cost of significantly
+  higher runtime overhead, so LBR is often the better choice.
+- **Scenarios**: LBR is particularly useful when the source code is not
+  readily accessible or the software build process is unknown. In such
+  cases, the recorded execution paths still give developers and analysts
+  insight into program behavior.
+
+Simpleperf supports collecting LBR data and converting it to input files
+for AutoFDO, which can then be used for Feedback Directed Optimization
+during compilation.
+
+## Examples
+
+Below are examples of collecting LBR data for AutoFDO. There are two
+steps: first record LBR data, then convert the LBR data to AutoFDO
+input files.
+
+Record LBR data:
+
+``` sh
+# Preparation: we need root access on the device to record LBR data.
+# For initial setup:
+$ adb root
+$ adb remount
+# The device will ask for a reboot for the changes to be applied.
+# Once initial setup is done, only the steps below are needed from then on.
+$ adb root
+$ adb shell
+brya:/ # cd data/local/tmp
+brya:/data/local/tmp #
+
+# Record LBR data. It writes output to perf.data.
+# If you only want LBR data for the kernel, use `-e BR_INST_RETIRED.NEAR_TAKEN:k`.
+# If you only want LBR data for userspace, use `-e BR_INST_RETIRED.NEAR_TAKEN:u`.
+# If you want a system wide collection, use `-e BR_INST_RETIRED.NEAR_TAKEN -a`.
+
+brya:/data/local/tmp # simpleperf record -b -p <processid> -e BR_INST_RETIRED.NEAR_TAKEN:u -c 10003
+
+# If you have a standalone binary, use the command below instead.
+
+brya:/data/local/tmp # simpleperf record -b -e BR_INST_RETIRED.NEAR_TAKEN:u -c 10003 ./<binaryname>
+
+# simpleperf record:
+# The simpleperf record command is used to profile processes and store the profiling data in a file (usually perf.data).
+#
+# -b:
+# This option enables branch recording. It uses the Last Branch Record (LBR) feature of the CPU to capture the
+# most recent branches taken by the processor. This is useful for understanding the control flow of a program.
+#
+# -a:
+# This option tells simpleperf to record system-wide. It collects performance data from all CPUs, not just the one
+# where the command is run. This is useful for capturing a comprehensive view of system performance.
+#
+# -e:
+# This option specifies the event (BR_INST_RETIRED.NEAR_TAKEN in this case) to record.
+#
+# -c:
+# This option sets the sample period: one sample is taken each time the event has occurred this many times.
+
+
+# To reduce file size and time converting to AutoFDO input files, we recommend converting LBR data into an intermediate branch-list format.
+
+brya:/data/local/tmp # simpleperf inject -i perf.data --output branch-list -o branch_list.data
+```
+
+Converting LBR data to AutoFDO input files needs to read binaries. So
+for userspace and kernel libraries, it needs to be converted on host,
+with vmlinux and kernel modules available.
+
+1) Convert LBR data for userspace libraries:
+
+``` sh
+# Convert LBR data on host. It writes output to perf_inject.data.
+# perf_inject.data is a text file, containing branch counts for each library.
+# Host simpleperf is in <aosp-top>/aosp/out/host/linux-x86/bin/simpleperf,
+# or you can build simpleperf by `make simpleperf_ndk`.
+
+host $ adb pull /data/local/tmp/branch_list.data
+
+host $ simpleperf inject -i branch_list.data --binary <binaryorlibraryname> --symdir <aosp-top>/aosp/out/target/product/generic_x86_64/symbols/system/ -o perf_inject.data
+```
+
+2) Convert LBR data for userspace and kernel:
+
+``` sh
+# pull LBR data to host.
+
+host $ adb pull /data/local/tmp/branch_list.data
+
+# download vmlinux and kernel modules to <binary_dir>
+# host simpleperf is in <aosp-top>/aosp/out/host/linux-x86/bin/simpleperf,
+# or you can build simpleperf by `make simpleperf_ndk`.
+
+host $ simpleperf inject -i branch_list.data --binary <userspacebinaryorlibrary> --symdir <symboldir> -o perf_inject.data
+```
+
+The generated perf_inject.data may contain branch info for multiple
+binaries, but AutoFDO only accepts one at a time, so we need to split
+perf_inject.data. The format of perf_inject.data is below:
+
+```
+executed range with count info for binary1
+branch with count info for binary1
+// name for binary1
+
+executed range with count info for binary2
+branch with count info for binary2
+// name for binary2
+
+...
+```
+
+Split perf_inject.data so that each file only contains info for one binary. Then use
+[AutoFDO](https://github.com/google/autofdo) to create the profile: follow README.md
+in AutoFDO to build `create_llvm_prof`, then use it to create profiles for clang.
+
+``` sh
+# perf_inject_binary1.data is split from perf_inject.data, and only contains branch info for binary1.
+host $ create_llvm_prof -profile perf_inject_binary1.data -profiler text -binary path_of_binary1 -out a.afdo -format extbinary
+
+# perf_inject_kernel.data is split from perf_inject.data, and only contains branch info for [kernel.kallsyms].
+host $ create_llvm_prof -profile perf_inject_kernel.data -profiler text -binary vmlinux -out a.afdo -format extbinary
+```
+
+Then we can use a.afdo for AFDO during compilation, via `-fprofile-sample-use=a.afdo`.
+[Here](https://clang.llvm.org/docs/UsersManual.html#using-sampling-profilers) are more details.
+
+### A complete example: autofdo_inline_test.cpp
+
+`autofdo_inline_test.cpp` is an example to show the complete
+process. The source code is in
+[autofdo_inline_test.cpp](https://android.googlesource.com/platform/system/extras/+/main/simpleperf/runtest/autofdo_inline_test.cpp).
+The build script is in
+[Android.bp](https://android.googlesource.com/platform/system/extras/+/main/simpleperf/runtest/Android.bp).
+It builds an executable called `autofdo_inline_test`, which runs on device
+(referred to here as brya).
+
+**Step 1: Build `autofdo_inline_test` binary**
+
+``` sh
+(host) <AOSP>$ source build/envsetup.sh
+(host) <AOSP>$ lunch aosp_x86_64-trunk_staging-userdebug
+(host) <AOSP>$ make autofdo_inline_test
+```
+
+**Step 2: Run `autofdo_inline_test` on brya (for example in a second
+shell) and collect LBR data while it runs**
+
+``` sh
+(host) <AOSP>$ adb push out/target/product/generic_x86_64/system/bin/autofdo_inline_test /data/local/tmp
+(host) <AOSP>$ adb root
+(host) <AOSP>$ adb shell
+(brya) / $ cd /data/local/tmp
+(brya) /data/local/tmp $ chmod a+x autofdo_inline_test
+(brya) /data/local/tmp $ simpleperf record -b -p <processidofautofdobinary> -e BR_INST_RETIRED.NEAR_TAKEN:u
+simpleperf I cmd_record.cpp:840] Recorded for 4.0012 seconds. Start post processing.
+simpleperf I cmd_record.cpp:941] Samples recorded: 7. Samples lost: 0.
+(brya) /data/local/tmp $ simpleperf inject --output branch-list -o branch_list.data
+(brya) /data/local/tmp $ simpleperf inject -i branch_list.data
+(brya) /data/local/tmp $ exit
+(host) <AOSP>$ adb pull /data/local/tmp/perf_inject.data
+```
+
+**Step 3: Convert LBR data to AutoFDO profile**
+
+``` sh
+# Build simpleperf tool on host.
+(host) <AOSP>$ make simpleperf_ndk
+(host) <AOSP>$ cat perf_inject.data
+2
+4160-418d:8
+419d-41bb:9
+3
+4170:1
+419d:1
+41a7:1
+4
+4159->4187:3
+4185->41d2:1
+418d->419d:9
+41bb->4160:11
+// build_id: 0x1631385c6a846e19fd38cec137041c2200000000
+// /data/local/tmp/latest/autofdo_inline_test
+
+(host) <AOSP>$ create_llvm_prof --binary <AOSP>/out/target/product/generic_x86_64/system/bin/autofdo_inline_test --format extbinary --out autofdo_inline_test.afdo --profile perf_inject.data --profiler text
+
+(host) <AOSP>$ ls -lh autofdo_inline_test.afdo
+-rw-rw-rw- 1 root root 1.0K 2025-03-11 10:18 autofdo_inline_test.afdo
+```
+
+**Step 4: Use AutoFDO profile to build optimized binary**
+
+``` sh
+(host) <AOSP>$ cp autofdo_inline_test.afdo toolchain/pgo-profiles/sampling/
+(host) <AOSP>$ vi toolchain/pgo-profiles/sampling/Android.bp
+# Edit Android.bp to add a fdo_profile module:
+#
+# fdo_profile {
+# name: "autofdo_inline_test",
+# profile: "autofdo_inline_test.afdo"
+# }
+(host) <AOSP>$ vi toolchain/pgo-profiles/sampling/afdo_profiles.mk
+# Edit afdo_profiles.mk to add autofdo_inline_test profile mapping:
+#
+# AFDO_PROFILES += keystore2://toolchain/pgo-profiles/sampling:keystore2 \
+# ...
+# server_configurable_flags://toolchain/pgo-profiles/sampling:server_configurable_flags \
+# autofdo_inline_test://toolchain/pgo-profiles/sampling:autofdo_inline_test
+#
+
+(host) <AOSP>$ make autofdo_inline_test
+```
+
+We can check if `autofdo_inline_test.afdo` is used when building autofdo_inline_test binary.
+
+``` sh
+(host) <AOSP>$ gzip -d out/verbose.log.gz
+(host) <AOSP>$ cat out/verbose.log | grep autofdo_inline_test
+ ... -fprofile-sample-use=toolchain/pgo-profiles/sampling/autofdo_inline_test.afdo ...
+```
+
+Comparing the disassembly of
+`out/target/product/generic_x86_64/system/bin/autofdo_inline_test` before and after
+optimizing with AutoFDO data, we can see different preferences in
+inlining, branching, and basic block re-ordering. In addition, we can
+monitor Intel(R) PMU branch monitoring events using simpleperf. The
+tables below compare these events without and with AFDO.
+
+|Intel(R) PerfMon EventName|Without AFDO|With AFDO|% Delta|
+|-|-|-|-|
+|BR_INST_RETIRED.ALL_BRANCHES|25,289,601|25,680,449|2%|
+|BR_MISP_RETIRED.ALL_BRANCHES|2,693,141|2,376,465|-12%|
+|BR_MISP_RETIRED.COND|2,477,232|2,133,468|-14%|
+|BR_MISP_RETIRED.COND_TAKEN|2,136,117|1,897,894|-11%|
+|BR_MISP_RETIRED.INDIRECT|238,063|200,008|-16%|
+|BR_MISP_RETIRED.INDIRECT_CALL|205,970|179,661|-13%|
+|BR_MISP_RETIRED.RET|76,709|72,147|-6%|
+|BACLEARS.ANY|6,217,138|5,761,070|-7%|
+
+|Standard Events|Without AFDO|With AFDO|% Delta|
+|-|-|-|-|
+|cpu-cycles|780,810,870|743,257,553|-5%|
+|context-switches|7,463|6,659|-11%|
+|task-clock (ms)|187,128.967|174,391.7821|-7%|
+
+## Related docs
+
+- [Last Branch Record
+ Stack](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html)
+- [Performance monitoring events supported by Intel Performance
+ Monitoring Units (PMUs)](https://perfmon-events.intel.com/)
+- [AutoFDO tool for converting profile
+ data](https://github.com/google/autofdo)
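Note on the split step in the guide above: it says perf_inject.data must be split so that each file covers one binary, but it does not show how. A minimal sketch of one way to do that in Python follows. It assumes, based on the format description and the Step 3 output, that each binary's section ends with one or more `//` comment lines and that the last of them holds the binary path. The function name `split_perf_inject` and the second section of `SAMPLE` (`libhypothetical.so`) are made up for illustration.

```python
def split_perf_inject(text):
    """Split simpleperf inject text output into one section per binary.

    Assumption: every section ends with one or more `//` comment lines,
    and the last comment line of a section names the binary.
    Returns a dict mapping binary name -> section text, in file order.
    """
    sections = {}
    current = []         # lines of the section being collected
    last_comment = None  # most recent `//` line seen in this section

    def flush():
        nonlocal current, last_comment
        # Data with no trailing comment has no name, so it is dropped.
        if last_comment is not None:
            name = last_comment[2:].strip()
            sections[name] = "\n".join(current) + "\n"
        current, last_comment = [], None

    for line in text.splitlines():
        if not line.strip():
            continue  # blank separator lines carry no data
        if line.startswith("//"):
            current.append(line)
            last_comment = line
        else:
            # A data line after a comment line starts the next binary.
            if last_comment is not None:
                flush()
            current.append(line)
    flush()
    return sections


# First section is the Step 3 output; the second one is hypothetical.
SAMPLE = """\
2
4160-418d:8
419d-41bb:9
3
4170:1
419d:1
41a7:1
4
4159->4187:3
4185->41d2:1
418d->419d:9
41bb->4160:11
// build_id: 0x1631385c6a846e19fd38cec137041c2200000000
// /data/local/tmp/latest/autofdo_inline_test
1
1000-1010:5
0
1
1008->1000:5
// /system/lib64/libhypothetical.so
"""

for name, body in split_perf_inject(SAMPLE).items():
    print(name, len(body.splitlines()))
```

Each returned section can be written to its own file and passed to `create_llvm_prof` through `-profile`, together with the matching `-binary`.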
diff --git a/simpleperf/runtest/Android.bp b/simpleperf/runtest/Android.bp
index cbc10c7..6723702 100644
--- a/simpleperf/runtest/Android.bp
+++ b/simpleperf/runtest/Android.bp
@@ -115,6 +115,13 @@
],
}
+// Used as an example in collect_lbr_data_for_autofdo.md.
+cc_binary {
+ name: "autofdo_inline_test",
+ srcs: ["autofdo_inline_test.cpp"],
+ afdo: true,
+}
+
cc_binary {
name: "autofdo_addr_test",
srcs: ["autofdo_addr_test.cpp"],
diff --git a/simpleperf/runtest/autofdo_inline_test.cpp b/simpleperf/runtest/autofdo_inline_test.cpp
new file mode 100644
index 0000000..813daf6
--- /dev/null
+++ b/simpleperf/runtest/autofdo_inline_test.cpp
@@ -0,0 +1,61 @@
+/*
+ * Copyright (C) 2025 The Android Open Source Project
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <unistd.h> // For usleep function
+
+typedef uint32_t UInt32;
+typedef uint16_t UInt16;
+
+void cond_branch_example_function(uint16_t* prob, uint8_t* buf) {
+  UInt32 range = 0xFFFFFFFF;
+  UInt32 code = 0;
+  UInt32 bound;
+  UInt16 ttt;
+  UInt32 symbol = 0;
+
+  do {
+    ttt = *(prob + symbol);
+    if (range < ((UInt32)1 << 24)) {
+      range <<= 8;
+      code = (code << 8) | (*buf++);
+    }
+    bound = (range >> 11) * ttt;
+    if (code < bound) {  // <== This is the mispredicted branch (conditional branch)
+      range = bound;
+      *(prob + symbol) = (UInt16)(ttt + (((1 << 11) - ttt) >> 5));
+      symbol = (symbol + symbol);
+    } else {
+      range -= bound;
+      code -= bound;
+      *(prob + symbol) = (UInt16)(ttt - (ttt >> 5));
+      symbol = (symbol + symbol) + 1;
+    }
+  } while (symbol < 0x100);  // conditional branch
+}
+
+int main() {
+  uint16_t prob[256] = {0};
+  uint8_t buf[256] = {0};
+
+  usleep(15000000);
+
+  // Call the conditional branch example function
+  cond_branch_example_function(prob, buf);
+
+  return 0;
+}
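Note on the test program above: `main()` passes all-zero `prob` and `buf` arrays, so it may be worth checking which arm of the `code < bound` branch that input exercises. The sketch below is a Python transliteration of the loop and is not part of the change. The name `trace_cond_branch`, the `max_iters` guard, and the masks that stand in for `UInt32`/`UInt16` wraparound are this sketch's own.

```python
MASK32 = 0xFFFFFFFF  # models UInt32 wraparound
MASK16 = 0xFFFF      # models the (UInt16) casts


def trace_cond_branch(prob, buf, max_iters=1000):
    """Python port of cond_branch_example_function that records outcomes.

    Returns one boolean per loop iteration: True when the `code < bound`
    arm runs, False when the else arm runs. `max_iters` is a guard added
    by this sketch so that non-terminating inputs cannot loop forever.
    """
    rng, code, symbol, pos = MASK32, 0, 0, 0
    outcomes = []
    for _ in range(max_iters):  # models do { ... } while (symbol < 0x100)
        ttt = prob[symbol]
        if rng < (1 << 24):
            rng = (rng << 8) & MASK32
            code = ((code << 8) | buf[pos]) & MASK32
            pos += 1
        bound = ((rng >> 11) * ttt) & MASK32
        taken = code < bound
        outcomes.append(taken)
        if taken:
            rng = bound
            prob[symbol] = (ttt + (((1 << 11) - ttt) >> 5)) & MASK16
            symbol = symbol + symbol
        else:
            rng = (rng - bound) & MASK32
            code = (code - bound) & MASK32
            prob[symbol] = (ttt - (ttt >> 5)) & MASK16
            symbol = symbol + symbol + 1
        if symbol >= 0x100:
            break
    return outcomes


# Same inputs as main(): 256 zeroed probabilities and 256 zeroed bytes.
outcomes = trace_cond_branch([0] * 256, [0] * 256)
print(len(outcomes), any(outcomes))
```

For this input the port runs 9 iterations and never enters the `code < bound` arm, because `bound` is always 0. If the goal is to profile a hard-to-predict branch, non-trivial `prob` and `buf` contents may be needed; that is a suggestion based on this sketch only, not on a measurement of the real binary.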