blob: 6b5b8719f11906160efa9da495db65d0d3f64e37 [file] [log] [blame]
Demonstrations of kvm exit reasons, the Linux eBPF/bcc version.
Considering virtual machines' frequent exits can cause performance problems,
this tool aims to locate the frequent exited reasons and then find solutions
to reduce or even avoid the exit, by displaying the detail exit reasons and
the counts of each vm exit for all vms running on one physical machine.
Features of this tool
=====================
- Although there is a patch: [KVM: x86: add full vm-exit reason debug entries]
(https://patchwork.kernel.org/project/kvm/patch/1555939499-30854-1-git-send-email-pizhenwei@bytedance.com/)
trying to fill more vm-exit reason debug entries, just as the comments said,
the code allocates lots of memory that may never be consumed, misses some
arch-specific kvm causes, and can not do kernel aggregation. Instead bcc, as
a user space tool, can implement all these functions more easily and flexibly.
- The bcc python logic could provide nice kernel aggregation and custom output,
like collpasing all tids for one pid (e.i. one vm's qemu process id) with exit
reasons sorted in descending order. For more information, see the following
#USAGE message.
- The bpf in-kernel percpu_array and percpu_cache further improves performance.
For more information, see the following #Help to understand.
Limited
=======
In view of the hardware-assisted virtualization technology of
different architectures, currently we only adapt on vmx in intel.
And the amd feature is on the road..
Example output:
===============
# ./kvmexit.py
Display kvm exit reasons and statistics for all threads... Hit Ctrl-C to end.
PID TID KVM_EXIT_REASON COUNT
^C1273551 1273568 EXIT_REASON_HLT 12
1273551 1273568 EXIT_REASON_MSR_WRITE 6
1274253 1274261 EXIT_REASON_EXTERNAL_INTERRUPT 1
1274253 1274261 EXIT_REASON_HLT 12
1274253 1274261 EXIT_REASON_MSR_WRITE 4
# ./kvmexit.py 6
Display kvm exit reasons and statistics for all threads after sleeping 6 secs.
PID TID KVM_EXIT_REASON COUNT
1273903 1273922 EXIT_REASON_EXTERNAL_INTERRUPT 175
1273903 1273922 EXIT_REASON_CPUID 10
1273903 1273922 EXIT_REASON_HLT 6043
1273903 1273922 EXIT_REASON_IO_INSTRUCTION 24
1273903 1273922 EXIT_REASON_MSR_WRITE 15025
1273903 1273922 EXIT_REASON_PAUSE_INSTRUCTION 11
1273903 1273922 EXIT_REASON_EOI_INDUCED 12
1273903 1273922 EXIT_REASON_EPT_VIOLATION 6
1273903 1273922 EXIT_REASON_EPT_MISCONFIG 380
1273903 1273922 EXIT_REASON_PREEMPTION_TIMER 194
1273551 1273568 EXIT_REASON_EXTERNAL_INTERRUPT 18
1273551 1273568 EXIT_REASON_HLT 989
1273551 1273568 EXIT_REASON_IO_INSTRUCTION 10
1273551 1273568 EXIT_REASON_MSR_WRITE 2205
1273551 1273568 EXIT_REASON_PAUSE_INSTRUCTION 1
1273551 1273568 EXIT_REASON_EOI_INDUCED 5
1273551 1273568 EXIT_REASON_EPT_MISCONFIG 61
1273551 1273568 EXIT_REASON_PREEMPTION_TIMER 14
# ./kvmexit.py -p 1273795 5
Display kvm exit reasons and statistics for PID 1273795 after sleeping 5 secs.
KVM_EXIT_REASON COUNT
MSR_WRITE 13467
HLT 5060
PREEMPTION_TIMER 345
EPT_MISCONFIG 264
EXTERNAL_INTERRUPT 169
EPT_VIOLATION 18
PAUSE_INSTRUCTION 6
IO_INSTRUCTION 4
EOI_INDUCED 2
# ./kvmexit.py -p 1273795 5 -a
Display kvm exit reasons and statistics for PID 1273795 and its all threads after sleeping 5 secs.
TID KVM_EXIT_REASON COUNT
1273819 EXTERNAL_INTERRUPT 64
1273819 HLT 2802
1273819 IO_INSTRUCTION 4
1273819 MSR_WRITE 7196
1273819 PAUSE_INSTRUCTION 2
1273819 EOI_INDUCED 2
1273819 EPT_VIOLATION 6
1273819 EPT_MISCONFIG 162
1273819 PREEMPTION_TIMER 194
1273820 EXTERNAL_INTERRUPT 78
1273820 HLT 2054
1273820 MSR_WRITE 5199
1273820 EPT_VIOLATION 2
1273820 EPT_MISCONFIG 77
1273820 PREEMPTION_TIMER 102
# ./kvmexit.py -p 1273795 -v 0
Display kvm exit reasons and statistics for PID 1273795 VCPU 0... Hit Ctrl-C to end.
KVM_EXIT_REASON COUNT
^CMSR_WRITE 2076
HLT 795
PREEMPTION_TIMER 86
EXTERNAL_INTERRUPT 20
EPT_MISCONFIG 10
PAUSE_INSTRUCTION 2
IO_INSTRUCTION 2
EPT_VIOLATION 1
EOI_INDUCED 1
# ./kvmexit.py -p 1273795 -v 0 4
Display kvm exit reasons and statistics for PID 1273795 VCPU 0 after sleeping 4 secs.
KVM_EXIT_REASON COUNT
MSR_WRITE 4726
HLT 1827
PREEMPTION_TIMER 78
EPT_MISCONFIG 67
EXTERNAL_INTERRUPT 28
IO_INSTRUCTION 4
EOI_INDUCED 2
PAUSE_INSTRUCTION 2
# ./kvmexit.py -p 1273795 -v 4 4
Traceback (most recent call last):
File "tools/kvmexit.py", line 306, in <module>
raise Exception("There's no v%s for PID %d." % (tgt_vcpu, args.pid))
Exception: There's no vCPU 4 for PID 1273795.
# ./kvmexit.py -t 1273819 10
Display kvm exit reasons and statistics for TID 1273819 after sleeping 10 secs.
KVM_EXIT_REASON COUNT
MSR_WRITE 13318
HLT 5274
EPT_MISCONFIG 263
PREEMPTION_TIMER 171
EXTERNAL_INTERRUPT 109
IO_INSTRUCTION 8
PAUSE_INSTRUCTION 5
EOI_INDUCED 4
EPT_VIOLATION 2
# ./kvmexit.py -T '1273820,1273819'
Display kvm exit reasons and statistics for TIDS ['1273820', '1273819']... Hit Ctrl-C to end.
TIDS KVM_EXIT_REASON COUNT
^C1273819 EXTERNAL_INTERRUPT 300
1273819 HLT 13718
1273819 IO_INSTRUCTION 26
1273819 MSR_WRITE 37457
1273819 PAUSE_INSTRUCTION 13
1273819 EOI_INDUCED 13
1273819 EPT_VIOLATION 53
1273819 EPT_MISCONFIG 654
1273819 PREEMPTION_TIMER 958
1273820 EXTERNAL_INTERRUPT 212
1273820 HLT 9002
1273820 MSR_WRITE 25495
1273820 PAUSE_INSTRUCTION 2
1273820 EPT_VIOLATION 64
1273820 EPT_MISCONFIG 396
1273820 PREEMPTION_TIMER 268
Help to understand
==================
We use a PERCPU_ARRAY: pcpuArrayA and a percpu_hash: hashA to collaboratively
store each kvm exit reason and its count. The reason is there exists a rule when
one vcpu exits and re-enters, it tends to continue to run on the same physical
cpu (pcpu as follows) as the last cycle, which is also called 'cache hit'. Thus
we turn to use a PERCPU_ARRAY to record the 'cache hit' situation to speed
things up; and for other cases, then use a percpu_hash.
BTW, we originally use a common hash to do this, with a u64(exit_reason)
key and a struct exit_info {tgid_pid, exit_reason} value. But due to
the big lock in bpf_hash, each updating is quite performance consuming.
Now imagine here is a pid_tgidA (vcpu A) exits and is going to run on
pcpuArrayA, the BPF code flow is as follows:
pid_tgidA keeps running on the same pcpu
// \\
// \\
// Y N \\
// \\
a. cache_hit b. cache_miss
(cacheA's pid_tgid matches pid_tgidA) ||
| ||
| ||
"increase percpu exit_ct and return" ||
[*Note*] ||
pid_tgidA ever been exited on pcpuArrayA?
// \\
// \\
// \\
// Y N \\
// \\
b.a load_last_hashA b.b initialize_hashA_with_zero
\ /
\ /
\ /
"increase percpu exit_ct"
||
||
is another pid_tgid been running on pcpuArrayA?
// \\
// Y N \\
// \\
b.*.a save_theLastHit_hashB do_nothing
\\ //
\\ //
\\ //
b.* save_to_pcpuArrayA
[*Note*] we do not update the table in above "a.", in case the vcpu hit the same
pcpu again when exits next time, instead we only update until this pcpu is not
hitted by the same tgidpid(vcpu) again, which is in "b.*.a" and "b.*".
USAGE message:
==============
# ./kvmexit.py -h
usage: kvmexit.py [-h] [-p PID [-v VCPU | -a] ] [-t TID | -T 'TID1,TID2'] [duration]
Display kvm_exit_reason and its statistics at a timed interval
optional arguments:
-h, --help show this help message and exit
-p PID, --pid PID display process with this PID only, collpase all tids with exit reasons sorted in descending order
-v VCPU, --v VCPU display this VCPU only for this PID
-a, --alltids display all TIDS for this PID
-t TID, --tid TID display thread with this TID only with exit reasons sorted in descending order
-T 'TID1,TID2', --tids 'TID1,TID2'
display threads for a union like {395490, 395491}
duration duration of display, after sleeping several seconds
examples:
./kvmexit # Display kvm_exit_reason and its statistics in real-time until Ctrl-C
./kvmexit 5 # Display in real-time after sleeping 5s
./kvmexit -p 3195281 # Collpase all tids for pid 3195281 with exit reasons sorted in descending order
./kvmexit -p 3195281 20 # Collpase all tids for pid 3195281 with exit reasons sorted in descending order, and display after sleeping 20s
./kvmexit -p 3195281 -v 0 # Display only vcpu0 for pid 3195281, descending sort by default
./kvmexit -p 3195281 -a # Display all tids for pid 3195281
./kvmexit -t 395490 # Display only for tid 395490 with exit reasons sorted in descending order
./kvmexit -t 395490 20 # Display only for tid 395490 with exit reasons sorted in descending order after sleeping 20s
./kvmexit -T '395490,395491' # Display for a union like {395490, 395491}