blob: b3fea3e9e4b8c8192a30bc659a5d05e9cf9caf56 [file] [log] [blame]
Demonstrations of funcinterval, the Linux eBPF/bcc version.
eBPF/bcc is very suitable for platform performance tuning.
By funclatency, we can profile specific functions to know how latency
this function costs. However, sometimes performance drop is not about the
latency of function but the interval between function calls.
funcinterval is born for this purpose.
Another story, hardware performance tuning on the platform we will use
protocol analyzer to analyze performance, but most protocol analyzers lack
the distribution feature. Using a protocol analyzer you need a lot of time
to check every detail latency. By funcinterval, we can save a lot of time
by distribution feature.
For example:
# ./funcinterval xhci_ring_ep_doorbell -d 2 -u
Tracing 1 functions for "xhci_ring_ep_doorbell"... Hit Ctrl-C to end.
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 134 | |
64 -> 127 : 2862 |******************** |
128 -> 255 : 5552 |****************************************|
256 -> 511 : 216 |* |
512 -> 1023 : 2 | |
Detaching...
This example output shows that the interval latency of xhci_ring_ep_doorbell
took between 64 and 255 microseconds. USB MAC will start its job after USB
doorbell register ringing, above information that can help hardware engineer to
analyze, the performance drop is because software rings the doorbell too
late or just slowly hardware DMA.
# ./funcinterval blk_start_request -i 2 -u
Tracing 1 functions for "blk_start_request"... Hit Ctrl-C to end.
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 5 |* |
16 -> 31 : 0 | |
32 -> 63 : 1 | |
64 -> 127 : 2 | |
128 -> 255 : 1 | |
256 -> 511 : 0 | |
512 -> 1023 : 1 | |
1024 -> 2047 : 1 | |
2048 -> 4095 : 117 |****************************************|
4096 -> 8191 : 13 |**** |
8192 -> 16383 : 1 | |
If using biolatency tool that has no difference between two platforms.
Maybe the problem is related to the interval time instead of block
device I/O latency.
# ./funcinterval ion_ioctl -i 2 -m
Tracing 1 functions for "ion_ioctl"... Hit Ctrl-C to end.
msecs : count distribution
0 -> 1 : 215 |****************************************|
2 -> 3 : 0 | |
4 -> 7 : 4 | |
8 -> 15 : 5 | |
16 -> 31 : 29 |***** |
You can also check the ion_ioctl. By the above output, we know the activity
frequency of ion_ioctl() is high mostly(less than 1 ms), but has 29 times low
frequency.
# ./funcinterval t:block:block_bio_queue -d 30 -u
Tracing tracepoint for "block:block_bio_queue"... Hit Ctrl-C to end.
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 2 | |
16 -> 31 : 262 | |
32 -> 63 : 9075 |******************* |
64 -> 127 : 18668 |****************************************|
128 -> 255 : 1492 |*** |
256 -> 511 : 2616 |***** |
512 -> 1023 : 7226 |*************** |
1024 -> 2047 : 8982 |******************* |
2048 -> 4095 : 2394 |***** |
4096 -> 8191 : 163 | |
8192 -> 16383 : 42 | |
16384 -> 32767 : 2 | |
32768 -> 65535 : 0 | |
65536 -> 131071 : 0 | |
131072 -> 262143 : 0 | |
262144 -> 524287 : 0 | |
524288 -> 1048575 : 1 | |
Detaching...
# ./funcinterval t:block:block_rq_issue -d 30 -u
Tracing tracepoint for "block:block_rq_issue"... Hit Ctrl-C to end.
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 5 | |
32 -> 63 : 18 | |
64 -> 127 : 32 | |
128 -> 255 : 95 | |
256 -> 511 : 2194 |****** |
512 -> 1023 : 13830 |****************************************|
1024 -> 2047 : 9001 |************************** |
2048 -> 4095 : 1569 |**** |
4096 -> 8191 : 96 | |
8192 -> 16383 : 17 | |
Detaching...
funcinterval also supports tracepoint filter. The above two cases are under EMMC
throughput testing, by those results you know which layer has a slower interval
time. In our case, mmc-cmdqd is slower than block layer.
# ./funcinterval -p `pidof -s top` c:malloc -i 3
Tracing uprobe function for "malloc"... Hit Ctrl-C to end.
nsecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 0 | |
1024 -> 2047 : 0 | |
2048 -> 4095 : 0 | |
4096 -> 8191 : 7 |************************* |
8192 -> 16383 : 11 |****************************************|
16384 -> 32767 : 4 |************** |
32768 -> 65535 : 1 |*** |
65536 -> 131071 : 1 |*** |
131072 -> 262143 : 1 |*** |
262144 -> 524287 : 0 | |
524288 -> 1048575 : 0 | |
1048576 -> 2097151 : 0 | |
2097152 -> 4194303 : 0 | |
4194304 -> 8388607 : 1 |*** |
nsecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 0 | |
1024 -> 2047 : 0 | |
2048 -> 4095 : 0 | |
4096 -> 8191 : 8 |******************************** |
8192 -> 16383 : 10 |****************************************|
16384 -> 32767 : 4 |**************** |
32768 -> 65535 : 1 |**** |
65536 -> 131071 : 1 |**** |
131072 -> 262143 : 1 |**** |
262144 -> 524287 : 0 | |
524288 -> 1048575 : 0 | |
1048576 -> 2097151 : 0 | |
2097152 -> 4194303 : 0 | |
4194304 -> 8388607 : 1 |**** |
Time the interval of libc's malloc for top utility every 3 seconds.
# ./funcinterval /usr/local/bin/python:main
Tracing uprobe function for "main"... Hit Ctrl-C to end.
^C
nsecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 0 | |
1024 -> 2047 : 0 | |
2048 -> 4095 : 0 | |
4096 -> 8191 : 0 | |
8192 -> 16383 : 0 | |
16384 -> 32767 : 0 | |
32768 -> 65535 : 0 | |
65536 -> 131071 : 0 | |
131072 -> 262143 : 0 | |
262144 -> 524287 : 0 | |
524288 -> 1048575 : 0 | |
1048576 -> 2097151 : 0 | |
2097152 -> 4194303 : 0 | |
4194304 -> 8388607 : 0 | |
8388608 -> 16777215 : 0 | |
16777216 -> 33554431 : 0 | |
33554432 -> 67108863 : 0 | |
67108864 -> 134217727 : 0 | |
134217728 -> 268435455 : 0 | |
268435456 -> 536870911 : 1 |****************************************|
536870912 -> 1073741823 : 1 |****************************************|
1073741824 -> 2147483647 : 1 |****************************************|
2147483648 -> 4294967295 : 1 |****************************************|
Detaching...
Time the interal of python's main function.
USAGE message:
# ./funcinterval -h
usage: funcinterval [-h] [-p PID] [-i INTERVAL] [-d DURATION] [-T] [-u] [-m]
[-v]
pattern
Time interval and print latency as a histogram
positional arguments:
pattern Function name for tracing
optional arguments:
-h, --help show this help message and exit
-p PID, --pid PID trace this PID only
-i INTERVAL, --interval INTERVAL
summary interval, in seconds
-d DURATION, --duration DURATION
total duration of trace, in seconds
-T, --timestamp include timestamp on output
-u, --microseconds microsecond histogram
-m, --milliseconds millisecond histogram
-v, --verbose print the BPF program (for debugging purposes)
examples:
# time the interval of do_sys_open()
./funcinterval do_sys_open
# time the interval of xhci_ring_ep_doorbell(), in microseconds
./funcinterval -u xhci_ring_ep_doorbell
# time the interval of do_nanosleep(), in milliseconds
./funcinterval -m do_nanosleep
# output every 5 seconds, with timestamps
./funcinterval -mTi 5 vfs_read
# time process 181 only
./funcinterval -p 181 vfs_read
# time the interval of mm_vmscan_direct_reclaim_begin tracepoint
./funcinterval t:vmscan:mm_vmscan_direct_reclaim_begin
# time the interval of c:malloc used by top every 3 seconds
./funcinterval -p `pidof -s top` -i 3 c:malloc
# time /usr/local/bin/python main function
./funcinterval /usr/local/bin/python:main