Demonstrations of drsnoop, the Linux eBPF/bcc version.
drsnoop traces direct reclaim events system-wide, and prints details about each one.
Example output:
# ./drsnoop
COMM PID LAT(ms) PAGES
summond 17678 0.19 143
summond 17669 0.55 313
summond 17669 0.15 145
summond 17669 0.27 237
summond 17669 0.48 111
summond 17669 0.16 75
head 17821 0.29 339
head 17825 0.17 109
summond 17669 0.14 73
summond 17496 104.84 40
summond 17678 0.32 167
summond 17678 0.14 106
summond 17678 0.16 67
summond 17678 0.29 267
summond 17678 0.27 69
summond 17678 0.32 46
base64 17816 0.16 85
summond 17678 0.43 283
summond 17678 0.14 182
head 17736 0.57 135
^C
While tracing, processes were allocating pages. Because the system did not have
enough free memory, direct reclaim events occurred, which added to the time those
processes spent waiting.
drsnoop is useful when the allocstall counter in /proc/vmstat keeps increasing and
you want to determine whether critical processes are responsible.
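To check whether allocstall is actually increasing before reaching for drsnoop,
the counter can be sampled from /proc/vmstat. The following is a minimal Python
sketch, not part of drsnoop. The function names are hypothetical, and it assumes
that the kernel exposes either a single "allocstall" line or per-zone lines such
as "allocstall_normal", so it sums every counter whose name starts with
"allocstall".

```python
def allocstall_total(vmstat_text):
    """Sum every allocstall* counter found in /proc/vmstat-style text."""
    total = 0
    for line in vmstat_text.splitlines():
        parts = line.split()
        # Each /proc/vmstat line is "<name> <value>".
        if len(parts) == 2 and parts[0].startswith("allocstall"):
            total += int(parts[1])
    return total


def read_allocstall(path="/proc/vmstat"):
    """Read the current allocstall total from the running system."""
    with open(path) as f:
        return allocstall_total(f.read())


# Illustrative input only; these values are made up, not measured.
sample = (
    "nr_free_pages 21565\n"
    "allocstall_dma 0\n"
    "allocstall_normal 12\n"
    "allocstall_movable 30\n"
    "pgscan_direct 99\n"
)
sample_total = allocstall_total(sample)
```

Calling read_allocstall() twice a few seconds apart and comparing the results
shows whether direct reclaim is still occurring. drsnoop can then attribute the
stalls to individual processes.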
The -p option can be used to filter on a PID, which is filtered in-kernel. Here
I've used it with -T to print timestamps:
# ./drsnoop -Tp 17491
TIME(s) COMM PID LAT(ms) PAGES
107.364115000 summond 17491 0.24 50
107.364550000 summond 17491 0.26 38
107.365266000 summond 17491 0.36 72
107.365753000 summond 17491 0.22 49
^C
This shows the summond process allocating pages and entering direct reclaim. The
resulting delays are small (under 0.4 ms each).
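To judge the delays across a longer capture, the output can be summarized per
PID. The following Python sketch is an illustration, not a drsnoop feature. The
summarize() helper is hypothetical, and it assumes the five-column layout printed
by -T (TIME(s) COMM PID LAT(ms) PAGES) with no spaces in the process name.

```python
from collections import defaultdict


def summarize(output):
    """Return {pid: (count, max_ms, mean_ms)} from captured `drsnoop -T` text."""
    latencies = defaultdict(list)
    for line in output.splitlines():
        fields = line.split()
        # Skip the header, the ^C line, and anything that is not a data row.
        if len(fields) != 5 or not fields[2].isdigit():
            continue
        latencies[int(fields[2])].append(float(fields[3]))
    return {
        pid: (len(vals), max(vals), sum(vals) / len(vals))
        for pid, vals in latencies.items()
    }


# The rows below are copied from the -Tp 17491 example above.
captured = """\
TIME(s) COMM PID LAT(ms) PAGES
107.364115000 summond 17491 0.24 50
107.364550000 summond 17491 0.26 38
107.365266000 summond 17491 0.36 72
107.365753000 summond 17491 0.22 49
^C
"""
stats = summarize(captured)
```

For this capture, stats holds one entry for PID 17491 covering four events, with
a maximum latency of 0.36 ms.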
The -U option includes the UID in the output:
# ./drsnoop -U
UID COMM PID LAT(ms) PAGES
1000 summond 17678 0.32 46
0 base64 17816 0.16 85
1000 summond 17678 0.43 283
1000 summond 17678 0.14 182
0 head 17821 0.29 339
0 head 17825 0.17 109
^C
The -u option filters on a UID. Here it is combined with -U to print the UID column:
# ./drsnoop -Uu 1000
UID COMM PID LAT(ms) PAGES
1000 summond 17678 0.19 143
1000 summond 17669 0.55 313
1000 summond 17669 0.15 145
1000 summond 17669 0.27 237
1000 summond 17669 0.48 111
1000 summond 17669 0.16 75
1000 summond 17669 0.14 73
1000 summond 17678 0.32 167
^C
A maximum tracing duration can be set with the -d option. For example, to trace
for 2 seconds:
# ./drsnoop -d 2
COMM PID LAT(ms) PAGES
head 21715 0.15 195
The -n option can be used to filter on process name using partial matches:
# ./drsnoop -n mond
COMM PID LAT(ms) PAGES
summond 10271 0.03 51
summond 10271 0.03 51
summond 10259 0.05 51
summond 10269 319.41 37
summond 10270 111.73 35
summond 10270 0.11 78
summond 10270 0.12 71
summond 10270 0.03 35
summond 10277 111.62 41
summond 10277 0.08 45
summond 10277 0.06 32
^C
This caught the 'summond' command because its name contains 'mond', the string
passed to the '-n' option.
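The partial match described above behaves like a plain substring test. The
following Python sketch shows the idea. It is an assumption about the behavior
based on the help text ("only print process names containing this name"), not a
copy of drsnoop's code, and filter_by_name() is a hypothetical helper.

```python
def filter_by_name(rows, name):
    """Keep (comm, pid) rows whose COMM contains `name` anywhere."""
    return [row for row in rows if name in row[0]]


# Process names and PIDs taken from the examples in this file.
rows = [("summond", 10271), ("head", 21715), ("base64", 17816)]
matched = filter_by_name(rows, "mond")
```

Here matched contains only the summond row, since neither 'head' nor 'base64'
contains 'mond'.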
The -v option can be used to show the system memory state (currently only free
memory) at the start of each direct reclaim:
# ./drsnoop.py -v
COMM PID LAT(ms) PAGES FREE(KB)
base64 34924 0.23 151 86260
base64 34962 0.26 149 86260
head 34931 0.24 150 86260
base64 34902 0.19 148 86260
head 34963 0.19 151 86228
base64 34959 0.17 151 86228
head 34965 0.29 190 86228
base64 34957 0.24 152 86228
summond 34870 0.15 151 86080
summond 34870 0.12 115 86184
USAGE message:
# ./drsnoop -h
usage: drsnoop.py [-h] [-T] [-U] [-p PID] [-t TID] [-u UID] [-d DURATION]
                  [-n NAME]
Trace direct reclaim
optional arguments:
  -h, --help            show this help message and exit
  -T, --timestamp       include timestamp on output
  -U, --print-uid       print UID column
  -p PID, --pid PID     trace this PID only
  -t TID, --tid TID     trace this TID only
  -u UID, --uid UID     trace this UID only
  -d DURATION, --duration DURATION
                        total duration of trace in seconds
  -n NAME, --name NAME  only print process names containing this name
examples:
    ./drsnoop           # trace all direct reclaim
    ./drsnoop -T        # include timestamps
    ./drsnoop -U        # include UID
    ./drsnoop -p 181    # only trace PID 181
    ./drsnoop -t 123    # only trace TID 123
    ./drsnoop -u 1000   # only trace UID 1000
    ./drsnoop -d 10     # trace for 10 seconds only
    ./drsnoop -n main   # only print process names containing "main"