lmkd: Allow killing perceptible apps when recorded stall is too high

When system is under heavy memory pressure the system might be able
to keep free memory above the min watermark avoiding perceptible app
kills. In such situation system might end up using all its cpu
capacity on memory reclaim and not doing productive work. To detect
this condition, check memory full stall and compare it with the new
ro.lmk.stall_limit_critical tunable representing the stall threshold.
When the recorded level is over ro.lmk.stall_limit_critical, lmkd will
be allowed to kill perceptible apps. ro.lmk.stall_limit_critical
represents the max memory full stall in % that is allowed before
perceptible apps will get killed. By default it is set to 100%, which
effectively disables the feature.
Currently system stall is measured based on psi memory stall 10s average
value, however this definition might change in the future if better
metrics are developed. Setting ro.lmk.stall_limit_critical to 5 means
the system should be fully stalled (no productive work is done) for 5%
of the 10sec period, resulting in 0.5 sec loss due to the stall.

Bug: 205182133
Test: verify on heavy memory pressure test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Martin Liu <liumartin@google.com>
Change-Id: I9713e30d82641d86d1b7edb5e1ba2971b935c898
Merged-In: I9713e30d82641d86d1b7edb5e1ba2971b935c898
3 files changed
tree: 17daec82f0919a7149cf220ca2c08a5b314d33eb
  1. include/
  2. libpsi/
  3. tests/
  4. Android.bp
  5. event.logtags
  6. liblmkd_utils.cpp
  7. lmkd.cpp
  8. lmkd.rc
  9. OWNERS
  10. README.md
  11. statslog.cpp
  12. statslog.h
README.md

Android Low Memory Killer Daemon

Introduction

Android Low Memory Killer Daemon (lmkd) is a process monitoring memory state of a running Android system and reacting to high memory pressure by killing the least essential process(es) to keep system performing at acceptable levels.

Background

Historically on Android systems memory monitoring and killing of non-essential processes was handled by a kernel lowmemorykiller driver. Since Linux Kernel 4.12 the lowmemorykiller driver has been removed and instead userspace lmkd daemon performs these tasks.

Android Properties

lmkd can be configured on a particular system using the following Android properties:

ro.config.low_ram: choose between low-memory vs high-performance device. Default = false.

ro.lmk.use_minfree_levels: use free memory and file cache thresholds for making decisions when to kill. This mode works the same way kernel lowmemorykiller driver used to work. Default = false

ro.lmk.low: min oom_adj score for processes eligible to be killed at low vmpressure level. Default = 1001 (disabled)

ro.lmk.medium: min oom_adj score for processes eligible to be killed at medium vmpressure level. Default = 800 (non-essential processes)

ro.lmk.critical: min oom_adj score for processes eligible to be killed at critical vmpressure level. Default = 0 (all processes)

ro.lmk.critical_upgrade: enables upgrade to critical level. Default = false

ro.lmk.upgrade_pressure: max mem_pressure at which level will be upgraded because system is swapping too much. Default = 100 (disabled)

ro.lmk.downgrade_pressure: min mem_pressure at which vmpressure event will be ignored because enough free memory is still available. Default = 100 (disabled)

ro.lmk.kill_heaviest_task: kill heaviest eligible task (best decision) vs. any eligible task (fast decision). Default = false

ro.lmk.kill_timeout_ms: duration in ms after a kill when no additional kill will be done. Default = 0 (disabled)

ro.lmk.debug: enable lmkd debug logs, Default = false

ro.lmk.swap_free_low_percentage: level of free swap as a percentage of the total swap space used as a threshold to consider the system as swap space starved. Default for low-RAM devices = 10, for high-end devices = 20

ro.lmk.thrashing_limit: number of workingset refaults as a percentage of the file-backed pagecache size used as a threshold to consider system thrashing its pagecache. Default for low-RAM devices = 30, for high-end devices = 100

ro.lmk.thrashing_limit_decay: thrashing threshold decay expressed as a percentage of the original threshold used to lower the threshold when system does not recover even after a kill. Default for low-RAM devices = 50, for high-end devices = 10

ro.lmk.psi_partial_stall_ms: partial PSI stall threshold in milliseconds for triggering low memory notification. Default for low-RAM devices = 200, for high-end devices = 70

ro.lmk.psi_complete_stall_ms: complete PSI stall threshold in milliseconds for triggering critical memory notification. Default = 700

lmkd will set the following Android properties according to current system configurations:

sys.lmk.minfree_levels: minfree:oom_adj_score pairs, delimited by comma

sys.lmk.reportkills: whether or not it supports reporting process kills to clients. Test app should check this property before testing low memory kill notification. Default will be unset.