Ensure all processes are killed and reaped before unmounting

During shutdown, init stops services by sending SIGTERM or SIGKILL to
the process groups they lead. By default, a service's child processes
belong to the service's process group, so stopping the service kills
everything under it.

However, this assumption does not always hold. A process can create a
new process group by calling setpgid(2), in which case it is not killed
along with its parent (the service) and outlives it during shutdown.
Such a process is killed only at the very last moment, when init issues
`echo i > /proc/sysrq-trigger`. To make things worse, init does not reap
these rogue processes, presumably because this is an emergency path.
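
A minimal Linux sketch in C of the gap described above, as an
illustration rather than init's code. Assumptions: the calling process
marks itself a child subreaper with prctl(PR_SET_CHILD_SUBREAPER) to
stand in for init, pause() stands in for the service's real work, and
spawn_service() and rogue_survives_group_kill() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical scenario: forks a "service" that leads its own process group.
 * Its child (the "rogue") moves itself into a new process group with
 * setpgid(0, 0) and reports its PID through a pipe. Returns the service PID. */
static pid_t spawn_service(pid_t *rogue_out) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    pid_t service = fork();
    if (service < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (service == 0) {
        setpgid(0, 0);                      /* service leads its own group */
        if (fork() == 0) {
            setpgid(0, 0);                  /* rogue: escape into a new group */
            pid_t self = getpid();
            if (write(fds[1], &self, sizeof self) != (ssize_t)sizeof self) _exit(1);
            for (;;) pause();               /* stand-in for long-running work */
        }
        close(fds[1]);  /* so the parent sees EOF if the rogue never reports */
        for (;;) pause();
    }
    close(fds[1]);
    ssize_t n = read(fds[0], rogue_out, sizeof *rogue_out);
    close(fds[0]);
    if (n != (ssize_t)sizeof *rogue_out) {  /* rogue never started: clean up */
        kill(service, SIGKILL);
        waitpid(service, NULL, 0);
        return -1;
    }
    return service;
}

/* Returns 1 if the rogue survived a SIGKILL sent to the service's process
 * group, 0 if it did not, -1 on setup failure. */
static int rogue_survives_group_kill(void) {
    /* Become a child subreaper so orphans are reparented to us, standing in
     * for PID 1 (init), which adopts orphans system-wide. */
    if (prctl(PR_SET_CHILD_SUBREAPER, 1) != 0) return -1;
    pid_t rogue;
    pid_t service = spawn_service(&rogue);
    if (service < 0) return -1;

    /* What stopping the service does: signal the group it leads. */
    kill(-service, SIGKILL);
    int status;
    if (waitpid(service, &status, 0) != service) return -1;
    assert(WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL);

    /* Signal 0 only checks that the process still exists. */
    int survived = kill(rogue, 0) == 0;

    /* Clean up: the rogue is now our child, so we can kill and reap it. */
    kill(rogue, SIGKILL);
    waitpid(rogue, NULL, 0);
    return survived;
}
```

Under these assumptions, rogue_survives_group_kill() is expected to
return 1: the SIGKILL sent to the service's group kills the service but
not the child that moved into its own group, which is then reparented
to the caller.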

As a result, the following catastrophic sequence of events can occur
right before the kernel reboots:
1. A rogue process has issued an I/O, and the kernel is still handling
   it.
2. init sends SIGKILL (via `echo i > /proc/sysrq-trigger`), but because
   of #1, the process is not killed until the I/O completes.
3. init, without waiting for the I/O to complete, tries to unmount the
   partitions and fails (as expected).
4. init has no choice but to shut the filesystem down by issuing
   F2FS_IOC_SHUTDOWN, and then jumps into the kernel.
5. The kernel may still see #1 in progress, e.g. a lock may still be
   held, ...

This change aims to prevent this by ensuring that all processes,
including those that moved to a new process group, are killed and
reaped before the partitions are unmounted.
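
The kill-and-reap step can be sketched as follows. This is not the code
from this change; it is a minimal Linux example in C under stated
assumptions: the caller is a child subreaper standing in for init,
remaining children are found by scanning /proc/<pid>/stat, and every
function name is hypothetical. Init itself runs as PID 1, where
kill(-1, SIGKILL) signals every process the caller may signal except
itself; the sketch avoids that call because outside PID 1 it would also
kill the user's unrelated processes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/wait.h>
#include <unistd.h>

/* SIGKILLs every non-zombie process whose parent is the caller, found by
 * scanning /proc/<pid>/stat. Returns how many were signalled, or -1. */
static int kill_own_children(void) {
    DIR *proc = opendir("/proc");
    if (proc == NULL) return -1;
    long self = (long)getpid();
    int signalled = 0;
    struct dirent *de;
    while ((de = readdir(proc)) != NULL) {
        char *end, path[64], buf[512], state;
        long pid = strtol(de->d_name, &end, 10), ppid;
        if (end == de->d_name || *end != '\0') continue;  /* not a PID */
        snprintf(path, sizeof path, "/proc/%ld/stat", pid);
        FILE *f = fopen(path, "r");
        if (f == NULL) continue;                          /* already gone */
        size_t len = fread(buf, 1, sizeof buf - 1, f);
        fclose(f);
        buf[len] = '\0';
        /* Line is "pid (comm) state ppid ..."; comm may contain spaces. */
        char *rparen = strrchr(buf, ')');
        if (rparen == NULL || sscanf(rparen + 1, " %c %ld", &state, &ppid) != 2)
            continue;
        if (ppid == self && state != 'Z' && kill((pid_t)pid, SIGKILL) == 0)
            signalled++;
    }
    closedir(proc);
    return signalled;
}

/* Kills and reaps every process that is, or becomes, our child. Orphans are
 * reparented to us only when their parent dies, so the scan repeats after
 * each reap. waitpid() blocks until a killed process has really exited
 * (e.g. once an in-flight I/O finishes). Returns the number reaped, or -1. */
static int kill_and_reap_all(void) {
    int reaped = 0;
    for (;;) {
        if (kill_own_children() < 0) return -1;
        if (waitpid(-1, NULL, 0) > 0)
            reaped++;
        else if (errno == ECHILD)
            return reaped;                /* nothing left to reap */
        else if (errno != EINTR)
            return -1;
    }
}

/* Hypothetical scenario: a service (its own group leader) whose child moves
 * into a new process group. Returns the service PID, or -1. */
static pid_t spawn_service_with_rogue(void) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    pid_t service = fork();
    if (service == 0) {
        setpgid(0, 0);
        if (fork() == 0) {
            setpgid(0, 0);                /* rogue: leave the service's group */
            close(fds[1]);
            for (;;) pause();
        }
        close(fds[1]);
        for (;;) pause();
    }
    close(fds[1]);
    char c;  /* read() returns 0 (EOF) once no process holds the write end */
    while (read(fds[0], &c, 1) < 0 && errno == EINTR) {}
    close(fds[0]);
    return service;
}

/* Shutdown order sketch: stop the service via its process group, then sweep
 * up everything left. Returns the number of processes reaped, or -1. */
static int shutdown_sketch(void) {
    if (prctl(PR_SET_CHILD_SUBREAPER, 1) != 0) return -1;  /* stand in for init */
    pid_t service = spawn_service_with_rogue();
    if (service < 0) return -1;
    kill(-service, SIGKILL);   /* what stopping the service already does */
    int reaped = kill_and_reap_all();
    if (reaped < 0) return -1;
    /* Postcondition: no children remain, so unmounting would be safe. */
    int none_left = waitpid(-1, NULL, WNOHANG) == -1 && errno == ECHILD;
    assert(none_left);
    (void)none_left;
    return reaped;
}
```

The loop deliberately blocks in waitpid() rather than polling: a process
that received SIGKILL in the middle of an I/O is reaped only after it
actually exits, which is the point at which unmounting becomes safe. In
the scenario built by spawn_service_with_rogue(), shutdown_sketch()
should reap two processes, the service and its rogue child.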

Bug: 420528003
Test: follow the repro steps in the bug
Flag: EXEMPT bug fix
(cherry picked from https://googleplex-android-review.googlesource.com/q/commit:ae8ff5b26e40802d63705402cab38c7b3c40d3ac)
Merged-In: I2988dd17844e25900dacbda1c293d4e3c269eb12
Change-Id: I2988dd17844e25900dacbda1c293d4e3c269eb12
1 file changed