Kill and reap all processes before unmounting partitions

During shutdown, init stops services by sending SIGTERM or SIGKILL to
the process groups they lead. By default, the child processes of a
service belong to the service's process group, so stopping the service
kills everything under it.

However, this assumption does not always hold. A process can create a
new process group by calling setpgid(2). Such a process can outlive its
parent (the service) during shutdown. It is killed only at the very last
moment, when init issues `echo i > /proc/sysrq-trigger`. To make things
worse, init doesn't reap those rogue processes, presumably because it is
on a somewhat emergency path.
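
The escape from the service's process group can be reproduced in user
space. The following is only a sketch under assumptions, not init code:
the function name EscapedChildSurvivesGroupKill and the pipe used for
synchronization are hypothetical, and a forked process stands in for the
service.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo, not init code. Returns true if a child that moved itself
// into a new process group is still alive after its parent's group is killed.
bool EscapedChildSurvivesGroupKill() {
    int fds[2];
    if (pipe(fds) != 0) return false;

    pid_t service = fork();
    if (service < 0) return false;
    if (service == 0) {
        // Stand-in for a service: lead a process group of its own.
        setpgid(0, 0);
        pid_t rogue = fork();
        if (rogue == 0) {
            // Stand-in for the rogue child: leave the service's group.
            setpgid(0, 0);
            pid_t self = getpid();
            if (write(fds[1], &self, sizeof(self)) != static_cast<ssize_t>(sizeof(self))) {
                _exit(1);
            }
            pause();
            _exit(0);
        }
        close(fds[1]);
        if (rogue < 0) _exit(1);
        pause();
        _exit(0);
    }

    close(fds[1]);
    pid_t rogue = -1;
    // Blocks until the rogue child reports its pid from its new group.
    bool reported =
        read(fds[0], &rogue, sizeof(rogue)) == static_cast<ssize_t>(sizeof(rogue));
    close(fds[0]);

    // Signal the service's process group, then reap the service.
    kill(-service, SIGKILL);
    waitpid(service, nullptr, 0);

    // Signal 0 only checks for existence; the rogue child was not in the group.
    bool alive = reported && kill(rogue, 0) == 0;
    if (reported) kill(rogue, SIGKILL);  // clean up the demo
    return alive;
}
```

Calling EscapedChildSurvivesGroupKill() is expected to return true on
Linux, because kill() with a negative pid reaches only the members of
that process group.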

As a result, the following catastrophic sequence of events can occur
right before the kernel enters reboot:

1. The rogue process has issued an I/O, and the kernel is in the process
   of handling it.
2. init sends SIGKILL (via `echo i > /proc/sysrq-trigger`), but because
   of #1 the process is not killed until the I/O is done.
3. init, without waiting for the I/O to complete, tries to unmount the
   partitions and fails (as expected).
4. init has no choice but to shut the underlying hardware down by
   issuing F2FS_IOC_SHUTDOWN, and then jumps to the kernel.
5. The kernel may still see #1 ongoing, e.g. a lock may still be held.

This change addresses the problem by ensuring that all processes, even
those in new process groups, are killed and reaped before the partitions
are unmounted.
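
The kill-then-reap ordering can be illustrated outside init. This is a
minimal sketch under assumptions: ReapAllChildren and SpawnKillAndReap
are hypothetical names, the children are signaled one by one as a
stand-in for the sysrq write (which would affect the whole system), and
the reap loop blocks, which may differ from what init actually does.

```cpp
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: blocks until every child of the caller has been
// reaped and returns how many were collected.
int ReapAllChildren() {
    int reaped = 0;
    while (true) {
        pid_t pid = waitpid(-1, nullptr, 0);
        if (pid > 0) {
            ++reaped;
        } else if (errno != EINTR) {
            break;  // ECHILD: nothing left to reap
        }
    }
    return reaped;
}

// Hypothetical demo: start up to 64 children that each move into their own
// process group, kill every one of them, and reap them all.
int SpawnKillAndReap(int count) {
    pid_t pids[64];
    if (count > 64) count = 64;
    int started = 0;
    for (int i = 0; i < count; ++i) {
        pid_t pid = fork();
        if (pid == 0) {
            setpgid(0, 0);  // leave the parent's process group
            pause();
            _exit(0);
        }
        if (pid > 0) pids[started++] = pid;
    }
    // Stand-in for the sysrq write: signal each process directly.
    for (int i = 0; i < started; ++i) kill(pids[i], SIGKILL);
    // Only after this returns would unmounting be attempted.
    return ReapAllChildren();
}
```

SpawnKillAndReap(3) returns the number of children reaped, so any step
placed after the call runs with none of those processes left behind.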
Bug: 420528003
Test: follow the repro steps in the bug
Flag: EXEMPT bug fix
(cherry picked from https://googleplex-android-review.googlesource.com/q/commit:ae8ff5b26e40802d63705402cab38c7b3c40d3ac)
Merged-In: I2988dd17844e25900dacbda1c293d4e3c269eb12
Change-Id: I2988dd17844e25900dacbda1c293d4e3c269eb12
diff --git a/init/reboot.cpp b/init/reboot.cpp
index b3322f6..76787ea 100644
--- a/init/reboot.cpp
+++ b/init/reboot.cpp
@@ -346,18 +346,38 @@
return UMOUNT_STAT_ERROR;
}
-static UmountStat UmountPartitions(std::chrono::milliseconds timeout) {
- // Terminate (SIGTERM) the services before unmounting partitions.
- // If the processes block the signal, then partitions will eventually fail
- // to unmount and then we fallback to SIGKILL the services.
- //
- // Hence, give the services a chance for a graceful shutdown before sending SIGKILL.
+static void KillAllProcesses(bool force) {
+ // SIGKILL on force == true. SIGTERM if not.
+ WriteStringToFile(force ? "i" : "e", PROC_SYSRQ);
+}
+
+static UmountStat UmountPartitions(std::chrono::milliseconds timeout, bool ota_update_in_progress) {
+ // If we have no time left, kill them all as fast as possible by sending SIGKILL. Otherwise
+ // SIGTERM so that they can gracefully exit.
+ bool immediate = timeout == 0ms;
+ // Terminate the services before unmounting partitions. If we have some time left, give them a
+ // chance for a graceful shutdown by sending SIGTERM. If not, kill immediately by sending
+ // SIGKILL.
for (const auto& s : ServiceList::GetInstance()) {
if (s->IsShutdownCritical()) {
LOG(INFO) << "Shutdown service: " << s->name();
- s->Terminate();
+ if (immediate) {
+ s->Timeout();
+ } else {
+ s->Terminate();
+ }
}
}
+ // Below is to ensure that all remaining processes (except init) are SIGKILL'ed or SIGTERM'ed.
+ // This is because some children of the services above might have created new process groups.
+ // Note that, each service by default is a process group leader, and we send a signal to the
+ // process group when killing the service. So, if some children created their own process group,
+ // they don't get killed. Below is to kill even such ones.
+ //
+ // However, if OTA update is in progress we NEVER send SIGKILL because snapuserd will be serving
+ // I/Os and therefore killing it will ruin the update. snapuserd ignores SIGTERM.
+ KillAllProcesses(immediate && !ota_update_in_progress);
+
ReapAnyOutstandingChildren();
Timer t;
@@ -366,12 +386,12 @@
*/
while (true) {
// force umount operation if timeout is not set
- UmountStat stat = TryUmountPartitions(/*force=*/timeout == 0ms);
+ UmountStat stat = TryUmountPartitions(immediate);
if (stat == UMOUNT_STAT_SUCCESS) {
return UMOUNT_STAT_SUCCESS;
}
- if (stat == UMOUNT_STAT_NOT_AVAILABLE || timeout == 0ms) {
+ if (stat == UMOUNT_STAT_NOT_AVAILABLE || immediate) {
return UMOUNT_STAT_ERROR;
}
@@ -382,10 +402,6 @@
}
}
-static void KillAllProcesses() {
- WriteStringToFile("i", PROC_SYSRQ);
-}
-
// Reboot/shutdown monitor thread
static void RebootMonitorThread(unsigned int cmd, const Timer& shutdown_timer) {
// We want quite a long timeout here since the "sync" in the calling
@@ -521,7 +537,7 @@
ota_update_in_progress = true;
}
}
- UmountStat stat = UmountPartitions(timeout - t.duration());
+ UmountStat stat = UmountPartitions(timeout - t.duration(), ota_update_in_progress);
if (stat != UMOUNT_STAT_SUCCESS) {
// Do not delete: Critical log for reboot_fs_integrity_test.
KLOG_INFO(LOG_TAG, "umount timeout, last resort, kill all and try");
@@ -542,7 +558,7 @@
bool umount_dynamic_partitions = UmountDynamicPartitions(dynamic_partitions);
LOG(INFO) << "Sending SIGTERM to all process";
// Send SIGTERM to all processes except init
- WriteStringToFile("e", PROC_SYSRQ);
+ KillAllProcesses(/* force */ false);
// Wait for processes to terminate
std::this_thread::sleep_for(1s);
// Try one more attempt to umount other partitions which failed
@@ -552,9 +568,9 @@
}
return stat;
}
- KillAllProcesses();
+ KillAllProcesses(/* force */ true);
// even if it succeeds, still it is timeout and do not run fsck with all processes killed
- UmountStat st = UmountPartitions(0ms);
+ UmountStat st = UmountPartitions(0ms, ota_update_in_progress);
if ((st != UMOUNT_STAT_SUCCESS) && DUMP_ON_UMOUNT_FAILURE) DumpUmountDebuggingInfo();
}