ANDROID: KVM: arm64: Only map swap-backed pages into the guest

Alistair reports an ext4 splat when running a non-protected guest under
pKVM using Cuttlefish on a rockpi board:

 | WARNING: CPU: 4 PID: 3125 at fs/ext4/inode.c:3592 ext4_set_page_dirty+0x6c/0x90
 | sp : ffffffc00e1a39b0
 | x29: ffffffc00e1a39b0 x28: ffffffc009ac3c18 x27: ffffffc009a80968
 | x26: ffffff80c2753a00 x25: 0000000200000000 x24: ffffffc00a6dc000
 | x23: 0000000000000000 x22: 0000000000000001 x21: fffffffe0314f640
 | x20: ffffff8063a99890 x19: fffffffe0314f640 x18: ffffffc00dbf5090
 | x17: 0000000000000020 x16: ffffffc00ab73080 x15: 0000000000000040
 | x14: 0000000000000040 x13: 0000000000000040 x12: 0000000080200000
 | x11: 0000000000000000 x10: fffffffe0314f640 x9 : 0000000000000016
 | x8 : 0000000000000015 x7 : 0000000000000062 x6 : 0000000000000068
 | x5 : 0000000080200015 x4 : ffffff80067c7500 x3 : 0000000080200016
 | x2 : 0000000000000001 x1 : 0000000000000001 x0 : fffffffe0314f640
 | Call trace:
 |  ext4_set_page_dirty+0x6c/0x90
 |  set_page_dirty+0xf0/0x264
 |  set_page_dirty_lock+0x94/0x164
 |  unpin_user_pages_dirty_lock+0xa0/0x15c
 |  kvm_shadow_destroy+0xd4/0x150
 |  kvm_arch_destroy_vm+0xa0/0xa4
 |  kvm_destroy_vm+0x634/0xa0c
 |  kvm_vcpu_release+0x44/0xc0
 |  __fput+0xf8/0x43c
 |  ____fput+0x14/0x24
 |  task_work_run+0x140/0x204
 |  do_exit+0x450/0x12b0
 |  do_group_exit+0xc8/0x17c
 |  get_signal+0x85c/0xa10
 |  do_signal+0x9c/0x268
 |  do_notify_resume+0x98/0x220
 |  el0_svc+0x5c/0x84
 |  el0t_64_sync_handler+0x88/0xec
 |  el0t_64_sync+0x1b4/0x1b8

This appears to be due to virtio-pmem mapping a host page-cache page
directly into the guest and pinning it with GUP. A later attempt to
wrprotect the page using page_mkclean() on the writeback path will not
find the guest mapping and consequently the filesystem becomes confused
when we later dirty the page without any page buffers having been
allocated.
Since the host cannot generally access the memory of protected VMs,
restrict ourselves to swap-backed pages for now and avoid attempting
writeback altogether, with the GUP pin preventing swapout.

Bug: 223678931
Reported-by: Alistair Delva <adelva@google.com>
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Id8da126aac220df6eff44177a911dc4627e68c02
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 80fed4c..79fb8a4 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1189,7 +1189,22 @@ static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	ret = pin_user_pages(hva, 1, flags, &page, NULL);
 	mmap_read_unlock(mm);
 
-	if (ret == -EHWPOISON) {
+	/*
+	 * We really can't deal with page-cache pages returned by GUP
+	 * because (a) we may trigger writeback of a page for which we
+	 * no longer have access and (b) page_mkclean() won't find the
+	 * stage-2 mapping in the rmap so we can get out-of-whack with
+	 * the filesystem when marking the page dirty during unpinning.
+	 *
+	 * Ideally we'd just restrict ourselves to anonymous pages, but
+	 * we also want to allow memfd (i.e. shmem) pages, so check for
+	 * pages backed by swap in the knowledge that the GUP pin will
+	 * prevent try_to_unmap() from succeeding.
+	 */
+	if (!PageSwapBacked(page)) {
+		ret = -EIO;
+		goto dec_account;
+	} else if (ret == -EHWPOISON) {
 		kvm_send_hwpoison_signal(hva, PAGE_SHIFT);
 		ret = 0;
 		goto dec_account;