Merge 6.1.11 into android14-6.1

Changes in 6.1.11
	firewire: fix memory leak for payload of request subaction to IEC 61883-1 FCP region
	bus: sunxi-rsb: Fix error handling in sunxi_rsb_init()
	arm64: dts: imx8m-venice: Remove incorrect 'uart-has-rtscts'
	arm64: dts: freescale: imx8dxl: fix sc_pwrkey's property name linux,keycode
	ASoC: amd: acp-es8336: Drop reference count of ACPI device after use
	ASoC: Intel: bytcht_es8316: Drop reference count of ACPI device after use
	ASoC: Intel: bytcr_rt5651: Drop reference count of ACPI device after use
	ASoC: Intel: bytcr_rt5640: Drop reference count of ACPI device after use
	ASoC: Intel: bytcr_wm5102: Drop reference count of ACPI device after use
	ASoC: Intel: sof_es8336: Drop reference count of ACPI device after use
	ASoC: Intel: avs: Implement PCI shutdown
	bpf: Fix off-by-one error in bpf_mem_cache_idx()
	bpf: Fix a possible task gone issue with bpf_send_signal[_thread]() helpers
	ALSA: hda/via: Avoid potential array out-of-bound in add_secret_dac_path()
	bpf: Fix to preserve reg parent/live fields when copying range info
	selftests/filesystems: grant executable permission to run_fat_tests.sh
	ASoC: SOF: ipc4-mtrace: prevent underflow in sof_ipc4_priority_mask_dfs_write()
	bpf: Add missing btf_put to register_btf_id_dtor_kfuncs
	media: v4l2-ctrls-api.c: move ctrl->is_new = 1 to the correct line
	bpf, sockmap: Check for any of tcp_bpf_prots when cloning a listener
	arm64: dts: imx8mm: Fix pad control for UART1_DTE_RX
	arm64: dts: imx8mm-verdin: Do not power down eth-phy
	drm/vc4: hdmi: make CEC adapter name unique
	drm/ssd130x: Init display before the SSD130X_DISPLAY_ON command
	scsi: Revert "scsi: core: map PQ=1, PDT=other values to SCSI_SCAN_TARGET_PRESENT"
	bpf: Fix the kernel crash caused by bpf_setsockopt().
	ALSA: memalloc: Workaround for Xen PV
	vhost/net: Clear the pending messages when the backend is removed
	copy_oldmem_kernel() - WRITE is "data source", not destination
	WRITE is "data source", not destination...
	READ is "data destination", not source...
	zcore: WRITE is "data source", not destination...
	memcpy_real(): WRITE is "data source", not destination...
	fix iov_iter_bvec() "direction" argument
	fix 'direction' argument of iov_iter_{init,bvec}()
	fix "direction" argument of iov_iter_kvec()
	use less confusing names for iov_iter direction initializers
	vhost-scsi: unbreak any layout for response
	ice: Prevent set_channel from changing queues while RDMA active
	qede: execute xdp_do_flush() before napi_complete_done()
	virtio-net: execute xdp_do_flush() before napi_complete_done()
	dpaa_eth: execute xdp_do_flush() before napi_complete_done()
	dpaa2-eth: execute xdp_do_flush() before napi_complete_done()
	skb: Do mix page pool and page referenced frags in GRO
	sfc: correctly advertise tunneled IPv6 segmentation
	net: phy: dp83822: Fix null pointer access on DP83825/DP83826 devices
	net: wwan: t7xx: Fix Runtime PM initialization
	block, bfq: replace 0/1 with false/true in bic apis
	block, bfq: fix uaf for bfqq in bic_set_bfqq()
	netrom: Fix use-after-free caused by accept on already connected socket
	fscache: Use wait_on_bit() to wait for the freeing of relinquished volume
	platform/x86/amd/pmf: update to auto-mode limits only after AMT event
	platform/x86/amd/pmf: Add helper routine to update SPS thermals
	platform/x86/amd/pmf: Fix to update SPS default pprof thermals
	platform/x86/amd/pmf: Add helper routine to check pprof is balanced
	platform/x86/amd/pmf: Fix to update SPS thermals when power supply change
	platform/x86/amd/pmf: Ensure mutexes are initialized before use
	platform/x86: thinkpad_acpi: Fix thinklight LED brightness returning 255
	drm/i915/guc: Fix locking when searching for a hung request
	drm/i915: Fix request ref counting during error capture & debugfs dump
	drm/i915: Fix up locking around dumping requests lists
	drm/i915/adlp: Fix typo for reference clock
	net/tls: tls_is_tx_ready() checked list_entry
	ALSA: firewire-motu: fix unreleased lock warning in hwdep device
	netfilter: br_netfilter: disable sabotage_in hook after first suppression
	block: ublk: extending queue_size to fix overflow
	kunit: fix kunit_test_init_section_suites(...)
	squashfs: harden sanity check in squashfs_read_xattr_id_table
	maple_tree: should get pivots boundary by type
	sctp: do not check hb_timer.expires when resetting hb_timer
	net: phy: meson-gxl: Add generic dummy stubs for MMD register access
	drm/panel: boe-tv101wum-nl6: Ensure DSI writes succeed during disable
	ip/ip6_gre: Fix changing addr gen mode not generating IPv6 link local address
	ip/ip6_gre: Fix non-point-to-point tunnel not generating IPv6 link local address
	riscv: kprobe: Fixup kernel panic when probing an illegal position
	igc: return an error if the mac type is unknown in igc_ptp_systim_to_hwtstamp()
	octeontx2-af: Fix devlink unregister
	can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate
	can: raw: fix CAN FD frame transmissions over CAN XL devices
	can: mcp251xfd: mcp251xfd_ring_set_ringparam(): assign missing tx_obj_num_coalesce_irq
	ata: libata: Fix sata_down_spd_limit() when no link speed is reported
	selftests: net: udpgso_bench_rx: Fix 'used uninitialized' compiler warning
	selftests: net: udpgso_bench_rx/tx: Stop when wrong CLI args are provided
	selftests: net: udpgso_bench: Fix racing bug between the rx/tx programs
	selftests: net: udpgso_bench_tx: Cater for pending datagrams zerocopy benchmarking
	virtio-net: Keep stop() to follow mirror sequence of open()
	net: openvswitch: fix flow memory leak in ovs_flow_cmd_new
	efi: fix potential NULL deref in efi_mem_reserve_persistent
	rtc: sunplus: fix format string for printing resource
	certs: Fix build error when PKCS#11 URI contains semicolon
	kbuild: modinst: Fix build error when CONFIG_MODULE_SIG_KEY is a PKCS#11 URI
	i2c: designware-pci: Add new PCI IDs for AMD NAVI GPU
	i2c: mxs: suppress probe-deferral error message
	scsi: target: core: Fix warning on RT kernels
	x86/aperfmperf: Erase stale arch_freq_scale values when disabling frequency invariance readings
	perf/x86/intel: Add Emerald Rapids
	perf/x86/intel/cstate: Add Emerald Rapids
	scsi: iscsi_tcp: Fix UAF during logout when accessing the shost ipaddress
	scsi: iscsi_tcp: Fix UAF during login when accessing the shost ipaddress
	i2c: rk3x: fix a bunch of kernel-doc warnings
	Revert "gfs2: stop using generic_writepages in gfs2_ail1_start_one"
	x86/build: Move '-mindirect-branch-cs-prefix' out of GCC-only block
	platform/x86: dell-wmi: Add a keymap for KEY_MUTE in type 0x0010 table
	platform/x86: hp-wmi: Handle Omen Key event
	platform/x86: gigabyte-wmi: add support for B450M DS3H WIFI-CF
	platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN
	net/x25: Fix to not accept on connected socket
	drm/amd/display: Fix timing not changning when freesync video is enabled
	bcache: Silence memcpy() run-time false positive warnings
	iio: adc: stm32-dfsdm: fill module aliases
	usb: dwc3: qcom: enable vbus override when in OTG dr-mode
	usb: gadget: f_fs: Fix unbalanced spinlock in __ffs_ep0_queue_wait
	vc_screen: move load of struct vc_data pointer in vcs_read() to avoid UAF
	fbcon: Check font dimension limits
	cgroup/cpuset: Fix wrong check in update_parent_subparts_cpumask()
	hv_netvsc: Fix missed pagebuf entries in netvsc_dma_map/unmap()
	ARM: dts: imx7d-smegw01: Fix USB host over-current polarity
	net: qrtr: free memory on error path in radix_tree_insert()
	can: isotp: split tx timer into transmission and timeout
	can: isotp: handle wait_event_interruptible() return values
	watchdog: diag288_wdt: do not use stack buffers for hardware data
	watchdog: diag288_wdt: fix __diag288() inline assembly
	ALSA: hda/realtek: Add Acer Predator PH315-54
	ALSA: hda/realtek: fix mute/micmute LEDs, speaker don't work for a HP platform
	ASoC: codecs: wsa883x: correct playback min/max rates
	ASoC: SOF: sof-audio: unprepare when swidget->use_count > 0
	ASoC: SOF: sof-audio: skip prepare/unprepare if swidget is NULL
	ASoC: SOF: keep prepare/unprepare widgets in sink path
	efi: Accept version 2 of memory attributes table
	rtc: efi: Enable SET/GET WAKEUP services as optional
	iio: hid: fix the retval in accel_3d_capture_sample
	iio: hid: fix the retval in gyro_3d_capture_sample
	iio: adc: xilinx-ams: fix devm_krealloc() return value check
	iio: adc: berlin2-adc: Add missing of_node_put() in error path
	iio: imx8qxp-adc: fix irq flood when call imx8qxp_adc_read_raw()
	iio:adc:twl6030: Enable measurements of VUSB, VBAT and others
	iio: light: cm32181: Fix PM support on system with 2 I2C resources
	iio: imu: fxos8700: fix ACCEL measurement range selection
	iio: imu: fxos8700: fix incomplete ACCEL and MAGN channels readback
	iio: imu: fxos8700: fix IMU data bits returned to user space
	iio: imu: fxos8700: fix map label of channel type to MAGN sensor
	iio: imu: fxos8700: fix swapped ACCEL and MAGN channels readback
	iio: imu: fxos8700: fix incorrect ODR mode readback
	iio: imu: fxos8700: fix failed initialization ODR mode assignment
	iio: imu: fxos8700: remove definition FXOS8700_CTRL_ODR_MIN
	iio: imu: fxos8700: fix MAGN sensor scale and unit
	nvmem: brcm_nvram: Add check for kzalloc
	nvmem: sunxi_sid: Always use 32-bit MMIO reads
	nvmem: qcom-spmi-sdam: fix module autoloading
	parisc: Fix return code of pdc_iodc_print()
	parisc: Replace hardcoded value with PRIV_USER constant in ptrace.c
	parisc: Wire up PTRACE_GETREGS/PTRACE_SETREGS for compat case
	riscv: disable generation of unwind tables
	Revert "mm: kmemleak: alloc gray object for reserved region with direct map"
	mm: multi-gen LRU: fix crash during cgroup migration
	mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps
	mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath()
	usb: gadget: f_uac2: Fix incorrect increment of bNumEndpoints
	usb: typec: ucsi: Don't attempt to resume the ports before they exist
	usb: gadget: udc: do not clear gadget driver.bus
	kernel/irq/irqdomain.c: fix memory leak with using debugfs_lookup()
	HV: hv_balloon: fix memory leak with using debugfs_lookup()
	x86/debug: Fix stack recursion caused by wrongly ordered DR7 accesses
	fpga: m10bmc-sec: Fix probe rollback
	fpga: stratix10-soc: Fix return value check in s10_ops_write_init()
	mm/uffd: fix pte marker when fork() without fork event
	mm/swapfile: add cond_resched() in get_swap_pages()
	mm/khugepaged: fix ->anon_vma race
	mm, mremap: fix mremap() expanding for vma's with vm_ops->close()
	mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
	highmem: round down the address passed to kunmap_flush_on_unmap()
	ia64: fix build error due to switch case label appearing next to declaration
	Squashfs: fix handling and sanity checking of xattr_ids count
	maple_tree: fix mas_empty_area_rev() lower bound validation
	migrate: hugetlb: check for hugetlb shared PMD in node migration
	dma-buf: actually set signaling bit for private stub fences
	serial: stm32: Merge hard IRQ and threaded IRQ handling into single IRQ handler
	drm/i915: Avoid potential vm use-after-free
	drm/i915: Fix potential bit_17 double-free
	drm/amd: Fix initialization for nbio 4.3.0
	drm/amd/pm: drop unneeded dpm features disablement for SMU 13.0.4/11
	drm/amdgpu: update wave data type to 3 for gfx11
	nvmem: core: initialise nvmem->id early
	nvmem: core: remove nvmem_config wp_gpio
	nvmem: core: fix cleanup after dev_set_name()
	nvmem: core: fix registration vs use race
	nvmem: core: fix device node refcounting
	nvmem: core: fix cell removal on error
	nvmem: core: fix return value
	phy: qcom-qmp-combo: fix runtime suspend
	serial: 8250_dma: Fix DMA Rx completion race
	serial: 8250_dma: Fix DMA Rx rearm race
	platform/x86/amd: pmc: add CONFIG_SERIO dependency
	ASoC: SOF: sof-audio: prepare_widgets: Check swidget for NULL on sink failure
	iio:adc:twl6030: Enable measurement of VAC
	powerpc/64s/radix: Fix crash with unaligned relocated kernel
	powerpc/64s: Fix local irq disable when PMIs are disabled
	powerpc/imc-pmu: Revert nest_init_lock to being a mutex
	fs/ntfs3: Validate attribute data and valid sizes
	ovl: Use "buf" flexible array for memcpy() destination
	f2fs: initialize locks earlier in f2fs_fill_super()
	fbdev: smscufx: fix error handling code in ufx_usb_probe
	f2fs: fix to do sanity check on i_extra_isize in is_alive()
	wifi: brcmfmac: Check the count value of channel spec to prevent out-of-bounds reads
	gfs2: Cosmetic gfs2_dinode_{in,out} cleanup
	gfs2: Always check inode size of inline inodes
	bpf: Skip invalid kfunc call in backtrack_insn
	Linux 6.1.11

Change-Id: I69722bc9711b91f2fca18de59746ada373f64c5e
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
diff --git a/BUILD.bazel b/BUILD.bazel
new file mode 100644
index 0000000..1ebac0b
--- /dev/null
+++ b/BUILD.bazel
@@ -0,0 +1,568 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2021 The Android Open Source Project
+
+load("//build/bazel_common_rules/dist:dist.bzl", "copy_to_dist_dir")
+load("//build/kernel/kleaf:common_kernels.bzl", "define_common_kernels", "define_db845c")
+load("//build/kernel/kleaf:kernel.bzl", "ddk_headers", "kernel_build", "kernel_images", "kernel_modules_install")
+load(":modules.bzl", "COMMON_GKI_MODULES_LIST")
+
+package(
+    default_visibility = [
+        "//visibility:public",
+    ],
+)
+
+define_common_kernels(target_configs = {
+    "kernel_aarch64": {
+        "module_implicit_outs": COMMON_GKI_MODULES_LIST,
+        "additional_kmi_symbol_lists": [],  # Temporarily disable vendor symbol lists. This will be reverted before freezing the KMI.
+        "kmi_symbol_list_strict_mode": False,
+        "trim_nonlisted_kmi": False,
+    },
+    "kernel_aarch64_16k": {
+        "module_implicit_outs": COMMON_GKI_MODULES_LIST,
+        "additional_kmi_symbol_lists": [],  # Temporarily disable vendor symbol lists. This will be reverted before freezing the KMI.
+        "kmi_symbol_list_strict_mode": False,
+        "trim_nonlisted_kmi": False,
+    },
+    "kernel_aarch64_debug": {
+        "module_implicit_outs": COMMON_GKI_MODULES_LIST,
+        "additional_kmi_symbol_lists": [],  # Temporarily disable vendor symbol lists. This will be reverted before freezing the KMI.
+        "kmi_symbol_list_strict_mode": False,
+        "trim_nonlisted_kmi": False,
+    },
+    "kernel_riscv64": {
+        "module_implicit_outs": COMMON_GKI_MODULES_LIST,
+        "additional_kmi_symbol_lists": [],  # Temporarily disable vendor symbol lists. This will be reverted before freezing the KMI.
+        "kmi_symbol_list_strict_mode": False,
+        "trim_nonlisted_kmi": False,
+    },
+    "kernel_x86_64": {
+        "kmi_symbol_list_strict_mode": False,
+        "module_implicit_outs": COMMON_GKI_MODULES_LIST,
+        "additional_kmi_symbol_lists": [],  # Temporarily disable vendor symbol lists. This will be reverted before freezing the KMI.
+        "trim_nonlisted_kmi": False,
+    },
+    "kernel_x86_64_debug": {
+        "kmi_symbol_list_strict_mode": False,
+        "module_implicit_outs": COMMON_GKI_MODULES_LIST,
+        "additional_kmi_symbol_lists": [],  # Temporarily disable vendor symbol lists. This will be reverted before freezing the KMI.
+        "trim_nonlisted_kmi": False,
+    },
+})
+
+define_db845c(
+    name = "db845c",
+    outs = [
+        "arch/arm64/boot/dts/qcom/qrb5165-rb5.dtb",
+        "arch/arm64/boot/dts/qcom/sdm845-db845c.dtb",
+        "arch/arm64/boot/dts/qcom/sm8450-qrd.dtb",
+    ],
+    module_outs = [
+        # keep sorted
+        "crypto/michael_mic.ko",
+        "drivers/base/regmap/regmap-sdw.ko",
+        "drivers/base/regmap/regmap-slimbus.ko",
+        "drivers/bus/mhi/host/mhi.ko",
+        "drivers/clk/qcom/clk-qcom.ko",
+        "drivers/clk/qcom/clk-rpmh.ko",
+        "drivers/clk/qcom/clk-spmi-pmic-div.ko",
+        "drivers/clk/qcom/dispcc-sdm845.ko",
+        "drivers/clk/qcom/dispcc-sm8250.ko",
+        "drivers/clk/qcom/gcc-sdm845.ko",
+        "drivers/clk/qcom/gcc-sm8250.ko",
+        "drivers/clk/qcom/gcc-sm8450.ko",
+        "drivers/clk/qcom/gpucc-sdm845.ko",
+        "drivers/clk/qcom/gpucc-sm8250.ko",
+        "drivers/clk/qcom/lpass-gfm-sm8250.ko",
+        "drivers/clk/qcom/videocc-sdm845.ko",
+        "drivers/clk/qcom/videocc-sm8250.ko",
+        "drivers/cpufreq/qcom-cpufreq-hw.ko",
+        "drivers/dma-buf/heaps/system_heap.ko",
+        "drivers/dma/qcom/bam_dma.ko",
+        "drivers/dma/qcom/gpi.ko",
+        "drivers/extcon/extcon-usb-gpio.ko",
+        "drivers/firmware/qcom-scm.ko",
+        "drivers/gpio/gpio-wcd934x.ko",
+        "drivers/gpu/drm/bridge/display-connector.ko",
+        "drivers/gpu/drm/bridge/lontium-lt9611.ko",
+        "drivers/gpu/drm/bridge/lontium-lt9611uxc.ko",
+        "drivers/gpu/drm/display/drm_display_helper.ko",
+        "drivers/gpu/drm/display/drm_dp_aux_bus.ko",
+        "drivers/gpu/drm/msm/msm.ko",
+        "drivers/gpu/drm/scheduler/gpu-sched.ko",
+        "drivers/hwspinlock/qcom_hwspinlock.ko",
+        "drivers/i2c/busses/i2c-designware-core.ko",
+        "drivers/i2c/busses/i2c-designware-platform.ko",
+        "drivers/i2c/busses/i2c-qcom-geni.ko",
+        "drivers/i2c/busses/i2c-qup.ko",
+        "drivers/i2c/busses/i2c-rk3x.ko",
+        "drivers/i2c/i2c-dev.ko",
+        "drivers/i2c/i2c-mux.ko",
+        "drivers/i2c/muxes/i2c-mux-pca954x.ko",
+        "drivers/iio/adc/qcom-spmi-adc5.ko",
+        "drivers/iio/adc/qcom-vadc-common.ko",
+        "drivers/input/misc/pm8941-pwrkey.ko",
+        "drivers/interconnect/qcom/icc-bcm-voter.ko",
+        "drivers/interconnect/qcom/icc-osm-l3.ko",
+        "drivers/interconnect/qcom/icc-rpmh.ko",
+        "drivers/interconnect/qcom/qnoc-sdm845.ko",
+        "drivers/interconnect/qcom/qnoc-sm8250.ko",
+        "drivers/interconnect/qcom/qnoc-sm8450.ko",
+        "drivers/iommu/arm/arm-smmu/arm_smmu.ko",
+        "drivers/irqchip/qcom-pdc.ko",
+        "drivers/leds/rgb/leds-qcom-lpg.ko",
+        "drivers/mailbox/qcom-apcs-ipc-mailbox.ko",
+        "drivers/mailbox/qcom-ipcc.ko",
+        "drivers/media/platform/qcom/venus/venus-core.ko",
+        "drivers/media/platform/qcom/venus/venus-dec.ko",
+        "drivers/media/platform/qcom/venus/venus-enc.ko",
+        "drivers/mfd/qcom-spmi-pmic.ko",
+        "drivers/mfd/wcd934x.ko",
+        "drivers/misc/fastrpc.ko",
+        "drivers/mmc/host/cqhci.ko",
+        "drivers/mmc/host/sdhci-msm.ko",
+        "drivers/net/can/spi/mcp251xfd/mcp251xfd.ko",
+        "drivers/net/wireless/ath/ath.ko",
+        "drivers/net/wireless/ath/ath10k/ath10k_core.ko",
+        "drivers/net/wireless/ath/ath10k/ath10k_pci.ko",
+        "drivers/net/wireless/ath/ath10k/ath10k_snoc.ko",
+        "drivers/net/wireless/ath/ath11k/ath11k.ko",
+        "drivers/net/wireless/ath/ath11k/ath11k_ahb.ko",
+        "drivers/net/wireless/ath/ath11k/ath11k_pci.ko",
+        "drivers/nvmem/nvmem_qfprom.ko",
+        "drivers/phy/qualcomm/phy-qcom-qmp-combo.ko",
+        "drivers/phy/qualcomm/phy-qcom-qmp-pcie.ko",
+        "drivers/phy/qualcomm/phy-qcom-qmp-pcie-msm8996.ko",
+        "drivers/phy/qualcomm/phy-qcom-qmp-ufs.ko",
+        "drivers/phy/qualcomm/phy-qcom-qmp-usb.ko",
+        "drivers/phy/qualcomm/phy-qcom-qusb2.ko",
+        "drivers/phy/qualcomm/phy-qcom-snps-femto-v2.ko",
+        "drivers/phy/qualcomm/phy-qcom-usb-hs.ko",
+        "drivers/pinctrl/qcom/pinctrl-lpass-lpi.ko",
+        "drivers/pinctrl/qcom/pinctrl-msm.ko",
+        "drivers/pinctrl/qcom/pinctrl-sdm845.ko",
+        "drivers/pinctrl/qcom/pinctrl-sm8250.ko",
+        "drivers/pinctrl/qcom/pinctrl-sm8250-lpass-lpi.ko",
+        "drivers/pinctrl/qcom/pinctrl-sm8450.ko",
+        "drivers/pinctrl/qcom/pinctrl-spmi-gpio.ko",
+        "drivers/pinctrl/qcom/pinctrl-spmi-mpp.ko",
+        "drivers/power/reset/qcom-pon.ko",
+        "drivers/power/reset/reboot-mode.ko",
+        "drivers/power/reset/syscon-reboot-mode.ko",
+        "drivers/regulator/gpio-regulator.ko",
+        "drivers/regulator/qcom-rpmh-regulator.ko",
+        "drivers/regulator/qcom_spmi-regulator.ko",
+        "drivers/regulator/qcom_usb_vbus-regulator.ko",
+        "drivers/remoteproc/qcom_common.ko",
+        "drivers/remoteproc/qcom_pil_info.ko",
+        "drivers/remoteproc/qcom_q6v5.ko",
+        "drivers/remoteproc/qcom_q6v5_adsp.ko",
+        "drivers/remoteproc/qcom_q6v5_mss.ko",
+        "drivers/remoteproc/qcom_q6v5_pas.ko",
+        "drivers/remoteproc/qcom_q6v5_wcss.ko",
+        "drivers/remoteproc/qcom_sysmon.ko",
+        "drivers/reset/reset-qcom-aoss.ko",
+        "drivers/reset/reset-qcom-pdc.ko",
+        "drivers/rpmsg/qcom_glink.ko",
+        "drivers/rpmsg/qcom_glink_rpm.ko",
+        "drivers/rpmsg/qcom_glink_smem.ko",
+        "drivers/rpmsg/qcom_smd.ko",
+        "drivers/rpmsg/rpmsg_ns.ko",
+        "drivers/rtc/rtc-pm8xxx.ko",
+        "drivers/slimbus/slim-qcom-ngd-ctrl.ko",
+        "drivers/slimbus/slimbus.ko",
+        "drivers/soc/qcom/apr.ko",
+        "drivers/soc/qcom/cmd-db.ko",
+        "drivers/soc/qcom/cpr.ko",
+        "drivers/soc/qcom/llcc-qcom.ko",
+        "drivers/soc/qcom/mdt_loader.ko",
+        "drivers/soc/qcom/pdr_interface.ko",
+        "drivers/soc/qcom/qcom_aoss.ko",
+        "drivers/soc/qcom/qcom_rpmh.ko",
+        "drivers/soc/qcom/qmi_helpers.ko",
+        "drivers/soc/qcom/rmtfs_mem.ko",
+        "drivers/soc/qcom/rpmhpd.ko",
+        "drivers/soc/qcom/smem.ko",
+        "drivers/soc/qcom/smp2p.ko",
+        "drivers/soc/qcom/smsm.ko",
+        "drivers/soc/qcom/socinfo.ko",
+        "drivers/soc/qcom/spm.ko",
+        "drivers/soundwire/soundwire-bus.ko",
+        "drivers/soundwire/soundwire-qcom.ko",
+        "drivers/spi/spi-geni-qcom.ko",
+        "drivers/spi/spi-pl022.ko",
+        "drivers/spi/spi-qcom-qspi.ko",
+        "drivers/spi/spi-qup.ko",
+        "drivers/spmi/spmi-pmic-arb.ko",
+        "drivers/thermal/qcom/lmh.ko",
+        "drivers/thermal/qcom/qcom-spmi-adc-tm5.ko",
+        "drivers/thermal/qcom/qcom-spmi-temp-alarm.ko",
+        "drivers/thermal/qcom/qcom_tsens.ko",
+        "drivers/tty/serial/msm_serial.ko",
+        "drivers/ufs/host/ufs_qcom.ko",
+        "drivers/usb/common/ulpi.ko",
+        "drivers/usb/host/ohci-hcd.ko",
+        "drivers/usb/host/ohci-pci.ko",
+        "drivers/usb/host/ohci-platform.ko",
+        "drivers/usb/typec/qcom-pmic-typec.ko",
+        "drivers/watchdog/pm8916_wdt.ko",
+        "drivers/watchdog/qcom-wdt.ko",
+        "net/qrtr/qrtr.ko",
+        "net/qrtr/qrtr-mhi.ko",
+        "net/qrtr/qrtr-smd.ko",
+        "net/qrtr/qrtr-tun.ko",
+        "sound/soc/codecs/snd-soc-dmic.ko",
+        "sound/soc/codecs/snd-soc-hdmi-codec.ko",
+        "sound/soc/codecs/snd-soc-lpass-macro-common.ko",
+        "sound/soc/codecs/snd-soc-lpass-va-macro.ko",
+        "sound/soc/codecs/snd-soc-lpass-wsa-macro.ko",
+        "sound/soc/codecs/snd-soc-max98927.ko",
+        "sound/soc/codecs/snd-soc-rl6231.ko",
+        "sound/soc/codecs/snd-soc-rt5663.ko",
+        "sound/soc/codecs/snd-soc-wcd-mbhc.ko",
+        "sound/soc/codecs/snd-soc-wcd9335.ko",
+        "sound/soc/codecs/snd-soc-wcd934x.ko",
+        "sound/soc/codecs/snd-soc-wsa881x.ko",
+        "sound/soc/qcom/qdsp6/q6adm.ko",
+        "sound/soc/qcom/qdsp6/q6afe.ko",
+        "sound/soc/qcom/qdsp6/q6afe-clocks.ko",
+        "sound/soc/qcom/qdsp6/q6afe-dai.ko",
+        "sound/soc/qcom/qdsp6/q6apm-dai.ko",
+        "sound/soc/qcom/qdsp6/q6apm-lpass-dais.ko",
+        "sound/soc/qcom/qdsp6/q6asm.ko",
+        "sound/soc/qcom/qdsp6/q6asm-dai.ko",
+        "sound/soc/qcom/qdsp6/q6core.ko",
+        "sound/soc/qcom/qdsp6/q6prm.ko",
+        "sound/soc/qcom/qdsp6/q6prm-clocks.ko",
+        "sound/soc/qcom/qdsp6/q6routing.ko",
+        "sound/soc/qcom/qdsp6/snd-q6apm.ko",
+        "sound/soc/qcom/qdsp6/snd-q6dsp-common.ko",
+        "sound/soc/qcom/snd-soc-qcom-common.ko",
+        "sound/soc/qcom/snd-soc-qcom-sdw.ko",
+        "sound/soc/qcom/snd-soc-sdm845.ko",
+        "sound/soc/qcom/snd-soc-sm8250.ko",
+    ],
+)
+
+# TODO(b/258259749): Convert rockpi4 to mixed build
+kernel_build(
+    name = "rockpi4",
+    outs = [
+        "Image",
+        "System.map",
+        "modules.builtin",
+        "modules.builtin.modinfo",
+        "rk3399-rock-pi-4b.dtb",
+        "vmlinux",
+        "vmlinux.symvers",
+    ],
+    build_config = "build.config.rockpi4",
+    dtstree = "//common-modules/virtual-device:rockpi4_dts",
+    module_outs = COMMON_GKI_MODULES_LIST + [
+        # keep sorted
+        "drivers/block/virtio_blk.ko",
+        "drivers/char/hw_random/virtio-rng.ko",
+        "drivers/clk/clk-rk808.ko",
+        "drivers/cpufreq/cpufreq-dt.ko",
+        "drivers/dma/pl330.ko",
+        "drivers/gpu/drm/bridge/analogix/analogix_dp.ko",
+        "drivers/gpu/drm/bridge/synopsys/dw-hdmi.ko",
+        "drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.ko",
+        "drivers/gpu/drm/display/drm_display_helper.ko",
+        "drivers/gpu/drm/drm_dma_helper.ko",
+        "drivers/gpu/drm/rockchip/rockchipdrm.ko",
+        "drivers/i2c/busses/i2c-rk3x.ko",
+        "drivers/iio/adc/rockchip_saradc.ko",
+        "drivers/iio/buffer/industrialio-triggered-buffer.ko",
+        "drivers/iio/buffer/kfifo_buf.ko",
+        "drivers/mfd/rk808.ko",
+        "drivers/mmc/core/pwrseq_simple.ko",
+        "drivers/mmc/host/cqhci.ko",
+        "drivers/mmc/host/dw_mmc.ko",
+        "drivers/mmc/host/dw_mmc-pltfm.ko",
+        "drivers/mmc/host/dw_mmc-rockchip.ko",
+        "drivers/mmc/host/sdhci-of-arasan.ko",
+        "drivers/net/ethernet/stmicro/stmmac/dwmac-rk.ko",
+        "drivers/net/ethernet/stmicro/stmmac/stmmac.ko",
+        "drivers/net/ethernet/stmicro/stmmac/stmmac-platform.ko",
+        "drivers/net/net_failover.ko",
+        "drivers/net/pcs/pcs_xpcs.ko",
+        "drivers/net/virtio_net.ko",
+        "drivers/pci/controller/pcie-rockchip-host.ko",
+        "drivers/phy/rockchip/phy-rockchip-emmc.ko",
+        "drivers/phy/rockchip/phy-rockchip-inno-usb2.ko",
+        "drivers/phy/rockchip/phy-rockchip-pcie.ko",
+        "drivers/phy/rockchip/phy-rockchip-typec.ko",
+        "drivers/pwm/pwm-rockchip.ko",
+        "drivers/regulator/fan53555.ko",
+        "drivers/regulator/pwm-regulator.ko",
+        "drivers/regulator/rk808-regulator.ko",
+        "drivers/rtc/rtc-rk808.ko",
+        "drivers/soc/rockchip/io-domain.ko",
+        "drivers/thermal/rockchip_thermal.ko",
+        "drivers/usb/host/ohci-hcd.ko",
+        "drivers/usb/host/ohci-platform.ko",
+        "drivers/virtio/virtio_pci.ko",
+        "drivers/virtio/virtio_pci_legacy_dev.ko",
+        "drivers/virtio/virtio_pci_modern_dev.ko",
+        "drivers/watchdog/dw_wdt.ko",
+        "net/core/failover.ko",
+    ],
+)
+
+kernel_modules_install(
+    name = "rockpi4_modules_install",
+    kernel_build = "//common:rockpi4",
+)
+
+kernel_images(
+    name = "rockpi4_images",
+    build_initramfs = True,
+    kernel_build = "//common:rockpi4",
+    kernel_modules_install = "//common:rockpi4_modules_install",
+)
+
+copy_to_dist_dir(
+    name = "rockpi4_dist",
+    data = [
+        ":rockpi4",
+        ":rockpi4_images",
+        ":rockpi4_modules_install",
+    ],
+    dist_dir = "out/rockpi4/dist",
+    flat = True,
+)
+
+kernel_build(
+    name = "fips140",
+    outs = [],
+    base_kernel = ":kernel_aarch64",
+    build_config = "build.config.gki.aarch64.fips140",
+    module_outs = ["crypto/fips140.ko"],
+)
+
+copy_to_dist_dir(
+    name = "fips140_dist",
+    data = [
+        ":fips140",
+    ],
+    dist_dir = "out/fips140/dist",
+    flat = True,
+)
+
+# allmodconfig build tests.
+# These are build tests only, so:
+# - outs are intentionally set to empty to not copy anything to DIST_DIR
+# - --allow_undeclared_modules must be used so modules are not declared or copied.
+# - No dist target because these are build tests. We don't care about the artifacts.
+
+# tools/bazel build --allow_undeclared_modules //common:kernel_aarch64_allmodconfig
+kernel_build(
+    name = "kernel_aarch64_allmodconfig",
+    # Hack to actually check the build.
+    # Otherwise, Bazel thinks that there are no output files, and skips building.
+    outs = [".config"],
+    build_config = "build.config.allmodconfig.aarch64",
+    visibility = ["//visibility:private"],
+)
+
+# tools/bazel build --allow_undeclared_modules //common:kernel_x86_64_allmodconfig
+kernel_build(
+    name = "kernel_x86_64_allmodconfig",
+    # Hack to actually check the build.
+    # Otherwise, Bazel thinks that there are no output files, and skips building.
+    outs = [".config"],
+    build_config = "build.config.allmodconfig.x86_64",
+    visibility = ["//visibility:private"],
+)
+
+# tools/bazel build --allow_undeclared_modules //common:kernel_arm_allmodconfig
+kernel_build(
+    name = "kernel_arm_allmodconfig",
+    # Hack to actually check the build.
+    # Otherwise, Bazel thinks that there are no output files, and skips building.
+    outs = [".config"],
+    build_config = "build.config.allmodconfig.arm",
+    visibility = ["//visibility:private"],
+)
+
+# DDK Headers
+# All headers. These are the public targets for DDK modules to use.
+alias(
+    name = "all_headers",
+    actual = "all_headers_aarch64",
+    visibility = ["//visibility:public"],
+)
+
+ddk_headers(
+    name = "all_headers_aarch64",
+    hdrs = [":all_headers_allowlist_aarch64"] + select({
+        "//build/kernel/kleaf:allow_ddk_unsafe_headers_set": [":all_headers_unsafe"],
+        "//conditions:default": [],
+    }),
+    visibility = ["//visibility:public"],
+)
+
+ddk_headers(
+    name = "all_headers_arm",
+    hdrs = [":all_headers_allowlist_arm"] + select({
+        "//build/kernel/kleaf:allow_ddk_unsafe_headers_set": [":all_headers_unsafe"],
+        "//conditions:default": [],
+    }),
+    visibility = ["//visibility:public"],
+)
+
+ddk_headers(
+    name = "all_headers_riscv64",
+    hdrs = [":all_headers_allowlist_riscv64"] + select({
+        "//build/kernel/kleaf:allow_ddk_unsafe_headers_set": [":all_headers_unsafe"],
+        "//conditions:default": [],
+    }),
+    visibility = ["//visibility:public"],
+)
+
+ddk_headers(
+    name = "all_headers_x86_64",
+    hdrs = [":all_headers_allowlist_x86_64"] + select({
+        "//build/kernel/kleaf:allow_ddk_unsafe_headers_set": [":all_headers_unsafe"],
+        "//conditions:default": [],
+    }),
+    visibility = ["//visibility:public"],
+)
+
+# Implementation details for DDK headers. The targets below cannot be directly
+# depended on by DDK modules.
+
+# DDK headers allowlist. This is the list of all headers and include
+# directories that are safe to use in DDK modules.
+ddk_headers(
+    name = "all_headers_allowlist_aarch64",
+    hdrs = [
+        ":all_headers_allowlist_aarch64_globs",
+        ":all_headers_allowlist_common_globs",
+    ],
+    # The list of include directories where source files can #include headers
+    # from. In other words, these are the `-I` option to the C compiler.
+    # These are prepended to LINUXINCLUDE.
+    linux_includes = [
+        "arch/arm64/include",
+        "arch/arm64/include/uapi",
+        "include",
+        "include/uapi",
+    ],
+    visibility = ["//visibility:private"],
+)
+
+ddk_headers(
+    name = "all_headers_allowlist_arm",
+    hdrs = [
+        ":all_headers_allowlist_arm_globs",
+        ":all_headers_allowlist_common_globs",
+    ],
+    # The list of include directories where source files can #include headers
+    # from. In other words, these are the `-I` option to the C compiler.
+    # These are prepended to LINUXINCLUDE.
+    linux_includes = [
+        "arch/arm/include",
+        "arch/arm/include/uapi",
+        "include",
+        "include/uapi",
+    ],
+    visibility = ["//visibility:private"],
+)
+
+ddk_headers(
+    name = "all_headers_allowlist_riscv64",
+    hdrs = [
+        ":all_headers_allowlist_common_globs",
+        ":all_headers_allowlist_riscv64_globs",
+    ],
+    # The list of include directories where source files can #include headers
+    # from. In other words, these are the `-I` option to the C compiler.
+    # These are prepended to LINUXINCLUDE.
+    linux_includes = [
+        "arch/riscv/include",
+        "arch/riscv/include/uapi",
+        "include",
+        "include/uapi",
+    ],
+    visibility = ["//visibility:private"],
+)
+
+ddk_headers(
+    name = "all_headers_allowlist_x86_64",
+    hdrs = [
+        ":all_headers_allowlist_common_globs",
+        ":all_headers_allowlist_x86_64_globs",
+    ],
+    # The list of include directories where source files can #include headers
+    # from. In other words, these are the `-I` option to the C compiler.
+    # These are prepended to LINUXINCLUDE.
+    linux_includes = [
+        "arch/x86/include",
+        "arch/x86/include/uapi",
+        "include",
+        "include/uapi",
+    ],
+    visibility = ["//visibility:private"],
+)
+
+# List of DDK headers allowlist that are glob()-ed to avoid changes of BUILD
+# file when the list of files changes. All headers in these directories
+# are safe to use.
+# These are separate filegroup targets so the all_headers_allowlist_* are
+# more friendly to batch BUILD file update tools like buildozer.
+
+# globs() for arm64 only
+filegroup(
+    name = "all_headers_allowlist_aarch64_globs",
+    srcs = glob(["arch/arm64/include/**/*.h"]),
+    visibility = ["//visibility:private"],
+)
+
+# globs() for arm only
+filegroup(
+    name = "all_headers_allowlist_arm_globs",
+    srcs = glob(["arch/arm/include/**/*.h"]),
+    visibility = ["//visibility:private"],
+)
+
+# globs() for riscv64 only
+filegroup(
+    name = "all_headers_allowlist_riscv64_globs",
+    srcs = glob(["arch/riscv/include/**/*.h"]),
+    visibility = ["//visibility:private"],
+)
+
+# globs() for x86 only
+filegroup(
+    name = "all_headers_allowlist_x86_64_globs",
+    srcs = glob(["arch/x86/include/**/*.h"]),
+    visibility = ["//visibility:private"],
+)
+
+# globs() for all architectures
+filegroup(
+    name = "all_headers_allowlist_common_globs",
+    srcs = glob(["include/**/*.h"]),
+    visibility = ["//visibility:private"],
+)
+
+# DDK headers unsafe list. This is the list of all headers and include
+# directories that may be used during migration from kernel_module's, but
+# should be avoided in general.
+# Use with caution; items may:
+# - be removed without notice
+# - be moved into all_headers
+ddk_headers(
+    name = "all_headers_unsafe",
+    hdrs = [
+        "drivers/gpu/drm/virtio/virtgpu_trace.h",
+    ],
+    # The list of include directories where source files can #include headers
+    # from. In other words, these are the `-I` option to the C compiler.
+    # Unsafe include directories are appended to ccflags-y.
+    includes = [],
+    visibility = ["//visibility:private"],
+)
diff --git a/Documentation/ABI/testing/OWNERS b/Documentation/ABI/testing/OWNERS
new file mode 100644
index 0000000..3ab5dca
--- /dev/null
+++ b/Documentation/ABI/testing/OWNERS
@@ -0,0 +1 @@
+per-file sysfs-fs-f2fs=file:/fs/f2fs/OWNERS
diff --git a/Documentation/ABI/testing/configfs-usb-gadget-uvc b/Documentation/ABI/testing/configfs-usb-gadget-uvc
index 611b23e..f00cff6 100644
--- a/Documentation/ABI/testing/configfs-usb-gadget-uvc
+++ b/Documentation/ABI/testing/configfs-usb-gadget-uvc
@@ -197,7 +197,7 @@
 					read-only
 		bmaControls		this format's data for bmaControls in
 					the streaming header
-		bmInterfaceFlags	specifies interlace information,
+		bmInterlaceFlags	specifies interlace information,
 					read-only
 		bAspectRatioY		the X dimension of the picture aspect
 					ratio, read-only
@@ -253,7 +253,7 @@
 					read-only
 		bmaControls		this format's data for bmaControls in
 					the streaming header
-		bmInterfaceFlags	specifies interlace information,
+		bmInterlaceFlags	specifies interlace information,
 					read-only
 		bAspectRatioY		the X dimension of the picture aspect
 					ratio, read-only
diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
index 483639f..9e37566 100644
--- a/Documentation/ABI/testing/sysfs-fs-f2fs
+++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@@ -99,6 +99,12 @@
 		checkpoint is triggered, and issued during the checkpoint.
 		By default, it is disabled with 0.
 
+What:		/sys/fs/f2fs/<disk>/max_ordered_discard
+Date:		October 2022
+Contact:	"Yangtao Li" <frank.li@vivo.com>
+Description:	Controls the maximum ordered discard; the unit size is one block (4KB).
+		It is set to 16 by default.
+
 What:		/sys/fs/f2fs/<disk>/max_discard_request
 Date:		December 2021
 Contact:	"Konstantin Vyshetsky" <vkon@google.com>
@@ -132,7 +138,8 @@
 Description:	Controls discard granularity of inner discard thread. Inner thread
 		will not issue discards with size that is smaller than granularity.
 		The unit size is one block (4KB); configuration is currently supported
-		in range of [1, 512]. Default value is 4(=16KB).
+		only in the range [1, 512]. The default value is 16.
+		For small devices, the default value is 1.
 
 What:		/sys/fs/f2fs/<disk>/umount_discard_timeout
 Date:		January 2019
@@ -235,7 +242,7 @@
 What:		/sys/fs/f2fs/<disk>/features
 Date:		July 2017
 Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
-Description:	<deprecated: should use /sys/fs/f2fs/<disk>/feature_list/
+Description:	<deprecated: should use /sys/fs/f2fs/<disk>/feature_list/>
 		Shows all enabled features in current device.
 		Supported features:
 		encryption, blkzoned, extra_attr, projquota, inode_checksum,
@@ -592,10 +599,10 @@
 		in the length of 1..<max_fragment_hole> by turns. This value can be set
 		between 1..512 and the default value is 4.
 
-What:		/sys/fs/f2fs/<disk>/gc_urgent_high_remaining
-Date:		December 2021
-Contact:	"Daeho Jeong" <daehojeong@google.com>
-Description:	You can set the trial count limit for GC urgent high mode with this value.
+What:		/sys/fs/f2fs/<disk>/gc_remaining_trials
+Date:		October 2022
+Contact:	"Yangtao Li" <frank.li@vivo.com>
+Description:	You can set the trial count limit for GC urgent and idle mode with this value.
 		If GC thread gets to the limit, the mode will turn back to GC normal mode.
 		By default, the value is zero, which means there is no limit like before.
 
@@ -634,3 +641,31 @@
 Contact:	"Daeho Jeong" <daehojeong@google.com>
 Description:	Show the accumulated total revoked atomic write block count after boot.
 		If you write "0" here, you can initialize to "0".
+
+What:		/sys/fs/f2fs/<disk>/gc_mode
+Date:		October 2022
+Contact:	"Yangtao Li" <frank.li@vivo.com>
+Description:	Show the current gc_mode as a string.
+		This is a read-only entry.
+
+What:		/sys/fs/f2fs/<disk>/discard_urgent_util
+Date:		November 2022
+Contact:	"Yangtao Li" <frank.li@vivo.com>
+Description:	When space utilization exceeds this value, do background DISCARD aggressively:
+		DISCARD is issued forcibly in each period of min_discard_issue_time when the
+		number of pending discards is not 0, and the discard granularity is set to 1.
+		Default: 80
+
+What:		/sys/fs/f2fs/<disk>/hot_data_age_threshold
+Date:		November 2022
+Contact:	"Ping Xiong" <xiongping1@xiaomi.com>
+Description:	When DATA SEPARATION is on, it controls the age threshold used to mark
+		data blocks as hot. By default it is initialized to 262144 blocks
+		(equal to 1GB).
+
+What:		/sys/fs/f2fs/<disk>/warm_data_age_threshold
+Date:		November 2022
+Contact:	"Ping Xiong" <xiongping1@xiaomi.com>
+Description:	When DATA SEPARATION is on, it controls the age threshold used to mark
+		data blocks as warm. By default it is initialized to 2621440 blocks
+		(equal to 10GB).
diff --git a/Documentation/ABI/testing/sysfs-fs-fuse b/Documentation/ABI/testing/sysfs-fs-fuse
new file mode 100644
index 0000000..b995684
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-fs-fuse
@@ -0,0 +1,19 @@
+What:		/sys/fs/fuse/features/fuse_bpf
+Date:		December 2022
+Contact:	Paul Lawrence <paullawrence@google.com>
+Description:
+		Read-only file that contains the word 'supported' if fuse-bpf is
+		supported; the file does not exist otherwise.
+
+What:		/sys/fs/fuse/bpf_prog_type_fuse
+Date:		December 2022
+Contact:	Paul Lawrence <paullawrence@google.com>
+Description:
+		bpf_prog_type_fuse defines the program type of bpf programs that
+		may be passed to fuse-bpf. For upstream bpf program types, this
+		is a constant defined in a contiguous array of constants.
+		bpf_prog_type_fuse is appended to the end of the list, so it may
+		change and therefore its value must be read from this file.
+
+		The contents are the ASCII decimal representation of bpf_prog_type_fuse.
+
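+		For example, a userspace consumer of bpf_prog_type_fuse might
+		read it as follows (an illustrative sketch, not part of the
+		ABI; error handling is reduced to returning -1)::
+
+		  #include <stdio.h>
+
+		  /* Return the fuse-bpf program type, or -1 if unavailable. */
+		  int fuse_bpf_prog_type(void)
+		  {
+		          FILE *f = fopen("/sys/fs/fuse/bpf_prog_type_fuse", "r");
+		          int type = -1;
+
+		          if (f) {
+		                  /* The file holds an ASCII decimal number. */
+		                  if (fscanf(f, "%d", &type) != 1)
+		                          type = -1;
+		                  fclose(f);
+		          }
+		          return type;
+		  }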
diff --git a/Documentation/ABI/testing/sysfs-fs-incfs b/Documentation/ABI/testing/sysfs-fs-incfs
new file mode 100644
index 0000000..690c687
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-fs-incfs
@@ -0,0 +1,64 @@
+What:		/sys/fs/incremental-fs/features/corefs
+Date:		2019
+Contact:	Paul Lawrence <paullawrence@google.com>
+Description:	Reads 'supported'. Always present.
+
+What:		/sys/fs/incremental-fs/features/v2
+Date:		April 2021
+Contact:	Paul Lawrence <paullawrence@google.com>
+Description:	Reads 'supported'. Present if all v2 features of incfs are
+		supported.
+
+What:		/sys/fs/incremental-fs/features/zstd
+Date:		April 2021
+Contact:	Paul Lawrence <paullawrence@google.com>
+Description:	Reads 'supported'. Present if zstd compression is supported
+		for data blocks.
+
+What:		/sys/fs/incremental-fs/instances/[name]
+Date:		April 2021
+Contact:	Paul Lawrence <paullawrence@google.com>
+Description:	Folder created when incfs is mounted with the sysfs_name=[name]
+		option. If this option is used, the following values are created
+		in this folder.
+
+What:		/sys/fs/incremental-fs/instances/[name]/reads_delayed_min
+Date:		April 2021
+Contact:	Paul Lawrence <paullawrence@google.com>
+Description:	Returns a count of the number of reads that were delayed as a
+		result of the per-UID read timeouts min time setting.
+
+What:		/sys/fs/incremental-fs/instances/[name]/reads_delayed_min_us
+Date:		April 2021
+Contact:	Paul Lawrence <paullawrence@google.com>
+Description:	Returns total delay time for all files since first mount as a
+		result of the per-UID read timeouts min time setting.
+
+What:		/sys/fs/incremental-fs/instances/[name]/reads_delayed_pending
+Date:		April 2021
+Contact:	Paul Lawrence <paullawrence@google.com>
+Description:	Returns a count of the number of reads that were delayed as a
+		result of waiting for a pending read.
+
+What:		/sys/fs/incremental-fs/instances/[name]/reads_delayed_pending_us
+Date:		April 2021
+Contact:	Paul Lawrence <paullawrence@google.com>
+Description:	Returns total delay time for all files since first mount as a
+		result of waiting for a pending read.
+
+What:		/sys/fs/incremental-fs/instances/[name]/reads_failed_hash_verification
+Date:		April 2021
+Contact:	Paul Lawrence <paullawrence@google.com>
+Description:	Returns number of reads that failed because of hash verification
+		failures.
+
+What:		/sys/fs/incremental-fs/instances/[name]/reads_failed_other
+Date:		April 2021
+Contact:	Paul Lawrence <paullawrence@google.com>
+Description:	Returns number of reads that failed for reasons other than
+		timing out or hash failures.
+
+What:		/sys/fs/incremental-fs/instances/[name]/reads_failed_timed_out
+Date:		April 2021
+Contact:	Paul Lawrence <paullawrence@google.com>
+Description:	Returns number of reads that timed out.
diff --git a/Documentation/ABI/testing/sysfs-kernel-wakeup_reasons b/Documentation/ABI/testing/sysfs-kernel-wakeup_reasons
new file mode 100644
index 0000000..acb19b9
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-wakeup_reasons
@@ -0,0 +1,16 @@
+What:		/sys/kernel/wakeup_reasons/last_resume_reason
+Date:		February 2014
+Contact:	Ruchi Kandoi <kandoiruchi@google.com>
+Description:
+		The /sys/kernel/wakeup_reasons/last_resume_reason is
+		used to report wakeup reasons after the system exits suspend.
+
+What:		/sys/kernel/wakeup_reasons/last_suspend_time
+Date:		March 2015
+Contact:	jinqian <jinqian@google.com>
+Description:
+		The /sys/kernel/wakeup_reasons/last_suspend_time is
+		used to report the time spent in the last suspend cycle. It
+		contains two numbers (in seconds) separated by a space. The
+		first number is the time spent in the suspend and resume
+		processes. The second number is the time spent in the sleep
+		state.
\ No newline at end of file
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6b83886..7122305 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1774,6 +1774,10 @@
 				If specified, z/VM IUCV HVC accepts connections
 				from listed z/VM user IDs only.
 
+	hvc_dcc.enable=	[ARM,ARM64]	Enable the DCC driver at runtime. For GKI,
+				disabled at runtime by default to prevent
+				crashes in devices that do not support DCC.
+
 	hv_nopvspin	[X86,HYPER_V] Disables the paravirt spinlock optimizations
 				      which allow the hypervisor to 'idle' the
 				      guest on lock contention.
@@ -1784,6 +1788,10 @@
 			between unregistering the boot console and initializing
 			the real console.
 
+	kunit.enable=	[KUNIT] Enable executing KUnit tests in modules on load.
+			Requires CONFIG_KUNIT to be enabled.
+			Default is 0 (disabled).
+
 	i2c_bus=	[HW]	Override the default board specific I2C bus speed
 				or register an additional I2C bus that is not
 				registered from board initialization code.
@@ -2176,6 +2184,9 @@
 			1 - Bypass the IOMMU for DMA.
 			unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
 
+	ioremap_guard	[ARM64] enable the KVM MMIO guard functionality
+			if available.
+
 	io7=		[HW] IO7 for Marvel-based Alpha systems
 			See comment before marvel_specify_io7 in
 			arch/alpha/kernel/core_marvel.c.
@@ -2538,7 +2549,9 @@
 			      protected guests.
 
 			protected: nVHE-based mode with support for guests whose
-				   state is kept private from the host.
+				   state is kept private from the host. See
+				   Documentation/virt/kvm/arm/pkvm.rst for more
+				   information about this mode of operation.
 
 			Defaults to VHE/nVHE based on hardware support. Setting
 			mode to "protected" will disable kexec and hibernation
diff --git a/Documentation/block/inline-encryption.rst b/Documentation/block/inline-encryption.rst
index 4d151fb..4fd2b11 100644
--- a/Documentation/block/inline-encryption.rst
+++ b/Documentation/block/inline-encryption.rst
@@ -77,10 +77,10 @@
 ============
 
 We introduce ``struct blk_crypto_key`` to represent an inline encryption key and
-how it will be used.  This includes the actual bytes of the key; the size of the
-key; the algorithm and data unit size the key will be used with; and the number
-of bytes needed to represent the maximum data unit number the key will be used
-with.
+how it will be used.  This includes the type of the key (standard or
+hardware-wrapped); the actual bytes of the key; the size of the key; the
+algorithm and data unit size the key will be used with; and the number of bytes
+needed to represent the maximum data unit number the key will be used with.
 
 We introduce ``struct bio_crypt_ctx`` to represent an encryption context.  It
 contains a data unit number and a pointer to a blk_crypto_key.  We add pointers
@@ -142,7 +142,7 @@
 of inline encryption using the kernel crypto API.  blk-crypto-fallback is built
 into the block layer, so it works on any block device without any special setup.
 Essentially, when a bio with an encryption context is submitted to a
-request_queue that doesn't support that encryption context, the block layer will
+block_device that doesn't support that encryption context, the block layer will
 handle en/decryption of the bio using blk-crypto-fallback.
 
 For encryption, the data cannot be encrypted in-place, as callers usually rely
@@ -187,7 +187,7 @@
 
 ``blk_crypto_config_supported()`` allows users to check ahead of time whether
 inline encryption with particular crypto settings will work on a particular
-request_queue -- either via hardware or via blk-crypto-fallback.  This function
+block_device -- either via hardware or via blk-crypto-fallback.  This function
 takes in a ``struct blk_crypto_config`` which is like blk_crypto_key, but omits
 the actual bytes of the key and instead just contains the algorithm, data unit
 size, etc.  This function can be useful if blk-crypto-fallback is disabled.
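+
+A minimal sketch of such an ahead-of-time check (a sketch under the assumption
+that the device argument is a ``struct block_device`` as documented above, with
+field names from ``struct blk_crypto_config``; ``bdev`` is a placeholder)::
+
+	struct blk_crypto_config cfg = {
+		.crypto_mode = BLK_ENCRYPTION_MODE_AES_256_XTS,
+		.data_unit_size = 4096,
+		.dun_bytes = 4,		/* bytes for the max data unit number */
+	};
+
+	if (!blk_crypto_config_supported(bdev, &cfg))
+		return -EOPNOTSUPP;	/* neither hardware nor fallback works */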
@@ -195,7 +195,7 @@
 ``blk_crypto_init_key()`` allows users to initialize a blk_crypto_key.
 
 Users must call ``blk_crypto_start_using_key()`` before actually starting to use
-a blk_crypto_key on a request_queue (even if ``blk_crypto_config_supported()``
+a blk_crypto_key on a block_device (even if ``blk_crypto_config_supported()``
 was called earlier).  This is needed to initialize blk-crypto-fallback if it
 will be needed.  This must not be called from the data path, as this may have to
 allocate resources, which may deadlock in that case.
@@ -207,7 +207,7 @@
 later, as that happens automatically when the bio is freed or reset.
 
 Finally, when done using inline encryption with a blk_crypto_key on a
-request_queue, users must call ``blk_crypto_evict_key()``.  This ensures that
+block_device, users must call ``blk_crypto_evict_key()``.  This ensures that
 the key is evicted from all keyslots it may be programmed into and unlinked from
 any kernel data structures it may be linked into.
 
@@ -221,9 +221,9 @@
 5. ``blk_crypto_evict_key()`` (after all I/O has completed)
 6. Zeroize the blk_crypto_key (this has no dedicated function)
 
-If a blk_crypto_key is being used on multiple request_queues, then
+If a blk_crypto_key is being used on multiple block_devices, then
 ``blk_crypto_config_supported()`` (if used), ``blk_crypto_start_using_key()``,
-and ``blk_crypto_evict_key()`` must be called on each request_queue.
+and ``blk_crypto_evict_key()`` must be called on each block_device.
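+
+A minimal sketch of this sequence from a hypothetical in-kernel user, matching
+the function names documented above (device arguments take a
+``struct block_device`` after this update; trees that add hardware-wrapped key
+support extend ``blk_crypto_init_key()`` with a key type and size; error
+handling is trimmed, and ``bdev``, ``raw_key``, ``bio`` and the DUN value are
+placeholders)::
+
+	#include <linux/bio.h>
+	#include <linux/blk-crypto.h>
+	#include <linux/string.h>
+
+	static int example_encrypted_io(struct block_device *bdev,
+					const u8 raw_key[64], struct bio *bio)
+	{
+		struct blk_crypto_key key;
+		u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE] = { 0 };
+		int err;
+
+		/* Steps 1-2: initialize the key, AES-256-XTS, 4K data units. */
+		err = blk_crypto_init_key(&key, raw_key,
+					  BLK_ENCRYPTION_MODE_AES_256_XTS,
+					  4 /* dun_bytes */, 4096);
+		if (err)
+			return err;
+
+		/* Step 3: not in the data path; may set up the fallback. */
+		err = blk_crypto_start_using_key(bdev, &key);
+		if (err)
+			return err;
+
+		/* Step 4: attach the encryption context, then submit. */
+		bio_crypt_set_ctx(bio, &key, dun, GFP_NOIO);
+		submit_bio(bio);
+
+		/* Steps 5-6, after all I/O has completed. */
+		blk_crypto_evict_key(bdev, &key);
+		memzero_explicit(&key, sizeof(key));
+		return 0;
+	}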
 
 API presented to device drivers
 ===============================
@@ -302,3 +302,216 @@
 When the crypto API fallback is enabled, this means that all bios with an
 encryption context will use the fallback, and IO will complete as usual.  When
 the fallback is disabled, a bio with an encryption context will be failed.
+
+.. _hardware_wrapped_keys:
+
+Hardware-wrapped keys
+=====================
+
+Motivation and threat model
+---------------------------
+
+Linux storage encryption (dm-crypt, fscrypt, eCryptfs, etc.) traditionally
+relies on the raw encryption key(s) being present in kernel memory so that the
+encryption can be performed.  This traditionally isn't seen as a problem because
+the key(s) won't be present during an offline attack, which is the main type of
+attack that storage encryption is intended to protect from.
+
+However, there is an increasing desire to also protect users' data from other
+types of attacks (to the extent possible), including:
+
+- Cold boot attacks, where an attacker with physical access to a system suddenly
+  powers it off, then immediately dumps the system memory to extract recently
+  in-use encryption keys, then uses these keys to decrypt user data on-disk.
+
+- Online attacks where the attacker is able to read kernel memory without fully
+  compromising the system, followed by an offline attack where any extracted
+  keys can be used to decrypt user data on-disk.  An example of such an online
+  attack would be if the attacker is able to run some code on the system that
+  exploits a Meltdown-like vulnerability but is unable to escalate privileges.
+
+- Online attacks where the attacker fully compromises the system, but their data
+  exfiltration is significantly time-limited and/or bandwidth-limited, so in
+  order to completely exfiltrate the data they need to extract the encryption
+  keys to use in a later offline attack.
+
+Hardware-wrapped keys are a feature of inline encryption hardware that is
+designed to protect users' data from the above attacks (to the extent possible),
+without introducing limitations such as a maximum number of keys.
+
+Note that it is impossible to **fully** protect users' data from these attacks.
+Even in the attacks where the attacker "just" gets read access to kernel memory,
+they can still extract any user data that is present in memory, including
+plaintext pagecache pages of encrypted files.  The focus here is just on
+protecting the encryption keys, as those instantly give access to **all** user
+data in any following offline attack, rather than just some of it (where which
+data is included in that "some" might not be controlled by the attacker).
+
+Solution overview
+-----------------
+
+Inline encryption hardware typically has "keyslots" into which software can
+program keys for the hardware to use; the contents of keyslots typically can't
+be read back by software.  As such, the above security goals could be achieved
+if the kernel simply erased its copy of the key(s) after programming them into
+keyslot(s) and thereafter only referred to them via keyslot number.
+
+However, that naive approach runs into the problem that it limits the number of
+unlocked keys to the number of keyslots, which typically is a small number.  In
+cases where there is only one encryption key system-wide (e.g., a full-disk
+encryption key), that can be tolerable.  However, in general there can be many
+logged-in users with many different keys, and/or many running applications with
+application-specific encrypted storage areas.  This is especially true if
+file-based encryption (e.g. fscrypt) is being used.
+
+Thus, it is important for the kernel to still have a way to "remind" the
+hardware about a key, without actually having the raw key itself.  This would
+ensure that the number of hardware keyslots only limits the number of active I/O
+requests, not other things such as the number of logged-in users, the number of
+running apps, or the number of encrypted storage areas that apps can create.
+
+Somewhat less importantly, it is also desirable that the raw keys are never
+visible to software at all, even while being initially unlocked.  This would
+ensure that a read-only compromise of system memory will never allow a key to be
+extracted to be used off-system, even if it occurs when a key is being unlocked.
+
+To solve all these problems, some vendors of inline encryption hardware have
+made their hardware support *hardware-wrapped keys*.  Hardware-wrapped keys
+are encrypted keys that can only be unwrapped (decrypted) and used by hardware
+-- either by the inline encryption hardware itself, or by a dedicated hardware
+block that can directly provision keys to the inline encryption hardware.
+
+(We refer to them as "hardware-wrapped keys" rather than simply "wrapped keys"
+to add some clarity in cases where there could be other types of wrapped keys,
+such as in file-based encryption.  Key wrapping is a commonly used technique.)
+
+The key which wraps (encrypts) hardware-wrapped keys is a hardware-internal key
+that is never exposed to software; it is either a persistent key (a "long-term
+wrapping key") or a per-boot key (an "ephemeral wrapping key").  The long-term
+wrapped form of the key is what is initially unlocked, but it is erased from
+memory as soon as it is converted into an ephemerally-wrapped key.  In-use
+hardware-wrapped keys are always ephemerally-wrapped, not long-term wrapped.
+
+As inline encryption hardware can only be used to encrypt/decrypt data on-disk,
+the hardware also includes a level of indirection; it doesn't use the unwrapped
+key directly for inline encryption, but rather derives both an inline encryption
+key and a "software secret" from it.  Software can use the "software secret" for
+tasks that can't use the inline encryption hardware, such as filenames
+encryption.  The software secret is not protected from memory compromise.
+
+Key hierarchy
+-------------
+
+Here is the key hierarchy for a hardware-wrapped key::
+
+                       Hardware-wrapped key
+                                |
+                                |
+                          <Hardware KDF>
+                                |
+                  -----------------------------
+                  |                           |
+        Inline encryption key           Software secret
+
+The components are:
+
+- *Hardware-wrapped key*: a key for the hardware's KDF (Key Derivation
+  Function), in ephemerally-wrapped form.  The key wrapping algorithm is a
+  hardware implementation detail that doesn't impact kernel operation, but a
+  strong authenticated encryption algorithm such as AES-256-GCM is recommended.
+
+- *Hardware KDF*: a KDF (Key Derivation Function) which the hardware uses to
+  derive subkeys after unwrapping the wrapped key.  The hardware's choice of KDF
+  doesn't impact kernel operation, but it does need to be known for testing
+  purposes, and it's also assumed to have at least a 256-bit security strength.
+  All known hardware uses the SP800-108 KDF in Counter Mode with AES-256-CMAC,
+  with a particular choice of labels and contexts; new hardware should use this
+  already-vetted KDF.
+
+- *Inline encryption key*: a derived key which the hardware directly provisions
+  to a keyslot of the inline encryption hardware, without exposing it to
+  software.  In all known hardware, this will always be an AES-256-XTS key.
+  However, in principle other encryption algorithms could be supported too.
+  Hardware must derive distinct subkeys for each supported encryption algorithm.
+
+- *Software secret*: a derived key which the hardware returns to software so
+  that software can use it for cryptographic tasks that can't use inline
+  encryption.  This value is cryptographically isolated from the inline
+  encryption key, i.e. knowing one doesn't reveal the other.  (The KDF ensures
+  this.)  Currently, the software secret is always 32 bytes and thus is suitable
+  for cryptographic applications that require up to a 256-bit security strength.
+  Some use cases (e.g. full-disk encryption) won't require the software secret.
+
+Example: in the case of fscrypt, the fscrypt master key (the key that protects a
+particular set of encrypted directories) is made hardware-wrapped.  The inline
+encryption key is used as the file contents encryption key, while the software
+secret (rather than the master key directly) is used to key fscrypt's KDF
+(HKDF-SHA512) to derive other subkeys such as filenames encryption keys.
+
+Note that currently this design assumes a single inline encryption key per
+hardware-wrapped key, without any further key derivation.  Thus, in the case of
+fscrypt, currently hardware-wrapped keys are only compatible with the "inline
+encryption optimized" settings, which use one file contents encryption key per
+encryption policy rather than one per file.  This design could be extended to
+make the hardware derive per-file keys using per-file nonces passed down the
+storage stack, and in fact some hardware already supports this; future work is
+planned to remove this limitation by adding the corresponding kernel support.
+
+Kernel support
+--------------
+
+The inline encryption support of the kernel's block layer ("blk-crypto") has
+been extended to support hardware-wrapped keys as an alternative to standard
+keys, when hardware support is available.  This works in the following way:
+
+- A ``key_types_supported`` field is added to the crypto capabilities in
+  ``struct blk_crypto_profile``.  This allows device drivers to declare that
+  they support standard keys, hardware-wrapped keys, or both.
+
+- ``struct blk_crypto_key`` can now contain a hardware-wrapped key as an
+  alternative to a standard key; a ``key_type`` field is added to
+  ``struct blk_crypto_config`` to distinguish between the different key types.
+  This allows users of blk-crypto to en/decrypt data using a hardware-wrapped
+  key in a way very similar to using a standard key.
+
+- A new method ``blk_crypto_ll_ops::derive_sw_secret`` is added.  Device drivers
+  that support hardware-wrapped keys must implement this method.  Users of
+  blk-crypto can call ``blk_crypto_derive_sw_secret()`` to access this method.
+
+- The programming and eviction of hardware-wrapped keys happens via
+  ``blk_crypto_ll_ops::keyslot_program`` and
+  ``blk_crypto_ll_ops::keyslot_evict``, just like it does for standard keys.  If
+  a driver supports hardware-wrapped keys, then it must handle hardware-wrapped
+  keys being passed to these methods.
+
+blk-crypto-fallback doesn't support hardware-wrapped keys.  Therefore,
+hardware-wrapped keys can only be used with actual inline encryption hardware.
+
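+As an illustration, a driver advertising support for both key types might
+contain something along the following lines.  This is a hedged sketch only:
+the constant names (``BLK_CRYPTO_KEY_TYPE_STANDARD``,
+``BLK_CRYPTO_KEY_TYPE_HW_WRAPPED``, ``BLK_CRYPTO_SW_SECRET_SIZE``), the exact
+``derive_sw_secret`` signature, and the ``myhw_*`` helpers are illustrative
+assumptions, not authoritative definitions::
+
+    /* Hypothetical driver snippet; names and signatures are assumptions. */
+    static int myhw_derive_sw_secret(struct blk_crypto_profile *profile,
+                                     const u8 *eph_key, size_t eph_key_size,
+                                     u8 sw_secret[BLK_CRYPTO_SW_SECRET_SIZE])
+    {
+        /*
+         * Ask the wrapping hardware to unwrap eph_key, run its KDF, and
+         * hand back only the derived software secret.
+         */
+        return myhw_kdf_sw_secret(eph_key, eph_key_size, sw_secret);
+    }
+
+    static const struct blk_crypto_ll_ops myhw_crypto_ll_ops = {
+        .keyslot_program  = myhw_keyslot_program,
+        .keyslot_evict    = myhw_keyslot_evict,
+        .derive_sw_secret = myhw_derive_sw_secret,
+    };
+
+    /* At probe time: declare which key types the hardware accepts. */
+    profile->key_types_supported = BLK_CRYPTO_KEY_TYPE_STANDARD |
+                                   BLK_CRYPTO_KEY_TYPE_HW_WRAPPED;
+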
+Currently, the kernel only works with hardware-wrapped keys in
+ephemerally-wrapped form.  No generic kernel interfaces are provided for
+generating or importing hardware-wrapped keys in the first place, or converting
+them to ephemerally-wrapped form.  In Android, SoC vendors are required to
+support these operations in their KeyMint implementation (a hardware abstraction
+layer in userspace); for details, see the `Android documentation
+<https://source.android.com/security/encryption/hw-wrapped-keys>`_.
+
+Testability
+-----------
+
+Both the hardware KDF and the inline encryption itself are well-defined
+algorithms that don't depend on any secrets other than the unwrapped key.
+Therefore, if the unwrapped key is known to software, these algorithms can be
+reproduced in software in order to verify the ciphertext that is written to disk
+by the inline encryption hardware.
+
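+As an illustration, a test tool could reproduce the hardware KDF described
+above in userspace.  The following is a minimal sketch using OpenSSL's CMAC
+API; the SP800-108 field widths and the label/context layout are
+hardware-specific, so the 32-bit big-endian counter and length encodings
+shown here are assumptions::
+
+    /*
+     * Sketch: SP800-108 KDF in Counter Mode using AES-256-CMAC (OpenSSL).
+     * Field encodings and the label/context layout are assumptions; consult
+     * the hardware documentation for the real values.
+     */
+    #include <stdint.h>
+    #include <string.h>
+    #include <openssl/cmac.h>
+    #include <openssl/evp.h>
+
+    static int sp800_108_counter_kdf(const uint8_t key[32],
+                                     const uint8_t *label, size_t label_len,
+                                     const uint8_t *ctx_in, size_t ctx_len,
+                                     uint8_t *out, size_t out_len)
+    {
+        uint32_t L = out_len * 8;          /* output length in bits */
+        uint32_t n = (out_len + 15) / 16;  /* AES-CMAC emits 16 bytes */
+        uint32_t i;
+
+        for (i = 1; i <= n; i++) {
+            uint8_t be_i[4] = { i >> 24, i >> 16, i >> 8, i };
+            uint8_t be_L[4] = { L >> 24, L >> 16, L >> 8, L };
+            uint8_t prf[16];
+            size_t prf_len = sizeof(prf);
+            size_t take = out_len - (i - 1) * 16;
+            CMAC_CTX *ctx = CMAC_CTX_new();
+
+            if (!ctx)
+                return -1;
+            if (!CMAC_Init(ctx, key, 32, EVP_aes_256_cbc(), NULL)) {
+                CMAC_CTX_free(ctx);
+                return -1;
+            }
+            CMAC_Update(ctx, be_i, 4);             /* [i], the counter */
+            CMAC_Update(ctx, label, label_len);    /* Label */
+            CMAC_Update(ctx, "\0", 1);             /* 0x00 separator */
+            CMAC_Update(ctx, ctx_in, ctx_len);     /* Context */
+            CMAC_Update(ctx, be_L, 4);             /* [L] */
+            CMAC_Final(ctx, prf, &prf_len);
+            CMAC_CTX_free(ctx);
+
+            memcpy(out + (i - 1) * 16, prf, take > 16 ? 16 : take);
+        }
+        return 0;
+    }
+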
+However, the unwrapped key will only be known to software for testing if the
+"import" functionality is used.  Proper testing is not possible in the
+"generate" case where the hardware generates the key itself.  The correct
+operation of the "generate" mode thus relies on the security and correctness of
+the hardware RNG and its use to generate the key, as well as the testing of the
+"import" mode as that should cover all parts other than the key generation.
+
+For an example of a test that verifies the ciphertext written to disk in the
+"import" mode, see the fscrypt hardware-wrapped key tests in xfstests, or
+`Android's vts_kernel_encryption_test
+<https://android.googlesource.com/platform/test/vts-testcase/kernel/+/refs/heads/master/encryption/>`_.
diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
index 5c93ab9..e66916a 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -140,6 +140,23 @@
 - ``kasan.vmalloc=off`` or ``=on`` disables or enables tagging of vmalloc
   allocations (default: ``on``).
 
+- ``kasan.page_alloc.sample=<sampling interval>`` makes KASAN tag only every
+  Nth page_alloc allocation with order equal to or greater than
+  ``kasan.page_alloc.sample.order``, where N is the value of the ``sample``
+  parameter (default: ``1``, or tag every such allocation).
+  This parameter is intended to mitigate the performance overhead introduced
+  by KASAN.
+  Note that enabling this parameter makes Hardware Tag-Based KASAN skip checks
+  of allocations chosen by sampling and thus miss bad accesses to these
+  allocations. Use the default value for accurate bug detection.
+
+- ``kasan.page_alloc.sample.order=<minimum page order>`` specifies the minimum
+  order of allocations that are affected by sampling (default: ``3``).
+  Only applies when ``kasan.page_alloc.sample`` is set to a value greater
+  than ``1``.
+  This parameter is intended to allow sampling only large page_alloc
+  allocations, which are the biggest source of the performance overhead.
+
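+For example, to tag only every 10th page_alloc allocation of order 3 or
+higher (illustrative values), one could boot with::
+
+    kasan.page_alloc.sample=10 kasan.page_alloc.sample.order=3
+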
 Error reports
 ~~~~~~~~~~~~~
 
diff --git a/Documentation/dev-tools/kunit/index.rst b/Documentation/dev-tools/kunit/index.rst
index f5d13f1..5dd538b 100644
--- a/Documentation/dev-tools/kunit/index.rst
+++ b/Documentation/dev-tools/kunit/index.rst
@@ -19,6 +19,15 @@
 	tips
 	running_tips
 
+.. warning::
+	AOSP only supports running tests loaded with modules. Built-in
+	test execution support has been disabled. In addition, to fully
+	enable running module-loaded tests, CONFIG_KUNIT must be enabled
+	and the kernel command-line argument `kunit.enable` must be set
+	to 1.
+
+	The remaining KUnit documentation has been left as-is.
+
 This section details the kernel unit testing framework.
 
 Introduction
diff --git a/Documentation/filesystems/OWNERS b/Documentation/filesystems/OWNERS
new file mode 100644
index 0000000..a63dbf4
--- /dev/null
+++ b/Documentation/filesystems/OWNERS
@@ -0,0 +1 @@
+per-file f2fs**=file:/fs/f2fs/OWNERS
diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
index 17df9a0..220f3e0 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -25,10 +25,14 @@
 
 - git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git
 
-For reporting bugs and sending patches, please use the following mailing list:
+For sending patches, please use the following mailing list:
 
 - linux-f2fs-devel@lists.sourceforge.net
 
+For reporting bugs, please use the following f2fs bug tracker link:
+
+- https://bugzilla.kernel.org/enter_bug.cgi?product=File%20System&component=f2fs
+
 Background and Design issues
 ============================
 
@@ -154,6 +158,8 @@
 			 If this option is set, no cache_flush commands are issued
 			 but f2fs still guarantees the write ordering of all the
 			 data writes.
+barrier		 If this option is set, cache_flush commands are allowed to be
+			 issued.
 fastboot		 This option is used when a system wants to reduce mount
 			 time as much as possible, even though normal performance
 			 can be sacrificed.
@@ -199,6 +205,7 @@
 			 FAULT_SLAB_ALLOC	  0x000008000
 			 FAULT_DQUOT_INIT	  0x000010000
 			 FAULT_LOCK_OP		  0x000020000
+			 FAULT_BLKADDR		  0x000040000
 			 ===================	  ===========
 mode=%s			 Control block allocation mode which supports "adaptive"
 			 and "lfs". In "lfs" mode, there should be no random
@@ -340,6 +347,10 @@
 			 Because of the nature of low memory devices, in this mode, f2fs
 			 will try to save memory sometimes by sacrificing performance.
 			 "normal" mode is the default mode and same as before.
+age_extent_cache	 Enable an age extent cache based on rb-tree. It records
+			 data block update frequency of the extent per inode, in
+			 order to provide better temperature hints for data block
+			 allocation.
 ======================== ============================================================
 
 Debugfs Entries
diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
index 5ba5817..ef18338 100644
--- a/Documentation/filesystems/fscrypt.rst
+++ b/Documentation/filesystems/fscrypt.rst
@@ -338,6 +338,7 @@
 - AES-128-CBC for contents and AES-128-CTS-CBC for filenames
 - Adiantum for both contents and filenames
 - AES-256-XTS for contents and AES-256-HCTR2 for filenames (v2 policies only)
+- SM4-XTS for contents and SM4-CTS-CBC for filenames (v2 policies only)
 
 If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.
 
@@ -369,6 +370,12 @@
 POLYVAL should be enabled, e.g. CRYPTO_POLYVAL_ARM64_CE and
 CRYPTO_AES_ARM64_CE_BLK for ARM64.
 
+SM4 is a Chinese block cipher that is an alternative to AES.  It has
+not seen as much security review as AES, and it only has a 128-bit key
+size.  It may be useful in cases where its use is mandated.
+Otherwise, it should not be used.  For SM4 support to be available, it
+also needs to be enabled in the kernel crypto API.
+
 New encryption modes can be added relatively easily, without changes
 to individual filesystems.  However, authenticated encryption (AE)
 modes are not currently supported because of the difficulty of dealing
diff --git a/Documentation/filesystems/incfs.rst b/Documentation/filesystems/incfs.rst
new file mode 100644
index 0000000..19db303
--- /dev/null
+++ b/Documentation/filesystems/incfs.rst
@@ -0,0 +1,82 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================================================
+incfs: A stacked incremental filesystem for Linux
+=================================================
+
+/sys/fs interface
+=================
+
+Please update Documentation/ABI/testing/sysfs-fs-incfs if you update this
+section.
+
+incfs creates the following files in /sys/fs.
+
+Features
+--------
+
+/sys/fs/incremental-fs/features/corefs
+  Reads 'supported'. Always present.
+
+/sys/fs/incremental-fs/features/v2
+  Reads 'supported'. Present if all v2 features of incfs are supported. These
+  are:
+    fs-verity support
+    inotify support
+    ioctls:
+      INCFS_IOC_SET_READ_TIMEOUTS
+      INCFS_IOC_GET_READ_TIMEOUTS
+      INCFS_IOC_GET_BLOCK_COUNT
+      INCFS_IOC_CREATE_MAPPED_FILE
+    .incomplete folder
+    .blocks_written pseudo file
+    report_uid mount option
+
+/sys/fs/incremental-fs/features/zstd
+  Reads 'supported'. Present if zstd compression is supported for data blocks.
+
+Optional per mount
+------------------
+
+For each incfs mount, the mount option sysfs_name=[name] creates a /sys/fs
+node called:
+
+/sys/fs/incremental-fs/instances/[name]
+
+This will contain the following files:
+
+/sys/fs/incremental-fs/instances/[name]/reads_delayed_min
+  Returns a count of the number of reads that were delayed as a result of the
+  per UID read timeouts min time setting.
+
+/sys/fs/incremental-fs/instances/[name]/reads_delayed_min_us
+  Returns total delay time for all files since first mount as a result of the
+  per UID read timeouts min time setting.
+
+/sys/fs/incremental-fs/instances/[name]/reads_delayed_pending
+  Returns a count of the number of reads that were delayed as a result of
+  waiting for a pending read.
+
+/sys/fs/incremental-fs/instances/[name]/reads_delayed_pending_us
+  Returns total delay time for all files since first mount as a result of
+  waiting for a pending read.
+
+/sys/fs/incremental-fs/instances/[name]/reads_failed_hash_verification
+  Returns number of reads that failed because of hash verification failures.
+
+/sys/fs/incremental-fs/instances/[name]/reads_failed_other
+  Returns number of reads that failed for reasons other than timing out or
+  hash failures.
+
+/sys/fs/incremental-fs/instances/[name]/reads_failed_timed_out
+  Returns number of reads that timed out.
+
+For the reads_delayed_* counters, note that a read can count towards both
+reads_delayed_min and reads_delayed_pending if incfs first waits for a pending
+read and then has to wait further for the min time. In that case, the time spent
+waiting is split between reads_delayed_pending_us, which is increased by the
+time spent waiting for the pending read, and reads_delayed_min_us, which is
+increased by the remainder of the time spent waiting.
+
+Reads that timed out are not added to the reads_delayed_pending or the
+reads_delayed_pending_us counters.
diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
index 4c76fda..6e316f8 100644
--- a/Documentation/filesystems/overlayfs.rst
+++ b/Documentation/filesystems/overlayfs.rst
@@ -195,7 +195,7 @@
 
 1. return EXDEV error: this error is returned by rename(2) when trying to
    move a file or directory across filesystem boundaries.  Hence
-   applications are usually prepared to hande this error (mv(1) for example
+   applications are usually prepared to handle this error (mv(1) for example
    recursively copies the directory tree).  This is the default behavior.
 
 2. If the "redirect_dir" feature is enabled, then the directory will be
@@ -324,6 +324,30 @@
 The resulting access permissions should be the same.  The difference is in
 the time of copy (on-demand vs. up-front).
 
+Non overlapping credentials
+---------------------------
+
+As noted above, all access to the upper, lower and work directories is
+performed with the mounter's MAC and DAC credentials as recorded at mount
+time, while incoming accesses are checked against the caller's credentials.
+
+In the case where the caller's MAC or DAC credentials do not overlap the
+mounter's (a use case available in older versions of the driver), the
+override_creds mount flag can be turned off.  This covers the pattern where
+the caller holds legitimate credentials that the mounter lacks: for example,
+init may have been the mounter, but the caller may have execute or read MAC
+permissions that init does not.  With override_creds off, all access
+(incoming, upper, lower or working) is tested against the caller.
+
+Several unintended side effects can occur, though.  A caller lacking certain
+key capabilities, or running with lower privilege, will not always be able to
+delete files or directories, create nodes, or search some restricted
+directories.  The ability to search and read a directory entry is spotty,
+because the cache mechanism does not re-test credentials: it assumes that a
+privileged caller may fill the cache and that a less privileged caller may
+then read it.  This uneven security model, in which the cache, upperdir and
+workdir are opened with privilege but accessed without it, creates a form of
+privilege escalation and should only be used with a strict understanding of
+the side effects and of the security policies involved.
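+
+For example, a mount disabling the flag might look like this (illustrative;
+it assumes the boolean ``override_creds`` mount option described above)::
+
+  mount -t overlay overlay -olowerdir=/lower,upperdir=/upper,workdir=/work,override_creds=off /merged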
 
 Multiple lower layers
 ---------------------
diff --git a/Documentation/kbuild/modules.rst b/Documentation/kbuild/modules.rst
index a1f3eb7..e9b6fe2 100644
--- a/Documentation/kbuild/modules.rst
+++ b/Documentation/kbuild/modules.rst
@@ -21,6 +21,7 @@
 	   --- 4.1 Kernel Includes
 	   --- 4.2 Single Subdirectory
 	   --- 4.3 Several Subdirectories
+	   --- 4.4 UAPI Headers Installation
 	=== 5. Module Installation
 	   --- 5.1 INSTALL_MOD_PATH
 	   --- 5.2 INSTALL_MOD_DIR
@@ -131,6 +132,10 @@
 		/lib/modules/<kernel_release>/extra/, but a prefix may
 		be added with INSTALL_MOD_PATH (discussed in section 5).
 
+	headers_install
+		Export headers in a format suitable for userspace. The default
+		location is $PWD/usr. INSTALL_HDR_PATH can change this path.
+
 	clean
 		Remove all generated files in the module directory only.
 
@@ -406,6 +411,17 @@
 	pointing to the directory where the currently executing kbuild
 	file is located.
 
+4.4 UAPI Headers Installation
+-----------------------------
+
+	External modules may export headers to userspace in a similar
+	fashion to their in-tree counterparts. kbuild supports running
+	the headers_install target in an external module directory. The
+	locations where kbuild searches for headers are $(M)/include/uapi
+	and $(M)/arch/$(SRCARCH)/include/uapi.
+
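+	For example, to install the UAPI headers of an external module
+	found in the current directory (paths illustrative)::
+
+		$ make -C <path_to_kernel_src> M=$PWD headers_install
+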
+	See also Documentation/kbuild/headers_install.rst.
+
 
 5. Module Installation
 ======================
diff --git a/Documentation/scheduler/sched-energy.rst b/Documentation/scheduler/sched-energy.rst
index 8fbce5e..83f1179 100644
--- a/Documentation/scheduler/sched-energy.rst
+++ b/Documentation/scheduler/sched-energy.rst
@@ -402,7 +402,7 @@
 because it is the only one providing some degree of consistency between
 frequency requests and energy predictions.
 
-Using EAS with any other governor than schedutil is not supported.
+Using EAS with any other governor than schedutil is not recommended.
 
 
 6.5 Scale-invariant utilization signals
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index b8ec88e..28cbe9a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6427,6 +6427,13 @@
 KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state
 if it decides to decode and emulate the instruction.
 
+This feature isn't available to protected VMs, as userspace does not
+have access to the state that is required to perform the emulation.
+Instead, a data abort exception is directly injected in the guest.
+Note that although KVM_CAP_ARM_NISV_TO_USER will be reported if
+queried outside of a protected VM context, the feature will not be
+exposed if queried on a protected VM file descriptor.
+
 ::
 
 		/* KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR */
diff --git a/Documentation/virt/kvm/arm/fw-pseudo-registers.rst b/Documentation/virt/kvm/arm/fw-pseudo-registers.rst
new file mode 100644
index 0000000..b90fd0b
--- /dev/null
+++ b/Documentation/virt/kvm/arm/fw-pseudo-registers.rst
@@ -0,0 +1,138 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
+ARM firmware pseudo-registers interface
+=======================================
+
+KVM handles the hypercall services as requested by the guests. New hypercall
+services are regularly made available by the ARM specification or by KVM (as
+vendor services) if they make sense from a virtualization point of view.
+
+This means that a guest booted on two different versions of KVM can observe
+two different "firmware" revisions. This could cause issues if a given guest
+is tied to a particular version of a hypercall service, or if a migration
+causes a different version to be exposed out of the blue to an unsuspecting
+guest.
+
+In order to remedy this situation, KVM exposes a set of "firmware
+pseudo-registers" that can be manipulated using the GET/SET_ONE_REG
+interface. These registers can be saved/restored by userspace, and set
+to a convenient value as required.
+
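+As an illustration, userspace could save one of these registers with a
+``KVM_GET_ONE_REG`` call along the following lines (a minimal sketch; error
+handling and vCPU setup are omitted, and the helper name is made up for the
+example)::
+
+    #include <stdint.h>
+    #include <sys/ioctl.h>
+    #include <linux/kvm.h>
+
+    /* Read KVM_REG_ARM_PSCI_VERSION from an already-initialized vCPU fd. */
+    static int get_psci_version(int vcpu_fd, uint64_t *version)
+    {
+        struct kvm_one_reg reg = {
+            .id   = KVM_REG_ARM_PSCI_VERSION,
+            .addr = (uintptr_t)version,
+        };
+
+        return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
+    }
+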
+The following registers are defined:
+
+* KVM_REG_ARM_PSCI_VERSION:
+
+  KVM implements the PSCI (Power State Coordination Interface)
+  specification in order to provide services such as CPU on/off, reset
+  and power-off to the guest.
+
+  - Only valid if the vcpu has the KVM_ARM_VCPU_PSCI_0_2 feature set
+    (and thus has already been initialized)
+  - Returns the current PSCI version on GET_ONE_REG (defaulting to the
+    highest PSCI version implemented by KVM and compatible with v0.2)
+  - Allows any PSCI version implemented by KVM and compatible with
+    v0.2 to be set with SET_ONE_REG
+  - Affects the whole VM (even if the register view is per-vcpu)
+
+* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
+    Holds the state of the firmware support to mitigate CVE-2017-5715, as
+    offered by KVM to the guest via a HVC call. The workaround is described
+    under SMCCC_ARCH_WORKAROUND_1 in [1]_.
+
+  Accepted values are:
+
+    KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL:
+      KVM does not offer
+      firmware support for the workaround. The mitigation status for the
+      guest is unknown.
+    KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL:
+      The workaround HVC call is
+      available to the guest and required for the mitigation.
+    KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED:
+      The workaround HVC call
+      is available to the guest, but it is not needed on this VCPU.
+
+* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
+    Holds the state of the firmware support to mitigate CVE-2018-3639, as
+    offered by KVM to the guest via a HVC call. The workaround is described
+    under SMCCC_ARCH_WORKAROUND_2 in [1]_.
+
+  Accepted values are:
+
+    KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL:
+      A workaround is not
+      available. KVM does not offer firmware support for the workaround.
+    KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN:
+      The workaround state is
+      unknown. KVM does not offer firmware support for the workaround.
+    KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL:
+      The workaround is available,
+      and can be disabled by a vCPU. If
+      KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED is set, it is active for
+      this vCPU.
+    KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
+      The workaround is always active on this vCPU or it is not needed.
+
+
+Bitmap Feature Firmware Registers
+---------------------------------
+
+Contrary to the above registers, the following registers expose the
+hypercall services in the form of a feature bitmap to userspace. This
+bitmap is translated to the services that are available to the guest.
+There is one register defined per service call owner, and it can be
+accessed via the GET/SET_ONE_REG interface.
+
+By default, these registers are set with the upper limit of the features
+that are supported. This way userspace can discover all the usable
+hypercall services via GET_ONE_REG. Userspace can then write back the
+desired bitmap via SET_ONE_REG. The features for registers that are left
+untouched, probably because userspace isn't aware of them, will be
+exposed as-is to the guest.
+
+Note that KVM will no longer allow userspace to configure these registers
+once any of the vCPUs has run at least once; such an attempt will
+return -EBUSY.
+
+The pseudo-firmware bitmap registers are as follows:
+
+* KVM_REG_ARM_STD_BMAP:
+    Controls the bitmap of the ARM Standard Secure Service Calls.
+
+  The following bits are accepted:
+
+    Bit-0: KVM_REG_ARM_STD_BIT_TRNG_V1_0:
+      The bit represents the services offered under v1.0 of ARM True Random
+      Number Generator (TRNG) specification, ARM DEN0098.
+
+* KVM_REG_ARM_STD_HYP_BMAP:
+    Controls the bitmap of the ARM Standard Hypervisor Service Calls.
+
+  The following bits are accepted:
+
+    Bit-0: KVM_REG_ARM_STD_HYP_BIT_PV_TIME:
+      The bit represents the Paravirtualized Time service as represented by
+      ARM DEN0057A.
+
+* KVM_REG_ARM_VENDOR_HYP_BMAP:
+    Controls the bitmap of the Vendor specific Hypervisor Service Calls.
+
+  The following bits are accepted:
+
+    Bit-0: KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT:
+      The bit represents the ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID
+      and ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID function-ids.
+
+    Bit-1: KVM_REG_ARM_VENDOR_HYP_BIT_PTP:
+      The bit represents the Precision Time Protocol KVM service.
+
+Errors:
+
+    =======  =============================================================
+    -ENOENT   Unknown register accessed.
+    -EBUSY    Attempt a 'write' to the register after the VM has started.
+    -EINVAL   Invalid bitmap written to the register.
+    =======  =============================================================
+
+.. [1] https://developer.arm.com/-/media/developer/pdf/ARM_DEN_0070A_Firmware_interfaces_for_mitigating_CVE-2017-5715.pdf
diff --git a/Documentation/virt/kvm/arm/hypercalls.rst b/Documentation/virt/kvm/arm/hypercalls.rst
index 3e23084..75a28b80 100644
--- a/Documentation/virt/kvm/arm/hypercalls.rst
+++ b/Documentation/virt/kvm/arm/hypercalls.rst
@@ -1,138 +1,152 @@
 .. SPDX-License-Identifier: GPL-2.0
 
-=======================
-ARM Hypercall Interface
-=======================
+===============================================
+KVM/arm64-specific hypercalls exposed to guests
+===============================================
 
-KVM handles the hypercall services as requested by the guests. New hypercall
-services are regularly made available by the ARM specification or by KVM (as
-vendor services) if they make sense from a virtualization point of view.
+This file documents the KVM/arm64-specific hypercalls which may be
+exposed by KVM/arm64 to guest operating systems. These hypercalls are
+issued using the HVC instruction according to version 1.1 of the Arm SMC
+Calling Convention (DEN0028/C):
 
-This means that a guest booted on two different versions of KVM can observe
-two different "firmware" revisions. This could cause issues if a given guest
-is tied to a particular version of a hypercall service, or if a migration
-causes a different version to be exposed out of the blue to an unsuspecting
-guest.
+https://developer.arm.com/docs/den0028/c
 
-In order to remedy this situation, KVM exposes a set of "firmware
-pseudo-registers" that can be manipulated using the GET/SET_ONE_REG
-interface. These registers can be saved/restored by userspace, and set
-to a convenient value as required.
+All KVM/arm64-specific hypercalls are allocated within the "Vendor
+Specific Hypervisor Service Call" range with a UID of
+``28b46fb6-2ec5-11e9-a9ca-4b564d003a74``. This UID should be queried by the
+guest using the standard "Call UID" function for the service range in
+order to determine that the KVM/arm64-specific hypercalls are available.
 
-The following registers are defined:
+``ARM_SMCCC_KVM_FUNC_FEATURES``
+---------------------------------------------
 
-* KVM_REG_ARM_PSCI_VERSION:
+Provides a discovery mechanism for other KVM/arm64 hypercalls.
 
-  KVM implements the PSCI (Power State Coordination Interface)
-  specification in order to provide services such as CPU on/off, reset
-  and power-off to the guest.
++---------------------+-------------------------------------------------------------+
+| Presence:           | Mandatory for the KVM/arm64 UID                             |
++---------------------+-------------------------------------------------------------+
+| Calling convention: | HVC32                                                       |
++---------------------+----------+--------------------------------------------------+
+| Function ID:        | (uint32) | 0x86000000                                       |
++---------------------+----------+--------------------------------------------------+
+| Arguments:          | None                                                        |
++---------------------+----------+----+---------------------------------------------+
+| Return Values:      | (uint32) | R0 | Bitmap of available function numbers 0-31   |
+|                     +----------+----+---------------------------------------------+
+|                     | (uint32) | R1 | Bitmap of available function numbers 32-63  |
+|                     +----------+----+---------------------------------------------+
+|                     | (uint32) | R2 | Bitmap of available function numbers 64-95  |
+|                     +----------+----+---------------------------------------------+
+|                     | (uint32) | R3 | Bitmap of available function numbers 96-127 |
++---------------------+----------+----+---------------------------------------------+
 
-  - Only valid if the vcpu has the KVM_ARM_VCPU_PSCI_0_2 feature set
-    (and thus has already been initialized)
-  - Returns the current PSCI version on GET_ONE_REG (defaulting to the
-    highest PSCI version implemented by KVM and compatible with v0.2)
-  - Allows any PSCI version implemented by KVM and compatible with
-    v0.2 to be set with SET_ONE_REG
-  - Affects the whole VM (even if the register view is per-vcpu)
+``ARM_SMCCC_KVM_FUNC_PTP``
+----------------------------------------
 
-* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
-    Holds the state of the firmware support to mitigate CVE-2017-5715, as
-    offered by KVM to the guest via a HVC call. The workaround is described
-    under SMCCC_ARCH_WORKAROUND_1 in [1].
+See ptp_kvm.rst
 
-  Accepted values are:
+``ARM_SMCCC_KVM_FUNC_HYP_MEMINFO``
+----------------------------------
 
-    KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL:
-      KVM does not offer
-      firmware support for the workaround. The mitigation status for the
-      guest is unknown.
-    KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL:
-      The workaround HVC call is
-      available to the guest and required for the mitigation.
-    KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED:
-      The workaround HVC call
-      is available to the guest, but it is not needed on this VCPU.
+Query the memory protection parameters for a protected virtual machine.
 
-* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
-    Holds the state of the firmware support to mitigate CVE-2018-3639, as
-    offered by KVM to the guest via a HVC call. The workaround is described
-    under SMCCC_ARCH_WORKAROUND_2 in [1]_.
++---------------------+-------------------------------------------------------------+
+| Presence:           | Optional; protected guests only.                            |
++---------------------+-------------------------------------------------------------+
+| Calling convention: | HVC64                                                       |
++---------------------+----------+--------------------------------------------------+
+| Function ID:        | (uint32) | 0xC6000002                                       |
++---------------------+----------+----+---------------------------------------------+
+| Arguments:          | (uint64) | R1 | Reserved / Must be zero                     |
+|                     +----------+----+---------------------------------------------+
+|                     | (uint64) | R2 | Reserved / Must be zero                     |
+|                     +----------+----+---------------------------------------------+
+|                     | (uint64) | R3 | Reserved / Must be zero                     |
++---------------------+----------+----+---------------------------------------------+
+| Return Values:      | (int64)  | R0 | ``INVALID_PARAMETER (-3)`` on error, else   |
+|                     |          |    | memory protection granule in bytes          |
++---------------------+----------+----+---------------------------------------------+
 
-  Accepted values are:
+``ARM_SMCCC_KVM_FUNC_MEM_SHARE``
+--------------------------------
 
-    KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL:
-      A workaround is not
-      available. KVM does not offer firmware support for the workaround.
-    KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN:
-      The workaround state is
-      unknown. KVM does not offer firmware support for the workaround.
-    KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL:
-      The workaround is available,
-      and can be disabled by a vCPU. If
-      KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED is set, it is active for
-      this vCPU.
-    KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
-      The workaround is always active on this vCPU or it is not needed.
+Share a region of memory with the KVM host, granting it read, write and execute
+permissions. The size of the region is equal to the memory protection granule
+advertised by ``ARM_SMCCC_KVM_FUNC_HYP_MEMINFO``.
 
++---------------------+-------------------------------------------------------------+
+| Presence:           | Optional; protected guests only.                            |
++---------------------+-------------------------------------------------------------+
+| Calling convention: | HVC64                                                       |
++---------------------+----------+--------------------------------------------------+
+| Function ID:        | (uint32) | 0xC6000003                                       |
++---------------------+----------+----+---------------------------------------------+
+| Arguments:          | (uint64) | R1 | Base IPA of memory region to share          |
+|                     +----------+----+---------------------------------------------+
+|                     | (uint64) | R2 | Reserved / Must be zero                     |
+|                     +----------+----+---------------------------------------------+
+|                     | (uint64) | R3 | Reserved / Must be zero                     |
++---------------------+----------+----+---------------------------------------------+
+| Return Values:      | (int64)  | R0 | ``SUCCESS (0)``                             |
+|                     |          |    +---------------------------------------------+
+|                     |          |    | ``INVALID_PARAMETER (-3)``                  |
++---------------------+----------+----+---------------------------------------------+
 
-Bitmap Feature Firmware Registers
----------------------------------
+``ARM_SMCCC_KVM_FUNC_MEM_UNSHARE``
+----------------------------------
 
-Contrary to the above registers, the following registers exposes the
-hypercall services in the form of a feature-bitmap to the userspace. This
-bitmap is translated to the services that are available to the guest.
-There is a register defined per service call owner and can be accessed via
-GET/SET_ONE_REG interface.
+Revoke access permission from the KVM host to a memory region previously shared
+with ``ARM_SMCCC_KVM_FUNC_MEM_SHARE``. The size of the region is equal to the
+memory protection granule advertised by ``ARM_SMCCC_KVM_FUNC_HYP_MEMINFO``.
 
-By default, these registers are set with the upper limit of the features
-that are supported. This way userspace can discover all the usable
-hypercall services via GET_ONE_REG. The user-space can write-back the
-desired bitmap back via SET_ONE_REG. The features for the registers that
-are untouched, probably because userspace isn't aware of them, will be
-exposed as is to the guest.
++---------------------+-------------------------------------------------------------+
+| Presence:           | Optional; protected guests only.                            |
++---------------------+-------------------------------------------------------------+
+| Calling convention: | HVC64                                                       |
++---------------------+----------+--------------------------------------------------+
+| Function ID:        | (uint32) | 0xC6000004                                       |
++---------------------+----------+----+---------------------------------------------+
+| Arguments:          | (uint64) | R1 | Base IPA of memory region to unshare        |
+|                     +----------+----+---------------------------------------------+
+|                     | (uint64) | R2 | Reserved / Must be zero                     |
+|                     +----------+----+---------------------------------------------+
+|                     | (uint64) | R3 | Reserved / Must be zero                     |
++---------------------+----------+----+---------------------------------------------+
+| Return Values:      | (int64)  | R0 | ``SUCCESS (0)``                             |
+|                     |          |    +---------------------------------------------+
+|                     |          |    | ``INVALID_PARAMETER (-3)``                  |
++---------------------+----------+----+---------------------------------------------+
 
-Note that KVM will not allow the userspace to configure the registers
-anymore once any of the vCPUs has run at least once. Instead, it will
-return a -EBUSY.
+``ARM_SMCCC_KVM_FUNC_MEM_RELINQUISH``
+--------------------------------------
 
-The pseudo-firmware bitmap register are as follows:
+Cooperatively relinquish ownership of a memory region. The size of the
+region is equal to the memory protection granule advertised by
+``ARM_SMCCC_KVM_FUNC_HYP_MEMINFO``. If this hypercall is advertised
+then it is mandatory to call it before freeing memory via, for
+example, virtio balloon. If the caller is a protected VM, it is
+guaranteed that the memory region will be completely cleared before
+becoming visible to another VM.
 
-* KVM_REG_ARM_STD_BMAP:
-    Controls the bitmap of the ARM Standard Secure Service Calls.
++---------------------+-------------------------------------------------------------+
+| Presence:           | Optional.                                                   |
++---------------------+-------------------------------------------------------------+
+| Calling convention: | HVC64                                                       |
++---------------------+----------+--------------------------------------------------+
+| Function ID:        | (uint32) | 0xC6000009                                       |
++---------------------+----------+----+---------------------------------------------+
+| Arguments:          | (uint64) | R1 | Base IPA of memory region to relinquish     |
+|                     +----------+----+---------------------------------------------+
+|                     | (uint64) | R2 | Reserved / Must be zero                     |
+|                     +----------+----+---------------------------------------------+
+|                     | (uint64) | R3 | Reserved / Must be zero                     |
++---------------------+----------+----+---------------------------------------------+
+| Return Values:      | (int64)  | R0 | ``SUCCESS (0)``                             |
+|                     |          |    +---------------------------------------------+
+|                     |          |    | ``INVALID_PARAMETER (-3)``                  |
++---------------------+----------+----+---------------------------------------------+
 
-  The following bits are accepted:
+``ARM_SMCCC_KVM_FUNC_MMIO_GUARD_*``
+-----------------------------------
 
-    Bit-0: KVM_REG_ARM_STD_BIT_TRNG_V1_0:
-      The bit represents the services offered under v1.0 of ARM True Random
-      Number Generator (TRNG) specification, ARM DEN0098.
-
-* KVM_REG_ARM_STD_HYP_BMAP:
-    Controls the bitmap of the ARM Standard Hypervisor Service Calls.
-
-  The following bits are accepted:
-
-    Bit-0: KVM_REG_ARM_STD_HYP_BIT_PV_TIME:
-      The bit represents the Paravirtualized Time service as represented by
-      ARM DEN0057A.
-
-* KVM_REG_ARM_VENDOR_HYP_BMAP:
-    Controls the bitmap of the Vendor specific Hypervisor Service Calls.
-
-  The following bits are accepted:
-
-    Bit-0: KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT
-      The bit represents the ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID
-      and ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID function-ids.
-
-    Bit-1: KVM_REG_ARM_VENDOR_HYP_BIT_PTP:
-      The bit represents the Precision Time Protocol KVM service.
-
-Errors:
-
-    =======  =============================================================
-    -ENOENT   Unknown register accessed.
-    -EBUSY    Attempt a 'write' to the register after the VM has started.
-    -EINVAL   Invalid bitmap written to the register.
-    =======  =============================================================
-
-.. [1] https://developer.arm.com/-/media/developer/pdf/ARM_DEN_0070A_Firmware_interfaces_for_mitigating_CVE-2017-5715.pdf
+See mmio-guard.rst
diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
index e848484..4d795a7 100644
--- a/Documentation/virt/kvm/arm/index.rst
+++ b/Documentation/virt/kvm/arm/index.rst
@@ -7,7 +7,10 @@
 .. toctree::
    :maxdepth: 2
 
+   fw-pseudo-registers
    hyp-abi
    hypercalls
+   pkvm
    pvtime
    ptp_kvm
+   mmio-guard
diff --git a/Documentation/virt/kvm/arm/mmio-guard.rst b/Documentation/virt/kvm/arm/mmio-guard.rst
new file mode 100644
index 0000000..c1ba749
--- /dev/null
+++ b/Documentation/virt/kvm/arm/mmio-guard.rst
@@ -0,0 +1,74 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+KVM MMIO guard
+==============
+
+KVM implements device emulation by handling translation faults to any
+IPA range that is not contained in a memory slot. Such a translation
+fault is in most cases passed on to userspace (or in rare cases to the
+host kernel) with the address, size and possibly data of the access
+for emulation.
+
+Should the guest exit with an address that is not one that corresponds
+to an emulatable device, userspace may take measures that are not the
+most graceful as far as the guest is concerned (such as terminating it
+or delivering a fatal exception).
+
+There is also an element of trust: by forwarding the request to
+userspace, the kernel assumes that the guest trusts userspace to do
+the right thing.
+
+The KVM MMIO guard offers a way to mitigate this last point: a guest
+can request that only certain regions of the IPA space are valid as
+MMIO. Only these regions will be handled as MMIO, and any other access
+will result in an exception being delivered to the guest.
+
+This relies on a set of hypercalls defined in the KVM-specific range,
+using the HVC64 calling convention.
+
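+As an illustration, a guest driver might enroll with something like the
+following sketch, assuming the usual in-kernel SMCCC helpers and a
+function-ID macro carrying the value 0xC6000006 (the macro name here is
+illustrative)::
+
+    #include <linux/arm-smccc.h>
+    #include <linux/errno.h>
+
+    static int mmio_guard_enroll(void)
+    {
+        struct arm_smccc_res res;
+
+        /* HVC64 call with no arguments; the result comes back in res.a0. */
+        arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID,
+                          &res);
+
+        return res.a0 == SMCCC_RET_SUCCESS ? 0 : -EINVAL;
+    }
+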
+* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO
+
+    ==============    ========    ================================
+    Function ID:      (uint32)    0xC6000005
+    Arguments:        r1-r3       Reserved / Must be zero
+    Return Values:    (int64)     NOT_SUPPORTED(-1) on error, or
+                      (uint64)    Protection Granule (PG) size in
+                                  bytes (r0)
+    ==============    ========    ================================
+
+* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL
+
+    ==============    ========    ==============================
+    Function ID:      (uint32)    0xC6000006
+    Arguments:        none
+    Return Values:    (int64)     NOT_SUPPORTED(-1) on error, or
+                                  RET_SUCCESS(0) (r0)
+    ==============    ========    ==============================
+
+* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP
+
+    ==============    ========    ====================================
+    Function ID:      (uint32)    0xC6000007
+    Arguments:        (uint64)    The base of the PG-sized IPA range
+                                  that is allowed to be accessed as
+                                  MMIO. Must be aligned to the PG size
+                                  (r1)
+                      (uint64)    Index in the MAIR_EL1 register
+                                  providing the memory attribute that
+                                  is used by the guest (r2)
+    Return Values:    (int64)     NOT_SUPPORTED(-1) on error, or
+                                  RET_SUCCESS(0) (r0)
+    ==============    ========    ====================================
+
+* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP
+
+    ==============    ========    ======================================
+    Function ID:      (uint32)    0xC6000008
+    Arguments:        (uint64)    The base of the PG-sized IPA range
+                                  that was previously mapped with
+                                  ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP.
+                                  Must be aligned to the PG size (r1)
+    Return Values:    (int64)     NOT_SUPPORTED(-1) on error, or
+                                  RET_SUCCESS(0) (r0)
+    ==============    ========    ======================================
diff --git a/Documentation/virt/kvm/arm/pkvm.rst b/Documentation/virt/kvm/arm/pkvm.rst
new file mode 100644
index 0000000..64f099a
--- /dev/null
+++ b/Documentation/virt/kvm/arm/pkvm.rst
@@ -0,0 +1,96 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Protected virtual machines (pKVM)
+=================================
+
+Introduction
+------------
+
+Protected KVM (pKVM) is a KVM/arm64 extension which uses the two-stage
+translation capability of the Armv8 MMU to isolate guest memory from the host
+system. This allows for the creation of a confidential computing environment
+without relying on whizz-bang features in hardware, but still allowing room for
+complementary technologies such as memory encryption and hardware-backed
+attestation.
+
+The major implementation change brought about by pKVM is that the hypervisor
+code running at EL2 is now largely independent of (and isolated from) the rest
+of the host kernel running at EL1 and therefore additional hypercalls are
+introduced to manage manipulation of guest stage-2 page tables, creation of VM
+data structures and reclamation of memory on teardown. An immediate consequence
+of this change is that the host itself runs with an identity mapping enabled
+at stage-2, providing the hypervisor code with a mechanism to restrict host
+access to an arbitrary physical page.
+
+Enabling pKVM
+-------------
+
+The pKVM hypervisor is enabled by booting the host kernel at EL2 with
+"``kvm-arm.mode=protected``" on the command-line. Once enabled, VMs can be spawned
+in either protected or non-protected state, although the hypervisor is still
+responsible for managing most of the VM metadata in either case.
+
+Limitations
+-----------
+
+Enabling pKVM places some significant limitations on KVM guests, regardless of
+whether they are spawned in protected state. It is therefore recommended only
+to enable pKVM if protected VMs are required, with non-protected state acting
+primarily as a debug and development aid.
+
+If you're still keen, then here is an incomplete list of caveats that apply
+to all VMs running under pKVM:
+
+- Guest memory cannot be file-backed (with the exception of shmem/memfd) and is
+  pinned as it is mapped into the guest. This prevents the host from
+  swapping-out, migrating, merging or generally doing anything useful with the
+  guest pages. It also requires that the VMM has either ``CAP_IPC_LOCK`` or
+  sufficient ``RLIMIT_MEMLOCK`` to account for this pinned memory.
+
+- GICv2 is not supported and therefore GICv3 hardware is required in order
+  to expose a virtual GICv3 to the guest.
+
+- Read-only memslots are unsupported and therefore dirty logging cannot be
+  enabled.
+
+- Memslot configuration is fixed once a VM has started running, with subsequent
+  move or deletion requests being rejected with ``-EPERM``.
+
+- There are probably many others.
+
+Since the host is unable to tear down the hypervisor when pKVM is enabled,
+hibernation (``CONFIG_HIBERNATION``) and kexec (``CONFIG_KEXEC``) will fail
+with ``-EBUSY``.
+
+If you are not happy with these limitations, then please don't enable pKVM :)
+
+VM creation
+-----------
+
+When pKVM is enabled, protected VMs can be created by specifying the
+``KVM_VM_TYPE_ARM_PROTECTED`` flag in the machine type identifier parameter
+passed to ``KVM_CREATE_VM``.
+
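+For example (a hedged sketch; error handling is omitted, and
+``KVM_VM_TYPE_ARM_PROTECTED`` is assumed to be visible via this tree's UAPI
+headers)::
+
+    #include <fcntl.h>
+    #include <sys/ioctl.h>
+    #include <linux/kvm.h>
+
+    int main(void)
+    {
+        int kvm_fd = open("/dev/kvm", O_RDWR);
+        /* The machine type identifier selects a protected VM. */
+        int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_PROTECTED);
+
+        return vm_fd < 0 ? 1 : 0;
+    }
+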
+Protected VMs are instantiated according to a fixed vCPU configuration
+described by the ID register definitions in
+``arch/arm64/include/asm/kvm_pkvm.h``. Only a subset of the architectural
+features that may be available to the host are exposed to the guest and the
+capabilities advertised by ``KVM_CHECK_EXTENSION`` are limited accordingly,
+with the vCPU registers being initialised to their architecturally-defined
+values.
+
+Where not defined by the architecture, the registers of a protected vCPU
+are reset to zero with the exception of the PC and X0 which can be set
+either by the ``KVM_SET_ONE_REG`` interface or by a call to PSCI ``CPU_ON``.
+
+VM runtime
+----------
+
+By default, memory pages mapped into a protected guest are inaccessible to the
+host and any attempt by the host to access such a page will result in the
+injection of an abort at EL1 by the hypervisor. For accesses originating from
+EL0, the host will then terminate the current task with a ``SIGSEGV``.
+
+pKVM exposes additional hypercalls to protected guests, primarily for the
+purpose of establishing shared-memory regions with the host for communication
+and I/O. These hypercalls are documented in hypercalls.rst.
diff --git a/Documentation/virt/kvm/arm/ptp_kvm.rst b/Documentation/virt/kvm/arm/ptp_kvm.rst
index aecdc80..c8b01b7 100644
--- a/Documentation/virt/kvm/arm/ptp_kvm.rst
+++ b/Documentation/virt/kvm/arm/ptp_kvm.rst
@@ -7,19 +7,29 @@
 It relies on transferring the wall clock and counter value from the
 host to the guest using a KVM-specific hypercall.
 
-* ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID: 0x86000001
+``ARM_SMCCC_KVM_FUNC_PTP``
+----------------------------------------
 
-This hypercall uses the SMC32/HVC32 calling convention:
+Retrieve current time information for the specific counter. There are no
+endianness restrictions.
 
-ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID
-    ==============    ========    =====================================
-    Function ID:      (uint32)    0x86000001
-    Arguments:        (uint32)    KVM_PTP_VIRT_COUNTER(0)
-                                  KVM_PTP_PHYS_COUNTER(1)
-    Return Values:    (int32)     NOT_SUPPORTED(-1) on error, or
-                      (uint32)    Upper 32 bits of wall clock time (r0)
-                      (uint32)    Lower 32 bits of wall clock time (r1)
-                      (uint32)    Upper 32 bits of counter (r2)
-                      (uint32)    Lower 32 bits of counter (r3)
-    Endianness:                   No Restrictions.
-    ==============    ========    =====================================
++---------------------+-------------------------------------------------------+
+| Presence:           | Optional                                              |
++---------------------+-------------------------------------------------------+
+| Calling convention: | HVC32                                                 |
++---------------------+----------+--------------------------------------------+
+| Function ID:        | (uint32) | 0x86000001                                 |
++---------------------+----------+----+---------------------------------------+
+| Arguments:          | (uint32) | R1 | ``KVM_PTP_VIRT_COUNTER (0)``          |
+|                     |          |    +---------------------------------------+
+|                     |          |    | ``KVM_PTP_PHYS_COUNTER (1)``          |
++---------------------+----------+----+---------------------------------------+
+| Return Values:      | (int32)  | R0 | ``NOT_SUPPORTED (-1)`` on error, else |
+|                     |          |    | upper 32 bits of wall clock time      |
+|                     +----------+----+---------------------------------------+
+|                     | (uint32) | R1 | Lower 32 bits of wall clock time      |
+|                     +----------+----+---------------------------------------+
+|                     | (uint32) | R2 | Upper 32 bits of counter              |
+|                     +----------+----+---------------------------------------+
+|                     | (uint32) | R3 | Lower 32 bits of counter              |
++---------------------+----------+----+---------------------------------------+
diff --git a/Kconfig b/Kconfig
index 745bc77..57a142d 100644
--- a/Kconfig
+++ b/Kconfig
@@ -30,3 +30,6 @@
 source "lib/Kconfig.debug"
 
 source "Documentation/Kconfig"
+
+# ANDROID: Set KCONFIG_EXT_PREFIX to descend into an external project.
+source "$(KCONFIG_EXT_PREFIX)Kconfig.ext"
diff --git a/Kconfig.ext b/Kconfig.ext
new file mode 100644
index 0000000..48d805f
--- /dev/null
+++ b/Kconfig.ext
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+# This file is intentionally empty. It's used as a placeholder for when
+# KCONFIG_EXT_PREFIX isn't defined.
diff --git a/MAINTAINERS b/MAINTAINERS
index d4822ae..d750812 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7831,6 +7831,7 @@
 L:	linux-f2fs-devel@lists.sourceforge.net
 S:	Maintained
 W:	https://f2fs.wiki.kernel.org/
+B:	https://bugzilla.kernel.org/enter_bug.cgi?product=File%20System&component=f2fs
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git
 F:	Documentation/ABI/testing/sysfs-fs-f2fs
 F:	Documentation/filesystems/f2fs.rst
@@ -10022,6 +10023,13 @@
 F:	drivers/hwmon/ina2xx.c
 F:	include/linux/platform_data/ina2xx.h
 
+INCREMENTAL FILE SYSTEM
+M:	Paul Lawrence <paullawrence@google.com>
+L:	linux-unionfs@vger.kernel.org
+S:	Supported
+F:	fs/incfs/
+F:	tools/testing/selftests/filesystems/incfs/
+
 INDUSTRY PACK SUBSYSTEM (IPACK)
 M:	Samuel Iglesias Gonsalvez <siglesias@igalia.com>
 M:	Jens Taprogge <jens.taprogge@taprogge.org>
diff --git a/Makefile b/Makefile
index e039f2a..e0eccd7 100644
--- a/Makefile
+++ b/Makefile
@@ -148,6 +148,24 @@
 
 export KBUILD_EXTMOD
 
+# ANDROID: set up mixed-build support. mixed-build allows device kernel modules
+# to be compiled against a GKI kernel. This approach still uses the headers and
+# Kbuild from device kernel, so care must be taken to ensure that those headers match.
+ifdef KBUILD_MIXED_TREE
+# Need vmlinux.symvers for modpost and System.map for depmod, check whether they exist in KBUILD_MIXED_TREE
+required_mixed_files=vmlinux.symvers System.map
+$(if $(filter-out $(words $(required_mixed_files)), \
+		$(words $(wildcard $(addprefix $(KBUILD_MIXED_TREE)/,$(required_mixed_files))))), \
+	$(error KBUILD_MIXED_TREE=$(KBUILD_MIXED_TREE) doesn't contain $(required_mixed_files)))
+endif
+
+mixed-build-prefix = $(if $(KBUILD_MIXED_TREE),$(KBUILD_MIXED_TREE)/)
+export KBUILD_MIXED_TREE
+# This is a hack for kleaf to set mixed-build-prefix within the execution of a make rule, e.g.
+# within __modinst_pre.
+# TODO(b/205893923): Revert this hack once it is properly handled.
+export mixed-build-prefix
+
 # Kbuild will save output files in the current working directory.
 # This does not need to match to the root of the kernel source tree.
 #
@@ -522,7 +540,7 @@
 KBZIP2		= bzip2
 KLZOP		= lzop
 LZMA		= lzma
-LZ4		= lz4c
+LZ4		= lz4
 XZ		= xz
 ZSTD		= zstd
 
@@ -748,11 +766,13 @@
 libs-y		:= lib/
 endif # KBUILD_EXTMOD
 
+ifndef KBUILD_MIXED_TREE
 # The all: target is the default when no target is given on the
 # command line.
 # This allow a user to issue only 'make' to build a kernel including modules
 # Defaults to vmlinux, but the arch makefile usually adds further targets
 all: vmlinux
+endif
 
 CFLAGS_GCOV	:= -fprofile-arcs -ftest-coverage
 ifdef CONFIG_CC_IS_GCC
@@ -978,7 +998,13 @@
 else
 CC_FLAGS_LTO	:= -flto
 endif
+
+ifeq ($(SRCARCH),x86)
+# Workaround for compiler / linker bug
 CC_FLAGS_LTO	+= -fvisibility=hidden
+else
+CC_FLAGS_LTO	+= -fvisibility=default
+endif
 
 # Limit inlining across translation units to reduce binary size
 KBUILD_LDFLAGS += -mllvm -import-instr-limit=5
@@ -1168,6 +1194,40 @@
 export MODORDER := $(extmod_prefix)modules.order
 export MODULES_NSDEPS := $(extmod_prefix)modules.nsdeps
 
+# ---------------------------------------------------------------------------
+# Kernel headers
+
+PHONY += headers
+
+# Default location for installed headers
+ifeq ($(KBUILD_EXTMOD),)
+PHONY += archheaders archscripts
+hdr-inst := -f $(srctree)/scripts/Makefile.headersinst obj
+headers: $(version_h) scripts_unifdef uapi-asm-generic archheaders archscripts
+else
+hdr-prefix = $(KBUILD_EXTMOD)/
+hdr-inst := -f $(srctree)/scripts/Makefile.headersinst dst=$(KBUILD_EXTMOD)/usr/include objtree=$(objtree)/$(KBUILD_EXTMOD) obj
+endif
+
+export INSTALL_HDR_PATH = $(objtree)/$(hdr-prefix)usr
+
+quiet_cmd_headers_install = INSTALL $(INSTALL_HDR_PATH)/include
+      cmd_headers_install = \
+	mkdir -p $(INSTALL_HDR_PATH); \
+	rsync -mrl --include='*/' --include='*\.h' --exclude='*' \
+	$(hdr-prefix)usr/include $(INSTALL_HDR_PATH);
+
+PHONY += headers_install
+headers_install: headers
+	$(call cmd,headers_install)
+
+headers:
+ifeq ($(KBUILD_EXTMOD),)
+	$(if $(filter um, $(SRCARCH)), $(error Headers not exportable for UML))
+endif
+	$(Q)$(MAKE) $(hdr-inst)=$(hdr-prefix)include/uapi
+	$(Q)$(MAKE) $(hdr-inst)=$(hdr-prefix)arch/$(SRCARCH)/include/uapi
+
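+# Illustrative effect (assumed paths, not part of the patch): in-tree builds still
+# install sanitised headers to $(objtree)/usr/include, while
+#   make KBUILD_EXTMOD=path/to/mod headers_install
+# places them under $(objtree)/path/to/mod/usr/include via hdr-prefix above.
+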
 ifeq ($(KBUILD_EXTMOD),)
 
 build-dir	:= .
@@ -1224,6 +1284,7 @@
 vmlinux.a: $(KBUILD_VMLINUX_OBJS) scripts/head-object-list.txt autoksyms_recursive FORCE
 	$(call if_changed,ar_vmlinux.a)
 
+ifndef KBUILD_MIXED_TREE
 PHONY += vmlinux_o
 vmlinux_o: vmlinux.a $(KBUILD_VMLINUX_LIBS)
 	$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.vmlinux_o
@@ -1246,13 +1307,15 @@
 vmlinux: export LDFLAGS_vmlinux = $(_LDFLAGS_vmlinux)
 vmlinux: vmlinux.o $(KBUILD_LDS) modpost
 	$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.vmlinux
+endif
 
 # The actual objects are generated when descending,
 # make sure no implicit rule kicks in
 $(sort $(KBUILD_LDS) $(KBUILD_VMLINUX_OBJS) $(KBUILD_VMLINUX_LIBS)): . ;
 
 filechk_kernel.release = \
-	echo "$(KERNELVERSION)$$($(CONFIG_SHELL) $(srctree)/scripts/setlocalversion $(srctree))"
+	echo "$(KERNELVERSION)$$($(CONFIG_SHELL) $(srctree)/scripts/setlocalversion \
+		$(srctree) $(BRANCH) $(KMI_GENERATION))"
 
 # Store (new) KERNELRELEASE string in include/config/kernel.release
 include/config/kernel.release: FORCE
@@ -1352,32 +1415,6 @@
 	$(Q)find $(srctree)/include/ -name '*.h' | xargs --max-args 1 \
 	$(srctree)/scripts/headerdep.pl -I$(srctree)/include
 
-# ---------------------------------------------------------------------------
-# Kernel headers
-
-#Default location for installed headers
-export INSTALL_HDR_PATH = $(objtree)/usr
-
-quiet_cmd_headers_install = INSTALL $(INSTALL_HDR_PATH)/include
-      cmd_headers_install = \
-	mkdir -p $(INSTALL_HDR_PATH); \
-	rsync -mrl --include='*/' --include='*\.h' --exclude='*' \
-	usr/include $(INSTALL_HDR_PATH)
-
-PHONY += headers_install
-headers_install: headers
-	$(call cmd,headers_install)
-
-PHONY += archheaders archscripts
-
-hdr-inst := -f $(srctree)/scripts/Makefile.headersinst obj
-
-PHONY += headers
-headers: $(version_h) scripts_unifdef uapi-asm-generic archheaders archscripts
-	$(if $(filter um, $(SRCARCH)), $(error Headers not exportable for UML))
-	$(Q)$(MAKE) $(hdr-inst)=include/uapi
-	$(Q)$(MAKE) $(hdr-inst)=arch/$(SRCARCH)/include/uapi
-
 ifdef CONFIG_HEADERS_INSTALL
 prepare: headers
 endif
@@ -1455,7 +1492,9 @@
 # Devicetree files
 
 ifneq ($(wildcard $(srctree)/arch/$(SRCARCH)/boot/dts/),)
-dtstree := arch/$(SRCARCH)/boot/dts
+# ANDROID: allow this to be overridden by the build environment. This allows
+# one to compile a device tree that is located out-of-tree.
+dtstree ?= arch/$(SRCARCH)/boot/dts
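+# Illustrative override (assumed usage): an external devicetree could be built with
+#   make dtstree=path/to/external/dts dtbs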
 endif
 
 ifneq ($(dtstree),)
@@ -1530,7 +1569,7 @@
 # is an exception.
 ifdef CONFIG_DEBUG_INFO_BTF_MODULES
 KBUILD_BUILTIN := 1
-modules: vmlinux
+modules: $(mixed-build-prefix)vmlinux
 endif
 
 modules: modules_prepare
@@ -1570,8 +1609,8 @@
 		ln -s $(CURDIR) $(MODLIB)/build ; \
 	fi
 	@sed 's:^:kernel/:' modules.order > $(MODLIB)/modules.order
-	@cp -f modules.builtin $(MODLIB)/
-	@cp -f $(objtree)/modules.builtin.modinfo $(MODLIB)/
+	@cp -f $(mixed-build-prefix)modules.builtin $(MODLIB)/
+	@cp -f $(or $(mixed-build-prefix),$(objtree)/)modules.builtin.modinfo $(MODLIB)/
 
 endif # CONFIG_MODULES
 
@@ -1900,6 +1939,8 @@
 	@echo  ''
 	@echo  '  modules         - default target, build the module(s)'
 	@echo  '  modules_install - install the module'
+	@echo  '  headers_install - Install sanitised kernel headers to INSTALL_HDR_PATH'
+	@echo  '                    (default: $(abspath $(INSTALL_HDR_PATH)))'
 	@echo  '  clean           - remove generated files in module directory only'
 	@echo  ''
 
@@ -1928,7 +1969,7 @@
 
 quiet_cmd_depmod = DEPMOD  $(MODLIB)
       cmd_depmod = $(CONFIG_SHELL) $(srctree)/scripts/depmod.sh $(DEPMOD) \
-                   $(KERNELRELEASE)
+                   $(KERNELRELEASE) $(mixed-build-prefix)
 
 modules_install:
 	$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modinst
@@ -1952,7 +1993,7 @@
 endif # CONFIG_MODULES
 
 PHONY += modpost
-modpost: $(if $(single-build),, $(if $(KBUILD_BUILTIN), vmlinux.o)) \
+modpost: $(if $(single-build),, $(if $(KBUILD_MIXED_TREE),,$(if $(KBUILD_BUILTIN), vmlinux.o))) \
 	 $(if $(KBUILD_MODULES), modules_check)
 	$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modpost
 
@@ -2002,7 +2043,7 @@
 # Error messages still appears in the original language
 PHONY += $(build-dir)
 $(build-dir): prepare
-	$(Q)$(MAKE) $(build)=$@ need-builtin=1 need-modorder=1 $(single-goals)
+	$(Q)$(MAKE) $(build)=$@ $(if $(KBUILD_MIXED_TREE),,need-builtin=1) need-modorder=1 $(single-goals)
 
 clean-dirs := $(addprefix _clean_, $(clean-dirs))
 PHONY += $(clean-dirs) clean
@@ -2011,7 +2052,9 @@
 
 clean: $(clean-dirs)
 	$(call cmd,rmfiles)
-	@find $(or $(KBUILD_EXTMOD), .) $(RCS_FIND_IGNORE) \
+	@find $(or $(KBUILD_EXTMOD), .) \
+		$(if $(filter-out arch/$(SRCARCH)/boot/dts, $(dtstree)), $(dtstree)) \
+		$(RCS_FIND_IGNORE) \
 		\( -name '*.[aios]' -o -name '*.rsi' -o -name '*.ko' -o -name '.*.cmd' \
 		-o -name '*.ko.*' \
 		-o -name '*.dtb' -o -name '*.dtbo' -o -name '*.dtb.S' -o -name '*.dt.yaml' \
@@ -2050,7 +2093,7 @@
       cmd_gen_compile_commands = $(PYTHON3) $< -a $(AR) -o $@ $(filter-out $<, $(real-prereqs))
 
 $(extmod_prefix)compile_commands.json: scripts/clang-tools/gen_compile_commands.py \
-	$(if $(KBUILD_EXTMOD),, vmlinux.a $(KBUILD_VMLINUX_LIBS)) \
+	$(if $(KBUILD_EXTMOD)$(KBUILD_MIXED_TREE),, vmlinux.a $(KBUILD_VMLINUX_LIBS)) \
 	$(if $(CONFIG_MODULES), $(MODORDER)) FORCE
 	$(call if_changed,gen_compile_commands)
 
@@ -2107,7 +2150,8 @@
 	$(PERL) $(srctree)/scripts/checkstack.pl $(CHECKSTACK_ARCH)
 
 kernelrelease:
-	@echo "$(KERNELVERSION)$$($(CONFIG_SHELL) $(srctree)/scripts/setlocalversion $(srctree))"
+	@echo "$(KERNELVERSION)$$($(CONFIG_SHELL) $(srctree)/scripts/setlocalversion \
+		$(srctree) $(BRANCH) $(KMI_GENERATION))"
 
 kernelversion:
 	@echo $(KERNELVERSION)
diff --git a/OWNERS b/OWNERS
new file mode 100644
index 0000000..a8b2615
--- /dev/null
+++ b/OWNERS
@@ -0,0 +1,13 @@
+# The full list of approvers is defined in
+# https://android.googlesource.com/kernel/common/+/refs/meta/config/OWNERS
+
+# The following OWNERS are defined at the top level to improve the OWNERS
+# suggestions through any user interface. Consider these people the ones who
+# can help you find the best person to review a change.
+adelva@google.com
+gregkh@google.com
+maennich@google.com
+saravanak@google.com
+smuckle@google.com
+surenb@google.com
+tkjos@google.com
diff --git a/OWNERS_DrNo b/OWNERS_DrNo
new file mode 100644
index 0000000..c4b8c0d5
--- /dev/null
+++ b/OWNERS_DrNo
@@ -0,0 +1,23 @@
+# Authoritative list of Dr. No reviewers to approve changes on GKI release
+# branches, such as android12-5.10.
+#
+# This file has no effect in this branch, but is referred to from release
+# branches. So, please do not move or rename.
+#
+# See the GKI release documentation (go/gki-dr-no) for further details.
+
+# Main reviewers
+adelva@google.com
+maennich@google.com
+saravanak@google.com
+vmartensson@google.com
+tkjos@google.com
+willdeacon@google.com
+
+# GKI Release Team
+howardsoc@google.com #{LAST_RESORT_SUGGESTION}
+szuweilin@google.com #{LAST_RESORT_SUGGESTION}
+
+# Backup
+sspatil@google.com #{LAST_RESORT_SUGGESTION}
+malchev@google.com #{LAST_RESORT_SUGGESTION}
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..4a1deb3
--- /dev/null
+++ b/README.md
@@ -0,0 +1,150 @@
+# How do I submit patches to Android Common Kernels
+
+1. BEST: Make all of your changes to upstream Linux. If appropriate, backport to the stable releases.
+   These patches will be merged automatically in the corresponding common kernels. If the patch is already
+   in upstream Linux, post a backport of the patch that conforms to the patch requirements below.
+   - Do not send patches upstream that contain only symbol exports. To be considered for upstream Linux,
+additions of `EXPORT_SYMBOL_GPL()` require an in-tree modular driver that uses the symbol -- so include
+the new driver or changes to an existing driver in the same patchset as the export.
+   - When sending patches upstream, the commit message must contain a clear case for why the patch
+is needed and beneficial to the community. Enabling out-of-tree drivers or functionality
+is not a persuasive case.
+
+2. LESS GOOD: Develop your patches out-of-tree (from an upstream Linux point-of-view). These are
+   very unlikely to be accepted unless they fix an Android-specific bug or have been coordinated
+   with kernel-team@android.com. If you want to proceed, post a patch that conforms to the patch
+   requirements below.
+
+# Common Kernel patch requirements
+
+- All patches must conform to the Linux kernel coding standards and pass `scripts/checkpatch.pl` (a sample invocation follows this list)
+- Patches shall not break gki_defconfig or allmodconfig builds for arm, arm64, x86, x86_64 architectures
+(see https://source.android.com/setup/build/building-kernels)
+- If the patch is not merged from an upstream branch, the subject must be tagged with the type of patch:
+`UPSTREAM:`, `BACKPORT:`, `FROMGIT:`, `FROMLIST:`, or `ANDROID:`.
+- All patches must have a `Change-Id:` tag (see https://gerrit-review.googlesource.com/Documentation/user-changeid.html)
+- If an Android bug has been assigned, there must be a `Bug:` tag.
+- All patches must have a `Signed-off-by:` tag by the author and the submitter
+
+Additional requirements are listed below, based on the patch type.
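+
+For instance, `scripts/checkpatch.pl` can be run on the tip commit before uploading
+(a sample invocation; adjust the revision range to cover a whole series):
+```
+        ./scripts/checkpatch.pl --strict --git HEAD
+```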
+
+## Requirements for backports from mainline Linux: `UPSTREAM:`, `BACKPORT:`
+
+- If the patch is a cherry-pick from Linux mainline with no changes at all
+    - tag the patch subject with `UPSTREAM:`.
+    - add upstream commit information with a `(cherry picked from commit ...)` line
+    - Example:
+        - if the upstream commit message is
+```
+        important patch from upstream
+
+        This is the detailed description of the important patch
+
+        Signed-off-by: Fred Jones <fred.jones@foo.org>
+```
+>- then Joe Smith would upload the patch for the common kernel as
+```
+        UPSTREAM: important patch from upstream
+
+        This is the detailed description of the important patch
+
+        Signed-off-by: Fred Jones <fred.jones@foo.org>
+
+        Bug: 135791357
+        Change-Id: I4caaaa566ea080fa148c5e768bb1a0b6f7201c01
+        (cherry picked from commit c31e73121f4c1ec41143423ac6ce3ce6dafdcec1)
+        Signed-off-by: Joe Smith <joe.smith@foo.org>
+```
+
+- If the patch requires any changes from the upstream version, tag the patch with `BACKPORT:`
+instead of `UPSTREAM:`.
+    - use the same tags as `UPSTREAM:`
+    - add comments about the changes under the `(cherry picked from commit ...)` line
+    - Example:
+```
+        BACKPORT: important patch from upstream
+
+        This is the detailed description of the important patch
+
+        Signed-off-by: Fred Jones <fred.jones@foo.org>
+
+        Bug: 135791357
+        Change-Id: I4caaaa566ea080fa148c5e768bb1a0b6f7201c01
+        (cherry picked from commit c31e73121f4c1ec41143423ac6ce3ce6dafdcec1)
+        [joe: Resolved minor conflict in drivers/foo/bar.c ]
+        Signed-off-by: Joe Smith <joe.smith@foo.org>
+```
+
+## Requirements for other backports: `FROMGIT:`, `FROMLIST:`
+
+- If the patch has been merged into an upstream maintainer tree, but has not yet
+been merged into Linux mainline
+    - tag the patch subject with `FROMGIT:`
+    - add info on where the patch came from as `(cherry picked from commit <sha1> <repo> <branch>)`. This
+must be a stable maintainer branch (not rebased, so don't use `linux-next` for example).
+    - if changes were required, use `BACKPORT: FROMGIT:`
+    - Example:
+        - if the commit message in the maintainer tree is
+```
+        important patch from upstream
+
+        This is the detailed description of the important patch
+
+        Signed-off-by: Fred Jones <fred.jones@foo.org>
+```
+>- then Joe Smith would upload the patch for the common kernel as
+```
+        FROMGIT: important patch from upstream
+
+        This is the detailed description of the important patch
+
+        Signed-off-by: Fred Jones <fred.jones@foo.org>
+
+        Bug: 135791357
+        (cherry picked from commit 878a2fd9de10b03d11d2f622250285c7e63deace
+         https://git.kernel.org/pub/scm/linux/kernel/git/foo/bar.git test-branch)
+        Change-Id: I4caaaa566ea080fa148c5e768bb1a0b6f7201c01
+        Signed-off-by: Joe Smith <joe.smith@foo.org>
+```
+
+
+- If the patch has been submitted to LKML, but not accepted into any maintainer tree
+    - tag the patch subject with `FROMLIST:`
+    - add a `Link:` tag with a link to the submittal on lore.kernel.org
+    - add a `Bug:` tag with the Android bug (required for patches not accepted into
+a maintainer tree)
+    - if changes were required, use `BACKPORT: FROMLIST:`
+    - Example:
+```
+        FROMLIST: important patch from upstream
+
+        This is the detailed description of the important patch
+
+        Signed-off-by: Fred Jones <fred.jones@foo.org>
+
+        Bug: 135791357
+        Link: https://lore.kernel.org/lkml/20190619171517.GA17557@someone.com/
+        Change-Id: I4caaaa566ea080fa148c5e768bb1a0b6f7201c01
+        Signed-off-by: Joe Smith <joe.smith@foo.org>
+```
+
+## Requirements for Android-specific patches: `ANDROID:`
+
+- If the patch is fixing a bug in Android-specific code
+    - tag the patch subject with `ANDROID:`
+    - add a `Fixes:` tag that cites the patch that introduced the bug
+    - Example:
+```
+        ANDROID: fix android-specific bug in foobar.c
+
+        This is the detailed description of the important fix
+
+        Fixes: 1234abcd2468 ("foobar: add cool feature")
+        Change-Id: I4caaaa566ea080fa148c5e768bb1a0b6f7201c01
+        Signed-off-by: Joe Smith <joe.smith@foo.org>
+```
+
+- If the patch is a new feature
+    - tag the patch subject with `ANDROID:`
+    - add a `Bug:` tag with the Android bug (required for android-specific features)
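+    - Example (illustrative, following the same pattern as the fix example above):
+```
+        ANDROID: add cool feature to foobar.c
+
+        This is the detailed description of the new feature
+
+        Bug: 135791357
+        Change-Id: I4caaaa566ea080fa148c5e768bb1a0b6f7201c01
+        Signed-off-by: Joe Smith <joe.smith@foo.org>
+```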
+
diff --git a/android/abi_gki_aarch64_qcom b/android/abi_gki_aarch64_qcom
new file mode 100644
index 0000000..7e2c497
--- /dev/null
+++ b/android/abi_gki_aarch64_qcom
@@ -0,0 +1,3815 @@
+[abi_symbol_list]
+  access_process_vm
+  activate_task
+  add_cpu
+  add_device_randomness
+  add_memory
+  addrconf_add_linklocal
+  addrconf_prefix_rcv_add_addr
+  add_taint
+  add_timer
+  add_timer_on
+  add_uevent_var
+  add_wait_queue
+  add_wait_queue_exclusive
+  adjust_managed_page_count
+  aes_encrypt
+  aes_expandkey
+  alloc_anon_inode
+  alloc_candev_mqs
+  alloc_can_err_skb
+  alloc_canfd_skb
+  alloc_can_skb
+  alloc_canxl_skb
+  alloc_chrdev_region
+  alloc_etherdev_mqs
+  alloc_io_pgtable_ops
+  alloc_netdev_mqs
+  __alloc_pages
+  __alloc_percpu
+  __alloc_percpu_gfp
+  __alloc_skb
+  alloc_skb_with_frags
+  alloc_workqueue
+  alt_cb_patch_nops
+  amba_bustype
+  amba_driver_register
+  amba_driver_unregister
+  android_rvh_probe_register
+  anon_inode_getfile
+  arc4_crypt
+  arc4_setkey
+  __arch_clear_user
+  __arch_copy_from_user
+  __arch_copy_to_user
+  arch_freq_scale
+  arch_timer_read_counter
+  argv_free
+  argv_split
+  arm64_use_ng_mappings
+  __arm_smccc_smc
+  __arm_smccc_sve_check
+  async_schedule_node
+  async_synchronize_cookie
+  atomic_notifier_call_chain
+  atomic_notifier_chain_register
+  atomic_notifier_chain_unregister
+  autoremove_wake_function
+  available_idle_cpu
+  backlight_device_get_by_type
+  backlight_device_register
+  backlight_device_set_brightness
+  backlight_device_unregister
+  __balance_callbacks
+  balance_push_callback
+  baswap
+  bcmp
+  bdev_end_io_acct
+  bdev_start_io_acct
+  bin2hex
+  bio_endio
+  bio_end_io_acct_remapped
+  bio_start_io_acct
+  bitmap_allocate_region
+  __bitmap_and
+  __bitmap_andnot
+  __bitmap_clear
+  __bitmap_complement
+  __bitmap_equal
+  bitmap_find_next_zero_area_off
+  bitmap_free
+  __bitmap_intersects
+  __bitmap_or
+  bitmap_print_to_pagebuf
+  bitmap_release_region
+  __bitmap_set
+  __bitmap_subset
+  __bitmap_weight
+  bitmap_zalloc
+  bit_wait
+  bit_wait_timeout
+  __blk_alloc_disk
+  blk_execute_rq_nowait
+  blk_mq_free_request
+  blk_mq_rq_cpu
+  blk_queue_flag_clear
+  blk_queue_flag_set
+  blk_queue_io_min
+  blk_queue_io_opt
+  blk_queue_logical_block_size
+  blk_queue_max_discard_sectors
+  blk_queue_max_write_zeroes_sectors
+  blk_queue_physical_block_size
+  blk_rq_map_user_io
+  blk_rq_unmap_user
+  blocking_notifier_call_chain
+  blocking_notifier_chain_register
+  blocking_notifier_chain_unregister
+  bpf_trace_run1
+  bpf_trace_run10
+  bpf_trace_run11
+  bpf_trace_run12
+  bpf_trace_run2
+  bpf_trace_run3
+  bpf_trace_run4
+  bpf_trace_run5
+  bpf_trace_run6
+  bpf_trace_run7
+  bpf_trace_run8
+  bpf_trace_run9
+  bridge_tunnel_header
+  bt_accept_dequeue
+  bt_accept_enqueue
+  bt_accept_unlink
+  btbcm_check_bdaddr
+  btbcm_finalize
+  btbcm_initialize
+  btbcm_patchram
+  btbcm_read_pcm_int_params
+  btbcm_set_bdaddr
+  btbcm_setup_apple
+  btbcm_setup_patchram
+  btbcm_write_pcm_int_params
+  bt_debugfs
+  bt_err
+  bt_err_ratelimited
+  bt_info
+  bt_procfs_cleanup
+  bt_procfs_init
+  bt_sock_ioctl
+  bt_sock_link
+  bt_sock_poll
+  bt_sock_reclassify_lock
+  bt_sock_recvmsg
+  bt_sock_register
+  bt_sock_stream_recvmsg
+  bt_sock_unlink
+  bt_sock_unregister
+  bt_sock_wait_ready
+  bt_sock_wait_state
+  bt_status
+  bt_to_errno
+  bt_warn
+  bt_warn_ratelimited
+  build_skb
+  bus_find_device
+  bus_for_each_dev
+  bus_register
+  bus_unregister
+  caches_clean_inval_pou
+  call_netdevice_notifiers
+  call_rcu
+  call_rcu_tasks
+  call_rcu_tasks_trace
+  call_srcu
+  can_bus_off
+  cancel_delayed_work
+  cancel_delayed_work_sync
+  cancel_work_sync
+  can_change_mtu
+  can_change_state
+  can_dropped_invalid_skb
+  can_eth_ioctl_hwts
+  can_ethtool_op_get_ts_info_hwts
+  can_fd_dlc2len
+  can_fd_len2dlc
+  can_free_echo_skb
+  can_get_echo_skb
+  can_get_state_str
+  can_proto_register
+  can_proto_unregister
+  can_put_echo_skb
+  can_rx_offload_add_fifo
+  can_rx_offload_add_manual
+  can_rx_offload_add_timestamp
+  can_rx_offload_del
+  can_rx_offload_enable
+  can_rx_offload_get_echo_skb
+  can_rx_offload_irq_finish
+  can_rx_offload_irq_offload_fifo
+  can_rx_offload_irq_offload_timestamp
+  can_rx_offload_queue_tail
+  can_rx_offload_queue_timestamp
+  can_rx_offload_threaded_irq_finish
+  can_rx_register
+  can_rx_unregister
+  can_send
+  can_skb_get_frame_len
+  can_sock_destruct
+  capable
+  cdc_parse_cdc_header
+  cdev_add
+  cdev_alloc
+  cdev_del
+  cdev_device_add
+  cdev_device_del
+  cdev_init
+  __cfg80211_alloc_event_skb
+  __cfg80211_alloc_reply_skb
+  cfg80211_any_usable_channels
+  cfg80211_assoc_comeback
+  cfg80211_assoc_failure
+  cfg80211_auth_timeout
+  cfg80211_background_cac_abort
+  cfg80211_bss_color_notify
+  cfg80211_bss_flush
+  cfg80211_bss_iter
+  cfg80211_cac_event
+  cfg80211_calculate_bitrate
+  cfg80211_chandef_compatible
+  cfg80211_chandef_create
+  cfg80211_chandef_dfs_required
+  cfg80211_chandef_usable
+  cfg80211_chandef_valid
+  cfg80211_check_combinations
+  cfg80211_check_station_change
+  cfg80211_ch_switch_notify
+  cfg80211_ch_switch_started_notify
+  cfg80211_classify8021d
+  cfg80211_connect_done
+  cfg80211_conn_failed
+  cfg80211_control_port_tx_status
+  cfg80211_cqm_beacon_loss_notify
+  cfg80211_cqm_pktloss_notify
+  cfg80211_cqm_rssi_notify
+  cfg80211_cqm_txe_notify
+  cfg80211_crit_proto_stopped
+  cfg80211_del_sta_sinfo
+  cfg80211_disconnected
+  cfg80211_external_auth_request
+  cfg80211_find_elem_match
+  cfg80211_find_vendor_elem
+  cfg80211_free_nan_func
+  cfg80211_ft_event
+  cfg80211_get_bss
+  cfg80211_get_drvinfo
+  cfg80211_get_ies_channel_number
+  cfg80211_get_iftype_ext_capa
+  cfg80211_get_p2p_attr
+  cfg80211_get_station
+  cfg80211_gtk_rekey_notify
+  cfg80211_ibss_joined
+  cfg80211_iftype_allowed
+  cfg80211_inform_bss_data
+  cfg80211_inform_bss_frame_data
+  cfg80211_is_element_inherited
+  cfg80211_iter_combinations
+  cfg80211_merge_profile
+  cfg80211_mgmt_tx_status_ext
+  cfg80211_michael_mic_failure
+  cfg80211_nan_func_terminated
+  cfg80211_nan_match
+  cfg80211_new_sta
+  cfg80211_notify_new_peer_candidate
+  cfg80211_pmksa_candidate_notify
+  cfg80211_pmsr_complete
+  cfg80211_pmsr_report
+  cfg80211_port_authorized
+  cfg80211_probe_status
+  cfg80211_put_bss
+  __cfg80211_radar_event
+  cfg80211_ready_on_channel
+  cfg80211_ref_bss
+  cfg80211_reg_can_beacon
+  cfg80211_reg_can_beacon_relax
+  cfg80211_register_netdevice
+  cfg80211_remain_on_channel_expired
+  cfg80211_report_obss_beacon_khz
+  cfg80211_report_wowlan_wakeup
+  cfg80211_roamed
+  cfg80211_rx_assoc_resp
+  cfg80211_rx_control_port
+  cfg80211_rx_mgmt_ext
+  cfg80211_rx_mlme_mgmt
+  cfg80211_rx_spurious_frame
+  cfg80211_rx_unexpected_4addr_frame
+  cfg80211_rx_unprot_mlme_mgmt
+  cfg80211_scan_done
+  cfg80211_sched_scan_results
+  cfg80211_sched_scan_stopped
+  cfg80211_sched_scan_stopped_locked
+  __cfg80211_send_event_skb
+  cfg80211_send_layer2_update
+  cfg80211_shutdown_all_interfaces
+  cfg80211_sinfo_alloc_tid_stats
+  cfg80211_sta_opmode_change_notify
+  cfg80211_stop_iface
+  cfg80211_tdls_oper_request
+  cfg80211_tx_mgmt_expired
+  cfg80211_tx_mlme_mgmt
+  cfg80211_unlink_bss
+  cfg80211_unregister_wdev
+  cfg80211_update_owe_info_event
+  cfg80211_vendor_cmd_get_sender
+  cfg80211_vendor_cmd_reply
+  cgroup_taskset_first
+  cgroup_taskset_next
+  __check_object_size
+  check_preempt_curr
+  check_zeroed_user
+  __class_create
+  class_destroy
+  class_dev_iter_exit
+  class_dev_iter_init
+  class_dev_iter_next
+  class_find_device
+  class_for_each_device
+  class_interface_unregister
+  __class_register
+  class_unregister
+  cleanup_srcu_struct
+  clear_page
+  __ClearPageMovable
+  clk_bulk_disable
+  clk_bulk_enable
+  clk_bulk_prepare
+  clk_bulk_put_all
+  clk_bulk_unprepare
+  __clk_determine_rate
+  clk_disable
+  clk_enable
+  clk_fixed_factor_ops
+  clk_fixed_rate_ops
+  clk_get
+  __clk_get_hw
+  __clk_get_name
+  clk_get_parent
+  clk_get_rate
+  clk_hw_get_flags
+  clk_hw_get_name
+  clk_hw_get_num_parents
+  clk_hw_get_parent
+  clk_hw_get_parent_by_index
+  clk_hw_get_rate
+  clk_hw_get_rate_range
+  clk_hw_is_enabled
+  clk_hw_is_prepared
+  clk_hw_register
+  clk_hw_round_rate
+  clk_hw_unregister
+  __clk_is_enabled
+  clk_is_match
+  __clk_mux_determine_rate_closest
+  clk_notifier_register
+  clk_notifier_unregister
+  clk_prepare
+  clk_put
+  clk_register
+  clk_round_rate
+  clk_set_parent
+  clk_set_rate
+  clk_sync_state
+  clk_unprepare
+  close_candev
+  close_fd
+  cma_alloc
+  cma_get_name
+  cma_release
+  compat_ptr_ioctl
+  complete
+  complete_all
+  completion_done
+  component_add
+  component_bind_all
+  component_del
+  component_master_add_with_match
+  component_master_del
+  component_match_add_release
+  component_unbind_all
+  cond_synchronize_rcu
+  cond_synchronize_rcu_expedited
+  config_ep_by_speed
+  configfs_register_group
+  configfs_register_subsystem
+  configfs_unregister_group
+  configfs_unregister_subsystem
+  config_group_init
+  config_group_init_type_name
+  config_item_get
+  config_item_put
+  config_item_set_name
+  console_stop
+  console_suspend_enabled
+  __const_udelay
+  consume_skb
+  contig_page_data
+  _copy_from_iter
+  copy_from_kernel_nofault
+  __copy_overflow
+  _copy_to_iter
+  __cpu_active_mask
+  cpu_bit_bitmap
+  __cpu_dying_mask
+  cpufreq_cpu_get
+  cpufreq_cpu_get_raw
+  cpufreq_cpu_put
+  cpufreq_disable_fast_switch
+  cpufreq_driver_fast_switch
+  cpufreq_driver_resolve_freq
+  __cpufreq_driver_target
+  cpufreq_enable_boost_support
+  cpufreq_enable_fast_switch
+  cpufreq_freq_attr_scaling_available_freqs
+  cpufreq_freq_attr_scaling_boost_freqs
+  cpufreq_generic_frequency_table_verify
+  cpufreq_get_driver_data
+  cpufreq_get_policy
+  cpufreq_quick_get_max
+  cpufreq_register_driver
+  cpufreq_register_governor
+  cpufreq_register_notifier
+  cpufreq_unregister_driver
+  __cpuhp_remove_state
+  __cpuhp_setup_state
+  __cpuhp_setup_state_cpuslocked
+  __cpuhp_state_add_instance
+  __cpuhp_state_remove_instance
+  cpu_hwcaps
+  cpuidle_governor_latency_req
+  cpuidle_register_governor
+  cpu_irqtime
+  cpu_is_hotpluggable
+  cpu_latency_qos_add_request
+  cpu_latency_qos_remove_request
+  cpu_latency_qos_request_active
+  cpu_latency_qos_update_request
+  cpumask_any_and_distribute
+  cpu_number
+  __cpu_online_mask
+  cpu_pm_register_notifier
+  cpu_pm_unregister_notifier
+  __cpu_possible_mask
+  __cpu_present_mask
+  cpupri_find_fitness
+  cpu_scale
+  cpus_read_lock
+  cpus_read_unlock
+  cpu_subsys
+  cpu_topology
+  crc16
+  crc32_be
+  crc32_le
+  crc8
+  crc8_populate_msb
+  crc_ccitt
+  crypto_aead_decrypt
+  crypto_aead_encrypt
+  crypto_aead_setauthsize
+  crypto_aead_setkey
+  crypto_ahash_digest
+  crypto_ahash_setkey
+  crypto_alloc_aead
+  crypto_alloc_ahash
+  crypto_alloc_base
+  crypto_alloc_kpp
+  crypto_alloc_shash
+  crypto_alloc_skcipher
+  crypto_alloc_sync_skcipher
+  crypto_comp_compress
+  crypto_comp_decompress
+  crypto_default_rng
+  crypto_dequeue_request
+  crypto_destroy_tfm
+  crypto_ecdh_encode_key
+  crypto_ecdh_key_len
+  crypto_enqueue_request
+  crypto_get_default_rng
+  crypto_has_ahash
+  crypto_has_alg
+  crypto_init_queue
+  __crypto_memneq
+  crypto_put_default_rng
+  crypto_register_aead
+  crypto_register_ahash
+  crypto_register_rngs
+  crypto_register_skcipher
+  crypto_shash_digest
+  crypto_shash_final
+  crypto_shash_finup
+  crypto_shash_setkey
+  crypto_shash_tfm_digest
+  crypto_shash_update
+  crypto_skcipher_decrypt
+  crypto_skcipher_encrypt
+  crypto_skcipher_setkey
+  crypto_unregister_aead
+  crypto_unregister_ahash
+  crypto_unregister_rngs
+  crypto_unregister_skcipher
+  __crypto_xor
+  css_next_child
+  csum_ipv6_magic
+  csum_partial
+  csum_tcpudp_nofold
+  _ctype
+  datagram_poll
+  deactivate_task
+  debugfs_attr_read
+  debugfs_attr_write
+  debugfs_create_atomic_t
+  debugfs_create_blob
+  debugfs_create_bool
+  debugfs_create_dir
+  debugfs_create_file
+  debugfs_create_file_unsafe
+  debugfs_create_symlink
+  debugfs_create_u16
+  debugfs_create_u32
+  debugfs_create_u64
+  debugfs_create_u8
+  debugfs_create_ulong
+  debugfs_create_x32
+  debugfs_create_x64
+  debugfs_create_x8
+  debugfs_file_get
+  debugfs_file_put
+  debugfs_lookup
+  debugfs_remove
+  debugfs_rename
+  debug_locks_off
+  debug_locks_silent
+  dec_node_page_state
+  dec_zone_page_state
+  default_llseek
+  default_wake_function
+  deferred_free
+  delayed_work_timer_fn
+  del_gendisk
+  del_timer
+  del_timer_sync
+  desc_to_gpio
+  destroy_workqueue
+  dev_add_pack
+  dev_addr_mod
+  dev_alloc_name
+  dev_change_flags
+  __dev_change_net_namespace
+  dev_close
+  dev_close_many
+  dev_coredumpm
+  dev_coredumpv
+  _dev_crit
+  dev_driver_string
+  _dev_err
+  dev_err_probe
+  dev_fetch_sw_netstats
+  devfreq_add_device
+  devfreq_add_governor
+  devfreq_cooling_unregister
+  devfreq_get_devfreq_by_node
+  devfreq_remove_device
+  devfreq_remove_governor
+  devfreq_resume_device
+  devfreq_suspend_device
+  dev_fwnode
+  dev_getbyhwaddr_rcu
+  __dev_get_by_index
+  dev_get_by_index
+  dev_get_by_index_rcu
+  __dev_get_by_name
+  dev_get_by_name
+  dev_get_by_name_rcu
+  dev_getfirstbyhwtype
+  dev_get_flags
+  dev_get_regmap
+  dev_get_stats
+  device_add
+  device_add_disk
+  device_add_groups
+  device_create
+  device_create_file
+  device_create_with_groups
+  device_del
+  device_destroy
+  device_find_child
+  device_for_each_child
+  device_for_each_child_reverse
+  device_get_child_node_count
+  device_get_match_data
+  device_get_next_child_node
+  device_get_phy_mode
+  device_initialize
+  device_link_add
+  device_match_fwnode
+  device_match_name
+  device_match_of_node
+  device_move
+  device_property_present
+  device_property_read_string
+  device_property_read_u16_array
+  device_property_read_u32_array
+  device_property_read_u8_array
+  device_register
+  device_remove_file
+  device_rename
+  device_set_wakeup_capable
+  device_show_int
+  device_store_int
+  device_unregister
+  device_wakeup_disable
+  device_wakeup_enable
+  _dev_info
+  __dev_kfree_skb_any
+  dev_load
+  devm_add_action
+  devm_alloc_etherdev_mqs
+  devm_backlight_device_register
+  devm_bitmap_zalloc
+  devm_blk_crypto_profile_init
+  devm_clk_bulk_get
+  devm_clk_bulk_get_all
+  devm_clk_get
+  devm_clk_get_optional
+  devm_clk_hw_register
+  devm_clk_put
+  devm_clk_register
+  dev_mc_sync
+  dev_mc_unsync
+  devm_device_add_group
+  devm_device_remove_group
+  devm_extcon_dev_allocate
+  devm_extcon_dev_register
+  devm_extcon_dev_unregister
+  devm_free_irq
+  devm_fwnode_pwm_get
+  devm_gen_pool_create
+  devm_gpiod_get
+  devm_gpiod_get_optional
+  devm_gpio_request_one
+  devm_hwspin_lock_register
+  devm_iio_channel_get
+  devm_iio_device_alloc
+  __devm_iio_device_register
+  devm_input_allocate_device
+  devm_ioremap
+  devm_ioremap_resource
+  devm_ioremap_wc
+  devm_iounmap
+  devm_kasprintf
+  devm_kfree
+  devm_kmalloc
+  devm_kmemdup
+  devm_krealloc
+  devm_kstrdup
+  devm_led_classdev_register_ext
+  devm_mbox_controller_register
+  devm_nvmem_cell_get
+  devm_nvmem_device_get
+  devm_nvmem_register
+  devm_of_clk_add_hw_provider
+  devm_of_icc_get
+  __devm_of_phy_provider_register
+  devm_of_platform_populate
+  devm_pci_alloc_host_bridge
+  devm_phy_create
+  devm_phy_get
+  devm_pinctrl_get
+  devm_pinctrl_put
+  devm_pinctrl_register
+  devm_platform_get_and_ioremap_resource
+  devm_platform_ioremap_resource
+  devm_platform_ioremap_resource_byname
+  devm_power_supply_register
+  devm_pwm_get
+  devm_regmap_add_irq_chip
+  devm_regmap_del_irq_chip
+  devm_regmap_field_alloc
+  __devm_regmap_init
+  __devm_regmap_init_i2c
+  __devm_regmap_init_mmio_clk
+  __devm_regmap_init_spmi_ext
+  devm_regulator_bulk_get
+  devm_regulator_get
+  devm_regulator_get_exclusive
+  devm_regulator_get_optional
+  devm_regulator_put
+  devm_regulator_register
+  devm_regulator_register_notifier
+  devm_request_any_context_irq
+  __devm_request_region
+  devm_request_threaded_irq
+  __devm_reset_control_get
+  devm_reset_controller_register
+  devm_rtc_allocate_device
+  __devm_rtc_register_device
+  devm_snd_soc_register_card
+  devm_thermal_of_cooling_device_register
+  devm_thermal_of_zone_register
+  devm_usb_get_phy_by_node
+  devm_usb_get_phy_by_phandle
+  dev_nit_active
+  _dev_notice
+  dev_pm_clear_wake_irq
+  dev_pm_domain_attach
+  dev_pm_domain_attach_by_name
+  dev_pm_domain_detach
+  dev_pm_genpd_add_notifier
+  dev_pm_genpd_remove_notifier
+  dev_pm_genpd_set_next_wakeup
+  dev_pm_genpd_set_performance_state
+  dev_pm_opp_add
+  dev_pm_opp_adjust_voltage
+  dev_pm_opp_clear_config
+  dev_pm_opp_disable
+  dev_pm_opp_enable
+  dev_pm_opp_find_freq_ceil
+  dev_pm_opp_find_freq_exact
+  dev_pm_opp_find_freq_floor
+  dev_pm_opp_get_opp_count
+  dev_pm_opp_get_voltage
+  dev_pm_opp_of_add_table
+  dev_pm_opp_of_cpumask_remove_table
+  dev_pm_opp_of_find_icc_paths
+  dev_pm_opp_of_register_em
+  dev_pm_opp_of_remove_table
+  dev_pm_opp_put
+  dev_pm_opp_remove_all_dynamic
+  dev_pm_opp_set_config
+  dev_pm_opp_set_opp
+  dev_pm_opp_set_rate
+  dev_pm_opp_set_sharing_cpus
+  dev_pm_qos_add_notifier
+  dev_pm_qos_add_request
+  dev_pm_qos_remove_notifier
+  dev_pm_qos_remove_request
+  dev_pm_qos_update_request
+  dev_pm_set_wake_irq
+  _dev_printk
+  __dev_queue_xmit
+  dev_remove_pack
+  devres_add
+  __devres_alloc_node
+  devres_free
+  devres_release
+  dev_set_allmulti
+  dev_set_mac_address
+  dev_set_mtu
+  dev_set_name
+  dev_set_promiscuity
+  dev_uc_add
+  dev_uc_del
+  dev_uc_sync
+  dev_uc_unsync
+  _dev_warn
+  disable_irq
+  disable_irq_nosync
+  disable_percpu_irq
+  divider_get_val
+  divider_recalc_rate
+  divider_ro_round_rate_parent
+  divider_round_rate_parent
+  dma_alloc_attrs
+  dma_alloc_pages
+  dma_async_device_register
+  dma_async_device_unregister
+  dma_async_tx_descriptor_init
+  dma_buf_attach
+  dma_buf_begin_cpu_access
+  dma_buf_begin_cpu_access_partial
+  dma_buf_detach
+  dma_buf_end_cpu_access
+  dma_buf_end_cpu_access_partial
+  dma_buf_export
+  dma_buf_fd
+  dma_buf_get
+  dma_buf_get_flags
+  dma_buf_map_attachment
+  dma_buf_put
+  dma_buf_unmap_attachment
+  dma_buf_vmap
+  dma_buf_vunmap
+  dma_contiguous_default_area
+  dma_fence_add_callback
+  dma_fence_array_create
+  dma_fence_array_ops
+  dma_fence_chain_init
+  dma_fence_context_alloc
+  dma_fence_default_wait
+  dma_fence_enable_sw_signaling
+  dma_fence_free
+  dma_fence_get_status
+  dma_fence_init
+  dma_fence_release
+  dma_fence_remove_callback
+  dma_fence_signal
+  dma_fence_signal_locked
+  dma_fence_signal_timestamp_locked
+  dma_fence_wait_timeout
+  dma_free_attrs
+  dma_free_pages
+  dma_get_sgtable_attrs
+  dma_get_slave_channel
+  dma_heap_add
+  dma_heap_buffer_alloc
+  dma_heap_buffer_free
+  dma_heap_find
+  dma_heap_get_dev
+  dma_heap_get_drvdata
+  dma_heap_get_name
+  dmam_alloc_attrs
+  dma_map_page_attrs
+  dma_map_resource
+  dma_map_sg_attrs
+  dma_map_sgtable
+  dmam_free_coherent
+  dma_mmap_attrs
+  dma_release_channel
+  dma_request_chan
+  dma_resv_fini
+  dma_resv_get_singleton
+  dma_resv_init
+  dma_resv_wait_timeout
+  dma_set_coherent_mask
+  dma_set_mask
+  dma_sync_sg_for_cpu
+  dma_sync_sg_for_device
+  dma_sync_single_for_cpu
+  dma_sync_single_for_device
+  dma_unmap_page_attrs
+  dma_unmap_resource
+  dma_unmap_sg_attrs
+  do_trace_netlink_extack
+  do_trace_rcu_torture_read
+  double_rq_lock
+  do_wait_intr
+  do_wait_intr_irq
+  down
+  down_read
+  down_timeout
+  down_write
+  d_path
+  dput
+  drain_workqueue
+  driver_attach
+  driver_find_device
+  driver_register
+  driver_set_override
+  driver_unregister
+  drm_add_edid_modes
+  drm_add_modes_noedid
+  drm_atomic_commit
+  drm_atomic_get_connector_state
+  drm_atomic_get_crtc_state
+  drm_atomic_get_new_private_obj_state
+  drm_atomic_get_plane_state
+  drm_atomic_get_private_obj_state
+  drm_atomic_helper_check
+  drm_atomic_helper_cleanup_planes
+  drm_atomic_helper_commit_duplicated_state
+  drm_atomic_helper_commit_hw_done
+  drm_atomic_helper_commit_modeset_disables
+  drm_atomic_helper_commit_modeset_enables
+  drm_atomic_helper_commit_planes
+  __drm_atomic_helper_connector_destroy_state
+  __drm_atomic_helper_connector_duplicate_state
+  __drm_atomic_helper_connector_reset
+  __drm_atomic_helper_crtc_destroy_state
+  __drm_atomic_helper_crtc_duplicate_state
+  drm_atomic_helper_dirtyfb
+  drm_atomic_helper_disable_plane
+  drm_atomic_helper_duplicate_state
+  drm_atomic_helper_page_flip
+  __drm_atomic_helper_plane_duplicate_state
+  drm_atomic_helper_prepare_planes
+  __drm_atomic_helper_private_obj_duplicate_state
+  drm_atomic_helper_set_config
+  drm_atomic_helper_shutdown
+  drm_atomic_helper_swap_state
+  drm_atomic_helper_update_legacy_modeset_state
+  drm_atomic_helper_update_plane
+  drm_atomic_helper_wait_for_fences
+  drm_atomic_helper_wait_for_vblanks
+  drm_atomic_private_obj_fini
+  drm_atomic_private_obj_init
+  drm_atomic_set_crtc_for_connector
+  drm_atomic_set_crtc_for_plane
+  drm_atomic_set_fb_for_plane
+  drm_atomic_set_mode_for_crtc
+  drm_atomic_state_alloc
+  drm_atomic_state_clear
+  drm_atomic_state_default_clear
+  drm_atomic_state_default_release
+  __drm_atomic_state_free
+  drm_atomic_state_init
+  drm_bridge_attach
+  drm_bridge_chain_disable
+  drm_bridge_chain_enable
+  drm_bridge_chain_mode_set
+  drm_bridge_chain_post_disable
+  drm_bridge_chain_pre_enable
+  drm_client_init
+  drm_client_modeset_commit_locked
+  drm_client_register
+  drm_compat_ioctl
+  drm_connector_attach_encoder
+  drm_connector_cleanup
+  drm_connector_init
+  drm_connector_list_iter_begin
+  drm_connector_list_iter_end
+  drm_connector_list_iter_next
+  drm_connector_register
+  drm_connector_unregister
+  drm_connector_update_edid_property
+  drm_crtc_add_crc_entry
+  drm_crtc_cleanup
+  __drm_crtc_commit_free
+  drm_crtc_commit_wait
+  drm_crtc_handle_vblank
+  drm_crtc_init_with_planes
+  drm_crtc_send_vblank_event
+  drm_crtc_set_max_vblank_count
+  drm_crtc_vblank_off
+  drm_crtc_vblank_on
+  drm_crtc_vblank_reset
+  drm_crtc_wait_one_vblank
+  ___drm_dbg
+  __drm_debug
+  drm_detect_hdmi_monitor
+  drm_detect_monitor_audio
+  drm_dev_alloc
+  __drm_dev_dbg
+  drm_dev_printk
+  drm_dev_put
+  drm_dev_register
+  drm_dev_unregister
+  drm_display_mode_from_cea_vic
+  drm_edid_duplicate
+  drm_edid_get_monitor_name
+  drm_encoder_cleanup
+  drm_encoder_init
+  __drm_err
+  drm_event_reserve_init_locked
+  drm_format_info
+  drm_framebuffer_init
+  drm_framebuffer_lookup
+  drm_framebuffer_remove
+  drm_framebuffer_unregister_private
+  drm_gem_create_mmap_offset
+  drm_gem_fb_create_handle
+  drm_gem_fb_destroy
+  drm_gem_fb_get_obj
+  drm_gem_get_pages
+  drm_gem_handle_create
+  drm_gem_mmap
+  drm_gem_mmap_obj
+  drm_gem_object_free
+  drm_gem_object_init
+  drm_gem_object_lookup
+  drm_gem_object_release
+  drm_gem_prime_fd_to_handle
+  drm_gem_prime_handle_to_fd
+  drm_gem_private_object_init
+  drm_gem_put_pages
+  drm_gem_vm_close
+  drm_gem_vm_open
+  drm_get_connector_status_name
+  drm_get_edid
+  drm_get_format_info
+  drm_helper_hpd_irq_event
+  drm_helper_mode_fill_fb_struct
+  drm_helper_probe_single_connector_modes
+  drm_ioctl
+  drm_is_current_master
+  drm_kms_helper_hotplug_event
+  drm_kms_helper_poll_disable
+  drm_kms_helper_poll_enable
+  drm_kms_helper_poll_fini
+  drm_kms_helper_poll_init
+  drm_mm_init
+  drm_mm_insert_node_in_range
+  drmm_mode_config_init
+  drm_mm_remove_node
+  drm_mm_takedown
+  drm_mode_config_cleanup
+  drm_mode_config_reset
+  drm_mode_convert_umode
+  drm_mode_copy
+  drm_mode_create
+  drm_mode_create_dp_colorspace_property
+  drm_mode_debug_printmodeline
+  drm_mode_duplicate
+  drm_mode_equal
+  drm_mode_is_420_only
+  drm_mode_match
+  drm_mode_object_find
+  drm_mode_object_get
+  drm_mode_object_put
+  drm_mode_probed_add
+  drm_modeset_acquire_fini
+  drm_modeset_acquire_init
+  drm_modeset_backoff
+  drm_mode_set_crtcinfo
+  drm_modeset_drop_locks
+  drm_modeset_lock
+  drm_modeset_lock_all
+  drm_modeset_lock_all_ctx
+  drm_modeset_lock_single_interruptible
+  drm_mode_set_name
+  drm_modeset_unlock
+  drm_modeset_unlock_all
+  drm_mode_vrefresh
+  drm_object_attach_property
+  drm_object_property_set_value
+  drm_of_component_match_add
+  drm_open
+  drm_panel_add
+  drm_panel_init
+  drm_panel_remove
+  drm_plane_cleanup
+  drm_plane_create_rotation_property
+  drm_poll
+  drm_prime_gem_destroy
+  drm_prime_pages_to_sg
+  drm_printf
+  __drm_printfn_coredump
+  __drm_printfn_debug
+  drm_property_blob_get
+  drm_property_blob_put
+  drm_property_create
+  drm_property_create_bitmask
+  drm_property_create_blob
+  drm_property_create_enum
+  drm_property_create_range
+  drm_property_lookup_blob
+  __drm_puts_coredump
+  drm_read
+  drm_release
+  drm_rotation_simplify
+  drm_send_event_locked
+  drm_set_preferred_mode
+  drm_universal_plane_init
+  drm_vblank_init
+  drm_wait_one_vblank
+  dst_cache_destroy
+  dst_cache_get
+  dst_cache_init
+  dst_cache_set_ip4
+  dst_cache_set_ip6
+  dst_release
+  dump_stack
+  __dynamic_dev_dbg
+  __dynamic_pr_debug
+  edac_device_add_device
+  edac_device_alloc_ctl_info
+  edac_device_alloc_index
+  edac_device_del_device
+  edac_device_free_ctl_info
+  edac_device_handle_ce_count
+  edac_device_handle_ue_count
+  enable_irq
+  enable_percpu_irq
+  ether_setup
+  eth_header_parse
+  eth_mac_addr
+  ethnl_cable_test_fault_length
+  ethnl_cable_test_result
+  ethtool_convert_legacy_u32_to_link_mode
+  ethtool_convert_link_mode_to_legacy_u32
+  __ethtool_get_link_ksettings
+  ethtool_op_get_link
+  ethtool_op_get_ts_info
+  eth_type_trans
+  eth_validate_addr
+  eventfd_ctx_fdget
+  eventfd_ctx_fileget
+  eventfd_ctx_put
+  eventfd_ctx_remove_wait_queue
+  eventfd_signal
+  event_triggers_call
+  extcon_get_edev_by_phandle
+  extcon_get_edev_name
+  extcon_get_extcon_dev
+  extcon_get_property
+  extcon_get_state
+  extcon_register_notifier
+  extcon_set_property
+  extcon_set_property_capability
+  extcon_set_state
+  extcon_set_state_sync
+  extcon_unregister_notifier
+  fasync_helper
+  __fdget
+  fd_install
+  fget
+  _find_first_bit
+  _find_first_zero_bit
+  find_get_pid
+  _find_last_bit
+  _find_next_and_bit
+  _find_next_bit
+  _find_next_zero_bit
+  find_vma
+  find_vma_intersection
+  find_vpid
+  finish_wait
+  firmware_request_nowarn
+  flow_block_cb_setup_simple
+  flow_rule_match_basic
+  flow_rule_match_ipv4_addrs
+  flow_rule_match_ports
+  flow_rule_match_vlan
+  flush_dcache_page
+  flush_delayed_fput
+  flush_delayed_work
+  flush_work
+  __flush_workqueue
+  __folio_lock
+  __folio_put
+  folio_wait_bit
+  fortify_panic
+  fput
+  fqdir_exit
+  fqdir_init
+  free_candev
+  free_io_pgtable_ops
+  free_irq
+  free_netdev
+  __free_pages
+  free_pages
+  free_percpu
+  free_percpu_irq
+  freq_qos_add_request
+  freq_qos_remove_request
+  freq_qos_update_request
+  freq_reg_info
+  fsync_bdev
+  ftrace_dump
+  fwnode_find_reference
+  fwnode_get_name
+  fwnode_get_next_child_node
+  fwnode_handle_get
+  fwnode_handle_put
+  fwnode_property_present
+  fwnode_property_read_string
+  fwnode_property_read_u32_array
+  fwnode_property_read_u8_array
+  gcd
+  generic_device_group
+  generic_file_llseek
+  generic_handle_domain_irq
+  generic_handle_irq
+  geni_icc_disable
+  geni_icc_enable
+  geni_icc_get
+  geni_icc_set_bw
+  geni_se_clk_freq_match
+  geni_se_config_packing
+  geni_se_get_qup_hw_version
+  geni_se_init
+  geni_se_resources_off
+  geni_se_resources_on
+  geni_se_rx_dma_prep
+  geni_se_rx_dma_unprep
+  geni_se_select_mode
+  geni_se_tx_dma_prep
+  geni_se_tx_dma_unprep
+  genlmsg_multicast_allns
+  genlmsg_put
+  genl_register_family
+  genl_unregister_family
+  __genphy_config_aneg
+  genphy_read_abilities
+  genphy_read_mmd_unsupported
+  genphy_read_status
+  genphy_restart_aneg
+  genphy_resume
+  genphy_soft_reset
+  genphy_suspend
+  genphy_write_mmd_unsupported
+  gen_pool_add_owner
+  gen_pool_alloc_algo_owner
+  gen_pool_avail
+  gen_pool_best_fit
+  gen_pool_create
+  gen_pool_destroy
+  gen_pool_first_fit_order_align
+  gen_pool_free_owner
+  gen_pool_has_addr
+  gen_pool_set_algo
+  gen_pool_size
+  gen_pool_virt_to_phys
+  getboottime64
+  get_completed_synchronize_rcu
+  get_cpu_device
+  get_device
+  __get_free_pages
+  get_governor_parent_kobj
+  get_net_ns_by_fd
+  get_net_ns_by_pid
+  get_option
+  get_pfnblock_flags_mask
+  get_pid_task
+  get_random_bytes
+  get_random_u16
+  get_random_u32
+  get_sg_io_hdr
+  get_state_synchronize_rcu
+  get_state_synchronize_srcu
+  __get_task_comm
+  get_task_mm
+  get_task_pid
+  get_unmapped_area
+  get_unused_fd_flags
+  get_user_ifreq
+  get_user_pages
+  get_wiphy_regdom
+  get_zeroed_page
+  gic_nonsecure_priorities
+  gov_attr_set_init
+  gov_attr_set_put
+  governor_sysfs_ops
+  gpiochip_add_data_with_key
+  gpiochip_add_pin_range
+  gpiochip_disable_irq
+  gpiochip_enable_irq
+  gpiochip_generic_free
+  gpiochip_generic_request
+  gpiochip_get_data
+  gpiochip_irq_relres
+  gpiochip_irq_reqres
+  gpiochip_line_is_valid
+  gpiochip_lock_as_irq
+  gpiochip_populate_parent_fwspec_fourcell
+  gpiochip_remove
+  gpiochip_unlock_as_irq
+  gpiod_direction_input
+  gpiod_direction_output
+  gpiod_direction_output_raw
+  gpiod_get_optional
+  gpiod_get_raw_value
+  gpiod_get_raw_value_cansleep
+  gpiod_get_value
+  gpiod_get_value_cansleep
+  gpiod_set_raw_value
+  gpiod_set_raw_value_cansleep
+  gpiod_set_value
+  gpiod_set_value_cansleep
+  gpiod_to_irq
+  gpio_free
+  gpio_free_array
+  gpio_request
+  gpio_request_one
+  gpio_to_desc
+  gre_add_protocol
+  gre_del_protocol
+  gro_cells_destroy
+  gro_cells_init
+  gro_cells_receive
+  h4_recv_buf
+  handle_bad_irq
+  handle_edge_irq
+  handle_fasteoi_ack_irq
+  handle_fasteoi_irq
+  handle_level_irq
+  handle_nested_irq
+  handle_simple_irq
+  handle_sysrq
+  hashlen_string
+  hci_alloc_dev_priv
+  __hci_cmd_send
+  __hci_cmd_sync
+  hci_cmd_sync
+  hci_cmd_sync_cancel
+  __hci_cmd_sync_ev
+  hci_cmd_sync_queue
+  __hci_cmd_sync_sk
+  __hci_cmd_sync_status
+  __hci_cmd_sync_status_sk
+  hci_conn_check_secure
+  hci_conn_security
+  hci_conn_switch_role
+  hci_free_dev
+  hci_get_route
+  hci_mgmt_chan_register
+  hci_mgmt_chan_unregister
+  hci_recv_diag
+  hci_recv_frame
+  hci_register_cb
+  hci_register_dev
+  hci_release_dev
+  hci_reset_dev
+  hci_resume_dev
+  hci_set_fw_info
+  hci_set_hw_info
+  hci_suspend_dev
+  hci_uart_register_device
+  hci_uart_tx_wakeup
+  hci_uart_unregister_device
+  hci_unregister_cb
+  hci_unregister_dev
+  hex2bin
+  hex_asc_upper
+  hex_dump_to_buffer
+  hex_to_bin
+  hid_add_device
+  hid_allocate_device
+  hid_destroy_device
+  hid_ignore
+  hid_input_report
+  hid_parse_report
+  hidp_hid_driver
+  housekeeping_cpumask
+  housekeeping_overridden
+  housekeeping_test_cpu
+  hrtimer_active
+  hrtimer_cancel
+  hrtimer_forward
+  __hrtimer_get_remaining
+  hrtimer_init
+  hrtimer_init_sleeper
+  hrtimer_start_range_ns
+  hrtimer_try_to_cancel
+  hvc_alloc
+  hvc_kick
+  hvc_poll
+  hvc_remove
+  __hw_addr_init
+  __hw_addr_sync
+  __hw_addr_unsync
+  hwrng_register
+  hwrng_unregister
+  hwspin_lock_free
+  hwspin_lock_request_specific
+  __hwspin_lock_timeout
+  __hwspin_unlock
+  hypervisor_kobj
+  i2c_add_adapter
+  i2c_bus_type
+  i2c_del_adapter
+  i2c_del_driver
+  i2c_get_dma_safe_msg_buf
+  i2c_put_dma_safe_msg_buf
+  i2c_register_driver
+  i2c_transfer
+  i2c_transfer_buffer_flags
+  i3c_device_do_priv_xfers
+  i3c_driver_register_with_owner
+  i3c_driver_unregister
+  i3c_generic_ibi_alloc_pool
+  i3c_generic_ibi_free_pool
+  i3c_generic_ibi_get_free_slot
+  i3c_generic_ibi_recycle_slot
+  i3c_master_add_i3c_dev_locked
+  i3c_master_disec_locked
+  i3c_master_do_daa
+  i3c_master_enec_locked
+  i3c_master_entdaa_locked
+  i3c_master_get_free_addr
+  i3c_master_queue_ibi
+  i3c_master_register
+  i3c_master_set_info
+  i3c_master_unregister
+  icc_get
+  icc_link_create
+  icc_node_add
+  icc_node_create
+  icc_nodes_remove
+  icc_provider_add
+  icc_provider_del
+  icc_put
+  icc_set_bw
+  icc_set_tag
+  icc_sync_state
+  ida_alloc_range
+  ida_free
+  idr_alloc
+  idr_alloc_cyclic
+  idr_alloc_u32
+  idr_destroy
+  idr_find
+  idr_for_each
+  idr_get_next
+  idr_preload
+  idr_remove
+  idr_replace
+  ieee80211_alloc_hw_nm
+  ieee80211_amsdu_to_8023s
+  ieee80211_ap_probereq_get
+  ieee80211_ave_rssi
+  ieee80211_beacon_cntdwn_is_complete
+  ieee80211_beacon_get_template
+  ieee80211_beacon_get_tim
+  ieee80211_beacon_loss
+  ieee80211_beacon_set_cntdwn
+  ieee80211_beacon_update_cntdwn
+  ieee80211_bss_get_elem
+  ieee80211_calc_rx_airtime
+  ieee80211_calc_tx_airtime
+  ieee80211_chandef_to_operating_class
+  ieee80211_channel_switch_disconnect
+  ieee80211_channel_to_freq_khz
+  ieee80211_chswitch_done
+  ieee80211_color_change_finish
+  ieee80211_connection_loss
+  ieee80211_cqm_beacon_loss_notify
+  ieee80211_cqm_rssi_notify
+  ieee80211_csa_finish
+  ieee80211_ctstoself_duration
+  ieee80211_ctstoself_get
+  ieee80211_data_to_8023_exthdr
+  ieee80211_disable_rssi_reports
+  ieee80211_disconnect
+  ieee80211_enable_rssi_reports
+  ieee80211_find_sta
+  ieee80211_find_sta_by_ifaddr
+  ieee80211_find_sta_by_link_addrs
+  ieee80211_free_hw
+  ieee80211_free_txskb
+  ieee80211_freq_khz_to_channel
+  ieee80211_generic_frame_duration
+  ieee80211_get_bssid
+  ieee80211_get_buffered_bc
+  ieee80211_get_channel_khz
+  ieee80211_get_fils_discovery_tmpl
+  ieee80211_get_hdrlen_from_skb
+  ieee80211_get_key_rx_seq
+  ieee80211_get_mesh_hdrlen
+  ieee80211_get_num_supported_channels
+  ieee80211_get_response_rate
+  ieee80211_get_tkip_p1k_iv
+  ieee80211_get_tkip_p2k
+  ieee80211_get_tkip_rx_p1k
+  ieee80211_get_tx_rates
+  ieee80211_get_unsol_bcast_probe_resp_tmpl
+  ieee80211_get_vht_max_nss
+  ieee80211_gtk_rekey_add
+  ieee80211_gtk_rekey_notify
+  ieee80211_hdrlen
+  ieee80211_hw_restart_disconnect
+  ieee80211_ie_split_ric
+  ieee80211_iterate_active_interfaces_atomic
+  ieee80211_iterate_active_interfaces_mtx
+  ieee80211_iterate_interfaces
+  ieee80211_iterate_stations
+  ieee80211_iterate_stations_atomic
+  ieee80211_iter_chan_contexts_atomic
+  ieee80211_iter_keys
+  ieee80211_iter_keys_rcu
+  ieee80211_key_mic_failure
+  ieee80211_key_replay
+  ieee80211_manage_rx_ba_offl
+  ieee80211_mandatory_rates
+  ieee80211_mark_rx_ba_filtered_frames
+  ieee80211_nan_func_match
+  ieee80211_nan_func_terminated
+  ieee80211_next_txq
+  ieee80211_nullfunc_get
+  ieee80211_operating_class_to_band
+  ieee80211_parse_p2p_noa
+  ieee80211_probereq_get
+  ieee80211_proberesp_get
+  ieee80211_pspoll_get
+  ieee80211_queue_delayed_work
+  ieee80211_queue_stopped
+  ieee80211_queue_work
+  ieee80211_radar_detected
+  ieee80211_radiotap_iterator_init
+  ieee80211_radiotap_iterator_next
+  ieee80211_rate_control_register
+  ieee80211_rate_control_unregister
+  ieee80211_ready_on_channel
+  ieee80211_register_hw
+  ieee80211_remain_on_channel_expired
+  ieee80211_remove_key
+  ieee80211_report_low_ack
+  ieee80211_report_wowlan_wakeup
+  ieee80211_request_smps
+  ieee80211_reserve_tid
+  ieee80211_restart_hw
+  ieee80211_resume_disconnect
+  ieee80211_rts_duration
+  ieee80211_rts_get
+  ieee80211_rx_ba_timer_expired
+  ieee80211_rx_irqsafe
+  ieee80211_rx_list
+  ieee80211_rx_napi
+  ieee80211_s1g_channel_width
+  ieee80211_scan_completed
+  ieee80211_sched_scan_results
+  ieee80211_sched_scan_stopped
+  __ieee80211_schedule_txq
+  ieee80211_send_bar
+  ieee80211_send_eosp_nullfunc
+  ieee80211_set_active_links
+  ieee80211_set_active_links_async
+  ieee80211_set_key_rx_seq
+  ieee80211_sta_block_awake
+  ieee80211_sta_eosp
+  ieee80211_sta_pspoll
+  ieee80211_sta_ps_transition
+  ieee80211_sta_recalc_aggregates
+  ieee80211_sta_register_airtime
+  ieee80211_start_tx_ba_cb_irqsafe
+  ieee80211_start_tx_ba_session
+  ieee80211_sta_set_buffered
+  ieee80211_sta_uapsd_trigger
+  ieee80211_stop_queue
+  ieee80211_stop_queues
+  ieee80211_stop_rx_ba_session
+  ieee80211_stop_tx_ba_cb_irqsafe
+  ieee80211_stop_tx_ba_session
+  ieee80211_tdls_oper_request
+  ieee80211_tkip_add_iv
+  ieee80211_tx_dequeue
+  ieee80211_tx_prepare_skb
+  ieee80211_txq_airtime_check
+  ieee80211_txq_get_depth
+  ieee80211_txq_may_transmit
+  ieee80211_txq_schedule_start
+  ieee80211_tx_rate_update
+  ieee80211_tx_status
+  ieee80211_tx_status_8023
+  ieee80211_tx_status_ext
+  ieee80211_tx_status_irqsafe
+  ieee80211_unregister_hw
+  ieee80211_unreserve_tid
+  ieee80211_update_mu_groups
+  ieee80211_update_p2p_noa
+  ieee80211_vif_to_wdev
+  ieee80211_wake_queue
+  ieee80211_wake_queues
+  ieee802154_alloc_hw
+  ieee802154_configure_durations
+  ieee802154_free_hw
+  ieee802154_hdr_peek
+  ieee802154_hdr_peek_addrs
+  ieee802154_hdr_pull
+  ieee802154_hdr_push
+  ieee802154_max_payload
+  ieee802154_register_hw
+  ieee802154_rx_irqsafe
+  ieee802154_stop_queue
+  ieee802154_unregister_hw
+  ieee802154_wake_queue
+  ieee802154_xmit_complete
+  ieee802154_xmit_error
+  ieee802154_xmit_hw_error
+  ieeee80211_obss_color_collision_notify
+  iio_read_channel_processed
+  iio_write_channel_raw
+  in4_pton
+  in6addr_any
+  in6_pton
+  inc_node_page_state
+  inc_zone_page_state
+  in_egroup_p
+  inet6_csk_xmit
+  inet6_ioctl
+  inet_csk_get_port
+  inet_frag_destroy
+  inet_frag_find
+  inet_frag_kill
+  inet_frag_queue_insert
+  inet_frag_reasm_finish
+  inet_frag_reasm_prepare
+  inet_frags_fini
+  inet_frags_init
+  inet_ioctl
+  init_dummy_netdev
+  init_iova_domain
+  init_net
+  init_pid_ns
+  init_pseudo
+  __init_rwsem
+  init_srcu_struct
+  __init_swait_queue_head
+  init_task
+  init_timer_key
+  init_user_ns
+  init_uts_ns
+  init_wait_entry
+  __init_waitqueue_head
+  input_alloc_absinfo
+  input_allocate_device
+  input_close_device
+  input_event
+  input_ff_create
+  input_ff_destroy
+  input_free_device
+  input_mt_init_slots
+  input_mt_report_pointer_emulation
+  input_mt_report_slot_state
+  input_mt_sync_frame
+  input_open_device
+  input_register_device
+  input_register_handle
+  input_register_handler
+  input_set_abs_params
+  input_set_capability
+  input_unregister_device
+  input_unregister_handle
+  input_unregister_handler
+  interval_tree_insert
+  interval_tree_iter_first
+  interval_tree_iter_next
+  interval_tree_remove
+  int_sqrt
+  iomem_resource
+  iommu_alloc_resv_region
+  iommu_attach_device
+  iommu_detach_device
+  iommu_device_register
+  iommu_device_sysfs_add
+  iommu_device_sysfs_remove
+  iommu_device_unregister
+  iommu_dma_get_resv_regions
+  iommu_domain_alloc
+  iommu_domain_free
+  iommu_fwspec_add_ids
+  iommu_fwspec_free
+  iommu_get_domain_for_dev
+  iommu_get_msi_cookie
+  iommu_group_for_each_dev
+  iommu_group_get
+  iommu_group_get_iommudata
+  iommu_group_put
+  iommu_group_ref_get
+  iommu_group_set_iommudata
+  iommu_iova_to_phys
+  iommu_map
+  iommu_map_sg
+  iommu_present
+  iommu_put_resv_regions
+  iommu_set_fault_handler
+  iommu_set_pgtable_quirks
+  iommu_unmap
+  __ioread32_copy
+  ioremap_prot
+  iounmap
+  iov_iter_init
+  iov_iter_kvec
+  iov_iter_revert
+  __iowrite32_copy
+  ip6_dst_hoplimit
+  ip_compute_csum
+  __ip_dev_find
+  ipi_desc_get
+  ip_local_out
+  ip_mc_join_group
+  ip_queue_xmit
+  ip_route_output_flow
+  __ip_select_ident
+  ip_send_check
+  iput
+  __ipv6_addr_type
+  ipv6_dev_find
+  ipv6_ext_hdr
+  ipv6_skip_exthdr
+  ipv6_stub
+  __irq_apply_affinity_hint
+  irq_check_status_bit
+  irq_chip_ack_parent
+  irq_chip_disable_parent
+  irq_chip_enable_parent
+  irq_chip_eoi_parent
+  irq_chip_get_parent_state
+  irq_chip_mask_parent
+  irq_chip_retrigger_hierarchy
+  irq_chip_set_affinity_parent
+  irq_chip_set_parent_state
+  irq_chip_set_type_parent
+  irq_chip_set_vcpu_affinity_parent
+  irq_chip_set_wake_parent
+  irq_chip_unmask_parent
+  irq_create_fwspec_mapping
+  irq_create_mapping_affinity
+  irq_dispose_mapping
+  __irq_domain_add
+  irq_domain_alloc_irqs_parent
+  irq_domain_create_hierarchy
+  irq_domain_disconnect_hierarchy
+  irq_domain_free_irqs_common
+  irq_domain_free_irqs_parent
+  irq_domain_get_irq_data
+  irq_domain_remove
+  irq_domain_set_hwirq_and_chip
+  irq_domain_set_info
+  irq_domain_translate_twocell
+  irq_domain_update_bus_token
+  irq_domain_xlate_onecell
+  irq_domain_xlate_twocell
+  irq_find_matching_fwspec
+  irq_get_irqchip_state
+  irq_get_irq_data
+  irq_modify_status
+  irq_of_parse_and_map
+  __irq_resolve_mapping
+  irq_set_affinity_notifier
+  irq_set_chained_handler_and_data
+  irq_set_chip_and_handler_name
+  irq_set_chip_data
+  irq_set_irqchip_state
+  irq_set_irq_type
+  irq_set_irq_wake
+  irq_set_parent
+  irq_to_desc
+  irq_work_queue
+  irq_work_queue_on
+  irq_work_sync
+  isolate_and_split_free_page
+  isolate_anon_lru_page
+  is_vmalloc_addr
+  iterate_fd
+  jiffies
+  jiffies_to_msecs
+  jiffies_to_usecs
+  kasan_flag_enabled
+  kasprintf
+  kernel_accept
+  kernel_bind
+  kernel_connect
+  kernel_getsockname
+  kernel_kobj
+  kernel_listen
+  kernel_param_lock
+  kernel_param_unlock
+  kernel_power_off
+  kernel_recvmsg
+  kernel_restart
+  kernel_sendmsg
+  kernel_sock_shutdown
+  kernfs_find_and_get_ns
+  kernfs_notify
+  kernfs_put
+  kern_mount
+  kern_unmount
+  key_create_or_update
+  key_put
+  keyring_alloc
+  __kfifo_alloc
+  __kfifo_free
+  __kfifo_from_user_r
+  __kfifo_in
+  __kfifo_in_r
+  __kfifo_len_r
+  __kfifo_out
+  __kfifo_out_r
+  __kfifo_to_user_r
+  kfree
+  kfree_const
+  kfree_sensitive
+  kfree_skb_list_reason
+  kfree_skb_partial
+  kfree_skb_reason
+  kick_all_cpus_sync
+  kill_anon_super
+  kill_fasync
+  kimage_vaddr
+  kimage_voffset
+  __kmalloc
+  kmalloc_caches
+  kmalloc_large
+  kmalloc_node_trace
+  __kmalloc_node_track_caller
+  kmalloc_trace
+  kmem_cache_alloc
+  kmem_cache_create
+  kmem_cache_create_usercopy
+  kmem_cache_destroy
+  kmem_cache_free
+  kmemdup
+  kmemdup_nul
+  kmsg_dump_get_buffer
+  kmsg_dump_get_line
+  kmsg_dump_register
+  kmsg_dump_rewind
+  kmsg_dump_unregister
+  kobject_add
+  kobject_create_and_add
+  kobject_del
+  kobject_get_path
+  kobject_init
+  kobject_init_and_add
+  kobject_put
+  kobject_set_name
+  kobject_uevent
+  kobject_uevent_env
+  kobj_sysfs_ops
+  krealloc
+  kset_create_and_add
+  kset_find_obj
+  kset_unregister
+  ksize
+  ksoftirqd
+  kstat
+  kstat_irqs_cpu
+  kstat_irqs_usr
+  kstrdup
+  kstrdup_const
+  kstrndup
+  kstrtobool
+  kstrtobool_from_user
+  kstrtoint
+  kstrtoint_from_user
+  kstrtoll
+  kstrtos8
+  kstrtos8_from_user
+  kstrtou16
+  kstrtou16_from_user
+  kstrtou8
+  kstrtou8_from_user
+  kstrtouint
+  kstrtouint_from_user
+  kstrtoul_from_user
+  kstrtoull
+  kstrtoull_from_user
+  kthread_bind
+  kthread_bind_mask
+  kthread_cancel_delayed_work_sync
+  kthread_cancel_work_sync
+  kthread_complete_and_exit
+  kthread_create_on_node
+  kthread_create_worker
+  kthread_delayed_work_timer_fn
+  kthread_destroy_worker
+  kthread_flush_work
+  kthread_flush_worker
+  __kthread_init_worker
+  kthread_mod_delayed_work
+  kthread_queue_delayed_work
+  kthread_queue_work
+  kthread_should_stop
+  kthread_stop
+  kthread_worker_fn
+  ktime_get
+  ktime_get_coarse_with_offset
+  ktime_get_mono_fast_ns
+  ktime_get_real_seconds
+  ktime_get_real_ts64
+  ktime_get_seconds
+  ktime_get_ts64
+  ktime_get_with_offset
+  kvasprintf_const
+  kvfree
+  kvfree_call_rcu
+  kvmalloc_node
+  l2cap_add_psm
+  l2cap_chan_close
+  l2cap_chan_connect
+  l2cap_chan_create
+  l2cap_chan_del
+  l2cap_chan_list
+  l2cap_chan_put
+  l2cap_chan_send
+  l2cap_chan_set_defaults
+  l2cap_conn_get
+  l2cap_conn_put
+  l2cap_is_socket
+  l2cap_register_user
+  l2cap_unregister_user
+  l2tp_recv_common
+  l2tp_session_create
+  l2tp_session_dec_refcount
+  l2tp_session_delete
+  l2tp_session_get
+  l2tp_session_get_by_ifname
+  l2tp_session_get_nth
+  l2tp_session_inc_refcount
+  l2tp_session_register
+  l2tp_session_set_header_len
+  l2tp_sk_to_tunnel
+  l2tp_tunnel_create
+  l2tp_tunnel_dec_refcount
+  l2tp_tunnel_delete
+  l2tp_tunnel_get
+  l2tp_tunnel_get_nth
+  l2tp_tunnel_get_session
+  l2tp_tunnel_inc_refcount
+  l2tp_tunnel_register
+  l2tp_udp_encap_recv
+  l2tp_xmit_skb
+  led_classdev_flash_register_ext
+  led_classdev_flash_unregister
+  led_classdev_unregister
+  led_trigger_event
+  led_trigger_register
+  led_trigger_register_simple
+  led_trigger_unregister
+  led_trigger_unregister_simple
+  linkwatch_fire_event
+  __list_add_valid
+  __list_del_entry_valid
+  list_sort
+  llist_add_batch
+  llist_reverse_order
+  __local_bh_enable_ip
+  lock_sock_nested
+  log_post_read_mmio
+  log_post_write_mmio
+  log_read_mmio
+  log_write_mmio
+  lowpan_header_compress
+  lowpan_header_decompress
+  lowpan_nhc_add
+  lowpan_nhc_del
+  lowpan_register_netdev
+  lowpan_register_netdevice
+  lowpan_unregister_netdev
+  lowpan_unregister_netdevice
+  mac_pton
+  mas_find
+  match_string
+  mbox_chan_received_data
+  mbox_chan_txdone
+  mbox_client_txdone
+  mbox_controller_register
+  mbox_controller_unregister
+  mbox_free_channel
+  mbox_request_channel
+  mbox_send_message
+  mdiobus_alloc_size
+  mdiobus_free
+  mdiobus_get_phy
+  mdiobus_read
+  mdiobus_unregister
+  mdiobus_write
+  mdio_device_create
+  mdio_device_free
+  media_device_cleanup
+  media_device_init
+  __media_device_register
+  media_device_unregister
+  media_entity_pads_init
+  memblock_end_of_DRAM
+  memblock_free
+  __memcat_p
+  memchr
+  memchr_inv
+  memcmp
+  memcpy
+  __memcpy_fromio
+  __memcpy_toio
+  mem_dump_obj
+  memdup_user
+  memmove
+  memory_block_size_bytes
+  memory_read_from_buffer
+  memparse
+  mempool_alloc
+  mempool_alloc_slab
+  mempool_create
+  mempool_destroy
+  mempool_free
+  mempool_free_slab
+  memremap
+  memscan
+  mem_section
+  memset
+  memset64
+  __memset_io
+  memstart_addr
+  memunmap
+  migrate_pages
+  migrate_swap
+  __migrate_task
+  mipi_dsi_create_packet
+  mipi_dsi_dcs_set_display_brightness
+  mipi_dsi_dcs_set_tear_off
+  mipi_dsi_host_register
+  mipi_dsi_host_unregister
+  misc_deregister
+  misc_register
+  __mmap_lock_do_trace_acquire_returned
+  __mmap_lock_do_trace_released
+  __mmap_lock_do_trace_start_locking
+  mmc_cqe_request_done
+  mmc_of_parse
+  mmc_regulator_get_supply
+  mmc_regulator_set_ocr
+  mmc_regulator_set_vqmmc
+  mmc_send_tuning
+  mmput
+  mod_delayed_work_on
+  mod_node_page_state
+  mod_timer
+  __module_get
+  module_layout
+  module_put
+  __module_put_and_kthread_exit
+  __msecs_to_jiffies
+  msi_get_virq
+  msleep
+  msleep_interruptible
+  __mutex_init
+  mutex_is_locked
+  mutex_lock
+  mutex_lock_interruptible
+  mutex_trylock
+  mutex_unlock
+  napi_complete_done
+  napi_disable
+  napi_enable
+  napi_gro_flush
+  napi_gro_receive
+  __napi_schedule
+  napi_schedule_prep
+  __ndisc_fill_addr_option
+  nd_tbl
+  neigh_destroy
+  neigh_lookup
+  neigh_resolve_output
+  netdev_alert
+  __netdev_alloc_skb
+  netdev_core_stats_alloc
+  netdev_err
+  netdev_info
+  netdev_name_in_use
+  netdev_printk
+  netdev_rss_key_fill
+  netdev_rx_handler_register
+  netdev_rx_handler_unregister
+  netdev_set_default_ethtool_ops
+  netdev_update_features
+  netdev_upper_dev_link
+  netdev_upper_dev_unlink
+  netdev_warn
+  netif_carrier_off
+  netif_carrier_on
+  netif_device_attach
+  netif_device_detach
+  netif_inherit_tso_max
+  netif_napi_add_weight
+  __netif_napi_del
+  netif_receive_skb
+  netif_receive_skb_list
+  netif_rx
+  netif_stacked_transfer_operstate
+  netif_tx_lock
+  netif_tx_stop_all_queues
+  netif_tx_unlock
+  netif_tx_wake_queue
+  netlink_broadcast
+  netlink_capable
+  __netlink_dump_start
+  __netlink_kernel_create
+  netlink_kernel_release
+  netlink_net_capable
+  netlink_register_notifier
+  netlink_unicast
+  netlink_unregister_notifier
+  net_namespace_list
+  net_ns_type_operations
+  net_ratelimit
+  nfc_add_se
+  nfc_allocate_device
+  nfc_alloc_recv_skb
+  __nfc_alloc_vendor_cmd_reply_skb
+  nfc_class
+  nfc_dep_link_is_up
+  nfc_driver_failure
+  nfc_find_se
+  nfc_fw_download_done
+  nfc_get_local_general_bytes
+  nf_conntrack_destroy
+  nfc_proto_register
+  nfc_proto_unregister
+  nfc_register_device
+  nfc_remove_se
+  nfc_se_connectivity
+  nfc_send_to_raw_sock
+  nfc_se_transaction
+  nfc_set_remote_general_bytes
+  nfc_target_lost
+  nfc_targets_found
+  nfc_tm_activated
+  nfc_tm_data_received
+  nfc_tm_deactivated
+  nfc_unregister_device
+  nfc_vendor_cmd_reply
+  nla_find
+  nla_memcpy
+  __nla_parse
+  nla_put
+  nla_put_64bit
+  nla_reserve
+  nla_strscpy
+  __nla_validate
+  __nlmsg_put
+  nonseekable_open
+  noop_llseek
+  nr_cpu_ids
+  nr_ipi_get
+  nr_irqs
+  ns_capable
+  nsecs_to_jiffies
+  ns_to_kernel_old_timeval
+  ns_to_timespec64
+  n_tty_ioctl_helper
+  __num_online_cpus
+  nvmem_cell_get
+  nvmem_cell_put
+  nvmem_cell_read
+  nvmem_cell_read_u32
+  nvmem_cell_write
+  nvmem_device_read
+  nvmem_device_write
+  of_address_to_resource
+  of_alias_get_id
+  of_can_transceiver
+  of_clk_add_hw_provider
+  of_clk_add_provider
+  of_clk_del_provider
+  of_clk_get_from_provider
+  of_clk_hw_simple_get
+  of_clk_src_onecell_get
+  of_clk_src_simple_get
+  of_count_phandle_with_args
+  of_cpu_node_to_id
+  of_devfreq_cooling_register
+  of_device_get_match_data
+  of_device_is_available
+  of_device_is_compatible
+  of_device_modalias
+  of_device_uevent_modalias
+  of_dma_configure_id
+  of_dma_controller_free
+  of_dma_controller_register
+  of_dma_is_coherent
+  of_drm_find_bridge
+  of_drm_find_panel
+  of_find_compatible_node
+  of_find_device_by_node
+  of_find_i2c_device_by_node
+  of_find_matching_node_and_match
+  of_find_node_by_name
+  of_find_node_by_phandle
+  of_find_node_opts_by_path
+  of_find_node_with_property
+  of_find_property
+  offline_and_remove_memory
+  of_fwnode_ops
+  of_genpd_add_provider_onecell
+  of_genpd_add_provider_simple
+  of_genpd_del_provider
+  __of_get_address
+  of_get_child_by_name
+  of_get_cpu_node
+  of_get_named_gpio_flags
+  of_get_next_available_child
+  of_get_next_child
+  of_get_next_parent
+  of_get_parent
+  of_get_property
+  of_get_regulator_init_data
+  of_get_required_opp_performance_state
+  of_graph_get_next_endpoint
+  of_graph_get_port_parent
+  of_graph_get_remote_endpoint
+  of_graph_get_remote_node
+  of_graph_get_remote_port_parent
+  of_graph_is_present
+  of_graph_parse_endpoint
+  of_hwspin_lock_get_id
+  of_icc_get
+  of_icc_xlate_onecell
+  of_iomap
+  of_irq_find_parent
+  of_irq_get
+  of_irq_get_byname
+  of_irq_parse_one
+  of_machine_is_compatible
+  of_match_device
+  of_match_node
+  of_mdiobus_register
+  of_modalias_node
+  of_n_addr_cells
+  of_node_name_eq
+  of_n_size_cells
+  __of_parse_phandle_with_args
+  of_phandle_iterator_init
+  of_phandle_iterator_next
+  of_phy_is_fixed_link
+  of_phy_simple_xlate
+  of_platform_depopulate
+  of_platform_device_create
+  of_platform_device_destroy
+  of_platform_populate
+  of_property_count_elems_of_size
+  of_property_match_string
+  of_property_read_string
+  of_property_read_string_helper
+  of_property_read_u32_index
+  of_property_read_u64
+  of_property_read_u64_index
+  of_property_read_variable_u16_array
+  of_property_read_variable_u32_array
+  of_property_read_variable_u8_array
+  of_prop_next_string
+  of_prop_next_u32
+  of_reserved_mem_device_init_by_idx
+  of_reserved_mem_device_release
+  of_reserved_mem_lookup
+  of_root
+  of_thermal_get_ntrips
+  of_thermal_get_trip_points
+  of_thermal_is_trip_valid
+  of_translate_address
+  oops_in_progress
+  open_candev
+  out_of_line_wait_on_bit
+  out_of_line_wait_on_bit_timeout
+  overflowuid
+  page_endio
+  page_mapping
+  page_pool_alloc_pages
+  page_pool_create
+  page_pool_destroy
+  page_pool_release_page
+  panic
+  panic_notifier_list
+  panic_timeout
+  param_array_ops
+  param_get_int
+  param_get_string
+  param_get_ullong
+  param_ops_bool
+  param_ops_charp
+  param_ops_int
+  param_ops_string
+  param_ops_uint
+  param_ops_ullong
+  param_ops_ulong
+  param_set_bool
+  param_set_copystring
+  param_set_int
+  pci_aer_clear_nonfatal_status
+  pci_alloc_irq_vectors_affinity
+  pci_assign_resource
+  pci_bus_type
+  pci_clear_master
+  pci_dev_get
+  pci_device_group
+  pci_device_is_present
+  pci_dev_present
+  pci_dev_put
+  pci_disable_device
+  pci_disable_msi
+  pcie_capability_clear_and_set_word
+  pcie_capability_read_word
+  pci_enable_device
+  pci_enable_pcie_error_reporting
+  pci_find_ext_capability
+  pci_free_irq_vectors
+  pci_get_device
+  pci_get_domain_bus_and_slot
+  pci_host_probe
+  pci_iomap
+  pci_iounmap
+  pci_irq_vector
+  pci_load_and_free_saved_state
+  pci_load_saved_state
+  pci_msi_create_irq_domain
+  pci_msi_mask_irq
+  pci_msi_unmask_irq
+  pci_read_config_dword
+  pci_read_config_word
+  __pci_register_driver
+  pci_release_region
+  pci_request_region
+  pci_restore_state
+  pci_save_state
+  pci_set_master
+  pci_set_power_state
+  pci_store_saved_state
+  pci_unregister_driver
+  pci_walk_bus
+  pci_write_config_dword
+  pci_write_config_word
+  __percpu_down_read
+  percpu_down_write
+  percpu_free_rwsem
+  __percpu_init_rwsem
+  __per_cpu_offset
+  per_cpu_ptr_to_phys
+  percpu_up_write
+  perf_aux_output_begin
+  perf_aux_output_end
+  perf_aux_output_flag
+  perf_event_addr_filters_sync
+  perf_event_create_kernel_counter
+  perf_event_disable
+  perf_event_enable
+  perf_event_read_local
+  perf_event_read_value
+  perf_event_release_kernel
+  perf_get_aux
+  perf_pmu_register
+  perf_pmu_unregister
+  perf_trace_buf_alloc
+  perf_trace_run_bpf_submit
+  phy_attached_info
+  phy_calibrate
+  phy_drivers_register
+  phy_drivers_unregister
+  phy_error
+  phy_ethtool_get_wol
+  phy_ethtool_set_wol
+  phy_exit
+  phy_init
+  phy_init_eee
+  phy_init_hw
+  phylink_connect_phy
+  phylink_create
+  phylink_destroy
+  phylink_disconnect_phy
+  phylink_ethtool_get_eee
+  phylink_ethtool_get_pauseparam
+  phylink_ethtool_get_wol
+  phylink_ethtool_ksettings_get
+  phylink_ethtool_ksettings_set
+  phylink_ethtool_nway_reset
+  phylink_ethtool_set_eee
+  phylink_ethtool_set_pauseparam
+  phylink_ethtool_set_wol
+  phylink_get_eee_err
+  phylink_mac_change
+  phylink_mii_ioctl
+  phylink_of_phy_connect
+  phylink_resume
+  phylink_set_port_modes
+  phylink_speed_down
+  phylink_speed_up
+  phylink_start
+  phylink_stop
+  phylink_suspend
+  phy_mac_interrupt
+  phy_modify
+  phy_modify_mmd
+  phy_power_off
+  phy_power_on
+  phy_read_mmd
+  phy_set_mode_ext
+  phy_trigger_machine
+  phy_write_mmd
+  pick_highest_pushable_task
+  pick_migrate_task
+  pinconf_generic_dt_node_to_map
+  pinctrl_dev_get_drvdata
+  pinctrl_force_default
+  pinctrl_force_sleep
+  pinctrl_lookup_state
+  pinctrl_pm_select_default_state
+  pinctrl_pm_select_sleep_state
+  pinctrl_select_state
+  pinctrl_utils_free_map
+  platform_bus_type
+  platform_device_add
+  platform_device_add_data
+  platform_device_alloc
+  platform_device_del
+  platform_device_put
+  platform_device_register
+  platform_device_register_full
+  platform_device_unregister
+  __platform_driver_register
+  platform_driver_unregister
+  platform_get_irq
+  platform_get_irq_byname
+  platform_get_irq_byname_optional
+  platform_get_irq_optional
+  platform_get_resource
+  platform_get_resource_byname
+  platform_irqchip_probe
+  platform_irq_count
+  platform_msi_domain_alloc_irqs
+  platform_msi_domain_free_irqs
+  pm_clk_add
+  pm_clk_create
+  pm_clk_destroy
+  pm_clk_resume
+  pm_clk_suspend
+  pm_generic_resume
+  pm_generic_suspend
+  pm_genpd_add_subdomain
+  pm_genpd_init
+  pm_genpd_remove
+  pm_genpd_remove_subdomain
+  pm_power_off
+  __pm_relax
+  pm_relax
+  pm_runtime_allow
+  pm_runtime_autosuspend_expiration
+  pm_runtime_barrier
+  __pm_runtime_disable
+  pm_runtime_enable
+  pm_runtime_forbid
+  pm_runtime_force_resume
+  pm_runtime_force_suspend
+  __pm_runtime_idle
+  pm_runtime_irq_safe
+  pm_runtime_no_callbacks
+  __pm_runtime_resume
+  pm_runtime_set_autosuspend_delay
+  __pm_runtime_set_status
+  __pm_runtime_suspend
+  __pm_runtime_use_autosuspend
+  __pm_stay_awake
+  pm_stay_awake
+  pm_system_wakeup
+  pm_wakeup_dev_event
+  pm_wakeup_ws_event
+  policy_has_boost_freq
+  poll_state_synchronize_rcu
+  poll_state_synchronize_srcu
+  power_supply_changed
+  power_supply_get_by_name
+  power_supply_get_drvdata
+  power_supply_get_property
+  power_supply_put
+  power_supply_reg_notifier
+  power_supply_set_property
+  power_supply_unreg_notifier
+  ppp_channel_index
+  ppp_dev_name
+  ppp_input
+  ppp_input_error
+  ppp_output_wakeup
+  pppox_compat_ioctl
+  pppox_ioctl
+  pppox_unbind_sock
+  ppp_register_channel
+  ppp_register_compressor
+  ppp_register_net_channel
+  ppp_unit_number
+  ppp_unregister_channel
+  ppp_unregister_compressor
+  preempt_schedule
+  preempt_schedule_notrace
+  prepare_to_wait_event
+  print_hex_dump
+  _printk
+  _printk_deferred
+  proc_create
+  proc_create_data
+  proc_create_net_data
+  proc_create_net_single
+  proc_create_seq_private
+  proc_create_single_data
+  proc_dointvec
+  proc_dointvec_jiffies
+  proc_dointvec_minmax
+  proc_dostring
+  proc_douintvec_minmax
+  proc_doulongvec_minmax
+  _proc_mkdir
+  proc_mkdir
+  proc_remove
+  proc_set_user
+  proto_register
+  proto_unregister
+  __pskb_copy_fclone
+  pskb_expand_head
+  __pskb_pull_tail
+  pskb_put
+  ___pskb_trim
+  ptp_clock_event
+  ptp_clock_index
+  ptp_clock_register
+  ptp_clock_unregister
+  putback_movable_pages
+  put_cmsg
+  put_device
+  put_disk
+  put_iova_domain
+  __put_net
+  put_pid
+  put_sg_io_hdr
+  __put_task_struct
+  put_unused_fd
+  put_user_ifreq
+  pwm_apply_state
+  pwmchip_add
+  pwmchip_remove
+  qca_read_soc_version
+  qca_send_pre_shutdown_cmd
+  qca_set_bdaddr
+  qca_set_bdaddr_rome
+  qca_uart_setup
+  qcom_icc_xlate_extended
+  qcom_smem_state_get
+  qcom_smem_state_register
+  qcom_smem_state_unregister
+  qcom_smem_state_update_bits
+  queue_delayed_work_on
+  queue_work_on
+  radix_tree_delete
+  radix_tree_insert
+  radix_tree_iter_delete
+  radix_tree_iter_resume
+  radix_tree_lookup
+  radix_tree_next_chunk
+  radix_tree_tagged
+  rate_control_set_rates
+  ___ratelimit
+  rational_best_approximation
+  raw_notifier_call_chain
+  raw_notifier_chain_register
+  raw_notifier_chain_unregister
+  _raw_read_lock
+  _raw_read_lock_bh
+  _raw_read_lock_irq
+  _raw_read_lock_irqsave
+  _raw_read_unlock
+  _raw_read_unlock_bh
+  _raw_read_unlock_irq
+  _raw_read_unlock_irqrestore
+  _raw_spin_lock
+  _raw_spin_lock_bh
+  _raw_spin_lock_irq
+  _raw_spin_lock_irqsave
+  raw_spin_rq_lock_nested
+  raw_spin_rq_unlock
+  _raw_spin_trylock
+  _raw_spin_trylock_bh
+  _raw_spin_unlock
+  _raw_spin_unlock_bh
+  _raw_spin_unlock_irq
+  _raw_spin_unlock_irqrestore
+  _raw_write_lock
+  _raw_write_lock_bh
+  _raw_write_lock_irq
+  _raw_write_lock_irqsave
+  _raw_write_unlock
+  _raw_write_unlock_bh
+  _raw_write_unlock_irq
+  _raw_write_unlock_irqrestore
+  rb_erase
+  __rb_erase_color
+  rb_first
+  rb_first_postorder
+  __rb_insert_augmented
+  rb_insert_color
+  rb_last
+  rb_next
+  rb_next_postorder
+  rb_prev
+  rcu_barrier
+  rcu_barrier_tasks
+  rcu_barrier_tasks_trace
+  rcu_bind_current_to_nocb
+  rcu_check_boost_fail
+  rcu_cpu_stall_suppress
+  rcu_cpu_stall_suppress_at_boot
+  rcu_expedite_gp
+  rcu_force_quiescent_state
+  rcu_fwd_progress_check
+  rcu_get_gp_kthreads_prio
+  rcu_get_gp_seq
+  rcu_gp_is_expedited
+  rcu_gp_is_normal
+  rcu_gp_set_torture_wait
+  rcu_inkernel_boot_has_ended
+  rcu_is_watching
+  rcu_jiffies_till_stall_check
+  rcu_nocb_cpu_deoffload
+  rcu_nocb_cpu_offload
+  __rcu_read_lock
+  __rcu_read_unlock
+  rcu_read_unlock_trace_special
+  rcu_tasks_trace_qs_blkd
+  rcutorture_get_gp_data
+  rcu_trc_cmpxchg_need_qs
+  rcu_unexpedite_gp
+  rcuwait_wake_up
+  rdev_get_drvdata
+  reboot_mode
+  rebuild_sched_domains
+  refcount_dec_and_lock
+  refcount_dec_and_mutex_lock
+  refcount_dec_if_one
+  refcount_dec_not_one
+  refcount_warn_saturate
+  regcache_cache_only
+  regcache_mark_dirty
+  regcache_sync
+  regcache_sync_region
+  reg_initiator_name
+  __register_blkdev
+  register_candev
+  __register_chrdev
+  register_chrdev_region
+  register_console
+  register_die_notifier
+  register_ftrace_export
+  register_inet6addr_notifier
+  register_inetaddr_notifier
+  register_kprobe
+  register_kretprobe
+  register_memory_notifier
+  register_module_notifier
+  register_netdev
+  register_netdevice
+  register_netdevice_notifier
+  register_netevent_notifier
+  register_net_sysctl
+  register_oom_notifier
+  register_pernet_device
+  register_pernet_subsys
+  register_pm_notifier
+  register_pppox_proto
+  register_qdisc
+  register_reboot_notifier
+  register_restart_handler
+  __register_rpmsg_driver
+  register_shrinker
+  register_syscore_ops
+  register_sysctl
+  register_sysctl_table
+  regmap_bulk_read
+  regmap_bulk_write
+  regmap_check_range_table
+  regmap_field_read
+  regmap_field_update_bits_base
+  __regmap_init
+  regmap_irq_get_virq
+  regmap_mmio_detach_clk
+  regmap_read
+  regmap_update_bits_base
+  regmap_write
+  reg_query_regdb_wmm
+  regulator_allow_bypass
+  regulator_bulk_disable
+  regulator_bulk_enable
+  regulator_count_voltages
+  regulator_disable
+  regulator_disable_regmap
+  regulator_enable
+  regulator_enable_regmap
+  regulator_force_disable
+  regulator_get
+  regulator_get_mode
+  regulator_get_optional
+  regulator_get_voltage
+  regulator_get_voltage_rdev
+  regulator_get_voltage_sel_regmap
+  regulator_is_enabled
+  regulator_is_enabled_regmap
+  regulator_is_supported_voltage
+  regulator_list_voltage_linear
+  regulator_map_voltage_linear
+  regulator_notifier_call_chain
+  regulator_put
+  regulator_register_notifier
+  regulator_set_active_discharge_regmap
+  regulator_set_current_limit
+  regulator_set_load
+  regulator_set_mode
+  regulator_set_pull_down_regmap
+  regulator_set_voltage
+  regulator_set_voltage_sel_regmap
+  regulator_unregister_notifier
+  regulatory_hint
+  regulatory_pre_cac_allowed
+  regulatory_set_wiphy_regd
+  regulatory_set_wiphy_regd_sync
+  release_firmware
+  __release_region
+  release_sock
+  remap_pfn_range
+  remove_cpu
+  remove_memory
+  remove_proc_entry
+  remove_wait_queue
+  report_iommu_fault
+  request_any_context_irq
+  request_firmware
+  request_firmware_direct
+  request_firmware_into_buf
+  request_firmware_nowait
+  __request_module
+  __request_percpu_irq
+  __request_region
+  request_threaded_irq
+  resched_curr
+  reset_control_acquire
+  reset_control_assert
+  reset_control_deassert
+  reset_control_release
+  reset_control_reset
+  return_address
+  rfc1042_header
+  rfkill_alloc
+  rfkill_blocked
+  rfkill_destroy
+  rfkill_find_type
+  rfkill_get_led_trigger_name
+  rfkill_init_sw_state
+  rfkill_pause_polling
+  rfkill_register
+  rfkill_resume_polling
+  rfkill_set_hw_state_reason
+  rfkill_set_led_trigger_name
+  rfkill_set_states
+  rfkill_set_sw_state
+  rfkill_soft_blocked
+  rfkill_unregister
+  rhashtable_destroy
+  rhashtable_free_and_destroy
+  rhashtable_init
+  rhashtable_insert_slow
+  rhashtable_walk_enter
+  rhashtable_walk_exit
+  rhashtable_walk_next
+  rhashtable_walk_start_check
+  rhashtable_walk_stop
+  rhltable_init
+  __rht_bucket_nested
+  rht_bucket_nested
+  rht_bucket_nested_insert
+  root_task_group
+  round_jiffies
+  round_jiffies_relative
+  round_jiffies_up
+  rpmsg_poll
+  rpmsg_register_device
+  rpmsg_register_device_override
+  rpmsg_send
+  rpmsg_trysend
+  rpmsg_unregister_device
+  rproc_add
+  rproc_add_subdev
+  rproc_alloc
+  rproc_boot
+  rproc_coredump
+  rproc_coredump_add_custom_segment
+  rproc_coredump_add_segment
+  rproc_coredump_cleanup
+  rproc_coredump_set_elf_info
+  rproc_coredump_using_sections
+  rproc_del
+  rproc_free
+  rproc_get_by_child
+  rproc_get_by_phandle
+  rproc_put
+  rproc_remove_subdev
+  rproc_report_crash
+  rproc_shutdown
+  rtc_time64_to_tm
+  rtc_tm_to_time64
+  rtc_update_irq
+  rt_mutex_lock
+  rt_mutex_unlock
+  rtnl_is_locked
+  rtnl_link_register
+  rtnl_link_unregister
+  rtnl_lock
+  rtnl_register_module
+  rtnl_trylock
+  rtnl_unicast
+  rtnl_unlock
+  rtnl_unregister
+  rtnl_unregister_all
+  runqueues
+  safe_candev_priv
+  scatterwalk_ffwd
+  scatterwalk_map_and_copy
+  sched_clock
+  sched_feat_keys
+  sched_feat_names
+  sched_setattr
+  sched_set_fifo
+  sched_set_fifo_low
+  sched_set_normal
+  sched_setscheduler
+  sched_setscheduler_nocheck
+  sched_show_task
+  sched_uclamp_used
+  schedule
+  schedule_hrtimeout
+  schedule_timeout
+  schedule_timeout_interruptible
+  schedule_timeout_uninterruptible
+  scmi_driver_register
+  scmi_driver_unregister
+  scmi_protocol_register
+  scmi_protocol_unregister
+  scnprintf
+  scsi_alloc_request
+  scsi_autopm_get_device
+  scsi_autopm_put_device
+  scsi_block_when_processing_errors
+  scsi_cmd_allowed
+  scsi_command_size_tbl
+  scsi_device_get
+  scsi_device_put
+  scsi_ioctl
+  scsi_ioctl_block_when_processing_errors
+  __scsi_iterate_devices
+  scsi_normalize_sense
+  __scsi_print_sense
+  scsi_register_interface
+  sdev_prefix_printk
+  __sdhci_add_host
+  sdhci_add_host
+  sdhci_cleanup_host
+  sdhci_cqe_disable
+  sdhci_cqe_enable
+  sdhci_cqe_irq
+  sdhci_enable_clk
+  sdhci_get_property
+  sdhci_pltfm_free
+  sdhci_pltfm_init
+  sdhci_remove_host
+  sdhci_reset
+  sdhci_set_bus_width
+  sdhci_set_power_noreg
+  __sdhci_set_timeout
+  sdhci_setup_host
+  sdio_claim_host
+  sdio_claim_irq
+  sdio_disable_func
+  sdio_enable_func
+  sdio_readb
+  sdio_readsb
+  sdio_register_driver
+  sdio_release_host
+  sdio_release_irq
+  sdio_unregister_driver
+  sdio_writeb
+  sdio_writesb
+  security_sk_classify_flow
+  security_sk_clone
+  security_sock_graft
+  select_fallback_rq
+  seq_buf_printf
+  seq_hex_dump
+  seq_hlist_next
+  seq_hlist_start_head
+  seq_lseek
+  seq_open
+  seq_printf
+  seq_putc
+  seq_puts
+  seq_read
+  seq_release
+  seq_vprintf
+  seq_write
+  serdev_device_close
+  __serdev_device_driver_register
+  serdev_device_get_tiocm
+  serdev_device_open
+  serdev_device_set_baudrate
+  serdev_device_set_flow_control
+  serdev_device_set_tiocm
+  serdev_device_wait_until_sent
+  serdev_device_write_buf
+  serdev_device_write_flush
+  set_capacity
+  set_capacity_and_notify
+  set_cpus_allowed_ptr
+  set_direct_map_range_uncached
+  set_next_entity
+  set_normalized_timespec64
+  set_page_dirty_lock
+  __SetPageMovable
+  set_task_cpu
+  setup_udp_tunnel_sock
+  set_user_nice
+  sg_alloc_table
+  sg_alloc_table_from_pages_segment
+  sg_copy_from_buffer
+  sg_copy_to_buffer
+  sg_free_table
+  sg_init_one
+  sg_init_table
+  sg_miter_next
+  sg_miter_skip
+  sg_miter_start
+  sg_miter_stop
+  sg_nents
+  sg_next
+  __sg_page_iter_dma_next
+  __sg_page_iter_next
+  __sg_page_iter_start
+  shmem_read_mapping_page_gfp
+  show_rcu_gp_kthreads
+  show_rcu_tasks_classic_gp_kthread
+  show_rcu_tasks_trace_gp_kthread
+  show_regs
+  si_mem_available
+  si_meminfo
+  simple_attr_open
+  simple_attr_read
+  simple_attr_release
+  simple_attr_write
+  simple_open
+  simple_read_from_buffer
+  simple_write_to_buffer
+  single_open
+  single_open_size
+  single_release
+  sk_alloc
+  skb_add_rx_frag
+  skb_append_pagefrags
+  skb_checksum
+  skb_checksum_help
+  skb_clone
+  skb_clone_sk
+  skb_coalesce_rx_frag
+  skb_complete_wifi_ack
+  skb_copy
+  skb_copy_bits
+  skb_copy_datagram_from_iter
+  skb_copy_datagram_iter
+  skb_copy_expand
+  skb_cow_data
+  skb_dequeue
+  skb_dequeue_tail
+  skb_ensure_writable
+  skb_free_datagram
+  __skb_get_hash
+  __skb_gso_segment
+  __skb_pad
+  skb_pull
+  skb_pull_data
+  skb_pull_rcsum
+  skb_push
+  skb_put
+  skb_queue_head
+  skb_queue_purge
+  skb_queue_tail
+  skb_realloc_headroom
+  skb_recv_datagram
+  skb_scrub_packet
+  skb_set_owner_w
+  skb_store_bits
+  skb_to_sgvec
+  skb_trim
+  skb_try_coalesce
+  skb_tstamp_tx
+  skb_unlink
+  sk_common_release
+  sk_error_report
+  sk_filter_trim_cap
+  sk_free
+  skip_spaces
+  __sk_receive_skb
+  sk_reset_timer
+  sk_setup_caps
+  sk_stop_timer
+  slhc_compress
+  slhc_free
+  slhc_init
+  slhc_remember
+  slhc_toss
+  slhc_uncompress
+  smp_call_function
+  smp_call_function_single
+  smp_call_function_single_async
+  snd_info_create_card_entry
+  snd_info_create_module_entry
+  snd_info_free_entry
+  snd_info_register
+  snd_interval_refine
+  snd_jack_set_key
+  snd_pcm_format_width
+  _snd_pcm_hw_params_any
+  snd_soc_add_component_controls
+  snd_soc_card_get_kcontrol
+  snd_soc_card_jack_new
+  snd_soc_component_exit_regmap
+  snd_soc_component_init_regmap
+  snd_soc_component_read
+  snd_soc_component_update_bits
+  snd_soc_component_write
+  snd_soc_dai_get_channel_map
+  snd_soc_dai_set_channel_map
+  snd_soc_dapm_add_routes
+  snd_soc_dapm_disable_pin
+  snd_soc_dapm_force_enable_pin
+  snd_soc_dapm_get_enum_double
+  snd_soc_dapm_get_volsw
+  snd_soc_dapm_ignore_suspend
+  snd_soc_dapm_kcontrol_widget
+  snd_soc_dapm_mixer_update_power
+  snd_soc_dapm_mux_update_power
+  snd_soc_dapm_new_controls
+  snd_soc_dapm_new_widgets
+  snd_soc_dapm_put_enum_double
+  snd_soc_dapm_put_volsw
+  snd_soc_dapm_sync
+  snd_soc_get_enum_double
+  snd_soc_get_pcm_runtime
+  snd_soc_get_volsw
+  snd_soc_info_enum_double
+  snd_soc_info_multi_ext
+  snd_soc_info_volsw
+  snd_soc_jack_report
+  snd_soc_lookup_component
+  snd_soc_of_parse_audio_routing
+  snd_soc_of_parse_card_name
+  snd_soc_pm_ops
+  snd_soc_put_enum_double
+  snd_soc_put_volsw
+  snd_soc_register_component
+  snd_soc_rtdcom_lookup
+  snd_soc_set_runtime_hwparams
+  snd_soc_unregister_card
+  snd_soc_unregister_component
+  snprintf
+  soc_device_register
+  soc_device_unregister
+  sock_alloc_send_pskb
+  sock_cmsg_send
+  sock_common_getsockopt
+  sock_common_recvmsg
+  sock_common_setsockopt
+  __sock_create
+  sock_create_kern
+  sock_diag_register
+  sock_diag_save_cookie
+  sock_diag_unregister
+  sock_efree
+  sockfd_lookup
+  sock_gettstamp
+  sock_i_ino
+  sock_init_data
+  sock_i_uid
+  sock_no_accept
+  sock_no_bind
+  sock_no_connect
+  sock_no_getname
+  sock_no_ioctl
+  sock_no_listen
+  sock_no_mmap
+  sock_no_recvmsg
+  sock_no_sendmsg
+  sock_no_sendpage
+  sock_no_shutdown
+  sock_no_socketpair
+  __sock_queue_rcv_skb
+  sock_queue_rcv_skb_reason
+  __sock_recv_cmsgs
+  sock_recv_errqueue
+  sock_recvmsg
+  __sock_recv_timestamp
+  __sock_recv_wifi_status
+  sock_register
+  sock_release
+  sock_rfree
+  sock_setsockopt
+  __sock_tx_timestamp
+  sock_unregister
+  sock_wmalloc
+  softnet_data
+  sort
+  __spi_alloc_controller
+  spi_register_controller
+  __spi_register_driver
+  spi_setup
+  spi_sync
+  spi_unregister_controller
+  split_page
+  spmi_controller_add
+  spmi_controller_alloc
+  spmi_controller_remove
+  spmi_device_from_of
+  __spmi_driver_register
+  spmi_ext_register_read
+  spmi_ext_register_readl
+  spmi_ext_register_write
+  spmi_ext_register_writel
+  spmi_register_read
+  spmi_register_write
+  spmi_register_zero_write
+  sprintf
+  sprint_symbol
+  srcu_barrier
+  srcu_batches_completed
+  srcu_init_notifier_head
+  srcu_notifier_call_chain
+  srcu_notifier_chain_register
+  srcu_notifier_chain_unregister
+  __srcu_read_lock
+  __srcu_read_unlock
+  srcutorture_get_gp_data
+  srcu_torture_stats_print
+  sscanf
+  __stack_chk_fail
+  stack_depot_fetch
+  stack_depot_save
+  stack_trace_print
+  stack_trace_save
+  start_poll_synchronize_rcu
+  start_poll_synchronize_rcu_expedited
+  start_poll_synchronize_srcu
+  static_key_disable
+  stop_machine
+  stop_one_cpu
+  stop_one_cpu_nowait
+  strcasecmp
+  strchr
+  strchrnul
+  strcmp
+  strcpy
+  stream_open
+  strim
+  strlcat
+  strlcpy
+  strlen
+  strncasecmp
+  strnchr
+  strncmp
+  strncpy
+  strncpy_from_user
+  strnlen
+  strnstr
+  strpbrk
+  strrchr
+  strreplace
+  strscpy
+  strscpy_pad
+  strsep
+  strstr
+  __sw_hweight16
+  __sw_hweight32
+  __sw_hweight64
+  __sw_hweight8
+  sync_blockdev
+  sync_file_create
+  sync_file_get_fence
+  synchronize_irq
+  synchronize_net
+  synchronize_rcu
+  synchronize_rcu_expedited
+  synchronize_rcu_tasks
+  synchronize_rcu_tasks_trace
+  synchronize_srcu
+  synchronize_srcu_expedited
+  synth_event_create
+  synth_event_delete
+  syscon_node_to_regmap
+  syscon_regmap_lookup_by_phandle
+  sysctl_sched_features
+  sysctl_vals
+  sysfs_add_file_to_group
+  sysfs_add_link_to_group
+  sysfs_create_bin_file
+  sysfs_create_file_ns
+  sysfs_create_files
+  sysfs_create_group
+  sysfs_create_groups
+  sysfs_create_link
+  sysfs_emit
+  sysfs_emit_at
+  __sysfs_match_string
+  sysfs_notify
+  sysfs_remove_bin_file
+  sysfs_remove_file_from_group
+  sysfs_remove_file_ns
+  sysfs_remove_files
+  sysfs_remove_group
+  sysfs_remove_groups
+  sysfs_remove_link
+  sysfs_remove_link_from_group
+  sysfs_streq
+  sysrq_mask
+  system_32bit_el0_cpumask
+  system_freezable_wq
+  system_highpri_wq
+  system_long_wq
+  system_power_efficient_wq
+  system_state
+  system_unbound_wq
+  system_wq
+  sys_tz
+  task_active_pid_ns
+  __tasklet_hi_schedule
+  tasklet_init
+  tasklet_kill
+  __tasklet_schedule
+  tasklet_setup
+  tasklet_unlock_wait
+  tasklist_lock
+  __task_pid_nr_ns
+  __task_rq_lock
+  thermal_cdev_update
+  thermal_cooling_device_register
+  thermal_cooling_device_unregister
+  thermal_of_cooling_device_register
+  thermal_pressure
+  thermal_zone_device_enable
+  thermal_zone_device_register
+  thermal_zone_device_unregister
+  thermal_zone_device_update
+  thermal_zone_get_temp
+  thermal_zone_get_zone_by_name
+  tick_nohz_get_sleep_length
+  time64_to_tm
+  timer_unstable_counter_workaround
+  tipc_dump_done
+  tipc_dump_start
+  tipc_nl_sk_walk
+  tipc_sk_fill_sock_diag
+  topology_clear_scale_freq_source
+  topology_update_done
+  topology_update_thermal_pressure
+  _totalram_pages
+  trace_array_put
+  __trace_bprintk
+  trace_clock_local
+  trace_event_buffer_commit
+  trace_event_buffer_reserve
+  trace_event_ignore_this_pid
+  trace_event_printf
+  trace_event_raw_init
+  trace_event_reg
+  trace_get_event_file
+  trace_handle_return
+  __traceiter_android_rvh_account_irq
+  __traceiter_android_rvh_after_dequeue_task
+  __traceiter_android_rvh_after_enqueue_task
+  __traceiter_android_rvh_build_perf_domains
+  __traceiter_android_rvh_can_migrate_task
+  __traceiter_android_rvh_check_preempt_wakeup
+  __traceiter_android_rvh_cpu_cgroup_attach
+  __traceiter_android_rvh_cpu_cgroup_online
+  __traceiter_android_rvh_do_sched_yield
+  __traceiter_android_rvh_find_busiest_queue
+  __traceiter_android_rvh_find_lowest_rq
+  __traceiter_android_rvh_flush_task
+  __traceiter_android_rvh_get_nohz_timer_target
+  __traceiter_android_rvh_iommu_setup_dma_ops
+  __traceiter_android_rvh_is_cpu_allowed
+  __traceiter_android_rvh_migrate_queued_task
+  __traceiter_android_rvh_new_task_stats
+  __traceiter_android_rvh_replace_next_task_fair
+  __traceiter_android_rvh_rto_next_cpu
+  __traceiter_android_rvh_sched_cpu_dying
+  __traceiter_android_rvh_sched_cpu_starting
+  __traceiter_android_rvh_sched_exec
+  __traceiter_android_rvh_sched_fork_init
+  __traceiter_android_rvh_sched_getaffinity
+  __traceiter_android_rvh_sched_newidle_balance
+  __traceiter_android_rvh_sched_nohz_balancer_kick
+  __traceiter_android_rvh_sched_setaffinity
+  __traceiter_android_rvh_schedule
+  __traceiter_android_rvh_select_task_rq_fair
+  __traceiter_android_rvh_select_task_rq_rt
+  __traceiter_android_rvh_set_balance_anon_file_reclaim
+  __traceiter_android_rvh_set_cpus_allowed_by_task
+  __traceiter_android_rvh_set_task_cpu
+  __traceiter_android_rvh_show_max_freq
+  __traceiter_android_rvh_tick_entry
+  __traceiter_android_rvh_try_to_wake_up
+  __traceiter_android_rvh_try_to_wake_up_success
+  __traceiter_android_rvh_ttwu_cond
+  __traceiter_android_rvh_update_cpu_capacity
+  __traceiter_android_rvh_update_cpus_allowed
+  __traceiter_android_rvh_update_misfit_status
+  __traceiter_android_rvh_update_thermal_stats
+  __traceiter_android_rvh_wake_up_new_task
+  __traceiter_android_vh_binder_restore_priority
+  __traceiter_android_vh_binder_set_priority
+  __traceiter_android_vh_binder_wakeup_ilocked
+  __traceiter_android_vh_cpu_idle_enter
+  __traceiter_android_vh_cpu_idle_exit
+  __traceiter_android_vh_cpuidle_psci_enter
+  __traceiter_android_vh_cpuidle_psci_exit
+  __traceiter_android_vh_ftrace_dump_buffer
+  __traceiter_android_vh_ftrace_format_check
+  __traceiter_android_vh_ftrace_oops_enter
+  __traceiter_android_vh_ftrace_oops_exit
+  __traceiter_android_vh_ftrace_size_check
+  __traceiter_android_vh_ignore_dmabuf_vmap_bounds
+  __traceiter_android_vh_ipi_stop
+  __traceiter_android_vh_jiffies_update
+  __traceiter_android_vh_printk_hotplug
+  __traceiter_android_vh_rproc_recovery
+  __traceiter_android_vh_rproc_recovery_set
+  __traceiter_android_vh_scheduler_tick
+  __traceiter_android_vh_show_resume_epoch_val
+  __traceiter_android_vh_show_suspend_epoch_val
+  __traceiter_android_vh_timer_calc_index
+  __traceiter_android_vh_ufs_check_int_errors
+  __traceiter_android_vh_ufs_compl_command
+  __traceiter_android_vh_ufs_send_command
+  __traceiter_android_vh_ufs_send_tm_command
+  __traceiter_android_vh_ufs_send_uic_command
+  __traceiter_android_vh_update_topology_flags_workfn
+  __traceiter_binder_transaction_received
+  __traceiter_cpu_frequency_limits
+  __traceiter_cpu_idle
+  __traceiter_gpu_mem_total
+  __traceiter_ipi_entry
+  __traceiter_ipi_raise
+  __traceiter_mmap_lock_acquire_returned
+  __traceiter_mmap_lock_released
+  __traceiter_mmap_lock_start_locking
+  __traceiter_sched_overutilized_tp
+  __traceiter_sched_switch
+  __traceiter_suspend_resume
+  trace_output_call
+  __tracepoint_android_rvh_account_irq
+  __tracepoint_android_rvh_after_dequeue_task
+  __tracepoint_android_rvh_after_enqueue_task
+  __tracepoint_android_rvh_build_perf_domains
+  __tracepoint_android_rvh_can_migrate_task
+  __tracepoint_android_rvh_check_preempt_wakeup
+  __tracepoint_android_rvh_cpu_cgroup_attach
+  __tracepoint_android_rvh_cpu_cgroup_online
+  __tracepoint_android_rvh_do_sched_yield
+  __tracepoint_android_rvh_find_busiest_queue
+  __tracepoint_android_rvh_find_lowest_rq
+  __tracepoint_android_rvh_flush_task
+  __tracepoint_android_rvh_get_nohz_timer_target
+  __tracepoint_android_rvh_iommu_setup_dma_ops
+  __tracepoint_android_rvh_is_cpu_allowed
+  __tracepoint_android_rvh_migrate_queued_task
+  __tracepoint_android_rvh_new_task_stats
+  __tracepoint_android_rvh_replace_next_task_fair
+  __tracepoint_android_rvh_rto_next_cpu
+  __tracepoint_android_rvh_sched_cpu_dying
+  __tracepoint_android_rvh_sched_cpu_starting
+  __tracepoint_android_rvh_sched_exec
+  __tracepoint_android_rvh_sched_fork_init
+  __tracepoint_android_rvh_sched_getaffinity
+  __tracepoint_android_rvh_sched_newidle_balance
+  __tracepoint_android_rvh_sched_nohz_balancer_kick
+  __tracepoint_android_rvh_sched_setaffinity
+  __tracepoint_android_rvh_schedule
+  __tracepoint_android_rvh_select_task_rq_fair
+  __tracepoint_android_rvh_select_task_rq_rt
+  __tracepoint_android_rvh_set_balance_anon_file_reclaim
+  __tracepoint_android_rvh_set_cpus_allowed_by_task
+  __tracepoint_android_rvh_set_task_cpu
+  __tracepoint_android_rvh_show_max_freq
+  __tracepoint_android_rvh_tick_entry
+  __tracepoint_android_rvh_try_to_wake_up
+  __tracepoint_android_rvh_try_to_wake_up_success
+  __tracepoint_android_rvh_ttwu_cond
+  __tracepoint_android_rvh_update_cpu_capacity
+  __tracepoint_android_rvh_update_cpus_allowed
+  __tracepoint_android_rvh_update_misfit_status
+  __tracepoint_android_rvh_update_thermal_stats
+  __tracepoint_android_rvh_wake_up_new_task
+  __tracepoint_android_vh_binder_restore_priority
+  __tracepoint_android_vh_binder_set_priority
+  __tracepoint_android_vh_binder_wakeup_ilocked
+  __tracepoint_android_vh_cpu_idle_enter
+  __tracepoint_android_vh_cpu_idle_exit
+  __tracepoint_android_vh_cpuidle_psci_enter
+  __tracepoint_android_vh_cpuidle_psci_exit
+  __tracepoint_android_vh_ftrace_dump_buffer
+  __tracepoint_android_vh_ftrace_format_check
+  __tracepoint_android_vh_ftrace_oops_enter
+  __tracepoint_android_vh_ftrace_oops_exit
+  __tracepoint_android_vh_ftrace_size_check
+  __tracepoint_android_vh_ignore_dmabuf_vmap_bounds
+  __tracepoint_android_vh_ipi_stop
+  __tracepoint_android_vh_jiffies_update
+  __tracepoint_android_vh_printk_hotplug
+  __tracepoint_android_vh_rproc_recovery
+  __tracepoint_android_vh_rproc_recovery_set
+  __tracepoint_android_vh_scheduler_tick
+  __tracepoint_android_vh_show_resume_epoch_val
+  __tracepoint_android_vh_show_suspend_epoch_val
+  __tracepoint_android_vh_timer_calc_index
+  __tracepoint_android_vh_ufs_check_int_errors
+  __tracepoint_android_vh_ufs_compl_command
+  __tracepoint_android_vh_ufs_send_command
+  __tracepoint_android_vh_ufs_send_tm_command
+  __tracepoint_android_vh_ufs_send_uic_command
+  __tracepoint_android_vh_update_topology_flags_workfn
+  __tracepoint_binder_transaction_received
+  __tracepoint_cpu_frequency_limits
+  __tracepoint_cpu_idle
+  __tracepoint_gpu_mem_total
+  __tracepoint_ipi_entry
+  __tracepoint_ipi_raise
+  __tracepoint_mmap_lock_acquire_returned
+  __tracepoint_mmap_lock_released
+  __tracepoint_mmap_lock_start_locking
+  tracepoint_probe_register
+  tracepoint_probe_register_prio
+  tracepoint_probe_unregister
+  __tracepoint_sched_overutilized_tp
+  __tracepoint_sched_switch
+  __tracepoint_suspend_resume
+  trace_print_array_seq
+  trace_print_flags_seq
+  trace_print_symbols_seq
+  trace_raw_output_prep
+  trace_seq_printf
+  trace_seq_putc
+  __trace_trigger_soft_disabled
+  tracing_off
+  try_module_get
+  try_wait_for_completion
+  __tty_alloc_driver
+  tty_driver_flush_buffer
+  tty_driver_kref_put
+  tty_encode_baud_rate
+  tty_flip_buffer_push
+  tty_get_char_size
+  tty_hangup
+  __tty_insert_flip_char
+  tty_insert_flip_string_fixed_flag
+  tty_kref_put
+  tty_ldisc_deref
+  tty_ldisc_flush
+  tty_ldisc_ref
+  tty_mode_ioctl
+  tty_port_close
+  tty_port_destroy
+  tty_port_hangup
+  tty_port_init
+  tty_port_install
+  tty_port_open
+  tty_port_put
+  tty_port_register_device
+  tty_port_tty_get
+  tty_port_tty_hangup
+  tty_port_tty_wakeup
+  tty_register_driver
+  tty_register_ldisc
+  tty_set_termios
+  tty_standard_install
+  tty_std_termios
+  tty_termios_baud_rate
+  tty_termios_copy_hw
+  tty_termios_encode_baud_rate
+  tty_unregister_device
+  tty_unregister_driver
+  tty_unregister_ldisc
+  tty_unthrottle
+  tty_vhangup
+  tty_wakeup
+  typec_get_drvdata
+  typec_register_partner
+  typec_register_port
+  typec_set_data_role
+  typec_set_pwr_opmode
+  typec_set_pwr_role
+  typec_unregister_partner
+  uart_add_one_port
+  uart_get_baud_rate
+  uart_insert_char
+  uart_register_driver
+  uart_remove_one_port
+  uart_resume_port
+  uart_suspend_port
+  uart_try_toggle_sysrq
+  uart_unregister_driver
+  uart_update_timeout
+  uart_write_wakeup
+  uclamp_eff_value
+  ucsi_connector_change
+  ucsi_create
+  ucsi_destroy
+  ucsi_get_drvdata
+  ucsi_register
+  ucsi_set_drvdata
+  ucsi_unregister
+  __udelay
+  udp6_set_csum
+  udp_set_csum
+  udp_sock_create4
+  udp_sock_create6
+  udp_tunnel6_xmit_skb
+  udp_tunnel_sock_release
+  udp_tunnel_xmit_skb
+  ufshcd_auto_hibern8_update
+  ufshcd_dme_get_attr
+  ufshcd_dme_set_attr
+  ufshcd_dump_regs
+  ufshcd_fixup_dev_quirks
+  ufshcd_get_local_unipro_ver
+  ufshcd_hba_stop
+  ufshcd_hold
+  ufshcd_pltfrm_init
+  ufshcd_pltfrm_shutdown
+  ufshcd_query_attr
+  ufshcd_query_descriptor_retry
+  ufshcd_query_flag
+  ufshcd_release
+  ufshcd_remove
+  ufshcd_resume_complete
+  ufshcd_runtime_resume
+  ufshcd_runtime_suspend
+  ufshcd_suspend_prepare
+  ufshcd_system_resume
+  ufshcd_system_suspend
+  ufshcd_uic_hibern8_enter
+  ufshcd_uic_hibern8_exit
+  __uio_register_device
+  uio_unregister_device
+  unlock_page
+  unregister_blkdev
+  unregister_candev
+  __unregister_chrdev
+  unregister_chrdev_region
+  unregister_console
+  unregister_die_notifier
+  unregister_ftrace_export
+  unregister_inet6addr_notifier
+  unregister_inetaddr_notifier
+  unregister_kprobe
+  unregister_kretprobe
+  unregister_netdev
+  unregister_netdevice_many
+  unregister_netdevice_notifier
+  unregister_netdevice_queue
+  unregister_netevent_notifier
+  unregister_net_sysctl_table
+  unregister_oom_notifier
+  unregister_pernet_device
+  unregister_pernet_subsys
+  unregister_pm_notifier
+  unregister_pppox_proto
+  unregister_qdisc
+  unregister_reboot_notifier
+  unregister_restart_handler
+  unregister_rpmsg_driver
+  unregister_shrinker
+  unregister_syscore_ops
+  unregister_sysctl_table
+  up
+  update_devfreq
+  update_rq_clock
+  up_read
+  up_write
+  usb_add_phy_dev
+  usb_alloc_coherent
+  usb_alloc_urb
+  usb_anchor_urb
+  usb_assign_descriptors
+  usb_autopm_get_interface
+  usb_autopm_get_interface_async
+  usb_autopm_get_interface_no_resume
+  usb_autopm_put_interface
+  usb_autopm_put_interface_async
+  usb_bus_idr
+  usb_bus_idr_lock
+  usb_clear_halt
+  usb_composite_setup_continue
+  usb_control_msg
+  usb_control_msg_recv
+  usb_debug_root
+  usb_decode_ctrl
+  usb_deregister
+  usb_disabled
+  usb_driver_claim_interface
+  usb_driver_release_interface
+  usb_ep_alloc_request
+  usb_ep_autoconfig
+  usb_ep_dequeue
+  usb_ep_disable
+  usb_ep_enable
+  usb_ep_free_request
+  usb_ep_queue
+  usb_ep_set_halt
+  usb_find_common_endpoints
+  usb_free_all_descriptors
+  usb_free_coherent
+  usb_free_urb
+  usb_function_register
+  usb_function_unregister
+  usb_gadget_wakeup
+  usb_get_dev
+  usb_get_from_anchor
+  usb_get_intf
+  usb_ifnum_to_if
+  usb_interface_id
+  usb_kill_urb
+  usb_match_id
+  usb_match_one_id
+  usb_phy_set_charger_current
+  usb_poison_urb
+  usb_put_dev
+  usb_put_function_instance
+  usb_put_intf
+  usb_register_driver
+  usb_register_notify
+  usb_remove_phy
+  usb_role_switch_find_by_fwnode
+  usb_role_switch_get_drvdata
+  usb_role_switch_register
+  usb_role_switch_set_role
+  usb_role_switch_unregister
+  usb_serial_claim_interface
+  usb_serial_deregister_drivers
+  usb_serial_generic_chars_in_buffer
+  usb_serial_generic_close
+  usb_serial_generic_get_icount
+  usb_serial_generic_open
+  usb_serial_generic_process_read_urb
+  usb_serial_generic_read_bulk_callback
+  usb_serial_generic_resume
+  usb_serial_generic_submit_read_urbs
+  usb_serial_generic_throttle
+  usb_serial_generic_tiocmiwait
+  usb_serial_generic_unthrottle
+  usb_serial_generic_wait_until_sent
+  usb_serial_generic_write
+  usb_serial_generic_write_bulk_callback
+  usb_serial_generic_write_start
+  usb_serial_handle_dcd_change
+  usb_serial_port_softint
+  usb_serial_register_drivers
+  usb_serial_resume
+  usb_serial_suspend
+  usb_show_dynids
+  usb_speed_string
+  usb_store_new_id
+  usb_string_id
+  usb_submit_urb
+  usb_unpoison_urb
+  usb_unregister_notify
+  __usecs_to_jiffies
+  usleep_range_state
+  utf8_data_table
+  uuid_parse
+  v4l2_compat_ioctl32
+  v4l2_ctrl_find
+  v4l2_ctrl_handler_free
+  v4l2_ctrl_handler_init_class
+  v4l2_ctrl_new_custom
+  v4l2_ctrl_new_std
+  v4l2_ctrl_new_std_menu
+  v4l2_ctrl_request_complete
+  v4l2_ctrl_request_setup
+  v4l2_ctrl_subscribe_event
+  v4l2_device_register
+  v4l2_device_register_subdev
+  __v4l2_device_register_subdev_nodes
+  v4l2_device_unregister
+  v4l2_device_unregister_subdev
+  v4l2_event_dequeue
+  v4l2_event_pending
+  v4l2_event_queue
+  v4l2_event_queue_fh
+  v4l2_event_subscribe
+  v4l2_event_unsubscribe
+  v4l2_fh_add
+  v4l2_fh_del
+  v4l2_fh_exit
+  v4l2_fh_init
+  v4l2_fh_open
+  v4l2_fh_release
+  v4l2_m2m_ctx_init
+  v4l2_m2m_ctx_release
+  v4l2_m2m_init
+  v4l2_m2m_job_finish
+  v4l2_m2m_register_media_controller
+  v4l2_m2m_release
+  v4l2_m2m_request_queue
+  v4l2_m2m_unregister_media_controller
+  v4l2_src_change_event_subscribe
+  v4l2_subdev_call_wrappers
+  v4l2_subdev_init
+  vb2_buffer_done
+  vb2_create_bufs
+  vb2_dqbuf
+  vb2_prepare_buf
+  vb2_qbuf
+  vb2_querybuf
+  vb2_queue_init
+  vb2_queue_release
+  vb2_reqbufs
+  vb2_request_validate
+  vb2_streamoff
+  vb2_streamon
+  vchan_dma_desc_free_list
+  vchan_find_desc
+  vchan_init
+  vchan_tx_desc_free
+  vchan_tx_submit
+  verify_pkcs7_signature
+  vfree
+  vhost_add_used_and_signal
+  vhost_dev_check_owner
+  vhost_dev_cleanup
+  vhost_dev_init
+  vhost_dev_ioctl
+  vhost_dev_stop
+  vhost_disable_notify
+  vhost_enable_notify
+  vhost_get_vq_desc
+  vhost_log_access_ok
+  vhost_vq_access_ok
+  vhost_vq_init_access
+  vhost_vring_ioctl
+  video_devdata
+  video_device_alloc
+  video_device_release
+  video_device_release_empty
+  video_ioctl2
+  __video_register_device
+  video_unregister_device
+  vlan_dev_vlan_id
+  vlan_filter_drop_vids
+  vlan_filter_push_vids
+  vlan_ioctl_set
+  vlan_uses_dev
+  vlan_vid_add
+  vlan_vid_del
+  vmalloc
+  vmalloc_to_page
+  vmalloc_to_pfn
+  vmap
+  vmemdup_user
+  vmf_insert_mixed
+  vmf_insert_pfn
+  vm_get_page_prot
+  vm_insert_page
+  vm_iomap_memory
+  vm_map_pages
+  vm_mmap
+  vm_munmap
+  vm_node_stat
+  vm_zone_stat
+  vprintk
+  vscnprintf
+  vsnprintf
+  vsprintf
+  vunmap
+  vzalloc
+  wait_for_completion
+  wait_for_completion_interruptible
+  wait_for_completion_interruptible_timeout
+  wait_for_completion_killable
+  wait_for_completion_timeout
+  __wait_rcu_gp
+  wait_woken
+  __wake_up
+  wake_up_bit
+  wake_up_if_idle
+  __wake_up_locked
+  wake_up_process
+  wakeup_source_register
+  wakeup_source_unregister
+  __wake_up_sync_key
+  __warn_printk
+  wdev_chandef
+  wdev_to_ieee80211_vif
+  wiphy_apply_custom_regulatory
+  wiphy_free
+  wiphy_new_nm
+  wiphy_read_of_freq_limits
+  wiphy_register
+  wiphy_rfkill_set_hw_state_reason
+  wiphy_rfkill_start_polling
+  wiphy_to_ieee80211_hw
+  wiphy_unregister
+  wireless_nlevent_flush
+  wireless_send_event
+  woken_wake_function
+  work_busy
+  wpan_phy_find
+  wpan_phy_for_each
+  wpan_phy_free
+  wpan_phy_new
+  wpan_phy_register
+  wpan_phy_unregister
+  ww_mutex_lock
+  ww_mutex_unlock
+  __xa_alloc
+  __xa_alloc_cyclic
+  xa_destroy
+  xa_erase
+  xa_find
+  xa_find_after
+  __xa_insert
+  xa_load
+  xa_store
+  xdp_rxq_info_is_reg
+  xdp_rxq_info_unreg_mem_model
+  xfrm_lookup
+  xhci_get_endpoint_index
+  xp_alloc
+  xp_dma_map
+  xp_dma_sync_for_cpu_slow
+  xp_dma_sync_for_device_slow
+  xp_dma_unmap
+  xp_free
+  xp_raw_get_dma
+  xp_set_rxq_info
+  xsk_clear_rx_need_wakeup
+  xsk_get_pool_from_qid
+  xsk_set_rx_need_wakeup
+  xsk_set_tx_need_wakeup
+  xsk_tx_completed
+  xsk_tx_peek_desc
+  xsk_tx_release
+  xsk_uses_need_wakeup
+  zlib_deflate
+  zlib_deflateEnd
+  zlib_deflateInit2
+  zlib_deflateReset
+  zlib_deflate_workspacesize
+  zlib_inflate
+  zlib_inflateIncomp
+  zlib_inflateInit2
+  zlib_inflateReset
+  zlib_inflate_workspacesize
+  zs_compact
+  zs_create_pool
+  zs_destroy_pool
+  zs_free
+  zs_get_total_pages
+  zs_huge_class_size
+  zs_malloc
+  zs_map_object
+  zs_pool_stats
+  zs_unmap_object
diff --git a/android/abi_gki_aarch64_virtual_device b/android/abi_gki_aarch64_virtual_device
new file mode 100644
index 0000000..2bf7a6f
--- /dev/null
+++ b/android/abi_gki_aarch64_virtual_device
@@ -0,0 +1,1404 @@
+[abi_symbol_list]
+# commonly used symbols
+  alloc_candev_mqs
+  alloc_can_err_skb
+  alloc_can_skb
+  alloc_etherdev_mqs
+  __alloc_pages
+  __alloc_skb
+  alloc_workqueue
+  alt_cb_patch_nops
+  amba_driver_register
+  amba_driver_unregister
+  __arch_copy_from_user
+  __arch_copy_to_user
+  arm64_use_ng_mappings
+  bcmp
+  blk_queue_io_min
+  blk_queue_io_opt
+  blk_queue_logical_block_size
+  blk_queue_max_discard_sectors
+  blk_queue_max_write_zeroes_sectors
+  blk_queue_physical_block_size
+  bpf_trace_run2
+  bpf_trace_run3
+  bt_err
+  bt_info
+  bt_warn
+  build_skb
+  cancel_delayed_work_sync
+  cancel_work_sync
+  can_change_mtu
+  can_dropped_invalid_skb
+  cfg80211_chandef_valid
+  __check_object_size
+  __class_create
+  class_destroy
+  __ClearPageMovable
+  clk_disable
+  clk_enable
+  clk_get_rate
+  clk_prepare
+  clk_set_rate
+  clk_unprepare
+  close_candev
+  complete
+  __const_udelay
+  consume_skb
+  contig_page_data
+  __cpuhp_remove_state
+  __cpuhp_setup_state
+  __cpuhp_state_add_instance
+  __cpuhp_state_remove_instance
+  cpu_hwcaps
+  cpu_number
+  __cpu_online_mask
+  debugfs_attr_read
+  debugfs_attr_write
+  debugfs_create_devm_seqfile
+  debugfs_create_dir
+  debugfs_create_file
+  debugfs_create_u32
+  debugfs_create_u8
+  debugfs_remove
+  default_llseek
+  delayed_work_timer_fn
+  del_gendisk
+  del_timer
+  destroy_workqueue
+  dev_addr_mod
+  _dev_err
+  device_add_disk
+  device_create
+  device_create_file
+  device_remove_file
+  device_set_wakeup_capable
+  device_unregister
+  device_wakeup_enable
+  _dev_info
+  __dev_kfree_skb_any
+  devm_clk_get
+  devm_clk_hw_register
+  devm_ioremap
+  devm_ioremap_resource
+  devm_kfree
+  devm_kmalloc
+  devm_request_threaded_irq
+  _dev_notice
+  __dev_queue_xmit
+  _dev_warn
+  dma_alloc_attrs
+  dma_buf_export
+  dma_fence_context_alloc
+  dma_fence_init
+  dma_fence_release
+  dma_fence_signal_locked
+  dma_free_attrs
+  dmam_alloc_attrs
+  dma_set_coherent_mask
+  dma_set_mask
+  dma_sync_sg_for_device
+  dma_sync_single_for_device
+  dma_unmap_sg_attrs
+  do_trace_netlink_extack
+  drm_add_modes_noedid
+  drm_atomic_get_crtc_state
+  drm_atomic_helper_check
+  drm_atomic_helper_check_plane_state
+  drm_atomic_helper_commit
+  drm_atomic_helper_connector_destroy_state
+  drm_atomic_helper_connector_duplicate_state
+  drm_atomic_helper_connector_reset
+  drm_atomic_helper_disable_plane
+  drm_atomic_helper_page_flip
+  drm_atomic_helper_set_config
+  drm_atomic_helper_shutdown
+  drm_atomic_helper_update_plane
+  drm_compat_ioctl
+  drm_connector_attach_encoder
+  drm_connector_cleanup
+  drm_connector_init
+  drm_crtc_arm_vblank_event
+  drm_crtc_cleanup
+  drm_crtc_handle_vblank
+  drm_crtc_init_with_planes
+  drm_crtc_send_vblank_event
+  drm_crtc_vblank_get
+  drm_crtc_vblank_off
+  drm_crtc_vblank_on
+  ___drm_dbg
+  drm_debugfs_create_files
+  drm_dev_alloc
+  drm_dev_put
+  drm_dev_register
+  drm_dev_unregister
+  __drm_err
+  drm_gem_create_mmap_offset
+  drm_gem_fb_create
+  drm_gem_handle_create
+  drm_gem_mmap
+  drm_gem_object_free
+  drm_gem_object_release
+  drm_gem_prime_fd_to_handle
+  drm_gem_prime_handle_to_fd
+  drm_gem_prime_mmap
+  drm_gem_private_object_init
+  drm_gem_vm_close
+  drm_gem_vm_open
+  drm_helper_probe_single_connector_modes
+  drm_ioctl
+  drmm_mode_config_init
+  drm_mode_config_reset
+  __drmm_universal_plane_alloc
+  drm_open
+  drm_poll
+  drm_read
+  drm_release
+  drm_set_preferred_mode
+  drm_simple_encoder_init
+  drm_vblank_init
+  ether_setup
+  ethtool_op_get_link
+  ethtool_op_get_ts_info
+  eth_validate_addr
+  fd_install
+  finish_wait
+  firmware_request_nowarn
+  flush_work
+  __folio_put
+  fortify_panic
+  fput
+  free_candev
+  free_irq
+  free_netdev
+  __free_pages
+  free_pages
+  get_device
+  __get_free_pages
+  get_random_bytes
+  get_unused_fd_flags
+  gic_nonsecure_priorities
+  gpiod_put
+  hci_alloc_dev_priv
+  __hci_cmd_sync
+  __hci_cmd_sync_ev
+  hci_free_dev
+  hci_recv_frame
+  hci_register_dev
+  hci_unregister_dev
+  hrtimer_cancel
+  hrtimer_forward
+  hrtimer_init
+  hrtimer_start_range_ns
+  ida_alloc_range
+  ida_free
+  idr_alloc
+  idr_destroy
+  idr_remove
+  ieee80211_alloc_hw_nm
+  ieee80211_beacon_cntdwn_is_complete
+  ieee80211_beacon_get_tim
+  ieee80211_csa_finish
+  ieee80211_free_hw
+  ieee80211_free_txskb
+  ieee80211_get_buffered_bc
+  ieee80211_get_hdrlen_from_skb
+  ieee80211_get_tx_rates
+  ieee80211_iterate_active_interfaces_atomic
+  ieee80211_queue_delayed_work
+  ieee80211_radar_detected
+  ieee80211_register_hw
+  ieee80211_send_bar
+  ieee80211_sta_register_airtime
+  ieee80211_stop_queues
+  ieee80211_stop_tx_ba_cb_irqsafe
+  ieee80211_tx_status_ext
+  ieee80211_unregister_hw
+  ieee80211_wake_queues
+  __init_swait_queue_head
+  init_timer_key
+  init_wait_entry
+  __init_waitqueue_head
+  input_alloc_absinfo
+  input_allocate_device
+  input_event
+  input_free_device
+  input_mt_init_slots
+  input_register_device
+  input_set_abs_params
+  input_unregister_device
+  ioremap_prot
+  iounmap
+  irq_set_irq_wake
+  is_vmalloc_addr
+  jiffies
+  kasan_flag_enabled
+  kfree
+  kfree_skb_reason
+  kimage_voffset
+  __kmalloc
+  kmalloc_caches
+  kmalloc_trace
+  kmem_cache_alloc
+  kmem_cache_create
+  kmem_cache_destroy
+  kmem_cache_free
+  kmemdup
+  kstrndup
+  kstrtobool_from_user
+  kstrtoint
+  kstrtouint
+  kthread_create_on_node
+  kthread_park
+  kthread_should_stop
+  kthread_stop
+  kthread_unpark
+  ktime_get
+  ktime_get_with_offset
+  kvfree
+  kvmalloc_node
+  __list_add_valid
+  __list_del_entry_valid
+  __local_bh_enable_ip
+  log_post_read_mmio
+  log_post_write_mmio
+  log_read_mmio
+  log_write_mmio
+  memcpy
+  memmove
+  memparse
+  memset
+  memstart_addr
+  misc_deregister
+  misc_register
+  mod_timer
+  module_layout
+  __msecs_to_jiffies
+  msleep
+  __mutex_init
+  mutex_lock
+  mutex_lock_interruptible
+  mutex_unlock
+  napi_complete_done
+  napi_disable
+  napi_enable
+  napi_gro_receive
+  __napi_schedule
+  napi_schedule_prep
+  netdev_err
+  netdev_info
+  netdev_printk
+  netdev_rx_handler_register
+  netdev_rx_handler_unregister
+  netdev_upper_dev_unlink
+  netdev_warn
+  netif_carrier_off
+  netif_carrier_on
+  netif_device_detach
+  netif_napi_add_weight
+  __netif_napi_del
+  netif_rx
+  netif_tx_stop_all_queues
+  netif_tx_wake_queue
+  nf_conntrack_destroy
+  __nla_parse
+  nla_put_64bit
+  nla_put
+  nonseekable_open
+  noop_llseek
+  nr_cpu_ids
+  __num_online_cpus
+  of_device_is_compatible
+  of_find_property
+  of_property_read_variable_u32_array
+  open_candev
+  page_frag_alloc_align
+  __page_frag_cache_drain
+  page_frag_free
+  param_ops_bool
+  param_ops_int
+  param_ops_uint
+  passthru_features_check
+  pci_bus_type
+  pci_disable_device
+  pci_enable_device
+  pci_find_capability
+  pci_find_next_capability
+  pci_iounmap
+  pci_read_config_byte
+  pci_read_config_dword
+  __pci_register_driver
+  pci_release_region
+  pci_request_region
+  pci_unregister_driver
+  __per_cpu_offset
+  perf_trace_buf_alloc
+  perf_trace_run_bpf_submit
+  platform_device_add
+  platform_device_add_data
+  platform_device_alloc
+  platform_device_del
+  platform_device_put
+  platform_device_register_full
+  platform_device_unregister
+  __platform_driver_register
+  platform_driver_unregister
+  platform_get_irq
+  platform_get_resource
+  pm_wakeup_dev_event
+  preempt_schedule
+  preempt_schedule_notrace
+  prepare_to_wait_event
+  _printk
+  put_device
+  put_disk
+  __put_task_struct
+  put_unused_fd
+  queue_delayed_work_on
+  queue_work_on
+  ___ratelimit
+  _raw_spin_lock
+  _raw_spin_lock_bh
+  _raw_spin_lock_irq
+  _raw_spin_lock_irqsave
+  _raw_spin_trylock
+  _raw_spin_unlock
+  _raw_spin_unlock_bh
+  _raw_spin_unlock_irq
+  _raw_spin_unlock_irqrestore
+  __rcu_read_lock
+  __rcu_read_unlock
+  refcount_warn_saturate
+  __register_blkdev
+  register_candev
+  register_netdevice
+  register_netdevice_notifier
+  register_shrinker
+  register_virtio_device
+  register_virtio_driver
+  __regmap_init
+  regmap_write
+  release_firmware
+  remap_pfn_range
+  request_firmware
+  request_threaded_irq
+  rtnl_link_register
+  rtnl_link_unregister
+  rtnl_lock
+  rtnl_unlock
+  sched_set_fifo_low
+  schedule
+  schedule_timeout
+  scnprintf
+  seq_lseek
+  seq_printf
+  seq_puts
+  seq_read
+  serio_close
+  serio_interrupt
+  serio_open
+  serio_reconnect
+  __serio_register_driver
+  __serio_register_port
+  serio_unregister_driver
+  set_capacity_and_notify
+  __SetPageMovable
+  sg_alloc_table
+  sg_free_table
+  sg_init_one
+  sg_init_table
+  sg_miter_next
+  sg_miter_start
+  sg_miter_stop
+  sg_next
+  simple_attr_open
+  simple_attr_release
+  simple_open
+  simple_read_from_buffer
+  single_open
+  single_release
+  skb_add_rx_frag
+  skb_clone
+  skb_dequeue
+  skb_pull
+  skb_push
+  skb_put
+  skb_queue_purge
+  skb_queue_tail
+  skb_to_sgvec
+  skb_tstamp_tx
+  snprintf
+  sprintf
+  sscanf
+  __stack_chk_fail
+  strcasecmp
+  strcmp
+  strlen
+  strncpy
+  strnlen
+  strscpy
+  __sw_hweight8
+  sync_file_create
+  synchronize_irq
+  synchronize_net
+  sysfs_create_group
+  sysfs_remove_group
+  system_wq
+  tasklet_unlock_wait
+  trace_event_buffer_commit
+  trace_event_buffer_reserve
+  trace_event_printf
+  trace_event_raw_init
+  trace_event_reg
+  trace_handle_return
+  trace_raw_output_prep
+  __trace_trigger_soft_disabled
+  unlock_page
+  unregister_blkdev
+  unregister_candev
+  unregister_netdev
+  unregister_netdevice_notifier
+  unregister_netdevice_queue
+  unregister_shrinker
+  unregister_virtio_device
+  unregister_virtio_driver
+  usb_add_hcd
+  usb_alloc_urb
+  usb_anchor_urb
+  usb_bulk_msg
+  usb_control_msg
+  usb_create_hcd
+  usb_create_shared_hcd
+  usb_deregister
+  usb_disabled
+  usb_free_urb
+  usb_get_dev
+  usb_hcd_check_unlink_urb
+  usb_hcd_giveback_urb
+  usb_hcd_is_primary_hcd
+  usb_hcd_link_urb_to_ep
+  usb_hcd_poll_rh_status
+  usb_hcd_resume_root_hub
+  usb_hcd_unlink_urb_from_ep
+  usb_kill_anchored_urbs
+  usb_put_dev
+  usb_put_hcd
+  usb_register_driver
+  usb_remove_hcd
+  usb_reset_device
+  usb_submit_urb
+  usb_unanchor_urb
+  __usecs_to_jiffies
+  usleep_range_state
+  vfree
+  virtio_break_device
+  virtio_check_driver_offered_feature
+  virtio_config_changed
+  virtio_device_freeze
+  virtio_device_restore
+  virtio_reset_device
+  virtqueue_add_inbuf
+  virtqueue_add_outbuf
+  virtqueue_add_sgs
+  virtqueue_detach_unused_buf
+  virtqueue_disable_cb
+  virtqueue_enable_cb
+  virtqueue_get_avail_addr
+  virtqueue_get_buf
+  virtqueue_get_desc_addr
+  virtqueue_get_used_addr
+  virtqueue_get_vring_size
+  virtqueue_is_broken
+  virtqueue_kick
+  virtqueue_kick_prepare
+  virtqueue_notify
+  vmalloc_to_page
+  vm_get_page_prot
+  vring_create_virtqueue
+  vring_del_virtqueue
+  vring_interrupt
+  vring_transport_features
+  __wake_up
+  wake_up_process
+  __warn_printk
+
+# required by ambakmi.ko
+  amba_release_regions
+  amba_request_regions
+  clk_get
+  clk_put
+  serio_unregister_port
+
+# required by armmmci.ko
+  clk_round_rate
+  devm_of_iomap
+  devm_pinctrl_get
+  __devm_reset_control_get
+  dma_map_sg_attrs
+  dma_release_channel
+  dma_request_chan
+  gpiod_direction_input
+  gpiod_get
+  gpiod_get_value
+  gpiod_set_value
+  mmc_add_host
+  mmc_alloc_host
+  mmc_free_host
+  mmc_gpiod_request_cd
+  mmc_gpiod_request_ro
+  mmc_gpio_get_cd
+  mmc_gpio_get_ro
+  mmc_of_parse
+  mmc_regulator_get_supply
+  mmc_regulator_set_ocr
+  mmc_regulator_set_vqmmc
+  mmc_remove_host
+  mmc_request_done
+  mmc_send_tuning
+  of_get_property
+  pinctrl_lookup_state
+  pinctrl_pm_select_sleep_state
+  pinctrl_select_default_state
+  pinctrl_select_state
+  pm_runtime_force_resume
+  pm_runtime_force_suspend
+  __pm_runtime_idle
+  __pm_runtime_resume
+  pm_runtime_set_autosuspend_delay
+  __pm_runtime_use_autosuspend
+  regulator_disable
+  regulator_enable
+  reset_control_assert
+  reset_control_deassert
+  sg_copy_from_buffer
+  sg_copy_to_buffer
+
+# required by btintel.ko
+  bit_wait_timeout
+  bt_to_errno
+  hci_cmd_sync
+  out_of_line_wait_on_bit_timeout
+  request_firmware_direct
+  wake_up_bit
+
+# required by btusb.ko
+  btbcm_set_bdaddr
+  btbcm_setup_apple
+  btbcm_setup_patchram
+  cancel_delayed_work
+  device_wakeup_disable
+  disable_irq
+  disable_irq_nosync
+  enable_irq
+  gpiod_get_optional
+  gpiod_set_value_cansleep
+  hci_cmd_sync_cancel
+  hci_recv_diag
+  irq_modify_status
+  ktime_get_mono_fast_ns
+  of_irq_get_byname
+  of_match_device
+  of_property_read_variable_u16_array
+  pm_runtime_allow
+  pm_runtime_forbid
+  __pm_runtime_suspend
+  pm_system_wakeup
+  usb_autopm_get_interface
+  usb_autopm_put_interface
+  usb_driver_claim_interface
+  usb_driver_release_interface
+  usb_enable_autosuspend
+  usb_get_from_anchor
+  usb_ifnum_to_if
+  usb_match_id
+  usb_queue_reset_device
+  usb_scuttle_anchored_urbs
+  usb_set_interface
+
+# required by clk-vexpress-osc.ko
+  clk_hw_set_rate_range
+  devm_of_clk_add_hw_provider
+  of_clk_hw_simple_get
+  of_property_read_string
+  regmap_read
+
+# required by drm_dma_helper.ko
+  dma_alloc_pages
+  dma_buf_vmap
+  dma_buf_vunmap
+  dma_free_pages
+  dma_get_sgtable_attrs
+  dma_mmap_attrs
+  dma_mmap_pages
+  drm_atomic_helper_damage_iter_init
+  drm_atomic_helper_damage_iter_next
+  __drm_dev_dbg
+  drm_format_info_block_height
+  drm_format_info_block_width
+  drm_gem_fb_get_obj
+  drm_gem_object_init
+  drm_prime_gem_destroy
+  drm_prime_get_contiguous_size
+  drm_printf
+
+# required by dummy-cpufreq.ko
+  cpufreq_generic_attr
+  cpufreq_register_driver
+  cpufreq_unregister_driver
+
+# required by dummy_hcd.ko
+  ktime_get_ts64
+  strstr
+  usb_add_gadget_udc
+  usb_del_gadget_udc
+  usb_ep_set_maxpacket_limit
+  usb_gadget_giveback_request
+  usb_gadget_udc_reset
+
+# required by failover.ko
+  netdev_master_upper_dev_link
+  rtnl_is_locked
+
+# required by goldfish_address_space.ko
+  memremap
+  memunmap
+
+# required by goldfish_battery.ko
+  power_supply_changed
+  power_supply_get_drvdata
+  power_supply_register
+  power_supply_unregister
+
+# required by goldfish_pipe.ko
+  pin_user_pages_fast
+  unpin_user_pages_dirty_lock
+
+# required by goldfish_sync.ko
+  dma_fence_default_wait
+  dma_fence_free
+
+# required by gs_usb.ko
+  alloc_canfd_skb
+  can_eth_ioctl_hwts
+  can_ethtool_op_get_ts_info_hwts
+  can_fd_dlc2len
+  can_fd_len2dlc
+  can_free_echo_skb
+  can_get_echo_skb
+  can_put_echo_skb
+  timecounter_cyc2time
+  timecounter_init
+  timecounter_read
+  usb_control_msg_recv
+  usb_control_msg_send
+
+# required by hci_vhci.ko
+  _copy_from_iter
+  hci_resume_dev
+  hci_suspend_dev
+  iov_iter_revert
+  skb_queue_head
+
+# required by mac80211_hwsim.ko
+  alloc_netdev_mqs
+  __cfg80211_alloc_event_skb
+  __cfg80211_alloc_reply_skb
+  __cfg80211_send_event_skb
+  cfg80211_vendor_cmd_reply
+  dev_alloc_name
+  device_bind_driver
+  device_release_driver
+  dst_release
+  eth_mac_addr
+  genlmsg_put
+  genl_notify
+  genl_register_family
+  genl_unregister_family
+  ieee80211_find_sta_by_link_addrs
+  ieee80211_get_channel_khz
+  ieee80211_nullfunc_get
+  ieee80211_probereq_get
+  ieee80211_ready_on_channel
+  ieee80211_remain_on_channel_expired
+  ieee80211_rx_irqsafe
+  ieee80211_scan_completed
+  ieee80211_set_active_links_async
+  ieee80211_tx_prepare_skb
+  ieee80211_tx_status_irqsafe
+  init_net
+  jiffies_to_msecs
+  __netdev_alloc_skb
+  netlink_broadcast
+  netlink_register_notifier
+  netlink_unicast
+  netlink_unregister_notifier
+  net_namespace_list
+  nla_memcpy
+  register_pernet_device
+  regulatory_hint
+  rhashtable_destroy
+  rhashtable_init
+  rhashtable_insert_slow
+  __rht_bucket_nested
+  rht_bucket_nested
+  rht_bucket_nested_insert
+  schedule_timeout_interruptible
+  skb_copy
+  skb_copy_expand
+  __skb_ext_put
+  skb_trim
+  __sw_hweight16
+  unregister_pernet_device
+  wiphy_apply_custom_regulatory
+
+# required by mt76-usb.ko
+  usb_init_urb
+  usb_kill_urb
+  usb_poison_urb
+  usb_unpoison_urb
+
+# required by mt76.ko
+  cfg80211_reg_can_beacon
+  debugfs_create_blob
+  debugfs_create_file_unsafe
+  dev_driver_string
+  devm_kmemdup
+  dev_set_threaded
+  dma_map_page_attrs
+  dma_sync_single_for_cpu
+  dma_unmap_page_attrs
+  idr_get_next
+  ieee80211_calc_rx_airtime
+  ieee80211_channel_to_freq_khz
+  ieee80211_find_sta_by_ifaddr
+  ieee80211_get_key_rx_seq
+  ieee80211_next_txq
+  ieee80211_rx_list
+  __ieee80211_schedule_txq
+  ieee80211_sta_eosp
+  ieee80211_sta_pspoll
+  ieee80211_sta_ps_transition
+  ieee80211_sta_uapsd_trigger
+  ieee80211_tx_dequeue
+  ieee80211_txq_schedule_start
+  init_dummy_netdev
+  __ioread32_copy
+  __iowrite32_copy
+  kthread_parkme
+  kthread_should_park
+  kvfree_call_rcu
+  led_classdev_register_ext
+  led_classdev_unregister
+  netif_receive_skb_list
+  of_get_child_by_name
+  of_get_mac_address
+  of_get_next_child
+  of_prop_next_string
+  pci_disable_link_state
+  pcie_capability_clear_and_set_word
+  pcie_capability_read_word
+  radix_tree_tagged
+  rfc1042_header
+  __skb_pad
+  wiphy_read_of_freq_limits
+
+# required by mt76x02-lib.ko
+  bpf_trace_run1
+  debugfs_create_bool
+  ieee80211_calc_tx_airtime
+  ieee80211_hdrlen
+  ieee80211_iter_keys_rcu
+  ieee80211_restart_hw
+  __kfifo_init
+  ___pskb_trim
+  __tasklet_schedule
+  tasklet_setup
+  wiphy_to_ieee80211_hw
+
+# required by mt76x02-usb.ko
+  hrtimer_active
+  ieee80211_iterate_interfaces
+  system_highpri_wq
+
+# required by nd_virtio.ko
+  bio_alloc_bioset
+  bio_chain
+  bio_clone_blkg_association
+  fs_bio_set
+  submit_bio
+
+# required by net_failover.ko
+  call_netdevice_notifiers
+  dev_close
+  dev_get_stats
+  dev_mc_sync_multiple
+  dev_mc_unsync
+  dev_open
+  dev_set_mtu
+  dev_uc_sync_multiple
+  dev_uc_unsync
+  __ethtool_get_link_ksettings
+  netdev_change_features
+  netdev_core_stats_alloc
+  netdev_increment_features
+  netdev_lower_state_changed
+  netdev_pick_tx
+  register_netdev
+  vlan_uses_dev
+  vlan_vid_add
+  vlan_vid_del
+  vlan_vids_add_by_dev
+  vlan_vids_del_by_dev
+
+# required by pl111_drm.ko
+  __clk_get_name
+  clk_hw_get_parent
+  clk_hw_round_rate
+  drm_kms_helper_poll_init
+  drm_of_find_panel_or_bridge
+  drm_panel_bridge_add_typed
+  drm_panel_bridge_connector
+  drm_panel_bridge_remove
+  drm_simple_display_pipe_attach_bridge
+  drm_simple_display_pipe_init
+  of_find_device_by_node
+  of_find_matching_node_and_match
+  of_find_node_opts_by_path
+  of_get_next_available_child
+  of_graph_get_next_endpoint
+  of_reserved_mem_device_init_by_idx
+  of_reserved_mem_device_release
+  regmap_update_bits_base
+  syscon_node_to_regmap
+
+# required by psmouse.ko
+  bus_register_notifier
+  bus_unregister_notifier
+  del_timer_sync
+  device_add_groups
+  device_link_add
+  device_link_remove
+  device_remove_groups
+  __flush_workqueue
+  i2c_adapter_type
+  i2c_bus_type
+  i2c_client_type
+  i2c_for_each_dev
+  i2c_new_scanned_device
+  i2c_unregister_device
+  i2c_verify_adapter
+  input_mt_assign_slots
+  input_mt_drop_unused
+  input_mt_report_finger_count
+  input_mt_report_pointer_emulation
+  input_mt_report_slot_state
+  input_mt_sync_frame
+  input_set_capability
+  kstrtobool
+  kstrtou8
+  ps2_begin_command
+  ps2_cmd_aborted
+  ps2_command
+  ps2_drain
+  ps2_end_command
+  ps2_handle_ack
+  ps2_handle_response
+  ps2_init
+  ps2_sendbyte
+  ps2_sliced_command
+  serio_rescan
+  serio_unregister_child_port
+  strncmp
+  strsep
+
+# required by pulse8-cec.ko
+  cec_allocate_adapter
+  cec_delete_adapter
+  cec_received_msg_ts
+  cec_register_adapter
+  cec_s_log_addrs
+  cec_s_phys_addr
+  cec_transmit_attempt_done_ts
+  cec_unregister_adapter
+  wait_for_completion_timeout
+
+# required by rtc-test.ko
+  add_timer
+  devm_rtc_allocate_device
+  __devm_rtc_register_device
+  ktime_get_real_seconds
+  rtc_time64_to_tm
+  rtc_tm_to_time64
+  rtc_update_irq
+
+# required by slcan.ko
+  can_bus_off
+  can_change_state
+  capable
+  hex_asc_upper
+  hex_to_bin
+  tty_mode_ioctl
+  tty_register_ldisc
+  tty_unregister_ldisc
+
+# required by system_heap.ko
+  dma_heap_add
+  dma_heap_get_dev
+  dma_heap_get_name
+  dma_map_sgtable
+  dma_sync_sg_for_cpu
+  __sg_page_iter_next
+  __sg_page_iter_start
+  vmalloc
+  vmap
+  vunmap
+
+# required by usbip-core.ko
+  iov_iter_kvec
+  param_ops_ulong
+  print_hex_dump
+  sock_recvmsg
+
+# required by vcan.ko
+  sock_efree
+
+# required by vexpress-config.ko
+  devres_add
+  __devres_alloc_node
+  devres_free
+  of_find_compatible_node
+  of_get_next_parent
+  __of_parse_phandle_with_args
+  of_platform_populate
+  of_root
+  regmap_exit
+  __udelay
+
+# required by vexpress-sysreg.ko
+  bgpio_init
+  devm_gpiochip_add_data_with_key
+  devm_mfd_add_devices
+
+# required by vhci-hcd.ko
+  kernel_sendmsg
+  kernel_sock_shutdown
+  kstrtoll
+  platform_bus
+  sockfd_lookup
+  strchr
+  sysfs_remove_link
+  usb_speed_string
+
+# required by virt_wifi.ko
+  cfg80211_connect_done
+  cfg80211_disconnected
+  cfg80211_inform_bss_data
+  cfg80211_put_bss
+  cfg80211_scan_done
+  __dev_get_by_index
+  _dev_printk
+  __module_get
+  module_put
+  netdev_upper_dev_link
+  netif_stacked_transfer_operstate
+  unregister_netdevice_many
+  wiphy_free
+  wiphy_new_nm
+  wiphy_register
+  wiphy_unregister
+
+# required by virtio-gpu.ko
+  __devm_request_region
+  dma_fence_match_context
+  dma_fence_wait_timeout
+  dma_map_resource
+  dma_resv_add_fence
+  dma_resv_reserve_fences
+  dma_resv_test_signaled
+  dma_resv_wait_timeout
+  dma_unmap_resource
+  drm_add_edid_modes
+  drm_aperture_remove_conflicting_pci_framebuffers
+  drm_atomic_helper_crtc_destroy_state
+  drm_atomic_helper_crtc_duplicate_state
+  drm_atomic_helper_crtc_reset
+  drm_atomic_helper_damage_merged
+  drm_atomic_helper_dirtyfb
+  drm_atomic_helper_plane_destroy_state
+  drm_atomic_helper_plane_duplicate_state
+  drm_atomic_helper_plane_reset
+  drm_connector_attach_edid_property
+  drm_connector_register
+  drm_connector_unregister
+  drm_connector_update_edid_property
+  drm_cvt_mode
+  drm_dev_enter
+  drm_dev_exit
+  drm_dev_get
+  drm_dev_printk
+  drm_dev_unplug
+  drm_do_get_edid
+  drm_event_reserve_init
+  drm_firmware_drivers_only
+  drm_framebuffer_init
+  drm_gem_dmabuf_mmap
+  drm_gem_dmabuf_release
+  drm_gem_dmabuf_vmap
+  drm_gem_dmabuf_vunmap
+  drm_gem_fb_create_handle
+  drm_gem_fb_destroy
+  drm_gem_free_mmap_offset
+  drm_gem_lock_reservations
+  drm_gem_map_attach
+  drm_gem_map_detach
+  drm_gem_map_dma_buf
+  drm_gem_object_lookup
+  drm_gem_prime_import
+  drm_gem_shmem_create
+  drm_gem_shmem_free
+  drm_gem_shmem_get_pages_sgt
+  drm_gem_shmem_get_sg_table
+  drm_gem_shmem_mmap
+  drm_gem_shmem_pin
+  drm_gem_shmem_print_info
+  drm_gem_shmem_unpin
+  drm_gem_shmem_vmap
+  drm_gem_shmem_vm_ops
+  drm_gem_shmem_vunmap
+  drm_gem_unlock_reservations
+  drm_gem_unmap_dma_buf
+  drm_helper_hpd_irq_event
+  drm_helper_mode_fill_fb_struct
+  drm_kms_helper_hotplug_event
+  drmm_kfree
+  drmm_kmalloc
+  drm_mm_init
+  drm_mm_insert_node_in_range
+  drm_mm_print
+  drm_mm_remove_node
+  drm_mm_takedown
+  drm_mode_probed_add
+  __drm_printfn_seq_file
+  __drm_puts_seq_file
+  drm_send_event
+  __get_task_comm
+  iomem_resource
+  is_virtio_device
+  memdup_user
+  sync_file_get_fence
+  __traceiter_dma_fence_emit
+  __tracepoint_dma_fence_emit
+  vmemdup_user
+  ww_mutex_lock_interruptible
+  ww_mutex_unlock
+
+# required by virtio-rng.ko
+  hwrng_register
+  hwrng_unregister
+  wait_for_completion_killable
+
+# required by virtio_balloon.ko
+  adjust_managed_page_count
+  all_vm_events
+  balloon_mops
+  balloon_page_alloc
+  balloon_page_dequeue
+  balloon_page_enqueue
+  init_on_free
+  mutex_trylock
+  page_relinquish
+  page_reporting_register
+  page_reporting_unregister
+  register_oom_notifier
+  si_mem_available
+  si_meminfo
+  system_freezable_wq
+  unregister_oom_notifier
+  virtqueue_disable_dma_api_for_buffers
+  vm_event_states
+  vm_node_stat
+
+# required by virtio_blk.ko
+  blk_execute_rq
+  __blk_mq_alloc_disk
+  blk_mq_alloc_request
+  blk_mq_alloc_tag_set
+  blk_mq_complete_request
+  blk_mq_end_request
+  blk_mq_end_request_batch
+  blk_mq_free_request
+  blk_mq_free_tag_set
+  blk_mq_map_queues
+  blk_mq_quiesce_queue
+  blk_mq_requeue_request
+  blk_mq_start_request
+  blk_mq_start_stopped_hw_queues
+  blk_mq_stop_hw_queue
+  blk_mq_unquiesce_queue
+  blk_mq_virtio_map_queues
+  blk_queue_alignment_offset
+  blk_queue_max_discard_segments
+  blk_queue_max_hw_sectors
+  blk_queue_max_secure_erase_sectors
+  blk_queue_max_segments
+  blk_queue_max_segment_size
+  blk_queue_write_cache
+  blk_rq_map_kern
+  __blk_rq_map_sg
+  blk_status_to_errno
+  set_disk_ro
+  sg_alloc_table_chained
+  sg_free_table_chained
+  string_get_size
+  sysfs_emit
+  __sysfs_match_string
+  virtio_max_dma_size
+
+# required by virtio_console.ko
+  cdev_add
+  cdev_alloc
+  cdev_del
+  device_destroy
+  fasync_helper
+  hvc_alloc
+  hvc_instantiate
+  hvc_kick
+  hvc_poll
+  hvc_remove
+  __hvc_resize
+  kill_fasync
+  kobject_uevent
+  pipe_lock
+  pipe_unlock
+  __register_chrdev
+  __splice_from_pipe
+  __unregister_chrdev
+  wait_for_completion
+
+# required by virtio_mmio.ko
+  device_for_each_child
+  device_register
+  devm_platform_ioremap_resource
+
+# required by virtio_net.ko
+  bpf_dispatcher_xdp_func
+  bpf_master_redirect_enabled_key
+  bpf_prog_add
+  bpf_prog_put
+  bpf_prog_sub
+  bpf_stats_enabled_key
+  bpf_warn_invalid_xdp_action
+  cpumask_next_wrap
+  cpus_read_lock
+  cpus_read_unlock
+  eth_commit_mac_addr_change
+  eth_prepare_mac_addr_change
+  ethtool_sprintf
+  ethtool_virtdev_set_link_ksettings
+  eth_type_trans
+  _find_first_bit
+  flow_keys_basic_dissector
+  jiffies_to_usecs
+  __napi_alloc_skb
+  napi_consume_skb
+  netdev_notify_peers
+  netdev_rss_key_fill
+  netif_device_attach
+  netif_set_real_num_rx_queues
+  netif_set_real_num_tx_queues
+  __netif_set_xps_queue
+  netif_tx_lock
+  netif_tx_unlock
+  net_ratelimit
+  __pskb_pull_tail
+  sched_clock
+  skb_coalesce_rx_frag
+  __skb_flow_dissect
+  skb_page_frag_refill
+  skb_partial_csum_set
+  softnet_data
+  __traceiter_xdp_exception
+  __tracepoint_xdp_exception
+  virtqueue_add_inbuf_ctx
+  virtqueue_enable_cb_delayed
+  virtqueue_enable_cb_prepare
+  virtqueue_get_buf_ctx
+  virtqueue_poll
+  virtqueue_resize
+  xdp_convert_zc_to_xdp_frame
+  xdp_do_flush
+  xdp_do_redirect
+  xdp_master_redirect
+  xdp_return_frame
+  xdp_return_frame_rx_napi
+  __xdp_rxq_info_reg
+  xdp_rxq_info_reg_mem_model
+  xdp_rxq_info_unreg
+  xdp_warn
+
+# required by virtio_pci.ko
+  __irq_apply_affinity_hint
+  pci_alloc_irq_vectors_affinity
+  pci_device_is_present
+  pci_disable_sriov
+  pci_enable_sriov
+  pci_find_ext_capability
+  pci_free_irq_vectors
+  pci_irq_get_affinity
+  pci_irq_vector
+  pci_set_master
+  pci_vfs_assigned
+
+# required by virtio_pci_legacy_dev.ko
+  pci_iomap
+
+# required by virtio_pci_modern_dev.ko
+  pci_iomap_range
+  pci_release_selected_regions
+  pci_request_selected_regions
+
+# required by virtio_pmem.ko
+  nvdimm_bus_register
+  nvdimm_bus_unregister
+  nvdimm_pmem_region_create
+
+# required by virtio_snd.ko
+  snd_card_free
+  snd_card_new
+  snd_card_register
+  snd_jack_new
+  snd_jack_report
+  snd_pcm_add_chmap_ctls
+  snd_pcm_format_physical_width
+  snd_pcm_hw_constraint_integer
+  snd_pcm_lib_ioctl
+  snd_pcm_new
+  snd_pcm_period_elapsed
+  snd_pcm_set_managed_buffer_all
+  snd_pcm_set_ops
+  wait_for_completion_interruptible_timeout
+
+# required by vkms.ko
+  crc32_le
+  __devm_drm_dev_alloc
+  devres_open_group
+  devres_release_group
+  drm_atomic_add_affected_planes
+  drm_atomic_helper_check_wb_encoder_state
+  drm_atomic_helper_cleanup_planes
+  drm_atomic_helper_commit_hw_done
+  drm_atomic_helper_commit_modeset_disables
+  drm_atomic_helper_commit_modeset_enables
+  drm_atomic_helper_commit_planes
+  __drm_atomic_helper_crtc_destroy_state
+  __drm_atomic_helper_crtc_duplicate_state
+  __drm_atomic_helper_crtc_reset
+  drm_atomic_helper_fake_vblank
+  drm_atomic_helper_wait_for_flip_done
+  drm_calc_timestamping_constants
+  drm_crtc_accurate_vblank_count
+  drm_crtc_add_crc_entry
+  drm_crtc_vblank_put
+  drm_encoder_cleanup
+  drm_gem_cleanup_shadow_fb
+  __drm_gem_destroy_shadow_plane_state
+  __drm_gem_duplicate_shadow_plane_state
+  drm_gem_fb_vmap
+  drm_gem_fb_vunmap
+  drm_gem_prepare_shadow_fb
+  __drm_gem_reset_shadow_plane
+  drm_gem_shmem_dumb_create
+  drm_gem_shmem_prime_import_sg_table
+  drm_mode_object_get
+  drm_mode_object_put
+  drm_writeback_connector_init
+  drm_writeback_queue_job
+  drm_writeback_signal_completion
+
+# required by vmw_vsock_virtio_transport.ko
+  sk_error_report
+  synchronize_rcu
+  virtio_transport_connect
+  virtio_transport_deliver_tap_pkt
+  virtio_transport_destruct
+  virtio_transport_dgram_allow
+  virtio_transport_dgram_bind
+  virtio_transport_dgram_dequeue
+  virtio_transport_dgram_enqueue
+  virtio_transport_do_socket_init
+  virtio_transport_free_pkt
+  virtio_transport_notify_buffer_size
+  virtio_transport_notify_poll_in
+  virtio_transport_notify_poll_out
+  virtio_transport_notify_recv_init
+  virtio_transport_notify_recv_post_dequeue
+  virtio_transport_notify_recv_pre_block
+  virtio_transport_notify_recv_pre_dequeue
+  virtio_transport_notify_send_init
+  virtio_transport_notify_send_post_enqueue
+  virtio_transport_notify_send_pre_block
+  virtio_transport_notify_send_pre_enqueue
+  virtio_transport_recv_pkt
+  virtio_transport_release
+  virtio_transport_seqpacket_dequeue
+  virtio_transport_seqpacket_enqueue
+  virtio_transport_seqpacket_has_data
+  virtio_transport_shutdown
+  virtio_transport_stream_allow
+  virtio_transport_stream_dequeue
+  virtio_transport_stream_enqueue
+  virtio_transport_stream_has_data
+  virtio_transport_stream_has_space
+  virtio_transport_stream_is_active
+  virtio_transport_stream_rcvhiwat
+  vsock_core_register
+  vsock_core_unregister
+  vsock_for_each_connected_socket
+
+# required by zram.ko
+  __alloc_percpu
+  bdev_end_io_acct
+  bdev_start_io_acct
+  bio_endio
+  bio_end_io_acct_remapped
+  bio_start_io_acct
+  __blk_alloc_disk
+  blk_queue_flag_clear
+  blk_queue_flag_set
+  __class_register
+  class_unregister
+  __cpu_possible_mask
+  crypto_alloc_base
+  crypto_comp_compress
+  crypto_comp_decompress
+  crypto_destroy_tfm
+  crypto_has_alg
+  down_read
+  down_write
+  _find_next_bit
+  flush_dcache_page
+  free_percpu
+  idr_find
+  idr_for_each
+  __init_rwsem
+  kstrtou16
+  kstrtoull
+  memset64
+  mutex_is_locked
+  page_endio
+  set_capacity
+  sync_blockdev
+  sysfs_streq
+  up_read
+  up_write
+  vzalloc
+
+# required by zsmalloc.ko
+  dec_zone_page_state
+  folio_wait_bit
+  inc_zone_page_state
+  kstrdup
+  _raw_read_lock
+  _raw_read_unlock
+  _raw_write_lock
+  _raw_write_unlock
diff --git a/android/abi_gki_protected_exports b/android/abi_gki_protected_exports
new file mode 100644
index 0000000..0cfdef3
--- /dev/null
+++ b/android/abi_gki_protected_exports
@@ -0,0 +1,538 @@
+__cfg80211_alloc_event_skb
+__cfg80211_alloc_reply_skb
+__cfg80211_radar_event
+__cfg80211_send_event_skb
+__hci_cmd_send
+__hci_cmd_sync
+__hci_cmd_sync_ev
+__hci_cmd_sync_sk
+__hci_cmd_sync_status
+__hci_cmd_sync_status_sk
+__ieee80211_schedule_txq
+__nfc_alloc_vendor_cmd_reply_skb
+alloc_can_err_skb
+alloc_can_skb
+alloc_candev_mqs
+alloc_canfd_skb
+alloc_canxl_skb
+arc4_crypt
+arc4_setkey
+baswap
+bridge_tunnel_header
+bt_accept_dequeue
+bt_accept_enqueue
+bt_accept_unlink
+bt_debugfs
+bt_err
+bt_err_ratelimited
+bt_info
+bt_procfs_cleanup
+bt_procfs_init
+bt_sock_ioctl
+bt_sock_link
+bt_sock_poll
+bt_sock_reclassify_lock
+bt_sock_recvmsg
+bt_sock_register
+bt_sock_stream_recvmsg
+bt_sock_unlink
+bt_sock_unregister
+bt_sock_wait_ready
+bt_sock_wait_state
+bt_status
+bt_to_errno
+bt_warn
+bt_warn_ratelimited
+btbcm_check_bdaddr
+btbcm_finalize
+btbcm_initialize
+btbcm_patchram
+btbcm_read_pcm_int_params
+btbcm_set_bdaddr
+btbcm_setup_apple
+btbcm_setup_patchram
+btbcm_write_pcm_int_params
+can_bus_off
+can_change_mtu
+can_change_state
+can_dropped_invalid_skb
+can_eth_ioctl_hwts
+can_ethtool_op_get_ts_info_hwts
+can_fd_dlc2len
+can_fd_len2dlc
+can_free_echo_skb
+can_get_echo_skb
+can_get_state_str
+can_proto_register
+can_proto_unregister
+can_put_echo_skb
+can_rx_offload_add_fifo
+can_rx_offload_add_manual
+can_rx_offload_add_timestamp
+can_rx_offload_del
+can_rx_offload_enable
+can_rx_offload_get_echo_skb
+can_rx_offload_irq_finish
+can_rx_offload_irq_offload_fifo
+can_rx_offload_irq_offload_timestamp
+can_rx_offload_queue_tail
+can_rx_offload_queue_timestamp
+can_rx_offload_threaded_irq_finish
+can_rx_register
+can_rx_unregister
+can_send
+can_skb_get_frame_len
+can_sock_destruct
+cfg80211_any_usable_channels
+cfg80211_assoc_comeback
+cfg80211_assoc_failure
+cfg80211_auth_timeout
+cfg80211_background_cac_abort
+cfg80211_bss_color_notify
+cfg80211_bss_flush
+cfg80211_bss_iter
+cfg80211_cac_event
+cfg80211_calculate_bitrate
+cfg80211_ch_switch_notify
+cfg80211_ch_switch_started_notify
+cfg80211_chandef_compatible
+cfg80211_chandef_create
+cfg80211_chandef_dfs_required
+cfg80211_chandef_usable
+cfg80211_chandef_valid
+cfg80211_check_combinations
+cfg80211_check_station_change
+cfg80211_classify8021d
+cfg80211_conn_failed
+cfg80211_connect_done
+cfg80211_control_port_tx_status
+cfg80211_cqm_beacon_loss_notify
+cfg80211_cqm_pktloss_notify
+cfg80211_cqm_rssi_notify
+cfg80211_cqm_txe_notify
+cfg80211_crit_proto_stopped
+cfg80211_del_sta_sinfo
+cfg80211_disconnected
+cfg80211_external_auth_request
+cfg80211_find_elem_match
+cfg80211_find_vendor_elem
+cfg80211_free_nan_func
+cfg80211_ft_event
+cfg80211_get_bss
+cfg80211_get_drvinfo
+cfg80211_get_ies_channel_number
+cfg80211_get_iftype_ext_capa
+cfg80211_get_p2p_attr
+cfg80211_get_station
+cfg80211_gtk_rekey_notify
+cfg80211_ibss_joined
+cfg80211_iftype_allowed
+cfg80211_inform_bss_data
+cfg80211_inform_bss_frame_data
+cfg80211_is_element_inherited
+cfg80211_iter_combinations
+cfg80211_merge_profile
+cfg80211_mgmt_tx_status_ext
+cfg80211_michael_mic_failure
+cfg80211_nan_func_terminated
+cfg80211_nan_match
+cfg80211_new_sta
+cfg80211_notify_new_peer_candidate
+cfg80211_pmksa_candidate_notify
+cfg80211_pmsr_complete
+cfg80211_pmsr_report
+cfg80211_port_authorized
+cfg80211_probe_status
+cfg80211_put_bss
+cfg80211_ready_on_channel
+cfg80211_ref_bss
+cfg80211_reg_can_beacon
+cfg80211_reg_can_beacon_relax
+cfg80211_register_netdevice
+cfg80211_remain_on_channel_expired
+cfg80211_report_obss_beacon_khz
+cfg80211_report_wowlan_wakeup
+cfg80211_roamed
+cfg80211_rx_assoc_resp
+cfg80211_rx_control_port
+cfg80211_rx_mgmt_ext
+cfg80211_rx_mlme_mgmt
+cfg80211_rx_spurious_frame
+cfg80211_rx_unexpected_4addr_frame
+cfg80211_rx_unprot_mlme_mgmt
+cfg80211_scan_done
+cfg80211_sched_scan_results
+cfg80211_sched_scan_stopped
+cfg80211_sched_scan_stopped_locked
+cfg80211_send_layer2_update
+cfg80211_shutdown_all_interfaces
+cfg80211_sinfo_alloc_tid_stats
+cfg80211_sta_opmode_change_notify
+cfg80211_stop_iface
+cfg80211_tdls_oper_request
+cfg80211_tx_mgmt_expired
+cfg80211_tx_mlme_mgmt
+cfg80211_unlink_bss
+cfg80211_unregister_wdev
+cfg80211_update_owe_info_event
+cfg80211_vendor_cmd_get_sender
+cfg80211_vendor_cmd_reply
+close_candev
+free_candev
+freq_reg_info
+get_wiphy_regdom
+h4_recv_buf
+hci_alloc_dev_priv
+hci_cmd_sync
+hci_cmd_sync_cancel
+hci_cmd_sync_queue
+hci_conn_check_secure
+hci_conn_security
+hci_conn_switch_role
+hci_free_dev
+hci_get_route
+hci_mgmt_chan_register
+hci_mgmt_chan_unregister
+hci_recv_diag
+hci_recv_frame
+hci_register_cb
+hci_register_dev
+hci_release_dev
+hci_reset_dev
+hci_resume_dev
+hci_set_fw_info
+hci_set_hw_info
+hci_suspend_dev
+hci_uart_register_device
+hci_uart_tx_wakeup
+hci_uart_unregister_device
+hci_unregister_cb
+hci_unregister_dev
+hidp_hid_driver
+ieee80211_alloc_hw_nm
+ieee80211_amsdu_to_8023s
+ieee80211_ap_probereq_get
+ieee80211_ave_rssi
+ieee80211_beacon_cntdwn_is_complete
+ieee80211_beacon_get_template
+ieee80211_beacon_get_tim
+ieee80211_beacon_loss
+ieee80211_beacon_set_cntdwn
+ieee80211_beacon_update_cntdwn
+ieee80211_bss_get_elem
+ieee80211_calc_rx_airtime
+ieee80211_calc_tx_airtime
+ieee80211_chandef_to_operating_class
+ieee80211_channel_switch_disconnect
+ieee80211_channel_to_freq_khz
+ieee80211_chswitch_done
+ieee80211_color_change_finish
+ieee80211_connection_loss
+ieee80211_cqm_beacon_loss_notify
+ieee80211_cqm_rssi_notify
+ieee80211_csa_finish
+ieee80211_ctstoself_duration
+ieee80211_ctstoself_get
+ieee80211_data_to_8023_exthdr
+ieee80211_disable_rssi_reports
+ieee80211_disconnect
+ieee80211_enable_rssi_reports
+ieee80211_find_sta
+ieee80211_find_sta_by_ifaddr
+ieee80211_find_sta_by_link_addrs
+ieee80211_free_hw
+ieee80211_free_txskb
+ieee80211_freq_khz_to_channel
+ieee80211_generic_frame_duration
+ieee80211_get_bssid
+ieee80211_get_buffered_bc
+ieee80211_get_channel_khz
+ieee80211_get_fils_discovery_tmpl
+ieee80211_get_hdrlen_from_skb
+ieee80211_get_key_rx_seq
+ieee80211_get_mesh_hdrlen
+ieee80211_get_num_supported_channels
+ieee80211_get_response_rate
+ieee80211_get_tkip_p1k_iv
+ieee80211_get_tkip_p2k
+ieee80211_get_tkip_rx_p1k
+ieee80211_get_tx_rates
+ieee80211_get_unsol_bcast_probe_resp_tmpl
+ieee80211_get_vht_max_nss
+ieee80211_gtk_rekey_add
+ieee80211_gtk_rekey_notify
+ieee80211_hdrlen
+ieee80211_hw_restart_disconnect
+ieee80211_ie_split_ric
+ieee80211_iter_chan_contexts_atomic
+ieee80211_iter_keys
+ieee80211_iter_keys_rcu
+ieee80211_iterate_active_interfaces_atomic
+ieee80211_iterate_active_interfaces_mtx
+ieee80211_iterate_interfaces
+ieee80211_iterate_stations
+ieee80211_iterate_stations_atomic
+ieee80211_key_mic_failure
+ieee80211_key_replay
+ieee80211_manage_rx_ba_offl
+ieee80211_mandatory_rates
+ieee80211_mark_rx_ba_filtered_frames
+ieee80211_nan_func_match
+ieee80211_nan_func_terminated
+ieee80211_next_txq
+ieee80211_nullfunc_get
+ieee80211_operating_class_to_band
+ieee80211_parse_p2p_noa
+ieee80211_probereq_get
+ieee80211_proberesp_get
+ieee80211_pspoll_get
+ieee80211_queue_delayed_work
+ieee80211_queue_stopped
+ieee80211_queue_work
+ieee80211_radar_detected
+ieee80211_radiotap_iterator_init
+ieee80211_radiotap_iterator_next
+ieee80211_rate_control_register
+ieee80211_rate_control_unregister
+ieee80211_ready_on_channel
+ieee80211_register_hw
+ieee80211_remain_on_channel_expired
+ieee80211_remove_key
+ieee80211_report_low_ack
+ieee80211_report_wowlan_wakeup
+ieee80211_request_smps
+ieee80211_reserve_tid
+ieee80211_restart_hw
+ieee80211_resume_disconnect
+ieee80211_rts_duration
+ieee80211_rts_get
+ieee80211_rx_ba_timer_expired
+ieee80211_rx_irqsafe
+ieee80211_rx_list
+ieee80211_rx_napi
+ieee80211_s1g_channel_width
+ieee80211_scan_completed
+ieee80211_sched_scan_results
+ieee80211_sched_scan_stopped
+ieee80211_send_bar
+ieee80211_send_eosp_nullfunc
+ieee80211_set_active_links
+ieee80211_set_active_links_async
+ieee80211_set_key_rx_seq
+ieee80211_sta_block_awake
+ieee80211_sta_eosp
+ieee80211_sta_ps_transition
+ieee80211_sta_pspoll
+ieee80211_sta_recalc_aggregates
+ieee80211_sta_register_airtime
+ieee80211_sta_set_buffered
+ieee80211_sta_uapsd_trigger
+ieee80211_start_tx_ba_cb_irqsafe
+ieee80211_start_tx_ba_session
+ieee80211_stop_queue
+ieee80211_stop_queues
+ieee80211_stop_rx_ba_session
+ieee80211_stop_tx_ba_cb_irqsafe
+ieee80211_stop_tx_ba_session
+ieee80211_tdls_oper_request
+ieee80211_tkip_add_iv
+ieee80211_tx_dequeue
+ieee80211_tx_prepare_skb
+ieee80211_tx_rate_update
+ieee80211_tx_status
+ieee80211_tx_status_8023
+ieee80211_tx_status_ext
+ieee80211_tx_status_irqsafe
+ieee80211_txq_airtime_check
+ieee80211_txq_get_depth
+ieee80211_txq_may_transmit
+ieee80211_txq_schedule_start
+ieee80211_unregister_hw
+ieee80211_unreserve_tid
+ieee80211_update_mu_groups
+ieee80211_update_p2p_noa
+ieee80211_vif_to_wdev
+ieee80211_wake_queue
+ieee80211_wake_queues
+ieee802154_alloc_hw
+ieee802154_configure_durations
+ieee802154_free_hw
+ieee802154_hdr_peek
+ieee802154_hdr_peek_addrs
+ieee802154_hdr_pull
+ieee802154_hdr_push
+ieee802154_max_payload
+ieee802154_register_hw
+ieee802154_rx_irqsafe
+ieee802154_stop_queue
+ieee802154_unregister_hw
+ieee802154_wake_queue
+ieee802154_xmit_complete
+ieee802154_xmit_error
+ieee802154_xmit_hw_error
+ieeee80211_obss_color_collision_notify
+l2cap_add_psm
+l2cap_chan_close
+l2cap_chan_connect
+l2cap_chan_create
+l2cap_chan_del
+l2cap_chan_list
+l2cap_chan_put
+l2cap_chan_send
+l2cap_chan_set_defaults
+l2cap_conn_get
+l2cap_conn_put
+l2cap_is_socket
+l2cap_register_user
+l2cap_unregister_user
+l2tp_recv_common
+l2tp_session_create
+l2tp_session_dec_refcount
+l2tp_session_delete
+l2tp_session_get
+l2tp_session_get_by_ifname
+l2tp_session_get_nth
+l2tp_session_inc_refcount
+l2tp_session_register
+l2tp_session_set_header_len
+l2tp_sk_to_tunnel
+l2tp_tunnel_create
+l2tp_tunnel_dec_refcount
+l2tp_tunnel_delete
+l2tp_tunnel_get
+l2tp_tunnel_get_nth
+l2tp_tunnel_get_session
+l2tp_tunnel_inc_refcount
+l2tp_tunnel_register
+l2tp_udp_encap_recv
+l2tp_xmit_skb
+lowpan_header_compress
+lowpan_header_decompress
+lowpan_nhc_add
+lowpan_nhc_del
+lowpan_register_netdev
+lowpan_register_netdevice
+lowpan_unregister_netdev
+lowpan_unregister_netdevice
+nfc_add_se
+nfc_alloc_recv_skb
+nfc_allocate_device
+nfc_class
+nfc_dep_link_is_up
+nfc_driver_failure
+nfc_find_se
+nfc_fw_download_done
+nfc_get_local_general_bytes
+nfc_proto_register
+nfc_proto_unregister
+nfc_register_device
+nfc_remove_se
+nfc_se_connectivity
+nfc_se_transaction
+nfc_send_to_raw_sock
+nfc_set_remote_general_bytes
+nfc_target_lost
+nfc_targets_found
+nfc_tm_activated
+nfc_tm_data_received
+nfc_tm_deactivated
+nfc_unregister_device
+nfc_vendor_cmd_reply
+of_can_transceiver
+open_candev
+ppp_channel_index
+ppp_dev_name
+ppp_input
+ppp_input_error
+ppp_output_wakeup
+ppp_register_channel
+ppp_register_compressor
+ppp_register_net_channel
+ppp_unit_number
+ppp_unregister_channel
+ppp_unregister_compressor
+pppox_compat_ioctl
+pppox_ioctl
+pppox_unbind_sock
+qca_read_soc_version
+qca_send_pre_shutdown_cmd
+qca_set_bdaddr
+qca_set_bdaddr_rome
+qca_uart_setup
+rate_control_set_rates
+reg_initiator_name
+reg_query_regdb_wmm
+register_candev
+register_pppox_proto
+regulatory_hint
+regulatory_pre_cac_allowed
+regulatory_set_wiphy_regd
+regulatory_set_wiphy_regd_sync
+rfc1042_header
+rfkill_alloc
+rfkill_blocked
+rfkill_destroy
+rfkill_find_type
+rfkill_get_led_trigger_name
+rfkill_init_sw_state
+rfkill_pause_polling
+rfkill_register
+rfkill_resume_polling
+rfkill_set_hw_state_reason
+rfkill_set_led_trigger_name
+rfkill_set_states
+rfkill_set_sw_state
+rfkill_soft_blocked
+rfkill_unregister
+safe_candev_priv
+slhc_compress
+slhc_free
+slhc_init
+slhc_remember
+slhc_toss
+slhc_uncompress
+tipc_dump_done
+tipc_dump_start
+tipc_nl_sk_walk
+tipc_sk_fill_sock_diag
+unregister_candev
+unregister_pppox_proto
+usb_serial_claim_interface
+usb_serial_deregister_drivers
+usb_serial_generic_chars_in_buffer
+usb_serial_generic_close
+usb_serial_generic_get_icount
+usb_serial_generic_open
+usb_serial_generic_process_read_urb
+usb_serial_generic_read_bulk_callback
+usb_serial_generic_resume
+usb_serial_generic_submit_read_urbs
+usb_serial_generic_throttle
+usb_serial_generic_tiocmiwait
+usb_serial_generic_unthrottle
+usb_serial_generic_wait_until_sent
+usb_serial_generic_write
+usb_serial_generic_write_bulk_callback
+usb_serial_generic_write_start
+usb_serial_handle_dcd_change
+usb_serial_port_softint
+usb_serial_register_drivers
+usb_serial_resume
+usb_serial_suspend
+wdev_chandef
+wdev_to_ieee80211_vif
+wiphy_apply_custom_regulatory
+wiphy_free
+wiphy_new_nm
+wiphy_read_of_freq_limits
+wiphy_register
+wiphy_rfkill_set_hw_state_reason
+wiphy_rfkill_start_polling
+wiphy_to_ieee80211_hw
+wiphy_unregister
+wpan_phy_find
+wpan_phy_for_each
+wpan_phy_free
+wpan_phy_new
+wpan_phy_register
+wpan_phy_unregister
\ No newline at end of file
diff --git a/android/gki_protected_modules b/android/gki_protected_modules
new file mode 100644
index 0000000..fa169fc
--- /dev/null
+++ b/android/gki_protected_modules
@@ -0,0 +1,47 @@
+drivers/bluetooth/btbcm.ko
+drivers/bluetooth/btqca.ko
+drivers/bluetooth/btsdio.ko
+drivers/bluetooth/hci_uart.ko
+drivers/net/can/dev/can-dev.ko
+drivers/net/can/slcan/slcan.ko
+drivers/net/can/vcan.ko
+drivers/net/ppp/bsd_comp.ko
+drivers/net/ppp/ppp_deflate.ko
+drivers/net/ppp/ppp_generic.ko
+drivers/net/ppp/ppp_mppe.ko
+drivers/net/ppp/pppox.ko
+drivers/net/ppp/pptp.ko
+drivers/net/slip/slhc.ko
+drivers/usb/class/cdc-acm.ko
+drivers/usb/serial/ftdi_sio.ko
+drivers/usb/serial/usbserial.ko
+lib/crypto/libarc4.ko
+net/6lowpan/6lowpan.ko
+net/6lowpan/nhc_dest.ko
+net/6lowpan/nhc_fragment.ko
+net/6lowpan/nhc_hop.ko
+net/6lowpan/nhc_ipv6.ko
+net/6lowpan/nhc_mobility.ko
+net/6lowpan/nhc_routing.ko
+net/6lowpan/nhc_udp.ko
+net/8021q/8021q.ko
+net/bluetooth/bluetooth.ko
+net/bluetooth/hidp/hidp.ko
+net/bluetooth/rfcomm/rfcomm.ko
+net/can/can.ko
+net/can/can-bcm.ko
+net/can/can-gw.ko
+net/can/can-raw.ko
+net/ieee802154/6lowpan/ieee802154_6lowpan.ko
+net/ieee802154/ieee802154.ko
+net/ieee802154/ieee802154_socket.ko
+net/l2tp/l2tp_core.ko
+net/l2tp/l2tp_ppp.ko
+net/mac80211/mac80211.ko
+net/mac802154/mac802154.ko
+net/nfc/nfc.ko
+net/rfkill/rfkill.ko
+net/tipc/diag.ko
+net/tipc/tipc.ko
+net/wireless/cfg80211.ko
+
diff --git a/android/gki_system_dlkm_modules b/android/gki_system_dlkm_modules
new file mode 100644
index 0000000..c790142
--- /dev/null
+++ b/android/gki_system_dlkm_modules
@@ -0,0 +1,49 @@
+drivers/block/zram/zram.ko
+drivers/bluetooth/btbcm.ko
+drivers/bluetooth/btqca.ko
+drivers/bluetooth/btsdio.ko
+drivers/bluetooth/hci_uart.ko
+drivers/net/can/dev/can-dev.ko
+drivers/net/can/slcan/slcan.ko
+drivers/net/can/vcan.ko
+drivers/net/ppp/bsd_comp.ko
+drivers/net/ppp/ppp_deflate.ko
+drivers/net/ppp/ppp_generic.ko
+drivers/net/ppp/ppp_mppe.ko
+drivers/net/ppp/pppox.ko
+drivers/net/ppp/pptp.ko
+drivers/net/slip/slhc.ko
+drivers/usb/class/cdc-acm.ko
+drivers/usb/serial/ftdi_sio.ko
+drivers/usb/serial/usbserial.ko
+lib/crypto/libarc4.ko
+mm/zsmalloc.ko
+net/6lowpan/6lowpan.ko
+net/6lowpan/nhc_dest.ko
+net/6lowpan/nhc_fragment.ko
+net/6lowpan/nhc_hop.ko
+net/6lowpan/nhc_ipv6.ko
+net/6lowpan/nhc_mobility.ko
+net/6lowpan/nhc_routing.ko
+net/6lowpan/nhc_udp.ko
+net/8021q/8021q.ko
+net/bluetooth/bluetooth.ko
+net/bluetooth/hidp/hidp.ko
+net/bluetooth/rfcomm/rfcomm.ko
+net/can/can.ko
+net/can/can-bcm.ko
+net/can/can-gw.ko
+net/can/can-raw.ko
+net/ieee802154/6lowpan/ieee802154_6lowpan.ko
+net/ieee802154/ieee802154.ko
+net/ieee802154/ieee802154_socket.ko
+net/l2tp/l2tp_core.ko
+net/l2tp/l2tp_ppp.ko
+net/mac80211/mac80211.ko
+net/mac802154/mac802154.ko
+net/nfc/nfc.ko
+net/rfkill/rfkill.ko
+net/tipc/diag.ko
+net/tipc/tipc.ko
+net/wireless/cfg80211.ko
+
diff --git a/arch/Kconfig b/arch/Kconfig
index 81599f5c..acc75b2 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1316,6 +1316,9 @@
 config ARCH_HAS_MEM_ENCRYPT
 	bool
 
+config ARCH_HAS_MEM_RELINQUISH
+	bool
+
 config ARCH_HAS_CC_PLATFORM
 	bool
 
diff --git a/arch/arm/OWNERS b/arch/arm/OWNERS
new file mode 100644
index 0000000..54f66d6
--- /dev/null
+++ b/arch/arm/OWNERS
@@ -0,0 +1 @@
+include ../arm64/OWNERS
diff --git a/arch/arm/include/asm/hypervisor.h b/arch/arm/include/asm/hypervisor.h
index bd61502..8133c8c 100644
--- a/arch/arm/include/asm/hypervisor.h
+++ b/arch/arm/include/asm/hypervisor.h
@@ -6,5 +6,6 @@
 
 void kvm_init_hyp_services(void);
 bool kvm_arm_hyp_service_available(u32 func_id);
+void kvm_arm_init_hyp_services(void);
 
 #endif
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 978db2d..3751c25a 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -51,6 +51,10 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/ipi.h>
 
+EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_raise);
+EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_entry);
+EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_exit);
+
 /*
  * as from 2.5, kernels no longer have an init_tasks structure
  * so we need some other way of telling a new secondary core
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 505c8a1..460761d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -28,9 +28,12 @@
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE
+	select ARCH_HAS_IOREMAP_PHYS_HOOKS
 	select ARCH_HAS_KCOV
 	select ARCH_HAS_KEEPINITRD
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
+	select ARCH_HAS_MEM_ENCRYPT
+	select ARCH_HAS_MEM_RELINQUISH
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_PTE_DEVMAP
 	select ARCH_HAS_PTE_SPECIAL
@@ -141,6 +144,7 @@
 	select GENERIC_GETTIMEOFDAY
 	select GENERIC_VDSO_TIME_NS
 	select HARDIRQS_SW_RESEND
+	select HAVE_MOD_ARCH_SPECIFIC if (ARM64_MODULE_PLTS || KVM)
 	select HAVE_MOVE_PMD
 	select HAVE_MOVE_PUD
 	select HAVE_PCI
@@ -2046,7 +2050,6 @@
 config ARM64_MODULE_PLTS
 	bool "Use PLTs to allow module memory to spill over into vmalloc area"
 	depends on MODULES
-	select HAVE_MOD_ARCH_SPECIFIC
 	help
 	  Allocate PLTs when loading modules so that jumps and calls whose
 	  targets are too far away for their relative offsets to be encoded
@@ -2192,6 +2195,12 @@
 	  the boot loader doesn't provide any, the default kernel command
 	  string provided in CMDLINE will be used.
 
+config CMDLINE_EXTEND
+	bool "Extend bootloader kernel arguments"
+	help
+	  The command-line arguments provided by the boot loader will be
+	  appended to the default kernel command string.
+
 config CMDLINE_FORCE
 	bool "Always use the default kernel command string"
 	help
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 5e56d26..34691c2 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -155,7 +155,10 @@
 KBUILD_IMAGE	:= $(boot)/vmlinuz.efi
 endif
 
+# Don't compile Image in mixed build with "all" target
+ifndef KBUILD_MIXED_TREE
 all:	$(notdir $(KBUILD_IMAGE))
+endif
 
 
 Image vmlinuz.efi: vmlinux
@@ -187,6 +190,11 @@
   endif
 endif
 
+ifeq ($(CONFIG_KVM),y)
+archscripts:
+	$(Q)$(MAKE) $(build)=arch/arm64/tools gen-hyprel
+endif
+
 ifeq ($(KBUILD_EXTMOD),)
 # We need to generate vdso-offsets.h before compiling certain files in kernel/.
 # In order to do that, we should use the archprepare target, but we can't since
diff --git a/arch/arm64/Makefile.postlink b/arch/arm64/Makefile.postlink
new file mode 100644
index 0000000..8cf297f
--- /dev/null
+++ b/arch/arm64/Makefile.postlink
@@ -0,0 +1,49 @@
+# SPDX-License-Identifier: GPL-2.0
+
+#
+# This file is included by the generic Kbuild makefile to permit the
+# architecture to perform postlink actions on vmlinux and any .ko module file.
+# In this case, we only need it for fips140.ko, which needs some postprocessing
+# for the integrity check mandated by FIPS. This involves making copies of the
+# relocation sections so that the module will have access to them at
+# initialization time, and calculating and injecting a HMAC digest into the
+# module. All other targets are NOPs.
+#
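+# Kbuild picks this makefile up only when it exists: it is invoked after
+# vmlinux is linked (from the top-level Makefile's cmd_link-vmlinux), after
+# each module is linked (from scripts/Makefile.modfinal), and with "clean"
+# when cleaning, which is why the do-nothing vmlinux and %.ko rules below
+# still have to be present.
+#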
+
+PHONY := __archpost
+__archpost:
+
+-include include/config/auto.conf
+include scripts/Kbuild.include
+
+CMD_FIPS140_GEN_HMAC = crypto/fips140_gen_hmac
+quiet_cmd_gen_hmac = HMAC    $@
+      cmd_gen_hmac = $(OBJCOPY) $@ \
+	--dump-section=$(shell $(READELF) -SW $@|grep -Eo '\.rela\.text\S*')=$@.rela.text \
+	--dump-section=$(shell $(READELF) -SW $@|grep -Eo '\.rela\.rodata\S*')=$@.rela.rodata && \
+	$(OBJCOPY) $@ \
+	--add-section=.init.rela.text=$@.rela.text \
+	--add-section=.init.rela.rodata=$@.rela.rodata \
+	--set-section-flags=.init.rela.text=alloc,readonly \
+	--set-section-flags=.init.rela.rodata=alloc,readonly && \
+	$(CMD_FIPS140_GEN_HMAC) $@
+
+# `@true` prevents complaints when there is nothing to be done
+
+vmlinux: FORCE
+	@true
+
+$(objtree)/crypto/fips140.ko: FORCE
+	$(call cmd,gen_hmac)
+
+%.ko: FORCE
+	@true
+
+clean:
+	rm -f $(objtree)/crypto/fips140.ko.rela.*
+
+PHONY += FORCE clean
+
+FORCE:
+
+.PHONY: $(PHONY)
diff --git a/arch/arm64/OWNERS b/arch/arm64/OWNERS
new file mode 100644
index 0000000..f362e24
--- /dev/null
+++ b/arch/arm64/OWNERS
@@ -0,0 +1,4 @@
+per-file crypto/**=file:/crypto/OWNERS
+per-file {include,kernel,kvm,lib}/**=mzyngier@google.com,willdeacon@google.com
+per-file mm/**=file:/mm/OWNERS
+per-file net/**=file:/net/OWNERS
diff --git a/arch/arm64/configs/16k_gki.fragment b/arch/arm64/configs/16k_gki.fragment
new file mode 100644
index 0000000..b923493
--- /dev/null
+++ b/arch/arm64/configs/16k_gki.fragment
@@ -0,0 +1,3 @@
+CONFIG_ARM64_16K_PAGES=y
+# b/241785095
+# CONFIG_INCREMENTAL_FS is not set
diff --git a/arch/arm64/configs/amlogic_gki.fragment b/arch/arm64/configs/amlogic_gki.fragment
new file mode 100644
index 0000000..e577f55
--- /dev/null
+++ b/arch/arm64/configs/amlogic_gki.fragment
@@ -0,0 +1,142 @@
+#
+# Generic drivers/frameworks
+#
+CONFIG_COMMON_CLK_PWM=m
+CONFIG_REGULATOR_PWM=m
+CONFIG_PWRSEQ_EMMC=m
+CONFIG_PWRSEQ_SIMPLE=m
+CONFIG_USB_DWC2=m
+CONFIG_LEDS_GPIO=m
+
+#
+# Networking
+#
+CONFIG_REALTEK_PHY=m
+CONFIG_STMMAC_ETH=m
+CONFIG_STMMAC_PLATFORM=m
+
+#
+# WLAN
+#
+CONFIG_WLAN_VENDOR_BROADCOM=y
+CONFIG_BRCMUTIL=m
+CONFIG_BRCMFMAC=m
+CONFIG_BRCMFMAC_PROTO_BCDC=y
+CONFIG_BRCMFMAC_SDIO=y
+
+#
+# Amlogic
+#
+CONFIG_ARCH_MESON=y
+CONFIG_SERIAL_MESON=m
+CONFIG_SERIAL_MESON_CONSOLE=y
+
+#
+# Amlogic drivers as modules
+#
+
+# core
+CONFIG_MESON_SM=m
+CONFIG_RESET_MESON=m
+CONFIG_MESON_IRQ_GPIO=m
+
+# clocks
+CONFIG_COMMON_CLK_MESON_REGMAP=m
+CONFIG_COMMON_CLK_MESON_DUALDIV=m
+CONFIG_COMMON_CLK_MESON_MPLL=m
+CONFIG_COMMON_CLK_MESON_PHASE=m
+CONFIG_COMMON_CLK_MESON_PLL=m
+CONFIG_COMMON_CLK_MESON_SCLK_DIV=m
+CONFIG_COMMON_CLK_MESON_VID_PLL_DIV=m
+CONFIG_COMMON_CLK_MESON_AO_CLKC=m
+CONFIG_COMMON_CLK_MESON_EE_CLKC=m
+CONFIG_COMMON_CLK_MESON_CPU_DYNDIV=m
+CONFIG_COMMON_CLK_GXBB=m
+CONFIG_COMMON_CLK_AXG=m
+CONFIG_COMMON_CLK_G12A=m
+
+# PHY
+CONFIG_PHY_MESON8B_USB2=m
+CONFIG_PHY_MESON_GXL_USB2=m
+CONFIG_PHY_MESON_G12A_USB2=m
+CONFIG_PHY_MESON_G12A_USB3_PCIE=m
+CONFIG_PHY_MESON_AXG_PCIE=m
+CONFIG_PHY_MESON_AXG_MIPI_PCIE_ANALOG=m
+
+# peripherals
+CONFIG_I2C_MESON=m
+CONFIG_MMC_MESON_GX=m
+CONFIG_HW_RANDOM_MESON=m
+CONFIG_USB_DWC3_MESON_G12A=m
+CONFIG_MESON_SARADC=m
+CONFIG_SPI_MESON_SPICC=m
+CONFIG_SPI_MESON_SPIFC=m
+CONFIG_PCI_MESON=m
+CONFIG_DWMAC_MESON=m
+CONFIG_MDIO_BUS_MUX_MESON_G12A=m
+CONFIG_MESON_GXL_PHY=m
+CONFIG_PINCTRL_MESON=m
+CONFIG_PINCTRL_MESON_GXBB=m
+CONFIG_PINCTRL_MESON_GXL=m
+CONFIG_PINCTRL_MESON_AXG=m
+CONFIG_PINCTRL_MESON_AXG_PMX=m
+CONFIG_PINCTRL_MESON_G12A=m
+CONFIG_MESON_GXBB_WATCHDOG=m
+CONFIG_MESON_WATCHDOG=m
+CONFIG_MTD_NAND_MESON=m
+CONFIG_PWM_MESON=m
+CONFIG_IR_MESON=m
+CONFIG_MESON_EFUSE=m
+CONFIG_MFD_KHADAS_MCU=m
+CONFIG_KHADAS_MCU_FAN_THERMAL=m
+CONFIG_AMLOGIC_THERMAL=m
+
+# sound
+CONFIG_SND_MESON_AXG_SOUND_CARD=m
+CONFIG_SND_MESON_GX_SOUND_CARD=m
+CONFIG_SND_MESON_G12A_TOHDMITX=m
+
+# display / video
+CONFIG_DRM_MESON=m
+CONFIG_DRM_MESON_DW_HDMI=m
+CONFIG_DRM_DW_HDMI=m
+CONFIG_DRM_DW_HDMI_AHB_AUDIO=m
+CONFIG_DRM_DW_HDMI_I2S_AUDIO=m
+CONFIG_DRM_DW_HDMI_CEC=m
+CONFIG_CEC_MESON_AO=m
+CONFIG_CEC_MESON_G12A_AO=m
+CONFIG_VIDEO_MESON_GE2D=m
+
+# SoC drivers
+CONFIG_MESON_CANVAS=m
+CONFIG_MESON_CLK_MEASURE=m
+CONFIG_MESON_GX_PM_DOMAINS=m
+CONFIG_MESON_EE_PM_DOMAINS=m
+CONFIG_MESON_SECURE_PM_DOMAINS=m
+
+#
+# Amlogic drivers disable
+#
+
+# 32-bit SoC drivers
+CONFIG_MESON6_TIMER=n
+CONFIG_MESON_MX_SOCINFO=n
+
+# only needed by DRM on S805X
+CONFIG_MESON_GX_SOCINFO=n
+
+#
+# Debug / Testing
+#
+
+# devtmpfs needed for buildroot/udev module loading, serial console
+#CONFIG_DEVTMPFS=y
+#CONFIG_DEVTMPFS_MOUNT=y
+
+# debug/testing with FB console
+#CONFIG_DRM_KMS_FB_HELPER=y
+#CONFIG_DRM_FBDEV_EMULATION=y
+#CONFIG_FB=y
+#CONFIG_VT=y
+#CONFIG_FRAMEBUFFER_CONSOLE=y
+#CONFIG_LOGO=y
diff --git a/arch/arm64/configs/db845c_gki.fragment b/arch/arm64/configs/db845c_gki.fragment
new file mode 100644
index 0000000..7cb0dc9e
--- /dev/null
+++ b/arch/arm64/configs/db845c_gki.fragment
@@ -0,0 +1,310 @@
+# CONFIG_MODULE_SIG_ALL is not set
+CONFIG_QRTR=m
+CONFIG_QRTR_TUN=m
+CONFIG_SCSI_UFS_QCOM=m
+CONFIG_INPUT_PM8941_PWRKEY=m
+CONFIG_SERIAL_MSM=m
+CONFIG_I2C_QCOM_GENI=m
+CONFIG_I2C_QUP=m
+CONFIG_PINCTRL_MSM=m
+CONFIG_PINCTRL_QCOM_SPMI_PMIC=m
+CONFIG_PINCTRL_SDM845=m
+CONFIG_POWER_RESET_QCOM_PON=m
+CONFIG_SYSCON_REBOOT_MODE=m
+CONFIG_QCOM_TSENS=m
+CONFIG_QCOM_WDT=m
+CONFIG_PM8916_WATCHDOG=m
+CONFIG_MFD_SPMI_PMIC=m
+CONFIG_SPMI_MSM_PMIC_ARB=m
+CONFIG_REGULATOR_QCOM_RPMH=m
+CONFIG_REGULATOR_QCOM_SPMI=m
+CONFIG_DRM_MSM=m
+# CONFIG_DRM_MSM_DSI_28NM_PHY is not set
+# CONFIG_DRM_MSM_DSI_20NM_PHY is not set
+# CONFIG_DRM_MSM_DSI_28NM_8960_PHY is not set
+CONFIG_DRM_LONTIUM_LT9611=m
+CONFIG_USB_OHCI_HCD=m
+CONFIG_USB_OHCI_HCD_PLATFORM=m
+# CONFIG_USB_DWC3_HAPS is not set
+# CONFIG_USB_DWC3_OF_SIMPLE is not set
+CONFIG_USB_GADGET_VBUS_DRAW=500
+# CONFIG_USB_DUMMY_HCD is not set
+CONFIG_USB_ROLE_SWITCH=m
+CONFIG_USB_ULPI_BUS=m
+CONFIG_MMC_SDHCI_MSM=m
+CONFIG_RTC_DRV_PM8XXX=m
+CONFIG_COMMON_CLK_QCOM=m
+CONFIG_SDM_GPUCC_845=m
+CONFIG_QCOM_CLK_RPMH=m
+CONFIG_SDM_DISPCC_845=m
+CONFIG_HWSPINLOCK_QCOM=m
+CONFIG_QCOM_LLCC=m
+CONFIG_QCOM_RMTFS_MEM=m
+CONFIG_QCOM_SMEM=m
+CONFIG_QCOM_SMSM=m
+CONFIG_EXTCON_USB_GPIO=m
+CONFIG_RESET_QCOM_AOSS=m
+CONFIG_RESET_QCOM_PDC=m
+CONFIG_PHY_QCOM_QMP=m
+CONFIG_PHY_QCOM_QUSB2=m
+CONFIG_PHY_QCOM_USB_HS=m
+CONFIG_NVMEM_QCOM_QFPROM=m
+CONFIG_INTERCONNECT_QCOM=y
+CONFIG_INTERCONNECT_QCOM_OSM_L3=m
+CONFIG_INTERCONNECT_QCOM_SDM845=m
+CONFIG_QCOM_RPMH=m
+CONFIG_QCOM_RPMHPD=m
+CONFIG_WLAN_VENDOR_ATH=y
+CONFIG_ATH10K_AHB=y
+CONFIG_ATH10K=m
+CONFIG_ATH10K_PCI=m
+CONFIG_ATH10K_SNOC=m
+CONFIG_QRTR_SMD=m
+CONFIG_QCOM_FASTRPC=m
+CONFIG_QCOM_APCS_IPC=m
+CONFIG_QCOM_Q6V5_COMMON=m
+CONFIG_QCOM_RPROC_COMMON=m
+CONFIG_QCOM_Q6V5_ADSP=m
+CONFIG_QCOM_Q6V5_MSS=m
+CONFIG_QCOM_Q6V5_PAS=m
+CONFIG_QCOM_Q6V5_WCSS=m
+CONFIG_QCOM_SYSMON=m
+CONFIG_RPMSG_QCOM_GLINK_SMEM=m
+CONFIG_RPMSG_QCOM_SMD=m
+CONFIG_QCOM_AOSS_QMP=m
+CONFIG_QCOM_SMP2P=m
+CONFIG_QCOM_SOCINFO=m
+CONFIG_QCOM_APR=m
+CONFIG_QCOM_GLINK_SSR=m
+CONFIG_RPMSG_QCOM_GLINK_RPM=m
+CONFIG_QCOM_PDC=m
+CONFIG_QCOM_SCM=m
+CONFIG_ARM_SMMU=m
+CONFIG_ARM_QCOM_CPUFREQ_HW=m
+# XXX Audio bits start here
+CONFIG_I2C_CHARDEV=m
+CONFIG_I2C_MUX=m
+CONFIG_I2C_MUX_PCA954x=m
+CONFIG_I2C_DESIGNWARE_PLATFORM=m
+CONFIG_I2C_RK3X=m
+CONFIG_SPI_PL022=m
+CONFIG_SPI_QCOM_QSPI=m
+CONFIG_SPI_QUP=m
+CONFIG_SPI_QCOM_GENI=m
+CONFIG_GPIO_WCD934X=m
+CONFIG_MFD_WCD934X=m
+CONFIG_REGULATOR_GPIO=m
+CONFIG_SND_SOC_QCOM=m
+CONFIG_SND_SOC_QCOM_COMMON=m
+CONFIG_SND_SOC_QDSP6_COMMON=m
+CONFIG_SND_SOC_QDSP6_CORE=m
+CONFIG_SND_SOC_QDSP6_AFE=m
+CONFIG_SND_SOC_QDSP6_AFE_DAI=m
+CONFIG_SND_SOC_QDSP6_ADM=m
+CONFIG_SND_SOC_QDSP6_ROUTING=m
+CONFIG_SND_SOC_QDSP6_ASM=m
+CONFIG_SND_SOC_QDSP6_ASM_DAI=m
+CONFIG_SND_SOC_QDSP6=m
+CONFIG_SND_SOC_SDM845=m
+CONFIG_SND_SOC_DMIC=m
+CONFIG_SND_SOC_WCD9335=m
+CONFIG_SND_SOC_WCD934X=m
+CONFIG_SND_SOC_WSA881X=m
+CONFIG_QCOM_BAM_DMA=m
+CONFIG_QCOM_GPI_DMA=m
+CONFIG_SPMI_PMIC_CLKDIV=m
+CONFIG_SOUNDWIRE=m
+CONFIG_SOUNDWIRE_QCOM=m
+CONFIG_SLIMBUS=m
+CONFIG_SLIM_QCOM_NGD_CTRL=m
+CONFIG_DMABUF_HEAPS_SYSTEM=m
+CONFIG_VIDEO_QCOM_VENUS=m
+CONFIG_SDM_VIDEOCC_845=m
+# CONFIG_CXD2880_SPI_DRV is not set
+# CONFIG_MEDIA_TUNER_SIMPLE is not set
+# CONFIG_MEDIA_TUNER_TDA18250 is not set
+# CONFIG_MEDIA_TUNER_TDA8290 is not set
+# CONFIG_MEDIA_TUNER_TDA827X is not set
+# CONFIG_MEDIA_TUNER_TDA18271 is not set
+# CONFIG_MEDIA_TUNER_TDA9887 is not set
+# CONFIG_MEDIA_TUNER_TEA5761 is not set
+# CONFIG_MEDIA_TUNER_TEA5767 is not set
+# CONFIG_MEDIA_TUNER_MSI001 is not set
+# CONFIG_MEDIA_TUNER_MT20XX is not set
+# CONFIG_MEDIA_TUNER_MT2060 is not set
+# CONFIG_MEDIA_TUNER_MT2063 is not set
+# CONFIG_MEDIA_TUNER_MT2266 is not set
+# CONFIG_MEDIA_TUNER_MT2131 is not set
+# CONFIG_MEDIA_TUNER_QT1010 is not set
+# CONFIG_MEDIA_TUNER_XC2028 is not set
+# CONFIG_MEDIA_TUNER_XC5000 is not set
+# CONFIG_MEDIA_TUNER_XC4000 is not set
+# CONFIG_MEDIA_TUNER_MXL5005S is not set
+# CONFIG_MEDIA_TUNER_MXL5007T is not set
+# CONFIG_MEDIA_TUNER_MC44S803 is not set
+# CONFIG_MEDIA_TUNER_MAX2165 is not set
+# CONFIG_MEDIA_TUNER_TDA18218 is not set
+# CONFIG_MEDIA_TUNER_FC0011 is not set
+# CONFIG_MEDIA_TUNER_FC0012 is not set
+# CONFIG_MEDIA_TUNER_FC0013 is not set
+# CONFIG_MEDIA_TUNER_TDA18212 is not set
+# CONFIG_MEDIA_TUNER_E4000 is not set
+# CONFIG_MEDIA_TUNER_FC2580 is not set
+# CONFIG_MEDIA_TUNER_M88RS6000T is not set
+# CONFIG_MEDIA_TUNER_TUA9001 is not set
+# CONFIG_MEDIA_TUNER_SI2157 is not set
+# CONFIG_MEDIA_TUNER_IT913X is not set
+# CONFIG_MEDIA_TUNER_R820T is not set
+# CONFIG_MEDIA_TUNER_MXL301RF is not set
+# CONFIG_MEDIA_TUNER_QM1D1C0042 is not set
+# CONFIG_MEDIA_TUNER_QM1D1B0004 is not set
+# CONFIG_DVB_STB0899 is not set
+# CONFIG_DVB_STB6100 is not set
+# CONFIG_DVB_STV090x is not set
+# CONFIG_DVB_STV0910 is not set
+# CONFIG_DVB_STV6110x is not set
+# CONFIG_DVB_STV6111 is not set
+# CONFIG_DVB_MXL5XX is not set
+# CONFIG_DVB_M88DS3103 is not set
+# CONFIG_DVB_DRXK is not set
+# CONFIG_DVB_TDA18271C2DD is not set
+# CONFIG_DVB_SI2165 is not set
+# CONFIG_DVB_MN88472 is not set
+# CONFIG_DVB_MN88473 is not set
+# CONFIG_DVB_CX24110 is not set
+# CONFIG_DVB_CX24123 is not set
+# CONFIG_DVB_MT312 is not set
+# CONFIG_DVB_ZL10036 is not set
+# CONFIG_DVB_ZL10039 is not set
+# CONFIG_DVB_S5H1420 is not set
+# CONFIG_DVB_STV0288 is not set
+# CONFIG_DVB_STB6000 is not set
+# CONFIG_DVB_STV0299 is not set
+# CONFIG_DVB_STV6110 is not set
+# CONFIG_DVB_STV0900 is not set
+# CONFIG_DVB_TDA8083 is not set
+# CONFIG_DVB_TDA10086 is not set
+# CONFIG_DVB_TDA8261 is not set
+# CONFIG_DVB_VES1X93 is not set
+# CONFIG_DVB_TUNER_ITD1000 is not set
+# CONFIG_DVB_TUNER_CX24113 is not set
+# CONFIG_DVB_TDA826X is not set
+# CONFIG_DVB_TUA6100 is not set
+# CONFIG_DVB_CX24116 is not set
+# CONFIG_DVB_CX24117 is not set
+# CONFIG_DVB_CX24120 is not set
+# CONFIG_DVB_SI21XX is not set
+# CONFIG_DVB_TS2020 is not set
+# CONFIG_DVB_DS3000 is not set
+# CONFIG_DVB_MB86A16 is not set
+# CONFIG_DVB_TDA10071 is not set
+# CONFIG_DVB_SP8870 is not set
+# CONFIG_DVB_SP887X is not set
+# CONFIG_DVB_CX22700 is not set
+# CONFIG_DVB_CX22702 is not set
+# CONFIG_DVB_S5H1432 is not set
+# CONFIG_DVB_DRXD is not set
+# CONFIG_DVB_L64781 is not set
+# CONFIG_DVB_TDA1004X is not set
+# CONFIG_DVB_NXT6000 is not set
+# CONFIG_DVB_MT352 is not set
+# CONFIG_DVB_ZL10353 is not set
+# CONFIG_DVB_DIB3000MB is not set
+# CONFIG_DVB_DIB3000MC is not set
+# CONFIG_DVB_DIB7000M is not set
+# CONFIG_DVB_DIB7000P is not set
+# CONFIG_DVB_DIB9000 is not set
+# CONFIG_DVB_TDA10048 is not set
+# CONFIG_DVB_AF9013 is not set
+# CONFIG_DVB_EC100 is not set
+# CONFIG_DVB_STV0367 is not set
+# CONFIG_DVB_CXD2820R is not set
+# CONFIG_DVB_CXD2841ER is not set
+# CONFIG_DVB_RTL2830 is not set
+# CONFIG_DVB_RTL2832 is not set
+# CONFIG_DVB_RTL2832_SDR is not set
+# CONFIG_DVB_SI2168 is not set
+# CONFIG_DVB_ZD1301_DEMOD is not set
+# CONFIG_DVB_CXD2880 is not set
+# CONFIG_DVB_VES1820 is not set
+# CONFIG_DVB_TDA10021 is not set
+# CONFIG_DVB_TDA10023 is not set
+# CONFIG_DVB_STV0297 is not set
+# CONFIG_DVB_NXT200X is not set
+# CONFIG_DVB_OR51211 is not set
+# CONFIG_DVB_OR51132 is not set
+# CONFIG_DVB_BCM3510 is not set
+# CONFIG_DVB_LGDT330X is not set
+# CONFIG_DVB_LGDT3305 is not set
+# CONFIG_DVB_LGDT3306A is not set
+# CONFIG_DVB_LG2160 is not set
+# CONFIG_DVB_S5H1409 is not set
+# CONFIG_DVB_AU8522_DTV is not set
+# CONFIG_DVB_AU8522_V4L is not set
+# CONFIG_DVB_S5H1411 is not set
+# CONFIG_DVB_S921 is not set
+# CONFIG_DVB_DIB8000 is not set
+# CONFIG_DVB_MB86A20S is not set
+# CONFIG_DVB_TC90522 is not set
+# CONFIG_DVB_MN88443X is not set
+# CONFIG_DVB_PLL is not set
+# CONFIG_DVB_TUNER_DIB0070 is not set
+# CONFIG_DVB_TUNER_DIB0090 is not set
+# CONFIG_DVB_DRX39XYJ is not set
+# CONFIG_DVB_LNBH25 is not set
+# CONFIG_DVB_LNBH29 is not set
+# CONFIG_DVB_LNBP21 is not set
+# CONFIG_DVB_LNBP22 is not set
+# CONFIG_DVB_ISL6405 is not set
+# CONFIG_DVB_ISL6421 is not set
+# CONFIG_DVB_ISL6423 is not set
+# CONFIG_DVB_A8293 is not set
+# CONFIG_DVB_LGS8GL5 is not set
+# CONFIG_DVB_LGS8GXX is not set
+# CONFIG_DVB_ATBM8830 is not set
+# CONFIG_DVB_TDA665x is not set
+# CONFIG_DVB_IX2505V is not set
+# CONFIG_DVB_M88RS2000 is not set
+# CONFIG_DVB_AF9033 is not set
+# CONFIG_DVB_HORUS3A is not set
+# CONFIG_DVB_ASCOT2E is not set
+# CONFIG_DVB_HELENE is not set
+# CONFIG_DVB_CXD2099 is not set
+# CONFIG_DVB_SP2 is not set
+CONFIG_QCOM_COMMAND_DB=m
+CONFIG_QCOM_LMH=m
+# XXX RB5 bits start here
+CONFIG_QCOM_IPCC=m
+CONFIG_QCOM_SPMI_ADC5=m
+CONFIG_QCOM_SPMI_TEMP_ALARM=m
+CONFIG_RPMSG_NS=m
+CONFIG_CAN_MCP251XFD=m
+CONFIG_ATH11K=m
+CONFIG_ATH11K_AHB=m
+CONFIG_ATH11K_PCI=m
+CONFIG_PINCTRL_SM8250=m
+CONFIG_PINCTRL_SM8250_LPASS_LPI=m
+CONFIG_PINCTRL_LPASS_LPI=m
+CONFIG_QCOM_SPMI_ADC_TM5=m
+CONFIG_MFD_QCOM_QCA639X=m
+CONFIG_REGULATOR_QCOM_USB_VBUS=m
+CONFIG_DRM_DISPLAY_CONNECTOR=m
+CONFIG_DRM_LONTIUM_LT9611UXC=m
+CONFIG_SND_SOC_SM8250=m
+CONFIG_SND_SOC_LPASS_WSA_MACRO=m
+CONFIG_SND_SOC_LPASS_VA_MACRO=m
+CONFIG_TYPEC_QCOM_PMIC=m
+CONFIG_LEDS_QCOM_LPG=m
+CONFIG_SM_GPUCC_8250=m
+CONFIG_SM_DISPCC_8250=m
+CONFIG_SM_VIDEOCC_8250=m
+CONFIG_CLK_GFM_LPASS_SM8250=m
+CONFIG_PHY_QCOM_USB_SNPS_FEMTO_V2=m
+CONFIG_INTERCONNECT_QCOM_SM8250=m
+CONFIG_CRYPTO_MICHAEL_MIC=m
+CONFIG_QCOM_CPR=m
+CONFIG_QCOM_SPM=m
+# XXX SM8450 bits start here
+CONFIG_PINCTRL_SM8450=m
+CONFIG_SM_GCC_8450=m
+CONFIG_INTERCONNECT_QCOM_SM8450=m
diff --git a/arch/arm64/configs/fips140_gki.fragment b/arch/arm64/configs/fips140_gki.fragment
new file mode 100644
index 0000000..aa8444d
--- /dev/null
+++ b/arch/arm64/configs/fips140_gki.fragment
@@ -0,0 +1,2 @@
+CONFIG_CRYPTO_FIPS140_MOD=m
+# CONFIG_MODULE_SIG_ALL is not set
diff --git a/arch/arm64/configs/gki_defconfig b/arch/arm64/configs/gki_defconfig
new file mode 100644
index 0000000..e3c30f8
--- /dev/null
+++ b/arch/arm64/configs/gki_defconfig
@@ -0,0 +1,703 @@
+CONFIG_UAPI_HEADER_TEST=y
+CONFIG_AUDIT=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_BPF_SYSCALL=y
+CONFIG_BPF_JIT=y
+CONFIG_BPF_JIT_ALWAYS_ON=y
+# CONFIG_BPF_UNPRIV_DEFAULT_OFF is not set
+CONFIG_PREEMPT=y
+CONFIG_IRQ_TIME_ACCOUNTING=y
+CONFIG_TASKSTATS=y
+CONFIG_TASK_XACCT=y
+CONFIG_TASK_IO_ACCOUNTING=y
+CONFIG_PSI=y
+CONFIG_RCU_EXPERT=y
+CONFIG_RCU_BOOST=y
+CONFIG_RCU_NOCB_CPU=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_IKHEADERS=y
+CONFIG_UCLAMP_TASK=y
+CONFIG_UCLAMP_BUCKETS_COUNT=20
+CONFIG_CGROUPS=y
+CONFIG_MEMCG=y
+CONFIG_BLK_CGROUP=y
+CONFIG_CGROUP_SCHED=y
+CONFIG_UCLAMP_TASK_GROUP=y
+CONFIG_CGROUP_FREEZER=y
+CONFIG_CPUSETS=y
+CONFIG_CGROUP_CPUACCT=y
+CONFIG_CGROUP_BPF=y
+CONFIG_NAMESPACES=y
+# CONFIG_PID_NS is not set
+CONFIG_RT_SOFTIRQ_AWARE_SCHED=y
+# CONFIG_RD_BZIP2 is not set
+# CONFIG_RD_LZMA is not set
+# CONFIG_RD_XZ is not set
+# CONFIG_RD_LZO is not set
+CONFIG_BOOT_CONFIG=y
+# CONFIG_SYSFS_SYSCALL is not set
+# CONFIG_FHANDLE is not set
+CONFIG_KALLSYMS_ALL=y
+# CONFIG_RSEQ is not set
+CONFIG_EMBEDDED=y
+CONFIG_PROFILING=y
+CONFIG_ARCH_SUNXI=y
+CONFIG_ARCH_HISI=y
+CONFIG_ARCH_QCOM=y
+CONFIG_SCHED_MC=y
+CONFIG_NR_CPUS=32
+CONFIG_PARAVIRT_TIME_ACCOUNTING=y
+CONFIG_ARM64_SW_TTBR0_PAN=y
+CONFIG_COMPAT=y
+CONFIG_ARMV8_DEPRECATED=y
+CONFIG_SWP_EMULATION=y
+CONFIG_CP15_BARRIER_EMULATION=y
+CONFIG_SETEND_EMULATION=y
+CONFIG_RANDOMIZE_BASE=y
+# CONFIG_RANDOMIZE_MODULE_REGION_FULL is not set
+CONFIG_CMDLINE="console=ttynull stack_depot_disable=on cgroup_disable=pressure kasan.page_alloc.sample=10 kasan.stacktrace=off kvm-arm.mode=protected bootconfig ioremap_guard"
+CONFIG_CMDLINE_EXTEND=y
+# CONFIG_DMI is not set
+CONFIG_PM_WAKELOCKS=y
+CONFIG_PM_WAKELOCKS_LIMIT=0
+# CONFIG_PM_WAKELOCKS_GC is not set
+CONFIG_ENERGY_MODEL=y
+CONFIG_CPU_IDLE=y
+CONFIG_CPU_IDLE_GOV_MENU=y
+CONFIG_CPU_IDLE_GOV_TEO=y
+CONFIG_ARM_PSCI_CPUIDLE=y
+CONFIG_CPU_FREQ=y
+CONFIG_CPU_FREQ_STAT=y
+CONFIG_CPU_FREQ_TIMES=y
+CONFIG_CPU_FREQ_GOV_POWERSAVE=y
+CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
+CONFIG_ARM_SCPI_CPUFREQ=y
+CONFIG_ARM_SCMI_CPUFREQ=y
+CONFIG_VIRTUALIZATION=y
+CONFIG_KVM=y
+CONFIG_KPROBES=y
+CONFIG_JUMP_LABEL=y
+CONFIG_SHADOW_CALL_STACK=y
+CONFIG_CFI_CLANG=y
+CONFIG_MODULES=y
+CONFIG_MODULE_UNLOAD=y
+CONFIG_MODVERSIONS=y
+CONFIG_MODULE_SIG=y
+CONFIG_MODULE_SIG_PROTECT=y
+CONFIG_BLK_DEV_ZONED=y
+CONFIG_BLK_INLINE_ENCRYPTION=y
+CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK=y
+CONFIG_IOSCHED_BFQ=y
+CONFIG_BFQ_GROUP_IOSCHED=y
+CONFIG_GKI_HACKS_TO_FIX=y
+# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
+CONFIG_BINFMT_MISC=y
+# CONFIG_SLAB_MERGE_DEFAULT is not set
+CONFIG_SLAB_FREELIST_RANDOM=y
+CONFIG_SLAB_FREELIST_HARDENED=y
+CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
+# CONFIG_COMPAT_BRK is not set
+CONFIG_MEMORY_HOTPLUG=y
+CONFIG_MEMORY_HOTREMOVE=y
+CONFIG_DEFAULT_MMAP_MIN_ADDR=32768
+CONFIG_TRANSPARENT_HUGEPAGE=y
+CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
+CONFIG_CMA=y
+CONFIG_CMA_DEBUGFS=y
+CONFIG_CMA_AREAS=16
+# CONFIG_ZONE_DMA is not set
+CONFIG_ANON_VMA_NAME=y
+CONFIG_USERFAULTFD=y
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_XFRM_USER=y
+CONFIG_XFRM_INTERFACE=y
+CONFIG_XFRM_MIGRATE=y
+CONFIG_XFRM_STATISTICS=y
+CONFIG_NET_KEY=y
+CONFIG_XDP_SOCKETS=y
+CONFIG_INET=y
+CONFIG_IP_MULTICAST=y
+CONFIG_IP_ADVANCED_ROUTER=y
+CONFIG_IP_MULTIPLE_TABLES=y
+CONFIG_NET_IPIP=y
+CONFIG_NET_IPGRE_DEMUX=y
+CONFIG_NET_IPGRE=y
+CONFIG_NET_IPVTI=y
+CONFIG_INET_ESP=y
+CONFIG_INET_UDP_DIAG=y
+CONFIG_INET_DIAG_DESTROY=y
+CONFIG_IPV6_ROUTER_PREF=y
+CONFIG_IPV6_ROUTE_INFO=y
+CONFIG_IPV6_OPTIMISTIC_DAD=y
+CONFIG_INET6_ESP=y
+CONFIG_INET6_IPCOMP=y
+CONFIG_IPV6_MIP6=y
+CONFIG_IPV6_VTI=y
+CONFIG_IPV6_GRE=y
+CONFIG_IPV6_MULTIPLE_TABLES=y
+CONFIG_IPV6_MROUTE=y
+CONFIG_NETFILTER=y
+CONFIG_NF_CONNTRACK=y
+CONFIG_NF_CONNTRACK_SECMARK=y
+CONFIG_NF_CONNTRACK_PROCFS=y
+CONFIG_NF_CONNTRACK_EVENTS=y
+CONFIG_NF_CONNTRACK_AMANDA=y
+CONFIG_NF_CONNTRACK_FTP=y
+CONFIG_NF_CONNTRACK_H323=y
+CONFIG_NF_CONNTRACK_IRC=y
+CONFIG_NF_CONNTRACK_NETBIOS_NS=y
+CONFIG_NF_CONNTRACK_PPTP=y
+CONFIG_NF_CONNTRACK_SANE=y
+CONFIG_NF_CONNTRACK_TFTP=y
+CONFIG_NF_CT_NETLINK=y
+CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y
+CONFIG_NETFILTER_XT_TARGET_CONNMARK=y
+CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y
+CONFIG_NETFILTER_XT_TARGET_DSCP=y
+CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y
+CONFIG_NETFILTER_XT_TARGET_MARK=y
+CONFIG_NETFILTER_XT_TARGET_NFLOG=y
+CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y
+CONFIG_NETFILTER_XT_TARGET_NOTRACK=y
+CONFIG_NETFILTER_XT_TARGET_TEE=y
+CONFIG_NETFILTER_XT_TARGET_TPROXY=y
+CONFIG_NETFILTER_XT_TARGET_TRACE=y
+CONFIG_NETFILTER_XT_TARGET_SECMARK=y
+CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
+CONFIG_NETFILTER_XT_MATCH_BPF=y
+CONFIG_NETFILTER_XT_MATCH_COMMENT=y
+CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
+CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
+CONFIG_NETFILTER_XT_MATCH_DSCP=y
+CONFIG_NETFILTER_XT_MATCH_ESP=y
+CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_HELPER=y
+CONFIG_NETFILTER_XT_MATCH_IPRANGE=y
+CONFIG_NETFILTER_XT_MATCH_L2TP=y
+CONFIG_NETFILTER_XT_MATCH_LENGTH=y
+CONFIG_NETFILTER_XT_MATCH_LIMIT=y
+CONFIG_NETFILTER_XT_MATCH_MAC=y
+CONFIG_NETFILTER_XT_MATCH_MARK=y
+CONFIG_NETFILTER_XT_MATCH_MULTIPORT=y
+CONFIG_NETFILTER_XT_MATCH_OWNER=y
+CONFIG_NETFILTER_XT_MATCH_POLICY=y
+CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA2=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA2_LOG=y
+CONFIG_NETFILTER_XT_MATCH_SOCKET=y
+CONFIG_NETFILTER_XT_MATCH_STATE=y
+CONFIG_NETFILTER_XT_MATCH_STATISTIC=y
+CONFIG_NETFILTER_XT_MATCH_STRING=y
+CONFIG_NETFILTER_XT_MATCH_TIME=y
+CONFIG_NETFILTER_XT_MATCH_U32=y
+CONFIG_IP_NF_IPTABLES=y
+CONFIG_IP_NF_MATCH_ECN=y
+CONFIG_IP_NF_MATCH_TTL=y
+CONFIG_IP_NF_FILTER=y
+CONFIG_IP_NF_TARGET_REJECT=y
+CONFIG_IP_NF_NAT=y
+CONFIG_IP_NF_TARGET_MASQUERADE=y
+CONFIG_IP_NF_TARGET_NETMAP=y
+CONFIG_IP_NF_TARGET_REDIRECT=y
+CONFIG_IP_NF_MANGLE=y
+CONFIG_IP_NF_RAW=y
+CONFIG_IP_NF_SECURITY=y
+CONFIG_IP_NF_ARPTABLES=y
+CONFIG_IP_NF_ARPFILTER=y
+CONFIG_IP_NF_ARP_MANGLE=y
+CONFIG_IP6_NF_IPTABLES=y
+CONFIG_IP6_NF_MATCH_RPFILTER=y
+CONFIG_IP6_NF_FILTER=y
+CONFIG_IP6_NF_TARGET_REJECT=y
+CONFIG_IP6_NF_MANGLE=y
+CONFIG_IP6_NF_RAW=y
+CONFIG_TIPC=m
+CONFIG_L2TP=m
+CONFIG_BRIDGE=y
+CONFIG_VLAN_8021Q=m
+CONFIG_6LOWPAN=m
+CONFIG_IEEE802154=m
+CONFIG_IEEE802154_6LOWPAN=m
+CONFIG_MAC802154=m
+CONFIG_NET_SCHED=y
+CONFIG_NET_SCH_HTB=y
+CONFIG_NET_SCH_PRIO=y
+CONFIG_NET_SCH_MULTIQ=y
+CONFIG_NET_SCH_SFQ=y
+CONFIG_NET_SCH_TBF=y
+CONFIG_NET_SCH_NETEM=y
+CONFIG_NET_SCH_CODEL=y
+CONFIG_NET_SCH_FQ_CODEL=y
+CONFIG_NET_SCH_FQ=y
+CONFIG_NET_SCH_INGRESS=y
+CONFIG_NET_CLS_BASIC=y
+CONFIG_NET_CLS_TCINDEX=y
+CONFIG_NET_CLS_FW=y
+CONFIG_NET_CLS_U32=y
+CONFIG_CLS_U32_MARK=y
+CONFIG_NET_CLS_FLOW=y
+CONFIG_NET_CLS_BPF=y
+CONFIG_NET_CLS_MATCHALL=y
+CONFIG_NET_EMATCH=y
+CONFIG_NET_EMATCH_CMP=y
+CONFIG_NET_EMATCH_NBYTE=y
+CONFIG_NET_EMATCH_U32=y
+CONFIG_NET_EMATCH_META=y
+CONFIG_NET_EMATCH_TEXT=y
+CONFIG_NET_CLS_ACT=y
+CONFIG_NET_ACT_POLICE=y
+CONFIG_NET_ACT_GACT=y
+CONFIG_NET_ACT_MIRRED=y
+CONFIG_NET_ACT_SKBEDIT=y
+CONFIG_NET_ACT_BPF=y
+CONFIG_VSOCKETS=y
+CONFIG_CGROUP_NET_PRIO=y
+CONFIG_CAN=m
+CONFIG_BT=m
+CONFIG_BT_RFCOMM=m
+CONFIG_BT_RFCOMM_TTY=y
+CONFIG_BT_HIDP=m
+CONFIG_BT_HCIBTSDIO=m
+CONFIG_BT_HCIUART=m
+CONFIG_BT_HCIUART_LL=y
+CONFIG_BT_HCIUART_BCM=y
+CONFIG_BT_HCIUART_QCA=y
+CONFIG_CFG80211=m
+CONFIG_NL80211_TESTMODE=y
+# CONFIG_CFG80211_DEFAULT_PS is not set
+# CONFIG_CFG80211_CRDA_SUPPORT is not set
+CONFIG_MAC80211=m
+CONFIG_RFKILL=m
+CONFIG_NFC=m
+CONFIG_PCI=y
+CONFIG_PCIEPORTBUS=y
+CONFIG_PCIEAER=y
+CONFIG_PCI_IOV=y
+# CONFIG_VGA_ARB is not set
+CONFIG_PCI_HOST_GENERIC=y
+CONFIG_PCIE_DW_PLAT_EP=y
+CONFIG_PCIE_QCOM=y
+CONFIG_PCIE_KIRIN=y
+CONFIG_PCI_ENDPOINT=y
+CONFIG_FW_LOADER_USER_HELPER=y
+# CONFIG_FW_CACHE is not set
+# CONFIG_SUN50I_DE2_BUS is not set
+# CONFIG_SUNXI_RSB is not set
+CONFIG_ARM_SCMI_PROTOCOL=y
+# CONFIG_ARM_SCMI_POWER_DOMAIN is not set
+CONFIG_ARM_SCPI_PROTOCOL=y
+# CONFIG_ARM_SCPI_POWER_DOMAIN is not set
+# CONFIG_EFI_ARMSTUB_DTB_LOADER is not set
+CONFIG_GNSS=y
+CONFIG_ZRAM=m
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_BLK_DEV_LOOP_MIN_COUNT=16
+CONFIG_BLK_DEV_RAM=y
+CONFIG_BLK_DEV_RAM_SIZE=8192
+CONFIG_SRAM=y
+CONFIG_SCSI=y
+# CONFIG_SCSI_PROC_FS is not set
+CONFIG_BLK_DEV_SD=y
+CONFIG_MD=y
+CONFIG_BLK_DEV_DM=y
+CONFIG_DM_CRYPT=y
+CONFIG_DM_DEFAULT_KEY=y
+CONFIG_DM_SNAPSHOT=y
+CONFIG_DM_UEVENT=y
+CONFIG_DM_VERITY=y
+CONFIG_DM_VERITY_FEC=y
+CONFIG_NETDEVICES=y
+CONFIG_DUMMY=y
+CONFIG_WIREGUARD=y
+CONFIG_IFB=y
+CONFIG_MACSEC=y
+CONFIG_TUN=y
+CONFIG_VETH=y
+CONFIG_CAN_VCAN=m
+CONFIG_CAN_SLCAN=m
+CONFIG_PPP=m
+CONFIG_PPP_BSDCOMP=m
+CONFIG_PPP_DEFLATE=m
+CONFIG_PPP_MPPE=m
+CONFIG_PPTP=m
+CONFIG_PPPOL2TP=m
+CONFIG_USB_RTL8150=y
+CONFIG_USB_RTL8152=y
+CONFIG_USB_USBNET=y
+CONFIG_USB_NET_CDC_EEM=y
+# CONFIG_USB_NET_NET1080 is not set
+# CONFIG_USB_NET_CDC_SUBSET is not set
+# CONFIG_USB_NET_ZAURUS is not set
+CONFIG_USB_NET_AQC111=y
+# CONFIG_WLAN_VENDOR_ADMTEK is not set
+# CONFIG_WLAN_VENDOR_ATH is not set
+# CONFIG_WLAN_VENDOR_ATMEL is not set
+# CONFIG_WLAN_VENDOR_BROADCOM is not set
+# CONFIG_WLAN_VENDOR_CISCO is not set
+# CONFIG_WLAN_VENDOR_INTEL is not set
+# CONFIG_WLAN_VENDOR_INTERSIL is not set
+# CONFIG_WLAN_VENDOR_MARVELL is not set
+# CONFIG_WLAN_VENDOR_MEDIATEK is not set
+# CONFIG_WLAN_VENDOR_RALINK is not set
+# CONFIG_WLAN_VENDOR_REALTEK is not set
+# CONFIG_WLAN_VENDOR_RSI is not set
+# CONFIG_WLAN_VENDOR_ST is not set
+# CONFIG_WLAN_VENDOR_TI is not set
+# CONFIG_WLAN_VENDOR_ZYDAS is not set
+# CONFIG_WLAN_VENDOR_QUANTENNA is not set
+CONFIG_INPUT_EVDEV=y
+CONFIG_KEYBOARD_GPIO=y
+# CONFIG_MOUSE_PS2 is not set
+CONFIG_INPUT_JOYSTICK=y
+CONFIG_JOYSTICK_XPAD=y
+CONFIG_INPUT_TOUCHSCREEN=y
+CONFIG_INPUT_MISC=y
+CONFIG_INPUT_UINPUT=y
+# CONFIG_VT is not set
+# CONFIG_LEGACY_PTYS is not set
+CONFIG_SERIAL_8250=y
+# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
+CONFIG_SERIAL_8250_CONSOLE=y
+# CONFIG_SERIAL_8250_EXAR is not set
+CONFIG_SERIAL_8250_RUNTIME_UARTS=0
+CONFIG_SERIAL_8250_DW=y
+CONFIG_SERIAL_OF_PLATFORM=y
+CONFIG_SERIAL_AMBA_PL011=y
+CONFIG_SERIAL_AMBA_PL011_CONSOLE=y
+CONFIG_SERIAL_SAMSUNG=y
+CONFIG_SERIAL_SAMSUNG_CONSOLE=y
+CONFIG_SERIAL_QCOM_GENI=y
+CONFIG_SERIAL_QCOM_GENI_CONSOLE=y
+CONFIG_SERIAL_SPRD=y
+CONFIG_SERIAL_SPRD_CONSOLE=y
+CONFIG_NULL_TTY=y
+CONFIG_HVC_DCC=y
+CONFIG_SERIAL_DEV_BUS=y
+CONFIG_HW_RANDOM=y
+# CONFIG_DEVMEM is not set
+# CONFIG_DEVPORT is not set
+# CONFIG_I2C_COMPAT is not set
+# CONFIG_I2C_HELPER_AUTO is not set
+CONFIG_I3C=y
+CONFIG_SPI=y
+CONFIG_SPI_MEM=y
+# CONFIG_SPMI_MSM_PMIC_ARB is not set
+# CONFIG_PINCTRL_SUN8I_H3_R is not set
+# CONFIG_PINCTRL_SUN50I_A64 is not set
+# CONFIG_PINCTRL_SUN50I_A64_R is not set
+# CONFIG_PINCTRL_SUN50I_A100 is not set
+# CONFIG_PINCTRL_SUN50I_A100_R is not set
+# CONFIG_PINCTRL_SUN50I_H5 is not set
+# CONFIG_PINCTRL_SUN50I_H6 is not set
+# CONFIG_PINCTRL_SUN50I_H6_R is not set
+# CONFIG_PINCTRL_SUN50I_H616 is not set
+# CONFIG_PINCTRL_SUN50I_H616_R is not set
+CONFIG_GPIO_GENERIC_PLATFORM=y
+CONFIG_POWER_RESET_HISI=y
+CONFIG_POWER_RESET_SYSCON=y
+# CONFIG_HWMON is not set
+CONFIG_THERMAL=y
+CONFIG_THERMAL_NETLINK=y
+CONFIG_THERMAL_STATISTICS=y
+CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS=100
+CONFIG_THERMAL_WRITABLE_TRIPS=y
+CONFIG_THERMAL_GOV_USER_SPACE=y
+CONFIG_THERMAL_GOV_POWER_ALLOCATOR=y
+CONFIG_CPU_THERMAL=y
+CONFIG_DEVFREQ_THERMAL=y
+CONFIG_THERMAL_EMULATION=y
+CONFIG_WATCHDOG=y
+CONFIG_WATCHDOG_CORE=y
+CONFIG_MFD_ACT8945A=y
+CONFIG_REGULATOR=y
+CONFIG_REGULATOR_FIXED_VOLTAGE=y
+CONFIG_RC_CORE=y
+CONFIG_BPF_LIRC_MODE2=y
+CONFIG_LIRC=y
+# CONFIG_RC_MAP is not set
+CONFIG_RC_DECODERS=y
+CONFIG_RC_DEVICES=y
+CONFIG_MEDIA_CEC_RC=y
+# CONFIG_MEDIA_ANALOG_TV_SUPPORT is not set
+# CONFIG_MEDIA_DIGITAL_TV_SUPPORT is not set
+# CONFIG_MEDIA_RADIO_SUPPORT is not set
+# CONFIG_MEDIA_SDR_SUPPORT is not set
+# CONFIG_MEDIA_TEST_SUPPORT is not set
+CONFIG_MEDIA_USB_SUPPORT=y
+CONFIG_USB_GSPCA=y
+CONFIG_USB_VIDEO_CLASS=y
+CONFIG_V4L_PLATFORM_DRIVERS=y
+CONFIG_V4L_MEM2MEM_DRIVERS=y
+CONFIG_DRM=y
+CONFIG_BACKLIGHT_CLASS_DEVICE=y
+CONFIG_SOUND=y
+CONFIG_SND=y
+CONFIG_SND_HRTIMER=y
+# CONFIG_SND_SUPPORT_OLD_API is not set
+# CONFIG_SND_VERBOSE_PROCFS is not set
+# CONFIG_SND_DRIVERS is not set
+CONFIG_SND_USB_AUDIO=y
+CONFIG_SND_SOC=y
+CONFIG_HID_BATTERY_STRENGTH=y
+CONFIG_HIDRAW=y
+CONFIG_UHID=y
+CONFIG_HID_APPLE=y
+CONFIG_HID_PRODIKEYS=y
+CONFIG_HID_ELECOM=y
+CONFIG_HID_UCLOGIC=y
+CONFIG_HID_LOGITECH=y
+CONFIG_HID_LOGITECH_DJ=y
+CONFIG_HID_MAGICMOUSE=y
+CONFIG_HID_MICROSOFT=y
+CONFIG_HID_MULTITOUCH=y
+CONFIG_HID_NINTENDO=y
+CONFIG_HID_PICOLCD=y
+CONFIG_HID_PLANTRONICS=y
+CONFIG_HID_PLAYSTATION=y
+CONFIG_PLAYSTATION_FF=y
+CONFIG_HID_ROCCAT=y
+CONFIG_HID_SONY=y
+CONFIG_SONY_FF=y
+CONFIG_HID_STEAM=y
+CONFIG_HID_WACOM=y
+CONFIG_HID_WIIMOTE=y
+CONFIG_USB_HIDDEV=y
+CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
+CONFIG_USB_OTG=y
+CONFIG_USB_XHCI_HCD=y
+CONFIG_USB_EHCI_HCD=y
+CONFIG_USB_EHCI_ROOT_HUB_TT=y
+CONFIG_USB_EHCI_HCD_PLATFORM=y
+CONFIG_USB_ACM=m
+CONFIG_USB_STORAGE=y
+CONFIG_USB_UAS=y
+CONFIG_USB_DWC3=y
+CONFIG_USB_SERIAL=m
+CONFIG_USB_SERIAL_FTDI_SIO=m
+CONFIG_USB_GADGET=y
+CONFIG_USB_CONFIGFS=y
+CONFIG_USB_CONFIGFS_UEVENT=y
+CONFIG_USB_CONFIGFS_SERIAL=y
+CONFIG_USB_CONFIGFS_ACM=y
+CONFIG_USB_CONFIGFS_NCM=y
+CONFIG_USB_CONFIGFS_ECM=y
+CONFIG_USB_CONFIGFS_EEM=y
+CONFIG_USB_CONFIGFS_MASS_STORAGE=y
+CONFIG_USB_CONFIGFS_F_FS=y
+CONFIG_USB_CONFIGFS_F_ACC=y
+CONFIG_USB_CONFIGFS_F_AUDIO_SRC=y
+CONFIG_USB_CONFIGFS_F_UAC2=y
+CONFIG_USB_CONFIGFS_F_MIDI=y
+CONFIG_USB_CONFIGFS_F_HID=y
+CONFIG_USB_CONFIGFS_F_UVC=y
+CONFIG_TYPEC=y
+CONFIG_TYPEC_TCPM=y
+CONFIG_TYPEC_TCPCI=y
+CONFIG_TYPEC_UCSI=y
+CONFIG_MMC=y
+# CONFIG_PWRSEQ_EMMC is not set
+# CONFIG_PWRSEQ_SIMPLE is not set
+CONFIG_MMC_CRYPTO=y
+CONFIG_MMC_SDHCI=y
+CONFIG_MMC_SDHCI_PLTFM=y
+CONFIG_SCSI_UFSHCD=y
+CONFIG_SCSI_UFS_BSG=y
+CONFIG_SCSI_UFS_CRYPTO=y
+CONFIG_SCSI_UFSHCD_PCI=y
+CONFIG_SCSI_UFSHCD_PLATFORM=y
+CONFIG_SCSI_UFS_DWC_TC_PLATFORM=y
+CONFIG_SCSI_UFS_HISI=y
+CONFIG_LEDS_CLASS_FLASH=y
+CONFIG_LEDS_CLASS_MULTICOLOR=y
+CONFIG_LEDS_TRIGGER_TIMER=y
+CONFIG_LEDS_TRIGGER_TRANSIENT=y
+CONFIG_EDAC=y
+CONFIG_RTC_CLASS=y
+CONFIG_RTC_DRV_PL030=y
+CONFIG_RTC_DRV_PL031=y
+CONFIG_DMABUF_HEAPS=y
+CONFIG_DMABUF_HEAPS_DEFERRED_FREE=y
+CONFIG_UIO=y
+CONFIG_VHOST_VSOCK=y
+CONFIG_STAGING=y
+CONFIG_ASHMEM=y
+CONFIG_COMMON_CLK_SCPI=y
+# CONFIG_SUNXI_CCU is not set
+CONFIG_HWSPINLOCK=y
+# CONFIG_SUN50I_ERRATUM_UNKNOWN1 is not set
+CONFIG_MAILBOX=y
+CONFIG_IOMMU_IO_PGTABLE_ARMV7S=y
+CONFIG_REMOTEPROC=y
+CONFIG_REMOTEPROC_CDEV=y
+CONFIG_RPMSG_CHAR=y
+CONFIG_QCOM_GENI_SE=y
+CONFIG_DEVFREQ_GOV_PERFORMANCE=y
+CONFIG_DEVFREQ_GOV_POWERSAVE=y
+CONFIG_DEVFREQ_GOV_USERSPACE=y
+CONFIG_DEVFREQ_GOV_PASSIVE=y
+CONFIG_PM_DEVFREQ_EVENT=y
+CONFIG_IIO=y
+CONFIG_IIO_BUFFER=y
+CONFIG_IIO_TRIGGER=y
+CONFIG_PWM=y
+CONFIG_GENERIC_PHY=y
+CONFIG_POWERCAP=y
+CONFIG_ANDROID_BINDER_IPC=y
+CONFIG_ANDROID_BINDERFS=y
+CONFIG_ANDROID_VENDOR_HOOKS=y
+CONFIG_ANDROID_DEBUG_KINFO=y
+CONFIG_LIBNVDIMM=y
+CONFIG_INTERCONNECT=y
+CONFIG_EXT4_FS=y
+CONFIG_EXT4_FS_POSIX_ACL=y
+CONFIG_EXT4_FS_SECURITY=y
+CONFIG_F2FS_FS=y
+CONFIG_F2FS_FS_SECURITY=y
+CONFIG_F2FS_FS_COMPRESSION=y
+CONFIG_F2FS_UNFAIR_RWSEM=y
+CONFIG_FS_ENCRYPTION=y
+CONFIG_FS_ENCRYPTION_INLINE_CRYPT=y
+CONFIG_FS_VERITY=y
+CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y
+# CONFIG_DNOTIFY is not set
+CONFIG_QUOTA=y
+CONFIG_QFMT_V2=y
+CONFIG_FUSE_FS=y
+CONFIG_VIRTIO_FS=y
+CONFIG_FUSE_BPF=y
+CONFIG_OVERLAY_FS=y
+CONFIG_INCREMENTAL_FS=y
+CONFIG_MSDOS_FS=y
+CONFIG_VFAT_FS=y
+CONFIG_EXFAT_FS=y
+CONFIG_TMPFS=y
+# CONFIG_EFIVAR_FS is not set
+CONFIG_PSTORE=y
+CONFIG_PSTORE_CONSOLE=y
+CONFIG_PSTORE_PMSG=y
+CONFIG_PSTORE_RAM=y
+CONFIG_EROFS_FS=y
+CONFIG_NLS_CODEPAGE_437=y
+CONFIG_NLS_CODEPAGE_737=y
+CONFIG_NLS_CODEPAGE_775=y
+CONFIG_NLS_CODEPAGE_850=y
+CONFIG_NLS_CODEPAGE_852=y
+CONFIG_NLS_CODEPAGE_855=y
+CONFIG_NLS_CODEPAGE_857=y
+CONFIG_NLS_CODEPAGE_860=y
+CONFIG_NLS_CODEPAGE_861=y
+CONFIG_NLS_CODEPAGE_862=y
+CONFIG_NLS_CODEPAGE_863=y
+CONFIG_NLS_CODEPAGE_864=y
+CONFIG_NLS_CODEPAGE_865=y
+CONFIG_NLS_CODEPAGE_866=y
+CONFIG_NLS_CODEPAGE_869=y
+CONFIG_NLS_CODEPAGE_936=y
+CONFIG_NLS_CODEPAGE_950=y
+CONFIG_NLS_CODEPAGE_932=y
+CONFIG_NLS_CODEPAGE_949=y
+CONFIG_NLS_CODEPAGE_874=y
+CONFIG_NLS_ISO8859_8=y
+CONFIG_NLS_CODEPAGE_1250=y
+CONFIG_NLS_CODEPAGE_1251=y
+CONFIG_NLS_ASCII=y
+CONFIG_NLS_ISO8859_1=y
+CONFIG_NLS_ISO8859_2=y
+CONFIG_NLS_ISO8859_3=y
+CONFIG_NLS_ISO8859_4=y
+CONFIG_NLS_ISO8859_5=y
+CONFIG_NLS_ISO8859_6=y
+CONFIG_NLS_ISO8859_7=y
+CONFIG_NLS_ISO8859_9=y
+CONFIG_NLS_ISO8859_13=y
+CONFIG_NLS_ISO8859_14=y
+CONFIG_NLS_ISO8859_15=y
+CONFIG_NLS_KOI8_R=y
+CONFIG_NLS_KOI8_U=y
+CONFIG_NLS_MAC_ROMAN=y
+CONFIG_NLS_MAC_CELTIC=y
+CONFIG_NLS_MAC_CENTEURO=y
+CONFIG_NLS_MAC_CROATIAN=y
+CONFIG_NLS_MAC_CYRILLIC=y
+CONFIG_NLS_MAC_GAELIC=y
+CONFIG_NLS_MAC_GREEK=y
+CONFIG_NLS_MAC_ICELAND=y
+CONFIG_NLS_MAC_INUIT=y
+CONFIG_NLS_MAC_ROMANIAN=y
+CONFIG_NLS_MAC_TURKISH=y
+CONFIG_NLS_UTF8=y
+CONFIG_UNICODE=y
+CONFIG_SECURITY=y
+CONFIG_SECURITYFS=y
+CONFIG_SECURITY_NETWORK=y
+CONFIG_HARDENED_USERCOPY=y
+CONFIG_FORTIFY_SOURCE=y
+CONFIG_STATIC_USERMODEHELPER=y
+CONFIG_STATIC_USERMODEHELPER_PATH=""
+CONFIG_SECURITY_SELINUX=y
+CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
+CONFIG_CRYPTO_ECDH=y
+CONFIG_CRYPTO_DES=y
+CONFIG_CRYPTO_ADIANTUM=y
+CONFIG_CRYPTO_HCTR2=y
+CONFIG_CRYPTO_CHACHA20POLY1305=y
+CONFIG_CRYPTO_CCM=y
+CONFIG_CRYPTO_BLAKE2B=y
+CONFIG_CRYPTO_CMAC=y
+CONFIG_CRYPTO_MD5=y
+CONFIG_CRYPTO_XCBC=y
+CONFIG_CRYPTO_LZO=y
+CONFIG_CRYPTO_LZ4=y
+CONFIG_CRYPTO_ZSTD=y
+CONFIG_CRYPTO_ANSI_CPRNG=y
+CONFIG_CRYPTO_SHA2_ARM64_CE=y
+CONFIG_CRYPTO_SHA512_ARM64_CE=y
+CONFIG_CRYPTO_POLYVAL_ARM64_CE=y
+CONFIG_CRYPTO_AES_ARM64_CE_BLK=y
+CONFIG_TRACE_MMIO_ACCESS=y
+CONFIG_CRC_CCITT=y
+CONFIG_XZ_DEC=y
+CONFIG_DMA_CMA=y
+CONFIG_PRINTK_TIME=y
+CONFIG_PRINTK_CALLER=y
+CONFIG_DYNAMIC_DEBUG_CORE=y
+CONFIG_DEBUG_INFO_DWARF4=y
+CONFIG_DEBUG_INFO_BTF=y
+CONFIG_MODULE_ALLOW_BTF_MISMATCH=y
+CONFIG_HEADERS_INSTALL=y
+# CONFIG_SECTION_MISMATCH_WARN_ONLY is not set
+CONFIG_MAGIC_SYSRQ=y
+CONFIG_UBSAN=y
+CONFIG_UBSAN_TRAP=y
+CONFIG_UBSAN_LOCAL_BOUNDS=y
+# CONFIG_UBSAN_SHIFT is not set
+# CONFIG_UBSAN_BOOL is not set
+# CONFIG_UBSAN_ENUM is not set
+CONFIG_PAGE_OWNER=y
+CONFIG_DEBUG_STACK_USAGE=y
+CONFIG_DEBUG_MEMORY_INIT=y
+CONFIG_KASAN=y
+CONFIG_KASAN_HW_TAGS=y
+CONFIG_KFENCE=y
+CONFIG_KFENCE_SAMPLE_INTERVAL=500
+CONFIG_KFENCE_NUM_OBJECTS=63
+CONFIG_KFENCE_STATIC_KEYS=y
+CONFIG_PANIC_ON_OOPS=y
+CONFIG_PANIC_TIMEOUT=-1
+CONFIG_SOFTLOCKUP_DETECTOR=y
+CONFIG_WQ_WATCHDOG=y
+CONFIG_SCHEDSTATS=y
+# CONFIG_DEBUG_PREEMPT is not set
+CONFIG_BUG_ON_DATA_CORRUPTION=y
+CONFIG_HIST_TRIGGERS=y
+CONFIG_PID_IN_CONTEXTIDR=y
+CONFIG_KUNIT=y
+CONFIG_KUNIT_DEBUGFS=y
+# CONFIG_KUNIT_DEFAULT_ENABLED is not set
+# CONFIG_RUNTIME_TESTING_MENU is not set
diff --git a/arch/arm64/configs/rockpi4_gki.fragment b/arch/arm64/configs/rockpi4_gki.fragment
new file mode 100644
index 0000000..2af01b8
--- /dev/null
+++ b/arch/arm64/configs/rockpi4_gki.fragment
@@ -0,0 +1,82 @@
+# Core features
+CONFIG_ARCH_ROCKCHIP=y
+# CONFIG_CLK_PX30 is not set
+# CONFIG_CLK_RV110X is not set
+# CONFIG_CLK_RK3036 is not set
+# CONFIG_CLK_RK312X is not set
+# CONFIG_CLK_RK3188 is not set
+# CONFIG_CLK_RK322X is not set
+# CONFIG_CLK_RK3288 is not set
+# CONFIG_CLK_RK3308 is not set
+# CONFIG_CLK_RK3328 is not set
+# CONFIG_CLK_RK3368 is not set
+CONFIG_COMMON_CLK_RK808=m
+CONFIG_CPUFREQ_DT=m
+CONFIG_MFD_RK808=m
+CONFIG_PCIE_ROCKCHIP_HOST=m
+CONFIG_PHY_ROCKCHIP_PCIE=m
+CONFIG_PL330_DMA=m
+CONFIG_PWM_ROCKCHIP=m
+CONFIG_PWRSEQ_SIMPLE=m
+CONFIG_REGULATOR_FAN53555=m
+CONFIG_REGULATOR_PWM=m
+CONFIG_REGULATOR_RK808=m
+CONFIG_ROCKCHIP_EFUSE=m
+CONFIG_ROCKCHIP_IOMMU=y
+CONFIG_ROCKCHIP_IODOMAIN=m
+CONFIG_ROCKCHIP_MBOX=y
+CONFIG_ROCKCHIP_PM_DOMAINS=y
+CONFIG_ROCKCHIP_THERMAL=m
+
+# Ethernet
+CONFIG_STMMAC_ETH=m
+# CONFIG_DWMAC_GENERIC is not set
+# CONFIG_DWMAC_IPQ806X is not set
+# CONFIG_DWMAC_QCOM_ETHQOS is not set
+# CONFIG_DWMAC_SUNXI is not set
+# CONFIG_DWMAC_SUN8I is not set
+
+# I2C
+CONFIG_I2C_RK3X=m
+
+# Watchdog
+CONFIG_DW_WATCHDOG=m
+
+# Display
+CONFIG_DRM_ROCKCHIP=m
+CONFIG_ROCKCHIP_ANALOGIX_DP=y
+CONFIG_ROCKCHIP_DW_HDMI=y
+CONFIG_ROCKCHIP_DW_MIPI_DSI=y
+
+# USB 2.x
+CONFIG_PHY_ROCKCHIP_INNO_USB2=m
+CONFIG_USB_OHCI_HCD=m
+# CONFIG_USB_OHCI_HCD_PCI is not set
+CONFIG_USB_OHCI_HCD_PLATFORM=m
+
+# eMMC / SD-Card
+CONFIG_MMC_SDHCI_OF_ARASAN=m
+CONFIG_MMC_DW=m
+CONFIG_MMC_DW_ROCKCHIP=m
+CONFIG_PHY_ROCKCHIP_EMMC=m
+
+# Real-time clock
+CONFIG_RTC_DRV_RK808=m
+
+# Type-C
+CONFIG_PHY_ROCKCHIP_TYPEC=m
+
+# SAR ADC
+CONFIG_ROCKCHIP_SARADC=m
+
+# Audio
+CONFIG_SND_SOC_ROCKCHIP_I2S=m
+
+# To boot Linux distributions like Debian
+CONFIG_DEVTMPFS=y
+
+# To bootstrap rootfs with QEMU
+CONFIG_HW_RANDOM_VIRTIO=m
+CONFIG_VIRTIO_PCI=m
+CONFIG_VIRTIO_BLK=m
+CONFIG_VIRTIO_NET=m
diff --git a/arch/arm64/crypto/Kbuild.fips140 b/arch/arm64/crypto/Kbuild.fips140
new file mode 100644
index 0000000..9aa0af6
--- /dev/null
+++ b/arch/arm64/crypto/Kbuild.fips140
@@ -0,0 +1,52 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Create a separate FIPS archive that duplicates the modules that are relevant
+# for FIPS 140 certification as builtin objects
+#
+
+sha1-ce-y := sha1-ce-glue.o sha1-ce-core.o
+sha2-ce-y := sha2-ce-glue.o sha2-ce-core.o
+sha512-ce-y := sha512-ce-glue.o sha512-ce-core.o
+ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
+aes-ce-cipher-y := aes-ce-core.o aes-ce-glue.o
+aes-ce-blk-y := aes-glue-ce.o aes-ce.o
+aes-neon-blk-y := aes-glue-neon.o aes-neon.o
+sha256-arm64-y := sha256-glue.o sha256-core.o
+sha512-arm64-y := sha512-glue.o sha512-core.o
+aes-arm64-y := aes-cipher-core.o aes-cipher-glue.o
+aes-neon-bs-y := aes-neonbs-core.o aes-neonbs-glue.o
+
+crypto-arm64-fips-src	  := $(srctree)/arch/arm64/crypto/
+crypto-arm64-fips-modules := sha1-ce.o sha2-ce.o sha512-ce.o ghash-ce.o \
+			     aes-ce-cipher.o aes-ce-blk.o aes-neon-blk.o \
+			     sha256-arm64.o sha512-arm64.o aes-arm64.o \
+			     aes-neon-bs.o
+
+crypto-fips-objs += $(foreach o,$(crypto-arm64-fips-modules),$($(o:.o=-y):.o=-fips-arch.o))
+
+CFLAGS_aes-glue-ce-fips-arch.o := -DUSE_V8_CRYPTO_EXTENSIONS
+
+$(obj)/aes-glue-%-fips-arch.o: KBUILD_CFLAGS += $(FIPS140_CFLAGS)
+$(obj)/aes-glue-%-fips-arch.o: $(crypto-arm64-fips-src)/aes-glue.c FORCE
+	$(call if_changed_rule,cc_o_c)
+
+$(obj)/%-fips-arch.o: KBUILD_CFLAGS += $(FIPS140_CFLAGS)
+$(obj)/%-fips-arch.o: $(crypto-arm64-fips-src)/%.c FORCE
+	$(call if_changed_rule,cc_o_c)
+
+$(obj)/%-fips-arch.o: $(crypto-arm64-fips-src)/%.S FORCE
+	$(call if_changed_rule,as_o_S)
+
+quiet_cmd_perlasm = PERLASM $@
+      cmd_perlasm = $(PERL) $(<) void $(@)
+
+$(obj)/%-core.S: $(crypto-arm64-fips-src)/%-armv8.pl
+	$(call cmd,perlasm)
+
+$(obj)/sha256-core.S: $(crypto-arm64-fips-src)/sha512-armv8.pl
+	$(call cmd,perlasm)
+
+clean-files += sha256-core.S sha512-core.S
+
+$(obj)/%-fips-arch.o: $(obj)/%.S FORCE
+	$(call if_changed_rule,as_o_S)
diff --git a/arch/arm64/include/asm/alternative-macros.h b/arch/arm64/include/asm/alternative-macros.h
index 3622e9f..c7842fd 100644
--- a/arch/arm64/include/asm/alternative-macros.h
+++ b/arch/arm64/include/asm/alternative-macros.h
@@ -19,6 +19,7 @@
 #error "cpucaps have overflown ARM64_CB_BIT"
 #endif
 
+#ifndef BUILD_FIPS140_KO
 #ifndef __ASSEMBLY__
 
 #include <linux/stringify.h>
@@ -261,4 +262,50 @@ alternative_has_feature_unlikely(unsigned long feature)
 
 #endif /* __ASSEMBLY__ */
 
+#else
+
+/*
+ * The FIPS140 module does not support alternatives patching, as this
+ * invalidates the HMAC digest of the .text section. However, some alternatives
+ * are known to be irrelevant so we can tolerate them in the FIPS140 module, as
+ * they will never be applied in the first place in the use cases that the
+ * FIPS140 module targets (Android running on a production phone). Any other
+ * uses of alternatives should be avoided, as it is not safe in the general
+ * case to simply use the default sequence in one place (the fips module) and
+ * the alternative sequence everywhere else.
+ *
+ * Below is an allowlist of features that we can ignore, by simply taking the
+ * safe default instruction sequence. Note that this implies that the FIPS140
+ * module is not compatible with VHE, or with pseudo-NMI support.
+ */
+
+#define __ALT_ARM64_HAS_LDAPR			0,
+#define __ALT_ARM64_HAS_VIRT_HOST_EXTN		0,
+#define __ALT_ARM64_HAS_IRQ_PRIO_MASKING	0,
+
+#define ALTERNATIVE(oldinstr, newinstr, feature, ...)   \
+	_ALTERNATIVE(oldinstr, __ALT_ ## feature, #feature)
+
+#define _ALTERNATIVE(oldinstr, feature, feature_str)   \
+	__take_second_arg(feature oldinstr, \
+		".err Feature " feature_str " not supported in fips140 module")
+
+#ifndef __ASSEMBLY__
+
+#include <linux/types.h>
+
+static __always_inline bool
+alternative_has_feature_likely(unsigned long feature)
+{
+	return feature == ARM64_HAS_LDAPR ||
+		feature == ARM64_HAS_VIRT_HOST_EXTN ||
+		feature == ARM64_HAS_IRQ_PRIO_MASKING;
+}
+
+#define alternative_has_feature_unlikely alternative_has_feature_likely
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* BUILD_FIPS140_KO */
+
 #endif /* __ASM_ALTERNATIVE_MACROS_H */
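
The allowlist above relies on the kernel's second-argument-selection idiom (__take_second_arg, from include/linux/kconfig.h): an allowlisted feature expands to "0,", shifting the safe default sequence into the argument slot that gets picked, while any other feature leaves only the .err string behind and fails the build. A minimal standalone sketch of that preprocessor mechanic, with strings standing in for instruction sequences and a hypothetical FEATURE_ALLOWED name (the kernel version's trailing varargs are trimmed):

#include <stdio.h>

/* Mirrors include/linux/kconfig.h: always yields its second argument. */
#define __take_second_arg(__ignored, val, ...) val

/* Allowlisted feature: the trailing comma is what makes the trick work. */
#define __ALT_FEATURE_ALLOWED	0,

#define ALTERNATIVE(oldinstr, newinstr, feature)	\
	_ALTERNATIVE(oldinstr, __ALT_ ## feature, #feature)

#define _ALTERNATIVE(oldinstr, feature, feature_str)	\
	__take_second_arg(feature oldinstr,		\
		"error: " feature_str " not supported")

int main(void)
{
	/*
	 * __ALT_FEATURE_ALLOWED expands to "0,", so the rescanned call is
	 * __take_second_arg(0, oldinstr, error-string) and oldinstr wins.
	 */
	puts(ALTERNATIVE("safe default sequence", "patched sequence",
			 FEATURE_ALLOWED));
	return 0;
}

Removing the __ALT_FEATURE_ALLOWED define would make the error string the selected argument instead, which in the assembly case turns an unexpected alternative into an immediate build failure.
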
diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
index 668569a..1cc29960 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -196,4 +196,96 @@
 	__init_el2_nvhe_prepare_eret
 .endm
 
+#ifndef __KVM_NVHE_HYPERVISOR__
+// This will clobber tmp1 and tmp2, and expects tmp1 to contain
+// the id register value as read from the HW
+.macro __check_override idreg, fld, width, pass, fail, tmp1, tmp2
+	ubfx	\tmp1, \tmp1, #\fld, #\width
+	cbz	\tmp1, \fail
+
+	adr_l	\tmp1, \idreg\()_override
+	ldr	\tmp2, [\tmp1, FTR_OVR_VAL_OFFSET]
+	ldr	\tmp1, [\tmp1, FTR_OVR_MASK_OFFSET]
+	ubfx	\tmp2, \tmp2, #\fld, #\width
+	ubfx	\tmp1, \tmp1, #\fld, #\width
+	cmp	\tmp1, xzr
+	and	\tmp2, \tmp2, \tmp1
+	csinv	\tmp2, \tmp2, xzr, ne
+	cbnz	\tmp2, \pass
+	b	\fail
+.endm
+
+// This will clobber tmp1 and tmp2
+.macro check_override idreg, fld, pass, fail, tmp1, tmp2
+	mrs	\tmp1, \idreg\()_el1
+	__check_override \idreg \fld 4 \pass \fail \tmp1 \tmp2
+.endm
+#else
+// This will clobber tmp
+.macro __check_override idreg, fld, width, pass, fail, tmp, ignore
+	ldr_l	\tmp, \idreg\()_el1_sys_val
+	ubfx	\tmp, \tmp, #\fld, #\width
+	cbnz	\tmp, \pass
+	b	\fail
+.endm
+
+.macro check_override idreg, fld, pass, fail, tmp, ignore
+	__check_override \idreg \fld 4 \pass \fail \tmp \ignore
+.endm
+#endif
+
+.macro finalise_el2_state
+	check_override id_aa64pfr0, ID_AA64PFR0_EL1_SVE_SHIFT, .Linit_sve_\@, .Lskip_sve_\@, x1, x2
+
+.Linit_sve_\@:	/* SVE register access */
+	mrs	x0, cptr_el2			// Disable SVE traps
+	bic	x0, x0, #CPTR_EL2_TZ
+	msr	cptr_el2, x0
+	isb
+	mov	x1, #ZCR_ELx_LEN_MASK		// SVE: Enable full vector
+	msr_s	SYS_ZCR_EL2, x1			// length for EL1.
+
+.Lskip_sve_\@:
+	check_override id_aa64pfr1, ID_AA64PFR1_EL1_SME_SHIFT, .Linit_sme_\@, .Lskip_sme_\@, x1, x2
+
+.Linit_sme_\@:	/* SME register access and priority mapping */
+	mrs	x0, cptr_el2			// Disable SME traps
+	bic	x0, x0, #CPTR_EL2_TSM
+	msr	cptr_el2, x0
+	isb
+
+	mrs	x1, sctlr_el2
+	orr	x1, x1, #SCTLR_ELx_ENTP2	// Disable TPIDR2 traps
+	msr	sctlr_el2, x1
+	isb
+
+	mov	x0, #0				// SMCR controls
+
+	// Full FP in SM?
+	mrs_s	x1, SYS_ID_AA64SMFR0_EL1
+	__check_override id_aa64smfr0, ID_AA64SMFR0_EL1_FA64_SHIFT, 1, .Linit_sme_fa64_\@, .Lskip_sme_fa64_\@, x1, x2
+
+.Linit_sme_fa64_\@:
+	orr	x0, x0, SMCR_ELx_FA64_MASK
+.Lskip_sme_fa64_\@:
+
+	orr	x0, x0, #SMCR_ELx_LEN_MASK	// Enable full SME vector
+	msr_s	SYS_SMCR_EL2, x0		// length for EL1.
+
+	mrs_s	x1, SYS_SMIDR_EL1		// Priority mapping supported?
+	ubfx    x1, x1, #SMIDR_EL1_SMPS_SHIFT, #1
+	cbz     x1, .Lskip_sme_\@
+
+	msr_s	SYS_SMPRIMAP_EL2, xzr		// Make all priorities equal
+
+	mrs	x1, id_aa64mmfr1_el1		// HCRX_EL2 present?
+	ubfx	x1, x1, #ID_AA64MMFR1_EL1_HCX_SHIFT, #4
+	cbz	x1, .Lskip_sme_\@
+
+	mrs_s	x1, SYS_HCRX_EL2
+	orr	x1, x1, #HCRX_EL2_SMPME_MASK	// Enable priority mapping
+	msr_s	SYS_HCRX_EL2, x1
+.Lskip_sme_\@:
+.endm
+
 #endif /* __ARM_KVM_INIT_H__ */
diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index 71ed5fd..02bf6e4 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -109,6 +109,8 @@ void __init early_fixmap_init(void);
 
 extern void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot);
 
+extern pte_t *__get_fixmap_pte(enum fixed_addresses idx);
+
 #include <asm-generic/fixmap.h>
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/arm64/include/asm/hypervisor.h b/arch/arm64/include/asm/hypervisor.h
index 0ae427f..9b4e4ed 100644
--- a/arch/arm64/include/asm/hypervisor.h
+++ b/arch/arm64/include/asm/hypervisor.h
@@ -6,5 +6,14 @@
 
 void kvm_init_hyp_services(void);
 bool kvm_arm_hyp_service_available(u32 func_id);
+void kvm_arm_init_hyp_services(void);
+void kvm_init_memshare_services(void);
+void kvm_init_ioremap_services(void);
+
+#ifdef CONFIG_MEMORY_RELINQUISH
+void kvm_init_memrelinquish_services(void);
+#else
+static inline void kvm_init_memrelinquish_services(void) {}
+#endif
 
 #endif
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 8aa8492..7313fac 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -135,7 +135,7 @@
  * 40 bits wide (T0SZ = 24).  Systems with a PARange smaller than 40 bits are
  * not known to exist and will break with this configuration.
  *
- * The VTCR_EL2 is configured per VM and is initialised in kvm_arm_setup_stage2().
+ * The VTCR_EL2 is configured per VM and is initialised in kvm_init_stage2_mmu().
  *
  * Note that when using 4K pages, we concatenate two first level page tables
  * together. With 16K pages, we concatenate 16 first level page tables.
@@ -344,6 +344,8 @@
 #define PAR_TO_HPFAR(par)		\
 	(((par) & GENMASK_ULL(PHYS_MASK_SHIFT - 1, 12)) >> 8)
 
+#define FAR_MASK GENMASK_ULL(11, 0)
+
 #define ECN(x) { ESR_ELx_EC_##x, #x }
 
 #define kvm_arm_exception_class \
@@ -361,4 +363,13 @@
 #define CPACR_EL1_DEFAULT	(CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN |\
 				 CPACR_EL1_ZEN_EL1EN)
 
+/*
+ * ARMv8 Reset Values
+ */
+#define VCPU_RESET_PSTATE_EL1	(PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT | \
+				 PSR_F_BIT | PSR_D_BIT)
+
+#define VCPU_RESET_PSTATE_SVC	(PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
+				 PSR_AA32_I_BIT | PSR_AA32_F_BIT)
+
 #endif /* __ARM64_KVM_ARM_H__ */
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 5303576..96fecb0 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -59,23 +59,55 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___kvm_enable_ssbs,
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_init_lrs,
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_get_gic_config,
+	__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
+	__KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_ipa,
+	__KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid,
+	__KVM_HOST_SMCCC_FUNC___kvm_flush_cpu_context,
+
+	/*
+	 * __pkvm_alloc_module_va may temporarily serve as the privileged hcall
+	 * limit when module loading is enabled; see early_pkvm_enable_modules().
+	 */
+	__KVM_HOST_SMCCC_FUNC___pkvm_alloc_module_va,
+	__KVM_HOST_SMCCC_FUNC___pkvm_map_module_page,
+	__KVM_HOST_SMCCC_FUNC___pkvm_unmap_module_page,
+	__KVM_HOST_SMCCC_FUNC___pkvm_init_module,
+	__KVM_HOST_SMCCC_FUNC___pkvm_register_hcall,
+	__KVM_HOST_SMCCC_FUNC___pkvm_close_module_registration,
 	__KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
 
 	/* Hypercalls available after pKVM finalisation */
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_map_guest,
 	__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
 	__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
-	__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
-	__KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_ipa,
-	__KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid,
-	__KVM_HOST_SMCCC_FUNC___kvm_flush_cpu_context,
 	__KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff,
-	__KVM_HOST_SMCCC_FUNC___vgic_v3_read_vmcr,
-	__KVM_HOST_SMCCC_FUNC___vgic_v3_write_vmcr,
-	__KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs,
-	__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs,
-	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_init_traps,
+	__KVM_HOST_SMCCC_FUNC___vgic_v3_save_vmcr_aprs,
+	__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
+	__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
+	__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
+	__KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
+	__KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
+	__KVM_HOST_SMCCC_FUNC___pkvm_reclaim_dying_guest_page,
+	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
+	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
+	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_sync_state,
+	__KVM_HOST_SMCCC_FUNC___pkvm_iommu_driver_init,
+	__KVM_HOST_SMCCC_FUNC___pkvm_iommu_register,
+	__KVM_HOST_SMCCC_FUNC___pkvm_iommu_pm_notify,
+	__KVM_HOST_SMCCC_FUNC___pkvm_iommu_finalize,
+	__KVM_HOST_SMCCC_FUNC___pkvm_start_tracing,
+	__KVM_HOST_SMCCC_FUNC___pkvm_stop_tracing,
+	__KVM_HOST_SMCCC_FUNC___pkvm_rb_swap_reader_page,
+	__KVM_HOST_SMCCC_FUNC___pkvm_rb_update_footers,
+	__KVM_HOST_SMCCC_FUNC___pkvm_enable_event,
+
+	/*
+	 * Start of the dynamically registered hypercalls. Start a bit
+	 * further, just in case some modules...
+	 */
+	__KVM_HOST_SMCCC_FUNC___dynamic_hcalls = 128,
 };
 
 #define DECLARE_KVM_VHE_SYM(sym)	extern char sym[]
@@ -106,7 +138,7 @@ enum __kvm_host_smccc_func {
 #define per_cpu_ptr_nvhe_sym(sym, cpu)						\
 	({									\
 		unsigned long base, off;					\
-		base = kvm_arm_hyp_percpu_base[cpu];				\
+		base = kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu];		\
 		off = (unsigned long)&CHOOSE_NVHE_SYM(sym) -			\
 		      (unsigned long)&CHOOSE_NVHE_SYM(__per_cpu_start);		\
 		base ? (typeof(CHOOSE_NVHE_SYM(sym))*)(base + off) : NULL;	\
@@ -211,7 +243,7 @@ DECLARE_KVM_HYP_SYM(__kvm_hyp_vector);
 #define __kvm_hyp_init		CHOOSE_NVHE_SYM(__kvm_hyp_init)
 #define __kvm_hyp_vector	CHOOSE_HYP_SYM(__kvm_hyp_vector)
 
-extern unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
+extern unsigned long kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[];
 DECLARE_KVM_NVHE_SYM(__per_cpu_start);
 DECLARE_KVM_NVHE_SYM(__per_cpu_end);
 
@@ -231,8 +263,6 @@ extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 extern void __kvm_adjust_pc(struct kvm_vcpu *vcpu);
 
 extern u64 __vgic_v3_get_gic_config(void);
-extern u64 __vgic_v3_read_vmcr(void);
-extern void __vgic_v3_write_vmcr(u32 vmcr);
 extern void __vgic_v3_init_lrs(void);
 
 extern u64 __kvm_get_mdcr_el2(void);
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 0d40c48..2cfce26 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -42,6 +42,11 @@ void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
 void kvm_inject_size_fault(struct kvm_vcpu *vcpu);
 
+unsigned long get_except64_offset(unsigned long psr, unsigned long target_mode,
+				  enum exception_type type);
+unsigned long get_except64_cpsr(unsigned long old, bool has_mte,
+				unsigned long sctlr, unsigned long mode);
+
 void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
 
 #if defined(__KVM_VHE_HYPERVISOR__) || defined(__KVM_NVHE_HYPERVISOR__)
@@ -508,4 +513,61 @@ static inline bool vcpu_has_feature(struct kvm_vcpu *vcpu, int feature)
 	return test_bit(feature, vcpu->arch.features);
 }
 
+static inline int kvm_vcpu_enable_ptrauth(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * For now make sure that both address/generic pointer authentication
+	 * features are requested by userspace together and the system
+	 * supports these capabilities.
+	 */
+	if (!vcpu_has_feature(vcpu, KVM_ARM_VCPU_PTRAUTH_ADDRESS) ||
+	    !vcpu_has_feature(vcpu, KVM_ARM_VCPU_PTRAUTH_GENERIC) ||
+	    !system_has_full_ptr_auth())
+		return -EINVAL;
+
+	vcpu_set_flag(vcpu, GUEST_HAS_PTRAUTH);
+	return 0;
+}
+
+/* Reset a vcpu's core registers. */
+static inline void kvm_reset_vcpu_core(struct kvm_vcpu *vcpu)
+{
+	u32 pstate;
+
+	if (vcpu_el1_is_32bit(vcpu)) {
+		pstate = VCPU_RESET_PSTATE_SVC;
+	} else {
+		pstate = VCPU_RESET_PSTATE_EL1;
+	}
+
+	/* Reset core registers */
+	memset(vcpu_gp_regs(vcpu), 0, sizeof(*vcpu_gp_regs(vcpu)));
+	memset(&vcpu->arch.ctxt.fp_regs, 0, sizeof(vcpu->arch.ctxt.fp_regs));
+	vcpu->arch.ctxt.spsr_abt = 0;
+	vcpu->arch.ctxt.spsr_und = 0;
+	vcpu->arch.ctxt.spsr_irq = 0;
+	vcpu->arch.ctxt.spsr_fiq = 0;
+	vcpu_gp_regs(vcpu)->pstate = pstate;
+}
+
+/* PSCI reset handling for a vcpu. */
+static inline void kvm_reset_vcpu_psci(struct kvm_vcpu *vcpu,
+				       struct vcpu_reset_state *reset_state)
+{
+	unsigned long target_pc = reset_state->pc;
+
+	/* Gracefully handle Thumb2 entry point */
+	if (vcpu_mode_is_32bit(vcpu) && (target_pc & 1)) {
+		target_pc &= ~1UL;
+		vcpu_set_thumb(vcpu);
+	}
+
+	/* Propagate caller endianness */
+	if (reset_state->be)
+		kvm_vcpu_set_be(vcpu);
+
+	*vcpu_pc(vcpu) = target_pc;
+	vcpu_set_reg(vcpu, 0, reset_state->r0);
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 45e2136..b4209ee 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -73,6 +73,64 @@ u32 __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
 
+struct kvm_hyp_memcache {
+	phys_addr_t head;
+	unsigned long nr_pages;
+};
+
+static inline void push_hyp_memcache(struct kvm_hyp_memcache *mc,
+				     phys_addr_t *p,
+				     phys_addr_t (*to_pa)(void *virt))
+{
+	*p = mc->head;
+	mc->head = to_pa(p);
+	mc->nr_pages++;
+}
+
+static inline void *pop_hyp_memcache(struct kvm_hyp_memcache *mc,
+				     void *(*to_va)(phys_addr_t phys))
+{
+	phys_addr_t *p = to_va(mc->head);
+
+	if (!mc->nr_pages)
+		return NULL;
+
+	mc->head = *p;
+	mc->nr_pages--;
+
+	return p;
+}
+
+static inline int __topup_hyp_memcache(struct kvm_hyp_memcache *mc,
+				       unsigned long min_pages,
+				       void *(*alloc_fn)(void *arg),
+				       phys_addr_t (*to_pa)(void *virt),
+				       void *arg)
+{
+	while (mc->nr_pages < min_pages) {
+		phys_addr_t *p = alloc_fn(arg);
+
+		if (!p)
+			return -ENOMEM;
+		push_hyp_memcache(mc, p, to_pa);
+	}
+
+	return 0;
+}
+
+static inline void __free_hyp_memcache(struct kvm_hyp_memcache *mc,
+				       void (*free_fn)(void *virt, void *arg),
+				       void *(*to_va)(phys_addr_t phys),
+				       void *arg)
+{
+	while (mc->nr_pages)
+		free_fn(pop_hyp_memcache(mc, to_va), arg);
+}
+
+void free_hyp_memcache(struct kvm_hyp_memcache *mc, struct kvm *kvm);
+void free_hyp_stage2_memcache(struct kvm_hyp_memcache *mc, struct kvm *kvm);
+int topup_hyp_memcache(struct kvm_vcpu *vcpu);
+
 struct kvm_vmid {
 	atomic64_t id;
 };
@@ -115,6 +173,23 @@ struct kvm_smccc_features {
 	unsigned long vendor_hyp_bmap;
 };
 
+struct kvm_pinned_page {
+	struct rb_node		node;
+	struct page		*page;
+	u64			ipa;
+};
+
+typedef unsigned int pkvm_handle_t;
+
+struct kvm_protected_vm {
+	pkvm_handle_t handle;
+	struct kvm_hyp_memcache teardown_mc;
+	struct kvm_hyp_memcache teardown_stage2_mc;
+	struct rb_root pinned_pages;
+	gpa_t pvmfw_load_addr;
+	bool enabled;
+};
+
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
@@ -149,7 +224,8 @@ struct kvm_arch {
 #define KVM_ARCH_FLAG_EL1_32BIT				4
 	/* PSCI SYSTEM_SUSPEND enabled for the guest */
 #define KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED		5
-
+	/* Guest has bought into the MMIO guard extension */
+#define KVM_ARCH_FLAG_MMIO_GUARD			6
 	unsigned long flags;
 
 	/*
@@ -166,6 +242,12 @@ struct kvm_arch {
 
 	/* Hypercall features firmware registers' descriptor */
 	struct kvm_smccc_features smccc_feat;
+
+	/*
+	 * For an untrusted host VM, 'pkvm.handle' is used to look up
+	 * the associated pKVM instance in the hypervisor.
+	 */
+	struct kvm_protected_vm pkvm;
 };
 
 struct kvm_vcpu_fault_info {
@@ -277,6 +359,7 @@ struct kvm_host_data {
 struct kvm_host_psci_config {
 	/* PSCI version used by host. */
 	u32 version;
+	u32 smccc_version;
 
 	/* Function IDs used by host if version is v0.1. */
 	struct psci_0_1_function_ids function_ids_0_1;
@@ -296,6 +379,28 @@ extern s64 kvm_nvhe_sym(hyp_physvirt_offset);
 extern u64 kvm_nvhe_sym(hyp_cpu_logical_map)[NR_CPUS];
 #define hyp_cpu_logical_map CHOOSE_NVHE_SYM(hyp_cpu_logical_map)
 
+enum pkvm_iommu_pm_event {
+	PKVM_IOMMU_PM_SUSPEND,
+	PKVM_IOMMU_PM_RESUME,
+};
+
+struct pkvm_iommu_ops;
+
+struct pkvm_iommu_driver {
+	const struct pkvm_iommu_ops *ops;
+	struct list_head list;
+	atomic_t state;
+};
+
+int pkvm_iommu_driver_init(u64 drv, void *data, size_t size);
+int pkvm_iommu_register(struct device *dev, u64 drv,
+			phys_addr_t pa, size_t size, struct device *parent);
+int pkvm_iommu_suspend(struct device *dev);
+int pkvm_iommu_resume(struct device *dev);
+
+/* Reject future calls to pkvm_iommu_driver_init() and pkvm_iommu_register(). */
+int pkvm_iommu_finalize(void);
+
 struct vcpu_reset_state {
 	unsigned long	pc;
 	unsigned long	r0;
@@ -399,8 +504,12 @@ struct kvm_vcpu_arch {
 	/* vcpu power state */
 	struct kvm_mp_state mp_state;
 
-	/* Cache some mmu pages needed inside spinlock regions */
-	struct kvm_mmu_memory_cache mmu_page_cache;
+	union {
+		/* Cache some mmu pages needed inside spinlock regions */
+		struct kvm_mmu_memory_cache mmu_page_cache;
+		/* Pages to be donated to pkvm/EL2 if it runs out */
+		struct kvm_hyp_memcache pkvm_memcache;
+	};
 
 	/* Target CPU and feature flags */
 	int target;
@@ -474,9 +583,25 @@ struct kvm_vcpu_arch {
 		*fset &= ~(m);					\
 	} while (0)
 
+#define __vcpu_copy_flag(vt, vs, flagset, f, m)			\
+	do {							\
+		typeof(vs->arch.flagset) tmp, val;		\
+								\
+		__build_check_flag(vs, flagset, f, m);		\
+								\
+		val = READ_ONCE(vs->arch.flagset);		\
+		val &= (m);					\
+		tmp = READ_ONCE(vt->arch.flagset);		\
+		tmp &= ~(m);					\
+		tmp |= val;					\
+		WRITE_ONCE(vt->arch.flagset, tmp);		\
+	} while (0)
+
+
 #define vcpu_get_flag(v, ...)	__vcpu_get_flag((v), __VA_ARGS__)
 #define vcpu_set_flag(v, ...)	__vcpu_set_flag((v), __VA_ARGS__)
 #define vcpu_clear_flag(v, ...)	__vcpu_clear_flag((v), __VA_ARGS__)
+#define vcpu_copy_flag(vt, vs, ...) __vcpu_copy_flag((vt), (vs), __VA_ARGS__)
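
The read-modify-write in __vcpu_copy_flag above boils down to the classic masked bit copy, (dst & ~m) | (src & m): only the flag bits selected by the mask move from the source vcpu's flagset to the target's, and every other bit in the target survives. A standalone sketch (names hypothetical):

#include <stdio.h>

/* Move only the bits covered by m from src into dst. */
static unsigned long copy_masked(unsigned long dst, unsigned long src,
				 unsigned long m)
{
	return (dst & ~m) | (src & m);
}

int main(void)
{
	/* Copy an EXCEPT_MASK-style field occupying bits [3:1]. */
	printf("%#lx\n", copy_masked(0xf0, 0x06, 0x0e));	/* 0xf6 */
	return 0;
}
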
 
 /* SVE exposed to guest */
 #define GUEST_HAS_SVE		__vcpu_single_flag(cflags, BIT(0))
@@ -494,6 +619,8 @@ struct kvm_vcpu_arch {
 #define INCREMENT_PC		__vcpu_single_flag(iflags, BIT(1))
 /* Target EL/MODE (not a single flag, but let's abuse the macro) */
 #define EXCEPT_MASK		__vcpu_single_flag(iflags, GENMASK(3, 1))
+/* Cover both PENDING_EXCEPTION and EXCEPT_MASK for global operations */
+#define PC_UPDATE_REQ		__vcpu_single_flag(iflags, GENMASK(3, 0))
 
 /* Helpers to encode exceptions with minimum fuss */
 #define __EXCEPT_MASK_VAL	unpack_vcpu_flag(EXCEPT_MASK)
@@ -525,6 +652,8 @@ struct kvm_vcpu_arch {
 #define DEBUG_STATE_SAVE_SPE	__vcpu_single_flag(iflags, BIT(5))
 /* Save TRBE context if active  */
 #define DEBUG_STATE_SAVE_TRBE	__vcpu_single_flag(iflags, BIT(6))
+/* pKVM host vcpu state is dirty, needs resync */
+#define PKVM_HOST_STATE_DIRTY	__vcpu_single_flag(iflags, BIT(7))
 
 /* SVE enabled for host EL0 */
 #define HOST_SVE_ENABLED	__vcpu_single_flag(sflags, BIT(0))
@@ -601,9 +730,6 @@ struct kvm_vcpu_arch {
 
 #define __vcpu_sys_reg(v,r)	(ctxt_sys_reg(&(v)->arch.ctxt, (r)))
 
-u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg);
-void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg);
-
 static inline bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
 {
 	/*
@@ -695,8 +821,32 @@ static inline bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
 	return true;
 }
 
+#define vcpu_read_sys_reg(__vcpu, reg)					\
+	({								\
+		u64 __val = 0x8badf00d8badf00d;				\
+									\
+		/* SYSREGS_ON_CPU is only used in VHE */		\
+		((!is_nvhe_hyp_code() &&				\
+		  vcpu_get_flag(__vcpu, SYSREGS_ON_CPU) &&		\
+		  __vcpu_read_sys_reg_from_cpu(reg, &__val))) ?		\
+		 __val							\
+		 :							\
+		 ctxt_sys_reg(&__vcpu->arch.ctxt, reg);			\
+	 })
+
+#define vcpu_write_sys_reg(__vcpu, __val, reg)				\
+	do {								\
+		/* SYSREGS_ON_CPU is only used in VHE */		\
+		if (is_nvhe_hyp_code() ||				\
+		    !vcpu_get_flag(__vcpu, SYSREGS_ON_CPU) ||		\
+		    !__vcpu_write_sys_reg_to_cpu(__val, reg))		\
+			ctxt_sys_reg(&__vcpu->arch.ctxt, reg) = __val;	\
+	} while (0)
+
 struct kvm_vm_stat {
 	struct kvm_vm_stat_generic generic;
+	atomic64_t protected_hyp_mem;
+	atomic64_t protected_shared_mem;
 };
 
 struct kvm_vcpu_stat {
@@ -869,9 +1019,26 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
 
+#define __vcpu_save_guest_debug_regs(vcpu)				\
+	do {								\
+		u64 val = vcpu_read_sys_reg(vcpu, MDSCR_EL1);		\
+									\
+		(vcpu)->arch.guest_debug_preserved.mdscr_el1 = val;	\
+	} while (0)
+
+#define __vcpu_restore_guest_debug_regs(vcpu)				\
+	do {								\
+		u64 val = (vcpu)->arch.guest_debug_preserved.mdscr_el1;	\
+									\
+		vcpu_write_sys_reg(vcpu, val, MDSCR_EL1);		\
+	} while (0)
+
 #define kvm_vcpu_os_lock_enabled(vcpu)		\
 	(!!(__vcpu_sys_reg(vcpu, OSLSR_EL1) & SYS_OSLSR_OSLK))
 
+#define kvm_vcpu_needs_debug_regs(vcpu)		\
+	((vcpu)->guest_debug || kvm_vcpu_os_lock_enabled(vcpu))
+
 int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
 			       struct kvm_device_attr *attr);
 int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
@@ -915,12 +1082,7 @@ int kvm_set_ipa_limit(void);
 #define __KVM_HAVE_ARCH_VM_ALLOC
 struct kvm *kvm_arch_alloc_vm(void);
 
-int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type);
-
-static inline bool kvm_vm_is_protected(struct kvm *kvm)
-{
-	return false;
-}
+#define kvm_vm_is_protected(kvm)	((kvm)->arch.pkvm.enabled)
 
 void kvm_init_protected_traps(struct kvm_vcpu *vcpu);
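
The kvm_hyp_memcache added above is an intrusive stack threaded through physical addresses: the first word of each cached page stores the physical address of the previous head, so the cache needs no metadata beyond the head/nr_pages pair and can be walked on either side of the host/hyp boundary through the to_pa/to_va callbacks. A standalone sketch of the same scheme, with malloc standing in for the page allocator and an identity mapping for the converters (all names hypothetical):

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

struct memcache {
	uintptr_t head;			/* "physical" address of top page */
	unsigned long nr_pages;
};

/* Identity stand-ins for the real to_pa()/to_va() converters. */
static uintptr_t to_pa(void *va) { return (uintptr_t)va; }
static void *to_va(uintptr_t pa) { return (void *)pa; }

static void push_page(struct memcache *mc, void *page)
{
	uintptr_t *p = page;

	*p = mc->head;			/* thread the old head through the page */
	mc->head = to_pa(p);
	mc->nr_pages++;
}

static void *pop_page(struct memcache *mc)
{
	uintptr_t *p;

	if (!mc->nr_pages)
		return NULL;
	p = to_va(mc->head);
	mc->head = *p;			/* unlink: first word was the old head */
	mc->nr_pages--;
	return p;
}

int main(void)
{
	struct memcache mc = { 0, 0 };
	void *p;

	for (int i = 0; i < 3; i++)	/* demo only: assume malloc succeeds */
		push_page(&mc, malloc(4096));
	while ((p = pop_page(&mc))) {
		printf("popped %p, %lu pages left\n", p, mc.nr_pages);
		free(p);
	}
	return 0;
}
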
 
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index aa7fa2a..d12b208 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -15,6 +15,9 @@
 DECLARE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
 DECLARE_PER_CPU(unsigned long, kvm_hyp_vector);
 DECLARE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
+DECLARE_PER_CPU(int, hyp_cpu_number);
+
+#define hyp_smp_processor_id() (__this_cpu_read(hyp_cpu_number))
 
 #define read_sysreg_elx(r,nvh,vh)					\
 	({								\
@@ -61,8 +64,8 @@ void __vgic_v3_save_state(struct vgic_v3_cpu_if *cpu_if);
 void __vgic_v3_restore_state(struct vgic_v3_cpu_if *cpu_if);
 void __vgic_v3_activate_traps(struct vgic_v3_cpu_if *cpu_if);
 void __vgic_v3_deactivate_traps(struct vgic_v3_cpu_if *cpu_if);
-void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if);
-void __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_save_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_restore_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if);
 int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
 #ifdef __KVM_NVHE_HYPERVISOR__
@@ -90,6 +93,7 @@ void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu);
 
 void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
 void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
+void __sve_save_state(void *sve_pffr, u32 *fpsr);
 void __sve_restore_state(void *sve_pffr, u32 *fpsr);
 
 #ifndef __KVM_NVHE_HYPERVISOR__
@@ -122,5 +126,18 @@ extern u64 kvm_nvhe_sym(id_aa64isar2_el1_sys_val);
 extern u64 kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val);
 extern u64 kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val);
 extern u64 kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val);
+extern u64 kvm_nvhe_sym(id_aa64smfr0_el1_sys_val);
 
+extern unsigned long kvm_nvhe_sym(__icache_flags);
+extern unsigned int kvm_nvhe_sym(kvm_arm_vmid_bits);
+extern bool kvm_nvhe_sym(smccc_trng_available);
+
+extern bool kvm_nvhe_sym(__pkvm_modules_enabled);
+
+struct kvm_nvhe_clock_data {
+	u32 mult;
+	u32 shift;
+	u64 epoch_ns;
+	u64 epoch_cyc;
+};
 #endif /* __ARM64_KVM_HYP_H__ */
diff --git a/arch/arm64/include/asm/kvm_hypevents.h b/arch/arm64/include/asm/kvm_hypevents.h
new file mode 100644
index 0000000..8a2dd41
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_hypevents.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ARM64_KVM_HYPEVENTS_H_
+#define __ARM64_KVM_HYPEVENTS_H_
+
+#ifdef __KVM_NVHE_HYPERVISOR__
+#include <nvhe/trace.h>
+#endif
+
+/*
+ * Hypervisor events definitions.
+ */
+
+HYP_EVENT(hyp_enter,
+	HE_PROTO(void),
+	HE_STRUCT(
+	),
+	HE_ASSIGN(
+	),
+	HE_PRINTK(" ")
+);
+
+HYP_EVENT(hyp_exit,
+	HE_PROTO(void),
+	HE_STRUCT(
+	),
+	HE_ASSIGN(
+	),
+	HE_PRINTK(" ")
+);
+
+HYP_EVENT(host_hcall,
+	HE_PROTO(unsigned int id, u8 invalid),
+	HE_STRUCT(
+		he_field(unsigned int, id)
+		he_field(u8, invalid)
+	),
+	HE_ASSIGN(
+		__entry->id = id;
+		__entry->invalid = invalid;
+	),
+	HE_PRINTK("id=%u invalid=%u",
+		  __entry->id, __entry->invalid)
+);
+
+HYP_EVENT(host_smc,
+	HE_PROTO(u64 id, u8 forwarded),
+	HE_STRUCT(
+		he_field(u64, id)
+		he_field(u8, forwarded)
+	),
+	HE_ASSIGN(
+		__entry->id = id;
+		__entry->forwarded = forwarded;
+	),
+	HE_PRINTK("id=%llu forwarded=%u",
+		  __entry->id, __entry->forwarded)
+);
+
+
+HYP_EVENT(host_mem_abort,
+	HE_PROTO(u64 esr, u64 addr),
+	HE_STRUCT(
+		he_field(u64, esr)
+		he_field(u64, addr)
+	),
+	HE_ASSIGN(
+		__entry->esr = esr;
+		__entry->addr = addr;
+	),
+	HE_PRINTK("esr=0x%llx addr=0x%llx",
+		  __entry->esr, __entry->addr)
+);
+#endif
diff --git a/arch/arm64/include/asm/kvm_hypevents_defs.h b/arch/arm64/include/asm/kvm_hypevents_defs.h
new file mode 100644
index 0000000..e228d89
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_hypevents_defs.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ARM64_KVM_HYPEVENTS_DEFS_H
+#define __ARM64_KVM_HYPEVENTS_DEFS_H
+
+struct hyp_event_id {
+	unsigned short id;
+	void *data;
+};
+
+struct hyp_entry_hdr {
+	unsigned short id;
+};
+
+/*
+ * Hyp events definitions common to the hyp and the host
+ */
+#define HYP_EVENT_FORMAT(__name, __struct)	\
+	struct trace_hyp_format_##__name {	\
+		struct hyp_entry_hdr hdr;	\
+		__struct			\
+	}
+
+#define HE_PROTO(args...)	args
+#define HE_STRUCT(args...)	args
+#define HE_ASSIGN(args...)	args
+#define HE_PRINTK(args...)	args
+
+#define he_field(type, item)	type item;
+#endif
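
Together with the HYP_EVENT() entries in kvm_hypevents.h above, these definitions stamp out one record layout per event: HYP_EVENT_FORMAT prepends the common hyp_entry_hdr and splices in the HE_STRUCT field list. A standalone sketch showing the host_hcall layout (u8 approximated by unsigned char, since linux/types.h is not available here):

#include <stdio.h>

struct hyp_entry_hdr {
	unsigned short id;
};

#define HYP_EVENT_FORMAT(__name, __struct)	\
	struct trace_hyp_format_##__name {	\
		struct hyp_entry_hdr hdr;	\
		__struct			\
	}

#define HE_STRUCT(args...)	args
#define he_field(type, item)	type item;

/* Mirrors the host_hcall event defined earlier. */
HYP_EVENT_FORMAT(host_hcall,
	HE_STRUCT(
		he_field(unsigned int, id)
		he_field(unsigned char, invalid)
	)
);

int main(void)
{
	struct trace_hyp_format_host_hcall rec = {
		.hdr = { .id = 1 }, .id = 42, .invalid = 0,
	};

	printf("event %u: id=%u, record is %zu bytes\n",
	       rec.hdr.id, rec.id, sizeof(rec));
	return 0;
}
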
diff --git a/arch/arm64/include/asm/kvm_hyptrace.h b/arch/arm64/include/asm/kvm_hyptrace.h
new file mode 100644
index 0000000..d32b445
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_hyptrace.h
@@ -0,0 +1,21 @@
+#ifndef __ARM64_KVM_HYPTRACE_H_
+#define __ARM64_KVM_HYPTRACE_H_
+#include <asm/kvm_hyp.h>
+
+#include <linux/ring_buffer_ext.h>
+
+/*
+ * Host donations to the hypervisor to store the struct hyp_buffer_page.
+ */
+struct hyp_buffer_pages_backing {
+	unsigned long start;
+	size_t size;
+};
+
+struct hyp_trace_pack {
+	struct hyp_buffer_pages_backing		backing;
+	struct kvm_nvhe_clock_data		trace_clock_data;
+	struct trace_buffer_pack		trace_buffer_pack;
+
+};
+#endif
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 7784081..0f2cfa1 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -116,6 +116,7 @@ alternative_cb_end
 #include <asm/cacheflush.h>
 #include <asm/mmu_context.h>
 #include <asm/kvm_host.h>
+#include <asm/kvm_pkvm_module.h>
 
 void kvm_update_va_mask(struct alt_instr *alt,
 			__le32 *origptr, __le32 *updptr, int nr_inst);
@@ -166,7 +167,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 void free_hyp_pgds(void);
 
 void stage2_unmap_vm(struct kvm *kvm);
-int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
+int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type);
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size, bool writable);
@@ -187,8 +188,13 @@ static inline void *__kvm_vector_slot2addr(void *base,
 
 struct kvm;
 
-#define kvm_flush_dcache_to_poc(a,l)	\
-	dcache_clean_inval_poc((unsigned long)(a), (unsigned long)(a)+(l))
+#define kvm_flush_dcache_to_poc(a, l)	do {			\
+	unsigned long __a = (unsigned long)(a);			\
+	unsigned long __l = (unsigned long)(l);			\
+								\
+	if (__l)						\
+		dcache_clean_inval_poc(__a, __a + __l);		\
+} while (0)
 
 static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 3252eb5..054612a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -42,6 +42,38 @@ typedef u64 kvm_pte_t;
 #define KVM_PTE_ADDR_MASK		GENMASK(47, PAGE_SHIFT)
 #define KVM_PTE_ADDR_51_48		GENMASK(15, 12)
 
+#define KVM_PHYS_INVALID		(-1ULL)
+
+#define KVM_PTE_TYPE			BIT(1)
+#define KVM_PTE_TYPE_BLOCK		0
+#define KVM_PTE_TYPE_PAGE		1
+#define KVM_PTE_TYPE_TABLE		1
+
+#define KVM_PTE_LEAF_ATTR_LO		GENMASK(11, 2)
+
+#define KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX	GENMASK(4, 2)
+#define KVM_PTE_LEAF_ATTR_LO_S1_AP	GENMASK(7, 6)
+#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RO	3
+#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RW	1
+#define KVM_PTE_LEAF_ATTR_LO_S1_SH	GENMASK(9, 8)
+#define KVM_PTE_LEAF_ATTR_LO_S1_SH_IS	3
+#define KVM_PTE_LEAF_ATTR_LO_S1_AF	BIT(10)
+
+#define KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR	GENMASK(5, 2)
+#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R	BIT(6)
+#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W	BIT(7)
+#define KVM_PTE_LEAF_ATTR_LO_S2_SH	GENMASK(9, 8)
+#define KVM_PTE_LEAF_ATTR_LO_S2_SH_IS	3
+#define KVM_PTE_LEAF_ATTR_LO_S2_AF	BIT(10)
+
+#define KVM_PTE_LEAF_ATTR_HI		GENMASK(63, 51)
+
+#define KVM_PTE_LEAF_ATTR_HI_SW		GENMASK(58, 55)
+
+#define KVM_PTE_LEAF_ATTR_HI_S1_XN	BIT(54)
+
+#define KVM_PTE_LEAF_ATTR_HI_S2_XN	BIT(54)
+
 static inline bool kvm_pte_valid(kvm_pte_t pte)
 {
 	return pte & KVM_PTE_VALID;
@@ -57,6 +89,18 @@ static inline u64 kvm_pte_to_phys(kvm_pte_t pte)
 	return pa;
 }
 
+static inline kvm_pte_t kvm_phys_to_pte(u64 pa)
+{
+	kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
+
+	if (PAGE_SHIFT == 16) {
+		pa &= GENMASK(51, 48);
+		pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
+	}
+
+	return pte;
+}
+
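
kvm_phys_to_pte is the inverse of kvm_pte_to_phys above: the descriptor's contiguous output-address field stops at bit 47, so with 64 KiB pages (PAGE_SHIFT == 16) PA[51:48] is parked in PTE bits [15:12]. A standalone round-trip sketch of that packing, with GENMASK re-derived locally rather than taken from kernel headers:

#include <stdio.h>
#include <stdint.h>

#define GENMASK(h, l)	(((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))
#define PAGE_SHIFT	16

static uint64_t phys_to_pte(uint64_t pa)
{
	uint64_t pte = pa & GENMASK(47, PAGE_SHIFT);

	/* No room for PA[51:48] in PTE[47:16], so park it in PTE[15:12]. */
	pte |= ((pa & GENMASK(51, 48)) >> 48) << 12;
	return pte;
}

static uint64_t pte_to_phys(uint64_t pte)
{
	uint64_t pa = pte & GENMASK(47, PAGE_SHIFT);

	pa |= ((pte & GENMASK(15, 12)) >> 12) << 48;
	return pa;
}

int main(void)
{
	uint64_t pa = 0x000A0000DEAD0000ULL;	/* 52-bit, page-aligned */
	uint64_t pte = phys_to_pte(pa);

	printf("pa   %016llx\npte  %016llx\nback %016llx\n",
	       (unsigned long long)pa,
	       (unsigned long long)pte,
	       (unsigned long long)pte_to_phys(pte));
	return 0;
}
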
 static inline u64 kvm_granule_shift(u32 level)
 {
 	/* Assumes KVM_PGTABLE_MAX_LEVELS is 4 */
@@ -73,6 +117,17 @@ static inline bool kvm_level_supports_block_mapping(u32 level)
 	return level >= KVM_PGTABLE_MIN_BLOCK_LEVEL;
 }
 
+static inline bool kvm_pte_table(kvm_pte_t pte, u32 level)
+{
+	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
+		return false;
+
+	if (!kvm_pte_valid(pte))
+		return false;
+
+	return FIELD_GET(KVM_PTE_TYPE, pte) == KVM_PTE_TYPE_TABLE;
+}
+
 /**
  * struct kvm_pgtable_mm_ops - Memory management callbacks.
  * @zalloc_page:		Allocate a single zeroed memory page.
@@ -129,6 +184,7 @@ enum kvm_pgtable_stage2_flags {
  * @KVM_PGTABLE_PROT_W:		Write permission.
  * @KVM_PGTABLE_PROT_R:		Read permission.
  * @KVM_PGTABLE_PROT_DEVICE:	Device attributes.
+ * @KVM_PGTABLE_PROT_NC:	Normal non-cacheable attributes.
  * @KVM_PGTABLE_PROT_SW0:	Software bit 0.
  * @KVM_PGTABLE_PROT_SW1:	Software bit 1.
  * @KVM_PGTABLE_PROT_SW2:	Software bit 2.
@@ -140,6 +196,7 @@ enum kvm_pgtable_prot {
 	KVM_PGTABLE_PROT_R			= BIT(2),
 
 	KVM_PGTABLE_PROT_DEVICE			= BIT(3),
+	KVM_PGTABLE_PROT_NC			= BIT(4),
 
 	KVM_PGTABLE_PROT_SW0			= BIT(55),
 	KVM_PGTABLE_PROT_SW1			= BIT(56),
@@ -161,6 +218,22 @@ enum kvm_pgtable_prot {
 typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
 					   enum kvm_pgtable_prot prot);
 
+typedef bool (*kvm_pgtable_pte_is_counted_cb_t)(kvm_pte_t pte, u32 level);
+
+/**
+ * struct kvm_pgtable_pte_ops - PTE callbacks.
+ * @force_pte_cb:		Return true to force the mapping granularity
+ *				down to pages instead of block mappings for
+ *				the range being mapped.
+ * @pte_is_counted_cb:		Verify the attributes of the @pte argument
+ *				and return true if the descriptor needs to be
+ *				refcounted; otherwise return false.
+ */
+struct kvm_pgtable_pte_ops {
+	kvm_pgtable_force_pte_cb_t		force_pte_cb;
+	kvm_pgtable_pte_is_counted_cb_t		pte_is_counted_cb;
+};
+
 /**
  * struct kvm_pgtable - KVM page-table.
  * @ia_bits:		Maximum input address size, in bits.
@@ -169,8 +242,7 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
  * @mm_ops:		Memory management callbacks.
  * @mmu:		Stage-2 KVM MMU struct. Unused for stage-1 page-tables.
  * @flags:		Stage-2 page-table flags.
- * @force_pte_cb:	Function that returns true if page level mappings must
- *			be used instead of block mappings.
+ * @pte_ops:		PTE callbacks.
  */
 struct kvm_pgtable {
 	u32					ia_bits;
@@ -181,7 +253,7 @@ struct kvm_pgtable {
 	/* Stage-2 only */
 	struct kvm_s2_mmu			*mmu;
 	enum kvm_pgtable_stage2_flags		flags;
-	kvm_pgtable_force_pte_cb_t		force_pte_cb;
+	struct kvm_pgtable_pte_ops		*pte_ops;
 };
 
 /**
@@ -297,23 +369,30 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
 u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift);
 
 /**
+ * kvm_pgtable_stage2_pgd_size() - Helper to compute size of a stage-2 PGD
+ * @vtcr:	Content of the VTCR register.
+ *
+ * Return: the size (in bytes) of the stage-2 PGD
+ */
+size_t kvm_pgtable_stage2_pgd_size(u64 vtcr);
+
+/**
  * __kvm_pgtable_stage2_init() - Initialise a guest stage-2 page-table.
  * @pgt:	Uninitialised page-table structure to initialise.
  * @mmu:	S2 MMU context for this S2 translation
  * @mm_ops:	Memory management callbacks.
  * @flags:	Stage-2 configuration flags.
- * @force_pte_cb: Function that returns true if page level mappings must
- *		be used instead of block mappings.
+ * @pte_ops:	PTE callbacks.
  *
  * Return: 0 on success, negative error code on failure.
  */
 int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 			      struct kvm_pgtable_mm_ops *mm_ops,
 			      enum kvm_pgtable_stage2_flags flags,
-			      kvm_pgtable_force_pte_cb_t force_pte_cb);
+			      struct kvm_pgtable_pte_ops *pte_ops);
 
-#define kvm_pgtable_stage2_init(pgt, mmu, mm_ops) \
-	__kvm_pgtable_stage2_init(pgt, mmu, mm_ops, 0, NULL)
+#define kvm_pgtable_stage2_init(pgt, mmu, mm_ops, pte_ops) \
+	__kvm_pgtable_stage2_init(pgt, mmu, mm_ops, 0, pte_ops)
 
 /**
  * kvm_pgtable_stage2_destroy() - Destroy an unused guest stage-2 page-table.
@@ -357,14 +436,16 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 			   void *mc);
 
 /**
- * kvm_pgtable_stage2_set_owner() - Unmap and annotate pages in the IPA space to
- *				    track ownership.
+ * kvm_pgtable_stage2_annotate() - Unmap and annotate pages in the IPA space
+ *				   to track ownership (and more).
  * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
  * @addr:	Base intermediate physical address to annotate.
  * @size:	Size of the annotated range.
  * @mc:		Cache of pre-allocated and zeroed memory from which to allocate
  *		page-table pages.
- * @owner_id:	Unique identifier for the owner of the page.
+ * @annotation:	A 63-bit value that will be stored in the page tables:
+ *		@annotation[0] must be 0, and @annotation[63:1] is written
+ *		to the page tables verbatim.
  *
  * By default, all page-tables are owned by identifier 0. This function can be
  * used to mark portions of the IPA space as owned by other entities. When a
@@ -373,8 +454,8 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
  *
  * Return: 0 on success, negative error code on failure.
  */
-int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
-				 void *mc, u8 owner_id);
+int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
+				void *mc, kvm_pte_t annotation);
 
 /**
  * kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
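To illustrate the new annotation contract: the old owner-ID behaviour can be expressed on top of kvm_pgtable_stage2_annotate() by packing the ID into the annotation. The field placement below is purely hypothetical (it is not the layout the hypervisor actually uses); the one hard requirement from the kerneldoc above is that bit 0 stays clear so the PTE remains invalid:

	#define EXAMPLE_PTE_OWNER_MASK	GENMASK(9, 2)	/* hypothetical field */

	static kvm_pte_t example_owner_annotation(u8 owner_id)
	{
		kvm_pte_t annotation = FIELD_PREP(EXAMPLE_PTE_OWNER_MASK, owner_id);

		/* @annotation[0] must be 0, i.e. the PTE stays invalid. */
		WARN_ON(annotation & BIT(0));
		return annotation;
	}

A caller would then pass example_owner_annotation(id) where it previously passed owner_id to kvm_pgtable_stage2_set_owner().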
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 9f4ad2a..2c7652b 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -2,18 +2,298 @@
 /*
  * Copyright (C) 2020 - Google LLC
  * Author: Quentin Perret <qperret@google.com>
+ * Author: Fuad Tabba <tabba@google.com>
  */
 #ifndef __ARM64_KVM_PKVM_H__
 #define __ARM64_KVM_PKVM_H__
 
+#include <linux/arm_ffa.h>
 #include <linux/memblock.h>
+#include <linux/scatterlist.h>
 #include <asm/kvm_pgtable.h>
+#include <asm/sysreg.h>
+
+/* Maximum number of VMs that can co-exist under pKVM. */
+#define KVM_MAX_PVMS 255
 
 #define HYP_MEMBLOCK_REGIONS 128
+#define PVMFW_INVALID_LOAD_ADDR	(-1)
+
+int pkvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
+int pkvm_init_host_vm(struct kvm *kvm, unsigned long type);
+int pkvm_create_hyp_vm(struct kvm *kvm);
+void pkvm_destroy_hyp_vm(struct kvm *kvm);
+void pkvm_host_reclaim_page(struct kvm *host_kvm, phys_addr_t ipa);
+
+/*
+ * Definitions for features to be allowed or restricted for guest virtual
+ * machines, depending on the mode KVM is running in and on the type of guest
+ * that is running.
+ *
+ * The ALLOW masks represent a bitmask of feature fields that are allowed
+ * without any restrictions as long as they are supported by the system.
+ *
+ * The RESTRICT_UNSIGNED masks, if present, represent unsigned fields for
+ * features that are restricted to support at most the specified feature.
+ *
+ * If a feature field is not present in either, then it is not supported.
+ *
+ * The approach taken for protected VMs is to allow features that are:
+ * - Needed by common Linux distributions (e.g., floating point)
+ * - Trivial to support, e.g., supporting the feature does not introduce or
+ * require tracking of additional state in KVM
+ * - Impossible to trap, so the guest could use them anyway
+ */
+
+/*
+ * Allow for protected VMs:
+ * - Floating-point and Advanced SIMD
+ * - GICv3(+) system register interface
+ * - Data Independent Timing
+ */
+#define PVM_ID_AA64PFR0_ALLOW (\
+	ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_FP) | \
+	ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_AdvSIMD) | \
+	ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_GIC) | \
+	ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_DIT) \
+	)
+
+/*
+ * Restrict to the following *unsigned* features for protected VMs:
+ * - AArch64 guests only (no support for AArch32 guests):
+ *	AArch32 adds complexity in trap handling, emulation, condition codes,
+ *	etc...
+ * - RAS (v1)
+ *	Supported by KVM
+ */
+#define PVM_ID_AA64PFR0_RESTRICT_UNSIGNED (\
+	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_EL0), ID_AA64PFR0_EL1_ELx_64BIT_ONLY) | \
+	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_EL1), ID_AA64PFR0_EL1_ELx_64BIT_ONLY) | \
+	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_EL2), ID_AA64PFR0_EL1_ELx_64BIT_ONLY) | \
+	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_EL3), ID_AA64PFR0_EL1_ELx_64BIT_ONLY) | \
+	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_RAS), ID_AA64PFR0_EL1_RAS_IMP) \
+	)
+
+/*
+ * Allow for protected VMs:
+ * - Branch Target Identification
+ * - Speculative Store Bypassing
+ */
+#define PVM_ID_AA64PFR1_ALLOW (\
+	ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_BT) | \
+	ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_SSBS) \
+	)
+
+/*
+ * Allow for protected VMs:
+ * - Mixed-endian
+ * - Distinction between Secure and Non-secure Memory
+ * - Mixed-endian at EL0 only
+ * - Non-context synchronizing exception entry and exit
+ */
+#define PVM_ID_AA64MMFR0_ALLOW (\
+	ARM64_FEATURE_MASK(ID_AA64MMFR0_EL1_BIGEND) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR0_EL1_SNSMEM) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR0_EL1_BIGENDEL0) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR0_EL1_EXS) \
+	)
+
+/*
+ * Restrict to the following *unsigned* features for protected VMs:
+ * - 40-bit IPA
+ * - 16-bit ASID
+ */
+#define PVM_ID_AA64MMFR0_RESTRICT_UNSIGNED (\
+	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64MMFR0_EL1_PARANGE), ID_AA64MMFR0_EL1_PARANGE_40) | \
+	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64MMFR0_EL1_ASIDBITS), ID_AA64MMFR0_EL1_ASIDBITS_16) \
+	)
+
+/*
+ * Allow for protected VMs:
+ * - Hardware translation table updates to Access flag and Dirty state
+ * - Number of VMID bits from CPU
+ * - Hierarchical Permission Disables
+ * - Privileged Access Never
+ * - SError interrupt exceptions from speculative reads
+ * - Enhanced Translation Synchronization
+ */
+#define PVM_ID_AA64MMFR1_ALLOW (\
+	ARM64_FEATURE_MASK(ID_AA64MMFR1_EL1_HAFDBS) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR1_EL1_VMIDBits) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR1_EL1_HPDS) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR1_EL1_PAN) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR1_EL1_SpecSEI) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR1_EL1_ETS) \
+	)
+
+/*
+ * Allow for protected VMs:
+ * - Common not Private translations
+ * - User Access Override
+ * - IESB bit in the SCTLR_ELx registers
+ * - Unaligned single-copy atomicity and atomic functions
+ * - ESR_ELx.EC value on an exception by read access to feature ID space
+ * - TTL field in address operations.
+ * - Break-before-make sequences when changing translation block size
+ * - E0PDx mechanism
+ */
+#define PVM_ID_AA64MMFR2_ALLOW (\
+	ARM64_FEATURE_MASK(ID_AA64MMFR2_EL1_CnP) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR2_EL1_UAO) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR2_EL1_IESB) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR2_EL1_AT) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR2_EL1_IDS) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR2_EL1_TTL) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR2_EL1_BBM) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR2_EL1_E0PD) \
+	)
+
+/*
+ * No support for Scalable Vectors for protected VMs:
+ *	Requires additional support from KVM, e.g., context-switching and
+ *	trapping at EL2
+ */
+#define PVM_ID_AA64ZFR0_ALLOW (0ULL)
+
+/*
+ * No support for debug, including breakpoints, and watchpoints for protected
+ * VMs:
+ *	The Arm architecture mandates support for at least the Armv8 debug
+ *	architecture, which would include at least 2 hardware breakpoints and
+ *	watchpoints. Providing that support to protected guests adds
+ *	considerable state and complexity. Therefore, the reserved value of 0 is
+ *	used for debug-related fields.
+ */
+#define PVM_ID_AA64DFR0_ALLOW (0ULL)
+#define PVM_ID_AA64DFR1_ALLOW (0ULL)
+
+/*
+ * No support for implementation defined features.
+ */
+#define PVM_ID_AA64AFR0_ALLOW (0ULL)
+#define PVM_ID_AA64AFR1_ALLOW (0ULL)
+
+/*
+ * No restrictions on instructions implemented in AArch64.
+ */
+#define PVM_ID_AA64ISAR0_ALLOW (\
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_AES) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_SHA1) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_SHA2) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_CRC32) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_ATOMIC) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_RDM) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_SHA3) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_SM3) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_SM4) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_DP) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_FHM) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_TS) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_TLB) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_RNDR) \
+	)
+
+#define PVM_ID_AA64ISAR1_ALLOW (\
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_DPB) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_APA) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_API) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_JSCVT) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_FCMA) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_LRCPC) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_GPA) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_GPI) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_FRINTTS) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_SB) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_SPECRES) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_BF16) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_DGH) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_I8MM) \
+	)
+
+#define PVM_ID_AA64ISAR2_ALLOW (\
+	ARM64_FEATURE_MASK(ID_AA64ISAR2_EL1_GPA3) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR2_EL1_APA3) \
+	)
+
+/*
+ * Returns the maximum number of breakpoints supported for protected VMs.
+ */
+static inline int pkvm_get_max_brps(void)
+{
+	int num = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_BRPs),
+			    PVM_ID_AA64DFR0_ALLOW);
+
+	/*
+	 * If breakpoints are supported, the maximum number is 1 + the field.
+	 * Otherwise, return 0, which is not compliant with the architecture,
+	 * but is reserved and is used here to indicate no debug support.
+	 */
+	return num ? num + 1 : 0;
+}
+
+/*
+ * Returns the maximum number of watchpoints supported for protected VMs.
+ */
+static inline int pkvm_get_max_wrps(void)
+{
+	int num = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_WRPs),
+			    PVM_ID_AA64DFR0_ALLOW);
+
+	return num ? num + 1 : 0;
+}
+
+enum pkvm_moveable_reg_type {
+	PKVM_MREG_MEMORY,
+	PKVM_MREG_PROTECTED_RANGE,
+};
+
+struct pkvm_moveable_reg {
+	phys_addr_t start;
+	u64 size;
+	enum pkvm_moveable_reg_type type;
+};
+
+#define PKVM_NR_MOVEABLE_REGS 512
+extern struct pkvm_moveable_reg kvm_nvhe_sym(pkvm_moveable_regs)[];
+extern unsigned int kvm_nvhe_sym(pkvm_moveable_regs_nr);
 
 extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
 extern unsigned int kvm_nvhe_sym(hyp_memblock_nr);
 
+extern phys_addr_t kvm_nvhe_sym(pvmfw_base);
+extern phys_addr_t kvm_nvhe_sym(pvmfw_size);
+
+static inline unsigned long
+hyp_vmemmap_memblock_size(struct memblock_region *reg, size_t vmemmap_entry_size)
+{
+	unsigned long nr_pages = reg->size >> PAGE_SHIFT;
+	unsigned long start, end;
+
+	start = (reg->base >> PAGE_SHIFT) * vmemmap_entry_size;
+	end = start + nr_pages * vmemmap_entry_size;
+	start = ALIGN_DOWN(start, PAGE_SIZE);
+	end = ALIGN(end, PAGE_SIZE);
+
+	return end - start;
+}
+
+static inline unsigned long hyp_vmemmap_pages(size_t vmemmap_entry_size)
+{
+	unsigned long res = 0, i;
+
+	for (i = 0; i < kvm_nvhe_sym(hyp_memblock_nr); i++) {
+		res += hyp_vmemmap_memblock_size(&kvm_nvhe_sym(hyp_memory)[i],
+						 vmemmap_entry_size);
+	}
+
+	return res >> PAGE_SHIFT;
+}
+
+static inline unsigned long hyp_vm_table_pages(void)
+{
+	return PAGE_ALIGN(KVM_MAX_PVMS * sizeof(void *)) >> PAGE_SHIFT;
+}
+
 static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
 {
 	unsigned long total = 0, i;
@@ -27,27 +307,28 @@ static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
 	return total;
 }
 
-static inline unsigned long __hyp_pgtable_total_pages(void)
+static inline unsigned long __hyp_pgtable_moveable_regs_pages(void)
 {
 	unsigned long res = 0, i;
 
-	/* Cover all of memory with page-granularity */
-	for (i = 0; i < kvm_nvhe_sym(hyp_memblock_nr); i++) {
-		struct memblock_region *reg = &kvm_nvhe_sym(hyp_memory)[i];
+	/* Cover all of moveable regions with page-granularity */
+	for (i = 0; i < kvm_nvhe_sym(pkvm_moveable_regs_nr); i++) {
+		struct pkvm_moveable_reg *reg = &kvm_nvhe_sym(pkvm_moveable_regs)[i];
 		res += __hyp_pgtable_max_pages(reg->size >> PAGE_SHIFT);
 	}
 
 	return res;
 }
 
+#define __PKVM_PRIVATE_SZ SZ_1G
+
 static inline unsigned long hyp_s1_pgtable_pages(void)
 {
 	unsigned long res;
 
-	res = __hyp_pgtable_total_pages();
+	res = __hyp_pgtable_moveable_regs_pages();
 
-	/* Allow 1 GiB for private mappings */
-	res += __hyp_pgtable_max_pages(SZ_1G >> PAGE_SHIFT);
+	res += __hyp_pgtable_max_pages(__PKVM_PRIVATE_SZ >> PAGE_SHIFT);
 
 	return res;
 }
@@ -60,12 +341,48 @@ static inline unsigned long host_s2_pgtable_pages(void)
 	 * Include an extra 16 pages to safely upper-bound the worst case of
 	 * concatenated pgds.
 	 */
-	res = __hyp_pgtable_total_pages() + 16;
+	res = __hyp_pgtable_moveable_regs_pages() + 16;
 
-	/* Allow 1 GiB for MMIO mappings */
+	/* Allow 1 GiB for non-moveable regions */
 	res += __hyp_pgtable_max_pages(SZ_1G >> PAGE_SHIFT);
 
 	return res;
 }
 
+#define KVM_FFA_MBOX_NR_PAGES	1
+
+/*
+ * Maximum number of constituents allowed in a descriptor. This number is
+ * arbitrary, see comment below on SG_MAX_SEGMENTS in hyp_ffa_proxy_pages().
+ */
+#define KVM_FFA_MAX_NR_CONSTITUENTS	4096
+
+static inline unsigned long hyp_ffa_proxy_pages(void)
+{
+	size_t desc_max;
+
+	/*
+	 * SG_MAX_SEGMENTS is supposed to bound the number of elements in an
+	 * sglist, which should match the number of constituents in the
+	 * corresponding FFA descriptor. As such, the EL2 buffer needs to be
+	 * large enough to hold a descriptor with SG_MAX_SEGMENTS constituents
+	 * at least. But the kernel's DMA code doesn't enforce the limit, and
+	 * it is sometimes abused, so let's allow larger descriptors and hope
+	 * for the best.
+	 */
+	BUILD_BUG_ON(KVM_FFA_MAX_NR_CONSTITUENTS < SG_MAX_SEGMENTS);
+
+	/*
+	 * The hypervisor FFA proxy needs enough memory to buffer a fragmented
+	 * descriptor returned from EL3 in response to a RETRIEVE_REQ call.
+	 */
+	desc_max = sizeof(struct ffa_mem_region) +
+		   sizeof(struct ffa_mem_region_attributes) +
+		   sizeof(struct ffa_composite_mem_region) +
+		   KVM_FFA_MAX_NR_CONSTITUENTS * sizeof(struct ffa_mem_region_addr_range);
+
+	/* Plus a page each for the hypervisor's RX and TX mailboxes. */
+	return (2 * KVM_FFA_MBOX_NR_PAGES) + DIV_ROUND_UP(desc_max, PAGE_SIZE);
+}
+
 #endif	/* __ARM64_KVM_PKVM_H__ */
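Plugging in typical numbers makes the sizing concrete. Assuming 4 KiB pages, roughly 48 bytes for the three header structs combined and 16 bytes per struct ffa_mem_region_addr_range (sizes assumed from the FF-A 1.0 memory descriptor layout, not taken from this patch):

	desc_max = 48 + 4096 * 16	= 65584 bytes
	DIV_ROUND_UP(65584, 4096)	= 17 pages
	total = 2 mailbox pages + 17	= 19 pages

So, under these assumptions, hyp_ffa_proxy_pages() reserves on the order of 19 pages for the EL2 proxy.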
diff --git a/arch/arm64/include/asm/kvm_pkvm_module.h b/arch/arm64/include/asm/kvm_pkvm_module.h
new file mode 100644
index 0000000..a3f80be
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_pkvm_module.h
@@ -0,0 +1,101 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ARM64_KVM_PKVM_MODULE_H__
+#define __ARM64_KVM_PKVM_MODULE_H__
+
+#include <asm/kvm_pgtable.h>
+#include <linux/export.h>
+
+typedef void (*dyn_hcall_t)(struct kvm_cpu_context *);
+
+enum pkvm_psci_notification {
+	PKVM_PSCI_CPU_SUSPEND,
+	PKVM_PSCI_SYSTEM_SUSPEND,
+	PKVM_PSCI_CPU_ENTRY,
+};
+
+#ifdef CONFIG_MODULES
+struct pkvm_module_ops {
+	int (*create_private_mapping)(phys_addr_t phys, size_t size,
+				      enum kvm_pgtable_prot prot,
+				      unsigned long *haddr);
+	void *(*alloc_module_va)(u64 nr_pages);
+	int (*map_module_page)(u64 pfn, void *va, enum kvm_pgtable_prot prot, bool is_protected);
+	int (*register_serial_driver)(void (*hyp_putc_cb)(char));
+	void (*puts)(const char *str);
+	void (*putx64)(u64 num);
+	void *(*fixmap_map)(phys_addr_t phys);
+	void (*fixmap_unmap)(void);
+	void *(*linear_map_early)(phys_addr_t phys, size_t size, enum kvm_pgtable_prot prot);
+	void (*linear_unmap_early)(void *addr, size_t size);
+	void (*flush_dcache_to_poc)(void *addr, size_t size);
+	int (*register_host_perm_fault_handler)(int (*cb)(struct kvm_cpu_context *ctxt, u64 esr, u64 addr));
+	int (*host_stage2_mod_prot)(u64 pfn, enum kvm_pgtable_prot prot);
+	int (*host_stage2_get_leaf)(phys_addr_t phys, kvm_pte_t *ptep, u32 *level);
+	int (*register_host_smc_handler)(bool (*cb)(struct kvm_cpu_context *));
+	int (*register_default_trap_handler)(bool (*cb)(struct kvm_cpu_context *));
+	int (*register_illegal_abt_notifier)(void (*cb)(struct kvm_cpu_context *));
+	int (*register_psci_notifier)(void (*cb)(enum pkvm_psci_notification, struct kvm_cpu_context *));
+	int (*register_hyp_panic_notifier)(void (*cb)(struct kvm_cpu_context *host_ctxt));
+	int (*host_donate_hyp)(u64 pfn, u64 nr_pages);
+	int (*hyp_donate_host)(u64 pfn, u64 nr_pages);
+	void* (*memcpy)(void *to, const void *from, size_t count);
+	void* (*memset)(void *dst, int c, size_t count);
+	phys_addr_t (*hyp_pa)(void *x);
+	void* (*hyp_va)(phys_addr_t phys);
+	unsigned long (*kern_hyp_va)(unsigned long x);
+};
+
+int __pkvm_load_el2_module(struct module *this, unsigned long *token);
+
+int __pkvm_register_el2_call(unsigned long hfn_hyp_va);
+#else
+static inline int __pkvm_load_el2_module(struct module *this,
+					 unsigned long *token)
+{
+	return -ENOSYS;
+}
+
+static inline int __pkvm_register_el2_call(unsigned long hfn_hyp_va)
+{
+	return -ENOSYS;
+}
+#endif /* CONFIG_MODULES */
+
+#ifdef MODULE
+/*
+ * Convert an EL2 module address from a kernel VA to a hyp VA.
+ */
+#define pkvm_el2_mod_va(kern_va, token)					\
+({									\
+	unsigned long hyp_text_kern_va =				\
+		(unsigned long)THIS_MODULE->arch.hyp.text.start;	\
+	unsigned long offset;						\
+									\
+	offset = (unsigned long)kern_va - hyp_text_kern_va;		\
+	token + offset;							\
+})
+
+#define pkvm_load_el2_module(init_fn, token)				\
+({									\
+	THIS_MODULE->arch.hyp.init = init_fn;				\
+	__pkvm_load_el2_module(THIS_MODULE, token);			\
+})
+
+#define pkvm_register_el2_mod_call(hfn, token)				\
+({									\
+	__pkvm_register_el2_call(pkvm_el2_mod_va(hfn, token));		\
+})
+
+#define pkvm_el2_mod_call(id, ...)					\
+	({								\
+		struct arm_smccc_res res;				\
+									\
+		arm_smccc_1_1_hvc(KVM_HOST_SMCCC_ID(id),		\
+				  ##__VA_ARGS__, &res);			\
+		WARN_ON(res.a0 != SMCCC_RET_SUCCESS);			\
+									\
+		res.a1;							\
+	})
+#endif
+#endif
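A minimal sketch of how a loadable module might consume this interface; hyp_init() and hyp_hcall() are hypothetical functions assumed to be placed in the module's .hyp.text section, and treating a negative return from pkvm_register_el2_mod_call() as failure is an assumption about its semantics:

	static unsigned long pkvm_token;

	static int hyp_init(const struct pkvm_module_ops *ops)
	{
		ops->puts("example module running at EL2");
		return 0;
	}

	static void hyp_hcall(struct kvm_cpu_context *ctxt)
	{
		/* Handle the dynamically registered hypercall. */
	}

	static int __init example_init(void)
	{
		int ret = pkvm_load_el2_module(hyp_init, &pkvm_token);

		if (ret)
			return ret;

		ret = pkvm_register_el2_mod_call(hyp_hcall, pkvm_token);
		return ret < 0 ? ret : 0;
	}
	module_init(example_init);

Note the asymmetry in the macros above: pkvm_load_el2_module() takes a pointer to the token, while pkvm_register_el2_mod_call() consumes its value to rebase the handler into the hyp VA space.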
diff --git a/arch/arm64/include/asm/lse.h b/arch/arm64/include/asm/lse.h
index c503db8..2875129 100644
--- a/arch/arm64/include/asm/lse.h
+++ b/arch/arm64/include/asm/lse.h
@@ -4,7 +4,7 @@
 
 #include <asm/atomic_ll_sc.h>
 
-#ifdef CONFIG_ARM64_LSE_ATOMICS
+#if defined(CONFIG_ARM64_LSE_ATOMICS) && !defined(BUILD_FIPS140_KO)
 
 #define __LSE_PREAMBLE	".arch_extension lse\n"
 
diff --git a/arch/arm64/include/asm/mem_encrypt.h b/arch/arm64/include/asm/mem_encrypt.h
new file mode 100644
index 0000000..300c8b8
--- /dev/null
+++ b/arch/arm64/include/asm/mem_encrypt.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_MEM_ENCRYPT_H
+#define __ASM_MEM_ENCRYPT_H
+
+bool mem_encrypt_active(void);
+int set_memory_encrypted(unsigned long addr, int numpages);
+int set_memory_decrypted(unsigned long addr, int numpages);
+
+#endif	/* __ASM_MEM_ENCRYPT_H */
diff --git a/arch/arm64/include/asm/mem_relinquish.h b/arch/arm64/include/asm/mem_relinquish.h
new file mode 100644
index 0000000..ac51786
--- /dev/null
+++ b/arch/arm64/include/asm/mem_relinquish.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2022 Google LLC
+ * Author: Keir Fraser <keirf@google.com>
+ */
+
+#ifndef __ASM_MEM_RELINQUISH_H
+#define __ASM_MEM_RELINQUISH_H
+
+struct page;
+
+bool kvm_has_memrelinquish_services(void);
+void page_relinquish(struct page *page);
+
+#endif	/* __ASM_MEM_RELINQUISH_H */
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 9dd08cd..cb7055d 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -147,6 +147,7 @@
  * Memory types for Stage-2 translation
  */
 #define MT_S2_NORMAL		0xf
+#define MT_S2_NORMAL_NC		0x5
 #define MT_S2_DEVICE_nGnRE	0x1
 
 /*
@@ -154,6 +155,7 @@
  * Stage-2 enforces Normal-WB and Device-nGnRE
  */
 #define MT_S2_FWB_NORMAL	6
+#define MT_S2_FWB_NORMAL_NC	5
 #define MT_S2_FWB_DEVICE_nGnRE	1
 
 #ifdef CONFIG_ARM64_4K_PAGES
diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h
index 18734fe..3a505c1 100644
--- a/arch/arm64/include/asm/module.h
+++ b/arch/arm64/include/asm/module.h
@@ -14,12 +14,50 @@ struct mod_plt_sec {
 	int			plt_max_entries;
 };
 
-struct mod_arch_specific {
-	struct mod_plt_sec	core;
-	struct mod_plt_sec	init;
-
-	/* for CONFIG_DYNAMIC_FTRACE */
+#define ARM64_MODULE_PLTS_ARCHDATA					\
+	struct mod_plt_sec	core;					\
+	struct mod_plt_sec	init;					\
+									\
+	/* for CONFIG_DYNAMIC_FTRACE */					\
 	struct plt_entry	*ftrace_trampolines;
+#else
+#define ARM64_MODULE_PLTS_ARCHDATA
+#endif
+
+#ifdef CONFIG_KVM
+struct pkvm_module_section {
+	void *start;
+	void *end;
+};
+
+typedef s32 kvm_nvhe_reloc_t;
+struct pkvm_module_ops;
+
+struct pkvm_el2_module {
+	struct pkvm_module_section text;
+	struct pkvm_module_section bss;
+	struct pkvm_module_section rodata;
+	struct pkvm_module_section data;
+	kvm_nvhe_reloc_t *relocs;
+	unsigned int nr_relocs;
+	int (*init)(const struct pkvm_module_ops *ops);
+};
+
+void kvm_apply_hyp_module_relocations(void *mod_start, void *hyp_va,
+				      kvm_nvhe_reloc_t *begin,
+				      kvm_nvhe_reloc_t *end);
+
+#define ARM64_MODULE_KVM_ARCHDATA					\
+	/* For pKVM hypervisor modules */				\
+	struct pkvm_el2_module	hyp;
+#else
+#define ARM64_MODULE_KVM_ARCHDATA
+#endif
+
+#ifdef CONFIG_HAVE_MOD_ARCH_SPECIFIC
+struct mod_arch_specific {
+	ARM64_MODULE_PLTS_ARCHDATA
+	ARM64_MODULE_KVM_ARCHDATA
 };
 #endif
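With both CONFIG_ARM64_MODULE_PLTS and CONFIG_KVM enabled, the two ARCHDATA macros compose so that the struct expands to roughly:

	struct mod_arch_specific {
		struct mod_plt_sec	core;
		struct mod_plt_sec	init;

		/* for CONFIG_DYNAMIC_FTRACE */
		struct plt_entry	*ftrace_trampolines;

		/* For pKVM hypervisor modules */
		struct pkvm_el2_module	hyp;
	};

With either option disabled, the corresponding macro expands to nothing and the matching members simply drop out.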
 
diff --git a/arch/arm64/include/asm/module.lds.h b/arch/arm64/include/asm/module.lds.h
index 094701ec..f11d922 100644
--- a/arch/arm64/include/asm/module.lds.h
+++ b/arch/arm64/include/asm/module.lds.h
@@ -1,3 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#include <asm/page-def.h>
+
 SECTIONS {
 #ifdef CONFIG_ARM64_MODULE_PLTS
 	.plt 0 : { BYTE(0) }
@@ -17,4 +20,24 @@
 	 */
 	.text.hot : { *(.text.hot) }
 #endif
+
+#ifdef CONFIG_KVM
+	.hyp.text : ALIGN(PAGE_SIZE) {
+		*(.hyp.text)
+		. = ALIGN(PAGE_SIZE);
+	}
+	.hyp.bss : ALIGN(PAGE_SIZE) {
+		*(.hyp.bss)
+		. = ALIGN(PAGE_SIZE);
+	}
+	.hyp.rodata : ALIGN(PAGE_SIZE) {
+		*(.hyp.rodata)
+		. = ALIGN(PAGE_SIZE);
+	}
+	.hyp.data : ALIGN(PAGE_SIZE) {
+		*(.hyp.data)
+		. = ALIGN(PAGE_SIZE);
+	}
+	.hyp.reloc : ALIGN(4) {	*(.hyp.reloc) }
+#endif
 }
diff --git a/arch/arm64/include/asm/patching.h b/arch/arm64/include/asm/patching.h
index 6bf5adc..82b1e0c 100644
--- a/arch/arm64/include/asm/patching.h
+++ b/arch/arm64/include/asm/patching.h
@@ -6,6 +6,7 @@
 
 int aarch64_insn_read(void *addr, u32 *insnp);
 int aarch64_insn_write(void *addr, u32 insn);
+int aarch64_addr_write(void *addr, u64 dst);
 
 int aarch64_insn_patch_text_nosync(void *addr, u32 insn);
 int aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt);
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 400f895..14e6187 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -31,6 +31,7 @@
 #include <linux/stddef.h>
 #include <linux/string.h>
 #include <linux/thread_info.h>
+#include <linux/android_vendor.h>
 
 #include <vdso/processor.h>
 
@@ -152,6 +153,8 @@ struct thread_struct {
 		struct user_fpsimd_state fpsimd_state;
 	} uw;
 
+	ANDROID_VENDOR_DATA(1);
+
 	unsigned int		fpsimd_cpu;
 	void			*sve_state;	/* SVE registers, if any */
 	void			*za_state;	/* ZA register, if any */
diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h
index 40971ac..51b0d59 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -11,6 +11,7 @@ extern char __alt_instructions[], __alt_instructions_end[];
 extern char __hibernate_exit_text_start[], __hibernate_exit_text_end[];
 extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
 extern char __hyp_text_start[], __hyp_text_end[];
+extern char __hyp_data_start[], __hyp_data_end[];
 extern char __hyp_rodata_start[], __hyp_rodata_end[];
 extern char __hyp_reloc_begin[], __hyp_reloc_end[];
 extern char __hyp_bss_start[], __hyp_bss_end[];
diff --git a/arch/arm64/include/asm/set_memory.h b/arch/arm64/include/asm/set_memory.h
index 0f740b7..0ad879e4 100644
--- a/arch/arm64/include/asm/set_memory.h
+++ b/arch/arm64/include/asm/set_memory.h
@@ -10,6 +10,7 @@ bool can_set_direct_map(void);
 
 int set_memory_valid(unsigned long addr, int numpages, int enable);
 
+int arch_set_direct_map_range_uncached(unsigned long addr, unsigned long numpages);
 int set_direct_map_invalid_noflush(struct page *page);
 int set_direct_map_default_noflush(struct page *page);
 bool kernel_page_present(struct page *page);
diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
index 6a75d7e..543fa59 100644
--- a/arch/arm64/include/asm/simd.h
+++ b/arch/arm64/include/asm/simd.h
@@ -35,9 +35,7 @@ static __must_check inline bool may_use_simd(void)
 	 * migrated, and if it's clear we cannot be migrated to a CPU
 	 * where it is set.
 	 */
-	return !WARN_ON(!system_capabilities_finalized()) &&
-	       system_supports_fpsimd() &&
-	       !in_hardirq() && !irqs_disabled() && !in_nmi() &&
+	return !in_hardirq() && !irqs_disabled() && !in_nmi() &&
 	       !this_cpu_read(fpsimd_context_busy);
 }
 
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index fc55f5a..b936377 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -87,6 +87,8 @@ extern void secondary_entry(void);
 
 extern void arch_send_call_function_single_ipi(int cpu);
 extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
+extern int nr_ipi_get(void);
+extern struct irq_desc **ipi_desc_get(void);
 
 #ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
 extern void arch_send_wakeup_ipi_mask(const struct cpumask *mask);
diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 4eb601e..44dbede 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -81,6 +81,12 @@ void __hyp_reset_vectors(void);
 
 DECLARE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
 
+static inline bool is_pkvm_initialized(void)
+{
+	return IS_ENABLED(CONFIG_KVM) &&
+	       static_branch_likely(&kvm_protected_mode_initialized);
+}
+
 /* Reports the availability of HYP mode */
 static inline bool is_hyp_mode_available(void)
 {
@@ -88,8 +94,7 @@ static inline bool is_hyp_mode_available(void)
 	 * If KVM protected mode is initialized, all CPUs must have been booted
 	 * in EL2. Avoid checking __boot_cpu_mode as CPUs now come up in EL1.
 	 */
-	if (IS_ENABLED(CONFIG_KVM) &&
-	    static_branch_likely(&kvm_protected_mode_initialized))
+	if (is_pkvm_initialized())
 		return true;
 
 	return (__boot_cpu_mode[0] == BOOT_CPU_MODE_EL2 &&
@@ -103,8 +108,7 @@ static inline bool is_hyp_mode_mismatched(void)
 	 * If KVM protected mode is initialized, all CPUs must have been booted
 	 * in EL2. Avoid checking __boot_cpu_mode as CPUs now come up in EL1.
 	 */
-	if (IS_ENABLED(CONFIG_KVM) &&
-	    static_branch_likely(&kvm_protected_mode_initialized))
+	if (is_pkvm_initialized())
 		return false;
 
 	return __boot_cpu_mode[0] != __boot_cpu_mode[1];
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 316917b..288f2da2cb 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -457,6 +457,15 @@ enum {
 #define KVM_PSCI_RET_INVAL		PSCI_RET_INVALID_PARAMS
 #define KVM_PSCI_RET_DENIED		PSCI_RET_DENIED
 
+/* Protected KVM */
+#define KVM_CAP_ARM_PROTECTED_VM_FLAGS_SET_FW_IPA	0
+#define KVM_CAP_ARM_PROTECTED_VM_FLAGS_INFO		1
+
+struct kvm_protected_vm_info {
+	__u64 firmware_size;
+	__u64 __reserved[7];
+};
+
 /* arm64-specific kvm_run::system_event flags */
 /*
  * Reset caused by a PSCI v1.1 SYSTEM_RESET2 call.
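From userspace, these definitions are exercised through KVM_ENABLE_CAP on the VM file descriptor. A hedged sketch follows; treating args[0] as a pointer to a struct kvm_protected_vm_info for the INFO flag is an assumption about the ABI rather than something this header spells out, and KVM_CAP_ARM_PROTECTED_VM is assumed to be visible via the installed uapi headers:

	#include <err.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static void query_pvm_fw_size(int vm_fd)
	{
		struct kvm_protected_vm_info info = { 0 };
		struct kvm_enable_cap cap = {
			.cap	= KVM_CAP_ARM_PROTECTED_VM,
			.flags	= KVM_CAP_ARM_PROTECTED_VM_FLAGS_INFO,
			.args	= { (__u64)(unsigned long)&info },
		};

		if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
			err(1, "KVM_ENABLE_CAP");
		printf("pvmfw size: %llu bytes\n",
		       (unsigned long long)info.firmware_size);
	}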
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index b3f37e2..a274518 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1458,6 +1458,7 @@ const struct cpumask *system_32bit_el0_cpumask(void)
 
 	return cpu_possible_mask;
 }
+EXPORT_SYMBOL_GPL(system_32bit_el0_cpumask);
 
 static int __init parse_32bit_el0_param(char *str)
 {
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 3da0977..872218e 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -283,21 +283,25 @@ void register_user_break_hook(struct break_hook *hook)
 {
 	register_debug_hook(&hook->node, &user_break_hook);
 }
+EXPORT_SYMBOL_GPL(register_user_break_hook);
 
 void unregister_user_break_hook(struct break_hook *hook)
 {
 	unregister_debug_hook(&hook->node);
 }
+EXPORT_SYMBOL_GPL(unregister_user_break_hook);
 
 void register_kernel_break_hook(struct break_hook *hook)
 {
 	register_debug_hook(&hook->node, &kernel_break_hook);
 }
+EXPORT_SYMBOL_GPL(register_kernel_break_hook);
 
 void unregister_kernel_break_hook(struct break_hook *hook)
 {
 	unregister_debug_hook(&hook->node);
 }
+EXPORT_SYMBOL_GPL(unregister_kernel_break_hook);
 
 static int call_break_hook(struct pt_regs *regs, unsigned long esr)
 {
diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index 2ee18c8..9439240 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -16,30 +16,6 @@
 #include <asm/ptrace.h>
 #include <asm/virt.h>
 
-// Warning, hardcoded register allocation
-// This will clobber x1 and x2, and expect x1 to contain
-// the id register value as read from the HW
-.macro __check_override idreg, fld, width, pass, fail
-	ubfx	x1, x1, #\fld, #\width
-	cbz	x1, \fail
-
-	adr_l	x1, \idreg\()_override
-	ldr	x2, [x1, FTR_OVR_VAL_OFFSET]
-	ldr	x1, [x1, FTR_OVR_MASK_OFFSET]
-	ubfx	x2, x2, #\fld, #\width
-	ubfx	x1, x1, #\fld, #\width
-	cmp	x1, xzr
-	and	x2, x2, x1
-	csinv	x2, x2, xzr, ne
-	cbnz	x2, \pass
-	b	\fail
-.endm
-
-.macro check_override idreg, fld, pass, fail
-	mrs	x1, \idreg\()_el1
-	__check_override \idreg \fld 4 \pass \fail
-.endm
-
 	.text
 	.pushsection	.hyp.text, "ax"
 
@@ -98,58 +74,7 @@
 SYM_CODE_END(elx_sync)
 
 SYM_CODE_START_LOCAL(__finalise_el2)
-	check_override id_aa64pfr0 ID_AA64PFR0_EL1_SVE_SHIFT .Linit_sve .Lskip_sve
-
-.Linit_sve:	/* SVE register access */
-	mrs	x0, cptr_el2			// Disable SVE traps
-	bic	x0, x0, #CPTR_EL2_TZ
-	msr	cptr_el2, x0
-	isb
-	mov	x1, #ZCR_ELx_LEN_MASK		// SVE: Enable full vector
-	msr_s	SYS_ZCR_EL2, x1			// length for EL1.
-
-.Lskip_sve:
-	check_override id_aa64pfr1 ID_AA64PFR1_EL1_SME_SHIFT .Linit_sme .Lskip_sme
-
-.Linit_sme:	/* SME register access and priority mapping */
-	mrs	x0, cptr_el2			// Disable SME traps
-	bic	x0, x0, #CPTR_EL2_TSM
-	msr	cptr_el2, x0
-	isb
-
-	mrs	x1, sctlr_el2
-	orr	x1, x1, #SCTLR_ELx_ENTP2	// Disable TPIDR2 traps
-	msr	sctlr_el2, x1
-	isb
-
-	mov	x0, #0				// SMCR controls
-
-	// Full FP in SM?
-	mrs_s	x1, SYS_ID_AA64SMFR0_EL1
-	__check_override id_aa64smfr0 ID_AA64SMFR0_EL1_FA64_SHIFT 1 .Linit_sme_fa64 .Lskip_sme_fa64
-
-.Linit_sme_fa64:
-	orr	x0, x0, SMCR_ELx_FA64_MASK
-.Lskip_sme_fa64:
-
-	orr	x0, x0, #SMCR_ELx_LEN_MASK	// Enable full SME vector
-	msr_s	SYS_SMCR_EL2, x0		// length for EL1.
-
-	mrs_s	x1, SYS_SMIDR_EL1		// Priority mapping supported?
-	ubfx    x1, x1, #SMIDR_EL1_SMPS_SHIFT, #1
-	cbz     x1, .Lskip_sme
-
-	msr_s	SYS_SMPRIMAP_EL2, xzr		// Make all priorities equal
-
-	mrs	x1, id_aa64mmfr1_el1		// HCRX_EL2 present?
-	ubfx	x1, x1, #ID_AA64MMFR1_EL1_HCX_SHIFT, #4
-	cbz	x1, .Lskip_sme
-
-	mrs_s	x1, SYS_HCRX_EL2
-	orr	x1, x1, #HCRX_EL2_SMPME_MASK	// Enable priority mapping
-	msr_s	SYS_HCRX_EL2, x1
-
-.Lskip_sme:
+	finalise_el2_state
 
 	// nVHE? No way! Give me the real thing!
 	// Sanity check: MMU *must* be off
@@ -157,7 +82,7 @@
 	tbnz	x1, #0, 1f
 
 	// Needs to be VHE capable, obviously
-	check_override id_aa64mmfr1 ID_AA64MMFR1_EL1_VH_SHIFT 2f 1f
+	check_override id_aa64mmfr1 ID_AA64MMFR1_EL1_VH_SHIFT 2f 1f x1 x2
 
 1:	mov_q	x0, HVC_STUB_ERR
 	eret
diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c
index 9513376..1af3ac4 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -287,8 +287,11 @@ static __init void parse_cmdline(void)
 {
 	const u8 *prop = get_bootargs_cmdline();
 
-	if (IS_ENABLED(CONFIG_CMDLINE_FORCE) || !prop)
+	if (IS_ENABLED(CONFIG_CMDLINE_EXTEND) ||
+	    IS_ENABLED(CONFIG_CMDLINE_FORCE) ||
+	    !prop) {
 		__parse_cmdline(CONFIG_CMDLINE, true);
+	}
 
 	if (!IS_ENABLED(CONFIG_CMDLINE_FORCE) && prop)
 		__parse_cmdline(prop, true);
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 8151412..1f778c3 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -71,12 +71,6 @@ KVM_NVHE_ALIAS(nvhe_hyp_panic_handler);
 /* Vectors installed by hyp-init on reset HVC. */
 KVM_NVHE_ALIAS(__hyp_stub_vectors);
 
-/* Kernel symbol used by icache_is_vpipt(). */
-KVM_NVHE_ALIAS(__icache_flags);
-
-/* VMID bits set by the KVM VMID allocator */
-KVM_NVHE_ALIAS(kvm_arm_vmid_bits);
-
 /* Static keys which are set if a vGIC trap should be handled in hyp. */
 KVM_NVHE_ALIAS(vgic_v2_cpuif_trap);
 KVM_NVHE_ALIAS(vgic_v3_cpuif_trap);
@@ -92,9 +86,6 @@ KVM_NVHE_ALIAS(gic_nonsecure_priorities);
 KVM_NVHE_ALIAS(__start___kvm_ex_table);
 KVM_NVHE_ALIAS(__stop___kvm_ex_table);
 
-/* Array containing bases of nVHE per-CPU memory regions. */
-KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base);
-
 /* PMU available static key */
 #ifdef CONFIG_HW_PERF_EVENTS
 KVM_NVHE_ALIAS(kvm_arm_pmu_available);
@@ -111,12 +102,6 @@ KVM_NVHE_ALIAS_HYP(__memcpy, __pi_memcpy);
 KVM_NVHE_ALIAS_HYP(__memset, __pi_memset);
 #endif
 
-/* Kernel memory sections */
-KVM_NVHE_ALIAS(__start_rodata);
-KVM_NVHE_ALIAS(__end_rodata);
-KVM_NVHE_ALIAS(__bss_start);
-KVM_NVHE_ALIAS(__bss_stop);
-
 /* Hyp memory sections */
 KVM_NVHE_ALIAS(__hyp_idmap_text_start);
 KVM_NVHE_ALIAS(__hyp_idmap_text_end);
@@ -124,8 +109,14 @@ KVM_NVHE_ALIAS(__hyp_text_start);
 KVM_NVHE_ALIAS(__hyp_text_end);
 KVM_NVHE_ALIAS(__hyp_bss_start);
 KVM_NVHE_ALIAS(__hyp_bss_end);
+KVM_NVHE_ALIAS(__hyp_data_start);
+KVM_NVHE_ALIAS(__hyp_data_end);
 KVM_NVHE_ALIAS(__hyp_rodata_start);
 KVM_NVHE_ALIAS(__hyp_rodata_end);
+#ifdef CONFIG_TRACING
+KVM_NVHE_ALIAS(__hyp_event_ids_start);
+KVM_NVHE_ALIAS(__hyp_event_ids_end);
+#endif
 
 /* pKVM static key */
 KVM_NVHE_ALIAS(kvm_protected_mode_initialized);
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 76b41e4..f01d8b3 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -505,14 +505,76 @@ static int module_init_ftrace_plt(const Elf_Ehdr *hdr,
 	return 0;
 }
 
+static int module_init_hyp(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs,
+			   struct module *mod)
+{
+#ifdef CONFIG_KVM
+	const Elf_Shdr *s;
+
+	/*
+	 * If the .hyp.text is missing or empty, this is not a hypervisor
+	 * module so ignore the rest of it.
+	 */
+	s = find_section(hdr, sechdrs, ".hyp.text");
+	if (!s || !s->sh_size)
+		return 0;
+
+	mod->arch.hyp.text = (struct pkvm_module_section) {
+		.start	= (void *)s->sh_addr,
+		.end	= (void *)s->sh_addr + s->sh_size,
+	};
+
+	s = find_section(hdr, sechdrs, ".hyp.bss");
+	if (!s)
+		return -ENOEXEC;
+
+	mod->arch.hyp.bss = (struct pkvm_module_section) {
+		.start	= (void *)s->sh_addr,
+		.end	= (void *)s->sh_addr + s->sh_size,
+	};
+
+	s = find_section(hdr, sechdrs, ".hyp.rodata");
+	if (!s)
+		return -ENOEXEC;
+
+	mod->arch.hyp.rodata = (struct pkvm_module_section) {
+		.start	= (void *)s->sh_addr,
+		.end	= (void *)s->sh_addr + s->sh_size,
+	};
+
+	s = find_section(hdr, sechdrs, ".hyp.data");
+	if (!s)
+		return -ENOEXEC;
+
+	mod->arch.hyp.data = (struct pkvm_module_section) {
+		.start	= (void *)s->sh_addr,
+		.end	= (void *)s->sh_addr + s->sh_size,
+	};
+
+	s = find_section(hdr, sechdrs, ".hyp.reloc");
+	if (!s)
+		return -ENOEXEC;
+
+	mod->arch.hyp.relocs = (void *)s->sh_addr;
+	mod->arch.hyp.nr_relocs = s->sh_size / sizeof(*mod->arch.hyp.relocs);
+#endif
+	return 0;
+}
+
 int module_finalize(const Elf_Ehdr *hdr,
 		    const Elf_Shdr *sechdrs,
 		    struct module *me)
 {
+	int err;
 	const Elf_Shdr *s;
+
 	s = find_section(hdr, sechdrs, ".altinstructions");
 	if (s)
 		apply_alternatives_module((void *)s->sh_addr, s->sh_size);
 
-	return module_init_ftrace_plt(hdr, sechdrs, me);
+	err = module_init_ftrace_plt(hdr, sechdrs, me);
+	if (err)
+		return err;
+
+	return module_init_hyp(hdr, sechdrs, me);
 }
diff --git a/arch/arm64/kernel/patching.c b/arch/arm64/kernel/patching.c
index 33e0fab..b336073 100644
--- a/arch/arm64/kernel/patching.c
+++ b/arch/arm64/kernel/patching.c
@@ -66,16 +66,16 @@ int __kprobes aarch64_insn_read(void *addr, u32 *insnp)
 	return ret;
 }
 
-static int __kprobes __aarch64_insn_write(void *addr, __le32 insn)
+static int __kprobes __aarch64_text_write(void *dst, void *src, size_t size)
 {
-	void *waddr = addr;
-	unsigned long flags = 0;
+	unsigned long flags;
+	void *waddr;
 	int ret;
 
 	raw_spin_lock_irqsave(&patch_lock, flags);
-	waddr = patch_map(addr, FIX_TEXT_POKE0);
+	waddr = patch_map(dst, FIX_TEXT_POKE0);
 
-	ret = copy_to_kernel_nofault(waddr, &insn, AARCH64_INSN_SIZE);
+	ret = copy_to_kernel_nofault(waddr, src, size);
 
 	patch_unmap(FIX_TEXT_POKE0);
 	raw_spin_unlock_irqrestore(&patch_lock, flags);
@@ -85,7 +85,14 @@ static int __kprobes __aarch64_insn_write(void *addr, __le32 insn)
 
 int __kprobes aarch64_insn_write(void *addr, u32 insn)
 {
-	return __aarch64_insn_write(addr, cpu_to_le32(insn));
+	__le32 __insn = cpu_to_le32(insn);
+
+	return __aarch64_text_write(addr, &__insn, AARCH64_INSN_SIZE);
+}
+
+int __kprobes aarch64_addr_write(void *addr, u64 dst)
+{
+	return __aarch64_text_write(addr, &dst, sizeof(dst));
 }
 
 int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 044a7d7..6ce5a69 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -41,6 +41,8 @@
 #include <linux/thread_info.h>
 #include <linux/prctl.h>
 #include <linux/stacktrace.h>
+#include <trace/hooks/fpsimd.h>
+#include <trace/hooks/mpam.h>
 
 #include <asm/alternative.h>
 #include <asm/compat.h>
@@ -245,6 +247,7 @@ void show_regs(struct pt_regs *regs)
 	__show_regs(regs);
 	dump_backtrace(regs, NULL, KERN_DEFAULT);
 }
+EXPORT_SYMBOL_GPL(show_regs);
 
 static void tls_thread_flush(void)
 {
@@ -532,6 +535,12 @@ struct task_struct *__switch_to(struct task_struct *prev,
 	ptrauth_thread_switch_user(next);
 
 	/*
+	 * The vendor hook is needed before the dsb() because MPAM is
+	 * related to cache maintenance.
+	 */
+	trace_android_vh_mpam_set(prev, next);
+
+	/*
 	 * Complete any pending TLB or cache maintenance on this CPU in case
 	 * the thread migrates to a different CPU.
 	 * This full barrier is also required by the membarrier system
@@ -549,6 +558,8 @@ struct task_struct *__switch_to(struct task_struct *prev,
 	if (prev->thread.sctlr_user != next->thread.sctlr_user)
 		update_sctlr_el1(next->thread.sctlr_user);
 
+	trace_android_vh_is_fpsimd_save(prev, next);
+
 	/* the actual thread switch */
 	last = cpu_switch_to(prev, next);
 
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index fea3223..cfbf894 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -40,6 +40,7 @@
 #include <asm/elf.h>
 #include <asm/cpufeature.h>
 #include <asm/cpu_ops.h>
+#include <asm/hypervisor.h>
 #include <asm/kasan.h>
 #include <asm/numa.h>
 #include <asm/sections.h>
@@ -49,6 +50,7 @@
 #include <asm/tlbflush.h>
 #include <asm/traps.h>
 #include <asm/efi.h>
+#include <asm/hypervisor.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/mmu_context.h>
 
@@ -438,3 +440,10 @@ static int __init register_arm64_panic_block(void)
 	return 0;
 }
 device_initcall(register_arm64_panic_block);
+
+void kvm_arm_init_hyp_services(void)
+{
+	kvm_init_ioremap_services();
+	kvm_init_memshare_services();
+	kvm_init_memrelinquish_services();
+}
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index ffc5d76..de8dcba 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -53,9 +53,14 @@
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/ipi.h>
+#undef CREATE_TRACE_POINTS
+#include <trace/hooks/debug.h>
 
 DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
 EXPORT_PER_CPU_SYMBOL(cpu_number);
+EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_raise);
+EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_entry);
+EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_exit);
 
 /*
  * as from 2.5, kernels no longer have an init_tasks structure
@@ -877,6 +882,7 @@ static void do_handle_IPI(int ipinr)
 		break;
 
 	case IPI_CPU_STOP:
+		trace_android_vh_ipi_stop(get_irq_regs());
 		local_cpu_stop();
 		break;
 
@@ -1097,3 +1103,15 @@ bool cpus_are_stuck_in_kernel(void)
 	return !!cpus_stuck_in_kernel || smp_spin_tables ||
 		is_protected_kvm_enabled();
 }
+
+int nr_ipi_get(void)
+{
+	return nr_ipi;
+}
+EXPORT_SYMBOL_GPL(nr_ipi_get);
+
+struct irq_desc **ipi_desc_get(void)
+{
+	return ipi_desc;
+}
+EXPORT_SYMBOL_GPL(ipi_desc_get);
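The two getters exported above give vendor modules read access to the arm64 IPI bookkeeping. A minimal sketch of a consumer, assuming only the interfaces added in this hunk plus the standard irqdesc/kstat helpers:

	#include <linux/cpumask.h>
	#include <linux/irqdesc.h>
	#include <linux/kernel_stat.h>
	#include <linux/printk.h>

	static void dump_ipi_counts(void)
	{
		struct irq_desc **descs = ipi_desc_get();
		int i, cpu, nr = nr_ipi_get();

		for (i = 0; i < nr; i++) {
			unsigned int irq = irq_desc_get_irq(descs[i]);

			for_each_online_cpu(cpu)
				pr_info("IPI%d on CPU%d: %u\n", i, cpu,
					kstat_irqs_cpu(irq, cpu));
		}
	}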
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 8315430..1fc70bc 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -159,6 +159,7 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
 
 	put_task_stack(tsk);
 }
+EXPORT_SYMBOL_GPL(dump_backtrace);
 
 void show_stack(struct task_struct *tsk, unsigned long *sp, const char *loglvl)
 {
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 45131e3..8191ea7 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -13,16 +13,36 @@
 	*(__kvm_ex_table)					\
 	__stop___kvm_ex_table = .;
 
-#define HYPERVISOR_DATA_SECTIONS				\
+#ifdef CONFIG_TRACING
+#define HYPERVISOR_EVENT_IDS					\
+		. = ALIGN(PAGE_SIZE);				\
+		__hyp_event_ids_start = .;			\
+		*(HYP_SECTION_NAME(_hyp_event_ids))		\
+		__hyp_event_ids_end = .;
+#else
+#define HYPERVISOR_EVENT_IDS
+#endif
+
+#define HYPERVISOR_RODATA_SECTIONS				\
 	HYP_SECTION_NAME(.rodata) : {				\
 		. = ALIGN(PAGE_SIZE);				\
 		__hyp_rodata_start = .;				\
 		*(HYP_SECTION_NAME(.data..ro_after_init))	\
 		*(HYP_SECTION_NAME(.rodata))			\
+		HYPERVISOR_EVENT_IDS				\
 		. = ALIGN(PAGE_SIZE);				\
 		__hyp_rodata_end = .;				\
 	}
 
+#define HYPERVISOR_DATA_SECTION					\
+	HYP_SECTION_NAME(.data) : {				\
+		. = ALIGN(PAGE_SIZE);				\
+		__hyp_data_start = .;				\
+		*(HYP_SECTION_NAME(.data))			\
+		. = ALIGN(PAGE_SIZE);				\
+		__hyp_data_end = .;				\
+	}
+
 #define HYPERVISOR_PERCPU_SECTION				\
 	. = ALIGN(PAGE_SIZE);					\
 	HYP_SECTION_NAME(.data..percpu) : {			\
@@ -42,6 +62,17 @@
 	. = ALIGN(PAGE_SIZE);					\
 	__hyp_bss_end = .;
 
+#ifdef CONFIG_TRACING
+#define HYPERVISOR_EVENTS					\
+	.hyp.events : {						\
+		__start_hyp_events = .;				\
+		*(_hyp_events)					\
+		__stop_hyp_events = .;				\
+	}
+#else
+#define HYPERVISOR_EVENTS
+#endif
+
 /*
  * We require that __hyp_bss_start and __bss_start are aligned, and enforce it
  * with an assertion. But the BSS_SECTION macro places an empty .sbss section
@@ -51,9 +82,11 @@
 #define SBSS_ALIGN			PAGE_SIZE
 #else /* CONFIG_KVM */
 #define HYPERVISOR_EXTABLE
-#define HYPERVISOR_DATA_SECTIONS
+#define HYPERVISOR_RODATA_SECTIONS
+#define HYPERVISOR_DATA_SECTION
 #define HYPERVISOR_PERCPU_SECTION
 #define HYPERVISOR_RELOC_SECTION
+#define HYPERVISOR_EVENTS
 #define SBSS_ALIGN			0
 #endif
 
@@ -188,7 +221,9 @@
 	/* everything from this point to __init_begin will be marked RO NX */
 	RO_DATA(PAGE_SIZE)
 
-	HYPERVISOR_DATA_SECTIONS
+	HYPERVISOR_EVENTS
+
+	HYPERVISOR_RODATA_SECTIONS
 
 	/* code sections that are never executed via the kernel mapping */
 	.rodata.text : {
@@ -276,6 +311,8 @@
 	_sdata = .;
 	RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_ALIGN)
 
+	HYPERVISOR_DATA_SECTION
+
 	/*
 	 * Data written with the MMU off but read with the MMU on requires
 	 * cache lines to be invalidated, discarding up to a Cache Writeback
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 5e33c2d..8481726 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -14,7 +14,7 @@
 	 inject_fault.o va_layout.o handle_exit.o \
 	 guest.o debug.o reset.o sys_regs.o stacktrace.o \
 	 vgic-sys-reg-v3.o fpsimd.o pkvm.o \
-	 arch_timer.o trng.o vmid.o \
+	 arch_timer.o trng.o vmid.o iommu.o \
 	 vgic/vgic.o vgic/vgic-init.o \
 	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
 	 vgic/vgic-v3.o vgic/vgic-v4.o \
@@ -24,6 +24,8 @@
 
 kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o pmu.o
 
+kvm-$(CONFIG_TRACING) += hyp_events.o hyp_trace.o
+
 always-y := hyp_constants.h hyp-constants.s
 
 define rule_gen_hyp_constants
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index bb24a76..3c7096c 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -88,7 +88,9 @@ static u64 timer_get_offset(struct arch_timer_context *ctxt)
 
 	switch(arch_timer_ctx_index(ctxt)) {
 	case TIMER_VTIMER:
-		return __vcpu_sys_reg(vcpu, CNTVOFF_EL2);
+		if (likely(!kvm_vm_is_protected(vcpu->kvm)))
+			return __vcpu_sys_reg(vcpu, CNTVOFF_EL2);
+		fallthrough;
 	default:
 		return 0;
 	}
@@ -768,6 +770,9 @@ static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, u64 cntvoff)
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_vcpu *tmp;
 
+	if (unlikely(kvm_vm_is_protected(vcpu->kvm)))
+		cntvoff = 0;
+
 	mutex_lock(&kvm->lock);
 	kvm_for_each_vcpu(i, tmp, kvm)
 		timer_set_offset(vcpu_vtimer(tmp), cntvoff);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 94d33e2..c2ec9c7 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -27,6 +27,8 @@
 #define CREATE_TRACE_POINTS
 #include "trace_arm.h"
 
+#include "hyp_trace.h"
+
 #include <linux/uaccess.h>
 #include <asm/ptrace.h>
 #include <asm/mman.h>
@@ -37,6 +39,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_pkvm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/sections.h>
 
@@ -50,8 +53,8 @@ DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
 DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
 
 DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
-unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
 DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
+DECLARE_KVM_NVHE_PER_CPU(int, hyp_cpu_number);
 
 static bool vgic_present;
 
@@ -78,18 +81,31 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 {
 	int r;
 
-	if (cap->flags)
-		return -EINVAL;
+	/* Capabilities with flags */
+	switch (cap->cap) {
+	case KVM_CAP_ARM_PROTECTED_VM:
+		return pkvm_vm_ioctl_enable_cap(kvm, cap);
+	default:
+		if (cap->flags)
+			return -EINVAL;
+	}
 
+	/* Capabilities without flags */
 	switch (cap->cap) {
 	case KVM_CAP_ARM_NISV_TO_USER:
-		r = 0;
-		set_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
-			&kvm->arch.flags);
+		if (kvm_vm_is_protected(kvm)) {
+			r = -EINVAL;
+		} else {
+			r = 0;
+			set_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
+				&kvm->arch.flags);
+		}
 		break;
 	case KVM_CAP_ARM_MTE:
 		mutex_lock(&kvm->lock);
-		if (!system_supports_mte() || kvm->created_vcpus) {
+		if (!system_supports_mte() ||
+		    kvm_vm_is_protected(kvm) ||
+		    kvm->created_vcpus) {
 			r = -EINVAL;
 		} else {
 			r = 0;
@@ -138,24 +154,27 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
 	int ret;
 
-	ret = kvm_arm_setup_stage2(kvm, type);
-	if (ret)
-		return ret;
-
-	ret = kvm_init_stage2_mmu(kvm, &kvm->arch.mmu);
-	if (ret)
-		return ret;
+	if (type & ~KVM_VM_TYPE_MASK)
+		return -EINVAL;
 
 	ret = kvm_share_hyp(kvm, kvm + 1);
 	if (ret)
-		goto out_free_stage2_pgd;
+		return ret;
+
+	ret = pkvm_init_host_vm(kvm, type);
+	if (ret)
+		goto err_unshare_kvm;
 
 	if (!zalloc_cpumask_var(&kvm->arch.supported_cpus, GFP_KERNEL)) {
 		ret = -ENOMEM;
-		goto out_free_stage2_pgd;
+		goto err_unshare_kvm;
 	}
 	cpumask_copy(kvm->arch.supported_cpus, cpu_possible_mask);
 
+	ret = kvm_init_stage2_mmu(kvm, &kvm->arch.mmu, type);
+	if (ret)
+		goto err_free_cpumask;
+
 	kvm_vgic_early_init(kvm);
 
 	/* The maximum number of VCPUs is limited by the host's GIC model */
@@ -164,9 +183,12 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	set_default_spectre(kvm);
 	kvm_arm_init_hypercalls(kvm);
 
-	return ret;
-out_free_stage2_pgd:
-	kvm_free_stage2_pgd(&kvm->arch.mmu);
+	return 0;
+
+err_free_cpumask:
+	free_cpumask_var(kvm->arch.supported_cpus);
+err_unshare_kvm:
+	kvm_unshare_hyp(kvm, kvm + 1);
 	return ret;
 }
 
@@ -187,14 +209,22 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 
 	kvm_vgic_destroy(kvm);
 
+	if (is_protected_kvm_enabled())
+		pkvm_destroy_hyp_vm(kvm);
+
 	kvm_destroy_vcpus(kvm);
 
+	if (atomic64_read(&kvm->stat.protected_hyp_mem))
+		pr_warn("%lluB of donations to the nVHE hyp are missing\n",
+			atomic64_read(&kvm->stat.protected_hyp_mem));
+
 	kvm_unshare_hyp(kvm, kvm + 1);
 }
 
-int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
+static int kvm_check_extension(struct kvm *kvm, long ext)
 {
 	int r;
+
 	switch (ext) {
 	case KVM_CAP_IRQCHIP:
 		r = vgic_present;
@@ -212,7 +242,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_IMMEDIATE_EXIT:
 	case KVM_CAP_VCPU_EVENTS:
 	case KVM_CAP_ARM_IRQ_LINE_LAYOUT_2:
-	case KVM_CAP_ARM_NISV_TO_USER:
 	case KVM_CAP_ARM_INJECT_EXT_DABT:
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
@@ -220,6 +249,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_SYSTEM_SUSPEND:
 		r = 1;
 		break;
+	case KVM_CAP_ARM_NISV_TO_USER:
+		r = !kvm || !kvm_vm_is_protected(kvm);
+		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
 		return KVM_GUESTDBG_VALID_MASK;
 	case KVM_CAP_ARM_SET_DEVICE_ADDR:
@@ -293,6 +325,75 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	return r;
 }
 
+/*
+ * Check whether the extension specified in @ext is supported in protected
+ * mode for the specified VM.
+ * The capability value supported by KVM in general is passed in @kvm_cap.
+ */
+static int pkvm_check_extension(struct kvm *kvm, long ext, int kvm_cap)
+{
+	int r;
+
+	switch (ext) {
+	case KVM_CAP_IRQCHIP:
+	case KVM_CAP_ARM_PSCI:
+	case KVM_CAP_ARM_PSCI_0_2:
+	case KVM_CAP_NR_VCPUS:
+	case KVM_CAP_MAX_VCPUS:
+	case KVM_CAP_MAX_VCPU_ID:
+	case KVM_CAP_MSI_DEVID:
+	case KVM_CAP_ARM_VM_IPA_SIZE:
+		r = kvm_cap;
+		break;
+	case KVM_CAP_GUEST_DEBUG_HW_BPS:
+		r = min(kvm_cap, pkvm_get_max_brps());
+		break;
+	case KVM_CAP_GUEST_DEBUG_HW_WPS:
+		r = min(kvm_cap, pkvm_get_max_wrps());
+		break;
+	case KVM_CAP_ARM_PMU_V3:
+		r = kvm_cap && FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_PMUVer),
+					 PVM_ID_AA64DFR0_ALLOW);
+		break;
+	case KVM_CAP_ARM_SVE:
+		r = kvm_cap && FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_SVE),
+					 PVM_ID_AA64PFR0_RESTRICT_UNSIGNED);
+		break;
+	case KVM_CAP_ARM_PTRAUTH_ADDRESS:
+		r = kvm_cap &&
+		    FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_API),
+			      PVM_ID_AA64ISAR1_ALLOW) &&
+		    FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_APA),
+			      PVM_ID_AA64ISAR1_ALLOW);
+		break;
+	case KVM_CAP_ARM_PTRAUTH_GENERIC:
+		r = kvm_cap &&
+		    FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_GPI),
+			      PVM_ID_AA64ISAR1_ALLOW) &&
+		    FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_GPA),
+			      PVM_ID_AA64ISAR1_ALLOW);
+		break;
+	case KVM_CAP_ARM_PROTECTED_VM:
+		r = 1;
+		break;
+	default:
+		r = 0;
+		break;
+	}
+
+	return r;
+}
+
+int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
+{
+	int r = kvm_check_extension(kvm, ext);
+
+	if (kvm && kvm_vm_is_protected(kvm))
+		r = pkvm_check_extension(kvm, ext, r);
+
+	return r;
+}
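As a worked example of this two-stage composition: for KVM_CAP_GUEST_DEBUG_HW_BPS, kvm_check_extension() reports the host's breakpoint count, and pkvm_check_extension() then clamps it with min(kvm_cap, pkvm_get_max_brps()). Since PVM_ID_AA64DFR0_ALLOW is 0ULL, pkvm_get_max_brps() returns 0, so a protected VM always reports zero hardware breakpoints regardless of what the host supports.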
+
 long kvm_arch_dev_ioctl(struct file *filp,
 			unsigned int ioctl, unsigned long arg)
 {
@@ -363,7 +464,11 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	if (vcpu_has_run_once(vcpu) && unlikely(!irqchip_in_kernel(vcpu->kvm)))
 		static_branch_dec(&userspace_irqchip_in_use);
 
-	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+	if (is_protected_kvm_enabled())
+		free_hyp_stage2_memcache(&vcpu->arch.pkvm_memcache, vcpu->kvm);
+	else
+		kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+
 	kvm_timer_vcpu_terminate(vcpu);
 	kvm_pmu_vcpu_destroy(vcpu);
 
@@ -385,6 +490,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	struct kvm_s2_mmu *mmu;
 	int *last_ran;
 
+	if (is_protected_kvm_enabled())
+		goto nommu;
+
 	mmu = vcpu->arch.hw_mmu;
 	last_ran = this_cpu_ptr(mmu->last_vcpu_ran);
 
@@ -402,6 +510,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		*last_ran = vcpu->vcpu_id;
 	}
 
+nommu:
 	vcpu->cpu = cpu;
 
 	kvm_vgic_load(vcpu);
@@ -422,18 +531,36 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		vcpu_ptrauth_disable(vcpu);
 	kvm_arch_vcpu_load_debug_state_flags(vcpu);
 
+	if (is_protected_kvm_enabled()) {
+		kvm_call_hyp_nvhe(__pkvm_vcpu_load,
+				  vcpu->kvm->arch.pkvm.handle,
+				  vcpu->vcpu_idx, vcpu->arch.hcr_el2);
+		kvm_call_hyp(__vgic_v3_restore_vmcr_aprs,
+			     &vcpu->arch.vgic_cpu.vgic_v3);
+	}
+
 	if (!cpumask_test_cpu(smp_processor_id(), vcpu->kvm->arch.supported_cpus))
 		vcpu_set_on_unsupported_cpu(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	if (is_protected_kvm_enabled()) {
+		kvm_call_hyp(__vgic_v3_save_vmcr_aprs,
+			     &vcpu->arch.vgic_cpu.vgic_v3);
+		kvm_call_hyp_nvhe(__pkvm_vcpu_put);
+
+		/* __pkvm_vcpu_put implies a sync of the state */
+		if (!kvm_vm_is_protected(vcpu->kvm))
+			vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
+	}
+
 	kvm_arch_vcpu_put_debug_state_flags(vcpu);
 	kvm_arch_vcpu_put_fp(vcpu);
 	if (has_vhe())
 		kvm_vcpu_put_sysregs_vhe(vcpu);
 	kvm_timer_vcpu_put(vcpu);
-	kvm_vgic_put(vcpu);
+	kvm_vgic_put(vcpu, false);
 	kvm_vcpu_pmu_restore_host(vcpu);
 	kvm_arm_vmid_clear_active();
 
@@ -569,6 +696,15 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 	if (ret)
 		return ret;
 
+	if (is_protected_kvm_enabled()) {
+		/* Start with the vcpu in a dirty state */
+		if (!kvm_vm_is_protected(vcpu->kvm))
+			vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
+		ret = pkvm_create_hyp_vm(kvm);
+		if (ret)
+			return ret;
+	}
+
 	if (!irqchip_in_kernel(kvm)) {
 		/*
 		 * Tell the rest of the code that there are userspace irqchip
@@ -577,14 +713,6 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 		static_branch_inc(&userspace_irqchip_in_use);
 	}
 
-	/*
-	 * Initialize traps for protected VMs.
-	 * NOTE: Move to run in EL2 directly, rather than via a hypercall, once
-	 * the code is in place for first run initialization at EL2.
-	 */
-	if (kvm_vm_is_protected(kvm))
-		kvm_call_hyp_nvhe(__pkvm_vcpu_init_traps, vcpu);
-
 	mutex_lock(&kvm->lock);
 	set_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags);
 	mutex_unlock(&kvm->lock);
@@ -660,15 +788,14 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
 	 * doorbells to be signalled, should an interrupt become pending.
 	 */
 	preempt_disable();
-	kvm_vgic_vmcr_sync(vcpu);
-	vgic_v4_put(vcpu, true);
+	kvm_vgic_put(vcpu, true);
 	preempt_enable();
 
 	kvm_vcpu_halt(vcpu);
 	vcpu_clear_flag(vcpu, IN_WFIT);
 
 	preempt_disable();
-	vgic_v4_load(vcpu);
+	kvm_vgic_load(vcpu);
 	preempt_enable();
 }
 
@@ -1522,6 +1649,9 @@ static void cpu_prepare_hyp_mode(int cpu)
 {
 	struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
 	unsigned long tcr;
+	int *hyp_cpu_number_ptr = per_cpu_ptr_nvhe_sym(hyp_cpu_number, cpu);
+
+	*hyp_cpu_number_ptr = cpu;
 
 	/*
 	 * Calculate the raw per-cpu offset without a translation from the
@@ -1779,6 +1909,7 @@ static bool init_psci_relay(void)
 	}
 
 	kvm_host_psci_config.version = psci_ops.get_version();
+	kvm_host_psci_config.smccc_version = arm_smccc_get_version();
 
 	if (kvm_host_psci_config.version == PSCI_VERSION(0, 1)) {
 		kvm_host_psci_config.function_ids_0_1 = get_psci_0_1_function_ids();
@@ -1844,13 +1975,13 @@ static void teardown_hyp_mode(void)
 	free_hyp_pgds();
 	for_each_possible_cpu(cpu) {
 		free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
-		free_pages(kvm_arm_hyp_percpu_base[cpu], nvhe_percpu_order());
+		free_pages(kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu], nvhe_percpu_order());
 	}
 }
 
 static int do_pkvm_init(u32 hyp_va_bits)
 {
-	void *per_cpu_base = kvm_ksym_ref(kvm_arm_hyp_percpu_base);
+	void *per_cpu_base = kvm_ksym_ref(kvm_nvhe_sym(kvm_arm_hyp_percpu_base));
 	int ret;
 
 	preempt_disable();
@@ -1870,11 +2001,8 @@ static int do_pkvm_init(u32 hyp_va_bits)
 	return ret;
 }
 
-static int kvm_hyp_init_protection(u32 hyp_va_bits)
+static void kvm_hyp_init_symbols(void)
 {
-	void *addr = phys_to_virt(hyp_mem_base);
-	int ret;
-
 	kvm_nvhe_sym(id_aa64pfr0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
 	kvm_nvhe_sym(id_aa64pfr1_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1);
 	kvm_nvhe_sym(id_aa64isar0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64ISAR0_EL1);
@@ -1883,6 +2011,18 @@ static int kvm_hyp_init_protection(u32 hyp_va_bits)
 	kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
 	kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
 	kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR2_EL1);
+	kvm_nvhe_sym(id_aa64smfr0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64SMFR0_EL1);
+	kvm_nvhe_sym(__icache_flags) = __icache_flags;
+	kvm_nvhe_sym(kvm_arm_vmid_bits) = kvm_arm_vmid_bits;
+	kvm_nvhe_sym(smccc_trng_available) = smccc_trng_available;
+}
+
+int kvm_hyp_init_events(void);
+
+static int kvm_hyp_init_protection(u32 hyp_va_bits)
+{
+	void *addr = phys_to_virt(hyp_mem_base);
+	int ret;
 
 	ret = create_hyp_mappings(addr, addr + hyp_mem_size, PAGE_HYP);
 	if (ret)
@@ -1950,7 +2090,7 @@ static int init_hyp_mode(void)
 
 		page_addr = page_address(page);
 		memcpy(page_addr, CHOOSE_NVHE_SYM(__per_cpu_start), nvhe_percpu_size());
-		kvm_arm_hyp_percpu_base[cpu] = (unsigned long)page_addr;
+		kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu] = (unsigned long)page_addr;
 	}
 
 	/*
@@ -1963,6 +2103,13 @@ static int init_hyp_mode(void)
 		goto out_err;
 	}
 
+	err = create_hyp_mappings(kvm_ksym_ref(__hyp_data_start),
+				  kvm_ksym_ref(__hyp_data_end), PAGE_HYP);
+	if (err) {
+		kvm_err("Cannot map .hyp.data section\n");
+		goto out_err;
+	}
+
 	err = create_hyp_mappings(kvm_ksym_ref(__hyp_rodata_start),
 				  kvm_ksym_ref(__hyp_rodata_end), PAGE_HYP_RO);
 	if (err) {
@@ -2043,7 +2190,7 @@ static int init_hyp_mode(void)
 	}
 
 	for_each_possible_cpu(cpu) {
-		char *percpu_begin = (char *)kvm_arm_hyp_percpu_base[cpu];
+		char *percpu_begin = (char *)kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu];
 		char *percpu_end = percpu_begin + nvhe_percpu_size();
 
 		/* Map Hyp percpu pages */
@@ -2057,6 +2204,13 @@ static int init_hyp_mode(void)
 		cpu_prepare_hyp_mode(cpu);
 	}
 
+	kvm_hyp_init_symbols();
+
+	/* TODO: Real .h interface */
+#ifdef CONFIG_TRACING
+	kvm_hyp_init_events();
+#endif
+
 	if (is_protected_kvm_enabled()) {
 		init_cpu_logical_map();
 
@@ -2064,9 +2218,7 @@ static int init_hyp_mode(void)
 			err = -ENODEV;
 			goto out_err;
 		}
-	}
 
-	if (is_protected_kvm_enabled()) {
 		err = kvm_hyp_init_protection(hyp_va_bits);
 		if (err) {
 			kvm_err("Failed to init hyp memory protection\n");
@@ -2099,6 +2251,17 @@ static int pkvm_drop_host_privileges(void)
 	 * once the host stage 2 is installed.
 	 */
 	static_branch_enable(&kvm_protected_mode_initialized);
+
+	/*
+	 * Fixup the boot mode so that we don't take spurious round
+	 * trips via EL2 on cpu_resume. Flush to the PoC for good
+	 * measure, so that it can be observed by a CPU coming out of
+	 * suspend with the MMU off.
+	 */
+	__boot_cpu_mode[0] = __boot_cpu_mode[1] = BOOT_CPU_MODE_EL1;
+	dcache_clean_poc((unsigned long)__boot_cpu_mode,
+			 (unsigned long)(__boot_cpu_mode + 2));
+
 	on_each_cpu(_kvm_host_prot_finalize, &ret, 1);
 	return ret;
 }
@@ -2237,6 +2400,10 @@ int kvm_arch_init(void *opaque)
 			kvm_err("Failed to finalize Hyp protection\n");
 			goto out_hyp;
 		}
+
+		err = init_hyp_tracefs();
+		if (err)
+			kvm_err("Failed to initialize Hyp tracing\n");
 	}
 
 	if (is_protected_kvm_enabled()) {
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index fccf9ec..81218d7 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -39,9 +39,7 @@ static DEFINE_PER_CPU(u64, mdcr_el2);
  */
 static void save_guest_debug_regs(struct kvm_vcpu *vcpu)
 {
-	u64 val = vcpu_read_sys_reg(vcpu, MDSCR_EL1);
-
-	vcpu->arch.guest_debug_preserved.mdscr_el1 = val;
+	__vcpu_save_guest_debug_regs(vcpu);
 
 	trace_kvm_arm_set_dreg32("Saved MDSCR_EL1",
 				vcpu->arch.guest_debug_preserved.mdscr_el1);
@@ -52,9 +50,7 @@ static void save_guest_debug_regs(struct kvm_vcpu *vcpu)
 
 static void restore_guest_debug_regs(struct kvm_vcpu *vcpu)
 {
-	u64 val = vcpu->arch.guest_debug_preserved.mdscr_el1;
-
-	vcpu_write_sys_reg(vcpu, val, MDSCR_EL1);
+	__vcpu_restore_guest_debug_regs(vcpu);
 
 	trace_kvm_arm_set_dreg32("Restored MDSCR_EL1",
 				vcpu_read_sys_reg(vcpu, MDSCR_EL1));
@@ -175,7 +171,7 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 	kvm_arm_setup_mdcr_el2(vcpu);
 
 	/* Check if we need to use the debug registers. */
-	if (vcpu->guest_debug || kvm_vcpu_os_lock_enabled(vcpu)) {
+	if (kvm_vcpu_needs_debug_regs(vcpu)) {
 		/* Save guest debug state */
 		save_guest_debug_regs(vcpu);
 
@@ -284,7 +280,7 @@ void kvm_arm_clear_debug(struct kvm_vcpu *vcpu)
 	/*
 	 * Restore the guest's debug registers if we were using them.
 	 */
-	if (vcpu->guest_debug || kvm_vcpu_os_lock_enabled(vcpu)) {
+	if (kvm_vcpu_needs_debug_regs(vcpu)) {
 		if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
 			if (!(*vcpu_cpsr(vcpu) & DBG_SPSR_SS))
 				/*
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 2ff13a3..89cb03e 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -29,7 +29,9 @@
 #include "trace.h"
 
 const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
-	KVM_GENERIC_VM_STATS()
+	KVM_GENERIC_VM_STATS(),
+	STATS_DESC_ICOUNTER(VM, protected_hyp_mem),
+	STATS_DESC_ICOUNTER(VM, protected_shared_mem),
 };
 
 const struct kvm_stats_header kvm_vm_stats_header = {
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index e778eefc..05b3ec8 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -241,6 +241,21 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu)
 	int handled;
 
 	/*
+	 * If we run a non-protected VM when protection is enabled
+	 * system-wide, resync the state from the hypervisor and mark
+	 * it as dirty on the host side if it wasn't dirty already
+	 * (which could happen if preemption has taken place).
+	 */
+	if (is_protected_kvm_enabled() && !kvm_vm_is_protected(vcpu->kvm)) {
+		preempt_disable();
+		if (!(vcpu_get_flag(vcpu, PKVM_HOST_STATE_DIRTY))) {
+			kvm_call_hyp_nvhe(__pkvm_vcpu_sync_state);
+			vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
+		}
+		preempt_enable();
+	}
+
+	/*
 	 * See ARM ARM B1.14.1: "Hyp traps on instructions
 	 * that fail their condition code check"
 	 */
@@ -307,6 +322,13 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
 /* For exit types that need handling before we can be preempted */
 void handle_exit_early(struct kvm_vcpu *vcpu, int exception_index)
 {
+	/*
+	 * We just exited, so the state is clean from a hypervisor
+	 * perspective.
+	 */
+	if (is_protected_kvm_enabled())
+		vcpu_clear_flag(vcpu, PKVM_HOST_STATE_DIRTY);
+
 	if (ARM_SERROR_PENDING(exception_index)) {
 		if (this_cpu_has_cap(ARM64_HAS_RAS_EXTN)) {
 			u64 disr = kvm_vcpu_get_disr(vcpu);
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index 791d3de..a5fa6a1 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -61,12 +61,25 @@ static void __vcpu_write_spsr_und(struct kvm_vcpu *vcpu, u64 val)
 		vcpu->arch.ctxt.spsr_und = val;
 }
 
+unsigned long get_except64_offset(unsigned long psr, unsigned long target_mode,
+				  enum exception_type type)
+{
+	u64 mode = psr & (PSR_MODE_MASK | PSR_MODE32_BIT);
+	u64 exc_offset;
+
+	if      (mode == target_mode)
+		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
+	else if ((mode | PSR_MODE_THREAD_BIT) == target_mode)
+		exc_offset = CURRENT_EL_SP_EL0_VECTOR;
+	else if (!(mode & PSR_MODE32_BIT))
+		exc_offset = LOWER_EL_AArch64_VECTOR;
+	else
+		exc_offset = LOWER_EL_AArch32_VECTOR;
+
+	return exc_offset + type;
+}
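
    get_except64_offset() simply picks one of the four quadrants of the
    AArch64 vector table and adds the per-type offset. A worked example,
    assuming the usual constant values (SP_EL0 quadrant at 0x0, SP_ELx at
    0x200, lower-EL AArch64 at 0x400, lower-EL AArch32 at 0x600, and
    except_type_sync == 0):

    /* Guest at EL0 (AArch64) taking a synchronous exception to EL1h: */
    u64 off = get_except64_offset(PSR_MODE_EL0t, PSR_MODE_EL1h,
                                  except_type_sync);
    /*
     * EL0t matches neither of the current-EL cases, and PSR_MODE32_BIT is
     * clear, so the lower-EL AArch64 quadrant is chosen:
     * off == LOWER_EL_AArch64_VECTOR + except_type_sync == 0x400
     * (under the assumed constants).
     */
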
+
 /*
- * This performs the exception entry at a given EL (@target_mode), stashing PC
- * and PSTATE into ELR and SPSR respectively, and compute the new PC/PSTATE.
- * The EL passed to this function *must* be a non-secure, privileged mode with
- * bit 0 being set (PSTATE.SP == 1).
- *
  * When an exception is taken, most PSTATE fields are left unchanged in the
  * handler. However, some are explicitly overridden (e.g. M[4:0]). Luckily all
  * of the inherited bits have the same position in the AArch64/AArch32 SPSR_ELx
@@ -78,45 +91,17 @@ static void __vcpu_write_spsr_und(struct kvm_vcpu *vcpu, u64 val)
  * Here we manipulate the fields in order of the AArch64 SPSR_ELx layout, from
  * MSB to LSB.
  */
-static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
-			      enum exception_type type)
+unsigned long get_except64_cpsr(unsigned long old, bool has_mte,
+				unsigned long sctlr, unsigned long target_mode)
 {
-	unsigned long sctlr, vbar, old, new, mode;
-	u64 exc_offset;
-
-	mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
-
-	if      (mode == target_mode)
-		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
-	else if ((mode | PSR_MODE_THREAD_BIT) == target_mode)
-		exc_offset = CURRENT_EL_SP_EL0_VECTOR;
-	else if (!(mode & PSR_MODE32_BIT))
-		exc_offset = LOWER_EL_AArch64_VECTOR;
-	else
-		exc_offset = LOWER_EL_AArch32_VECTOR;
-
-	switch (target_mode) {
-	case PSR_MODE_EL1h:
-		vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL1);
-		sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
-		__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
-		break;
-	default:
-		/* Don't do that */
-		BUG();
-	}
-
-	*vcpu_pc(vcpu) = vbar + exc_offset + type;
-
-	old = *vcpu_cpsr(vcpu);
-	new = 0;
+	u64 new = 0;
 
 	new |= (old & PSR_N_BIT);
 	new |= (old & PSR_Z_BIT);
 	new |= (old & PSR_C_BIT);
 	new |= (old & PSR_V_BIT);
 
-	if (kvm_has_mte(kern_hyp_va(vcpu->kvm)))
+	if (has_mte)
 		new |= PSR_TCO_BIT;
 
 	new |= (old & PSR_DIT_BIT);
@@ -152,6 +137,36 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
 
 	new |= target_mode;
 
+	return new;
+}
+
+/*
+ * This performs the exception entry at a given EL (@target_mode), stashing PC
+ * and PSTATE into ELR and SPSR respectively, and computing the new PC/PSTATE.
+ * The EL passed to this function *must* be a non-secure, privileged mode with
+ * bit 0 being set (PSTATE.SP == 1).
+ */
+static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
+			      enum exception_type type)
+{
+	u64 offset = get_except64_offset(*vcpu_cpsr(vcpu), target_mode, type);
+	unsigned long sctlr, vbar, old, new;
+
+	switch (target_mode) {
+	case PSR_MODE_EL1h:
+		vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL1);
+		sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
+		__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
+		break;
+	default:
+		/* Don't do that */
+		BUG();
+	}
+
+	*vcpu_pc(vcpu) = vbar + offset;
+
+	old = *vcpu_cpsr(vcpu);
+	new = get_except64_cpsr(old, kvm_has_mte(kern_hyp_va(vcpu->kvm)), sctlr, target_mode);
 	*vcpu_cpsr(vcpu) = new;
 	__vcpu_write_spsr(vcpu, old);
 }
diff --git a/arch/arm64/kvm/hyp/fpsimd.S b/arch/arm64/kvm/hyp/fpsimd.S
index 61e6f3b..e950875 100644
--- a/arch/arm64/kvm/hyp/fpsimd.S
+++ b/arch/arm64/kvm/hyp/fpsimd.S
@@ -25,3 +25,9 @@
 	sve_load 0, x1, x2, 3
 	ret
 SYM_FUNC_END(__sve_restore_state)
+
+SYM_FUNC_START(__sve_save_state)
+	mov	x2, #1
+	sve_save 0, x1, x2, 3
+	ret
+SYM_FUNC_END(__sve_save_state)
diff --git a/arch/arm64/kvm/hyp/hyp-constants.c b/arch/arm64/kvm/hyp/hyp-constants.c
index b3742a6..9fe0d2a 100644
--- a/arch/arm64/kvm/hyp/hyp-constants.c
+++ b/arch/arm64/kvm/hyp/hyp-constants.c
@@ -2,9 +2,16 @@
 
 #include <linux/kbuild.h>
 #include <nvhe/memory.h>
+#include <nvhe/pkvm.h>
+#include <nvhe/trace.h>
 
 int main(void)
 {
 	DEFINE(STRUCT_HYP_PAGE_SIZE,	sizeof(struct hyp_page));
+	DEFINE(PKVM_HYP_VM_SIZE,	sizeof(struct pkvm_hyp_vm));
+	DEFINE(PKVM_HYP_VCPU_SIZE,	sizeof(struct pkvm_hyp_vcpu));
+#ifdef CONFIG_FTRACE
+	DEFINE(STRUCT_HYP_BUFFER_PAGE_SIZE,	sizeof(struct hyp_buffer_page));
+#endif
 	return 0;
 }
diff --git a/arch/arm64/kvm/hyp/include/nvhe/clock.h b/arch/arm64/kvm/hyp/include/nvhe/clock.h
new file mode 100644
index 0000000..7e5c2d2
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/clock.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ARM64_KVM_HYP_NVHE_CLOCK_H
+#define __ARM64_KVM_HYP_NVHE_CLOCK_H
+#include <linux/types.h>
+
+#include <asm/kvm_hyp.h>
+
+#ifdef CONFIG_TRACING
+void trace_clock_update(struct kvm_nvhe_clock_data *data);
+u64 trace_clock(void);
+#else
+static inline void trace_clock_update(struct kvm_nvhe_clock_data *data) { }
+static inline u64 trace_clock(void) { return 0; }
+#endif
+#endif
diff --git a/arch/arm64/kvm/hyp/include/nvhe/ffa.h b/arch/arm64/kvm/hyp/include/nvhe/ffa.h
new file mode 100644
index 0000000..1becb10
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/ffa.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2022 - Google LLC
+ * Author: Andrew Walbran <qwandor@google.com>
+ */
+#ifndef __KVM_HYP_FFA_H
+#define __KVM_HYP_FFA_H
+
+#include <asm/kvm_host.h>
+
+#define FFA_MIN_FUNC_NUM 0x60
+#define FFA_MAX_FUNC_NUM 0x7F
+
+int hyp_ffa_init(void *pages);
+bool kvm_host_ffa_handler(struct kvm_cpu_context *host_ctxt);
+
+#endif /* __KVM_HYP_FFA_H */
diff --git a/arch/arm64/kvm/hyp/include/nvhe/fixed_config.h b/arch/arm64/kvm/hyp/include/nvhe/fixed_config.h
deleted file mode 100644
index 07edfc75..0000000
--- a/arch/arm64/kvm/hyp/include/nvhe/fixed_config.h
+++ /dev/null
@@ -1,205 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2021 Google LLC
- * Author: Fuad Tabba <tabba@google.com>
- */
-
-#ifndef __ARM64_KVM_FIXED_CONFIG_H__
-#define __ARM64_KVM_FIXED_CONFIG_H__
-
-#include <asm/sysreg.h>
-
-/*
- * This file contains definitions for features to be allowed or restricted for
- * guest virtual machines, depending on the mode KVM is running in and on the
- * type of guest that is running.
- *
- * The ALLOW masks represent a bitmask of feature fields that are allowed
- * without any restrictions as long as they are supported by the system.
- *
- * The RESTRICT_UNSIGNED masks, if present, represent unsigned fields for
- * features that are restricted to support at most the specified feature.
- *
- * If a feature field is not present in either, than it is not supported.
- *
- * The approach taken for protected VMs is to allow features that are:
- * - Needed by common Linux distributions (e.g., floating point)
- * - Trivial to support, e.g., supporting the feature does not introduce or
- * require tracking of additional state in KVM
- * - Cannot be trapped or prevent the guest from using anyway
- */
-
-/*
- * Allow for protected VMs:
- * - Floating-point and Advanced SIMD
- * - Data Independent Timing
- */
-#define PVM_ID_AA64PFR0_ALLOW (\
-	ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_FP) | \
-	ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_AdvSIMD) | \
-	ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_DIT) \
-	)
-
-/*
- * Restrict to the following *unsigned* features for protected VMs:
- * - AArch64 guests only (no support for AArch32 guests):
- *	AArch32 adds complexity in trap handling, emulation, condition codes,
- *	etc...
- * - RAS (v1)
- *	Supported by KVM
- */
-#define PVM_ID_AA64PFR0_RESTRICT_UNSIGNED (\
-	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_EL0), ID_AA64PFR0_EL1_ELx_64BIT_ONLY) | \
-	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_EL1), ID_AA64PFR0_EL1_ELx_64BIT_ONLY) | \
-	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_EL2), ID_AA64PFR0_EL1_ELx_64BIT_ONLY) | \
-	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_EL3), ID_AA64PFR0_EL1_ELx_64BIT_ONLY) | \
-	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_RAS), ID_AA64PFR0_EL1_RAS_IMP) \
-	)
-
-/*
- * Allow for protected VMs:
- * - Branch Target Identification
- * - Speculative Store Bypassing
- */
-#define PVM_ID_AA64PFR1_ALLOW (\
-	ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_BT) | \
-	ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_SSBS) \
-	)
-
-/*
- * Allow for protected VMs:
- * - Mixed-endian
- * - Distinction between Secure and Non-secure Memory
- * - Mixed-endian at EL0 only
- * - Non-context synchronizing exception entry and exit
- */
-#define PVM_ID_AA64MMFR0_ALLOW (\
-	ARM64_FEATURE_MASK(ID_AA64MMFR0_EL1_BIGEND) | \
-	ARM64_FEATURE_MASK(ID_AA64MMFR0_EL1_SNSMEM) | \
-	ARM64_FEATURE_MASK(ID_AA64MMFR0_EL1_BIGENDEL0) | \
-	ARM64_FEATURE_MASK(ID_AA64MMFR0_EL1_EXS) \
-	)
-
-/*
- * Restrict to the following *unsigned* features for protected VMs:
- * - 40-bit IPA
- * - 16-bit ASID
- */
-#define PVM_ID_AA64MMFR0_RESTRICT_UNSIGNED (\
-	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64MMFR0_EL1_PARANGE), ID_AA64MMFR0_EL1_PARANGE_40) | \
-	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64MMFR0_EL1_ASIDBITS), ID_AA64MMFR0_EL1_ASIDBITS_16) \
-	)
-
-/*
- * Allow for protected VMs:
- * - Hardware translation table updates to Access flag and Dirty state
- * - Number of VMID bits from CPU
- * - Hierarchical Permission Disables
- * - Privileged Access Never
- * - SError interrupt exceptions from speculative reads
- * - Enhanced Translation Synchronization
- */
-#define PVM_ID_AA64MMFR1_ALLOW (\
-	ARM64_FEATURE_MASK(ID_AA64MMFR1_EL1_HAFDBS) | \
-	ARM64_FEATURE_MASK(ID_AA64MMFR1_EL1_VMIDBits) | \
-	ARM64_FEATURE_MASK(ID_AA64MMFR1_EL1_HPDS) | \
-	ARM64_FEATURE_MASK(ID_AA64MMFR1_EL1_PAN) | \
-	ARM64_FEATURE_MASK(ID_AA64MMFR1_EL1_SpecSEI) | \
-	ARM64_FEATURE_MASK(ID_AA64MMFR1_EL1_ETS) \
-	)
-
-/*
- * Allow for protected VMs:
- * - Common not Private translations
- * - User Access Override
- * - IESB bit in the SCTLR_ELx registers
- * - Unaligned single-copy atomicity and atomic functions
- * - ESR_ELx.EC value on an exception by read access to feature ID space
- * - TTL field in address operations.
- * - Break-before-make sequences when changing translation block size
- * - E0PDx mechanism
- */
-#define PVM_ID_AA64MMFR2_ALLOW (\
-	ARM64_FEATURE_MASK(ID_AA64MMFR2_EL1_CnP) | \
-	ARM64_FEATURE_MASK(ID_AA64MMFR2_EL1_UAO) | \
-	ARM64_FEATURE_MASK(ID_AA64MMFR2_EL1_IESB) | \
-	ARM64_FEATURE_MASK(ID_AA64MMFR2_EL1_AT) | \
-	ARM64_FEATURE_MASK(ID_AA64MMFR2_EL1_IDS) | \
-	ARM64_FEATURE_MASK(ID_AA64MMFR2_EL1_TTL) | \
-	ARM64_FEATURE_MASK(ID_AA64MMFR2_EL1_BBM) | \
-	ARM64_FEATURE_MASK(ID_AA64MMFR2_EL1_E0PD) \
-	)
-
-/*
- * No support for Scalable Vectors for protected VMs:
- *	Requires additional support from KVM, e.g., context-switching and
- *	trapping at EL2
- */
-#define PVM_ID_AA64ZFR0_ALLOW (0ULL)
-
-/*
- * No support for debug, including breakpoints, and watchpoints for protected
- * VMs:
- *	The Arm architecture mandates support for at least the Armv8 debug
- *	architecture, which would include at least 2 hardware breakpoints and
- *	watchpoints. Providing that support to protected guests adds
- *	considerable state and complexity. Therefore, the reserved value of 0 is
- *	used for debug-related fields.
- */
-#define PVM_ID_AA64DFR0_ALLOW (0ULL)
-#define PVM_ID_AA64DFR1_ALLOW (0ULL)
-
-/*
- * No support for implementation defined features.
- */
-#define PVM_ID_AA64AFR0_ALLOW (0ULL)
-#define PVM_ID_AA64AFR1_ALLOW (0ULL)
-
-/*
- * No restrictions on instructions implemented in AArch64.
- */
-#define PVM_ID_AA64ISAR0_ALLOW (\
-	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_AES) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_SHA1) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_SHA2) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_CRC32) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_ATOMIC) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_RDM) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_SHA3) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_SM3) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_SM4) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_DP) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_FHM) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_TS) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_TLB) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR0_EL1_RNDR) \
-	)
-
-#define PVM_ID_AA64ISAR1_ALLOW (\
-	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_DPB) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_APA) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_API) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_JSCVT) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_FCMA) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_LRCPC) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_GPA) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_GPI) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_FRINTTS) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_SB) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_SPECRES) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_BF16) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_DGH) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_I8MM) \
-	)
-
-#define PVM_ID_AA64ISAR2_ALLOW (\
-	ARM64_FEATURE_MASK(ID_AA64ISAR2_EL1_GPA3) | \
-	ARM64_FEATURE_MASK(ID_AA64ISAR2_EL1_APA3) \
-	)
-
-u64 pvm_read_id_reg(const struct kvm_vcpu *vcpu, u32 id);
-bool kvm_handle_pvm_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code);
-bool kvm_handle_pvm_restricted(struct kvm_vcpu *vcpu, u64 *exit_code);
-int kvm_check_pvm_sysreg_table(void);
-
-#endif /* __ARM64_KVM_FIXED_CONFIG_H__ */
diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
index 0a048dc..9330b13 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
@@ -7,7 +7,7 @@
 #include <nvhe/memory.h>
 #include <nvhe/spinlock.h>
 
-#define HYP_NO_ORDER	USHRT_MAX
+#define HYP_NO_ORDER	0xff
 
 struct hyp_pool {
 	/*
@@ -19,11 +19,11 @@ struct hyp_pool {
 	struct list_head free_area[MAX_ORDER];
 	phys_addr_t range_start;
 	phys_addr_t range_end;
-	unsigned short max_order;
+	u8 max_order;
 };
 
 /* Allocation */
-void *hyp_alloc_pages(struct hyp_pool *pool, unsigned short order);
+void *hyp_alloc_pages(struct hyp_pool *pool, u8 order);
 void hyp_split_page(struct hyp_page *page);
 void hyp_get_page(struct hyp_pool *pool, void *addr);
 void hyp_put_page(struct hyp_pool *pool, void *addr);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
new file mode 100644
index 0000000..6392770
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ARM64_KVM_NVHE_IOMMU_H__
+#define __ARM64_KVM_NVHE_IOMMU_H__
+
+#include <linux/types.h>
+#include <asm/kvm_host.h>
+
+#include <nvhe/mem_protect.h>
+
+struct pkvm_iommu;
+
+struct pkvm_iommu_ops {
+	/*
+	 * Global driver initialization called before devices are registered.
+	 * Driver-specific arguments are passed in a buffer shared by the host.
+	 * The buffer memory has been pinned in EL2 but the host retains R/W access.
+	 * Extra care must be taken when reading from it to avoid TOCTOU bugs.
+	 * If the driver maintains its own page tables, it is expected to
+	 * initialize them to all memory owned by the host.
+	 * The driver initialization lock is held during the callback.
+	 */
+	int (*init)(void *data, size_t size);
+
+	/*
+	 * Driver-specific validation of a device that is being registered.
+	 * All fields of the device struct have been populated.
+	 * Called with the host lock held.
+	 */
+	int (*validate)(struct pkvm_iommu *dev);
+
+	/*
+	 * Validation of a new child device that is being registered by
+	 * the parent device the child selected. Called with the host lock held.
+	 */
+	int (*validate_child)(struct pkvm_iommu *dev, struct pkvm_iommu *child);
+
+	/*
+	 * Callback to apply a host stage-2 mapping change at driver level.
+	 * Called before 'host_stage2_idmap_apply' with host lock held.
+	 */
+	void (*host_stage2_idmap_prepare)(phys_addr_t start, phys_addr_t end,
+					  enum kvm_pgtable_prot prot);
+
+	/*
+	 * Callback to apply a host stage-2 mapping change at device level.
+	 * Called after 'host_stage2_idmap_prepare' with host lock held.
+	 */
+	void (*host_stage2_idmap_apply)(struct pkvm_iommu *dev,
+					phys_addr_t start, phys_addr_t end);
+
+	/*
+	 * Callback to finish a host stage-2 mapping change at device level.
+	 * Called after 'host_stage2_idmap_apply' with host lock held.
+	 */
+	void (*host_stage2_idmap_complete)(struct pkvm_iommu *dev);
+
+	/* Power management callbacks. Called with host lock held. */
+	int (*suspend)(struct pkvm_iommu *dev);
+	int (*resume)(struct pkvm_iommu *dev);
+
+	/*
+	 * Host data abort handler callback. Called with host lock held.
+	 * Returns true if the data abort has been handled.
+	 */
+	bool (*host_dabt_handler)(struct pkvm_iommu *dev,
+				  struct kvm_cpu_context *host_ctxt,
+				  u32 esr, size_t off);
+
+	/* Amount of memory allocated per-device for use by the driver. */
+	size_t data_size;
+};
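
    To make the callback contract above concrete, here is a hedged sketch of
    a trivial driver-side ops definition (the driver name and stub behaviour
    are invented for illustration; a real driver would program actual IOMMU
    hardware in the idmap callbacks):

    /* Illustrative stub driver: accepts every device, ignores idmap updates. */
    static int stub_validate(struct pkvm_iommu *dev)
    {
            return 0;               /* no driver-specific constraints */
    }

    static void stub_idmap_apply(struct pkvm_iommu *dev,
                                 phys_addr_t start, phys_addr_t end)
    {
            /* A real driver would update its page tables for [start, end). */
    }

    static const struct pkvm_iommu_ops stub_ops = {
            .validate                = stub_validate,
            .host_stage2_idmap_apply = stub_idmap_apply,
            .data_size               = 0,   /* no per-device driver state */
    };
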
+
+struct pkvm_iommu {
+	struct pkvm_iommu *parent;
+	struct list_head list;
+	struct list_head siblings;
+	struct list_head children;
+	unsigned long id;
+	const struct pkvm_iommu_ops *ops;
+	phys_addr_t pa;
+	void *va;
+	size_t size;
+	bool powered;
+	char data[];
+};
+
+int __pkvm_iommu_driver_init(struct pkvm_iommu_driver *drv, void *data, size_t size);
+int __pkvm_iommu_register(unsigned long dev_id, unsigned long drv_id,
+			  phys_addr_t dev_pa, size_t dev_size,
+			  unsigned long parent_id,
+			  void *kern_mem_va, size_t mem_size);
+int __pkvm_iommu_pm_notify(unsigned long dev_id,
+			   enum pkvm_iommu_pm_event event);
+int __pkvm_iommu_finalize(void);
+int pkvm_iommu_host_stage2_adjust_range(phys_addr_t addr, phys_addr_t *start,
+					phys_addr_t *end);
+bool pkvm_iommu_host_dabt_handler(struct kvm_cpu_context *host_ctxt, u32 esr,
+				  phys_addr_t fault_pa);
+void pkvm_iommu_host_stage2_idmap(phys_addr_t start, phys_addr_t end,
+				  enum kvm_pgtable_prot prot);
+
+#endif	/* __ARM64_KVM_NVHE_IOMMU_H__ */
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 80e9983..b405013 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -8,8 +8,10 @@
 #define __KVM_NVHE_MEM_PROTECT__
 #include <linux/kvm_host.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 #include <asm/kvm_pgtable.h>
 #include <asm/virt.h>
+#include <nvhe/pkvm.h>
 #include <nvhe/spinlock.h>
 
 /*
@@ -28,7 +30,9 @@ enum pkvm_page_state {
 					  KVM_PGTABLE_PROT_SW1,
 
 	/* Meta-states which aren't encoded directly in the PTE's SW bits */
-	PKVM_NOPAGE,
+	PKVM_NOPAGE			= BIT(0),
+	PKVM_PAGE_RESTRICTED_PROT	= BIT(1),
+	PKVM_MODULE_DONT_TOUCH		= BIT(2),
 };
 
 #define PKVM_PAGE_STATE_PROT_MASK	(KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
@@ -43,30 +47,73 @@ static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
 	return prot & PKVM_PAGE_STATE_PROT_MASK;
 }
 
-struct host_kvm {
+struct host_mmu {
 	struct kvm_arch arch;
 	struct kvm_pgtable pgt;
 	struct kvm_pgtable_mm_ops mm_ops;
 	hyp_spinlock_t lock;
 };
-extern struct host_kvm host_kvm;
+extern struct host_mmu host_mmu;
 
-extern const u8 pkvm_hyp_id;
+/* This corresponds to page-table locking order */
+enum pkvm_component_id {
+	PKVM_ID_HOST,
+	PKVM_ID_HYP,
+	PKVM_ID_GUEST,
+	PKVM_ID_FFA,
+	PKVM_ID_PROTECTED,
+	PKVM_ID_MAX = PKVM_ID_PROTECTED,
+};
+
+extern unsigned long hyp_nr_cpus;
 
 int __pkvm_prot_finalize(void);
 int __pkvm_host_share_hyp(u64 pfn);
 int __pkvm_host_unshare_hyp(u64 pfn);
+int __pkvm_host_reclaim_page(struct pkvm_hyp_vm *vm, u64 pfn, u64 ipa);
+int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
+int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
+int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
+int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
+int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *hyp_vcpu, u64 ipa);
+int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *hyp_vcpu, u64 ipa);
+int __pkvm_guest_relinquish_to_host(struct pkvm_hyp_vcpu *vcpu,
+				    u64 ipa, u64 *ppa);
+int __pkvm_install_ioguard_page(struct pkvm_hyp_vcpu *hyp_vcpu, u64 ipa);
+int __pkvm_remove_ioguard_page(struct pkvm_hyp_vcpu *hyp_vcpu, u64 ipa);
+bool __pkvm_check_ioguard_page(struct pkvm_hyp_vcpu *hyp_vcpu);
+int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
+int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
 
 bool addr_is_memory(phys_addr_t phys);
-int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
-int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id);
+int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot,
+			     bool update_iommu);
+int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, enum pkvm_component_id owner_id);
+int host_stage2_protect_pages_locked(phys_addr_t addr, u64 size);
+int host_stage2_unmap_reg_locked(phys_addr_t start, u64 size);
 int kvm_host_prepare_stage2(void *pgt_pool_base);
+int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd);
 void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 
+int hyp_register_host_perm_fault_handler(int (*cb)(struct kvm_cpu_context *ctxt, u64 esr, u64 addr));
+int hyp_pin_shared_mem(void *from, void *to);
+void hyp_unpin_shared_mem(void *from, void *to);
+int host_stage2_get_leaf(phys_addr_t phys, kvm_pte_t *ptep, u32 *level);
+int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
+		    struct kvm_hyp_memcache *host_mc);
+
+int module_change_host_page_prot(u64 pfn, enum kvm_pgtable_prot prot);
+
+void destroy_hyp_vm_pgt(struct pkvm_hyp_vm *vm);
+void drain_hyp_pool(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc);
+
+void psci_mem_protect_inc(u64 n);
+void psci_mem_protect_dec(u64 n);
+
 static __always_inline void __load_host_stage2(void)
 {
 	if (static_branch_likely(&kvm_protected_mode_initialized))
-		__load_stage2(&host_kvm.arch.mmu, &host_kvm.arch);
+		__load_stage2(&host_mmu.arch.mmu, &host_mmu.arch);
 	else
 		write_sysreg(0, vttbr_el2);
 }
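
    A side note on the pkvm_page_state rework above: the meta-states
    (PKVM_NOPAGE and friends) are now distinct bits rather than consecutive
    enum values. A hedged illustration, assuming the bit encoding is intended
    to let meta-states combine:

    /* Illustrative only: meta-state bits can be tested independently. */
    enum pkvm_page_state state = PKVM_NOPAGE | PKVM_PAGE_RESTRICTED_PROT;

    if (state & PKVM_NOPAGE) {
            /* Not mapped, but restricted-prot tracking still applies. */
    }
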
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 592b7ed..a392021 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -7,9 +7,16 @@
 
 #include <linux/types.h>
 
+/*
+ * Accesses to struct hyp_page flags are serialized by the host stage-2
+ * page-table lock.
+ */
+#define MODULE_OWNED_PAGE		BIT(0)
+
 struct hyp_page {
 	unsigned short refcount;
-	unsigned short order;
+	u8 order;
+	u8 flags;
 };
 
 extern u64 __hyp_vmemmap;
@@ -38,6 +45,10 @@ static inline phys_addr_t hyp_virt_to_phys(void *addr)
 #define hyp_page_to_virt(page)	__hyp_va(hyp_page_to_phys(page))
 #define hyp_page_to_pool(page)	(((struct hyp_page *)page)->pool)
 
+/*
+ * Refcounting for 'struct hyp_page'.
+ * hyp_pool::lock must be held if atomic access to the refcount is required.
+ */
 static inline int hyp_page_count(void *addr)
 {
 	struct hyp_page *p = hyp_virt_to_page(addr);
@@ -45,4 +56,27 @@ static inline int hyp_page_count(void *addr)
 	return p->refcount;
 }
 
+static inline void hyp_page_ref_inc(struct hyp_page *p)
+{
+	BUG_ON(p->refcount == USHRT_MAX);
+	p->refcount++;
+}
+
+static inline void hyp_page_ref_dec(struct hyp_page *p)
+{
+	BUG_ON(!p->refcount);
+	p->refcount--;
+}
+
+static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
+{
+	hyp_page_ref_dec(p);
+	return (p->refcount == 0);
+}
+
+static inline void hyp_set_page_refcounted(struct hyp_page *p)
+{
+	BUG_ON(p->refcount);
+	p->refcount = 1;
+}
 #endif /* __KVM_HYP_MEMORY_H */
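
    The refcount helpers added to memory.h are deliberately non-atomic; per
    the comment above them, hyp_pool::lock serializes access. A hedged sketch
    of the intended usage pattern (the function name is invented, and a lock
    member on struct hyp_pool is assumed, as that comment suggests):

    /* Illustrative: drop a reference under the pool lock. */
    static void example_put_page(struct hyp_pool *pool, struct hyp_page *p)
    {
            hyp_spin_lock(&pool->lock);
            if (hyp_page_ref_dec_and_test(p)) {
                    /* Last reference dropped: the page may now be freed. */
            }
            hyp_spin_unlock(&pool->lock);
    }
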
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index 42d8eb9..66a6d71 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -12,10 +12,16 @@
 
 extern struct kvm_pgtable pkvm_pgtable;
 extern hyp_spinlock_t pkvm_pgd_lock;
+extern const struct pkvm_module_ops module_ops;
+
+int hyp_create_pcpu_fixmap(void);
+void *hyp_fixmap_map(phys_addr_t phys);
+void hyp_fixmap_unmap(void);
+void hyp_poison_page(phys_addr_t phys);
 
 int hyp_create_idmap(u32 hyp_va_bits);
 int hyp_map_vectors(void);
-int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
+int hyp_back_vmemmap(phys_addr_t back);
 int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
 int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
 int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot);
@@ -23,17 +29,9 @@ int __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
 				  enum kvm_pgtable_prot prot,
 				  unsigned long *haddr);
 int pkvm_alloc_private_va_range(size_t size, unsigned long *haddr);
+void pkvm_remove_mappings(void *from, void *to);
 
-static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
-				     unsigned long *start, unsigned long *end)
-{
-	unsigned long nr_pages = size >> PAGE_SHIFT;
-	struct hyp_page *p = hyp_phys_to_page(phys);
-
-	*start = (unsigned long)p;
-	*end = *start + nr_pages * sizeof(struct hyp_page);
-	*start = ALIGN_DOWN(*start, PAGE_SIZE);
-	*end = ALIGN(*end, PAGE_SIZE);
-}
-
+int __pkvm_map_module_page(u64 pfn, void *va, enum kvm_pgtable_prot prot, bool is_protected);
+void __pkvm_unmap_module_page(u64 pfn, void *va);
+void *__pkvm_alloc_module_va(u64 nr_pages);
 #endif /* __KVM_HYP_MM_H */
diff --git a/arch/arm64/kvm/hyp/include/nvhe/modules.h b/arch/arm64/kvm/hyp/include/nvhe/modules.h
new file mode 100644
index 0000000..f339d88
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/modules.h
@@ -0,0 +1,36 @@
+#include <asm/kvm_pgtable.h>
+
+#define HCALL_HANDLED 0
+#define HCALL_UNHANDLED -1
+
+int __pkvm_register_host_smc_handler(bool (*cb)(struct kvm_cpu_context *));
+int __pkvm_register_default_trap_handler(bool (*cb)(struct kvm_cpu_context *));
+int __pkvm_register_illegal_abt_notifier(void (*cb)(struct kvm_cpu_context *));
+int __pkvm_register_hyp_panic_notifier(void (*cb)(struct kvm_cpu_context *));
+
+enum pkvm_psci_notification;
+int __pkvm_register_psci_notifier(void (*cb)(enum pkvm_psci_notification, struct kvm_cpu_context *));
+
+int reset_pkvm_priv_hcall_limit(void);
+
+#ifdef CONFIG_MODULES
+int __pkvm_init_module(void *module_init);
+int __pkvm_register_hcall(unsigned long hfn_hyp_va);
+int handle_host_dynamic_hcall(struct kvm_cpu_context *host_ctxt);
+void pkvm_modules_lock(void);
+void pkvm_modules_unlock(void);
+bool pkvm_modules_enabled(void);
+int __pkvm_close_module_registration(void);
+#else
+static inline int __pkvm_init_module(void *module_init) { return -EOPNOTSUPP; }
+static inline int
+__pkvm_register_hcall(unsigned long hfn_hyp_va) { return -EOPNOTSUPP; }
+static inline int handle_host_dynamic_hcall(struct kvm_cpu_context *host_ctxt)
+{
+	return HCALL_UNHANDLED;
+}
+static inline void pkvm_modules_lock(void) { }
+static inline void pkvm_modules_unlock(void) { }
+static inline bool pkvm_modules_enabled(void) { return false; }
+static inline int __pkvm_close_module_registration(void) { return -EOPNOTSUPP; }
+#endif
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
new file mode 100644
index 0000000..c373844
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -0,0 +1,149 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2021 Google LLC
+ * Author: Fuad Tabba <tabba@google.com>
+ */
+
+#ifndef __ARM64_KVM_NVHE_PKVM_H__
+#define __ARM64_KVM_NVHE_PKVM_H__
+
+#include <asm/kvm_pkvm.h>
+
+#include <nvhe/gfp.h>
+#include <nvhe/spinlock.h>
+
+/*
+ * Holds the relevant data for maintaining the vcpu state completely at hyp.
+ */
+struct pkvm_hyp_vcpu {
+	struct kvm_vcpu vcpu;
+
+	/* Backpointer to the host's (untrusted) vCPU instance. */
+	struct kvm_vcpu *host_vcpu;
+
+	/*
+	 * If this hyp vCPU is loaded, then this is a backpointer to the
+	 * per-cpu pointer tracking us; NULL if not loaded.
+	 */
+	struct pkvm_hyp_vcpu **loaded_hyp_vcpu;
+
+	/* Tracks exit code for the protected guest. */
+	u32 exit_code;
+
+	/*
+	 * Track the power state transition of a protected vcpu.
+	 * Can be in one of three states:
+	 * PSCI_0_2_AFFINITY_LEVEL_ON
+	 * PSCI_0_2_AFFINITY_LEVEL_OFF
+	 * PSCI_0_2_AFFINITY_LEVEL_PENDING
+	 */
+	int power_state;
+};
+
+/*
+ * Holds the relevant data for running a protected vm.
+ */
+struct pkvm_hyp_vm {
+	struct kvm kvm;
+
+	/* Backpointer to the host's (untrusted) KVM instance. */
+	struct kvm *host_kvm;
+
+	/* The guest's stage-2 page-table managed by the hypervisor. */
+	struct kvm_pgtable pgt;
+	struct kvm_pgtable_mm_ops mm_ops;
+	struct hyp_pool pool;
+	hyp_spinlock_t lock;
+
+	/* Primary vCPU pending entry to the pvmfw */
+	struct pkvm_hyp_vcpu *pvmfw_entry_vcpu;
+
+	/*
+	 * The number of vcpus initialized and ready to run.
+	 * Modifying this is protected by 'vm_table_lock'.
+	 */
+	unsigned int nr_vcpus;
+
+	/*
+	 * True when the guest is being torn down. When in this state, the
+	 * guest's vCPUs can't be loaded anymore, but its pages can be
+	 * reclaimed by the host.
+	 */
+	bool is_dying;
+
+	/* Array of the hyp vCPU structures for this VM. */
+	struct pkvm_hyp_vcpu *vcpus[];
+};
+
+static inline struct pkvm_hyp_vm *
+pkvm_hyp_vcpu_to_hyp_vm(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	return container_of(hyp_vcpu->vcpu.kvm, struct pkvm_hyp_vm, kvm);
+}
+
+static inline bool vcpu_is_protected(struct kvm_vcpu *vcpu)
+{
+	if (!is_protected_kvm_enabled())
+		return false;
+
+	return vcpu->kvm->arch.pkvm.enabled;
+}
+
+static inline bool pkvm_hyp_vcpu_is_protected(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	return vcpu_is_protected(&hyp_vcpu->vcpu);
+}
+
+extern phys_addr_t pvmfw_base;
+extern phys_addr_t pvmfw_size;
+
+void pkvm_hyp_vm_table_init(void *tbl);
+
+int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
+		   unsigned long pgd_hva, unsigned long last_ran_hva);
+int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
+		     unsigned long vcpu_hva);
+int __pkvm_start_teardown_vm(pkvm_handle_t handle);
+int __pkvm_finalize_teardown_vm(pkvm_handle_t handle);
+int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 pfn, u64 ipa);
+
+struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
+					 unsigned int vcpu_idx);
+void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu);
+struct pkvm_hyp_vcpu *pkvm_get_loaded_hyp_vcpu(void);
+
+u64 pvm_read_id_reg(const struct kvm_vcpu *vcpu, u32 id);
+bool kvm_handle_pvm_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code);
+bool kvm_handle_pvm_restricted(struct kvm_vcpu *vcpu, u64 *exit_code);
+void kvm_reset_pvm_sys_regs(struct kvm_vcpu *vcpu);
+int kvm_check_pvm_sysreg_table(void);
+
+void pkvm_reset_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu);
+
+bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code);
+bool kvm_hyp_handle_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code);
+
+struct pkvm_hyp_vcpu *pkvm_mpidr_to_hyp_vcpu(struct pkvm_hyp_vm *vm, u64 mpidr);
+
+static inline bool pkvm_hyp_vm_has_pvmfw(struct pkvm_hyp_vm *vm)
+{
+	return vm->kvm.arch.pkvm.pvmfw_load_addr != PVMFW_INVALID_LOAD_ADDR;
+}
+
+static inline bool pkvm_ipa_range_has_pvmfw(struct pkvm_hyp_vm *vm,
+					    u64 ipa_start, u64 ipa_end)
+{
+	struct kvm_protected_vm *pkvm = &vm->kvm.arch.pkvm;
+	u64 pvmfw_load_end = pkvm->pvmfw_load_addr + pvmfw_size;
+
+	if (!pkvm_hyp_vm_has_pvmfw(vm))
+		return false;
+
+	return ipa_end > pkvm->pvmfw_load_addr && ipa_start < pvmfw_load_end;
+}
+
+int pkvm_load_pvmfw_pages(struct pkvm_hyp_vm *vm, u64 ipa, phys_addr_t phys,
+			  u64 size);
+void pkvm_poison_pvmfw_pages(void);
+
+#endif /* __ARM64_KVM_NVHE_PKVM_H__ */
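
    pkvm_ipa_range_has_pvmfw() above uses the standard half-open interval
    overlap test: [a, b) and [c, d) intersect exactly when b > c and a < d.
    A small worked example with invented addresses:

    static bool ranges_overlap(u64 a_start, u64 a_end, u64 b_start, u64 b_end)
    {
            /* Same half-open test as pkvm_ipa_range_has_pvmfw(). */
            return a_end > b_start && a_start < b_end;
    }

    /*
     * With pvmfw occupying [0x8000000, 0x8010000):
     *   ranges_overlap(0x800f000, 0x8011000, 0x8000000, 0x8010000) -> true
     *   ranges_overlap(0x8010000, 0x8020000, 0x8000000, 0x8010000) -> false
     *     (the ranges touch but do not overlap)
     */
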
diff --git a/arch/arm64/kvm/hyp/include/nvhe/serial.h b/arch/arm64/kvm/hyp/include/nvhe/serial.h
new file mode 100644
index 0000000..85ff8fd
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/serial.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ARM64_KVM_NVHE_SERIAL_H__
+#define __ARM64_KVM_NVHE_SERIAL_H__
+
+void hyp_puts(const char *s);
+void hyp_putx64(u64 x);
+void hyp_putc(char c);
+int __pkvm_register_serial_driver(void (*driver_cb)(char));
+
+#endif
diff --git a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
index 4652fd0..7c7ea8c 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
@@ -28,9 +28,17 @@ typedef union hyp_spinlock {
 	};
 } hyp_spinlock_t;
 
+#define __HYP_SPIN_LOCK_INITIALIZER \
+	{ .__val = 0 }
+
+#define __HYP_SPIN_LOCK_UNLOCKED \
+	((hyp_spinlock_t) __HYP_SPIN_LOCK_INITIALIZER)
+
+#define DEFINE_HYP_SPINLOCK(x)	hyp_spinlock_t x = __HYP_SPIN_LOCK_UNLOCKED
+
 #define hyp_spin_lock_init(l)						\
 do {									\
-	*(l) = (hyp_spinlock_t){ .__val = 0 };				\
+	*(l) = __HYP_SPIN_LOCK_UNLOCKED;				\
 } while (0)
 
 static inline void hyp_spin_lock(hyp_spinlock_t *lock)
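
    The new DEFINE_HYP_SPINLOCK() macro above lets hyp code declare
    statically-initialized locks instead of calling hyp_spin_lock_init() at
    runtime. A hedged usage sketch (the lock name is invented):

    /* Illustrative: a statically-initialized lock guarding some EL2 state. */
    static DEFINE_HYP_SPINLOCK(example_lock);

    static void example_critical_section(void)
    {
            hyp_spin_lock(&example_lock);
            /* ... touch state that must not be accessed concurrently ... */
            hyp_spin_unlock(&example_lock);
    }
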
diff --git a/arch/arm64/kvm/hyp/include/nvhe/trace.h b/arch/arm64/kvm/hyp/include/nvhe/trace.h
new file mode 100644
index 0000000..ff5d047
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/trace.h
@@ -0,0 +1,115 @@
+#ifndef __ARM64_KVM_HYP_NVHE_TRACE_H
+#define __ARM64_KVM_HYP_NVHE_TRACE_H
+
+#include <nvhe/trace.h>
+
+#include <linux/trace_events.h>
+#include <linux/ring_buffer.h>
+
+#include <asm/kvm_hyptrace.h>
+#include <asm/kvm_hypevents_defs.h>
+#include <asm/percpu.h>
+
+#ifdef CONFIG_TRACING
+
+struct hyp_buffer_page {
+	struct list_head list;
+	struct buffer_data_page *page;
+	atomic_t write;
+	atomic_t entries;
+};
+
+#define HYP_RB_UNUSED	0
+#define HYP_RB_READY	1
+#define HYP_RB_WRITE	2
+
+struct hyp_rb_per_cpu {
+	struct hyp_buffer_page *tail_page;
+	struct hyp_buffer_page *reader_page;
+	struct hyp_buffer_page *head_page;
+	struct hyp_buffer_page *bpages;
+	unsigned long nr_pages;
+	atomic64_t write_stamp;
+	atomic_t pages_touched;
+	atomic_t nr_entries;
+	atomic_t status;
+	atomic_t overrun;
+};
+
+static inline bool __start_write_hyp_rb(struct hyp_rb_per_cpu *rb)
+{
+	/*
+	 * Paired with rb_cpu_init()
+	 */
+	return atomic_cmpxchg_acquire(&rb->status, HYP_RB_READY, HYP_RB_WRITE)
+		!= HYP_RB_UNUSED;
+}
+
+static inline void __stop_write_hyp_rb(struct hyp_rb_per_cpu *rb)
+{
+	/*
+	 * Paired with rb_cpu_teardown()
+	 */
+	atomic_set_release(&rb->status, HYP_RB_READY);
+}
+
+struct hyp_rb_per_cpu;
+DECLARE_PER_CPU(struct hyp_rb_per_cpu, trace_rb);
+
+void *rb_reserve_trace_entry(struct hyp_rb_per_cpu *cpu_buffer, unsigned long length);
+
+int __pkvm_start_tracing(unsigned long pack_va, size_t pack_size);
+void __pkvm_stop_tracing(void);
+int __pkvm_rb_swap_reader_page(int cpu);
+int __pkvm_rb_update_footers(int cpu);
+int __pkvm_enable_event(unsigned short id, bool enable);
+
+#define HYP_EVENT(__name, __proto, __struct, __assign, __printk)		\
+	HYP_EVENT_FORMAT(__name, __struct);					\
+	extern atomic_t __name##_enabled;					\
+	extern unsigned short hyp_event_id_##__name;				\
+	static inline void trace_##__name(__proto)				\
+	{									\
+		size_t length = sizeof(struct trace_hyp_format_##__name);	\
+		struct hyp_rb_per_cpu *rb = this_cpu_ptr(&trace_rb);		\
+		struct trace_hyp_format_##__name *__entry;			\
+										\
+		if (!atomic_read(&__name##_enabled))				\
+			return;							\
+		if (!__start_write_hyp_rb(rb))					\
+			return;							\
+		__entry = rb_reserve_trace_entry(rb, length);			\
+		__entry->hdr.id = hyp_event_id_##__name;			\
+		__assign							\
+		__stop_write_hyp_rb(rb);					\
+	}
+
+/* TODO: atomic_t to static_branch */
+
+#else
+static inline int __pkvm_start_tracing(unsigned long pack_va, size_t pack_size)
+{
+	return -ENODEV;
+}
+
+static inline void __pkvm_stop_tracing(void) { }
+
+static inline int __pkvm_rb_swap_reader_page(int cpu)
+{
+	return -ENODEV;
+}
+
+static inline int __pkvm_rb_update_footers(int cpu)
+{
+	return -ENODEV;
+}
+
+#define HYP_EVENT(__name, __proto, __struct, __assign, __printk)	\
+	static inline void trace_##__name(__proto) {}
+
+static inline int __pkvm_enable_event(unsigned short id, bool enable)
+{
+	return -ENODEV;
+}
+#endif
+#endif
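
    HYP_EVENT() above expands to a per-event enable flag plus a trace_<name>()
    helper that reserves a record in the per-CPU hyp ring buffer and runs the
    __assign hook to fill it in. A hedged sketch of a definition, written
    directly against the macro's five raw parameters (the event name and
    field are invented; the in-tree definitions pulled in from
    asm/kvm_hypevents.h may wrap the fields in helper macros instead):

    /* Hypothetical single-field event. */
    HYP_EVENT(example_exit,
            /* __proto  */ u64 esr,
            /* __struct */ u64 esr;,
            /* __assign */ __entry->esr = esr;,
            /* __printk */ "esr=0x%llx"
    );

    /* Emitting it from hyp code is then just: trace_example_exit(esr); */
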
diff --git a/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h b/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
index 45a84f0a..1e6d995 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
@@ -15,6 +15,4 @@
 #define DECLARE_REG(type, name, ctxt, reg)	\
 				type name = (type)cpu_reg(ctxt, (reg))
 
-void __pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu);
-
 #endif /* __ARM64_KVM_NVHE_TRAP_HANDLER_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/.gitignore b/arch/arm64/kvm/hyp/nvhe/.gitignore
index 5b6c43c..899547d8 100644
--- a/arch/arm64/kvm/hyp/nvhe/.gitignore
+++ b/arch/arm64/kvm/hyp/nvhe/.gitignore
@@ -1,4 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0-only
-gen-hyprel
 hyp.lds
 hyp-reloc.S
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index be0a2bc..3283c43 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -1,111 +1,21 @@
 # SPDX-License-Identifier: GPL-2.0
-#
-# Makefile for Kernel-based Virtual Machine module, HYP/nVHE part
-#
-
-asflags-y := -D__KVM_NVHE_HYPERVISOR__ -D__DISABLE_EXPORTS
-
-# Tracepoint and MMIO logging symbols should not be visible at nVHE KVM as
-# there is no way to execute them and any such MMIO access from nVHE KVM
-# will explode instantly (Words of Marc Zyngier). So introduce a generic flag
-# __DISABLE_TRACE_MMIO__ to disable MMIO tracing for nVHE KVM.
-ccflags-y := -D__KVM_NVHE_HYPERVISOR__ -D__DISABLE_EXPORTS -D__DISABLE_TRACE_MMIO__
-ccflags-y += -fno-stack-protector	\
-	     -DDISABLE_BRANCH_PROFILING	\
-	     $(DISABLE_STACKLEAK_PLUGIN)
-
-hostprogs := gen-hyprel
-HOST_EXTRACFLAGS += -I$(objtree)/include
 
 lib-objs := clear_page.o copy_page.o memcpy.o memset.o
 lib-objs := $(addprefix ../../../lib/, $(lib-objs))
 
 hyp-obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
 	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o page_alloc.o \
-	 cache.o setup.o mm.o mem_protect.o sys_regs.o pkvm.o stacktrace.o
+	 cache.o setup.o mm.o mem_protect.o sys_regs.o pkvm.o stacktrace.o ffa.o iommu.o \
+	 serial.o
 hyp-obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
 	 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
+hyp-obj-$(CONFIG_TRACING) += clock.o events.o trace.o
 hyp-obj-$(CONFIG_DEBUG_LIST) += list_debug.o
+hyp-obj-$(CONFIG_MODULES) += modules.o
 hyp-obj-y += $(lib-objs)
 
-##
-## Build rules for compiling nVHE hyp code
-## Output of this folder is `kvm_nvhe.o`, a partially linked object
-## file containing all nVHE hyp code and data.
-##
-
-hyp-obj := $(patsubst %.o,%.nvhe.o,$(hyp-obj-y))
-obj-y := kvm_nvhe.o
-targets += $(hyp-obj) kvm_nvhe.tmp.o kvm_nvhe.rel.o hyp.lds hyp-reloc.S hyp-reloc.o
-
-# 1) Compile all source files to `.nvhe.o` object files. The file extension
-#    avoids file name clashes for files shared with VHE.
-$(obj)/%.nvhe.o: $(src)/%.c FORCE
-	$(call if_changed_rule,cc_o_c)
-$(obj)/%.nvhe.o: $(src)/%.S FORCE
-	$(call if_changed_rule,as_o_S)
-
-# 2) Compile linker script.
 $(obj)/hyp.lds: $(src)/hyp.lds.S FORCE
 	$(call if_changed_dep,cpp_lds_S)
 
-# 3) Partially link all '.nvhe.o' files and apply the linker script.
-#    Prefixes names of ELF sections with '.hyp', eg. '.hyp.text'.
-#    Note: The following rule assumes that the 'ld' rule puts LDFLAGS before
-#          the list of dependencies to form '-T $(obj)/hyp.lds'. This is to
-#          keep the dependency on the target while avoiding an error from
-#          GNU ld if the linker script is passed to it twice.
-LDFLAGS_kvm_nvhe.tmp.o := -r -T
-$(obj)/kvm_nvhe.tmp.o: $(obj)/hyp.lds $(addprefix $(obj)/,$(hyp-obj)) FORCE
-	$(call if_changed,ld)
-
-# 4) Generate list of hyp code/data positions that need to be relocated at
-#    runtime. Because the hypervisor is part of the kernel binary, relocations
-#    produce a kernel VA. We enumerate relocations targeting hyp at build time
-#    and convert the kernel VAs at those positions to hyp VAs.
-$(obj)/hyp-reloc.S: $(obj)/kvm_nvhe.tmp.o $(obj)/gen-hyprel FORCE
-	$(call if_changed,hyprel)
-
-# 5) Compile hyp-reloc.S and link it into the existing partially linked object.
-#    The object file now contains a section with pointers to hyp positions that
-#    will contain kernel VAs at runtime. These pointers have relocations on them
-#    so that they get updated as the hyp object is linked into `vmlinux`.
-LDFLAGS_kvm_nvhe.rel.o := -r
-$(obj)/kvm_nvhe.rel.o: $(obj)/kvm_nvhe.tmp.o $(obj)/hyp-reloc.o FORCE
-	$(call if_changed,ld)
-
-# 6) Produce the final 'kvm_nvhe.o', ready to be linked into 'vmlinux'.
-#    Prefixes names of ELF symbols with '__kvm_nvhe_'.
-$(obj)/kvm_nvhe.o: $(obj)/kvm_nvhe.rel.o FORCE
-	$(call if_changed,hypcopy)
-
-# The HYPREL command calls `gen-hyprel` to generate an assembly file with
-# a list of relocations targeting hyp code/data.
-quiet_cmd_hyprel = HYPREL  $@
-      cmd_hyprel = $(obj)/gen-hyprel $< > $@
-
-# The HYPCOPY command uses `objcopy` to prefix all ELF symbol names
-# to avoid clashes with VHE code/data.
-quiet_cmd_hypcopy = HYPCOPY $@
-      cmd_hypcopy = $(OBJCOPY) --prefix-symbols=__kvm_nvhe_ $< $@
-
-# Remove ftrace, Shadow Call Stack, and CFI CFLAGS.
-# This is equivalent to the 'notrace', '__noscs', and '__nocfi' annotations.
-KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) $(CC_FLAGS_CFI), $(KBUILD_CFLAGS))
-# Starting from 13.0.0 llvm emits SHT_REL section '.llvm.call-graph-profile'
-# when profile optimization is applied. gen-hyprel does not support SHT_REL and
-# causes a build failure. Remove profile optimization flags.
-KBUILD_CFLAGS := $(filter-out -fprofile-sample-use=% -fprofile-use=%, $(KBUILD_CFLAGS))
-
-# KVM nVHE code is run at a different exception code with a different map, so
-# compiler instrumentation that inserts callbacks or checks into the code may
-# cause crashes. Just disable it.
-GCOV_PROFILE	:= n
-KASAN_SANITIZE	:= n
-KCSAN_SANITIZE	:= n
-UBSAN_SANITIZE	:= n
-KCOV_INSTRUMENT	:= n
-
-# Skip objtool checking for this directory because nVHE code is compiled with
-# non-standard build rules.
-OBJECT_FILES_NON_STANDARD := y
+include $(srctree)/arch/arm64/kvm/hyp/nvhe/Makefile.nvhe
+obj-y := kvm_nvhe.o
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile.module b/arch/arm64/kvm/hyp/nvhe/Makefile.module
new file mode 100644
index 0000000..d3ad446
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile.module
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+
+$(obj)/hyp.lds: arch/arm64/kvm/hyp/nvhe/module.lds.S FORCE
+	$(call if_changed_dep,cpp_lds_S)
+
+include $(srctree)/arch/arm64/kvm/hyp/nvhe/Makefile.nvhe
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile.nvhe b/arch/arm64/kvm/hyp/nvhe/Makefile.nvhe
new file mode 100644
index 0000000..381e610
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile.nvhe
@@ -0,0 +1,94 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for Kernel-based Virtual Machine module, HYP/nVHE part
+#
+
+asflags-y := -D__KVM_NVHE_HYPERVISOR__ -D__DISABLE_EXPORTS
+
+# Tracepoint and MMIO logging symbols should not be visible at nVHE KVM as
+# there is no way to execute them and any such MMIO access from nVHE KVM
+# will explode instantly (Words of Marc Zyngier). So introduce a generic flag
+# __DISABLE_TRACE_MMIO__ to disable MMIO tracing for nVHE KVM.
+ccflags-y := -D__KVM_NVHE_HYPERVISOR__ -D__DISABLE_EXPORTS -D__DISABLE_TRACE_MMIO__
+ccflags-y += -fno-stack-protector	\
+	     -DDISABLE_BRANCH_PROFILING	\
+	     $(DISABLE_STACKLEAK_PLUGIN)
+
+HYPREL := arch/arm64/tools/gen-hyprel
+
+##
+## Build rules for compiling nVHE hyp code
+## Output of this folder is `kvm_nvhe.o`, a partially linked object
+## file containing all nVHE hyp code and data.
+##
+
+hyp-obj := $(patsubst %.o,%.nvhe.o,$(hyp-obj-y))
+targets += $(hyp-obj) kvm_nvhe.tmp.o kvm_nvhe.rel.o hyp.lds hyp-reloc.S hyp-reloc.o
+
+# 1) Compile all source files to `.nvhe.o` object files. The file extension
+#    avoids file name clashes for files shared with VHE.
+$(obj)/%.nvhe.o: $(src)/%.c FORCE
+	$(call if_changed_rule,cc_o_c)
+$(obj)/%.nvhe.o: $(src)/%.S FORCE
+	$(call if_changed_rule,as_o_S)
+
+# 2) Partially link all '.nvhe.o' files and apply the linker script.
+#    Prefixes names of ELF sections with '.hyp', eg. '.hyp.text'.
+#    Note: The following rule assumes that the 'ld' rule puts LDFLAGS before
+#          the list of dependencies to form '-T $(obj)/hyp.lds'. This is to
+#          keep the dependency on the target while avoiding an error from
+#          GNU ld if the linker script is passed to it twice.
+LDFLAGS_kvm_nvhe.tmp.o := -r -T
+$(obj)/kvm_nvhe.tmp.o: $(obj)/hyp.lds $(addprefix $(obj)/,$(hyp-obj)) FORCE
+	$(call if_changed,ld)
+
+# 3) Generate list of hyp code/data positions that need to be relocated at
+#    runtime. Because the hypervisor is part of the kernel binary, relocations
+#    produce a kernel VA. We enumerate relocations targeting hyp at build time
+#    and convert the kernel VAs at those positions to hyp VAs.
+$(obj)/hyp-reloc.S: $(obj)/kvm_nvhe.tmp.o FORCE
+	$(call if_changed,hyprel)
+
+# 4) Compile hyp-reloc.S and link it into the existing partially linked object.
+#    The object file now contains a section with pointers to hyp positions that
+#    will contain kernel VAs at runtime. These pointers have relocations on them
+#    so that they get updated as the hyp object is linked into `vmlinux`.
+LDFLAGS_kvm_nvhe.rel.o := -r
+$(obj)/kvm_nvhe.rel.o: $(obj)/kvm_nvhe.tmp.o $(obj)/hyp-reloc.o FORCE
+	$(call if_changed,ld)
+
+# 5) Produce the final 'kvm_nvhe.o', ready to be linked into 'vmlinux'.
+#    Prefixes names of ELF symbols with '__kvm_nvhe_'.
+$(obj)/kvm_nvhe.o: $(obj)/kvm_nvhe.rel.o FORCE
+	$(call if_changed,hypcopy)
+
+# The HYPREL command calls `gen-hyprel` to generate an assembly file with
+# a list of relocations targeting hyp code/data.
+quiet_cmd_hyprel = HYPREL  $@
+      cmd_hyprel = $(HYPREL) $< > $@
+
+# The HYPCOPY command uses `objcopy` to prefix all ELF symbol names
+# to avoid clashes with VHE code/data.
+quiet_cmd_hypcopy = HYPCOPY $@
+      cmd_hypcopy = $(OBJCOPY) --prefix-symbols=__kvm_nvhe_ $< $@
+
+# Remove ftrace, Shadow Call Stack, and CFI CFLAGS.
+# This is equivalent to the 'notrace', '__noscs', and '__nocfi' annotations.
+KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) $(CC_FLAGS_CFI), $(KBUILD_CFLAGS))
+# Starting from 13.0.0 llvm emits SHT_REL section '.llvm.call-graph-profile'
+# when profile optimization is applied. gen-hyprel does not support SHT_REL and
+# causes a build failure. Remove profile optimization flags.
+KBUILD_CFLAGS := $(filter-out -fprofile-sample-use=% -fprofile-use=%, $(KBUILD_CFLAGS))
+
+# KVM nVHE code is run at a different exception code with a different map, so
+# compiler instrumentation that inserts callbacks or checks into the code may
+# cause crashes. Just disable it.
+GCOV_PROFILE	:= n
+KASAN_SANITIZE	:= n
+KCSAN_SANITIZE	:= n
+UBSAN_SANITIZE	:= n
+KCOV_INSTRUMENT	:= n
+
+# Skip objtool checking for this directory because nVHE code is compiled with
+# non-standard build rules.
+OBJECT_FILES_NON_STANDARD := y
diff --git a/arch/arm64/kvm/hyp/nvhe/cache.S b/arch/arm64/kvm/hyp/nvhe/cache.S
index 0c367eb..85936c17 100644
--- a/arch/arm64/kvm/hyp/nvhe/cache.S
+++ b/arch/arm64/kvm/hyp/nvhe/cache.S
@@ -12,3 +12,14 @@
 	ret
 SYM_FUNC_END(__pi_dcache_clean_inval_poc)
 SYM_FUNC_ALIAS(dcache_clean_inval_poc, __pi_dcache_clean_inval_poc)
+
+SYM_FUNC_START(__pi_icache_inval_pou)
+alternative_if ARM64_HAS_CACHE_DIC
+	isb
+	ret
+alternative_else_nop_endif
+
+	invalidate_icache_by_line x0, x1, x2, x3
+	ret
+SYM_FUNC_END(__pi_icache_inval_pou)
+SYM_FUNC_ALIAS(icache_inval_pou, __pi_icache_inval_pou)
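
    icache_inval_pou() gives EL2 code a way to invalidate stale
    instruction-cache lines after writing code to memory (skipped entirely on
    CPUs with the DIC feature, where an ISB suffices). A hedged usage sketch
    from the C side (the function name is invented; dcache_clean_inval_poc()
    is the companion helper patched just above):

    /* Illustrative: make freshly-written instructions visible to fetch. */
    static void example_sync_new_code(void *text, size_t len)
    {
            unsigned long start = (unsigned long)text;

            dcache_clean_inval_poc(start, start + len); /* push data writes out */
            icache_inval_pou(start, start + len);       /* drop stale I-cache lines */
    }
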
diff --git a/arch/arm64/kvm/hyp/nvhe/clock.c b/arch/arm64/kvm/hyp/nvhe/clock.c
new file mode 100644
index 0000000..4ff87e86
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/clock.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <nvhe/clock.h>
+
+#include <asm/arch_timer.h>
+#include <asm/div64.h>
+
+static struct kvm_nvhe_clock_data trace_clock_data;
+
+/*
+ * Update without any locks! This is fine because tracing, the sole user of
+ * this clock, orders the memory accesses and protects against races between
+ * reads and updates.
+ */
+void trace_clock_update(struct kvm_nvhe_clock_data *data)
+{
+	trace_clock_data.mult = data->mult;
+	trace_clock_data.shift = data->shift;
+	trace_clock_data.epoch_ns = data->epoch_ns;
+	trace_clock_data.epoch_cyc = data->epoch_cyc;
+}
+
+/*
+ * This clock relies on host-provided slope and epoch values to return
+ * something synchronized with the host. The downside is that the output
+ * cannot be trusted and must not be used for anything other than debugging.
+ */
+u64 trace_clock(void)
+{
+	u64 cyc = __arch_counter_get_cntpct() - trace_clock_data.epoch_cyc;
+	__uint128_t ns;
+
+	/*
+	 * The host kernel can avoid the 64-bit overflow of the multiplication
+	 * by updating the epoch value with a timer (see
+	 * kernel/time/clocksource.c). The hypervisor doesn't have that option,
+	 * so let's do a more costly 128-bit multiplication here.
+	 */
+	ns = (__uint128_t)cyc * trace_clock_data.mult;
+	ns >>= trace_clock_data.shift;
+
+	return (u64)ns + trace_clock_data.epoch_ns;
+}
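
The mult/shift pair is the usual clocksource fixed-point encoding: ns = (cyc * mult) >> shift. A self-contained illustration of why the 128-bit product matters (the values are made up; GCC/Clang's unsigned __int128 stands in for the hypervisor arithmetic):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint32_t mult = 20u << 24;		/* 20 ns/cycle, 8.24 fixed point */
	uint32_t shift = 24;
	uint64_t cyc = 3000000000000ULL;	/* ~16.6h of cycles at 50 MHz */

	/* cyc * mult is ~1e21 here, far beyond the 64-bit range (~1.8e19). */
	unsigned __int128 ns = (unsigned __int128)cyc * mult >> shift;

	printf("%llu ns\n", (unsigned long long)(uint64_t)ns);
	return 0;
}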
diff --git a/arch/arm64/kvm/hyp/nvhe/events.c b/arch/arm64/kvm/hyp/nvhe/events.c
new file mode 100644
index 0000000..9deb2c3
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/events.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 Google LLC
+ */
+
+#include <nvhe/trace.h>
+#include <nvhe/mm.h>
+
+extern struct hyp_event_id __hyp_event_ids_start[];
+extern struct hyp_event_id __hyp_event_ids_end[];
+
+#undef HYP_EVENT
+#define HYP_EVENT(__name, __proto, __struct, __assign, __printk)	\
+	atomic_t __ro_after_init __name##_enabled = ATOMIC_INIT(0);	\
+	struct hyp_event_id hyp_event_id_##__name __section("_hyp_event_ids") = {	\
+		.data = (void *)&__name##_enabled,			\
+	}
+
+#include <asm/kvm_hypevents.h>
+
+int __pkvm_enable_event(unsigned short id, bool enable)
+{
+	struct hyp_event_id *event_id = __hyp_event_ids_start;
+	atomic_t *enable_key;
+
+	for (; (unsigned long)event_id < (unsigned long)__hyp_event_ids_end;
+	     event_id++) {
+		if (event_id->id != id)
+			continue;
+
+		enable_key = (atomic_t *)event_id->data;
+		enable_key = hyp_fixmap_map(__hyp_pa(enable_key));
+
+		atomic_set(enable_key, enable);
+
+		hyp_fixmap_unmap();
+
+		return 0;
+	}
+
+	return -EINVAL;
+}
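
The enable key toggled above is written through the fixmap because it lives in read-only-after-init hypervisor memory, but it is only ever read on the emit path. A hypothetical emitter fast path in plain C11 (the names are invented; the real plumbing is generated by the HYP_EVENT() macro):

#include <stdatomic.h>

/* Stand-in for the per-event key instantiated by HYP_EVENT(). */
static atomic_int my_event_enabled;

static void trace_my_event(unsigned long arg)
{
	/* One relaxed load is the whole cost when the event is disabled. */
	if (!atomic_load_explicit(&my_event_enabled, memory_order_relaxed))
		return;

	/* ... reserve ring-buffer space and write the payload ... */
	(void)arg;
}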
diff --git a/arch/arm64/kvm/hyp/nvhe/ffa.c b/arch/arm64/kvm/hyp/nvhe/ffa.c
new file mode 100644
index 0000000..4d1c3c8
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/ffa.c
@@ -0,0 +1,741 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * FF-A v1.0 proxy to filter out invalid memory-sharing SMC calls issued by
+ * the host. FF-A is a slightly more palatable abbreviation of "Arm Firmware
+ * Framework for Arm A-profile", which is specified by Arm in document
+ * number DEN0077.
+ *
+ * Copyright (C) 2022 - Google LLC
+ * Author: Andrew Walbran <qwandor@google.com>
+ *
+ * This driver hooks into the SMC trapping logic for the host and intercepts
+ * all calls falling within the FF-A range. Each call is either:
+ *
+ *	- Forwarded on unmodified to the SPMD at EL3
+ *	- Rejected as "unsupported"
+ *	- Accompanied by a host stage-2 page-table check/update and reissued
+ *
+ * Consequently, any attempts by the host to make guest memory pages
+ * accessible to the secure world using FF-A will be detected either here
+ * (in the case that the memory is already owned by the guest) or during
+ * donation to the guest (in the case that the memory was previously shared
+ * with the secure world).
+ *
+ * To allow the rolling-back of page-table updates and FF-A calls in the
+ * event of failure, operations involving the RXTX buffers are locked for
+ * the duration and are therefore serialised.
+ */
+
+#include <linux/arm-smccc.h>
+#include <linux/arm_ffa.h>
+#include <asm/kvm_pkvm.h>
+
+#include <nvhe/ffa.h>
+#include <nvhe/mem_protect.h>
+#include <nvhe/memory.h>
+#include <nvhe/trap_handler.h>
+#include <nvhe/spinlock.h>
+
+/*
+ * "ID value 0 must be returned at the Non-secure physical FF-A instance"
+ * We share this ID with the host.
+ */
+#define HOST_FFA_ID	0
+
+/*
+ * A buffer to hold descriptors up to the maximum size we can see from the
+ * host, which is required when the SPMD returns a fragmented
+ * FFA_MEM_RETRIEVE_RESP when resolving the handle on the reclaim path.
+ */
+struct kvm_ffa_descriptor_buffer {
+	void	*buf;
+	size_t	len;
+};
+
+static struct kvm_ffa_descriptor_buffer ffa_desc_buf;
+
+struct kvm_ffa_buffers {
+	hyp_spinlock_t lock;
+	void *tx;
+	void *rx;
+};
+
+/*
+ * Note that we don't currently lock these buffers explicitly, instead
+ * relying on the locking of the host FFA buffers as we only have one
+ * client.
+ */
+static struct kvm_ffa_buffers hyp_buffers;
+static struct kvm_ffa_buffers host_buffers;
+
+static void ffa_to_smccc_error(struct arm_smccc_res *res, u64 ffa_errno)
+{
+	*res = (struct arm_smccc_res) {
+		.a0	= FFA_ERROR,
+		.a2	= ffa_errno,
+	};
+}
+
+static void ffa_to_smccc_res_prop(struct arm_smccc_res *res, int ret, u64 prop)
+{
+	if (ret == FFA_RET_SUCCESS) {
+		*res = (struct arm_smccc_res) { .a0 = FFA_SUCCESS,
+						.a2 = prop };
+	} else {
+		ffa_to_smccc_error(res, ret);
+	}
+}
+
+static void ffa_to_smccc_res(struct arm_smccc_res *res, int ret)
+{
+	ffa_to_smccc_res_prop(res, ret, 0);
+}
+
+static void ffa_set_retval(struct kvm_cpu_context *ctxt,
+			   struct arm_smccc_res *res)
+{
+	cpu_reg(ctxt, 0) = res->a0;
+	cpu_reg(ctxt, 1) = res->a1;
+	cpu_reg(ctxt, 2) = res->a2;
+	cpu_reg(ctxt, 3) = res->a3;
+}
+
+static bool is_ffa_call(u64 func_id)
+{
+	return ARM_SMCCC_IS_FAST_CALL(func_id) &&
+	       ARM_SMCCC_OWNER_NUM(func_id) == ARM_SMCCC_OWNER_STANDARD &&
+	       ARM_SMCCC_FUNC_NUM(func_id) >= FFA_MIN_FUNC_NUM &&
+	       ARM_SMCCC_FUNC_NUM(func_id) <= FFA_MAX_FUNC_NUM;
+}
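
is_ffa_call() is a bitfield test on the SMCCC function ID. For reference, a freestanding sketch of the fields it checks (per the SMC Calling Convention: bit 31 marks a fast call, bits [29:24] the owning entity, bits [15:0] the function number; owner 4 is the standard secure service range — the actual FF-A bounds come from FFA_MIN_FUNC_NUM/FFA_MAX_FUNC_NUM and are parameters here):

#include <stdbool.h>
#include <stdint.h>

#define SMC_IS_FAST(id)		(((id) >> 31) & 0x1)
#define SMC_OWNER(id)		(((id) >> 24) & 0x3f)
#define SMC_FUNC(id)		((id) & 0xffff)
#define SMC_OWNER_STANDARD	4

static bool in_ffa_range(uint32_t id, uint16_t lo, uint16_t hi)
{
	return SMC_IS_FAST(id) && SMC_OWNER(id) == SMC_OWNER_STANDARD &&
	       SMC_FUNC(id) >= lo && SMC_FUNC(id) <= hi;
}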
+
+static int spmd_map_ffa_buffers(u64 ffa_page_count)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_smc(FFA_FN64_RXTX_MAP,
+			  hyp_virt_to_phys(hyp_buffers.tx),
+			  hyp_virt_to_phys(hyp_buffers.rx),
+			  ffa_page_count,
+			  0, 0, 0, 0,
+			  &res);
+
+	return res.a0 == FFA_SUCCESS ? FFA_RET_SUCCESS : res.a2;
+}
+
+static int spmd_unmap_ffa_buffers(void)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_smc(FFA_RXTX_UNMAP,
+			  HOST_FFA_ID,
+			  0, 0, 0, 0, 0, 0,
+			  &res);
+
+	return res.a0 == FFA_SUCCESS ? FFA_RET_SUCCESS : res.a2;
+}
+
+static void spmd_mem_frag_tx(struct arm_smccc_res *res, u32 handle_lo,
+			     u32 handle_hi, u32 fraglen, u32 endpoint_id)
+{
+	arm_smccc_1_1_smc(FFA_MEM_FRAG_TX,
+			  handle_lo, handle_hi, fraglen, endpoint_id,
+			  0, 0, 0,
+			  res);
+}
+
+static void spmd_mem_frag_rx(struct arm_smccc_res *res, u32 handle_lo,
+			     u32 handle_hi, u32 fragoff)
+{
+	arm_smccc_1_1_smc(FFA_MEM_FRAG_RX,
+			  handle_lo, handle_hi, fragoff, HOST_FFA_ID,
+			  0, 0, 0,
+			  res);
+}
+
+static void spmd_mem_xfer(struct arm_smccc_res *res, u64 func_id, u32 len,
+			  u32 fraglen)
+{
+	arm_smccc_1_1_smc(func_id, len, fraglen,
+			  0, 0, 0, 0, 0,
+			  res);
+}
+
+static void spmd_mem_reclaim(struct arm_smccc_res *res, u32 handle_lo,
+			     u32 handle_hi, u32 flags)
+{
+	arm_smccc_1_1_smc(FFA_MEM_RECLAIM,
+			  handle_lo, handle_hi, flags,
+			  0, 0, 0, 0,
+			  res);
+}
+
+static void spmd_retrieve_req(struct arm_smccc_res *res, u32 len)
+{
+	arm_smccc_1_1_smc(FFA_FN64_MEM_RETRIEVE_REQ,
+			  len, len,
+			  0, 0, 0, 0, 0,
+			  res);
+}
+
+static void do_ffa_rxtx_map(struct arm_smccc_res *res,
+			    struct kvm_cpu_context *ctxt)
+{
+	DECLARE_REG(phys_addr_t, tx, ctxt, 1);
+	DECLARE_REG(phys_addr_t, rx, ctxt, 2);
+	DECLARE_REG(u32, npages, ctxt, 3);
+	int ret = 0;
+	void *rx_virt, *tx_virt;
+
+	if (npages != (KVM_FFA_MBOX_NR_PAGES * PAGE_SIZE) / FFA_PAGE_SIZE) {
+		ret = FFA_RET_INVALID_PARAMETERS;
+		goto out;
+	}
+
+	if (!PAGE_ALIGNED(tx) || !PAGE_ALIGNED(rx)) {
+		ret = FFA_RET_INVALID_PARAMETERS;
+		goto out;
+	}
+
+	hyp_spin_lock(&host_buffers.lock);
+	if (host_buffers.tx) {
+		ret = FFA_RET_DENIED;
+		goto out_unlock;
+	}
+
+	ret = spmd_map_ffa_buffers(npages);
+	if (ret)
+		goto out_unlock;
+
+	ret = __pkvm_host_share_hyp(hyp_phys_to_pfn(tx));
+	if (ret) {
+		ret = FFA_RET_INVALID_PARAMETERS;
+		goto err_unmap;
+	}
+
+	ret = __pkvm_host_share_hyp(hyp_phys_to_pfn(rx));
+	if (ret) {
+		ret = FFA_RET_INVALID_PARAMETERS;
+		goto err_unshare_tx;
+	}
+
+	tx_virt = hyp_phys_to_virt(tx);
+	ret = hyp_pin_shared_mem(tx_virt, tx_virt + 1);
+	if (ret) {
+		ret = FFA_RET_INVALID_PARAMETERS;
+		goto err_unshare_rx;
+	}
+
+	rx_virt = hyp_phys_to_virt(rx);
+	ret = hyp_pin_shared_mem(rx_virt, rx_virt + 1);
+	if (ret) {
+		ret = FFA_RET_INVALID_PARAMETERS;
+		goto err_unpin_tx;
+	}
+
+	host_buffers.tx = tx_virt;
+	host_buffers.rx = rx_virt;
+
+out_unlock:
+	hyp_spin_unlock(&host_buffers.lock);
+out:
+	ffa_to_smccc_res(res, ret);
+	return;
+
+err_unpin_tx:
+	hyp_unpin_shared_mem(tx_virt, tx_virt + 1);
+err_unshare_rx:
+	__pkvm_host_unshare_hyp(hyp_phys_to_pfn(rx));
+err_unshare_tx:
+	__pkvm_host_unshare_hyp(hyp_phys_to_pfn(tx));
+err_unmap:
+	spmd_unmap_ffa_buffers();
+	goto out_unlock;
+}
+
+static void do_ffa_rxtx_unmap(struct arm_smccc_res *res,
+			      struct kvm_cpu_context *ctxt)
+{
+	DECLARE_REG(u32, id, ctxt, 1);
+	int ret = 0;
+
+	if (id != HOST_FFA_ID) {
+		ret = FFA_RET_INVALID_PARAMETERS;
+		goto out;
+	}
+
+	hyp_spin_lock(&host_buffers.lock);
+	if (!host_buffers.tx) {
+		ret = FFA_RET_INVALID_PARAMETERS;
+		goto out_unlock;
+	}
+
+	hyp_unpin_shared_mem(host_buffers.tx, host_buffers.tx + 1);
+	WARN_ON(__pkvm_host_unshare_hyp(hyp_virt_to_pfn(host_buffers.tx)));
+	host_buffers.tx = NULL;
+
+	hyp_unpin_shared_mem(host_buffers.rx, host_buffers.rx + 1);
+	WARN_ON(__pkvm_host_unshare_hyp(hyp_virt_to_pfn(host_buffers.rx)));
+	host_buffers.rx = NULL;
+
+	spmd_unmap_ffa_buffers();
+
+out_unlock:
+	hyp_spin_unlock(&host_buffers.lock);
+out:
+	ffa_to_smccc_res(res, ret);
+}
+
+static u32 __ffa_host_share_ranges(struct ffa_mem_region_addr_range *ranges,
+				   u32 nranges)
+{
+	u32 i;
+
+	for (i = 0; i < nranges; ++i) {
+		struct ffa_mem_region_addr_range *range = &ranges[i];
+		u64 sz = (u64)range->pg_cnt * FFA_PAGE_SIZE;
+		u64 pfn = hyp_phys_to_pfn(range->address);
+
+		if (!PAGE_ALIGNED(sz))
+			break;
+
+		if (__pkvm_host_share_ffa(pfn, sz / PAGE_SIZE))
+			break;
+	}
+
+	return i;
+}
+
+static u32 __ffa_host_unshare_ranges(struct ffa_mem_region_addr_range *ranges,
+				     u32 nranges)
+{
+	u32 i;
+
+	for (i = 0; i < nranges; ++i) {
+		struct ffa_mem_region_addr_range *range = &ranges[i];
+		u64 sz = (u64)range->pg_cnt * FFA_PAGE_SIZE;
+		u64 pfn = hyp_phys_to_pfn(range->address);
+
+		if (!PAGE_ALIGNED(sz))
+			break;
+
+		if (__pkvm_host_unshare_ffa(pfn, sz / PAGE_SIZE))
+			break;
+	}
+
+	return i;
+}
+
+static int ffa_host_share_ranges(struct ffa_mem_region_addr_range *ranges,
+				 u32 nranges)
+{
+	u32 nshared = __ffa_host_share_ranges(ranges, nranges);
+	int ret = 0;
+
+	if (nshared != nranges) {
+		WARN_ON(__ffa_host_unshare_ranges(ranges, nshared) != nshared);
+		ret = FFA_RET_DENIED;
+	}
+
+	return ret;
+}
+
+static int ffa_host_unshare_ranges(struct ffa_mem_region_addr_range *ranges,
+				   u32 nranges)
+{
+	u32 nunshared = __ffa_host_unshare_ranges(ranges, nranges);
+	int ret = 0;
+
+	if (nunshared != nranges) {
+		WARN_ON(__ffa_host_share_ranges(ranges, nunshared) != nunshared);
+		ret = FFA_RET_DENIED;
+	}
+
+	return ret;
+}
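
ffa_host_share_ranges() and ffa_host_unshare_ranges() above turn the per-range walkers into all-or-nothing operations: on partial failure, exactly the prefix that succeeded is undone. Reduced to a self-contained sketch (apply_one()/undo_one() are hypothetical stand-ins for the __pkvm_host_{share,unshare}_ffa calls):

#include <stdbool.h>

static bool apply_one(unsigned int i) { return i != 3; } /* item 3 fails */
static void undo_one(unsigned int i) { (void)i; /* roll back item i */ }

static int apply_all(unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (!apply_one(i))
			break;
	}

	if (i != n) {	/* partial failure: undo the successful prefix */
		while (i--)
			undo_one(i);
		return -1;
	}

	return 0;
}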
+
+static void do_ffa_mem_frag_tx(struct arm_smccc_res *res,
+			       struct kvm_cpu_context *ctxt)
+{
+	DECLARE_REG(u32, handle_lo, ctxt, 1);
+	DECLARE_REG(u32, handle_hi, ctxt, 2);
+	DECLARE_REG(u32, fraglen, ctxt, 3);
+	DECLARE_REG(u32, endpoint_id, ctxt, 4);
+	struct ffa_mem_region_addr_range *buf;
+	int ret = FFA_RET_INVALID_PARAMETERS;
+	u32 nr_ranges;
+
+	if (fraglen > KVM_FFA_MBOX_NR_PAGES * PAGE_SIZE)
+		goto out;
+
+	if (fraglen % sizeof(*buf))
+		goto out;
+
+	hyp_spin_lock(&host_buffers.lock);
+	if (!host_buffers.tx)
+		goto out_unlock;
+
+	buf = hyp_buffers.tx;
+	memcpy(buf, host_buffers.tx, fraglen);
+	nr_ranges = fraglen / sizeof(*buf);
+
+	ret = ffa_host_share_ranges(buf, nr_ranges);
+	if (ret) {
+		/*
+		 * We're effectively aborting the transaction, so we need
+		 * to restore the global state back to what it was prior to
+		 * transmission of the first fragment.
+		 */
+		spmd_mem_reclaim(res, handle_lo, handle_hi, 0);
+		WARN_ON(res->a0 != FFA_SUCCESS);
+		goto out_unlock;
+	}
+
+	spmd_mem_frag_tx(res, handle_lo, handle_hi, fraglen, endpoint_id);
+	if (res->a0 != FFA_SUCCESS && res->a0 != FFA_MEM_FRAG_RX)
+		WARN_ON(ffa_host_unshare_ranges(buf, nr_ranges));
+
+out_unlock:
+	hyp_spin_unlock(&host_buffers.lock);
+out:
+	if (ret)
+		ffa_to_smccc_res(res, ret);
+
+	/*
+	 * If for any reason this did not succeed, we're in trouble as we have
+	 * now lost the content of the previous fragments and we can't roll back
+	 * the host stage-2 changes. The pages previously marked as shared will
+	 * remain stuck in that state forever, preventing the host from
+	 * sharing/donating them again and possibly leading to subsequent
+	 * failures, but this will not compromise confidentiality.
+	 */
+	return;
+}
+
+static __always_inline void do_ffa_mem_xfer(const u64 func_id,
+					    struct arm_smccc_res *res,
+					    struct kvm_cpu_context *ctxt)
+{
+	DECLARE_REG(u32, len, ctxt, 1);
+	DECLARE_REG(u32, fraglen, ctxt, 2);
+	DECLARE_REG(u64, addr_mbz, ctxt, 3);
+	DECLARE_REG(u32, npages_mbz, ctxt, 4);
+	struct ffa_composite_mem_region *reg;
+	struct ffa_mem_region *buf;
+	u32 offset, nr_ranges;
+	int ret = 0;
+
+	BUILD_BUG_ON(func_id != FFA_FN64_MEM_SHARE &&
+		     func_id != FFA_FN64_MEM_LEND);
+
+	if (addr_mbz || npages_mbz || fraglen > len ||
+	    fraglen > KVM_FFA_MBOX_NR_PAGES * PAGE_SIZE) {
+		ret = FFA_RET_INVALID_PARAMETERS;
+		goto out;
+	}
+
+	if (fraglen < sizeof(struct ffa_mem_region) +
+		      sizeof(struct ffa_mem_region_attributes)) {
+		ret = FFA_RET_INVALID_PARAMETERS;
+		goto out;
+	}
+
+	hyp_spin_lock(&host_buffers.lock);
+	if (!host_buffers.tx) {
+		ret = FFA_RET_INVALID_PARAMETERS;
+		goto out_unlock;
+	}
+
+	buf = hyp_buffers.tx;
+	memcpy(buf, host_buffers.tx, fraglen);
+
+	offset = buf->ep_mem_access[0].composite_off;
+	if (!offset || buf->ep_count != 1 || buf->sender_id != HOST_FFA_ID) {
+		ret = FFA_RET_INVALID_PARAMETERS;
+		goto out_unlock;
+	}
+
+	if (fraglen < offset + sizeof(struct ffa_composite_mem_region)) {
+		ret = FFA_RET_INVALID_PARAMETERS;
+		goto out_unlock;
+	}
+
+	reg = (void *)buf + offset;
+	nr_ranges = ((void *)buf + fraglen) - (void *)reg->constituents;
+	if (nr_ranges % sizeof(reg->constituents[0])) {
+		ret = FFA_RET_INVALID_PARAMETERS;
+		goto out_unlock;
+	}
+
+	nr_ranges /= sizeof(reg->constituents[0]);
+	ret = ffa_host_share_ranges(reg->constituents, nr_ranges);
+	if (ret)
+		goto out_unlock;
+
+	spmd_mem_xfer(res, func_id, len, fraglen);
+	if (fraglen != len) {
+		if (res->a0 != FFA_MEM_FRAG_RX)
+			goto err_unshare;
+
+		if (res->a3 != fraglen)
+			goto err_unshare;
+	} else if (res->a0 != FFA_SUCCESS) {
+		goto err_unshare;
+	}
+
+out_unlock:
+	hyp_spin_unlock(&host_buffers.lock);
+out:
+	if (ret)
+		ffa_to_smccc_res(res, ret);
+	return;
+
+err_unshare:
+	WARN_ON(ffa_host_unshare_ranges(reg->constituents, nr_ranges));
+	goto out_unlock;
+}
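
When the descriptor does not fit in a single mailbox (fraglen < len), the SPMD answers FFA_MEM_FRAG_RX and the host pushes the remainder through FFA_MEM_FRAG_TX, which lands in do_ffa_mem_frag_tx() above. A self-contained sketch of the sender-side protocol, with hypothetical stand-ins for the spmd_mem_xfer()/spmd_mem_frag_tx() SMC wrappers:

#include <stdint.h>

enum rc { RC_SUCCESS, RC_FRAG_RX, RC_ERROR };

/* Fake SMCs: keep asking for fragments until the whole descriptor arrived. */
static enum rc mem_share(const uint8_t *buf, uint32_t len, uint32_t fraglen)
{
	(void)buf;
	return fraglen < len ? RC_FRAG_RX : RC_SUCCESS;
}

static enum rc mem_frag_tx(const uint8_t *buf, uint32_t fraglen,
			   uint32_t remaining)
{
	(void)buf;
	(void)fraglen;
	return remaining ? RC_FRAG_RX : RC_SUCCESS;
}

static int send_fragmented(const uint8_t *desc, uint32_t len, uint32_t mbox)
{
	uint32_t off = len < mbox ? len : mbox;
	enum rc rc = mem_share(desc, len, off);

	while (rc == RC_FRAG_RX && off < len) {
		uint32_t frag = len - off < mbox ? len - off : mbox;

		rc = mem_frag_tx(desc + off, frag, len - off - frag);
		off += frag;
	}

	return rc == RC_SUCCESS ? 0 : -1;
}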
+
+static void do_ffa_mem_reclaim(struct arm_smccc_res *res,
+			       struct kvm_cpu_context *ctxt)
+{
+	DECLARE_REG(u32, handle_lo, ctxt, 1);
+	DECLARE_REG(u32, handle_hi, ctxt, 2);
+	DECLARE_REG(u32, flags, ctxt, 3);
+	struct ffa_composite_mem_region *reg;
+	u32 offset, len, fraglen, fragoff;
+	struct ffa_mem_region *buf;
+	int ret = 0;
+	u64 handle;
+
+	handle = PACK_HANDLE(handle_lo, handle_hi);
+
+	hyp_spin_lock(&host_buffers.lock);
+
+	buf = hyp_buffers.tx;
+	*buf = (struct ffa_mem_region) {
+		.sender_id	= HOST_FFA_ID,
+		.handle		= handle,
+	};
+
+	spmd_retrieve_req(res, sizeof(*buf));
+	buf = hyp_buffers.rx;
+	if (res->a0 != FFA_MEM_RETRIEVE_RESP)
+		goto out_unlock;
+
+	len = res->a1;
+	fraglen = res->a2;
+
+	offset = buf->ep_mem_access[0].composite_off;
+	/*
+	 * We can trust the SPMD to get this right, but let's at least
+	 * check that we end up with something that doesn't look _completely_
+	 * bogus.
+	 */
+	if (WARN_ON(offset > len ||
+		    fraglen > KVM_FFA_MBOX_NR_PAGES * PAGE_SIZE)) {
+		ret = FFA_RET_ABORTED;
+		goto out_unlock;
+	}
+
+	if (len > ffa_desc_buf.len) {
+		ret = FFA_RET_NO_MEMORY;
+		goto out_unlock;
+	}
+
+	buf = ffa_desc_buf.buf;
+	memcpy(buf, hyp_buffers.rx, fraglen);
+
+	for (fragoff = fraglen; fragoff < len; fragoff += fraglen) {
+		spmd_mem_frag_rx(res, handle_lo, handle_hi, fragoff);
+		if (res->a0 != FFA_MEM_FRAG_TX) {
+			ret = FFA_RET_INVALID_PARAMETERS;
+			goto out_unlock;
+		}
+
+		fraglen = res->a3;
+		memcpy((void *)buf + fragoff, hyp_buffers.rx, fraglen);
+	}
+
+	spmd_mem_reclaim(res, handle_lo, handle_hi, flags);
+	if (res->a0 != FFA_SUCCESS)
+		goto out_unlock;
+
+	reg = (void *)buf + offset;
+	/* If the SPMD was happy, then we should be too. */
+	WARN_ON(ffa_host_unshare_ranges(reg->constituents,
+					reg->addr_range_cnt));
+out_unlock:
+	hyp_spin_unlock(&host_buffers.lock);
+
+	if (ret)
+		ffa_to_smccc_res(res, ret);
+}
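
The 64-bit memory handle reassembled by PACK_HANDLE() above travels as two 32-bit register halves across the FF-A ABI. For reference, a sketch of the split (PACK_HANDLE matches its use above; the inverse helpers are made up):

#include <stdint.h>

#define PACK_HANDLE(lo, hi)	(((uint64_t)(hi) << 32) | (uint32_t)(lo))

static inline uint32_t handle_lo(uint64_t handle)
{
	return (uint32_t)handle;
}

static inline uint32_t handle_hi(uint64_t handle)
{
	return (uint32_t)(handle >> 32);
}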
+
+static bool ffa_call_unsupported(u64 func_id)
+{
+	switch (func_id) {
+	/* Unsupported memory management calls */
+	case FFA_FN64_MEM_RETRIEVE_REQ:
+	case FFA_MEM_RETRIEVE_RESP:
+	case FFA_MEM_RELINQUISH:
+	case FFA_MEM_OP_PAUSE:
+	case FFA_MEM_OP_RESUME:
+	case FFA_MEM_FRAG_RX:
+	case FFA_FN64_MEM_DONATE:
+	/* Indirect message passing via RX/TX buffers */
+	case FFA_MSG_SEND:
+	case FFA_MSG_POLL:
+	case FFA_MSG_WAIT:
+	/* 32-bit variants of 64-bit calls */
+	case FFA_MSG_SEND_DIRECT_REQ:
+	case FFA_MSG_SEND_DIRECT_RESP:
+	case FFA_RXTX_MAP:
+	case FFA_MEM_DONATE:
+	case FFA_MEM_RETRIEVE_REQ:
+		return true;
+	}
+
+	return false;
+}
+
+static bool do_ffa_features(struct arm_smccc_res *res,
+			    struct kvm_cpu_context *ctxt)
+{
+	DECLARE_REG(u32, id, ctxt, 1);
+	u64 prop = 0;
+	int ret = 0;
+
+	if (ffa_call_unsupported(id)) {
+		ret = FFA_RET_NOT_SUPPORTED;
+		goto out_handled;
+	}
+
+	switch (id) {
+	case FFA_MEM_SHARE:
+	case FFA_FN64_MEM_SHARE:
+	case FFA_MEM_LEND:
+	case FFA_FN64_MEM_LEND:
+		ret = FFA_RET_SUCCESS;
+		prop = 0; /* No support for dynamic buffers */
+		goto out_handled;
+	default:
+		return false;
+	}
+
+out_handled:
+	ffa_to_smccc_res_prop(res, ret, prop);
+	return true;
+}
+
+bool kvm_host_ffa_handler(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(u64, func_id, host_ctxt, 0);
+	struct arm_smccc_res res;
+
+	if (!is_ffa_call(func_id))
+		return false;
+
+	switch (func_id) {
+	case FFA_FEATURES:
+		if (!do_ffa_features(&res, host_ctxt))
+			return false;
+		goto out_handled;
+	/* Memory management */
+	case FFA_FN64_RXTX_MAP:
+		do_ffa_rxtx_map(&res, host_ctxt);
+		goto out_handled;
+	case FFA_RXTX_UNMAP:
+		do_ffa_rxtx_unmap(&res, host_ctxt);
+		goto out_handled;
+	case FFA_MEM_SHARE:
+	case FFA_FN64_MEM_SHARE:
+		do_ffa_mem_xfer(FFA_FN64_MEM_SHARE, &res, host_ctxt);
+		goto out_handled;
+	case FFA_MEM_RECLAIM:
+		do_ffa_mem_reclaim(&res, host_ctxt);
+		goto out_handled;
+	case FFA_MEM_LEND:
+	case FFA_FN64_MEM_LEND:
+		do_ffa_mem_xfer(FFA_FN64_MEM_LEND, &res, host_ctxt);
+		goto out_handled;
+	case FFA_MEM_FRAG_TX:
+		do_ffa_mem_frag_tx(&res, host_ctxt);
+		goto out_handled;
+	}
+
+	if (!ffa_call_unsupported(func_id))
+		return false; /* Pass through */
+
+	ffa_to_smccc_error(&res, FFA_RET_NOT_SUPPORTED);
+out_handled:
+	ffa_set_retval(host_ctxt, &res);
+	return true;
+}
+
+int hyp_ffa_init(void *pages)
+{
+	struct arm_smccc_res res;
+	size_t min_rxtx_sz;
+	void *tx, *rx;
+
+	if (kvm_host_psci_config.smccc_version < ARM_SMCCC_VERSION_1_1)
+		return 0;
+
+	arm_smccc_1_1_smc(FFA_VERSION, FFA_VERSION_1_0, 0, 0, 0, 0, 0, 0, &res);
+	if (res.a0 == FFA_RET_NOT_SUPPORTED)
+		return 0;
+
+	if (res.a0 != FFA_VERSION_1_0)
+		return -EOPNOTSUPP;
+
+	arm_smccc_1_1_smc(FFA_ID_GET, 0, 0, 0, 0, 0, 0, 0, &res);
+	if (res.a0 != FFA_SUCCESS)
+		return -EOPNOTSUPP;
+
+	if (res.a2 != HOST_FFA_ID)
+		return -EINVAL;
+
+	arm_smccc_1_1_smc(FFA_FEATURES, FFA_FN64_RXTX_MAP,
+			  0, 0, 0, 0, 0, 0, &res);
+	if (res.a0 != FFA_SUCCESS)
+		return -EOPNOTSUPP;
+
+	switch (res.a2) {
+	case FFA_FEAT_RXTX_MIN_SZ_4K:
+		min_rxtx_sz = SZ_4K;
+		break;
+	case FFA_FEAT_RXTX_MIN_SZ_16K:
+		min_rxtx_sz = SZ_16K;
+		break;
+	case FFA_FEAT_RXTX_MIN_SZ_64K:
+		min_rxtx_sz = SZ_64K;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (min_rxtx_sz > PAGE_SIZE)
+		return -EOPNOTSUPP;
+
+	tx = pages;
+	pages += KVM_FFA_MBOX_NR_PAGES * PAGE_SIZE;
+	rx = pages;
+	pages += KVM_FFA_MBOX_NR_PAGES * PAGE_SIZE;
+
+	ffa_desc_buf = (struct kvm_ffa_descriptor_buffer) {
+		.buf	= pages,
+		.len	= PAGE_SIZE *
+			  (hyp_ffa_proxy_pages() - (2 * KVM_FFA_MBOX_NR_PAGES)),
+	};
+
+	hyp_buffers = (struct kvm_ffa_buffers) {
+		.lock	= __HYP_SPIN_LOCK_UNLOCKED,
+		.tx	= tx,
+		.rx	= rx,
+	};
+
+	host_buffers = (struct kvm_ffa_buffers) {
+		.lock	= __HYP_SPIN_LOCK_UNLOCKED,
+	};
+
+	return 0;
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index c953fb4b..a6d67c2 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -183,6 +183,7 @@
 
 	/* Initialize EL2 CPU state to sane values. */
 	init_el2_state				// Clobbers x0..x2
+	finalise_el2_state
 
 	/* Enable MMU, set vectors and stack. */
 	mov	x0, x28
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 3cea4b6..8d8f7e2 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -4,6 +4,8 @@
  * Author: Andrew Scull <ascull@google.com>
  */
 
+#include <kvm/arm_hypercalls.h>
+
 #include <hyp/adjust_pc.h>
 
 #include <asm/pgtable-types.h>
@@ -11,28 +13,896 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_host.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_hypevents.h>
 #include <asm/kvm_mmu.h>
 
+#include <nvhe/ffa.h>
+#include <nvhe/iommu.h>
 #include <nvhe/mem_protect.h>
+#include <nvhe/modules.h>
 #include <nvhe/mm.h>
+#include <nvhe/pkvm.h>
+#include <nvhe/trace.h>
 #include <nvhe/trap_handler.h>
 
+#include <linux/irqchip/arm-gic-v3.h>
+#include <uapi/linux/psci.h>
+
+#include "../../sys_regs.h"
+
+/*
+ * Host FPSIMD state. Written to when the guest accesses its own FPSIMD state,
+ * and read when the guest state is live and we need to switch back to the host.
+ *
+ * Only valid when (fp_state == FP_STATE_GUEST_OWNED) in the hyp vCPU structure.
+ */
+static DEFINE_PER_CPU(struct user_fpsimd_state, loaded_host_fpsimd_state);
+
 DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
 void __kvm_hyp_host_forward_smc(struct kvm_cpu_context *host_ctxt);
 
+static bool (*default_host_smc_handler)(struct kvm_cpu_context *host_ctxt);
+static bool (*default_trap_handler)(struct kvm_cpu_context *host_ctxt);
+
+int __pkvm_register_host_smc_handler(bool (*cb)(struct kvm_cpu_context *))
+{
+	return cmpxchg(&default_host_smc_handler, NULL, cb) ? -EBUSY : 0;
+}
+
+int __pkvm_register_default_trap_handler(bool (*cb)(struct kvm_cpu_context *))
+{
+	return cmpxchg(&default_trap_handler, NULL, cb) ? -EBUSY : 0;
+}
+
+static int pkvm_refill_memcache(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct pkvm_hyp_vm *hyp_vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
+	u64 nr_pages = VTCR_EL2_LVLS(hyp_vm->kvm.arch.vtcr) - 1;
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+
+	return refill_memcache(&hyp_vcpu->vcpu.arch.pkvm_memcache, nr_pages,
+			       &host_vcpu->arch.pkvm_memcache);
+}
+
+typedef void (*hyp_entry_exit_handler_fn)(struct pkvm_hyp_vcpu *);
+
+static void handle_pvm_entry_wfx(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	if (vcpu_get_flag(hyp_vcpu->host_vcpu, INCREMENT_PC)) {
+		vcpu_clear_flag(&hyp_vcpu->vcpu, PC_UPDATE_REQ);
+		kvm_incr_pc(&hyp_vcpu->vcpu);
+	}
+}
+
+static void handle_pvm_entry_psci(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	u32 psci_fn = smccc_get_function(&hyp_vcpu->vcpu);
+	u64 ret = READ_ONCE(hyp_vcpu->host_vcpu->arch.ctxt.regs.regs[0]);
+
+	switch (psci_fn) {
+	case PSCI_0_2_FN_CPU_ON:
+	case PSCI_0_2_FN64_CPU_ON:
+		/*
+		 * Check whether the cpu_on request to the host was successful.
+		 * If not, reset the vcpu state from ON_PENDING to OFF.
+		 * This could happen if this vcpu attempted to turn on another
+		 * vcpu while the target is in the process of turning itself
+		 * off.
+		 */
+		if (ret != PSCI_RET_SUCCESS) {
+			unsigned long cpu_id = smccc_get_arg1(&hyp_vcpu->vcpu);
+			struct pkvm_hyp_vcpu *target_vcpu;
+			struct pkvm_hyp_vm *hyp_vm;
+
+			hyp_vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
+			target_vcpu = pkvm_mpidr_to_hyp_vcpu(hyp_vm, cpu_id);
+
+			if (target_vcpu && READ_ONCE(target_vcpu->power_state) == PSCI_0_2_AFFINITY_LEVEL_ON_PENDING)
+				WRITE_ONCE(target_vcpu->power_state, PSCI_0_2_AFFINITY_LEVEL_OFF);
+
+			ret = PSCI_RET_INTERNAL_FAILURE;
+		}
+
+		break;
+	default:
+		break;
+	}
+
+	vcpu_set_reg(&hyp_vcpu->vcpu, 0, ret);
+}
+
+static void handle_pvm_entry_hvc64(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	u32 fn = smccc_get_function(&hyp_vcpu->vcpu);
+
+	switch (fn) {
+	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID:
+		pkvm_refill_memcache(hyp_vcpu);
+		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_SHARE_FUNC_ID:
+		fallthrough;
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_UNSHARE_FUNC_ID:
+		fallthrough;
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_RELINQUISH_FUNC_ID:
+		vcpu_set_reg(&hyp_vcpu->vcpu, 0, SMCCC_RET_SUCCESS);
+		break;
+	default:
+		handle_pvm_entry_psci(hyp_vcpu);
+		break;
+	}
+}
+
+static void handle_pvm_entry_sys64(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+
+	/* Exceptions have priority over anything else */
+	if (vcpu_get_flag(host_vcpu, PENDING_EXCEPTION)) {
+		/* Exceptions caused by this should be undef exceptions. */
+		u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
+
+		__vcpu_sys_reg(&hyp_vcpu->vcpu, ESR_EL1) = esr;
+		kvm_pend_exception(&hyp_vcpu->vcpu, EXCEPT_AA64_EL1_SYNC);
+		return;
+	}
+
+	if (vcpu_get_flag(host_vcpu, INCREMENT_PC)) {
+		vcpu_clear_flag(&hyp_vcpu->vcpu, PC_UPDATE_REQ);
+		kvm_incr_pc(&hyp_vcpu->vcpu);
+	}
+
+	if (!esr_sys64_to_params(hyp_vcpu->vcpu.arch.fault.esr_el2).is_write) {
+		/* r0 as transfer register between the guest and the host. */
+		u64 rt_val = READ_ONCE(host_vcpu->arch.ctxt.regs.regs[0]);
+		int rt = kvm_vcpu_sys_get_rt(&hyp_vcpu->vcpu);
+
+		vcpu_set_reg(&hyp_vcpu->vcpu, rt, rt_val);
+	}
+}
+
+static void handle_pvm_entry_iabt(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	unsigned long cpsr = *vcpu_cpsr(&hyp_vcpu->vcpu);
+	u32 esr = ESR_ELx_IL;
+
+	if (!vcpu_get_flag(hyp_vcpu->host_vcpu, PENDING_EXCEPTION))
+		return;
+
+	/*
+	 * If the host wants to inject an exception, get the syndrome and
+	 * fault address.
+	 */
+	if ((cpsr & PSR_MODE_MASK) == PSR_MODE_EL0t)
+		esr |= (ESR_ELx_EC_IABT_LOW << ESR_ELx_EC_SHIFT);
+	else
+		esr |= (ESR_ELx_EC_IABT_CUR << ESR_ELx_EC_SHIFT);
+
+	esr |= ESR_ELx_FSC_EXTABT;
+
+	__vcpu_sys_reg(&hyp_vcpu->vcpu, ESR_EL1) = esr;
+	__vcpu_sys_reg(&hyp_vcpu->vcpu, FAR_EL1) =
+		kvm_vcpu_get_hfar(&hyp_vcpu->vcpu);
+
+	/* Tell the run loop that we want to inject something */
+	kvm_pend_exception(&hyp_vcpu->vcpu, EXCEPT_AA64_EL1_SYNC);
+}
+
+static void handle_pvm_entry_dabt(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+	bool pc_update;
+
+	/* Exceptions have priority over anything else */
+	if (vcpu_get_flag(host_vcpu, PENDING_EXCEPTION)) {
+		unsigned long cpsr = *vcpu_cpsr(&hyp_vcpu->vcpu);
+		u32 esr = ESR_ELx_IL;
+
+		if ((cpsr & PSR_MODE_MASK) == PSR_MODE_EL0t)
+			esr |= (ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT);
+		else
+			esr |= (ESR_ELx_EC_DABT_CUR << ESR_ELx_EC_SHIFT);
+
+		esr |= ESR_ELx_FSC_EXTABT;
+
+		__vcpu_sys_reg(&hyp_vcpu->vcpu, ESR_EL1) = esr;
+		__vcpu_sys_reg(&hyp_vcpu->vcpu, FAR_EL1) =
+			kvm_vcpu_get_hfar(&hyp_vcpu->vcpu);
+
+		/* Tell the run loop that we want to inject something */
+		kvm_pend_exception(&hyp_vcpu->vcpu, EXCEPT_AA64_EL1_SYNC);
+
+		/* Cancel potential in-flight MMIO */
+		hyp_vcpu->vcpu.mmio_needed = false;
+		return;
+	}
+
+	/* Handle PC increment on MMIO */
+	pc_update = (hyp_vcpu->vcpu.mmio_needed &&
+		     vcpu_get_flag(host_vcpu, INCREMENT_PC));
+	if (pc_update) {
+		vcpu_clear_flag(&hyp_vcpu->vcpu, PC_UPDATE_REQ);
+		kvm_incr_pc(&hyp_vcpu->vcpu);
+	}
+
+	/* If we were doing an MMIO read access, update the register. */
+	if (pc_update && !kvm_vcpu_dabt_iswrite(&hyp_vcpu->vcpu)) {
+		/* r0 as transfer register between the guest and the host. */
+		u64 rd_val = READ_ONCE(host_vcpu->arch.ctxt.regs.regs[0]);
+		int rd = kvm_vcpu_dabt_get_rd(&hyp_vcpu->vcpu);
+
+		vcpu_set_reg(&hyp_vcpu->vcpu, rd, rd_val);
+	}
+
+	hyp_vcpu->vcpu.mmio_needed = false;
+}
+
+static void handle_pvm_exit_wfx(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	WRITE_ONCE(hyp_vcpu->host_vcpu->arch.ctxt.regs.pstate,
+		   hyp_vcpu->vcpu.arch.ctxt.regs.pstate & PSR_MODE_MASK);
+	WRITE_ONCE(hyp_vcpu->host_vcpu->arch.fault.esr_el2,
+		   hyp_vcpu->vcpu.arch.fault.esr_el2);
+}
+
+static void handle_pvm_exit_sys64(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+	u32 esr_el2 = hyp_vcpu->vcpu.arch.fault.esr_el2;
+
+	/* r0 as transfer register between the guest and the host. */
+	WRITE_ONCE(host_vcpu->arch.fault.esr_el2,
+		   esr_el2 & ~ESR_ELx_SYS64_ISS_RT_MASK);
+
+	/* The mode is required for the host to emulate some sysregs */
+	WRITE_ONCE(host_vcpu->arch.ctxt.regs.pstate,
+		   hyp_vcpu->vcpu.arch.ctxt.regs.pstate & PSR_MODE_MASK);
+
+	if (esr_sys64_to_params(esr_el2).is_write) {
+		int rt = kvm_vcpu_sys_get_rt(&hyp_vcpu->vcpu);
+		u64 rt_val = vcpu_get_reg(&hyp_vcpu->vcpu, rt);
+
+		WRITE_ONCE(host_vcpu->arch.ctxt.regs.regs[0], rt_val);
+	}
+}
+
+static void handle_pvm_exit_hvc64(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+	int n, i;
+
+	switch (smccc_get_function(&hyp_vcpu->vcpu)) {
+	/*
+	 * CPU_ON takes 3 arguments. However, to wake up the target vcpu, the
+	 * host only needs to know the target's cpu_id, which is passed as the
+	 * first argument. The processing of the reset state is done at hyp.
+	 */
+	case PSCI_0_2_FN_CPU_ON:
+	case PSCI_0_2_FN64_CPU_ON:
+		n = 2;
+		break;
+
+	case PSCI_0_2_FN_CPU_OFF:
+	case PSCI_0_2_FN_SYSTEM_OFF:
+	case PSCI_0_2_FN_SYSTEM_RESET:
+	case PSCI_0_2_FN_CPU_SUSPEND:
+	case PSCI_0_2_FN64_CPU_SUSPEND:
+		n = 1;
+		break;
+
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_SHARE_FUNC_ID:
+		fallthrough;
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_UNSHARE_FUNC_ID:
+		fallthrough;
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_RELINQUISH_FUNC_ID:
+		n = 4;
+		break;
+
+	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID:
+		n = 3;
+		break;
+
+	case PSCI_1_1_FN_SYSTEM_RESET2:
+	case PSCI_1_1_FN64_SYSTEM_RESET2:
+		n = 3;
+		break;
+
+	/*
+	 * The rest are either blocked or handled by HYP, so we should
+	 * really never be here.
+	 */
+	default:
+		BUG();
+	}
+
+	WRITE_ONCE(host_vcpu->arch.fault.esr_el2,
+		   hyp_vcpu->vcpu.arch.fault.esr_el2);
+
+	/* Pass the hvc function id (r0) as well as any potential arguments. */
+	for (i = 0; i < n; i++) {
+		WRITE_ONCE(host_vcpu->arch.ctxt.regs.regs[i],
+			   vcpu_get_reg(&hyp_vcpu->vcpu, i));
+	}
+}
+
+static void handle_pvm_exit_iabt(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	WRITE_ONCE(hyp_vcpu->host_vcpu->arch.fault.esr_el2,
+		   hyp_vcpu->vcpu.arch.fault.esr_el2);
+	WRITE_ONCE(hyp_vcpu->host_vcpu->arch.fault.hpfar_el2,
+		   hyp_vcpu->vcpu.arch.fault.hpfar_el2);
+}
+
+static void handle_pvm_exit_dabt(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+
+	hyp_vcpu->vcpu.mmio_needed = __pkvm_check_ioguard_page(hyp_vcpu);
+
+	if (hyp_vcpu->vcpu.mmio_needed) {
+		/* r0 as transfer register between the guest and the host. */
+		WRITE_ONCE(host_vcpu->arch.fault.esr_el2,
+			   hyp_vcpu->vcpu.arch.fault.esr_el2 & ~ESR_ELx_SRT_MASK);
+
+		if (kvm_vcpu_dabt_iswrite(&hyp_vcpu->vcpu)) {
+			int rt = kvm_vcpu_dabt_get_rd(&hyp_vcpu->vcpu);
+			u64 rt_val = vcpu_get_reg(&hyp_vcpu->vcpu, rt);
+
+			WRITE_ONCE(host_vcpu->arch.ctxt.regs.regs[0], rt_val);
+		}
+	} else {
+		WRITE_ONCE(host_vcpu->arch.fault.esr_el2,
+			   hyp_vcpu->vcpu.arch.fault.esr_el2 & ~ESR_ELx_ISV);
+	}
+
+	WRITE_ONCE(host_vcpu->arch.ctxt.regs.pstate,
+		   hyp_vcpu->vcpu.arch.ctxt.regs.pstate & PSR_MODE_MASK);
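+	/* Only expose the fault's page offset; HPFAR_EL2 supplies the page. */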
+	WRITE_ONCE(host_vcpu->arch.fault.far_el2,
+		   hyp_vcpu->vcpu.arch.fault.far_el2 & GENMASK(11, 0));
+	WRITE_ONCE(host_vcpu->arch.fault.hpfar_el2,
+		   hyp_vcpu->vcpu.arch.fault.hpfar_el2);
+	WRITE_ONCE(__vcpu_sys_reg(host_vcpu, SCTLR_EL1),
+		   __vcpu_sys_reg(&hyp_vcpu->vcpu, SCTLR_EL1) &
+			(SCTLR_ELx_EE | SCTLR_EL1_E0E));
+}
+
+static void handle_vm_entry_generic(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	vcpu_copy_flag(&hyp_vcpu->vcpu, hyp_vcpu->host_vcpu, PC_UPDATE_REQ);
+}
+
+static void handle_vm_exit_generic(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	WRITE_ONCE(hyp_vcpu->host_vcpu->arch.fault.esr_el2,
+		   hyp_vcpu->vcpu.arch.fault.esr_el2);
+}
+
+static void handle_vm_exit_abt(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+
+	WRITE_ONCE(host_vcpu->arch.fault.esr_el2,
+		   hyp_vcpu->vcpu.arch.fault.esr_el2);
+	WRITE_ONCE(host_vcpu->arch.fault.far_el2,
+		   hyp_vcpu->vcpu.arch.fault.far_el2);
+	WRITE_ONCE(host_vcpu->arch.fault.hpfar_el2,
+		   hyp_vcpu->vcpu.arch.fault.hpfar_el2);
+	WRITE_ONCE(host_vcpu->arch.fault.disr_el1,
+		   hyp_vcpu->vcpu.arch.fault.disr_el1);
+}
+
+static const hyp_entry_exit_handler_fn entry_hyp_pvm_handlers[] = {
+	[0 ... ESR_ELx_EC_MAX]		= NULL,
+	[ESR_ELx_EC_WFx]		= handle_pvm_entry_wfx,
+	[ESR_ELx_EC_HVC64]		= handle_pvm_entry_hvc64,
+	[ESR_ELx_EC_SYS64]		= handle_pvm_entry_sys64,
+	[ESR_ELx_EC_IABT_LOW]		= handle_pvm_entry_iabt,
+	[ESR_ELx_EC_DABT_LOW]		= handle_pvm_entry_dabt,
+};
+
+static const hyp_entry_exit_handler_fn exit_hyp_pvm_handlers[] = {
+	[0 ... ESR_ELx_EC_MAX]		= NULL,
+	[ESR_ELx_EC_WFx]		= handle_pvm_exit_wfx,
+	[ESR_ELx_EC_HVC64]		= handle_pvm_exit_hvc64,
+	[ESR_ELx_EC_SYS64]		= handle_pvm_exit_sys64,
+	[ESR_ELx_EC_IABT_LOW]		= handle_pvm_exit_iabt,
+	[ESR_ELx_EC_DABT_LOW]		= handle_pvm_exit_dabt,
+};
+
+static const hyp_entry_exit_handler_fn entry_hyp_vm_handlers[] = {
+	[0 ... ESR_ELx_EC_MAX]		= handle_vm_entry_generic,
+};
+
+static const hyp_entry_exit_handler_fn exit_hyp_vm_handlers[] = {
+	[0 ... ESR_ELx_EC_MAX]		= handle_vm_exit_generic,
+	[ESR_ELx_EC_IABT_LOW]		= handle_vm_exit_abt,
+	[ESR_ELx_EC_DABT_LOW]		= handle_vm_exit_abt,
+};
+
+static void flush_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+	struct vgic_v3_cpu_if *host_cpu_if, *hyp_cpu_if;
+	unsigned int used_lrs, max_lrs, i;
+
+	host_cpu_if	= &host_vcpu->arch.vgic_cpu.vgic_v3;
+	hyp_cpu_if	= &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
+
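+	/* ICH_VTR_EL2.ListRegs encodes the number of implemented LRs minus 1 */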
+	max_lrs		= (read_gicreg(ICH_VTR_EL2) & 0xf) + 1;
+	used_lrs	= READ_ONCE(host_cpu_if->used_lrs);
+	used_lrs	= min(used_lrs, max_lrs);
+
+	hyp_cpu_if->vgic_hcr	= READ_ONCE(host_cpu_if->vgic_hcr);
+	/* Should be a one-off */
+	hyp_cpu_if->vgic_sre	= (ICC_SRE_EL1_DIB |
+				   ICC_SRE_EL1_DFB |
+				   ICC_SRE_EL1_SRE);
+	hyp_cpu_if->used_lrs	= used_lrs;
+
+	for (i = 0; i < used_lrs; i++)
+		hyp_cpu_if->vgic_lr[i] = READ_ONCE(host_cpu_if->vgic_lr[i]);
+}
+
+static void sync_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+	struct vgic_v3_cpu_if *host_cpu_if, *hyp_cpu_if;
+	unsigned int i;
+
+	host_cpu_if	= &host_vcpu->arch.vgic_cpu.vgic_v3;
+	hyp_cpu_if	= &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
+
+	WRITE_ONCE(host_cpu_if->vgic_hcr, hyp_cpu_if->vgic_hcr);
+
+	for (i = 0; i < hyp_cpu_if->used_lrs; i++)
+		WRITE_ONCE(host_cpu_if->vgic_lr[i], hyp_cpu_if->vgic_lr[i]);
+}
+
+static void flush_hyp_timer_state(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+		return;
+
+	/*
+	 * A hyp vcpu has no offset, and sees vtime == ptime. The
+	 * ptimer is fully emulated by EL1 and cannot be trusted.
+	 */
+	write_sysreg(0, cntvoff_el2);
+	isb();
+	write_sysreg_el0(__vcpu_sys_reg(&hyp_vcpu->vcpu, CNTV_CVAL_EL0),
+			 SYS_CNTV_CVAL);
+	write_sysreg_el0(__vcpu_sys_reg(&hyp_vcpu->vcpu, CNTV_CTL_EL0),
+			 SYS_CNTV_CTL);
+}
+
+static void sync_hyp_timer_state(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+		return;
+
+	/*
+	 * Preserve the vtimer state so that it is always correct,
+	 * even if the host tries to make a mess.
+	 */
+	__vcpu_sys_reg(&hyp_vcpu->vcpu, CNTV_CVAL_EL0) =
+		read_sysreg_el0(SYS_CNTV_CVAL);
+	__vcpu_sys_reg(&hyp_vcpu->vcpu, CNTV_CTL_EL0) =
+		read_sysreg_el0(SYS_CNTV_CTL);
+}
+
+static void __copy_vcpu_state(const struct kvm_vcpu *from_vcpu,
+			      struct kvm_vcpu *to_vcpu)
+{
+	int i;
+
+	to_vcpu->arch.ctxt.regs		= from_vcpu->arch.ctxt.regs;
+	to_vcpu->arch.ctxt.spsr_abt	= from_vcpu->arch.ctxt.spsr_abt;
+	to_vcpu->arch.ctxt.spsr_und	= from_vcpu->arch.ctxt.spsr_und;
+	to_vcpu->arch.ctxt.spsr_irq	= from_vcpu->arch.ctxt.spsr_irq;
+	to_vcpu->arch.ctxt.spsr_fiq	= from_vcpu->arch.ctxt.spsr_fiq;
+
+	/*
+	 * Copy the sysregs, but don't mess with the timer state which
+	 * is directly handled by EL1 and is expected to be preserved.
+	 */
+	for (i = 1; i < NR_SYS_REGS; i++) {
+		if (i >= CNTVOFF_EL2 && i <= CNTP_CTL_EL0)
+			continue;
+		to_vcpu->arch.ctxt.sys_regs[i] = from_vcpu->arch.ctxt.sys_regs[i];
+	}
+}
+
+static void __sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	__copy_vcpu_state(&hyp_vcpu->vcpu, hyp_vcpu->host_vcpu);
+}
+
+static void __flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	__copy_vcpu_state(hyp_vcpu->host_vcpu, &hyp_vcpu->vcpu);
+}
+
+static void flush_debug_state(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *vcpu = &hyp_vcpu->vcpu;
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+	u64 mdcr_el2 = READ_ONCE(host_vcpu->arch.mdcr_el2);
+
+	/*
+	 * Propagate the monitor debug configuration of the vcpu from the host.
+	 * Preserve HPMN, which is set up by some knowledgeable bootcode.
+	 * Ensure that MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK are clear,
+	 * as guests should not be able to access profiling and trace buffers.
+	 * Ensure that RES0 bits are clear.
+	 */
+	mdcr_el2 &= ~(MDCR_EL2_RES0 |
+		      MDCR_EL2_HPMN_MASK |
+		      (MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT) |
+		      (MDCR_EL2_E2TB_MASK << MDCR_EL2_E2TB_SHIFT));
+	vcpu->arch.mdcr_el2 = read_sysreg(mdcr_el2) & MDCR_EL2_HPMN_MASK;
+	vcpu->arch.mdcr_el2 |= mdcr_el2;
+
+	vcpu->arch.pmu = host_vcpu->arch.pmu;
+	vcpu->guest_debug = READ_ONCE(host_vcpu->guest_debug);
+
+	if (!kvm_vcpu_needs_debug_regs(vcpu))
+		return;
+
+	__vcpu_save_guest_debug_regs(vcpu);
+
+	/* Switch debug_ptr to the external_debug_state if done by the host. */
+	if (kern_hyp_va(READ_ONCE(host_vcpu->arch.debug_ptr)) ==
+	    &host_vcpu->arch.external_debug_state)
+		vcpu->arch.debug_ptr = &host_vcpu->arch.external_debug_state;
+
+	/* Propagate any special handling for single step from host. */
+	vcpu_write_sys_reg(vcpu, vcpu_read_sys_reg(host_vcpu, MDSCR_EL1),
+						   MDSCR_EL1);
+	*vcpu_cpsr(vcpu) = *vcpu_cpsr(host_vcpu);
+}
+
+static void sync_debug_state(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *vcpu = &hyp_vcpu->vcpu;
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+
+	if (!kvm_vcpu_needs_debug_regs(vcpu))
+		return;
+
+	__vcpu_restore_guest_debug_regs(vcpu);
+	vcpu->arch.debug_ptr = &host_vcpu->arch.vcpu_debug_state;
+}
+
+static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+	hyp_entry_exit_handler_fn ec_handler;
+	u8 esr_ec;
+
+	if (READ_ONCE(hyp_vcpu->power_state) == PSCI_0_2_AFFINITY_LEVEL_ON_PENDING)
+		pkvm_reset_vcpu(hyp_vcpu);
+
+	/*
+	 * If we deal with a non-protected guest and the state is potentially
+	 * dirty (from a host perspective), copy the state back into the hyp
+	 * vcpu.
+	 */
+	if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
+		if (vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY))
+			__flush_hyp_vcpu(hyp_vcpu);
+
+		hyp_vcpu->vcpu.arch.iflags = READ_ONCE(host_vcpu->arch.iflags);
+		flush_debug_state(hyp_vcpu);
+
+		hyp_vcpu->vcpu.arch.hcr_el2 = HCR_GUEST_FLAGS & ~(HCR_RW | HCR_TWI | HCR_TWE);
+		hyp_vcpu->vcpu.arch.hcr_el2 |= READ_ONCE(host_vcpu->arch.hcr_el2);
+	}
+
+	hyp_vcpu->vcpu.arch.vsesr_el2 = host_vcpu->arch.vsesr_el2;
+
+	flush_hyp_vgic_state(hyp_vcpu);
+	flush_hyp_timer_state(hyp_vcpu);
+
+	switch (ARM_EXCEPTION_CODE(hyp_vcpu->exit_code)) {
+	case ARM_EXCEPTION_IRQ:
+	case ARM_EXCEPTION_EL1_SERROR:
+	case ARM_EXCEPTION_IL:
+		break;
+	case ARM_EXCEPTION_TRAP:
+		esr_ec = ESR_ELx_EC(kvm_vcpu_get_esr(&hyp_vcpu->vcpu));
+
+		if (pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+			ec_handler = entry_hyp_pvm_handlers[esr_ec];
+		else
+			ec_handler = entry_hyp_vm_handlers[esr_ec];
+
+		if (ec_handler)
+			ec_handler(hyp_vcpu);
+		break;
+	default:
+		BUG();
+	}
+
+	hyp_vcpu->exit_code = 0;
+}
+
+static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu, u32 exit_reason)
+{
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+	hyp_entry_exit_handler_fn ec_handler;
+	u8 esr_ec;
+
+	if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+		sync_debug_state(hyp_vcpu);
+
+	/*
+	 * Don't sync the vcpu GPR/sysreg state after a run. Instead,
+	 * leave it in the hyp vCPU until someone actually requires it.
+	 */
+	sync_hyp_vgic_state(hyp_vcpu);
+	sync_hyp_timer_state(hyp_vcpu);
+
+	switch (ARM_EXCEPTION_CODE(exit_reason)) {
+	case ARM_EXCEPTION_IRQ:
+		break;
+	case ARM_EXCEPTION_TRAP:
+		esr_ec = ESR_ELx_EC(kvm_vcpu_get_esr(&hyp_vcpu->vcpu));
+
+		if (pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+			ec_handler = exit_hyp_pvm_handlers[esr_ec];
+		else
+			ec_handler = exit_hyp_vm_handlers[esr_ec];
+
+		if (ec_handler)
+			ec_handler(hyp_vcpu);
+		break;
+	case ARM_EXCEPTION_EL1_SERROR:
+	case ARM_EXCEPTION_IL:
+		break;
+	default:
+		BUG();
+	}
+
+	if (pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+		vcpu_clear_flag(host_vcpu, PC_UPDATE_REQ);
+	else
+		host_vcpu->arch.iflags = hyp_vcpu->vcpu.arch.iflags;
+
+	hyp_vcpu->exit_code = exit_reason;
+}
+
+static void __hyp_sve_save_guest(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *vcpu = &hyp_vcpu->vcpu;
+
+	__sve_save_state(vcpu_sve_pffr(vcpu), &vcpu->arch.ctxt.fp_regs.fpsr);
+	__vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_el1(SYS_ZCR);
+	sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL1);
+}
+
+static void fpsimd_host_restore(void)
+{
+	sysreg_clear_set(cptr_el2, CPTR_EL2_TZ | CPTR_EL2_TFP, 0);
+	isb();
+
+	if (unlikely(is_protected_kvm_enabled())) {
+		struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+		struct user_fpsimd_state *host_fpsimd_state;
+
+		host_fpsimd_state = this_cpu_ptr(&loaded_host_fpsimd_state);
+
+		if (vcpu_has_sve(&hyp_vcpu->vcpu))
+			__hyp_sve_save_guest(hyp_vcpu);
+		else
+			__fpsimd_save_state(&hyp_vcpu->vcpu.arch.ctxt.fp_regs);
+
+		__fpsimd_restore_state(host_fpsimd_state);
+
+		hyp_vcpu->vcpu.arch.fp_state = FP_STATE_HOST_OWNED;
+	}
+
+	if (system_supports_sve())
+		sve_cond_update_zcr_vq(ZCR_ELx_LEN_MASK, SYS_ZCR_EL2);
+}
+
+static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+	DECLARE_REG(unsigned int, vcpu_idx, host_ctxt, 2);
+	DECLARE_REG(u64, hcr_el2, host_ctxt, 3);
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	int __percpu *last_vcpu_ran;
+	int *last_ran;
+
+	if (!is_protected_kvm_enabled())
+		return;
+
+	hyp_vcpu = pkvm_load_hyp_vcpu(handle, vcpu_idx);
+	if (!hyp_vcpu)
+		return;
+
+	/*
+	 * Guarantee that both TLBs and I-cache are private to each vcpu. If a
+	 * vcpu from the same VM has previously run on the same physical CPU,
+	 * nuke the relevant contexts.
+	 */
+	last_vcpu_ran = hyp_vcpu->vcpu.arch.hw_mmu->last_vcpu_ran;
+	last_ran = (__force int *) &last_vcpu_ran[hyp_smp_processor_id()];
+	if (*last_ran != hyp_vcpu->vcpu.vcpu_id) {
+		__kvm_flush_cpu_context(hyp_vcpu->vcpu.arch.hw_mmu);
+		*last_ran = hyp_vcpu->vcpu.vcpu_id;
+	}
+
+	hyp_vcpu->vcpu.arch.host_fpsimd_state = this_cpu_ptr(&loaded_host_fpsimd_state);
+	hyp_vcpu->vcpu.arch.fp_state = FP_STATE_HOST_OWNED;
+
+	if (pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
+		/* Propagate WFx trapping flags, trap ptrauth */
+		hyp_vcpu->vcpu.arch.hcr_el2 &= ~(HCR_TWE | HCR_TWI |
+						     HCR_API | HCR_APK);
+		hyp_vcpu->vcpu.arch.hcr_el2 |= hcr_el2 & (HCR_TWE | HCR_TWI);
+	}
+}
+
+static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
+{
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+
+	if (!is_protected_kvm_enabled())
+		return;
+
+	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+	if (hyp_vcpu) {
+		struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+
+		if (hyp_vcpu->vcpu.arch.fp_state == FP_STATE_GUEST_OWNED)
+			fpsimd_host_restore();
+
+		if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu) &&
+		    !vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY)) {
+			__sync_hyp_vcpu(hyp_vcpu);
+		}
+
+		pkvm_put_hyp_vcpu(hyp_vcpu);
+	}
+}
+
+static void handle___pkvm_vcpu_sync_state(struct kvm_cpu_context *host_ctxt)
+{
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+
+	if (!is_protected_kvm_enabled())
+		return;
+
+	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+	if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+		return;
+
+	if (hyp_vcpu->vcpu.arch.fp_state == FP_STATE_GUEST_OWNED)
+		fpsimd_host_restore();
+
+	__sync_hyp_vcpu(hyp_vcpu);
+}
+
+static struct kvm_vcpu *__get_host_hyp_vcpus(struct kvm_vcpu *arg,
+					     struct pkvm_hyp_vcpu **hyp_vcpup)
+{
+	struct kvm_vcpu *host_vcpu = kern_hyp_va(arg);
+	struct pkvm_hyp_vcpu *hyp_vcpu = NULL;
+
+	if (unlikely(is_protected_kvm_enabled())) {
+		hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+
+		if (!hyp_vcpu || hyp_vcpu->host_vcpu != host_vcpu) {
+			hyp_vcpu = NULL;
+			host_vcpu = NULL;
+		}
+	}
+
+	*hyp_vcpup = hyp_vcpu;
+	return host_vcpu;
+}
+
+#define get_host_hyp_vcpus(ctxt, regnr, hyp_vcpup)			\
+	({								\
+		DECLARE_REG(struct kvm_vcpu *, __vcpu, ctxt, regnr);	\
+		__get_host_hyp_vcpus(__vcpu, hyp_vcpup);		\
+	})
+
+#define get_host_hyp_vcpus_from_vgic_v3_cpu_if(ctxt, regnr, hyp_vcpup)		\
+	({									\
+		DECLARE_REG(struct vgic_v3_cpu_if *, cif, ctxt, regnr); 	\
+		struct kvm_vcpu *__vcpu = container_of(cif,			\
+						       struct kvm_vcpu,		\
+						       arch.vgic_cpu.vgic_v3);	\
+										\
+		__get_host_hyp_vcpus(__vcpu, hyp_vcpup);			\
+	})
+
 static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
 {
-	DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	struct kvm_vcpu *host_vcpu;
+	int ret;
 
-	cpu_reg(host_ctxt, 1) =  __kvm_vcpu_run(kern_hyp_va(vcpu));
+	host_vcpu = get_host_hyp_vcpus(host_ctxt, 1, &hyp_vcpu);
+	if (!host_vcpu) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (unlikely(hyp_vcpu)) {
+		flush_hyp_vcpu(hyp_vcpu);
+
+		ret = __kvm_vcpu_run(&hyp_vcpu->vcpu);
+
+		sync_hyp_vcpu(hyp_vcpu, ret);
+
+		if (hyp_vcpu->vcpu.arch.fp_state == FP_STATE_GUEST_OWNED) {
+			/*
+			 * The guest has used the FP, trap all accesses
+			 * from the host (both FP and SVE).
+			 */
+			u64 reg = CPTR_EL2_TFP;
+
+			if (system_supports_sve())
+				reg |= CPTR_EL2_TZ;
+
+			sysreg_clear_set(cptr_el2, 0, reg);
+		}
+	} else {
+		/* The host is fully trusted, run its vCPU directly. */
+		ret = __kvm_vcpu_run(host_vcpu);
+	}
+out:
+	cpu_reg(host_ctxt, 1) = ret;
+}
+
+static void handle___pkvm_host_map_guest(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(u64, pfn, host_ctxt, 1);
+	DECLARE_REG(u64, gfn, host_ctxt, 2);
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	int ret = -EINVAL;
+
+	if (!is_protected_kvm_enabled())
+		goto out;
+
+	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+	if (!hyp_vcpu)
+		goto out;
+
+	/* Top-up our per-vcpu memcache from the host's */
+	ret = pkvm_refill_memcache(hyp_vcpu);
+	if (ret)
+		goto out;
+
+	if (pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+		ret = __pkvm_host_donate_guest(pfn, gfn, hyp_vcpu);
+	else
+		ret = __pkvm_host_share_guest(pfn, gfn, hyp_vcpu);
+out:
+	cpu_reg(host_ctxt, 1) = ret;
 }
 
 static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
 {
-	DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	struct kvm_vcpu *host_vcpu;
 
-	__kvm_adjust_pc(kern_hyp_va(vcpu));
+	host_vcpu = get_host_hyp_vcpus(host_ctxt, 1, &hyp_vcpu);
+	if (!host_vcpu)
+		return;
+
+	if (hyp_vcpu) {
+		/* This only applies to non-protected VMs */
+		if (pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+			return;
+
+		__kvm_adjust_pc(&hyp_vcpu->vcpu);
+	} else {
+		__kvm_adjust_pc(host_vcpu);
+	}
 }
 
 static void handle___kvm_flush_vm_context(struct kvm_cpu_context *host_ctxt)
@@ -82,16 +952,6 @@ static void handle___vgic_v3_get_gic_config(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) = __vgic_v3_get_gic_config();
 }
 
-static void handle___vgic_v3_read_vmcr(struct kvm_cpu_context *host_ctxt)
-{
-	cpu_reg(host_ctxt, 1) = __vgic_v3_read_vmcr();
-}
-
-static void handle___vgic_v3_write_vmcr(struct kvm_cpu_context *host_ctxt)
-{
-	__vgic_v3_write_vmcr(cpu_reg(host_ctxt, 1));
-}
-
 static void handle___vgic_v3_init_lrs(struct kvm_cpu_context *host_ctxt)
 {
 	__vgic_v3_init_lrs();
@@ -102,18 +962,65 @@ static void handle___kvm_get_mdcr_el2(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) = __kvm_get_mdcr_el2();
 }
 
-static void handle___vgic_v3_save_aprs(struct kvm_cpu_context *host_ctxt)
+static void handle___vgic_v3_save_vmcr_aprs(struct kvm_cpu_context *host_ctxt)
 {
-	DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	struct kvm_vcpu *host_vcpu;
 
-	__vgic_v3_save_aprs(kern_hyp_va(cpu_if));
+	host_vcpu = get_host_hyp_vcpus_from_vgic_v3_cpu_if(host_ctxt, 1,
+							   &hyp_vcpu);
+	if (!host_vcpu)
+		return;
+
+	if (unlikely(hyp_vcpu)) {
+		struct vgic_v3_cpu_if *hyp_cpu_if, *host_cpu_if;
+		int i;
+
+		hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
+		__vgic_v3_save_vmcr_aprs(hyp_cpu_if);
+
+		host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
+		host_cpu_if->vgic_vmcr = hyp_cpu_if->vgic_vmcr;
+		for (i = 0; i < ARRAY_SIZE(host_cpu_if->vgic_ap0r); i++) {
+			host_cpu_if->vgic_ap0r[i] = hyp_cpu_if->vgic_ap0r[i];
+			host_cpu_if->vgic_ap1r[i] = hyp_cpu_if->vgic_ap1r[i];
+		}
+	} else {
+		__vgic_v3_save_vmcr_aprs(&host_vcpu->arch.vgic_cpu.vgic_v3);
+	}
 }
 
-static void handle___vgic_v3_restore_aprs(struct kvm_cpu_context *host_ctxt)
+static void handle___vgic_v3_restore_vmcr_aprs(struct kvm_cpu_context *host_ctxt)
 {
-	DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	struct kvm_vcpu *host_vcpu;
 
-	__vgic_v3_restore_aprs(kern_hyp_va(cpu_if));
+	host_vcpu = get_host_hyp_vcpus_from_vgic_v3_cpu_if(host_ctxt, 1,
+							   &hyp_vcpu);
+	if (!host_vcpu)
+		return;
+
+	if (unlikely(hyp_vcpu)) {
+		struct vgic_v3_cpu_if *hyp_cpu_if, *host_cpu_if;
+		int i;
+
+		hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
+		host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
+
+		hyp_cpu_if->vgic_vmcr = host_cpu_if->vgic_vmcr;
+		/* Should be a one-off */
+		hyp_cpu_if->vgic_sre = (ICC_SRE_EL1_DIB |
+					ICC_SRE_EL1_DFB |
+					ICC_SRE_EL1_SRE);
+		for (i = 0; i < ARRAY_SIZE(host_cpu_if->vgic_ap0r); i++) {
+			hyp_cpu_if->vgic_ap0r[i] = host_cpu_if->vgic_ap0r[i];
+			hyp_cpu_if->vgic_ap1r[i] = host_cpu_if->vgic_ap1r[i];
+		}
+
+		__vgic_v3_restore_vmcr_aprs(hyp_cpu_if);
+	} else {
+		__vgic_v3_restore_vmcr_aprs(&host_vcpu->arch.vgic_cpu.vgic_v3);
+	}
 }
 
 static void handle___pkvm_init(struct kvm_cpu_context *host_ctxt)
@@ -154,6 +1061,15 @@ static void handle___pkvm_host_unshare_hyp(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) = __pkvm_host_unshare_hyp(pfn);
 }
 
+static void handle___pkvm_reclaim_dying_guest_page(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+	DECLARE_REG(u64, pfn, host_ctxt, 2);
+	DECLARE_REG(u64, ipa, host_ctxt, 3);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_reclaim_dying_guest_page(handle, pfn, ipa);
+}
+
 static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
@@ -184,11 +1100,159 @@ static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) = __pkvm_prot_finalize();
 }
 
-static void handle___pkvm_vcpu_init_traps(struct kvm_cpu_context *host_ctxt)
+static void handle___pkvm_init_vm(struct kvm_cpu_context *host_ctxt)
 {
-	DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
+	DECLARE_REG(struct kvm *, host_kvm, host_ctxt, 1);
+	DECLARE_REG(unsigned long, vm_hva, host_ctxt, 2);
+	DECLARE_REG(unsigned long, pgd_hva, host_ctxt, 3);
+	DECLARE_REG(unsigned long, last_ran_hva, host_ctxt, 4);
 
-	__pkvm_vcpu_init_traps(kern_hyp_va(vcpu));
+	host_kvm = kern_hyp_va(host_kvm);
+	cpu_reg(host_ctxt, 1) = __pkvm_init_vm(host_kvm, vm_hva, pgd_hva,
+					       last_ran_hva);
+}
+
+static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+	DECLARE_REG(struct kvm_vcpu *, host_vcpu, host_ctxt, 2);
+	DECLARE_REG(unsigned long, vcpu_hva, host_ctxt, 3);
+
+	host_vcpu = kern_hyp_va(host_vcpu);
+	cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
+}
+
+static void handle___pkvm_start_teardown_vm(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_start_teardown_vm(handle);
+}
+
+static void handle___pkvm_finalize_teardown_vm(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_finalize_teardown_vm(handle);
+}
+
+static void handle___pkvm_iommu_driver_init(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(struct pkvm_iommu_driver *, drv, host_ctxt, 1);
+	DECLARE_REG(void *, data, host_ctxt, 2);
+	DECLARE_REG(size_t, size, host_ctxt, 3);
+
+	data = kern_hyp_va(data);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_iommu_driver_init(drv, data, size);
+}
+
+static void handle___pkvm_iommu_register(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(unsigned long, dev_id, host_ctxt, 1);
+	DECLARE_REG(unsigned long, drv_id, host_ctxt, 2);
+	DECLARE_REG(phys_addr_t, dev_pa, host_ctxt, 3);
+	DECLARE_REG(size_t, dev_size, host_ctxt, 4);
+	DECLARE_REG(unsigned long, parent_id, host_ctxt, 5);
+	DECLARE_REG(void *, mem, host_ctxt, 6);
+	DECLARE_REG(size_t, mem_size, host_ctxt, 7);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_iommu_register(dev_id, drv_id, dev_pa,
+						      dev_size, parent_id,
+						      mem, mem_size);
+}
+
+static void handle___pkvm_iommu_pm_notify(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(unsigned long, dev_id, host_ctxt, 1);
+	DECLARE_REG(enum pkvm_iommu_pm_event, event, host_ctxt, 2);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_iommu_pm_notify(dev_id, event);
+}
+
+static void handle___pkvm_iommu_finalize(struct kvm_cpu_context *host_ctxt)
+{
+	cpu_reg(host_ctxt, 1) = __pkvm_iommu_finalize();
+}
+
+static void handle___pkvm_alloc_module_va(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(u64, nr_pages, host_ctxt, 1);
+
+	cpu_reg(host_ctxt, 1) = (u64)__pkvm_alloc_module_va(nr_pages);
+}
+
+static void handle___pkvm_map_module_page(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(u64, pfn, host_ctxt, 1);
+	DECLARE_REG(void *, va, host_ctxt, 2);
+	DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);
+
+	cpu_reg(host_ctxt, 1) = (u64)__pkvm_map_module_page(pfn, va, prot, false);
+}
+
+static void handle___pkvm_unmap_module_page(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(u64, pfn, host_ctxt, 1);
+	DECLARE_REG(void *, va, host_ctxt, 2);
+
+	__pkvm_unmap_module_page(pfn, va);
+}
+
+static void handle___pkvm_init_module(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(void *, ptr, host_ctxt, 1);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_init_module(ptr);
+}
+
+static void handle___pkvm_register_hcall(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(unsigned long, hfn_hyp_va, host_ctxt, 1);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_register_hcall(hfn_hyp_va);
+}
+
+static void
+handle___pkvm_close_module_registration(struct kvm_cpu_context *host_ctxt)
+{
+	cpu_reg(host_ctxt, 1) = __pkvm_close_module_registration();
+}
+
+static void handle___pkvm_start_tracing(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(unsigned long, pack_hva, host_ctxt, 1);
+	DECLARE_REG(size_t, pack_size, host_ctxt, 2);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_start_tracing(pack_hva, pack_size);
+}
+
+static void handle___pkvm_stop_tracing(struct kvm_cpu_context *host_ctxt)
+{
+	__pkvm_stop_tracing();
+
+	cpu_reg(host_ctxt, 1) = 0;
+}
+
+static void handle___pkvm_rb_swap_reader_page(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(int, cpu, host_ctxt, 1);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_rb_swap_reader_page(cpu);
+}
+
+static void handle___pkvm_rb_update_footers(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(int, cpu, host_ctxt, 1);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_rb_update_footers(cpu);
+}
+
+static void handle___pkvm_enable_event(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(unsigned short, id, host_ctxt, 1);
+	DECLARE_REG(bool, enable, host_ctxt, 2);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_enable_event(id, enable);
 }
 
 typedef void (*hcall_t)(struct kvm_cpu_context *);
@@ -204,30 +1268,71 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__kvm_enable_ssbs),
 	HANDLE_FUNC(__vgic_v3_init_lrs),
 	HANDLE_FUNC(__vgic_v3_get_gic_config),
-	HANDLE_FUNC(__pkvm_prot_finalize),
-
-	HANDLE_FUNC(__pkvm_host_share_hyp),
-	HANDLE_FUNC(__pkvm_host_unshare_hyp),
-	HANDLE_FUNC(__kvm_adjust_pc),
-	HANDLE_FUNC(__kvm_vcpu_run),
 	HANDLE_FUNC(__kvm_flush_vm_context),
 	HANDLE_FUNC(__kvm_tlb_flush_vmid_ipa),
 	HANDLE_FUNC(__kvm_tlb_flush_vmid),
 	HANDLE_FUNC(__kvm_flush_cpu_context),
+
+	HANDLE_FUNC(__pkvm_alloc_module_va),
+	HANDLE_FUNC(__pkvm_map_module_page),
+	HANDLE_FUNC(__pkvm_unmap_module_page),
+	HANDLE_FUNC(__pkvm_init_module),
+	HANDLE_FUNC(__pkvm_register_hcall),
+	HANDLE_FUNC(__pkvm_close_module_registration),
+	HANDLE_FUNC(__pkvm_prot_finalize),
+
+	HANDLE_FUNC(__pkvm_host_share_hyp),
+	HANDLE_FUNC(__pkvm_host_unshare_hyp),
+	HANDLE_FUNC(__pkvm_host_map_guest),
+	HANDLE_FUNC(__kvm_adjust_pc),
+	HANDLE_FUNC(__kvm_vcpu_run),
 	HANDLE_FUNC(__kvm_timer_set_cntvoff),
-	HANDLE_FUNC(__vgic_v3_read_vmcr),
-	HANDLE_FUNC(__vgic_v3_write_vmcr),
-	HANDLE_FUNC(__vgic_v3_save_aprs),
-	HANDLE_FUNC(__vgic_v3_restore_aprs),
-	HANDLE_FUNC(__pkvm_vcpu_init_traps),
+	HANDLE_FUNC(__vgic_v3_save_vmcr_aprs),
+	HANDLE_FUNC(__vgic_v3_restore_vmcr_aprs),
+	HANDLE_FUNC(__pkvm_init_vm),
+	HANDLE_FUNC(__pkvm_init_vcpu),
+	HANDLE_FUNC(__pkvm_start_teardown_vm),
+	HANDLE_FUNC(__pkvm_finalize_teardown_vm),
+	HANDLE_FUNC(__pkvm_reclaim_dying_guest_page),
+	HANDLE_FUNC(__pkvm_vcpu_load),
+	HANDLE_FUNC(__pkvm_vcpu_put),
+	HANDLE_FUNC(__pkvm_vcpu_sync_state),
+	HANDLE_FUNC(__pkvm_iommu_driver_init),
+	HANDLE_FUNC(__pkvm_iommu_register),
+	HANDLE_FUNC(__pkvm_iommu_pm_notify),
+	HANDLE_FUNC(__pkvm_iommu_finalize),
+	HANDLE_FUNC(__pkvm_start_tracing),
+	HANDLE_FUNC(__pkvm_stop_tracing),
+	HANDLE_FUNC(__pkvm_rb_swap_reader_page),
+	HANDLE_FUNC(__pkvm_rb_update_footers),
+	HANDLE_FUNC(__pkvm_enable_event),
 };
 
+unsigned long pkvm_priv_hcall_limit __ro_after_init = __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize;
+
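+/*
+ * pkvm_priv_hcall_limit is mapped read-only at EL2 once pKVM is initialised,
+ * so it can only be updated through a writable fixmap alias of its page.
+ */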
+int reset_pkvm_priv_hcall_limit(void)
+{
+	unsigned long *addr;
+
+	if (pkvm_priv_hcall_limit == __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize)
+		return -EACCES;
+
+	addr = hyp_fixmap_map(__hyp_pa(&pkvm_priv_hcall_limit));
+	*addr = KVM_HOST_SMCCC_FUNC(__pkvm_prot_finalize);
+	hyp_fixmap_unmap();
+
+	return 0;
+}
+
 static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(unsigned long, id, host_ctxt, 0);
 	unsigned long hcall_min = 0;
 	hcall_t hfn;
 
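+	/* Dynamically registered (module) hypercalls take precedence. */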
+	if (handle_host_dynamic_hcall(host_ctxt) == HCALL_HANDLED)
+		return;
+
 	/*
 	 * If pKVM has been initialised then reject any calls to the
 	 * early "privileged" hypercalls. Note that we cannot reject
@@ -238,7 +1343,7 @@ static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
 	 * returns -EPERM after the first call for a given CPU.
 	 */
 	if (static_branch_unlikely(&kvm_protected_mode_initialized))
-		hcall_min = __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize;
+		hcall_min = pkvm_priv_hcall_limit;
 
 	id -= KVM_HOST_SMCCC_ID(0);
 
@@ -252,23 +1357,33 @@ static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 0) = SMCCC_RET_SUCCESS;
 	hfn(host_ctxt);
 
+	trace_host_hcall(id, 0);
+
 	return;
 inval:
+	trace_host_hcall(id, 1);
 	cpu_reg(host_ctxt, 0) = SMCCC_RET_NOT_SUPPORTED;
 }
 
-static void default_host_smc_handler(struct kvm_cpu_context *host_ctxt)
-{
-	__kvm_hyp_host_forward_smc(host_ctxt);
-}
-
 static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
 {
+	DECLARE_REG(u64, func_id, host_ctxt, 0);
+	struct pkvm_hyp_vcpu *hyp_vcpu;
 	bool handled;
 
+	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+	if (hyp_vcpu && hyp_vcpu->vcpu.arch.fp_state == FP_STATE_GUEST_OWNED)
+		fpsimd_host_restore();
+
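+	/* Try PSCI, then FF-A, then any registered default handler. */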
 	handled = kvm_host_psci_handler(host_ctxt);
 	if (!handled)
-		default_host_smc_handler(host_ctxt);
+		handled = kvm_host_ffa_handler(host_ctxt);
+	if (!handled && READ_ONCE(default_host_smc_handler))
+		handled = default_host_smc_handler(host_ctxt);
+	if (!handled)
+		__kvm_hyp_host_forward_smc(host_ctxt);
+
+	trace_host_smc(func_id, !handled);
 
 	/* SMC was trapped, move ELR past the current PC. */
 	kvm_skip_host_instr();
@@ -278,6 +1393,8 @@ void handle_trap(struct kvm_cpu_context *host_ctxt)
 {
 	u64 esr = read_sysreg_el2(SYS_ESR);
 
+	trace_hyp_enter();
+
 	switch (ESR_ELx_EC(esr)) {
 	case ESR_ELx_EC_HVC64:
 		handle_host_hcall(host_ctxt);
@@ -285,16 +1402,17 @@ void handle_trap(struct kvm_cpu_context *host_ctxt)
 	case ESR_ELx_EC_SMC64:
 		handle_host_smc(host_ctxt);
 		break;
+	case ESR_ELx_EC_FP_ASIMD:
 	case ESR_ELx_EC_SVE:
-		sysreg_clear_set(cptr_el2, CPTR_EL2_TZ, 0);
-		isb();
-		sve_cond_update_zcr_vq(ZCR_ELx_LEN_MASK, SYS_ZCR_EL2);
+		fpsimd_host_restore();
 		break;
 	case ESR_ELx_EC_IABT_LOW:
 	case ESR_ELx_EC_DABT_LOW:
 		handle_host_mem_abort(host_ctxt);
 		break;
 	default:
-		BUG();
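+		/* Give a registered default trap handler a chance; otherwise panic. */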
+		BUG_ON(!READ_ONCE(default_trap_handler) || !default_trap_handler(host_ctxt));
 	}
+
+	trace_hyp_exit();
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-smp.c b/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
index 9f54833..9fcb92a 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
@@ -8,6 +8,8 @@
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
 
+DEFINE_PER_CPU(int, hyp_cpu_number);
+
 /*
  * nVHE copy of data structures tracking available CPU cores.
  * Only entries for CPUs that were online at KVM init are populated.
@@ -23,6 +25,8 @@ u64 cpu_logical_map(unsigned int cpu)
 	return hyp_cpu_logical_map[cpu];
 }
 
+unsigned long __ro_after_init kvm_arm_hyp_percpu_base[NR_CPUS];
+
 unsigned long __hyp_per_cpu_offset(unsigned int cpu)
 {
 	unsigned long *cpu_base_array;
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp.lds.S b/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
index f4562f4..8254ef9 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
@@ -16,6 +16,10 @@
 	HYP_SECTION(.text)
 	HYP_SECTION(.data..ro_after_init)
 	HYP_SECTION(.rodata)
+#ifdef CONFIG_TRACING
+	. = ALIGN(PAGE_SIZE);
+	HYP_SECTION(_hyp_event_ids)
+#endif
 
 	/*
 	 * .hyp..data..percpu needs to be page aligned to maintain the same
@@ -25,5 +29,7 @@
 	BEGIN_HYP_SECTION(.data..percpu)
 		PERCPU_INPUT(L1_CACHE_BYTES)
 	END_HYP_SECTION
+
 	HYP_SECTION(.bss)
+	HYP_SECTION(.data)
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu.c
new file mode 100644
index 0000000..a6073f4
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/iommu.c
@@ -0,0 +1,570 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2022 Google LLC
+ * Author: David Brazdil <dbrazdil@google.com>
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_asm.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+#include <asm/kvm_pkvm.h>
+
+#include <hyp/adjust_pc.h>
+#include <nvhe/iommu.h>
+#include <nvhe/mm.h>
+
+#define DRV_ID(drv_addr)			((unsigned long)drv_addr)
+
+enum {
+	IOMMU_DRIVER_NOT_READY = 0,
+	IOMMU_DRIVER_INITIALIZING,
+	IOMMU_DRIVER_READY,
+};
+
+/* List of registered IOMMU drivers, protected with iommu_drv_lock. */
+static LIST_HEAD(iommu_drivers);
+/* IOMMU device list. Must only be accessed with host_mmu.lock held. */
+static LIST_HEAD(iommu_list);
+
+static bool iommu_finalized;
+static DEFINE_HYP_SPINLOCK(iommu_registration_lock);
+static DEFINE_HYP_SPINLOCK(iommu_drv_lock);
+
+static void *iommu_mem_pool;
+static size_t iommu_mem_remaining;
+
+static void assert_host_component_locked(void)
+{
+	hyp_assert_lock_held(&host_mmu.lock);
+}
+
+static void host_lock_component(void)
+{
+	hyp_spin_lock(&host_mmu.lock);
+}
+
+static void host_unlock_component(void)
+{
+	hyp_spin_unlock(&host_mmu.lock);
+}
+
+/*
+ * Find IOMMU driver by its ID. The input ID is treated as untrusted
+ * and is properly validated.
+ */
+static inline struct pkvm_iommu_driver *get_driver(unsigned long id)
+{
+	struct pkvm_iommu_driver *drv, *ret = NULL;
+
+	hyp_spin_lock(&iommu_drv_lock);
+	list_for_each_entry(drv, &iommu_drivers, list) {
+		if (DRV_ID(drv) == id) {
+			ret = drv;
+			break;
+		}
+	}
+	hyp_spin_unlock(&iommu_drv_lock);
+	return ret;
+}
+
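+/*
+ * Atomically transition the driver from NOT_READY to INITIALIZING. Returns
+ * false if another CPU already claimed or completed initialization.
+ */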
+static inline bool driver_acquire_init(struct pkvm_iommu_driver *drv)
+{
+	return atomic_cmpxchg_acquire(&drv->state, IOMMU_DRIVER_NOT_READY,
+				      IOMMU_DRIVER_INITIALIZING)
+			== IOMMU_DRIVER_NOT_READY;
+}
+
+static inline void driver_release_init(struct pkvm_iommu_driver *drv,
+				       bool success)
+{
+	atomic_set_release(&drv->state, success ? IOMMU_DRIVER_READY
+						: IOMMU_DRIVER_NOT_READY);
+}
+
+static inline bool is_driver_ready(struct pkvm_iommu_driver *drv)
+{
+	return atomic_read(&drv->state) == IOMMU_DRIVER_READY;
+}
+
+static size_t __iommu_alloc_size(struct pkvm_iommu_driver *drv)
+{
+	return ALIGN(sizeof(struct pkvm_iommu) + drv->ops->data_size,
+		     sizeof(unsigned long));
+}
+
+static bool validate_driver_id_unique(struct pkvm_iommu_driver *drv)
+{
+	struct pkvm_iommu_driver *cur;
+
+	hyp_assert_lock_held(&iommu_drv_lock);
+	list_for_each_entry(cur, &iommu_drivers, list) {
+		if (DRV_ID(drv) == DRV_ID(cur))
+			return false;
+	}
+	return true;
+}
+
+static int __pkvm_register_iommu_driver(struct pkvm_iommu_driver *drv)
+{
+	int ret = 0;
+
+	if (!drv)
+		return -EINVAL;
+
+	hyp_assert_lock_held(&iommu_registration_lock);
+	hyp_spin_lock(&iommu_drv_lock);
+	if (validate_driver_id_unique(drv))
+		list_add_tail(&drv->list, &iommu_drivers);
+	else
+		ret = -EEXIST;
+	hyp_spin_unlock(&iommu_drv_lock);
+	return ret;
+}
+
+/* Global memory pool for allocating IOMMU list entry structs. */
+static inline struct pkvm_iommu *alloc_iommu(struct pkvm_iommu_driver *drv,
+					     void *mem, size_t mem_size)
+{
+	size_t size = __iommu_alloc_size(drv);
+	void *ptr;
+
+	assert_host_component_locked();
+
+	/*
+	 * If new memory is being provided, replace the existing pool with it.
+	 * Any remaining memory in the pool is discarded.
+	 */
+	if (mem && mem_size) {
+		iommu_mem_pool = mem;
+		iommu_mem_remaining = mem_size;
+	}
+
+	if (size > iommu_mem_remaining)
+		return NULL;
+
+	ptr = iommu_mem_pool;
+	iommu_mem_pool += size;
+	iommu_mem_remaining -= size;
+	return ptr;
+}
+
+static inline void free_iommu(struct pkvm_iommu_driver *drv, struct pkvm_iommu *ptr)
+{
+	size_t size = __iommu_alloc_size(drv);
+
+	assert_host_component_locked();
+
+	if (!ptr)
+		return;
+
+	/* Only allow freeing the last allocated buffer. */
+	if ((void *)ptr + size != iommu_mem_pool)
+		return;
+
+	iommu_mem_pool -= size;
+	iommu_mem_remaining += size;
+}
+
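+/*
+ * Two half-open intervals [start, start + size) overlap iff each one starts
+ * before the other one ends.
+ */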
+static bool is_overlap(phys_addr_t r1_start, size_t r1_size,
+		       phys_addr_t r2_start, size_t r2_size)
+{
+	phys_addr_t r1_end = r1_start + r1_size;
+	phys_addr_t r2_end = r2_start + r2_size;
+
+	return (r1_start < r2_end) && (r2_start < r1_end);
+}
+
+static bool is_mmio_range(phys_addr_t base, size_t size)
+{
+	struct memblock_region *reg;
+	phys_addr_t limit = BIT(host_mmu.pgt.ia_bits);
+	size_t i;
+
+	/* Check against limits of host IPA space. */
+	if ((base >= limit) || !size || (size > limit - base))
+		return false;
+
+	for (i = 0; i < hyp_memblock_nr; i++) {
+		reg = &hyp_memory[i];
+		if (is_overlap(base, size, reg->base, reg->size))
+			return false;
+	}
+	return true;
+}
+
+static int __snapshot_host_stage2(u64 start, u64 pa_max, u32 level,
+				  kvm_pte_t *ptep,
+				  enum kvm_pgtable_walk_flags flags,
+				  void * const arg)
+{
+	struct pkvm_iommu_driver * const drv = arg;
+	u64 end = start + kvm_granule_size(level);
+	kvm_pte_t pte = *ptep;
+
+	/*
+	 * Valid stage-2 entries are created lazily, invalid ones eagerly.
+	 * Note: In the future we may need to check if [start,end) is MMIO.
+	 * Note: Drivers initialize their PTs to all memory owned by the host,
+	 * so we only call the driver on regions where that is not the case.
+	 */
+	if (pte && !kvm_pte_valid(pte))
+		drv->ops->host_stage2_idmap_prepare(start, end, /*prot*/ 0);
+	return 0;
+}
+
+static int snapshot_host_stage2(struct pkvm_iommu_driver * const drv)
+{
+	struct kvm_pgtable_walker walker = {
+		.cb	= __snapshot_host_stage2,
+		.arg	= drv,
+		.flags	= KVM_PGTABLE_WALK_LEAF,
+	};
+	struct kvm_pgtable *pgt = &host_mmu.pgt;
+
+	if (!drv->ops->host_stage2_idmap_prepare)
+		return 0;
+
+	return kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker);
+}
+
+static bool validate_against_existing_iommus(struct pkvm_iommu *dev)
+{
+	struct pkvm_iommu *other;
+
+	assert_host_component_locked();
+
+	list_for_each_entry(other, &iommu_list, list) {
+		/* Device ID must be unique. */
+		if (dev->id == other->id)
+			return false;
+
+		/* MMIO regions must not overlap. */
+		if (is_overlap(dev->pa, dev->size, other->pa, other->size))
+			return false;
+	}
+	return true;
+}
+
+static struct pkvm_iommu *find_iommu_by_id(unsigned long id)
+{
+	struct pkvm_iommu *dev;
+
+	assert_host_component_locked();
+
+	list_for_each_entry(dev, &iommu_list, list) {
+		if (dev->id == id)
+			return dev;
+	}
+	return NULL;
+}
+
+/*
+ * Initialize EL2 IOMMU driver.
+ *
+ * This is a common hypercall for driver initialization. Driver-specific
+ * arguments are passed in a shared memory buffer. The driver is expected to
+ * initialize its page-table bookkeeping.
+ */
+int __pkvm_iommu_driver_init(struct pkvm_iommu_driver *drv, void *data, size_t size)
+{
+	const struct pkvm_iommu_ops *ops;
+	int ret = 0;
+
+	/* New driver initialization not allowed after __pkvm_iommu_finalize(). */
+	hyp_spin_lock(&iommu_registration_lock);
+	if (iommu_finalized) {
+		ret = -EPERM;
+		goto out_unlock;
+	}
+
+	ret = __pkvm_register_iommu_driver(drv);
+	if (ret)
+		goto out_unlock;
+
+	if (!drv->ops) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	if (!driver_acquire_init(drv)) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	ops = drv->ops;
+
+	/* This can change stage-2 mappings. */
+	if (ops->init) {
+		ret = hyp_pin_shared_mem(data, data + size);
+		if (!ret) {
+			ret = ops->init(data, size);
+			hyp_unpin_shared_mem(data, data + size);
+		}
+		if (ret)
+			goto out_release;
+	}
+
+	/*
+	 * Walk host stage-2 and pass current mappings to the driver. Start
+	 * accepting host stage-2 updates as soon as the host lock is released.
+	 */
+	host_lock_component();
+	ret = snapshot_host_stage2(drv);
+	if (!ret)
+		driver_release_init(drv, /*success=*/true);
+	host_unlock_component();
+
+out_release:
+	if (ret)
+		driver_release_init(drv, /*success=*/false);
+
+out_unlock:
+	hyp_spin_unlock(&iommu_registration_lock);
+	return ret;
+}
+
+int __pkvm_iommu_register(unsigned long dev_id, unsigned long drv_id,
+			  phys_addr_t dev_pa, size_t dev_size,
+			  unsigned long parent_id,
+			  void *kern_mem_va, size_t mem_size)
+{
+	struct pkvm_iommu *dev = NULL;
+	struct pkvm_iommu_driver *drv;
+	void *mem_va = NULL;
+	int ret = 0;
+
+	/* New device registration not allowed after __pkvm_iommu_finalize(). */
+	hyp_spin_lock(&iommu_registration_lock);
+	if (iommu_finalized) {
+		ret = -EPERM;
+		goto out_unlock;
+	}
+
+	drv = get_driver(drv_id);
+	if (!drv || !is_driver_ready(drv)) {
+		ret = -ENOENT;
+		goto out_unlock;
+	}
+
+	if (!PAGE_ALIGNED(dev_pa) || !PAGE_ALIGNED(dev_size)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	if (!is_mmio_range(dev_pa, dev_size)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	/*
+	 * Accept memory donation if the host is providing new memory.
+	 * Note: We do not return the memory even if there is an error later.
+	 */
+	if (kern_mem_va && mem_size) {
+		mem_va = kern_hyp_va(kern_mem_va);
+
+		if (!PAGE_ALIGNED(mem_va) || !PAGE_ALIGNED(mem_size)) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		ret = __pkvm_host_donate_hyp(hyp_virt_to_pfn(mem_va),
+					     mem_size >> PAGE_SHIFT);
+		if (ret)
+			goto out_unlock;
+	}
+
+	host_lock_component();
+
+	/* Allocate memory for the new device entry. */
+	dev = alloc_iommu(drv, mem_va, mem_size);
+	if (!dev) {
+		ret = -ENOMEM;
+		goto out_free;
+	}
+
+	/* Populate the new device entry. */
+	*dev = (struct pkvm_iommu){
+		.children = LIST_HEAD_INIT(dev->children),
+		.id = dev_id,
+		.ops = drv->ops,
+		.pa = dev_pa,
+		.size = dev_size,
+	};
+
+	if (!validate_against_existing_iommus(dev)) {
+		ret = -EBUSY;
+		goto out_free;
+	}
+
+	if (parent_id) {
+		dev->parent = find_iommu_by_id(parent_id);
+		if (!dev->parent) {
+			ret = -EINVAL;
+			goto out_free;
+		}
+
+		if (dev->parent->ops->validate_child) {
+			ret = dev->parent->ops->validate_child(dev->parent, dev);
+			if (ret)
+				goto out_free;
+		}
+	}
+
+	if (dev->ops->validate) {
+		ret = dev->ops->validate(dev);
+		if (ret)
+			goto out_free;
+	}
+
+	/*
+	 * Unmap the device's MMIO range from host stage-2. If registration
+	 * is successful, future attempts to re-map will be blocked by
+	 * pkvm_iommu_host_stage2_adjust_range.
+	 */
+	ret = host_stage2_unmap_reg_locked(dev_pa, dev_size);
+	if (ret)
+		goto out_free;
+
+	/* Create EL2 mapping for the device. */
+	ret = __pkvm_create_private_mapping(dev_pa, dev_size,
+					    PAGE_HYP_DEVICE, (unsigned long *)(&dev->va));
+	if (ret)
+		goto out_free;
+
+	/* Register device and prevent host from mapping the MMIO range. */
+	list_add_tail(&dev->list, &iommu_list);
+	if (dev->parent)
+		list_add_tail(&dev->siblings, &dev->parent->children);
+
+out_free:
+	if (ret)
+		free_iommu(drv, dev);
+	host_unlock_component();
+
+out_unlock:
+	hyp_spin_unlock(&iommu_registration_lock);
+	return ret;
+}
+
+int __pkvm_iommu_finalize(void)
+{
+	int ret = 0;
+
+	hyp_spin_lock(&iommu_registration_lock);
+	if (!iommu_finalized)
+		iommu_finalized = true;
+	else
+		ret = -EPERM;
+	hyp_spin_unlock(&iommu_registration_lock);
+	return ret;
+}
+
+int __pkvm_iommu_pm_notify(unsigned long dev_id, enum pkvm_iommu_pm_event event)
+{
+	struct pkvm_iommu *dev;
+	int ret;
+
+	host_lock_component();
+	dev = find_iommu_by_id(dev_id);
+	if (dev) {
+		if (event == PKVM_IOMMU_PM_SUSPEND) {
+			ret = dev->ops->suspend ? dev->ops->suspend(dev) : 0;
+			if (!ret)
+				dev->powered = false;
+		} else if (event == PKVM_IOMMU_PM_RESUME) {
+			ret = dev->ops->resume ? dev->ops->resume(dev) : 0;
+			if (!ret)
+				dev->powered = true;
+		} else {
+			ret = -EINVAL;
+		}
+	} else {
+		ret = -ENODEV;
+	}
+	host_unlock_component();
+	return ret;
+}
+
+/*
+ * Check host memory access against IOMMUs' MMIO regions.
+ * Returns -EPERM if the address is within the bounds of a registered device.
+ * Otherwise returns zero and adjusts boundaries of the new mapping to avoid
+ * MMIO regions of registered IOMMUs.
+ */
+int pkvm_iommu_host_stage2_adjust_range(phys_addr_t addr, phys_addr_t *start,
+					phys_addr_t *end)
+{
+	struct pkvm_iommu *dev;
+	phys_addr_t new_start = *start;
+	phys_addr_t new_end = *end;
+	phys_addr_t dev_start, dev_end;
+
+	assert_host_component_locked();
+
+	list_for_each_entry(dev, &iommu_list, list) {
+		dev_start = dev->pa;
+		dev_end = dev_start + dev->size;
+
+		if (addr < dev_start)
+			new_end = min(new_end, dev_start);
+		else if (addr >= dev_end)
+			new_start = max(new_start, dev_end);
+		else
+			return -EPERM;
+	}
+
+	*start = new_start;
+	*end = new_end;
+	return 0;
+}
+
+bool pkvm_iommu_host_dabt_handler(struct kvm_cpu_context *host_ctxt, u32 esr,
+				  phys_addr_t pa)
+{
+	struct pkvm_iommu *dev;
+
+	assert_host_component_locked();
+
+	list_for_each_entry(dev, &iommu_list, list) {
+		if (pa < dev->pa || pa >= dev->pa + dev->size)
+			continue;
+
+		/* No 'powered' check - the host assumes it is powered. */
+		if (!dev->ops->host_dabt_handler ||
+		    !dev->ops->host_dabt_handler(dev, host_ctxt, esr, pa - dev->pa))
+			return false;
+
+		kvm_skip_host_instr();
+		return true;
+	}
+	return false;
+}
+
+void pkvm_iommu_host_stage2_idmap(phys_addr_t start, phys_addr_t end,
+				  enum kvm_pgtable_prot prot)
+{
+	struct pkvm_iommu_driver *drv;
+	struct pkvm_iommu *dev;
+
+	assert_host_component_locked();
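+	/* Prepare all ready drivers, then apply and complete on powered devices. */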
+	hyp_spin_lock(&iommu_drv_lock);
+	list_for_each_entry(drv, &iommu_drivers, list) {
+		if (drv && is_driver_ready(drv) && drv->ops->host_stage2_idmap_prepare)
+			drv->ops->host_stage2_idmap_prepare(start, end, prot);
+	}
+	hyp_spin_unlock(&iommu_drv_lock);
+
+	list_for_each_entry(dev, &iommu_list, list) {
+		if (dev->powered && dev->ops->host_stage2_idmap_apply)
+			dev->ops->host_stage2_idmap_apply(dev, start, end);
+	}
+
+	list_for_each_entry(dev, &iommu_list, list) {
+		if (dev->powered && dev->ops->host_stage2_idmap_complete)
+			dev->ops->host_stage2_idmap_complete(dev);
+	}
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 07f9dc9..735bd4c 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -7,35 +7,65 @@
 #include <linux/kvm_host.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_hypevents.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_pgtable.h>
 #include <asm/kvm_pkvm.h>
 #include <asm/stage2_pgtable.h>
 
+#include <hyp/adjust_pc.h>
 #include <hyp/fault.h>
 
 #include <nvhe/gfp.h>
+#include <nvhe/iommu.h>
 #include <nvhe/memory.h>
 #include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
 
 #define KVM_HOST_S2_FLAGS (KVM_PGTABLE_S2_NOFWB | KVM_PGTABLE_S2_IDMAP)
 
-extern unsigned long hyp_nr_cpus;
-struct host_kvm host_kvm;
+struct host_mmu host_mmu;
+
+struct pkvm_moveable_reg pkvm_moveable_regs[PKVM_NR_MOVEABLE_REGS];
+unsigned int pkvm_moveable_regs_nr;
 
 static struct hyp_pool host_s2_pool;
 
-const u8 pkvm_hyp_id = 1;
+static DEFINE_PER_CPU(struct pkvm_hyp_vm *, __current_vm);
+#define current_vm (*this_cpu_ptr(&__current_vm))
+
+static struct kvm_pgtable_pte_ops host_s2_pte_ops;
+static bool host_stage2_force_pte(u64 addr, u64 end, enum kvm_pgtable_prot prot);
+static bool host_stage2_pte_is_counted(kvm_pte_t pte, u32 level);
+static bool guest_stage2_force_pte_cb(u64 addr, u64 end,
+				      enum kvm_pgtable_prot prot);
+static bool guest_stage2_pte_is_counted(kvm_pte_t pte, u32 level);
+
+static struct kvm_pgtable_pte_ops guest_s2_pte_ops = {
+	.force_pte_cb = guest_stage2_force_pte_cb,
+	.pte_is_counted_cb = guest_stage2_pte_is_counted
+};
+
+static void guest_lock_component(struct pkvm_hyp_vm *vm)
+{
+	hyp_spin_lock(&vm->lock);
+	current_vm = vm;
+}
+
+static void guest_unlock_component(struct pkvm_hyp_vm *vm)
+{
+	current_vm = NULL;
+	hyp_spin_unlock(&vm->lock);
+}
 
 static void host_lock_component(void)
 {
-	hyp_spin_lock(&host_kvm.lock);
+	hyp_spin_lock(&host_mmu.lock);
 }
 
 static void host_unlock_component(void)
 {
-	hyp_spin_unlock(&host_kvm.lock);
+	hyp_spin_unlock(&host_mmu.lock);
 }
 
 static void hyp_lock_component(void)
@@ -90,7 +120,7 @@ static int prepare_s2_pool(void *pgt_pool_base)
 	if (ret)
 		return ret;
 
-	host_kvm.mm_ops = (struct kvm_pgtable_mm_ops) {
+	host_mmu.mm_ops = (struct kvm_pgtable_mm_ops) {
 		.zalloc_pages_exact = host_s2_zalloc_pages_exact,
 		.zalloc_page = host_s2_zalloc_page,
 		.phys_to_virt = hyp_phys_to_virt,
@@ -111,53 +141,234 @@ static void prepare_host_vtcr(void)
 	parange = kvm_get_parange(id_aa64mmfr0_el1_sys_val);
 	phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
 
-	host_kvm.arch.vtcr = kvm_get_vtcr(id_aa64mmfr0_el1_sys_val,
+	host_mmu.arch.vtcr = kvm_get_vtcr(id_aa64mmfr0_el1_sys_val,
 					  id_aa64mmfr1_el1_sys_val, phys_shift);
 }
 
-static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot);
-
 int kvm_host_prepare_stage2(void *pgt_pool_base)
 {
-	struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
+	struct kvm_s2_mmu *mmu = &host_mmu.arch.mmu;
 	int ret;
 
 	prepare_host_vtcr();
-	hyp_spin_lock_init(&host_kvm.lock);
-	mmu->arch = &host_kvm.arch;
+	hyp_spin_lock_init(&host_mmu.lock);
+	mmu->arch = &host_mmu.arch;
 
 	ret = prepare_s2_pool(pgt_pool_base);
 	if (ret)
 		return ret;
 
-	ret = __kvm_pgtable_stage2_init(&host_kvm.pgt, mmu,
-					&host_kvm.mm_ops, KVM_HOST_S2_FLAGS,
-					host_stage2_force_pte_cb);
+	host_s2_pte_ops.force_pte_cb = host_stage2_force_pte;
+	host_s2_pte_ops.pte_is_counted_cb = host_stage2_pte_is_counted;
+
+	ret = __kvm_pgtable_stage2_init(&host_mmu.pgt, mmu,
+					&host_mmu.mm_ops, KVM_HOST_S2_FLAGS,
+					&host_s2_pte_ops);
 	if (ret)
 		return ret;
 
-	mmu->pgd_phys = __hyp_pa(host_kvm.pgt.pgd);
-	mmu->pgt = &host_kvm.pgt;
+	mmu->pgd_phys = __hyp_pa(host_mmu.pgt.pgd);
+	mmu->pgt = &host_mmu.pgt;
 	atomic64_set(&mmu->vmid.id, 0);
 
 	return 0;
 }
 
+static bool guest_stage2_force_pte_cb(u64 addr, u64 end,
+				      enum kvm_pgtable_prot prot)
+{
+	return true;
+}
+
+static bool guest_stage2_pte_is_counted(kvm_pte_t pte, u32 level)
+{
+	return host_stage2_pte_is_counted(pte, level);
+}
+
+static void *guest_s2_zalloc_pages_exact(size_t size)
+{
+	void *addr = hyp_alloc_pages(&current_vm->pool, get_order(size));
+
+	WARN_ON(size != (PAGE_SIZE << get_order(size)));
+	hyp_split_page(hyp_virt_to_page(addr));
+
+	return addr;
+}
+
+static void guest_s2_free_pages_exact(void *addr, unsigned long size)
+{
+	u8 order = get_order(size);
+	unsigned int i;
+
+	for (i = 0; i < (1 << order); i++)
+		hyp_put_page(&current_vm->pool, addr + (i * PAGE_SIZE));
+}
+
+static void *guest_s2_zalloc_page(void *mc)
+{
+	struct hyp_page *p;
+	void *addr;
+
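+	/* Prefer the per-VM page pool; fall back to the host-provided memcache. */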
+	addr = hyp_alloc_pages(&current_vm->pool, 0);
+	if (addr)
+		return addr;
+
+	addr = pop_hyp_memcache(mc, hyp_phys_to_virt);
+	if (!addr)
+		return addr;
+
+	memset(addr, 0, PAGE_SIZE);
+	p = hyp_virt_to_page(addr);
+	memset(p, 0, sizeof(*p));
+	p->refcount = 1;
+
+	return addr;
+}
+
+static void guest_s2_get_page(void *addr)
+{
+	hyp_get_page(&current_vm->pool, addr);
+}
+
+static void guest_s2_put_page(void *addr)
+{
+	hyp_put_page(&current_vm->pool, addr);
+}
+
+static void clean_dcache_guest_page(void *va, size_t size)
+{
+	__clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
+	hyp_fixmap_unmap();
+}
+
+static void invalidate_icache_guest_page(void *va, size_t size)
+{
+	__invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
+	hyp_fixmap_unmap();
+}
+
+int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
+{
+	struct kvm_s2_mmu *mmu = &vm->kvm.arch.mmu;
+	unsigned long nr_pages;
+	int ret;
+
+	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
+	ret = hyp_pool_init(&vm->pool, hyp_virt_to_pfn(pgd), nr_pages, 0);
+	if (ret)
+		return ret;
+
+	hyp_spin_lock_init(&vm->lock);
+	vm->mm_ops = (struct kvm_pgtable_mm_ops) {
+		.zalloc_pages_exact	= guest_s2_zalloc_pages_exact,
+		.free_pages_exact	= guest_s2_free_pages_exact,
+		.zalloc_page		= guest_s2_zalloc_page,
+		.phys_to_virt		= hyp_phys_to_virt,
+		.virt_to_phys		= hyp_virt_to_phys,
+		.page_count		= hyp_page_count,
+		.get_page		= guest_s2_get_page,
+		.put_page		= guest_s2_put_page,
+		.dcache_clean_inval_poc	= clean_dcache_guest_page,
+		.icache_inval_pou	= invalidate_icache_guest_page,
+	};
+
+	guest_lock_component(vm);
+	ret = __kvm_pgtable_stage2_init(mmu->pgt, mmu, &vm->mm_ops, 0,
+					&guest_s2_pte_ops);
+	guest_unlock_component(vm);
+	if (ret)
+		return ret;
+
+	vm->kvm.arch.mmu.pgd_phys = __hyp_pa(vm->pgt.pgd);
+
+	return 0;
+}
+
+struct relinquish_data {
+	enum pkvm_page_state expected_state;
+	u64 pa;
+};
+
+static int relinquish_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			     enum kvm_pgtable_walk_flags flag, void * const arg)
+{
+	kvm_pte_t pte = *ptep;
+	struct relinquish_data *data = arg;
+	enum pkvm_page_state state;
+	phys_addr_t phys;
+
+	if (!kvm_pte_valid(pte))
+		return 0;
+
+	state = pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
+	if (state != data->expected_state)
+		return -EPERM;
+
+	phys = kvm_pte_to_phys(pte);
+	if (state == PKVM_PAGE_OWNED) {
+		hyp_poison_page(phys);
+		psci_mem_protect_dec(1);
+	}
+
+	data->pa = phys;
+
+	return 0;
+}
+
+int __pkvm_guest_relinquish_to_host(struct pkvm_hyp_vcpu *vcpu,
+				    u64 ipa, u64 *ppa)
+{
+	struct relinquish_data data;
+	struct kvm_pgtable_walker walker = {
+		.cb     = relinquish_walker,
+		.flags  = KVM_PGTABLE_WALK_LEAF,
+		.arg    = &data,
+	};
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+	int ret;
+
+	host_lock_component();
+	guest_lock_component(vm);
+
+	/* Expected page state depends on VM type. */
+	data.expected_state = pkvm_hyp_vcpu_is_protected(vcpu) ?
+		PKVM_PAGE_OWNED :
+		PKVM_PAGE_SHARED_BORROWED;
+
+	/* Set default pa value to "not found". */
+	data.pa = 0;
+
+	/* If the IPA is mapped, poison the page (if owned) and fetch its PA. */
+	ret = kvm_pgtable_walk(&vm->pgt, ipa, PAGE_SIZE, &walker);
+
+	/* Zap the guest stage2 pte and return ownership to the host */
+	if (!ret && data.pa) {
+		WARN_ON(host_stage2_set_owner_locked(data.pa, PAGE_SIZE, PKVM_ID_HOST));
+		WARN_ON(kvm_pgtable_stage2_unmap(&vm->pgt, ipa, PAGE_SIZE));
+	}
+
+	guest_unlock_component(vm);
+	host_unlock_component();
+
+	*ppa = data.pa;
+	return ret;
+}
+
 int __pkvm_prot_finalize(void)
 {
-	struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
+	struct kvm_s2_mmu *mmu = &host_mmu.arch.mmu;
 	struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
 
 	if (params->hcr_el2 & HCR_VM)
 		return -EPERM;
 
 	params->vttbr = kvm_get_vttbr(mmu);
-	params->vtcr = host_kvm.arch.vtcr;
+	params->vtcr = host_mmu.arch.vtcr;
 	params->hcr_el2 |= HCR_VM;
 	kvm_flush_dcache_to_poc(params, sizeof(*params));
 
 	write_sysreg(params->hcr_el2, hcr_el2);
-	__load_stage2(&host_kvm.arch.mmu, &host_kvm.arch);
+	__load_stage2(&host_mmu.arch.mmu, &host_mmu.arch);
 
 	/*
 	 * Make sure to have an ISB before the TLB maintenance below but only
@@ -173,21 +384,38 @@ int __pkvm_prot_finalize(void)
 	return 0;
 }
 
-static int host_stage2_unmap_dev_all(void)
+int host_stage2_unmap_reg_locked(phys_addr_t start, u64 size)
 {
-	struct kvm_pgtable *pgt = &host_kvm.pgt;
-	struct memblock_region *reg;
+	int ret;
+
+	hyp_assert_lock_held(&host_mmu.lock);
+
+	ret = kvm_pgtable_stage2_unmap(&host_mmu.pgt, start, size);
+	if (ret)
+		return ret;
+
+	pkvm_iommu_host_stage2_idmap(start, start + size, 0);
+	return 0;
+}
+
+static int host_stage2_unmap_unmoveable_regs(void)
+{
+	struct kvm_pgtable *pgt = &host_mmu.pgt;
+	struct pkvm_moveable_reg *reg;
 	u64 addr = 0;
 	int i, ret;
 
-	/* Unmap all non-memory regions to recycle the pages */
-	for (i = 0; i < hyp_memblock_nr; i++, addr = reg->base + reg->size) {
-		reg = &hyp_memory[i];
-		ret = kvm_pgtable_stage2_unmap(pgt, addr, reg->base - addr);
-		if (ret)
-			return ret;
+	/* Unmap all unmoveable regions to recycle the pages */
+	for (i = 0; i < pkvm_moveable_regs_nr; i++) {
+		reg = &pkvm_moveable_regs[i];
+		if (reg->start > addr) {
+			ret = host_stage2_unmap_reg_locked(addr, reg->start - addr);
+			if (ret)
+				return ret;
+		}
+		addr = max(addr, reg->start + reg->size);
 	}
-	return kvm_pgtable_stage2_unmap(pgt, addr, BIT(pgt->ia_bits) - addr);
+	return host_stage2_unmap_reg_locked(addr, BIT(pgt->ia_bits) - addr);
 }
 
 struct kvm_mem_range {
@@ -195,7 +423,7 @@ struct kvm_mem_range {
 	u64 end;
 };
 
-static bool find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
+static struct memblock_region *find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
 {
 	int cur, left = 0, right = hyp_memblock_nr;
 	struct memblock_region *reg;
@@ -218,18 +446,33 @@ static bool find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
 		} else {
 			range->start = reg->base;
 			range->end = end;
-			return true;
+			return reg;
 		}
 	}
 
-	return false;
+	return NULL;
+}
+
+static enum kvm_pgtable_prot default_host_prot(bool is_memory)
+{
+	return is_memory ? PKVM_HOST_MEM_PROT : PKVM_HOST_MMIO_PROT;
 }
 
 bool addr_is_memory(phys_addr_t phys)
 {
 	struct kvm_mem_range range;
 
-	return find_mem_range(phys, &range);
+	return !!find_mem_range(phys, &range);
+}
+
+static bool addr_is_allowed_memory(phys_addr_t phys)
+{
+	struct memblock_region *reg;
+	struct kvm_mem_range range;
+
+	reg = find_mem_range(phys, &range);
+
+	return reg && !(reg->flags & MEMBLOCK_NOMAP);
 }
 
 static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
@@ -248,25 +491,34 @@ static bool range_is_memory(u64 start, u64 end)
 }
 
 static inline int __host_stage2_idmap(u64 start, u64 end,
-				      enum kvm_pgtable_prot prot)
+				      enum kvm_pgtable_prot prot,
+				      bool update_iommu)
 {
-	return kvm_pgtable_stage2_map(&host_kvm.pgt, start, end - start, start,
-				      prot, &host_s2_pool);
+	int ret;
+
+	ret = kvm_pgtable_stage2_map(&host_mmu.pgt, start, end - start, start,
+				     prot, &host_s2_pool);
+	if (ret)
+		return ret;
+
+	if (update_iommu)
+		pkvm_iommu_host_stage2_idmap(start, end, prot);
+	return 0;
 }
 
 /*
- * The pool has been provided with enough pages to cover all of memory with
- * page granularity, but it is difficult to know how much of the MMIO range
- * we will need to cover upfront, so we may need to 'recycle' the pages if we
- * run out.
+ * The pool has been provided with enough pages to cover all moveable regions
+ * with page granularity, but it is difficult to know how much of the
+ * non-moveable regions we will need to cover upfront, so we may need to
+ * 'recycle' the pages if we run out.
  */
 #define host_stage2_try(fn, ...)					\
 	({								\
 		int __ret;						\
-		hyp_assert_lock_held(&host_kvm.lock);			\
+		hyp_assert_lock_held(&host_mmu.lock);			\
 		__ret = fn(__VA_ARGS__);				\
 		if (__ret == -ENOMEM) {					\
-			__ret = host_stage2_unmap_dev_all();		\
+			__ret = host_stage2_unmap_unmoveable_regs();		\
 			if (!__ret)					\
 				__ret = fn(__VA_ARGS__);		\
 		}							\
@@ -286,8 +538,8 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
 	u32 level;
 	int ret;
 
-	hyp_assert_lock_held(&host_kvm.lock);
-	ret = kvm_pgtable_get_leaf(&host_kvm.pgt, addr, &pte, &level);
+	hyp_assert_lock_held(&host_mmu.lock);
+	ret = kvm_pgtable_get_leaf(&host_mmu.pgt, addr, &pte, &level);
 	if (ret)
 		return ret;
 
@@ -312,18 +564,39 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
 }
 
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size,
-			     enum kvm_pgtable_prot prot)
+			     enum kvm_pgtable_prot prot, bool update_iommu)
 {
-	return host_stage2_try(__host_stage2_idmap, addr, addr + size, prot);
+	return host_stage2_try(__host_stage2_idmap, addr, addr + size, prot, update_iommu);
 }
 
-int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
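+/* Software bits of an invalid stage-2 PTE used to encode the owner ID. */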
+#define KVM_INVALID_PTE_OWNER_MASK	GENMASK(9, 2)
+static kvm_pte_t kvm_init_invalid_leaf_owner(enum pkvm_component_id owner_id)
 {
-	return host_stage2_try(kvm_pgtable_stage2_set_owner, &host_kvm.pgt,
-			       addr, size, &host_s2_pool, owner_id);
+	return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
 }
 
-static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
+int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, enum pkvm_component_id owner_id)
+{
+	kvm_pte_t annotation;
+	enum kvm_pgtable_prot prot;
+	int ret;
+
+	if (owner_id > PKVM_ID_MAX)
+		return -EINVAL;
+
+	annotation = kvm_init_invalid_leaf_owner(owner_id);
+
+	ret = host_stage2_try(kvm_pgtable_stage2_annotate, &host_mmu.pgt,
+			      addr, size, &host_s2_pool, annotation);
+	if (ret)
+		return ret;
+
+	prot = owner_id == PKVM_ID_HOST ? PKVM_HOST_MEM_PROT : 0;
+	pkvm_iommu_host_stage2_idmap(addr, addr + size, prot);
+	return 0;
+}
+
+static bool host_stage2_force_pte(u64 addr, u64 end, enum kvm_pgtable_prot prot)
 {
 	/*
 	 * Block mappings must be used with care in the host stage-2 as a
@@ -339,52 +612,155 @@ static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot pr
 	 * mappings, hence avoiding to lose the state because of side-effects in
 	 * kvm_pgtable_stage2_map().
 	 */
-	if (range_is_memory(addr, end))
-		return prot != PKVM_HOST_MEM_PROT;
-	else
-		return prot != PKVM_HOST_MMIO_PROT;
+	return prot != default_host_prot(range_is_memory(addr, end));
+}
+
+static bool host_stage2_pte_is_counted(kvm_pte_t pte, u32 level)
+{
+	/*
+	 * The refcount tracks valid entries as well as invalid entries if they
+	 * encode ownership of a page to another entity than the page-table
+	 * owner, whose id is 0.
+	 */
+	return !!pte;
 }
 
 static int host_stage2_idmap(u64 addr)
 {
 	struct kvm_mem_range range;
-	bool is_memory = find_mem_range(addr, &range);
-	enum kvm_pgtable_prot prot;
+	bool is_memory = !!find_mem_range(addr, &range);
+	enum kvm_pgtable_prot prot = default_host_prot(is_memory);
 	int ret;
 
-	prot = is_memory ? PKVM_HOST_MEM_PROT : PKVM_HOST_MMIO_PROT;
+	hyp_assert_lock_held(&host_mmu.lock);
 
-	host_lock_component();
+	/*
+	 * Adjust against IOMMU devices first. host_stage2_adjust_range() should
+	 * be called last for proper alignment.
+	 */
+	if (!is_memory) {
+		ret = pkvm_iommu_host_stage2_adjust_range(addr, &range.start,
+							  &range.end);
+		if (ret)
+			return ret;
+	}
+
 	ret = host_stage2_adjust_range(addr, &range);
 	if (ret)
-		goto unlock;
+		return ret;
 
-	ret = host_stage2_idmap_locked(range.start, range.end - range.start, prot);
-unlock:
-	host_unlock_component();
+	return host_stage2_idmap_locked(range.start, range.end - range.start, prot, false);
+}
 
-	return ret;
+static void (*illegal_abt_notifier)(struct kvm_cpu_context *host_ctxt);
+
+int __pkvm_register_illegal_abt_notifier(void (*cb)(struct kvm_cpu_context *))
+{
+	return cmpxchg(&illegal_abt_notifier, NULL, cb) ? -EBUSY : 0;
+}
+
+static void host_inject_abort(struct kvm_cpu_context *host_ctxt)
+{
+	u64 spsr = read_sysreg_el2(SYS_SPSR);
+	u64 esr = read_sysreg_el2(SYS_ESR);
+	u64 ventry, ec;
+
+	if (READ_ONCE(illegal_abt_notifier))
+		illegal_abt_notifier(host_ctxt);
+
+	/* Repaint the ESR to report a same-level fault if taken from EL1 */
+	if ((spsr & PSR_MODE_MASK) != PSR_MODE_EL0t) {
+		ec = ESR_ELx_EC(esr);
+		if (ec == ESR_ELx_EC_DABT_LOW)
+			ec = ESR_ELx_EC_DABT_CUR;
+		else if (ec == ESR_ELx_EC_IABT_LOW)
+			ec = ESR_ELx_EC_IABT_CUR;
+		else
+			WARN_ON(1);
+		esr &= ~ESR_ELx_EC_MASK;
+		esr |= ec << ESR_ELx_EC_SHIFT;
+	}
+
+	/*
+	 * Since S1PTW should only ever be set for stage-2 faults, we're pretty
+	 * much guaranteed that it won't be set in ESR_EL1 by the hardware. So,
+	 * let's use that bit to allow the host abort handler to differentiate
+	 * this abort from normal userspace faults.
+	 *
+	 * Note: although S1PTW is RES0 at EL1, it is guaranteed by the
+	 * architecture to be backed by flops, so it should be safe to use.
+	 */
+	esr |= ESR_ELx_S1PTW;
+
+	write_sysreg_el1(esr, SYS_ESR);
+	write_sysreg_el1(spsr, SYS_SPSR);
+	write_sysreg_el1(read_sysreg_el2(SYS_ELR), SYS_ELR);
+	write_sysreg_el1(read_sysreg_el2(SYS_FAR), SYS_FAR);
+
+	ventry = read_sysreg_el1(SYS_VBAR);
+	ventry += get_except64_offset(spsr, PSR_MODE_EL1h, except_type_sync);
+	write_sysreg_el2(ventry, SYS_ELR);
+
+	spsr = get_except64_cpsr(spsr, system_supports_mte(),
+				 read_sysreg_el1(SYS_SCTLR), PSR_MODE_EL1h);
+	write_sysreg_el2(spsr, SYS_SPSR);
+}
+
+static bool is_dabt(u64 esr)
+{
+	return ESR_ELx_EC(esr) == ESR_ELx_EC_DABT_LOW;
+}
+
+static int (*perm_fault_handler)(struct kvm_cpu_context *host_ctxt, u64 esr, u64 addr);
+
+int hyp_register_host_perm_fault_handler(int (*cb)(struct kvm_cpu_context *ctxt, u64 esr, u64 addr))
+{
+	return cmpxchg(&perm_fault_handler, NULL, cb) ? -EBUSY : 0;
+}
+
+static int handle_host_perm_fault(struct kvm_cpu_context *host_ctxt, u64 esr, u64 addr)
+{
+	int (*cb)(struct kvm_cpu_context *host_ctxt, u64 esr, u64 addr);
+
+	cb = READ_ONCE(perm_fault_handler);
+	return cb ? cb(host_ctxt, esr, addr) : -EPERM;
 }
 
 void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
 {
 	struct kvm_vcpu_fault_info fault;
 	u64 esr, addr;
-	int ret = 0;
+	int ret = -EPERM;
 
 	esr = read_sysreg_el2(SYS_ESR);
 	BUG_ON(!__get_fault_info(esr, &fault));
 
 	addr = (fault.hpfar_el2 & HPFAR_MASK) << 8;
-	ret = host_stage2_idmap(addr);
-	BUG_ON(ret && ret != -EAGAIN);
-}
+	addr |= fault.far_el2 & FAR_MASK;
 
-/* This corresponds to locking order */
-enum pkvm_component_id {
-	PKVM_ID_HOST,
-	PKVM_ID_HYP,
-};
+	host_lock_component();
+
+	/* Check if an IOMMU device can handle the DABT. */
+	if (is_dabt(esr) && !addr_is_memory(addr) &&
+	    pkvm_iommu_host_dabt_handler(host_ctxt, esr, addr))
+		ret = 0;
+
+	/* If not handled, attempt to map the page. */
+	if (ret == -EPERM)
+		ret = host_stage2_idmap(addr);
+
+	host_unlock_component();
+
+	if ((esr & ESR_ELx_FSC_TYPE) == FSC_PERM)
+		ret = handle_host_perm_fault(host_ctxt, esr, addr);
+
+	if (ret == -EPERM)
+		host_inject_abort(host_ctxt);
+	else
+		BUG_ON(ret && ret != -EAGAIN);
+
+	trace_host_mem_abort(esr, addr);
+}
 
 struct pkvm_mem_transition {
 	u64				nr_pages;
@@ -399,11 +775,24 @@ struct pkvm_mem_transition {
 				/* Address in the completer's address space */
 				u64	completer_addr;
 			} host;
+			struct {
+				u64	completer_addr;
+			} hyp;
+			struct {
+				struct pkvm_hyp_vcpu *hyp_vcpu;
+			} guest;
 		};
 	} initiator;
 
 	struct {
 		enum pkvm_component_id	id;
+
+		union {
+			struct {
+				struct pkvm_hyp_vcpu *hyp_vcpu;
+				phys_addr_t phys;
+			} guest;
+		};
 	} completer;
 };
 
@@ -412,9 +801,13 @@ struct pkvm_mem_share {
 	const enum kvm_pgtable_prot		completer_prot;
 };
 
+struct pkvm_mem_donation {
+	const struct pkvm_mem_transition	tx;
+};
+
 struct check_walk_data {
 	enum pkvm_page_state	desired;
-	enum pkvm_page_state	(*get_page_state)(kvm_pte_t pte);
+	enum pkvm_page_state	(*get_page_state)(kvm_pte_t pte, u64 addr);
 };
 
 static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
@@ -425,10 +818,10 @@ static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
 	struct check_walk_data *d = arg;
 	kvm_pte_t pte = *ptep;
 
-	if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
+	if (kvm_pte_valid(pte) && !addr_is_allowed_memory(kvm_pte_to_phys(pte)))
 		return -EINVAL;
 
-	return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
+	return d->get_page_state(pte, addr) == d->desired ? 0 : -EPERM;
 }
 
 static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
@@ -443,12 +836,25 @@ static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	return kvm_pgtable_walk(pgt, addr, size, &walker);
 }
 
-static enum pkvm_page_state host_get_page_state(kvm_pte_t pte)
+static enum pkvm_page_state host_get_page_state(kvm_pte_t pte, u64 addr)
 {
+	bool is_memory = addr_is_memory(addr);
+	enum pkvm_page_state state = 0;
+	enum kvm_pgtable_prot prot;
+
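+	/* Pages flagged as module-owned are excluded from host transitions. */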
+	if (is_memory && hyp_phys_to_page(addr)->flags & MODULE_OWNED_PAGE)
+		return PKVM_MODULE_DONT_TOUCH;
+
 	if (!kvm_pte_valid(pte) && pte)
 		return PKVM_NOPAGE;
 
-	return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
+	prot = kvm_pgtable_stage2_pte_prot(pte);
+	if (kvm_pte_valid(pte)) {
+		if ((prot & KVM_PGTABLE_PROT_RWX) != default_host_prot(is_memory))
+			state = PKVM_PAGE_RESTRICTED_PROT;
+	}
+
+	return state | pkvm_getstate(prot);
 }
 
 static int __host_check_page_state_range(u64 addr, u64 size,
@@ -459,8 +865,8 @@ static int __host_check_page_state_range(u64 addr, u64 size,
 		.get_page_state	= host_get_page_state,
 	};
 
-	hyp_assert_lock_held(&host_kvm.lock);
-	return check_page_state_range(&host_kvm.pgt, addr, size, &d);
+	hyp_assert_lock_held(&host_mmu.lock);
+	return check_page_state_range(&host_mmu.pgt, addr, size, &d);
 }
 
 static int __host_set_page_state_range(u64 addr, u64 size,
@@ -468,7 +874,7 @@ static int __host_set_page_state_range(u64 addr, u64 size,
 {
 	enum kvm_pgtable_prot prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, state);
 
-	return host_stage2_idmap_locked(addr, size, prot);
+	return host_stage2_idmap_locked(addr, size, prot, true);
 }
 
 static int host_request_owned_transition(u64 *completer_addr,
@@ -511,12 +917,100 @@ static int host_initiate_unshare(u64 *completer_addr,
 	return __host_set_page_state_range(addr, size, PKVM_PAGE_OWNED);
 }
 
-static enum pkvm_page_state hyp_get_page_state(kvm_pte_t pte)
+static int host_initiate_donation(u64 *completer_addr,
+				  const struct pkvm_mem_transition *tx)
 {
+	enum pkvm_component_id owner_id = tx->completer.id;
+	u64 size = tx->nr_pages * PAGE_SIZE;
+
+	*completer_addr = tx->initiator.host.completer_addr;
+	return host_stage2_set_owner_locked(tx->initiator.addr, size, owner_id);
+}
+
+static bool __host_ack_skip_pgtable_check(const struct pkvm_mem_transition *tx)
+{
+	return !(IS_ENABLED(CONFIG_NVHE_EL2_DEBUG) ||
+		 tx->initiator.id != PKVM_ID_HYP);
+}
+
+static int __host_ack_transition(u64 addr, const struct pkvm_mem_transition *tx,
+				 enum pkvm_page_state state)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+
+	if (__host_ack_skip_pgtable_check(tx))
+		return 0;
+
+	return __host_check_page_state_range(addr, size, state);
+}
+
+static int host_ack_share(u64 addr, const struct pkvm_mem_transition *tx,
+			  enum kvm_pgtable_prot perms)
+{
+	if (perms != PKVM_HOST_MEM_PROT)
+		return -EPERM;
+
+	return __host_ack_transition(addr, tx, PKVM_NOPAGE);
+}
+
+static int host_ack_donation(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	return __host_ack_transition(addr, tx, PKVM_NOPAGE);
+}
+
+static int host_ack_unshare(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	return __host_ack_transition(addr, tx, PKVM_PAGE_SHARED_BORROWED);
+}
+
+static int host_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
+			       enum kvm_pgtable_prot perms)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	int err;
+
+	err = __host_set_page_state_range(addr, size, PKVM_PAGE_SHARED_BORROWED);
+	if (err)
+		return err;
+
+	if (tx->initiator.id == PKVM_ID_GUEST)
+		psci_mem_protect_dec(tx->nr_pages);
+
+	return 0;
+}
+
+static int host_complete_unshare(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	enum pkvm_component_id owner_id = tx->initiator.id;
+	u64 size = tx->nr_pages * PAGE_SIZE;
+
+	if (tx->initiator.id == PKVM_ID_GUEST)
+		psci_mem_protect_inc(tx->nr_pages);
+
+	return host_stage2_set_owner_locked(addr, size, owner_id);
+}
+
+static int host_complete_donation(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	enum pkvm_component_id host_id = tx->completer.id;
+
+	return host_stage2_set_owner_locked(addr, size, host_id);
+}
+
+static enum pkvm_page_state hyp_get_page_state(kvm_pte_t pte, u64 addr)
+{
+	enum pkvm_page_state state = 0;
+	enum kvm_pgtable_prot prot;
+
 	if (!kvm_pte_valid(pte))
 		return PKVM_NOPAGE;
 
-	return pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
+	prot = kvm_pgtable_hyp_pte_prot(pte);
+	if (kvm_pte_valid(pte) && ((prot & KVM_PGTABLE_PROT_RWX) != PAGE_HYP))
+		state = PKVM_PAGE_RESTRICTED_PROT;
+
+	return state | pkvm_getstate(prot);
 }
 
 static int __hyp_check_page_state_range(u64 addr, u64 size,
@@ -531,6 +1025,27 @@ static int __hyp_check_page_state_range(u64 addr, u64 size,
 	return check_page_state_range(&pkvm_pgtable, addr, size, &d);
 }
 
+static int hyp_request_donation(u64 *completer_addr,
+				const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	u64 addr = tx->initiator.addr;
+
+	*completer_addr = tx->initiator.hyp.completer_addr;
+	return __hyp_check_page_state_range(addr, size, PKVM_PAGE_OWNED);
+}
+
+static int hyp_initiate_donation(u64 *completer_addr,
+				 const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	int ret;
+
+	*completer_addr = tx->initiator.hyp.completer_addr;
+	ret = kvm_pgtable_hyp_unmap(&pkvm_pgtable, tx->initiator.addr, size);
+	return (ret != size) ? -EFAULT : 0;
+}
+
 static bool __hyp_ack_skip_pgtable_check(const struct pkvm_mem_transition *tx)
 {
 	return !(IS_ENABLED(CONFIG_NVHE_EL2_DEBUG) ||
@@ -555,6 +1070,9 @@ static int hyp_ack_unshare(u64 addr, const struct pkvm_mem_transition *tx)
 {
 	u64 size = tx->nr_pages * PAGE_SIZE;
 
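+	/* The host cannot unshare a page the hypervisor still has references on. */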
+	if (tx->initiator.id == PKVM_ID_HOST && hyp_page_count((void *)addr))
+		return -EBUSY;
+
 	if (__hyp_ack_skip_pgtable_check(tx))
 		return 0;
 
@@ -562,6 +1080,16 @@ static int hyp_ack_unshare(u64 addr, const struct pkvm_mem_transition *tx)
 					    PKVM_PAGE_SHARED_BORROWED);
 }
 
+static int hyp_ack_donation(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+
+	if (__hyp_ack_skip_pgtable_check(tx))
+		return 0;
+
+	return __hyp_check_page_state_range(addr, size, PKVM_NOPAGE);
+}
+
 static int hyp_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
 			      enum kvm_pgtable_prot perms)
 {
@@ -580,6 +1108,227 @@ static int hyp_complete_unshare(u64 addr, const struct pkvm_mem_transition *tx)
 	return (ret != size) ? -EFAULT : 0;
 }
 
+static int hyp_complete_donation(u64 addr,
+				 const struct pkvm_mem_transition *tx)
+{
+	void *start = (void *)addr, *end = start + (tx->nr_pages * PAGE_SIZE);
+	enum kvm_pgtable_prot prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_OWNED);
+
+	return pkvm_create_mappings_locked(start, end, prot);
+}
+
+static enum pkvm_page_state guest_get_page_state(kvm_pte_t pte, u64 addr)
+{
+	enum pkvm_page_state state = 0;
+	enum kvm_pgtable_prot prot;
+
+	if (!kvm_pte_valid(pte))
+		return PKVM_NOPAGE;
+
+	prot = kvm_pgtable_stage2_pte_prot(pte);
+	if (kvm_pte_valid(pte) && ((prot & KVM_PGTABLE_PROT_RWX) != KVM_PGTABLE_PROT_RWX))
+		state = PKVM_PAGE_RESTRICTED_PROT;
+
+	return state | pkvm_getstate(prot);
+}
+
+static int __guest_check_page_state_range(struct pkvm_hyp_vcpu *vcpu, u64 addr,
+					  u64 size, enum pkvm_page_state state)
+{
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+	struct check_walk_data d = {
+		.desired	= state,
+		.get_page_state	= guest_get_page_state,
+	};
+
+	hyp_assert_lock_held(&vm->lock);
+	return check_page_state_range(&vm->pgt, addr, size, &d);
+}
+
+static int guest_ack_share(u64 addr, const struct pkvm_mem_transition *tx,
+			   enum kvm_pgtable_prot perms)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+
+	if (perms != KVM_PGTABLE_PROT_RWX)
+		return -EPERM;
+
+	return __guest_check_page_state_range(tx->completer.guest.hyp_vcpu,
+					      addr, size, PKVM_NOPAGE);
+}
+
+static int guest_ack_donation(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+
+	return __guest_check_page_state_range(tx->completer.guest.hyp_vcpu,
+					      addr, size, PKVM_NOPAGE);
+}
+
+static int guest_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
+				enum kvm_pgtable_prot perms)
+{
+	struct pkvm_hyp_vcpu *vcpu = tx->completer.guest.hyp_vcpu;
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	enum kvm_pgtable_prot prot;
+
+	prot = pkvm_mkstate(perms, PKVM_PAGE_SHARED_BORROWED);
+	return kvm_pgtable_stage2_map(&vm->pgt, addr, size, tx->completer.guest.phys,
+				      prot, &vcpu->vcpu.arch.pkvm_memcache);
+}
+
+static int guest_complete_donation(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	enum kvm_pgtable_prot prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED);
+	struct pkvm_hyp_vcpu *vcpu = tx->completer.guest.hyp_vcpu;
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+	phys_addr_t phys = tx->completer.guest.phys;
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	int err;
+
+	if (tx->initiator.id == PKVM_ID_HOST)
+		psci_mem_protect_inc(tx->nr_pages);
+
+	if (pkvm_ipa_range_has_pvmfw(vm, addr, addr + size)) {
+		if (WARN_ON(!pkvm_hyp_vcpu_is_protected(vcpu))) {
+			err = -EPERM;
+			goto err_undo_psci;
+		}
+
+		WARN_ON(tx->initiator.id != PKVM_ID_HOST);
+		err = pkvm_load_pvmfw_pages(vm, addr, phys, size);
+		if (err)
+			goto err_undo_psci;
+	}
+
+	/*
+	 * If this fails, we effectively leak the pages since they're now
+	 * owned by the guest but not mapped into its stage-2 page-table.
+	 */
+	return kvm_pgtable_stage2_map(&vm->pgt, addr, size, phys, prot,
+				      &vcpu->vcpu.arch.pkvm_memcache);
+
+err_undo_psci:
+	if (tx->initiator.id == PKVM_ID_HOST)
+		psci_mem_protect_dec(tx->nr_pages);
+	return err;
+}
+
+static int __guest_get_completer_addr(u64 *completer_addr, phys_addr_t phys,
+				      const struct pkvm_mem_transition *tx)
+{
+	switch (tx->completer.id) {
+	case PKVM_ID_HOST:
+		*completer_addr = phys;
+		break;
+	case PKVM_ID_HYP:
+		*completer_addr = (u64)__hyp_va(phys);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int __guest_request_page_transition(u64 *completer_addr,
+					   const struct pkvm_mem_transition *tx,
+					   enum pkvm_page_state desired)
+{
+	struct pkvm_hyp_vcpu *vcpu = tx->initiator.guest.hyp_vcpu;
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+	enum pkvm_page_state state;
+	phys_addr_t phys;
+	kvm_pte_t pte;
+	u32 level;
+	int ret;
+
+	if (tx->nr_pages != 1)
+		return -E2BIG;
+
+	ret = kvm_pgtable_get_leaf(&vm->pgt, tx->initiator.addr, &pte, &level);
+	if (ret)
+		return ret;
+
+	state = guest_get_page_state(pte, tx->initiator.addr);
+	if (state == PKVM_NOPAGE)
+		return -EFAULT;
+
+	if (state != desired)
+		return -EPERM;
+
+	/*
+	 * We only deal with page granular mappings in the guest for now as
+	 * the pgtable code relies on being able to recreate page mappings
+	 * lazily after zapping a block mapping, which doesn't work once the
+	 * pages have been donated.
+	 */
+	if (level != KVM_PGTABLE_MAX_LEVELS - 1)
+		return -EINVAL;
+
+	phys = kvm_pte_to_phys(pte);
+	if (!addr_is_allowed_memory(phys))
+		return -EINVAL;
+
+	return __guest_get_completer_addr(completer_addr, phys, tx);
+}
+
+static int guest_request_share(u64 *completer_addr,
+			       const struct pkvm_mem_transition *tx)
+{
+	return __guest_request_page_transition(completer_addr, tx,
+					       PKVM_PAGE_OWNED);
+}
+
+static int guest_request_unshare(u64 *completer_addr,
+				 const struct pkvm_mem_transition *tx)
+{
+	return __guest_request_page_transition(completer_addr, tx,
+					       PKVM_PAGE_SHARED_OWNED);
+}
+
+static int __guest_initiate_page_transition(u64 *completer_addr,
+					    const struct pkvm_mem_transition *tx,
+					    enum pkvm_page_state state)
+{
+	struct pkvm_hyp_vcpu *vcpu = tx->initiator.guest.hyp_vcpu;
+	struct kvm_hyp_memcache *mc = &vcpu->vcpu.arch.pkvm_memcache;
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	u64 addr = tx->initiator.addr;
+	enum kvm_pgtable_prot prot;
+	phys_addr_t phys;
+	kvm_pte_t pte;
+	int ret;
+
+	ret = kvm_pgtable_get_leaf(&vm->pgt, addr, &pte, NULL);
+	if (ret)
+		return ret;
+
+	phys = kvm_pte_to_phys(pte);
+	prot = pkvm_mkstate(kvm_pgtable_stage2_pte_prot(pte), state);
+	ret = kvm_pgtable_stage2_map(&vm->pgt, addr, size, phys, prot, mc);
+	if (ret)
+		return ret;
+
+	return __guest_get_completer_addr(completer_addr, phys, tx);
+}
+
+static int guest_initiate_share(u64 *completer_addr,
+				const struct pkvm_mem_transition *tx)
+{
+	return __guest_initiate_page_transition(completer_addr, tx,
+						PKVM_PAGE_SHARED_OWNED);
+}
+
+static int guest_initiate_unshare(u64 *completer_addr,
+				  const struct pkvm_mem_transition *tx)
+{
+	return __guest_initiate_page_transition(completer_addr, tx,
+						PKVM_PAGE_OWNED);
+}
+
 static int check_share(struct pkvm_mem_share *share)
 {
 	const struct pkvm_mem_transition *tx = &share->tx;
@@ -590,6 +1339,9 @@ static int check_share(struct pkvm_mem_share *share)
 	case PKVM_ID_HOST:
 		ret = host_request_owned_transition(&completer_addr, tx);
 		break;
+	case PKVM_ID_GUEST:
+		ret = guest_request_share(&completer_addr, tx);
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -598,9 +1350,22 @@ static int check_share(struct pkvm_mem_share *share)
 		return ret;
 
 	switch (tx->completer.id) {
+	case PKVM_ID_HOST:
+		ret = host_ack_share(completer_addr, tx, share->completer_prot);
+		break;
 	case PKVM_ID_HYP:
 		ret = hyp_ack_share(completer_addr, tx, share->completer_prot);
 		break;
+	case PKVM_ID_GUEST:
+		ret = guest_ack_share(completer_addr, tx, share->completer_prot);
+		break;
+	case PKVM_ID_FFA:
+		/*
+		 * We only check the host; the secure side will check the other
+		 * end when we forward the FFA call.
+		 */
+		ret = 0;
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -618,6 +1383,9 @@ static int __do_share(struct pkvm_mem_share *share)
 	case PKVM_ID_HOST:
 		ret = host_initiate_share(&completer_addr, tx);
 		break;
+	case PKVM_ID_GUEST:
+		ret = guest_initiate_share(&completer_addr, tx);
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -626,9 +1394,22 @@ static int __do_share(struct pkvm_mem_share *share)
 		return ret;
 
 	switch (tx->completer.id) {
+	case PKVM_ID_HOST:
+		ret = host_complete_share(completer_addr, tx, share->completer_prot);
+		break;
 	case PKVM_ID_HYP:
 		ret = hyp_complete_share(completer_addr, tx, share->completer_prot);
 		break;
+	case PKVM_ID_GUEST:
+		ret = guest_complete_share(completer_addr, tx, share->completer_prot);
+		break;
+	case PKVM_ID_FFA:
+		/*
+		 * We're not responsible for any secure page-tables, so there's
+		 * nothing to do here.
+		 */
+		ret = 0;
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -666,6 +1447,9 @@ static int check_unshare(struct pkvm_mem_share *share)
 	case PKVM_ID_HOST:
 		ret = host_request_unshare(&completer_addr, tx);
 		break;
+	case PKVM_ID_GUEST:
+		ret = guest_request_unshare(&completer_addr, tx);
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -674,9 +1458,16 @@ static int check_unshare(struct pkvm_mem_share *share)
 		return ret;
 
 	switch (tx->completer.id) {
+	case PKVM_ID_HOST:
+		ret = host_ack_unshare(completer_addr, tx);
+		break;
 	case PKVM_ID_HYP:
 		ret = hyp_ack_unshare(completer_addr, tx);
 		break;
+	case PKVM_ID_FFA:
+		/* See check_share() */
+		ret = 0;
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -694,6 +1485,9 @@ static int __do_unshare(struct pkvm_mem_share *share)
 	case PKVM_ID_HOST:
 		ret = host_initiate_unshare(&completer_addr, tx);
 		break;
+	case PKVM_ID_GUEST:
+		ret = guest_initiate_unshare(&completer_addr, tx);
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -702,9 +1496,16 @@ static int __do_unshare(struct pkvm_mem_share *share)
 		return ret;
 
 	switch (tx->completer.id) {
+	case PKVM_ID_HOST:
+		ret = host_complete_unshare(completer_addr, tx);
+		break;
 	case PKVM_ID_HYP:
 		ret = hyp_complete_unshare(completer_addr, tx);
 		break;
+	case PKVM_ID_FFA:
+		/* See __do_share() */
+		ret = 0;
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -732,6 +1533,100 @@ static int do_unshare(struct pkvm_mem_share *share)
 	return WARN_ON(__do_unshare(share));
 }
 
+static int check_donation(struct pkvm_mem_donation *donation)
+{
+	const struct pkvm_mem_transition *tx = &donation->tx;
+	u64 completer_addr;
+	int ret;
+
+	switch (tx->initiator.id) {
+	case PKVM_ID_HOST:
+		ret = host_request_owned_transition(&completer_addr, tx);
+		break;
+	case PKVM_ID_HYP:
+		ret = hyp_request_donation(&completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		return ret;
+
+	switch (tx->completer.id) {
+	case PKVM_ID_HOST:
+		ret = host_ack_donation(completer_addr, tx);
+		break;
+	case PKVM_ID_HYP:
+		ret = hyp_ack_donation(completer_addr, tx);
+		break;
+	case PKVM_ID_GUEST:
+		ret = guest_ack_donation(completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+static int __do_donate(struct pkvm_mem_donation *donation)
+{
+	const struct pkvm_mem_transition *tx = &donation->tx;
+	u64 completer_addr;
+	int ret;
+
+	switch (tx->initiator.id) {
+	case PKVM_ID_HOST:
+		ret = host_initiate_donation(&completer_addr, tx);
+		break;
+	case PKVM_ID_HYP:
+		ret = hyp_initiate_donation(&completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		return ret;
+
+	switch (tx->completer.id) {
+	case PKVM_ID_HOST:
+		ret = host_complete_donation(completer_addr, tx);
+		break;
+	case PKVM_ID_HYP:
+		ret = hyp_complete_donation(completer_addr, tx);
+		break;
+	case PKVM_ID_GUEST:
+		ret = guest_complete_donation(completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+/*
+ * do_donate():
+ *
+ * The page owner transfers ownership to another component, losing access
+ * as a consequence.
+ *
+ * Initiator: OWNED	=> NOPAGE
+ * Completer: NOPAGE	=> OWNED
+ */
+static int do_donate(struct pkvm_mem_donation *donation)
+{
+	int ret;
+
+	ret = check_donation(donation);
+	if (ret)
+		return ret;
+
+	return WARN_ON(__do_donate(donation));
+}
+
 int __pkvm_host_share_hyp(u64 pfn)
 {
 	int ret;
@@ -765,6 +1660,70 @@ int __pkvm_host_share_hyp(u64 pfn)
 	return ret;
 }
 
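+/*
+ * __pkvm_guest_share_host():
+ *
+ * Share a page of a guest with the host, at the guest's request.
+ *
+ * Initiator (guest):	OWNED	=> SHARED_OWNED
+ * Completer (host):	NOPAGE	=> SHARED_BORROWED
+ */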
+int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *vcpu, u64 ipa)
+{
+	int ret;
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+	struct pkvm_mem_share share = {
+		.tx	= {
+			.nr_pages	= 1,
+			.initiator	= {
+				.id	= PKVM_ID_GUEST,
+				.addr	= ipa,
+				.guest	= {
+					.hyp_vcpu = vcpu,
+				},
+			},
+			.completer	= {
+				.id	= PKVM_ID_HOST,
+			},
+		},
+		.completer_prot	= PKVM_HOST_MEM_PROT,
+	};
+
+	host_lock_component();
+	guest_lock_component(vm);
+
+	ret = do_share(&share);
+
+	guest_unlock_component(vm);
+	host_unlock_component();
+
+	return ret;
+}
+
+int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *vcpu, u64 ipa)
+{
+	int ret;
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+	struct pkvm_mem_share share = {
+		.tx	= {
+			.nr_pages	= 1,
+			.initiator	= {
+				.id	= PKVM_ID_GUEST,
+				.addr	= ipa,
+				.guest	= {
+					.hyp_vcpu = vcpu,
+				},
+			},
+			.completer	= {
+				.id	= PKVM_ID_HOST,
+			},
+		},
+		.completer_prot	= PKVM_HOST_MEM_PROT,
+	};
+
+	host_lock_component();
+	guest_lock_component(vm);
+
+	ret = do_unshare(&share);
+
+	guest_unlock_component(vm);
+	host_unlock_component();
+
+	return ret;
+}
+
 int __pkvm_host_unshare_hyp(u64 pfn)
 {
 	int ret;
@@ -797,3 +1756,506 @@ int __pkvm_host_unshare_hyp(u64 pfn)
 
 	return ret;
 }
+
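+/*
+ * __pkvm_host_donate_hyp():
+ *
+ * Donate @nr_pages starting at @pfn from the host to the hypervisor. On
+ * success the host loses all access to the pages and the hypervisor
+ * becomes their owner.
+ */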
+int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages)
+{
+	int ret;
+	u64 host_addr = hyp_pfn_to_phys(pfn);
+	u64 hyp_addr = (u64)__hyp_va(host_addr);
+	struct pkvm_mem_donation donation = {
+		.tx	= {
+			.nr_pages	= nr_pages,
+			.initiator	= {
+				.id	= PKVM_ID_HOST,
+				.addr	= host_addr,
+				.host	= {
+					.completer_addr = hyp_addr,
+				},
+			},
+			.completer	= {
+				.id	= PKVM_ID_HYP,
+			},
+		},
+	};
+
+	host_lock_component();
+	hyp_lock_component();
+
+	ret = do_donate(&donation);
+
+	hyp_unlock_component();
+	host_unlock_component();
+
+	return ret;
+}
+
+int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages)
+{
+	int ret;
+	u64 host_addr = hyp_pfn_to_phys(pfn);
+	u64 hyp_addr = (u64)__hyp_va(host_addr);
+	struct pkvm_mem_donation donation = {
+		.tx	= {
+			.nr_pages	= nr_pages,
+			.initiator	= {
+				.id	= PKVM_ID_HYP,
+				.addr	= hyp_addr,
+				.hyp	= {
+					.completer_addr = host_addr,
+				},
+			},
+			.completer	= {
+				.id	= PKVM_ID_HOST,
+			},
+		},
+	};
+
+	host_lock_component();
+	hyp_lock_component();
+
+	ret = do_donate(&donation);
+
+	hyp_unlock_component();
+	host_unlock_component();
+
+	return ret;
+}
+
+static int restrict_host_page_perms(u64 addr, kvm_pte_t pte, u32 level, enum kvm_pgtable_prot prot)
+{
+	int ret = 0;
+
+	/* XXX: optimize ... */
+	if (kvm_pte_valid(pte) && (level == KVM_PGTABLE_MAX_LEVELS - 1))
+		ret = kvm_pgtable_stage2_unmap(&host_mmu.pgt, addr, PAGE_SIZE);
+	if (!ret)
+		ret = host_stage2_idmap_locked(addr, PAGE_SIZE, prot, false);
+
+	return ret;
+}
+
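+/*
+ * Change the host stage-2 protections of the page at @pfn on behalf of a
+ * module. RWX hands the page back to the host, an empty prot transfers
+ * ownership to PKVM_ID_PROTECTED, and anything in between restricts the
+ * host's permissions on a page it still owns. Pages with restricted
+ * permissions are flagged as MODULE_OWNED_PAGE; only such pages may
+ * later be relaxed back to RWX.
+ */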
+int module_change_host_page_prot(u64 pfn, enum kvm_pgtable_prot prot)
+{
+	u64 addr = hyp_pfn_to_phys(pfn);
+	struct hyp_page *page;
+	kvm_pte_t pte;
+	u32 level;
+	int ret;
+
+	if ((prot & KVM_PGTABLE_PROT_RWX) != prot || !addr_is_memory(addr))
+		return -EINVAL;
+
+	host_lock_component();
+	ret = kvm_pgtable_get_leaf(&host_mmu.pgt, addr, &pte, &level);
+	if (ret)
+		goto unlock;
+
+	ret = -EPERM;
+	page = hyp_phys_to_page(addr);
+
+	/*
+	 * Modules can only relax permissions of pages they own, and restrict
+	 * permissions of pristine pages.
+	 */
+	if (prot == KVM_PGTABLE_PROT_RWX) {
+		if (!(page->flags & MODULE_OWNED_PAGE))
+			goto unlock;
+	} else if (host_get_page_state(pte, addr) != PKVM_PAGE_OWNED) {
+		goto unlock;
+	}
+
+	if (prot == KVM_PGTABLE_PROT_RWX)
+		ret = host_stage2_set_owner_locked(addr, PAGE_SIZE, PKVM_ID_HOST);
+	else if (!prot)
+		ret = host_stage2_set_owner_locked(addr, PAGE_SIZE, PKVM_ID_PROTECTED);
+	else
+		ret = restrict_host_page_perms(addr, pte, level, prot);
+
+	if (ret)
+		goto unlock;
+
+	if (prot != KVM_PGTABLE_PROT_RWX)
+		page->flags |= MODULE_OWNED_PAGE;
+	else
+		page->flags &= ~MODULE_OWNED_PAGE;
+
+unlock:
+	host_unlock_component();
+
+	return ret;
+}
+
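+/*
+ * Pin a range of hypervisor VAs that has been shared by the host. The
+ * range must be in the SHARED_OWNED state on the host side and
+ * SHARED_BORROWED at EL2; the elevated page refcounts prevent the host
+ * from unsharing the pages until hyp_unpin_shared_mem() is called.
+ */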
+int hyp_pin_shared_mem(void *from, void *to)
+{
+	u64 cur, start = ALIGN_DOWN((u64)from, PAGE_SIZE);
+	u64 end = PAGE_ALIGN((u64)to);
+	u64 size = end - start;
+	int ret;
+
+	host_lock_component();
+	hyp_lock_component();
+
+	ret = __host_check_page_state_range(__hyp_pa(start), size,
+					    PKVM_PAGE_SHARED_OWNED);
+	if (ret)
+		goto unlock;
+
+	ret = __hyp_check_page_state_range(start, size,
+					   PKVM_PAGE_SHARED_BORROWED);
+	if (ret)
+		goto unlock;
+
+	for (cur = start; cur < end; cur += PAGE_SIZE)
+		hyp_page_ref_inc(hyp_virt_to_page(cur));
+
+unlock:
+	hyp_unlock_component();
+	host_unlock_component();
+
+	return ret;
+}
+
+void hyp_unpin_shared_mem(void *from, void *to)
+{
+	u64 cur, start = ALIGN_DOWN((u64)from, PAGE_SIZE);
+	u64 end = PAGE_ALIGN((u64)to);
+
+	host_lock_component();
+	hyp_lock_component();
+
+	for (cur = start; cur < end; cur += PAGE_SIZE)
+		hyp_page_ref_dec(hyp_virt_to_page(cur));
+
+	hyp_unlock_component();
+	host_unlock_component();
+}
+
+int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
+{
+	int ret;
+	u64 host_addr = hyp_pfn_to_phys(pfn);
+	u64 guest_addr = hyp_pfn_to_phys(gfn);
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+	struct pkvm_mem_share share = {
+		.tx	= {
+			.nr_pages	= 1,
+			.initiator	= {
+				.id	= PKVM_ID_HOST,
+				.addr	= host_addr,
+				.host	= {
+					.completer_addr = guest_addr,
+				},
+			},
+			.completer	= {
+				.id	= PKVM_ID_GUEST,
+				.guest	= {
+					.hyp_vcpu = vcpu,
+					.phys = host_addr,
+				},
+			},
+		},
+		.completer_prot	= KVM_PGTABLE_PROT_RWX,
+	};
+
+	host_lock_component();
+	guest_lock_component(vm);
+
+	ret = do_share(&share);
+
+	guest_unlock_component(vm);
+	host_unlock_component();
+
+	return ret;
+}
+
+int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
+{
+	int ret;
+	u64 host_addr = hyp_pfn_to_phys(pfn);
+	u64 guest_addr = hyp_pfn_to_phys(gfn);
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+	struct pkvm_mem_donation donation = {
+		.tx	= {
+			.nr_pages	= 1,
+			.initiator	= {
+				.id	= PKVM_ID_HOST,
+				.addr	= host_addr,
+				.host	= {
+					.completer_addr = guest_addr,
+				},
+			},
+			.completer	= {
+				.id	= PKVM_ID_GUEST,
+				.guest	= {
+					.hyp_vcpu = vcpu,
+					.phys = host_addr,
+				},
+			},
+		},
+	};
+
+	host_lock_component();
+	guest_lock_component(vm);
+
+	ret = do_donate(&donation);
+
+	guest_unlock_component(vm);
+	host_unlock_component();
+
+	return ret;
+}
+
+int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages)
+{
+	int ret;
+	struct pkvm_mem_share share = {
+		.tx	= {
+			.nr_pages	= nr_pages,
+			.initiator	= {
+				.id	= PKVM_ID_HOST,
+				.addr	= hyp_pfn_to_phys(pfn),
+			},
+			.completer	= {
+				.id	= PKVM_ID_FFA,
+			},
+		},
+	};
+
+	host_lock_component();
+	ret = do_share(&share);
+	host_unlock_component();
+
+	return ret;
+}
+
+int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages)
+{
+	int ret;
+	struct pkvm_mem_share share = {
+		.tx	= {
+			.nr_pages	= nr_pages,
+			.initiator	= {
+				.id	= PKVM_ID_HOST,
+				.addr	= hyp_pfn_to_phys(pfn),
+			},
+			.completer	= {
+				.id	= PKVM_ID_FFA,
+			},
+		},
+	};
+
+	host_lock_component();
+	ret = do_unshare(&share);
+	host_unlock_component();
+
+	return ret;
+}
+
+void hyp_poison_page(phys_addr_t phys)
+{
+	void *addr = hyp_fixmap_map(phys);
+
+	memset(addr, 0, PAGE_SIZE);
+	/*
+	 * Prefer kvm_flush_dcache_to_poc() over __clean_dcache_guest_page()
+	 * here as the latter may elide the CMO under the assumption that FWB
+	 * will be enabled on CPUs that support it. This is incorrect for the
+	 * host stage-2 and would otherwise lead to a malicious host potentially
+	 * being able to read the contents of newly reclaimed guest pages.
+	 */
+	kvm_flush_dcache_to_poc(addr, PAGE_SIZE);
+	hyp_fixmap_unmap();
+}
+
+void destroy_hyp_vm_pgt(struct pkvm_hyp_vm *vm)
+{
+	guest_lock_component(vm);
+	kvm_pgtable_stage2_destroy(&vm->pgt);
+	guest_unlock_component(vm);
+}
+
+void drain_hyp_pool(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
+{
+	void *addr = hyp_alloc_pages(&vm->pool, 0);
+
+	while (addr) {
+		memset(hyp_virt_to_page(addr), 0, sizeof(struct hyp_page));
+		push_hyp_memcache(mc, addr, hyp_virt_to_phys);
+		WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
+		addr = hyp_alloc_pages(&vm->pool, 0);
+	}
+}
+
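+/*
+ * Reclaim the guest page backing @ipa for the host: the page is unmapped
+ * from the guest's stage-2, poisoned if the guest owned it exclusively
+ * (so the host can never read its contents), and ownership is handed
+ * back to the host.
+ */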
+int __pkvm_host_reclaim_page(struct pkvm_hyp_vm *vm, u64 pfn, u64 ipa)
+{
+	phys_addr_t phys = hyp_pfn_to_phys(pfn);
+	kvm_pte_t pte;
+	int ret;
+
+	host_lock_component();
+	guest_lock_component(vm);
+
+	ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, NULL);
+	if (ret)
+		goto unlock;
+
+	if (!kvm_pte_valid(pte)) {
+		ret = -EINVAL;
+		goto unlock;
+	} else if (phys != kvm_pte_to_phys(pte)) {
+		ret = -EPERM;
+		goto unlock;
+	}
+
+	/* We could avoid TLB invalidation here, as it is done per VMID on the finalize path. */
+	WARN_ON(kvm_pgtable_stage2_unmap(&vm->pgt, ipa, PAGE_SIZE));
+
+	switch (guest_get_page_state(pte, ipa)) {
+	case PKVM_PAGE_OWNED:
+		WARN_ON(__host_check_page_state_range(phys, PAGE_SIZE, PKVM_NOPAGE));
+		hyp_poison_page(phys);
+		psci_mem_protect_dec(1);
+		break;
+	case PKVM_PAGE_SHARED_BORROWED:
+		WARN_ON(__host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_OWNED));
+		break;
+	case PKVM_PAGE_SHARED_OWNED:
+		WARN_ON(__host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_BORROWED));
+		break;
+	default:
+		BUG_ON(1);
+	}
+
+	WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HOST));
+
+unlock:
+	guest_unlock_component(vm);
+	host_unlock_component();
+
+	return ret;
+}
+
+/* Replace this with something more structured one day */
+#define MMIO_NOTE	(('M' << 24 | 'M' << 16 | 'I' << 8 | 'O') << 1)
+
+static bool __check_ioguard_page(struct pkvm_hyp_vcpu *hyp_vcpu, u64 ipa)
+{
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
+	kvm_pte_t pte;
+	u32 level;
+	int ret;
+
+	ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
+	if (ret)
+		return false;
+
+	/* Must be a PAGE_SIZE mapping with our annotation */
+	return (BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(level)) == PAGE_SIZE &&
+		pte == MMIO_NOTE);
+}
+
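+/*
+ * Annotate the guest stage-2 entry for @ipa (which must be page-aligned,
+ * with MMIO guard enabled for the VM) with MMIO_NOTE, marking it as
+ * guest-approved MMIO. Re-annotating an already-annotated page succeeds;
+ * -EBUSY is returned if the page is mapped to anything else.
+ */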
+int __pkvm_install_ioguard_page(struct pkvm_hyp_vcpu *hyp_vcpu, u64 ipa)
+{
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
+	kvm_pte_t pte;
+	u32 level;
+	int ret;
+
+	if (!test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vm->kvm.arch.flags))
+		return -EINVAL;
+
+	if (ipa & ~PAGE_MASK)
+		return -EINVAL;
+
+	guest_lock_component(vm);
+
+	ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
+	if (ret)
+		goto unlock;
+
+	if (pte && BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(level)) == PAGE_SIZE) {
+		/*
+		 * If the page is already flagged as MMIO, accept it;
+		 * otherwise fail.
+		 */
+		if (pte != MMIO_NOTE)
+			ret = -EBUSY;
+
+		goto unlock;
+	}
+
+	ret = kvm_pgtable_stage2_annotate(&vm->pgt, ipa, PAGE_SIZE,
+					  &hyp_vcpu->vcpu.arch.pkvm_memcache,
+					  MMIO_NOTE);
+
+unlock:
+	guest_unlock_component(vm);
+	return ret;
+}
+
+int __pkvm_remove_ioguard_page(struct pkvm_hyp_vcpu *hyp_vcpu, u64 ipa)
+{
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
+
+	if (!test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vm->kvm.arch.flags))
+		return -EINVAL;
+
+	guest_lock_component(vm);
+
+	if (__check_ioguard_page(hyp_vcpu, ipa))
+		WARN_ON(kvm_pgtable_stage2_unmap(&vm->pgt,
+				ALIGN_DOWN(ipa, PAGE_SIZE), PAGE_SIZE));
+
+	guest_unlock_component(vm);
+	return 0;
+}
+
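+/*
+ * Check that a faulting data abort targets an IPA that the guest has
+ * previously registered as MMIO. If the access straddles a page
+ * boundary, both pages must have been registered. VMs without MMIO
+ * guard enabled always pass the check.
+ */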
+bool __pkvm_check_ioguard_page(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
+	u64 ipa, end;
+	bool ret;
+
+	if (!kvm_vcpu_dabt_isvalid(&hyp_vcpu->vcpu))
+		return false;
+
+	if (!test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vm->kvm.arch.flags))
+		return true;
+
+	ipa  = kvm_vcpu_get_fault_ipa(&hyp_vcpu->vcpu);
+	ipa |= kvm_vcpu_get_hfar(&hyp_vcpu->vcpu) & FAR_MASK;
+	end = ipa + kvm_vcpu_dabt_get_as(&hyp_vcpu->vcpu) - 1;
+
+	guest_lock_component(vm);
+	ret = __check_ioguard_page(hyp_vcpu, ipa);
+	if ((end & PAGE_MASK) != (ipa & PAGE_MASK))
+		ret &= __check_ioguard_page(hyp_vcpu, end);
+	guest_unlock_component(vm);
+
+	return ret;
+}
+
+int host_stage2_protect_pages_locked(phys_addr_t addr, u64 size)
+{
+	int ret;
+
+	hyp_assert_lock_held(&host_mmu.lock);
+
+	ret = __host_check_page_state_range(addr, size, PKVM_PAGE_OWNED);
+	if (!ret)
+		ret = host_stage2_set_owner_locked(addr, size, PKVM_ID_PROTECTED);
+
+	return ret;
+}
+
+int host_stage2_get_leaf(phys_addr_t phys, kvm_pte_t *ptep, u32 *level)
+{
+	int ret;
+
+	host_lock_component();
+	ret = kvm_pgtable_get_leaf(&host_mmu.pgt, phys, ptep, level);
+	host_unlock_component();
+
+	return ret;
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 96193cb..76f26dc 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -14,7 +14,9 @@
 #include <nvhe/early_alloc.h>
 #include <nvhe/gfp.h>
 #include <nvhe/memory.h>
+#include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
+#include <nvhe/modules.h>
 #include <nvhe/spinlock.h>
 
 struct kvm_pgtable pkvm_pgtable;
@@ -23,7 +25,14 @@ hyp_spinlock_t pkvm_pgd_lock;
 struct memblock_region hyp_memory[HYP_MEMBLOCK_REGIONS];
 unsigned int hyp_memblock_nr;
 
-static u64 __io_map_base;
+static u64 __private_range_base;
+static u64 __private_range_cur;
+
+struct hyp_fixmap_slot {
+	u64 addr;
+	kvm_pte_t *ptep;
+};
+static DEFINE_PER_CPU(struct hyp_fixmap_slot, fixmap_slots);
 
 static int __pkvm_create_mappings(unsigned long start, unsigned long size,
 				  unsigned long phys, enum kvm_pgtable_prot prot)
@@ -42,29 +51,29 @@ static int __pkvm_create_mappings(unsigned long start, unsigned long size,
  * @size:	The size of the VA range to reserve.
  * @haddr:	The hypervisor virtual start address of the allocation.
  *
- * The private virtual address (VA) range is allocated above __io_map_base
+ * The private virtual address (VA) range is allocated above __private_range_base
  * and aligned based on the order of @size.
  *
  * Return: 0 on success or negative error code on failure.
  */
 int pkvm_alloc_private_va_range(size_t size, unsigned long *haddr)
 {
-	unsigned long base, addr;
+	unsigned long cur, addr;
 	int ret = 0;
 
 	hyp_spin_lock(&pkvm_pgd_lock);
 
 	/* Align the allocation based on the order of its size */
-	addr = ALIGN(__io_map_base, PAGE_SIZE << get_order(size));
+	addr = ALIGN(__private_range_cur, PAGE_SIZE << get_order(size));
 
 	/* The allocated size is always a multiple of PAGE_SIZE */
-	base = addr + PAGE_ALIGN(size);
+	cur = addr + PAGE_ALIGN(size);
 
-	/* Are we overflowing on the vmemmap ? */
-	if (!addr || base > __hyp_vmemmap)
+	/* Has the private range grown too large? */
+	if (!addr || cur > __hyp_vmemmap || (cur - __private_range_base) > __PKVM_PRIVATE_SZ) {
 		ret = -ENOMEM;
-	else {
-		__io_map_base = base;
+	} else {
+		__private_range_cur = cur;
 		*haddr = addr;
 	}
 
@@ -93,6 +102,72 @@ int __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
 	return err;
 }
 
+#ifdef CONFIG_NVHE_EL2_DEBUG
+static unsigned long mod_range_start = ULONG_MAX;
+static unsigned long mod_range_end;
+static DEFINE_HYP_SPINLOCK(mod_range_lock);
+
+static void update_mod_range(unsigned long addr, size_t size)
+{
+	hyp_spin_lock(&mod_range_lock);
+	mod_range_start = min(mod_range_start, addr);
+	mod_range_end = max(mod_range_end, addr + size);
+	hyp_spin_unlock(&mod_range_lock);
+}
+
+static void assert_in_mod_range(unsigned long addr)
+{
+	/*
+	 * This is not entirely watertight if there are private range
+	 * allocations between modules being loaded, but in practice those are
+	 * likely to be allocations initiated by the modules themselves.
+	 */
+	hyp_spin_lock(&mod_range_lock);
+	WARN_ON(addr < mod_range_start || mod_range_end <= addr);
+	hyp_spin_unlock(&mod_range_lock);
+}
+#else
+static inline void update_mod_range(unsigned long addr, size_t size) { }
+static inline void assert_in_mod_range(unsigned long addr) { }
+#endif
+
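+/*
+ * Allocate @nr_pages worth of private VA space for a module, recording
+ * the range so that assert_in_mod_range() can sanity-check subsequent
+ * module mappings. Returns NULL if the private range is exhausted.
+ */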
+void *__pkvm_alloc_module_va(u64 nr_pages)
+{
+	size_t size = nr_pages << PAGE_SHIFT;
+	unsigned long addr = 0;
+
+	if (!pkvm_alloc_private_va_range(size, &addr))
+		update_mod_range(addr, size);
+
+	return (void *)addr;
+}
+
+int __pkvm_map_module_page(u64 pfn, void *va, enum kvm_pgtable_prot prot, bool is_protected)
+{
+	unsigned long addr = (unsigned long)va;
+	int ret;
+
+	assert_in_mod_range(addr);
+
+	if (!is_protected) {
+		ret = __pkvm_host_donate_hyp(pfn, 1);
+		if (ret)
+			return ret;
+	}
+
+	ret = __pkvm_create_mappings(addr, PAGE_SIZE, hyp_pfn_to_phys(pfn), prot);
+	if (ret && !is_protected)
+		WARN_ON(__pkvm_hyp_donate_host(pfn, 1));
+
+	return ret;
+}
+
+void __pkvm_unmap_module_page(u64 pfn, void *va)
+{
+	WARN_ON(__pkvm_hyp_donate_host(pfn, 1));
+	pkvm_remove_mappings(va, va + PAGE_SIZE);
+}
+
 int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot)
 {
 	unsigned long start = (unsigned long)from;
@@ -129,13 +204,45 @@ int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
 	return ret;
 }
 
-int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back)
+void pkvm_remove_mappings(void *from, void *to)
 {
-	unsigned long start, end;
+	unsigned long size = (unsigned long)to - (unsigned long)from;
 
-	hyp_vmemmap_range(phys, size, &start, &end);
+	hyp_spin_lock(&pkvm_pgd_lock);
+	WARN_ON(kvm_pgtable_hyp_unmap(&pkvm_pgtable, (u64)from, size) != size);
+	hyp_spin_unlock(&pkvm_pgd_lock);
+}
 
-	return __pkvm_create_mappings(start, end - start, back, PAGE_HYP);
+int hyp_back_vmemmap(phys_addr_t back)
+{
+	unsigned long i, start, size, end = 0;
+	int ret;
+
+	for (i = 0; i < hyp_memblock_nr; i++) {
+		start = hyp_memory[i].base;
+		start = ALIGN_DOWN((u64)hyp_phys_to_page(start), PAGE_SIZE);
+		/*
+		 * The beginning of the hyp_vmemmap region for the current
+		 * memblock may already be backed by the page backing the end
+		 * of the previous region, so avoid mapping it twice.
+		 */
+		start = max(start, end);
+
+		end = hyp_memory[i].base + hyp_memory[i].size;
+		end = PAGE_ALIGN((u64)hyp_phys_to_page(end));
+		if (start >= end)
+			continue;
+
+		size = end - start;
+		ret = __pkvm_create_mappings(start, size, back, PAGE_HYP);
+		if (ret)
+			return ret;
+
+		memset(hyp_phys_to_virt(back), 0, size);
+		back += size;
+	}
+
+	return 0;
 }
 
 static void *__hyp_bp_vect_base;
@@ -189,6 +296,103 @@ int hyp_map_vectors(void)
 	return 0;
 }
 
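+/*
+ * Map @phys at the current CPU's private fixmap slot and return the VA
+ * at which it can be accessed. Must be paired with hyp_fixmap_unmap()
+ * on the same CPU before another page can be mapped.
+ */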
+void *hyp_fixmap_map(phys_addr_t phys)
+{
+	struct hyp_fixmap_slot *slot = this_cpu_ptr(&fixmap_slots);
+	kvm_pte_t pte, *ptep = slot->ptep;
+
+	pte = *ptep;
+	pte &= ~kvm_phys_to_pte(KVM_PHYS_INVALID);
+	pte |= kvm_phys_to_pte(phys) | KVM_PTE_VALID;
+	WRITE_ONCE(*ptep, pte);
+	dsb(ishst);
+
+	return (void *)slot->addr + offset_in_page(phys);
+}
+
+static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
+{
+	kvm_pte_t *ptep = slot->ptep;
+	u64 addr = slot->addr;
+
+	WRITE_ONCE(*ptep, *ptep & ~KVM_PTE_VALID);
+
+	/*
+	 * Irritatingly, the architecture requires that we use inner-shareable
+	 * broadcast TLB invalidation here in case another CPU speculates
+	 * through our fixmap and decides to create an "amalgamation of the
+	 * values held in the TLB" due to the apparent lack of a
+	 * break-before-make sequence.
+	 *
+	 * https://lore.kernel.org/kvm/20221017115209.2099-1-will@kernel.org/T/#mf10dfbaf1eaef9274c581b81c53758918c1d0f03
+	 */
+	dsb(ishst);
+	__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), (KVM_PGTABLE_MAX_LEVELS - 1));
+	dsb(ish);
+	isb();
+}
+
+void hyp_fixmap_unmap(void)
+{
+	fixmap_clear_slot(this_cpu_ptr(&fixmap_slots));
+}
+
+static int __create_fixmap_slot_cb(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+				   enum kvm_pgtable_walk_flags flag,
+				   void * const arg)
+{
+	struct hyp_fixmap_slot *slot = per_cpu_ptr(&fixmap_slots, (u64)arg);
+
+	if (!kvm_pte_valid(*ptep) || level != KVM_PGTABLE_MAX_LEVELS - 1)
+		return -EINVAL;
+
+	slot->addr = addr;
+	slot->ptep = ptep;
+
+	/*
+	 * Clear the PTE, but keep the page-table page refcount elevated to
+	 * prevent it from ever being freed. This lets us manipulate the PTEs
+	 * by hand safely without ever needing to allocate memory.
+	 */
+	fixmap_clear_slot(slot);
+
+	return 0;
+}
+
+static int create_fixmap_slot(u64 addr, u64 cpu)
+{
+	struct kvm_pgtable_walker walker = {
+		.cb	= __create_fixmap_slot_cb,
+		.flags	= KVM_PGTABLE_WALK_LEAF,
+		.arg	= (void *)cpu,
+	};
+
+	return kvm_pgtable_walk(&pkvm_pgtable, addr, PAGE_SIZE, &walker);
+}
+
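+/*
+ * Create one private fixmap slot per CPU. A dummy page is mapped at
+ * each slot so that the walker can capture (and then invalidate) the
+ * slot's PTE, leaving the last-level page-table page permanently
+ * allocated for hyp_fixmap_map() to repopulate by hand.
+ */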
+int hyp_create_pcpu_fixmap(void)
+{
+	unsigned long addr, i;
+	int ret;
+
+	for (i = 0; i < hyp_nr_cpus; i++) {
+		ret = pkvm_alloc_private_va_range(PAGE_SIZE, &addr);
+		if (ret)
+			return ret;
+
+		ret = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, PAGE_SIZE,
+					  __hyp_pa(__hyp_bss_start), PAGE_HYP);
+		if (ret)
+			return ret;
+
+		ret = create_fixmap_slot(addr, i);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 int hyp_create_idmap(u32 hyp_va_bits)
 {
 	unsigned long start, end;
@@ -207,9 +411,43 @@ int hyp_create_idmap(u32 hyp_va_bits)
 	 * with the idmap to place the IOs and the vmemmap. IOs use the lower
 	 * half of the quarter and the vmemmap the upper half.
 	 */
-	__io_map_base = start & BIT(hyp_va_bits - 2);
-	__io_map_base ^= BIT(hyp_va_bits - 2);
-	__hyp_vmemmap = __io_map_base | BIT(hyp_va_bits - 3);
+	__private_range_base = start & BIT(hyp_va_bits - 2);
+	__private_range_base ^= BIT(hyp_va_bits - 2);
+	__private_range_cur = __private_range_base;
+	__hyp_vmemmap = __private_range_base | BIT(hyp_va_bits - 3);
 
 	return __pkvm_create_mappings(start, end - start, start, PAGE_HYP_EXEC);
 }
+
+static void *admit_host_page(void *arg)
+{
+	struct kvm_hyp_memcache *host_mc = arg;
+
+	if (!host_mc->nr_pages)
+		return NULL;
+
+	/*
+	 * The host still owns the pages in its memcache, so we need to go
+	 * through a full host-to-hyp donation cycle to change it. Fortunately,
+	 * __pkvm_host_donate_hyp() takes care of races for us, so if it
+	 * succeeds we're good to go.
+	 */
+	if (__pkvm_host_donate_hyp(hyp_phys_to_pfn(host_mc->head), 1))
+		return NULL;
+
+	return pop_hyp_memcache(host_mc, hyp_phys_to_virt);
+}
+
+/* Refill our local memcache by popping pages from the one provided by the host. */
+int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
+		    struct kvm_hyp_memcache *host_mc)
+{
+	struct kvm_hyp_memcache tmp = *host_mc;
+	int ret;
+
+	ret = __topup_hyp_memcache(mc, min_pages, admit_host_page,
+				   hyp_virt_to_phys, &tmp);
+	*host_mc = tmp;
+
+	return ret;
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/module.lds.S b/arch/arm64/kvm/hyp/nvhe/module.lds.S
new file mode 100644
index 0000000..645080c
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/module.lds.S
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <asm/hyp_image.h>
+#include <asm/page-def.h>
+
+SECTIONS {
+	.hyp.text : {
+		HYP_SECTION_SYMBOL_NAME(.text) = .;
+		*(.text .text.*)
+	}
+
+	.hyp.bss : {
+		HYP_SECTION_SYMBOL_NAME(.bss) = .;
+		*(.bss .bss.*)
+	}
+
+	.hyp.rodata : {
+		HYP_SECTION_SYMBOL_NAME(.rodata) = .;
+		*(.rodata .rodata.*)
+	}
+
+	.hyp.data : {
+		HYP_SECTION_SYMBOL_NAME(.data) = .;
+		*(.data .data.*)
+	}
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/modules.c b/arch/arm64/kvm/hyp/nvhe/modules.c
new file mode 100644
index 0000000..7eda072
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/modules.c
@@ -0,0 +1,162 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2022 Google LLC
+ */
+#include <asm/kvm_host.h>
+#include <asm/kvm_pkvm_module.h>
+
+#include <nvhe/mem_protect.h>
+#include <nvhe/modules.h>
+#include <nvhe/mm.h>
+#include <nvhe/serial.h>
+#include <nvhe/spinlock.h>
+#include <nvhe/trap_handler.h>
+
+static void __kvm_flush_dcache_to_poc(void *addr, size_t size)
+{
+	kvm_flush_dcache_to_poc((unsigned long)addr, (unsigned long)size);
+}
+
+static atomic_t early_lm_pages;
+static void *__pkvm_linear_map_early(phys_addr_t phys, size_t size, enum kvm_pgtable_prot prot)
+{
+	void *addr = NULL;
+	int ret;
+
+	if (!PAGE_ALIGNED(phys) || !PAGE_ALIGNED(size))
+		return NULL;
+
+	addr = __hyp_va(phys);
+	ret = pkvm_create_mappings(addr, addr + size, prot);
+	if (ret)
+		addr = NULL;
+	else
+		atomic_add(size, &early_lm_pages);
+
+	return addr;
+}
+
+static void __pkvm_linear_unmap_early(void *addr, size_t size)
+{
+	pkvm_remove_mappings(addr, addr + size);
+	atomic_sub(size, &early_lm_pages);
+}
+
+int __pkvm_close_module_registration(void)
+{
+	/*
+	 * Page ownership tracking might go out of sync if there are stale
+	 * entries in pKVM's linear map range, so they must really be gone by
+	 * now.
+	 */
+	WARN_ON(atomic_read(&early_lm_pages));
+	/* The fuse is blown! No way back until reset. */
+	return reset_pkvm_priv_hcall_limit();
+}
+
+const struct pkvm_module_ops module_ops = {
+	.create_private_mapping = __pkvm_create_private_mapping,
+	.alloc_module_va = __pkvm_alloc_module_va,
+	.map_module_page = __pkvm_map_module_page,
+	.register_serial_driver = __pkvm_register_serial_driver,
+	.puts = hyp_puts,
+	.putx64 = hyp_putx64,
+	.fixmap_map = hyp_fixmap_map,
+	.fixmap_unmap = hyp_fixmap_unmap,
+	.linear_map_early = __pkvm_linear_map_early,
+	.linear_unmap_early = __pkvm_linear_unmap_early,
+	.flush_dcache_to_poc = __kvm_flush_dcache_to_poc,
+	.register_host_perm_fault_handler = hyp_register_host_perm_fault_handler,
+	.host_stage2_mod_prot = module_change_host_page_prot,
+	.host_stage2_get_leaf = host_stage2_get_leaf,
+	.register_host_smc_handler = __pkvm_register_host_smc_handler,
+	.register_default_trap_handler = __pkvm_register_default_trap_handler,
+	.register_illegal_abt_notifier = __pkvm_register_illegal_abt_notifier,
+	.register_psci_notifier = __pkvm_register_psci_notifier,
+	.register_hyp_panic_notifier = __pkvm_register_hyp_panic_notifier,
+	.host_donate_hyp = __pkvm_host_donate_hyp,
+	.hyp_donate_host = __pkvm_hyp_donate_host,
+	.memcpy = memcpy,
+	.memset = memset,
+	.hyp_pa = hyp_virt_to_phys,
+	.hyp_va = hyp_phys_to_virt,
+	.kern_hyp_va = __kern_hyp_va,
+};
+
+int __pkvm_init_module(void *module_init)
+{
+	int (*do_module_init)(const struct pkvm_module_ops *ops) = module_init;
+
+	return do_module_init(&module_ops);
+}
+
+#define MAX_DYNAMIC_HCALLS 128
+
+atomic_t num_dynamic_hcalls = ATOMIC_INIT(0);
+DEFINE_HYP_SPINLOCK(dyn_hcall_lock);
+
+static dyn_hcall_t host_dynamic_hcalls[MAX_DYNAMIC_HCALLS];
+
+int handle_host_dynamic_hcall(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(unsigned long, id, host_ctxt, 0);
+	dyn_hcall_t hfn;
+	int dyn_id;
+
+	/*
+	 * TODO: static key to protect when no dynamic hcall is registered?
+	 */
+
+	dyn_id = (int)(id - KVM_HOST_SMCCC_ID(0)) -
+		 __KVM_HOST_SMCCC_FUNC___dynamic_hcalls;
+	if (dyn_id < 0)
+		return HCALL_UNHANDLED;
+
+	cpu_reg(host_ctxt, 0) = SMCCC_RET_NOT_SUPPORTED;
+
+	/*
+	 * Order access to num_dynamic_hcalls and host_dynamic_hcalls. Paired
+	 * with __pkvm_register_hcall().
+	 */
+	if (dyn_id >= atomic_read_acquire(&num_dynamic_hcalls))
+		goto end;
+
+	hfn = READ_ONCE(host_dynamic_hcalls[dyn_id]);
+	if (!hfn)
+		goto end;
+
+	cpu_reg(host_ctxt, 0) = SMCCC_RET_SUCCESS;
+	hfn(host_ctxt);
+end:
+	return HCALL_HANDLED;
+}
+
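+/*
+ * Register a module-provided hypercall handler and return the function
+ * ID (based at __KVM_HOST_SMCCC_FUNC___dynamic_hcalls) that the host
+ * must use to invoke it, or -ENOMEM if the table of dynamic hcalls is
+ * full.
+ */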
+int __pkvm_register_hcall(unsigned long hvn_hyp_va)
+{
+	dyn_hcall_t hfn = (void *)hvn_hyp_va;
+	int reserved_id, ret;
+
+	hyp_spin_lock(&dyn_hcall_lock);
+
+	reserved_id = atomic_read(&num_dynamic_hcalls);
+
+	if (reserved_id >= MAX_DYNAMIC_HCALLS) {
+		ret = -ENOMEM;
+		goto err_hcall_unlock;
+	}
+
+	WRITE_ONCE(host_dynamic_hcalls[reserved_id], hfn);
+
+	/*
+	 * Order access to num_dynamic_hcalls and host_dynamic_hcalls. Paired
+	 * with handle_host_dynamic_hcall.
+	 */
+	atomic_set_release(&num_dynamic_hcalls, reserved_id + 1);
+
+	ret = reserved_id + __KVM_HOST_SMCCC_FUNC___dynamic_hcalls;
+err_hcall_unlock:
+	hyp_spin_unlock(&dyn_hcall_lock);
+
+	return ret;
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index d40f0b3..11b190f 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -32,7 +32,7 @@ u64 __hyp_vmemmap;
  */
 static struct hyp_page *__find_buddy_nocheck(struct hyp_pool *pool,
 					     struct hyp_page *p,
-					     unsigned short order)
+					     u8 order)
 {
 	phys_addr_t addr = hyp_page_to_phys(p);
 
@@ -51,7 +51,7 @@ static struct hyp_page *__find_buddy_nocheck(struct hyp_pool *pool,
 /* Find a buddy page currently available for allocation */
 static struct hyp_page *__find_buddy_avail(struct hyp_pool *pool,
 					   struct hyp_page *p,
-					   unsigned short order)
+					   u8 order)
 {
 	struct hyp_page *buddy = __find_buddy_nocheck(pool, p, order);
 
@@ -93,11 +93,16 @@ static inline struct hyp_page *node_to_page(struct list_head *node)
 static void __hyp_attach_page(struct hyp_pool *pool,
 			      struct hyp_page *p)
 {
-	unsigned short order = p->order;
+	phys_addr_t phys = hyp_page_to_phys(p);
 	struct hyp_page *buddy;
+	u8 order = p->order;
 
 	memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
 
+	/* Skip coalescing for 'external' pages being freed into the pool. */
+	if (phys < pool->range_start || phys >= pool->range_end)
+		goto insert;
+
 	/*
 	 * Only the first struct hyp_page of a high-order page (otherwise known
 	 * as the 'head') should have p->order set. The non-head pages should
@@ -116,6 +121,7 @@ static void __hyp_attach_page(struct hyp_pool *pool,
 		p = min(p, buddy);
 	}
 
+insert:
 	/* Mark the new head, and insert it */
 	p->order = order;
 	page_add_to_list(p, &pool->free_area[order]);
@@ -123,7 +129,7 @@ static void __hyp_attach_page(struct hyp_pool *pool,
 
 static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
 					   struct hyp_page *p,
-					   unsigned short order)
+					   u8 order)
 {
 	struct hyp_page *buddy;
 
@@ -144,25 +150,6 @@ static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
 	return p;
 }
 
-static inline void hyp_page_ref_inc(struct hyp_page *p)
-{
-	BUG_ON(p->refcount == USHRT_MAX);
-	p->refcount++;
-}
-
-static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
-{
-	BUG_ON(!p->refcount);
-	p->refcount--;
-	return (p->refcount == 0);
-}
-
-static inline void hyp_set_page_refcounted(struct hyp_page *p)
-{
-	BUG_ON(p->refcount);
-	p->refcount = 1;
-}
-
 static void __hyp_put_page(struct hyp_pool *pool, struct hyp_page *p)
 {
 	if (hyp_page_ref_dec_and_test(p))
@@ -196,7 +183,7 @@ void hyp_get_page(struct hyp_pool *pool, void *addr)
 
 void hyp_split_page(struct hyp_page *p)
 {
-	unsigned short order = p->order;
+	u8 order = p->order;
 	unsigned int i;
 
 	p->order = 0;
@@ -208,10 +195,10 @@ void hyp_split_page(struct hyp_page *p)
 	}
 }
 
-void *hyp_alloc_pages(struct hyp_pool *pool, unsigned short order)
+void *hyp_alloc_pages(struct hyp_pool *pool, u8 order)
 {
-	unsigned short i = order;
 	struct hyp_page *p;
+	u8 i = order;
 
 	hyp_spin_lock(&pool->lock);
 
@@ -249,10 +236,8 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
 
 	/* Init the vmemmap portion */
 	p = hyp_phys_to_page(phys);
-	for (i = 0; i < nr_pages; i++) {
-		p[i].order = 0;
+	for (i = 0; i < nr_pages; i++)
 		hyp_set_page_refcounted(&p[i]);
-	}
 
 	/* Attach the unused pages to the buddy tree */
 	for (i = reserved_pages; i < nr_pages; i++)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 85d3b7a..f160bc4 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -6,9 +6,30 @@
 
 #include <linux/kvm_host.h>
 #include <linux/mm.h>
-#include <nvhe/fixed_config.h>
+
+#include <kvm/arm_hypercalls.h>
+#include <kvm/arm_psci.h>
+
+#include <asm/kvm_emulate.h>
+
+#include <nvhe/mem_protect.h>
+#include <nvhe/memory.h>
+#include <nvhe/mm.h>
+#include <nvhe/pkvm.h>
 #include <nvhe/trap_handler.h>
 
+/* Used by icache_is_vpipt(). */
+unsigned long __icache_flags;
+
+/* Used by kvm_get_vttbr(). */
+unsigned int kvm_arm_vmid_bits;
+
+/*
+ * The currently loaded hyp vCPU for each physical CPU. Used only when
+ * protected KVM is enabled, but for both protected and non-protected VMs.
+ */
+static DEFINE_PER_CPU(struct pkvm_hyp_vcpu *, loaded_hyp_vcpu);
+
 /*
  * Set trap register values based on features in ID_AA64PFR0.
  */
@@ -94,7 +115,7 @@ static void pvm_init_traps_aa64dfr0(struct kvm_vcpu *vcpu)
 
 	/* Trap Debug */
 	if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_DebugVer), feature_ids))
-		mdcr_set |= MDCR_EL2_TDRA | MDCR_EL2_TDA | MDCR_EL2_TDE;
+		mdcr_set |= MDCR_EL2_TDRA | MDCR_EL2_TDA;
 
 	/* Trap OS Double Lock */
 	if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_DoubleLock), feature_ids))
@@ -154,32 +175,1352 @@ static void pvm_init_traps_aa64mmfr1(struct kvm_vcpu *vcpu)
  */
 static void pvm_init_trap_regs(struct kvm_vcpu *vcpu)
 {
-	const u64 hcr_trap_feat_regs = HCR_TID3;
-	const u64 hcr_trap_impdef = HCR_TACR | HCR_TIDCP | HCR_TID1;
-
 	/*
 	 * Always trap:
 	 * - Feature id registers: to control features exposed to guests
 	 * - Implementation-defined features
 	 */
-	vcpu->arch.hcr_el2 |= hcr_trap_feat_regs | hcr_trap_impdef;
+	vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS |
+			     HCR_TID3 | HCR_TACR | HCR_TIDCP | HCR_TID1;
 
-	/* Clear res0 and set res1 bits to trap potential new features. */
-	vcpu->arch.hcr_el2 &= ~(HCR_RES0);
-	vcpu->arch.mdcr_el2 &= ~(MDCR_EL2_RES0);
-	vcpu->arch.cptr_el2 |= CPTR_NVHE_EL2_RES1;
-	vcpu->arch.cptr_el2 &= ~(CPTR_NVHE_EL2_RES0);
+	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
+		/* route synchronous external abort exceptions to EL2 */
+		vcpu->arch.hcr_el2 |= HCR_TEA;
+		/* trap error record accesses */
+		vcpu->arch.hcr_el2 |= HCR_TERR;
+	}
+
+	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
+		vcpu->arch.hcr_el2 |= HCR_FWB;
+
+	if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE))
+		vcpu->arch.hcr_el2 |= HCR_TID2;
 }
 
 /*
  * Initialize trap register values for protected VMs.
  */
-void __pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu)
+static void pkvm_vcpu_init_traps(struct pkvm_hyp_vcpu *hyp_vcpu)
 {
-	pvm_init_trap_regs(vcpu);
-	pvm_init_traps_aa64pfr0(vcpu);
-	pvm_init_traps_aa64pfr1(vcpu);
-	pvm_init_traps_aa64dfr0(vcpu);
-	pvm_init_traps_aa64mmfr0(vcpu);
-	pvm_init_traps_aa64mmfr1(vcpu);
+	hyp_vcpu->vcpu.arch.cptr_el2 = CPTR_EL2_DEFAULT;
+	hyp_vcpu->vcpu.arch.mdcr_el2 = 0;
+
+	if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
+		u64 hcr = READ_ONCE(hyp_vcpu->host_vcpu->arch.hcr_el2);
+
+		hyp_vcpu->vcpu.arch.hcr_el2 = HCR_GUEST_FLAGS | hcr;
+		return;
+	}
+
+	pvm_init_trap_regs(&hyp_vcpu->vcpu);
+	pvm_init_traps_aa64pfr0(&hyp_vcpu->vcpu);
+	pvm_init_traps_aa64pfr1(&hyp_vcpu->vcpu);
+	pvm_init_traps_aa64dfr0(&hyp_vcpu->vcpu);
+	pvm_init_traps_aa64mmfr0(&hyp_vcpu->vcpu);
+	pvm_init_traps_aa64mmfr1(&hyp_vcpu->vcpu);
+}
+
+/*
+ * Start the VM table handle at the offset defined instead of at 0.
+ * Mainly for sanity checking and debugging.
+ */
+#define HANDLE_OFFSET 0x1000
+
+static unsigned int vm_handle_to_idx(pkvm_handle_t handle)
+{
+	return handle - HANDLE_OFFSET;
+}
+
+static pkvm_handle_t idx_to_vm_handle(unsigned int idx)
+{
+	return idx + HANDLE_OFFSET;
+}
+
+/*
+ * Spinlock for protecting state related to the VM table. Protects writes
+ * to 'vm_table' and 'nr_table_entries' as well as reads and writes to
+ * 'last_hyp_vcpu_lookup'.
+ */
+static DEFINE_HYP_SPINLOCK(vm_table_lock);
+
+/*
+ * The table of VM entries for protected VMs in hyp.
+ * Allocated at hyp initialization and setup.
+ */
+static struct pkvm_hyp_vm **vm_table;
+
+void pkvm_hyp_vm_table_init(void *tbl)
+{
+	WARN_ON(vm_table);
+	vm_table = tbl;
+}
+
+/*
+ * Return the hyp vm structure corresponding to the handle.
+ */
+static struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle)
+{
+	unsigned int idx = vm_handle_to_idx(handle);
+
+	if (unlikely(idx >= KVM_MAX_PVMS))
+		return NULL;
+
+	return vm_table[idx];
+}
+
+int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 pfn, u64 ipa)
+{
+	struct pkvm_hyp_vm *hyp_vm;
+	int ret = -EINVAL;
+
+	hyp_spin_lock(&vm_table_lock);
+	hyp_vm = get_vm_by_handle(handle);
+	if (!hyp_vm || !hyp_vm->is_dying)
+		goto unlock;
+
+	ret = __pkvm_host_reclaim_page(hyp_vm, pfn, ipa);
+	if (ret)
+		goto unlock;
+
+	drain_hyp_pool(hyp_vm, &hyp_vm->host_kvm->arch.pkvm.teardown_stage2_mc);
+unlock:
+	hyp_spin_unlock(&vm_table_lock);
+
+	return ret;
+}
+
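+/*
+ * Load the vcpu identified by @handle/@vcpu_idx onto the current
+ * physical CPU. Fails if another vcpu is already loaded here, if this
+ * vcpu is loaded elsewhere, or if the VM is dying. Takes a reference on
+ * the VM that is dropped by pkvm_put_hyp_vcpu().
+ */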
+struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
+					 unsigned int vcpu_idx)
+{
+	struct pkvm_hyp_vcpu *hyp_vcpu = NULL;
+	struct pkvm_hyp_vm *hyp_vm;
+
+	/* Cannot load a new vcpu without putting the old one first. */
+	if (__this_cpu_read(loaded_hyp_vcpu))
+		return NULL;
+
+	hyp_spin_lock(&vm_table_lock);
+	hyp_vm = get_vm_by_handle(handle);
+	if (!hyp_vm || hyp_vm->is_dying || hyp_vm->nr_vcpus <= vcpu_idx)
+		goto unlock;
+
+	hyp_vcpu = hyp_vm->vcpus[vcpu_idx];
+
+	/* Ensure vcpu isn't loaded on more than one cpu simultaneously. */
+	if (unlikely(hyp_vcpu->loaded_hyp_vcpu)) {
+		hyp_vcpu = NULL;
+		goto unlock;
+	}
+
+	hyp_vcpu->loaded_hyp_vcpu = this_cpu_ptr(&loaded_hyp_vcpu);
+	hyp_page_ref_inc(hyp_virt_to_page(hyp_vm));
+unlock:
+	hyp_spin_unlock(&vm_table_lock);
+
+	if (hyp_vcpu)
+		__this_cpu_write(loaded_hyp_vcpu, hyp_vcpu);
+	return hyp_vcpu;
+}
+
+void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct pkvm_hyp_vm *hyp_vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
+
+	hyp_spin_lock(&vm_table_lock);
+	hyp_vcpu->loaded_hyp_vcpu = NULL;
+	__this_cpu_write(loaded_hyp_vcpu, NULL);
+	hyp_page_ref_dec(hyp_virt_to_page(hyp_vm));
+	hyp_spin_unlock(&vm_table_lock);
+}
+
+struct pkvm_hyp_vcpu *pkvm_get_loaded_hyp_vcpu(void)
+{
+	return __this_cpu_read(loaded_hyp_vcpu);
+}
+
+static void pkvm_vcpu_init_features_from_host(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+	DECLARE_BITMAP(allowed_features, KVM_VCPU_MAX_FEATURES);
+
+	/* No restrictions for non-protected VMs. */
+	if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
+		bitmap_copy(hyp_vcpu->vcpu.arch.features,
+			    host_vcpu->arch.features,
+			    KVM_VCPU_MAX_FEATURES);
+		return;
+	}
+
+	bitmap_zero(allowed_features, KVM_VCPU_MAX_FEATURES);
+
+	/*
+	 * For protected vms, always allow:
+	 * - CPU starting in poweroff state
+	 * - PSCI v0.2
+	 */
+	set_bit(KVM_ARM_VCPU_POWER_OFF, allowed_features);
+	set_bit(KVM_ARM_VCPU_PSCI_0_2, allowed_features);
+
+	/*
+	 * Check if remaining features are allowed:
+	 * - Performance Monitoring
+	 * - Scalable Vectors
+	 * - Pointer Authentication
+	 */
+	if (FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_PMUVer), PVM_ID_AA64DFR0_ALLOW))
+		set_bit(KVM_ARM_VCPU_PMU_V3, allowed_features);
+
+	if (FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1_SVE), PVM_ID_AA64PFR0_ALLOW))
+		set_bit(KVM_ARM_VCPU_SVE, allowed_features);
+
+	if (FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_API), PVM_ID_AA64ISAR1_ALLOW) &&
+	    FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_APA), PVM_ID_AA64ISAR1_ALLOW))
+		set_bit(KVM_ARM_VCPU_PTRAUTH_ADDRESS, allowed_features);
+
+	if (FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_GPI), PVM_ID_AA64ISAR1_ALLOW) &&
+	    FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_EL1_GPA), PVM_ID_AA64ISAR1_ALLOW))
+		set_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, allowed_features);
+
+	bitmap_and(hyp_vcpu->vcpu.arch.features, host_vcpu->arch.features,
+		   allowed_features, KVM_VCPU_MAX_FEATURES);
+
+	/*
+	 * Now sanitise the configuration flags that we have inherited
+	 * from the host, as they may refer to features that protected
+	 * mode doesn't support.
+	 */
+	if (!vcpu_has_feature(&hyp_vcpu->vcpu, KVM_ARM_VCPU_SVE)) {
+		vcpu_clear_flag(&hyp_vcpu->vcpu, GUEST_HAS_SVE);
+		vcpu_clear_flag(&hyp_vcpu->vcpu, VCPU_SVE_FINALIZED);
+	}
+
+	if (!vcpu_has_feature(&hyp_vcpu->vcpu, KVM_ARM_VCPU_PTRAUTH_ADDRESS) ||
+	    !vcpu_has_feature(&hyp_vcpu->vcpu, KVM_ARM_VCPU_PTRAUTH_GENERIC))
+		vcpu_clear_flag(&hyp_vcpu->vcpu, GUEST_HAS_PTRAUTH);
+}
+
+static int pkvm_vcpu_init_ptrauth(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *vcpu = &hyp_vcpu->vcpu;
+	int ret = 0;
+
+	if (test_bit(KVM_ARM_VCPU_PTRAUTH_ADDRESS, vcpu->arch.features) ||
+	    test_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, vcpu->arch.features))
+		ret = kvm_vcpu_enable_ptrauth(vcpu);
+
+	return ret;
+}
+
+static int pkvm_vcpu_init_psci(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct vcpu_reset_state *reset_state = &hyp_vcpu->vcpu.arch.reset_state;
+	struct pkvm_hyp_vm *hyp_vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
+
+	if (test_bit(KVM_ARM_VCPU_POWER_OFF, hyp_vcpu->vcpu.arch.features)) {
+		reset_state->reset = false;
+		hyp_vcpu->power_state = PSCI_0_2_AFFINITY_LEVEL_OFF;
+	} else if (pkvm_hyp_vm_has_pvmfw(hyp_vm)) {
+		if (hyp_vm->pvmfw_entry_vcpu)
+			return -EINVAL;
+
+		hyp_vm->pvmfw_entry_vcpu = hyp_vcpu;
+		reset_state->reset = true;
+		hyp_vcpu->power_state = PSCI_0_2_AFFINITY_LEVEL_ON_PENDING;
+	} else {
+		struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+
+		reset_state->pc = READ_ONCE(host_vcpu->arch.ctxt.regs.pc);
+		reset_state->r0 = READ_ONCE(host_vcpu->arch.ctxt.regs.regs[0]);
+		reset_state->reset = true;
+		hyp_vcpu->power_state = PSCI_0_2_AFFINITY_LEVEL_ON_PENDING;
+	}
+
+	return 0;
+}
+
+static void unpin_host_vcpu(struct kvm_vcpu *host_vcpu)
+{
+	if (host_vcpu)
+		hyp_unpin_shared_mem(host_vcpu, host_vcpu + 1);
+}
+
+static void unpin_host_sve_state(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	void *sve_state;
+
+	if (!test_bit(KVM_ARM_VCPU_SVE, hyp_vcpu->vcpu.arch.features))
+		return;
+
+	sve_state = kern_hyp_va(hyp_vcpu->vcpu.arch.sve_state);
+	hyp_unpin_shared_mem(sve_state,
+			     sve_state + vcpu_sve_state_size(&hyp_vcpu->vcpu));
+}
+
+static void unpin_host_vcpus(struct pkvm_hyp_vcpu *hyp_vcpus[],
+			     unsigned int nr_vcpus)
+{
+	int i;
+
+	for (i = 0; i < nr_vcpus; i++) {
+		struct pkvm_hyp_vcpu *hyp_vcpu = hyp_vcpus[i];
+
+		unpin_host_vcpu(hyp_vcpu->host_vcpu);
+		unpin_host_sve_state(hyp_vcpu);
+	}
+}
+
+static size_t pkvm_get_last_ran_size(void)
+{
+	return array_size(hyp_nr_cpus, sizeof(int));
+}
+
+static void init_pkvm_hyp_vm(struct kvm *host_kvm, struct pkvm_hyp_vm *hyp_vm,
+			     int *last_ran, unsigned int nr_vcpus)
+{
+	u64 pvmfw_load_addr = PVMFW_INVALID_LOAD_ADDR;
+
+	hyp_vm->host_kvm = host_kvm;
+	hyp_vm->kvm.created_vcpus = nr_vcpus;
+	hyp_vm->kvm.arch.vtcr = host_mmu.arch.vtcr;
+	hyp_vm->kvm.arch.pkvm.enabled = READ_ONCE(host_kvm->arch.pkvm.enabled);
+
+	if (hyp_vm->kvm.arch.pkvm.enabled)
+		pvmfw_load_addr = READ_ONCE(host_kvm->arch.pkvm.pvmfw_load_addr);
+	hyp_vm->kvm.arch.pkvm.pvmfw_load_addr = pvmfw_load_addr;
+
+	hyp_vm->kvm.arch.mmu.last_vcpu_ran = (int __percpu *)last_ran;
+	memset(last_ran, -1, pkvm_get_last_ran_size());
+}
+
+static int init_pkvm_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu,
+			      struct pkvm_hyp_vm *hyp_vm,
+			      struct kvm_vcpu *host_vcpu,
+			      unsigned int vcpu_idx)
+{
+	int ret = 0;
+
+	if (hyp_pin_shared_mem(host_vcpu, host_vcpu + 1))
+		return -EBUSY;
+
+	if (host_vcpu->vcpu_idx != vcpu_idx) {
+		ret = -EINVAL;
+		goto done;
+	}
+
+	hyp_vcpu->host_vcpu = host_vcpu;
+
+	hyp_vcpu->vcpu.kvm = &hyp_vm->kvm;
+	hyp_vcpu->vcpu.vcpu_id = READ_ONCE(host_vcpu->vcpu_id);
+	hyp_vcpu->vcpu.vcpu_idx = vcpu_idx;
+
+	hyp_vcpu->vcpu.arch.hw_mmu = &hyp_vm->kvm.arch.mmu;
+	hyp_vcpu->vcpu.arch.cflags = READ_ONCE(host_vcpu->arch.cflags);
+	hyp_vcpu->vcpu.arch.mp_state.mp_state = KVM_MP_STATE_STOPPED;
+	hyp_vcpu->vcpu.arch.debug_ptr = &host_vcpu->arch.vcpu_debug_state;
+
+	pkvm_vcpu_init_features_from_host(hyp_vcpu);
+
+	ret = pkvm_vcpu_init_ptrauth(hyp_vcpu);
+	if (ret)
+		goto done;
+
+	ret = pkvm_vcpu_init_psci(hyp_vcpu);
+	if (ret)
+		goto done;
+
+	if (test_bit(KVM_ARM_VCPU_SVE, hyp_vcpu->vcpu.arch.features)) {
+		size_t sve_state_size;
+		void *sve_state;
+
+		hyp_vcpu->vcpu.arch.sve_state = READ_ONCE(host_vcpu->arch.sve_state);
+		hyp_vcpu->vcpu.arch.sve_max_vl = READ_ONCE(host_vcpu->arch.sve_max_vl);
+
+		sve_state = kern_hyp_va(hyp_vcpu->vcpu.arch.sve_state);
+		sve_state_size = vcpu_sve_state_size(&hyp_vcpu->vcpu);
+
+		if (!hyp_vcpu->vcpu.arch.sve_state || !sve_state_size ||
+		    hyp_pin_shared_mem(sve_state, sve_state + sve_state_size)) {
+			clear_bit(KVM_ARM_VCPU_SVE, hyp_vcpu->vcpu.arch.features);
+			hyp_vcpu->vcpu.arch.sve_state = NULL;
+			hyp_vcpu->vcpu.arch.sve_max_vl = 0;
+			ret = -EINVAL;
+			goto done;
+		}
+	}
+
+	pkvm_vcpu_init_traps(hyp_vcpu);
+	kvm_reset_pvm_sys_regs(&hyp_vcpu->vcpu);
+done:
+	if (ret)
+		unpin_host_vcpu(host_vcpu);
+	return ret;
+}
+
+static int find_free_vm_table_entry(struct kvm *host_kvm)
+{
+	int i;
+
+	for (i = 0; i < KVM_MAX_PVMS; ++i) {
+		if (!vm_table[i])
+			return i;
+	}
+
+	return -ENOMEM;
+}
+
+/*
+ * Allocate a VM table entry and insert a pointer to the new vm.
+ *
+ * Return a unique handle to the protected VM on success,
+ * negative error code on failure.
+ */
+static pkvm_handle_t insert_vm_table_entry(struct kvm *host_kvm,
+					   struct pkvm_hyp_vm *hyp_vm)
+{
+	struct kvm_s2_mmu *mmu = &hyp_vm->kvm.arch.mmu;
+	int idx;
+
+	hyp_assert_lock_held(&vm_table_lock);
+
+	/*
+	 * Initializing protected state might have failed, yet a malicious
+	 * host could trigger this function. Thus, ensure that 'vm_table'
+	 * exists.
+	 */
+	if (unlikely(!vm_table))
+		return -EINVAL;
+
+	idx = find_free_vm_table_entry(host_kvm);
+	if (idx < 0)
+		return idx;
+
+	hyp_vm->kvm.arch.pkvm.handle = idx_to_vm_handle(idx);
+
+	/* VMID 0 is reserved for the host */
+	atomic64_set(&mmu->vmid.id, idx + 1);
+
+	mmu->arch = &hyp_vm->kvm.arch;
+	mmu->pgt = &hyp_vm->pgt;
+
+	vm_table[idx] = hyp_vm;
+	return hyp_vm->kvm.arch.pkvm.handle;
+}
+
+/*
+ * Deallocate and remove the VM table entry corresponding to the handle.
+ */
+static void remove_vm_table_entry(pkvm_handle_t handle)
+{
+	hyp_assert_lock_held(&vm_table_lock);
+	vm_table[vm_handle_to_idx(handle)] = NULL;
+}
+
+static size_t pkvm_get_hyp_vm_size(unsigned int nr_vcpus)
+{
+	return size_add(sizeof(struct pkvm_hyp_vm),
+		size_mul(sizeof(struct pkvm_hyp_vcpu *), nr_vcpus));
+}
+
+static void *map_donated_memory_noclear(unsigned long host_va, size_t size)
+{
+	void *va = (void *)kern_hyp_va(host_va);
+
+	if (!PAGE_ALIGNED(va))
+		return NULL;
+
+	if (__pkvm_host_donate_hyp(hyp_virt_to_pfn(va),
+				   PAGE_ALIGN(size) >> PAGE_SHIFT))
+		return NULL;
+
+	return va;
+}
+
+static void *map_donated_memory(unsigned long host_va, size_t size)
+{
+	void *va = map_donated_memory_noclear(host_va, size);
+
+	if (va)
+		memset(va, 0, size);
+
+	return va;
+}
+
+static void __unmap_donated_memory(void *va, size_t size)
+{
+	kvm_flush_dcache_to_poc(va, size);
+	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(va),
+				       PAGE_ALIGN(size) >> PAGE_SHIFT));
+}
+
+static void unmap_donated_memory(void *va, size_t size)
+{
+	if (!va)
+		return;
+
+	memset(va, 0, size);
+	__unmap_donated_memory(va, size);
+}
+
+static void unmap_donated_memory_noclear(void *va, size_t size)
+{
+	if (!va)
+		return;
+
+	__unmap_donated_memory(va, size);
+}
+
+/*
+ * Initialize the hypervisor copy of the protected VM state using the
+ * memory donated by the host.
+ *
+ * Unmaps the donated memory from the host at stage 2.
+ *
+ * host_kvm: A pointer to the host's struct kvm.
+ * vm_hva: The host va of the area being donated for the VM state.
+ *	   Must be page aligned.
+ * pgd_hva: The host va of the area being donated for the stage-2 PGD for
+ *	    the VM. Must be page aligned. Its size is implied by the VM's
+ *	    VTCR.
+ * last_ran_hva: The host va of the area being donated for hyp to use to track
+ *		 the most recent physical cpu on which each vcpu has run.
+ * Return a unique handle to the protected VM on success,
+ * negative error code on failure.
+ */
+int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
+		   unsigned long pgd_hva, unsigned long last_ran_hva)
+{
+	struct pkvm_hyp_vm *hyp_vm = NULL;
+	int *last_ran = NULL;
+	size_t vm_size, pgd_size, last_ran_size;
+	unsigned int nr_vcpus;
+	void *pgd = NULL;
+	int ret;
+
+	ret = hyp_pin_shared_mem(host_kvm, host_kvm + 1);
+	if (ret)
+		return ret;
+
+	nr_vcpus = READ_ONCE(host_kvm->created_vcpus);
+	if (nr_vcpus < 1) {
+		ret = -EINVAL;
+		goto err_unpin_kvm;
+	}
+
+	vm_size = pkvm_get_hyp_vm_size(nr_vcpus);
+	last_ran_size = pkvm_get_last_ran_size();
+	pgd_size = kvm_pgtable_stage2_pgd_size(host_mmu.arch.vtcr);
+
+	ret = -ENOMEM;
+
+	hyp_vm = map_donated_memory(vm_hva, vm_size);
+	if (!hyp_vm)
+		goto err_remove_mappings;
+
+	last_ran = map_donated_memory(last_ran_hva, last_ran_size);
+	if (!last_ran)
+		goto err_remove_mappings;
+
+	pgd = map_donated_memory_noclear(pgd_hva, pgd_size);
+	if (!pgd)
+		goto err_remove_mappings;
+
+	init_pkvm_hyp_vm(host_kvm, hyp_vm, last_ran, nr_vcpus);
+
+	hyp_spin_lock(&vm_table_lock);
+	ret = insert_vm_table_entry(host_kvm, hyp_vm);
+	if (ret < 0)
+		goto err_unlock;
+
+	ret = kvm_guest_prepare_stage2(hyp_vm, pgd);
+	if (ret)
+		goto err_remove_vm_table_entry;
+	hyp_spin_unlock(&vm_table_lock);
+
+	return hyp_vm->kvm.arch.pkvm.handle;
+
+err_remove_vm_table_entry:
+	remove_vm_table_entry(hyp_vm->kvm.arch.pkvm.handle);
+err_unlock:
+	hyp_spin_unlock(&vm_table_lock);
+err_remove_mappings:
+	unmap_donated_memory(hyp_vm, vm_size);
+	unmap_donated_memory(last_ran, last_ran_size);
+	unmap_donated_memory(pgd, pgd_size);
+err_unpin_kvm:
+	hyp_unpin_shared_mem(host_kvm, host_kvm + 1);
+	return ret;
+}
+
+/*
+ * Initialize the hypervisor copy of the protected vCPU state using the
+ * memory donated by the host.
+ *
+ * handle: The handle for the protected vm.
+ * host_vcpu: A pointer to the corresponding host vcpu.
+ * vcpu_hva: The host va of the area being donated for the vcpu state.
+ *	     Must be page aligned. The size of the area must be equal to
+ *	     the page-aligned size of 'struct pkvm_hyp_vcpu'.
+ * Return 0 on success, negative error code on failure.
+ */
+int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
+		     unsigned long vcpu_hva)
+{
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	struct pkvm_hyp_vm *hyp_vm;
+	unsigned int idx;
+	int ret;
+
+	hyp_vcpu = map_donated_memory(vcpu_hva, sizeof(*hyp_vcpu));
+	if (!hyp_vcpu)
+		return -ENOMEM;
+
+	hyp_spin_lock(&vm_table_lock);
+
+	hyp_vm = get_vm_by_handle(handle);
+	if (!hyp_vm) {
+		ret = -ENOENT;
+		goto unlock;
+	}
+
+	idx = hyp_vm->nr_vcpus;
+	if (idx >= hyp_vm->kvm.created_vcpus) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	ret = init_pkvm_hyp_vcpu(hyp_vcpu, hyp_vm, host_vcpu, idx);
+	if (ret)
+		goto unlock;
+
+	hyp_vm->vcpus[idx] = hyp_vcpu;
+	hyp_vm->nr_vcpus++;
+unlock:
+	hyp_spin_unlock(&vm_table_lock);
+
+	if (ret)
+		unmap_donated_memory(hyp_vcpu, sizeof(*hyp_vcpu));
+
+	return ret;
+}
+
+static void
+teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr, size_t size)
+{
+	void *start;
+
+	size = PAGE_ALIGN(size);
+	memset(addr, 0, size);
+
+	for (start = addr; start < addr + size; start += PAGE_SIZE)
+		push_hyp_memcache(mc, start, hyp_virt_to_phys);
+
+	unmap_donated_memory_noclear(addr, size);
+}
+
+int __pkvm_start_teardown_vm(pkvm_handle_t handle)
+{
+	struct pkvm_hyp_vm *hyp_vm;
+	int ret = 0;
+
+	hyp_spin_lock(&vm_table_lock);
+	hyp_vm = get_vm_by_handle(handle);
+	if (!hyp_vm) {
+		ret = -ENOENT;
+		goto unlock;
+	} else if (WARN_ON(hyp_page_count(hyp_vm))) {
+		ret = -EBUSY;
+		goto unlock;
+	} else if (hyp_vm->is_dying) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	hyp_vm->is_dying = true;
+
+unlock:
+	hyp_spin_unlock(&vm_table_lock);
+
+	return ret;
+}
+
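+/*
+ * Finish tearing down a VM previously marked as dying by
+ * __pkvm_start_teardown_vm(): flush its VMID, destroy its stage-2
+ * page-table, unpin the host memory, and push everything that was
+ * donated for its metadata back onto the host's teardown memcaches.
+ */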
+int __pkvm_finalize_teardown_vm(pkvm_handle_t handle)
+{
+	struct kvm_hyp_memcache *mc, *stage2_mc;
+	size_t vm_size, last_ran_size;
+	int __percpu *last_vcpu_ran;
+	struct pkvm_hyp_vm *hyp_vm;
+	struct kvm *host_kvm;
+	unsigned int idx;
+	int err;
+
+	hyp_spin_lock(&vm_table_lock);
+	hyp_vm = get_vm_by_handle(handle);
+	if (!hyp_vm) {
+		err = -ENOENT;
+		goto err_unlock;
+	} else if (!hyp_vm->is_dying) {
+		err = -EBUSY;
+		goto err_unlock;
+	}
+
+	host_kvm = hyp_vm->host_kvm;
+
+	/* Ensure the VMID is clean before it can be reallocated */
+	__kvm_tlb_flush_vmid(&hyp_vm->kvm.arch.mmu);
+	remove_vm_table_entry(handle);
+	hyp_spin_unlock(&vm_table_lock);
+
+	mc = &host_kvm->arch.pkvm.teardown_mc;
+	stage2_mc = &host_kvm->arch.pkvm.teardown_stage2_mc;
+
+	destroy_hyp_vm_pgt(hyp_vm);
+	drain_hyp_pool(hyp_vm, stage2_mc);
+	unpin_host_vcpus(hyp_vm->vcpus, hyp_vm->nr_vcpus);
+
+	/* Push the metadata pages to the teardown memcache */
+	for (idx = 0; idx < hyp_vm->nr_vcpus; ++idx) {
+		struct pkvm_hyp_vcpu *hyp_vcpu = hyp_vm->vcpus[idx];
+		struct kvm_hyp_memcache *vcpu_mc;
+		void *addr;
+
+		vcpu_mc = &hyp_vcpu->vcpu.arch.pkvm_memcache;
+		while (vcpu_mc->nr_pages) {
+			addr = pop_hyp_memcache(vcpu_mc, hyp_phys_to_virt);
+			push_hyp_memcache(stage2_mc, addr, hyp_virt_to_phys);
+			unmap_donated_memory_noclear(addr, PAGE_SIZE);
+		}
+
+		teardown_donated_memory(mc, hyp_vcpu, sizeof(*hyp_vcpu));
+	}
+
+	last_vcpu_ran = hyp_vm->kvm.arch.mmu.last_vcpu_ran;
+	last_ran_size = pkvm_get_last_ran_size();
+	teardown_donated_memory(mc, (__force void *)last_vcpu_ran,
+				last_ran_size);
+
+	vm_size = pkvm_get_hyp_vm_size(hyp_vm->kvm.created_vcpus);
+	teardown_donated_memory(mc, hyp_vm, vm_size);
+	hyp_unpin_shared_mem(host_kvm, host_kvm + 1);
+	return 0;
+
+err_unlock:
+	hyp_spin_unlock(&vm_table_lock);
+	return err;
+}
+
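+/*
+ * Copy the slice of the pvmfw image corresponding to the guest range
+ * starting at @ipa into the pages at @phys, clamping @size to whatever
+ * is left of the image.
+ */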
+int pkvm_load_pvmfw_pages(struct pkvm_hyp_vm *vm, u64 ipa, phys_addr_t phys,
+			  u64 size)
+{
+	struct kvm_protected_vm *pkvm = &vm->kvm.arch.pkvm;
+	u64 npages, offset = ipa - pkvm->pvmfw_load_addr;
+	void *src = hyp_phys_to_virt(pvmfw_base) + offset;
+
+	if (offset >= pvmfw_size)
+		return -EINVAL;
+
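+	/* Clamp the copy to what remains of the firmware image. */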
+	size = min(size, pvmfw_size - offset);
+	if (!PAGE_ALIGNED(size) || !PAGE_ALIGNED(src))
+		return -EINVAL;
+
+	npages = size >> PAGE_SHIFT;
+	while (npages--) {
+		/*
+		 * No need for cache maintenance here, as the pgtable code will
+		 * take care of this when installing the pte in the guest's
+		 * stage-2 page table.
+		 */
+		memcpy(hyp_fixmap_map(phys), src, PAGE_SIZE);
+		hyp_fixmap_unmap();
+
+		src += PAGE_SIZE;
+		phys += PAGE_SIZE;
+	}
+
+	return 0;
+}
+
+void pkvm_poison_pvmfw_pages(void)
+{
+	u64 npages = pvmfw_size >> PAGE_SHIFT;
+	phys_addr_t addr = pvmfw_base;
+
+	while (npages--) {
+		hyp_poison_page(addr);
+		addr += PAGE_SIZE;
+	}
+}
+
+/*
+ * This function sets the registers on the vcpu to their architecturally defined
+ * reset values.
+ *
+ * Note: Can only be called by the vcpu on itself, after it has been turned on.
+ */
+void pkvm_reset_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct vcpu_reset_state *reset_state = &hyp_vcpu->vcpu.arch.reset_state;
+	struct pkvm_hyp_vm *hyp_vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
+
+	WARN_ON(!reset_state->reset);
+
+	pkvm_vcpu_init_ptrauth(hyp_vcpu);
+	kvm_reset_vcpu_core(&hyp_vcpu->vcpu);
+	kvm_reset_pvm_sys_regs(&hyp_vcpu->vcpu);
+
+	/* Must be done after resetting sys registers. */
+	kvm_reset_vcpu_psci(&hyp_vcpu->vcpu, reset_state);
+	if (hyp_vm->pvmfw_entry_vcpu == hyp_vcpu) {
+		struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+		u64 entry = hyp_vm->kvm.arch.pkvm.pvmfw_load_addr;
+		int i;
+
+		/* X0 - X14 provided by the VMM (preserved) */
+		for (i = 0; i <= 14; ++i) {
+			u64 val = vcpu_get_reg(host_vcpu, i);
+
+			vcpu_set_reg(&hyp_vcpu->vcpu, i, val);
+		}
+
+		/* X15: Boot protocol version */
+		vcpu_set_reg(&hyp_vcpu->vcpu, 15, 0);
+
+		/* PC: IPA of pvmfw base */
+		*vcpu_pc(&hyp_vcpu->vcpu) = entry;
+		hyp_vm->pvmfw_entry_vcpu = NULL;
+
+		/* Auto enroll MMIO guard */
+		set_bit(KVM_ARCH_FLAG_MMIO_GUARD, &hyp_vm->kvm.arch.flags);
+	}
+
+	reset_state->reset = false;
+
+	hyp_vcpu->exit_code = 0;
+
+	WARN_ON(hyp_vcpu->power_state != PSCI_0_2_AFFINITY_LEVEL_ON_PENDING);
+	WRITE_ONCE(hyp_vcpu->vcpu.arch.mp_state.mp_state, KVM_MP_STATE_RUNNABLE);
+	WRITE_ONCE(hyp_vcpu->power_state, PSCI_0_2_AFFINITY_LEVEL_ON);
+}
+
+struct pkvm_hyp_vcpu *pkvm_mpidr_to_hyp_vcpu(struct pkvm_hyp_vm *hyp_vm,
+					     u64 mpidr)
+{
+	int i;
+
+	mpidr &= MPIDR_HWID_BITMASK;
+
+	for (i = 0; i < hyp_vm->nr_vcpus; i++) {
+		struct pkvm_hyp_vcpu *hyp_vcpu = hyp_vm->vcpus[i];
+
+		if (mpidr == kvm_vcpu_get_mpidr_aff(&hyp_vcpu->vcpu))
+			return hyp_vcpu;
+	}
+
+	return NULL;
+}
+
+/*
+ * Returns true if the hypervisor has handled the PSCI call, and control should
+ * go back to the guest, or false if the host needs to do some additional work
+ * (i.e., wake up the vcpu).
+ */
+static bool pvm_psci_vcpu_on(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct pkvm_hyp_vm *hyp_vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
+	struct vcpu_reset_state *reset_state;
+	struct pkvm_hyp_vcpu *target;
+	unsigned long cpu_id, ret;
+	int power_state;
+
+	cpu_id = smccc_get_arg1(&hyp_vcpu->vcpu);
+	if (!kvm_psci_valid_affinity(&hyp_vcpu->vcpu, cpu_id)) {
+		ret = PSCI_RET_INVALID_PARAMS;
+		goto error;
+	}
+
+	target = pkvm_mpidr_to_hyp_vcpu(hyp_vm, cpu_id);
+	if (!target) {
+		ret = PSCI_RET_INVALID_PARAMS;
+		goto error;
+	}
+
+	/*
+	 * Make sure the requested vcpu is not on to begin with.
+	 * Atomic to avoid race between vcpus trying to power on the same vcpu.
+	 */
+	power_state = cmpxchg(&target->power_state,
+			      PSCI_0_2_AFFINITY_LEVEL_OFF,
+			      PSCI_0_2_AFFINITY_LEVEL_ON_PENDING);
+	switch (power_state) {
+	case PSCI_0_2_AFFINITY_LEVEL_ON_PENDING:
+		ret = PSCI_RET_ON_PENDING;
+		goto error;
+	case PSCI_0_2_AFFINITY_LEVEL_ON:
+		ret = PSCI_RET_ALREADY_ON;
+		goto error;
+	case PSCI_0_2_AFFINITY_LEVEL_OFF:
+		break;
+	default:
+		ret = PSCI_RET_INTERNAL_FAILURE;
+		goto error;
+	}
+
+	reset_state = &target->vcpu.arch.reset_state;
+	reset_state->pc = smccc_get_arg2(&hyp_vcpu->vcpu);
+	reset_state->r0 = smccc_get_arg3(&hyp_vcpu->vcpu);
+	/* Propagate caller endianness */
+	reset_state->be = kvm_vcpu_is_be(&hyp_vcpu->vcpu);
+	reset_state->reset = true;
+
+	/*
+	 * Return to the host, which should make the KVM_REQ_VCPU_RESET request
+	 * as well as kvm_vcpu_wake_up() to schedule the vcpu.
+	 */
+	return false;
+
+error:
+	/* If there's an error go back straight to the guest. */
+	smccc_set_retval(&hyp_vcpu->vcpu, ret, 0, 0, 0);
+	return true;
+}
+
+static bool pvm_psci_vcpu_affinity_info(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	unsigned long target_affinity_mask, target_affinity, lowest_affinity_level;
+	struct pkvm_hyp_vm *hyp_vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
+	struct kvm_vcpu *vcpu = &hyp_vcpu->vcpu;
+	unsigned long mpidr, ret;
+	int i, matching_cpus = 0;
+
+	target_affinity = smccc_get_arg1(vcpu);
+	lowest_affinity_level = smccc_get_arg2(vcpu);
+	if (!kvm_psci_valid_affinity(vcpu, target_affinity)) {
+		ret = PSCI_RET_INVALID_PARAMS;
+		goto done;
+	}
+
+	/* Determine target affinity mask */
+	target_affinity_mask = psci_affinity_mask(lowest_affinity_level);
+	if (!target_affinity_mask) {
+		ret = PSCI_RET_INVALID_PARAMS;
+		goto done;
+	}
+
+	/* Ignore other bits of target affinity */
+	target_affinity &= target_affinity_mask;
+	ret = PSCI_0_2_AFFINITY_LEVEL_OFF;
+
+	/*
+	 * If at least one vcpu matching the target affinity is ON, return ON;
+	 * otherwise, if at least one is PENDING_ON, return PENDING_ON;
+	 * otherwise, return OFF.
+	 */
+	for (i = 0; i < hyp_vm->nr_vcpus; i++) {
+		struct pkvm_hyp_vcpu *target = hyp_vm->vcpus[i];
+
+		mpidr = kvm_vcpu_get_mpidr_aff(&target->vcpu);
+
+		if ((mpidr & target_affinity_mask) == target_affinity) {
+			int power_state;
+
+			matching_cpus++;
+			power_state = READ_ONCE(target->power_state);
+			switch (power_state) {
+			case PSCI_0_2_AFFINITY_LEVEL_ON_PENDING:
+				ret = PSCI_0_2_AFFINITY_LEVEL_ON_PENDING;
+				break;
+			case PSCI_0_2_AFFINITY_LEVEL_ON:
+				ret = PSCI_0_2_AFFINITY_LEVEL_ON;
+				goto done;
+			case PSCI_0_2_AFFINITY_LEVEL_OFF:
+				break;
+			default:
+				ret = PSCI_RET_INTERNAL_FAILURE;
+				goto done;
+			}
+		}
+	}
+
+	if (!matching_cpus)
+		ret = PSCI_RET_INVALID_PARAMS;
+
+done:
+	/* Nothing to be handled by the host. Go back to the guest. */
+	smccc_set_retval(vcpu, ret, 0, 0, 0);
+	return true;
+}
+
+/*
+ * Returns true if the hypervisor has handled the PSCI call, and control should
+ * go back to the guest, or false if the host needs to do some additional work
+ * (e.g., turn off and update vcpu scheduling status).
+ */
+static bool pvm_psci_vcpu_off(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *vcpu = &hyp_vcpu->vcpu;
+
+	WARN_ON(vcpu->arch.mp_state.mp_state == KVM_MP_STATE_STOPPED);
+	WARN_ON(hyp_vcpu->power_state != PSCI_0_2_AFFINITY_LEVEL_ON);
+
+	WRITE_ONCE(vcpu->arch.mp_state.mp_state, KVM_MP_STATE_STOPPED);
+	WRITE_ONCE(hyp_vcpu->power_state, PSCI_0_2_AFFINITY_LEVEL_OFF);
+
+	/* Return to the host so that it can finish powering off the vcpu. */
+	return false;
+}
+
+static bool pvm_psci_version(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	/* Nothing to be handled by the host. Go back to the guest. */
+	smccc_set_retval(&hyp_vcpu->vcpu, KVM_ARM_PSCI_1_1, 0, 0, 0);
+	return true;
+}
+
+static bool pvm_psci_not_supported(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	/* Nothing to be handled by the host. Go back to the guest. */
+	smccc_set_retval(&hyp_vcpu->vcpu, PSCI_RET_NOT_SUPPORTED, 0, 0, 0);
+	return true;
+}
+
+static bool pvm_psci_features(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *vcpu = &hyp_vcpu->vcpu;
+	u32 feature = smccc_get_arg1(vcpu);
+	unsigned long val;
+
+	switch (feature) {
+	case PSCI_0_2_FN_PSCI_VERSION:
+	case PSCI_0_2_FN_CPU_SUSPEND:
+	case PSCI_0_2_FN64_CPU_SUSPEND:
+	case PSCI_0_2_FN_CPU_OFF:
+	case PSCI_0_2_FN_CPU_ON:
+	case PSCI_0_2_FN64_CPU_ON:
+	case PSCI_0_2_FN_AFFINITY_INFO:
+	case PSCI_0_2_FN64_AFFINITY_INFO:
+	case PSCI_0_2_FN_SYSTEM_OFF:
+	case PSCI_0_2_FN_SYSTEM_RESET:
+	case PSCI_1_0_FN_PSCI_FEATURES:
+	case PSCI_1_1_FN_SYSTEM_RESET2:
+	case PSCI_1_1_FN64_SYSTEM_RESET2:
+	case ARM_SMCCC_VERSION_FUNC_ID:
+		val = PSCI_RET_SUCCESS;
+		break;
+	default:
+		val = PSCI_RET_NOT_SUPPORTED;
+		break;
+	}
+
+	/* Nothing to be handled by the host. Go back to the guest. */
+	smccc_set_retval(vcpu, val, 0, 0, 0);
+	return true;
+}
+
+static bool pkvm_handle_psci(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *vcpu = &hyp_vcpu->vcpu;
+	u32 psci_fn = smccc_get_function(vcpu);
+
+	switch (psci_fn) {
+	case PSCI_0_2_FN_CPU_ON:
+		kvm_psci_narrow_to_32bit(vcpu);
+		fallthrough;
+	case PSCI_0_2_FN64_CPU_ON:
+		return pvm_psci_vcpu_on(hyp_vcpu);
+	case PSCI_0_2_FN_CPU_OFF:
+		return pvm_psci_vcpu_off(hyp_vcpu);
+	case PSCI_0_2_FN_AFFINITY_INFO:
+		kvm_psci_narrow_to_32bit(vcpu);
+		fallthrough;
+	case PSCI_0_2_FN64_AFFINITY_INFO:
+		return pvm_psci_vcpu_affinity_info(hyp_vcpu);
+	case PSCI_0_2_FN_PSCI_VERSION:
+		return pvm_psci_version(hyp_vcpu);
+	case PSCI_1_0_FN_PSCI_FEATURES:
+		return pvm_psci_features(hyp_vcpu);
+	case PSCI_0_2_FN_SYSTEM_RESET:
+	case PSCI_0_2_FN_CPU_SUSPEND:
+	case PSCI_0_2_FN64_CPU_SUSPEND:
+	case PSCI_0_2_FN_SYSTEM_OFF:
+	case PSCI_1_1_FN_SYSTEM_RESET2:
+	case PSCI_1_1_FN64_SYSTEM_RESET2:
+		return false; /* Handled by the host. */
+	default:
+		break;
+	}
+
+	return pvm_psci_not_supported(hyp_vcpu);
+}
+
+static u64 __pkvm_memshare_page_req(struct pkvm_hyp_vcpu *hyp_vcpu, u64 ipa)
+{
+	struct kvm_vcpu *vcpu = &hyp_vcpu->vcpu;
+	u64 elr;
+
+	/* Fake up a data abort (Level 3 translation fault on write) */
+	vcpu->arch.fault.esr_el2 = (u32)ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT |
+				   ESR_ELx_WNR | ESR_ELx_FSC_FAULT |
+				   FIELD_PREP(ESR_ELx_FSC_LEVEL, 3);
+
+	/* Shuffle the IPA around into the HPFAR */
+	vcpu->arch.fault.hpfar_el2 = (ipa >> 8) & HPFAR_MASK;
+
+	/* This is a virtual address. 0's good. Let's go with 0. */
+	vcpu->arch.fault.far_el2 = 0;
+
+	/* Rewind the ELR so we return to the HVC once the IPA is mapped */
+	elr = read_sysreg(elr_el2);
+	elr -= 4;
+	write_sysreg(elr, elr_el2);
+
+	return ARM_EXCEPTION_TRAP;
+}
+
+static bool pkvm_memshare_call(struct pkvm_hyp_vcpu *hyp_vcpu, u64 *exit_code)
+{
+	struct kvm_vcpu *vcpu = &hyp_vcpu->vcpu;
+	u64 ipa = smccc_get_arg1(vcpu);
+	u64 arg2 = smccc_get_arg2(vcpu);
+	u64 arg3 = smccc_get_arg3(vcpu);
+	int err;
+
+	if (arg2 || arg3)
+		goto out_guest_err;
+
+	err = __pkvm_guest_share_host(hyp_vcpu, ipa);
+	switch (err) {
+	case 0:
+		/* Success! Now tell the host. */
+		goto out_host;
+	case -EFAULT:
+		/*
+		 * Convert the exception into a data abort so that the page
+		 * being shared is mapped into the guest next time.
+		 */
+		*exit_code = __pkvm_memshare_page_req(hyp_vcpu, ipa);
+		goto out_host;
+	}
+
+out_guest_err:
+	smccc_set_retval(vcpu, SMCCC_RET_INVALID_PARAMETER, 0, 0, 0);
+	return true;
+
+out_host:
+	return false;
+}
+
+static bool pkvm_memunshare_call(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *vcpu = &hyp_vcpu->vcpu;
+	u64 ipa = smccc_get_arg1(vcpu);
+	u64 arg2 = smccc_get_arg2(vcpu);
+	u64 arg3 = smccc_get_arg3(vcpu);
+	int err;
+
+	if (arg2 || arg3)
+		goto out_guest_err;
+
+	err = __pkvm_guest_unshare_host(hyp_vcpu, ipa);
+	if (err)
+		goto out_guest_err;
+
+	return false;
+
+out_guest_err:
+	smccc_set_retval(vcpu, SMCCC_RET_INVALID_PARAMETER, 0, 0, 0);
+	return true;
+}
+
+static bool pkvm_meminfo_call(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *vcpu = &hyp_vcpu->vcpu;
+	u64 arg1 = smccc_get_arg1(vcpu);
+	u64 arg2 = smccc_get_arg2(vcpu);
+	u64 arg3 = smccc_get_arg3(vcpu);
+
+	if (arg1 || arg2 || arg3)
+		goto out_guest_err;
+
+	smccc_set_retval(vcpu, PAGE_SIZE, 0, 0, 0);
+	return true;
+
+out_guest_err:
+	smccc_set_retval(vcpu, SMCCC_RET_INVALID_PARAMETER, 0, 0, 0);
+	return true;
+}
+
+static bool pkvm_memrelinquish_call(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *vcpu = &hyp_vcpu->vcpu;
+	u64 ipa = smccc_get_arg1(vcpu);
+	u64 arg2 = smccc_get_arg2(vcpu);
+	u64 arg3 = smccc_get_arg3(vcpu);
+	u64 pa = 0;
+	int ret;
+
+	if (arg2 || arg3)
+		goto out_guest_err;
+
+	ret = __pkvm_guest_relinquish_to_host(hyp_vcpu, ipa, &pa);
+	if (ret)
+		goto out_guest_err;
+
+	if (pa != 0) {
+		/* Now pass to host. */
+		return false;
+	}
+
+	/* This was a NOP as no page was actually mapped at the IPA. */
+	smccc_set_retval(vcpu, 0, 0, 0, 0);
+	return true;
+
+out_guest_err:
+	smccc_set_retval(vcpu, SMCCC_RET_INVALID_PARAMETER, 0, 0, 0);
+	return true;
+}
+
+static bool pkvm_install_ioguard_page(struct pkvm_hyp_vcpu *hyp_vcpu, u64 *exit_code)
+{
+	u64 retval = SMCCC_RET_SUCCESS;
+	u64 ipa = smccc_get_arg1(&hyp_vcpu->vcpu);
+	int ret;
+
+	ret = __pkvm_install_ioguard_page(hyp_vcpu, ipa);
+	if (ret == -ENOMEM) {
+		/*
+		 * We ran out of memcache, let's ask for more. Cancel
+		 * the effects of the HVC that took us here, and
+		 * forward the hypercall to the host for page donation
+		 * purposes.
+		 */
+		write_sysreg_el2(read_sysreg_el2(SYS_ELR) - 4, SYS_ELR);
+		return false;
+	}
+
+	if (ret)
+		retval = SMCCC_RET_INVALID_PARAMETER;
+
+	smccc_set_retval(&hyp_vcpu->vcpu, retval, 0, 0, 0);
+	return true;
+}
+
+bool smccc_trng_available;
+
+static bool pkvm_forward_trng(struct kvm_vcpu *vcpu)
+{
+	u32 fn = smccc_get_function(vcpu);
+	struct arm_smccc_res res;
+	unsigned long arg1 = 0;
+
+	/*
+	 * Forward TRNG calls to EL3, as we can't trust the host to handle
+	 * these for us.
+	 */
+	switch (fn) {
+	case ARM_SMCCC_TRNG_FEATURES:
+	case ARM_SMCCC_TRNG_RND32:
+	case ARM_SMCCC_TRNG_RND64:
+		arg1 = smccc_get_arg1(vcpu);
+		fallthrough;
+	case ARM_SMCCC_TRNG_VERSION:
+	case ARM_SMCCC_TRNG_GET_UUID:
+		arm_smccc_1_1_smc(fn, arg1, &res);
+		smccc_set_retval(vcpu, res.a0, res.a1, res.a2, res.a3);
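+		/* Don't leave the entropy words lying around on the stack. */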
+		memzero_explicit(&res, sizeof(res));
+		break;
+	}
+
+	return true;
+}
+
+/*
+ * Handler for protected VM HVC calls.
+ *
+ * Returns true if the hypervisor has handled the exit, and control should go
+ * back to the guest, or false if it hasn't.
+ */
+bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	u64 val[4] = { SMCCC_RET_NOT_SUPPORTED };
+	u32 fn = smccc_get_function(vcpu);
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+
+	hyp_vcpu = container_of(vcpu, struct pkvm_hyp_vcpu, vcpu);
+
+	switch (fn) {
+	case ARM_SMCCC_VERSION_FUNC_ID:
+		/* Nothing to be handled by the host. Go back to the guest. */
+		val[0] = ARM_SMCCC_VERSION_1_1;
+		break;
+	case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
+		val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0;
+		val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1;
+		val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2;
+		val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
+		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
+		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_HYP_MEMINFO);
+		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MEM_SHARE);
+		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MEM_UNSHARE);
+		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MEM_RELINQUISH);
+		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO);
+		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL);
+		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP);
+		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP);
+		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID:
+		set_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags);
+		val[0] = SMCCC_RET_SUCCESS;
+		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID:
+		return pkvm_install_ioguard_page(hyp_vcpu, exit_code);
+	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID:
+		if (__pkvm_remove_ioguard_page(hyp_vcpu, vcpu_get_reg(vcpu, 1)))
+			val[0] = SMCCC_RET_INVALID_PARAMETER;
+		else
+			val[0] = SMCCC_RET_SUCCESS;
+		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID:
+	case ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID:
+		return pkvm_meminfo_call(hyp_vcpu);
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_SHARE_FUNC_ID:
+		return pkvm_memshare_call(hyp_vcpu, exit_code);
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_UNSHARE_FUNC_ID:
+		return pkvm_memunshare_call(hyp_vcpu);
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_RELINQUISH_FUNC_ID:
+		return pkvm_memrelinquish_call(hyp_vcpu);
+	case ARM_SMCCC_TRNG_VERSION ... ARM_SMCCC_TRNG_RND32:
+	case ARM_SMCCC_TRNG_RND64:
+		if (smccc_trng_available)
+			return pkvm_forward_trng(vcpu);
+		break;
+	default:
+		return pkvm_handle_psci(hyp_vcpu);
+	}
+
+	smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
+	return true;
+}
+
+/*
+ * Handler for non-protected VM HVC calls.
+ *
+ * Returns true if the hypervisor has handled the exit, and control should go
+ * back to the guest, or false if it hasn't.
+ */
+bool kvm_hyp_handle_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	u32 fn = smccc_get_function(vcpu);
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+
+	hyp_vcpu = container_of(vcpu, struct pkvm_hyp_vcpu, vcpu);
+
+	switch (fn) {
+	case ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID:
+		return pkvm_meminfo_call(hyp_vcpu);
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_RELINQUISH_FUNC_ID:
+		return pkvm_memrelinquish_call(hyp_vcpu);
+	}
+
+	return false;
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
index 0850878..f48321f 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
@@ -6,12 +6,15 @@
 
 #include <asm/kvm_asm.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_hypevents.h>
 #include <asm/kvm_mmu.h>
 #include <linux/arm-smccc.h>
 #include <linux/kvm_host.h>
 #include <uapi/linux/psci.h>
 
+#include <nvhe/mem_protect.h>
 #include <nvhe/memory.h>
+#include <nvhe/pkvm.h>
 #include <nvhe/trap_handler.h>
 
 void kvm_hyp_cpu_entry(unsigned long r0);
@@ -22,6 +25,20 @@ void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt);
 /* Config options set by the host. */
 struct kvm_host_psci_config __ro_after_init kvm_host_psci_config;
 
+static void (*pkvm_psci_notifier)(enum pkvm_psci_notification, struct kvm_cpu_context *);
+static void pkvm_psci_notify(enum pkvm_psci_notification notif, struct kvm_cpu_context *host_ctxt)
+{
+	if (READ_ONCE(pkvm_psci_notifier))
+		pkvm_psci_notifier(notif, host_ctxt);
+}
+
+#ifdef CONFIG_MODULES
+int __pkvm_register_psci_notifier(void (*cb)(enum pkvm_psci_notification, struct kvm_cpu_context *))
+{
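+	/* cmpxchg() returns the old value: non-NULL means a notifier was already registered. */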
+	return cmpxchg(&pkvm_psci_notifier, NULL, cb) ? -EBUSY : 0;
+}
+#endif
+
 #define INVALID_CPU_ID	UINT_MAX
 
 struct psci_boot_args {
@@ -153,6 +170,7 @@ static int psci_cpu_suspend(u64 func_id, struct kvm_cpu_context *host_ctxt)
 	DECLARE_REG(u64, power_state, host_ctxt, 1);
 	DECLARE_REG(unsigned long, pc, host_ctxt, 2);
 	DECLARE_REG(unsigned long, r0, host_ctxt, 3);
+	int ret;
 
 	struct psci_boot_args *boot_args;
 	struct kvm_nvhe_init_params *init_params;
@@ -167,13 +185,18 @@ static int psci_cpu_suspend(u64 func_id, struct kvm_cpu_context *host_ctxt)
 	boot_args->pc = pc;
 	boot_args->r0 = r0;
 
+	pkvm_psci_notify(PKVM_PSCI_CPU_SUSPEND, host_ctxt);
+	trace_hyp_exit();
 	/*
 	 * Will either return if shallow sleep state, or wake up into the entry
 	 * point if it is a deep sleep state.
 	 */
-	return psci_call(func_id, power_state,
-			 __hyp_pa(&kvm_hyp_cpu_resume),
-			 __hyp_pa(init_params));
+	ret = psci_call(func_id, power_state,
+			__hyp_pa(&kvm_hyp_cpu_resume),
+			__hyp_pa(init_params));
+	trace_hyp_enter();
+
+	return ret;
 }
 
 static int psci_system_suspend(u64 func_id, struct kvm_cpu_context *host_ctxt)
@@ -194,6 +217,8 @@ static int psci_system_suspend(u64 func_id, struct kvm_cpu_context *host_ctxt)
 	boot_args->pc = pc;
 	boot_args->r0 = r0;
 
+	pkvm_psci_notify(PKVM_PSCI_SYSTEM_SUSPEND, host_ctxt);
+
 	/* Will only return on error. */
 	return psci_call(func_id,
 			 __hyp_pa(&kvm_hyp_cpu_resume),
@@ -218,9 +243,49 @@ asmlinkage void __noreturn kvm_host_psci_cpu_entry(bool is_cpu_on)
 	if (is_cpu_on)
 		release_boot_args(boot_args);
 
+	pkvm_psci_notify(PKVM_PSCI_CPU_ENTRY, host_ctxt);
+
 	__host_enter(host_ctxt);
 }
 
+static DEFINE_HYP_SPINLOCK(mem_protect_lock);
+
+static u64 psci_mem_protect(s64 offset)
+{
+	static u64 cnt;
+	u64 new = cnt + offset;
+
+	hyp_assert_lock_held(&mem_protect_lock);
+
+	if (!offset || kvm_host_psci_config.version < PSCI_VERSION(1, 1))
+		return cnt;
+
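+	/* Refcounted: only toggle EL3 MEM_PROTECT on zero <-> non-zero transitions. */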
+	if (!cnt || !new)
+		psci_call(PSCI_1_1_FN_MEM_PROTECT, offset < 0 ? 0 : 1, 0, 0);
+
+	cnt = new;
+	return cnt;
+}
+
+static bool psci_mem_protect_active(void)
+{
+	return psci_mem_protect(0);
+}
+
+void psci_mem_protect_inc(u64 n)
+{
+	hyp_spin_lock(&mem_protect_lock);
+	psci_mem_protect(n);
+	hyp_spin_unlock(&mem_protect_lock);
+}
+
+void psci_mem_protect_dec(u64 n)
+{
+	hyp_spin_lock(&mem_protect_lock);
+	psci_mem_protect(-n);
+	hyp_spin_unlock(&mem_protect_lock);
+}
+
 static unsigned long psci_0_1_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
 {
 	if (is_psci_0_1(cpu_off, func_id) || is_psci_0_1(migrate, func_id))
@@ -249,6 +314,9 @@ static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_
 	 */
 	case PSCI_0_2_FN_SYSTEM_OFF:
 	case PSCI_0_2_FN_SYSTEM_RESET:
+		pkvm_poison_pvmfw_pages();
+		/* Avoid racing with a MEM_PROTECT call. */
+		hyp_spin_lock(&mem_protect_lock);
 		return psci_forward(host_ctxt);
 	case PSCI_0_2_FN64_CPU_SUSPEND:
 		return psci_cpu_suspend(func_id, host_ctxt);
@@ -262,9 +330,14 @@ static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_
 static unsigned long psci_1_0_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
 {
 	switch (func_id) {
+	case PSCI_1_1_FN64_SYSTEM_RESET2:
+		pkvm_poison_pvmfw_pages();
+		hyp_spin_lock(&mem_protect_lock);
+		if (psci_mem_protect_active())
+			cpu_reg(host_ctxt, 0) = PSCI_0_2_FN_SYSTEM_RESET;
+		fallthrough;
 	case PSCI_1_0_FN_PSCI_FEATURES:
 	case PSCI_1_0_FN_SET_SUSPEND_MODE:
-	case PSCI_1_1_FN64_SYSTEM_RESET2:
 		return psci_forward(host_ctxt);
 	case PSCI_1_0_FN64_SYSTEM_SUSPEND:
 		return psci_system_suspend(func_id, host_ctxt);
diff --git a/arch/arm64/kvm/hyp/nvhe/serial.c b/arch/arm64/kvm/hyp/nvhe/serial.c
new file mode 100644
index 0000000..0b2cf3b
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/serial.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2022 - Google LLC
+ */
+
+#include <nvhe/pkvm.h>
+#include <nvhe/spinlock.h>
+
+static void (*__hyp_putc)(char c);
+
+static inline void __hyp_putx4(unsigned int x)
+{
+	x &= 0xf;
+	if (x <= 9)
+		x += '0';
+	else
+		x += ('a' - 0xa);
+
+	__hyp_putc(x);
+}
+
+static inline void __hyp_putx4n(unsigned long x, int n)
+{
+	int i = n >> 2;
+
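+	/* Emit "0x" followed by n/4 hex digits, most-significant nibble first. */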
+	__hyp_putc('0');
+	__hyp_putc('x');
+
+	while (i--)
+		__hyp_putx4(x >> (4 * i));
+
+	__hyp_putc('\n');
+	__hyp_putc('\r');
+}
+
+static inline bool hyp_serial_enabled(void)
+{
+	return !!READ_ONCE(__hyp_putc);
+}
+
+void hyp_puts(const char *s)
+{
+	if (!hyp_serial_enabled())
+		return;
+
+	while (*s)
+		__hyp_putc(*s++);
+
+	__hyp_putc('\n');
+	__hyp_putc('\r');
+}
+
+void hyp_putx64(u64 x)
+{
+	if (hyp_serial_enabled())
+		__hyp_putx4n(x, 64);
+}
+
+void hyp_putc(char c)
+{
+	if (hyp_serial_enabled())
+		__hyp_putc(c);
+}
+
+int __pkvm_register_serial_driver(void (*cb)(char))
+{
+	return cmpxchg(&__hyp_putc, NULL, cb) ? -EBUSY : 0;
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index e8d4ea2..a3f7878 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -11,36 +11,48 @@
 #include <asm/kvm_pkvm.h>
 
 #include <nvhe/early_alloc.h>
-#include <nvhe/fixed_config.h>
+#include <nvhe/ffa.h>
 #include <nvhe/gfp.h>
+#include <nvhe/iommu.h>
 #include <nvhe/memory.h>
 #include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
+#include <nvhe/pkvm.h>
+#include <nvhe/serial.h>
 #include <nvhe/trap_handler.h>
 
 unsigned long hyp_nr_cpus;
 
+phys_addr_t pvmfw_base;
+phys_addr_t pvmfw_size;
+
 #define hyp_percpu_size ((unsigned long)__per_cpu_end - \
 			 (unsigned long)__per_cpu_start)
 
 static void *vmemmap_base;
+static void *vm_table_base;
 static void *hyp_pgt_base;
 static void *host_s2_pgt_base;
+static void *ffa_proxy_pages;
 static struct kvm_pgtable_mm_ops pkvm_pgtable_mm_ops;
 static struct hyp_pool hpool;
 
 static int divide_memory_pool(void *virt, unsigned long size)
 {
-	unsigned long vstart, vend, nr_pages;
+	unsigned long nr_pages;
 
 	hyp_early_alloc_init(virt, size);
 
-	hyp_vmemmap_range(__hyp_pa(virt), size, &vstart, &vend);
-	nr_pages = (vend - vstart) >> PAGE_SHIFT;
+	nr_pages = hyp_vmemmap_pages(sizeof(struct hyp_page));
 	vmemmap_base = hyp_early_alloc_contig(nr_pages);
 	if (!vmemmap_base)
 		return -ENOMEM;
 
+	nr_pages = hyp_vm_table_pages();
+	vm_table_base = hyp_early_alloc_contig(nr_pages);
+	if (!vm_table_base)
+		return -ENOMEM;
+
 	nr_pages = hyp_s1_pgtable_pages();
 	hyp_pgt_base = hyp_early_alloc_contig(nr_pages);
 	if (!hyp_pgt_base)
@@ -51,6 +63,11 @@ static int divide_memory_pool(void *virt, unsigned long size)
 	if (!host_s2_pgt_base)
 		return -ENOMEM;
 
+	nr_pages = hyp_ffa_proxy_pages();
+	ffa_proxy_pages = hyp_early_alloc_contig(nr_pages);
+	if (!ffa_proxy_pages)
+		return -ENOMEM;
+
 	return 0;
 }
 
@@ -78,7 +95,7 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 	if (ret)
 		return ret;
 
-	ret = hyp_back_vmemmap(phys, size, hyp_virt_to_phys(vmemmap_base));
+	ret = hyp_back_vmemmap(hyp_virt_to_phys(vmemmap_base));
 	if (ret)
 		return ret;
 
@@ -86,6 +103,10 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 	if (ret)
 		return ret;
 
+	ret = pkvm_create_mappings(__hyp_data_start, __hyp_data_end, PAGE_HYP);
+	if (ret)
+		return ret;
+
 	ret = pkvm_create_mappings(__hyp_rodata_start, __hyp_rodata_end, PAGE_HYP_RO);
 	if (ret)
 		return ret;
@@ -138,20 +159,24 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 	}
 
 	/*
-	 * Map the host's .bss and .rodata sections RO in the hypervisor, but
-	 * transfer the ownership from the host to the hypervisor itself to
-	 * make sure it can't be donated or shared with another entity.
+	 * Map the host sections RO in the hypervisor, but transfer the
+	 * ownership from the host to the hypervisor itself to make sure they
+	 * can't be donated or shared with another entity.
 	 *
 	 * The ownership transition requires matching changes in the host
 	 * stage-2. This will be done later (see finalize_host_mappings()) once
 	 * the hyp_vmemmap is addressable.
 	 */
 	prot = pkvm_mkstate(PAGE_HYP_RO, PKVM_PAGE_SHARED_OWNED);
-	ret = pkvm_create_mappings(__start_rodata, __end_rodata, prot);
+	ret = pkvm_create_mappings(&kvm_vgic_global_state,
+				   &kvm_vgic_global_state + 1, prot);
 	if (ret)
 		return ret;
 
-	ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, prot);
+	start = hyp_phys_to_virt(pvmfw_base);
+	end = start + pvmfw_size;
+	prot = pkvm_mkstate(PAGE_HYP_RO, PKVM_PAGE_OWNED);
+	ret = pkvm_create_mappings(start, end, prot);
 	if (ret)
 		return ret;
 
@@ -186,12 +211,11 @@ static void hpool_put_page(void *addr)
 	hyp_put_page(&hpool, addr);
 }
 
-static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
-					 kvm_pte_t *ptep,
-					 enum kvm_pgtable_walk_flags flag,
-					 void * const arg)
+static int fix_host_ownership_walker(u64 addr, u64 end, u32 level,
+				     kvm_pte_t *ptep,
+				     enum kvm_pgtable_walk_flags flag,
+				     void * const arg)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = arg;
 	enum kvm_pgtable_prot prot;
 	enum pkvm_page_state state;
 	kvm_pte_t pte = *ptep;
@@ -200,15 +224,6 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
 	if (!kvm_pte_valid(pte))
 		return 0;
 
-	/*
-	 * Fix-up the refcount for the page-table pages as the early allocator
-	 * was unable to access the hyp_vmemmap and so the buddy allocator has
-	 * initialised the refcount to '1'.
-	 */
-	mm_ops->get_page(ptep);
-	if (flag != KVM_PGTABLE_WALK_LEAF)
-		return 0;
-
 	if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
 		return -EINVAL;
 
@@ -223,7 +238,7 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
 	state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
 	switch (state) {
 	case PKVM_PAGE_OWNED:
-		return host_stage2_set_owner_locked(phys, PAGE_SIZE, pkvm_hyp_id);
+		return host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HYP);
 	case PKVM_PAGE_SHARED_OWNED:
 		prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_BORROWED);
 		break;
@@ -234,15 +249,33 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
 		return -EINVAL;
 	}
 
-	return host_stage2_idmap_locked(phys, PAGE_SIZE, prot);
+	return host_stage2_idmap_locked(phys, PAGE_SIZE, prot, false);
 }
 
-static int finalize_host_mappings(void)
+static int fix_hyp_pgtable_refcnt_walker(u64 addr, u64 end, u32 level,
+					 kvm_pte_t *ptep,
+					 enum kvm_pgtable_walk_flags flag,
+					 void * const arg)
+{
+	struct kvm_pgtable_mm_ops *mm_ops = arg;
+	kvm_pte_t pte = *ptep;
+
+	/*
+	 * Fix-up the refcount for the page-table pages as the early allocator
+	 * was unable to access the hyp_vmemmap and so the buddy allocator has
+	 * initialised the refcount to '1'.
+	 */
+	if (kvm_pte_valid(pte))
+		mm_ops->get_page(ptep);
+
+	return 0;
+}
+
+static int fix_host_ownership(void)
 {
 	struct kvm_pgtable_walker walker = {
-		.cb	= finalize_host_mappings_walker,
-		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
-		.arg	= pkvm_pgtable.mm_ops,
+		.cb	= fix_host_ownership_walker,
+		.flags	= KVM_PGTABLE_WALK_LEAF,
 	};
 	int i, ret;
 
@@ -258,6 +291,35 @@ static int finalize_host_mappings(void)
 	return 0;
 }
 
+static int fix_hyp_pgtable_refcnt(void)
+{
+	struct kvm_pgtable_walker walker = {
+		.cb	= fix_hyp_pgtable_refcnt_walker,
+		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
+		.arg	= pkvm_pgtable.mm_ops,
+	};
+
+	return kvm_pgtable_walk(&pkvm_pgtable, 0, BIT(pkvm_pgtable.ia_bits),
+				&walker);
+}
+
+static int unmap_protected_regions(void)
+{
+	struct pkvm_moveable_reg *reg;
+	int i, ret;
+
+	for (i = 0; i < pkvm_moveable_regs_nr; i++) {
+		reg = &pkvm_moveable_regs[i];
+		if (reg->type != PKVM_MREG_PROTECTED_RANGE)
+			continue;
+		ret = host_stage2_protect_pages_locked(reg->start, reg->size);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 void __noreturn __pkvm_init_finalise(void)
 {
 	struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
@@ -287,10 +349,27 @@ void __noreturn __pkvm_init_finalise(void)
 	};
 	pkvm_pgtable.mm_ops = &pkvm_pgtable_mm_ops;
 
-	ret = finalize_host_mappings();
+	ret = fix_host_ownership();
 	if (ret)
 		goto out;
 
+	ret = fix_hyp_pgtable_refcnt();
+	if (ret)
+		goto out;
+
+	ret = hyp_create_pcpu_fixmap();
+	if (ret)
+		goto out;
+
+	ret = unmap_protected_regions();
+	if (ret)
+		goto out;
+
+	ret = hyp_ffa_init(ffa_proxy_pages);
+	if (ret)
+		goto out;
+
+	pkvm_hyp_vm_table_init(vm_table_base);
 out:
 	/*
 	 * We tail-called to here from handle___pkvm_init() and will not return,
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index c2cb46c..01694d3 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -21,13 +21,14 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_hypevents.h>
 #include <asm/kvm_mmu.h>
 #include <asm/fpsimd.h>
 #include <asm/debug-monitors.h>
 #include <asm/processor.h>
 
-#include <nvhe/fixed_config.h>
 #include <nvhe/mem_protect.h>
+#include <nvhe/pkvm.h>
 
 /* Non-VHE specific context */
 DEFINE_PER_CPU(struct kvm_host_data, kvm_host_data);
@@ -181,6 +182,7 @@ static bool kvm_handle_pvm_sys64(struct kvm_vcpu *vcpu, u64 *exit_code)
 static const exit_handler_fn hyp_exit_handlers[] = {
 	[0 ... ESR_ELx_EC_MAX]		= NULL,
 	[ESR_ELx_EC_CP15_32]		= kvm_hyp_handle_cp15_32,
+	[ESR_ELx_EC_HVC64]		= kvm_hyp_handle_hvc64,
 	[ESR_ELx_EC_SYS64]		= kvm_hyp_handle_sysreg,
 	[ESR_ELx_EC_SVE]		= kvm_hyp_handle_fpsimd,
 	[ESR_ELx_EC_FP_ASIMD]		= kvm_hyp_handle_fpsimd,
@@ -191,6 +193,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
 
 static const exit_handler_fn pvm_exit_handlers[] = {
 	[0 ... ESR_ELx_EC_MAX]		= NULL,
+	[ESR_ELx_EC_HVC64]		= kvm_handle_pvm_hvc64,
 	[ESR_ELx_EC_SYS64]		= kvm_handle_pvm_sys64,
 	[ESR_ELx_EC_SVE]		= kvm_handle_pvm_restricted,
 	[ESR_ELx_EC_FP_ASIMD]		= kvm_hyp_handle_fpsimd,
@@ -201,7 +204,7 @@ static const exit_handler_fn pvm_exit_handlers[] = {
 
 static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
 {
-	if (unlikely(kvm_vm_is_protected(kern_hyp_va(vcpu->kvm))))
+	if (unlikely(vcpu_is_protected(vcpu)))
 		return pvm_exit_handlers;
 
 	return hyp_exit_handlers;
@@ -220,9 +223,7 @@ static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
  */
 static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code)
 {
-	struct kvm *kvm = kern_hyp_va(vcpu->kvm);
-
-	if (kvm_vm_is_protected(kvm) && vcpu_mode_is_32bit(vcpu)) {
+	if (unlikely(vcpu_is_protected(vcpu) && vcpu_mode_is_32bit(vcpu))) {
 		/*
 		 * As we have caught the guest red-handed, decide that it isn't
 		 * fit for purpose anymore by making the vcpu invalid. The VMM
@@ -295,10 +296,13 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	__debug_switch_to_guest(vcpu);
 
 	do {
+		trace_hyp_exit();
+
 		/* Jump in the fire! */
 		exit_code = __guest_enter(vcpu);
 
 		/* And we're baaack! */
+		trace_hyp_enter();
 	} while (fixup_guest_exit(vcpu, &exit_code));
 
 	__sysreg_save_state_nvhe(guest_ctxt);
@@ -333,6 +337,12 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	return exit_code;
 }
 
+static void (*hyp_panic_notifier)(struct kvm_cpu_context *host_ctxt);
+int __pkvm_register_hyp_panic_notifier(void (*cb)(struct kvm_cpu_context *host_ctxt))
+{
+	return cmpxchg(&hyp_panic_notifier, NULL, cb) ? -EBUSY : 0;
+}
+
 asmlinkage void __noreturn hyp_panic(void)
 {
 	u64 spsr = read_sysreg_el2(SYS_SPSR);
@@ -344,6 +354,9 @@ asmlinkage void __noreturn hyp_panic(void)
 	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
 	vcpu = host_ctxt->__hyp_running_vcpu;
 
+	if (READ_ONCE(hyp_panic_notifier))
+		hyp_panic_notifier(host_ctxt);
+
 	if (vcpu) {
 		__timer_disable_traps(vcpu);
 		__deactivate_traps(vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/sys_regs.c b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
index 0f9ac25..5a80815 100644
--- a/arch/arm64/kvm/hyp/nvhe/sys_regs.c
+++ b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
@@ -11,7 +11,7 @@
 
 #include <hyp/adjust_pc.h>
 
-#include <nvhe/fixed_config.h>
+#include <nvhe/pkvm.h>
 
 #include "../../sys_regs.h"
 
@@ -26,6 +26,7 @@ u64 id_aa64isar2_el1_sys_val;
 u64 id_aa64mmfr0_el1_sys_val;
 u64 id_aa64mmfr1_el1_sys_val;
 u64 id_aa64mmfr2_el1_sys_val;
+u64 id_aa64smfr0_el1_sys_val;
 
 /*
  * Inject an unknown/undefined exception to an AArch64 guest while most of its
@@ -351,6 +352,17 @@ static const struct sys_reg_desc pvm_sys_reg_descs[] = {
 	/* Cache maintenance by set/way operations are restricted. */
 
 	/* Debug and Trace Registers are restricted. */
+	RAZ_WI(SYS_DBGBVRn_EL1(0)),
+	RAZ_WI(SYS_DBGBCRn_EL1(0)),
+	RAZ_WI(SYS_DBGWVRn_EL1(0)),
+	RAZ_WI(SYS_DBGWCRn_EL1(0)),
+	RAZ_WI(SYS_MDSCR_EL1),
+	RAZ_WI(SYS_OSLAR_EL1),
+	RAZ_WI(SYS_OSLSR_EL1),
+	RAZ_WI(SYS_OSDLR_EL1),
+
+	/* Group 1 ID registers */
+	RAZ_WI(SYS_REVIDR_EL1),
 
 	/* AArch64 mappings of the AArch32 ID registers */
 	/* CRm=1 */
@@ -454,8 +466,85 @@ static const struct sys_reg_desc pvm_sys_reg_descs[] = {
 	/* Performance Monitoring Registers are restricted. */
 };
 
+/* A structure to track reset values for system registers in protected vcpus. */
+struct sys_reg_desc_reset {
+	/* Index into sys_reg[]. */
+	int reg;
+
+	/* Reset function. */
+	void (*reset)(struct kvm_vcpu *, const struct sys_reg_desc_reset *);
+
+	/* Reset value. */
+	u64 value;
+};
+
+static void reset_actlr(struct kvm_vcpu *vcpu, const struct sys_reg_desc_reset *r)
+{
+	__vcpu_sys_reg(vcpu, r->reg) = read_sysreg(actlr_el1);
+}
+
+static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc_reset *r)
+{
+	__vcpu_sys_reg(vcpu, r->reg) = read_sysreg(amair_el1);
+}
+
+static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc_reset *r)
+{
+	__vcpu_sys_reg(vcpu, r->reg) = calculate_mpidr(vcpu);
+}
+
+static void reset_value(struct kvm_vcpu *vcpu, const struct sys_reg_desc_reset *r)
+{
+	__vcpu_sys_reg(vcpu, r->reg) = r->value;
+}
+
+/* Specify the register's reset value. */
+#define RESET_VAL(REG, RESET_VAL) {  REG, reset_value, RESET_VAL }
+
+/* Specify a function that calculates the register's reset value. */
+#define RESET_FUNC(REG, RESET_FUNC) {  REG, RESET_FUNC, 0 }
+
 /*
- * Checks that the sysreg table is unique and in-order.
+ * Architected system register reset values for Protected VMs.
+ * Important: must be sorted in ascending order by REG (index into sys_reg[]).
+ */
+static const struct sys_reg_desc_reset pvm_sys_reg_reset_vals[] = {
+	RESET_FUNC(MPIDR_EL1, reset_mpidr),
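+	/* Matches the SCTLR_EL1 reset value KVM uses for non-protected vcpus. */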
+	RESET_VAL(SCTLR_EL1, 0x00C50078),
+	RESET_FUNC(ACTLR_EL1, reset_actlr),
+	RESET_VAL(CPACR_EL1, 0),
+	RESET_VAL(ZCR_EL1, 0),
+	RESET_VAL(TCR_EL1, 0),
+	RESET_VAL(VBAR_EL1, 0),
+	RESET_VAL(CONTEXTIDR_EL1, 0),
+	RESET_FUNC(AMAIR_EL1, reset_amair_el1),
+	RESET_VAL(CNTKCTL_EL1, 0),
+	RESET_VAL(MDSCR_EL1, 0),
+	RESET_VAL(MDCCINT_EL1, 0),
+	RESET_VAL(DISR_EL1, 0),
+	RESET_VAL(PMCCFILTR_EL0, 0),
+	RESET_VAL(PMUSERENR_EL0, 0),
+};
+
+/*
+ * Sets system registers to their reset values.
+ *
+ * This function walks the reset table and sets the registers on the protected
+ * vcpu to their architecturally defined reset values.
+ */
+void kvm_reset_pvm_sys_regs(struct kvm_vcpu *vcpu)
+{
+	unsigned long i;
+
+	for (i = 0; i < ARRAY_SIZE(pvm_sys_reg_reset_vals); i++) {
+		const struct sys_reg_desc_reset *r = &pvm_sys_reg_reset_vals[i];
+
+		r->reset(vcpu, r);
+	}
+}
+
+/*
+ * Checks that the sysreg tables are unique and in-order.
  *
  * Returns 0 if the table is consistent, or 1 otherwise.
  */
@@ -468,6 +557,11 @@ int kvm_check_pvm_sysreg_table(void)
 			return 1;
 	}
 
+	for (i = 1; i < ARRAY_SIZE(pvm_sys_reg_reset_vals); i++) {
+		if (pvm_sys_reg_reset_vals[i-1].reg >= pvm_sys_reg_reset_vals[i].reg)
+			return 1;
+	}
+
 	return 0;
 }
 
diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
index d296d61..35092e1 100644
--- a/arch/arm64/kvm/hyp/nvhe/tlb.c
+++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
@@ -11,26 +11,62 @@
 #include <nvhe/mem_protect.h>
 
 struct tlb_inv_context {
-	u64		tcr;
+	struct kvm_s2_mmu	*mmu;
+	u64			tcr;
+	u64			sctlr;
 };
 
-static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
-				  struct tlb_inv_context *cxt)
+static void enter_vmid_context(struct kvm_s2_mmu *mmu,
+			       struct tlb_inv_context *cxt)
 {
+	struct kvm_s2_mmu *host_s2_mmu = &host_mmu.arch.mmu;
+	struct kvm_cpu_context *host_ctxt;
+	struct kvm_vcpu *vcpu;
+
+	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+	vcpu = host_ctxt->__hyp_running_vcpu;
+	cxt->mmu = NULL;
+
+	/*
+	 * If we're already in the desired context, then there's nothing
+	 * to do.
+	 */
+	if (vcpu) {
+		if (mmu == vcpu->arch.hw_mmu || WARN_ON(mmu != host_s2_mmu))
+			return;
+	} else if (mmu == host_s2_mmu) {
+		return;
+	}
+
+	cxt->mmu = mmu;
 	if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
 		u64 val;
 
 		/*
 		 * For CPUs that are affected by ARM 1319367, we need to
-		 * avoid a host Stage-1 walk while we have the guest's
-		 * VMID set in the VTTBR in order to invalidate TLBs.
-		 * We're guaranteed that the S1 MMU is enabled, so we can
-		 * simply set the EPD bits to avoid any further TLB fill.
+		 * avoid a Stage-1 walk with the old VMID while we have
+		 * the new VMID set in the VTTBR in order to invalidate TLBs.
+		 * We're guaranteed that the host S1 MMU is enabled, so
+		 * we can simply set the EPD bits to avoid any further
+		 * TLB fill. For guests, we ensure that the S1 MMU is
+		 * temporarily enabled in the next context.
 		 */
 		val = cxt->tcr = read_sysreg_el1(SYS_TCR);
 		val |= TCR_EPD1_MASK | TCR_EPD0_MASK;
 		write_sysreg_el1(val, SYS_TCR);
 		isb();
+
+		if (vcpu) {
+			val = cxt->sctlr = read_sysreg_el1(SYS_SCTLR);
+			if (!(val & SCTLR_ELx_M)) {
+				val |= SCTLR_ELx_M;
+				write_sysreg_el1(val, SYS_SCTLR);
+				isb();
+			}
+		} else {
+			/* The host S1 MMU is always enabled. */
+			cxt->sctlr = SCTLR_ELx_M;
+		}
 	}
 
 	/*
@@ -39,20 +75,44 @@ static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
 	 * ensuring that we always have an ISB, but not two ISBs back
 	 * to back.
 	 */
-	__load_stage2(mmu, kern_hyp_va(mmu->arch));
+	if (vcpu)
+		__load_host_stage2();
+	else
+		__load_stage2(mmu, kern_hyp_va(mmu->arch));
+
 	asm(ALTERNATIVE("isb", "nop", ARM64_WORKAROUND_SPECULATIVE_AT));
 }
 
-static void __tlb_switch_to_host(struct tlb_inv_context *cxt)
+static void exit_vmid_context(struct tlb_inv_context *cxt)
 {
-	__load_host_stage2();
+	struct kvm_s2_mmu *mmu = cxt->mmu;
+	struct kvm_cpu_context *host_ctxt;
+	struct kvm_vcpu *vcpu;
+
+	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+	vcpu = host_ctxt->__hyp_running_vcpu;
+
+	if (!mmu)
+		return;
+
+	if (vcpu)
+		__load_stage2(mmu, kern_hyp_va(mmu->arch));
+	else
+		__load_host_stage2();
 
 	if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
-		/* Ensure write of the host VMID */
+		/* Ensure write of the old VMID */
 		isb();
-		/* Restore the host's TCR_EL1 */
+
+		if (!(cxt->sctlr & SCTLR_ELx_M)) {
+			write_sysreg_el1(cxt->sctlr, SYS_SCTLR);
+			isb();
+		}
+
 		write_sysreg_el1(cxt->tcr, SYS_TCR);
 	}
+
+	cxt->mmu = NULL;
 }
 
 void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
@@ -63,7 +123,7 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
 	dsb(ishst);
 
 	/* Switch to requested VMID */
-	__tlb_switch_to_guest(mmu, &cxt);
+	enter_vmid_context(mmu, &cxt);
 
 	/*
 	 * We could do so much better if we had the VA as well.
@@ -106,7 +166,7 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
 	if (icache_is_vpipt())
 		icache_inval_all_pou();
 
-	__tlb_switch_to_host(&cxt);
+	exit_vmid_context(&cxt);
 }
 
 void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
@@ -116,13 +176,13 @@ void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
 	dsb(ishst);
 
 	/* Switch to requested VMID */
-	__tlb_switch_to_guest(mmu, &cxt);
+	enter_vmid_context(mmu, &cxt);
 
 	__tlbi(vmalls12e1is);
 	dsb(ish);
 	isb();
 
-	__tlb_switch_to_host(&cxt);
+	exit_vmid_context(&cxt);
 }
 
 void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
@@ -130,14 +190,14 @@ void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
 	struct tlb_inv_context cxt;
 
 	/* Switch to requested VMID */
-	__tlb_switch_to_guest(mmu, &cxt);
+	enter_vmid_context(mmu, &cxt);
 
 	__tlbi(vmalle1);
 	asm volatile("ic iallu");
 	dsb(nsh);
 	isb();
 
-	__tlb_switch_to_host(&cxt);
+	exit_vmid_context(&cxt);
 }
 
 void __kvm_flush_vm_context(void)
diff --git a/arch/arm64/kvm/hyp/nvhe/trace.c b/arch/arm64/kvm/hyp/nvhe/trace.c
new file mode 100644
index 0000000..714433b
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/trace.c
@@ -0,0 +1,596 @@
+#include <nvhe/clock.h>
+#include <nvhe/mem_protect.h>
+#include <nvhe/mm.h>
+#include <nvhe/trace.h>
+
+#include <asm/kvm_mmu.h>
+#include <asm/local.h>
+
+#include <linux/ring_buffer.h>
+
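+/*
+ * The ring-buffer state flags are stored in the low bits of the
+ * (page-aligned) list.next pointer of each buffer page.
+ */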
+#define HYP_RB_PAGE_HEAD		1UL
+#define HYP_RB_PAGE_UPDATE		2UL
+#define HYP_RB_FLAG_MASK		3UL
+
+static struct hyp_buffer_pages_backing hyp_buffer_pages_backing;
+DEFINE_PER_CPU(struct hyp_rb_per_cpu, trace_rb);
+DEFINE_HYP_SPINLOCK(trace_rb_lock);
+
+static bool rb_set_flag(struct hyp_buffer_page *bpage, int new_flag)
+{
+	unsigned long ret, val = (unsigned long)bpage->list.next;
+
+	ret = cmpxchg((unsigned long *)&bpage->list.next,
+		      val, (val & ~HYP_RB_FLAG_MASK) | new_flag);
+
+	return ret == val;
+}
+
+static void rb_set_footer_status(struct hyp_buffer_page *bpage,
+				 unsigned long status,
+				 bool reader)
+{
+	struct buffer_data_page *page = bpage->page;
+	struct rb_ext_page_footer *footer;
+
+	footer = rb_ext_page_get_footer(page);
+
+	if (reader)
+		atomic_set(&footer->reader_status, status);
+	else
+		atomic_set(&footer->writer_status, status);
+}
+
+static void rb_footer_writer_status(struct hyp_buffer_page *bpage,
+				    unsigned long status)
+{
+	rb_set_footer_status(bpage, status, false);
+}
+
+static void rb_footer_reader_status(struct hyp_buffer_page *bpage,
+				    unsigned long status)
+{
+	rb_set_footer_status(bpage, status, true);
+}
+
+static struct hyp_buffer_page *rb_hyp_buffer_page(struct list_head *list)
+{
+	unsigned long ptr = (unsigned long)list & ~HYP_RB_FLAG_MASK;
+
+	return container_of((struct list_head *)ptr, struct hyp_buffer_page, list);
+}
+
+static struct hyp_buffer_page *rb_next_page(struct hyp_buffer_page *bpage)
+{
+	return rb_hyp_buffer_page(bpage->list.next);
+}
+
+static bool rb_is_head_page(struct hyp_buffer_page *bpage)
+{
+	return (unsigned long)bpage->list.prev->next & HYP_RB_PAGE_HEAD;
+}
+
+static struct hyp_buffer_page *rb_set_head_page(struct hyp_rb_per_cpu *cpu_buffer)
+{
+	struct hyp_buffer_page *bpage, *prev_head;
+	int cnt = 0;
+again:
+	bpage = prev_head = cpu_buffer->head_page;
+	do {
+		if (rb_is_head_page(bpage)) {
+			cpu_buffer->head_page = bpage;
+			rb_footer_reader_status(prev_head, 0);
+			rb_footer_reader_status(bpage, RB_PAGE_FT_HEAD);
+			return bpage;
+		}
+
+		bpage = rb_next_page(bpage);
+	} while (bpage != prev_head);
+
+	cnt++;
+
+	/* We might have raced with the writer, let's try again */
+	if (cnt < 3)
+		goto again;
+
+	return NULL;
+}
+
+static int rb_swap_reader_page(struct hyp_rb_per_cpu *cpu_buffer)
+{
+	unsigned long *old_head_link, old_link_val, new_link_val, overrun;
+	struct hyp_buffer_page *head, *reader = cpu_buffer->reader_page;
+	struct rb_ext_page_footer *footer;
+
+	rb_footer_reader_status(cpu_buffer->reader_page, 0);
+spin:
+	/* Update cpu_buffer->head_page according to HYP_RB_PAGE_HEAD */
+	head = rb_set_head_page(cpu_buffer);
+	if (!head)
+		return -ENODEV;
+
+	/* Connect the reader page around the header page */
+	reader->list.next = head->list.next;
+	reader->list.prev = head->list.prev;
+
+	/* The reader page points to the new header page */
+	rb_set_flag(reader, HYP_RB_PAGE_HEAD);
+
+	/*
+	 * Paired with the cmpxchg in rb_move_tail(). Order the read of the head
+	 * page and overrun.
+	 */
+	smp_mb();
+	overrun = atomic_read(&cpu_buffer->overrun);
+
+	/* Try to swap the prev head link to the reader page */
+	old_head_link = (unsigned long *)&reader->list.prev->next;
+	old_link_val = (*old_head_link & ~HYP_RB_FLAG_MASK) | HYP_RB_PAGE_HEAD;
+	new_link_val = (unsigned long)&reader->list;
+	if (cmpxchg(old_head_link, old_link_val, new_link_val)
+		      != old_link_val)
+		goto spin;
+
+	cpu_buffer->head_page = rb_hyp_buffer_page(reader->list.next);
+	cpu_buffer->head_page->list.prev = &reader->list;
+	cpu_buffer->reader_page = head;
+
+	rb_footer_reader_status(cpu_buffer->reader_page, RB_PAGE_FT_READER);
+	rb_footer_reader_status(cpu_buffer->head_page, RB_PAGE_FT_HEAD);
+
+	footer = rb_ext_page_get_footer(cpu_buffer->reader_page->page);
+	footer->stats.overrun = overrun;
+
+	return 0;
+}
+
+static struct hyp_buffer_page *
+rb_move_tail(struct hyp_rb_per_cpu *cpu_buffer)
+{
+	struct hyp_buffer_page *tail_page, *new_tail, *new_head;
+
+	tail_page = cpu_buffer->tail_page;
+	new_tail = rb_next_page(tail_page);
+again:
+	/*
+	 * We caught the reader ... Let's try to move the head page.
+	 * The writer can only rely on ->next links to check if this is head.
+	 */
+	if ((unsigned long)tail_page->list.next & HYP_RB_PAGE_HEAD) {
+		/* The reader moved the head in between */
+		if (!rb_set_flag(tail_page, HYP_RB_PAGE_UPDATE))
+			goto again;
+
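+		/* Entries on the page being recycled are lost: account them as overrun. */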
+		atomic_add(atomic_read(&new_tail->entries), &cpu_buffer->overrun);
+
+		/* Move the head */
+		rb_set_flag(new_tail, HYP_RB_PAGE_HEAD);
+
+		/* The new head is in place, reset the update flag */
+		rb_set_flag(tail_page, 0);
+
+		new_head = rb_next_page(new_tail);
+	}
+
+	rb_footer_writer_status(tail_page, 0);
+	rb_footer_writer_status(new_tail, RB_PAGE_FT_COMMIT);
+
+	local_set(&new_tail->page->commit, 0);
+
+	atomic_set(&new_tail->write, 0);
+	atomic_set(&new_tail->entries, 0);
+
+	atomic_inc(&cpu_buffer->pages_touched);
+
+	cpu_buffer->tail_page = new_tail;
+
+	return new_tail;
+}
+
+unsigned long rb_event_size(unsigned long length)
+{
+	struct ring_buffer_event *event;
+
+	return length + RB_EVNT_HDR_SIZE + sizeof(event->array[0]);
+}
+
+static struct ring_buffer_event *
+rb_add_ts_extend(struct ring_buffer_event *event, u64 delta)
+{
+	event->type_len = RINGBUF_TYPE_TIME_EXTEND;
+	event->time_delta = delta & TS_MASK;
+	event->array[0] = delta >> TS_SHIFT;
+
+	return (struct ring_buffer_event *)((unsigned long)event + 8);
+}
+
+static struct ring_buffer_event *
+rb_reserve_next(struct hyp_rb_per_cpu *cpu_buffer, unsigned long length)
+{
+	unsigned long ts_ext_size = 0, event_size = rb_event_size(length);
+	struct hyp_buffer_page *tail_page = cpu_buffer->tail_page;
+	struct ring_buffer_event *event;
+	unsigned long write, prev_write;
+	u64 ts, time_delta;
+
+	ts = trace_clock();
+
+	time_delta = ts - atomic64_read(&cpu_buffer->write_stamp);
+
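+	/* A delta too large for the event header needs an extra TIME_EXTEND event. */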
+	if (test_time_stamp(time_delta))
+		ts_ext_size = 8;
+
+	prev_write = atomic_read(&tail_page->write);
+	write = prev_write + event_size + ts_ext_size;
+
+	if (unlikely(write > BUF_EXT_PAGE_SIZE))
+		tail_page = rb_move_tail(cpu_buffer);
+
+	if (!atomic_read(&tail_page->entries)) {
+		tail_page->page->time_stamp = ts;
+		time_delta = 0;
+		ts_ext_size = 0;
+		write = event_size;
+		prev_write = 0;
+	}
+
+	atomic_set(&tail_page->write, write);
+	atomic_inc(&tail_page->entries);
+
+	local_set(&tail_page->page->commit, write);
+
+	atomic_inc(&cpu_buffer->nr_entries);
+	atomic64_set(&cpu_buffer->write_stamp, ts);
+
+	event = (struct ring_buffer_event *)(tail_page->page->data +
+					     prev_write);
+	if (ts_ext_size) {
+		event = rb_add_ts_extend(event, time_delta);
+		time_delta = 0;
+	}
+
+	event->type_len = 0;
+	event->time_delta = time_delta;
+	event->array[0] = event_size - RB_EVNT_HDR_SIZE;
+
+	return event;
+}
+
+void *
+rb_reserve_trace_entry(struct hyp_rb_per_cpu *cpu_buffer, unsigned long length)
+{
+	struct ring_buffer_event *rb_event;
+
+	rb_event = rb_reserve_next(cpu_buffer, length);
+
+	return &rb_event->array[1];
+}
+
+static int rb_update_footers(struct hyp_rb_per_cpu *cpu_buffer)
+{
+	unsigned long entries, pages_touched, overrun;
+	struct rb_ext_page_footer *footer;
+	struct buffer_data_page *reader;
+
+	if (!rb_set_head_page(cpu_buffer))
+		return -ENODEV;
+
+	reader = cpu_buffer->reader_page->page;
+	footer = rb_ext_page_get_footer(reader);
+
+	entries = atomic_read(&cpu_buffer->nr_entries);
+	footer->stats.entries = entries;
+	pages_touched = atomic_read(&cpu_buffer->pages_touched);
+	footer->stats.pages_touched = pages_touched;
+	overrun = atomic_read(&cpu_buffer->overrun);
+	footer->stats.overrun = overrun;
+
+	return 0;
+}
+
+static int rb_page_init(struct hyp_buffer_page *bpage, unsigned long hva)
+{
+	void *hyp_va = (void *)kern_hyp_va(hva);
+	int ret;
+
+	ret = hyp_pin_shared_mem(hyp_va, hyp_va + PAGE_SIZE);
+	if (ret)
+		return ret;
+
+	INIT_LIST_HEAD(&bpage->list);
+	bpage->page = (struct buffer_data_page *)hyp_va;
+
+	atomic_set(&bpage->write, 0);
+
+	rb_footer_reader_status(bpage, 0);
+	rb_footer_writer_status(bpage, 0);
+
+	return 0;
+}
+
+static void rb_cpu_teardown(struct hyp_rb_per_cpu *cpu_buffer)
+{
+	unsigned int prev_status;
+	int i;
+
+	/* Wait for release of the buffer */
+	do {
+		prev_status = atomic_cmpxchg_acquire(&cpu_buffer->status,
+						     HYP_RB_READY,
+						     HYP_RB_UNUSED);
+	} while (prev_status == HYP_RB_WRITE);
+
+	if (prev_status == HYP_RB_READY)
+		rb_update_footers(cpu_buffer);
+
+	for (i = 0; i < cpu_buffer->nr_pages; i++) {
+		struct hyp_buffer_page *bpage = &cpu_buffer->bpages[i];
+
+		if (!bpage->page)
+			continue;
+
+		hyp_unpin_shared_mem((void *)bpage->page,
+				     (void *)bpage->page + PAGE_SIZE);
+	}
+}
+
+static bool rb_cpu_fits_backing(unsigned long nr_pages,
+			        struct hyp_buffer_page *start)
+{
+	unsigned long max = hyp_buffer_pages_backing.start +
+			    hyp_buffer_pages_backing.size;
+	struct hyp_buffer_page *end = start + nr_pages;
+
+	return (unsigned long)end <= max;
+}
+
+static bool rb_cpu_fits_pack(struct ring_buffer_pack *rb_pack,
+			     unsigned long pack_end)
+{
+	unsigned long *end;
+
+	/* Check we can at least read nr_pages */
+	if ((unsigned long)&rb_pack->nr_pages >= pack_end)
+		return false;
+
+	end = &rb_pack->page_va[rb_pack->nr_pages];
+
+	return (unsigned long)end <= pack_end;
+}
+
+static int rb_cpu_init(struct ring_buffer_pack *rb_pack, struct hyp_buffer_page *start,
+		       struct hyp_rb_per_cpu *cpu_buffer)
+{
+	struct hyp_buffer_page *bpage = start;
+	int i, ret;
+
+	if (!rb_pack->nr_pages ||
+	    !rb_cpu_fits_backing(rb_pack->nr_pages + 1, start))
+		return -EINVAL;
+
+	memset(cpu_buffer, 0, sizeof(*cpu_buffer));
+
+	cpu_buffer->bpages = start;
+	cpu_buffer->nr_pages = rb_pack->nr_pages + 1;
+
+	/* The reader page is not part of the ring initially */
+	ret = rb_page_init(bpage, rb_pack->reader_page_va);
+	if (ret)
+		return ret;
+
+	cpu_buffer->reader_page = bpage;
+	cpu_buffer->tail_page = bpage + 1;
+	cpu_buffer->head_page = bpage + 1;
+
+	for (i = 0; i < rb_pack->nr_pages; i++) {
+		ret = rb_page_init(++bpage, rb_pack->page_va[i]);
+		if (ret)
+			goto err;
+
+		bpage->list.next = &(bpage + 1)->list;
+		bpage->list.prev = &(bpage - 1)->list;
+	}
+
+	/* Close the ring */
+	bpage->list.next = &cpu_buffer->tail_page->list;
+	cpu_buffer->tail_page->list.prev = &bpage->list;
+
+	/* The last init'ed page points to the head page */
+	rb_set_flag(bpage, HYP_RB_PAGE_HEAD);
+
+	rb_footer_reader_status(cpu_buffer->reader_page, RB_PAGE_FT_READER);
+	rb_footer_reader_status(cpu_buffer->head_page, RB_PAGE_FT_HEAD);
+	rb_footer_writer_status(cpu_buffer->head_page, RB_PAGE_FT_COMMIT);
+
+	atomic_set(&cpu_buffer->overrun, 0);
+	atomic64_set(&cpu_buffer->write_stamp, 0);
+
+	/*
+	 * Paired with __start_write_hyp_rb()
+	 */
+	atomic_set_release(&cpu_buffer->status, HYP_RB_READY);
+
+	return 0;
+err:
+	rb_cpu_teardown(cpu_buffer);
+
+	return ret;
+}
+
+static int rb_setup_bpage_backing(struct hyp_trace_pack *pack)
+{
+	unsigned long start = kern_hyp_va(pack->backing.start);
+	size_t size = pack->backing.size;
+	int ret;
+
+	if (hyp_buffer_pages_backing.size)
+		return -EBUSY;
+
+	if (!PAGE_ALIGNED(start) || !PAGE_ALIGNED(size))
+		return -EINVAL;
+
+	ret = __pkvm_host_donate_hyp(hyp_virt_to_pfn((void *)start), size >> PAGE_SHIFT);
+	if (ret)
+		return ret;
+
+	memset((void *)start, 0, size);
+
+	hyp_buffer_pages_backing.start = start;
+	hyp_buffer_pages_backing.size = size;
+
+	return 0;
+}
+
+static void rb_teardown_bpage_backing(void)
+{
+	unsigned long start = hyp_buffer_pages_backing.start;
+	size_t size = hyp_buffer_pages_backing.size;
+
+	if (!size)
+		return;
+
+	memset((void *)start, 0, size);
+
+	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(start), size >> PAGE_SHIFT));
+
+	hyp_buffer_pages_backing.start = 0;
+	hyp_buffer_pages_backing.size = 0;
+}
+
+int __pkvm_rb_update_footers(int cpu)
+{
+	struct hyp_rb_per_cpu *cpu_buffer;
+	int ret = 0;
+
+	/* TODO: use a per-CPU lock instead of the global trace_rb_lock */
+	hyp_spin_lock(&trace_rb_lock);
+
+	if (cpu >= hyp_nr_cpus) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	cpu_buffer = per_cpu_ptr(&trace_rb, cpu);
+
+	if (atomic_read(&cpu_buffer->status) == HYP_RB_UNUSED) {
+		ret = -ENODEV;
+		goto err;
+	}
+
+	ret = rb_update_footers(cpu_buffer);
+err:
+	hyp_spin_unlock(&trace_rb_lock);
+
+	return ret;
+}
+
+int __pkvm_rb_swap_reader_page(int cpu)
+{
+	struct hyp_rb_per_cpu *cpu_buffer;
+	int ret = 0;
+
+	/* TODO: use a per-CPU lock instead of the global trace_rb_lock */
+	hyp_spin_lock(&trace_rb_lock);
+
+	if (cpu >= hyp_nr_cpus) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	cpu_buffer = per_cpu_ptr(&trace_rb, cpu);
+
+	if (atomic_read(&cpu_buffer->status) == HYP_RB_UNUSED) {
+		ret = -ENODEV;
+		goto err;
+	}
+
+	ret = rb_swap_reader_page(cpu_buffer);
+err:
+	hyp_spin_unlock(&trace_rb_lock);
+
+	return ret;
+}
+
+static void __pkvm_stop_tracing_locked(void)
+{
+	int cpu;
+
+	hyp_assert_lock_held(&trace_rb_lock);
+
+	for (cpu = 0; cpu < hyp_nr_cpus; cpu++) {
+		struct hyp_rb_per_cpu *cpu_buffer = per_cpu_ptr(&trace_rb, cpu);
+
+		if (atomic_read(&cpu_buffer->status) == HYP_RB_UNUSED)
+			continue;
+
+		rb_cpu_teardown(cpu_buffer);
+	}
+
+	rb_teardown_bpage_backing();
+}
+
+void __pkvm_stop_tracing(void)
+{
+	hyp_spin_lock(&trace_rb_lock);
+	__pkvm_stop_tracing_locked();
+	hyp_spin_unlock(&trace_rb_lock);
+}
+
+int __pkvm_start_tracing(unsigned long pack_hva, size_t pack_size)
+{
+	struct hyp_trace_pack *pack = (struct hyp_trace_pack *)kern_hyp_va(pack_hva);
+	struct trace_buffer_pack *trace_pack = &pack->trace_buffer_pack;
+	struct hyp_buffer_page *bpage_backing_start;
+	struct ring_buffer_pack *rb_pack;
+	int ret, cpu;
+
+	if (!pack_size || !PAGE_ALIGNED(pack_hva) || !PAGE_ALIGNED(pack_size))
+		return -EINVAL;
+
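+	/*
+	 * Donate the pack so the host can't modify it while the hypervisor
+	 * parses it. It is unconditionally donated back at the end.
+	 */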
+	ret = __pkvm_host_donate_hyp(hyp_virt_to_pfn((void *)pack),
+				     pack_size >> PAGE_SHIFT);
+	if (ret)
+		return ret;
+
+	hyp_spin_lock(&trace_rb_lock);
+
+	ret = rb_setup_bpage_backing(pack);
+	if (ret)
+		goto err;
+
+	trace_clock_update(&pack->trace_clock_data);
+
+	bpage_backing_start = (struct hyp_buffer_page *)hyp_buffer_pages_backing.start;
+
+	for_each_ring_buffer_pack(rb_pack, cpu, trace_pack) {
+		struct hyp_rb_per_cpu *cpu_buffer;
+		int cpu;
+
+		ret = -EINVAL;
+		if (!rb_cpu_fits_pack(rb_pack, pack_hva + pack_size))
+			break;
+
+		cpu = rb_pack->cpu;
+		if (cpu >= hyp_nr_cpus)
+			break;
+
+		cpu_buffer = per_cpu_ptr(&trace_rb, cpu);
+
+		ret = rb_cpu_init(rb_pack, bpage_backing_start, cpu_buffer);
+		if (ret)
+			break;
+
+		/* reader page + nr pages in rb */
+		bpage_backing_start += 1 + rb_pack->nr_pages;
+	}
+err:
+	if (ret)
+		__pkvm_stop_tracing_locked();
+
+	hyp_spin_unlock(&trace_rb_lock);
+
+	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn((void *)pack),
+				       pack_size >> PAGE_SHIFT));
+	return ret;
+}
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index cdf8e76..e48b66b 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -12,43 +12,10 @@
 #include <asm/stage2_pgtable.h>
 
 
-#define KVM_PTE_TYPE			BIT(1)
-#define KVM_PTE_TYPE_BLOCK		0
-#define KVM_PTE_TYPE_PAGE		1
-#define KVM_PTE_TYPE_TABLE		1
-
-#define KVM_PTE_LEAF_ATTR_LO		GENMASK(11, 2)
-
-#define KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX	GENMASK(4, 2)
-#define KVM_PTE_LEAF_ATTR_LO_S1_AP	GENMASK(7, 6)
-#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RO	3
-#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RW	1
-#define KVM_PTE_LEAF_ATTR_LO_S1_SH	GENMASK(9, 8)
-#define KVM_PTE_LEAF_ATTR_LO_S1_SH_IS	3
-#define KVM_PTE_LEAF_ATTR_LO_S1_AF	BIT(10)
-
-#define KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR	GENMASK(5, 2)
-#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R	BIT(6)
-#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W	BIT(7)
-#define KVM_PTE_LEAF_ATTR_LO_S2_SH	GENMASK(9, 8)
-#define KVM_PTE_LEAF_ATTR_LO_S2_SH_IS	3
-#define KVM_PTE_LEAF_ATTR_LO_S2_AF	BIT(10)
-
-#define KVM_PTE_LEAF_ATTR_HI		GENMASK(63, 51)
-
-#define KVM_PTE_LEAF_ATTR_HI_SW		GENMASK(58, 55)
-
-#define KVM_PTE_LEAF_ATTR_HI_S1_XN	BIT(54)
-
-#define KVM_PTE_LEAF_ATTR_HI_S2_XN	BIT(54)
-
 #define KVM_PTE_LEAF_ATTR_S2_PERMS	(KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R | \
 					 KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
 					 KVM_PTE_LEAF_ATTR_HI_S2_XN)
 
-#define KVM_INVALID_PTE_OWNER_MASK	GENMASK(9, 2)
-#define KVM_MAX_OWNER_ID		1
-
 struct kvm_pgtable_walk_data {
 	struct kvm_pgtable		*pgt;
 	struct kvm_pgtable_walker	*walker;
@@ -57,8 +24,6 @@ struct kvm_pgtable_walk_data {
 	u64				end;
 };
 
-#define KVM_PHYS_INVALID (-1ULL)
-
 static bool kvm_phys_is_valid(u64 phys)
 {
 	return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_EL1_PARANGE_MAX));
@@ -111,27 +76,6 @@ static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
 	return __kvm_pgd_page_idx(&pgt, -1ULL) + 1;
 }
 
-static bool kvm_pte_table(kvm_pte_t pte, u32 level)
-{
-	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
-		return false;
-
-	if (!kvm_pte_valid(pte))
-		return false;
-
-	return FIELD_GET(KVM_PTE_TYPE, pte) == KVM_PTE_TYPE_TABLE;
-}
-
-static kvm_pte_t kvm_phys_to_pte(u64 pa)
-{
-	kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
-
-	if (PAGE_SHIFT == 16)
-		pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
-
-	return pte;
-}
-
 static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte, struct kvm_pgtable_mm_ops *mm_ops)
 {
 	return mm_ops->phys_to_virt(kvm_pte_to_phys(pte));
@@ -167,11 +111,6 @@ static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
 	return pte;
 }
 
-static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
-{
-	return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
-}
-
 static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
 				  u32 level, kvm_pte_t *ptep,
 				  enum kvm_pgtable_walk_flags flag)
@@ -334,16 +273,26 @@ struct hyp_map_data {
 
 static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
 {
-	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
-	u32 mtype = device ? MT_DEVICE_nGnRE : MT_NORMAL;
-	kvm_pte_t attr = FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX, mtype);
-	u32 sh = KVM_PTE_LEAF_ATTR_LO_S1_SH_IS;
 	u32 ap = (prot & KVM_PGTABLE_PROT_W) ? KVM_PTE_LEAF_ATTR_LO_S1_AP_RW :
 					       KVM_PTE_LEAF_ATTR_LO_S1_AP_RO;
+	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
+	u32 sh = KVM_PTE_LEAF_ATTR_LO_S1_SH_IS;
+	bool nc = prot & KVM_PGTABLE_PROT_NC;
+	kvm_pte_t attr;
+	u32 mtype;
 
-	if (!(prot & KVM_PGTABLE_PROT_R))
+	if (!(prot & KVM_PGTABLE_PROT_R) || (device && nc))
 		return -EINVAL;
 
+	if (device)
+		mtype = MT_DEVICE_nGnRnE;
+	else if (nc)
+		mtype = MT_NORMAL_NC;
+	else
+		mtype = MT_NORMAL;
+
+	attr = FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX, mtype);
+
 	if (prot & KVM_PGTABLE_PROT_X) {
 		if (prot & KVM_PGTABLE_PROT_W)
 			return -EINVAL;
@@ -527,7 +476,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
 	pgt->start_level	= KVM_PGTABLE_MAX_LEVELS - levels;
 	pgt->mm_ops		= mm_ops;
 	pgt->mmu		= NULL;
-	pgt->force_pte_cb	= NULL;
+	pgt->pte_ops		= NULL;
 
 	return 0;
 }
@@ -565,7 +514,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 struct stage2_map_data {
 	u64				phys;
 	kvm_pte_t			attr;
-	u8				owner_id;
+	u64				annotation;
 
 	kvm_pte_t			*anchor;
 	kvm_pte_t			*childp;
@@ -624,9 +573,19 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot p
 				kvm_pte_t *ptep)
 {
 	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
-	kvm_pte_t attr = device ? KVM_S2_MEMATTR(pgt, DEVICE_nGnRE) :
-			    KVM_S2_MEMATTR(pgt, NORMAL);
 	u32 sh = KVM_PTE_LEAF_ATTR_LO_S2_SH_IS;
+	bool nc = prot & KVM_PGTABLE_PROT_NC;
+	kvm_pte_t attr;
+
+	if (device && nc)
+		return -EINVAL;
+
+	if (device)
+		attr = KVM_S2_MEMATTR(pgt, DEVICE_nGnRE);
+	else if (nc)
+		attr = KVM_S2_MEMATTR(pgt, NORMAL_NC);
+	else
+		attr = KVM_S2_MEMATTR(pgt, NORMAL);
 
 	if (!(prot & KVM_PGTABLE_PROT_X))
 		attr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
@@ -672,14 +631,14 @@ static bool stage2_pte_needs_update(kvm_pte_t old, kvm_pte_t new)
 	return ((old ^ new) & (~KVM_PTE_LEAF_ATTR_S2_PERMS));
 }
 
-static bool stage2_pte_is_counted(kvm_pte_t pte)
+static void stage2_clear_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
+			     u32 level)
 {
-	/*
-	 * The refcount tracks valid entries as well as invalid entries if they
-	 * encode ownership of a page to another entity than the page-table
-	 * owner, whose id is 0.
-	 */
-	return !!pte;
+	if (!kvm_pte_valid(*ptep))
+		return;
+
+	kvm_clear_pte(ptep);
+	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
 }
 
 static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
@@ -689,23 +648,19 @@ static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
 	 * Clear the existing PTE, and perform break-before-make with
 	 * TLB maintenance if it was valid.
 	 */
-	if (kvm_pte_valid(*ptep)) {
-		kvm_clear_pte(ptep);
-		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
-	}
-
+	stage2_clear_pte(ptep, mmu, addr, level);
 	mm_ops->put_page(ptep);
 }
 
 static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
 {
 	u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
-	return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
+	return kvm_pte_valid(pte) && memattr == KVM_S2_MEMATTR(pgt, NORMAL);
 }
 
 static bool stage2_pte_executable(kvm_pte_t pte)
 {
-	return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
+	return kvm_pte_valid(pte) && !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
 }
 
 static bool stage2_leaf_mapping_allowed(u64 addr, u64 end, u32 level,
@@ -724,6 +679,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 	kvm_pte_t new, old = *ptep;
 	u64 granule = kvm_granule_size(level), phys = data->phys;
 	struct kvm_pgtable *pgt = data->mmu->pgt;
+	struct kvm_pgtable_pte_ops *pte_ops = pgt->pte_ops;
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 
 	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
@@ -732,9 +688,9 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 	if (kvm_phys_is_valid(phys))
 		new = kvm_init_valid_leaf_pte(phys, data->attr, level);
 	else
-		new = kvm_init_invalid_leaf_owner(data->owner_id);
+		new = data->annotation;
 
-	if (stage2_pte_is_counted(old)) {
+	if (pte_ops->pte_is_counted_cb(old, level)) {
 		/*
 		 * Skip updating the PTE if we are trying to recreate the exact
 		 * same mapping or only change the access permissions. Instead,
@@ -744,20 +700,28 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 		if (!stage2_pte_needs_update(old, new))
 			return -EAGAIN;
 
+		/*
+		 * If we're only changing software bits, then we don't need to
+		 * do anything else.
+		 */
+		if (!((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
+			goto out_set_pte;
+
 		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
 	}
 
 	/* Perform CMOs before installation of the guest stage-2 PTE */
 	if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
 		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new, mm_ops),
-						granule);
-
+					       granule);
 	if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
 		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
 
-	smp_store_release(ptep, new);
-	if (stage2_pte_is_counted(new))
+	if (pte_ops->pte_is_counted_cb(new, level))
 		mm_ops->get_page(ptep);
+
+out_set_pte:
+	smp_store_release(ptep, new);
 	if (kvm_phys_is_valid(phys))
 		data->phys += granule;
 	return 0;
@@ -790,11 +754,13 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 				struct stage2_map_data *data)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable *pgt = data->mmu->pgt;
+	struct kvm_pgtable_pte_ops *pte_ops = pgt->pte_ops;
 	kvm_pte_t *childp, pte = *ptep;
 	int ret;
 
 	if (data->anchor) {
-		if (stage2_pte_is_counted(pte))
+		if (pte_ops->pte_is_counted_cb(pte, level))
 			mm_ops->put_page(ptep);
 
 		return 0;
@@ -819,7 +785,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	 * a table. Accesses beyond 'end' that fall within the new table
 	 * will be mapped lazily.
 	 */
-	if (stage2_pte_is_counted(pte))
+	if (pte_ops->pte_is_counted_cb(pte, level))
 		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
 
 	kvm_set_table_pte(ptep, childp, mm_ops);
@@ -895,12 +861,12 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 			   void *mc)
 {
 	int ret;
+	struct kvm_pgtable_pte_ops *pte_ops = pgt->pte_ops;
 	struct stage2_map_data map_data = {
 		.phys		= ALIGN_DOWN(phys, PAGE_SIZE),
 		.mmu		= pgt->mmu,
 		.memcache	= mc,
 		.mm_ops		= pgt->mm_ops,
-		.force_pte	= pgt->force_pte_cb && pgt->force_pte_cb(addr, addr + size, prot),
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_map_walker,
@@ -910,6 +876,9 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		.arg		= &map_data,
 	};
 
+	if (pte_ops->force_pte_cb)
+		map_data.force_pte = pte_ops->force_pte_cb(addr, addr + size, prot);
+
 	if (WARN_ON((pgt->flags & KVM_PGTABLE_S2_IDMAP) && (addr != phys)))
 		return -EINVAL;
 
@@ -922,8 +891,8 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	return ret;
 }
 
-int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
-				 void *mc, u8 owner_id)
+int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
+				void *mc, kvm_pte_t annotation)
 {
 	int ret;
 	struct stage2_map_data map_data = {
@@ -931,8 +900,8 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		.mmu		= pgt->mmu,
 		.memcache	= mc,
 		.mm_ops		= pgt->mm_ops,
-		.owner_id	= owner_id,
 		.force_pte	= true,
+		.annotation	= annotation,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_map_walker,
@@ -942,7 +911,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		.arg		= &map_data,
 	};
 
-	if (owner_id > KVM_MAX_OWNER_ID)
+	if (annotation & PTE_VALID)
 		return -EINVAL;
 
 	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
@@ -956,11 +925,12 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	struct kvm_pgtable *pgt = arg;
 	struct kvm_s2_mmu *mmu = pgt->mmu;
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
+	struct kvm_pgtable_pte_ops *pte_ops = pgt->pte_ops;
 	kvm_pte_t pte = *ptep, *childp = NULL;
 	bool need_flush = false;
 
 	if (!kvm_pte_valid(pte)) {
-		if (stage2_pte_is_counted(pte)) {
+		if (pte_ops->pte_is_counted_cb(pte, level)) {
 			kvm_clear_pte(ptep);
 			mm_ops->put_page(ptep);
 		}
@@ -1148,7 +1118,7 @@ static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t pte = *ptep;
 
-	if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
+	if (!stage2_pte_cacheable(pgt, pte))
 		return 0;
 
 	if (mm_ops->dcache_clean_inval_poc)
@@ -1175,7 +1145,7 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
 int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 			      struct kvm_pgtable_mm_ops *mm_ops,
 			      enum kvm_pgtable_stage2_flags flags,
-			      kvm_pgtable_force_pte_cb_t force_pte_cb)
+			      struct kvm_pgtable_pte_ops *pte_ops)
 {
 	size_t pgd_sz;
 	u64 vtcr = mmu->arch->vtcr;
@@ -1193,21 +1163,32 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	pgt->mm_ops		= mm_ops;
 	pgt->mmu		= mmu;
 	pgt->flags		= flags;
-	pgt->force_pte_cb	= force_pte_cb;
+	pgt->pte_ops		= pte_ops;
 
 	/* Ensure zeroed PGD pages are visible to the hardware walker */
 	dsb(ishst);
 	return 0;
 }
 
+size_t kvm_pgtable_stage2_pgd_size(u64 vtcr)
+{
+	u32 ia_bits = VTCR_EL2_IPA(vtcr);
+	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
+	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
+
+	return kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
+}
+
 static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			      enum kvm_pgtable_walk_flags flag,
 			      void * const arg)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = arg;
+	struct kvm_pgtable *pgt = arg;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
+	struct kvm_pgtable_pte_ops *pte_ops = pgt->pte_ops;
 	kvm_pte_t pte = *ptep;
 
-	if (!stage2_pte_is_counted(pte))
+	if (!pte_ops->pte_is_counted_cb(pte, level))
 		return 0;
 
 	mm_ops->put_page(ptep);
@@ -1225,7 +1206,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
 		.cb	= stage2_free_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF |
 			  KVM_PGTABLE_WALK_TABLE_POST,
-		.arg	= pgt->mm_ops,
+		.arg	= pgt,
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
index 6cb638b..7b397fa 100644
--- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
+++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
@@ -330,7 +330,7 @@ void __vgic_v3_deactivate_traps(struct vgic_v3_cpu_if *cpu_if)
 		write_gicreg(0, ICH_HCR_EL2);
 }
 
-void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if)
+static void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if)
 {
 	u64 val;
 	u32 nr_pre_bits;
@@ -363,7 +363,7 @@ void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if)
 	}
 }
 
-void __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if)
+static void __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if)
 {
 	u64 val;
 	u32 nr_pre_bits;
@@ -455,16 +455,35 @@ u64 __vgic_v3_get_gic_config(void)
 	return val;
 }
 
-u64 __vgic_v3_read_vmcr(void)
+static u64 __vgic_v3_read_vmcr(void)
 {
 	return read_gicreg(ICH_VMCR_EL2);
 }
 
-void __vgic_v3_write_vmcr(u32 vmcr)
+static void __vgic_v3_write_vmcr(u32 vmcr)
 {
 	write_gicreg(vmcr, ICH_VMCR_EL2);
 }
 
+void __vgic_v3_save_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if)
+{
+	__vgic_v3_save_aprs(cpu_if);
+	if (cpu_if->vgic_sre)
+		cpu_if->vgic_vmcr = __vgic_v3_read_vmcr();
+}
+
+void __vgic_v3_restore_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if)
+{
+	/*
+	 * If dealing with a GICv2 emulation on GICv3, VMCR_EL2.VFIQen
+	 * is dependent on ICC_SRE_EL1.SRE, and we have to perform the
+	 * VMCR_EL2 save/restore in the world switch.
+	 */
+	if (cpu_if->vgic_sre)
+		__vgic_v3_write_vmcr(cpu_if->vgic_vmcr);
+	__vgic_v3_restore_aprs(cpu_if);
+}
+
 static int __vgic_v3_bpr_min(void)
 {
 	/* See Pseudocode for VPriorityGroup */
diff --git a/arch/arm64/kvm/hyp_events.c b/arch/arm64/kvm/hyp_events.c
new file mode 100644
index 0000000..7786f3b
--- /dev/null
+++ b/arch/arm64/kvm/hyp_events.c
@@ -0,0 +1,351 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 Google LLC
+ */
+
+#include <linux/tracefs.h>
+#include <linux/trace_events.h>
+
+#include <asm/kvm_host.h>
+#include <asm/kvm_hypevents_defs.h>
+#include <asm/setup.h>
+
+#include "hyp_trace.h"
+
+#define HYP_EVENT_NAME_MAX 32
+
+struct hyp_event {
+	struct trace_event_call *call;
+	char name[HYP_EVENT_NAME_MAX];
+	bool *enabled;
+};
+
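+/*
+ * <asm/kvm_hypevents.h> is included several times below, each time with a
+ * different HYP_EVENT() definition, to successively generate the print
+ * function, the field array and the trace_event_call of each hyp event.
+ */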
+#define HYP_EVENT(__name, __proto, __struct, __assign, __printk)		\
+	HYP_EVENT_FORMAT(__name, __struct);					\
+	enum print_line_t hyp_event_trace_##__name(struct trace_iterator *iter,	\
+					  int flags, struct trace_event *event) \
+	{									\
+		struct ht_iterator *ht_iter = (struct ht_iterator *)iter;	\
+		struct trace_hyp_format_##__name __maybe_unused *__entry =	\
+			(struct trace_hyp_format_##__name *)ht_iter->ent;	\
+		trace_seq_puts(&ht_iter->seq, #__name);				\
+		trace_seq_putc(&ht_iter->seq, ' ');				\
+		trace_seq_printf(&ht_iter->seq, __printk);			\
+		trace_seq_putc(&ht_iter->seq, '\n');				\
+		return TRACE_TYPE_HANDLED;					\
+	}
+#include <asm/kvm_hypevents.h>
+
+#undef he_field
+#define he_field(_type, _item)						\
+	{								\
+		.type = #_type, .name = #_item,				\
+		.size = sizeof(_type), .align = __alignof__(_type),	\
+		.is_signed = is_signed_type(_type),			\
+	},
+#undef HYP_EVENT
+#define HYP_EVENT(__name, __proto, __struct, __assign, __printk)		\
+	static struct trace_event_fields hyp_event_fields_##__name[] = {	\
+		__struct							\
+		{}								\
+	};
+
+#undef __ARM64_KVM_HYPEVENTS_H_
+#include <asm/kvm_hypevents.h>
+
+#undef HYP_EVENT
+#undef HE_PRINTK
+#define __entry REC
+#define HE_PRINTK(fmt, args...) "\"" fmt "\", " __stringify(args)
+#define HYP_EVENT(__name, __proto, __struct, __assign, __printk)		\
+	static char hyp_event_print_fmt_##__name[] = __printk;			\
+	static struct trace_event_functions hyp_event_funcs_##__name = {	\
+		.trace = &hyp_event_trace_##__name,				\
+	};									\
+	static struct trace_event_class hyp_event_class_##__name = {		\
+		.system		= "nvhe-hypervisor",				\
+		.fields_array	= hyp_event_fields_##__name,			\
+		.fields		= LIST_HEAD_INIT(hyp_event_class_##__name.fields),\
+	};									\
+	static struct trace_event_call hyp_event_call_##__name = {		\
+		.class = &hyp_event_class_##__name,				\
+		.event.funcs = &hyp_event_funcs_##__name,			\
+		.print_fmt = hyp_event_print_fmt_##__name,			\
+	};									\
+	static bool hyp_event_enabled_##__name;					\
+	struct hyp_event __section("_hyp_events") hyp_event_##__name = {	\
+		.name = #__name,						\
+		.call = &hyp_event_call_##__name,				\
+		.enabled = &hyp_event_enabled_##__name,				\
+	}
+
+#undef __ARM64_KVM_HYPEVENTS_H_
+#include <asm/kvm_hypevents.h>
+
+extern struct hyp_event __start_hyp_events[];
+extern struct hyp_event __stop_hyp_events[];
+
+/* hyp_event section used by the hypervisor */
+extern struct hyp_event_id __hyp_event_ids_start[];
+extern struct hyp_event_id __hyp_event_ids_end[];
+
+static struct hyp_event *find_hyp_event(const char *name)
+{
+	struct hyp_event *event = __start_hyp_events;
+
+	for (; (unsigned long)event < (unsigned long)__stop_hyp_events;
+		event++) {
+		if (!strncmp(name, event->name, HYP_EVENT_NAME_MAX))
+			return event;
+	}
+
+	return NULL;
+}
+
+static int enable_hyp_event(struct hyp_event *event, bool enable)
+{
+	unsigned short id = event->call->event.type;
+	int ret;
+
+	if (enable == *event->enabled)
+		return 0;
+
+	ret = kvm_call_hyp_nvhe(__pkvm_enable_event, id, enable);
+	if (ret)
+		return ret;
+
+	*event->enabled = enable;
+
+	return 0;
+}
+
+static ssize_t
+hyp_event_write(struct file *filp, const char __user *ubuf, size_t cnt, loff_t *ppos)
+{
+	struct seq_file *seq_file = (struct seq_file *)filp->private_data;
+	struct hyp_event *evt = (struct hyp_event *)seq_file->private;
+	bool enabling;
+	int ret;
+	char c;
+
+	if (cnt != 2)
+		return -EINVAL;
+
+	if (get_user(c, ubuf))
+		return -EFAULT;
+
+	switch (c) {
+	case '1':
+		enabling = true;
+		break;
+	case '0':
+		enabling = false;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ret = enable_hyp_event(evt, enabling);
+	if (ret)
+		return ret;
+
+	return cnt;
+}
+
+static int hyp_event_show(struct seq_file *m, void *v)
+{
+	struct hyp_event *evt = (struct hyp_event *)m->private;
+
+	/* No locking: a racy read of the enabled flag is acceptable here. */
+	seq_printf(m, "%d\n", *evt->enabled);
+
+	return 0;
+}
+
+static int hyp_event_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, hyp_event_show, inode->i_private);
+}
+
+static const struct file_operations hyp_event_fops = {
+	.open		= hyp_event_open,
+	.write		= hyp_event_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static int hyp_event_id_show(struct seq_file *m, void *v)
+{
+	struct hyp_event *evt = (struct hyp_event *)m->private;
+
+	seq_printf(m, "%d\n", evt->call->event.type);
+
+	return 0;
+}
+
+static int hyp_event_id_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, hyp_event_id_show, inode->i_private);
+}
+
+static const struct file_operations hyp_event_id_fops = {
+	.open = hyp_event_id_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+static int hyp_event_format_show(struct seq_file *m, void *v)
+{
+	struct hyp_event *evt = (struct hyp_event *)m->private;
+	struct trace_event_fields *field;
+	unsigned int offset = sizeof(struct hyp_entry_hdr);
+
+	seq_printf(m, "name: %s\n", evt->name);
+	seq_printf(m, "ID: %d\n", evt->call->event.type);
+	seq_puts(m, "format:\n\tfield:unsigned short common_type;\toffset:0;\tsize:2;\tsigned:0;\n");
+	seq_puts(m, "\n");
+
+	field = &evt->call->class->fields_array[0];
+	while (field->name) {
+		seq_printf(m, "\tfield:%s %s;\toffset:%u;\tsize:%u;\tsigned:%d;\n",
+			  field->type, field->name, offset, field->size,
+			  !!field->is_signed);
+		offset += field->size;
+		field++;
+	}
+
+	if (field != &evt->call->class->fields_array[0])
+		seq_puts(m, "\n");
+
+	seq_printf(m, "print fmt: %s\n", evt->call->print_fmt);
+
+	return 0;
+}
+
+static int hyp_event_format_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, hyp_event_format_show, inode->i_private);
+}
+
+static const struct file_operations hyp_event_format_fops = {
+	.open = hyp_event_format_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+static char early_events[COMMAND_LINE_SIZE];
+
+static __init int setup_hyp_event_early(char *str)
+{
+	strscpy(early_events, str, COMMAND_LINE_SIZE);
+
+	return 1;
+}
+__setup("hyp_event=", setup_hyp_event_early);
+
+bool kvm_hyp_events_enable_early(void)
+{
+	char *token, *buf = early_events;
+	bool enabled = false;
+
+	while (true) {
+		token = strsep(&buf, ",");
+
+		if (!token)
+			break;
+
+		if (*token) {
+			struct hyp_event *event;
+			int ret;
+
+			event = find_hyp_event(token);
+			if (event) {
+				ret = enable_hyp_event(event, true);
+				if (ret)
+					pr_warn("Couldn't enable hyp event %s:%d\n",
+						token, ret);
+				else
+					enabled = true;
+			} else {
+				pr_warn("Couldn't find hyp event %s\n", token);
+			}
+		}
+
+		if (buf)
+			*(buf - 1) = ',';
+	}
+
+	return enabled;
+}
+
+void kvm_hyp_init_events_tracefs(struct dentry *parent)
+{
+	struct hyp_event *event = __start_hyp_events;
+	struct dentry *d, *event_dir;
+
+	parent = tracefs_create_dir("events", parent);
+	if (!parent) {
+		pr_err("Failed to create tracefs folder for hyp events\n");
+		return;
+	}
+
+	for (; (unsigned long)event < (unsigned long)__stop_hyp_events; event++) {
+		event_dir = tracefs_create_dir(event->name, parent);
+		if (!event_dir) {
+			pr_err("Failed to create events/hyp/%s\n", event->name);
+			continue;
+		}
+		d = tracefs_create_file("enable", 0700, event_dir, (void *)event,
+				&hyp_event_fops);
+		if (!d)
+			pr_err("Failed to create events/hyp/%s/enable\n", event->name);
+
+		d = tracefs_create_file("id", 0400, event_dir, (void *)event,
+				&hyp_event_id_fops);
+		if (!d)
+			pr_err("Failed to create events/hyp/%s/id\n", event->name);
+
+		d = tracefs_create_file("format", 0400, event_dir, (void *)event,
+					&hyp_event_format_fops);
+		if (!d)
+			pr_err("Failed to create events/hyp/%s/format\n",
+			       event->name);
+
+	}
+}
+
+/*
+ * Register hyp events and write their id into the hyp section _hyp_event_ids.
+ */
+int kvm_hyp_init_events(void)
+{
+	struct hyp_event *event = __start_hyp_events;
+	struct hyp_event_id *hyp_event_id = __hyp_event_ids_start;
+	int ret, err = -ENODEV;
+
+	/* TODO: BUILD_BUG nr events host side / hyp side */
+
+	for (; (unsigned long)event < (unsigned long)__stop_hyp_events;
+		event++, hyp_event_id++) {
+		event->call->name = event->name;
+		ret = register_trace_event(&event->call->event);
+		if (!ret) {
+			pr_warn("Couldn't register trace event for %s\n", event->name);
+			continue;
+		}
+
+		/*
+		 * Both the host and the hypervisor rely on the same hyp event
+		 * declarations from kvm_hypevents.h, so there is a 1:1
+		 * mapping.
+		 */
+		hyp_event_id->id = ret;
+
+		err = 0;
+	}
+
+	return err;
+}
diff --git a/arch/arm64/kvm/hyp_trace.c b/arch/arm64/kvm/hyp_trace.c
new file mode 100644
index 0000000..cabf730
--- /dev/null
+++ b/arch/arm64/kvm/hyp_trace.c
@@ -0,0 +1,796 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 Google LLC
+ */
+
+#include <linux/arm-smccc.h>
+#include <linux/list.h>
+#include <linux/percpu-defs.h>
+#include <linux/ring_buffer.h>
+#include <linux/trace_events.h>
+#include <linux/tracefs.h>
+
+#include <asm/kvm_host.h>
+#include <asm/kvm_hyptrace.h>
+#include <asm/kvm_hypevents_defs.h>
+
+#include "hyp_constants.h"
+#include "hyp_trace.h"
+
+#define RB_POLL_MS 1000
+
+#define TRACEFS_DIR "hyp"
+
+static bool hyp_trace_on;
+static int hyp_trace_readers;
+static struct trace_buffer *hyp_trace_buffer;
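+/* Default trace buffer size in bytes; buffer_size_kb reads/writes are in KiB */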
+static size_t hyp_trace_buffer_size = 7 << 10;
+static struct hyp_buffer_pages_backing hyp_buffer_pages_backing;
+static DEFINE_MUTEX(hyp_trace_lock);
+static DEFINE_PER_CPU(struct mutex, hyp_trace_reader_lock);
+
+static int bpage_backing_setup(struct hyp_trace_pack *pack)
+{
+	size_t backing_size;
+	void *start;
+
+	if (hyp_buffer_pages_backing.start)
+		return -EBUSY;
+
+	backing_size = STRUCT_HYP_BUFFER_PAGE_SIZE *
+		       pack->trace_buffer_pack.total_pages;
+	backing_size = PAGE_ALIGN(backing_size);
+
+	start = alloc_pages_exact(backing_size, GFP_KERNEL_ACCOUNT);
+	if (!start)
+		return -ENOMEM;
+
+	hyp_buffer_pages_backing.start = (unsigned long)start;
+	hyp_buffer_pages_backing.size = backing_size;
+	pack->backing.start = (unsigned long)start;
+	pack->backing.size = backing_size;
+
+	return 0;
+}
+
+static void bpage_backing_teardown(void)
+{
+	unsigned long backing = hyp_buffer_pages_backing.start;
+
+	if (!hyp_buffer_pages_backing.start)
+		return;
+
+	free_pages_exact((void *)backing, hyp_buffer_pages_backing.size);
+
+	hyp_buffer_pages_backing.start = 0;
+	hyp_buffer_pages_backing.size = 0;
+}
+
+/*
+ * Configure the hyp tracing clock. So far, only one is supported: "boot".
+ * This clock doesn't stop during suspend, which makes it a good candidate.
+ * The downside: if the clock is corrected by NTP while tracing, the hyp
+ * clock will slightly drift compared to the host version.
+ */
+static void hyp_clock_setup(struct hyp_trace_pack *pack)
+{
+	struct kvm_nvhe_clock_data *clock_data = &pack->trace_clock_data;
+	struct system_time_snapshot snap;
+
+	ktime_get_snapshot(&snap);
+
+	clock_data->epoch_cyc = snap.cycles;
+	clock_data->epoch_ns = snap.boot;
+	clock_data->mult = snap.mono_mult;
+	clock_data->shift = snap.mono_shift;
+}
+
+static int __swap_reader_page(int cpu)
+{
+	return kvm_call_hyp_nvhe(__pkvm_rb_swap_reader_page, cpu);
+}
+
+static int __update_footers(int cpu)
+{
+	return kvm_call_hyp_nvhe(__pkvm_rb_update_footers, cpu);
+}
+
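+/* Ring buffer extension callbacks; both trampoline into the hypervisor. */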
+static struct ring_buffer_ext_cb hyp_cb = {
+	.update_footers = __update_footers,
+	.swap_reader = __swap_reader_page,
+};
+
+static inline int share_page(unsigned long va)
+{
+	return kvm_call_hyp_nvhe(__pkvm_host_share_hyp, virt_to_pfn(va), 1);
+}
+
+static inline int unshare_page(unsigned long va)
+{
+	return kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp, virt_to_pfn(va), 1);
+}
+
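+/* Apply @func (e.g. share_page or unshare_page) to every page of the pack. */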
+static int trace_pack_pages_apply(struct trace_buffer_pack *trace_pack,
+				  int (*func)(unsigned long))
+{
+	struct ring_buffer_pack *rb_pack;
+	int cpu, i, ret;
+
+	for_each_ring_buffer_pack(rb_pack, cpu, trace_pack) {
+		ret = func(rb_pack->reader_page_va);
+		if (ret)
+			return ret;
+
+		for (i = 0; i < rb_pack->nr_pages; i++) {
+			ret = func(rb_pack->page_va[i]);
+			if (ret)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * hyp_trace_pack size depends on trace_buffer_pack's, so
+ * trace_buffer_setup is in charge of the allocation for the former.
+ */
+static int trace_buffer_setup(struct hyp_trace_pack **pack, size_t *pack_size)
+{
+	struct trace_buffer_pack *trace_pack;
+	int ret;
+
+	hyp_trace_buffer = ring_buffer_alloc_ext(hyp_trace_buffer_size, &hyp_cb);
+	if (!hyp_trace_buffer)
+		return -ENOMEM;
+
+	*pack_size = offsetof(struct hyp_trace_pack, trace_buffer_pack) +
+		     trace_buffer_pack_size(hyp_trace_buffer);
+	/*
+	 * The hypervisor will unmap the pack from the host to protect the
+	 * reading. Page granularity for the pack allocation ensures no other
+	 * useful data will be unmapped.
+	 */
+	*pack_size = PAGE_ALIGN(*pack_size);
+	*pack = alloc_pages_exact(*pack_size, GFP_KERNEL);
+	if (!*pack) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	trace_pack = &(*pack)->trace_buffer_pack;
+	WARN_ON(trace_buffer_pack(hyp_trace_buffer, trace_pack));
+
+	ret = trace_pack_pages_apply(trace_pack, share_page);
+	if (ret) {
+		trace_pack_pages_apply(trace_pack, unshare_page);
+		free_pages_exact(*pack, *pack_size);
+		goto err;
+	}
+
+	return 0;
+err:
+	ring_buffer_free(hyp_trace_buffer);
+	hyp_trace_buffer = NULL;
+
+	return ret;
+}
+
+static void trace_buffer_teardown(struct trace_buffer_pack *trace_pack)
+{
+	bool alloc_trace_pack = !trace_pack;
+
+	if (alloc_trace_pack) {
+		trace_pack = kzalloc(trace_buffer_pack_size(hyp_trace_buffer), GFP_KERNEL);
+		if (!trace_pack) {
+			WARN_ON(1);
+			goto end;
+		}
+	}
+
+	WARN_ON(trace_buffer_pack(hyp_trace_buffer, trace_pack));
+	WARN_ON(trace_pack_pages_apply(trace_pack, unshare_page));
+
+	if (alloc_trace_pack)
+		kfree(trace_pack);
+end:
+	ring_buffer_free(hyp_trace_buffer);
+	hyp_trace_buffer = NULL;
+}
+
+static void hyp_free_tracing(void)
+{
+	if (!hyp_trace_buffer)
+		return;
+
+	trace_buffer_teardown(NULL);
+	bpage_backing_teardown();
+}
+
+static int hyp_start_tracing(void)
+{
+	struct hyp_trace_pack *pack;
+	size_t pack_size;
+	int ret = 0;
+
+	if (hyp_trace_on || hyp_trace_readers)
+		return -EBUSY;
+
+	hyp_free_tracing();
+
+	ret = trace_buffer_setup(&pack, &pack_size);
+	if (ret)
+		return ret;
+
+	hyp_clock_setup(pack);
+
+	ret = bpage_backing_setup(pack);
+	if (ret)
+		goto end_buffer_teardown;
+
+	ret = kvm_call_hyp_nvhe(__pkvm_start_tracing, (unsigned long)pack, pack_size);
+	if (!ret) {
+		hyp_trace_on = true;
+		goto end_free_pack;
+	}
+
+	bpage_backing_teardown();
+end_buffer_teardown:
+	trace_buffer_teardown(&pack->trace_buffer_pack);
+end_free_pack:
+	free_pages_exact(pack, pack_size);
+
+	return ret;
+}
+
+static void hyp_stop_tracing(void)
+{
+	int ret;
+
+	if (!hyp_trace_buffer || !hyp_trace_on)
+		return;
+
+	ret = kvm_call_hyp_nvhe(__pkvm_stop_tracing);
+	if (ret) {
+		WARN_ON(1);
+		return;
+	}
+
+	hyp_trace_on = false;
+}
+
+static ssize_t
+hyp_tracing_on(struct file *filp, const char __user *ubuf, size_t cnt, loff_t *ppos)
+{
+	int err = 0;
+	char c;
+
+	if (cnt != 2)
+		return -EINVAL;
+
+	if (get_user(c, ubuf))
+		return -EFAULT;
+
+	mutex_lock(&hyp_trace_lock);
+
+	switch (c) {
+	case '1':
+		err = hyp_start_tracing();
+		break;
+	case '0':
+		hyp_stop_tracing();
+		break;
+	default:
+		err = -EINVAL;
+	}
+
+	mutex_unlock(&hyp_trace_lock);
+
+	return err ? err : cnt;
+}
+
+static ssize_t hyp_tracing_on_read(struct file *filp, char __user *ubuf,
+				   size_t cnt, loff_t *ppos)
+{
+	char buf[3];
+	int r;
+
+	mutex_lock(&hyp_trace_lock);
+	r = sprintf(buf, "%d\n", hyp_trace_on);
+	mutex_unlock(&hyp_trace_lock);
+
+	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+}
+
+static const struct file_operations hyp_tracing_on_fops = {
+	.write	= hyp_tracing_on,
+	.read	= hyp_tracing_on_read,
+};
+
+static ssize_t hyp_buffer_size(struct file *filp, const char __user *ubuf,
+			       size_t cnt, loff_t *ppos)
+{
+	unsigned long val;
+	int ret;
+
+	ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
+	if (ret)
+		return ret;
+
+	if (!val)
+		return -EINVAL;
+
+	mutex_lock(&hyp_trace_lock);
+	hyp_trace_buffer_size = val << 10; /* KB to B */
+	mutex_unlock(&hyp_trace_lock);
+
+	return cnt;
+}
+
+static ssize_t hyp_buffer_size_read(struct file *filp, char __user *ubuf,
+				    size_t cnt, loff_t *ppos)
+{
+	char buf[64];
+	int r;
+
+	mutex_lock(&hyp_trace_lock);
+	r = sprintf(buf, "%lu\n", hyp_trace_buffer_size >> 10);
+	mutex_unlock(&hyp_trace_lock);
+
+	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+}
+
+static const struct file_operations hyp_buffer_size_fops = {
+	.write	= hyp_buffer_size,
+	.read	= hyp_buffer_size_read,
+};
+
+static inline void hyp_trace_read_start(int cpu)
+{
+	mutex_lock(&per_cpu(hyp_trace_reader_lock, cpu));
+}
+
+static inline void hyp_trace_read_stop(int cpu)
+{
+	mutex_unlock(&per_cpu(hyp_trace_reader_lock, cpu));
+}
+
+static void ht_print_trace_time(struct ht_iterator *iter)
+{
+	unsigned long usecs_rem;
+	u64 ts_ns = iter->ts;
+
+	do_div(ts_ns, 1000);
+	usecs_rem = do_div(ts_ns, USEC_PER_SEC);
+
+	trace_seq_printf(&iter->seq, "[%5lu.%06lu] ",
+			 (unsigned long)ts_ns, usecs_rem);
+}
+
+extern struct trace_event *ftrace_find_event(int type);
+
+static void ht_print_trace_fmt(struct ht_iterator *iter)
+{
+	struct trace_event *e;
+
+	if (iter->lost_events)
+		trace_seq_printf(&iter->seq, "CPU:%d [LOST %lu EVENTS]\n",
+				 iter->cpu, iter->lost_events);
+
+	/* TODO: format bin/hex/raw */
+
+	ht_print_trace_time(iter);
+
+	e = ftrace_find_event(iter->ent->id);
+	if (e) {
+		e->funcs->trace((struct trace_iterator *)iter, 0, e);
+		return;
+	}
+
+	trace_seq_printf(&iter->seq, "Unknown event id %d\n", iter->ent->id);
+}
+
+static void *ht_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct ht_iterator *iter = m->private;
+	struct ring_buffer_event *evt;
+	u64 ts;
+
+	(*pos)++;
+
+	evt = ring_buffer_iter_peek(iter->buf_iter, &ts);
+	if (!evt)
+		return NULL;
+
+	iter->ent = (struct hyp_entry_hdr *)&evt->array[1];
+	iter->ts = ts;
+	iter->ent_size = evt->array[0];
+	ring_buffer_iter_advance(iter->buf_iter);
+
+	return iter;
+}
+
+static void *ht_start(struct seq_file *m, loff_t *pos)
+{
+	struct ht_iterator *iter = m->private;
+
+	if (*pos == 0) {
+		ring_buffer_iter_reset(iter->buf_iter);
+		(*pos)++;
+		iter->ent = NULL;
+
+		return iter;
+	}
+
+	hyp_trace_read_start(iter->cpu);
+
+	return ht_next(m, NULL, pos);
+}
+
+static void ht_stop(struct seq_file *m, void *v)
+{
+	struct ht_iterator *iter = m->private;
+
+	hyp_trace_read_stop(iter->cpu);
+}
+
+static int ht_show(struct seq_file *m, void *v)
+{
+	struct ht_iterator *iter = v;
+
+	if (!iter->ent) {
+		unsigned long entries, overrun;
+
+		entries = ring_buffer_entries_cpu(hyp_trace_buffer, iter->cpu);
+		overrun = ring_buffer_overrun_cpu(hyp_trace_buffer, iter->cpu);
+
+		seq_printf(m, "# entries-in-buffer/entries-written: %lu/%lu\n",
+			  entries, overrun + entries);
+	} else {
+		ht_print_trace_fmt(iter);
+		trace_print_seq(m, &iter->seq);
+	}
+
+	return 0;
+}
+
+static const struct seq_operations hyp_trace_ops = {
+	.start	= ht_start,
+	.next	= ht_next,
+	.stop	= ht_stop,
+	.show	= ht_show,
+};
+
+static int hyp_trace_open(struct inode *inode, struct file *file)
+{
+	unsigned long cpu = (unsigned long)inode->i_private;
+	struct ht_iterator *iter;
+	int ret = 0;
+
+	mutex_lock(&hyp_trace_lock);
+
+	if (!hyp_trace_buffer) {
+		ret = -ENODEV;
+		goto unlock;
+	}
+
+	iter = __seq_open_private(file, &hyp_trace_ops, sizeof(*iter));
+	if (!iter) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	iter->buf_iter = ring_buffer_read_prepare(hyp_trace_buffer, cpu, GFP_KERNEL);
+	if (!iter->buf_iter) {
+		seq_release_private(inode, file);
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	iter->cpu = cpu;
+
+	ring_buffer_read_prepare_sync();
+	ring_buffer_read_start(iter->buf_iter);
+
+	hyp_trace_readers++;
+unlock:
+	mutex_unlock(&hyp_trace_lock);
+
+	return ret;
+}
+
+static int hyp_trace_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *m = file->private_data;
+	struct ht_iterator *iter = m->private;
+
+	ring_buffer_read_finish(iter->buf_iter);
+
+	mutex_lock(&hyp_trace_lock);
+	hyp_trace_readers--;
+	mutex_unlock(&hyp_trace_lock);
+
+	return seq_release_private(inode, file);
+}
+
+static const struct file_operations hyp_trace_fops = {
+	.open  = hyp_trace_open,
+	.read  = seq_read,
+	.llseek = seq_lseek,
+	.release = hyp_trace_release,
+};
+
+/*
+ * TODO: should be merged with the ring_buffer_iterator version
+ */
+static void *trace_buffer_peek(struct ht_iterator *iter)
+{
+	struct ring_buffer_event *event;
+
+	if (ring_buffer_empty_cpu(iter->trace_buffer, iter->cpu))
+		return NULL;
+
+	event = ring_buffer_peek(iter->trace_buffer, iter->cpu, &iter->ts, &iter->lost_events);
+	if (!event)
+		return NULL;
+
+	iter->ent = (struct hyp_entry_hdr *)&event->array[1];
+	iter->ent_size = event->array[0];
+
+	return iter;
+}
+
+static ssize_t
+hyp_trace_pipe_read(struct file *file, char __user *ubuf,
+		    size_t cnt, loff_t *ppos)
+{
+	struct ht_iterator *iter = (struct ht_iterator *)file->private_data;
+	struct trace_buffer *trace_buffer = iter->trace_buffer;
+	int ret;
+
+	trace_seq_init(&iter->seq);
+again:
+	ret = ring_buffer_wait(trace_buffer, iter->cpu, 0);
+	if (ret < 0)
+		return ret;
+
+	hyp_trace_read_start(iter->cpu);
+	while (trace_buffer_peek(iter)) {
+		unsigned long lost_events;
+
+		ht_print_trace_fmt(iter);
+		ring_buffer_consume(iter->trace_buffer, iter->cpu, NULL, &lost_events);
+	}
+	hyp_trace_read_stop(iter->cpu);
+
+	ret = trace_seq_to_user(&iter->seq, ubuf, cnt);
+	if (ret == -EBUSY)
+		goto again;
+
+	return ret;
+}
+
+static void __poke_reader(struct work_struct *work)
+{
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct ht_iterator *iter;
+
+	iter = container_of(dwork, struct ht_iterator, poke_work);
+
+	WARN_ON_ONCE(ring_buffer_poke(iter->trace_buffer, iter->cpu));
+
+	schedule_delayed_work((struct delayed_work *)work,
+			      msecs_to_jiffies(RB_POLL_MS));
+}
+
+static int hyp_trace_pipe_open(struct inode *inode, struct file *file)
+{
+	unsigned long cpu = (unsigned long)inode->i_private;
+	struct ht_iterator *iter;
+	int ret = -EINVAL;
+
+	mutex_lock(&hyp_trace_lock);
+
+	if (!hyp_trace_buffer)
+		goto unlock;
+
+	ret = ring_buffer_poke(hyp_trace_buffer, cpu);
+	if (ret)
+		goto unlock;
+
+	iter = kzalloc(sizeof(*iter), GFP_KERNEL);
+	if (!iter) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	iter->cpu = cpu;
+	iter->trace_buffer = hyp_trace_buffer;
+
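+	/*
+	 * Poke the ring buffer every RB_POLL_MS (see __poke_reader()) so
+	 * events freshly written by the hypervisor become visible to this
+	 * reader.
+	 */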
+	INIT_DELAYED_WORK(&iter->poke_work, __poke_reader);
+	schedule_delayed_work(&iter->poke_work, msecs_to_jiffies(RB_POLL_MS));
+
+	file->private_data = iter;
+
+	hyp_trace_readers++;
+unlock:
+	mutex_unlock(&hyp_trace_lock);
+
+	return ret;
+}
+
+static int hyp_trace_pipe_release(struct inode *inode, struct file *file)
+{
+	struct ht_iterator *iter = file->private_data;
+
+	cancel_delayed_work_sync(&iter->poke_work);
+
+	kfree(iter);
+
+	mutex_lock(&hyp_trace_lock);
+	hyp_trace_readers--;
+	mutex_unlock(&hyp_trace_lock);
+
+	return 0;
+}
+
+static const struct file_operations hyp_trace_pipe_fops = {
+	.open		= hyp_trace_pipe_open,
+	.read		= hyp_trace_pipe_read,
+	.release	= hyp_trace_pipe_release,
+	.llseek		= no_llseek,
+};
+
+static ssize_t
+hyp_trace_raw_read(struct file *file, char __user *ubuf,
+		    size_t cnt, loff_t *ppos)
+{
+	struct ht_iterator *iter = (struct ht_iterator *)file->private_data;
+	struct trace_buffer *trace_buffer = iter->trace_buffer;
+	size_t size;
+	int ret;
+
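+	/*
+	 * ring_buffer_read_page() fills a whole spare page; copy_leftover
+	 * tracks how much of it still has to be copied to userspace by
+	 * subsequent reads.
+	 */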
+	if (iter->copy_leftover)
+		goto read;
+again:
+	hyp_trace_read_start(iter->cpu);
+	ret = ring_buffer_read_page(trace_buffer, &iter->spare,
+				    cnt, iter->cpu, 0);
+	hyp_trace_read_stop(iter->cpu);
+	if (ret < 0) {
+		if (!ring_buffer_empty_cpu(iter->trace_buffer, iter->cpu))
+			return 0;
+
+		ret = ring_buffer_wait(trace_buffer, iter->cpu, 0);
+		if (ret < 0)
+			return ret;
+
+		goto again;
+	}
+
+	iter->copy_leftover = 0;
+read:
+	size = PAGE_SIZE - iter->copy_leftover;
+	if (size > cnt)
+		size = cnt;
+
+	ret = copy_to_user(ubuf, iter->spare + PAGE_SIZE - size, size);
+	if (ret == size)
+		return -EFAULT;
+
+	size -= ret;
+	*ppos += size;
+	iter->copy_leftover = ret;
+
+	return size;
+}
+
+static int hyp_trace_raw_open(struct inode *inode, struct file *file)
+{
+	int ret = hyp_trace_pipe_open(inode, file);
+	struct ht_iterator *iter;
+
+	if (ret)
+		return ret;
+
+	iter = file->private_data;
+	iter->spare = ring_buffer_alloc_read_page(iter->trace_buffer, iter->cpu);
+	if (IS_ERR(iter->spare)) {
+		ret = PTR_ERR(iter->spare);
+		iter->spare = NULL;
+		return ret;
+	}
+
+	return 0;
+}
+
+static int hyp_trace_raw_release(struct inode *inode, struct file *file)
+{
+	struct ht_iterator *iter = file->private_data;
+
+	ring_buffer_free_read_page(iter->trace_buffer, iter->cpu, iter->spare);
+
+	return hyp_trace_pipe_release(inode, file);
+}
+
+static const struct file_operations hyp_trace_raw_fops = {
+	.open		= hyp_trace_raw_open,
+	.read		= hyp_trace_raw_read,
+	.release	= hyp_trace_raw_release,
+	.llseek		= no_llseek,
+};
+
+static void hyp_tracefs_create_cpu_file(const char *file_name,
+					unsigned long cpu,
+					const struct file_operations *fops,
+					struct dentry *parent)
+{
+	if (!tracefs_create_file(file_name, 0440, parent, (void *)cpu, fops))
+		pr_warn("Failed to create tracefs %pd/%s\n", parent, file_name);
+}
+
+void kvm_hyp_init_events_tracefs(struct dentry *parent);
+bool kvm_hyp_events_enable_early(void);
+
+int init_hyp_tracefs(void)
+{
+	struct dentry *d, *root_dir, *per_cpu_root_dir;
+	char per_cpu_name[16];
+	unsigned long cpu;
+	int err;
+
+	if (!is_protected_kvm_enabled())
+		return 0;
+
+	root_dir = tracefs_create_dir(TRACEFS_DIR, NULL);
+	if (!root_dir) {
+		pr_err("Failed to create tracefs "TRACEFS_DIR"/\n");
+		return -ENODEV;
+	}
+
+	d = tracefs_create_file("tracing_on", 0640, root_dir, NULL,
+				&hyp_tracing_on_fops);
+	if (!d) {
+		pr_err("Failed to create tracefs "TRACEFS_DIR"/tracing_on\n");
+		return -ENODEV;
+	}
+
+	d = tracefs_create_file("buffer_size_kb", 0640, root_dir, NULL,
+				&hyp_buffer_size_fops);
+	if (!d)
+		pr_err("Failed to create tracefs "TRACEFS_DIR"/buffer_size_kb\n");
+
+	per_cpu_root_dir = tracefs_create_dir("per_cpu", root_dir);
+	if (!per_cpu_root_dir) {
+		pr_err("Failed to create tracefs "TRACEFS_DIR"/per_cpu/\n");
+		return -ENODEV;
+	}
+
+	for_each_possible_cpu(cpu) {
+		struct dentry *dir;
+
+		snprintf(per_cpu_name, sizeof(per_cpu_name), "cpu%lu", cpu);
+		dir = tracefs_create_dir(per_cpu_name, per_cpu_root_dir);
+		if (!dir) {
+			pr_warn("Failed to create tracefs "TRACEFS_DIR"/per_cpu/cpu%lu\n",
+				cpu);
+			continue;
+		}
+
+		hyp_tracefs_create_cpu_file("trace", cpu, &hyp_trace_fops, dir);
+		hyp_tracefs_create_cpu_file("trace_pipe", cpu,
+					    &hyp_trace_pipe_fops, dir);
+		hyp_tracefs_create_cpu_file("trace_pipe_raw", cpu,
+					    &hyp_trace_raw_fops, dir);
+	}
+
+	kvm_hyp_init_events_tracefs(root_dir);
+	if (kvm_hyp_events_enable_early()) {
+		err = hyp_start_tracing();
+		if (err)
+			pr_warn("Failed to start early events tracing: %d\n", err);
+	}
+
+	return 0;
+}
diff --git a/arch/arm64/kvm/hyp_trace.h b/arch/arm64/kvm/hyp_trace.h
new file mode 100644
index 0000000..fb8172e
--- /dev/null
+++ b/arch/arm64/kvm/hyp_trace.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ARM64_KVM_HYP_TRACE_H__
+#define __ARM64_KVM_HYP_TRACE_H__
+
+#include <linux/trace_seq.h>
+#include <linux/workqueue.h>
+
+struct ht_iterator {
+	struct ring_buffer_iter *buf_iter;
+	struct trace_buffer *trace_buffer;
+	struct hyp_entry_hdr *ent;
+	struct trace_seq seq;
+	u64 ts;
+	void *spare;
+	size_t copy_leftover;
+	size_t ent_size;
+	struct delayed_work poke_work;
+	unsigned long lost_events;
+	int cpu;
+};
+
+#ifdef CONFIG_TRACING
+int init_hyp_tracefs(void);
+#else
+static inline int init_hyp_tracefs(void) { return 0; }
+#endif
+#endif
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index c9f401f..b4712bc 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -5,6 +5,7 @@
 #include <linux/kvm_host.h>
 
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_pkvm.h>
 
 #include <kvm/arm_hypercalls.h>
 #include <kvm/arm_psci.h>
@@ -13,8 +14,15 @@
 	GENMASK(KVM_REG_ARM_STD_BMAP_BIT_COUNT - 1, 0)
 #define KVM_ARM_SMCCC_STD_HYP_FEATURES				\
 	GENMASK(KVM_REG_ARM_STD_HYP_BMAP_BIT_COUNT - 1, 0)
-#define KVM_ARM_SMCCC_VENDOR_HYP_FEATURES			\
-	GENMASK(KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_COUNT - 1, 0)
+#define KVM_ARM_SMCCC_VENDOR_HYP_FEATURES ({				\
+	unsigned long f;						\
+	f = GENMASK(KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_COUNT - 1, 0);	\
+	if (is_protected_kvm_enabled()) {				\
+		f |= BIT(ARM_SMCCC_KVM_FUNC_HYP_MEMINFO);		\
+		f |= BIT(ARM_SMCCC_KVM_FUNC_MEM_RELINQUISH);		\
+	}								\
+	f;								\
+})
 
 static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
 {
@@ -75,6 +83,9 @@ static bool kvm_hvc_call_default_allowed(u32 func_id)
 	 */
 	case ARM_SMCCC_VERSION_FUNC_ID:
 	case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_SHARE_FUNC_ID:
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_UNSHARE_FUNC_ID:
+	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID:
 		return true;
 	default:
 		/* PSCI 0.2 and up is in the 0:0x1f range */
@@ -116,6 +127,9 @@ static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
 	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
 		return test_bit(KVM_REG_ARM_VENDOR_HYP_BIT_PTP,
 				&smccc_feat->vendor_hyp_bmap);
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_RELINQUISH_FUNC_ID:
+		return test_bit(ARM_SMCCC_KVM_FUNC_MEM_RELINQUISH,
+				&smccc_feat->vendor_hyp_bmap);
 	default:
 		return kvm_hvc_call_default_allowed(func_id);
 	}
@@ -213,6 +227,24 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
 		kvm_ptp_get_time(vcpu, val);
 		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_SHARE_FUNC_ID:
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_UNSHARE_FUNC_ID:
+		if (!kvm_vm_is_protected(vcpu->kvm))
+			break;
+		atomic64_add(
+			func_id == ARM_SMCCC_VENDOR_HYP_KVM_MEM_SHARE_FUNC_ID ?
+			PAGE_SIZE : -PAGE_SIZE,
+			&vcpu->kvm->stat.protected_shared_mem);
+		val[0] = SMCCC_RET_SUCCESS;
+		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_RELINQUISH_FUNC_ID:
+		pkvm_host_reclaim_page(vcpu->kvm, smccc_get_arg1(vcpu));
+		val[0] = SMCCC_RET_SUCCESS;
+		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID:
+		if (kvm_vm_is_protected(vcpu->kvm) && !topup_hyp_memcache(vcpu))
+			val[0] = SMCCC_RET_SUCCESS;
+		break;
 	case ARM_SMCCC_TRNG_VERSION:
 	case ARM_SMCCC_TRNG_FEATURES:
 	case ARM_SMCCC_TRNG_GET_UUID:
diff --git a/arch/arm64/kvm/iommu.c b/arch/arm64/kvm/iommu.c
new file mode 100644
index 0000000..08b318a
--- /dev/null
+++ b/arch/arm64/kvm/iommu.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2022 - Google LLC
+ * Author: David Brazdil <dbrazdil@google.com>
+ */
+
+#include <linux/kvm_host.h>
+
+static unsigned long dev_to_id(struct device *dev)
+{
+	/* Use the struct device pointer as a unique identifier. */
+	return (unsigned long)dev;
+}
+
+int pkvm_iommu_driver_init(u64 drv, void *data, size_t size)
+{
+	return kvm_call_hyp_nvhe(__pkvm_iommu_driver_init, drv, data, size);
+}
+EXPORT_SYMBOL_GPL(pkvm_iommu_driver_init);
+
+int pkvm_iommu_register(struct device *dev, u64 drv, phys_addr_t pa,
+			size_t size, struct device *parent)
+{
+	void *mem;
+	int ret;
+
+	/*
+	 * Hypcall to register the device. It will return -ENOMEM if it needs
+	 * more memory. In that case allocate a page and retry.
+	 * We assume that hyp never allocates more than a page per hypcall.
+	 */
+	ret = kvm_call_hyp_nvhe(__pkvm_iommu_register, dev_to_id(dev),
+				drv, pa, size, dev_to_id(parent), NULL, 0);
+	if (ret == -ENOMEM) {
+		mem = (void *)__get_free_page(GFP_KERNEL);
+		if (!mem)
+			return -ENOMEM;
+
+		ret = kvm_call_hyp_nvhe(__pkvm_iommu_register, dev_to_id(dev),
+					drv, pa, size, dev_to_id(parent),
+					mem, PAGE_SIZE);
+	}
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pkvm_iommu_register);
+
+int pkvm_iommu_suspend(struct device *dev)
+{
+	return kvm_call_hyp_nvhe(__pkvm_iommu_pm_notify, dev_to_id(dev),
+				 PKVM_IOMMU_PM_SUSPEND);
+}
+EXPORT_SYMBOL_GPL(pkvm_iommu_suspend);
+
+int pkvm_iommu_resume(struct device *dev)
+{
+	return kvm_call_hyp_nvhe(__pkvm_iommu_pm_notify, dev_to_id(dev),
+				 PKVM_IOMMU_PM_RESUME);
+}
+EXPORT_SYMBOL_GPL(pkvm_iommu_resume);
+
+int pkvm_iommu_finalize(void)
+{
+	return kvm_call_hyp_nvhe(__pkvm_iommu_finalize);
+}
+EXPORT_SYMBOL_GPL(pkvm_iommu_finalize);
diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c
index 3dd38a1..db6630c 100644
--- a/arch/arm64/kvm/mmio.c
+++ b/arch/arm64/kvm/mmio.c
@@ -133,8 +133,17 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 	/*
 	 * No valid syndrome? Ask userspace for help if it has
 	 * volunteered to do so, and bail out otherwise.
+	 *
+	 * In the protected VM case, there isn't much userspace can do
+	 * though, so directly deliver an exception to the guest.
 	 */
 	if (!kvm_vcpu_dabt_isvalid(vcpu)) {
+		if (is_protected_kvm_enabled() &&
+		    kvm_vm_is_protected(vcpu->kvm)) {
+			kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
+			return 1;
+		}
+
 		if (test_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
 			     &vcpu->kvm->arch.flags)) {
 			run->exit_reason = KVM_EXIT_ARM_NISV;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 60ee3d9..9b6df52 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -225,6 +225,24 @@ static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 si
 	__unmap_stage2_range(mmu, start, size, true);
 }
 
+static void pkvm_stage2_flush(struct kvm *kvm)
+{
+	struct kvm_pinned_page *ppage;
+	struct rb_node *node;
+
+	/*
+	 * Contrary to stage2_apply_range(), we don't need to check
+	 * whether the VM is being torn down, as this is always called
+	 * from a vcpu thread, and the list is only ever freed on VM
+	 * destroy (which only occurs when all vcpu are gone).
+	 */
+	for (node = rb_first(&kvm->arch.pkvm.pinned_pages); node; node = rb_next(node)) {
+		ppage = rb_entry(node, struct kvm_pinned_page, node);
+		__clean_dcache_guest_page(page_address(ppage->page), PAGE_SIZE);
+		cond_resched_rwlock_write(&kvm->mmu_lock);
+	}
+}
+
 static void stage2_flush_memslot(struct kvm *kvm,
 				 struct kvm_memory_slot *memslot)
 {
@@ -250,9 +268,13 @@ static void stage2_flush_vm(struct kvm *kvm)
 	idx = srcu_read_lock(&kvm->srcu);
 	write_lock(&kvm->mmu_lock);
 
-	slots = kvm_memslots(kvm);
-	kvm_for_each_memslot(memslot, bkt, slots)
-		stage2_flush_memslot(kvm, memslot);
+	if (!is_protected_kvm_enabled()) {
+		slots = kvm_memslots(kvm);
+		kvm_for_each_memslot(memslot, bkt, slots)
+			stage2_flush_memslot(kvm, memslot);
+	} else if (!kvm_vm_is_protected(kvm)) {
+		pkvm_stage2_flush(kvm);
+	}
 
 	write_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -658,6 +680,17 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
 	return BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(level));
 }
 
+static bool stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
+{
+	return true;
+}
+
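+/*
+ * The refcount tracks valid entries as well as invalid entries if they
+ * encode ownership of a page to another entity than the page-table owner,
+ * whose id is 0.
+ */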
+static bool stage2_pte_is_counted(kvm_pte_t pte, u32 level)
+{
+	return !!pte;
+}
+
 static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
 	.zalloc_page		= stage2_memcache_zalloc_page,
 	.zalloc_pages_exact	= kvm_s2_zalloc_pages_exact,
@@ -671,19 +704,54 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
 	.icache_inval_pou	= invalidate_icache_guest_page,
 };
 
+static struct kvm_pgtable_pte_ops kvm_s2_pte_ops = {
+	.force_pte_cb = stage2_force_pte_cb,
+	.pte_is_counted_cb = stage2_pte_is_counted,
+};
+
 /**
  * kvm_init_stage2_mmu - Initialise a S2 MMU structure
  * @kvm:	The pointer to the KVM structure
  * @mmu:	The pointer to the s2 MMU structure
+ * @type:	The machine type of the virtual machine
  *
  * Allocates only the stage-2 HW PGD level table(s).
  * Note we don't need locking here as this is only called when the VM is
  * created, which can only be done once.
  */
-int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
+int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type)
 {
+	u32 kvm_ipa_limit = get_kvm_ipa_limit();
 	int cpu, err;
 	struct kvm_pgtable *pgt;
+	u64 mmfr0, mmfr1;
+	u32 phys_shift;
+
+	phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
+	if (is_protected_kvm_enabled()) {
+		phys_shift = kvm_ipa_limit;
+	} else if (phys_shift) {
+		if (phys_shift > kvm_ipa_limit ||
+		    phys_shift < ARM64_MIN_PARANGE_BITS)
+			return -EINVAL;
+	} else {
+		phys_shift = KVM_PHYS_SHIFT;
+		if (phys_shift > kvm_ipa_limit) {
+			pr_warn_once("%s using unsupported default IPA limit, upgrade your VMM\n",
+				     current->comm);
+			return -EINVAL;
+		}
+	}
+
+	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
+	kvm->arch.vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
+	kvm->arch.pkvm.pinned_pages = RB_ROOT;
+	mmu->arch = &kvm->arch;
+
+	if (is_protected_kvm_enabled())
+		return 0;
 
 	if (mmu->pgt != NULL) {
 		kvm_err("kvm_arch already initialized?\n");
@@ -695,7 +763,8 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 		return -ENOMEM;
 
 	mmu->arch = &kvm->arch;
-	err = kvm_pgtable_stage2_init(pgt, mmu, &kvm_s2_mm_ops);
+	err = kvm_pgtable_stage2_init(pgt, mmu, &kvm_s2_mm_ops,
+				      &kvm_s2_pte_ops);
 	if (err)
 		goto out_free_pgtable;
 
@@ -792,6 +861,9 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
 	struct kvm_pgtable *pgt = NULL;
 
+	if (is_protected_kvm_enabled())
+		return;
+
 	write_lock(&kvm->mmu_lock);
 	pgt = mmu->pgt;
 	if (pgt) {
@@ -807,6 +879,95 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 	}
 }
 
+static void hyp_mc_free_fn(void *addr, void *args)
+{
+	bool account_stage2 = (bool)args;
+
+	if (account_stage2)
+		kvm_account_pgtable_pages(addr, -1);
+
+	free_page((unsigned long)addr);
+}
+
+static void *hyp_mc_alloc_fn(void *unused)
+{
+	void *addr = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
+
+	if (addr)
+		kvm_account_pgtable_pages(addr, 1);
+
+	return addr;
+}
+
+static void account_hyp_memcache(struct kvm_hyp_memcache *mc,
+				 unsigned long prev_nr_pages,
+				 struct kvm *kvm)
+{
+	unsigned long nr_pages = mc->nr_pages;
+
+	if (prev_nr_pages == nr_pages)
+		return;
+
+	if (nr_pages > prev_nr_pages) {
+		atomic64_add((nr_pages - prev_nr_pages) << PAGE_SHIFT,
+			     &kvm->stat.protected_hyp_mem);
+	} else {
+		atomic64_sub((prev_nr_pages - nr_pages) << PAGE_SHIFT,
+			     &kvm->stat.protected_hyp_mem);
+	}
+}
+
+static void __free_account_hyp_memcache(struct kvm_hyp_memcache *mc,
+					struct kvm *kvm,
+					bool account_stage2)
+{
+	unsigned long prev_nr_pages;
+
+	if (!is_protected_kvm_enabled())
+		return;
+
+	prev_nr_pages = mc->nr_pages;
+	__free_hyp_memcache(mc, hyp_mc_free_fn, kvm_host_va,
+			    (void *)account_stage2);
+	account_hyp_memcache(mc, prev_nr_pages, kvm);
+}
+
+void free_hyp_memcache(struct kvm_hyp_memcache *mc, struct kvm *kvm)
+{
+	__free_account_hyp_memcache(mc, kvm, false);
+}
+
+/*
+ * All pages donated to the hypervisor through kvm_hyp_memcache are for the
+ * stage-2 page table. However, kvm_hyp_memcache is also a vehicle to retrieve
+ * metadata from the hypervisor, hence the need for a stage2-specific free
+ * function.
+ */
+void free_hyp_stage2_memcache(struct kvm_hyp_memcache *mc, struct kvm *kvm)
+{
+	__free_account_hyp_memcache(mc, kvm, true);
+}
+
+int topup_hyp_memcache(struct kvm_vcpu *vcpu)
+{
+	struct kvm_hyp_memcache *mc = &vcpu->arch.pkvm_memcache;
+	unsigned long prev_nr_pages;
+	int err;
+
+	if (!is_protected_kvm_enabled())
+		return 0;
+
+	prev_nr_pages = mc->nr_pages;
+
+	err = __topup_hyp_memcache(mc, kvm_mmu_cache_min_pages(vcpu->kvm),
+				   hyp_mc_alloc_fn,
+				   kvm_host_pa, NULL);
+	if (!err)
+		account_hyp_memcache(mc, prev_nr_pages, vcpu->kvm);
+
+	return err;
+}
+
 /**
  * kvm_phys_addr_ioremap - map a device range to guest IPA
  *
@@ -1119,6 +1280,118 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
 	return 0;
 }
 
+static int pkvm_host_map_guest(u64 pfn, u64 gfn)
+{
+	int ret = kvm_call_hyp_nvhe(__pkvm_host_map_guest, pfn, gfn);
+
+	/*
+	 * Getting -EPERM at this point implies that the pfn has already been
+	 * mapped. This should only ever happen when two vCPUs faulted on the
+	 * same page, and the current one lost the race to do the mapping.
+	 */
+	return (ret == -EPERM) ? -EAGAIN : ret;
+}
+
+static int cmp_ppages(struct rb_node *node, const struct rb_node *parent)
+{
+	struct kvm_pinned_page *a = container_of(node, struct kvm_pinned_page, node);
+	struct kvm_pinned_page *b = container_of(parent, struct kvm_pinned_page, node);
+
+	if (a->ipa < b->ipa)
+		return -1;
+	if (a->ipa > b->ipa)
+		return 1;
+	return 0;
+}
+
+static int insert_ppage(struct kvm *kvm, struct kvm_pinned_page *ppage)
+{
+	if (rb_find_add(&ppage->node, &kvm->arch.pkvm.pinned_pages, cmp_ppages))
+		return -EEXIST;
+
+	return 0;
+}
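+/*
+ * A minimal usage sketch for the comparator pair: insertion goes through
+ * rb_find_add() with the node-vs-node cmp_ppages() above, while lookup by
+ * IPA uses rb_find() with a key-based comparator (see rb_ppage_cmp() in
+ * pkvm.c), e.g.:
+ *
+ *	struct rb_node *n = rb_find((void *)ipa,
+ *				    &kvm->arch.pkvm.pinned_pages,
+ *				    rb_ppage_cmp);
+ *
+ * Both comparators must impose the same ordering (by ->ipa) for the tree
+ * to stay coherent.
+ */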
+
+static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			  unsigned long hva)
+{
+	struct mm_struct *mm = current->mm;
+	unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
+	struct kvm_pinned_page *ppage;
+	struct kvm *kvm = vcpu->kvm;
+	struct page *page;
+	u64 pfn;
+	int ret;
+
+	ret = topup_hyp_memcache(vcpu);
+	if (ret)
+		return -ENOMEM;
+
+	ppage = kmalloc(sizeof(*ppage), GFP_KERNEL_ACCOUNT);
+	if (!ppage)
+		return -ENOMEM;
+
+	ret = account_locked_vm(mm, 1, true);
+	if (ret)
+		goto free_ppage;
+
+	mmap_read_lock(mm);
+	ret = pin_user_pages(hva, 1, flags, &page, NULL);
+	mmap_read_unlock(mm);
+
+	if (ret == -EHWPOISON) {
+		kvm_send_hwpoison_signal(hva, PAGE_SHIFT);
+		ret = 0;
+		goto dec_account;
+	} else if (ret != 1) {
+		ret = -EFAULT;
+		goto dec_account;
+	} else if (!PageSwapBacked(page)) {
+		/*
+		 * We really can't deal with page-cache pages returned by GUP
+		 * because (a) we may trigger writeback of a page for which we
+		 * no longer have access and (b) page_mkclean() won't find the
+		 * stage-2 mapping in the rmap so we can get out-of-whack with
+		 * the filesystem when marking the page dirty during unpinning
+		 * (see cc5095747edf ("ext4: don't BUG if someone dirty pages
+		 * without asking ext4 first")).
+		 *
+		 * Ideally we'd just restrict ourselves to anonymous pages, but
+		 * we also want to allow memfd (i.e. shmem) pages, so check for
+		 * pages backed by swap in the knowledge that the GUP pin will
+		 * prevent try_to_unmap() from succeeding.
+		 */
+		ret = -EIO;
+		goto dec_account;
+	}
+
+	write_lock(&kvm->mmu_lock);
+	pfn = page_to_pfn(page);
+	ret = pkvm_host_map_guest(pfn, fault_ipa >> PAGE_SHIFT);
+	if (ret) {
+		if (ret == -EAGAIN)
+			ret = 0;
+		goto unpin;
+	}
+
+	ppage->page = page;
+	ppage->ipa = fault_ipa;
+	WARN_ON(insert_ppage(kvm, ppage));
+	write_unlock(&kvm->mmu_lock);
+
+	return 0;
+
+unpin:
+	write_unlock(&kvm->mmu_lock);
+	unpin_user_pages(&page, 1);
+dec_account:
+	account_locked_vm(mm, 1, false);
+free_ppage:
+	kfree(ppage);
+
+	return ret;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
 			  unsigned long fault_status)
@@ -1402,7 +1675,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		}
 
 		/* Falls between the IPA range and the PARange? */
-		if (fault_ipa >= BIT_ULL(vcpu->arch.hw_mmu->pgt->ia_bits)) {
+		if (!is_protected_kvm_enabled() &&
+		    fault_ipa >= BIT_ULL(vcpu->arch.hw_mmu->pgt->ia_bits)) {
 			fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
 
 			if (is_iabt)
@@ -1484,7 +1758,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		 * faulting VA. This is always 12 bits, irrespective
 		 * of the page size.
 		 */
-		fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
+		fault_ipa |= kvm_vcpu_get_hfar(vcpu) & FAR_MASK;
 		ret = io_mem_abort(vcpu, fault_ipa);
 		goto out_unlock;
 	}
@@ -1498,7 +1772,11 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		goto out_unlock;
 	}
 
-	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
+	if (is_protected_kvm_enabled())
+		ret = pkvm_mem_abort(vcpu, fault_ipa, hva);
+	else
+		ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
+
 	if (ret == 0)
 		ret = 1;
 out:
@@ -1707,6 +1985,19 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	hva_t hva, reg_end;
 	int ret = 0;
 
+	if (is_protected_kvm_enabled()) {
+		/* In protected mode, cannot modify memslots once a VM has run. */
+		if ((change == KVM_MR_DELETE || change == KVM_MR_MOVE) &&
+		    kvm->arch.pkvm.handle) {
+			return -EPERM;
+		}
+
+		if (new &&
+		    new->flags & (KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY)) {
+			return -EPERM;
+		}
+	}
+
 	if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
 			change != KVM_MR_FLAGS_ONLY)
 		return 0;
@@ -1783,6 +2074,10 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 	gpa_t gpa = slot->base_gfn << PAGE_SHIFT;
 	phys_addr_t size = slot->npages << PAGE_SHIFT;
 
+	/* Stage-2 is managed by hyp in protected mode. */
+	if (is_protected_kvm_enabled())
+		return;
+
 	write_lock(&kvm->mmu_lock);
 	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
 	write_unlock(&kvm->mmu_lock);
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index ebecb7c..4ff9834 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -4,14 +4,28 @@
  * Author: Quentin Perret <qperret@google.com>
  */
 
+#include <linux/io.h>
 #include <linux/kvm_host.h>
 #include <linux/memblock.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/of_address.h>
+#include <linux/of_fdt.h>
+#include <linux/of_reserved_mem.h>
 #include <linux/sort.h>
 
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 #include <asm/kvm_pkvm.h>
+#include <asm/kvm_pkvm_module.h>
 
 #include "hyp_constants.h"
 
+static struct reserved_mem *pkvm_firmware_mem;
+static phys_addr_t *pvmfw_base = &kvm_nvhe_sym(pvmfw_base);
+static phys_addr_t *pvmfw_size = &kvm_nvhe_sym(pvmfw_size);
+
+static struct pkvm_moveable_reg *moveable_regs = kvm_nvhe_sym(pkvm_moveable_regs);
 static struct memblock_region *hyp_memory = kvm_nvhe_sym(hyp_memory);
 static unsigned int *hyp_memblock_nr_ptr = &kvm_nvhe_sym(hyp_memblock_nr);
 
@@ -51,9 +65,83 @@ static int __init register_memblock_regions(void)
 	return 0;
 }
 
+static int cmp_moveable_reg(const void *p1, const void *p2)
+{
+	const struct pkvm_moveable_reg *r1 = p1;
+	const struct pkvm_moveable_reg *r2 = p2;
+
+	/*
+	 * Moveable regions may overlap, so put the largest one first when start
+	 * addresses are equal to allow a simpler walk from e.g.
+	 * host_stage2_unmap_unmoveable_regs().
+	 */
+	if (r1->start < r2->start)
+		return -1;
+	else if (r1->start > r2->start)
+		return 1;
+	else if (r1->size > r2->size)
+		return -1;
+	else if (r1->size < r2->size)
+		return 1;
+	return 0;
+}
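+/*
+ * Illustration with made-up regions: {0x40000000, SZ_2M} sorts before
+ * {0x40000000, SZ_1M}, since equal start addresses fall through to the
+ * size comparison, which deliberately orders larger regions first.
+ */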
+
+static void __init sort_moveable_regs(void)
+{
+	sort(moveable_regs,
+	     kvm_nvhe_sym(pkvm_moveable_regs_nr),
+	     sizeof(struct pkvm_moveable_reg),
+	     cmp_moveable_reg,
+	     NULL);
+}
+
+static int __init register_moveable_regions(void)
+{
+	struct memblock_region *reg;
+	struct device_node *np;
+	int i = 0;
+
+	for_each_mem_region(reg) {
+		if (i >= PKVM_NR_MOVEABLE_REGS)
+			return -ENOMEM;
+		moveable_regs[i].start = reg->base;
+		moveable_regs[i].size = reg->size;
+		moveable_regs[i].type = PKVM_MREG_MEMORY;
+		i++;
+	}
+
+	for_each_compatible_node(np, NULL, "pkvm,protected-region") {
+		struct resource res;
+		u64 start, size;
+		int ret;
+
+		if (i >= PKVM_NR_MOVEABLE_REGS)
+			return -ENOMEM;
+
+		ret = of_address_to_resource(np, 0, &res);
+		if (ret)
+			return ret;
+
+		start = res.start;
+		size = resource_size(&res);
+		if (!PAGE_ALIGNED(start) || !PAGE_ALIGNED(size))
+			return -EINVAL;
+
+		moveable_regs[i].start = start;
+		moveable_regs[i].size = size;
+		moveable_regs[i].type = PKVM_MREG_PROTECTED_RANGE;
+		i++;
+	}
+
+	kvm_nvhe_sym(pkvm_moveable_regs_nr) = i;
+	sort_moveable_regs();
+
+	return 0;
+}
+
 void __init kvm_hyp_reserve(void)
 {
-	u64 nr_pages, prev, hyp_mem_pages = 0;
+	u64 hyp_mem_pages = 0;
 	int ret;
 
 	if (!is_hyp_mode_available() || is_kernel_in_hyp_mode())
@@ -69,23 +157,18 @@ void __init kvm_hyp_reserve(void)
 		return;
 	}
 
+	ret = register_moveable_regions();
+	if (ret) {
+		*hyp_memblock_nr_ptr = 0;
+		kvm_err("Failed to register pkvm moveable regions: %d\n", ret);
+		return;
+	}
+
 	hyp_mem_pages += hyp_s1_pgtable_pages();
 	hyp_mem_pages += host_s2_pgtable_pages();
-
-	/*
-	 * The hyp_vmemmap needs to be backed by pages, but these pages
-	 * themselves need to be present in the vmemmap, so compute the number
-	 * of pages needed by looking for a fixed point.
-	 */
-	nr_pages = 0;
-	do {
-		prev = nr_pages;
-		nr_pages = hyp_mem_pages + prev;
-		nr_pages = DIV_ROUND_UP(nr_pages * STRUCT_HYP_PAGE_SIZE,
-					PAGE_SIZE);
-		nr_pages += __hyp_pgtable_max_pages(nr_pages);
-	} while (nr_pages != prev);
-	hyp_mem_pages += nr_pages;
+	hyp_mem_pages += hyp_vm_table_pages();
+	hyp_mem_pages += hyp_vmemmap_pages(STRUCT_HYP_PAGE_SIZE);
+	hyp_mem_pages += hyp_ffa_proxy_pages();
 
 	/*
 	 * Try to allocate a PMD-aligned region to reduce TLB pressure once
@@ -107,3 +190,481 @@ void __init kvm_hyp_reserve(void)
 	kvm_info("Reserved %lld MiB at 0x%llx\n", hyp_mem_size >> 20,
 		 hyp_mem_base);
 }
+
+/*
+ * Allocates and donates memory for hypervisor VM structs at EL2.
+ *
+ * Allocates space for the VM state, which includes the hyp vm as well as
+ * the hyp vcpus.
+ *
+ * Stores an opaque handle in the kvm struct for future reference.
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
+static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
+{
+	size_t pgd_sz, hyp_vm_sz, hyp_vcpu_sz, last_ran_sz, total_sz;
+	struct kvm_vcpu *host_vcpu;
+	pkvm_handle_t handle;
+	void *pgd, *hyp_vm, *last_ran;
+	unsigned long idx;
+	int ret;
+
+	if (host_kvm->created_vcpus < 1)
+		return -EINVAL;
+
+	pgd_sz = kvm_pgtable_stage2_pgd_size(host_kvm->arch.vtcr);
+
+	/*
+	 * The PGD pages will be reclaimed using a hyp_memcache which implies
+	 * page granularity. So, use alloc_pages_exact() to get individual
+	 * refcounts.
+	 */
+	pgd = alloc_pages_exact(pgd_sz, GFP_KERNEL_ACCOUNT);
+	if (!pgd)
+		return -ENOMEM;
+
+	/* Allocate memory to donate to hyp for vm and vcpu pointers. */
+	hyp_vm_sz = PAGE_ALIGN(size_add(PKVM_HYP_VM_SIZE,
+					size_mul(sizeof(void *),
+						 host_kvm->created_vcpus)));
+	hyp_vm = alloc_pages_exact(hyp_vm_sz, GFP_KERNEL_ACCOUNT);
+	if (!hyp_vm) {
+		ret = -ENOMEM;
+		goto free_pgd;
+	}
+
+	/* Allocate memory to donate to hyp for tracking mmu->last_vcpu_ran. */
+	last_ran_sz = PAGE_ALIGN(array_size(num_possible_cpus(), sizeof(int)));
+	last_ran = alloc_pages_exact(last_ran_sz, GFP_KERNEL_ACCOUNT);
+	if (!last_ran) {
+		ret = -ENOMEM;
+		goto free_vm;
+	}
+
+	/* Donate the VM memory to hyp and let hyp initialize it. */
+	ret = kvm_call_hyp_nvhe(__pkvm_init_vm, host_kvm, hyp_vm, pgd, last_ran);
+	if (ret < 0)
+		goto free_last_ran;
+
+	handle = ret;
+
+	host_kvm->arch.pkvm.handle = handle;
+
+	total_sz = hyp_vm_sz + last_ran_sz + pgd_sz;
+
+	/* Donate memory for the vcpus at hyp and initialize it. */
+	hyp_vcpu_sz = PAGE_ALIGN(PKVM_HYP_VCPU_SIZE);
+	kvm_for_each_vcpu(idx, host_vcpu, host_kvm) {
+		void *hyp_vcpu;
+
+		/* The vcpus are expected to be indexed sequentially, starting at 0. */
+		if (WARN_ON(host_vcpu->vcpu_idx != idx)) {
+			ret = -EINVAL;
+			goto destroy_vm;
+		}
+
+		hyp_vcpu = alloc_pages_exact(hyp_vcpu_sz, GFP_KERNEL_ACCOUNT);
+		if (!hyp_vcpu) {
+			ret = -ENOMEM;
+			goto destroy_vm;
+		}
+
+		total_sz += hyp_vcpu_sz;
+
+		ret = kvm_call_hyp_nvhe(__pkvm_init_vcpu, handle, host_vcpu,
+					hyp_vcpu);
+		if (ret) {
+			free_pages_exact(hyp_vcpu, hyp_vcpu_sz);
+			goto destroy_vm;
+		}
+	}
+
+	atomic64_set(&host_kvm->stat.protected_hyp_mem, total_sz);
+	kvm_account_pgtable_pages(pgd, pgd_sz >> PAGE_SHIFT);
+
+	return 0;
+
+destroy_vm:
+	pkvm_destroy_hyp_vm(host_kvm);
+	return ret;
+free_last_ran:
+	free_pages_exact(last_ran, last_ran_sz);
+free_vm:
+	free_pages_exact(hyp_vm, hyp_vm_sz);
+free_pgd:
+	free_pages_exact(pgd, pgd_sz);
+	return ret;
+}
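+/*
+ * Sizing sketch with illustrative numbers: assuming a hypothetical
+ * PKVM_HYP_VM_SIZE of 0x1f00, 4 vcpus and 4KiB pages, hyp_vm_sz above is
+ * PAGE_ALIGN(0x1f00 + 4 * sizeof(void *)) = 0x2000, i.e. two pages donated
+ * for the hyp VM descriptor plus its vcpu pointer array.
+ */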
+
+int pkvm_create_hyp_vm(struct kvm *host_kvm)
+{
+	int ret = 0;
+
+	mutex_lock(&host_kvm->lock);
+	if (!host_kvm->arch.pkvm.handle)
+		ret = __pkvm_create_hyp_vm(host_kvm);
+	mutex_unlock(&host_kvm->lock);
+
+	return ret;
+}
+
+void pkvm_destroy_hyp_vm(struct kvm *host_kvm)
+{
+	struct kvm_pinned_page *ppage;
+	struct mm_struct *mm = current->mm;
+	struct rb_node *node;
+
+	if (!host_kvm->arch.pkvm.handle)
+		goto out_free;
+
+	WARN_ON(kvm_call_hyp_nvhe(__pkvm_start_teardown_vm, host_kvm->arch.pkvm.handle));
+
+	node = rb_first(&host_kvm->arch.pkvm.pinned_pages);
+	while (node) {
+		ppage = rb_entry(node, struct kvm_pinned_page, node);
+		WARN_ON(kvm_call_hyp_nvhe(__pkvm_reclaim_dying_guest_page,
+					  host_kvm->arch.pkvm.handle,
+					  page_to_pfn(ppage->page),
+					  ppage->ipa));
+		cond_resched();
+
+		account_locked_vm(mm, 1, false);
+		unpin_user_pages_dirty_lock(&ppage->page, 1, true);
+		node = rb_next(node);
+		rb_erase(&ppage->node, &host_kvm->arch.pkvm.pinned_pages);
+		kfree(ppage);
+	}
+
+	WARN_ON(kvm_call_hyp_nvhe(__pkvm_finalize_teardown_vm, host_kvm->arch.pkvm.handle));
+
+out_free:
+	host_kvm->arch.pkvm.handle = 0;
+	free_hyp_memcache(&host_kvm->arch.pkvm.teardown_mc, host_kvm);
+	free_hyp_stage2_memcache(&host_kvm->arch.pkvm.teardown_stage2_mc,
+				 host_kvm);
+}
+
+int pkvm_init_host_vm(struct kvm *host_kvm, unsigned long type)
+{
+	mutex_init(&host_kvm->lock);
+
+	if (!(type & KVM_VM_TYPE_ARM_PROTECTED))
+		return 0;
+
+	if (!is_protected_kvm_enabled())
+		return -EINVAL;
+
+	host_kvm->arch.pkvm.pvmfw_load_addr = PVMFW_INVALID_LOAD_ADDR;
+	host_kvm->arch.pkvm.enabled = true;
+	return 0;
+}
+
+static int rb_ppage_cmp(const void *key, const struct rb_node *node)
+{
+	struct kvm_pinned_page *p = container_of(node, struct kvm_pinned_page, node);
+	phys_addr_t ipa = (phys_addr_t)key;
+
+	return (ipa < p->ipa) ? -1 : (ipa > p->ipa);
+}
+
+void pkvm_host_reclaim_page(struct kvm *host_kvm, phys_addr_t ipa)
+{
+	struct kvm_pinned_page *ppage;
+	struct mm_struct *mm = current->mm;
+	struct rb_node *node;
+
+	write_lock(&host_kvm->mmu_lock);
+	node = rb_find((void *)ipa, &host_kvm->arch.pkvm.pinned_pages,
+		       rb_ppage_cmp);
+	if (node)
+		rb_erase(node, &host_kvm->arch.pkvm.pinned_pages);
+	write_unlock(&host_kvm->mmu_lock);
+
+	WARN_ON(!node);
+	if (!node)
+		return;
+
+	ppage = container_of(node, struct kvm_pinned_page, node);
+	account_locked_vm(mm, 1, false);
+	unpin_user_pages_dirty_lock(&ppage->page, 1, true);
+	kfree(ppage);
+}
+
+static int __init pkvm_firmware_rmem_err(struct reserved_mem *rmem,
+					 const char *reason)
+{
+	phys_addr_t end = rmem->base + rmem->size;
+
+	kvm_err("Ignoring pkvm guest firmware memory reservation [%pa - %pa]: %s\n",
+		&rmem->base, &end, reason);
+	return -EINVAL;
+}
+
+static int __init pkvm_firmware_rmem_init(struct reserved_mem *rmem)
+{
+	unsigned long node = rmem->fdt_node;
+
+	if (pkvm_firmware_mem)
+		return pkvm_firmware_rmem_err(rmem, "duplicate reservation");
+
+	if (!of_get_flat_dt_prop(node, "no-map", NULL))
+		return pkvm_firmware_rmem_err(rmem, "missing \"no-map\" property");
+
+	if (of_get_flat_dt_prop(node, "reusable", NULL))
+		return pkvm_firmware_rmem_err(rmem, "\"reusable\" property unsupported");
+
+	if (!PAGE_ALIGNED(rmem->base))
+		return pkvm_firmware_rmem_err(rmem, "base is not page-aligned");
+
+	if (!PAGE_ALIGNED(rmem->size))
+		return pkvm_firmware_rmem_err(rmem, "size is not page-aligned");
+
+	*pvmfw_size = rmem->size;
+	*pvmfw_base = rmem->base;
+	pkvm_firmware_mem = rmem;
+	return 0;
+}
+RESERVEDMEM_OF_DECLARE(pkvm_firmware, "linux,pkvm-guest-firmware-memory",
+		       pkvm_firmware_rmem_init);
+
+static int __init pkvm_firmware_rmem_clear(void)
+{
+	void *addr;
+	phys_addr_t size;
+
+	if (likely(!pkvm_firmware_mem) || is_protected_kvm_enabled())
+		return 0;
+
+	kvm_info("Clearing unused pKVM firmware memory\n");
+	size = pkvm_firmware_mem->size;
+	addr = memremap(pkvm_firmware_mem->base, size, MEMREMAP_WB);
+	if (!addr)
+		return -EINVAL;
+
+	memset(addr, 0, size);
+	dcache_clean_poc((unsigned long)addr, (unsigned long)addr + size);
+	memunmap(addr);
+	return 0;
+}
+device_initcall_sync(pkvm_firmware_rmem_clear);
+
+static int pkvm_vm_ioctl_set_fw_ipa(struct kvm *kvm, u64 ipa)
+{
+	int ret = 0;
+
+	if (!pkvm_firmware_mem)
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+	if (kvm->arch.pkvm.handle) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	kvm->arch.pkvm.pvmfw_load_addr = ipa;
+out_unlock:
+	mutex_unlock(&kvm->lock);
+	return ret;
+}
+
+static int pkvm_vm_ioctl_info(struct kvm *kvm,
+			      struct kvm_protected_vm_info __user *info)
+{
+	struct kvm_protected_vm_info kinfo = {
+		.firmware_size = pkvm_firmware_mem ?
+				 pkvm_firmware_mem->size :
+				 0,
+	};
+
+	return copy_to_user(info, &kinfo, sizeof(kinfo)) ? -EFAULT : 0;
+}
+
+int pkvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
+{
+	if (!kvm_vm_is_protected(kvm))
+		return -EINVAL;
+
+	if (cap->args[1] || cap->args[2] || cap->args[3])
+		return -EINVAL;
+
+	switch (cap->flags) {
+	case KVM_CAP_ARM_PROTECTED_VM_FLAGS_SET_FW_IPA:
+		return pkvm_vm_ioctl_set_fw_ipa(kvm, cap->args[0]);
+	case KVM_CAP_ARM_PROTECTED_VM_FLAGS_INFO:
+		return pkvm_vm_ioctl_info(kvm, (void __force __user *)cap->args[0]);
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+#ifdef CONFIG_MODULES
+static int __init early_pkvm_enable_modules(char *arg)
+{
+	extern unsigned long kvm_nvhe_sym(pkvm_priv_hcall_limit);
+
+	/*
+	 * Move the limit to allow module loading HVCs. It will be moved back to
+	 * its original position in __pkvm_close_module_registration().
+	 */
+	kvm_nvhe_sym(pkvm_priv_hcall_limit) = __KVM_HOST_SMCCC_FUNC___pkvm_alloc_module_va;
+
+	return 0;
+}
+early_param("kvm-arm.protected_modules", early_pkvm_enable_modules);
+
+struct pkvm_mod_sec_mapping {
+	struct pkvm_module_section *sec;
+	enum kvm_pgtable_prot prot;
+};
+
+static void pkvm_unmap_module_pages(void *kern_va, void *hyp_va, size_t size)
+{
+	size_t offset;
+	u64 pfn;
+
+	for (offset = 0; offset < size; offset += PAGE_SIZE) {
+		pfn = vmalloc_to_pfn(kern_va + offset);
+		kvm_call_hyp_nvhe(__pkvm_unmap_module_page, pfn,
+				  hyp_va + offset);
+	}
+}
+
+static void pkvm_unmap_module_sections(struct pkvm_mod_sec_mapping *secs_map, void *hyp_va_base, int nr_secs)
+{
+	size_t offset, size;
+	void *start;
+	int i;
+
+	for (i = 0; i < nr_secs; i++) {
+		start = secs_map[i].sec->start;
+		size = secs_map[i].sec->end - start;
+		offset = start - secs_map[0].sec->start;
+		pkvm_unmap_module_pages(start, hyp_va_base + offset, size);
+	}
+}
+
+static int pkvm_map_module_section(struct pkvm_mod_sec_mapping *sec_map, void *hyp_va)
+{
+	size_t offset, size = sec_map->sec->end - sec_map->sec->start;
+	int ret;
+	u64 pfn;
+
+	for (offset = 0; offset < size; offset += PAGE_SIZE) {
+		pfn = vmalloc_to_pfn(sec_map->sec->start + offset);
+		ret = kvm_call_hyp_nvhe(__pkvm_map_module_page, pfn,
+					hyp_va + offset, sec_map->prot);
+		if (ret) {
+			pkvm_unmap_module_pages(sec_map->sec->start, hyp_va, offset);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static int pkvm_map_module_sections(struct pkvm_mod_sec_mapping *secs_map, void *hyp_va_base, int nr_secs)
+{
+	size_t offset;
+	int i, ret;
+
+	for (i = 0; i < nr_secs; i++) {
+		offset = secs_map[i].sec->start - secs_map[0].sec->start;
+		ret = pkvm_map_module_section(&secs_map[i], hyp_va_base + offset);
+		if (ret) {
+			pkvm_unmap_module_sections(secs_map, hyp_va_base, i);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static int __pkvm_cmp_mod_sec(const void *p1, const void *p2)
+{
+	struct pkvm_mod_sec_mapping const *s1 = p1;
+	struct pkvm_mod_sec_mapping const *s2 = p2;
+
+	return s1->sec->start < s2->sec->start ? -1 : s1->sec->start > s2->sec->start;
+}
+
+int __pkvm_load_el2_module(struct module *this, unsigned long *token)
+{
+	struct pkvm_el2_module *mod = &this->arch.hyp;
+	struct pkvm_mod_sec_mapping secs_map[] = {
+		{ &mod->text, KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_X },
+		{ &mod->bss, KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W },
+		{ &mod->rodata, KVM_PGTABLE_PROT_R },
+		{ &mod->data, KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W },
+	};
+	void *start, *end, *hyp_va;
+	kvm_nvhe_reloc_t *endrel;
+	size_t offset, size;
+	int ret, i;
+
+	if (!is_protected_kvm_enabled())
+		return -EOPNOTSUPP;
+
+	for (i = 0; i < ARRAY_SIZE(secs_map); i++) {
+		if (!PAGE_ALIGNED(secs_map[i].sec->start)) {
+			kvm_err("EL2 sections are not page-aligned\n");
+			return -EINVAL;
+		}
+	}
+
+	if (!try_module_get(this)) {
+		kvm_err("Kernel module has been unloaded\n");
+		return -ENODEV;
+	}
+
+	sort(secs_map, ARRAY_SIZE(secs_map), sizeof(secs_map[0]), __pkvm_cmp_mod_sec, NULL);
+	start = secs_map[0].sec->start;
+	end = secs_map[ARRAY_SIZE(secs_map) - 1].sec->end;
+	size = end - start;
+
+	hyp_va = (void *)kvm_call_hyp_nvhe(__pkvm_alloc_module_va, size >> PAGE_SHIFT);
+	if (!hyp_va) {
+		kvm_err("Failed to allocate hypervisor VA space for EL2 module\n");
+		module_put(this);
+		return -ENOMEM;
+	}
+
+	/*
+	 * The token can be used for other calls related to this module.
+	 * Conveniently, the only information needed is this address, so use it
+	 * as the identifier.
+	 */
+	if (token)
+		*token = (unsigned long)hyp_va;
+
+	endrel = (void *)mod->relocs + mod->nr_relocs * sizeof(*endrel);
+	kvm_apply_hyp_module_relocations(start, hyp_va, mod->relocs, endrel);
+
+	ret = pkvm_map_module_sections(secs_map, hyp_va, ARRAY_SIZE(secs_map));
+	if (ret) {
+		kvm_err("Failed to map EL2 module page: %d\n", ret);
+		module_put(this);
+		return ret;
+	}
+
+	offset = (size_t)((void *)mod->init - start);
+	ret = kvm_call_hyp_nvhe(__pkvm_init_module, hyp_va + offset);
+	if (ret) {
+		kvm_err("Failed to init EL2 module: %d\n", ret);
+		pkvm_unmap_module_sections(secs_map, hyp_va, ARRAY_SIZE(secs_map));
+		module_put(this);
+		return ret;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__pkvm_load_el2_module);
+
+int __pkvm_register_el2_call(unsigned long hfn_hyp_va)
+{
+	return kvm_call_hyp_nvhe(__pkvm_register_hcall, hfn_hyp_va);
+}
+EXPORT_SYMBOL_GPL(__pkvm_register_el2_call);
+#endif /* CONFIG_MODULES */
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 7fbc4c1..0f7001d7 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -21,16 +21,6 @@
  * as described in ARM document number ARM DEN 0022A.
  */
 
-#define AFFINITY_MASK(level)	~((0x1UL << ((level) * MPIDR_LEVEL_BITS)) - 1)
-
-static unsigned long psci_affinity_mask(unsigned long affinity_level)
-{
-	if (affinity_level <= 3)
-		return MPIDR_HWID_BITMASK & AFFINITY_MASK(affinity_level);
-
-	return 0;
-}
-
 static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
 {
 	/*
@@ -51,12 +41,6 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
 	return PSCI_RET_SUCCESS;
 }
 
-static inline bool kvm_psci_valid_affinity(struct kvm_vcpu *vcpu,
-					   unsigned long affinity)
-{
-	return !(affinity & ~MPIDR_HWID_BITMASK);
-}
-
 static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 {
 	struct vcpu_reset_state *reset_state;
@@ -204,18 +188,6 @@ static void kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
 	run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
 }
 
-static void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu)
-{
-	int i;
-
-	/*
-	 * Zero the input registers' upper 32 bits. They will be fully
-	 * zeroed on exit, so we're fine changing them in place.
-	 */
-	for (i = 1; i < 4; i++)
-		vcpu_set_reg(vcpu, i, lower_32_bits(vcpu_get_reg(vcpu, i)));
-}
-
 static unsigned long kvm_psci_check_allowed_function(struct kvm_vcpu *vcpu, u32 fn)
 {
 	/*
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 5ae1847..29a1854 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -32,15 +32,6 @@
 /* Maximum phys_shift supported for any VM on this host */
 static u32 kvm_ipa_limit;
 
-/*
- * ARMv8 Reset Values
- */
-#define VCPU_RESET_PSTATE_EL1	(PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT | \
-				 PSR_F_BIT | PSR_D_BIT)
-
-#define VCPU_RESET_PSTATE_SVC	(PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
-				 PSR_AA32_I_BIT | PSR_AA32_F_BIT)
-
 unsigned int kvm_sve_max_vl;
 
 int kvm_arm_init_sve(void)
@@ -118,7 +109,7 @@ static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
 		kfree(buf);
 		return ret;
 	}
-	
+
 	vcpu->arch.sve_state = buf;
 	vcpu_set_flag(vcpu, VCPU_SVE_FINALIZED);
 	return 0;
@@ -165,22 +156,6 @@ static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
 		memset(vcpu->arch.sve_state, 0, vcpu_sve_state_size(vcpu));
 }
 
-static int kvm_vcpu_enable_ptrauth(struct kvm_vcpu *vcpu)
-{
-	/*
-	 * For now make sure that both address/generic pointer authentication
-	 * features are requested by the userspace together and the system
-	 * supports these capabilities.
-	 */
-	if (!test_bit(KVM_ARM_VCPU_PTRAUTH_ADDRESS, vcpu->arch.features) ||
-	    !test_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, vcpu->arch.features) ||
-	    !system_has_full_ptr_auth())
-		return -EINVAL;
-
-	vcpu_set_flag(vcpu, GUEST_HAS_PTRAUTH);
-	return 0;
-}
-
 /**
  * kvm_set_vm_width() - set the register width for the guest
  * @vcpu: Pointer to the vcpu being configured
@@ -251,7 +226,6 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	struct vcpu_reset_state reset_state;
 	int ret;
 	bool loaded;
-	u32 pstate;
 
 	mutex_lock(&vcpu->kvm->lock);
 	ret = kvm_set_vm_width(vcpu);
@@ -290,29 +264,13 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	switch (vcpu->arch.target) {
-	default:
-		if (vcpu_el1_is_32bit(vcpu)) {
-			pstate = VCPU_RESET_PSTATE_SVC;
-		} else {
-			pstate = VCPU_RESET_PSTATE_EL1;
-		}
-
-		if (kvm_vcpu_has_pmu(vcpu) && !kvm_arm_support_pmu_v3()) {
-			ret = -EINVAL;
-			goto out;
-		}
-		break;
+	if (kvm_vcpu_has_pmu(vcpu) && !kvm_arm_support_pmu_v3()) {
+		ret = -EINVAL;
+		goto out;
 	}
 
 	/* Reset core registers */
-	memset(vcpu_gp_regs(vcpu), 0, sizeof(*vcpu_gp_regs(vcpu)));
-	memset(&vcpu->arch.ctxt.fp_regs, 0, sizeof(vcpu->arch.ctxt.fp_regs));
-	vcpu->arch.ctxt.spsr_abt = 0;
-	vcpu->arch.ctxt.spsr_und = 0;
-	vcpu->arch.ctxt.spsr_irq = 0;
-	vcpu->arch.ctxt.spsr_fiq = 0;
-	vcpu_gp_regs(vcpu)->pstate = pstate;
+	kvm_reset_vcpu_core(vcpu);
 
 	/* Reset system registers */
 	kvm_reset_sys_regs(vcpu);
@@ -321,22 +279,8 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	 * Additional reset state handling that PSCI may have imposed on us.
 	 * Must be done after all the sys_reg reset.
 	 */
-	if (reset_state.reset) {
-		unsigned long target_pc = reset_state.pc;
-
-		/* Gracefully handle Thumb2 entry point */
-		if (vcpu_mode_is_32bit(vcpu) && (target_pc & 1)) {
-			target_pc &= ~1UL;
-			vcpu_set_thumb(vcpu);
-		}
-
-		/* Propagate caller endianness */
-		if (reset_state.be)
-			kvm_vcpu_set_be(vcpu);
-
-		*vcpu_pc(vcpu) = target_pc;
-		vcpu_set_reg(vcpu, 0, reset_state.r0);
-	}
+	if (reset_state.reset)
+		kvm_reset_vcpu_psci(vcpu, &reset_state);
 
 	/* Reset timer */
 	ret = kvm_timer_vcpu_reset(vcpu);
@@ -395,32 +339,3 @@ int kvm_set_ipa_limit(void)
 
 	return 0;
 }
-
-int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type)
-{
-	u64 mmfr0, mmfr1;
-	u32 phys_shift;
-
-	if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
-		return -EINVAL;
-
-	phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
-	if (phys_shift) {
-		if (phys_shift > kvm_ipa_limit ||
-		    phys_shift < ARM64_MIN_PARANGE_BITS)
-			return -EINVAL;
-	} else {
-		phys_shift = KVM_PHYS_SHIFT;
-		if (phys_shift > kvm_ipa_limit) {
-			pr_warn_once("%s using unsupported default IPA limit, upgrade your VMM\n",
-				     current->comm);
-			return -EINVAL;
-		}
-	}
-
-	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
-	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
-	kvm->arch.vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
-
-	return 0;
-}
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f4a7c5ab..daad344 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -61,26 +61,6 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
 	return false;
 }
 
-u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
-{
-	u64 val = 0x8badf00d8badf00d;
-
-	if (vcpu_get_flag(vcpu, SYSREGS_ON_CPU) &&
-	    __vcpu_read_sys_reg_from_cpu(reg, &val))
-		return val;
-
-	return __vcpu_sys_reg(vcpu, reg);
-}
-
-void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
-{
-	if (vcpu_get_flag(vcpu, SYSREGS_ON_CPU) &&
-	    __vcpu_write_sys_reg_to_cpu(val, reg))
-		return;
-
-	 __vcpu_sys_reg(vcpu, reg) = val;
-}
-
 /* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
 static u32 cache_levels;
 
@@ -578,19 +558,7 @@ static void reset_actlr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 
 static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 {
-	u64 mpidr;
-
-	/*
-	 * Map the vcpu_id into the first three affinity level fields of
-	 * the MPIDR. We limit the number of VCPUs in level 0 due to a
-	 * limitation to 16 CPUs in that level in the ICC_SGIxR registers
-	 * of the GICv3 to be able to address each CPU directly when
-	 * sending IPIs.
-	 */
-	mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
-	mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
-	mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
-	vcpu_write_sys_reg(vcpu, (1ULL << 31) | mpidr, MPIDR_EL1);
+	vcpu_write_sys_reg(vcpu, calculate_mpidr(vcpu), MPIDR_EL1);
 }
 
 static unsigned int pmu_visibility(const struct kvm_vcpu *vcpu,
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index e4ebb3a..c102399 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -200,6 +200,25 @@ find_reg(const struct sys_reg_params *params, const struct sys_reg_desc table[],
 	return __inline_bsearch((void *)pval, table, num, sizeof(table[0]), match_sys_reg);
 }
 
+static inline u64 calculate_mpidr(const struct kvm_vcpu *vcpu)
+{
+	u64 mpidr;
+
+	/*
+	 * Map the vcpu_id into the first three affinity level fields of
+	 * the MPIDR. We limit the number of VCPUs in level 0 due to a
+	 * limitation to 16 CPUs in that level in the ICC_SGIxR registers
+	 * of the GICv3 to be able to address each CPU directly when
+	 * sending IPIs.
+	 */
+	mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
+	mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
+	mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
+	mpidr |= (1ULL << 31);
+
+	return mpidr;
+}
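+/*
+ * A worked example: vcpu_id 0x1234 yields Aff0 = 0x04, Aff1 = 0x23 and
+ * Aff2 = 0x01, so with the RES1 bit 31 set, MPIDR_EL1 reads back as
+ * 0x80012304.
+ */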
+
 const struct sys_reg_desc *get_reg_by_id(u64 id,
 					 const struct sys_reg_desc table[],
 					 unsigned int num);
diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
index 91b22a0..2923da0 100644
--- a/arch/arm64/kvm/va_layout.c
+++ b/arch/arm64/kvm/va_layout.c
@@ -12,6 +12,7 @@
 #include <asm/insn.h>
 #include <asm/kvm_mmu.h>
 #include <asm/memory.h>
+#include <asm/patching.h>
 
 /*
  * The LSB of the HYP VA tag
@@ -109,6 +110,29 @@ __init void kvm_apply_hyp_relocations(void)
 	}
 }
 
+void kvm_apply_hyp_module_relocations(void *mod_start, void *hyp_va,
+				      kvm_nvhe_reloc_t *begin,
+				      kvm_nvhe_reloc_t *end)
+{
+	kvm_nvhe_reloc_t *rel;
+
+	for (rel = begin; rel < end; ++rel) {
+		u32 **ptr, *va;
+
+		/*
+		 * Each entry contains a 32-bit relative offset from itself
+		 * to a VA position in the module area.
+		 */
+		ptr = (u32 **)((char *)rel + *rel);
+
+		/* Read the module VA value at the relocation address. */
+		va = *ptr;
+
+		/* Convert the module VA of the reloc to a hyp VA */
+		WARN_ON(aarch64_addr_write(ptr, (u64)(((void *)va - mod_start) + hyp_va)));
+	}
+}
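+/*
+ * Illustration with made-up addresses: a reloc entry at kernel VA
+ * 0xffff800008001000 holding the value 0x100 designates the location
+ * ptr = 0xffff800008001100. The module VA read from *ptr is rebased as
+ * (va - mod_start) + hyp_va, so it lands at the same offset inside the
+ * hyp VA window returned by __pkvm_alloc_module_va().
+ */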
+
 static u32 compute_instruction(int n, u32 rd, u32 rn)
 {
 	u32 insn = AARCH64_BREAK_FAULT;
diff --git a/arch/arm64/kvm/vgic/vgic-v2.c b/arch/arm64/kvm/vgic/vgic-v2.c
index 6456483..4e8bb90 100644
--- a/arch/arm64/kvm/vgic/vgic-v2.c
+++ b/arch/arm64/kvm/vgic/vgic-v2.c
@@ -470,17 +470,10 @@ void vgic_v2_load(struct kvm_vcpu *vcpu)
 		       kvm_vgic_global_state.vctrl_base + GICH_APR);
 }
 
-void vgic_v2_vmcr_sync(struct kvm_vcpu *vcpu)
+void vgic_v2_put(struct kvm_vcpu *vcpu, bool blocking)
 {
 	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
 
 	cpu_if->vgic_vmcr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_VMCR);
-}
-
-void vgic_v2_put(struct kvm_vcpu *vcpu)
-{
-	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
-
-	vgic_v2_vmcr_sync(vcpu);
 	cpu_if->vgic_apr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_APR);
 }
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 5bdada3..8469155 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -721,15 +721,8 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 
-	/*
-	 * If dealing with a GICv2 emulation on GICv3, VMCR_EL2.VFIQen
-	 * is dependent on ICC_SRE_EL1.SRE, and we have to perform the
-	 * VMCR_EL2 save/restore in the world switch.
-	 */
-	if (likely(cpu_if->vgic_sre))
-		kvm_call_hyp(__vgic_v3_write_vmcr, cpu_if->vgic_vmcr);
-
-	kvm_call_hyp(__vgic_v3_restore_aprs, cpu_if);
+	if (likely(!is_protected_kvm_enabled()))
+		kvm_call_hyp(__vgic_v3_restore_vmcr_aprs, cpu_if);
 
 	if (has_vhe())
 		__vgic_v3_activate_traps(cpu_if);
@@ -737,23 +730,14 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 	WARN_ON(vgic_v4_load(vcpu));
 }
 
-void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
+void vgic_v3_put(struct kvm_vcpu *vcpu, bool blocking)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 
-	if (likely(cpu_if->vgic_sre))
-		cpu_if->vgic_vmcr = kvm_call_hyp_ret(__vgic_v3_read_vmcr);
-}
+	WARN_ON(vgic_v4_put(vcpu, blocking));
 
-void vgic_v3_put(struct kvm_vcpu *vcpu)
-{
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
-
-	WARN_ON(vgic_v4_put(vcpu, false));
-
-	vgic_v3_vmcr_sync(vcpu);
-
-	kvm_call_hyp(__vgic_v3_save_aprs, cpu_if);
+	if (likely(!is_protected_kvm_enabled()))
+		kvm_call_hyp(__vgic_v3_save_vmcr_aprs, cpu_if);
 
 	if (has_vhe())
 		__vgic_v3_deactivate_traps(cpu_if);
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index d97e6080..6189ad9 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -931,26 +931,15 @@ void kvm_vgic_load(struct kvm_vcpu *vcpu)
 		vgic_v3_load(vcpu);
 }
 
-void kvm_vgic_put(struct kvm_vcpu *vcpu)
+void kvm_vgic_put(struct kvm_vcpu *vcpu, bool blocking)
 {
 	if (unlikely(!vgic_initialized(vcpu->kvm)))
 		return;
 
 	if (kvm_vgic_global_state.type == VGIC_V2)
-		vgic_v2_put(vcpu);
+		vgic_v2_put(vcpu, blocking);
 	else
-		vgic_v3_put(vcpu);
-}
-
-void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu)
-{
-	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
-		return;
-
-	if (kvm_vgic_global_state.type == VGIC_V2)
-		vgic_v2_vmcr_sync(vcpu);
-	else
-		vgic_v3_vmcr_sync(vcpu);
+		vgic_v3_put(vcpu, blocking);
 }
 
 int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index 23e280f..d9f54ac 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -201,8 +201,7 @@ int vgic_register_dist_iodev(struct kvm *kvm, gpa_t dist_base_address,
 
 void vgic_v2_init_lrs(void);
 void vgic_v2_load(struct kvm_vcpu *vcpu);
-void vgic_v2_put(struct kvm_vcpu *vcpu);
-void vgic_v2_vmcr_sync(struct kvm_vcpu *vcpu);
+void vgic_v2_put(struct kvm_vcpu *vcpu, bool blocking);
 
 void vgic_v2_save_state(struct kvm_vcpu *vcpu);
 void vgic_v2_restore_state(struct kvm_vcpu *vcpu);
@@ -232,8 +231,7 @@ int vgic_register_redist_iodev(struct kvm_vcpu *vcpu);
 bool vgic_v3_check_base(struct kvm *kvm);
 
 void vgic_v3_load(struct kvm_vcpu *vcpu);
-void vgic_v3_put(struct kvm_vcpu *vcpu);
-void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu);
+void vgic_v3_put(struct kvm_vcpu *vcpu, bool blocking);
 
 bool vgic_has_its(struct kvm *kvm);
 int kvm_vgic_register_its_device(void);
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index ff1e800..675de58 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -1,8 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-y				:= dma-mapping.o extable.o fault.o init.o \
 				   cache.o copypage.o flush.o \
-				   ioremap.o mmap.o pgd.o mmu.o \
+				   ioremap.o mem_encrypt.o mmap.o pgd.o mmu.o \
 				   context.o proc.o pageattr.o
+obj-$(CONFIG_MEMORY_RELINQUISH)	+= mem_relinquish.o
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
 obj-$(CONFIG_PTDUMP_CORE)	+= ptdump.o
 obj-$(CONFIG_PTDUMP_DEBUGFS)	+= ptdump_debugfs.o
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 5240f6a..8e84749 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -9,6 +9,7 @@
 #include <linux/dma-map-ops.h>
 #include <linux/iommu.h>
 #include <xen/xen.h>
+#include <trace/hooks/iommu.h>
 
 #include <asm/cacheflush.h>
 #include <asm/xen/xen-ops.h>
@@ -73,8 +74,10 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 		   ARCH_DMA_MINALIGN, cls);
 
 	dev->dma_coherent = coherent;
-	if (iommu)
+	if (iommu) {
 		iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1);
+		trace_android_rvh_iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1);
+	}
 
 	xen_setup_dma_ops(dev);
 }
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 3eb2825..3e4e882 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -42,6 +42,7 @@
 #include <asm/system_misc.h>
 #include <asm/tlbflush.h>
 #include <asm/traps.h>
+#include <asm/virt.h>
 
 struct fault_info {
 	int	(*fn)(unsigned long far, unsigned long esr,
@@ -258,6 +259,15 @@ static inline bool is_el1_permission_fault(unsigned long addr, unsigned long esr
 	return false;
 }
 
+static bool is_pkvm_stage2_abort(unsigned int esr)
+{
+	/*
+	 * S1PTW should only ever be set in ESR_EL1 if the pkvm hypervisor
+	 * injected a stage-2 abort -- see host_inject_abort().
+	 */
+	return is_pkvm_initialized() && (esr & ESR_ELx_S1PTW);
+}
+
 static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
 							unsigned long esr,
 							struct pt_regs *regs)
@@ -269,6 +279,9 @@ static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
 	    (esr & ESR_ELx_FSC_TYPE) != ESR_ELx_FSC_FAULT)
 		return false;
 
+	if (is_pkvm_stage2_abort(esr))
+		return false;
+
 	local_irq_save(flags);
 	asm volatile("at s1e1r, %0" :: "r" (addr));
 	isb();
@@ -390,6 +403,8 @@ static void __do_kernel_fault(unsigned long addr, unsigned long esr,
 			msg = "read from unreadable memory";
 	} else if (addr < PAGE_SIZE) {
 		msg = "NULL pointer dereference";
+	} else if (is_pkvm_stage2_abort(esr)) {
+		msg = "access to hypervisor-protected memory";
 	} else {
 		if (is_translation_fault(esr) &&
 		    kfence_handle_page_fault(addr, esr & ESR_ELx_WNR, regs))
@@ -583,6 +598,13 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 					 addr, esr, regs);
 	}
 
+	if (is_pkvm_stage2_abort(esr)) {
+		if (!user_mode(regs))
+			goto no_context;
+		arm64_force_sig_fault(SIGSEGV, SEGV_ACCERR, far, "stage-2 fault");
+		return 0;
+	}
+
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
 
 	/*
diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
index c5af103..060c578 100644
--- a/arch/arm64/mm/ioremap.c
+++ b/arch/arm64/mm/ioremap.c
@@ -1,7 +1,217 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
+#define pr_fmt(fmt)	"ioremap: " fmt
+
 #include <linux/mm.h>
 #include <linux/io.h>
+#include <linux/arm-smccc.h>
+#include <linux/slab.h>
+
+#include <asm/fixmap.h>
+#include <asm/tlbflush.h>
+#include <asm/hypervisor.h>
+
+#ifndef ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO
+#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO	5
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID			\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO)
+#endif	/* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO */
+
+#ifndef ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL
+#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL	6
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID			\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL)
+#endif	/* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL */
+
+#ifndef ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP
+#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP	7
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID			\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP)
+#endif	/* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP */
+
+#ifndef ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP
+#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP	8
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID			\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP)
+#endif	/* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP */
+
+struct ioremap_guard_ref {
+	refcount_t	count;
+};
+
+static DEFINE_STATIC_KEY_FALSE(ioremap_guard_key);
+static DEFINE_XARRAY(ioremap_guard_array);
+static DEFINE_MUTEX(ioremap_guard_lock);
+
+static size_t guard_granule;
+
+static bool ioremap_guard;
+static int __init ioremap_guard_setup(char *str)
+{
+	ioremap_guard = true;
+
+	return 0;
+}
+early_param("ioremap_guard", ioremap_guard_setup);
+
+void kvm_init_ioremap_services(void)
+{
+	struct arm_smccc_res res;
+	size_t granule;
+
+	if (!ioremap_guard)
+		return;
+
+	/* We need all the functions to be implemented */
+	if (!kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO) ||
+	    !kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL) ||
+	    !kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP) ||
+	    !kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP))
+		return;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID,
+			     0, 0, 0, &res);
+	granule = res.a0;
+	if (granule > PAGE_SIZE || !granule || (granule & (granule - 1))) {
+		pr_warn("KVM MMIO guard initialization failed: guard granule (%lu), page size (%lu)\n",
+			granule, PAGE_SIZE);
+		return;
+	}
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID,
+			     &res);
+	if (res.a0 == SMCCC_RET_SUCCESS) {
+		guard_granule = granule;
+		static_branch_enable(&ioremap_guard_key);
+		pr_info("Using KVM MMIO guard for ioremap\n");
+	} else {
+		pr_warn("KVM MMIO guard registration failed (%ld)\n", res.a0);
+	}
+}
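+/*
+ * Note on the granule check above: (granule & (granule - 1)) is non-zero
+ * exactly when granule is not a power of two, e.g. a 12KiB granule is
+ * rejected since 0x3000 & 0x2fff = 0x2000, while 4KiB passes since
+ * 0x1000 & 0xfff = 0. A 4KiB granule gives guard_shift = 12 below, so
+ * phys_addr 0x90002000 is tracked as guard_fn 0x90002.
+ */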
+
+void ioremap_phys_range_hook(phys_addr_t phys_addr, size_t size, pgprot_t prot)
+{
+	int guard_shift;
+
+	if (!static_branch_unlikely(&ioremap_guard_key))
+		return;
+
+	guard_shift = __builtin_ctzl(guard_granule);
+
+	mutex_lock(&ioremap_guard_lock);
+
+	while (size) {
+		u64 guard_fn = phys_addr >> guard_shift;
+		struct ioremap_guard_ref *ref;
+		struct arm_smccc_res res;
+
+		if (pfn_valid(__phys_to_pfn(phys_addr)))
+			goto next;
+
+		ref = xa_load(&ioremap_guard_array, guard_fn);
+		if (ref) {
+			refcount_inc(&ref->count);
+			goto next;
+		}
+
+		/*
+		 * It is acceptable for the allocation to fail, especially
+		 * if trying to ioremap something very early on, like with
+		 * earlycon, which happens long before kmem_cache_init.
+		 * This page will be permanently accessible, similar to a
+		 * saturated refcount.
+		 */
+		if (slab_is_available())
+			ref = kzalloc(sizeof(*ref), GFP_KERNEL);
+		if (ref) {
+			refcount_set(&ref->count, 1);
+			if (xa_err(xa_store(&ioremap_guard_array, guard_fn, ref,
+					    GFP_KERNEL))) {
+				kfree(ref);
+				ref = NULL;
+			}
+		}
+
+		arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID,
+				  phys_addr, prot, &res);
+		if (res.a0 != SMCCC_RET_SUCCESS) {
+			pr_warn_ratelimited("Failed to register %llx\n",
+					    phys_addr);
+			xa_erase(&ioremap_guard_array, guard_fn);
+			kfree(ref);
+			goto out;
+		}
+
+	next:
+		size -= guard_granule;
+		phys_addr += guard_granule;
+	}
+out:
+	mutex_unlock(&ioremap_guard_lock);
+}
+
+void iounmap_phys_range_hook(phys_addr_t phys_addr, size_t size)
+{
+	int guard_shift;
+
+	if (!static_branch_unlikely(&ioremap_guard_key))
+		return;
+
+	VM_BUG_ON(phys_addr & ~PAGE_MASK || size & ~PAGE_MASK);
+	guard_shift = __builtin_ctzl(guard_granule);
+
+	mutex_lock(&ioremap_guard_lock);
+
+	while (size) {
+		u64 guard_fn = phys_addr >> guard_shift;
+		struct ioremap_guard_ref *ref;
+		struct arm_smccc_res res;
+
+		ref = xa_load(&ioremap_guard_array, guard_fn);
+		if (!ref) {
+			pr_warn_ratelimited("%llx not tracked, left mapped\n",
+					    phys_addr);
+			goto next;
+		}
+
+		if (!refcount_dec_and_test(&ref->count))
+			goto next;
+
+		xa_erase(&ioremap_guard_array, guard_fn);
+		kfree(ref);
+
+		arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID,
+				  phys_addr, &res);
+		if (res.a0 != SMCCC_RET_SUCCESS) {
+			pr_warn_ratelimited("Failed to unregister %llx\n",
+					    phys_addr);
+			goto out;
+		}
+
+	next:
+		size -= guard_granule;
+		phys_addr += guard_granule;
+	}
+out:
+	mutex_unlock(&ioremap_guard_lock);
+}
 
 bool ioremap_allowed(phys_addr_t phys_addr, size_t size, unsigned long prot)
 {
diff --git a/arch/arm64/mm/mem_encrypt.c b/arch/arm64/mm/mem_encrypt.c
new file mode 100644
index 0000000..fd59ab7
--- /dev/null
+++ b/arch/arm64/mm/mem_encrypt.c
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Implementation of the memory encryption/decryption API.
+ *
+ * Amusingly, no crypto is actually performed. Rather, we call into the
+ * hypervisor component of KVM to expose pages selectively to the host
+ * for virtio "DMA" operations. In other words, "encrypted" pages are
+ * not accessible to the host, whereas "decrypted" pages are.
+ *
+ * Author: Will Deacon <will@kernel.org>
+ */
+#include <linux/arm-smccc.h>
+#include <linux/mem_encrypt.h>
+#include <linux/memory.h>
+#include <linux/mm.h>
+#include <linux/set_memory.h>
+#include <linux/types.h>
+
+#include <asm/hypervisor.h>
+
+#ifndef ARM_SMCCC_KVM_FUNC_HYP_MEMINFO
+#define ARM_SMCCC_KVM_FUNC_HYP_MEMINFO	2
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID			\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_HYP_MEMINFO)
+#endif	/* ARM_SMCCC_KVM_FUNC_HYP_MEMINFO */
+
+#ifndef ARM_SMCCC_KVM_FUNC_MEM_SHARE
+#define ARM_SMCCC_KVM_FUNC_MEM_SHARE	3
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MEM_SHARE_FUNC_ID			\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_MEM_SHARE)
+#endif	/* ARM_SMCCC_KVM_FUNC_MEM_SHARE */
+
+#ifndef ARM_SMCCC_KVM_FUNC_MEM_UNSHARE
+#define ARM_SMCCC_KVM_FUNC_MEM_UNSHARE	4
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MEM_UNSHARE_FUNC_ID			\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_MEM_UNSHARE)
+#endif	/* ARM_SMCCC_KVM_FUNC_MEM_UNSHARE */
+
+static unsigned long memshare_granule_sz;
+
+bool mem_encrypt_active(void)
+{
+	return memshare_granule_sz;
+}
+EXPORT_SYMBOL(mem_encrypt_active);
+
+void kvm_init_memshare_services(void)
+{
+	int i;
+	struct arm_smccc_res res;
+	const u32 funcs[] = {
+		ARM_SMCCC_KVM_FUNC_HYP_MEMINFO,
+		ARM_SMCCC_KVM_FUNC_MEM_SHARE,
+		ARM_SMCCC_KVM_FUNC_MEM_UNSHARE,
+	};
+
+	for (i = 0; i < ARRAY_SIZE(funcs); ++i) {
+		if (!kvm_arm_hyp_service_available(funcs[i]))
+			return;
+	}
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID,
+			     0, 0, 0, &res);
+	if (res.a0 > PAGE_SIZE) /* Includes error codes */
+		return;
+
+	memshare_granule_sz = res.a0;
+}
+
+static int arm_smccc_share_unshare_page(u32 func_id, phys_addr_t phys)
+{
+	phys_addr_t end = phys + PAGE_SIZE;
+
+	while (phys < end) {
+		struct arm_smccc_res res;
+
+		arm_smccc_1_1_invoke(func_id, phys, 0, 0, &res);
+		if (res.a0 != SMCCC_RET_SUCCESS)
+			return -EPERM;
+
+		phys += memshare_granule_sz;
+	}
+
+	return 0;
+}
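+/*
+ * Example under an assumed configuration: with 16KiB kernel pages and a
+ * 4KiB hypervisor granule, sharing a single page issues four hypercalls,
+ * at phys, phys + 0x1000, phys + 0x2000 and phys + 0x3000. With matching
+ * 4KiB sizes the loop degenerates to a single call.
+ */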
+
+static int set_memory_xcrypted(u32 func_id, unsigned long start, int numpages)
+{
+	void *addr = (void *)start, *end = addr + numpages * PAGE_SIZE;
+
+	while (addr < end) {
+		int err;
+
+		err = arm_smccc_share_unshare_page(func_id, virt_to_phys(addr));
+		if (err)
+			return err;
+
+		addr += PAGE_SIZE;
+	}
+
+	return 0;
+}
+
+int set_memory_encrypted(unsigned long addr, int numpages)
+{
+	if (!memshare_granule_sz || WARN_ON(!PAGE_ALIGNED(addr)))
+		return 0;
+
+	return set_memory_xcrypted(ARM_SMCCC_VENDOR_HYP_KVM_MEM_UNSHARE_FUNC_ID,
+				   addr, numpages);
+}
+EXPORT_SYMBOL_GPL(set_memory_encrypted);
+
+int set_memory_decrypted(unsigned long addr, int numpages)
+{
+	if (!memshare_granule_sz || WARN_ON(!PAGE_ALIGNED(addr)))
+		return 0;
+
+	return set_memory_xcrypted(ARM_SMCCC_VENDOR_HYP_KVM_MEM_SHARE_FUNC_ID,
+				   addr, numpages);
+}
+EXPORT_SYMBOL_GPL(set_memory_decrypted);
diff --git a/arch/arm64/mm/mem_relinquish.c b/arch/arm64/mm/mem_relinquish.c
new file mode 100644
index 0000000..7948098
--- /dev/null
+++ b/arch/arm64/mm/mem_relinquish.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2022 Google LLC
+ * Author: Keir Fraser <keirf@google.com>
+ */
+
+#include <linux/arm-smccc.h>
+#include <linux/mem_relinquish.h>
+#include <linux/memory.h>
+#include <linux/mm.h>
+#include <linux/types.h>
+
+#include <asm/hypervisor.h>
+
+static unsigned long memshare_granule_sz;
+
+void kvm_init_memrelinquish_services(void)
+{
+	int i;
+	struct arm_smccc_res res;
+	const u32 funcs[] = {
+		ARM_SMCCC_KVM_FUNC_HYP_MEMINFO,
+		ARM_SMCCC_KVM_FUNC_MEM_RELINQUISH,
+	};
+
+	for (i = 0; i < ARRAY_SIZE(funcs); ++i) {
+		if (!kvm_arm_hyp_service_available(funcs[i]))
+			return;
+	}
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID,
+			     0, 0, 0, &res);
+	if (res.a0 > PAGE_SIZE) /* Includes error codes */
+		return;
+
+	memshare_granule_sz = res.a0;
+}
+
+bool kvm_has_memrelinquish_services(void)
+{
+	return !!memshare_granule_sz;
+}
+EXPORT_SYMBOL_GPL(kvm_has_memrelinquish_services);
+
+void page_relinquish(struct page *page)
+{
+	phys_addr_t phys, end;
+	u32 func_id = ARM_SMCCC_VENDOR_HYP_KVM_MEM_RELINQUISH_FUNC_ID;
+
+	if (!memshare_granule_sz)
+		return;
+
+	phys = page_to_phys(page);
+	end = phys + PAGE_SIZE;
+
+	while (phys < end) {
+		struct arm_smccc_res res;
+
+		arm_smccc_1_1_invoke(func_id, phys, 0, 0, &res);
+		BUG_ON(res.a0 != SMCCC_RET_SUCCESS);
+
+		phys += memshare_granule_sz;
+	}
+}
+EXPORT_SYMBOL_GPL(page_relinquish);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 9a7c3896..ae25524d 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1351,6 +1351,21 @@ void __set_fixmap(enum fixed_addresses idx,
 	}
 }
 
+pte_t *__get_fixmap_pte(enum fixed_addresses idx)
+{
+	unsigned long addr = __fix_to_virt(idx);
+	pte_t *ptep;
+
+	BUG_ON(idx <= FIX_HOLE || idx >= __end_of_fixed_addresses);
+
+	ptep = fixmap_pte(addr);
+
+	if (!pte_valid(*ptep))
+		return NULL;
+
+	return ptep;
+}
+
 void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
 {
 	const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 5922178..debdecf 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -160,6 +160,23 @@ int set_memory_valid(unsigned long addr, int numpages, int enable)
 					__pgprot(PTE_VALID));
 }
 
+/*
+ * Only to be used with memory in the logical map (e.g. vmapped memory will
+ * face coherency issues as we don't call vm_unmap_aliases()). Only to be used
+ * while no accesses to the region are ongoing, as we do not follow the
+ * make-before-break sequence in order to cut down the run time of this
+ * function.
+ */
+int arch_set_direct_map_range_uncached(unsigned long addr, unsigned long numpages)
+{
+	if (!can_set_direct_map())
+		return 0;
+
+	return __change_memory_common(addr, PAGE_SIZE * numpages,
+				      __pgprot(PTE_ATTRINDX(MT_NORMAL_NC)),
+				      __pgprot(PTE_ATTRINDX_MASK));
+}
+
 int set_direct_map_invalid_noflush(struct page *page)
 {
 	struct page_change_data data = {
diff --git a/arch/arm64/tools/.gitignore b/arch/arm64/tools/.gitignore
new file mode 100644
index 0000000..1ddeddd
--- /dev/null
+++ b/arch/arm64/tools/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+gen-hyprel
diff --git a/arch/arm64/tools/Makefile b/arch/arm64/tools/Makefile
index 07a93ab..3b60450 100644
--- a/arch/arm64/tools/Makefile
+++ b/arch/arm64/tools/Makefile
@@ -22,3 +22,7 @@
 
 $(kapi)/sysreg-defs.h: $(src)/gen-sysreg.awk $(src)/sysreg FORCE
 	$(call if_changed,gen_sysreg)
+
+HOST_EXTRACFLAGS += -I$(objtree)/include
+hostprogs += gen-hyprel
+gen-hyprel: $(obj)/gen-hyprel
diff --git a/arch/arm64/kvm/hyp/nvhe/gen-hyprel.c b/arch/arm64/tools/gen-hyprel.c
similarity index 100%
rename from arch/arm64/kvm/hyp/nvhe/gen-hyprel.c
rename to arch/arm64/tools/gen-hyprel.c
diff --git a/arch/riscv/configs/gki.config b/arch/riscv/configs/gki.config
new file mode 100644
index 0000000..5558fef
--- /dev/null
+++ b/arch/riscv/configs/gki.config
@@ -0,0 +1,2 @@
+CONFIG_CPU_FREQ=y
+CONFIG_GPIOLIB=y
diff --git a/arch/riscv/configs/gki_defconfig b/arch/riscv/configs/gki_defconfig
new file mode 120000
index 0000000..3388ea4
--- /dev/null
+++ b/arch/riscv/configs/gki_defconfig
@@ -0,0 +1 @@
+../../arm64/configs/gki_defconfig
\ No newline at end of file
diff --git a/arch/s390/include/asm/hardirq.h b/arch/s390/include/asm/hardirq.h
index 58668ffb..cd9cc11 100644
--- a/arch/s390/include/asm/hardirq.h
+++ b/arch/s390/include/asm/hardirq.h
@@ -16,6 +16,12 @@
 #define local_softirq_pending() (S390_lowcore.softirq_pending)
 #define set_softirq_pending(x) (S390_lowcore.softirq_pending = (x))
 #define or_softirq_pending(x)  (S390_lowcore.softirq_pending |= (x))
+/*
+ * Not sure what the right thing is here for s390, but returning 0
+ * results in no logical change from what happens now.
+ */
+#define __cpu_softirq_pending(x) (0)
 
 #define __ARCH_IRQ_STAT
 #define __ARCH_IRQ_EXIT_IRQS_DISABLED
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 3419ffa..298e5ec 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -264,8 +264,11 @@
 
 PHONY += bzImage $(BOOT_TARGETS)
 
+# Don't compile Image in mixed build with "all" target
+ifndef KBUILD_MIXED_TREE
 # Default kernel to build
 all: bzImage
+endif
 
 # KBUILD_IMAGE specify target image being built
 KBUILD_IMAGE := $(boot)/bzImage
diff --git a/arch/x86/OWNERS b/arch/x86/OWNERS
new file mode 100644
index 0000000..f59fa99
--- /dev/null
+++ b/arch/x86/OWNERS
@@ -0,0 +1,3 @@
+per-file crypto/**=file:/crypto/OWNERS
+per-file mm/**=file:/mm/OWNERS
+per-file net/**=file:/net/OWNERS
diff --git a/arch/x86/configs/gki_defconfig b/arch/x86/configs/gki_defconfig
new file mode 100644
index 0000000..6d30b70
--- /dev/null
+++ b/arch/x86/configs/gki_defconfig
@@ -0,0 +1,652 @@
+CONFIG_UAPI_HEADER_TEST=y
+CONFIG_KERNEL_LZ4=y
+CONFIG_AUDIT=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_BPF_SYSCALL=y
+CONFIG_BPF_JIT=y
+CONFIG_BPF_JIT_ALWAYS_ON=y
+# CONFIG_BPF_UNPRIV_DEFAULT_OFF is not set
+CONFIG_PREEMPT=y
+# CONFIG_PREEMPT_DYNAMIC is not set
+CONFIG_IRQ_TIME_ACCOUNTING=y
+CONFIG_TASKSTATS=y
+CONFIG_TASK_XACCT=y
+CONFIG_TASK_IO_ACCOUNTING=y
+CONFIG_PSI=y
+CONFIG_RCU_EXPERT=y
+CONFIG_RCU_BOOST=y
+CONFIG_RCU_NOCB_CPU=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_IKHEADERS=y
+CONFIG_UCLAMP_TASK=y
+CONFIG_UCLAMP_BUCKETS_COUNT=20
+CONFIG_CGROUPS=y
+CONFIG_MEMCG=y
+CONFIG_BLK_CGROUP=y
+CONFIG_CGROUP_SCHED=y
+CONFIG_UCLAMP_TASK_GROUP=y
+CONFIG_CGROUP_FREEZER=y
+CONFIG_CPUSETS=y
+CONFIG_CGROUP_CPUACCT=y
+CONFIG_CGROUP_BPF=y
+CONFIG_NAMESPACES=y
+# CONFIG_TIME_NS is not set
+# CONFIG_PID_NS is not set
+CONFIG_RT_SOFTIRQ_AWARE_SCHED=y
+# CONFIG_RD_BZIP2 is not set
+# CONFIG_RD_LZMA is not set
+# CONFIG_RD_XZ is not set
+# CONFIG_RD_LZO is not set
+CONFIG_BOOT_CONFIG=y
+# CONFIG_UID16 is not set
+# CONFIG_SYSFS_SYSCALL is not set
+# CONFIG_FHANDLE is not set
+# CONFIG_PCSPKR_PLATFORM is not set
+CONFIG_KALLSYMS_ALL=y
+# CONFIG_RSEQ is not set
+CONFIG_EMBEDDED=y
+CONFIG_PROFILING=y
+CONFIG_SMP=y
+CONFIG_X86_X2APIC=y
+CONFIG_HYPERVISOR_GUEST=y
+CONFIG_PARAVIRT=y
+CONFIG_PARAVIRT_TIME_ACCOUNTING=y
+CONFIG_NR_CPUS=32
+# CONFIG_X86_MCE is not set
+# CONFIG_MICROCODE is not set
+# CONFIG_X86_5LEVEL is not set
+# CONFIG_MTRR_SANITIZER is not set
+CONFIG_EFI=y
+CONFIG_CMDLINE_BOOL=y
+CONFIG_CMDLINE="console=ttynull stack_depot_disable=on cgroup_disable=pressure bootconfig"
+CONFIG_PM_WAKELOCKS=y
+CONFIG_PM_WAKELOCKS_LIMIT=0
+# CONFIG_PM_WAKELOCKS_GC is not set
+# CONFIG_ACPI_AC is not set
+# CONFIG_ACPI_BATTERY is not set
+# CONFIG_ACPI_FAN is not set
+# CONFIG_ACPI_THERMAL is not set
+# CONFIG_X86_PM_TIMER is not set
+CONFIG_CPU_FREQ_STAT=y
+CONFIG_CPU_FREQ_TIMES=y
+CONFIG_CPU_FREQ_GOV_POWERSAVE=y
+CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
+CONFIG_IA32_EMULATION=y
+CONFIG_KVM=y
+CONFIG_KVM_INTEL=y
+CONFIG_KVM_AMD=y
+CONFIG_KPROBES=y
+CONFIG_JUMP_LABEL=y
+CONFIG_CFI_CLANG=y
+CONFIG_MODULES=y
+CONFIG_MODULE_UNLOAD=y
+CONFIG_MODVERSIONS=y
+CONFIG_MODULE_SIG=y
+CONFIG_MODULE_SIG_PROTECT=y
+CONFIG_BLK_DEV_ZONED=y
+CONFIG_BLK_INLINE_ENCRYPTION=y
+CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK=y
+CONFIG_IOSCHED_BFQ=y
+CONFIG_BFQ_GROUP_IOSCHED=y
+CONFIG_GKI_HACKS_TO_FIX=y
+# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
+CONFIG_BINFMT_MISC=y
+# CONFIG_SLAB_MERGE_DEFAULT is not set
+CONFIG_SLAB_FREELIST_RANDOM=y
+CONFIG_SLAB_FREELIST_HARDENED=y
+CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
+# CONFIG_COMPAT_BRK is not set
+CONFIG_MEMORY_HOTPLUG=y
+CONFIG_MEMORY_HOTREMOVE=y
+CONFIG_DEFAULT_MMAP_MIN_ADDR=32768
+CONFIG_TRANSPARENT_HUGEPAGE=y
+CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
+CONFIG_CMA=y
+CONFIG_CMA_DEBUGFS=y
+CONFIG_CMA_AREAS=16
+# CONFIG_ZONE_DMA is not set
+CONFIG_ANON_VMA_NAME=y
+CONFIG_USERFAULTFD=y
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_XFRM_USER=y
+CONFIG_XFRM_INTERFACE=y
+CONFIG_XFRM_MIGRATE=y
+CONFIG_XFRM_STATISTICS=y
+CONFIG_NET_KEY=y
+CONFIG_XDP_SOCKETS=y
+CONFIG_INET=y
+CONFIG_IP_MULTICAST=y
+CONFIG_IP_ADVANCED_ROUTER=y
+CONFIG_IP_MULTIPLE_TABLES=y
+CONFIG_NET_IPIP=y
+CONFIG_NET_IPGRE_DEMUX=y
+CONFIG_NET_IPGRE=y
+CONFIG_NET_IPVTI=y
+CONFIG_INET_ESP=y
+CONFIG_INET_UDP_DIAG=y
+CONFIG_INET_DIAG_DESTROY=y
+CONFIG_IPV6_ROUTER_PREF=y
+CONFIG_IPV6_ROUTE_INFO=y
+CONFIG_IPV6_OPTIMISTIC_DAD=y
+CONFIG_INET6_ESP=y
+CONFIG_INET6_IPCOMP=y
+CONFIG_IPV6_MIP6=y
+CONFIG_IPV6_VTI=y
+CONFIG_IPV6_GRE=y
+CONFIG_IPV6_MULTIPLE_TABLES=y
+CONFIG_IPV6_MROUTE=y
+CONFIG_NETFILTER=y
+CONFIG_NF_CONNTRACK=y
+CONFIG_NF_CONNTRACK_SECMARK=y
+CONFIG_NF_CONNTRACK_PROCFS=y
+CONFIG_NF_CONNTRACK_EVENTS=y
+CONFIG_NF_CONNTRACK_AMANDA=y
+CONFIG_NF_CONNTRACK_FTP=y
+CONFIG_NF_CONNTRACK_H323=y
+CONFIG_NF_CONNTRACK_IRC=y
+CONFIG_NF_CONNTRACK_NETBIOS_NS=y
+CONFIG_NF_CONNTRACK_PPTP=y
+CONFIG_NF_CONNTRACK_SANE=y
+CONFIG_NF_CONNTRACK_TFTP=y
+CONFIG_NF_CT_NETLINK=y
+CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y
+CONFIG_NETFILTER_XT_TARGET_CONNMARK=y
+CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y
+CONFIG_NETFILTER_XT_TARGET_DSCP=y
+CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y
+CONFIG_NETFILTER_XT_TARGET_MARK=y
+CONFIG_NETFILTER_XT_TARGET_NFLOG=y
+CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y
+CONFIG_NETFILTER_XT_TARGET_NOTRACK=y
+CONFIG_NETFILTER_XT_TARGET_TEE=y
+CONFIG_NETFILTER_XT_TARGET_TPROXY=y
+CONFIG_NETFILTER_XT_TARGET_TRACE=y
+CONFIG_NETFILTER_XT_TARGET_SECMARK=y
+CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
+CONFIG_NETFILTER_XT_MATCH_BPF=y
+CONFIG_NETFILTER_XT_MATCH_COMMENT=y
+CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
+CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
+CONFIG_NETFILTER_XT_MATCH_DSCP=y
+CONFIG_NETFILTER_XT_MATCH_ESP=y
+CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_HELPER=y
+CONFIG_NETFILTER_XT_MATCH_IPRANGE=y
+CONFIG_NETFILTER_XT_MATCH_L2TP=y
+CONFIG_NETFILTER_XT_MATCH_LENGTH=y
+CONFIG_NETFILTER_XT_MATCH_LIMIT=y
+CONFIG_NETFILTER_XT_MATCH_MAC=y
+CONFIG_NETFILTER_XT_MATCH_MARK=y
+CONFIG_NETFILTER_XT_MATCH_MULTIPORT=y
+CONFIG_NETFILTER_XT_MATCH_OWNER=y
+CONFIG_NETFILTER_XT_MATCH_POLICY=y
+CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA2=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA2_LOG=y
+CONFIG_NETFILTER_XT_MATCH_SOCKET=y
+CONFIG_NETFILTER_XT_MATCH_STATE=y
+CONFIG_NETFILTER_XT_MATCH_STATISTIC=y
+CONFIG_NETFILTER_XT_MATCH_STRING=y
+CONFIG_NETFILTER_XT_MATCH_TIME=y
+CONFIG_NETFILTER_XT_MATCH_U32=y
+CONFIG_IP_NF_IPTABLES=y
+CONFIG_IP_NF_MATCH_ECN=y
+CONFIG_IP_NF_MATCH_TTL=y
+CONFIG_IP_NF_FILTER=y
+CONFIG_IP_NF_TARGET_REJECT=y
+CONFIG_IP_NF_NAT=y
+CONFIG_IP_NF_TARGET_MASQUERADE=y
+CONFIG_IP_NF_TARGET_NETMAP=y
+CONFIG_IP_NF_TARGET_REDIRECT=y
+CONFIG_IP_NF_MANGLE=y
+CONFIG_IP_NF_RAW=y
+CONFIG_IP_NF_SECURITY=y
+CONFIG_IP_NF_ARPTABLES=y
+CONFIG_IP_NF_ARPFILTER=y
+CONFIG_IP_NF_ARP_MANGLE=y
+CONFIG_IP6_NF_IPTABLES=y
+CONFIG_IP6_NF_MATCH_RPFILTER=y
+CONFIG_IP6_NF_FILTER=y
+CONFIG_IP6_NF_TARGET_REJECT=y
+CONFIG_IP6_NF_MANGLE=y
+CONFIG_IP6_NF_RAW=y
+CONFIG_TIPC=m
+CONFIG_L2TP=m
+CONFIG_BRIDGE=y
+CONFIG_VLAN_8021Q=m
+CONFIG_6LOWPAN=m
+CONFIG_IEEE802154=m
+CONFIG_IEEE802154_6LOWPAN=m
+CONFIG_MAC802154=m
+CONFIG_NET_SCHED=y
+CONFIG_NET_SCH_HTB=y
+CONFIG_NET_SCH_PRIO=y
+CONFIG_NET_SCH_MULTIQ=y
+CONFIG_NET_SCH_SFQ=y
+CONFIG_NET_SCH_TBF=y
+CONFIG_NET_SCH_NETEM=y
+CONFIG_NET_SCH_CODEL=y
+CONFIG_NET_SCH_FQ_CODEL=y
+CONFIG_NET_SCH_FQ=y
+CONFIG_NET_SCH_INGRESS=y
+CONFIG_NET_CLS_BASIC=y
+CONFIG_NET_CLS_TCINDEX=y
+CONFIG_NET_CLS_FW=y
+CONFIG_NET_CLS_U32=y
+CONFIG_CLS_U32_MARK=y
+CONFIG_NET_CLS_FLOW=y
+CONFIG_NET_CLS_BPF=y
+CONFIG_NET_CLS_MATCHALL=y
+CONFIG_NET_EMATCH=y
+CONFIG_NET_EMATCH_CMP=y
+CONFIG_NET_EMATCH_NBYTE=y
+CONFIG_NET_EMATCH_U32=y
+CONFIG_NET_EMATCH_META=y
+CONFIG_NET_EMATCH_TEXT=y
+CONFIG_NET_CLS_ACT=y
+CONFIG_NET_ACT_POLICE=y
+CONFIG_NET_ACT_GACT=y
+CONFIG_NET_ACT_MIRRED=y
+CONFIG_NET_ACT_SKBEDIT=y
+CONFIG_NET_ACT_BPF=y
+CONFIG_VSOCKETS=y
+CONFIG_CGROUP_NET_PRIO=y
+CONFIG_CAN=m
+CONFIG_BT=m
+CONFIG_BT_RFCOMM=m
+CONFIG_BT_RFCOMM_TTY=y
+CONFIG_BT_HIDP=m
+CONFIG_BT_HCIBTSDIO=m
+CONFIG_BT_HCIUART=m
+CONFIG_BT_HCIUART_LL=y
+CONFIG_BT_HCIUART_BCM=y
+CONFIG_BT_HCIUART_QCA=y
+CONFIG_CFG80211=m
+CONFIG_NL80211_TESTMODE=y
+# CONFIG_CFG80211_DEFAULT_PS is not set
+# CONFIG_CFG80211_CRDA_SUPPORT is not set
+CONFIG_MAC80211=m
+CONFIG_RFKILL=m
+CONFIG_NFC=m
+CONFIG_PCI=y
+CONFIG_PCIEPORTBUS=y
+CONFIG_PCIEAER=y
+CONFIG_PCI_MSI=y
+CONFIG_PCI_IOV=y
+CONFIG_PCIE_DW_PLAT_EP=y
+CONFIG_PCI_ENDPOINT=y
+CONFIG_FW_LOADER_USER_HELPER=y
+# CONFIG_FW_CACHE is not set
+CONFIG_GNSS=y
+CONFIG_OF=y
+CONFIG_ZRAM=m
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_BLK_DEV_LOOP_MIN_COUNT=16
+CONFIG_BLK_DEV_RAM=y
+CONFIG_BLK_DEV_RAM_SIZE=8192
+CONFIG_SRAM=y
+CONFIG_SCSI=y
+# CONFIG_SCSI_PROC_FS is not set
+CONFIG_BLK_DEV_SD=y
+CONFIG_MD=y
+CONFIG_BLK_DEV_DM=y
+CONFIG_DM_CRYPT=y
+CONFIG_DM_DEFAULT_KEY=y
+CONFIG_DM_SNAPSHOT=y
+CONFIG_DM_UEVENT=y
+CONFIG_DM_VERITY=y
+CONFIG_DM_VERITY_FEC=y
+CONFIG_NETDEVICES=y
+CONFIG_DUMMY=y
+CONFIG_WIREGUARD=y
+CONFIG_IFB=y
+CONFIG_MACSEC=y
+CONFIG_TUN=y
+CONFIG_VETH=y
+CONFIG_CAN_VCAN=m
+CONFIG_CAN_SLCAN=m
+CONFIG_PPP=m
+CONFIG_PPP_BSDCOMP=m
+CONFIG_PPP_DEFLATE=m
+CONFIG_PPP_MPPE=m
+CONFIG_PPTP=m
+CONFIG_PPPOL2TP=m
+CONFIG_USB_RTL8150=y
+CONFIG_USB_RTL8152=y
+CONFIG_USB_USBNET=y
+CONFIG_USB_NET_CDC_EEM=y
+# CONFIG_USB_NET_NET1080 is not set
+# CONFIG_USB_NET_CDC_SUBSET is not set
+# CONFIG_USB_NET_ZAURUS is not set
+CONFIG_USB_NET_AQC111=y
+# CONFIG_WLAN_VENDOR_ADMTEK is not set
+# CONFIG_WLAN_VENDOR_ATH is not set
+# CONFIG_WLAN_VENDOR_ATMEL is not set
+# CONFIG_WLAN_VENDOR_BROADCOM is not set
+# CONFIG_WLAN_VENDOR_CISCO is not set
+# CONFIG_WLAN_VENDOR_INTEL is not set
+# CONFIG_WLAN_VENDOR_INTERSIL is not set
+# CONFIG_WLAN_VENDOR_MARVELL is not set
+# CONFIG_WLAN_VENDOR_MEDIATEK is not set
+# CONFIG_WLAN_VENDOR_RALINK is not set
+# CONFIG_WLAN_VENDOR_REALTEK is not set
+# CONFIG_WLAN_VENDOR_RSI is not set
+# CONFIG_WLAN_VENDOR_ST is not set
+# CONFIG_WLAN_VENDOR_TI is not set
+# CONFIG_WLAN_VENDOR_ZYDAS is not set
+# CONFIG_WLAN_VENDOR_QUANTENNA is not set
+CONFIG_INPUT_EVDEV=y
+CONFIG_KEYBOARD_GPIO=y
+# CONFIG_MOUSE_PS2 is not set
+CONFIG_INPUT_JOYSTICK=y
+CONFIG_JOYSTICK_XPAD=y
+CONFIG_INPUT_TOUCHSCREEN=y
+CONFIG_INPUT_MISC=y
+CONFIG_INPUT_UINPUT=y
+# CONFIG_VT is not set
+# CONFIG_LEGACY_PTYS is not set
+CONFIG_SERIAL_8250=y
+# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_SERIAL_8250_RUNTIME_UARTS=0
+CONFIG_SERIAL_OF_PLATFORM=y
+CONFIG_SERIAL_SAMSUNG=y
+CONFIG_SERIAL_SAMSUNG_CONSOLE=y
+CONFIG_NULL_TTY=y
+CONFIG_SERIAL_DEV_BUS=y
+CONFIG_HW_RANDOM=y
+# CONFIG_HW_RANDOM_VIA is not set
+# CONFIG_DEVMEM is not set
+# CONFIG_DEVPORT is not set
+CONFIG_HPET=y
+# CONFIG_I2C_COMPAT is not set
+# CONFIG_I2C_HELPER_AUTO is not set
+CONFIG_I3C=y
+CONFIG_SPI=y
+CONFIG_SPI_MEM=y
+CONFIG_GPIOLIB=y
+CONFIG_GPIO_GENERIC_PLATFORM=y
+# CONFIG_HWMON is not set
+CONFIG_THERMAL_NETLINK=y
+CONFIG_THERMAL_STATISTICS=y
+CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS=100
+CONFIG_THERMAL_WRITABLE_TRIPS=y
+CONFIG_THERMAL_GOV_USER_SPACE=y
+CONFIG_CPU_THERMAL=y
+CONFIG_DEVFREQ_THERMAL=y
+CONFIG_THERMAL_EMULATION=y
+# CONFIG_X86_PKG_TEMP_THERMAL is not set
+CONFIG_WATCHDOG=y
+CONFIG_WATCHDOG_CORE=y
+CONFIG_MFD_SYSCON=y
+CONFIG_REGULATOR=y
+CONFIG_REGULATOR_FIXED_VOLTAGE=y
+CONFIG_RC_CORE=y
+CONFIG_BPF_LIRC_MODE2=y
+CONFIG_LIRC=y
+# CONFIG_RC_MAP is not set
+CONFIG_RC_DECODERS=y
+CONFIG_RC_DEVICES=y
+CONFIG_MEDIA_CEC_RC=y
+# CONFIG_MEDIA_ANALOG_TV_SUPPORT is not set
+# CONFIG_MEDIA_DIGITAL_TV_SUPPORT is not set
+# CONFIG_MEDIA_RADIO_SUPPORT is not set
+# CONFIG_MEDIA_SDR_SUPPORT is not set
+# CONFIG_MEDIA_TEST_SUPPORT is not set
+CONFIG_MEDIA_USB_SUPPORT=y
+CONFIG_USB_GSPCA=y
+CONFIG_USB_VIDEO_CLASS=y
+CONFIG_V4L_PLATFORM_DRIVERS=y
+CONFIG_V4L_MEM2MEM_DRIVERS=y
+CONFIG_DRM=y
+CONFIG_BACKLIGHT_CLASS_DEVICE=y
+CONFIG_SOUND=y
+CONFIG_SND=y
+CONFIG_SND_HRTIMER=y
+# CONFIG_SND_SUPPORT_OLD_API is not set
+# CONFIG_SND_VERBOSE_PROCFS is not set
+# CONFIG_SND_DRIVERS is not set
+CONFIG_SND_USB_AUDIO=y
+CONFIG_SND_SOC=y
+CONFIG_HID_BATTERY_STRENGTH=y
+CONFIG_HIDRAW=y
+CONFIG_UHID=y
+CONFIG_HID_APPLE=y
+CONFIG_HID_PRODIKEYS=y
+CONFIG_HID_ELECOM=y
+CONFIG_HID_UCLOGIC=y
+CONFIG_HID_LOGITECH=y
+CONFIG_HID_LOGITECH_DJ=y
+CONFIG_HID_MAGICMOUSE=y
+CONFIG_HID_MICROSOFT=y
+CONFIG_HID_MULTITOUCH=y
+CONFIG_HID_NINTENDO=y
+CONFIG_HID_PICOLCD=y
+CONFIG_HID_PLANTRONICS=y
+CONFIG_HID_PLAYSTATION=y
+CONFIG_PLAYSTATION_FF=y
+CONFIG_HID_ROCCAT=y
+CONFIG_HID_SONY=y
+CONFIG_SONY_FF=y
+CONFIG_HID_STEAM=y
+CONFIG_HID_WACOM=y
+CONFIG_HID_WIIMOTE=y
+CONFIG_USB_HIDDEV=y
+CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
+CONFIG_USB_XHCI_HCD=y
+CONFIG_USB_EHCI_HCD=y
+CONFIG_USB_EHCI_ROOT_HUB_TT=y
+CONFIG_USB_EHCI_HCD_PLATFORM=y
+CONFIG_USB_ACM=m
+CONFIG_USB_STORAGE=y
+CONFIG_USB_UAS=y
+CONFIG_USB_DWC3=y
+CONFIG_USB_SERIAL=m
+CONFIG_USB_SERIAL_FTDI_SIO=m
+CONFIG_USB_GADGET=y
+CONFIG_USB_CONFIGFS=y
+CONFIG_USB_CONFIGFS_UEVENT=y
+CONFIG_USB_CONFIGFS_SERIAL=y
+CONFIG_USB_CONFIGFS_ACM=y
+CONFIG_USB_CONFIGFS_NCM=y
+CONFIG_USB_CONFIGFS_ECM=y
+CONFIG_USB_CONFIGFS_EEM=y
+CONFIG_USB_CONFIGFS_MASS_STORAGE=y
+CONFIG_USB_CONFIGFS_F_FS=y
+CONFIG_USB_CONFIGFS_F_ACC=y
+CONFIG_USB_CONFIGFS_F_AUDIO_SRC=y
+CONFIG_USB_CONFIGFS_F_MIDI=y
+CONFIG_USB_CONFIGFS_F_HID=y
+CONFIG_USB_CONFIGFS_F_UVC=y
+CONFIG_TYPEC=y
+CONFIG_TYPEC_TCPM=y
+CONFIG_TYPEC_TCPCI=y
+CONFIG_TYPEC_UCSI=y
+CONFIG_MMC=y
+# CONFIG_PWRSEQ_EMMC is not set
+# CONFIG_PWRSEQ_SIMPLE is not set
+CONFIG_MMC_CRYPTO=y
+CONFIG_MMC_SDHCI=y
+CONFIG_MMC_SDHCI_PLTFM=y
+CONFIG_SCSI_UFSHCD=y
+CONFIG_SCSI_UFS_BSG=y
+CONFIG_SCSI_UFS_CRYPTO=y
+CONFIG_SCSI_UFSHCD_PCI=y
+CONFIG_SCSI_UFSHCD_PLATFORM=y
+CONFIG_SCSI_UFS_DWC_TC_PLATFORM=y
+CONFIG_LEDS_CLASS_FLASH=y
+CONFIG_LEDS_CLASS_MULTICOLOR=y
+CONFIG_LEDS_TRIGGER_TIMER=y
+CONFIG_LEDS_TRIGGER_TRANSIENT=y
+CONFIG_EDAC=y
+CONFIG_RTC_CLASS=y
+# CONFIG_RTC_HCTOSYS is not set
+CONFIG_DMABUF_HEAPS=y
+CONFIG_DMABUF_HEAPS_DEFERRED_FREE=y
+CONFIG_UIO=y
+CONFIG_VHOST_VSOCK=y
+CONFIG_STAGING=y
+CONFIG_ASHMEM=y
+CONFIG_REMOTEPROC=y
+CONFIG_REMOTEPROC_CDEV=y
+CONFIG_RPMSG_CHAR=y
+CONFIG_PM_DEVFREQ_EVENT=y
+CONFIG_IIO=y
+CONFIG_IIO_BUFFER=y
+CONFIG_IIO_TRIGGER=y
+CONFIG_POWERCAP=y
+CONFIG_ANDROID_BINDER_IPC=y
+CONFIG_ANDROID_BINDERFS=y
+CONFIG_ANDROID_VENDOR_HOOKS=y
+CONFIG_ANDROID_DEBUG_KINFO=y
+CONFIG_LIBNVDIMM=y
+CONFIG_INTERCONNECT=y
+CONFIG_EXT4_FS=y
+CONFIG_EXT4_FS_POSIX_ACL=y
+CONFIG_EXT4_FS_SECURITY=y
+CONFIG_F2FS_FS=y
+CONFIG_F2FS_FS_SECURITY=y
+CONFIG_F2FS_FS_COMPRESSION=y
+CONFIG_F2FS_UNFAIR_RWSEM=y
+CONFIG_FS_ENCRYPTION=y
+CONFIG_FS_ENCRYPTION_INLINE_CRYPT=y
+CONFIG_FS_VERITY=y
+CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y
+# CONFIG_DNOTIFY is not set
+CONFIG_QUOTA=y
+CONFIG_QFMT_V2=y
+CONFIG_FUSE_FS=y
+CONFIG_VIRTIO_FS=y
+CONFIG_FUSE_BPF=y
+CONFIG_OVERLAY_FS=y
+CONFIG_INCREMENTAL_FS=y
+CONFIG_MSDOS_FS=y
+CONFIG_VFAT_FS=y
+CONFIG_EXFAT_FS=y
+CONFIG_TMPFS=y
+CONFIG_TMPFS_POSIX_ACL=y
+# CONFIG_EFIVAR_FS is not set
+CONFIG_PSTORE=y
+CONFIG_PSTORE_CONSOLE=y
+CONFIG_PSTORE_PMSG=y
+CONFIG_PSTORE_RAM=y
+CONFIG_EROFS_FS=y
+CONFIG_NLS_CODEPAGE_437=y
+CONFIG_NLS_CODEPAGE_737=y
+CONFIG_NLS_CODEPAGE_775=y
+CONFIG_NLS_CODEPAGE_850=y
+CONFIG_NLS_CODEPAGE_852=y
+CONFIG_NLS_CODEPAGE_855=y
+CONFIG_NLS_CODEPAGE_857=y
+CONFIG_NLS_CODEPAGE_860=y
+CONFIG_NLS_CODEPAGE_861=y
+CONFIG_NLS_CODEPAGE_862=y
+CONFIG_NLS_CODEPAGE_863=y
+CONFIG_NLS_CODEPAGE_864=y
+CONFIG_NLS_CODEPAGE_865=y
+CONFIG_NLS_CODEPAGE_866=y
+CONFIG_NLS_CODEPAGE_869=y
+CONFIG_NLS_CODEPAGE_936=y
+CONFIG_NLS_CODEPAGE_950=y
+CONFIG_NLS_CODEPAGE_932=y
+CONFIG_NLS_CODEPAGE_949=y
+CONFIG_NLS_CODEPAGE_874=y
+CONFIG_NLS_ISO8859_8=y
+CONFIG_NLS_CODEPAGE_1250=y
+CONFIG_NLS_CODEPAGE_1251=y
+CONFIG_NLS_ASCII=y
+CONFIG_NLS_ISO8859_1=y
+CONFIG_NLS_ISO8859_2=y
+CONFIG_NLS_ISO8859_3=y
+CONFIG_NLS_ISO8859_4=y
+CONFIG_NLS_ISO8859_5=y
+CONFIG_NLS_ISO8859_6=y
+CONFIG_NLS_ISO8859_7=y
+CONFIG_NLS_ISO8859_9=y
+CONFIG_NLS_ISO8859_13=y
+CONFIG_NLS_ISO8859_14=y
+CONFIG_NLS_ISO8859_15=y
+CONFIG_NLS_KOI8_R=y
+CONFIG_NLS_KOI8_U=y
+CONFIG_NLS_MAC_ROMAN=y
+CONFIG_NLS_MAC_CELTIC=y
+CONFIG_NLS_MAC_CENTEURO=y
+CONFIG_NLS_MAC_CROATIAN=y
+CONFIG_NLS_MAC_CYRILLIC=y
+CONFIG_NLS_MAC_GAELIC=y
+CONFIG_NLS_MAC_GREEK=y
+CONFIG_NLS_MAC_ICELAND=y
+CONFIG_NLS_MAC_INUIT=y
+CONFIG_NLS_MAC_ROMANIAN=y
+CONFIG_NLS_MAC_TURKISH=y
+CONFIG_NLS_UTF8=y
+CONFIG_UNICODE=y
+CONFIG_SECURITY=y
+CONFIG_SECURITYFS=y
+CONFIG_SECURITY_NETWORK=y
+CONFIG_HARDENED_USERCOPY=y
+CONFIG_FORTIFY_SOURCE=y
+CONFIG_STATIC_USERMODEHELPER=y
+CONFIG_STATIC_USERMODEHELPER_PATH=""
+CONFIG_SECURITY_SELINUX=y
+CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
+CONFIG_CRYPTO_ECDH=y
+CONFIG_CRYPTO_DES=y
+CONFIG_CRYPTO_ADIANTUM=y
+CONFIG_CRYPTO_HCTR2=y
+CONFIG_CRYPTO_CHACHA20POLY1305=y
+CONFIG_CRYPTO_CCM=y
+CONFIG_CRYPTO_BLAKE2B=y
+CONFIG_CRYPTO_CMAC=y
+CONFIG_CRYPTO_MD5=y
+CONFIG_CRYPTO_XCBC=y
+CONFIG_CRYPTO_LZO=y
+CONFIG_CRYPTO_LZ4=y
+CONFIG_CRYPTO_ZSTD=y
+CONFIG_CRYPTO_ANSI_CPRNG=y
+CONFIG_CRYPTO_AES_NI_INTEL=y
+CONFIG_CRYPTO_POLYVAL_CLMUL_NI=y
+CONFIG_CRYPTO_SHA256_SSSE3=y
+CONFIG_CRYPTO_SHA512_SSSE3=y
+CONFIG_CRC_CCITT=y
+CONFIG_CRC8=y
+CONFIG_XZ_DEC=y
+CONFIG_DMA_CMA=y
+CONFIG_PRINTK_TIME=y
+CONFIG_DYNAMIC_DEBUG_CORE=y
+CONFIG_DEBUG_INFO_DWARF4=y
+CONFIG_DEBUG_INFO_BTF=y
+CONFIG_MODULE_ALLOW_BTF_MISMATCH=y
+CONFIG_HEADERS_INSTALL=y
+# CONFIG_SECTION_MISMATCH_WARN_ONLY is not set
+CONFIG_MAGIC_SYSRQ=y
+CONFIG_UBSAN=y
+CONFIG_UBSAN_TRAP=y
+CONFIG_UBSAN_LOCAL_BOUNDS=y
+# CONFIG_UBSAN_SHIFT is not set
+# CONFIG_UBSAN_BOOL is not set
+# CONFIG_UBSAN_ENUM is not set
+CONFIG_PAGE_OWNER=y
+CONFIG_DEBUG_STACK_USAGE=y
+CONFIG_DEBUG_MEMORY_INIT=y
+CONFIG_KFENCE=y
+CONFIG_KFENCE_SAMPLE_INTERVAL=500
+CONFIG_KFENCE_NUM_OBJECTS=63
+CONFIG_KFENCE_STATIC_KEYS=y
+CONFIG_PANIC_ON_OOPS=y
+CONFIG_PANIC_TIMEOUT=-1
+CONFIG_SOFTLOCKUP_DETECTOR=y
+CONFIG_WQ_WATCHDOG=y
+CONFIG_SCHEDSTATS=y
+CONFIG_BUG_ON_DATA_CORRUPTION=y
+CONFIG_HIST_TRIGGERS=y
+CONFIG_UNWINDER_FRAME_POINTER=y
+CONFIG_KUNIT=y
+CONFIG_KUNIT_DEBUGFS=y
+# CONFIG_KUNIT_DEFAULT_ENABLED is not set
diff --git a/block/OWNERS b/block/OWNERS
new file mode 100644
index 0000000..2641e06
--- /dev/null
+++ b/block/OWNERS
@@ -0,0 +1,2 @@
+bvanassche@google.com
+jaegeuk@google.com
diff --git a/block/bio.c b/block/bio.c
index 57c2f32..b5ac2fa 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -264,6 +264,9 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
 #endif
 #ifdef CONFIG_BLK_INLINE_ENCRYPTION
 	bio->bi_crypt_context = NULL;
+#if IS_ENABLED(CONFIG_DM_DEFAULT_KEY)
+	bio->bi_skip_dm_default_key = false;
+#endif
 #endif
 #ifdef CONFIG_BLK_DEV_INTEGRITY
 	bio->bi_integrity = NULL;
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index ad9844c..243953d 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -87,7 +87,7 @@ static struct bio_set crypto_bio_split;
  * This is the key we set when evicting a keyslot. This *should* be the all 0's
  * key, but AES-XTS rejects that key, so we use some random bytes instead.
  */
-static u8 blank_key[BLK_CRYPTO_MAX_KEY_SIZE];
+static u8 blank_key[BLK_CRYPTO_MAX_STANDARD_KEY_SIZE];
 
 static void blk_crypto_fallback_evict_keyslot(unsigned int slot)
 {
@@ -180,6 +180,8 @@ static struct bio *blk_crypto_fallback_clone_bio(struct bio *bio_src)
 
 	bio_clone_blkg_association(bio, bio_src);
 
+	bio_clone_skip_dm_default_key(bio, bio_src);
+
 	return bio;
 }
 
@@ -539,7 +541,7 @@ static int blk_crypto_fallback_init(void)
 	if (blk_crypto_fallback_inited)
 		return 0;
 
-	get_random_bytes(blank_key, BLK_CRYPTO_MAX_KEY_SIZE);
+	get_random_bytes(blank_key, sizeof(blank_key));
 
 	err = bioset_init(&crypto_bio_split, 64, 0, 0);
 	if (err)
@@ -552,6 +554,7 @@ static int blk_crypto_fallback_init(void)
 
 	profile->ll_ops = blk_crypto_fallback_ll_ops;
 	profile->max_dun_bytes_supported = BLK_CRYPTO_MAX_IV_SIZE;
+	profile->key_types_supported = BLK_CRYPTO_KEY_TYPE_STANDARD;
 
 	/* All blk-crypto modes have a crypto API fallback. */
 	for (i = 0; i < BLK_ENCRYPTION_MODE_MAX; i++)
diff --git a/block/blk-crypto-internal.h b/block/blk-crypto-internal.h
index e6818ff..f26076f 100644
--- a/block/blk-crypto-internal.h
+++ b/block/blk-crypto-internal.h
@@ -14,6 +14,7 @@ struct blk_crypto_mode {
 	const char *name; /* name of this mode, shown in sysfs */
 	const char *cipher_str; /* crypto API name (for fallback case) */
 	unsigned int keysize; /* key size in bytes */
+	unsigned int security_strength; /* security strength in bytes */
 	unsigned int ivsize; /* iv size in bytes */
 };
 
@@ -21,9 +22,9 @@ extern const struct blk_crypto_mode blk_crypto_modes[];
 
 #ifdef CONFIG_BLK_INLINE_ENCRYPTION
 
-int blk_crypto_sysfs_register(struct request_queue *q);
+int blk_crypto_sysfs_register(struct gendisk *disk);
 
-void blk_crypto_sysfs_unregister(struct request_queue *q);
+void blk_crypto_sysfs_unregister(struct gendisk *disk);
 
 void bio_crypt_dun_increment(u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE],
 			     unsigned int inc);
@@ -65,14 +66,28 @@ static inline bool blk_crypto_rq_is_encrypted(struct request *rq)
 	return rq->crypt_ctx;
 }
 
+blk_status_t blk_crypto_get_keyslot(struct blk_crypto_profile *profile,
+				    const struct blk_crypto_key *key,
+				    struct blk_crypto_keyslot **slot_ptr);
+
+void blk_crypto_put_keyslot(struct blk_crypto_keyslot *slot);
+
+int __blk_crypto_evict_key(struct blk_crypto_profile *profile,
+			   const struct blk_crypto_key *key);
+
+bool __blk_crypto_cfg_supported(struct blk_crypto_profile *profile,
+				const struct blk_crypto_config *cfg);
+
 #else /* CONFIG_BLK_INLINE_ENCRYPTION */
 
-static inline int blk_crypto_sysfs_register(struct request_queue *q)
+static inline int blk_crypto_sysfs_register(struct gendisk *disk)
 {
 	return 0;
 }
 
-static inline void blk_crypto_sysfs_unregister(struct request_queue *q) { }
+static inline void blk_crypto_sysfs_unregister(struct gendisk *disk)
+{
+}
 
 static inline bool bio_crypt_rq_ctx_compatible(struct request *rq,
 					       struct bio *bio)
diff --git a/block/blk-crypto-profile.c b/block/blk-crypto-profile.c
index 96c5119..757d0fc 100644
--- a/block/blk-crypto-profile.c
+++ b/block/blk-crypto-profile.c
@@ -32,6 +32,7 @@
 #include <linux/wait.h>
 #include <linux/blkdev.h>
 #include <linux/blk-integrity.h>
+#include "blk-crypto-internal.h"
 
 struct blk_crypto_keyslot {
 	atomic_t slot_refs;
@@ -350,6 +351,8 @@ bool __blk_crypto_cfg_supported(struct blk_crypto_profile *profile,
 		return false;
 	if (profile->max_dun_bytes_supported < cfg->dun_bytes)
 		return false;
+	if (!(profile->key_types_supported & cfg->key_type))
+		return false;
 	return true;
 }
 
@@ -464,6 +467,45 @@ bool blk_crypto_register(struct blk_crypto_profile *profile,
 EXPORT_SYMBOL_GPL(blk_crypto_register);
 
 /**
+ * blk_crypto_derive_sw_secret() - Derive software secret from wrapped key
+ * @bdev: a block device that supports hardware-wrapped keys
+ * @eph_key: the hardware-wrapped key in ephemerally-wrapped form
+ * @eph_key_size: size of @eph_key in bytes
+ * @sw_secret: (output) the software secret
+ *
+ * Given a hardware-wrapped key in ephemerally-wrapped form (the same form in
+ * which it is used for I/O), ask the hardware to derive the secret that
+ * software can use for cryptographic tasks other than inline encryption.  This
+ * secret is guaranteed to be cryptographically isolated from the inline
+ * i.e. derived with a different KDF context.
+ *
+ * Return: 0 on success, -EOPNOTSUPP if the block device doesn't support
+ *	   hardware-wrapped keys, -EBADMSG if the key isn't a valid
+ *	   hardware-wrapped key, or another -errno code.
+ */
+int blk_crypto_derive_sw_secret(struct block_device *bdev,
+				const u8 *eph_key, size_t eph_key_size,
+				u8 sw_secret[BLK_CRYPTO_SW_SECRET_SIZE])
+{
+	struct blk_crypto_profile *profile =
+		bdev_get_queue(bdev)->crypto_profile;
+	int err;
+
+	if (!profile)
+		return -EOPNOTSUPP;
+	if (!(profile->key_types_supported & BLK_CRYPTO_KEY_TYPE_HW_WRAPPED))
+		return -EOPNOTSUPP;
+	if (!profile->ll_ops.derive_sw_secret)
+		return -EOPNOTSUPP;
+	blk_crypto_hw_enter(profile);
+	err = profile->ll_ops.derive_sw_secret(profile, eph_key, eph_key_size,
+					       sw_secret);
+	blk_crypto_hw_exit(profile);
+	return err;
+}
+EXPORT_SYMBOL_GPL(blk_crypto_derive_sw_secret);
+
+/**
  * blk_crypto_intersect_capabilities() - restrict supported crypto capabilities
  *					 by child device
  * @parent: the crypto profile for the parent device
@@ -486,10 +528,12 @@ void blk_crypto_intersect_capabilities(struct blk_crypto_profile *parent,
 			    child->max_dun_bytes_supported);
 		for (i = 0; i < ARRAY_SIZE(child->modes_supported); i++)
 			parent->modes_supported[i] &= child->modes_supported[i];
+		parent->key_types_supported &= child->key_types_supported;
 	} else {
 		parent->max_dun_bytes_supported = 0;
 		memset(parent->modes_supported, 0,
 		       sizeof(parent->modes_supported));
+		parent->key_types_supported = 0;
 	}
 }
 EXPORT_SYMBOL_GPL(blk_crypto_intersect_capabilities);
@@ -522,6 +566,9 @@ bool blk_crypto_has_capabilities(const struct blk_crypto_profile *target,
 	    target->max_dun_bytes_supported)
 		return false;
 
+	if (reference->key_types_supported & ~target->key_types_supported)
+		return false;
+
 	return true;
 }
 EXPORT_SYMBOL_GPL(blk_crypto_has_capabilities);
@@ -556,5 +603,6 @@ void blk_crypto_update_capabilities(struct blk_crypto_profile *dst,
 	       sizeof(dst->modes_supported));
 
 	dst->max_dun_bytes_supported = src->max_dun_bytes_supported;
+	dst->key_types_supported = src->key_types_supported;
 }
 EXPORT_SYMBOL_GPL(blk_crypto_update_capabilities);
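
For readers following the hardware-wrapped key plumbing above, here is a minimal sketch of how an upper layer might consume the new blk_crypto_derive_sw_secret() hook. The caller, the source of eph_key, and the error handling are illustrative assumptions, not part of this patch:

/*
 * Illustrative sketch only: derive the software secret for a
 * hardware-wrapped key so it can be used for crypto tasks other than
 * inline encryption.  The surrounding code is an assumption.
 */
static int example_use_sw_secret(struct block_device *bdev,
				 const u8 *eph_key, size_t eph_key_size)
{
	u8 sw_secret[BLK_CRYPTO_SW_SECRET_SIZE];
	int err;

	err = blk_crypto_derive_sw_secret(bdev, eph_key, eph_key_size,
					  sw_secret);
	if (err)
		return err;	/* e.g. -EOPNOTSUPP or -EBADMSG, as documented */

	/* ... feed sw_secret into a KDF here ... */

	memzero_explicit(sw_secret, sizeof(sw_secret));
	return 0;
}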
diff --git a/block/blk-crypto-sysfs.c b/block/blk-crypto-sysfs.c
index fd93bd2..e05f145 100644
--- a/block/blk-crypto-sysfs.c
+++ b/block/blk-crypto-sysfs.c
@@ -126,8 +126,9 @@ static struct kobj_type blk_crypto_ktype = {
  * If the request_queue has a blk_crypto_profile, create the "crypto"
  * subdirectory in sysfs (/sys/block/$disk/queue/crypto/).
  */
-int blk_crypto_sysfs_register(struct request_queue *q)
+int blk_crypto_sysfs_register(struct gendisk *disk)
 {
+	struct request_queue *q = disk->queue;
 	struct blk_crypto_kobj *obj;
 	int err;
 
@@ -149,9 +150,9 @@ int blk_crypto_sysfs_register(struct request_queue *q)
 	return 0;
 }
 
-void blk_crypto_sysfs_unregister(struct request_queue *q)
+void blk_crypto_sysfs_unregister(struct gendisk *disk)
 {
-	kobject_put(q->crypto_kobject);
+	kobject_put(disk->queue->crypto_kobject);
 }
 
 static int __init blk_crypto_sysfs_init(void)
diff --git a/block/blk-crypto.c b/block/blk-crypto.c
index a496aae..005cc11 100644
--- a/block/blk-crypto.c
+++ b/block/blk-crypto.c
@@ -22,20 +22,30 @@ const struct blk_crypto_mode blk_crypto_modes[] = {
 		.name = "AES-256-XTS",
 		.cipher_str = "xts(aes)",
 		.keysize = 64,
+		.security_strength = 32,
 		.ivsize = 16,
 	},
 	[BLK_ENCRYPTION_MODE_AES_128_CBC_ESSIV] = {
 		.name = "AES-128-CBC-ESSIV",
 		.cipher_str = "essiv(cbc(aes),sha256)",
 		.keysize = 16,
+		.security_strength = 16,
 		.ivsize = 16,
 	},
 	[BLK_ENCRYPTION_MODE_ADIANTUM] = {
 		.name = "Adiantum",
 		.cipher_str = "adiantum(xchacha12,aes)",
 		.keysize = 32,
+		.security_strength = 32,
 		.ivsize = 32,
 	},
+	[BLK_ENCRYPTION_MODE_SM4_XTS] = {
+		.name = "SM4-XTS",
+		.cipher_str = "xts(sm4)",
+		.keysize = 32,
+		.security_strength = 16,
+		.ivsize = 16,
+	},
 };
 
 /*
@@ -69,9 +79,15 @@ static int __init bio_crypt_ctx_init(void)
 	/* This is assumed in various places. */
 	BUILD_BUG_ON(BLK_ENCRYPTION_MODE_INVALID != 0);
 
-	/* Sanity check that no algorithm exceeds the defined limits. */
+	/*
+	 * Validate the crypto mode properties.  This ideally would be done with
+	 * static assertions, but boot-time checks are the next best thing.
+	 */
 	for (i = 0; i < BLK_ENCRYPTION_MODE_MAX; i++) {
-		BUG_ON(blk_crypto_modes[i].keysize > BLK_CRYPTO_MAX_KEY_SIZE);
+		BUG_ON(blk_crypto_modes[i].keysize >
+		       BLK_CRYPTO_MAX_STANDARD_KEY_SIZE);
+		BUG_ON(blk_crypto_modes[i].security_strength >
+		       blk_crypto_modes[i].keysize);
 		BUG_ON(blk_crypto_modes[i].ivsize > BLK_CRYPTO_MAX_IV_SIZE);
 	}
 
@@ -99,6 +115,7 @@ void bio_crypt_set_ctx(struct bio *bio, const struct blk_crypto_key *key,
 
 	bio->bi_crypt_context = bc;
 }
+EXPORT_SYMBOL_GPL(bio_crypt_set_ctx);
 
 void __bio_crypt_free_ctx(struct bio *bio)
 {
@@ -267,7 +284,6 @@ bool __blk_crypto_bio_prep(struct bio **bio_ptr)
 {
 	struct bio *bio = *bio_ptr;
 	const struct blk_crypto_key *bc_key = bio->bi_crypt_context->bc_key;
-	struct blk_crypto_profile *profile;
 
 	/* Error if bio has no data. */
 	if (WARN_ON_ONCE(!bio_has_data(bio))) {
@@ -284,10 +300,9 @@ bool __blk_crypto_bio_prep(struct bio **bio_ptr)
 	 * Success if device supports the encryption context, or if we succeeded
 	 * in falling back to the crypto API.
 	 */
-	profile = bdev_get_queue(bio->bi_bdev)->crypto_profile;
-	if (__blk_crypto_cfg_supported(profile, &bc_key->crypto_cfg))
+	if (blk_crypto_config_supported_natively(bio->bi_bdev,
+						 &bc_key->crypto_cfg))
 		return true;
-
 	if (blk_crypto_fallback_bio_prep(bio_ptr))
 		return true;
 fail:
@@ -310,8 +325,9 @@ int __blk_crypto_rq_bio_prep(struct request *rq, struct bio *bio,
 /**
  * blk_crypto_init_key() - Prepare a key for use with blk-crypto
  * @blk_key: Pointer to the blk_crypto_key to initialize.
- * @raw_key: Pointer to the raw key. Must be the correct length for the chosen
- *	     @crypto_mode; see blk_crypto_modes[].
+ * @raw_key: the raw bytes of the key
+ * @raw_key_size: size of the raw key in bytes
+ * @key_type: type of the key -- either standard or hardware-wrapped
  * @crypto_mode: identifier for the encryption algorithm to use
  * @dun_bytes: number of bytes that will be used to specify the DUN when this
  *	       key is used
@@ -320,7 +336,9 @@ int __blk_crypto_rq_bio_prep(struct request *rq, struct bio *bio,
  * Return: 0 on success, -errno on failure.  The caller is responsible for
  *	   zeroizing both blk_key and raw_key when done with them.
  */
-int blk_crypto_init_key(struct blk_crypto_key *blk_key, const u8 *raw_key,
+int blk_crypto_init_key(struct blk_crypto_key *blk_key,
+			const u8 *raw_key, size_t raw_key_size,
+			enum blk_crypto_key_type key_type,
 			enum blk_crypto_mode_num crypto_mode,
 			unsigned int dun_bytes,
 			unsigned int data_unit_size)
@@ -333,8 +351,19 @@ int blk_crypto_init_key(struct blk_crypto_key *blk_key, const u8 *raw_key,
 		return -EINVAL;
 
 	mode = &blk_crypto_modes[crypto_mode];
-	if (mode->keysize == 0)
+	switch (key_type) {
+	case BLK_CRYPTO_KEY_TYPE_STANDARD:
+		if (raw_key_size != mode->keysize)
+			return -EINVAL;
+		break;
+	case BLK_CRYPTO_KEY_TYPE_HW_WRAPPED:
+		if (raw_key_size < mode->security_strength ||
+		    raw_key_size > BLK_CRYPTO_MAX_HW_WRAPPED_KEY_SIZE)
+			return -EINVAL;
+		break;
+	default:
 		return -EINVAL;
+	}
 
 	if (dun_bytes == 0 || dun_bytes > mode->ivsize)
 		return -EINVAL;
@@ -345,29 +374,40 @@ int blk_crypto_init_key(struct blk_crypto_key *blk_key, const u8 *raw_key,
 	blk_key->crypto_cfg.crypto_mode = crypto_mode;
 	blk_key->crypto_cfg.dun_bytes = dun_bytes;
 	blk_key->crypto_cfg.data_unit_size = data_unit_size;
+	blk_key->crypto_cfg.key_type = key_type;
 	blk_key->data_unit_size_bits = ilog2(data_unit_size);
-	blk_key->size = mode->keysize;
-	memcpy(blk_key->raw, raw_key, mode->keysize);
+	blk_key->size = raw_key_size;
+	memcpy(blk_key->raw, raw_key, raw_key_size);
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(blk_crypto_init_key);
+
+bool blk_crypto_config_supported_natively(struct block_device *bdev,
+					  const struct blk_crypto_config *cfg)
+{
+	return __blk_crypto_cfg_supported(bdev_get_queue(bdev)->crypto_profile,
+					  cfg);
+}
 
 /*
  * Check if bios with @cfg can be en/decrypted by blk-crypto (i.e. either the
- * request queue it's submitted to supports inline crypto, or the
+ * block_device it's submitted to supports inline crypto, or the
  * blk-crypto-fallback is enabled and supports the cfg).
  */
-bool blk_crypto_config_supported(struct request_queue *q,
+bool blk_crypto_config_supported(struct block_device *bdev,
 				 const struct blk_crypto_config *cfg)
 {
-	return IS_ENABLED(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK) ||
-	       __blk_crypto_cfg_supported(q->crypto_profile, cfg);
+	if (IS_ENABLED(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK) &&
+	    cfg->key_type == BLK_CRYPTO_KEY_TYPE_STANDARD)
+		return true;
+	return blk_crypto_config_supported_natively(bdev, cfg);
 }
 
 /**
  * blk_crypto_start_using_key() - Start using a blk_crypto_key on a device
+ * @bdev: block device to operate on
  * @key: A key to use on the device
- * @q: the request queue for the device
  *
  * Upper layers must call this function to ensure that either the hardware
  * supports the key's crypto settings, or the crypto API fallback has transforms
@@ -379,18 +419,23 @@ bool blk_crypto_config_supported(struct request_queue *q,
  *	   blk-crypto-fallback is either disabled or the needed algorithm
  *	   is disabled in the crypto API; or another -errno code.
  */
-int blk_crypto_start_using_key(const struct blk_crypto_key *key,
-			       struct request_queue *q)
+int blk_crypto_start_using_key(struct block_device *bdev,
+			       const struct blk_crypto_key *key)
 {
-	if (__blk_crypto_cfg_supported(q->crypto_profile, &key->crypto_cfg))
+	if (blk_crypto_config_supported_natively(bdev, &key->crypto_cfg))
 		return 0;
+	if (key->crypto_cfg.key_type != BLK_CRYPTO_KEY_TYPE_STANDARD) {
+		pr_warn_once("tried to use wrapped key, but hardware doesn't support it\n");
+		return -EOPNOTSUPP;
+	}
 	return blk_crypto_fallback_start_using_mode(key->crypto_cfg.crypto_mode);
 }
+EXPORT_SYMBOL_GPL(blk_crypto_start_using_key);
 
 /**
  * blk_crypto_evict_key() - Evict a key from any inline encryption hardware
  *			    it may have been programmed into
- * @q: The request queue who's associated inline encryption hardware this key
+ * @bdev: The block_device whose associated inline encryption hardware this key
  *     might have been programmed into
  * @key: The key to evict
  *
@@ -400,14 +445,16 @@ int blk_crypto_start_using_key(const struct blk_crypto_key *key,
  *
  * Return: 0 on success or if the key wasn't in any keyslot; -errno on error.
  */
-int blk_crypto_evict_key(struct request_queue *q,
+int blk_crypto_evict_key(struct block_device *bdev,
 			 const struct blk_crypto_key *key)
 {
-	if (__blk_crypto_cfg_supported(q->crypto_profile, &key->crypto_cfg))
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	if (blk_crypto_config_supported_natively(bdev, &key->crypto_cfg))
 		return __blk_crypto_evict_key(q->crypto_profile, key);
 
 	/*
-	 * If the request_queue didn't support the key, then blk-crypto-fallback
+	 * If the block_device didn't support the key, then blk-crypto-fallback
 	 * may have been used, so try to evict the key from blk-crypto-fallback.
 	 */
 	return blk_crypto_fallback_evict_key(key);
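
To make the signature changes above concrete, here is a minimal sketch of the updated key setup flow; the 64-byte raw key, the DUN size, and the data-unit size are assumptions chosen for AES-256-XTS, not values taken from this patch:

/*
 * Illustrative sketch only: initialize a standard (raw) key with the
 * extended blk_crypto_init_key() signature, then start using it on a
 * block device.  Parameter choices here are assumptions.
 */
static int example_setup_key(struct block_device *bdev,
			     const u8 raw_key[64])
{
	struct blk_crypto_key key;
	int err;

	err = blk_crypto_init_key(&key, raw_key, 64,
				  BLK_CRYPTO_KEY_TYPE_STANDARD,
				  BLK_ENCRYPTION_MODE_AES_256_XTS,
				  8,	/* dun_bytes */
				  512);	/* data_unit_size */
	if (err)
		return err;

	/* Uses the hardware if it supports the config, else the fallback. */
	return blk_crypto_start_using_key(bdev, &key);
}

A hardware-wrapped key would instead pass BLK_CRYPTO_KEY_TYPE_HW_WRAPPED with a key size between the mode's security strength and BLK_CRYPTO_MAX_HW_WRAPPED_KEY_SIZE, per the new validation in blk_crypto_init_key().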
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index e71b3b4..bc39225 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -838,7 +838,7 @@ int blk_register_queue(struct gendisk *disk)
 			goto put_dev;
 	}
 
-	ret = blk_crypto_sysfs_register(q);
+	ret = blk_crypto_sysfs_register(disk);
 	if (ret)
 		goto put_dev;
 
@@ -915,7 +915,7 @@ void blk_unregister_queue(struct gendisk *disk)
 	 */
 	if (queue_is_mq(q))
 		blk_mq_sysfs_unregister(disk);
-	blk_crypto_sysfs_unregister(q);
+	blk_crypto_sysfs_unregister(disk);
 
 	mutex_lock(&q->sysfs_lock);
 	elv_unregister_queue(q);
diff --git a/build.config.aarch64 b/build.config.aarch64
new file mode 100644
index 0000000..9fcbceb
--- /dev/null
+++ b/build.config.aarch64
@@ -0,0 +1,16 @@
+ARCH=arm64
+MAKE_GOALS="
+Image
+modules
+"
+
+FILES="
+arch/arm64/boot/Image
+vmlinux
+System.map
+vmlinux.symvers
+modules.builtin
+modules.builtin.modinfo
+"
+
+NDK_TRIPLE=aarch64-linux-android31
diff --git a/build.config.allmodconfig b/build.config.allmodconfig
new file mode 100644
index 0000000..8bd0d5f
--- /dev/null
+++ b/build.config.allmodconfig
@@ -0,0 +1,16 @@
+DEFCONFIG=allmodconfig
+
+POST_DEFCONFIG_CMDS="update_config"
+function update_config() {
+    ${KERNEL_DIR}/scripts/config --file ${OUT_DIR}/.config \
+         -e UNWINDER_FRAME_POINTER \
+         -d WERROR \
+         -d SAMPLES \
+         -d BPFILTER \
+         -e RANDSTRUCT_NONE \
+         -d RANDSTRUCT_FULL \
+         -d RANDSTRUCT
+
+    (cd ${OUT_DIR} && \
+     make O=${OUT_DIR} $archsubarch CROSS_COMPILE=${CROSS_COMPILE} ${TOOL_ARGS} ${MAKE_ARGS} olddefconfig)
+}
diff --git a/build.config.allmodconfig.aarch64 b/build.config.allmodconfig.aarch64
new file mode 100644
index 0000000..2fbe380
--- /dev/null
+++ b/build.config.allmodconfig.aarch64
@@ -0,0 +1,4 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.common
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.aarch64
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.allmodconfig
+
diff --git a/build.config.allmodconfig.arm b/build.config.allmodconfig.arm
new file mode 100644
index 0000000..e92744a
--- /dev/null
+++ b/build.config.allmodconfig.arm
@@ -0,0 +1,4 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.common
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.arm
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.allmodconfig
+
diff --git a/build.config.allmodconfig.x86_64 b/build.config.allmodconfig.x86_64
new file mode 100644
index 0000000..f06b30c
--- /dev/null
+++ b/build.config.allmodconfig.x86_64
@@ -0,0 +1,4 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.common
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.x86_64
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.allmodconfig
+
diff --git a/build.config.amlogic b/build.config.amlogic
new file mode 100644
index 0000000..e04bd52
--- /dev/null
+++ b/build.config.amlogic
@@ -0,0 +1,34 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.gki.aarch64
+
+DEFCONFIG=amlogic_gki_defconfig
+FRAGMENT_CONFIG=${KERNEL_DIR}/arch/arm64/configs/amlogic_gki.fragment
+
+PRE_DEFCONFIG_CMDS="KCONFIG_CONFIG=${ROOT_DIR}/${KERNEL_DIR}/arch/arm64/configs/${DEFCONFIG} ${ROOT_DIR}/${KERNEL_DIR}/scripts/kconfig/merge_config.sh -m -r ${ROOT_DIR}/${KERNEL_DIR}/arch/arm64/configs/gki_defconfig ${ROOT_DIR}/${FRAGMENT_CONFIG}"
+POST_DEFCONFIG_CMDS="rm ${ROOT_DIR}/${KERNEL_DIR}/arch/arm64/configs/${DEFCONFIG}"
+
+# needed for DT overlay support
+DTC_FLAGS="-@"
+
+MAKE_GOALS="${MAKE_GOALS}
+dtbs
+"
+
+FILES="${FILES}
+arch/arm64/boot/Image.lz4
+arch/arm64/boot/dts/amlogic/meson-g12a-sei510*.dtb
+arch/arm64/boot/dts/amlogic/meson-sm1-sei610*.dtb
+arch/arm64/boot/dts/amlogic/meson-sm1-khadas-vim3l*.dtb
+arch/arm64/boot/dts/amlogic/meson-g12b-a311d-khadas-vim3*.dtb
+"
+
+#
+# NOTE: Using Image.lz4 in MAKE_GOALS does not work because the
+#   kernel build passes the legacy option (-l) to the lz4 command
+#   and u-boot then fails to decompress.  Instead, add a custom
+#   command that compresses with the same options as the kernel,
+#   but without the -l.
+#
+EXTRA_CMDS="lz4_compress"
+function lz4_compress() {
+        lz4 -f -12 --favor-decSpeed ${OUT_DIR}/arch/arm64/boot/Image ${OUT_DIR}/arch/arm64/boot/Image.lz4
+}
diff --git a/build.config.arm b/build.config.arm
new file mode 100644
index 0000000..c7f114c
--- /dev/null
+++ b/build.config.arm
@@ -0,0 +1,13 @@
+ARCH=arm
+MAKE_GOALS="
+zImage
+modules
+"
+
+FILES="
+arch/arm/boot/zImage
+vmlinux
+System.map
+"
+
+NDK_TRIPLE=arm-linux-androideabi31
diff --git a/build.config.common b/build.config.common
new file mode 100644
index 0000000..3106a14
--- /dev/null
+++ b/build.config.common
@@ -0,0 +1,17 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.constants
+
+KMI_GENERATION=0
+
+LLVM=1
+DEPMOD=depmod
+CLANG_PREBUILT_BIN=prebuilts/clang/host/linux-x86/clang-${CLANG_VERSION}/bin
+BUILDTOOLS_PREBUILT_BIN=build/kernel/build-tools/path/linux-x86
+DTC=${ROOT_DIR}/${BUILDTOOLS_PREBUILT_BIN}/dtc
+
+KCFLAGS="${KCFLAGS} -D__ANDROID_COMMON_KERNEL__"
+EXTRA_CMDS=''
+STOP_SHIP_TRACEPRINTK=1
+IN_KERNEL_MODULES=1
+DO_NOT_STRIP_MODULES=1
+
+HERMETIC_TOOLCHAIN=${HERMETIC_TOOLCHAIN:-1}
diff --git a/build.config.constants b/build.config.constants
new file mode 100644
index 0000000..4acb770
--- /dev/null
+++ b/build.config.constants
@@ -0,0 +1,2 @@
+BRANCH=android14-6.1
+CLANG_VERSION=r475365b
diff --git a/build.config.db845c b/build.config.db845c
new file mode 100644
index 0000000..9e1c48e
--- /dev/null
+++ b/build.config.db845c
@@ -0,0 +1,21 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.common
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.aarch64
+
+BUILD_INITRAMFS=1
+DEFCONFIG=db845c_gki_defconfig
+FRAGMENT_CONFIG=${KERNEL_DIR}/arch/arm64/configs/db845c_gki.fragment
+PRE_DEFCONFIG_CMDS="KCONFIG_CONFIG=${ROOT_DIR}/${KERNEL_DIR}/arch/arm64/configs/${DEFCONFIG} ${ROOT_DIR}/${KERNEL_DIR}/scripts/kconfig/merge_config.sh -m -r ${ROOT_DIR}/${KERNEL_DIR}/arch/arm64/configs/gki_defconfig ${ROOT_DIR}/${FRAGMENT_CONFIG}"
+POST_DEFCONFIG_CMDS="rm ${ROOT_DIR}/${KERNEL_DIR}/arch/arm64/configs/${DEFCONFIG}"
+
+MAKE_GOALS="
+modules
+qcom/sdm845-db845c.dtb
+qcom/qrb5165-rb5.dtb
+qcom/sm8450-qrd.dtb
+"
+
+FILES="
+arch/arm64/boot/dts/qcom/sdm845-db845c.dtb
+arch/arm64/boot/dts/qcom/qrb5165-rb5.dtb
+arch/arm64/boot/dts/qcom/sm8450-qrd.dtb
+"
diff --git a/build.config.gki b/build.config.gki
new file mode 100644
index 0000000..4b931d9
--- /dev/null
+++ b/build.config.gki
@@ -0,0 +1,2 @@
+DEFCONFIG=gki_defconfig
+POST_DEFCONFIG_CMDS="check_defconfig"
diff --git a/build.config.gki-debug.aarch64 b/build.config.gki-debug.aarch64
new file mode 100644
index 0000000..c1fe2f0
--- /dev/null
+++ b/build.config.gki-debug.aarch64
@@ -0,0 +1,3 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.gki.aarch64
+TRIM_NONLISTED_KMI=""
+KMI_SYMBOL_LIST_STRICT_MODE=""
diff --git a/build.config.gki-debug.x86_64 b/build.config.gki-debug.x86_64
new file mode 100644
index 0000000..d89b7ad
--- /dev/null
+++ b/build.config.gki-debug.x86_64
@@ -0,0 +1,3 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.gki.x86_64
+TRIM_NONLISTED_KMI=""
+KMI_SYMBOL_LIST_STRICT_MODE=""
diff --git a/build.config.gki.aarch64 b/build.config.gki.aarch64
new file mode 100644
index 0000000..709af184
--- /dev/null
+++ b/build.config.gki.aarch64
@@ -0,0 +1,27 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.common
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.aarch64
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.gki
+
+MAKE_GOALS="${MAKE_GOALS}
+Image.lz4
+Image.gz
+"
+
+FILES="${FILES}
+arch/arm64/boot/Image.lz4
+arch/arm64/boot/Image.gz
+"
+
+BUILD_SYSTEM_DLKM=1
+MODULES_LIST=${ROOT_DIR}/${KERNEL_DIR}/android/gki_system_dlkm_modules
+
+BUILD_GKI_CERTIFICATION_TOOLS=1
+
+BUILD_GKI_ARTIFACTS=1
+BUILD_GKI_BOOT_IMG_SIZE=67108864
+BUILD_GKI_BOOT_IMG_GZ_SIZE=47185920
+BUILD_GKI_BOOT_IMG_LZ4_SIZE=53477376
+
+if [ -n "${GKI_BUILD_CONFIG_FRAGMENT}" ]; then
+source ${GKI_BUILD_CONFIG_FRAGMENT}
+fi
diff --git a/build.config.gki.aarch64.16k b/build.config.gki.aarch64.16k
new file mode 100644
index 0000000..20be95d
--- /dev/null
+++ b/build.config.gki.aarch64.16k
@@ -0,0 +1,5 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.gki.aarch64
+
+DEFCONFIG=16k_gki_defconfig
+PRE_DEFCONFIG_CMDS="mkdir -p \${OUT_DIR}/arch/arm64/configs/ && cat ${ROOT_DIR}/${KERNEL_DIR}/arch/arm64/configs/gki_defconfig ${ROOT_DIR}/${KERNEL_DIR}/arch/arm64/configs/16k_gki.fragment > \${OUT_DIR}/arch/arm64/configs/${DEFCONFIG};"
+POST_DEFCONFIG_CMDS=""
diff --git a/build.config.gki.aarch64.fips140 b/build.config.gki.aarch64.fips140
new file mode 100644
index 0000000..ec493efc
--- /dev/null
+++ b/build.config.gki.aarch64.fips140
@@ -0,0 +1,27 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.common
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.aarch64
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.gki
+
+FILES="
+crypto/fips140.ko
+"
+
+MAKE_GOALS="
+modules
+"
+
+if [ "${LTO}" = "none" ]; then
+	echo "The FIPS140 module needs LTO to be enabled."
+	exit 1
+fi
+
+MODULES_ORDER=android/gki_aarch64_fips140_modules
+KERNEL_DIR=common
+
+DEFCONFIG=fips140_gki_defconfig
+PRE_DEFCONFIG_CMDS="mkdir -p \${OUT_DIR}/arch/arm64/configs/ && KCONFIG_CONFIG=\${OUT_DIR}/arch/arm64/configs/${DEFCONFIG} ${ROOT_DIR}/${KERNEL_DIR}/scripts/kconfig/merge_config.sh -m -r ${ROOT_DIR}/${KERNEL_DIR}/arch/arm64/configs/gki_defconfig ${ROOT_DIR}/${KERNEL_DIR}/arch/arm64/configs/fips140_gki.fragment"
+POST_DEFCONFIG_CMDS=""
+
+if [ -n "${GKI_BUILD_CONFIG_FRAGMENT}" ]; then
+source ${GKI_BUILD_CONFIG_FRAGMENT}
+fi
diff --git a/build.config.gki.riscv64 b/build.config.gki.riscv64
new file mode 100644
index 0000000..8215d16
--- /dev/null
+++ b/build.config.gki.riscv64
@@ -0,0 +1,30 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.common
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.riscv64
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.gki
+
+MAKE_GOALS="${MAKE_GOALS}
+Image.lz4
+Image.gz
+"
+
+FILES="${FILES}
+arch/riscv/boot/Image.lz4
+arch/riscv/boot/Image.gz
+"
+
+BUILD_SYSTEM_DLKM=1
+MODULES_LIST=${ROOT_DIR}/${KERNEL_DIR}/android/gki_system_dlkm_modules
+
+BUILD_GKI_CERTIFICATION_TOOLS=1
+
+BUILD_GKI_ARTIFACTS=1
+BUILD_GKI_BOOT_IMG_SIZE=67108864
+BUILD_GKI_BOOT_IMG_GZ_SIZE=47185920
+BUILD_GKI_BOOT_IMG_LZ4_SIZE=53477376
+
+PRE_DEFCONFIG_CMDS="mkdir -p \${OUT_DIR}/arch/riscv/configs/ && cat ${ROOT_DIR}/${KERNEL_DIR}/arch/riscv/configs/gki_defconfig ${ROOT_DIR}/${KERNEL_DIR}/arch/riscv/configs/64-bit.config ${ROOT_DIR}/${KERNEL_DIR}/arch/riscv/configs/gki.config > \${OUT_DIR}/arch/riscv/configs/${DEFCONFIG};"
+POST_DEFCONFIG_CMDS=""
+
+if [ -n "${GKI_BUILD_CONFIG_FRAGMENT}" ]; then
+source ${GKI_BUILD_CONFIG_FRAGMENT}
+fi
diff --git a/build.config.gki.x86_64 b/build.config.gki.x86_64
new file mode 100644
index 0000000..93f492c
--- /dev/null
+++ b/build.config.gki.x86_64
@@ -0,0 +1,15 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.common
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.x86_64
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.gki
+
+BUILD_SYSTEM_DLKM=1
+MODULES_LIST=${ROOT_DIR}/${KERNEL_DIR}/android/gki_system_dlkm_modules
+
+BUILD_GKI_CERTIFICATION_TOOLS=1
+
+BUILD_GKI_ARTIFACTS=1
+BUILD_GKI_BOOT_IMG_SIZE=67108864
+
+if [ -n "${GKI_BUILD_CONFIG_FRAGMENT}" ]; then
+source ${GKI_BUILD_CONFIG_FRAGMENT}
+fi
diff --git a/build.config.gki_kasan b/build.config.gki_kasan
new file mode 100644
index 0000000..1aca62c
--- /dev/null
+++ b/build.config.gki_kasan
@@ -0,0 +1,22 @@
+DEFCONFIG=gki_defconfig
+POST_DEFCONFIG_CMDS="check_defconfig && update_kasan_config"
+KERNEL_DIR=common
+LTO=none
+
+function update_kasan_config() {
+    ${KERNEL_DIR}/scripts/config --file ${OUT_DIR}/.config \
+         -e CONFIG_KASAN \
+         -e CONFIG_KASAN_INLINE \
+         -e CONFIG_KCOV \
+         -e CONFIG_PANIC_ON_WARN_DEFAULT_ENABLE \
+         -d CONFIG_RANDOMIZE_BASE \
+         -d CONFIG_KASAN_OUTLINE \
+         --set-val CONFIG_FRAME_WARN 0 \
+         -d CFI \
+         -d CFI_PERMISSIVE \
+         -d CFI_CLANG \
+         -d SHADOW_CALL_STACK
+    (cd ${OUT_DIR} && \
+     make ${TOOL_ARGS} O=${OUT_DIR} olddefconfig)
+}
+
diff --git a/build.config.gki_kasan.aarch64 b/build.config.gki_kasan.aarch64
new file mode 100644
index 0000000..9fd2560
--- /dev/null
+++ b/build.config.gki_kasan.aarch64
@@ -0,0 +1,3 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.common
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.aarch64
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.gki_kasan
diff --git a/build.config.gki_kasan.x86_64 b/build.config.gki_kasan.x86_64
new file mode 100644
index 0000000..eec6458
--- /dev/null
+++ b/build.config.gki_kasan.x86_64
@@ -0,0 +1,4 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.common
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.x86_64
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.gki_kasan
+
diff --git a/build.config.gki_kprobes b/build.config.gki_kprobes
new file mode 100644
index 0000000..f1c6cf7
--- /dev/null
+++ b/build.config.gki_kprobes
@@ -0,0 +1,20 @@
+DEFCONFIG=gki_defconfig
+POST_DEFCONFIG_CMDS="check_defconfig && update_kprobes_config"
+function update_kprobes_config() {
+    ${KERNEL_DIR}/scripts/config --file ${OUT_DIR}/.config \
+         -d LTO \
+         -d LTO_CLANG_THIN \
+         -d CFI \
+         -d CFI_PERMISSIVE \
+         -d CFI_CLANG \
+         -e CONFIG_DYNAMIC_FTRACE \
+         -e CONFIG_FUNCTION_TRACER \
+         -e CONFIG_IRQSOFF_TRACER \
+         -e CONFIG_FUNCTION_PROFILER \
+         -e CONFIG_PREEMPT_TRACER \
+         -e CONFIG_CHECKPOINT_RESTORE \
+         -d CONFIG_RANDOMIZE_BASE
+    (cd ${OUT_DIR} && \
+     make ${TOOL_ARGS} O=${OUT_DIR} olddefconfig)
+}
+
diff --git a/build.config.gki_kprobes.aarch64 b/build.config.gki_kprobes.aarch64
new file mode 100644
index 0000000..627c217
--- /dev/null
+++ b/build.config.gki_kprobes.aarch64
@@ -0,0 +1,4 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.common
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.aarch64
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.gki_kprobes
+
diff --git a/build.config.gki_kprobes.x86_64 b/build.config.gki_kprobes.x86_64
new file mode 100644
index 0000000..a1d3da7
--- /dev/null
+++ b/build.config.gki_kprobes.x86_64
@@ -0,0 +1,4 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.common
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.x86_64
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.gki_kprobes
+
diff --git a/build.config.khwasan b/build.config.khwasan
new file mode 100644
index 0000000..c8e0c5e
--- /dev/null
+++ b/build.config.khwasan
@@ -0,0 +1,17 @@
+append_cmd POST_DEFCONFIG_CMDS update_khwasan_config
+
+function update_khwasan_config() {
+  ${KERNEL_DIR}/scripts/config --file ${OUT_DIR}/.config \
+    -e CONFIG_KASAN \
+    -d CONFIG_KASAN_HW_TAGS \
+    -e CONFIG_KASAN_SW_TAGS \
+    -e CONFIG_KASAN_OUTLINE \
+    -e CONFIG_KASAN_PANIC_ON_WARN \
+    -e CONFIG_KCOV \
+    -e CONFIG_PANIC_ON_WARN_DEFAULT_ENABLE \
+    -d CONFIG_RANDOMIZE_BASE \
+    --set-val CONFIG_FRAME_WARN 0 \
+    -d SHADOW_CALL_STACK
+  (cd ${OUT_DIR} && \
+   make O=${OUT_DIR} "${TOOL_ARGS[@]}" ${MAKE_ARGS} olddefconfig)
+}
diff --git a/build.config.riscv64 b/build.config.riscv64
new file mode 100644
index 0000000..0ef054b
--- /dev/null
+++ b/build.config.riscv64
@@ -0,0 +1,14 @@
+ARCH=riscv
+MAKE_GOALS="
+Image
+modules
+"
+
+FILES="
+arch/riscv/boot/Image
+vmlinux
+System.map
+vmlinux.symvers
+modules.builtin
+modules.builtin.modinfo
+"
diff --git a/build.config.rockpi4 b/build.config.rockpi4
new file mode 100644
index 0000000..c9c9776
--- /dev/null
+++ b/build.config.rockpi4
@@ -0,0 +1,21 @@
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.common
+. ${ROOT_DIR}/${KERNEL_DIR}/build.config.aarch64
+TRIM_NONLISTED_KMI=""
+KMI_SYMBOL_LIST_STRICT_MODE=""
+
+BUILD_INITRAMFS=1
+LZ4_RAMDISK=1
+DEFCONFIG=rockpi4_gki_defconfig
+FRAGMENT_CONFIG=${KERNEL_DIR}/arch/arm64/configs/rockpi4_gki.fragment
+PRE_DEFCONFIG_CMDS="KCONFIG_CONFIG=${ROOT_DIR}/${KERNEL_DIR}/arch/arm64/configs/${DEFCONFIG} ${ROOT_DIR}/${KERNEL_DIR}/scripts/kconfig/merge_config.sh -m -r ${ROOT_DIR}/${KERNEL_DIR}/arch/arm64/configs/gki_defconfig ${ROOT_DIR}/${FRAGMENT_CONFIG}"
+POST_DEFCONFIG_CMDS="rm ${ROOT_DIR}/${KERNEL_DIR}/arch/arm64/configs/${DEFCONFIG}"
+
+DTS_EXT_DIR=common-modules/virtual-device
+DTC_INCLUDE=${ROOT_DIR}/${KERNEL_DIR}/arch/arm64/boot/dts/rockchip
+MAKE_GOALS="${MAKE_GOALS}
+rk3399-rock-pi-4b.dtb
+"
+
+FILES="${FILES}
+../common-modules/virtual-device/rk3399-rock-pi-4b.dtb
+"
diff --git a/build.config.x86_64 b/build.config.x86_64
new file mode 100644
index 0000000..b5ac82b
--- /dev/null
+++ b/build.config.x86_64
@@ -0,0 +1,16 @@
+ARCH=x86_64
+MAKE_GOALS="
+bzImage
+modules
+"
+
+FILES="
+arch/x86/boot/bzImage
+vmlinux
+System.map
+vmlinux.symvers
+modules.builtin
+modules.builtin.modinfo
+"
+
+NDK_TRIPLE=x86_64-linux-android31
diff --git a/certs/extract-cert.c b/certs/extract-cert.c
index 8c1fb9a..5c66eda 100644
--- a/certs/extract-cert.c
+++ b/certs/extract-cert.c
@@ -56,6 +56,7 @@ static void display_openssl_errors(int l)
 	}
 }
 
+#ifndef OPENSSL_IS_BORINGSSL
 static void drain_openssl_errors(void)
 {
 	const char *file;
@@ -65,6 +66,7 @@ static void drain_openssl_errors(void)
 		return;
 	while (ERR_get_error_line(&file, &line)) {}
 }
+#endif
 
 #define ERR(cond, fmt, ...)				\
 	do {						\
@@ -119,6 +121,10 @@ int main(int argc, char **argv)
 		fclose(f);
 		exit(0);
 	} else if (!strncmp(cert_src, "pkcs11:", 7)) {
+#ifdef OPENSSL_IS_BORINGSSL
+		ERR(1, "BoringSSL does not support extracting from PKCS#11");
+		exit(1);
+#else
 		ENGINE *e;
 		struct {
 			const char *cert_id;
@@ -141,6 +147,7 @@ int main(int argc, char **argv)
 		ENGINE_ctrl_cmd(e, "LOAD_CERT_CTRL", 0, &parms, NULL, 1);
 		ERR(!parms.cert, "Get X.509 from PKCS#11");
 		write_cert(parms.cert);
+#endif
 	} else {
 		BIO *b;
 		X509 *x509;
diff --git a/crypto/Kconfig b/crypto/Kconfig
index d779667..65f71ac 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -54,6 +54,31 @@
 	  This option provides the ability to override the FIPS Module Version.
 	  By default the KERNELRELEASE value is used.
 
+config CRYPTO_FIPS140_MOD
+	tristate "Enable FIPS 140 cryptographic module"
+	depends on ARM64 && ARM64_MODULE_PLTS
+	depends on m
+	help
+	  This option enables building a loadable module fips140.ko, which
+	  contains various crypto algorithms that are also built into vmlinux.
+	  At load time, this module overrides the built-in implementations of
+	  these algorithms with its implementations.  It also runs self-tests on
+	  these algorithms and verifies the integrity of its code and data.  If
+	  either of these steps fails, the kernel will panic.
+
+	  This module is intended to be loaded at early boot time in order to
+	  meet FIPS 140 and NIAP FPT_TST_EXT.1 requirements.  It shouldn't be
+	  used if you don't need to meet these requirements.
+
+config CRYPTO_FIPS140_MOD_EVAL_TESTING
+	bool "Enable evaluation testing features in FIPS 140 module"
+	depends on CRYPTO_FIPS140_MOD
+	help
+	  This option adds some features to the FIPS 140 module which are needed
+	  for lab evaluation testing of the module, e.g. support for injecting
+	  errors and support for a userspace interface to some of the module's
+	  services.  This option should not be enabled in production builds.
+
 config CRYPTO_ALGAPI
 	tristate
 	select CRYPTO_ALGAPI2
diff --git a/crypto/Makefile b/crypto/Makefile
index 303b21c..1fa1ea1 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -212,3 +212,56 @@
 # Key derivation function
 #
 obj-$(CONFIG_CRYPTO_KDF800108_CTR) += kdf_sp800108.o
+
+ifneq ($(CONFIG_CRYPTO_FIPS140_MOD),)
+
+FIPS140_CFLAGS := -DBUILD_FIPS140_KO -include $(srctree)/crypto/fips140-defs.h
+
+CFLAGS_jitterentropy-fips.o := -O0
+KASAN_SANITIZE_jitterentropy-fips.o = n
+UBSAN_SANITIZE_jitterentropy-fips.o = n
+
+# Compile an extra copy of various crypto algorithms into the fips140 module.
+#
+# Note: the module will still work if some files are removed from here.
+# However, doing so may affect FIPS certifiability, so don't remove any
+# files without considering the impact on certification.
+
+crypto-fips-objs := drbg.o ecb.o cbc.o ctr.o cts.o gcm.o xts.o hmac.o cmac.o \
+		    gf128mul.o aes_generic.o lib-crypto-aes.o \
+		    jitterentropy.o jitterentropy-kcapi.o \
+		    sha1_generic.o sha256_generic.o sha512_generic.o \
+		    lib-crypto-memneq.o lib-crypto-sha1.o lib-crypto-sha256.o \
+		    lib-crypto-utils.o
+crypto-fips-objs := $(foreach o,$(crypto-fips-objs),$(o:.o=-fips.o))
+
+# get the arch to add its objects to $(crypto-fips-objs)
+include $(srctree)/arch/$(ARCH)/crypto/Kbuild.fips140
+
+$(obj)/%-fips.o: KBUILD_CFLAGS += $(FIPS140_CFLAGS)
+$(obj)/%-fips.o: $(src)/%.c FORCE
+	$(call if_changed_rule,cc_o_c)
+$(obj)/lib-%-fips.o: $(srctree)/lib/%.c FORCE
+	$(call if_changed_rule,cc_o_c)
+$(obj)/lib-crypto-%-fips.o: $(srctree)/lib/crypto/%.c FORCE
+	$(call if_changed_rule,cc_o_c)
+
+fips140-objs := \
+	fips140-alg-registration.o \
+	fips140-module.o \
+	fips140-refs.o \
+	fips140-selftests.o \
+	$(crypto-fips-objs)
+fips140-$(CONFIG_CRYPTO_FIPS140_MOD_EVAL_TESTING) += \
+	fips140-eval-testing.o
+obj-m += fips140.o
+
+CFLAGS_fips140-alg-registration.o += $(FIPS140_CFLAGS)
+CFLAGS_fips140-module.o += $(FIPS140_CFLAGS)
+CFLAGS_fips140-selftests.o += $(FIPS140_CFLAGS)
+CFLAGS_fips140-eval-testing.o += $(FIPS140_CFLAGS)
+
+hostprogs-always-y := fips140_gen_hmac
+HOSTLDLIBS_fips140_gen_hmac := -lcrypto -lelf
+
+endif
diff --git a/crypto/OWNERS b/crypto/OWNERS
new file mode 100644
index 0000000..4ed35a0
--- /dev/null
+++ b/crypto/OWNERS
@@ -0,0 +1 @@
+ardb@google.com
diff --git a/crypto/algapi.c b/crypto/algapi.c
index 5c69ff8..d08f864 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -222,12 +222,65 @@ void crypto_remove_spawns(struct crypto_alg *alg, struct list_head *list,
 }
 EXPORT_SYMBOL_GPL(crypto_remove_spawns);
 
+static void crypto_alg_finish_registration(struct crypto_alg *alg,
+					   bool fulfill_requests,
+					   struct list_head *algs_to_put)
+{
+	struct crypto_alg *q;
+
+	list_for_each_entry(q, &crypto_alg_list, cra_list) {
+		if (q == alg)
+			continue;
+
+		if (crypto_is_moribund(q))
+			continue;
+
+		if (crypto_is_larval(q)) {
+			struct crypto_larval *larval = (void *)q;
+
+			/*
+			 * Check to see if either our generic name or
+			 * specific name can satisfy the name requested
+			 * by the larval entry q.
+			 */
+			if (strcmp(alg->cra_name, q->cra_name) &&
+			    strcmp(alg->cra_driver_name, q->cra_name))
+				continue;
+
+			if (larval->adult)
+				continue;
+			if ((q->cra_flags ^ alg->cra_flags) & larval->mask)
+				continue;
+
+			if (fulfill_requests && crypto_mod_get(alg))
+				larval->adult = alg;
+			else
+				larval->adult = ERR_PTR(-EAGAIN);
+
+			continue;
+		}
+
+		if (strcmp(alg->cra_name, q->cra_name))
+			continue;
+
+		if (strcmp(alg->cra_driver_name, q->cra_driver_name) &&
+		    q->cra_priority > alg->cra_priority)
+			continue;
+
+		crypto_remove_spawns(q, algs_to_put, alg);
+	}
+
+	crypto_notify(CRYPTO_MSG_ALG_LOADED, alg);
+}
+
 static struct crypto_larval *crypto_alloc_test_larval(struct crypto_alg *alg)
 {
 	struct crypto_larval *larval;
 
-	if (!IS_ENABLED(CONFIG_CRYPTO_MANAGER))
-		return NULL;
+	if (!IS_ENABLED(CONFIG_CRYPTO_MANAGER) ||
+	    IS_ENABLED(CONFIG_CRYPTO_MANAGER_DISABLE_TESTS) ||
+	    (alg->cra_flags & CRYPTO_ALG_INTERNAL))
+		return NULL; /* No self-test needed */
 
 	larval = crypto_larval_alloc(alg->cra_name,
 				     alg->cra_flags | CRYPTO_ALG_TESTED, 0);
@@ -248,7 +301,8 @@ static struct crypto_larval *crypto_alloc_test_larval(struct crypto_alg *alg)
 	return larval;
 }
 
-static struct crypto_larval *__crypto_register_alg(struct crypto_alg *alg)
+static struct crypto_larval *
+__crypto_register_alg(struct crypto_alg *alg, struct list_head *algs_to_put)
 {
 	struct crypto_alg *q;
 	struct crypto_larval *larval;
@@ -259,9 +313,6 @@ static struct crypto_larval *__crypto_register_alg(struct crypto_alg *alg)
 
 	INIT_LIST_HEAD(&alg->cra_users);
 
-	/* No cheating! */
-	alg->cra_flags &= ~CRYPTO_ALG_TESTED;
-
 	ret = -EEXIST;
 
 	list_for_each_entry(q, &crypto_alg_list, cra_list) {
@@ -288,13 +339,18 @@ static struct crypto_larval *__crypto_register_alg(struct crypto_alg *alg)
 
 	list_add(&alg->cra_list, &crypto_alg_list);
 
-	if (larval)
-		list_add(&larval->alg.cra_list, &crypto_alg_list);
-	else
-		alg->cra_flags |= CRYPTO_ALG_TESTED;
-
 	crypto_stats_init(alg);
 
+	if (larval) {
+		/* No cheating! */
+		alg->cra_flags &= ~CRYPTO_ALG_TESTED;
+
+		list_add(&larval->alg.cra_list, &crypto_alg_list);
+	} else {
+		alg->cra_flags |= CRYPTO_ALG_TESTED;
+		crypto_alg_finish_registration(alg, true, algs_to_put);
+	}
+
 out:
 	return larval;
 
@@ -341,7 +397,10 @@ void crypto_alg_tested(const char *name, int err)
 
 	alg->cra_flags |= CRYPTO_ALG_TESTED;
 
-	/* Only satisfy larval waiters if we are the best. */
+	/*
+	 * If a higher-priority implementation of the same algorithm is
+	 * currently being tested, then don't fulfill request larvals.
+	 */
 	best = true;
 	list_for_each_entry(q, &crypto_alg_list, cra_list) {
 		if (crypto_is_moribund(q) || !crypto_is_larval(q))
@@ -356,47 +415,7 @@ void crypto_alg_tested(const char *name, int err)
 		}
 	}
 
-	list_for_each_entry(q, &crypto_alg_list, cra_list) {
-		if (q == alg)
-			continue;
-
-		if (crypto_is_moribund(q))
-			continue;
-
-		if (crypto_is_larval(q)) {
-			struct crypto_larval *larval = (void *)q;
-
-			/*
-			 * Check to see if either our generic name or
-			 * specific name can satisfy the name requested
-			 * by the larval entry q.
-			 */
-			if (strcmp(alg->cra_name, q->cra_name) &&
-			    strcmp(alg->cra_driver_name, q->cra_name))
-				continue;
-
-			if (larval->adult)
-				continue;
-			if ((q->cra_flags ^ alg->cra_flags) & larval->mask)
-				continue;
-
-			if (best && crypto_mod_get(alg))
-				larval->adult = alg;
-			else
-				larval->adult = ERR_PTR(-EAGAIN);
-
-			continue;
-		}
-
-		if (strcmp(alg->cra_name, q->cra_name))
-			continue;
-
-		if (strcmp(alg->cra_driver_name, q->cra_driver_name) &&
-		    q->cra_priority > alg->cra_priority)
-			continue;
-
-		crypto_remove_spawns(q, &list, alg);
-	}
+	crypto_alg_finish_registration(alg, best, &list);
 
 complete:
 	complete_all(&test->completion);
@@ -423,7 +442,8 @@ EXPORT_SYMBOL_GPL(crypto_remove_final);
 int crypto_register_alg(struct crypto_alg *alg)
 {
 	struct crypto_larval *larval;
-	bool test_started;
+	LIST_HEAD(algs_to_put);
+	bool test_started = false;
 	int err;
 
 	alg->cra_flags &= ~CRYPTO_ALG_DEAD;
@@ -432,17 +452,18 @@ int crypto_register_alg(struct crypto_alg *alg)
 		return err;
 
 	down_write(&crypto_alg_sem);
-	larval = __crypto_register_alg(alg);
-	test_started = static_key_enabled(&crypto_boot_test_finished);
-	if (!IS_ERR_OR_NULL(larval))
+	larval = __crypto_register_alg(alg, &algs_to_put);
+	if (!IS_ERR_OR_NULL(larval)) {
+		test_started = crypto_boot_test_finished();
 		larval->test_started = test_started;
+	}
 	up_write(&crypto_alg_sem);
 
-	if (IS_ERR_OR_NULL(larval))
+	if (IS_ERR(larval))
 		return PTR_ERR(larval);
-
 	if (test_started)
 		crypto_wait_for_test(larval);
+	crypto_remove_final(&algs_to_put);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(crypto_register_alg);
@@ -619,6 +640,7 @@ int crypto_register_instance(struct crypto_template *tmpl,
 	struct crypto_larval *larval;
 	struct crypto_spawn *spawn;
 	u32 fips_internal = 0;
+	LIST_HEAD(algs_to_put);
 	int err;
 
 	err = crypto_check_alg(&inst->alg);
@@ -650,7 +672,7 @@ int crypto_register_instance(struct crypto_template *tmpl,
 
 	inst->alg.cra_flags |= (fips_internal & CRYPTO_ALG_FIPS_INTERNAL);
 
-	larval = __crypto_register_alg(&inst->alg);
+	larval = __crypto_register_alg(&inst->alg, &algs_to_put);
 	if (IS_ERR(larval))
 		goto unlock;
 	else if (larval)
@@ -662,15 +684,12 @@ int crypto_register_instance(struct crypto_template *tmpl,
 unlock:
 	up_write(&crypto_alg_sem);
 
-	err = PTR_ERR(larval);
-	if (IS_ERR_OR_NULL(larval))
-		goto err;
-
-	crypto_wait_for_test(larval);
-	err = 0;
-
-err:
-	return err;
+	if (IS_ERR(larval))
+		return PTR_ERR(larval);
+	if (larval)
+		crypto_wait_for_test(larval);
+	crypto_remove_final(&algs_to_put);
+	return 0;
 }
 EXPORT_SYMBOL_GPL(crypto_register_instance);
 
@@ -1234,6 +1253,9 @@ EXPORT_SYMBOL_GPL(crypto_stats_skcipher_decrypt);
 
 static void __init crypto_start_tests(void)
 {
+	if (IS_ENABLED(CONFIG_CRYPTO_MANAGER_DISABLE_TESTS))
+		return;
+
 	for (;;) {
 		struct crypto_larval *larval = NULL;
 		struct crypto_alg *q;
@@ -1267,7 +1289,7 @@ static void __init crypto_start_tests(void)
 		crypto_wait_for_test(larval);
 	}
 
-	static_branch_enable(&crypto_boot_test_finished);
+	set_crypto_boot_test_finished();
 }
 
 static int __init crypto_algapi_init(void)
diff --git a/crypto/algboss.c b/crypto/algboss.c
index eb5fe84..0de1e66 100644
--- a/crypto/algboss.c
+++ b/crypto/algboss.c
@@ -175,18 +175,10 @@ static int cryptomgr_test(void *data)
 {
 	struct crypto_test_param *param = data;
 	u32 type = param->type;
-	int err = 0;
-
-#ifdef CONFIG_CRYPTO_MANAGER_DISABLE_TESTS
-	goto skiptest;
-#endif
-
-	if (type & CRYPTO_ALG_TESTED)
-		goto skiptest;
+	int err;
 
 	err = alg_test(param->driver, param->alg, type, CRYPTO_ALG_TESTED);
 
-skiptest:
 	crypto_alg_tested(param->driver, err);
 
 	kfree(param);
@@ -197,7 +189,9 @@ static int cryptomgr_schedule_test(struct crypto_alg *alg)
 {
 	struct task_struct *thread;
 	struct crypto_test_param *param;
-	u32 type;
+
+	if (IS_ENABLED(CONFIG_CRYPTO_MANAGER_DISABLE_TESTS))
+		return NOTIFY_DONE;
 
 	if (!try_module_get(THIS_MODULE))
 		goto err;
@@ -208,13 +202,7 @@ static int cryptomgr_schedule_test(struct crypto_alg *alg)
 
 	memcpy(param->driver, alg->cra_driver_name, sizeof(param->driver));
 	memcpy(param->alg, alg->cra_name, sizeof(param->alg));
-	type = alg->cra_flags;
-
-	/* Do not test internal algorithms. */
-	if (type & CRYPTO_ALG_INTERNAL)
-		type |= CRYPTO_ALG_TESTED;
-
-	param->type = type;
+	param->type = alg->cra_flags;
 
 	thread = kthread_run(cryptomgr_test, param, "cryptomgr_test");
 	if (IS_ERR(thread))
diff --git a/crypto/api.c b/crypto/api.c
index 64f2d36..b022702 100644
--- a/crypto/api.c
+++ b/crypto/api.c
@@ -31,8 +31,10 @@ EXPORT_SYMBOL_GPL(crypto_alg_sem);
 BLOCKING_NOTIFIER_HEAD(crypto_chain);
 EXPORT_SYMBOL_GPL(crypto_chain);
 
-DEFINE_STATIC_KEY_FALSE(crypto_boot_test_finished);
-EXPORT_SYMBOL_GPL(crypto_boot_test_finished);
+#ifndef CONFIG_CRYPTO_MANAGER_DISABLE_TESTS
+DEFINE_STATIC_KEY_FALSE(__crypto_boot_test_finished);
+EXPORT_SYMBOL_GPL(__crypto_boot_test_finished);
+#endif
 
 static struct crypto_alg *crypto_larval_wait(struct crypto_alg *alg);
 
@@ -172,9 +174,6 @@ void crypto_wait_for_test(struct crypto_larval *larval)
 
 	err = wait_for_completion_killable(&larval->completion);
 	WARN_ON(err);
-	if (!err)
-		crypto_notify(CRYPTO_MSG_ALG_LOADED, larval);
-
 out:
 	crypto_larval_kill(&larval->alg);
 }
@@ -205,7 +204,7 @@ static struct crypto_alg *crypto_larval_wait(struct crypto_alg *alg)
 	struct crypto_larval *larval = (void *)alg;
 	long timeout;
 
-	if (!static_branch_likely(&crypto_boot_test_finished))
+	if (!crypto_boot_test_finished())
 		crypto_start_test(larval);
 
 	timeout = wait_for_completion_killable_timeout(
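The hunks above replace the crypto_boot_test_finished static key with the crypto_boot_test_finished()/set_crypto_boot_test_finished() helpers; the matching crypto/internal.h change is not part of this excerpt. A minimal sketch of the assumed helper definitions, where both compile away when CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is set:

#ifdef CONFIG_CRYPTO_MANAGER_DISABLE_TESTS
static inline bool crypto_boot_test_finished(void)
{
	/* No boot-time tests, so nothing to wait for in crypto_larval_wait(). */
	return true;
}
static inline void set_crypto_boot_test_finished(void)
{
}
#else
DECLARE_STATIC_KEY_FALSE(__crypto_boot_test_finished);
static inline bool crypto_boot_test_finished(void)
{
	return static_branch_likely(&__crypto_boot_test_finished);
}
static inline void set_crypto_boot_test_finished(void)
{
	static_branch_enable(&__crypto_boot_test_finished);
}
#endif /* !CONFIG_CRYPTO_MANAGER_DISABLE_TESTS */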
diff --git a/crypto/fips140-alg-registration.c b/crypto/fips140-alg-registration.c
new file mode 100644
index 0000000..03757f8
--- /dev/null
+++ b/crypto/fips140-alg-registration.c
@@ -0,0 +1,388 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Block crypto operations until tests complete
+ *
+ * Copyright 2021 Google LLC
+ *
+ * This file defines the fips140_crypto_register_*() functions, to which all
+ * calls to crypto_register_*() in the module are redirected.  These functions
+ * override the tfm initialization function of each algorithm to insert a wait
+ * for the module to complete its self-tests and integrity check.
+ *
+ * The exact field that we override depends on the algorithm type.  For
+ * algorithm types that have a strongly-typed initialization function pointer
+ * (e.g. skcipher), we must override that, since cra_init isn't guaranteed to be
+ * called for those despite the field being present in the base struct.  For the
+ * other algorithm types (e.g. "cipher") we must override cra_init.
+ *
+ * All of this applies to both normal algorithms and template instances.
+ *
+ * The purpose of all of this is to meet a FIPS requirement where the module
+ * must not produce any output from cryptographic algorithms until it completes
+ * its tests.  Technically this is impossible, but this solution meets the
+ * intent of the requirement, assuming the user makes a supported sequence of
+ * API calls.  Note that we can't simply run the tests before registering the
+ * algorithms, as the algorithms must be registered in order to run the tests.
+ *
+ * It would be much easier to handle this in the kernel's crypto API framework.
+ * Unfortunately, that was deemed insufficient because the module itself is
+ * required to do the enforcement.  What is *actually* required is still very
+ * vague, but the approach implemented here should meet the requirement.
+ */
+
+/*
+ * This file is the one place in fips140.ko that needs to call the kernel's real
+ * algorithm registration functions, so #undef all the macros from
+ * fips140-defs.h so that the "fips140_" prefix doesn't automatically get added.
+ */
+#undef aead_register_instance
+#undef ahash_register_instance
+#undef crypto_register_aead
+#undef crypto_register_aeads
+#undef crypto_register_ahash
+#undef crypto_register_ahashes
+#undef crypto_register_alg
+#undef crypto_register_algs
+#undef crypto_register_rng
+#undef crypto_register_rngs
+#undef crypto_register_shash
+#undef crypto_register_shashes
+#undef crypto_register_skcipher
+#undef crypto_register_skciphers
+#undef shash_register_instance
+#undef skcipher_register_instance
+
+#include <crypto/algapi.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/rng.h>
+#include <crypto/internal/skcipher.h>
+#include <linux/xarray.h>
+
+#include "fips140-module.h"
+
+/* Indicates whether the self-tests and integrity check have completed */
+DECLARE_COMPLETION(fips140_tests_done);
+
+/* The thread running the self-tests and integrity check */
+struct task_struct *fips140_init_thread;
+
+/*
+ * Map from crypto_alg to original initialization function (possibly NULL)
+ *
+ * Note: unregistering an algorithm will leak its map entry, as we don't bother
+ * to remove it.  This should be fine since fips140.ko can't be unloaded.  The
+ * proper solution would be to store the original function pointer in a new
+ * field in 'struct crypto_alg', but that would require kernel support.
+ */
+static DEFINE_XARRAY(fips140_init_func_map);
+
+static bool fips140_ready(void)
+{
+	return completion_done(&fips140_tests_done);
+}
+
+/*
+ * Wait until crypto operations are allowed to proceed.  Return true if the
+ * tests are done, or false if the caller is the thread running the tests so it
+ * is allowed to proceed anyway.
+ */
+static bool fips140_wait_until_ready(struct crypto_alg *alg)
+{
+	if (fips140_ready())
+		return true;
+	/*
+	 * The thread running the tests must not wait.  Since tfms can only be
+	 * allocated in task context, we can reliably determine whether the
+	 * invocation is from that thread or not by checking 'current'.
+	 */
+	if (current == fips140_init_thread)
+		return false;
+
+	pr_info("blocking user of %s until tests complete\n",
+		alg->cra_driver_name);
+	wait_for_completion(&fips140_tests_done);
+	pr_info("tests done, allowing %s to proceed\n", alg->cra_driver_name);
+	return true;
+}
+
+static int fips140_store_init_function(struct crypto_alg *alg, void *func)
+{
+	void *ret;
+
+	/*
+	 * The XArray API requires 4-byte aligned values.  Although function
+	 * pointers in general aren't guaranteed to be 4-byte aligned, it should
+	 * be the case for the platforms this module is used on.
+	 */
+	if (WARN_ON((unsigned long)func & 3))
+		return -EINVAL;
+
+	ret = xa_store(&fips140_init_func_map, (unsigned long)alg, func,
+		       GFP_KERNEL);
+	return xa_err(ret);
+}
+
+/* Get the algorithm's original initialization function (possibly NULL) */
+static void *fips140_load_init_function(struct crypto_alg *alg)
+{
+	return xa_load(&fips140_init_func_map, (unsigned long)alg);
+}
+
+/* tfm initialization function overrides */
+
+static int fips140_alg_init_tfm(struct crypto_tfm *tfm)
+{
+	struct crypto_alg *alg = tfm->__crt_alg;
+	int (*cra_init)(struct crypto_tfm *tfm) =
+		fips140_load_init_function(alg);
+
+	if (fips140_wait_until_ready(alg))
+		WRITE_ONCE(alg->cra_init, cra_init);
+	return cra_init ? cra_init(tfm) : 0;
+}
+
+static int fips140_aead_init_tfm(struct crypto_aead *tfm)
+{
+	struct aead_alg *alg = crypto_aead_alg(tfm);
+	int (*init)(struct crypto_aead *tfm) =
+		fips140_load_init_function(&alg->base);
+
+	if (fips140_wait_until_ready(&alg->base))
+		WRITE_ONCE(alg->init, init);
+	return init ? init(tfm) : 0;
+}
+
+static int fips140_ahash_init_tfm(struct crypto_ahash *tfm)
+{
+	struct hash_alg_common *halg = crypto_hash_alg_common(tfm);
+	struct ahash_alg *alg = container_of(halg, struct ahash_alg, halg);
+	int (*init_tfm)(struct crypto_ahash *tfm) =
+		fips140_load_init_function(&halg->base);
+
+	if (fips140_wait_until_ready(&halg->base))
+		WRITE_ONCE(alg->init_tfm, init_tfm);
+	return init_tfm ? init_tfm(tfm) : 0;
+}
+
+static int fips140_shash_init_tfm(struct crypto_shash *tfm)
+{
+	struct shash_alg *alg = crypto_shash_alg(tfm);
+	int (*init_tfm)(struct crypto_shash *tfm) =
+		fips140_load_init_function(&alg->base);
+
+	if (fips140_wait_until_ready(&alg->base))
+		WRITE_ONCE(alg->init_tfm, init_tfm);
+	return init_tfm ? init_tfm(tfm) : 0;
+}
+
+static int fips140_skcipher_init_tfm(struct crypto_skcipher *tfm)
+{
+	struct skcipher_alg *alg = crypto_skcipher_alg(tfm);
+	int (*init)(struct crypto_skcipher *tfm) =
+		fips140_load_init_function(&alg->base);
+
+	if (fips140_wait_until_ready(&alg->base))
+		WRITE_ONCE(alg->init, init);
+	return init ? init(tfm) : 0;
+}
+
+/* Single algorithm registration */
+
+#define prepare_alg(alg, base_alg, field, wrapper_func)			\
+({									\
+	int err = 0;							\
+									\
+	if (!fips140_ready() && alg->field != wrapper_func) {		\
+		err = fips140_store_init_function(base_alg, alg->field);\
+		if (err == 0)						\
+			alg->field = wrapper_func;			\
+	}								\
+	err;								\
+})
+
+static int fips140_prepare_alg(struct crypto_alg *alg)
+{
+	/*
+	 * Override cra_init.  This is only for algorithm types like cipher and
+	 * rng that don't have a strongly-typed initialization function.
+	 */
+	return prepare_alg(alg, alg, cra_init, fips140_alg_init_tfm);
+}
+
+static int fips140_prepare_aead_alg(struct aead_alg *alg)
+{
+	return prepare_alg(alg, &alg->base, init, fips140_aead_init_tfm);
+}
+
+static int fips140_prepare_ahash_alg(struct ahash_alg *alg)
+{
+	return prepare_alg(alg, &alg->halg.base, init_tfm,
+			   fips140_ahash_init_tfm);
+}
+
+static int fips140_prepare_rng_alg(struct rng_alg *alg)
+{
+	/*
+	 * rng doesn't have a strongly-typed initialization function, so we must
+	 * treat rng algorithms as "generic" algorithms.
+	 */
+	return fips140_prepare_alg(&alg->base);
+}
+
+static int fips140_prepare_shash_alg(struct shash_alg *alg)
+{
+	return prepare_alg(alg, &alg->base, init_tfm, fips140_shash_init_tfm);
+}
+
+static int fips140_prepare_skcipher_alg(struct skcipher_alg *alg)
+{
+	return prepare_alg(alg, &alg->base, init, fips140_skcipher_init_tfm);
+}
+
+int fips140_crypto_register_alg(struct crypto_alg *alg)
+{
+	return fips140_prepare_alg(alg) ?: crypto_register_alg(alg);
+}
+
+int fips140_crypto_register_aead(struct aead_alg *alg)
+{
+	return fips140_prepare_aead_alg(alg) ?: crypto_register_aead(alg);
+}
+
+int fips140_crypto_register_ahash(struct ahash_alg *alg)
+{
+	return fips140_prepare_ahash_alg(alg) ?: crypto_register_ahash(alg);
+}
+
+int fips140_crypto_register_rng(struct rng_alg *alg)
+{
+	return fips140_prepare_rng_alg(alg) ?: crypto_register_rng(alg);
+}
+
+int fips140_crypto_register_shash(struct shash_alg *alg)
+{
+	return fips140_prepare_shash_alg(alg) ?: crypto_register_shash(alg);
+}
+
+int fips140_crypto_register_skcipher(struct skcipher_alg *alg)
+{
+	return fips140_prepare_skcipher_alg(alg) ?:
+		crypto_register_skcipher(alg);
+}
+
+/* Instance registration */
+
+int fips140_aead_register_instance(struct crypto_template *tmpl,
+				   struct aead_instance *inst)
+{
+	return fips140_prepare_aead_alg(&inst->alg) ?:
+		aead_register_instance(tmpl, inst);
+}
+
+int fips140_ahash_register_instance(struct crypto_template *tmpl,
+				    struct ahash_instance *inst)
+{
+	return fips140_prepare_ahash_alg(&inst->alg) ?:
+		ahash_register_instance(tmpl, inst);
+}
+
+int fips140_shash_register_instance(struct crypto_template *tmpl,
+				    struct shash_instance *inst)
+{
+	return fips140_prepare_shash_alg(&inst->alg) ?:
+		shash_register_instance(tmpl, inst);
+}
+
+int fips140_skcipher_register_instance(struct crypto_template *tmpl,
+				       struct skcipher_instance *inst)
+{
+	return fips140_prepare_skcipher_alg(&inst->alg) ?:
+		skcipher_register_instance(tmpl, inst);
+}
+
+/* Bulk algorithm registration */
+
+int fips140_crypto_register_algs(struct crypto_alg *algs, int count)
+{
+	int i;
+	int err;
+
+	for (i = 0; i < count; i++) {
+		err = fips140_prepare_alg(&algs[i]);
+		if (err)
+			return err;
+	}
+
+	return crypto_register_algs(algs, count);
+}
+
+int fips140_crypto_register_aeads(struct aead_alg *algs, int count)
+{
+	int i;
+	int err;
+
+	for (i = 0; i < count; i++) {
+		err = fips140_prepare_aead_alg(&algs[i]);
+		if (err)
+			return err;
+	}
+
+	return crypto_register_aeads(algs, count);
+}
+
+int fips140_crypto_register_ahashes(struct ahash_alg *algs, int count)
+{
+	int i;
+	int err;
+
+	for (i = 0; i < count; i++) {
+		err = fips140_prepare_ahash_alg(&algs[i]);
+		if (err)
+			return err;
+	}
+
+	return crypto_register_ahashes(algs, count);
+}
+
+int fips140_crypto_register_rngs(struct rng_alg *algs, int count)
+{
+	int i;
+	int err;
+
+	for (i = 0; i < count; i++) {
+		err = fips140_prepare_rng_alg(&algs[i]);
+		if (err)
+			return err;
+	}
+
+	return crypto_register_rngs(algs, count);
+}
+
+int fips140_crypto_register_shashes(struct shash_alg *algs, int count)
+{
+	int i;
+	int err;
+
+	for (i = 0; i < count; i++) {
+		err = fips140_prepare_shash_alg(&algs[i]);
+		if (err)
+			return err;
+	}
+
+	return crypto_register_shashes(algs, count);
+}
+
+int fips140_crypto_register_skciphers(struct skcipher_alg *algs, int count)
+{
+	int i;
+	int err;
+
+	for (i = 0; i < count; i++) {
+		err = fips140_prepare_skcipher_alg(&algs[i]);
+		if (err)
+			return err;
+	}
+
+	return crypto_register_skciphers(algs, count);
+}
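As a hedged illustration of the effect of these wrappers (not part of the patch): any task other than fips140_init_thread that instantiates a wrapped algorithm before the tests finish sleeps inside the overridden init hook rather than producing output.

#include <linux/err.h>
#include <crypto/hash.h>

/* Hypothetical consumer, for illustration only. */
static int demo_use_hmac_sha256(void)
{
	struct crypto_shash *tfm;

	tfm = crypto_alloc_shash("hmac(sha256)", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);
	/*
	 * fips140_shash_init_tfm() already ran inside the allocation: unless
	 * current == fips140_init_thread, it waited on fips140_tests_done and
	 * then restored and called the original ->init_tfm (if any).
	 */
	crypto_free_shash(tfm);
	return 0;
}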
diff --git a/crypto/fips140-defs.h b/crypto/fips140-defs.h
new file mode 100644
index 0000000..9005f95
--- /dev/null
+++ b/crypto/fips140-defs.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright 2021 Google LLC
+ *
+ * This file is automatically included by all files built into fips140.ko, via
+ * the "-include" compiler flag.
+ */
+
+/*
+ * fips140.ko is built from various unmodified or minimally modified kernel
+ * source files, many of which are normally meant to be buildable into different
+ * modules themselves.  That results in conflicting instances of module_init()
+ * and related macros such as MODULE_LICENSE().
+ *
+ * To solve that, we undefine MODULE to trick the kernel headers into thinking
+ * the code is being compiled as built-in.  That causes module_init() and
+ * related macros to be expanded as they would be for built-in code; e.g.,
+ * module_init() adds the function to the .initcalls section of the binary.
+ *
+ * The .c file that contains the real module_init() for fips140.ko is then
+ * responsible for redefining MODULE, and the real module_init() is responsible
+ * for executing all the initcalls that were collected into .initcalls.
+ */
+#undef MODULE
+
+/*
+ * Defining KBUILD_MODFILE is also required, since the kernel headers expect it
+ * to be defined when code that can be a module is compiled as built-in.
+ */
+#define KBUILD_MODFILE "crypto/fips140"
+
+/*
+ * Disable symbol exports by default.  fips140.ko includes various files that
+ * use EXPORT_SYMBOL*(), but it's unwanted to export any symbols from fips140.ko
+ * except where explicitly needed for FIPS certification reasons.
+ */
+#define __DISABLE_EXPORTS
+
+/*
+ * Redirect all calls to algorithm registration functions to the wrapper
+ * functions defined within the module.
+ */
+#define aead_register_instance		fips140_aead_register_instance
+#define ahash_register_instance		fips140_ahash_register_instance
+#define crypto_register_aead		fips140_crypto_register_aead
+#define crypto_register_aeads		fips140_crypto_register_aeads
+#define crypto_register_ahash		fips140_crypto_register_ahash
+#define crypto_register_ahashes		fips140_crypto_register_ahashes
+#define crypto_register_alg		fips140_crypto_register_alg
+#define crypto_register_algs		fips140_crypto_register_algs
+#define crypto_register_rng		fips140_crypto_register_rng
+#define crypto_register_rngs		fips140_crypto_register_rngs
+#define crypto_register_shash		fips140_crypto_register_shash
+#define crypto_register_shashes		fips140_crypto_register_shashes
+#define crypto_register_skcipher	fips140_crypto_register_skcipher
+#define crypto_register_skciphers	fips140_crypto_register_skciphers
+#define shash_register_instance		fips140_shash_register_instance
+#define skcipher_register_instance	fips140_skcipher_register_instance
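A minimal sketch of the redirection in action (illustrative only; demo_register() is hypothetical): in any source file built into fips140.ko with this header force-included, the preprocessor rewrites the registration call before the compiler sees it.

#include <crypto/internal/hash.h>

/* crypto_register_shash below expands to fips140_crypto_register_shash,
 * so the unmodified driver source routes through the module's wrapper. */
static int demo_register(struct shash_alg *alg)
{
	return crypto_register_shash(alg);
}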
diff --git a/crypto/fips140-eval-testing-uapi.h b/crypto/fips140-eval-testing-uapi.h
new file mode 100644
index 0000000..04e6cf633
--- /dev/null
+++ b/crypto/fips140-eval-testing-uapi.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _CRYPTO_FIPS140_EVAL_TESTING_H
+#define _CRYPTO_FIPS140_EVAL_TESTING_H
+
+#include <linux/ioctl.h>
+
+/*
+ * This header defines the ioctls that are available on the fips140 character
+ * device.  These ioctls expose some of the module's services to userspace so
+ * that they can be tested by the FIPS certification lab; this is a required
+ * part of getting a FIPS 140 certification.  These ioctls do not have any other
+ * purpose, and they do not need to be present in production builds.
+ */
+
+/*
+ * Call the fips140_is_approved_service() function.  The argument must be the
+ * service name as a NUL-terminated string.  The return value will be 1 if
+ * fips140_is_approved_service() returned true, or 0 if it returned false.
+ */
+#define FIPS140_IOCTL_IS_APPROVED_SERVICE	_IO('F', 0)
+
+/*
+ * Call the fips140_module_version() function.  The argument must be a pointer
+ * to a buffer of size >= 256 chars.  The NUL-terminated string returned by
+ * fips140_module_version() will be written to this buffer.
+ */
+#define FIPS140_IOCTL_MODULE_VERSION		_IOR('F', 1, char[256])
+
+#endif /* _CRYPTO_FIPS140_EVAL_TESTING_H */
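A minimal userspace sketch of these ioctls (illustrative only; it assumes a device node such as /dev/fips140 has been created for the chrdev registered by fips140_eval_testing_init()):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include "fips140-eval-testing-uapi.h"

int main(void)
{
	char version[256];
	int fd = open("/dev/fips140", O_RDONLY);	/* assumed node path */

	if (fd < 0)
		return 1;
	/* 1 if the named service is FIPS-approved, 0 otherwise */
	printf("cbc(aes) approved: %d\n",
	       (int)ioctl(fd, FIPS140_IOCTL_IS_APPROVED_SERVICE, "cbc(aes)"));
	if (ioctl(fd, FIPS140_IOCTL_MODULE_VERSION, version) == 0)
		printf("module version: %s\n", version);
	close(fd);
	return 0;
}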
diff --git a/crypto/fips140-eval-testing.c b/crypto/fips140-eval-testing.c
new file mode 100644
index 0000000..ea3cd65
--- /dev/null
+++ b/crypto/fips140-eval-testing.c
@@ -0,0 +1,129 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2021 Google LLC
+ *
+ * This file can optionally be built into fips140.ko in order to support certain
+ * types of testing that the FIPS lab has to do to evaluate the module.  It
+ * should not be included in production builds of the module.
+ */
+
+/*
+ * We have to redefine inline to mean always_inline, so that _copy_to_user()
+ * gets inlined.  This is needed for it to be placed into the correct section.
+ * See fips140_copy_to_user().
+ *
+ * We also need to undefine BUILD_FIPS140_KO to allow the use of the code
+ * patching which copy_to_user() requires.
+ */
+#undef inline
+#define inline inline __attribute__((__always_inline__)) __gnu_inline \
+       __inline_maybe_unused notrace
+#undef BUILD_FIPS140_KO
+
+#include <linux/cdev.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+
+#include "fips140-module.h"
+#include "fips140-eval-testing-uapi.h"
+
+/*
+ * This option allows deliberately failing the self-tests for a particular
+ * algorithm.
+ */
+static char *fips140_fail_selftest;
+module_param_named(fail_selftest, fips140_fail_selftest, charp, 0);
+
+/* This option allows deliberately failing the integrity check. */
+static bool fips140_fail_integrity_check;
+module_param_named(fail_integrity_check, fips140_fail_integrity_check, bool, 0);
+
+static dev_t fips140_devnum;
+static struct cdev fips140_cdev;
+
+/* Inject a self-test failure (by corrupting the result) if requested. */
+void fips140_inject_selftest_failure(const char *impl, u8 *result)
+{
+	if (fips140_fail_selftest && strcmp(impl, fips140_fail_selftest) == 0)
+		result[0] ^= 0xff;
+}
+
+/* Inject an integrity check failure (by corrupting the text) if requested. */
+void fips140_inject_integrity_failure(u8 *textcopy)
+{
+	if (fips140_fail_integrity_check)
+		textcopy[0] ^= 0xff;
+}
+
+static long fips140_ioctl_is_approved_service(unsigned long arg)
+{
+	const char *service_name = strndup_user((const char __user *)arg, 256);
+	long ret;
+
+	if (IS_ERR(service_name))
+		return PTR_ERR(service_name);
+
+	ret = fips140_is_approved_service(service_name);
+
+	kfree(service_name);
+	return ret;
+}
+
+/*
+ * Code in fips140.ko is covered by an integrity check by default, and this
+ * check breaks if copy_to_user() is called.  This is because copy_to_user() is
+ * an inline function that relies on code patching.  However, since this is
+ * "evaluation testing" code which isn't included in the production builds of
+ * fips140.ko, it's acceptable to just exclude it from the integrity check.
+ */
+static noinline unsigned long __section("text.._fips140_unchecked")
+fips140_copy_to_user(void __user *to, const void *from, unsigned long n)
+{
+	return copy_to_user(to, from, n);
+}
+
+static long fips140_ioctl_module_version(unsigned long arg)
+{
+	const char *version = fips140_module_version();
+	size_t len = strlen(version) + 1;
+
+	if (len > 256)
+		return -EOVERFLOW;
+
+	if (fips140_copy_to_user((void __user *)arg, version, len))
+		return -EFAULT;
+
+	return 0;
+}
+
+static long fips140_ioctl(struct file *file, unsigned int cmd,
+			  unsigned long arg)
+{
+	switch (cmd) {
+	case FIPS140_IOCTL_IS_APPROVED_SERVICE:
+		return fips140_ioctl_is_approved_service(arg);
+	case FIPS140_IOCTL_MODULE_VERSION:
+		return fips140_ioctl_module_version(arg);
+	default:
+		return -ENOTTY;
+	}
+}
+
+static const struct file_operations fips140_fops = {
+	.unlocked_ioctl = fips140_ioctl,
+};
+
+bool fips140_eval_testing_init(void)
+{
+	if (alloc_chrdev_region(&fips140_devnum, 1, 1, "fips140") != 0) {
+		pr_err("failed to allocate device number\n");
+		return false;
+	}
+	cdev_init(&fips140_cdev, &fips140_fops);
+	if (cdev_add(&fips140_cdev, fips140_devnum, 1) != 0) {
+		pr_err("failed to add fips140 character device\n");
+		return false;
+	}
+	return true;
+}
diff --git a/crypto/fips140-generated-testvecs.h b/crypto/fips140-generated-testvecs.h
new file mode 100644
index 0000000..d4ccd77
--- /dev/null
+++ b/crypto/fips140-generated-testvecs.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2021 Google LLC */
+
+/*
+ * This header was automatically generated by gen_fips140_testvecs.py.
+ * Don't edit it directly.
+ */
+
+static const u8 fips_message[32] __initconst =
+	"This is a 32-byte test message.";
+
+static const u8 fips_aes_key[16] __initconst = "128-bit AES key";
+
+static const u8 fips_aes_iv[16] __initconst = "ABCDEFGHIJKLMNOP";
+
+static const u8 fips_aes_cbc_ciphertext[32] __initconst =
+	"\x4c\x3e\xeb\x38\x8d\x1f\x28\xfd\xa2\x3b\xa9\xda\x36\xf2\x99\xe2"
+	"\x84\x84\x66\x37\x0a\x53\x68\x2f\x17\x95\x8d\x7f\xca\x5a\x68\x4e";
+
+static const u8 fips_aes_ecb_ciphertext[32] __initconst =
+	"\xc1\x9d\xe6\xb8\xb2\x90\xff\xfe\xf2\x77\x18\xb0\x55\xd3\xee\xa9"
+	"\xe2\x6f\x4a\x32\x67\xfd\xb7\xa5\x2f\x4b\x6e\x1a\x86\x2b\x6e\x3a";
+
+static const u8 fips_aes_ctr_ciphertext[32] __initconst =
+	"\xed\x06\x2c\xd0\xbc\x48\xd1\x2e\x6a\x4e\x13\xe9\xaa\x17\x40\xca"
+	"\x00\xb4\xaf\x3b\x4f\xee\x73\xd6\x6c\x41\xf6\x4c\x8b\x0d\x6a\x0f";
+
+static const u8 fips_aes_gcm_assoc[22] __initconst = "associated data string";
+
+static const u8 fips_aes_gcm_ciphertext[48] __initconst =
+	"\x37\x88\x3e\x1d\x58\x50\xda\x10\x07\xeb\x52\xdf\xea\x0a\x54\xd4"
+	"\x44\xbf\x88\x2a\xf3\x03\x03\x84\xaf\x8b\x96\xbd\xea\x65\x60\x6f"
+	"\x82\xfa\x51\xf4\x28\xad\x0c\xf1\xce\x0f\x91\xdd\x1a\x4c\x77\x5f";
+
+static const u8 fips_aes_xts_key[32] __initconst =
+	"This is an AES-128-XTS key.";
+
+static const u8 fips_aes_xts_ciphertext[32] __initconst =
+	"\x4f\xf7\x9f\x6c\x00\xa8\x30\xdf\xff\xf3\x25\x9c\xf6\x0b\x1b\xfd"
+	"\x3b\x34\x5e\x67\x7c\xf8\x8b\x68\x9a\xb9\x5a\x89\x51\x51\xbd\x35";
+
+static const u8 fips_aes_cmac_digest[16] __initconst =
+	"\x0c\x05\xda\x64\x51\x0c\x8e\x6c\x86\x52\x46\xa8\x2d\xb1\xfe\x0f";
+
+static const u8 fips_hmac_key[16] __initconst = "128-bit HMAC key";
+
+static const u8 fips_sha1_digest[20] __initconst =
+	"\x1b\x78\xc7\x4b\xd5\xd4\x83\xb1\x58\xc5\x96\x83\x4f\x16\x8d\x15"
+	"\xb4\xaa\x22\x8c";
+
+static const u8 fips_sha256_digest[32] __initconst =
+	"\x4e\x11\x83\x0c\x53\x80\x1e\x5f\x9b\x38\x33\x38\xe8\x74\x43\xb0"
+	"\xc1\x3a\xbe\xbf\x75\xf0\x12\x0f\x21\x33\xf5\x16\x33\xf1\xb0\x81";
+
+static const u8 fips_hmac_sha256_digest[32] __initconst =
+	"\x63\x0e\xb5\x73\x79\xfc\xaf\x5f\x86\xe3\xaf\xf0\xc8\x36\xef\xd5"
+	"\x35\x8d\x40\x25\x38\xb3\x65\x72\x98\xf3\x59\xd8\x1e\x54\x4c\xa1";
+
+static const u8 fips_sha512_digest[64] __initconst =
+	"\x32\xe0\x44\x23\xbd\xe3\xec\x28\xbf\xf1\x34\x11\xd5\xae\xbf\xd5"
+	"\xc0\x8e\xb5\xa1\x04\xef\x2f\x07\x84\xf1\xd9\x83\x0f\x6c\x31\xab"
+	"\xf7\xe7\x57\xfa\xf7\xae\xf0\x6f\xb2\x16\x08\x32\xcf\xc7\xef\x35"
+	"\xb3\x3b\x51\xb9\xfd\xe7\xff\x5e\xb2\x8b\xc6\x79\xe6\x14\x04\xb4";
+
+/*
+ * This header was automatically generated by gen_fips140_testvecs.py.
+ * Don't edit it directly.
+ */
diff --git a/crypto/fips140-module.c b/crypto/fips140-module.c
new file mode 100644
index 0000000..5c2a594
--- /dev/null
+++ b/crypto/fips140-module.c
@@ -0,0 +1,611 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2021 Google LLC
+ * Author: Ard Biesheuvel <ardb@google.com>
+ *
+ * This file is the core of fips140.ko, which contains various crypto algorithms
+ * that are also built into vmlinux.  At load time, this module overrides the
+ * built-in implementations of these algorithms with its implementations.  It
+ * also runs self-tests on these algorithms and verifies the integrity of its
+ * code and data.  If either of these steps fails, the kernel will panic.
+ *
+ * This module is intended to be loaded at early boot time in order to meet
+ * FIPS 140 and NIAP FPT_TST_EXT.1 requirements.  It shouldn't be used if you
+ * don't need to meet these requirements.
+ */
+
+/*
+ * Since this .c file is the real entry point of fips140.ko, it needs to be
+ * compiled normally, so undo the hacks that were done in fips140-defs.h.
+ */
+#define MODULE
+#undef KBUILD_MODFILE
+#undef __DISABLE_EXPORTS
+
+#include <linux/ctype.h>
+#include <linux/module.h>
+#include <crypto/aead.h>
+#include <crypto/aes.h>
+#include <crypto/hash.h>
+#include <crypto/sha2.h>
+#include <crypto/skcipher.h>
+#include <crypto/rng.h>
+#include <trace/hooks/fips140.h>
+
+#include "fips140-module.h"
+#include "internal.h"
+
+/*
+ * FIPS 140-2 prefers the use of HMAC with a public key over a plain hash.
+ */
+u8 __initdata fips140_integ_hmac_key[] = "The quick brown fox jumps over the lazy dog";
+
+/* this is populated by the build tool */
+u8 __initdata fips140_integ_hmac_digest[SHA256_DIGEST_SIZE];
+
+const u32 __initcall_start_marker __section(".initcalls._start");
+const u32 __initcall_end_marker __section(".initcalls._end");
+
+const u8 __fips140_text_start __section(".text.._start");
+const u8 __fips140_text_end __section(".text.._end");
+
+const u8 __fips140_rodata_start __section(".rodata.._start");
+const u8 __fips140_rodata_end __section(".rodata.._end");
+
+/*
+ * We need this little detour to prevent Clang from detecting out of bounds
+ * accesses to __fips140_text_start and __fips140_rodata_start, which only exist
+ * to delineate the section, and so their sizes are not relevant to us.
+ */
+const u32 *__initcall_start = &__initcall_start_marker;
+
+const u8 *__text_start = &__fips140_text_start;
+const u8 *__rodata_start = &__fips140_rodata_start;
+
+/*
+ * The list of the crypto API algorithms (by cra_name) that will be unregistered
+ * by this module, in preparation for the module registering its own
+ * implementation(s) of them.
+ *
+ * All algorithms that will be declared as FIPS-approved in the module
+ * certification must be listed here, to ensure that the non-FIPS-approved
+ * implementations of these algorithms in the kernel image aren't used.
+ *
+ * For every algorithm in this list, the module should contain all the "same"
+ * implementations that the kernel image does, including the C implementation as
+ * well as any architecture-specific implementations.  This is needed to avoid
+ * performance regressions as well as the possibility of an algorithm being
+ * unavailable on some CPUs.  E.g., "xcbc(aes)" isn't in this list, as the
+ * module doesn't have a C implementation of it (and it won't be FIPS-approved).
+ *
+ * Due to a quirk in the FIPS requirements, "gcm(aes)" isn't actually able to be
+ * FIPS-approved.  However, we otherwise treat it the same as the algorithms
+ * that will be FIPS-approved, and therefore it's included in this list.
+ *
+ * When adding a new algorithm here, make sure to consider whether it needs a
+ * self-test added to fips140_selftests[] as well.
+ */
+static const struct {
+	const char *name;
+	bool approved;
+} fips140_algs_to_replace[] = {
+	{"aes", true},
+
+	{"cmac(aes)", true},
+	{"ecb(aes)", true},
+
+	{"cbc(aes)", true},
+	{"cts(cbc(aes))", true},
+	{"ctr(aes)", true},
+	{"xts(aes)", true},
+	{"gcm(aes)", false},
+
+	{"hmac(sha1)", true},
+	{"hmac(sha224)", true},
+	{"hmac(sha256)", true},
+	{"hmac(sha384)", true},
+	{"hmac(sha512)", true},
+	{"sha1", true},
+	{"sha224", true},
+	{"sha256", true},
+	{"sha384", true},
+	{"sha512", true},
+
+	{"stdrng", true},
+	{"jitterentropy_rng", false},
+};
+
+static bool __init fips140_should_unregister_alg(struct crypto_alg *alg)
+{
+	int i;
+
+	/*
+	 * All software algorithms are synchronous; hardware algorithms must
+	 * be covered by their own FIPS 140 certification.
+	 */
+	if (alg->cra_flags & CRYPTO_ALG_ASYNC)
+		return false;
+
+	for (i = 0; i < ARRAY_SIZE(fips140_algs_to_replace); i++) {
+		if (!strcmp(alg->cra_name, fips140_algs_to_replace[i].name))
+			return true;
+	}
+	return false;
+}
+
+/*
+ * FIPS 140-3 service indicators.  FIPS 140-3 requires that all services
+ * "provide an indicator when the service utilises an approved cryptographic
+ * algorithm, security function or process in an approved manner".  What this
+ * means is very debatable, even with the help of the FIPS 140-3 Implementation
+ * Guidance document.  However, it was decided that a function that takes in an
+ * algorithm name and returns whether that algorithm is approved or not will
+ * meet this requirement.  Note, this relies on some properties of the module:
+ *
+ *   - The module doesn't distinguish between "services" and "algorithms"; its
+ *     services are simply its algorithms.
+ *
+ *   - The status of an approved algorithm is never non-approved, since (a) the
+ *     module doesn't support operating in a non-approved mode, such as a mode
+ *     where the self-tests are skipped; (b) there are no cases where the module
+ *     supports non-approved settings for approved algorithms, e.g.
+ *     non-approved key sizes; and (c) this function isn't available to be
+ *     called until the module_init function has completed, so it's guaranteed
+ *     that the self-tests and integrity check have already passed.
+ *
+ *   - The module does support some non-approved algorithms, so a single static
+ *     indicator ("return true;") would not be acceptable.
+ */
+bool fips140_is_approved_service(const char *name)
+{
+	size_t i;
+
+	for (i = 0; i < ARRAY_SIZE(fips140_algs_to_replace); i++) {
+		if (!strcmp(name, fips140_algs_to_replace[i].name))
+			return fips140_algs_to_replace[i].approved;
+	}
+	return false;
+}
+EXPORT_SYMBOL_GPL(fips140_is_approved_service);
+
+/*
+ * FIPS 140-3 requires that modules provide a "service" that outputs "the name
+ * or module identifier and the versioning information that can be correlated
+ * with a validation record".  This function meets that requirement.
+ *
+ * Note: the module also prints this same information to the kernel log when it
+ * is loaded.  That might meet the requirement by itself.  However, given the
+ * vagueness of what counts as a "service", we provide this function too, just
+ * in case the certification lab or CMVP is happier with an explicit function.
+ *
+ * Note: /sys/module/fips140/scmversion also provides versioning information
+ * about the module.  However that file just shows the bare git commit ID, so it
+ * probably isn't sufficient to meet the FIPS requirement, which seems to want
+ * the "official" module name and version number used in the FIPS certificate.
+ */
+const char *fips140_module_version(void)
+{
+	return FIPS140_MODULE_NAME " " FIPS140_MODULE_VERSION;
+}
+EXPORT_SYMBOL_GPL(fips140_module_version);
+
+static LIST_HEAD(existing_live_algos);
+
+/*
+ * Release a list of algorithms which have been removed from crypto_alg_list.
+ *
+ * Note that even though the list is a private list, we have to hold
+ * crypto_alg_sem while iterating through it because crypto_unregister_alg() may
+ * run concurrently (as we haven't taken a reference to the algorithms on the
+ * list), and crypto_unregister_alg() will remove the algorithm from whichever
+ * list it happens to be on, while holding crypto_alg_sem.  That's okay, since
+ * in that case crypto_unregister_alg() will handle the crypto_alg_put().
+ */
+static void fips140_remove_final(struct list_head *list)
+{
+	struct crypto_alg *alg;
+	struct crypto_alg *n;
+
+	/*
+	 * We need to take crypto_alg_sem to safely traverse the list (see
+	 * comment above), but we have to drop it when doing each
+	 * crypto_alg_put() as that may take crypto_alg_sem again.
+	 */
+	down_write(&crypto_alg_sem);
+	list_for_each_entry_safe(alg, n, list, cra_list) {
+		list_del_init(&alg->cra_list);
+		up_write(&crypto_alg_sem);
+
+		crypto_alg_put(alg);
+
+		down_write(&crypto_alg_sem);
+	}
+	up_write(&crypto_alg_sem);
+}
+
+static void __init unregister_existing_fips140_algos(void)
+{
+	struct crypto_alg *alg, *tmp;
+	LIST_HEAD(remove_list);
+	LIST_HEAD(spawns);
+
+	down_write(&crypto_alg_sem);
+
+	/*
+	 * Find all registered algorithms that we care about, and move them to a
+	 * private list so that they are no longer exposed via the algo lookup
+	 * API. Subsequently, we will unregister them if they are not in active
+	 * use. If they are, we can't fully unregister them but we can ensure
+	 * that new users won't use them.
+	 */
+	list_for_each_entry_safe(alg, tmp, &crypto_alg_list, cra_list) {
+		if (!fips140_should_unregister_alg(alg))
+			continue;
+		if (refcount_read(&alg->cra_refcnt) == 1) {
+			/*
+			 * This algorithm is not currently in use, but there may
+			 * be template instances holding references to it via
+			 * spawns. So let's tear it down like
+			 * crypto_unregister_alg() would, but without releasing
+			 * the lock, to prevent races with concurrent TFM
+			 * allocations.
+			 */
+			alg->cra_flags |= CRYPTO_ALG_DEAD;
+			list_move(&alg->cra_list, &remove_list);
+			crypto_remove_spawns(alg, &spawns, NULL);
+		} else {
+			/*
+			 * This algorithm is live, i.e. it has TFMs allocated,
+			 * so we can't fully unregister it.  It's not necessary
+			 * to dynamically redirect existing users to the FIPS
+			 * code, given that they can't be relying on FIPS
+			 * certified crypto in the first place.  However, we do
+			 * need to ensure that new users will get the FIPS code.
+			 *
+			 * In most cases, setting alg->cra_priority to 0
+			 * achieves this.  However, that isn't enough for
+			 * algorithms like "hmac(sha256)" that need to be
+			 * instantiated from a template, since existing
+			 * algorithms always take priority over a template being
+			 * instantiated.  Therefore, we move the algorithm to
+			 * a private list so that algorithm lookups won't find
+			 * it anymore.  To further distinguish it from the FIPS
+			 * algorithms, we also append "+orig" to its name.
+			 */
+			pr_info("found already-live algorithm '%s' ('%s')\n",
+				alg->cra_name, alg->cra_driver_name);
+			alg->cra_priority = 0;
+			strlcat(alg->cra_name, "+orig", CRYPTO_MAX_ALG_NAME);
+			strlcat(alg->cra_driver_name, "+orig",
+				CRYPTO_MAX_ALG_NAME);
+			list_move(&alg->cra_list, &existing_live_algos);
+		}
+	}
+	up_write(&crypto_alg_sem);
+
+	fips140_remove_final(&remove_list);
+	fips140_remove_final(&spawns);
+}
+
+static void __init unapply_text_relocations(void *section, int section_size,
+					    const Elf64_Rela *rela, int numrels)
+{
+	while (numrels--) {
+		u32 *place = (u32 *)(section + rela->r_offset);
+
+		BUG_ON(rela->r_offset >= section_size);
+
+		switch (ELF64_R_TYPE(rela->r_info)) {
+#ifdef CONFIG_ARM64
+		case R_AARCH64_ABS32: /* for KCFI */
+			*place = 0;
+			break;
+
+		case R_AARCH64_JUMP26:
+		case R_AARCH64_CALL26:
+			*place &= ~GENMASK(25, 0);
+			break;
+
+		case R_AARCH64_ADR_PREL_LO21:
+		case R_AARCH64_ADR_PREL_PG_HI21:
+		case R_AARCH64_ADR_PREL_PG_HI21_NC:
+			*place &= ~(GENMASK(30, 29) | GENMASK(23, 5));
+			break;
+
+		case R_AARCH64_ADD_ABS_LO12_NC:
+		case R_AARCH64_LDST8_ABS_LO12_NC:
+		case R_AARCH64_LDST16_ABS_LO12_NC:
+		case R_AARCH64_LDST32_ABS_LO12_NC:
+		case R_AARCH64_LDST64_ABS_LO12_NC:
+		case R_AARCH64_LDST128_ABS_LO12_NC:
+			*place &= ~GENMASK(21, 10);
+			break;
+		default:
+			pr_err("unhandled relocation type %llu\n",
+			       ELF64_R_TYPE(rela->r_info));
+			BUG();
+#else
+#error
+#endif
+		}
+		rela++;
+	}
+}
+
+static void __init unapply_rodata_relocations(void *section, int section_size,
+					      const Elf64_Rela *rela, int numrels)
+{
+	while (numrels--) {
+		void *place = section + rela->r_offset;
+
+		BUG_ON(rela->r_offset >= section_size);
+
+		switch (ELF64_R_TYPE(rela->r_info)) {
+#ifdef CONFIG_ARM64
+		case R_AARCH64_ABS64:
+			*(u64 *)place = 0;
+			break;
+		default:
+			pr_err("unhandled relocation type %llu\n",
+			       ELF64_R_TYPE(rela->r_info));
+			BUG();
+#else
+#error
+#endif
+		}
+		rela++;
+	}
+}
+
+extern struct {
+	u32	offset;
+	u32	count;
+} fips140_rela_text, fips140_rela_rodata;
+
+static bool __init check_fips140_module_hmac(void)
+{
+	struct crypto_shash *tfm = NULL;
+	SHASH_DESC_ON_STACK(desc, dontcare);
+	u8 digest[SHA256_DIGEST_SIZE];
+	void *textcopy, *rodatacopy;
+	int textsize, rodatasize;
+	bool ok = false;
+	int err;
+
+	textsize	= &__fips140_text_end - &__fips140_text_start;
+	rodatasize	= &__fips140_rodata_end - &__fips140_rodata_start;
+
+	pr_info("text size  : 0x%x\n", textsize);
+	pr_info("rodata size: 0x%x\n", rodatasize);
+
+	textcopy = kmalloc(textsize + rodatasize, GFP_KERNEL);
+	if (!textcopy) {
+		pr_err("Failed to allocate memory for copy of .text\n");
+		goto out;
+	}
+
+	rodatacopy = textcopy + textsize;
+
+	memcpy(textcopy, __text_start, textsize);
+	memcpy(rodatacopy, __rodata_start, rodatasize);
+
+	// apply the relocations in reverse on the copies of .text and .rodata
+	unapply_text_relocations(textcopy, textsize,
+				 offset_to_ptr(&fips140_rela_text.offset),
+				 fips140_rela_text.count);
+
+	unapply_rodata_relocations(rodatacopy, rodatasize,
+				  offset_to_ptr(&fips140_rela_rodata.offset),
+				  fips140_rela_rodata.count);
+
+	fips140_inject_integrity_failure(textcopy);
+
+	tfm = crypto_alloc_shash("hmac(sha256)", 0, 0);
+	if (IS_ERR(tfm)) {
+		pr_err("failed to allocate hmac tfm (%ld)\n", PTR_ERR(tfm));
+		tfm = NULL;
+		goto out;
+	}
+	desc->tfm = tfm;
+
+	pr_info("using '%s' for integrity check\n",
+		crypto_shash_driver_name(tfm));
+
+	err = crypto_shash_setkey(tfm, fips140_integ_hmac_key,
+				  strlen(fips140_integ_hmac_key)) ?:
+	      crypto_shash_init(desc) ?:
+	      crypto_shash_update(desc, textcopy, textsize) ?:
+	      crypto_shash_finup(desc, rodatacopy, rodatasize, digest);
+
+	/* Zeroizing this is important; see the comment below. */
+	shash_desc_zero(desc);
+
+	if (err) {
+		pr_err("failed to calculate hmac shash (%d)\n", err);
+		goto out;
+	}
+
+	if (memcmp(digest, fips140_integ_hmac_digest, sizeof(digest))) {
+		pr_err("provided_digest  : %*phN\n", (int)sizeof(digest),
+		       fips140_integ_hmac_digest);
+
+		pr_err("calculated digest: %*phN\n", (int)sizeof(digest),
+		       digest);
+		goto out;
+	}
+	ok = true;
+out:
+	/*
+	 * FIPS 140-3 requires that all "temporary value(s) generated during the
+	 * integrity test" be zeroized (ref: FIPS 140-3 IG 9.7.B).  There is no
+	 * technical reason to do this given that these values are public
+	 * information, but this is the requirement so we follow it.
+	 */
+	crypto_free_shash(tfm);
+	memzero_explicit(digest, sizeof(digest));
+	kfree_sensitive(textcopy);
+	return ok;
+}
+
+static void fips140_sha256(void *p, const u8 *data, unsigned int len, u8 *out,
+			   int *hook_inuse)
+{
+	sha256(data, len, out);
+	*hook_inuse = 1;
+}
+
+static void fips140_aes_expandkey(void *p, struct crypto_aes_ctx *ctx,
+				  const u8 *in_key, unsigned int key_len,
+				  int *err)
+{
+	*err = aes_expandkey(ctx, in_key, key_len);
+}
+
+static void fips140_aes_encrypt(void *priv, const struct crypto_aes_ctx *ctx,
+				u8 *out, const u8 *in, int *hook_inuse)
+{
+	aes_encrypt(ctx, out, in);
+	*hook_inuse = 1;
+}
+
+static void fips140_aes_decrypt(void *priv, const struct crypto_aes_ctx *ctx,
+				u8 *out, const u8 *in, int *hook_inuse)
+{
+	aes_decrypt(ctx, out, in);
+	*hook_inuse = 1;
+}
+
+static bool update_fips140_library_routines(void)
+{
+	int ret;
+
+	ret = register_trace_android_vh_sha256(fips140_sha256, NULL) ?:
+	      register_trace_android_vh_aes_expandkey(fips140_aes_expandkey, NULL) ?:
+	      register_trace_android_vh_aes_encrypt(fips140_aes_encrypt, NULL) ?:
+	      register_trace_android_vh_aes_decrypt(fips140_aes_decrypt, NULL);
+
+	return ret == 0;
+}
+
+/*
+ * Initialize the FIPS 140 module.
+ *
+ * Note: this routine iterates over the contents of the initcall section, which
+ * consists of an array of function pointers that was emitted by the linker
+ * rather than the compiler. This means that these function pointers lack the
+ * usual CFI stubs that the compiler emits when CFI codegen is enabled. So
+ * let's disable CFI locally when handling the initcall array, to avoid
+ * surprises.
+ */
+static int __init __attribute__((__no_sanitize__("cfi")))
+fips140_init(void)
+{
+	const u32 *initcall;
+
+	pr_info("loading " FIPS140_MODULE_NAME " " FIPS140_MODULE_VERSION "\n");
+	fips140_init_thread = current;
+
+	unregister_existing_fips140_algos();
+
+	/* iterate over all init routines present in this module and call them */
+	for (initcall = __initcall_start + 1;
+	     initcall < &__initcall_end_marker;
+	     initcall++) {
+		int (*init)(void) = offset_to_ptr(initcall);
+		int err = init();
+
+		/*
+		 * ENODEV is expected from initcalls that only register
+		 * algorithms that depend on non-present CPU features.  Besides
+		 * that, errors aren't expected here.
+		 */
+		if (err && err != -ENODEV) {
+			pr_err("initcall %ps() failed: %d\n", init, err);
+			goto panic;
+		}
+	}
+
+	if (!fips140_run_selftests())
+		goto panic;
+
+	/*
+	 * It may seem backward to perform the integrity check last, but this
+	 * is intentional: the check itself uses hmac(sha256) which is one of
+	 * the algorithms that are replaced with versions from this module, and
+	 * the integrity check must use the replacement version.  Also, to be
+	 * ready for FIPS 140-3, the integrity check algorithm must have already
+	 * been self-tested.
+	 */
+
+	if (!check_fips140_module_hmac()) {
+		pr_crit("integrity check failed -- giving up!\n");
+		goto panic;
+	}
+	pr_info("integrity check passed\n");
+
+	complete_all(&fips140_tests_done);
+
+	if (!update_fips140_library_routines())
+		goto panic;
+
+	if (!fips140_eval_testing_init())
+		goto panic;
+
+	pr_info("module successfully loaded\n");
+	return 0;
+
+panic:
+	panic("FIPS 140 module load failure");
+}
+
+module_init(fips140_init);
+
+MODULE_IMPORT_NS(CRYPTO_INTERNAL);
+MODULE_LICENSE("GPL v2");
+
+/*
+ * Below are copies of some selected "crypto-related" helper functions that are
+ * used by fips140.ko but are not already built into it, due to them being
+ * defined in a file that cannot easily be built into fips140.ko (e.g.,
+ * crypto/algapi.c) instead of one that can (e.g., most files in lib/).
+ *
+ * There is no hard rule about what needs to be included here, as this is for
+ * FIPS certifiability, not any technical reason.  FIPS modules are supposed to
+ * implement the "crypto" themselves, but to do so they are allowed to call
+ * non-cryptographic helper functions from outside the module.  Something like
+ * memcpy() is "clearly" non-cryptographic.  However, there is ambiguity
+ * about functions like crypto_inc() which aren't cryptographic by themselves,
+ * but are more closely associated with cryptography than e.g. memcpy().  To err
+ * on the side of caution, we define copies of some selected functions below so
+ * that calls to them from within fips140.ko will remain in fips140.ko.
+ */
+
+static inline void crypto_inc_byte(u8 *a, unsigned int size)
+{
+	u8 *b = (a + size);
+	u8 c;
+
+	for (; size; size--) {
+		c = *--b + 1;
+		*b = c;
+		if (c)
+			break;
+	}
+}
+
+void crypto_inc(u8 *a, unsigned int size)
+{
+	__be32 *b = (__be32 *)(a + size);
+	u32 c;
+
+	if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
+	    IS_ALIGNED((unsigned long)b, __alignof__(*b)))
+		for (; size >= 4; size -= 4) {
+			c = be32_to_cpu(*--b) + 1;
+			*b = cpu_to_be32(c);
+			if (likely(c))
+				return;
+		}
+
+	crypto_inc_byte(a, size);
+}
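For reference, crypto_inc() steps the buffer as a single big-endian counter (the CTR-mode IV convention); a small hypothetical usage sketch:

#include <crypto/algapi.h>

/* Not part of the patch; demonstrates the carry propagation. */
static void demo_ctr_step(void)
{
	u8 ctr[16] = { [15] = 0xff };

	crypto_inc(ctr, sizeof(ctr));
	/* The low byte wraps to 0x00 and the carry sets ctr[14] to 0x01. */
}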
diff --git a/crypto/fips140-module.h b/crypto/fips140-module.h
new file mode 100644
index 0000000..a2a6319
--- /dev/null
+++ b/crypto/fips140-module.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright 2021 Google LLC
+ */
+
+#ifndef _CRYPTO_FIPS140_MODULE_H
+#define _CRYPTO_FIPS140_MODULE_H
+
+#include <linux/completion.h>
+#include <linux/module.h>
+#include <generated/utsrelease.h>
+
+#undef pr_fmt
+#define pr_fmt(fmt) "fips140: " fmt
+
+/*
+ * This is the name and version number of the module that are shown on the FIPS
+ * certificate.
+ */
+#define FIPS140_MODULE_NAME "Android Kernel Cryptographic Module"
+#define FIPS140_MODULE_VERSION UTS_RELEASE
+
+/* fips140-eval-testing.c */
+#ifdef CONFIG_CRYPTO_FIPS140_MOD_EVAL_TESTING
+void fips140_inject_selftest_failure(const char *impl, u8 *result);
+void fips140_inject_integrity_failure(u8 *textcopy);
+bool fips140_eval_testing_init(void);
+#else
+static inline void fips140_inject_selftest_failure(const char *impl, u8 *result)
+{
+}
+static inline void fips140_inject_integrity_failure(u8 *textcopy)
+{
+}
+static inline bool fips140_eval_testing_init(void)
+{
+	return true;
+}
+#endif /* !CONFIG_CRYPTO_FIPS140_MOD_EVAL_TESTING */
+
+/* fips140-module.c */
+extern struct completion fips140_tests_done;
+extern struct task_struct *fips140_init_thread;
+bool fips140_is_approved_service(const char *name);
+const char *fips140_module_version(void);
+
+/* fips140-selftests.c */
+bool __init __must_check fips140_run_selftests(void);
+
+#endif /* _CRYPTO_FIPS140_MODULE_H */
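A hedged sketch of how another kernel component might consult the indicator functions declared above (demo_report_fips_status() is hypothetical):

#include <linux/printk.h>
#include "fips140-module.h"

/* Illustrative caller of the exported service-indicator API. */
static void demo_report_fips_status(void)
{
	pr_info("%s: cbc(aes) approved=%d\n",
		fips140_module_version(),
		fips140_is_approved_service("cbc(aes)"));
}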
diff --git a/crypto/fips140-refs.S b/crypto/fips140-refs.S
new file mode 100644
index 0000000..fcbd527
--- /dev/null
+++ b/crypto/fips140-refs.S
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright 2021 Google LLC
+ * Author: Ard Biesheuvel <ardb@google.com>
+ *
+ * This file contains the variable definitions that will be used by the FIPS140
+ * s/w module to access the RELA sections in the ELF image. These are used to
+ * apply the relocations applied by the module loader in reverse, so that we
+ * can reconstruct the image that was used to derive the HMAC used by the
+ * integrity check.
+ *
+ * The first .long of each entry will be populated by the module loader based
+ * on the actual placement of the respective RELA section in memory. The second
+ * .long carries the RELA entry count, and is populated by the host tool that
+ * also generates the HMAC of the contents of .text and .rodata.
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+	.section	".init.rodata", "a"
+
+	.align	2
+	.globl	fips140_rela_text
+fips140_rela_text:
+	.weak	__sec_rela_text
+	.long	__sec_rela_text - .
+	.long	0
+
+	.globl	fips140_rela_rodata
+fips140_rela_rodata:
+	.weak	__sec_rela_rodata
+	.long	__sec_rela_rodata - .
+	.long	0
diff --git a/crypto/fips140-selftests.c b/crypto/fips140-selftests.c
new file mode 100644
index 0000000..9663b1a
--- /dev/null
+++ b/crypto/fips140-selftests.c
@@ -0,0 +1,998 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2021 Google LLC
+ *
+ * Authors: Elena Petrova <lenaptr@google.com>,
+ *          Eric Biggers <ebiggers@google.com>
+ *
+ * Self-tests of fips140.ko cryptographic functionality.  These are run at
+ * module load time to fulfill FIPS 140 and NIAP FPT_TST_EXT.1 requirements.
+ *
+ * The actual requirements for these self-tests are somewhat vague, but
+ * section 9 ("Self-Tests") of the FIPS 140-2 Implementation Guidance document
+ * (https://csrc.nist.gov/csrc/media/projects/cryptographic-module-validation-program/documents/fips140-2/fips1402ig.pdf)
+ * is somewhat helpful.  Basically, all implementations of all FIPS approved
+ * algorithms (including modes of operation) must be tested.  However:
+ *
+ *   - There are provisions for skipping tests that are already sufficiently
+ *     covered by other tests.  E.g., HMAC-SHA256 may cover SHA-256.
+ *
+ *   - Only one test vector is required per algorithm, and it can be generated
+ *     by any known-good implementation or taken from any official document.
+ *
+ *   - For ciphers, both encryption and decryption must be tested.
+ *
+ *   - Only one key size per algorithm needs to be tested.
+ *
+ * There is some ambiguity about whether all implementations of each algorithm
+ * must be tested, or whether it is sufficient to test just the highest priority
+ * implementation.  To be safe we test all implementations, except ones that can
+ * be excluded by one of the rules above.
+ *
+ * See fips140_selftests[] for the list of tests we've selected.  Currently, all
+ * our test vectors except the AES-CBC-CTS and DRBG ones were generated by the
+ * script tools/crypto/gen_fips140_testvecs.py, using the known-good
+ * implementations in the Python packages hashlib, pycryptodome, and
+ * cryptography.
+ *
+ * Note that we don't reuse the upstream crypto API's self-tests
+ * (crypto/testmgr.{c,h}), for several reasons:
+ *
+ *   - To meet FIPS requirements, the self-tests must be located within the FIPS
+ *     module boundary (fips140.ko).  But testmgr is integrated into the crypto
+ *     API framework and can't be extracted into the module.
+ *
+ *   - testmgr is much more heavyweight than required for FIPS and NIAP; it
+ *     tests more algorithms and does more tests per algorithm, as it's meant to
+ *     do proper testing and not just meet certification requirements.  We need
+ *     tests that can run with minimal overhead on every boot-up.
+ *
+ *   - Despite being more heavyweight in general, testmgr doesn't test the
+ *     SHA-256 and AES library APIs, even though that is needed here.
+ */
+#include <crypto/aead.h>
+#include <crypto/aes.h>
+#include <crypto/drbg.h>
+#include <crypto/hash.h>
+#include <crypto/rng.h>
+#include <crypto/sha2.h>
+#include <crypto/skcipher.h>
+
+#include "fips140-module.h"
+
+/* Test vector for an AEAD algorithm */
+struct aead_testvec {
+	const u8 *key;
+	size_t key_size;
+	const u8 *iv;
+	size_t iv_size;
+	const u8 *assoc;
+	size_t assoc_size;
+	const u8 *plaintext;
+	size_t plaintext_size;
+	const u8 *ciphertext;
+	size_t ciphertext_size;
+};
+
+/* Test vector for a length-preserving encryption algorithm */
+struct skcipher_testvec {
+	const u8 *key;
+	size_t key_size;
+	const u8 *iv;
+	size_t iv_size;
+	const u8 *plaintext;
+	const u8 *ciphertext;
+	size_t message_size;
+};
+
+/* Test vector for a hash algorithm */
+struct hash_testvec {
+	const u8 *key;
+	size_t key_size;
+	const u8 *message;
+	size_t message_size;
+	const u8 *digest;
+	size_t digest_size;
+};
+
+/* Test vector for a DRBG algorithm */
+struct drbg_testvec {
+	const u8 *entropy;
+	size_t entropy_size;
+	const u8 *pers;
+	size_t pers_size;
+	const u8 *entpr_a;
+	const u8 *entpr_b;
+	size_t entpr_size;
+	const u8 *add_a;
+	const u8 *add_b;
+	size_t add_size;
+	const u8 *output;
+	size_t out_size;
+};
+
+struct fips_test {
+	/* The name of the algorithm, in crypto API syntax */
+	const char *alg;
+
+	/*
+	 * The optional list of implementations to test.  @func will be called
+	 * once per implementation, or once with @alg if this list is empty.
+	 * The implementation names must be given in crypto API syntax; for a
+	 * library implementation, "-lib" should be appended to the name.
+	 */
+	const char *impls[8];
+
+	/*
+	 * The test function.  It should execute a known-answer test on an
+	 * algorithm implementation, using the below test vector.
+	 */
+	int __must_check (*func)(const struct fips_test *test,
+				 const char *impl);
+
+	/* The test vector, with a format specific to the type of algorithm */
+	union {
+		struct aead_testvec aead;
+		struct skcipher_testvec skcipher;
+		struct hash_testvec hash;
+		struct drbg_testvec drbg;
+	};
+};
+
+/* Maximum IV size (in bytes) among any algorithm tested here */
+#define MAX_IV_SIZE	16
+
+static int __init __must_check
+fips_check_result(u8 *result, const u8 *expected_result, size_t result_size,
+		  const char *impl, const char *operation)
+{
+	fips140_inject_selftest_failure(impl, result);
+	if (memcmp(result, expected_result, result_size) != 0) {
+		pr_err("wrong result from %s %s\n", impl, operation);
+		return -EBADMSG;
+	}
+	return 0;
+}
+
+/*
+ * None of the algorithms should be ASYNC, as the FIPS module doesn't register
+ * any ASYNC algorithms.  (The ASYNC flag is only declared by hardware
+ * algorithms, which would need their own FIPS certification.)
+ *
+ * Ideally we would verify alg->cra_module == THIS_MODULE here as well, but that
+ * doesn't work because the files are compiled as built-in code.
+ */
+static int __init __must_check
+fips_validate_alg(const struct crypto_alg *alg)
+{
+	if (alg->cra_flags & CRYPTO_ALG_ASYNC) {
+		pr_err("unexpectedly got async implementation of %s (%s)\n",
+		       alg->cra_name, alg->cra_driver_name);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int __init __must_check
+fips_handle_alloc_tfm_error(const char *impl, int err)
+{
+	if (err == -ENOENT) {
+		/*
+		 * The requested implementation of the algorithm wasn't found.
+		 * This is expected if the CPU lacks a feature the
+		 * implementation needs, such as the ARMv8 Crypto Extensions.
+		 *
+		 * When this happens, the implementation isn't available for
+		 * use, so we can't test it, nor do we need to.  So we just skip
+		 * the test.
+		 */
+		pr_info("%s is unavailable (no CPU support?), skipping testing it\n",
+			impl);
+		return 0;
+	}
+	pr_err("failed to allocate %s tfm: %d\n", impl, err);
+	return err;
+}
+
+static int __init __must_check
+fips_test_aes_library(const struct fips_test *test, const char *impl)
+{
+	const struct skcipher_testvec *vec = &test->skcipher;
+	struct crypto_aes_ctx ctx;
+	u8 block[AES_BLOCK_SIZE];
+	int err;
+
+	if (WARN_ON(vec->message_size != AES_BLOCK_SIZE))
+		return -EINVAL;
+
+	err = aes_expandkey(&ctx, vec->key, vec->key_size);
+	if (err) {
+		pr_err("aes_expandkey() failed: %d\n", err);
+		return err;
+	}
+	aes_encrypt(&ctx, block, vec->plaintext);
+	err = fips_check_result(block, vec->ciphertext, AES_BLOCK_SIZE,
+				impl, "encryption");
+	if (err)
+		return err;
+	aes_decrypt(&ctx, block, block);
+	return fips_check_result(block, vec->plaintext, AES_BLOCK_SIZE,
+				 impl, "decryption");
+}
+
+/* Test a length-preserving symmetric cipher using the crypto_skcipher API. */
+static int __init __must_check
+fips_test_skcipher(const struct fips_test *test, const char *impl)
+{
+	const struct skcipher_testvec *vec = &test->skcipher;
+	struct crypto_skcipher *tfm;
+	struct skcipher_request *req = NULL;
+	u8 *message = NULL;
+	struct scatterlist sg;
+	u8 iv[MAX_IV_SIZE];
+	int err;
+
+	if (WARN_ON(vec->iv_size > MAX_IV_SIZE))
+		return -EINVAL;
+	if (WARN_ON(vec->message_size == 0))
+		return -EINVAL;
+
+	tfm = crypto_alloc_skcipher(impl, 0, 0);
+	if (IS_ERR(tfm))
+		return fips_handle_alloc_tfm_error(impl, PTR_ERR(tfm));
+	err = fips_validate_alg(&crypto_skcipher_alg(tfm)->base);
+	if (err)
+		goto out;
+	if (crypto_skcipher_ivsize(tfm) != vec->iv_size) {
+		pr_err("%s has wrong IV size\n", impl);
+		err = -EINVAL;
+		goto out;
+	}
+
+	req = skcipher_request_alloc(tfm, GFP_KERNEL);
+	message = kmemdup(vec->plaintext, vec->message_size, GFP_KERNEL);
+	if (!req || !message) {
+		err = -ENOMEM;
+		goto out;
+	}
+	sg_init_one(&sg, message, vec->message_size);
+
+	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP,
+				      NULL, NULL);
+	skcipher_request_set_crypt(req, &sg, &sg, vec->message_size, iv);
+
+	err = crypto_skcipher_setkey(tfm, vec->key, vec->key_size);
+	if (err) {
+		pr_err("failed to set %s key: %d\n", impl, err);
+		goto out;
+	}
+
+	/* Encrypt the plaintext, then verify the resulting ciphertext. */
+	memcpy(iv, vec->iv, vec->iv_size);
+	err = crypto_skcipher_encrypt(req);
+	if (err) {
+		pr_err("%s encryption failed: %d\n", impl, err);
+		goto out;
+	}
+	err = fips_check_result(message, vec->ciphertext, vec->message_size,
+				impl, "encryption");
+	if (err)
+		goto out;
+
+	/* Decrypt the ciphertext, then verify the resulting plaintext. */
+	memcpy(iv, vec->iv, vec->iv_size);
+	err = crypto_skcipher_decrypt(req);
+	if (err) {
+		pr_err("%s decryption failed: %d\n", impl, err);
+		goto out;
+	}
+	err = fips_check_result(message, vec->plaintext, vec->message_size,
+				impl, "decryption");
+out:
+	kfree(message);
+	skcipher_request_free(req);
+	crypto_free_skcipher(tfm);
+	return err;
+}
+
+/* Test an AEAD using the crypto_aead API. */
+static int __init __must_check
+fips_test_aead(const struct fips_test *test, const char *impl)
+{
+	const struct aead_testvec *vec = &test->aead;
+	const int tag_size = vec->ciphertext_size - vec->plaintext_size;
+	struct crypto_aead *tfm;
+	struct aead_request *req = NULL;
+	u8 *assoc = NULL;
+	u8 *message = NULL;
+	struct scatterlist sg[2];
+	int sg_idx = 0;
+	u8 iv[MAX_IV_SIZE];
+	int err;
+
+	if (WARN_ON(vec->iv_size > MAX_IV_SIZE))
+		return -EINVAL;
+	if (WARN_ON(vec->ciphertext_size <= vec->plaintext_size))
+		return -EINVAL;
+
+	tfm = crypto_alloc_aead(impl, 0, 0);
+	if (IS_ERR(tfm))
+		return fips_handle_alloc_tfm_error(impl, PTR_ERR(tfm));
+	err = fips_validate_alg(&crypto_aead_alg(tfm)->base);
+	if (err)
+		goto out;
+	if (crypto_aead_ivsize(tfm) != vec->iv_size) {
+		pr_err("%s has wrong IV size\n", impl);
+		err = -EINVAL;
+		goto out;
+	}
+
+	req = aead_request_alloc(tfm, GFP_KERNEL);
+	assoc = kmemdup(vec->assoc, vec->assoc_size, GFP_KERNEL);
+	message = kzalloc(vec->ciphertext_size, GFP_KERNEL);
+	if (!req || !assoc || !message) {
+		err = -ENOMEM;
+		goto out;
+	}
+	memcpy(message, vec->plaintext, vec->plaintext_size);
+
+	sg_init_table(sg, ARRAY_SIZE(sg));
+	if (vec->assoc_size)
+		sg_set_buf(&sg[sg_idx++], assoc, vec->assoc_size);
+	sg_set_buf(&sg[sg_idx++], message, vec->ciphertext_size);
+
+	aead_request_set_ad(req, vec->assoc_size);
+	aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, NULL, NULL);
+
+	err = crypto_aead_setkey(tfm, vec->key, vec->key_size);
+	if (err) {
+		pr_err("failed to set %s key: %d\n", impl, err);
+		goto out;
+	}
+
+	err = crypto_aead_setauthsize(tfm, tag_size);
+	if (err) {
+		pr_err("failed to set %s authentication tag size: %d\n",
+		       impl, err);
+		goto out;
+	}
+
+	/*
+	 * Encrypt the plaintext, then verify the resulting ciphertext (which
+	 * includes the authentication tag).
+	 */
+	memcpy(iv, vec->iv, vec->iv_size);
+	aead_request_set_crypt(req, sg, sg, vec->plaintext_size, iv);
+	err = crypto_aead_encrypt(req);
+	if (err) {
+		pr_err("%s encryption failed: %d\n", impl, err);
+		goto out;
+	}
+	err = fips_check_result(message, vec->ciphertext, vec->ciphertext_size,
+				impl, "encryption");
+	if (err)
+		goto out;
+
+	/*
+	 * Decrypt the ciphertext (which includes the authentication tag), then
+	 * verify the resulting plaintext.
+	 */
+	memcpy(iv, vec->iv, vec->iv_size);
+	aead_request_set_crypt(req, sg, sg, vec->ciphertext_size, iv);
+	err = crypto_aead_decrypt(req);
+	if (err) {
+		pr_err("%s decryption failed: %d\n", impl, err);
+		goto out;
+	}
+	err = fips_check_result(message, vec->plaintext, vec->plaintext_size,
+				impl, "decryption");
+out:
+	kfree(message);
+	kfree(assoc);
+	aead_request_free(req);
+	crypto_free_aead(tfm);
+	return err;
+}
+
+/*
+ * Test a hash algorithm using the crypto_shash API.
+ *
+ * Note that we don't need to test the crypto_ahash API too, since none of the
+ * hash algorithms in the FIPS module have the ASYNC flag, and thus there will
+ * be no hash algorithms that can be accessed only through crypto_ahash.
+ */
+static int __init __must_check
+fips_test_hash(const struct fips_test *test, const char *impl)
+{
+	const struct hash_testvec *vec = &test->hash;
+	struct crypto_shash *tfm;
+	u8 digest[HASH_MAX_DIGESTSIZE];
+	int err;
+
+	if (WARN_ON(vec->digest_size > HASH_MAX_DIGESTSIZE))
+		return -EINVAL;
+
+	tfm = crypto_alloc_shash(impl, 0, 0);
+	if (IS_ERR(tfm))
+		return fips_handle_alloc_tfm_error(impl, PTR_ERR(tfm));
+	err = fips_validate_alg(&crypto_shash_alg(tfm)->base);
+	if (err)
+		goto out;
+	if (crypto_shash_digestsize(tfm) != vec->digest_size) {
+		pr_err("%s has wrong digest size\n", impl);
+		err = -EINVAL;
+		goto out;
+	}
+
+	if (vec->key) {
+		err = crypto_shash_setkey(tfm, vec->key, vec->key_size);
+		if (err) {
+			pr_err("failed to set %s key: %d\n", impl, err);
+			goto out;
+		}
+	}
+
+	err = crypto_shash_tfm_digest(tfm, vec->message, vec->message_size,
+				      digest);
+	if (err) {
+		pr_err("%s digest computation failed: %d\n", impl, err);
+		goto out;
+	}
+	err = fips_check_result(digest, vec->digest, vec->digest_size,
+				impl, "digest");
+out:
+	crypto_free_shash(tfm);
+	return err;
+}
+
+static int __init __must_check
+fips_test_sha256_library(const struct fips_test *test, const char *impl)
+{
+	const struct hash_testvec *vec = &test->hash;
+	u8 digest[SHA256_DIGEST_SIZE];
+
+	if (WARN_ON(vec->digest_size != SHA256_DIGEST_SIZE))
+		return -EINVAL;
+
+	sha256(vec->message, vec->message_size, digest);
+	return fips_check_result(digest, vec->digest, vec->digest_size,
+				 impl, "digest");
+}
+
+/* Test a DRBG using the crypto_rng API. */
+static int __init __must_check
+fips_test_drbg(const struct fips_test *test, const char *impl)
+{
+	const struct drbg_testvec *vec = &test->drbg;
+	struct crypto_rng *rng;
+	u8 *output = NULL;
+	struct drbg_test_data test_data;
+	struct drbg_string addtl, pers, testentropy;
+	int err;
+
+	rng = crypto_alloc_rng(impl, 0, 0);
+	if (IS_ERR(rng))
+		return fips_handle_alloc_tfm_error(impl, PTR_ERR(rng));
+	err = fips_validate_alg(&crypto_rng_alg(rng)->base);
+	if (err)
+		goto out;
+
+	output = kzalloc(vec->out_size, GFP_KERNEL);
+	if (!output) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	/*
+	 * Initialize the DRBG with the entropy and personalization string given
+	 * in the test vector.
+	 */
+	test_data.testentropy = &testentropy;
+	drbg_string_fill(&testentropy, vec->entropy, vec->entropy_size);
+	drbg_string_fill(&pers, vec->pers, vec->pers_size);
+	err = crypto_drbg_reset_test(rng, &pers, &test_data);
+	if (err) {
+		pr_err("failed to reset %s\n", impl);
+		goto out;
+	}
+
+	/*
+	 * Generate some random bytes using the additional data string provided
+	 * in the test vector.  Also use the additional entropy if provided
+	 * (relevant for the prediction-resistant DRBG variants only).
+	 */
+	drbg_string_fill(&addtl, vec->add_a, vec->add_size);
+	if (vec->entpr_size) {
+		drbg_string_fill(&testentropy, vec->entpr_a, vec->entpr_size);
+		err = crypto_drbg_get_bytes_addtl_test(rng, output,
+						       vec->out_size, &addtl,
+						       &test_data);
+	} else {
+		err = crypto_drbg_get_bytes_addtl(rng, output, vec->out_size,
+						  &addtl);
+	}
+	if (err) {
+		pr_err("failed to get bytes from %s (try 1): %d\n",
+		       impl, err);
+		goto out;
+	}
+
+	/*
+	 * Do the same again, using a second additional data string, and (when
+	 * applicable) a second additional entropy string.
+	 */
+	drbg_string_fill(&addtl, vec->add_b, vec->add_size);
+	if (vec->entpr_size) {
+		drbg_string_fill(&testentropy, vec->entpr_b, vec->entpr_size);
+		err = crypto_drbg_get_bytes_addtl_test(rng, output,
+						       vec->out_size, &addtl,
+						       &test_data);
+	} else {
+		err = crypto_drbg_get_bytes_addtl(rng, output, vec->out_size,
+						  &addtl);
+	}
+	if (err) {
+		pr_err("failed to get bytes from %s (try 2): %d\n",
+		       impl, err);
+		goto out;
+	}
+
+	/* Check that the DRBG generated the expected output. */
+	err = fips_check_result(output, vec->output, vec->out_size,
+				impl, "get_bytes");
+out:
+	kfree(output);
+	crypto_free_rng(rng);
+	return err;
+}
+
+/* Include the test vectors generated by the Python script. */
+#include "fips140-generated-testvecs.h"
+
+/*
+ * List of all self-tests.  Keep this in sync with fips140_algorithms[].
+ *
+ * When possible, we have followed the FIPS 140-2 Implementation Guidance (IG)
+ * document when creating this list of tests.  The result is intended to be a
+ * list of tests that is near-minimal (and thus minimizes runtime overhead)
+ * while complying with all requirements.  For additional details, see the
+ * comment at the beginning of this file.
+ */
+static const struct fips_test fips140_selftests[] __initconst = {
+	/*
+	 * Test for the AES library API.
+	 *
+	 * Since the AES library API may use its own AES implementation and the
+	 * module provides no support for composing it with a mode of operation
+	 * (it's just plain AES), we must test it directly.
+	 *
+	 * In contrast, we don't need to directly test the "aes" ciphers that
+	 * are accessible through the crypto_cipher API (e.g. "aes-ce"), as they
+	 * are covered indirectly by AES-CMAC and AES-ECB tests.
+	 */
+	{
+		.alg		= "aes",
+		.impls		= {"aes-lib"},
+		.func		= fips_test_aes_library,
+		.skcipher	= {
+			.key		= fips_aes_key,
+			.key_size	= sizeof(fips_aes_key),
+			.plaintext	= fips_message,
+			.ciphertext	= fips_aes_ecb_ciphertext,
+			.message_size	= 16,
+		}
+	},
+	/*
+	 * Tests for AES-CMAC, a.k.a. "cmac(aes)" in crypto API syntax.
+	 *
+	 * The IG requires that each underlying AES implementation be tested in
+	 * an authenticated mode, if implemented.  Of such modes, this module
+	 * implements AES-GCM and AES-CMAC.  However, AES-GCM doesn't "count"
+	 * because this module's implementations of AES-GCM won't actually be
+	 * FIPS-approved, due to a quirk in the FIPS requirements.
+	 *
+	 * Therefore, for us this requirement applies to AES-CMAC, so we must
+	 * test the "cmac" template composed with each "aes" implementation.
+	 *
+	 * Separately from the above, we also must test all standalone
+	 * implementations of "cmac(aes)" such as "cmac-aes-ce", as they don't
+	 * reuse another full AES implementation and thus can't be covered by
+	 * another test.
+	 */
+	{
+		.alg		= "cmac(aes)",
+		.impls		= {
+			/* "cmac" template with all "aes" implementations */
+			"cmac(aes-generic)",
+			"cmac(aes-arm64)",
+			"cmac(aes-ce)",
+			/* All standalone implementations of "cmac(aes)" */
+			"cmac-aes-neon",
+			"cmac-aes-ce",
+		},
+		.func		= fips_test_hash,
+		.hash		= {
+			.key		= fips_aes_key,
+			.key_size	= sizeof(fips_aes_key),
+			.message	= fips_message,
+			.message_size	= sizeof(fips_message),
+			.digest		= fips_aes_cmac_digest,
+			.digest_size	= sizeof(fips_aes_cmac_digest),
+		}
+	},
+	/*
+	 * Tests for AES-ECB, a.k.a. "ecb(aes)" in crypto API syntax.
+	 *
+	 * The IG requires that each underlying AES implementation be tested in
+	 * a mode that exercises the encryption direction of AES and in a mode
+	 * that exercises the decryption direction of AES.  CMAC only covers the
+	 * encryption direction, so we choose ECB to test decryption.  Thus, we
+	 * test the "ecb" template composed with each "aes" implementation.
+	 *
+	 * Separately from the above, we also must test all standalone
+	 * implementations of "ecb(aes)" such as "ecb-aes-ce", as they don't
+	 * reuse another full AES implementation and thus can't be covered by
+	 * another test.
+	 */
+	{
+		.alg		= "ecb(aes)",
+		.impls		= {
+			/* "ecb" template with all "aes" implementations */
+			"ecb(aes-generic)",
+			"ecb(aes-arm64)",
+			"ecb(aes-ce)",
+			/* All standalone implementations of "ecb(aes)" */
+			"ecb-aes-neon",
+			"ecb-aes-neonbs",
+			"ecb-aes-ce",
+		},
+		.func		= fips_test_skcipher,
+		.skcipher	= {
+			.key		= fips_aes_key,
+			.key_size	= sizeof(fips_aes_key),
+			.plaintext	= fips_message,
+			.ciphertext	= fips_aes_ecb_ciphertext,
+			.message_size	= sizeof(fips_message)
+		}
+	},
+	/*
+	 * Tests for AES-CBC, AES-CBC-CTS, AES-CTR, AES-XTS, and AES-GCM.
+	 *
+	 * According to the IG, an AES mode of operation doesn't need to have
+	 * its own test, provided that (a) both the encryption and decryption
+	 * directions of the underlying AES implementation are already tested
+	 * via other mode(s), and (b) in the case of an authenticated mode, at
+	 * least one other authenticated mode is already tested.  The tests of
+	 * the "cmac" and "ecb" templates fulfill these conditions; therefore,
+	 * we don't need to test any other AES mode templates.
+	 *
+	 * This does *not* apply to standalone implementations of these modes
+	 * such as "cbc-aes-ce", as such implementations don't reuse another
+	 * full AES implementation and thus can't be covered by another test.
+	 * We must test all such standalone implementations.
+	 *
+	 * The AES-GCM test isn't actually required, as it's expected that this
+	 * module's AES-GCM implementation won't be able to be FIPS-approved.
+	 * This is unfortunate; it's caused by the FIPS
+	 * requirements for GCM being incompatible with GCM implementations that
+	 * don't generate their own IVs.  We choose to still include the AES-GCM
+	 * test to keep it on par with the other FIPS-approved algorithms, in
+	 * case it turns out that AES-GCM can be approved after all.
+	 */
+	{
+		.alg		= "cbc(aes)",
+		.impls		= {
+			/* All standalone implementations of "cbc(aes)" */
+			"cbc-aes-neon",
+			"cbc-aes-neonbs",
+			"cbc-aes-ce",
+		},
+		.func		= fips_test_skcipher,
+		.skcipher	= {
+			.key		= fips_aes_key,
+			.key_size	= sizeof(fips_aes_key),
+			.iv		= fips_aes_iv,
+			.iv_size	= sizeof(fips_aes_iv),
+			.plaintext	= fips_message,
+			.ciphertext	= fips_aes_cbc_ciphertext,
+			.message_size	= sizeof(fips_message),
+		}
+	}, {
+		.alg		= "cts(cbc(aes))",
+		.impls		= {
+			/* All standalone implementations of "cts(cbc(aes))" */
+			"cts-cbc-aes-neon",
+			"cts-cbc-aes-ce",
+		},
+		.func		= fips_test_skcipher,
+		/* Test vector taken from RFC 3962 */
+		.skcipher	= {
+			.key    = "\x63\x68\x69\x63\x6b\x65\x6e\x20"
+				  "\x74\x65\x72\x69\x79\x61\x6b\x69",
+			.key_size = 16,
+			.iv	= "\x00\x00\x00\x00\x00\x00\x00\x00"
+				  "\x00\x00\x00\x00\x00\x00\x00\x00",
+			.iv_size = 16,
+			.plaintext = "\x49\x20\x77\x6f\x75\x6c\x64\x20"
+				     "\x6c\x69\x6b\x65\x20\x74\x68\x65"
+				     "\x20\x47\x65\x6e\x65\x72\x61\x6c"
+				     "\x20\x47\x61\x75\x27\x73\x20",
+			.ciphertext = "\xfc\x00\x78\x3e\x0e\xfd\xb2\xc1"
+				      "\xd4\x45\xd4\xc8\xef\xf7\xed\x22"
+				      "\x97\x68\x72\x68\xd6\xec\xcc\xc0"
+				      "\xc0\x7b\x25\xe2\x5e\xcf\xe5",
+			.message_size = 31,
+		}
+	}, {
+		.alg		= "ctr(aes)",
+		.impls		= {
+			/* All standalone implementations of "ctr(aes)" */
+			"ctr-aes-neon",
+			"ctr-aes-neonbs",
+			"ctr-aes-ce",
+		},
+		.func		= fips_test_skcipher,
+		.skcipher	= {
+			.key		= fips_aes_key,
+			.key_size	= sizeof(fips_aes_key),
+			.iv		= fips_aes_iv,
+			.iv_size	= sizeof(fips_aes_iv),
+			.plaintext	= fips_message,
+			.ciphertext	= fips_aes_ctr_ciphertext,
+			.message_size	= sizeof(fips_message),
+		}
+	}, {
+		.alg		= "xts(aes)",
+		.impls		= {
+			/* All standalone implementations of "xts(aes)" */
+			"xts-aes-neon",
+			"xts-aes-neonbs",
+			"xts-aes-ce",
+		},
+		.func		= fips_test_skcipher,
+		.skcipher	= {
+			.key		= fips_aes_xts_key,
+			.key_size	= sizeof(fips_aes_xts_key),
+			.iv		= fips_aes_iv,
+			.iv_size	= sizeof(fips_aes_iv),
+			.plaintext	= fips_message,
+			.ciphertext	= fips_aes_xts_ciphertext,
+			.message_size	= sizeof(fips_message),
+		}
+	}, {
+		.alg		= "gcm(aes)",
+		.impls		= {
+			/* All standalone implementations of "gcm(aes)" */
+			"gcm-aes-ce",
+		},
+		.func		= fips_test_aead,
+		.aead		= {
+			.key		= fips_aes_key,
+			.key_size	= sizeof(fips_aes_key),
+			.iv		= fips_aes_iv,
+			/* The GCM implementations assume an IV size of 12. */
+			.iv_size	= 12,
+			.assoc		= fips_aes_gcm_assoc,
+			.assoc_size	= sizeof(fips_aes_gcm_assoc),
+			.plaintext	= fips_message,
+			.plaintext_size	= sizeof(fips_message),
+			.ciphertext	= fips_aes_gcm_ciphertext,
+			.ciphertext_size = sizeof(fips_aes_gcm_ciphertext),
+		}
+	},
+
+	/* Tests for SHA-1 */
+	{
+		.alg		= "sha1",
+		.impls		= {
+			/* All implementations of "sha1" */
+			"sha1-generic",
+			"sha1-ce"
+		},
+		.func		= fips_test_hash,
+		.hash		= {
+			.message	= fips_message,
+			.message_size	= sizeof(fips_message),
+			.digest		= fips_sha1_digest,
+			.digest_size	= sizeof(fips_sha1_digest)
+		}
+	},
+	/*
+	 * Tests for all SHA-256 implementations other than the sha256() library
+	 * function.  As per the IG, these tests also fulfill the tests for the
+	 * corresponding SHA-224 implementations.
+	 */
+	{
+		.alg		= "sha256",
+		.impls		= {
+			/* All implementations of "sha256" */
+			"sha256-generic",
+			"sha256-arm64",
+			"sha256-ce",
+		},
+		.func		= fips_test_hash,
+		.hash		= {
+			.message	= fips_message,
+			.message_size	= sizeof(fips_message),
+			.digest		= fips_sha256_digest,
+			.digest_size	= sizeof(fips_sha256_digest)
+		}
+	},
+	/*
+	 * Test for the sha256() library function.  This must be tested
+	 * separately because it may use its own SHA-256 implementation.
+	 */
+	{
+		.alg		= "sha256",
+		.impls		= {"sha256-lib"},
+		.func		= fips_test_sha256_library,
+		.hash		= {
+			.message	= fips_message,
+			.message_size	= sizeof(fips_message),
+			.digest		= fips_sha256_digest,
+			.digest_size	= sizeof(fips_sha256_digest)
+		}
+	},
+	/*
+	 * Tests for all SHA-512 implementations.  As per the IG, these tests
+	 * also fulfill the tests for the corresponding SHA-384 implementations.
+	 */
+	{
+		.alg		= "sha512",
+		.impls		= {
+			/* All implementations of "sha512" */
+			"sha512-generic",
+			"sha512-arm64",
+			"sha512-ce",
+		},
+		.func		= fips_test_hash,
+		.hash		= {
+			.message	= fips_message,
+			.message_size	= sizeof(fips_message),
+			.digest		= fips_sha512_digest,
+			.digest_size	= sizeof(fips_sha512_digest)
+		}
+	},
+	/*
+	 * Test for HMAC.  As per the IG, only one HMAC test is required,
+	 * provided that the same HMAC code is shared by all HMAC-SHA*.  This is
+	 * true in our case.  We choose HMAC-SHA256 for the test.
+	 *
+	 * Note that as per the IG, this can fulfill the test for the underlying
+	 * SHA.  However, we don't currently rely on this.
+	 */
+	{
+		.alg		= "hmac(sha256)",
+		.func		= fips_test_hash,
+		.hash		= {
+			.key		= fips_hmac_key,
+			.key_size	= sizeof(fips_hmac_key),
+			.message	= fips_message,
+			.message_size	= sizeof(fips_message),
+			.digest		= fips_hmac_sha256_digest,
+			.digest_size	= sizeof(fips_hmac_sha256_digest)
+		}
+	},
+	/*
+	 * Known-answer tests for the SP800-90A DRBG algorithms.
+	 *
+	 * These test vectors were manually extracted from
+	 * https://csrc.nist.gov/CSRC/media/Projects/Cryptographic-Algorithm-Validation-Program/documents/drbg/drbgtestvectors.zip.
+	 *
+	 * The selection of these tests follows the FIPS 140-2 IG as well as
+	 * Section 11 of SP800-90A:
+	 *
+	 * - We must test all DRBG types (HMAC, Hash, and CTR) that the module
+	 *   implements.  However, currently the module only implements
+	 *   HMAC_DRBG (since CONFIG_CRYPTO_DRBG_CTR and CONFIG_CRYPTO_DRBG_HASH
+	 *   aren't enabled).  Therefore, we only need to test HMAC_DRBG.
+	 *
+	 * - We only need to test one HMAC variant.
+	 *
+	 * - We must test all DRBG operations: Instantiate(), Reseed(), and
+	 *   Generate().  However, a single test sequence with a single output
+	 *   comparison may cover all three operations, and this is what we do.
+	 *   Note that Reseed() happens implicitly via the use of the additional
+	 *   input and also via the use of prediction resistance when enabled.
+	 *
+	 * - The personalization string, additional input, and prediction
+	 *   resistance support must be tested.  Therefore we have chosen test
+	 *   vectors that have a nonempty personalization string and nonempty
+	 *   additional input, and we test the prediction-resistant variant.
+	 *   Testing the non-prediction-resistant variant is not required.
+	 */
+	{
+		.alg	= "drbg_pr_hmac_sha256",
+		.func	= fips_test_drbg,
+		.drbg	= {
+			.entropy =
+				"\xc7\xcc\xbc\x67\x7e\x21\x66\x1e\x27\x2b\x63\xdd"
+				"\x3a\x78\xdc\xdf\x66\x6d\x3f\x24\xae\xcf\x37\x01"
+				"\xa9\x0d\x89\x8a\xa7\xdc\x81\x58\xae\xb2\x10\x15"
+				"\x7e\x18\x44\x6d\x13\xea\xdf\x37\x85\xfe\x81\xfb",
+			.entropy_size = 48,
+			.entpr_a =
+				"\x7b\xa1\x91\x5b\x3c\x04\xc4\x1b\x1d\x19\x2f\x1a"
+				"\x18\x81\x60\x3c\x6c\x62\x91\xb7\xe9\xf5\xcb\x96"
+				"\xbb\x81\x6a\xcc\xb5\xae\x55\xb6",
+			.entpr_b =
+				"\x99\x2c\xc7\x78\x7e\x3b\x88\x12\xef\xbe\xd3\xd2"
+				"\x7d\x2a\xa5\x86\xda\x8d\x58\x73\x4a\x0a\xb2\x2e"
+				"\xbb\x4c\x7e\xe3\x9a\xb6\x81\xc1",
+			.entpr_size = 32,
+			.output =
+				"\x95\x6f\x95\xfc\x3b\xb7\xfe\x3e\xd0\x4e\x1a\x14"
+				"\x6c\x34\x7f\x7b\x1d\x0d\x63\x5e\x48\x9c\x69\xe6"
+				"\x46\x07\xd2\x87\xf3\x86\x52\x3d\x98\x27\x5e\xd7"
+				"\x54\xe7\x75\x50\x4f\xfb\x4d\xfd\xac\x2f\x4b\x77"
+				"\xcf\x9e\x8e\xcc\x16\xa2\x24\xcd\x53\xde\x3e\xc5"
+				"\x55\x5d\xd5\x26\x3f\x89\xdf\xca\x8b\x4e\x1e\xb6"
+				"\x88\x78\x63\x5c\xa2\x63\x98\x4e\x6f\x25\x59\xb1"
+				"\x5f\x2b\x23\xb0\x4b\xa5\x18\x5d\xc2\x15\x74\x40"
+				"\x59\x4c\xb4\x1e\xcf\x9a\x36\xfd\x43\xe2\x03\xb8"
+				"\x59\x91\x30\x89\x2a\xc8\x5a\x43\x23\x7c\x73\x72"
+				"\xda\x3f\xad\x2b\xba\x00\x6b\xd1",
+			.out_size = 128,
+			.add_a =
+				"\x18\xe8\x17\xff\xef\x39\xc7\x41\x5c\x73\x03\x03"
+				"\xf6\x3d\xe8\x5f\xc8\xab\xe4\xab\x0f\xad\xe8\xd6"
+				"\x86\x88\x55\x28\xc1\x69\xdd\x76",
+			.add_b =
+				"\xac\x07\xfc\xbe\x87\x0e\xd3\xea\x1f\x7e\xb8\xe7"
+				"\x9d\xec\xe8\xe7\xbc\xf3\x18\x25\x77\x35\x4a\xaa"
+				"\x00\x99\x2a\xdd\x0a\x00\x50\x82",
+			.add_size = 32,
+			.pers =
+				"\xbc\x55\xab\x3c\xf6\x52\xb0\x11\x3d\x7b\x90\xb8"
+				"\x24\xc9\x26\x4e\x5a\x1e\x77\x0d\x3d\x58\x4a\xda"
+				"\xd1\x81\xe9\xf8\xeb\x30\x8f\x6f",
+			.pers_size = 32,
+		}
+	}
+};
+
+static int __init __must_check
+fips_run_test(const struct fips_test *test)
+{
+	int i;
+	int err;
+
+	/*
+	 * If no implementations were specified, then just test the default one.
+	 * Otherwise, test the specified list of implementations.
+	 */
+
+	if (test->impls[0] == NULL) {
+		err = test->func(test, test->alg);
+		if (err)
+			pr_emerg("self-tests failed for algorithm %s: %d\n",
+				 test->alg, err);
+		return err;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(test->impls) && test->impls[i] != NULL;
+	     i++) {
+		err = test->func(test, test->impls[i]);
+		if (err) {
+			pr_emerg("self-tests failed for algorithm %s, implementation %s: %d\n",
+				 test->alg, test->impls[i], err);
+			return err;
+		}
+	}
+	return 0;
+}
+
+bool __init fips140_run_selftests(void)
+{
+	int i;
+
+	pr_info("running self-tests\n");
+	for (i = 0; i < ARRAY_SIZE(fips140_selftests); i++) {
+		if (fips_run_test(&fips140_selftests[i]) != 0) {
+			/* The caller is responsible for calling panic(). */
+			return false;
+		}
+	}
+	pr_info("all self-tests passed\n");
+	return true;
+}
diff --git a/crypto/fips140_gen_hmac.c b/crypto/fips140_gen_hmac.c
new file mode 100644
index 0000000..69f754d
--- /dev/null
+++ b/crypto/fips140_gen_hmac.c
@@ -0,0 +1,194 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021 - Google LLC
+ * Author: Ard Biesheuvel <ardb@google.com>
+ *
+ * This is a host tool that is intended to be used to take the HMAC digest of
+ * the .text and .rodata sections of the fips140.ko module, and store it inside
+ * the module. The module will perform an integrity selfcheck at module_init()
+ * time, by recalculating the digest and comparing it with the value calculated
+ * here.
+ *
+ * Note that the peculiar way an HMAC is used here, as a digest with a public
+ * key rather than as a symmetric-key signature, is mandated by FIPS 140-2.
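+ *
+ * Assumed invocation (a sketch; the exact build integration may differ):
+ *
+ *	fips140_gen_hmac fips140.ko
+ *
+ * The file is mapped read-write and the computed digest is written in place
+ * into the module's fips140_integ_hmac_digest symbol.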
+ */
+
+#include <elf.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <openssl/hmac.h>
+
+static Elf64_Ehdr *ehdr;
+static Elf64_Shdr *shdr;
+static int num_shdr;
+static const char *strtab, *shstrtab;
+static Elf64_Sym *syms;
+static int num_syms;
+
+static Elf64_Shdr *find_symtab_section(void)
+{
+	int i;
+
+	for (i = 0; i < num_shdr; i++)
+		if (shdr[i].sh_type == SHT_SYMTAB)
+			return &shdr[i];
+	return NULL;
+}
+
+static int get_section_idx(const char *name)
+{
+	int i;
+
+	for (i = 0; i < num_shdr; i++)
+		if (!strcmp(shstrtab + shdr[i].sh_name, name))
+			return i;
+	return -1;
+}
+
+static int get_sym_idx(const char *sym_name)
+{
+	int i;
+
+	for (i = 0; i < num_syms; i++)
+		if (!strcmp(strtab + syms[i].st_name, sym_name))
+			return i;
+	return -1;
+}
+
+static void *get_sym_addr(const char *sym_name)
+{
+	int i = get_sym_idx(sym_name);
+
+	if (i >= 0)
+		return (void *)ehdr + shdr[syms[i].st_shndx].sh_offset +
+		       syms[i].st_value;
+	return NULL;
+}
+
+static int update_rela_ref(const char *name)
+{
+	/*
+	 * We need to do a couple of things to ensure that the copied RELA data
+	 * is accessible to the module itself at module init time:
+	 * - the associated entry in the symbol table needs to refer to the
+	 *   correct section index, and have SECTION type and GLOBAL linkage.
+	 * - the 'count' global variable in the module needs to be set to the
+	 *   right value based on the size of the RELA section.
+	 */
+	unsigned int *size_var;
+	int sec_idx, sym_idx;
+	char str[32];
+
+	sprintf(str, "fips140_rela_%s", name);
+	size_var = get_sym_addr(str);
+	if (!size_var) {
+		printf("variable '%s' not found, disregarding .%s section\n",
+		       str, name);
+		return 1;
+	}
+
+	sprintf(str, "__sec_rela_%s", name);
+	sym_idx = get_sym_idx(str);
+
+	sprintf(str, ".init.rela.%s", name);
+	sec_idx = get_section_idx(str);
+
+	if (sec_idx < 0 || sym_idx < 0) {
+		fprintf(stderr, "failed to locate metadata for .%s section in binary\n",
+			name);
+		return 0;
+	}
+
+	syms[sym_idx].st_shndx = sec_idx;
+	syms[sym_idx].st_info = (STB_GLOBAL << 4) | STT_SECTION;
+
+	size_var[1] = shdr[sec_idx].sh_size / sizeof(Elf64_Rela);
+
+	return 1;
+}
+
+static void hmac_section(HMAC_CTX *hmac, const char *start, const char *end)
+{
+	void *start_addr = get_sym_addr(start);
+	void *end_addr = get_sym_addr(end);
+
+	HMAC_Update(hmac, start_addr, end_addr - start_addr);
+}
+
+int main(int argc, char **argv)
+{
+	Elf64_Shdr *symtab_shdr;
+	const char *hmac_key;
+	unsigned char *dg;
+	unsigned int dglen;
+	struct stat stat;
+	HMAC_CTX *hmac;
+	int fd, ret;
+
+	if (argc < 2) {
+		fprintf(stderr, "file argument missing\n");
+		exit(EXIT_FAILURE);
+	}
+
+	fd = open(argv[1], O_RDWR);
+	if (fd < 0) {
+		fprintf(stderr, "failed to open %s\n", argv[1]);
+		exit(EXIT_FAILURE);
+	}
+
+	ret = fstat(fd, &stat);
+	if (ret < 0) {
+		fprintf(stderr, "failed to stat() %s\n", argv[1]);
+		exit(EXIT_FAILURE);
+	}
+
+	ehdr = mmap(0, stat.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	if (ehdr == MAP_FAILED) {
+		fprintf(stderr, "failed to mmap() %s\n", argv[1]);
+		exit(EXIT_FAILURE);
+	}
+
+	shdr = (void *)ehdr + ehdr->e_shoff;
+	num_shdr = ehdr->e_shnum;
+
+	symtab_shdr = find_symtab_section();
+
+	syms = (void *)ehdr + symtab_shdr->sh_offset;
+	num_syms = symtab_shdr->sh_size / sizeof(Elf64_Sym);
+
+	strtab = (void *)ehdr + shdr[symtab_shdr->sh_link].sh_offset;
+	shstrtab = (void *)ehdr + shdr[ehdr->e_shstrndx].sh_offset;
+
+	if (!update_rela_ref("text") || !update_rela_ref("rodata"))
+		exit(EXIT_FAILURE);
+
+	hmac_key = get_sym_addr("fips140_integ_hmac_key");
+	if (!hmac_key) {
+		fprintf(stderr, "failed to locate HMAC key in binary\n");
+		exit(EXIT_FAILURE);
+	}
+
+	dg = get_sym_addr("fips140_integ_hmac_digest");
+	if (!dg) {
+		fprintf(stderr, "failed to locate HMAC digest in binary\n");
+		exit(EXIT_FAILURE);
+	}
+
+	hmac = HMAC_CTX_new();
+	HMAC_Init_ex(hmac, hmac_key, strlen(hmac_key), EVP_sha256(), NULL);
+
+	hmac_section(hmac, "__fips140_text_start", "__fips140_text_end");
+	hmac_section(hmac, "__fips140_rodata_start", "__fips140_rodata_end");
+
+	HMAC_Final(hmac, dg, &dglen);
+
+	close(fd);
+	return 0;
+}
diff --git a/crypto/internal.h b/crypto/internal.h
index c083855..932f0aa 100644
--- a/crypto/internal.h
+++ b/crypto/internal.h
@@ -47,7 +47,25 @@ extern struct list_head crypto_alg_list;
 extern struct rw_semaphore crypto_alg_sem;
 extern struct blocking_notifier_head crypto_chain;
 
-DECLARE_STATIC_KEY_FALSE(crypto_boot_test_finished);
+#ifdef CONFIG_CRYPTO_MANAGER_DISABLE_TESTS
+static inline bool crypto_boot_test_finished(void)
+{
+	return true;
+}
+static inline void set_crypto_boot_test_finished(void)
+{
+}
+#else
+DECLARE_STATIC_KEY_FALSE(__crypto_boot_test_finished);
+static inline bool crypto_boot_test_finished(void)
+{
+	return static_branch_likely(&__crypto_boot_test_finished);
+}
+static inline void set_crypto_boot_test_finished(void)
+{
+	static_branch_enable(&__crypto_boot_test_finished);
+}
+#endif /* !CONFIG_CRYPTO_MANAGER_DISABLE_TESTS */
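+
+/*
+ * Hypothetical usage sketch: code that must wait for the boot-time self-tests
+ * before trusting an algorithm can gate on this helper, e.g.:
+ *
+ *	if (!crypto_boot_test_finished())
+ *		wait_for_completion(&tests_done);  // hypothetical completion
+ */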
 
 #ifdef CONFIG_PROC_FS
 void __init crypto_init_proc(void);
diff --git a/crypto/kdf_sp800108.c b/crypto/kdf_sp800108.c
index 58edf77..c3f9938 100644
--- a/crypto/kdf_sp800108.c
+++ b/crypto/kdf_sp800108.c
@@ -125,9 +125,13 @@ static const struct kdf_testvec kdf_ctr_hmac_sha256_tv_template[] = {
 
 static int __init crypto_kdf108_init(void)
 {
-	int ret = kdf_test(&kdf_ctr_hmac_sha256_tv_template[0], "hmac(sha256)",
-			   crypto_kdf108_setkey, crypto_kdf108_ctr_generate);
+	int ret;
 
+	if (IS_ENABLED(CONFIG_CRYPTO_MANAGER_DISABLE_TESTS))
+		return 0;
+
+	ret = kdf_test(&kdf_ctr_hmac_sha256_tv_template[0], "hmac(sha256)",
+		       crypto_kdf108_setkey, crypto_kdf108_ctr_generate);
 	if (ret) {
 		if (fips_enabled)
 			panic("alg: self-tests for CTR-KDF (hmac(sha256)) failed (rc=%d)\n",
@@ -136,7 +140,7 @@ static int __init crypto_kdf108_init(void)
 		WARN(1,
 		     "alg: self-tests for CTR-KDF (hmac(sha256)) failed (rc=%d)\n",
 		     ret);
-	} else {
+	} else if (fips_enabled) {
 		pr_info("alg: self-tests for CTR-KDF (hmac(sha256)) passed\n");
 	}
 
diff --git a/drivers/OWNERS b/drivers/OWNERS
new file mode 100644
index 0000000..c13fa05
--- /dev/null
+++ b/drivers/OWNERS
@@ -0,0 +1,6 @@
+per-file base/**=gregkh@google.com,saravanak@google.com
+per-file block/**=akailash@google.com
+per-file md/**=akailash@google.com,paullawrence@google.com
+per-file net/**=file:/net/OWNERS
+per-file scsi/**=bvanassche@google.com,jaegeuk@google.com
+per-file {tty,usb}/**=gregkh@google.com
diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
index 07aa8ae..c5e063b 100644
--- a/drivers/android/Kconfig
+++ b/drivers/android/Kconfig
@@ -47,4 +47,59 @@
 	  exhaustively with combinations of various buffer sizes and
 	  alignments.
 
+config ANDROID_VENDOR_HOOKS
+	bool "Android Vendor Hooks"
+	depends on TRACEPOINTS
+	help
+	  Enable vendor hooks implemented as tracepoints.
+
+	  Allow vendor modules to attach to tracepoint "hooks" defined via
+	  DECLARE_HOOK or DECLARE_RESTRICTED_HOOK.
+
+config ANDROID_DEBUG_KINFO
+	bool "Android Debug Kernel Information Support"
+	depends on KALLSYMS
+	help
+	  This supports kernel information backup for bootloader usage.
+	  Specifics:
+	   - The kallsyms symbols for unwind_backtrace
+	   - Page directory pointer
+	   - UTS_RELEASE
+	   - BUILD_INFO(ro.build.fingerprint)
+
+config ANDROID_KABI_RESERVE
+	bool "Android KABI reserve padding"
+	default y
+	help
+	  This option enables the padding that the Android GKI kernel adds
+	  to many different kernel structures to support an in-kernel stable ABI
+	  over the lifespan of support for the kernel.
+
+	  Only disable this option if you have a system that needs the Android
+	  kernel drivers, but is NOT an Android GKI kernel image. Disabling it
+	  may make the kernel's static and runtime image slightly smaller, but
+	  the resulting kernel will NOT be supported by the Google Android
+	  kernel team.
+
+	  If even slightly unsure, say Y.
+
+config ANDROID_VENDOR_OEM_DATA
+	bool "Android vendor and OEM data padding"
+	default y
+	help
+	  This option enables the padding that the Android GKI kernel adds
+	  to many different kernel structures to support an in-kernel stable ABI
+	  over the lifespan of support for the kernel, as well as additional
+	  OEM fields needed by some of the Android kernel tracepoints. The
+	  macros enabled by this option are also used to enable padding in
+	  vendor modules for the same purposes.
+
+	  Only disable this option if you have a system that needs the Android
+	  kernel drivers, but is NOT an Android GKI kernel image and you do NOT
+	  use the Android kernel tracepoints. Disabling it may make the kernel's
+	  static and runtime image slightly smaller, but the resulting kernel
+	  will NOT be supported by the Google Android kernel team.
+
+	  If even slightly unsure, say Y.
+
 endmenu
diff --git a/drivers/android/Makefile b/drivers/android/Makefile
index c9d3d0c9..9b89e4b 100644
--- a/drivers/android/Makefile
+++ b/drivers/android/Makefile
@@ -4,3 +4,5 @@
 obj-$(CONFIG_ANDROID_BINDERFS)		+= binderfs.o
 obj-$(CONFIG_ANDROID_BINDER_IPC)	+= binder.o binder_alloc.o
 obj-$(CONFIG_ANDROID_BINDER_IPC_SELFTEST) += binder_alloc_selftest.o
+obj-$(CONFIG_ANDROID_VENDOR_HOOKS) += vendor_hooks.o
+obj-$(CONFIG_ANDROID_DEBUG_KINFO)	+= debug_kinfo.o
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 880224e..3479308 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -66,13 +66,16 @@
 #include <linux/syscalls.h>
 #include <linux/task_work.h>
 #include <linux/sizes.h>
+#include <linux/android_vendor.h>
 
+#include <uapi/linux/sched/types.h>
 #include <uapi/linux/android/binder.h>
 
 #include <linux/cacheflush.h>
 
 #include "binder_internal.h"
 #include "binder_trace.h"
+#include <trace/hooks/binder.h>
 
 static HLIST_HEAD(binder_deferred_list);
 static DEFINE_MUTEX(binder_deferred_lock);
@@ -574,6 +577,7 @@ static void binder_wakeup_poll_threads_ilocked(struct binder_proc *proc,
 		thread = rb_entry(n, struct binder_thread, rb_node);
 		if (thread->looper & BINDER_LOOPER_STATE_POLL &&
 		    binder_available_for_proc_work_ilocked(thread)) {
+			trace_android_vh_binder_wakeup_ilocked(thread->task, sync, proc);
 			if (sync)
 				wake_up_interruptible_sync(&thread->wait);
 			else
@@ -633,6 +637,7 @@ static void binder_wakeup_thread_ilocked(struct binder_proc *proc,
 	assert_spin_locked(&proc->inner_lock);
 
 	if (thread) {
+		trace_android_vh_binder_wakeup_ilocked(thread->task, sync, proc);
 		if (sync)
 			wake_up_interruptible_sync(&thread->wait);
 		else
@@ -663,22 +668,190 @@ static void binder_wakeup_proc_ilocked(struct binder_proc *proc)
 	binder_wakeup_thread_ilocked(proc, thread, /* sync = */false);
 }
 
-static void binder_set_nice(long nice)
+static bool is_rt_policy(int policy)
 {
-	long min_nice;
+	return policy == SCHED_FIFO || policy == SCHED_RR;
+}
 
-	if (can_nice(current, nice)) {
-		set_user_nice(current, nice);
+static bool is_fair_policy(int policy)
+{
+	return policy == SCHED_NORMAL || policy == SCHED_BATCH;
+}
+
+static bool binder_supported_policy(int policy)
+{
+	return is_fair_policy(policy) || is_rt_policy(policy);
+}
+
+static int to_userspace_prio(int policy, int kernel_priority)
+{
+	if (is_fair_policy(policy))
+		return PRIO_TO_NICE(kernel_priority);
+	else
+		return MAX_RT_PRIO - 1 - kernel_priority;
+}
+
+static int to_kernel_prio(int policy, int user_priority)
+{
+	if (is_fair_policy(policy))
+		return NICE_TO_PRIO(user_priority);
+	else
+		return MAX_RT_PRIO - 1 - user_priority;
+}
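+
+/*
+ * Worked example of the mapping above, using the kernel's priority macros
+ * (DEFAULT_PRIO = 120, MAX_RT_PRIO = 100):
+ *
+ *	SCHED_NORMAL: nice 0 <-> kernel prio 120, nice -20 <-> kernel prio 100
+ *	SCHED_FIFO:   user prio 50 <-> kernel prio 99 - 50 = 49
+ */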
+
+static void binder_do_set_priority(struct binder_thread *thread,
+				   const struct binder_priority *desired,
+				   bool verify)
+{
+	struct task_struct *task = thread->task;
+	int priority; /* user-space prio value */
+	bool has_cap_nice;
+	unsigned int policy = desired->sched_policy;
+
+	if (task->policy == policy && task->normal_prio == desired->prio) {
+		spin_lock(&thread->prio_lock);
+		if (thread->prio_state == BINDER_PRIO_PENDING)
+			thread->prio_state = BINDER_PRIO_SET;
+		spin_unlock(&thread->prio_lock);
 		return;
 	}
-	min_nice = rlimit_to_nice(rlimit(RLIMIT_NICE));
-	binder_debug(BINDER_DEBUG_PRIORITY_CAP,
-		     "%d: nice value %ld not allowed use %ld instead\n",
-		      current->pid, nice, min_nice);
-	set_user_nice(current, min_nice);
-	if (min_nice <= MAX_NICE)
+
+	has_cap_nice = has_capability_noaudit(task, CAP_SYS_NICE);
+
+	priority = to_userspace_prio(policy, desired->prio);
+
+	if (verify && is_rt_policy(policy) && !has_cap_nice) {
+		long max_rtprio = task_rlimit(task, RLIMIT_RTPRIO);
+
+		if (max_rtprio == 0) {
+			policy = SCHED_NORMAL;
+			priority = MIN_NICE;
+		} else if (priority > max_rtprio) {
+			priority = max_rtprio;
+		}
+	}
+
+	if (verify && is_fair_policy(policy) && !has_cap_nice) {
+		long min_nice = rlimit_to_nice(task_rlimit(task, RLIMIT_NICE));
+
+		if (min_nice > MAX_NICE) {
+			binder_user_error("%d RLIMIT_NICE not set\n",
+					  task->pid);
+			return;
+		} else if (priority < min_nice) {
+			priority = min_nice;
+		}
+	}
+
+	if (policy != desired->sched_policy ||
+	    to_kernel_prio(policy, priority) != desired->prio)
+		binder_debug(BINDER_DEBUG_PRIORITY_CAP,
+			     "%d: priority %d not allowed, using %d instead\n",
+			      task->pid, desired->prio,
+			      to_kernel_prio(policy, priority));
+
+	trace_binder_set_priority(task->tgid, task->pid, task->normal_prio,
+				  to_kernel_prio(policy, priority),
+				  desired->prio);
+
+	spin_lock(&thread->prio_lock);
+	if (!verify && thread->prio_state == BINDER_PRIO_ABORT) {
+		/*
+		 * A new priority has been set by an incoming nested
+		 * transaction. Abort this priority restore and allow
+		 * the transaction to run at the new desired priority.
+		 */
+		spin_unlock(&thread->prio_lock);
+		binder_debug(BINDER_DEBUG_PRIORITY_CAP,
+			"%d: %s: aborting priority restore\n",
+			thread->pid, __func__);
 		return;
-	binder_user_error("%d RLIMIT_NICE not set\n", current->pid);
+	}
+
+	/* Set the actual priority */
+	if (task->policy != policy || is_rt_policy(policy)) {
+		struct sched_param params;
+
+		params.sched_priority = is_rt_policy(policy) ? priority : 0;
+
+		sched_setscheduler_nocheck(task,
+					   policy | SCHED_RESET_ON_FORK,
+					   &params);
+	}
+	if (is_fair_policy(policy))
+		set_user_nice(task, priority);
+
+	thread->prio_state = BINDER_PRIO_SET;
+	spin_unlock(&thread->prio_lock);
+}
+
+static void binder_set_priority(struct binder_thread *thread,
+				const struct binder_priority *desired)
+{
+	binder_do_set_priority(thread, desired, /* verify = */ true);
+}
+
+static void binder_restore_priority(struct binder_thread *thread,
+				    const struct binder_priority *desired)
+{
+	binder_do_set_priority(thread, desired, /* verify = */ false);
+}
+
+static void binder_transaction_priority(struct binder_thread *thread,
+					struct binder_transaction *t,
+					struct binder_node *node)
+{
+	struct task_struct *task = thread->task;
+	struct binder_priority desired = t->priority;
+	const struct binder_priority node_prio = {
+		.sched_policy = node->sched_policy,
+		.prio = node->min_priority,
+	};
+
+	if (t->set_priority_called)
+		return;
+
+	t->set_priority_called = true;
+
+	if (!node->inherit_rt && is_rt_policy(desired.sched_policy)) {
+		desired.prio = NICE_TO_PRIO(0);
+		desired.sched_policy = SCHED_NORMAL;
+	}
+
+	if (node_prio.prio < t->priority.prio ||
+	    (node_prio.prio == t->priority.prio &&
+	     node_prio.sched_policy == SCHED_FIFO)) {
+		/*
+		 * In case the minimum priority on the node is
+		 * higher (lower value), use that priority. If
+		 * the priority is the same, but the node uses
+		 * SCHED_FIFO, prefer SCHED_FIFO, since it can
+		 * run unbounded, unlike SCHED_RR.
+		 */
+		desired = node_prio;
+	}
+
+	spin_lock(&thread->prio_lock);
+	if (thread->prio_state == BINDER_PRIO_PENDING) {
+		/*
+		 * Task is in the process of changing priorities;
+		 * saving its current values would be incorrect.
+		 * Instead, save the pending priority and signal
+		 * the task to abort the priority restore.
+		 */
+		t->saved_priority = thread->prio_next;
+		thread->prio_state = BINDER_PRIO_ABORT;
+		binder_debug(BINDER_DEBUG_PRIORITY_CAP,
+			"%d: saved pending priority %d\n",
+			current->pid, thread->prio_next.prio);
+	} else {
+		t->saved_priority.sched_policy = task->policy;
+		t->saved_priority.prio = task->normal_prio;
+	}
+	spin_unlock(&thread->prio_lock);
+
+	binder_set_priority(thread, &desired);
+	trace_android_vh_binder_set_priority(t, task);
 }
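+
+/*
+ * Example of the node-priority floor applied above (a sketch): a SCHED_NORMAL
+ * nice-0 caller (kernel prio 120) sending to a node whose minimum is
+ * SCHED_FIFO user prio 50 (kernel prio 49, i.e. higher priority) runs the
+ * transaction at SCHED_FIFO 49.
+ */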
 
 static struct binder_node *binder_get_node_ilocked(struct binder_proc *proc,
@@ -731,6 +904,7 @@ static struct binder_node *binder_init_node_ilocked(
 	binder_uintptr_t ptr = fp ? fp->binder : 0;
 	binder_uintptr_t cookie = fp ? fp->cookie : 0;
 	__u32 flags = fp ? fp->flags : 0;
+	s8 priority;
 
 	assert_spin_locked(&proc->inner_lock);
 
@@ -763,8 +937,12 @@ static struct binder_node *binder_init_node_ilocked(
 	node->ptr = ptr;
 	node->cookie = cookie;
 	node->work.type = BINDER_WORK_NODE;
-	node->min_priority = flags & FLAT_BINDER_FLAG_PRIORITY_MASK;
+	priority = flags & FLAT_BINDER_FLAG_PRIORITY_MASK;
+	node->sched_policy = (flags & FLAT_BINDER_FLAG_SCHED_POLICY_MASK) >>
+		FLAT_BINDER_FLAG_SCHED_POLICY_SHIFT;
+	node->min_priority = to_kernel_prio(node->sched_policy, priority);
 	node->accept_fds = !!(flags & FLAT_BINDER_FLAG_ACCEPTS_FDS);
+	node->inherit_rt = !!(flags & FLAT_BINDER_FLAG_INHERIT_RT);
 	node->txn_security_ctx = !!(flags & FLAT_BINDER_FLAG_TXN_SECURITY_CTX);
 	spin_lock_init(&node->lock);
 	INIT_LIST_HEAD(&node->work.entry);
@@ -2741,6 +2919,7 @@ static int binder_proc_transaction(struct binder_transaction *t,
 
 	BUG_ON(!node);
 	binder_node_lock(node);
+
 	if (oneway) {
 		BUG_ON(thread);
 		if (node->has_async_transaction)
@@ -2766,6 +2945,7 @@ static int binder_proc_transaction(struct binder_transaction *t,
 		thread = binder_select_thread_ilocked(proc);
 
 	if (thread) {
+		binder_transaction_priority(thread, t, node);
 		binder_enqueue_thread_work_ilocked(thread, &t->work);
 	} else if (!pending_async) {
 		binder_enqueue_work_ilocked(&t->work, &proc->todo);
@@ -2902,6 +3082,7 @@ static void binder_transaction(struct binder_proc *proc,
 	struct list_head pf_head;
 	const void __user *user_buffer = (const void __user *)
 				(uintptr_t)tr->data.ptr.buffer;
+	bool is_nested = false;
 	INIT_LIST_HEAD(&sgc_head);
 	INIT_LIST_HEAD(&pf_head);
 
@@ -2949,7 +3130,6 @@ static void binder_transaction(struct binder_proc *proc,
 		}
 		thread->transaction_stack = in_reply_to->to_parent;
 		binder_inner_proc_unlock(proc);
-		binder_set_nice(in_reply_to->saved_priority);
 		target_thread = binder_get_txn_from_and_acq_inner(in_reply_to);
 		if (target_thread == NULL) {
 			/* annotation for sparse */
@@ -3099,6 +3279,7 @@ static void binder_transaction(struct binder_proc *proc,
 					atomic_inc(&from->tmp_ref);
 					target_thread = from;
 					spin_unlock(&tmp->lock);
+					is_nested = true;
 					break;
 				}
 				spin_unlock(&tmp->lock);
@@ -3124,6 +3305,7 @@ static void binder_transaction(struct binder_proc *proc,
 	INIT_LIST_HEAD(&t->fd_fixups);
 	binder_stats_created(BINDER_STAT_TRANSACTION);
 	spin_lock_init(&t->lock);
+	trace_android_vh_binder_transaction_init(t);
 
 	tcomplete = kzalloc(sizeof(*tcomplete), GFP_KERNEL);
 	if (tcomplete == NULL) {
@@ -3166,7 +3348,16 @@ static void binder_transaction(struct binder_proc *proc,
 	t->to_thread = target_thread;
 	t->code = tr->code;
 	t->flags = tr->flags;
-	t->priority = task_nice(current);
+	t->is_nested = is_nested;
+	if (!(t->flags & TF_ONE_WAY) &&
+	    binder_supported_policy(current->policy)) {
+		/* Inherit supported policies for synchronous transactions */
+		t->priority.sched_policy = current->policy;
+		t->priority.prio = current->normal_prio;
+	} else {
+		/* Otherwise, fall back to the default priority */
+		t->priority = target_proc->default_priority;
+	}
 
 	if (target_node && target_node->txn_security_ctx) {
 		u32 secid;
@@ -3579,7 +3770,15 @@ static void binder_transaction(struct binder_proc *proc,
 		binder_enqueue_thread_work_ilocked(target_thread, &t->work);
 		target_proc->outstanding_txns++;
 		binder_inner_proc_unlock(target_proc);
+		if (in_reply_to->is_nested) {
+			spin_lock(&thread->prio_lock);
+			thread->prio_state = BINDER_PRIO_PENDING;
+			thread->prio_next = in_reply_to->saved_priority;
+			spin_unlock(&thread->prio_lock);
+		}
 		wake_up_interruptible_sync(&target_thread->wait);
+		trace_android_vh_binder_restore_priority(in_reply_to, current);
+		binder_restore_priority(thread, &in_reply_to->saved_priority);
 		binder_free_transaction(in_reply_to);
 	} else if (!(t->flags & TF_ONE_WAY)) {
 		BUG_ON(t->buffer->async_transaction != 0);
@@ -3702,6 +3901,8 @@ static void binder_transaction(struct binder_proc *proc,
 
 	BUG_ON(thread->return_error.cmd != BR_OK);
 	if (in_reply_to) {
+		trace_android_vh_binder_restore_priority(in_reply_to, current);
+		binder_restore_priority(thread, &in_reply_to->saved_priority);
 		binder_set_txn_from_error(in_reply_to, t_debug_id,
 				return_error, return_error_param);
 		thread->return_error.cmd = BR_TRANSACTION_COMPLETE;
@@ -4267,6 +4468,7 @@ static int binder_wait_for_work(struct binder_thread *thread,
 		if (do_proc_work)
 			list_add(&thread->waiting_thread_node,
 				 &proc->waiting_threads);
+		trace_android_vh_binder_wait_for_work(do_proc_work, thread, proc);
 		binder_inner_proc_unlock(proc);
 		schedule();
 		binder_inner_proc_lock(proc);
@@ -4372,7 +4574,8 @@ static int binder_thread_read(struct binder_proc *proc,
 			wait_event_interruptible(binder_user_error_wait,
 						 binder_stop_on_user_error < 2);
 		}
-		binder_set_nice(proc->default_priority);
+		trace_android_vh_binder_restore_priority(NULL, current);
+		binder_restore_priority(thread, &proc->default_priority);
 	}
 
 	if (non_block) {
@@ -4602,13 +4805,7 @@ static int binder_thread_read(struct binder_proc *proc,
 
 			trd->target.ptr = target_node->ptr;
 			trd->cookie =  target_node->cookie;
-			t->saved_priority = task_nice(current);
-			if (t->priority < target_node->min_priority &&
-			    !(t->flags & TF_ONE_WAY))
-				binder_set_nice(t->priority);
-			else if (!(t->flags & TF_ONE_WAY) ||
-				 t->saved_priority > target_node->min_priority)
-				binder_set_nice(target_node->min_priority);
+			binder_transaction_priority(thread, t, target_node);
 			cmd = BR_TRANSACTION;
 		} else {
 			trd->target.ptr = 0;
@@ -4626,6 +4823,7 @@ static int binder_thread_read(struct binder_proc *proc,
 			trd->sender_pid =
 				task_tgid_nr_ns(sender,
 						task_active_pid_ns(current));
+			trace_android_vh_sync_txn_recvd(thread->task, t_from->task);
 		} else {
 			trd->sender_pid = 0;
 		}
@@ -4826,6 +5024,8 @@ static struct binder_thread *binder_get_thread_ilocked(
 	binder_stats_created(BINDER_STAT_THREAD);
 	thread->proc = proc;
 	thread->pid = current->pid;
+	get_task_struct(current);
+	thread->task = current;
 	atomic_set(&thread->tmp_ref, 0);
 	init_waitqueue_head(&thread->wait);
 	INIT_LIST_HEAD(&thread->todo);
@@ -4836,6 +5036,8 @@ static struct binder_thread *binder_get_thread_ilocked(
 	thread->return_error.cmd = BR_OK;
 	thread->reply_error.work.type = BINDER_WORK_RETURN_ERROR;
 	thread->reply_error.cmd = BR_OK;
+	spin_lock_init(&thread->prio_lock);
+	thread->prio_state = BINDER_PRIO_SET;
 	thread->ee.command = BR_OK;
 	INIT_LIST_HEAD(&new_thread->waiting_thread_node);
 	return thread;
@@ -4888,6 +5090,7 @@ static void binder_free_thread(struct binder_thread *thread)
 	BUG_ON(!list_empty(&thread->todo));
 	binder_stats_deleted(BINDER_STAT_THREAD);
 	binder_proc_dec_tmpref(thread->proc);
+	put_task_struct(thread->task);
 	kfree(thread);
 }
 
@@ -5602,7 +5805,14 @@ static int binder_open(struct inode *nodp, struct file *filp)
 	proc->cred = get_cred(filp->f_cred);
 	INIT_LIST_HEAD(&proc->todo);
 	init_waitqueue_head(&proc->freeze_wait);
-	proc->default_priority = task_nice(current);
+	if (binder_supported_policy(current->policy)) {
+		proc->default_priority.sched_policy = current->policy;
+		proc->default_priority.prio = current->normal_prio;
+	} else {
+		proc->default_priority.sched_policy = SCHED_NORMAL;
+		proc->default_priority.prio = NICE_TO_PRIO(0);
+	}
+
 	/* binderfs stashes devices in i_private */
 	if (is_binderfs_device(nodp)) {
 		binder_dev = nodp->i_private;
@@ -5927,13 +6137,14 @@ static void print_binder_transaction_ilocked(struct seq_file *m,
 	spin_lock(&t->lock);
 	to_proc = t->to_proc;
 	seq_printf(m,
-		   "%s %d: %pK from %d:%d to %d:%d code %x flags %x pri %ld r%d",
+		   "%s %d: %pK from %d:%d to %d:%d code %x flags %x pri %d:%d r%d",
 		   prefix, t->debug_id, t,
 		   t->from ? t->from->proc->pid : 0,
 		   t->from ? t->from->pid : 0,
 		   to_proc ? to_proc->pid : 0,
 		   t->to_thread ? t->to_thread->pid : 0,
-		   t->code, t->flags, t->priority, t->need_reply);
+		   t->code, t->flags, t->priority.sched_policy,
+		   t->priority.prio, t->need_reply);
 	spin_unlock(&t->lock);
 
 	if (proc != to_proc) {
@@ -6051,8 +6262,9 @@ static void print_binder_node_nilocked(struct seq_file *m,
 	hlist_for_each_entry(ref, &node->refs, node_entry)
 		count++;
 
-	seq_printf(m, "  node %d: u%016llx c%016llx hs %d hw %d ls %d lw %d is %d iw %d tr %d",
+	seq_printf(m, "  node %d: u%016llx c%016llx pri %d:%d hs %d hw %d ls %d lw %d is %d iw %d tr %d",
 		   node->debug_id, (u64)node->ptr, (u64)node->cookie,
+		   node->sched_policy, node->min_priority,
 		   node->has_strong_ref, node->has_weak_ref,
 		   node->local_strong_refs, node->local_weak_refs,
 		   node->internal_strong_refs, count, node->tmp_refs);
@@ -6596,5 +6808,6 @@ device_initcall(binder_init);
 
 #define CREATE_TRACE_POINTS
 #include "binder_trace.h"
+EXPORT_TRACEPOINT_SYMBOL_GPL(binder_transaction_received);
 
 MODULE_LICENSE("GPL v2");
diff --git a/drivers/android/binder_internal.h b/drivers/android/binder_internal.h
index abe19d8..9132d14 100644
--- a/drivers/android/binder_internal.h
+++ b/drivers/android/binder_internal.h
@@ -215,10 +215,13 @@ struct binder_error {
  *                        and by @lock)
  * @has_async_transaction: async transaction to node in progress
  *                        (protected by @lock)
+ * @sched_policy:         minimum scheduling policy for node
+ *                        (invariant after initialized)
  * @accept_fds:           file descriptor operations supported for node
  *                        (invariant after initialized)
  * @min_priority:         minimum scheduling priority
  *                        (invariant after initialized)
+ * @inherit_rt:           inherit RT scheduling policy from caller
  * @txn_security_ctx:     require sender's security context
  *                        (invariant after initialized)
  * @async_todo:           list of async work items
@@ -256,6 +259,8 @@ struct binder_node {
 		/*
 		 * invariant after initialization
 		 */
+		u8 sched_policy:2;
+		u8 inherit_rt:1;
 		u8 accept_fds:1;
 		u8 txn_security_ctx:1;
 		u8 min_priority;
@@ -325,6 +330,28 @@ struct binder_ref {
 };
 
 /**
+ * struct binder_priority - scheduler policy and priority
+ * @sched_policy:           scheduler policy
+ * @prio:                   [100..139] for SCHED_NORMAL, [0..99] for FIFO/RT
+ *
+ * The binder driver supports inheriting the following scheduler policies:
+ * SCHED_NORMAL
+ * SCHED_BATCH
+ * SCHED_FIFO
+ * SCHED_RR
+ */
+struct binder_priority {
+	unsigned int sched_policy;
+	int prio;
+};
+
+enum binder_prio_state {
+	BINDER_PRIO_SET,	/* desired priority set */
+	BINDER_PRIO_PENDING,	/* initiated a saved priority restore */
+	BINDER_PRIO_ABORT,	/* abort the pending priority restore */
+};
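+
+/*
+ * Assumed lifecycle of these states, per the flow in binder.c: a thread
+ * starts in BINDER_PRIO_SET; initiating a restore of a saved priority moves
+ * it to BINDER_PRIO_PENDING; an incoming nested transaction can flip it to
+ * BINDER_PRIO_ABORT so that the pending restore is skipped in favor of the
+ * new desired priority; applying a priority returns the state to
+ * BINDER_PRIO_SET.
+ */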
+
+/**
  * struct binder_proc - binder process bookkeeping
  * @proc_node:            element for binder_procs list
  * @threads:              rbtree of binder_threads in this proc
@@ -424,7 +451,7 @@ struct binder_proc {
 	int requested_threads;
 	int requested_threads_started;
 	int tmp_ref;
-	long default_priority;
+	struct binder_priority default_priority;
 	struct dentry *debugfs_entry;
 	struct binder_alloc alloc;
 	struct binder_context *context;
@@ -469,6 +496,13 @@ struct binder_proc {
  * @is_dead:              thread is dead and awaiting free
  *                        when outstanding transactions are cleaned up
  *                        (protected by @proc->inner_lock)
+ * @task:                 struct task_struct for this thread
+ * @prio_lock:            protects thread priority fields
+ * @prio_next:            saved priority to be restored next
+ *                        (protected by @prio_lock)
+ * @prio_state:           state of the priority restore process as
+ *                        defined by enum binder_prio_state
+ *                        (protected by @prio_lock)
  *
  * Bookkeeping structure for binder threads.
  */
@@ -489,6 +523,10 @@ struct binder_thread {
 	struct binder_stats stats;
 	atomic_t tmp_ref;
 	bool is_dead;
+	struct task_struct *task;
+	spinlock_t prio_lock;
+	struct binder_priority prio_next;
+	enum binder_prio_state prio_state;
 };
 
 /**
@@ -524,8 +562,10 @@ struct binder_transaction {
 	struct binder_buffer *buffer;
 	unsigned int    code;
 	unsigned int    flags;
-	long    priority;
-	long    saved_priority;
+	struct binder_priority priority;
+	struct binder_priority saved_priority;
+	bool set_priority_called;
+	bool is_nested;
 	kuid_t  sender_euid;
 	struct list_head fd_fixups;
 	binder_uintptr_t security_ctx;
@@ -536,6 +576,7 @@ struct binder_transaction {
 	 * during thread teardown
 	 */
 	spinlock_t lock;
+	ANDROID_VENDOR_DATA(1);
 };
 
 /**
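For context on the new struct binder_priority: prio uses the kernel's single 0..139 priority scale, where 0..99 belong to the realtime classes and 100..139 to SCHED_NORMAL (120 is nice 0). Below is a minimal user-space sketch of that mapping; the NICE_TO_PRIO()/PRIO_TO_NICE() formulas mirror include/linux/sched/prio.h, while the demo itself is illustrative and not part of the patch.

#include <stdio.h>

#define MAX_RT_PRIO		100				/* prios 0..99 are RT */
#define NICE_TO_PRIO(nice)	((nice) + MAX_RT_PRIO + 20)	/* -20..19 -> 100..139 */
#define PRIO_TO_NICE(prio)	((prio) - MAX_RT_PRIO - 20)	/* 100..139 -> -20..19 */

int main(void)
{
	int nice;

	for (nice = -20; nice <= 19; nice += 13)
		printf("nice %3d -> binder_priority.prio %d\n",
		       nice, NICE_TO_PRIO(nice));
	printf("binder_priority.prio 120 -> nice %d\n", PRIO_TO_NICE(120));
	return 0;
}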
diff --git a/drivers/android/binder_trace.h b/drivers/android/binder_trace.h
index 8cc07e6..5d82cf8 100644
--- a/drivers/android/binder_trace.h
+++ b/drivers/android/binder_trace.h
@@ -76,6 +76,30 @@ DEFINE_BINDER_FUNCTION_RETURN_EVENT(binder_ioctl_done);
 DEFINE_BINDER_FUNCTION_RETURN_EVENT(binder_write_done);
 DEFINE_BINDER_FUNCTION_RETURN_EVENT(binder_read_done);
 
+TRACE_EVENT(binder_set_priority,
+	TP_PROTO(int proc, int thread, unsigned int old_prio,
+		 unsigned int desired_prio, unsigned int new_prio),
+	TP_ARGS(proc, thread, old_prio, desired_prio, new_prio),
+
+	TP_STRUCT__entry(
+		__field(int, proc)
+		__field(int, thread)
+		__field(unsigned int, old_prio)
+		__field(unsigned int, new_prio)
+		__field(unsigned int, desired_prio)
+	),
+	TP_fast_assign(
+		__entry->proc = proc;
+		__entry->thread = thread;
+		__entry->old_prio = old_prio;
+		__entry->new_prio = new_prio;
+		__entry->desired_prio = desired_prio;
+	),
+	TP_printk("proc=%d thread=%d old=%d => new=%d desired=%d",
+		  __entry->proc, __entry->thread, __entry->old_prio,
+		  __entry->new_prio, __entry->desired_prio)
+);
+
 TRACE_EVENT(binder_wait_for_work,
 	TP_PROTO(bool proc_work, bool transaction_stack, bool thread_todo),
 	TP_ARGS(proc_work, transaction_stack, thread_todo),
diff --git a/drivers/android/debug_kinfo.c b/drivers/android/debug_kinfo.c
new file mode 100644
index 0000000..6a16232
--- /dev/null
+++ b/drivers/android/debug_kinfo.c
@@ -0,0 +1,198 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * debug_kinfo.c - backup kernel information for bootloader usage
+ *
+ * Copyright 2002 Rusty Russell <rusty@rustcorp.com.au> IBM Corporation
+ * Copyright 2021 Google LLC
+ */
+
+#include <linux/platform_device.h>
+#include <linux/kallsyms.h>
+#include <linux/vmalloc.h>
+#include <linux/module.h>
+#include <linux/of_address.h>
+#include <linux/of_reserved_mem.h>
+#include <linux/pgtable.h>
+#include <asm/module.h>
+#include "debug_kinfo.h"
+
+/*
+ * These will be re-linked against their real values
+ * during the second link stage.
+ */
+extern const unsigned long kallsyms_addresses[] __weak;
+extern const int kallsyms_offsets[] __weak;
+extern const u8 kallsyms_names[] __weak;
+
+/*
+ * Tell the compiler that the count isn't in the small data section if the arch
+ * has one (e.g. FRV).
+ */
+extern const unsigned int kallsyms_num_syms __weak
+__section(".rodata");
+
+extern const unsigned long kallsyms_relative_base __weak
+__section(".rodata");
+
+extern const u8 kallsyms_token_table[] __weak;
+extern const u16 kallsyms_token_index[] __weak;
+
+extern const unsigned int kallsyms_markers[] __weak;
+
+static void *all_info_addr;
+static u32 all_info_size;
+
+static void update_kernel_all_info(struct kernel_all_info *all_info)
+{
+	int index;
+	struct kernel_info *info;
+	u32 *checksum_info;
+
+	all_info->magic_number = DEBUG_KINFO_MAGIC;
+	all_info->combined_checksum = 0;
+
+	info = &(all_info->info);
+	checksum_info = (u32 *)info;
+	for (index = 0; index < sizeof(*info) / sizeof(u32); index++)
+		all_info->combined_checksum ^= checksum_info[index];
+}
+
+static int build_info_set(const char *str, const struct kernel_param *kp)
+{
+	struct kernel_all_info *all_info;
+	size_t build_info_size;
+	int ret = 0;
+
+	if (all_info_addr == 0 || all_info_size == 0) {
+		ret = -EPERM;
+		goto Exit;
+	}
+
+	all_info = (struct kernel_all_info *)all_info_addr;
+	build_info_size = sizeof(all_info->info.build_info);
+
+	memcpy(&all_info->info.build_info, str, min(build_info_size - 1, strlen(str)));
+	update_kernel_all_info(all_info);
+
+	if (strlen(str) >= build_info_size) {
+		pr_warn("%s: Build info buffer (len: %zu) can't hold entire string '%s'\n",
+				__func__, build_info_size, str);
+		ret = -ENOMEM;
+	}
+
+Exit:
+	return ret;
+}
+
+static const struct kernel_param_ops build_info_op = {
+	.set = build_info_set,
+};
+
+module_param_cb(build_info, &build_info_op, NULL, 0200);
+MODULE_PARM_DESC(build_info, "Write build info to field 'build_info' of debug kinfo.");
+
+static int debug_kinfo_probe(struct platform_device *pdev)
+{
+	struct device_node *mem_region;
+	struct reserved_mem *rmem;
+	struct kernel_all_info *all_info;
+	struct kernel_info *info;
+
+	mem_region = of_parse_phandle(pdev->dev.of_node, "memory-region", 0);
+	if (!mem_region) {
+		dev_warn(&pdev->dev, "no such memory-region\n");
+		return -ENODEV;
+	}
+
+	rmem = of_reserved_mem_lookup(mem_region);
+	if (!rmem) {
+		dev_warn(&pdev->dev, "no such reserved mem of node name %s\n",
+				pdev->dev.of_node->name);
+		return -ENODEV;
+	}
+
+	/* Need to wait for reserved memory to be mapped */
+	if (!rmem->priv)
+		return -EPROBE_DEFER;
+
+	if (!rmem->base || !rmem->size) {
+		dev_warn(&pdev->dev, "unexpected reserved memory\n");
+		return -EINVAL;
+	}
+
+	if (rmem->size < sizeof(struct kernel_all_info)) {
+		dev_warn(&pdev->dev, "unexpected reserved memory size\n");
+		return -EINVAL;
+	}
+
+	all_info_addr = rmem->priv;
+	all_info_size = rmem->size;
+
+	memset(all_info_addr, 0, sizeof(struct kernel_all_info));
+	all_info = (struct kernel_all_info *)all_info_addr;
+	info = &(all_info->info);
+	info->enabled_all = IS_ENABLED(CONFIG_KALLSYMS_ALL);
+	info->enabled_base_relative = IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE);
+	info->enabled_absolute_percpu = IS_ENABLED(CONFIG_KALLSYMS_ABSOLUTE_PERCPU);
+	info->enabled_cfi_clang = IS_ENABLED(CONFIG_CFI_CLANG);
+	info->num_syms = kallsyms_num_syms;
+	info->name_len = KSYM_NAME_LEN;
+	info->bit_per_long = BITS_PER_LONG;
+	info->module_name_len = MODULE_NAME_LEN;
+	info->symbol_len = KSYM_SYMBOL_LEN;
+	if (!info->enabled_base_relative)
+		info->_addresses_pa = (u64)__pa_symbol((volatile void *)kallsyms_addresses);
+	else {
+		info->_relative_pa = (u64)__pa_symbol((volatile void *)kallsyms_relative_base);
+		info->_offsets_pa = (u64)__pa_symbol((volatile void *)kallsyms_offsets);
+	}
+	info->_stext_pa = (u64)__pa_symbol(_stext);
+	info->_etext_pa = (u64)__pa_symbol(_etext);
+	info->_sinittext_pa = (u64)__pa_symbol(_sinittext);
+	info->_einittext_pa = (u64)__pa_symbol(_einittext);
+	info->_end_pa = (u64)__pa_symbol(_end);
+	info->_names_pa = (u64)__pa_symbol((volatile void *)kallsyms_names);
+	info->_token_table_pa = (u64)__pa_symbol((volatile void *)kallsyms_token_table);
+	info->_token_index_pa = (u64)__pa_symbol((volatile void *)kallsyms_token_index);
+	info->_markers_pa = (u64)__pa_symbol((volatile void *)kallsyms_markers);
+	info->thread_size = THREAD_SIZE;
+	info->swapper_pg_dir_pa = (u64)__pa_symbol(swapper_pg_dir);
+	strlcpy(info->last_uts_release, init_utsname()->release, sizeof(info->last_uts_release));
+	info->enabled_modules_tree_lookup = IS_ENABLED(CONFIG_MODULES_TREE_LOOKUP);
+	info->mod_core_layout_offset = offsetof(struct module, core_layout);
+	info->mod_init_layout_offset = offsetof(struct module, init_layout);
+	info->mod_kallsyms_offset = offsetof(struct module, kallsyms);
+#if defined(CONFIG_RANDOMIZE_BASE) && defined(MODULES_VSIZE)
+	info->module_start_va = module_alloc_base;
+	info->module_end_va = info->module_start_va + MODULES_VSIZE;
+#elif defined(CONFIG_MODULES) && defined(MODULES_VADDR)
+	info->module_start_va = MODULES_VADDR;
+	info->module_end_va = MODULES_END;
+#else
+	info->module_start_va = VMALLOC_START;
+	info->module_end_va = VMALLOC_END;
+#endif
+	update_kernel_all_info(all_info);
+
+	return 0;
+}
+
+static const struct of_device_id debug_kinfo_of_match[] = {
+	{ .compatible	= "google,debug-kinfo" },
+	{},
+};
+MODULE_DEVICE_TABLE(of, debug_kinfo_of_match);
+
+static struct platform_driver debug_kinfo_driver = {
+	.probe = debug_kinfo_probe,
+	.driver = {
+		.name = "debug-kinfo",
+		.of_match_table = of_match_ptr(debug_kinfo_of_match),
+	},
+};
+module_platform_driver(debug_kinfo_driver);
+
+MODULE_AUTHOR("Jone Chou <jonechou@google.com>");
+MODULE_DESCRIPTION("Debug Kinfo Driver");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/android/debug_kinfo.h b/drivers/android/debug_kinfo.h
new file mode 100644
index 0000000..921f140
--- /dev/null
+++ b/drivers/android/debug_kinfo.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * debug_kinfo.h - backup kernel information for bootloader usage
+ *
+ * Copyright 2021 Google LLC
+ */
+
+#ifndef DEBUG_KINFO_H
+#define DEBUG_KINFO_H
+
+#include <linux/utsname.h>
+
+#define BUILD_INFO_LEN		256
+#define DEBUG_KINFO_MAGIC	0xCCEEDDFF
+
+/*
+ * Header structure must be byte-packed, since the table is provided to
+ * bootloader.
+ */
+struct kernel_info {
+	/* For kallsyms */
+	__u8 enabled_all;
+	__u8 enabled_base_relative;
+	__u8 enabled_absolute_percpu;
+	__u8 enabled_cfi_clang;
+	__u32 num_syms;
+	__u16 name_len;
+	__u16 bit_per_long;
+	__u16 module_name_len;
+	__u16 symbol_len;
+	__u64 _addresses_pa;
+	__u64 _relative_pa;
+	__u64 _stext_pa;
+	__u64 _etext_pa;
+	__u64 _sinittext_pa;
+	__u64 _einittext_pa;
+	__u64 _end_pa;
+	__u64 _offsets_pa;
+	__u64 _names_pa;
+	__u64 _token_table_pa;
+	__u64 _token_index_pa;
+	__u64 _markers_pa;
+
+	/* For frame pointer */
+	__u32 thread_size;
+
+	/* For virt_to_phys */
+	__u64 swapper_pg_dir_pa;
+
+	/* For linux banner */
+	__u8 last_uts_release[__NEW_UTS_LEN];
+
+	/* Info of running build */
+	__u8 build_info[BUILD_INFO_LEN];
+
+	/* For module kallsyms */
+	__u32 enabled_modules_tree_lookup;
+	__u32 mod_core_layout_offset;
+	__u32 mod_init_layout_offset;
+	__u32 mod_kallsyms_offset;
+	__u64 module_start_va;
+	__u64 module_end_va;
+} __packed;
+
+struct kernel_all_info {
+	__u32 magic_number;
+	__u32 combined_checksum;
+	struct kernel_info info;
+} __packed;
+
+#endif /* DEBUG_KINFO_H */
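The combined_checksum used by update_kernel_all_info() in debug_kinfo.c is a plain XOR over the 32-bit words of struct kernel_info, which is valid because the packed struct's size is a multiple of 4. A standalone sketch of the same scheme over a toy struct (struct toy_info and its fields are invented for illustration):

#include <stdio.h>
#include <stdint.h>

struct toy_info {			/* stand-in for struct kernel_info */
	uint32_t a, b, c, d;
};

static uint32_t xor_checksum(const struct toy_info *info)
{
	const uint32_t *words = (const uint32_t *)info;
	uint32_t sum = 0;
	size_t i;

	/* XOR every 32-bit word, exactly like update_kernel_all_info() */
	for (i = 0; i < sizeof(*info) / sizeof(uint32_t); i++)
		sum ^= words[i];
	return sum;
}

int main(void)
{
	struct toy_info info = { 0x1, 0x2, 0x4, 0x8 };

	printf("combined_checksum = 0x%x\n", xor_checksum(&info));	/* 0xf */
	return 0;
}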
diff --git a/drivers/android/vendor_hooks.c b/drivers/android/vendor_hooks.c
new file mode 100644
index 0000000..2998afd
--- /dev/null
+++ b/drivers/android/vendor_hooks.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* vendor_hook.c
+ *
+ * Android Vendor Hook Support
+ *
+ * Copyright 2020 Google LLC
+ */
+
+#include <linux/iova.h>
+#include <linux/dma-buf.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/hooks/vendor_hooks.h>
+#include <linux/tracepoint.h>
+
+#include <trace/hooks/fpsimd.h>
+#include <trace/hooks/binder.h>
+#include <trace/hooks/cpuidle.h>
+#include <trace/hooks/mpam.h>
+#include <trace/hooks/wqlockup.h>
+#include <trace/hooks/debug.h>
+#include <trace/hooks/sysrqcrash.h>
+#include <trace/hooks/printk.h>
+#include <trace/hooks/epoch.h>
+#include <trace/hooks/cpufreq.h>
+#include <trace/hooks/preemptirq.h>
+#include <trace/hooks/ftrace_dump.h>
+#include <trace/hooks/ufshcd.h>
+#include <trace/hooks/cgroup.h>
+#include <trace/hooks/sys.h>
+#include <trace/hooks/iommu.h>
+#include <trace/hooks/net.h>
+#include <trace/hooks/pm_domain.h>
+#include <trace/hooks/cpuidle_psci.h>
+#include <trace/hooks/vmscan.h>
+#include <trace/hooks/avc.h>
+#include <trace/hooks/creds.h>
+#include <trace/hooks/module.h>
+#include <trace/hooks/selinux.h>
+#include <trace/hooks/syscall_check.h>
+#include <trace/hooks/remoteproc.h>
+#include <trace/hooks/rwsem.h>
+#include <trace/hooks/futex.h>
+#include <trace/hooks/fips140.h>
+#include <trace/hooks/dmabuf.h>
+#include <trace/hooks/gic.h>
+#include <trace/hooks/gic_v3.h>
+#include <trace/hooks/timer.h>
+#include <trace/hooks/topology.h>
+#include <trace/hooks/hung_task.h>
+
+/*
+ * Export tracepoints that act as a bare tracehook (i.e. have no trace event
+ * associated with them) to allow external modules to probe them.
+ */
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_arch_set_freq_scale);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_is_fpsimd_save);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_binder_transaction_init);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_binder_set_priority);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_binder_restore_priority);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_binder_wakeup_ilocked);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_cpu_idle_enter);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_cpu_idle_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_mpam_set);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_wq_lockup_pool);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_ipi_stop);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_sysrq_crash);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_printk_hotplug);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_show_suspend_epoch_val);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_show_resume_epoch_val);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_freq_table_limits);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_cpufreq_resolve_freq);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_cpufreq_fast_switch);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_cpufreq_target);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_preempt_disable);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_preempt_enable);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_irqs_disable);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_irqs_enable);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_cpu_cgroup_attach);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_cpu_cgroup_online);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_ftrace_oops_enter);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_ftrace_oops_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_ftrace_size_check);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_ftrace_format_check);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_ftrace_dump_buffer);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_ufs_fill_prdt);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_ufs_prepare_command);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_ufs_update_sysfs);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_ufs_send_command);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_ufs_compl_command);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_cgroup_set_task);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_syscall_prctl_finished);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_ufs_send_uic_command);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_ufs_send_tm_command);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_ufs_check_int_errors);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_cgroup_attach);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_iommu_setup_dma_ops);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_iommu_alloc_insert_iova);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_iommu_iovad_alloc_iova);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_iommu_iovad_free_iova);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_iommu_iovad_init_alloc_algo);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_iommu_limit_align_shift);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_ptype_head);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_allow_domain_state);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_cpuidle_psci_enter);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_cpuidle_psci_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_binder_wait_for_work);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_sync_txn_recvd);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_cpufreq_transition);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_set_balance_anon_file_reclaim);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_show_max_freq);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_selinux_avc_insert);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_selinux_avc_node_delete);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_selinux_avc_node_replace);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_selinux_avc_lookup);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_commit_creds);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_exit_creds);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_override_creds);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_revert_creds);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_set_module_core_rw_nx);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_set_module_init_rw_nx);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_set_module_permit_before_init);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_set_module_permit_after_init);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_selinux_is_initialized);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_check_mmap_file);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_check_file_open);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_check_bpf_syscall);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_ignore_dmabuf_vmap_bounds);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_rproc_recovery);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_rproc_recovery_set);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_rwsem_init);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_rwsem_wake);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_rwsem_write_finished);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_alter_rwsem_list_add);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_alter_futex_plist_add);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_sha256);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_aes_expandkey);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_aes_encrypt);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_aes_decrypt);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_timer_calc_index);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_gic_v3_set_affinity);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_gic_set_affinity);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_gic_v3_affinity_init);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_check_uninterrupt_tasks);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_check_uninterrupt_tasks_done);
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index e7d6e665..2cdade8 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -23,6 +23,10 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/thermal_pressure.h>
 
+#undef CREATE_TRACE_POINTS
+#include <trace/hooks/sched.h>
+#include <trace/hooks/topology.h>
+
 static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);
 static struct cpumask scale_freq_counters_mask;
 static bool scale_freq_invariant;
@@ -146,6 +150,8 @@ void topology_set_freq_scale(const struct cpumask *cpus, unsigned long cur_freq,
 
 	scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
 
+	trace_android_vh_arch_set_freq_scale(cpus, cur_freq, max_freq, &scale);
+
 	for_each_cpu(i, cpus)
 		per_cpu(arch_freq_scale, i) = scale;
 }
@@ -159,6 +165,7 @@ void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
 }
 
 DEFINE_PER_CPU(unsigned long, thermal_pressure);
+EXPORT_PER_CPU_SYMBOL_GPL(thermal_pressure);
 
 /**
  * topology_update_thermal_pressure() - Update thermal pressure for CPUs
@@ -201,8 +208,11 @@ void topology_update_thermal_pressure(const struct cpumask *cpus,
 
 	trace_thermal_pressure_update(cpu, th_pressure);
 
-	for_each_cpu(cpu, cpus)
+	for_each_cpu(cpu, cpus) {
 		WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
+		trace_android_rvh_update_thermal_stats(cpu);
+	}
 }
 EXPORT_SYMBOL_GPL(topology_update_thermal_pressure);
 
@@ -240,6 +250,8 @@ static int register_cpu_capacity_sysctl(void)
 subsys_initcall(register_cpu_capacity_sysctl);
 
 static int update_topology;
+bool topology_update_done;
+EXPORT_SYMBOL_GPL(topology_update_done);
 
 int topology_update_cpu_topology(void)
 {
@@ -254,6 +266,8 @@ static void update_topology_flags_workfn(struct work_struct *work)
 {
 	update_topology = 1;
 	rebuild_sched_domains();
+	topology_update_done = true;
+	trace_android_vh_update_topology_flags_workfn(NULL);
 	pr_debug("sched_domain hierarchy rebuilt, flags updated\n");
 	update_topology = 0;
 }
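topology_set_freq_scale() computes the per-CPU scale as (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq before the new android_vh_arch_set_freq_scale hook may override it. A tiny user-space sketch of that arithmetic, assuming the usual SCHED_CAPACITY_SHIFT of 10 (capacity scale 1024); the frequencies are made up:

#include <stdio.h>

#define SCHED_CAPACITY_SHIFT	10	/* mirrors include/linux/sched.h */

int main(void)
{
	unsigned long cur_freq = 1400000, max_freq = 2000000;	/* kHz */
	unsigned long scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;

	/* 1.4 GHz of 2.0 GHz -> 716/1024, i.e. ~70% of full capacity */
	printf("freq scale = %lu/1024\n", scale);
	return 0;
}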
diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c
index 7c3590fd..6966a185 100644
--- a/drivers/base/firmware_loader/main.c
+++ b/drivers/base/firmware_loader/main.c
@@ -467,21 +467,85 @@ static int fw_decompress_xz(struct device *dev, struct fw_priv *fw_priv,
 #endif /* CONFIG_FW_LOADER_COMPRESS_XZ */
 
 /* direct firmware loading support */
-static char fw_path_para[256];
+#define CUSTOM_FW_PATH_COUNT	10
+#define PATH_SIZE		255
+static char fw_path_para[CUSTOM_FW_PATH_COUNT][PATH_SIZE];
 static const char * const fw_path[] = {
-	fw_path_para,
+	fw_path_para[0],
+	fw_path_para[1],
+	fw_path_para[2],
+	fw_path_para[3],
+	fw_path_para[4],
+	fw_path_para[5],
+	fw_path_para[6],
+	fw_path_para[7],
+	fw_path_para[8],
+	fw_path_para[9],
 	"/lib/firmware/updates/" UTS_RELEASE,
 	"/lib/firmware/updates",
 	"/lib/firmware/" UTS_RELEASE,
 	"/lib/firmware"
 };
 
+static char strpath[PATH_SIZE * CUSTOM_FW_PATH_COUNT];
+static int firmware_param_path_set(const char *val, const struct kernel_param *kp)
+{
+	int i;
+	char *path, *end;
+
+	strscpy(strpath, val, sizeof(strpath));
+	/* Remove leading and trailing spaces from path */
+	path = strim(strpath);
+	for (i = 0; path && i < CUSTOM_FW_PATH_COUNT; i++) {
+		end = strchr(path, ',');
+
+		/* Skip continuous token case, for example ',,,' */
+		if (end == path) {
+			i--;
+			path = ++end;
+			continue;
+		}
+
+		if (end == NULL) {
+			/* end of the string reached, no further ',' tokens */
+			strscpy(fw_path_para[i], path, PATH_SIZE);
+			break;
+		}
+
+		*end = '\0';
+		strscpy(fw_path_para[i], path, PATH_SIZE);
+		path = ++end;
+	}
+
+	return 0;
+}
+
+static int firmware_param_path_get(char *buffer, const struct kernel_param *kp)
+{
+	int count = 0, i;
+
+	for (i = 0; i < CUSTOM_FW_PATH_COUNT; i++) {
+		if (strlen(fw_path_para[i]) != 0)
+			count += scnprintf(buffer + count, PATH_SIZE, "%s,", fw_path_para[i]);
+	}
+
+	/* Strip the trailing ',' (count stays 0 when no custom path is set) */
+	if (count)
+		buffer[--count] = '\0';
+
+	return count;
+}
+
 /*
- * Typical usage is that passing 'firmware_class.path=$CUSTOMIZED_PATH'
+ * Typical usage is that passing 'firmware_class.path=/vendor,/firmware_mnt'
  * from kernel command line because firmware_class is generally built in
- * kernel instead of module.
+ * kernel instead of module. ',' is used as the delimiter for setting up
+ * to 10 custom firmware search paths.
  */
-module_param_string(path, fw_path_para, sizeof(fw_path_para), 0644);
+
+static const struct kernel_param_ops firmware_param_ops = {
+	.set = firmware_param_path_set,
+	.get = firmware_param_path_get,
+};
+module_param_cb(path, &firmware_param_ops, NULL, 0644);
 MODULE_PARM_DESC(path, "customized firmware image search path with a higher priority than default path");
 
 static int
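The loop in firmware_param_path_set() splits the comma-separated parameter in place and silently skips empty tokens such as ',,'. A user-space re-implementation of the same loop for experimentation (snprintf stands in for strscpy; the example paths are made up):

#include <stdio.h>
#include <string.h>

#define CUSTOM_FW_PATH_COUNT	10
#define PATH_SIZE		255

static char fw_path_para[CUSTOM_FW_PATH_COUNT][PATH_SIZE];

static void split_paths(char *path)
{
	char *end;
	int i;

	for (i = 0; path && i < CUSTOM_FW_PATH_COUNT; i++) {
		end = strchr(path, ',');
		if (end == path) {		/* skip empty tokens: ",," */
			i--;
			path = ++end;
			continue;
		}
		if (end)
			*end = '\0';
		snprintf(fw_path_para[i], PATH_SIZE, "%s", path);
		path = end ? end + 1 : NULL;
	}
}

int main(void)
{
	char arg[] = "/vendor,,/firmware_mnt";
	int i;

	split_paths(arg);
	for (i = 0; i < CUSTOM_FW_PATH_COUNT; i++)
		if (fw_path_para[i][0])
			printf("path[%d] = %s\n", i, fw_path_para[i]);
	return 0;
}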
diff --git a/drivers/base/power/domain_governor.c b/drivers/base/power/domain_governor.c
index 282a3a1..88616ee 100644
--- a/drivers/base/power/domain_governor.c
+++ b/drivers/base/power/domain_governor.c
@@ -12,6 +12,8 @@
 #include <linux/cpumask.h>
 #include <linux/ktime.h>
 
+#include <trace/hooks/pm_domain.h>
+
 static int dev_update_qos_constraint(struct device *dev, void *data)
 {
 	s64 *constraint_ns_p = data;
@@ -179,6 +181,11 @@ static bool __default_power_down_ok(struct dev_pm_domain *pd,
 	struct pm_domain_data *pdd;
 	s64 min_off_time_ns;
 	s64 off_on_time_ns;
+	bool allow = true;
+
+	trace_android_vh_allow_domain_state(genpd, state, &allow);
+	if (!allow)
+		return false;
 
 	off_on_time_ns = genpd->states[state].power_off_latency_ns +
 		genpd->states[state].power_on_latency_ns;
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index c501392..5ebfb40 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -34,6 +34,7 @@
 #include <linux/cpufreq.h>
 #include <linux/devfreq.h>
 #include <linux/timer.h>
+#include <linux/wakeup_reason.h>
 
 #include "../base.h"
 #include "power.h"
@@ -1241,6 +1242,8 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
 	error = dpm_run_callback(callback, dev, state, info);
 	if (error) {
 		async_error = error;
+		log_suspend_abort_reason("Callback failed on %s in %pS returned %d",
+					 dev_name(dev), callback, error);
 		goto Complete;
 	}
 
@@ -1435,6 +1438,8 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
 	error = dpm_run_callback(callback, dev, state, info);
 	if (error) {
 		async_error = error;
+		log_suspend_abort_reason("Callback failed on %s in %pS returned %d",
+					 dev_name(dev), callback, error);
 		goto Complete;
 	}
 	dpm_propagate_wakeup_to_parent(dev);
@@ -1711,6 +1716,9 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async)
 
 		dpm_propagate_wakeup_to_parent(dev);
 		dpm_clear_superiors_direct_complete(dev);
+	} else {
+		log_suspend_abort_reason("Callback failed on %s in %pS returned %d",
+					 dev_name(dev), callback, error);
 	}
 
 	device_unlock(dev);
@@ -1924,6 +1932,9 @@ int dpm_prepare(pm_message_t state)
 		} else {
 			dev_info(dev, "not prepared for power transition: code %d\n",
 				 error);
+			log_suspend_abort_reason("Device %s not prepared for power transition: code %d",
+						 dev_name(dev), error);
+			dpm_save_failed_dev(dev_name(dev));
 		}
 
 		mutex_unlock(&dpm_list_mtx);
diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
index 8e93167..0489db3 100644
--- a/drivers/base/power/qos.c
+++ b/drivers/base/power/qos.c
@@ -137,6 +137,7 @@ s32 dev_pm_qos_read_value(struct device *dev, enum dev_pm_qos_req_type type)
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(dev_pm_qos_read_value);
 
 /**
  * apply_constraint - Add/modify/remove device PM QoS request.
diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
index 7cc0c0c..441092f 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -15,6 +15,9 @@
 #include <linux/seq_file.h>
 #include <linux/debugfs.h>
 #include <linux/pm_wakeirq.h>
+#include <linux/irq.h>
+#include <linux/irqdesc.h>
+#include <linux/wakeup_reason.h>
 #include <trace/events/power.h>
 
 #include "power.h"
@@ -844,6 +847,37 @@ void pm_wakeup_dev_event(struct device *dev, unsigned int msec, bool hard)
 }
 EXPORT_SYMBOL_GPL(pm_wakeup_dev_event);
 
+void pm_get_active_wakeup_sources(char *pending_wakeup_source, size_t max)
+{
+	struct wakeup_source *ws, *last_active_ws = NULL;
+	int len = 0;
+	bool active = false;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(ws, &wakeup_sources, entry) {
+		if (ws->active && len < max) {
+			if (!active)
+				len += scnprintf(pending_wakeup_source, max,
+						"Pending Wakeup Sources: ");
+			len += scnprintf(pending_wakeup_source + len, max - len,
+				"%s ", ws->name);
+			active = true;
+		} else if (!active &&
+			   (!last_active_ws ||
+			    ktime_to_ns(ws->last_time) >
+			    ktime_to_ns(last_active_ws->last_time))) {
+			last_active_ws = ws;
+		}
+	}
+	if (!active && last_active_ws) {
+		scnprintf(pending_wakeup_source, max,
+				"Last active Wakeup Source: %s",
+				last_active_ws->name);
+	}
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(pm_get_active_wakeup_sources);
+
 void pm_print_active_wakeup_sources(void)
 {
 	struct wakeup_source *ws;
@@ -882,6 +916,7 @@ bool pm_wakeup_pending(void)
 {
 	unsigned long flags;
 	bool ret = false;
+	char suspend_abort[MAX_SUSPEND_ABORT_LEN];
 
 	raw_spin_lock_irqsave(&events_lock, flags);
 	if (events_check_enabled) {
@@ -896,6 +931,10 @@ bool pm_wakeup_pending(void)
 	if (ret) {
 		pm_pr_dbg("Wakeup pending, aborting suspend\n");
 		pm_print_active_wakeup_sources();
+		pm_get_active_wakeup_sources(suspend_abort,
+					     MAX_SUSPEND_ABORT_LEN);
+		log_suspend_abort_reason(suspend_abort);
+		pr_info("PM: %s\n", suspend_abort);
 	}
 
 	return ret || atomic_read(&pm_abort_suspend) > 0;
@@ -948,8 +987,21 @@ void pm_system_irq_wakeup(unsigned int irq_number)
 
 	raw_spin_unlock_irqrestore(&wakeup_irq_lock, flags);
 
-	if (irq_number)
+	if (irq_number) {
+		struct irq_desc *desc;
+		const char *name = "null";
+
+		desc = irq_to_desc(irq_number);
+		if (desc == NULL)
+			name = "stray irq";
+		else if (desc->action && desc->action->name)
+			name = desc->action->name;
+
+		log_irq_wakeup_reason(irq_number);
+		pr_warn("%s: %d triggered %s\n", __func__, irq_number, name);
+
 		pm_system_wakeup();
+	}
 }
 
 unsigned int pm_wakeup_irq(void)
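pm_get_active_wakeup_sources() assembles its report with repeated scnprintf(buf + len, max - len, ...) calls, so the text can never overrun the caller's buffer. A user-space sketch of that bounded-append pattern (snprintf stands in for scnprintf, hence the extra len < max guard; the source names are invented):

#include <stdio.h>

int main(void)
{
	const char *active[] = { "eventpoll", "alarmtimer", "wlan_wake" };
	char report[64];
	size_t len = 0, max = sizeof(report), i;

	len += snprintf(report, max, "Pending Wakeup Sources: ");
	for (i = 0; i < 3 && len < max; i++)
		len += snprintf(report + len, max - len, "%s ", active[i]);

	puts(report);
	return 0;
}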
diff --git a/drivers/base/syscore.c b/drivers/base/syscore.c
index 13db1f78..3a9527d 100644
--- a/drivers/base/syscore.c
+++ b/drivers/base/syscore.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 #include <linux/suspend.h>
 #include <trace/events/power.h>
+#include <linux/wakeup_reason.h>
 
 static LIST_HEAD(syscore_ops_list);
 static DEFINE_MUTEX(syscore_ops_lock);
@@ -73,7 +74,9 @@ int syscore_suspend(void)
 	return 0;
 
  err_out:
-	pr_err("PM: System core suspend callback %pS failed.\n", ops->suspend);
+	log_suspend_abort_reason("System core suspend callback %pS failed",
+		ops->suspend);
+	pr_err("PM: System core suspend callback %pS failed.\n", ops->suspend);
 
 	list_for_each_entry_continue(ops, &syscore_ops_list, node)
 		if (ops->resume)
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 57b8366..05fe21f 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -72,6 +72,8 @@ struct clk_core {
 	unsigned long		flags;
 	bool			orphan;
 	bool			rpm_enabled;
+	bool			need_sync;
+	bool			boot_enabled;
 	unsigned int		enable_count;
 	unsigned int		prepare_count;
 	unsigned int		protect_count;
@@ -1301,6 +1303,10 @@ static void __init clk_unprepare_unused_subtree(struct clk_core *core)
 	hlist_for_each_entry(child, &core->children, child_node)
 		clk_unprepare_unused_subtree(child);
 
+	if (dev_has_sync_state(core->dev) &&
+	    !(core->flags & CLK_DONT_HOLD_STATE))
+		return;
+
 	if (core->prepare_count)
 		return;
 
@@ -1332,6 +1338,10 @@ static void __init clk_disable_unused_subtree(struct clk_core *core)
 	hlist_for_each_entry(child, &core->children, child_node)
 		clk_disable_unused_subtree(child);
 
+	if (dev_has_sync_state(core->dev) &&
+	    !(core->flags & CLK_DONT_HOLD_STATE))
+		return;
+
 	if (core->flags & CLK_OPS_PARENT_ENABLE)
 		clk_core_prepare_enable(core->parent);
 
@@ -1405,6 +1415,38 @@ static int __init clk_disable_unused(void)
 }
 late_initcall_sync(clk_disable_unused);
 
+static void clk_unprepare_disable_dev_subtree(struct clk_core *core,
+					      struct device *dev)
+{
+	struct clk_core *child;
+
+	lockdep_assert_held(&prepare_lock);
+
+	hlist_for_each_entry(child, &core->children, child_node)
+		clk_unprepare_disable_dev_subtree(child, dev);
+
+	if (core->dev != dev || !core->need_sync)
+		return;
+
+	clk_core_disable_unprepare(core);
+}
+
+void clk_sync_state(struct device *dev)
+{
+	struct clk_core *core;
+
+	clk_prepare_lock();
+
+	hlist_for_each_entry(core, &clk_root_list, child_node)
+		clk_unprepare_disable_dev_subtree(core, dev);
+
+	hlist_for_each_entry(core, &clk_orphan_list, child_node)
+		clk_unprepare_disable_dev_subtree(core, dev);
+
+	clk_prepare_unlock();
+}
+EXPORT_SYMBOL_GPL(clk_sync_state);
+
 static int clk_core_determine_round_nolock(struct clk_core *core,
 					   struct clk_rate_request *req)
 {
@@ -1895,6 +1937,33 @@ int clk_hw_get_parent_index(struct clk_hw *hw)
 }
 EXPORT_SYMBOL_GPL(clk_hw_get_parent_index);
 
+static void clk_core_hold_state(struct clk_core *core)
+{
+	if (core->need_sync || !core->boot_enabled)
+		return;
+
+	if (core->orphan || !dev_has_sync_state(core->dev))
+		return;
+
+	if (core->flags & CLK_DONT_HOLD_STATE)
+		return;
+
+	core->need_sync = !clk_core_prepare_enable(core);
+}
+
+static void __clk_core_update_orphan_hold_state(struct clk_core *core)
+{
+	struct clk_core *child;
+
+	if (core->orphan)
+		return;
+
+	clk_core_hold_state(core);
+
+	hlist_for_each_entry(child, &core->children, child_node)
+		__clk_core_update_orphan_hold_state(child);
+}
+
 /*
  * Update the orphan status of @core and all its children.
  */
@@ -2198,6 +2267,13 @@ static struct clk_core *clk_propagate_rate_change(struct clk_core *core,
 			fail_clk = core;
 	}
 
+	if (core->ops->pre_rate_change) {
+		ret = core->ops->pre_rate_change(core->hw, core->rate,
+						 core->new_rate);
+		if (ret)
+			fail_clk = core;
+	}
+
 	hlist_for_each_entry(child, &core->children, child_node) {
 		/* Skip children who will be reparented to another clock */
 		if (child->new_parent && child->new_parent != core)
@@ -2292,6 +2368,9 @@ static void clk_change_rate(struct clk_core *core)
 	if (core->flags & CLK_RECALC_NEW_RATES)
 		(void)clk_calc_new_rates(core, core->new_rate);
 
+	if (core->ops->post_rate_change)
+		core->ops->post_rate_change(core->hw, old_rate, core->rate);
+
 	/*
 	 * Use safe iteration, as change_rate can actually swap parents
 	 * for certain clock types.
@@ -3231,7 +3310,7 @@ static int clk_dump_show(struct seq_file *s, void *data)
 }
 DEFINE_SHOW_ATTRIBUTE(clk_dump);
 
-#undef CLOCK_ALLOW_WRITE_DEBUGFS
+#define CLOCK_ALLOW_WRITE_DEBUGFS
 #ifdef CLOCK_ALLOW_WRITE_DEBUGFS
 /*
  * This can be dangerous, therefore don't provide any real compile time
@@ -3560,24 +3639,6 @@ static int __init clk_debug_init(void)
 {
 	struct clk_core *core;
 
-#ifdef CLOCK_ALLOW_WRITE_DEBUGFS
-	pr_warn("\n");
-	pr_warn("********************************************************************\n");
-	pr_warn("**     NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE           **\n");
-	pr_warn("**                                                                **\n");
-	pr_warn("**  WRITEABLE clk DebugFS SUPPORT HAS BEEN ENABLED IN THIS KERNEL **\n");
-	pr_warn("**                                                                **\n");
-	pr_warn("** This means that this kernel is built to expose clk operations  **\n");
-	pr_warn("** such as parent or rate setting, enabling, disabling, etc.      **\n");
-	pr_warn("** to userspace, which may compromise security on your system.    **\n");
-	pr_warn("**                                                                **\n");
-	pr_warn("** If you see this message and you are not debugging the          **\n");
-	pr_warn("** kernel, report this immediately to your vendor!                **\n");
-	pr_warn("**                                                                **\n");
-	pr_warn("**     NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE           **\n");
-	pr_warn("********************************************************************\n");
-#endif
-
 	rootdir = debugfs_create_dir("clk", NULL);
 
 	debugfs_create_file("clk_summary", 0444, rootdir, &all_lists,
@@ -3630,6 +3691,7 @@ static void clk_core_reparent_orphans_nolock(void)
 			__clk_set_parent_after(orphan, parent, NULL);
 			__clk_recalc_accuracies(orphan);
 			__clk_recalc_rates(orphan, true, 0);
+			__clk_core_update_orphan_hold_state(orphan);
 
 			/*
 			 * __clk_init_parent() will set the initial req_rate to
@@ -3806,6 +3868,8 @@ static int __clk_core_init(struct clk_core *core)
 		rate = 0;
 	core->rate = core->req_rate = rate;
 
+	core->boot_enabled = clk_core_is_enabled(core);
+
 	/*
 	 * Enable CLK_IS_CRITICAL clocks so newly added critical clocks
 	 * don't get accidentally disabled when walking the orphan tree and
@@ -3828,6 +3892,7 @@ static int __clk_core_init(struct clk_core *core)
 		}
 	}
 
+	clk_core_hold_state(core);
 	clk_core_reparent_orphans_nolock();
 
 	kref_init(&core->ref);
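Taken together, boot_enabled, need_sync and clk_sync_state() let a clock that the bootloader left running stay on until the owning driver has probed and called sync_state, instead of being switched off by clk_disable_unused(). A toy model of that lifecycle (struct toy_clk and its fields are invented; the real logic lives in clk_core_hold_state() and clk_unprepare_disable_dev_subtree()):

#include <stdio.h>
#include <stdbool.h>

struct toy_clk {
	const char *name;
	bool boot_enabled;	/* was on when the kernel took over */
	bool need_sync;		/* holding an extra reference until sync */
	int enable_count;
};

static void hold_state(struct toy_clk *clk)
{
	if (clk->boot_enabled) {
		clk->enable_count++;	/* keep the HW state until sync */
		clk->need_sync = true;
	}
}

static void sync_state(struct toy_clk *clk)
{
	if (clk->need_sync) {
		clk->enable_count--;	/* consumers now own the clock */
		clk->need_sync = false;
	}
}

int main(void)
{
	struct toy_clk uart = { "uart_clk", true, false, 0 };

	hold_state(&uart);		/* init: count = 1, clock stays on */
	printf("%s after init: count=%d\n", uart.name, uart.enable_count);
	sync_state(&uart);		/* driver probed: count = 0 */
	printf("%s after sync_state: count=%d\n", uart.name, uart.enable_count);
	return 0;
}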
diff --git a/drivers/clk/qcom/dispcc-sdm845.c b/drivers/clk/qcom/dispcc-sdm845.c
index 735adfe..414ffb6 100644
--- a/drivers/clk/qcom/dispcc-sdm845.c
+++ b/drivers/clk/qcom/dispcc-sdm845.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2018-2019, The Linux Foundation. All rights reserved.
  */
 
+#include <linux/clk.h>
 #include <linux/clk-provider.h>
 #include <linux/module.h>
 #include <linux/platform_device.h>
@@ -869,6 +870,7 @@ static struct platform_driver disp_cc_sdm845_driver = {
 	.driver		= {
 		.name	= "disp_cc-sdm845",
 		.of_match_table = disp_cc_sdm845_match_table,
+		.sync_state = clk_sync_state,
 	},
 };
 
diff --git a/drivers/clk/qcom/gcc-msm8998.c b/drivers/clk/qcom/gcc-msm8998.c
index 33473c5..8b4e4a3 100644
--- a/drivers/clk/qcom/gcc-msm8998.c
+++ b/drivers/clk/qcom/gcc-msm8998.c
@@ -3261,6 +3261,7 @@ static struct platform_driver gcc_msm8998_driver = {
 	.driver		= {
 		.name	= "gcc-msm8998",
 		.of_match_table = gcc_msm8998_match_table,
+		.sync_state = clk_sync_state,
 	},
 };
 
diff --git a/drivers/clk/qcom/gcc-sdm845.c b/drivers/clk/qcom/gcc-sdm845.c
index 6af08e0..0f2aadd 100644
--- a/drivers/clk/qcom/gcc-sdm845.c
+++ b/drivers/clk/qcom/gcc-sdm845.c
@@ -4020,6 +4020,7 @@ static struct platform_driver gcc_sdm845_driver = {
 	.driver		= {
 		.name	= "gcc-sdm845",
 		.of_match_table = gcc_sdm845_match_table,
+		.sync_state = clk_sync_state,
 	},
 };
 
diff --git a/drivers/clk/qcom/gpucc-sdm845.c b/drivers/clk/qcom/gpucc-sdm845.c
index 110b544..8749aab 100644
--- a/drivers/clk/qcom/gpucc-sdm845.c
+++ b/drivers/clk/qcom/gpucc-sdm845.c
@@ -205,6 +205,7 @@ static struct platform_driver gpu_cc_sdm845_driver = {
 	.driver = {
 		.name = "sdm845-gpucc",
 		.of_match_table = gpu_cc_sdm845_match_table,
+		.sync_state = clk_sync_state,
 	},
 };
 
diff --git a/drivers/clk/qcom/videocc-sdm845.c b/drivers/clk/qcom/videocc-sdm845.c
index c77a4dd..f678ade 100644
--- a/drivers/clk/qcom/videocc-sdm845.c
+++ b/drivers/clk/qcom/videocc-sdm845.c
@@ -337,6 +337,7 @@ static struct platform_driver video_cc_sdm845_driver = {
 	.driver		= {
 		.name	= "sdm845-videocc",
 		.of_match_table = video_cc_sdm845_match_table,
+		.sync_state = clk_sync_state,
 	},
 };
 
diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
index 4469e7f..aa859ca 100644
--- a/drivers/clocksource/Kconfig
+++ b/drivers/clocksource/Kconfig
@@ -135,7 +135,7 @@
 	  Enables the support for the RDA Micro timer driver.
 
 config SUN4I_TIMER
-	bool "Sun4i timer driver" if COMPILE_TEST
+	bool "Sun4i timer driver"
 	depends on HAS_IOMEM
 	select CLKSRC_MMIO
 	select TIMER_OF
diff --git a/drivers/clocksource/mmio.c b/drivers/clocksource/mmio.c
index 9de7515..826dcc4 100644
--- a/drivers/clocksource/mmio.c
+++ b/drivers/clocksource/mmio.c
@@ -21,6 +21,7 @@ u64 clocksource_mmio_readl_up(struct clocksource *c)
 {
 	return (u64)readl_relaxed(to_mmio_clksrc(c)->reg);
 }
+EXPORT_SYMBOL_GPL(clocksource_mmio_readl_up);
 
 u64 clocksource_mmio_readl_down(struct clocksource *c)
 {
@@ -46,7 +47,7 @@ u64 clocksource_mmio_readw_down(struct clocksource *c)
  * @bits:	Number of valid bits
  * @read:	One of clocksource_mmio_read*() above
  */
-int __init clocksource_mmio_init(void __iomem *base, const char *name,
+int clocksource_mmio_init(void __iomem *base, const char *name,
 	unsigned long hz, int rating, unsigned bits,
 	u64 (*read)(struct clocksource *))
 {
@@ -68,3 +69,4 @@ int __init clocksource_mmio_init(void __iomem *base, const char *name,
 
 	return clocksource_register_hz(&cs->clksrc, hz);
 }
+EXPORT_SYMBOL_GPL(clocksource_mmio_init);
diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
index 2a84fc6..c939546 100644
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -35,6 +35,13 @@
 
 	  If in doubt, say N.
 
+config CPU_FREQ_TIMES
+	bool "CPU frequency time-in-state statistics"
+	help
+	  Export CPU time-in-state information through procfs.
+
+	  If in doubt, say N.
+
 choice
 	prompt "Default CPUFreq governor"
 	default CPU_FREQ_DEFAULT_GOV_USERSPACE if ARM_SA1100_CPUFREQ || ARM_SA1110_CPUFREQ
@@ -227,6 +234,15 @@
 
 	  If in doubt, say N.
 
+config CPUFREQ_DUMMY
+	tristate "Dummy CPU frequency driver"
+	help
+	  This option adds a generic dummy CPUfreq driver, which sets a fake
+	  2-frequency table when initializing each policy and otherwise does
+	  nothing.
+
+	  If in doubt, say N.
+
 if X86
 source "drivers/cpufreq/Kconfig.x86"
 endif
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 49b98c6..1f8ec8c8 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -5,7 +5,10 @@
 # CPUfreq stats
 obj-$(CONFIG_CPU_FREQ_STAT)             += cpufreq_stats.o
 
-# CPUfreq governors 
+# CPUfreq times
+obj-$(CONFIG_CPU_FREQ_TIMES)		+= cpufreq_times.o
+
+# CPUfreq governors
 obj-$(CONFIG_CPU_FREQ_GOV_PERFORMANCE)	+= cpufreq_performance.o
 obj-$(CONFIG_CPU_FREQ_GOV_POWERSAVE)	+= cpufreq_powersave.o
 obj-$(CONFIG_CPU_FREQ_GOV_USERSPACE)	+= cpufreq_userspace.o
@@ -17,6 +20,8 @@
 obj-$(CONFIG_CPUFREQ_DT)		+= cpufreq-dt.o
 obj-$(CONFIG_CPUFREQ_DT_PLATDEV)	+= cpufreq-dt-platdev.o
 
+obj-$(CONFIG_CPUFREQ_DUMMY)		+= dummy-cpufreq.o
+
 # Traces
 CFLAGS_amd-pstate-trace.o               := -I$(src)
 amd_pstate-y				:= amd-pstate.o amd-pstate-trace.o
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 7e56a42..6a6a21d6 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -16,6 +16,7 @@
 
 #include <linux/cpu.h>
 #include <linux/cpufreq.h>
+#include <linux/cpufreq_times.h>
 #include <linux/cpu_cooling.h>
 #include <linux/delay.h>
 #include <linux/device.h>
@@ -30,6 +31,7 @@
 #include <linux/tick.h>
 #include <linux/units.h>
 #include <trace/events/power.h>
+#include <trace/hooks/cpufreq.h>
 
 static LIST_HEAD(cpufreq_policy_list);
 
@@ -387,7 +389,9 @@ static void cpufreq_notify_transition(struct cpufreq_policy *policy,
 					 CPUFREQ_POSTCHANGE, freqs);
 
 		cpufreq_stats_record_transition(policy, freqs->new);
+		cpufreq_times_record_transition(policy, freqs->new);
 		policy->cur = freqs->new;
+		trace_android_rvh_cpufreq_transition(policy);
 	}
 }
 
@@ -529,8 +533,10 @@ static unsigned int __resolve_freq(struct cpufreq_policy *policy,
 		unsigned int target_freq, unsigned int relation)
 {
 	unsigned int idx;
+	unsigned int old_target_freq = target_freq;
 
 	target_freq = clamp_val(target_freq, policy->min, policy->max);
+	trace_android_vh_cpufreq_resolve_freq(policy, &target_freq, old_target_freq);
 
 	if (!policy->freq_table)
 		return target_freq;
@@ -689,8 +695,15 @@ static ssize_t show_##file_name				\
 	return sprintf(buf, "%u\n", policy->object);	\
 }
 
+static ssize_t show_cpuinfo_max_freq(struct cpufreq_policy *policy, char *buf)
+{
+	unsigned int max_freq = policy->cpuinfo.max_freq;
+
+	trace_android_rvh_show_max_freq(policy, &max_freq);
+	return sprintf(buf, "%u\n", max_freq);
+}
+
 show_one(cpuinfo_min_freq, cpuinfo.min_freq);
-show_one(cpuinfo_max_freq, cpuinfo.max_freq);
 show_one(cpuinfo_transition_latency, cpuinfo.transition_latency);
 show_one(scaling_min_freq, min);
 show_one(scaling_max_freq, max);
@@ -1489,6 +1502,7 @@ static int cpufreq_online(unsigned int cpu)
 			goto out_destroy_policy;
 
 		cpufreq_stats_create_table(policy);
+		cpufreq_times_create_policy(policy);
 
 		write_lock_irqsave(&cpufreq_driver_lock, flags);
 		list_add(&policy->policy_list, &cpufreq_policy_list);
@@ -2116,9 +2130,11 @@ unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
 					unsigned int target_freq)
 {
 	unsigned int freq;
+	unsigned int old_target_freq = target_freq;
 	int cpu;
 
 	target_freq = clamp_val(target_freq, policy->min, policy->max);
+	trace_android_vh_cpufreq_fast_switch(policy, &target_freq, old_target_freq);
 	freq = cpufreq_driver->fast_switch(policy, target_freq);
 
 	if (!freq)
@@ -2128,6 +2144,7 @@ unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
 	arch_set_freq_scale(policy->related_cpus, freq,
 			    policy->cpuinfo.max_freq);
 	cpufreq_stats_record_transition(policy, freq);
+	trace_android_rvh_cpufreq_transition(policy);
 
 	if (trace_cpu_frequency_enabled()) {
 		for_each_cpu(cpu, policy->cpus)
@@ -2275,6 +2292,8 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
 
 	target_freq = __resolve_freq(policy, target_freq, relation);
 
+	trace_android_vh_cpufreq_target(policy, &target_freq, old_target_freq);
+
 	pr_debug("target for CPU %u: %u kHz, relation %u, requested %u kHz\n",
 		 policy->cpu, target_freq, relation, old_target_freq);
 
@@ -2604,7 +2623,6 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
 		ret = cpufreq_start_governor(policy);
 		if (!ret) {
 			pr_debug("governor change\n");
-			sched_cpufreq_governor_change(policy, old_gov);
 			return 0;
 		}
 		cpufreq_exit_governor(policy);
@@ -2622,6 +2640,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
 
 	return ret;
 }
+EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_frequency_limits);
 
 /**
  * cpufreq_update_policy - Re-evaluate an existing cpufreq policy.
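The pattern added to __resolve_freq(), cpufreq_driver_fast_switch() and __cpufreq_driver_target() is the same each time: clamp the request to the policy limits, then pass the vendor hook both the clamped value (by pointer, so it can be rewritten) and the original request. A standalone sketch of that flow, with an invented hook that snaps frequencies down to a 100 MHz grid:

#include <stdio.h>

#define clamp_val(v, lo, hi) ((v) < (lo) ? (lo) : ((v) > (hi) ? (hi) : (v)))

/* Invented stand-in for trace_android_vh_cpufreq_target() */
static void vh_cpufreq_target(unsigned int *target, unsigned int old)
{
	(void)old;			/* a real hook may compare with the original */
	*target -= *target % 100000;	/* snap down to a 100 MHz step */
}

int main(void)
{
	unsigned int min = 500000, max = 2000000;	/* kHz */
	unsigned int target = 1765000;
	unsigned int old_target = target;

	target = clamp_val(target, min, max);
	vh_cpufreq_target(&target, old_target);
	printf("requested %u kHz -> %u kHz\n", old_target, target);
	return 0;
}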
diff --git a/drivers/cpufreq/cpufreq_times.c b/drivers/cpufreq/cpufreq_times.c
new file mode 100644
index 0000000..4df55b3
--- /dev/null
+++ b/drivers/cpufreq/cpufreq_times.c
@@ -0,0 +1,211 @@
+/* drivers/cpufreq/cpufreq_times.c
+ *
+ * Copyright (C) 2018 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/cpufreq.h>
+#include <linux/cpufreq_times.h>
+#include <linux/jiffies.h>
+#include <linux/sched.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/threads.h>
+
+static DEFINE_SPINLOCK(task_time_in_state_lock); /* task->time_in_state */
+
+/**
+ * struct cpu_freqs - per-cpu frequency information
+ * @offset: start of these freqs' stats in task time_in_state array
+ * @max_state: number of entries in freq_table
+ * @last_index: index in freq_table of last frequency switched to
+ * @freq_table: list of available frequencies
+ */
+struct cpu_freqs {
+	unsigned int offset;
+	unsigned int max_state;
+	unsigned int last_index;
+	unsigned int freq_table[];
+};
+
+static struct cpu_freqs *all_freqs[NR_CPUS];
+
+static unsigned int next_offset;
+
+void cpufreq_task_times_init(struct task_struct *p)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&task_time_in_state_lock, flags);
+	p->time_in_state = NULL;
+	spin_unlock_irqrestore(&task_time_in_state_lock, flags);
+	p->max_state = 0;
+}
+
+void cpufreq_task_times_alloc(struct task_struct *p)
+{
+	void *temp;
+	unsigned long flags;
+	unsigned int max_state = READ_ONCE(next_offset);
+
+	/* We use one array to avoid multiple allocs per task */
+	temp = kcalloc(max_state, sizeof(p->time_in_state[0]), GFP_ATOMIC);
+	if (!temp)
+		return;
+
+	spin_lock_irqsave(&task_time_in_state_lock, flags);
+	p->time_in_state = temp;
+	spin_unlock_irqrestore(&task_time_in_state_lock, flags);
+	p->max_state = max_state;
+}
+
+/* Caller must hold task_time_in_state_lock */
+static int cpufreq_task_times_realloc_locked(struct task_struct *p)
+{
+	void *temp;
+	unsigned int max_state = READ_ONCE(next_offset);
+
+	temp = krealloc(p->time_in_state, max_state * sizeof(u64), GFP_ATOMIC);
+	if (!temp)
+		return -ENOMEM;
+	p->time_in_state = temp;
+	memset(p->time_in_state + p->max_state, 0,
+	       (max_state - p->max_state) * sizeof(u64));
+	p->max_state = max_state;
+	return 0;
+}
+
+void cpufreq_task_times_exit(struct task_struct *p)
+{
+	unsigned long flags;
+	void *temp;
+
+	if (!p->time_in_state)
+		return;
+
+	spin_lock_irqsave(&task_time_in_state_lock, flags);
+	temp = p->time_in_state;
+	p->time_in_state = NULL;
+	spin_unlock_irqrestore(&task_time_in_state_lock, flags);
+	kfree(temp);
+}
+
+int proc_time_in_state_show(struct seq_file *m, struct pid_namespace *ns,
+	struct pid *pid, struct task_struct *p)
+{
+	unsigned int cpu, i;
+	u64 cputime;
+	unsigned long flags;
+	struct cpu_freqs *freqs;
+	struct cpu_freqs *last_freqs = NULL;
+
+	spin_lock_irqsave(&task_time_in_state_lock, flags);
+	for_each_possible_cpu(cpu) {
+		freqs = all_freqs[cpu];
+		if (!freqs || freqs == last_freqs)
+			continue;
+		last_freqs = freqs;
+
+		seq_printf(m, "cpu%u\n", cpu);
+		for (i = 0; i < freqs->max_state; i++) {
+			cputime = 0;
+			if (freqs->offset + i < p->max_state &&
+			    p->time_in_state)
+				cputime = p->time_in_state[freqs->offset + i];
+			seq_printf(m, "%u %lu\n", freqs->freq_table[i],
+				   (unsigned long)nsec_to_clock_t(cputime));
+		}
+	}
+	spin_unlock_irqrestore(&task_time_in_state_lock, flags);
+	return 0;
+}
+
+void cpufreq_acct_update_power(struct task_struct *p, u64 cputime)
+{
+	unsigned long flags;
+	unsigned int state;
+	struct cpu_freqs *freqs = all_freqs[task_cpu(p)];
+
+	if (!freqs || is_idle_task(p) || p->flags & PF_EXITING)
+		return;
+
+	state = freqs->offset + READ_ONCE(freqs->last_index);
+
+	spin_lock_irqsave(&task_time_in_state_lock, flags);
+	if ((state < p->max_state || !cpufreq_task_times_realloc_locked(p)) &&
+	    p->time_in_state)
+		p->time_in_state[state] += cputime;
+	spin_unlock_irqrestore(&task_time_in_state_lock, flags);
+}
+
+static int cpufreq_times_get_index(struct cpu_freqs *freqs, unsigned int freq)
+{
+	int index;
+	for (index = 0; index < freqs->max_state; ++index) {
+		if (freqs->freq_table[index] == freq)
+			return index;
+	}
+	return -1;
+}
+
+void cpufreq_times_create_policy(struct cpufreq_policy *policy)
+{
+	int cpu, index = 0;
+	unsigned int count = 0;
+	struct cpufreq_frequency_table *pos, *table;
+	struct cpu_freqs *freqs;
+	void *tmp;
+
+	if (all_freqs[policy->cpu])
+		return;
+
+	table = policy->freq_table;
+	if (!table)
+		return;
+
+	cpufreq_for_each_valid_entry(pos, table)
+		count++;
+
+	tmp = kzalloc(sizeof(*freqs) + sizeof(freqs->freq_table[0]) * count,
+		      GFP_KERNEL);
+	if (!tmp)
+		return;
+
+	freqs = tmp;
+	freqs->max_state = count;
+
+	cpufreq_for_each_valid_entry(pos, table)
+		freqs->freq_table[index++] = pos->frequency;
+
+	index = cpufreq_times_get_index(freqs, policy->cur);
+	if (index >= 0)
+		WRITE_ONCE(freqs->last_index, index);
+
+	freqs->offset = next_offset;
+	WRITE_ONCE(next_offset, freqs->offset + count);
+	for_each_cpu(cpu, policy->related_cpus)
+		all_freqs[cpu] = freqs;
+}
+
+void cpufreq_times_record_transition(struct cpufreq_policy *policy,
+	unsigned int new_freq)
+{
+	int index;
+	struct cpu_freqs *freqs = all_freqs[policy->cpu];
+
+	if (!freqs)
+		return;
+
+	index = cpufreq_times_get_index(freqs, new_freq);
+	if (index >= 0)
+		WRITE_ONCE(freqs->last_index, index);
+}
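cpufreq_times flattens every policy's frequency table into one per-task array: a policy owns the slice [offset, offset + max_state), and cpufreq_acct_update_power() charges the bucket offset + last_index. A small sketch of that indexing (cluster sizes, offsets and the cputime value are illustrative):

#include <stdio.h>

int main(void)
{
	/* two policies: 3 freqs (slots 0..2) and 2 freqs (slots 3..4) */
	unsigned long long time_in_state[5] = { 0 };
	unsigned int policy1_offset = 3;
	unsigned int policy1_last_index = 1;	/* second freq of policy 1 */

	/* charge 10 ms of cputime to the currently running frequency */
	time_in_state[policy1_offset + policy1_last_index] += 10000000ULL;

	printf("bucket %u = %llu ns\n",
	       policy1_offset + policy1_last_index,
	       time_in_state[policy1_offset + policy1_last_index]);
	return 0;
}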
diff --git a/drivers/cpufreq/dummy-cpufreq.c b/drivers/cpufreq/dummy-cpufreq.c
new file mode 100644
index 0000000..e74ef67
--- /dev/null
+++ b/drivers/cpufreq/dummy-cpufreq.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ */
+#include <linux/cpufreq.h>
+#include <linux/module.h>
+
+static struct cpufreq_frequency_table freq_table[] = {
+	{ .frequency = 1 },
+	{ .frequency = 2 },
+	{ .frequency = CPUFREQ_TABLE_END },
+};
+
+static int dummy_cpufreq_target_index(struct cpufreq_policy *policy,
+				   unsigned int index)
+{
+	return 0;
+}
+
+static int dummy_cpufreq_driver_init(struct cpufreq_policy *policy)
+{
+	policy->freq_table = freq_table;
+	return 0;
+}
+
+static unsigned int dummy_cpufreq_get(unsigned int cpu)
+{
+	return 1;
+}
+
+static int dummy_cpufreq_verify(struct cpufreq_policy_data *data)
+{
+	return 0;
+}
+
+static struct cpufreq_driver dummy_cpufreq_driver = {
+	.name = "dummy",
+	.target_index = dummy_cpufreq_target_index,
+	.init = dummy_cpufreq_driver_init,
+	.get = dummy_cpufreq_get,
+	.verify = dummy_cpufreq_verify,
+	.attr = cpufreq_generic_attr,
+};
+
+static int __init dummy_cpufreq_init(void)
+{
+	return cpufreq_register_driver(&dummy_cpufreq_driver);
+}
+
+static void __exit dummy_cpufreq_exit(void)
+{
+	cpufreq_unregister_driver(&dummy_cpufreq_driver);
+}
+
+module_init(dummy_cpufreq_init);
+module_exit(dummy_cpufreq_exit);
+
+MODULE_AUTHOR("Connor O'Brien <connoro@google.com>");
+MODULE_DESCRIPTION("dummy cpufreq driver");
+MODULE_LICENSE("GPL");
diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c
index 67e56cf..24d3115 100644
--- a/drivers/cpufreq/freq_table.c
+++ b/drivers/cpufreq/freq_table.c
@@ -9,6 +9,7 @@
 
 #include <linux/cpufreq.h>
 #include <linux/module.h>
+#include <trace/hooks/cpufreq.h>
 
 /*********************************************************************
  *                     FREQUENCY TABLE HELPERS                       *
@@ -51,6 +52,7 @@ int cpufreq_frequency_table_cpuinfo(struct cpufreq_policy *policy,
 			max_freq = freq;
 	}
 
+	trace_android_vh_freq_table_limits(policy, min_freq, max_freq);
 	policy->min = policy->cpuinfo.min_freq = min_freq;
 	policy->max = max_freq;
 	/*
diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
index 57bc3e3..396df7d 100644
--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -26,6 +26,7 @@
 #include <linux/syscore_ops.h>
 
 #include <asm/cpuidle.h>
+#include <trace/hooks/cpuidle_psci.h>
 
 #include "cpuidle-psci.h"
 #include "dt_idle_states.h"
@@ -68,6 +69,8 @@ static int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
 	if (ret)
 		return -1;
 
+	trace_android_vh_cpuidle_psci_enter(dev, s2idle);
+
 	/* Do runtime PM to manage a hierarchical CPU toplogy. */
 	ct_irq_enter_irqson();
 	if (s2idle)
@@ -89,6 +92,8 @@ static int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
 		pm_runtime_get_sync(pd_dev);
 	ct_irq_exit_irqson();
 
+	trace_android_vh_cpuidle_psci_exit(dev, s2idle);
+
 	cpu_pm_exit();
 
 	/* Clear the domain state to start fresh when back from idle. */
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 6eceb19..054449e 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -26,6 +26,7 @@
 #include <linux/mmu_context.h>
 #include <linux/context_tracking.h>
 #include <trace/events/power.h>
+#include <trace/hooks/cpuidle.h>
 
 #include "cpuidle.h"
 
@@ -204,11 +205,22 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 {
 	int entered_state;
 
-	struct cpuidle_state *target_state = &drv->states[index];
-	bool broadcast = !!(target_state->flags & CPUIDLE_FLAG_TIMER_STOP);
+	struct cpuidle_state *target_state;
+	bool broadcast;
 	ktime_t time_start, time_end;
 
 	/*
+	 * The vendor hook may modify index, which means target_state and
+	 * broadcast must be assigned after the vendor hook.
+	 */
+	trace_android_vh_cpu_idle_enter(&index, dev);
+	if (index < 0)
+		return index;
+
+	target_state = &drv->states[index];
+	broadcast = !!(target_state->flags & CPUIDLE_FLAG_TIMER_STOP);
+
+	/*
 	 * Tell the time framework to switch to a broadcast timer because our
 	 * local timer will be shut down.  If a local timer is used from another
 	 * CPU as a broadcast timer, this call may fail if it is not available.
@@ -244,6 +256,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	sched_clock_idle_wakeup_event();
 	time_end = ns_to_ktime(local_clock());
 	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
+	trace_android_vh_cpu_idle_exit(entered_state, dev);
 
 	/* The cpu is no longer idle or about to enter idle. */
 	sched_idle_set_state(NULL);
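Since trace_android_vh_cpu_idle_enter() may rewrite the chosen state index (or veto idle entry by making it negative), target_state and broadcast can only be derived after the hook runs, which is exactly the reordering the hunk above performs. A minimal model of that guard pattern (the hook and its policy are invented for illustration):

#include <stdio.h>

static void vendor_hook(int *index)
{
	if (*index > 1)
		*index = 1;	/* e.g. a vendor policy caps the idle depth */
}

static int enter_state(int index)
{
	vendor_hook(&index);
	if (index < 0)
		return index;	/* hook vetoed idle entry */

	/* only now is it safe to look up the target state */
	printf("entering idle state %d\n", index);
	return index;
}

int main(void)
{
	enter_state(3);		/* capped to state 1 by the hook */
	return 0;
}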
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index f70aa17..5bfdb86 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -385,3 +385,4 @@ void cpuidle_driver_state_disabled(struct cpuidle_driver *drv, int idx,
 
 	mutex_unlock(&cpuidle_lock);
 }
+EXPORT_SYMBOL_GPL(cpuidle_driver_state_disabled);
diff --git a/drivers/cpuidle/governor.c b/drivers/cpuidle/governor.c
index 0d0f975..ca15755 100644
--- a/drivers/cpuidle/governor.c
+++ b/drivers/cpuidle/governor.c
@@ -101,6 +101,7 @@ int cpuidle_register_governor(struct cpuidle_governor *gov)
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(cpuidle_register_governor);
 
 /**
  * cpuidle_governor_latency_req - Compute a latency constraint for CPU
@@ -117,3 +118,4 @@ s64 cpuidle_governor_latency_req(unsigned int cpu)
 
 	return (s64)device_req * NSEC_PER_USEC;
 }
+EXPORT_SYMBOL_GPL(cpuidle_governor_latency_req);
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index eb6b593..1e77907 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -27,6 +27,7 @@
 #include <linux/mm.h>
 #include <linux/mount.h>
 #include <linux/pseudo_fs.h>
+#include <trace/hooks/dmabuf.h>
 
 #include <uapi/linux/dma-buf.h>
 #include <uapi/linux/magic.h>
@@ -131,6 +132,7 @@ static struct file_system_type dma_buf_fs_type = {
 static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
 {
 	struct dma_buf *dmabuf;
+	bool ignore_bounds = false;
 
 	if (!is_dma_buf_file(file))
 		return -EINVAL;
@@ -141,9 +143,11 @@ static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
 	if (!dmabuf->ops->mmap)
 		return -EINVAL;
 
+	trace_android_vh_ignore_dmabuf_vmap_bounds(dmabuf, &ignore_bounds);
+
 	/* check for overflowing the buffer's size */
-	if (vma->vm_pgoff + vma_pages(vma) >
-	    dmabuf->size >> PAGE_SHIFT)
+	if ((vma->vm_pgoff + vma_pages(vma) >
+	    dmabuf->size >> PAGE_SHIFT) && !ignore_bounds)
 		return -EINVAL;
 
 	return dmabuf->ops->mmap(dmabuf, vma);
@@ -1306,6 +1310,30 @@ int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_begin_cpu_access, DMA_BUF);
 
+int dma_buf_begin_cpu_access_partial(struct dma_buf *dmabuf,
+				     enum dma_data_direction direction,
+				     unsigned int offset, unsigned int len)
+{
+	int ret = 0;
+
+	if (WARN_ON(!dmabuf))
+		return -EINVAL;
+
+	if (dmabuf->ops->begin_cpu_access_partial)
+		ret = dmabuf->ops->begin_cpu_access_partial(dmabuf, direction,
+							    offset, len);
+
+	/* Ensure that all fences are waited upon - but we first allow
+	 * the native handler the chance to do so more efficiently if it
+	 * chooses. A double invocation here will be a reasonably cheap no-op.
+	 */
+	if (ret == 0)
+		ret = __dma_buf_begin_cpu_access(dmabuf, direction);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(dma_buf_begin_cpu_access_partial);
+
 /**
  * dma_buf_end_cpu_access - Must be called after accessing a dma_buf from the
  * cpu in the kernel context. Calls end_cpu_access to allow exporter-specific
@@ -1334,6 +1362,21 @@ int dma_buf_end_cpu_access(struct dma_buf *dmabuf,
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_end_cpu_access, DMA_BUF);
 
+int dma_buf_end_cpu_access_partial(struct dma_buf *dmabuf,
+				   enum dma_data_direction direction,
+				   unsigned int offset, unsigned int len)
+{
+	int ret = 0;
+
+	WARN_ON(!dmabuf);
+
+	if (dmabuf->ops->end_cpu_access_partial)
+		ret = dmabuf->ops->end_cpu_access_partial(dmabuf, direction,
+							  offset, len);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(dma_buf_end_cpu_access_partial);
 
 /**
  * dma_buf_mmap - Setup up a userspace mmap with the given vma
@@ -1454,6 +1497,20 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map)
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap, DMA_BUF);
 
+int dma_buf_get_flags(struct dma_buf *dmabuf, unsigned long *flags)
+{
+	int ret = 0;
+
+	if (WARN_ON(!dmabuf) || !flags)
+		return -EINVAL;
+
+	if (dmabuf->ops->get_flags)
+		ret = dmabuf->ops->get_flags(dmabuf, flags);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(dma_buf_get_flags);
+
 #ifdef CONFIG_DEBUG_FS
 static int dma_buf_debug_show(struct seq_file *s, void *unused)
 {
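
The new partial-access helpers are meant for callers that only touch a sub-range of a buffer; a hedged usage sketch (off and len are hypothetical byte offsets into the buffer):

	int err;

	err = dma_buf_begin_cpu_access_partial(dmabuf, DMA_FROM_DEVICE, off, len);
	if (err)
		return err;

	/* CPU reads of bytes [off, off + len) are coherent here. */

	err = dma_buf_end_cpu_access_partial(dmabuf, DMA_FROM_DEVICE, off, len);

Exporters that implement begin_cpu_access_partial()/end_cpu_access_partial() can then restrict cache maintenance to the touched range instead of the whole buffer.
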
diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
index 59d1588..9277e55 100644
--- a/drivers/dma-buf/dma-heap.c
+++ b/drivers/dma-buf/dma-heap.c
@@ -31,6 +31,7 @@
  * @heap_devt		heap device node
  * @list		list head connecting to list of heaps
  * @heap_cdev		heap char device
+ * @heap_dev		heap device struct
  *
  * Represents a heap of memory from which buffers can be made.
  */
@@ -41,6 +42,8 @@ struct dma_heap {
 	dev_t heap_devt;
 	struct list_head list;
 	struct cdev heap_cdev;
+	struct kref refcount;
+	struct device *heap_dev;
 };
 
 static LIST_HEAD(heap_list);
@@ -49,22 +52,60 @@ static dev_t dma_heap_devt;
 static struct class *dma_heap_class;
 static DEFINE_XARRAY_ALLOC(dma_heap_minors);
 
-static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
-				 unsigned int fd_flags,
-				 unsigned int heap_flags)
+struct dma_heap *dma_heap_find(const char *name)
 {
-	struct dma_buf *dmabuf;
-	int fd;
+	struct dma_heap *h;
 
+	mutex_lock(&heap_list_lock);
+	list_for_each_entry(h, &heap_list, list) {
+		if (!strcmp(h->name, name)) {
+			kref_get(&h->refcount);
+			mutex_unlock(&heap_list_lock);
+			return h;
+		}
+	}
+	mutex_unlock(&heap_list_lock);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(dma_heap_find);
+
+
+void dma_heap_buffer_free(struct dma_buf *dmabuf)
+{
+	dma_buf_put(dmabuf);
+}
+EXPORT_SYMBOL_GPL(dma_heap_buffer_free);
+
+struct dma_buf *dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
+				      unsigned int fd_flags,
+				      unsigned int heap_flags)
+{
+	if (fd_flags & ~DMA_HEAP_VALID_FD_FLAGS)
+		return ERR_PTR(-EINVAL);
+
+	if (heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
+		return ERR_PTR(-EINVAL);
 	/*
 	 * Allocations from all heaps have to begin
 	 * and end on page boundaries.
 	 */
 	len = PAGE_ALIGN(len);
 	if (!len)
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
-	dmabuf = heap->ops->allocate(heap, len, fd_flags, heap_flags);
+	return heap->ops->allocate(heap, len, fd_flags, heap_flags);
+}
+EXPORT_SYMBOL_GPL(dma_heap_buffer_alloc);
+
+int dma_heap_bufferfd_alloc(struct dma_heap *heap, size_t len,
+			    unsigned int fd_flags,
+			    unsigned int heap_flags)
+{
+	struct dma_buf *dmabuf;
+	int fd;
+
+	dmabuf = dma_heap_buffer_alloc(heap, len, fd_flags, heap_flags);
+
 	if (IS_ERR(dmabuf))
 		return PTR_ERR(dmabuf);
 
@@ -74,7 +115,9 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
 		/* just return, as put will call release and that will free */
 	}
 	return fd;
+
 }
+EXPORT_SYMBOL_GPL(dma_heap_bufferfd_alloc);
 
 static int dma_heap_open(struct inode *inode, struct file *file)
 {
@@ -102,15 +145,9 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
 	if (heap_allocation->fd)
 		return -EINVAL;
 
-	if (heap_allocation->fd_flags & ~DMA_HEAP_VALID_FD_FLAGS)
-		return -EINVAL;
-
-	if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
-		return -EINVAL;
-
-	fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
-				   heap_allocation->fd_flags,
-				   heap_allocation->heap_flags);
+	fd = dma_heap_bufferfd_alloc(heap, heap_allocation->len,
+				     heap_allocation->fd_flags,
+				     heap_allocation->heap_flags);
 	if (fd < 0)
 		return fd;
 
@@ -203,6 +240,47 @@ void *dma_heap_get_drvdata(struct dma_heap *heap)
 {
 	return heap->priv;
 }
+EXPORT_SYMBOL_GPL(dma_heap_get_drvdata);
+
+static void dma_heap_release(struct kref *ref)
+{
+	struct dma_heap *heap = container_of(ref, struct dma_heap, refcount);
+	int minor = MINOR(heap->heap_devt);
+
+	/* Note: we are already holding the heap_list_lock here */
+	list_del(&heap->list);
+
+	device_destroy(dma_heap_class, heap->heap_devt);
+	cdev_del(&heap->heap_cdev);
+	xa_erase(&dma_heap_minors, minor);
+
+	kfree(heap);
+}
+
+void dma_heap_put(struct dma_heap *h)
+{
+	/*
+	 * Take the heap_list_lock now to avoid racing with code
+	 * scanning the list and then taking a kref.
+	 */
+	mutex_lock(&heap_list_lock);
+	kref_put(&h->refcount, dma_heap_release);
+	mutex_unlock(&heap_list_lock);
+}
+EXPORT_SYMBOL_GPL(dma_heap_put);
+
+/**
+ * dma_heap_get_dev() - get device struct for the heap
+ * @heap: DMA-Heap to retrieve device struct from
+ *
+ * Returns:
+ * The device struct for the heap.
+ */
+struct device *dma_heap_get_dev(struct dma_heap *heap)
+{
+	return heap->heap_dev;
+}
+EXPORT_SYMBOL_GPL(dma_heap_get_dev);
 
 /**
  * dma_heap_get_name() - get heap name
@@ -215,11 +293,11 @@ const char *dma_heap_get_name(struct dma_heap *heap)
 {
 	return heap->name;
 }
+EXPORT_SYMBOL_GPL(dma_heap_get_name);
 
 struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 {
 	struct dma_heap *heap, *h, *err_ret;
-	struct device *dev_ret;
 	unsigned int minor;
 	int ret;
 
@@ -237,6 +315,7 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 	if (!heap)
 		return ERR_PTR(-ENOMEM);
 
+	kref_init(&heap->refcount);
 	heap->name = exp_info->name;
 	heap->ops = exp_info->ops;
 	heap->priv = exp_info->priv;
@@ -261,17 +340,20 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 		goto err1;
 	}
 
-	dev_ret = device_create(dma_heap_class,
-				NULL,
-				heap->heap_devt,
-				NULL,
-				heap->name);
-	if (IS_ERR(dev_ret)) {
+	heap->heap_dev = device_create(dma_heap_class,
+				       NULL,
+				       heap->heap_devt,
+				       NULL,
+				       heap->name);
+	if (IS_ERR(heap->heap_dev)) {
 		pr_err("dma_heap: Unable to create device\n");
-		err_ret = ERR_CAST(dev_ret);
+		err_ret = ERR_CAST(heap->heap_dev);
 		goto err2;
 	}
 
+	/* Make sure it doesn't disappear on us */
+	heap->heap_dev = get_device(heap->heap_dev);
+
 	mutex_lock(&heap_list_lock);
 	/* check the name is unique */
 	list_for_each_entry(h, &heap_list, list) {
@@ -280,6 +362,7 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 			pr_err("dma_heap: Already registered heap named %s\n",
 			       exp_info->name);
 			err_ret = ERR_PTR(-EINVAL);
+			put_device(heap->heap_dev);
 			goto err3;
 		}
 	}
@@ -300,6 +383,7 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 	kfree(heap);
 	return err_ret;
 }
+EXPORT_SYMBOL_GPL(dma_heap_add);
 
 static char *dma_heap_devnode(struct device *dev, umode_t *mode)
 {
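
With dma_heap_find() and the allocation entry points exported, an in-kernel client can allocate from a named heap along these lines (heap name and size are illustrative; fd_flags must stay within DMA_HEAP_VALID_FD_FLAGS):

	struct dma_heap *heap;
	struct dma_buf *buf;

	heap = dma_heap_find("system");		/* takes a reference */
	if (!heap)
		return -ENODEV;

	buf = dma_heap_buffer_alloc(heap, SZ_1M, O_RDWR | O_CLOEXEC, 0);
	if (IS_ERR(buf)) {
		dma_heap_put(heap);
		return PTR_ERR(buf);
	}

	/* ... attach/map the dma-buf ... */

	dma_heap_buffer_free(buf);	/* just dma_buf_put() */
	dma_heap_put(heap);		/* drops the find() reference */

Note that dma_heap_put() may free the heap on the last reference, which is why it takes heap_list_lock around the kref_put().
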
diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig
index a5eef06..ad58616 100644
--- a/drivers/dma-buf/heaps/Kconfig
+++ b/drivers/dma-buf/heaps/Kconfig
@@ -1,12 +1,17 @@
+menuconfig DMABUF_HEAPS_DEFERRED_FREE
+	bool "DMA-BUF heaps deferred-free library"
+	help
+	  Choose this option to enable the DMA-BUF heaps deferred-free library.
+
 config DMABUF_HEAPS_SYSTEM
-	bool "DMA-BUF System Heap"
+	tristate "DMA-BUF System Heap"
 	depends on DMABUF_HEAPS
 	help
 	  Choose this option to enable the system dmabuf heap. The system heap
 	  is backed by pages from the buddy allocator. If in doubt, say Y.
 
 config DMABUF_HEAPS_CMA
-	bool "DMA-BUF CMA Heap"
+	tristate "DMA-BUF CMA Heap"
 	depends on DMABUF_HEAPS && DMA_CMA
 	help
 	  Choose this option to enable dma-buf CMA heap. This heap is backed
diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile
index 97446779..4e78398 100644
--- a/drivers/dma-buf/heaps/Makefile
+++ b/drivers/dma-buf/heaps/Makefile
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_DMABUF_HEAPS_DEFERRED_FREE) += deferred-free-helper.o
 obj-$(CONFIG_DMABUF_HEAPS_SYSTEM)	+= system_heap.o
 obj-$(CONFIG_DMABUF_HEAPS_CMA)		+= cma_heap.o
diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index 28fb04e..3da34c9 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -99,9 +99,10 @@ static struct sg_table *cma_heap_map_dma_buf(struct dma_buf_attachment *attachme
 {
 	struct dma_heap_attachment *a = attachment->priv;
 	struct sg_table *table = &a->table;
+	int attrs = attachment->dma_map_attrs;
 	int ret;
 
-	ret = dma_map_sgtable(attachment->dev, table, direction, 0);
+	ret = dma_map_sgtable(attachment->dev, table, direction, attrs);
 	if (ret)
 		return ERR_PTR(-ENOMEM);
 	a->mapped = true;
@@ -113,9 +114,10 @@ static void cma_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
 				   enum dma_data_direction direction)
 {
 	struct dma_heap_attachment *a = attachment->priv;
+	int attrs = attachment->dma_map_attrs;
 
 	a->mapped = false;
-	dma_unmap_sgtable(attachment->dev, table, direction, 0);
+	dma_unmap_sgtable(attachment->dev, table, direction, attrs);
 }
 
 static int cma_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
@@ -405,3 +407,4 @@ static int add_default_cma_heap(void)
 module_init(add_default_cma_heap);
 MODULE_DESCRIPTION("DMA-BUF CMA Heap");
 MODULE_LICENSE("GPL v2");
+MODULE_IMPORT_NS(DMA_BUF);
diff --git a/drivers/dma-buf/heaps/deferred-free-helper.c b/drivers/dma-buf/heaps/deferred-free-helper.c
new file mode 100644
index 0000000..bae2717
--- /dev/null
+++ b/drivers/dma-buf/heaps/deferred-free-helper.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Deferred dmabuf freeing helper
+ *
+ * Copyright (C) 2020 Linaro, Ltd.
+ *
+ * Based on the ION page pool code
+ * Copyright (C) 2011 Google, Inc.
+ */
+
+#include <linux/freezer.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/sched/signal.h>
+
+#include "deferred-free-helper.h"
+
+static LIST_HEAD(free_list);
+static size_t list_nr_pages;
+wait_queue_head_t freelist_waitqueue;
+struct task_struct *freelist_task;
+static DEFINE_SPINLOCK(free_list_lock);
+
+void deferred_free(struct deferred_freelist_item *item,
+		   void (*free)(struct deferred_freelist_item*,
+				enum df_reason),
+		   size_t nr_pages)
+{
+	unsigned long flags;
+
+	INIT_LIST_HEAD(&item->list);
+	item->nr_pages = nr_pages;
+	item->free = free;
+
+	spin_lock_irqsave(&free_list_lock, flags);
+	list_add(&item->list, &free_list);
+	list_nr_pages += nr_pages;
+	spin_unlock_irqrestore(&free_list_lock, flags);
+	wake_up(&freelist_waitqueue);
+}
+EXPORT_SYMBOL_GPL(deferred_free);
+
+static size_t free_one_item(enum df_reason reason)
+{
+	unsigned long flags;
+	size_t nr_pages;
+	struct deferred_freelist_item *item;
+
+	spin_lock_irqsave(&free_list_lock, flags);
+	if (list_empty(&free_list)) {
+		spin_unlock_irqrestore(&free_list_lock, flags);
+		return 0;
+	}
+	item = list_first_entry(&free_list, struct deferred_freelist_item, list);
+	list_del(&item->list);
+	nr_pages = item->nr_pages;
+	list_nr_pages -= nr_pages;
+	spin_unlock_irqrestore(&free_list_lock, flags);
+
+	item->free(item, reason);
+	return nr_pages;
+}
+
+static unsigned long get_freelist_nr_pages(void)
+{
+	unsigned long nr_pages;
+	unsigned long flags;
+
+	spin_lock_irqsave(&free_list_lock, flags);
+	nr_pages = list_nr_pages;
+	spin_unlock_irqrestore(&free_list_lock, flags);
+	return nr_pages;
+}
+
+static unsigned long freelist_shrink_count(struct shrinker *shrinker,
+					   struct shrink_control *sc)
+{
+	return get_freelist_nr_pages();
+}
+
+static unsigned long freelist_shrink_scan(struct shrinker *shrinker,
+					  struct shrink_control *sc)
+{
+	unsigned long total_freed = 0;
+
+	if (sc->nr_to_scan == 0)
+		return 0;
+
+	while (total_freed < sc->nr_to_scan) {
+		size_t pages_freed = free_one_item(DF_UNDER_PRESSURE);
+
+		if (!pages_freed)
+			break;
+
+		total_freed += pages_freed;
+	}
+
+	return total_freed;
+}
+
+static struct shrinker freelist_shrinker = {
+	.count_objects = freelist_shrink_count,
+	.scan_objects = freelist_shrink_scan,
+	.seeks = DEFAULT_SEEKS,
+	.batch = 0,
+};
+
+static int deferred_free_thread(void *data)
+{
+	while (true) {
+		wait_event_freezable(freelist_waitqueue,
+				     get_freelist_nr_pages() > 0);
+
+		free_one_item(DF_NORMAL);
+	}
+
+	return 0;
+}
+
+static int deferred_freelist_init(void)
+{
+	list_nr_pages = 0;
+
+	init_waitqueue_head(&freelist_waitqueue);
+	freelist_task = kthread_run(deferred_free_thread, NULL,
+				    "%s", "dmabuf-deferred-free-worker");
+	if (IS_ERR(freelist_task)) {
+		pr_err("Creating thread for deferred free failed\n");
+		return -1;
+	}
+	sched_set_normal(freelist_task, 19);
+
+	return register_shrinker(&freelist_shrinker, "dmabuf-deferred-free-shrinker");
+}
+module_init(deferred_freelist_init);
+MODULE_LICENSE("GPL v2");
+
diff --git a/drivers/dma-buf/heaps/deferred-free-helper.h b/drivers/dma-buf/heaps/deferred-free-helper.h
new file mode 100644
index 0000000..1194032
--- /dev/null
+++ b/drivers/dma-buf/heaps/deferred-free-helper.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef DEFERRED_FREE_HELPER_H
+#define DEFERRED_FREE_HELPER_H
+
+/**
+ * df_reason - enum for reason why item was freed
+ *
+ * This provides the reason why the free function was called
+ * on the item. This is useful when deferred_free is used in
+ * combination with a page pool, so that under memory pressure
+ * the pages can be freed immediately rather than pooled.
+ *
+ * DF_NORMAL:         Normal deferred free
+ *
+ * DF_UNDER_PRESSURE: Free was called because the system
+ *                    is under memory pressure. Usually
+ *                    from a shrinker. Avoid allocating
+ *                    memory in the free call, as it may
+ *                    fail.
+ */
+enum df_reason {
+	DF_NORMAL,
+	DF_UNDER_PRESSURE,
+};
+
+/**
+ * deferred_freelist_item - item structure for deferred freelist
+ *
+ * Embed this in the structure whose freeing you want
+ * to defer.
+ *
+ * @nr_pages: number of pages used by item to be freed
+ * @free: function pointer to be called when freeing the item
+ * @list: list entry for the deferred list
+ */
+struct deferred_freelist_item {
+	size_t nr_pages;
+	void (*free)(struct deferred_freelist_item *i,
+		     enum df_reason reason);
+	struct list_head list;
+};
+
+/**
+ * deferred_free - call to add item to the deferred free list
+ *
+ * @item: Pointer to deferred_freelist_item field of a structure
+ * @free: Function pointer to the free call
+ * @nr_pages: number of pages to be freed
+ */
+void deferred_free(struct deferred_freelist_item *item,
+		   void (*free)(struct deferred_freelist_item *i,
+				enum df_reason reason),
+		   size_t nr_pages);
+#endif
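
A heap using this library embeds a deferred_freelist_item in its buffer and queues it instead of freeing synchronously. A minimal sketch (my_buffer and release_my_pages are hypothetical):

	struct my_buffer {
		size_t nr_pages;
		/* ... page bookkeeping ... */
		struct deferred_freelist_item deferred_free;
	};

	static void my_buffer_release(struct deferred_freelist_item *item,
				      enum df_reason reason)
	{
		struct my_buffer *buf =
			container_of(item, struct my_buffer, deferred_free);

		/* On DF_UNDER_PRESSURE, free outright rather than pooling. */
		release_my_pages(buf, reason);
		kfree(buf);
	}

	/* In the dma-buf release path: */
	deferred_free(&buf->deferred_free, my_buffer_release, buf->nr_pages);

The free then runs either from the low-priority worker thread (DF_NORMAL) or from the shrinker under memory pressure (DF_UNDER_PRESSURE).
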
diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index fcf836b..41f7aaf2 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -22,6 +22,7 @@
 #include <linux/vmalloc.h>
 
 static struct dma_heap *sys_heap;
+static struct dma_heap *sys_uncached_heap;
 
 struct system_heap_buffer {
 	struct dma_heap *heap;
@@ -31,6 +32,8 @@ struct system_heap_buffer {
 	struct sg_table sg_table;
 	int vmap_cnt;
 	void *vaddr;
+
+	bool uncached;
 };
 
 struct dma_heap_attachment {
@@ -38,6 +41,8 @@ struct dma_heap_attachment {
 	struct sg_table *table;
 	struct list_head list;
 	bool mapped;
+
+	bool uncached;
 };
 
 #define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP)
@@ -101,7 +106,7 @@ static int system_heap_attach(struct dma_buf *dmabuf,
 	a->dev = attachment->dev;
 	INIT_LIST_HEAD(&a->list);
 	a->mapped = false;
-
+	a->uncached = buffer->uncached;
 	attachment->priv = a;
 
 	mutex_lock(&buffer->lock);
@@ -131,9 +136,13 @@ static struct sg_table *system_heap_map_dma_buf(struct dma_buf_attachment *attac
 {
 	struct dma_heap_attachment *a = attachment->priv;
 	struct sg_table *table = a->table;
+	int attr = attachment->dma_map_attrs;
 	int ret;
 
-	ret = dma_map_sgtable(attachment->dev, table, direction, 0);
+	if (a->uncached)
+		attr |= DMA_ATTR_SKIP_CPU_SYNC;
+
+	ret = dma_map_sgtable(attachment->dev, table, direction, attr);
 	if (ret)
 		return ERR_PTR(ret);
 
@@ -146,9 +155,12 @@ static void system_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
 				      enum dma_data_direction direction)
 {
 	struct dma_heap_attachment *a = attachment->priv;
+	int attr = attachment->dma_map_attrs;
 
+	if (a->uncached)
+		attr |= DMA_ATTR_SKIP_CPU_SYNC;
 	a->mapped = false;
-	dma_unmap_sgtable(attachment->dev, table, direction, 0);
+	dma_unmap_sgtable(attachment->dev, table, direction, attr);
 }
 
 static int system_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
@@ -162,10 +174,12 @@ static int system_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
 	if (buffer->vmap_cnt)
 		invalidate_kernel_vmap_range(buffer->vaddr, buffer->len);
 
-	list_for_each_entry(a, &buffer->attachments, list) {
-		if (!a->mapped)
-			continue;
-		dma_sync_sgtable_for_cpu(a->dev, a->table, direction);
+	if (!buffer->uncached) {
+		list_for_each_entry(a, &buffer->attachments, list) {
+			if (!a->mapped)
+				continue;
+			dma_sync_sgtable_for_cpu(a->dev, a->table, direction);
+		}
 	}
 	mutex_unlock(&buffer->lock);
 
@@ -183,10 +197,12 @@ static int system_heap_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
 	if (buffer->vmap_cnt)
 		flush_kernel_vmap_range(buffer->vaddr, buffer->len);
 
-	list_for_each_entry(a, &buffer->attachments, list) {
-		if (!a->mapped)
-			continue;
-		dma_sync_sgtable_for_device(a->dev, a->table, direction);
+	if (!buffer->uncached) {
+		list_for_each_entry(a, &buffer->attachments, list) {
+			if (!a->mapped)
+				continue;
+			dma_sync_sgtable_for_device(a->dev, a->table, direction);
+		}
 	}
 	mutex_unlock(&buffer->lock);
 
@@ -201,6 +217,9 @@ static int system_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
 	struct sg_page_iter piter;
 	int ret;
 
+	if (buffer->uncached)
+		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+
 	for_each_sgtable_page(table, &piter, vma->vm_pgoff) {
 		struct page *page = sg_page_iter_page(&piter);
 
@@ -222,17 +241,21 @@ static void *system_heap_do_vmap(struct system_heap_buffer *buffer)
 	struct page **pages = vmalloc(sizeof(struct page *) * npages);
 	struct page **tmp = pages;
 	struct sg_page_iter piter;
+	pgprot_t pgprot = PAGE_KERNEL;
 	void *vaddr;
 
 	if (!pages)
 		return ERR_PTR(-ENOMEM);
 
+	if (buffer->uncached)
+		pgprot = pgprot_writecombine(PAGE_KERNEL);
+
 	for_each_sgtable_page(table, &piter, 0) {
 		WARN_ON(tmp - pages >= npages);
 		*tmp++ = sg_page_iter_page(&piter);
 	}
 
-	vaddr = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
+	vaddr = vmap(pages, npages, VM_MAP, pgprot);
 	vfree(pages);
 
 	if (!vaddr)
@@ -332,10 +355,11 @@ static struct page *alloc_largest_available(unsigned long size,
 	return NULL;
 }
 
-static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
-					    unsigned long len,
-					    unsigned long fd_flags,
-					    unsigned long heap_flags)
+static struct dma_buf *system_heap_do_allocate(struct dma_heap *heap,
+					       unsigned long len,
+					       unsigned long fd_flags,
+					       unsigned long heap_flags,
+					       bool uncached)
 {
 	struct system_heap_buffer *buffer;
 	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
@@ -356,6 +380,7 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
 	mutex_init(&buffer->lock);
 	buffer->heap = heap;
 	buffer->len = len;
+	buffer->uncached = uncached;
 
 	INIT_LIST_HEAD(&pages);
 	i = 0;
@@ -401,6 +426,18 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
 		ret = PTR_ERR(dmabuf);
 		goto free_pages;
 	}
+
+	/*
+	 * For uncached buffers, we need to flush the CPU cache first, since
+	 * the __GFP_ZERO on the allocation means the zeroing was done by the
+	 * CPU and the zeroed data is thus likely still cached. Map (and
+	 * implicitly flush) and unmap it now so we don't get corruption later on.
+	 */
+	if (buffer->uncached) {
+		dma_map_sgtable(dma_heap_get_dev(heap), table, DMA_BIDIRECTIONAL, 0);
+		dma_unmap_sgtable(dma_heap_get_dev(heap), table, DMA_BIDIRECTIONAL, 0);
+	}
+
 	return dmabuf;
 
 free_pages:
@@ -418,10 +455,40 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
 	return ERR_PTR(ret);
 }
 
+static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
+					    unsigned long len,
+					    unsigned long fd_flags,
+					    unsigned long heap_flags)
+{
+	return system_heap_do_allocate(heap, len, fd_flags, heap_flags, false);
+}
+
 static const struct dma_heap_ops system_heap_ops = {
 	.allocate = system_heap_allocate,
 };
 
+static struct dma_buf *system_uncached_heap_allocate(struct dma_heap *heap,
+						     unsigned long len,
+						     unsigned long fd_flags,
+						     unsigned long heap_flags)
+{
+	return system_heap_do_allocate(heap, len, fd_flags, heap_flags, true);
+}
+
+/* Dummy function to be used until we can call dma_coerce_mask_and_coherent */
+static struct dma_buf *system_uncached_heap_not_initialized(struct dma_heap *heap,
+							    unsigned long len,
+							    unsigned long fd_flags,
+							    unsigned long heap_flags)
+{
+	return ERR_PTR(-EBUSY);
+}
+
+static struct dma_heap_ops system_uncached_heap_ops = {
+	/* After system_heap_create is complete, we will swap this */
+	.allocate = system_uncached_heap_not_initialized,
+};
+
 static int system_heap_create(void)
 {
 	struct dma_heap_export_info exp_info;
@@ -434,7 +501,20 @@ static int system_heap_create(void)
 	if (IS_ERR(sys_heap))
 		return PTR_ERR(sys_heap);
 
+	exp_info.name = "system-uncached";
+	exp_info.ops = &system_uncached_heap_ops;
+	exp_info.priv = NULL;
+
+	sys_uncached_heap = dma_heap_add(&exp_info);
+	if (IS_ERR(sys_uncached_heap))
+		return PTR_ERR(sys_uncached_heap);
+
+	dma_coerce_mask_and_coherent(dma_heap_get_dev(sys_uncached_heap), DMA_BIT_MASK(64));
+	mb(); /* make sure we only set allocate after dma_mask is set */
+	system_uncached_heap_ops.allocate = system_uncached_heap_allocate;
+
 	return 0;
 }
 module_init(system_heap_create);
 MODULE_LICENSE("GPL v2");
+MODULE_IMPORT_NS(DMA_BUF);
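
From userspace the new heap appears as /dev/dma_heap/system-uncached and is exercised with the standard dma-heap allocation ioctl; a hedged sketch:

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <linux/dma-heap.h>

	int heap_fd = open("/dev/dma_heap/system-uncached", O_RDONLY | O_CLOEXEC);
	struct dma_heap_allocation_data alloc = {
		.len = 1 << 20,			/* 1 MiB; heaps page-align this */
		.fd_flags = O_RDWR | O_CLOEXEC,
	};

	if (heap_fd >= 0 && ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc) == 0) {
		/* alloc.fd is a dma-buf whose CPU mappings are write-combined */
	}

Until system_heap_create() swaps in the real allocate op, the ioctl fails with -EBUSY, which closes the window before the heap device's DMA mask is configured.
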
diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index d5e86ef..fa85c64 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -36,81 +36,6 @@
 #include "common.h"
 
 #define FFA_DRIVER_VERSION	FFA_VERSION_1_0
-
-#define FFA_SMC(calling_convention, func_num)				\
-	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, (calling_convention),	\
-			   ARM_SMCCC_OWNER_STANDARD, (func_num))
-
-#define FFA_SMC_32(func_num)	FFA_SMC(ARM_SMCCC_SMC_32, (func_num))
-#define FFA_SMC_64(func_num)	FFA_SMC(ARM_SMCCC_SMC_64, (func_num))
-
-#define FFA_ERROR			FFA_SMC_32(0x60)
-#define FFA_SUCCESS			FFA_SMC_32(0x61)
-#define FFA_INTERRUPT			FFA_SMC_32(0x62)
-#define FFA_VERSION			FFA_SMC_32(0x63)
-#define FFA_FEATURES			FFA_SMC_32(0x64)
-#define FFA_RX_RELEASE			FFA_SMC_32(0x65)
-#define FFA_RXTX_MAP			FFA_SMC_32(0x66)
-#define FFA_FN64_RXTX_MAP		FFA_SMC_64(0x66)
-#define FFA_RXTX_UNMAP			FFA_SMC_32(0x67)
-#define FFA_PARTITION_INFO_GET		FFA_SMC_32(0x68)
-#define FFA_ID_GET			FFA_SMC_32(0x69)
-#define FFA_MSG_POLL			FFA_SMC_32(0x6A)
-#define FFA_MSG_WAIT			FFA_SMC_32(0x6B)
-#define FFA_YIELD			FFA_SMC_32(0x6C)
-#define FFA_RUN				FFA_SMC_32(0x6D)
-#define FFA_MSG_SEND			FFA_SMC_32(0x6E)
-#define FFA_MSG_SEND_DIRECT_REQ		FFA_SMC_32(0x6F)
-#define FFA_FN64_MSG_SEND_DIRECT_REQ	FFA_SMC_64(0x6F)
-#define FFA_MSG_SEND_DIRECT_RESP	FFA_SMC_32(0x70)
-#define FFA_FN64_MSG_SEND_DIRECT_RESP	FFA_SMC_64(0x70)
-#define FFA_MEM_DONATE			FFA_SMC_32(0x71)
-#define FFA_FN64_MEM_DONATE		FFA_SMC_64(0x71)
-#define FFA_MEM_LEND			FFA_SMC_32(0x72)
-#define FFA_FN64_MEM_LEND		FFA_SMC_64(0x72)
-#define FFA_MEM_SHARE			FFA_SMC_32(0x73)
-#define FFA_FN64_MEM_SHARE		FFA_SMC_64(0x73)
-#define FFA_MEM_RETRIEVE_REQ		FFA_SMC_32(0x74)
-#define FFA_FN64_MEM_RETRIEVE_REQ	FFA_SMC_64(0x74)
-#define FFA_MEM_RETRIEVE_RESP		FFA_SMC_32(0x75)
-#define FFA_MEM_RELINQUISH		FFA_SMC_32(0x76)
-#define FFA_MEM_RECLAIM			FFA_SMC_32(0x77)
-#define FFA_MEM_OP_PAUSE		FFA_SMC_32(0x78)
-#define FFA_MEM_OP_RESUME		FFA_SMC_32(0x79)
-#define FFA_MEM_FRAG_RX			FFA_SMC_32(0x7A)
-#define FFA_MEM_FRAG_TX			FFA_SMC_32(0x7B)
-#define FFA_NORMAL_WORLD_RESUME		FFA_SMC_32(0x7C)
-
-/*
- * For some calls it is necessary to use SMC64 to pass or return 64-bit values.
- * For such calls FFA_FN_NATIVE(name) will choose the appropriate
- * (native-width) function ID.
- */
-#ifdef CONFIG_64BIT
-#define FFA_FN_NATIVE(name)	FFA_FN64_##name
-#else
-#define FFA_FN_NATIVE(name)	FFA_##name
-#endif
-
-/* FFA error codes. */
-#define FFA_RET_SUCCESS            (0)
-#define FFA_RET_NOT_SUPPORTED      (-1)
-#define FFA_RET_INVALID_PARAMETERS (-2)
-#define FFA_RET_NO_MEMORY          (-3)
-#define FFA_RET_BUSY               (-4)
-#define FFA_RET_INTERRUPTED        (-5)
-#define FFA_RET_DENIED             (-6)
-#define FFA_RET_RETRY              (-7)
-#define FFA_RET_ABORTED            (-8)
-
-#define MAJOR_VERSION_MASK	GENMASK(30, 16)
-#define MINOR_VERSION_MASK	GENMASK(15, 0)
-#define MAJOR_VERSION(x)	((u16)(FIELD_GET(MAJOR_VERSION_MASK, (x))))
-#define MINOR_VERSION(x)	((u16)(FIELD_GET(MINOR_VERSION_MASK, (x))))
-#define PACK_VERSION_INFO(major, minor)			\
-	(FIELD_PREP(MAJOR_VERSION_MASK, (major)) |	\
-	 FIELD_PREP(MINOR_VERSION_MASK, (minor)))
-#define FFA_VERSION_1_0		PACK_VERSION_INFO(1, 0)
 #define FFA_MIN_VERSION		FFA_VERSION_1_0
 
 #define SENDER_ID_MASK		GENMASK(31, 16)
@@ -121,12 +46,6 @@
 	(FIELD_PREP(SENDER_ID_MASK, (s)) | FIELD_PREP(RECEIVER_ID_MASK, (r)))
 
 /*
- * FF-A specification mentions explicitly about '4K pages'. This should
- * not be confused with the kernel PAGE_SIZE, which is the translation
- * granule kernel is configured and may be one among 4K, 16K and 64K.
- */
-#define FFA_PAGE_SIZE		SZ_4K
-/*
  * Keeping RX TX buffer size as 4K for now
  * 64K may be preferred to keep it at least one page in a 64K PAGE_SIZE config
  */
@@ -178,9 +97,9 @@ static struct ffa_drv_info *drv_info;
  */
 static u32 ffa_compatible_version_find(u32 version)
 {
-	u16 major = MAJOR_VERSION(version), minor = MINOR_VERSION(version);
-	u16 drv_major = MAJOR_VERSION(FFA_DRIVER_VERSION);
-	u16 drv_minor = MINOR_VERSION(FFA_DRIVER_VERSION);
+	u16 major = FFA_MAJOR_VERSION(version), minor = FFA_MINOR_VERSION(version);
+	u16 drv_major = FFA_MAJOR_VERSION(FFA_DRIVER_VERSION);
+	u16 drv_minor = FFA_MINOR_VERSION(FFA_DRIVER_VERSION);
 
 	if ((major < drv_major) || (major == drv_major && minor <= drv_minor))
 		return version;
@@ -204,16 +123,16 @@ static int ffa_version_check(u32 *version)
 
 	if (ver.a0 < FFA_MIN_VERSION) {
 		pr_err("Incompatible v%d.%d! Earliest supported v%d.%d\n",
-		       MAJOR_VERSION(ver.a0), MINOR_VERSION(ver.a0),
-		       MAJOR_VERSION(FFA_MIN_VERSION),
-		       MINOR_VERSION(FFA_MIN_VERSION));
+		       FFA_MAJOR_VERSION(ver.a0), FFA_MINOR_VERSION(ver.a0),
+		       FFA_MAJOR_VERSION(FFA_MIN_VERSION),
+		       FFA_MINOR_VERSION(FFA_MIN_VERSION));
 		return -EINVAL;
 	}
 
-	pr_info("Driver version %d.%d\n", MAJOR_VERSION(FFA_DRIVER_VERSION),
-		MINOR_VERSION(FFA_DRIVER_VERSION));
-	pr_info("Firmware version %d.%d found\n", MAJOR_VERSION(ver.a0),
-		MINOR_VERSION(ver.a0));
+	pr_info("Driver version %d.%d\n", FFA_MAJOR_VERSION(FFA_DRIVER_VERSION),
+		FFA_MINOR_VERSION(FFA_DRIVER_VERSION));
+	pr_info("Firmware version %d.%d found\n", FFA_MAJOR_VERSION(ver.a0),
+		FFA_MINOR_VERSION(ver.a0));
 	*version = ffa_compatible_version_find(ver.a0);
 
 	return 0;
diff --git a/drivers/firmware/smccc/kvm_guest.c b/drivers/firmware/smccc/kvm_guest.c
index 89a68e7..e2c9528 100644
--- a/drivers/firmware/smccc/kvm_guest.c
+++ b/drivers/firmware/smccc/kvm_guest.c
@@ -10,6 +10,8 @@
 
 #include <asm/hypervisor.h>
 
+void __weak kvm_arm_init_hyp_services(void) {}
+
 static DECLARE_BITMAP(__kvm_arm_hyp_services, ARM_SMCCC_KVM_NUM_FUNCS) __ro_after_init = { };
 
 void __init kvm_init_hyp_services(void)
@@ -39,6 +41,8 @@ void __init kvm_init_hyp_services(void)
 
 	pr_info("hypervisor services detected (0x%08lx 0x%08lx 0x%08lx 0x%08lx)\n",
 		 res.a3, res.a2, res.a1, res.a0);
+
+	kvm_arm_init_hyp_services();
 }
 
 bool kvm_arm_hyp_service_available(u32 func_id)
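
The __weak stub lets architecture or vendor code provide a strong kvm_arm_init_hyp_services() that runs once the service bitmap has been populated. A hypothetical strong definition:

	/* Built-in code elsewhere overrides the weak stub, e.g.: */
	void kvm_arm_init_hyp_services(void)
	{
		if (kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_PTP))
			pr_info("PTP hypervisor service detected\n");
	}

Because weak symbols are resolved at link time and kvm_init_hyp_services() is __init, the override has to be built into the kernel rather than shipped as a loadable module.
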
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index ca2a6e6..8c69534 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -678,9 +678,9 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_RMFB, drm_mode_rmfb_ioctl, 0),
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_PAGE_FLIP, drm_mode_page_flip_ioctl, DRM_MASTER),
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_DIRTYFB, drm_mode_dirtyfb_ioctl, DRM_MASTER),
-	DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_DUMB, drm_mode_create_dumb_ioctl, 0),
-	DRM_IOCTL_DEF(DRM_IOCTL_MODE_MAP_DUMB, drm_mode_mmap_dumb_ioctl, 0),
-	DRM_IOCTL_DEF(DRM_IOCTL_MODE_DESTROY_DUMB, drm_mode_destroy_dumb_ioctl, 0),
+	DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_DUMB, drm_mode_create_dumb_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF(DRM_IOCTL_MODE_MAP_DUMB, drm_mode_mmap_dumb_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF(DRM_IOCTL_MODE_DESTROY_DUMB, drm_mode_destroy_dumb_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_OBJ_GETPROPERTIES, drm_mode_obj_get_properties_ioctl, 0),
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_OBJ_SETPROPERTY, drm_mode_obj_set_property_ioctl, DRM_MASTER),
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_CURSOR2, drm_mode_cursor2_ioctl, DRM_MASTER),
diff --git a/drivers/gpu/drm/drm_modes.c b/drivers/gpu/drm/drm_modes.c
index 304004f..fb3a2ad 100644
--- a/drivers/gpu/drm/drm_modes.c
+++ b/drivers/gpu/drm/drm_modes.c
@@ -2012,6 +2012,7 @@ void drm_mode_convert_to_umode(struct drm_mode_modeinfo *out,
 	strncpy(out->name, in->name, DRM_DISPLAY_MODE_LEN);
 	out->name[DRM_DISPLAY_MODE_LEN-1] = 0;
 }
+EXPORT_SYMBOL_GPL(drm_mode_convert_to_umode);
 
 /**
  * drm_mode_convert_umode - convert a modeinfo into a drm_display_mode
@@ -2088,6 +2089,7 @@ int drm_mode_convert_umode(struct drm_device *dev,
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(drm_mode_convert_umode);
 
 /**
  * drm_mode_is_420_only - if a given videomode can be only supported in YCBCR420
diff --git a/drivers/gpu/drm/virtio/virtgpu_display.c b/drivers/gpu/drm/virtio/virtgpu_display.c
index 9ea7611..2df9d83 100644
--- a/drivers/gpu/drm/virtio/virtgpu_display.c
+++ b/drivers/gpu/drm/virtio/virtgpu_display.c
@@ -301,10 +301,6 @@ virtio_gpu_user_framebuffer_create(struct drm_device *dev,
 	struct virtio_gpu_framebuffer *virtio_gpu_fb;
 	int ret;
 
-	if (mode_cmd->pixel_format != DRM_FORMAT_HOST_XRGB8888 &&
-	    mode_cmd->pixel_format != DRM_FORMAT_HOST_ARGB8888)
-		return ERR_PTR(-ENOENT);
-
 	/* lookup object associated with res handle */
 	obj = drm_gem_object_lookup(file_priv, mode_cmd->handles[0]);
 	if (!obj)
@@ -340,7 +336,6 @@ int virtio_gpu_modeset_init(struct virtio_gpu_device *vgdev)
 	if (ret)
 		return ret;
 
-	vgdev->ddev->mode_config.quirk_addfb_prefer_host_byte_order = true;
 	vgdev->ddev->mode_config.funcs = &virtio_gpu_mode_funcs;
 
 	/* modes will be validated against the framebuffer size */
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
index 4c09e31..b8fcbca4 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -30,7 +30,14 @@
 #include "virtgpu_drv.h"
 
 static const uint32_t virtio_gpu_formats[] = {
-	DRM_FORMAT_HOST_XRGB8888,
+	DRM_FORMAT_XRGB8888,
+	DRM_FORMAT_ARGB8888,
+	DRM_FORMAT_BGRX8888,
+	DRM_FORMAT_BGRA8888,
+	DRM_FORMAT_RGBX8888,
+	DRM_FORMAT_RGBA8888,
+	DRM_FORMAT_XBGR8888,
+	DRM_FORMAT_ABGR8888,
 };
 
 static const uint32_t virtio_gpu_cursor_formats[] = {
@@ -42,6 +49,32 @@ uint32_t virtio_gpu_translate_format(uint32_t drm_fourcc)
 	uint32_t format;
 
 	switch (drm_fourcc) {
+#ifdef __BIG_ENDIAN
+	case DRM_FORMAT_XRGB8888:
+		format = VIRTIO_GPU_FORMAT_X8R8G8B8_UNORM;
+		break;
+	case DRM_FORMAT_ARGB8888:
+		format = VIRTIO_GPU_FORMAT_A8R8G8B8_UNORM;
+		break;
+	case DRM_FORMAT_BGRX8888:
+		format = VIRTIO_GPU_FORMAT_B8G8R8X8_UNORM;
+		break;
+	case DRM_FORMAT_BGRA8888:
+		format = VIRTIO_GPU_FORMAT_B8G8R8A8_UNORM;
+		break;
+	case DRM_FORMAT_RGBX8888:
+		format = VIRTIO_GPU_FORMAT_R8G8B8X8_UNORM;
+		break;
+	case DRM_FORMAT_RGBA8888:
+		format = VIRTIO_GPU_FORMAT_R8G8B8A8_UNORM;
+		break;
+	case DRM_FORMAT_XBGR8888:
+		format = VIRTIO_GPU_FORMAT_X8B8G8R8_UNORM;
+		break;
+	case DRM_FORMAT_ABGR8888:
+		format = VIRTIO_GPU_FORMAT_A8B8G8R8_UNORM;
+		break;
+#else
 	case DRM_FORMAT_XRGB8888:
 		format = VIRTIO_GPU_FORMAT_B8G8R8X8_UNORM;
 		break;
@@ -54,6 +87,19 @@ uint32_t virtio_gpu_translate_format(uint32_t drm_fourcc)
 	case DRM_FORMAT_BGRA8888:
 		format = VIRTIO_GPU_FORMAT_A8R8G8B8_UNORM;
 		break;
+	case DRM_FORMAT_RGBX8888:
+		format = VIRTIO_GPU_FORMAT_X8B8G8R8_UNORM;
+		break;
+	case DRM_FORMAT_RGBA8888:
+		format = VIRTIO_GPU_FORMAT_A8B8G8R8_UNORM;
+		break;
+	case DRM_FORMAT_XBGR8888:
+		format = VIRTIO_GPU_FORMAT_R8G8B8X8_UNORM;
+		break;
+	case DRM_FORMAT_ABGR8888:
+		format = VIRTIO_GPU_FORMAT_R8G8B8A8_UNORM;
+		break;
+#endif
 	default:
 		/*
 		 * This should not happen, we handle everything listed
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 56f7e06..0ffe095 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -19,6 +19,7 @@
 #include <linux/interrupt.h>
 #include <linux/set_memory.h>
 #include <asm/page.h>
+#include <asm/mem_encrypt.h>
 #include <asm/mshyperv.h>
 
 #include "hyperv_vmbus.h"
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 9dc27e5..074bffd 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -21,6 +21,7 @@
 #include <linux/export.h>
 #include <linux/io.h>
 #include <linux/set_memory.h>
+#include <asm/mem_encrypt.h>
 #include <asm/mshyperv.h>
 
 #include "hyperv_vmbus.h"
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 9297b74..7ac6dc2 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -28,6 +28,7 @@
 #include <linux/spinlock.h>
 #include <linux/swiotlb.h>
 #include <linux/vmalloc.h>
+#include <trace/hooks/iommu.h>
 
 #include "dma-iommu.h"
 
@@ -578,6 +579,9 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
 	}
 
 	init_iova_domain(iovad, 1UL << order, base_pfn);
+
+	trace_android_rvh_iommu_iovad_init_alloc_algo(dev, iovad);
+
 	ret = iova_domain_init_rcaches(iovad);
 	if (ret)
 		goto done_unlock;
@@ -651,6 +655,8 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
 		iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
 				       true);
 
+	trace_android_vh_iommu_iovad_alloc_iova(dev, iovad, (dma_addr_t)iova << shift, size);
+
 	return (dma_addr_t)iova << shift;
 }
 
@@ -669,6 +675,8 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
 	else
 		free_iova_fast(iovad, iova_pfn(iovad, iova),
 				size >> iova_shift(iovad));
+
+	trace_android_vh_iommu_iovad_free_iova(iovad, iova, size);
 }
 
 static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr,
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index fe452ce..1cf7d0f 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -11,6 +11,7 @@
 #include <linux/smp.h>
 #include <linux/bitops.h>
 #include <linux/cpu.h>
+#include <trace/hooks/iommu.h>
 
 /* The anchor node sits above the top of the usable address space */
 #define IOVA_ANCHOR	~0UL
@@ -70,6 +71,7 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
 	iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
 	rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
 	rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
+	android_init_vendor_data(iovad, 1);
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);
 
@@ -186,8 +188,11 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
 	unsigned long align_mask = ~0UL;
 	unsigned long high_pfn = limit_pfn, low_pfn = iovad->start_pfn;
 
-	if (size_aligned)
-		align_mask <<= fls_long(size - 1);
+	if (size_aligned) {
+		unsigned long shift = fls_long(size - 1);
+		trace_android_rvh_iommu_limit_align_shift(iovad, size, &shift);
+		align_mask <<= shift;
+	}
 
 	/* Walk the tree backwards */
 	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
@@ -316,14 +321,18 @@ alloc_iova(struct iova_domain *iovad, unsigned long size,
 	bool size_aligned)
 {
 	struct iova *new_iova;
-	int ret;
+	int ret = -1;
 
 	new_iova = alloc_iova_mem();
 	if (!new_iova)
 		return NULL;
 
-	ret = __alloc_and_insert_iova_range(iovad, size, limit_pfn + 1,
-			new_iova, size_aligned);
+	trace_android_rvh_iommu_alloc_insert_iova(iovad, size, limit_pfn + 1,
+			new_iova, size_aligned, &ret);
+	if (ret) {
+		ret = __alloc_and_insert_iova_range(iovad, size,
+			limit_pfn + 1, new_iova, size_aligned);
+	}
 
 	if (ret) {
 		free_iova_mem(new_iova);
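
Vendors can replace the rbtree allocator entirely through the restricted hook: the probe performs its own allocation and stores 0 in *ret on success, so the fallback __alloc_and_insert_iova_range() call is skipped. A minimal sketch (my_best_fit_alloc is hypothetical):

	static void my_alloc_insert(void *unused, struct iova_domain *iovad,
				    unsigned long size, unsigned long limit_pfn,
				    struct iova *new_iova, bool size_aligned,
				    int *ret)
	{
		*ret = my_best_fit_alloc(iovad, size, limit_pfn, new_iova,
					 size_aligned);
	}

	/* Restricted vendor hooks can only be registered once, early in boot: */
	register_trace_android_rvh_iommu_alloc_insert_iova(my_alloc_insert, NULL);

Leaving *ret untouched (it is preinitialized to -1) keeps the stock allocator in charge.
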
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 34d5856..d738685 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -18,6 +18,7 @@
 #include <linux/percpu.h>
 #include <linux/refcount.h>
 #include <linux/slab.h>
+#include <trace/hooks/gic_v3.h>
 
 #include <linux/irqchip.h>
 #include <linux/irqchip/arm-gic-common.h>
@@ -881,11 +882,15 @@ static void __init gic_dist_init(void)
 	 * enabled.
 	 */
 	affinity = gic_mpidr_to_affinity(cpu_logical_map(smp_processor_id()));
-	for (i = 32; i < GIC_LINE_NR; i++)
+	for (i = 32; i < GIC_LINE_NR; i++) {
+		trace_android_vh_gic_v3_affinity_init(i, GICD_IROUTER, &affinity);
 		gic_write_irouter(affinity, base + GICD_IROUTER + i * 8);
+	}
 
-	for (i = 0; i < GIC_ESPI_NR; i++)
+	for (i = 0; i < GIC_ESPI_NR; i++) {
+		trace_android_vh_gic_v3_affinity_init(i, GICD_IROUTERnE, &affinity);
 		gic_write_irouter(affinity, base + GICD_IROUTERnE + i * 8);
+	}
 }
 
 static int gic_iterate_rdists(int (*fn)(struct redist_region *, void __iomem *))
@@ -1347,6 +1352,9 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 	reg = gic_dist_base(d) + offset + (index * 8);
 	val = gic_mpidr_to_affinity(cpu_logical_map(cpu));
 
+	trace_android_rvh_gic_v3_set_affinity(d, mask_val, &val, force, gic_dist_base(d),
+					gic_data.redist_regions[0].redist_base,
+					gic_data.redist_stride);
 	gic_write_irouter(val, reg);
 
 	/*
diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 4c7bae0..98c6778 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -39,6 +39,7 @@
 #include <linux/irqchip.h>
 #include <linux/irqchip/chained_irq.h>
 #include <linux/irqchip/arm-gic.h>
+#include <trace/hooks/gic.h>
 
 #include <asm/cputype.h>
 #include <asm/irq.h>
@@ -816,6 +817,8 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 		writeb_relaxed(gic_cpu_map[cpu], reg);
 	irq_data_update_effective_affinity(d, cpumask_of(cpu));
 
+	trace_android_vh_gic_set_affinity(d, mask_val, force, gic_cpu_map, reg);
+
 	return IRQ_SET_MASK_OK_DONE;
 }
 
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 998a5cf..dd70f54 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -290,6 +290,27 @@
 
 	  If unsure, say N.
 
+config DM_DEFAULT_KEY
+	tristate "Default-key target support"
+	depends on BLK_DEV_DM
+	depends on BLK_INLINE_ENCRYPTION
+	# dm-default-key doesn't require -o inlinecrypt, but it does currently
+	# rely on the inline encryption hooks being built into the kernel.
+	depends on FS_ENCRYPTION_INLINE_CRYPT
+	help
+	  This device-mapper target allows you to create a device that
+	  assigns a default encryption key to bios that aren't for the
+	  contents of an encrypted file.
+
+	  This ensures that all blocks on-disk will be encrypted with
+	  some key, without the performance hit of file contents being
+	  encrypted twice when fscrypt (File-Based Encryption) is used.
+
+	  It is only appropriate to use dm-default-key when key
+	  configuration is tightly controlled, like it is in Android,
+	  such that all fscrypt keys are at least as hard to compromise
+	  as the default key.
+
 config DM_SNAPSHOT
        tristate "Snapshot target"
        depends on BLK_DEV_DM
@@ -653,4 +674,18 @@
 	  Enables audit logging of several security relevant events in the
 	  particular device-mapper targets, especially the integrity target.
 
+config DM_USER
+	tristate "Block device in userspace"
+	depends on BLK_DEV_DM
+	default y
+	help
+	  This device-mapper target allows a userspace daemon to provide the
+	  contents of a block device.  See
+	  <file:Documentation/block/dm-user.rst> for more information.
+
+	  To compile this code as a module, choose M here: the module will be
+	  called dm-user.
+
+	  If unsure, say N.
+
 endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 84291e3..369f2ae 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -57,6 +57,7 @@
 obj-$(CONFIG_DM_BUFIO)		+= dm-bufio.o
 obj-$(CONFIG_DM_BIO_PRISON)	+= dm-bio-prison.o
 obj-$(CONFIG_DM_CRYPT)		+= dm-crypt.o
+obj-$(CONFIG_DM_DEFAULT_KEY)	+= dm-default-key.o
 obj-$(CONFIG_DM_DELAY)		+= dm-delay.o
 obj-$(CONFIG_DM_DUST)		+= dm-dust.o
 obj-$(CONFIG_DM_FLAKEY)		+= dm-flakey.o
@@ -84,6 +85,7 @@
 obj-$(CONFIG_DM_ZONED)		+= dm-zoned.o
 obj-$(CONFIG_DM_WRITECACHE)	+= dm-writecache.o
 obj-$(CONFIG_SECURITY_LOADPIN_VERITY)	+= dm-verity-loadpin.o
+obj-$(CONFIG_DM_USER)		+= dm-user.o
 
 ifeq ($(CONFIG_DM_INIT),y)
 dm-mod-objs			+= dm-init.o
diff --git a/drivers/md/dm-default-key.c b/drivers/md/dm-default-key.c
new file mode 100644
index 0000000..577c5fd
--- /dev/null
+++ b/drivers/md/dm-default-key.c
@@ -0,0 +1,439 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2017 Google, Inc.
+ */
+
+#include <linux/blk-crypto.h>
+#include <linux/device-mapper.h>
+#include <linux/module.h>
+
+#define DM_MSG_PREFIX		"default-key"
+
+static const struct dm_default_key_cipher {
+	const char *name;
+	enum blk_crypto_mode_num mode_num;
+	int key_size;
+} dm_default_key_ciphers[] = {
+	{
+		.name = "aes-xts-plain64",
+		.mode_num = BLK_ENCRYPTION_MODE_AES_256_XTS,
+		.key_size = 64,
+	}, {
+		.name = "xchacha12,aes-adiantum-plain64",
+		.mode_num = BLK_ENCRYPTION_MODE_ADIANTUM,
+		.key_size = 32,
+	},
+};
+
+/**
+ * struct default_key_c - private data of a default-key target
+ * @dev: the underlying device
+ * @start: starting sector of the range of @dev which this target actually maps.
+ *	   For this purpose a "sector" is 512 bytes.
+ * @cipher_string: the name of the encryption algorithm being used
+ * @iv_offset: starting offset for IVs.  IVs are generated as if the target were
+ *	       preceded by @iv_offset 512-byte sectors.
+ * @sector_size: crypto sector size in bytes (usually 4096)
+ * @sector_bits: log2(sector_size)
+ * @key: the encryption key to use
+ * @max_dun: the maximum DUN that may be used (computed from other params)
+ */
+struct default_key_c {
+	struct dm_dev *dev;
+	sector_t start;
+	const char *cipher_string;
+	u64 iv_offset;
+	unsigned int sector_size;
+	unsigned int sector_bits;
+	struct blk_crypto_key key;
+	enum blk_crypto_key_type key_type;
+	u64 max_dun;
+};
+
+static const struct dm_default_key_cipher *
+lookup_cipher(const char *cipher_string)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(dm_default_key_ciphers); i++) {
+		if (strcmp(cipher_string, dm_default_key_ciphers[i].name) == 0)
+			return &dm_default_key_ciphers[i];
+	}
+	return NULL;
+}
+
+static void default_key_dtr(struct dm_target *ti)
+{
+	struct default_key_c *dkc = ti->private;
+	int err;
+
+	if (dkc->dev) {
+		err = blk_crypto_evict_key(dkc->dev->bdev, &dkc->key);
+		if (err && err != -ENOKEY)
+			DMWARN("Failed to evict crypto key: %d", err);
+		dm_put_device(ti, dkc->dev);
+	}
+	kfree_sensitive(dkc->cipher_string);
+	kfree_sensitive(dkc);
+}
+
+static int default_key_ctr_optional(struct dm_target *ti,
+				    unsigned int argc, char **argv)
+{
+	struct default_key_c *dkc = ti->private;
+	struct dm_arg_set as;
+	static const struct dm_arg _args[] = {
+		{0, 4, "Invalid number of feature args"},
+	};
+	unsigned int opt_params;
+	const char *opt_string;
+	bool iv_large_sectors = false;
+	char dummy;
+	int err;
+
+	as.argc = argc;
+	as.argv = argv;
+
+	err = dm_read_arg_group(_args, &as, &opt_params, &ti->error);
+	if (err)
+		return err;
+
+	while (opt_params--) {
+		opt_string = dm_shift_arg(&as);
+		if (!opt_string) {
+			ti->error = "Not enough feature arguments";
+			return -EINVAL;
+		}
+		if (!strcmp(opt_string, "allow_discards")) {
+			ti->num_discard_bios = 1;
+		} else if (sscanf(opt_string, "sector_size:%u%c",
+				  &dkc->sector_size, &dummy) == 1) {
+			if (dkc->sector_size < SECTOR_SIZE ||
+			    dkc->sector_size > 4096 ||
+			    !is_power_of_2(dkc->sector_size)) {
+				ti->error = "Invalid sector_size";
+				return -EINVAL;
+			}
+		} else if (!strcmp(opt_string, "iv_large_sectors")) {
+			iv_large_sectors = true;
+		} else if (!strcmp(opt_string, "wrappedkey_v0")) {
+			dkc->key_type = BLK_CRYPTO_KEY_TYPE_HW_WRAPPED;
+		} else {
+			ti->error = "Invalid feature arguments";
+			return -EINVAL;
+		}
+	}
+
+	/* dm-default-key doesn't implement iv_large_sectors=false. */
+	if (dkc->sector_size != SECTOR_SIZE && !iv_large_sectors) {
+		ti->error = "iv_large_sectors must be specified";
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * Construct a default-key mapping:
+ * <cipher> <key> <iv_offset> <dev_path> <start>
+ *
+ * This syntax matches dm-crypt's, but lots of unneeded functionality has been
+ * removed.  Also, dm-default-key requires that the "iv_large_sectors" option be
+ * given whenever a non-default sector size is used.
+ */
+static int default_key_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+	struct default_key_c *dkc;
+	const struct dm_default_key_cipher *cipher;
+	u8 raw_key[BLK_CRYPTO_MAX_ANY_KEY_SIZE];
+	unsigned int raw_key_size;
+	unsigned int dun_bytes;
+	unsigned long long tmpll;
+	char dummy;
+	int err;
+
+	if (argc < 5) {
+		ti->error = "Not enough arguments";
+		return -EINVAL;
+	}
+
+	dkc = kzalloc(sizeof(*dkc), GFP_KERNEL);
+	if (!dkc) {
+		ti->error = "Out of memory";
+		return -ENOMEM;
+	}
+	ti->private = dkc;
+	dkc->key_type = BLK_CRYPTO_KEY_TYPE_STANDARD;
+
+	/* <cipher> */
+	dkc->cipher_string = kstrdup(argv[0], GFP_KERNEL);
+	if (!dkc->cipher_string) {
+		ti->error = "Out of memory";
+		err = -ENOMEM;
+		goto bad;
+	}
+	cipher = lookup_cipher(dkc->cipher_string);
+	if (!cipher) {
+		ti->error = "Unsupported cipher";
+		err = -EINVAL;
+		goto bad;
+	}
+
+	/* <key> */
+	raw_key_size = strlen(argv[1]);
+	if (raw_key_size > 2 * BLK_CRYPTO_MAX_ANY_KEY_SIZE ||
+	    raw_key_size % 2) {
+		ti->error = "Invalid keysize";
+		err = -EINVAL;
+		goto bad;
+	}
+	raw_key_size /= 2;
+	if (hex2bin(raw_key, argv[1], raw_key_size) != 0) {
+		ti->error = "Malformed key string";
+		err = -EINVAL;
+		goto bad;
+	}
+
+	/* <iv_offset> */
+	if (sscanf(argv[2], "%llu%c", &dkc->iv_offset, &dummy) != 1) {
+		ti->error = "Invalid iv_offset sector";
+		err = -EINVAL;
+		goto bad;
+	}
+
+	/* <dev_path> */
+	err = dm_get_device(ti, argv[3], dm_table_get_mode(ti->table),
+			    &dkc->dev);
+	if (err) {
+		ti->error = "Device lookup failed";
+		goto bad;
+	}
+
+	/* <start> */
+	if (sscanf(argv[4], "%llu%c", &tmpll, &dummy) != 1 ||
+	    tmpll != (sector_t)tmpll) {
+		ti->error = "Invalid start sector";
+		err = -EINVAL;
+		goto bad;
+	}
+	dkc->start = tmpll;
+
+	/* optional arguments */
+	dkc->sector_size = SECTOR_SIZE;
+	if (argc > 5) {
+		err = default_key_ctr_optional(ti, argc - 5, &argv[5]);
+		if (err)
+			goto bad;
+	}
+	dkc->sector_bits = ilog2(dkc->sector_size);
+	if (ti->len & ((dkc->sector_size >> SECTOR_SHIFT) - 1)) {
+		ti->error = "Device size is not a multiple of sector_size";
+		err = -EINVAL;
+		goto bad;
+	}
+
+	dkc->max_dun = (dkc->iv_offset + ti->len - 1) >>
+		       (dkc->sector_bits - SECTOR_SHIFT);
+	dun_bytes = DIV_ROUND_UP(fls64(dkc->max_dun), 8);
+
+	err = blk_crypto_init_key(&dkc->key, raw_key, raw_key_size,
+				  dkc->key_type, cipher->mode_num,
+				  dun_bytes, dkc->sector_size);
+	if (err) {
+		ti->error = "Error initializing blk-crypto key";
+		goto bad;
+	}
+
+	err = blk_crypto_start_using_key(dkc->dev->bdev, &dkc->key);
+	if (err) {
+		ti->error = "Error starting to use blk-crypto";
+		goto bad;
+	}
+
+	ti->num_flush_bios = 1;
+
+	err = 0;
+	goto out;
+
+bad:
+	default_key_dtr(ti);
+out:
+	memzero_explicit(raw_key, sizeof(raw_key));
+	return err;
+}
+
+static int default_key_map(struct dm_target *ti, struct bio *bio)
+{
+	const struct default_key_c *dkc = ti->private;
+	sector_t sector_in_target;
+	u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE] = { 0 };
+
+	bio_set_dev(bio, dkc->dev->bdev);
+
+	/*
+	 * If the bio is a device-level request which doesn't target a specific
+	 * sector, there's nothing more to do.
+	 */
+	if (bio_sectors(bio) == 0)
+		return DM_MAPIO_REMAPPED;
+
+	/* Map the bio's sector to the underlying device. (512-byte sectors) */
+	sector_in_target = dm_target_offset(ti, bio->bi_iter.bi_sector);
+	bio->bi_iter.bi_sector = dkc->start + sector_in_target;
+
+	/*
+	 * If the bio should skip dm-default-key (i.e. if it's for an encrypted
+	 * file's contents), or if it doesn't have any data (e.g. if it's a
+	 * DISCARD request), there's nothing more to do.
+	 */
+	if (bio_should_skip_dm_default_key(bio) || !bio_has_data(bio))
+		return DM_MAPIO_REMAPPED;
+
+	/*
+	 * Else, dm-default-key needs to set this bio's encryption context.
+	 * It must not already have one.
+	 */
+	if (WARN_ON_ONCE(bio_has_crypt_ctx(bio)))
+		return DM_MAPIO_KILL;
+
+	/* Calculate the DUN and enforce data-unit (crypto sector) alignment. */
+	dun[0] = dkc->iv_offset + sector_in_target; /* 512-byte sectors */
+	if (dun[0] & ((dkc->sector_size >> SECTOR_SHIFT) - 1))
+		return DM_MAPIO_KILL;
+	dun[0] >>= dkc->sector_bits - SECTOR_SHIFT; /* crypto sectors */
+
+	/*
+	 * This check isn't necessary as we should have calculated max_dun
+	 * correctly, but be safe.
+	 */
+	if (WARN_ON_ONCE(dun[0] > dkc->max_dun))
+		return DM_MAPIO_KILL;
+
+	bio_crypt_set_ctx(bio, &dkc->key, dun, GFP_NOIO);
+
+	return DM_MAPIO_REMAPPED;
+}
+
+static void default_key_status(struct dm_target *ti, status_type_t type,
+			       unsigned int status_flags, char *result,
+			       unsigned int maxlen)
+{
+	const struct default_key_c *dkc = ti->private;
+	unsigned int sz = 0;
+	int num_feature_args = 0;
+
+	switch (type) {
+	case STATUSTYPE_INFO:
+	case STATUSTYPE_IMA:
+		result[0] = '\0';
+		break;
+
+	case STATUSTYPE_TABLE:
+		/* Omit the key for now. */
+		DMEMIT("%s - %llu %s %llu", dkc->cipher_string, dkc->iv_offset,
+		       dkc->dev->name, (unsigned long long)dkc->start);
+
+		num_feature_args += !!ti->num_discard_bios;
+		if (dkc->sector_size != SECTOR_SIZE)
+			num_feature_args += 2;
+		if (dkc->key_type == BLK_CRYPTO_KEY_TYPE_HW_WRAPPED)
+			num_feature_args += 1;
+		if (num_feature_args != 0) {
+			DMEMIT(" %d", num_feature_args);
+			if (ti->num_discard_bios)
+				DMEMIT(" allow_discards");
+			if (dkc->sector_size != SECTOR_SIZE) {
+				DMEMIT(" sector_size:%u", dkc->sector_size);
+				DMEMIT(" iv_large_sectors");
+			}
+			if (dkc->key_type == BLK_CRYPTO_KEY_TYPE_HW_WRAPPED)
+				DMEMIT(" wrappedkey_v0");
+		}
+		break;
+	}
+}
+
+static int default_key_prepare_ioctl(struct dm_target *ti,
+				     struct block_device **bdev)
+{
+	const struct default_key_c *dkc = ti->private;
+	const struct dm_dev *dev = dkc->dev;
+
+	*bdev = dev->bdev;
+
+	/* Only pass ioctls through if the device sizes match exactly. */
+	if (dkc->start != 0 ||
+	    ti->len != i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT)
+		return 1;
+	return 0;
+}
+
+static int default_key_iterate_devices(struct dm_target *ti,
+				       iterate_devices_callout_fn fn,
+				       void *data)
+{
+	const struct default_key_c *dkc = ti->private;
+
+	return fn(ti, dkc->dev, dkc->start, ti->len, data);
+}
+
+static void default_key_io_hints(struct dm_target *ti,
+				 struct queue_limits *limits)
+{
+	const struct default_key_c *dkc = ti->private;
+	const unsigned int sector_size = dkc->sector_size;
+
+	limits->logical_block_size =
+		max_t(unsigned int, limits->logical_block_size, sector_size);
+	limits->physical_block_size =
+		max_t(unsigned int, limits->physical_block_size, sector_size);
+	limits->io_min = max_t(unsigned int, limits->io_min, sector_size);
+}
+
+#ifdef CONFIG_BLK_DEV_ZONED
+static int default_key_report_zones(struct dm_target *ti,
+		struct dm_report_zones_args *args, unsigned int nr_zones)
+{
+	struct default_key_c *dkc = ti->private;
+
+	return dm_report_zones(dkc->dev->bdev, dkc->start,
+			dkc->start + dm_target_offset(ti, args->next_sector),
+			args, nr_zones);
+}
+#else
+#define default_key_report_zones NULL
+#endif
+
+static struct target_type default_key_target = {
+	.name			= "default-key",
+	.version		= {2, 1, 0},
+	.features		= DM_TARGET_PASSES_CRYPTO | DM_TARGET_ZONED_HM,
+	.report_zones		= default_key_report_zones,
+	.module			= THIS_MODULE,
+	.ctr			= default_key_ctr,
+	.dtr			= default_key_dtr,
+	.map			= default_key_map,
+	.status			= default_key_status,
+	.prepare_ioctl		= default_key_prepare_ioctl,
+	.iterate_devices	= default_key_iterate_devices,
+	.io_hints		= default_key_io_hints,
+};
+
+static int __init dm_default_key_init(void)
+{
+	return dm_register_target(&default_key_target);
+}
+
+static void __exit dm_default_key_exit(void)
+{
+	dm_unregister_target(&default_key_target);
+}
+
+module_init(dm_default_key_init);
+module_exit(dm_default_key_exit);
+
+MODULE_AUTHOR("Paul Lawrence <paullawrence@google.com>");
+MODULE_AUTHOR("Paul Crowley <paulcrowley@google.com>");
+MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
+MODULE_DESCRIPTION(DM_NAME " target for encrypting filesystem metadata");
+MODULE_LICENSE("GPL");
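
Putting the constructor arguments together, a dm table line for this target would look roughly like the following (placeholders in angle brackets; aes-xts-plain64 takes a 64-byte key, i.e. 128 hex characters):

	0 <num_512B_sectors> default-key aes-xts-plain64 <key_in_hex> 0 <dev_path> 0 3 allow_discards sector_size:4096 iv_large_sectors

The trailing "3" introduces the three optional feature arguments; per the constructor above, iv_large_sectors is mandatory whenever sector_size differs from 512 bytes.
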
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 078da18..e91afd6 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1215,7 +1215,7 @@ static int dm_keyslot_evict_callback(struct dm_target *ti, struct dm_dev *dev,
 	struct dm_keyslot_evict_args *args = data;
 	int err;
 
-	err = blk_crypto_evict_key(bdev_get_queue(dev->bdev), args->key);
+	err = blk_crypto_evict_key(dev->bdev, args->key);
 	if (!args->err)
 		args->err = err;
 	/* Always try to evict the key from all devices. */
@@ -1251,6 +1251,68 @@ static int dm_keyslot_evict(struct blk_crypto_profile *profile,
 	return args.err;
 }
 
+struct dm_derive_sw_secret_args {
+	const u8 *eph_key;
+	size_t eph_key_size;
+	u8 *sw_secret;
+	int err;
+};
+
+static int dm_derive_sw_secret_callback(struct dm_target *ti,
+					struct dm_dev *dev, sector_t start,
+					sector_t len, void *data)
+{
+	struct dm_derive_sw_secret_args *args = data;
+
+	if (!args->err)
+		return 0;
+
+	args->err = blk_crypto_derive_sw_secret(dev->bdev,
+						args->eph_key,
+						args->eph_key_size,
+						args->sw_secret);
+	/* Try another device in case this fails. */
+	return 0;
+}
+
+/*
+ * Retrieve the sw_secret from the underlying device.  Given that only one
+ * sw_secret can exist for a particular wrapped key, retrieve it only from the
+ * first device that supports derive_sw_secret().
+ */
+static int dm_derive_sw_secret(struct blk_crypto_profile *profile,
+			       const u8 *eph_key, size_t eph_key_size,
+			       u8 sw_secret[BLK_CRYPTO_SW_SECRET_SIZE])
+{
+	struct mapped_device *md =
+		container_of(profile, struct dm_crypto_profile, profile)->md;
+	struct dm_derive_sw_secret_args args = {
+		.eph_key = eph_key,
+		.eph_key_size = eph_key_size,
+		.sw_secret = sw_secret,
+		.err = -EOPNOTSUPP,
+	};
+	struct dm_table *t;
+	int srcu_idx;
+	int i;
+	struct dm_target *ti;
+
+	t = dm_get_live_table(md, &srcu_idx);
+	if (!t)
+		return -EOPNOTSUPP;
+	for (i = 0; i < t->num_targets; i++) {
+		ti = dm_table_get_target(t, i);
+		if (!ti->type->iterate_devices)
+			continue;
+		ti->type->iterate_devices(ti, dm_derive_sw_secret_callback,
+					  &args);
+		if (!args.err)
+			break;
+	}
+	dm_put_live_table(md, srcu_idx);
+	return args.err;
+}
+
 static int
 device_intersect_crypto_capabilities(struct dm_target *ti, struct dm_dev *dev,
 				     sector_t start, sector_t len, void *data)
@@ -1306,9 +1368,11 @@ static int dm_table_construct_crypto_profile(struct dm_table *t)
 	profile = &dmcp->profile;
 	blk_crypto_profile_init(profile, 0);
 	profile->ll_ops.keyslot_evict = dm_keyslot_evict;
+	profile->ll_ops.derive_sw_secret = dm_derive_sw_secret;
 	profile->max_dun_bytes_supported = UINT_MAX;
 	memset(profile->modes_supported, 0xFF,
 	       sizeof(profile->modes_supported));
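+	/*
+	 * Allow all key types here; the per-device capability intersection
+	 * below narrows this to what the underlying devices support.
+	 */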
+	profile->key_types_supported = ~0;
 
 	for (i = 0; i < t->num_targets; i++) {
 		struct dm_target *ti = dm_table_get_target(t, i);
diff --git a/drivers/md/dm-user.c b/drivers/md/dm-user.c
new file mode 100644
index 0000000..b3a4e2e
--- /dev/null
+++ b/drivers/md/dm-user.c
@@ -0,0 +1,1286 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2020 Google, Inc
+ * Copyright (C) 2020 Palmer Dabbelt <palmerdabbelt@google.com>
+ */
+
+#include <linux/device-mapper.h>
+#include <uapi/linux/dm-user.h>
+
+#include <linux/bio.h>
+#include <linux/init.h>
+#include <linux/mempool.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/poll.h>
+#include <linux/uio.h>
+#include <linux/wait.h>
+#include <linux/workqueue.h>
+
+#define DM_MSG_PREFIX "user"
+
+#define MAX_OUTSTANDING_MESSAGES 128
+
+static unsigned int daemon_timeout_msec = 4000;
+module_param_named(dm_user_daemon_timeout_msec, daemon_timeout_msec, uint,
+		   0644);
+MODULE_PARM_DESC(dm_user_daemon_timeout_msec,
+		 "IO timeout in msec if the daemon does not process messages");
+
+/*
+ * dm-user uses four structures:
+ *
+ *  - "struct target", the outermost structure, corresponds to a single device
+ *    mapper target.  This contains the set of outstanding BIOs that have been
+ *    provided by DM and are not actively being processed by the user, along
+ *    with a misc device that userspace can open to communicate with the
+ *    kernel.  Each time userspace opens the misc device a new channel is
+ *    created.
+ *  - "struct channel", which represents a single active communication channel
+ *    with userspace.  Userspace may choose arbitrary read/write sizes to use
+ *    when processing messages; the channel forms these into logical accesses.
+ *    When userspace responds to a full message the channel completes the BIO
+ *    and obtains a new message to process from the target.
+ *  - "struct message", which wraps a BIO with the additional information
+ *    required by the kernel to sort out what to do with BIOs when they return
+ *    from userspace.
+ *  - "struct dm_user_message", which is the exact message format that
+ *    userspace sees.
+ *
+ * The hot path contains three distinct operations:
+ *
+ *  - user_map(), which is provided a BIO from device mapper that is queued
+ *    into the target.  This allocates and enqueues a new message.
+ *  - dev_read(), which dequeues a message and copies it to userspace.
+ *  - dev_write(), which looks up a message (keyed by sequence number) and
+ *    completes the corresponding BIO.
+ *
+ * Lock ordering (outer to inner)
+ *
+ * 1) miscdevice's global lock.  This is held around dev_open, so it has to be
+ *    the outermost lock.
+ * 2) target->lock
+ * 3) channel->lock
+ */
+
+struct message {
+	/*
+	 * Messages themselves do not need a lock, they're protected by either
+	 * the target or channel's lock, depending on which can reference them
+	 * directly.
+	 */
+	struct dm_user_message msg;
+	struct bio *bio;
+	size_t posn_to_user;
+	size_t total_to_user;
+	size_t posn_from_user;
+	size_t total_from_user;
+
+	struct list_head from_user;
+	struct list_head to_user;
+
+	/*
+	 * These are written back from the user.  They live in the same spot in
+	 * the message, but we need to either keep the old values around or
+	 * call a bunch more BIO helpers.  These are only valid after write has
+	 * adopted the message.
+	 */
+	u64 return_type;
+	u64 return_flags;
+
+	struct delayed_work work;
+	bool delayed;
+	struct target *t;
+};
+
+struct target {
+	/*
+	 * A target has a single lock, which protects everything in the target
+	 * (but does not protect the channels associated with a target).
+	 */
+	struct mutex lock;
+
+	/*
+	 * There is only one point at which anything blocks: userspace blocks
+	 * reading a new message, which is woken up by device mapper providing
+	 * a new BIO to process (or tearing down the target).  The
+	 * corresponding write side doesn't block, instead we treat userspace's
+	 * response containing a message that has yet to be mapped as an
+	 * invalid operation.
+	 */
+	struct wait_queue_head wq;
+
+	/*
+	 * Messages are delivered to userspace in order, but may be returned
+	 * out of order.  This allows userspace to schedule IO if it wants to.
+	 */
+	mempool_t message_pool;
+	u64 next_seq_to_map;
+	u64 next_seq_to_user;
+	struct list_head to_user;
+
+	/*
+	 * There is a misc device per target.  The name is selected by
+	 * userspace (via a DM create ioctl argument), and each ends up in
+	 * /dev/dm-user/.  It looks like a better way to do this may be to have
+	 * a filesystem to manage these, but this was more expedient.  The
+	 * current mechanism is functional, but does result in an arbitrary
+	 * number of dynamically created misc devices.
+	 */
+	struct miscdevice miscdev;
+
+	/*
+	 * Device mapper's target destructor triggers tearing this all down,
+	 * but we can't actually free until every channel associated with this
+	 * target has been destroyed.  Channels each have a reference to their
+	 * target, and there is an additional single reference that corresponds
+	 * to both DM and the misc device (both of which are destroyed by DM).
+	 *
+	 * In the common case userspace will be asleep waiting for a new
+	 * message when device mapper decides to destroy the target, which
+	 * means no new messages will appear.  The destroyed flag triggers a
+	 * wakeup, which will end up removing the reference.
+	 */
+	struct kref references;
+	int dm_destroyed;
+	bool daemon_terminated;
+};
+
+struct channel {
+	struct target *target;
+
+	/*
+	 * A channel has a single lock, which prevents multiple reads (or
+	 * multiple writes) from conflicting with each other.
+	 */
+	struct mutex lock;
+
+	struct message *cur_to_user;
+	struct message *cur_from_user;
+	ssize_t to_user_error;
+	ssize_t from_user_error;
+
+	/*
+	 * Once a message has been forwarded to userspace on a channel it must
+	 * be responded to on the same channel.  This allows us to error out
+	 * the messages that have not yet been responded to by a channel when
+	 * that channel closes, which makes handling errors more reasonable for
+	 * fault-tolerant userspace daemons.  It also happens to make avoiding
+	 * shared locks between user_map() and dev_read() a lot easier.
+	 *
+	 * This does preclude a multi-threaded work stealing userspace
+	 * implementation (or at least, force a degree of head-of-line blocking
+	 * on the response path).
+	 */
+	struct list_head from_user;
+
+	/*
+	 * Responses from userspace can arrive in arbitrarily small chunks.
+	 * We need some place to buffer one up until we can find the
+	 * corresponding kernel-side message to continue processing, so instead
+	 * of allocating them we just keep one off to the side here.  This can
+	 * only ever be pointed to by cur_from_user, and will never have a BIO.
+	 */
+	struct message scratch_message_from_user;
+};
+
+static void message_kill(struct message *m, mempool_t *pool)
+{
+	m->bio->bi_status = BLK_STS_IOERR;
+	bio_endio(m->bio);
+	mempool_free(m, pool);
+}
+
+static inline bool is_user_space_thread_present(struct target *t)
+{
+	lockdep_assert_held(&t->lock);
+	return (kref_read(&t->references) > 1);
+}
+
+static void process_delayed_work(struct work_struct *work)
+{
+	struct delayed_work *del_work = to_delayed_work(work);
+	struct message *msg = container_of(del_work, struct message, work);
+
+	struct target *t = msg->t;
+
+	mutex_lock(&t->lock);
+
+	/*
+	 * If there is at least one user-space thread present, it will
+	 * process the IO, so leave the message alone.
+	 */
+	if (is_user_space_thread_present(t)) {
+		mutex_unlock(&t->lock);
+		return;
+	}
+
+	/*
+	 * Terminate the IO with an error
+	 */
+	list_del(&msg->to_user);
+	pr_err("I/O error: sector %llu: no user-space daemon for %s target\n",
+	       msg->bio->bi_iter.bi_sector,
+	       t->miscdev.name);
+	message_kill(msg, &t->message_pool);
+	mutex_unlock(&t->lock);
+}
+
+static void enqueue_delayed_work(struct message *m, bool is_delay)
+{
+	unsigned long delay = 0;
+
+	m->delayed = true;
+	INIT_DELAYED_WORK(&m->work, process_delayed_work);
+
+	/*
+	 * The snapuserd daemon is the user-space process which
+	 * services IO requests from dm-user when an OTA is applied.
+	 * Per the current design, when a dm-user target is created,
+	 * the daemon attaches to the target and starts processing
+	 * the IOs.  The daemon is terminated only when the dm-user
+	 * target is destroyed.
+	 *
+	 * If for some reason the daemon crashes or terminates early
+	 * without destroying the dm-user target, then there is no
+	 * mechanism to restart the daemon and resume processing the
+	 * IOs from the same target.  Theoretically it is possible,
+	 * but that infrastructure doesn't exist in the Android
+	 * ecosystem.
+	 *
+	 * Thus, once the daemon terminates, there is no way the IOs
+	 * issued on that target will be processed.  Hence we set the
+	 * delay to 0 and fail the IOs immediately.
+	 *
+	 * On the other hand, when a new dm-user target is created,
+	 * we wait for the daemon to attach for the first time.  This
+	 * primarily happens when first-stage init spins up the
+	 * daemon.  At this point, since the snapshot device is
+	 * mounted as part of the root filesystem, the dm-user target
+	 * may receive IO requests even though the daemon is not
+	 * fully launched.  We don't want to fail those IO requests
+	 * immediately, so we queue them with a timeout that gives
+	 * the daemon time to become ready.  Again, if the daemon
+	 * fails to launch within the timeout period, the IOs will be
+	 * failed.
+	 */
+	if (is_delay)
+		delay = msecs_to_jiffies(daemon_timeout_msec);
+
+	queue_delayed_work(system_wq, &m->work, delay);
+}
+
+static inline struct target *target_from_target(struct dm_target *target)
+{
+	WARN_ON(target->private == NULL);
+	return target->private;
+}
+
+static inline struct target *target_from_miscdev(struct miscdevice *miscdev)
+{
+	return container_of(miscdev, struct target, miscdev);
+}
+
+static inline struct channel *channel_from_file(struct file *file)
+{
+	WARN_ON(file->private_data == NULL);
+	return file->private_data;
+}
+
+static inline struct target *target_from_channel(struct channel *c)
+{
+	WARN_ON(c->target == NULL);
+	return c->target;
+}
+
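+/*
+ * Number of data bytes in the BIO, counted from its current iterator
+ * position.
+ */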
+static inline size_t bio_size(struct bio *bio)
+{
+	struct bio_vec bvec;
+	struct bvec_iter iter;
+	size_t out = 0;
+
+	bio_for_each_segment (bvec, bio, iter)
+		out += bio_iter_len(bio, iter);
+	return out;
+}
+
+static inline size_t bio_bytes_needed_to_user(struct bio *bio)
+{
+	switch (bio_op(bio)) {
+	case REQ_OP_WRITE:
+		return sizeof(struct dm_user_message) + bio_size(bio);
+	case REQ_OP_READ:
+	case REQ_OP_FLUSH:
+	case REQ_OP_DISCARD:
+	case REQ_OP_SECURE_ERASE:
+	case REQ_OP_WRITE_ZEROES:
+		return sizeof(struct dm_user_message);
+
+	/*
+	 * These ops are not passed to userspace under the assumption that
+	 * they're not going to be particularly useful in that context.
+	 */
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static inline size_t bio_bytes_needed_from_user(struct bio *bio)
+{
+	switch (bio_op(bio)) {
+	case REQ_OP_READ:
+		return sizeof(struct dm_user_message) + bio_size(bio);
+	case REQ_OP_WRITE:
+	case REQ_OP_FLUSH:
+	case REQ_OP_DISCARD:
+	case REQ_OP_SECURE_ERASE:
+	case REQ_OP_WRITE_ZEROES:
+		return sizeof(struct dm_user_message);
+
+	/*
+	 * These ops are not passed to userspace under the assumption that
+	 * they're not going to be particularly useful in that context.
+	 */
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static inline long bio_type_to_user_type(struct bio *bio)
+{
+	switch (bio_op(bio)) {
+	case REQ_OP_READ:
+		return DM_USER_REQ_MAP_READ;
+	case REQ_OP_WRITE:
+		return DM_USER_REQ_MAP_WRITE;
+	case REQ_OP_FLUSH:
+		return DM_USER_REQ_MAP_FLUSH;
+	case REQ_OP_DISCARD:
+		return DM_USER_REQ_MAP_DISCARD;
+	case REQ_OP_SECURE_ERASE:
+		return DM_USER_REQ_MAP_SECURE_ERASE;
+	case REQ_OP_WRITE_ZEROES:
+		return DM_USER_REQ_MAP_WRITE_ZEROES;
+
+	/*
+	 * These ops are not passed to userspace under the assumption that
+	 * they're not going to be particularly useful in that context.
+	 */
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static inline long bio_flags_to_user_flags(struct bio *bio)
+{
+	u64 out = 0;
+	typeof(bio->bi_opf) opf = bio->bi_opf & ~REQ_OP_MASK;
+
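+	/*
+	 * Clear each flag as it is translated, so any bits still set at the
+	 * end indicate a flag we don't support.
+	 */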
+	if (opf & REQ_FAILFAST_DEV) {
+		opf &= ~REQ_FAILFAST_DEV;
+		out |= DM_USER_REQ_MAP_FLAG_FAILFAST_DEV;
+	}
+
+	if (opf & REQ_FAILFAST_TRANSPORT) {
+		opf &= ~REQ_FAILFAST_TRANSPORT;
+		out |= DM_USER_REQ_MAP_FLAG_FAILFAST_TRANSPORT;
+	}
+
+	if (opf & REQ_FAILFAST_DRIVER) {
+		opf &= ~REQ_FAILFAST_DRIVER;
+		out |= DM_USER_REQ_MAP_FLAG_FAILFAST_DRIVER;
+	}
+
+	if (opf & REQ_SYNC) {
+		opf &= ~REQ_SYNC;
+		out |= DM_USER_REQ_MAP_FLAG_SYNC;
+	}
+
+	if (opf & REQ_META) {
+		opf &= ~REQ_META;
+		out |= DM_USER_REQ_MAP_FLAG_META;
+	}
+
+	if (opf & REQ_PRIO) {
+		opf &= ~REQ_PRIO;
+		out |= DM_USER_REQ_MAP_FLAG_PRIO;
+	}
+
+	if (opf & REQ_NOMERGE) {
+		opf &= ~REQ_NOMERGE;
+		out |= DM_USER_REQ_MAP_FLAG_NOMERGE;
+	}
+
+	if (opf & REQ_IDLE) {
+		opf &= ~REQ_IDLE;
+		out |= DM_USER_REQ_MAP_FLAG_IDLE;
+	}
+
+	if (opf & REQ_INTEGRITY) {
+		opf &= ~REQ_INTEGRITY;
+		out |= DM_USER_REQ_MAP_FLAG_INTEGRITY;
+	}
+
+	if (opf & REQ_FUA) {
+		opf &= ~REQ_FUA;
+		out |= DM_USER_REQ_MAP_FLAG_FUA;
+	}
+
+	if (opf & REQ_PREFLUSH) {
+		opf &= ~REQ_PREFLUSH;
+		out |= DM_USER_REQ_MAP_FLAG_PREFLUSH;
+	}
+
+	if (opf & REQ_RAHEAD) {
+		opf &= ~REQ_RAHEAD;
+		out |= DM_USER_REQ_MAP_FLAG_RAHEAD;
+	}
+
+	if (opf & REQ_BACKGROUND) {
+		opf &= ~REQ_BACKGROUND;
+		out |= DM_USER_REQ_MAP_FLAG_BACKGROUND;
+	}
+
+	if (opf & REQ_NOWAIT) {
+		opf &= ~REQ_NOWAIT;
+		out |= DM_USER_REQ_MAP_FLAG_NOWAIT;
+	}
+
+	if (opf & REQ_NOUNMAP) {
+		opf &= ~REQ_NOUNMAP;
+		out |= DM_USER_REQ_MAP_FLAG_NOUNMAP;
+	}
+
+	if (unlikely(opf)) {
+		pr_warn("unsupported BIO type %x\n", opf);
+		return -EOPNOTSUPP;
+	}
+	WARN_ON(out < 0);
+	return out;
+}
+
+/*
+ * Not quite what's in blk-map.c, but instead what I thought the functions in
+ * blk-map did.  This one seems more generally useful and I think we could
+ * write the blk-map version in terms of this one.  The differences are that
+ * this has a return value that counts, and blk-map uses the BIO _all iters.
+ * Neither version advances the BIO iter, but both do advance the IOV iter,
+ * which is a bit odd here.
+ */
+static ssize_t bio_copy_from_iter(struct bio *bio, struct iov_iter *iter)
+{
+	struct bio_vec bvec;
+	struct bvec_iter biter;
+	ssize_t out = 0;
+
+	bio_for_each_segment (bvec, bio, biter) {
+		ssize_t ret;
+
+		ret = copy_page_from_iter(bvec.bv_page, bvec.bv_offset,
+					  bvec.bv_len, iter);
+
+		/*
+		 * FIXME: I thought that IOV copies had a mechanism for
+		 * terminating early, if for example a signal came in while
+		 * sleeping waiting for a page to be mapped, but I don't see
+		 * where that would happen.
+		 */
+		WARN_ON(ret < 0);
+		out += ret;
+
+		if (!iov_iter_count(iter))
+			break;
+
+		if (ret < bvec.bv_len)
+			return ret;
+	}
+
+	return out;
+}
+
+static ssize_t bio_copy_to_iter(struct bio *bio, struct iov_iter *iter)
+{
+	struct bio_vec bvec;
+	struct bvec_iter biter;
+	ssize_t out = 0;
+
+	bio_for_each_segment (bvec, bio, biter) {
+		ssize_t ret;
+
+		ret = copy_page_to_iter(bvec.bv_page, bvec.bv_offset,
+					bvec.bv_len, iter);
+
+		/* as above */
+		WARN_ON(ret < 0);
+		out += ret;
+
+		if (!iov_iter_count(iter))
+			break;
+
+		if (ret < bvec.bv_len)
+			return ret;
+	}
+
+	return out;
+}
+
+static ssize_t msg_copy_to_iov(struct message *msg, struct iov_iter *to)
+{
+	ssize_t copied = 0;
+
+	if (!iov_iter_count(to))
+		return 0;
+
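+	/*
+	 * Copy the fixed-size message header first; once posn_to_user moves
+	 * past it, stream out the BIO payload.
+	 */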
+	if (msg->posn_to_user < sizeof(msg->msg)) {
+		copied = copy_to_iter((char *)(&msg->msg) + msg->posn_to_user,
+				      sizeof(msg->msg) - msg->posn_to_user, to);
+	} else {
+		copied = bio_copy_to_iter(msg->bio, to);
+		if (copied > 0)
+			bio_advance(msg->bio, copied);
+	}
+
+	if (copied < 0)
+		return copied;
+
+	msg->posn_to_user += copied;
+	return copied;
+}
+
+static ssize_t msg_copy_from_iov(struct message *msg, struct iov_iter *from)
+{
+	ssize_t copied = 0;
+
+	if (!iov_iter_count(from))
+		return 0;
+
+	if (msg->posn_from_user < sizeof(msg->msg)) {
+		copied = copy_from_iter(
+			(char *)(&msg->msg) + msg->posn_from_user,
+			sizeof(msg->msg) - msg->posn_from_user, from);
+	} else {
+		copied = bio_copy_from_iter(msg->bio, from);
+		if (copied > 0)
+			bio_advance(msg->bio, copied);
+	}
+
+	if (copied < 0)
+		return copied;
+
+	msg->posn_from_user += copied;
+	return copied;
+}
+
+static struct message *msg_get_map(struct target *t)
+{
+	struct message *m;
+
+	lockdep_assert_held(&t->lock);
+
+	m = mempool_alloc(&t->message_pool, GFP_NOIO);
+	m->msg.seq = t->next_seq_to_map++;
+	INIT_LIST_HEAD(&m->to_user);
+	INIT_LIST_HEAD(&m->from_user);
+	return m;
+}
+
+static struct message *msg_get_to_user(struct target *t)
+{
+	struct message *m;
+
+	lockdep_assert_held(&t->lock);
+
+	if (list_empty(&t->to_user))
+		return NULL;
+
+	m = list_first_entry(&t->to_user, struct message, to_user);
+
+	list_del(&m->to_user);
+
+	/*
+	 * If the IO was queued to the workqueue because there
+	 * was no daemon to service it, we will have to cancel
+	 * the delayed work, as the IO will now be processed by
+	 * this user-space thread.
+	 *
+	 * If the delayed work was already picked up for
+	 * processing, wait for it to complete.  Note that the
+	 * IO will not be terminated by the workqueue thread.
+	 */
+	if (unlikely(m->delayed)) {
+		mutex_unlock(&t->lock);
+		cancel_delayed_work_sync(&m->work);
+		mutex_lock(&t->lock);
+	}
+	return m;
+}
+
+static struct message *msg_get_from_user(struct channel *c, u64 seq)
+{
+	struct message *m;
+	struct list_head *cur, *tmp;
+
+	lockdep_assert_held(&c->lock);
+
+	list_for_each_safe (cur, tmp, &c->from_user) {
+		m = list_entry(cur, struct message, from_user);
+		if (m->msg.seq == seq) {
+			list_del(&m->from_user);
+			return m;
+		}
+	}
+
+	return NULL;
+}
+
+/*
+ * Returns 0 when there is no work left to do.  This must be callable without
+ * holding the target lock, as it is part of the waitqueue's check expression.
+ * When called without the lock it may spuriously indicate there is remaining
+ * work, but when called with the lock it must be accurate.
+ */
+static int target_poll(struct target *t)
+{
+	return !list_empty(&t->to_user) || t->dm_destroyed;
+}
+
+static void target_release(struct kref *ref)
+{
+	struct target *t = container_of(ref, struct target, references);
+	struct list_head *cur, *tmp;
+
+	/*
+	 * There may be outstanding BIOs that have not yet been given to
+	 * userspace.  At this point there's nothing we can do about them, as
+	 * there are and will never be any channels.
+	 */
+	list_for_each_safe (cur, tmp, &t->to_user) {
+		struct message *m = list_entry(cur, struct message, to_user);
+
+		if (unlikely(m->delayed)) {
+			bool ret;
+
+			mutex_unlock(&t->lock);
+			ret = cancel_delayed_work_sync(&m->work);
+			mutex_lock(&t->lock);
+			if (!ret)
+				continue;
+		}
+		message_kill(m, &t->message_pool);
+	}
+
+	mempool_exit(&t->message_pool);
+	mutex_unlock(&t->lock);
+	mutex_destroy(&t->lock);
+	kfree(t);
+}
+
+static void target_put(struct target *t)
+{
+	/*
+	 * This releases both a reference to the target and the lock.  We leave
+	 * it up to the caller to hold the lock, as they probably needed it for
+	 * something else.
+	 */
+	lockdep_assert_held(&t->lock);
+
+	if (!kref_put(&t->references, target_release)) {
+		/*
+		 * The user-space thread is being terminated.
+		 * Scan the list for all the pending IOs which
+		 * have not been processed yet and put them back
+		 * on the workqueue for delayed processing.
+		 */
+		if (!is_user_space_thread_present(t)) {
+			struct list_head *cur, *tmp;
+
+			list_for_each_safe(cur, tmp, &t->to_user) {
+				struct message *m = list_entry(cur,
+							       struct message,
+							       to_user);
+				if (!m->delayed)
+					enqueue_delayed_work(m, false);
+			}
+			/*
+			 * The daemon attached to this target has terminated.
+			 */
+			t->daemon_terminated = true;
+		}
+		mutex_unlock(&t->lock);
+	}
+}
+
+static struct channel *channel_alloc(struct target *t)
+{
+	struct channel *c;
+
+	lockdep_assert_held(&t->lock);
+
+	c = kzalloc(sizeof(*c), GFP_KERNEL);
+	if (c == NULL)
+		return NULL;
+
+	kref_get(&t->references);
+	c->target = t;
+	c->cur_from_user = &c->scratch_message_from_user;
+	mutex_init(&c->lock);
+	INIT_LIST_HEAD(&c->from_user);
+	return c;
+}
+
+static void channel_free(struct channel *c)
+{
+	struct list_head *cur, *tmp;
+
+	lockdep_assert_held(&c->lock);
+
+	/*
+	 * There may be outstanding BIOs that have been given to userspace but
+	 * have not yet been completed.  The channel has been shut down so
+	 * there's no way to process the rest of those messages, so we just go
+	 * ahead and error out the BIOs.  Hopefully whatever's on the other end
+	 * can handle the errors.  One could imagine splitting the BIOs and
+	 * completing as much as we got, but that seems like overkill here.
+	 *
+	 * Our only other options would be to let the BIO hang around (which
+	 * seems way worse) or to resubmit it to userspace in the hope there's
+	 * another channel.  I don't really like the idea of submitting a
+	 * message twice.
+	 */
+	if (c->cur_to_user != NULL)
+		message_kill(c->cur_to_user, &c->target->message_pool);
+	if (c->cur_from_user != &c->scratch_message_from_user)
+		message_kill(c->cur_from_user, &c->target->message_pool);
+	list_for_each_safe (cur, tmp, &c->from_user)
+		message_kill(list_entry(cur, struct message, from_user),
+			     &c->target->message_pool);
+
+	mutex_lock(&c->target->lock);
+	target_put(c->target);
+	mutex_unlock(&c->lock);
+	mutex_destroy(&c->lock);
+	kfree(c);
+}
+
+static int dev_open(struct inode *inode, struct file *file)
+{
+	struct channel *c;
+	struct target *t;
+
+	/*
+	 * This is called by miscdev, which sets private_data to point to the
+	 * struct miscdevice that was opened.  The rest of our file operations
+	 * want to refer to the channel that's been opened, so we swap that
+	 * pointer out with a fresh channel.
+	 *
+	 * This is called with the miscdev lock held, which is also held while
+	 * registering/unregistering the miscdev.  The miscdev must be
+	 * registered for this to get called, which means there must be an
+	 * outstanding reference to the target, which means it cannot be freed
+	 * out from under us despite us not holding a reference yet.
+	 */
+	t = container_of(file->private_data, struct target, miscdev);
+	mutex_lock(&t->lock);
+	file->private_data = c = channel_alloc(t);
+
+	if (c == NULL) {
+		mutex_unlock(&t->lock);
+		return -ENOMEM;
+	}
+
+	mutex_unlock(&t->lock);
+	return 0;
+}
+
+static ssize_t dev_read(struct kiocb *iocb, struct iov_iter *to)
+{
+	struct channel *c = channel_from_file(iocb->ki_filp);
+	ssize_t total_processed = 0;
+	ssize_t processed;
+
+	mutex_lock(&c->lock);
+
+	if (unlikely(c->to_user_error)) {
+		total_processed = c->to_user_error;
+		goto cleanup_unlock;
+	}
+
+	if (c->cur_to_user == NULL) {
+		struct target *t = target_from_channel(c);
+
+		mutex_lock(&t->lock);
+
+		while (!target_poll(t)) {
+			int e;
+
+			mutex_unlock(&t->lock);
+			mutex_unlock(&c->lock);
+			e = wait_event_interruptible(t->wq, target_poll(t));
+			mutex_lock(&c->lock);
+			mutex_lock(&t->lock);
+
+			if (unlikely(e != 0)) {
+				/*
+				 * We haven't processed any bytes in either the
+				 * BIO or the IOV, so we can just terminate
+				 * right now.  Code elsewhere in the kernel
+				 * handles restarting the syscall when
+				 * appropriate.
+				 */
+				total_processed = e;
+				mutex_unlock(&t->lock);
+				goto cleanup_unlock;
+			}
+		}
+
+		if (unlikely(t->dm_destroyed)) {
+			/*
+			 * DM has destroyed this target, so just lock
+			 * the user out.  There's really nothing else
+			 * we can do here.  Note that we don't actually
+			 * tear anything down until userspace has
+			 * closed the FD, as there may still be
+			 * outstanding BIOs.
+			 *
+			 * This is kind of a wacky error code to
+			 * return.  My goal was really just to try and
+			 * find something that wasn't likely to be
+			 * returned by anything else in the miscdev
+			 * path.  The message "block device required"
+			 * seems like a somewhat reasonable thing to
+			 * say when the target has disappeared out from
+			 * under us, but "not block" isn't sensible.
+			 */
+			c->to_user_error = total_processed = -ENOTBLK;
+			mutex_unlock(&t->lock);
+			goto cleanup_unlock;
+		}
+
+		/*
+		 * Ensures that accesses to the message data are not ordered
+		 * before the remote accesses that produce that message data.
+		 *
+		 * This pairs with the barrier in user_map(), via the
+		 * conditional within the while loop above. Also see the lack
+		 * of barrier in user_dtr(), which is why this can be after the
+		 * destroyed check.
+		 */
+		smp_rmb();
+
+		c->cur_to_user = msg_get_to_user(t);
+		WARN_ON(c->cur_to_user == NULL);
+		mutex_unlock(&t->lock);
+	}
+
+	processed = msg_copy_to_iov(c->cur_to_user, to);
+	total_processed += processed;
+
+	WARN_ON(c->cur_to_user->posn_to_user > c->cur_to_user->total_to_user);
+	if (c->cur_to_user->posn_to_user == c->cur_to_user->total_to_user) {
+		struct message *m = c->cur_to_user;
+
+		c->cur_to_user = NULL;
+		list_add_tail(&m->from_user, &c->from_user);
+	}
+
+cleanup_unlock:
+	mutex_unlock(&c->lock);
+	return total_processed;
+}
+
+static ssize_t dev_write(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct channel *c = channel_from_file(iocb->ki_filp);
+	ssize_t total_processed = 0;
+	ssize_t processed;
+
+	mutex_lock(&c->lock);
+
+	if (unlikely(c->from_user_error)) {
+		total_processed = c->from_user_error;
+		goto cleanup_unlock;
+	}
+
+	/*
+	 * cur_from_user can never be NULL.  If there's no real message it must
+	 * point to the scratch space.
+	 */
+	WARN_ON(c->cur_from_user == NULL);
+	if (c->cur_from_user->posn_from_user < sizeof(struct dm_user_message)) {
+		struct message *msg, *old;
+
+		processed = msg_copy_from_iov(c->cur_from_user, from);
+		if (processed <= 0) {
+			pr_warn("msg_copy_from_iov() returned %zd\n",
+				processed);
+			c->from_user_error = -EINVAL;
+			goto cleanup_unlock;
+		}
+		total_processed += processed;
+
+		/*
+		 * In the unlikely event the user has provided us a very short
+		 * write, not even big enough to fill a message, just succeed.
+		 * We'll eventually build up enough bytes to do something.
+		 */
+		if (unlikely(c->cur_from_user->posn_from_user <
+			     sizeof(struct dm_user_message)))
+			goto cleanup_unlock;
+
+		old = c->cur_from_user;
+		mutex_lock(&c->target->lock);
+		msg = msg_get_from_user(c, c->cur_from_user->msg.seq);
+		if (msg == NULL) {
+			pr_info("user provided an invalid message seq of %llx\n",
+				old->msg.seq);
+			mutex_unlock(&c->target->lock);
+			c->from_user_error = -EINVAL;
+			goto cleanup_unlock;
+		}
+		mutex_unlock(&c->target->lock);
+
+		WARN_ON(old->posn_from_user != sizeof(struct dm_user_message));
+		msg->posn_from_user = sizeof(struct dm_user_message);
+		msg->return_type = old->msg.type;
+		msg->return_flags = old->msg.flags;
+		WARN_ON(msg->posn_from_user > msg->total_from_user);
+		c->cur_from_user = msg;
+		WARN_ON(old != &c->scratch_message_from_user);
+	}
+
+	/*
+	 * Userspace can signal an error for single requests by overwriting the
+	 * seq field.
+	 */
+	switch (c->cur_from_user->return_type) {
+	case DM_USER_RESP_SUCCESS:
+		c->cur_from_user->bio->bi_status = BLK_STS_OK;
+		break;
+	case DM_USER_RESP_ERROR:
+	case DM_USER_RESP_UNSUPPORTED:
+	default:
+		c->cur_from_user->bio->bi_status = BLK_STS_IOERR;
+		goto finish_bio;
+	}
+
+	/*
+	 * The op was a success as far as userspace is concerned, so process
+	 * whatever data may come along with it.  The user may provide the BIO
+	 * data in multiple chunks, in which case we don't need to finish the
+	 * BIO.
+	 */
+	processed = msg_copy_from_iov(c->cur_from_user, from);
+	total_processed += processed;
+
+	if (c->cur_from_user->posn_from_user <
+	    c->cur_from_user->total_from_user)
+		goto cleanup_unlock;
+
+finish_bio:
+	/*
+	 * When we set up this message the BIO's size matched the
+	 * message size, if that's not still the case then something
+	 * has gone off the rails.
+	 */
+	WARN_ON(bio_size(c->cur_from_user->bio) != 0);
+	bio_endio(c->cur_from_user->bio);
+
+	/*
+	 * We don't actually need to take the target lock here, as all
+	 * we're doing is freeing the message and mempools have their
+	 * own lock.  Each channel has its own scratch message.
+	 */
+	WARN_ON(c->cur_from_user == &c->scratch_message_from_user);
+	mempool_free(c->cur_from_user, &c->target->message_pool);
+	c->scratch_message_from_user.posn_from_user = 0;
+	c->cur_from_user = &c->scratch_message_from_user;
+
+cleanup_unlock:
+	mutex_unlock(&c->lock);
+	return total_processed;
+}
+
+static int dev_release(struct inode *inode, struct file *file)
+{
+	struct channel *c;
+
+	c = channel_from_file(file);
+	mutex_lock(&c->lock);
+	channel_free(c);
+
+	return 0;
+}
+
+static const struct file_operations file_operations = {
+	.owner = THIS_MODULE,
+	.open = dev_open,
+	.llseek = no_llseek,
+	.read_iter = dev_read,
+	.write_iter = dev_write,
+	.release = dev_release,
+};
+
+static int user_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+	struct target *t;
+	int r;
+
+	if (argc != 3) {
+		ti->error = "Invalid argument count";
+		r = -EINVAL;
+		goto cleanup_none;
+	}
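+	/*
+	 * argv[2] names the misc device, which shows up as
+	 * /dev/dm-user/<name>.
+	 */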
+
+	t = kzalloc(sizeof(*t), GFP_KERNEL);
+	if (t == NULL) {
+		r = -ENOMEM;
+		goto cleanup_none;
+	}
+	ti->private = t;
+
+	/* Enable more BIO types. */
+	ti->num_discard_bios = 1;
+	ti->discards_supported = true;
+	ti->num_flush_bios = 1;
+	ti->flush_supported = true;
+
+	/*
+	 * We begin with a single reference to the target, which is miscdev's
+	 * reference.  This ensures that the target won't be freed
+	 * until after the miscdev has been unregistered and all extant
+	 * channels have been closed.
+	 */
+	kref_init(&t->references);
+
+	t->daemon_terminated = false;
+	mutex_init(&t->lock);
+	init_waitqueue_head(&t->wq);
+	INIT_LIST_HEAD(&t->to_user);
+	mempool_init_kmalloc_pool(&t->message_pool, MAX_OUTSTANDING_MESSAGES,
+				  sizeof(struct message));
+
+	t->miscdev.minor = MISC_DYNAMIC_MINOR;
+	t->miscdev.fops = &file_operations;
+	t->miscdev.name = kasprintf(GFP_KERNEL, "dm-user/%s", argv[2]);
+	if (t->miscdev.name == NULL) {
+		r = -ENOMEM;
+		goto cleanup_message_pool;
+	}
+
+	/*
+	 * Once the miscdev is registered it can be opened, and therefore
+	 * concurrent references to the channel can happen.  Holding the target
+	 * lock during misc_register() could deadlock.  If registration
+	 * succeeds then we will not access the target again so we just stick a
+	 * barrier here, which pairs with taking the target lock everywhere
+	 * else the target is accessed.
+	 *
+	 * I forgot where we ended up on the RCpc/RCsc locks.  IIUC, RCsc locks
+	 * would mean that we could take the target lock earlier and release it
+	 * here instead of the memory barrier.  I'm not sure that's any better,
+	 * though, and this isn't on a hot path so it probably doesn't matter
+	 * either way.
+	 */
+	smp_mb();
+
+	r = misc_register(&t->miscdev);
+	if (r) {
+		DMERR("Unable to register miscdev %s for dm-user",
+		      t->miscdev.name);
+		r = -ENOMEM;
+		goto cleanup_misc_name;
+	}
+
+	return 0;
+
+cleanup_misc_name:
+	kfree(t->miscdev.name);
+cleanup_message_pool:
+	mempool_exit(&t->message_pool);
+	kfree(t);
+cleanup_none:
+	return r;
+}
+
+static void user_dtr(struct dm_target *ti)
+{
+	struct target *t = target_from_target(ti);
+
+	/*
+	 * Removes the miscdev.  This must be called without the target lock
+	 * held to avoid a possible deadlock because our open implementation is
+	 * called holding the miscdev lock and must later take the target lock.
+	 *
+	 * There is no race here because only DM can register/unregister the
+	 * miscdev, and DM ensures that doesn't happen twice.  The internal
+	 * miscdev lock is sufficient to ensure there are no races between
+	 * deregistering the miscdev and open.
+	 */
+	misc_deregister(&t->miscdev);
+
+	/*
+	 * We are now free to take the target's lock and drop our reference to
+	 * the target.  There are almost certainly tasks sleeping in read on at
+	 * least one of the channels associated with this target, this
+	 * explicitly wakes them up and terminates the read.
+	 */
+	mutex_lock(&t->lock);
+	/*
+	 * No barrier here, as wait/wake ensures that the flag visibility is
+	 * correct WRT the wake/sleep state of the target tasks.
+	 */
+	t->dm_destroyed = true;
+	wake_up_all(&t->wq);
+	target_put(t);
+}
+
+/*
+ * Consumes a BIO from device mapper, queueing it up for userspace.
+ */
+static int user_map(struct dm_target *ti, struct bio *bio)
+{
+	struct target *t;
+	struct message *entry;
+
+	t = target_from_target(ti);
+	/*
+	 * FIXME
+	 *
+	 * This seems like a bad idea.  Specifically, here we're
+	 * directly on the IO path when we take the target lock, which may also
+	 * be taken from a user context.  The user context doesn't actively
+	 * trigger anything that may sleep while holding the lock, but this
+	 * still seems like a bad idea.
+	 *
+	 * The obvious way to fix this would be to use a proper queue, which
+	 * would result in no shared locks between the direct IO path and user
+	 * tasks.  I had a version that did this, but the head-of-line blocking
+	 * from the circular buffer resulted in us needing a fairly large
+	 * allocation in order to avoid situations in which the queue fills up
+	 * and everything goes off the rails.
+	 *
+	 * I could jump through some hoops to avoid a shared lock while still
+	 * allowing for a large queue, but I'm not actually sure that allowing
+	 * for very large queues is the right thing to do here.  Intuitively it
+	 * seems better to keep the queues small in here (essentially sized to
+	 * the user latency for performance reasons only) and rely on returning
+	 * DM_MAPIO_REQUEUE regularly, as that would give the rest of the
+	 * kernel more information.
+	 *
+	 * I'll spend some time trying to figure out what's going on with
+	 * DM_MAPIO_REQUEUE, but if someone has a better idea of how to fix
+	 * this I'm all ears.
+	 */
+	mutex_lock(&t->lock);
+
+	/*
+	 * FIXME
+	 *
+	 * The assumption here is that there's no benefit to returning
+	 * DM_MAPIO_KILL as opposed to just erroring out the BIO, but I'm not
+	 * sure that's actually true -- for example, I could imagine users
+	 * expecting that submitted BIOs are unlikely to fail and therefore
+	 * relying on submission failure to indicate an unsupported type.
+	 *
+	 * There's two ways I can think of to fix this:
+	 *   - Add DM arguments that are parsed during the constructor that
+	 *     allow various dm_target flags to be set that indicate the op
+	 *     types supported by this target.  This may make sense for things
+	 *     like discard, where DM can already transform the BIOs to a form
+	 *     that's likely to be supported.
+	 *   - Some sort of pre-filter that allows userspace to hook in here
+	 *     and kill BIOs before marking them as submitted.  My guess would
+	 *     be that a userspace round trip is a bad idea here, but a BPF
+	 *     call seems reasonable.
+	 *
+	 * My guess is that we'd likely want to do both.  The first one is easy
+	 * and gives DM the proper info, so it seems better.  The BPF call
+	 * seems overly complex for just this, but one could imagine wanting to
+	 * sometimes return _MAPPED and a BPF filter would be the way to do
+	 * that.
+	 *
+	 * For example, in Android we have an in-kernel DM device called
+	 * "dm-bow" that takes advantage of some portion of the space that has
+	 * been discarded on a device to provide opportunistic block-level
+	 * backups.  While one could imagine just implementing this entirely in
+	 * userspace, that would come with an appreciable performance penalty.
+	 * Instead one could keep a BPF program that forwards most accesses
+	 * directly to the backing block device while informing a userspace
+	 * daemon of any discarded space and of writes to blocks that are to be
+	 * backed up.
+	 */
+	if (unlikely((bio_type_to_user_type(bio) < 0) ||
+		     (bio_flags_to_user_flags(bio) < 0))) {
+		mutex_unlock(&t->lock);
+		return DM_MAPIO_KILL;
+	}
+
+	entry = msg_get_map(t);
+	if (unlikely(entry == NULL)) {
+		mutex_unlock(&t->lock);
+		return DM_MAPIO_REQUEUE;
+	}
+
+	entry->msg.type = bio_type_to_user_type(bio);
+	entry->msg.flags = bio_flags_to_user_flags(bio);
+	entry->msg.sector = bio->bi_iter.bi_sector;
+	entry->msg.len = bio_size(bio);
+	entry->bio = bio;
+	entry->posn_to_user = 0;
+	entry->total_to_user = bio_bytes_needed_to_user(bio);
+	entry->posn_from_user = 0;
+	entry->total_from_user = bio_bytes_needed_from_user(bio);
+	entry->delayed = false;
+	entry->t = t;
+	/* Pairs with the barrier in dev_read() */
+	smp_wmb();
+	list_add_tail(&entry->to_user, &t->to_user);
+
+	/*
+	 * If there is no daemon to process the IOs, queue
+	 * the message on a workqueue with a timeout.
+	 */
+	if (!is_user_space_thread_present(t))
+		enqueue_delayed_work(entry, !t->daemon_terminated);
+
+	wake_up_interruptible(&t->wq);
+	mutex_unlock(&t->lock);
+	return DM_MAPIO_SUBMITTED;
+}
+
+static struct target_type user_target = {
+	.name = "user",
+	.version = { 1, 0, 0 },
+	.module = THIS_MODULE,
+	.ctr = user_ctr,
+	.dtr = user_dtr,
+	.map = user_map,
+};
+
+static int __init dm_user_init(void)
+{
+	int r;
+
+	r = dm_register_target(&user_target);
+	if (r) {
+		DMERR("register failed %d", r);
+		goto error;
+	}
+
+	return 0;
+
+error:
+	return r;
+}
+
+static void __exit dm_user_exit(void)
+{
+	dm_unregister_target(&user_target);
+}
+
+module_init(dm_user_init);
+module_exit(dm_user_exit);
+MODULE_AUTHOR("Palmer Dabbelt <palmerdabbelt@google.com>");
+MODULE_DESCRIPTION(DM_NAME " target returning blocks from userspace");
+MODULE_LICENSE("GPL");
diff --git a/drivers/media/Kconfig b/drivers/media/Kconfig
index 283b78b..351c1e8 100644
--- a/drivers/media/Kconfig
+++ b/drivers/media/Kconfig
@@ -65,8 +65,7 @@
 # Multimedia support - automatically enable V4L2 and DVB core
 #
 config MEDIA_CAMERA_SUPPORT
-	bool
-	prompt "Cameras and video grabbers" if MEDIA_SUPPORT_FILTER
+	bool "Cameras and video grabbers"
 	default y if !MEDIA_SUPPORT_FILTER
 	help
 	  Enable support for webcams and video grabbers.
@@ -74,8 +73,7 @@
 	  Say Y when you have a webcam or a video capture grabber board.
 
 config MEDIA_ANALOG_TV_SUPPORT
-	bool
-	prompt "Analog TV" if MEDIA_SUPPORT_FILTER
+	bool "Analog TV"
 	default y if !MEDIA_SUPPORT_FILTER
 	help
 	  Enable analog TV support.
@@ -88,8 +86,7 @@
 		will disable support for them.
 
 config MEDIA_DIGITAL_TV_SUPPORT
-	bool
-	prompt "Digital TV" if MEDIA_SUPPORT_FILTER
+	bool "Digital TV"
 	default y if !MEDIA_SUPPORT_FILTER
 	help
 	  Enable digital TV support.
@@ -98,8 +95,7 @@
 	  hybrid digital TV and analog TV.
 
 config MEDIA_RADIO_SUPPORT
-	bool
-	prompt "AM/FM radio receivers/transmitters" if MEDIA_SUPPORT_FILTER
+	bool "AM/FM radio receivers/transmitters"
 	default y if !MEDIA_SUPPORT_FILTER
 	help
 	  Enable AM/FM radio support.
@@ -114,8 +110,7 @@
 		disable support for them.
 
 config MEDIA_SDR_SUPPORT
-	bool
-	prompt "Software defined radio" if MEDIA_SUPPORT_FILTER
+	bool "Software defined radio"
 	default y if !MEDIA_SUPPORT_FILTER
 	help
 	  Enable software defined radio support.
@@ -123,8 +118,7 @@
 	  Say Y when you have a software defined radio device.
 
 config MEDIA_PLATFORM_SUPPORT
-	bool
-	prompt "Platform-specific devices" if MEDIA_SUPPORT_FILTER
+	bool "Platform-specific devices"
 	default y if !MEDIA_SUPPORT_FILTER
 	help
 	  Enable support for complex cameras, codecs, and other hardware
@@ -137,8 +131,7 @@
 	  Say Y when you want to be able to see such devices.
 
 config MEDIA_TEST_SUPPORT
-	bool
-	prompt "Test drivers" if MEDIA_SUPPORT_FILTER
+	bool "Test drivers"
 	default y if !MEDIA_SUPPORT_FILTER
 	help
 	  These drivers should not be used on production kernels, but
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 358ad56..8834e33 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -461,6 +461,22 @@
 	tristate
 	default MISC_RTSX_PCI || MISC_RTSX_USB
 
+config UID_SYS_STATS
+	bool "Per-UID statistics"
+	depends on PROFILING && TASK_XACCT && TASK_IO_ACCOUNTING
+	depends on BROKEN
+	help
+	  Per UID based cpu time statistics exported to /proc/uid_cputime
+	  Per UID based io statistics exported to /proc/uid_io
+	  Per UID based procstat control in /proc/uid_procstat
+
+config UID_SYS_STATS_DEBUG
+	bool "Per-TASK statistics"
+	depends on UID_SYS_STATS
+	default n
+	help
+	  Per TASK based io statistics exported to /proc/uid_io
+
 config HISI_HIKEY_USB
 	tristate "USB GPIO Hub on HiSilicon Hikey 960/970 Platform"
 	depends on (OF && GPIOLIB) || COMPILE_TEST
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index ac9b3e7..72b154b 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -62,3 +62,4 @@
 obj-$(CONFIG_OPEN_DICE)		+= open-dice.o
 obj-$(CONFIG_GP_PCI1XXXX)	+= mchp_pci1xxxx/
 obj-$(CONFIG_VCPU_STALL_DETECTOR)	+= vcpu_stall_detector.o
+obj-$(CONFIG_UID_SYS_STATS)	+= uid_sys_stats.o
diff --git a/drivers/misc/uid_sys_stats.c b/drivers/misc/uid_sys_stats.c
new file mode 100644
index 0000000..a941c47
--- /dev/null
+++ b/drivers/misc/uid_sys_stats.c
@@ -0,0 +1,706 @@
+/* drivers/misc/uid_sys_stats.c
+ *
+ * Copyright (C) 2014 - 2015 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/atomic.h>
+#include <linux/err.h>
+#include <linux/hashtable.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/proc_fs.h>
+#include <linux/profile.h>
+#include <linux/rtmutex.h>
+#include <linux/sched/cputime.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+
+#define UID_HASH_BITS	10
+DECLARE_HASHTABLE(hash_table, UID_HASH_BITS);
+
+static DEFINE_RT_MUTEX(uid_lock);
+static struct proc_dir_entry *cpu_parent;
+static struct proc_dir_entry *io_parent;
+static struct proc_dir_entry *proc_parent;
+
+struct io_stats {
+	u64 read_bytes;
+	u64 write_bytes;
+	u64 rchar;
+	u64 wchar;
+	u64 fsync;
+};
+
+#define UID_STATE_FOREGROUND	0
+#define UID_STATE_BACKGROUND	1
+#define UID_STATE_BUCKET_SIZE	2
+
+#define UID_STATE_TOTAL_CURR	2
+#define UID_STATE_TOTAL_LAST	3
+#define UID_STATE_DEAD_TASKS	4
+#define UID_STATE_SIZE		5
+
+#define MAX_TASK_COMM_LEN 256
+
+struct task_entry {
+	char comm[MAX_TASK_COMM_LEN];
+	pid_t pid;
+	struct io_stats io[UID_STATE_SIZE];
+	struct hlist_node hash;
+};
+
+struct uid_entry {
+	uid_t uid;
+	u64 utime;
+	u64 stime;
+	u64 active_utime;
+	u64 active_stime;
+	int state;
+	struct io_stats io[UID_STATE_SIZE];
+	struct hlist_node hash;
+#ifdef CONFIG_UID_SYS_STATS_DEBUG
+	DECLARE_HASHTABLE(task_entries, UID_HASH_BITS);
+#endif
+};
+
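+/*
+ * Bytes actually written, clamped so that cancelled (truncated) writeback
+ * cannot make the result go negative.
+ */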
+static u64 compute_write_bytes(struct task_struct *task)
+{
+	if (task->ioac.write_bytes <= task->ioac.cancelled_write_bytes)
+		return 0;
+
+	return task->ioac.write_bytes - task->ioac.cancelled_write_bytes;
+}
+
+static void compute_io_bucket_stats(struct io_stats *io_bucket,
+					struct io_stats *io_curr,
+					struct io_stats *io_last,
+					struct io_stats *io_dead)
+{
+	/* A task could have switched to another uid group, but its io_last in
+	 * the previous uid group could still be positive.
+	 * Therefore, before each update, check that the delta does not go
+	 * negative.
+	 */
+	int64_t delta;
+
+	delta = io_curr->read_bytes + io_dead->read_bytes -
+		io_last->read_bytes;
+	io_bucket->read_bytes += delta > 0 ? delta : 0;
+	delta = io_curr->write_bytes + io_dead->write_bytes -
+		io_last->write_bytes;
+	io_bucket->write_bytes += delta > 0 ? delta : 0;
+	delta = io_curr->rchar + io_dead->rchar - io_last->rchar;
+	io_bucket->rchar += delta > 0 ? delta : 0;
+	delta = io_curr->wchar + io_dead->wchar - io_last->wchar;
+	io_bucket->wchar += delta > 0 ? delta : 0;
+	delta = io_curr->fsync + io_dead->fsync - io_last->fsync;
+	io_bucket->fsync += delta > 0 ? delta : 0;
+
+	io_last->read_bytes = io_curr->read_bytes;
+	io_last->write_bytes = io_curr->write_bytes;
+	io_last->rchar = io_curr->rchar;
+	io_last->wchar = io_curr->wchar;
+	io_last->fsync = io_curr->fsync;
+
+	memset(io_dead, 0, sizeof(struct io_stats));
+}
+
+#ifdef CONFIG_UID_SYS_STATS_DEBUG
+static void get_full_task_comm(struct task_entry *task_entry,
+		struct task_struct *task)
+{
+	int i = 0, offset = 0, len = 0;
+	/* save one byte for terminating null character */
+	int unused_len = MAX_TASK_COMM_LEN - TASK_COMM_LEN - 1;
+	char buf[MAX_TASK_COMM_LEN - TASK_COMM_LEN - 1];
+	struct mm_struct *mm = task->mm;
+
+	/* fill the first TASK_COMM_LEN bytes with thread name */
+	__get_task_comm(task_entry->comm, TASK_COMM_LEN, task);
+	i = strlen(task_entry->comm);
+	while (i < TASK_COMM_LEN)
+		task_entry->comm[i++] = ' ';
+
+	/* next the executable file name */
+	if (mm) {
+		mmap_write_lock(mm);
+		if (mm->exe_file) {
+			char *pathname = d_path(&mm->exe_file->f_path, buf,
+					unused_len);
+
+			if (!IS_ERR(pathname)) {
+				len = strlcpy(task_entry->comm + i, pathname,
+						unused_len);
+				i += len;
+				task_entry->comm[i++] = ' ';
+				unused_len--;
+			}
+		}
+		mmap_write_unlock(mm);
+	}
+	unused_len -= len;
+
+	/* fill the rest with the command line arguments, replacing each null
+	 * or newline character between args in argv with whitespace */
+	len = get_cmdline(task, buf, unused_len);
+	while (offset < len) {
+		if (buf[offset] != '\0' && buf[offset] != '\n')
+			task_entry->comm[i++] = buf[offset];
+		else
+			task_entry->comm[i++] = ' ';
+		offset++;
+	}
+
+	/* get rid of trailing whitespace for the case when an arg is memset
+	 * to zero before being reset in userspace
+	 */
+	while (task_entry->comm[i-1] == ' ')
+		i--;
+	task_entry->comm[i] = '\0';
+}
+
+static struct task_entry *find_task_entry(struct uid_entry *uid_entry,
+		struct task_struct *task)
+{
+	struct task_entry *task_entry;
+
+	hash_for_each_possible(uid_entry->task_entries, task_entry, hash,
+			task->pid) {
+		if (task->pid == task_entry->pid) {
+			/* if thread name changed, update the entire command */
+			int len = strnchr(task_entry->comm, ' ', TASK_COMM_LEN)
+				- task_entry->comm;
+
+			if (strncmp(task_entry->comm, task->comm, len))
+				get_full_task_comm(task_entry, task);
+			return task_entry;
+		}
+	}
+	return NULL;
+}
+
+static struct task_entry *find_or_register_task(struct uid_entry *uid_entry,
+		struct task_struct *task)
+{
+	struct task_entry *task_entry;
+	pid_t pid = task->pid;
+
+	task_entry = find_task_entry(uid_entry, task);
+	if (task_entry)
+		return task_entry;
+
+	task_entry = kzalloc(sizeof(struct task_entry), GFP_ATOMIC);
+	if (!task_entry)
+		return NULL;
+
+	get_full_task_comm(task_entry, task);
+
+	task_entry->pid = pid;
+	hash_add(uid_entry->task_entries, &task_entry->hash, (unsigned int)pid);
+
+	return task_entry;
+}
+
+static void remove_uid_tasks(struct uid_entry *uid_entry)
+{
+	struct task_entry *task_entry;
+	unsigned long bkt_task;
+	struct hlist_node *tmp_task;
+
+	hash_for_each_safe(uid_entry->task_entries, bkt_task,
+			tmp_task, task_entry, hash) {
+		hash_del(&task_entry->hash);
+		kfree(task_entry);
+	}
+}
+
+static void set_io_uid_tasks_zero(struct uid_entry *uid_entry)
+{
+	struct task_entry *task_entry;
+	unsigned long bkt_task;
+
+	hash_for_each(uid_entry->task_entries, bkt_task, task_entry, hash) {
+		memset(&task_entry->io[UID_STATE_TOTAL_CURR], 0,
+			sizeof(struct io_stats));
+	}
+}
+
+static void add_uid_tasks_io_stats(struct uid_entry *uid_entry,
+		struct task_struct *task, int slot)
+{
+	struct task_entry *task_entry = find_or_register_task(uid_entry, task);
+	struct io_stats *task_io_slot = &task_entry->io[slot];
+
+	task_io_slot->read_bytes += task->ioac.read_bytes;
+	task_io_slot->write_bytes += compute_write_bytes(task);
+	task_io_slot->rchar += task->ioac.rchar;
+	task_io_slot->wchar += task->ioac.wchar;
+	task_io_slot->fsync += task->ioac.syscfs;
+}
+
+static void compute_io_uid_tasks(struct uid_entry *uid_entry)
+{
+	struct task_entry *task_entry;
+	unsigned long bkt_task;
+
+	hash_for_each(uid_entry->task_entries, bkt_task, task_entry, hash) {
+		compute_io_bucket_stats(&task_entry->io[uid_entry->state],
+					&task_entry->io[UID_STATE_TOTAL_CURR],
+					&task_entry->io[UID_STATE_TOTAL_LAST],
+					&task_entry->io[UID_STATE_DEAD_TASKS]);
+	}
+}
+
+static void show_io_uid_tasks(struct seq_file *m, struct uid_entry *uid_entry)
+{
+	struct task_entry *task_entry;
+	unsigned long bkt_task;
+
+	hash_for_each(uid_entry->task_entries, bkt_task, task_entry, hash) {
+		/* Comma-separated because the task comm may contain spaces */
+		seq_printf(m, "task,%s,%lu,%llu,%llu,%llu,%llu,%llu,%llu,%llu,%llu,%llu,%llu\n",
+				task_entry->comm,
+				(unsigned long)task_entry->pid,
+				task_entry->io[UID_STATE_FOREGROUND].rchar,
+				task_entry->io[UID_STATE_FOREGROUND].wchar,
+				task_entry->io[UID_STATE_FOREGROUND].read_bytes,
+				task_entry->io[UID_STATE_FOREGROUND].write_bytes,
+				task_entry->io[UID_STATE_BACKGROUND].rchar,
+				task_entry->io[UID_STATE_BACKGROUND].wchar,
+				task_entry->io[UID_STATE_BACKGROUND].read_bytes,
+				task_entry->io[UID_STATE_BACKGROUND].write_bytes,
+				task_entry->io[UID_STATE_FOREGROUND].fsync,
+				task_entry->io[UID_STATE_BACKGROUND].fsync);
+	}
+}
+#else
+static void remove_uid_tasks(struct uid_entry *uid_entry) {}
+static void set_io_uid_tasks_zero(struct uid_entry *uid_entry) {}
+static void add_uid_tasks_io_stats(struct uid_entry *uid_entry,
+		struct task_struct *task, int slot) {}
+static void compute_io_uid_tasks(struct uid_entry *uid_entry) {}
+static void show_io_uid_tasks(struct seq_file *m,
+		struct uid_entry *uid_entry) {}
+#endif
+
+static struct uid_entry *find_uid_entry(uid_t uid)
+{
+	struct uid_entry *uid_entry;
+	hash_for_each_possible(hash_table, uid_entry, hash, uid) {
+		if (uid_entry->uid == uid)
+			return uid_entry;
+	}
+	return NULL;
+}
+
+static struct uid_entry *find_or_register_uid(uid_t uid)
+{
+	struct uid_entry *uid_entry;
+
+	uid_entry = find_uid_entry(uid);
+	if (uid_entry)
+		return uid_entry;
+
+	uid_entry = kzalloc(sizeof(struct uid_entry), GFP_ATOMIC);
+	if (!uid_entry)
+		return NULL;
+
+	uid_entry->uid = uid;
+#ifdef CONFIG_UID_SYS_STATS_DEBUG
+	hash_init(uid_entry->task_entries);
+#endif
+	hash_add(hash_table, &uid_entry->hash, uid);
+
+	return uid_entry;
+}
+
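+/* Output, one line per uid: "<uid>: <user time us> <system time us>". */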
+static int uid_cputime_show(struct seq_file *m, void *v)
+{
+	struct uid_entry *uid_entry = NULL;
+	struct task_struct *task, *temp;
+	struct user_namespace *user_ns = current_user_ns();
+	u64 utime;
+	u64 stime;
+	unsigned long bkt;
+	uid_t uid;
+
+	rt_mutex_lock(&uid_lock);
+
+	hash_for_each(hash_table, bkt, uid_entry, hash) {
+		uid_entry->active_stime = 0;
+		uid_entry->active_utime = 0;
+	}
+
+	rcu_read_lock();
+	do_each_thread(temp, task) {
+		uid = from_kuid_munged(user_ns, task_uid(task));
+		if (!uid_entry || uid_entry->uid != uid)
+			uid_entry = find_or_register_uid(uid);
+		if (!uid_entry) {
+			rcu_read_unlock();
+			rt_mutex_unlock(&uid_lock);
+			pr_err("%s: failed to find the uid_entry for uid %d\n",
+				__func__, uid);
+			return -ENOMEM;
+		}
+		/* avoid double accounting of dying threads */
+		if (!(task->flags & PF_EXITING)) {
+			task_cputime_adjusted(task, &utime, &stime);
+			uid_entry->active_utime += utime;
+			uid_entry->active_stime += stime;
+		}
+	} while_each_thread(temp, task);
+	rcu_read_unlock();
+
+	hash_for_each(hash_table, bkt, uid_entry, hash) {
+		u64 total_utime = uid_entry->utime +
+							uid_entry->active_utime;
+		u64 total_stime = uid_entry->stime +
+							uid_entry->active_stime;
+		seq_printf(m, "%d: %llu %llu\n", uid_entry->uid,
+			ktime_to_us(total_utime), ktime_to_us(total_stime));
+	}
+
+	rt_mutex_unlock(&uid_lock);
+	return 0;
+}
+
+static int uid_cputime_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, uid_cputime_show, pde_data(inode));
+}
+
+static const struct proc_ops uid_cputime_fops = {
+	.proc_open	= uid_cputime_open,
+	.proc_read	= seq_read,
+	.proc_lseek	= seq_lseek,
+	.proc_release	= single_release,
+};
+
+static int uid_remove_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, NULL, NULL);
+}
+
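+/*
+ * Expected input is a uid range of the form "<start>-<end>"; every uid_entry
+ * in that range is removed from the hash table.
+ */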
+static ssize_t uid_remove_write(struct file *file,
+			const char __user *buffer, size_t count, loff_t *ppos)
+{
+	struct uid_entry *uid_entry;
+	struct hlist_node *tmp;
+	char uids[128];
+	char *start_uid, *end_uid = NULL;
+	long int uid_start = 0, uid_end = 0;
+
+	if (count >= sizeof(uids))
+		count = sizeof(uids) - 1;
+
+	if (copy_from_user(uids, buffer, count))
+		return -EFAULT;
+
+	uids[count] = '\0';
+	end_uid = uids;
+	start_uid = strsep(&end_uid, "-");
+
+	if (!start_uid || !end_uid)
+		return -EINVAL;
+
+	if (kstrtol(start_uid, 10, &uid_start) != 0 ||
+		kstrtol(end_uid, 10, &uid_end) != 0) {
+		return -EINVAL;
+	}
+
+	rt_mutex_lock(&uid_lock);
+
+	for (; uid_start <= uid_end; uid_start++) {
+		hash_for_each_possible_safe(hash_table, uid_entry, tmp,
+							hash, (uid_t)uid_start) {
+			if (uid_start == uid_entry->uid) {
+				remove_uid_tasks(uid_entry);
+				hash_del(&uid_entry->hash);
+				kfree(uid_entry);
+			}
+		}
+	}
+
+	rt_mutex_unlock(&uid_lock);
+	return count;
+}
+
+static const struct proc_ops uid_remove_fops = {
+	.proc_open	= uid_remove_open,
+	.proc_release	= single_release,
+	.proc_write	= uid_remove_write,
+};
+
+
+static void add_uid_io_stats(struct uid_entry *uid_entry,
+			struct task_struct *task, int slot)
+{
+	struct io_stats *io_slot = &uid_entry->io[slot];
+
+	/* avoid double accounting of dying threads */
+	if (slot != UID_STATE_DEAD_TASKS && (task->flags & PF_EXITING))
+		return;
+
+	io_slot->read_bytes += task->ioac.read_bytes;
+	io_slot->write_bytes += compute_write_bytes(task);
+	io_slot->rchar += task->ioac.rchar;
+	io_slot->wchar += task->ioac.wchar;
+	io_slot->fsync += task->ioac.syscfs;
+
+	add_uid_tasks_io_stats(uid_entry, task, slot);
+}
+
+static void update_io_stats_all_locked(void)
+{
+	struct uid_entry *uid_entry = NULL;
+	struct task_struct *task, *temp;
+	struct user_namespace *user_ns = current_user_ns();
+	unsigned long bkt;
+	uid_t uid;
+
+	hash_for_each(hash_table, bkt, uid_entry, hash) {
+		memset(&uid_entry->io[UID_STATE_TOTAL_CURR], 0,
+			sizeof(struct io_stats));
+		set_io_uid_tasks_zero(uid_entry);
+	}
+
+	rcu_read_lock();
+	do_each_thread(temp, task) {
+		uid = from_kuid_munged(user_ns, task_uid(task));
+		if (!uid_entry || uid_entry->uid != uid)
+			uid_entry = find_or_register_uid(uid);
+		if (!uid_entry)
+			continue;
+		add_uid_io_stats(uid_entry, task, UID_STATE_TOTAL_CURR);
+	} while_each_thread(temp, task);
+	rcu_read_unlock();
+
+	hash_for_each(hash_table, bkt, uid_entry, hash) {
+		compute_io_bucket_stats(&uid_entry->io[uid_entry->state],
+					&uid_entry->io[UID_STATE_TOTAL_CURR],
+					&uid_entry->io[UID_STATE_TOTAL_LAST],
+					&uid_entry->io[UID_STATE_DEAD_TASKS]);
+		compute_io_uid_tasks(uid_entry);
+	}
+}
+
+static void update_io_stats_uid_locked(struct uid_entry *uid_entry)
+{
+	struct task_struct *task, *temp;
+	struct user_namespace *user_ns = current_user_ns();
+
+	memset(&uid_entry->io[UID_STATE_TOTAL_CURR], 0,
+		sizeof(struct io_stats));
+	set_io_uid_tasks_zero(uid_entry);
+
+	rcu_read_lock();
+	do_each_thread(temp, task) {
+		if (from_kuid_munged(user_ns, task_uid(task)) != uid_entry->uid)
+			continue;
+		add_uid_io_stats(uid_entry, task, UID_STATE_TOTAL_CURR);
+	} while_each_thread(temp, task);
+	rcu_read_unlock();
+
+	compute_io_bucket_stats(&uid_entry->io[uid_entry->state],
+				&uid_entry->io[UID_STATE_TOTAL_CURR],
+				&uid_entry->io[UID_STATE_TOTAL_LAST],
+				&uid_entry->io[UID_STATE_DEAD_TASKS]);
+	compute_io_uid_tasks(uid_entry);
+}
+
+static int uid_io_show(struct seq_file *m, void *v)
+{
+	struct uid_entry *uid_entry;
+	unsigned long bkt;
+
+	rt_mutex_lock(&uid_lock);
+
+	update_io_stats_all_locked();
+
+	hash_for_each(hash_table, bkt, uid_entry, hash) {
+		seq_printf(m, "%d %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu\n",
+				uid_entry->uid,
+				uid_entry->io[UID_STATE_FOREGROUND].rchar,
+				uid_entry->io[UID_STATE_FOREGROUND].wchar,
+				uid_entry->io[UID_STATE_FOREGROUND].read_bytes,
+				uid_entry->io[UID_STATE_FOREGROUND].write_bytes,
+				uid_entry->io[UID_STATE_BACKGROUND].rchar,
+				uid_entry->io[UID_STATE_BACKGROUND].wchar,
+				uid_entry->io[UID_STATE_BACKGROUND].read_bytes,
+				uid_entry->io[UID_STATE_BACKGROUND].write_bytes,
+				uid_entry->io[UID_STATE_FOREGROUND].fsync,
+				uid_entry->io[UID_STATE_BACKGROUND].fsync);
+
+		show_io_uid_tasks(m, uid_entry);
+	}
+
+	rt_mutex_unlock(&uid_lock);
+	return 0;
+}
+
+static int uid_io_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, uid_io_show, pde_data(inode));
+}
+
+static const struct proc_ops uid_io_fops = {
+	.proc_open	= uid_io_open,
+	.proc_read	= seq_read,
+	.proc_lseek	= seq_lseek,
+	.proc_release	= single_release,
+};
+
+static int uid_procstat_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, NULL, NULL);
+}
+
+static ssize_t uid_procstat_write(struct file *file,
+			const char __user *buffer, size_t count, loff_t *ppos)
+{
+	struct uid_entry *uid_entry;
+	uid_t uid;
+	int argc, state;
+	char input[128];
+
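+	/*
+	 * Expected input: "<uid> <state>", where state is
+	 * UID_STATE_FOREGROUND or UID_STATE_BACKGROUND.
+	 */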
+	if (count >= sizeof(input))
+		return -EINVAL;
+
+	if (copy_from_user(input, buffer, count))
+		return -EFAULT;
+
+	input[count] = '\0';
+
+	argc = sscanf(input, "%u %d", &uid, &state);
+	if (argc != 2)
+		return -EINVAL;
+
+	if (state != UID_STATE_BACKGROUND && state != UID_STATE_FOREGROUND)
+		return -EINVAL;
+
+	rt_mutex_lock(&uid_lock);
+
+	uid_entry = find_or_register_uid(uid);
+	if (!uid_entry) {
+		rt_mutex_unlock(&uid_lock);
+		return -EINVAL;
+	}
+
+	if (uid_entry->state == state) {
+		rt_mutex_unlock(&uid_lock);
+		return count;
+	}
+
+	update_io_stats_uid_locked(uid_entry);
+
+	uid_entry->state = state;
+
+	rt_mutex_unlock(&uid_lock);
+
+	return count;
+}
+
+static const struct proc_ops uid_procstat_fops = {
+	.proc_open	= uid_procstat_open,
+	.proc_release	= single_release,
+	.proc_write	= uid_procstat_write,
+};
+
+static int process_notifier(struct notifier_block *self,
+			unsigned long cmd, void *v)
+{
+	struct task_struct *task = v;
+	struct uid_entry *uid_entry;
+	u64 utime, stime;
+	uid_t uid;
+
+	if (!task)
+		return NOTIFY_OK;
+
+	rt_mutex_lock(&uid_lock);
+	uid = from_kuid_munged(current_user_ns(), task_uid(task));
+	uid_entry = find_or_register_uid(uid);
+	if (!uid_entry) {
+		pr_err("%s: failed to find uid %d\n", __func__, uid);
+		goto exit;
+	}
+
+	task_cputime_adjusted(task, &utime, &stime);
+	uid_entry->utime += utime;
+	uid_entry->stime += stime;
+
+	add_uid_io_stats(uid_entry, task, UID_STATE_DEAD_TASKS);
+
+exit:
+	rt_mutex_unlock(&uid_lock);
+	return NOTIFY_OK;
+}
+
+static struct notifier_block process_notifier_block = {
+	.notifier_call	= process_notifier,
+};
+
+static int __init proc_uid_sys_stats_init(void)
+{
+	hash_init(hash_table);
+
+	cpu_parent = proc_mkdir("uid_cputime", NULL);
+	if (!cpu_parent) {
+		pr_err("%s: failed to create uid_cputime proc entry\n",
+			__func__);
+		goto err;
+	}
+
+	proc_create_data("remove_uid_range", 0222, cpu_parent,
+		&uid_remove_fops, NULL);
+	proc_create_data("show_uid_stat", 0444, cpu_parent,
+		&uid_cputime_fops, NULL);
+
+	io_parent = proc_mkdir("uid_io", NULL);
+	if (!io_parent) {
+		pr_err("%s: failed to create uid_io proc entry\n",
+			__func__);
+		goto err;
+	}
+
+	proc_create_data("stats", 0444, io_parent,
+		&uid_io_fops, NULL);
+
+	proc_parent = proc_mkdir("uid_procstat", NULL);
+	if (!proc_parent) {
+		pr_err("%s: failed to create uid_procstat proc entry\n",
+			__func__);
+		goto err;
+	}
+
+	proc_create_data("set", 0222, proc_parent,
+		&uid_procstat_fops, NULL);
+
+	profile_event_register(PROFILE_TASK_EXIT, &process_notifier_block);
+
+	return 0;
+
+err:
+	remove_proc_subtree("uid_cputime", NULL);
+	remove_proc_subtree("uid_io", NULL);
+	remove_proc_subtree("uid_procstat", NULL);
+	return -ENOMEM;
+}
+
+early_initcall(proc_uid_sys_stats_init);
diff --git a/drivers/mmc/host/cqhci-crypto.c b/drivers/mmc/host/cqhci-crypto.c
index d5f4b69..6652982 100644
--- a/drivers/mmc/host/cqhci-crypto.c
+++ b/drivers/mmc/host/cqhci-crypto.c
@@ -210,6 +210,8 @@ int cqhci_crypto_init(struct cqhci_host *cq_host)
 	/* Unfortunately, CQHCI crypto only supports 32 DUN bits. */
 	profile->max_dun_bytes_supported = 4;
 
+	profile->key_types_supported = BLK_CRYPTO_KEY_TYPE_STANDARD;
+
 	/*
 	 * Cache all the crypto capabilities and advertise the supported crypto
 	 * modes and data unit sizes to the block layer.
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index d1a68b6..335d39e 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -1148,10 +1148,32 @@ int __init early_init_dt_scan_memory(void)
 	return found_memory;
 }
 
+/*
+ * Convert configs to something easy to use in C code
+ */
+#if defined(CONFIG_CMDLINE_FORCE)
+static const int overwrite_incoming_cmdline = 1;
+static const int read_dt_cmdline;
+static const int concat_cmdline;
+#elif defined(CONFIG_CMDLINE_EXTEND)
+static const int overwrite_incoming_cmdline;
+static const int read_dt_cmdline = 1;
+static const int concat_cmdline = 1;
+#else /* CMDLINE_FROM_BOOTLOADER */
+static const int overwrite_incoming_cmdline;
+static const int read_dt_cmdline = 1;
+static const int concat_cmdline;
+#endif
+#ifdef CONFIG_CMDLINE
+static const char *config_cmdline = CONFIG_CMDLINE;
+#else
+static const char *config_cmdline = "";
+#endif
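+
+/*
+ * Net effect of the flags above:
+ *   CMDLINE_FORCE:           CONFIG_CMDLINE is used and DT bootargs are
+ *                            ignored
+ *   CMDLINE_EXTEND:          DT bootargs are appended to the command line
+ *                            (CONFIG_CMDLINE if it was empty on entry)
+ *   CMDLINE_FROM_BOOTLOADER: DT bootargs are used when present, otherwise
+ *                            the incoming command line or CONFIG_CMDLINE
+ */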
+
 int __init early_init_dt_scan_chosen(char *cmdline)
 {
-	int l, node;
-	const char *p;
+	int l = 0, node;
+	const char *p = NULL;
 	const void *rng_seed;
 	const void *fdt = initial_boot_params;
 
@@ -1179,30 +1201,29 @@ int __init early_init_dt_scan_chosen(char *cmdline)
 				fdt_totalsize(initial_boot_params));
 	}
 
-	/* Retrieve command line */
-	p = of_get_flat_dt_prop(node, "bootargs", &l);
-	if (p != NULL && l > 0)
-		strscpy(cmdline, p, min(l, COMMAND_LINE_SIZE));
+	/* Use CONFIG_CMDLINE if forced, or if the incoming command line is empty */
+	if (overwrite_incoming_cmdline || !cmdline[0])
+		strscpy(cmdline, config_cmdline, COMMAND_LINE_SIZE);
+
+	/* Retrieve command line unless forcing */
+	if (read_dt_cmdline)
+		p = of_get_flat_dt_prop(node, "bootargs", &l);
+	if (p != NULL && l > 0) {
+		if (concat_cmdline) {
+			int cmdline_len;
+			int copy_len;
+			strlcat(cmdline, " ", COMMAND_LINE_SIZE);
+			cmdline_len = strlen(cmdline);
+			copy_len = COMMAND_LINE_SIZE - cmdline_len - 1;
+			copy_len = min((int)l, copy_len);
+			strncpy(cmdline + cmdline_len, p, copy_len);
+			cmdline[cmdline_len + copy_len] = '\0';
+		} else {
+			strscpy(cmdline, p, min(l, COMMAND_LINE_SIZE));
+		}
+	}
 
 handle_cmdline:
-	/*
-	 * CONFIG_CMDLINE is meant to be a default in case nothing else
-	 * managed to set the command line, unless CONFIG_CMDLINE_FORCE
-	 * is set in which case we override whatever was found earlier.
-	 */
-#ifdef CONFIG_CMDLINE
-#if defined(CONFIG_CMDLINE_EXTEND)
-	strlcat(cmdline, " ", COMMAND_LINE_SIZE);
-	strlcat(cmdline, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
-#elif defined(CONFIG_CMDLINE_FORCE)
-	strscpy(cmdline, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
-#else
-	/* No arguments from boot loader, use kernel's  cmdl*/
-	if (!((char *)cmdline)[0])
-		strscpy(cmdline, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
-#endif
-#endif /* CONFIG_CMDLINE */
-
 	pr_debug("Command line is: %s\n", (char *)cmdline);
 
 	return 0;
diff --git a/drivers/of/unittest-data/Makefile b/drivers/of/unittest-data/Makefile
index d072f3b..011dcd2 100644
--- a/drivers/of/unittest-data/Makefile
+++ b/drivers/of/unittest-data/Makefile
@@ -42,9 +42,7 @@
 DTC_FLAGS_testcases += -@
 
 # suppress warnings about intentional errors
-DTC_FLAGS_testcases += -Wno-interrupts_property \
-	-Wno-node_name_vs_property_name \
-	-Wno-interrupt_map
+DTC_FLAGS_testcases += -Wno-interrupts_property
 
 # Apply overlays statically with fdtoverlay.  This is a build time test that
 # the overlays can be applied successfully by fdtoverlay.  This does not
@@ -94,10 +92,6 @@
 
 apply_static_overlay_2 := overlay.dtbo
 
-DTC_FLAGS_static_base_1 += -Wno-interrupts_property \
-	-Wno-node_name_vs_property_name \
-	-Wno-interrupt_map
-
 static_test_1-dtbs := static_base_1.dtb $(apply_static_overlay_1)
 static_test_2-dtbs := static_base_2.dtb $(apply_static_overlay_2)
 
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index 39f3b37..6241d3f 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -83,6 +83,7 @@ irqreturn_t dw_handle_msi_irq(struct dw_pcie_rp *pp)
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(dw_handle_msi_irq);
 
 /* Chained MSI interrupt service routine */
 static void dw_chained_msi_isr(struct irq_desc *desc)
diff --git a/drivers/power/supply/power_supply_core.c b/drivers/power/supply/power_supply_core.c
index 01d1ac7..62a87f1 100644
--- a/drivers/power/supply/power_supply_core.c
+++ b/drivers/power/supply/power_supply_core.c
@@ -34,6 +34,13 @@ EXPORT_SYMBOL_GPL(power_supply_notifier);
 
 static struct device_type power_supply_dev_type;
 
+struct match_device_node_array_param {
+	struct device_node *parent_of_node;
+	struct power_supply **psy;
+	ssize_t psy_size;
+	ssize_t psy_count;
+};
+
 #define POWER_SUPPLY_DEFERRED_REGISTER_TIME	msecs_to_jiffies(10)
 
 static bool __power_supply_is_supplied_by(struct power_supply *supplier,
@@ -526,6 +533,77 @@ struct power_supply *power_supply_get_by_phandle(struct device_node *np,
 }
 EXPORT_SYMBOL_GPL(power_supply_get_by_phandle);
 
+static int power_supply_match_device_node_array(struct device *dev,
+						void *data)
+{
+	struct match_device_node_array_param *param =
+		(struct match_device_node_array_param *)data;
+	struct power_supply **psy = param->psy;
+	ssize_t size = param->psy_size;
+	ssize_t *count = &param->psy_count;
+
+	if (!dev->parent || dev->parent->of_node != param->parent_of_node)
+		return 0;
+
+	if (*count >= size)
+		return -EOVERFLOW;
+
+	psy[*count] = dev_get_drvdata(dev);
+	atomic_inc(&psy[*count]->use_cnt);
+	(*count)++;
+
+	return 0;
+}
+
+/**
+ * power_supply_get_by_phandle_array() - Similar to
+ * power_supply_get_by_phandle but returns an array of power supply
+ * objects which are associated with the phandle.
+ * @np: Pointer to device node holding phandle property.
+ * @property: Name of property holding a power supply name.
+ * @psy: Array of power_supply pointers provided by the client which is
+ * filled by power_supply_get_by_phandle_array.
+ * @size: size of power_supply pointer array.
+ *
+ * If power supplies are found, the reference count of each internal
+ * power supply's device is incremented. The caller should call
+ * power_supply_put() on every returned entry after usage.
+ *
+ * Return: On success returns the number of power supply objects filled
+ * in the @psy array.
+ * -EOVERFLOW when the size of the @psy array does not suffice.
+ * -EINVAL when @psy is NULL or @size is 0.
+ * -ENODEV when matching device_node is not found.
+ */
+int power_supply_get_by_phandle_array(struct device_node *np,
+				      const char *property,
+				      struct power_supply **psy,
+				      ssize_t size)
+{
+	struct device_node *power_supply_np;
+	int ret;
+	struct match_device_node_array_param param;
+
+	if (!psy || !size)
+		return -EINVAL;
+
+	power_supply_np = of_parse_phandle(np, property, 0);
+	if (!power_supply_np)
+		return -ENODEV;
+
+	param.parent_of_node = power_supply_np;
+	param.psy = psy;
+	param.psy_size = size;
+	param.psy_count = 0;
+	ret = class_for_each_device(power_supply_class, NULL, &param,
+				    power_supply_match_device_node_array);
+
+	of_node_put(power_supply_np);
+
+	return param.psy_count;
+}
+EXPORT_SYMBOL_GPL(power_supply_get_by_phandle_array);
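+
+/*
+ * Example (sketch; the property name is illustrative): fetch up to four
+ * supplies referenced by a "power-supplies" phandle and drop the
+ * references when done:
+ *
+ *	struct power_supply *psy[4];
+ *	int i, n;
+ *
+ *	n = power_supply_get_by_phandle_array(np, "power-supplies", psy, 4);
+ *	for (i = 0; i < n; i++)
+ *		power_supply_put(psy[i]);
+ */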
+
 static void devm_power_supply_put(struct device *dev, void *res)
 {
 	struct power_supply **psy = res;
diff --git a/drivers/power/supply/power_supply_sysfs.c b/drivers/power/supply/power_supply_sysfs.c
index 5369aba..bab8049 100644
--- a/drivers/power/supply/power_supply_sysfs.c
+++ b/drivers/power/supply/power_supply_sysfs.c
@@ -90,6 +90,7 @@ static const char * const POWER_SUPPLY_CHARGE_TYPE_TEXT[] = {
 	[POWER_SUPPLY_CHARGE_TYPE_CUSTOM]	= "Custom",
 	[POWER_SUPPLY_CHARGE_TYPE_LONGLIFE]	= "Long Life",
 	[POWER_SUPPLY_CHARGE_TYPE_BYPASS]	= "Bypass",
+	[POWER_SUPPLY_CHARGE_TYPE_TAPER_EXT]	= "Taper",
 };
 
 static const char * const POWER_SUPPLY_HEALTH_TEXT[] = {
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index c3f194d..0444e15 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -38,6 +38,7 @@
 #include <linux/virtio_ring.h>
 #include <asm/byteorder.h>
 #include <linux/platform_device.h>
+#include <trace/hooks/remoteproc.h>
 
 #include "remoteproc_internal.h"
 
@@ -1890,6 +1891,7 @@ static void rproc_crash_handler_work(struct work_struct *work)
 		rproc_trigger_recovery(rproc);
 
 out:
+	trace_android_vh_rproc_recovery(rproc);
 	pm_relax(rproc->dev.parent);
 }
 
diff --git a/drivers/remoteproc/remoteproc_coredump.c b/drivers/remoteproc/remoteproc_coredump.c
index 4b09342..9c53c3d 100644
--- a/drivers/remoteproc/remoteproc_coredump.c
+++ b/drivers/remoteproc/remoteproc_coredump.c
@@ -32,6 +32,7 @@ void rproc_coredump_cleanup(struct rproc *rproc)
 		kfree(entry);
 	}
 }
+EXPORT_SYMBOL_GPL(rproc_coredump_cleanup);
 
 /**
  * rproc_coredump_add_segment() - add segment of device memory to coredump
@@ -327,6 +328,7 @@ void rproc_coredump(struct rproc *rproc)
 	 */
 	wait_for_completion(&dump_state.dump_done);
 }
+EXPORT_SYMBOL_GPL(rproc_coredump);
 
 /**
  * rproc_coredump_using_sections() - perform coredump using section headers
diff --git a/drivers/remoteproc/remoteproc_sysfs.c b/drivers/remoteproc/remoteproc_sysfs.c
index 8c7ea89..22c316d 100644
--- a/drivers/remoteproc/remoteproc_sysfs.c
+++ b/drivers/remoteproc/remoteproc_sysfs.c
@@ -5,6 +5,7 @@
 
 #include <linux/remoteproc.h>
 #include <linux/slab.h>
+#include <trace/hooks/remoteproc.h>
 
 #include "remoteproc_internal.h"
 
@@ -51,9 +52,11 @@ static ssize_t recovery_store(struct device *dev,
 	if (sysfs_streq(buf, "enabled")) {
 		/* change the flag and begin the recovery process if needed */
 		rproc->recovery_disabled = false;
+		trace_android_vh_rproc_recovery_set(rproc);
 		rproc_trigger_recovery(rproc);
 	} else if (sysfs_streq(buf, "disabled")) {
 		rproc->recovery_disabled = true;
+		trace_android_vh_rproc_recovery_set(rproc);
 	} else if (sysfs_streq(buf, "recover")) {
 		/* begin the recovery process without changing the flag */
 		rproc_trigger_recovery(rproc);
diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index 5cfabd5..306a484 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -56,6 +56,8 @@
 
 source "drivers/staging/media/Kconfig"
 
+source "drivers/staging/android/Kconfig"
+
 source "drivers/staging/board/Kconfig"
 
 source "drivers/staging/gdm724x/Kconfig"
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index f8c3aa9c..a375649 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -18,6 +18,7 @@
 obj-$(CONFIG_FB_SM750)		+= sm750fb/
 obj-$(CONFIG_USB_EMXX)		+= emxx_udc/
 obj-$(CONFIG_MFD_NVEC)		+= nvec/
+obj-$(CONFIG_ASHMEM)		+= android/
 obj-$(CONFIG_STAGING_BOARD)	+= board/
 obj-$(CONFIG_LTE_GDM724X)	+= gdm724x/
 obj-$(CONFIG_FB_TFT)		+= fbtft/
diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
new file mode 100644
index 0000000..36f4413
--- /dev/null
+++ b/drivers/staging/android/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0
+menu "Android"
+
+config ASHMEM
+	bool "Enable the Anonymous Shared Memory Subsystem"
+	depends on SHMEM
+	help
+	  The ashmem subsystem is a new shared memory allocator, similar to
+	  POSIX SHM but with different behavior and sporting a simpler
+	  file-based API.
+
+	  It is, in theory, a good memory allocator for low-memory devices,
+	  because it can discard shared memory units when under memory pressure.
+
+endmenu
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
new file mode 100644
index 0000000..e9a55a5
--- /dev/null
+++ b/drivers/staging/android/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+ccflags-y += -I$(src)			# needed for trace events
+
+obj-$(CONFIG_ASHMEM)			+= ashmem.o
diff --git a/drivers/staging/android/TODO b/drivers/staging/android/TODO
new file mode 100644
index 0000000..f74eb44
--- /dev/null
+++ b/drivers/staging/android/TODO
@@ -0,0 +1,8 @@
+TODO:
+	- sparse fixes
+	- rename files to be not so "generic"
+	- add proper arch dependencies as needed
+	- audit userspace interfaces to make sure they are sane
+
+Please send patches to Greg Kroah-Hartman <greg@kroah.com> and Cc:
+Arve Hjønnevåg <arve@android.com> and Riley Andrews <riandrews@android.com>
diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
new file mode 100644
index 0000000..feadf798
--- /dev/null
+++ b/drivers/staging/android/ashmem.c
@@ -0,0 +1,991 @@
+// SPDX-License-Identifier: GPL-2.0
+/* drivers/staging/android/ashmem.c
+ *
+ * Anonymous Shared Memory Subsystem, ashmem
+ *
+ * Copyright (C) 2008 Google, Inc.
+ *
+ * Robert Love <rlove@google.com>
+ */
+
+#define pr_fmt(fmt) "ashmem: " fmt
+
+#include <linux/init.h>
+#include <linux/export.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/falloc.h>
+#include <linux/miscdevice.h>
+#include <linux/security.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/uaccess.h>
+#include <linux/personality.h>
+#include <linux/bitops.h>
+#include <linux/mutex.h>
+#include <linux/shmem_fs.h>
+#include "ashmem.h"
+
+#define ASHMEM_NAME_PREFIX "dev/ashmem/"
+#define ASHMEM_NAME_PREFIX_LEN (sizeof(ASHMEM_NAME_PREFIX) - 1)
+#define ASHMEM_FULL_NAME_LEN (ASHMEM_NAME_LEN + ASHMEM_NAME_PREFIX_LEN)
+
+/**
+ * struct ashmem_area - The anonymous shared memory area
+ * @name:		The optional name in /proc/pid/maps
+ * @unpinned_list:	The list of all ashmem areas
+ * @file:		The shmem-based backing file
+ * @size:		The size of the mapping, in bytes
+ * @prot_mask:		The allowed protection bits, as vm_flags
+ *
+ * The lifecycle of this structure is from our parent file's open() until
+ * its release(). It is also protected by 'ashmem_mutex'
+ *
+ * Warning: Mappings do NOT pin this structure; It dies on close()
+ */
+struct ashmem_area {
+	char name[ASHMEM_FULL_NAME_LEN];
+	struct list_head unpinned_list;
+	struct file *file;
+	size_t size;
+	unsigned long prot_mask;
+};
+
+/**
+ * struct ashmem_range - A range of unpinned/evictable pages
+ * @lru:	         The entry in the LRU list
+ * @unpinned:	         The entry in its area's unpinned list
+ * @asma:	         The associated anonymous shared memory area.
+ * @pgstart:	         The starting page (inclusive)
+ * @pgend:	         The ending page (inclusive)
+ * @purged:	         The purge status (ASHMEM_NOT or ASHMEM_WAS_PURGED)
+ *
+ * The lifecycle of this structure is from unpin to pin.
+ * It is protected by 'ashmem_mutex'.
+ */
+struct ashmem_range {
+	struct list_head lru;
+	struct list_head unpinned;
+	struct ashmem_area *asma;
+	size_t pgstart;
+	size_t pgend;
+	unsigned int purged;
+};
+
+/* LRU list of unpinned pages, protected by ashmem_mutex */
+static LIST_HEAD(ashmem_lru_list);
+
+static atomic_t ashmem_shrink_inflight = ATOMIC_INIT(0);
+static DECLARE_WAIT_QUEUE_HEAD(ashmem_shrink_wait);
+
+/*
+ * long lru_count - The count of pages on our LRU list.
+ *
+ * This is protected by ashmem_mutex.
+ */
+static unsigned long lru_count;
+
+/*
+ * ashmem_mutex - protects the list of ashmem areas and each individual
+ * ashmem_area
+ *
+ * Lock Ordering: ashmem_mutex -> i_mutex -> i_alloc_sem
+ */
+static DEFINE_MUTEX(ashmem_mutex);
+
+static struct kmem_cache *ashmem_area_cachep __read_mostly;
+static struct kmem_cache *ashmem_range_cachep __read_mostly;
+
+/*
+ * A separate lockdep class for the backing shmem inodes to resolve the lockdep
+ * warning about the race between kswapd taking fs_reclaim before inode_lock
+ * and write syscall taking inode_lock and then fs_reclaim.
+ * Note that such a race is impossible because ashmem does not support write
+ * syscalls operating on the backing shmem.
+ */
+static struct lock_class_key backing_shmem_inode_class;
+
+static inline unsigned long range_size(struct ashmem_range *range)
+{
+	return range->pgend - range->pgstart + 1;
+}
+
+static inline bool range_on_lru(struct ashmem_range *range)
+{
+	return range->purged == ASHMEM_NOT_PURGED;
+}
+
+static inline bool page_range_subsumes_range(struct ashmem_range *range,
+					     size_t start, size_t end)
+{
+	return (range->pgstart >= start) && (range->pgend <= end);
+}
+
+static inline bool page_range_subsumed_by_range(struct ashmem_range *range,
+						size_t start, size_t end)
+{
+	return (range->pgstart <= start) && (range->pgend >= end);
+}
+
+static inline bool page_in_range(struct ashmem_range *range, size_t page)
+{
+	return (range->pgstart <= page) && (range->pgend >= page);
+}
+
+static inline bool page_range_in_range(struct ashmem_range *range,
+				       size_t start, size_t end)
+{
+	return page_in_range(range, start) || page_in_range(range, end) ||
+		page_range_subsumes_range(range, start, end);
+}
+
+static inline bool range_before_page(struct ashmem_range *range,
+				     size_t page)
+{
+	return range->pgend < page;
+}
+
+#define PROT_MASK		(PROT_EXEC | PROT_READ | PROT_WRITE)
+
+/**
+ * lru_add() - Adds a range of memory to the LRU list
+ * @range:     The memory range being added.
+ *
+ * The range is first added to the end (tail) of the LRU list.
+ * After this, the size of the range is added to @lru_count
+ */
+static inline void lru_add(struct ashmem_range *range)
+{
+	list_add_tail(&range->lru, &ashmem_lru_list);
+	lru_count += range_size(range);
+}
+
+/**
+ * lru_del() - Removes a range of memory from the LRU list
+ * @range:     The memory range being removed
+ *
+ * The range is first deleted from the LRU list.
+ * After this, the size of the range is removed from @lru_count
+ */
+static inline void lru_del(struct ashmem_range *range)
+{
+	list_del(&range->lru);
+	lru_count -= range_size(range);
+}
+
+/**
+ * range_alloc() - Allocates and initializes a new ashmem_range structure
+ * @asma:	   The associated ashmem_area
+ * @prev_range:	   The previous ashmem_range in the sorted asma->unpinned list
+ * @purged:	   Initial purge status (ASHMEM_NOT_PURGED or ASHMEM_WAS_PURGED)
+ * @start:	   The starting page (inclusive)
+ * @end:	   The ending page (inclusive)
+ * @new_range:	   The placeholder for the new range
+ *
+ * This function is protected by ashmem_mutex.
+ */
+static void range_alloc(struct ashmem_area *asma,
+			struct ashmem_range *prev_range, unsigned int purged,
+			size_t start, size_t end,
+			struct ashmem_range **new_range)
+{
+	struct ashmem_range *range = *new_range;
+
+	*new_range = NULL;
+	range->asma = asma;
+	range->pgstart = start;
+	range->pgend = end;
+	range->purged = purged;
+
+	list_add_tail(&range->unpinned, &prev_range->unpinned);
+
+	if (range_on_lru(range))
+		lru_add(range);
+}
+
+/**
+ * range_del() - Deletes and deallocates an ashmem_range structure
+ * @range:	 The associated ashmem_range that has previously been allocated
+ */
+static void range_del(struct ashmem_range *range)
+{
+	list_del(&range->unpinned);
+	if (range_on_lru(range))
+		lru_del(range);
+	kmem_cache_free(ashmem_range_cachep, range);
+}
+
+/**
+ * range_shrink() - Shrinks an ashmem_range
+ * @range:	    The associated ashmem_range being shrunk
+ * @start:	    The starting byte of the new range
+ * @end:	    The ending byte of the new range
+ *
+ * This does not modify the data inside the existing range in any way - It
+ * simply shrinks the boundaries of the range.
+ *
+ * Theoretically, with a little tweaking, this could eventually be changed
+ * to range_resize, and expand the lru_count if the new range is larger.
+ */
+static inline void range_shrink(struct ashmem_range *range,
+				size_t start, size_t end)
+{
+	size_t pre = range_size(range);
+
+	range->pgstart = start;
+	range->pgend = end;
+
+	if (range_on_lru(range))
+		lru_count -= pre - range_size(range);
+}
+
+/**
+ * ashmem_open() - Opens an Anonymous Shared Memory structure
+ * @inode:	   The backing file's index node(?)
+ * @file:	   The backing file
+ *
+ * Please note that the ashmem_area is not returned by this function - It is
+ * instead written to "file->private_data".
+ *
+ * Return: 0 if successful, or another code if unsuccessful.
+ */
+static int ashmem_open(struct inode *inode, struct file *file)
+{
+	struct ashmem_area *asma;
+	int ret;
+
+	ret = generic_file_open(inode, file);
+	if (ret)
+		return ret;
+
+	asma = kmem_cache_zalloc(ashmem_area_cachep, GFP_KERNEL);
+	if (!asma)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&asma->unpinned_list);
+	memcpy(asma->name, ASHMEM_NAME_PREFIX, ASHMEM_NAME_PREFIX_LEN);
+	asma->prot_mask = PROT_MASK;
+	file->private_data = asma;
+
+	return 0;
+}
+
+/**
+ * ashmem_release() - Releases an Anonymous Shared Memory structure
+ * @ignored:	      The backing file's Index Node(?) - It is ignored here.
+ * @file:	      The backing file
+ *
+ * Return: 0 if successful. If it is anything else, go have a coffee and
+ * try again.
+ */
+static int ashmem_release(struct inode *ignored, struct file *file)
+{
+	struct ashmem_area *asma = file->private_data;
+	struct ashmem_range *range, *next;
+
+	mutex_lock(&ashmem_mutex);
+	list_for_each_entry_safe(range, next, &asma->unpinned_list, unpinned)
+		range_del(range);
+	mutex_unlock(&ashmem_mutex);
+
+	if (asma->file)
+		fput(asma->file);
+	kmem_cache_free(ashmem_area_cachep, asma);
+
+	return 0;
+}
+
+static ssize_t ashmem_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+	struct ashmem_area *asma = iocb->ki_filp->private_data;
+	int ret = 0;
+
+	mutex_lock(&ashmem_mutex);
+
+	/* If size is not set, or set to 0, always return EOF. */
+	if (asma->size == 0)
+		goto out_unlock;
+
+	if (!asma->file) {
+		ret = -EBADF;
+		goto out_unlock;
+	}
+
+	/*
+	 * asma and asma->file are used outside the lock here.  We assume
+	 * once asma->file is set it will never be changed, and will not
+	 * be destroyed until all references to the file are dropped and
+	 * ashmem_release is called.
+	 */
+	mutex_unlock(&ashmem_mutex);
+	ret = vfs_iter_read(asma->file, iter, &iocb->ki_pos, 0);
+	mutex_lock(&ashmem_mutex);
+	if (ret > 0)
+		asma->file->f_pos = iocb->ki_pos;
+out_unlock:
+	mutex_unlock(&ashmem_mutex);
+	return ret;
+}
+
+static loff_t ashmem_llseek(struct file *file, loff_t offset, int origin)
+{
+	struct ashmem_area *asma = file->private_data;
+	loff_t ret;
+
+	mutex_lock(&ashmem_mutex);
+
+	if (asma->size == 0) {
+		mutex_unlock(&ashmem_mutex);
+		return -EINVAL;
+	}
+
+	if (!asma->file) {
+		mutex_unlock(&ashmem_mutex);
+		return -EBADF;
+	}
+
+	mutex_unlock(&ashmem_mutex);
+
+	ret = vfs_llseek(asma->file, offset, origin);
+	if (ret < 0)
+		return ret;
+
+	/* Copy f_pos from the backing file, since f_ops->llseek() sets it */
+	file->f_pos = asma->file->f_pos;
+	return ret;
+}
+
+static inline vm_flags_t calc_vm_may_flags(unsigned long prot)
+{
+	return _calc_vm_trans(prot, PROT_READ,  VM_MAYREAD) |
+	       _calc_vm_trans(prot, PROT_WRITE, VM_MAYWRITE) |
+	       _calc_vm_trans(prot, PROT_EXEC,  VM_MAYEXEC);
+}
+
+static int ashmem_vmfile_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	/* do not allow to mmap ashmem backing shmem file directly */
+	return -EPERM;
+}
+
+static unsigned long
+ashmem_vmfile_get_unmapped_area(struct file *file, unsigned long addr,
+				unsigned long len, unsigned long pgoff,
+				unsigned long flags)
+{
+	return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+}
+
+static int ashmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	static struct file_operations vmfile_fops;
+	struct ashmem_area *asma = file->private_data;
+	int ret = 0;
+
+	mutex_lock(&ashmem_mutex);
+
+	/* user needs to SET_SIZE before mapping */
+	if (!asma->size) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* requested mapping size larger than object size */
+	if (vma->vm_end - vma->vm_start > PAGE_ALIGN(asma->size)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* requested protection bits must match our allowed protection mask */
+	if ((vma->vm_flags & ~calc_vm_prot_bits(asma->prot_mask, 0)) &
+	    calc_vm_prot_bits(PROT_MASK, 0)) {
+		ret = -EPERM;
+		goto out;
+	}
+	vma->vm_flags &= ~calc_vm_may_flags(~asma->prot_mask);
+
+	if (!asma->file) {
+		char *name = ASHMEM_NAME_DEF;
+		struct file *vmfile;
+		struct inode *inode;
+
+		if (asma->name[ASHMEM_NAME_PREFIX_LEN] != '\0')
+			name = asma->name;
+
+		/* ... and allocate the backing shmem file */
+		vmfile = shmem_file_setup(name, asma->size, vma->vm_flags);
+		if (IS_ERR(vmfile)) {
+			ret = PTR_ERR(vmfile);
+			goto out;
+		}
+		vmfile->f_mode |= FMODE_LSEEK;
+		inode = file_inode(vmfile);
+		lockdep_set_class(&inode->i_rwsem, &backing_shmem_inode_class);
+		asma->file = vmfile;
+		/*
+		 * override mmap operation of the vmfile so that it can't be
+		 * remapped which would lead to creation of a new vma with no
+		 * asma permission checks. Have to override get_unmapped_area
+		 * as well to prevent VM_BUG_ON check for f_ops modification.
+		 */
+		if (!vmfile_fops.mmap) {
+			vmfile_fops = *vmfile->f_op;
+			vmfile_fops.mmap = ashmem_vmfile_mmap;
+			vmfile_fops.get_unmapped_area =
+					ashmem_vmfile_get_unmapped_area;
+		}
+		vmfile->f_op = &vmfile_fops;
+	}
+	get_file(asma->file);
+
+	/*
+	 * XXX - Reworked to use shmem_zero_setup() instead of
+	 * shmem_set_file while we're in staging. -jstultz
+	 */
+	if (vma->vm_flags & VM_SHARED) {
+		ret = shmem_zero_setup(vma);
+		if (ret) {
+			fput(asma->file);
+			goto out;
+		}
+	} else {
+		vma_set_anonymous(vma);
+	}
+
+	vma_set_file(vma, asma->file);
+	/* XXX: merge this with the get_file() above if possible */
+	fput(asma->file);
+
+out:
+	mutex_unlock(&ashmem_mutex);
+	return ret;
+}
+
+/*
+ * ashmem_shrink - our cache shrinker, called from mm/vmscan.c
+ *
+ * 'nr_to_scan' is the number of objects to scan for freeing.
+ *
+ * 'gfp_mask' is the mask of the allocation that got us into this mess.
+ *
+ * Return value is the number of objects freed or -1 if we cannot
+ * proceed without risk of deadlock (due to gfp_mask).
+ *
+ * We approximate LRU via least-recently-unpinned, jettisoning unpinned partial
+ * chunks of ashmem regions LRU-wise one-at-a-time until we hit 'nr_to_scan'
+ * pages freed.
+ */
+static unsigned long
+ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
+{
+	unsigned long freed = 0;
+
+	/* We might recurse into filesystem code, so bail out if necessary */
+	if (!(sc->gfp_mask & __GFP_FS))
+		return SHRINK_STOP;
+
+	if (!mutex_trylock(&ashmem_mutex))
+		return -1;
+
+	while (!list_empty(&ashmem_lru_list)) {
+		struct ashmem_range *range =
+			list_first_entry(&ashmem_lru_list, typeof(*range), lru);
+		loff_t start = range->pgstart * PAGE_SIZE;
+		loff_t end = (range->pgend + 1) * PAGE_SIZE;
+		struct file *f = range->asma->file;
+
+		get_file(f);
+		atomic_inc(&ashmem_shrink_inflight);
+		range->purged = ASHMEM_WAS_PURGED;
+		lru_del(range);
+
+		freed += range_size(range);
+		mutex_unlock(&ashmem_mutex);
+		f->f_op->fallocate(f,
+				   FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+				   start, end - start);
+		fput(f);
+		if (atomic_dec_and_test(&ashmem_shrink_inflight))
+			wake_up_all(&ashmem_shrink_wait);
+		if (!mutex_trylock(&ashmem_mutex))
+			goto out;
+		if (--sc->nr_to_scan <= 0)
+			break;
+	}
+	mutex_unlock(&ashmem_mutex);
+out:
+	return freed;
+}
+
+static unsigned long
+ashmem_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
+{
+	/*
+	 * Note that lru_count is a count of pages on the LRU, not a count of
+	 * objects on the list. This means the scan function needs to return the
+	 * number of pages freed, not the number of objects scanned.
+	 */
+	return lru_count;
+}
+
+static struct shrinker ashmem_shrinker = {
+	.count_objects = ashmem_shrink_count,
+	.scan_objects = ashmem_shrink_scan,
+	/*
+	 * XXX (dchinner): I wish people would comment on why they need
+	 * significant changes to the default value here
+	 */
+	.seeks = DEFAULT_SEEKS * 4,
+};
+
+static int set_prot_mask(struct ashmem_area *asma, unsigned long prot)
+{
+	int ret = 0;
+
+	mutex_lock(&ashmem_mutex);
+
+	/* the user can only remove, not add, protection bits */
+	if ((asma->prot_mask & prot) != prot) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* does the application expect PROT_READ to imply PROT_EXEC? */
+	if ((prot & PROT_READ) && (current->personality & READ_IMPLIES_EXEC))
+		prot |= PROT_EXEC;
+
+	asma->prot_mask = prot;
+
+out:
+	mutex_unlock(&ashmem_mutex);
+	return ret;
+}
+
+static int set_name(struct ashmem_area *asma, void __user *name)
+{
+	int len;
+	int ret = 0;
+	char local_name[ASHMEM_NAME_LEN];
+
+	/*
+	 * Holding the ashmem_mutex while doing a copy_from_user might cause
+	 * a data abort which would try to access mmap_lock. If another
+	 * thread has invoked ashmem_mmap then it will be holding the
+	 * semaphore and will be waiting for ashmem_mutex, thereby leading to
+	 * deadlock. We'll release the mutex and take the name to a local
+	 * variable that does not need protection and later copy the local
+	 * variable to the structure member with lock held.
+	 */
+	len = strncpy_from_user(local_name, name, ASHMEM_NAME_LEN);
+	if (len < 0)
+		return len;
+
+	mutex_lock(&ashmem_mutex);
+	/* cannot change an existing mapping's name */
+	if (asma->file)
+		ret = -EINVAL;
+	else
+		strscpy(asma->name + ASHMEM_NAME_PREFIX_LEN, local_name,
+			ASHMEM_NAME_LEN);
+
+	mutex_unlock(&ashmem_mutex);
+	return ret;
+}
+
+static int get_name(struct ashmem_area *asma, void __user *name)
+{
+	int ret = 0;
+	size_t len;
+	/*
+	 * Have a local variable to which we'll copy the content
+	 * from asma with the lock held. Later we can copy this to the user
+	 * space safely without holding any locks. So even if we proceed to
+	 * wait for mmap_lock, it won't lead to deadlock.
+	 */
+	char local_name[ASHMEM_NAME_LEN];
+
+	mutex_lock(&ashmem_mutex);
+	if (asma->name[ASHMEM_NAME_PREFIX_LEN] != '\0') {
+		/*
+		 * Copying only `len', instead of ASHMEM_NAME_LEN, bytes
+		 * prevents us from revealing one user's stack to another.
+		 */
+		len = strlen(asma->name + ASHMEM_NAME_PREFIX_LEN) + 1;
+		memcpy(local_name, asma->name + ASHMEM_NAME_PREFIX_LEN, len);
+	} else {
+		len = sizeof(ASHMEM_NAME_DEF);
+		memcpy(local_name, ASHMEM_NAME_DEF, len);
+	}
+	mutex_unlock(&ashmem_mutex);
+
+	/*
+	 * Now we are just copying from the stack variable to userland
+	 * No lock held
+	 */
+	if (copy_to_user(name, local_name, len))
+		ret = -EFAULT;
+	return ret;
+}
+
+/*
+ * ashmem_pin - pin the given ashmem region, returning whether it was
+ * previously purged (ASHMEM_WAS_PURGED) or not (ASHMEM_NOT_PURGED).
+ *
+ * Caller must hold ashmem_mutex.
+ */
+static int ashmem_pin(struct ashmem_area *asma, size_t pgstart, size_t pgend,
+		      struct ashmem_range **new_range)
+{
+	struct ashmem_range *range, *next;
+	int ret = ASHMEM_NOT_PURGED;
+
+	list_for_each_entry_safe(range, next, &asma->unpinned_list, unpinned) {
+		/* moved past last applicable page; we can short circuit */
+		if (range_before_page(range, pgstart))
+			break;
+
+		/*
+		 * The user can ask us to pin pages that span multiple ranges,
+		 * or to pin pages that aren't even unpinned, so this is messy.
+		 *
+		 * Four cases:
+		 * 1. The requested range subsumes an existing range, so we
+		 *    just remove the entire matching range.
+		 * 2. The requested range overlaps the start of an existing
+		 *    range, so we just update that range.
+		 * 3. The requested range overlaps the end of an existing
+		 *    range, so we just update that range.
+		 * 4. The requested range punches a hole in an existing range,
+		 *    so we have to update one side of the range and then
+		 *    create a new range for the other side.
+		 */
+		if (page_range_in_range(range, pgstart, pgend)) {
+			ret |= range->purged;
+
+			/* Case #1: Easy. Just nuke the whole thing. */
+			if (page_range_subsumes_range(range, pgstart, pgend)) {
+				range_del(range);
+				continue;
+			}
+
+			/* Case #2: We overlap from the start, so adjust it */
+			if (range->pgstart >= pgstart) {
+				range_shrink(range, pgend + 1, range->pgend);
+				continue;
+			}
+
+			/* Case #3: We overlap from the rear, so adjust it */
+			if (range->pgend <= pgend) {
+				range_shrink(range, range->pgstart,
+					     pgstart - 1);
+				continue;
+			}
+
+			/*
+			 * Case #4: We eat a chunk out of the middle. A bit
+			 * more complicated, we allocate a new range for the
+			 * second half and adjust the first chunk's endpoint.
+			 */
+			range_alloc(asma, range, range->purged,
+				    pgend + 1, range->pgend, new_range);
+			range_shrink(range, range->pgstart, pgstart - 1);
+			break;
+		}
+	}
+
+	return ret;
+}
+
+/*
+ * ashmem_unpin - unpin the given range of pages. Returns zero on success.
+ *
+ * Caller must hold ashmem_mutex.
+ */
+static int ashmem_unpin(struct ashmem_area *asma, size_t pgstart, size_t pgend,
+			struct ashmem_range **new_range)
+{
+	struct ashmem_range *range = NULL, *iter, *next;
+	unsigned int purged = ASHMEM_NOT_PURGED;
+
+restart:
+	list_for_each_entry_safe(iter, next, &asma->unpinned_list, unpinned) {
+		/* short circuit: this is our insertion point */
+		if (range_before_page(iter, pgstart)) {
+			range = iter;
+			break;
+		}
+
+		/*
+		 * The user can ask us to unpin pages that are already entirely
+		 * or partially pinned. We handle those two cases here.
+		 */
+		if (page_range_subsumed_by_range(iter, pgstart, pgend))
+			return 0;
+		if (page_range_in_range(iter, pgstart, pgend)) {
+			pgstart = min(iter->pgstart, pgstart);
+			pgend = max(iter->pgend, pgend);
+			purged |= iter->purged;
+			range_del(iter);
+			goto restart;
+		}
+	}
+
+	range = list_prepare_entry(range, &asma->unpinned_list, unpinned);
+	range_alloc(asma, range, purged, pgstart, pgend, new_range);
+	return 0;
+}
+
+/*
+ * ashmem_get_pin_status - Returns ASHMEM_IS_UNPINNED if _any_ pages in the
+ * given interval are unpinned and ASHMEM_IS_PINNED otherwise.
+ *
+ * Caller must hold ashmem_mutex.
+ */
+static int ashmem_get_pin_status(struct ashmem_area *asma, size_t pgstart,
+				 size_t pgend)
+{
+	struct ashmem_range *range;
+	int ret = ASHMEM_IS_PINNED;
+
+	list_for_each_entry(range, &asma->unpinned_list, unpinned) {
+		if (range_before_page(range, pgstart))
+			break;
+		if (page_range_in_range(range, pgstart, pgend)) {
+			ret = ASHMEM_IS_UNPINNED;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static int ashmem_pin_unpin(struct ashmem_area *asma, unsigned long cmd,
+			    void __user *p)
+{
+	struct ashmem_pin pin;
+	size_t pgstart, pgend;
+	int ret = -EINVAL;
+	struct ashmem_range *range = NULL;
+
+	if (copy_from_user(&pin, p, sizeof(pin)))
+		return -EFAULT;
+
+	if (cmd == ASHMEM_PIN || cmd == ASHMEM_UNPIN) {
+		range = kmem_cache_zalloc(ashmem_range_cachep, GFP_KERNEL);
+		if (!range)
+			return -ENOMEM;
+	}
+
+	mutex_lock(&ashmem_mutex);
+	wait_event(ashmem_shrink_wait, !atomic_read(&ashmem_shrink_inflight));
+
+	if (!asma->file)
+		goto out_unlock;
+
+	/* per custom, you can pass zero for len to mean "everything onward" */
+	if (!pin.len)
+		pin.len = PAGE_ALIGN(asma->size) - pin.offset;
+
+	if ((pin.offset | pin.len) & ~PAGE_MASK)
+		goto out_unlock;
+
+	if (((__u32)-1) - pin.offset < pin.len)
+		goto out_unlock;
+
+	if (PAGE_ALIGN(asma->size) < pin.offset + pin.len)
+		goto out_unlock;
+
+	pgstart = pin.offset / PAGE_SIZE;
+	pgend = pgstart + (pin.len / PAGE_SIZE) - 1;
+
+	switch (cmd) {
+	case ASHMEM_PIN:
+		ret = ashmem_pin(asma, pgstart, pgend, &range);
+		break;
+	case ASHMEM_UNPIN:
+		ret = ashmem_unpin(asma, pgstart, pgend, &range);
+		break;
+	case ASHMEM_GET_PIN_STATUS:
+		ret = ashmem_get_pin_status(asma, pgstart, pgend);
+		break;
+	}
+
+out_unlock:
+	mutex_unlock(&ashmem_mutex);
+	if (range)
+		kmem_cache_free(ashmem_range_cachep, range);
+
+	return ret;
+}
+
+static long ashmem_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	struct ashmem_area *asma = file->private_data;
+	unsigned long ino;
+	long ret = -ENOTTY;
+
+	switch (cmd) {
+	case ASHMEM_SET_NAME:
+		ret = set_name(asma, (void __user *)arg);
+		break;
+	case ASHMEM_GET_NAME:
+		ret = get_name(asma, (void __user *)arg);
+		break;
+	case ASHMEM_SET_SIZE:
+		ret = -EINVAL;
+		mutex_lock(&ashmem_mutex);
+		if (!asma->file) {
+			ret = 0;
+			asma->size = (size_t)arg;
+		}
+		mutex_unlock(&ashmem_mutex);
+		break;
+	case ASHMEM_GET_SIZE:
+		ret = asma->size;
+		break;
+	case ASHMEM_SET_PROT_MASK:
+		ret = set_prot_mask(asma, arg);
+		break;
+	case ASHMEM_GET_PROT_MASK:
+		ret = asma->prot_mask;
+		break;
+	case ASHMEM_PIN:
+	case ASHMEM_UNPIN:
+	case ASHMEM_GET_PIN_STATUS:
+		ret = ashmem_pin_unpin(asma, cmd, (void __user *)arg);
+		break;
+	case ASHMEM_PURGE_ALL_CACHES:
+		ret = -EPERM;
+		if (capable(CAP_SYS_ADMIN)) {
+			struct shrink_control sc = {
+				.gfp_mask = GFP_KERNEL,
+				.nr_to_scan = LONG_MAX,
+			};
+			ret = ashmem_shrink_count(&ashmem_shrinker, &sc);
+			ashmem_shrink_scan(&ashmem_shrinker, &sc);
+		}
+		break;
+	case ASHMEM_GET_FILE_ID:
+		/* Lock around our check to avoid racing with ashmem_mmap(). */
+		mutex_lock(&ashmem_mutex);
+		if (!asma || !asma->file) {
+			mutex_unlock(&ashmem_mutex);
+			ret = -EINVAL;
+			break;
+		}
+		ino = file_inode(asma->file)->i_ino;
+		mutex_unlock(&ashmem_mutex);
+
+		if (copy_to_user((void __user *)arg, &ino, sizeof(ino))) {
+			ret = -EFAULT;
+			break;
+		}
+		ret = 0;
+		break;
+	}
+
+	return ret;
+}
+
+/* support of 32bit userspace on 64bit platforms */
+#ifdef CONFIG_COMPAT
+static long compat_ashmem_ioctl(struct file *file, unsigned int cmd,
+				unsigned long arg)
+{
+	switch (cmd) {
+	case COMPAT_ASHMEM_SET_SIZE:
+		cmd = ASHMEM_SET_SIZE;
+		break;
+	case COMPAT_ASHMEM_SET_PROT_MASK:
+		cmd = ASHMEM_SET_PROT_MASK;
+		break;
+	}
+	return ashmem_ioctl(file, cmd, arg);
+}
+#endif
+#ifdef CONFIG_PROC_FS
+static void ashmem_show_fdinfo(struct seq_file *m, struct file *file)
+{
+	struct ashmem_area *asma = file->private_data;
+
+	mutex_lock(&ashmem_mutex);
+
+	if (asma->file)
+		seq_printf(m, "inode:\t%ld\n", file_inode(asma->file)->i_ino);
+
+	if (asma->name[ASHMEM_NAME_PREFIX_LEN] != '\0')
+		seq_printf(m, "name:\t%s\n",
+			   asma->name + ASHMEM_NAME_PREFIX_LEN);
+
+	seq_printf(m, "size:\t%zu\n", asma->size);
+
+	mutex_unlock(&ashmem_mutex);
+}
+#endif
+static const struct file_operations ashmem_fops = {
+	.owner = THIS_MODULE,
+	.open = ashmem_open,
+	.release = ashmem_release,
+	.read_iter = ashmem_read_iter,
+	.llseek = ashmem_llseek,
+	.mmap = ashmem_mmap,
+	.unlocked_ioctl = ashmem_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl = compat_ashmem_ioctl,
+#endif
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo = ashmem_show_fdinfo,
+#endif
+};
+
+static struct miscdevice ashmem_misc = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "ashmem",
+	.fops = &ashmem_fops,
+};
+
+static int __init ashmem_init(void)
+{
+	int ret = -ENOMEM;
+
+	ashmem_area_cachep = kmem_cache_create("ashmem_area_cache",
+					       sizeof(struct ashmem_area),
+					       0, 0, NULL);
+	if (!ashmem_area_cachep) {
+		pr_err("failed to create slab cache\n");
+		goto out;
+	}
+
+	ashmem_range_cachep = kmem_cache_create("ashmem_range_cache",
+						sizeof(struct ashmem_range),
+						0, SLAB_RECLAIM_ACCOUNT, NULL);
+	if (!ashmem_range_cachep) {
+		pr_err("failed to create slab cache\n");
+		goto out_free1;
+	}
+
+	ret = misc_register(&ashmem_misc);
+	if (ret) {
+		pr_err("failed to register misc device!\n");
+		goto out_free2;
+	}
+
+	ret = register_shrinker(&ashmem_shrinker, "android-ashmem");
+	if (ret) {
+		pr_err("failed to register shrinker!\n");
+		goto out_demisc;
+	}
+
+	pr_info("initialized\n");
+
+	return 0;
+
+out_demisc:
+	misc_deregister(&ashmem_misc);
+out_free2:
+	kmem_cache_destroy(ashmem_range_cachep);
+out_free1:
+	kmem_cache_destroy(ashmem_area_cachep);
+out:
+	return ret;
+}
+device_initcall(ashmem_init);
diff --git a/drivers/staging/android/ashmem.h b/drivers/staging/android/ashmem.h
new file mode 100644
index 0000000..1a47817
--- /dev/null
+++ b/drivers/staging/android/ashmem.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Apache-2.0 */
+/*
+ * drivers/staging/android/ashmem.h
+ *
+ * Copyright 2008 Google Inc.
+ * Author: Robert Love
+ */
+
+#ifndef _LINUX_ASHMEM_H
+#define _LINUX_ASHMEM_H
+
+#include <linux/limits.h>
+#include <linux/ioctl.h>
+#include <linux/compat.h>
+
+#include "uapi/ashmem.h"
+
+/* support of 32bit userspace on 64bit platforms */
+#ifdef CONFIG_COMPAT
+#define COMPAT_ASHMEM_SET_SIZE		_IOW(__ASHMEMIOC, 3, compat_size_t)
+#define COMPAT_ASHMEM_SET_PROT_MASK	_IOW(__ASHMEMIOC, 5, unsigned int)
+#endif
+
+#endif	/* _LINUX_ASHMEM_H */
diff --git a/drivers/staging/android/uapi/ashmem.h b/drivers/staging/android/uapi/ashmem.h
new file mode 100644
index 0000000..f962a5f
--- /dev/null
+++ b/drivers/staging/android/uapi/ashmem.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Apache-2.0 */
+/*
+ * Copyright 2008 Google Inc.
+ * Author: Robert Love
+ */
+
+#ifndef _UAPI_LINUX_ASHMEM_H
+#define _UAPI_LINUX_ASHMEM_H
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+#define ASHMEM_NAME_LEN		256
+
+#define ASHMEM_NAME_DEF		"dev/ashmem"
+
+/* Return values from ASHMEM_PIN: Was the mapping purged while unpinned? */
+#define ASHMEM_NOT_PURGED	0
+#define ASHMEM_WAS_PURGED	1
+
+/* Return values from ASHMEM_GET_PIN_STATUS: Is the mapping pinned? */
+#define ASHMEM_IS_UNPINNED	0
+#define ASHMEM_IS_PINNED	1
+
+struct ashmem_pin {
+	__u32 offset;	/* offset into region, in bytes, page-aligned */
+	__u32 len;	/* length forward from offset, in bytes, page-aligned */
+};
+
+#define __ASHMEMIOC		0x77
+
+#define ASHMEM_SET_NAME		_IOW(__ASHMEMIOC, 1, char[ASHMEM_NAME_LEN])
+#define ASHMEM_GET_NAME		_IOR(__ASHMEMIOC, 2, char[ASHMEM_NAME_LEN])
+#define ASHMEM_SET_SIZE		_IOW(__ASHMEMIOC, 3, size_t)
+#define ASHMEM_GET_SIZE		_IO(__ASHMEMIOC, 4)
+#define ASHMEM_SET_PROT_MASK	_IOW(__ASHMEMIOC, 5, unsigned long)
+#define ASHMEM_GET_PROT_MASK	_IO(__ASHMEMIOC, 6)
+#define ASHMEM_PIN		_IOW(__ASHMEMIOC, 7, struct ashmem_pin)
+#define ASHMEM_UNPIN		_IOW(__ASHMEMIOC, 8, struct ashmem_pin)
+#define ASHMEM_GET_PIN_STATUS	_IO(__ASHMEMIOC, 9)
+#define ASHMEM_PURGE_ALL_CACHES	_IO(__ASHMEMIOC, 10)
+#define ASHMEM_GET_FILE_ID		_IOR(__ASHMEMIOC, 11, unsigned long)
+
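+/*
+ * Typical userspace flow (sketch): open /dev/ashmem, issue ASHMEM_SET_NAME
+ * and ASHMEM_SET_SIZE, then mmap() the fd; once a region has been mapped,
+ * its name and size can no longer be changed.
+ */
+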
+#endif	/* _UAPI_LINUX_ASHMEM_H */
diff --git a/drivers/thermal/thermal_helpers.c b/drivers/thermal/thermal_helpers.c
index fca0b23..096b221 100644
--- a/drivers/thermal/thermal_helpers.c
+++ b/drivers/thermal/thermal_helpers.c
@@ -238,6 +238,7 @@ void thermal_cdev_update(struct thermal_cooling_device *cdev)
 	}
 	mutex_unlock(&cdev->lock);
 }
+EXPORT_SYMBOL_GPL(thermal_cdev_update);
 
 /**
  * thermal_zone_get_slope - return the slope attribute of the thermal zone
diff --git a/drivers/tty/hvc/hvc_console.h b/drivers/tty/hvc/hvc_console.h
index 18d0058..97200fb 100644
--- a/drivers/tty/hvc/hvc_console.h
+++ b/drivers/tty/hvc/hvc_console.h
@@ -30,7 +30,7 @@
  * for the tty device.  Since this driver supports hotplug of vty adapters we
  * need to make sure we have enough allocated.
  */
-#define HVC_ALLOC_TTY_ADAPTERS	8
+#define HVC_ALLOC_TTY_ADAPTERS	64
 
 struct hvc_struct {
 	struct tty_port port;
diff --git a/drivers/tty/hvc/hvc_dcc.c b/drivers/tty/hvc/hvc_dcc.c
index 1751108..3cbd441 100644
--- a/drivers/tty/hvc/hvc_dcc.c
+++ b/drivers/tty/hvc/hvc_dcc.c
@@ -6,6 +6,7 @@
 #include <linux/cpumask.h>
 #include <linux/init.h>
 #include <linux/kfifo.h>
+#include <linux/moduleparam.h>
 #include <linux/serial.h>
 #include <linux/serial_core.h>
 #include <linux/smp.h>
@@ -16,6 +17,13 @@
 
 #include "hvc_console.h"
 
+/*
+ * Runtime enable switch for the DCC driver. We want the driver available in
+ * GKI builds, but it defaults to disabled because some devices do not
+ * implement the DCC registers and crash when the driver pokes them.
+ */
+static bool enable;
+module_param(enable, bool, 0444);
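+MODULE_PARM_DESC(enable, "Enable the DCC driver (boot with hvc_dcc.enable=1)");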
+
 /* DCC Status Bits */
 #define DCC_STATUS_RX		(1 << 30)
 #define DCC_STATUS_TX		(1 << 29)
@@ -257,7 +265,7 @@ static int __init hvc_dcc_console_init(void)
 {
 	int ret;
 
-	if (!hvc_dcc_check())
+	if (!enable || !hvc_dcc_check())
 		return -ENODEV;
 
 	/* Returns -1 if error */
@@ -271,7 +279,7 @@ static int __init hvc_dcc_init(void)
 {
 	struct hvc_struct *p;
 
-	if (!hvc_dcc_check())
+	if (!enable || !hvc_dcc_check())
 		return -ENODEV;
 
 	if (IS_ENABLED(CONFIG_HVC_DCC_SERIALIZE_SMP)) {
diff --git a/drivers/tty/serial/Kconfig b/drivers/tty/serial/Kconfig
index 434f831..a504c0e 100644
--- a/drivers/tty/serial/Kconfig
+++ b/drivers/tty/serial/Kconfig
@@ -238,7 +238,6 @@
 
 config SERIAL_SAMSUNG
 	tristate "Samsung SoC serial support"
-	depends on PLAT_SAMSUNG || ARCH_S5PV210 || ARCH_EXYNOS || ARCH_APPLE || ARCH_ARTPEC || COMPILE_TEST
 	select SERIAL_CORE
 	help
 	  Support for the on-chip UARTs on the Samsung
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index d2b2720..a8e9de2 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -55,6 +55,8 @@
 #include <asm/ptrace.h>
 #include <asm/irq_regs.h>
 
+#include <trace/hooks/sysrqcrash.h>
+
 /* Whether we react on sysrq keys or just ignore them */
 static int __read_mostly sysrq_enabled = CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE;
 static bool __read_mostly sysrq_always_enabled;
@@ -152,6 +154,8 @@ static void sysrq_handle_crash(int key)
 	/* release the RCU read lock before crashing */
 	rcu_read_unlock();
 
+	trace_android_vh_sysrq_crash(current);
+
 	panic("sysrq triggered crash\n");
 }
 static const struct sysrq_key_op sysrq_crash_op = {
diff --git a/drivers/ufs/core/ufs-sysfs.c b/drivers/ufs/core/ufs-sysfs.c
index 53aea56..d64db8a 100644
--- a/drivers/ufs/core/ufs-sysfs.c
+++ b/drivers/ufs/core/ufs-sysfs.c
@@ -10,6 +10,8 @@
 #include "ufs-sysfs.h"
 #include "ufshcd-priv.h"
 
+#include <trace/hooks/ufshcd.h>
+
 static const char *ufshcd_uic_link_state_to_string(
 			enum uic_link_state state)
 {
@@ -1337,10 +1339,12 @@ void ufs_sysfs_add_nodes(struct device *dev)
 	int ret;
 
 	ret = sysfs_create_groups(&dev->kobj, ufs_sysfs_groups);
-	if (ret)
+	if (ret) {
 		dev_err(dev,
 			"%s: sysfs groups creation failed (err = %d)\n",
 			__func__, ret);
+		return;
+	}
 }
 
 void ufs_sysfs_remove_nodes(struct device *dev)
diff --git a/drivers/ufs/core/ufshcd-crypto.c b/drivers/ufs/core/ufshcd-crypto.c
index 198360f..72a9400 100644
--- a/drivers/ufs/core/ufshcd-crypto.c
+++ b/drivers/ufs/core/ufshcd-crypto.c
@@ -123,6 +123,10 @@ bool ufshcd_crypto_enable(struct ufs_hba *hba)
 
 	/* Reset might clear all keys, so reprogram all the keys. */
 	blk_crypto_reprogram_all_keys(&hba->crypto_profile);
+
+	if (hba->quirks & UFSHCD_QUIRK_BROKEN_CRYPTO_ENABLE)
+		return false;
+
 	return true;
 }
 
@@ -159,6 +163,9 @@ int ufshcd_hba_init_crypto_capabilities(struct ufs_hba *hba)
 	int err = 0;
 	enum blk_crypto_mode_num blk_mode_num;
 
+	if (hba->quirks & UFSHCD_QUIRK_CUSTOM_CRYPTO_PROFILE)
+		return 0;
+
 	/*
 	 * Don't use crypto if either the hardware doesn't advertise the
 	 * standard crypto capability bit *or* if the vendor specific driver
@@ -190,6 +197,7 @@ int ufshcd_hba_init_crypto_capabilities(struct ufs_hba *hba)
 	hba->crypto_profile.ll_ops = ufshcd_crypto_ops;
 	/* UFS only supports 8 bytes for any DUN */
 	hba->crypto_profile.max_dun_bytes_supported = 8;
+	hba->crypto_profile.key_types_supported = BLK_CRYPTO_KEY_TYPE_STANDARD;
 	hba->crypto_profile.dev = hba->dev;
 
 	/*
@@ -228,9 +236,10 @@ void ufshcd_init_crypto(struct ufs_hba *hba)
 	if (!(hba->caps & UFSHCD_CAP_CRYPTO))
 		return;
 
-	/* Clear all keyslots - the number of keyslots is (CFGC + 1) */
-	for (slot = 0; slot < hba->crypto_capabilities.config_count + 1; slot++)
-		ufshcd_clear_keyslot(hba, slot);
+	/* Clear all keyslots */
+	for (slot = 0; slot < hba->crypto_profile.num_slots; slot++)
+		hba->crypto_profile.ll_ops.keyslot_evict(&hba->crypto_profile,
+							 NULL, slot);
 }
 
 void ufshcd_crypto_register(struct ufs_hba *hba, struct request_queue *q)
diff --git a/drivers/ufs/core/ufshcd-crypto.h b/drivers/ufs/core/ufshcd-crypto.h
index 504cc84..55c2a5d 100644
--- a/drivers/ufs/core/ufshcd-crypto.h
+++ b/drivers/ufs/core/ufshcd-crypto.h
@@ -37,6 +37,19 @@ ufshcd_prepare_req_desc_hdr_crypto(struct ufshcd_lrb *lrbp, u32 *dword_0,
 	}
 }
 
+static inline void ufshcd_crypto_clear_prdt(struct ufs_hba *hba,
+					    struct ufshcd_lrb *lrbp)
+{
+	if (!(hba->quirks & UFSHCD_QUIRK_KEYS_IN_PRDT))
+		return;
+
+	if (!(scsi_cmd_to_rq(lrbp->cmd)->crypt_ctx))
+		return;
+
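+	/* Wipe the PRDT so that key material is not left behind in memory. */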
+	memzero_explicit(lrbp->ucd_prdt_ptr,
+			 hba->sg_entry_size * scsi_sg_count(lrbp->cmd));
+}
+
 bool ufshcd_crypto_enable(struct ufs_hba *hba);
 
 int ufshcd_hba_init_crypto_capabilities(struct ufs_hba *hba);
@@ -54,6 +67,9 @@ static inline void
 ufshcd_prepare_req_desc_hdr_crypto(struct ufshcd_lrb *lrbp, u32 *dword_0,
 				   u32 *dword_1, u32 *dword_3) { }
 
+static inline void ufshcd_crypto_clear_prdt(struct ufs_hba *hba,
+					    struct ufshcd_lrb *lrbp) { }
+
 static inline bool ufshcd_crypto_enable(struct ufs_hba *hba)
 {
 	return false;
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index fb5c9e2..3f0bf07 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -40,6 +40,9 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/ufs.h>
 
+#undef CREATE_TRACE_POINTS
+#include <trace/hooks/ufshcd.h>
+
 #define UFSHCD_ENABLE_INTRS	(UTP_TRANSFER_REQ_COMPL |\
 				 UTP_TASK_REQ_COMPL |\
 				 UFSHCD_ERROR_MASK)
@@ -346,6 +349,8 @@ static void ufshcd_add_tm_upiu_trace(struct ufs_hba *hba, unsigned int tag,
 {
 	struct utp_task_req_desc *descp = &hba->utmrdl_base_addr[tag];
 
+	trace_android_vh_ufs_send_tm_command(hba, tag, (int)str_t);
+
 	if (!trace_ufshcd_upiu_enabled())
 		return;
 
@@ -367,6 +372,8 @@ static void ufshcd_add_uic_command_trace(struct ufs_hba *hba,
 {
 	u32 cmd;
 
+	trace_android_vh_ufs_send_uic_command(hba, ucmd, (int)str_t);
+
 	if (!trace_ufshcd_uic_command_enabled())
 		return;
 
@@ -525,7 +532,7 @@ void ufshcd_print_trs(struct ufs_hba *hba, unsigned long bitmap, bool pr_prdt)
 		prdt_length = le16_to_cpu(
 			lrbp->utr_descriptor_ptr->prd_table_length);
 		if (hba->quirks & UFSHCD_QUIRK_PRDT_BYTE_GRAN)
-			prdt_length /= sizeof(struct ufshcd_sg_entry);
+			prdt_length /= hba->sg_entry_size;
 
 		dev_err(hba->dev,
 			"UPIU[%d] - PRDT - %d entries  phys@0x%llx\n",
@@ -534,7 +541,7 @@ void ufshcd_print_trs(struct ufs_hba *hba, unsigned long bitmap, bool pr_prdt)
 
 		if (pr_prdt)
 			ufshcd_hex_dump("UPIU PRDT: ", lrbp->ucd_prdt_ptr,
-				sizeof(struct ufshcd_sg_entry) * prdt_length);
+				hba->sg_entry_size * prdt_length);
 	}
 }
 
@@ -2145,6 +2152,7 @@ void ufshcd_send_command(struct ufs_hba *hba, unsigned int task_tag)
 	lrbp->issue_time_stamp_local_clock = local_clock();
 	lrbp->compl_time_stamp = ktime_set(0, 0);
 	lrbp->compl_time_stamp_local_clock = 0;
+	trace_android_vh_ufs_send_command(hba, lrbp);
 	ufshcd_add_command_trace(hba, task_tag, UFS_CMD_SEND);
 	ufshcd_clk_scaling_start_busy(hba);
 	if (unlikely(ufshcd_should_inform_monitor(hba, lrbp)))
@@ -2403,11 +2411,12 @@ int ufshcd_send_uic_cmd(struct ufs_hba *hba, struct uic_command *uic_cmd)
  */
 static int ufshcd_map_sg(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
 {
-	struct ufshcd_sg_entry *prd_table;
+	struct ufshcd_sg_entry *prd;
 	struct scatterlist *sg;
 	struct scsi_cmnd *cmd;
 	int sg_segments;
 	int i;
+	int err;
 
 	cmd = lrbp->cmd;
 	sg_segments = scsi_dma_map(cmd);
@@ -2418,13 +2427,12 @@ static int ufshcd_map_sg(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
 
 		if (hba->quirks & UFSHCD_QUIRK_PRDT_BYTE_GRAN)
 			lrbp->utr_descriptor_ptr->prd_table_length =
-				cpu_to_le16((sg_segments *
-					sizeof(struct ufshcd_sg_entry)));
+				cpu_to_le16(sg_segments * hba->sg_entry_size);
 		else
 			lrbp->utr_descriptor_ptr->prd_table_length =
 				cpu_to_le16(sg_segments);
 
-		prd_table = lrbp->ucd_prdt_ptr;
+		prd = (struct ufshcd_sg_entry *)lrbp->ucd_prdt_ptr;
 
 		scsi_for_each_sg(cmd, sg, sg_segments, i) {
 			const unsigned int len = sg_dma_len(sg);
@@ -2438,15 +2446,18 @@ static int ufshcd_map_sg(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
 			 * indicates 4 bytes, '7' indicates 8 bytes, etc."
 			 */
 			WARN_ONCE(len > 256 * 1024, "len = %#x\n", len);
-			prd_table[i].size = cpu_to_le32(len - 1);
-			prd_table[i].addr = cpu_to_le64(sg->dma_address);
-			prd_table[i].reserved = 0;
+			prd->size = cpu_to_le32(len - 1);
+			prd->addr = cpu_to_le64(sg->dma_address);
+			prd->reserved = 0;
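+			/*
+			 * PRDT entries may be larger than
+			 * struct ufshcd_sg_entry on hosts with extra
+			 * vendor-specific fields, so advance by the
+			 * runtime entry size.
+			 */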
+			prd = (void *)prd + hba->sg_entry_size;
 		}
 	} else {
 		lrbp->utr_descriptor_ptr->prd_table_length = 0;
 	}
 
-	return 0;
+	err = 0;
+	trace_android_vh_ufs_fill_prdt(hba, lrbp, sg_segments, &err);
+	return err;
 }
 
 /**
@@ -2730,10 +2741,11 @@ static void ufshcd_map_queues(struct Scsi_Host *shost)
 
 static void ufshcd_init_lrb(struct ufs_hba *hba, struct ufshcd_lrb *lrb, int i)
 {
-	struct utp_transfer_cmd_desc *cmd_descp = hba->ucdl_base_addr;
+	struct utp_transfer_cmd_desc *cmd_descp = (void *)hba->ucdl_base_addr +
+		i * sizeof_utp_transfer_cmd_desc(hba);
 	struct utp_transfer_req_desc *utrdlp = hba->utrdl_base_addr;
 	dma_addr_t cmd_desc_element_addr = hba->ucdl_dma_addr +
-		i * sizeof(struct utp_transfer_cmd_desc);
+		i * sizeof_utp_transfer_cmd_desc(hba);
 	u16 response_offset = offsetof(struct utp_transfer_cmd_desc,
 				       response_upiu);
 	u16 prdt_offset = offsetof(struct utp_transfer_cmd_desc, prd_table);
@@ -2741,11 +2753,11 @@ static void ufshcd_init_lrb(struct ufs_hba *hba, struct ufshcd_lrb *lrb, int i)
 	lrb->utr_descriptor_ptr = utrdlp + i;
 	lrb->utrd_dma_addr = hba->utrdl_dma_addr +
 		i * sizeof(struct utp_transfer_req_desc);
-	lrb->ucd_req_ptr = (struct utp_upiu_req *)(cmd_descp + i);
+	lrb->ucd_req_ptr = (struct utp_upiu_req *)cmd_descp;
 	lrb->ucd_req_dma_addr = cmd_desc_element_addr;
-	lrb->ucd_rsp_ptr = (struct utp_upiu_rsp *)cmd_descp[i].response_upiu;
+	lrb->ucd_rsp_ptr = (struct utp_upiu_rsp *)cmd_descp->response_upiu;
 	lrb->ucd_rsp_dma_addr = cmd_desc_element_addr + response_offset;
-	lrb->ucd_prdt_ptr = cmd_descp[i].prd_table;
+	lrb->ucd_prdt_ptr = (struct ufshcd_sg_entry *)cmd_descp->prd_table;
 	lrb->ucd_prdt_dma_addr = cmd_desc_element_addr + prdt_offset;
 }
 
@@ -2833,6 +2845,14 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 
 	ufshcd_prepare_lrbp_crypto(scsi_cmd_to_rq(cmd), lrbp);
 
+	trace_android_vh_ufs_prepare_command(hba, scsi_cmd_to_rq(cmd), lrbp,
+					     &err);
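+	/* A nonzero err here means a vendor hook has vetoed the command. */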
+	if (err) {
+		lrbp->cmd = NULL;
+		ufshcd_release(hba);
+		goto out;
+	}
+
 	lrbp->req_abort_skip = false;
 
 	ufshpb_prep(hba, lrbp);
@@ -3078,7 +3098,7 @@ static inline void ufshcd_init_query(struct ufs_hba *hba,
 	(*request)->upiu_req.selector = selector;
 }
 
-static int ufshcd_query_flag_retry(struct ufs_hba *hba,
+int ufshcd_query_flag_retry(struct ufs_hba *hba,
 	enum query_opcode opcode, enum flag_idn idn, u8 index, bool *flag_res)
 {
 	int ret;
@@ -3100,6 +3120,7 @@ static int ufshcd_query_flag_retry(struct ufs_hba *hba,
 			__func__, opcode, idn, ret, retries);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(ufshcd_query_flag_retry);
 
 /**
  * ufshcd_query_flag() - API function for sending flag query requests
@@ -3168,6 +3189,7 @@ int ufshcd_query_flag(struct ufs_hba *hba, enum query_opcode opcode,
 	ufshcd_release(hba);
 	return err;
 }
+EXPORT_SYMBOL_GPL(ufshcd_query_flag);
 
 /**
  * ufshcd_query_attr - API function for sending attribute requests
@@ -3231,6 +3253,7 @@ int ufshcd_query_attr(struct ufs_hba *hba, enum query_opcode opcode,
 	ufshcd_release(hba);
 	return err;
 }
+EXPORT_SYMBOL_GPL(ufshcd_query_attr);
 
 /**
  * ufshcd_query_attr_retry() - API function for sending query
@@ -3268,6 +3291,7 @@ int ufshcd_query_attr_retry(struct ufs_hba *hba,
 			__func__, idn, ret, QUERY_REQ_RETRIES);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(ufshcd_query_attr_retry);
 
 static int __ufshcd_query_descriptor(struct ufs_hba *hba,
 			enum query_opcode opcode, enum desc_idn idn, u8 index,
@@ -3363,6 +3387,7 @@ int ufshcd_query_descriptor_retry(struct ufs_hba *hba,
 
 	return err;
 }
+EXPORT_SYMBOL_GPL(ufshcd_query_descriptor_retry);
 
 /**
  * ufshcd_map_desc_id_to_length - map descriptor IDN to its length
@@ -3481,6 +3506,7 @@ int ufshcd_read_desc_param(struct ufs_hba *hba,
 		kfree(desc_buf);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(ufshcd_read_desc_param);
 
 /**
  * struct uc_string_id - unicode string
@@ -3654,7 +3680,7 @@ static int ufshcd_memory_alloc(struct ufs_hba *hba)
 	size_t utmrdl_size, utrdl_size, ucdl_size;
 
 	/* Allocate memory for UTP command descriptors */
-	ucdl_size = (sizeof(struct utp_transfer_cmd_desc) * hba->nutrs);
+	ucdl_size = (sizeof_utp_transfer_cmd_desc(hba) * hba->nutrs);
 	hba->ucdl_base_addr = dmam_alloc_coherent(hba->dev,
 						  ucdl_size,
 						  &hba->ucdl_dma_addr,
@@ -3748,7 +3774,7 @@ static void ufshcd_host_memory_configure(struct ufs_hba *hba)
 	prdt_offset =
 		offsetof(struct utp_transfer_cmd_desc, prd_table);
 
-	cmd_desc_size = sizeof(struct utp_transfer_cmd_desc);
+	cmd_desc_size = sizeof_utp_transfer_cmd_desc(hba);
 	cmd_desc_dma_addr = hba->ucdl_dma_addr;
 
 	for (i = 0; i < hba->nutrs; i++) {
@@ -5338,6 +5364,7 @@ static void ufshcd_release_scsi_cmd(struct ufs_hba *hba,
 	struct scsi_cmnd *cmd = lrbp->cmd;
 
 	scsi_dma_unmap(cmd);
+	ufshcd_crypto_clear_prdt(hba, lrbp);
 	lrbp->cmd = NULL;	/* Mark the command as completed. */
 	ufshcd_release(hba);
 	ufshcd_clk_scaling_update_busy(hba);
@@ -5361,6 +5388,7 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
 		lrbp->compl_time_stamp_local_clock = local_clock();
 		cmd = lrbp->cmd;
 		if (cmd) {
+			trace_android_vh_ufs_compl_command(hba, lrbp);
 			if (unlikely(ufshcd_should_inform_monitor(hba, lrbp)))
 				ufshcd_update_monitor(hba, lrbp);
 			ufshcd_add_command_trace(hba, index, UFS_CMD_COMP);
@@ -5371,6 +5399,7 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
 		} else if (lrbp->command_type == UTP_CMD_TYPE_DEV_MANAGE ||
 			lrbp->command_type == UTP_CMD_TYPE_UFS_STORAGE) {
 			if (hba->dev_cmd.complete) {
+				trace_android_vh_ufs_compl_command(hba, lrbp);
 				ufshcd_add_command_trace(hba, index,
 							 UFS_DEV_COMP);
 				complete(hba->dev_cmd.complete);
@@ -5662,7 +5691,7 @@ static inline int ufshcd_get_bkops_status(struct ufs_hba *hba, u32 *status)
  * to know whether auto bkops is enabled or disabled after this function
  * returns control to it.
  */
-static int ufshcd_bkops_ctrl(struct ufs_hba *hba,
+int ufshcd_bkops_ctrl(struct ufs_hba *hba,
 			     enum bkops_status status)
 {
 	int err;
@@ -5687,6 +5716,7 @@ static int ufshcd_bkops_ctrl(struct ufs_hba *hba,
 out:
 	return err;
 }
+EXPORT_SYMBOL_GPL(ufshcd_bkops_ctrl);
 
 /**
  * ufshcd_urgent_bkops - handle urgent bkops exception event
@@ -6571,6 +6601,8 @@ static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba, u32 intr_status)
 		queue_eh_work = true;
 	}
 
+	trace_android_vh_ufs_check_int_errors(hba, queue_eh_work);
+
 	if (queue_eh_work) {
 		/*
 		 * update the transfer error masks to sticky bits, let's do this
@@ -9654,6 +9686,7 @@ int ufshcd_alloc_host(struct device *dev, struct ufs_hba **hba_handle)
 	hba->dev = dev;
 	hba->dev_ref_clk_freq = REF_CLK_FREQ_INVAL;
 	hba->nop_out_timeout = NOP_OUT_TIMEOUT;
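+	/* Default PRDT entry size; UFS host drivers may override this. */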
+	hba->sg_entry_size = sizeof(struct ufshcd_sg_entry);
 	INIT_LIST_HEAD(&hba->clk_list_head);
 	spin_lock_init(&hba->outstanding_lock);
 
@@ -9877,7 +9910,8 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 	ufshcd_set_ufs_dev_active(hba);
 
 	async_schedule(ufshcd_async_scan, hba);
-	ufs_sysfs_add_nodes(hba->dev);
+	ufs_sysfs_add_nodes(dev);
+	trace_android_vh_ufs_update_sysfs(hba);
 
 	device_enable_async_suspend(dev);
 	return 0;
@@ -10033,11 +10067,6 @@ static int __init ufshcd_core_init(void)
 {
 	int ret;
 
-	/* Verify that there are no gaps in struct utp_transfer_cmd_desc. */
-	static_assert(sizeof(struct utp_transfer_cmd_desc) ==
-		      2 * ALIGNED_UPIU_SIZE +
-			      SG_ALL * sizeof(struct ufshcd_sg_entry));
-
 	ufs_debugfs_init();
 
 	ret = scsi_register_driver(&ufs_dev_wlun_template.gendrv);
diff --git a/drivers/usb/gadget/Kconfig b/drivers/usb/gadget/Kconfig
index 4fa2ddf3..87a97af 100644
--- a/drivers/usb/gadget/Kconfig
+++ b/drivers/usb/gadget/Kconfig
@@ -216,6 +216,12 @@
 config USB_F_TCM
 	tristate
 
+config USB_F_ACC
+	tristate
+
+config USB_F_AUDIO_SRC
+	tristate
+
 # this first set of drivers all depend on bulk-capable hardware.
 
 config USB_CONFIGFS
@@ -230,6 +236,14 @@
 	  appropriate symbolic links.
 	  For more information see Documentation/usb/gadget_configfs.rst.
 
+config USB_CONFIGFS_UEVENT
+	bool "Uevent notification of Gadget state"
+	depends on USB_CONFIGFS
+	help
+	  Enable uevent notifications to userspace when the gadget
+	  state changes. The gadget can be in one of three states:
+	  "CONNECTED", "DISCONNECTED" or "CONFIGURED".
+
 config USB_CONFIGFS_SERIAL
 	bool "Generic serial bulk in/out"
 	depends on USB_CONFIGFS
@@ -371,6 +385,23 @@
 	  implemented in kernel space (for instance Ethernet, serial or
 	  mass storage) and other are implemented in user space.
 
+config USB_CONFIGFS_F_ACC
+	bool "Accessory gadget"
+	depends on USB_CONFIGFS
+	depends on HID=y
+	select USB_F_ACC
+	help
+	  USB gadget Accessory support
+
+config USB_CONFIGFS_F_AUDIO_SRC
+	bool "Audio Source gadget"
+	depends on USB_CONFIGFS
+	depends on SND
+	select SND_PCM
+	select USB_F_AUDIO_SRC
+	help
+	  USB gadget Audio Source support
+
 config USB_CONFIGFS_F_UAC1
 	bool "Audio Class 1.0"
 	depends on USB_CONFIGFS
diff --git a/drivers/usb/gadget/configfs.c b/drivers/usb/gadget/configfs.c
index 7bbc776..c2bff3f5 100644
--- a/drivers/usb/gadget/configfs.c
+++ b/drivers/usb/gadget/configfs.c
@@ -10,6 +10,32 @@
 #include "u_f.h"
 #include "u_os_desc.h"
 
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+#include <linux/platform_device.h>
+#include <linux/kdev_t.h>
+#include <linux/usb/ch9.h>
+
+#ifdef CONFIG_USB_CONFIGFS_F_ACC
+extern int acc_ctrlrequest(struct usb_composite_dev *cdev,
+				const struct usb_ctrlrequest *ctrl);
+void acc_disconnect(void);
+#endif
+static struct class *android_class;
+static struct device *android_device;
+static int index;
+static int gadget_index;
+
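+
+/*
+ * Create a child device under the android_usb class so that function
+ * drivers can publish their own sysfs attributes.
+ */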
+struct device *create_function_device(char *name)
+{
+	if (android_device && !IS_ERR(android_device))
+		return device_create(android_class, android_device,
+			MKDEV(0, index++), NULL, name);
+	else
+		return ERR_PTR(-EINVAL);
+}
+EXPORT_SYMBOL_GPL(create_function_device);
+#endif
+
 int check_user_usb_string(const char *name,
 		struct usb_gadget_strings *stringtab_dev)
 {
@@ -51,6 +77,12 @@ struct gadget_info {
 	char qw_sign[OS_STRING_QW_SIGN_LEN];
 	spinlock_t spinlock;
 	bool unbind;
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+	bool connected;
+	bool sw_connected;
+	struct work_struct work;
+	struct device *dev;
+#endif
 };
 
 static inline struct gadget_info *to_gadget_info(struct config_item *item)
@@ -273,7 +305,7 @@ static ssize_t gadget_dev_desc_UDC_store(struct config_item *item,
 
 	mutex_lock(&gi->lock);
 
-	if (!strlen(name)) {
+	if (!strlen(name) || strcmp(name, "none") == 0) {
 		ret = unregister_gadget(gi);
 		if (ret)
 			goto err;
@@ -1418,6 +1450,57 @@ static int configfs_composite_bind(struct usb_gadget *gadget,
 	return ret;
 }
 
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+static void android_work(struct work_struct *data)
+{
+	struct gadget_info *gi = container_of(data, struct gadget_info, work);
+	struct usb_composite_dev *cdev = &gi->cdev;
+	char *disconnected[2] = { "USB_STATE=DISCONNECTED", NULL };
+	char *connected[2]    = { "USB_STATE=CONNECTED", NULL };
+	char *configured[2]   = { "USB_STATE=CONFIGURED", NULL };
+	/* 0-connected 1-configured 2-disconnected */
+	bool status[3] = { false, false, false };
+	unsigned long flags;
+	bool uevent_sent = false;
+
+	spin_lock_irqsave(&cdev->lock, flags);
+	if (cdev->config)
+		status[1] = true;
+
+	if (gi->connected != gi->sw_connected) {
+		if (gi->connected)
+			status[0] = true;
+		else
+			status[2] = true;
+		gi->sw_connected = gi->connected;
+	}
+	spin_unlock_irqrestore(&cdev->lock, flags);
+
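+	/*
+	 * kobject_uevent_env() may sleep, so the uevents are sent only
+	 * after cdev->lock has been dropped.
+	 */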
+	if (status[0]) {
+		kobject_uevent_env(&gi->dev->kobj, KOBJ_CHANGE, connected);
+		pr_info("%s: sent uevent %s\n", __func__, connected[0]);
+		uevent_sent = true;
+	}
+
+	if (status[1]) {
+		kobject_uevent_env(&gi->dev->kobj, KOBJ_CHANGE, configured);
+		pr_info("%s: sent uevent %s\n", __func__, configured[0]);
+		uevent_sent = true;
+	}
+
+	if (status[2]) {
+		kobject_uevent_env(&gi->dev->kobj, KOBJ_CHANGE, disconnected);
+		pr_info("%s: sent uevent %s\n", __func__, disconnected[0]);
+		uevent_sent = true;
+	}
+
+	if (!uevent_sent) {
+		pr_info("%s: did not send uevent (%d %d %p)\n", __func__,
+			gi->connected, gi->sw_connected, cdev->config);
+	}
+}
+#endif
+
 static void configfs_composite_unbind(struct usb_gadget *gadget)
 {
 	struct usb_composite_dev	*cdev;
@@ -1445,6 +1528,50 @@ static void configfs_composite_unbind(struct usb_gadget *gadget)
 	spin_unlock_irqrestore(&gi->spinlock, flags);
 }
 
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+static int android_setup(struct usb_gadget *gadget,
+			const struct usb_ctrlrequest *c)
+{
+	struct usb_composite_dev *cdev = get_gadget_data(gadget);
+	unsigned long flags;
+	struct gadget_info *gi = container_of(cdev, struct gadget_info, cdev);
+	int value = -EOPNOTSUPP;
+	struct usb_function_instance *fi;
+
+	spin_lock_irqsave(&cdev->lock, flags);
+	if (!gi->connected) {
+		gi->connected = 1;
+		schedule_work(&gi->work);
+	}
+	spin_unlock_irqrestore(&cdev->lock, flags);
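+
+	/*
+	 * Offer the request to each function instance first; fall back to
+	 * the accessory handler and then to the composite core if nobody
+	 * claims it.
+	 */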
+	list_for_each_entry(fi, &gi->available_func, cfs_list) {
+		if (fi != NULL && fi->f != NULL && fi->f->setup != NULL) {
+			value = fi->f->setup(fi->f, c);
+			if (value >= 0)
+				break;
+		}
+	}
+
+#ifdef CONFIG_USB_CONFIGFS_F_ACC
+	if (value < 0)
+		value = acc_ctrlrequest(cdev, c);
+#endif
+
+	if (value < 0)
+		value = composite_setup(gadget, c);
+
+	spin_lock_irqsave(&cdev->lock, flags);
+	if (c->bRequest == USB_REQ_SET_CONFIGURATION &&
+						cdev->config) {
+		schedule_work(&gi->work);
+	}
+	spin_unlock_irqrestore(&cdev->lock, flags);
+
+	return value;
+}
+
+#else // CONFIG_USB_CONFIGFS_UEVENT
+
 static int configfs_composite_setup(struct usb_gadget *gadget,
 		const struct usb_ctrlrequest *ctrl)
 {
@@ -1470,6 +1597,8 @@ static int configfs_composite_setup(struct usb_gadget *gadget,
 	return ret;
 }
 
+#endif // CONFIG_USB_CONFIGFS_UEVENT
+
 static void configfs_composite_disconnect(struct usb_gadget *gadget)
 {
 	struct usb_composite_dev *cdev;
@@ -1480,6 +1609,14 @@ static void configfs_composite_disconnect(struct usb_gadget *gadget)
 	if (!cdev)
 		return;
 
+#ifdef CONFIG_USB_CONFIGFS_F_ACC
+	/*
+	 * accessory HID support can be active while the
+	 * accessory function is not actually enabled,
+	 * so we need to inform it when we are disconnected.
+	 */
+	acc_disconnect();
+#endif
 	gi = container_of(cdev, struct gadget_info, cdev);
 	spin_lock_irqsave(&gi->spinlock, flags);
 	cdev = get_gadget_data(gadget);
@@ -1488,6 +1625,10 @@ static void configfs_composite_disconnect(struct usb_gadget *gadget)
 		return;
 	}
 
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+	gi->connected = 0;
+	schedule_work(&gi->work);
+#endif
 	composite_disconnect(gadget);
 	spin_unlock_irqrestore(&gi->spinlock, flags);
 }
@@ -1562,10 +1703,13 @@ static const struct usb_gadget_driver configfs_driver_template = {
 	.bind           = configfs_composite_bind,
 	.unbind         = configfs_composite_unbind,
 
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+	.setup          = android_setup,
+#else
 	.setup          = configfs_composite_setup,
+#endif
 	.reset          = configfs_composite_reset,
 	.disconnect     = configfs_composite_disconnect,
-
 	.suspend	= configfs_composite_suspend,
 	.resume		= configfs_composite_resume,
 
@@ -1576,6 +1720,91 @@ static const struct usb_gadget_driver configfs_driver_template = {
 	.match_existing_only = 1,
 };
 
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+static ssize_t state_show(struct device *pdev, struct device_attribute *attr,
+			char *buf)
+{
+	struct gadget_info *dev = dev_get_drvdata(pdev);
+	struct usb_composite_dev *cdev;
+	char *state = "DISCONNECTED";
+	unsigned long flags;
+
+	if (!dev)
+		goto out;
+
+	cdev = &dev->cdev;
+
+	if (!cdev)
+		goto out;
+
+	spin_lock_irqsave(&cdev->lock, flags);
+	if (cdev->config)
+		state = "CONFIGURED";
+	else if (dev->connected)
+		state = "CONNECTED";
+	spin_unlock_irqrestore(&cdev->lock, flags);
+out:
+	return sprintf(buf, "%s\n", state);
+}
+
+static DEVICE_ATTR(state, S_IRUGO, state_show, NULL);
+
+static struct device_attribute *android_usb_attributes[] = {
+	&dev_attr_state,
+	NULL
+};
+
+static int android_device_create(struct gadget_info *gi)
+{
+	struct device_attribute **attrs;
+	struct device_attribute *attr;
+
+	INIT_WORK(&gi->work, android_work);
+	gi->dev = device_create(android_class, NULL,
+			MKDEV(0, 0), NULL, "android%d", gadget_index++);
+	if (IS_ERR(gi->dev))
+		return PTR_ERR(gi->dev);
+
+	dev_set_drvdata(gi->dev, gi);
+	if (!android_device)
+		android_device = gi->dev;
+
+	attrs = android_usb_attributes;
+	while ((attr = *attrs++)) {
+		int err;
+
+		err = device_create_file(gi->dev, attr);
+		if (err) {
+			device_destroy(gi->dev->class,
+				       gi->dev->devt);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+static void android_device_destroy(struct gadget_info *gi)
+{
+	struct device_attribute **attrs;
+	struct device_attribute *attr;
+
+	attrs = android_usb_attributes;
+	while ((attr = *attrs++))
+		device_remove_file(gi->dev, attr);
+	device_destroy(gi->dev->class, gi->dev->devt);
+}
+#else
+static inline int android_device_create(struct gadget_info *gi)
+{
+	return 0;
+}
+
+static inline void android_device_destroy(struct gadget_info *gi)
+{
+}
+#endif
+
 static struct config_group *gadgets_make(
 		struct config_group *group,
 		const char *name)
@@ -1633,6 +1862,9 @@ static struct config_group *gadgets_make(
 	if (!gi->composite.gadget_driver.function)
 		goto out_free_driver_name;
 
+	if (android_device_create(gi) < 0)
+		goto out_free_driver_name;
+
 	return &gi->group;
 
 out_free_driver_name:
@@ -1644,7 +1876,11 @@ static struct config_group *gadgets_make(
 
 static void gadgets_drop(struct config_group *group, struct config_item *item)
 {
+	struct gadget_info *gi;
+
+	gi = container_of(to_config_group(item), struct gadget_info, group);
 	config_item_put(item);
+	android_device_destroy(gi);
 }
 
 static struct configfs_group_operations gadgets_ops = {
@@ -1684,6 +1920,13 @@ static int __init gadget_cfs_init(void)
 	config_group_init(&gadget_subsys.su_group);
 
 	ret = configfs_register_subsystem(&gadget_subsys);
+
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+	android_class = class_create(THIS_MODULE, "android_usb");
+	if (IS_ERR(android_class))
+		return PTR_ERR(android_class);
+#endif
+
 	return ret;
 }
 module_init(gadget_cfs_init);
@@ -1691,5 +1934,10 @@ module_init(gadget_cfs_init);
 static void __exit gadget_cfs_exit(void)
 {
 	configfs_unregister_subsystem(&gadget_subsys);
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+	if (!IS_ERR(android_class))
+		class_destroy(android_class);
+#endif
 }
 module_exit(gadget_cfs_exit);
diff --git a/drivers/usb/gadget/function/Makefile b/drivers/usb/gadget/function/Makefile
index 5d3a6cf..dd33a12 100644
--- a/drivers/usb/gadget/function/Makefile
+++ b/drivers/usb/gadget/function/Makefile
@@ -50,3 +50,7 @@
 obj-$(CONFIG_USB_F_PRINTER)	+= usb_f_printer.o
 usb_f_tcm-y			:= f_tcm.o
 obj-$(CONFIG_USB_F_TCM)		+= usb_f_tcm.o
+usb_f_accessory-y		:= f_accessory.o
+obj-$(CONFIG_USB_F_ACC)		+= usb_f_accessory.o
+usb_f_audio_source-y		:= f_audio_source.o
+obj-$(CONFIG_USB_F_AUDIO_SRC)	+= usb_f_audio_source.o
diff --git a/drivers/usb/gadget/function/f_accessory.c b/drivers/usb/gadget/function/f_accessory.c
new file mode 100644
index 0000000..3510f6d
--- /dev/null
+++ b/drivers/usb/gadget/function/f_accessory.c
@@ -0,0 +1,1554 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Gadget Function Driver for Android USB accessories
+ *
+ * Copyright (C) 2011 Google, Inc.
+ * Author: Mike Lockwood <lockwood@android.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+/* #define DEBUG */
+/* #define VERBOSE_DEBUG */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/poll.h>
+#include <linux/delay.h>
+#include <linux/wait.h>
+#include <linux/err.h>
+#include <linux/interrupt.h>
+#include <linux/kthread.h>
+#include <linux/freezer.h>
+#include <linux/kref.h>
+
+#include <linux/types.h>
+#include <linux/file.h>
+#include <linux/device.h>
+#include <linux/miscdevice.h>
+
+#include <linux/hid.h>
+#include <linux/hiddev.h>
+#include <linux/usb.h>
+#include <linux/usb/ch9.h>
+#include <linux/usb/f_accessory.h>
+
+#include <linux/configfs.h>
+#include <linux/usb/composite.h>
+
+#define MAX_INST_NAME_LEN        40
+#define BULK_BUFFER_SIZE    16384
+#define ACC_STRING_SIZE     256
+
+#define PROTOCOL_VERSION    2
+
+/* String IDs */
+#define INTERFACE_STRING_INDEX	0
+
+/* number of tx and rx requests to allocate */
+#define TX_REQ_MAX 4
+#define RX_REQ_MAX 2
+
+struct acc_hid_dev {
+	struct list_head	list;
+	struct hid_device *hid;
+	struct acc_dev *dev;
+	/* accessory defined ID */
+	int id;
+	/* HID report descriptor */
+	u8 *report_desc;
+	/* length of HID report descriptor */
+	int report_desc_len;
+	/* number of bytes of report_desc we have received so far */
+	int report_desc_offset;
+};
+
+struct acc_dev {
+	struct usb_function function;
+	struct usb_composite_dev *cdev;
+	spinlock_t lock;
+	struct acc_dev_ref *ref;
+
+	struct usb_ep *ep_in;
+	struct usb_ep *ep_out;
+
+	/* online indicates the state of function_set_alt & function_unbind;
+	 * set to 1 when we connect
+	 */
+	int online;
+
+	/* disconnected indicates state of open & release
+	 * Set to 1 when we disconnect.
+	 * Not cleared until our file is closed.
+	 */
+	int disconnected;
+
+	/* strings sent by the host */
+	char manufacturer[ACC_STRING_SIZE];
+	char model[ACC_STRING_SIZE];
+	char description[ACC_STRING_SIZE];
+	char version[ACC_STRING_SIZE];
+	char uri[ACC_STRING_SIZE];
+	char serial[ACC_STRING_SIZE];
+
+	/* for acc_complete_set_string */
+	int string_index;
+
+	/* set to 1 if we have a pending start request */
+	int start_requested;
+
+	int audio_mode;
+
+	/* synchronize access to our device file */
+	atomic_t open_excl;
+
+	struct list_head tx_idle;
+
+	wait_queue_head_t read_wq;
+	wait_queue_head_t write_wq;
+	struct usb_request *rx_req[RX_REQ_MAX];
+	int rx_done;
+
+	/* delayed work for handling ACCESSORY_START */
+	struct delayed_work start_work;
+
+	/* work for handling ACCESSORY GET PROTOCOL */
+	struct work_struct getprotocol_work;
+
+	/* work for handling ACCESSORY SEND STRING */
+	struct work_struct sendstring_work;
+
+	/* worker for registering and unregistering hid devices */
+	struct work_struct hid_work;
+
+	/* list of active HID devices */
+	struct list_head	hid_list;
+
+	/* list of new HID devices to register */
+	struct list_head	new_hid_list;
+
+	/* list of dead HID devices to unregister */
+	struct list_head	dead_hid_list;
+};
+
+static struct usb_interface_descriptor acc_interface_desc = {
+	.bLength                = USB_DT_INTERFACE_SIZE,
+	.bDescriptorType        = USB_DT_INTERFACE,
+	.bInterfaceNumber       = 0,
+	.bNumEndpoints          = 2,
+	.bInterfaceClass        = USB_CLASS_VENDOR_SPEC,
+	.bInterfaceSubClass     = USB_SUBCLASS_VENDOR_SPEC,
+	.bInterfaceProtocol     = 0,
+};
+
+static struct usb_endpoint_descriptor acc_superspeedplus_in_desc = {
+	.bLength                = USB_DT_ENDPOINT_SIZE,
+	.bDescriptorType        = USB_DT_ENDPOINT,
+	.bEndpointAddress       = USB_DIR_IN,
+	.bmAttributes           = USB_ENDPOINT_XFER_BULK,
+	.wMaxPacketSize         = cpu_to_le16(1024),
+};
+
+static struct usb_endpoint_descriptor acc_superspeedplus_out_desc = {
+	.bLength                = USB_DT_ENDPOINT_SIZE,
+	.bDescriptorType        = USB_DT_ENDPOINT,
+	.bEndpointAddress       = USB_DIR_OUT,
+	.bmAttributes           = USB_ENDPOINT_XFER_BULK,
+	.wMaxPacketSize         = cpu_to_le16(1024),
+};
+
+static struct usb_ss_ep_comp_descriptor acc_superspeedplus_comp_desc = {
+	.bLength                = sizeof(acc_superspeedplus_comp_desc),
+	.bDescriptorType        = USB_DT_SS_ENDPOINT_COMP,
+
+	/* the following 2 values can be tweaked if necessary */
+	/* .bMaxBurst =         0, */
+	/* .bmAttributes =      0, */
+};
+
+static struct usb_endpoint_descriptor acc_superspeed_in_desc = {
+	.bLength                = USB_DT_ENDPOINT_SIZE,
+	.bDescriptorType        = USB_DT_ENDPOINT,
+	.bEndpointAddress       = USB_DIR_IN,
+	.bmAttributes           = USB_ENDPOINT_XFER_BULK,
+	.wMaxPacketSize         = cpu_to_le16(1024),
+};
+
+static struct usb_endpoint_descriptor acc_superspeed_out_desc = {
+	.bLength                = USB_DT_ENDPOINT_SIZE,
+	.bDescriptorType        = USB_DT_ENDPOINT,
+	.bEndpointAddress       = USB_DIR_OUT,
+	.bmAttributes           = USB_ENDPOINT_XFER_BULK,
+	.wMaxPacketSize         = cpu_to_le16(1024),
+};
+
+static struct usb_ss_ep_comp_descriptor acc_superspeed_comp_desc = {
+	.bLength                = sizeof(acc_superspeed_comp_desc),
+	.bDescriptorType        = USB_DT_SS_ENDPOINT_COMP,
+
+	/* the following 2 values can be tweaked if necessary */
+	/* .bMaxBurst =         0, */
+	/* .bmAttributes =      0, */
+};
+
+static struct usb_endpoint_descriptor acc_highspeed_in_desc = {
+	.bLength                = USB_DT_ENDPOINT_SIZE,
+	.bDescriptorType        = USB_DT_ENDPOINT,
+	.bEndpointAddress       = USB_DIR_IN,
+	.bmAttributes           = USB_ENDPOINT_XFER_BULK,
+	.wMaxPacketSize         = cpu_to_le16(512),
+};
+
+static struct usb_endpoint_descriptor acc_highspeed_out_desc = {
+	.bLength                = USB_DT_ENDPOINT_SIZE,
+	.bDescriptorType        = USB_DT_ENDPOINT,
+	.bEndpointAddress       = USB_DIR_OUT,
+	.bmAttributes           = USB_ENDPOINT_XFER_BULK,
+	.wMaxPacketSize         = cpu_to_le16(512),
+};
+
+static struct usb_endpoint_descriptor acc_fullspeed_in_desc = {
+	.bLength                = USB_DT_ENDPOINT_SIZE,
+	.bDescriptorType        = USB_DT_ENDPOINT,
+	.bEndpointAddress       = USB_DIR_IN,
+	.bmAttributes           = USB_ENDPOINT_XFER_BULK,
+};
+
+static struct usb_endpoint_descriptor acc_fullspeed_out_desc = {
+	.bLength                = USB_DT_ENDPOINT_SIZE,
+	.bDescriptorType        = USB_DT_ENDPOINT,
+	.bEndpointAddress       = USB_DIR_OUT,
+	.bmAttributes           = USB_ENDPOINT_XFER_BULK,
+};
+
+static struct usb_descriptor_header *fs_acc_descs[] = {
+	(struct usb_descriptor_header *) &acc_interface_desc,
+	(struct usb_descriptor_header *) &acc_fullspeed_in_desc,
+	(struct usb_descriptor_header *) &acc_fullspeed_out_desc,
+	NULL,
+};
+
+static struct usb_descriptor_header *hs_acc_descs[] = {
+	(struct usb_descriptor_header *) &acc_interface_desc,
+	(struct usb_descriptor_header *) &acc_highspeed_in_desc,
+	(struct usb_descriptor_header *) &acc_highspeed_out_desc,
+	NULL,
+};
+
+static struct usb_descriptor_header *ss_acc_descs[] = {
+	(struct usb_descriptor_header *) &acc_interface_desc,
+	(struct usb_descriptor_header *) &acc_superspeed_in_desc,
+	(struct usb_descriptor_header *) &acc_superspeed_comp_desc,
+	(struct usb_descriptor_header *) &acc_superspeed_out_desc,
+	(struct usb_descriptor_header *) &acc_superspeed_comp_desc,
+	NULL,
+};
+
+static struct usb_descriptor_header *ssp_acc_descs[] = {
+	(struct usb_descriptor_header *) &acc_interface_desc,
+	(struct usb_descriptor_header *) &acc_superspeedplus_in_desc,
+	(struct usb_descriptor_header *) &acc_superspeedplus_comp_desc,
+	(struct usb_descriptor_header *) &acc_superspeedplus_out_desc,
+	(struct usb_descriptor_header *) &acc_superspeedplus_comp_desc,
+	NULL,
+};
+
+static struct usb_string acc_string_defs[] = {
+	[INTERFACE_STRING_INDEX].s	= "Android Accessory Interface",
+	{  },	/* end of list */
+};
+
+static struct usb_gadget_strings acc_string_table = {
+	.language		= 0x0409,	/* en-US */
+	.strings		= acc_string_defs,
+};
+
+static struct usb_gadget_strings *acc_strings[] = {
+	&acc_string_table,
+	NULL,
+};
+
+struct acc_dev_ref {
+	struct kref	kref;
+	struct acc_dev	*acc_dev;
+};
+
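+/*
+ * Starts at zero so get_acc_dev() fails until acc_setup() has
+ * initialized the reference count.
+ */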
+static struct acc_dev_ref _acc_dev_ref = {
+	.kref = KREF_INIT(0),
+};
+
+struct acc_instance {
+	struct usb_function_instance func_inst;
+	const char *name;
+};
+
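+/*
+ * The accessory device is a reference-counted singleton; this returns
+ * NULL once the final reference has been dropped.
+ */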
+static struct acc_dev *get_acc_dev(void)
+{
+	struct acc_dev_ref *ref = &_acc_dev_ref;
+
+	return kref_get_unless_zero(&ref->kref) ? ref->acc_dev : NULL;
+}
+
+static void __put_acc_dev(struct kref *kref)
+{
+	struct acc_dev_ref *ref = container_of(kref, struct acc_dev_ref, kref);
+	struct acc_dev *dev = ref->acc_dev;
+
+	/* Cancel any async work */
+	cancel_delayed_work_sync(&dev->start_work);
+	cancel_work_sync(&dev->getprotocol_work);
+	cancel_work_sync(&dev->sendstring_work);
+	cancel_work_sync(&dev->hid_work);
+
+	ref->acc_dev = NULL;
+	kfree(dev);
+}
+
+static void put_acc_dev(struct acc_dev *dev)
+{
+	struct acc_dev_ref *ref = dev->ref;
+
+	WARN_ON(ref->acc_dev != dev);
+	kref_put(&ref->kref, __put_acc_dev);
+}
+
+static inline struct acc_dev *func_to_dev(struct usb_function *f)
+{
+	return container_of(f, struct acc_dev, function);
+}
+
+static struct usb_request *acc_request_new(struct usb_ep *ep, int buffer_size)
+{
+	struct usb_request *req = usb_ep_alloc_request(ep, GFP_KERNEL);
+
+	if (!req)
+		return NULL;
+
+	/* now allocate buffers for the requests */
+	req->buf = kmalloc(buffer_size, GFP_KERNEL);
+	if (!req->buf) {
+		usb_ep_free_request(ep, req);
+		return NULL;
+	}
+
+	return req;
+}
+
+static void acc_request_free(struct usb_request *req, struct usb_ep *ep)
+{
+	if (req) {
+		kfree(req->buf);
+		usb_ep_free_request(ep, req);
+	}
+}
+
+/* add a request to the tail of a list */
+static void req_put(struct acc_dev *dev, struct list_head *head,
+		struct usb_request *req)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	list_add_tail(&req->list, head);
+	spin_unlock_irqrestore(&dev->lock, flags);
+}
+
+/* remove a request from the head of a list */
+static struct usb_request *req_get(struct acc_dev *dev, struct list_head *head)
+{
+	unsigned long flags;
+	struct usb_request *req;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	if (list_empty(head)) {
+		req = NULL;
+	} else {
+		req = list_first_entry(head, struct usb_request, list);
+		list_del(&req->list);
+	}
+	spin_unlock_irqrestore(&dev->lock, flags);
+	return req;
+}
+
+static void acc_set_disconnected(struct acc_dev *dev)
+{
+	dev->disconnected = 1;
+}
+
+static void acc_complete_in(struct usb_ep *ep, struct usb_request *req)
+{
+	struct acc_dev *dev = get_acc_dev();
+
+	if (!dev)
+		return;
+
+	if (req->status == -ESHUTDOWN) {
+		pr_debug("acc_complete_in set disconnected\n");
+		acc_set_disconnected(dev);
+	}
+
+	req_put(dev, &dev->tx_idle, req);
+
+	wake_up(&dev->write_wq);
+	put_acc_dev(dev);
+}
+
+static void acc_complete_out(struct usb_ep *ep, struct usb_request *req)
+{
+	struct acc_dev *dev = get_acc_dev();
+
+	if (!dev)
+		return;
+
+	dev->rx_done = 1;
+	if (req->status == -ESHUTDOWN) {
+		pr_debug("acc_complete_out set disconnected\n");
+		acc_set_disconnected(dev);
+	}
+
+	wake_up(&dev->read_wq);
+	put_acc_dev(dev);
+}
+
+static void acc_complete_set_string(struct usb_ep *ep, struct usb_request *req)
+{
+	struct acc_dev	*dev = ep->driver_data;
+	char *string_dest = NULL;
+	int length = req->actual;
+
+	if (req->status != 0) {
+		pr_err("acc_complete_set_string, err %d\n", req->status);
+		return;
+	}
+
+	switch (dev->string_index) {
+	case ACCESSORY_STRING_MANUFACTURER:
+		string_dest = dev->manufacturer;
+		break;
+	case ACCESSORY_STRING_MODEL:
+		string_dest = dev->model;
+		break;
+	case ACCESSORY_STRING_DESCRIPTION:
+		string_dest = dev->description;
+		break;
+	case ACCESSORY_STRING_VERSION:
+		string_dest = dev->version;
+		break;
+	case ACCESSORY_STRING_URI:
+		string_dest = dev->uri;
+		break;
+	case ACCESSORY_STRING_SERIAL:
+		string_dest = dev->serial;
+		break;
+	}
+	if (string_dest) {
+		unsigned long flags;
+
+		if (length >= ACC_STRING_SIZE)
+			length = ACC_STRING_SIZE - 1;
+
+		spin_lock_irqsave(&dev->lock, flags);
+		memcpy(string_dest, req->buf, length);
+		/* ensure zero termination */
+		string_dest[length] = 0;
+		spin_unlock_irqrestore(&dev->lock, flags);
+	} else {
+		pr_err("unknown accessory string index %d\n",
+			dev->string_index);
+	}
+}
+
+static void acc_complete_set_hid_report_desc(struct usb_ep *ep,
+		struct usb_request *req)
+{
+	struct acc_hid_dev *hid = req->context;
+	struct acc_dev *dev = hid->dev;
+	int length = req->actual;
+
+	if (req->status != 0) {
+		pr_err("acc_complete_set_hid_report_desc, err %d\n",
+			req->status);
+		return;
+	}
+
+	memcpy(hid->report_desc + hid->report_desc_offset, req->buf, length);
+	hid->report_desc_offset += length;
+	if (hid->report_desc_offset == hid->report_desc_len) {
+		/* After we have received the entire report descriptor
+		 * we schedule work to initialize the HID device
+		 */
+		schedule_work(&dev->hid_work);
+	}
+}
+
+static void acc_complete_send_hid_event(struct usb_ep *ep,
+		struct usb_request *req)
+{
+	struct acc_hid_dev *hid = req->context;
+	int length = req->actual;
+
+	if (req->status != 0) {
+		pr_err("acc_complete_send_hid_event, err %d\n", req->status);
+		return;
+	}
+
+	hid_report_raw_event(hid->hid, HID_INPUT_REPORT, req->buf, length, 1);
+}
+
+static int acc_hid_parse(struct hid_device *hid)
+{
+	struct acc_hid_dev *hdev = hid->driver_data;
+
+	hid_parse_report(hid, hdev->report_desc, hdev->report_desc_len);
+	return 0;
+}
+
+static int acc_hid_start(struct hid_device *hid)
+{
+	return 0;
+}
+
+static void acc_hid_stop(struct hid_device *hid)
+{
+}
+
+static int acc_hid_open(struct hid_device *hid)
+{
+	return 0;
+}
+
+static void acc_hid_close(struct hid_device *hid)
+{
+}
+
+static int acc_hid_raw_request(struct hid_device *hid, unsigned char reportnum,
+	__u8 *buf, size_t len, unsigned char rtype, int reqtype)
+{
+	return 0;
+}
+
+static struct hid_ll_driver acc_hid_ll_driver = {
+	.parse = acc_hid_parse,
+	.start = acc_hid_start,
+	.stop = acc_hid_stop,
+	.open = acc_hid_open,
+	.close = acc_hid_close,
+	.raw_request = acc_hid_raw_request,
+};
+
+static struct acc_hid_dev *acc_hid_new(struct acc_dev *dev,
+		int id, int desc_len)
+{
+	struct acc_hid_dev *hdev;
+
+	hdev = kzalloc(sizeof(*hdev), GFP_ATOMIC);
+	if (!hdev)
+		return NULL;
+	hdev->report_desc = kzalloc(desc_len, GFP_ATOMIC);
+	if (!hdev->report_desc) {
+		kfree(hdev);
+		return NULL;
+	}
+	hdev->dev = dev;
+	hdev->id = id;
+	hdev->report_desc_len = desc_len;
+
+	return hdev;
+}
+
+static struct acc_hid_dev *acc_hid_get(struct list_head *list, int id)
+{
+	struct acc_hid_dev *hid;
+
+	list_for_each_entry(hid, list, list) {
+		if (hid->id == id)
+			return hid;
+	}
+	return NULL;
+}
+
+static int acc_register_hid(struct acc_dev *dev, int id, int desc_length)
+{
+	struct acc_hid_dev *hid;
+	unsigned long flags;
+
+	/* report descriptor length must be > 0 */
+	if (desc_length <= 0)
+		return -EINVAL;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	/* replace HID if one already exists with this ID */
+	hid = acc_hid_get(&dev->hid_list, id);
+	if (!hid)
+		hid = acc_hid_get(&dev->new_hid_list, id);
+	if (hid)
+		list_move(&hid->list, &dev->dead_hid_list);
+
+	hid = acc_hid_new(dev, id, desc_length);
+	if (!hid) {
+		spin_unlock_irqrestore(&dev->lock, flags);
+		return -ENOMEM;
+	}
+
+	list_add(&hid->list, &dev->new_hid_list);
+	spin_unlock_irqrestore(&dev->lock, flags);
+
+	/* schedule work to register the HID device */
+	schedule_work(&dev->hid_work);
+	return 0;
+}
+
+static int acc_unregister_hid(struct acc_dev *dev, int id)
+{
+	struct acc_hid_dev *hid;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	hid = acc_hid_get(&dev->hid_list, id);
+	if (!hid)
+		hid = acc_hid_get(&dev->new_hid_list, id);
+	if (!hid) {
+		spin_unlock_irqrestore(&dev->lock, flags);
+		return -EINVAL;
+	}
+
+	list_move(&hid->list, &dev->dead_hid_list);
+	spin_unlock_irqrestore(&dev->lock, flags);
+
+	schedule_work(&dev->hid_work);
+	return 0;
+}
+
+static int create_bulk_endpoints(struct acc_dev *dev,
+				struct usb_endpoint_descriptor *in_desc,
+				struct usb_endpoint_descriptor *out_desc)
+{
+	struct usb_composite_dev *cdev = dev->cdev;
+	struct usb_request *req;
+	struct usb_ep *ep;
+	int i;
+
+	DBG(cdev, "create_bulk_endpoints dev: %p\n", dev);
+
+	ep = usb_ep_autoconfig(cdev->gadget, in_desc);
+	if (!ep) {
+		DBG(cdev, "usb_ep_autoconfig for ep_in failed\n");
+		return -ENODEV;
+	}
+	DBG(cdev, "usb_ep_autoconfig for ep_in got %s\n", ep->name);
+	ep->driver_data = dev;		/* claim the endpoint */
+	dev->ep_in = ep;
+
+	ep = usb_ep_autoconfig(cdev->gadget, out_desc);
+	if (!ep) {
+		DBG(cdev, "usb_ep_autoconfig for ep_out failed\n");
+		return -ENODEV;
+	}
+	DBG(cdev, "usb_ep_autoconfig for ep_out got %s\n", ep->name);
+	ep->driver_data = dev;		/* claim the endpoint */
+	dev->ep_out = ep;
+
+	/* now allocate requests for our endpoints */
+	for (i = 0; i < TX_REQ_MAX; i++) {
+		req = acc_request_new(dev->ep_in, BULK_BUFFER_SIZE);
+		if (!req)
+			goto fail;
+		req->complete = acc_complete_in;
+		req_put(dev, &dev->tx_idle, req);
+	}
+	for (i = 0; i < RX_REQ_MAX; i++) {
+		req = acc_request_new(dev->ep_out, BULK_BUFFER_SIZE);
+		if (!req)
+			goto fail;
+		req->complete = acc_complete_out;
+		dev->rx_req[i] = req;
+	}
+
+	return 0;
+
+fail:
+	pr_err("acc_bind() could not allocate requests\n");
+	while ((req = req_get(dev, &dev->tx_idle)))
+		acc_request_free(req, dev->ep_in);
+	for (i = 0; i < RX_REQ_MAX; i++) {
+		acc_request_free(dev->rx_req[i], dev->ep_out);
+		dev->rx_req[i] = NULL;
+	}
+
+	return -ENOMEM;
+}
+
+static ssize_t acc_read(struct file *fp, char __user *buf,
+	size_t count, loff_t *pos)
+{
+	struct acc_dev *dev = fp->private_data;
+	struct usb_request *req;
+	ssize_t r = count;
+	ssize_t data_length;
+	unsigned xfer;
+	int ret = 0;
+
+	pr_debug("acc_read(%zu)\n", count);
+
+	if (dev->disconnected) {
+		pr_debug("acc_read disconnected\n");
+		return -ENODEV;
+	}
+
+	if (count > BULK_BUFFER_SIZE)
+		count = BULK_BUFFER_SIZE;
+
+	/* we will block until we're online */
+	pr_debug("acc_read: waiting for online\n");
+	ret = wait_event_interruptible(dev->read_wq, dev->online);
+	if (ret < 0) {
+		r = ret;
+		goto done;
+	}
+
+	if (!dev->rx_req[0]) {
+		pr_warn("acc_read: USB request already handled/freed\n");
+		r = -EINVAL;
+		goto done;
+	}
+
+	/*
+	 * Round the requested length up to an integer multiple of the
+	 * endpoint maxpacket size, as required when queuing OUT requests.
+	 */
+	data_length = count;
+	data_length += dev->ep_out->maxpacket - 1;
+	data_length -= data_length % dev->ep_out->maxpacket;
+
+	if (dev->rx_done) {
+		// The last request was cancelled; try to retrieve its data.
+		req = dev->rx_req[0];
+		goto copy_data;
+	}
+
+requeue_req:
+	/* queue a request */
+	req = dev->rx_req[0];
+	req->length = data_length;
+	dev->rx_done = 0;
+	ret = usb_ep_queue(dev->ep_out, req, GFP_KERNEL);
+	if (ret < 0) {
+		r = -EIO;
+		goto done;
+	} else {
+		pr_debug("rx %p queue\n", req);
+	}
+
+	/* wait for a request to complete */
+	ret = wait_event_interruptible(dev->read_wq, dev->rx_done);
+	if (ret < 0) {
+		r = ret;
+		ret = usb_ep_dequeue(dev->ep_out, req);
+		if (ret != 0) {
+			// Cancelling failed; data may already have been
+			// received and will be retrieved by the next read.
+			pr_debug("acc_read: cancelling failed %d\n", ret);
+		}
+		goto done;
+	}
+
+copy_data:
+	dev->rx_done = 0;
+	if (dev->online) {
+		/* If we got a 0-len packet, throw it back and try again. */
+		if (req->actual == 0)
+			goto requeue_req;
+
+		pr_debug("rx %p %u\n", req, req->actual);
+		xfer = (req->actual < count) ? req->actual : count;
+		r = xfer;
+		if (copy_to_user(buf, req->buf, xfer))
+			r = -EFAULT;
+	} else
+		r = -EIO;
+
+done:
+	pr_debug("acc_read returning %zd\n", r);
+	return r;
+}
+
+static ssize_t acc_write(struct file *fp, const char __user *buf,
+	size_t count, loff_t *pos)
+{
+	struct acc_dev *dev = fp->private_data;
+	struct usb_request *req = NULL;
+	ssize_t r = count;
+	unsigned xfer;
+	int ret;
+
+	pr_debug("acc_write(%zu)\n", count);
+
+	if (!dev->online || dev->disconnected) {
+		pr_debug("acc_write disconnected or not online\n");
+		return -ENODEV;
+	}
+
+	while (count > 0) {
+		/* get an idle tx request to use */
+		req = NULL;
+		ret = wait_event_interruptible(dev->write_wq,
+			((req = req_get(dev, &dev->tx_idle)) || !dev->online));
+		if (!dev->online || dev->disconnected) {
+			pr_debug("acc_write dev->error\n");
+			r = -EIO;
+			break;
+		}
+
+		if (!req) {
+			r = ret;
+			break;
+		}
+
+		if (count > BULK_BUFFER_SIZE) {
+			xfer = BULK_BUFFER_SIZE;
+			/* More TX requests will follow, so no ZLP yet. */
+			req->zero = 0;
+		} else {
+			xfer = count;
+			/* If the data length is a multiple of the
+			 * maxpacket size, send a zero-length packet (ZLP).
+			 */
+			req->zero = ((xfer % dev->ep_in->maxpacket) == 0);
+		}
+		if (copy_from_user(req->buf, buf, xfer)) {
+			r = -EFAULT;
+			break;
+		}
+
+		req->length = xfer;
+		ret = usb_ep_queue(dev->ep_in, req, GFP_KERNEL);
+		if (ret < 0) {
+			pr_debug("acc_write: xfer error %d\n", ret);
+			r = -EIO;
+			break;
+		}
+
+		buf += xfer;
+		count -= xfer;
+
+		/* clear this so a successfully queued request is not
+		 * put back on tx_idle below
+		 */
+		req = NULL;
+	}
+
+	if (req)
+		req_put(dev, &dev->tx_idle, req);
+
+	pr_debug("acc_write returning %zd\n", r);
+	return r;
+}
+
+static long acc_ioctl(struct file *fp, unsigned code, unsigned long value)
+{
+	struct acc_dev *dev = fp->private_data;
+	char *src = NULL;
+	int ret;
+
+	switch (code) {
+	case ACCESSORY_GET_STRING_MANUFACTURER:
+		src = dev->manufacturer;
+		break;
+	case ACCESSORY_GET_STRING_MODEL:
+		src = dev->model;
+		break;
+	case ACCESSORY_GET_STRING_DESCRIPTION:
+		src = dev->description;
+		break;
+	case ACCESSORY_GET_STRING_VERSION:
+		src = dev->version;
+		break;
+	case ACCESSORY_GET_STRING_URI:
+		src = dev->uri;
+		break;
+	case ACCESSORY_GET_STRING_SERIAL:
+		src = dev->serial;
+		break;
+	case ACCESSORY_IS_START_REQUESTED:
+		return dev->start_requested;
+	case ACCESSORY_GET_AUDIO_MODE:
+		return dev->audio_mode;
+	}
+	if (!src)
+		return -EINVAL;
+
+	ret = strlen(src) + 1;
+	if (copy_to_user((void __user *)value, src, ret))
+		ret = -EFAULT;
+	return ret;
+}
+
+static int acc_open(struct inode *ip, struct file *fp)
+{
+	struct acc_dev *dev = get_acc_dev();
+
+	if (!dev)
+		return -ENODEV;
+
+	if (atomic_xchg(&dev->open_excl, 1)) {
+		put_acc_dev(dev);
+		return -EBUSY;
+	}
+
+	dev->disconnected = 0;
+	fp->private_data = dev;
+	return 0;
+}
+
+static int acc_release(struct inode *ip, struct file *fp)
+{
+	struct acc_dev *dev = fp->private_data;
+
+	if (!dev)
+		return -ENOENT;
+
+	/* indicate that we are disconnected
+	 * still could be online so don't touch online flag
+	 */
+	dev->disconnected = 1;
+
+	fp->private_data = NULL;
+	WARN_ON(!atomic_xchg(&dev->open_excl, 0));
+	put_acc_dev(dev);
+	return 0;
+}
+
+/* file operations for /dev/usb_accessory */
+static const struct file_operations acc_fops = {
+	.owner = THIS_MODULE,
+	.read = acc_read,
+	.write = acc_write,
+	.unlocked_ioctl = acc_ioctl,
+	.compat_ioctl = acc_ioctl,
+	.open = acc_open,
+	.release = acc_release,
+};
+
+static int acc_hid_probe(struct hid_device *hdev,
+		const struct hid_device_id *id)
+{
+	int ret;
+
+	ret = hid_parse(hdev);
+	if (ret)
+		return ret;
+	return hid_hw_start(hdev, HID_CONNECT_DEFAULT);
+}
+
+static struct miscdevice acc_device = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "usb_accessory",
+	.fops = &acc_fops,
+};
+
+static const struct hid_device_id acc_hid_table[] = {
+	{ HID_USB_DEVICE(HID_ANY_ID, HID_ANY_ID) },
+	{ }
+};
+
+static struct hid_driver acc_hid_driver = {
+	.name = "USB accessory",
+	.id_table = acc_hid_table,
+	.probe = acc_hid_probe,
+};
+
+static void acc_complete_setup_noop(struct usb_ep *ep, struct usb_request *req)
+{
+	/*
+	 * Default no-op function when nothing needs to be done for the
+	 * setup request
+	 */
+}
+
+int acc_ctrlrequest(struct usb_composite_dev *cdev,
+				const struct usb_ctrlrequest *ctrl)
+{
+	struct acc_dev	*dev = get_acc_dev();
+	int	value = -EOPNOTSUPP;
+	struct acc_hid_dev *hid;
+	int offset;
+	u8 b_requestType = ctrl->bRequestType;
+	u8 b_request = ctrl->bRequest;
+	u16	w_index = le16_to_cpu(ctrl->wIndex);
+	u16	w_value = le16_to_cpu(ctrl->wValue);
+	u16	w_length = le16_to_cpu(ctrl->wLength);
+	unsigned long flags;
+
+	/*
+	 * If the instance has not been created, as is the case in power-off
+	 * charging mode, dev will be NULL; return an error in that case.
+	 */
+	if (!dev)
+		return -ENODEV;
+
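+	/*
+	 * AOA control requests: vendor OUT requests carry accessory setup
+	 * data; vendor IN requests query the protocol version.
+	 */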
+	if (b_requestType == (USB_DIR_OUT | USB_TYPE_VENDOR)) {
+		if (b_request == ACCESSORY_START) {
+			dev->start_requested = 1;
+			schedule_delayed_work(
+				&dev->start_work, msecs_to_jiffies(10));
+			value = 0;
+			cdev->req->complete = acc_complete_setup_noop;
+		} else if (b_request == ACCESSORY_SEND_STRING) {
+			schedule_work(&dev->sendstring_work);
+			dev->string_index = w_index;
+			cdev->gadget->ep0->driver_data = dev;
+			cdev->req->complete = acc_complete_set_string;
+			value = w_length;
+		} else if (b_request == ACCESSORY_SET_AUDIO_MODE &&
+				w_index == 0 && w_length == 0) {
+			dev->audio_mode = w_value;
+			cdev->req->complete = acc_complete_setup_noop;
+			value = 0;
+		} else if (b_request == ACCESSORY_REGISTER_HID) {
+			cdev->req->complete = acc_complete_setup_noop;
+			value = acc_register_hid(dev, w_value, w_index);
+		} else if (b_request == ACCESSORY_UNREGISTER_HID) {
+			cdev->req->complete = acc_complete_setup_noop;
+			value = acc_unregister_hid(dev, w_value);
+		} else if (b_request == ACCESSORY_SET_HID_REPORT_DESC) {
+			spin_lock_irqsave(&dev->lock, flags);
+			hid = acc_hid_get(&dev->new_hid_list, w_value);
+			spin_unlock_irqrestore(&dev->lock, flags);
+			if (!hid) {
+				value = -EINVAL;
+				goto err;
+			}
+			offset = w_index;
+			if (offset != hid->report_desc_offset
+				|| offset + w_length > hid->report_desc_len) {
+				value = -EINVAL;
+				goto err;
+			}
+			cdev->req->context = hid;
+			cdev->req->complete = acc_complete_set_hid_report_desc;
+			value = w_length;
+		} else if (b_request == ACCESSORY_SEND_HID_EVENT) {
+			spin_lock_irqsave(&dev->lock, flags);
+			hid = acc_hid_get(&dev->hid_list, w_value);
+			spin_unlock_irqrestore(&dev->lock, flags);
+			if (!hid) {
+				value = -EINVAL;
+				goto err;
+			}
+			cdev->req->context = hid;
+			cdev->req->complete = acc_complete_send_hid_event;
+			value = w_length;
+		}
+	} else if (b_requestType == (USB_DIR_IN | USB_TYPE_VENDOR)) {
+		if (b_request == ACCESSORY_GET_PROTOCOL) {
+			schedule_work(&dev->getprotocol_work);
+			*((u16 *)cdev->req->buf) = PROTOCOL_VERSION;
+			value = sizeof(u16);
+			cdev->req->complete = acc_complete_setup_noop;
+			/* clear any strings left over from a previous session */
+			memset(dev->manufacturer, 0, sizeof(dev->manufacturer));
+			memset(dev->model, 0, sizeof(dev->model));
+			memset(dev->description, 0, sizeof(dev->description));
+			memset(dev->version, 0, sizeof(dev->version));
+			memset(dev->uri, 0, sizeof(dev->uri));
+			memset(dev->serial, 0, sizeof(dev->serial));
+			dev->start_requested = 0;
+			dev->audio_mode = 0;
+		}
+	}
+
+	if (value >= 0) {
+		cdev->req->zero = 0;
+		cdev->req->length = value;
+		value = usb_ep_queue(cdev->gadget->ep0, cdev->req, GFP_ATOMIC);
+		if (value < 0)
+			ERROR(cdev, "%s setup response queue error\n",
+				__func__);
+	}
+
+err:
+	if (value == -EOPNOTSUPP)
+		VDBG(cdev,
+			"unknown class-specific control req "
+			"%02x.%02x v%04x i%04x l%u\n",
+			ctrl->bRequestType, ctrl->bRequest,
+			w_value, w_index, w_length);
+	put_acc_dev(dev);
+	return value;
+}
+EXPORT_SYMBOL_GPL(acc_ctrlrequest);
+
+static int
+__acc_function_bind(struct usb_configuration *c,
+			struct usb_function *f, bool configfs)
+{
+	struct usb_composite_dev *cdev = c->cdev;
+	struct acc_dev	*dev = func_to_dev(f);
+	int			id;
+	int			ret;
+
+	DBG(cdev, "acc_function_bind dev: %p\n", dev);
+
+	if (configfs) {
+		if (acc_string_defs[INTERFACE_STRING_INDEX].id == 0) {
+			ret = usb_string_id(c->cdev);
+			if (ret < 0)
+				return ret;
+			acc_string_defs[INTERFACE_STRING_INDEX].id = ret;
+			acc_interface_desc.iInterface = ret;
+		}
+		dev->cdev = c->cdev;
+	}
+	ret = hid_register_driver(&acc_hid_driver);
+	if (ret)
+		return ret;
+
+	dev->start_requested = 0;
+
+	/* allocate interface ID(s) */
+	id = usb_interface_id(c, f);
+	if (id < 0)
+		return id;
+	acc_interface_desc.bInterfaceNumber = id;
+
+	/* allocate endpoints */
+	ret = create_bulk_endpoints(dev, &acc_fullspeed_in_desc,
+			&acc_fullspeed_out_desc);
+	if (ret)
+		return ret;
+
+	/* support high speed hardware */
+	acc_highspeed_in_desc.bEndpointAddress =
+		acc_fullspeed_in_desc.bEndpointAddress;
+	acc_highspeed_out_desc.bEndpointAddress =
+		acc_fullspeed_out_desc.bEndpointAddress;
+
+	/* support super speed hardware */
+	acc_superspeed_in_desc.bEndpointAddress =
+		acc_fullspeed_in_desc.bEndpointAddress;
+	acc_superspeed_out_desc.bEndpointAddress =
+		acc_fullspeed_out_desc.bEndpointAddress;
+
+	/* support super speed plus hardware */
+	acc_superspeedplus_in_desc.bEndpointAddress =
+		acc_fullspeed_in_desc.bEndpointAddress;
+	acc_superspeedplus_out_desc.bEndpointAddress =
+		acc_fullspeed_out_desc.bEndpointAddress;
+
+	DBG(cdev, "%s speed %s: IN/%s, OUT/%s\n",
+			gadget_is_dualspeed(c->cdev->gadget) ? "dual" : "full",
+			f->name, dev->ep_in->name, dev->ep_out->name);
+	return 0;
+}
+
+static int
+acc_function_bind_configfs(struct usb_configuration *c,
+			struct usb_function *f)
+{
+	return __acc_function_bind(c, f, true);
+}
+
+static void
+kill_all_hid_devices(struct acc_dev *dev)
+{
+	struct acc_hid_dev *hid;
+	struct list_head *entry, *temp;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	list_for_each_safe(entry, temp, &dev->hid_list) {
+		hid = list_entry(entry, struct acc_hid_dev, list);
+		list_del(&hid->list);
+		list_add(&hid->list, &dev->dead_hid_list);
+	}
+	list_for_each_safe(entry, temp, &dev->new_hid_list) {
+		hid = list_entry(entry, struct acc_hid_dev, list);
+		list_del(&hid->list);
+		list_add(&hid->list, &dev->dead_hid_list);
+	}
+	spin_unlock_irqrestore(&dev->lock, flags);
+
+	schedule_work(&dev->hid_work);
+}
+
+static void
+acc_hid_unbind(struct acc_dev *dev)
+{
+	hid_unregister_driver(&acc_hid_driver);
+	kill_all_hid_devices(dev);
+}
+
+static void
+acc_function_unbind(struct usb_configuration *c, struct usb_function *f)
+{
+	struct acc_dev	*dev = func_to_dev(f);
+	struct usb_request *req;
+	int i;
+
+	dev->online = 0;		/* clear online flag */
+	wake_up(&dev->read_wq);		/* unblock reads on closure */
+	wake_up(&dev->write_wq);	/* likewise for writes */
+
+	while ((req = req_get(dev, &dev->tx_idle)))
+		acc_request_free(req, dev->ep_in);
+	for (i = 0; i < RX_REQ_MAX; i++) {
+		acc_request_free(dev->rx_req[i], dev->ep_out);
+		dev->rx_req[i] = NULL;
+	}
+
+	acc_hid_unbind(dev);
+}
+
+static void acc_getprotocol_work(struct work_struct *data)
+{
+	char *envp[2] = { "ACCESSORY=GETPROTOCOL", NULL };
+
+	kobject_uevent_env(&acc_device.this_device->kobj, KOBJ_CHANGE, envp);
+}
+
+static void acc_sendstring_work(struct work_struct *data)
+{
+	char *envp[2] = { "ACCESSORY=SENDSTRING", NULL };
+
+	kobject_uevent_env(&acc_device.this_device->kobj, KOBJ_CHANGE, envp);
+}
+
+static void acc_start_work(struct work_struct *data)
+{
+	char *envp[2] = { "ACCESSORY=START", NULL };
+
+	kobject_uevent_env(&acc_device.this_device->kobj, KOBJ_CHANGE, envp);
+}
+
+static int acc_hid_init(struct acc_hid_dev *hdev)
+{
+	struct hid_device *hid;
+	int ret;
+
+	hid = hid_allocate_device();
+	if (IS_ERR(hid))
+		return PTR_ERR(hid);
+
+	hid->ll_driver = &acc_hid_ll_driver;
+	hid->dev.parent = acc_device.this_device;
+
+	hid->bus = BUS_USB;
+	hid->vendor = HID_ANY_ID;
+	hid->product = HID_ANY_ID;
+	hid->driver_data = hdev;
+	ret = hid_add_device(hid);
+	if (ret) {
+		pr_err("can't add hid device: %d\n", ret);
+		hid_destroy_device(hid);
+		return ret;
+	}
+
+	hdev->hid = hid;
+	return 0;
+}
+
+static void acc_hid_delete(struct acc_hid_dev *hid)
+{
+	kfree(hid->report_desc);
+	kfree(hid);
+}
+
+static void acc_hid_work(struct work_struct *data)
+{
+	struct acc_dev *dev = get_acc_dev();
+	struct list_head	*entry, *temp;
+	struct acc_hid_dev *hid;
+	struct list_head	new_list, dead_list;
+	unsigned long flags;
+
+	if (!dev)
+		return;
+
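+	/*
+	 * hid_add_device()/hid_destroy_device() can sleep, so move entries
+	 * onto local lists under the spinlock and act on them afterwards.
+	 */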
+	INIT_LIST_HEAD(&new_list);
+
+	spin_lock_irqsave(&dev->lock, flags);
+
+	/* copy hids that are ready for initialization to new_list */
+	list_for_each_safe(entry, temp, &dev->new_hid_list) {
+		hid = list_entry(entry, struct acc_hid_dev, list);
+		if (hid->report_desc_offset == hid->report_desc_len)
+			list_move(&hid->list, &new_list);
+	}
+
+	if (list_empty(&dev->dead_hid_list)) {
+		INIT_LIST_HEAD(&dead_list);
+	} else {
+		/* move all of dev->dead_hid_list to dead_list */
+		dead_list.prev = dev->dead_hid_list.prev;
+		dead_list.next = dev->dead_hid_list.next;
+		dead_list.next->prev = &dead_list;
+		dead_list.prev->next = &dead_list;
+		INIT_LIST_HEAD(&dev->dead_hid_list);
+	}
+
+	spin_unlock_irqrestore(&dev->lock, flags);
+
+	/* register new HID devices */
+	list_for_each_safe(entry, temp, &new_list) {
+		hid = list_entry(entry, struct acc_hid_dev, list);
+		if (acc_hid_init(hid)) {
+			pr_err("can't add HID device %p\n", hid);
+			acc_hid_delete(hid);
+		} else {
+			spin_lock_irqsave(&dev->lock, flags);
+			list_move(&hid->list, &dev->hid_list);
+			spin_unlock_irqrestore(&dev->lock, flags);
+		}
+	}
+
+	/* remove dead HID devices */
+	list_for_each_safe(entry, temp, &dead_list) {
+		hid = list_entry(entry, struct acc_hid_dev, list);
+		list_del(&hid->list);
+		if (hid->hid)
+			hid_destroy_device(hid->hid);
+		acc_hid_delete(hid);
+	}
+
+	put_acc_dev(dev);
+}
+
+static int acc_function_set_alt(struct usb_function *f,
+		unsigned intf, unsigned alt)
+{
+	struct acc_dev	*dev = func_to_dev(f);
+	struct usb_composite_dev *cdev = f->config->cdev;
+	int ret;
+
+	DBG(cdev, "acc_function_set_alt intf: %d alt: %d\n", intf, alt);
+
+	ret = config_ep_by_speed(cdev->gadget, f, dev->ep_in);
+	if (ret)
+		return ret;
+
+	ret = usb_ep_enable(dev->ep_in);
+	if (ret)
+		return ret;
+
+	ret = config_ep_by_speed(cdev->gadget, f, dev->ep_out);
+	if (ret)
+		return ret;
+
+	ret = usb_ep_enable(dev->ep_out);
+	if (ret) {
+		usb_ep_disable(dev->ep_in);
+		return ret;
+	}
+
+	dev->online = 1;
+	dev->disconnected = 0; /* if online then not disconnected */
+
+	/* readers may be blocked waiting for us to go online */
+	wake_up(&dev->read_wq);
+	return 0;
+}
+
+static void acc_function_disable(struct usb_function *f)
+{
+	struct acc_dev	*dev = func_to_dev(f);
+	struct usb_composite_dev	*cdev = dev->cdev;
+
+	DBG(cdev, "acc_function_disable\n");
+	acc_set_disconnected(dev); /* this now only sets disconnected */
+	dev->online = 0; /* so now need to clear online flag here too */
+	usb_ep_disable(dev->ep_in);
+	usb_ep_disable(dev->ep_out);
+
+	/* readers may be blocked waiting for us to go online */
+	wake_up(&dev->read_wq);
+
+	VDBG(cdev, "%s disabled\n", dev->function.name);
+}
+
+static int acc_setup(void)
+{
+	struct acc_dev_ref *ref = &_acc_dev_ref;
+	struct acc_dev *dev;
+	int ret;
+
+	if (kref_read(&ref->kref))
+		return -EBUSY;
+
+	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+	if (!dev)
+		return -ENOMEM;
+
+	spin_lock_init(&dev->lock);
+	init_waitqueue_head(&dev->read_wq);
+	init_waitqueue_head(&dev->write_wq);
+	atomic_set(&dev->open_excl, 0);
+	INIT_LIST_HEAD(&dev->tx_idle);
+	INIT_LIST_HEAD(&dev->hid_list);
+	INIT_LIST_HEAD(&dev->new_hid_list);
+	INIT_LIST_HEAD(&dev->dead_hid_list);
+	INIT_DELAYED_WORK(&dev->start_work, acc_start_work);
+	INIT_WORK(&dev->hid_work, acc_hid_work);
+	INIT_WORK(&dev->getprotocol_work, acc_getprotocol_work);
+	INIT_WORK(&dev->sendstring_work, acc_sendstring_work);
+
+	dev->ref = ref;
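+	/* Publish the singleton; bail out if another caller won the race. */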
+	if (cmpxchg_relaxed(&ref->acc_dev, NULL, dev)) {
+		ret = -EBUSY;
+		goto err_free_dev;
+	}
+
+	ret = misc_register(&acc_device);
+	if (ret)
+		goto err_zap_ptr;
+
+	kref_init(&ref->kref);
+	return 0;
+
+err_zap_ptr:
+	ref->acc_dev = NULL;
+err_free_dev:
+	kfree(dev);
+	pr_err("USB accessory gadget driver failed to initialize\n");
+	return ret;
+}
+
+void acc_disconnect(void)
+{
+	struct acc_dev *dev = get_acc_dev();
+
+	if (!dev)
+		return;
+
+	/* unregister all HID devices if USB is disconnected */
+	kill_all_hid_devices(dev);
+	put_acc_dev(dev);
+}
+EXPORT_SYMBOL_GPL(acc_disconnect);
+
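+/*
+ * Drop both remaining references: the one taken by get_acc_dev() here and the
+ * initial one from kref_init() in acc_setup(), so the last put releases the
+ * device.
+ */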
+static void acc_cleanup(void)
+{
+	struct acc_dev *dev = get_acc_dev();
+
+	misc_deregister(&acc_device);
+	put_acc_dev(dev);
+	put_acc_dev(dev); /* Pairs with kref_init() in acc_setup() */
+}
+
+static struct acc_instance *to_acc_instance(struct config_item *item)
+{
+	return container_of(to_config_group(item), struct acc_instance,
+		func_inst.group);
+}
+
+static void acc_attr_release(struct config_item *item)
+{
+	struct acc_instance *fi_acc = to_acc_instance(item);
+
+	usb_put_function_instance(&fi_acc->func_inst);
+}
+
+static struct configfs_item_operations acc_item_ops = {
+	.release        = acc_attr_release,
+};
+
+static struct config_item_type acc_func_type = {
+	.ct_item_ops    = &acc_item_ops,
+	.ct_owner       = THIS_MODULE,
+};
+
+static struct acc_instance *to_fi_acc(struct usb_function_instance *fi)
+{
+	return container_of(fi, struct acc_instance, func_inst);
+}
+
+static int acc_set_inst_name(struct usb_function_instance *fi, const char *name)
+{
+	struct acc_instance *fi_acc;
+	char *ptr;
+	int name_len;
+
+	name_len = strlen(name) + 1;
+	if (name_len > MAX_INST_NAME_LEN)
+		return -ENAMETOOLONG;
+
+	ptr = kstrndup(name, name_len, GFP_KERNEL);
+	if (!ptr)
+		return -ENOMEM;
+
+	fi_acc = to_fi_acc(fi);
+	fi_acc->name = ptr;
+	return 0;
+}
+
+static void acc_free_inst(struct usb_function_instance *fi)
+{
+	struct acc_instance *fi_acc;
+
+	fi_acc = to_fi_acc(fi);
+	kfree(fi_acc->name);
+	acc_cleanup();
+}
+
+static struct usb_function_instance *acc_alloc_inst(void)
+{
+	struct acc_instance *fi_acc;
+	int err;
+
+	fi_acc = kzalloc(sizeof(*fi_acc), GFP_KERNEL);
+	if (!fi_acc)
+		return ERR_PTR(-ENOMEM);
+	fi_acc->func_inst.set_inst_name = acc_set_inst_name;
+	fi_acc->func_inst.free_func_inst = acc_free_inst;
+
+	err = acc_setup();
+	if (err) {
+		kfree(fi_acc);
+		return ERR_PTR(err);
+	}
+
+	config_group_init_type_name(&fi_acc->func_inst.group,
+					"", &acc_func_type);
+	return &fi_acc->func_inst;
+}
+
+static void acc_free(struct usb_function *f)
+{
+	struct acc_dev *dev = func_to_dev(f);
+
+	put_acc_dev(dev);
+}
+
+int acc_ctrlrequest_configfs(struct usb_function *f,
+			const struct usb_ctrlrequest *ctrl)
+{
+	if (f->config != NULL && f->config->cdev != NULL)
+		return acc_ctrlrequest(f->config->cdev, ctrl);
+	else
+		return -1;
+}
+
+static struct usb_function *acc_alloc(struct usb_function_instance *fi)
+{
+	struct acc_dev *dev = get_acc_dev();
+
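+	/* acc_setup() succeeded in acc_alloc_inst(), so dev should be valid here */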
+	dev->function.name = "accessory";
+	dev->function.strings = acc_strings;
+	dev->function.fs_descriptors = fs_acc_descs;
+	dev->function.hs_descriptors = hs_acc_descs;
+	dev->function.ss_descriptors = ss_acc_descs;
+	dev->function.ssp_descriptors = ssp_acc_descs;
+	dev->function.bind = acc_function_bind_configfs;
+	dev->function.unbind = acc_function_unbind;
+	dev->function.set_alt = acc_function_set_alt;
+	dev->function.disable = acc_function_disable;
+	dev->function.free_func = acc_free;
+	dev->function.setup = acc_ctrlrequest_configfs;
+
+	return &dev->function;
+}
+DECLARE_USB_FUNCTION_INIT(accessory, acc_alloc_inst, acc_alloc);
+MODULE_LICENSE("GPL");
diff --git a/drivers/usb/gadget/function/f_audio_source.c b/drivers/usb/gadget/function/f_audio_source.c
new file mode 100644
index 0000000..c768a52
--- /dev/null
+++ b/drivers/usb/gadget/function/f_audio_source.c
@@ -0,0 +1,1080 @@
+/*
+ * Gadget Function Driver for USB audio source device
+ *
+ * Copyright (C) 2012 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/device.h>
+#include <linux/usb/audio.h>
+#include <linux/wait.h>
+#include <linux/pm_qos.h>
+#include <sound/core.h>
+#include <sound/initval.h>
+#include <sound/pcm.h>
+
+#include <linux/usb.h>
+#include <linux/usb_usual.h>
+#include <linux/usb/ch9.h>
+#include <linux/configfs.h>
+#include <linux/usb/composite.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
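+
+/* Fixed-format stream: 44.1 kHz 16-bit stereo, i.e. 44 frames per ms */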
+#define SAMPLE_RATE 44100
+#define FRAMES_PER_MSEC (SAMPLE_RATE / 1000)
+
+#define IN_EP_MAX_PACKET_SIZE 256
+
+/* Number of requests to allocate */
+#define IN_EP_REQ_COUNT 4
+
+#define AUDIO_AC_INTERFACE	0
+#define AUDIO_AS_INTERFACE	1
+#define AUDIO_NUM_INTERFACES	2
+#define MAX_INST_NAME_LEN     40
+
+/* B.3.1  Standard AC Interface Descriptor */
+static struct usb_interface_descriptor ac_interface_desc = {
+	.bLength =		USB_DT_INTERFACE_SIZE,
+	.bDescriptorType =	USB_DT_INTERFACE,
+	.bNumEndpoints =	0,
+	.bInterfaceClass =	USB_CLASS_AUDIO,
+	.bInterfaceSubClass =	USB_SUBCLASS_AUDIOCONTROL,
+};
+
+DECLARE_UAC_AC_HEADER_DESCRIPTOR(2);
+
+#define UAC_DT_AC_HEADER_LENGTH	UAC_DT_AC_HEADER_SIZE(AUDIO_NUM_INTERFACES)
+/* 1 input terminal, 1 output terminal and 1 feature unit */
+#define UAC_DT_TOTAL_LENGTH (UAC_DT_AC_HEADER_LENGTH \
+	+ UAC_DT_INPUT_TERMINAL_SIZE + UAC_DT_OUTPUT_TERMINAL_SIZE \
+	+ UAC_DT_FEATURE_UNIT_SIZE(0))
+/* B.3.2  Class-Specific AC Interface Descriptor */
+static struct uac1_ac_header_descriptor_2 ac_header_desc = {
+	.bLength =		UAC_DT_AC_HEADER_LENGTH,
+	.bDescriptorType =	USB_DT_CS_INTERFACE,
+	.bDescriptorSubtype =	UAC_HEADER,
+	.bcdADC =		__constant_cpu_to_le16(0x0100),
+	.wTotalLength =		__constant_cpu_to_le16(UAC_DT_TOTAL_LENGTH),
+	.bInCollection =	AUDIO_NUM_INTERFACES,
+	.baInterfaceNr = {
+		[0] =		AUDIO_AC_INTERFACE,
+		[1] =		AUDIO_AS_INTERFACE,
+	}
+};
+
+#define INPUT_TERMINAL_ID	1
+static struct uac_input_terminal_descriptor input_terminal_desc = {
+	.bLength =		UAC_DT_INPUT_TERMINAL_SIZE,
+	.bDescriptorType =	USB_DT_CS_INTERFACE,
+	.bDescriptorSubtype =	UAC_INPUT_TERMINAL,
+	.bTerminalID =		INPUT_TERMINAL_ID,
+	.wTerminalType =	UAC_INPUT_TERMINAL_MICROPHONE,
+	.bAssocTerminal =	0,
+	.wChannelConfig =	0x3,
+};
+
+DECLARE_UAC_FEATURE_UNIT_DESCRIPTOR(0);
+
+#define FEATURE_UNIT_ID		2
+static struct uac_feature_unit_descriptor_0 feature_unit_desc = {
+	.bLength		= UAC_DT_FEATURE_UNIT_SIZE(0),
+	.bDescriptorType	= USB_DT_CS_INTERFACE,
+	.bDescriptorSubtype	= UAC_FEATURE_UNIT,
+	.bUnitID		= FEATURE_UNIT_ID,
+	.bSourceID		= INPUT_TERMINAL_ID,
+	.bControlSize		= 2,
+};
+
+#define OUTPUT_TERMINAL_ID	3
+static struct uac1_output_terminal_descriptor output_terminal_desc = {
+	.bLength		= UAC_DT_OUTPUT_TERMINAL_SIZE,
+	.bDescriptorType	= USB_DT_CS_INTERFACE,
+	.bDescriptorSubtype	= UAC_OUTPUT_TERMINAL,
+	.bTerminalID		= OUTPUT_TERMINAL_ID,
+	.wTerminalType		= UAC_TERMINAL_STREAMING,
+	.bAssocTerminal		= FEATURE_UNIT_ID,
+	.bSourceID		= FEATURE_UNIT_ID,
+};
+
+/* B.4.1  Standard AS Interface Descriptor */
+static struct usb_interface_descriptor as_interface_alt_0_desc = {
+	.bLength =		USB_DT_INTERFACE_SIZE,
+	.bDescriptorType =	USB_DT_INTERFACE,
+	.bAlternateSetting =	0,
+	.bNumEndpoints =	0,
+	.bInterfaceClass =	USB_CLASS_AUDIO,
+	.bInterfaceSubClass =	USB_SUBCLASS_AUDIOSTREAMING,
+};
+
+static struct usb_interface_descriptor as_interface_alt_1_desc = {
+	.bLength =		USB_DT_INTERFACE_SIZE,
+	.bDescriptorType =	USB_DT_INTERFACE,
+	.bAlternateSetting =	1,
+	.bNumEndpoints =	1,
+	.bInterfaceClass =	USB_CLASS_AUDIO,
+	.bInterfaceSubClass =	USB_SUBCLASS_AUDIOSTREAMING,
+};
+
+/* B.4.2  Class-Specific AS Interface Descriptor */
+static struct uac1_as_header_descriptor as_header_desc = {
+	.bLength =		UAC_DT_AS_HEADER_SIZE,
+	.bDescriptorType =	USB_DT_CS_INTERFACE,
+	.bDescriptorSubtype =	UAC_AS_GENERAL,
+	.bTerminalLink =	INPUT_TERMINAL_ID,
+	.bDelay =		1,
+	.wFormatTag =		UAC_FORMAT_TYPE_I_PCM,
+};
+
+DECLARE_UAC_FORMAT_TYPE_I_DISCRETE_DESC(1);
+
+static struct uac_format_type_i_discrete_descriptor_1 as_type_i_desc = {
+	.bLength =		UAC_FORMAT_TYPE_I_DISCRETE_DESC_SIZE(1),
+	.bDescriptorType =	USB_DT_CS_INTERFACE,
+	.bDescriptorSubtype =	UAC_FORMAT_TYPE,
+	.bFormatType =		UAC_FORMAT_TYPE_I,
+	.bSubframeSize =	2,
+	.bBitResolution =	16,
+	.bSamFreqType =		1,
+};
+
+/* Standard ISO IN Endpoint Descriptor for highspeed */
+static struct usb_endpoint_descriptor hs_as_in_ep_desc  = {
+	.bLength =		USB_DT_ENDPOINT_AUDIO_SIZE,
+	.bDescriptorType =	USB_DT_ENDPOINT,
+	.bEndpointAddress =	USB_DIR_IN,
+	.bmAttributes =		USB_ENDPOINT_SYNC_SYNC
+				| USB_ENDPOINT_XFER_ISOC,
+	.wMaxPacketSize =	__constant_cpu_to_le16(IN_EP_MAX_PACKET_SIZE),
+	.bInterval =		4, /* 2^(4-1) = 8 microframes = 1 ms polling */
+};
+
+/* Standard ISO IN Endpoint Descriptor for fullspeed */
+static struct usb_endpoint_descriptor fs_as_in_ep_desc  = {
+	.bLength =		USB_DT_ENDPOINT_AUDIO_SIZE,
+	.bDescriptorType =	USB_DT_ENDPOINT,
+	.bEndpointAddress =	USB_DIR_IN,
+	.bmAttributes =		USB_ENDPOINT_SYNC_SYNC
+				| USB_ENDPOINT_XFER_ISOC,
+	.wMaxPacketSize =	__constant_cpu_to_le16(IN_EP_MAX_PACKET_SIZE),
+	.bInterval =		1, /* poll 1 per millisecond */
+};
+
+/* Class-specific AS ISO IN Endpoint Descriptor */
+static struct uac_iso_endpoint_descriptor as_iso_in_desc = {
+	.bLength =		UAC_ISO_ENDPOINT_DESC_SIZE,
+	.bDescriptorType =	USB_DT_CS_ENDPOINT,
+	.bDescriptorSubtype =	UAC_EP_GENERAL,
+	.bmAttributes =		1,
+	.bLockDelayUnits =	1,
+	.wLockDelay =		__constant_cpu_to_le16(1),
+};
+
+static struct usb_descriptor_header *hs_audio_desc[] = {
+	(struct usb_descriptor_header *)&ac_interface_desc,
+	(struct usb_descriptor_header *)&ac_header_desc,
+
+	(struct usb_descriptor_header *)&input_terminal_desc,
+	(struct usb_descriptor_header *)&output_terminal_desc,
+	(struct usb_descriptor_header *)&feature_unit_desc,
+
+	(struct usb_descriptor_header *)&as_interface_alt_0_desc,
+	(struct usb_descriptor_header *)&as_interface_alt_1_desc,
+	(struct usb_descriptor_header *)&as_header_desc,
+
+	(struct usb_descriptor_header *)&as_type_i_desc,
+
+	(struct usb_descriptor_header *)&hs_as_in_ep_desc,
+	(struct usb_descriptor_header *)&as_iso_in_desc,
+	NULL,
+};
+
+static struct usb_descriptor_header *fs_audio_desc[] = {
+	(struct usb_descriptor_header *)&ac_interface_desc,
+	(struct usb_descriptor_header *)&ac_header_desc,
+
+	(struct usb_descriptor_header *)&input_terminal_desc,
+	(struct usb_descriptor_header *)&output_terminal_desc,
+	(struct usb_descriptor_header *)&feature_unit_desc,
+
+	(struct usb_descriptor_header *)&as_interface_alt_0_desc,
+	(struct usb_descriptor_header *)&as_interface_alt_1_desc,
+	(struct usb_descriptor_header *)&as_header_desc,
+
+	(struct usb_descriptor_header *)&as_type_i_desc,
+
+	(struct usb_descriptor_header *)&fs_as_in_ep_desc,
+	(struct usb_descriptor_header *)&as_iso_in_desc,
+	NULL,
+};
+
+static struct snd_pcm_hardware audio_hw_info = {
+	.info =			SNDRV_PCM_INFO_MMAP |
+				SNDRV_PCM_INFO_MMAP_VALID |
+				SNDRV_PCM_INFO_BATCH |
+				SNDRV_PCM_INFO_INTERLEAVED |
+				SNDRV_PCM_INFO_BLOCK_TRANSFER,
+
+	.formats		= SNDRV_PCM_FMTBIT_S16_LE,
+	.channels_min		= 2,
+	.channels_max		= 2,
+	.rate_min		= SAMPLE_RATE,
+	.rate_max		= SAMPLE_RATE,
+
+	.buffer_bytes_max =	1024 * 1024,
+	.period_bytes_min =	64,
+	.period_bytes_max =	512 * 1024,
+	.periods_min =		2,
+	.periods_max =		1024,
+};
+
+/*-------------------------------------------------------------------------*/
+
+struct audio_source_config {
+	int	card;
+	int	device;
+};
+
+struct audio_dev {
+	struct usb_function		func;
+	struct snd_card			*card;
+	struct snd_pcm			*pcm;
+	struct snd_pcm_substream *substream;
+
+	struct list_head		idle_reqs;
+	struct usb_ep			*in_ep;
+
+	spinlock_t			lock;
+
+	/* beginning, end and current position in our buffer */
+	void				*buffer_start;
+	void				*buffer_end;
+	void				*buffer_pos;
+
+	/* byte size of a "period" */
+	unsigned int			period;
+	/* bytes sent since last call to snd_pcm_period_elapsed */
+	unsigned int			period_offset;
+	/* time we started playing */
+	ktime_t				start_time;
+	/* number of frames sent since start_time */
+	s64				frames_sent;
+	struct audio_source_config	*config;
+	/* for creating and issuing QoS requests */
+	struct pm_qos_request pm_qos;
+};
+
+static inline struct audio_dev *func_to_audio(struct usb_function *f)
+{
+	return container_of(f, struct audio_dev, func);
+}
+
+/*-------------------------------------------------------------------------*/
+
+struct audio_source_instance {
+	struct usb_function_instance func_inst;
+	const char *name;
+	struct audio_source_config *config;
+	struct device *audio_device;
+};
+
+static void audio_source_attr_release(struct config_item *item);
+
+static struct configfs_item_operations audio_source_item_ops = {
+	.release        = audio_source_attr_release,
+};
+
+static struct config_item_type audio_source_func_type = {
+	.ct_item_ops    = &audio_source_item_ops,
+	.ct_owner       = THIS_MODULE,
+};
+
+static ssize_t audio_source_pcm_show(struct device *dev,
+		struct device_attribute *attr, char *buf);
+
+static DEVICE_ATTR(pcm, S_IRUGO, audio_source_pcm_show, NULL);
+
+static struct device_attribute *audio_source_function_attributes[] = {
+	&dev_attr_pcm,
+	NULL
+};
+
+/*--------------------------------------------------------------------------*/
+
+static struct usb_request *audio_request_new(struct usb_ep *ep, int buffer_size)
+{
+	struct usb_request *req = usb_ep_alloc_request(ep, GFP_KERNEL);
+
+	if (!req)
+		return NULL;
+
+	req->buf = kmalloc(buffer_size, GFP_KERNEL);
+	if (!req->buf) {
+		usb_ep_free_request(ep, req);
+		return NULL;
+	}
+	req->length = buffer_size;
+	return req;
+}
+
+static void audio_request_free(struct usb_request *req, struct usb_ep *ep)
+{
+	if (req) {
+		kfree(req->buf);
+		usb_ep_free_request(ep, req);
+	}
+}
+
+static void audio_req_put(struct audio_dev *audio, struct usb_request *req)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&audio->lock, flags);
+	list_add_tail(&req->list, &audio->idle_reqs);
+	spin_unlock_irqrestore(&audio->lock, flags);
+}
+
+static struct usb_request *audio_req_get(struct audio_dev *audio)
+{
+	unsigned long flags;
+	struct usb_request *req;
+
+	spin_lock_irqsave(&audio->lock, flags);
+	if (list_empty(&audio->idle_reqs)) {
+		req = NULL;
+	} else {
+		req = list_first_entry(&audio->idle_reqs, struct usb_request,
+				list);
+		list_del(&req->list);
+	}
+	spin_unlock_irqrestore(&audio->lock, flags);
+	return req;
+}
+
+/* send the appropriate number of packets to match our bitrate */
+static void audio_send(struct audio_dev *audio)
+{
+	struct snd_pcm_runtime *runtime;
+	struct usb_request *req;
+	int length, length1, length2, ret;
+	s64 msecs;
+	s64 frames;
+	ktime_t now;
+
+	/* audio->substream will be null if we have been closed */
+	if (!audio->substream)
+		return;
+	/* audio->buffer_pos will be null if we have been stopped */
+	if (!audio->buffer_pos)
+		return;
+
+	runtime = audio->substream->runtime;
+
+	/* compute number of frames to send */
+	now = ktime_get();
+	msecs = div_s64((ktime_to_ns(now) - ktime_to_ns(audio->start_time)),
+			1000000);
+	frames = div_s64((msecs * SAMPLE_RATE), 1000);
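+	/* e.g. 10 ms after start: frames = 10 * 44100 / 1000 = 441 */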
+
+	/* Readjust our frames_sent if we fall too far behind.
+	 * If we get too far behind it is better to drop some frames than
+	 * to keep sending data too fast in an attempt to catch up.
+	 */
+	if (frames - audio->frames_sent > 10 * FRAMES_PER_MSEC)
+		audio->frames_sent = frames - FRAMES_PER_MSEC;
+
+	frames -= audio->frames_sent;
+
+	/* We need to send something to keep the pipeline going */
+	if (frames <= 0)
+		frames = FRAMES_PER_MSEC;
+
+	while (frames > 0) {
+		req = audio_req_get(audio);
+		if (!req)
+			break;
+
+		length = frames_to_bytes(runtime, frames);
+		if (length > IN_EP_MAX_PACKET_SIZE)
+			length = IN_EP_MAX_PACKET_SIZE;
+
+		if (audio->buffer_pos + length > audio->buffer_end)
+			length1 = audio->buffer_end - audio->buffer_pos;
+		else
+			length1 = length;
+		memcpy(req->buf, audio->buffer_pos, length1);
+		if (length1 < length) {
+			/* Wrap around and copy remaining length
+			 * at beginning of buffer.
+			 */
+			length2 = length - length1;
+			memcpy(req->buf + length1, audio->buffer_start,
+					length2);
+			audio->buffer_pos = audio->buffer_start + length2;
+		} else {
+			audio->buffer_pos += length1;
+			if (audio->buffer_pos >= audio->buffer_end)
+				audio->buffer_pos = audio->buffer_start;
+		}
+
+		req->length = length;
+		ret = usb_ep_queue(audio->in_ep, req, GFP_ATOMIC);
+		if (ret < 0) {
+			pr_err("usb_ep_queue failed ret: %d\n", ret);
+			audio_req_put(audio, req);
+			break;
+		}
+
+		frames -= bytes_to_frames(runtime, length);
+		audio->frames_sent += bytes_to_frames(runtime, length);
+	}
+}
+
+static void audio_control_complete(struct usb_ep *ep, struct usb_request *req)
+{
+	/* nothing to do here */
+}
+
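+/*
+ * Completion handler for IN requests: recycle the request, report period
+ * boundaries to ALSA, and queue more data to keep the stream paced.
+ */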
+static void audio_data_complete(struct usb_ep *ep, struct usb_request *req)
+{
+	struct audio_dev *audio = req->context;
+
+	pr_debug("audio_data_complete req->status %d req->actual %d\n",
+		req->status, req->actual);
+
+	audio_req_put(audio, req);
+
+	if (!audio->buffer_start || req->status)
+		return;
+
+	audio->period_offset += req->actual;
+	if (audio->period_offset >= audio->period) {
+		snd_pcm_period_elapsed(audio->substream);
+		audio->period_offset = 0;
+	}
+	audio_send(audio);
+}
+
+static int audio_set_endpoint_req(struct usb_function *f,
+		const struct usb_ctrlrequest *ctrl)
+{
+	int value = -EOPNOTSUPP;
+	u16 ep = le16_to_cpu(ctrl->wIndex);
+	u16 len = le16_to_cpu(ctrl->wLength);
+	u16 w_value = le16_to_cpu(ctrl->wValue);
+
+	pr_debug("bRequest 0x%x, w_value 0x%04x, len %d, endpoint %d\n",
+			ctrl->bRequest, w_value, len, ep);
+
+	switch (ctrl->bRequest) {
+	case UAC_SET_CUR:
+	case UAC_SET_MIN:
+	case UAC_SET_MAX:
+	case UAC_SET_RES:
+		value = len;
+		break;
+	default:
+		break;
+	}
+
+	return value;
+}
+
+static int audio_get_endpoint_req(struct usb_function *f,
+		const struct usb_ctrlrequest *ctrl)
+{
+	struct usb_composite_dev *cdev = f->config->cdev;
+	int value = -EOPNOTSUPP;
+	u8 ep = ((le16_to_cpu(ctrl->wIndex) >> 8) & 0xFF);
+	u16 len = le16_to_cpu(ctrl->wLength);
+	u16 w_value = le16_to_cpu(ctrl->wValue);
+	u8 *buf = cdev->req->buf;
+
+	pr_debug("bRequest 0x%x, w_value 0x%04x, len %d, endpoint %d\n",
+			ctrl->bRequest, w_value, len, ep);
+
+	if (w_value == UAC_EP_CS_ATTR_SAMPLE_RATE << 8) {
+		switch (ctrl->bRequest) {
+		case UAC_GET_CUR:
+		case UAC_GET_MIN:
+		case UAC_GET_MAX:
+		case UAC_GET_RES:
+			/* return our sample rate */
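+			/* 3-byte little-endian tSamFreq: 44100 = 0x00AC44 */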
+			buf[0] = (u8)SAMPLE_RATE;
+			buf[1] = (u8)(SAMPLE_RATE >> 8);
+			buf[2] = (u8)(SAMPLE_RATE >> 16);
+			value = 3;
+			break;
+		default:
+			break;
+		}
+	}
+
+	return value;
+}
+
+static int
+audio_setup(struct usb_function *f, const struct usb_ctrlrequest *ctrl)
+{
+	struct usb_composite_dev *cdev = f->config->cdev;
+	struct usb_request *req = cdev->req;
+	int value = -EOPNOTSUPP;
+	u16 w_index = le16_to_cpu(ctrl->wIndex);
+	u16 w_value = le16_to_cpu(ctrl->wValue);
+	u16 w_length = le16_to_cpu(ctrl->wLength);
+
+	/* composite driver infrastructure handles everything; interface
+	 * activation uses set_alt().
+	 */
+	switch (ctrl->bRequestType) {
+	case USB_DIR_OUT | USB_TYPE_CLASS | USB_RECIP_ENDPOINT:
+		value = audio_set_endpoint_req(f, ctrl);
+		break;
+
+	case USB_DIR_IN | USB_TYPE_CLASS | USB_RECIP_ENDPOINT:
+		value = audio_get_endpoint_req(f, ctrl);
+		break;
+	}
+
+	/* respond with data transfer or status phase? */
+	if (value >= 0) {
+		pr_debug("audio req%02x.%02x v%04x i%04x l%d\n",
+			ctrl->bRequestType, ctrl->bRequest,
+			w_value, w_index, w_length);
+		req->zero = 0;
+		req->length = value;
+		req->complete = audio_control_complete;
+		value = usb_ep_queue(cdev->gadget->ep0, req, GFP_ATOMIC);
+		if (value < 0)
+			pr_err("audio response on err %d\n", value);
+	}
+
+	/* device either stalls (value < 0) or reports success */
+	return value;
+}
+
+static int audio_set_alt(struct usb_function *f, unsigned intf, unsigned alt)
+{
+	struct audio_dev *audio = func_to_audio(f);
+	struct usb_composite_dev *cdev = f->config->cdev;
+	int ret;
+
+	pr_debug("audio_set_alt intf %d, alt %d\n", intf, alt);
+
+	ret = config_ep_by_speed(cdev->gadget, f, audio->in_ep);
+	if (ret)
+		return ret;
+
+	usb_ep_enable(audio->in_ep);
+	return 0;
+}
+
+static void audio_disable(struct usb_function *f)
+{
+	struct audio_dev	*audio = func_to_audio(f);
+
+	pr_debug("audio_disable\n");
+	usb_ep_disable(audio->in_ep);
+}
+
+static void audio_free_func(struct usb_function *f)
+{
+	/* no-op */
+}
+
+/*-------------------------------------------------------------------------*/
+
+static void audio_build_desc(struct audio_dev *audio)
+{
+	u8 *sam_freq;
+	int rate;
+
+	/* Set channel numbers */
+	input_terminal_desc.bNrChannels = 2;
+	as_type_i_desc.bNrChannels = 2;
+
+	/* Set sample rates */
+	rate = SAMPLE_RATE;
+	sam_freq = as_type_i_desc.tSamFreq[0];
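+	/* tSamFreq is 3 bytes little-endian; this memcpy assumes an LE CPU */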
+	memcpy(sam_freq, &rate, 3);
+}
+
+
+static int snd_card_setup(struct usb_configuration *c,
+	struct audio_source_config *config);
+static struct audio_source_instance *to_fi_audio_source(
+	const struct usb_function_instance *fi);
+
+
+/* audio function driver setup/binding */
+static int
+audio_bind(struct usb_configuration *c, struct usb_function *f)
+{
+	struct usb_composite_dev *cdev = c->cdev;
+	struct audio_dev *audio = func_to_audio(f);
+	int status;
+	struct usb_ep *ep;
+	struct usb_request *req;
+	int i;
+	int err;
+
+	if (IS_ENABLED(CONFIG_USB_CONFIGFS)) {
+		struct audio_source_instance *fi_audio =
+				to_fi_audio_source(f->fi);
+		struct audio_source_config *config =
+				fi_audio->config;
+
+		err = snd_card_setup(c, config);
+		if (err)
+			return err;
+	}
+
+	audio_build_desc(audio);
+
+	/* allocate instance-specific interface IDs, and patch descriptors */
+	status = usb_interface_id(c, f);
+	if (status < 0)
+		goto fail;
+	ac_interface_desc.bInterfaceNumber = status;
+
+	/* AUDIO_AC_INTERFACE */
+	ac_header_desc.baInterfaceNr[0] = status;
+
+	status = usb_interface_id(c, f);
+	if (status < 0)
+		goto fail;
+	as_interface_alt_0_desc.bInterfaceNumber = status;
+	as_interface_alt_1_desc.bInterfaceNumber = status;
+
+	/* AUDIO_AS_INTERFACE */
+	ac_header_desc.baInterfaceNr[1] = status;
+
+	status = -ENODEV;
+
+	/* allocate our endpoint */
+	ep = usb_ep_autoconfig(cdev->gadget, &fs_as_in_ep_desc);
+	if (!ep)
+		goto fail;
+	audio->in_ep = ep;
+	ep->driver_data = audio; /* claim */
+
+	if (gadget_is_dualspeed(c->cdev->gadget))
+		hs_as_in_ep_desc.bEndpointAddress =
+			fs_as_in_ep_desc.bEndpointAddress;
+
+	f->fs_descriptors = fs_audio_desc;
+	f->hs_descriptors = hs_audio_desc;
+
+	for (i = 0, status = 0; i < IN_EP_REQ_COUNT && status == 0; i++) {
+		req = audio_request_new(ep, IN_EP_MAX_PACKET_SIZE);
+		if (req) {
+			req->context = audio;
+			req->complete = audio_data_complete;
+			audio_req_put(audio, req);
+		} else
+			status = -ENOMEM;
+	}
+
+fail:
+	return status;
+}
+
+static void
+audio_unbind(struct usb_configuration *c, struct usb_function *f)
+{
+	struct audio_dev *audio = func_to_audio(f);
+	struct usb_request *req;
+
+	while ((req = audio_req_get(audio)))
+		audio_request_free(req, audio->in_ep);
+
+	snd_card_free_when_closed(audio->card);
+	audio->card = NULL;
+	audio->pcm = NULL;
+	audio->substream = NULL;
+	audio->in_ep = NULL;
+
+	if (IS_ENABLED(CONFIG_USB_CONFIGFS)) {
+		struct audio_source_instance *fi_audio =
+				to_fi_audio_source(f->fi);
+		struct audio_source_config *config =
+				fi_audio->config;
+
+		config->card = -1;
+		config->device = -1;
+	}
+}
+
+static void audio_pcm_playback_start(struct audio_dev *audio)
+{
+	audio->start_time = ktime_get();
+	audio->frames_sent = 0;
+	audio_send(audio);
+}
+
+static void audio_pcm_playback_stop(struct audio_dev *audio)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&audio->lock, flags);
+	audio->buffer_start = NULL;
+	audio->buffer_end = NULL;
+	audio->buffer_pos = NULL;
+	spin_unlock_irqrestore(&audio->lock, flags);
+}
+
+static int audio_pcm_open(struct snd_pcm_substream *substream)
+{
+	struct snd_pcm_runtime *runtime = substream->runtime;
+	struct audio_dev *audio = substream->private_data;
+
+	runtime->private_data = audio;
+	runtime->hw = audio_hw_info;
+	snd_pcm_limit_hw_rates(runtime);
+	runtime->hw.channels_max = 2;
+
+	audio->substream = substream;
+
+	/* Add the QoS request and set the latency to 0 */
+	cpu_latency_qos_add_request(&audio->pm_qos, 0);
+
+	return 0;
+}
+
+static int audio_pcm_close(struct snd_pcm_substream *substream)
+{
+	struct audio_dev *audio = substream->private_data;
+	unsigned long flags;
+
+	spin_lock_irqsave(&audio->lock, flags);
+
+	/* Remove the QoS request */
+	cpu_latency_qos_remove_request(&audio->pm_qos);
+
+	audio->substream = NULL;
+	spin_unlock_irqrestore(&audio->lock, flags);
+
+	return 0;
+}
+
+static int audio_pcm_hw_params(struct snd_pcm_substream *substream,
+				struct snd_pcm_hw_params *params)
+{
+	unsigned int channels = params_channels(params);
+	unsigned int rate = params_rate(params);
+
+	if (rate != SAMPLE_RATE)
+		return -EINVAL;
+	if (channels != 2)
+		return -EINVAL;
+
+	return snd_pcm_lib_alloc_vmalloc_buffer(substream,
+		params_buffer_bytes(params));
+}
+
+static int audio_pcm_hw_free(struct snd_pcm_substream *substream)
+{
+	return snd_pcm_lib_free_vmalloc_buffer(substream);
+}
+
+static int audio_pcm_prepare(struct snd_pcm_substream *substream)
+{
+	struct snd_pcm_runtime *runtime = substream->runtime;
+	struct audio_dev *audio = runtime->private_data;
+
+	audio->period = snd_pcm_lib_period_bytes(substream);
+	audio->period_offset = 0;
+	audio->buffer_start = runtime->dma_area;
+	audio->buffer_end = audio->buffer_start
+		+ snd_pcm_lib_buffer_bytes(substream);
+	audio->buffer_pos = audio->buffer_start;
+
+	return 0;
+}
+
+static snd_pcm_uframes_t audio_pcm_pointer(struct snd_pcm_substream *substream)
+{
+	struct snd_pcm_runtime *runtime = substream->runtime;
+	struct audio_dev *audio = runtime->private_data;
+	ssize_t bytes = audio->buffer_pos - audio->buffer_start;
+
+	/* return offset of next frame to fill in our buffer */
+	return bytes_to_frames(runtime, bytes);
+}
+
+static int audio_pcm_playback_trigger(struct snd_pcm_substream *substream,
+					int cmd)
+{
+	struct audio_dev *audio = substream->runtime->private_data;
+	int ret = 0;
+
+	switch (cmd) {
+	case SNDRV_PCM_TRIGGER_START:
+	case SNDRV_PCM_TRIGGER_RESUME:
+		audio_pcm_playback_start(audio);
+		break;
+
+	case SNDRV_PCM_TRIGGER_STOP:
+	case SNDRV_PCM_TRIGGER_SUSPEND:
+		audio_pcm_playback_stop(audio);
+		break;
+
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+static struct audio_dev _audio_dev = {
+	.func = {
+		.name = "audio_source",
+		.bind = audio_bind,
+		.unbind = audio_unbind,
+		.set_alt = audio_set_alt,
+		.setup = audio_setup,
+		.disable = audio_disable,
+		.free_func = audio_free_func,
+	},
+	.lock = __SPIN_LOCK_UNLOCKED(_audio_dev.lock),
+	.idle_reqs = LIST_HEAD_INIT(_audio_dev.idle_reqs),
+};
+
+static struct snd_pcm_ops audio_playback_ops = {
+	.open		= audio_pcm_open,
+	.close		= audio_pcm_close,
+	.ioctl		= snd_pcm_lib_ioctl,
+	.hw_params	= audio_pcm_hw_params,
+	.hw_free	= audio_pcm_hw_free,
+	.prepare	= audio_pcm_prepare,
+	.trigger	= audio_pcm_playback_trigger,
+	.pointer	= audio_pcm_pointer,
+};
+
+int audio_source_bind_config(struct usb_configuration *c,
+		struct audio_source_config *config)
+{
+	struct audio_dev *audio;
+	int err;
+
+	config->card = -1;
+	config->device = -1;
+
+	audio = &_audio_dev;
+
+	err = snd_card_setup(c, config);
+	if (err)
+		return err;
+
+	err = usb_add_function(c, &audio->func);
+	if (err)
+		goto add_fail;
+
+	return 0;
+
+add_fail:
+	snd_card_free(audio->card);
+	return err;
+}
+
+static int snd_card_setup(struct usb_configuration *c,
+		struct audio_source_config *config)
+{
+	struct audio_dev *audio;
+	struct snd_card *card;
+	struct snd_pcm *pcm;
+	int err;
+
+	audio = &_audio_dev;
+
+	err = snd_card_new(&c->cdev->gadget->dev,
+			SNDRV_DEFAULT_IDX1, SNDRV_DEFAULT_STR1,
+			THIS_MODULE, 0, &card);
+	if (err)
+		return err;
+
+	err = snd_pcm_new(card, "USB audio source", 0, 1, 0, &pcm);
+	if (err)
+		goto pcm_fail;
+
+	pcm->private_data = audio;
+	pcm->info_flags = 0;
+	audio->pcm = pcm;
+
+	strlcpy(pcm->name, "USB gadget audio", sizeof(pcm->name));
+
+	snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_PLAYBACK, &audio_playback_ops);
+	snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV,
+				NULL, 0, 64 * 1024);
+
+	strlcpy(card->driver, "audio_source", sizeof(card->driver));
+	strlcpy(card->shortname, card->driver, sizeof(card->shortname));
+	strlcpy(card->longname, "USB accessory audio source",
+		sizeof(card->longname));
+
+	err = snd_card_register(card);
+	if (err)
+		goto register_fail;
+
+	config->card = pcm->card->number;
+	config->device = pcm->device;
+	audio->card = card;
+	return 0;
+
+register_fail:
+pcm_fail:
+	/* audio->card is only set on success; free the card created above */
+	snd_card_free(card);
+	return err;
+}
+
+static struct audio_source_instance *to_audio_source_instance(
+					struct config_item *item)
+{
+	return container_of(to_config_group(item), struct audio_source_instance,
+		func_inst.group);
+}
+
+static struct audio_source_instance *to_fi_audio_source(
+					const struct usb_function_instance *fi)
+{
+	return container_of(fi, struct audio_source_instance, func_inst);
+}
+
+static void audio_source_attr_release(struct config_item *item)
+{
+	struct audio_source_instance *fi_audio = to_audio_source_instance(item);
+
+	usb_put_function_instance(&fi_audio->func_inst);
+}
+
+static int audio_source_set_inst_name(struct usb_function_instance *fi,
+					const char *name)
+{
+	struct audio_source_instance *fi_audio;
+	char *ptr;
+	int name_len;
+
+	name_len = strlen(name) + 1;
+	if (name_len > MAX_INST_NAME_LEN)
+		return -ENAMETOOLONG;
+
+	ptr = kstrndup(name, name_len, GFP_KERNEL);
+	if (!ptr)
+		return -ENOMEM;
+
+	fi_audio = to_fi_audio_source(fi);
+	fi_audio->name = ptr;
+
+	return 0;
+}
+
+static void audio_source_free_inst(struct usb_function_instance *fi)
+{
+	struct audio_source_instance *fi_audio;
+
+	fi_audio = to_fi_audio_source(fi);
+	device_destroy(fi_audio->audio_device->class,
+			fi_audio->audio_device->devt);
+	kfree(fi_audio->name);
+	kfree(fi_audio->config);
+}
+
+static ssize_t audio_source_pcm_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct audio_source_instance *fi_audio = dev_get_drvdata(dev);
+	struct audio_source_config *config = fi_audio->config;
+
+	/* print PCM card and device numbers */
+	return sprintf(buf, "%d %d\n", config->card, config->device);
+}
+
+struct device *create_function_device(char *name);
+
+static struct usb_function_instance *audio_source_alloc_inst(void)
+{
+	struct audio_source_instance *fi_audio;
+	struct device_attribute **attrs;
+	struct device_attribute *attr;
+	struct device *dev;
+	void *err_ptr;
+	int err = 0;
+
+	fi_audio = kzalloc(sizeof(*fi_audio), GFP_KERNEL);
+	if (!fi_audio)
+		return ERR_PTR(-ENOMEM);
+
+	fi_audio->func_inst.set_inst_name = audio_source_set_inst_name;
+	fi_audio->func_inst.free_func_inst = audio_source_free_inst;
+
+	fi_audio->config = kzalloc(sizeof(struct audio_source_config),
+							GFP_KERNEL);
+	if (!fi_audio->config) {
+		err_ptr = ERR_PTR(-ENOMEM);
+		goto fail_audio;
+	}
+
+	config_group_init_type_name(&fi_audio->func_inst.group, "",
+						&audio_source_func_type);
+	dev = create_function_device("f_audio_source");
+
+	if (IS_ERR(dev)) {
+		err_ptr = dev;
+		goto fail_audio_config;
+	}
+
+	fi_audio->config->card = -1;
+	fi_audio->config->device = -1;
+	fi_audio->audio_device = dev;
+
+	attrs = audio_source_function_attributes;
+	if (attrs) {
+		while ((attr = *attrs++) && !err)
+			err = device_create_file(dev, attr);
+		if (err) {
+			err_ptr = ERR_PTR(-EINVAL);
+			goto fail_device;
+		}
+	}
+
+	dev_set_drvdata(dev, fi_audio);
+	_audio_dev.config = fi_audio->config;
+
+	return &fi_audio->func_inst;
+
+fail_device:
+	device_destroy(dev->class, dev->devt);
+fail_audio_config:
+	kfree(fi_audio->config);
+fail_audio:
+	kfree(fi_audio);
+	return err_ptr;
+}
+
+static struct usb_function *audio_source_alloc(struct usb_function_instance *fi)
+{
+	return &_audio_dev.func;
+}
+
+DECLARE_USB_FUNCTION_INIT(audio_source, audio_source_alloc_inst,
+			audio_source_alloc);
+MODULE_LICENSE("GPL");
diff --git a/drivers/usb/gadget/function/f_midi.c b/drivers/usb/gadget/function/f_midi.c
index fddf539..9511b22 100644
--- a/drivers/usb/gadget/function/f_midi.c
+++ b/drivers/usb/gadget/function/f_midi.c
@@ -1268,6 +1268,65 @@ static void f_midi_free_inst(struct usb_function_instance *f)
 	}
 }
 
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+extern struct device *create_function_device(char *name);
+static ssize_t alsa_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct usb_function_instance *fi_midi = dev_get_drvdata(dev);
+	struct f_midi *midi;
+
+	if (!fi_midi || !fi_midi->f)
+		dev_warn(dev, "f_midi: function not set\n");
+
+	if (fi_midi && fi_midi->f) {
+		midi = func_to_midi(fi_midi->f);
+		if (midi->rmidi && midi->card && midi->rmidi->card)
+			return sprintf(buf, "%d %d\n",
+			midi->rmidi->card->number, midi->rmidi->device);
+	}
+
+	/* fall back to "-1 -1" when no ALSA device is bound */
+	return sprintf(buf, "%d %d\n", -1, -1);
+}
+
+static DEVICE_ATTR(alsa, S_IRUGO, alsa_show, NULL);
+
+static struct device_attribute *alsa_function_attributes[] = {
+	&dev_attr_alsa,
+	NULL
+};
+
+static int create_alsa_device(struct usb_function_instance *fi)
+{
+	struct device *dev;
+	struct device_attribute **attrs;
+	struct device_attribute *attr;
+	int err = 0;
+
+	dev = create_function_device("f_midi");
+	if (IS_ERR(dev))
+		return PTR_ERR(dev);
+
+	attrs = alsa_function_attributes;
+	if (attrs) {
+		while ((attr = *attrs++) && !err)
+			err = device_create_file(dev, attr);
+		if (err) {
+			device_destroy(dev->class, dev->devt);
+			return -EINVAL;
+		}
+	}
+	dev_set_drvdata(dev, fi);
+	return 0;
+}
+#else
+static int create_alsa_device(struct usb_function_instance *fi)
+{
+	return 0;
+}
+#endif
+
 static struct usb_function_instance *f_midi_alloc_inst(void)
 {
 	struct f_midi_opts *opts;
@@ -1286,6 +1345,11 @@ static struct usb_function_instance *f_midi_alloc_inst(void)
 	opts->out_ports = 1;
 	opts->refcnt = 1;
 
+	if (create_alsa_device(&opts->func_inst)) {
+		kfree(opts);
+		return ERR_PTR(-ENODEV);
+	}
+
 	config_group_init_type_name(&opts->func_inst.group, "",
 				    &midi_func_type);
 
@@ -1306,6 +1370,7 @@ static void f_midi_free(struct usb_function *f)
 		kfifo_free(&midi->in_req_fifo);
 		kfree(midi);
 		free = true;
+		opts->func_inst.f = NULL;
 	}
 	mutex_unlock(&opts->lock);
 
@@ -1393,6 +1458,7 @@ static struct usb_function *f_midi_alloc(struct usb_function_instance *fi)
 	midi->func.disable	= f_midi_disable;
 	midi->func.free_func	= f_midi_free;
 
+	fi->f = &midi->func;
 	return &midi->func;
 
 midi_free:
diff --git a/drivers/usb/gadget/function/f_uvc.c b/drivers/usb/gadget/function/f_uvc.c
index 4419b79..32f2c16 100644
--- a/drivers/usb/gadget/function/f_uvc.c
+++ b/drivers/usb/gadget/function/f_uvc.c
@@ -39,9 +39,6 @@ MODULE_PARM_DESC(trace, "Trace level bitmask");
 
 /* string IDs are assigned dynamically */
 
-#define UVC_STRING_CONTROL_IDX			0
-#define UVC_STRING_STREAMING_IDX		1
-
 static struct usb_string uvc_en_us_strings[] = {
 	/* [UVC_STRING_CONTROL_IDX].s = DYNAMIC, */
 	[UVC_STRING_STREAMING_IDX].s = "Video Streaming",
@@ -229,6 +226,8 @@ uvc_function_setup(struct usb_function *f, const struct usb_ctrlrequest *ctrl)
 	struct uvc_device *uvc = to_uvc(f);
 	struct v4l2_event v4l2_event;
 	struct uvc_event *uvc_event = (void *)&v4l2_event.u.data;
+	unsigned int interface = le16_to_cpu(ctrl->wIndex) & 0xff;
+	struct usb_ctrlrequest *mctrl;
 
 	if ((ctrl->bRequestType & USB_TYPE_MASK) != USB_TYPE_CLASS) {
 		uvcg_info(f, "invalid request type\n");
@@ -249,6 +248,16 @@ uvc_function_setup(struct usb_function *f, const struct usb_ctrlrequest *ctrl)
 	memset(&v4l2_event, 0, sizeof(v4l2_event));
 	v4l2_event.type = UVC_EVENT_SETUP;
 	memcpy(&uvc_event->req, ctrl, sizeof(uvc_event->req));
+
+	/* Check the interface number, and fix up the interface number in
+	 * the ctrl request so that userspace doesn't have to bother with
+	 * offset and configfs parsing.
+	 */
+	mctrl = &uvc_event->req;
+	mctrl->wIndex &= ~cpu_to_le16(0xff);
+	if (interface == uvc->streaming_intf)
+		mctrl->wIndex = cpu_to_le16(UVC_STRING_STREAMING_IDX);
+
 	v4l2_event_queue(&uvc->vdev, &v4l2_event);
 
 	return 0;
diff --git a/drivers/usb/gadget/function/uvc_configfs.c b/drivers/usb/gadget/function/uvc_configfs.c
index 4303a32..76cb60d 100644
--- a/drivers/usb/gadget/function/uvc_configfs.c
+++ b/drivers/usb/gadget/function/uvc_configfs.c
@@ -1512,7 +1512,7 @@ UVCG_UNCOMPRESSED_ATTR(b_bits_per_pixel, bBitsPerPixel, 8);
 UVCG_UNCOMPRESSED_ATTR(b_default_frame_index, bDefaultFrameIndex, 8);
 UVCG_UNCOMPRESSED_ATTR_RO(b_aspect_ratio_x, bAspectRatioX, 8);
 UVCG_UNCOMPRESSED_ATTR_RO(b_aspect_ratio_y, bAspectRatioY, 8);
-UVCG_UNCOMPRESSED_ATTR_RO(bm_interface_flags, bmInterfaceFlags, 8);
+UVCG_UNCOMPRESSED_ATTR_RO(bm_interlace_flags, bmInterlaceFlags, 8);
 
 #undef UVCG_UNCOMPRESSED_ATTR
 #undef UVCG_UNCOMPRESSED_ATTR_RO
@@ -1541,7 +1541,7 @@ static struct configfs_attribute *uvcg_uncompressed_attrs[] = {
 	&uvcg_uncompressed_attr_b_default_frame_index,
 	&uvcg_uncompressed_attr_b_aspect_ratio_x,
 	&uvcg_uncompressed_attr_b_aspect_ratio_y,
-	&uvcg_uncompressed_attr_bm_interface_flags,
+	&uvcg_uncompressed_attr_bm_interlace_flags,
 	&uvcg_uncompressed_attr_bma_controls,
 	NULL,
 };
@@ -1574,7 +1574,7 @@ static struct config_group *uvcg_uncompressed_make(struct config_group *group,
 	h->desc.bDefaultFrameIndex	= 1;
 	h->desc.bAspectRatioX		= 0;
 	h->desc.bAspectRatioY		= 0;
-	h->desc.bmInterfaceFlags	= 0;
+	h->desc.bmInterlaceFlags	= 0;
 	h->desc.bCopyProtect		= 0;
 
 	INIT_LIST_HEAD(&h->fmt.frames);
@@ -1700,7 +1700,7 @@ UVCG_MJPEG_ATTR(b_default_frame_index, bDefaultFrameIndex, 8);
 UVCG_MJPEG_ATTR_RO(bm_flags, bmFlags, 8);
 UVCG_MJPEG_ATTR_RO(b_aspect_ratio_x, bAspectRatioX, 8);
 UVCG_MJPEG_ATTR_RO(b_aspect_ratio_y, bAspectRatioY, 8);
-UVCG_MJPEG_ATTR_RO(bm_interface_flags, bmInterfaceFlags, 8);
+UVCG_MJPEG_ATTR_RO(bm_interlace_flags, bmInterlaceFlags, 8);
 
 #undef UVCG_MJPEG_ATTR
 #undef UVCG_MJPEG_ATTR_RO
@@ -1728,7 +1728,7 @@ static struct configfs_attribute *uvcg_mjpeg_attrs[] = {
 	&uvcg_mjpeg_attr_bm_flags,
 	&uvcg_mjpeg_attr_b_aspect_ratio_x,
 	&uvcg_mjpeg_attr_b_aspect_ratio_y,
-	&uvcg_mjpeg_attr_bm_interface_flags,
+	&uvcg_mjpeg_attr_bm_interlace_flags,
 	&uvcg_mjpeg_attr_bma_controls,
 	NULL,
 };
@@ -1755,7 +1755,7 @@ static struct config_group *uvcg_mjpeg_make(struct config_group *group,
 	h->desc.bDefaultFrameIndex	= 1;
 	h->desc.bAspectRatioX		= 0;
 	h->desc.bAspectRatioY		= 0;
-	h->desc.bmInterfaceFlags	= 0;
+	h->desc.bmInterlaceFlags	= 0;
 	h->desc.bCopyProtect		= 0;
 
 	INIT_LIST_HEAD(&h->fmt.frames);
diff --git a/drivers/usb/gadget/legacy/webcam.c b/drivers/usb/gadget/legacy/webcam.c
index e9b5846..c06dd1a 100644
--- a/drivers/usb/gadget/legacy/webcam.c
+++ b/drivers/usb/gadget/legacy/webcam.c
@@ -171,7 +171,7 @@ static const struct uvc_format_uncompressed uvc_format_yuv = {
 	.bDefaultFrameIndex	= 1,
 	.bAspectRatioX		= 0,
 	.bAspectRatioY		= 0,
-	.bmInterfaceFlags	= 0,
+	.bmInterlaceFlags	= 0,
 	.bCopyProtect		= 0,
 };
 
@@ -222,7 +222,7 @@ static const struct uvc_format_mjpeg uvc_format_mjpg = {
 	.bDefaultFrameIndex	= 1,
 	.bAspectRatioX		= 0,
 	.bAspectRatioY		= 0,
-	.bmInterfaceFlags	= 0,
+	.bmInterlaceFlags	= 0,
 	.bCopyProtect		= 0,
 };
 
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 0a53a61..8f4730f 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -167,7 +167,7 @@
 	 If unsure, say 'N'.
 
 config VIRTIO_DMA_SHARED_BUFFER
-	tristate
+	tristate "Virtio DMA shared buffer support"
 	depends on DMA_SHARED_BUFFER
 	help
 	 This option adds a flavor of dma buffers that are backed by
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 3f78a3a..72fe240 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -553,11 +553,15 @@ static int init_vqs(struct virtio_balloon *vb)
 		virtqueue_kick(vb->stats_vq);
 	}
 
-	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
 		vb->free_page_vq = vqs[VIRTIO_BALLOON_VQ_FREE_PAGE];
+		virtqueue_disable_dma_api_for_buffers(vb->free_page_vq);
+	}
 
-	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING))
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING)) {
 		vb->reporting_vq = vqs[VIRTIO_BALLOON_VQ_REPORTING];
+		virtqueue_disable_dma_api_for_buffers(vb->reporting_vq);
+	}
 
 	return 0;
 }
@@ -1093,7 +1097,6 @@ static int virtballoon_validate(struct virtio_device *vdev)
 	else if (!virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON))
 		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_REPORTING);
 
-	__virtio_clear_bit(vdev, VIRTIO_F_ACCESS_PLATFORM);
 	return 0;
 }
 
diff --git a/drivers/virtio/virtio_dma_buf.c b/drivers/virtio/virtio_dma_buf.c
index 2521a750..19a843f 100644
--- a/drivers/virtio/virtio_dma_buf.c
+++ b/drivers/virtio/virtio_dma_buf.c
@@ -25,11 +25,14 @@ struct dma_buf *virtio_dma_buf_export
 			     const struct virtio_dma_buf_ops, ops);
 
 	if (!exp_info->ops ||
-	    exp_info->ops->attach != &virtio_dma_buf_attach ||
 	    !virtio_ops->get_uuid) {
 		return ERR_PTR(-EINVAL);
 	}
 
+	if (!(IS_ENABLED(CONFIG_CFI_CLANG) && IS_ENABLED(CONFIG_MODULES)) &&
+	    exp_info->ops->attach != &virtio_dma_buf_attach)
+		return ERR_PTR(-EINVAL);
+
 	return dma_buf_export(exp_info);
 }
 EXPORT_SYMBOL(virtio_dma_buf_export);
@@ -60,6 +63,9 @@ EXPORT_SYMBOL(virtio_dma_buf_attach);
  */
 bool is_virtio_dma_buf(struct dma_buf *dma_buf)
 {
+	if (IS_ENABLED(CONFIG_CFI_CLANG) && IS_ENABLED(CONFIG_MODULES))
+		return true;
+
 	return dma_buf->ops->attach == &virtio_dma_buf_attach;
 }
 EXPORT_SYMBOL(is_virtio_dma_buf);
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 2e7689b..3eca714 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2863,4 +2863,14 @@ const struct vring *virtqueue_get_vring(struct virtqueue *vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_vring);
 
+/*
+ * Prevents use of DMA API for buffers passed via the specified virtqueue.
+ * DMA API may still be used for the vrings themselves.
+ */
+void virtqueue_disable_dma_api_for_buffers(struct virtqueue *vq)
+{
+	to_vvq(vq)->use_dma_api = false;
+}
+EXPORT_SYMBOL_GPL(virtqueue_disable_dma_api_for_buffers);
+
 MODULE_LICENSE("GPL");
diff --git a/fs/Kconfig b/fs/Kconfig
index 2685a4d..6e109b4 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -127,6 +127,7 @@
 source "fs/autofs/Kconfig"
 source "fs/fuse/Kconfig"
 source "fs/overlayfs/Kconfig"
+source "fs/incfs/Kconfig"
 
 menu "Caches"
 
diff --git a/fs/Makefile b/fs/Makefile
index 4dea178..bfdaafb 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -114,6 +114,7 @@
 obj-$(CONFIG_FUSE_FS)		+= fuse/
 obj-$(CONFIG_OVERLAY_FS)	+= overlayfs/
 obj-$(CONFIG_ORANGEFS_FS)       += orangefs/
+obj-$(CONFIG_INCREMENTAL_FS)	+= incfs/
 obj-$(CONFIG_UDF_FS)		+= udf/
 obj-$(CONFIG_SUN_OPENPROMFS)	+= openpromfs/
 obj-$(CONFIG_OMFS_FS)		+= omfs/
diff --git a/fs/OWNERS b/fs/OWNERS
new file mode 100644
index 0000000..7780f6b
--- /dev/null
+++ b/fs/OWNERS
@@ -0,0 +1 @@
+per-file {crypto,verity}/**=ebiggers@google.com
diff --git a/fs/crypto/fscrypt_private.h b/fs/crypto/fscrypt_private.h
index d5f68a0..98d1901 100644
--- a/fs/crypto/fscrypt_private.h
+++ b/fs/crypto/fscrypt_private.h
@@ -27,6 +27,27 @@
  */
 #define FSCRYPT_MIN_KEY_SIZE	16
 
+/* Maximum size of a standard fscrypt master key */
+#define FSCRYPT_MAX_STANDARD_KEY_SIZE	64
+
+/* Maximum size of a hardware-wrapped fscrypt master key */
+#define FSCRYPT_MAX_HW_WRAPPED_KEY_SIZE	BLK_CRYPTO_MAX_HW_WRAPPED_KEY_SIZE
+
+/*
+ * Maximum size of an fscrypt master key across both key types.
+ * This should just use max(), but max() doesn't work in a struct definition.
+ */
+#define FSCRYPT_MAX_ANY_KEY_SIZE \
+	(FSCRYPT_MAX_HW_WRAPPED_KEY_SIZE > FSCRYPT_MAX_STANDARD_KEY_SIZE ? \
+	 FSCRYPT_MAX_HW_WRAPPED_KEY_SIZE : FSCRYPT_MAX_STANDARD_KEY_SIZE)
+
+/*
+ * FSCRYPT_MAX_KEY_SIZE is defined in the UAPI header, but the addition of
+ * hardware-wrapped keys has made it misleading as it's only for standard keys.
+ * Don't use it in kernel code; use one of the above constants instead.
+ */
+#undef FSCRYPT_MAX_KEY_SIZE
+
 #define FSCRYPT_CONTEXT_V1	1
 #define FSCRYPT_CONTEXT_V2	2
 
@@ -332,7 +353,8 @@ void fscrypt_destroy_hkdf(struct fscrypt_hkdf *hkdf);
 
 /* inline_crypt.c */
 #ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT
-int fscrypt_select_encryption_impl(struct fscrypt_info *ci);
+int fscrypt_select_encryption_impl(struct fscrypt_info *ci,
+				   bool is_hw_wrapped_key);
 
 static inline bool
 fscrypt_using_inline_encryption(const struct fscrypt_info *ci)
@@ -341,12 +363,17 @@ fscrypt_using_inline_encryption(const struct fscrypt_info *ci)
 }
 
 int fscrypt_prepare_inline_crypt_key(struct fscrypt_prepared_key *prep_key,
-				     const u8 *raw_key,
+				     const u8 *raw_key, size_t raw_key_size,
+				     bool is_hw_wrapped,
 				     const struct fscrypt_info *ci);
 
 void fscrypt_destroy_inline_crypt_key(struct super_block *sb,
 				      struct fscrypt_prepared_key *prep_key);
 
+int fscrypt_derive_sw_secret(struct super_block *sb,
+			     const u8 *wrapped_key, size_t wrapped_key_size,
+			     u8 sw_secret[BLK_CRYPTO_SW_SECRET_SIZE]);
+
 /*
  * Check whether the crypto transform or blk-crypto key has been allocated in
  * @prep_key, depending on which encryption implementation the file will use.
@@ -370,7 +397,8 @@ fscrypt_is_key_prepared(struct fscrypt_prepared_key *prep_key,
 
 #else /* CONFIG_FS_ENCRYPTION_INLINE_CRYPT */
 
-static inline int fscrypt_select_encryption_impl(struct fscrypt_info *ci)
+static inline int fscrypt_select_encryption_impl(struct fscrypt_info *ci,
+						 bool is_hw_wrapped_key)
 {
 	return 0;
 }
@@ -383,7 +411,8 @@ fscrypt_using_inline_encryption(const struct fscrypt_info *ci)
 
 static inline int
 fscrypt_prepare_inline_crypt_key(struct fscrypt_prepared_key *prep_key,
-				 const u8 *raw_key,
+				 const u8 *raw_key, size_t raw_key_size,
+				 bool is_hw_wrapped,
 				 const struct fscrypt_info *ci)
 {
 	WARN_ON(1);
@@ -396,6 +425,15 @@ fscrypt_destroy_inline_crypt_key(struct super_block *sb,
 {
 }
 
+static inline int
+fscrypt_derive_sw_secret(struct super_block *sb,
+			 const u8 *wrapped_key, size_t wrapped_key_size,
+			 u8 sw_secret[BLK_CRYPTO_SW_SECRET_SIZE])
+{
+	fscrypt_warn(NULL, "kernel doesn't support hardware-wrapped keys");
+	return -EOPNOTSUPP;
+}
+
 static inline bool
 fscrypt_is_key_prepared(struct fscrypt_prepared_key *prep_key,
 			const struct fscrypt_info *ci)
@@ -412,20 +450,38 @@ fscrypt_is_key_prepared(struct fscrypt_prepared_key *prep_key,
 struct fscrypt_master_key_secret {
 
 	/*
-	 * For v2 policy keys: HKDF context keyed by this master key.
-	 * For v1 policy keys: not set (hkdf.hmac_tfm == NULL).
+	 * The KDF with which subkeys of this key can be derived.
+	 *
+	 * For v1 policy keys, this isn't applicable and won't be set.
+	 * Otherwise, this KDF will be keyed by this master key if
+	 * ->is_hw_wrapped=false, or by the "software secret" that hardware
+	 * derived from this master key if ->is_hw_wrapped=true.
 	 */
 	struct fscrypt_hkdf	hkdf;
 
 	/*
+	 * True if this key is a hardware-wrapped key; false if this key is a
+	 * standard key (i.e. a "software key").  For v1 policy keys this will
+	 * always be false, as v1 policy support is a legacy feature which
+	 * doesn't support newer functionality such as hardware-wrapped keys.
+	 */
+	bool			is_hw_wrapped;
+
+	/*
 	 * Size of the raw key in bytes.  This remains set even if ->raw was
 	 * zeroized due to no longer being needed.  I.e. we still remember the
 	 * size of the key even if we don't need to remember the key itself.
 	 */
 	u32			size;
 
-	/* For v1 policy keys: the raw key.  Wiped for v2 policy keys. */
-	u8			raw[FSCRYPT_MAX_KEY_SIZE];
+	/*
+	 * The raw key which userspace provided, when still needed.  This can be
+	 * either a standard key or a hardware-wrapped key, as indicated by
+	 * ->is_hw_wrapped.  In the case of a standard, v2 policy key, there is
+	 * no need to remember the raw key separately from ->hkdf so this field
+	 * will be zeroized as soon as ->hkdf is initialized.
+	 */
+	u8			raw[FSCRYPT_MAX_ANY_KEY_SIZE];
 
 } __randomize_layout;
 
@@ -439,13 +495,7 @@ struct fscrypt_master_key_secret {
 struct fscrypt_master_key {
 
 	/*
-	 * Back-pointer to the super_block of the filesystem to which this
-	 * master key has been added.  Only valid if ->mk_active_refs > 0.
-	 */
-	struct super_block			*mk_sb;
-
-	/*
-	 * Link in ->mk_sb->s_master_keys->key_hashtable.
+	 * Link in ->s_master_keys->key_hashtable.
 	 * Only valid if ->mk_active_refs > 0.
 	 */
 	struct hlist_node			mk_node;
@@ -456,7 +506,7 @@ struct fscrypt_master_key {
 	/*
 	 * Active and structural reference counts.  An active ref guarantees
 	 * that the struct continues to exist, continues to be in the keyring
-	 * ->mk_sb->s_master_keys, and that any embedded subkeys (e.g.
+	 * ->s_master_keys, and that any embedded subkeys (e.g.
 	 * ->mk_direct_keys) that have been prepared continue to exist.
 	 * A structural ref only guarantees that the struct continues to exist.
 	 *
@@ -569,7 +619,8 @@ static inline int master_key_spec_len(const struct fscrypt_key_specifier *spec)
 
 void fscrypt_put_master_key(struct fscrypt_master_key *mk);
 
-void fscrypt_put_master_key_activeref(struct fscrypt_master_key *mk);
+void fscrypt_put_master_key_activeref(struct super_block *sb,
+				      struct fscrypt_master_key *mk);
 
 struct fscrypt_master_key *
 fscrypt_find_master_key(struct super_block *sb,
diff --git a/fs/crypto/hkdf.c b/fs/crypto/hkdf.c
index 7607d18b..41e7c9b 100644
--- a/fs/crypto/hkdf.c
+++ b/fs/crypto/hkdf.c
@@ -4,7 +4,9 @@
  * Function"), aka RFC 5869.  See also the original paper (Krawczyk 2010):
  * "Cryptographic Extraction and Key Derivation: The HKDF Scheme".
  *
- * This is used to derive keys from the fscrypt master keys.
+ * This is used to derive keys from the fscrypt master keys (or from the
+ * "software secrets" which hardware derives from the fscrypt master keys, in
+ * the case that the fscrypt master keys are hardware-wrapped keys).
  *
  * Copyright 2019 Google LLC
  */
diff --git a/fs/crypto/inline_crypt.c b/fs/crypto/inline_crypt.c
index cea8b14..64fff38 100644
--- a/fs/crypto/inline_crypt.c
+++ b/fs/crypto/inline_crypt.c
@@ -12,7 +12,7 @@
  * provides the key and IV to use.
  */
 
-#include <linux/blk-crypto-profile.h>
+#include <linux/blk-crypto.h>
 #include <linux/blkdev.h>
 #include <linux/buffer_head.h>
 #include <linux/sched/mm.h>
@@ -77,10 +77,8 @@ static void fscrypt_log_blk_crypto_impl(struct fscrypt_mode *mode,
 	unsigned int i;
 
 	for (i = 0; i < num_devs; i++) {
-		struct request_queue *q = bdev_get_queue(devs[i]);
-
 		if (!IS_ENABLED(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK) ||
-		    __blk_crypto_cfg_supported(q->crypto_profile, cfg)) {
+		    blk_crypto_config_supported_natively(devs[i], cfg)) {
 			if (!xchg(&mode->logged_blk_crypto_native, 1))
 				pr_info("fscrypt: %s using blk-crypto (native)\n",
 					mode->friendly_name);
@@ -92,7 +90,8 @@ static void fscrypt_log_blk_crypto_impl(struct fscrypt_mode *mode,
 }
 
 /* Enable inline encryption for this file if supported. */
-int fscrypt_select_encryption_impl(struct fscrypt_info *ci)
+int fscrypt_select_encryption_impl(struct fscrypt_info *ci,
+				   bool is_hw_wrapped_key)
 {
 	const struct inode *inode = ci->ci_inode;
 	struct super_block *sb = inode->i_sb;
@@ -133,14 +132,16 @@ int fscrypt_select_encryption_impl(struct fscrypt_info *ci)
 	crypto_cfg.crypto_mode = ci->ci_mode->blk_crypto_mode;
 	crypto_cfg.data_unit_size = sb->s_blocksize;
 	crypto_cfg.dun_bytes = fscrypt_get_dun_bytes(ci);
+	crypto_cfg.key_type =
+		is_hw_wrapped_key ? BLK_CRYPTO_KEY_TYPE_HW_WRAPPED :
+		BLK_CRYPTO_KEY_TYPE_STANDARD;
 
 	devs = fscrypt_get_devices(sb, &num_devs);
 	if (IS_ERR(devs))
 		return PTR_ERR(devs);
 
 	for (i = 0; i < num_devs; i++) {
-		if (!blk_crypto_config_supported(bdev_get_queue(devs[i]),
-						 &crypto_cfg))
+		if (!blk_crypto_config_supported(devs[i], &crypto_cfg))
 			goto out_free_devs;
 	}
 
@@ -154,12 +155,15 @@ int fscrypt_select_encryption_impl(struct fscrypt_info *ci)
 }
 
 int fscrypt_prepare_inline_crypt_key(struct fscrypt_prepared_key *prep_key,
-				     const u8 *raw_key,
+				     const u8 *raw_key, size_t raw_key_size,
+				     bool is_hw_wrapped,
 				     const struct fscrypt_info *ci)
 {
 	const struct inode *inode = ci->ci_inode;
 	struct super_block *sb = inode->i_sb;
 	enum blk_crypto_mode_num crypto_mode = ci->ci_mode->blk_crypto_mode;
+	enum blk_crypto_key_type key_type = is_hw_wrapped ?
+		BLK_CRYPTO_KEY_TYPE_HW_WRAPPED : BLK_CRYPTO_KEY_TYPE_STANDARD;
 	struct blk_crypto_key *blk_key;
 	struct block_device **devs;
 	unsigned int num_devs;
@@ -170,8 +174,9 @@ int fscrypt_prepare_inline_crypt_key(struct fscrypt_prepared_key *prep_key,
 	if (!blk_key)
 		return -ENOMEM;
 
-	err = blk_crypto_init_key(blk_key, raw_key, crypto_mode,
-				  fscrypt_get_dun_bytes(ci), sb->s_blocksize);
+	err = blk_crypto_init_key(blk_key, raw_key, raw_key_size, key_type,
+				  crypto_mode, fscrypt_get_dun_bytes(ci),
+				  sb->s_blocksize);
 	if (err) {
 		fscrypt_err(inode, "error %d initializing blk-crypto key", err);
 		goto fail;
@@ -184,8 +189,7 @@ int fscrypt_prepare_inline_crypt_key(struct fscrypt_prepared_key *prep_key,
 		goto fail;
 	}
 	for (i = 0; i < num_devs; i++) {
-		err = blk_crypto_start_using_key(blk_key,
-						 bdev_get_queue(devs[i]));
+		err = blk_crypto_start_using_key(devs[i], blk_key);
 		if (err)
 			break;
 	}
@@ -224,12 +228,40 @@ void fscrypt_destroy_inline_crypt_key(struct super_block *sb,
 	devs = fscrypt_get_devices(sb, &num_devs);
 	if (!IS_ERR(devs)) {
 		for (i = 0; i < num_devs; i++)
-			blk_crypto_evict_key(bdev_get_queue(devs[i]), blk_key);
+			blk_crypto_evict_key(devs[i], blk_key);
 		kfree(devs);
 	}
 	kfree_sensitive(blk_key);
 }
 
+/*
+ * Ask the inline encryption hardware to derive the software secret from a
+ * hardware-wrapped key.  Returns -EOPNOTSUPP if hardware-wrapped keys aren't
+ * supported on this filesystem or hardware.
+ */
+int fscrypt_derive_sw_secret(struct super_block *sb,
+			     const u8 *wrapped_key, size_t wrapped_key_size,
+			     u8 sw_secret[BLK_CRYPTO_SW_SECRET_SIZE])
+{
+	int err;
+
+	/* The filesystem must be mounted with -o inlinecrypt. */
+	if (!(sb->s_flags & SB_INLINECRYPT)) {
+		fscrypt_warn(NULL,
+			     "%s: filesystem not mounted with inlinecrypt\n",
+			     sb->s_id);
+		return -EOPNOTSUPP;
+	}
+
+	err = blk_crypto_derive_sw_secret(sb->s_bdev, wrapped_key,
+					  wrapped_key_size, sw_secret);
+	if (err == -EOPNOTSUPP)
+		fscrypt_warn(NULL,
+			     "%s: block device doesn't support hardware-wrapped keys\n",
+			     sb->s_id);
+	return err;
+}
+
 bool __fscrypt_inode_uses_inline_crypto(const struct inode *inode)
 {
 	return inode->i_crypt_info->ci_inlinecrypt;
@@ -265,6 +297,8 @@ static void fscrypt_generate_dun(const struct fscrypt_info *ci, u64 lblk_num,
  * otherwise fscrypt_mergeable_bio() won't work as intended.
  *
  * The encryption context will be freed automatically when the bio is freed.
+ *
+ * This function also handles setting bi_skip_dm_default_key when needed.
  */
 void fscrypt_set_bio_crypt_ctx(struct bio *bio, const struct inode *inode,
 			       u64 first_lblk, gfp_t gfp_mask)
@@ -272,6 +306,9 @@ void fscrypt_set_bio_crypt_ctx(struct bio *bio, const struct inode *inode,
 	const struct fscrypt_info *ci;
 	u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE];
 
+	if (fscrypt_inode_should_skip_dm_default_key(inode))
+		bio_set_skip_dm_default_key(bio);
+
 	if (!fscrypt_inode_uses_inline_crypto(inode))
 		return;
 	ci = inode->i_crypt_info;
@@ -346,6 +383,9 @@ EXPORT_SYMBOL_GPL(fscrypt_set_bio_crypt_ctx_bh);
  * another way, such as I/O targeting only a single file (and thus a single key)
  * combined with fscrypt_limit_io_blocks() to ensure DUN contiguity.
  *
+ * This function also returns false if the next part of the I/O would need to
+ * have a different value for the bi_skip_dm_default_key flag.
+ *
  * Return: true iff the I/O is mergeable
  */
 bool fscrypt_mergeable_bio(struct bio *bio, const struct inode *inode,
@@ -356,6 +396,9 @@ bool fscrypt_mergeable_bio(struct bio *bio, const struct inode *inode,
 
 	if (!!bc != fscrypt_inode_uses_inline_crypto(inode))
 		return false;
+	if (bio_should_skip_dm_default_key(bio) !=
+	    fscrypt_inode_should_skip_dm_default_key(inode))
+		return false;
 	if (!bc)
 		return true;
 
@@ -389,7 +432,8 @@ bool fscrypt_mergeable_bio_bh(struct bio *bio,
 	u64 next_lblk;
 
 	if (!bh_get_inode_and_lblk_num(next_bh, &inode, &next_lblk))
-		return !bio->bi_crypt_context;
+		return !bio->bi_crypt_context &&
+		       !bio_should_skip_dm_default_key(bio);
 
 	return fscrypt_mergeable_bio(bio, inode, next_lblk);
 }
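
The merge checks above boil down to a pair of flag comparisons: a bio and the next piece of I/O may combine only when their inline-crypt-context presence and their skip-dm-default-key settings both agree. A minimal userspace sketch of that predicate follows; the model_bio/model_inode types are simplified stand-ins, not the kernel's, and the DUN-contiguity check the real function also performs is omitted.

/* mergecheck.c: standalone model of the fscrypt bio merge rules above. */
#include <stdbool.h>
#include <stdio.h>

struct model_bio {
	bool has_crypt_ctx;       /* stands in for bio->bi_crypt_context != NULL */
	bool skip_dm_default_key; /* stands in for bio_should_skip_dm_default_key() */
};

struct model_inode {
	bool uses_inline_crypto;         /* fscrypt_inode_uses_inline_crypto() */
	bool should_skip_dm_default_key; /* fscrypt_inode_should_skip_dm_default_key() */
};

/* Mirrors the shape of fscrypt_mergeable_bio(): both properties must match
 * or the next I/O can't join this bio. */
static bool mergeable(const struct model_bio *bio, const struct model_inode *inode)
{
	if (bio->has_crypt_ctx != inode->uses_inline_crypto)
		return false;
	if (bio->skip_dm_default_key != inode->should_skip_dm_default_key)
		return false;
	return true;
}

int main(void)
{
	struct model_bio bio = { true, false };
	struct model_inode same = { true, false };
	struct model_inode flag_differs = { true, true };

	printf("same flags:   %d\n", mergeable(&bio, &same));         /* 1 */
	printf("flag differs: %d\n", mergeable(&bio, &flag_differs)); /* 0 */
	return 0;
}

Keeping the comparison symmetric means a plaintext bio never grows an encrypted tail and vice versa, which is exactly why the bh variant above also refuses a merge when the existing bio already carries either property.
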
diff --git a/fs/crypto/keyring.c b/fs/crypto/keyring.c
index 2a24b1f..d43c7e3 100644
--- a/fs/crypto/keyring.c
+++ b/fs/crypto/keyring.c
@@ -79,10 +79,9 @@ void fscrypt_put_master_key(struct fscrypt_master_key *mk)
 	call_rcu(&mk->mk_rcu_head, fscrypt_free_master_key);
 }
 
-void fscrypt_put_master_key_activeref(struct fscrypt_master_key *mk)
+void fscrypt_put_master_key_activeref(struct super_block *sb,
+				      struct fscrypt_master_key *mk)
 {
-	struct super_block *sb = mk->mk_sb;
-	struct fscrypt_keyring *keyring = sb->s_master_keys;
 	size_t i;
 
 	if (!refcount_dec_and_test(&mk->mk_active_refs))
@@ -93,9 +92,9 @@ void fscrypt_put_master_key_activeref(struct fscrypt_master_key *mk)
 	 * destroying any subkeys embedded in it.
 	 */
 
-	spin_lock(&keyring->lock);
+	spin_lock(&sb->s_master_keys->lock);
 	hlist_del_rcu(&mk->mk_node);
-	spin_unlock(&keyring->lock);
+	spin_unlock(&sb->s_master_keys->lock);
 
 	/*
 	 * ->mk_active_refs == 0 implies that ->mk_secret is not present and
@@ -131,11 +130,11 @@ static int fscrypt_user_key_instantiate(struct key *key,
 					struct key_preparsed_payload *prep)
 {
 	/*
-	 * We just charge FSCRYPT_MAX_KEY_SIZE bytes to the user's key quota for
-	 * each key, regardless of the exact key size.  The amount of memory
-	 * actually used is greater than the size of the raw key anyway.
+	 * We just charge FSCRYPT_MAX_STANDARD_KEY_SIZE bytes to the user's key
+	 * quota for each key, regardless of the exact key size.  The amount of
+	 * memory actually used is greater than the size of the raw key anyway.
 	 */
-	return key_payload_reserve(key, FSCRYPT_MAX_KEY_SIZE);
+	return key_payload_reserve(key, FSCRYPT_MAX_STANDARD_KEY_SIZE);
 }
 
 static void fscrypt_user_key_describe(const struct key *key, struct seq_file *m)
@@ -243,7 +242,7 @@ void fscrypt_destroy_keyring(struct super_block *sb)
 			WARN_ON(refcount_read(&mk->mk_struct_refs) != 1);
 			WARN_ON(!is_master_key_secret_present(&mk->mk_secret));
 			wipe_master_key_secret(&mk->mk_secret);
-			fscrypt_put_master_key_activeref(mk);
+			fscrypt_put_master_key_activeref(sb, mk);
 		}
 	}
 	kfree_sensitive(keyring);
@@ -424,7 +423,6 @@ static int add_new_master_key(struct super_block *sb,
 	if (!mk)
 		return -ENOMEM;
 
-	mk->mk_sb = sb;
 	init_rwsem(&mk->mk_sem);
 	refcount_set(&mk->mk_struct_refs, 1);
 	mk->mk_spec = *mk_spec;
@@ -537,20 +535,36 @@ static int add_master_key(struct super_block *sb,
 	int err;
 
 	if (key_spec->type == FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER) {
-		err = fscrypt_init_hkdf(&secret->hkdf, secret->raw,
-					secret->size);
+		u8 sw_secret[BLK_CRYPTO_SW_SECRET_SIZE];
+		u8 *kdf_key = secret->raw;
+		unsigned int kdf_key_size = secret->size;
+		u8 keyid_kdf_ctx = HKDF_CONTEXT_KEY_IDENTIFIER;
+
+		/*
+		 * For standard keys, the fscrypt master key is used directly as
+		 * the fscrypt KDF key.  For hardware-wrapped keys, we have to
+		 * pass the master key to the hardware to derive the KDF key,
+		 * which is then only used to derive non-file-contents subkeys.
+		 */
+		if (secret->is_hw_wrapped) {
+			err = fscrypt_derive_sw_secret(sb, secret->raw,
+						       secret->size, sw_secret);
+			if (err)
+				return err;
+			kdf_key = sw_secret;
+			kdf_key_size = sizeof(sw_secret);
+		}
+		err = fscrypt_init_hkdf(&secret->hkdf, kdf_key, kdf_key_size);
+		/*
+		 * Now that the KDF context is initialized, the raw KDF key is
+		 * no longer needed.
+		 */
+		memzero_explicit(kdf_key, kdf_key_size);
 		if (err)
 			return err;
 
-		/*
-		 * Now that the HKDF context is initialized, the raw key is no
-		 * longer needed.
-		 */
-		memzero_explicit(secret->raw, secret->size);
-
 		/* Calculate the key identifier */
-		err = fscrypt_hkdf_expand(&secret->hkdf,
-					  HKDF_CONTEXT_KEY_IDENTIFIER, NULL, 0,
+		err = fscrypt_hkdf_expand(&secret->hkdf, keyid_kdf_ctx, NULL, 0,
 					  key_spec->u.identifier,
 					  FSCRYPT_KEY_IDENTIFIER_SIZE);
 		if (err)
@@ -564,7 +578,7 @@ static int fscrypt_provisioning_key_preparse(struct key_preparsed_payload *prep)
 	const struct fscrypt_provisioning_key_payload *payload = prep->data;
 
 	if (prep->datalen < sizeof(*payload) + FSCRYPT_MIN_KEY_SIZE ||
-	    prep->datalen > sizeof(*payload) + FSCRYPT_MAX_KEY_SIZE)
+	    prep->datalen > sizeof(*payload) + FSCRYPT_MAX_ANY_KEY_SIZE)
 		return -EINVAL;
 
 	if (payload->type != FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR &&
@@ -713,15 +727,30 @@ int fscrypt_ioctl_add_key(struct file *filp, void __user *_uarg)
 		return -EACCES;
 
 	memset(&secret, 0, sizeof(secret));
+
+	if (arg.__flags) {
+		if (arg.__flags & ~__FSCRYPT_ADD_KEY_FLAG_HW_WRAPPED)
+			return -EINVAL;
+		if (arg.key_spec.type != FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER)
+			return -EINVAL;
+		secret.is_hw_wrapped = true;
+	}
+
 	if (arg.key_id) {
 		if (arg.raw_size != 0)
 			return -EINVAL;
 		err = get_keyring_key(arg.key_id, arg.key_spec.type, &secret);
 		if (err)
 			goto out_wipe_secret;
+		err = -EINVAL;
+		if (secret.size > FSCRYPT_MAX_STANDARD_KEY_SIZE &&
+		    !secret.is_hw_wrapped)
+			goto out_wipe_secret;
 	} else {
 		if (arg.raw_size < FSCRYPT_MIN_KEY_SIZE ||
-		    arg.raw_size > FSCRYPT_MAX_KEY_SIZE)
+		    arg.raw_size > (secret.is_hw_wrapped ?
+				    FSCRYPT_MAX_HW_WRAPPED_KEY_SIZE :
+				    FSCRYPT_MAX_STANDARD_KEY_SIZE))
 			return -EINVAL;
 		secret.size = arg.raw_size;
 		err = -EFAULT;
@@ -749,13 +778,13 @@ EXPORT_SYMBOL_GPL(fscrypt_ioctl_add_key);
 static void
 fscrypt_get_test_dummy_secret(struct fscrypt_master_key_secret *secret)
 {
-	static u8 test_key[FSCRYPT_MAX_KEY_SIZE];
+	static u8 test_key[FSCRYPT_MAX_STANDARD_KEY_SIZE];
 
-	get_random_once(test_key, FSCRYPT_MAX_KEY_SIZE);
+	get_random_once(test_key, sizeof(test_key));
 
 	memset(secret, 0, sizeof(*secret));
-	secret->size = FSCRYPT_MAX_KEY_SIZE;
-	memcpy(secret->raw, test_key, FSCRYPT_MAX_KEY_SIZE);
+	secret->size = sizeof(test_key);
+	memcpy(secret->raw, test_key, sizeof(test_key));
 }
 
 int fscrypt_get_test_dummy_key_identifier(
@@ -1068,7 +1097,7 @@ static int do_remove_key(struct file *filp, void __user *_uarg, bool all_users)
 	err = -ENOKEY;
 	if (is_master_key_secret_present(&mk->mk_secret)) {
 		wipe_master_key_secret(&mk->mk_secret);
-		fscrypt_put_master_key_activeref(mk);
+		fscrypt_put_master_key_activeref(sb, mk);
 		err = 0;
 	}
 	inodes_remain = refcount_read(&mk->mk_active_refs) > 0;
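
The add_master_key() change above selects the KDF input per key type: a standard key feeds its raw bytes straight into HKDF, while a hardware-wrapped key is first exchanged for a software secret, and whichever buffer actually keyed the KDF is zeroized afterwards. Below is a standalone sketch of that flow; derive_sw_secret() and hkdf_init() are stubs standing in for fscrypt_derive_sw_secret() and fscrypt_init_hkdf(), and the wipe helper approximates the kernel's memzero_explicit().

/* kdfkey.c: sketch only, with stub crypto; not the kernel implementation. */
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <stdio.h>

#define SW_SECRET_SIZE 32

static int derive_sw_secret(const uint8_t *wrapped, size_t n,
			    uint8_t out[SW_SECRET_SIZE])
{
	(void)wrapped; (void)n;
	memset(out, 0xA5, SW_SECRET_SIZE); /* placeholder "hardware" result */
	return 0;
}

static int hkdf_init(const uint8_t *key, size_t n) { (void)key; (void)n; return 0; }

/* Best-effort secure wipe; the kernel uses memzero_explicit() instead. */
static void wipe(uint8_t *p, size_t n)
{
	volatile uint8_t *vp = p;

	while (n--)
		*vp++ = 0;
}

static int setup_kdf(uint8_t *raw, size_t raw_size, bool is_hw_wrapped)
{
	uint8_t sw_secret[SW_SECRET_SIZE];
	uint8_t *kdf_key = raw;   /* standard key: raw bytes key the KDF */
	size_t kdf_key_size = raw_size;
	int err;

	if (is_hw_wrapped) {
		/* Wrapped key: only "hardware" can turn it into usable bytes. */
		err = derive_sw_secret(raw, raw_size, sw_secret);
		if (err)
			return err;
		kdf_key = sw_secret;
		kdf_key_size = sizeof(sw_secret);
	}
	err = hkdf_init(kdf_key, kdf_key_size);
	/* The KDF context now owns its own state; wipe the input key. */
	wipe(kdf_key, kdf_key_size);
	return err;
}

int main(void)
{
	uint8_t raw[64] = { 1, 2, 3 };

	printf("standard: %d\n", setup_kdf(raw, sizeof(raw), false));
	printf("wrapped:  %d\n", setup_kdf(raw, sizeof(raw), true));
	return 0;
}
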
diff --git a/fs/crypto/keysetup.c b/fs/crypto/keysetup.c
index f740707..6d4ae17 100644
--- a/fs/crypto/keysetup.c
+++ b/fs/crypto/keysetup.c
@@ -44,6 +44,21 @@ struct fscrypt_mode fscrypt_modes[] = {
 		.security_strength = 16,
 		.ivsize = 16,
 	},
+	[FSCRYPT_MODE_SM4_XTS] = {
+		.friendly_name = "SM4-XTS",
+		.cipher_str = "xts(sm4)",
+		.keysize = 32,
+		.security_strength = 16,
+		.ivsize = 16,
+		.blk_crypto_mode = BLK_ENCRYPTION_MODE_SM4_XTS,
+	},
+	[FSCRYPT_MODE_SM4_CTS] = {
+		.friendly_name = "SM4-CTS-CBC",
+		.cipher_str = "cts(cbc(sm4))",
+		.keysize = 16,
+		.security_strength = 16,
+		.ivsize = 16,
+	},
 	[FSCRYPT_MODE_ADIANTUM] = {
 		.friendly_name = "Adiantum",
 		.cipher_str = "adiantum(xchacha12,aes)",
@@ -138,7 +153,9 @@ int fscrypt_prepare_key(struct fscrypt_prepared_key *prep_key,
 	struct crypto_skcipher *tfm;
 
 	if (fscrypt_using_inline_encryption(ci))
-		return fscrypt_prepare_inline_crypt_key(prep_key, raw_key, ci);
+		return fscrypt_prepare_inline_crypt_key(prep_key, raw_key,
+							ci->ci_mode->keysize,
+							false, ci);
 
 	tfm = fscrypt_allocate_skcipher(ci->ci_mode, raw_key, ci->ci_inode);
 	if (IS_ERR(tfm))
@@ -179,14 +196,29 @@ static int setup_per_mode_enc_key(struct fscrypt_info *ci,
 	struct fscrypt_mode *mode = ci->ci_mode;
 	const u8 mode_num = mode - fscrypt_modes;
 	struct fscrypt_prepared_key *prep_key;
-	u8 mode_key[FSCRYPT_MAX_KEY_SIZE];
+	u8 mode_key[FSCRYPT_MAX_STANDARD_KEY_SIZE];
 	u8 hkdf_info[sizeof(mode_num) + sizeof(sb->s_uuid)];
 	unsigned int hkdf_infolen = 0;
+	bool use_hw_wrapped_key = false;
 	int err;
 
 	if (WARN_ON(mode_num > FSCRYPT_MODE_MAX))
 		return -EINVAL;
 
+	if (mk->mk_secret.is_hw_wrapped && S_ISREG(inode->i_mode)) {
+		/* Using a hardware-wrapped key for file contents encryption */
+		if (!fscrypt_using_inline_encryption(ci)) {
+			if (sb->s_flags & SB_INLINECRYPT)
+				fscrypt_warn(ci->ci_inode,
+					     "Hardware-wrapped key required, but no suitable inline encryption capabilities are available");
+			else
+				fscrypt_warn(ci->ci_inode,
+					     "Hardware-wrapped keys require inline encryption (-o inlinecrypt)");
+			return -EINVAL;
+		}
+		use_hw_wrapped_key = true;
+	}
+
 	prep_key = &keys[mode_num];
 	if (fscrypt_is_key_prepared(prep_key, ci)) {
 		ci->ci_enc_key = *prep_key;
@@ -198,6 +230,16 @@ static int setup_per_mode_enc_key(struct fscrypt_info *ci,
 	if (fscrypt_is_key_prepared(prep_key, ci))
 		goto done_unlock;
 
+	if (use_hw_wrapped_key) {
+		err = fscrypt_prepare_inline_crypt_key(prep_key,
+						       mk->mk_secret.raw,
+						       mk->mk_secret.size, true,
+						       ci);
+		if (err)
+			goto out_unlock;
+		goto done_unlock;
+	}
+
 	BUILD_BUG_ON(sizeof(mode_num) != 1);
 	BUILD_BUG_ON(sizeof(sb->s_uuid) != 16);
 	BUILD_BUG_ON(sizeof(hkdf_info) != 17);
@@ -320,6 +362,14 @@ static int fscrypt_setup_v2_file_key(struct fscrypt_info *ci,
 {
 	int err;
 
+	if (mk->mk_secret.is_hw_wrapped &&
+	    !(ci->ci_policy.v2.flags & (FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64 |
+					FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32))) {
+		fscrypt_warn(ci->ci_inode,
+			     "Hardware-wrapped keys are only supported with IV_INO_LBLK policies");
+		return -EINVAL;
+	}
+
 	if (ci->ci_policy.v2.flags & FSCRYPT_POLICY_FLAG_DIRECT_KEY) {
 		/*
 		 * DIRECT_KEY: instead of deriving per-file encryption keys, the
@@ -346,7 +396,7 @@ static int fscrypt_setup_v2_file_key(struct fscrypt_info *ci,
 		   FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32) {
 		err = fscrypt_setup_iv_ino_lblk_32_key(ci, mk);
 	} else {
-		u8 derived_key[FSCRYPT_MAX_KEY_SIZE];
+		u8 derived_key[FSCRYPT_MAX_STANDARD_KEY_SIZE];
 
 		err = fscrypt_hkdf_expand(&mk->mk_secret.hkdf,
 					  HKDF_CONTEXT_PER_FILE_ENC_KEY,
@@ -427,10 +477,6 @@ static int setup_file_encryption_key(struct fscrypt_info *ci,
 	struct fscrypt_master_key *mk;
 	int err;
 
-	err = fscrypt_select_encryption_impl(ci);
-	if (err)
-		return err;
-
 	err = fscrypt_policy_to_key_spec(&ci->ci_policy, &mk_spec);
 	if (err)
 		return err;
@@ -440,6 +486,10 @@ static int setup_file_encryption_key(struct fscrypt_info *ci,
 		if (ci->ci_policy.version != FSCRYPT_POLICY_V1)
 			return -ENOKEY;
 
+		err = fscrypt_select_encryption_impl(ci, false);
+		if (err)
+			return err;
+
 		/*
 		 * As a legacy fallback for v1 policies, search for the key in
 		 * the current task's subscribed keyrings too.  Don't move this
@@ -461,8 +511,20 @@ static int setup_file_encryption_key(struct fscrypt_info *ci,
 		goto out_release_key;
 	}
 
+	err = fscrypt_select_encryption_impl(ci, mk->mk_secret.is_hw_wrapped);
+	if (err)
+		goto out_release_key;
+
 	switch (ci->ci_policy.version) {
 	case FSCRYPT_POLICY_V1:
+		if (WARN_ON(mk->mk_secret.is_hw_wrapped)) {
+			/*
+			 * This should never happen, as adding a v1 policy key
+			 * that is hardware-wrapped isn't allowed.
+			 */
+			err = -EINVAL;
+			goto out_release_key;
+		}
 		err = fscrypt_setup_v1_file_key(ci, mk->mk_secret.raw);
 		break;
 	case FSCRYPT_POLICY_V2:
@@ -509,7 +571,7 @@ static void put_crypt_info(struct fscrypt_info *ci)
 		spin_lock(&mk->mk_decrypted_inodes_lock);
 		list_del(&ci->ci_master_key_link);
 		spin_unlock(&mk->mk_decrypted_inodes_lock);
-		fscrypt_put_master_key_activeref(mk);
+		fscrypt_put_master_key_activeref(ci->ci_inode->i_sb, mk);
 	}
 	memzero_explicit(ci, sizeof(*ci));
 	kmem_cache_free(fscrypt_info_cachep, ci);
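
setup_per_mode_enc_key() above sizes hkdf_info as one mode byte plus an optional 16-byte filesystem UUID, which the BUILD_BUG_ON()s pin at 17 bytes total; IV_INO_LBLK_* policies include the UUID so the derived key is tied to a single filesystem. A small model of that construction follows; the function name and the include_uuid switch are illustrative, not the kernel's API.

/* hkdfinfo.c: model of building the per-mode HKDF application info. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <stdio.h>

#define UUID_SIZE 16

static size_t build_hkdf_info(uint8_t info[1 + UUID_SIZE], uint8_t mode_num,
			      const uint8_t *fs_uuid, int include_uuid)
{
	size_t len = 0;

	info[len++] = mode_num;  /* always present: 1 byte */
	if (include_uuid) {      /* ties the derived key to this filesystem */
		memcpy(&info[len], fs_uuid, UUID_SIZE);
		len += UUID_SIZE;
	}
	return len;              /* 1 or 17 bytes */
}

int main(void)
{
	uint8_t uuid[UUID_SIZE] = { 0 };
	uint8_t info[1 + UUID_SIZE];

	printf("len without uuid: %zu\n", build_hkdf_info(info, 1, uuid, 0));
	printf("len with uuid:    %zu\n", build_hkdf_info(info, 1, uuid, 1));
	return 0;
}
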
diff --git a/fs/crypto/keysetup_v1.c b/fs/crypto/keysetup_v1.c
index 75dabd9b..ee82463 100644
--- a/fs/crypto/keysetup_v1.c
+++ b/fs/crypto/keysetup_v1.c
@@ -118,7 +118,8 @@ find_and_lock_process_key(const char *prefix,
 	payload = (const struct fscrypt_key *)ukp->data;
 
 	if (ukp->datalen != sizeof(struct fscrypt_key) ||
-	    payload->size < 1 || payload->size > FSCRYPT_MAX_KEY_SIZE) {
+	    payload->size < 1 ||
+	    payload->size > FSCRYPT_MAX_STANDARD_KEY_SIZE) {
 		fscrypt_warn(NULL,
 			     "key with description '%s' has invalid payload",
 			     key->description);
@@ -149,7 +150,7 @@ struct fscrypt_direct_key {
 	const struct fscrypt_mode	*dk_mode;
 	struct fscrypt_prepared_key	dk_key;
 	u8				dk_descriptor[FSCRYPT_KEY_DESCRIPTOR_SIZE];
-	u8				dk_raw[FSCRYPT_MAX_KEY_SIZE];
+	u8				dk_raw[FSCRYPT_MAX_STANDARD_KEY_SIZE];
 };
 
 static void free_direct_key(struct fscrypt_direct_key *dk)
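
find_and_lock_process_key() above trusts neither the payload length nor the size field inside it: datalen must match the structure exactly, and the declared key size must fall within [1, FSCRYPT_MAX_STANDARD_KEY_SIZE]. A compact sketch of the same two-step validation follows; struct key_payload and the 64-byte bound are hypothetical values chosen for illustration.

/* keyvalid.c: model of validating an untrusted key payload. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_STANDARD_KEY_SIZE 64 /* illustrative bound */

struct key_payload {
	uint32_t size;           /* declared by userspace: untrusted */
	uint8_t raw[MAX_STANDARD_KEY_SIZE];
};

static int payload_valid(const struct key_payload *p, size_t datalen)
{
	if (datalen != sizeof(*p))
		return 0;        /* truncated or oversized buffer */
	if (p->size < 1 || p->size > MAX_STANDARD_KEY_SIZE)
		return 0;        /* declared size out of range */
	return 1;
}

int main(void)
{
	struct key_payload p = { .size = 32 };

	printf("%d %d\n", payload_valid(&p, sizeof(p)),
	       payload_valid(&p, sizeof(p) - 1));
	return 0;
}
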
diff --git a/fs/crypto/policy.c b/fs/crypto/policy.c
index 46757c3..893661b 100644
--- a/fs/crypto/policy.c
+++ b/fs/crypto/policy.c
@@ -61,6 +61,13 @@ fscrypt_get_dummy_policy(struct super_block *sb)
 	return sb->s_cop->get_dummy_policy(sb);
 }
 
+/*
+ * Return %true if the given combination of encryption modes is supported for v1
+ * (and later) encryption policies.
+ *
+ * Do *not* add anything new here, since v1 encryption policies are deprecated.
+ * New combinations of modes should go in fscrypt_valid_enc_modes_v2() only.
+ */
 static bool fscrypt_valid_enc_modes_v1(u32 contents_mode, u32 filenames_mode)
 {
 	if (contents_mode == FSCRYPT_MODE_AES_256_XTS &&
@@ -83,6 +90,11 @@ static bool fscrypt_valid_enc_modes_v2(u32 contents_mode, u32 filenames_mode)
 	if (contents_mode == FSCRYPT_MODE_AES_256_XTS &&
 	    filenames_mode == FSCRYPT_MODE_AES_256_HCTR2)
 		return true;
+
+	if (contents_mode == FSCRYPT_MODE_SM4_XTS &&
+	    filenames_mode == FSCRYPT_MODE_SM4_CTS)
+		return true;
+
 	return fscrypt_valid_enc_modes_v1(contents_mode, filenames_mode);
 }
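
The split above keeps v1 mode validation frozen while v2 accepts the new SM4 pair and falls through to the v1 list, so legacy combinations stay valid without ever growing the deprecated set. A sketch of that layering follows; the mode numbers are illustrative, and only the classic AES-256 pair is modeled on the v1 side.

/* modecheck.c: model of the layered v1/v2 mode validation above. */
#include <stdbool.h>
#include <stdio.h>

enum mode {
	AES_256_XTS = 1, AES_256_CTS = 4,
	SM4_XTS = 7, SM4_CTS = 8, AES_256_HCTR2 = 10, /* values illustrative */
};

static bool valid_enc_modes_v1(enum mode contents, enum mode filenames)
{
	/* Frozen: v1 policies are deprecated, nothing new goes here. */
	return contents == AES_256_XTS && filenames == AES_256_CTS;
}

static bool valid_enc_modes_v2(enum mode contents, enum mode filenames)
{
	if (contents == AES_256_XTS && filenames == AES_256_HCTR2)
		return true;
	if (contents == SM4_XTS && filenames == SM4_CTS)
		return true;
	return valid_enc_modes_v1(contents, filenames); /* legacy fallback */
}

int main(void)
{
	printf("SM4 pair, v2: %d\n", valid_enc_modes_v2(SM4_XTS, SM4_CTS)); /* 1 */
	printf("SM4 pair, v1: %d\n", valid_enc_modes_v1(SM4_XTS, SM4_CTS)); /* 0 */
	return 0;
}
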
 
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index 3d21eae..e604ea4e 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -75,14 +75,10 @@ static void __read_end_io(struct bio *bio)
 	bio_for_each_segment_all(bv, bio, iter_all) {
 		page = bv->bv_page;
 
-		/* PG_error was set if verity failed. */
-		if (bio->bi_status || PageError(page)) {
+		if (bio->bi_status)
 			ClearPageUptodate(page);
-			/* will re-read again later */
-			ClearPageError(page);
-		} else {
+		else
 			SetPageUptodate(page);
-		}
 		unlock_page(page);
 	}
 	if (bio->bi_private)
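
With PG_error gone, the completion path above has a single source of truth: bio->bi_status decides whether every page in the bio becomes up-to-date, and each page is unlocked unconditionally. A toy model of that loop follows; page_state is a stand-in type, and 10 as an error status is arbitrary.

/* endio.c: model of the simplified read-completion logic above. */
#include <stdbool.h>
#include <stdio.h>

struct page_state { bool uptodate; bool locked; };

static void read_end_io(struct page_state *pages, int n, int bi_status)
{
	for (int i = 0; i < n; i++) {
		pages[i].uptodate = (bi_status == 0); /* one flag, one source of truth */
		pages[i].locked = false;              /* unlock_page() */
	}
}

int main(void)
{
	struct page_state pages[2] = { { false, true }, { false, true } };

	read_end_io(pages, 2, 0);
	printf("ok:  uptodate=%d locked=%d\n", pages[0].uptodate, pages[0].locked);
	read_end_io(pages, 2, 10 /* stand-in for BLK_STS_IOERR */);
	printf("err: uptodate=%d locked=%d\n", pages[1].uptodate, pages[1].locked);
	return 0;
}
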
diff --git a/fs/f2fs/OWNERS b/fs/f2fs/OWNERS
new file mode 100644
index 0000000..6a5c01163
--- /dev/null
+++ b/fs/f2fs/OWNERS
@@ -0,0 +1 @@
+jaegeuk@google.com
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 0c82dae..56f7d0d 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -171,6 +171,11 @@ static bool __is_bitmap_valid(struct f2fs_sb_info *sbi, block_t blkaddr,
 bool f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi,
 					block_t blkaddr, int type)
 {
+	if (time_to_inject(sbi, FAULT_BLKADDR)) {
+		f2fs_show_injection_info(sbi, FAULT_BLKADDR);
+		return false;
+	}
+
 	switch (type) {
 	case META_NAT:
 		break;
@@ -1897,8 +1902,10 @@ int f2fs_start_ckpt_thread(struct f2fs_sb_info *sbi)
 	cprc->f2fs_issue_ckpt = kthread_run(issue_checkpoint_thread, sbi,
 			"f2fs_ckpt-%u:%u", MAJOR(dev), MINOR(dev));
 	if (IS_ERR(cprc->f2fs_issue_ckpt)) {
+		int err = PTR_ERR(cprc->f2fs_issue_ckpt);
+
 		cprc->f2fs_issue_ckpt = NULL;
-		return -ENOMEM;
+		return err;
 	}
 
 	set_task_ioprio(cprc->f2fs_issue_ckpt, cprc->ckpt_thread_ioprio);
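
The checkpoint-thread fix above is a classic error-propagation repair: kthread_run() returns an ERR_PTR encoding the real reason for the failure, and returning a hardcoded -ENOMEM threw that away. A userspace sketch of the pattern follows, with minimal re-implementations of the ERR_PTR()/IS_ERR()/PTR_ERR() helpers; kthread_run_stub() is invented for the demo.

/* ptrerr.c: model of propagating the callee's real error code. */
#include <stdint.h>
#include <stdio.h>
#include <errno.h>

static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline int IS_ERR(const void *p) { return (uintptr_t)p >= (uintptr_t)-4095; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }

static void *kthread_run_stub(int fail_with) /* pretend thread creation fails */
{
	return fail_with ? ERR_PTR(-fail_with) : (void *)0x1000;
}

static int start_ckpt_thread(int fail_with)
{
	void *task = kthread_run_stub(fail_with);

	if (IS_ERR(task)) {
		int err = PTR_ERR(task); /* keep the real reason, e.g. -EINTR */

		return err;              /* the old code always returned -ENOMEM */
	}
	return 0;
}

int main(void)
{
	printf("%d %d\n", start_ckpt_thread(0), start_ckpt_thread(EINTR));
	return 0;
}
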
diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index 74d3f2d..2532f36 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -567,10 +567,7 @@ MODULE_PARM_DESC(num_compress_pages,
 int f2fs_init_compress_mempool(void)
 {
 	compress_page_pool = mempool_create_page_pool(num_compress_pages, 0);
-	if (!compress_page_pool)
-		return -ENOMEM;
-
-	return 0;
+	return compress_page_pool ? 0 : -ENOMEM;
 }
 
 void f2fs_destroy_compress_mempool(void)
@@ -1711,50 +1708,27 @@ static void f2fs_put_dic(struct decompress_io_ctx *dic, bool in_task)
 	}
 }
 
-/*
- * Update and unlock the cluster's pagecache pages, and release the reference to
- * the decompress_io_ctx that was being held for I/O completion.
- */
-static void __f2fs_decompress_end_io(struct decompress_io_ctx *dic, bool failed,
-				bool in_task)
-{
-	int i;
-
-	for (i = 0; i < dic->cluster_size; i++) {
-		struct page *rpage = dic->rpages[i];
-
-		if (!rpage)
-			continue;
-
-		/* PG_error was set if verity failed. */
-		if (failed || PageError(rpage)) {
-			ClearPageUptodate(rpage);
-			/* will re-read again later */
-			ClearPageError(rpage);
-		} else {
-			SetPageUptodate(rpage);
-		}
-		unlock_page(rpage);
-	}
-
-	f2fs_put_dic(dic, in_task);
-}
-
 static void f2fs_verify_cluster(struct work_struct *work)
 {
 	struct decompress_io_ctx *dic =
 		container_of(work, struct decompress_io_ctx, verity_work);
 	int i;
 
-	/* Verify the cluster's decompressed pages with fs-verity. */
+	/* Verify, update, and unlock the decompressed pages. */
 	for (i = 0; i < dic->cluster_size; i++) {
 		struct page *rpage = dic->rpages[i];
 
-		if (rpage && !fsverity_verify_page(rpage))
-			SetPageError(rpage);
+		if (!rpage)
+			continue;
+
+		if (fsverity_verify_page(rpage))
+			SetPageUptodate(rpage);
+		else
+			ClearPageUptodate(rpage);
+		unlock_page(rpage);
 	}
 
-	__f2fs_decompress_end_io(dic, false, true);
+	f2fs_put_dic(dic, true);
 }
 
 /*
@@ -1764,6 +1738,8 @@ static void f2fs_verify_cluster(struct work_struct *work)
 void f2fs_decompress_end_io(struct decompress_io_ctx *dic, bool failed,
 				bool in_task)
 {
+	int i;
+
 	if (!failed && dic->need_verity) {
 		/*
 		 * Note that to avoid deadlocks, the verity work can't be done
@@ -1773,9 +1749,28 @@ void f2fs_decompress_end_io(struct decompress_io_ctx *dic, bool failed,
 		 */
 		INIT_WORK(&dic->verity_work, f2fs_verify_cluster);
 		fsverity_enqueue_verify_work(&dic->verity_work);
-	} else {
-		__f2fs_decompress_end_io(dic, failed, in_task);
+		return;
 	}
+
+	/* Update and unlock the cluster's pagecache pages. */
+	for (i = 0; i < dic->cluster_size; i++) {
+		struct page *rpage = dic->rpages[i];
+
+		if (!rpage)
+			continue;
+
+		if (failed)
+			ClearPageUptodate(rpage);
+		else
+			SetPageUptodate(rpage);
+		unlock_page(rpage);
+	}
+
+	/*
+	 * Release the reference to the decompress_io_ctx that was being held
+	 * for I/O completion.
+	 */
+	f2fs_put_dic(dic, in_task);
 }
 
 /*
@@ -1983,9 +1978,7 @@ int f2fs_init_page_array_cache(struct f2fs_sb_info *sbi)
 
 	sbi->page_array_slab = f2fs_kmem_cache_create(slab_name,
 					sbi->page_array_slab_size);
-	if (!sbi->page_array_slab)
-		return -ENOMEM;
-	return 0;
+	return sbi->page_array_slab ? 0 : -ENOMEM;
 }
 
 void f2fs_destroy_page_array_cache(struct f2fs_sb_info *sbi)
@@ -1993,53 +1986,24 @@ void f2fs_destroy_page_array_cache(struct f2fs_sb_info *sbi)
 	kmem_cache_destroy(sbi->page_array_slab);
 }
 
-static int __init f2fs_init_cic_cache(void)
+int __init f2fs_init_compress_cache(void)
 {
 	cic_entry_slab = f2fs_kmem_cache_create("f2fs_cic_entry",
 					sizeof(struct compress_io_ctx));
 	if (!cic_entry_slab)
 		return -ENOMEM;
-	return 0;
-}
-
-static void f2fs_destroy_cic_cache(void)
-{
-	kmem_cache_destroy(cic_entry_slab);
-}
-
-static int __init f2fs_init_dic_cache(void)
-{
 	dic_entry_slab = f2fs_kmem_cache_create("f2fs_dic_entry",
 					sizeof(struct decompress_io_ctx));
 	if (!dic_entry_slab)
-		return -ENOMEM;
-	return 0;
-}
-
-static void f2fs_destroy_dic_cache(void)
-{
-	kmem_cache_destroy(dic_entry_slab);
-}
-
-int __init f2fs_init_compress_cache(void)
-{
-	int err;
-
-	err = f2fs_init_cic_cache();
-	if (err)
-		goto out;
-	err = f2fs_init_dic_cache();
-	if (err)
 		goto free_cic;
 	return 0;
 free_cic:
-	f2fs_destroy_cic_cache();
-out:
+	kmem_cache_destroy(cic_entry_slab);
 	return -ENOMEM;
 }
 
 void f2fs_destroy_compress_cache(void)
 {
-	f2fs_destroy_dic_cache();
-	f2fs_destroy_cic_cache();
+	kmem_cache_destroy(dic_entry_slab);
+	kmem_cache_destroy(cic_entry_slab);
 }
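
The compress.c cleanup above folds four single-purpose helpers into one init/destroy pair, with a goto unwinding the first cache when the second allocation fails. The same shape in a standalone sketch; malloc/free stand in for the kmem_cache calls, and fail_second forces the error path.

/* slabinit.c: model of consolidated init with error unwinding. */
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>

static void *cic_cache, *dic_cache; /* stand-ins for the two slab caches */

static int init_compress_cache(int fail_second)
{
	cic_cache = malloc(64);
	if (!cic_cache)
		return -ENOMEM;
	dic_cache = fail_second ? NULL : malloc(64);
	if (!dic_cache)
		goto free_cic;
	return 0;
free_cic:
	free(cic_cache);       /* unwind the allocation that did succeed */
	cic_cache = NULL;
	return -ENOMEM;
}

static void destroy_compress_cache(void)
{
	free(dic_cache);       /* reverse order of creation */
	free(cic_cache);
	dic_cache = cic_cache = NULL;
}

int main(void)
{
	printf("ok path:   %d\n", init_compress_cache(0));
	destroy_compress_cache();
	printf("fail path: %d\n", init_compress_cache(1));
	return 0;
}
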
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a71e818..2899bb9 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -39,10 +39,8 @@ static struct bio_set f2fs_bioset;
 
 int __init f2fs_init_bioset(void)
 {
-	if (bioset_init(&f2fs_bioset, F2FS_BIO_POOL_SIZE,
-					0, BIOSET_NEED_BVECS))
-		return -ENOMEM;
-	return 0;
+	return bioset_init(&f2fs_bioset, F2FS_BIO_POOL_SIZE,
+					0, BIOSET_NEED_BVECS);
 }
 
 void f2fs_destroy_bioset(void)
@@ -116,43 +114,56 @@ struct bio_post_read_ctx {
 	struct f2fs_sb_info *sbi;
 	struct work_struct work;
 	unsigned int enabled_steps;
+	/*
+	 * decompression_attempted keeps track of whether
+	 * f2fs_end_read_compressed_page() has yet been called on the pages in
+	 * the bio that belong to a compressed cluster.
+	 */
+	bool decompression_attempted;
 	block_t fs_blkaddr;
 };
 
+/*
+ * Update and unlock a bio's pages, and free the bio.
+ *
+ * This marks pages up-to-date only if there was no error in the bio (I/O error,
+ * decryption error, or verity error), as indicated by bio->bi_status.
+ *
+ * "Compressed pages" (pagecache pages backed by a compressed cluster on-disk)
+ * aren't marked up-to-date here, as decompression is done on a per-compression-
+ * cluster basis rather than a per-bio basis.  Instead, we only need to do two
+ * things for each compressed page here: call f2fs_end_read_compressed_page()
+ * with failed=true if an error occurred before it would have normally gotten
+ * called (i.e., I/O error or decryption error, but *not* verity error), and
+ * release the bio's reference to the decompress_io_ctx of the page's cluster.
+ */
 static void f2fs_finish_read_bio(struct bio *bio, bool in_task)
 {
 	struct bio_vec *bv;
 	struct bvec_iter_all iter_all;
+	struct bio_post_read_ctx *ctx = bio->bi_private;
 
-	/*
-	 * Update and unlock the bio's pagecache pages, and put the
-	 * decompression context for any compressed pages.
-	 */
 	bio_for_each_segment_all(bv, bio, iter_all) {
 		struct page *page = bv->bv_page;
 
 		if (f2fs_is_compressed_page(page)) {
-			if (bio->bi_status)
+			if (ctx && !ctx->decompression_attempted)
 				f2fs_end_read_compressed_page(page, true, 0,
 							in_task);
 			f2fs_put_page_dic(page, in_task);
 			continue;
 		}
 
-		/* PG_error was set if verity failed. */
-		if (bio->bi_status || PageError(page)) {
+		if (bio->bi_status)
 			ClearPageUptodate(page);
-			/* will re-read again later */
-			ClearPageError(page);
-		} else {
+		else
 			SetPageUptodate(page);
-		}
 		dec_page_count(F2FS_P_SB(page), __read_io_type(page));
 		unlock_page(page);
 	}
 
-	if (bio->bi_private)
-		mempool_free(bio->bi_private, bio_post_read_ctx_pool);
+	if (ctx)
+		mempool_free(ctx, bio_post_read_ctx_pool);
 	bio_put(bio);
 }
 
@@ -185,8 +196,10 @@ static void f2fs_verify_bio(struct work_struct *work)
 			struct page *page = bv->bv_page;
 
 			if (!f2fs_is_compressed_page(page) &&
-			    !fsverity_verify_page(page))
-				SetPageError(page);
+			    !fsverity_verify_page(page)) {
+				bio->bi_status = BLK_STS_IOERR;
+				break;
+			}
 		}
 	} else {
 		fsverity_verify_bio(bio);
@@ -245,6 +258,8 @@ static void f2fs_handle_step_decompress(struct bio_post_read_ctx *ctx,
 		blkaddr++;
 	}
 
+	ctx->decompression_attempted = true;
+
 	/*
 	 * Optimization: if all the bio's pages are compressed, then scheduling
 	 * the per-bio verity work is unnecessary, as verity will be fully
@@ -476,6 +491,8 @@ static void f2fs_set_bio_crypt_ctx(struct bio *bio, const struct inode *inode,
 	 */
 	if (!fio || !fio->encrypted_page)
 		fscrypt_set_bio_crypt_ctx(bio, inode, first_idx, gfp_mask);
+	else if (fscrypt_inode_should_skip_dm_default_key(inode))
+		bio_set_skip_dm_default_key(bio);
 }
 
 static bool f2fs_crypt_mergeable_bio(struct bio *bio, const struct inode *inode,
@@ -487,7 +504,9 @@ static bool f2fs_crypt_mergeable_bio(struct bio *bio, const struct inode *inode,
 	 * read/write raw data without encryption.
 	 */
 	if (fio && fio->encrypted_page)
-		return !bio_has_crypt_ctx(bio);
+		return !bio_has_crypt_ctx(bio) &&
+			(bio_should_skip_dm_default_key(bio) ==
+			 fscrypt_inode_should_skip_dm_default_key(inode));
 
 	return fscrypt_mergeable_bio(bio, inode, next_idx);
 }
@@ -1062,6 +1081,7 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
 		ctx->sbi = sbi;
 		ctx->enabled_steps = post_read_steps;
 		ctx->fs_blkaddr = blkaddr;
+		ctx->decompression_attempted = false;
 		bio->bi_private = ctx;
 	}
 	iostat_alloc_and_bind_ctx(sbi, bio, ctx);
@@ -1089,7 +1109,6 @@ static int f2fs_submit_page_read(struct inode *inode, struct page *page,
 		bio_put(bio);
 		return -EFAULT;
 	}
-	ClearPageError(page);
 	inc_page_count(sbi, F2FS_RD_DATA);
 	f2fs_update_iostat(sbi, NULL, FS_DATA_READ_IO, F2FS_BLKSIZE);
 	__submit_bio(sbi, bio, DATA);
@@ -1128,7 +1147,7 @@ void f2fs_update_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr)
 {
 	dn->data_blkaddr = blkaddr;
 	f2fs_set_data_blkaddr(dn);
-	f2fs_update_extent_cache(dn);
+	f2fs_update_read_extent_cache(dn);
 }
 
 /* dn->ofs_in_node will be returned with up-to-date last block pointer */
@@ -1197,7 +1216,7 @@ int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index)
 	struct extent_info ei = {0, };
 	struct inode *inode = dn->inode;
 
-	if (f2fs_lookup_extent_cache(inode, index, &ei)) {
+	if (f2fs_lookup_read_extent_cache(inode, index, &ei)) {
 		dn->data_blkaddr = ei.blk + index - ei.fofs;
 		return 0;
 	}
@@ -1206,7 +1225,8 @@ int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index)
 }
 
 struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index,
-				     blk_opf_t op_flags, bool for_write)
+				     blk_opf_t op_flags, bool for_write,
+				     pgoff_t *next_pgofs)
 {
 	struct address_space *mapping = inode->i_mapping;
 	struct dnode_of_data dn;
@@ -1218,7 +1238,7 @@ struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index,
 	if (!page)
 		return ERR_PTR(-ENOMEM);
 
-	if (f2fs_lookup_extent_cache(inode, index, &ei)) {
+	if (f2fs_lookup_read_extent_cache(inode, index, &ei)) {
 		dn.data_blkaddr = ei.blk + index - ei.fofs;
 		if (!f2fs_is_valid_blkaddr(F2FS_I_SB(inode), dn.data_blkaddr,
 						DATA_GENERIC_ENHANCE_READ)) {
@@ -1232,12 +1252,17 @@ struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index,
 
 	set_new_dnode(&dn, inode, NULL, NULL, 0);
 	err = f2fs_get_dnode_of_data(&dn, index, LOOKUP_NODE);
-	if (err)
+	if (err) {
+		if (err == -ENOENT && next_pgofs)
+			*next_pgofs = f2fs_get_next_page_offset(&dn, index);
 		goto put_err;
+	}
 	f2fs_put_dnode(&dn);
 
 	if (unlikely(dn.data_blkaddr == NULL_ADDR)) {
 		err = -ENOENT;
+		if (next_pgofs)
+			*next_pgofs = index + 1;
 		goto put_err;
 	}
 	if (dn.data_blkaddr != NEW_ADDR &&
@@ -1281,7 +1306,8 @@ struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index,
 	return ERR_PTR(err);
 }
 
-struct page *f2fs_find_data_page(struct inode *inode, pgoff_t index)
+struct page *f2fs_find_data_page(struct inode *inode, pgoff_t index,
+					pgoff_t *next_pgofs)
 {
 	struct address_space *mapping = inode->i_mapping;
 	struct page *page;
@@ -1291,7 +1317,7 @@ struct page *f2fs_find_data_page(struct inode *inode, pgoff_t index)
 		return page;
 	f2fs_put_page(page, 0);
 
-	page = f2fs_get_read_data_page(inode, index, 0, false);
+	page = f2fs_get_read_data_page(inode, index, 0, false, next_pgofs);
 	if (IS_ERR(page))
 		return page;
 
@@ -1317,7 +1343,7 @@ struct page *f2fs_get_lock_data_page(struct inode *inode, pgoff_t index,
 	struct address_space *mapping = inode->i_mapping;
 	struct page *page;
 repeat:
-	page = f2fs_get_read_data_page(inode, index, 0, for_write);
+	page = f2fs_get_read_data_page(inode, index, 0, for_write, NULL);
 	if (IS_ERR(page))
 		return page;
 
@@ -1480,7 +1506,7 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
 	pgofs =	(pgoff_t)map->m_lblk;
 	end = pgofs + maxblocks;
 
-	if (!create && f2fs_lookup_extent_cache(inode, pgofs, &ei)) {
+	if (!create && f2fs_lookup_read_extent_cache(inode, pgofs, &ei)) {
 		if (f2fs_lfs_mode(sbi) && flag == F2FS_GET_BLOCK_DIO &&
 							map->m_may_create)
 			goto next_dnode;
@@ -1690,7 +1716,7 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
 		if (map->m_flags & F2FS_MAP_MAPPED) {
 			unsigned int ofs = start_pgofs - map->m_lblk;
 
-			f2fs_update_extent_cache_range(&dn,
+			f2fs_update_read_extent_cache_range(&dn,
 				start_pgofs, map->m_pblk + ofs,
 				map->m_len - ofs);
 		}
@@ -1735,7 +1761,7 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
 		if (map->m_flags & F2FS_MAP_MAPPED) {
 			unsigned int ofs = start_pgofs - map->m_lblk;
 
-			f2fs_update_extent_cache_range(&dn,
+			f2fs_update_read_extent_cache_range(&dn,
 				start_pgofs, map->m_pblk + ofs,
 				map->m_len - ofs);
 		}
@@ -2141,7 +2167,6 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page,
 	inc_page_count(F2FS_I_SB(inode), F2FS_RD_DATA);
 	f2fs_update_iostat(F2FS_I_SB(inode), NULL, FS_DATA_READ_IO,
 							F2FS_BLKSIZE);
-	ClearPageError(page);
 	*last_block_in_bio = block_nr;
 	goto out;
 out:
@@ -2162,7 +2187,7 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
 	sector_t last_block_in_file;
 	const unsigned blocksize = blks_to_bytes(inode, 1);
 	struct decompress_io_ctx *dic = NULL;
-	struct extent_info ei = {0, };
+	struct extent_info ei = {};
 	bool from_dnode = true;
 	int i;
 	int ret = 0;
@@ -2196,7 +2221,7 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
 	if (f2fs_cluster_is_empty(cc))
 		goto out;
 
-	if (f2fs_lookup_extent_cache(inode, start_idx, &ei))
+	if (f2fs_lookup_read_extent_cache(inode, start_idx, &ei))
 		from_dnode = false;
 
 	if (!from_dnode)
@@ -2289,7 +2314,6 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
 
 		inc_page_count(sbi, F2FS_RD_DATA);
 		f2fs_update_iostat(sbi, inode, FS_DATA_READ_IO, F2FS_BLKSIZE);
-		ClearPageError(page);
 		*last_block_in_bio = blkaddr;
 	}
 
@@ -2306,7 +2330,6 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
 	for (i = 0; i < cc->cluster_size; i++) {
 		if (cc->rpages[i]) {
 			ClearPageUptodate(cc->rpages[i]);
-			ClearPageError(cc->rpages[i]);
 			unlock_page(cc->rpages[i]);
 		}
 	}
@@ -2403,7 +2426,6 @@ static int f2fs_mpage_readpages(struct inode *inode,
 #ifdef CONFIG_F2FS_FS_COMPRESSION
 set_error_page:
 #endif
-			SetPageError(page);
 			zero_user_segment(page, 0, PAGE_SIZE);
 			unlock_page(page);
 		}
@@ -2630,7 +2652,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio)
 		set_new_dnode(&dn, inode, NULL, NULL, 0);
 
 	if (need_inplace_update(fio) &&
-			f2fs_lookup_extent_cache(inode, page->index, &ei)) {
+	    f2fs_lookup_read_extent_cache(inode, page->index, &ei)) {
 		fio->old_blkaddr = ei.blk + page->index - ei.fofs;
 
 		if (!f2fs_is_valid_blkaddr(fio->sbi, fio->old_blkaddr,
@@ -3354,7 +3376,7 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
 	} else if (locked) {
 		err = f2fs_get_block(&dn, index);
 	} else {
-		if (f2fs_lookup_extent_cache(inode, index, &ei)) {
+		if (f2fs_lookup_read_extent_cache(inode, index, &ei)) {
 			dn.data_blkaddr = ei.blk + index - ei.fofs;
 		} else {
 			/* hole case */
@@ -3395,7 +3417,7 @@ static int __find_data_block(struct inode *inode, pgoff_t index,
 
 	set_new_dnode(&dn, inode, ipage, ipage, 0);
 
-	if (f2fs_lookup_extent_cache(inode, index, &ei)) {
+	if (f2fs_lookup_read_extent_cache(inode, index, &ei)) {
 		dn.data_blkaddr = ei.blk + index - ei.fofs;
 	} else {
 		/* hole case */
@@ -3459,6 +3481,9 @@ static int prepare_atomic_write_begin(struct f2fs_sb_info *sbi,
 	else if (*blk_addr != NULL_ADDR)
 		return 0;
 
+	if (is_inode_flag_set(inode, FI_ATOMIC_REPLACE))
+		goto reserve_block;
+
 	/* Look for the block in the original inode */
 	err = __find_data_block(inode, index, &ori_blk_addr);
 	if (err)
@@ -4080,9 +4105,7 @@ int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi)
 	sbi->post_read_wq = alloc_workqueue("f2fs_post_read_wq",
 						 WQ_UNBOUND | WQ_HIGHPRI,
 						 num_online_cpus());
-	if (!sbi->post_read_wq)
-		return -ENOMEM;
-	return 0;
+	return sbi->post_read_wq ? 0 : -ENOMEM;
 }
 
 void f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi)
@@ -4095,9 +4118,7 @@ int __init f2fs_init_bio_entry_cache(void)
 {
 	bio_entry_slab = f2fs_kmem_cache_create("f2fs_bio_entry_slab",
 			sizeof(struct bio_entry));
-	if (!bio_entry_slab)
-		return -ENOMEM;
-	return 0;
+	return bio_entry_slab ? 0 : -ENOMEM;
 }
 
 void f2fs_destroy_bio_entry_cache(void)
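
The decompression_attempted flag threaded through data.c above guarantees f2fs_end_read_compressed_page() runs exactly once per compressed page: the normal decompress step sets the flag, and the bio-finish path calls it with failed=true only when that step never ran (I/O or decryption error). A reduced model of the two paths follows; all names are simplified stand-ins for the kernel's.

/* dattempt.c: model of the decompression_attempted bookkeeping. */
#include <stdbool.h>
#include <stdio.h>

struct post_read_ctx { bool decompression_attempted; };

static int end_read_compressed_calls;

static void end_read_compressed_page(bool failed)
{
	(void)failed;
	end_read_compressed_calls++;
}

static void step_decompress(struct post_read_ctx *ctx)
{
	end_read_compressed_page(false);  /* normal completion path */
	ctx->decompression_attempted = true;
}

static void finish_read_bio(struct post_read_ctx *ctx)
{
	/* Only report a failure if decompression never got a chance to run. */
	if (ctx && !ctx->decompression_attempted)
		end_read_compressed_page(true);
}

int main(void)
{
	struct post_read_ctx ok = { false }, ioerr = { false };

	step_decompress(&ok);    /* successful read: decompress step runs */
	finish_read_bio(&ok);
	finish_read_bio(&ioerr); /* I/O error: decompress step was skipped */
	printf("calls: %d (exactly one per cluster, on both paths)\n",
	       end_read_compressed_calls);
	return 0;
}
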
diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index a216dcd..32af4f0 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -72,15 +72,26 @@ static void update_general_status(struct f2fs_sb_info *sbi)
 	si->main_area_zones = si->main_area_sections /
 				le32_to_cpu(raw_super->secs_per_zone);
 
-	/* validation check of the segment numbers */
+	/* general extent cache stats */
+	for (i = 0; i < NR_EXTENT_CACHES; i++) {
+		struct extent_tree_info *eti = &sbi->extent_tree[i];
+
+		si->hit_cached[i] = atomic64_read(&sbi->read_hit_cached[i]);
+		si->hit_rbtree[i] = atomic64_read(&sbi->read_hit_rbtree[i]);
+		si->total_ext[i] = atomic64_read(&sbi->total_hit_ext[i]);
+		si->hit_total[i] = si->hit_cached[i] + si->hit_rbtree[i];
+		si->ext_tree[i] = atomic_read(&eti->total_ext_tree);
+		si->zombie_tree[i] = atomic_read(&eti->total_zombie_tree);
+		si->ext_node[i] = atomic_read(&eti->total_ext_node);
+	}
+	/* read extent_cache only */
 	si->hit_largest = atomic64_read(&sbi->read_hit_largest);
-	si->hit_cached = atomic64_read(&sbi->read_hit_cached);
-	si->hit_rbtree = atomic64_read(&sbi->read_hit_rbtree);
-	si->hit_total = si->hit_largest + si->hit_cached + si->hit_rbtree;
-	si->total_ext = atomic64_read(&sbi->total_hit_ext);
-	si->ext_tree = atomic_read(&sbi->total_ext_tree);
-	si->zombie_tree = atomic_read(&sbi->total_zombie_tree);
-	si->ext_node = atomic_read(&sbi->total_ext_node);
+	si->hit_total[EX_READ] += si->hit_largest;
+
+	/* block age extent_cache only */
+	si->allocated_data_blocks = atomic64_read(&sbi->allocated_data_blocks);
+
+	/* validation check of the segment numbers */
 	si->ndirty_node = get_pages(sbi, F2FS_DIRTY_NODES);
 	si->ndirty_dent = get_pages(sbi, F2FS_DIRTY_DENTS);
 	si->ndirty_meta = get_pages(sbi, F2FS_DIRTY_META);
@@ -294,25 +305,32 @@ static void update_mem_info(struct f2fs_sb_info *sbi)
 				sizeof(struct nat_entry_set);
 	for (i = 0; i < MAX_INO_ENTRY; i++)
 		si->cache_mem += sbi->im[i].ino_num * sizeof(struct ino_entry);
-	si->cache_mem += atomic_read(&sbi->total_ext_tree) *
+
+	for (i = 0; i < NR_EXTENT_CACHES; i++) {
+		struct extent_tree_info *eti = &sbi->extent_tree[i];
+
+		si->ext_mem[i] = atomic_read(&eti->total_ext_tree) *
 						sizeof(struct extent_tree);
-	si->cache_mem += atomic_read(&sbi->total_ext_node) *
+		si->ext_mem[i] += atomic_read(&eti->total_ext_node) *
 						sizeof(struct extent_node);
+		si->cache_mem += si->ext_mem[i];
+	}
 
 	si->page_mem = 0;
 	if (sbi->node_inode) {
-		unsigned npages = NODE_MAPPING(sbi)->nrpages;
+		unsigned long npages = NODE_MAPPING(sbi)->nrpages;
 
 		si->page_mem += (unsigned long long)npages << PAGE_SHIFT;
 	}
 	if (sbi->meta_inode) {
-		unsigned npages = META_MAPPING(sbi)->nrpages;
+		unsigned long npages = META_MAPPING(sbi)->nrpages;
 
 		si->page_mem += (unsigned long long)npages << PAGE_SHIFT;
 	}
 #ifdef CONFIG_F2FS_FS_COMPRESSION
 	if (sbi->compress_inode) {
-		unsigned npages = COMPRESS_MAPPING(sbi)->nrpages;
+		unsigned long npages = COMPRESS_MAPPING(sbi)->nrpages;
+
 		si->page_mem += (unsigned long long)npages << PAGE_SHIFT;
 	}
 #endif
@@ -460,28 +478,28 @@ static int stat_show(struct seq_file *s, void *v)
 				si->meta_count[META_NAT]);
 		seq_printf(s, "  - ssa blocks : %u\n",
 				si->meta_count[META_SSA]);
-		seq_printf(s, "CP merge (Queued: %4d, Issued: %4d, Total: %4d, "
-				"Cur time: %4d(ms), Peak time: %4d(ms))\n",
-				si->nr_queued_ckpt, si->nr_issued_ckpt,
-				si->nr_total_ckpt, si->cur_ckpt_time,
-				si->peak_ckpt_time);
+		seq_puts(s, "CP merge:\n");
+		seq_printf(s, "  - Queued : %4d\n", si->nr_queued_ckpt);
+		seq_printf(s, "  - Issued : %4d\n", si->nr_issued_ckpt);
+		seq_printf(s, "  - Total : %4d\n", si->nr_total_ckpt);
+		seq_printf(s, "  - Cur time : %4d(ms)\n", si->cur_ckpt_time);
+		seq_printf(s, "  - Peak time : %4d(ms)\n", si->peak_ckpt_time);
 		seq_printf(s, "GC calls: %d (BG: %d)\n",
 			   si->call_count, si->bg_gc);
 		seq_printf(s, "  - data segments : %d (%d)\n",
 				si->data_segs, si->bg_data_segs);
 		seq_printf(s, "  - node segments : %d (%d)\n",
 				si->node_segs, si->bg_node_segs);
-		seq_printf(s, "  - Reclaimed segs : Normal (%d), Idle CB (%d), "
-				"Idle Greedy (%d), Idle AT (%d), "
-				"Urgent High (%d), Urgent Mid (%d), "
-				"Urgent Low (%d)\n",
-				si->sbi->gc_reclaimed_segs[GC_NORMAL],
-				si->sbi->gc_reclaimed_segs[GC_IDLE_CB],
-				si->sbi->gc_reclaimed_segs[GC_IDLE_GREEDY],
-				si->sbi->gc_reclaimed_segs[GC_IDLE_AT],
-				si->sbi->gc_reclaimed_segs[GC_URGENT_HIGH],
-				si->sbi->gc_reclaimed_segs[GC_URGENT_MID],
-				si->sbi->gc_reclaimed_segs[GC_URGENT_LOW]);
+		seq_puts(s, "  - Reclaimed segs :\n");
+		seq_printf(s, "    - Normal : %d\n", si->sbi->gc_reclaimed_segs[GC_NORMAL]);
+		seq_printf(s, "    - Idle CB : %d\n", si->sbi->gc_reclaimed_segs[GC_IDLE_CB]);
+		seq_printf(s, "    - Idle Greedy : %d\n",
+				si->sbi->gc_reclaimed_segs[GC_IDLE_GREEDY]);
+		seq_printf(s, "    - Idle AT : %d\n", si->sbi->gc_reclaimed_segs[GC_IDLE_AT]);
+		seq_printf(s, "    - Urgent High : %d\n",
+				si->sbi->gc_reclaimed_segs[GC_URGENT_HIGH]);
+		seq_printf(s, "    - Urgent Mid : %d\n", si->sbi->gc_reclaimed_segs[GC_URGENT_MID]);
+		seq_printf(s, "    - Urgent Low : %d\n", si->sbi->gc_reclaimed_segs[GC_URGENT_LOW]);
 		seq_printf(s, "Try to move %d blocks (BG: %d)\n", si->tot_blks,
 				si->bg_data_blks + si->bg_node_blks);
 		seq_printf(s, "  - data blocks : %d (%d)\n", si->data_blks,
@@ -490,26 +508,44 @@ static int stat_show(struct seq_file *s, void *v)
 				si->bg_node_blks);
 		seq_printf(s, "BG skip : IO: %u, Other: %u\n",
 				si->io_skip_bggc, si->other_skip_bggc);
-		seq_puts(s, "\nExtent Cache:\n");
+		seq_puts(s, "\nExtent Cache (Read):\n");
 		seq_printf(s, "  - Hit Count: L1-1:%llu L1-2:%llu L2:%llu\n",
-				si->hit_largest, si->hit_cached,
-				si->hit_rbtree);
+				si->hit_largest, si->hit_cached[EX_READ],
+				si->hit_rbtree[EX_READ]);
 		seq_printf(s, "  - Hit Ratio: %llu%% (%llu / %llu)\n",
-				!si->total_ext ? 0 :
-				div64_u64(si->hit_total * 100, si->total_ext),
-				si->hit_total, si->total_ext);
+				!si->total_ext[EX_READ] ? 0 :
+				div64_u64(si->hit_total[EX_READ] * 100,
+				si->total_ext[EX_READ]),
+				si->hit_total[EX_READ], si->total_ext[EX_READ]);
 		seq_printf(s, "  - Inner Struct Count: tree: %d(%d), node: %d\n",
-				si->ext_tree, si->zombie_tree, si->ext_node);
+				si->ext_tree[EX_READ], si->zombie_tree[EX_READ],
+				si->ext_node[EX_READ]);
+		seq_puts(s, "\nExtent Cache (Block Age):\n");
+		seq_printf(s, "  - Allocated Data Blocks: %llu\n",
+				si->allocated_data_blocks);
+		seq_printf(s, "  - Hit Count: L1:%llu L2:%llu\n",
+				si->hit_cached[EX_BLOCK_AGE],
+				si->hit_rbtree[EX_BLOCK_AGE]);
+		seq_printf(s, "  - Hit Ratio: %llu%% (%llu / %llu)\n",
+				!si->total_ext[EX_BLOCK_AGE] ? 0 :
+				div64_u64(si->hit_total[EX_BLOCK_AGE] * 100,
+				si->total_ext[EX_BLOCK_AGE]),
+				si->hit_total[EX_BLOCK_AGE],
+				si->total_ext[EX_BLOCK_AGE]);
+		seq_printf(s, "  - Inner Struct Count: tree: %d(%d), node: %d\n",
+				si->ext_tree[EX_BLOCK_AGE],
+				si->zombie_tree[EX_BLOCK_AGE],
+				si->ext_node[EX_BLOCK_AGE]);
 		seq_puts(s, "\nBalancing F2FS Async:\n");
 		seq_printf(s, "  - DIO (R: %4d, W: %4d)\n",
 			   si->nr_dio_read, si->nr_dio_write);
 		seq_printf(s, "  - IO_R (Data: %4d, Node: %4d, Meta: %4d\n",
 			   si->nr_rd_data, si->nr_rd_node, si->nr_rd_meta);
-		seq_printf(s, "  - IO_W (CP: %4d, Data: %4d, Flush: (%4d %4d %4d), "
-			"Discard: (%4d %4d)) cmd: %4d undiscard:%4u\n",
+		seq_printf(s, "  - IO_W (CP: %4d, Data: %4d, Flush: (%4d %4d %4d), ",
 			   si->nr_wb_cp_data, si->nr_wb_data,
 			   si->nr_flushing, si->nr_flushed,
-			   si->flush_list_empty,
+			   si->flush_list_empty);
+		seq_printf(s, "Discard: (%4d %4d)) cmd: %4d undiscard:%4u\n",
 			   si->nr_discarding, si->nr_discarded,
 			   si->nr_discard_cmd, si->undiscard_blks);
 		seq_printf(s, "  - atomic IO: %4d (Max. %4d)\n",
@@ -566,8 +602,12 @@ static int stat_show(struct seq_file *s, void *v)
 			(si->base_mem + si->cache_mem + si->page_mem) >> 10);
 		seq_printf(s, "  - static: %llu KB\n",
 				si->base_mem >> 10);
-		seq_printf(s, "  - cached: %llu KB\n",
+		seq_printf(s, "  - cached all: %llu KB\n",
 				si->cache_mem >> 10);
+		seq_printf(s, "  - read extent cache: %llu KB\n",
+				si->ext_mem[EX_READ] >> 10);
+		seq_printf(s, "  - block age extent cache: %llu KB\n",
+				si->ext_mem[EX_BLOCK_AGE] >> 10);
 		seq_printf(s, "  - paged : %llu KB\n",
 				si->page_mem >> 10);
 	}
@@ -600,10 +640,15 @@ int f2fs_build_stats(struct f2fs_sb_info *sbi)
 	si->sbi = sbi;
 	sbi->stat_info = si;
 
-	atomic64_set(&sbi->total_hit_ext, 0);
-	atomic64_set(&sbi->read_hit_rbtree, 0);
+	/* general extent cache stats */
+	for (i = 0; i < NR_EXTENT_CACHES; i++) {
+		atomic64_set(&sbi->total_hit_ext[i], 0);
+		atomic64_set(&sbi->read_hit_rbtree[i], 0);
+		atomic64_set(&sbi->read_hit_cached[i], 0);
+	}
+
+	/* read extent_cache only */
 	atomic64_set(&sbi->read_hit_largest, 0);
-	atomic64_set(&sbi->read_hit_cached, 0);
 
 	atomic_set(&sbi->inline_xattr, 0);
 	atomic_set(&sbi->inline_inode, 0);
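
The debug.c rework above turns the per-scalar extent-cache counters into arrays indexed by enum extent_type, so one loop services both caches while read-only extras such as hit_largest stay outside the loop and fold into EX_READ alone. A condensed model follows, with field types simplified to long long.

/* extstats.c: model of the per-type extent cache stats arrays. */
#include <stdio.h>

enum extent_type { EX_READ, EX_BLOCK_AGE, NR_EXTENT_CACHES };

struct stats {
	long long hit_cached[NR_EXTENT_CACHES];
	long long hit_rbtree[NR_EXTENT_CACHES];
	long long hit_total[NR_EXTENT_CACHES];
	long long hit_largest; /* read cache only: the "largest extent" fast path */
};

static void fold_totals(struct stats *s)
{
	for (int i = 0; i < NR_EXTENT_CACHES; i++)
		s->hit_total[i] = s->hit_cached[i] + s->hit_rbtree[i];
	/* hit_largest has no block-age counterpart; add it to EX_READ only. */
	s->hit_total[EX_READ] += s->hit_largest;
}

int main(void)
{
	struct stats s = { {3, 5}, {2, 4}, {0, 0}, 7 };

	fold_totals(&s);
	printf("read: %lld, block-age: %lld\n",
	       s.hit_total[EX_READ], s.hit_total[EX_BLOCK_AGE]);
	return 0;
}
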
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 21960a8..8e02515 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -340,6 +340,7 @@ static struct f2fs_dir_entry *find_in_level(struct inode *dir,
 	unsigned int bidx, end_block;
 	struct page *dentry_page;
 	struct f2fs_dir_entry *de = NULL;
+	pgoff_t next_pgofs;
 	bool room = false;
 	int max_slots;
 
@@ -350,12 +351,13 @@ static struct f2fs_dir_entry *find_in_level(struct inode *dir,
 			       le32_to_cpu(fname->hash) % nbucket);
 	end_block = bidx + nblock;
 
-	for (; bidx < end_block; bidx++) {
+	while (bidx < end_block) {
 		/* no need to allocate new dentry pages to all the indices */
-		dentry_page = f2fs_find_data_page(dir, bidx);
+		dentry_page = f2fs_find_data_page(dir, bidx, &next_pgofs);
 		if (IS_ERR(dentry_page)) {
 			if (PTR_ERR(dentry_page) == -ENOENT) {
 				room = true;
+				bidx = next_pgofs;
 				continue;
 			} else {
 				*res_page = dentry_page;
@@ -376,6 +378,8 @@ static struct f2fs_dir_entry *find_in_level(struct inode *dir,
 		if (max_slots >= s)
 			room = true;
 		f2fs_put_page(dentry_page, 0);
+
+		bidx++;
 	}
 
 	if (!de && room && F2FS_I(dir)->chash != fname->hash) {
@@ -956,7 +960,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page,
 
 bool f2fs_empty_dir(struct inode *dir)
 {
-	unsigned long bidx;
+	unsigned long bidx = 0;
 	struct page *dentry_page;
 	unsigned int bit_pos;
 	struct f2fs_dentry_block *dentry_blk;
@@ -965,13 +969,17 @@ bool f2fs_empty_dir(struct inode *dir)
 	if (f2fs_has_inline_dentry(dir))
 		return f2fs_empty_inline_dir(dir);
 
-	for (bidx = 0; bidx < nblock; bidx++) {
-		dentry_page = f2fs_get_lock_data_page(dir, bidx, false);
+	while (bidx < nblock) {
+		pgoff_t next_pgofs;
+
+		dentry_page = f2fs_find_data_page(dir, bidx, &next_pgofs);
 		if (IS_ERR(dentry_page)) {
-			if (PTR_ERR(dentry_page) == -ENOENT)
+			if (PTR_ERR(dentry_page) == -ENOENT) {
+				bidx = next_pgofs;
 				continue;
-			else
+			} else {
 				return false;
+			}
 		}
 
 		dentry_blk = page_address(dentry_page);
@@ -983,10 +991,12 @@ bool f2fs_empty_dir(struct inode *dir)
 						NR_DENTRY_IN_BLOCK,
 						bit_pos);
 
-		f2fs_put_page(dentry_page, 1);
+		f2fs_put_page(dentry_page, 0);
 
 		if (bit_pos < NR_DENTRY_IN_BLOCK)
 			return false;
+
+		bidx++;
 	}
 	return true;
 }
@@ -1000,7 +1010,7 @@ int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d,
 	struct fscrypt_str de_name = FSTR_INIT(NULL, 0);
 	struct f2fs_sb_info *sbi = F2FS_I_SB(d->inode);
 	struct blk_plug plug;
-	bool readdir_ra = sbi->readdir_ra == 1;
+	bool readdir_ra = sbi->readdir_ra;
 	bool found_valid_dirent = false;
 	int err = 0;
 
@@ -1104,7 +1114,8 @@ static int f2fs_readdir(struct file *file, struct dir_context *ctx)
 		goto out_free;
 	}
 
-	for (; n < npages; n++, ctx->pos = n * NR_DENTRY_IN_BLOCK) {
+	for (; n < npages; ctx->pos = n * NR_DENTRY_IN_BLOCK) {
+		pgoff_t next_pgofs;
 
 		/* allow readdir() to be interrupted */
 		if (fatal_signal_pending(current)) {
@@ -1118,11 +1129,12 @@ static int f2fs_readdir(struct file *file, struct dir_context *ctx)
 			page_cache_sync_readahead(inode->i_mapping, ra, file, n,
 				min(npages - n, (pgoff_t)MAX_DIR_RA_PAGES));
 
-		dentry_page = f2fs_find_data_page(inode, n);
+		dentry_page = f2fs_find_data_page(inode, n, &next_pgofs);
 		if (IS_ERR(dentry_page)) {
 			err = PTR_ERR(dentry_page);
 			if (err == -ENOENT) {
 				err = 0;
+				n = next_pgofs;
 				continue;
 			} else {
 				goto out_free;
@@ -1141,6 +1153,8 @@ static int f2fs_readdir(struct file *file, struct dir_context *ctx)
 		}
 
 		f2fs_put_page(dentry_page, 0);
+
+		n++;
 	}
 out_free:
 	fscrypt_fname_free_buffer(&fstr);
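
The dir.c loops above switch from probing every block index to jumping: when the lookup fails with -ENOENT, f2fs_get_next_page_offset() (or index + 1) reports where data could next appear, and bidx leaps over the hole. A self-contained model of the saving follows; the block layout inside find_data_page() is invented for the demo.

/* holeskip.c: model of skipping directory holes via next_pgofs. */
#include <stdio.h>
#include <errno.h>

/* Stand-in for f2fs_find_data_page(): blocks 0 and 6 exist; everything
 * else is a hole whose next candidate offset is 6 (or the end, 8). */
static int find_data_page(unsigned long bidx, unsigned long *next_pgofs)
{
	if (bidx == 0 || bidx == 6)
		return 0;
	*next_pgofs = bidx < 6 ? 6 : 8;
	return -ENOENT;
}

int main(void)
{
	unsigned long bidx = 0, nblock = 8, probes = 0;

	while (bidx < nblock) {
		unsigned long next_pgofs;

		probes++;
		if (find_data_page(bidx, &next_pgofs) == -ENOENT) {
			bidx = next_pgofs; /* skip the whole hole at once */
			continue;
		}
		printf("block %lu has data\n", bidx);
		bidx++;
	}
	printf("probes: %lu (was %lu with plain bidx++)\n", probes, nblock);
	return 0;
}
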
diff --git a/fs/f2fs/extent_cache.c b/fs/f2fs/extent_cache.c
index 6c9e6f7..342af24 100644
--- a/fs/f2fs/extent_cache.c
+++ b/fs/f2fs/extent_cache.c
@@ -6,6 +6,10 @@
  * Copyright (c) 2015 Samsung Electronics
  * Authors: Jaegeuk Kim <jaegeuk@kernel.org>
  *          Chao Yu <chao2.yu@samsung.com>
+ *
+ * block_age-based extent cache added by:
+ * Copyright (c) 2022 xiaomi Co., Ltd.
+ *             http://www.xiaomi.com/
  */
 
 #include <linux/fs.h>
@@ -15,6 +19,123 @@
 #include "node.h"
 #include <trace/events/f2fs.h>
 
+static void __set_extent_info(struct extent_info *ei,
+				unsigned int fofs, unsigned int len,
+				block_t blk, bool keep_clen,
+				unsigned long age, unsigned long last_blocks,
+				enum extent_type type)
+{
+	ei->fofs = fofs;
+	ei->len = len;
+
+	if (type == EX_READ) {
+		ei->blk = blk;
+		if (keep_clen)
+			return;
+#ifdef CONFIG_F2FS_FS_COMPRESSION
+		ei->c_len = 0;
+#endif
+	} else if (type == EX_BLOCK_AGE) {
+		ei->age = age;
+		ei->last_blocks = last_blocks;
+	}
+}
+
+static bool __may_read_extent_tree(struct inode *inode)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+
+	if (!test_opt(sbi, READ_EXTENT_CACHE))
+		return false;
+	if (is_inode_flag_set(inode, FI_NO_EXTENT))
+		return false;
+	if (is_inode_flag_set(inode, FI_COMPRESSED_FILE) &&
+			 !f2fs_sb_has_readonly(sbi))
+		return false;
+	return S_ISREG(inode->i_mode);
+}
+
+static bool __may_age_extent_tree(struct inode *inode)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+
+	if (!test_opt(sbi, AGE_EXTENT_CACHE))
+		return false;
+	/* don't cache block age info for cold files */
+	if (is_inode_flag_set(inode, FI_COMPRESSED_FILE))
+		return false;
+	if (file_is_cold(inode))
+		return false;
+
+	return S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode);
+}
+
+static bool __init_may_extent_tree(struct inode *inode, enum extent_type type)
+{
+	if (type == EX_READ)
+		return __may_read_extent_tree(inode);
+	else if (type == EX_BLOCK_AGE)
+		return __may_age_extent_tree(inode);
+	return false;
+}
+
+static bool __may_extent_tree(struct inode *inode, enum extent_type type)
+{
+	/*
+	 * do not create extents for recovered files during mount
+	 * if the shrinker is not registered.
+	 */
+	if (list_empty(&F2FS_I_SB(inode)->s_list))
+		return false;
+
+	return __init_may_extent_tree(inode, type);
+}
+
+static void __try_update_largest_extent(struct extent_tree *et,
+						struct extent_node *en)
+{
+	if (et->type != EX_READ)
+		return;
+	if (en->ei.len <= et->largest.len)
+		return;
+
+	et->largest = en->ei;
+	et->largest_updated = true;
+}
+
+static bool __is_extent_mergeable(struct extent_info *back,
+		struct extent_info *front, enum extent_type type)
+{
+	if (type == EX_READ) {
+#ifdef CONFIG_F2FS_FS_COMPRESSION
+		if (back->c_len && back->len != back->c_len)
+			return false;
+		if (front->c_len && front->len != front->c_len)
+			return false;
+#endif
+		return (back->fofs + back->len == front->fofs &&
+				back->blk + back->len == front->blk);
+	} else if (type == EX_BLOCK_AGE) {
+		return (back->fofs + back->len == front->fofs &&
+			abs(back->age - front->age) <= SAME_AGE_REGION &&
+			abs(back->last_blocks - front->last_blocks) <=
+							SAME_AGE_REGION);
+	}
+	return false;
+}
+
+static bool __is_back_mergeable(struct extent_info *cur,
+		struct extent_info *back, enum extent_type type)
+{
+	return __is_extent_mergeable(back, cur, type);
+}
+
+static bool __is_front_mergeable(struct extent_info *cur,
+		struct extent_info *front, enum extent_type type)
+{
+	return __is_extent_mergeable(cur, front, type);
+}
+
 static struct rb_entry *__lookup_rb_tree_fast(struct rb_entry *cached_re,
 							unsigned int ofs)
 {
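
The new __is_extent_mergeable() above encodes one rule per cache type: read extents merge only when both the logical and the physical ranges abut, while block-age extents merge when the logical ranges abut and the recorded ages (and, in the real code, last_blocks counts) differ by at most SAME_AGE_REGION. A sketch of the predicate follows; the SAME_AGE_REGION value and struct layout are illustrative, and the compression-length checks are omitted.

/* extmerge.c: model of the per-type extent mergeability rules. */
#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>

#define SAME_AGE_REGION 1024 /* illustrative value */

enum extent_type { EX_READ, EX_BLOCK_AGE };

struct extent_info {
	unsigned int fofs, len;
	unsigned int blk; /* EX_READ: starting block address */
	long age;         /* EX_BLOCK_AGE: recorded block age */
};

static bool mergeable(const struct extent_info *back,
		      const struct extent_info *front, enum extent_type type)
{
	if (back->fofs + back->len != front->fofs)
		return false; /* both caches require logical contiguity */
	if (type == EX_READ)
		return back->blk + back->len == front->blk;
	return labs(back->age - front->age) <= SAME_AGE_REGION;
}

int main(void)
{
	struct extent_info a = { .fofs = 0, .len = 4, .blk = 100, .age = 5000 };
	struct extent_info b = { .fofs = 4, .len = 2, .blk = 104, .age = 5600 };
	struct extent_info c = { .fofs = 4, .len = 2, .blk = 999, .age = 9000 };

	printf("read a+b: %d, a+c: %d\n",
	       mergeable(&a, &b, EX_READ), mergeable(&a, &c, EX_READ));
	printf("age  a+b: %d, a+c: %d\n",
	       mergeable(&a, &b, EX_BLOCK_AGE), mergeable(&a, &c, EX_BLOCK_AGE));
	return 0;
}
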
@@ -237,6 +358,7 @@ static struct extent_node *__attach_extent_node(struct f2fs_sb_info *sbi,
 				struct rb_node *parent, struct rb_node **p,
 				bool leftmost)
 {
+	struct extent_tree_info *eti = &sbi->extent_tree[et->type];
 	struct extent_node *en;
 
 	en = f2fs_kmem_cache_alloc(extent_node_slab, GFP_ATOMIC, false, sbi);
@@ -250,16 +372,18 @@ static struct extent_node *__attach_extent_node(struct f2fs_sb_info *sbi,
 	rb_link_node(&en->rb_node, parent, p);
 	rb_insert_color_cached(&en->rb_node, &et->root, leftmost);
 	atomic_inc(&et->node_cnt);
-	atomic_inc(&sbi->total_ext_node);
+	atomic_inc(&eti->total_ext_node);
 	return en;
 }
 
 static void __detach_extent_node(struct f2fs_sb_info *sbi,
 				struct extent_tree *et, struct extent_node *en)
 {
+	struct extent_tree_info *eti = &sbi->extent_tree[et->type];
+
 	rb_erase_cached(&en->rb_node, &et->root);
 	atomic_dec(&et->node_cnt);
-	atomic_dec(&sbi->total_ext_node);
+	atomic_dec(&eti->total_ext_node);
 
 	if (et->cached_en == en)
 		et->cached_en = NULL;
@@ -275,61 +399,51 @@ static void __detach_extent_node(struct f2fs_sb_info *sbi,
 static void __release_extent_node(struct f2fs_sb_info *sbi,
 			struct extent_tree *et, struct extent_node *en)
 {
-	spin_lock(&sbi->extent_lock);
+	struct extent_tree_info *eti = &sbi->extent_tree[et->type];
+
+	spin_lock(&eti->extent_lock);
 	f2fs_bug_on(sbi, list_empty(&en->list));
 	list_del_init(&en->list);
-	spin_unlock(&sbi->extent_lock);
+	spin_unlock(&eti->extent_lock);
 
 	__detach_extent_node(sbi, et, en);
 }
 
-static struct extent_tree *__grab_extent_tree(struct inode *inode)
+static struct extent_tree *__grab_extent_tree(struct inode *inode,
+						enum extent_type type)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct extent_tree_info *eti = &sbi->extent_tree[type];
 	struct extent_tree *et;
 	nid_t ino = inode->i_ino;
 
-	mutex_lock(&sbi->extent_tree_lock);
-	et = radix_tree_lookup(&sbi->extent_tree_root, ino);
+	mutex_lock(&eti->extent_tree_lock);
+	et = radix_tree_lookup(&eti->extent_tree_root, ino);
 	if (!et) {
 		et = f2fs_kmem_cache_alloc(extent_tree_slab,
 					GFP_NOFS, true, NULL);
-		f2fs_radix_tree_insert(&sbi->extent_tree_root, ino, et);
+		f2fs_radix_tree_insert(&eti->extent_tree_root, ino, et);
 		memset(et, 0, sizeof(struct extent_tree));
 		et->ino = ino;
+		et->type = type;
 		et->root = RB_ROOT_CACHED;
 		et->cached_en = NULL;
 		rwlock_init(&et->lock);
 		INIT_LIST_HEAD(&et->list);
 		atomic_set(&et->node_cnt, 0);
-		atomic_inc(&sbi->total_ext_tree);
+		atomic_inc(&eti->total_ext_tree);
 	} else {
-		atomic_dec(&sbi->total_zombie_tree);
+		atomic_dec(&eti->total_zombie_tree);
 		list_del_init(&et->list);
 	}
-	mutex_unlock(&sbi->extent_tree_lock);
+	mutex_unlock(&eti->extent_tree_lock);
 
 	/* never died until evict_inode */
-	F2FS_I(inode)->extent_tree = et;
+	F2FS_I(inode)->extent_tree[type] = et;
 
 	return et;
 }
 
-static struct extent_node *__init_extent_tree(struct f2fs_sb_info *sbi,
-				struct extent_tree *et, struct extent_info *ei)
-{
-	struct rb_node **p = &et->root.rb_root.rb_node;
-	struct extent_node *en;
-
-	en = __attach_extent_node(sbi, et, ei, NULL, p, true);
-	if (!en)
-		return NULL;
-
-	et->largest = en->ei;
-	et->cached_en = en;
-	return en;
-}
-
 static unsigned int __free_extent_tree(struct f2fs_sb_info *sbi,
 					struct extent_tree *et)
 {
@@ -358,71 +472,89 @@ static void __drop_largest_extent(struct extent_tree *et,
 	}
 }
 
-/* return true, if inode page is changed */
-static void __f2fs_init_extent_tree(struct inode *inode, struct page *ipage)
+void f2fs_init_read_extent_tree(struct inode *inode, struct page *ipage)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	struct f2fs_extent *i_ext = ipage ? &F2FS_INODE(ipage)->i_ext : NULL;
+	struct extent_tree_info *eti = &sbi->extent_tree[EX_READ];
+	struct f2fs_extent *i_ext = &F2FS_INODE(ipage)->i_ext;
 	struct extent_tree *et;
 	struct extent_node *en;
 	struct extent_info ei;
 
-	if (!f2fs_may_extent_tree(inode)) {
-		/* drop largest extent */
+	if (!__may_extent_tree(inode, EX_READ)) {
+		/* drop largest read extent */
 		if (i_ext && i_ext->len) {
 			f2fs_wait_on_page_writeback(ipage, NODE, true, true);
 			i_ext->len = 0;
 			set_page_dirty(ipage);
-			return;
 		}
-		return;
+		goto out;
 	}
 
-	et = __grab_extent_tree(inode);
+	et = __grab_extent_tree(inode, EX_READ);
 
 	if (!i_ext || !i_ext->len)
-		return;
+		goto out;
 
-	get_extent_info(&ei, i_ext);
+	get_read_extent_info(&ei, i_ext);
 
 	write_lock(&et->lock);
 	if (atomic_read(&et->node_cnt))
-		goto out;
+		goto unlock_out;
 
-	en = __init_extent_tree(sbi, et, &ei);
+	en = __attach_extent_node(sbi, et, &ei, NULL,
+				&et->root.rb_root.rb_node, true);
 	if (en) {
-		spin_lock(&sbi->extent_lock);
-		list_add_tail(&en->list, &sbi->extent_list);
-		spin_unlock(&sbi->extent_lock);
+		et->largest = en->ei;
+		et->cached_en = en;
+
+		spin_lock(&eti->extent_lock);
+		list_add_tail(&en->list, &eti->extent_list);
+		spin_unlock(&eti->extent_lock);
 	}
-out:
+unlock_out:
 	write_unlock(&et->lock);
-}
-
-void f2fs_init_extent_tree(struct inode *inode, struct page *ipage)
-{
-	__f2fs_init_extent_tree(inode, ipage);
-
-	if (!F2FS_I(inode)->extent_tree)
+out:
+	if (!F2FS_I(inode)->extent_tree[EX_READ])
 		set_inode_flag(inode, FI_NO_EXTENT);
 }
 
-static bool f2fs_lookup_extent_tree(struct inode *inode, pgoff_t pgofs,
-							struct extent_info *ei)
+void f2fs_init_age_extent_tree(struct inode *inode)
+{
+	if (!__init_may_extent_tree(inode, EX_BLOCK_AGE))
+		return;
+	__grab_extent_tree(inode, EX_BLOCK_AGE);
+}
+
+void f2fs_init_extent_tree(struct inode *inode)
+{
+	/* initialize read cache */
+	if (__init_may_extent_tree(inode, EX_READ))
+		__grab_extent_tree(inode, EX_READ);
+
+	/* initialize block age cache */
+	if (__init_may_extent_tree(inode, EX_BLOCK_AGE))
+		__grab_extent_tree(inode, EX_BLOCK_AGE);
+}
+
+static bool __lookup_extent_tree(struct inode *inode, pgoff_t pgofs,
+			struct extent_info *ei, enum extent_type type)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	struct extent_tree *et = F2FS_I(inode)->extent_tree;
+	struct extent_tree_info *eti = &sbi->extent_tree[type];
+	struct extent_tree *et = F2FS_I(inode)->extent_tree[type];
 	struct extent_node *en;
 	bool ret = false;
 
 	if (!et)
 		return false;
 
-	trace_f2fs_lookup_extent_tree_start(inode, pgofs);
+	trace_f2fs_lookup_extent_tree_start(inode, pgofs, type);
 
 	read_lock(&et->lock);
 
-	if (et->largest.fofs <= pgofs &&
+	if (type == EX_READ &&
+			et->largest.fofs <= pgofs &&
 			et->largest.fofs + et->largest.len > pgofs) {
 		*ei = et->largest;
 		ret = true;
@@ -436,23 +568,26 @@ static bool f2fs_lookup_extent_tree(struct inode *inode, pgoff_t pgofs,
 		goto out;
 
 	if (en == et->cached_en)
-		stat_inc_cached_node_hit(sbi);
+		stat_inc_cached_node_hit(sbi, type);
 	else
-		stat_inc_rbtree_node_hit(sbi);
+		stat_inc_rbtree_node_hit(sbi, type);
 
 	*ei = en->ei;
-	spin_lock(&sbi->extent_lock);
+	spin_lock(&eti->extent_lock);
 	if (!list_empty(&en->list)) {
-		list_move_tail(&en->list, &sbi->extent_list);
+		list_move_tail(&en->list, &eti->extent_list);
 		et->cached_en = en;
 	}
-	spin_unlock(&sbi->extent_lock);
+	spin_unlock(&eti->extent_lock);
 	ret = true;
 out:
-	stat_inc_total_hit(sbi);
+	stat_inc_total_hit(sbi, type);
 	read_unlock(&et->lock);
 
-	trace_f2fs_lookup_extent_tree_end(inode, pgofs, ei);
+	if (type == EX_READ)
+		trace_f2fs_lookup_read_extent_tree_end(inode, pgofs, ei);
+	else if (type == EX_BLOCK_AGE)
+		trace_f2fs_lookup_age_extent_tree_end(inode, pgofs, ei);
 	return ret;
 }
 
@@ -461,18 +596,20 @@ static struct extent_node *__try_merge_extent_node(struct f2fs_sb_info *sbi,
 				struct extent_node *prev_ex,
 				struct extent_node *next_ex)
 {
+	struct extent_tree_info *eti = &sbi->extent_tree[et->type];
 	struct extent_node *en = NULL;
 
-	if (prev_ex && __is_back_mergeable(ei, &prev_ex->ei)) {
+	if (prev_ex && __is_back_mergeable(ei, &prev_ex->ei, et->type)) {
 		prev_ex->ei.len += ei->len;
 		ei = &prev_ex->ei;
 		en = prev_ex;
 	}
 
-	if (next_ex && __is_front_mergeable(ei, &next_ex->ei)) {
+	if (next_ex && __is_front_mergeable(ei, &next_ex->ei, et->type)) {
 		next_ex->ei.fofs = ei->fofs;
-		next_ex->ei.blk = ei->blk;
 		next_ex->ei.len += ei->len;
+		if (et->type == EX_READ)
+			next_ex->ei.blk = ei->blk;
 		if (en)
 			__release_extent_node(sbi, et, prev_ex);
 
@@ -484,12 +621,12 @@ static struct extent_node *__try_merge_extent_node(struct f2fs_sb_info *sbi,
 
 	__try_update_largest_extent(et, en);
 
-	spin_lock(&sbi->extent_lock);
+	spin_lock(&eti->extent_lock);
 	if (!list_empty(&en->list)) {
-		list_move_tail(&en->list, &sbi->extent_list);
+		list_move_tail(&en->list, &eti->extent_list);
 		et->cached_en = en;
 	}
-	spin_unlock(&sbi->extent_lock);
+	spin_unlock(&eti->extent_lock);
 	return en;
 }
 
@@ -499,6 +636,7 @@ static struct extent_node *__insert_extent_tree(struct f2fs_sb_info *sbi,
 				struct rb_node *insert_parent,
 				bool leftmost)
 {
+	struct extent_tree_info *eti = &sbi->extent_tree[et->type];
 	struct rb_node **p;
 	struct rb_node *parent = NULL;
 	struct extent_node *en = NULL;
@@ -521,48 +659,55 @@ static struct extent_node *__insert_extent_tree(struct f2fs_sb_info *sbi,
 	__try_update_largest_extent(et, en);
 
 	/* update in global extent list */
-	spin_lock(&sbi->extent_lock);
-	list_add_tail(&en->list, &sbi->extent_list);
+	spin_lock(&eti->extent_lock);
+	list_add_tail(&en->list, &eti->extent_list);
 	et->cached_en = en;
-	spin_unlock(&sbi->extent_lock);
+	spin_unlock(&eti->extent_lock);
 	return en;
 }
 
-static void f2fs_update_extent_tree_range(struct inode *inode,
-				pgoff_t fofs, block_t blkaddr, unsigned int len)
+static void __update_extent_tree_range(struct inode *inode,
+			struct extent_info *tei, enum extent_type type)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	struct extent_tree *et = F2FS_I(inode)->extent_tree;
+	struct extent_tree *et = F2FS_I(inode)->extent_tree[type];
 	struct extent_node *en = NULL, *en1 = NULL;
 	struct extent_node *prev_en = NULL, *next_en = NULL;
 	struct extent_info ei, dei, prev;
 	struct rb_node **insert_p = NULL, *insert_parent = NULL;
+	unsigned int fofs = tei->fofs, len = tei->len;
 	unsigned int end = fofs + len;
-	unsigned int pos = (unsigned int)fofs;
 	bool updated = false;
 	bool leftmost = false;
 
 	if (!et)
 		return;
 
-	trace_f2fs_update_extent_tree_range(inode, fofs, blkaddr, len, 0);
+	if (type == EX_READ)
+		trace_f2fs_update_read_extent_tree_range(inode, fofs, len,
+						tei->blk, 0);
+	else if (type == EX_BLOCK_AGE)
+		trace_f2fs_update_age_extent_tree_range(inode, fofs, len,
+						tei->age, tei->last_blocks);
 
 	write_lock(&et->lock);
 
-	if (is_inode_flag_set(inode, FI_NO_EXTENT)) {
-		write_unlock(&et->lock);
-		return;
+	if (type == EX_READ) {
+		if (is_inode_flag_set(inode, FI_NO_EXTENT)) {
+			write_unlock(&et->lock);
+			return;
+		}
+
+		prev = et->largest;
+		dei.len = 0;
+
+		/*
+		 * drop largest extent before lookup, in case it's already
+		 * been shrunk from extent tree
+		 */
+		__drop_largest_extent(et, fofs, len);
 	}
 
-	prev = et->largest;
-	dei.len = 0;
-
-	/*
-	 * drop largest extent before lookup, in case it's already
-	 * been shrunk from extent tree
-	 */
-	__drop_largest_extent(et, fofs, len);
-
 	/* 1. lookup first extent node in range [fofs, fofs + len - 1] */
 	en = (struct extent_node *)f2fs_lookup_rb_tree_ret(&et->root,
 					(struct rb_entry *)et->cached_en, fofs,
@@ -582,26 +727,32 @@ static void f2fs_update_extent_tree_range(struct inode *inode,
 
 		dei = en->ei;
 		org_end = dei.fofs + dei.len;
-		f2fs_bug_on(sbi, pos >= org_end);
+		f2fs_bug_on(sbi, fofs >= org_end);
 
-		if (pos > dei.fofs && pos - dei.fofs >= F2FS_MIN_EXTENT_LEN) {
-			en->ei.len = pos - en->ei.fofs;
+		if (fofs > dei.fofs && (type != EX_READ ||
+				fofs - dei.fofs >= F2FS_MIN_EXTENT_LEN)) {
+			en->ei.len = fofs - en->ei.fofs;
 			prev_en = en;
 			parts = 1;
 		}
 
-		if (end < org_end && org_end - end >= F2FS_MIN_EXTENT_LEN) {
+		if (end < org_end && (type != EX_READ ||
+				org_end - end >= F2FS_MIN_EXTENT_LEN)) {
 			if (parts) {
-				set_extent_info(&ei, end,
-						end - dei.fofs + dei.blk,
-						org_end - end);
+				__set_extent_info(&ei,
+					end, org_end - end,
+					end - dei.fofs + dei.blk, false,
+					dei.age, dei.last_blocks,
+					type);
 				en1 = __insert_extent_tree(sbi, et, &ei,
 							NULL, NULL, true);
 				next_en = en1;
 			} else {
-				en->ei.fofs = end;
-				en->ei.blk += end - dei.fofs;
-				en->ei.len -= end - dei.fofs;
+				__set_extent_info(&en->ei,
+					end, en->ei.len - (end - dei.fofs),
+					en->ei.blk + (end - dei.fofs), true,
+					dei.age, dei.last_blocks,
+					type);
 				next_en = en;
 			}
 			parts++;
@@ -631,10 +782,15 @@ static void f2fs_update_extent_tree_range(struct inode *inode,
 		en = next_en;
 	}
 
-	/* 3. update extent in extent cache */
-	if (blkaddr) {
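+	/* the age cache stores no block address; it is handled separately below */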
+	if (type == EX_BLOCK_AGE)
+		goto update_age_extent_cache;
 
-		set_extent_info(&ei, fofs, blkaddr, len);
+	/* 3. update extent in read extent cache */
+	BUG_ON(type != EX_READ);
+
+	if (tei->blk) {
+		__set_extent_info(&ei, fofs, len, tei->blk, false,
+				  0, 0, EX_READ);
 		if (!__try_merge_extent_node(sbi, et, &ei, prev_en, next_en))
 			__insert_extent_tree(sbi, et, &ei,
 					insert_p, insert_parent, leftmost);
@@ -656,7 +812,17 @@ static void f2fs_update_extent_tree_range(struct inode *inode,
 		et->largest_updated = false;
 		updated = true;
 	}
+	goto out_read_extent_cache;
+update_age_extent_cache:
+	if (!tei->last_blocks)
+		goto out_read_extent_cache;
 
+	__set_extent_info(&ei, fofs, len, 0, false,
+			tei->age, tei->last_blocks, EX_BLOCK_AGE);
+	if (!__try_merge_extent_node(sbi, et, &ei, prev_en, next_en))
+		__insert_extent_tree(sbi, et, &ei,
+					insert_p, insert_parent, leftmost);
+out_read_extent_cache:
 	write_unlock(&et->lock);
 
 	if (updated)
@@ -664,19 +830,20 @@ static void f2fs_update_extent_tree_range(struct inode *inode,
 }
 
 #ifdef CONFIG_F2FS_FS_COMPRESSION
-void f2fs_update_extent_tree_range_compressed(struct inode *inode,
+void f2fs_update_read_extent_tree_range_compressed(struct inode *inode,
 				pgoff_t fofs, block_t blkaddr, unsigned int llen,
 				unsigned int c_len)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	struct extent_tree *et = F2FS_I(inode)->extent_tree;
+	struct extent_tree *et = F2FS_I(inode)->extent_tree[EX_READ];
 	struct extent_node *en = NULL;
 	struct extent_node *prev_en = NULL, *next_en = NULL;
 	struct extent_info ei;
 	struct rb_node **insert_p = NULL, *insert_parent = NULL;
 	bool leftmost = false;
 
-	trace_f2fs_update_extent_tree_range(inode, fofs, blkaddr, llen, c_len);
+	trace_f2fs_update_read_extent_tree_range(inode, fofs, llen,
+						blkaddr, c_len);
 
 	/* it is safe here to check FI_NO_EXTENT w/o et->lock in ro image */
 	if (is_inode_flag_set(inode, FI_NO_EXTENT))
@@ -693,7 +860,7 @@ void f2fs_update_extent_tree_range_compressed(struct inode *inode,
 	if (en)
 		goto unlock_out;
 
-	set_extent_info(&ei, fofs, blkaddr, llen);
+	__set_extent_info(&ei, fofs, llen, blkaddr, true, 0, 0, EX_READ);
 	ei.c_len = c_len;
 
 	if (!__try_merge_extent_node(sbi, et, &ei, prev_en, next_en))
@@ -704,24 +871,114 @@ void f2fs_update_extent_tree_range_compressed(struct inode *inode,
 }
 #endif
 
-unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink)
+static unsigned long long __calculate_block_age(unsigned long long new,
+						unsigned long long old)
 {
+	unsigned long long diff;
+
+	diff = (new >= old) ? new - (new - old) : new + (old - new);
+
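+	/* keep LAST_AGE_WEIGHT percent of the combined age */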
+	return div_u64(diff * LAST_AGE_WEIGHT, 100);
+}
+
+/* This returns a new age and allocated blocks in ei */
+static int __get_new_block_age(struct inode *inode, struct extent_info *ei,
+						block_t blkaddr)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	loff_t f_size = i_size_read(inode);
+	unsigned long long cur_blocks =
+				atomic64_read(&sbi->allocated_data_blocks);
+	struct extent_info tei = *ei;	/* only fofs and len are valid */
+
+	/*
+	 * When I/O is not aligned to PAGE_SIZE, an update will hit the last
+	 * file block even in a sequential write. So don't record an age for
+	 * the newly written last file block here.
+	 */
+	if ((f_size >> PAGE_SHIFT) == ei->fofs && f_size & (PAGE_SIZE - 1) &&
+			blkaddr == NEW_ADDR)
+		return -EINVAL;
+
+	if (__lookup_extent_tree(inode, ei->fofs, &tei, EX_BLOCK_AGE)) {
+		unsigned long long cur_age;
+
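+		/* data blocks allocated globally since this extent was last updated */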
+		if (cur_blocks >= tei.last_blocks)
+			cur_age = cur_blocks - tei.last_blocks;
+		else
+			/* allocated_data_blocks overflow */
+			cur_age = ULLONG_MAX - tei.last_blocks + cur_blocks;
+
+		if (tei.age)
+			ei->age = __calculate_block_age(cur_age, tei.age);
+		else
+			ei->age = cur_age;
+		ei->last_blocks = cur_blocks;
+		WARN_ON(ei->age > cur_blocks);
+		return 0;
+	}
+
+	f2fs_bug_on(sbi, blkaddr == NULL_ADDR);
+
+	/* the data block was allocated for the first time */
+	if (blkaddr == NEW_ADDR)
+		goto out;
+
+	if (__is_valid_data_blkaddr(blkaddr) &&
+	    !f2fs_is_valid_blkaddr(sbi, blkaddr, DATA_GENERIC_ENHANCE)) {
+		f2fs_bug_on(sbi, 1);
+		return -EINVAL;
+	}
+out:
+	/*
+	 * init block age with zero; this can happen when the block age extent
+	 * was reclaimed under memory pressure or after a system reboot
+	 */
+	ei->age = 0;
+	ei->last_blocks = cur_blocks;
+	return 0;
+}
+
+static void __update_extent_cache(struct dnode_of_data *dn, enum extent_type type)
+{
+	struct extent_info ei = {};
+
+	if (!__may_extent_tree(dn->inode, type))
+		return;
+
+	ei.fofs = f2fs_start_bidx_of_node(ofs_of_node(dn->node_page), dn->inode) +
+								dn->ofs_in_node;
+	ei.len = 1;
+
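+	/* EX_READ caches a block address; EX_BLOCK_AGE records an age sample */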
+	if (type == EX_READ) {
+		if (dn->data_blkaddr == NEW_ADDR)
+			ei.blk = NULL_ADDR;
+		else
+			ei.blk = dn->data_blkaddr;
+	} else if (type == EX_BLOCK_AGE) {
+		if (__get_new_block_age(dn->inode, &ei, dn->data_blkaddr))
+			return;
+	}
+	__update_extent_tree_range(dn->inode, &ei, type);
+}
+
+static unsigned int __shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink,
+					enum extent_type type)
+{
+	struct extent_tree_info *eti = &sbi->extent_tree[type];
 	struct extent_tree *et, *next;
 	struct extent_node *en;
 	unsigned int node_cnt = 0, tree_cnt = 0;
 	int remained;
 
-	if (!test_opt(sbi, EXTENT_CACHE))
-		return 0;
-
-	if (!atomic_read(&sbi->total_zombie_tree))
+	if (!atomic_read(&eti->total_zombie_tree))
 		goto free_node;
 
-	if (!mutex_trylock(&sbi->extent_tree_lock))
+	if (!mutex_trylock(&eti->extent_tree_lock))
 		goto out;
 
 	/* 1. remove unreferenced extent tree */
-	list_for_each_entry_safe(et, next, &sbi->zombie_list, list) {
+	list_for_each_entry_safe(et, next, &eti->zombie_list, list) {
 		if (atomic_read(&et->node_cnt)) {
 			write_lock(&et->lock);
 			node_cnt += __free_extent_tree(sbi, et);
@@ -729,61 +986,137 @@ unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink)
 		}
 		f2fs_bug_on(sbi, atomic_read(&et->node_cnt));
 		list_del_init(&et->list);
-		radix_tree_delete(&sbi->extent_tree_root, et->ino);
+		radix_tree_delete(&eti->extent_tree_root, et->ino);
 		kmem_cache_free(extent_tree_slab, et);
-		atomic_dec(&sbi->total_ext_tree);
-		atomic_dec(&sbi->total_zombie_tree);
+		atomic_dec(&eti->total_ext_tree);
+		atomic_dec(&eti->total_zombie_tree);
 		tree_cnt++;
 
 		if (node_cnt + tree_cnt >= nr_shrink)
 			goto unlock_out;
 		cond_resched();
 	}
-	mutex_unlock(&sbi->extent_tree_lock);
+	mutex_unlock(&eti->extent_tree_lock);
 
 free_node:
 	/* 2. remove LRU extent entries */
-	if (!mutex_trylock(&sbi->extent_tree_lock))
+	if (!mutex_trylock(&eti->extent_tree_lock))
 		goto out;
 
 	remained = nr_shrink - (node_cnt + tree_cnt);
 
-	spin_lock(&sbi->extent_lock);
+	spin_lock(&eti->extent_lock);
 	for (; remained > 0; remained--) {
-		if (list_empty(&sbi->extent_list))
+		if (list_empty(&eti->extent_list))
 			break;
-		en = list_first_entry(&sbi->extent_list,
+		en = list_first_entry(&eti->extent_list,
 					struct extent_node, list);
 		et = en->et;
 		if (!write_trylock(&et->lock)) {
 			/* refresh this extent node's position in extent list */
-			list_move_tail(&en->list, &sbi->extent_list);
+			list_move_tail(&en->list, &eti->extent_list);
 			continue;
 		}
 
 		list_del_init(&en->list);
-		spin_unlock(&sbi->extent_lock);
+		spin_unlock(&eti->extent_lock);
 
 		__detach_extent_node(sbi, et, en);
 
 		write_unlock(&et->lock);
 		node_cnt++;
-		spin_lock(&sbi->extent_lock);
+		spin_lock(&eti->extent_lock);
 	}
-	spin_unlock(&sbi->extent_lock);
+	spin_unlock(&eti->extent_lock);
 
 unlock_out:
-	mutex_unlock(&sbi->extent_tree_lock);
+	mutex_unlock(&eti->extent_tree_lock);
 out:
-	trace_f2fs_shrink_extent_tree(sbi, node_cnt, tree_cnt);
+	trace_f2fs_shrink_extent_tree(sbi, node_cnt, tree_cnt, type);
 
 	return node_cnt + tree_cnt;
 }
 
-unsigned int f2fs_destroy_extent_node(struct inode *inode)
+/* read extent cache operations */
+bool f2fs_lookup_read_extent_cache(struct inode *inode, pgoff_t pgofs,
+				struct extent_info *ei)
+{
+	if (!__may_extent_tree(inode, EX_READ))
+		return false;
+
+	return __lookup_extent_tree(inode, pgofs, ei, EX_READ);
+}
+
+void f2fs_update_read_extent_cache(struct dnode_of_data *dn)
+{
+	return __update_extent_cache(dn, EX_READ);
+}
+
+void f2fs_update_read_extent_cache_range(struct dnode_of_data *dn,
+				pgoff_t fofs, block_t blkaddr, unsigned int len)
+{
+	struct extent_info ei = {
+		.fofs = fofs,
+		.len = len,
+		.blk = blkaddr,
+	};
+
+	if (!__may_extent_tree(dn->inode, EX_READ))
+		return;
+
+	__update_extent_tree_range(dn->inode, &ei, EX_READ);
+}
+
+unsigned int f2fs_shrink_read_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink)
+{
+	if (!test_opt(sbi, READ_EXTENT_CACHE))
+		return 0;
+
+	return __shrink_extent_tree(sbi, nr_shrink, EX_READ);
+}
+
+/* block age extent cache operations */
+bool f2fs_lookup_age_extent_cache(struct inode *inode, pgoff_t pgofs,
+				struct extent_info *ei)
+{
+	if (!__may_extent_tree(inode, EX_BLOCK_AGE))
+		return false;
+
+	return __lookup_extent_tree(inode, pgofs, ei, EX_BLOCK_AGE);
+}
+
+void f2fs_update_age_extent_cache(struct dnode_of_data *dn)
+{
+	return __update_extent_cache(dn, EX_BLOCK_AGE);
+}
+
+void f2fs_update_age_extent_cache_range(struct dnode_of_data *dn,
+				pgoff_t fofs, unsigned int len)
+{
+	struct extent_info ei = {
+		.fofs = fofs,
+		.len = len,
+	};
+
+	if (!__may_extent_tree(dn->inode, EX_BLOCK_AGE))
+		return;
+
+	__update_extent_tree_range(dn->inode, &ei, EX_BLOCK_AGE);
+}
+
+unsigned int f2fs_shrink_age_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink)
+{
+	if (!test_opt(sbi, AGE_EXTENT_CACHE))
+		return 0;
+
+	return __shrink_extent_tree(sbi, nr_shrink, EX_BLOCK_AGE);
+}
+
+static unsigned int __destroy_extent_node(struct inode *inode,
+					enum extent_type type)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	struct extent_tree *et = F2FS_I(inode)->extent_tree;
+	struct extent_tree *et = F2FS_I(inode)->extent_tree[type];
 	unsigned int node_cnt = 0;
 
 	if (!et || !atomic_read(&et->node_cnt))
@@ -796,31 +1129,46 @@ unsigned int f2fs_destroy_extent_node(struct inode *inode)
 	return node_cnt;
 }
 
-void f2fs_drop_extent_tree(struct inode *inode)
+void f2fs_destroy_extent_node(struct inode *inode)
+{
+	__destroy_extent_node(inode, EX_READ);
+	__destroy_extent_node(inode, EX_BLOCK_AGE);
+}
+
+static void __drop_extent_tree(struct inode *inode, enum extent_type type)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	struct extent_tree *et = F2FS_I(inode)->extent_tree;
+	struct extent_tree *et = F2FS_I(inode)->extent_tree[type];
 	bool updated = false;
 
-	if (!f2fs_may_extent_tree(inode))
+	if (!__may_extent_tree(inode, type))
 		return;
 
 	write_lock(&et->lock);
-	set_inode_flag(inode, FI_NO_EXTENT);
 	__free_extent_tree(sbi, et);
-	if (et->largest.len) {
-		et->largest.len = 0;
-		updated = true;
+	if (type == EX_READ) {
+		set_inode_flag(inode, FI_NO_EXTENT);
+		if (et->largest.len) {
+			et->largest.len = 0;
+			updated = true;
+		}
 	}
 	write_unlock(&et->lock);
 	if (updated)
 		f2fs_mark_inode_dirty_sync(inode, true);
 }
 
-void f2fs_destroy_extent_tree(struct inode *inode)
+void f2fs_drop_extent_tree(struct inode *inode)
+{
+	__drop_extent_tree(inode, EX_READ);
+	__drop_extent_tree(inode, EX_BLOCK_AGE);
+}
+
+static void __destroy_extent_tree(struct inode *inode, enum extent_type type)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	struct extent_tree *et = F2FS_I(inode)->extent_tree;
+	struct extent_tree_info *eti = &sbi->extent_tree[type];
+	struct extent_tree *et = F2FS_I(inode)->extent_tree[type];
 	unsigned int node_cnt = 0;
 
 	if (!et)
@@ -828,76 +1176,56 @@ void f2fs_destroy_extent_tree(struct inode *inode)
 
 	if (inode->i_nlink && !is_bad_inode(inode) &&
 					atomic_read(&et->node_cnt)) {
-		mutex_lock(&sbi->extent_tree_lock);
-		list_add_tail(&et->list, &sbi->zombie_list);
-		atomic_inc(&sbi->total_zombie_tree);
-		mutex_unlock(&sbi->extent_tree_lock);
+		mutex_lock(&eti->extent_tree_lock);
+		list_add_tail(&et->list, &eti->zombie_list);
+		atomic_inc(&eti->total_zombie_tree);
+		mutex_unlock(&eti->extent_tree_lock);
 		return;
 	}
 
 	/* free all extent info belonging to this extent tree */
-	node_cnt = f2fs_destroy_extent_node(inode);
+	node_cnt = __destroy_extent_node(inode, type);
 
 	/* delete extent tree entry in radix tree */
-	mutex_lock(&sbi->extent_tree_lock);
+	mutex_lock(&eti->extent_tree_lock);
 	f2fs_bug_on(sbi, atomic_read(&et->node_cnt));
-	radix_tree_delete(&sbi->extent_tree_root, inode->i_ino);
+	radix_tree_delete(&eti->extent_tree_root, inode->i_ino);
 	kmem_cache_free(extent_tree_slab, et);
-	atomic_dec(&sbi->total_ext_tree);
-	mutex_unlock(&sbi->extent_tree_lock);
+	atomic_dec(&eti->total_ext_tree);
+	mutex_unlock(&eti->extent_tree_lock);
 
-	F2FS_I(inode)->extent_tree = NULL;
+	F2FS_I(inode)->extent_tree[type] = NULL;
 
-	trace_f2fs_destroy_extent_tree(inode, node_cnt);
+	trace_f2fs_destroy_extent_tree(inode, node_cnt, type);
 }
 
-bool f2fs_lookup_extent_cache(struct inode *inode, pgoff_t pgofs,
-					struct extent_info *ei)
+void f2fs_destroy_extent_tree(struct inode *inode)
 {
-	if (!f2fs_may_extent_tree(inode))
-		return false;
-
-	return f2fs_lookup_extent_tree(inode, pgofs, ei);
+	__destroy_extent_tree(inode, EX_READ);
+	__destroy_extent_tree(inode, EX_BLOCK_AGE);
 }
 
-void f2fs_update_extent_cache(struct dnode_of_data *dn)
+static void __init_extent_tree_info(struct extent_tree_info *eti)
 {
-	pgoff_t fofs;
-	block_t blkaddr;
-
-	if (!f2fs_may_extent_tree(dn->inode))
-		return;
-
-	if (dn->data_blkaddr == NEW_ADDR)
-		blkaddr = NULL_ADDR;
-	else
-		blkaddr = dn->data_blkaddr;
-
-	fofs = f2fs_start_bidx_of_node(ofs_of_node(dn->node_page), dn->inode) +
-								dn->ofs_in_node;
-	f2fs_update_extent_tree_range(dn->inode, fofs, blkaddr, 1);
-}
-
-void f2fs_update_extent_cache_range(struct dnode_of_data *dn,
-				pgoff_t fofs, block_t blkaddr, unsigned int len)
-
-{
-	if (!f2fs_may_extent_tree(dn->inode))
-		return;
-
-	f2fs_update_extent_tree_range(dn->inode, fofs, blkaddr, len);
+	INIT_RADIX_TREE(&eti->extent_tree_root, GFP_NOIO);
+	mutex_init(&eti->extent_tree_lock);
+	INIT_LIST_HEAD(&eti->extent_list);
+	spin_lock_init(&eti->extent_lock);
+	atomic_set(&eti->total_ext_tree, 0);
+	INIT_LIST_HEAD(&eti->zombie_list);
+	atomic_set(&eti->total_zombie_tree, 0);
+	atomic_set(&eti->total_ext_node, 0);
 }
 
 void f2fs_init_extent_cache_info(struct f2fs_sb_info *sbi)
 {
-	INIT_RADIX_TREE(&sbi->extent_tree_root, GFP_NOIO);
-	mutex_init(&sbi->extent_tree_lock);
-	INIT_LIST_HEAD(&sbi->extent_list);
-	spin_lock_init(&sbi->extent_lock);
-	atomic_set(&sbi->total_ext_tree, 0);
-	INIT_LIST_HEAD(&sbi->zombie_list);
-	atomic_set(&sbi->total_zombie_tree, 0);
-	atomic_set(&sbi->total_ext_node, 0);
+	__init_extent_tree_info(&sbi->extent_tree[EX_READ]);
+	__init_extent_tree_info(&sbi->extent_tree[EX_BLOCK_AGE]);
+
+	/* initialize for block age extents */
+	atomic64_set(&sbi->allocated_data_blocks, 0);
+	sbi->hot_data_age_threshold = DEF_HOT_DATA_AGE_THRESHOLD;
+	sbi->warm_data_age_threshold = DEF_WARM_DATA_AGE_THRESHOLD;
 }
 
 int __init f2fs_create_extent_cache(void)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8b9f0b3..e8953c3 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -60,6 +60,7 @@ enum {
 	FAULT_SLAB_ALLOC,
 	FAULT_DQUOT_INIT,
 	FAULT_LOCK_OP,
+	FAULT_BLKADDR,
 	FAULT_MAX,
 };
 
@@ -91,7 +92,7 @@ extern const char *f2fs_fault_name[FAULT_MAX];
 #define F2FS_MOUNT_FLUSH_MERGE		0x00000400
 #define F2FS_MOUNT_NOBARRIER		0x00000800
 #define F2FS_MOUNT_FASTBOOT		0x00001000
-#define F2FS_MOUNT_EXTENT_CACHE		0x00002000
+#define F2FS_MOUNT_READ_EXTENT_CACHE	0x00002000
 #define F2FS_MOUNT_DATA_FLUSH		0x00008000
 #define F2FS_MOUNT_FAULT_INJECTION	0x00010000
 #define F2FS_MOUNT_USRQUOTA		0x00080000
@@ -106,6 +107,7 @@ extern const char *f2fs_fault_name[FAULT_MAX];
 #define F2FS_MOUNT_MERGE_CHECKPOINT	0x10000000
 #define	F2FS_MOUNT_GC_MERGE		0x20000000
 #define F2FS_MOUNT_COMPRESS_CACHE	0x40000000
+#define F2FS_MOUNT_AGE_EXTENT_CACHE	0x80000000
 
 #define F2FS_OPTION(sbi)	((sbi)->mount_opt)
 #define clear_opt(sbi, option)	(F2FS_OPTION(sbi).opt &= ~F2FS_MOUNT_##option)
@@ -202,10 +204,6 @@ struct f2fs_mount_info {
 #define __F2FS_HAS_FEATURE(raw_super, mask)				\
 	((raw_super->feature & cpu_to_le32(mask)) != 0)
 #define F2FS_HAS_FEATURE(sbi, mask)	__F2FS_HAS_FEATURE(sbi->raw_super, mask)
-#define F2FS_SET_FEATURE(sbi, mask)					\
-	(sbi->raw_super->feature |= cpu_to_le32(mask))
-#define F2FS_CLEAR_FEATURE(sbi, mask)					\
-	(sbi->raw_super->feature &= ~cpu_to_le32(mask))
 
 /*
  * Default values for user and/or group using reserved blocks
@@ -328,8 +326,12 @@ struct discard_entry {
 	unsigned char discard_map[SIT_VBLOCK_MAP_SIZE];	/* segment discard bitmap */
 };
 
+/* minimum discard granularity, unit: block count */
+#define MIN_DISCARD_GRANULARITY		1
 /* default discard granularity of inner discard thread, unit: block count */
 #define DEFAULT_DISCARD_GRANULARITY		16
+/* default maximum discard granularity of ordered discard, unit: block count */
+#define DEFAULT_MAX_ORDERED_DISCARD_GRANULARITY	16
 
 /* max discard pend list number */
 #define MAX_PLIST_NUM		512
@@ -408,7 +410,9 @@ struct discard_cmd_control {
 	unsigned int min_discard_issue_time;	/* min. interval between discard issue */
 	unsigned int mid_discard_issue_time;	/* mid. interval between discard issue */
 	unsigned int max_discard_issue_time;	/* max. interval between discard issue */
+	unsigned int discard_urgent_util;	/* utilization at which to issue discard proactively */
 	unsigned int discard_granularity;	/* discard granularity */
+	unsigned int max_ordered_discard;	/* maximum discard granularity issued in LBA order */
 	unsigned int undiscard_blks;		/* # of undiscard blocks */
 	unsigned int next_pos;			/* next discard position */
 	atomic_t issued_discard;		/* # of issued discard */
@@ -593,17 +597,36 @@ enum {
 /* dirty segments threshold for triggering CP */
 #define DEFAULT_DIRTY_THRESHOLD		4
 
-/* for in-memory extent cache entry */
-#define F2FS_MIN_EXTENT_LEN	64	/* minimum extent length */
-
-/* number of extent info in extent cache we try to shrink */
-#define EXTENT_CACHE_SHRINK_NUMBER	128
-
 #define RECOVERY_MAX_RA_BLOCKS		BIO_MAX_VECS
 #define RECOVERY_MIN_RA_BLOCKS		1
 
 #define F2FS_ONSTACK_PAGES	16	/* nr of onstack pages */
 
+/* for in-memory extent cache entry */
+#define F2FS_MIN_EXTENT_LEN	64	/* minimum extent length */
+
+/* number of extent info in extent cache we try to shrink */
+#define READ_EXTENT_CACHE_SHRINK_NUMBER	128
+
+/* number of age extent info in extent cache we try to shrink */
+#define AGE_EXTENT_CACHE_SHRINK_NUMBER	128
+#define LAST_AGE_WEIGHT			30
+#define SAME_AGE_REGION			1024
+
+/*
+ * Define a data block with age less than 1GB as hot data, and a data
+ * block with age between 1GB and 10GB as warm data.
+ */
+#define DEF_HOT_DATA_AGE_THRESHOLD	262144
+#define DEF_WARM_DATA_AGE_THRESHOLD	2621440
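+/* (with 4KB blocks: 262144 blocks == 1GB, 2621440 blocks == 10GB) */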
+
+/* extent cache type */
+enum extent_type {
+	EX_READ,
+	EX_BLOCK_AGE,
+	NR_EXTENT_CACHES,
+};
+
 struct rb_entry {
 	struct rb_node rb_node;		/* rb node located in rb-tree */
 	union {
@@ -618,10 +641,24 @@ struct rb_entry {
 struct extent_info {
 	unsigned int fofs;		/* start offset in a file */
 	unsigned int len;		/* length of the extent */
-	u32 blk;			/* start block address of the extent */
+	union {
+		/* read extent_cache */
+		struct {
+			/* start block address of the extent */
+			block_t blk;
 #ifdef CONFIG_F2FS_FS_COMPRESSION
-	unsigned int c_len;		/* physical extent length of compressed blocks */
+			/* physical extent length of compressed blocks */
+			unsigned int c_len;
 #endif
+		};
+		/* block age extent_cache */
+		struct {
+			/* block age of the extent */
+			unsigned long long age;
+			/* last total blocks allocated */
+			unsigned long long last_blocks;
+		};
+	};
 };
 
 struct extent_node {
@@ -633,13 +670,25 @@ struct extent_node {
 
 struct extent_tree {
 	nid_t ino;			/* inode number */
+	enum extent_type type;		/* keep the extent tree type */
 	struct rb_root_cached root;	/* root of extent info rb-tree */
 	struct extent_node *cached_en;	/* recently accessed extent node */
-	struct extent_info largest;	/* largested extent info */
 	struct list_head list;		/* to be used by sbi->zombie_list */
 	rwlock_t lock;			/* protect extent info rb-tree */
 	atomic_t node_cnt;		/* # of extent node in rb-tree*/
 	bool largest_updated;		/* largest extent updated */
+	struct extent_info largest;	/* largest cached extent for EX_READ */
+};
+
+struct extent_tree_info {
+	struct radix_tree_root extent_tree_root;/* cache extent cache entries */
+	struct mutex extent_tree_lock;	/* locking extent radix tree */
+	struct list_head extent_list;		/* lru list for shrinker */
+	spinlock_t extent_lock;			/* locking extent lru list */
+	atomic_t total_ext_tree;		/* extent tree count */
+	struct list_head zombie_list;		/* extent zombie tree list */
+	atomic_t total_zombie_tree;		/* extent zombie tree count */
+	atomic_t total_ext_node;		/* extent info count */
 };
 
 /*
@@ -764,6 +813,8 @@ enum {
 	FI_COMPRESS_RELEASED,	/* compressed blocks were released */
 	FI_ALIGNED_WRITE,	/* enable aligned write */
 	FI_COW_FILE,		/* indicate COW file */
+	FI_ATOMIC_COMMITTED,	/* indicate atomic commit completed except disk sync */
+	FI_ATOMIC_REPLACE,	/* indicate atomic replace */
 	FI_MAX,			/* max flag, never be used */
 };
 
@@ -800,7 +851,8 @@ struct f2fs_inode_info {
 	struct list_head dirty_list;	/* dirty list for dirs and files */
 	struct list_head gdirty_list;	/* linked in global dirty list */
 	struct task_struct *atomic_write_task;	/* store atomic write task */
-	struct extent_tree *extent_tree;	/* cached extent_tree entry */
+	struct extent_tree *extent_tree[NR_EXTENT_CACHES];
+					/* cached extent_tree entry */
 	struct inode *cow_inode;	/* copy-on-write inode for atomic write */
 
 	/* avoid racing between foreground op and gc */
@@ -822,9 +874,10 @@ struct f2fs_inode_info {
 	unsigned int i_cluster_size;		/* cluster size */
 
 	unsigned int atomic_write_cnt;
+	loff_t original_i_size;		/* original i_size before atomic write */
 };
 
-static inline void get_extent_info(struct extent_info *ext,
+static inline void get_read_extent_info(struct extent_info *ext,
 					struct f2fs_extent *i_ext)
 {
 	ext->fofs = le32_to_cpu(i_ext->fofs);
@@ -832,7 +885,7 @@ static inline void get_extent_info(struct extent_info *ext,
 	ext->len = le32_to_cpu(i_ext->len);
 }
 
-static inline void set_raw_extent(struct extent_info *ext,
+static inline void set_raw_read_extent(struct extent_info *ext,
 					struct f2fs_extent *i_ext)
 {
 	i_ext->fofs = cpu_to_le32(ext->fofs);
@@ -840,17 +893,6 @@ static inline void set_raw_extent(struct extent_info *ext,
 	i_ext->len = cpu_to_le32(ext->len);
 }
 
-static inline void set_extent_info(struct extent_info *ei, unsigned int fofs,
-						u32 blk, unsigned int len)
-{
-	ei->fofs = fofs;
-	ei->blk = blk;
-	ei->len = len;
-#ifdef CONFIG_F2FS_FS_COMPRESSION
-	ei->c_len = 0;
-#endif
-}
-
 static inline bool __is_discard_mergeable(struct discard_info *back,
 			struct discard_info *front, unsigned int max_len)
 {
@@ -870,41 +912,6 @@ static inline bool __is_discard_front_mergeable(struct discard_info *cur,
 	return __is_discard_mergeable(cur, front, max_len);
 }
 
-static inline bool __is_extent_mergeable(struct extent_info *back,
-						struct extent_info *front)
-{
-#ifdef CONFIG_F2FS_FS_COMPRESSION
-	if (back->c_len && back->len != back->c_len)
-		return false;
-	if (front->c_len && front->len != front->c_len)
-		return false;
-#endif
-	return (back->fofs + back->len == front->fofs &&
-			back->blk + back->len == front->blk);
-}
-
-static inline bool __is_back_mergeable(struct extent_info *cur,
-						struct extent_info *back)
-{
-	return __is_extent_mergeable(back, cur);
-}
-
-static inline bool __is_front_mergeable(struct extent_info *cur,
-						struct extent_info *front)
-{
-	return __is_extent_mergeable(cur, front);
-}
-
-extern void f2fs_mark_inode_dirty_sync(struct inode *inode, bool sync);
-static inline void __try_update_largest_extent(struct extent_tree *et,
-						struct extent_node *en)
-{
-	if (en->ei.len > et->largest.len) {
-		et->largest = en->ei;
-		et->largest_updated = true;
-	}
-}
-
 /*
  * For free nid management
  */
@@ -1062,9 +1069,6 @@ struct f2fs_sm_info {
 	/* a threshold to reclaim prefree segments */
 	unsigned int rec_prefree_segments;
 
-	/* for batched trimming */
-	unsigned int trim_sections;		/* # of sections to trim */
-
 	struct list_head sit_entry_set;	/* sit entry set list */
 
 	unsigned int ipu_policy;	/* in-place-update policy */
@@ -1318,6 +1322,7 @@ enum {
 	MAX_TIME,
 };
 
+/* Note that this enum must be kept in sync with the gc_mode_names array */
 enum {
 	GC_NORMAL,
 	GC_IDLE_CB,
@@ -1668,14 +1673,12 @@ struct f2fs_sb_info {
 	struct mutex flush_lock;		/* for flush exclusion */
 
 	/* for extent tree cache */
-	struct radix_tree_root extent_tree_root;/* cache extent cache entries */
-	struct mutex extent_tree_lock;	/* locking extent radix tree */
-	struct list_head extent_list;		/* lru list for shrinker */
-	spinlock_t extent_lock;			/* locking extent lru list */
-	atomic_t total_ext_tree;		/* extent tree count */
-	struct list_head zombie_list;		/* extent zombie tree list */
-	atomic_t total_zombie_tree;		/* extent zombie tree count */
-	atomic_t total_ext_node;		/* extent info count */
+	struct extent_tree_info extent_tree[NR_EXTENT_CACHES];
+	atomic64_t allocated_data_blocks;	/* for block age extent_cache */
+
+	/* The thresholds used for hot and warm data separation */
+	unsigned int hot_data_age_threshold;
+	unsigned int warm_data_age_threshold;
 
 	/* basic filesystem units */
 	unsigned int log_sectors_per_block;	/* log2 sectors per block */
@@ -1693,7 +1696,7 @@ struct f2fs_sb_info {
 	unsigned int total_node_count;		/* total node block count */
 	unsigned int total_valid_node_count;	/* valid node block count */
 	int dir_level;				/* directory level */
-	int readdir_ra;				/* readahead inode in readdir */
+	bool readdir_ra;			/* readahead inode in readdir */
 	u64 max_io_bytes;			/* max io bytes to merge IOs */
 
 	block_t user_block_count;		/* # of user blocks */
@@ -1734,8 +1737,9 @@ struct f2fs_sb_info {
 	unsigned int cur_victim_sec;		/* current victim section num */
 	unsigned int gc_mode;			/* current GC state */
 	unsigned int next_victim_seg[2];	/* next segment in victim section */
-	spinlock_t gc_urgent_high_lock;
-	unsigned int gc_urgent_high_remaining;	/* remaining trial count for GC_URGENT_HIGH */
+	spinlock_t gc_remaining_trials_lock;
+	/* remaining trial count for GC_URGENT_* and GC_IDLE_* */
+	unsigned int gc_remaining_trials;
 
 	/* for skip statistic */
 	unsigned long long skipped_gc_rwsem;		/* FG_GC only */
@@ -1759,10 +1763,14 @@ struct f2fs_sb_info {
 	unsigned int segment_count[2];		/* # of allocated segments */
 	unsigned int block_count[2];		/* # of allocated blocks */
 	atomic_t inplace_count;		/* # of inplace update */
-	atomic64_t total_hit_ext;		/* # of lookup extent cache */
-	atomic64_t read_hit_rbtree;		/* # of hit rbtree extent node */
-	atomic64_t read_hit_largest;		/* # of hit largest extent node */
-	atomic64_t read_hit_cached;		/* # of hit cached extent node */
+	/* # of lookup extent cache */
+	atomic64_t total_hit_ext[NR_EXTENT_CACHES];
+	/* # of hit rbtree extent node */
+	atomic64_t read_hit_rbtree[NR_EXTENT_CACHES];
+	/* # of hit cached extent node */
+	atomic64_t read_hit_cached[NR_EXTENT_CACHES];
+	/* # of hit largest extent node in read extent cache */
+	atomic64_t read_hit_largest;
 	atomic_t inline_xattr;			/* # of inline_xattr inodes */
 	atomic_t inline_inode;			/* # of inline_data inodes */
 	atomic_t inline_dir;			/* # of inline_dentry inodes */
@@ -2576,6 +2584,7 @@ static inline block_t __start_sum_addr(struct f2fs_sb_info *sbi)
 	return le32_to_cpu(F2FS_CKPT(sbi)->cp_pack_start_sum);
 }
 
+extern void f2fs_mark_inode_dirty_sync(struct inode *inode, bool sync);
 static inline int inc_valid_node_count(struct f2fs_sb_info *sbi,
 					struct inode *inode, bool is_inode)
 {
@@ -3072,6 +3081,8 @@ static inline void f2fs_i_blocks_write(struct inode *inode,
 		set_inode_flag(inode, FI_AUTO_RECOVER);
 }
 
+static inline bool f2fs_is_atomic_file(struct inode *inode);
+
 static inline void f2fs_i_size_write(struct inode *inode, loff_t i_size)
 {
 	bool clean = !is_inode_flag_set(inode, FI_DIRTY_INODE);
@@ -3081,6 +3092,10 @@ static inline void f2fs_i_size_write(struct inode *inode, loff_t i_size)
 		return;
 
 	i_size_write(inode, i_size);
+
+	if (f2fs_is_atomic_file(inode))
+		return;
+
 	f2fs_mark_inode_dirty_sync(inode, true);
 	if (clean || recover)
 		set_inode_flag(inode, FI_AUTO_RECOVER);
@@ -3796,8 +3811,9 @@ int f2fs_reserve_new_block(struct dnode_of_data *dn);
 int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index);
 int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index);
 struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index,
-			blk_opf_t op_flags, bool for_write);
-struct page *f2fs_find_data_page(struct inode *inode, pgoff_t index);
+			blk_opf_t op_flags, bool for_write, pgoff_t *next_pgofs);
+struct page *f2fs_find_data_page(struct inode *inode, pgoff_t index,
+							pgoff_t *next_pgofs);
 struct page *f2fs_get_lock_data_page(struct inode *inode, pgoff_t index,
 			bool for_write);
 struct page *f2fs_get_new_data_page(struct inode *inode,
@@ -3856,9 +3872,19 @@ struct f2fs_stat_info {
 	struct f2fs_sb_info *sbi;
 	int all_area_segs, sit_area_segs, nat_area_segs, ssa_area_segs;
 	int main_area_segs, main_area_sections, main_area_zones;
-	unsigned long long hit_largest, hit_cached, hit_rbtree;
-	unsigned long long hit_total, total_ext;
-	int ext_tree, zombie_tree, ext_node;
+	unsigned long long hit_cached[NR_EXTENT_CACHES];
+	unsigned long long hit_rbtree[NR_EXTENT_CACHES];
+	unsigned long long total_ext[NR_EXTENT_CACHES];
+	unsigned long long hit_total[NR_EXTENT_CACHES];
+	int ext_tree[NR_EXTENT_CACHES];
+	int zombie_tree[NR_EXTENT_CACHES];
+	int ext_node[NR_EXTENT_CACHES];
+	/* to count memory footprint */
+	unsigned long long ext_mem[NR_EXTENT_CACHES];
+	/* for read extent cache */
+	unsigned long long hit_largest;
+	/* for block age extent cache */
+	unsigned long long allocated_data_blocks;
 	int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
 	int ndirty_data, ndirty_qdata;
 	unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
@@ -3917,10 +3943,10 @@ static inline struct f2fs_stat_info *F2FS_STAT(struct f2fs_sb_info *sbi)
 #define stat_other_skip_bggc_count(sbi)	((sbi)->other_skip_bggc++)
 #define stat_inc_dirty_inode(sbi, type)	((sbi)->ndirty_inode[type]++)
 #define stat_dec_dirty_inode(sbi, type)	((sbi)->ndirty_inode[type]--)
-#define stat_inc_total_hit(sbi)		(atomic64_inc(&(sbi)->total_hit_ext))
-#define stat_inc_rbtree_node_hit(sbi)	(atomic64_inc(&(sbi)->read_hit_rbtree))
+#define stat_inc_total_hit(sbi, type)		(atomic64_inc(&(sbi)->total_hit_ext[type]))
+#define stat_inc_rbtree_node_hit(sbi, type)	(atomic64_inc(&(sbi)->read_hit_rbtree[type]))
 #define stat_inc_largest_node_hit(sbi)	(atomic64_inc(&(sbi)->read_hit_largest))
-#define stat_inc_cached_node_hit(sbi)	(atomic64_inc(&(sbi)->read_hit_cached))
+#define stat_inc_cached_node_hit(sbi, type)	(atomic64_inc(&(sbi)->read_hit_cached[type]))
 #define stat_inc_inline_xattr(inode)					\
 	do {								\
 		if (f2fs_has_inline_xattr(inode))			\
@@ -4043,10 +4069,10 @@ void f2fs_update_sit_info(struct f2fs_sb_info *sbi);
 #define stat_other_skip_bggc_count(sbi)			do { } while (0)
 #define stat_inc_dirty_inode(sbi, type)			do { } while (0)
 #define stat_dec_dirty_inode(sbi, type)			do { } while (0)
-#define stat_inc_total_hit(sbi)				do { } while (0)
-#define stat_inc_rbtree_node_hit(sbi)			do { } while (0)
+#define stat_inc_total_hit(sbi, type)			do { } while (0)
+#define stat_inc_rbtree_node_hit(sbi, type)		do { } while (0)
 #define stat_inc_largest_node_hit(sbi)			do { } while (0)
-#define stat_inc_cached_node_hit(sbi)			do { } while (0)
+#define stat_inc_cached_node_hit(sbi, type)		do { } while (0)
 #define stat_inc_inline_xattr(inode)			do { } while (0)
 #define stat_dec_inline_xattr(inode)			do { } while (0)
 #define stat_inc_inline_inode(inode)			do { } while (0)
@@ -4152,20 +4178,34 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
 		bool force, bool *leftmost);
 bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
 				struct rb_root_cached *root, bool check_key);
-unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
-void f2fs_init_extent_tree(struct inode *inode, struct page *ipage);
+void f2fs_init_extent_tree(struct inode *inode);
 void f2fs_drop_extent_tree(struct inode *inode);
-unsigned int f2fs_destroy_extent_node(struct inode *inode);
+void f2fs_destroy_extent_node(struct inode *inode);
 void f2fs_destroy_extent_tree(struct inode *inode);
-bool f2fs_lookup_extent_cache(struct inode *inode, pgoff_t pgofs,
-			struct extent_info *ei);
-void f2fs_update_extent_cache(struct dnode_of_data *dn);
-void f2fs_update_extent_cache_range(struct dnode_of_data *dn,
-			pgoff_t fofs, block_t blkaddr, unsigned int len);
 void f2fs_init_extent_cache_info(struct f2fs_sb_info *sbi);
 int __init f2fs_create_extent_cache(void);
 void f2fs_destroy_extent_cache(void);
 
+/* read extent cache ops */
+void f2fs_init_read_extent_tree(struct inode *inode, struct page *ipage);
+bool f2fs_lookup_read_extent_cache(struct inode *inode, pgoff_t pgofs,
+			struct extent_info *ei);
+void f2fs_update_read_extent_cache(struct dnode_of_data *dn);
+void f2fs_update_read_extent_cache_range(struct dnode_of_data *dn,
+			pgoff_t fofs, block_t blkaddr, unsigned int len);
+unsigned int f2fs_shrink_read_extent_tree(struct f2fs_sb_info *sbi,
+			int nr_shrink);
+
+/* block age extent cache ops */
+void f2fs_init_age_extent_tree(struct inode *inode);
+bool f2fs_lookup_age_extent_cache(struct inode *inode, pgoff_t pgofs,
+			struct extent_info *ei);
+void f2fs_update_age_extent_cache(struct dnode_of_data *dn);
+void f2fs_update_age_extent_cache_range(struct dnode_of_data *dn,
+			pgoff_t fofs, unsigned int len);
+unsigned int f2fs_shrink_age_extent_tree(struct f2fs_sb_info *sbi,
+			int nr_shrink);
+
 /*
  * sysfs.c
  */
@@ -4235,9 +4275,9 @@ int f2fs_write_multi_pages(struct compress_ctx *cc,
 						struct writeback_control *wbc,
 						enum iostat_type io_type);
 int f2fs_is_compressed_cluster(struct inode *inode, pgoff_t index);
-void f2fs_update_extent_tree_range_compressed(struct inode *inode,
-				pgoff_t fofs, block_t blkaddr, unsigned int llen,
-				unsigned int c_len);
+void f2fs_update_read_extent_tree_range_compressed(struct inode *inode,
+				pgoff_t fofs, block_t blkaddr,
+				unsigned int llen, unsigned int c_len);
 int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
 				unsigned nr_pages, sector_t *last_block_in_bio,
 				bool is_readahead, bool for_write);
@@ -4318,9 +4358,10 @@ static inline bool f2fs_load_compressed_page(struct f2fs_sb_info *sbi,
 static inline void f2fs_invalidate_compress_pages(struct f2fs_sb_info *sbi,
 							nid_t ino) { }
 #define inc_compr_inode_stat(inode)		do { } while (0)
-static inline void f2fs_update_extent_tree_range_compressed(struct inode *inode,
-				pgoff_t fofs, block_t blkaddr, unsigned int llen,
-				unsigned int c_len) { }
+static inline void f2fs_update_read_extent_tree_range_compressed(
+				struct inode *inode,
+				pgoff_t fofs, block_t blkaddr,
+				unsigned int llen, unsigned int c_len) { }
 #endif
 
 static inline int set_compress_context(struct inode *inode)
@@ -4371,7 +4412,7 @@ static inline bool f2fs_disable_compressed_file(struct inode *inode)
 }
 
 #define F2FS_FEATURE_FUNCS(name, flagname) \
-static inline int f2fs_sb_has_##name(struct f2fs_sb_info *sbi) \
+static inline bool f2fs_sb_has_##name(struct f2fs_sb_info *sbi) \
 { \
 	return F2FS_HAS_FEATURE(sbi, F2FS_FEATURE_##flagname); \
 }
@@ -4391,26 +4432,6 @@ F2FS_FEATURE_FUNCS(casefold, CASEFOLD);
 F2FS_FEATURE_FUNCS(compression, COMPRESSION);
 F2FS_FEATURE_FUNCS(readonly, RO);
 
-static inline bool f2fs_may_extent_tree(struct inode *inode)
-{
-	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-
-	if (!test_opt(sbi, EXTENT_CACHE) ||
-			is_inode_flag_set(inode, FI_NO_EXTENT) ||
-			(is_inode_flag_set(inode, FI_COMPRESSED_FILE) &&
-			 !f2fs_sb_has_readonly(sbi)))
-		return false;
-
-	/*
-	 * for recovered files during mount do not create extents
-	 * if shrinker is not registered.
-	 */
-	if (list_empty(&sbi->s_list))
-		return false;
-
-	return S_ISREG(inode->i_mode);
-}
-
 #ifdef CONFIG_BLK_DEV_ZONED
 static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi,
 				    block_t blkaddr)
@@ -4563,6 +4584,11 @@ static inline void f2fs_handle_page_eio(struct f2fs_sb_info *sbi, pgoff_t ofs,
 	}
 }
 
+static inline bool f2fs_is_readonly(struct f2fs_sb_info *sbi)
+{
+	return f2fs_sb_has_readonly(sbi) || f2fs_readonly(sbi->sb);
+}
+
 #define EFSBADCRC	EBADMSG		/* Bad CRC detected */
 #define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
 
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index f96bbfa..d8d8773 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -571,7 +571,7 @@ void f2fs_truncate_data_blocks_range(struct dnode_of_data *dn, int count)
 	raw_node = F2FS_NODE(dn->node_page);
 	addr = blkaddr_in_node(raw_node) + base + ofs;
 
-	/* Assumption: truncateion starts with cluster */
+	/* Assumption: truncation starts with cluster */
 	for (; count > 0; count--, addr++, dn->ofs_in_node++, cluster_index++) {
 		block_t blkaddr = le32_to_cpu(*addr);
 
@@ -618,7 +618,8 @@ void f2fs_truncate_data_blocks_range(struct dnode_of_data *dn, int count)
 		 */
 		fofs = f2fs_start_bidx_of_node(ofs_of_node(dn->node_page),
 							dn->inode) + ofs;
-		f2fs_update_extent_cache_range(dn, fofs, 0, len);
+		f2fs_update_read_extent_cache_range(dn, fofs, 0, len);
+		f2fs_update_age_extent_cache_range(dn, fofs, nr_free);
 		dec_valid_block_count(sbi, dn->inode, nr_free);
 	}
 	dn->ofs_in_node = ofs;
@@ -1496,7 +1497,7 @@ static int f2fs_do_zero_range(struct dnode_of_data *dn, pgoff_t start,
 		f2fs_set_data_blkaddr(dn);
 	}
 
-	f2fs_update_extent_cache_range(dn, start, 0, index - start);
+	f2fs_update_read_extent_cache_range(dn, start, 0, index - start);
 
 	return ret;
 }
@@ -2034,13 +2035,14 @@ static int f2fs_ioc_getversion(struct file *filp, unsigned long arg)
 	return put_user(inode->i_generation, (int __user *)arg);
 }
 
-static int f2fs_ioc_start_atomic_write(struct file *filp)
+static int f2fs_ioc_start_atomic_write(struct file *filp, bool truncate)
 {
 	struct inode *inode = file_inode(filp);
 	struct user_namespace *mnt_userns = file_mnt_user_ns(filp);
 	struct f2fs_inode_info *fi = F2FS_I(inode);
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	struct inode *pinode;
+	loff_t isize;
 	int ret;
 
 	if (!inode_owner_or_capable(mnt_userns, inode))
@@ -2099,13 +2101,25 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
 		f2fs_up_write(&fi->i_gc_rwsem[WRITE]);
 		goto out;
 	}
-	f2fs_i_size_write(fi->cow_inode, i_size_read(inode));
+
+	f2fs_write_inode(inode, NULL);
 
 	stat_inc_atomic_inode(inode);
 
 	set_inode_flag(inode, FI_ATOMIC_FILE);
 	set_inode_flag(fi->cow_inode, FI_COW_FILE);
 	clear_inode_flag(fi->cow_inode, FI_INLINE_DATA);
+
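+	/* remember i_size so an aborted atomic write can restore it */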
+	isize = i_size_read(inode);
+	fi->original_i_size = isize;
+	if (truncate) {
+		set_inode_flag(inode, FI_ATOMIC_REPLACE);
+		truncate_inode_pages_final(inode->i_mapping);
+		f2fs_i_size_write(inode, 0);
+		isize = 0;
+	}
+	f2fs_i_size_write(fi->cow_inode, isize);
+
 	f2fs_up_write(&fi->i_gc_rwsem[WRITE]);
 
 	f2fs_update_time(sbi, REQ_TIME);
@@ -2137,16 +2151,14 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp)
 
 	if (f2fs_is_atomic_file(inode)) {
 		ret = f2fs_commit_atomic_write(inode);
-		if (ret)
-			goto unlock_out;
-
-		ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
 		if (!ret)
-			f2fs_abort_atomic_write(inode, false);
+			ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
+
+		f2fs_abort_atomic_write(inode, ret);
 	} else {
 		ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 1, false);
 	}
-unlock_out:
+
 	inode_unlock(inode);
 	mnt_drop_write_file(filp);
 	return ret;
@@ -2547,7 +2559,7 @@ static int f2fs_defragment_range(struct f2fs_sb_info *sbi,
 	struct f2fs_map_blocks map = { .m_next_extent = NULL,
 					.m_seg_type = NO_CHECK_TYPE,
 					.m_may_create = false };
-	struct extent_info ei = {0, 0, 0};
+	struct extent_info ei = {};
 	pgoff_t pg_start, pg_end, next_pgofs;
 	unsigned int blk_per_seg = sbi->blocks_per_seg;
 	unsigned int total = 0, sec_num;
@@ -2579,7 +2591,7 @@ static int f2fs_defragment_range(struct f2fs_sb_info *sbi,
 	 * lookup mapping info in extent cache, skip defragmenting if physical
 	 * block addresses are continuous.
 	 */
-	if (f2fs_lookup_extent_cache(inode, pg_start, &ei)) {
+	if (f2fs_lookup_read_extent_cache(inode, pg_start, &ei)) {
 		if (ei.fofs + ei.len >= pg_end)
 			goto out;
 	}
@@ -4135,7 +4147,9 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	case FS_IOC_GETVERSION:
 		return f2fs_ioc_getversion(filp, arg);
 	case F2FS_IOC_START_ATOMIC_WRITE:
-		return f2fs_ioc_start_atomic_write(filp);
+		return f2fs_ioc_start_atomic_write(filp, false);
+	case F2FS_IOC_START_ATOMIC_REPLACE:
+		return f2fs_ioc_start_atomic_write(filp, true);
 	case F2FS_IOC_COMMIT_ATOMIC_WRITE:
 		return f2fs_ioc_commit_atomic_write(filp);
 	case F2FS_IOC_ABORT_ATOMIC_WRITE:
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index ee68364..d7a9d84 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -141,6 +141,10 @@ static int gc_thread_func(void *data)
 			/* don't bother wait_ms by foreground gc */
 			if (!foreground)
 				wait_ms = gc_th->no_gc_sleep_time;
+		} else {
+			/* reset wait_ms to default sleep time */
+			if (wait_ms == gc_th->no_gc_sleep_time)
+				wait_ms = gc_th->min_sleep_time;
 		}
 
 		if (foreground)
@@ -152,14 +156,14 @@ static int gc_thread_func(void *data)
 		/* balancing f2fs's metadata periodically */
 		f2fs_balance_fs_bg(sbi, true);
 next:
-		if (sbi->gc_mode == GC_URGENT_HIGH) {
-			spin_lock(&sbi->gc_urgent_high_lock);
-			if (sbi->gc_urgent_high_remaining) {
-				sbi->gc_urgent_high_remaining--;
-				if (!sbi->gc_urgent_high_remaining)
+		if (sbi->gc_mode != GC_NORMAL) {
+			spin_lock(&sbi->gc_remaining_trials_lock);
+			if (sbi->gc_remaining_trials) {
+				sbi->gc_remaining_trials--;
+				if (!sbi->gc_remaining_trials)
 					sbi->gc_mode = GC_NORMAL;
 			}
-			spin_unlock(&sbi->gc_urgent_high_lock);
+			spin_unlock(&sbi->gc_remaining_trials_lock);
 		}
 		sb_end_write(sbi->sb);
 
@@ -171,13 +175,10 @@ int f2fs_start_gc_thread(struct f2fs_sb_info *sbi)
 {
 	struct f2fs_gc_kthread *gc_th;
 	dev_t dev = sbi->sb->s_bdev->bd_dev;
-	int err = 0;
 
 	gc_th = f2fs_kmalloc(sbi, sizeof(struct f2fs_gc_kthread), GFP_KERNEL);
-	if (!gc_th) {
-		err = -ENOMEM;
-		goto out;
-	}
+	if (!gc_th)
+		return -ENOMEM;
 
 	gc_th->urgent_sleep_time = DEF_GC_THREAD_URGENT_SLEEP_TIME;
 	gc_th->min_sleep_time = DEF_GC_THREAD_MIN_SLEEP_TIME;
@@ -192,12 +193,14 @@ int f2fs_start_gc_thread(struct f2fs_sb_info *sbi)
 	sbi->gc_thread->f2fs_gc_task = kthread_run(gc_thread_func, sbi,
 			"f2fs_gc-%u:%u", MAJOR(dev), MINOR(dev));
 	if (IS_ERR(gc_th->f2fs_gc_task)) {
-		err = PTR_ERR(gc_th->f2fs_gc_task);
+		int err = PTR_ERR(gc_th->f2fs_gc_task);
+
 		kfree(gc_th);
 		sbi->gc_thread = NULL;
+		return err;
 	}
-out:
-	return err;
+
+	return 0;
 }
 
 void f2fs_stop_gc_thread(struct f2fs_sb_info *sbi)
@@ -1147,7 +1150,7 @@ static int ra_data_block(struct inode *inode, pgoff_t index)
 	struct address_space *mapping = inode->i_mapping;
 	struct dnode_of_data dn;
 	struct page *page;
-	struct extent_info ei = {0, 0, 0};
+	struct extent_info ei = {0, };
 	struct f2fs_io_info fio = {
 		.sbi = sbi,
 		.ino = inode->i_ino,
@@ -1165,7 +1168,7 @@ static int ra_data_block(struct inode *inode, pgoff_t index)
 	if (!page)
 		return -ENOMEM;
 
-	if (f2fs_lookup_extent_cache(inode, index, &ei)) {
+	if (f2fs_lookup_read_extent_cache(inode, index, &ei)) {
 		dn.data_blkaddr = ei.blk + index - ei.fofs;
 		if (unlikely(!f2fs_is_valid_blkaddr(sbi, dn.data_blkaddr,
 						DATA_GENERIC_ENHANCE_READ))) {
@@ -1569,8 +1572,8 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
 				continue;
 			}
 
-			data_page = f2fs_get_read_data_page(inode,
-						start_bidx, REQ_RAHEAD, true);
+			data_page = f2fs_get_read_data_page(inode, start_bidx,
+							REQ_RAHEAD, true, NULL);
 			f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
 			if (IS_ERR(data_page)) {
 				iput(inode);
@@ -1905,9 +1908,7 @@ int __init f2fs_create_garbage_collection_cache(void)
 {
 	victim_entry_slab = f2fs_kmem_cache_create("f2fs_victim_entry",
 					sizeof(struct victim_entry));
-	if (!victim_entry_slab)
-		return -ENOMEM;
-	return 0;
+	return victim_entry_slab ? 0 : -ENOMEM;
 }
 
 void f2fs_destroy_garbage_collection_cache(void)
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 9f0d386..ff6cf66 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -262,8 +262,8 @@ static bool sanity_check_inode(struct inode *inode, struct page *node_page)
 		return false;
 	}
 
-	if (fi->extent_tree) {
-		struct extent_info *ei = &fi->extent_tree->largest;
+	if (fi->extent_tree[EX_READ]) {
+		struct extent_info *ei = &fi->extent_tree[EX_READ]->largest;
 
 		if (ei->len &&
 			(!f2fs_is_valid_blkaddr(sbi, ei->blk,
@@ -392,8 +392,6 @@ static int do_read_inode(struct inode *inode)
 	fi->i_pino = le32_to_cpu(ri->i_pino);
 	fi->i_dir_level = ri->i_dir_level;
 
-	f2fs_init_extent_tree(inode, node_page);
-
 	get_inline_info(inode, ri);
 
 	fi->i_extra_isize = f2fs_has_extra_attr(inode) ?
@@ -479,6 +477,11 @@ static int do_read_inode(struct inode *inode)
 	}
 
 	init_idisk_time(inode);
+
+	/* extent tree init below needs all the inode flag bits */
+	f2fs_init_read_extent_tree(inode, node_page);
+	f2fs_init_age_extent_tree(inode);
+
 	f2fs_put_page(node_page, 1);
 
 	stat_inc_inline_xattr(inode);
@@ -607,7 +610,7 @@ struct inode *f2fs_iget_retry(struct super_block *sb, unsigned long ino)
 void f2fs_update_inode(struct inode *inode, struct page *node_page)
 {
 	struct f2fs_inode *ri;
-	struct extent_tree *et = F2FS_I(inode)->extent_tree;
+	struct extent_tree *et = F2FS_I(inode)->extent_tree[EX_READ];
 
 	f2fs_wait_on_page_writeback(node_page, NODE, true, true);
 	set_page_dirty(node_page);
@@ -621,12 +624,15 @@ void f2fs_update_inode(struct inode *inode, struct page *node_page)
 	ri->i_uid = cpu_to_le32(i_uid_read(inode));
 	ri->i_gid = cpu_to_le32(i_gid_read(inode));
 	ri->i_links = cpu_to_le32(inode->i_nlink);
-	ri->i_size = cpu_to_le64(i_size_read(inode));
 	ri->i_blocks = cpu_to_le64(SECTOR_TO_BLOCK(inode->i_blocks) + 1);
 
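+	/* keep the on-disk i_size stable while an atomic write is in flight */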
+	if (!f2fs_is_atomic_file(inode) ||
+			is_inode_flag_set(inode, FI_ATOMIC_COMMITTED))
+		ri->i_size = cpu_to_le64(i_size_read(inode));
+
 	if (et) {
 		read_lock(&et->lock);
-		set_raw_extent(&et->largest, &ri->i_ext);
+		set_raw_read_extent(&et->largest, &ri->i_ext);
 		read_unlock(&et->lock);
 	} else {
 		memset(&ri->i_ext, 0, sizeof(ri->i_ext));
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index b6c14c9..46de782 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -176,6 +176,32 @@ static void set_compress_new_inode(struct f2fs_sb_info *sbi, struct inode *dir,
 	}
 }
 
+/*
+ * Set file's temperature for hot/cold data separation
+ */
+static void set_file_temperature(struct f2fs_sb_info *sbi, struct inode *inode,
+		const unsigned char *name)
+{
+	__u8 (*extlist)[F2FS_EXTENSION_LEN] = sbi->raw_super->extension_list;
+	int i, cold_count, hot_count;
+
+	f2fs_down_read(&sbi->sb_lock);
+	cold_count = le32_to_cpu(sbi->raw_super->extension_count);
+	hot_count = sbi->raw_super->hot_ext_count;
+	for (i = 0; i < cold_count + hot_count; i++)
+		if (is_extension_exist(name, extlist[i], true))
+			break;
+	f2fs_up_read(&sbi->sb_lock);
+
+	if (i == cold_count + hot_count)
+		return;
+
+	if (i < cold_count)
+		file_set_cold(inode);
+	else
+		file_set_hot(inode);
+}
+
 static struct inode *f2fs_new_inode(struct user_namespace *mnt_userns,
 						struct inode *dir, umode_t mode,
 						const char *name)
@@ -258,8 +284,6 @@ static struct inode *f2fs_new_inode(struct user_namespace *mnt_userns,
 	}
 	F2FS_I(inode)->i_inline_xattr_size = xattr_size;
 
-	f2fs_init_extent_tree(inode, NULL);
-
 	F2FS_I(inode)->i_flags =
 		f2fs_mask_flags(mode, F2FS_I(dir)->i_flags & F2FS_FL_INHERITED);
 
@@ -276,12 +300,17 @@ static struct inode *f2fs_new_inode(struct user_namespace *mnt_userns,
 	if (test_opt(sbi, INLINE_DATA) && f2fs_may_inline_data(inode))
 		set_inode_flag(inode, FI_INLINE_DATA);
 
+	if (name && !test_opt(sbi, DISABLE_EXT_IDENTIFY))
+		set_file_temperature(sbi, inode, name);
+
 	stat_inc_inline_xattr(inode);
 	stat_inc_inline_inode(inode);
 	stat_inc_inline_dir(inode);
 
 	f2fs_set_inode_flags(inode);
 
+	f2fs_init_extent_tree(inode);
+
 	trace_f2fs_new_inode(inode, 0);
 	return inode;
 
@@ -304,36 +333,6 @@ static struct inode *f2fs_new_inode(struct user_namespace *mnt_userns,
 	return ERR_PTR(err);
 }
 
-/*
- * Set file's temperature for hot/cold data separation
- */
-static inline void set_file_temperature(struct f2fs_sb_info *sbi, struct inode *inode,
-		const unsigned char *name)
-{
-	__u8 (*extlist)[F2FS_EXTENSION_LEN] = sbi->raw_super->extension_list;
-	int i, cold_count, hot_count;
-
-	f2fs_down_read(&sbi->sb_lock);
-
-	cold_count = le32_to_cpu(sbi->raw_super->extension_count);
-	hot_count = sbi->raw_super->hot_ext_count;
-
-	for (i = 0; i < cold_count + hot_count; i++) {
-		if (is_extension_exist(name, extlist[i], true))
-			break;
-	}
-
-	f2fs_up_read(&sbi->sb_lock);
-
-	if (i == cold_count + hot_count)
-		return;
-
-	if (i < cold_count)
-		file_set_cold(inode);
-	else
-		file_set_hot(inode);
-}
-
 static int f2fs_create(struct user_namespace *mnt_userns, struct inode *dir,
 		       struct dentry *dentry, umode_t mode, bool excl)
 {
@@ -355,9 +354,6 @@ static int f2fs_create(struct user_namespace *mnt_userns, struct inode *dir,
 	if (IS_ERR(inode))
 		return PTR_ERR(inode);
 
-	if (!test_opt(sbi, DISABLE_EXT_IDENTIFY))
-		set_file_temperature(sbi, inode, dentry->d_name.name);
-
 	inode->i_op = &f2fs_file_inode_operations;
 	inode->i_fop = &f2fs_file_operations;
 	inode->i_mapping->a_ops = &f2fs_dblock_aops;
@@ -629,6 +625,8 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry)
 		goto fail;
 	}
 	f2fs_delete_entry(de, page, dir, inode);
+	f2fs_unlock_op(sbi);
+
 #if IS_ENABLED(CONFIG_UNICODE)
 	/* VFS negative dentries are incompatible with Encoding and
 	 * Case-insensitiveness. Eventually we'll want avoid
@@ -639,8 +637,6 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry)
 	if (IS_CASEFOLDED(dir))
 		d_invalidate(dentry);
 #endif
-	f2fs_unlock_op(sbi);
-
 	if (IS_DIRSYNC(dir))
 		f2fs_sync_fs(sbi->sb, 1);
 fail:
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index b9ee5a11..dde4c04 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -60,7 +60,7 @@ bool f2fs_available_free_memory(struct f2fs_sb_info *sbi, int type)
 	avail_ram = val.totalram - val.totalhigh;
 
 	/*
-	 * give 25%, 25%, 50%, 50%, 50% memory for each components respectively
+	 * give 25%, 25%, 50%, 50%, 25%, 25% memory to each component, respectively
 	 */
 	if (type == FREE_NIDS) {
 		mem_size = (nm_i->nid_cnt[FREE_NID] *
@@ -85,12 +85,16 @@ bool f2fs_available_free_memory(struct f2fs_sb_info *sbi, int type)
 						sizeof(struct ino_entry);
 		mem_size >>= PAGE_SHIFT;
 		res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 1);
-	} else if (type == EXTENT_CACHE) {
-		mem_size = (atomic_read(&sbi->total_ext_tree) *
+	} else if (type == READ_EXTENT_CACHE || type == AGE_EXTENT_CACHE) {
+		enum extent_type etype = type == READ_EXTENT_CACHE ?
+						EX_READ : EX_BLOCK_AGE;
+		struct extent_tree_info *eti = &sbi->extent_tree[etype];
+
+		mem_size = (atomic_read(&eti->total_ext_tree) *
 				sizeof(struct extent_tree) +
-				atomic_read(&sbi->total_ext_node) *
+				atomic_read(&eti->total_ext_node) *
 				sizeof(struct extent_node)) >> PAGE_SHIFT;
-		res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 1);
+		res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 2);
 	} else if (type == DISCARD_CACHE) {
 		mem_size = (atomic_read(&dcc->discard_cmd_cnt) *
 				sizeof(struct discard_cmd)) >> PAGE_SHIFT;
@@ -859,7 +863,7 @@ int f2fs_get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode)
 			blkaddr = data_blkaddr(dn->inode, dn->node_page,
 						dn->ofs_in_node + 1);
 
-		f2fs_update_extent_tree_range_compressed(dn->inode,
+		f2fs_update_read_extent_tree_range_compressed(dn->inode,
 					index, blkaddr,
 					F2FS_I(dn->inode)->i_cluster_size,
 					c_len);
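
The f2fs_available_free_memory() hunk above does two things: it splits the old EXTENT_CACHE accounting into READ_EXTENT_CACHE and AGE_EXTENT_CACHE, and it tightens each cache's allowance from half to a quarter of the ram_thresh budget (the >> 1 becoming >> 2). A minimal model of that check, with made-up numbers:

/*
 * With two extent caches (read + block age) each one is now allowed a
 * quarter of the ram_thresh budget instead of the half the single
 * cache used to get. Names and units are illustrative only.
 */
#include <stdbool.h>
#include <stdio.h>

static bool cache_fits(unsigned long cache_pages,
                       unsigned long avail_ram_pages,
                       unsigned int ram_thresh /* percent */,
                       unsigned int shift /* 1 = half, 2 = quarter */)
{
        return cache_pages < ((avail_ram_pages * ram_thresh / 100) >> shift);
}

int main(void)
{
        unsigned long avail = 1 << 20;  /* 4 GiB worth of 4 KiB pages */

        /* old single-cache check: half of the 10% budget -> passes */
        printf("%d\n", cache_fits(50000, avail, 10, 1));
        /* new per-cache check: quarter of the same budget -> fails */
        printf("%d\n", cache_fits(50000, avail, 10, 2));
        return 0;
}

With the same 50000-page cache and a 10% threshold over one million pages, the old half-budget check passes while the new quarter-budget check fails, which is exactly the tightening the comment at the top of the function now describes.
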
diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h
index 3c09cae..99454d4 100644
--- a/fs/f2fs/node.h
+++ b/fs/f2fs/node.h
@@ -146,7 +146,8 @@ enum mem_type {
 	NAT_ENTRIES,	/* indicates the cached nat entry */
 	DIRTY_DENTS,	/* indicates dirty dentry pages */
 	INO_ENTRIES,	/* indicates inode entries */
-	EXTENT_CACHE,	/* indicates extent cache */
+	READ_EXTENT_CACHE,	/* indicates read extent cache */
+	AGE_EXTENT_CACHE,	/* indicates age extent cache */
 	DISCARD_CACHE,	/* indicates memory of cached discard cmds */
 	COMPRESS_PAGE,	/* indicates memory of cached compressed pages */
 	BASE_CHECK,	/* check kernel status */
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index dea95b4..77fd453 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -923,9 +923,7 @@ int __init f2fs_create_recovery_cache(void)
 {
 	fsync_entry_slab = f2fs_kmem_cache_create("f2fs_fsync_inode_entry",
 					sizeof(struct fsync_inode_entry));
-	if (!fsync_entry_slab)
-		return -ENOMEM;
-	return 0;
+	return fsync_entry_slab ? 0 : -ENOMEM;
 }
 
 void f2fs_destroy_recovery_cache(void)
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index c1d0713..0f457f5 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -192,14 +192,19 @@ void f2fs_abort_atomic_write(struct inode *inode, bool clean)
 	if (!f2fs_is_atomic_file(inode))
 		return;
 
-	if (clean)
-		truncate_inode_pages_final(inode->i_mapping);
 	clear_inode_flag(fi->cow_inode, FI_COW_FILE);
 	iput(fi->cow_inode);
 	fi->cow_inode = NULL;
 	release_atomic_write_cnt(inode);
+	clear_inode_flag(inode, FI_ATOMIC_COMMITTED);
+	clear_inode_flag(inode, FI_ATOMIC_REPLACE);
 	clear_inode_flag(inode, FI_ATOMIC_FILE);
 	stat_dec_atomic_inode(inode);
+
+	if (clean) {
+		truncate_inode_pages_final(inode->i_mapping);
+		f2fs_i_size_write(inode, fi->original_i_size);
+	}
 }
 
 static int __replace_atomic_write_block(struct inode *inode, pgoff_t index,
@@ -257,14 +262,19 @@ static void __complete_revoke_list(struct inode *inode, struct list_head *head,
 					bool revoke)
 {
 	struct revoke_entry *cur, *tmp;
+	bool truncate = is_inode_flag_set(inode, FI_ATOMIC_REPLACE);
 
 	list_for_each_entry_safe(cur, tmp, head, list) {
 		if (revoke)
 			__replace_atomic_write_block(inode, cur->index,
 						cur->old_addr, NULL, true);
+
 		list_del(&cur->list);
 		kmem_cache_free(revoke_entry_slab, cur);
 	}
+
+	if (!revoke && truncate)
+		f2fs_do_truncate_blocks(inode, 0, false);
 }
 
 static int __f2fs_commit_atomic_write(struct inode *inode)
@@ -335,10 +345,12 @@ static int __f2fs_commit_atomic_write(struct inode *inode)
 	}
 
 out:
-	if (ret)
+	if (ret) {
 		sbi->revoked_atomic_block += fi->atomic_write_cnt;
-	else
+	} else {
 		sbi->committed_atomic_block += fi->atomic_write_cnt;
+		set_inode_flag(inode, FI_ATOMIC_COMMITTED);
+	}
 
 	__complete_revoke_list(inode, &revoke_list, ret ? true : false);
 
@@ -437,8 +449,14 @@ void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi, bool from_bg)
 		return;
 
 	/* try to shrink extent cache when there is no enough memory */
-	if (!f2fs_available_free_memory(sbi, EXTENT_CACHE))
-		f2fs_shrink_extent_tree(sbi, EXTENT_CACHE_SHRINK_NUMBER);
+	if (!f2fs_available_free_memory(sbi, READ_EXTENT_CACHE))
+		f2fs_shrink_read_extent_tree(sbi,
+				READ_EXTENT_CACHE_SHRINK_NUMBER);
+
+	/* try to shrink age extent cache when there is not enough memory */
+	if (!f2fs_available_free_memory(sbi, AGE_EXTENT_CACHE))
+		f2fs_shrink_age_extent_tree(sbi,
+				AGE_EXTENT_CACHE_SHRINK_NUMBER);
 
 	/* check the # of cached NAT entries */
 	if (!f2fs_available_free_memory(sbi, NAT_ENTRIES))
@@ -620,12 +638,11 @@ int f2fs_create_flush_cmd_control(struct f2fs_sb_info *sbi)
 {
 	dev_t dev = sbi->sb->s_bdev->bd_dev;
 	struct flush_cmd_control *fcc;
-	int err = 0;
 
 	if (SM_I(sbi)->fcc_info) {
 		fcc = SM_I(sbi)->fcc_info;
 		if (fcc->f2fs_issue_flush)
-			return err;
+			return 0;
 		goto init_thread;
 	}
 
@@ -638,19 +655,19 @@ int f2fs_create_flush_cmd_control(struct f2fs_sb_info *sbi)
 	init_llist_head(&fcc->issue_list);
 	SM_I(sbi)->fcc_info = fcc;
 	if (!test_opt(sbi, FLUSH_MERGE))
-		return err;
+		return 0;
 
 init_thread:
 	fcc->f2fs_issue_flush = kthread_run(issue_flush_thread, sbi,
 				"f2fs_flush-%u:%u", MAJOR(dev), MINOR(dev));
 	if (IS_ERR(fcc->f2fs_issue_flush)) {
-		err = PTR_ERR(fcc->f2fs_issue_flush);
-		kfree(fcc);
-		SM_I(sbi)->fcc_info = NULL;
+		int err = PTR_ERR(fcc->f2fs_issue_flush);
+
+		fcc->f2fs_issue_flush = NULL;
 		return err;
 	}
 
-	return err;
+	return 0;
 }
 
 void f2fs_destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free)
@@ -856,7 +873,7 @@ block_t f2fs_get_unusable_blocks(struct f2fs_sb_info *sbi)
 	}
 	mutex_unlock(&dirty_i->seglist_lock);
 
-	unusable = holes[DATA] > holes[NODE] ? holes[DATA] : holes[NODE];
+	unusable = max(holes[DATA], holes[NODE]);
 	if (unusable > ovp_holes)
 		return unusable - ovp_holes;
 	return 0;
@@ -1052,8 +1069,8 @@ static void __init_discard_policy(struct f2fs_sb_info *sbi,
 		dpolicy->io_aware = true;
 		dpolicy->sync = false;
 		dpolicy->ordered = true;
-		if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) {
-			dpolicy->granularity = 1;
+		if (utilization(sbi) > dcc->discard_urgent_util) {
+			dpolicy->granularity = MIN_DISCARD_GRANULARITY;
 			if (atomic_read(&dcc->discard_cmd_cnt))
 				dpolicy->max_interval =
 					dcc->min_discard_issue_time;
@@ -1068,7 +1085,7 @@ static void __init_discard_policy(struct f2fs_sb_info *sbi,
 	} else if (discard_type == DPOLICY_UMOUNT) {
 		dpolicy->io_aware = false;
 		/* we need to issue all to keep CP_TRIMMED_FLAG */
-		dpolicy->granularity = 1;
+		dpolicy->granularity = MIN_DISCARD_GRANULARITY;
 		dpolicy->timeout = true;
 	}
 }
@@ -1126,13 +1143,12 @@ static int __submit_discard_cmd(struct f2fs_sb_info *sbi,
 		if (time_to_inject(sbi, FAULT_DISCARD)) {
 			f2fs_show_injection_info(sbi, FAULT_DISCARD);
 			err = -EIO;
-			goto submit;
-		}
-		err = __blkdev_issue_discard(bdev,
+		} else {
+			err = __blkdev_issue_discard(bdev,
 					SECTOR_FROM_BLOCK(start),
 					SECTOR_FROM_BLOCK(len),
 					GFP_NOFS, &bio);
-submit:
+		}
 		if (err) {
 			spin_lock_irqsave(&dc->lock, flags);
 			if (dc->state == D_PARTIAL)
@@ -1342,13 +1358,13 @@ static void __update_discard_tree_range(struct f2fs_sb_info *sbi,
 	}
 }
 
-static int __queue_discard_cmd(struct f2fs_sb_info *sbi,
+static void __queue_discard_cmd(struct f2fs_sb_info *sbi,
 		struct block_device *bdev, block_t blkstart, block_t blklen)
 {
 	block_t lblkstart = blkstart;
 
 	if (!f2fs_bdev_support_discard(bdev))
-		return 0;
+		return;
 
 	trace_f2fs_queue_discard(bdev, blkstart, blklen);
 
@@ -1360,7 +1376,6 @@ static int __queue_discard_cmd(struct f2fs_sb_info *sbi,
 	mutex_lock(&SM_I(sbi)->dcc_info->cmd_lock);
 	__update_discard_tree_range(sbi, bdev, lblkstart, blkstart, blklen);
 	mutex_unlock(&SM_I(sbi)->dcc_info->cmd_lock);
-	return 0;
 }
 
 static unsigned int __issue_discard_cmd_orderly(struct f2fs_sb_info *sbi,
@@ -1448,7 +1463,7 @@ static int __issue_discard_cmd(struct f2fs_sb_info *sbi,
 		if (i + 1 < dpolicy->granularity)
 			break;
 
-		if (i + 1 < DEFAULT_DISCARD_GRANULARITY && dpolicy->ordered)
+		if (i + 1 < dcc->max_ordered_discard && dpolicy->ordered)
 			return __issue_discard_cmd_orderly(sbi, dpolicy);
 
 		pend_list = &dcc->pend_list[i];
@@ -1645,6 +1660,9 @@ bool f2fs_issue_discard_timeout(struct f2fs_sb_info *sbi)
 	struct discard_policy dpolicy;
 	bool dropped;
 
+	if (!atomic_read(&dcc->discard_cmd_cnt))
+		return false;
+
 	__init_discard_policy(sbi, &dpolicy, DPOLICY_UMOUNT,
 					dcc->discard_granularity);
 	__issue_discard_cmd(sbi, &dpolicy);
@@ -1669,6 +1687,11 @@ static int issue_discard_thread(void *data)
 	set_freezable();
 
 	do {
+		wait_event_interruptible_timeout(*q,
+				kthread_should_stop() || freezing(current) ||
+				dcc->discard_wake,
+				msecs_to_jiffies(wait_ms));
+
 		if (sbi->gc_mode == GC_URGENT_HIGH ||
 			!f2fs_available_free_memory(sbi, DISCARD_CACHE))
 			__init_discard_policy(sbi, &dpolicy, DPOLICY_FORCE, 1);
@@ -1676,14 +1699,6 @@ static int issue_discard_thread(void *data)
 			__init_discard_policy(sbi, &dpolicy, DPOLICY_BG,
 						dcc->discard_granularity);
 
-		if (!atomic_read(&dcc->discard_cmd_cnt))
-		       wait_ms = dpolicy.max_interval;
-
-		wait_event_interruptible_timeout(*q,
-				kthread_should_stop() || freezing(current) ||
-				dcc->discard_wake,
-				msecs_to_jiffies(wait_ms));
-
 		if (dcc->discard_wake)
 			dcc->discard_wake = 0;
 
@@ -1697,12 +1712,11 @@ static int issue_discard_thread(void *data)
 			continue;
 		if (kthread_should_stop())
 			return 0;
-		if (is_sbi_flag_set(sbi, SBI_NEED_FSCK)) {
+		if (is_sbi_flag_set(sbi, SBI_NEED_FSCK) ||
+			!atomic_read(&dcc->discard_cmd_cnt)) {
 			wait_ms = dpolicy.max_interval;
 			continue;
 		}
-		if (!atomic_read(&dcc->discard_cmd_cnt))
-			continue;
 
 		sb_start_intwrite(sbi->sb);
 
@@ -1717,6 +1731,8 @@ static int issue_discard_thread(void *data)
 		} else {
 			wait_ms = dpolicy.max_interval;
 		}
+		if (!atomic_read(&dcc->discard_cmd_cnt))
+			wait_ms = dpolicy.max_interval;
 
 		sb_end_intwrite(sbi->sb);
 
@@ -1760,7 +1776,8 @@ static int __f2fs_issue_discard_zone(struct f2fs_sb_info *sbi,
 	}
 
 	/* For conventional zones, use regular discard if supported */
-	return __queue_discard_cmd(sbi, bdev, lblkstart, blklen);
+	__queue_discard_cmd(sbi, bdev, lblkstart, blklen);
+	return 0;
 }
 #endif
 
@@ -1771,7 +1788,8 @@ static int __issue_discard_async(struct f2fs_sb_info *sbi,
 	if (f2fs_sb_has_blkzoned(sbi) && bdev_is_zoned(bdev))
 		return __f2fs_issue_discard_zone(sbi, bdev, blkstart, blklen);
 #endif
-	return __queue_discard_cmd(sbi, bdev, blkstart, blklen);
+	__queue_discard_cmd(sbi, bdev, blkstart, blklen);
+	return 0;
 }
 
 static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
@@ -2048,6 +2066,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
 		return -ENOMEM;
 
 	dcc->discard_granularity = DEFAULT_DISCARD_GRANULARITY;
+	dcc->max_ordered_discard = DEFAULT_MAX_ORDERED_DISCARD_GRANULARITY;
 	if (F2FS_OPTION(sbi).discard_unit == DISCARD_UNIT_SEGMENT)
 		dcc->discard_granularity = sbi->blocks_per_seg;
 	else if (F2FS_OPTION(sbi).discard_unit == DISCARD_UNIT_SECTION)
@@ -2068,6 +2087,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
 	dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
 	dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
 	dcc->max_discard_issue_time = DEF_MAX_DISCARD_ISSUE_TIME;
+	dcc->discard_urgent_util = DEF_DISCARD_URGENT_UTIL;
 	dcc->undiscard_blks = 0;
 	dcc->next_pos = 0;
 	dcc->root = RB_ROOT_CACHED;
@@ -2098,8 +2118,7 @@ static void destroy_discard_cmd_control(struct f2fs_sb_info *sbi)
 	 * Recovery can cache discard commands, so in error path of
 	 * fill_super(), it needs to give a chance to handle them.
 	 */
-	if (unlikely(atomic_read(&dcc->discard_cmd_cnt)))
-		f2fs_issue_discard_timeout(sbi);
+	f2fs_issue_discard_timeout(sbi);
 
 	kfree(dcc);
 	SM_I(sbi)->dcc_info = NULL;
@@ -2644,7 +2663,7 @@ bool f2fs_segment_has_free_slot(struct f2fs_sb_info *sbi, int segno)
  * This function always allocates a used segment(from dirty seglist) by SSR
  * manner, so it should recover the existing segment information of valid blocks
  */
-static void change_curseg(struct f2fs_sb_info *sbi, int type, bool flush)
+static void change_curseg(struct f2fs_sb_info *sbi, int type)
 {
 	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
 	struct curseg_info *curseg = CURSEG_I(sbi, type);
@@ -2652,9 +2671,7 @@ static void change_curseg(struct f2fs_sb_info *sbi, int type, bool flush)
 	struct f2fs_summary_block *sum_node;
 	struct page *sum_page;
 
-	if (flush)
-		write_sum_page(sbi, curseg->sum_blk,
-					GET_SUM_BLOCK(sbi, curseg->segno));
+	write_sum_page(sbi, curseg->sum_blk, GET_SUM_BLOCK(sbi, curseg->segno));
 
 	__set_test_and_inuse(sbi, new_segno);
 
@@ -2693,7 +2710,7 @@ static void get_atssr_segment(struct f2fs_sb_info *sbi, int type,
 		struct seg_entry *se = get_seg_entry(sbi, curseg->next_segno);
 
 		curseg->seg_type = se->type;
-		change_curseg(sbi, type, true);
+		change_curseg(sbi, type);
 	} else {
 		/* allocate cold segment by default */
 		curseg->seg_type = CURSEG_COLD_DATA;
@@ -2837,31 +2854,20 @@ static int get_ssr_segment(struct f2fs_sb_info *sbi, int type,
 	return 0;
 }
 
-/*
- * flush out current segment and replace it with new segment
- * This function should be returned with success, otherwise BUG
- */
-static void allocate_segment_by_default(struct f2fs_sb_info *sbi,
-						int type, bool force)
+static bool need_new_seg(struct f2fs_sb_info *sbi, int type)
 {
 	struct curseg_info *curseg = CURSEG_I(sbi, type);
 
-	if (force)
-		new_curseg(sbi, type, true);
-	else if (!is_set_ckpt_flags(sbi, CP_CRC_RECOVERY_FLAG) &&
-					curseg->seg_type == CURSEG_WARM_NODE)
-		new_curseg(sbi, type, false);
-	else if (curseg->alloc_type == LFS &&
-			is_next_segment_free(sbi, curseg, type) &&
-			likely(!is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
-		new_curseg(sbi, type, false);
-	else if (f2fs_need_SSR(sbi) &&
-			get_ssr_segment(sbi, type, SSR, 0))
-		change_curseg(sbi, type, true);
-	else
-		new_curseg(sbi, type, false);
-
-	stat_inc_seg_type(sbi, curseg);
+	if (!is_set_ckpt_flags(sbi, CP_CRC_RECOVERY_FLAG) &&
+	    curseg->seg_type == CURSEG_WARM_NODE)
+		return true;
+	if (curseg->alloc_type == LFS &&
+	    is_next_segment_free(sbi, curseg, type) &&
+	    likely(!is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
+		return true;
+	if (!f2fs_need_SSR(sbi) || !get_ssr_segment(sbi, type, SSR, 0))
+		return true;
+	return false;
 }
 
 void f2fs_allocate_segment_for_resize(struct f2fs_sb_info *sbi, int type,
@@ -2879,7 +2885,7 @@ void f2fs_allocate_segment_for_resize(struct f2fs_sb_info *sbi, int type,
 		goto unlock;
 
 	if (f2fs_need_SSR(sbi) && get_ssr_segment(sbi, type, SSR, 0))
-		change_curseg(sbi, type, true);
+		change_curseg(sbi, type);
 	else
 		new_curseg(sbi, type, true);
 
@@ -2914,7 +2920,8 @@ static void __allocate_new_segment(struct f2fs_sb_info *sbi, int type,
 		return;
 alloc:
 	old_segno = curseg->segno;
-	SIT_I(sbi)->s_ops->allocate_segment(sbi, type, true);
+	new_curseg(sbi, type, true);
+	stat_inc_seg_type(sbi, curseg);
 	locate_dirty_segment(sbi, old_segno);
 }
 
@@ -2945,10 +2952,6 @@ void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi)
 	f2fs_up_read(&SM_I(sbi)->curseg_lock);
 }
 
-static const struct segment_allocation default_salloc_ops = {
-	.allocate_segment = allocate_segment_by_default,
-};
-
 bool f2fs_exist_trim_candidates(struct f2fs_sb_info *sbi,
 						struct cp_control *cpc)
 {
@@ -3154,10 +3157,28 @@ static int __get_segment_type_4(struct f2fs_io_info *fio)
 	}
 }
 
+static int __get_age_segment_type(struct inode *inode, pgoff_t pgofs)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct extent_info ei = {};
+
+	if (f2fs_lookup_age_extent_cache(inode, pgofs, &ei)) {
+		if (!ei.age)
+			return NO_CHECK_TYPE;
+		if (ei.age <= sbi->hot_data_age_threshold)
+			return CURSEG_HOT_DATA;
+		if (ei.age <= sbi->warm_data_age_threshold)
+			return CURSEG_WARM_DATA;
+		return CURSEG_COLD_DATA;
+	}
+	return NO_CHECK_TYPE;
+}
+
 static int __get_segment_type_6(struct f2fs_io_info *fio)
 {
 	if (fio->type == DATA) {
 		struct inode *inode = fio->page->mapping->host;
+		int type;
 
 		if (is_inode_flag_set(inode, FI_ALIGNED_WRITE))
 			return CURSEG_COLD_DATA_PINNED;
@@ -3172,6 +3193,11 @@ static int __get_segment_type_6(struct f2fs_io_info *fio)
 		}
 		if (file_is_cold(inode) || f2fs_need_compress_data(inode))
 			return CURSEG_COLD_DATA;
+
+		type = __get_age_segment_type(inode, fio->page->index);
+		if (type != NO_CHECK_TYPE)
+			return type;
+
 		if (file_is_hot(inode) ||
 				is_inode_flag_set(inode, FI_HOT_DATA) ||
 				f2fs_is_cow_file(inode))
@@ -3268,11 +3294,19 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
 		update_sit_entry(sbi, old_blkaddr, -1);
 
 	if (!__has_curseg_space(sbi, curseg)) {
-		if (from_gc)
+		/*
+		 * Flush out current segment and replace it with new segment.
+		 */
+		if (from_gc) {
 			get_atssr_segment(sbi, type, se->type,
 						AT_SSR, se->mtime);
-		else
-			sit_i->s_ops->allocate_segment(sbi, type, false);
+		} else {
+			if (need_new_seg(sbi, type))
+				new_curseg(sbi, type, false);
+			else
+				change_curseg(sbi, type);
+			stat_inc_seg_type(sbi, curseg);
+		}
 	}
 	/*
 	 * segment dirty status should be updated after segment allocation,
@@ -3282,6 +3316,9 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
 	locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr));
 	locate_dirty_segment(sbi, GET_SEGNO(sbi, *new_blkaddr));
 
+	if (IS_DATASEG(type))
+		atomic64_inc(&sbi->allocated_data_blocks);
+
 	up_write(&sit_i->sentry_lock);
 
 	if (page && IS_NODESEG(type)) {
@@ -3409,6 +3446,8 @@ void f2fs_outplace_write_data(struct dnode_of_data *dn,
 	struct f2fs_summary sum;
 
 	f2fs_bug_on(sbi, dn->data_blkaddr == NULL_ADDR);
+	if (fio->io_type == FS_DATA_IO || fio->io_type == FS_CP_DATA_IO)
+		f2fs_update_age_extent_cache(dn);
 	set_summary(&sum, dn->nid, dn->ofs_in_node, fio->version);
 	do_write_page(&sum, fio);
 	f2fs_update_data_blkaddr(dn, fio->new_blkaddr);
@@ -3533,7 +3572,7 @@ void f2fs_do_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
 	/* change the current segment */
 	if (segno != curseg->segno) {
 		curseg->next_segno = segno;
-		change_curseg(sbi, type, true);
+		change_curseg(sbi, type);
 	}
 
 	curseg->next_blkoff = GET_BLKOFF_FROM_SEG0(sbi, new_blkaddr);
@@ -3561,7 +3600,7 @@ void f2fs_do_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
 	if (recover_curseg) {
 		if (old_cursegno != curseg->segno) {
 			curseg->next_segno = old_cursegno;
-			change_curseg(sbi, type, true);
+			change_curseg(sbi, type);
 		}
 		curseg->next_blkoff = old_blkoff;
 		curseg->alloc_type = old_alloc_type;
@@ -4258,9 +4297,6 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
 		return -ENOMEM;
 #endif
 
-	/* init SIT information */
-	sit_i->s_ops = &default_salloc_ops;
-
 	sit_i->sit_base_addr = le32_to_cpu(raw_super->sit_blkaddr);
 	sit_i->sit_blocks = sit_segs << sbi->log_blocks_per_seg;
 	sit_i->written_valid_blocks = 0;
@@ -5101,11 +5137,9 @@ int f2fs_build_segment_manager(struct f2fs_sb_info *sbi)
 
 	init_f2fs_rwsem(&sm_info->curseg_lock);
 
-	if (!f2fs_readonly(sbi->sb)) {
-		err = f2fs_create_flush_cmd_control(sbi);
-		if (err)
-			return err;
-	}
+	err = f2fs_create_flush_cmd_control(sbi);
+	if (err)
+		return err;
 
 	err = create_discard_cmd_control(sbi);
 	if (err)
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index be8f2d7..3ad1b7b 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -222,10 +222,6 @@ struct sec_entry {
 	unsigned int valid_blocks;	/* # of valid blocks in a section */
 };
 
-struct segment_allocation {
-	void (*allocate_segment)(struct f2fs_sb_info *, int, bool);
-};
-
 #define MAX_SKIP_GC_COUNT			16
 
 struct revoke_entry {
@@ -235,8 +231,6 @@ struct revoke_entry {
 };
 
 struct sit_info {
-	const struct segment_allocation *s_ops;
-
 	block_t sit_base_addr;		/* start block address of SIT area */
 	block_t sit_blocks;		/* # of blocks used by SIT area */
 	block_t written_valid_blocks;	/* # of valid blocks in main area */
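
Looking back at issue_discard_thread() in segment.c above: the reordering moves the timed wait to the top of the loop, so the thread sleeps before selecting a discard policy, and the interval is stretched to max_interval whenever the pending list is empty instead of spinning straight back around. A rough userspace model of that loop shape, using pthreads as a stand-in for the kthread and wait-queue machinery; every name and interval here is illustrative:

#include <pthread.h>
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

struct dcc {
        pthread_mutex_t lock;
        pthread_cond_t wake;
        int pending;    /* pending discard commands */
        bool stop;
};

static void *issue_thread(void *arg)
{
        struct dcc *dcc = arg;
        long wait_ms = 50;      /* dpolicy min_interval stand-in */

        pthread_mutex_lock(&dcc->lock);
        while (!dcc->stop) {
                struct timespec ts;

                clock_gettime(CLOCK_REALTIME, &ts);
                ts.tv_nsec += (wait_ms % 1000) * 1000000L;
                ts.tv_sec += wait_ms / 1000 + ts.tv_nsec / 1000000000L;
                ts.tv_nsec %= 1000000000L;

                /* sleep *before* picking a policy, as in the hunk above */
                pthread_cond_timedwait(&dcc->wake, &dcc->lock, &ts);

                if (dcc->pending) {
                        dcc->pending--; /* __issue_discard_cmd() stand-in */
                        wait_ms = 50;
                } else {
                        wait_ms = 5000; /* dpolicy max_interval stand-in */
                }
        }
        pthread_mutex_unlock(&dcc->lock);
        return NULL;
}

int main(void)
{
        struct dcc dcc = {
                .lock = PTHREAD_MUTEX_INITIALIZER,
                .wake = PTHREAD_COND_INITIALIZER,
                .pending = 3,
        };
        pthread_t t;

        pthread_create(&t, NULL, issue_thread, &dcc);
        sleep(1);
        pthread_mutex_lock(&dcc.lock);
        dcc.stop = true;
        pthread_cond_signal(&dcc.wake);
        pthread_mutex_unlock(&dcc.lock);
        pthread_join(t, NULL);
        return 0;
}
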
diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
index dd3c3c7..83d6fb9 100644
--- a/fs/f2fs/shrinker.c
+++ b/fs/f2fs/shrinker.c
@@ -28,10 +28,13 @@ static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
 	return count > 0 ? count : 0;
 }
 
-static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
+static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi,
+					enum extent_type type)
 {
-	return atomic_read(&sbi->total_zombie_tree) +
-				atomic_read(&sbi->total_ext_node);
+	struct extent_tree_info *eti = &sbi->extent_tree[type];
+
+	return atomic_read(&eti->total_zombie_tree) +
+				atomic_read(&eti->total_ext_node);
 }
 
 unsigned long f2fs_shrink_count(struct shrinker *shrink,
@@ -53,8 +56,11 @@ unsigned long f2fs_shrink_count(struct shrinker *shrink,
 		}
 		spin_unlock(&f2fs_list_lock);
 
-		/* count extent cache entries */
-		count += __count_extent_cache(sbi);
+		/* count read extent cache entries */
+		count += __count_extent_cache(sbi, EX_READ);
+
+		/* count block age extent cache entries */
+		count += __count_extent_cache(sbi, EX_BLOCK_AGE);
 
 		/* count clean nat cache entries */
 		count += __count_nat_entries(sbi);
@@ -100,7 +106,10 @@ unsigned long f2fs_shrink_scan(struct shrinker *shrink,
 		sbi->shrinker_run_no = run_no;
 
 		/* shrink extent cache entries */
-		freed += f2fs_shrink_extent_tree(sbi, nr >> 1);
+		freed += f2fs_shrink_age_extent_tree(sbi, nr >> 2);
+
+		/* shrink read extent cache entries */
+		freed += f2fs_shrink_read_extent_tree(sbi, nr >> 2);
 
 		/* shrink clean nat cache entries */
 		if (freed < nr)
@@ -130,7 +139,9 @@ void f2fs_join_shrinker(struct f2fs_sb_info *sbi)
 
 void f2fs_leave_shrinker(struct f2fs_sb_info *sbi)
 {
-	f2fs_shrink_extent_tree(sbi, __count_extent_cache(sbi));
+	f2fs_shrink_read_extent_tree(sbi, __count_extent_cache(sbi, EX_READ));
+	f2fs_shrink_age_extent_tree(sbi,
+				__count_extent_cache(sbi, EX_BLOCK_AGE));
 
 	spin_lock(&f2fs_list_lock);
 	list_del_init(&sbi->s_list);
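
In the shrinker.c hunks above, both extent caches contribute to the reclaim estimate, and the scan path hands each of them a quarter of the requested budget (nr >> 2) before letting the NAT cache take whatever is left. A toy model of that split, with stub shrink functions standing in for the f2fs helpers:

#include <stdio.h>

static unsigned long shrink_age_extents(unsigned long nr)  { return nr; }
static unsigned long shrink_read_extents(unsigned long nr) { return nr / 2; }
static unsigned long shrink_nats(unsigned long nr)         { return nr; }

static unsigned long shrink_scan(unsigned long nr)
{
        unsigned long freed = 0;

        /* each extent cache gets a quarter of the scan budget */
        freed += shrink_age_extents(nr >> 2);
        freed += shrink_read_extents(nr >> 2);
        /* remaining budget falls through to the NAT cache */
        if (freed < nr)
                freed += shrink_nats(nr - freed);
        return freed;
}

int main(void)
{
        printf("freed %lu of 128\n", shrink_scan(128));
        return 0;
}
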
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index eaabb85..1f812b9 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -61,6 +61,7 @@ const char *f2fs_fault_name[FAULT_MAX] = {
 	[FAULT_SLAB_ALLOC]	= "slab alloc",
 	[FAULT_DQUOT_INIT]	= "dquot initialize",
 	[FAULT_LOCK_OP]		= "lock_op",
+	[FAULT_BLKADDR]		= "invalid blkaddr",
 };
 
 void f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned int rate,
@@ -110,6 +111,7 @@ enum {
 	Opt_noinline_dentry,
 	Opt_flush_merge,
 	Opt_noflush_merge,
+	Opt_barrier,
 	Opt_nobarrier,
 	Opt_fastboot,
 	Opt_extent_cache,
@@ -161,6 +163,7 @@ enum {
 	Opt_nogc_merge,
 	Opt_discard_unit,
 	Opt_memory_mode,
+	Opt_age_extent_cache,
 	Opt_err,
 };
 
@@ -186,6 +189,7 @@ static match_table_t f2fs_tokens = {
 	{Opt_noinline_dentry, "noinline_dentry"},
 	{Opt_flush_merge, "flush_merge"},
 	{Opt_noflush_merge, "noflush_merge"},
+	{Opt_barrier, "barrier"},
 	{Opt_nobarrier, "nobarrier"},
 	{Opt_fastboot, "fastboot"},
 	{Opt_extent_cache, "extent_cache"},
@@ -238,6 +242,7 @@ static match_table_t f2fs_tokens = {
 	{Opt_nogc_merge, "nogc_merge"},
 	{Opt_discard_unit, "discard_unit=%s"},
 	{Opt_memory_mode, "memory=%s"},
+	{Opt_age_extent_cache, "age_extent_cache"},
 	{Opt_err, NULL},
 };
 
@@ -285,9 +290,7 @@ static int __init f2fs_create_casefold_cache(void)
 {
 	f2fs_cf_name_slab = f2fs_kmem_cache_create("f2fs_casefolded_name",
 							F2FS_NAME_LEN);
-	if (!f2fs_cf_name_slab)
-		return -ENOMEM;
-	return 0;
+	return f2fs_cf_name_slab ? 0 : -ENOMEM;
 }
 
 static void f2fs_destroy_casefold_cache(void)
@@ -806,14 +809,17 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount)
 		case Opt_nobarrier:
 			set_opt(sbi, NOBARRIER);
 			break;
+		case Opt_barrier:
+			clear_opt(sbi, NOBARRIER);
+			break;
 		case Opt_fastboot:
 			set_opt(sbi, FASTBOOT);
 			break;
 		case Opt_extent_cache:
-			set_opt(sbi, EXTENT_CACHE);
+			set_opt(sbi, READ_EXTENT_CACHE);
 			break;
 		case Opt_noextent_cache:
-			clear_opt(sbi, EXTENT_CACHE);
+			clear_opt(sbi, READ_EXTENT_CACHE);
 			break;
 		case Opt_noinline_data:
 			clear_opt(sbi, INLINE_DATA);
@@ -1253,6 +1259,9 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount)
 			}
 			kfree(name);
 			break;
+		case Opt_age_extent_cache:
+			set_opt(sbi, AGE_EXTENT_CACHE);
+			break;
 		default:
 			f2fs_err(sbi, "Unrecognized mount option \"%s\" or missing value",
 				 p);
@@ -1347,6 +1356,11 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount)
 		return -EINVAL;
 	}
 
+	if (f2fs_is_readonly(sbi) && test_opt(sbi, FLUSH_MERGE)) {
+		f2fs_err(sbi, "FLUSH_MERGE not compatible with readonly mode");
+		return -EINVAL;
+	}
+
 	if (f2fs_sb_has_readonly(sbi) && !f2fs_readonly(sbi->sb)) {
 		f2fs_err(sbi, "Allow to mount readonly mode only");
 		return -EROFS;
@@ -1567,8 +1581,7 @@ static void f2fs_put_super(struct super_block *sb)
 	/* be sure to wait for any on-going discard commands */
 	dropped = f2fs_issue_discard_timeout(sbi);
 
-	if ((f2fs_hw_support_discard(sbi) || f2fs_hw_should_discard(sbi)) &&
-					!sbi->discard_blks && !dropped) {
+	if (f2fs_realtime_discard_enable(sbi) && !sbi->discard_blks && !dropped) {
 		struct cp_control cpc = {
 			.reason = CP_UMOUNT | CP_TRIMMED,
 		};
@@ -1935,16 +1948,22 @@ static int f2fs_show_options(struct seq_file *seq, struct dentry *root)
 		seq_puts(seq, ",inline_dentry");
 	else
 		seq_puts(seq, ",noinline_dentry");
-	if (!f2fs_readonly(sbi->sb) && test_opt(sbi, FLUSH_MERGE))
+	if (test_opt(sbi, FLUSH_MERGE))
 		seq_puts(seq, ",flush_merge");
+	else
+		seq_puts(seq, ",noflush_merge");
 	if (test_opt(sbi, NOBARRIER))
 		seq_puts(seq, ",nobarrier");
+	else
+		seq_puts(seq, ",barrier");
 	if (test_opt(sbi, FASTBOOT))
 		seq_puts(seq, ",fastboot");
-	if (test_opt(sbi, EXTENT_CACHE))
+	if (test_opt(sbi, READ_EXTENT_CACHE))
 		seq_puts(seq, ",extent_cache");
 	else
 		seq_puts(seq, ",noextent_cache");
+	if (test_opt(sbi, AGE_EXTENT_CACHE))
+		seq_puts(seq, ",age_extent_cache");
 	if (test_opt(sbi, DATA_FLUSH))
 		seq_puts(seq, ",data_flush");
 
@@ -2043,7 +2062,11 @@ static void default_options(struct f2fs_sb_info *sbi)
 		F2FS_OPTION(sbi).active_logs = NR_CURSEG_PERSIST_TYPE;
 
 	F2FS_OPTION(sbi).inline_xattr_size = DEFAULT_INLINE_XATTR_ADDRS;
-	F2FS_OPTION(sbi).alloc_mode = ALLOC_MODE_DEFAULT;
+	if (le32_to_cpu(F2FS_RAW_SUPER(sbi)->segment_count_main) <=
+							SMALL_VOLUME_SEGMENTS)
+		F2FS_OPTION(sbi).alloc_mode = ALLOC_MODE_REUSE;
+	else
+		F2FS_OPTION(sbi).alloc_mode = ALLOC_MODE_DEFAULT;
 	F2FS_OPTION(sbi).fsync_mode = FSYNC_MODE_POSIX;
 	F2FS_OPTION(sbi).s_resuid = make_kuid(&init_user_ns, F2FS_DEF_RESUID);
 	F2FS_OPTION(sbi).s_resgid = make_kgid(&init_user_ns, F2FS_DEF_RESGID);
@@ -2059,13 +2082,14 @@ static void default_options(struct f2fs_sb_info *sbi)
 	set_opt(sbi, INLINE_XATTR);
 	set_opt(sbi, INLINE_DATA);
 	set_opt(sbi, INLINE_DENTRY);
-	set_opt(sbi, EXTENT_CACHE);
+	set_opt(sbi, READ_EXTENT_CACHE);
 	set_opt(sbi, NOHEAP);
 	clear_opt(sbi, DISABLE_CHECKPOINT);
 	set_opt(sbi, MERGE_CHECKPOINT);
 	F2FS_OPTION(sbi).unusable_cap = 0;
 	sbi->sb->s_flags |= SB_LAZYTIME;
-	set_opt(sbi, FLUSH_MERGE);
+	if (!f2fs_is_readonly(sbi))
+		set_opt(sbi, FLUSH_MERGE);
 	if (f2fs_hw_support_discard(sbi) || f2fs_hw_should_discard(sbi))
 		set_opt(sbi, DISCARD);
 	if (f2fs_sb_has_blkzoned(sbi)) {
@@ -2200,14 +2224,14 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
 	bool need_restart_ckpt = false, need_stop_ckpt = false;
 	bool need_restart_flush = false, need_stop_flush = false;
 	bool need_restart_discard = false, need_stop_discard = false;
-	bool no_extent_cache = !test_opt(sbi, EXTENT_CACHE);
+	bool no_read_extent_cache = !test_opt(sbi, READ_EXTENT_CACHE);
+	bool no_age_extent_cache = !test_opt(sbi, AGE_EXTENT_CACHE);
 	bool enable_checkpoint = !test_opt(sbi, DISABLE_CHECKPOINT);
 	bool no_io_align = !F2FS_IO_ALIGNED(sbi);
 	bool no_atgc = !test_opt(sbi, ATGC);
 	bool no_discard = !test_opt(sbi, DISCARD);
 	bool no_compress_cache = !test_opt(sbi, COMPRESS_CACHE);
 	bool block_unit_discard = f2fs_block_unit_discard(sbi);
-	struct discard_cmd_control *dcc;
 #ifdef CONFIG_QUOTA
 	int i, j;
 #endif
@@ -2290,11 +2314,17 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
 	}
 
 	/* disallow enable/disable extent_cache dynamically */
-	if (no_extent_cache == !!test_opt(sbi, EXTENT_CACHE)) {
+	if (no_read_extent_cache == !!test_opt(sbi, READ_EXTENT_CACHE)) {
 		err = -EINVAL;
 		f2fs_warn(sbi, "switch extent_cache option is not allowed");
 		goto restore_opts;
 	}
+	/* disallow enable/disable age extent_cache dynamically */
+	if (no_age_extent_cache == !!test_opt(sbi, AGE_EXTENT_CACHE)) {
+		err = -EINVAL;
+		f2fs_warn(sbi, "switch age_extent_cache option is not allowed");
+		goto restore_opts;
+	}
 
 	if (no_io_align == !!F2FS_IO_ALIGNED(sbi)) {
 		err = -EINVAL;
@@ -2388,10 +2418,8 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
 				goto restore_flush;
 			need_stop_discard = true;
 		} else {
-			dcc = SM_I(sbi)->dcc_info;
 			f2fs_stop_discard_thread(sbi);
-			if (atomic_read(&dcc->discard_cmd_cnt))
-				f2fs_issue_discard_timeout(sbi);
+			f2fs_issue_discard_timeout(sbi);
 			need_restart_discard = true;
 		}
 	}
@@ -3616,7 +3644,7 @@ static void init_sb_info(struct f2fs_sb_info *sbi)
 	sbi->seq_file_ra_mul = MIN_RA_MUL;
 	sbi->max_fragment_chunk = DEF_FRAGMENT_SIZE;
 	sbi->max_fragment_hole = DEF_FRAGMENT_SIZE;
-	spin_lock_init(&sbi->gc_urgent_high_lock);
+	spin_lock_init(&sbi->gc_remaining_trials_lock);
 	atomic64_set(&sbi->current_atomic_write, 0);
 
 	sbi->dir_level = DEF_DIR_LEVEL;
@@ -4056,18 +4084,16 @@ static int f2fs_setup_casefold(struct f2fs_sb_info *sbi)
 
 static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
 {
-	struct f2fs_sm_info *sm_i = SM_I(sbi);
-
 	/* adjust parameters according to the volume size */
-	if (sm_i->main_segments <= SMALL_VOLUME_SEGMENTS) {
-		F2FS_OPTION(sbi).alloc_mode = ALLOC_MODE_REUSE;
+	if (MAIN_SEGS(sbi) <= SMALL_VOLUME_SEGMENTS) {
 		if (f2fs_block_unit_discard(sbi))
-			sm_i->dcc_info->discard_granularity = 1;
-		sm_i->ipu_policy = 1 << F2FS_IPU_FORCE |
+			SM_I(sbi)->dcc_info->discard_granularity =
+						MIN_DISCARD_GRANULARITY;
+		SM_I(sbi)->ipu_policy = 1 << F2FS_IPU_FORCE |
 					1 << F2FS_IPU_HONOR_OPU_WRITE;
 	}
 
-	sbi->readdir_ra = 1;
+	sbi->readdir_ra = true;
 }
 
 static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
@@ -4628,9 +4654,7 @@ static int __init init_inodecache(void)
 	f2fs_inode_cachep = kmem_cache_create("f2fs_inode_cache",
 			sizeof(struct f2fs_inode_info), 0,
 			SLAB_RECLAIM_ACCOUNT|SLAB_ACCOUNT, NULL);
-	if (!f2fs_inode_cachep)
-		return -ENOMEM;
-	return 0;
+	return f2fs_inode_cachep ? 0 : -ENOMEM;
 }
 
 static void destroy_inodecache(void)
@@ -4695,7 +4719,7 @@ static int __init init_f2fs_fs(void)
 		goto free_iostat;
 	err = f2fs_init_bioset();
 	if (err)
-		goto free_bio_enrty_cache;
+		goto free_bio_entry_cache;
 	err = f2fs_init_compress_mempool();
 	if (err)
 		goto free_bioset;
@@ -4712,7 +4736,7 @@ static int __init init_f2fs_fs(void)
 	f2fs_destroy_compress_mempool();
 free_bioset:
 	f2fs_destroy_bioset();
-free_bio_enrty_cache:
+free_bio_entry_cache:
 	f2fs_destroy_bio_entry_cache();
 free_iostat:
 	f2fs_destroy_iostat_processing();
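
The remount guards above all use the same idiom: snapshot the negation of an option bit before the new options are parsed, and if the snapshot compares equal to the refreshed !!test_opt() value, the bit was toggled across remount, which is disallowed for both extent caches. A small model of that toggle check; the option bit and function names are invented:

#include <stdbool.h>
#include <stdio.h>

#define OPT_READ_EXTENT_CACHE (1u << 0)

static bool test_opt(unsigned int opts, unsigned int bit)
{
        return opts & bit;
}

static int remount(unsigned int old_opts, unsigned int new_opts)
{
        /* taken before the new option string is parsed */
        bool no_read_extent_cache = !test_opt(old_opts, OPT_READ_EXTENT_CACHE);

        /* equal means the bit flipped: reject, as f2fs_remount() does */
        if (no_read_extent_cache == !!test_opt(new_opts, OPT_READ_EXTENT_CACHE))
                return -1;      /* -EINVAL stand-in */
        return 0;
}

int main(void)
{
        printf("%d\n", remount(OPT_READ_EXTENT_CACHE, OPT_READ_EXTENT_CACHE)); /* 0: kept */
        printf("%d\n", remount(OPT_READ_EXTENT_CACHE, 0));                     /* -1: toggled */
        return 0;
}
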
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index df27afd..83a366f 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -53,9 +53,9 @@ static const char *gc_mode_names[MAX_GC_MODE] = {
 
 struct f2fs_attr {
 	struct attribute attr;
-	ssize_t (*show)(struct f2fs_attr *, struct f2fs_sb_info *, char *);
-	ssize_t (*store)(struct f2fs_attr *, struct f2fs_sb_info *,
-			 const char *, size_t);
+	ssize_t (*show)(struct f2fs_attr *a, struct f2fs_sb_info *sbi, char *buf);
+	ssize_t (*store)(struct f2fs_attr *a, struct f2fs_sb_info *sbi,
+			 const char *buf, size_t len);
 	int struct_type;
 	int offset;
 	int id;
@@ -95,28 +95,28 @@ static unsigned char *__struct_ptr(struct f2fs_sb_info *sbi, int struct_type)
 static ssize_t dirty_segments_show(struct f2fs_attr *a,
 		struct f2fs_sb_info *sbi, char *buf)
 {
-	return sprintf(buf, "%llu\n",
+	return sysfs_emit(buf, "%llu\n",
 			(unsigned long long)(dirty_segments(sbi)));
 }
 
 static ssize_t free_segments_show(struct f2fs_attr *a,
 		struct f2fs_sb_info *sbi, char *buf)
 {
-	return sprintf(buf, "%llu\n",
+	return sysfs_emit(buf, "%llu\n",
 			(unsigned long long)(free_segments(sbi)));
 }
 
 static ssize_t ovp_segments_show(struct f2fs_attr *a,
 		struct f2fs_sb_info *sbi, char *buf)
 {
-	return sprintf(buf, "%llu\n",
+	return sysfs_emit(buf, "%llu\n",
 			(unsigned long long)(overprovision_segments(sbi)));
 }
 
 static ssize_t lifetime_write_kbytes_show(struct f2fs_attr *a,
 		struct f2fs_sb_info *sbi, char *buf)
 {
-	return sprintf(buf, "%llu\n",
+	return sysfs_emit(buf, "%llu\n",
 			(unsigned long long)(sbi->kbytes_written +
 			((f2fs_get_sectors_written(sbi) -
 				sbi->sectors_written_start) >> 1)));
@@ -125,13 +125,13 @@ static ssize_t lifetime_write_kbytes_show(struct f2fs_attr *a,
 static ssize_t sb_status_show(struct f2fs_attr *a,
 		struct f2fs_sb_info *sbi, char *buf)
 {
-	return sprintf(buf, "%lx\n", sbi->s_flag);
+	return sysfs_emit(buf, "%lx\n", sbi->s_flag);
 }
 
 static ssize_t cp_status_show(struct f2fs_attr *a,
 		struct f2fs_sb_info *sbi, char *buf)
 {
-	return sprintf(buf, "%x\n", le32_to_cpu(F2FS_CKPT(sbi)->ckpt_flags));
+	return sysfs_emit(buf, "%x\n", le32_to_cpu(F2FS_CKPT(sbi)->ckpt_flags));
 }
 
 static ssize_t pending_discard_show(struct f2fs_attr *a,
@@ -139,10 +139,16 @@ static ssize_t pending_discard_show(struct f2fs_attr *a,
 {
 	if (!SM_I(sbi)->dcc_info)
 		return -EINVAL;
-	return sprintf(buf, "%llu\n", (unsigned long long)atomic_read(
+	return sysfs_emit(buf, "%llu\n", (unsigned long long)atomic_read(
 				&SM_I(sbi)->dcc_info->discard_cmd_cnt));
 }
 
+static ssize_t gc_mode_show(struct f2fs_attr *a,
+		struct f2fs_sb_info *sbi, char *buf)
+{
+	return sysfs_emit(buf, "%s\n", gc_mode_names[sbi->gc_mode]);
+}
+
 static ssize_t features_show(struct f2fs_attr *a,
 		struct f2fs_sb_info *sbi, char *buf)
 {
@@ -199,7 +205,7 @@ static ssize_t features_show(struct f2fs_attr *a,
 static ssize_t current_reserved_blocks_show(struct f2fs_attr *a,
 					struct f2fs_sb_info *sbi, char *buf)
 {
-	return sprintf(buf, "%u\n", sbi->current_reserved_blocks);
+	return sysfs_emit(buf, "%u\n", sbi->current_reserved_blocks);
 }
 
 static ssize_t unusable_show(struct f2fs_attr *a,
@@ -211,7 +217,7 @@ static ssize_t unusable_show(struct f2fs_attr *a,
 		unusable = sbi->unusable_block_count;
 	else
 		unusable = f2fs_get_unusable_blocks(sbi);
-	return sprintf(buf, "%llu\n", (unsigned long long)unusable);
+	return sysfs_emit(buf, "%llu\n", (unsigned long long)unusable);
 }
 
 static ssize_t encoding_show(struct f2fs_attr *a,
@@ -226,13 +232,13 @@ static ssize_t encoding_show(struct f2fs_attr *a,
 			(sb->s_encoding->version >> 8) & 0xff,
 			sb->s_encoding->version & 0xff);
 #endif
-	return sprintf(buf, "(none)");
+	return sysfs_emit(buf, "(none)\n");
 }
 
 static ssize_t mounted_time_sec_show(struct f2fs_attr *a,
 		struct f2fs_sb_info *sbi, char *buf)
 {
-	return sprintf(buf, "%llu", SIT_I(sbi)->mounted_time);
+	return sysfs_emit(buf, "%llu\n", SIT_I(sbi)->mounted_time);
 }
 
 #ifdef CONFIG_F2FS_STAT_FS
@@ -241,7 +247,7 @@ static ssize_t moved_blocks_foreground_show(struct f2fs_attr *a,
 {
 	struct f2fs_stat_info *si = F2FS_STAT(sbi);
 
-	return sprintf(buf, "%llu\n",
+	return sysfs_emit(buf, "%llu\n",
 		(unsigned long long)(si->tot_blks -
 			(si->bg_data_blks + si->bg_node_blks)));
 }
@@ -251,7 +257,7 @@ static ssize_t moved_blocks_background_show(struct f2fs_attr *a,
 {
 	struct f2fs_stat_info *si = F2FS_STAT(sbi);
 
-	return sprintf(buf, "%llu\n",
+	return sysfs_emit(buf, "%llu\n",
 		(unsigned long long)(si->bg_data_blks + si->bg_node_blks));
 }
 
@@ -262,7 +268,7 @@ static ssize_t avg_vblocks_show(struct f2fs_attr *a,
 
 	si->dirty_count = dirty_segments(sbi);
 	f2fs_update_sit_info(sbi);
-	return sprintf(buf, "%llu\n", (unsigned long long)(si->avg_vblocks));
+	return sysfs_emit(buf, "%llu\n", (unsigned long long)(si->avg_vblocks));
 }
 #endif
 
@@ -332,13 +338,8 @@ static ssize_t f2fs_sbi_show(struct f2fs_attr *a,
 		return sysfs_emit(buf, "%u\n", sbi->compr_new_inode);
 #endif
 
-	if (!strcmp(a->attr.name, "gc_urgent"))
-		return sysfs_emit(buf, "%s\n",
-				gc_mode_names[sbi->gc_mode]);
-
 	if (!strcmp(a->attr.name, "gc_segment_mode"))
-		return sysfs_emit(buf, "%s\n",
-				gc_mode_names[sbi->gc_segment_mode]);
+		return sysfs_emit(buf, "%u\n", sbi->gc_segment_mode);
 
 	if (!strcmp(a->attr.name, "gc_reclaimed_segments")) {
 		return sysfs_emit(buf, "%u\n",
@@ -362,7 +363,7 @@ static ssize_t f2fs_sbi_show(struct f2fs_attr *a,
 
 	ui = (unsigned int *)(ptr + a->offset);
 
-	return sprintf(buf, "%u\n", *ui);
+	return sysfs_emit(buf, "%u\n", *ui);
 }
 
 static ssize_t __sbi_store(struct f2fs_attr *a,
@@ -483,14 +484,27 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
 		return count;
 	}
 
+	if (!strcmp(a->attr.name, "max_ordered_discard")) {
+		if (t == 0 || t > MAX_PLIST_NUM)
+			return -EINVAL;
+		if (!f2fs_block_unit_discard(sbi))
+			return -EINVAL;
+		*ui = t;
+		return count;
+	}
+
+	if (!strcmp(a->attr.name, "discard_urgent_util")) {
+		if (t > 100)
+			return -EINVAL;
+		*ui = t;
+		return count;
+	}
+
 	if (!strcmp(a->attr.name, "migration_granularity")) {
 		if (t == 0 || t > sbi->segs_per_sec)
 			return -EINVAL;
 	}
 
-	if (!strcmp(a->attr.name, "trim_sections"))
-		return -EINVAL;
-
 	if (!strcmp(a->attr.name, "gc_urgent")) {
 		if (t == 0) {
 			sbi->gc_mode = GC_NORMAL;
@@ -531,10 +545,10 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
 		return count;
 	}
 
-	if (!strcmp(a->attr.name, "gc_urgent_high_remaining")) {
-		spin_lock(&sbi->gc_urgent_high_lock);
-		sbi->gc_urgent_high_remaining = t;
-		spin_unlock(&sbi->gc_urgent_high_lock);
+	if (!strcmp(a->attr.name, "gc_remaining_trials")) {
+		spin_lock(&sbi->gc_remaining_trials_lock);
+		sbi->gc_remaining_trials = t;
+		spin_unlock(&sbi->gc_remaining_trials_lock);
 
 		return count;
 	}
@@ -649,6 +663,29 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
 		return count;
 	}
 
+	if (!strcmp(a->attr.name, "readdir_ra")) {
+		sbi->readdir_ra = !!t;
+		return count;
+	}
+
+	if (!strcmp(a->attr.name, "hot_data_age_threshold")) {
+		if (t == 0 || t >= sbi->warm_data_age_threshold)
+			return -EINVAL;
+		if (t == *ui)
+			return count;
+		*ui = (unsigned int)t;
+		return count;
+	}
+
+	if (!strcmp(a->attr.name, "warm_data_age_threshold")) {
+		if (t == 0 || t <= sbi->hot_data_age_threshold)
+			return -EINVAL;
+		if (t == *ui)
+			return count;
+		*ui = (unsigned int)t;
+		return count;
+	}
+
 	*ui = (unsigned int)t;
 
 	return count;
@@ -721,7 +758,7 @@ static void f2fs_sb_release(struct kobject *kobj)
 static ssize_t f2fs_feature_show(struct f2fs_attr *a,
 		struct f2fs_sb_info *sbi, char *buf)
 {
-	return sprintf(buf, "supported\n");
+	return sysfs_emit(buf, "supported\n");
 }
 
 #define F2FS_FEATURE_RO_ATTR(_name)				\
@@ -734,8 +771,8 @@ static ssize_t f2fs_sb_feature_show(struct f2fs_attr *a,
 		struct f2fs_sb_info *sbi, char *buf)
 {
 	if (F2FS_HAS_FEATURE(sbi, a->id))
-		return sprintf(buf, "supported\n");
-	return sprintf(buf, "unsupported\n");
+		return sysfs_emit(buf, "supported\n");
+	return sysfs_emit(buf, "unsupported\n");
 }
 
 #define F2FS_SB_FEATURE_RO_ATTR(_name, _feat)			\
@@ -788,9 +825,10 @@ F2FS_RW_ATTR(DCC_INFO, discard_cmd_control, max_discard_request, max_discard_req
 F2FS_RW_ATTR(DCC_INFO, discard_cmd_control, min_discard_issue_time, min_discard_issue_time);
 F2FS_RW_ATTR(DCC_INFO, discard_cmd_control, mid_discard_issue_time, mid_discard_issue_time);
 F2FS_RW_ATTR(DCC_INFO, discard_cmd_control, max_discard_issue_time, max_discard_issue_time);
+F2FS_RW_ATTR(DCC_INFO, discard_cmd_control, discard_urgent_util, discard_urgent_util);
 F2FS_RW_ATTR(DCC_INFO, discard_cmd_control, discard_granularity, discard_granularity);
+F2FS_RW_ATTR(DCC_INFO, discard_cmd_control, max_ordered_discard, max_ordered_discard);
 F2FS_RW_ATTR(RESERVED_BLOCKS, f2fs_sb_info, reserved_blocks, reserved_blocks);
-F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, batched_trim_sections, trim_sections);
 F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, ipu_policy, ipu_policy);
 F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_ipu_util, min_ipu_util);
 F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_fsync_blocks, min_fsync_blocks);
@@ -825,7 +863,7 @@ F2FS_RW_ATTR(FAULT_INFO_TYPE, f2fs_fault_info, inject_type, inject_type);
 #endif
 F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, data_io_flag, data_io_flag);
 F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, node_io_flag, node_io_flag);
-F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_urgent_high_remaining, gc_urgent_high_remaining);
+F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_remaining_trials, gc_remaining_trials);
 F2FS_RW_ATTR(CPRC_INFO, ckpt_req_control, ckpt_thread_ioprio, ckpt_thread_ioprio);
 F2FS_GENERAL_RO_ATTR(dirty_segments);
 F2FS_GENERAL_RO_ATTR(free_segments);
@@ -838,6 +876,7 @@ F2FS_GENERAL_RO_ATTR(encoding);
 F2FS_GENERAL_RO_ATTR(mounted_time_sec);
 F2FS_GENERAL_RO_ATTR(main_blkaddr);
 F2FS_GENERAL_RO_ATTR(pending_discard);
+F2FS_GENERAL_RO_ATTR(gc_mode);
 #ifdef CONFIG_F2FS_STAT_FS
 F2FS_STAT_ATTR(STAT_INFO, f2fs_stat_info, cp_foreground_calls, cp_count);
 F2FS_STAT_ATTR(STAT_INFO, f2fs_stat_info, cp_background_calls, bg_cp_count);
@@ -902,6 +941,10 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, peak_atomic_write, peak_atomic_write);
 F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, committed_atomic_block, committed_atomic_block);
 F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, revoked_atomic_block, revoked_atomic_block);
 
+/* For block age extent cache */
+F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, hot_data_age_threshold, hot_data_age_threshold);
+F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, warm_data_age_threshold, warm_data_age_threshold);
+
 #define ATTR_LIST(name) (&f2fs_attr_##name.attr)
 static struct attribute *f2fs_attrs[] = {
 	ATTR_LIST(gc_urgent_sleep_time),
@@ -917,9 +960,11 @@ static struct attribute *f2fs_attrs[] = {
 	ATTR_LIST(min_discard_issue_time),
 	ATTR_LIST(mid_discard_issue_time),
 	ATTR_LIST(max_discard_issue_time),
+	ATTR_LIST(discard_urgent_util),
 	ATTR_LIST(discard_granularity),
+	ATTR_LIST(max_ordered_discard),
 	ATTR_LIST(pending_discard),
-	ATTR_LIST(batched_trim_sections),
+	ATTR_LIST(gc_mode),
 	ATTR_LIST(ipu_policy),
 	ATTR_LIST(min_ipu_util),
 	ATTR_LIST(min_fsync_blocks),
@@ -952,7 +997,7 @@ static struct attribute *f2fs_attrs[] = {
 #endif
 	ATTR_LIST(data_io_flag),
 	ATTR_LIST(node_io_flag),
-	ATTR_LIST(gc_urgent_high_remaining),
+	ATTR_LIST(gc_remaining_trials),
 	ATTR_LIST(ckpt_thread_ioprio),
 	ATTR_LIST(dirty_segments),
 	ATTR_LIST(free_segments),
@@ -995,6 +1040,8 @@ static struct attribute *f2fs_attrs[] = {
 	ATTR_LIST(peak_atomic_write),
 	ATTR_LIST(committed_atomic_block),
 	ATTR_LIST(revoked_atomic_block),
+	ATTR_LIST(hot_data_age_threshold),
+	ATTR_LIST(warm_data_age_threshold),
 	NULL,
 };
 ATTRIBUTE_GROUPS(f2fs);
@@ -1243,6 +1290,44 @@ static int __maybe_unused victim_bits_seq_show(struct seq_file *seq,
 	return 0;
 }
 
+static int __maybe_unused discard_plist_seq_show(struct seq_file *seq,
+						void *offset)
+{
+	struct super_block *sb = seq->private;
+	struct f2fs_sb_info *sbi = F2FS_SB(sb);
+	struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
+	int i, count;
+
+	seq_puts(seq, "Discard pend list(Show diacrd_cmd count on each entry, .:not exist):\n");
+	if (!f2fs_realtime_discard_enable(sbi))
+		return 0;
+
+	if (dcc) {
+		mutex_lock(&dcc->cmd_lock);
+		for (i = 0; i < MAX_PLIST_NUM; i++) {
+			struct list_head *pend_list;
+			struct discard_cmd *dc, *tmp;
+
+			if (i % 8 == 0)
+				seq_printf(seq, "  %-3d", i);
+			count = 0;
+			pend_list = &dcc->pend_list[i];
+			list_for_each_entry_safe(dc, tmp, pend_list, list)
+				count++;
+			if (count)
+				seq_printf(seq, " %7d", count);
+			else
+				seq_puts(seq, "       .");
+			if (i % 8 == 7)
+				seq_putc(seq, '\n');
+		}
+		seq_putc(seq, '\n');
+		mutex_unlock(&dcc->cmd_lock);
+	}
+
+	return 0;
+}
+
 int __init f2fs_init_sysfs(void)
 {
 	int ret;
@@ -1313,6 +1398,8 @@ int f2fs_register_sysfs(struct f2fs_sb_info *sbi)
 #endif
 		proc_create_single_data("victim_bits", 0444, sbi->s_proc,
 				victim_bits_seq_show, sb);
+		proc_create_single_data("discard_plist_info", 0444, sbi->s_proc,
+				discard_plist_seq_show, sb);
 	}
 	return 0;
 put_feature_list_kobj:
@@ -1336,6 +1423,7 @@ void f2fs_unregister_sysfs(struct f2fs_sb_info *sbi)
 		remove_proc_entry("segment_info", sbi->s_proc);
 		remove_proc_entry("segment_bits", sbi->s_proc);
 		remove_proc_entry("victim_bits", sbi->s_proc);
+		remove_proc_entry("discard_plist_info", sbi->s_proc);
 		remove_proc_entry(sbi->sb->s_id, f2fs_proc_root);
 	}
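
Most of the sysfs.c churn above is a mechanical conversion from sprintf() to sysfs_emit(), which clamps output to PAGE_SIZE and warns when the buffer is not the page sysfs handed in (the conversion also gave encoding_show and mounted_time_sec_show their missing trailing newlines). A minimal out-of-tree module sketch of the pattern; the module and attribute names are invented:

// SPDX-License-Identifier: GPL-2.0
#include <linux/kobject.h>
#include <linux/module.h>
#include <linux/sysfs.h>

static unsigned int demo_val = 42;
static struct kobject *demo_kobj;

static ssize_t demo_show(struct kobject *kobj, struct kobj_attribute *attr,
                         char *buf)
{
        /* was: return sprintf(buf, "%u\n", demo_val); */
        return sysfs_emit(buf, "%u\n", demo_val);
}

static struct kobj_attribute demo_attr = __ATTR_RO(demo);

static int __init demo_init(void)
{
        demo_kobj = kobject_create_and_add("sysfs_emit_demo", kernel_kobj);
        if (!demo_kobj)
                return -ENOMEM;
        return sysfs_create_file(demo_kobj, &demo_attr.attr);
}

static void __exit demo_exit(void)
{
        kobject_put(demo_kobj);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
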
 
diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 038ed0b..3a64fa7 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -52,3 +52,11 @@
 
 	  If you want to allow mounting a Virtio Filesystem with the "dax"
 	  option, answer Y.
+
+config FUSE_BPF
+	bool "Adds BPF to fuse"
+	depends on FUSE_FS
+	depends on BPF
+	help
+	  Extends FUSE by adding BPF to prefilter calls and potentially pass
+	  them to a backing file system.
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 0c48b35..096bd78 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -8,6 +8,8 @@
 obj-$(CONFIG_VIRTIO_FS) += virtiofs.o
 
 fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
+fuse-y += passthrough.o
 fuse-$(CONFIG_FUSE_DAX) += dax.o
+fuse-$(CONFIG_FUSE_BPF) += backing.o
 
 virtiofs-y := virtio_fs.o
diff --git a/fs/fuse/OWNERS b/fs/fuse/OWNERS
new file mode 100644
index 0000000..5ee6098
--- /dev/null
+++ b/fs/fuse/OWNERS
@@ -0,0 +1 @@
+balsini@google.com
diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
new file mode 100644
index 0000000..e784c17
--- /dev/null
+++ b/fs/fuse/backing.c
@@ -0,0 +1,2469 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * FUSE-BPF: Filesystem in Userspace with BPF
+ * Copyright (c) 2021 Google LLC
+ */
+
+#include "fuse_i.h"
+
+#include <linux/fdtable.h>
+#include <linux/filter.h>
+#include <linux/fs_stack.h>
+#include <linux/namei.h>
+
+#include "../internal.h"
+
+#define FUSE_BPF_IOCB_MASK (IOCB_APPEND | IOCB_DSYNC | IOCB_HIPRI | IOCB_NOWAIT | IOCB_SYNC)
+
+struct fuse_bpf_aio_req {
+	struct kiocb iocb;
+	refcount_t ref;
+	struct kiocb *iocb_orig;
+};
+
+static struct kmem_cache *fuse_bpf_aio_request_cachep;
+
+static void fuse_stat_to_attr(struct fuse_conn *fc, struct inode *inode,
+		struct kstat *stat, struct fuse_attr *attr);
+
+static void fuse_file_accessed(struct file *dst_file, struct file *src_file)
+{
+	struct inode *dst_inode;
+	struct inode *src_inode;
+
+	if (dst_file->f_flags & O_NOATIME)
+		return;
+
+	dst_inode = file_inode(dst_file);
+	src_inode = file_inode(src_file);
+
+	if ((!timespec64_equal(&dst_inode->i_mtime, &src_inode->i_mtime) ||
+	     !timespec64_equal(&dst_inode->i_ctime, &src_inode->i_ctime))) {
+		dst_inode->i_mtime = src_inode->i_mtime;
+		dst_inode->i_ctime = src_inode->i_ctime;
+	}
+
+	touch_atime(&dst_file->f_path);
+}
+
+int fuse_open_initialize(struct fuse_bpf_args *fa, struct fuse_open_io *foio,
+			 struct inode *inode, struct file *file, bool isdir)
+{
+	foio->foi = (struct fuse_open_in) {
+		.flags = file->f_flags & ~(O_CREAT | O_EXCL | O_NOCTTY),
+	};
+
+	foio->foo = (struct fuse_open_out) {0};
+
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_fuse_inode(inode)->nodeid,
+		.opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN,
+		.in_numargs = 1,
+		.out_numargs = 1,
+		.in_args[0] = (struct fuse_bpf_in_arg) {
+			.size = sizeof(foio->foi),
+			.value = &foio->foi,
+		},
+		.out_args[0] = (struct fuse_bpf_arg) {
+			.size = sizeof(foio->foo),
+			.value = &foio->foo,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_open_backing(struct fuse_bpf_args *fa,
+		      struct inode *inode, struct file *file, bool isdir)
+{
+	struct fuse_mount *fm = get_fuse_mount(inode);
+	const struct fuse_open_in *foi = fa->in_args[0].value;
+	struct fuse_file *ff;
+	int retval;
+	int mask;
+	struct fuse_dentry *fd = get_fuse_dentry(file->f_path.dentry);
+	struct file *backing_file;
+
+	ff = fuse_file_alloc(fm);
+	if (!ff)
+		return -ENOMEM;
+	file->private_data = ff;
+
+	switch (foi->flags & O_ACCMODE) {
+	case O_RDONLY:
+		mask = MAY_READ;
+		break;
+
+	case O_WRONLY:
+		mask = MAY_WRITE;
+		break;
+
+	case O_RDWR:
+		mask = MAY_READ | MAY_WRITE;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	retval = inode_permission(&init_user_ns,
+				  get_fuse_inode(inode)->backing_inode, mask);
+	if (retval)
+		return retval;
+
+	backing_file = dentry_open(&fd->backing_path,
+				   foi->flags,
+				   current_cred());
+
+	if (IS_ERR(backing_file)) {
+		fuse_file_free(ff);
+		file->private_data = NULL;
+		return PTR_ERR(backing_file);
+	}
+	ff->backing_file = backing_file;
+
+	return 0;
+}
+
+void *fuse_open_finalize(struct fuse_bpf_args *fa,
+		       struct inode *inode, struct file *file, bool isdir)
+{
+	struct fuse_file *ff = file->private_data;
+	struct fuse_open_out *foo = fa->out_args[0].value;
+
+	if (ff) {
+		ff->fh = foo->fh;
+		ff->nodeid = get_fuse_inode(inode)->nodeid;
+	}
+	return 0;
+}
+
+int fuse_create_open_initialize(
+		struct fuse_bpf_args *fa, struct fuse_create_open_io *fcoio,
+		struct inode *dir, struct dentry *entry,
+		struct file *file, unsigned int flags, umode_t mode)
+{
+	fcoio->fci = (struct fuse_create_in) {
+		.flags = file->f_flags & ~(O_CREAT | O_EXCL | O_NOCTTY),
+		.mode = mode,
+	};
+
+	fcoio->feo = (struct fuse_entry_out) {0};
+	fcoio->foo = (struct fuse_open_out) {0};
+
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_node_id(dir),
+		.opcode = FUSE_CREATE,
+		.in_numargs = 2,
+		.out_numargs = 2,
+		.in_args[0] = (struct fuse_bpf_in_arg) {
+			.size = sizeof(fcoio->fci),
+			.value = &fcoio->fci,
+		},
+		.in_args[1] = (struct fuse_bpf_in_arg) {
+			.size = entry->d_name.len + 1,
+			.value = entry->d_name.name,
+		},
+		.out_args[0] = (struct fuse_bpf_arg) {
+			.size = sizeof(fcoio->feo),
+			.value = &fcoio->feo,
+		},
+		.out_args[1] = (struct fuse_bpf_arg) {
+			.size = sizeof(fcoio->foo),
+			.value = &fcoio->foo,
+		},
+	};
+
+	return 0;
+}
+
+static int fuse_open_file_backing(struct inode *inode, struct file *file)
+{
+	struct fuse_mount *fm = get_fuse_mount(inode);
+	struct dentry *entry = file->f_path.dentry;
+	struct fuse_dentry *fuse_dentry = get_fuse_dentry(entry);
+	struct fuse_file *fuse_file;
+	struct file *backing_file;
+
+	fuse_file = fuse_file_alloc(fm);
+	if (!fuse_file)
+		return -ENOMEM;
+	file->private_data = fuse_file;
+
+	backing_file = dentry_open(&fuse_dentry->backing_path, file->f_flags,
+				   current_cred());
+	if (IS_ERR(backing_file)) {
+		fuse_file_free(fuse_file);
+		file->private_data = NULL;
+		return PTR_ERR(backing_file);
+	}
+	fuse_file->backing_file = backing_file;
+
+	return 0;
+}
+
+int fuse_create_open_backing(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry,
+		struct file *file, unsigned int flags, umode_t mode)
+{
+	struct fuse_inode *dir_fuse_inode = get_fuse_inode(dir);
+	struct fuse_dentry *dir_fuse_dentry = get_fuse_dentry(entry->d_parent);
+	struct dentry *backing_dentry = NULL;
+	struct inode *inode = NULL;
+	struct dentry *newent;
+	int err = 0;
+	const struct fuse_create_in *fci = fa->in_args[0].value;
+	struct inode *d_inode = entry->d_inode;
+	u64 target_nodeid = 0;
+
+	if (!dir_fuse_inode || !dir_fuse_dentry)
+		return -EIO;
+
+	inode_lock_nested(dir_fuse_inode->backing_inode, I_MUTEX_PARENT);
+	backing_dentry = lookup_one_len(fa->in_args[1].value,
+					dir_fuse_dentry->backing_path.dentry,
+					strlen(fa->in_args[1].value));
+	inode_unlock(dir_fuse_inode->backing_inode);
+
+	if (IS_ERR(backing_dentry))
+		return PTR_ERR(backing_dentry);
+
+	if (d_really_is_positive(backing_dentry)) {
+		err = -EIO;
+		goto out;
+	}
+
+	err = vfs_create(&init_user_ns,  dir_fuse_inode->backing_inode,
+			 backing_dentry, fci->mode, true);
+	if (err)
+		goto out;
+
+	if (get_fuse_dentry(entry)->backing_path.dentry)
+		path_put(&get_fuse_dentry(entry)->backing_path);
+	get_fuse_dentry(entry)->backing_path = (struct path) {
+		.mnt = dir_fuse_dentry->backing_path.mnt,
+		.dentry = backing_dentry,
+	};
+	path_get(&get_fuse_dentry(entry)->backing_path);
+
+	if (d_inode)
+		target_nodeid = get_fuse_inode(d_inode)->nodeid;
+
+	inode = fuse_iget_backing(dir->i_sb, target_nodeid,
+			get_fuse_dentry(entry)->backing_path.dentry->d_inode);
+	if (IS_ERR(inode)) {
+		err = PTR_ERR(inode);
+		goto out;
+	}
+
+	if (get_fuse_inode(inode)->bpf)
+		bpf_prog_put(get_fuse_inode(inode)->bpf);
+	get_fuse_inode(inode)->bpf = dir_fuse_inode->bpf;
+	if (get_fuse_inode(inode)->bpf)
+		bpf_prog_inc(dir_fuse_inode->bpf);
+
+	newent = d_splice_alias(inode, entry);
+	if (IS_ERR(newent)) {
+		err = PTR_ERR(newent);
+		goto out;
+	}
+
+	entry = newent ? newent : entry;
+	err = finish_open(file, entry, fuse_open_file_backing);
+
+out:
+	dput(backing_dentry);
+	return err;
+}
+
+void *fuse_create_open_finalize(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry,
+		struct file *file, unsigned int flags, umode_t mode)
+{
+	struct fuse_file *ff = file->private_data;
+	struct fuse_inode *fi = get_fuse_inode(file->f_inode);
+	struct fuse_entry_out *feo = fa->out_args[0].value;
+	struct fuse_open_out *foo = fa->out_args[1].value;
+
+	if (fi)
+		fi->nodeid = feo->nodeid;
+	if (ff)
+		ff->fh = foo->fh;
+	return 0;
+}
+
+int fuse_release_initialize(struct fuse_bpf_args *fa, struct fuse_release_in *fri,
+			    struct inode *inode, struct file *file)
+{
+	struct fuse_file *fuse_file = file->private_data;
+
+	/* Always put backing file whatever bpf/userspace says */
+	fput(fuse_file->backing_file);
+
+	*fri = (struct fuse_release_in) {
+		.fh = ((struct fuse_file *)(file->private_data))->fh,
+	};
+
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_fuse_inode(inode)->nodeid,
+		.opcode = FUSE_RELEASE,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(*fri),
+		.in_args[0].value = fri,
+	};
+
+	return 0;
+}
+
+int fuse_releasedir_initialize(struct fuse_bpf_args *fa,
+			struct fuse_release_in *fri,
+			struct inode *inode, struct file *file)
+{
+	struct fuse_file *fuse_file = file->private_data;
+
+	/* Always put backing file whatever bpf/userspace says */
+	fput(fuse_file->backing_file);
+
+	*fri = (struct fuse_release_in) {
+		.fh = ((struct fuse_file *)(file->private_data))->fh,
+	};
+
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_fuse_inode(inode)->nodeid,
+		.opcode = FUSE_RELEASEDIR,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(*fri),
+		.in_args[0].value = fri,
+	};
+
+	return 0;
+}
+
+int fuse_release_backing(struct fuse_bpf_args *fa,
+			 struct inode *inode, struct file *file)
+{
+	return 0;
+}
+
+void *fuse_release_finalize(struct fuse_bpf_args *fa,
+			    struct inode *inode, struct file *file)
+{
+	fuse_file_free(file->private_data);
+	return NULL;
+}
+
+int fuse_flush_initialize(struct fuse_bpf_args *fa, struct fuse_flush_in *ffi,
+			   struct file *file, fl_owner_t id)
+{
+	struct fuse_file *fuse_file = file->private_data;
+
+	*ffi = (struct fuse_flush_in) {
+		.fh = fuse_file->fh,
+	};
+
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_node_id(file->f_inode),
+		.opcode = FUSE_FLUSH,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(*ffi),
+		.in_args[0].value = ffi,
+		.flags = FUSE_BPF_FORCE,
+	};
+
+	return 0;
+}
+
+int fuse_flush_backing(struct fuse_bpf_args *fa, struct file *file, fl_owner_t id)
+{
+	struct fuse_file *fuse_file = file->private_data;
+	struct file *backing_file = fuse_file->backing_file;
+
+	if (backing_file->f_op->flush)
+		return backing_file->f_op->flush(backing_file, id);
+	return 0;
+}
+
+void *fuse_flush_finalize(struct fuse_bpf_args *fa, struct file *file, fl_owner_t id)
+{
+	return NULL;
+}
+
+int fuse_lseek_initialize(struct fuse_bpf_args *fa, struct fuse_lseek_io *flio,
+			  struct file *file, loff_t offset, int whence)
+{
+	struct fuse_file *fuse_file = file->private_data;
+
+	flio->fli = (struct fuse_lseek_in) {
+		.fh = fuse_file->fh,
+		.offset = offset,
+		.whence = whence,
+	};
+
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_node_id(file->f_inode),
+		.opcode = FUSE_LSEEK,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(flio->fli),
+		.in_args[0].value = &flio->fli,
+		.out_numargs = 1,
+		.out_args[0].size = sizeof(flio->flo),
+		.out_args[0].value = &flio->flo,
+	};
+
+	return 0;
+}
+
+int fuse_lseek_backing(struct fuse_bpf_args *fa, struct file *file, loff_t offset, int whence)
+{
+	const struct fuse_lseek_in *fli = fa->in_args[0].value;
+	struct fuse_lseek_out *flo = fa->out_args[0].value;
+	struct fuse_file *fuse_file = file->private_data;
+	struct file *backing_file = fuse_file->backing_file;
+	loff_t ret;
+
+	/* TODO: Handle changing of the file handle */
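+	/*
+	 * lseek(fd, 0, SEEK_CUR) and lseek(fd, 0, SEEK_SET) can be answered
+	 * from the upper file alone, without calling into the backing fs.
+	 */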
+	if (offset == 0) {
+		if (whence == SEEK_CUR) {
+			flo->offset = file->f_pos;
+			return flo->offset;
+		}
+
+		if (whence == SEEK_SET) {
+			flo->offset = vfs_setpos(file, 0, 0);
+			return flo->offset;
+		}
+	}
+
+	inode_lock(file->f_inode);
+	backing_file->f_pos = file->f_pos;
+	ret = vfs_llseek(backing_file, fli->offset, fli->whence);
+	flo->offset = ret;
+	inode_unlock(file->f_inode);
+	return ret;
+}
+
+void *fuse_lseek_finalize(struct fuse_bpf_args *fa, struct file *file, loff_t offset, int whence)
+{
+	struct fuse_lseek_out *flo = fa->out_args[0].value;
+
+	if (!fa->error_in)
+		file->f_pos = flo->offset;
+	return ERR_PTR(flo->offset);
+}
+
+int fuse_copy_file_range_initialize(struct fuse_bpf_args *fa, struct fuse_copy_file_range_io *fcf,
+				   struct file *file_in, loff_t pos_in, struct file *file_out,
+				   loff_t pos_out, size_t len, unsigned int flags)
+{
+	struct fuse_file *fuse_file_in = file_in->private_data;
+	struct fuse_file *fuse_file_out = file_out->private_data;
+
+	fcf->fci = (struct fuse_copy_file_range_in) {
+		.fh_in = fuse_file_in->fh,
+		.off_in = pos_in,
+		.nodeid_out = fuse_file_out->nodeid,
+		.fh_out = fuse_file_out->fh,
+		.off_out = pos_out,
+		.len = len,
+		.flags = flags,
+	};
+
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_node_id(file_in->f_inode),
+		.opcode = FUSE_COPY_FILE_RANGE,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(fcf->fci),
+		.in_args[0].value = &fcf->fci,
+		.out_numargs = 1,
+		.out_args[0].size = sizeof(fcf->fwo),
+		.out_args[0].value = &fcf->fwo,
+	};
+
+	return 0;
+}
+
+int fuse_copy_file_range_backing(struct fuse_bpf_args *fa, struct file *file_in, loff_t pos_in,
+				 struct file *file_out, loff_t pos_out, size_t len,
+				 unsigned int flags)
+{
+	const struct fuse_copy_file_range_in *fci = fa->in_args[0].value;
+	struct fuse_file *fuse_file_in = file_in->private_data;
+	struct file *backing_file_in = fuse_file_in->backing_file;
+	struct fuse_file *fuse_file_out = file_out->private_data;
+	struct file *backing_file_out = fuse_file_out->backing_file;
+
+	/* TODO: Handle changing of in/out files */
+	if (backing_file_out)
+		return vfs_copy_file_range(backing_file_in, fci->off_in, backing_file_out,
+					   fci->off_out, fci->len, fci->flags);
+	else
+		return generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
+					       flags);
+}
+
+void *fuse_copy_file_range_finalize(struct fuse_bpf_args *fa, struct file *file_in, loff_t pos_in,
+				    struct file *file_out, loff_t pos_out, size_t len,
+				    unsigned int flags)
+{
+	return NULL;
+}
+
+int fuse_fsync_initialize(struct fuse_bpf_args *fa, struct fuse_fsync_in *ffi,
+		   struct file *file, loff_t start, loff_t end, int datasync)
+{
+	struct fuse_file *fuse_file = file->private_data;
+
+	*ffi = (struct fuse_fsync_in) {
+		.fh = fuse_file->fh,
+		.fsync_flags = datasync ? FUSE_FSYNC_FDATASYNC : 0,
+	};
+
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_fuse_inode(file->f_inode)->nodeid,
+		.opcode = FUSE_FSYNC,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(*ffi),
+		.in_args[0].value = ffi,
+		.flags = FUSE_BPF_FORCE,
+	};
+
+	return 0;
+}
+
+int fuse_fsync_backing(struct fuse_bpf_args *fa,
+		   struct file *file, loff_t start, loff_t end, int datasync)
+{
+	struct fuse_file *fuse_file = file->private_data;
+	struct file *backing_file = fuse_file->backing_file;
+	const struct fuse_fsync_in *ffi = fa->in_args[0].value;
+	int new_datasync = (ffi->fsync_flags & FUSE_FSYNC_FDATASYNC) ? 1 : 0;
+
+	return vfs_fsync(backing_file, new_datasync);
+}
+
+void *fuse_fsync_finalize(struct fuse_bpf_args *fa,
+		   struct file *file, loff_t start, loff_t end, int datasync)
+{
+	return NULL;
+}
+
+int fuse_dir_fsync_initialize(struct fuse_bpf_args *fa, struct fuse_fsync_in *ffi,
+		   struct file *file, loff_t start, loff_t end, int datasync)
+{
+	struct fuse_file *fuse_file = file->private_data;
+
+	*ffi = (struct fuse_fsync_in) {
+		.fh = fuse_file->fh,
+		.fsync_flags = datasync ? FUSE_FSYNC_FDATASYNC : 0,
+	};
+
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_fuse_inode(file->f_inode)->nodeid,
+		.opcode = FUSE_FSYNCDIR,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(*ffi),
+		.in_args[0].value = ffi,
+		.flags = FUSE_BPF_FORCE,
+	};
+
+	return 0;
+}
+
+int fuse_getxattr_initialize(struct fuse_bpf_args *fa,
+		struct fuse_getxattr_io *fgio,
+		struct dentry *dentry, const char *name, void *value,
+		size_t size)
+{
+	*fgio = (struct fuse_getxattr_io) {
+		.fgi.size = size,
+	};
+
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_fuse_inode(dentry->d_inode)->nodeid,
+		.opcode = FUSE_GETXATTR,
+		.in_numargs = 2,
+		.out_numargs = 1,
+		.in_args[0] = (struct fuse_bpf_in_arg) {
+			.size = sizeof(fgio->fgi),
+			.value = &fgio->fgi,
+		},
+		.in_args[1] = (struct fuse_bpf_in_arg) {
+			.size = strlen(name) + 1,
+			.value = name,
+		},
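+		/*
+		 * A zero size means the caller only wants the attribute's
+		 * length (see getxattr(2)); answer via fuse_getxattr_out
+		 * instead of the variable-length buffer.
+		 */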
+		.flags = size ? FUSE_BPF_OUT_ARGVAR : 0,
+		.out_args[0].size = size ? size : sizeof(fgio->fgo),
+		.out_args[0].value = size ? value : &fgio->fgo,
+	};
+	return 0;
+}
+
+int fuse_getxattr_backing(struct fuse_bpf_args *fa,
+		struct dentry *dentry, const char *name, void *value,
+		size_t size)
+{
+	ssize_t ret = vfs_getxattr(&init_user_ns,
+				   get_fuse_dentry(dentry)->backing_path.dentry,
+				   fa->in_args[1].value, value, size);
+
+	if (fa->flags & FUSE_BPF_OUT_ARGVAR)
+		fa->out_args[0].size = ret;
+	else
+		((struct fuse_getxattr_out *)fa->out_args[0].value)->size = ret;
+
+	return 0;
+}
+
+void *fuse_getxattr_finalize(struct fuse_bpf_args *fa,
+		struct dentry *dentry, const char *name, void *value,
+		size_t size)
+{
+	struct fuse_getxattr_out *fgo;
+
+	if (fa->flags & FUSE_BPF_OUT_ARGVAR)
+		return ERR_PTR(fa->out_args[0].size);
+
+	fgo = fa->out_args[0].value;
+
+	return ERR_PTR(fgo->size);
+}
+
+int fuse_listxattr_initialize(struct fuse_bpf_args *fa,
+			      struct fuse_getxattr_io *fgio,
+			      struct dentry *dentry, char *list, size_t size)
+{
+	*fgio = (struct fuse_getxattr_io){
+		.fgi.size = size,
+	};
+
+	*fa = (struct fuse_bpf_args){
+		.nodeid = get_fuse_inode(dentry->d_inode)->nodeid,
+		.opcode = FUSE_LISTXATTR,
+		.in_numargs = 1,
+		.out_numargs = 1,
+		.in_args[0] =
+			(struct fuse_bpf_in_arg){
+				.size = sizeof(fgio->fgi),
+				.value = &fgio->fgi,
+			},
+		.flags = size ? FUSE_BPF_OUT_ARGVAR : 0,
+		.out_args[0].size = size ? size : sizeof(fgio->fgo),
+		.out_args[0].value = size ? (void *)list : &fgio->fgo,
+	};
+
+	return 0;
+}
+
+int fuse_listxattr_backing(struct fuse_bpf_args *fa, struct dentry *dentry,
+			   char *list, size_t size)
+{
+	ssize_t ret =
+		vfs_listxattr(get_fuse_dentry(dentry)->backing_path.dentry,
+			      list, size);
+
+	if (ret < 0)
+		return ret;
+
+	if (fa->flags & FUSE_BPF_OUT_ARGVAR)
+		fa->out_args[0].size = ret;
+	else
+		((struct fuse_getxattr_out *)fa->out_args[0].value)->size = ret;
+
+	return ret;
+}
+
+void *fuse_listxattr_finalize(struct fuse_bpf_args *fa, struct dentry *dentry,
+			      char *list, size_t size)
+{
+	struct fuse_getxattr_out *fgo;
+
+	if (fa->error_in)
+		return NULL;
+
+	if (fa->flags & FUSE_BPF_OUT_ARGVAR)
+		return ERR_PTR(fa->out_args[0].size);
+
+	fgo = fa->out_args[0].value;
+	return ERR_PTR(fgo->size);
+}
+
+int fuse_setxattr_initialize(struct fuse_bpf_args *fa,
+			     struct fuse_setxattr_in *fsxi,
+			     struct dentry *dentry, const char *name,
+			     const void *value, size_t size, int flags)
+{
+	*fsxi = (struct fuse_setxattr_in) {
+		.size = size,
+		.flags = flags,
+	};
+
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_fuse_inode(dentry->d_inode)->nodeid,
+		.opcode = FUSE_SETXATTR,
+		.in_numargs = 3,
+		.in_args[0] = (struct fuse_bpf_in_arg) {
+			.size = sizeof(*fsxi),
+			.value = fsxi,
+		},
+		.in_args[1] = (struct fuse_bpf_in_arg) {
+			.size = strlen(name) + 1,
+			.value = name,
+		},
+		.in_args[2] = (struct fuse_bpf_in_arg) {
+			.size = size,
+			.value = value,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_setxattr_backing(struct fuse_bpf_args *fa, struct dentry *dentry,
+			  const char *name, const void *value, size_t size,
+			  int flags)
+{
+	return vfs_setxattr(&init_user_ns,
+			    get_fuse_dentry(dentry)->backing_path.dentry, name,
+			    value, size, flags);
+}
+
+void *fuse_setxattr_finalize(struct fuse_bpf_args *fa, struct dentry *dentry,
+			     const char *name, const void *value, size_t size,
+			     int flags)
+{
+	return NULL;
+}
+
+int fuse_removexattr_initialize(struct fuse_bpf_args *fa,
+				struct fuse_dummy_io *unused,
+				struct dentry *dentry, const char *name)
+{
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_fuse_inode(dentry->d_inode)->nodeid,
+		.opcode = FUSE_REMOVEXATTR,
+		.in_numargs = 1,
+		.in_args[0] = (struct fuse_bpf_in_arg) {
+			.size = strlen(name) + 1,
+			.value = name,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_removexattr_backing(struct fuse_bpf_args *fa,
+			     struct dentry *dentry, const char *name)
+{
+	struct path *backing_path =
+		&get_fuse_dentry(dentry)->backing_path;
+
+	/* TODO account for changes of the name by prefilter */
+	return vfs_removexattr(&init_user_ns, backing_path->dentry, name);
+}
+
+void *fuse_removexattr_finalize(struct fuse_bpf_args *fa,
+				struct dentry *dentry, const char *name)
+{
+	return NULL;
+}
+
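+/*
+ * Async backing I/O: each request carries two references (see the
+ * refcount_set() calls in the read/write submission paths). The submitter
+ * drops one right after vfs_iocb_iter_{read,write}(); the cleanup handler
+ * drops the other once the I/O has completed.
+ */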
+static inline void fuse_bpf_aio_put(struct fuse_bpf_aio_req *aio_req)
+{
+	if (refcount_dec_and_test(&aio_req->ref))
+		kmem_cache_free(fuse_bpf_aio_request_cachep, aio_req);
+}
+
+static void fuse_bpf_aio_cleanup_handler(struct fuse_bpf_aio_req *aio_req)
+{
+	struct kiocb *iocb = &aio_req->iocb;
+	struct kiocb *iocb_orig = aio_req->iocb_orig;
+
+	if (iocb->ki_flags & IOCB_WRITE) {
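+		/*
+		 * Re-acquire the freeze protection that was handed over at
+		 * submission time so that file_end_write() balances.
+		 */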
+		__sb_writers_acquired(file_inode(iocb->ki_filp)->i_sb,
+				      SB_FREEZE_WRITE);
+		file_end_write(iocb->ki_filp);
+		fuse_copyattr(iocb_orig->ki_filp, iocb->ki_filp);
+	}
+	iocb_orig->ki_pos = iocb->ki_pos;
+	fuse_bpf_aio_put(aio_req);
+}
+
+static void fuse_bpf_aio_rw_complete(struct kiocb *iocb, long res)
+{
+	struct fuse_bpf_aio_req *aio_req =
+		container_of(iocb, struct fuse_bpf_aio_req, iocb);
+	struct kiocb *iocb_orig = aio_req->iocb_orig;
+
+	fuse_bpf_aio_cleanup_handler(aio_req);
+	iocb_orig->ki_complete(iocb_orig, res);
+}
+
+int fuse_file_read_iter_initialize(
+		struct fuse_bpf_args *fa, struct fuse_file_read_iter_io *fri,
+		struct kiocb *iocb, struct iov_iter *to)
+{
+	struct file *file = iocb->ki_filp;
+	struct fuse_file *ff = file->private_data;
+
+	fri->fri = (struct fuse_read_in) {
+		.fh = ff->fh,
+		.offset = iocb->ki_pos,
+		.size = to->count,
+	};
+
+	fri->frio = (struct fuse_read_iter_out) {
+		.ret = fri->fri.size,
+	};
+
+	/* TODO we can't assume 'to' is a kvec */
+	/* TODO we also can't assume the vector has only one component */
+	*fa = (struct fuse_bpf_args) {
+		.opcode = FUSE_READ,
+		.nodeid = ff->nodeid,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(fri->fri),
+		.in_args[0].value = &fri->fri,
+		.out_numargs = 1,
+		.out_args[0].size = sizeof(fri->frio),
+		.out_args[0].value = &fri->frio,
+		/*
+		 * TODO Design this properly.
+		 * Possible approach: do not pass buf to bpf
+		 * If going to userland, do a deep copy
+		 * For extra credit, do that to/from the vector, rather than
+		 * making an extra copy in the kernel
+		 */
+	};
+
+	return 0;
+}
+
+int fuse_file_read_iter_backing(struct fuse_bpf_args *fa,
+		struct kiocb *iocb, struct iov_iter *to)
+{
+	struct fuse_read_iter_out *frio = fa->out_args[0].value;
+	struct file *file = iocb->ki_filp;
+	struct fuse_file *ff = file->private_data;
+	ssize_t ret;
+
+	if (!iov_iter_count(to))
+		return 0;
+
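+	/* O_DIRECT only works if the backing address_space can do direct I/O */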
+	if ((iocb->ki_flags & IOCB_DIRECT) &&
+	    (!ff->backing_file->f_mapping->a_ops ||
+	     !ff->backing_file->f_mapping->a_ops->direct_IO))
+		return -EINVAL;
+
+	/* TODO This just plain ignores any change to fuse_read_in */
+	if (is_sync_kiocb(iocb)) {
+		ret = vfs_iter_read(ff->backing_file, to, &iocb->ki_pos,
+				iocb_to_rw_flags(iocb->ki_flags, FUSE_BPF_IOCB_MASK));
+	} else {
+		struct fuse_bpf_aio_req *aio_req;
+
+		ret = -ENOMEM;
+		aio_req = kmem_cache_zalloc(fuse_bpf_aio_request_cachep, GFP_KERNEL);
+		if (!aio_req)
+			goto out;
+
+		aio_req->iocb_orig = iocb;
+		kiocb_clone(&aio_req->iocb, iocb, ff->backing_file);
+		aio_req->iocb.ki_complete = fuse_bpf_aio_rw_complete;
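+		/*
+		 * Two refs: one dropped below once the request is submitted,
+		 * one dropped by the cleanup handler when the I/O completes.
+		 */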
+		refcount_set(&aio_req->ref, 2);
+		ret = vfs_iocb_iter_read(ff->backing_file, &aio_req->iocb, to);
+		fuse_bpf_aio_put(aio_req);
+		if (ret != -EIOCBQUEUED)
+			fuse_bpf_aio_cleanup_handler(aio_req);
+	}
+
+	frio->ret = ret;
+
+	/* TODO Need to point value at the buffer for post-modification */
+
+out:
+	fuse_file_accessed(file, ff->backing_file);
+
+	return ret;
+}
+
+void *fuse_file_read_iter_finalize(struct fuse_bpf_args *fa,
+		struct kiocb *iocb, struct iov_iter *to)
+{
+	struct fuse_read_iter_out *frio = fa->out_args[0].value;
+
+	return ERR_PTR(frio->ret);
+}
+
+int fuse_file_write_iter_initialize(
+		struct fuse_bpf_args *fa, struct fuse_file_write_iter_io *fwio,
+		struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	struct fuse_file *ff = file->private_data;
+
+	*fwio = (struct fuse_file_write_iter_io) {
+		.fwi.fh = ff->fh,
+		.fwi.offset = iocb->ki_pos,
+		.fwi.size = from->count,
+	};
+
+	/* TODO we can't assume 'from' is a kvec */
+	*fa = (struct fuse_bpf_args) {
+		.opcode = FUSE_WRITE,
+		.nodeid = ff->nodeid,
+		.in_numargs = 2,
+		.in_args[0].size = sizeof(fwio->fwi),
+		.in_args[0].value = &fwio->fwi,
+		.in_args[1].size = fwio->fwi.size,
+		.in_args[1].value = iov_iter_is_kvec(from)
+			? from->kvec->iov_base : NULL,
+		.out_numargs = 1,
+		.out_args[0].size = sizeof(fwio->fwio),
+		.out_args[0].value = &fwio->fwio,
+	};
+
+	return 0;
+}
+
+int fuse_file_write_iter_backing(struct fuse_bpf_args *fa,
+		struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	struct fuse_file *ff = file->private_data;
+	struct fuse_write_iter_out *fwio = fa->out_args[0].value;
+	ssize_t ret;
+
+	if (!iov_iter_count(from))
+		return 0;
+
+	/* TODO This just plain ignores any change to fuse_write_in */
+	/* TODO: uint32_t seems narrower than ssize_t; check for truncation */
+	inode_lock(file_inode(file));
+
+	fuse_copyattr(file, ff->backing_file);
+
+	if (is_sync_kiocb(iocb)) {
+		file_start_write(ff->backing_file);
+		ret = vfs_iter_write(ff->backing_file, from, &iocb->ki_pos,
+					   iocb_to_rw_flags(iocb->ki_flags, FUSE_BPF_IOCB_MASK));
+		file_end_write(ff->backing_file);
+
+		/* Must reflect change in size of backing file to upper file */
+		if (ret > 0)
+			fuse_copyattr(file, ff->backing_file);
+	} else {
+		struct fuse_bpf_aio_req *aio_req;
+
+		ret = -ENOMEM;
+		aio_req = kmem_cache_zalloc(fuse_bpf_aio_request_cachep, GFP_KERNEL);
+		if (!aio_req)
+			goto out;
+
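+		/*
+		 * Take write access to the backing sb and hand the freeze
+		 * protection over to the completion side; the cleanup handler
+		 * re-acquires it before calling file_end_write().
+		 */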
+		file_start_write(ff->backing_file);
+		__sb_writers_release(file_inode(ff->backing_file)->i_sb, SB_FREEZE_WRITE);
+		aio_req->iocb_orig = iocb;
+		kiocb_clone(&aio_req->iocb, iocb, ff->backing_file);
+		aio_req->iocb.ki_complete = fuse_bpf_aio_rw_complete;
+		refcount_set(&aio_req->ref, 2);
+		ret = vfs_iocb_iter_write(ff->backing_file, &aio_req->iocb, from);
+		fuse_bpf_aio_put(aio_req);
+		if (ret != -EIOCBQUEUED)
+			fuse_bpf_aio_cleanup_handler(aio_req);
+	}
+
+out:
+	inode_unlock(file_inode(file));
+	fwio->ret = ret;
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
+void *fuse_file_write_iter_finalize(struct fuse_bpf_args *fa,
+		struct kiocb *iocb, struct iov_iter *from)
+{
+	struct fuse_write_iter_out *fwio = fa->out_args[0].value;
+
+	return ERR_PTR(fwio->ret);
+}
+
+ssize_t fuse_backing_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	int ret;
+	struct fuse_file *ff = file->private_data;
+	struct inode *fuse_inode = file_inode(file);
+	struct file *backing_file = ff->backing_file;
+	struct inode *backing_inode = file_inode(backing_file);
+
+	if (!backing_file->f_op->mmap)
+		return -ENODEV;
+
+	if (WARN_ON(file != vma->vm_file))
+		return -EIO;
+
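+	/*
+	 * Point the vma at the backing file and rebalance the references:
+	 * on failure drop the extra ref on the backing file, on success drop
+	 * the ref the vma used to hold on the fuse file.
+	 */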
+	vma->vm_file = get_file(backing_file);
+
+	ret = call_mmap(vma->vm_file, vma);
+
+	if (ret)
+		fput(backing_file);
+	else
+		fput(file);
+
+	if (file->f_flags & O_NOATIME)
+		return ret;
+
+	if ((!timespec64_equal(&fuse_inode->i_mtime,
+			       &backing_inode->i_mtime) ||
+	     !timespec64_equal(&fuse_inode->i_ctime,
+			       &backing_inode->i_ctime))) {
+		fuse_inode->i_mtime = backing_inode->i_mtime;
+		fuse_inode->i_ctime = backing_inode->i_ctime;
+	}
+	touch_atime(&file->f_path);
+
+	return ret;
+}
+
+int fuse_file_fallocate_initialize(struct fuse_bpf_args *fa,
+		struct fuse_fallocate_in *ffi,
+		struct file *file, int mode, loff_t offset, loff_t length)
+{
+	struct fuse_file *ff = file->private_data;
+
+	*ffi = (struct fuse_fallocate_in) {
+		.fh = ff->fh,
+		.offset = offset,
+		.length = length,
+		.mode = mode,
+	};
+
+	*fa = (struct fuse_bpf_args) {
+		.opcode = FUSE_FALLOCATE,
+		.nodeid = ff->nodeid,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(*ffi),
+		.in_args[0].value = ffi,
+	};
+
+	return 0;
+}
+
+int fuse_file_fallocate_backing(struct fuse_bpf_args *fa,
+		struct file *file, int mode, loff_t offset, loff_t length)
+{
+	const struct fuse_fallocate_in *ffi = fa->in_args[0].value;
+	struct fuse_file *ff = file->private_data;
+
+	return vfs_fallocate(ff->backing_file, ffi->mode, ffi->offset,
+			     ffi->length);
+}
+
+void *fuse_file_fallocate_finalize(struct fuse_bpf_args *fa,
+		struct file *file, int mode, loff_t offset, loff_t length)
+{
+	return NULL;
+}
+
+/*******************************************************************************
+ * Directory operations after here                                             *
+ ******************************************************************************/
+
+int fuse_lookup_initialize(struct fuse_bpf_args *fa, struct fuse_lookup_io *fli,
+	       struct inode *dir, struct dentry *entry, unsigned int flags)
+{
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_fuse_inode(dir)->nodeid,
+		.opcode = FUSE_LOOKUP,
+		.in_numargs = 1,
+		.out_numargs = 2,
+		.flags = FUSE_BPF_OUT_ARGVAR,
+		.in_args[0] = (struct fuse_bpf_in_arg) {
+			.size = entry->d_name.len + 1,
+			.value = entry->d_name.name,
+		},
+		.out_args[0] = (struct fuse_bpf_arg) {
+			.size = sizeof(fli->feo),
+			.value = &fli->feo,
+		},
+		.out_args[1] = (struct fuse_bpf_arg) {
+			.size = sizeof(fli->feb.out),
+			.value = &fli->feb.out,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_lookup_backing(struct fuse_bpf_args *fa, struct inode *dir,
+			  struct dentry *entry, unsigned int flags)
+{
+	struct fuse_dentry *fuse_entry = get_fuse_dentry(entry);
+	struct fuse_dentry *dir_fuse_entry = get_fuse_dentry(entry->d_parent);
+	struct dentry *dir_backing_entry = dir_fuse_entry->backing_path.dentry;
+	struct inode *dir_backing_inode = dir_backing_entry->d_inode;
+	struct dentry *backing_entry;
+	struct fuse_entry_out *feo = (void *)fa->out_args[0].value;
+	struct kstat stat;
+	int err;
+
+	/* TODO this will not handle lookups over mount points */
+	inode_lock_nested(dir_backing_inode, I_MUTEX_PARENT);
+	backing_entry = lookup_one_len(entry->d_name.name, dir_backing_entry,
+					strlen(entry->d_name.name));
+	inode_unlock(dir_backing_inode);
+
+	if (IS_ERR(backing_entry))
+		return PTR_ERR(backing_entry);
+
+	fuse_entry->backing_path = (struct path) {
+		.dentry = backing_entry,
+		.mnt = mntget(dir_fuse_entry->backing_path.mnt),
+	};
+
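+	/*
+	 * Keep the negative backing dentry in backing_path and report the
+	 * miss through error_in rather than failing the call.
+	 */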
+	if (d_is_negative(backing_entry)) {
+		fa->error_in = -ENOENT;
+		return 0;
+	}
+
+	err = vfs_getattr(&fuse_entry->backing_path, &stat,
+				  STATX_BASIC_STATS, 0);
+	if (err) {
+		path_put_init(&fuse_entry->backing_path);
+		return err;
+	}
+
+	fuse_stat_to_attr(get_fuse_conn(dir),
+			  backing_entry->d_inode, &stat, &feo->attr);
+	return 0;
+}
+
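+/*
+ * Apply the backing_action returned by the lookup: KEEP leaves the backing
+ * inode/path from fuse_lookup_backing in place, REMOVE drops them, and
+ * REPLACE swaps in the inode and path of the file the daemon supplied via
+ * backing_fd.
+ */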
+int fuse_handle_backing(struct fuse_entry_bpf *feb, struct inode **backing_inode,
+			struct path *backing_path)
+{
+	switch (feb->out.backing_action) {
+	case FUSE_ACTION_KEEP:
+		/* backing inode/path are added in fuse_lookup_backing */
+		break;
+
+	case FUSE_ACTION_REMOVE:
+		iput(*backing_inode);
+		*backing_inode = NULL;
+		path_put_init(backing_path);
+		break;
+
+	case FUSE_ACTION_REPLACE: {
+		struct file *backing_file = feb->backing_file;
+
+		if (!backing_file)
+			return -EINVAL;
+		if (IS_ERR(backing_file))
+			return PTR_ERR(backing_file);
+
+		if (backing_inode)
+			iput(*backing_inode);
+		*backing_inode = backing_file->f_inode;
+		ihold(*backing_inode);
+
+		path_put(backing_path);
+		*backing_path = backing_file->f_path;
+		path_get(backing_path);
+
+		fput(backing_file);
+		break;
+	}
+
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
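+/*
+ * Apply the lookup's bpf_action: KEEP inherits the parent's bpf program
+ * (taking an extra reference), REMOVE leaves the node without a program,
+ * and REPLACE installs the program behind bpf_file.
+ */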
+int fuse_handle_bpf_prog(struct fuse_entry_bpf *feb, struct inode *parent,
+			 struct bpf_prog **bpf)
+{
+	struct fuse_inode *pi;
+
+	// Parent isn't present, but the action is to keep:
+	// don't touch the bpf program at all in this case
+	if (feb->out.bpf_action == FUSE_ACTION_KEEP && !parent)
+		goto out;
+
+	if (*bpf) {
+		bpf_prog_put(*bpf);
+		*bpf = NULL;
+	}
+
+	switch (feb->out.bpf_action) {
+	case FUSE_ACTION_KEEP:
+		pi = get_fuse_inode(parent);
+		*bpf = pi->bpf;
+		if (*bpf)
+			bpf_prog_inc(*bpf);
+		break;
+
+	case FUSE_ACTION_REMOVE:
+		break;
+
+	case FUSE_ACTION_REPLACE: {
+		struct file *bpf_file = feb->bpf_file;
+		struct bpf_prog *bpf_prog = ERR_PTR(-EINVAL);
+
+		if (bpf_file && !IS_ERR(bpf_file))
+			bpf_prog = fuse_get_bpf_prog(bpf_file);
+
+		if (IS_ERR(bpf_prog))
+			return PTR_ERR(bpf_prog);
+
+		*bpf = bpf_prog;
+		break;
+	}
+
+	default:
+		return -EINVAL;
+	}
+
+out:
+	return 0;
+}
+
+struct dentry *fuse_lookup_finalize(struct fuse_bpf_args *fa, struct inode *dir,
+			   struct dentry *entry, unsigned int flags)
+{
+	struct fuse_dentry *fd;
+	struct dentry *bd;
+	struct inode *inode, *backing_inode;
+	struct inode *d_inode = entry->d_inode;
+	struct fuse_entry_out *feo = fa->out_args[0].value;
+	struct fuse_entry_bpf_out *febo = fa->out_args[1].value;
+	struct fuse_entry_bpf *feb = container_of(febo, struct fuse_entry_bpf, out);
+	int error = -1;
+	u64 target_nodeid = 0;
+
+	fd = get_fuse_dentry(entry);
+	if (!fd)
+		return ERR_PTR(-EIO);
+	bd = fd->backing_path.dentry;
+	if (!bd)
+		return ERR_PTR(-ENOENT);
+	backing_inode = bd->d_inode;
+	if (!backing_inode)
+		return 0;
+
+	if (d_inode)
+		target_nodeid = get_fuse_inode(d_inode)->nodeid;
+
+	inode = fuse_iget_backing(dir->i_sb, target_nodeid, backing_inode);
+
+	if (IS_ERR(inode))
+		return ERR_CAST(inode);
+
+	error = fuse_handle_bpf_prog(feb, dir, &get_fuse_inode(inode)->bpf);
+	if (error)
+		return ERR_PTR(error);
+
+	error = fuse_handle_backing(feb, &get_fuse_inode(inode)->backing_inode, &fd->backing_path);
+	if (error)
+		return ERR_PTR(error);
+
+	get_fuse_inode(inode)->nodeid = feo->nodeid;
+
+	return d_splice_alias(inode, entry);
+}
+
+int fuse_revalidate_backing(struct dentry *entry, unsigned int flags)
+{
+	struct fuse_dentry *fuse_dentry = get_fuse_dentry(entry);
+	struct dentry *backing_entry = fuse_dentry->backing_path.dentry;
+
+	spin_lock(&backing_entry->d_lock);
+	if (d_unhashed(backing_entry)) {
+		spin_unlock(&backing_entry->d_lock);
+		return 0;
+	}
+	spin_unlock(&backing_entry->d_lock);
+
+	if (unlikely(backing_entry->d_flags & DCACHE_OP_REVALIDATE))
+		return backing_entry->d_op->d_revalidate(backing_entry, flags);
+	return 1;
+}
+
+int fuse_canonical_path_initialize(struct fuse_bpf_args *fa,
+				   struct fuse_dummy_io *fdi,
+				   const struct path *path,
+				   struct path *canonical_path)
+{
+	fa->opcode = FUSE_CANONICAL_PATH;
+	return 0;
+}
+
+int fuse_canonical_path_backing(struct fuse_bpf_args *fa, const struct path *path,
+				struct path *canonical_path)
+{
+	get_fuse_backing_path(path->dentry, canonical_path);
+	return 0;
+}
+
+void *fuse_canonical_path_finalize(struct fuse_bpf_args *fa,
+				   const struct path *path,
+				   struct path *canonical_path)
+{
+	return NULL;
+}
+
+int fuse_mknod_initialize(
+		struct fuse_bpf_args *fa, struct fuse_mknod_in *fmi,
+		struct inode *dir, struct dentry *entry, umode_t mode, dev_t rdev)
+{
+	*fmi = (struct fuse_mknod_in) {
+		.mode = mode,
+		.rdev = new_encode_dev(rdev),
+		.umask = current_umask(),
+	};
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_node_id(dir),
+		.opcode = FUSE_MKNOD,
+		.in_numargs = 2,
+		.in_args[0] = (struct fuse_bpf_in_arg) {
+			.size = sizeof(*fmi),
+			.value = fmi,
+		},
+		.in_args[1] = (struct fuse_bpf_in_arg) {
+			.size = entry->d_name.len + 1,
+			.value = entry->d_name.name,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_mknod_backing(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry, umode_t mode, dev_t rdev)
+{
+	int err = 0;
+	const struct fuse_mknod_in *fmi = fa->in_args[0].value;
+	struct fuse_inode *fuse_inode = get_fuse_inode(dir);
+	struct inode *backing_inode = fuse_inode->backing_inode;
+	struct path backing_path = {};
+	struct inode *inode = NULL;
+
+	//TODO Actually deal with changing the backing entry in mknod
+	get_fuse_backing_path(entry, &backing_path);
+	if (!backing_path.dentry)
+		return -EBADF;
+
+	inode_lock_nested(backing_inode, I_MUTEX_PARENT);
+	mode = fmi->mode;
+	if (!IS_POSIXACL(backing_inode))
+		mode &= ~fmi->umask;
+	err = vfs_mknod(&init_user_ns, backing_inode, backing_path.dentry,
+			mode, new_decode_dev(fmi->rdev));
+	inode_unlock(backing_inode);
+	if (err)
+		goto out;
+	if (d_really_is_negative(backing_path.dentry) ||
+		unlikely(d_unhashed(backing_path.dentry))) {
+		err = -EINVAL;
+		/*
+		 * TODO: overlayfs responds to this situation with a
+		 * lookup_one_len(). Should we do that too?
+		 */
+		goto out;
+	}
+	inode = fuse_iget_backing(dir->i_sb, fuse_inode->nodeid, backing_inode);
+	if (IS_ERR(inode)) {
+		err = PTR_ERR(inode);
+		goto out;
+	}
+	d_instantiate(entry, inode);
+out:
+	path_put(&backing_path);
+	return err;
+}
+
+void *fuse_mknod_finalize(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry, umode_t mode, dev_t rdev)
+{
+	return NULL;
+}
+
+int fuse_mkdir_initialize(
+		struct fuse_bpf_args *fa, struct fuse_mkdir_in *fmi,
+		struct inode *dir, struct dentry *entry, umode_t mode)
+{
+	*fmi = (struct fuse_mkdir_in) {
+		.mode = mode,
+		.umask = current_umask(),
+	};
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_node_id(dir),
+		.opcode = FUSE_MKDIR,
+		.in_numargs = 2,
+		.in_args[0] = (struct fuse_bpf_in_arg) {
+			.size = sizeof(*fmi),
+			.value = fmi,
+		},
+		.in_args[1] = (struct fuse_bpf_in_arg) {
+			.size = entry->d_name.len + 1,
+			.value = entry->d_name.name,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_mkdir_backing(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry, umode_t mode)
+{
+	int err = 0;
+	const struct fuse_mkdir_in *fmi = fa->in_args[0].value;
+	struct fuse_inode *fuse_inode = get_fuse_inode(dir);
+	struct inode *backing_inode = fuse_inode->backing_inode;
+	struct path backing_path = {};
+	struct inode *inode = NULL;
+	struct dentry *d;
+
+	//TODO Actually deal with changing the backing entry in mkdir
+	get_fuse_backing_path(entry, &backing_path);
+	if (!backing_path.dentry)
+		return -EBADF;
+
+	inode_lock_nested(backing_inode, I_MUTEX_PARENT);
+	mode = fmi->mode;
+	if (!IS_POSIXACL(backing_inode))
+		mode &= ~fmi->umask;
+	err = vfs_mkdir(&init_user_ns, backing_inode, backing_path.dentry, mode);
+	if (err)
+		goto out;
+	if (d_really_is_negative(backing_path.dentry) ||
+		unlikely(d_unhashed(backing_path.dentry))) {
+		d = lookup_one_len(entry->d_name.name, backing_path.dentry->d_parent,
+				entry->d_name.len);
+		if (IS_ERR(d)) {
+			err = PTR_ERR(d);
+			goto out;
+		}
+		dput(backing_path.dentry);
+		backing_path.dentry = d;
+	}
+	inode = fuse_iget_backing(dir->i_sb, fuse_inode->nodeid, backing_inode);
+	if (IS_ERR(inode)) {
+		err = PTR_ERR(inode);
+		goto out;
+	}
+	d_instantiate(entry, inode);
+out:
+	inode_unlock(backing_inode);
+	path_put(&backing_path);
+	return err;
+}
+
+void *fuse_mkdir_finalize(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry, umode_t mode)
+{
+	return NULL;
+}
+
+int fuse_rmdir_initialize(
+		struct fuse_bpf_args *fa, struct fuse_dummy_io *dummy,
+		struct inode *dir, struct dentry *entry)
+{
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_node_id(dir),
+		.opcode = FUSE_RMDIR,
+		.in_numargs = 1,
+		.in_args[0] = (struct fuse_bpf_in_arg) {
+			.size = entry->d_name.len + 1,
+			.value = entry->d_name.name,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_rmdir_backing(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry)
+{
+	int err = 0;
+	struct path backing_path = {};
+	struct dentry *backing_parent_dentry;
+	struct inode *backing_inode;
+
+	/* TODO Actually deal with changing the backing entry in rmdir */
+	get_fuse_backing_path(entry, &backing_path);
+	if (!backing_path.dentry)
+		return -EBADF;
+
+	/* TODO Not sure if we should reverify like overlayfs, or get inode from d_parent */
+	backing_parent_dentry = dget_parent(backing_path.dentry);
+	backing_inode = d_inode(backing_parent_dentry);
+
+	inode_lock_nested(backing_inode, I_MUTEX_PARENT);
+	err = vfs_rmdir(&init_user_ns, backing_inode, backing_path.dentry);
+	inode_unlock(backing_inode);
+
+	dput(backing_parent_dentry);
+	if (!err)
+		d_drop(entry);
+	path_put(&backing_path);
+	return err;
+}
+
+void *fuse_rmdir_finalize(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry)
+{
+	return NULL;
+}
+
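+/*
+ * Rename on the backing fs, following the usual VFS pattern: lock both
+ * parent directories with lock_rename() and use the returned trap dentry
+ * to reject renames that would move a directory below itself.
+ */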
+static int fuse_rename_backing_common(
+			 struct inode *olddir, struct dentry *oldent,
+			 struct inode *newdir, struct dentry *newent,
+			 unsigned int flags)
+{
+	int err = 0;
+	struct path old_backing_path;
+	struct path new_backing_path;
+	struct dentry *old_backing_dir_dentry;
+	struct dentry *old_backing_dentry;
+	struct dentry *new_backing_dir_dentry;
+	struct dentry *new_backing_dentry;
+	struct dentry *trap = NULL;
+	struct inode *target_inode;
+	struct renamedata rd;
+
+	//TODO Actually deal with changing anything that isn't a flag
+	get_fuse_backing_path(oldent, &old_backing_path);
+	if (!old_backing_path.dentry)
+		return -EBADF;
+	get_fuse_backing_path(newent, &new_backing_path);
+	if (!new_backing_path.dentry) {
+		/*
+		 * TODO A file being moved from a backing path to another
+		 * backing path which is not yet instrumented with FUSE-BPF.
+		 * This may be slow and should be substituted with something
+		 * more clever.
+		 */
+		err = -EXDEV;
+		goto put_old_path;
+	}
+	if (new_backing_path.mnt != old_backing_path.mnt) {
+		err = -EXDEV;
+		goto put_new_path;
+	}
+	old_backing_dentry = old_backing_path.dentry;
+	new_backing_dentry = new_backing_path.dentry;
+	old_backing_dir_dentry = dget_parent(old_backing_dentry);
+	new_backing_dir_dentry = dget_parent(new_backing_dentry);
+	target_inode = d_inode(newent);
+
+	trap = lock_rename(old_backing_dir_dentry, new_backing_dir_dentry);
+	if (trap == old_backing_dentry) {
+		err = -EINVAL;
+		goto put_parents;
+	}
+	if (trap == new_backing_dentry) {
+		err = -ENOTEMPTY;
+		goto put_parents;
+	}
+	rd = (struct renamedata) {
+		.old_mnt_userns = &init_user_ns,
+		.old_dir = d_inode(old_backing_dir_dentry),
+		.old_dentry = old_backing_dentry,
+		.new_mnt_userns = &init_user_ns,
+		.new_dir = d_inode(new_backing_dir_dentry),
+		.new_dentry = new_backing_dentry,
+		.flags = flags,
+	};
+	err = vfs_rename(&rd);
+	if (err)
+		goto unlock;
+	if (target_inode)
+		fsstack_copy_attr_all(target_inode,
+				get_fuse_inode(target_inode)->backing_inode);
+	fsstack_copy_attr_all(d_inode(oldent), d_inode(old_backing_dentry));
+unlock:
+	unlock_rename(old_backing_dir_dentry, new_backing_dir_dentry);
+put_parents:
+	dput(new_backing_dir_dentry);
+	dput(old_backing_dir_dentry);
+put_new_path:
+	path_put(&new_backing_path);
+put_old_path:
+	path_put(&old_backing_path);
+	return err;
+}
+
+int fuse_rename2_initialize(struct fuse_bpf_args *fa, struct fuse_rename2_in *fri,
+			    struct inode *olddir, struct dentry *oldent,
+			    struct inode *newdir, struct dentry *newent,
+			    unsigned int flags)
+{
+	*fri = (struct fuse_rename2_in) {
+		.newdir = get_node_id(newdir),
+		.flags = flags,
+	};
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_node_id(olddir),
+		.opcode = FUSE_RENAME2,
+		.in_numargs = 3,
+		.in_args[0] = (struct fuse_bpf_in_arg) {
+			.size = sizeof(*fri),
+			.value = fri,
+		},
+		.in_args[1] = (struct fuse_bpf_in_arg) {
+			.size = oldent->d_name.len + 1,
+			.value = oldent->d_name.name,
+		},
+		.in_args[2] = (struct fuse_bpf_in_arg) {
+			.size = newent->d_name.len + 1,
+			.value = newent->d_name.name,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_rename2_backing(struct fuse_bpf_args *fa,
+			 struct inode *olddir, struct dentry *oldent,
+			 struct inode *newdir, struct dentry *newent,
+			 unsigned int flags)
+{
+	const struct fuse_rename2_in *fri = fa->in_args[0].value;
+
+	/* TODO: deal with changing dirs/ents */
+	return fuse_rename_backing_common(olddir, oldent, newdir, newent, fri->flags);
+}
+
+void *fuse_rename2_finalize(struct fuse_bpf_args *fa,
+			    struct inode *olddir, struct dentry *oldent,
+			    struct inode *newdir, struct dentry *newent,
+			    unsigned int flags)
+{
+	return NULL;
+}
+
+int fuse_rename_initialize(struct fuse_bpf_args *fa, struct fuse_rename_in *fri,
+			   struct inode *olddir, struct dentry *oldent,
+			   struct inode *newdir, struct dentry *newent)
+{
+	*fri = (struct fuse_rename_in) {
+		.newdir = get_node_id(newdir),
+	};
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_node_id(olddir),
+		.opcode = FUSE_RENAME,
+		.in_numargs = 3,
+		.in_args[0] = (struct fuse_bpf_in_arg) {
+			.size = sizeof(*fri),
+			.value = fri,
+		},
+		.in_args[1] = (struct fuse_bpf_in_arg) {
+			.size = oldent->d_name.len + 1,
+			.value = oldent->d_name.name,
+		},
+		.in_args[2] = (struct fuse_bpf_in_arg) {
+			.size = newent->d_name.len + 1,
+			.value = newent->d_name.name,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_rename_backing(struct fuse_bpf_args *fa,
+			struct inode *olddir, struct dentry *oldent,
+			struct inode *newdir, struct dentry *newent)
+{
+	/* TODO: deal with changing dirs/ents */
+	return fuse_rename_backing_common(olddir, oldent, newdir, newent, 0);
+}
+
+void *fuse_rename_finalize(struct fuse_bpf_args *fa,
+			   struct inode *olddir, struct dentry *oldent,
+			   struct inode *newdir, struct dentry *newent)
+{
+	return NULL;
+}
+
+int fuse_unlink_initialize(
+		struct fuse_bpf_args *fa, struct fuse_dummy_io *dummy,
+		struct inode *dir, struct dentry *entry)
+{
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_node_id(dir),
+		.opcode = FUSE_UNLINK,
+		.in_numargs = 1,
+		.in_args[0] = (struct fuse_bpf_in_arg) {
+			.size = entry->d_name.len + 1,
+			.value = entry->d_name.name,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_unlink_backing(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry)
+{
+	int err = 0;
+	struct path backing_path = {};
+	struct dentry *backing_parent_dentry;
+	struct inode *backing_inode;
+
+	/* TODO Actually deal with changing the backing entry in unlink */
+	get_fuse_backing_path(entry, &backing_path);
+	if (!backing_path.dentry)
+		return -EBADF;
+
+	/* TODO Not sure if we should reverify like overlayfs, or get inode from d_parent */
+	backing_parent_dentry = dget_parent(backing_path.dentry);
+	backing_inode = d_inode(backing_parent_dentry);
+
+	inode_lock_nested(backing_inode, I_MUTEX_PARENT);
+	err = vfs_unlink(&init_user_ns, backing_inode, backing_path.dentry, NULL);
+	inode_unlock(backing_inode);
+
+	dput(backing_parent_dentry);
+	if (!err)
+		d_drop(entry);
+	path_put(&backing_path);
+	return err;
+}
+
+void *fuse_unlink_finalize(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry)
+{
+	return NULL;
+}
+
+int fuse_link_initialize(struct fuse_bpf_args *fa, struct fuse_link_in *fli,
+			 struct dentry *entry, struct inode *dir,
+			 struct dentry *newent)
+{
+	struct inode *src_inode = entry->d_inode;
+
+	*fli = (struct fuse_link_in){
+		.oldnodeid = get_node_id(src_inode),
+	};
+
+	fa->opcode = FUSE_LINK;
+	fa->in_numargs = 2;
+	fa->in_args[0].size = sizeof(*fli);
+	fa->in_args[0].value = fli;
+	fa->in_args[1].size = newent->d_name.len + 1;
+	fa->in_args[1].value = newent->d_name.name;
+
+	return 0;
+}
+
+int fuse_link_backing(struct fuse_bpf_args *fa, struct dentry *entry,
+		      struct inode *dir, struct dentry *newent)
+{
+	int err = 0;
+	struct path backing_old_path = {};
+	struct path backing_new_path = {};
+	struct dentry *backing_dir_dentry;
+	struct inode *fuse_new_inode = NULL;
+	struct fuse_inode *fuse_dir_inode = get_fuse_inode(dir);
+	struct inode *backing_dir_inode = fuse_dir_inode->backing_inode;
+
+	get_fuse_backing_path(entry, &backing_old_path);
+	if (!backing_old_path.dentry)
+		return -EBADF;
+
+	get_fuse_backing_path(newent, &backing_new_path);
+	if (!backing_new_path.dentry) {
+		err = -EBADF;
+		goto err_dst_path;
+	}
+
+	backing_dir_dentry = dget_parent(backing_new_path.dentry);
+	backing_dir_inode = d_inode(backing_dir_dentry);
+
+	inode_lock_nested(backing_dir_inode, I_MUTEX_PARENT);
+	err = vfs_link(backing_old_path.dentry, &init_user_ns,
+		       backing_dir_inode, backing_new_path.dentry, NULL);
+	inode_unlock(backing_dir_inode);
+	if (err)
+		goto out;
+
+	if (d_really_is_negative(backing_new_path.dentry) ||
+	    unlikely(d_unhashed(backing_new_path.dentry))) {
+		err = -EINVAL;
+		/*
+		 * TODO: overlayfs responds to this situation with a
+		 * lookup_one_len(). Should we do that too?
+		 */
+		goto out;
+	}
+
+	fuse_new_inode = fuse_iget_backing(dir->i_sb, fuse_dir_inode->nodeid, backing_dir_inode);
+	if (IS_ERR(fuse_new_inode)) {
+		err = PTR_ERR(fuse_new_inode);
+		goto out;
+	}
+	d_instantiate(newent, fuse_new_inode);
+
+out:
+	dput(backing_dir_dentry);
+	path_put(&backing_new_path);
+err_dst_path:
+	path_put(&backing_old_path);
+	return err;
+}
+
+void *fuse_link_finalize(struct fuse_bpf_args *fa, struct dentry *entry,
+			 struct inode *dir, struct dentry *newent)
+{
+	return NULL;
+}
+
+int fuse_getattr_initialize(struct fuse_bpf_args *fa, struct fuse_getattr_io *fgio,
+			const struct dentry *entry, struct kstat *stat,
+			u32 request_mask, unsigned int flags)
+{
+	fgio->fgi = (struct fuse_getattr_in) {
+		.getattr_flags = flags,
+		.fh = -1, /* TODO is this OK? */
+	};
+
+	fgio->fao = (struct fuse_attr_out) {0};
+
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_node_id(entry->d_inode),
+		.opcode = FUSE_GETATTR,
+		.in_numargs = 1,
+		.out_numargs = 1,
+		.in_args[0] = (struct fuse_bpf_in_arg) {
+			.size = sizeof(fgio->fgi),
+			.value = &fgio->fgi,
+		},
+		.out_args[0] = (struct fuse_bpf_arg) {
+			.size = sizeof(fgio->fao),
+			.value = &fgio->fao,
+		},
+	};
+
+	return 0;
+}
+
+static void fuse_stat_to_attr(struct fuse_conn *fc, struct inode *inode,
+		struct kstat *stat, struct fuse_attr *attr)
+{
+	unsigned int blkbits;
+
+	/* see the comment in fuse_change_attributes() */
+	if (fc->writeback_cache && S_ISREG(inode->i_mode)) {
+		stat->size = i_size_read(inode);
+		stat->mtime.tv_sec = inode->i_mtime.tv_sec;
+		stat->mtime.tv_nsec = inode->i_mtime.tv_nsec;
+		stat->ctime.tv_sec = inode->i_ctime.tv_sec;
+		stat->ctime.tv_nsec = inode->i_ctime.tv_nsec;
+	}
+
+	attr->ino = stat->ino;
+	attr->mode = (inode->i_mode & S_IFMT) | (stat->mode & 07777);
+	attr->nlink = stat->nlink;
+	attr->uid = from_kuid(fc->user_ns, stat->uid);
+	attr->gid = from_kgid(fc->user_ns, stat->gid);
+	attr->atime = stat->atime.tv_sec;
+	attr->atimensec = stat->atime.tv_nsec;
+	attr->mtime = stat->mtime.tv_sec;
+	attr->mtimensec = stat->mtime.tv_nsec;
+	attr->ctime = stat->ctime.tv_sec;
+	attr->ctimensec = stat->ctime.tv_nsec;
+	attr->size = stat->size;
+	attr->blocks = stat->blocks;
+
+	if (stat->blksize != 0)
+		blkbits = ilog2(stat->blksize);
+	else
+		blkbits = inode->i_sb->s_blocksize_bits;
+
+	attr->blksize = 1 << blkbits;
+}
+
+int fuse_getattr_backing(struct fuse_bpf_args *fa,
+		const struct dentry *entry, struct kstat *stat,
+		u32 request_mask, unsigned int flags)
+{
+	struct path *backing_path =
+		&get_fuse_dentry(entry)->backing_path;
+	struct inode *backing_inode = backing_path->dentry->d_inode;
+	struct fuse_attr_out *fao = fa->out_args[0].value;
+	struct kstat tmp;
+	int err;
+
+	if (!stat)
+		stat = &tmp;
+
+	err = vfs_getattr(backing_path, stat, request_mask, flags);
+
+	if (!err)
+		fuse_stat_to_attr(get_fuse_conn(entry->d_inode),
+				  backing_inode, stat, &fao->attr);
+
+	return err;
+}
+
+void *fuse_getattr_finalize(struct fuse_bpf_args *fa,
+			const struct dentry *entry, struct kstat *stat,
+			u32 request_mask, unsigned int flags)
+{
+	struct fuse_attr_out *outarg = fa->out_args[0].value;
+	struct inode *inode = entry->d_inode;
+	u64 attr_version = fuse_get_attr_version(get_fuse_mount(inode)->fc);
+	int err = 0;
+
+	/* TODO: Ensure this doesn't happen if we had an error getting attrs in
+	 * backing.
+	 */
+	err = finalize_attr(inode, outarg, attr_version, stat);
+	return ERR_PTR(err);
+}
+
+static void fattr_to_iattr(struct fuse_conn *fc,
+			   const struct fuse_setattr_in *arg,
+			   struct iattr *iattr)
+{
+	unsigned int fvalid = arg->valid;
+
+	if (fvalid & FATTR_MODE) {
+		iattr->ia_valid |= ATTR_MODE;
+		iattr->ia_mode = arg->mode;
+	}
+	if (fvalid & FATTR_UID) {
+		iattr->ia_valid |= ATTR_UID;
+		iattr->ia_uid = make_kuid(fc->user_ns, arg->uid);
+	}
+	if (fvalid & FATTR_GID) {
+		iattr->ia_valid |= ATTR_GID;
+		iattr->ia_gid = make_kgid(fc->user_ns, arg->gid);
+	}
+	if (fvalid & FATTR_SIZE) {
+		iattr->ia_valid |= ATTR_SIZE;
+		iattr->ia_size = arg->size;
+	}
+	if (fvalid & FATTR_ATIME) {
+		iattr->ia_valid |= ATTR_ATIME;
+		iattr->ia_atime.tv_sec = arg->atime;
+		iattr->ia_atime.tv_nsec = arg->atimensec;
+		if (!(fvalid & FATTR_ATIME_NOW))
+			iattr->ia_valid |= ATTR_ATIME_SET;
+	}
+	if (fvalid & FATTR_MTIME) {
+		iattr->ia_valid |= ATTR_MTIME;
+		iattr->ia_mtime.tv_sec = arg->mtime;
+		iattr->ia_mtime.tv_nsec = arg->mtimensec;
+		if (!(fvalid & FATTR_MTIME_NOW))
+			iattr->ia_valid |= ATTR_MTIME_SET;
+	}
+	if (fvalid & FATTR_CTIME) {
+		iattr->ia_valid |= ATTR_CTIME;
+		iattr->ia_ctime.tv_sec = arg->ctime;
+		iattr->ia_ctime.tv_nsec = arg->ctimensec;
+	}
+}
+
+int fuse_setattr_initialize(struct fuse_bpf_args *fa, struct fuse_setattr_io *fsio,
+		struct dentry *dentry, struct iattr *attr, struct file *file)
+{
+	struct fuse_conn *fc = get_fuse_conn(dentry->d_inode);
+
+	*fsio = (struct fuse_setattr_io) {0};
+	iattr_to_fattr(fc, attr, &fsio->fsi, true);
+
+	*fa = (struct fuse_bpf_args) {
+		.opcode = FUSE_SETATTR,
+		.nodeid = get_node_id(dentry->d_inode),
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(fsio->fsi),
+		.in_args[0].value = &fsio->fsi,
+		.out_numargs = 1,
+		.out_args[0].size = sizeof(fsio->fao),
+		.out_args[0].value = &fsio->fao,
+	};
+
+	return 0;
+}
+
+int fuse_setattr_backing(struct fuse_bpf_args *fa,
+		struct dentry *dentry, struct iattr *attr, struct file *file)
+{
+	struct fuse_conn *fc = get_fuse_conn(dentry->d_inode);
+	const struct fuse_setattr_in *fsi = fa->in_args[0].value;
+	struct iattr new_attr = {0};
+	struct path *backing_path = &get_fuse_dentry(dentry)->backing_path;
+	int res;
+
+	fattr_to_iattr(fc, fsi, &new_attr);
+	/* TODO: Some info doesn't get saved by the attr->fattr->attr transition
+	 * When we actually allow the bpf to change these, we may have to consider
+	 * the extra flags more, or pass more info into the bpf. Until then we can
+	 * keep everything except for ATTR_FILE, since we'd need a file on the
+	 * lower fs. For what it's worth, neither f2fs nor ext4 make use of that
+	 * even if it is present.
+	 */
+	new_attr.ia_valid = attr->ia_valid & ~ATTR_FILE;
+	inode_lock(d_inode(backing_path->dentry));
+	res = notify_change(&init_user_ns, backing_path->dentry, &new_attr,
+			    NULL);
+	inode_unlock(d_inode(backing_path->dentry));
+
+	if (res == 0 && (new_attr.ia_valid & ATTR_SIZE))
+		i_size_write(dentry->d_inode, new_attr.ia_size);
+	return res;
+}
+
+void *fuse_setattr_finalize(struct fuse_bpf_args *fa,
+		struct dentry *dentry, struct iattr *attr, struct file *file)
+{
+	return NULL;
+}
+
+int fuse_statfs_initialize(
+		struct fuse_bpf_args *fa, struct fuse_statfs_out *fso,
+		struct dentry *dentry, struct kstatfs *buf)
+{
+	*fso = (struct fuse_statfs_out) {0};
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_node_id(d_inode(dentry)),
+		.opcode = FUSE_STATFS,
+		.out_numargs = 1,
+		.out_args[0].size = sizeof(*fso),
+		.out_args[0].value = fso,
+	};
+
+	return 0;
+}
+
+int fuse_statfs_backing(
+		struct fuse_bpf_args *fa,
+		struct dentry *dentry, struct kstatfs *buf)
+{
+	int err = 0;
+	struct path backing_path;
+	struct fuse_statfs_out *fso = fa->out_args[0].value;
+
+	get_fuse_backing_path(dentry, &backing_path);
+	if (!backing_path.dentry)
+		return -EBADF;
+	err = vfs_statfs(&backing_path, buf);
+	path_put(&backing_path);
+	buf->f_type = FUSE_SUPER_MAGIC;
+
+	//TODO Provide postfilter opportunity to modify
+	if (!err)
+		convert_statfs_to_fuse(&fso->st, buf);
+
+	return err;
+}
+
+void *fuse_statfs_finalize(
+		struct fuse_bpf_args *fa,
+		struct dentry *dentry, struct kstatfs *buf)
+{
+	struct fuse_statfs_out *fso = fa->out_args[0].value;
+
+	if (!fa->error_in)
+		convert_fuse_statfs(buf, &fso->st);
+	return NULL;
+}
+
+int fuse_get_link_initialize(struct fuse_bpf_args *fa, struct fuse_dummy_io *unused,
+		struct inode *inode, struct dentry *dentry,
+		struct delayed_call *callback, const char **out)
+{
+	/*
+	 * TODO
+	 * If we want to handle changing these things, we'll need to copy
+	 * the lower fs's data into our own buffer, and provide our own callback
+	 * to free that buffer.
+	 *
+	 * Pre could change the name we're looking at
+	 * postfilter can change the name we return
+	 *
+	 * We ought to only make that buffer if it's been requested, so leaving
+	 * this unimplemented for the moment
+	 */
+	*fa = (struct fuse_bpf_args) {
+		.opcode = FUSE_READLINK,
+		.nodeid = get_node_id(inode),
+		.in_numargs = 1,
+		.in_args[0] = (struct fuse_bpf_in_arg) {
+			.size = dentry->d_name.len + 1,
+			.value = dentry->d_name.name,
+		},
+		/*
+		 * .out_argvar = 1,
+		 * .out_numargs = 1,
+		 * .out_args[0].size = ,
+		 * .out_args[0].value = ,
+		 */
+	};
+
+	return 0;
+}
+
+int fuse_get_link_backing(struct fuse_bpf_args *fa,
+		struct inode *inode, struct dentry *dentry,
+		struct delayed_call *callback, const char **out)
+{
+	struct path backing_path;
+
+	if (!dentry) {
+		*out = ERR_PTR(-ECHILD);
+		return PTR_ERR(*out);
+	}
+
+	get_fuse_backing_path(dentry, &backing_path);
+	if (!backing_path.dentry) {
+		*out = ERR_PTR(-ECHILD);
+		return PTR_ERR(*out);
+	}
+
+	/*
+	 * TODO: If we want to do our own thing, copy the data and then call the
+	 * callback
+	 */
+	*out = vfs_get_link(backing_path.dentry, callback);
+
+	path_put(&backing_path);
+	return 0;
+}
+
+void *fuse_get_link_finalize(struct fuse_bpf_args *fa,
+		struct inode *inode, struct dentry *dentry,
+		struct delayed_call *callback, const char **out)
+{
+	return NULL;
+}
+
+int fuse_symlink_initialize(
+		struct fuse_bpf_args *fa, struct fuse_dummy_io *unused,
+		struct inode *dir, struct dentry *entry, const char *link, int len)
+{
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = get_node_id(dir),
+		.opcode = FUSE_SYMLINK,
+		.in_numargs = 2,
+		.in_args[0] = (struct fuse_bpf_in_arg) {
+			.size = entry->d_name.len + 1,
+			.value = entry->d_name.name,
+		},
+		.in_args[1] = (struct fuse_bpf_in_arg) {
+			.size = len,
+			.value = link,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_symlink_backing(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry, const char *link, int len)
+{
+	int err = 0;
+	struct fuse_inode *fuse_inode = get_fuse_inode(dir);
+	struct inode *backing_inode = fuse_inode->backing_inode;
+	struct path backing_path = {};
+	struct inode *inode = NULL;
+
+	//TODO Actually deal with changing the backing entry in symlink
+	get_fuse_backing_path(entry, &backing_path);
+	if (!backing_path.dentry)
+		return -EBADF;
+
+	inode_lock_nested(backing_inode, I_MUTEX_PARENT);
+	err = vfs_symlink(&init_user_ns, backing_inode, backing_path.dentry,
+			  link);
+	inode_unlock(backing_inode);
+	if (err)
+		goto out;
+	if (d_really_is_negative(backing_path.dentry) ||
+		unlikely(d_unhashed(backing_path.dentry))) {
+		err = -EINVAL;
+		/*
+		 * TODO: overlayfs responds to this situation with a
+		 * lookup_one_len(). Should we do that too?
+		 */
+		goto out;
+	}
+	inode = fuse_iget_backing(dir->i_sb, fuse_inode->nodeid, backing_inode);
+	if (IS_ERR(inode)) {
+		err = PTR_ERR(inode);
+		goto out;
+	}
+	d_instantiate(entry, inode);
+out:
+	path_put(&backing_path);
+	return err;
+}
+
+void *fuse_symlink_finalize(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry, const char *link, int len)
+{
+	return NULL;
+}
+
+int fuse_readdir_initialize(struct fuse_bpf_args *fa, struct fuse_read_io *frio,
+			    struct file *file, struct dir_context *ctx,
+			    bool *force_again, bool *allow_force, bool is_continued)
+{
+	struct fuse_file *ff = file->private_data;
+	u8 *page = (u8 *)__get_free_page(GFP_KERNEL);
+
+	if (!page)
+		return -ENOMEM;
+
+	*fa = (struct fuse_bpf_args) {
+		.nodeid = ff->nodeid,
+		.opcode = FUSE_READDIR,
+		.in_numargs = 1,
+		.flags = FUSE_BPF_OUT_ARGVAR,
+		.out_numargs = 2,
+		.in_args[0] = (struct fuse_bpf_in_arg) {
+			.size = sizeof(frio->fri),
+			.value = &frio->fri,
+		},
+		.out_args[0] = (struct fuse_bpf_arg) {
+			.size = sizeof(frio->fro),
+			.value = &frio->fro,
+		},
+		.out_args[1] = (struct fuse_bpf_arg) {
+			.size = PAGE_SIZE,
+			.value = page,
+		},
+	};
+
+	frio->fri = (struct fuse_read_in) {
+		.fh = ff->fh,
+		.offset = ctx->pos,
+		.size = PAGE_SIZE,
+	};
+	frio->fro = (struct fuse_read_out) {
+		.again = 0,
+		.offset = 0,
+	};
+	*force_again = false;
+	*allow_force = true;
+	return 0;
+}
+
+struct extfuse_ctx {
+	struct dir_context ctx;
+	u8 *addr;
+	size_t offset;
+};
+
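+/*
+ * dir_context actor: serialize each backing entry as a struct fuse_dirent
+ * into the page-sized buffer shared with bpf/userspace; returning false
+ * stops iteration once the page is full.
+ */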
+static bool filldir(struct dir_context *ctx, const char *name, int namelen,
+				   loff_t offset, u64 ino, unsigned int d_type)
+{
+	struct extfuse_ctx *ec = container_of(ctx, struct extfuse_ctx, ctx);
+	struct fuse_dirent *fd = (struct fuse_dirent *) (ec->addr + ec->offset);
+
+	if (ec->offset + sizeof(struct fuse_dirent) + namelen > PAGE_SIZE)
+		return false;
+
+	*fd = (struct fuse_dirent) {
+		.ino = ino,
+		.off = offset,
+		.namelen = namelen,
+		.type = d_type,
+	};
+
+	memcpy(fd->name, name, namelen);
+	ec->offset += FUSE_DIRENT_SIZE(fd);
+
+	return true;
+}
+
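+/*
+ * Walk the fuse_dirent records in the returned buffer, validate each
+ * entry's name, and emit it into the caller's dir_context.
+ */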
+static int parse_dirfile(char *buf, size_t nbytes, struct dir_context *ctx)
+{
+	while (nbytes >= FUSE_NAME_OFFSET) {
+		struct fuse_dirent *dirent = (struct fuse_dirent *) buf;
+		size_t reclen = FUSE_DIRENT_SIZE(dirent);
+
+		if (!dirent->namelen || dirent->namelen > FUSE_NAME_MAX)
+			return -EIO;
+		if (reclen > nbytes)
+			break;
+		if (memchr(dirent->name, '/', dirent->namelen) != NULL)
+			return -EIO;
+
+		ctx->pos = dirent->off;
+		if (!dir_emit(ctx, dirent->name, dirent->namelen, dirent->ino,
+				dirent->type))
+			break;
+
+		buf += reclen;
+		nbytes -= reclen;
+	}
+
+	return 0;
+}
+
+int fuse_readdir_backing(struct fuse_bpf_args *fa,
+			 struct file *file, struct dir_context *ctx,
+			 bool *force_again, bool *allow_force, bool is_continued)
+{
+	struct fuse_file *ff = file->private_data;
+	struct file *backing_dir = ff->backing_file;
+	struct fuse_read_out *fro = fa->out_args[0].value;
+	struct extfuse_ctx ec;
+	int err;
+
+	ec = (struct extfuse_ctx) {
+		.ctx.actor = filldir,
+		.ctx.pos = ctx->pos,
+		.addr = fa->out_args[1].value,
+	};
+
+	if (!ec.addr)
+		return -ENOMEM;
+
+	if (!is_continued)
+		backing_dir->f_pos = file->f_pos;
+
+	err = iterate_dir(backing_dir, &ec.ctx);
+	if (ec.offset == 0)
+		*allow_force = false;
+	fa->out_args[1].size = ec.offset;
+
+	fro->offset = ec.ctx.pos;
+	fro->again = false;
+	return err;
+}
+
+void *fuse_readdir_finalize(struct fuse_bpf_args *fa,
+			    struct file *file, struct dir_context *ctx,
+			    bool *force_again, bool *allow_force, bool is_continued)
+{
+	struct fuse_read_out *fro = fa->out_args[0].value;
+	struct fuse_file *ff = file->private_data;
+	struct file *backing_dir = ff->backing_file;
+	int err = 0;
+
+	err = parse_dirfile(fa->out_args[1].value, fa->out_args[1].size, ctx);
+	*force_again = !!fro->again;
+	if (*force_again && !*allow_force)
+		err = -EINVAL;
+
+	ctx->pos = fro->offset;
+	backing_dir->f_pos = fro->offset;
+
+	free_page((unsigned long) fa->out_args[1].value);
+	return ERR_PTR(err);
+}
+
+int fuse_access_initialize(struct fuse_bpf_args *fa, struct fuse_access_in *fai,
+			    struct inode *inode, int mask)
+{
+	*fai = (struct fuse_access_in) {
+		.mask = mask,
+	};
+
+	*fa = (struct fuse_bpf_args) {
+		.opcode = FUSE_ACCESS,
+		.nodeid = get_node_id(inode),
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(*fai),
+		.in_args[0].value = fai,
+	};
+
+	return 0;
+}
+
+int fuse_access_backing(struct fuse_bpf_args *fa, struct inode *inode, int mask)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	const struct fuse_access_in *fai = fa->in_args[0].value;
+
+	return inode_permission(&init_user_ns, fi->backing_inode, fai->mask);
+}
+
+void *fuse_access_finalize(struct fuse_bpf_args *fa, struct inode *inode, int mask)
+{
+	return NULL;
+}
+
+int __init fuse_bpf_init(void)
+{
+	fuse_bpf_aio_request_cachep = kmem_cache_create("fuse_bpf_aio_req",
+						   sizeof(struct fuse_bpf_aio_req),
+						   0, SLAB_HWCACHE_ALIGN, NULL);
+	if (!fuse_bpf_aio_request_cachep)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void __exit fuse_bpf_cleanup(void)
+{
+	kmem_cache_destroy(fuse_bpf_aio_request_cachep);
+}
+
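+/*
+ * Bridge a fuse_bpf_args request onto the regular fuse_args path: copy the
+ * arguments in, issue fuse_simple_request(), then copy the results (and
+ * the FORCE/OUT_ARGVAR bits) back into the bpf representation.
+ */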
+ssize_t fuse_bpf_simple_request(struct fuse_mount *fm, struct fuse_bpf_args *bpf_args)
+{
+	int i;
+	ssize_t res;
+	struct fuse_args args = {
+		.nodeid = bpf_args->nodeid,
+		.opcode = bpf_args->opcode,
+		.error_in = bpf_args->error_in,
+		.in_numargs = bpf_args->in_numargs,
+		.out_numargs = bpf_args->out_numargs,
+		.force = !!(bpf_args->flags & FUSE_BPF_FORCE),
+		.out_argvar = !!(bpf_args->flags & FUSE_BPF_OUT_ARGVAR),
+	};
+
+	for (i = 0; i < args.in_numargs; ++i)
+		args.in_args[i] = (struct fuse_in_arg) {
+			.size = bpf_args->in_args[i].size,
+			.value = bpf_args->in_args[i].value,
+		};
+	for (i = 0; i < args.out_numargs; ++i)
+		args.out_args[i] = (struct fuse_arg) {
+			.size = bpf_args->out_args[i].size,
+			.value = bpf_args->out_args[i].value,
+		};
+
+	res = fuse_simple_request(fm, &args);
+
+	*bpf_args = (struct fuse_bpf_args) {
+		.nodeid = args.nodeid,
+		.opcode = args.opcode,
+		.error_in = args.error_in,
+		.in_numargs = args.in_numargs,
+		.out_numargs = args.out_numargs,
+	};
+	if (args.force)
+		bpf_args->flags |= FUSE_BPF_FORCE;
+	if (args.out_argvar)
+		bpf_args->flags |= FUSE_BPF_OUT_ARGVAR;
+	for (i = 0; i < args.in_numargs; ++i)
+		bpf_args->in_args[i] = (struct fuse_bpf_in_arg) {
+			.size = args.in_args[i].size,
+			.value = args.in_args[i].value,
+		};
+	for (i = 0; i < args.out_numargs; ++i)
+		bpf_args->out_args[i] = (struct fuse_bpf_arg) {
+			.size = args.out_args[i].size,
+			.value = args.out_args[i].value,
+		};
+	return res;
+}
diff --git a/fs/fuse/control.c b/fs/fuse/control.c
index 247ef4f..6855524 100644
--- a/fs/fuse/control.c
+++ b/fs/fuse/control.c
@@ -378,7 +378,7 @@ int __init fuse_ctl_init(void)
 	return register_filesystem(&fuse_ctl_fs_type);
 }
 
-void __exit fuse_ctl_cleanup(void)
+void fuse_ctl_cleanup(void)
 {
 	unregister_filesystem(&fuse_ctl_fs_type);
 }
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index b4a6e0a..155e073 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -14,6 +14,7 @@
 #include <linux/sched/signal.h>
 #include <linux/uio.h>
 #include <linux/miscdevice.h>
+#include <linux/namei.h>
 #include <linux/pagemap.h>
 #include <linux/file.h>
 #include <linux/slab.h>
@@ -207,10 +208,13 @@ static unsigned int fuse_req_hash(u64 unique)
 /**
  * A new request is available, wake fiq->waitq
  */
-static void fuse_dev_wake_and_unlock(struct fuse_iqueue *fiq)
+static void fuse_dev_wake_and_unlock(struct fuse_iqueue *fiq, bool sync)
 __releases(fiq->lock)
 {
-	wake_up(&fiq->waitq);
+	if (sync)
+		wake_up_sync(&fiq->waitq);
+	else
+		wake_up(&fiq->waitq);
 	kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
 	spin_unlock(&fiq->lock);
 }
@@ -223,14 +227,14 @@ const struct fuse_iqueue_ops fuse_dev_fiq_ops = {
 EXPORT_SYMBOL_GPL(fuse_dev_fiq_ops);
 
 static void queue_request_and_unlock(struct fuse_iqueue *fiq,
-				     struct fuse_req *req)
+				     struct fuse_req *req, bool sync)
 __releases(fiq->lock)
 {
 	req->in.h.len = sizeof(struct fuse_in_header) +
 		fuse_len_args(req->args->in_numargs,
 			      (struct fuse_arg *) req->args->in_args);
 	list_add_tail(&req->list, &fiq->pending);
-	fiq->ops->wake_pending_and_unlock(fiq);
+	fiq->ops->wake_pending_and_unlock(fiq, sync);
 }
 
 void fuse_queue_forget(struct fuse_conn *fc, struct fuse_forget_link *forget,
@@ -238,6 +242,11 @@ void fuse_queue_forget(struct fuse_conn *fc, struct fuse_forget_link *forget,
 {
 	struct fuse_iqueue *fiq = &fc->iq;
 
+	if (nodeid == 0) {
+		kfree(forget);
+		return;
+	}
+
 	forget->forget_one.nodeid = nodeid;
 	forget->forget_one.nlookup = nlookup;
 
@@ -245,7 +254,7 @@ void fuse_queue_forget(struct fuse_conn *fc, struct fuse_forget_link *forget,
 	if (fiq->connected) {
 		fiq->forget_list_tail->next = forget;
 		fiq->forget_list_tail = forget;
-		fiq->ops->wake_forget_and_unlock(fiq);
+		fiq->ops->wake_forget_and_unlock(fiq, false);
 	} else {
 		kfree(forget);
 		spin_unlock(&fiq->lock);
@@ -265,7 +274,7 @@ static void flush_bg_queue(struct fuse_conn *fc)
 		fc->active_background++;
 		spin_lock(&fiq->lock);
 		req->in.h.unique = fuse_get_unique(fiq);
-		queue_request_and_unlock(fiq, req);
+		queue_request_and_unlock(fiq, req, false);
 	}
 }
 
@@ -354,7 +363,7 @@ static int queue_interrupt(struct fuse_req *req)
 			spin_unlock(&fiq->lock);
 			return 0;
 		}
-		fiq->ops->wake_interrupt_and_unlock(fiq);
+		fiq->ops->wake_interrupt_and_unlock(fiq, false);
 	} else {
 		spin_unlock(&fiq->lock);
 	}
@@ -421,7 +430,7 @@ static void __fuse_request_send(struct fuse_req *req)
 		/* acquire extra reference, since request is still needed
 		   after fuse_request_end() */
 		__fuse_get_request(req);
-		queue_request_and_unlock(fiq, req);
+		queue_request_and_unlock(fiq, req, true);
 
 		request_wait_answer(req);
 		/* Pairs with smp_wmb() in fuse_request_end() */
@@ -475,6 +484,7 @@ static void fuse_args_to_req(struct fuse_req *req, struct fuse_args *args)
 {
 	req->in.h.opcode = args->opcode;
 	req->in.h.nodeid = args->nodeid;
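+	/* carry error_in in the otherwise-unused in-header padding field */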
+	req->in.h.padding = args->error_in;
 	req->args = args;
 	if (args->end)
 		__set_bit(FR_ASYNC, &req->flags);
@@ -592,7 +602,7 @@ static int fuse_simple_notify_reply(struct fuse_mount *fm,
 
 	spin_lock(&fiq->lock);
 	if (fiq->connected) {
-		queue_request_and_unlock(fiq, req);
+		queue_request_and_unlock(fiq, req, false);
 	} else {
 		err = -ENODEV;
 		spin_unlock(&fiq->lock);
@@ -1922,6 +1932,27 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 		err = copy_out_args(cs, req->args, nbytes);
 	fuse_copy_finish(cs);
 
+	if (!err && req->in.h.opcode == FUSE_CANONICAL_PATH) {
+		char *path = (char *)req->args->out_args[0].value;
+
+		path[req->args->out_args[0].size - 1] = 0;
+		req->out.h.error =
+			kern_path(path, 0, req->args->canonical_path);
+	}
+
+	if (!err && (req->in.h.opcode == FUSE_LOOKUP ||
+		     req->in.h.opcode == (FUSE_LOOKUP | FUSE_POSTFILTER)) &&
+		req->args->out_args[1].size == sizeof(struct fuse_entry_bpf_out)) {
+		struct fuse_entry_bpf_out *febo = (struct fuse_entry_bpf_out *)
+				req->args->out_args[1].value;
+		struct fuse_entry_bpf *feb = container_of(febo, struct fuse_entry_bpf, out);
+
+		if (febo->backing_action == FUSE_ACTION_REPLACE)
+			feb->backing_file = fget(febo->backing_fd);
+		if (febo->bpf_action == FUSE_ACTION_REPLACE)
+			feb->bpf_file = fget(febo->bpf_fd);
+	}
+
 	spin_lock(&fpq->lock);
 	clear_bit(FR_LOCKED, &req->flags);
 	if (!fpq->connected)
@@ -2268,7 +2299,8 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
 				 * uses the same ioctl handler.
 				 */
 				if (old->f_op == file->f_op &&
-				    old->f_cred->user_ns == file->f_cred->user_ns)
+				    old->f_cred->user_ns ==
+					    file->f_cred->user_ns)
 					fud = fuse_get_dev(old);
 
 				if (fud) {
@@ -2280,6 +2312,15 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
 			}
 		}
 		break;
+	case FUSE_DEV_IOC_PASSTHROUGH_OPEN:
+		res = -EFAULT;
+		if (!get_user(oldfd, (__u32 __user *)arg)) {
+			res = -EINVAL;
+			fud = fuse_get_dev(file);
+			if (fud)
+				res = fuse_passthrough_open(fud, oldfd);
+		}
+		break;
 	default:
 		res = -ENOTTY;
 		break;
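
On the userspace side, the daemon registers an already-open lower-filesystem fd through this ioctl before replying to OPEN. A hedged sketch (fuse_dev_fd and the UAPI definition of FUSE_DEV_IOC_PASSTHROUGH_OPEN are assumed):

	/* daemon side: hand the kernel a lower fd for passthrough */
	int lower_fd = open("/lower/path/file", O_RDWR);
	__u32 arg = lower_fd;
	int id = ioctl(fuse_dev_fd, FUSE_DEV_IOC_PASSTHROUGH_OPEN, &arg);

	/* on success the returned id is placed in the fuse_open_out
	 * reply so that fuse_passthrough_setup() can look it up */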
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index bb97a38..d7c90d7 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -8,8 +8,10 @@
 
 #include "fuse_i.h"
 
+#include <linux/fdtable.h>
 #include <linux/pagemap.h>
 #include <linux/file.h>
+#include <linux/filter.h>
 #include <linux/fs_context.h>
 #include <linux/moduleparam.h>
 #include <linux/sched.h>
@@ -27,6 +29,8 @@ module_param(allow_sys_admin_access, bool, 0644);
 MODULE_PARM_DESC(allow_sys_admin_access,
 		 "Allow users with CAP_SYS_ADMIN in initial userns to bypass allow_other access check");
 
+#include "../internal.h"
+
 static void fuse_advise_use_readdirplus(struct inode *dir)
 {
 	struct fuse_inode *fi = get_fuse_inode(dir);
@@ -34,7 +38,7 @@ static void fuse_advise_use_readdirplus(struct inode *dir)
 	set_bit(FUSE_I_ADVISE_RDPLUS, &fi->state);
 }
 
-#if BITS_PER_LONG >= 64
+#if BITS_PER_LONG >= 64 && !defined(CONFIG_FUSE_BPF)
 static inline void __fuse_dentry_settime(struct dentry *entry, u64 time)
 {
 	entry->d_fsdata = (void *) time;
@@ -46,19 +50,15 @@ static inline u64 fuse_dentry_time(const struct dentry *entry)
 }
 
 #else
-union fuse_dentry {
-	u64 time;
-	struct rcu_head rcu;
-};
 
 static inline void __fuse_dentry_settime(struct dentry *dentry, u64 time)
 {
-	((union fuse_dentry *) dentry->d_fsdata)->time = time;
+	((struct fuse_dentry *) dentry->d_fsdata)->time = time;
 }
 
 static inline u64 fuse_dentry_time(const struct dentry *entry)
 {
-	return ((union fuse_dentry *) entry->d_fsdata)->time;
+	return ((struct fuse_dentry *) entry->d_fsdata)->time;
 }
 #endif
 
@@ -83,26 +83,16 @@ static void fuse_dentry_settime(struct dentry *dentry, u64 time)
 	__fuse_dentry_settime(dentry, time);
 }
 
-/*
- * FUSE caches dentries and attributes with separate timeout.  The
- * time in jiffies until the dentry/attributes are valid is stored in
- * dentry->d_fsdata and fuse_inode->i_time respectively.
- */
-
-/*
- * Calculate the time in jiffies until a dentry/attributes are valid
- */
-static u64 time_to_jiffies(u64 sec, u32 nsec)
+void fuse_init_dentry_root(struct dentry *root, struct file *backing_dir)
 {
-	if (sec || nsec) {
-		struct timespec64 ts = {
-			sec,
-			min_t(u32, nsec, NSEC_PER_SEC - 1)
-		};
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_dentry *fuse_dentry = root->d_fsdata;
 
-		return get_jiffies_64() + timespec64_to_jiffies(&ts);
-	} else
-		return 0;
+	if (backing_dir) {
+		fuse_dentry->backing_path = backing_dir->f_path;
+		path_get(&fuse_dentry->backing_path);
+	}
+#endif
 }
 
 /*
@@ -115,11 +105,6 @@ void fuse_change_entry_timeout(struct dentry *entry, struct fuse_entry_out *o)
 		time_to_jiffies(o->entry_valid, o->entry_valid_nsec));
 }
 
-static u64 attr_timeout(struct fuse_attr_out *o)
-{
-	return time_to_jiffies(o->attr_valid, o->attr_valid_nsec);
-}
-
 u64 entry_attr_timeout(struct fuse_entry_out *o)
 {
 	return time_to_jiffies(o->attr_valid, o->attr_valid_nsec);
@@ -180,7 +165,8 @@ static void fuse_invalidate_entry(struct dentry *entry)
 
 static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
 			     u64 nodeid, const struct qstr *name,
-			     struct fuse_entry_out *outarg)
+			     struct fuse_entry_out *outarg,
+			     struct fuse_entry_bpf_out *bpf_outarg)
 {
 	memset(outarg, 0, sizeof(struct fuse_entry_out));
 	args->opcode = FUSE_LOOKUP;
@@ -188,11 +174,52 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
 	args->in_numargs = 1;
 	args->in_args[0].size = name->len + 1;
 	args->in_args[0].value = name->name;
-	args->out_numargs = 1;
+	args->out_argvar = true;
+	args->out_numargs = 2;
 	args->out_args[0].size = sizeof(struct fuse_entry_out);
 	args->out_args[0].value = outarg;
+	args->out_args[1].size = sizeof(struct fuse_entry_bpf_out);
+	args->out_args[1].value = bpf_outarg;
 }
 
+#ifdef CONFIG_FUSE_BPF
+static bool backing_data_changed(struct fuse_inode *fi, struct dentry *entry,
+				 struct fuse_entry_bpf *bpf_arg)
+{
+	struct path new_backing_path;
+	struct inode *new_backing_inode;
+	struct bpf_prog *bpf = NULL;
+	int err;
+	bool ret = true;
+
+	if (!entry)
+		return false;
+
+	get_fuse_backing_path(entry, &new_backing_path);
+	new_backing_inode = fi->backing_inode;
+	ihold(new_backing_inode);
+
+	err = fuse_handle_backing(bpf_arg, &new_backing_inode, &new_backing_path);
+
+	if (err)
+		goto put_inode;
+
+	err = fuse_handle_bpf_prog(bpf_arg, entry->d_parent->d_inode, &bpf);
+	if (err)
+		goto put_bpf;
+
+	ret = (bpf != fi->bpf || fi->backing_inode != new_backing_inode ||
+			!path_equal(&get_fuse_dentry(entry)->backing_path, &new_backing_path));
+put_bpf:
+	if (bpf)
+		bpf_prog_put(bpf);
+put_inode:
+	iput(new_backing_inode);
+	path_put(&new_backing_path);
+	return ret;
+}
+#endif
+
 /*
  * Check whether the dentry is still valid
  *
@@ -213,9 +240,23 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 	inode = d_inode_rcu(entry);
 	if (inode && fuse_is_bad(inode))
 		goto invalid;
-	else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
+
+#ifdef CONFIG_FUSE_BPF
+	/*
+	 * TODO: Do we need bpf support for revalidate? If the lower
+	 * filesystem says the entry is invalid, FUSE probably shouldn't try
+	 * to fix that without going through the normal lookup path.
+	 */
+	if (get_fuse_dentry(entry)->backing_path.dentry) {
+		ret = fuse_revalidate_backing(entry, flags);
+		if (ret <= 0)
+			goto out;
+	}
+#endif
+	if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
 		 (flags & (LOOKUP_EXCL | LOOKUP_REVAL))) {
 		struct fuse_entry_out outarg;
+		struct fuse_entry_bpf bpf_arg;
 		FUSE_ARGS(args);
 		struct fuse_forget_link *forget;
 		u64 attr_version;
@@ -227,27 +268,44 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 		ret = -ECHILD;
 		if (flags & LOOKUP_RCU)
 			goto out;
-
 		fm = get_fuse_mount(inode);
 
+		parent = dget_parent(entry);
+
+#ifdef CONFIG_FUSE_BPF
+		/*
+		 * TODO: Once we're handling timeouts for backing inodes, do a
+		 * bpf-based lookup_revalidate here.
+		 */
+		if (get_fuse_inode(parent->d_inode)->backing_inode) {
+			dput(parent);
+			ret = 1;
+			goto out;
+		}
+#endif
 		forget = fuse_alloc_forget();
 		ret = -ENOMEM;
-		if (!forget)
+		if (!forget) {
+			dput(parent);
 			goto out;
+		}
 
 		attr_version = fuse_get_attr_version(fm->fc);
 
-		parent = dget_parent(entry);
 		fuse_lookup_init(fm->fc, &args, get_node_id(d_inode(parent)),
-				 &entry->d_name, &outarg);
+				 &entry->d_name, &outarg, &bpf_arg.out);
 		ret = fuse_simple_request(fm, &args);
 		dput(parent);
+
 		/* Zero nodeid is same as -ENOENT */
 		if (!ret && !outarg.nodeid)
 			ret = -ENOENT;
-		if (!ret) {
+		if (!ret || ret == sizeof(bpf_arg.out)) {
 			fi = get_fuse_inode(inode);
 			if (outarg.nodeid != get_node_id(inode) ||
+#ifdef CONFIG_FUSE_BPF
+			    (ret == sizeof(bpf_arg.out) &&
+					    backing_data_changed(fi, entry, &bpf_arg)) ||
+#endif
 			    (bool) IS_AUTOMOUNT(inode) != (bool) (outarg.attr.flags & FUSE_ATTR_SUBMOUNT)) {
 				fuse_queue_forget(fm->fc, forget,
 						  outarg.nodeid, 1);
@@ -289,17 +347,20 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 	goto out;
 }
 
-#if BITS_PER_LONG < 64
+#if BITS_PER_LONG < 64 || defined(CONFIG_FUSE_BPF)
 static int fuse_dentry_init(struct dentry *dentry)
 {
-	dentry->d_fsdata = kzalloc(sizeof(union fuse_dentry),
+	dentry->d_fsdata = kzalloc(sizeof(struct fuse_dentry),
 				   GFP_KERNEL_ACCOUNT | __GFP_RECLAIMABLE);
 
 	return dentry->d_fsdata ? 0 : -ENOMEM;
 }
 static void fuse_dentry_release(struct dentry *dentry)
 {
-	union fuse_dentry *fd = dentry->d_fsdata;
+	struct fuse_dentry *fd = dentry->d_fsdata;
+
+	if (fd && fd->backing_path.dentry)
+		path_put(&fd->backing_path);
 
 	kfree_rcu(fd, rcu);
 }
@@ -337,18 +398,70 @@ static struct vfsmount *fuse_dentry_automount(struct path *path)
 	return mnt;
 }
 
+/*
+ * Get the canonical path. Since we must translate to a path, this must be done
+ * in the context of the userspace daemon; however, the daemon cannot look up
+ * paths on its own. Instead, we handle the lookup as a special case inside of
+ * the write request.
+ */
+static void fuse_dentry_canonical_path(const struct path *path,
+				       struct path *canonical_path)
+{
+	struct inode *inode = d_inode(path->dentry);
+	struct fuse_mount *fm = get_fuse_mount_super(path->mnt->mnt_sb);
+	FUSE_ARGS(args);
+	char *path_name;
+	int err;
+
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	fer = fuse_bpf_backing(inode, struct fuse_dummy_io,
+			       fuse_canonical_path_initialize,
+			       fuse_canonical_path_backing,
+			       fuse_canonical_path_finalize, path,
+			       canonical_path);
+	if (fer.ret)
+		return;
+#endif
+
+	path_name = (char *)get_zeroed_page(GFP_KERNEL);
+	if (!path_name)
+		goto default_path;
+
+	args.opcode = FUSE_CANONICAL_PATH;
+	args.nodeid = get_node_id(inode);
+	args.in_numargs = 0;
+	args.out_numargs = 1;
+	args.out_args[0].size = PATH_MAX;
+	args.out_args[0].value = path_name;
+	args.canonical_path = canonical_path;
+	args.out_argvar = 1;
+
+	err = fuse_simple_request(fm, &args);
+	free_page((unsigned long)path_name);
+	if (err > 0)
+		return;
+default_path:
+	canonical_path->dentry = path->dentry;
+	canonical_path->mnt = path->mnt;
+	path_get(canonical_path);
+}
+
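
The daemon answers FUSE_CANONICAL_PATH with a NUL-terminated path string, which fuse_dev_do_write() above resolves via kern_path(). A hypothetical raw reply, assuming request_unique was saved from the matching fuse_in_header:

	const char *resolved = "/backing/fs/file";	/* illustrative */
	struct fuse_out_header hdr = {
		.len = sizeof(hdr) + strlen(resolved) + 1,
		.unique = request_unique,
	};
	struct iovec iov[2] = {
		{ &hdr, sizeof(hdr) },
		{ (void *)resolved, strlen(resolved) + 1 },
	};
	writev(fuse_dev_fd, iov, 2);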
 const struct dentry_operations fuse_dentry_operations = {
 	.d_revalidate	= fuse_dentry_revalidate,
 	.d_delete	= fuse_dentry_delete,
-#if BITS_PER_LONG < 64
+#if BITS_PER_LONG < 64 || defined(CONFIG_FUSE_BPF)
 	.d_init		= fuse_dentry_init,
 	.d_release	= fuse_dentry_release,
 #endif
 	.d_automount	= fuse_dentry_automount,
+	.d_canonical_path = fuse_dentry_canonical_path,
 };
 
 const struct dentry_operations fuse_root_dentry_operations = {
-#if BITS_PER_LONG < 64
+#if BITS_PER_LONG < 64 || defined(CONFIG_FUSE_BPF)
 	.d_init		= fuse_dentry_init,
 	.d_release	= fuse_dentry_release,
 #endif
@@ -367,10 +480,13 @@ bool fuse_invalid_attr(struct fuse_attr *attr)
 }
 
 int fuse_lookup_name(struct super_block *sb, u64 nodeid, const struct qstr *name,
-		     struct fuse_entry_out *outarg, struct inode **inode)
+		     struct fuse_entry_out *outarg,
+		     struct dentry *entry,
+		     struct inode **inode)
 {
 	struct fuse_mount *fm = get_fuse_mount_super(sb);
 	FUSE_ARGS(args);
+	struct fuse_entry_bpf bpf_arg = {0};
 	struct fuse_forget_link *forget;
 	u64 attr_version;
 	int err;
@@ -388,23 +504,68 @@ int fuse_lookup_name(struct super_block *sb, u64 nodeid, const struct qstr *name
 
 	attr_version = fuse_get_attr_version(fm->fc);
 
-	fuse_lookup_init(fm->fc, &args, nodeid, name, outarg);
+	fuse_lookup_init(fm->fc, &args, nodeid, name, outarg, &bpf_arg.out);
 	err = fuse_simple_request(fm, &args);
-	/* Zero nodeid is same as -ENOENT, but with valid timeout */
-	if (err || !outarg->nodeid)
-		goto out_put_forget;
 
-	err = -EIO;
-	if (!outarg->nodeid)
-		goto out_put_forget;
-	if (fuse_invalid_attr(&outarg->attr))
-		goto out_put_forget;
+#ifdef CONFIG_FUSE_BPF
+	if (err == sizeof(bpf_arg.out)) {
+		/* TODO: make sure this copes with invalid handles */
+		struct file *backing_file;
+		struct inode *backing_inode;
 
-	*inode = fuse_iget(sb, outarg->nodeid, outarg->generation,
-			   &outarg->attr, entry_attr_timeout(outarg),
-			   attr_version);
+		err = -ENOENT;
+		if (!entry)
+			goto out_queue_forget;
+
+		err = -EINVAL;
+		backing_file = bpf_arg.backing_file;
+		if (!backing_file)
+			goto out_queue_forget;
+
+		if (IS_ERR(backing_file)) {
+			err = PTR_ERR(backing_file);
+			goto out_queue_forget;
+		}
+
+		backing_inode = backing_file->f_inode;
+		*inode = fuse_iget_backing(sb, outarg->nodeid, backing_inode);
+		if (!*inode)
+			goto bpf_arg_out;
+
+		err = fuse_handle_backing(&bpf_arg,
+				&get_fuse_inode(*inode)->backing_inode,
+				&get_fuse_dentry(entry)->backing_path);
+		if (err)
+			goto out;
+
+		err = fuse_handle_bpf_prog(&bpf_arg, NULL, &get_fuse_inode(*inode)->bpf);
+		if (err)
+			goto out;
+bpf_arg_out:
+		fput(backing_file);
+	} else
+#endif
+	{
+		/* Zero nodeid is same as -ENOENT, but with valid timeout */
+		if (err || !outarg->nodeid)
+			goto out_put_forget;
+
+		err = -EIO;
+		if (!outarg->nodeid)
+			goto out_put_forget;
+		if (fuse_invalid_attr(&outarg->attr))
+			goto out_put_forget;
+
+		*inode = fuse_iget(sb, outarg->nodeid, outarg->generation,
+				   &outarg->attr, entry_attr_timeout(outarg),
+				   attr_version);
+	}
+
 	err = -ENOMEM;
-	if (!*inode) {
+#ifdef CONFIG_FUSE_BPF
+out_queue_forget:
+#endif
+	if (!*inode && outarg->nodeid) {
 		fuse_queue_forget(fm->fc, forget, outarg->nodeid, 1);
 		goto out;
 	}
@@ -426,12 +587,23 @@ static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry,
 	bool outarg_valid = true;
 	bool locked;
 
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	fer = fuse_bpf_backing(dir, struct fuse_lookup_io,
+			       fuse_lookup_initialize, fuse_lookup_backing,
+			       fuse_lookup_finalize,
+			       dir, entry, flags);
+	if (fer.ret)
+		return fer.result;
+#endif
+
 	if (fuse_is_bad(dir))
 		return ERR_PTR(-EIO);
 
 	locked = fuse_lock_inode(dir);
 	err = fuse_lookup_name(dir->i_sb, get_node_id(dir), &entry->d_name,
-			       &outarg, &inode);
+			       &outarg, entry, &inode);
 	fuse_unlock_inode(dir, locked);
 	if (err == -ENOENT) {
 		outarg_valid = false;
@@ -533,6 +705,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 {
 	int err;
 	struct inode *inode;
+	struct fuse_conn *fc = get_fuse_conn(dir);
 	struct fuse_mount *fm = get_fuse_mount(dir);
 	FUSE_ARGS(args);
 	struct fuse_forget_link *forget;
@@ -548,6 +721,20 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 	/* Userspace expects S_IFREG in create mode */
 	BUG_ON((mode & S_IFMT) != S_IFREG);
 
+#ifdef CONFIG_FUSE_BPF
+	{
+		struct fuse_err_ret fer;
+
+		fer = fuse_bpf_backing(dir, struct fuse_create_open_io,
+				       fuse_create_open_initialize,
+				       fuse_create_open_backing,
+				       fuse_create_open_finalize,
+				       dir, entry, file, flags, mode);
+		if (fer.ret)
+			return PTR_ERR(fer.result);
+	}
+#endif
+
 	forget = fuse_alloc_forget();
 	err = -ENOMEM;
 	if (!forget)
@@ -610,6 +797,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 	ff->fh = outopen.fh;
 	ff->nodeid = outentry.nodeid;
 	ff->open_flags = outopen.open_flags;
+	fuse_passthrough_setup(fc, ff, &outopen);
 	inode = fuse_iget(dir->i_sb, outentry.nodeid, outentry.generation,
 			  &outentry.attr, entry_attr_timeout(&outentry), 0);
 	if (!inode) {
@@ -780,6 +968,17 @@ static int fuse_mknod(struct user_namespace *mnt_userns, struct inode *dir,
 	struct fuse_mount *fm = get_fuse_mount(dir);
 	FUSE_ARGS(args);
 
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	fer = fuse_bpf_backing(dir, struct fuse_mknod_in,
+			fuse_mknod_initialize, fuse_mknod_backing,
+			fuse_mknod_finalize,
+			dir, entry, mode, rdev);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	if (!fm->fc->dont_mask)
 		mode &= ~current_umask();
 
@@ -826,6 +1025,17 @@ static int fuse_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 	struct fuse_mount *fm = get_fuse_mount(dir);
 	FUSE_ARGS(args);
 
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	fer = fuse_bpf_backing(dir, struct fuse_mkdir_in,
+			fuse_mkdir_initialize, fuse_mkdir_backing,
+			fuse_mkdir_finalize,
+			dir, entry, mode);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	if (!fm->fc->dont_mask)
 		mode &= ~current_umask();
 
@@ -848,6 +1058,17 @@ static int fuse_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 	unsigned len = strlen(link) + 1;
 	FUSE_ARGS(args);
 
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	fer = fuse_bpf_backing(dir, struct fuse_dummy_io,
+			fuse_symlink_initialize, fuse_symlink_backing,
+			fuse_symlink_finalize,
+			dir, entry, link, len);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	args.opcode = FUSE_SYMLINK;
 	args.in_numargs = 2;
 	args.in_args[0].size = entry->d_name.len + 1;
@@ -911,6 +1132,20 @@ static int fuse_unlink(struct inode *dir, struct dentry *entry)
 	if (fuse_is_bad(dir))
 		return -EIO;
 
+#ifdef CONFIG_FUSE_BPF
+	{
+		struct fuse_err_ret fer;
+
+		fer = fuse_bpf_backing(dir, struct fuse_dummy_io,
+					fuse_unlink_initialize,
+					fuse_unlink_backing,
+					fuse_unlink_finalize,
+					dir, entry);
+		if (fer.ret)
+			return PTR_ERR(fer.result);
+	}
+#endif
+
 	args.opcode = FUSE_UNLINK;
 	args.nodeid = get_node_id(dir);
 	args.in_numargs = 1;
@@ -934,6 +1169,20 @@ static int fuse_rmdir(struct inode *dir, struct dentry *entry)
 	if (fuse_is_bad(dir))
 		return -EIO;
 
+#ifdef CONFIG_FUSE_BPF
+	{
+		struct fuse_err_ret fer;
+
+		fer = fuse_bpf_backing(dir, struct fuse_dummy_io,
+					fuse_rmdir_initialize,
+					fuse_rmdir_backing,
+					fuse_rmdir_finalize,
+					dir, entry);
+		if (fer.ret)
+			return PTR_ERR(fer.result);
+	}
+#endif
+
 	args.opcode = FUSE_RMDIR;
 	args.nodeid = get_node_id(dir);
 	args.in_numargs = 1;
@@ -1012,6 +1261,18 @@ static int fuse_rename2(struct user_namespace *mnt_userns, struct inode *olddir,
 		return -EINVAL;
 
 	if (flags) {
+#ifdef CONFIG_FUSE_BPF
+		struct fuse_err_ret fer;
+
+		fer = fuse_bpf_backing(olddir, struct fuse_rename2_in,
+						fuse_rename2_initialize, fuse_rename2_backing,
+						fuse_rename2_finalize,
+						olddir, oldent, newdir, newent, flags);
+		if (fer.ret)
+			return PTR_ERR(fer.result);
+#endif
+
+		/* TODO: how should this interact with bpf backing? */
 		if (fc->no_rename2 || fc->minor < 23)
 			return -EINVAL;
 
@@ -1023,6 +1284,17 @@ static int fuse_rename2(struct user_namespace *mnt_userns, struct inode *olddir,
 			err = -EINVAL;
 		}
 	} else {
+#ifdef CONFIG_FUSE_BPF
+		struct fuse_err_ret fer;
+
+		fer = fuse_bpf_backing(olddir, struct fuse_rename_in,
+						fuse_rename_initialize, fuse_rename_backing,
+						fuse_rename_finalize,
+						olddir, oldent, newdir, newent);
+		if (fer.ret)
+			return PTR_ERR(fer.result);
+#endif
+
 		err = fuse_rename_common(olddir, oldent, newdir, newent, 0,
 					 FUSE_RENAME,
 					 sizeof(struct fuse_rename_in));
@@ -1040,6 +1312,16 @@ static int fuse_link(struct dentry *entry, struct inode *newdir,
 	struct fuse_mount *fm = get_fuse_mount(inode);
 	FUSE_ARGS(args);
 
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	fer = fuse_bpf_backing(inode, struct fuse_link_in, fuse_link_initialize,
+			       fuse_link_backing, fuse_link_finalize, entry,
+			       newdir, newent);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	memset(&inarg, 0, sizeof(inarg));
 	inarg.oldnodeid = get_node_id(inode);
 	args.opcode = FUSE_LINK;
@@ -1057,7 +1339,7 @@ static int fuse_link(struct dentry *entry, struct inode *newdir,
 	return err;
 }
 
-static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
+void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
 			  struct kstat *stat)
 {
 	unsigned int blkbits;
@@ -1117,23 +1399,13 @@ static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
 	args.out_args[0].size = sizeof(outarg);
 	args.out_args[0].value = &outarg;
 	err = fuse_simple_request(fm, &args);
-	if (!err) {
-		if (fuse_invalid_attr(&outarg.attr) ||
-		    inode_wrong_type(inode, outarg.attr.mode)) {
-			fuse_make_bad(inode);
-			err = -EIO;
-		} else {
-			fuse_change_attributes(inode, &outarg.attr,
-					       attr_timeout(&outarg),
-					       attr_version);
-			if (stat)
-				fuse_fillattr(inode, &outarg.attr, stat);
-		}
-	}
+	if (!err)
+		err = finalize_attr(inode, &outarg, attr_version, stat);
 	return err;
 }
 
 static int fuse_update_get_attr(struct inode *inode, struct file *file,
+				const struct path *path,
 				struct kstat *stat, u32 request_mask,
 				unsigned int flags)
 {
@@ -1143,6 +1415,17 @@ static int fuse_update_get_attr(struct inode *inode, struct file *file,
 	u32 inval_mask = READ_ONCE(fi->inval_mask);
 	u32 cache_mask = fuse_get_cache_mask(inode);
 
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	fer = fuse_bpf_backing(inode, struct fuse_getattr_io,
+			       fuse_getattr_initialize,	fuse_getattr_backing,
+			       fuse_getattr_finalize,
+			       path->dentry, stat, request_mask, flags);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	if (flags & AT_STATX_FORCE_SYNC)
 		sync = true;
 	else if (flags & AT_STATX_DONT_SYNC)
@@ -1166,7 +1449,9 @@ static int fuse_update_get_attr(struct inode *inode, struct file *file,
 
 int fuse_update_attributes(struct inode *inode, struct file *file, u32 mask)
 {
-	return fuse_update_get_attr(inode, file, NULL, mask, 0);
+	/* No need to fetch atime for internal purposes */
+	return fuse_update_get_attr(inode, file, &file->f_path, NULL,
+				    mask & ~STATX_ATIME, 0);
 }
 
 int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
@@ -1277,6 +1562,16 @@ static int fuse_access(struct inode *inode, int mask)
 	struct fuse_access_in inarg;
 	int err;
 
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	fer = fuse_bpf_backing(inode, struct fuse_access_in,
+			       fuse_access_initialize, fuse_access_backing,
+			       fuse_access_finalize, inode, mask);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	BUG_ON(mask & MAY_NOT_BLOCK);
 
 	if (fm->fc->no_access)
@@ -1325,6 +1620,10 @@ static int fuse_permission(struct user_namespace *mnt_userns,
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	bool refreshed = false;
 	int err = 0;
+	struct fuse_inode *fi = get_fuse_inode(inode);
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+#endif
 
 	if (fuse_is_bad(inode))
 		return -EIO;
@@ -1332,12 +1631,19 @@ static int fuse_permission(struct user_namespace *mnt_userns,
 	if (!fuse_allow_current_process(fc))
 		return -EACCES;
 
+#ifdef CONFIG_FUSE_BPF
+	fer = fuse_bpf_backing(inode, struct fuse_access_in,
+			       fuse_access_initialize, fuse_access_backing,
+			       fuse_access_finalize, inode, mask);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	/*
 	 * If attributes are needed, refresh them before proceeding
 	 */
 	if (fc->default_permissions ||
 	    ((mask & MAY_EXEC) && S_ISREG(inode->i_mode))) {
-		struct fuse_inode *fi = get_fuse_inode(inode);
 		u32 perm_mask = STATX_MODE | STATX_UID | STATX_GID;
 
 		if (perm_mask & READ_ONCE(fi->inval_mask) ||
@@ -1428,6 +1734,21 @@ static const char *fuse_get_link(struct dentry *dentry, struct inode *inode,
 	if (fuse_is_bad(inode))
 		goto out_err;
 
+#ifdef CONFIG_FUSE_BPF
+	{
+		struct fuse_err_ret fer;
+		const char *out = NULL;
+
+		fer = fuse_bpf_backing(inode, struct fuse_dummy_io,
+				       fuse_get_link_initialize,
+				       fuse_get_link_backing,
+				       fuse_get_link_finalize,
+				       inode, dentry, callback, &out);
+		if (fer.ret)
+			return fer.result ?: out;
+	}
+#endif
+
 	if (fc->cache_symlinks)
 		return page_get_link(dentry, inode, callback);
 
@@ -1461,8 +1782,18 @@ static int fuse_dir_open(struct inode *inode, struct file *file)
 
 static int fuse_dir_release(struct inode *inode, struct file *file)
 {
-	fuse_release_common(file, true);
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
 
+	fer = fuse_bpf_backing(inode, struct fuse_release_in,
+		       fuse_releasedir_initialize, fuse_release_backing,
+		       fuse_release_finalize,
+		       inode, file);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
+	fuse_release_common(file, true);
 	return 0;
 }
 
@@ -1476,6 +1807,19 @@ static int fuse_dir_fsync(struct file *file, loff_t start, loff_t end,
 	if (fuse_is_bad(inode))
 		return -EIO;
 
+#ifdef CONFIG_FUSE_BPF
+	{
+		struct fuse_err_ret fer;
+
+		fer = fuse_bpf_backing(inode, struct fuse_fsync_in,
+				fuse_dir_fsync_initialize, fuse_fsync_backing,
+				fuse_fsync_finalize,
+				file, start, end, datasync);
+		if (fer.ret)
+			return PTR_ERR(fer.result);
+	}
+#endif
+
 	if (fc->no_fsyncdir)
 		return 0;
 
@@ -1514,58 +1858,6 @@ static long fuse_dir_compat_ioctl(struct file *file, unsigned int cmd,
 				 FUSE_IOCTL_COMPAT | FUSE_IOCTL_DIR);
 }
 
-static bool update_mtime(unsigned ivalid, bool trust_local_mtime)
-{
-	/* Always update if mtime is explicitly set  */
-	if (ivalid & ATTR_MTIME_SET)
-		return true;
-
-	/* Or if kernel i_mtime is the official one */
-	if (trust_local_mtime)
-		return true;
-
-	/* If it's an open(O_TRUNC) or an ftruncate(), don't update */
-	if ((ivalid & ATTR_SIZE) && (ivalid & (ATTR_OPEN | ATTR_FILE)))
-		return false;
-
-	/* In all other cases update */
-	return true;
-}
-
-static void iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr,
-			   struct fuse_setattr_in *arg, bool trust_local_cmtime)
-{
-	unsigned ivalid = iattr->ia_valid;
-
-	if (ivalid & ATTR_MODE)
-		arg->valid |= FATTR_MODE,   arg->mode = iattr->ia_mode;
-	if (ivalid & ATTR_UID)
-		arg->valid |= FATTR_UID,    arg->uid = from_kuid(fc->user_ns, iattr->ia_uid);
-	if (ivalid & ATTR_GID)
-		arg->valid |= FATTR_GID,    arg->gid = from_kgid(fc->user_ns, iattr->ia_gid);
-	if (ivalid & ATTR_SIZE)
-		arg->valid |= FATTR_SIZE,   arg->size = iattr->ia_size;
-	if (ivalid & ATTR_ATIME) {
-		arg->valid |= FATTR_ATIME;
-		arg->atime = iattr->ia_atime.tv_sec;
-		arg->atimensec = iattr->ia_atime.tv_nsec;
-		if (!(ivalid & ATTR_ATIME_SET))
-			arg->valid |= FATTR_ATIME_NOW;
-	}
-	if ((ivalid & ATTR_MTIME) && update_mtime(ivalid, trust_local_cmtime)) {
-		arg->valid |= FATTR_MTIME;
-		arg->mtime = iattr->ia_mtime.tv_sec;
-		arg->mtimensec = iattr->ia_mtime.tv_nsec;
-		if (!(ivalid & ATTR_MTIME_SET) && !trust_local_cmtime)
-			arg->valid |= FATTR_MTIME_NOW;
-	}
-	if ((ivalid & ATTR_CTIME) && trust_local_cmtime) {
-		arg->valid |= FATTR_CTIME;
-		arg->ctime = iattr->ia_ctime.tv_sec;
-		arg->ctimensec = iattr->ia_ctime.tv_nsec;
-	}
-}
-
 /*
  * Prevent concurrent writepages on inode
  *
@@ -1680,6 +1972,16 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
 	bool trust_local_cmtime = is_wb;
 	bool fault_blocked = false;
 
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	fer = fuse_bpf_backing(inode, struct fuse_setattr_io,
+			       fuse_setattr_initialize, fuse_setattr_backing,
+			       fuse_setattr_finalize, dentry, attr, file);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	if (!fc->default_permissions)
 		attr->ia_valid |= ATTR_FORCE;
 
@@ -1855,11 +2157,22 @@ static int fuse_setattr(struct user_namespace *mnt_userns, struct dentry *entry,
 		 * This should be done on write(), truncate() and chown().
 		 */
 		if (!fc->handle_killpriv && !fc->handle_killpriv_v2) {
+#ifdef CONFIG_FUSE_BPF
+			struct fuse_err_ret fer;
+
 			/*
 			 * ia_mode calculation may have used stale i_mode.
 			 * Refresh and recalculate.
 			 */
-			ret = fuse_do_getattr(inode, NULL, file);
+			fer = fuse_bpf_backing(inode, struct fuse_getattr_io,
+					       fuse_getattr_initialize,	fuse_getattr_backing,
+					       fuse_getattr_finalize,
+					       entry, NULL, 0, 0);
+			if (fer.ret)
+				ret = PTR_ERR(fer.result);
+			else
+#endif
+				ret = fuse_do_getattr(inode, NULL, file);
 			if (ret)
 				return ret;
 
@@ -1916,7 +2229,8 @@ static int fuse_getattr(struct user_namespace *mnt_userns,
 		return -EACCES;
 	}
 
-	return fuse_update_get_attr(inode, NULL, stat, request_mask, flags);
+	return fuse_update_get_attr(inode, NULL, path, stat, request_mask,
+				    flags);
 }
 
 static const struct inode_operations fuse_dir_inode_operations = {
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 89f4741..bccf5cf 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -8,6 +8,7 @@
 
 #include "fuse_i.h"
 
+#include <linux/filter.h>
 #include <linux/pagemap.h>
 #include <linux/slab.h>
 #include <linux/kernel.h>
@@ -146,7 +147,7 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 		if (!err) {
 			ff->fh = outarg.fh;
 			ff->open_flags = outarg.open_flags;
-
+			fuse_passthrough_setup(fc, ff, &outarg);
 		} else if (err != -ENOSYS) {
 			fuse_file_free(ff);
 			return ERR_PTR(err);
@@ -235,6 +236,20 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
 	if (err)
 		return err;
 
+#ifdef CONFIG_FUSE_BPF
+	{
+		struct fuse_err_ret fer;
+
+		fer = fuse_bpf_backing(inode, struct fuse_open_io,
+				       fuse_open_initialize,
+				       fuse_open_backing,
+				       fuse_open_finalize,
+				       inode, file, isdir);
+		if (fer.ret)
+			return PTR_ERR(fer.result);
+	}
+#endif
+
 	if (is_wb_truncate || dax_truncate)
 		inode_lock(inode);
 
@@ -308,6 +323,8 @@ void fuse_file_release(struct inode *inode, struct fuse_file *ff,
 	struct fuse_release_args *ra = ff->release_args;
 	int opcode = isdir ? FUSE_RELEASEDIR : FUSE_RELEASE;
 
+	fuse_passthrough_release(&ff->passthrough);
+
 	fuse_prepare_release(fi, ff, open_flags, opcode);
 
 	if (ff->flock) {
@@ -344,6 +361,17 @@ static int fuse_release(struct inode *inode, struct file *file)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	fer = fuse_bpf_backing(inode, struct fuse_release_in,
+		       fuse_release_initialize, fuse_release_backing,
+		       fuse_release_finalize,
+		       inode, file);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	/*
 	 * Dirty pages might remain despite write_inode_now() call from
 	 * fuse_flush() due to writes racing with the close.
@@ -486,6 +514,17 @@ static int fuse_flush(struct file *file, fl_owner_t id)
 	FUSE_ARGS(args);
 	int err;
 
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	fer = fuse_bpf_backing(file->f_inode, struct fuse_flush_in,
+			       fuse_flush_initialize, fuse_flush_backing,
+			       fuse_flush_finalize,
+			       file, id);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	if (fuse_is_bad(inode))
 		return -EIO;
 
@@ -561,6 +600,17 @@ static int fuse_fsync(struct file *file, loff_t start, loff_t end,
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	int err;
 
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	fer = fuse_bpf_backing(inode, struct fuse_fsync_in,
+			       fuse_fsync_initialize, fuse_fsync_backing,
+			       fuse_fsync_finalize,
+			       file, start, end, datasync);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	if (fuse_is_bad(inode))
 		return -EIO;
 
@@ -1598,7 +1648,23 @@ static ssize_t fuse_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (FUSE_IS_DAX(inode))
 		return fuse_dax_read_iter(iocb, to);
 
-	if (!(ff->open_flags & FOPEN_DIRECT_IO))
+#ifdef CONFIG_FUSE_BPF
+	{
+		struct fuse_err_ret fer;
+
+		fer = fuse_bpf_backing(inode, struct fuse_file_read_iter_io,
+				       fuse_file_read_iter_initialize,
+				       fuse_file_read_iter_backing,
+				       fuse_file_read_iter_finalize,
+				       iocb, to);
+		if (fer.ret)
+			return PTR_ERR(fer.result);
+	}
+#endif
+
+	if (ff->passthrough.filp)
+		return fuse_passthrough_read_iter(iocb, to);
+	else if (!(ff->open_flags & FOPEN_DIRECT_IO))
 		return fuse_cache_read_iter(iocb, to);
 	else
 		return fuse_direct_read_iter(iocb, to);
@@ -1616,7 +1682,23 @@ static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (FUSE_IS_DAX(inode))
 		return fuse_dax_write_iter(iocb, from);
 
-	if (!(ff->open_flags & FOPEN_DIRECT_IO))
+#ifdef CONFIG_FUSE_BPF
+	{
+		struct fuse_err_ret fer;
+
+		fer = fuse_bpf_backing(inode, struct fuse_file_write_iter_io,
+				       fuse_file_write_iter_initialize,
+				       fuse_file_write_iter_backing,
+				       fuse_file_write_iter_finalize,
+				       iocb, from);
+		if (fer.ret)
+			return PTR_ERR(fer.result);
+	}
+#endif
+
+	if (ff->passthrough.filp)
+		return fuse_passthrough_write_iter(iocb, from);
+	else if (!(ff->open_flags & FOPEN_DIRECT_IO))
 		return fuse_cache_write_iter(iocb, from);
 	else
 		return fuse_direct_write_iter(iocb, from);
@@ -1862,6 +1944,19 @@ int fuse_write_inode(struct inode *inode, struct writeback_control *wbc)
 	struct fuse_file *ff;
 	int err;
 
+	/*
+	 * TODO: fully understand why this is necessary.
+	 *
+	 * With fuse-bpf, fsstress fails if rename is enabled without this.
+	 *
+	 * We are getting writes here on directory inodes, which do not have
+	 * an initialized file list and therefore crash. The open question is
+	 * why those writes arrive at all.
+	 */
+	if (!S_ISREG(inode->i_mode))
+		return 0;
+
 	/*
 	 * Inode is always written before the last reference is dropped and
 	 * hence this should not be reached from reclaim.
@@ -2433,6 +2528,15 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
 	if (FUSE_IS_DAX(file_inode(file)))
 		return fuse_dax_mmap(file, vma);
 
+#ifdef CONFIG_FUSE_BPF
+	/* TODO: this is plain passthrough, not a proper BPF filter */
+	if (ff->backing_file)
+		return fuse_backing_mmap(file, vma);
+#endif
+
+	if (ff->passthrough.filp)
+		return fuse_passthrough_mmap(file, vma);
+
 	if (ff->open_flags & FOPEN_DIRECT_IO) {
 		/* Can't provide the coherency needed for MAP_SHARED */
 		if (vma->vm_flags & VM_MAYSHARE)
@@ -2678,6 +2782,17 @@ static loff_t fuse_file_llseek(struct file *file, loff_t offset, int whence)
 {
 	loff_t retval;
 	struct inode *inode = file_inode(file);
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	fer = fuse_bpf_backing(inode, struct fuse_lseek_io,
+			       fuse_lseek_initialize,
+			       fuse_lseek_backing,
+			       fuse_lseek_finalize,
+			       file, offset, whence);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
 
 	switch (whence) {
 	case SEEK_SET:
@@ -2967,6 +3082,18 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
 		(!(mode & FALLOC_FL_KEEP_SIZE) ||
 		 (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE)));
 
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	fer = fuse_bpf_backing(inode, struct fuse_fallocate_in,
+			       fuse_file_fallocate_initialize,
+			       fuse_file_fallocate_backing,
+			       fuse_file_fallocate_finalize,
+			       file, mode, offset, length);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
 		     FALLOC_FL_ZERO_RANGE))
 		return -EOPNOTSUPP;
@@ -3070,6 +3197,18 @@ static ssize_t __fuse_copy_file_range(struct file *file_in, loff_t pos_in,
 	bool is_unstable = (!fc->writeback_cache) &&
 			   ((pos_out + len) > inode_out->i_size);
 
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	fer = fuse_bpf_backing(file_in->f_inode, struct fuse_copy_file_range_io,
+			       fuse_copy_file_range_initialize,
+			       fuse_copy_file_range_backing,
+			       fuse_copy_file_range_finalize,
+			       file_in, pos_in, file_out, pos_out, len, flags);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	if (fc->no_copy_file_range)
 		return -EOPNOTSUPP;
 
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 98a9cf5..3e2131f 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -13,6 +13,9 @@
 # define pr_fmt(fmt) "fuse: " fmt
 #endif
 
+#include <linux/android_fuse.h>
+#include <linux/filter.h>
+#include <linux/pagemap.h>
 #include <linux/fuse.h>
 #include <linux/fs.h>
 #include <linux/mount.h>
@@ -31,6 +34,9 @@
 #include <linux/pid_namespace.h>
 #include <linux/refcount.h>
 #include <linux/user_namespace.h>
+#include <linux/statfs.h>
+
+#define FUSE_SUPER_MAGIC 0x65735546
 
 /** Default max number of pages that can be used in a single read request */
 #define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32
@@ -63,11 +69,57 @@ struct fuse_forget_link {
 	struct fuse_forget_link *next;
 };
 
+/** FUSE specific dentry data */
+#if BITS_PER_LONG < 64 || defined(CONFIG_FUSE_BPF)
+struct fuse_dentry {
+	union {
+		u64 time;
+		struct rcu_head rcu;
+	};
+	struct path backing_path;
+};
+
+static inline struct fuse_dentry *get_fuse_dentry(const struct dentry *entry)
+{
+	return entry->d_fsdata;
+}
+#endif
+
+#ifdef CONFIG_FUSE_BPF
+static inline void get_fuse_backing_path(const struct dentry *d,
+					  struct path *path)
+{
+	struct fuse_dentry *di = get_fuse_dentry(d);
+
+	if (!di) {
+		*path = (struct path) {};
+		return;
+	}
+
+	*path = di->backing_path;
+	path_get(path);
+}
+#endif
+
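A minimal sketch of consuming this helper (assumed usage; path_put() tolerates an empty path, so the miss case needs no special handling):

	static bool example_has_backing(struct dentry *dentry)
	{
		struct path backing;
		bool ret;

		get_fuse_backing_path(dentry, &backing);
		ret = backing.dentry != NULL;
		path_put(&backing);	/* drop the reference the helper took */
		return ret;
	}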
 /** FUSE inode */
 struct fuse_inode {
 	/** Inode data */
 	struct inode inode;
 
+#ifdef CONFIG_FUSE_BPF
+	/**
+	 * Backing inode, if this inode is from a backing file system.
+	 * If this is set, nodeid is 0.
+	 */
+	struct inode *backing_inode;
+
+	/**
+	 * bpf_prog, run on all operations to determine whether to pass through
+	 * or handle in place
+	 */
+	struct bpf_prog *bpf;
+#endif
+
 	/** Unique ID, which identifies the inode between userspace
 	 * and kernel */
 	u64 nodeid;
@@ -173,6 +225,17 @@ struct fuse_conn;
 struct fuse_mount;
 struct fuse_release_args;
 
+/**
+ * Reference to a lower filesystem file for read/write operations handled in
+ * passthrough mode, along with the credentials to be used for those
+ * operations.
+ */
+struct fuse_passthrough {
+	struct file *filp;
+	struct cred *cred;
+};
+
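A sketch of how these two fields are plausibly consumed on the read path (the real implementation lives in passthrough.c; this only assumes standard vfs_iter_read() semantics):

	static ssize_t example_passthrough_read(struct kiocb *iocb,
						struct iov_iter *to,
						struct fuse_passthrough *pt)
	{
		const struct cred *old = override_creds(pt->cred);
		ssize_t ret = vfs_iter_read(pt->filp, to, &iocb->ki_pos, 0);

		revert_creds(old);
		return ret;
	}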
 /** FUSE specific file data */
 struct fuse_file {
 	/** Fuse connection for this file */
@@ -218,6 +281,17 @@ struct fuse_file {
 
 	} readdir;
 
+	/** Container for data related to the passthrough functionality */
+	struct fuse_passthrough passthrough;
+
+#ifdef CONFIG_FUSE_BPF
+	/**
+	 * Backing file when in bpf mode.
+	 * TODO: reconcile with the passthrough file above.
+	 */
+	struct file *backing_file;
+#endif
+
 	/** RB node to be linked on fuse_conn->polled_files */
 	struct rb_node polled_node;
 
@@ -249,6 +323,7 @@ struct fuse_page_desc {
 struct fuse_args {
 	uint64_t nodeid;
 	uint32_t opcode;
+	uint32_t error_in;
 	unsigned short in_numargs;
 	unsigned short out_numargs;
 	bool force:1;
@@ -261,9 +336,12 @@ struct fuse_args {
 	bool page_zeroing:1;
 	bool page_replace:1;
 	bool may_block:1;
-	struct fuse_in_arg in_args[3];
-	struct fuse_arg out_args[2];
+	struct fuse_in_arg in_args[FUSE_MAX_IN_ARGS];
+	struct fuse_arg out_args[FUSE_MAX_OUT_ARGS];
 	void (*end)(struct fuse_mount *fm, struct fuse_args *args, int error);
+
+	/* Path used for completing d_canonical_path */
+	struct path *canonical_path;
 };
 
 struct fuse_args_pages {
@@ -367,10 +445,8 @@ struct fuse_req {
 	/** Used to wake up the task waiting for completion of request*/
 	wait_queue_head_t waitq;
 
-#if IS_ENABLED(CONFIG_VIRTIO_FS)
 	/** virtio-fs's physically contiguous buffer for in and out args */
 	void *argbuf;
-#endif
 
 	/** fuse_mount this request belongs to */
 	struct fuse_mount *fm;
@@ -390,19 +466,19 @@ struct fuse_iqueue_ops {
 	/**
 	 * Signal that a forget has been queued
 	 */
-	void (*wake_forget_and_unlock)(struct fuse_iqueue *fiq)
+	void (*wake_forget_and_unlock)(struct fuse_iqueue *fiq, bool sync)
 		__releases(fiq->lock);
 
 	/**
 	 * Signal that an INTERRUPT request has been queued
 	 */
-	void (*wake_interrupt_and_unlock)(struct fuse_iqueue *fiq)
+	void (*wake_interrupt_and_unlock)(struct fuse_iqueue *fiq, bool sync)
 		__releases(fiq->lock);
 
 	/**
 	 * Signal that a request has been queued
 	 */
-	void (*wake_pending_and_unlock)(struct fuse_iqueue *fiq)
+	void (*wake_pending_and_unlock)(struct fuse_iqueue *fiq, bool sync)
 		__releases(fiq->lock);
 
 	/**
@@ -511,9 +587,12 @@ struct fuse_fs_context {
 	bool no_force_umount:1;
 	bool legacy_opts_show:1;
 	enum fuse_dax_mode dax_mode;
+	bool no_daemon:1;
 	unsigned int max_read;
 	unsigned int blksize;
 	const char *subtype;
+	struct bpf_prog *root_bpf;
+	struct file *root_dir;
 
 	/* DAX device, may be NULL */
 	struct dax_device *dax_dev;
@@ -775,6 +854,9 @@ struct fuse_conn {
 	/* Auto-mount submounts announced by the server */
 	unsigned int auto_submounts:1;
 
+	/** Passthrough mode for read/write IO */
+	unsigned int passthrough:1;
+
 	/* Propagate syncfs() to server */
 	unsigned int sync_fs:1;
 
@@ -787,6 +869,9 @@ struct fuse_conn {
 	/* Is tmpfile not implemented by fs? */
 	unsigned int no_tmpfile:1;
 
+	/** BPF only, no daemon running */
+	unsigned int no_daemon:1;
+
 	/** The number of requests waiting for completion */
 	atomic_t num_waiting;
 
@@ -836,6 +921,12 @@ struct fuse_conn {
 
 	/* New writepages go into this bucket */
 	struct fuse_sync_bucket __rcu *curr_bucket;
+
+	/** IDR for passthrough requests */
+	struct idr passthrough_req;
+
+	/** Protects passthrough_req */
+	spinlock_t passthrough_req_lock;
 };
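
A sketch of the allocation pattern these two fields suggest (assumed; the actual id management lives in passthrough.c):

	static int example_passthrough_id(struct fuse_conn *fc, struct file *filp)
	{
		int id;

		idr_preload(GFP_KERNEL);
		spin_lock(&fc->passthrough_req_lock);
		id = idr_alloc(&fc->passthrough_req, filp, 1, 0, GFP_ATOMIC);
		spin_unlock(&fc->passthrough_req_lock);
		idr_preload_end();

		return id;	/* > 0 on success, negative errno on failure */
	}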
 
 /*
@@ -955,14 +1046,18 @@ extern const struct dentry_operations fuse_dentry_operations;
 extern const struct dentry_operations fuse_root_dentry_operations;
 
 /**
- * Get a filled in inode
+ * Get a filled-in inode
  */
+struct inode *fuse_iget_backing(struct super_block *sb,
+				u64 nodeid,
+				struct inode *backing_inode);
 struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 			int generation, struct fuse_attr *attr,
 			u64 attr_valid, u64 attr_version);
 
 int fuse_lookup_name(struct super_block *sb, u64 nodeid, const struct qstr *name,
-		     struct fuse_entry_out *outarg, struct inode **inode);
+		     struct fuse_entry_out *outarg,
+		     struct dentry *entry, struct inode **inode);
 
 /**
  * Send FORGET command
@@ -999,7 +1094,6 @@ struct fuse_io_args {
 void fuse_read_args_fill(struct fuse_io_args *ia, struct file *file, loff_t pos,
 			 size_t count, int opcode);
 
-
 /**
  * Send OPEN or OPENDIR request
  */
@@ -1071,7 +1165,7 @@ int fuse_dev_init(void);
 void fuse_dev_cleanup(void);
 
 int fuse_ctl_init(void);
-void __exit fuse_ctl_cleanup(void);
+void fuse_ctl_cleanup(void);
 
 /**
  * Simple request sending that does request allocation and freeing
@@ -1107,6 +1201,7 @@ void fuse_invalidate_entry_cache(struct dentry *entry);
 void fuse_invalidate_atime(struct inode *inode);
 
 u64 entry_attr_timeout(struct fuse_entry_out *o);
+void fuse_init_dentry_root(struct dentry *root, struct file *backing_dir);
 void fuse_change_entry_timeout(struct dentry *entry, struct fuse_entry_out *o);
 
 /**
@@ -1319,4 +1414,647 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 void fuse_file_release(struct inode *inode, struct fuse_file *ff,
 		       unsigned int open_flags, fl_owner_t id, bool isdir);
 
+/* passthrough.c */
+void fuse_copyattr(struct file *dst_file, struct file *src_file);
+int fuse_passthrough_open(struct fuse_dev *fud, u32 lower_fd);
+int fuse_passthrough_setup(struct fuse_conn *fc, struct fuse_file *ff,
+			   struct fuse_open_out *openarg);
+void fuse_passthrough_release(struct fuse_passthrough *passthrough);
+ssize_t fuse_passthrough_read_iter(struct kiocb *iocb, struct iov_iter *to);
+ssize_t fuse_passthrough_write_iter(struct kiocb *iocb, struct iov_iter *from);
+ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma);
+
+/* backing.c */
+
+/*
+ * Dummy io passed to fuse_bpf_backing when an operation needs no scratch space
+ */
+struct fuse_dummy_io {
+	int unused;
+};
+
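Every operation below follows the same three-hook shape. A hypothetical caller, with the example_* hooks standing in for the per-op functions declared in this header:

	static int example_op(struct inode *inode, struct file *file)
	{
	#ifdef CONFIG_FUSE_BPF
		struct fuse_err_ret fer;

		fer = fuse_bpf_backing(inode, struct fuse_dummy_io,
				       example_initialize, example_backing,
				       example_finalize, inode, file);
		if (fer.ret)
			return PTR_ERR(fer.result);
	#endif
		/* otherwise fall through to the regular FUSE request path */
		return 0;
	}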
+struct fuse_open_io {
+	struct fuse_open_in foi;
+	struct fuse_open_out foo;
+};
+
+int fuse_open_initialize(struct fuse_bpf_args *fa, struct fuse_open_io *foi,
+			 struct inode *inode, struct file *file, bool isdir);
+int fuse_open_backing(struct fuse_bpf_args *fa,
+		      struct inode *inode, struct file *file, bool isdir);
+void *fuse_open_finalize(struct fuse_bpf_args *fa,
+		       struct inode *inode, struct file *file, bool isdir);
+
+struct fuse_create_open_io {
+	struct fuse_create_in fci;
+	struct fuse_entry_out feo;
+	struct fuse_open_out foo;
+};
+
+int fuse_create_open_initialize(
+		struct fuse_bpf_args *fa, struct fuse_create_open_io *fcoi,
+		struct inode *dir, struct dentry *entry,
+		struct file *file, unsigned int flags, umode_t mode);
+int fuse_create_open_backing(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry,
+		struct file *file, unsigned int flags, umode_t mode);
+void *fuse_create_open_finalize(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry,
+		struct file *file, unsigned int flags, umode_t mode);
+
+int fuse_mknod_initialize(
+		struct fuse_bpf_args *fa, struct fuse_mknod_in *fmi,
+		struct inode *dir, struct dentry *entry, umode_t mode, dev_t rdev);
+int fuse_mknod_backing(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry, umode_t mode, dev_t rdev);
+void *fuse_mknod_finalize(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry, umode_t mode, dev_t rdev);
+
+int fuse_mkdir_initialize(
+		struct fuse_bpf_args *fa, struct fuse_mkdir_in *fmi,
+		struct inode *dir, struct dentry *entry, umode_t mode);
+int fuse_mkdir_backing(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry, umode_t mode);
+void *fuse_mkdir_finalize(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry, umode_t mode);
+
+int fuse_rmdir_initialize(
+		struct fuse_bpf_args *fa, struct fuse_dummy_io *fmi,
+		struct inode *dir, struct dentry *entry);
+int fuse_rmdir_backing(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry);
+void *fuse_rmdir_finalize(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry);
+
+int fuse_rename2_initialize(struct fuse_bpf_args *fa, struct fuse_rename2_in *fri,
+			    struct inode *olddir, struct dentry *oldent,
+			    struct inode *newdir, struct dentry *newent,
+			    unsigned int flags);
+int fuse_rename2_backing(struct fuse_bpf_args *fa,
+			 struct inode *olddir, struct dentry *oldent,
+			 struct inode *newdir, struct dentry *newent,
+			 unsigned int flags);
+void *fuse_rename2_finalize(struct fuse_bpf_args *fa,
+			    struct inode *olddir, struct dentry *oldent,
+			    struct inode *newdir, struct dentry *newent,
+			    unsigned int flags);
+
+int fuse_rename_initialize(struct fuse_bpf_args *fa, struct fuse_rename_in *fri,
+			   struct inode *olddir, struct dentry *oldent,
+			   struct inode *newdir, struct dentry *newent);
+int fuse_rename_backing(struct fuse_bpf_args *fa,
+			struct inode *olddir, struct dentry *oldent,
+			struct inode *newdir, struct dentry *newent);
+void *fuse_rename_finalize(struct fuse_bpf_args *fa,
+			   struct inode *olddir, struct dentry *oldent,
+			   struct inode *newdir, struct dentry *newent);
+
+int fuse_unlink_initialize(
+		struct fuse_bpf_args *fa, struct fuse_dummy_io *fmi,
+		struct inode *dir, struct dentry *entry);
+int fuse_unlink_backing(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry);
+void *fuse_unlink_finalize(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry);
+
+int fuse_link_initialize(struct fuse_bpf_args *fa, struct fuse_link_in *fli,
+			  struct dentry *entry, struct inode *dir,
+			  struct dentry *newent);
+int fuse_link_backing(struct fuse_bpf_args *fa, struct dentry *entry,
+		      struct inode *dir, struct dentry *newent);
+void *fuse_link_finalize(struct fuse_bpf_args *fa, struct dentry *entry,
+			 struct inode *dir, struct dentry *newent);
+
+int fuse_release_initialize(struct fuse_bpf_args *fa, struct fuse_release_in *fri,
+			    struct inode *inode, struct file *file);
+int fuse_releasedir_initialize(struct fuse_bpf_args *fa,
+			struct fuse_release_in *fri,
+			struct inode *inode, struct file *file);
+int fuse_release_backing(struct fuse_bpf_args *fa,
+			 struct inode *inode, struct file *file);
+void *fuse_release_finalize(struct fuse_bpf_args *fa,
+			    struct inode *inode, struct file *file);
+
+int fuse_flush_initialize(struct fuse_bpf_args *fa, struct fuse_flush_in *ffi,
+			  struct file *file, fl_owner_t id);
+int fuse_flush_backing(struct fuse_bpf_args *fa, struct file *file, fl_owner_t id);
+void *fuse_flush_finalize(struct fuse_bpf_args *fa,
+			  struct file *file, fl_owner_t id);
+
+struct fuse_lseek_io {
+	struct fuse_lseek_in fli;
+	struct fuse_lseek_out flo;
+};
+
+int fuse_lseek_initialize(struct fuse_bpf_args *fa, struct fuse_lseek_io *fli,
+			  struct file *file, loff_t offset, int whence);
+int fuse_lseek_backing(struct fuse_bpf_args *fa, struct file *file, loff_t offset, int whence);
+void *fuse_lseek_finalize(struct fuse_bpf_args *fa, struct file *file, loff_t offset, int whence);
+
+struct fuse_copy_file_range_io {
+	struct fuse_copy_file_range_in fci;
+	struct fuse_write_out fwo;
+};
+
+int fuse_copy_file_range_initialize(struct fuse_bpf_args *fa,
+				   struct fuse_copy_file_range_io *fcf,
+				   struct file *file_in, loff_t pos_in,
+				   struct file *file_out, loff_t pos_out,
+				   size_t len, unsigned int flags);
+int fuse_copy_file_range_backing(struct fuse_bpf_args *fa,
+				 struct file *file_in, loff_t pos_in,
+				 struct file *file_out, loff_t pos_out,
+				 size_t len, unsigned int flags);
+void *fuse_copy_file_range_finalize(struct fuse_bpf_args *fa,
+				    struct file *file_in, loff_t pos_in,
+				    struct file *file_out, loff_t pos_out,
+				    size_t len, unsigned int flags);
+
+int fuse_fsync_initialize(struct fuse_bpf_args *fa, struct fuse_fsync_in *ffi,
+		   struct file *file, loff_t start, loff_t end, int datasync);
+int fuse_fsync_backing(struct fuse_bpf_args *fa,
+		   struct file *file, loff_t start, loff_t end, int datasync);
+void *fuse_fsync_finalize(struct fuse_bpf_args *fa,
+		   struct file *file, loff_t start, loff_t end, int datasync);
+int fuse_dir_fsync_initialize(struct fuse_bpf_args *fa, struct fuse_fsync_in *ffi,
+		   struct file *file, loff_t start, loff_t end, int datasync);
+
+struct fuse_getxattr_io {
+	struct fuse_getxattr_in fgi;
+	struct fuse_getxattr_out fgo;
+};
+
+int fuse_getxattr_initialize(
+		struct fuse_bpf_args *fa, struct fuse_getxattr_io *fgio,
+		struct dentry *dentry, const char *name, void *value,
+		size_t size);
+int fuse_getxattr_backing(
+		struct fuse_bpf_args *fa,
+		struct dentry *dentry, const char *name, void *value,
+		size_t size);
+void *fuse_getxattr_finalize(
+		struct fuse_bpf_args *fa,
+		struct dentry *dentry, const char *name, void *value,
+		size_t size);
+
+int fuse_listxattr_initialize(struct fuse_bpf_args *fa,
+			       struct fuse_getxattr_io *fgio,
+			       struct dentry *dentry, char *list, size_t size);
+int fuse_listxattr_backing(struct fuse_bpf_args *fa, struct dentry *dentry,
+			   char *list, size_t size);
+void *fuse_listxattr_finalize(struct fuse_bpf_args *fa, struct dentry *dentry,
+			      char *list, size_t size);
+
+int fuse_setxattr_initialize(struct fuse_bpf_args *fa,
+			     struct fuse_setxattr_in *fsxi,
+			     struct dentry *dentry, const char *name,
+			     const void *value, size_t size, int flags);
+int fuse_setxattr_backing(struct fuse_bpf_args *fa, struct dentry *dentry,
+			  const char *name, const void *value, size_t size,
+			  int flags);
+void *fuse_setxattr_finalize(struct fuse_bpf_args *fa, struct dentry *dentry,
+			     const char *name, const void *value, size_t size,
+			     int flags);
+
+int fuse_removexattr_initialize(struct fuse_bpf_args *fa,
+				struct fuse_dummy_io *unused,
+				struct dentry *dentry, const char *name);
+int fuse_removexattr_backing(struct fuse_bpf_args *fa,
+			     struct dentry *dentry, const char *name);
+void *fuse_removexattr_finalize(struct fuse_bpf_args *fa,
+				struct dentry *dentry, const char *name);
+
+struct fuse_read_iter_out {
+	uint64_t ret;
+};
+struct fuse_file_read_iter_io {
+	struct fuse_read_in fri;
+	struct fuse_read_iter_out frio;
+};
+
+int fuse_file_read_iter_initialize(
+		struct fuse_bpf_args *fa, struct fuse_file_read_iter_io *fri,
+		struct kiocb *iocb, struct iov_iter *to);
+int fuse_file_read_iter_backing(struct fuse_bpf_args *fa,
+		struct kiocb *iocb, struct iov_iter *to);
+void *fuse_file_read_iter_finalize(struct fuse_bpf_args *fa,
+		struct kiocb *iocb, struct iov_iter *to);
+
+struct fuse_write_iter_out {
+	uint64_t ret;
+};
+struct fuse_file_write_iter_io {
+	struct fuse_write_in fwi;
+	struct fuse_write_out fwo;
+	struct fuse_write_iter_out fwio;
+};
+
+int fuse_file_write_iter_initialize(
+		struct fuse_bpf_args *fa, struct fuse_file_write_iter_io *fwio,
+		struct kiocb *iocb, struct iov_iter *from);
+int fuse_file_write_iter_backing(struct fuse_bpf_args *fa,
+		struct kiocb *iocb, struct iov_iter *from);
+void *fuse_file_write_iter_finalize(struct fuse_bpf_args *fa,
+		struct kiocb *iocb, struct iov_iter *from);
+
+ssize_t fuse_backing_mmap(struct file *file, struct vm_area_struct *vma);
+
+int fuse_file_fallocate_initialize(struct fuse_bpf_args *fa,
+		struct fuse_fallocate_in *ffi,
+		struct file *file, int mode, loff_t offset, loff_t length);
+int fuse_file_fallocate_backing(struct fuse_bpf_args *fa,
+		struct file *file, int mode, loff_t offset, loff_t length);
+void *fuse_file_fallocate_finalize(struct fuse_bpf_args *fa,
+		struct file *file, int mode, loff_t offset, loff_t length);
+
+struct fuse_lookup_io {
+	struct fuse_entry_out feo;
+	struct fuse_entry_bpf feb;
+};
+
+int fuse_handle_backing(struct fuse_entry_bpf *feb, struct inode **backing_inode,
+			struct path *backing_path);
+int fuse_handle_bpf_prog(struct fuse_entry_bpf *feb, struct inode *parent,
+			 struct bpf_prog **bpf);
+
+int fuse_lookup_initialize(struct fuse_bpf_args *fa, struct fuse_lookup_io *feo,
+	       struct inode *dir, struct dentry *entry, unsigned int flags);
+int fuse_lookup_backing(struct fuse_bpf_args *fa, struct inode *dir,
+			  struct dentry *entry, unsigned int flags);
+struct dentry *fuse_lookup_finalize(struct fuse_bpf_args *fa, struct inode *dir,
+			   struct dentry *entry, unsigned int flags);
+int fuse_revalidate_backing(struct dentry *entry, unsigned int flags);
+
+int fuse_canonical_path_initialize(struct fuse_bpf_args *fa,
+				   struct fuse_dummy_io *fdi,
+				   const struct path *path,
+				   struct path *canonical_path);
+int fuse_canonical_path_backing(struct fuse_bpf_args *fa, const struct path *path,
+				struct path *canonical_path);
+void *fuse_canonical_path_finalize(struct fuse_bpf_args *fa,
+				   const struct path *path,
+				   struct path *canonical_path);
+
+struct fuse_getattr_io {
+	struct fuse_getattr_in fgi;
+	struct fuse_attr_out fao;
+};
+int fuse_getattr_initialize(struct fuse_bpf_args *fa, struct fuse_getattr_io *fgio,
+			const struct dentry *entry, struct kstat *stat,
+			u32 request_mask, unsigned int flags);
+int fuse_getattr_backing(struct fuse_bpf_args *fa,
+			const struct dentry *entry, struct kstat *stat,
+			u32 request_mask, unsigned int flags);
+void *fuse_getattr_finalize(struct fuse_bpf_args *fa,
+			const struct dentry *entry, struct kstat *stat,
+			u32 request_mask, unsigned int flags);
+
+struct fuse_setattr_io {
+	struct fuse_setattr_in fsi;
+	struct fuse_attr_out fao;
+};
+
+int fuse_setattr_initialize(struct fuse_bpf_args *fa, struct fuse_setattr_io *fsi,
+		struct dentry *dentry, struct iattr *attr, struct file *file);
+int fuse_setattr_backing(struct fuse_bpf_args *fa,
+		struct dentry *dentry, struct iattr *attr, struct file *file);
+void *fuse_setattr_finalize(struct fuse_bpf_args *fa,
+		struct dentry *dentry, struct iattr *attr, struct file *file);
+
+int fuse_statfs_initialize(struct fuse_bpf_args *fa, struct fuse_statfs_out *fso,
+		struct dentry *dentry, struct kstatfs *buf);
+int fuse_statfs_backing(struct fuse_bpf_args *fa,
+		struct dentry *dentry, struct kstatfs *buf);
+void *fuse_statfs_finalize(struct fuse_bpf_args *fa,
+		struct dentry *dentry, struct kstatfs *buf);
+
+int fuse_get_link_initialize(struct fuse_bpf_args *fa, struct fuse_dummy_io *dummy,
+		struct inode *inode, struct dentry *dentry,
+		struct delayed_call *callback, const char **out);
+int fuse_get_link_backing(struct fuse_bpf_args *fa,
+		struct inode *inode, struct dentry *dentry,
+		struct delayed_call *callback, const char **out);
+void *fuse_get_link_finalize(struct fuse_bpf_args *fa,
+		struct inode *inode, struct dentry *dentry,
+		struct delayed_call *callback, const char **out);
+
+int fuse_symlink_initialize(
+		struct fuse_bpf_args *fa, struct fuse_dummy_io *unused,
+		struct inode *dir, struct dentry *entry, const char *link, int len);
+int fuse_symlink_backing(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry, const char *link, int len);
+void *fuse_symlink_finalize(
+		struct fuse_bpf_args *fa,
+		struct inode *dir, struct dentry *entry, const char *link, int len);
+
+struct fuse_read_io {
+	struct fuse_read_in fri;
+	struct fuse_read_out fro;
+};
+
+int fuse_readdir_initialize(struct fuse_bpf_args *fa, struct fuse_read_io *frio,
+			    struct file *file, struct dir_context *ctx,
+			    bool *force_again, bool *allow_force, bool is_continued);
+int fuse_readdir_backing(struct fuse_bpf_args *fa,
+			 struct file *file, struct dir_context *ctx,
+			 bool *force_again, bool *allow_force, bool is_continued);
+void *fuse_readdir_finalize(struct fuse_bpf_args *fa,
+			    struct file *file, struct dir_context *ctx,
+			    bool *force_again, bool *allow_force, bool is_continued);
+
+int fuse_access_initialize(struct fuse_bpf_args *fa, struct fuse_access_in *fai,
+			   struct inode *inode, int mask);
+int fuse_access_backing(struct fuse_bpf_args *fa, struct inode *inode, int mask);
+void *fuse_access_finalize(struct fuse_bpf_args *fa, struct inode *inode, int mask);
+
+/*
+ * FUSE caches dentries and attributes with separate timeouts.  The
+ * time in jiffies until the dentry/attributes are valid is stored in
+ * dentry->d_fsdata and fuse_inode->i_time respectively.
+ */
+
+/*
+ * Calculate the time in jiffies until a dentry/attributes are valid
+ */
+static inline u64 time_to_jiffies(u64 sec, u32 nsec)
+{
+	if (sec || nsec) {
+		struct timespec64 ts = {
+			sec,
+			min_t(u32, nsec, NSEC_PER_SEC - 1)
+		};
+
+		return get_jiffies_64() + timespec64_to_jiffies(&ts);
+	} else {
+		return 0;
+	}
+}
+
+static inline u64 attr_timeout(struct fuse_attr_out *o)
+{
+	return time_to_jiffies(o->attr_valid, o->attr_valid_nsec);
+}
+
+static inline bool update_mtime(unsigned int ivalid, bool trust_local_mtime)
+{
+	/* Always update if mtime is explicitly set  */
+	if (ivalid & ATTR_MTIME_SET)
+		return true;
+
+	/* Or if kernel i_mtime is the official one */
+	if (trust_local_mtime)
+		return true;
+
+	/* If it's an open(O_TRUNC) or an ftruncate(), don't update */
+	if ((ivalid & ATTR_SIZE) && (ivalid & (ATTR_OPEN | ATTR_FILE)))
+		return false;
+
+	/* In all other cases update */
+	return true;
+}
+
+void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
+			  struct kstat *stat);
+
+static inline void iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr,
+			   struct fuse_setattr_in *arg, bool trust_local_cmtime)
+{
+	unsigned int ivalid = iattr->ia_valid;
+
+	if (ivalid & ATTR_MODE) {
+		arg->valid |= FATTR_MODE;
+		arg->mode = iattr->ia_mode;
+	}
+	if (ivalid & ATTR_UID) {
+		arg->valid |= FATTR_UID;
+		arg->uid = from_kuid(fc->user_ns, iattr->ia_uid);
+	}
+	if (ivalid & ATTR_GID) {
+		arg->valid |= FATTR_GID;
+		arg->gid = from_kgid(fc->user_ns, iattr->ia_gid);
+	}
+	if (ivalid & ATTR_SIZE) {
+		arg->valid |= FATTR_SIZE;
+		arg->size = iattr->ia_size;
+	}
+	if (ivalid & ATTR_ATIME) {
+		arg->valid |= FATTR_ATIME;
+		arg->atime = iattr->ia_atime.tv_sec;
+		arg->atimensec = iattr->ia_atime.tv_nsec;
+		if (!(ivalid & ATTR_ATIME_SET))
+			arg->valid |= FATTR_ATIME_NOW;
+	}
+	if ((ivalid & ATTR_MTIME) && update_mtime(ivalid, trust_local_cmtime)) {
+		arg->valid |= FATTR_MTIME;
+		arg->mtime = iattr->ia_mtime.tv_sec;
+		arg->mtimensec = iattr->ia_mtime.tv_nsec;
+		if (!(ivalid & ATTR_MTIME_SET) && !trust_local_cmtime)
+			arg->valid |= FATTR_MTIME_NOW;
+	}
+	if ((ivalid & ATTR_CTIME) && trust_local_cmtime) {
+		arg->valid |= FATTR_CTIME;
+		arg->ctime = iattr->ia_ctime.tv_sec;
+		arg->ctimensec = iattr->ia_ctime.tv_nsec;
+	}
+}
+
+static inline int finalize_attr(struct inode *inode, struct fuse_attr_out *outarg,
+				u64 attr_version, struct kstat *stat)
+{
+	int err = 0;
+
+	if (fuse_invalid_attr(&outarg->attr) ||
+	    ((inode->i_mode ^ outarg->attr.mode) & S_IFMT)) {
+		fuse_make_bad(inode);
+		err = -EIO;
+	} else {
+		fuse_change_attributes(inode, &outarg->attr,
+				       attr_timeout(outarg),
+				       attr_version);
+		if (stat)
+			fuse_fillattr(inode, &outarg->attr, stat);
+	}
+	return err;
+}
+
+static inline void convert_statfs_to_fuse(struct fuse_kstatfs *attr, struct kstatfs *stbuf)
+{
+	attr->bsize   = stbuf->f_bsize;
+	attr->frsize  = stbuf->f_frsize;
+	attr->blocks  = stbuf->f_blocks;
+	attr->bfree   = stbuf->f_bfree;
+	attr->bavail  = stbuf->f_bavail;
+	attr->files   = stbuf->f_files;
+	attr->ffree   = stbuf->f_ffree;
+	attr->namelen = stbuf->f_namelen;
+	/* fsid is left zero */
+}
+
+static inline void convert_fuse_statfs(struct kstatfs *stbuf, struct fuse_kstatfs *attr)
+{
+	stbuf->f_type    = FUSE_SUPER_MAGIC;
+	stbuf->f_bsize   = attr->bsize;
+	stbuf->f_frsize  = attr->frsize;
+	stbuf->f_blocks  = attr->blocks;
+	stbuf->f_bfree   = attr->bfree;
+	stbuf->f_bavail  = attr->bavail;
+	stbuf->f_files   = attr->files;
+	stbuf->f_ffree   = attr->ffree;
+	stbuf->f_namelen = attr->namelen;
+	/* fsid is left zero */
+}
+
+#ifdef CONFIG_FUSE_BPF
+struct fuse_err_ret {
+	void *result;
+	bool ret;
+};
+
+int __init fuse_bpf_init(void);
+void __exit fuse_bpf_cleanup(void);
+
+ssize_t fuse_bpf_simple_request(struct fuse_mount *fm, struct fuse_bpf_args *args);
+
+/*
+ * Statement expression that wraps the backing filter logic.
+ * struct inode *inode: inode with bpf and backing inode
+ * typedef io: (typically complex) type whose components fuse_bpf_args can
+ *	point to. An instance of this type is created locally and passed to
+ *	initialize
+ * int initialize(struct fuse_bpf_args *fa, io *in_out, args...): function
+ *	that sets up fa and io based on args
+ * int backing(struct fuse_bpf_args *fa, args...): function that actually
+ *	performs the backing io operation
+ * void *finalize(struct fuse_bpf_args *fa, args...): function that performs
+ *	any final work needed to commit the backing io
+ */
+#define fuse_bpf_backing(inode, io, initialize, backing, finalize,	\
+			 args...)					\
+({									\
+	struct fuse_err_ret fer = {0};					\
+	int ext_flags;							\
+	struct fuse_inode *fuse_inode = get_fuse_inode(inode);		\
+	struct fuse_mount *fm = get_fuse_mount(inode);			\
+	io feo = {0};							\
+	struct fuse_bpf_args fa = {0}, fa_backup = {0};			\
+	bool locked;							\
+	ssize_t res;							\
+	void *err;							\
+	int i;								\
+	bool initialized = false;					\
+									\
+	do {								\
+		if (!fuse_inode || !fuse_inode->backing_inode)		\
+			break;						\
+									\
+		err = ERR_PTR(initialize(&fa, &feo, args));		\
+		if (err) {						\
+			fer = (struct fuse_err_ret) {			\
+				err,					\
+				true,					\
+			};						\
+			break;						\
+		}							\
+		initialized = true;					\
+									\
+		fa_backup = fa;						\
+		fa.opcode |= FUSE_PREFILTER;				\
+		for (i = 0; i < fa.in_numargs; ++i)			\
+			fa.out_args[i] = (struct fuse_bpf_arg) {	\
+				.size = fa.in_args[i].size,		\
+				.value = (void *)fa.in_args[i].value,	\
+			};						\
+		fa.out_numargs = fa.in_numargs;				\
+									\
+		ext_flags = fuse_inode->bpf ?				\
+			bpf_prog_run(fuse_inode->bpf, &fa) :		\
+			FUSE_BPF_BACKING;				\
+		if (ext_flags < 0) {					\
+			fer = (struct fuse_err_ret) {			\
+				ERR_PTR(ext_flags),			\
+				true,					\
+			};						\
+			break;						\
+		}							\
+									\
+		if (ext_flags & FUSE_BPF_USER_FILTER) {			\
+			locked = fuse_lock_inode(inode);		\
+			res = fuse_bpf_simple_request(fm, &fa);		\
+			fuse_unlock_inode(inode, locked);		\
+			if (res < 0) {					\
+				fer = (struct fuse_err_ret) {		\
+					ERR_PTR(res),			\
+					true,				\
+				};					\
+				break;					\
+			}						\
+		}							\
+									\
+		if (!(ext_flags & FUSE_BPF_BACKING))			\
+			break;						\
+									\
+		fa.opcode &= ~FUSE_PREFILTER;				\
+		for (i = 0; i < fa.in_numargs; ++i)			\
+			fa.in_args[i] = (struct fuse_bpf_in_arg) {	\
+				.size = fa.out_args[i].size,		\
+				.value = fa.out_args[i].value,		\
+			};						\
+		for (i = 0; i < fa_backup.out_numargs; ++i)		\
+			fa.out_args[i] = (struct fuse_bpf_arg) {	\
+				.size = fa_backup.out_args[i].size,	\
+				.value = fa_backup.out_args[i].value,	\
+			};						\
+		fa.out_numargs = fa_backup.out_numargs;			\
+									\
+		fer = (struct fuse_err_ret) {				\
+			ERR_PTR(backing(&fa, args)),			\
+			true,						\
+		};							\
+		if (IS_ERR(fer.result))					\
+			fa.error_in = PTR_ERR(fer.result);		\
+		if (!(ext_flags & FUSE_BPF_POST_FILTER))		\
+			break;						\
+									\
+		fa.opcode |= FUSE_POSTFILTER;				\
+		for (i = 0; i < fa.out_numargs; ++i)			\
+			fa.in_args[fa.in_numargs++] =			\
+				(struct fuse_bpf_in_arg) {		\
+					.size = fa.out_args[i].size,	\
+					.value = fa.out_args[i].value,	\
+				};					\
+		ext_flags = bpf_prog_run(fuse_inode->bpf, &fa);		\
+		if (ext_flags < 0) {					\
+			fer = (struct fuse_err_ret) {			\
+				ERR_PTR(ext_flags),			\
+				true,					\
+			};						\
+			break;						\
+		}							\
+		if (!(ext_flags & FUSE_BPF_USER_FILTER))		\
+			break;						\
+									\
+		fa.out_args[0].size = fa_backup.out_args[0].size;	\
+		fa.out_args[1].size = fa_backup.out_args[1].size;	\
+		fa.out_numargs = fa_backup.out_numargs;			\
+		locked = fuse_lock_inode(inode);			\
+		res = fuse_bpf_simple_request(fm, &fa);			\
+		fuse_unlock_inode(inode, locked);			\
+		if (res < 0) {						\
+			fer.result = ERR_PTR(res);			\
+			break;						\
+		}							\
+	} while (false);						\
+									\
+	if (initialized && fer.ret) {					\
+		err = finalize(&fa, args);				\
+		if (err)						\
+			fer.result = err;				\
+	}								\
+									\
+	fer;								\
+})
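+
+/*
+ * Example usage (a minimal sketch mirroring the fuse_statfs() caller in
+ * inode.c):
+ *
+ *	struct fuse_err_ret fer;
+ *
+ *	fer = fuse_bpf_backing(inode, struct fuse_statfs_out,
+ *			       fuse_statfs_initialize, fuse_statfs_backing,
+ *			       fuse_statfs_finalize, dentry, buf);
+ *	if (fer.ret)
+ *		return PTR_ERR(fer.result);
+ *
+ * fer.ret is true when the bpf filter and/or the backing filesystem handled
+ * the operation (prefilter, optional userspace round trip, backing call,
+ * optional postfilter) and fer.result carries the outcome.  Otherwise the
+ * caller falls through to the regular FUSE request path.
+ */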
+
+#endif /* CONFIG_FUSE_BPF */
+
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 6b3beda..1f59323 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -78,6 +78,10 @@ static struct inode *fuse_alloc_inode(struct super_block *sb)
 
 	fi->i_time = 0;
 	fi->inval_mask = 0;
+#ifdef CONFIG_FUSE_BPF
+	fi->backing_inode = NULL;
+	fi->bpf = NULL;
+#endif
 	fi->nodeid = 0;
 	fi->nlookup = 0;
 	fi->attr_version = 0;
@@ -120,6 +124,12 @@ static void fuse_evict_inode(struct inode *inode)
 	/* Will write inode on close/munmap and in all other dirtiers */
 	WARN_ON(inode->i_state & I_DIRTY_INODE);
 
+#ifdef CONFIG_FUSE_BPF
+	iput(fi->backing_inode);
+	if (fi->bpf)
+		bpf_prog_put(fi->bpf);
+	fi->bpf = NULL;
+#endif
 	truncate_inode_pages_final(&inode->i_data);
 	clear_inode(inode);
 	if (inode->i_sb->s_flags & SB_ACTIVE) {
@@ -162,6 +172,28 @@ static ino_t fuse_squash_ino(u64 ino64)
 	return ino;
 }
 
+static void fuse_fill_attr_from_inode(struct fuse_attr *attr,
+				      const struct inode *inode)
+{
+	*attr = (struct fuse_attr){
+		.ino		= inode->i_ino,
+		.size		= inode->i_size,
+		.blocks		= inode->i_blocks,
+		.atime		= inode->i_atime.tv_sec,
+		.mtime		= inode->i_mtime.tv_sec,
+		.ctime		= inode->i_ctime.tv_sec,
+		.atimensec	= inode->i_atime.tv_nsec,
+		.mtimensec	= inode->i_mtime.tv_nsec,
+		.ctimensec	= inode->i_ctime.tv_nsec,
+		.mode		= inode->i_mode,
+		.nlink		= inode->i_nlink,
+		.uid		= inode->i_uid.val,
+		.gid		= inode->i_gid.val,
+		.rdev		= inode->i_rdev,
+		.blksize	= 1u << inode->i_blkbits,
+	};
+}
+
 void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
 				   u64 attr_valid, u32 cache_mask)
 {
@@ -329,28 +361,104 @@ static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
 	else if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode) ||
 		 S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) {
 		fuse_init_common(inode);
-		init_special_inode(inode, inode->i_mode,
-				   new_decode_dev(attr->rdev));
+		init_special_inode(inode, inode->i_mode, attr->rdev);
 	} else
 		BUG();
 }
 
+struct fuse_inode_identifier {
+	u64 nodeid;
+	struct inode *backing_inode;
+};
+
 static int fuse_inode_eq(struct inode *inode, void *_nodeidp)
 {
-	u64 nodeid = *(u64 *) _nodeidp;
-	if (get_node_id(inode) == nodeid)
-		return 1;
-	else
-		return 0;
+	struct fuse_inode_identifier *fii =
+		(struct fuse_inode_identifier *) _nodeidp;
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	return fii->nodeid == fi->nodeid;
+}
+
+static int fuse_inode_backing_eq(struct inode *inode, void *_nodeidp)
+{
+	struct fuse_inode_identifier *fii =
+		(struct fuse_inode_identifier *) _nodeidp;
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	return fii->nodeid == fi->nodeid
+#ifdef CONFIG_FUSE_BPF
+		&& fii->backing_inode == fi->backing_inode
+#endif
+		;
 }
 
 static int fuse_inode_set(struct inode *inode, void *_nodeidp)
 {
-	u64 nodeid = *(u64 *) _nodeidp;
-	get_fuse_inode(inode)->nodeid = nodeid;
+	struct fuse_inode_identifier *fii =
+		(struct fuse_inode_identifier *) _nodeidp;
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	fi->nodeid = fii->nodeid;
+
 	return 0;
 }
 
+static int fuse_inode_backing_set(struct inode *inode, void *_nodeidp)
+{
+	struct fuse_inode_identifier *fii =
+		(struct fuse_inode_identifier *) _nodeidp;
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	fi->nodeid = fii->nodeid;
+#ifdef CONFIG_FUSE_BPF
+	fi->backing_inode = fii->backing_inode;
+	if (fi->backing_inode)
+		ihold(fi->backing_inode);
+#endif
+
+	return 0;
+}
+
+struct inode *fuse_iget_backing(struct super_block *sb, u64 nodeid,
+				struct inode *backing_inode)
+{
+	struct inode *inode;
+	struct fuse_inode *fi;
+	struct fuse_conn *fc = get_fuse_conn_super(sb);
+	struct fuse_inode_identifier fii = {
+		.nodeid = nodeid,
+		.backing_inode = backing_inode,
+	};
+	struct fuse_attr attr;
+	unsigned long hash = (unsigned long) backing_inode;
+
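+	/*
+	 * Hash by the daemon-assigned nodeid when there is one; otherwise
+	 * fall back to the backing inode pointer so that backing-only
+	 * inodes still land in distinct hash buckets.
+	 */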
+	if (nodeid)
+		hash = nodeid;
+
+	fuse_fill_attr_from_inode(&attr, backing_inode);
+	inode = iget5_locked(sb, hash, fuse_inode_backing_eq,
+			     fuse_inode_backing_set, &fii);
+	if (!inode)
+		return NULL;
+
+	if ((inode->i_state & I_NEW)) {
+		inode->i_flags |= S_NOATIME;
+		if (!fc->writeback_cache)
+			inode->i_flags |= S_NOCMTIME;
+		fuse_init_common(inode);
+		unlock_new_inode(inode);
+	}
+
+	fi = get_fuse_inode(inode);
+	fuse_init_inode(inode, &attr);
+	spin_lock(&fi->lock);
+	fi->nlookup++;
+	spin_unlock(&fi->lock);
+
+	return inode;
+}
+
 struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 			int generation, struct fuse_attr *attr,
 			u64 attr_valid, u64 attr_version)
@@ -358,6 +466,9 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 	struct inode *inode;
 	struct fuse_inode *fi;
 	struct fuse_conn *fc = get_fuse_conn_super(sb);
+	struct fuse_inode_identifier fii = {
+		.nodeid = nodeid,
+	};
 
 	/*
 	 * Auto mount points get their node id from the submount root, which is
@@ -379,7 +490,7 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 	}
 
 retry:
-	inode = iget5_locked(sb, nodeid, fuse_inode_eq, fuse_inode_set, &nodeid);
+	inode = iget5_locked(sb, nodeid, fuse_inode_eq, fuse_inode_set, &fii);
 	if (!inode)
 		return NULL;
 
@@ -411,13 +522,16 @@ struct inode *fuse_ilookup(struct fuse_conn *fc, u64 nodeid,
 {
 	struct fuse_mount *fm_iter;
 	struct inode *inode;
+	struct fuse_inode_identifier fii = {
+		.nodeid = nodeid,
+	};
 
 	WARN_ON(!rwsem_is_locked(&fc->killsb));
 	list_for_each_entry(fm_iter, &fc->mounts, fc_entry) {
 		if (!fm_iter->sb)
 			continue;
 
-		inode = ilookup5(fm_iter->sb, nodeid, fuse_inode_eq, &nodeid);
+		inode = ilookup5(fm_iter->sb, nodeid, fuse_inode_eq, &fii);
 		if (inode) {
 			if (fm)
 				*fm = fm_iter;
@@ -504,20 +618,6 @@ static void fuse_send_destroy(struct fuse_mount *fm)
 	}
 }
 
-static void convert_fuse_statfs(struct kstatfs *stbuf, struct fuse_kstatfs *attr)
-{
-	stbuf->f_type    = FUSE_SUPER_MAGIC;
-	stbuf->f_bsize   = attr->bsize;
-	stbuf->f_frsize  = attr->frsize;
-	stbuf->f_blocks  = attr->blocks;
-	stbuf->f_bfree   = attr->bfree;
-	stbuf->f_bavail  = attr->bavail;
-	stbuf->f_files   = attr->files;
-	stbuf->f_ffree   = attr->ffree;
-	stbuf->f_namelen = attr->namelen;
-	/* fsid is left zero */
-}
-
 static int fuse_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct super_block *sb = dentry->d_sb;
@@ -525,12 +625,24 @@ static int fuse_statfs(struct dentry *dentry, struct kstatfs *buf)
 	FUSE_ARGS(args);
 	struct fuse_statfs_out outarg;
 	int err;
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+#endif
 
 	if (!fuse_allow_current_process(fm->fc)) {
 		buf->f_type = FUSE_SUPER_MAGIC;
 		return 0;
 	}
 
+#ifdef CONFIG_FUSE_BPF
+	fer = fuse_bpf_backing(dentry->d_inode, struct fuse_statfs_out,
+			       fuse_statfs_initialize, fuse_statfs_backing,
+			       fuse_statfs_finalize,
+			       dentry, buf);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	memset(&outarg, 0, sizeof(outarg));
 	args.in_numargs = 0;
 	args.opcode = FUSE_STATFS;
@@ -647,6 +759,9 @@ enum {
 	OPT_ALLOW_OTHER,
 	OPT_MAX_READ,
 	OPT_BLKSIZE,
+	OPT_ROOT_BPF,
+	OPT_ROOT_DIR,
+	OPT_NO_DAEMON,
 	OPT_ERR
 };
 
@@ -661,6 +776,9 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
 	fsparam_u32	("max_read",		OPT_MAX_READ),
 	fsparam_u32	("blksize",		OPT_BLKSIZE),
 	fsparam_string	("subtype",		OPT_SUBTYPE),
+	fsparam_u32	("root_bpf",		OPT_ROOT_BPF),
+	fsparam_u32	("root_dir",		OPT_ROOT_DIR),
+	fsparam_flag	("no_daemon",		OPT_NO_DAEMON),
 	{}
 };
 
@@ -744,6 +862,26 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
 		ctx->blksize = result.uint_32;
 		break;
 
+	case OPT_ROOT_BPF:
+		ctx->root_bpf = bpf_prog_get_type_dev(result.uint_32,
+						BPF_PROG_TYPE_FUSE, false);
+		if (IS_ERR(ctx->root_bpf)) {
+			ctx->root_bpf = NULL;
+			return invalfc(fsc, "Unable to open bpf program");
+		}
+		break;
+
+	case OPT_ROOT_DIR:
+		ctx->root_dir = fget(result.uint_32);
+		if (!ctx->root_dir)
+			return invalfc(fsc, "Unable to open root directory");
+		break;
+
+	case OPT_NO_DAEMON:
+		ctx->no_daemon = true;
+		ctx->fd_present = true;
+		break;
+
 	default:
 		return -EINVAL;
 	}
@@ -756,6 +894,10 @@ static void fuse_free_fsc(struct fs_context *fsc)
 	struct fuse_fs_context *ctx = fsc->fs_private;
 
 	if (ctx) {
+		if (ctx->root_dir)
+			fput(ctx->root_dir);
+		if (ctx->root_bpf)
+			bpf_prog_put(ctx->root_bpf);
 		kfree(ctx->subtype);
 		kfree(ctx);
 	}
@@ -825,6 +967,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
 	memset(fc, 0, sizeof(*fc));
 	spin_lock_init(&fc->lock);
 	spin_lock_init(&fc->bg_lock);
+	spin_lock_init(&fc->passthrough_req_lock);
 	init_rwsem(&fc->killsb);
 	refcount_set(&fc->count, 1);
 	atomic_set(&fc->dev_count, 1);
@@ -833,6 +976,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
 	INIT_LIST_HEAD(&fc->bg_queue);
 	INIT_LIST_HEAD(&fc->entry);
 	INIT_LIST_HEAD(&fc->devices);
+	idr_init(&fc->passthrough_req);
 	atomic_set(&fc->num_waiting, 0);
 	fc->max_background = FUSE_DEFAULT_MAX_BACKGROUND;
 	fc->congestion_threshold = FUSE_DEFAULT_CONGESTION_THRESHOLD;
@@ -883,15 +1027,34 @@ struct fuse_conn *fuse_conn_get(struct fuse_conn *fc)
 }
 EXPORT_SYMBOL_GPL(fuse_conn_get);
 
-static struct inode *fuse_get_root_inode(struct super_block *sb, unsigned mode)
+static struct inode *fuse_get_root_inode(struct super_block *sb,
+					 unsigned int mode,
+					 struct bpf_prog *root_bpf,
+					 struct file *backing_fd)
 {
 	struct fuse_attr attr;
-	memset(&attr, 0, sizeof(attr));
+	struct inode *inode;
 
+	memset(&attr, 0, sizeof(attr));
 	attr.mode = mode;
 	attr.ino = FUSE_ROOT_ID;
 	attr.nlink = 1;
-	return fuse_iget(sb, 1, 0, &attr, 0, 0);
+	inode = fuse_iget(sb, 1, 0, &attr, 0, 0);
+	if (!inode)
+		return NULL;
+
+#ifdef CONFIG_FUSE_BPF
+	get_fuse_inode(inode)->bpf = root_bpf;
+	if (root_bpf)
+		bpf_prog_inc(root_bpf);
+
+	if (backing_fd) {
+		get_fuse_inode(inode)->backing_inode = backing_fd->f_inode;
+		ihold(backing_fd->f_inode);
+	}
+#endif
+
+	return inode;
 }
 
 struct fuse_inode_handle {
@@ -906,11 +1069,14 @@ static struct dentry *fuse_get_dentry(struct super_block *sb,
 	struct inode *inode;
 	struct dentry *entry;
 	int err = -ESTALE;
+	struct fuse_inode_identifier fii = {
+		.nodeid = handle->nodeid,
+	};
 
 	if (handle->nodeid == 0)
 		goto out_err;
 
-	inode = ilookup5(sb, handle->nodeid, fuse_inode_eq, &handle->nodeid);
+	inode = ilookup5(sb, handle->nodeid, fuse_inode_eq, &fii);
 	if (!inode) {
 		struct fuse_entry_out outarg;
 		const struct qstr name = QSTR_INIT(".", 1);
@@ -919,7 +1085,7 @@ static struct dentry *fuse_get_dentry(struct super_block *sb,
 			goto out_err;
 
 		err = fuse_lookup_name(sb, handle->nodeid, &name, &outarg,
-				       &inode);
+				       NULL, &inode);
 		if (err && err != -ENOENT)
 			goto out_err;
 		if (err || !inode) {
@@ -1013,13 +1179,14 @@ static struct dentry *fuse_get_parent(struct dentry *child)
 	struct inode *inode;
 	struct dentry *parent;
 	struct fuse_entry_out outarg;
+	const struct qstr name = QSTR_INIT("..", 2);
 	int err;
 
 	if (!fc->export_support)
 		return ERR_PTR(-ESTALE);
 
 	err = fuse_lookup_name(child_inode->i_sb, get_node_id(child_inode),
-			       &dotdot_name, &outarg, &inode);
+			       &name, &outarg, NULL, &inode);
 	if (err) {
 		if (err == -ENOENT)
 			return ERR_PTR(-ESTALE);
@@ -1197,6 +1364,12 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 				fc->handle_killpriv_v2 = 1;
 				fm->sb->s_flags |= SB_NOSEC;
 			}
+			if (flags & FUSE_PASSTHROUGH) {
+				fc->passthrough = 1;
+				/* Prevent further stacking */
+				fm->sb->s_stack_depth =
+					FILESYSTEM_MAX_STACK_DEPTH;
+			}
 			if (flags & FUSE_SETXATTR_EXT)
 				fc->setxattr_ext = 1;
 			if (flags & FUSE_SECURITY_CTX)
@@ -1256,6 +1429,8 @@ void fuse_send_init(struct fuse_mount *fm)
 	if (fm->fc->auto_submounts)
 		flags |= FUSE_SUBMOUNTS;
 
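+	/* Always offer passthrough; the daemon opts in via its INIT reply */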
+	flags |= FUSE_PASSTHROUGH;
+
 	ia->in.flags = flags;
 	ia->in.flags2 = flags >> 32;
 
@@ -1274,14 +1449,26 @@ void fuse_send_init(struct fuse_mount *fm)
 	ia->args.nocreds = true;
 	ia->args.end = process_init_reply;
 
-	if (fuse_simple_background(fm, &ia->args, GFP_KERNEL) != 0)
+	if (unlikely(fm->fc->no_daemon) || fuse_simple_background(fm, &ia->args, GFP_KERNEL) != 0)
 		process_init_reply(fm, &ia->args, -ENOTCONN);
 }
 EXPORT_SYMBOL_GPL(fuse_send_init);
 
+static int free_fuse_passthrough(int id, void *p, void *data)
+{
+	struct fuse_passthrough *passthrough = (struct fuse_passthrough *)p;
+
+	fuse_passthrough_release(passthrough);
+	kfree(p);
+
+	return 0;
+}
+
 void fuse_free_conn(struct fuse_conn *fc)
 {
 	WARN_ON(!list_empty(&fc->devices));
+	idr_for_each(&fc->passthrough_req, free_fuse_passthrough, NULL);
+	idr_destroy(&fc->passthrough_req);
 	kfree_rcu(fc, rcu);
 }
 EXPORT_SYMBOL_GPL(fuse_free_conn);
@@ -1386,28 +1573,6 @@ void fuse_dev_free(struct fuse_dev *fud)
 }
 EXPORT_SYMBOL_GPL(fuse_dev_free);
 
-static void fuse_fill_attr_from_inode(struct fuse_attr *attr,
-				      const struct fuse_inode *fi)
-{
-	*attr = (struct fuse_attr){
-		.ino		= fi->inode.i_ino,
-		.size		= fi->inode.i_size,
-		.blocks		= fi->inode.i_blocks,
-		.atime		= fi->inode.i_atime.tv_sec,
-		.mtime		= fi->inode.i_mtime.tv_sec,
-		.ctime		= fi->inode.i_ctime.tv_sec,
-		.atimensec	= fi->inode.i_atime.tv_nsec,
-		.mtimensec	= fi->inode.i_mtime.tv_nsec,
-		.ctimensec	= fi->inode.i_ctime.tv_nsec,
-		.mode		= fi->inode.i_mode,
-		.nlink		= fi->inode.i_nlink,
-		.uid		= fi->inode.i_uid.val,
-		.gid		= fi->inode.i_gid.val,
-		.rdev		= fi->inode.i_rdev,
-		.blksize	= 1u << fi->inode.i_blkbits,
-	};
-}
-
 static void fuse_sb_defaults(struct super_block *sb)
 {
 	sb->s_magic = FUSE_SUPER_MAGIC;
@@ -1451,7 +1616,7 @@ static int fuse_fill_super_submount(struct super_block *sb,
 	if (parent_sb->s_subtype && !sb->s_subtype)
 		return -ENOMEM;
 
-	fuse_fill_attr_from_inode(&root_attr, parent_fi);
+	fuse_fill_attr_from_inode(&root_attr, &parent_fi->inode);
 	root = fuse_iget(sb, parent_fi->nodeid, 0, &root_attr, 0, 0);
 	/*
 	 * This inode is just a duplicate, so it is not looked up and
@@ -1578,13 +1743,16 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	fc->destroy = ctx->destroy;
 	fc->no_control = ctx->no_control;
 	fc->no_force_umount = ctx->no_force_umount;
+	fc->no_daemon = ctx->no_daemon;
 
 	err = -ENOMEM;
-	root = fuse_get_root_inode(sb, ctx->rootmode);
+	root = fuse_get_root_inode(sb, ctx->rootmode, ctx->root_bpf,
+				   ctx->root_dir);
 	sb->s_d_op = &fuse_root_dentry_operations;
 	root_dentry = d_make_root(root);
 	if (!root_dentry)
 		goto err_dev_free;
+	fuse_init_dentry_root(root_dentry, ctx->root_dir);
 	/* Root dentry doesn't have .d_revalidate */
 	sb->s_d_op = &fuse_dentry_operations;
 
@@ -1623,18 +1791,20 @@ static int fuse_fill_super(struct super_block *sb, struct fs_context *fsc)
 	struct fuse_fs_context *ctx = fsc->fs_private;
 	int err;
 
-	if (!ctx->file || !ctx->rootmode_present ||
-	    !ctx->user_id_present || !ctx->group_id_present)
-		return -EINVAL;
+	if (!ctx->no_daemon) {
+		if (!ctx->file || !ctx->rootmode_present ||
+		    !ctx->user_id_present || !ctx->group_id_present)
+			return -EINVAL;
 
-	/*
-	 * Require mount to happen from the same user namespace which
-	 * opened /dev/fuse to prevent potential attacks.
-	 */
-	if ((ctx->file->f_op != &fuse_dev_operations) ||
-	    (ctx->file->f_cred->user_ns != sb->s_user_ns))
-		return -EINVAL;
-	ctx->fudptr = &ctx->file->private_data;
+		/*
+		 * Require mount to happen from the same user namespace which
+		 * opened /dev/fuse to prevent potential attacks.
+		 */
+		if ((ctx->file->f_op != &fuse_dev_operations) ||
+		    (ctx->file->f_cred->user_ns != sb->s_user_ns))
+			return -EINVAL;
+		ctx->fudptr = &ctx->file->private_data;
+	}
 
 	err = fuse_fill_super_common(sb, ctx);
 	if (err)
@@ -1915,6 +2085,57 @@ static void fuse_fs_cleanup(void)
 
 static struct kobject *fuse_kobj;
 
+static ssize_t fuse_bpf_show(struct kobject *kobj,
+				       struct kobj_attribute *attr, char *buff)
+{
+	return sysfs_emit(buff, "supported\n");
+}
+
+static struct kobj_attribute fuse_bpf_attr =
+		__ATTR_RO(fuse_bpf);
+
+static struct attribute *bpf_features[] = {
+	&fuse_bpf_attr.attr,
+	NULL,
+};
+
+static const struct attribute_group bpf_features_group = {
+	.name = "features",
+	.attrs = bpf_features,
+};
+
+/*
+ * TODO Remove this once fuse-bpf is upstreamed
+ *
+ * bpf_prog_type_fuse exports the value of BPF_PROG_TYPE_FUSE, which cannot
+ * be a fixed constant until the code is upstreamed
+ */
+static ssize_t bpf_prog_type_fuse_show(struct kobject *kobj,
+				       struct kobj_attribute *attr, char *buff)
+{
+	return sysfs_emit(buff, "%d\n", BPF_PROG_TYPE_FUSE);
+}
+
+static struct kobj_attribute bpf_prog_type_fuse_attr =
+		__ATTR_RO(bpf_prog_type_fuse);
+
+static struct attribute *bpf_attributes[] = {
+	&bpf_prog_type_fuse_attr.attr,
+	NULL,
+};
+
+static const struct attribute_group bpf_attr_group = {
+	.attrs = bpf_attributes,
+};
+
+static const struct attribute_group *attribute_groups[] = {
+	&bpf_features_group,
+	&bpf_attr_group,
+	NULL
+};
+
+/* TODO remove to here */
+
 static int fuse_sysfs_init(void)
 {
 	int err;
@@ -1929,8 +2150,15 @@ static int fuse_sysfs_init(void)
 	if (err)
 		goto out_fuse_unregister;
 
+	/* TODO Remove when BPF_PROG_TYPE_FUSE is upstreamed */
+	err = sysfs_create_groups(fuse_kobj, attribute_groups);
+	if (err)
+		goto out_fuse_remove_mount_point;
+
 	return 0;
 
+ out_fuse_remove_mount_point:
+	sysfs_remove_mount_point(fuse_kobj, "connections");
  out_fuse_unregister:
 	kobject_put(fuse_kobj);
  out_err:
@@ -1939,6 +2167,7 @@ static int fuse_sysfs_init(void)
 
 static void fuse_sysfs_cleanup(void)
 {
+	sysfs_remove_groups(fuse_kobj, attribute_groups);
 	sysfs_remove_mount_point(fuse_kobj, "connections");
 	kobject_put(fuse_kobj);
 }
@@ -1967,11 +2196,21 @@ static int __init fuse_init(void)
 	if (res)
 		goto err_sysfs_cleanup;
 
+#ifdef CONFIG_FUSE_BPF
+	res = fuse_bpf_init();
+	if (res)
+		goto err_ctl_cleanup;
+#endif
+
 	sanitize_global_limit(&max_user_bgreq);
 	sanitize_global_limit(&max_user_congthresh);
 
 	return 0;
 
+#ifdef CONFIG_FUSE_BPF
+ err_ctl_cleanup:
+	fuse_ctl_cleanup();
+#endif
  err_sysfs_cleanup:
 	fuse_sysfs_cleanup();
  err_dev_cleanup:
@@ -1989,6 +2228,9 @@ static void __exit fuse_exit(void)
 	fuse_ctl_cleanup();
 	fuse_sysfs_cleanup();
 	fuse_fs_cleanup();
+#ifdef CONFIG_FUSE_BPF
+	fuse_bpf_cleanup();
+#endif
 	fuse_dev_cleanup();
 }
 
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
new file mode 100644
index 0000000..c0ae306
--- /dev/null
+++ b/fs/fuse/passthrough.c
@@ -0,0 +1,294 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "fuse_i.h"
+
+#include <linux/file.h>
+#include <linux/fuse.h>
+#include <linux/idr.h>
+#include <linux/uio.h>
+
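+/* kiocb flags forwarded as-is to the lower filesystem's read/write */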
+#define PASSTHROUGH_IOCB_MASK                                                  \
+	(IOCB_APPEND | IOCB_DSYNC | IOCB_HIPRI | IOCB_NOWAIT | IOCB_SYNC)
+
+struct fuse_aio_req {
+	struct kiocb iocb;
+	struct kiocb *iocb_fuse;
+};
+
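+/* Mirror the lower file's c/mtime on the FUSE inode and mark it accessed */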
+static void fuse_file_accessed(struct file *dst_file, struct file *src_file)
+{
+	struct inode *dst_inode;
+	struct inode *src_inode;
+
+	if (dst_file->f_flags & O_NOATIME)
+		return;
+
+	dst_inode = file_inode(dst_file);
+	src_inode = file_inode(src_file);
+
+	if ((!timespec64_equal(&dst_inode->i_mtime, &src_inode->i_mtime) ||
+	     !timespec64_equal(&dst_inode->i_ctime, &src_inode->i_ctime))) {
+		dst_inode->i_mtime = src_inode->i_mtime;
+		dst_inode->i_ctime = src_inode->i_ctime;
+	}
+
+	touch_atime(&dst_file->f_path);
+}
+
+void fuse_copyattr(struct file *dst_file, struct file *src_file)
+{
+	struct inode *dst = file_inode(dst_file);
+	struct inode *src = file_inode(src_file);
+
+	dst->i_atime = src->i_atime;
+	dst->i_mtime = src->i_mtime;
+	dst->i_ctime = src->i_ctime;
+	i_size_write(dst, i_size_read(src));
+}
+
+static void fuse_aio_cleanup_handler(struct fuse_aio_req *aio_req)
+{
+	struct kiocb *iocb = &aio_req->iocb;
+	struct kiocb *iocb_fuse = aio_req->iocb_fuse;
+
+	if (iocb->ki_flags & IOCB_WRITE) {
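+		/*
+		 * Re-take the freeze protection handed over by the submitter
+		 * (see __sb_writers_release() in the write path) so that
+		 * file_end_write() can release it from this context.
+		 */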
+		__sb_writers_acquired(file_inode(iocb->ki_filp)->i_sb,
+				      SB_FREEZE_WRITE);
+		file_end_write(iocb->ki_filp);
+		fuse_copyattr(iocb_fuse->ki_filp, iocb->ki_filp);
+	}
+
+	iocb_fuse->ki_pos = iocb->ki_pos;
+	kfree(aio_req);
+}
+
+static void fuse_aio_rw_complete(struct kiocb *iocb, long res)
+{
+	struct fuse_aio_req *aio_req =
+		container_of(iocb, struct fuse_aio_req, iocb);
+	struct kiocb *iocb_fuse = aio_req->iocb_fuse;
+
+	fuse_aio_cleanup_handler(aio_req);
+	iocb_fuse->ki_complete(iocb_fuse, res);
+}
+
+ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
+				   struct iov_iter *iter)
+{
+	ssize_t ret;
+	const struct cred *old_cred;
+	struct file *fuse_filp = iocb_fuse->ki_filp;
+	struct fuse_file *ff = fuse_filp->private_data;
+	struct file *passthrough_filp = ff->passthrough.filp;
+
+	if (!iov_iter_count(iter))
+		return 0;
+
+	old_cred = override_creds(ff->passthrough.cred);
+	if (is_sync_kiocb(iocb_fuse)) {
+		ret = vfs_iter_read(passthrough_filp, iter, &iocb_fuse->ki_pos,
+				    iocb_to_rw_flags(iocb_fuse->ki_flags,
+						     PASSTHROUGH_IOCB_MASK));
+	} else {
+		struct fuse_aio_req *aio_req;
+
+		aio_req = kmalloc(sizeof(struct fuse_aio_req), GFP_KERNEL);
+		if (!aio_req) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		aio_req->iocb_fuse = iocb_fuse;
+		kiocb_clone(&aio_req->iocb, iocb_fuse, passthrough_filp);
+		aio_req->iocb.ki_complete = fuse_aio_rw_complete;
+		ret = call_read_iter(passthrough_filp, &aio_req->iocb, iter);
+		if (ret != -EIOCBQUEUED)
+			fuse_aio_cleanup_handler(aio_req);
+	}
+out:
+	revert_creds(old_cred);
+
+	fuse_file_accessed(fuse_filp, passthrough_filp);
+
+	return ret;
+}
+
+ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
+				    struct iov_iter *iter)
+{
+	ssize_t ret;
+	const struct cred *old_cred;
+	struct file *fuse_filp = iocb_fuse->ki_filp;
+	struct fuse_file *ff = fuse_filp->private_data;
+	struct inode *fuse_inode = file_inode(fuse_filp);
+	struct file *passthrough_filp = ff->passthrough.filp;
+	struct inode *passthrough_inode = file_inode(passthrough_filp);
+
+	if (!iov_iter_count(iter))
+		return 0;
+
+	inode_lock(fuse_inode);
+
+	fuse_copyattr(fuse_filp, passthrough_filp);
+
+	old_cred = override_creds(ff->passthrough.cred);
+	if (is_sync_kiocb(iocb_fuse)) {
+		file_start_write(passthrough_filp);
+		ret = vfs_iter_write(passthrough_filp, iter, &iocb_fuse->ki_pos,
+				     iocb_to_rw_flags(iocb_fuse->ki_flags,
+						      PASSTHROUGH_IOCB_MASK));
+		file_end_write(passthrough_filp);
+		if (ret > 0)
+			fuse_copyattr(fuse_filp, passthrough_filp);
+	} else {
+		struct fuse_aio_req *aio_req;
+
+		aio_req = kmalloc(sizeof(struct fuse_aio_req), GFP_KERNEL);
+		if (!aio_req) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		file_start_write(passthrough_filp);
+		__sb_writers_release(passthrough_inode->i_sb, SB_FREEZE_WRITE);
+
+		aio_req->iocb_fuse = iocb_fuse;
+		kiocb_clone(&aio_req->iocb, iocb_fuse, passthrough_filp);
+		aio_req->iocb.ki_complete = fuse_aio_rw_complete;
+		ret = call_write_iter(passthrough_filp, &aio_req->iocb, iter);
+		if (ret != -EIOCBQUEUED)
+			fuse_aio_cleanup_handler(aio_req);
+	}
+out:
+	revert_creds(old_cred);
+	inode_unlock(fuse_inode);
+
+	return ret;
+}
+
+ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	int ret;
+	const struct cred *old_cred;
+	struct fuse_file *ff = file->private_data;
+	struct file *passthrough_filp = ff->passthrough.filp;
+
+	if (!passthrough_filp->f_op->mmap)
+		return -ENODEV;
+
+	if (WARN_ON(file != vma->vm_file))
+		return -EIO;
+
+	vma->vm_file = get_file(passthrough_filp);
+
+	old_cred = override_creds(ff->passthrough.cred);
+	ret = call_mmap(vma->vm_file, vma);
+	revert_creds(old_cred);
+
+	if (ret)
+		fput(passthrough_filp);
+	else
+		fput(file);
+
+	fuse_file_accessed(file, passthrough_filp);
+
+	return ret;
+}
+
+int fuse_passthrough_open(struct fuse_dev *fud, u32 lower_fd)
+{
+	int res;
+	struct file *passthrough_filp;
+	struct fuse_conn *fc = fud->fc;
+	struct inode *passthrough_inode;
+	struct super_block *passthrough_sb;
+	struct fuse_passthrough *passthrough;
+
+	if (!fc->passthrough)
+		return -EPERM;
+
+	passthrough_filp = fget(lower_fd);
+	if (!passthrough_filp) {
+		pr_err("FUSE: invalid file descriptor for passthrough.\n");
+		return -EBADF;
+	}
+
+	if (!passthrough_filp->f_op->read_iter ||
+	    !passthrough_filp->f_op->write_iter) {
+		pr_err("FUSE: passthrough file misses file operations.\n");
+		res = -EBADF;
+		goto err_free_file;
+	}
+
+	passthrough_inode = file_inode(passthrough_filp);
+	passthrough_sb = passthrough_inode->i_sb;
+	if (passthrough_sb->s_stack_depth >= FILESYSTEM_MAX_STACK_DEPTH) {
+		pr_err("FUSE: fs stacking depth exceeded for passthrough\n");
+		res = -EINVAL;
+		goto err_free_file;
+	}
+
+	passthrough = kmalloc(sizeof(struct fuse_passthrough), GFP_KERNEL);
+	if (!passthrough) {
+		res = -ENOMEM;
+		goto err_free_file;
+	}
+
+	passthrough->filp = passthrough_filp;
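+	/* Passthrough I/O later runs under the creds captured here */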
+	passthrough->cred = prepare_creds();
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&fc->passthrough_req_lock);
+	res = idr_alloc(&fc->passthrough_req, passthrough, 1, 0, GFP_ATOMIC);
+	spin_unlock(&fc->passthrough_req_lock);
+	idr_preload_end();
+
+	if (res > 0)
+		return res;
+
+	/*
+	 * fuse_passthrough_release() already dropped the file reference, so
+	 * return here rather than falling through to the fput() below.
+	 */
+	fuse_passthrough_release(passthrough);
+	kfree(passthrough);
+	return res;
+
+err_free_file:
+	fput(passthrough_filp);
+
+	return res;
+}
+
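+/*
+ * Consume the id previously returned by fuse_passthrough_open() and move
+ * the lower file (and captured creds) into this fuse_file.
+ */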
+int fuse_passthrough_setup(struct fuse_conn *fc, struct fuse_file *ff,
+			   struct fuse_open_out *openarg)
+{
+	struct fuse_passthrough *passthrough;
+	int passthrough_fh = openarg->passthrough_fh;
+
+	if (!fc->passthrough)
+		return -EPERM;
+
+	/* Default case: the daemon did not request passthrough for this file */
+	if (passthrough_fh <= 0)
+		return -EINVAL;
+
+	spin_lock(&fc->passthrough_req_lock);
+	passthrough = idr_remove(&fc->passthrough_req, passthrough_fh);
+	spin_unlock(&fc->passthrough_req_lock);
+
+	if (!passthrough)
+		return -EINVAL;
+
+	ff->passthrough = *passthrough;
+	kfree(passthrough);
+
+	return 0;
+}
+
+void fuse_passthrough_release(struct fuse_passthrough *passthrough)
+{
+	if (passthrough->filp) {
+		fput(passthrough->filp);
+		passthrough->filp = NULL;
+	}
+	if (passthrough->cred) {
+		put_cred(passthrough->cred);
+		passthrough->cred = NULL;
+	}
+}
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index e8deaac..4d97cdd 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -20,6 +20,8 @@ static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx)
 
 	if (!fc->do_readdirplus)
 		return false;
+	if (fi->nodeid == 0)
+		return false;
 	if (!fc->readdirplus_auto)
 		return true;
 	if (test_and_clear_bit(FUSE_I_ADVISE_RDPLUS, &fi->state))
@@ -579,6 +581,26 @@ int fuse_readdir(struct file *file, struct dir_context *ctx)
 	struct inode *inode = file_inode(file);
 	int err;
 
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+	bool allow_force;
+	bool force_again = false;
+	bool is_continued = false;
+
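+	/* The backing readdir may request another pass to emit more entries */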
+again:
+	fer = fuse_bpf_backing(inode, struct fuse_read_io,
+			       fuse_readdir_initialize, fuse_readdir_backing,
+			       fuse_readdir_finalize,
+			       file, ctx, &force_again, &allow_force, is_continued);
+	if (force_again && !IS_ERR(fer.result)) {
+		is_continued = true;
+		goto again;
+	}
+
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	if (fuse_is_bad(inode))
 		return -EIO;
 
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 4d8d4f1..23a42d8 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -970,7 +970,7 @@ static struct virtio_driver virtio_fs_driver = {
 #endif
 };
 
-static void virtio_fs_wake_forget_and_unlock(struct fuse_iqueue *fiq)
+static void virtio_fs_wake_forget_and_unlock(struct fuse_iqueue *fiq, bool sync)
 __releases(fiq->lock)
 {
 	struct fuse_forget_link *link;
@@ -1005,7 +1005,8 @@ __releases(fiq->lock)
 	kfree(link);
 }
 
-static void virtio_fs_wake_interrupt_and_unlock(struct fuse_iqueue *fiq)
+static void virtio_fs_wake_interrupt_and_unlock(struct fuse_iqueue *fiq,
+						bool sync)
 __releases(fiq->lock)
 {
 	/*
@@ -1220,7 +1221,8 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 	return ret;
 }
 
-static void virtio_fs_wake_pending_and_unlock(struct fuse_iqueue *fiq)
+static void virtio_fs_wake_pending_and_unlock(struct fuse_iqueue *fiq,
+					      bool sync)
 __releases(fiq->lock)
 {
 	unsigned int queue_id = VQ_REQUEST; /* TODO multiqueue */
diff --git a/fs/fuse/xattr.c b/fs/fuse/xattr.c
index 0d3e717..3029d1e 100644
--- a/fs/fuse/xattr.c
+++ b/fs/fuse/xattr.c
@@ -115,6 +115,17 @@ ssize_t fuse_listxattr(struct dentry *entry, char *list, size_t size)
 	struct fuse_getxattr_out outarg;
 	ssize_t ret;
 
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	fer = fuse_bpf_backing(inode, struct fuse_getxattr_io,
+			       fuse_listxattr_initialize,
+			       fuse_listxattr_backing, fuse_listxattr_finalize,
+			       entry, list, size);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	if (fuse_is_bad(inode))
 		return -EIO;
 
@@ -182,6 +193,17 @@ static int fuse_xattr_get(const struct xattr_handler *handler,
 			 struct dentry *dentry, struct inode *inode,
 			 const char *name, void *value, size_t size)
 {
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	fer = fuse_bpf_backing(inode, struct fuse_getxattr_io,
+			       fuse_getxattr_initialize, fuse_getxattr_backing,
+			       fuse_getxattr_finalize,
+			       dentry, name, value, size);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	if (fuse_is_bad(inode))
 		return -EIO;
 
@@ -194,6 +216,24 @@ static int fuse_xattr_set(const struct xattr_handler *handler,
 			  const char *name, const void *value, size_t size,
 			  int flags)
 {
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_err_ret fer;
+
+	if (value)
+		fer = fuse_bpf_backing(inode, struct fuse_setxattr_in,
+			       fuse_setxattr_initialize, fuse_setxattr_backing,
+			       fuse_setxattr_finalize, dentry, name, value,
+			       size, flags);
+	else
+		fer = fuse_bpf_backing(inode, struct fuse_dummy_io,
+				       fuse_removexattr_initialize,
+				       fuse_removexattr_backing,
+				       fuse_removexattr_finalize,
+				       dentry, name);
+	if (fer.ret)
+		return PTR_ERR(fer.result);
+#endif
+
 	if (fuse_is_bad(inode))
 		return -EIO;
 
diff --git a/fs/incfs/Kconfig b/fs/incfs/Kconfig
new file mode 100644
index 0000000..5f15000
--- /dev/null
+++ b/fs/incfs/Kconfig
@@ -0,0 +1,13 @@
+config INCREMENTAL_FS
+	tristate "Incremental file system support"
+	depends on BLOCK
+	select DECOMPRESS_LZ4
+	select DECOMPRESS_ZSTD
+	select CRYPTO_SHA256
+	help
+	  Incremental FS is a read-only virtual file system that facilitates execution
+	  of programs while their binaries are still being lazily downloaded over the
+	  network, USB or pigeon post.
+
+	  To compile this file system support as a module, choose M here: the
+	  module will be called incrementalfs.
diff --git a/fs/incfs/Makefile b/fs/incfs/Makefile
new file mode 100644
index 0000000..05795d1
--- /dev/null
+++ b/fs/incfs/Makefile
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_INCREMENTAL_FS)	+= incrementalfs.o
+
+incrementalfs-y := \
+	data_mgmt.o \
+	format.o \
+	integrity.o \
+	main.o \
+	pseudo_files.o \
+	sysfs.o \
+	vfs.o
+
+incrementalfs-$(CONFIG_FS_VERITY) += verity.o
diff --git a/fs/incfs/OWNERS b/fs/incfs/OWNERS
new file mode 100644
index 0000000..1b97669
--- /dev/null
+++ b/fs/incfs/OWNERS
@@ -0,0 +1,2 @@
+akailash@google.com
+paullawrence@google.com
diff --git a/fs/incfs/data_mgmt.c b/fs/incfs/data_mgmt.c
new file mode 100644
index 0000000..01ee5d2
--- /dev/null
+++ b/fs/incfs/data_mgmt.c
@@ -0,0 +1,1892 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2019 Google LLC
+ */
+#include <linux/crc32.h>
+#include <linux/delay.h>
+#include <linux/file.h>
+#include <linux/fsverity.h>
+#include <linux/gfp.h>
+#include <linux/kobject.h>
+#include <linux/ktime.h>
+#include <linux/lz4.h>
+#include <linux/mm.h>
+#include <linux/namei.h>
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/workqueue.h>
+
+#include "data_mgmt.h"
+#include "format.h"
+#include "integrity.h"
+#include "sysfs.h"
+#include "verity.h"
+
+static int incfs_scan_metadata_chain(struct data_file *df);
+
+static void log_wake_up_all(struct work_struct *work)
+{
+	struct delayed_work *dw = container_of(work, struct delayed_work, work);
+	struct read_log *rl = container_of(dw, struct read_log, ml_wakeup_work);
+
+	wake_up_all(&rl->ml_notif_wq);
+}
+
+static void zstd_free_workspace(struct work_struct *work)
+{
+	struct delayed_work *dw = container_of(work, struct delayed_work, work);
+	struct mount_info *mi =
+		container_of(dw, struct mount_info, mi_zstd_cleanup_work);
+
+	mutex_lock(&mi->mi_zstd_workspace_mutex);
+	kvfree(mi->mi_zstd_workspace);
+	mi->mi_zstd_workspace = NULL;
+	mi->mi_zstd_stream = NULL;
+	mutex_unlock(&mi->mi_zstd_workspace_mutex);
+}
+
+struct mount_info *incfs_alloc_mount_info(struct super_block *sb,
+					  struct mount_options *options,
+					  struct path *backing_dir_path)
+{
+	struct mount_info *mi = NULL;
+	int error = 0;
+	struct incfs_sysfs_node *node;
+
+	mi = kzalloc(sizeof(*mi), GFP_NOFS);
+	if (!mi)
+		return ERR_PTR(-ENOMEM);
+
+	mi->mi_sb = sb;
+	mi->mi_backing_dir_path = *backing_dir_path;
+	mi->mi_owner = get_current_cred();
+	path_get(&mi->mi_backing_dir_path);
+	mutex_init(&mi->mi_dir_struct_mutex);
+	init_waitqueue_head(&mi->mi_pending_reads_notif_wq);
+	init_waitqueue_head(&mi->mi_log.ml_notif_wq);
+	init_waitqueue_head(&mi->mi_blocks_written_notif_wq);
+	atomic_set(&mi->mi_blocks_written, 0);
+	INIT_DELAYED_WORK(&mi->mi_log.ml_wakeup_work, log_wake_up_all);
+	spin_lock_init(&mi->mi_log.rl_lock);
+	spin_lock_init(&mi->pending_read_lock);
+	INIT_LIST_HEAD(&mi->mi_reads_list_head);
+	spin_lock_init(&mi->mi_per_uid_read_timeouts_lock);
+	mutex_init(&mi->mi_zstd_workspace_mutex);
+	INIT_DELAYED_WORK(&mi->mi_zstd_cleanup_work, zstd_free_workspace);
+	mutex_init(&mi->mi_le_mutex);
+
+	node = incfs_add_sysfs_node(options->sysfs_name, mi);
+	if (IS_ERR(node)) {
+		error = PTR_ERR(node);
+		goto err;
+	}
+	mi->mi_sysfs_node = node;
+
+	error = incfs_realloc_mount_info(mi, options);
+	if (error)
+		goto err;
+
+	return mi;
+
+err:
+	incfs_free_mount_info(mi);
+	return ERR_PTR(error);
+}
+
+int incfs_realloc_mount_info(struct mount_info *mi,
+			     struct mount_options *options)
+{
+	void *new_buffer = NULL;
+	void *old_buffer;
+	size_t new_buffer_size = 0;
+
+	if (options->read_log_pages != mi->mi_options.read_log_pages) {
+		struct read_log_state log_state;
+		/*
+		 * Even though having two buffers allocated at once isn't
+		 * usually good, allocating a multipage buffer under a spinlock
+		 * is even worse, so let's optimize for the shorter lock
+		 * duration. It's not the end of the world if we fail to
+		 * increase the buffer size anyway.
+		 */
+		if (options->read_log_pages > 0) {
+			new_buffer_size = PAGE_SIZE * options->read_log_pages;
+			new_buffer = kzalloc(new_buffer_size, GFP_NOFS);
+			if (!new_buffer)
+				return -ENOMEM;
+		}
+
+		spin_lock(&mi->mi_log.rl_lock);
+		old_buffer = mi->mi_log.rl_ring_buf;
+		mi->mi_log.rl_ring_buf = new_buffer;
+		mi->mi_log.rl_size = new_buffer_size;
+		log_state = (struct read_log_state){
+			.generation_id = mi->mi_log.rl_head.generation_id + 1,
+		};
+		mi->mi_log.rl_head = log_state;
+		mi->mi_log.rl_tail = log_state;
+		spin_unlock(&mi->mi_log.rl_lock);
+
+		kfree(old_buffer);
+	}
+
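+	/* Create, drop or rename the sysfs node to match the new options */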
+	if (options->sysfs_name && !mi->mi_sysfs_node)
+		mi->mi_sysfs_node = incfs_add_sysfs_node(options->sysfs_name,
+							 mi);
+	else if (!options->sysfs_name && mi->mi_sysfs_node) {
+		incfs_free_sysfs_node(mi->mi_sysfs_node);
+		mi->mi_sysfs_node = NULL;
+	} else if (options->sysfs_name &&
+		strcmp(options->sysfs_name,
+		       kobject_name(&mi->mi_sysfs_node->isn_sysfs_node))) {
+		incfs_free_sysfs_node(mi->mi_sysfs_node);
+		mi->mi_sysfs_node = incfs_add_sysfs_node(options->sysfs_name,
+							 mi);
+	}
+
+	if (IS_ERR(mi->mi_sysfs_node)) {
+		int err = PTR_ERR(mi->mi_sysfs_node);
+
+		mi->mi_sysfs_node = NULL;
+		return err;
+	}
+
+	mi->mi_options = *options;
+	return 0;
+}
+
+void incfs_free_mount_info(struct mount_info *mi)
+{
+	int i;
+
+	if (!mi)
+		return;
+
+	flush_delayed_work(&mi->mi_log.ml_wakeup_work);
+	flush_delayed_work(&mi->mi_zstd_cleanup_work);
+
+	dput(mi->mi_index_dir);
+	dput(mi->mi_incomplete_dir);
+	path_put(&mi->mi_backing_dir_path);
+	mutex_destroy(&mi->mi_dir_struct_mutex);
+	mutex_destroy(&mi->mi_zstd_workspace_mutex);
+	put_cred(mi->mi_owner);
+	kfree(mi->mi_log.rl_ring_buf);
+	for (i = 0; i < ARRAY_SIZE(mi->pseudo_file_xattr); ++i)
+		kfree(mi->pseudo_file_xattr[i].data);
+	kfree(mi->mi_per_uid_read_timeouts);
+	incfs_free_sysfs_node(mi->mi_sysfs_node);
+	kfree(mi);
+}
+
+static void data_file_segment_init(struct data_file_segment *segment)
+{
+	init_waitqueue_head(&segment->new_data_arrival_wq);
+	init_rwsem(&segment->rwsem);
+	INIT_LIST_HEAD(&segment->reads_list_head);
+}
+
+char *file_id_to_str(incfs_uuid_t id)
+{
+	char *result = kmalloc(1 + sizeof(id.bytes) * 2, GFP_NOFS);
+	char *end;
+
+	if (!result)
+		return NULL;
+
+	end = bin2hex(result, id.bytes, sizeof(id.bytes));
+	*end = 0;
+	return result;
+}
+
+struct dentry *incfs_lookup_dentry(struct dentry *parent, const char *name)
+{
+	struct inode *inode;
+	struct dentry *result = NULL;
+
+	if (!parent)
+		return ERR_PTR(-EFAULT);
+
+	inode = d_inode(parent);
+	inode_lock_nested(inode, I_MUTEX_PARENT);
+	result = lookup_one_len(name, parent, strlen(name));
+	inode_unlock(inode);
+
+	if (IS_ERR(result))
+		pr_warn("%s err:%ld\n", __func__, PTR_ERR(result));
+
+	return result;
+}
+
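+/*
+ * A mapped file borrows its data from another file in the mount's index
+ * directory: look that file up by id, open it, and reuse this file's
+ * metadata offset as the offset of the mapping.
+ */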
+static struct data_file *handle_mapped_file(struct mount_info *mi,
+					    struct data_file *df)
+{
+	char *file_id_str;
+	struct dentry *index_file_dentry;
+	struct path path;
+	struct file *bf;
+	struct data_file *result = NULL;
+	const struct cred *old_cred;
+
+	file_id_str = file_id_to_str(df->df_id);
+	if (!file_id_str)
+		return ERR_PTR(-ENOENT);
+
+	index_file_dentry = incfs_lookup_dentry(mi->mi_index_dir,
+						file_id_str);
+	kfree(file_id_str);
+	if (!index_file_dentry)
+		return ERR_PTR(-ENOENT);
+	if (IS_ERR(index_file_dentry))
+		return ERR_CAST(index_file_dentry);
+	if (!d_really_is_positive(index_file_dentry)) {
+		result = ERR_PTR(-ENOENT);
+		goto out;
+	}
+
+	path = (struct path) {
+		.mnt = mi->mi_backing_dir_path.mnt,
+		.dentry = index_file_dentry
+	};
+
+	old_cred = override_creds(mi->mi_owner);
+	bf = dentry_open(&path, O_RDWR | O_NOATIME | O_LARGEFILE,
+			 current_cred());
+	revert_creds(old_cred);
+
+	if (IS_ERR(bf)) {
+		result = ERR_CAST(bf);
+		goto out;
+	}
+
+	result = incfs_open_data_file(mi, bf);
+	fput(bf);
+	if (IS_ERR(result))
+		goto out;
+
+	result->df_mapped_offset = df->df_metadata_off;
+
+out:
+	dput(index_file_dentry);
+	return result;
+}
+
+struct data_file *incfs_open_data_file(struct mount_info *mi, struct file *bf)
+{
+	struct data_file *df = NULL;
+	struct backing_file_context *bfc = NULL;
+	int md_records;
+	u64 size;
+	int error = 0;
+	int i;
+
+	if (!bf || !mi)
+		return ERR_PTR(-EFAULT);
+
+	if (!S_ISREG(bf->f_inode->i_mode))
+		return ERR_PTR(-EBADF);
+
+	bfc = incfs_alloc_bfc(mi, bf);
+	if (IS_ERR(bfc))
+		return ERR_CAST(bfc);
+
+	df = kzalloc(sizeof(*df), GFP_NOFS);
+	if (!df) {
+		error = -ENOMEM;
+		goto out;
+	}
+
+	mutex_init(&df->df_enable_verity);
+
+	df->df_backing_file_context = bfc;
+	df->df_mount_info = mi;
+	for (i = 0; i < ARRAY_SIZE(df->df_segments); i++)
+		data_file_segment_init(&df->df_segments[i]);
+
+	error = incfs_read_file_header(bfc, &df->df_metadata_off, &df->df_id,
+				       &size, &df->df_header_flags);
+
+	if (error)
+		goto out;
+
+	df->df_size = size;
+	if (size > 0)
+		df->df_data_block_count = get_blocks_count_for_size(size);
+
+	if (df->df_header_flags & INCFS_FILE_MAPPED) {
+		struct data_file *mapped_df = handle_mapped_file(mi, df);
+
+		incfs_free_data_file(df);
+		return mapped_df;
+	}
+
+	md_records = incfs_scan_metadata_chain(df);
+	if (md_records < 0)
+		error = md_records;
+
+out:
+	if (error) {
+		incfs_free_bfc(bfc);
+		if (df)
+			df->df_backing_file_context = NULL;
+		incfs_free_data_file(df);
+		return ERR_PTR(error);
+	}
+	return df;
+}
+
+void incfs_free_data_file(struct data_file *df)
+{
+	u32 data_blocks_written, hash_blocks_written;
+
+	if (!df)
+		return;
+
+	data_blocks_written = atomic_read(&df->df_data_blocks_written);
+	hash_blocks_written = atomic_read(&df->df_hash_blocks_written);
+
+	if (data_blocks_written != df->df_initial_data_blocks_written ||
+	    hash_blocks_written != df->df_initial_hash_blocks_written) {
+		struct backing_file_context *bfc = df->df_backing_file_context;
+		int error = -1;
+
+		if (bfc && !mutex_lock_interruptible(&bfc->bc_mutex)) {
+			error = incfs_write_status_to_backing_file(
+						df->df_backing_file_context,
+						df->df_status_offset,
+						data_blocks_written,
+						hash_blocks_written);
+			mutex_unlock(&bfc->bc_mutex);
+		}
+
+		if (error)
+			/* Nothing can be done, just warn */
+			pr_warn("incfs: failed to write status to backing file\n");
+	}
+
+	incfs_free_mtree(df->df_hash_tree);
+	incfs_free_bfc(df->df_backing_file_context);
+	kfree(df->df_signature);
+	kfree(df->df_verity_file_digest.data);
+	kfree(df->df_verity_signature);
+	mutex_destroy(&df->df_enable_verity);
+	kfree(df);
+}
+
+int make_inode_ready_for_data_ops(struct mount_info *mi,
+				struct inode *inode,
+				struct file *backing_file)
+{
+	struct inode_info *node = get_incfs_node(inode);
+	struct data_file *df = NULL;
+	int err = 0;
+
+	inode_lock(inode);
+	if (S_ISREG(inode->i_mode)) {
+		if (!node->n_file) {
+			df = incfs_open_data_file(mi, backing_file);
+
+			if (IS_ERR(df))
+				err = PTR_ERR(df);
+			else
+				node->n_file = df;
+		}
+	} else {
+		err = -EBADF;
+	}
+	inode_unlock(inode);
+	return err;
+}
+
+struct dir_file *incfs_open_dir_file(struct mount_info *mi, struct file *bf)
+{
+	struct dir_file *dir = NULL;
+
+	if (!S_ISDIR(bf->f_inode->i_mode))
+		return ERR_PTR(-EBADF);
+
+	dir = kzalloc(sizeof(*dir), GFP_NOFS);
+	if (!dir)
+		return ERR_PTR(-ENOMEM);
+
+	dir->backing_dir = get_file(bf);
+	dir->mount_info = mi;
+	return dir;
+}
+
+void incfs_free_dir_file(struct dir_file *dir)
+{
+	if (!dir)
+		return;
+	if (dir->backing_dir)
+		fput(dir->backing_dir);
+	kfree(dir);
+}
+
+static ssize_t zstd_decompress_safe(struct mount_info *mi,
+				    struct mem_range src, struct mem_range dst)
+{
+	ssize_t result;
+	ZSTD_inBuffer inbuf = {.src = src.data,	.size = src.len};
+	ZSTD_outBuffer outbuf = {.dst = dst.data, .size = dst.len};
+
+	result = mutex_lock_interruptible(&mi->mi_zstd_workspace_mutex);
+	if (result)
+		return result;
+
+	if (!mi->mi_zstd_stream) {
+		unsigned int workspace_size = zstd_dstream_workspace_bound(
+						INCFS_DATA_FILE_BLOCK_SIZE);
+		void *workspace = kvmalloc(workspace_size, GFP_NOFS);
+		ZSTD_DStream *stream;
+
+		if (!workspace) {
+			result = -ENOMEM;
+			goto out;
+		}
+
+		stream = zstd_init_dstream(INCFS_DATA_FILE_BLOCK_SIZE, workspace,
+				  workspace_size);
+		if (!stream) {
+			kvfree(workspace);
+			result = -EIO;
+			goto out;
+		}
+
+		mi->mi_zstd_workspace = workspace;
+		mi->mi_zstd_stream = stream;
+	}
+
+	result = zstd_decompress_stream(mi->mi_zstd_stream, &outbuf, &inbuf) ?
+		-EBADMSG : outbuf.pos;
+
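+	/*
+	 * Keep the workspace cached; zstd_free_workspace() reclaims it
+	 * five seconds after the last decompression.
+	 */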
+	mod_delayed_work(system_wq, &mi->mi_zstd_cleanup_work,
+			 msecs_to_jiffies(5000));
+
+out:
+	mutex_unlock(&mi->mi_zstd_workspace_mutex);
+	return result;
+}
+
+static ssize_t decompress(struct mount_info *mi,
+			  struct mem_range src, struct mem_range dst, int alg)
+{
+	int result;
+
+	switch (alg) {
+	case INCFS_BLOCK_COMPRESSED_LZ4:
+		result = LZ4_decompress_safe(src.data, dst.data, src.len,
+					     dst.len);
+		if (result < 0)
+			return -EBADMSG;
+		return result;
+
+	case INCFS_BLOCK_COMPRESSED_ZSTD:
+		return zstd_decompress_safe(mi, src, dst);
+
+	default:
+		WARN_ON(true);
+		return -EOPNOTSUPP;
+	}
+}
+
+static void log_read_one_record(struct read_log *rl, struct read_log_state *rs)
+{
+	union log_record *record =
+		(union log_record *)((u8 *)rl->rl_ring_buf + rs->next_offset);
+	size_t record_size;
+
+	switch (record->full_record.type) {
+	case FULL:
+		rs->base_record = record->full_record;
+		record_size = sizeof(record->full_record);
+		break;
+
+	case SAME_FILE:
+		rs->base_record.block_index =
+			record->same_file.block_index;
+		rs->base_record.absolute_ts_us +=
+			record->same_file.relative_ts_us;
+		rs->base_record.uid = record->same_file.uid;
+		record_size = sizeof(record->same_file);
+		break;
+
+	case SAME_FILE_CLOSE_BLOCK:
+		rs->base_record.block_index +=
+			record->same_file_close_block.block_index_delta;
+		rs->base_record.absolute_ts_us +=
+			record->same_file_close_block.relative_ts_us;
+		record_size = sizeof(record->same_file_close_block);
+		break;
+
+	case SAME_FILE_CLOSE_BLOCK_SHORT:
+		rs->base_record.block_index +=
+			record->same_file_close_block_short.block_index_delta;
+		rs->base_record.absolute_ts_us +=
+		   record->same_file_close_block_short.relative_ts_tens_us * 10;
+		record_size = sizeof(record->same_file_close_block_short);
+		break;
+
+	case SAME_FILE_NEXT_BLOCK:
+		++rs->base_record.block_index;
+		rs->base_record.absolute_ts_us +=
+			record->same_file_next_block.relative_ts_us;
+		record_size = sizeof(record->same_file_next_block);
+		break;
+
+	case SAME_FILE_NEXT_BLOCK_SHORT:
+		++rs->base_record.block_index;
+		rs->base_record.absolute_ts_us +=
+		    record->same_file_next_block_short.relative_ts_tens_us * 10;
+		record_size = sizeof(record->same_file_next_block_short);
+		break;
+	}
+
+	rs->next_offset += record_size;
+	if (rs->next_offset > rl->rl_size - sizeof(*record)) {
+		rs->next_offset = 0;
+		++rs->current_pass_no;
+	}
+	++rs->current_record_no;
+}
+
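+/*
+ * Record-size selection, roughly from cheapest to most general: e.g. a
+ * strictly sequential read of the next block within ~320us of the previous
+ * one fits the 1-byte SAME_FILE_NEXT_BLOCK_SHORT record, while a read of a
+ * different file always falls back to the 32-byte full_record.
+ */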
+static void log_block_read(struct mount_info *mi, incfs_uuid_t *id,
+			   int block_index)
+{
+	struct read_log *log = &mi->mi_log;
+	struct read_log_state *head, *tail;
+	s64 now_us;
+	s64 relative_us;
+	union log_record record;
+	size_t record_size;
+	uid_t uid = current_uid().val;
+	int block_delta;
+	bool same_file, same_uid;
+	bool next_block, close_block, very_close_block;
+	bool close_time, very_close_time, very_very_close_time;
+
+	/*
+	 * This may observe a stale value, but it is OK for logging to start
+	 * slightly late right after a configuration update.
+	 */
+	if (READ_ONCE(log->rl_size) == 0)
+		return;
+
+	now_us = ktime_to_us(ktime_get());
+
+	spin_lock(&log->rl_lock);
+	if (log->rl_size == 0) {
+		spin_unlock(&log->rl_lock);
+		return;
+	}
+
+	head = &log->rl_head;
+	tail = &log->rl_tail;
+	relative_us = now_us - head->base_record.absolute_ts_us;
+
+	same_file = !memcmp(id, &head->base_record.file_id,
+			    sizeof(incfs_uuid_t));
+	same_uid = uid == head->base_record.uid;
+
+	block_delta = block_index - head->base_record.block_index;
+	next_block = block_delta == 1;
+	very_close_block = block_delta >= S8_MIN && block_delta <= S8_MAX;
+	close_block = block_delta >= S16_MIN && block_delta <= S16_MAX;
+
+	very_very_close_time = relative_us < (1 << 5) * 10;
+	very_close_time = relative_us < (1 << 13);
+	close_time = relative_us < (1 << 16);
+
+	if (same_file && same_uid && next_block && very_very_close_time) {
+		record.same_file_next_block_short =
+			(struct same_file_next_block_short){
+				.type = SAME_FILE_NEXT_BLOCK_SHORT,
+				.relative_ts_tens_us = div_s64(relative_us, 10),
+			};
+		record_size = sizeof(struct same_file_next_block_short);
+	} else if (same_file && same_uid && next_block && very_close_time) {
+		record.same_file_next_block = (struct same_file_next_block){
+			.type = SAME_FILE_NEXT_BLOCK,
+			.relative_ts_us = relative_us,
+		};
+		record_size = sizeof(struct same_file_next_block);
+	} else if (same_file && same_uid && very_close_block &&
+		   very_very_close_time) {
+		record.same_file_close_block_short =
+			(struct same_file_close_block_short){
+				.type = SAME_FILE_CLOSE_BLOCK_SHORT,
+				.relative_ts_tens_us = div_s64(relative_us, 10),
+				.block_index_delta = block_delta,
+			};
+		record_size = sizeof(struct same_file_close_block_short);
+	} else if (same_file && same_uid && close_block && very_close_time) {
+		record.same_file_close_block = (struct same_file_close_block){
+				.type = SAME_FILE_CLOSE_BLOCK,
+				.relative_ts_us = relative_us,
+				.block_index_delta = block_delta,
+			};
+		record_size = sizeof(struct same_file_close_block);
+	} else if (same_file && close_time) {
+		record.same_file = (struct same_file){
+			.type = SAME_FILE,
+			.block_index = block_index,
+			.relative_ts_us = relative_us,
+			.uid = uid,
+		};
+		record_size = sizeof(struct same_file);
+	} else {
+		record.full_record = (struct full_record){
+			.type = FULL,
+			.block_index = block_index,
+			.file_id = *id,
+			.absolute_ts_us = now_us,
+			.uid = uid,
+		};
+		head->base_record.file_id = *id;
+		record_size = sizeof(struct full_record);
+	}
+
+	head->base_record.block_index = block_index;
+	head->base_record.absolute_ts_us = now_us;
+
+	/* Advance tail beyond area we are going to overwrite */
+	while (tail->current_pass_no < head->current_pass_no &&
+	       tail->next_offset < head->next_offset + record_size)
+		log_read_one_record(log, tail);
+
+	memcpy(((u8 *)log->rl_ring_buf) + head->next_offset, &record,
+	       record_size);
+	head->next_offset += record_size;
+	if (head->next_offset > log->rl_size - sizeof(record)) {
+		head->next_offset = 0;
+		++head->current_pass_no;
+	}
+	++head->current_record_no;
+
+	spin_unlock(&log->rl_lock);
+	schedule_delayed_work(&log->ml_wakeup_work, msecs_to_jiffies(16));
+}
+
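+/*
+ * Merkle tree check: for each level, precompute the backing-file offset of
+ * the hash block covering block_index and the digest's offset inside it,
+ * then verify top-down from the root hash. Each hash block holds
+ * hash_per_block digests (e.g. 4096 / 32 = 128 for SHA-256), so the index
+ * shrinks by that factor per level. Verified hash blocks are cached past
+ * the data pages in the page cache and flagged PageChecked so later reads
+ * can skip re-hashing them.
+ */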
+static int validate_hash_tree(struct backing_file_context *bfc, struct file *f,
+			      int block_index, struct mem_range data, u8 *buf)
+{
+	struct data_file *df = get_incfs_data_file(f);
+	u8 stored_digest[INCFS_MAX_HASH_SIZE] = {};
+	u8 calculated_digest[INCFS_MAX_HASH_SIZE] = {};
+	struct mtree *tree = NULL;
+	struct incfs_df_signature *sig = NULL;
+	int digest_size;
+	int hash_block_index = block_index;
+	int lvl;
+	int res;
+	loff_t hash_block_offset[INCFS_MAX_MTREE_LEVELS];
+	size_t hash_offset_in_block[INCFS_MAX_MTREE_LEVELS];
+	int hash_per_block;
+	pgoff_t file_pages;
+
+	/*
+	 * Acquire barrier to make sure the tree is fully visible if it was
+	 * added via enable-verity.
+	 */
+	tree = smp_load_acquire(&df->df_hash_tree);
+	sig = df->df_signature;
+	if (!tree || !sig)
+		return 0;
+
+	digest_size = tree->alg->digest_size;
+	hash_per_block = INCFS_DATA_FILE_BLOCK_SIZE / digest_size;
+	for (lvl = 0; lvl < tree->depth; lvl++) {
+		loff_t lvl_off = tree->hash_level_suboffset[lvl];
+
+		hash_block_offset[lvl] =
+			lvl_off + round_down(hash_block_index * digest_size,
+					     INCFS_DATA_FILE_BLOCK_SIZE);
+		hash_offset_in_block[lvl] = hash_block_index * digest_size %
+					    INCFS_DATA_FILE_BLOCK_SIZE;
+		hash_block_index /= hash_per_block;
+	}
+
+	memcpy(stored_digest, tree->root_hash, digest_size);
+
+	file_pages = DIV_ROUND_UP(df->df_size, INCFS_DATA_FILE_BLOCK_SIZE);
+	for (lvl = tree->depth - 1; lvl >= 0; lvl--) {
+		pgoff_t hash_page =
+			file_pages +
+			hash_block_offset[lvl] / INCFS_DATA_FILE_BLOCK_SIZE;
+		struct page *page = find_get_page_flags(
+			f->f_inode->i_mapping, hash_page, FGP_ACCESSED);
+
+		if (page && PageChecked(page)) {
+			u8 *addr = kmap_atomic(page);
+
+			memcpy(stored_digest, addr + hash_offset_in_block[lvl],
+			       digest_size);
+
+			kunmap_atomic(addr);
+			put_page(page);
+			continue;
+		}
+
+		if (page)
+			put_page(page);
+
+		res = incfs_kread(bfc, buf, INCFS_DATA_FILE_BLOCK_SIZE,
+				  hash_block_offset[lvl] + sig->hash_offset);
+		if (res < 0)
+			return res;
+		if (res != INCFS_DATA_FILE_BLOCK_SIZE)
+			return -EIO;
+		res = incfs_calc_digest(tree->alg,
+					range(buf, INCFS_DATA_FILE_BLOCK_SIZE),
+					range(calculated_digest, digest_size));
+		if (res)
+			return res;
+
+		if (memcmp(stored_digest, calculated_digest, digest_size)) {
+			int i;
+			bool zero = true;
+
+			pr_warn("incfs: Hash mismatch lvl:%d blk:%d\n",
+				lvl, block_index);
+			for (i = 0; i < digest_size; i++)
+				if (stored_digest[i]) {
+					zero = false;
+					break;
+				}
+
+			if (zero)
+				pr_debug("Note saved_digest all zero - did you forget to load the hashes?\n");
+			return -EBADMSG;
+		}
+
+		memcpy(stored_digest, buf + hash_offset_in_block[lvl],
+		       digest_size);
+
+		page = grab_cache_page(f->f_inode->i_mapping, hash_page);
+		if (page) {
+			u8 *addr = kmap_atomic(page);
+
+			memcpy(addr, buf, INCFS_DATA_FILE_BLOCK_SIZE);
+			kunmap_atomic(addr);
+			SetPageChecked(page);
+			SetPageUptodate(page);
+			unlock_page(page);
+			put_page(page);
+		}
+	}
+
+	res = incfs_calc_digest(tree->alg, data,
+				range(calculated_digest, digest_size));
+	if (res)
+		return res;
+
+	if (memcmp(stored_digest, calculated_digest, digest_size)) {
+		pr_debug("Leaf hash mismatch blk:%d\n", block_index);
+		return -EBADMSG;
+	}
+
+	return 0;
+}
+
+static struct data_file_segment *get_file_segment(struct data_file *df,
+						  int block_index)
+{
+	int seg_idx = block_index % ARRAY_SIZE(df->df_segments);
+
+	return &df->df_segments[seg_idx];
+}
+
+static bool is_data_block_present(struct data_file_block *block)
+{
+	return (block->db_backing_file_data_offset != 0) &&
+	       (block->db_stored_size != 0);
+}
+
+static void convert_data_file_block(struct incfs_blockmap_entry *bme,
+				    struct data_file_block *res_block)
+{
+	u16 flags = le16_to_cpu(bme->me_flags);
+
+	res_block->db_backing_file_data_offset =
+		le16_to_cpu(bme->me_data_offset_hi);
+	res_block->db_backing_file_data_offset <<= 32;
+	res_block->db_backing_file_data_offset |=
+		le32_to_cpu(bme->me_data_offset_lo);
+	res_block->db_stored_size = le16_to_cpu(bme->me_data_size);
+	res_block->db_comp_alg = flags & INCFS_BLOCK_COMPRESSED_MASK;
+}
+
+static int get_data_file_block(struct data_file *df, int index,
+			       struct data_file_block *res_block)
+{
+	struct incfs_blockmap_entry bme = {};
+	struct backing_file_context *bfc = NULL;
+	loff_t blockmap_off = 0;
+	int error = 0;
+
+	if (!df || !res_block)
+		return -EFAULT;
+
+	blockmap_off = df->df_blockmap_off;
+	bfc = df->df_backing_file_context;
+
+	if (index < 0 || blockmap_off == 0)
+		return -EINVAL;
+
+	error = incfs_read_blockmap_entry(bfc, index, blockmap_off, &bme);
+	if (error)
+		return error;
+
+	convert_data_file_block(&bme, res_block);
+	return 0;
+}
+
+static int check_room_for_one_range(u32 size, u32 size_out)
+{
+	if (size_out + sizeof(struct incfs_filled_range) > size)
+		return -ERANGE;
+	return 0;
+}
+
+static int copy_one_range(struct incfs_filled_range *range, void __user *buffer,
+			  u32 size, u32 *size_out)
+{
+	int error = check_room_for_one_range(size, *size_out);
+	if (error)
+		return error;
+
+	if (copy_to_user(((char __user *)buffer) + *size_out, range,
+				sizeof(*range)))
+		return -EFAULT;
+
+	*size_out += sizeof(*range);
+	return 0;
+}
+
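+/*
+ * Scan the blockmap in batches of READ_BLOCKMAP_ENTRIES entries, coalescing
+ * runs of filled blocks into [begin, end) incfs_filled_range structs copied
+ * out to userspace. A file already known to be fully written takes the fast
+ * path and reports one range without reading the blockmap at all.
+ */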
+#define READ_BLOCKMAP_ENTRIES 512
+int incfs_get_filled_blocks(struct data_file *df,
+			    struct incfs_file_data *fd,
+			    struct incfs_get_filled_blocks_args *arg)
+{
+	int error = 0;
+	bool in_range = false;
+	struct incfs_filled_range range;
+	void __user *buffer = u64_to_user_ptr(arg->range_buffer);
+	u32 size = arg->range_buffer_size;
+	u32 end_index =
+		arg->end_index ? arg->end_index : df->df_total_block_count;
+	u32 *size_out = &arg->range_buffer_size_out;
+	int i = READ_BLOCKMAP_ENTRIES - 1;
+	int entries_read = 0;
+	struct incfs_blockmap_entry *bme;
+	int data_blocks_filled = 0;
+	int hash_blocks_filled = 0;
+
+	*size_out = 0;
+	if (end_index > df->df_total_block_count)
+		end_index = df->df_total_block_count;
+	arg->total_blocks_out = df->df_total_block_count;
+	arg->data_blocks_out = df->df_data_block_count;
+
+	if (atomic_read(&df->df_data_blocks_written) ==
+	    df->df_data_block_count) {
+		pr_debug("File marked full, fast get_filled_blocks");
+		if (arg->start_index > end_index) {
+			arg->index_out = arg->start_index;
+			return 0;
+		}
+		arg->index_out = arg->start_index;
+
+		error = check_room_for_one_range(size, *size_out);
+		if (error)
+			return error;
+
+		range = (struct incfs_filled_range){
+			.begin = arg->start_index,
+			.end = end_index,
+		};
+
+		error = copy_one_range(&range, buffer, size, size_out);
+		if (error)
+			return error;
+		arg->index_out = end_index;
+		return 0;
+	}
+
+	bme = kzalloc(sizeof(*bme) * READ_BLOCKMAP_ENTRIES,
+		      GFP_NOFS | __GFP_COMP);
+	if (!bme)
+		return -ENOMEM;
+
+	for (arg->index_out = arg->start_index; arg->index_out < end_index;
+	     ++arg->index_out) {
+		struct data_file_block dfb;
+
+		if (++i == READ_BLOCKMAP_ENTRIES) {
+			entries_read = incfs_read_blockmap_entries(
+				df->df_backing_file_context, bme,
+				arg->index_out, READ_BLOCKMAP_ENTRIES,
+				df->df_blockmap_off);
+			if (entries_read < 0) {
+				error = entries_read;
+				break;
+			}
+
+			i = 0;
+		}
+
+		if (i >= entries_read) {
+			error = -EIO;
+			break;
+		}
+
+		convert_data_file_block(bme + i, &dfb);
+
+		if (is_data_block_present(&dfb)) {
+			if (arg->index_out >= df->df_data_block_count)
+				++hash_blocks_filled;
+			else
+				++data_blocks_filled;
+		}
+
+		if (is_data_block_present(&dfb) == in_range)
+			continue;
+
+		if (!in_range) {
+			error = check_room_for_one_range(size, *size_out);
+			if (error)
+				break;
+			in_range = true;
+			range.begin = arg->index_out;
+		} else {
+			range.end = arg->index_out;
+			error = copy_one_range(&range, buffer, size, size_out);
+			if (error) {
+				/*
+				 * There will be another try out of the loop;
+				 * it will reset index_out if that fails too.
+				 */
+				break;
+			}
+			in_range = false;
+		}
+	}
+
+	if (in_range) {
+		range.end = arg->index_out;
+		error = copy_one_range(&range, buffer, size, size_out);
+		if (error)
+			arg->index_out = range.begin;
+	}
+
+	if (arg->start_index == 0) {
+		fd->fd_get_block_pos = 0;
+		fd->fd_filled_data_blocks = 0;
+		fd->fd_filled_hash_blocks = 0;
+	}
+
+	if (arg->start_index == fd->fd_get_block_pos) {
+		fd->fd_get_block_pos = arg->index_out + 1;
+		fd->fd_filled_data_blocks += data_blocks_filled;
+		fd->fd_filled_hash_blocks += hash_blocks_filled;
+	}
+
+	if (fd->fd_get_block_pos == df->df_total_block_count + 1) {
+		if (fd->fd_filled_data_blocks >
+		   atomic_read(&df->df_data_blocks_written))
+			atomic_set(&df->df_data_blocks_written,
+				   fd->fd_filled_data_blocks);
+
+		if (fd->fd_filled_hash_blocks >
+		   atomic_read(&df->df_hash_blocks_written))
+			atomic_set(&df->df_hash_blocks_written,
+				   fd->fd_filled_hash_blocks);
+	}
+
+	kfree(bme);
+	return error;
+}
+
+static bool is_read_done(struct pending_read *read)
+{
+	return atomic_read_acquire(&read->done) != 0;
+}
+
+static void set_read_done(struct pending_read *read)
+{
+	atomic_set_release(&read->done, 1);
+}
+
+/*
+ * Notifies a given data file about a pending read from a given block.
+ * Returns a new pending read entry.
+ */
+static struct pending_read *add_pending_read(struct data_file *df,
+					     int block_index)
+{
+	struct pending_read *result = NULL;
+	struct data_file_segment *segment = NULL;
+	struct mount_info *mi = NULL;
+
+	segment = get_file_segment(df, block_index);
+	mi = df->df_mount_info;
+
+	result = kzalloc(sizeof(*result), GFP_NOFS);
+	if (!result)
+		return NULL;
+
+	result->file_id = df->df_id;
+	result->block_index = block_index;
+	result->timestamp_us = ktime_to_us(ktime_get());
+	result->uid = current_uid().val;
+
+	spin_lock(&mi->pending_read_lock);
+
+	result->serial_number = ++mi->mi_last_pending_read_number;
+	mi->mi_pending_reads_count++;
+
+	list_add_rcu(&result->mi_reads_list, &mi->mi_reads_list_head);
+	list_add_rcu(&result->segment_reads_list, &segment->reads_list_head);
+
+	spin_unlock(&mi->pending_read_lock);
+
+	wake_up_all(&mi->mi_pending_reads_notif_wq);
+	return result;
+}
+
+static void free_pending_read_entry(struct rcu_head *entry)
+{
+	struct pending_read *read;
+
+	read = container_of(entry, struct pending_read, rcu);
+
+	kfree(read);
+}
+
+/* Notifies a given data file that a pending read has completed. */
+static void remove_pending_read(struct data_file *df, struct pending_read *read)
+{
+	struct mount_info *mi = NULL;
+
+	if (!df || !read) {
+		WARN_ON(!df);
+		WARN_ON(!read);
+		return;
+	}
+
+	mi = df->df_mount_info;
+
+	spin_lock(&mi->pending_read_lock);
+
+	list_del_rcu(&read->mi_reads_list);
+	list_del_rcu(&read->segment_reads_list);
+
+	mi->mi_pending_reads_count--;
+
+	spin_unlock(&mi->pending_read_lock);
+
+	/* Don't free. Wait for readers */
+	call_rcu(&read->rcu, free_pending_read_entry);
+}
+
+static void notify_pending_reads(struct mount_info *mi,
+		struct data_file_segment *segment,
+		int index)
+{
+	struct pending_read *entry = NULL;
+
+	/* Notify pending reads waiting for this block. */
+	rcu_read_lock();
+	list_for_each_entry_rcu(entry, &segment->reads_list_head,
+						segment_reads_list) {
+		if (entry->block_index == index)
+			set_read_done(entry);
+	}
+	rcu_read_unlock();
+	wake_up_all(&segment->new_data_arrival_wq);
+
+	atomic_inc(&mi->mi_blocks_written);
+	wake_up_all(&mi->mi_blocks_written_notif_wq);
+}
+
+static int usleep_interruptible(u32 us)
+{
+	/* See:
+	 * https://www.kernel.org/doc/Documentation/timers/timers-howto.txt
+	 * for explanation
+	 */
+	if (us < 10) {
+		udelay(us);
+		return 0;
+	} else if (us < 20000) {
+		usleep_range(us, us + us / 10);
+		return 0;
+	} else
+		return msleep_interruptible(us / 1000);
+}
+
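+/*
+ * Timeout semantics (struct incfs_read_data_file_timeouts): min_time_us
+ * stretches reads of already-present blocks, max_pending_time_us bounds how
+ * long we wait for userspace to fill a missing block before -ETIME, and
+ * min_pending_time_us puts a floor on how long a filled-while-waiting read
+ * appears to take.
+ */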
+static int wait_for_data_block(struct data_file *df, int block_index,
+			       struct data_file_block *res_block,
+			       struct incfs_read_data_file_timeouts *timeouts)
+{
+	struct data_file_block block = {};
+	struct data_file_segment *segment = NULL;
+	struct pending_read *read = NULL;
+	struct mount_info *mi = NULL;
+	int error;
+	int wait_res = 0;
+	unsigned int delayed_pending_us = 0, delayed_min_us = 0;
+	bool delayed_pending = false;
+
+	if (!df || !res_block)
+		return -EFAULT;
+
+	if (block_index < 0 || block_index >= df->df_data_block_count)
+		return -EINVAL;
+
+	if (df->df_blockmap_off <= 0 || !df->df_mount_info)
+		return -ENODATA;
+
+	mi = df->df_mount_info;
+	segment = get_file_segment(df, block_index);
+
+	error = down_read_killable(&segment->rwsem);
+	if (error)
+		return error;
+
+	/* Look up the given block */
+	error = get_data_file_block(df, block_index, &block);
+
+	up_read(&segment->rwsem);
+
+	if (error)
+		return error;
+
+	/* If the block was found, just return it. No need to wait. */
+	if (is_data_block_present(&block)) {
+		*res_block = block;
+		if (timeouts && timeouts->min_time_us) {
+			delayed_min_us = timeouts->min_time_us;
+			error = usleep_interruptible(delayed_min_us);
+			goto out;
+		}
+		return 0;
+	} else {
+		/* If it's not found, create a pending read */
+		if (timeouts && timeouts->max_pending_time_us) {
+			read = add_pending_read(df, block_index);
+			if (!read)
+				return -ENOMEM;
+		} else {
+			log_block_read(mi, &df->df_id, block_index);
+			return -ETIME;
+		}
+	}
+
+	/* Rest of function only applies if timeouts != NULL */
+	if (!timeouts) {
+		pr_warn("incfs: timeouts unexpectedly NULL\n");
+		return -EFSCORRUPTED;
+	}
+
+	/* Wait for notifications about block's arrival */
+	wait_res =
+		wait_event_interruptible_timeout(segment->new_data_arrival_wq,
+			(is_read_done(read)),
+			usecs_to_jiffies(timeouts->max_pending_time_us));
+
+	/* Woke up, the pending read is no longer needed. */
+	remove_pending_read(df, read);
+
+	if (wait_res == 0) {
+		/* Wait has timed out */
+		log_block_read(mi, &df->df_id, block_index);
+		return -ETIME;
+	}
+	if (wait_res < 0) {
+		/*
+		 * Only ERESTARTSYS is really expected here when a signal
+		 * comes while we wait.
+		 */
+		return wait_res;
+	}
+
+	delayed_pending = true;
+	delayed_pending_us = timeouts->max_pending_time_us -
+				jiffies_to_usecs(wait_res);
+	if (timeouts->min_pending_time_us > delayed_pending_us) {
+		delayed_min_us = timeouts->min_pending_time_us -
+					     delayed_pending_us;
+		error = usleep_interruptible(delayed_min_us);
+		if (error)
+			return error;
+	}
+
+	error = down_read_killable(&segment->rwsem);
+	if (error)
+		return error;
+
+	/*
+	 * Re-read the block's info now; it has just arrived and
+	 * should be available.
+	 */
+	error = get_data_file_block(df, block_index, &block);
+	if (!error) {
+		if (is_data_block_present(&block))
+			*res_block = block;
+		else {
+			/*
+			 * Somehow the wait finished successfully but the block
+			 * still can't be found. This is not normal.
+			 */
+			pr_warn("incfs: Wait succeeded but block not found.\n");
+			error = -ENODATA;
+		}
+	}
+	up_read(&segment->rwsem);
+
+out:
+	if (error)
+		return error;
+
+	if (delayed_pending) {
+		mi->mi_reads_delayed_pending++;
+		mi->mi_reads_delayed_pending_us +=
+			delayed_pending_us;
+	}
+
+	if (delayed_min_us) {
+		mi->mi_reads_delayed_min++;
+		mi->mi_reads_delayed_min_us += delayed_min_us;
+	}
+
+	return 0;
+}
+
+static int incfs_update_sysfs_error(struct file *file, int index, int result,
+				struct mount_info *mi, struct data_file *df)
+{
+	int error;
+
+	if (result >= 0)
+		return 0;
+
+	error = mutex_lock_interruptible(&mi->mi_le_mutex);
+	if (error)
+		return error;
+
+	mi->mi_le_file_id = df->df_id;
+	mi->mi_le_time_us = ktime_to_us(ktime_get());
+	mi->mi_le_page = index;
+	mi->mi_le_errno = result;
+	mi->mi_le_uid = current_uid().val;
+	mutex_unlock(&mi->mi_le_mutex);
+
+	return 0;
+}
+
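+/*
+ * Note on the tmp buffer: callers must supply at least two blocks of
+ * scratch space (the -ERANGE check below). Compressed payloads are read
+ * into tmp.data before decompression into dst, and validate_hash_tree()
+ * reuses the same scratch area for hash blocks.
+ */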
+ssize_t incfs_read_data_file_block(struct mem_range dst, struct file *f,
+			int index, struct mem_range tmp,
+			struct incfs_read_data_file_timeouts *timeouts)
+{
+	loff_t pos;
+	ssize_t result;
+	size_t bytes_to_read;
+	struct mount_info *mi = NULL;
+	struct backing_file_context *bfc = NULL;
+	struct data_file_block block = {};
+	struct data_file *df = get_incfs_data_file(f);
+
+	if (!dst.data || !df || !tmp.data)
+		return -EFAULT;
+
+	if (tmp.len < 2 * INCFS_DATA_FILE_BLOCK_SIZE)
+		return -ERANGE;
+
+	mi = df->df_mount_info;
+	bfc = df->df_backing_file_context;
+
+	result = wait_for_data_block(df, index, &block, timeouts);
+	if (result < 0)
+		goto out;
+
+	pos = block.db_backing_file_data_offset;
+	if (block.db_comp_alg == COMPRESSION_NONE) {
+		bytes_to_read = min(dst.len, block.db_stored_size);
+		result = incfs_kread(bfc, dst.data, bytes_to_read, pos);
+
+		/* Some data was read, but not enough */
+		if (result >= 0 && result != bytes_to_read)
+			result = -EIO;
+	} else {
+		bytes_to_read = min(tmp.len, block.db_stored_size);
+		result = incfs_kread(bfc, tmp.data, bytes_to_read, pos);
+		if (result == bytes_to_read) {
+			result =
+				decompress(mi, range(tmp.data, bytes_to_read),
+					   dst, block.db_comp_alg);
+			if (result < 0) {
+				const char *name =
+				    bfc->bc_file->f_path.dentry->d_name.name;
+
+				pr_warn_once("incfs: Decompression error. %s",
+					     name);
+			}
+		} else if (result >= 0) {
+			/* Some data was read, but not enough */
+			result = -EIO;
+		}
+	}
+
+	if (result > 0) {
+		int err = validate_hash_tree(bfc, f, index, dst, tmp.data);
+
+		if (err < 0)
+			result = err;
+	}
+
+	if (result >= 0)
+		log_block_read(mi, &df->df_id, index);
+
+out:
+	if (result == -ETIME)
+		mi->mi_reads_failed_timed_out++;
+	else if (result == -EBADMSG)
+		mi->mi_reads_failed_hash_verification++;
+	else if (result < 0)
+		mi->mi_reads_failed_other++;
+
+	incfs_update_sysfs_error(f, index, result, mi, df);
+
+	return result;
+}
+
+ssize_t incfs_read_merkle_tree_blocks(struct mem_range dst,
+				      struct data_file *df, size_t offset)
+{
+	struct backing_file_context *bfc = NULL;
+	struct incfs_df_signature *sig = NULL;
+	size_t to_read = dst.len;
+
+	if (!dst.data || !df)
+		return -EFAULT;
+
+	sig = df->df_signature;
+	bfc = df->df_backing_file_context;
+
+	if (offset > sig->hash_size)
+		return -ERANGE;
+
+	if (offset + to_read > sig->hash_size)
+		to_read = sig->hash_size - offset;
+
+	return incfs_kread(bfc, dst.data, to_read, sig->hash_offset + offset);
+}
+
+int incfs_process_new_data_block(struct data_file *df,
+				 struct incfs_fill_block *block, u8 *data)
+{
+	struct mount_info *mi = NULL;
+	struct backing_file_context *bfc = NULL;
+	struct data_file_segment *segment = NULL;
+	struct data_file_block existing_block = {};
+	u16 flags = 0;
+	int error = 0;
+
+	if (!df || !block)
+		return -EFAULT;
+
+	bfc = df->df_backing_file_context;
+	mi = df->df_mount_info;
+
+	if (block->block_index >= df->df_data_block_count)
+		return -ERANGE;
+
+	segment = get_file_segment(df, block->block_index);
+	if (!segment)
+		return -EFAULT;
+
+	if (block->compression == COMPRESSION_LZ4)
+		flags |= INCFS_BLOCK_COMPRESSED_LZ4;
+	else if (block->compression == COMPRESSION_ZSTD)
+		flags |= INCFS_BLOCK_COMPRESSED_ZSTD;
+	else if (block->compression)
+		return -EINVAL;
+
+	error = down_read_killable(&segment->rwsem);
+	if (error)
+		return error;
+
+	error = get_data_file_block(df, block->block_index, &existing_block);
+
+	up_read(&segment->rwsem);
+
+	if (error)
+		return error;
+	if (is_data_block_present(&existing_block)) {
+		/* Block is already present, nothing to do here */
+		return 0;
+	}
+
+	error = down_write_killable(&segment->rwsem);
+	if (error)
+		return error;
+
+	error = mutex_lock_interruptible(&bfc->bc_mutex);
+	if (!error) {
+		error = incfs_write_data_block_to_backing_file(
+			bfc, range(data, block->data_len), block->block_index,
+			df->df_blockmap_off, flags);
+		mutex_unlock(&bfc->bc_mutex);
+	}
+	if (!error) {
+		notify_pending_reads(mi, segment, block->block_index);
+		atomic_inc(&df->df_data_blocks_written);
+	}
+
+	up_write(&segment->rwsem);
+
+	if (error)
+		pr_debug("%d error: %d\n", block->block_index, error);
+	return error;
+}
+
+int incfs_read_file_signature(struct data_file *df, struct mem_range dst)
+{
+	struct backing_file_context *bfc = df->df_backing_file_context;
+	struct incfs_df_signature *sig;
+	int read_res = 0;
+
+	if (!dst.data)
+		return -EFAULT;
+
+	sig = df->df_signature;
+	if (!sig)
+		return 0;
+
+	if (dst.len < sig->sig_size)
+		return -E2BIG;
+
+	read_res = incfs_kread(bfc, dst.data, sig->sig_size, sig->sig_offset);
+
+	if (read_res < 0)
+		return read_res;
+
+	if (read_res != sig->sig_size)
+		return -EIO;
+
+	return read_res;
+}
+
+int incfs_process_new_hash_block(struct data_file *df,
+				 struct incfs_fill_block *block, u8 *data)
+{
+	struct backing_file_context *bfc = NULL;
+	struct mount_info *mi = NULL;
+	struct mtree *hash_tree = NULL;
+	struct incfs_df_signature *sig = NULL;
+	loff_t hash_area_base = 0;
+	loff_t hash_area_size = 0;
+	int error = 0;
+
+	if (!df || !block)
+		return -EFAULT;
+
+	if (!(block->flags & INCFS_BLOCK_FLAGS_HASH))
+		return -EINVAL;
+
+	bfc = df->df_backing_file_context;
+	mi = df->df_mount_info;
+
+	hash_tree = df->df_hash_tree;
+	sig = df->df_signature;
+	if (!hash_tree || !sig || sig->hash_offset == 0)
+		return -ENOTSUPP;
+
+	hash_area_base = sig->hash_offset;
+	hash_area_size = sig->hash_size;
+	if (hash_area_size < block->block_index * INCFS_DATA_FILE_BLOCK_SIZE
+				+ block->data_len) {
+		/* Hash block goes beyond dedicated hash area of this file. */
+		return -ERANGE;
+	}
+
+	error = mutex_lock_interruptible(&bfc->bc_mutex);
+	if (!error) {
+		error = incfs_write_hash_block_to_backing_file(
+			bfc, range(data, block->data_len), block->block_index,
+			hash_area_base, df->df_blockmap_off, df->df_size);
+		mutex_unlock(&bfc->bc_mutex);
+	}
+	if (!error)
+		atomic_inc(&df->df_hash_blocks_written);
+
+	return error;
+}
+
+static int process_blockmap_md(struct incfs_blockmap *bm,
+			       struct metadata_handler *handler)
+{
+	struct data_file *df = handler->context;
+	int error = 0;
+	loff_t base_off = le64_to_cpu(bm->m_base_offset);
+	u32 block_count = le32_to_cpu(bm->m_block_count);
+
+	if (!df)
+		return -EFAULT;
+
+	if (df->df_data_block_count > block_count)
+		return -EBADMSG;
+
+	df->df_total_block_count = block_count;
+	df->df_blockmap_off = base_off;
+	return error;
+}
+
+static int process_file_signature_md(struct incfs_file_signature *sg,
+				struct metadata_handler *handler)
+{
+	struct data_file *df = handler->context;
+	struct mtree *hash_tree = NULL;
+	int error = 0;
+	struct incfs_df_signature *signature =
+		kzalloc(sizeof(*signature), GFP_NOFS);
+	void *buf = NULL;
+	ssize_t read;
+
+	if (!signature)
+		return -ENOMEM;
+
+	if (!df || !df->df_backing_file_context ||
+	    !df->df_backing_file_context->bc_file) {
+		error = -ENOENT;
+		goto out;
+	}
+
+	signature->hash_offset = le64_to_cpu(sg->sg_hash_tree_offset);
+	signature->hash_size = le32_to_cpu(sg->sg_hash_tree_size);
+	signature->sig_offset = le64_to_cpu(sg->sg_sig_offset);
+	signature->sig_size = le32_to_cpu(sg->sg_sig_size);
+
+	buf = kzalloc(signature->sig_size, GFP_NOFS);
+	if (!buf) {
+		error = -ENOMEM;
+		goto out;
+	}
+
+	read = incfs_kread(df->df_backing_file_context, buf,
+			   signature->sig_size, signature->sig_offset);
+	if (read < 0) {
+		error = read;
+		goto out;
+	}
+
+	if (read != signature->sig_size) {
+		error = -EINVAL;
+		goto out;
+	}
+
+	hash_tree = incfs_alloc_mtree(range(buf, signature->sig_size),
+				      df->df_data_block_count);
+	if (IS_ERR(hash_tree)) {
+		error = PTR_ERR(hash_tree);
+		hash_tree = NULL;
+		goto out;
+	}
+	if (hash_tree->hash_tree_area_size != signature->hash_size) {
+		error = -EINVAL;
+		goto out;
+	}
+	if (signature->hash_size > 0 &&
+	    handler->md_record_offset <= signature->hash_offset) {
+		error = -EINVAL;
+		goto out;
+	}
+	if (handler->md_record_offset <= signature->sig_offset) {
+		error = -EINVAL;
+		goto out;
+	}
+	df->df_hash_tree = hash_tree;
+	hash_tree = NULL;
+	df->df_signature = signature;
+	signature = NULL;
+out:
+	incfs_free_mtree(hash_tree);
+	kfree(signature);
+	kfree(buf);
+
+	return error;
+}
+
+static int process_status_md(struct incfs_status *is,
+			     struct metadata_handler *handler)
+{
+	struct data_file *df = handler->context;
+
+	df->df_initial_data_blocks_written =
+		le32_to_cpu(is->is_data_blocks_written);
+	atomic_set(&df->df_data_blocks_written,
+		   df->df_initial_data_blocks_written);
+
+	df->df_initial_hash_blocks_written =
+		le32_to_cpu(is->is_hash_blocks_written);
+	atomic_set(&df->df_hash_blocks_written,
+		   df->df_initial_hash_blocks_written);
+
+	df->df_status_offset = handler->md_record_offset;
+	return 0;
+}
+
+static int process_file_verity_signature_md(
+		struct incfs_file_verity_signature *vs,
+		struct metadata_handler *handler)
+{
+	struct data_file *df = handler->context;
+	struct incfs_df_verity_signature *verity_signature;
+
+	if (!df)
+		return -EFAULT;
+
+	verity_signature = kzalloc(sizeof(*verity_signature), GFP_NOFS);
+	if (!verity_signature)
+		return -ENOMEM;
+
+	verity_signature->offset = le64_to_cpu(vs->vs_offset);
+	verity_signature->size = le32_to_cpu(vs->vs_size);
+	if (verity_signature->size > FS_VERITY_MAX_SIGNATURE_SIZE) {
+		kfree(verity_signature);
+		return -EFAULT;
+	}
+
+	df->df_verity_signature = verity_signature;
+	return 0;
+}
+
+static int incfs_scan_metadata_chain(struct data_file *df)
+{
+	struct metadata_handler *handler = NULL;
+	int result = 0;
+	int records_count = 0;
+	int error = 0;
+	struct backing_file_context *bfc = NULL;
+	int nondata_block_count;
+
+	if (!df || !df->df_backing_file_context)
+		return -EFAULT;
+
+	bfc = df->df_backing_file_context;
+
+	handler = kzalloc(sizeof(*handler), GFP_NOFS);
+	if (!handler)
+		return -ENOMEM;
+
+	handler->md_record_offset = df->df_metadata_off;
+	handler->context = df;
+	handler->handle_blockmap = process_blockmap_md;
+	handler->handle_signature = process_file_signature_md;
+	handler->handle_status = process_status_md;
+	handler->handle_verity_signature = process_file_verity_signature_md;
+
+	while (handler->md_record_offset > 0) {
+		error = incfs_read_next_metadata_record(bfc, handler);
+		if (error) {
+			pr_warn("incfs: Error during reading incfs-metadata record. Offset: %lld Record #%d Error code: %d\n",
+				handler->md_record_offset, records_count + 1,
+				-error);
+			break;
+		}
+		records_count++;
+	}
+	if (error) {
+		pr_warn("incfs: Error %d after reading %d incfs-metadata records.\n",
+			 -error, records_count);
+		result = error;
+	} else
+		result = records_count;
+
+	nondata_block_count = df->df_total_block_count -
+		df->df_data_block_count;
+	if (df->df_hash_tree) {
+		int hash_block_count = get_blocks_count_for_size(
+			df->df_hash_tree->hash_tree_area_size);
+
+		/*
+		 * Files that were created with a hash tree have the hash tree
+		 * included in the block map, i.e. nondata_block_count ==
+		 * hash_block_count.  Files whose hash tree was added by
+		 * FS_IOC_ENABLE_VERITY will still have the original block
+		 * count, i.e. nondata_block_count == 0.
+		 */
+		if (nondata_block_count != hash_block_count &&
+		    nondata_block_count != 0)
+			result = -EINVAL;
+	} else if (nondata_block_count != 0) {
+		result = -EINVAL;
+	}
+
+	kfree(handler);
+	return result;
+}
+
+/*
+ * Quickly checks if there are pending reads with a serial number larger
+ * than a given one.
+ */
+bool incfs_fresh_pending_reads_exist(struct mount_info *mi, int last_number)
+{
+	bool result = false;
+
+	spin_lock(&mi->pending_read_lock);
+	result = (mi->mi_last_pending_read_number > last_number) &&
+		(mi->mi_pending_reads_count > 0);
+	spin_unlock(&mi->pending_read_lock);
+	return result;
+}
+
+int incfs_collect_pending_reads(struct mount_info *mi, int sn_lowerbound,
+				struct incfs_pending_read_info *reads,
+				struct incfs_pending_read_info2 *reads2,
+				int reads_size, int *new_max_sn)
+{
+	int reported_reads = 0;
+	struct pending_read *entry = NULL;
+
+	if (!mi)
+		return -EFAULT;
+
+	if (reads_size <= 0)
+		return 0;
+
+	if (!incfs_fresh_pending_reads_exist(mi, sn_lowerbound))
+		return 0;
+
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(entry, &mi->mi_reads_list_head, mi_reads_list) {
+		if (entry->serial_number <= sn_lowerbound)
+			continue;
+
+		if (reads) {
+			reads[reported_reads].file_id = entry->file_id;
+			reads[reported_reads].block_index = entry->block_index;
+			reads[reported_reads].serial_number =
+				entry->serial_number;
+			reads[reported_reads].timestamp_us =
+				entry->timestamp_us;
+		}
+
+		if (reads2) {
+			reads2[reported_reads].file_id = entry->file_id;
+			reads2[reported_reads].block_index = entry->block_index;
+			reads2[reported_reads].serial_number =
+				entry->serial_number;
+			reads2[reported_reads].timestamp_us =
+				entry->timestamp_us;
+			reads2[reported_reads].uid = entry->uid;
+		}
+
+		if (entry->serial_number > *new_max_sn)
+			*new_max_sn = entry->serial_number;
+
+		reported_reads++;
+		if (reported_reads >= reads_size)
+			break;
+	}
+
+	rcu_read_unlock();
+
+	return reported_reads;
+}
+
+struct read_log_state incfs_get_log_state(struct mount_info *mi)
+{
+	struct read_log *log = &mi->mi_log;
+	struct read_log_state result;
+
+	spin_lock(&log->rl_lock);
+	result = log->rl_head;
+	spin_unlock(&log->rl_lock);
+	return result;
+}
+
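+/*
+ * If the log was reconfigured since 'state' was captured, its generation_id
+ * no longer matches and every record currently in the buffer counts as
+ * uncollected; otherwise only records past the reader's position do.
+ */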
+int incfs_get_uncollected_logs_count(struct mount_info *mi,
+				     const struct read_log_state *state)
+{
+	struct read_log *log = &mi->mi_log;
+	u32 generation;
+	u64 head_no, tail_no;
+
+	spin_lock(&log->rl_lock);
+	tail_no = log->rl_tail.current_record_no;
+	head_no = log->rl_head.current_record_no;
+	generation = log->rl_head.generation_id;
+	spin_unlock(&log->rl_lock);
+
+	if (generation != state->generation_id)
+		return head_no - tail_no;
+	else
+		return head_no - max_t(u64, tail_no, state->current_record_no);
+}
+
+int incfs_collect_logged_reads(struct mount_info *mi,
+			       struct read_log_state *state,
+			       struct incfs_pending_read_info *reads,
+			       struct incfs_pending_read_info2 *reads2,
+			       int reads_size)
+{
+	int dst_idx;
+	struct read_log *log = &mi->mi_log;
+	struct read_log_state *head, *tail;
+
+	spin_lock(&log->rl_lock);
+	head = &log->rl_head;
+	tail = &log->rl_tail;
+
+	if (state->generation_id != head->generation_id) {
+		pr_debug("read ptr is wrong generation: %u/%u",
+			 state->generation_id, head->generation_id);
+
+		*state = (struct read_log_state){
+			.generation_id = head->generation_id,
+		};
+	}
+
+	if (state->current_record_no < tail->current_record_no) {
+		pr_debug("read ptr is behind, moving: %u/%u -> %u/%u\n",
+			 (u32)state->next_offset,
+			 (u32)state->current_pass_no,
+			 (u32)tail->next_offset, (u32)tail->current_pass_no);
+
+		*state = *tail;
+	}
+
+	for (dst_idx = 0; dst_idx < reads_size; dst_idx++) {
+		if (state->current_record_no == head->current_record_no)
+			break;
+
+		log_read_one_record(log, state);
+
+		if (reads)
+			reads[dst_idx] = (struct incfs_pending_read_info) {
+				.file_id = state->base_record.file_id,
+				.block_index = state->base_record.block_index,
+				.serial_number = state->current_record_no,
+				.timestamp_us =
+					state->base_record.absolute_ts_us,
+			};
+
+		if (reads2)
+			reads2[dst_idx] = (struct incfs_pending_read_info2) {
+				.file_id = state->base_record.file_id,
+				.block_index = state->base_record.block_index,
+				.serial_number = state->current_record_no,
+				.timestamp_us =
+					state->base_record.absolute_ts_us,
+				.uid = state->base_record.uid,
+			};
+	}
+
+	spin_unlock(&log->rl_lock);
+	return dst_idx;
+}
+
diff --git a/fs/incfs/data_mgmt.h b/fs/incfs/data_mgmt.h
new file mode 100644
index 0000000..2227913
--- /dev/null
+++ b/fs/incfs/data_mgmt.h
@@ -0,0 +1,549 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2019 Google LLC
+ */
+#ifndef _INCFS_DATA_MGMT_H
+#define _INCFS_DATA_MGMT_H
+
+#include <linux/cred.h>
+#include <linux/fs.h>
+#include <linux/types.h>
+#include <linux/mutex.h>
+#include <linux/spinlock.h>
+#include <linux/rcupdate.h>
+#include <linux/completion.h>
+#include <linux/wait.h>
+#include <linux/zstd.h>
+#include <crypto/hash.h>
+#include <linux/rwsem.h>
+
+#include <uapi/linux/incrementalfs.h>
+
+#include "internal.h"
+#include "pseudo_files.h"
+
+#define SEGMENTS_PER_FILE 3
+
+enum LOG_RECORD_TYPE {
+	FULL,
+	SAME_FILE,
+	SAME_FILE_CLOSE_BLOCK,
+	SAME_FILE_CLOSE_BLOCK_SHORT,
+	SAME_FILE_NEXT_BLOCK,
+	SAME_FILE_NEXT_BLOCK_SHORT,
+};
+
+struct full_record {
+	enum LOG_RECORD_TYPE type : 3; /* FULL */
+	u32 block_index : 29;
+	incfs_uuid_t file_id;
+	u64 absolute_ts_us;
+	uid_t uid;
+} __packed; /* 32 bytes */
+
+struct same_file {
+	enum LOG_RECORD_TYPE type : 3; /* SAME_FILE */
+	u32 block_index : 29;
+	uid_t uid;
+	u16 relative_ts_us; /* max 2^16 us ~= 64 ms */
+} __packed; /* 10 bytes */
+
+struct same_file_close_block {
+	enum LOG_RECORD_TYPE type : 3; /* SAME_FILE_CLOSE_BLOCK */
+	u16 relative_ts_us : 13; /* max 2^13 us ~= 8 ms */
+	s16 block_index_delta;
+} __packed; /* 4 bytes */
+
+struct same_file_close_block_short {
+	enum LOG_RECORD_TYPE type : 3; /* SAME_FILE_CLOSE_BLOCK_SHORT */
+	u8 relative_ts_tens_us : 5; /* max 2^5*10 us ~= 320 us */
+	s8 block_index_delta;
+} __packed; /* 2 bytes */
+
+struct same_file_next_block {
+	enum LOG_RECORD_TYPE type : 3; /* SAME_FILE_NEXT_BLOCK */
+	u16 relative_ts_us : 13; /* max 2^13 us ~= 8 ms */
+} __packed; /* 2 bytes */
+
+struct same_file_next_block_short {
+	enum LOG_RECORD_TYPE type : 3; /* SAME_FILE_NEXT_BLOCK_SHORT */
+	u8 relative_ts_tens_us : 5; /* max 2^5*10 us ~= 320 us */
+} __packed; /* 1 byte */
+
+union log_record {
+	struct full_record full_record;
+	struct same_file same_file;
+	struct same_file_close_block same_file_close_block;
+	struct same_file_close_block_short same_file_close_block_short;
+	struct same_file_next_block same_file_next_block;
+	struct same_file_next_block_short same_file_next_block_short;
+};
+
+struct read_log_state {
+	/* Log buffer generation id, incremented on configuration changes */
+	u32 generation_id;
+
+	/* Offset in rl_ring_buf to write into. */
+	u32 next_offset;
+
+	/* Current number of writer passes over rl_ring_buf */
+	u32 current_pass_no;
+
+	/* Current full_record to diff against */
+	struct full_record base_record;
+
+	/* Current record number counting from configuration change */
+	u64 current_record_no;
+};
+
+/* A ring buffer to save records about data blocks which were recently read. */
+struct read_log {
+	void *rl_ring_buf;
+
+	int rl_size;
+
+	struct read_log_state rl_head;
+
+	struct read_log_state rl_tail;
+
+	/* A lock to protect the above fields */
+	spinlock_t rl_lock;
+
+	/* A queue of waiters who want to be notified about reads */
+	wait_queue_head_t ml_notif_wq;
+
+	/* A work item to wake up those waiters without slowing down readers */
+	struct delayed_work ml_wakeup_work;
+};
+
+struct mount_options {
+	unsigned int read_timeout_ms;
+	unsigned int readahead_pages;
+	unsigned int read_log_pages;
+	unsigned int read_log_wakeup_count;
+	bool report_uid;
+	char *sysfs_name;
+};
+
+struct mount_info {
+	struct super_block *mi_sb;
+
+	struct path mi_backing_dir_path;
+
+	struct dentry *mi_index_dir;
+	/* For stacking mounts, if true, the index dir needs to be freed for
+	 * this SB; otherwise it was created by a lower-level SB */
+	bool mi_index_free;
+
+	struct dentry *mi_incomplete_dir;
+	/* For stacking mounts, if true, the incomplete dir needs to be freed
+	 * for this SB; similar to mi_index_free */
+	bool mi_incomplete_free;
+
+	const struct cred *mi_owner;
+
+	struct mount_options mi_options;
+
+	/* This mutex is to be taken before create, rename, delete */
+	struct mutex mi_dir_struct_mutex;
+
+	/*
+	 * A queue of waiters who want to be notified about new pending reads.
+	 */
+	wait_queue_head_t mi_pending_reads_notif_wq;
+
+	/*
+	 * Protects - RCU safe:
+	 *  - reads_list_head
+	 *  - mi_pending_reads_count
+	 *  - mi_last_pending_read_number
+	 *  - data_file_segment.reads_list_head
+	 */
+	spinlock_t pending_read_lock;
+
+	/* List of active pending_read objects */
+	struct list_head mi_reads_list_head;
+
+	/* Total number of items in reads_list_head */
+	int mi_pending_reads_count;
+
+	/*
+	 * Last serial number that was assigned to a pending read.
+	 * 0 means no pending reads have been seen yet.
+	 */
+	int mi_last_pending_read_number;
+
+	/* Temporary buffer for read logger. */
+	struct read_log mi_log;
+
+	/* SELinux needs special xattrs on our pseudo files */
+	struct mem_range pseudo_file_xattr[PSEUDO_FILE_COUNT];
+
+	/* A queue of waiters who want to be notified about blocks_written */
+	wait_queue_head_t mi_blocks_written_notif_wq;
+
+	/* Number of blocks written since mount */
+	atomic_t mi_blocks_written;
+
+	/* Per UID read timeouts */
+	spinlock_t mi_per_uid_read_timeouts_lock;
+	struct incfs_per_uid_read_timeouts *mi_per_uid_read_timeouts;
+	int mi_per_uid_read_timeouts_size;
+
+	/* zstd workspace */
+	struct mutex mi_zstd_workspace_mutex;
+	void *mi_zstd_workspace;
+	ZSTD_DStream *mi_zstd_stream;
+	struct delayed_work mi_zstd_cleanup_work;
+
+	/* sysfs node */
+	struct incfs_sysfs_node *mi_sysfs_node;
+
+	/* Last error information */
+	struct mutex	mi_le_mutex;
+	incfs_uuid_t	mi_le_file_id;
+	u64		mi_le_time_us;
+	u32		mi_le_page;
+	u32		mi_le_errno;
+	uid_t		mi_le_uid;
+
+	/* Number of reads timed out */
+	u32 mi_reads_failed_timed_out;
+
+	/* Number of reads failed because hash verification failed */
+	u32 mi_reads_failed_hash_verification;
+
+	/* Number of reads failed for another reason */
+	u32 mi_reads_failed_other;
+
+	/* Number of reads delayed because page had to be fetched */
+	u32 mi_reads_delayed_pending;
+
+	/* Total time waiting for pages to be fetched */
+	u64 mi_reads_delayed_pending_us;
+
+	/*
+	 * Number of reads delayed because of per-uid min_time_us or
+	 * min_pending_time_us settings
+	 */
+	u32 mi_reads_delayed_min;
+
+	/* Total time waiting because of per-uid min_time_us or
+	 * min_pending_time_us settings.
+	 *
+	 * Note that if a read is initially delayed because we have to wait for
+	 * the page, then further delayed because of min_pending_time_us
+	 * setting, this counter gets incremented by only the further delay
+	 * time.
+	 */
+	u64 mi_reads_delayed_min_us;
+};
+
+struct data_file_block {
+	loff_t db_backing_file_data_offset;
+
+	size_t db_stored_size;
+
+	enum incfs_compression_alg db_comp_alg;
+};
+
+struct pending_read {
+	incfs_uuid_t file_id;
+
+	s64 timestamp_us;
+
+	atomic_t done;
+
+	int block_index;
+
+	int serial_number;
+
+	uid_t uid;
+
+	struct list_head mi_reads_list;
+
+	struct list_head segment_reads_list;
+
+	struct rcu_head rcu;
+};
+
+struct data_file_segment {
+	wait_queue_head_t new_data_arrival_wq;
+
+	/* Protects reads and writes from the blockmap */
+	struct rw_semaphore rwsem;
+
+	/* List of active pending_read objects belonging to this segment */
+	/* Protected by mount_info.pending_read_lock */
+	struct list_head reads_list_head;
+};
+
+/*
+ * Extra info associated with a file. Just a few bytes set by a user.
+ */
+struct file_attr {
+	loff_t fa_value_offset;
+
+	size_t fa_value_size;
+
+	u32 fa_crc;
+};
+
+
+struct data_file {
+	struct backing_file_context *df_backing_file_context;
+
+	struct mount_info *df_mount_info;
+
+	incfs_uuid_t df_id;
+
+	/*
+	 * Array of segments used to reduce lock contention for the file.
+	 * The segment for a block is chosen based on the block's index.
+	 */
+	struct data_file_segment df_segments[SEGMENTS_PER_FILE];
+
+	/* Base offset of the first metadata record. */
+	loff_t df_metadata_off;
+
+	/* Base offset of the block map. */
+	loff_t df_blockmap_off;
+
+	/* File size in bytes */
+	loff_t df_size;
+
+	/* File header flags */
+	u32 df_header_flags;
+
+	/* File size in DATA_FILE_BLOCK_SIZE blocks */
+	int df_data_block_count;
+
+	/* Total number of blocks, data + hash */
+	int df_total_block_count;
+
+	/* For mapped files, the offset into the actual file */
+	loff_t df_mapped_offset;
+
+	/* Number of data blocks written to file */
+	atomic_t df_data_blocks_written;
+
+	/* Number of data blocks in the status block */
+	u32 df_initial_data_blocks_written;
+
+	/* Number of hash blocks written to file */
+	atomic_t df_hash_blocks_written;
+
+	/* Number of hash blocks in the status block */
+	u32 df_initial_hash_blocks_written;
+
+	/* Offset to status metadata header */
+	loff_t df_status_offset;
+
+	/*
+	 * Mutex acquired while enabling verity. Note that df_hash_tree is set
+	 * by enable verity.
+	 *
+	 * The backing file mutex bc_mutex may be taken while this mutex is
+	 * held.
+	 */
+	struct mutex df_enable_verity;
+
+	/*
+	 * Set either at construction time or during enabling verity. In the
+	 * latter case, set via smp_store_release, so use smp_load_acquire to
+	 * read it.
+	 */
+	struct mtree *df_hash_tree;
+
+	/* Guaranteed set if df_hash_tree is set. */
+	struct incfs_df_signature *df_signature;
+
+	/*
+	 * The verity file digest, set when verity is enabled and the file has
+	 * been opened
+	 */
+	struct mem_range df_verity_file_digest;
+
+	struct incfs_df_verity_signature *df_verity_signature;
+};
+
+struct dir_file {
+	struct mount_info *mount_info;
+
+	struct file *backing_dir;
+};
+
+struct inode_info {
+	struct mount_info *n_mount_info; /* A mount, this file belongs to */
+
+	struct inode *n_backing_inode;
+
+	struct data_file *n_file;
+
+	struct inode n_vfs_inode;
+};
+
+struct dentry_info {
+	struct path backing_path;
+};
+
+enum FILL_PERMISSION {
+	CANT_FILL = 0,
+	CAN_FILL = 1,
+};
+
+struct incfs_file_data {
+	/* Does this file handle have INCFS_IOC_FILL_BLOCKS permission */
+	enum FILL_PERMISSION fd_fill_permission;
+
+	/* If INCFS_IOC_GET_FILLED_BLOCKS has been called, where are we */
+	int fd_get_block_pos;
+
+	/* And how many filled blocks are there up to that point */
+	int fd_filled_data_blocks;
+	int fd_filled_hash_blocks;
+};
+
+struct mount_info *incfs_alloc_mount_info(struct super_block *sb,
+					  struct mount_options *options,
+					  struct path *backing_dir_path);
+
+int incfs_realloc_mount_info(struct mount_info *mi,
+			     struct mount_options *options);
+
+void incfs_free_mount_info(struct mount_info *mi);
+
+char *file_id_to_str(incfs_uuid_t id);
+struct dentry *incfs_lookup_dentry(struct dentry *parent, const char *name);
+struct data_file *incfs_open_data_file(struct mount_info *mi, struct file *bf);
+void incfs_free_data_file(struct data_file *df);
+
+struct dir_file *incfs_open_dir_file(struct mount_info *mi, struct file *bf);
+void incfs_free_dir_file(struct dir_file *dir);
+
+struct incfs_read_data_file_timeouts {
+	u32 min_time_us;
+	u32 min_pending_time_us;
+	u32 max_pending_time_us;
+};
+
+ssize_t incfs_read_data_file_block(struct mem_range dst, struct file *f,
+			int index, struct mem_range tmp,
+			struct incfs_read_data_file_timeouts *timeouts);
+
+ssize_t incfs_read_merkle_tree_blocks(struct mem_range dst,
+				      struct data_file *df, size_t offset);
+
+int incfs_get_filled_blocks(struct data_file *df,
+			    struct incfs_file_data *fd,
+			    struct incfs_get_filled_blocks_args *arg);
+
+int incfs_read_file_signature(struct data_file *df, struct mem_range dst);
+
+int incfs_process_new_data_block(struct data_file *df,
+				 struct incfs_fill_block *block, u8 *data);
+
+int incfs_process_new_hash_block(struct data_file *df,
+				 struct incfs_fill_block *block, u8 *data);
+
+bool incfs_fresh_pending_reads_exist(struct mount_info *mi, int last_number);
+
+/*
+ * Collects pending reads and saves them into the array (reads/reads_size).
+ * Only reads with serial_number > sn_lowerbound are reported.
+ * Returns how many reads were saved into the array.
+ */
+int incfs_collect_pending_reads(struct mount_info *mi, int sn_lowerbound,
+				struct incfs_pending_read_info *reads,
+				struct incfs_pending_read_info2 *reads2,
+				int reads_size, int *new_max_sn);
+
+int incfs_collect_logged_reads(struct mount_info *mi,
+			       struct read_log_state *start_state,
+			       struct incfs_pending_read_info *reads,
+			       struct incfs_pending_read_info2 *reads2,
+			       int reads_size);
+struct read_log_state incfs_get_log_state(struct mount_info *mi);
+int incfs_get_uncollected_logs_count(struct mount_info *mi,
+				     const struct read_log_state *state);
+
+static inline struct inode_info *get_incfs_node(struct inode *inode)
+{
+	if (!inode)
+		return NULL;
+
+	if (inode->i_sb->s_magic != INCFS_MAGIC_NUMBER) {
+		/* This inode doesn't belong to us. */
+		pr_warn_once("incfs: %s on an alien inode.", __func__);
+		return NULL;
+	}
+
+	return container_of(inode, struct inode_info, n_vfs_inode);
+}
+
+static inline struct data_file *get_incfs_data_file(struct file *f)
+{
+	struct inode_info *node = NULL;
+
+	if (!f)
+		return NULL;
+
+	if (!S_ISREG(f->f_inode->i_mode))
+		return NULL;
+
+	node = get_incfs_node(f->f_inode);
+	if (!node)
+		return NULL;
+
+	return node->n_file;
+}
+
+static inline struct dir_file *get_incfs_dir_file(struct file *f)
+{
+	if (!f)
+		return NULL;
+
+	if (!S_ISDIR(f->f_inode->i_mode))
+		return NULL;
+
+	return (struct dir_file *)f->private_data;
+}
+
+/*
+ * Make sure that inode_info.n_file is initialized and inode can be used
+ * for reading and writing data from/to the backing file.
+ */
+int make_inode_ready_for_data_ops(struct mount_info *mi,
+				struct inode *inode,
+				struct file *backing_file);
+
+static inline struct dentry_info *get_incfs_dentry(const struct dentry *d)
+{
+	if (!d)
+		return NULL;
+
+	return (struct dentry_info *)d->d_fsdata;
+}
+
+static inline void get_incfs_backing_path(const struct dentry *d,
+					  struct path *path)
+{
+	struct dentry_info *di = get_incfs_dentry(d);
+
+	if (!di) {
+		*path = (struct path) {};
+		return;
+	}
+
+	*path = di->backing_path;
+	path_get(path);
+}
+
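+/*
+ * Equivalent to DIV_ROUND_UP(size, INCFS_DATA_FILE_BLOCK_SIZE); e.g. with
+ * 4K blocks a 4097-byte file needs 2 blocks.
+ */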
+static inline int get_blocks_count_for_size(u64 size)
+{
+	if (size == 0)
+		return 0;
+	return 1 + (size - 1) / INCFS_DATA_FILE_BLOCK_SIZE;
+}
+
+#endif /* _INCFS_DATA_MGMT_H */
diff --git a/fs/incfs/format.c b/fs/incfs/format.c
new file mode 100644
index 0000000..6ffcf33
--- /dev/null
+++ b/fs/incfs/format.c
@@ -0,0 +1,752 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018 Google LLC
+ */
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/types.h>
+#include <linux/mutex.h>
+#include <linux/mm.h>
+#include <linux/falloc.h>
+#include <linux/slab.h>
+#include <linux/crc32.h>
+#include <linux/kernel.h>
+
+#include "format.h"
+#include "data_mgmt.h"
+
+struct backing_file_context *incfs_alloc_bfc(struct mount_info *mi,
+					     struct file *backing_file)
+{
+	struct backing_file_context *result = NULL;
+
+	result = kzalloc(sizeof(*result), GFP_NOFS);
+	if (!result)
+		return ERR_PTR(-ENOMEM);
+
+	result->bc_file = get_file(backing_file);
+	result->bc_cred = mi->mi_owner;
+	mutex_init(&result->bc_mutex);
+	return result;
+}
+
+void incfs_free_bfc(struct backing_file_context *bfc)
+{
+	if (!bfc)
+		return;
+
+	if (bfc->bc_file)
+		fput(bfc->bc_file);
+
+	mutex_destroy(&bfc->bc_mutex);
+	kfree(bfc);
+}
+
+static loff_t incfs_get_end_offset(struct file *f)
+{
+	/*
+	 * This function assumes that file size and the end-offset
+	 * are the same. This is not always true.
+	 */
+	return i_size_read(file_inode(f));
+}
+
+/*
+ * Truncate the tail of the file to the given length.
+ * Used to rollback partially successful multistep writes.
+ */
+static int truncate_backing_file(struct backing_file_context *bfc,
+				loff_t new_end)
+{
+	struct inode *inode = NULL;
+	struct dentry *dentry = NULL;
+	loff_t old_end = 0;
+	struct iattr attr;
+	int result = 0;
+
+	if (!bfc)
+		return -EFAULT;
+
+	LOCK_REQUIRED(bfc->bc_mutex);
+
+	if (!bfc->bc_file)
+		return -EFAULT;
+
+	old_end = incfs_get_end_offset(bfc->bc_file);
+	if (old_end == new_end)
+		return 0;
+	if (old_end < new_end)
+		return -EINVAL;
+
+	inode = bfc->bc_file->f_inode;
+	dentry = bfc->bc_file->f_path.dentry;
+
+	attr.ia_size = new_end;
+	attr.ia_valid = ATTR_SIZE;
+
+	inode_lock(inode);
+	result = notify_change(&init_user_ns, dentry, &attr, NULL);
+	inode_unlock(inode);
+
+	return result;
+}
+
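+/*
+ * Thin wrapper over incfs_kwrite() that turns a short write into -EIO so
+ * callers only need to check a single error code.
+ */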
+static int write_to_bf(struct backing_file_context *bfc, const void *buf,
+			size_t count, loff_t pos)
+{
+	ssize_t res = incfs_kwrite(bfc, buf, count, pos);
+
+	if (res < 0)
+		return res;
+	if (res != count)
+		return -EIO;
+	return 0;
+}
+
+static int append_zeros_no_fallocate(struct backing_file_context *bfc,
+				     size_t file_size, size_t len)
+{
+	u8 buffer[256] = {};
+	size_t i;
+
+	for (i = 0; i < len; i += sizeof(buffer)) {
+		int to_write = len - i > sizeof(buffer)
+			? sizeof(buffer) : len - i;
+		int err = write_to_bf(bfc, buffer, to_write, file_size + i);
+
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+/* Append a given number of zero bytes to the end of the backing file. */
+static int append_zeros(struct backing_file_context *bfc, size_t len)
+{
+	loff_t file_size = 0;
+	loff_t new_last_byte_offset = 0;
+	int result;
+
+	if (!bfc)
+		return -EFAULT;
+
+	if (len == 0)
+		return 0;
+
+	LOCK_REQUIRED(bfc->bc_mutex);
+
+	/*
+	 * Allocate only one byte at the new desired end of the file.
+	 * It will increase the file size and create a zeroed area of
+	 * the requested length.
+	 */
+	file_size = incfs_get_end_offset(bfc->bc_file);
+	new_last_byte_offset = file_size + len - 1;
+	result = vfs_fallocate(bfc->bc_file, 0, new_last_byte_offset, 1);
+	if (result != -EOPNOTSUPP)
+		return result;
+
+	return append_zeros_no_fallocate(bfc, file_size, len);
+}
+
+/*
+ * Append a given metadata record to the backing file and update a previous
+ * record to add the new record to the metadata list.
+ */
+static int append_md_to_backing_file(struct backing_file_context *bfc,
+			      struct incfs_md_header *record)
+{
+	int result = 0;
+	loff_t record_offset;
+	loff_t file_pos;
+	__le64 new_md_offset;
+	size_t record_size;
+
+	if (!bfc || !record)
+		return -EFAULT;
+
+	if (bfc->bc_last_md_record_offset < 0)
+		return -EINVAL;
+
+	LOCK_REQUIRED(bfc->bc_mutex);
+
+	record_size = le16_to_cpu(record->h_record_size);
+	file_pos = incfs_get_end_offset(bfc->bc_file);
+	record->h_next_md_offset = 0;
+
+	/* Write the metadata record to the end of the backing file */
+	record_offset = file_pos;
+	new_md_offset = cpu_to_le64(record_offset);
+	result = write_to_bf(bfc, record, record_size, file_pos);
+	if (result)
+		return result;
+
+	/* Update the next-metadata offset in the previous record or the file header. */
+	if (bfc->bc_last_md_record_offset) {
+		/*
+		 * Find a place in the previous md record where new record's
+		 * offset needs to be saved.
+		 */
+		file_pos = bfc->bc_last_md_record_offset +
+			offsetof(struct incfs_md_header, h_next_md_offset);
+	} else {
+		/*
+		 * No metadata yet; find the place to update in the
+		 * file_header.
+		 */
+		file_pos = offsetof(struct incfs_file_header,
+				    fh_first_md_offset);
+	}
+	result = write_to_bf(bfc, &new_md_offset, sizeof(new_md_offset),
+			     file_pos);
+	if (result)
+		return result;
+
+	bfc->bc_last_md_record_offset = record_offset;
+	return result;
+}
+
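+/*
+ * On-disk metadata layout, as maintained by append_md_to_backing_file()
+ * above: records form a singly linked list starting at
+ * incfs_file_header.fh_first_md_offset and threaded through each record's
+ * h_next_md_offset, e.g.:
+ *
+ *   file_header -> [blockmap md] -> [signature md] -> [status md] -> 0
+ *
+ * New records are appended at EOF and the previous tail (or the header)
+ * is patched to point at them.
+ */
+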
+/*
+ * Reserve 0-filled space for the blockmap body, and append
+ * incfs_blockmap metadata record pointing to it.
+ */
+int incfs_write_blockmap_to_backing_file(struct backing_file_context *bfc,
+					 u32 block_count)
+{
+	struct incfs_blockmap blockmap = {};
+	int result = 0;
+	loff_t file_end = 0;
+	size_t map_size = block_count * sizeof(struct incfs_blockmap_entry);
+
+	if (!bfc)
+		return -EFAULT;
+
+	blockmap.m_header.h_md_entry_type = INCFS_MD_BLOCK_MAP;
+	blockmap.m_header.h_record_size = cpu_to_le16(sizeof(blockmap));
+	blockmap.m_header.h_next_md_offset = cpu_to_le64(0);
+	blockmap.m_block_count = cpu_to_le32(block_count);
+
+	LOCK_REQUIRED(bfc->bc_mutex);
+
+	/* Reserve 0-filled space for the blockmap body in the backing file. */
+	file_end = incfs_get_end_offset(bfc->bc_file);
+	result = append_zeros(bfc, map_size);
+	if (result)
+		return result;
+
+	/* Write blockmap metadata record pointing to the body written above. */
+	blockmap.m_base_offset = cpu_to_le64(file_end);
+	result = append_md_to_backing_file(bfc, &blockmap.m_header);
+	if (result)
+		/* Error, rollback file changes */
+		truncate_backing_file(bfc, file_end);
+
+	return result;
+}
+
+int incfs_write_signature_to_backing_file(struct backing_file_context *bfc,
+					struct mem_range sig, u32 tree_size,
+					loff_t *tree_offset, loff_t *sig_offset)
+{
+	struct incfs_file_signature sg = {};
+	int result = 0;
+	loff_t rollback_pos = 0;
+	loff_t tree_area_pos = 0;
+	size_t alignment = 0;
+
+	if (!bfc)
+		return -EFAULT;
+
+	LOCK_REQUIRED(bfc->bc_mutex);
+
+	rollback_pos = incfs_get_end_offset(bfc->bc_file);
+
+	sg.sg_header.h_md_entry_type = INCFS_MD_SIGNATURE;
+	sg.sg_header.h_record_size = cpu_to_le16(sizeof(sg));
+	sg.sg_header.h_next_md_offset = cpu_to_le64(0);
+	if (sig.data != NULL && sig.len > 0) {
+		sg.sg_sig_size = cpu_to_le32(sig.len);
+		sg.sg_sig_offset = cpu_to_le64(rollback_pos);
+
+		result = write_to_bf(bfc, sig.data, sig.len, rollback_pos);
+		if (result)
+			goto err;
+	}
+
+	tree_area_pos = incfs_get_end_offset(bfc->bc_file);
+	if (tree_size > 0) {
+		if (tree_size > 5 * INCFS_DATA_FILE_BLOCK_SIZE) {
+			/*
+			 * If hash tree is big enough, it makes sense to
+			 * align in the backing file for faster access.
+			 */
+			loff_t offset = round_up(tree_area_pos, PAGE_SIZE);
+
+			alignment = offset - tree_area_pos;
+			tree_area_pos = offset;
+		}
+
+		/*
+		 * If the root hash is not the only hash in the tree,
+		 * reserve 0-filled space for the tree.
+		 */
+		result = append_zeros(bfc, tree_size + alignment);
+		if (result)
+			goto err;
+
+		sg.sg_hash_tree_size = cpu_to_le32(tree_size);
+		sg.sg_hash_tree_offset = cpu_to_le64(tree_area_pos);
+	}
+
+	/* Write a signature metadata record pointing to the data above. */
+	result = append_md_to_backing_file(bfc, &sg.sg_header);
+err:
+	if (result)
+		/* Error, rollback file changes */
+		truncate_backing_file(bfc, rollback_pos);
+	else {
+		if (tree_offset)
+			*tree_offset = tree_area_pos;
+		if (sig_offset)
+			*sig_offset = rollback_pos;
+	}
+
+	return result;
+}
+
+static int write_new_status_to_backing_file(struct backing_file_context *bfc,
+				       u32 data_blocks_written,
+				       u32 hash_blocks_written)
+{
+	int result;
+	loff_t rollback_pos;
+	struct incfs_status is = {
+		.is_header = {
+			.h_md_entry_type = INCFS_MD_STATUS,
+			.h_record_size = cpu_to_le16(sizeof(is)),
+		},
+		.is_data_blocks_written = cpu_to_le32(data_blocks_written),
+		.is_hash_blocks_written = cpu_to_le32(hash_blocks_written),
+	};
+
+	LOCK_REQUIRED(bfc->bc_mutex);
+	rollback_pos = incfs_get_end_offset(bfc->bc_file);
+	result = append_md_to_backing_file(bfc, &is.is_header);
+	if (result)
+		truncate_backing_file(bfc, rollback_pos);
+
+	return result;
+}
+
+int incfs_write_status_to_backing_file(struct backing_file_context *bfc,
+				       loff_t status_offset,
+				       u32 data_blocks_written,
+				       u32 hash_blocks_written)
+{
+	struct incfs_status is;
+	int result;
+
+	if (!bfc)
+		return -EFAULT;
+
+	if (status_offset == 0)
+		return write_new_status_to_backing_file(bfc,
+				data_blocks_written, hash_blocks_written);
+
+	result = incfs_kread(bfc, &is, sizeof(is), status_offset);
+	if (result != sizeof(is))
+		return -EIO;
+
+	is.is_data_blocks_written = cpu_to_le32(data_blocks_written);
+	is.is_hash_blocks_written = cpu_to_le32(hash_blocks_written);
+	result = incfs_kwrite(bfc, &is, sizeof(is), status_offset);
+	if (result != sizeof(is))
+		return -EIO;
+
+	return 0;
+}
+
+int incfs_write_verity_signature_to_backing_file(
+		struct backing_file_context *bfc, struct mem_range signature,
+		loff_t *offset)
+{
+	struct incfs_file_verity_signature vs = {};
+	int result;
+	loff_t pos;
+
+	/* No verity signature section is equivalent to an empty section */
+	if (signature.data == NULL || signature.len == 0)
+		return 0;
+
+	pos = incfs_get_end_offset(bfc->bc_file);
+
+	vs = (struct incfs_file_verity_signature) {
+		.vs_header = (struct incfs_md_header) {
+			.h_md_entry_type = INCFS_MD_VERITY_SIGNATURE,
+			.h_record_size = cpu_to_le16(sizeof(vs)),
+			.h_next_md_offset = cpu_to_le64(0),
+		},
+		.vs_size = cpu_to_le32(signature.len),
+		.vs_offset = cpu_to_le64(pos),
+	};
+
+	result = write_to_bf(bfc, signature.data, signature.len, pos);
+	if (result)
+		goto err;
+
+	result = append_md_to_backing_file(bfc, &vs.vs_header);
+	if (result)
+		goto err;
+
+	*offset = pos;
+err:
+	if (result)
+		/* Error, rollback file changes */
+		truncate_backing_file(bfc, pos);
+	return result;
+}
+
+/*
+ * Write a backing file header.
+ * It should only be called on an empty file.
+ * fh.fh_first_md_offset is 0 for now, but will be updated
+ * once the first metadata record is added.
+ */
+int incfs_write_fh_to_backing_file(struct backing_file_context *bfc,
+				   incfs_uuid_t *uuid, u64 file_size)
+{
+	struct incfs_file_header fh = {};
+	loff_t file_pos = 0;
+
+	if (!bfc)
+		return -EFAULT;
+
+	fh.fh_magic = cpu_to_le64(INCFS_MAGIC_NUMBER);
+	fh.fh_version = cpu_to_le64(INCFS_FORMAT_CURRENT_VER);
+	fh.fh_header_size = cpu_to_le16(sizeof(fh));
+	fh.fh_first_md_offset = cpu_to_le64(0);
+	fh.fh_data_block_size = cpu_to_le16(INCFS_DATA_FILE_BLOCK_SIZE);
+
+	fh.fh_file_size = cpu_to_le64(file_size);
+	fh.fh_uuid = *uuid;
+
+	LOCK_REQUIRED(bfc->bc_mutex);
+
+	file_pos = incfs_get_end_offset(bfc->bc_file);
+	if (file_pos != 0)
+		return -EEXIST;
+
+	return write_to_bf(bfc, &fh, sizeof(fh), file_pos);
+}
+
+/*
+ * Write a backing file header for a mapping file.
+ * It should only be called on an empty file.
+ */
+int incfs_write_mapping_fh_to_backing_file(struct backing_file_context *bfc,
+				incfs_uuid_t *uuid, u64 file_size, u64 offset)
+{
+	struct incfs_file_header fh = {};
+	loff_t file_pos = 0;
+
+	if (!bfc)
+		return -EFAULT;
+
+	fh.fh_magic = cpu_to_le64(INCFS_MAGIC_NUMBER);
+	fh.fh_version = cpu_to_le64(INCFS_FORMAT_CURRENT_VER);
+	fh.fh_header_size = cpu_to_le16(sizeof(fh));
+	fh.fh_original_offset = cpu_to_le64(offset);
+	fh.fh_data_block_size = cpu_to_le16(INCFS_DATA_FILE_BLOCK_SIZE);
+
+	fh.fh_mapped_file_size = cpu_to_le64(file_size);
+	fh.fh_original_uuid = *uuid;
+	fh.fh_flags = cpu_to_le32(INCFS_FILE_MAPPED);
+
+	LOCK_REQUIRED(bfc->bc_mutex);
+
+	file_pos = incfs_get_end_offset(bfc->bc_file);
+	if (file_pos != 0)
+		return -EEXIST;
+
+	return write_to_bf(bfc, &fh, sizeof(fh), file_pos);
+}
+
+/* Write a given data block and update the file's blockmap to point to it. */
+int incfs_write_data_block_to_backing_file(struct backing_file_context *bfc,
+				     struct mem_range block, int block_index,
+				     loff_t bm_base_off, u16 flags)
+{
+	struct incfs_blockmap_entry bm_entry = {};
+	int result = 0;
+	loff_t data_offset = 0;
+	loff_t bm_entry_off =
+		bm_base_off + sizeof(struct incfs_blockmap_entry) * block_index;
+
+	if (!bfc)
+		return -EFAULT;
+
+	if (block.len >= (1 << 16) || block_index < 0)
+		return -EINVAL;
+
+	LOCK_REQUIRED(bfc->bc_mutex);
+
+	data_offset = incfs_get_end_offset(bfc->bc_file);
+	if (data_offset <= bm_entry_off) {
+		/* Blockmap entry is beyond the file's end. This is not normal. */
+		return -EINVAL;
+	}
+
+	/* Write the block data at the end of the backing file. */
+	result = write_to_bf(bfc, block.data, block.len, data_offset);
+	if (result)
+		return result;
+
+	/* Update the blockmap to point to the newly written data. */
+	bm_entry.me_data_offset_lo = cpu_to_le32((u32)data_offset);
+	bm_entry.me_data_offset_hi = cpu_to_le16((u16)(data_offset >> 32));
+	bm_entry.me_data_size = cpu_to_le16((u16)block.len);
+	bm_entry.me_flags = cpu_to_le16(flags);
+
+	return write_to_bf(bfc, &bm_entry, sizeof(bm_entry),
+				bm_entry_off);
+}
+
+int incfs_write_hash_block_to_backing_file(struct backing_file_context *bfc,
+					   struct mem_range block,
+					   int block_index,
+					   loff_t hash_area_off,
+					   loff_t bm_base_off,
+					   loff_t file_size)
+{
+	struct incfs_blockmap_entry bm_entry = {};
+	int result;
+	loff_t data_offset = 0;
+	loff_t file_end = 0;
+	loff_t bm_entry_off =
+		bm_base_off +
+		sizeof(struct incfs_blockmap_entry) *
+			(block_index + get_blocks_count_for_size(file_size));
+
+	if (!bfc)
+		return -EFAULT;
+
+	LOCK_REQUIRED(bfc->bc_mutex);
+
+	data_offset = hash_area_off + block_index * INCFS_DATA_FILE_BLOCK_SIZE;
+	file_end = incfs_get_end_offset(bfc->bc_file);
+	if (data_offset + block.len > file_end) {
+		/* Block is located beyond the file's end. This is not normal. */
+		return -EINVAL;
+	}
+
+	result = write_to_bf(bfc, block.data, block.len, data_offset);
+	if (result)
+		return result;
+
+	bm_entry.me_data_offset_lo = cpu_to_le32((u32)data_offset);
+	bm_entry.me_data_offset_hi = cpu_to_le16((u16)(data_offset >> 32));
+	bm_entry.me_data_size = cpu_to_le16(INCFS_DATA_FILE_BLOCK_SIZE);
+
+	return write_to_bf(bfc, &bm_entry, sizeof(bm_entry), bm_entry_off);
+}
+
+int incfs_read_blockmap_entry(struct backing_file_context *bfc, int block_index,
+			loff_t bm_base_off,
+			struct incfs_blockmap_entry *bm_entry)
+{
+	int error = incfs_read_blockmap_entries(bfc, bm_entry, block_index, 1,
+						bm_base_off);
+
+	if (error < 0)
+		return error;
+
+	if (error == 0)
+		return -EIO;
+
+	if (error != 1)
+		return -EFAULT;
+
+	return 0;
+}
+
+int incfs_read_blockmap_entries(struct backing_file_context *bfc,
+		struct incfs_blockmap_entry *entries,
+		int start_index, int blocks_number,
+		loff_t bm_base_off)
+{
+	loff_t bm_entry_off =
+		bm_base_off + sizeof(struct incfs_blockmap_entry) * start_index;
+	const size_t bytes_to_read = sizeof(struct incfs_blockmap_entry)
+					* blocks_number;
+	int result = 0;
+
+	if (!bfc || !entries)
+		return -EFAULT;
+
+	if (start_index < 0 || bm_base_off <= 0)
+		return -ENODATA;
+
+	result = incfs_kread(bfc, entries, bytes_to_read, bm_entry_off);
+	if (result < 0)
+		return result;
+	return result / sizeof(*entries);
+}
+
+int incfs_read_file_header(struct backing_file_context *bfc,
+			   loff_t *first_md_off, incfs_uuid_t *uuid,
+			   u64 *file_size, u32 *flags)
+{
+	ssize_t bytes_read = 0;
+	struct incfs_file_header fh = {};
+
+	if (!bfc || !first_md_off)
+		return -EFAULT;
+
+	bytes_read = incfs_kread(bfc, &fh, sizeof(fh), 0);
+	if (bytes_read < 0)
+		return bytes_read;
+
+	if (bytes_read < sizeof(fh))
+		return -EBADMSG;
+
+	if (le64_to_cpu(fh.fh_magic) != INCFS_MAGIC_NUMBER)
+		return -EILSEQ;
+
+	if (le64_to_cpu(fh.fh_version) > INCFS_FORMAT_CURRENT_VER)
+		return -EILSEQ;
+
+	if (le16_to_cpu(fh.fh_data_block_size) != INCFS_DATA_FILE_BLOCK_SIZE)
+		return -EILSEQ;
+
+	if (le16_to_cpu(fh.fh_header_size) != sizeof(fh))
+		return -EILSEQ;
+
+	if (first_md_off)
+		*first_md_off = le64_to_cpu(fh.fh_first_md_offset);
+	if (uuid)
+		*uuid = fh.fh_uuid;
+	if (file_size)
+		*file_size = le64_to_cpu(fh.fh_file_size);
+	if (flags)
+		*flags = le32_to_cpu(fh.fh_flags);
+	return 0;
+}
+
+/*
+ * Read through metadata records from the backing file one by one
+ * and call provided metadata handlers.
+ */
+int incfs_read_next_metadata_record(struct backing_file_context *bfc,
+			      struct metadata_handler *handler)
+{
+	const ssize_t max_md_size = INCFS_MAX_METADATA_RECORD_SIZE;
+	ssize_t bytes_read = 0;
+	size_t md_record_size = 0;
+	loff_t next_record = 0;
+	int res = 0;
+	struct incfs_md_header *md_hdr = NULL;
+
+	if (!bfc || !handler)
+		return -EFAULT;
+
+	if (handler->md_record_offset == 0)
+		return -EPERM;
+
+	memset(&handler->md_buffer, 0, max_md_size);
+	bytes_read = incfs_kread(bfc, &handler->md_buffer, max_md_size,
+				 handler->md_record_offset);
+	if (bytes_read < 0)
+		return bytes_read;
+	if (bytes_read < sizeof(*md_hdr))
+		return -EBADMSG;
+
+	md_hdr = &handler->md_buffer.md_header;
+	next_record = le64_to_cpu(md_hdr->h_next_md_offset);
+	md_record_size = le16_to_cpu(md_hdr->h_record_size);
+
+	if (md_record_size > max_md_size) {
+		pr_warn("incfs: The record is too large. Size: %zu\n",
+				md_record_size);
+		return -EBADMSG;
+	}
+
+	if (bytes_read < md_record_size) {
+		pr_warn("incfs: The record hasn't been fully read.\n");
+		return -EBADMSG;
+	}
+
+	if (next_record <= handler->md_record_offset && next_record != 0) {
+		pr_warn("incfs: Next record (%lld) points back in file.\n",
+			next_record);
+		return -EBADMSG;
+	}
+
+	switch (md_hdr->h_md_entry_type) {
+	case INCFS_MD_NONE:
+		break;
+	case INCFS_MD_BLOCK_MAP:
+		if (handler->handle_blockmap)
+			res = handler->handle_blockmap(
+				&handler->md_buffer.blockmap, handler);
+		break;
+	case INCFS_MD_FILE_ATTR:
+		/*
+		 * File attrs no longer supported, ignore section for
+		 * compatibility
+		 */
+		break;
+	case INCFS_MD_SIGNATURE:
+		if (handler->handle_signature)
+			res = handler->handle_signature(
+				&handler->md_buffer.signature, handler);
+		break;
+	case INCFS_MD_STATUS:
+		if (handler->handle_status)
+			res = handler->handle_status(
+				&handler->md_buffer.status, handler);
+		break;
+	case INCFS_MD_VERITY_SIGNATURE:
+		if (handler->handle_verity_signature)
+			res = handler->handle_verity_signature(
+				&handler->md_buffer.verity_signature, handler);
+		break;
+	default:
+		res = -ENOTSUPP;
+		break;
+	}
+
+	if (!res) {
+		if (next_record == 0) {
+			/*
+			 * Zero offset for the next record means that the last
+			 * metadata record has just been processed.
+			 */
+			bfc->bc_last_md_record_offset =
+				handler->md_record_offset;
+		}
+		handler->md_prev_record_offset = handler->md_record_offset;
+		handler->md_record_offset = next_record;
+	}
+	return res;
+}
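+
+/*
+ * Illustrative sketch (not part of incfs): driving the iterator above.
+ * A reader seeds md_record_offset from the file header, then calls
+ * incfs_read_next_metadata_record() until the offset reaches 0 (end of
+ * the metadata list) or an error is returned.
+ */
+static int __maybe_unused example_walk_metadata(
+					struct backing_file_context *bfc)
+{
+	struct metadata_handler handler = {};
+	int err = incfs_read_file_header(bfc, &handler.md_record_offset,
+					 NULL, NULL, NULL);
+
+	if (err)
+		return err;
+
+	while (handler.md_record_offset > 0) {
+		err = incfs_read_next_metadata_record(bfc, &handler);
+		if (err)
+			return err;
+	}
+	return 0;
+}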
+
+ssize_t incfs_kread(struct backing_file_context *bfc, void *buf, size_t size,
+		    loff_t pos)
+{
+	const struct cred *old_cred = override_creds(bfc->bc_cred);
+	ssize_t ret = kernel_read(bfc->bc_file, buf, size, &pos);
+
+	revert_creds(old_cred);
+	return ret;
+}
+
+ssize_t incfs_kwrite(struct backing_file_context *bfc, const void *buf,
+		     size_t size, loff_t pos)
+{
+	const struct cred *old_cred = override_creds(bfc->bc_cred);
+	ssize_t ret = kernel_write(bfc->bc_file, buf, size, &pos);
+
+	revert_creds(old_cred);
+	return ret;
+}
diff --git a/fs/incfs/format.h b/fs/incfs/format.h
new file mode 100644
index 0000000..14e475b
--- /dev/null
+++ b/fs/incfs/format.h
@@ -0,0 +1,407 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2018 Google LLC
+ */
+
+/*
+ * Overview
+ * --------
+ * The backbone of the incremental-fs on-disk format is an append-only linked
+ * list of metadata blocks. Each metadata block contains an offset of the next
+ * one. These blocks describe files and directories on the
+ * file system. They also represent actions of adding and removing file names
+ * (hard links).
+ *
+ * Every time an incremental-fs instance is mounted, it reads through this list
+ * to recreate the filesystem's state in memory. An offset of the first record in
+ * the metadata list is stored in the superblock at the beginning of the backing
+ * file.
+ *
+ * Most of the backing file is taken by data areas and blockmaps.
+ * Since data blocks can be compressed and have different sizes,
+ * a single per-file data area can't be pre-allocated. That's why blockmaps are
+ * needed in order to find the location and size of each data block in
+ * the backing file. Each time a file is created, a corresponding block map is
+ * allocated to store future offsets of data blocks.
+ *
+ * Whenever a data block is given by a data loader to incremental-fs:
+ *   - A data area with the given block is appended to the end of
+ *     the backing file.
+ *   - A record in the blockmap for the given block index is updated to reflect
+ *     its location, size, and compression algorithm.
+ *
+ * Metadata records
+ * ----------------
+ * incfs_blockmap - metadata record that specifies size and location
+ *                           of a blockmap area for a given file. This area
+ *                           contains an array of incfs_blockmap_entry-s.
+ * incfs_file_signature - metadata record that specifies where file signature
+ *                           and its hash tree can be found in the backing file.
+ *
+ * incfs_file_attr - metadata record that specifies where an additional file
+ *		        attributes blob can be found.
+ *
+ * Metadata header
+ * ---------------
+ * incfs_md_header - header of a metadata record. It's always a part
+ *                   of other structures and serves the purpose of metadata
+ *                   bookkeeping.
+ *
+ *              +-----------------------------------------------+       ^
+ *              |            incfs_md_header                    |       |
+ *              | 1. type of body(BLOCKMAP, FILE_ATTR..)        |       |
+ *              | 2. size of the whole record header + body     |       |
+ *              | 3. CRC the whole record header + body         |       |
+ *              | 4. offset of the previous md record           |]------+
+ *              | 5. offset of the next md record (md link)     |]---+
+ *              +-----------------------------------------------+    |
+ *              |  Metadata record body with useful data        |    |
+ *              +-----------------------------------------------+    |
+ *                                                                   +--->
+ *
+ * Other ondisk structures
+ * -----------------------
+ * incfs_file_header - backing file header
+ * incfs_blockmap_entry - a record in a blockmap area that describes size
+ *                       and location of a data block.
+ * Data blocks don't have any particular structure; they are written to the
+ * backing file in a raw form as they come from a data loader.
+ *
+ * Backing file layout
+ * -------------------
+ *
+ *
+ *              +-------------------------------------------+
+ *              |            incfs_file_header              |]---+
+ *              +-------------------------------------------+    |
+ *              |                 metadata                  |<---+
+ *              |           incfs_file_signature            |]---+
+ *              +-------------------------------------------+    |
+ *                        .........................              |
+ *              +-------------------------------------------+    |   metadata
+ *     +------->|               blockmap area               |    |  list links
+ *     |        |          [incfs_blockmap_entry]           |    |
+ *     |        |          [incfs_blockmap_entry]           |    |
+ *     |        |          [incfs_blockmap_entry]           |    |
+ *     |    +--[|          [incfs_blockmap_entry]           |    |
+ *     |    |   |          [incfs_blockmap_entry]           |    |
+ *     |    |   |          [incfs_blockmap_entry]           |    |
+ *     |    |   +-------------------------------------------+    |
+ *     |    |             .........................              |
+ *     |    |   +-------------------------------------------+    |
+ *     |    |   |                 metadata                  |<---+
+ *     +----|--[|               incfs_blockmap              |]---+
+ *          |   +-------------------------------------------+    |
+ *          |             .........................              |
+ *          |   +-------------------------------------------+    |
+ *          +-->|                 data block                |    |
+ *              +-------------------------------------------+    |
+ *                        .........................              |
+ *              +-------------------------------------------+    |
+ *              |                 metadata                  |<---+
+ *              |              incfs_file_attr              |
+ *              +-------------------------------------------+
+ */
+#ifndef _INCFS_FORMAT_H
+#define _INCFS_FORMAT_H
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <uapi/linux/incrementalfs.h>
+
+#include "internal.h"
+
+#define INCFS_MAX_NAME_LEN 255
+#define INCFS_FORMAT_V1 1
+#define INCFS_FORMAT_CURRENT_VER INCFS_FORMAT_V1
+
+enum incfs_metadata_type {
+	INCFS_MD_NONE = 0,
+	INCFS_MD_BLOCK_MAP = 1,
+	INCFS_MD_FILE_ATTR = 2,
+	INCFS_MD_SIGNATURE = 3,
+	INCFS_MD_STATUS = 4,
+	INCFS_MD_VERITY_SIGNATURE = 5,
+};
+
+enum incfs_file_header_flags {
+	INCFS_FILE_MAPPED = 1 << 1,
+};
+
+/* Header included at the beginning of all metadata records on the disk. */
+struct incfs_md_header {
+	__u8 h_md_entry_type;
+
+	/*
+	 * Size of the whole metadata record
+	 * (e.g. inode, dir entry, etc.), not just this struct.
+	 */
+	__le16 h_record_size;
+
+	/*
+	 * Was: CRC32 of the whole metadata record
+	 * (e.g. inode, dir entry, etc.), not just this struct.
+	 */
+	__le32 h_unused1;
+
+	/* Offset of the next metadata entry if any */
+	__le64 h_next_md_offset;
+
+	/* Was: Offset of the previous metadata entry if any */
+	__le64 h_unused2;
+
+} __packed;
+
+/* Backing file header */
+struct incfs_file_header {
+	/* Magic number: INCFS_MAGIC_NUMBER */
+	__le64 fh_magic;
+
+	/* Format version: INCFS_FORMAT_CURRENT_VER */
+	__le64 fh_version;
+
+	/* sizeof(incfs_file_header) */
+	__le16 fh_header_size;
+
+	/* INCFS_DATA_FILE_BLOCK_SIZE */
+	__le16 fh_data_block_size;
+
+	/* File flags, from incfs_file_header_flags */
+	__le32 fh_flags;
+
+	union {
+		/* Standard incfs file */
+		struct {
+			/* Offset of the first metadata record */
+			__le64 fh_first_md_offset;
+
+			/* Full size of the file's content */
+			__le64 fh_file_size;
+
+			/* File uuid */
+			incfs_uuid_t fh_uuid;
+		};
+
+		/* Mapped file - INCFS_FILE_MAPPED set in fh_flags */
+		struct {
+			/* Offset in original file */
+			__le64 fh_original_offset;
+
+			/* Full size of the file's content */
+			__le64 fh_mapped_file_size;
+
+			/* Original file's uuid */
+			incfs_uuid_t fh_original_uuid;
+		};
+	};
+} __packed;
+
+enum incfs_block_map_entry_flags {
+	INCFS_BLOCK_COMPRESSED_LZ4 = 1,
+	INCFS_BLOCK_COMPRESSED_ZSTD = 2,
+
+	/* Reserve 3 bits for compression alg */
+	INCFS_BLOCK_COMPRESSED_MASK = 7,
+};
+
+/* Block map entry pointing to an actual location of the data block. */
+struct incfs_blockmap_entry {
+	/* Offset of the actual data block. Lower 32 bits */
+	__le32 me_data_offset_lo;
+
+	/* Offset of the actual data block. Higher 16 bits */
+	__le16 me_data_offset_hi;
+
+	/* How many bytes the data actually occupies in the backing file */
+	__le16 me_data_size;
+
+	/* Block flags from incfs_block_map_entry_flags */
+	__le16 me_flags;
+} __packed;
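+
+/*
+ * Illustrative helper (not part of incfs): how readers reassemble the
+ * 48-bit data offset that incfs_blockmap_entry splits into lo/hi parts.
+ */
+static inline loff_t example_bm_entry_data_offset(
+				const struct incfs_blockmap_entry *e)
+{
+	return (loff_t)le32_to_cpu(e->me_data_offset_lo) |
+	       ((loff_t)le16_to_cpu(e->me_data_offset_hi) << 32);
+}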
+
+/* Metadata record for locations of file blocks. Type = INCFS_MD_BLOCK_MAP */
+struct incfs_blockmap {
+	struct incfs_md_header m_header;
+
+	/* Base offset of the array of incfs_blockmap_entry */
+	__le64 m_base_offset;
+
+	/* Size of the map entry array in blocks */
+	__le32 m_block_count;
+} __packed;
+
+/*
+ * Metadata record for file signature. Type = INCFS_MD_SIGNATURE
+ *
+ * The signature stored here is the APK V4 signature data blob. See the
+ * definition of incfs_new_file_args::signature_info for an explanation of this
+ * blob. Specifically, it contains the root hash, but it does *not* contain
+ * anything that the kernel treats as a signature.
+ *
+ * When FS_IOC_ENABLE_VERITY is called on a file without this record, an APK V4
+ * signature blob and a hash tree are added to the file, and then this metadata
+ * record is created to record their locations.
+ */
+struct incfs_file_signature {
+	struct incfs_md_header sg_header;
+
+	__le32 sg_sig_size; /* The size of the signature. */
+
+	__le64 sg_sig_offset; /* Signature's offset in the backing file */
+
+	__le32 sg_hash_tree_size; /* The size of the hash tree. */
+
+	__le64 sg_hash_tree_offset; /* Hash tree offset in the backing file */
+} __packed;
+
+/* In memory version of above */
+struct incfs_df_signature {
+	u32 sig_size;
+	u64 sig_offset;
+	u32 hash_size;
+	u64 hash_offset;
+};
+
+struct incfs_status {
+	struct incfs_md_header is_header;
+
+	__le32 is_data_blocks_written; /* Number of data blocks written */
+
+	__le32 is_hash_blocks_written; /* Number of hash blocks written */
+
+	__le32 is_dummy[6]; /* Spare fields */
+} __packed;
+
+/*
+ * Metadata record for verity signature. Type = INCFS_MD_VERITY_SIGNATURE
+ *
+ * This record will only exist for verity-enabled files with signatures. Verity
+ * enabled files without signatures do not have this record. This signature is
+ * checked by fs-verity identically to any other fs-verity signature.
+ */
+struct incfs_file_verity_signature {
+	struct incfs_md_header vs_header;
+
+	 /* The size of the signature */
+	__le32 vs_size;
+
+	 /* Signature's offset in the backing file */
+	__le64 vs_offset;
+} __packed;
+
+/* In memory version of above */
+struct incfs_df_verity_signature {
+	u32 size;
+	u64 offset;
+};
+
+/* State of the backing file. */
+struct backing_file_context {
+	/* Protects writes to bc_file */
+	struct mutex bc_mutex;
+
+	/* File object to read data from */
+	struct file *bc_file;
+
+	/*
+	 * Offset of the last known metadata record in the backing file.
+	 * 0 means there are no metadata records.
+	 */
+	loff_t bc_last_md_record_offset;
+
+	/*
+	 * Credentials to set before reads/writes
+	 * Note that this is a pointer to the mount_info mi_owner field so
+	 * there is no need to get/put the creds
+	 */
+	const struct cred *bc_cred;
+};
+
+struct metadata_handler {
+	loff_t md_record_offset;
+	loff_t md_prev_record_offset;
+	void *context;
+
+	union {
+		struct incfs_md_header md_header;
+		struct incfs_blockmap blockmap;
+		struct incfs_file_signature signature;
+		struct incfs_status status;
+		struct incfs_file_verity_signature verity_signature;
+	} md_buffer;
+
+	int (*handle_blockmap)(struct incfs_blockmap *bm,
+			       struct metadata_handler *handler);
+	int (*handle_signature)(struct incfs_file_signature *sig,
+				 struct metadata_handler *handler);
+	int (*handle_status)(struct incfs_status *sig,
+				 struct metadata_handler *handler);
+	int (*handle_verity_signature)(struct incfs_file_verity_signature *s,
+					struct metadata_handler *handler);
+};
+#define INCFS_MAX_METADATA_RECORD_SIZE \
+	sizeof_field(struct metadata_handler, md_buffer)
+
+/* Backing file context management */
+struct mount_info;
+struct backing_file_context *incfs_alloc_bfc(struct mount_info *mi,
+					     struct file *backing_file);
+
+void incfs_free_bfc(struct backing_file_context *bfc);
+
+/* Writing stuff */
+int incfs_write_blockmap_to_backing_file(struct backing_file_context *bfc,
+					 u32 block_count);
+
+int incfs_write_fh_to_backing_file(struct backing_file_context *bfc,
+				   incfs_uuid_t *uuid, u64 file_size);
+
+int incfs_write_mapping_fh_to_backing_file(struct backing_file_context *bfc,
+				incfs_uuid_t *uuid, u64 file_size, u64 offset);
+
+int incfs_write_data_block_to_backing_file(struct backing_file_context *bfc,
+					   struct mem_range block,
+					   int block_index, loff_t bm_base_off,
+					   u16 flags);
+
+int incfs_write_hash_block_to_backing_file(struct backing_file_context *bfc,
+					   struct mem_range block,
+					   int block_index,
+					   loff_t hash_area_off,
+					   loff_t bm_base_off,
+					   loff_t file_size);
+
+int incfs_write_signature_to_backing_file(struct backing_file_context *bfc,
+				struct mem_range sig, u32 tree_size,
+				loff_t *tree_offset, loff_t *sig_offset);
+
+int incfs_write_status_to_backing_file(struct backing_file_context *bfc,
+				       loff_t status_offset,
+				       u32 data_blocks_written,
+				       u32 hash_blocks_written);
+int incfs_write_verity_signature_to_backing_file(
+		struct backing_file_context *bfc, struct mem_range signature,
+		loff_t *offset);
+
+/* Reading stuff */
+int incfs_read_file_header(struct backing_file_context *bfc,
+			   loff_t *first_md_off, incfs_uuid_t *uuid,
+			   u64 *file_size, u32 *flags);
+
+int incfs_read_blockmap_entry(struct backing_file_context *bfc, int block_index,
+			      loff_t bm_base_off,
+			      struct incfs_blockmap_entry *bm_entry);
+
+int incfs_read_blockmap_entries(struct backing_file_context *bfc,
+		struct incfs_blockmap_entry *entries,
+		int start_index, int blocks_number,
+		loff_t bm_base_off);
+
+int incfs_read_next_metadata_record(struct backing_file_context *bfc,
+				    struct metadata_handler *handler);
+
+ssize_t incfs_kread(struct backing_file_context *bfc, void *buf, size_t size,
+		    loff_t pos);
+ssize_t incfs_kwrite(struct backing_file_context *bfc, const void *buf,
+		     size_t size, loff_t pos);
+
+#endif /* _INCFS_FORMAT_H */
diff --git a/fs/incfs/integrity.c b/fs/incfs/integrity.c
new file mode 100644
index 0000000..93e7643
--- /dev/null
+++ b/fs/incfs/integrity.c
@@ -0,0 +1,235 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2019 Google LLC
+ */
+#include <crypto/sha2.h>
+#include <crypto/hash.h>
+#include <linux/err.h>
+#include <linux/version.h>
+
+#include "integrity.h"
+
+struct incfs_hash_alg *incfs_get_hash_alg(enum incfs_hash_tree_algorithm id)
+{
+	static struct incfs_hash_alg sha256 = {
+		.name = "sha256",
+		.digest_size = SHA256_DIGEST_SIZE,
+		.id = INCFS_HASH_TREE_SHA256
+	};
+	struct incfs_hash_alg *result = NULL;
+	struct crypto_shash *shash;
+
+	if (id == INCFS_HASH_TREE_SHA256) {
+		BUILD_BUG_ON(INCFS_MAX_HASH_SIZE < SHA256_DIGEST_SIZE);
+		result = &sha256;
+	}
+
+	if (result == NULL)
+		return ERR_PTR(-ENOENT);
+
+	/* pairs with cmpxchg_release() below */
+	shash = smp_load_acquire(&result->shash);
+	if (shash)
+		return result;
+
+	shash = crypto_alloc_shash(result->name, 0, 0);
+	if (IS_ERR(shash)) {
+		int err = PTR_ERR(shash);
+
+		pr_err("incfs: Can't allocate hash alg %s, error code:%d\n",
+			result->name, err);
+		return ERR_PTR(err);
+	}
+
+	/* pairs with smp_load_acquire() above */
+	if (cmpxchg_release(&result->shash, NULL, shash) != NULL)
+		crypto_free_shash(shash);
+
+	return result;
+}
+
+struct signature_info {
+	u32 version;
+	enum incfs_hash_tree_algorithm hash_algorithm;
+	u8 log2_blocksize;
+	struct mem_range salt;
+	struct mem_range root_hash;
+};
+
+static bool read_u32(u8 **p, u8 *top, u32 *result)
+{
+	if (*p + sizeof(u32) > top)
+		return false;
+
+	*result = le32_to_cpu(*(__le32 *)*p);
+	*p += sizeof(u32);
+	return true;
+}
+
+static bool read_u8(u8 **p, u8 *top, u8 *result)
+{
+	if (*p + sizeof(u8) > top)
+		return false;
+
+	*result = *(u8 *)*p;
+	*p += sizeof(u8);
+	return true;
+}
+
+static bool read_mem_range(u8 **p, u8 *top, struct mem_range *range)
+{
+	u32 len;
+
+	if (!read_u32(p, top, &len) || *p + len > top)
+		return false;
+
+	range->len = len;
+	range->data = *p;
+	*p += len;
+	return true;
+}
+
+static int incfs_parse_signature(struct mem_range signature,
+				 struct signature_info *si)
+{
+	u8 *p = signature.data;
+	u8 *top = signature.data + signature.len;
+	u32 hash_section_size;
+
+	if (signature.len > INCFS_MAX_SIGNATURE_SIZE)
+		return -EINVAL;
+
+	if (!read_u32(&p, top, &si->version) ||
+	    si->version != INCFS_SIGNATURE_VERSION)
+		return -EINVAL;
+
+	if (!read_u32(&p, top, &hash_section_size) ||
+	    p + hash_section_size > top)
+		return -EINVAL;
+	top = p + hash_section_size;
+
+	if (!read_u32(&p, top, &si->hash_algorithm) ||
+	    si->hash_algorithm != INCFS_HASH_TREE_SHA256)
+		return -EINVAL;
+
+	if (!read_u8(&p, top, &si->log2_blocksize) || si->log2_blocksize != 12)
+		return -EINVAL;
+
+	if (!read_mem_range(&p, top, &si->salt))
+		return -EINVAL;
+
+	if (!read_mem_range(&p, top, &si->root_hash))
+		return -EINVAL;
+
+	if (p != top)
+		return -EINVAL;
+
+	return 0;
+}
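+
+/*
+ * Illustrative layout of the blob accepted by incfs_parse_signature(),
+ * reconstructed from the reads above (all integers little-endian):
+ *
+ *	u32 version;            must equal INCFS_SIGNATURE_VERSION
+ *	u32 hash_section_size;  bounds all of the fields below
+ *	u32 hash_algorithm;     must be INCFS_HASH_TREE_SHA256
+ *	u8  log2_blocksize;     must be 12 (4096-byte blocks)
+ *	u32 salt_len;           followed by salt_len salt bytes
+ *	u32 root_hash_len;      followed by root_hash_len hash bytes
+ */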
+
+struct mtree *incfs_alloc_mtree(struct mem_range signature,
+				int data_block_count)
+{
+	int error;
+	struct signature_info si;
+	struct mtree *result = NULL;
+	struct incfs_hash_alg *hash_alg = NULL;
+	int hash_per_block;
+	int lvl;
+	int total_blocks = 0;
+	int blocks_in_level[INCFS_MAX_MTREE_LEVELS];
+	int blocks = data_block_count;
+
+	if (data_block_count <= 0)
+		return ERR_PTR(-EINVAL);
+
+	error = incfs_parse_signature(signature, &si);
+	if (error)
+		return ERR_PTR(error);
+
+	hash_alg = incfs_get_hash_alg(si.hash_algorithm);
+	if (IS_ERR(hash_alg))
+		return ERR_PTR(PTR_ERR(hash_alg));
+
+	if (si.root_hash.len < hash_alg->digest_size)
+		return ERR_PTR(-EINVAL);
+
+	result = kzalloc(sizeof(*result), GFP_NOFS);
+	if (!result)
+		return ERR_PTR(-ENOMEM);
+
+	result->alg = hash_alg;
+	hash_per_block = INCFS_DATA_FILE_BLOCK_SIZE / result->alg->digest_size;
+
+	/* Calculating tree geometry. */
+	/* First pass: calculate how many blocks in each tree level. */
+	for (lvl = 0; blocks > 1; lvl++) {
+		if (lvl >= INCFS_MAX_MTREE_LEVELS) {
+			pr_err("incfs: too much data in mtree\n");
+			goto err;
+		}
+
+		blocks = (blocks + hash_per_block - 1) / hash_per_block;
+		blocks_in_level[lvl] = blocks;
+		total_blocks += blocks;
+	}
+	result->depth = lvl;
+	result->hash_tree_area_size = total_blocks * INCFS_DATA_FILE_BLOCK_SIZE;
+	if (result->hash_tree_area_size > INCFS_MAX_HASH_AREA_SIZE)
+		goto err;
+
+	blocks = 0;
+	/* Second pass: calculate offset of each level. 0th level goes last. */
+	for (lvl = 0; lvl < result->depth; lvl++) {
+		u32 suboffset;
+
+		blocks += blocks_in_level[lvl];
+		suboffset = (total_blocks - blocks)
+					* INCFS_DATA_FILE_BLOCK_SIZE;
+
+		result->hash_level_suboffset[lvl] = suboffset;
+	}
+
+	/* Root hash is stored separately from the rest of the tree. */
+	memcpy(result->root_hash, si.root_hash.data, hash_alg->digest_size);
+	return result;
+
+err:
+	kfree(result);
+	return ERR_PTR(-E2BIG);
+}
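+
+/*
+ * Worked example (illustrative numbers): with 4096-byte blocks and
+ * 32-byte SHA-256 digests, hash_per_block = 4096 / 32 = 128. For
+ * data_block_count = 1000, the first pass gives blocks_in_level =
+ * { 8, 1 } (8 = DIV_ROUND_UP(1000, 128)), so depth = 2 and the hash
+ * area spans 9 blocks. The second pass stores the top level first:
+ * hash_level_suboffset[1] = 0 and hash_level_suboffset[0] = 4096.
+ */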
+
+void incfs_free_mtree(struct mtree *tree)
+{
+	kfree(tree);
+}
+
+int incfs_calc_digest(struct incfs_hash_alg *alg, struct mem_range data,
+			struct mem_range digest)
+{
+	SHASH_DESC_ON_STACK(desc, alg->shash);
+
+	if (!alg || !alg->shash || !data.data || !digest.data)
+		return -EFAULT;
+
+	if (alg->digest_size > digest.len)
+		return -EINVAL;
+
+	desc->tfm = alg->shash;
+
+	if (data.len < INCFS_DATA_FILE_BLOCK_SIZE) {
+		int err;
+		void *buf = kzalloc(INCFS_DATA_FILE_BLOCK_SIZE, GFP_NOFS);
+
+		if (!buf)
+			return -ENOMEM;
+
+		memcpy(buf, data.data, data.len);
+		err = crypto_shash_digest(desc, buf, INCFS_DATA_FILE_BLOCK_SIZE,
+					  digest.data);
+		kfree(buf);
+		return err;
+	}
+	return crypto_shash_digest(desc, data.data, data.len, digest.data);
+}
diff --git a/fs/incfs/integrity.h b/fs/incfs/integrity.h
new file mode 100644
index 0000000..cf79b64
--- /dev/null
+++ b/fs/incfs/integrity.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2019 Google LLC
+ */
+#ifndef _INCFS_INTEGRITY_H
+#define _INCFS_INTEGRITY_H
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <crypto/hash.h>
+
+#include <uapi/linux/incrementalfs.h>
+
+#include "internal.h"
+
+#define INCFS_MAX_MTREE_LEVELS 8
+#define INCFS_MAX_HASH_AREA_SIZE (1280 * 1024 * 1024)
+
+struct incfs_hash_alg {
+	const char *name;
+	int digest_size;
+	enum incfs_hash_tree_algorithm id;
+
+	struct crypto_shash *shash;
+};
+
+/* Merkle tree structure. */
+struct mtree {
+	struct incfs_hash_alg *alg;
+
+	u8 root_hash[INCFS_MAX_HASH_SIZE];
+
+	/* Offset of each hash level in the hash area. */
+	u32 hash_level_suboffset[INCFS_MAX_MTREE_LEVELS];
+
+	u32 hash_tree_area_size;
+
+	/* Number of levels in hash_level_suboffset */
+	int depth;
+};
+
+struct incfs_hash_alg *incfs_get_hash_alg(enum incfs_hash_tree_algorithm id);
+
+struct mtree *incfs_alloc_mtree(struct mem_range signature,
+				int data_block_count);
+
+void incfs_free_mtree(struct mtree *tree);
+
+size_t incfs_get_mtree_depth(enum incfs_hash_tree_algorithm alg, loff_t size);
+
+size_t incfs_get_mtree_hash_count(enum incfs_hash_tree_algorithm alg,
+					loff_t size);
+
+int incfs_calc_digest(struct incfs_hash_alg *alg, struct mem_range data,
+			struct mem_range digest);
+
+#endif /* _INCFS_INTEGRITY_H */
diff --git a/fs/incfs/internal.h b/fs/incfs/internal.h
new file mode 100644
index 0000000..c2df8bf
--- /dev/null
+++ b/fs/incfs/internal.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2018 Google LLC
+ */
+#ifndef _INCFS_INTERNAL_H
+#define _INCFS_INTERNAL_H
+#include <linux/types.h>
+
+struct mem_range {
+	u8 *data;
+	size_t len;
+};
+
+static inline struct mem_range range(u8 *data, size_t len)
+{
+	return (struct mem_range){ .data = data, .len = len };
+}
+
+#define LOCK_REQUIRED(lock)  WARN_ON_ONCE(!mutex_is_locked(&lock))
+
+#define EFSCORRUPTED EUCLEAN
+
+#endif /* _INCFS_INTERNAL_H */
diff --git a/fs/incfs/main.c b/fs/incfs/main.c
new file mode 100644
index 0000000..23347ac
--- /dev/null
+++ b/fs/incfs/main.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018 Google LLC
+ */
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/module.h>
+
+#include <uapi/linux/incrementalfs.h>
+
+#include "sysfs.h"
+#include "vfs.h"
+
+static struct file_system_type incfs_fs_type = {
+	.owner = THIS_MODULE,
+	.name = INCFS_NAME,
+	.mount = incfs_mount_fs,
+	.kill_sb = incfs_kill_sb,
+	.fs_flags = 0
+};
+
+static int __init init_incfs_module(void)
+{
+	int err = 0;
+
+	err = incfs_init_sysfs();
+	if (err)
+		return err;
+
+	err = register_filesystem(&incfs_fs_type);
+	if (err)
+		incfs_cleanup_sysfs();
+
+	return err;
+}
+
+static void __exit cleanup_incfs_module(void)
+{
+	incfs_cleanup_sysfs();
+	unregister_filesystem(&incfs_fs_type);
+}
+
+module_init(init_incfs_module);
+module_exit(cleanup_incfs_module);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Eugene Zemtsov <ezemtsov@google.com>");
+MODULE_DESCRIPTION("Incremental File System");
diff --git a/fs/incfs/pseudo_files.c b/fs/incfs/pseudo_files.c
new file mode 100644
index 0000000..57c3353
--- /dev/null
+++ b/fs/incfs/pseudo_files.c
@@ -0,0 +1,1393 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Google LLC
+ */
+
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/fsnotify.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/syscalls.h>
+#include <linux/fdtable.h>
+
+#include <uapi/linux/incrementalfs.h>
+
+#include "pseudo_files.h"
+
+#include "data_mgmt.h"
+#include "format.h"
+#include "integrity.h"
+#include "vfs.h"
+
+#define READ_WRITE_FILE_MODE 0666
+
+static bool is_pseudo_filename(struct mem_range name);
+
+/*******************************************************************************
+ * .pending_reads pseudo file definition
+ ******************************************************************************/
+#define INCFS_PENDING_READS_INODE 2
+static const char pending_reads_file_name[] = INCFS_PENDING_READS_FILENAME;
+
+/* State of an open .pending_reads file, unique for each file descriptor. */
+struct pending_reads_state {
+	/* A serial number of the last pending read obtained from this file. */
+	int last_pending_read_sn;
+};
+
+static ssize_t pending_reads_read(struct file *f, char __user *buf, size_t len,
+			    loff_t *ppos)
+{
+	struct pending_reads_state *pr_state = f->private_data;
+	struct mount_info *mi = get_mount_info(file_superblock(f));
+	bool report_uid;
+	unsigned long page = 0;
+	struct incfs_pending_read_info *reads_buf = NULL;
+	struct incfs_pending_read_info2 *reads_buf2 = NULL;
+	size_t record_size;
+	size_t reads_to_collect;
+	int last_known_read_sn = READ_ONCE(pr_state->last_pending_read_sn);
+	int new_max_sn = last_known_read_sn;
+	int reads_collected = 0;
+	ssize_t result = 0;
+
+	if (!mi)
+		return -EFAULT;
+
+	report_uid = mi->mi_options.report_uid;
+	record_size = report_uid ? sizeof(*reads_buf2) : sizeof(*reads_buf);
+	reads_to_collect = len / record_size;
+
+	if (!incfs_fresh_pending_reads_exist(mi, last_known_read_sn))
+		return 0;
+
+	page = get_zeroed_page(GFP_NOFS);
+	if (!page)
+		return -ENOMEM;
+
+	if (report_uid)
+		reads_buf2 = (struct incfs_pending_read_info2 *) page;
+	else
+		reads_buf = (struct incfs_pending_read_info *) page;
+
+	reads_to_collect =
+		min_t(size_t, PAGE_SIZE / record_size, reads_to_collect);
+
+	reads_collected = incfs_collect_pending_reads(mi, last_known_read_sn,
+				reads_buf, reads_buf2, reads_to_collect,
+				&new_max_sn);
+
+	if (reads_collected < 0) {
+		result = reads_collected;
+		goto out;
+	}
+
+	/*
+	 * Just to make sure that we don't accidentally copy more data
+	 * to the reads buffer than userspace can handle.
+	 */
+	reads_collected = min_t(size_t, reads_collected, reads_to_collect);
+	result = reads_collected * record_size;
+
+	/* Copy reads info to the userspace buffer */
+	if (copy_to_user(buf, (void *)page, result)) {
+		result = -EFAULT;
+		goto out;
+	}
+
+	WRITE_ONCE(pr_state->last_pending_read_sn, new_max_sn);
+	*ppos = 0;
+
+out:
+	free_page(page);
+	return result;
+}
+
+static __poll_t pending_reads_poll(struct file *file, poll_table *wait)
+{
+	struct pending_reads_state *state = file->private_data;
+	struct mount_info *mi = get_mount_info(file_superblock(file));
+	__poll_t ret = 0;
+
+	poll_wait(file, &mi->mi_pending_reads_notif_wq, wait);
+	if (incfs_fresh_pending_reads_exist(mi,
+					    state->last_pending_read_sn))
+		ret = EPOLLIN | EPOLLRDNORM;
+
+	return ret;
+}
+
+static int pending_reads_open(struct inode *inode, struct file *file)
+{
+	struct pending_reads_state *state = NULL;
+
+	state = kzalloc(sizeof(*state), GFP_NOFS);
+	if (!state)
+		return -ENOMEM;
+
+	file->private_data = state;
+	return 0;
+}
+
+static int pending_reads_release(struct inode *inode, struct file *file)
+{
+	kfree(file->private_data);
+	return 0;
+}
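+
+/*
+ * Illustrative userspace usage (assumed paths and buffer sizes): a fill
+ * daemon typically polls the .pending_reads pseudo file and reads whole
+ * records from it:
+ *
+ *	int fd = open("/mnt/incfs/.pending_reads", O_RDONLY | O_CLOEXEC);
+ *	struct pollfd pfd = { .fd = fd, .events = POLLIN };
+ *	struct incfs_pending_read_info reads[16];
+ *
+ *	while (poll(&pfd, 1, -1) > 0) {
+ *		ssize_t n = read(fd, reads, sizeof(reads));
+ *		// n / sizeof(reads[0]) records were collected
+ *	}
+ *
+ * On report_uid mounts each record is an incfs_pending_read_info2
+ * instead, matching record_size in pending_reads_read() above.
+ */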
+
+static long ioctl_permit_fill(struct file *f, void __user *arg)
+{
+	struct incfs_permit_fill __user *usr_permit_fill = arg;
+	struct incfs_permit_fill permit_fill;
+	long error = 0;
+	struct file *file = NULL;
+	struct incfs_file_data *fd;
+
+	if (copy_from_user(&permit_fill, usr_permit_fill, sizeof(permit_fill)))
+		return -EFAULT;
+
+	file = fget(permit_fill.file_descriptor);
+	if (IS_ERR_OR_NULL(file)) {
+		if (!file)
+			return -ENOENT;
+
+		return PTR_ERR(file);
+	}
+
+	if (file->f_op != &incfs_file_ops) {
+		error = -EPERM;
+		goto out;
+	}
+
+	if (file->f_inode->i_sb != f->f_inode->i_sb) {
+		error = -EPERM;
+		goto out;
+	}
+
+	fd = file->private_data;
+
+	switch (fd->fd_fill_permission) {
+	case CANT_FILL:
+		fd->fd_fill_permission = CAN_FILL;
+		break;
+
+	case CAN_FILL:
+		pr_debug("CAN_FILL already set\n");
+		break;
+
+	default:
+		pr_warn("Invalid file private data\n");
+		error = -EFAULT;
+		goto out;
+	}
+
+out:
+	fput(file);
+	return error;
+}
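+
+/*
+ * Illustrative userspace usage (assumed): fill permission is granted via
+ * an ioctl on the .pending_reads file descriptor, passing the fd of the
+ * file to be filled. INCFS_IOC_PERMIT_FILL comes from
+ * uapi/linux/incrementalfs.h:
+ *
+ *	struct incfs_permit_fill pf = { .file_descriptor = target_fd };
+ *
+ *	if (ioctl(pending_reads_fd, INCFS_IOC_PERMIT_FILL, &pf) == -1)
+ *		perror("INCFS_IOC_PERMIT_FILL");
+ *
+ * The target must be an incfs file on the same superblock, per the
+ * checks above.
+ */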
+
+static int chmod(struct dentry *dentry, umode_t mode)
+{
+	struct inode *inode = dentry->d_inode;
+	struct inode *delegated_inode = NULL;
+	struct iattr newattrs;
+	int error;
+
+retry_deleg:
+	inode_lock(inode);
+	newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
+	newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
+	error = notify_change(&init_user_ns, dentry, &newattrs, &delegated_inode);
+	inode_unlock(inode);
+	if (delegated_inode) {
+		error = break_deleg_wait(&delegated_inode);
+		if (!error)
+			goto retry_deleg;
+	}
+	return error;
+}
+
+static bool incfs_equal_ranges(struct mem_range lhs, struct mem_range rhs)
+{
+	if (lhs.len != rhs.len)
+		return false;
+	return memcmp(lhs.data, rhs.data, lhs.len) == 0;
+}
+
+static int validate_name(char *file_name)
+{
+	struct mem_range name = range(file_name, strlen(file_name));
+	int i = 0;
+
+	if (name.len > INCFS_MAX_NAME_LEN)
+		return -ENAMETOOLONG;
+
+	if (is_pseudo_filename(name))
+		return -EINVAL;
+
+	for (i = 0; i < name.len; i++)
+		if (name.data[i] == '/')
+			return -EINVAL;
+
+	return 0;
+}
+
+static int dir_relative_path_resolve(
+			struct mount_info *mi,
+			const char __user *relative_path,
+			struct path *result_path,
+			struct path *base_path)
+{
+	int dir_fd = get_unused_fd_flags(0);
+	struct file *dir_f = NULL;
+	int error = 0;
+
+	if (!base_path)
+		base_path = &mi->mi_backing_dir_path;
+
+	if (dir_fd < 0)
+		return dir_fd;
+
+	dir_f = dentry_open(base_path, O_RDONLY | O_NOATIME, current_cred());
+
+	if (IS_ERR(dir_f)) {
+		error = PTR_ERR(dir_f);
+		goto out;
+	}
+	fd_install(dir_fd, dir_f);
+
+	if (!relative_path) {
+		/* No relative path given, just return the base dir. */
+		*result_path = *base_path;
+		path_get(result_path);
+		goto out;
+	}
+
+	error = user_path_at_empty(dir_fd, relative_path,
+		LOOKUP_FOLLOW | LOOKUP_DIRECTORY, result_path, NULL);
+
+out:
+	close_fd(dir_fd);
+	if (error)
+		pr_debug("Error: %d\n", error);
+	return error;
+}
+
+static struct mem_range incfs_copy_signature_info_from_user(u8 __user *original,
+							    u64 size)
+{
+	u8 *result;
+
+	if (!original)
+		return range(NULL, 0);
+
+	if (size > INCFS_MAX_SIGNATURE_SIZE)
+		return range(ERR_PTR(-EFAULT), 0);
+
+	result = kzalloc(size, GFP_NOFS | __GFP_COMP);
+	if (!result)
+		return range(ERR_PTR(-ENOMEM), 0);
+
+	if (copy_from_user(result, original, size)) {
+		kfree(result);
+		return range(ERR_PTR(-EFAULT), 0);
+	}
+
+	return range(result, size);
+}
+
+static int init_new_file(struct mount_info *mi, struct dentry *dentry,
+			 incfs_uuid_t *uuid, u64 size, struct mem_range attr,
+			 u8 __user *user_signature_info, u64 signature_size)
+{
+	struct path path = {};
+	struct file *new_file;
+	int error = 0;
+	struct backing_file_context *bfc = NULL;
+	u32 block_count;
+	struct mem_range raw_signature = { NULL };
+	struct mtree *hash_tree = NULL;
+
+	if (!mi || !dentry || !uuid)
+		return -EFAULT;
+
+	/* Open the newly created file to initialize its backing structures. */
+	path = (struct path) {
+		.mnt = mi->mi_backing_dir_path.mnt,
+		.dentry = dentry
+	};
+
+	new_file = dentry_open(&path, O_RDWR | O_NOATIME | O_LARGEFILE,
+			       current_cred());
+
+	if (IS_ERR(new_file)) {
+		error = PTR_ERR(new_file);
+		goto out;
+	}
+
+	bfc = incfs_alloc_bfc(mi, new_file);
+	fput(new_file);
+	if (IS_ERR(bfc)) {
+		error = PTR_ERR(bfc);
+		bfc = NULL;
+		goto out;
+	}
+
+	mutex_lock(&bfc->bc_mutex);
+	error = incfs_write_fh_to_backing_file(bfc, uuid, size);
+	if (error)
+		goto out;
+
+	block_count = (u32)get_blocks_count_for_size(size);
+
+	if (user_signature_info) {
+		raw_signature = incfs_copy_signature_info_from_user(
+			user_signature_info, signature_size);
+
+		if (IS_ERR(raw_signature.data)) {
+			error = PTR_ERR(raw_signature.data);
+			raw_signature.data = NULL;
+			goto out;
+		}
+
+		hash_tree = incfs_alloc_mtree(raw_signature, block_count);
+		if (IS_ERR(hash_tree)) {
+			error = PTR_ERR(hash_tree);
+			hash_tree = NULL;
+			goto out;
+		}
+
+		error = incfs_write_signature_to_backing_file(bfc,
+				raw_signature, hash_tree->hash_tree_area_size,
+				NULL, NULL);
+		if (error)
+			goto out;
+
+		block_count += get_blocks_count_for_size(
+			hash_tree->hash_tree_area_size);
+	}
+
+	if (block_count)
+		error = incfs_write_blockmap_to_backing_file(bfc, block_count);
+
+	if (error)
+		goto out;
+
+out:
+	if (bfc) {
+		mutex_unlock(&bfc->bc_mutex);
+		incfs_free_bfc(bfc);
+	}
+	incfs_free_mtree(hash_tree);
+	kfree(raw_signature.data);
+
+	if (error)
+		pr_debug("incfs: %s error: %d\n", __func__, error);
+	return error;
+}
+
+static void notify_create(struct file *pending_reads_file,
+			  const char  __user *dir_name, const char *file_name,
+			  const char *file_id_str, bool incomplete_file)
+{
+	struct mount_info *mi =
+		get_mount_info(file_superblock(pending_reads_file));
+	struct path base_path = {
+		.mnt = pending_reads_file->f_path.mnt,
+		.dentry = pending_reads_file->f_path.dentry->d_parent,
+	};
+	struct path dir_path = {};
+	struct dentry *file = NULL;
+	struct dentry *dir = NULL;
+	int error;
+
+	error = dir_relative_path_resolve(mi, dir_name, &dir_path, &base_path);
+	if (error)
+		goto out;
+
+	file = incfs_lookup_dentry(dir_path.dentry, file_name);
+	if (IS_ERR(file)) {
+		error = PTR_ERR(file);
+		file = NULL;
+		goto out;
+	}
+
+	fsnotify_create(d_inode(dir_path.dentry), file);
+
+	if (file_id_str) {
+		dir = incfs_lookup_dentry(base_path.dentry, INCFS_INDEX_NAME);
+		if (IS_ERR(dir)) {
+			error = PTR_ERR(dir);
+			dir = NULL;
+			goto out;
+		}
+
+		dput(file);
+		file = incfs_lookup_dentry(dir, file_id_str);
+		if (IS_ERR(file)) {
+			error = PTR_ERR(file);
+			file = NULL;
+			goto out;
+		}
+
+		fsnotify_create(d_inode(dir), file);
+
+		if (incomplete_file) {
+			dput(dir);
+			dir = incfs_lookup_dentry(base_path.dentry,
+						  INCFS_INCOMPLETE_NAME);
+			if (IS_ERR(dir)) {
+				error = PTR_ERR(dir);
+				dir = NULL;
+				goto out;
+			}
+
+			dput(file);
+			file = incfs_lookup_dentry(dir, file_id_str);
+			if (IS_ERR(file)) {
+				error = PTR_ERR(file);
+				file = NULL;
+				goto out;
+			}
+
+			fsnotify_create(d_inode(dir), file);
+		}
+	}
+out:
+	if (error)
+		pr_warn("%s failed with error %d\n", __func__, error);
+
+	dput(dir);
+	dput(file);
+	path_put(&dir_path);
+}
+
+static long ioctl_create_file(struct file *file,
+			struct incfs_new_file_args __user *usr_args)
+{
+	struct mount_info *mi = get_mount_info(file_superblock(file));
+	struct incfs_new_file_args args;
+	char *file_id_str = NULL;
+	struct dentry *index_file_dentry = NULL;
+	struct dentry *named_file_dentry = NULL;
+	struct dentry *incomplete_file_dentry = NULL;
+	struct path parent_dir_path = {};
+	struct inode *index_dir_inode = NULL;
+	__le64 size_attr_value = 0;
+	char *file_name = NULL;
+	char *attr_value = NULL;
+	int error = 0;
+	bool locked = false;
+	bool index_linked = false;
+	bool name_linked = false;
+	bool incomplete_linked = false;
+
+	if (!mi || !mi->mi_index_dir || !mi->mi_incomplete_dir) {
+		error = -EFAULT;
+		goto out;
+	}
+
+	if (copy_from_user(&args, usr_args, sizeof(args)) > 0) {
+		error = -EFAULT;
+		goto out;
+	}
+
+	file_name = strndup_user(u64_to_user_ptr(args.file_name), PATH_MAX);
+	if (IS_ERR(file_name)) {
+		error = PTR_ERR(file_name);
+		file_name = NULL;
+		goto out;
+	}
+
+	error = validate_name(file_name);
+	if (error)
+		goto out;
+
+	file_id_str = file_id_to_str(args.file_id);
+	if (!file_id_str) {
+		error = -ENOMEM;
+		goto out;
+	}
+
+	error = mutex_lock_interruptible(&mi->mi_dir_struct_mutex);
+	if (error)
+		goto out;
+	locked = true;
+
+	/* Find a directory to put the file into. */
+	error = dir_relative_path_resolve(mi,
+			u64_to_user_ptr(args.directory_path),
+			&parent_dir_path, NULL);
+	if (error)
+		goto out;
+
+	if (parent_dir_path.dentry == mi->mi_index_dir) {
+		/* Can't create a file directly inside .index */
+		error = -EBUSY;
+		goto out;
+	}
+
+	if (parent_dir_path.dentry == mi->mi_incomplete_dir) {
+		/* Can't create a file directly inside .incomplete */
+		error = -EBUSY;
+		goto out;
+	}
+
+	/* Look up a dentry in the parent dir. It should be negative. */
+	named_file_dentry = incfs_lookup_dentry(parent_dir_path.dentry,
+					file_name);
+	if (!named_file_dentry) {
+		error = -EFAULT;
+		goto out;
+	}
+	if (IS_ERR(named_file_dentry)) {
+		error = PTR_ERR(named_file_dentry);
+		named_file_dentry = NULL;
+		goto out;
+	}
+	if (d_really_is_positive(named_file_dentry)) {
+		/* File with this path already exists. */
+		error = -EEXIST;
+		goto out;
+	}
+
+	/* Look up a dentry in the incomplete dir. It should be negative. */
+	incomplete_file_dentry = incfs_lookup_dentry(mi->mi_incomplete_dir,
+					file_id_str);
+	if (!incomplete_file_dentry) {
+		error = -EFAULT;
+		goto out;
+	}
+	if (IS_ERR(incomplete_file_dentry)) {
+		error = PTR_ERR(incomplete_file_dentry);
+		incomplete_file_dentry = NULL;
+		goto out;
+	}
+	if (d_really_is_positive(incomplete_file_dentry)) {
+		/* File with this path already exists. */
+		error = -EEXIST;
+		goto out;
+	}
+
+	/* Look up a dentry in the .index dir. It should be negative. */
+	index_file_dentry = incfs_lookup_dentry(mi->mi_index_dir, file_id_str);
+	if (!index_file_dentry) {
+		error = -EFAULT;
+		goto out;
+	}
+	if (IS_ERR(index_file_dentry)) {
+		error = PTR_ERR(index_file_dentry);
+		index_file_dentry = NULL;
+		goto out;
+	}
+	if (d_really_is_positive(index_file_dentry)) {
+		/* File with this ID already exists in index. */
+		error = -EEXIST;
+		goto out;
+	}
+
+	/* Creating a file in the .index dir. */
+	index_dir_inode = d_inode(mi->mi_index_dir);
+	inode_lock_nested(index_dir_inode, I_MUTEX_PARENT);
+	error = vfs_create(&init_user_ns, index_dir_inode, index_file_dentry,
+			   args.mode | 0222, true);
+	inode_unlock(index_dir_inode);
+
+	if (error)
+		goto out;
+	if (!d_really_is_positive(index_file_dentry)) {
+		error = -EINVAL;
+		goto out;
+	}
+
+	error = chmod(index_file_dentry, args.mode | 0222);
+	if (error) {
+		pr_debug("incfs: chmod err: %d\n", error);
+		goto out;
+	}
+
+	/* Save the file's ID as an xattr for easy fetching in future. */
+	error = vfs_setxattr(&init_user_ns, index_file_dentry, INCFS_XATTR_ID_NAME,
+		file_id_str, strlen(file_id_str), XATTR_CREATE);
+	if (error) {
+		pr_debug("incfs: vfs_setxattr err:%d\n", error);
+		goto out;
+	}
+
+	/* Save the file's size as an xattr for easy fetching in future. */
+	size_attr_value = cpu_to_le64(args.size);
+	error = vfs_setxattr(&init_user_ns, index_file_dentry, INCFS_XATTR_SIZE_NAME,
+		(char *)&size_attr_value, sizeof(size_attr_value),
+		XATTR_CREATE);
+	if (error) {
+		pr_debug("incfs: vfs_setxattr err:%d\n", error);
+		goto out;
+	}
+
+	/* Save the file's attribute as an xattr */
+	if (args.file_attr_len && args.file_attr) {
+		if (args.file_attr_len > INCFS_MAX_FILE_ATTR_SIZE) {
+			error = -E2BIG;
+			goto out;
+		}
+
+		attr_value = kmalloc(args.file_attr_len, GFP_NOFS);
+		if (!attr_value) {
+			error = -ENOMEM;
+			goto out;
+		}
+
+		if (copy_from_user(attr_value,
+				u64_to_user_ptr(args.file_attr),
+				args.file_attr_len) > 0) {
+			error = -EFAULT;
+			goto out;
+		}
+
+		error = vfs_setxattr(&init_user_ns, index_file_dentry,
+				INCFS_XATTR_METADATA_NAME,
+				attr_value, args.file_attr_len,
+				XATTR_CREATE);
+
+		if (error)
+			goto out;
+	}
+
+	/* Initializing a newly created file. */
+	error = init_new_file(mi, index_file_dentry, &args.file_id, args.size,
+			      range(attr_value, args.file_attr_len),
+			      u64_to_user_ptr(args.signature_info),
+			      args.signature_size);
+	if (error)
+		goto out;
+	index_linked = true;
+
+	/* Linking a file with its real name from the requested dir. */
+	error = incfs_link(index_file_dentry, named_file_dentry);
+	if (error)
+		goto out;
+	name_linked = true;
+
+	if (args.size) {
+		/* Linking a file with its incomplete entry */
+		error = incfs_link(index_file_dentry, incomplete_file_dentry);
+		if (error)
+			goto out;
+		incomplete_linked = true;
+	}
+
+	notify_create(file, u64_to_user_ptr(args.directory_path), file_name,
+		      file_id_str, args.size != 0);
+
+out:
+	if (error) {
+		pr_debug("incfs: %s err:%d\n", __func__, error);
+		if (index_linked)
+			incfs_unlink(index_file_dentry);
+		if (name_linked)
+			incfs_unlink(named_file_dentry);
+		if (incomplete_linked)
+			incfs_unlink(incomplete_file_dentry);
+	}
+
+	kfree(file_id_str);
+	kfree(file_name);
+	kfree(attr_value);
+	dput(named_file_dentry);
+	dput(index_file_dentry);
+	dput(incomplete_file_dentry);
+	path_put(&parent_dir_path);
+	if (locked)
+		mutex_unlock(&mi->mi_dir_struct_mutex);
+
+	return error;
+}
+
+static int init_new_mapped_file(struct mount_info *mi, struct dentry *dentry,
+			 incfs_uuid_t *uuid, u64 size, u64 offset)
+{
+	struct path path = {};
+	struct file *new_file;
+	int error = 0;
+	struct backing_file_context *bfc = NULL;
+
+	if (!mi || !dentry || !uuid)
+		return -EFAULT;
+
+	/* Open the newly created file to write its mapping header. */
+	path = (struct path) {
+		.mnt = mi->mi_backing_dir_path.mnt,
+		.dentry = dentry
+	};
+	new_file = dentry_open(&path, O_RDWR | O_NOATIME | O_LARGEFILE,
+			       current_cred());
+
+	if (IS_ERR(new_file)) {
+		error = PTR_ERR(new_file);
+		goto out;
+	}
+
+	bfc = incfs_alloc_bfc(mi, new_file);
+	fput(new_file);
+	if (IS_ERR(bfc)) {
+		error = PTR_ERR(bfc);
+		bfc = NULL;
+		goto out;
+	}
+
+	mutex_lock(&bfc->bc_mutex);
+	error = incfs_write_mapping_fh_to_backing_file(bfc, uuid, size, offset);
+	if (error)
+		goto out;
+
+out:
+	if (bfc) {
+		mutex_unlock(&bfc->bc_mutex);
+		incfs_free_bfc(bfc);
+	}
+
+	if (error)
+		pr_debug("incfs: %s error: %d\n", __func__, error);
+	return error;
+}
+
+static long ioctl_create_mapped_file(struct file *file, void __user *arg)
+{
+	struct mount_info *mi = get_mount_info(file_superblock(file));
+	struct incfs_create_mapped_file_args __user *args_usr_ptr = arg;
+	struct incfs_create_mapped_file_args args = {};
+	char *file_name;
+	int error = 0;
+	struct path parent_dir_path = {};
+	char *source_file_name = NULL;
+	struct dentry *source_file_dentry = NULL;
+	u64 source_file_size;
+	struct dentry *file_dentry = NULL;
+	struct inode *parent_inode;
+	__le64 size_attr_value;
+
+	if (copy_from_user(&args, args_usr_ptr, sizeof(args)) > 0)
+		return -EINVAL;
+
+	file_name = strndup_user(u64_to_user_ptr(args.file_name), PATH_MAX);
+	if (IS_ERR(file_name)) {
+		error = PTR_ERR(file_name);
+		file_name = NULL;
+		goto out;
+	}
+
+	error = validate_name(file_name);
+	if (error)
+		goto out;
+
+	if (args.source_offset % INCFS_DATA_FILE_BLOCK_SIZE) {
+		error = -EINVAL;
+		goto out;
+	}
+
+	/* Validate file mapping is in range */
+	source_file_name = file_id_to_str(args.source_file_id);
+	if (!source_file_name) {
+		pr_warn("Failed to alloc source_file_name\n");
+		error = -ENOMEM;
+		goto out;
+	}
+
+	source_file_dentry = incfs_lookup_dentry(mi->mi_index_dir,
+						       source_file_name);
+	if (!source_file_dentry) {
+		pr_warn("Source file does not exist\n");
+		error = -EINVAL;
+		goto out;
+	}
+	if (IS_ERR(source_file_dentry)) {
+		pr_warn("Error opening source file\n");
+		error = PTR_ERR(source_file_dentry);
+		source_file_dentry = NULL;
+		goto out;
+	}
+	if (!d_really_is_positive(source_file_dentry)) {
+		pr_warn("Source file dentry negative\n");
+		error = -EINVAL;
+		goto out;
+	}
+
+	error = vfs_getxattr(&init_user_ns, source_file_dentry, INCFS_XATTR_SIZE_NAME,
+			     (char *)&size_attr_value, sizeof(size_attr_value));
+	if (error < 0)
+		goto out;
+
+	if (error != sizeof(size_attr_value)) {
+		pr_warn("Mapped file has no size attr\n");
+		error = -EINVAL;
+		goto out;
+	}
+
+	source_file_size = le64_to_cpu(size_attr_value);
+	if (args.source_offset + args.size > source_file_size) {
+		pr_warn("Mapped file out of range\n");
+		error = -EINVAL;
+		goto out;
+	}
+
+	/* Find a directory to put the file into. */
+	error = dir_relative_path_resolve(mi,
+			u64_to_user_ptr(args.directory_path),
+			&parent_dir_path, NULL);
+	if (error)
+		goto out;
+
+	if (parent_dir_path.dentry == mi->mi_index_dir) {
+		/* Can't create a file directly inside .index */
+		error = -EBUSY;
+		goto out;
+	}
+
+	/* Look up a dentry in the parent dir. It should be negative. */
+	file_dentry = incfs_lookup_dentry(parent_dir_path.dentry,
+					file_name);
+	if (!file_dentry) {
+		error = -EFAULT;
+		goto out;
+	}
+	if (IS_ERR(file_dentry)) {
+		error = PTR_ERR(file_dentry);
+		file_dentry = NULL;
+		goto out;
+	}
+	if (d_really_is_positive(file_dentry)) {
+		error = -EEXIST;
+		goto out;
+	}
+
+	parent_inode = d_inode(parent_dir_path.dentry);
+	inode_lock_nested(parent_inode, I_MUTEX_PARENT);
+	error = vfs_create(&init_user_ns, parent_inode, file_dentry,
+			   args.mode | 0222, true);
+	inode_unlock(parent_inode);
+	if (error)
+		goto out;
+
+	error = chmod(file_dentry, args.mode | 0222);
+	if (error) {
+		pr_debug("incfs: chmod err: %d\n", error);
+		goto delete_file;
+	}
+
+	/* Save the file's size as an xattr for easy fetching in future. */
+	size_attr_value = cpu_to_le64(args.size);
+	error = vfs_setxattr(&init_user_ns, file_dentry, INCFS_XATTR_SIZE_NAME,
+		(char *)&size_attr_value, sizeof(size_attr_value),
+		XATTR_CREATE);
+	if (error) {
+		pr_debug("incfs: vfs_setxattr err:%d\n", error);
+		goto delete_file;
+	}
+
+	error = init_new_mapped_file(mi, file_dentry, &args.source_file_id,
+			args.size, args.source_offset);
+	if (error)
+		goto delete_file;
+
+	notify_create(file, u64_to_user_ptr(args.directory_path), file_name,
+		      NULL, false);
+
+	goto out;
+
+delete_file:
+	incfs_unlink(file_dentry);
+
+out:
+	dput(file_dentry);
+	dput(source_file_dentry);
+	path_put(&parent_dir_path);
+	kfree(file_name);
+	kfree(source_file_name);
+	return error;
+}
+
+static long ioctl_get_read_timeouts(struct mount_info *mi, void __user *arg)
+{
+	struct incfs_get_read_timeouts_args __user *args_usr_ptr = arg;
+	struct incfs_get_read_timeouts_args args = {};
+	int error = 0;
+	struct incfs_per_uid_read_timeouts *buffer;
+	int size;
+
+	if (copy_from_user(&args, args_usr_ptr, sizeof(args)))
+		return -EINVAL;
+
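+	/* Cap the transfer buffer at one data block to bound the allocation. */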
+	if (args.timeouts_array_size_out > INCFS_DATA_FILE_BLOCK_SIZE)
+		return -EINVAL;
+
+	buffer = kzalloc(args.timeouts_array_size_out, GFP_NOFS);
+	if (!buffer)
+		return -ENOMEM;
+
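+	/*
+	 * Snapshot the timeouts table under the spinlock; copy_to_user() can
+	 * fault, so it must run only after the lock has been dropped.
+	 */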
+	spin_lock(&mi->mi_per_uid_read_timeouts_lock);
+	size = mi->mi_per_uid_read_timeouts_size;
+	/* Never copy more than the buffer sized from timeouts_array_size_out. */
+	if (args.timeouts_array_size < size ||
+	    args.timeouts_array_size_out < size)
+		error = -E2BIG;
+	else if (size)
+		memcpy(buffer, mi->mi_per_uid_read_timeouts, size);
+	spin_unlock(&mi->mi_per_uid_read_timeouts_lock);
+
+	args.timeouts_array_size_out = size;
+	if (!error && size)
+		if (copy_to_user(u64_to_user_ptr(args.timeouts_array), buffer,
+				 size))
+			error = -EFAULT;
+
+	if (!error || error == -E2BIG)
+		if (copy_to_user(args_usr_ptr, &args, sizeof(args)) > 0)
+			error = -EFAULT;
+
+	kfree(buffer);
+	return error;
+}
+
+static long ioctl_set_read_timeouts(struct mount_info *mi, void __user *arg)
+{
+	struct incfs_set_read_timeouts_args __user *args_usr_ptr = arg;
+	struct incfs_set_read_timeouts_args args = {};
+	int error = 0;
+	int size;
+	struct incfs_per_uid_read_timeouts *buffer = NULL, *tmp;
+	int i;
+
+	if (copy_from_user(&args, args_usr_ptr, sizeof(args)))
+		return -EINVAL;
+
+	size = args.timeouts_array_size;
+	if (size) {
+		if (size > INCFS_DATA_FILE_BLOCK_SIZE ||
+		    size % sizeof(*buffer) != 0)
+			return -EINVAL;
+
+		buffer = kzalloc(size, GFP_NOFS);
+		if (!buffer)
+			return -ENOMEM;
+
+		if (copy_from_user(buffer, u64_to_user_ptr(args.timeouts_array),
+				   size)) {
+			error = -EINVAL;
+			goto out;
+		}
+
+		for (i = 0; i < size / sizeof(*buffer); ++i) {
+			struct incfs_per_uid_read_timeouts *t = &buffer[i];
+
+			if (t->min_pending_time_us > t->max_pending_time_us) {
+				error = -EINVAL;
+				goto out;
+			}
+		}
+	}
+
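+	/*
+	 * Publish the new table under the lock and take ownership of the old
+	 * one, which is freed below outside the critical section.
+	 */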
+	spin_lock(&mi->mi_per_uid_read_timeouts_lock);
+	mi->mi_per_uid_read_timeouts_size = size;
+	tmp = mi->mi_per_uid_read_timeouts;
+	mi->mi_per_uid_read_timeouts = buffer;
+	buffer = tmp;
+	spin_unlock(&mi->mi_per_uid_read_timeouts_lock);
+
+out:
+	kfree(buffer);
+	return error;
+}
+
+static long ioctl_get_last_read_error(struct mount_info *mi, void __user *arg)
+{
+	struct incfs_get_last_read_error_args __user *args_usr_ptr = arg;
+	struct incfs_get_last_read_error_args args = {};
+	int error;
+
+	error = mutex_lock_interruptible(&mi->mi_le_mutex);
+	if (error)
+		return error;
+
+	args.file_id_out = mi->mi_le_file_id;
+	args.time_us_out = mi->mi_le_time_us;
+	args.page_out = mi->mi_le_page;
+	args.errno_out = mi->mi_le_errno;
+	args.uid_out = mi->mi_le_uid;
+
+	mutex_unlock(&mi->mi_le_mutex);
+	if (copy_to_user(args_usr_ptr, &args, sizeof(args)) > 0)
+		error = -EFAULT;
+
+	return error;
+}
+
+static long pending_reads_dispatch_ioctl(struct file *f, unsigned int req,
+					unsigned long arg)
+{
+	struct mount_info *mi = get_mount_info(file_superblock(f));
+
+	switch (req) {
+	case INCFS_IOC_CREATE_FILE:
+		return ioctl_create_file(f, (void __user *)arg);
+	case INCFS_IOC_PERMIT_FILL:
+		return ioctl_permit_fill(f, (void __user *)arg);
+	case INCFS_IOC_CREATE_MAPPED_FILE:
+		return ioctl_create_mapped_file(f, (void __user *)arg);
+	case INCFS_IOC_GET_READ_TIMEOUTS:
+		return ioctl_get_read_timeouts(mi, (void __user *)arg);
+	case INCFS_IOC_SET_READ_TIMEOUTS:
+		return ioctl_set_read_timeouts(mi, (void __user *)arg);
+	case INCFS_IOC_GET_LAST_READ_ERROR:
+		return ioctl_get_last_read_error(mi, (void __user *)arg);
+	default:
+		return -EINVAL;
+	}
+}
+
+static const struct file_operations incfs_pending_reads_file_ops = {
+	.read = pending_reads_read,
+	.poll = pending_reads_poll,
+	.open = pending_reads_open,
+	.release = pending_reads_release,
+	.llseek = noop_llseek,
+	.unlocked_ioctl = pending_reads_dispatch_ioctl,
+	.compat_ioctl = pending_reads_dispatch_ioctl
+};
+
+/*******************************************************************************
+ * .log pseudo file definition
+ ******************************************************************************/
+#define INCFS_LOG_INODE 3
+static const char log_file_name[] = INCFS_LOG_FILENAME;
+
+/* State of an open .log file, unique for each file descriptor. */
+struct log_file_state {
+	struct read_log_state state;
+};
+
+static ssize_t log_read(struct file *f, char __user *buf, size_t len,
+			loff_t *ppos)
+{
+	struct log_file_state *log_state = f->private_data;
+	struct mount_info *mi = get_mount_info(file_superblock(f));
+	int total_reads_collected = 0;
+	int rl_size;
+	ssize_t result = 0;
+	bool report_uid;
+	unsigned long page = 0;
+	struct incfs_pending_read_info *reads_buf = NULL;
+	struct incfs_pending_read_info2 *reads_buf2 = NULL;
+	size_t record_size;
+	ssize_t reads_to_collect;
+	ssize_t reads_per_page;
+
+	if (!mi)
+		return -EFAULT;
+
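+	/* With report_uid, the log is exposed in the larger v2 record format. */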
+	report_uid = mi->mi_options.report_uid;
+	record_size = report_uid ? sizeof(*reads_buf2) : sizeof(*reads_buf);
+	reads_to_collect = len / record_size;
+	reads_per_page = PAGE_SIZE / record_size;
+
+	rl_size = READ_ONCE(mi->mi_log.rl_size);
+	if (rl_size == 0)
+		return 0;
+
+	page = __get_free_page(GFP_NOFS);
+	if (!page)
+		return -ENOMEM;
+
+	if (report_uid)
+		reads_buf2 = (struct incfs_pending_read_info2 *)page;
+	else
+		reads_buf = (struct incfs_pending_read_info *)page;
+
+	reads_to_collect = min_t(ssize_t, rl_size, reads_to_collect);
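+	/*
+	 * Collect up to a page of records per iteration. The committed cursor
+	 * (log_state->state) is only advanced once copy_to_user() succeeds, so
+	 * a fault part-way through returns the records already delivered.
+	 */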
+	while (reads_to_collect > 0) {
+		struct read_log_state next_state;
+		int reads_collected;
+
+		memcpy(&next_state, &log_state->state, sizeof(next_state));
+		reads_collected = incfs_collect_logged_reads(
+			mi, &next_state, reads_buf, reads_buf2,
+			min_t(ssize_t, reads_to_collect, reads_per_page));
+		if (reads_collected <= 0) {
+			result = total_reads_collected ?
+				       total_reads_collected * record_size :
+				       reads_collected;
+			goto out;
+		}
+		if (copy_to_user(buf, (void *)page,
+				 reads_collected * record_size)) {
+			result = total_reads_collected ?
+				       total_reads_collected * record_size :
+				       -EFAULT;
+			goto out;
+		}
+
+		memcpy(&log_state->state, &next_state, sizeof(next_state));
+		total_reads_collected += reads_collected;
+		buf += reads_collected * record_size;
+		reads_to_collect -= reads_collected;
+	}
+
+	result = total_reads_collected * record_size;
+	*ppos = 0;
+out:
+	free_page(page);
+	return result;
+}
+
+static __poll_t log_poll(struct file *file, poll_table *wait)
+{
+	struct log_file_state *log_state = file->private_data;
+	struct mount_info *mi = get_mount_info(file_superblock(file));
+	int count;
+	__poll_t ret = 0;
+
+	poll_wait(file, &mi->mi_log.ml_notif_wq, wait);
+	count = incfs_get_uncollected_logs_count(mi, &log_state->state);
+	if (count >= mi->mi_options.read_log_wakeup_count)
+		ret = EPOLLIN | EPOLLRDNORM;
+
+	return ret;
+}
+
+static int log_open(struct inode *inode, struct file *file)
+{
+	struct log_file_state *log_state = NULL;
+	struct mount_info *mi = get_mount_info(file_superblock(file));
+
+	log_state = kzalloc(sizeof(*log_state), GFP_NOFS);
+	if (!log_state)
+		return -ENOMEM;
+
+	log_state->state = incfs_get_log_state(mi);
+	file->private_data = log_state;
+	return 0;
+}
+
+static int log_release(struct inode *inode, struct file *file)
+{
+	kfree(file->private_data);
+	return 0;
+}
+
+static const struct file_operations incfs_log_file_ops = {
+	.read = log_read,
+	.poll = log_poll,
+	.open = log_open,
+	.release = log_release,
+	.llseek = noop_llseek,
+};
+
+/*******************************************************************************
+ * .blocks_written pseudo file definition
+ ******************************************************************************/
+#define INCFS_BLOCKS_WRITTEN_INODE 4
+static const char blocks_written_file_name[] = INCFS_BLOCKS_WRITTEN_FILENAME;
+
+/* State of an open .blocks_written file, unique for each file descriptor. */
+struct blocks_written_file_state {
+	unsigned long blocks_written;
+};
+
+static ssize_t blocks_written_read(struct file *f, char __user *buf, size_t len,
+			loff_t *ppos)
+{
+	struct mount_info *mi = get_mount_info(file_superblock(f));
+	struct blocks_written_file_state *state = f->private_data;
+	unsigned long blocks_written;
+	char string[21];
+	int result = 0;
+
+	if (!mi)
+		return -EFAULT;
+
+	blocks_written = atomic_read(&mi->mi_blocks_written);
+	if (state->blocks_written == blocks_written)
+		return 0;
+
+	result = snprintf(string, sizeof(string), "%lu", blocks_written);
+	if (result > len)
+		result = len;
+	if (copy_to_user(buf, string, result))
+		return -EFAULT;
+
+	state->blocks_written = blocks_written;
+	return result;
+}
+
+static __poll_t blocks_written_poll(struct file *f, poll_table *wait)
+{
+	struct mount_info *mi = get_mount_info(file_superblock(f));
+	struct blocks_written_file_state *state = f->private_data;
+	unsigned long blocks_written;
+
+	if (!mi)
+		return 0;
+
+	poll_wait(f, &mi->mi_blocks_written_notif_wq, wait);
+	blocks_written = atomic_read(&mi->mi_blocks_written);
+	if (state->blocks_written == blocks_written)
+		return 0;
+
+	return EPOLLIN | EPOLLRDNORM;
+}
+
+static int blocks_written_open(struct inode *inode, struct file *file)
+{
+	struct blocks_written_file_state *state =
+		kzalloc(sizeof(*state), GFP_NOFS);
+
+	if (!state)
+		return -ENOMEM;
+
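+	/* Start at -1 so the first read/poll always sees a counter change. */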
+	state->blocks_written = -1;
+	file->private_data = state;
+	return 0;
+}
+
+static int blocks_written_release(struct inode *inode, struct file *file)
+{
+	kfree(file->private_data);
+	return 0;
+}
+
+static const struct file_operations incfs_blocks_written_file_ops = {
+	.read = blocks_written_read,
+	.poll = blocks_written_poll,
+	.open = blocks_written_open,
+	.release = blocks_written_release,
+	.llseek = noop_llseek,
+};
+
+/*******************************************************************************
+ * Generic inode lookup functionality
+ ******************************************************************************/
+
+const struct mem_range incfs_pseudo_file_names[] = {
+	{ .data = (u8 *)pending_reads_file_name,
+	  .len = ARRAY_SIZE(pending_reads_file_name) - 1 },
+	{ .data = (u8 *)log_file_name, .len = ARRAY_SIZE(log_file_name) - 1 },
+	{ .data = (u8 *)blocks_written_file_name,
+	  .len = ARRAY_SIZE(blocks_written_file_name) - 1 }
+};
+
+const unsigned long incfs_pseudo_file_inodes[] = { INCFS_PENDING_READS_INODE,
+						   INCFS_LOG_INODE,
+						   INCFS_BLOCKS_WRITTEN_INODE };
+
+static const struct file_operations *const pseudo_file_operations[] = {
+	&incfs_pending_reads_file_ops, &incfs_log_file_ops,
+	&incfs_blocks_written_file_ops
+};
+
+static bool is_pseudo_filename(struct mem_range name)
+{
+	int i = 0;
+
+	for (; i < ARRAY_SIZE(incfs_pseudo_file_names); ++i)
+		if (incfs_equal_ranges(incfs_pseudo_file_names[i], name))
+			return true;
+	return false;
+}
+
+static bool get_pseudo_inode(int ino, struct inode *inode)
+{
+	int i = 0;
+
+	for (; i < ARRAY_SIZE(incfs_pseudo_file_inodes); ++i)
+		if (ino == incfs_pseudo_file_inodes[i])
+			break;
+	if (i == ARRAY_SIZE(incfs_pseudo_file_inodes))
+		return false;
+
+	inode->i_ctime = (struct timespec64){};
+	inode->i_mtime = inode->i_ctime;
+	inode->i_atime = inode->i_ctime;
+	inode->i_size = 0;
+	inode->i_ino = ino;
+	inode->i_private = NULL;
+	inode_init_owner(&init_user_ns, inode, NULL, S_IFREG | READ_WRITE_FILE_MODE);
+	inode->i_op = &incfs_file_inode_ops;
+	inode->i_fop = pseudo_file_operations[i];
+	return true;
+}
+
+struct inode_search {
+	unsigned long ino;
+};
+
+static int inode_test(struct inode *inode, void *opaque)
+{
+	struct inode_search *search = opaque;
+
+	return inode->i_ino == search->ino;
+}
+
+static int inode_set(struct inode *inode, void *opaque)
+{
+	struct inode_search *search = opaque;
+
+	if (get_pseudo_inode(search->ino, inode))
+		return 0;
+
+	/* Unknown inode requested. */
+	return -EINVAL;
+}
+
+static struct inode *fetch_inode(struct super_block *sb, unsigned long ino)
+{
+	struct inode_search search = {
+		.ino = ino
+	};
+	struct inode *inode = iget5_locked(sb, search.ino, inode_test,
+				inode_set, &search);
+
+	if (!inode)
+		return ERR_PTR(-ENOMEM);
+
+	if (inode->i_state & I_NEW)
+		unlock_new_inode(inode);
+
+	return inode;
+}
+
+int dir_lookup_pseudo_files(struct super_block *sb, struct dentry *dentry)
+{
+	struct mem_range name_range =
+			range((u8 *)dentry->d_name.name, dentry->d_name.len);
+	unsigned long ino;
+	struct inode *inode;
+	int i = 0;
+
+	for (; i < ARRAY_SIZE(incfs_pseudo_file_names); ++i)
+		if (incfs_equal_ranges(incfs_pseudo_file_names[i], name_range))
+			break;
+	if (i == ARRAY_SIZE(incfs_pseudo_file_names))
+		return -ENOENT;
+
+	ino = incfs_pseudo_file_inodes[i];
+
+	inode = fetch_inode(sb, ino);
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
+
+	d_add(dentry, inode);
+	return 0;
+}
+
+int emit_pseudo_files(struct dir_context *ctx)
+{
+	loff_t i = ctx->pos;
+
+	for (; i < ARRAY_SIZE(incfs_pseudo_file_names); ++i) {
+		if (!dir_emit(ctx, incfs_pseudo_file_names[i].data,
+			      incfs_pseudo_file_names[i].len,
+			      incfs_pseudo_file_inodes[i], DT_REG))
+			return -EINVAL;
+
+		ctx->pos++;
+	}
+	return 0;
+}
diff --git a/fs/incfs/pseudo_files.h b/fs/incfs/pseudo_files.h
new file mode 100644
index 0000000..1887218
--- /dev/null
+++ b/fs/incfs/pseudo_files.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Google LLC
+ */
+
+#ifndef _INCFS_PSEUDO_FILES_H
+#define _INCFS_PSEUDO_FILES_H
+
+#include "internal.h"
+
+#define PSEUDO_FILE_COUNT 3
+#define INCFS_START_INO_RANGE 10
+
+extern const struct mem_range incfs_pseudo_file_names[PSEUDO_FILE_COUNT];
+extern const unsigned long incfs_pseudo_file_inodes[PSEUDO_FILE_COUNT];
+
+int dir_lookup_pseudo_files(struct super_block *sb, struct dentry *dentry);
+int emit_pseudo_files(struct dir_context *ctx);
+
+#endif
diff --git a/fs/incfs/sysfs.c b/fs/incfs/sysfs.c
new file mode 100644
index 0000000..360f03c
--- /dev/null
+++ b/fs/incfs/sysfs.c
@@ -0,0 +1,201 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021 Google LLC
+ */
+#include <linux/fs.h>
+#include <linux/kobject.h>
+
+#include <uapi/linux/incrementalfs.h>
+
+#include "sysfs.h"
+#include "data_mgmt.h"
+#include "vfs.h"
+
+/******************************************************************************
+ * Define sys/fs/incrementalfs & sys/fs/incrementalfs/features
+ *****************************************************************************/
+#define INCFS_NODE_FEATURES "features"
+#define INCFS_NODE_INSTANCES "instances"
+
+static struct kobject *sysfs_root;
+static struct kobject *features_node;
+static struct kobject *instances_node;
+
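+/*
+ * Each flag declared here appears as a read-only attribute under features/
+ * whose contents are always "supported".
+ */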
+#define DECLARE_FEATURE_FLAG(name)					\
+	static ssize_t name##_show(struct kobject *kobj,		\
+			 struct kobj_attribute *attr, char *buff)	\
+{									\
+	return sysfs_emit(buff, "supported\n");				\
+}									\
+									\
+static struct kobj_attribute name##_attr = __ATTR_RO(name)
+
+DECLARE_FEATURE_FLAG(corefs);
+DECLARE_FEATURE_FLAG(zstd);
+DECLARE_FEATURE_FLAG(v2);
+
+static struct attribute *attributes[] = {
+	&corefs_attr.attr,
+	&zstd_attr.attr,
+	&v2_attr.attr,
+	NULL,
+};
+
+static const struct attribute_group attr_group = {
+	.attrs = attributes,
+};
+
+int __init incfs_init_sysfs(void)
+{
+	int res = -ENOMEM;
+
+	sysfs_root = kobject_create_and_add(INCFS_NAME, fs_kobj);
+	if (!sysfs_root)
+		return -ENOMEM;
+
+	instances_node = kobject_create_and_add(INCFS_NODE_INSTANCES,
+						sysfs_root);
+	if (!instances_node)
+		goto err_put_root;
+
+	features_node = kobject_create_and_add(INCFS_NODE_FEATURES,
+						sysfs_root);
+	if (!features_node)
+		goto err_put_instances;
+
+	res = sysfs_create_group(features_node, &attr_group);
+	if (res)
+		goto err_put_features;
+
+	return 0;
+
+err_put_features:
+	kobject_put(features_node);
+err_put_instances:
+	kobject_put(instances_node);
+err_put_root:
+	kobject_put(sysfs_root);
+
+	return res;
+}
+
+void incfs_cleanup_sysfs(void)
+{
+	if (features_node) {
+		sysfs_remove_group(features_node, &attr_group);
+		kobject_put(features_node);
+	}
+
+	kobject_put(instances_node);
+	kobject_put(sysfs_root);
+}
+
+/******************************************************************************
+ * Define sys/fs/incrementalfs/instances/<name>/
+ *****************************************************************************/
+#define __DECLARE_STATUS_FLAG(name)					\
+static ssize_t name##_show(struct kobject *kobj,			\
+			 struct kobj_attribute *attr, char *buff)	\
+{									\
+	struct incfs_sysfs_node *node = container_of(kobj,		\
+			struct incfs_sysfs_node, isn_sysfs_node);	\
+									\
+	return sysfs_emit(buff, "%d\n", node->isn_mi->mi_##name);	\
+}									\
+									\
+static struct kobj_attribute name##_attr = __ATTR_RO(name)
+
+#define __DECLARE_STATUS_FLAG64(name)					\
+static ssize_t name##_show(struct kobject *kobj,			\
+			 struct kobj_attribute *attr, char *buff)	\
+{									\
+	struct incfs_sysfs_node *node = container_of(kobj,		\
+			struct incfs_sysfs_node, isn_sysfs_node);	\
+									\
+	return sysfs_emit(buff, "%lld\n", node->isn_mi->mi_##name);	\
+}									\
+									\
+static struct kobj_attribute name##_attr = __ATTR_RO(name)
+
+__DECLARE_STATUS_FLAG(reads_failed_timed_out);
+__DECLARE_STATUS_FLAG(reads_failed_hash_verification);
+__DECLARE_STATUS_FLAG(reads_failed_other);
+__DECLARE_STATUS_FLAG(reads_delayed_pending);
+__DECLARE_STATUS_FLAG64(reads_delayed_pending_us);
+__DECLARE_STATUS_FLAG(reads_delayed_min);
+__DECLARE_STATUS_FLAG64(reads_delayed_min_us);
+
+static struct attribute *mount_attributes[] = {
+	&reads_failed_timed_out_attr.attr,
+	&reads_failed_hash_verification_attr.attr,
+	&reads_failed_other_attr.attr,
+	&reads_delayed_pending_attr.attr,
+	&reads_delayed_pending_us_attr.attr,
+	&reads_delayed_min_attr.attr,
+	&reads_delayed_min_us_attr.attr,
+	NULL,
+};
+
+static void incfs_sysfs_release(struct kobject *kobj)
+{
+	struct incfs_sysfs_node *node = container_of(kobj,
+				struct incfs_sysfs_node, isn_sysfs_node);
+
+	complete(&node->isn_completion);
+}
+
+static const struct attribute_group mount_attr_group = {
+	.attrs = mount_attributes,
+};
+
+static struct kobj_type incfs_kobj_node_ktype = {
+	.sysfs_ops	= &kobj_sysfs_ops,
+	.release	= &incfs_sysfs_release,
+};
+
+struct incfs_sysfs_node *incfs_add_sysfs_node(const char *name,
+					      struct mount_info *mi)
+{
+	struct incfs_sysfs_node *node = NULL;
+	int error;
+
+	if (!name)
+		return NULL;
+
+	node = kzalloc(sizeof(*node), GFP_NOFS);
+	if (!node)
+		return ERR_PTR(-ENOMEM);
+
+	node->isn_mi = mi;
+
+	init_completion(&node->isn_completion);
+	kobject_init(&node->isn_sysfs_node, &incfs_kobj_node_ktype);
+	error = kobject_add(&node->isn_sysfs_node, instances_node, "%s", name);
+	if (error)
+		goto err;
+
+	error = sysfs_create_group(&node->isn_sysfs_node, &mount_attr_group);
+	if (error)
+		goto err;
+
+	return node;
+
+err:
+	/*
+	 * Note kobject_put always calls release, so incfs_sysfs_release will
+	 * free node
+	 */
+	kobject_put(&node->isn_sysfs_node);
+	return ERR_PTR(error);
+}
+
+void incfs_free_sysfs_node(struct incfs_sysfs_node *node)
+{
+	if (!node)
+		return;
+
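+	/*
+	 * The final kobject_put() invokes incfs_sysfs_release(), which
+	 * completes isn_completion; waiting here (best effort, since the wait
+	 * is interruptible) orders the kfree() below after that callback.
+	 */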
+	sysfs_remove_group(&node->isn_sysfs_node, &mount_attr_group);
+	kobject_put(&node->isn_sysfs_node);
+	wait_for_completion_interruptible(&node->isn_completion);
+	kfree(node);
+}
diff --git a/fs/incfs/sysfs.h b/fs/incfs/sysfs.h
new file mode 100644
index 0000000..65bf554
--- /dev/null
+++ b/fs/incfs/sysfs.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2021 Google LLC
+ */
+#ifndef _INCFS_SYSFS_H
+#define _INCFS_SYSFS_H
+
+struct incfs_sysfs_node {
+	struct kobject isn_sysfs_node;
+
+	struct completion isn_completion;
+
+	struct mount_info *isn_mi;
+};
+
+int incfs_init_sysfs(void);
+void incfs_cleanup_sysfs(void);
+struct incfs_sysfs_node *incfs_add_sysfs_node(const char *name,
+					      struct mount_info *mi);
+void incfs_free_sysfs_node(struct incfs_sysfs_node *node);
+
+#endif
diff --git a/fs/incfs/verity.c b/fs/incfs/verity.c
new file mode 100644
index 0000000..b69656b
--- /dev/null
+++ b/fs/incfs/verity.c
@@ -0,0 +1,851 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Google LLC
+ */
+
+/*
+ * fs-verity integration into incfs
+ *
+ * Since incfs has its own merkle tree implementation, most of fs-verity code
+ * is not needed. The key part that is needed is the signature check, since
+ * that is based on the private /proc/sys/fs/verity/require_signatures value
+ * and a private keyring. Thus the first change is to modify verity code to
+ * export a version of fsverity_verify_signature.
+ *
+ * fs-verity integration then consists of the following modifications:
+ *
+ * 1. Add the (optional) verity signature to the incfs file format
+ * 2. Add a pointer to the digest of the fs-verity descriptor struct to the
+ *    data_file struct that incfs attaches to each file inode.
+ * 3. Add the following ioctls:
+ *  - FS_IOC_ENABLE_VERITY
+ *  - FS_IOC_GETFLAGS
+ *  - FS_IOC_MEASURE_VERITY
+ * 4. When FS_IOC_ENABLE_VERITY is called on a non-verity file, the
+ *    fs-verity descriptor struct is populated and digested. If it passes the
+ *    signature check or the signature is NULL and
+ *    fs.verity.require_signatures=0, then the S_VERITY flag is set and the
+ *    xattr incfs.verity is set. If the signature is non-NULL, an
+ *    INCFS_MD_VERITY_SIGNATURE is added to the backing file containing the
+ *    signature.
+ * 5. When a file with an incfs.verity xattr's inode is initialized, the
+ *    inode’s S_VERITY flag is set.
+ * 6. When a file with the S_VERITY flag set on its inode is opened, the
+ *    data_file is checked for its verity digest. If the file doesn’t have a
+ *    digest, the file’s digest is calculated as above, checked, and set, or the
+ *    open is denied if it is not valid.
+ * 7. FS_IOC_GETFLAGS simply returns the value of the S_VERITY flag
+ * 8. FS_IOC_MEASURE_VERITY simply returns the cached digest
+ * 9. The final complication is that if FS_IOC_ENABLE_VERITY is called on a file
+ *    which doesn’t have a merkle tree, the merkle tree is calculated before the
+ *    rest of the process is completed.
+ */
+
+#include <crypto/hash.h>
+#include <crypto/sha2.h>
+#include <linux/fsverity.h>
+#include <linux/mount.h>
+
+#include "verity.h"
+
+#include "data_mgmt.h"
+#include "format.h"
+#include "integrity.h"
+#include "vfs.h"
+
+#define FS_VERITY_MAX_SIGNATURE_SIZE	16128
+
+static int incfs_get_root_hash(struct file *filp, u8 *root_hash)
+{
+	struct data_file *df = get_incfs_data_file(filp);
+
+	if (!df)
+		return -EINVAL;
+
+	memcpy(root_hash, df->df_hash_tree->root_hash,
+	       df->df_hash_tree->alg->digest_size);
+
+	return 0;
+}
+
+static int incfs_end_enable_verity(struct file *filp, u8 *sig, size_t sig_size)
+{
+	struct inode *inode = file_inode(filp);
+	struct mem_range signature = {
+		.data = sig,
+		.len = sig_size,
+	};
+	struct data_file *df = get_incfs_data_file(filp);
+	struct backing_file_context *bfc;
+	int error;
+	struct incfs_df_verity_signature *vs = NULL;
+	loff_t offset;
+
+	if (!df || !df->df_backing_file_context)
+		return -EFSCORRUPTED;
+
+	if (sig) {
+		vs = kzalloc(sizeof(*vs), GFP_NOFS);
+		if (!vs)
+			return -ENOMEM;
+	}
+
+	bfc = df->df_backing_file_context;
+	error = mutex_lock_interruptible(&bfc->bc_mutex);
+	if (error)
+		goto out;
+
+	error = incfs_write_verity_signature_to_backing_file(bfc, signature,
+							     &offset);
+	mutex_unlock(&bfc->bc_mutex);
+	if (error)
+		goto out;
+
+	/*
+	 * Set verity xattr so we can set S_VERITY without opening backing file
+	 */
+	error = vfs_setxattr(&init_user_ns, bfc->bc_file->f_path.dentry,
+			     INCFS_XATTR_VERITY_NAME, NULL, 0, XATTR_CREATE);
+	if (error) {
+		pr_warn("incfs: error setting verity xattr: %d\n", error);
+		goto out;
+	}
+
+	if (sig) {
+		*vs = (struct incfs_df_verity_signature) {
+			.size = signature.len,
+			.offset = offset,
+		};
+
+		df->df_verity_signature = vs;
+		vs = NULL;
+	}
+
+	inode_set_flags(inode, S_VERITY, S_VERITY);
+
+out:
+	kfree(vs);
+	return error;
+}
+
+static int incfs_compute_file_digest(struct incfs_hash_alg *alg,
+				struct fsverity_descriptor *desc,
+				u8 *digest)
+{
+	SHASH_DESC_ON_STACK(d, alg->shash);
+
+	d->tfm = alg->shash;
+	return crypto_shash_digest(d, (u8 *)desc, sizeof(*desc), digest);
+}
+
+static enum incfs_hash_tree_algorithm incfs_convert_fsverity_hash_alg(
+								int hash_alg)
+{
+	switch (hash_alg) {
+	case FS_VERITY_HASH_ALG_SHA256:
+		return INCFS_HASH_TREE_SHA256;
+	default:
+		return -EINVAL;
+	}
+}
+
+static struct mem_range incfs_get_verity_digest(struct inode *inode)
+{
+	struct inode_info *node = get_incfs_node(inode);
+	struct data_file *df;
+	struct mem_range verity_file_digest;
+
+	if (!node) {
+		pr_warn("Invalid inode\n");
+		return range(NULL, 0);
+	}
+
+	df = node->n_file;
+
+	/*
+	 * Pairs with the cmpxchg_release() in incfs_set_verity_digest().
+	 * I.e., another task may publish ->df_verity_file_digest concurrently,
+	 * executing a RELEASE barrier.  We need to use smp_load_acquire() here
+	 * to safely ACQUIRE the memory the other task published.
+	 */
+	verity_file_digest.data = smp_load_acquire(
+					&df->df_verity_file_digest.data);
+	verity_file_digest.len = df->df_verity_file_digest.len;
+	return verity_file_digest;
+}
+
+static void incfs_set_verity_digest(struct inode *inode,
+				     struct mem_range verity_file_digest)
+{
+	struct inode_info *node = get_incfs_node(inode);
+	struct data_file *df;
+
+	if (!node) {
+		pr_warn("Invalid inode\n");
+		kfree(verity_file_digest.data);
+		return;
+	}
+
+	df = node->n_file;
+	df->df_verity_file_digest.len = verity_file_digest.len;
+
+	/*
+	 * Multiple tasks may race to set ->df_verity_file_digest.data, so use
+	 * cmpxchg_release().  This pairs with the smp_load_acquire() in
+	 * incfs_get_verity_digest().  I.e., here we publish
+	 * ->df_verity_file_digest.data, with a RELEASE barrier so that other
+	 * tasks can ACQUIRE it.
+	 */
+	if (cmpxchg_release(&df->df_verity_file_digest.data, NULL,
+			    verity_file_digest.data) != NULL)
+		/* Lost the race, so free the file_digest we allocated. */
+		kfree(verity_file_digest.data);
+}
+
+/*
+ * Calculate the digest of the fsverity_descriptor. The signature (if present)
+ * is also checked.
+ */
+static struct mem_range incfs_calc_verity_digest_from_desc(
+					const struct inode *inode,
+					struct fsverity_descriptor *desc,
+					u8 *signature, size_t sig_size)
+{
+	enum incfs_hash_tree_algorithm incfs_hash_alg;
+	struct mem_range verity_file_digest;
+	int err;
+	struct incfs_hash_alg *hash_alg;
+
+	incfs_hash_alg = incfs_convert_fsverity_hash_alg(desc->hash_algorithm);
+	if (incfs_hash_alg < 0)
+		return range(ERR_PTR(incfs_hash_alg), 0);
+
+	hash_alg = incfs_get_hash_alg(incfs_hash_alg);
+	if (IS_ERR(hash_alg))
+		return range((u8 *)hash_alg, 0);
+
+	verity_file_digest = range(kzalloc(hash_alg->digest_size, GFP_KERNEL),
+				   hash_alg->digest_size);
+	if (!verity_file_digest.data)
+		return range(ERR_PTR(-ENOMEM), 0);
+
+	err = incfs_compute_file_digest(hash_alg, desc,
+					verity_file_digest.data);
+	if (err) {
+		pr_err("Error %d computing file digest", err);
+		goto out;
+	}
+	pr_debug("Computed file digest: %s:%*phN\n",
+		 hash_alg->name, (int) verity_file_digest.len,
+		 verity_file_digest.data);
+
+	err = __fsverity_verify_signature(inode, signature, sig_size,
+					  verity_file_digest.data,
+					  desc->hash_algorithm);
+out:
+	if (err) {
+		kfree(verity_file_digest.data);
+		verity_file_digest = range(ERR_PTR(err), 0);
+	}
+	return verity_file_digest;
+}
+
+static struct fsverity_descriptor *incfs_get_fsverity_descriptor(
+					struct file *filp, int hash_algorithm)
+{
+	struct inode *inode = file_inode(filp);
+	struct fsverity_descriptor *desc = kzalloc(sizeof(*desc), GFP_KERNEL);
+	int err;
+
+	if (!desc)
+		return ERR_PTR(-ENOMEM);
+
+	*desc = (struct fsverity_descriptor) {
+		.version = 1,
+		.hash_algorithm = hash_algorithm,
+		.log_blocksize = ilog2(INCFS_DATA_FILE_BLOCK_SIZE),
+		.data_size = cpu_to_le64(inode->i_size),
+	};
+
+	err = incfs_get_root_hash(filp, desc->root_hash);
+	if (err) {
+		kfree(desc);
+		return ERR_PTR(err);
+	}
+
+	return desc;
+}
+
+static struct mem_range incfs_calc_verity_digest(
+					struct inode *inode, struct file *filp,
+					u8 *signature, size_t signature_size,
+					int hash_algorithm)
+{
+	struct fsverity_descriptor *desc = incfs_get_fsverity_descriptor(filp,
+							hash_algorithm);
+	struct mem_range verity_file_digest;
+
+	if (IS_ERR(desc))
+		return range((u8 *)desc, 0);
+	verity_file_digest = incfs_calc_verity_digest_from_desc(inode, desc,
+						signature, signature_size);
+	kfree(desc);
+	return verity_file_digest;
+}
+
+static int incfs_build_merkle_tree(struct file *f, struct data_file *df,
+			     struct backing_file_context *bfc,
+			     struct mtree *hash_tree, loff_t hash_offset,
+			     struct incfs_hash_alg *alg, struct mem_range hash)
+{
+	int error = 0;
+	int limit, lvl, i, result;
+	struct mem_range buf = {.len = INCFS_DATA_FILE_BLOCK_SIZE};
+	struct mem_range tmp = {.len = 2 * INCFS_DATA_FILE_BLOCK_SIZE};
+
+	buf.data = (u8 *)__get_free_pages(GFP_NOFS, get_order(buf.len));
+	tmp.data = (u8 *)__get_free_pages(GFP_NOFS, get_order(tmp.len));
+	if (!buf.data || !tmp.data) {
+		error = -ENOMEM;
+		goto out;
+	}
+
+	/*
+	 * lvl - 1 is the level we are reading, lvl the level we are writing
+	 * lvl == -1 means actual blocks
+	 * lvl == hash_tree->depth means root hash
+	 */
+	limit = df->df_data_block_count;
+	for (lvl = 0; lvl <= hash_tree->depth; lvl++) {
+		for (i = 0; i < limit; ++i) {
+			loff_t hash_level_offset;
+			struct mem_range partial_buf = buf;
+
+			if (lvl == 0)
+				result = incfs_read_data_file_block(partial_buf,
+						f, i, tmp, NULL);
+			else {
+				hash_level_offset = hash_offset +
+				       hash_tree->hash_level_suboffset[lvl - 1];
+
+				result = incfs_kread(bfc, partial_buf.data,
+						partial_buf.len,
+						hash_level_offset + i *
+						INCFS_DATA_FILE_BLOCK_SIZE);
+			}
+
+			if (result < 0) {
+				error = result;
+				goto out;
+			}
+
+			partial_buf.len = result;
+			error = incfs_calc_digest(alg, partial_buf, hash);
+			if (error)
+				goto out;
+
+			/*
+			 * last level - only one hash to take and it is stored
+			 * in the incfs signature record
+			 */
+			if (lvl == hash_tree->depth)
+				break;
+
+			hash_level_offset = hash_offset +
+				hash_tree->hash_level_suboffset[lvl];
+
+			result = incfs_kwrite(bfc, hash.data, hash.len,
+					hash_level_offset + hash.len * i);
+
+			if (result < 0) {
+				error = result;
+				goto out;
+			}
+
+			if (result != hash.len) {
+				error = -EIO;
+				goto out;
+			}
+		}
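+		/* The next level up has one node per full block of child hashes. */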
+		limit = DIV_ROUND_UP(limit,
+				     INCFS_DATA_FILE_BLOCK_SIZE / hash.len);
+	}
+
+out:
+	free_pages((unsigned long)tmp.data, get_order(tmp.len));
+	free_pages((unsigned long)buf.data, get_order(buf.len));
+	return error;
+}
+
+/*
+ * incfs files have a signature record that is separate from the
+ * verity_signature record. The signature record does not actually contain a
+ * signature; rather, it contains the size/offset of the hash tree and a binary
+ * blob which contains the root hash and potentially a signature.
+ *
+ * If the file was created with a signature record, then this function simply
+ * returns.
+ *
+ * Otherwise it will create a signature record with a minimal binary blob as
+ * defined by the structure below, create space for the hash tree and then
+ * populate it using incfs_build_merkle_tree
+ */
+static int incfs_add_signature_record(struct file *f)
+{
+	/* See incfs_parse_signature */
+	struct {
+		__le32 version;
+		__le32 size_of_hash_info_section;
+		struct {
+			__le32 hash_algorithm;
+			u8 log2_blocksize;
+			__le32 salt_size;
+			u8 salt[0];
+			__le32 hash_size;
+			u8 root_hash[32];
+		} __packed hash_section;
+		__le32 size_of_signing_info_section;
+		u8 signing_info_section[0];
+	} __packed sig = {
+		.version = cpu_to_le32(INCFS_SIGNATURE_VERSION),
+		.size_of_hash_info_section =
+			cpu_to_le32(sizeof(sig.hash_section)),
+		.hash_section = {
+			.hash_algorithm = cpu_to_le32(INCFS_HASH_TREE_SHA256),
+			.log2_blocksize = ilog2(INCFS_DATA_FILE_BLOCK_SIZE),
+			.hash_size = cpu_to_le32(SHA256_DIGEST_SIZE),
+		},
+	};
+
+	struct data_file *df = get_incfs_data_file(f);
+	struct mtree *hash_tree = NULL;
+	struct backing_file_context *bfc;
+	int error;
+	loff_t hash_offset, sig_offset;
+	struct incfs_hash_alg *alg = incfs_get_hash_alg(INCFS_HASH_TREE_SHA256);
+	u8 hash_buf[INCFS_MAX_HASH_SIZE];
+	int hash_size = alg->digest_size;
+	struct mem_range hash = range(hash_buf, hash_size);
+	int result;
+	struct incfs_df_signature *signature = NULL;
+
+	if (!df)
+		return -EINVAL;
+
+	if (df->df_header_flags & INCFS_FILE_MAPPED)
+		return -EINVAL;
+
+	/* Already signed? */
+	if (df->df_signature && df->df_hash_tree)
+		return 0;
+
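+	/* A signature record without a hash tree (or vice versa) is corrupt. */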
+	if (df->df_signature || df->df_hash_tree)
+		return -EFSCORRUPTED;
+
+	/* Add signature metadata record to file */
+	hash_tree = incfs_alloc_mtree(range((u8 *)&sig, sizeof(sig)),
+				      df->df_data_block_count);
+	if (IS_ERR(hash_tree))
+		return PTR_ERR(hash_tree);
+
+	bfc = df->df_backing_file_context;
+	if (!bfc) {
+		error = -EFSCORRUPTED;
+		goto out;
+	}
+
+	error = mutex_lock_interruptible(&bfc->bc_mutex);
+	if (error)
+		goto out;
+
+	error = incfs_write_signature_to_backing_file(bfc,
+				range((u8 *)&sig, sizeof(sig)),
+				hash_tree->hash_tree_area_size,
+				&hash_offset, &sig_offset);
+	mutex_unlock(&bfc->bc_mutex);
+	if (error)
+		goto out;
+
+	/* Populate merkle tree */
+	error = incfs_build_merkle_tree(f, df, bfc, hash_tree, hash_offset, alg,
+				  hash);
+	if (error)
+		goto out;
+
+	/* Update signature metadata record */
+	memcpy(sig.hash_section.root_hash, hash.data, alg->digest_size);
+	result = incfs_kwrite(bfc, &sig, sizeof(sig), sig_offset);
+	if (result < 0) {
+		error = result;
+		goto out;
+	}
+
+	if (result != sizeof(sig)) {
+		error = -EIO;
+		goto out;
+	}
+
+	/* Update in-memory records */
+	memcpy(hash_tree->root_hash, hash.data, alg->digest_size);
+	signature = kzalloc(sizeof(*signature), GFP_NOFS);
+	if (!signature) {
+		error = -ENOMEM;
+		goto out;
+	}
+	*signature = (struct incfs_df_signature) {
+		.hash_offset = hash_offset,
+		.hash_size = hash_tree->hash_tree_area_size,
+		.sig_offset = sig_offset,
+		.sig_size = sizeof(sig),
+	};
+	df->df_signature = signature;
+	signature = NULL;
+
+	/*
+	 * Use memory barrier to prevent readpage seeing the hash tree until
+	 * it's fully there
+	 */
+	smp_store_release(&df->df_hash_tree, hash_tree);
+	hash_tree = NULL;
+
+out:
+	kfree(signature);
+	kfree(hash_tree);
+	return error;
+}
+
+static int incfs_enable_verity(struct file *filp,
+			 const struct fsverity_enable_arg *arg)
+{
+	struct inode *inode = file_inode(filp);
+	struct data_file *df = get_incfs_data_file(filp);
+	u8 *signature = NULL;
+	struct mem_range verity_file_digest = range(NULL, 0);
+	int err;
+
+	if (!df)
+		return -EFSCORRUPTED;
+
+	err = mutex_lock_interruptible(&df->df_enable_verity);
+	if (err)
+		return err;
+
+	if (IS_VERITY(inode)) {
+		err = -EEXIST;
+		goto out;
+	}
+
+	err = incfs_add_signature_record(filp);
+	if (err)
+		goto out;
+
+	/* Get the signature if the user provided one */
+	if (arg->sig_size) {
+		signature = memdup_user(u64_to_user_ptr(arg->sig_ptr),
+					arg->sig_size);
+		if (IS_ERR(signature)) {
+			err = PTR_ERR(signature);
+			signature = NULL;
+			goto out;
+		}
+	}
+
+	verity_file_digest = incfs_calc_verity_digest(inode, filp, signature,
+					arg->sig_size, arg->hash_algorithm);
+	if (IS_ERR(verity_file_digest.data)) {
+		err = PTR_ERR(verity_file_digest.data);
+		verity_file_digest.data = NULL;
+		goto out;
+	}
+
+	err = incfs_end_enable_verity(filp, signature, arg->sig_size);
+	if (err)
+		goto out;
+
+	/* Successfully enabled verity */
+	incfs_set_verity_digest(inode, verity_file_digest);
+	verity_file_digest.data = NULL;
+out:
+	mutex_unlock(&df->df_enable_verity);
+	kfree(signature);
+	kfree(verity_file_digest.data);
+	if (err)
+		pr_err("%s failed with err %d\n", __func__, err);
+	return err;
+}
+
+int incfs_ioctl_enable_verity(struct file *filp, const void __user *uarg)
+{
+	struct inode *inode = file_inode(filp);
+	struct fsverity_enable_arg arg;
+
+	if (copy_from_user(&arg, uarg, sizeof(arg)))
+		return -EFAULT;
+
+	if (arg.version != 1)
+		return -EINVAL;
+
+	if (arg.__reserved1 ||
+	    memchr_inv(arg.__reserved2, 0, sizeof(arg.__reserved2)))
+		return -EINVAL;
+
+	if (arg.hash_algorithm != FS_VERITY_HASH_ALG_SHA256)
+		return -EINVAL;
+
+	if (arg.block_size != PAGE_SIZE)
+		return -EINVAL;
+
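+	/* Salted hashes are not supported on incfs files. */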
+	if (arg.salt_size)
+		return -EINVAL;
+
+	if (arg.sig_size > FS_VERITY_MAX_SIGNATURE_SIZE)
+		return -EMSGSIZE;
+
+	if (S_ISDIR(inode->i_mode))
+		return -EISDIR;
+
+	if (!S_ISREG(inode->i_mode))
+		return -EINVAL;
+
+	return incfs_enable_verity(filp, &arg);
+}
+
+static u8 *incfs_get_verity_signature(struct file *filp, size_t *sig_size)
+{
+	struct data_file *df = get_incfs_data_file(filp);
+	struct incfs_df_verity_signature *vs;
+	u8 *signature;
+	int res;
+
+	if (!df || !df->df_backing_file_context)
+		return ERR_PTR(-EFSCORRUPTED);
+
+	vs = df->df_verity_signature;
+	if (!vs) {
+		*sig_size = 0;
+		return NULL;
+	}
+
+	if (!vs->size) {
+		*sig_size = 0;
+		return ERR_PTR(-EFSCORRUPTED);
+	}
+
+	signature = kzalloc(vs->size, GFP_KERNEL);
+	if (!signature)
+		return ERR_PTR(-ENOMEM);
+
+	res = incfs_kread(df->df_backing_file_context,
+			  signature, vs->size, vs->offset);
+
+	if (res < 0)
+		goto err_out;
+
+	if (res != vs->size) {
+		res = -EINVAL;
+		goto err_out;
+	}
+
+	*sig_size = vs->size;
+	return signature;
+
+err_out:
+	kfree(signature);
+	return ERR_PTR(res);
+}
+
+/* Ensure data_file->df_verity_file_digest is populated */
+static int ensure_verity_info(struct inode *inode, struct file *filp)
+{
+	struct mem_range verity_file_digest;
+	u8 *signature = NULL;
+	size_t sig_size;
+	int err = 0;
+
+	/* See if this file's verity file digest is already cached */
+	verity_file_digest = incfs_get_verity_digest(inode);
+	if (verity_file_digest.data)
+		return 0;
+
+	signature = incfs_get_verity_signature(filp, &sig_size);
+	if (IS_ERR(signature))
+		return PTR_ERR(signature);
+
+	verity_file_digest = incfs_calc_verity_digest(inode, filp, signature,
+						     sig_size,
+						     FS_VERITY_HASH_ALG_SHA256);
+	if (IS_ERR(verity_file_digest.data)) {
+		err = PTR_ERR(verity_file_digest.data);
+		goto out;
+	}
+
+	incfs_set_verity_digest(inode, verity_file_digest);
+
+out:
+	kfree(signature);
+	return err;
+}
+
+/**
+ * incfs_fsverity_file_open() - prepare to open a file that may be
+ * verity-enabled
+ * @inode: the inode being opened
+ * @filp: the struct file being set up
+ *
+ * When opening a verity file, set up data_file->df_verity_file_digest if not
+ * already done. Note that incfs does not allow opening for writing, so there is
+ * no need for that check.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int incfs_fsverity_file_open(struct inode *inode, struct file *filp)
+{
+	if (IS_VERITY(inode))
+		return ensure_verity_info(inode, filp);
+
+	return 0;
+}
+
+int incfs_ioctl_measure_verity(struct file *filp, void __user *_uarg)
+{
+	struct inode *inode = file_inode(filp);
+	struct mem_range verity_file_digest = incfs_get_verity_digest(inode);
+	struct fsverity_digest __user *uarg = _uarg;
+	struct fsverity_digest arg;
+
+	if (!verity_file_digest.data || !verity_file_digest.len)
+		return -ENODATA; /* not a verity file */
+
+	/*
+	 * The user specifies the digest_size their buffer has space for; we can
+	 * return the digest if it fits in the available space.  We write back
+	 * the actual size, which may be shorter than the user-specified size.
+	 */
+
+	if (get_user(arg.digest_size, &uarg->digest_size))
+		return -EFAULT;
+	if (arg.digest_size < verity_file_digest.len)
+		return -EOVERFLOW;
+
+	memset(&arg, 0, sizeof(arg));
+	arg.digest_algorithm = FS_VERITY_HASH_ALG_SHA256;
+	arg.digest_size = verity_file_digest.len;
+
+	if (copy_to_user(uarg, &arg, sizeof(arg)))
+		return -EFAULT;
+
+	if (copy_to_user(uarg->digest, verity_file_digest.data,
+			 verity_file_digest.len))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int incfs_read_merkle_tree(struct file *filp, void __user *buf,
+				  u64 start_offset, int length)
+{
+	struct mem_range tmp_buf;
+	size_t offset;
+	int retval = 0;
+	int err = 0;
+	struct data_file *df = get_incfs_data_file(filp);
+
+	if (!df)
+		return -EINVAL;
+
+	tmp_buf = (struct mem_range) {
+		.data = kzalloc(INCFS_DATA_FILE_BLOCK_SIZE, GFP_NOFS),
+		.len = INCFS_DATA_FILE_BLOCK_SIZE,
+	};
+	if (!tmp_buf.data)
+		return -ENOMEM;
+
+	for (offset = start_offset; offset < start_offset + length;
+	     offset += tmp_buf.len) {
+		err = incfs_read_merkle_tree_blocks(tmp_buf, df, offset);
+
+		if (err < 0)
+			break;
+
+		if (err != tmp_buf.len)
+			break;
+
+		if (copy_to_user(buf, tmp_buf.data, tmp_buf.len)) {
+			err = -EFAULT;
+			break;
+		}
+
+		buf += tmp_buf.len;
+		retval += tmp_buf.len;
+	}
+
+	kfree(tmp_buf.data);
+	return retval ? retval : err;
+}
+
+static int incfs_read_descriptor(struct file *filp,
+				 void __user *buf, u64 offset, int length)
+{
+	int err;
+	struct fsverity_descriptor *desc = incfs_get_fsverity_descriptor(filp,
+						FS_VERITY_HASH_ALG_SHA256);
+
+	if (IS_ERR(desc))
+		return PTR_ERR(desc);
+	length = min_t(u64, length, sizeof(*desc));
+	err = copy_to_user(buf, desc, length);
+	kfree(desc);
+	/* copy_to_user() returns bytes left uncopied, not an errno. */
+	return err ? -EFAULT : length;
+}
+
+static int incfs_read_signature(struct file *filp,
+				void __user *buf, u64 offset, int length)
+{
+	size_t sig_size;
+	u8 *signature;
+	int err;
+
+	signature = incfs_get_verity_signature(filp, &sig_size);
+	if (IS_ERR(signature))
+		return PTR_ERR(signature);
+
+	if (!signature)
+		return -ENODATA;
+
+	length = min_t(u64, length, sig_size);
+	err = copy_to_user(buf, signature, length);
+	kfree(signature);
+	return err ? -EFAULT : length;
+}
+
+int incfs_ioctl_read_verity_metadata(struct file *filp,
+				     const void __user *uarg)
+{
+	struct fsverity_read_metadata_arg arg;
+	int length;
+	void __user *buf;
+
+	if (copy_from_user(&arg, uarg, sizeof(arg)))
+		return -EFAULT;
+
+	if (arg.__reserved)
+		return -EINVAL;
+
+	/* offset + length must not overflow. */
+	if (arg.offset + arg.length < arg.offset)
+		return -EINVAL;
+
+	/* Ensure that the return value will fit in INT_MAX. */
+	length = min_t(u64, arg.length, INT_MAX);
+
+	buf = u64_to_user_ptr(arg.buf_ptr);
+
+	switch (arg.metadata_type) {
+	case FS_VERITY_METADATA_TYPE_MERKLE_TREE:
+		return incfs_read_merkle_tree(filp, buf, arg.offset, length);
+	case FS_VERITY_METADATA_TYPE_DESCRIPTOR:
+		return incfs_read_descriptor(filp, buf, arg.offset, length);
+	case FS_VERITY_METADATA_TYPE_SIGNATURE:
+		return incfs_read_signature(filp, buf, arg.offset, length);
+	default:
+		return -EINVAL;
+	}
+}
diff --git a/fs/incfs/verity.h b/fs/incfs/verity.h
new file mode 100644
index 0000000..8fcdbc8
--- /dev/null
+++ b/fs/incfs/verity.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Google LLC
+ */
+
+#ifndef _INCFS_VERITY_H
+#define _INCFS_VERITY_H
+
+/* Arbitrary limit to bound the kmalloc() size.  Can be changed. */
+#define FS_VERITY_MAX_SIGNATURE_SIZE	16128
+
+#ifdef CONFIG_FS_VERITY
+
+int incfs_ioctl_enable_verity(struct file *filp, const void __user *uarg);
+int incfs_ioctl_measure_verity(struct file *filp, void __user *_uarg);
+
+int incfs_fsverity_file_open(struct inode *inode, struct file *filp);
+int incfs_ioctl_read_verity_metadata(struct file *filp,
+				     const void __user *uarg);
+
+#else /* !CONFIG_FS_VERITY */
+
+static inline int incfs_ioctl_enable_verity(struct file *filp,
+					    const void __user *uarg)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int incfs_ioctl_measure_verity(struct file *filp,
+					     void __user *_uarg)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int incfs_fsverity_file_open(struct inode *inode,
+					   struct file *filp)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int incfs_ioctl_read_verity_metadata(struct file *filp,
+						const void __user *uarg)
+{
+	return -EOPNOTSUPP;
+}
+
+#endif /* !CONFIG_FS_VERITY */
+
+#endif
diff --git a/fs/incfs/vfs.c b/fs/incfs/vfs.c
new file mode 100644
index 0000000..8f784e0
--- /dev/null
+++ b/fs/incfs/vfs.c
@@ -0,0 +1,1967 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018 Google LLC
+ */
+
+#include <linux/blkdev.h>
+#include <linux/compat.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/fs_stack.h>
+#include <linux/fsnotify.h>
+#include <linux/fsverity.h>
+#include <linux/mmap_lock.h>
+#include <linux/namei.h>
+#include <linux/pagemap.h>
+#include <linux/parser.h>
+#include <linux/seq_file.h>
+#include <linux/backing-dev-defs.h>
+
+#include <uapi/linux/incrementalfs.h>
+
+#include "vfs.h"
+
+#include "data_mgmt.h"
+#include "format.h"
+#include "internal.h"
+#include "pseudo_files.h"
+#include "sysfs.h"
+#include "verity.h"
+
+static int incfs_remount_fs(struct super_block *sb, int *flags, char *data);
+
+static int dentry_revalidate(struct dentry *dentry, unsigned int flags);
+static void dentry_release(struct dentry *d);
+
+static int iterate_incfs_dir(struct file *file, struct dir_context *ctx);
+static struct dentry *dir_lookup(struct inode *dir_inode,
+		struct dentry *dentry, unsigned int flags);
+static int dir_mkdir(struct user_namespace *ns, struct inode *dir,
+		     struct dentry *dentry, umode_t mode);
+static int dir_unlink(struct inode *dir, struct dentry *dentry);
+static int dir_link(struct dentry *old_dentry, struct inode *dir,
+			 struct dentry *new_dentry);
+static int dir_rmdir(struct inode *dir, struct dentry *dentry);
+static int dir_rename(struct inode *old_dir, struct dentry *old_dentry,
+		struct inode *new_dir, struct dentry *new_dentry,
+		unsigned int flags);
+
+static int file_open(struct inode *inode, struct file *file);
+static int file_release(struct inode *inode, struct file *file);
+static int read_folio(struct file *f, struct folio *folio);
+static long dispatch_ioctl(struct file *f, unsigned int req, unsigned long arg);
+
+#ifdef CONFIG_COMPAT
+static long incfs_compat_ioctl(struct file *file, unsigned int cmd,
+			 unsigned long arg);
+#endif
+
+static struct inode *alloc_inode(struct super_block *sb);
+static void free_inode(struct inode *inode);
+static void evict_inode(struct inode *inode);
+
+static int incfs_setattr(struct user_namespace *ns, struct dentry *dentry,
+			 struct iattr *ia);
+static int incfs_getattr(struct user_namespace *ns, const struct path *path,
+			 struct kstat *stat, u32 request_mask,
+			 unsigned int query_flags);
+static ssize_t incfs_getxattr(struct dentry *d, const char *name,
+			void *value, size_t size);
+static ssize_t incfs_setxattr(struct user_namespace *ns, struct dentry *d,
+			      const char *name, void *value, size_t size,
+			      int flags);
+static ssize_t incfs_listxattr(struct dentry *d, char *list, size_t size);
+
+static int show_options(struct seq_file *, struct dentry *);
+
+static const struct super_operations incfs_super_ops = {
+	.statfs = simple_statfs,
+	.remount_fs = incfs_remount_fs,
+	.alloc_inode	= alloc_inode,
+	.destroy_inode	= free_inode,
+	.evict_inode = evict_inode,
+	.show_options = show_options
+};
+
+static int dir_rename_wrap(struct user_namespace *ns, struct inode *old_dir,
+			   struct dentry *old_dentry, struct inode *new_dir,
+			   struct dentry *new_dentry, unsigned int flags)
+{
+	return dir_rename(old_dir, old_dentry, new_dir, new_dentry, flags);
+}
+
+static const struct inode_operations incfs_dir_inode_ops = {
+	.lookup = dir_lookup,
+	.mkdir = dir_mkdir,
+	.rename = dir_rename_wrap,
+	.unlink = dir_unlink,
+	.link = dir_link,
+	.rmdir = dir_rmdir,
+	.setattr = incfs_setattr,
+};
+
+static const struct file_operations incfs_dir_fops = {
+	.llseek = generic_file_llseek,
+	.read = generic_read_dir,
+	.iterate = iterate_incfs_dir,
+	.open = file_open,
+	.release = file_release,
+};
+
+static const struct dentry_operations incfs_dentry_ops = {
+	.d_revalidate = dentry_revalidate,
+	.d_release = dentry_release
+};
+
+static const struct address_space_operations incfs_address_space_ops = {
+	.read_folio = read_folio,
+	/* .readpages = readpages */
+};
+
+static vm_fault_t incfs_fault(struct vm_fault *vmf)
+{
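+	/*
+	 * incfs reads can block for a long time waiting for userspace to
+	 * supply data, so (presumably for that reason) the fault is served in
+	 * a single pass instead of the drop-mmap_lock-and-retry path.
+	 */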
+	vmf->flags &= ~FAULT_FLAG_ALLOW_RETRY;
+	return filemap_fault(vmf);
+}
+
+static const struct vm_operations_struct incfs_file_vm_ops = {
+	.fault		= incfs_fault,
+	.map_pages	= filemap_map_pages,
+	.page_mkwrite	= filemap_page_mkwrite,
+};
+
+/* This is used for a general mmap of a disk file */
+
+static int incfs_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	if (!mapping->a_ops->read_folio)
+		return -ENOEXEC;
+	file_accessed(file);
+	vma->vm_ops = &incfs_file_vm_ops;
+	return 0;
+}
+
+const struct file_operations incfs_file_ops = {
+	.open = file_open,
+	.release = file_release,
+	.read_iter = generic_file_read_iter,
+	.mmap = incfs_file_mmap,
+	.splice_read = generic_file_splice_read,
+	.llseek = generic_file_llseek,
+	.unlocked_ioctl = dispatch_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl = incfs_compat_ioctl,
+#endif
+};
+
+const struct inode_operations incfs_file_inode_ops = {
+	.setattr = incfs_setattr,
+	.getattr = incfs_getattr,
+	.listxattr = incfs_listxattr
+};
+
+static int incfs_handler_getxattr(const struct xattr_handler *xh,
+				  struct dentry *d, struct inode *inode,
+				  const char *name, void *buffer, size_t size)
+{
+	return incfs_getxattr(d, name, buffer, size);
+}
+
+static int incfs_handler_setxattr(const struct xattr_handler *xh,
+				  struct user_namespace *ns,
+				  struct dentry *d, struct inode *inode,
+				  const char *name, const void *buffer,
+				  size_t size, int flags)
+{
+	return incfs_setxattr(ns, d, name, (void *)buffer, size, flags);
+}
+
+static const struct xattr_handler incfs_xattr_handler = {
+	.prefix = "",	/* AKA all attributes */
+	.get = incfs_handler_getxattr,
+	.set = incfs_handler_setxattr,
+};
+
+static const struct xattr_handler *incfs_xattr_ops[] = {
+	&incfs_xattr_handler,
+	NULL,
+};
+
+struct inode_search {
+	unsigned long ino;
+
+	struct dentry *backing_dentry;
+
+	size_t size;
+
+	bool verity;
+};
+
+enum parse_parameter {
+	Opt_read_timeout,
+	Opt_readahead_pages,
+	Opt_rlog_pages,
+	Opt_rlog_wakeup_cnt,
+	Opt_report_uid,
+	Opt_sysfs_name,
+	Opt_err
+};
+
+static const match_table_t option_tokens = {
+	{ Opt_read_timeout, "read_timeout_ms=%u" },
+	{ Opt_readahead_pages, "readahead=%u" },
+	{ Opt_rlog_pages, "rlog_pages=%u" },
+	{ Opt_rlog_wakeup_cnt, "rlog_wakeup_cnt=%u" },
+	{ Opt_report_uid, "report_uid" },
+	{ Opt_sysfs_name, "sysfs_name=%s" },
+	{ Opt_err, NULL }
+};
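+
+/*
+ * Illustrative mount invocation exercising these options (paths and values
+ * are examples only; the filesystem type string comes from INCFS_NAME):
+ *
+ *   mount -t incremental-fs -o read_timeout_ms=5000,rlog_pages=4,report_uid \
+ *         /data/backing /mnt/incfs
+ */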
+
+static void free_options(struct mount_options *opts)
+{
+	kfree(opts->sysfs_name);
+	opts->sysfs_name = NULL;
+}
+
+static int parse_options(struct mount_options *opts, char *str)
+{
+	substring_t args[MAX_OPT_ARGS];
+	int value;
+	char *position;
+
+	if (opts == NULL)
+		return -EFAULT;
+
+	*opts = (struct mount_options) {
+		.read_timeout_ms = 1000, /* Default: 1s */
+		.readahead_pages = 10,
+		.read_log_pages = 2,
+		.read_log_wakeup_count = 10,
+	};
+
+	if (str == NULL || *str == 0)
+		return 0;
+
+	while ((position = strsep(&str, ",")) != NULL) {
+		int token;
+
+		if (!*position)
+			continue;
+
+		token = match_token(position, option_tokens, args);
+
+		switch (token) {
+		case Opt_read_timeout:
+			if (match_int(&args[0], &value))
+				return -EINVAL;
+			if (value > 3600000)
+				return -EINVAL;
+			opts->read_timeout_ms = value;
+			break;
+		case Opt_readahead_pages:
+			if (match_int(&args[0], &value))
+				return -EINVAL;
+			opts->readahead_pages = value;
+			break;
+		case Opt_rlog_pages:
+			if (match_int(&args[0], &value))
+				return -EINVAL;
+			opts->read_log_pages = value;
+			break;
+		case Opt_rlog_wakeup_cnt:
+			if (match_int(&args[0], &value))
+				return -EINVAL;
+			opts->read_log_wakeup_count = value;
+			break;
+		case Opt_report_uid:
+			opts->report_uid = true;
+			break;
+		case Opt_sysfs_name:
+			opts->sysfs_name = match_strdup(&args[0]);
+			break;
+		default:
+			free_options(opts);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+/* Read file size from the attribute. Quicker than reading the header */
+static u64 read_size_attr(struct dentry *backing_dentry)
+{
+	__le64 attr_value;
+	ssize_t bytes_read;
+
+	bytes_read = vfs_getxattr(&init_user_ns, backing_dentry, INCFS_XATTR_SIZE_NAME,
+			(char *)&attr_value, sizeof(attr_value));
+
+	if (bytes_read != sizeof(attr_value))
+		return 0;
+
+	return le64_to_cpu(attr_value);
+}
+
+/* Read verity flag from the attribute. Quicker than reading the header */
+static bool read_verity_attr(struct dentry *backing_dentry)
+{
+	return vfs_getxattr(&init_user_ns, backing_dentry, INCFS_XATTR_VERITY_NAME, NULL, 0)
+		>= 0;
+}
+
+static int inode_test(struct inode *inode, void *opaque)
+{
+	struct inode_search *search = opaque;
+	struct inode_info *node = get_incfs_node(inode);
+	struct inode *backing_inode = d_inode(search->backing_dentry);
+
+	if (!node)
+		return 0;
+
+	return node->n_backing_inode == backing_inode &&
+		inode->i_ino == search->ino;
+}
+
+static int inode_set(struct inode *inode, void *opaque)
+{
+	struct inode_search *search = opaque;
+	struct inode_info *node = get_incfs_node(inode);
+	struct dentry *backing_dentry = search->backing_dentry;
+	struct inode *backing_inode = d_inode(backing_dentry);
+
+	fsstack_copy_attr_all(inode, backing_inode);
+	if (S_ISREG(inode->i_mode)) {
+		u64 size = search->size;
+
+		inode->i_size = size;
+		inode->i_blocks = get_blocks_count_for_size(size);
+		inode->i_mapping->a_ops = &incfs_address_space_ops;
+		inode->i_op = &incfs_file_inode_ops;
+		inode->i_fop = &incfs_file_ops;
+		inode->i_mode &= ~0222;
+		if (search->verity)
+			inode_set_flags(inode, S_VERITY, S_VERITY);
+	} else if (S_ISDIR(inode->i_mode)) {
+		inode->i_size = 0;
+		inode->i_blocks = 1;
+		inode->i_mapping->a_ops = &incfs_address_space_ops;
+		inode->i_op = &incfs_dir_inode_ops;
+		inode->i_fop = &incfs_dir_fops;
+	} else {
+		pr_warn_once("incfs: Unexpected inode type\n");
+		return -EBADF;
+	}
+
+	ihold(backing_inode);
+	node->n_backing_inode = backing_inode;
+	node->n_mount_info = get_mount_info(inode->i_sb);
+	inode->i_ctime = backing_inode->i_ctime;
+	inode->i_mtime = backing_inode->i_mtime;
+	inode->i_atime = backing_inode->i_atime;
+	inode->i_ino = backing_inode->i_ino;
+	if (backing_inode->i_ino < INCFS_START_INO_RANGE) {
+		pr_warn("incfs: ino conflict with backing FS %ld\n",
+			backing_inode->i_ino);
+	}
+
+	return 0;
+}
+
+static struct inode *fetch_regular_inode(struct super_block *sb,
+					struct dentry *backing_dentry)
+{
+	struct inode *backing_inode = d_inode(backing_dentry);
+	struct inode_search search = {
+		.ino = backing_inode->i_ino,
+		.backing_dentry = backing_dentry,
+		.size = read_size_attr(backing_dentry),
+		.verity = read_verity_attr(backing_dentry),
+	};
+	struct inode *inode = iget5_locked(sb, search.ino, inode_test,
+				inode_set, &search);
+
+	if (!inode)
+		return ERR_PTR(-ENOMEM);
+
+	if (inode->i_state & I_NEW)
+		unlock_new_inode(inode);
+
+	return inode;
+}
+
+static int iterate_incfs_dir(struct file *file, struct dir_context *ctx)
+{
+	struct dir_file *dir = get_incfs_dir_file(file);
+	int error = 0;
+	struct mount_info *mi = get_mount_info(file_superblock(file));
+	bool root;
+
+	if (!dir) {
+		error = -EBADF;
+		goto out;
+	}
+
+	root = dir->backing_dir->f_inode
+			== d_inode(mi->mi_backing_dir_path.dentry);
+
+	if (root) {
+		error = emit_pseudo_files(ctx);
+		if (error)
+			goto out;
+	}
+
+	ctx->pos -= PSEUDO_FILE_COUNT;
+	error = iterate_dir(dir->backing_dir, ctx);
+	ctx->pos += PSEUDO_FILE_COUNT;
+	file->f_pos = dir->backing_dir->f_pos;
+out:
+	if (error)
+		pr_warn("incfs: %s %s %d\n", __func__,
+			file->f_path.dentry->d_name.name, error);
+	return error;
+}
+
+static int incfs_init_dentry(struct dentry *dentry, struct path *path)
+{
+	struct dentry_info *d_info = NULL;
+
+	if (!dentry || !path)
+		return -EFAULT;
+
+	d_info = kzalloc(sizeof(*d_info), GFP_NOFS);
+	if (!d_info)
+		return -ENOMEM;
+
+	d_info->backing_path = *path;
+	path_get(path);
+
+	dentry->d_fsdata = d_info;
+	return 0;
+}
+
+static struct dentry *open_or_create_special_dir(struct dentry *backing_dir,
+						 const char *name,
+						 bool *created)
+{
+	struct dentry *index_dentry;
+	struct inode *backing_inode = d_inode(backing_dir);
+	int err = 0;
+
+	index_dentry = incfs_lookup_dentry(backing_dir, name);
+	if (!index_dentry) {
+		return ERR_PTR(-EINVAL);
+	} else if (IS_ERR(index_dentry)) {
+		return index_dentry;
+	} else if (d_really_is_positive(index_dentry)) {
+		/* Index already exists. */
+		*created = false;
+		return index_dentry;
+	}
+
+	/* Index needs to be created. */
+	inode_lock_nested(backing_inode, I_MUTEX_PARENT);
+	err = vfs_mkdir(&init_user_ns, backing_inode, index_dentry, 0777);
+	inode_unlock(backing_inode);
+
+	if (err) {
+		dput(index_dentry);
+		return ERR_PTR(err);
+	}
+
+	if (!d_really_is_positive(index_dentry) ||
+		unlikely(d_unhashed(index_dentry))) {
+		dput(index_dentry);
+		return ERR_PTR(-EINVAL);
+	}
+
+	*created = true;
+	return index_dentry;
+}
+
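+/*
+ * Work out the read timeouts for the calling UID: an entry in the mount's
+ * per-UID timeout table takes precedence; otherwise max_pending_time_us
+ * falls back to the read_timeout_ms mount option (clamped to U32_MAX us).
+ */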
+static int read_single_page_timeouts(struct data_file *df, struct file *f,
+				     int block_index, struct mem_range range,
+				     struct mem_range tmp)
+{
+	struct mount_info *mi = df->df_mount_info;
+	struct incfs_read_data_file_timeouts timeouts = {
+		.max_pending_time_us = U32_MAX,
+	};
+	int uid = current_uid().val;
+	int i;
+
+	spin_lock(&mi->mi_per_uid_read_timeouts_lock);
+	for (i = 0; i < mi->mi_per_uid_read_timeouts_size /
+		sizeof(*mi->mi_per_uid_read_timeouts); ++i) {
+		struct incfs_per_uid_read_timeouts *t =
+			&mi->mi_per_uid_read_timeouts[i];
+
+		if (t->uid == uid) {
+			timeouts.min_time_us = t->min_time_us;
+			timeouts.min_pending_time_us = t->min_pending_time_us;
+			timeouts.max_pending_time_us = t->max_pending_time_us;
+			break;
+		}
+	}
+	spin_unlock(&mi->mi_per_uid_read_timeouts_lock);
+	if (timeouts.max_pending_time_us == U32_MAX) {
+		u64 read_timeout_us = (u64)mi->mi_options.read_timeout_ms *
+					1000;
+
+		timeouts.max_pending_time_us = read_timeout_us <= U32_MAX ?
+					       read_timeout_us : U32_MAX;
+	}
+
+	return incfs_read_data_file_block(range, f, block_index, tmp,
+					  &timeouts);
+}
+
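+/*
+ * ->read_folio() for incfs: map the page, find the backing data block for
+ * this file offset and read it with the UID-specific timeouts. tmp is a
+ * two-block scratch buffer for the low-level read (enough to hold a raw
+ * block plus its unpacked copy, e.g. when blocks are stored compressed);
+ * anything past EOF or a short read is zero-filled.
+ */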
+static int read_folio(struct file *f, struct folio *folio)
+{
+	struct page *page = &folio->page;
+	loff_t offset = 0;
+	loff_t size = 0;
+	ssize_t bytes_to_read = 0;
+	ssize_t read_result = 0;
+	struct data_file *df = get_incfs_data_file(f);
+	int result = 0;
+	void *page_start;
+	int block_index;
+
+	if (!df) {
+		SetPageError(page);
+		unlock_page(page);
+		return -EBADF;
+	}
+
+	page_start = kmap(page);
+	offset = page_offset(page);
+	block_index = (offset + df->df_mapped_offset) /
+		INCFS_DATA_FILE_BLOCK_SIZE;
+	size = df->df_size;
+
+	if (offset < size) {
+		struct mem_range tmp = {
+			.len = 2 * INCFS_DATA_FILE_BLOCK_SIZE
+		};
+		tmp.data = (u8 *)__get_free_pages(GFP_NOFS, get_order(tmp.len));
+		if (!tmp.data) {
+			read_result = -ENOMEM;
+			goto err;
+		}
+		bytes_to_read = min_t(loff_t, size - offset, PAGE_SIZE);
+
+		read_result = read_single_page_timeouts(df, f, block_index,
+					range(page_start, bytes_to_read), tmp);
+
+		free_pages((unsigned long)tmp.data, get_order(tmp.len));
+	} else {
+		bytes_to_read = 0;
+		read_result = 0;
+	}
+
+err:
+	if (read_result < 0)
+		result = read_result;
+	else if (read_result < PAGE_SIZE)
+		zero_user(page, read_result, PAGE_SIZE - read_result);
+
+	if (result == 0)
+		SetPageUptodate(page);
+	else
+		SetPageError(page);
+
+	flush_dcache_page(page);
+	kunmap(page);
+	unlock_page(page);
+	return result;
+}
+
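+/*
+ * Thin vfs_link()/vfs_unlink()/vfs_rmdir() wrappers that take the backing
+ * parent's inode lock with I_MUTEX_PARENT nesting; used both for normal
+ * entries and for the hardlinks kept under .index and .incomplete.
+ */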
+int incfs_link(struct dentry *what, struct dentry *where)
+{
+	struct dentry *parent_dentry = dget_parent(where);
+	struct inode *pinode = d_inode(parent_dentry);
+	int error = 0;
+
+	inode_lock_nested(pinode, I_MUTEX_PARENT);
+	error = vfs_link(what, &init_user_ns, pinode, where, NULL);
+	inode_unlock(pinode);
+
+	dput(parent_dentry);
+	return error;
+}
+
+int incfs_unlink(struct dentry *dentry)
+{
+	struct dentry *parent_dentry = dget_parent(dentry);
+	struct inode *pinode = d_inode(parent_dentry);
+	int error = 0;
+
+	inode_lock_nested(pinode, I_MUTEX_PARENT);
+	error = vfs_unlink(&init_user_ns, pinode, dentry, NULL);
+	inode_unlock(pinode);
+
+	dput(parent_dentry);
+	return error;
+}
+
+static int incfs_rmdir(struct dentry *dentry)
+{
+	struct dentry *parent_dentry = dget_parent(dentry);
+	struct inode *pinode = d_inode(parent_dentry);
+	int error = 0;
+
+	inode_lock_nested(pinode, I_MUTEX_PARENT);
+	error = vfs_rmdir(&init_user_ns, pinode, dentry);
+	inode_unlock(pinode);
+
+	dput(parent_dentry);
+	return error;
+}
+
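+/*
+ * Emit an fsnotify unlink event for the hardlink kept under the given
+ * special directory (.index or .incomplete) and drop its dentry, so that
+ * watchers of those directories see the entry disappear.
+ */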
+static void notify_unlink(struct dentry *dentry, const char *file_id_str,
+			  const char *special_directory)
+{
+	struct dentry *root = dentry;
+	struct dentry *file = NULL;
+	struct dentry *dir = NULL;
+	int error = 0;
+	bool take_lock = root->d_parent != root->d_parent->d_parent;
+
+	while (root != root->d_parent)
+		root = root->d_parent;
+
+	if (take_lock)
+		dir = incfs_lookup_dentry(root, special_directory);
+	else
+		dir = lookup_one_len(special_directory, root,
+				     strlen(special_directory));
+
+	if (IS_ERR(dir)) {
+		error = PTR_ERR(dir);
+		goto out;
+	}
+	if (d_is_negative(dir)) {
+		error = -ENOENT;
+		goto out;
+	}
+
+	file = incfs_lookup_dentry(dir, file_id_str);
+	if (IS_ERR(file)) {
+		error = PTR_ERR(file);
+		goto out;
+	}
+	if (d_is_negative(file)) {
+		error = -ENOENT;
+		goto out;
+	}
+
+	fsnotify_unlink(d_inode(dir), file);
+	d_delete(file);
+
+out:
+	if (error)
+		pr_warn("%s failed with error %d\n", __func__, error);
+
+	dput(dir);
+	dput(file);
+}
+
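+/*
+ * Called after filling blocks: once every data block has been written,
+ * truncate the backing file to its current size to drop any space
+ * preallocated past EOF, then remove the file's hardlink from .incomplete.
+ */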
+static void maybe_delete_incomplete_file(struct file *f,
+					 struct data_file *df)
+{
+	struct backing_file_context *bfc;
+	struct mount_info *mi = df->df_mount_info;
+	char *file_id_str = NULL;
+	struct dentry *incomplete_file_dentry = NULL;
+	const struct cred *old_cred = override_creds(mi->mi_owner);
+	int error;
+
+	if (atomic_read(&df->df_data_blocks_written) < df->df_data_block_count)
+		goto out;
+
+	/* Truncate file to remove any preallocated space */
+	bfc = df->df_backing_file_context;
+	if (bfc) {
+		struct file *bf = bfc->bc_file;
+
+		if (bf) {
+			loff_t size = i_size_read(file_inode(bf));
+
+			error = vfs_truncate(&bf->f_path, size);
+			if (error)
+				/* No useful action on failure */
+				pr_warn("incfs: Failed to truncate complete file: %d\n",
+					error);
+		}
+	}
+
+	/* This is best effort - there is no useful action to take on failure */
+	file_id_str = file_id_to_str(df->df_id);
+	if (!file_id_str)
+		goto out;
+
+	incomplete_file_dentry = incfs_lookup_dentry(
+					df->df_mount_info->mi_incomplete_dir,
+					file_id_str);
+	if (!incomplete_file_dentry || IS_ERR(incomplete_file_dentry)) {
+		incomplete_file_dentry = NULL;
+		goto out;
+	}
+
+	if (!d_really_is_positive(incomplete_file_dentry))
+		goto out;
+
+	vfs_fsync(df->df_backing_file_context->bc_file, 0);
+	error = incfs_unlink(incomplete_file_dentry);
+	if (error) {
+		pr_warn("incfs: Deleting incomplete file failed: %d\n", error);
+		goto out;
+	}
+
+	notify_unlink(f->f_path.dentry, file_id_str, INCFS_INCOMPLETE_NAME);
+
+out:
+	dput(incomplete_file_dentry);
+	kfree(file_id_str);
+	revert_creds(old_cred);
+}
+
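+/*
+ * INCFS_IOC_FILL_BLOCKS: copy an array of caller-supplied data/hash
+ * blocks into the backing file. The return value is the number of blocks
+ * processed; an error is returned only if nothing was processed at all.
+ */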
+static long ioctl_fill_blocks(struct file *f, void __user *arg)
+{
+	struct incfs_fill_blocks __user *usr_fill_blocks = arg;
+	struct incfs_fill_blocks fill_blocks;
+	struct incfs_fill_block __user *usr_fill_block_array;
+	struct data_file *df = get_incfs_data_file(f);
+	struct incfs_file_data *fd = f->private_data;
+	const ssize_t data_buf_size = 2 * INCFS_DATA_FILE_BLOCK_SIZE;
+	u8 *data_buf = NULL;
+	ssize_t error = 0;
+	int i = 0;
+
+	if (!df)
+		return -EBADF;
+
+	if (!fd || fd->fd_fill_permission != CAN_FILL)
+		return -EPERM;
+
+	if (copy_from_user(&fill_blocks, usr_fill_blocks, sizeof(fill_blocks)))
+		return -EFAULT;
+
+	usr_fill_block_array = u64_to_user_ptr(fill_blocks.fill_blocks);
+	data_buf = (u8 *)__get_free_pages(GFP_NOFS | __GFP_COMP,
+					  get_order(data_buf_size));
+	if (!data_buf)
+		return -ENOMEM;
+
+	for (i = 0; i < fill_blocks.count; i++) {
+		struct incfs_fill_block fill_block = {};
+
+		if (copy_from_user(&fill_block, &usr_fill_block_array[i],
+				   sizeof(fill_block)) > 0) {
+			error = -EFAULT;
+			break;
+		}
+
+		if (fill_block.data_len > data_buf_size) {
+			error = -E2BIG;
+			break;
+		}
+
+		if (copy_from_user(data_buf, u64_to_user_ptr(fill_block.data),
+				   fill_block.data_len) > 0) {
+			error = -EFAULT;
+			break;
+		}
+		fill_block.data = 0; /* To make sure nobody uses it. */
+		if (fill_block.flags & INCFS_BLOCK_FLAGS_HASH) {
+			error = incfs_process_new_hash_block(df, &fill_block,
+							     data_buf);
+		} else {
+			error = incfs_process_new_data_block(df, &fill_block,
+							     data_buf);
+		}
+		if (error)
+			break;
+	}
+
+	if (data_buf)
+		free_pages((unsigned long)data_buf, get_order(data_buf_size));
+
+	maybe_delete_incomplete_file(f, df);
+
+	/*
+	 * Only report the error if no records were processed, otherwise
+	 * just return how many were processed successfully.
+	 */
+	if (i == 0)
+		return error;
+
+	return i;
+}
+
+static long ioctl_read_file_signature(struct file *f, void __user *arg)
+{
+	struct incfs_get_file_sig_args __user *args_usr_ptr = arg;
+	struct incfs_get_file_sig_args args = {};
+	u8 *sig_buffer = NULL;
+	size_t sig_buf_size = 0;
+	int error = 0;
+	int read_result = 0;
+	struct data_file *df = get_incfs_data_file(f);
+
+	if (!df)
+		return -EINVAL;
+
+	if (copy_from_user(&args, args_usr_ptr, sizeof(args)) > 0)
+		return -EINVAL;
+
+	sig_buf_size = args.file_signature_buf_size;
+	if (sig_buf_size > INCFS_MAX_SIGNATURE_SIZE)
+		return -E2BIG;
+
+	sig_buffer = kzalloc(sig_buf_size, GFP_NOFS | __GFP_COMP);
+	if (!sig_buffer)
+		return -ENOMEM;
+
+	read_result = incfs_read_file_signature(df,
+			range(sig_buffer, sig_buf_size));
+
+	if (read_result < 0) {
+		error = read_result;
+		goto out;
+	}
+
+	if (copy_to_user(u64_to_user_ptr(args.file_signature), sig_buffer,
+			read_result)) {
+		error = -EFAULT;
+		goto out;
+	}
+
+	args.file_signature_len_out = read_result;
+	if (copy_to_user(args_usr_ptr, &args, sizeof(args)))
+		error = -EFAULT;
+
+out:
+	kfree(sig_buffer);
+
+	return error;
+}
+
+static long ioctl_get_filled_blocks(struct file *f, void __user *arg)
+{
+	struct incfs_get_filled_blocks_args __user *args_usr_ptr = arg;
+	struct incfs_get_filled_blocks_args args = {};
+	struct data_file *df = get_incfs_data_file(f);
+	struct incfs_file_data *fd = f->private_data;
+	int error;
+
+	if (!df || !fd)
+		return -EINVAL;
+
+	if (fd->fd_fill_permission != CAN_FILL)
+		return -EPERM;
+
+	if (copy_from_user(&args, args_usr_ptr, sizeof(args)) > 0)
+		return -EINVAL;
+
+	error = incfs_get_filled_blocks(df, fd, &args);
+
+	if (copy_to_user(args_usr_ptr, &args, sizeof(args)))
+		return -EFAULT;
+
+	return error;
+}
+
+static long ioctl_get_block_count(struct file *f, void __user *arg)
+{
+	struct incfs_get_block_count_args __user *args_usr_ptr = arg;
+	struct incfs_get_block_count_args args = {};
+	struct data_file *df = get_incfs_data_file(f);
+
+	if (!df)
+		return -EINVAL;
+
+	args.total_data_blocks_out = df->df_data_block_count;
+	args.filled_data_blocks_out = atomic_read(&df->df_data_blocks_written);
+	args.total_hash_blocks_out = df->df_total_block_count -
+		df->df_data_block_count;
+	args.filled_hash_blocks_out = atomic_read(&df->df_hash_blocks_written);
+
+	if (copy_to_user(args_usr_ptr, &args, sizeof(args)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int incfs_ioctl_get_flags(struct file *f, void __user *arg)
+{
+	u32 flags = IS_VERITY(file_inode(f)) ? FS_VERITY_FL : 0;
+
+	return put_user(flags, (int __user *) arg);
+}
+
+static long dispatch_ioctl(struct file *f, unsigned int req, unsigned long arg)
+{
+	switch (req) {
+	case INCFS_IOC_FILL_BLOCKS:
+		return ioctl_fill_blocks(f, (void __user *)arg);
+	case INCFS_IOC_READ_FILE_SIGNATURE:
+		return ioctl_read_file_signature(f, (void __user *)arg);
+	case INCFS_IOC_GET_FILLED_BLOCKS:
+		return ioctl_get_filled_blocks(f, (void __user *)arg);
+	case INCFS_IOC_GET_BLOCK_COUNT:
+		return ioctl_get_block_count(f, (void __user *)arg);
+	case FS_IOC_ENABLE_VERITY:
+		return incfs_ioctl_enable_verity(f, (const void __user *)arg);
+	case FS_IOC_GETFLAGS:
+		return incfs_ioctl_get_flags(f, (void __user *) arg);
+	case FS_IOC_MEASURE_VERITY:
+		return incfs_ioctl_measure_verity(f, (void __user *)arg);
+	case FS_IOC_READ_VERITY_METADATA:
+		return incfs_ioctl_read_verity_metadata(f, (void __user *)arg);
+	default:
+		return -EINVAL;
+	}
+}
+
+#ifdef CONFIG_COMPAT
+static long incfs_compat_ioctl(struct file *file, unsigned int cmd,
+			       unsigned long arg)
+{
+	switch (cmd) {
+	case FS_IOC32_GETFLAGS:
+		cmd = FS_IOC_GETFLAGS;
+		break;
+	case INCFS_IOC_FILL_BLOCKS:
+	case INCFS_IOC_READ_FILE_SIGNATURE:
+	case INCFS_IOC_GET_FILLED_BLOCKS:
+	case INCFS_IOC_GET_BLOCK_COUNT:
+	case FS_IOC_ENABLE_VERITY:
+	case FS_IOC_MEASURE_VERITY:
+	case FS_IOC_READ_VERITY_METADATA:
+		break;
+	default:
+		return -ENOIOCTLCMD;
+	}
+	return dispatch_ioctl(file, cmd, (unsigned long) compat_ptr(arg));
+}
+#endif
+
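+/*
+ * ->lookup(): at the mount root the pseudo files are tried first; every
+ * other name is resolved against the backing directory, adding a negative
+ * dentry when the backing entry does not exist and refusing (-EXDEV) to
+ * cross into a different filesystem.
+ */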
+static struct dentry *dir_lookup(struct inode *dir_inode, struct dentry *dentry,
+				 unsigned int flags)
+{
+	struct mount_info *mi = get_mount_info(dir_inode->i_sb);
+	struct dentry *dir_dentry = NULL;
+	struct dentry *backing_dentry = NULL;
+	struct path dir_backing_path = {};
+	struct inode_info *dir_info = get_incfs_node(dir_inode);
+	int err = 0;
+
+	if (!mi || !dir_info || !dir_info->n_backing_inode)
+		return ERR_PTR(-EBADF);
+
+	if (d_inode(mi->mi_backing_dir_path.dentry) ==
+		dir_info->n_backing_inode) {
+		/* We do lookup in the FS root. Show pseudo files. */
+		err = dir_lookup_pseudo_files(dir_inode->i_sb, dentry);
+		if (err != -ENOENT)
+			goto out;
+		err = 0;
+	}
+
+	dir_dentry = dget_parent(dentry);
+	get_incfs_backing_path(dir_dentry, &dir_backing_path);
+	backing_dentry = incfs_lookup_dentry(dir_backing_path.dentry,
+						dentry->d_name.name);
+
+	if (!backing_dentry || IS_ERR(backing_dentry)) {
+		err = IS_ERR(backing_dentry)
+			? PTR_ERR(backing_dentry)
+			: -EFAULT;
+		backing_dentry = NULL;
+		goto out;
+	} else {
+		struct inode *inode = NULL;
+		struct path backing_path = {
+			.mnt = dir_backing_path.mnt,
+			.dentry = backing_dentry
+		};
+
+		err = incfs_init_dentry(dentry, &backing_path);
+		if (err)
+			goto out;
+
+		if (!d_really_is_positive(backing_dentry)) {
+			/*
+			 * No such entry found in the backing dir.
+			 * Create a negative entry.
+			 */
+			d_add(dentry, NULL);
+			err = 0;
+			goto out;
+		}
+
+		if (d_inode(backing_dentry)->i_sb !=
+				dir_info->n_backing_inode->i_sb) {
+			/*
+			 * Somehow after the path lookup we ended up in a
+			 * different fs mount. If we keep going it's going
+			 * to end badly.
+			 */
+			err = -EXDEV;
+			goto out;
+		}
+
+		inode = fetch_regular_inode(dir_inode->i_sb, backing_dentry);
+		if (IS_ERR(inode)) {
+			err = PTR_ERR(inode);
+			goto out;
+		}
+
+		d_add(dentry, inode);
+	}
+
+out:
+	dput(dir_dentry);
+	dput(backing_dentry);
+	path_put(&dir_backing_path);
+	if (err)
+		pr_debug("incfs: %s %s %d\n", __func__,
+			 dentry->d_name.name, err);
+	return ERR_PTR(err);
+}
+
+static int dir_mkdir(struct user_namespace *ns, struct inode *dir,
+		     struct dentry *dentry, umode_t mode)
+{
+	struct mount_info *mi = get_mount_info(dir->i_sb);
+	struct inode_info *dir_node = get_incfs_node(dir);
+	struct dentry *backing_dentry = NULL;
+	struct path backing_path = {};
+	int err = 0;
+
+	if (!mi || !dir_node || !dir_node->n_backing_inode)
+		return -EBADF;
+
+	err = mutex_lock_interruptible(&mi->mi_dir_struct_mutex);
+	if (err)
+		return err;
+
+	get_incfs_backing_path(dentry, &backing_path);
+	backing_dentry = backing_path.dentry;
+
+	if (!backing_dentry) {
+		err = -EBADF;
+		goto path_err;
+	}
+
+	if (backing_dentry->d_parent == mi->mi_index_dir) {
+		/* Can't create a subdir inside .index */
+		err = -EBUSY;
+		goto out;
+	}
+
+	if (backing_dentry->d_parent == mi->mi_incomplete_dir) {
+		/* Can't create a subdir inside .incomplete */
+		err = -EBUSY;
+		goto out;
+	}
+	inode_lock_nested(dir_node->n_backing_inode, I_MUTEX_PARENT);
+	err = vfs_mkdir(ns, dir_node->n_backing_inode, backing_dentry,
+			mode | 0222);
+	inode_unlock(dir_node->n_backing_inode);
+	if (!err) {
+		struct inode *inode = NULL;
+
+		if (d_really_is_negative(backing_dentry) ||
+			unlikely(d_unhashed(backing_dentry))) {
+			err = -EINVAL;
+			goto out;
+		}
+
+		inode = fetch_regular_inode(dir->i_sb, backing_dentry);
+		if (IS_ERR(inode)) {
+			err = PTR_ERR(inode);
+			goto out;
+		}
+		d_instantiate(dentry, inode);
+	}
+
+out:
+	if (d_really_is_negative(dentry))
+		d_drop(dentry);
+	path_put(&backing_path);
+
+path_err:
+	mutex_unlock(&mi->mi_dir_struct_mutex);
+	if (err)
+		pr_debug("incfs: %s err:%d\n", __func__, err);
+	return err;
+}
+
+/*
+ * Delete file referenced by backing_dentry and if appropriate its hardlink
+ * from .index and .incomplete
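+ *
+ * nlink counts every hardlink to the backing inode: the user-visible
+ * name(s) plus the optional entries in .index and .incomplete. If, after
+ * accounting for those special entries, more than one link remains, other
+ * user names still exist and only the passed name is unlinked.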
+ */
+static int file_delete(struct mount_info *mi, struct dentry *dentry,
+			struct dentry *backing_dentry, int nlink)
+{
+	struct dentry *index_file_dentry = NULL;
+	struct dentry *incomplete_file_dentry = NULL;
+	/* 2 chars per byte of file ID + 1 char for \0 */
+	char file_id_str[2 * sizeof(incfs_uuid_t) + 1] = {0};
+	ssize_t uuid_size = 0;
+	int error = 0;
+
+	WARN_ON(!mutex_is_locked(&mi->mi_dir_struct_mutex));
+
+	if (nlink > 3)
+		goto just_unlink;
+
+	uuid_size = vfs_getxattr(&init_user_ns, backing_dentry, INCFS_XATTR_ID_NAME,
+			file_id_str, 2 * sizeof(incfs_uuid_t));
+	if (uuid_size < 0) {
+		error = uuid_size;
+		goto out;
+	}
+
+	if (uuid_size != 2 * sizeof(incfs_uuid_t)) {
+		error = -EBADMSG;
+		goto out;
+	}
+
+	index_file_dentry = incfs_lookup_dentry(mi->mi_index_dir, file_id_str);
+	if (IS_ERR(index_file_dentry)) {
+		error = PTR_ERR(index_file_dentry);
+		index_file_dentry = NULL;
+		goto out;
+	}
+
+	if (d_really_is_positive(index_file_dentry) && nlink > 0)
+		nlink--;
+
+	if (nlink > 2)
+		goto just_unlink;
+
+	incomplete_file_dentry = incfs_lookup_dentry(mi->mi_incomplete_dir,
+						     file_id_str);
+	if (IS_ERR(incomplete_file_dentry)) {
+		error = PTR_ERR(incomplete_file_dentry);
+		incomplete_file_dentry = NULL;
+		goto out;
+	}
+
+	if (d_really_is_positive(incomplete_file_dentry) && nlink > 0)
+		nlink--;
+
+	if (nlink > 1)
+		goto just_unlink;
+
+	if (d_really_is_positive(index_file_dentry)) {
+		error = incfs_unlink(index_file_dentry);
+		if (error)
+			goto out;
+		notify_unlink(dentry, file_id_str, INCFS_INDEX_NAME);
+	}
+
+	if (d_really_is_positive(incomplete_file_dentry)) {
+		error = incfs_unlink(incomplete_file_dentry);
+		if (error)
+			goto out;
+		notify_unlink(dentry, file_id_str, INCFS_INCOMPLETE_NAME);
+	}
+
+just_unlink:
+	error = incfs_unlink(backing_dentry);
+
+out:
+	dput(index_file_dentry);
+	dput(incomplete_file_dentry);
+	if (error)
+		pr_debug("incfs: delete_file_from_index err:%d\n", error);
+	return error;
+}
+
+static int dir_unlink(struct inode *dir, struct dentry *dentry)
+{
+	struct mount_info *mi = get_mount_info(dir->i_sb);
+	struct path backing_path = {};
+	struct kstat stat;
+	int err = 0;
+
+	if (!mi)
+		return -EBADF;
+
+	err = mutex_lock_interruptible(&mi->mi_dir_struct_mutex);
+	if (err)
+		return err;
+
+	get_incfs_backing_path(dentry, &backing_path);
+	if (!backing_path.dentry) {
+		err = -EBADF;
+		goto path_err;
+	}
+
+	if (backing_path.dentry->d_parent == mi->mi_index_dir) {
+		/* Direct unlinks from .index are not allowed. */
+		err = -EBUSY;
+		goto out;
+	}
+
+	if (backing_path.dentry->d_parent == mi->mi_incomplete_dir) {
+		/* Direct unlinks from .incomplete are not allowed. */
+		err = -EBUSY;
+		goto out;
+	}
+
+	err = vfs_getattr(&backing_path, &stat, STATX_NLINK,
+			  AT_STATX_SYNC_AS_STAT);
+	if (err)
+		goto out;
+
+	err = file_delete(mi, dentry, backing_path.dentry, stat.nlink);
+
+	d_drop(dentry);
+out:
+	path_put(&backing_path);
+path_err:
+	if (err)
+		pr_debug("incfs: %s err:%d\n", __func__, err);
+	mutex_unlock(&mi->mi_dir_struct_mutex);
+	return err;
+}
+
+static int dir_link(struct dentry *old_dentry, struct inode *dir,
+			 struct dentry *new_dentry)
+{
+	struct mount_info *mi = get_mount_info(dir->i_sb);
+	struct path backing_old_path = {};
+	struct path backing_new_path = {};
+	int error = 0;
+
+	if (!mi)
+		return -EBADF;
+
+	error = mutex_lock_interruptible(&mi->mi_dir_struct_mutex);
+	if (error)
+		return error;
+
+	get_incfs_backing_path(old_dentry, &backing_old_path);
+	get_incfs_backing_path(new_dentry, &backing_new_path);
+
+	if (backing_new_path.dentry->d_parent == mi->mi_index_dir) {
+		/* Can't link to .index */
+		error = -EBUSY;
+		goto out;
+	}
+
+	if (backing_new_path.dentry->d_parent == mi->mi_incomplete_dir) {
+		/* Can't link to .incomplete */
+		error = -EBUSY;
+		goto out;
+	}
+
+	error = incfs_link(backing_old_path.dentry, backing_new_path.dentry);
+	if (!error) {
+		struct inode *inode = NULL;
+		struct dentry *bdentry = backing_new_path.dentry;
+
+		if (d_really_is_negative(bdentry)) {
+			error = -EINVAL;
+			goto out;
+		}
+
+		inode = fetch_regular_inode(dir->i_sb, bdentry);
+		if (IS_ERR(inode)) {
+			error = PTR_ERR(inode);
+			goto out;
+		}
+		d_instantiate(new_dentry, inode);
+	}
+
+out:
+	path_put(&backing_old_path);
+	path_put(&backing_new_path);
+	if (error)
+		pr_debug("incfs: %s err:%d\n", __func__, error);
+	mutex_unlock(&mi->mi_dir_struct_mutex);
+	return error;
+}
+
+static int dir_rmdir(struct inode *dir, struct dentry *dentry)
+{
+	struct mount_info *mi = get_mount_info(dir->i_sb);
+	struct path backing_path = {};
+	int err = 0;
+
+	if (!mi)
+		return -EBADF;
+
+	err = mutex_lock_interruptible(&mi->mi_dir_struct_mutex);
+	if (err)
+		return err;
+
+	get_incfs_backing_path(dentry, &backing_path);
+	if (!backing_path.dentry) {
+		err = -EBADF;
+		goto path_err;
+	}
+
+	if (backing_path.dentry == mi->mi_index_dir) {
+		/* Can't delete .index */
+		err = -EBUSY;
+		goto out;
+	}
+
+	if (backing_path.dentry == mi->mi_incomplete_dir) {
+		/* Can't delete .incomplete */
+		err = -EBUSY;
+		goto out;
+	}
+
+	err = incfs_rmdir(backing_path.dentry);
+	if (!err)
+		d_drop(dentry);
+out:
+	path_put(&backing_path);
+
+path_err:
+	if (err)
+		pr_debug("incfs: %s err:%d\n", __func__, err);
+	mutex_unlock(&mi->mi_dir_struct_mutex);
+	return err;
+}
+
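+/*
+ * ->rename(): forwarded to the backing directories under lock_rename(),
+ * after refusing to move .index/.incomplete themselves or anything
+ * directly out of them.
+ */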
+static int dir_rename(struct inode *old_dir, struct dentry *old_dentry,
+		struct inode *new_dir, struct dentry *new_dentry,
+		unsigned int flags)
+{
+	struct mount_info *mi = get_mount_info(old_dir->i_sb);
+	struct dentry *backing_old_dentry;
+	struct dentry *backing_new_dentry;
+	struct dentry *backing_old_dir_dentry;
+	struct dentry *backing_new_dir_dentry;
+	struct inode *target_inode;
+	struct dentry *trap;
+	struct renamedata rd = {};
+	int error = 0;
+
+	error = mutex_lock_interruptible(&mi->mi_dir_struct_mutex);
+	if (error)
+		return error;
+
+	backing_old_dentry = get_incfs_dentry(old_dentry)->backing_path.dentry;
+
+	if (!backing_old_dentry || backing_old_dentry == mi->mi_index_dir ||
+	    backing_old_dentry == mi->mi_incomplete_dir) {
+		/* Renaming .index or .incomplete not allowed */
+		error = -EBUSY;
+		goto exit;
+	}
+
+	backing_new_dentry = get_incfs_dentry(new_dentry)->backing_path.dentry;
+	dget(backing_old_dentry);
+	dget(backing_new_dentry);
+
+	backing_old_dir_dentry = dget_parent(backing_old_dentry);
+	backing_new_dir_dentry = dget_parent(backing_new_dentry);
+	target_inode = d_inode(new_dentry);
+
+	if (backing_old_dir_dentry == mi->mi_index_dir ||
+	    backing_old_dir_dentry == mi->mi_incomplete_dir) {
+		/* Direct moves from .index or .incomplete are not allowed. */
+		error = -EBUSY;
+		goto out;
+	}
+
+	trap = lock_rename(backing_old_dir_dentry, backing_new_dir_dentry);
+
+	if (trap == backing_old_dentry) {
+		error = -EINVAL;
+		goto unlock_out;
+	}
+	if (trap == backing_new_dentry) {
+		error = -ENOTEMPTY;
+		goto unlock_out;
+	}
+
+	rd.old_dir	= d_inode(backing_old_dir_dentry);
+	rd.old_dentry	= backing_old_dentry;
+	rd.new_dir	= d_inode(backing_new_dir_dentry);
+	rd.new_dentry	= backing_new_dentry;
+	rd.flags	= flags;
+	rd.old_mnt_userns = &init_user_ns;
+	rd.new_mnt_userns = &init_user_ns;
+	rd.delegated_inode = NULL;
+
+	error = vfs_rename(&rd);
+	if (error)
+		goto unlock_out;
+	if (target_inode)
+		fsstack_copy_attr_all(target_inode,
+			get_incfs_node(target_inode)->n_backing_inode);
+	fsstack_copy_attr_all(new_dir, d_inode(backing_new_dir_dentry));
+	if (new_dir != old_dir)
+		fsstack_copy_attr_all(old_dir, d_inode(backing_old_dir_dentry));
+
+unlock_out:
+	unlock_rename(backing_old_dir_dentry, backing_new_dir_dentry);
+
+out:
+	dput(backing_new_dir_dentry);
+	dput(backing_old_dir_dentry);
+	dput(backing_new_dentry);
+	dput(backing_old_dentry);
+
+exit:
+	mutex_unlock(&mi->mi_dir_struct_mutex);
+	if (error)
+		pr_debug("incfs: %s err:%d\n", __func__, error);
+	return error;
+}
+
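+/*
+ * ->open(): open the backing file with the mount owner's credentials,
+ * read-write for regular files (their blocks are filled through the
+ * backing file) and read-only for directories, then set up the per-file
+ * private data.
+ */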
+static int file_open(struct inode *inode, struct file *file)
+{
+	struct mount_info *mi = get_mount_info(inode->i_sb);
+	struct file *backing_file = NULL;
+	struct path backing_path = {};
+	int err = 0;
+	int flags = O_NOATIME | O_LARGEFILE |
+		(S_ISDIR(inode->i_mode) ? O_RDONLY : O_RDWR);
+	const struct cred *old_cred;
+
+	WARN_ON(file->private_data);
+
+	if (!mi)
+		return -EBADF;
+
+	get_incfs_backing_path(file->f_path.dentry, &backing_path);
+	if (!backing_path.dentry)
+		return -EBADF;
+
+	old_cred = override_creds(mi->mi_owner);
+	backing_file = dentry_open(&backing_path, flags, current_cred());
+	revert_creds(old_cred);
+	path_put(&backing_path);
+
+	if (IS_ERR(backing_file)) {
+		err = PTR_ERR(backing_file);
+		backing_file = NULL;
+		goto out;
+	}
+
+	if (S_ISREG(inode->i_mode)) {
+		struct incfs_file_data *fd = kzalloc(sizeof(*fd), GFP_NOFS);
+
+		if (!fd) {
+			err = -ENOMEM;
+			goto out;
+		}
+
+		*fd = (struct incfs_file_data) {
+			.fd_fill_permission = CANT_FILL,
+		};
+		file->private_data = fd;
+
+		err = make_inode_ready_for_data_ops(mi, inode, backing_file);
+		if (err)
+			goto out;
+
+		err = incfs_fsverity_file_open(inode, file);
+		if (err)
+			goto out;
+	} else if (S_ISDIR(inode->i_mode)) {
+		struct dir_file *dir = NULL;
+
+		dir = incfs_open_dir_file(mi, backing_file);
+		if (IS_ERR(dir))
+			err = PTR_ERR(dir);
+		else
+			file->private_data = dir;
+	} else {
+		err = -EBADF;
+	}
+
+out:
+	if (err) {
+		pr_debug("name:%s err: %d\n",
+			 file->f_path.dentry->d_name.name, err);
+		if (S_ISREG(inode->i_mode))
+			kfree(file->private_data);
+		else if (S_ISDIR(inode->i_mode))
+			incfs_free_dir_file(file->private_data);
+
+		file->private_data = NULL;
+	}
+
+	if (backing_file)
+		fput(backing_file);
+	return err;
+}
+
+static int file_release(struct inode *inode, struct file *file)
+{
+	if (S_ISREG(inode->i_mode)) {
+		kfree(file->private_data);
+		file->private_data = NULL;
+	} else if (S_ISDIR(inode->i_mode)) {
+		struct dir_file *dir = get_incfs_dir_file(file);
+
+		incfs_free_dir_file(dir);
+	}
+
+	return 0;
+}
+
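+/*
+ * ->d_revalidate(): declare the dentry stale (0) if its backing dentry is
+ * gone or now points at a different inode, i.e. the backing directory was
+ * modified behind incfs' back; otherwise defer to the backing filesystem's
+ * own d_revalidate, if any.
+ */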
+static int dentry_revalidate(struct dentry *d, unsigned int flags)
+{
+	struct path backing_path = {};
+	struct inode_info *info = get_incfs_node(d_inode(d));
+	struct inode *binode = (info == NULL) ? NULL : info->n_backing_inode;
+	struct dentry *backing_dentry = NULL;
+	int result = 0;
+
+	if (flags & LOOKUP_RCU)
+		return -ECHILD;
+
+	get_incfs_backing_path(d, &backing_path);
+	backing_dentry = backing_path.dentry;
+	if (!backing_dentry)
+		goto out;
+
+	if (d_inode(backing_dentry) != binode) {
+		/*
+		 * Backing inodes obtained via dentry and inode don't match.
+		 * It indicates that most likely backing dir has changed
+		 * directly bypassing Incremental FS interface.
+		 */
+		goto out;
+	}
+
+	if (backing_dentry->d_flags & DCACHE_OP_REVALIDATE) {
+		result = backing_dentry->d_op->d_revalidate(backing_dentry,
+				flags);
+	} else {
+		result = 1;
+	}
+
+out:
+	path_put(&backing_path);
+	return result;
+}
+
+static void dentry_release(struct dentry *d)
+{
+	struct dentry_info *di = get_incfs_dentry(d);
+
+	if (di)
+		path_put(&di->backing_path);
+	kfree(d->d_fsdata);
+	d->d_fsdata = NULL;
+}
+
+static struct inode *alloc_inode(struct super_block *sb)
+{
+	struct inode_info *node = kzalloc(sizeof(*node), GFP_NOFS);
+
+	/* TODO: add a slab-based cache here. */
+	if (!node)
+		return NULL;
+	inode_init_once(&node->n_vfs_inode);
+	return &node->n_vfs_inode;
+}
+
+static void free_inode(struct inode *inode)
+{
+	struct inode_info *node = get_incfs_node(inode);
+
+	kfree(node);
+}
+
+static void evict_inode(struct inode *inode)
+{
+	struct inode_info *node = get_incfs_node(inode);
+
+	if (node) {
+		if (node->n_backing_inode) {
+			iput(node->n_backing_inode);
+			node->n_backing_inode = NULL;
+		}
+		if (node->n_file) {
+			incfs_free_data_file(node->n_file);
+			node->n_file = NULL;
+		}
+	}
+
+	truncate_inode_pages(&inode->i_data, 0);
+	clear_inode(inode);
+}
+
+static int incfs_setattr(struct user_namespace *ns, struct dentry *dentry,
+			 struct iattr *ia)
+{
+	struct dentry_info *di = get_incfs_dentry(dentry);
+	struct dentry *backing_dentry;
+	struct inode *backing_inode;
+	int error;
+
+	if (ia->ia_valid & ATTR_SIZE)
+		return -EINVAL;
+
+	if ((ia->ia_valid & (ATTR_KILL_SUID|ATTR_KILL_SGID)) &&
+	    (ia->ia_valid & ATTR_MODE))
+		return -EINVAL;
+
+	if (!di)
+		return -EINVAL;
+	backing_dentry = di->backing_path.dentry;
+	if (!backing_dentry)
+		return -EINVAL;
+
+	backing_inode = d_inode(backing_dentry);
+
+	/* incfs files are readonly, but the backing files must be writeable */
+	if (S_ISREG(backing_inode->i_mode)) {
+		if ((ia->ia_valid & ATTR_MODE) && (ia->ia_mode & 0222))
+			return -EINVAL;
+
+		ia->ia_mode |= 0222;
+	}
+
+	inode_lock(d_inode(backing_dentry));
+	error = notify_change(ns, backing_dentry, ia, NULL);
+	inode_unlock(d_inode(backing_dentry));
+
+	if (error)
+		return error;
+
+	if (S_ISREG(backing_inode->i_mode))
+		ia->ia_mode &= ~0222;
+
+	return simple_setattr(ns, dentry, ia);
+}
+
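+/*
+ * ->getattr(): report fs-verity status via STATX_ATTR_VERITY and, when
+ * STATX_BLOCKS is requested, take the block count from the backing file,
+ * which is where the data actually lives.
+ */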
+static int incfs_getattr(struct user_namespace *ns, const struct path *path,
+			 struct kstat *stat, u32 request_mask,
+			 unsigned int query_flags)
+{
+	struct inode *inode = d_inode(path->dentry);
+
+	generic_fillattr(ns, inode, stat);
+
+	if (inode->i_ino < INCFS_START_INO_RANGE)
+		return 0;
+
+	stat->attributes &= ~STATX_ATTR_VERITY;
+	if (IS_VERITY(inode))
+		stat->attributes |= STATX_ATTR_VERITY;
+	stat->attributes_mask |= STATX_ATTR_VERITY;
+
+	if (request_mask & STATX_BLOCKS) {
+		struct kstat backing_kstat;
+		struct dentry_info *di = get_incfs_dentry(path->dentry);
+		int error = 0;
+		struct path *backing_path;
+
+		if (!di)
+			return -EFSCORRUPTED;
+		backing_path = &di->backing_path;
+		error = vfs_getattr(backing_path, &backing_kstat, STATX_BLOCKS,
+				    AT_STATX_SYNC_AS_STAT);
+		if (error)
+			return error;
+
+		stat->blocks = backing_kstat.blocks;
+	}
+
+	return 0;
+}
+
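+/*
+ * Xattrs normally pass straight through to the backing dentry. The pseudo
+ * files have no backing inode, so their security.selinux label is special
+ * cased and kept in the mount_info (see incfs_setxattr() below).
+ */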
+static ssize_t incfs_getxattr(struct dentry *d, const char *name,
+			void *value, size_t size)
+{
+	struct dentry_info *di = get_incfs_dentry(d);
+	struct mount_info *mi = get_mount_info(d->d_sb);
+	char *stored_value;
+	size_t stored_size;
+	int i;
+
+	if (di && di->backing_path.dentry)
+		return vfs_getxattr(&init_user_ns, di->backing_path.dentry,
+				    name, value, size);
+
+	if (strcmp(name, "security.selinux"))
+		return -ENODATA;
+
+	for (i = 0; i < PSEUDO_FILE_COUNT; ++i)
+		if (!strcmp(d->d_iname, incfs_pseudo_file_names[i].data))
+			break;
+	if (i == PSEUDO_FILE_COUNT)
+		return -ENODATA;
+
+	stored_value = mi->pseudo_file_xattr[i].data;
+	stored_size = mi->pseudo_file_xattr[i].len;
+	if (!stored_value)
+		return -ENODATA;
+
+	if (stored_size > size)
+		return -E2BIG;
+
+	memcpy(value, stored_value, stored_size);
+	return stored_size;
+}
+
+static ssize_t incfs_setxattr(struct user_namespace *ns, struct dentry *d,
+			      const char *name, void *value, size_t size,
+			      int flags)
+{
+	struct dentry_info *di = get_incfs_dentry(d);
+	struct mount_info *mi = get_mount_info(d->d_sb);
+	u8 **stored_value;
+	size_t *stored_size;
+	int i;
+
+	if (di && di->backing_path.dentry)
+		return vfs_setxattr(ns, di->backing_path.dentry, name, value,
+				    size, flags);
+
+	if (strcmp(name, "security.selinux"))
+		return -ENODATA;
+
+	if (size > INCFS_MAX_FILE_ATTR_SIZE)
+		return -E2BIG;
+
+	for (i = 0; i < PSEUDO_FILE_COUNT; ++i)
+		if (!strcmp(d->d_iname, incfs_pseudo_file_names[i].data))
+			break;
+	if (i == PSEUDO_FILE_COUNT)
+		return -ENODATA;
+
+	stored_value = &mi->pseudo_file_xattr[i].data;
+	stored_size = &mi->pseudo_file_xattr[i].len;
+	kfree(*stored_value);
+	*stored_value = kzalloc(size, GFP_NOFS);
+	if (!*stored_value)
+		return -ENOMEM;
+
+	memcpy(*stored_value, value, size);
+	*stored_size = size;
+	return 0;
+}
+
+static ssize_t incfs_listxattr(struct dentry *d, char *list, size_t size)
+{
+	struct dentry_info *di = get_incfs_dentry(d);
+
+	if (!di || !di->backing_path.dentry)
+		return -ENODATA;
+
+	return vfs_listxattr(di->backing_path.dentry, list, size);
+}
+
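+/*
+ * Mount sequence: parse the options, resolve the backing directory passed
+ * as dev_name, find or create the .index and .incomplete dirs inside it,
+ * then build the root inode on top of the backing directory itself.
+ */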
+struct dentry *incfs_mount_fs(struct file_system_type *type, int flags,
+			      const char *dev_name, void *data)
+{
+	struct mount_options options = {};
+	struct mount_info *mi = NULL;
+	struct path backing_dir_path = {};
+	struct dentry *index_dir = NULL;
+	struct dentry *incomplete_dir = NULL;
+	struct super_block *src_fs_sb = NULL;
+	struct inode *root_inode = NULL;
+	struct super_block *sb = sget(type, NULL, set_anon_super, flags, NULL);
+	bool dir_created = false;
+	int error = 0;
+
+	if (IS_ERR(sb))
+		return ERR_CAST(sb);
+
+	sb->s_op = &incfs_super_ops;
+	sb->s_d_op = &incfs_dentry_ops;
+	sb->s_flags |= S_NOATIME;
+	sb->s_magic = INCFS_MAGIC_NUMBER;
+	sb->s_time_gran = 1;
+	sb->s_blocksize = INCFS_DATA_FILE_BLOCK_SIZE;
+	sb->s_blocksize_bits = blksize_bits(sb->s_blocksize);
+	sb->s_xattr = incfs_xattr_ops;
+
+	BUILD_BUG_ON(PAGE_SIZE != INCFS_DATA_FILE_BLOCK_SIZE);
+
+	if (!dev_name) {
+		pr_err("incfs: Backing dir is not set, filesystem can't be mounted.\n");
+		error = -ENOENT;
+		goto err_deactivate;
+	}
+
+	error = parse_options(&options, (char *)data);
+	if (error != 0) {
+		pr_err("incfs: Options parsing error. %d\n", error);
+		goto err_deactivate;
+	}
+
+	sb->s_bdi->ra_pages = options.readahead_pages;
+
+	error = kern_path(dev_name, LOOKUP_FOLLOW | LOOKUP_DIRECTORY,
+			&backing_dir_path);
+	if (error || backing_dir_path.dentry == NULL ||
+		!d_really_is_positive(backing_dir_path.dentry)) {
+		if (!error)
+			error = -ENOENT;
+		pr_err("incfs: Error accessing: %s.\n", dev_name);
+		goto err_free_opts;
+	}
+	src_fs_sb = backing_dir_path.dentry->d_sb;
+	sb->s_maxbytes = src_fs_sb->s_maxbytes;
+	sb->s_stack_depth = src_fs_sb->s_stack_depth + 1;
+
+	if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
+		error = -EINVAL;
+		goto err_put_path;
+	}
+
+	mi = incfs_alloc_mount_info(sb, &options, &backing_dir_path);
+	if (IS_ERR_OR_NULL(mi)) {
+		error = PTR_ERR(mi);
+		pr_err("incfs: Error allocating mount info. %d\n", error);
+		goto err_put_path;
+	}
+
+	sb->s_fs_info = mi;
+	mi->mi_backing_dir_path = backing_dir_path;
+	index_dir = open_or_create_special_dir(backing_dir_path.dentry,
+					       INCFS_INDEX_NAME, &dir_created);
+	if (IS_ERR_OR_NULL(index_dir)) {
+		error = PTR_ERR(index_dir);
+		pr_err("incfs: Can't find or create .index dir in %s\n",
+			dev_name);
+		/* No need to null index_dir since we don't put it */
+		goto err_put_path;
+	}
+
+	mi->mi_index_dir = index_dir;
+	mi->mi_index_free = dir_created;
+
+	incomplete_dir = open_or_create_special_dir(backing_dir_path.dentry,
+						    INCFS_INCOMPLETE_NAME,
+						    &dir_created);
+	if (IS_ERR_OR_NULL(incomplete_dir)) {
+		error = PTR_ERR(incomplete_dir);
+		pr_err("incfs: Can't find or create .incomplete dir in %s\n",
+			dev_name);
+		/* No need to null incomplete_dir since we don't put it */
+		goto err_put_path;
+	}
+	mi->mi_incomplete_dir = incomplete_dir;
+	mi->mi_incomplete_free = dir_created;
+
+	root_inode = fetch_regular_inode(sb, backing_dir_path.dentry);
+	if (IS_ERR(root_inode)) {
+		error = PTR_ERR(root_inode);
+		goto err_put_path;
+	}
+
+	sb->s_root = d_make_root(root_inode);
+	if (!sb->s_root) {
+		error = -ENOMEM;
+		goto err_put_path;
+	}
+	error = incfs_init_dentry(sb->s_root, &backing_dir_path);
+	if (error)
+		goto err_put_path;
+
+	path_put(&backing_dir_path);
+	sb->s_flags |= SB_ACTIVE;
+
+	pr_debug("incfs: mount\n");
+	return dget(sb->s_root);
+
+err_put_path:
+	path_put(&backing_dir_path);
+err_free_opts:
+	free_options(&options);
+err_deactivate:
+	deactivate_locked_super(sb);
+	pr_err("incfs: mount failed %d\n", error);
+	return ERR_PTR(error);
+}
+
+static int incfs_remount_fs(struct super_block *sb, int *flags, char *data)
+{
+	struct mount_options options;
+	struct mount_info *mi = get_mount_info(sb);
+	int err = 0;
+
+	sync_filesystem(sb);
+	err = parse_options(&options, (char *)data);
+	if (err)
+		return err;
+
+	if (options.report_uid != mi->mi_options.report_uid) {
+		pr_err("incfs: Can't change report_uid mount option on remount\n");
+		err = -EOPNOTSUPP;
+		goto out;
+	}
+
+	err = incfs_realloc_mount_info(mi, &options);
+	if (err)
+		goto out;
+
+	pr_debug("incfs: remount\n");
+
+out:
+	free_options(&options);
+	return err;
+}
+
+void incfs_kill_sb(struct super_block *sb)
+{
+	struct mount_info *mi = sb->s_fs_info;
+	struct inode *dinode = NULL;
+
+	pr_debug("incfs: unmount\n");
+
+	if (mi) {
+		if (mi->mi_backing_dir_path.dentry)
+			dinode = d_inode(mi->mi_backing_dir_path.dentry);
+
+		if (dinode) {
+			if (mi->mi_index_dir && mi->mi_index_free)
+				vfs_rmdir(&init_user_ns, dinode,
+					  mi->mi_index_dir);
+
+			if (mi->mi_incomplete_dir && mi->mi_incomplete_free)
+				vfs_rmdir(&init_user_ns, dinode,
+					  mi->mi_incomplete_dir);
+		}
+
+		incfs_free_mount_info(mi);
+		sb->s_fs_info = NULL;
+	}
+	kill_anon_super(sb);
+}
+
+static int show_options(struct seq_file *m, struct dentry *root)
+{
+	struct mount_info *mi = get_mount_info(root->d_sb);
+
+	seq_printf(m, ",read_timeout_ms=%u", mi->mi_options.read_timeout_ms);
+	seq_printf(m, ",readahead=%u", mi->mi_options.readahead_pages);
+	if (mi->mi_options.read_log_pages != 0) {
+		seq_printf(m, ",rlog_pages=%u", mi->mi_options.read_log_pages);
+		seq_printf(m, ",rlog_wakeup_cnt=%u",
+			   mi->mi_options.read_log_wakeup_count);
+	}
+	if (mi->mi_options.report_uid)
+		seq_puts(m, ",report_uid");
+
+	if (mi->mi_sysfs_node)
+		seq_printf(m, ",sysfs_name=%s",
+			   kobject_name(&mi->mi_sysfs_node->isn_sysfs_node));
+	return 0;
+}
diff --git a/fs/incfs/vfs.h b/fs/incfs/vfs.h
new file mode 100644
index 0000000..79fdf24
--- /dev/null
+++ b/fs/incfs/vfs.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2018 Google LLC
+ */
+
+#ifndef _INCFS_VFS_H
+#define _INCFS_VFS_H
+
+extern const struct file_operations incfs_file_ops;
+extern const struct inode_operations incfs_file_inode_ops;
+
+void incfs_kill_sb(struct super_block *sb);
+struct dentry *incfs_mount_fs(struct file_system_type *type, int flags,
+			      const char *dev_name, void *data);
+int incfs_link(struct dentry *what, struct dentry *where);
+int incfs_unlink(struct dentry *dentry);
+
+static inline struct mount_info *get_mount_info(struct super_block *sb)
+{
+	struct mount_info *result = sb->s_fs_info;
+
+	WARN_ON(!result);
+	return result;
+}
+
+static inline struct super_block *file_superblock(struct file *f)
+{
+	struct inode *inode = file_inode(f);
+
+	return inode->i_sb;
+}
+
+#endif
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 1c4bfda..05299fb 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -733,6 +733,8 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 	struct fsnotify_group *group;
 	struct inode *inode;
 	struct path path;
+	struct path alteredpath;
+	struct path *canonical_path = &path;
 	struct fd f;
 	int ret;
 	unsigned flags = 0;
@@ -779,13 +781,23 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 	if (ret)
 		goto fput_and_out;
 
+	/* support stacked filesystems */
+	if (path.dentry && path.dentry->d_op) {
+		if (path.dentry->d_op->d_canonical_path) {
+			path.dentry->d_op->d_canonical_path(&path,
+							    &alteredpath);
+			canonical_path = &alteredpath;
+			path_put(&path);
+		}
+	}
+
 	/* inode held in place by reference to path; group by fget on fd */
-	inode = path.dentry->d_inode;
+	inode = canonical_path->dentry->d_inode;
 	group = f.file->private_data;
 
 	/* create/update an inode mark */
 	ret = inotify_update_watch(group, inode, mask);
-	path_put(&path);
+	path_put(canonical_path);
 fput_and_out:
 	fdput(f);
 	return ret;
diff --git a/fs/open.c b/fs/open.c
index a81319b..9c8fa36 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -35,6 +35,7 @@
 #include <linux/mnt_idmapping.h>
 
 #include "internal.h"
+#include <trace/hooks/syscall_check.h>
 
 int do_truncate(struct user_namespace *mnt_userns, struct dentry *dentry,
 		loff_t length, unsigned int time_attrs, struct file *filp)
@@ -865,6 +866,7 @@ static int do_dentry_open(struct file *f,
 		error = -ENODEV;
 		goto cleanup_all;
 	}
+	trace_android_vh_check_file_open(f);
 
 	error = security_file_open(f);
 	if (error)
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 91a95bf..358e1d8 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -1059,7 +1059,7 @@ static int ovl_copy_up_flags(struct dentry *dentry, int flags)
 		dput(parent);
 		dput(next);
 	}
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 	return err;
 }
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index c3032ce..71c9b4d 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -572,7 +572,7 @@ static int ovl_create_or_link(struct dentry *dentry, struct inode *inode,
 			      struct ovl_cattr *attr, bool origin)
 {
 	int err;
-	const struct cred *old_cred;
+	const struct cred *old_cred, *hold_cred = NULL;
 	struct cred *override_cred;
 	struct dentry *parent = dentry->d_parent;
 
@@ -613,13 +613,14 @@ static int ovl_create_or_link(struct dentry *dentry, struct inode *inode,
 		override_cred->fsuid = inode->i_uid;
 		override_cred->fsgid = inode->i_gid;
 		err = security_dentry_create_files_as(dentry,
-				attr->mode, &dentry->d_name, old_cred,
+				attr->mode, &dentry->d_name,
+				old_cred ? old_cred : current_cred(),
 				override_cred);
 		if (err) {
 			put_cred(override_cred);
 			goto out_revert_creds;
 		}
-		put_cred(override_creds(override_cred));
+		hold_cred = override_creds(override_cred);
 		put_cred(override_cred);
 	}
 
@@ -629,7 +630,9 @@ static int ovl_create_or_link(struct dentry *dentry, struct inode *inode,
 		err = ovl_create_over_whiteout(dentry, inode, attr);
 
 out_revert_creds:
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred ?: hold_cred);
+	if (old_cred && hold_cred)
+		put_cred(hold_cred);
 	return err;
 }
 
@@ -706,7 +709,7 @@ static int ovl_set_link_redirect(struct dentry *dentry)
 
 	old_cred = ovl_override_creds(dentry->d_sb);
 	err = ovl_set_redirect(dentry, false);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 	return err;
 }
@@ -925,7 +928,7 @@ static int ovl_do_remove(struct dentry *dentry, bool is_dir)
 		err = ovl_remove_upper(dentry, is_dir, &list);
 	else
 		err = ovl_remove_and_whiteout(dentry, &list);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	if (!err) {
 		if (is_dir)
 			clear_nlink(dentry->d_inode);
@@ -1300,7 +1303,7 @@ static int ovl_rename(struct user_namespace *mnt_userns, struct inode *olddir,
 out_unlock:
 	unlock_rename(new_upperdir, old_upperdir);
 out_revert_creds:
-	revert_creds(old_cred);
+	ovl_revert_creds(old->d_sb, old_cred);
 	if (update_nlink)
 		ovl_nlink_end(new);
 out_drop_write:
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 6011f95..f2ebae6 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -15,6 +15,8 @@
 #include <linux/fs.h>
 #include "overlayfs.h"
 
+#define OVL_IOCB_MASK (IOCB_DSYNC | IOCB_HIPRI | IOCB_NOWAIT | IOCB_SYNC)
+
 struct ovl_aio_req {
 	struct kiocb iocb;
 	refcount_t ref;
@@ -58,13 +60,14 @@ static struct file *ovl_open_realfile(const struct file *file,
 	if (err) {
 		realfile = ERR_PTR(err);
 	} else {
-		if (!inode_owner_or_capable(real_mnt_userns, realinode))
+		if (old_cred && !inode_owner_or_capable(real_mnt_userns,
+							realinode))
 			flags &= ~O_NOATIME;
 
 		realfile = open_with_fake_path(&file->f_path, flags, realinode,
 					       current_cred());
 	}
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	pr_debug("open(%p[%pD2/%c], 0%o) -> (%p, 0%o)\n",
 		 file, file, ovl_whatisit(inode, realinode), file->f_flags,
@@ -209,7 +212,7 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 
 	old_cred = ovl_override_creds(inode->i_sb);
 	ret = vfs_llseek(real.file, offset, whence);
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	file->f_pos = real.file->f_pos;
 	ovl_inode_unlock(inode);
@@ -241,22 +244,6 @@ static void ovl_file_accessed(struct file *file)
 	touch_atime(&file->f_path);
 }
 
-static rwf_t ovl_iocb_to_rwf(int ifl)
-{
-	rwf_t flags = 0;
-
-	if (ifl & IOCB_NOWAIT)
-		flags |= RWF_NOWAIT;
-	if (ifl & IOCB_HIPRI)
-		flags |= RWF_HIPRI;
-	if (ifl & IOCB_DSYNC)
-		flags |= RWF_DSYNC;
-	if (ifl & IOCB_SYNC)
-		flags |= RWF_SYNC;
-
-	return flags;
-}
-
 static inline void ovl_aio_put(struct ovl_aio_req *aio_req)
 {
 	if (refcount_dec_and_test(&aio_req->ref)) {
@@ -316,7 +303,8 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	if (is_sync_kiocb(iocb)) {
 		ret = vfs_iter_read(real.file, iter, &iocb->ki_pos,
-				    ovl_iocb_to_rwf(iocb->ki_flags));
+				    iocb_to_rw_flags(iocb->ki_flags,
+						     OVL_IOCB_MASK));
 	} else {
 		struct ovl_aio_req *aio_req;
 
@@ -337,7 +325,7 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 			ovl_aio_cleanup_handler(aio_req);
 	}
 out:
-	revert_creds(old_cred);
+	ovl_revert_creds(file_inode(file)->i_sb, old_cred);
 	ovl_file_accessed(file);
 out_fdput:
 	fdput(real);
@@ -380,7 +368,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	if (is_sync_kiocb(iocb)) {
 		file_start_write(real.file);
 		ret = vfs_iter_write(real.file, iter, &iocb->ki_pos,
-				     ovl_iocb_to_rwf(ifl));
+				     iocb_to_rw_flags(ifl, OVL_IOCB_MASK));
 		file_end_write(real.file);
 		/* Update size */
 		ovl_copyattr(inode);
@@ -409,7 +397,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 			ovl_aio_cleanup_handler(aio_req);
 	}
 out:
-	revert_creds(old_cred);
+	ovl_revert_creds(file_inode(file)->i_sb, old_cred);
 out_fdput:
 	fdput(real);
 
@@ -454,7 +442,7 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	file_end_write(real.file);
 	/* Update size */
 	ovl_copyattr(inode);
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 	fdput(real);
 
 out_unlock:
@@ -481,7 +469,7 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	if (file_inode(real.file) == ovl_inode_upper(file_inode(file))) {
 		old_cred = ovl_override_creds(file_inode(file)->i_sb);
 		ret = vfs_fsync_range(real.file, start, end, datasync);
-		revert_creds(old_cred);
+		ovl_revert_creds(file_inode(file)->i_sb, old_cred);
 	}
 
 	fdput(real);
@@ -505,7 +493,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	ret = call_mmap(vma->vm_file, vma);
-	revert_creds(old_cred);
+	ovl_revert_creds(file_inode(file)->i_sb, old_cred);
 	ovl_file_accessed(file);
 
 	return ret;
@@ -531,7 +519,7 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	ret = vfs_fallocate(real.file, mode, offset, len);
-	revert_creds(old_cred);
+	ovl_revert_creds(file_inode(file)->i_sb, old_cred);
 
 	/* Update size */
 	ovl_copyattr(inode);
@@ -556,7 +544,7 @@ static int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	ret = vfs_fadvise(real.file, offset, len, advice);
-	revert_creds(old_cred);
+	ovl_revert_creds(file_inode(file)->i_sb, old_cred);
 
 	fdput(real);
 
@@ -615,7 +603,7 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 						flags);
 		break;
 	}
-	revert_creds(old_cred);
+	ovl_revert_creds(file_inode(file_out)->i_sb, old_cred);
 
 	/* Update size */
 	ovl_copyattr(inode_out);
@@ -677,7 +665,7 @@ static int ovl_flush(struct file *file, fl_owner_t id)
 	if (real.file->f_op->flush) {
 		old_cred = ovl_override_creds(file_inode(file)->i_sb);
 		err = real.file->f_op->flush(real.file, id);
-		revert_creds(old_cred);
+		ovl_revert_creds(file_inode(file)->i_sb, old_cred);
 	}
 	fdput(real);
 
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 9e61511..d690e75 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -79,7 +79,7 @@ int ovl_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 		inode_lock(upperdentry->d_inode);
 		old_cred = ovl_override_creds(dentry->d_sb);
 		err = ovl_do_notify_change(ofs, upperdentry, attr);
-		revert_creds(old_cred);
+		ovl_revert_creds(dentry->d_sb, old_cred);
 		if (!err)
 			ovl_copyattr(dentry->d_inode);
 		inode_unlock(upperdentry->d_inode);
@@ -271,7 +271,7 @@ int ovl_getattr(struct user_namespace *mnt_userns, const struct path *path,
 		stat->nlink = dentry->d_inode->i_nlink;
 
 out:
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 	return err;
 }
@@ -309,7 +309,7 @@ int ovl_permission(struct user_namespace *mnt_userns,
 		mask |= MAY_READ;
 	}
 	err = inode_permission(mnt_user_ns(realpath.mnt), realinode, mask);
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	return err;
 }
@@ -326,7 +326,7 @@ static const char *ovl_get_link(struct dentry *dentry,
 
 	old_cred = ovl_override_creds(dentry->d_sb);
 	p = vfs_get_link(ovl_dentry_real(dentry), done);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	return p;
 }
 
@@ -360,7 +360,7 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 		ovl_path_lower(dentry, &realpath);
 		old_cred = ovl_override_creds(dentry->d_sb);
 		err = vfs_getxattr(mnt_user_ns(realpath.mnt), realdentry, name, NULL, 0);
-		revert_creds(old_cred);
+		ovl_revert_creds(dentry->d_sb, old_cred);
 		if (err < 0)
 			goto out_drop_write;
 	}
@@ -381,7 +381,7 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 		WARN_ON(flags != XATTR_REPLACE);
 		err = ovl_do_removexattr(ofs, realdentry, name);
 	}
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 	/* copy c/mtime */
 	ovl_copyattr(inode);
@@ -402,7 +402,7 @@ int ovl_xattr_get(struct dentry *dentry, struct inode *inode, const char *name,
 	ovl_i_path_real(inode, &realpath);
 	old_cred = ovl_override_creds(dentry->d_sb);
 	res = vfs_getxattr(mnt_user_ns(realpath.mnt), realpath.dentry, name, value, size);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	return res;
 }
 
@@ -430,7 +430,7 @@ ssize_t ovl_listxattr(struct dentry *dentry, char *list, size_t size)
 
 	old_cred = ovl_override_creds(dentry->d_sb);
 	res = vfs_listxattr(realdentry, list, size);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	if (res <= 0 || size == 0)
 		return res;
 
@@ -518,7 +518,7 @@ struct posix_acl *ovl_get_acl(struct inode *inode, int type, bool rcu)
 
 		old_cred = ovl_override_creds(inode->i_sb);
 		acl = get_acl(realinode, type);
-		revert_creds(old_cred);
+		ovl_revert_creds(inode->i_sb, old_cred);
 	}
 	/*
 	 * If there are no POSIX ACLs, or we encountered an error,
@@ -578,7 +578,7 @@ static int ovl_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 
 	old_cred = ovl_override_creds(inode->i_sb);
 	err = realinode->i_op->fiemap(realinode, fieinfo, start, len);
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	return err;
 }
@@ -649,7 +649,7 @@ int ovl_fileattr_set(struct user_namespace *mnt_userns,
 		err = ovl_set_protattr(inode, upperpath.dentry, fa);
 		if (!err)
 			err = ovl_real_fileattr_set(&upperpath, fa);
-		revert_creds(old_cred);
+		ovl_revert_creds(inode->i_sb, old_cred);
 
 		/*
 		 * Merge real inode flags with inode flags read from
@@ -711,7 +711,7 @@ int ovl_fileattr_get(struct dentry *dentry, struct fileattr *fa)
 	old_cred = ovl_override_creds(inode->i_sb);
 	err = ovl_real_fileattr_get(&realpath, fa);
 	ovl_fileattr_prot_flags(inode, fa);
-	revert_creds(old_cred);
+	ovl_revert_creds(inode->i_sb, old_cred);
 
 	return err;
 }
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 0fd1d5f..28bb2a3 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -1119,7 +1119,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	ovl_dentry_update_reval(dentry, upperdentry,
 			DCACHE_OP_REVALIDATE | DCACHE_OP_WEAK_REVALIDATE);
 
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	if (origin_path) {
 		dput(origin_path->dentry);
 		kfree(origin_path);
@@ -1146,7 +1146,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	kfree(upperredirect);
 out:
 	kfree(d.redirect);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	return ERR_PTR(err);
 }
 
@@ -1198,7 +1198,7 @@ bool ovl_lower_positive(struct dentry *dentry)
 			dput(this);
 		}
 	}
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 	return positive;
 }
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index e74a610..09fa935 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -354,6 +354,7 @@ int ovl_want_write(struct dentry *dentry);
 void ovl_drop_write(struct dentry *dentry);
 struct dentry *ovl_workdir(struct dentry *dentry);
 const struct cred *ovl_override_creds(struct super_block *sb);
+void ovl_revert_creds(struct super_block *sb, const struct cred *oldcred);
 int ovl_can_decode_fh(struct super_block *sb);
 struct dentry *ovl_indexdir(struct super_block *sb);
 bool ovl_index_all(struct super_block *sb);
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index e1af8f6..17c13f7 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -20,6 +20,7 @@ struct ovl_config {
 	bool metacopy;
 	bool userxattr;
 	bool ovl_volatile;
+	bool override_creds;
 };
 
 struct ovl_sb {
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 2b21064..aad7a62 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -286,7 +286,7 @@ static int ovl_check_whiteouts(const struct path *path, struct ovl_readdir_data
 		}
 		inode_unlock(dir->d_inode);
 	}
-	revert_creds(old_cred);
+	ovl_revert_creds(rdd->dentry->d_sb, old_cred);
 
 	return err;
 }
@@ -789,7 +789,7 @@ static int ovl_iterate(struct file *file, struct dir_context *ctx)
 	}
 	err = 0;
 out:
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	return err;
 }
 
@@ -841,7 +841,7 @@ static struct file *ovl_dir_open_realfile(const struct file *file,
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	res = ovl_path_open(realpath, O_RDONLY | (file->f_flags & O_LARGEFILE));
-	revert_creds(old_cred);
+	ovl_revert_creds(file_inode(file)->i_sb, old_cred);
 
 	return res;
 }
@@ -967,7 +967,7 @@ int ovl_check_empty_dir(struct dentry *dentry, struct list_head *list)
 
 	old_cred = ovl_override_creds(dentry->d_sb);
 	err = ovl_dir_read_merged(dentry, list, &root);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 	if (err)
 		return err;
 
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 3d14a3f..416a665 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -54,6 +54,11 @@ module_param_named(xino_auto, ovl_xino_auto_def, bool, 0644);
 MODULE_PARM_DESC(xino_auto,
 		 "Auto enable xino feature");
 
+static bool __read_mostly ovl_override_creds_def = true;
+module_param_named(override_creds, ovl_override_creds_def, bool, 0644);
+MODULE_PARM_DESC(override_creds,
+		 "Use mounter's credentials for accesses");
+
 static void ovl_entry_stack_free(struct ovl_entry *oe)
 {
 	unsigned int i;
@@ -391,6 +396,9 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
 		seq_puts(m, ",volatile");
 	if (ofs->config.userxattr)
 		seq_puts(m, ",userxattr");
+	if (ofs->config.override_creds != ovl_override_creds_def)
+		seq_show_option(m, "override_creds",
+				ofs->config.override_creds ? "on" : "off");
 	return 0;
 }
 
@@ -446,6 +454,8 @@ enum {
 	OPT_METACOPY_ON,
 	OPT_METACOPY_OFF,
 	OPT_VOLATILE,
+	OPT_OVERRIDE_CREDS_ON,
+	OPT_OVERRIDE_CREDS_OFF,
 	OPT_ERR,
 };
 
@@ -468,6 +478,8 @@ static const match_table_t ovl_tokens = {
 	{OPT_METACOPY_ON,		"metacopy=on"},
 	{OPT_METACOPY_OFF,		"metacopy=off"},
 	{OPT_VOLATILE,			"volatile"},
+	{OPT_OVERRIDE_CREDS_ON,		"override_creds=on"},
+	{OPT_OVERRIDE_CREDS_OFF,	"override_creds=off"},
 	{OPT_ERR,			NULL}
 };
 
@@ -527,6 +539,7 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
 	config->redirect_mode = kstrdup(ovl_redirect_mode_def(), GFP_KERNEL);
 	if (!config->redirect_mode)
 		return -ENOMEM;
+	config->override_creds = ovl_override_creds_def;
 
 	while ((p = ovl_next_opt(&opt)) != NULL) {
 		int token;
@@ -628,6 +641,14 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
 			config->userxattr = true;
 			break;
 
+		case OPT_OVERRIDE_CREDS_ON:
+			config->override_creds = true;
+			break;
+
+		case OPT_OVERRIDE_CREDS_OFF:
+			config->override_creds = false;
+			break;
+
 		default:
 			pr_err("unrecognized mount option \"%s\" or missing value\n",
 					p);
@@ -2166,7 +2187,6 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	kfree(splitlower);
 
 	sb->s_root = root_dentry;
-
 	return 0;
 
 out_free_oe:
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 81a57a8..4197dab 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -38,9 +38,18 @@ const struct cred *ovl_override_creds(struct super_block *sb)
 {
 	struct ovl_fs *ofs = sb->s_fs_info;
 
+	if (!ofs->config.override_creds)
+		return NULL;
 	return override_creds(ofs->creator_cred);
 }
 
+void ovl_revert_creds(struct super_block *sb, const struct cred *old_cred)
+{
+	if (old_cred)
+		revert_creds(old_cred);
+}
+
 /*
  * Check if underlying fs supports file handles and try to determine encoding
  * type, in order to deduce maximum inode number used by fs.
@@ -927,7 +936,7 @@ int ovl_nlink_start(struct dentry *dentry)
 	 * value relative to the upper inode nlink in an upper inode xattr.
 	 */
 	err = ovl_set_nlink_upper(dentry);
-	revert_creds(old_cred);
+	ovl_revert_creds(dentry->d_sb, old_cred);
 
 out:
 	if (err)
@@ -945,7 +954,7 @@ void ovl_nlink_end(struct dentry *dentry)
 
 		old_cred = ovl_override_creds(dentry->d_sb);
 		ovl_cleanup_index(dentry);
-		revert_creds(old_cred);
+		ovl_revert_creds(dentry->d_sb, old_cred);
 	}
 
 	ovl_inode_unlock(inode);
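
The override_creds=off behaviour hinges on the pairing visible in every hunk above: ovl_override_creds() returns NULL when the option is off, and ovl_revert_creds() treats NULL as a no-op, so callers keep one unconditional pattern. A minimal sketch of that pattern, assuming a hypothetical helper (vfs_getattr() is just an illustrative stand-in for the real-path operation):

	static int example_ovl_realpath_op(struct dentry *dentry,
					   const struct path *realpath,
					   struct kstat *stat)
	{
		const struct cred *old_cred;
		int err;

		/* NULL when mounted with override_creds=off: stay as the caller. */
		old_cred = ovl_override_creds(dentry->d_sb);
		err = vfs_getattr(realpath, stat, STATX_BASIC_STATS, 0);
		/* No-op for NULL, otherwise restores the caller's credentials. */
		ovl_revert_creds(dentry->d_sb, old_cred);
		return err;
	}
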
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 9e479d7d..f7c6474c 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -96,6 +96,7 @@
 #include <linux/time_namespace.h>
 #include <linux/resctrl.h>
 #include <linux/cn_proc.h>
+#include <linux/cpufreq_times.h>
 #include <trace/events/oom.h>
 #include "internal.h"
 #include "fd.h"
@@ -3336,6 +3337,9 @@ static const struct pid_entry tgid_base_stuff[] = {
 #ifdef CONFIG_LIVEPATCH
 	ONE("patch_state",  S_IRUSR, proc_pid_patch_state),
 #endif
+#ifdef CONFIG_CPU_FREQ_TIMES
+	ONE("time_in_state", 0444, proc_time_in_state_show),
+#endif
 #ifdef CONFIG_STACKLEAK_METRICS
 	ONE("stack_depth", S_IRUGO, proc_stack_depth),
 #endif
@@ -3687,6 +3691,9 @@ static const struct pid_entry tid_base_stuff[] = {
 	ONE("ksm_merging_pages",  S_IRUSR, proc_pid_ksm_merging_pages),
 	ONE("ksm_stat",  S_IRUSR, proc_pid_ksm_stat),
 #endif
+#ifdef CONFIG_CPU_FREQ_TIMES
+	ONE("time_in_state", 0444, proc_time_in_state_show),
+#endif
 };
 
 static int proc_tid_base_readdir(struct file *file, struct dir_context *ctx)
diff --git a/fs/select.c b/fs/select.c
index 0ee55af..794e2a9 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -476,6 +476,7 @@ static inline void wait_key_set(poll_table *wait, unsigned long in,
 		wait->_key |= POLLOUT_SET;
 }
 
+noinline_for_stack
 static int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
 {
 	ktime_t expire, *to = NULL;
diff --git a/fs/sync.c b/fs/sync.c
index dc725914..df95543 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -10,7 +10,7 @@
 #include <linux/slab.h>
 #include <linux/export.h>
 #include <linux/namei.h>
-#include <linux/sched.h>
+#include <linux/sched/xacct.h>
 #include <linux/writeback.h>
 #include <linux/syscalls.h>
 #include <linux/linkage.h>
@@ -211,6 +211,7 @@ static int do_fsync(unsigned int fd, int datasync)
 	if (f.file) {
 		ret = vfs_fsync(f.file, datasync);
 		fdput(f);
+		inc_syscfs(current);
 	}
 	return ret;
 }
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index dbe1ce5..c7fcb855 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -32,6 +32,11 @@ struct fsverity_hash_alg {
 	unsigned int digest_size; /* digest size in bytes, e.g. 32 for SHA-256 */
 	unsigned int block_size;  /* block size in bytes, e.g. 64 for SHA-256 */
 	mempool_t req_pool;	  /* mempool with a preallocated hash request */
+	/*
+	 * The HASH_ALGO_* constant for this algorithm.  This is different from
+	 * FS_VERITY_HASH_ALG_*, which uses a different numbering scheme.
+	 */
+	enum hash_algo algo_id;
 };
 
 /* Merkle tree parameters: hash algorithm, initial hash state, and topology */
diff --git a/fs/verity/hash_algs.c b/fs/verity/hash_algs.c
index 71d0fcc..6f8170c 100644
--- a/fs/verity/hash_algs.c
+++ b/fs/verity/hash_algs.c
@@ -16,11 +16,13 @@ struct fsverity_hash_alg fsverity_hash_algs[] = {
 		.name = "sha256",
 		.digest_size = SHA256_DIGEST_SIZE,
 		.block_size = SHA256_BLOCK_SIZE,
+		.algo_id = HASH_ALGO_SHA256,
 	},
 	[FS_VERITY_HASH_ALG_SHA512] = {
 		.name = "sha512",
 		.digest_size = SHA512_DIGEST_SIZE,
 		.block_size = SHA512_BLOCK_SIZE,
+		.algo_id = HASH_ALGO_SHA512,
 	},
 };
 
@@ -324,5 +326,9 @@ void __init fsverity_check_hash_algs(void)
 		 */
 		BUG_ON(!is_power_of_2(alg->digest_size));
 		BUG_ON(!is_power_of_2(alg->block_size));
+
+		/* Verify that there is a valid mapping to HASH_ALGO_*. */
+		BUG_ON(alg->algo_id == 0);
+		BUG_ON(alg->digest_size != hash_digest_size[alg->algo_id]);
 	}
 }
diff --git a/fs/verity/measure.c b/fs/verity/measure.c
index e99c003..5c79ea1 100644
--- a/fs/verity/measure.c
+++ b/fs/verity/measure.c
@@ -65,8 +65,7 @@ EXPORT_SYMBOL_GPL(fsverity_ioctl_measure);
  * @alg: (out) pointer to the hash algorithm enumeration
  *
  * Return the file hash algorithm and digest of an fsverity protected file.
- * Assumption: before calling fsverity_get_digest(), the file must have been
- * opened.
+ * Assumption: before calling this, the file must have been opened.
  *
  * Return: 0 on success, -errno on failure
  */
@@ -76,27 +75,13 @@ int fsverity_get_digest(struct inode *inode,
 {
 	const struct fsverity_info *vi;
 	const struct fsverity_hash_alg *hash_alg;
-	int i;
 
 	vi = fsverity_get_info(inode);
 	if (!vi)
 		return -ENODATA; /* not a verity file */
 
 	hash_alg = vi->tree_params.hash_alg;
-	memset(digest, 0, FS_VERITY_MAX_DIGEST_SIZE);
-
-	/* convert the verity hash algorithm name to a hash_algo_name enum */
-	i = match_string(hash_algo_name, HASH_ALGO__LAST, hash_alg->name);
-	if (i < 0)
-		return -EINVAL;
-	*alg = i;
-
-	if (WARN_ON_ONCE(hash_alg->digest_size != hash_digest_size[*alg]))
-		return -EINVAL;
 	memcpy(digest, vi->file_digest, hash_alg->digest_size);
-
-	pr_debug("file digest %s:%*phN\n", hash_algo_name[*alg],
-		 hash_digest_size[*alg], digest);
-
+	*alg = hash_alg->algo_id;
 	return 0;
 }
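
Storing algo_id in the algorithm table reduces fsverity_get_digest() to two loads and a memcpy, with the digest-size consistency now asserted once at boot in fsverity_check_hash_algs(). A hedged sketch of a caller (the wrapper function is hypothetical; the signature and the hash_algo_name/hash_digest_size arrays match the code being removed above):

	static int example_log_verity_digest(struct inode *inode)
	{
		u8 digest[FS_VERITY_MAX_DIGEST_SIZE];
		enum hash_algo alg;
		int err;

		err = fsverity_get_digest(inode, digest, &alg);
		if (err)
			return err;	/* -ENODATA: not a verity file */

		pr_info("file digest %s:%*phN\n", hash_algo_name[alg],
			hash_digest_size[alg], digest);
		return 0;
	}
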
diff --git a/fs/verity/signature.c b/fs/verity/signature.c
index 143a530..e33f6c4 100644
--- a/fs/verity/signature.c
+++ b/fs/verity/signature.c
@@ -40,11 +40,38 @@ static struct key *fsverity_keyring;
 int fsverity_verify_signature(const struct fsverity_info *vi,
 			      const u8 *signature, size_t sig_size)
 {
-	const struct inode *inode = vi->inode;
-	const struct fsverity_hash_alg *hash_alg = vi->tree_params.hash_alg;
+	unsigned int digest_algorithm =
+		vi->tree_params.hash_alg - fsverity_hash_algs;
+
+	return __fsverity_verify_signature(vi->inode, signature, sig_size,
+					   vi->file_digest, digest_algorithm);
+}
+
+/**
+ * __fsverity_verify_signature() - check a verity file's signature
+ * @inode: the file's inode
+ * @signature: the file's signature
+ * @sig_size: size of @signature. Can be 0 if there is no signature
+ * @file_digest: the file's digest
+ * @digest_algorithm: the digest algorithm used
+ *
+ * Takes the file's digest and optional signature and verifies the signature
+ * against the digest and the fs-verity keyring, if appropriate.
+ *
+ * Return: 0 on success (signature valid or not required); -errno on failure
+ */
+int __fsverity_verify_signature(const struct inode *inode, const u8 *signature,
+				size_t sig_size, const u8 *file_digest,
+				unsigned int digest_algorithm)
+{
 	struct fsverity_formatted_digest *d;
+	struct fsverity_hash_alg *hash_alg = fsverity_get_hash_alg(inode,
+							digest_algorithm);
 	int err;
 
+	if (IS_ERR(hash_alg))
+		return PTR_ERR(hash_alg);
+
 	if (sig_size == 0) {
 		if (fsverity_require_signatures) {
 			fsverity_err(inode,
@@ -60,7 +87,7 @@ int fsverity_verify_signature(const struct fsverity_info *vi,
 	memcpy(d->magic, "FSVerity", 8);
 	d->digest_algorithm = cpu_to_le16(hash_alg - fsverity_hash_algs);
 	d->digest_size = cpu_to_le16(hash_alg->digest_size);
-	memcpy(d->digest, vi->file_digest, hash_alg->digest_size);
+	memcpy(d->digest, file_digest, hash_alg->digest_size);
 
 	err = verify_pkcs7_signature(d, sizeof(*d) + hash_alg->digest_size,
 				     signature, sig_size, fsverity_keyring,
@@ -83,9 +110,10 @@ int fsverity_verify_signature(const struct fsverity_info *vi,
 	}
 
 	pr_debug("Valid signature for file digest %s:%*phN\n",
-		 hash_alg->name, hash_alg->digest_size, vi->file_digest);
+		 hash_alg->name, hash_alg->digest_size, file_digest);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(__fsverity_verify_signature);
 
 #ifdef CONFIG_SYSCTL
 static struct ctl_table_header *fsverity_sysctl_header;
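
Exporting __fsverity_verify_signature() lets code outside fs/verity check a PKCS#7 signature against a digest it obtained on its own. A hedged sketch of such a caller, assuming it has already parsed out the signature blob and digest (the wrapper is hypothetical; FS_VERITY_HASH_ALG_SHA256 is the same index used by fsverity_hash_algs[]):

	static int example_check_detached_sig(const struct inode *inode,
					      const u8 *sig, size_t sig_size,
					      const u8 *digest)
	{
		/* Returns 0 if the signature is valid or not required. */
		return __fsverity_verify_signature(inode, sig, sig_size, digest,
						   FS_VERITY_HASH_ALG_SHA256);
	}
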
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index bde8c9b..961ba24 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -200,9 +200,8 @@ EXPORT_SYMBOL_GPL(fsverity_verify_page);
  * @bio: the bio to verify
  *
  * Verify a set of pages that have just been read from a verity file.  The pages
- * must be pagecache pages that are still locked and not yet uptodate.  Pages
- * that fail verification are set to the Error state.  Verification is skipped
- * for pages already in the Error state, e.g. due to fscrypt decryption failure.
+ * must be pagecache pages that are still locked and not yet uptodate.  If a
+ * page fails verification, then bio->bi_status is set to an error status.
  *
  * This is a helper function for use by the ->readahead() method of filesystems
  * that issue bios to read data directly into the page cache.  Filesystems that
@@ -244,9 +243,10 @@ void fsverity_verify_bio(struct bio *bio)
 		unsigned long level0_ra_pages =
 			min(max_ra_pages, params->level0_blocks - level0_index);
 
-		if (!PageError(page) &&
-		    !verify_page(inode, vi, req, page, level0_ra_pages))
-			SetPageError(page);
+		if (!verify_page(inode, vi, req, page, level0_ra_pages)) {
+			bio->bi_status = BLK_STS_IOERR;
+			break;
+		}
 	}
 
 	fsverity_free_hash_request(params->hash_alg, req);
diff --git a/include/OWNERS b/include/OWNERS
new file mode 100644
index 0000000..4b815c2
--- /dev/null
+++ b/include/OWNERS
@@ -0,0 +1 @@
+per-file net/**=file:/net/OWNERS
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index a68f8fbf..4c44a29 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -80,24 +80,24 @@ DECLARE_TRACEPOINT(rwmmio_read);
 DECLARE_TRACEPOINT(rwmmio_post_read);
 
 void log_write_mmio(u64 val, u8 width, volatile void __iomem *addr,
-		    unsigned long caller_addr);
+		    unsigned long caller_addr, unsigned long caller_addr0);
 void log_post_write_mmio(u64 val, u8 width, volatile void __iomem *addr,
-			 unsigned long caller_addr);
+			 unsigned long caller_addr, unsigned long caller_addr0);
 void log_read_mmio(u8 width, const volatile void __iomem *addr,
-		   unsigned long caller_addr);
+		   unsigned long caller_addr, unsigned long caller_addr0);
 void log_post_read_mmio(u64 val, u8 width, const volatile void __iomem *addr,
-			unsigned long caller_addr);
+			unsigned long caller_addr, unsigned long caller_addr0);
 
 #else
 
 static inline void log_write_mmio(u64 val, u8 width, volatile void __iomem *addr,
-				  unsigned long caller_addr) {}
+				  unsigned long caller_addr, unsigned long caller_addr0) {}
 static inline void log_post_write_mmio(u64 val, u8 width, volatile void __iomem *addr,
-				       unsigned long caller_addr) {}
+				       unsigned long caller_addr, unsigned long caller_addr0) {}
 static inline void log_read_mmio(u8 width, const volatile void __iomem *addr,
-				 unsigned long caller_addr) {}
+				 unsigned long caller_addr, unsigned long caller_addr0) {}
 static inline void log_post_read_mmio(u64 val, u8 width, const volatile void __iomem *addr,
-				      unsigned long caller_addr) {}
+				      unsigned long caller_addr, unsigned long caller_addr0) {}
 
 #endif /* CONFIG_TRACE_MMIO_ACCESS */
 
@@ -188,11 +188,11 @@ static inline u8 readb(const volatile void __iomem *addr)
 {
 	u8 val;
 
-	log_read_mmio(8, addr, _THIS_IP_);
+	log_read_mmio(8, addr, _THIS_IP_, _RET_IP_);
 	__io_br();
 	val = __raw_readb(addr);
 	__io_ar(val);
-	log_post_read_mmio(val, 8, addr, _THIS_IP_);
+	log_post_read_mmio(val, 8, addr, _THIS_IP_, _RET_IP_);
 	return val;
 }
 #endif
@@ -203,11 +203,11 @@ static inline u16 readw(const volatile void __iomem *addr)
 {
 	u16 val;
 
-	log_read_mmio(16, addr, _THIS_IP_);
+	log_read_mmio(16, addr, _THIS_IP_, _RET_IP_);
 	__io_br();
 	val = __le16_to_cpu((__le16 __force)__raw_readw(addr));
 	__io_ar(val);
-	log_post_read_mmio(val, 16, addr, _THIS_IP_);
+	log_post_read_mmio(val, 16, addr, _THIS_IP_, _RET_IP_);
 	return val;
 }
 #endif
@@ -218,11 +218,11 @@ static inline u32 readl(const volatile void __iomem *addr)
 {
 	u32 val;
 
-	log_read_mmio(32, addr, _THIS_IP_);
+	log_read_mmio(32, addr, _THIS_IP_, _RET_IP_);
 	__io_br();
 	val = __le32_to_cpu((__le32 __force)__raw_readl(addr));
 	__io_ar(val);
-	log_post_read_mmio(val, 32, addr, _THIS_IP_);
+	log_post_read_mmio(val, 32, addr, _THIS_IP_, _RET_IP_);
 	return val;
 }
 #endif
@@ -234,11 +234,11 @@ static inline u64 readq(const volatile void __iomem *addr)
 {
 	u64 val;
 
-	log_read_mmio(64, addr, _THIS_IP_);
+	log_read_mmio(64, addr, _THIS_IP_, _RET_IP_);
 	__io_br();
 	val = __le64_to_cpu(__raw_readq(addr));
 	__io_ar(val);
-	log_post_read_mmio(val, 64, addr, _THIS_IP_);
+	log_post_read_mmio(val, 64, addr, _THIS_IP_, _RET_IP_);
 	return val;
 }
 #endif
@@ -248,11 +248,11 @@ static inline u64 readq(const volatile void __iomem *addr)
 #define writeb writeb
 static inline void writeb(u8 value, volatile void __iomem *addr)
 {
-	log_write_mmio(value, 8, addr, _THIS_IP_);
+	log_write_mmio(value, 8, addr, _THIS_IP_, _RET_IP_);
 	__io_bw();
 	__raw_writeb(value, addr);
 	__io_aw();
-	log_post_write_mmio(value, 8, addr, _THIS_IP_);
+	log_post_write_mmio(value, 8, addr, _THIS_IP_, _RET_IP_);
 }
 #endif
 
@@ -260,11 +260,11 @@ static inline void writeb(u8 value, volatile void __iomem *addr)
 #define writew writew
 static inline void writew(u16 value, volatile void __iomem *addr)
 {
-	log_write_mmio(value, 16, addr, _THIS_IP_);
+	log_write_mmio(value, 16, addr, _THIS_IP_, _RET_IP_);
 	__io_bw();
 	__raw_writew((u16 __force)cpu_to_le16(value), addr);
 	__io_aw();
-	log_post_write_mmio(value, 16, addr, _THIS_IP_);
+	log_post_write_mmio(value, 16, addr, _THIS_IP_, _RET_IP_);
 }
 #endif
 
@@ -272,11 +272,11 @@ static inline void writew(u16 value, volatile void __iomem *addr)
 #define writel writel
 static inline void writel(u32 value, volatile void __iomem *addr)
 {
-	log_write_mmio(value, 32, addr, _THIS_IP_);
+	log_write_mmio(value, 32, addr, _THIS_IP_, _RET_IP_);
 	__io_bw();
 	__raw_writel((u32 __force)__cpu_to_le32(value), addr);
 	__io_aw();
-	log_post_write_mmio(value, 32, addr, _THIS_IP_);
+	log_post_write_mmio(value, 32, addr, _THIS_IP_, _RET_IP_);
 }
 #endif
 
@@ -285,11 +285,11 @@ static inline void writel(u32 value, volatile void __iomem *addr)
 #define writeq writeq
 static inline void writeq(u64 value, volatile void __iomem *addr)
 {
-	log_write_mmio(value, 64, addr, _THIS_IP_);
+	log_write_mmio(value, 64, addr, _THIS_IP_, _RET_IP_);
 	__io_bw();
 	__raw_writeq(__cpu_to_le64(value), addr);
 	__io_aw();
-	log_post_write_mmio(value, 64, addr, _THIS_IP_);
+	log_post_write_mmio(value, 64, addr, _THIS_IP_, _RET_IP_);
 }
 #endif
 #endif /* CONFIG_64BIT */
@@ -305,9 +305,9 @@ static inline u8 readb_relaxed(const volatile void __iomem *addr)
 {
 	u8 val;
 
-	log_read_mmio(8, addr, _THIS_IP_);
+	log_read_mmio(8, addr, _THIS_IP_, _RET_IP_);
 	val = __raw_readb(addr);
-	log_post_read_mmio(val, 8, addr, _THIS_IP_);
+	log_post_read_mmio(val, 8, addr, _THIS_IP_, _RET_IP_);
 	return val;
 }
 #endif
@@ -318,9 +318,9 @@ static inline u16 readw_relaxed(const volatile void __iomem *addr)
 {
 	u16 val;
 
-	log_read_mmio(16, addr, _THIS_IP_);
+	log_read_mmio(16, addr, _THIS_IP_, _RET_IP_);
 	val = __le16_to_cpu(__raw_readw(addr));
-	log_post_read_mmio(val, 16, addr, _THIS_IP_);
+	log_post_read_mmio(val, 16, addr, _THIS_IP_, _RET_IP_);
 	return val;
 }
 #endif
@@ -331,9 +331,9 @@ static inline u32 readl_relaxed(const volatile void __iomem *addr)
 {
 	u32 val;
 
-	log_read_mmio(32, addr, _THIS_IP_);
+	log_read_mmio(32, addr, _THIS_IP_, _RET_IP_);
 	val = __le32_to_cpu(__raw_readl(addr));
-	log_post_read_mmio(val, 32, addr, _THIS_IP_);
+	log_post_read_mmio(val, 32, addr, _THIS_IP_, _RET_IP_);
 	return val;
 }
 #endif
@@ -344,9 +344,9 @@ static inline u64 readq_relaxed(const volatile void __iomem *addr)
 {
 	u64 val;
 
-	log_read_mmio(64, addr, _THIS_IP_);
+	log_read_mmio(64, addr, _THIS_IP_, _RET_IP_);
 	val = __le64_to_cpu(__raw_readq(addr));
-	log_post_read_mmio(val, 64, addr, _THIS_IP_);
+	log_post_read_mmio(val, 64, addr, _THIS_IP_, _RET_IP_);
 	return val;
 }
 #endif
@@ -355,9 +355,9 @@ static inline u64 readq_relaxed(const volatile void __iomem *addr)
 #define writeb_relaxed writeb_relaxed
 static inline void writeb_relaxed(u8 value, volatile void __iomem *addr)
 {
-	log_write_mmio(value, 8, addr, _THIS_IP_);
+	log_write_mmio(value, 8, addr, _THIS_IP_, _RET_IP_);
 	__raw_writeb(value, addr);
-	log_post_write_mmio(value, 8, addr, _THIS_IP_);
+	log_post_write_mmio(value, 8, addr, _THIS_IP_, _RET_IP_);
 }
 #endif
 
@@ -365,9 +365,9 @@ static inline void writeb_relaxed(u8 value, volatile void __iomem *addr)
 #define writew_relaxed writew_relaxed
 static inline void writew_relaxed(u16 value, volatile void __iomem *addr)
 {
-	log_write_mmio(value, 16, addr, _THIS_IP_);
+	log_write_mmio(value, 16, addr, _THIS_IP_, _RET_IP_);
 	__raw_writew(cpu_to_le16(value), addr);
-	log_post_write_mmio(value, 16, addr, _THIS_IP_);
+	log_post_write_mmio(value, 16, addr, _THIS_IP_, _RET_IP_);
 }
 #endif
 
@@ -375,9 +375,9 @@ static inline void writew_relaxed(u16 value, volatile void __iomem *addr)
 #define writel_relaxed writel_relaxed
 static inline void writel_relaxed(u32 value, volatile void __iomem *addr)
 {
-	log_write_mmio(value, 32, addr, _THIS_IP_);
+	log_write_mmio(value, 32, addr, _THIS_IP_, _RET_IP_);
 	__raw_writel(__cpu_to_le32(value), addr);
-	log_post_write_mmio(value, 32, addr, _THIS_IP_);
+	log_post_write_mmio(value, 32, addr, _THIS_IP_, _RET_IP_);
 }
 #endif
 
@@ -385,9 +385,9 @@ static inline void writel_relaxed(u32 value, volatile void __iomem *addr)
 #define writeq_relaxed writeq_relaxed
 static inline void writeq_relaxed(u64 value, volatile void __iomem *addr)
 {
-	log_write_mmio(value, 64, addr, _THIS_IP_);
+	log_write_mmio(value, 64, addr, _THIS_IP_, _RET_IP_);
 	__raw_writeq(__cpu_to_le64(value), addr);
-	log_post_write_mmio(value, 64, addr, _THIS_IP_);
+	log_post_write_mmio(value, 64, addr, _THIS_IP_, _RET_IP_);
 }
 #endif
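
Each accessor now logs two instruction pointers: _THIS_IP_ still resolves inside the (inlined) accessor body, while the added caller_addr0 argument carries _RET_IP_, the accessor's caller. A small illustration with a hypothetical driver helper:

	static u32 hw_read_status(void __iomem *base)
	{
		/*
		 * readl() is inlined here, so _THIS_IP_ lands inside
		 * hw_read_status() itself, while _RET_IP_ (caller_addr0)
		 * identifies whichever function called hw_read_status(),
		 * giving the rwmmio trace one more level of call context.
		 */
		return readl(base + 0x04);
	}
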
 
diff --git a/include/drm/drm_mode_object.h b/include/drm/drm_mode_object.h
index 912f1e4..08d7a7f 100644
--- a/include/drm/drm_mode_object.h
+++ b/include/drm/drm_mode_object.h
@@ -60,7 +60,7 @@ struct drm_mode_object {
 	void (*free_cb)(struct kref *kref);
 };
 
-#define DRM_OBJECT_MAX_PROPERTY 24
+#define DRM_OBJECT_MAX_PROPERTY 64
 /**
  * struct drm_object_properties - property tracking for &drm_mode_object
  */
diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
index 6e55b92..8170930 100644
--- a/include/kvm/arm_psci.h
+++ b/include/kvm/arm_psci.h
@@ -36,6 +36,35 @@ static inline int kvm_psci_version(struct kvm_vcpu *vcpu)
 	return KVM_ARM_PSCI_0_1;
 }
 
+/* Narrow the PSCI register arguments (r1 to r3) to 32 bits. */
+static inline void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu)
+{
+	int i;
+
+	/*
+	 * Zero the input registers' upper 32 bits. They will be fully
+	 * zeroed on exit, so we're fine changing them in place.
+	 */
+	for (i = 1; i < 4; i++)
+		vcpu_set_reg(vcpu, i, lower_32_bits(vcpu_get_reg(vcpu, i)));
+}
+
+static inline bool kvm_psci_valid_affinity(struct kvm_vcpu *vcpu,
+					   unsigned long affinity)
+{
+	return !(affinity & ~MPIDR_HWID_BITMASK);
+}
+
+#define AFFINITY_MASK(level)	~((0x1UL << ((level) * MPIDR_LEVEL_BITS)) - 1)
+
+static inline unsigned long psci_affinity_mask(unsigned long affinity_level)
+{
+	if (affinity_level <= 3)
+		return MPIDR_HWID_BITMASK & AFFINITY_MASK(affinity_level);
+
+	return 0;
+}
 
 int kvm_psci_call(struct kvm_vcpu *vcpu);
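
A worked example of the affinity helpers, assuming the usual arm64 value MPIDR_LEVEL_BITS == 8:

	/*
	 * AFFINITY_MASK(0) == ~0x0UL     compare all of Aff3.Aff2.Aff1.Aff0
	 * AFFINITY_MASK(1) == ~0xffUL    ignore Aff0, compare Aff1 and above
	 * AFFINITY_MASK(2) == ~0xffffUL  ignore Aff0 and Aff1
	 *
	 * psci_affinity_mask() clamps the result with MPIDR_HWID_BITMASK and
	 * returns 0 for levels above 3, which an AFFINITY_INFO handler can
	 * treat as the invalid-parameter case.
	 */
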
 
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 4df9e73..ffeffbc 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -384,8 +384,7 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid);
 int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
 
 void kvm_vgic_load(struct kvm_vcpu *vcpu);
-void kvm_vgic_put(struct kvm_vcpu *vcpu);
-void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu);
+void kvm_vgic_put(struct kvm_vcpu *vcpu, bool blocking);
 
 #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)	((k)->arch.vgic.initialized)
diff --git a/include/linux/OWNERS b/include/linux/OWNERS
new file mode 100644
index 0000000..68b6ded
--- /dev/null
+++ b/include/linux/OWNERS
@@ -0,0 +1,4 @@
+per-file bio.h=file:/block/OWNERS
+per-file blk*.h=file:/block/OWNERS
+per-file f2fs**=file:/fs/f2fs/OWNERS
+per-file net**=file:/net/OWNERS
diff --git a/include/linux/android_kabi.h b/include/linux/android_kabi.h
new file mode 100644
index 0000000..f6dd7f0
--- /dev/null
+++ b/include/linux/android_kabi.h
@@ -0,0 +1,117 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * android_kabi.h - Android kernel abi abstraction header
+ *
+ * Copyright (C) 2020 Google, Inc.
+ *
+ * Heavily influenced by rh_kabi.h which came from the RHEL/CENTOS kernel and
+ * was:
+ *	Copyright (c) 2014 Don Zickus
+ *	Copyright (c) 2015-2018 Jiri Benc
+ *	Copyright (c) 2015 Sabrina Dubroca, Hannes Frederic Sowa
+ *	Copyright (c) 2016-2018 Prarit Bhargava
+ *	Copyright (c) 2017 Paolo Abeni, Larry Woodman
+ *
+ * These macros are to be used to help alleviate future kernel abi
+ * changes that will occur as LTS and other kernel patches are merged into the
+ * tree during a period in which the kernel abi must not be disturbed.
+ *
+ * There are two times these macros should be used:
+ *  - Before the kernel abi is "frozen"
+ *    Padding can be added to various kernel structures that have in the past
+ *    been known to change over time.  That will give "room" in the structure
+ *    that can then be used when fields are added so that the structure size
+ *    will not change.
+ *
+ *  - After the kernel abi is "frozen"
+ *    If a structure's field is changed to a type that is identical in size to
+ *    the previous type, it can be changed with a union macro.
+ *    If a field is added to a structure, the padding fields can be used to add
+ *    the new field in a "safe" way.
+ */
+#ifndef _ANDROID_KABI_H
+#define _ANDROID_KABI_H
+
+#include <linux/compiler.h>
+
+/*
+ * Worker macros, don't use these, use the ones without a leading '_'
+ */
+
+#define __ANDROID_KABI_CHECK_SIZE_ALIGN(_orig, _new)				\
+	union {									\
+		_Static_assert(sizeof(struct{_new;}) <= sizeof(struct{_orig;}),	\
+			       __FILE__ ":" __stringify(__LINE__) ": "		\
+			       __stringify(_new)				\
+			       " is larger than "				\
+			       __stringify(_orig) );				\
+		_Static_assert(__alignof__(struct{_new;}) <= __alignof__(struct{_orig;}),	\
+			       __FILE__ ":" __stringify(__LINE__) ": "		\
+			       __stringify(_orig)				\
+			       " is not aligned the same as "			\
+			       __stringify(_new) );				\
+	}
+
+#ifdef __GENKSYMS__
+
+#define _ANDROID_KABI_REPLACE(_orig, _new)		_orig
+
+#else
+
+#define _ANDROID_KABI_REPLACE(_orig, _new)			\
+	union {							\
+		_new;						\
+		struct {					\
+			_orig;					\
+		};						\
+		__ANDROID_KABI_CHECK_SIZE_ALIGN(_orig, _new);	\
+	}
+
+#endif /* __GENKSYMS__ */
+
+#define _ANDROID_KABI_RESERVE(n)		u64 android_kabi_reserved##n
+
+
+/*
+ * Macros to use _before_ the ABI is frozen
+ */
+
+/*
+ * ANDROID_KABI_RESERVE
+ *   Reserve some "padding" in a structure for potential future use.
+ *   This normally placed at the end of a structure.
+ *   number: the "number" of the padding variable in the structure.  Start with
+ *   1 and go up.
+ */
+#ifdef CONFIG_ANDROID_KABI_RESERVE
+#define ANDROID_KABI_RESERVE(number)	_ANDROID_KABI_RESERVE(number)
+#else
+#define ANDROID_KABI_RESERVE(number)
+#endif
+
+
+/*
+ * Macros to use _after_ the ABI is frozen
+ */
+
+/*
+ * ANDROID_KABI_USE(number, _new)
+ *   Use a previous padding entry that was defined with ANDROID_KABI_RESERVE
+ *   number: the previous "number" of the padding variable
+ *   _new: the variable to use now instead of the padding variable
+ */
+#define ANDROID_KABI_USE(number, _new)		\
+	_ANDROID_KABI_REPLACE(_ANDROID_KABI_RESERVE(number), _new)
+
+/*
+ * ANDROID_KABI_USE2(number, _new1, _new2)
+ *   Use a previous padding entry that was defined with ANDROID_KABI_RESERVE for
+ *   two new variables that fit into 64 bits.  This is good for when you do not
+ *   want to "burn" a 64bit padding variable for a smaller variable size if not
+ *   needed.
+ */
+#define ANDROID_KABI_USE2(number, _new1, _new2)			\
+	_ANDROID_KABI_REPLACE(_ANDROID_KABI_RESERVE(number), struct{ _new1; _new2; })
+
+
+#endif /* _ANDROID_KABI_H */
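
A minimal sketch of the reserve-then-use flow the header describes (the structure and its members are hypothetical):

	/* Before the ABI freeze: pad structures that tend to grow. */
	struct example_ops {
		int (*probe)(void);
		ANDROID_KABI_RESERVE(1);
		ANDROID_KABI_RESERVE(2);
	};

	/* After the freeze: the same structure, consuming reserved slot 1. */
	struct example_ops {
		int (*probe)(void);
		ANDROID_KABI_USE(1, void (*shutdown)(void));
		ANDROID_KABI_RESERVE(2);
	};

Because ANDROID_KABI_USE() expands to a union over the old u64, size and layout are unchanged, and the _Static_asserts above reject any replacement that is larger or more strictly aligned than the reserved slot.
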
diff --git a/include/linux/android_vendor.h b/include/linux/android_vendor.h
new file mode 100644
index 0000000..af3014c
--- /dev/null
+++ b/include/linux/android_vendor.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * android_vendor.h - Android vendor data
+ *
+ * Copyright 2020 Google LLC
+ *
+ * These macros are to be used to reserve space in kernel data structures
+ * for use by vendor modules.
+ *
+ * These macros should be used before the kernel abi is "frozen".
+ * Fields can be added to various kernel structures that need space
+ * for functionality implemented in vendor modules. The use of
+ * these fields is vendor specific.
+ */
+#ifndef _ANDROID_VENDOR_H
+#define _ANDROID_VENDOR_H
+
+/*
+ * ANDROID_VENDOR_DATA
+ *   Reserve some "padding" in a structure for potential future use.
+ *   This is normally placed at the end of a structure.
+ *   number: the "number" of the padding variable in the structure.  Start with
+ *   1 and go up.
+ *
+ * ANDROID_VENDOR_DATA_ARRAY
+ *   Same as ANDROID_VENDOR_DATA but allocates an array of u64 with
+ *   the specified size
+ */
+#ifdef CONFIG_ANDROID_VENDOR_OEM_DATA
+#define ANDROID_VENDOR_DATA(n)		u64 android_vendor_data##n
+#define ANDROID_VENDOR_DATA_ARRAY(n, s)	u64 android_vendor_data##n[s]
+
+#define ANDROID_OEM_DATA(n)		u64 android_oem_data##n
+#define ANDROID_OEM_DATA_ARRAY(n, s)	u64 android_oem_data##n[s]
+
+#define android_init_vendor_data(p, n) \
+	memset(&p->android_vendor_data##n, 0, sizeof(p->android_vendor_data##n))
+#define android_init_oem_data(p, n) \
+	memset(&p->android_oem_data##n, 0, sizeof(p->android_oem_data##n))
+#else
+#define ANDROID_VENDOR_DATA(n)
+#define ANDROID_VENDOR_DATA_ARRAY(n, s)
+#define ANDROID_OEM_DATA(n)
+#define ANDROID_OEM_DATA_ARRAY(n, s)
+
+#define android_init_vendor_data(p, n)
+#define android_init_oem_data(p, n)
+#endif
+
+#endif /* _ANDROID_VENDOR_H */
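
A short sketch of how a core structure carries vendor space and zeroes it at init time (the structure is hypothetical; the arch_topology.h hunk below applies the same macro to struct cpu_topology):

	struct example_state {
		int mode;
		/* Two u64 slots for vendor module hooks: android_vendor_data1[2]. */
		ANDROID_VENDOR_DATA_ARRAY(1, 2);
		/* One u64 slot for the OEM: android_oem_data1. */
		ANDROID_OEM_DATA(1);
	};

	static void example_state_init(struct example_state *s)
	{
		/* Both compile away when CONFIG_ANDROID_VENDOR_OEM_DATA=n. */
		android_init_vendor_data(s, 1);
		android_init_oem_data(s, 1);
	}
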
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index a07b510..2fdd447 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -7,6 +7,7 @@
 
 #include <linux/types.h>
 #include <linux/percpu.h>
+#include <linux/android_vendor.h>
 
 void topology_normalize_cpu_scale(void);
 int topology_update_cpu_topology(void);
@@ -72,6 +73,8 @@ struct cpu_topology {
 	cpumask_t core_sibling;
 	cpumask_t cluster_sibling;
 	cpumask_t llc_sibling;
+
+	ANDROID_VENDOR_DATA_ARRAY(1, 1);
 };
 
 #ifdef CONFIG_GENERIC_ARCH_TOPOLOGY
@@ -93,5 +96,6 @@ void remove_cpu_topology(unsigned int cpuid);
 void reset_cpu_topology(void);
 int parse_acpi_topology(void);
 #endif
+extern bool topology_update_done;
 
 #endif /* _LINUX_ARCH_TOPOLOGY_H_ */
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index 220c8c6..b23906d 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -112,6 +112,14 @@
 /* KVM "vendor specific" services */
 #define ARM_SMCCC_KVM_FUNC_FEATURES		0
 #define ARM_SMCCC_KVM_FUNC_PTP			1
+#define ARM_SMCCC_KVM_FUNC_HYP_MEMINFO		2
+#define ARM_SMCCC_KVM_FUNC_MEM_SHARE		3
+#define ARM_SMCCC_KVM_FUNC_MEM_UNSHARE		4
+#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO	5
+#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL	6
+#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP	7
+#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP	8
+#define ARM_SMCCC_KVM_FUNC_MEM_RELINQUISH	9
 #define ARM_SMCCC_KVM_FUNC_FEATURES_2		127
 #define ARM_SMCCC_KVM_NUM_FUNCS			128
 
@@ -134,10 +142,58 @@
 			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
 			   ARM_SMCCC_KVM_FUNC_PTP)
 
+#define ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID			\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_HYP_MEMINFO)
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MEM_SHARE_FUNC_ID			\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_MEM_SHARE)
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MEM_UNSHARE_FUNC_ID			\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_MEM_UNSHARE)
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MEM_RELINQUISH_FUNC_ID			\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_MEM_RELINQUISH)
+
 /* ptp_kvm counter type ID */
 #define KVM_PTP_VIRT_COUNTER			0
 #define KVM_PTP_PHYS_COUNTER			1
 
+#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID		\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO)
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID		\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL)
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID			\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP)
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID		\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_64,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP)
+
 /* Paravirtualised time calls (defined by ARM DEN0057A) */
 #define ARM_SMCCC_HV_PV_TIME_FEATURES				\
 	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
diff --git a/include/linux/arm_ffa.h b/include/linux/arm_ffa.h
index 5f02d2e..b9f8103 100644
--- a/include/linux/arm_ffa.h
+++ b/include/linux/arm_ffa.h
@@ -11,6 +11,97 @@
 #include <linux/types.h>
 #include <linux/uuid.h>
 
+#define FFA_SMC(calling_convention, func_num)				\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, (calling_convention),	\
+			   ARM_SMCCC_OWNER_STANDARD, (func_num))
+
+#define FFA_SMC_32(func_num)	FFA_SMC(ARM_SMCCC_SMC_32, (func_num))
+#define FFA_SMC_64(func_num)	FFA_SMC(ARM_SMCCC_SMC_64, (func_num))
+
+#define FFA_ERROR			FFA_SMC_32(0x60)
+#define FFA_SUCCESS			FFA_SMC_32(0x61)
+#define FFA_INTERRUPT			FFA_SMC_32(0x62)
+#define FFA_VERSION			FFA_SMC_32(0x63)
+#define FFA_FEATURES			FFA_SMC_32(0x64)
+#define FFA_RX_RELEASE			FFA_SMC_32(0x65)
+#define FFA_RXTX_MAP			FFA_SMC_32(0x66)
+#define FFA_FN64_RXTX_MAP		FFA_SMC_64(0x66)
+#define FFA_RXTX_UNMAP			FFA_SMC_32(0x67)
+#define FFA_PARTITION_INFO_GET		FFA_SMC_32(0x68)
+#define FFA_ID_GET			FFA_SMC_32(0x69)
+#define FFA_MSG_POLL			FFA_SMC_32(0x6A)
+#define FFA_MSG_WAIT			FFA_SMC_32(0x6B)
+#define FFA_YIELD			FFA_SMC_32(0x6C)
+#define FFA_RUN				FFA_SMC_32(0x6D)
+#define FFA_MSG_SEND			FFA_SMC_32(0x6E)
+#define FFA_MSG_SEND_DIRECT_REQ		FFA_SMC_32(0x6F)
+#define FFA_FN64_MSG_SEND_DIRECT_REQ	FFA_SMC_64(0x6F)
+#define FFA_MSG_SEND_DIRECT_RESP	FFA_SMC_32(0x70)
+#define FFA_FN64_MSG_SEND_DIRECT_RESP	FFA_SMC_64(0x70)
+#define FFA_MEM_DONATE			FFA_SMC_32(0x71)
+#define FFA_FN64_MEM_DONATE		FFA_SMC_64(0x71)
+#define FFA_MEM_LEND			FFA_SMC_32(0x72)
+#define FFA_FN64_MEM_LEND		FFA_SMC_64(0x72)
+#define FFA_MEM_SHARE			FFA_SMC_32(0x73)
+#define FFA_FN64_MEM_SHARE		FFA_SMC_64(0x73)
+#define FFA_MEM_RETRIEVE_REQ		FFA_SMC_32(0x74)
+#define FFA_FN64_MEM_RETRIEVE_REQ	FFA_SMC_64(0x74)
+#define FFA_MEM_RETRIEVE_RESP		FFA_SMC_32(0x75)
+#define FFA_MEM_RELINQUISH		FFA_SMC_32(0x76)
+#define FFA_MEM_RECLAIM			FFA_SMC_32(0x77)
+#define FFA_MEM_OP_PAUSE		FFA_SMC_32(0x78)
+#define FFA_MEM_OP_RESUME		FFA_SMC_32(0x79)
+#define FFA_MEM_FRAG_RX			FFA_SMC_32(0x7A)
+#define FFA_MEM_FRAG_TX			FFA_SMC_32(0x7B)
+#define FFA_NORMAL_WORLD_RESUME		FFA_SMC_32(0x7C)
+
+/*
+ * For some calls it is necessary to use SMC64 to pass or return 64-bit values.
+ * For such calls FFA_FN_NATIVE(name) will choose the appropriate
+ * (native-width) function ID.
+ */
+#ifdef CONFIG_64BIT
+#define FFA_FN_NATIVE(name)	FFA_FN64_##name
+#else
+#define FFA_FN_NATIVE(name)	FFA_##name
+#endif
+
+/* FFA error codes. */
+#define FFA_RET_SUCCESS            (0)
+#define FFA_RET_NOT_SUPPORTED      (-1)
+#define FFA_RET_INVALID_PARAMETERS (-2)
+#define FFA_RET_NO_MEMORY          (-3)
+#define FFA_RET_BUSY               (-4)
+#define FFA_RET_INTERRUPTED        (-5)
+#define FFA_RET_DENIED             (-6)
+#define FFA_RET_RETRY              (-7)
+#define FFA_RET_ABORTED            (-8)
+
+/* FFA version encoding */
+#define FFA_MAJOR_VERSION_MASK	GENMASK(30, 16)
+#define FFA_MINOR_VERSION_MASK	GENMASK(15, 0)
+#define FFA_MAJOR_VERSION(x)	((u16)(FIELD_GET(FFA_MAJOR_VERSION_MASK, (x))))
+#define FFA_MINOR_VERSION(x)	((u16)(FIELD_GET(FFA_MINOR_VERSION_MASK, (x))))
+#define FFA_PACK_VERSION_INFO(major, minor)			\
+	(FIELD_PREP(FFA_MAJOR_VERSION_MASK, (major)) |		\
+	 FIELD_PREP(FFA_MINOR_VERSION_MASK, (minor)))
+#define FFA_VERSION_1_0		FFA_PACK_VERSION_INFO(1, 0)
+
+/*
+ * The FF-A specification explicitly refers to '4K pages'. This should
+ * not be confused with the kernel PAGE_SIZE, which is the translation
+ * granule the kernel is configured with and may be one of 4K, 16K or 64K.
+ */
+#define FFA_PAGE_SIZE		SZ_4K
+
+/*
+ * Minimum buffer size/alignment encodings returned by an FFA_FEATURES
+ * query for FFA_RXTX_MAP.
+ */
+#define FFA_FEAT_RXTX_MIN_SZ_4K		0
+#define FFA_FEAT_RXTX_MIN_SZ_64K	1
+#define FFA_FEAT_RXTX_MIN_SZ_16K	2
+
 /* FFA Bus/Device/Driver related */
 struct ffa_device {
 	int vm_id;
@@ -161,11 +252,11 @@ struct ffa_mem_region_attributes {
 	 */
 #define FFA_MEM_RETRIEVE_SELF_BORROWER	BIT(0)
 	u8 flag;
-	u32 composite_off;
 	/*
 	 * Offset in bytes from the start of the outer `ffa_memory_region` to
 	 * an `struct ffa_mem_region_addr_range`.
 	 */
+	u32 composite_off;
 	u64 reserved;
 };
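
The version macros pack the major number into bits 30:16 and the minor into bits 15:0, so they round-trip as in this small sketch:

	static void ffa_version_example(void)
	{
		/* FFA_PACK_VERSION_INFO(1, 0) == 0x00010000 == FFA_VERSION_1_0 */
		u32 ver = FFA_PACK_VERSION_INFO(1, 0);

		WARN_ON(FFA_MAJOR_VERSION(ver) != 1);	/* bits 30:16 */
		WARN_ON(FFA_MINOR_VERSION(ver) != 0);	/* bits 15:0  */
	}
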
 
diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
index 5ca2d56..318cfe8 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -43,6 +43,7 @@
 #include <linux/err.h>
 #include <linux/fs.h>
 #include <linux/list.h>
+#include <linux/mem_relinquish.h>
 
 /*
  * Balloon device information descriptor.
@@ -95,6 +96,7 @@ static inline void balloon_page_insert(struct balloon_dev_info *balloon,
 	__SetPageMovable(page, &balloon_mops);
 	set_page_private(page, (unsigned long)balloon);
 	list_add(&page->lru, &balloon->pages);
+	page_relinquish(page);
 }
 
 /*
@@ -139,6 +141,7 @@ static inline void balloon_page_insert(struct balloon_dev_info *balloon,
 {
 	__SetPageOffline(page);
 	list_add(&page->lru, &balloon->pages);
+	page_relinquish(page);
 }
 
 static inline void balloon_page_delete(struct page *page)
diff --git a/include/linux/blk-crypto-profile.h b/include/linux/blk-crypto-profile.h
index bbab65b..8b30d04 100644
--- a/include/linux/blk-crypto-profile.h
+++ b/include/linux/blk-crypto-profile.h
@@ -57,6 +57,20 @@ struct blk_crypto_ll_ops {
 	int (*keyslot_evict)(struct blk_crypto_profile *profile,
 			     const struct blk_crypto_key *key,
 			     unsigned int slot);
+
+	/**
+	 * @derive_sw_secret: Derive the software secret from a hardware-wrapped
+	 *		      key in ephemerally-wrapped form.
+	 *
+	 * This only needs to be implemented if BLK_CRYPTO_KEY_TYPE_HW_WRAPPED
+	 * is supported.
+	 *
+	 * Must return 0 on success, -EBADMSG if the key is invalid, or another
+	 * -errno code on other errors.
+	 */
+	int (*derive_sw_secret)(struct blk_crypto_profile *profile,
+				const u8 *eph_key, size_t eph_key_size,
+				u8 sw_secret[BLK_CRYPTO_SW_SECRET_SIZE]);
 };
 
 /**
@@ -85,6 +99,12 @@ struct blk_crypto_profile {
 	unsigned int max_dun_bytes_supported;
 
 	/**
+	 * @key_types_supported: A bitmask of the supported key types:
+	 * BLK_CRYPTO_KEY_TYPE_STANDARD and/or BLK_CRYPTO_KEY_TYPE_HW_WRAPPED.
+	 */
+	unsigned int key_types_supported;
+
+	/**
 	 * @modes_supported: Array of bitmasks that specifies whether each
 	 * combination of crypto mode and data unit size is supported.
 	 * Specifically, the i'th bit of modes_supported[crypto_mode] is set if
@@ -138,18 +158,6 @@ int devm_blk_crypto_profile_init(struct device *dev,
 
 unsigned int blk_crypto_keyslot_index(struct blk_crypto_keyslot *slot);
 
-blk_status_t blk_crypto_get_keyslot(struct blk_crypto_profile *profile,
-				    const struct blk_crypto_key *key,
-				    struct blk_crypto_keyslot **slot_ptr);
-
-void blk_crypto_put_keyslot(struct blk_crypto_keyslot *slot);
-
-bool __blk_crypto_cfg_supported(struct blk_crypto_profile *profile,
-				const struct blk_crypto_config *cfg);
-
-int __blk_crypto_evict_key(struct blk_crypto_profile *profile,
-			   const struct blk_crypto_key *key);
-
 void blk_crypto_reprogram_all_keys(struct blk_crypto_profile *profile);
 
 void blk_crypto_profile_destroy(struct blk_crypto_profile *profile);
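
A hedged sketch of the new hook from a driver's point of view; the driver names and the example_hw_kdf() primitive are hypothetical, while the op signature and the key_types_supported field come from the header above:

	/* Hypothetical hardware primitive: unwrap eph_key and run the HW KDF. */
	int example_hw_kdf(const u8 *eph_key, size_t eph_key_size,
			   u8 *out, size_t out_size);

	static int example_derive_sw_secret(struct blk_crypto_profile *profile,
					    const u8 *eph_key, size_t eph_key_size,
					    u8 sw_secret[BLK_CRYPTO_SW_SECRET_SIZE])
	{
		/* Expected to return -EBADMSG for an invalid wrapped key. */
		return example_hw_kdf(eph_key, eph_key_size, sw_secret,
				      BLK_CRYPTO_SW_SECRET_SIZE);
	}

	static void example_profile_setup(struct blk_crypto_profile *profile)
	{
		profile->ll_ops.derive_sw_secret = example_derive_sw_secret;
		profile->key_types_supported = BLK_CRYPTO_KEY_TYPE_STANDARD |
					       BLK_CRYPTO_KEY_TYPE_HW_WRAPPED;
	}
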
diff --git a/include/linux/blk-crypto.h b/include/linux/blk-crypto.h
index 69b24fe..ef771c9 100644
--- a/include/linux/blk-crypto.h
+++ b/include/linux/blk-crypto.h
@@ -13,10 +13,62 @@ enum blk_crypto_mode_num {
 	BLK_ENCRYPTION_MODE_AES_256_XTS,
 	BLK_ENCRYPTION_MODE_AES_128_CBC_ESSIV,
 	BLK_ENCRYPTION_MODE_ADIANTUM,
+	BLK_ENCRYPTION_MODE_SM4_XTS,
 	BLK_ENCRYPTION_MODE_MAX,
 };
 
-#define BLK_CRYPTO_MAX_KEY_SIZE		64
+/*
+ * Supported types of keys.  Must be bitflags due to their use in
+ * blk_crypto_profile::key_types_supported.
+ */
+enum blk_crypto_key_type {
+	/*
+	 * Standard keys (i.e. "software keys").  These keys are simply kept in
+	 * raw, plaintext form in kernel memory.
+	 */
+	BLK_CRYPTO_KEY_TYPE_STANDARD = 1 << 0,
+
+	/*
+	 * Hardware-wrapped keys.  These keys are only present in kernel memory
+	 * in ephemerally-wrapped form, and they can only be unwrapped by
+	 * dedicated hardware.  For details, see the "Hardware-wrapped keys"
+	 * section of Documentation/block/inline-encryption.rst.
+	 */
+	BLK_CRYPTO_KEY_TYPE_HW_WRAPPED = 1 << 1,
+};
+
+/*
+ * Currently the maximum standard key size is 64 bytes, as that is the key size
+ * of BLK_ENCRYPTION_MODE_AES_256_XTS which takes the longest key.
+ *
+ * The maximum hardware-wrapped key size depends on the hardware's key wrapping
+ * algorithm, which is a hardware implementation detail, so it isn't precisely
+ * specified.  But currently 128 bytes is plenty in practice.  Implementations
+ * are recommended to wrap a 32-byte key for the hardware KDF with AES-256-GCM,
+ * which should result in a size closer to 64 bytes than 128.
+ *
+ * Both of these values can trivially be increased if ever needed.
+ */
+#define BLK_CRYPTO_MAX_STANDARD_KEY_SIZE	64
+#define BLK_CRYPTO_MAX_HW_WRAPPED_KEY_SIZE	128
+
+/* This should use max(), but max() doesn't work in a struct definition. */
+#define BLK_CRYPTO_MAX_ANY_KEY_SIZE \
+	(BLK_CRYPTO_MAX_HW_WRAPPED_KEY_SIZE > \
+	 BLK_CRYPTO_MAX_STANDARD_KEY_SIZE ? \
+	 BLK_CRYPTO_MAX_HW_WRAPPED_KEY_SIZE : BLK_CRYPTO_MAX_STANDARD_KEY_SIZE)
+
+/*
+ * Size of the "software secret" which can be derived from a hardware-wrapped
+ * key.  This is currently always 32 bytes.  Note, the choice of 32 bytes
+ * assumes that the software secret is only used directly for algorithms that
+ * don't require more than a 256-bit key to get the desired security strength.
+ * If it were to be used e.g. directly as an AES-256-XTS key, then this would
+ * need to be increased (which is possible if hardware supports it, but care
+ * would need to be taken to avoid breaking users who need exactly 32 bytes).
+ */
+#define BLK_CRYPTO_SW_SECRET_SIZE	32
+
 /**
  * struct blk_crypto_config - an inline encryption key's crypto configuration
  * @crypto_mode: encryption algorithm this key is for
@@ -25,20 +77,23 @@ enum blk_crypto_mode_num {
  *	ciphertext.  This is always a power of 2.  It might be e.g. the
  *	filesystem block size or the disk sector size.
  * @dun_bytes: the maximum number of bytes of DUN used when using this key
+ * @key_type: the type of this key -- either standard or hardware-wrapped
  */
 struct blk_crypto_config {
 	enum blk_crypto_mode_num crypto_mode;
 	unsigned int data_unit_size;
 	unsigned int dun_bytes;
+	enum blk_crypto_key_type key_type;
 };
 
 /**
  * struct blk_crypto_key - an inline encryption key
- * @crypto_cfg: the crypto configuration (like crypto_mode, key size) for this
- *		key
+ * @crypto_cfg: the crypto mode, data unit size, key type, and other
+ *		characteristics of this key and how it will be used
  * @data_unit_size_bits: log2 of data_unit_size
- * @size: size of this key in bytes (determined by @crypto_cfg.crypto_mode)
- * @raw: the raw bytes of this key.  Only the first @size bytes are used.
+ * @size: size of this key in bytes.  The size of a standard key is fixed for a
+ *	  given crypto mode, but the size of a hardware-wrapped key can vary.
+ * @raw: the bytes of this key.  Only the first @size bytes are significant.
  *
  * A blk_crypto_key is immutable once created, and many bios can reference it at
  * the same time.  It must not be freed until all bios using it have completed
@@ -48,7 +103,7 @@ struct blk_crypto_key {
 	struct blk_crypto_config crypto_cfg;
 	unsigned int data_unit_size_bits;
 	unsigned int size;
-	u8 raw[BLK_CRYPTO_MAX_KEY_SIZE];
+	u8 raw[BLK_CRYPTO_MAX_ANY_KEY_SIZE];
 };
 
 #define BLK_CRYPTO_MAX_IV_SIZE		32
@@ -71,9 +126,6 @@ struct bio_crypt_ctx {
 #include <linux/blk_types.h>
 #include <linux/blkdev.h>
 
-struct request;
-struct request_queue;
-
 #ifdef CONFIG_BLK_INLINE_ENCRYPTION
 
 static inline bool bio_has_crypt_ctx(struct bio *bio)
@@ -89,20 +141,28 @@ bool bio_crypt_dun_is_contiguous(const struct bio_crypt_ctx *bc,
 				 unsigned int bytes,
 				 const u64 next_dun[BLK_CRYPTO_DUN_ARRAY_SIZE]);
 
-int blk_crypto_init_key(struct blk_crypto_key *blk_key, const u8 *raw_key,
+int blk_crypto_init_key(struct blk_crypto_key *blk_key,
+			const u8 *raw_key, size_t raw_key_size,
+			enum blk_crypto_key_type key_type,
 			enum blk_crypto_mode_num crypto_mode,
 			unsigned int dun_bytes,
 			unsigned int data_unit_size);
 
-int blk_crypto_start_using_key(const struct blk_crypto_key *key,
-			       struct request_queue *q);
+int blk_crypto_start_using_key(struct block_device *bdev,
+			       const struct blk_crypto_key *key);
 
-int blk_crypto_evict_key(struct request_queue *q,
+int blk_crypto_evict_key(struct block_device *bdev,
 			 const struct blk_crypto_key *key);
 
-bool blk_crypto_config_supported(struct request_queue *q,
+bool blk_crypto_config_supported_natively(struct block_device *bdev,
+					  const struct blk_crypto_config *cfg);
+bool blk_crypto_config_supported(struct block_device *bdev,
 				 const struct blk_crypto_config *cfg);
 
+int blk_crypto_derive_sw_secret(struct block_device *bdev,
+				const u8 *eph_key, size_t eph_key_size,
+				u8 sw_secret[BLK_CRYPTO_SW_SECRET_SIZE]);
+
 #else /* CONFIG_BLK_INLINE_ENCRYPTION */
 
 static inline bool bio_has_crypt_ctx(struct bio *bio)
@@ -112,6 +172,9 @@ static inline bool bio_has_crypt_ctx(struct bio *bio)
 
 #endif /* CONFIG_BLK_INLINE_ENCRYPTION */
 
+static inline void bio_clone_skip_dm_default_key(struct bio *dst,
+						 const struct bio *src);
+
 int __bio_crypt_clone(struct bio *dst, struct bio *src, gfp_t gfp_mask);
 /**
  * bio_crypt_clone - clone bio encryption context
@@ -127,9 +190,42 @@ int __bio_crypt_clone(struct bio *dst, struct bio *src, gfp_t gfp_mask);
 static inline int bio_crypt_clone(struct bio *dst, struct bio *src,
 				  gfp_t gfp_mask)
 {
+	bio_clone_skip_dm_default_key(dst, src);
 	if (bio_has_crypt_ctx(src))
 		return __bio_crypt_clone(dst, src, gfp_mask);
 	return 0;
 }
 
+#if IS_ENABLED(CONFIG_DM_DEFAULT_KEY)
+static inline void bio_set_skip_dm_default_key(struct bio *bio)
+{
+	bio->bi_skip_dm_default_key = true;
+}
+
+static inline bool bio_should_skip_dm_default_key(const struct bio *bio)
+{
+	return bio->bi_skip_dm_default_key;
+}
+
+static inline void bio_clone_skip_dm_default_key(struct bio *dst,
+						 const struct bio *src)
+{
+	dst->bi_skip_dm_default_key = src->bi_skip_dm_default_key;
+}
+#else /* CONFIG_DM_DEFAULT_KEY */
+static inline void bio_set_skip_dm_default_key(struct bio *bio)
+{
+}
+
+static inline bool bio_should_skip_dm_default_key(const struct bio *bio)
+{
+	return false;
+}
+
+static inline void bio_clone_skip_dm_default_key(struct bio *dst,
+						 const struct bio *src)
+{
+}
+#endif /* !CONFIG_DM_DEFAULT_KEY */
+
 #endif /* __LINUX_BLK_CRYPTO_H */
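
Upper layers now declare the key type explicitly when initializing a key. A sketch of setting up a standard (raw) AES-256-XTS key with the extended API (the helper is hypothetical; the prototypes match the header above):

	static int example_setup_key(struct block_device *bdev, const u8 *raw,
				     struct blk_crypto_key *key)
	{
		int err;

		err = blk_crypto_init_key(key, raw, 64,
					  BLK_CRYPTO_KEY_TYPE_STANDARD,
					  BLK_ENCRYPTION_MODE_AES_256_XTS,
					  16,	/* dun_bytes */
					  4096);	/* data_unit_size */
		if (err)
			return err;

		/* May fall back to the crypto API if the hardware lacks support. */
		return blk_crypto_start_using_key(bdev, key);
	}

A hardware-wrapped key would instead pass BLK_CRYPTO_KEY_TYPE_HW_WRAPPED and the wrapped blob's actual size, up to BLK_CRYPTO_MAX_HW_WRAPPED_KEY_SIZE.
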
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index e0b0980..e242b88 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -281,6 +281,9 @@ struct bio {
 
 #ifdef CONFIG_BLK_INLINE_ENCRYPTION
 	struct bio_crypt_ctx	*bi_crypt_context;
+#if IS_ENABLED(CONFIG_DM_DEFAULT_KEY)
+	bool			bi_skip_dm_default_key;
+#endif
 #endif
 
 	union {
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 2c6a4f25..1c61204 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -79,6 +79,9 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LSM, lsm,
 #endif
 BPF_PROG_TYPE(BPF_PROG_TYPE_SYSCALL, bpf_syscall,
 	      void *, void *)
+#ifdef CONFIG_FUSE_BPF
+BPF_PROG_TYPE(BPF_PROG_TYPE_FUSE, fuse, struct fuse_bpf_args, struct fuse_bpf_args)
+#endif
 
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index 267cd06..e989ab74 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -32,6 +32,7 @@
 #define CLK_OPS_PARENT_ENABLE	BIT(12)
 /* duty cycle call may be forwarded to the parent clock */
 #define CLK_DUTY_CYCLE_PARENT	BIT(13)
+#define CLK_DONT_HOLD_STATE	BIT(14) /* Don't hold state */
 
 struct clk;
 struct clk_hw;
@@ -217,6 +218,13 @@ struct clk_duty {
  *		directory is provided as an argument.  Called with
  *		prepare_lock held.  Returns 0 on success, -EERROR otherwise.
  *
+ * @pre_rate_change: Optional callback for a clock to fulfill its rate
+ *		change requirements before any rate change has occurred in
+ *		its clock tree. Returns 0 on success, -EERROR otherwise.
+ *
+ * @post_rate_change: Optional callback for a clock to clean up any
+ *		requirements that were needed while the clock and its tree
+ *		were changing states. Returns 0 on success, -EERROR otherwise.
  *
  * The clk_enable/clk_disable and clk_prepare/clk_unprepare pairs allow
  * implementations to split any work between atomic (enable) and sleepable
@@ -264,6 +272,12 @@ struct clk_ops {
 	int		(*init)(struct clk_hw *hw);
 	void		(*terminate)(struct clk_hw *hw);
 	void		(*debug_init)(struct clk_hw *hw, struct dentry *dentry);
+	int		(*pre_rate_change)(struct clk_hw *hw,
+					   unsigned long rate,
+					   unsigned long new_rate);
+	int		(*post_rate_change)(struct clk_hw *hw,
+					    unsigned long old_rate,
+					    unsigned long rate);
 };
 
 /**
@@ -1272,6 +1286,7 @@ int __must_check of_clk_hw_register(struct device_node *node, struct clk_hw *hw)
 void clk_unregister(struct clk *clk);
 
 void clk_hw_unregister(struct clk_hw *hw);
+void clk_sync_state(struct device *dev);
 
 /* helper functions */
 const char *__clk_get_name(const struct clk *clk);
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 52a9ff6..fb8a8f3 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -180,6 +180,8 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 extern void kcompactd_run(int nid);
 extern void kcompactd_stop(int nid);
 extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int highest_zoneidx);
+extern unsigned long isolate_and_split_free_page(struct page *page,
+				struct list_head *list);
 
 #else
 static inline void reset_isolation_suitable(pg_data_t *pgdat)
@@ -224,6 +226,12 @@ static inline void wakeup_kcompactd(pg_data_t *pgdat,
 {
 }
 
+static inline unsigned long isolate_and_split_free_page(struct page *page,
+				struct list_head *list)
+{
+	return 0;
+}
+
 #endif /* CONFIG_COMPACTION */
 
 struct node;
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index d5595d5..e74037b 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -1191,14 +1191,6 @@ static inline int of_perf_domain_get_sharing_cpumask(int pcpu, const char *list_
 }
 #endif
 
-#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
-void sched_cpufreq_governor_change(struct cpufreq_policy *policy,
-			struct cpufreq_governor *old_gov);
-#else
-static inline void sched_cpufreq_governor_change(struct cpufreq_policy *policy,
-			struct cpufreq_governor *old_gov) { }
-#endif
-
 extern unsigned int arch_freq_get_on_cpu(int cpu);
 
 #ifndef arch_set_freq_scale
diff --git a/include/linux/cpufreq_times.h b/include/linux/cpufreq_times.h
new file mode 100644
index 0000000..38272a5
--- /dev/null
+++ b/include/linux/cpufreq_times.h
@@ -0,0 +1,42 @@
+/* include/linux/cpufreq_times.h
+ *
+ * Copyright (C) 2018 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_CPUFREQ_TIMES_H
+#define _LINUX_CPUFREQ_TIMES_H
+
+#include <linux/cpufreq.h>
+#include <linux/pid.h>
+
+#ifdef CONFIG_CPU_FREQ_TIMES
+void cpufreq_task_times_init(struct task_struct *p);
+void cpufreq_task_times_alloc(struct task_struct *p);
+void cpufreq_task_times_exit(struct task_struct *p);
+int proc_time_in_state_show(struct seq_file *m, struct pid_namespace *ns,
+			    struct pid *pid, struct task_struct *p);
+void cpufreq_acct_update_power(struct task_struct *p, u64 cputime);
+void cpufreq_times_create_policy(struct cpufreq_policy *policy);
+void cpufreq_times_record_transition(struct cpufreq_policy *policy,
+                                     unsigned int new_freq);
+#else
+static inline void cpufreq_task_times_init(struct task_struct *p) {}
+static inline void cpufreq_task_times_alloc(struct task_struct *p) {}
+static inline void cpufreq_task_times_exit(struct task_struct *p) {}
+static inline void cpufreq_acct_update_power(struct task_struct *p,
+					     u64 cputime) {}
+static inline void cpufreq_times_create_policy(struct cpufreq_policy *policy) {}
+static inline void cpufreq_times_record_transition(
+	struct cpufreq_policy *policy, unsigned int new_freq) {}
+#endif /* CONFIG_CPU_FREQ_TIMES */
+#endif /* _LINUX_CPUFREQ_TIMES_H */
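
These hooks pair with the /proc wiring added in fs/proc/base.c above; the remaining entry points are meant for the task lifecycle and the scheduler tick. A hedged sketch of the intended call sites, compressed into one hypothetical function (the real patches place each call in the fork, exit and tick paths):

	static void example_task_accounting(struct task_struct *p)
	{
		cpufreq_task_times_init(p);	/* early in fork: clear state */
		cpufreq_task_times_alloc(p);	/* allocate per-frequency counters */

		/* On each accounting tick, charge time at the current frequency. */
		cpufreq_acct_update_power(p, TICK_NSEC);

		cpufreq_task_times_exit(p);	/* on exit: free the table */
	}
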
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index d58e047..92b0a92 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -71,8 +71,6 @@ extern void cpuset_init_smp(void);
 extern void cpuset_force_rebuild(void);
 extern void cpuset_update_active_cpus(void);
 extern void cpuset_wait_for_hotplug(void);
-extern void cpuset_read_lock(void);
-extern void cpuset_read_unlock(void);
 extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
 extern bool cpuset_cpus_allowed_fallback(struct task_struct *p);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
@@ -196,9 +194,6 @@ static inline void cpuset_update_active_cpus(void)
 
 static inline void cpuset_wait_for_hotplug(void) { }
 
-static inline void cpuset_read_lock(void) { }
-static inline void cpuset_read_unlock(void) { }
-
 static inline void cpuset_cpus_allowed(struct task_struct *p,
 				       struct cpumask *mask)
 {
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 6b351e0..ccdb1c2 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -140,6 +140,7 @@ struct dentry_operations {
 	struct vfsmount *(*d_automount)(struct path *);
 	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, const struct inode *);
+	void (*d_canonical_path)(const struct path *, struct path *);
 } ____cacheline_aligned;
 
 /*
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 7173179..7ed5320 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -230,6 +230,41 @@ struct dma_buf_ops {
 	int (*begin_cpu_access)(struct dma_buf *, enum dma_data_direction);
 
 	/**
+	 * @begin_cpu_access_partial:
+	 *
+	 * This is called from dma_buf_begin_cpu_access_partial() and allows the
+	 * exporter to ensure that the memory specified in the range is
+	 * available for cpu access - the exporter might need to allocate or
+	 * swap-in and pin the backing storage.
+	 * The exporter also needs to ensure that cpu access is
+	 * coherent for the access direction. The direction can be used by the
+	 * exporter to optimize the cache flushing, i.e. access with a different
+	 * direction (read instead of write) might return stale or even bogus
+	 * data (e.g. when the exporter needs to copy the data to temporary
+	 * storage).
+	 *
+	 * This callback is optional.
+	 *
+	 * FIXME: This is both called through the DMA_BUF_IOCTL_SYNC command
+	 * from userspace (where storage shouldn't be pinned to avoid handing
+	 * de-facto mlock rights to userspace) and for the kernel-internal
+	 * users of the various kmap interfaces, where the backing storage must
+	 * be pinned to guarantee that the atomic kmap calls can succeed. Since
+	 * there are no in-kernel users of the kmap interfaces yet, this isn't a
+	 * real problem.
+	 *
+	 * Returns:
+	 *
+	 * 0 on success or a negative error code on failure. This can for
+	 * example fail when the backing storage can't be allocated. Can also
+	 * return -ERESTARTSYS or -EINTR when the call has been interrupted and
+	 * needs to be restarted.
+	 */
+	int (*begin_cpu_access_partial)(struct dma_buf *dmabuf,
+					enum dma_data_direction,
+					unsigned int offset, unsigned int len);
+
+	/**
 	 * @end_cpu_access:
 	 *
 	 * This is called from dma_buf_end_cpu_access() when the importer is
@@ -247,6 +282,28 @@ struct dma_buf_ops {
 	int (*end_cpu_access)(struct dma_buf *, enum dma_data_direction);
 
 	/**
+	 * @end_cpu_access_partial:
+	 *
+	 * This is called from dma_buf_end_cpu_access_partial() when the
+	 * importer is done accessing the buffer from the CPU. The exporter can
+	 * use this to limit cache flushing to only the range specified and to
+	 * unpin any resources pinned in @begin_cpu_access_partial.
+	 * The result of any dma_buf kmap calls after end_cpu_access_partial is
+	 * undefined.
+	 *
+	 * This callback is optional.
+	 *
+	 * Returns:
+	 *
+	 * 0 on success or a negative error code on failure. Can return
+	 * -ERESTARTSYS or -EINTR when the call has been interrupted and needs
+	 * to be restarted.
+	 */
+	int (*end_cpu_access_partial)(struct dma_buf *dmabuf,
+				      enum dma_data_direction,
+				      unsigned int offset, unsigned int len);
+
+	/**
 	 * @mmap:
 	 *
 	 * This callback is used by the dma_buf_mmap() function
@@ -285,6 +342,20 @@ struct dma_buf_ops {
 
 	int (*vmap)(struct dma_buf *dmabuf, struct iosys_map *map);
 	void (*vunmap)(struct dma_buf *dmabuf, struct iosys_map *map);
+
+	/**
+	 * @get_flags:
+	 *
+	 * This is called by dma_buf_get_flags and is used to get the buffer's
+	 * flags.
+	 * This callback is optional.
+	 *
+	 * Returns:
+	 *
+	 * 0 on success or a negative error code on failure. On success flags
+	 * will be populated with the buffer's flags.
+	 */
+	int (*get_flags)(struct dma_buf *dmabuf, unsigned long *flags);
 };
 
 /**
@@ -504,6 +575,8 @@ struct dma_buf_attach_ops {
  * @importer_ops: importer operations for this attachment, if provided
  * dma_buf_map/unmap_attachment() must be called with the dma_resv lock held.
  * @importer_priv: importer specific attachment data.
+ * @dma_map_attrs: DMA attributes to be used when the exporter maps the buffer
+ * through dma_buf_map_attachment.
  *
  * This structure holds the attachment information between the dma_buf buffer
  * and its user device(s). The list contains one attachment struct per device
@@ -524,6 +597,7 @@ struct dma_buf_attachment {
 	const struct dma_buf_attach_ops *importer_ops;
 	void *importer_priv;
 	void *priv;
+	unsigned long dma_map_attrs;
 };
 
 /**
@@ -625,11 +699,18 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,
 void dma_buf_move_notify(struct dma_buf *dma_buf);
 int dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
 			     enum dma_data_direction dir);
+int dma_buf_begin_cpu_access_partial(struct dma_buf *dma_buf,
+				     enum dma_data_direction dir,
+				     unsigned int offset, unsigned int len);
 int dma_buf_end_cpu_access(struct dma_buf *dma_buf,
 			   enum dma_data_direction dir);
+int dma_buf_end_cpu_access_partial(struct dma_buf *dma_buf,
+				     enum dma_data_direction dir,
+				     unsigned int offset, unsigned int len);
 
 int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
 		 unsigned long);
 int dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map);
 void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map);
+int dma_buf_get_flags(struct dma_buf *dmabuf, unsigned long *flags);
 #endif /* __DMA_BUF_H__ */
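
A minimal sketch of how an importer might use the partial-access entry points declared above to restrict cache maintenance to the byte range it actually touches (buffer acquisition and the CPU writes themselves are elided):

	#include <linux/dma-buf.h>

	static int cpu_patch_range(struct dma_buf *dmabuf,
				   unsigned int offset, unsigned int len)
	{
		int ret;

		ret = dma_buf_begin_cpu_access_partial(dmabuf, DMA_TO_DEVICE,
						       offset, len);
		if (ret)
			return ret;

		/* ... CPU writes to [offset, offset + len) go here ... */

		return dma_buf_end_cpu_access_partial(dmabuf, DMA_TO_DEVICE,
						      offset, len);
	}
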
diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
index 0c05561..44f1050 100644
--- a/include/linux/dma-heap.h
+++ b/include/linux/dma-heap.h
@@ -51,6 +51,15 @@ struct dma_heap_export_info {
 void *dma_heap_get_drvdata(struct dma_heap *heap);
 
 /**
+ * dma_heap_get_dev() - get device struct for the heap
+ * @heap: DMA-Heap to retrieve device struct from
+ *
+ * Returns:
+ * The device struct for the heap.
+ */
+struct device *dma_heap_get_dev(struct dma_heap *heap);
+
+/**
  * dma_heap_get_name() - get heap name
  * @heap: DMA-Heap to retrieve the name for
  *
@@ -65,4 +74,49 @@ const char *dma_heap_get_name(struct dma_heap *heap);
  */
 struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info);
 
+/**
+ * dma_heap_put - drops a reference to a dmabuf heap, potentially freeing it
+ * @heap:		heap pointer
+ */
+void dma_heap_put(struct dma_heap *heap);
+
+/**
+ * dma_heap_find - Returns the registered dma_heap with the specified name
+ * @name: Name of the heap to find
+ *
+ * NOTE: dma_heaps returned from this function MUST be released
+ * using dma_heap_put() when the user is done.
+ */
+struct dma_heap *dma_heap_find(const char *name);
+
+/**
+ * dma_heap_buffer_alloc - Allocate dma-buf from a dma_heap
+ * @heap:	dma_heap to allocate from
+ * @len:	size to allocate
+ * @fd_flags:	flags to set on returned dma-buf fd
+ * @heap_flags:	flags to pass to the dma heap
+ *
+ * This is for internal dma-buf allocations only.
+ */
+struct dma_buf *dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
+				      unsigned int fd_flags,
+				      unsigned int heap_flags);
+
+/**
+ * dma_heap_buffer_free - Free a dma_buf allocated by dma_heap_buffer_alloc
+ * @dma_buf:	dma_buf to free
+ *
+ * This is really only a simple wrapper around dma_buf_put().
+ */
+void dma_heap_buffer_free(struct dma_buf *);
+
+/**
+ * dma_heap_bufferfd_alloc - Allocate dma-buf fd from a dma_heap
+ * @heap:	dma_heap to allocate from
+ * @len:	size to allocate
+ * @fd_flags:	flags to set on returned dma-buf fd
+ * @heap_flags:	flags to pass to the dma heap
+ */
+int dma_heap_bufferfd_alloc(struct dma_heap *heap, size_t len,
+			    unsigned int fd_flags,
+			    unsigned int heap_flags);
 #endif /* _DMA_HEAPS_H */
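
A sketch of the in-kernel allocation flow these helpers enable; the heap name "system" and the O_RDWR fd flag are illustrative assumptions:

	#include <linux/dma-heap.h>
	#include <linux/err.h>
	#include <linux/fcntl.h>

	static struct dma_buf *alloc_from_heap(size_t len)
	{
		struct dma_heap *heap;
		struct dma_buf *buf;

		heap = dma_heap_find("system");	/* takes a reference */
		if (!heap)
			return ERR_PTR(-ENODEV);

		buf = dma_heap_buffer_alloc(heap, len, O_RDWR, 0);

		/* Drop the heap reference whether or not the alloc succeeded. */
		dma_heap_put(heap);
		return buf;
	}
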
diff --git a/include/linux/export.h b/include/linux/export.h
index 3f31ced..b674ca6 100644
--- a/include/linux/export.h
+++ b/include/linux/export.h
@@ -106,7 +106,7 @@ struct kernel_symbol {
  */
 #define __EXPORT_SYMBOL(sym, sec, ns)
 
-#elif defined(CONFIG_TRIM_UNUSED_KSYMS)
+#elif defined(CONFIG_TRIM_UNUSED_KSYMS) && !defined(MODULE)
 
 #include <generated/autoksyms.h>
 
diff --git a/include/linux/fips.h b/include/linux/fips.h
index c6961e9..83f5d6f 100644
--- a/include/linux/fips.h
+++ b/include/linux/fips.h
@@ -2,7 +2,15 @@
 #ifndef _FIPS_H
 #define _FIPS_H
 
-#ifdef CONFIG_CRYPTO_FIPS
+#ifdef BUILD_FIPS140_KO
+/*
+ * In fips140.ko, enable the behavior that the upstream fips_enabled flag
+ * controls, such as the XTS weak key check.
+ */
+#define fips_enabled 1
+#define CONFIG_CRYPTO_FIPS 1
+
+#elif defined(CONFIG_CRYPTO_FIPS)
 extern int fips_enabled;
 extern struct atomic_notifier_head fips_fail_notif_chain;
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 081d1f53..e4f51af 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3448,6 +3448,11 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
 	return 0;
 }
 
+static inline rwf_t iocb_to_rw_flags(int ifl, int iocb_mask)
+{
+	return ifl & iocb_mask;
+}
+
 static inline ino_t parent_ino(struct dentry *dentry)
 {
 	ino_t res;
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 4f5f8a6..c431ad6 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -809,6 +809,20 @@ static inline u64 fscrypt_limit_io_blocks(const struct inode *inode, u64 lblk,
 }
 #endif /* !CONFIG_FS_ENCRYPTION_INLINE_CRYPT */
 
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_DM_DEFAULT_KEY)
+static inline bool
+fscrypt_inode_should_skip_dm_default_key(const struct inode *inode)
+{
+	return IS_ENCRYPTED(inode) && S_ISREG(inode->i_mode);
+}
+#else
+static inline bool
+fscrypt_inode_should_skip_dm_default_key(const struct inode *inode)
+{
+	return false;
+}
+#endif
+
 /**
  * fscrypt_inode_uses_inline_crypto() - test whether an inode uses inline
  *					encryption
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 40f14e5..0beb972 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -254,4 +254,18 @@ static inline bool fsverity_active(const struct inode *inode)
 	return fsverity_get_info(inode) != NULL;
 }
 
+#ifdef CONFIG_FS_VERITY_BUILTIN_SIGNATURES
+int __fsverity_verify_signature(const struct inode *inode, const u8 *signature,
+				size_t sig_size, const u8 *file_digest,
+				unsigned int digest_algorithm);
+#else /* !CONFIG_FS_VERITY_BUILTIN_SIGNATURES */
+static inline int __fsverity_verify_signature(const struct inode *inode,
+				const u8 *signature, size_t sig_size,
+				const u8 *file_digest,
+				unsigned int digest_algorithm)
+{
+	return 0;
+}
+#endif /* !CONFIG_FS_VERITY_BUILTIN_SIGNATURES */
+
 #endif	/* _LINUX_FSVERITY_H */
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index a92bce4..7d09eb9 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -527,6 +527,17 @@ DECLARE_STATIC_KEY_FALSE(force_irqthreads_key);
 #define set_softirq_pending(x)	(__this_cpu_write(local_softirq_pending_ref, (x)))
 #define or_softirq_pending(x)	(__this_cpu_or(local_softirq_pending_ref, (x)))
 
+/**
+ * __cpu_softirq_pending() - check whether a softirq is pending on a cpu
+ * @cpu: the CPU to check
+ *
+ * This helper is inherently racy, as it accesses per-cpu data without locks.
+ * But peeking at the flag can still be useful when deciding where to place a
+ * task.
+ */
+static inline u32 __cpu_softirq_pending(int cpu)
+{
+	return (u32)per_cpu(local_softirq_pending_ref, cpu);
+}
 #endif /* local_softirq_pending */
 
 /* Some architectures might implement lazy enabling/disabling of
@@ -571,6 +582,11 @@ enum
  * _ IRQ_POLL: irq_poll_cpu_dead() migrates the queue
  */
 #define SOFTIRQ_HOTPLUG_SAFE_MASK (BIT(RCU_SOFTIRQ) | BIT(IRQ_POLL_SOFTIRQ))
+/* Softirqs whose handling might run long: */
+#define LONG_SOFTIRQ_MASK (BIT(NET_TX_SOFTIRQ)    | \
+			   BIT(NET_RX_SOFTIRQ)    | \
+			   BIT(BLOCK_SOFTIRQ)     | \
+			   BIT(IRQ_POLL_SOFTIRQ))
 
 /* map softirq index to softirq name. update 'softirq_to_name' in
  * kernel/softirq.c when adding a new softirq.
@@ -607,6 +623,10 @@ extern void raise_softirq(unsigned int nr);
 
 DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
 
+#ifdef CONFIG_RT_SOFTIRQ_AWARE_SCHED
+DECLARE_PER_CPU(u32, active_softirqs);
+#endif
+
 static inline struct task_struct *this_cpu_ksoftirqd(void)
 {
 	return this_cpu_read(ksoftirqd);
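
Taken together, __cpu_softirq_pending() and LONG_SOFTIRQ_MASK give a cheap, deliberately racy signal for task-placement decisions; a minimal sketch (the helper name is hypothetical):

	#include <linux/interrupt.h>

	/* Racy by design: a stale read only costs a suboptimal placement. */
	static bool cpu_busy_with_long_softirq(int cpu)
	{
		return !!(__cpu_softirq_pending(cpu) & LONG_SOFTIRQ_MASK);
	}
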
diff --git a/include/linux/io.h b/include/linux/io.h
index 308f4f0..f63b096 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -21,6 +21,8 @@ void __ioread32_copy(void *to, const void __iomem *from, size_t count);
 void __iowrite64_copy(void __iomem *to, const void *from, size_t count);
 
 #ifdef CONFIG_MMU
+void ioremap_phys_range_hook(phys_addr_t phys_addr, size_t size, pgprot_t prot);
+void iounmap_phys_range_hook(phys_addr_t phys_addr, size_t size);
 int ioremap_page_range(unsigned long addr, unsigned long end,
 		       phys_addr_t phys_addr, pgprot_t prot);
 #else
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 83c00fa..54fb81b 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -13,6 +13,7 @@
 #include <linux/kernel.h>
 #include <linux/rbtree.h>
 #include <linux/dma-mapping.h>
+#include <linux/android_vendor.h>
 
 /* iova structure */
 struct iova {
@@ -38,6 +39,8 @@ struct iova_domain {
 
 	struct iova_rcache	*rcaches;
 	struct hlist_node	cpuhp_dead;
+
+	ANDROID_VENDOR_DATA(1);
 };
 
 static inline unsigned long iova_size(struct iova *iova)
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 37dfdcf..03cead7 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -43,6 +43,7 @@ struct ipv6_devconf {
 	__s32		accept_ra_rt_info_max_plen;
 #endif
 #endif
+	__s32		accept_ra_rt_table;
 	__s32		proxy_ndp;
 	__s32		accept_source_route;
 	__s32		accept_ra_from_local;
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index 844a8e3..e4b51ed 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -5,6 +5,7 @@
 #include <linux/rcupdate.h>
 #include <linux/kobject.h>
 #include <linux/mutex.h>
+#include <linux/android_vendor.h>
 
 /*
  * Core internal functions to deal with irq descriptors
@@ -102,6 +103,7 @@ struct irq_desc {
 	int			parent_irq;
 	struct module		*owner;
 	const char		*name;
+	ANDROID_VENDOR_DATA(1);
 } ____cacheline_internodealigned_in_smp;
 
 #ifdef CONFIG_SPARSE_IRQ
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 570831ca..27c9ce3 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -108,7 +108,7 @@ struct static_key {
 
 #endif /* __ASSEMBLY__ */
 
-#ifdef CONFIG_JUMP_LABEL
+#if defined(CONFIG_JUMP_LABEL) && !defined(BUILD_FIPS140_KO)
 #include <asm/jump_label.h>
 
 #ifndef __ASSEMBLY__
@@ -195,7 +195,30 @@ enum jump_label_type {
 
 struct module;
 
-#ifdef CONFIG_JUMP_LABEL
+#ifdef BUILD_FIPS140_KO
+
+#include <linux/atomic.h>
+
+static inline int static_key_count(struct static_key *key)
+{
+	return arch_atomic_read(&key->enabled);
+}
+
+static __always_inline bool static_key_false(struct static_key *key)
+{
+	if (unlikely(static_key_count(key) > 0))
+		return true;
+	return false;
+}
+
+static __always_inline bool static_key_true(struct static_key *key)
+{
+	if (likely(static_key_count(key) > 0))
+		return true;
+	return false;
+}
+
+#elif defined(CONFIG_JUMP_LABEL)
 
 #define JUMP_TYPE_FALSE		0UL
 #define JUMP_TYPE_TRUE		1UL
@@ -408,7 +431,7 @@ extern bool ____wrong_branch_error(void);
 	static_key_count((struct static_key *)x) > 0;				\
 })
 
-#ifdef CONFIG_JUMP_LABEL
+#if defined(CONFIG_JUMP_LABEL) && !defined(BUILD_FIPS140_KO)
 
 /*
  * Combine the right initial value (type) with the right branch order
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index d811b3d7..6293091 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -120,12 +120,13 @@ static __always_inline void kasan_poison_pages(struct page *page,
 		__kasan_poison_pages(page, order, init);
 }
 
-void __kasan_unpoison_pages(struct page *page, unsigned int order, bool init);
-static __always_inline void kasan_unpoison_pages(struct page *page,
+bool __kasan_unpoison_pages(struct page *page, unsigned int order, bool init);
+static __always_inline bool kasan_unpoison_pages(struct page *page,
 						 unsigned int order, bool init)
 {
 	if (kasan_enabled())
-		__kasan_unpoison_pages(page, order, init);
+		return __kasan_unpoison_pages(page, order, init);
+	return false;
 }
 
 void __kasan_cache_create_kmalloc(struct kmem_cache *cache);
@@ -249,8 +250,11 @@ static __always_inline bool kasan_check_byte(const void *addr)
 static inline void kasan_unpoison_range(const void *address, size_t size) {}
 static inline void kasan_poison_pages(struct page *page, unsigned int order,
 				      bool init) {}
-static inline void kasan_unpoison_pages(struct page *page, unsigned int order,
-					bool init) {}
+static inline bool kasan_unpoison_pages(struct page *page, unsigned int order,
+					bool init)
+{
+	return false;
+}
 static inline void kasan_cache_create_kmalloc(struct kmem_cache *cache) {}
 static inline void kasan_poison_slab(struct slab *slab) {}
 static inline void kasan_unpoison_object_data(struct kmem_cache *cache,
diff --git a/include/linux/mem_relinquish.h b/include/linux/mem_relinquish.h
new file mode 100644
index 0000000..b906c41
--- /dev/null
+++ b/include/linux/mem_relinquish.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2022 Google LLC
+ * Author: Keir Fraser <keirf@google.com>
+ */
+
+#ifndef __MEM_RELINQUISH_H__
+#define __MEM_RELINQUISH_H__
+
+#ifdef CONFIG_MEMORY_RELINQUISH
+
+#include <asm/mem_relinquish.h>
+
+#else	/* !CONFIG_MEMORY_RELINQUISH */
+
+static inline bool kvm_has_memrelinquish_services(void) { return false; }
+static inline void page_relinquish(struct page *page) { }
+
+#endif	/* CONFIG_MEMORY_RELINQUISH */
+
+#endif	/* __MEM_RELINQUISH_H__ */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 5f74891..f944387 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -148,9 +148,7 @@ enum zone_stat_item {
 	NR_MLOCK,		/* mlock()ed pages found and moved off LRU */
 	/* Second 128 byte cacheline */
 	NR_BOUNCE,
-#if IS_ENABLED(CONFIG_ZSMALLOC)
 	NR_ZSPAGES,		/* allocated in zsmalloc */
-#endif
 	NR_FREE_CMA_PAGES,
 	NR_VM_ZONE_STAT_ITEMS };
 
@@ -1409,6 +1407,7 @@ static inline struct pglist_data *NODE_DATA(int nid)
 extern struct pglist_data *first_online_pgdat(void);
 extern struct pglist_data *next_online_pgdat(struct pglist_data *pgdat);
 extern struct zone *next_zone(struct zone *zone);
+extern int isolate_anon_lru_page(struct page *page);
 
 /**
  * for_each_online_pgdat - helper macro to iterate over all online nodes
diff --git a/include/linux/module.h b/include/linux/module.h
index ec61fb5..6aa2bc8 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -404,10 +404,12 @@ struct module {
 	const s32 *gpl_crcs;
 	bool using_gplonly_symbols;
 
-#ifdef CONFIG_MODULE_SIG
-	/* Signature was verified. */
+	/*
+	 * Signature was verified. Unconditionally compiled in on Android so
+	 * that the struct module layout stays the same whether or not module
+	 * signing is enabled, preserving ABI compatibility with signed modules.
+	 */
 	bool sig_ok;
-#endif
 
 	bool async_probe_requested;
 
diff --git a/include/linux/netfilter/xt_quota2.h b/include/linux/netfilter/xt_quota2.h
new file mode 100644
index 0000000..a391871
--- /dev/null
+++ b/include/linux/netfilter/xt_quota2.h
@@ -0,0 +1,26 @@
+#ifndef _XT_QUOTA2_H
+#define _XT_QUOTA2_H
+#include <linux/types.h>
+
+enum xt_quota_flags {
+	XT_QUOTA_INVERT    = 1 << 0,
+	XT_QUOTA_GROW      = 1 << 1,
+	XT_QUOTA_PACKET    = 1 << 2,
+	XT_QUOTA_NO_CHANGE = 1 << 3,
+	XT_QUOTA_MASK      = 0x0F,
+};
+
+struct xt_quota_counter;
+
+struct xt_quota_mtinfo2 {
+	char name[15];
+	u_int8_t flags;
+
+	/* Comparison-invariant */
+	aligned_u64 quota;
+
+	/* Used internally by the kernel */
+	struct xt_quota_counter *master __attribute__((aligned(8)));
+};
+
+#endif /* _XT_QUOTA2_H */
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index d69ad5b..991e4c1 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -58,6 +58,27 @@ extern int of_flat_dt_is_compatible(unsigned long node, const char *name);
 extern unsigned long of_get_flat_dt_root(void);
 extern uint32_t of_get_flat_dt_phandle(unsigned long node);
 
+/*
+ * early_init_dt_scan_chosen - scan the device tree for ramdisk and bootargs
+ *
+ * The boot arguments will be placed into the memory pointed to by @cmdline.
+ * That memory should be COMMAND_LINE_SIZE big and initialized to be a valid
+ * (possibly empty) string.  Logic for what will be in @cmdline after this
+ * function finishes:
+ *
+ * - CONFIG_CMDLINE_FORCE=true
+ *     CONFIG_CMDLINE
+ * - CONFIG_CMDLINE_EXTEND=true, @cmdline is a non-empty string
+ *     @cmdline + dt bootargs (even if dt bootargs are empty)
+ * - CONFIG_CMDLINE_EXTEND=true, @cmdline is an empty string
+ *     CONFIG_CMDLINE + dt bootargs (even if dt bootargs are empty)
+ * - CONFIG_CMDLINE_FROM_BOOTLOADER=true, dt bootargs=non-empty:
+ *     dt bootargs
+ * - CONFIG_CMDLINE_FROM_BOOTLOADER=true, dt bootargs=empty, @cmdline non-empty
+ *     @cmdline is left unchanged
+ * - CONFIG_CMDLINE_FROM_BOOTLOADER=true, dt bootargs=empty, @cmdline empty
+ *     CONFIG_CMDLINE (or "" if that's not defined)
+ */
 extern int early_init_dt_scan_chosen(char *cmdline);
 extern int early_init_dt_scan_memory(void);
 extern void early_init_dt_check_for_usable_mem_range(void);
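
The composition rules documented above reduce to a small decision tree. A hedged sketch of the logic, not the actual fdt.c implementation, assuming CONFIG_CMDLINE is defined and ignoring separator handling between the concatenated parts:

	static void compose_cmdline(char *cmdline, const char *dt_bootargs)
	{
		if (IS_ENABLED(CONFIG_CMDLINE_FORCE)) {
			strscpy(cmdline, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
		} else if (IS_ENABLED(CONFIG_CMDLINE_EXTEND)) {
			if (!cmdline[0])
				strscpy(cmdline, CONFIG_CMDLINE,
					COMMAND_LINE_SIZE);
			strlcat(cmdline, dt_bootargs, COMMAND_LINE_SIZE);
		} else if (dt_bootargs[0]) {	/* FROM_BOOTLOADER */
			strscpy(cmdline, dt_bootargs, COMMAND_LINE_SIZE);
		} else if (!cmdline[0]) {
			strscpy(cmdline, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
		}
	}
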
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0031f7b..13c606a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1298,22 +1298,40 @@ extern void perf_event_bpf_event(struct bpf_prog *prog,
 
 #ifdef CONFIG_GUEST_PERF_EVENTS
 extern struct perf_guest_info_callbacks __rcu *perf_guest_cbs;
-
-DECLARE_STATIC_CALL(__perf_guest_state, *perf_guest_cbs->state);
-DECLARE_STATIC_CALL(__perf_guest_get_ip, *perf_guest_cbs->get_ip);
-DECLARE_STATIC_CALL(__perf_guest_handle_intel_pt_intr, *perf_guest_cbs->handle_intel_pt_intr);
-
+static inline struct perf_guest_info_callbacks *perf_get_guest_cbs(void)
+{
+	/*
+	 * Callbacks are RCU-protected and must be read with READ_ONCE() to
+	 * avoid reloading them between a !NULL check and the dereference, to
+	 * ensure pending stores to the callback pointers are visible before a
+	 * non-NULL perf_guest_cbs is visible to readers, and to prevent a
+	 * module from unloading its callbacks while readers are active.
+	 */
+	return rcu_dereference(perf_guest_cbs);
+}
 static inline unsigned int perf_guest_state(void)
 {
-	return static_call(__perf_guest_state)();
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
+
+	return guest_cbs ? guest_cbs->state() : 0;
 }
 static inline unsigned long perf_guest_get_ip(void)
 {
-	return static_call(__perf_guest_get_ip)();
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
+
+	/*
+	 * Arbitrarily return '0' in the unlikely scenario that the callbacks
+	 * are unregistered between checking guest state and getting the IP.
+	 */
+	return guest_cbs ? guest_cbs->get_ip() : 0;
 }
 static inline unsigned int perf_guest_handle_intel_pt_intr(void)
 {
-	return static_call(__perf_guest_handle_intel_pt_intr)();
+	struct perf_guest_info_callbacks *guest_cbs = perf_get_guest_cbs();
+
+	if (guest_cbs && guest_cbs->handle_intel_pt_intr)
+		return guest_cbs->handle_intel_pt_intr();
+	return 0;
 }
 extern void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs);
 extern void perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs);
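
With the static calls replaced by the RCU-protected pointer, a hypervisor module still registers its callback table the same way; a sketch with hypothetical callback implementations:

	#include <linux/init.h>
	#include <linux/perf_event.h>

	static unsigned int my_guest_state(void);	/* hypothetical */
	static unsigned long my_guest_get_ip(void);	/* hypothetical */

	static struct perf_guest_info_callbacks my_guest_cbs = {
		.state			= my_guest_state,
		.get_ip			= my_guest_get_ip,
		.handle_intel_pt_intr	= NULL,	/* optional, see above */
	};

	static int __init my_module_init(void)
	{
		perf_register_guest_info_callbacks(&my_guest_cbs);
		return 0;
	}

	static void __exit my_module_exit(void)
	{
		perf_unregister_guest_info_callbacks(&my_guest_cbs);
	}
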
diff --git a/include/linux/poll.h b/include/linux/poll.h
index a9e0e1c..d1ea4f3 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -14,11 +14,7 @@
 
 /* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
    additional memory. */
-#ifdef __clang__
-#define MAX_STACK_ALLOC 768
-#else
 #define MAX_STACK_ALLOC 832
-#endif
 #define FRONTEND_STACK_ALLOC	256
 #define SELECT_STACK_ALLOC	FRONTEND_STACK_ALLOC
 #define POLL_STACK_ALLOC	FRONTEND_STACK_ALLOC
diff --git a/include/linux/power_supply.h b/include/linux/power_supply.h
index aa2c4a7..b0cc4b8 100644
--- a/include/linux/power_supply.h
+++ b/include/linux/power_supply.h
@@ -50,6 +50,12 @@ enum {
 	POWER_SUPPLY_CHARGE_TYPE_CUSTOM,	/* use CHARGE_CONTROL_* props */
 	POWER_SUPPLY_CHARGE_TYPE_LONGLIFE,	/* slow speed, longer life */
 	POWER_SUPPLY_CHARGE_TYPE_BYPASS,	/* bypassing the charger */
+
+	/*
+	 * Force the value to 50 to minimize the chance of userspace binary
+	 * incompatibility when newer upstream kernels add charge types.
+	 */
+	POWER_SUPPLY_CHARGE_TYPE_TAPER_EXT = 50,	/* charging in CV phase */
 };
 
 enum {
@@ -780,12 +786,22 @@ static inline struct power_supply *power_supply_get_by_name(const char *name)
 #ifdef CONFIG_OF
 extern struct power_supply *power_supply_get_by_phandle(struct device_node *np,
 							const char *property);
+extern int power_supply_get_by_phandle_array(struct device_node *np,
+					     const char *property,
+					     struct power_supply **psy,
+					     ssize_t size);
 extern struct power_supply *devm_power_supply_get_by_phandle(
 				    struct device *dev, const char *property);
 #else /* !CONFIG_OF */
 static inline struct power_supply *
 power_supply_get_by_phandle(struct device_node *np, const char *property)
 { return NULL; }
+static inline int
+power_supply_get_by_phandle_array(struct device_node *np,
+				  const char *property,
+				  struct power_supply **psy,
+				  ssize_t size)
+{ return 0; }
 static inline struct power_supply *
 devm_power_supply_get_by_phandle(struct device *dev, const char *property)
 { return NULL; }
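
A sketch of a consumer resolving several supplies at once with the new array helper; the property name "supplies" and the array size are illustrative:

	#include <linux/of.h>
	#include <linux/power_supply.h>

	#define MY_MAX_SUPPLIES 4

	static int resolve_supplies(struct device_node *np,
				    struct power_supply *psy[MY_MAX_SUPPLIES])
	{
		/* Fills @psy from the "supplies" phandle array; the
		 * !CONFIG_OF stub above simply returns 0.
		 */
		return power_supply_get_by_phandle_array(np, "supplies", psy,
							 MY_MAX_SUPPLIES);
	}
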
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 3c7d295..b93a349d 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -3,8 +3,11 @@
 #define _LINUX_RING_BUFFER_H
 
 #include <linux/mm.h>
-#include <linux/seq_file.h>
 #include <linux/poll.h>
+#include <linux/ring_buffer_ext.h>
+#include <linux/seq_file.h>
+
+#include <asm/local.h>
 
 struct trace_buffer;
 struct ring_buffer_iter;
@@ -18,6 +21,8 @@ struct ring_buffer_event {
 	u32		array[];
 };
 
+#define RB_EVNT_HDR_SIZE (offsetof(struct ring_buffer_event, array))
+
 /**
  * enum ring_buffer_type - internal ring buffer types
  *
@@ -59,11 +64,49 @@ enum ring_buffer_type {
 	RINGBUF_TYPE_TIME_STAMP,
 };
 
+#define TS_SHIFT        27
+#define TS_MASK         ((1ULL << TS_SHIFT) - 1)
+#define TS_DELTA_TEST   (~TS_MASK)
+
+/*
+ * We need to fit the time_stamp delta into 27 bits.
+ */
+static inline int test_time_stamp(u64 delta)
+{
+	if (delta & TS_DELTA_TEST)
+		return 1;
+	return 0;
+}
+
 unsigned ring_buffer_event_length(struct ring_buffer_event *event);
 void *ring_buffer_event_data(struct ring_buffer_event *event);
 u64 ring_buffer_event_time_stamp(struct trace_buffer *buffer,
 				 struct ring_buffer_event *event);
 
+#define BUF_PAGE_HDR_SIZE offsetof(struct buffer_data_page, data)
+
+#define BUF_PAGE_SIZE (PAGE_SIZE - BUF_PAGE_HDR_SIZE)
+
+#define RB_ALIGNMENT		4U
+#define RB_MAX_SMALL_DATA	(RB_ALIGNMENT * RINGBUF_TYPE_DATA_TYPE_LEN_MAX)
+#define RB_EVNT_MIN_SIZE	8U	/* two 32bit words */
+
+#ifndef CONFIG_HAVE_64BIT_ALIGNED_ACCESS
+# define RB_FORCE_8BYTE_ALIGNMENT	0
+# define RB_ARCH_ALIGNMENT		RB_ALIGNMENT
+#else
+# define RB_FORCE_8BYTE_ALIGNMENT	1
+# define RB_ARCH_ALIGNMENT		8U
+#endif
+
+#define RB_ALIGN_DATA		__aligned(RB_ARCH_ALIGNMENT)
+
+struct buffer_data_page {
+	u64		 time_stamp;	/* page time stamp */
+	local_t		 commit;	/* write committed index */
+	unsigned char	 data[] RB_ALIGN_DATA;	/* data of buffer page */
+};
+
 /*
  * ring_buffer_discard_commit will remove an event that has not
  *   been committed yet. If this is used, then ring_buffer_unlock_commit
@@ -98,6 +141,14 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
 	__ring_buffer_alloc((size), (flags), &__key);	\
 })
 
+struct ring_buffer_ext_cb {
+	int (*update_footers)(int cpu);
+	int (*swap_reader)(int cpu);
+};
+
+struct trace_buffer *
+ring_buffer_alloc_ext(unsigned long size, struct ring_buffer_ext_cb *cb);
+
 int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
 __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
 			  struct file *filp, poll_table *poll_table, int full);
@@ -212,4 +263,8 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node);
 #define trace_rb_cpu_prepare	NULL
 #endif
 
+size_t trace_buffer_pack_size(struct trace_buffer *trace_buffer);
+int trace_buffer_pack(struct trace_buffer *trace_buffer, struct trace_buffer_pack *pack);
+int ring_buffer_poke(struct trace_buffer *buffer, int cpu);
+
 #endif /* _LINUX_RING_BUFFER_H */
diff --git a/include/linux/ring_buffer_ext.h b/include/linux/ring_buffer_ext.h
new file mode 100644
index 0000000..7d86b8c
--- /dev/null
+++ b/include/linux/ring_buffer_ext.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_RING_BUFFER_EXT_H
+#define _LINUX_RING_BUFFER_EXT_H
+#include <linux/mm.h>
+#include <linux/types.h>
+
+struct rb_ext_stats {
+	u64		entries;
+	unsigned long	pages_touched;
+	unsigned long	overrun;
+};
+
+#define RB_PAGE_FT_HEAD		(1 << 0)
+#define RB_PAGE_FT_READER	(1 << 1)
+#define RB_PAGE_FT_COMMIT	(1 << 2)
+
+/*
+ * The pages where the events are stored are the only elements shared between
+ * the reader and the external writer. They therefore make a convenient
+ * channel from the writer to the reader: the reader uses this data to update
+ * its view of the ring buffer.
+ */
+struct rb_ext_page_footer {
+	atomic_t		writer_status;
+	atomic_t		reader_status;
+	struct rb_ext_stats	stats;
+};
+
+static inline struct rb_ext_page_footer *rb_ext_page_get_footer(void *page)
+{
+	struct rb_ext_page_footer *footer;
+	unsigned long page_va = (unsigned long)page;
+
+	page_va	= ALIGN_DOWN(page_va, PAGE_SIZE);
+
+	return (struct rb_ext_page_footer *)(page_va + PAGE_SIZE -
+					     sizeof(*footer));
+}
+
+#define BUF_EXT_PAGE_SIZE (BUF_PAGE_SIZE - sizeof(struct rb_ext_page_footer))
+
+/*
+ * An external writer can't rely on the internal struct ring_buffer_per_cpu.
+ * Instead, the relevant information is packed into a struct ring_buffer_pack
+ * that can be sent to the writer, which then builds its own view of the ring
+ * buffer.
+ */
+struct ring_buffer_pack {
+	int		cpu;
+	unsigned long	reader_page_va;
+	unsigned long	nr_pages;
+	unsigned long	page_va[];
+};
+
+struct trace_buffer_pack {
+	int			nr_cpus;
+	unsigned long		total_pages;
+	char			__data[];	/* contains ring_buffer_pack */
+};
+
+static inline
+struct ring_buffer_pack *__next_ring_buffer_pack(struct ring_buffer_pack *rb_pack)
+{
+	size_t len;
+
+	len = offsetof(struct ring_buffer_pack, page_va) +
+		       sizeof(unsigned long) * rb_pack->nr_pages;
+
+	return (struct ring_buffer_pack *)((void *)rb_pack + len);
+}
+
+/*
+ * Iterate over the ring_buffer_pack entries within a trace_buffer_pack.
+ */
+#define for_each_ring_buffer_pack(rb_pack, cpu, trace_pack)		\
+	for (rb_pack = (struct ring_buffer_pack *)&trace_pack->__data[0], cpu = 0;	\
+	     cpu < trace_pack->nr_cpus;					\
+	     cpu++, rb_pack = __next_ring_buffer_pack(rb_pack))
+#endif /* _LINUX_RING_BUFFER_EXT_H */
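
A sketch of a consumer walking a received trace_buffer_pack with the accessor macro above (the per-page inspection helper is hypothetical):

	#include <linux/ring_buffer_ext.h>

	static void walk_pack(struct trace_buffer_pack *trace_pack)
	{
		struct ring_buffer_pack *rb_pack;
		unsigned long i;
		int cpu;

		for_each_ring_buffer_pack(rb_pack, cpu, trace_pack) {
			/* reader_page_va first, then the nr_pages data pages */
			for (i = 0; i < rb_pack->nr_pages; i++)
				inspect_page((void *)rb_pack->page_va[i]);
		}
	}
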
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index efa5c32..34bd4ee 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -31,6 +31,7 @@
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 #include <linux/osq_lock.h>
 #endif
+#include <linux/android_vendor.h>
 
 /*
  * For an uncontended rwsem, count and owner are the only fields a task
@@ -63,6 +64,7 @@ struct rw_semaphore {
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map	dep_map;
 #endif
+	ANDROID_VENDOR_DATA(1);
 };
 
 /* In all implementations count != 0 means locked */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ffb6eb5..54a7ce5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -36,6 +36,7 @@
 #include <linux/seqlock.h>
 #include <linux/kcsan.h>
 #include <linux/rv.h>
+#include <linux/android_vendor.h>
 #include <asm/kmap_size.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
@@ -311,6 +312,8 @@ extern int __must_check io_schedule_prepare(void);
 extern void io_schedule_finish(int token);
 extern long io_schedule_timeout(long timeout);
 extern void io_schedule(void);
+extern struct task_struct *pick_migrate_task(struct rq *rq);
+extern int select_fallback_rq(int cpu, struct task_struct *p);
 
 /**
  * struct prev_cputime - snapshot of system and user cputime
@@ -1023,6 +1026,10 @@ struct task_struct {
 	u64				stimescaled;
 #endif
 	u64				gtime;
+#ifdef CONFIG_CPU_FREQ_TIMES
+	u64				*time_in_state;
+	unsigned int			max_state;
+#endif
 	struct prev_cputime		prev_cputime;
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 	struct vtime			vtime;
@@ -1135,6 +1142,7 @@ struct task_struct {
 	raw_spinlock_t			pi_lock;
 
 	struct wake_q_node		wake_q;
+	int				wake_q_count;
 
 #ifdef CONFIG_RT_MUTEXES
 	/* PI waiters blocked on a rt_mutex held by this task: */
@@ -1500,6 +1508,8 @@ struct task_struct {
 	struct callback_head		mce_kill_me;
 	int				mce_count;
 #endif
+	ANDROID_VENDOR_DATA_ARRAY(1, 64);
+	ANDROID_OEM_DATA_ARRAY(1, 6);
 
 #ifdef CONFIG_KRETPROBES
 	struct llist_head               kretprobe_instances;
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 816df6c..2b7267c1 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -3,6 +3,7 @@
 #define _LINUX_SCHED_TOPOLOGY_H
 
 #include <linux/topology.h>
+#include <linux/android_vendor.h>
 
 #include <linux/sched/idle.h>
 
@@ -82,6 +83,8 @@ struct sched_domain_shared {
 	atomic_t	nr_busy_cpus;
 	int		has_idle_cores;
 	int		nr_idle_scan;
+
+	ANDROID_VENDOR_DATA(1);
 };
 
 struct sched_domain {
diff --git a/include/linux/sched/wake_q.h b/include/linux/sched/wake_q.h
index 06cd8fb..9a1394f 100644
--- a/include/linux/sched/wake_q.h
+++ b/include/linux/sched/wake_q.h
@@ -38,6 +38,7 @@
 struct wake_q_head {
 	struct wake_q_node *first;
 	struct wake_q_node **lastp;
+	int count;
 };
 
 #define WAKE_Q_TAIL ((struct wake_q_node *) 0x01)
@@ -52,6 +53,7 @@ static inline void wake_q_init(struct wake_q_head *head)
 {
 	head->first = WAKE_Q_TAIL;
 	head->lastp = &head->first;
+	head->count = 0;
 }
 
 static inline bool wake_q_empty(struct wake_q_head *head)
diff --git a/include/linux/sched/xacct.h b/include/linux/sched/xacct.h
index c078f0a..9544c9d 100644
--- a/include/linux/sched/xacct.h
+++ b/include/linux/sched/xacct.h
@@ -28,6 +28,11 @@ static inline void inc_syscw(struct task_struct *tsk)
 {
 	tsk->ioac.syscw++;
 }
+
+static inline void inc_syscfs(struct task_struct *tsk)
+{
+	tsk->ioac.syscfs++;
+}
 #else
 static inline void add_rchar(struct task_struct *tsk, ssize_t amt)
 {
@@ -44,6 +49,10 @@ static inline void inc_syscr(struct task_struct *tsk)
 static inline void inc_syscw(struct task_struct *tsk)
 {
 }
+
+static inline void inc_syscfs(struct task_struct *tsk)
+{
+}
 #endif
 
 #endif /* _LINUX_SCHED_XACCT_H */
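
inc_syscfs() follows the inc_syscr()/inc_syscw() pattern above; a sketch of the kind of call site the counter is meant for (the wrapper is illustrative, not the actual VFS change):

	#include <linux/fs.h>
	#include <linux/sched/xacct.h>

	static int fsync_accounted(struct file *file, int datasync)
	{
		inc_syscfs(current);	/* count this fsync syscall */
		return vfs_fsync(file, datasync);
	}
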
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index 369769ce..68c9116 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -56,6 +56,8 @@ static inline int clear_mce_nospec(unsigned long pfn)
 }
 #endif
 
+int set_direct_map_range_uncached(unsigned long addr, unsigned long numpages);
+
 #ifndef CONFIG_ARCH_HAS_MEM_ENCRYPT
 static inline int set_memory_encrypted(unsigned long addr, int numpages)
 {
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 45efc6c..8477d8e 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -637,7 +637,7 @@ static inline __alloc_size(1, 2) void *kcalloc(size_t n, size_t size, gfp_t flag
 }
 
 void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
-				  unsigned long caller) __alloc_size(1);
+				  unsigned long caller);
 #define kmalloc_node_track_caller(size, flags, node) \
 	__kmalloc_node_track_caller(size, flags, node, \
 				    _RET_IP_)
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index cfe19a0..d1ab3ff 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -510,6 +510,7 @@ extern bool pm_get_wakeup_count(unsigned int *count, bool block);
 extern bool pm_save_wakeup_count(unsigned int count);
 extern void pm_wakep_autosleep_enabled(bool set);
 extern void pm_print_active_wakeup_sources(void);
+extern void pm_get_active_wakeup_sources(char *pending_sources, size_t max);
 
 extern unsigned int lock_system_sleep(void);
 extern void unlock_system_sleep(unsigned int);
diff --git a/include/linux/task_io_accounting.h b/include/linux/task_io_accounting.h
index 6f6acce..bb26108 100644
--- a/include/linux/task_io_accounting.h
+++ b/include/linux/task_io_accounting.h
@@ -19,6 +19,8 @@ struct task_io_accounting {
 	u64 syscr;
 	/* # of write syscalls */
 	u64 syscw;
+	/* # of fsync syscalls */
+	u64 syscfs;
 #endif /* CONFIG_TASK_XACCT */
 
 #ifdef CONFIG_TASK_IO_ACCOUNTING
diff --git a/include/linux/task_io_accounting_ops.h b/include/linux/task_io_accounting_ops.h
index bb5498b..733ab62 100644
--- a/include/linux/task_io_accounting_ops.h
+++ b/include/linux/task_io_accounting_ops.h
@@ -97,6 +97,7 @@ static inline void task_chr_io_accounting_add(struct task_io_accounting *dst,
 	dst->wchar += src->wchar;
 	dst->syscr += src->syscr;
 	dst->syscw += src->syscw;
+	dst->syscfs += src->syscfs;
 }
 #else
 static inline void task_chr_io_accounting_add(struct task_io_accounting *dst,
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index fe1e467..01308f5 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -240,17 +240,23 @@ struct ktime_timestamps {
  *				 counter value
  * @cycles:	Clocksource counter value to produce the system times
  * @real:	Realtime system time
+ * @boot:	Boot time
  * @raw:	Monotonic raw system time
  * @clock_was_set_seq:	The sequence number of clock was set events
  * @cs_was_changed_seq:	The sequence number of clocksource change events
+ * @mono_shift:	The monotonic clock slope shift
+ * @mono_mult:	The monotonic clock slope mult
  */
 struct system_time_snapshot {
 	u64			cycles;
 	ktime_t			real;
+	ktime_t			boot;
 	ktime_t			raw;
 	enum clocksource_ids	cs_id;
 	unsigned int		clock_was_set_seq;
 	u8			cs_was_changed_seq;
+	u32			mono_shift;
+	u32			mono_mult;
 };
 
 /**
diff --git a/include/linux/usb/composite.h b/include/linux/usb/composite.h
index 43ac3fa..814f6579 100644
--- a/include/linux/usb/composite.h
+++ b/include/linux/usb/composite.h
@@ -578,6 +578,7 @@ struct usb_function_instance {
 	struct config_group group;
 	struct list_head cfs_list;
 	struct usb_function_driver *fd;
+	struct usb_function *f;
 	int (*set_inst_name)(struct usb_function_instance *inst,
 			      const char *name);
 	void (*free_func_inst)(struct usb_function_instance *inst);
diff --git a/include/linux/usb/f_accessory.h b/include/linux/usb/f_accessory.h
new file mode 100644
index 0000000..ebe3c4d
--- /dev/null
+++ b/include/linux/usb/f_accessory.h
@@ -0,0 +1,23 @@
+/*
+ * Gadget Function Driver for Android USB accessories
+ *
+ * Copyright (C) 2011 Google, Inc.
+ * Author: Mike Lockwood <lockwood@android.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef __LINUX_USB_F_ACCESSORY_H
+#define __LINUX_USB_F_ACCESSORY_H
+
+#include <uapi/linux/usb/f_accessory.h>
+
+#endif /* __LINUX_USB_F_ACCESSORY_H */
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index dcab9c7..34e93634 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -96,6 +96,8 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq);
 int virtqueue_resize(struct virtqueue *vq, u32 num,
 		     void (*recycle)(struct virtqueue *vq, void *buf));
 
+void virtqueue_disable_dma_api_for_buffers(struct virtqueue *vq);
+
 /**
  * struct virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 35d7eed..de584a7 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -9,7 +9,8 @@
 
 #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 4)
 #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
-#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
+#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		virtio_transport_max_vsock_pkt_buf_size
+extern uint virtio_transport_max_vsock_pkt_buf_size;
 
 enum {
 	VSOCK_VQ_RX     = 0, /* for host to guest data */
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 7f5a51a..2eae4fc 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -229,6 +229,7 @@ void __wake_up_pollfree(struct wait_queue_head *wq_head);
 #define wake_up_interruptible_nr(x, nr)	__wake_up(x, TASK_INTERRUPTIBLE, nr, NULL)
 #define wake_up_interruptible_all(x)	__wake_up(x, TASK_INTERRUPTIBLE, 0, NULL)
 #define wake_up_interruptible_sync(x)	__wake_up_sync((x), TASK_INTERRUPTIBLE)
+#define wake_up_sync(x)			__wake_up_sync((x), TASK_NORMAL)
 
 /*
  * Wakeup macros to be used to report events to the targets.
diff --git a/include/linux/wakeup_reason.h b/include/linux/wakeup_reason.h
new file mode 100644
index 0000000..54f5caa
--- /dev/null
+++ b/include/linux/wakeup_reason.h
@@ -0,0 +1,37 @@
+/*
+ * include/linux/wakeup_reason.h
+ *
+ * Logs the reason that caused the kernel to resume from suspend.
+ *
+ * Copyright (C) 2014 Google, Inc.
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _LINUX_WAKEUP_REASON_H
+#define _LINUX_WAKEUP_REASON_H
+
+#define MAX_SUSPEND_ABORT_LEN 256
+
+#ifdef CONFIG_SUSPEND
+void log_irq_wakeup_reason(int irq);
+void log_threaded_irq_wakeup_reason(int irq, int parent_irq);
+void log_suspend_abort_reason(const char *fmt, ...);
+void log_abnormal_wakeup_reason(const char *fmt, ...);
+void clear_wakeup_reasons(void);
+#else
+static inline void log_irq_wakeup_reason(int irq) { }
+static inline void log_threaded_irq_wakeup_reason(int irq, int parent_irq) { }
+static inline void log_suspend_abort_reason(const char *fmt, ...) { }
+static inline void log_abnormal_wakeup_reason(const char *fmt, ...) { }
+static inline void clear_wakeup_reasons(void) { }
+#endif
+
+#endif /* _LINUX_WAKEUP_REASON_H */
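
A sketch of a driver vetoing suspend and recording why, using the printf-style helper declared above (the pending-work check is hypothetical):

	#include <linux/device.h>
	#include <linux/wakeup_reason.h>

	static int mydrv_suspend(struct device *dev)
	{
		if (mydrv_work_pending(dev)) {	/* hypothetical condition */
			log_suspend_abort_reason("mydrv: pending work on %s",
						 dev_name(dev));
			return -EBUSY;
		}
		return 0;
	}
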
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index 3253bd2..b64bb64 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -20,7 +20,7 @@
 #include <media/media-request.h>
 #include <media/frame_vector.h>
 
-#define VB2_MAX_FRAME	(32)
+#define VB2_MAX_FRAME	(64)
 #define VB2_MAX_PLANES	(8)
 
 /**
diff --git a/include/net/TEST_MAPPING b/include/net/TEST_MAPPING
new file mode 100644
index 0000000..1eb8d43
--- /dev/null
+++ b/include/net/TEST_MAPPING
@@ -0,0 +1,12 @@
+{
+  "presubmit": [
+    {
+      "name": "CtsNetTestCases",
+      "options": [
+        {
+          "exclude-annotation": "com.android.testutils.SkipPresubmit"
+        }
+      ]
+    }
+  ]
+}
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index c04f359..7ae7175 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -271,6 +271,18 @@ static inline bool ipv6_is_mld(struct sk_buff *skb, int nexthdr, int offset)
 void addrconf_prefix_rcv(struct net_device *dev,
 			 u8 *opt, int len, bool sllao);
 
+/* Determines into what table to put autoconf PIO/RIO/default routes
+ * learned on this device.
+ *
+ * - If 0, use the same table for every device. This puts routes into
+ *   one of RT_TABLE_{PREFIX,INFO,DFLT} depending on the type of route
+ *   (but note that these three are currently all equal to
+ *   RT6_TABLE_MAIN).
+ * - If > 0, use the specified table.
+ * - If < 0, put routes into table dev->ifindex + (-rt_table).
+ */
+u32 addrconf_rt_table(const struct net_device *dev, u32 default_table);
+
 /*
  *	anycast prototypes (anycast.c)
  */
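
The selection rule in the comment above can be written out directly; a sketch of the expected mapping (not the actual addrconf.c implementation):

	#include <linux/netdevice.h>

	static u32 pick_rt_table(const struct net_device *dev, s32 rt_table,
				 u32 default_table)
	{
		if (rt_table == 0)
			return default_table;		/* one shared table */
		if (rt_table > 0)
			return rt_table;		/* fixed table */
		return dev->ifindex + (u32)(-rt_table);	/* per-device table */
	}
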
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index dbc81f5..380eea4 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1597,9 +1597,7 @@ int xfrm_trans_queue(struct sk_buff *skb,
 int xfrm_output_resume(struct sock *sk, struct sk_buff *skb, int err);
 int xfrm_output(struct sock *sk, struct sk_buff *skb);
 
-#if IS_ENABLED(CONFIG_NET_PKTGEN)
 int pktgen_xfrm_outer_mode_output(struct xfrm_state *x, struct sk_buff *skb);
-#endif
 
 void xfrm_local_error(struct sk_buff *skb, int mtu);
 int xfrm4_extract_input(struct xfrm_state *x, struct sk_buff *skb);
diff --git a/include/sound/soc.h b/include/sound/soc.h
index 37bbfc8..f16861c 100644
--- a/include/sound/soc.h
+++ b/include/sound/soc.h
@@ -253,6 +253,14 @@
 	.get = xhandler_get, .put = xhandler_put, \
 	.private_value = SOC_DOUBLE_R_VALUE(reg_left, reg_right, xshift, \
 					    xmax, xinvert) }
+#define SOC_SINGLE_MULTI_EXT(xname, xreg, xshift, xmax, xinvert, xcount,\
+	xhandler_get, xhandler_put) \
+{	.iface = SNDRV_CTL_ELEM_IFACE_MIXER, .name = xname, \
+	.info = snd_soc_info_multi_ext, \
+	.get = xhandler_get, .put = xhandler_put, \
+	.private_value = (unsigned long)&(struct soc_multi_mixer_control) \
+		{.reg = xreg, .shift = xshift, .rshift = xshift, .max = xmax, \
+		.count = xcount, .platform_max = xmax, .invert = xinvert} }
 #define SOC_SINGLE_EXT_TLV(xname, xreg, xshift, xmax, xinvert,\
 	 xhandler_get, xhandler_put, tlv_array) \
 {	.iface = SNDRV_CTL_ELEM_IFACE_MIXER, .name = xname, \
@@ -606,6 +614,8 @@ int snd_soc_get_strobe(struct snd_kcontrol *kcontrol,
 	struct snd_ctl_elem_value *ucontrol);
 int snd_soc_put_strobe(struct snd_kcontrol *kcontrol,
 	struct snd_ctl_elem_value *ucontrol);
+int snd_soc_info_multi_ext(struct snd_kcontrol *kcontrol,
+	struct snd_ctl_elem_info *uinfo);
 
 /* SoC PCM stream information */
 struct snd_soc_pcm_stream {
@@ -1165,6 +1175,11 @@ struct soc_mreg_control {
 	unsigned int regbase, regcount, nbits, invert;
 };
 
+struct soc_multi_mixer_control {
+	int min, max, platform_max, count;
+	unsigned int reg, rreg, shift, rshift, invert;
+};
+
 /* enumerated kcontrol */
 struct soc_enum {
 	int reg;
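
A sketch of a codec driver instantiating a multi-value control with the new macro; the control name, value count, and get/put handlers are placeholders:

	#include <sound/soc.h>

	static int mycodec_tuning_get(struct snd_kcontrol *kcontrol,
				      struct snd_ctl_elem_value *ucontrol);
	static int mycodec_tuning_put(struct snd_kcontrol *kcontrol,
				      struct snd_ctl_elem_value *ucontrol);

	static const struct snd_kcontrol_new mycodec_controls[] = {
		/* name, reg (unused), shift, max, invert, count, get, put */
		SOC_SINGLE_MULTI_EXT("MyCodec Tuning", SND_SOC_NOPM, 0, 255, 0,
				     4, mycodec_tuning_get, mycodec_tuning_put),
	};
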
diff --git a/include/trace/events/OWNERS b/include/trace/events/OWNERS
new file mode 100644
index 0000000..a63dbf4
--- /dev/null
+++ b/include/trace/events/OWNERS
@@ -0,0 +1 @@
+per-file f2fs**=file:/fs/f2fs/OWNERS
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index ff57e7f..31d994e 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -48,6 +48,8 @@ TRACE_DEFINE_ENUM(CP_DISCARD);
 TRACE_DEFINE_ENUM(CP_TRIMMED);
 TRACE_DEFINE_ENUM(CP_PAUSE);
 TRACE_DEFINE_ENUM(CP_RESIZE);
+TRACE_DEFINE_ENUM(EX_READ);
+TRACE_DEFINE_ENUM(EX_BLOCK_AGE);
 
 #define show_block_type(type)						\
 	__print_symbolic(type,						\
@@ -154,6 +156,11 @@ TRACE_DEFINE_ENUM(CP_RESIZE);
 		{ COMPRESS_ZSTD,	"ZSTD" },			\
 		{ COMPRESS_LZORLE,	"LZO-RLE" })
 
+#define show_extent_type(type)						\
+	__print_symbolic(type,						\
+		{ EX_READ,	"Read" },				\
+		{ EX_BLOCK_AGE,	"Block Age" })
+
 struct f2fs_sb_info;
 struct f2fs_io_info;
 struct extent_info;
@@ -1404,7 +1411,7 @@ TRACE_EVENT(f2fs_readpages,
 
 TRACE_EVENT(f2fs_write_checkpoint,
 
-	TP_PROTO(struct super_block *sb, int reason, char *msg),
+	TP_PROTO(struct super_block *sb, int reason, const char *msg),
 
 	TP_ARGS(sb, reason, msg),
 
@@ -1522,28 +1529,31 @@ TRACE_EVENT(f2fs_issue_flush,
 
 TRACE_EVENT(f2fs_lookup_extent_tree_start,
 
-	TP_PROTO(struct inode *inode, unsigned int pgofs),
+	TP_PROTO(struct inode *inode, unsigned int pgofs, enum extent_type type),
 
-	TP_ARGS(inode, pgofs),
+	TP_ARGS(inode, pgofs, type),
 
 	TP_STRUCT__entry(
 		__field(dev_t,	dev)
 		__field(ino_t,	ino)
 		__field(unsigned int, pgofs)
+		__field(enum extent_type, type)
 	),
 
 	TP_fast_assign(
 		__entry->dev = inode->i_sb->s_dev;
 		__entry->ino = inode->i_ino;
 		__entry->pgofs = pgofs;
+		__entry->type = type;
 	),
 
-	TP_printk("dev = (%d,%d), ino = %lu, pgofs = %u",
+	TP_printk("dev = (%d,%d), ino = %lu, pgofs = %u, type = %s",
 		show_dev_ino(__entry),
-		__entry->pgofs)
+		__entry->pgofs,
+		show_extent_type(__entry->type))
 );
 
-TRACE_EVENT_CONDITION(f2fs_lookup_extent_tree_end,
+TRACE_EVENT_CONDITION(f2fs_lookup_read_extent_tree_end,
 
 	TP_PROTO(struct inode *inode, unsigned int pgofs,
 						struct extent_info *ei),
@@ -1557,8 +1567,8 @@ TRACE_EVENT_CONDITION(f2fs_lookup_extent_tree_end,
 		__field(ino_t,	ino)
 		__field(unsigned int, pgofs)
 		__field(unsigned int, fofs)
-		__field(u32, blk)
 		__field(unsigned int, len)
+		__field(u32, blk)
 	),
 
 	TP_fast_assign(
@@ -1566,26 +1576,65 @@ TRACE_EVENT_CONDITION(f2fs_lookup_extent_tree_end,
 		__entry->ino = inode->i_ino;
 		__entry->pgofs = pgofs;
 		__entry->fofs = ei->fofs;
-		__entry->blk = ei->blk;
 		__entry->len = ei->len;
+		__entry->blk = ei->blk;
 	),
 
 	TP_printk("dev = (%d,%d), ino = %lu, pgofs = %u, "
-		"ext_info(fofs: %u, blk: %u, len: %u)",
+		"read_ext_info(fofs: %u, len: %u, blk: %u)",
 		show_dev_ino(__entry),
 		__entry->pgofs,
 		__entry->fofs,
-		__entry->blk,
-		__entry->len)
+		__entry->len,
+		__entry->blk)
 );
 
-TRACE_EVENT(f2fs_update_extent_tree_range,
+TRACE_EVENT_CONDITION(f2fs_lookup_age_extent_tree_end,
 
-	TP_PROTO(struct inode *inode, unsigned int pgofs, block_t blkaddr,
-						unsigned int len,
+	TP_PROTO(struct inode *inode, unsigned int pgofs,
+						struct extent_info *ei),
+
+	TP_ARGS(inode, pgofs, ei),
+
+	TP_CONDITION(ei),
+
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(ino_t,	ino)
+		__field(unsigned int, pgofs)
+		__field(unsigned int, fofs)
+		__field(unsigned int, len)
+		__field(unsigned long long, age)
+		__field(unsigned long long, blocks)
+	),
+
+	TP_fast_assign(
+		__entry->dev = inode->i_sb->s_dev;
+		__entry->ino = inode->i_ino;
+		__entry->pgofs = pgofs;
+		__entry->fofs = ei->fofs;
+		__entry->len = ei->len;
+		__entry->age = ei->age;
+		__entry->blocks = ei->last_blocks;
+	),
+
+	TP_printk("dev = (%d,%d), ino = %lu, pgofs = %u, "
+		"age_ext_info(fofs: %u, len: %u, age: %llu, blocks: %llu)",
+		show_dev_ino(__entry),
+		__entry->pgofs,
+		__entry->fofs,
+		__entry->len,
+		__entry->age,
+		__entry->blocks)
+);
+
+TRACE_EVENT(f2fs_update_read_extent_tree_range,
+
+	TP_PROTO(struct inode *inode, unsigned int pgofs, unsigned int len,
+						block_t blkaddr,
 						unsigned int c_len),
 
-	TP_ARGS(inode, pgofs, blkaddr, len, c_len),
+	TP_ARGS(inode, pgofs, len, blkaddr, c_len),
 
 	TP_STRUCT__entry(
 		__field(dev_t,	dev)
@@ -1600,67 +1649,108 @@ TRACE_EVENT(f2fs_update_extent_tree_range,
 		__entry->dev = inode->i_sb->s_dev;
 		__entry->ino = inode->i_ino;
 		__entry->pgofs = pgofs;
-		__entry->blk = blkaddr;
 		__entry->len = len;
+		__entry->blk = blkaddr;
 		__entry->c_len = c_len;
 	),
 
 	TP_printk("dev = (%d,%d), ino = %lu, pgofs = %u, "
-					"blkaddr = %u, len = %u, "
-					"c_len = %u",
+				"len = %u, blkaddr = %u, c_len = %u",
 		show_dev_ino(__entry),
 		__entry->pgofs,
-		__entry->blk,
 		__entry->len,
+		__entry->blk,
 		__entry->c_len)
 );
 
+TRACE_EVENT(f2fs_update_age_extent_tree_range,
+
+	TP_PROTO(struct inode *inode, unsigned int pgofs, unsigned int len,
+					unsigned long long age,
+					unsigned long long last_blks),
+
+	TP_ARGS(inode, pgofs, len, age, last_blks),
+
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(ino_t,	ino)
+		__field(unsigned int, pgofs)
+		__field(unsigned int, len)
+		__field(unsigned long long, age)
+		__field(unsigned long long, blocks)
+	),
+
+	TP_fast_assign(
+		__entry->dev = inode->i_sb->s_dev;
+		__entry->ino = inode->i_ino;
+		__entry->pgofs = pgofs;
+		__entry->len = len;
+		__entry->age = age;
+		__entry->blocks = last_blks;
+	),
+
+	TP_printk("dev = (%d,%d), ino = %lu, pgofs = %u, "
+				"len = %u, age = %llu, blocks = %llu",
+		show_dev_ino(__entry),
+		__entry->pgofs,
+		__entry->len,
+		__entry->age,
+		__entry->blocks)
+);
+
 TRACE_EVENT(f2fs_shrink_extent_tree,
 
 	TP_PROTO(struct f2fs_sb_info *sbi, unsigned int node_cnt,
-						unsigned int tree_cnt),
+			unsigned int tree_cnt, enum extent_type type),
 
-	TP_ARGS(sbi, node_cnt, tree_cnt),
+	TP_ARGS(sbi, node_cnt, tree_cnt, type),
 
 	TP_STRUCT__entry(
 		__field(dev_t,	dev)
 		__field(unsigned int, node_cnt)
 		__field(unsigned int, tree_cnt)
+		__field(enum extent_type, type)
 	),
 
 	TP_fast_assign(
 		__entry->dev = sbi->sb->s_dev;
 		__entry->node_cnt = node_cnt;
 		__entry->tree_cnt = tree_cnt;
+		__entry->type = type;
 	),
 
-	TP_printk("dev = (%d,%d), shrunk: node_cnt = %u, tree_cnt = %u",
+	TP_printk("dev = (%d,%d), shrunk: node_cnt = %u, tree_cnt = %u, type = %s",
 		show_dev(__entry->dev),
 		__entry->node_cnt,
-		__entry->tree_cnt)
+		__entry->tree_cnt,
+		show_extent_type(__entry->type))
 );
 
 TRACE_EVENT(f2fs_destroy_extent_tree,
 
-	TP_PROTO(struct inode *inode, unsigned int node_cnt),
+	TP_PROTO(struct inode *inode, unsigned int node_cnt,
+				enum extent_type type),
 
-	TP_ARGS(inode, node_cnt),
+	TP_ARGS(inode, node_cnt, type),
 
 	TP_STRUCT__entry(
 		__field(dev_t,	dev)
 		__field(ino_t,	ino)
 		__field(unsigned int, node_cnt)
+		__field(enum extent_type, type)
 	),
 
 	TP_fast_assign(
 		__entry->dev = inode->i_sb->s_dev;
 		__entry->ino = inode->i_ino;
 		__entry->node_cnt = node_cnt;
+		__entry->type = type;
 	),
 
-	TP_printk("dev = (%d,%d), ino = %lu, destroyed: node_cnt = %u",
+	TP_printk("dev = (%d,%d), ino = %lu, destroyed: node_cnt = %u, type = %s",
 		show_dev_ino(__entry),
-		__entry->node_cnt)
+		__entry->node_cnt,
+		show_extent_type(__entry->type))
 );
 
 DECLARE_EVENT_CLASS(f2fs_sync_dirty_inodes,
diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
index eeceafa..da85851 100644
--- a/include/trace/events/irq.h
+++ b/include/trace/events/irq.h
@@ -160,6 +160,49 @@ DEFINE_EVENT(softirq, softirq_raise,
 	TP_ARGS(vec_nr)
 );
 
+DECLARE_EVENT_CLASS(tasklet,
+
+	TP_PROTO(void *func),
+
+	TP_ARGS(func),
+
+	TP_STRUCT__entry(
+		__field(	void *,	func)
+	),
+
+	TP_fast_assign(
+		__entry->func = func;
+	),
+
+	TP_printk("function=%ps", __entry->func)
+);
+
+/**
+ * tasklet_entry - called immediately before the tasklet is run
+ * @func:  tasklet callback or function being run
+ *
+ * Used to find individual tasklet execution time
+ */
+DEFINE_EVENT(tasklet, tasklet_entry,
+
+	TP_PROTO(void *func),
+
+	TP_ARGS(func)
+);
+
+/**
+ * tasklet_exit - called immediately after the tasklet is run
+ * @func:  tasklet callback or function being run
+ *
+ * Used to find individual tasklet execution time
+ */
+DEFINE_EVENT(tasklet, tasklet_exit,
+
+	TP_PROTO(void *func),
+
+	TP_ARGS(func)
+);
+
 #endif /*  _TRACE_IRQ_H */
 
 /* This part must be outside protection */
diff --git a/include/trace/events/rwmmio.h b/include/trace/events/rwmmio.h
index de41159..a43e5dd 100644
--- a/include/trace/events/rwmmio.h
+++ b/include/trace/events/rwmmio.h
@@ -12,12 +12,14 @@
 
 DECLARE_EVENT_CLASS(rwmmio_rw_template,
 
-	TP_PROTO(unsigned long caller, u64 val, u8 width, volatile void __iomem *addr),
+	TP_PROTO(unsigned long caller, unsigned long caller0, u64 val, u8 width,
+		 volatile void __iomem *addr),
 
-	TP_ARGS(caller, val, width, addr),
+	TP_ARGS(caller, caller0, val, width, addr),
 
 	TP_STRUCT__entry(
 		__field(unsigned long, caller)
+		__field(unsigned long, caller0)
 		__field(unsigned long, addr)
 		__field(u64, val)
 		__field(u8, width)
@@ -25,56 +27,64 @@ DECLARE_EVENT_CLASS(rwmmio_rw_template,
 
 	TP_fast_assign(
 		__entry->caller = caller;
+		__entry->caller0 = caller0;
 		__entry->val = val;
 		__entry->addr = (unsigned long)addr;
 		__entry->width = width;
 	),
 
-	TP_printk("%pS width=%d val=%#llx addr=%#lx",
-		(void *)__entry->caller, __entry->width,
+	TP_printk("%pS -> %pS width=%d val=%#llx addr=%#lx",
+		(void *)__entry->caller0, (void *)__entry->caller, __entry->width,
 		__entry->val, __entry->addr)
 );
 
 DEFINE_EVENT(rwmmio_rw_template, rwmmio_write,
-	TP_PROTO(unsigned long caller, u64 val, u8 width, volatile void __iomem *addr),
-	TP_ARGS(caller, val, width, addr)
+	TP_PROTO(unsigned long caller, unsigned long caller0, u64 val, u8 width,
+		 volatile void __iomem *addr),
+	TP_ARGS(caller, caller0, val, width, addr)
 );
 
 DEFINE_EVENT(rwmmio_rw_template, rwmmio_post_write,
-	TP_PROTO(unsigned long caller, u64 val, u8 width, volatile void __iomem *addr),
-	TP_ARGS(caller, val, width, addr)
+	TP_PROTO(unsigned long caller, unsigned long caller0, u64 val, u8 width,
+		 volatile void __iomem *addr),
+	TP_ARGS(caller, caller0, val, width, addr)
 );
 
 TRACE_EVENT(rwmmio_read,
 
-	TP_PROTO(unsigned long caller, u8 width, const volatile void __iomem *addr),
+	TP_PROTO(unsigned long caller, unsigned long caller0, u8 width,
+		 const volatile void __iomem *addr),
 
-	TP_ARGS(caller, width, addr),
+	TP_ARGS(caller, caller0, width, addr),
 
 	TP_STRUCT__entry(
 		__field(unsigned long, caller)
+		__field(unsigned long, caller0)
 		__field(unsigned long, addr)
 		__field(u8, width)
 	),
 
 	TP_fast_assign(
 		__entry->caller = caller;
+		__entry->caller0 = caller0;
 		__entry->addr = (unsigned long)addr;
 		__entry->width = width;
 	),
 
-	TP_printk("%pS width=%d addr=%#lx",
-		 (void *)__entry->caller, __entry->width, __entry->addr)
+	TP_printk("%pS -> %pS width=%d addr=%#lx",
+		 (void *)__entry->caller0, (void *)__entry->caller, __entry->width, __entry->addr)
 );
 
 TRACE_EVENT(rwmmio_post_read,
 
-	TP_PROTO(unsigned long caller, u64 val, u8 width, const volatile void __iomem *addr),
+	TP_PROTO(unsigned long caller, unsigned long caller0, u64 val, u8 width,
+		 const volatile void __iomem *addr),
 
-	TP_ARGS(caller, val, width, addr),
+	TP_ARGS(caller, caller0, val, width, addr),
 
 	TP_STRUCT__entry(
 		__field(unsigned long, caller)
+		__field(unsigned long, caller0)
 		__field(unsigned long, addr)
 		__field(u64, val)
 		__field(u8, width)
@@ -82,13 +92,14 @@ TRACE_EVENT(rwmmio_post_read,
 
 	TP_fast_assign(
 		__entry->caller = caller;
+		__entry->caller0 = caller0;
 		__entry->val = val;
 		__entry->addr = (unsigned long)addr;
 		__entry->width = width;
 	),
 
-	TP_printk("%pS width=%d val=%#llx addr=%#lx",
-		 (void *)__entry->caller, __entry->width,
+	TP_printk("%pS -> %pS width=%d val=%#llx addr=%#lx",
+		 (void *)__entry->caller0, (void *)__entry->caller, __entry->width,
 		 __entry->val, __entry->addr)
 );
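
With the added caller0 argument the rwmmio events report two frames ("%pS -> %pS"), which matters because the accessors are usually inlined and a single address cannot distinguish a wrapper from its real caller. A hedged sketch of how a driver-local wrapper would feed both addresses — traced_writel is an invented name, but the in-kernel accessors pass _THIS_IP_/_RET_IP_ in the same way:

#include <linux/io.h>
#include <linux/instruction_pointer.h>
#include <trace/events/rwmmio.h>

static inline void traced_writel(u32 val, volatile void __iomem *addr)
{
	/*
	 * caller = this (inlined) call site, caller0 = one frame further
	 * out; the event prints them as caller0 -> caller.
	 */
	trace_rwmmio_write(_THIS_IP_, _RET_IP_, val, 32, addr);
	writel_relaxed(val, addr);
	trace_rwmmio_post_write(_THIS_IP_, _RET_IP_, val, 32, addr);
}
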
 
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index fbb99a6..cd969a6 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -488,6 +488,30 @@ DEFINE_EVENT_SCHEDSTAT(sched_stat_template, sched_stat_blocked,
 	     TP_ARGS(tsk, delay));
 
 /*
+ * Tracepoint for recording the cause of uninterruptible sleep.
+ */
+TRACE_EVENT(sched_blocked_reason,
+
+	TP_PROTO(struct task_struct *tsk),
+
+	TP_ARGS(tsk),
+
+	TP_STRUCT__entry(
+		__field( pid_t,	pid	)
+		__field( void*, caller	)
+		__field( bool, io_wait	)
+	),
+
+	TP_fast_assign(
+		__entry->pid	= tsk->pid;
+		__entry->caller = (void *)__get_wchan(tsk);
+		__entry->io_wait = tsk->in_iowait;
+	),
+
+	TP_printk("pid=%d iowait=%d caller=%pS", __entry->pid, __entry->io_wait, __entry->caller)
+);
+
+/*
  * Tracepoint for accounting runtime (time the task is executing
  * on a CPU).
  */
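
sched_blocked_reason records, when a task comes out of uninterruptible sleep, where it was blocked (via __get_wchan()) and whether the sleep was counted as iowait. A sketch of a probe consuming the event, assuming the tracepoint is exported to modules (module and probe names are illustrative):

#include <linux/module.h>
#include <linux/sched.h>
#include <trace/events/sched.h>

static void probe_blocked_reason(void *data, struct task_struct *tsk)
{
	if (tsk->in_iowait)
		pr_debug("pid %d woke from iowait sleep\n", tsk->pid);
}

static int __init blocked_reason_init(void)
{
	return register_trace_sched_blocked_reason(probe_blocked_reason, NULL);
}
module_init(blocked_reason_init);
MODULE_LICENSE("GPL");
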
diff --git a/include/trace/hooks/avc.h b/include/trace/hooks/avc.h
new file mode 100644
index 0000000..5100dde7
--- /dev/null
+++ b/include/trace/hooks/avc.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM avc
+
+#define TRACE_INCLUDE_PATH trace/hooks
+#if !defined(_TRACE_HOOK_AVC_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_AVC_H
+#include <trace/hooks/vendor_hooks.h>
+/*
+ * Following tracepoints are not exported in tracefs and provide a
+ * mechanism for vendor modules to hook and extend functionality
+ */
+struct avc_node;
+DECLARE_RESTRICTED_HOOK(android_rvh_selinux_avc_insert,
+	TP_PROTO(const struct avc_node *node),
+	TP_ARGS(node), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_selinux_avc_node_delete,
+	TP_PROTO(const struct avc_node *node),
+	TP_ARGS(node), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_selinux_avc_node_replace,
+	TP_PROTO(const struct avc_node *old, const struct avc_node *new),
+	TP_ARGS(old, new), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_selinux_avc_lookup,
+	TP_PROTO(const struct avc_node *node, u32 ssid, u32 tsid, u16 tclass),
+	TP_ARGS(node, ssid, tsid, tclass), 1);
+
+#endif /* _TRACE_HOOK_AVC_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/binder.h b/include/trace/hooks/binder.h
new file mode 100644
index 0000000..4cb0d78
--- /dev/null
+++ b/include/trace/hooks/binder.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM binder
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH trace/hooks
+#if !defined(_TRACE_HOOK_BINDER_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_BINDER_H
+#include <trace/hooks/vendor_hooks.h>
+/*
+ * Following tracepoints are not exported in tracefs and provide a
+ * mechanism for vendor modules to hook and extend functionality
+ */
+struct binder_transaction;
+struct task_struct;
+DECLARE_HOOK(android_vh_binder_transaction_init,
+	TP_PROTO(struct binder_transaction *t),
+	TP_ARGS(t));
+DECLARE_HOOK(android_vh_binder_set_priority,
+	TP_PROTO(struct binder_transaction *t, struct task_struct *task),
+	TP_ARGS(t, task));
+DECLARE_HOOK(android_vh_binder_restore_priority,
+	TP_PROTO(struct binder_transaction *t, struct task_struct *task),
+	TP_ARGS(t, task));
+struct binder_proc;
+struct binder_thread;
+DECLARE_HOOK(android_vh_binder_wakeup_ilocked,
+	TP_PROTO(struct task_struct *task, bool sync, struct binder_proc *proc),
+	TP_ARGS(task, sync, proc));
+DECLARE_HOOK(android_vh_binder_wait_for_work,
+	TP_PROTO(bool do_proc_work, struct binder_thread *tsk, struct binder_proc *proc),
+	TP_ARGS(do_proc_work, tsk, proc));
+DECLARE_HOOK(android_vh_sync_txn_recvd,
+	TP_PROTO(struct task_struct *tsk, struct task_struct *from),
+	TP_ARGS(tsk, from));
+#endif /* _TRACE_HOOK_BINDER_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/cgroup.h b/include/trace/hooks/cgroup.h
new file mode 100644
index 0000000..b8510d6
--- /dev/null
+++ b/include/trace/hooks/cgroup.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM cgroup
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH trace/hooks
+#if !defined(_TRACE_HOOK_CGROUP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_CGROUP_H
+#include <trace/hooks/vendor_hooks.h>
+
+struct task_struct;
+DECLARE_HOOK(android_vh_cgroup_set_task,
+	TP_PROTO(int ret, struct task_struct *task),
+	TP_ARGS(ret, task));
+
+struct cgroup_subsys;
+struct cgroup_taskset;
+DECLARE_HOOK(android_vh_cgroup_attach,
+	TP_PROTO(struct cgroup_subsys *ss, struct cgroup_taskset *tset),
+	TP_ARGS(ss, tset))
+DECLARE_RESTRICTED_HOOK(android_rvh_cgroup_force_kthread_migration,
+	TP_PROTO(struct task_struct *tsk, struct cgroup *dst_cgrp, bool *force_migration),
+	TP_ARGS(tsk, dst_cgrp, force_migration), 1);
+
+struct cgroup_taskset;
+struct cgroup_subsys;
+
+DECLARE_RESTRICTED_HOOK(android_rvh_cpu_cgroup_attach,
+	TP_PROTO(struct cgroup_taskset *tset),
+	TP_ARGS(tset), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_cpu_cgroup_online,
+	TP_PROTO(struct cgroup_subsys_state *css),
+	TP_ARGS(css), 1);
+#endif
+
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/cpufreq.h b/include/trace/hooks/cpufreq.h
new file mode 100644
index 0000000..c6c2b19
--- /dev/null
+++ b/include/trace/hooks/cpufreq.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM cpufreq
+
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_CPUFREQ_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_CPUFREQ_H
+
+#include <trace/hooks/vendor_hooks.h>
+
+struct cpufreq_policy;
+
+DECLARE_RESTRICTED_HOOK(android_rvh_show_max_freq,
+	TP_PROTO(struct cpufreq_policy *policy, unsigned int *max_freq),
+	TP_ARGS(policy, max_freq), 1);
+
+DECLARE_HOOK(android_vh_freq_table_limits,
+	TP_PROTO(struct cpufreq_policy *policy, unsigned int min_freq,
+		 unsigned int max_freq),
+	TP_ARGS(policy, min_freq, max_freq));
+
+DECLARE_RESTRICTED_HOOK(android_rvh_cpufreq_transition,
+	TP_PROTO(struct cpufreq_policy *policy),
+	TP_ARGS(policy), 1);
+
+DECLARE_HOOK(android_vh_cpufreq_resolve_freq,
+	TP_PROTO(struct cpufreq_policy *policy, unsigned int *target_freq,
+		unsigned int old_target_freq),
+	TP_ARGS(policy, target_freq, old_target_freq));
+
+DECLARE_HOOK(android_vh_cpufreq_fast_switch,
+	TP_PROTO(struct cpufreq_policy *policy, unsigned int *target_freq,
+		unsigned int old_target_freq),
+	TP_ARGS(policy, target_freq, old_target_freq));
+
+DECLARE_HOOK(android_vh_cpufreq_target,
+	TP_PROTO(struct cpufreq_policy *policy, unsigned int *target_freq,
+		unsigned int old_target_freq),
+	TP_ARGS(policy, target_freq, old_target_freq));
+
+#endif /* _TRACE_HOOK_CPUFREQ_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/cpuidle.h b/include/trace/hooks/cpuidle.h
new file mode 100644
index 0000000..b1ee27e
--- /dev/null
+++ b/include/trace/hooks/cpuidle.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM cpuidle
+
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_CPUIDLE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_CPUIDLE_H
+
+#include <trace/hooks/vendor_hooks.h>
+
+struct cpuidle_device;
+
+DECLARE_HOOK(android_vh_cpu_idle_enter,
+	TP_PROTO(int *state, struct cpuidle_device *dev),
+	TP_ARGS(state, dev))
+DECLARE_HOOK(android_vh_cpu_idle_exit,
+	TP_PROTO(int state, struct cpuidle_device *dev),
+	TP_ARGS(state, dev))
+
+#endif /* _TRACE_HOOK_CPUIDLE_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
+
diff --git a/include/trace/hooks/cpuidle_psci.h b/include/trace/hooks/cpuidle_psci.h
new file mode 100644
index 0000000..eef0032
--- /dev/null
+++ b/include/trace/hooks/cpuidle_psci.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM cpuidle_psci
+#define TRACE_INCLUDE_PATH trace/hooks
+#if !defined(_TRACE_HOOK_CPUIDLE_PSCI_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_CPUIDLE_PSCI_H
+#include <trace/hooks/vendor_hooks.h>
+/*
+ * Following tracepoints are not exported in tracefs and provide a
+ * mechanism for vendor modules to hook and extend functionality
+ */
+
+struct cpuidle_device;
+DECLARE_HOOK(android_vh_cpuidle_psci_enter,
+	TP_PROTO(struct cpuidle_device *dev, bool s2idle),
+	TP_ARGS(dev, s2idle));
+
+DECLARE_HOOK(android_vh_cpuidle_psci_exit,
+	TP_PROTO(struct cpuidle_device *dev, bool s2idle),
+	TP_ARGS(dev, s2idle));
+
+#endif /* _TRACE_HOOK_CPUIDLE_PSCI_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/creds.h b/include/trace/hooks/creds.h
new file mode 100644
index 0000000..69a6808
--- /dev/null
+++ b/include/trace/hooks/creds.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM creds
+
+#define TRACE_INCLUDE_PATH trace/hooks
+#if !defined(_TRACE_HOOK_CREDS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_CREDS_H
+#include <trace/hooks/vendor_hooks.h>
+/*
+ * Following tracepoints are not exported in tracefs and provide a
+ * mechanism for vendor modules to hook and extend functionality
+ */
+struct cred;
+struct task_struct;
+DECLARE_RESTRICTED_HOOK(android_rvh_commit_creds,
+	TP_PROTO(const struct task_struct *task, const struct cred *new),
+	TP_ARGS(task, new), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_exit_creds,
+	TP_PROTO(const struct task_struct *task, const struct cred *cred),
+	TP_ARGS(task, cred), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_override_creds,
+	TP_PROTO(const struct task_struct *task, const struct cred *new),
+	TP_ARGS(task, new), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_revert_creds,
+	TP_PROTO(const struct task_struct *task, const struct cred *old),
+	TP_ARGS(task, old), 1);
+
+#endif /* _TRACE_HOOK_CREDS_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/debug.h b/include/trace/hooks/debug.h
new file mode 100644
index 0000000..5a20141
--- /dev/null
+++ b/include/trace/hooks/debug.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM debug
+
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_DEBUG_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_DEBUG_H
+
+#include <trace/hooks/vendor_hooks.h>
+
+struct pt_regs;
+
+DECLARE_HOOK(android_vh_ipi_stop,
+	TP_PROTO(struct pt_regs *regs),
+	TP_ARGS(regs))
+
+#endif /* _TRACE_HOOK_DEBUG_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/dmabuf.h b/include/trace/hooks/dmabuf.h
new file mode 100644
index 0000000..85688eb
--- /dev/null
+++ b/include/trace/hooks/dmabuf.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM dmabuf
+
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_DMA_BUF_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_DMA_BUF_H
+
+struct dma_buf;
+
+#include <trace/hooks/vendor_hooks.h>
+
+DECLARE_HOOK(android_vh_ignore_dmabuf_vmap_bounds,
+	     TP_PROTO(struct dma_buf *dma_buf, bool *ignore_bounds),
+	     TP_ARGS(dma_buf, ignore_bounds));
+
+#endif /* _TRACE_HOOK_DMA_BUF_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/epoch.h b/include/trace/hooks/epoch.h
new file mode 100644
index 0000000..ccee2d8
--- /dev/null
+++ b/include/trace/hooks/epoch.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM epoch
+
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_EPOCH_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_EPOCH_H
+
+#include <trace/hooks/vendor_hooks.h>
+
+DECLARE_HOOK(android_vh_show_suspend_epoch_val,
+	TP_PROTO(u64 suspend_ns, u64 suspend_cycles),
+	TP_ARGS(suspend_ns, suspend_cycles));
+
+DECLARE_HOOK(android_vh_show_resume_epoch_val,
+	TP_PROTO(u64 resume_cycles),
+	TP_ARGS(resume_cycles));
+
+#endif /* _TRACE_HOOK_EPOCH_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/fips140.h b/include/trace/hooks/fips140.h
new file mode 100644
index 0000000..fd4a42c
--- /dev/null
+++ b/include/trace/hooks/fips140.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM fips140
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_FIPS140_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_FIPS140_H
+#include <trace/hooks/vendor_hooks.h>
+
+struct crypto_aes_ctx;
+
+/*
+ * These hooks exist only for the benefit of the FIPS140 crypto module, which
+ * uses them to swap out the underlying implementation with one that is integrity
+ * checked as per FIPS 140 requirements. No other uses are allowed or
+ * supported.
+ */
+
+DECLARE_HOOK(android_vh_sha256,
+	     TP_PROTO(const u8 *data,
+		      unsigned int len,
+		      u8 *out,
+		      int *hook_inuse),
+	     TP_ARGS(data, len, out, hook_inuse));
+
+DECLARE_HOOK(android_vh_aes_expandkey,
+	     TP_PROTO(struct crypto_aes_ctx *ctx,
+		      const u8 *in_key,
+		      unsigned int key_len,
+		      int *err),
+	     TP_ARGS(ctx, in_key, key_len, err));
+
+DECLARE_HOOK(android_vh_aes_encrypt,
+	     TP_PROTO(const struct crypto_aes_ctx *ctx,
+		      u8 *out,
+		      const u8 *in,
+		      int *hook_inuse),
+	     TP_ARGS(ctx, out, in, hook_inuse));
+
+DECLARE_HOOK(android_vh_aes_decrypt,
+	     TP_PROTO(const struct crypto_aes_ctx *ctx,
+		      u8 *out,
+		      const u8 *in,
+		      int *hook_inuse),
+	     TP_ARGS(ctx, out, in, hook_inuse));
+
+#endif /* _TRACE_HOOK_FIPS140_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
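
The int-pointer out-parameters (*hook_inuse, *err) implement a takeover protocol: the core calls the hook first and runs its built-in path only when no probe claimed the operation. A hedged sketch of the consuming side — fips_aware_sha256() and generic_sha256() are illustrative stand-ins, not the real lib/crypto call site:

int fips_aware_sha256(const u8 *data, unsigned int len, u8 *out)
{
	int hook_inuse = 0;

	trace_android_vh_sha256(data, len, out, &hook_inuse);
	if (hook_inuse)
		return 0;	/* a FIPS140 probe already produced the digest */

	return generic_sha256(data, len, out);	/* illustrative fallback */
}
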
diff --git a/include/trace/hooks/fpsimd.h b/include/trace/hooks/fpsimd.h
new file mode 100644
index 0000000..1033718
--- /dev/null
+++ b/include/trace/hooks/fpsimd.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM fpsimd
+
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_FPSIMD_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_FPSIMD_H
+
+#include <trace/hooks/vendor_hooks.h>
+
+struct task_struct;
+
+DECLARE_HOOK(android_vh_is_fpsimd_save,
+	TP_PROTO(struct task_struct *prev, struct task_struct *next),
+	TP_ARGS(prev, next))
+
+#endif /* _TRACE_HOOK_FPSIMD_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/ftrace_dump.h b/include/trace/hooks/ftrace_dump.h
new file mode 100644
index 0000000..3d0b980
--- /dev/null
+++ b/include/trace/hooks/ftrace_dump.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ftrace_dump
+
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_FTRACE_DUMP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_FTRACE_DUMP_H
+
+#include <trace/hooks/vendor_hooks.h>
+
+struct trace_seq;
+
+DECLARE_HOOK(android_vh_ftrace_oops_enter,
+	TP_PROTO(bool *ftrace_check),
+	TP_ARGS(ftrace_check));
+
+DECLARE_HOOK(android_vh_ftrace_oops_exit,
+	TP_PROTO(bool *ftrace_check),
+	TP_ARGS(ftrace_check));
+
+DECLARE_HOOK(android_vh_ftrace_size_check,
+	TP_PROTO(unsigned long size, bool *ftrace_check),
+	TP_ARGS(size, ftrace_check));
+
+DECLARE_HOOK(android_vh_ftrace_format_check,
+	TP_PROTO(bool *ftrace_check),
+	TP_ARGS(ftrace_check));
+
+DECLARE_HOOK(android_vh_ftrace_dump_buffer,
+	TP_PROTO(struct trace_seq *trace_buf, bool *dump_printk),
+	TP_ARGS(trace_buf, dump_printk));
+
+#endif /* _TRACE_HOOK_FTRACE_DUMP_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/futex.h b/include/trace/hooks/futex.h
new file mode 100644
index 0000000..cbf6abb
--- /dev/null
+++ b/include/trace/hooks/futex.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM futex
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH trace/hooks
+#if !defined(_TRACE_HOOK_FUTEX_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_FUTEX_H
+#include <linux/tracepoint.h>
+#include <trace/hooks/vendor_hooks.h>
+#include <linux/plist.h>
+/*
+ * Following tracepoints are not exported in tracefs and provide a
+ * mechanism for vendor modules to hook and extend functionality
+ */
+#if defined(CONFIG_TRACEPOINTS) && defined(CONFIG_ANDROID_VENDOR_HOOKS)
+DECLARE_HOOK(android_vh_alter_futex_plist_add,
+	TP_PROTO(struct plist_node *node,
+		 struct plist_head *head,
+		 bool *already_on_hb),
+	TP_ARGS(node, head, already_on_hb));
+#else
+#define trace_android_vh_alter_futex_plist_add(node, head, already_on_hb)
+#endif
+#endif /* _TRACE_HOOK_FUTEX_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/gic.h b/include/trace/hooks/gic.h
new file mode 100644
index 0000000..daa11cd
--- /dev/null
+++ b/include/trace/hooks/gic.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM gic
+
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_GIC_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_GIC_H
+
+#include <trace/hooks/vendor_hooks.h>
+
+struct irq_data;
+DECLARE_HOOK(android_vh_gic_set_affinity,
+	TP_PROTO(struct irq_data *d, const struct cpumask *mask_val,
+		 bool force, u8 *gic_cpu_map, void __iomem *reg),
+	TP_ARGS(d, mask_val, force, gic_cpu_map, reg));
+
+#endif /* _TRACE_HOOK_GIC_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/gic_v3.h b/include/trace/hooks/gic_v3.h
new file mode 100644
index 0000000..0f81993
--- /dev/null
+++ b/include/trace/hooks/gic_v3.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM gic_v3
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_GIC_V3_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_GIC_V3_H
+
+#include <trace/hooks/vendor_hooks.h>
+
+/*
+ * Following tracepoints are not exported in tracefs and provide a
+ * mechanism for vendor modules to hook and extend functionality
+ */
+struct irq_data;
+struct cpumask;
+DECLARE_HOOK(android_vh_gic_v3_affinity_init,
+	TP_PROTO(int irq, u32 offset, u64 *affinity),
+	TP_ARGS(irq, offset, affinity));
+DECLARE_RESTRICTED_HOOK(android_rvh_gic_v3_set_affinity,
+	TP_PROTO(struct irq_data *d, const struct cpumask *mask_val,
+		 u64 *affinity, bool force, void __iomem *base,
+		 void __iomem *rbase, u64 redist_stride),
+	TP_ARGS(d, mask_val, affinity, force, base, rbase, redist_stride),
+	1);
+
+#endif /* _TRACE_HOOK_GIC_V3_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/hung_task.h b/include/trace/hooks/hung_task.h
new file mode 100644
index 0000000..cdbf59e
--- /dev/null
+++ b/include/trace/hooks/hung_task.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hung_task
+
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_HUNG_TASK_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_HUNG_TASK_H
+
+#include <trace/hooks/vendor_hooks.h>
+
+DECLARE_HOOK(android_vh_check_uninterrupt_tasks,
+	TP_PROTO(struct task_struct *t, unsigned long timeout,
+		bool *need_check),
+	TP_ARGS(t, timeout, need_check));
+
+DECLARE_HOOK(android_vh_check_uninterrupt_tasks_done,
+	TP_PROTO(void *unused),
+	TP_ARGS(unused));
+
+#endif /* _TRACE_HOOK_HUNG_TASK_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/iommu.h b/include/trace/hooks/iommu.h
new file mode 100644
index 0000000..d36446f
--- /dev/null
+++ b/include/trace/hooks/iommu.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM iommu
+
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_IOMMU_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_IOMMU_H
+
+#include <trace/hooks/vendor_hooks.h>
+
+DECLARE_RESTRICTED_HOOK(android_rvh_iommu_setup_dma_ops,
+	TP_PROTO(struct device *dev, u64 dma_base, u64 dma_limit),
+	TP_ARGS(dev, dma_base, dma_limit), 1);
+
+struct iova_domain;
+struct iova;
+
+DECLARE_RESTRICTED_HOOK(android_rvh_iommu_alloc_insert_iova,
+	TP_PROTO(struct iova_domain *iovad, unsigned long size,
+		unsigned long limit_pfn, struct iova *new_iova,
+		bool size_aligned, int *ret),
+	TP_ARGS(iovad, size, limit_pfn, new_iova, size_aligned, ret),
+	1);
+
+DECLARE_HOOK(android_vh_iommu_iovad_alloc_iova,
+	TP_PROTO(struct device *dev, struct iova_domain *iovad, dma_addr_t iova, size_t size),
+	TP_ARGS(dev, iovad, iova, size));
+
+DECLARE_HOOK(android_vh_iommu_iovad_free_iova,
+	TP_PROTO(struct iova_domain *iovad, dma_addr_t iova, size_t size),
+	TP_ARGS(iovad, iova, size));
+
+DECLARE_RESTRICTED_HOOK(android_rvh_iommu_iovad_init_alloc_algo,
+	TP_PROTO(struct device *dev, struct iova_domain *iovad),
+	TP_ARGS(dev, iovad), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_iommu_limit_align_shift,
+	TP_PROTO(struct iova_domain *iovad, unsigned long size,
+		unsigned long *shift),
+	TP_ARGS(iovad, size, shift), 1);
+
+#endif /* _TRACE_HOOK_IOMMU_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/mm.h b/include/trace/hooks/mm.h
new file mode 100644
index 0000000..b3912a9
--- /dev/null
+++ b/include/trace/hooks/mm.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM mm
+
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_MM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_MM_H
+
+#include <trace/hooks/vendor_hooks.h>
+/*
+ * Following tracepoints are not exported in tracefs and provide a
+ * mechanism for vendor modules to hook and extend functionality
+ */
+DECLARE_RESTRICTED_HOOK(android_rvh_set_skip_swapcache_flags,
+			TP_PROTO(gfp_t *flags),
+			TP_ARGS(flags), 1);
+DECLARE_RESTRICTED_HOOK(android_rvh_set_gfp_zone_flags,
+			TP_PROTO(gfp_t *flags),
+			TP_ARGS(flags), 1);
+DECLARE_RESTRICTED_HOOK(android_rvh_set_readahead_gfp_mask,
+			TP_PROTO(gfp_t *flags),
+			TP_ARGS(flags), 1);
+
+#endif /* _TRACE_HOOK_MM_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/module.h b/include/trace/hooks/module.h
new file mode 100644
index 0000000..780e767
--- /dev/null
+++ b/include/trace/hooks/module.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM module
+
+#define TRACE_INCLUDE_PATH trace/hooks
+#if !defined(_TRACE_HOOK_MODULE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_MODULE_H
+#include <trace/hooks/vendor_hooks.h>
+/*
+ * Following tracepoints are not exported in tracefs and provide a
+ * mechanism for vendor modules to hook and extend functionality
+ */
+struct module;
+DECLARE_RESTRICTED_HOOK(android_rvh_set_module_permit_before_init,
+	TP_PROTO(const struct module *mod),
+	TP_ARGS(mod), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_set_module_permit_after_init,
+	TP_PROTO(const struct module *mod),
+	TP_ARGS(mod), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_set_module_core_rw_nx,
+	TP_PROTO(const struct module *mod),
+	TP_ARGS(mod), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_set_module_init_rw_nx,
+	TP_PROTO(const struct module *mod),
+	TP_ARGS(mod), 1);
+
+#endif /* _TRACE_HOOK_MODULE_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/mpam.h b/include/trace/hooks/mpam.h
new file mode 100644
index 0000000..50f5a68
--- /dev/null
+++ b/include/trace/hooks/mpam.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM mpam
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH trace/hooks
+#if !defined(_TRACE_HOOK_MPAM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_MPAM_H
+#include <trace/hooks/vendor_hooks.h>
+/*
+ * Following tracepoints are not exported in tracefs and provide a
+ * mechanism for vendor modules to hook and extend functionality
+ */
+struct task_struct;
+DECLARE_HOOK(android_vh_mpam_set,
+	TP_PROTO(struct task_struct *prev, struct task_struct *next),
+	TP_ARGS(prev, next));
+
+#endif /* _TRACE_HOOK_MPAM_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/net.h b/include/trace/hooks/net.h
new file mode 100644
index 0000000..381649b
--- /dev/null
+++ b/include/trace/hooks/net.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM net
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_NET_VH_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_NET_VH_H
+#include <trace/hooks/vendor_hooks.h>
+
+struct packet_type;
+struct list_head;
+DECLARE_HOOK(android_vh_ptype_head,
+	TP_PROTO(const struct packet_type *pt, struct list_head *vendor_pt),
+	TP_ARGS(pt, vendor_pt));
+
+/* macro versions of hooks are no longer required */
+
+#endif /* _TRACE_HOOK_NET_VH_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/pm_domain.h b/include/trace/hooks/pm_domain.h
new file mode 100644
index 0000000..2a530d1
--- /dev/null
+++ b/include/trace/hooks/pm_domain.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM pm_domain
+
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_PM_DOMAIN_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_PM_DOMAIN_H
+
+#include <trace/hooks/vendor_hooks.h>
+
+struct generic_pm_domain;
+DECLARE_HOOK(android_vh_allow_domain_state,
+	TP_PROTO(struct generic_pm_domain *genpd, uint32_t idx, bool *allow),
+	TP_ARGS(genpd, idx, allow))
+
+#endif /* _TRACE_HOOK_PM_DOMAIN_H */
+
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/preemptirq.h b/include/trace/hooks/preemptirq.h
new file mode 100644
index 0000000..52308c8
--- /dev/null
+++ b/include/trace/hooks/preemptirq.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM preemptirq
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_PREEMPTIRQ_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_PREEMPTIRQ_H
+
+#include <trace/hooks/vendor_hooks.h>
+
+DECLARE_RESTRICTED_HOOK(android_rvh_preempt_disable,
+	TP_PROTO(unsigned long ip, unsigned long parent_ip),
+	TP_ARGS(ip, parent_ip), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_preempt_enable,
+	TP_PROTO(unsigned long ip, unsigned long parent_ip),
+	TP_ARGS(ip, parent_ip), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_irqs_disable,
+	TP_PROTO(unsigned long ip, unsigned long parent_ip),
+	TP_ARGS(ip, parent_ip), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_irqs_enable,
+	TP_PROTO(unsigned long ip, unsigned long parent_ip),
+	TP_ARGS(ip, parent_ip), 1);
+
+#endif /* _TRACE_HOOK_PREEMPTIRQ_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/printk.h b/include/trace/hooks/printk.h
new file mode 100644
index 0000000..2598490
--- /dev/null
+++ b/include/trace/hooks/printk.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM printk
+
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_PRINTK_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_PRINTK_H
+
+#include <trace/hooks/vendor_hooks.h>
+
+DECLARE_HOOK(android_vh_printk_hotplug,
+	TP_PROTO(int *flag),
+	TP_ARGS(flag));
+
+#endif /* _TRACE_HOOK_PRINTK_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/remoteproc.h b/include/trace/hooks/remoteproc.h
new file mode 100644
index 0000000..55fae70
--- /dev/null
+++ b/include/trace/hooks/remoteproc.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM remoteproc
+
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_RPROC_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_RPROC_H
+
+#include <trace/hooks/vendor_hooks.h>
+
+struct rproc;
+
+/* When recovery succeeds */
+DECLARE_HOOK(android_vh_rproc_recovery,
+	TP_PROTO(struct rproc *rproc),
+	TP_ARGS(rproc));
+
+/* When recovery mode is enabled or disabled by sysfs */
+DECLARE_HOOK(android_vh_rproc_recovery_set,
+	TP_PROTO(struct rproc *rproc),
+	TP_ARGS(rproc));
+
+#endif /* _TRACE_HOOK_RPROC_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/rwsem.h b/include/trace/hooks/rwsem.h
new file mode 100644
index 0000000..71215df
--- /dev/null
+++ b/include/trace/hooks/rwsem.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM rwsem
+#define TRACE_INCLUDE_PATH trace/hooks
+#if !defined(_TRACE_HOOK_RWSEM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_RWSEM_H
+#include <trace/hooks/vendor_hooks.h>
+/*
+ * Following tracepoints are not exported in tracefs and provide a
+ * mechanism for vendor modules to hook and extend functionality
+ */
+struct rw_semaphore;
+struct rwsem_waiter;
+DECLARE_HOOK(android_vh_rwsem_init,
+	TP_PROTO(struct rw_semaphore *sem),
+	TP_ARGS(sem));
+DECLARE_HOOK(android_vh_rwsem_wake,
+	TP_PROTO(struct rw_semaphore *sem),
+	TP_ARGS(sem));
+DECLARE_HOOK(android_vh_rwsem_write_finished,
+	TP_PROTO(struct rw_semaphore *sem),
+	TP_ARGS(sem));
+DECLARE_HOOK(android_vh_alter_rwsem_list_add,
+	TP_PROTO(struct rwsem_waiter *waiter,
+		 struct rw_semaphore *sem,
+		 bool *already_on_list),
+	TP_ARGS(waiter, sem, already_on_list));
+#endif /* _TRACE_HOOK_RWSEM_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/sched.h b/include/trace/hooks/sched.h
new file mode 100644
index 0000000..e978b679
--- /dev/null
+++ b/include/trace/hooks/sched.h
@@ -0,0 +1,340 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM sched
+#define TRACE_INCLUDE_PATH trace/hooks
+#if !defined(_TRACE_HOOK_SCHED_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_SCHED_H
+#include <trace/hooks/vendor_hooks.h>
+/*
+ * Following tracepoints are not exported in tracefs and provide a
+ * mechanism for vendor modules to hook and extend functionality
+ */
+struct task_struct;
+DECLARE_RESTRICTED_HOOK(android_rvh_select_task_rq_fair,
+	TP_PROTO(struct task_struct *p, int prev_cpu, int sd_flag, int wake_flags, int *new_cpu),
+	TP_ARGS(p, prev_cpu, sd_flag, wake_flags, new_cpu), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_select_task_rq_rt,
+	TP_PROTO(struct task_struct *p, int prev_cpu, int sd_flag, int wake_flags, int *new_cpu),
+	TP_ARGS(p, prev_cpu, sd_flag, wake_flags, new_cpu), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_select_fallback_rq,
+	TP_PROTO(int cpu, struct task_struct *p, int *new_cpu),
+	TP_ARGS(cpu, p, new_cpu), 1);
+
+struct rq;
+DECLARE_HOOK(android_vh_scheduler_tick,
+	TP_PROTO(struct rq *rq),
+	TP_ARGS(rq));
+
+DECLARE_RESTRICTED_HOOK(android_rvh_enqueue_task,
+	TP_PROTO(struct rq *rq, struct task_struct *p, int flags),
+	TP_ARGS(rq, p, flags), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_dequeue_task,
+	TP_PROTO(struct rq *rq, struct task_struct *p, int flags),
+	TP_ARGS(rq, p, flags), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_can_migrate_task,
+	TP_PROTO(struct task_struct *p, int dst_cpu, int *can_migrate),
+	TP_ARGS(p, dst_cpu, can_migrate), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_find_lowest_rq,
+	TP_PROTO(struct task_struct *p, struct cpumask *local_cpu_mask,
+			int ret, int *lowest_cpu),
+	TP_ARGS(p, local_cpu_mask, ret, lowest_cpu), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_prepare_prio_fork,
+	TP_PROTO(struct task_struct *p),
+	TP_ARGS(p), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_finish_prio_fork,
+	TP_PROTO(struct task_struct *p),
+	TP_ARGS(p), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_rtmutex_prepare_setprio,
+	TP_PROTO(struct task_struct *p, struct task_struct *pi_task),
+	TP_ARGS(p, pi_task), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_rto_next_cpu,
+	TP_PROTO(int rto_cpu, struct cpumask *rto_mask, int *cpu),
+	TP_ARGS(rto_cpu, rto_mask, cpu), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_is_cpu_allowed,
+	TP_PROTO(struct task_struct *p, int cpu, bool *allowed),
+	TP_ARGS(p, cpu, allowed), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_get_nohz_timer_target,
+	TP_PROTO(int *cpu, bool *done),
+	TP_ARGS(cpu, done), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_set_user_nice,
+	TP_PROTO(struct task_struct *p, long *nice, bool *allowed),
+	TP_ARGS(p, nice, allowed), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_setscheduler,
+	TP_PROTO(struct task_struct *p),
+	TP_ARGS(p), 1);
+
+struct sched_group;
+DECLARE_RESTRICTED_HOOK(android_rvh_find_busiest_group,
+	TP_PROTO(struct sched_group *busiest, struct rq *dst_rq, int *out_balance),
+		TP_ARGS(busiest, dst_rq, out_balance), 1);
+
+DECLARE_HOOK(android_vh_dump_throttled_rt_tasks,
+	TP_PROTO(int cpu, u64 clock, ktime_t rt_period, u64 rt_runtime,
+			s64 rt_period_timer_expires),
+	TP_ARGS(cpu, clock, rt_period, rt_runtime, rt_period_timer_expires));
+
+DECLARE_HOOK(android_vh_jiffies_update,
+	TP_PROTO(void *unused),
+	TP_ARGS(unused));
+
+struct rq_flags;
+DECLARE_RESTRICTED_HOOK(android_rvh_sched_newidle_balance,
+	TP_PROTO(struct rq *this_rq, struct rq_flags *rf,
+		 int *pulled_task, int *done),
+	TP_ARGS(this_rq, rf, pulled_task, done), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_sched_nohz_balancer_kick,
+	TP_PROTO(struct rq *rq, unsigned int *flags, int *done),
+	TP_ARGS(rq, flags, done), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_sched_rebalance_domains,
+	TP_PROTO(struct rq *rq, int *continue_balancing),
+	TP_ARGS(rq, continue_balancing), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_find_busiest_queue,
+	TP_PROTO(int dst_cpu, struct sched_group *group,
+		 struct cpumask *env_cpus, struct rq **busiest,
+		 int *done),
+	TP_ARGS(dst_cpu, group, env_cpus, busiest, done), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_migrate_queued_task,
+	TP_PROTO(struct rq *rq, struct rq_flags *rf,
+		 struct task_struct *p, int new_cpu,
+		 int *detached),
+	TP_ARGS(rq, rf, p, new_cpu, detached), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_cpu_overutilized,
+	TP_PROTO(int cpu, int *overutilized),
+	TP_ARGS(cpu, overutilized), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_sched_setaffinity,
+	TP_PROTO(struct task_struct *p, const struct cpumask *in_mask, int *retval),
+	TP_ARGS(p, in_mask, retval), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_sched_getaffinity,
+	TP_PROTO(struct task_struct *p, struct cpumask *in_mask),
+	TP_ARGS(p, in_mask), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_set_task_cpu,
+	TP_PROTO(struct task_struct *p, unsigned int new_cpu),
+	TP_ARGS(p, new_cpu), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_try_to_wake_up,
+	TP_PROTO(struct task_struct *p),
+	TP_ARGS(p), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_try_to_wake_up_success,
+	TP_PROTO(struct task_struct *p),
+	TP_ARGS(p), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_sched_fork,
+	TP_PROTO(struct task_struct *p),
+	TP_ARGS(p), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_wake_up_new_task,
+	TP_PROTO(struct task_struct *p),
+	TP_ARGS(p), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_new_task_stats,
+	TP_PROTO(struct task_struct *p),
+	TP_ARGS(p), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_flush_task,
+	TP_PROTO(struct task_struct *prev),
+	TP_ARGS(prev), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_tick_entry,
+	TP_PROTO(struct rq *rq),
+	TP_ARGS(rq), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_schedule,
+	TP_PROTO(struct task_struct *prev, struct task_struct *next, struct rq *rq),
+	TP_ARGS(prev, next, rq), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_sched_cpu_starting,
+	TP_PROTO(int cpu),
+	TP_ARGS(cpu), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_sched_cpu_dying,
+	TP_PROTO(int cpu),
+	TP_ARGS(cpu), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_account_irq,
+	TP_PROTO(struct task_struct *curr, int cpu, s64 delta, bool start),
+	TP_ARGS(curr, cpu, delta, start), 1);
+
+struct sched_entity;
+DECLARE_RESTRICTED_HOOK(android_rvh_place_entity,
+	TP_PROTO(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial, u64 *vruntime),
+	TP_ARGS(cfs_rq, se, initial, vruntime), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_build_perf_domains,
+	TP_PROTO(bool *eas_check),
+	TP_ARGS(eas_check), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_update_cpu_capacity,
+	TP_PROTO(int cpu, unsigned long *capacity),
+	TP_ARGS(cpu, capacity), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_update_misfit_status,
+	TP_PROTO(struct task_struct *p, struct rq *rq, bool *need_update),
+	TP_ARGS(p, rq, need_update), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_update_cpus_allowed,
+	TP_PROTO(struct task_struct *p, cpumask_var_t cpus_requested,
+		 const struct cpumask *new_mask, int *ret),
+	TP_ARGS(p, cpus_requested, new_mask, ret), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_sched_fork_init,
+	TP_PROTO(struct task_struct *p),
+	TP_ARGS(p), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_ttwu_cond,
+	TP_PROTO(int cpu, bool *cond),
+	TP_ARGS(cpu, cond), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_schedule_bug,
+	TP_PROTO(void *unused),
+	TP_ARGS(unused), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_sched_exec,
+	TP_PROTO(bool *cond),
+	TP_ARGS(cond), 1);
+
+DECLARE_HOOK(android_vh_build_sched_domains,
+	TP_PROTO(bool has_asym),
+	TP_ARGS(has_asym));
+
+DECLARE_RESTRICTED_HOOK(android_rvh_check_preempt_tick,
+	TP_PROTO(struct task_struct *p, unsigned long *ideal_runtime, bool *skip_preempt,
+			unsigned long delta_exec, struct cfs_rq *cfs_rq, struct sched_entity *curr,
+			unsigned int granularity),
+	TP_ARGS(p, ideal_runtime, skip_preempt, delta_exec, cfs_rq, curr, granularity), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_check_preempt_wakeup_ignore,
+	TP_PROTO(struct task_struct *p, bool *ignore),
+	TP_ARGS(p, ignore), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_replace_next_task_fair,
+	TP_PROTO(struct rq *rq, struct task_struct **p, struct sched_entity **se, bool *repick,
+			bool simple, struct task_struct *prev),
+	TP_ARGS(rq, p, se, repick, simple, prev), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_sched_balance_rt,
+	TP_PROTO(struct rq *rq, struct task_struct *p, int *done),
+	TP_ARGS(rq, p, done), 1);
+
+struct cfs_rq;
+DECLARE_RESTRICTED_HOOK(android_rvh_pick_next_entity,
+	TP_PROTO(struct cfs_rq *cfs_rq, struct sched_entity *curr,
+		 struct sched_entity **se),
+	TP_ARGS(cfs_rq, curr, se), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_check_preempt_wakeup,
+	TP_PROTO(struct rq *rq, struct task_struct *p, bool *preempt, bool *nopreempt,
+			int wake_flags, struct sched_entity *se, struct sched_entity *pse,
+			int next_buddy_marked, unsigned int granularity),
+	TP_ARGS(rq, p, preempt, nopreempt, wake_flags, se, pse, next_buddy_marked,
+			granularity), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_set_cpus_allowed_by_task,
+	TP_PROTO(const struct cpumask *cpu_valid_mask, const struct cpumask *new_mask,
+		 struct task_struct *p, unsigned int *dest_cpu),
+	TP_ARGS(cpu_valid_mask, new_mask, p, dest_cpu), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_do_sched_yield,
+	TP_PROTO(struct rq *rq),
+	TP_ARGS(rq), 1);
+
+DECLARE_HOOK(android_vh_free_task,
+	TP_PROTO(struct task_struct *p),
+	TP_ARGS(p));
+
+enum uclamp_id;
+struct uclamp_se;
+DECLARE_RESTRICTED_HOOK(android_rvh_uclamp_eff_get,
+	TP_PROTO(struct task_struct *p, enum uclamp_id clamp_id,
+		 struct uclamp_se *uclamp_max, struct uclamp_se *uclamp_eff, int *ret),
+	TP_ARGS(p, clamp_id, uclamp_max, uclamp_eff, ret), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_after_enqueue_task,
+	TP_PROTO(struct rq *rq, struct task_struct *p, int flags),
+	TP_ARGS(rq, p, flags), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_after_dequeue_task,
+	TP_PROTO(struct rq *rq, struct task_struct *p, int flags),
+	TP_ARGS(rq, p, flags), 1);
+
+struct cfs_rq;
+struct sched_entity;
+struct rq_flags;
+DECLARE_RESTRICTED_HOOK(android_rvh_enqueue_entity,
+	TP_PROTO(struct cfs_rq *cfs, struct sched_entity *se),
+	TP_ARGS(cfs, se), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_dequeue_entity,
+	TP_PROTO(struct cfs_rq *cfs, struct sched_entity *se),
+	TP_ARGS(cfs, se), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_entity_tick,
+	TP_PROTO(struct cfs_rq *cfs_rq, struct sched_entity *se),
+	TP_ARGS(cfs_rq, se), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_enqueue_task_fair,
+	TP_PROTO(struct rq *rq, struct task_struct *p, int flags),
+	TP_ARGS(rq, p, flags), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_dequeue_task_fair,
+	TP_PROTO(struct rq *rq, struct task_struct *p, int flags),
+	TP_ARGS(rq, p, flags), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_util_est_update,
+	TP_PROTO(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep, int *ret),
+	TP_ARGS(cfs_rq, p, task_sleep, ret), 1);
+
+DECLARE_HOOK(android_vh_setscheduler_uclamp,
+	TP_PROTO(struct task_struct *tsk, int clamp_id, unsigned int value),
+	TP_ARGS(tsk, clamp_id, value));
+
+DECLARE_HOOK(android_vh_update_topology_flags_workfn,
+	TP_PROTO(void *unused),
+	TP_ARGS(unused));
+
+DECLARE_RESTRICTED_HOOK(android_rvh_update_thermal_stats,
+		TP_PROTO(int cpu),
+		TP_ARGS(cpu), 1);
+
+DECLARE_HOOK(android_vh_do_wake_up_sync,
+	TP_PROTO(struct wait_queue_head *wq_head, int *done, struct sock *sk),
+	TP_ARGS(wq_head, done, sk));
+
+DECLARE_HOOK(android_vh_set_wake_flags,
+	TP_PROTO(int *wake_flags, unsigned int *mode),
+	TP_ARGS(wake_flags, mode));
+
+DECLARE_RESTRICTED_HOOK(android_rvh_find_new_ilb,
+	TP_PROTO(struct cpumask *nohz_idle_cpus_mask, int *ilb),
+	TP_ARGS(nohz_idle_cpus_mask, ilb), 1);
+
+DECLARE_RESTRICTED_HOOK(android_rvh_find_energy_efficient_cpu,
+	TP_PROTO(struct task_struct *p, int prev_cpu, int sync, int *new_cpu),
+	TP_ARGS(p, prev_cpu, sync, new_cpu), 1);
+
+/* macro versions of hooks are no longer required */
+
+#endif /* _TRACE_HOOK_SCHED_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
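
Restricted hooks (the android_rvh_* variants) are registered once and never detached, which suits the scheduler paths above where unregistration would be unsafe; the machinery is in vendor_hooks.h later in this patch. A minimal registration sketch for one scheduler hook, intended for an early, never-unloaded vendor module (module and probe names are illustrative):

#include <linux/module.h>
#include <trace/hooks/sched.h>

static void vendor_select_rq(void *data, struct task_struct *p, int prev_cpu,
			     int sd_flag, int wake_flags, int *new_cpu)
{
	/*
	 * Writing a valid cpu id into *new_cpu overrides the kernel's pick;
	 * ACK call sites typically treat a negative value as "no override".
	 */
}

static int __init vendor_sched_init(void)
{
	return register_trace_android_rvh_select_task_rq_fair(vendor_select_rq,
							      NULL);
}
module_init(vendor_sched_init);
MODULE_LICENSE("GPL");
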
diff --git a/include/trace/hooks/selinux.h b/include/trace/hooks/selinux.h
new file mode 100644
index 0000000..0b65631b
--- /dev/null
+++ b/include/trace/hooks/selinux.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM selinux
+
+#define TRACE_INCLUDE_PATH trace/hooks
+#if !defined(_TRACE_HOOK_SELINUX_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_SELINUX_H
+#include <trace/hooks/vendor_hooks.h>
+/*
+ * Following tracepoints are not exported in tracefs and provide a
+ * mechanism for vendor modules to hook and extend functionality
+ */
+struct selinux_state;
+DECLARE_RESTRICTED_HOOK(android_rvh_selinux_is_initialized,
+	TP_PROTO(const struct selinux_state *state),
+	TP_ARGS(state), 1);
+
+#endif /* _TRACE_HOOK_SELINUX_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/sys.h b/include/trace/hooks/sys.h
new file mode 100644
index 0000000..e2d5d6d
--- /dev/null
+++ b/include/trace/hooks/sys.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM sys
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH trace/hooks
+#if !defined(_TRACE_HOOK_SYS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_SYS_H
+#include <trace/hooks/vendor_hooks.h>
+
+struct task_struct;
+DECLARE_HOOK(android_vh_syscall_prctl_finished,
+	TP_PROTO(int option, struct task_struct *task),
+	TP_ARGS(option, task));
+#endif
+
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/syscall_check.h b/include/trace/hooks/syscall_check.h
new file mode 100644
index 0000000..56d8267
--- /dev/null
+++ b/include/trace/hooks/syscall_check.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM syscall_check
+
+#define TRACE_INCLUDE_PATH trace/hooks
+#if !defined(_TRACE_HOOK_SYSCALL_CHECK_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_SYSCALL_CHECK_H
+#include <trace/hooks/vendor_hooks.h>
+/*
+ * Following tracepoints are not exported in tracefs and provide a
+ * mechanism for vendor modules to hook and extend functionality
+ */
+struct file;
+union bpf_attr;
+DECLARE_HOOK(android_vh_check_mmap_file,
+	TP_PROTO(const struct file *file, unsigned long prot,
+		unsigned long flag, unsigned long ret),
+	TP_ARGS(file, prot, flag, ret));
+
+DECLARE_HOOK(android_vh_check_file_open,
+	TP_PROTO(const struct file *file),
+	TP_ARGS(file));
+
+DECLARE_HOOK(android_vh_check_bpf_syscall,
+	TP_PROTO(int cmd, const union bpf_attr *attr, unsigned int size),
+	TP_ARGS(cmd, attr, size));
+
+#endif /* _TRACE_HOOK_SYSCALL_CHECK_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
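
These check_* hooks hand security-relevant syscall arguments to a vendor module read-only. A sketch of a probe on android_vh_check_file_open, assuming the hook is exported to modules as ACK does in drivers/android/vendor_hooks.c (module, probe name, and the logging policy are illustrative):

#include <linux/module.h>
#include <linux/fs.h>
#include <trace/hooks/syscall_check.h>

static void probe_check_file_open(void *data, const struct file *file)
{
	/* illustrative policy: log opens of setuid binaries */
	if (file->f_inode && (file->f_inode->i_mode & S_ISUID))
		pr_info("setuid open: %pD\n", file);
}

static int __init file_check_init(void)
{
	return register_trace_android_vh_check_file_open(probe_check_file_open,
							 NULL);
}
module_init(file_check_init);
MODULE_LICENSE("GPL");
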
diff --git a/include/trace/hooks/sysrqcrash.h b/include/trace/hooks/sysrqcrash.h
new file mode 100644
index 0000000..92e7bc7
--- /dev/null
+++ b/include/trace/hooks/sysrqcrash.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM sysrqcrash
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_SYSRQCRASH_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_SYSRQCRASH_H
+#include <trace/hooks/vendor_hooks.h>
+/*
+ * Following tracepoints are not exported in tracefs and provide a
+ * mechanism for vendor modules to hook and extend functionality
+ */
+DECLARE_HOOK(android_vh_sysrq_crash,
+	TP_PROTO(void *data),
+	TP_ARGS(data));
+
+#endif /* _TRACE_HOOK_SYSRQCRASH_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/timer.h b/include/trace/hooks/timer.h
new file mode 100644
index 0000000..67ef865
--- /dev/null
+++ b/include/trace/hooks/timer.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM timer
+
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_TIMER_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_TIMER_H
+
+#include <trace/hooks/vendor_hooks.h>
+
+DECLARE_HOOK(android_vh_timer_calc_index,
+	TP_PROTO(unsigned int lvl, unsigned long *expires),
+	TP_ARGS(lvl, expires));
+
+#endif /* _TRACE_HOOK_TIMER_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/topology.h b/include/trace/hooks/topology.h
new file mode 100644
index 0000000..d2673d4
--- /dev/null
+++ b/include/trace/hooks/topology.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM topology
+
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_TOPOLOGY_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_TOPOLOGY_H
+
+#include <trace/hooks/vendor_hooks.h>
+
+struct cpumask;
+
+#if defined(CONFIG_TRACEPOINTS) && defined(CONFIG_ANDROID_VENDOR_HOOKS)
+
+DECLARE_HOOK(android_vh_arch_set_freq_scale,
+	TP_PROTO(const struct cpumask *cpus, unsigned long freq, unsigned long max,
+		unsigned long *scale),
+	TP_ARGS(cpus, freq, max, scale));
+
+#else
+
+#define trace_android_vh_arch_set_freq_scale(cpus, freq, max, scale)
+
+#endif
+
+#endif /* _TRACE_HOOK_TOPOLOGY_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/ufshcd.h b/include/trace/hooks/ufshcd.h
new file mode 100644
index 0000000..7f5c545
--- /dev/null
+++ b/include/trace/hooks/ufshcd.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ufshcd
+#define TRACE_INCLUDE_PATH trace/hooks
+#if !defined(_TRACE_HOOK_UFSHCD_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_UFSHCD_H
+#include <trace/hooks/vendor_hooks.h>
+/*
+ * Following tracepoints are not exported in tracefs and provide a
+ * mechanism for vendor modules to hook and extend functionality
+ */
+struct ufs_hba;
+struct request;
+struct ufshcd_lrb;
+
+DECLARE_HOOK(android_vh_ufs_fill_prdt,
+	TP_PROTO(struct ufs_hba *hba, struct ufshcd_lrb *lrbp,
+		 unsigned int segments, int *err),
+	TP_ARGS(hba, lrbp, segments, err));
+
+DECLARE_HOOK(android_vh_ufs_prepare_command,
+	TP_PROTO(struct ufs_hba *hba, struct request *rq,
+		 struct ufshcd_lrb *lrbp, int *err),
+	TP_ARGS(hba, rq, lrbp, err));
+
+DECLARE_HOOK(android_vh_ufs_update_sysfs,
+	TP_PROTO(struct ufs_hba *hba),
+	TP_ARGS(hba));
+
+DECLARE_HOOK(android_vh_ufs_send_command,
+	TP_PROTO(struct ufs_hba *hba, struct ufshcd_lrb *lrbp),
+	TP_ARGS(hba, lrbp));
+
+DECLARE_HOOK(android_vh_ufs_compl_command,
+	TP_PROTO(struct ufs_hba *hba, struct ufshcd_lrb *lrbp),
+	TP_ARGS(hba, lrbp));
+
+struct uic_command;
+DECLARE_HOOK(android_vh_ufs_send_uic_command,
+	TP_PROTO(struct ufs_hba *hba, const struct uic_command *ucmd,
+		 int str_t),
+	TP_ARGS(hba, ucmd, str_t));
+
+DECLARE_HOOK(android_vh_ufs_send_tm_command,
+	TP_PROTO(struct ufs_hba *hba, int tag, int str_t),
+	TP_ARGS(hba, tag, str_t));
+
+DECLARE_HOOK(android_vh_ufs_check_int_errors,
+	TP_PROTO(struct ufs_hba *hba, bool queue_eh_work),
+	TP_ARGS(hba, queue_eh_work));
+
+#endif /* _TRACE_HOOK_UFSHCD_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/vendor_hooks.h b/include/trace/hooks/vendor_hooks.h
new file mode 100644
index 0000000..48ee67a
--- /dev/null
+++ b/include/trace/hooks/vendor_hooks.h
@@ -0,0 +1,122 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Note: we intentionally omit include file ifdef protection
+ *  This is due to the way trace events work. If a file includes two
+ *  trace event headers under one "CREATE_TRACE_POINTS" the first include
+ *  will override the DECLARE_RESTRICTED_HOOK and break the second include.
+ */
+
+#ifndef __GENKSYMS__
+#include <linux/tracepoint.h>
+#endif
+
+#if defined(CONFIG_TRACEPOINTS) && defined(CONFIG_ANDROID_VENDOR_HOOKS)
+
+#define DECLARE_HOOK DECLARE_TRACE
+
+int android_rvh_probe_register(struct tracepoint *tp, void *probe, void *data);
+
+#ifdef TRACE_HEADER_MULTI_READ
+
+#define DEFINE_HOOK_FN(_name, _reg, _unreg, proto, args)		\
+	static const char __tpstrtab_##_name[]				\
+	__section("__tracepoints_strings") = #_name;			\
+	extern struct static_call_key STATIC_CALL_KEY(tp_func_##_name);	\
+	int __traceiter_##_name(void *__data, proto);			\
+	struct tracepoint __tracepoint_##_name	__used			\
+	__section("__tracepoints") = {					\
+		.name = __tpstrtab_##_name,				\
+		.key = STATIC_KEY_INIT_FALSE,				\
+		.static_call_key = &STATIC_CALL_KEY(tp_func_##_name),	\
+		.static_call_tramp = STATIC_CALL_TRAMP_ADDR(tp_func_##_name), \
+		.iterator = &__traceiter_##_name,			\
+		.regfunc = _reg,					\
+		.unregfunc = _unreg,					\
+		.funcs = NULL };					\
+	__TRACEPOINT_ENTRY(_name);					\
+	int __traceiter_##_name(void *__data, proto)			\
+	{								\
+		struct tracepoint_func *it_func_ptr;			\
+		void *it_func;						\
+									\
+		it_func_ptr = (&__tracepoint_##_name)->funcs;		\
+		it_func = (it_func_ptr)->func;				\
+		do {							\
+			__data = (it_func_ptr)->data;			\
+			((void(*)(void *, proto))(it_func))(__data, args); \
+			it_func = READ_ONCE((++it_func_ptr)->func);	\
+		} while (it_func);					\
+		return 0;						\
+	}								\
+	DEFINE_STATIC_CALL(tp_func_##_name, __traceiter_##_name);
+
+#undef DECLARE_RESTRICTED_HOOK
+#define DECLARE_RESTRICTED_HOOK(name, proto, args, cond) \
+	DEFINE_HOOK_FN(name, NULL, NULL, PARAMS(proto), PARAMS(args))
+
+/* prevent additional recursion */
+#undef TRACE_HEADER_MULTI_READ
+#else /* TRACE_HEADER_MULTI_READ */
+
+#ifdef CONFIG_HAVE_STATIC_CALL
+#define __DO_RESTRICTED_HOOK_CALL(name, args)					\
+	do {								\
+		struct tracepoint_func *it_func_ptr;			\
+		void *__data;						\
+		it_func_ptr = (&__tracepoint_##name)->funcs;		\
+		if (it_func_ptr) {					\
+			__data = (it_func_ptr)->data;			\
+			static_call(tp_func_##name)(__data, args);	\
+		}							\
+	} while (0)
+#else
+#define __DO_RESTRICTED_HOOK_CALL(name, args)	__traceiter_##name(NULL, args)
+#endif
+
+#define DO_RESTRICTED_HOOK(name, args, cond)					\
+	do {								\
+		if (!(cond))						\
+			return;						\
+									\
+		__DO_RESTRICTED_HOOK_CALL(name, TP_ARGS(args));		\
+	} while (0)
+
+#define __DECLARE_RESTRICTED_HOOK(name, proto, args, cond, data_proto)	\
+	extern int __traceiter_##name(data_proto);			\
+	DECLARE_STATIC_CALL(tp_func_##name, __traceiter_##name);	\
+	extern struct tracepoint __tracepoint_##name;			\
+	static inline void trace_##name(proto)				\
+	{								\
+		if (static_key_false(&__tracepoint_##name.key))		\
+			DO_RESTRICTED_HOOK(name,			\
+					   TP_ARGS(args),		\
+					   TP_CONDITION(cond));		\
+	}								\
+	static inline bool						\
+	trace_##name##_enabled(void)					\
+	{								\
+		return static_key_false(&__tracepoint_##name.key);	\
+	}								\
+	static inline int						\
+	register_trace_##name(void (*probe)(data_proto), void *data) 	\
+	{								\
+		return android_rvh_probe_register(&__tracepoint_##name,	\
+						  (void *)probe, data);	\
+	}								\
+	/* vendor hooks cannot be unregistered */			\
+
+#undef DECLARE_RESTRICTED_HOOK
+#define DECLARE_RESTRICTED_HOOK(name, proto, args, cond)		\
+	__DECLARE_RESTRICTED_HOOK(name, PARAMS(proto), PARAMS(args),	\
+			cond,						\
+			PARAMS(void *__data, proto))
+
+#endif /* TRACE_HEADER_MULTI_READ */
+
+#else /* !CONFIG_TRACEPOINTS || !CONFIG_ANDROID_VENDOR_HOOKS */
+/* suppress trace hooks */
+#define DECLARE_HOOK DECLARE_EVENT_NOP
+#define DECLARE_RESTRICTED_HOOK(name, proto, args, cond)		\
+	DECLARE_EVENT_NOP(name, PARAMS(proto), PARAMS(args))
+#endif
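
In the non-restricted case DECLARE_HOOK is literally DECLARE_TRACE, so plain vendor hooks keep the stock tracepoint API, including unregistration — unlike the restricted variant above, whose generated register_trace_* has no unregister counterpart. A sketch using one of the hooks from this patch (assumes the hook is exported for modules, as ACK does in drivers/android/vendor_hooks.c; names are illustrative):

#include <linux/module.h>
#include <trace/hooks/sysrqcrash.h>

static void probe_sysrq_crash(void *data, void *pdata)
{
	pr_emerg("vendor: sysrq crash requested\n");
}

static int __init demo_init(void)
{
	return register_trace_android_vh_sysrq_crash(probe_sysrq_crash, NULL);
}

static void __exit demo_exit(void)
{
	/* plain vendor hooks, unlike restricted ones, can be detached */
	unregister_trace_android_vh_sysrq_crash(probe_sysrq_crash, NULL);
}
module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
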
diff --git a/include/trace/hooks/vmscan.h b/include/trace/hooks/vmscan.h
new file mode 100644
index 0000000..83116f7
--- /dev/null
+++ b/include/trace/hooks/vmscan.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM vmscan
+
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_VMSCAN_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_VMSCAN_H
+
+#include <trace/hooks/vendor_hooks.h>
+
+DECLARE_RESTRICTED_HOOK(android_rvh_set_balance_anon_file_reclaim,
+			TP_PROTO(bool *balance_anon_file_reclaim),
+			TP_ARGS(balance_anon_file_reclaim), 1);
+#endif /* _TRACE_HOOK_VMSCAN_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/hooks/wqlockup.h b/include/trace/hooks/wqlockup.h
new file mode 100644
index 0000000..92e01ab
--- /dev/null
+++ b/include/trace/hooks/wqlockup.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM wqlockup
+#define TRACE_INCLUDE_PATH trace/hooks
+
+#if !defined(_TRACE_HOOK_WQLOCKUP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOOK_WQLOCKUP_H
+#include <trace/hooks/vendor_hooks.h>
+/*
+ * Following tracepoints are not exported in tracefs and provide a
+ * mechanism for vendor modules to hook and extend functionality
+ */
+DECLARE_HOOK(android_vh_wq_lockup_pool,
+	TP_PROTO(int cpu, unsigned long pool_ts),
+	TP_ARGS(cpu, pool_ts));
+
+#endif /* _TRACE_HOOK_WQLOCKUP_H */
+/* This part must be outside protection */
+#include <trace/define_trace.h>
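Non-restricted hooks follow the same pattern; a sketch assuming the register_trace_android_vh_* helper that DECLARE_HOOK generates (probe name illustrative):

	/* Illustrative probe: report which worker pool stalled and since when. */
	static void my_wq_lockup_probe(void *data, int cpu, unsigned long pool_ts)
	{
		pr_warn("workqueue pool on CPU %d stalled since %lu\n", cpu, pool_ts);
	}

	/* in module init: */
	register_trace_android_vh_wq_lockup_pool(my_wq_lockup_probe, NULL);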
diff --git a/include/uapi/linux/OWNERS b/include/uapi/linux/OWNERS
new file mode 100644
index 0000000..8aed640
--- /dev/null
+++ b/include/uapi/linux/OWNERS
@@ -0,0 +1,3 @@
+per-file f2fs**=file:/fs/f2fs/OWNERS
+per-file fuse**=file:/fs/fuse/OWNERS
+per-file net**=file:/net/OWNERS
diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
index e72e4de..7fa36a8 100644
--- a/include/uapi/linux/android/binder.h
+++ b/include/uapi/linux/android/binder.h
@@ -38,11 +38,59 @@ enum {
 	BINDER_TYPE_PTR		= B_PACK_CHARS('p', 't', '*', B_TYPE_LARGE),
 };
 
-enum {
+/**
+ * enum flat_binder_object_shifts - shift values for flat_binder_object_flags
+ * @FLAT_BINDER_FLAG_SCHED_POLICY_SHIFT: shift for getting scheduler policy.
+ */
+enum flat_binder_object_shifts {
+	FLAT_BINDER_FLAG_SCHED_POLICY_SHIFT = 9,
+};
+
+/**
+ * enum flat_binder_object_flags - flags for use in flat_binder_object.flags
+ */
+enum flat_binder_object_flags {
+	/**
+	 * @FLAT_BINDER_FLAG_PRIORITY_MASK: bit-mask for min scheduler priority
+	 *
+	 * These bits can be used to set the minimum scheduler priority
+	 * at which transactions into this node should run. Valid values
+	 * in these bits depend on the scheduler policy encoded in
+	 * @FLAT_BINDER_FLAG_SCHED_POLICY_MASK.
+	 *
+	 * For SCHED_NORMAL/SCHED_BATCH, the valid range is between [-20..19]
+	 * For SCHED_FIFO/SCHED_RR, the value can run between [1..99]
+	 */
 	FLAT_BINDER_FLAG_PRIORITY_MASK = 0xff,
+	/**
+	 * @FLAT_BINDER_FLAG_ACCEPTS_FDS: whether the node accepts fds.
+	 */
 	FLAT_BINDER_FLAG_ACCEPTS_FDS = 0x100,
 
 	/**
+	 * @FLAT_BINDER_FLAG_SCHED_POLICY_MASK: bit-mask for scheduling policy
+	 *
+	 * These two bits can be used to set the min scheduling policy at which
+	 * transactions on this node should run. These match the UAPI
+	 * scheduler policy values, eg:
+	 * 00b: SCHED_NORMAL
+	 * 01b: SCHED_FIFO
+	 * 10b: SCHED_RR
+	 * 11b: SCHED_BATCH
+	 */
+	FLAT_BINDER_FLAG_SCHED_POLICY_MASK =
+		3U << FLAT_BINDER_FLAG_SCHED_POLICY_SHIFT,
+
+	/**
+	 * @FLAT_BINDER_FLAG_INHERIT_RT: whether the node inherits RT policy
+	 *
+	 * Only when set, calls into this node will inherit a real-time
+	 * scheduling policy from the caller (for synchronous transactions).
+	 */
+	FLAT_BINDER_FLAG_INHERIT_RT = 0x800,
+
+	/**
 	 * @FLAT_BINDER_FLAG_TXN_SECURITY_CTX: request security contexts
 	 *
 	 * Only when set, causes senders to include their security
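Taken together, a userspace server could compose the flags word for a node as follows; a sketch only, with illustrative values (SCHED_FIFO is 01b in the UAPI encoding above):

	__u32 flags = 0;

	flags |= 1U << FLAT_BINDER_FLAG_SCHED_POLICY_SHIFT;	/* min policy: SCHED_FIFO */
	flags |= 10 & FLAT_BINDER_FLAG_PRIORITY_MASK;		/* min RT priority 10 */
	flags |= FLAT_BINDER_FLAG_INHERIT_RT;	/* sync callers donate RT policy */
	flags |= FLAT_BINDER_FLAG_ACCEPTS_FDS;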
diff --git a/include/uapi/linux/android_fuse.h b/include/uapi/linux/android_fuse.h
new file mode 100644
index 0000000..221e30e
--- /dev/null
+++ b/include/uapi/linux/android_fuse.h
@@ -0,0 +1,97 @@
+/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause WITH Linux-syscall-note */
+/* Copyright (c) 2022 Google LLC */
+
+#ifndef _LINUX_ANDROID_FUSE_H
+#define _LINUX_ANDROID_FUSE_H
+
+#ifdef __KERNEL__
+#include <linux/types.h>
+#else
+#include <stdint.h>
+#endif
+
+#define FUSE_ACTION_KEEP	0
+#define FUSE_ACTION_REMOVE	1
+#define FUSE_ACTION_REPLACE	2
+
+struct fuse_entry_bpf_out {
+	uint64_t	backing_action;
+	uint64_t	backing_fd;
+	uint64_t	bpf_action;
+	uint64_t	bpf_fd;
+};
+
+struct fuse_entry_bpf {
+	struct fuse_entry_bpf_out out;
+	struct file *backing_file;
+	struct file *bpf_file;
+};
+
+struct fuse_read_out {
+	uint64_t	offset;
+	uint32_t	again;
+	uint32_t	padding;
+};
+
+struct fuse_in_postfilter_header {
+	uint32_t	len;
+	uint32_t	opcode;
+	uint64_t	unique;
+	uint64_t	nodeid;
+	uint32_t	uid;
+	uint32_t	gid;
+	uint32_t	pid;
+	uint32_t	error_in;
+};
+
+/*
+ * Fuse BPF Args
+ *
+ * Used to communicate with bpf programs to allow checking or altering certain values.
+ * The end_offset pointer lets the bpf verifier check buffer boundaries statically;
+ * it marks the end of the buffer, while size gives the length actually used.
+ */
+
+/** One input argument of a request */
+struct fuse_bpf_in_arg {
+	uint32_t size;
+	const void *value;
+	const void *end_offset;
+};
+
+/** One output argument of a request */
+struct fuse_bpf_arg {
+	uint32_t size;
+	void *value;
+	void *end_offset;
+};
+
+#define FUSE_MAX_IN_ARGS 5
+#define FUSE_MAX_OUT_ARGS 3
+
+#define FUSE_BPF_FORCE (1 << 0)
+#define FUSE_BPF_OUT_ARGVAR (1 << 6)
+
+struct fuse_bpf_args {
+	uint64_t nodeid;
+	uint32_t opcode;
+	uint32_t error_in;
+	uint32_t in_numargs;
+	uint32_t out_numargs;
+	uint32_t flags;
+	struct fuse_bpf_in_arg in_args[FUSE_MAX_IN_ARGS];
+	struct fuse_bpf_arg out_args[FUSE_MAX_OUT_ARGS];
+};
+
+#define FUSE_BPF_USER_FILTER	1
+#define FUSE_BPF_BACKING	2
+#define FUSE_BPF_POST_FILTER	4
+
+#define FUSE_OPCODE_FILTER	0x0ffff
+#define FUSE_PREFILTER		0x10000
+#define FUSE_POSTFILTER		0x20000
+
+struct bpf_prog *fuse_get_bpf_prog(struct file *file);
+
+#endif  /* _LINUX_ANDROID_FUSE_H */
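As a usage sketch, a FUSE daemon replying to a lookup could hand back a backing file and a BPF program through fuse_entry_bpf_out; the fd variables are illustrative:

	struct fuse_entry_bpf_out ebpf = {
		.backing_action	= FUSE_ACTION_REPLACE,
		.backing_fd	= backing_fd,	/* fd of the lower file */
		.bpf_action	= FUSE_ACTION_REPLACE,
		.bpf_fd		= bpf_prog_fd,	/* fd of a BPF_PROG_TYPE_FUSE program */
	};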
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 51b9aa6..a8b5bf3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -978,6 +978,16 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LSM,
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
+
+	/*
+	 * Until fuse-bpf is upstreamed, this value must stay at the end of the
+	 * list so that values added upstream keep their correct numbering.
+	 * This works because nothing should use this value directly; instead,
+	 * it must be read at runtime from /sys/fs/fuse/bpf_prog_type_fuse.
+	 * Please keep this value at the end of the list until fuse-bpf is
+	 * upstreamed.
+	 */
+	BPF_PROG_TYPE_FUSE,
 };
 
 enum bpf_attach_type {
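A userspace sketch of the runtime lookup the comment above prescribes, assuming the sysfs file exposes the value as decimal text:

	#include <stdio.h>

	static int fuse_bpf_prog_type(void)
	{
		int type = -1;
		FILE *f = fopen("/sys/fs/fuse/bpf_prog_type_fuse", "r");

		if (f) {
			if (fscanf(f, "%d", &type) != 1)
				type = -1;
			fclose(f);
		}
		return type;	/* pass this as prog_type to BPF_PROG_LOAD */
	}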
diff --git a/include/uapi/linux/dm-user.h b/include/uapi/linux/dm-user.h
new file mode 100644
index 0000000..6d8f535b
--- /dev/null
+++ b/include/uapi/linux/dm-user.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: LGPL-2.0+ WITH Linux-syscall-note */
+/*
+ * Copyright (C) 2020 Google, Inc
+ * Copyright (C) 2020 Palmer Dabbelt <palmerdabbelt@google.com>
+ */
+
+#ifndef _LINUX_DM_USER_H
+#define _LINUX_DM_USER_H
+
+#include <linux/types.h>
+
+/*
+ * dm-user proxies device mapper ops between the kernel and userspace.  It's
+ * essentially just an RPC mechanism: all kernel calls create a request,
+ * userspace handles that with a response.  Userspace obtains requests via
+ * read() and provides responses via write().
+ *
+ * See Documentation/block/dm-user.rst for more information.
+ */
+
+#define DM_USER_REQ_MAP_READ 0
+#define DM_USER_REQ_MAP_WRITE 1
+#define DM_USER_REQ_MAP_FLUSH 2
+#define DM_USER_REQ_MAP_DISCARD 3
+#define DM_USER_REQ_MAP_SECURE_ERASE 4
+#define DM_USER_REQ_MAP_WRITE_SAME 5
+#define DM_USER_REQ_MAP_WRITE_ZEROES 6
+#define DM_USER_REQ_MAP_ZONE_OPEN 7
+#define DM_USER_REQ_MAP_ZONE_CLOSE 8
+#define DM_USER_REQ_MAP_ZONE_FINISH 9
+#define DM_USER_REQ_MAP_ZONE_APPEND 10
+#define DM_USER_REQ_MAP_ZONE_RESET 11
+#define DM_USER_REQ_MAP_ZONE_RESET_ALL 12
+
+#define DM_USER_REQ_MAP_FLAG_FAILFAST_DEV 0x00001
+#define DM_USER_REQ_MAP_FLAG_FAILFAST_TRANSPORT 0x00002
+#define DM_USER_REQ_MAP_FLAG_FAILFAST_DRIVER 0x00004
+#define DM_USER_REQ_MAP_FLAG_SYNC 0x00008
+#define DM_USER_REQ_MAP_FLAG_META 0x00010
+#define DM_USER_REQ_MAP_FLAG_PRIO 0x00020
+#define DM_USER_REQ_MAP_FLAG_NOMERGE 0x00040
+#define DM_USER_REQ_MAP_FLAG_IDLE 0x00080
+#define DM_USER_REQ_MAP_FLAG_INTEGRITY 0x00100
+#define DM_USER_REQ_MAP_FLAG_FUA 0x00200
+#define DM_USER_REQ_MAP_FLAG_PREFLUSH 0x00400
+#define DM_USER_REQ_MAP_FLAG_RAHEAD 0x00800
+#define DM_USER_REQ_MAP_FLAG_BACKGROUND 0x01000
+#define DM_USER_REQ_MAP_FLAG_NOWAIT 0x02000
+#define DM_USER_REQ_MAP_FLAG_CGROUP_PUNT 0x04000
+#define DM_USER_REQ_MAP_FLAG_NOUNMAP 0x08000
+#define DM_USER_REQ_MAP_FLAG_HIPRI 0x10000
+#define DM_USER_REQ_MAP_FLAG_DRV 0x20000
+#define DM_USER_REQ_MAP_FLAG_SWAP 0x40000
+
+#define DM_USER_RESP_SUCCESS 0
+#define DM_USER_RESP_ERROR 1
+#define DM_USER_RESP_UNSUPPORTED 2
+
+struct dm_user_message {
+	__u64 seq;
+	__u64 type;
+	__u64 flags;
+	__u64 sector;
+	__u64 len;
+	__u8 buf[];
+};
+
+#endif
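One iteration of a dm-user daemon might look like the following sketch; the buffer sizing and whether read payloads trail the header are assumptions to be checked against Documentation/block/dm-user.rst:

	/* fd: open dm-user misc device, e.g. /dev/dm-user/<name> (illustrative) */
	char buf[sizeof(struct dm_user_message) + 65536];
	struct dm_user_message *msg = (struct dm_user_message *)buf;

	ssize_t n = read(fd, buf, sizeof(buf));		/* fetch one request */
	if (n >= (ssize_t)sizeof(*msg) && msg->type == DM_USER_REQ_MAP_READ) {
		memset(msg->buf, 0, msg->len);		/* serve zeroes, for the example */
		msg->type = DM_USER_RESP_SUCCESS;	/* seq is echoed back as-is */
		write(fd, buf, sizeof(*msg) + msg->len);
	}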
diff --git a/include/uapi/linux/f2fs.h b/include/uapi/linux/f2fs.h
index 3121d12..955d440 100644
--- a/include/uapi/linux/f2fs.h
+++ b/include/uapi/linux/f2fs.h
@@ -42,6 +42,7 @@
 						struct f2fs_comp_option)
 #define F2FS_IOC_DECOMPRESS_FILE	_IO(F2FS_IOCTL_MAGIC, 23)
 #define F2FS_IOC_COMPRESS_FILE		_IO(F2FS_IOCTL_MAGIC, 24)
+#define F2FS_IOC_START_ATOMIC_REPLACE	_IO(F2FS_IOCTL_MAGIC, 25)
 
 /*
  * should be same as XFS_IOC_GOINGDOWN.
diff --git a/include/uapi/linux/fscrypt.h b/include/uapi/linux/fscrypt.h
index a756b29..024def3 100644
--- a/include/uapi/linux/fscrypt.h
+++ b/include/uapi/linux/fscrypt.h
@@ -26,6 +26,8 @@
 #define FSCRYPT_MODE_AES_256_CTS		4
 #define FSCRYPT_MODE_AES_128_CBC		5
 #define FSCRYPT_MODE_AES_128_CTS		6
+#define FSCRYPT_MODE_SM4_XTS			7
+#define FSCRYPT_MODE_SM4_CTS			8
 #define FSCRYPT_MODE_ADIANTUM			9
 #define FSCRYPT_MODE_AES_256_HCTR2		10
 /* If adding a mode number > 10, update FSCRYPT_MODE_MAX in fscrypt_private.h */
@@ -125,7 +127,10 @@ struct fscrypt_add_key_arg {
 	struct fscrypt_key_specifier key_spec;
 	__u32 raw_size;
 	__u32 key_id;
-	__u32 __reserved[8];
+	__u32 __reserved[7];
+	/* N.B.: "temporary" flag, not reserved upstream */
+#define __FSCRYPT_ADD_KEY_FLAG_HW_WRAPPED		0x00000001
+	__u32 __flags;
 	__u8 raw[];
 };
 
@@ -185,8 +190,6 @@ struct fscrypt_get_key_status_arg {
 #define FS_ENCRYPTION_MODE_AES_256_CTS	FSCRYPT_MODE_AES_256_CTS
 #define FS_ENCRYPTION_MODE_AES_128_CBC	FSCRYPT_MODE_AES_128_CBC
 #define FS_ENCRYPTION_MODE_AES_128_CTS	FSCRYPT_MODE_AES_128_CTS
-#define FS_ENCRYPTION_MODE_SPECK128_256_XTS	7	/* removed */
-#define FS_ENCRYPTION_MODE_SPECK128_256_CTS	8	/* removed */
 #define FS_ENCRYPTION_MODE_ADIANTUM	FSCRYPT_MODE_ADIANTUM
 #define FS_KEY_DESC_PREFIX		FSCRYPT_KEY_DESC_PREFIX
 #define FS_KEY_DESC_PREFIX_SIZE		FSCRYPT_KEY_DESC_PREFIX_SIZE
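A hedged sketch of adding a hardware-wrapped key via the new __flags field; FS_IOC_ADD_ENCRYPTION_KEY and FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER are the standard fscrypt UAPI, while blob/blob_len and driver support for wrapped keys are assumptions:

	size_t sz = sizeof(struct fscrypt_add_key_arg) + blob_len;
	struct fscrypt_add_key_arg *arg = calloc(1, sz);

	arg->key_spec.type = FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER;
	arg->raw_size = blob_len;			/* length of the wrapped blob */
	arg->__flags = __FSCRYPT_ADD_KEY_FLAG_HW_WRAPPED;
	memcpy(arg->raw, blob, blob_len);
	ioctl(dirfd, FS_IOC_ADD_ENCRYPTION_KEY, arg);	/* dirfd: fd on the filesystem */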
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 76ee8f9e..9b691de 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -393,6 +393,18 @@ struct fuse_file_lock {
 #define FUSE_SECURITY_CTX	(1ULL << 32)
 #define FUSE_HAS_INODE_DAX	(1ULL << 33)
 
+/*
+ * For FUSE < 7.36, FUSE_PASSTHROUGH has the value (1 << 31).
+ * The version check is not strictly required, but it prevents having a
+ * broken commit in the tree.
+ */
+#if FUSE_KERNEL_VERSION > 7 ||                                                 \
+	(FUSE_KERNEL_VERSION == 7 && FUSE_KERNEL_MINOR_VERSION >= 36)
+#define FUSE_PASSTHROUGH (1ULL << 63)
+#else
+#define FUSE_PASSTHROUGH (1 << 31)
+#endif
+
 /**
  * CUSE INIT request/reply flags
  *
@@ -541,6 +553,7 @@ enum fuse_opcode {
 	FUSE_REMOVEMAPPING	= 49,
 	FUSE_SYNCFS		= 50,
 	FUSE_TMPFILE		= 51,
+	FUSE_CANONICAL_PATH	= 2016,
 
 	/* CUSE specific operations */
 	CUSE_INIT		= 4096,
@@ -667,7 +680,7 @@ struct fuse_create_in {
 struct fuse_open_out {
 	uint64_t	fh;
 	uint32_t	open_flags;
-	uint32_t	padding;
+	uint32_t	passthrough_fh;
 };
 
 struct fuse_release_in {
@@ -957,6 +970,9 @@ struct fuse_notify_retrieve_in {
 /* Device ioctls: */
 #define FUSE_DEV_IOC_MAGIC		229
 #define FUSE_DEV_IOC_CLONE		_IOR(FUSE_DEV_IOC_MAGIC, 0, uint32_t)
+/* 127 is reserved for the V1 interface implementation in Android (deprecated) */
+/* 126 is reserved for the V2 interface implementation in Android */
+#define FUSE_DEV_IOC_PASSTHROUGH_OPEN	_IOW(FUSE_DEV_IOC_MAGIC, 126, uint32_t)
 
 struct fuse_lseek_in {
 	uint64_t	fh;
diff --git a/include/uapi/linux/icmp.h b/include/uapi/linux/icmp.h
index 163c099..d3242d5 100644
--- a/include/uapi/linux/icmp.h
+++ b/include/uapi/linux/icmp.h
@@ -97,7 +97,11 @@ struct icmphdr {
 	} echo;
 	__be32	gateway;
 	struct {
+#ifdef __BIONIC__
+		__be16	__linux_unused;
+#else
 		__be16	__unused;
+#endif
 		__be16	mtu;
 	} frag;
 	__u8	reserved[4];
diff --git a/include/uapi/linux/incrementalfs.h b/include/uapi/linux/incrementalfs.h
new file mode 100644
index 0000000..f8338af
--- /dev/null
+++ b/include/uapi/linux/incrementalfs.h
@@ -0,0 +1,590 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Userspace interface for Incremental FS.
+ *
+ * Incremental FS is special-purpose Linux virtual file system that allows
+ * execution of a program while its binary and resource files are still being
+ * lazily downloaded over the network, USB etc.
+ *
+ * Copyright 2019 Google LLC
+ */
+#ifndef _UAPI_LINUX_INCREMENTALFS_H
+#define _UAPI_LINUX_INCREMENTALFS_H
+
+#include <linux/limits.h>
+#include <linux/ioctl.h>
+#include <linux/types.h>
+#include <linux/xattr.h>
+
+/* ===== constants ===== */
+#define INCFS_NAME "incremental-fs"
+
+/*
+ * Magic number used in file header and in memory superblock
+ * Note that the full value occupies 5 bytes, so on 32-bit kernels it is
+ * truncated to a 4-byte value
+ */
+#define INCFS_MAGIC_NUMBER (0x5346434e49ul & ULONG_MAX)
+
+#define INCFS_DATA_FILE_BLOCK_SIZE 4096
+#define INCFS_HEADER_VER 1
+
+/* TODO: This value is assumed in incfs_copy_signature_info_from_user to be the
+ * actual signature length. Set back to 64 when fixed.
+ */
+#define INCFS_MAX_HASH_SIZE 32
+#define INCFS_MAX_FILE_ATTR_SIZE 512
+
+#define INCFS_INDEX_NAME ".index"
+#define INCFS_INCOMPLETE_NAME ".incomplete"
+#define INCFS_PENDING_READS_FILENAME ".pending_reads"
+#define INCFS_LOG_FILENAME ".log"
+#define INCFS_BLOCKS_WRITTEN_FILENAME ".blocks_written"
+#define INCFS_XATTR_ID_NAME (XATTR_USER_PREFIX "incfs.id")
+#define INCFS_XATTR_SIZE_NAME (XATTR_USER_PREFIX "incfs.size")
+#define INCFS_XATTR_METADATA_NAME (XATTR_USER_PREFIX "incfs.metadata")
+#define INCFS_XATTR_VERITY_NAME (XATTR_USER_PREFIX "incfs.verity")
+
+#define INCFS_MAX_SIGNATURE_SIZE 8096
+#define INCFS_SIGNATURE_VERSION 2
+#define INCFS_SIGNATURE_SECTIONS 2
+
+#define INCFS_IOCTL_BASE_CODE 'g'
+
+/* ===== ioctl requests on the command dir ===== */
+
+/*
+ * Create a new file
+ * May only be called on .pending_reads file
+ */
+#define INCFS_IOC_CREATE_FILE \
+	_IOWR(INCFS_IOCTL_BASE_CODE, 30, struct incfs_new_file_args)
+
+/* Read file signature */
+#define INCFS_IOC_READ_FILE_SIGNATURE                                          \
+	_IOR(INCFS_IOCTL_BASE_CODE, 31, struct incfs_get_file_sig_args)
+
+/*
+ * Fill in one or more data blocks. This may only be called on a file
+ * descriptor passed as a parameter to INCFS_IOC_PERMIT_FILL
+ *
+ * Returns the number of blocks filled in, or an error if none were
+ */
+#define INCFS_IOC_FILL_BLOCKS                                                  \
+	_IOR(INCFS_IOCTL_BASE_CODE, 32, struct incfs_fill_blocks)
+
+/*
+ * Permit INCFS_IOC_FILL_BLOCKS on the given file descriptor
+ * May only be called on .pending_reads file
+ *
+ * Returns 0 on success or error
+ */
+#define INCFS_IOC_PERMIT_FILL                                                  \
+	_IOW(INCFS_IOCTL_BASE_CODE, 33, struct incfs_permit_fill)
+
+/*
+ * Fills buffer with ranges of populated blocks
+ *
+ * Returns 0 if all ranges were written
+ *	   error otherwise
+ *
+ *	   Either way, range_buffer_size_out is set to the number
+ *	   of bytes written; the caller should initialize it to 0.
+ *	   The ranges filled are valid, but if an error was returned
+ *	   there might be more ranges to come.
+ *
+ *	   Ranges are ranges of filled blocks:
+ *
+ *	   1 2 7 9
+ *
+ *	   means blocks 1, 2, 7, 8, 9 are filled, and blocks 0, 3, 4, 5, 6
+ *	   and 10 onwards are not
+ *
+ *	   If hashing is enabled for the file, the hash blocks are simply
+ *	   treated as though they immediately followed the data blocks.
+ */
+#define INCFS_IOC_GET_FILLED_BLOCKS                                            \
+	_IOR(INCFS_IOCTL_BASE_CODE, 34, struct incfs_get_filled_blocks_args)
+
+/*
+ * Creates a new mapped file
+ * May only be called on .pending_reads file
+ */
+#define INCFS_IOC_CREATE_MAPPED_FILE \
+	_IOWR(INCFS_IOCTL_BASE_CODE, 35, struct incfs_create_mapped_file_args)
+
+/*
+ * Get number of blocks, total and filled
+ * May only be called on .pending_reads file
+ */
+#define INCFS_IOC_GET_BLOCK_COUNT \
+	_IOR(INCFS_IOCTL_BASE_CODE, 36, struct incfs_get_block_count_args)
+
+/*
+ * Get per UID read timeouts
+ * May only be called on .pending_reads file
+ */
+#define INCFS_IOC_GET_READ_TIMEOUTS \
+	_IOR(INCFS_IOCTL_BASE_CODE, 37, struct incfs_get_read_timeouts_args)
+
+/*
+ * Set per UID read timeouts
+ * May only be called on .pending_reads file
+ */
+#define INCFS_IOC_SET_READ_TIMEOUTS \
+	_IOW(INCFS_IOCTL_BASE_CODE, 38, struct incfs_set_read_timeouts_args)
+
+/*
+ * Get last read error
+ * May only be called on .pending_reads file
+ */
+#define INCFS_IOC_GET_LAST_READ_ERROR \
+	_IOW(INCFS_IOCTL_BASE_CODE, 39, struct incfs_get_last_read_error_args)
+
+/* ===== sysfs feature flags ===== */
+/*
+ * Each flag is represented by a file in /sys/fs/incremental-fs/features.
+ * If the file exists, the feature is supported and the file contents
+ * will be the line "supported".
+ */
+
+/*
+ * Basic flag stating that the core incfs file system is available
+ */
+#define INCFS_FEATURE_FLAG_COREFS "corefs"
+
+/*
+ * zstd compression support
+ */
+#define INCFS_FEATURE_FLAG_ZSTD "zstd"
+
+/*
+ * v2 feature set support. Covers:
+ *   INCFS_IOC_CREATE_MAPPED_FILE
+ *   INCFS_IOC_GET_BLOCK_COUNT
+ *   INCFS_IOC_GET_READ_TIMEOUTS/INCFS_IOC_SET_READ_TIMEOUTS
+ *   .blocks_written status file
+ *   .incomplete folder
+ *   report_uid mount option
+ */
+#define INCFS_FEATURE_FLAG_V2 "v2"
+
+enum incfs_compression_alg {
+	COMPRESSION_NONE = 0,
+	COMPRESSION_LZ4 = 1,
+	COMPRESSION_ZSTD = 2,
+};
+
+enum incfs_block_flags {
+	INCFS_BLOCK_FLAGS_NONE = 0,
+	INCFS_BLOCK_FLAGS_HASH = 1,
+};
+
+typedef struct {
+	__u8 bytes[16];
+} incfs_uuid_t __attribute__((aligned (8)));
+
+/*
+ * Description of a pending read. A pending read - a read call by
+ * a userspace program for which the filesystem currently doesn't have data.
+ *
+ * Reads from .pending_reads and .log return an array of these structures
+ */
+struct incfs_pending_read_info {
+	/* Id of a file that is being read from. */
+	incfs_uuid_t file_id;
+
+	/* Time of the read, in microseconds since system boot. */
+	__aligned_u64 timestamp_us;
+
+	/* Index of a file block that is being read. */
+	__u32 block_index;
+
+	/* A serial number of this pending read. */
+	__u32 serial_number;
+};
+
+/*
+ * Description of a pending read. A pending read - a read call by
+ * a userspace program for which the filesystem currently doesn't have data.
+ *
+ * This version of incfs_pending_read_info is used whenever the file system is
+ * mounted with the report_uid flag
+ */
+struct incfs_pending_read_info2 {
+	/* Id of a file that is being read from. */
+	incfs_uuid_t file_id;
+
+	/* Time of the read, in microseconds since system boot. */
+	__aligned_u64 timestamp_us;
+
+	/* Index of a file block that is being read. */
+	__u32 block_index;
+
+	/* A serial number of this pending read. */
+	__u32 serial_number;
+
+	/* The UID of the reading process */
+	__u32 uid;
+
+	__u32 reserved;
+};
+
+/*
+ * Description of a data or hash block to add to a data file.
+ */
+struct incfs_fill_block {
+	/* Index of a data block. */
+	__u32 block_index;
+
+	/* Length of data */
+	__u32 data_len;
+
+	/*
+	 * A pointer to the actual data for the block.
+	 *
+	 * Equivalent to: __u8 *data;
+	 */
+	__aligned_u64 data;
+
+	/*
+	 * Compression algorithm used to compress the data block.
+	 * Values from enum incfs_compression_alg.
+	 */
+	__u8 compression;
+
+	/* Values from enum incfs_block_flags */
+	__u8 flags;
+
+	__u16 reserved1;
+
+	__u32 reserved2;
+
+	__aligned_u64 reserved3;
+};
+
+/*
+ * Description of a number of blocks to add to a data file
+ *
+ * Argument for INCFS_IOC_FILL_BLOCKS
+ */
+struct incfs_fill_blocks {
+	/* Number of blocks */
+	__u64 count;
+
+	/* A pointer to an array of incfs_fill_block structs */
+	__aligned_u64 fill_blocks;
+};
+
+/*
+ * Permit INCFS_IOC_FILL_BLOCKS on the given file descriptor
+ * May only be called on .pending_reads file
+ *
+ * Argument for INCFS_IOC_PERMIT_FILL
+ */
+struct incfs_permit_fill {
+	/* File to permit fills on */
+	__u32 file_descriptor;
+};
+
+enum incfs_hash_tree_algorithm {
+	INCFS_HASH_TREE_NONE = 0,
+	INCFS_HASH_TREE_SHA256 = 1
+};
+
+/*
+ * Create a new file or directory.
+ */
+struct incfs_new_file_args {
+	/* Id of a file to create. */
+	incfs_uuid_t file_id;
+
+	/*
+	 * Total size of the new file. Ignored if S_ISDIR(mode).
+	 */
+	__aligned_u64 size;
+
+	/*
+	 * File mode. Permissions and dir flag.
+	 */
+	__u16 mode;
+
+	__u16 reserved1;
+
+	__u32 reserved2;
+
+	/*
+	 * A pointer to a null-terminated relative path to the file's parent
+	 * dir.
+	 * Max length: PATH_MAX
+	 *
+	 * Equivalent to: char *directory_path;
+	 */
+	__aligned_u64 directory_path;
+
+	/*
+	 * A pointer to a null-terminated file's name.
+	 * Max length: PATH_MAX
+	 *
+	 * Equivalent to: char *file_name;
+	 */
+	__aligned_u64 file_name;
+
+	/*
+	 * A pointer to a file attribute to be set on creation.
+	 *
+	 * Equivalent to: u8 *file_attr;
+	 */
+	__aligned_u64 file_attr;
+
+	/*
+	 * Length of the data buffer specified by file_attr.
+	 * Max value: INCFS_MAX_FILE_ATTR_SIZE
+	 */
+	__u32 file_attr_len;
+
+	__u32 reserved4;
+
+	/*
+	 * Points to an APK V4 Signature data blob
+	 * Signature must have two sections
+	 * Format is:
+	 *	u32 version
+	 *	u32 size_of_hash_info_section
+	 *	u8 hash_info_section[]
+	 *	u32 size_of_signing_info_section
+	 *	u8 signing_info_section[]
+	 *
+	 * Note that incfs does not care about what is in signing_info_section
+	 *
+	 * hash_info_section has following format:
+	 *	u32 hash_algorithm; // Must be SHA256 == 1
+	 *	u8 log2_blocksize;  // Must be 12 for 4096 byte blocks
+	 *	u32 salt_size;
+	 *	u8 salt[];
+	 *	u32 hash_size;
+	 *	u8 root_hash[];
+	 */
+	__aligned_u64 signature_info;
+
+	/* Size of signature_info */
+	__aligned_u64 signature_size;
+
+	__aligned_u64 reserved6;
+};
+
+/*
+ * Request a digital signature blob for a given file.
+ * Argument for INCFS_IOC_READ_FILE_SIGNATURE ioctl
+ */
+struct incfs_get_file_sig_args {
+	/*
+	 * A pointer to the data buffer to save a signature blob to.
+	 *
+	 * Equivalent to: u8 *file_signature;
+	 */
+	__aligned_u64 file_signature;
+
+	/* Size of the buffer at file_signature. */
+	__u32 file_signature_buf_size;
+
+	/*
+	 * Number of bytes saved to the file_signature buffer.
+	 * Set by the kernel when the ioctl completes.
+	 */
+	__u32 file_signature_len_out;
+};
+
+struct incfs_filled_range {
+	__u32 begin;
+	__u32 end;
+};
+
+/*
+ * Request ranges of filled blocks
+ * Argument for INCFS_IOC_GET_FILLED_BLOCKS
+ */
+struct incfs_get_filled_blocks_args {
+	/*
+	 * A buffer to populate with ranges of filled blocks
+	 *
+	 * Equivalent to struct incfs_filled_ranges *range_buffer
+	 */
+	__aligned_u64 range_buffer;
+
+	/* Size of range_buffer */
+	__u32 range_buffer_size;
+
+	/* Start index to read from */
+	__u32 start_index;
+
+	/*
+	 * End index to read to. 0 means read to end. This is a range,
+	 * so incfs will read from start_index to end_index - 1
+	 */
+	__u32 end_index;
+
+	/* Actual number of blocks in file */
+	__u32 total_blocks_out;
+
+	/* The  number of data blocks in file */
+	__u32 data_blocks_out;
+
+	/* Number of bytes written to range buffer */
+	__u32 range_buffer_size_out;
+
+	/* Sector scanned up to, if the call was interrupted */
+	__u32 index_out;
+};
+
+/*
+ * Create a new mapped file
+ * Argument for INCFS_IOC_CREATE_MAPPED_FILE
+ */
+struct incfs_create_mapped_file_args {
+	/*
+	 * Total size of the new file.
+	 */
+	__aligned_u64 size;
+
+	/*
+	 * File mode. Permissions and dir flag.
+	 */
+	__u16 mode;
+
+	__u16 reserved1;
+
+	__u32 reserved2;
+
+	/*
+	 * A pointer to a null-terminated relative path to the incfs mount
+	 * point
+	 * Max length: PATH_MAX
+	 *
+	 * Equivalent to: char *directory_path;
+	 */
+	__aligned_u64 directory_path;
+
+	/*
+	 * A pointer to a null-terminated file name.
+	 * Max length: PATH_MAX
+	 *
+	 * Equivalent to: char *file_name;
+	 */
+	__aligned_u64 file_name;
+
+	/* Id of source file to map. */
+	incfs_uuid_t source_file_id;
+
+	/*
+	 * Offset in source file to start mapping. Must be a multiple of
+	 * INCFS_DATA_FILE_BLOCK_SIZE
+	 */
+	__aligned_u64 source_offset;
+};
+
+/*
+ * Get information about the blocks in this file
+ * Argument for INCFS_IOC_GET_BLOCK_COUNT
+ */
+struct incfs_get_block_count_args {
+	/* Total number of data blocks in the file */
+	__u32 total_data_blocks_out;
+
+	/* Number of filled data blocks in the file */
+	__u32 filled_data_blocks_out;
+
+	/* Total number of hash blocks in the file */
+	__u32 total_hash_blocks_out;
+
+	/* Number of filled hash blocks in the file */
+	__u32 filled_hash_blocks_out;
+};
+
+/* Description of timeouts for one UID */
+struct incfs_per_uid_read_timeouts {
+	/* UID to apply these timeouts to */
+	__u32 uid;
+
+	/*
+	 * Min time in microseconds to read any block. Note that this doesn't
+	 * apply to reads which are satisfied from the page cache.
+	 */
+	__u32 min_time_us;
+
+	/*
+	 * Min time in microseconds to satisfy a pending read. Any pending read
+	 * which is filled before this time will be delayed so that the total
+	 * read time >= this value.
+	 */
+	__u32 min_pending_time_us;
+
+	/*
+	 * Max time in microseconds to satisfy a pending read before the read
+	 * times out. If set to U32_MAX, defaults to the mount option
+	 * read_timeout_ms * 1000. Must be >= min_pending_time_us
+	 */
+	__u32 max_pending_time_us;
+};
+
+/*
+ * Get the read timeouts array
+ * Argument for INCFS_IOC_GET_READ_TIMEOUTS
+ */
+struct incfs_get_read_timeouts_args {
+	/*
+	 * A pointer to a buffer to fill with the current timeouts
+	 *
+	 * Equivalent to struct incfs_per_uid_read_timeouts *
+	 */
+	__aligned_u64 timeouts_array;
+
+	/* Size of above buffer in bytes */
+	__u32 timeouts_array_size;
+
+	/* Size used in bytes, or size needed if -ENOMEM returned */
+	__u32 timeouts_array_size_out;
+};
+
+/*
+ * Set the read timeouts array
+ * Arguments for INCFS_IOC_SET_READ_TIMEOUTS
+ */
+struct incfs_set_read_timeouts_args {
+	/*
+	 * A pointer to an array containing the new timeouts
+	 * This will replace any existing timeouts
+	 *
+	 * Equivalent to struct incfs_per_uid_read_timeouts *
+	 */
+	__aligned_u64 timeouts_array;
+
+	/* Size of above array in bytes. Must be < 256 */
+	__u32 timeouts_array_size;
+};
+
+/*
+ * Get last read error struct
+ * Arguments for INCFS_IOC_GET_LAST_READ_ERROR
+ */
+struct incfs_get_last_read_error_args {
+	/* File id of last file that had a read error */
+	incfs_uuid_t	file_id_out;
+
+	/* Time of last read error, in us, from CLOCK_MONOTONIC */
+	__u64	time_us_out;
+
+	/* Index of page that was being read at last read error */
+	__u32	page_out;
+
+	/* errno of last read error */
+	__u32	errno_out;
+
+	/* uid of last read error */
+	__u32	uid_out;
+
+	__u32	reserved1;
+	__u64	reserved2;
+};
+
+#endif /* _UAPI_LINUX_INCREMENTALFS_H */
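Pulling the pieces above together, a loader could walk the filled ranges of a file like this; resuming from index_out on interruption is per the comments above, while the errno handling is an assumption:

	struct incfs_filled_range ranges[128];
	struct incfs_get_filled_blocks_args args = {
		.range_buffer = (__u64)(uintptr_t)ranges,
		.range_buffer_size = sizeof(ranges),
		.end_index = 0,				/* 0 = read to end */
	};

	for (;;) {
		args.range_buffer_size_out = 0;		/* caller initializes to 0 */
		int err = ioctl(fd, INCFS_IOC_GET_FILLED_BLOCKS, &args);
		/* range_buffer_size_out / sizeof(ranges[0]) entries are valid here */
		if (!err)
			break;				/* all ranges returned */
		if (errno != EINTR && errno != ERANGE)
			break;				/* assumed: give up on hard errors */
		args.start_index = args.index_out;	/* resume where the call stopped */
	}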
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0d5d441..1c9279f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -911,6 +911,12 @@ struct kvm_ppc_resize_hpt {
 #define KVM_VM_TYPE_ARM_IPA_SIZE_MASK	0xffULL
 #define KVM_VM_TYPE_ARM_IPA_SIZE(x)		\
 	((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
+
+#define KVM_VM_TYPE_ARM_PROTECTED	(1UL << 31)
+
+#define KVM_VM_TYPE_MASK	(KVM_VM_TYPE_ARM_IPA_SIZE_MASK | \
+				 KVM_VM_TYPE_ARM_PROTECTED)
+
 /*
  * ioctls for /dev/kvm fds:
  */
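A userspace sketch of creating a protected guest with the new type bit; the 40-bit IPA size is illustrative, and gating on the capability below via KVM_CHECK_EXTENSION is an assumption:

	int kvm = open("/dev/kvm", O_RDWR);

	if (ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_ARM_PROTECTED_VM) > 0) {
		unsigned long type = KVM_VM_TYPE_ARM_PROTECTED |
				     KVM_VM_TYPE_ARM_IPA_SIZE(40);
		int vm = ioctl(kvm, KVM_CREATE_VM, type);
		/* vm < 0: protected VMs not supported by this host */
	}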
@@ -1178,6 +1184,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_ARM_PROTECTED_VM 0xffbadab1
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/include/uapi/linux/netfilter/xt_IDLETIMER.h b/include/uapi/linux/netfilter/xt_IDLETIMER.h
index 7bfb31a..104ac32 100644
--- a/include/uapi/linux/netfilter/xt_IDLETIMER.h
+++ b/include/uapi/linux/netfilter/xt_IDLETIMER.h
@@ -33,7 +33,7 @@ struct idletimer_tg_info_v1 {
 
 	char label[MAX_IDLETIMER_LABEL_SIZE];
 
-	__u8 send_nl_msg;   /* unused: for compatibility with Android */
+	__u8 send_nl_msg;
 	__u8 timer_type;
 
 	/* for kernel module internal use only */
diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h
index 42a40ad..4dd7734 100644
--- a/include/uapi/linux/psci.h
+++ b/include/uapi/linux/psci.h
@@ -69,6 +69,8 @@
 #define PSCI_1_1_FN64_SYSTEM_RESET2		PSCI_0_2_FN64(18)
 #define PSCI_1_1_FN64_MEM_PROTECT_CHECK_RANGE	PSCI_0_2_FN64(20)
 
+#define PSCI_1_1_FN64_MEM_PROTECT		PSCI_0_2_FN64(19)
+
 /* PSCI v0.2 power state encoding for CPU_SUSPEND function */
 #define PSCI_0_2_POWER_STATE_ID_MASK		0xffff
 #define PSCI_0_2_POWER_STATE_ID_SHIFT		0
diff --git a/include/uapi/linux/usb/f_accessory.h b/include/uapi/linux/usb/f_accessory.h
new file mode 100644
index 0000000..0baeb7d
--- /dev/null
+++ b/include/uapi/linux/usb/f_accessory.h
@@ -0,0 +1,146 @@
+/*
+ * Gadget Function Driver for Android USB accessories
+ *
+ * Copyright (C) 2011 Google, Inc.
+ * Author: Mike Lockwood <lockwood@android.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _UAPI_LINUX_USB_F_ACCESSORY_H
+#define _UAPI_LINUX_USB_F_ACCESSORY_H
+
+/* Use Google Vendor ID when in accessory mode */
+#define USB_ACCESSORY_VENDOR_ID 0x18D1
+
+
+/* Product ID to use when in accessory mode */
+#define USB_ACCESSORY_PRODUCT_ID 0x2D00
+
+/* Product ID to use when in accessory mode and adb is enabled */
+#define USB_ACCESSORY_ADB_PRODUCT_ID 0x2D01
+
+/* Indexes for strings sent by the host via ACCESSORY_SEND_STRING */
+#define ACCESSORY_STRING_MANUFACTURER   0
+#define ACCESSORY_STRING_MODEL          1
+#define ACCESSORY_STRING_DESCRIPTION    2
+#define ACCESSORY_STRING_VERSION        3
+#define ACCESSORY_STRING_URI            4
+#define ACCESSORY_STRING_SERIAL         5
+
+/* Control request for retrieving device's protocol version
+ *
+ *	requestType:    USB_DIR_IN | USB_TYPE_VENDOR
+ *	request:        ACCESSORY_GET_PROTOCOL
+ *	value:          0
+ *	index:          0
+ *	data            version number (16 bits little endian)
+ *                     1 for original accessory support
+ *                     2 adds HID and device to host audio support
+ */
+#define ACCESSORY_GET_PROTOCOL  51
+
+/* Control request for host to send a string to the device
+ *
+ *	requestType:    USB_DIR_OUT | USB_TYPE_VENDOR
+ *	request:        ACCESSORY_SEND_STRING
+ *	value:          0
+ *	index:          string ID
+ *	data            zero terminated UTF8 string
+ *
+ *  The device can later retrieve these strings via the
+ *  ACCESSORY_GET_STRING_* ioctls
+ */
+#define ACCESSORY_SEND_STRING   52
+
+/* Control request for starting device in accessory mode.
+ * The host sends this after setting all its strings to the device.
+ *
+ *	requestType:    USB_DIR_OUT | USB_TYPE_VENDOR
+ *	request:        ACCESSORY_START
+ *	value:          0
+ *	index:          0
+ *	data            none
+ */
+#define ACCESSORY_START         53
+
+/* Control request for registering a HID device.
+ * Upon registering, a unique ID is sent by the accessory in the
+ * value parameter. This ID will be used for future commands for
+ * the device
+ *
+ *	requestType:    USB_DIR_OUT | USB_TYPE_VENDOR
+ *	request:        ACCESSORY_REGISTER_HID_DEVICE
+ *	value:          Accessory assigned ID for the HID device
+ *	index:          total length of the HID report descriptor
+ *	data            none
+ */
+#define ACCESSORY_REGISTER_HID         54
+
+/* Control request for unregistering a HID device.
+ *
+ *	requestType:    USB_DIR_OUT | USB_TYPE_VENDOR
+ *	request:        ACCESSORY_REGISTER_HID
+ *	value:          Accessory assigned ID for the HID device
+ *	index:          0
+ *	data            none
+ */
+#define ACCESSORY_UNREGISTER_HID         55
+
+/* Control request for sending the HID report descriptor.
+ * If the HID descriptor is longer than the endpoint zero max packet size,
+ * the descriptor will be sent in multiple ACCESSORY_SET_HID_REPORT_DESC
+ * commands. The data for the descriptor must be sent sequentially
+ * if multiple packets are needed.
+ *
+ *	requestType:    USB_DIR_OUT | USB_TYPE_VENDOR
+ *	request:        ACCESSORY_SET_HID_REPORT_DESC
+ *	value:          Accessory assigned ID for the HID device
+ *	index:          offset of data in descriptor
+ *                      (needed when HID descriptor is too big for one packet)
+ *	data            the HID report descriptor
+ */
+#define ACCESSORY_SET_HID_REPORT_DESC         56
+
+/* Control request for sending HID events.
+ *
+ *	requestType:    USB_DIR_OUT | USB_TYPE_VENDOR
+ *	request:        ACCESSORY_SEND_HID_EVENT
+ *	value:          Accessory assigned ID for the HID device
+ *	index:          0
+ *	data            the HID report for the event
+ */
+#define ACCESSORY_SEND_HID_EVENT         57
+
+/* Control request for setting the audio mode.
+ *
+ *	requestType:	USB_DIR_OUT | USB_TYPE_VENDOR
+ *	request:        ACCESSORY_SET_AUDIO_MODE
+ *	value:          0 - no audio
+ *                     1 - device to host, 44100 16-bit stereo PCM
+ *	index:          0
+ *	data            none
+ */
+#define ACCESSORY_SET_AUDIO_MODE         58
+
+/* ioctls for retrieving strings set by the host */
+#define ACCESSORY_GET_STRING_MANUFACTURER   _IOW('M', 1, char[256])
+#define ACCESSORY_GET_STRING_MODEL          _IOW('M', 2, char[256])
+#define ACCESSORY_GET_STRING_DESCRIPTION    _IOW('M', 3, char[256])
+#define ACCESSORY_GET_STRING_VERSION        _IOW('M', 4, char[256])
+#define ACCESSORY_GET_STRING_URI            _IOW('M', 5, char[256])
+#define ACCESSORY_GET_STRING_SERIAL         _IOW('M', 6, char[256])
+/* returns 1 if there is a start request pending */
+#define ACCESSORY_IS_START_REQUESTED        _IO('M', 7)
+/* returns audio mode (set via the ACCESSORY_SET_AUDIO_MODE control request) */
+#define ACCESSORY_GET_AUDIO_MODE            _IO('M', 8)
+
+#endif /* _UAPI_LINUX_USB_F_ACCESSORY_H */
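On the host side, the handshake described above maps onto three vendor control transfers; a libusb sketch with an illustrative device handle h and manufacturer string:

	#include <libusb-1.0/libusb.h>

	uint16_t version = 0;
	libusb_control_transfer(h, LIBUSB_ENDPOINT_IN | LIBUSB_REQUEST_TYPE_VENDOR,
				ACCESSORY_GET_PROTOCOL, 0, 0,
				(unsigned char *)&version, sizeof(version), 1000);

	const char *mfr = "ExampleCorp";	/* illustrative string */
	libusb_control_transfer(h, LIBUSB_ENDPOINT_OUT | LIBUSB_REQUEST_TYPE_VENDOR,
				ACCESSORY_SEND_STRING, 0, ACCESSORY_STRING_MANUFACTURER,
				(unsigned char *)mfr, strlen(mfr) + 1, 1000);

	libusb_control_transfer(h, LIBUSB_ENDPOINT_OUT | LIBUSB_REQUEST_TYPE_VENDOR,
				ACCESSORY_START, 0, 0, NULL, 0, 1000);
	/* the device then re-enumerates with USB_ACCESSORY_PRODUCT_ID */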
diff --git a/include/uapi/linux/usb/g_uvc.h b/include/uapi/linux/usb/g_uvc.h
index 652f169..8d7824d 100644
--- a/include/uapi/linux/usb/g_uvc.h
+++ b/include/uapi/linux/usb/g_uvc.h
@@ -21,6 +21,9 @@
 #define UVC_EVENT_DATA			(V4L2_EVENT_PRIVATE_START + 5)
 #define UVC_EVENT_LAST			(V4L2_EVENT_PRIVATE_START + 5)
 
+#define UVC_STRING_CONTROL_IDX			0
+#define UVC_STRING_STREAMING_IDX		1
+
 struct uvc_request_data {
 	__s32 length;
 	__u8 data[60];
diff --git a/include/uapi/linux/usb/video.h b/include/uapi/linux/usb/video.h
index bfdae12..6e8e572 100644
--- a/include/uapi/linux/usb/video.h
+++ b/include/uapi/linux/usb/video.h
@@ -466,7 +466,7 @@ struct uvc_format_uncompressed {
 	__u8  bDefaultFrameIndex;
 	__u8  bAspectRatioX;
 	__u8  bAspectRatioY;
-	__u8  bmInterfaceFlags;
+	__u8  bmInterlaceFlags;
 	__u8  bCopyProtect;
 } __attribute__((__packed__));
 
@@ -522,7 +522,7 @@ struct uvc_format_mjpeg {
 	__u8  bDefaultFrameIndex;
 	__u8  bAspectRatioX;
 	__u8  bAspectRatioY;
-	__u8  bmInterfaceFlags;
+	__u8  bmInterlaceFlags;
 	__u8  bCopyProtect;
 } __attribute__((__packed__));
 
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 29da1f4..e9fce7d 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -70,7 +70,7 @@
  * Common stuff for both V4L1 and V4L2
  * Moved from videodev.h
  */
-#define VIDEO_MAX_FRAME               32
+#define VIDEO_MAX_FRAME               64
 #define VIDEO_MAX_PLANES               8
 
 /*
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index 2bb8929..ca4210d 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -593,6 +593,29 @@ enum ufshcd_quirks {
 	 * auto-hibernate capability but it's FASTAUTO only.
 	 */
 	UFSHCD_QUIRK_HIBERN_FASTAUTO			= 1 << 18,
+
+	/*
+	 * This quirk needs to be enabled if the host controller supports inline
+	 * encryption, but it needs to initialize the crypto capabilities in a
+	 * nonstandard way and/or it needs to override blk_crypto_ll_ops.  If
+	 * enabled, the standard code won't initialize the blk_crypto_profile;
+	 * ufs_hba_variant_ops::init() must do it instead.
+	 */
+	UFSHCD_QUIRK_CUSTOM_CRYPTO_PROFILE		= 1 << 20,
+
+	/*
+	 * This quirk needs to be enabled if the host controller supports inline
+	 * encryption, but the CRYPTO_GENERAL_ENABLE bit is not implemented and
+	 * breaks the HCE sequence if used.
+	 */
+	UFSHCD_QUIRK_BROKEN_CRYPTO_ENABLE		= 1 << 21,
+
+	/*
+	 * This quirk needs to be enabled if the host controller requires that
+	 * the PRDT be cleared after each encrypted request because encryption
+	 * keys were stored in it.
+	 */
+	UFSHCD_QUIRK_KEYS_IN_PRDT			= 1 << 22,
 };
 
 enum ufshcd_caps {
@@ -754,6 +777,7 @@ struct ufs_hba_monitor {
  * @vops: pointer to variant specific operations
  * @vps: pointer to variant specific parameters
  * @priv: pointer to variant specific private data
+ * @sg_entry_size: size of struct ufshcd_sg_entry (may include variant fields)
  * @irq: Irq number of the controller
  * @is_irq_enabled: whether or not the UFS controller interrupt is enabled.
  * @dev_ref_clk_freq: reference clock frequency
@@ -876,6 +900,7 @@ struct ufs_hba {
 	const struct ufs_hba_variant_ops *vops;
 	struct ufs_hba_variant_params *vps;
 	void *priv;
+	size_t sg_entry_size;
 	unsigned int irq;
 	bool is_irq_enabled;
 	enum ufs_ref_clk_freq dev_ref_clk_freq;
@@ -1177,6 +1202,20 @@ static inline int ufshcd_disable_host_tx_lcc(struct ufs_hba *hba)
 	return ufshcd_dme_set(hba, UIC_ARG_MIB(PA_LOCAL_TX_LCC_ENABLE), 0);
 }
 
+int ufshcd_read_desc_param(struct ufs_hba *hba,
+			   enum desc_idn desc_id,
+			   int desc_index,
+			   u8 param_offset,
+			   u8 *param_read_buf,
+			   u8 param_size);
+int ufshcd_query_attr_retry(struct ufs_hba *hba,
+	enum query_opcode opcode, enum attr_idn idn, u8 index, u8 selector,
+	u32 *attr_val);
+int ufshcd_query_flag_retry(struct ufs_hba *hba,
+	enum query_opcode opcode, enum flag_idn idn, u8 index, bool *flag_res);
+
+int ufshcd_bkops_ctrl(struct ufs_hba *hba, enum bkops_status status);
+
 void ufshcd_auto_hibern8_enable(struct ufs_hba *hba);
 void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit);
 void ufshcd_fixup_dev_quirks(struct ufs_hba *hba,
diff --git a/include/ufs/ufshci.h b/include/ufs/ufshci.h
index f525566a..36961ac 100644
--- a/include/ufs/ufshci.h
+++ b/include/ufs/ufshci.h
@@ -422,20 +422,28 @@ struct ufshcd_sg_entry {
 	__le64    addr;
 	__le32    reserved;
 	__le32    size;
+	/*
+	 * followed by variant-specific fields if
+	 * hba->sg_entry_size != sizeof(struct ufshcd_sg_entry)
+	 */
 };
 
 /**
  * struct utp_transfer_cmd_desc - UTP Command Descriptor (UCD)
  * @command_upiu: Command UPIU Frame address
  * @response_upiu: Response UPIU Frame address
- * @prd_table: Physical Region Descriptor
+ * @prd_table: Physical Region Descriptor: an array of SG_ALL struct
+ *	ufshcd_sg_entry's.  Variant-specific fields may be present after each.
  */
 struct utp_transfer_cmd_desc {
 	u8 command_upiu[ALIGNED_UPIU_SIZE];
 	u8 response_upiu[ALIGNED_UPIU_SIZE];
-	struct ufshcd_sg_entry    prd_table[SG_ALL];
+	u8 prd_table[];
 };
 
+#define sizeof_utp_transfer_cmd_desc(hba)	\
+	(sizeof(struct utp_transfer_cmd_desc) + SG_ALL * (hba)->sg_entry_size)
+
 /**
  * struct request_desc_header - Descriptor Header common to both UTRD and UTMRD
  * @dword0: Descriptor Header DW0
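Host drivers that set hba->sg_entry_size must index the PRDT by that stride rather than by sizeof(struct ufshcd_sg_entry); a sketch:

	static void walk_prdt(struct ufs_hba *hba, void *prdt, int nents)
	{
		int i;

		for (i = 0; i < nents; i++) {
			struct ufshcd_sg_entry *ent = prdt + i * hba->sg_entry_size;

			/* standard fields always sit at the front of each entry */
			pr_debug("sg[%d]: addr=%llx size=%u\n", i,
				 le64_to_cpu(ent->addr), le32_to_cpu(ent->size));
		}
	}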
diff --git a/init/Kconfig b/init/Kconfig
index 0c214af..1256810 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -155,7 +155,7 @@
 
 config WERROR
 	bool "Compile the kernel with warnings as errors"
-	default COMPILE_TEST
+	default y
 	help
 	  A kernel build should not cause any compiler warnings, and this
 	  enables the '-Werror' (for C) and '-Dwarnings' (for Rust) flags
@@ -1295,6 +1295,16 @@
 	  desktop applications.  Task group autogeneration is currently based
 	  upon task session.
 
+config RT_SOFTIRQ_AWARE_SCHED
+	bool "Improve RT scheduling during long softirq execution"
+	depends on SMP && !PREEMPT_RT
+	default n
+	help
+	  Enable an optimization which tries to avoid placing RT tasks on
+	  CPUs occupied by nonpreemptible work, such as a long-running
+	  softirq, or on CPUs which may soon block preemption, such as a
+	  CPU running a ksoftirqd thread that handles slow softirqs.
+
 config SYSFS_DEPRECATED
 	bool "Enable deprecated sysfs features to support old userspace tools"
 	depends on SYSFS
@@ -2011,3 +2021,5 @@
 # <asm/syscall_wrapper.h>.
 config ARCH_HAS_SYSCALL_WRAPPER
 	def_bool n
+
+source "init/Kconfig.gki"
diff --git a/init/Kconfig.gki b/init/Kconfig.gki
new file mode 100644
index 0000000..29eb1ee
--- /dev/null
+++ b/init/Kconfig.gki
@@ -0,0 +1,282 @@
+config GKI_HIDDEN_DRM_CONFIGS
+	bool "Hidden DRM configs needed for GKI"
+	select DRM_KMS_HELPER if (HAS_IOMEM && DRM)
+	select DRM_GEM_SHMEM_HELPER if (DRM)
+	select DRM_MIPI_DSI
+	select DRM_TTM if (HAS_IOMEM && DRM)
+	select VIDEOMODE_HELPERS
+	select WANT_DEV_COREDUMP
+	select INTERVAL_TREE
+	help
+	  Dummy config option used to enable hidden DRM configs.
+	  These are normally selected implicitly when including a
+	  DRM module, but for GKI, the modules are built out-of-tree.
+
+config GKI_HIDDEN_MCP251XFD_CONFIGS
+	bool "Hidden MCP251XFD configs needed for GKI"
+	select CAN_RX_OFFLOAD
+	help
+	  Dummy config option used to enable hidden MCP251XFD configs.
+	  These are normally selected implicitly when including a
+	  MCP251XFD module, but for GKI, the modules are built out-of-tree.
+
+config GKI_HIDDEN_REGMAP_CONFIGS
+	bool "Hidden Regmap configs needed for GKI"
+	select REGMAP_IRQ
+	select REGMAP_MMIO
+	select REGMAP_SPMI
+	select SPMI
+	help
+	  Dummy config option used to enable hidden regmap configs.
+	  These are normally selected implicitly when a module
+	  that relies on it is configured.
+
+config GKI_HIDDEN_CRYPTO_CONFIGS
+	bool "Hidden CRYPTO configs needed for GKI"
+	select CRYPTO_ENGINE
+	help
+	  Dummy config option used to enable hidden CRYPTO configs.
+	  These are normally selected implicitly when a module
+	  that relies on it is configured.
+
+config GKI_HIDDEN_SND_CONFIGS
+	bool "Hidden SND configs needed for GKI"
+	select SND_VMASTER
+	select SND_PCM_ELD
+	select SND_JACK
+	select SND_JACK_INPUT_DEV
+	select SND_INTEL_NHLT if (ACPI)
+	help
+	  Dummy config option used to enable hidden SND configs.
+	  These are normally selected implicitly when a module
+	  that relies on it is configured.
+
+config GKI_HIDDEN_SND_SOC_CONFIGS
+	bool "Hidden SND_SOC configs needed for GKI"
+	select SND_SOC_GENERIC_DMAENGINE_PCM if (SND_SOC && SND)
+	select SND_PCM_IEC958
+	select SND_SOC_COMPRESS if (SND_SOC && SND)
+	select SND_SOC_TOPOLOGY if (SND_SOC && SND)
+	select DMADEVICES
+	select DMA_VIRTUAL_CHANNELS
+	help
+	  Dummy config option used to enable hidden SND_SOC configs.
+	  These are normally selected implicitly when a module
+	  that relies on it is configured.
+
+config GKI_HIDDEN_MMC_CONFIGS
+	bool "Hidden MMC configs needed for GKI"
+	select MMC_SDHCI_IO_ACCESSORS if (MMC_SDHCI)
+	help
+	  Dummy config option used to enable hidden MMC configs.
+	  These are normally selected implicitly when a module
+	  that relies on it is configured.
+
+config GKI_HIDDEN_GPIO_CONFIGS
+	bool "Hidden GPIO configs needed for GKI"
+	select PINCTRL_SINGLE if (PINCTRL && OF && HAS_IOMEM)
+	select GPIO_PL061 if (HAS_IOMEM && ARM_AMBA && GPIOLIB)
+	help
+	  Dummy config option used to enable hidden GPIO configs.
+	  These are normally selected implicitly when a module
+	  that relies on it is configured.
+
+config GKI_HIDDEN_QCOM_CONFIGS
+	bool "Hidden QCOM configs needed for GKI"
+	select QCOM_SMEM_STATE
+	select QCOM_GDSC if (ARCH_QCOM)
+	select IOMMU_IO_PGTABLE_LPAE if (ARCH_QCOM)
+	select INTERCONNECT_QCOM if (ARCH_QCOM)
+	help
+	  Dummy config option used to enable hidden QCOM configs.
+	  These are normally selected implicitly when a module
+	  that relies on it is configured.
+
+config GKI_HIDDEN_MEDIA_CONFIGS
+	bool "Hidden Media configs needed for GKI"
+	select VIDEOBUF2_CORE
+	select V4L2_MEM2MEM_DEV
+	select MEDIA_CONTROLLER
+	select MEDIA_CONTROLLER_REQUEST_API
+	select MEDIA_SUPPORT
+	select FRAME_VECTOR
+	select CEC_CORE
+	select CEC_NOTIFIER
+	select CEC_PIN
+	select VIDEOBUF2_DMA_CONTIG
+	select VIDEOBUF2_DMA_SG
+	select VIDEO_V4L2_SUBDEV_API
+	help
+	  Dummy config option used to enable hidden media configs.
+	  These are normally selected implicitly when a module
+	  that relies on it is configured.
+
+config GKI_HIDDEN_VIRTUAL_CONFIGS
+	bool "Hidden Virtual configs needed for GKI"
+	select HVC_DRIVER
+	help
+	  Dummy config option used to enable hidden virtual device configs.
+	  These are normally selected implicitly when a module
+	  that relies on it is configured.
+
+# LEGACY_WEXT_ALLCONFIG Discussed upstream, soundly rejected as a unique
+# problem for GKI to solve.  It should be noted that these extensions are
+# in-effect deprecated and generally unsupported and we should pressure
+# the SOC vendors to drop any modules that require these extensions.
+config GKI_LEGACY_WEXT_ALLCONFIG
+	bool "Hidden wireless extension configs needed for GKI"
+	select WIRELESS_EXT
+	select WEXT_CORE
+	select WEXT_PROC
+	select WEXT_SPY
+	select WEXT_PRIV
+	help
+	  Dummy config option used to enable all the hidden legacy wireless
+	  extensions to the core wireless network functionality used by
+	  add-in modules.
+
+	  If you are not building a kernel to be used for a variety of
+	  out-of-kernel built wireless modules, say N here.
+
+config GKI_HIDDEN_USB_CONFIGS
+	bool "Hidden USB configurations needed for GKI"
+	select USB_PHY
+	help
+	  Dummy config option used to enable all USB related hidden configs.
+	  These configurations are usually only selected by another config
+	  option or a combination of them.
+
+	  If you are not building a kernel to be used for a variety of
+	  out-of-kernel built USB drivers, say N here.
+
+config GKI_HIDDEN_SOC_BUS_CONFIGS
+	bool "Hidden SoC bus configuration needed for GKI"
+	select SOC_BUS
+	help
+	  Dummy config option used to enable the hidden SOC_BUS config.
+	  The configuration is required for SoCs to register themselves to the bus.
+
+	  If you are not building a kernel to be used for a variety of SoCs and
+	  out-of-tree drivers, say N here.
+
+config GKI_HIDDEN_RPMSG_CONFIGS
+	bool "Hidden RPMSG configuration needed for GKI"
+	select RPMSG
+	help
+	  Dummy config option used to enable the hidden RPMSG config.
+	  This configuration is usually only selected by another config
+	  option or a combination of them.
+
+	  If you are not building a kernel to be used for a variety of
+	  out-of-kernel built RPMSG drivers, say N here.
+
+config GKI_HIDDEN_GPU_CONFIGS
+	bool "Hidden GPU configuration needed for GKI"
+	select TRACE_GPU_MEM
+	help
+	  Dummy config option used to enable the hidden GPU config.
+	  These are normally selected implicitly when a module
+	  that relies on it is configured.
+
+config GKI_HIDDEN_IRQ_CONFIGS
+	bool "Hidden IRQ configuration needed for GKI"
+	select GENERIC_IRQ_CHIP
+	select IRQ_DOMAIN_HIERARCHY
+	select IRQ_FASTEOI_HIERARCHY_HANDLERS
+	help
+	  Dummy config option used to enable GENERIC_IRQ_CHIP hidden
+	  config, required by various SoC platforms. This is usually
+	  selected by ARCH_*.
+
+config GKI_HIDDEN_HYPERVISOR_CONFIGS
+	bool "Hidden hypervisor configuration needed for GKI"
+	select SYS_HYPERVISOR
+	help
+	  Dummy config option used to enable the SYS_HYPERVISOR hidden
+	  config, required by various SoC platforms. This is usually
+	  selected by XEN or S390.
+
+config GKI_HIDDEN_NET_CONFIGS
+	bool "Hidden networking configuration needed for GKI"
+	select PAGE_POOL
+	select NET_PTP_CLASSIFY
+	help
+	  Dummy config option used to enable the networking hidden
+	  config, required by various SoC platforms.
+
+config GKI_HIDDEN_PHY_CONFIGS
+	bool "Hidden PHY configuration needed for GKI"
+	select GENERIC_PHY_MIPI_DPHY
+	help
+	  Dummy config option used to enable the hidden PHY configs,
+	  required by various SoC platforms.
+
+config GKI_HIDDEN_MM_CONFIGS
+	bool "Hidden MM configuration needed for GKI"
+	select PAGE_REPORTING
+	select BALLOON_COMPACTION
+	select MEMORY_BALLOON
+	help
+	  Dummy config option used to enable hidden MM configs,
+	  currently required for VIRTIO_BALLOON
+
+config GKI_HIDDEN_ETHERNET_CONFIGS
+	bool "Hidden Ethernet configuration needed for GKI"
+	select PHYLINK
+	help
+	  Dummy config option used to enable the hidden Ethernet PHYLINK
+	  configs, required by various ethernet devices.
+
+config GKI_HIDDEN_DMA_CONFIGS
+	bool "Hidden DMA configuration needed for GKI"
+	select ASYNC_TX_ENABLE_CHANNEL_SWITCH
+	help
+	  Dummy config option used to enable the hidden DMA configs,
+	  required by various SoC platforms.
+
+config GKI_NET_XFRM_HACKS
+	bool "XFRM changes required by Android"
+	help
+	  Android Networking tests fail without this.
+
+# Atrocities needed for
+# a) building GKI modules in separate tree, or
+# b) building drivers that are not modularizable
+#
+# All of these should be reworked into an upstream solution
+# if possible.
+#
+config GKI_HACKS_TO_FIX
+	bool "GKI Dummy config options"
+	select GKI_NET_XFRM_HACKS
+	select GKI_HIDDEN_CRYPTO_CONFIGS
+	select GKI_HIDDEN_DRM_CONFIGS
+	select GKI_HIDDEN_MCP251XFD_CONFIGS
+	select GKI_HIDDEN_REGMAP_CONFIGS
+	select GKI_HIDDEN_SND_CONFIGS
+	select GKI_HIDDEN_SND_SOC_CONFIGS
+	select GKI_HIDDEN_MMC_CONFIGS
+	select GKI_HIDDEN_GPIO_CONFIGS
+	select GKI_HIDDEN_QCOM_CONFIGS
+	select GKI_LEGACY_WEXT_ALLCONFIG
+	select GKI_HIDDEN_MEDIA_CONFIGS
+	select GKI_HIDDEN_VIRTUAL_CONFIGS
+	select GKI_HIDDEN_USB_CONFIGS
+	select GKI_HIDDEN_SOC_BUS_CONFIGS
+	select GKI_HIDDEN_RPMSG_CONFIGS
+	select GKI_HIDDEN_GPU_CONFIGS
+	select GKI_HIDDEN_IRQ_CONFIGS
+	select GKI_HIDDEN_HYPERVISOR_CONFIGS
+	select GKI_HIDDEN_NET_CONFIGS
+	select GKI_HIDDEN_PHY_CONFIGS
+	select GKI_HIDDEN_MM_CONFIGS
+	select GKI_HIDDEN_ETHERNET_CONFIGS
+	select GKI_HIDDEN_DMA_CONFIGS
+	help
+	  Dummy config option used to enable core functionality used by
+	  modules that may not be selectable in this config.
+
+	  Unless you are building a GKI kernel to be used with modules
+	  built from a different config, say N here.
diff --git a/init/init_task.c b/init/init_task.c
index ff6c4b9..31ceb0e 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -210,6 +210,10 @@ struct task_struct init_task
 #ifdef CONFIG_SECCOMP_FILTER
 	.seccomp	= { .filter_count = ATOMIC_INIT(0) },
 #endif
+#ifdef CONFIG_ANDROID_VENDOR_OEM_DATA
+	.android_vendor_data1 = {0, },
+	.android_oem_data1 = {0, },
+#endif
 };
 EXPORT_SYMBOL(init_task);
 
diff --git a/init/main.c b/init/main.c
index aa21add..8937f68 100644
--- a/init/main.c
+++ b/init/main.c
@@ -112,8 +112,6 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/initcall.h>
 
-#include <kunit/test.h>
-
 static int kernel_init(void *);
 
 extern void init_IRQ(void);
@@ -1630,8 +1628,6 @@ static noinline void __init kernel_init_freeable(void)
 
 	do_basic_setup();
 
-	kunit_run_all_tests();
-
 	wait_for_initramfs();
 	console_on_rootfs();
 
diff --git a/kernel/.gitignore b/kernel/.gitignore
index c6b299a..43a96fc 100644
--- a/kernel/.gitignore
+++ b/kernel/.gitignore
@@ -1,3 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 /config_data
 /kheaders.md5
+/gki_module_exported.h
+/gki_module_protected.h
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 341c94f..0f0b283 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -43,3 +43,6 @@
 obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
 $(obj)/relo_core.o: $(srctree)/tools/lib/bpf/relo_core.c FORCE
 	$(call if_changed_rule,cc_o_c)
+ifeq ($(CONFIG_FUSE_BPF),y)
+obj-$(CONFIG_BPF_SYSCALL) += bpf_fuse.o
+endif
diff --git a/kernel/bpf/bpf_fuse.c b/kernel/bpf/bpf_fuse.c
new file mode 100644
index 0000000..c6aa670b
--- /dev/null
+++ b/kernel/bpf/bpf_fuse.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2021 Google LLC
+
+#include <linux/filter.h>
+#include <linux/android_fuse.h>
+
+static const struct bpf_func_proto *
+fuse_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	case BPF_FUNC_trace_printk:
+		return bpf_get_trace_printk_proto();
+
+	case BPF_FUNC_get_current_uid_gid:
+		return &bpf_get_current_uid_gid_proto;
+
+	case BPF_FUNC_get_current_pid_tgid:
+		return &bpf_get_current_pid_tgid_proto;
+
+	case BPF_FUNC_map_lookup_elem:
+		return &bpf_map_lookup_elem_proto;
+
+	case BPF_FUNC_map_update_elem:
+		return &bpf_map_update_elem_proto;
+
+	default:
+		pr_debug("Invalid fuse bpf func %d\n", func_id);
+		return NULL;
+	}
+}
+
+static bool fuse_prog_is_valid_access(int off, int size,
+				enum bpf_access_type type,
+				const struct bpf_prog *prog,
+				struct bpf_insn_access_aux *info)
+{
+	int i;
+
+	if (off < 0 || off > offsetofend(struct fuse_bpf_args, out_args))
+		return false;
+
+	/* TODO This is garbage. Do it properly */
+	for (i = 0; i < FUSE_MAX_IN_ARGS; i++) {
+		if (off == offsetof(struct fuse_bpf_args, in_args[i].value)) {
+			info->reg_type = PTR_TO_BUF;
+			info->ctx_field_size = 256;
+			if (type != BPF_READ)
+				return false;
+			return true;
+		}
+	}
+	for (i = 0; i < FUSE_MAX_OUT_ARGS; i++) {
+		if (off == offsetof(struct fuse_bpf_args, out_args[i].value)) {
+			info->reg_type = PTR_TO_BUF;
+			info->ctx_field_size = 256;
+			return true;
+		}
+	}
+	if (type != BPF_READ)
+		return false;
+
+	return true;
+}
+
+const struct bpf_verifier_ops fuse_verifier_ops = {
+	.get_func_proto  = fuse_prog_func_proto,
+	.is_valid_access = fuse_prog_is_valid_access,
+};
+
+const struct bpf_prog_ops fuse_prog_ops = {
+};
+
+struct bpf_prog *fuse_get_bpf_prog(struct file *file)
+{
+	struct bpf_prog *bpf_prog = ERR_PTR(-EINVAL);
+
+	if (!file || IS_ERR(file))
+		return bpf_prog;
+	/*
+	 * Two ways of getting a bpf prog from another task's fd, since
+	 * bpf_prog_get_type_dev only works with an fd
+	 *
+	 * 1) Duplicate a little of the needed code. Requires access to
+	 *    bpf_prog_fops for validation, which is not exported for modules
+	 * 2) Insert the bpf_file object into a fd from the current task
+	 *    Stupidly complex, but I think OK, as security checks are not run
+	 *    during the existence of the handle
+	 *
+	 * Best would be to upstream 1) into kernel/bpf/syscall.c and export it
+	 * for use here. Failing that, we have to use 2, since fuse must be
+	 * compilable as a module.
+	 */
+#if 1
+	if (file->f_op != &bpf_prog_fops)
+		goto out;
+
+	bpf_prog = file->private_data;
+	if (bpf_prog->type == BPF_PROG_TYPE_FUSE)
+		bpf_prog_inc(bpf_prog);
+	else
+		bpf_prog = ERR_PTR(-EINVAL);
+
+#else
+	{
+		int task_fd = get_unused_fd_flags(file->f_flags);
+
+		if (task_fd < 0)
+			goto out;
+
+		fd_install(task_fd, file);
+
+		bpf_prog = bpf_prog_get_type_dev(task_fd, BPF_PROG_TYPE_FUSE,
+						 false);
+
+		/* Close the fd, which also closes the file */
+		__close_fd(current->files, task_fd);
+		file = NULL;
+	}
+#endif
+
+out:
+	if (file)
+		fput(file);
+	return bpf_prog;
+}
+EXPORT_SYMBOL(fuse_get_bpf_prog);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index a7c2f0c..81dc6f9 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3,6 +3,7 @@
 
 #include <uapi/linux/btf.h>
 #include <uapi/linux/bpf.h>
+#include <uapi/linux/android_fuse.h>
 #include <uapi/linux/bpf_perf_event.h>
 #include <uapi/linux/types.h>
 #include <linux/seq_file.h>
@@ -1403,12 +1404,18 @@ __printf(4, 5) static void __btf_verifier_log_type(struct btf_verifier_env *env,
 	if (!bpf_verifier_log_needed(log))
 		return;
 
-	/* btf verifier prints all types it is processing via
-	 * btf_verifier_log_type(..., fmt = NULL).
-	 * Skip those prints for in-kernel BTF verification.
-	 */
-	if (log->level == BPF_LOG_KERNEL && !fmt)
-		return;
+	if (log->level == BPF_LOG_KERNEL) {
+		/* btf verifier prints all types it is processing via
+		 * btf_verifier_log_type(..., fmt = NULL).
+		 * Skip those prints for in-kernel BTF verification.
+		 */
+		if (!fmt)
+			return;
+
+		/* Skip logging when loading module BTF with mismatches permitted */
+		if (env->btf->base_btf && IS_ENABLED(CONFIG_MODULE_ALLOW_BTF_MISMATCH))
+			return;
+	}
 
 	__btf_verifier_log(log, "[%u] %s %s%s",
 			   env->log_type_id,
@@ -1447,8 +1454,15 @@ static void btf_verifier_log_member(struct btf_verifier_env *env,
 	if (!bpf_verifier_log_needed(log))
 		return;
 
-	if (log->level == BPF_LOG_KERNEL && !fmt)
-		return;
+	if (log->level == BPF_LOG_KERNEL) {
+		if (!fmt)
+			return;
+
+		/* Skip logging when loading module BTF with mismatches permitted */
+		if (env->btf->base_btf && IS_ENABLED(CONFIG_MODULE_ALLOW_BTF_MISMATCH))
+			return;
+	}
+
 	/* The CHECK_META phase already did a btf dump.
 	 *
 	 * If member is logged again, it must hit an error in
@@ -7111,11 +7125,14 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
 		}
 		btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size);
 		if (IS_ERR(btf)) {
-			pr_warn("failed to validate module [%s] BTF: %ld\n",
-				mod->name, PTR_ERR(btf));
 			kfree(btf_mod);
-			if (!IS_ENABLED(CONFIG_MODULE_ALLOW_BTF_MISMATCH))
+			if (!IS_ENABLED(CONFIG_MODULE_ALLOW_BTF_MISMATCH)) {
+				pr_warn("failed to validate module [%s] BTF: %ld\n",
+					mod->name, PTR_ERR(btf));
 				err = PTR_ERR(btf);
+			} else {
+				pr_warn_once("Kernel module BTF mismatch detected, BTF debug info may be unavailable for some modules\n");
+			}
 			goto out;
 		}
 		err = btf_alloc_id(btf);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6c61dba..816579f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -36,6 +36,8 @@
 #include <linux/memcontrol.h>
 #include <linux/trace_events.h>
 
+#include <trace/hooks/syscall_check.h>
+
 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
 			  (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
 			  (map)->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
@@ -4936,6 +4938,8 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 	if (copy_from_bpfptr(&attr, uattr, size) != 0)
 		return -EFAULT;
 
+	trace_android_vh_check_bpf_syscall(cmd, &attr, size);
+
 	err = security_bpf(cmd, &attr, size);
 	if (err < 0)
 		return err;
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 52bb5a74..c6822bf 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -17,6 +17,7 @@
 #include <linux/fs_parser.h>
 
 #include <trace/events/cgroup.h>
+#include <trace/hooks/cgroup.h>
 
 /*
  * pidlists linger the following amount before being destroyed.  The goal
@@ -514,13 +515,15 @@ static ssize_t __cgroup1_procs_write(struct kernfs_open_file *of,
 	tcred = get_task_cred(task);
 	if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
 	    !uid_eq(cred->euid, tcred->uid) &&
-	    !uid_eq(cred->euid, tcred->suid))
+	    !uid_eq(cred->euid, tcred->suid) &&
+	    !ns_capable(tcred->user_ns, CAP_SYS_NICE))
 		ret = -EACCES;
 	put_cred(tcred);
 	if (ret)
 		goto out_finish;
 
 	ret = cgroup_attach_task(cgrp, task, threadgroup);
+	trace_android_vh_cgroup_set_task(ret, task);
 
 out_finish:
 	cgroup_procs_write_finish(task, locked);
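
The relaxed check above can be read in isolation as the following rule (an illustrative helper, not actual kernel code):

	/* May a writer with credentials @cred migrate a task with
	 * credentials @tcred in a cgroup v1 hierarchy?
	 */
	static bool cgroup1_may_migrate(const struct cred *cred,
					const struct cred *tcred)
	{
		return uid_eq(cred->euid, GLOBAL_ROOT_UID) ||
		       uid_eq(cred->euid, tcred->uid) ||
		       uid_eq(cred->euid, tcred->suid) ||
		       ns_capable(tcred->user_ns, CAP_SYS_NICE);
	}
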
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 2319946..7f20e57 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -62,6 +62,9 @@
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/cgroup.h>
+#undef CREATE_TRACE_POINTS
+
+#include <trace/hooks/cgroup.h>
 
 #define CGROUP_FILE_NAME_MAX		(MAX_CGROUP_TYPE_NAMELEN +	\
 					 MAX_CFTYPE_NAME + 2)
@@ -2496,6 +2499,7 @@ struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset,
 
 	return cgroup_taskset_next(tset, dst_cssp);
 }
+EXPORT_SYMBOL_GPL(cgroup_taskset_first);
 
 /**
  * cgroup_taskset_next - iterate to the next task in taskset
@@ -2542,6 +2546,7 @@ struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset,
 
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(cgroup_taskset_next);
 
 /**
  * cgroup_migrate_execute - migrate a taskset
@@ -2612,6 +2617,7 @@ static int cgroup_migrate_execute(struct cgroup_mgctx *mgctx)
 		do_each_subsys_mask(ss, ssid, mgctx->ss_mask) {
 			if (ss->attach) {
 				tset->ssid = ssid;
+				trace_android_vh_cgroup_attach(ss, tset);
 				ss->attach(tset);
 			}
 		} while_each_subsys_mask();
@@ -4557,6 +4563,7 @@ struct cgroup_subsys_state *css_next_child(struct cgroup_subsys_state *pos,
 		return next;
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(css_next_child);
 
 /**
  * css_next_descendant_pre - find the next descendant for pre-order walk
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index a753adc..74c0837 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -67,6 +67,8 @@
 #include <linux/cgroup.h>
 #include <linux/wait.h>
 
+#include <trace/hooks/sched.h>
+
 DEFINE_STATIC_KEY_FALSE(cpusets_pre_enable_key);
 DEFINE_STATIC_KEY_FALSE(cpusets_enabled_key);
 
@@ -137,6 +139,7 @@ struct cpuset {
 
 	/* user-configured CPUs and Memory Nodes allow to tasks */
 	cpumask_var_t cpus_allowed;
+	cpumask_var_t cpus_requested;
 	nodemask_t mems_allowed;
 
 	/* effective CPUs and Memory Nodes allow to tasks */
@@ -404,17 +407,6 @@ static struct cpuset top_cpuset = {
  */
 
 DEFINE_STATIC_PERCPU_RWSEM(cpuset_rwsem);
-
-void cpuset_read_lock(void)
-{
-	percpu_down_read(&cpuset_rwsem);
-}
-
-void cpuset_read_unlock(void)
-{
-	percpu_up_read(&cpuset_rwsem);
-}
-
 static DEFINE_SPINLOCK(callback_lock);
 
 static struct workqueue_struct *cpuset_migrate_mm_wq;
@@ -576,7 +568,7 @@ static void cpuset_update_task_spread_flag(struct cpuset *cs,
 
 static int is_cpuset_subset(const struct cpuset *p, const struct cpuset *q)
 {
-	return	cpumask_subset(p->cpus_allowed, q->cpus_allowed) &&
+	return	cpumask_subset(p->cpus_requested, q->cpus_requested) &&
 		nodes_subset(p->mems_allowed, q->mems_allowed) &&
 		is_cpu_exclusive(p) <= is_cpu_exclusive(q) &&
 		is_mem_exclusive(p) <= is_mem_exclusive(q);
@@ -613,8 +605,13 @@ static inline int alloc_cpumasks(struct cpuset *cs, struct tmpmasks *tmp)
 	if (!zalloc_cpumask_var(pmask3, GFP_KERNEL))
 		goto free_two;
 
+	if (cs && !zalloc_cpumask_var(&cs->cpus_requested, GFP_KERNEL))
+		goto free_three;
+
 	return 0;
 
+free_three:
+	free_cpumask_var(*pmask3);
 free_two:
 	free_cpumask_var(*pmask2);
 free_one:
@@ -631,6 +628,7 @@ static inline void free_cpumasks(struct cpuset *cs, struct tmpmasks *tmp)
 {
 	if (cs) {
 		free_cpumask_var(cs->cpus_allowed);
+		free_cpumask_var(cs->cpus_requested);
 		free_cpumask_var(cs->effective_cpus);
 		free_cpumask_var(cs->subparts_cpus);
 	}
@@ -659,6 +657,7 @@ static struct cpuset *alloc_trial_cpuset(struct cpuset *cs)
 	}
 
 	cpumask_copy(trial->cpus_allowed, cs->cpus_allowed);
+	cpumask_copy(trial->cpus_requested, cs->cpus_requested);
 	cpumask_copy(trial->effective_cpus, cs->effective_cpus);
 	return trial;
 }
@@ -773,7 +772,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
 	cpuset_for_each_child(c, css, par) {
 		if ((is_cpu_exclusive(trial) || is_cpu_exclusive(c)) &&
 		    c != cur &&
-		    cpumask_intersects(trial->cpus_allowed, c->cpus_allowed))
+		    cpumask_intersects(trial->cpus_requested, c->cpus_requested))
 			goto out;
 		if ((is_mem_exclusive(trial) || is_mem_exclusive(c)) &&
 		    c != cur &&
@@ -1197,6 +1196,19 @@ void rebuild_sched_domains(void)
 	percpu_up_write(&cpuset_rwsem);
 	cpus_read_unlock();
 }
+EXPORT_SYMBOL_GPL(rebuild_sched_domains);
+
+static int update_cpus_allowed(struct cpuset *cs, struct task_struct *p,
+				const struct cpumask *new_mask)
+{
+	int ret = -EINVAL;
+
+	trace_android_rvh_update_cpus_allowed(p, cs->cpus_requested, new_mask, &ret);
+	if (!ret)
+		return ret;
+
+	return set_cpus_allowed_ptr(p, new_mask);
+}
 
 /**
  * update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
@@ -1220,7 +1232,7 @@ static void update_tasks_cpumask(struct cpuset *cs)
 		if (top_cs && (task->flags & PF_KTHREAD) &&
 		    kthread_is_per_cpu(task))
 			continue;
-		set_cpus_allowed_ptr(task, cs->effective_cpus);
+		update_cpus_allowed(cs, task, cs->effective_cpus);
 	}
 	css_task_iter_end(&it);
 }
@@ -1242,10 +1254,10 @@ static void compute_effective_cpumask(struct cpumask *new_cpus,
 	if (parent->nr_subparts_cpus) {
 		cpumask_or(new_cpus, parent->effective_cpus,
 			   parent->subparts_cpus);
-		cpumask_and(new_cpus, new_cpus, cs->cpus_allowed);
+		cpumask_and(new_cpus, new_cpus, cs->cpus_requested);
 		cpumask_and(new_cpus, new_cpus, cpu_active_mask);
 	} else {
-		cpumask_and(new_cpus, cs->cpus_allowed, parent->effective_cpus);
+		cpumask_and(new_cpus, cs->cpus_requested, parent->effective_cpus);
 	}
 }
 
@@ -1737,25 +1749,26 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 		return -EACCES;
 
 	/*
-	 * An empty cpus_allowed is ok only if the cpuset has no tasks.
+	 * An empty cpus_requested is ok only if the cpuset has no tasks.
 	 * Since cpulist_parse() fails on an empty mask, we special case
 	 * that parsing.  The validate_change() call ensures that cpusets
 	 * with tasks have cpus.
 	 */
 	if (!*buf) {
-		cpumask_clear(trialcs->cpus_allowed);
+		cpumask_clear(trialcs->cpus_requested);
 	} else {
-		retval = cpulist_parse(buf, trialcs->cpus_allowed);
+		retval = cpulist_parse(buf, trialcs->cpus_requested);
 		if (retval < 0)
 			return retval;
-
-		if (!cpumask_subset(trialcs->cpus_allowed,
-				    top_cpuset.cpus_allowed))
-			return -EINVAL;
 	}
 
+	if (!cpumask_subset(trialcs->cpus_requested, cpu_present_mask))
+		return -EINVAL;
+
+	cpumask_and(trialcs->cpus_allowed, trialcs->cpus_requested, cpu_active_mask);
+
 	/* Nothing to do if the cpus didn't change */
-	if (cpumask_equal(cs->cpus_allowed, trialcs->cpus_allowed))
+	if (cpumask_equal(cs->cpus_requested, trialcs->cpus_requested))
 		return 0;
 
 #ifdef CONFIG_CPUMASK_OFFSTACK
@@ -1810,6 +1823,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 				  parent_cs(cs));
 	spin_lock_irq(&callback_lock);
 	cpumask_copy(cs->cpus_allowed, trialcs->cpus_allowed);
+	cpumask_copy(cs->cpus_requested, trialcs->cpus_requested);
 
 	/*
 	 * Make sure that subparts_cpus, if not empty, is a subset of
@@ -2528,7 +2542,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 		 * can_attach beforehand should guarantee that this doesn't
 		 * fail.  TODO: have a better way to handle failure here
 		 */
-		WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpus_attach));
+		WARN_ON_ONCE(update_cpus_allowed(cs, task, cpus_attach));
 
 		cpuset_change_task_nodemask(task, &cpuset_attach_nodemask_to);
 		cpuset_update_task_spread_flag(cs, task);
@@ -2752,7 +2766,7 @@ static int cpuset_common_seq_show(struct seq_file *sf, void *v)
 
 	switch (type) {
 	case FILE_CPULIST:
-		seq_printf(sf, "%*pbl\n", cpumask_pr_args(cs->cpus_allowed));
+		seq_printf(sf, "%*pbl\n", cpumask_pr_args(cs->cpus_requested));
 		break;
 	case FILE_MEMLIST:
 		seq_printf(sf, "%*pbl\n", nodemask_pr_args(&cs->mems_allowed));
@@ -3141,6 +3155,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
 	cs->mems_allowed = parent->mems_allowed;
 	cs->effective_mems = parent->mems_allowed;
 	cpumask_copy(cs->cpus_allowed, parent->cpus_allowed);
+	cpumask_copy(cs->cpus_requested, parent->cpus_requested);
 	cpumask_copy(cs->effective_cpus, parent->cpus_allowed);
 	spin_unlock_irq(&callback_lock);
 out_unlock:
@@ -3257,8 +3272,10 @@ int __init cpuset_init(void)
 	BUG_ON(!alloc_cpumask_var(&top_cpuset.cpus_allowed, GFP_KERNEL));
 	BUG_ON(!alloc_cpumask_var(&top_cpuset.effective_cpus, GFP_KERNEL));
 	BUG_ON(!zalloc_cpumask_var(&top_cpuset.subparts_cpus, GFP_KERNEL));
+	BUG_ON(!alloc_cpumask_var(&top_cpuset.cpus_requested, GFP_KERNEL));
 
 	cpumask_setall(top_cpuset.cpus_allowed);
+	cpumask_setall(top_cpuset.cpus_requested);
 	nodes_setall(top_cpuset.mems_allowed);
 	cpumask_setall(top_cpuset.effective_cpus);
 	nodes_setall(top_cpuset.effective_mems);
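
Taken together, the cpus_requested plumbing above separates what userspace asked for from what the kernel can currently honor. A simplified model of the mask relationships (illustrative, not literal kernel code):

	/*
	 *   cpus_requested : the mask written to cpuset.cpus, validated only
	 *                    against cpu_present_mask, so it may keep naming
	 *                    CPUs that are currently offline
	 *   cpus_allowed   = cpus_requested & cpu_active_mask
	 *   effective_cpus = cpus_requested & parent->effective_cpus
	 *
	 * Reads of cpuset.cpus report cpus_requested, so a CPU that is
	 * hotplugged back in reappears without userspace rewriting the file.
	 */
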
diff --git a/kernel/cred.c b/kernel/cred.c
index e10c15f..d4a529c 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -17,6 +17,8 @@
 #include <linux/cn_proc.h>
 #include <linux/uidgid.h>
 
+#include <trace/hooks/creds.h>
+
 #if 0
 #define kdebug(FMT, ...)						\
 	printk("[%-5.5s%5u] " FMT "\n",					\
@@ -181,6 +183,7 @@ void exit_creds(struct task_struct *tsk)
 	key_put(tsk->cached_requested_key);
 	tsk->cached_requested_key = NULL;
 #endif
+	trace_android_rvh_exit_creds(tsk, cred);
 }
 
 /**
@@ -499,6 +502,7 @@ int commit_creds(struct cred *new)
 		inc_rlimit_ucounts(new->ucounts, UCOUNT_RLIMIT_NPROC, 1);
 	rcu_assign_pointer(task->real_cred, new);
 	rcu_assign_pointer(task->cred, new);
+	trace_android_rvh_commit_creds(task, new);
 	if (new->user != old->user || new->user_ns != old->user_ns)
 		dec_rlimit_ucounts(old->ucounts, UCOUNT_RLIMIT_NPROC, 1);
 	alter_cred_subscribers(old, -2);
@@ -576,6 +580,7 @@ const struct cred *override_creds(const struct cred *new)
 	get_new_cred((struct cred *)new);
 	alter_cred_subscribers(new, 1);
 	rcu_assign_pointer(current->cred, new);
+	trace_android_rvh_override_creds(current, new);
 	alter_cred_subscribers(old, -1);
 
 	kdebug("override_creds() = %p{%d,%d}", old,
@@ -604,6 +609,7 @@ void revert_creds(const struct cred *old)
 	validate_creds(override);
 	alter_cred_subscribers(old, 1);
 	rcu_assign_pointer(current->cred, old);
+	trace_android_rvh_revert_creds(current, old);
 	alter_cred_subscribers(override, -1);
 	put_cred(override);
 }
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 6ea80ae..d06c598 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -58,6 +58,7 @@
 #endif
 
 struct cma *dma_contiguous_default_area;
+EXPORT_SYMBOL_GPL(dma_contiguous_default_area);
 
 /*
  * Default global CMA area size can be defined in kernel's .config.
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 18c93c2..082329b 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -1134,10 +1134,11 @@ static void check_sync(struct device *dev,
 				dir2name[entry->direction],
 				dir2name[ref->direction]);
 
+	/* a sync may legitimately cover fewer sg entries than were mapped */
 	if (ref->sg_call_ents && ref->type == dma_debug_sg &&
-	    ref->sg_call_ents != entry->sg_call_ents) {
+	    ref->sg_call_ents > entry->sg_call_ents) {
 		err_printk(ref->dev, entry, "device driver syncs "
-			   "DMA sg list with different entry count "
+			   "DMA sg list count larger than map count "
 			   "[map count=%d] [sync count=%d]\n",
 			   entry->sg_call_ents, ref->sg_call_ents);
 	}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3b9e861..4ffe195 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4516,6 +4516,7 @@ int perf_event_read_local(struct perf_event *event, u64 *value,
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(perf_event_read_local);
 
 static int perf_event_read(struct perf_event *event, bool group)
 {
@@ -6591,23 +6592,12 @@ static void perf_pending_task(struct callback_head *head)
 #ifdef CONFIG_GUEST_PERF_EVENTS
 struct perf_guest_info_callbacks __rcu *perf_guest_cbs;
 
-DEFINE_STATIC_CALL_RET0(__perf_guest_state, *perf_guest_cbs->state);
-DEFINE_STATIC_CALL_RET0(__perf_guest_get_ip, *perf_guest_cbs->get_ip);
-DEFINE_STATIC_CALL_RET0(__perf_guest_handle_intel_pt_intr, *perf_guest_cbs->handle_intel_pt_intr);
-
 void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
 {
 	if (WARN_ON_ONCE(rcu_access_pointer(perf_guest_cbs)))
 		return;
 
 	rcu_assign_pointer(perf_guest_cbs, cbs);
-	static_call_update(__perf_guest_state, cbs->state);
-	static_call_update(__perf_guest_get_ip, cbs->get_ip);
-
-	/* Implementing ->handle_intel_pt_intr is optional. */
-	if (cbs->handle_intel_pt_intr)
-		static_call_update(__perf_guest_handle_intel_pt_intr,
-				   cbs->handle_intel_pt_intr);
 }
 EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
 
@@ -6617,10 +6607,6 @@ void perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
 		return;
 
 	rcu_assign_pointer(perf_guest_cbs, NULL);
-	static_call_update(__perf_guest_state, (void *)&__static_call_return0);
-	static_call_update(__perf_guest_get_ip, (void *)&__static_call_return0);
-	static_call_update(__perf_guest_handle_intel_pt_intr,
-			   (void *)&__static_call_return0);
 	synchronize_rcu();
 }
 EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
diff --git a/kernel/fork.c b/kernel/fork.c
index 844dfdc..d37b266 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -97,6 +97,7 @@
 #include <linux/scs.h>
 #include <linux/io_uring.h>
 #include <linux/bpf.h>
+#include <linux/cpufreq_times.h>
 
 #include <asm/pgalloc.h>
 #include <linux/uaccess.h>
@@ -109,6 +110,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/task.h>
 
+#undef CREATE_TRACE_POINTS
+#include <trace/hooks/sched.h>
 /*
  * Minimum number of threads to boot the kernel
  */
@@ -119,6 +122,8 @@
  */
 #define MAX_THREADS FUTEX_TID_MASK
 
+EXPORT_TRACEPOINT_SYMBOL_GPL(task_newtask);
+
 /*
  * Protected counters by write_lock_irq(&tasklist_lock)
  */
@@ -139,6 +144,7 @@ static const char * const resident_page_types[] = {
 DEFINE_PER_CPU(unsigned long, process_counts) = 0;
 
 __cacheline_aligned DEFINE_RWLOCK(tasklist_lock);  /* outer */
+EXPORT_SYMBOL_GPL(tasklist_lock);
 
 #ifdef CONFIG_PROVE_RCU
 int lockdep_tasklist_lock_is_held(void)
@@ -538,9 +544,11 @@ void free_task(struct task_struct *tsk)
 #ifdef CONFIG_SECCOMP
 	WARN_ON_ONCE(tsk->seccomp.filter);
 #endif
+	cpufreq_task_times_exit(tsk);
 	release_user_cpus_ptr(tsk);
 	scs_release(tsk);
 
+	trace_android_vh_free_task(tsk);
 #ifndef CONFIG_THREAD_INFO_IN_TASK
 	/*
 	 * The task is finally done with both the stack and thread_info,
@@ -1050,6 +1058,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	tsk->reported_split_lock = 0;
 #endif
 
+	android_init_vendor_data(tsk, 1);
+	android_init_oem_data(tsk, 1);
+
 	return tsk;
 
 free_stack:
@@ -2099,6 +2110,8 @@ static __latent_entropy struct task_struct *copy_process(
 		siginitsetinv(&p->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
 	}
 
+	cpufreq_task_times_init(p);
+
 	p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? args->child_tid : NULL;
 	/*
 	 * Clear TID on mm_release()?
@@ -2679,6 +2692,8 @@ pid_t kernel_clone(struct kernel_clone_args *args)
 	if (IS_ERR(p))
 		return PTR_ERR(p);
 
+	cpufreq_task_times_alloc(p);
+
 	/*
 	 * Do this prior waking up the new thread - the thread pointer
 	 * might get invalid after that point, if the thread exits quickly.
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 514e458..94d6e02 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -40,6 +40,7 @@
 
 #include "futex.h"
 #include "../locking/rtmutex_common.h"
+#include <trace/hooks/futex.h>
 
 /*
  * The base of the bucket array and its size are always used together
@@ -543,6 +544,7 @@ void futex_q_unlock(struct futex_hash_bucket *hb)
 void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb)
 {
 	int prio;
+	bool already_on_hb = false;
 
 	/*
 	 * The priority used to register this element is
@@ -555,7 +557,9 @@ void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb)
 	prio = min(current->normal_prio, MAX_RT_PRIO);
 
 	plist_node_init(&q->list, prio);
-	plist_add(&q->list, &hb->chain);
+	trace_android_vh_alter_futex_plist_add(&q->list, &hb->chain, &already_on_hb);
+	if (!already_on_hb)
+		plist_add(&q->list, &hb->chain);
 	q->task = current;
 }
 
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index c71889f..6766fd4 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -24,6 +24,8 @@
 #include <linux/sched/sysctl.h>
 
 #include <trace/events/sched.h>
+#undef CREATE_TRACE_POINTS
+#include <trace/hooks/hung_task.h>
 
 /*
  * The number of tasks checked:
@@ -180,6 +182,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	int max_count = sysctl_hung_task_check_count;
 	unsigned long last_break = jiffies;
 	struct task_struct *g, *t;
+	bool need_check = true;
 
 	/*
 	 * If the system crashed already then all bets are off,
@@ -204,12 +207,16 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 		 * skip the TASK_KILLABLE tasks -- these can be killed
 		 * skip the TASK_IDLE tasks -- those are genuinely idle
 		 */
-		state = READ_ONCE(t->__state);
-		if ((state & TASK_UNINTERRUPTIBLE) &&
-		    !(state & TASK_WAKEKILL) &&
-		    !(state & TASK_NOLOAD))
-			check_hung_task(t, timeout);
+		trace_android_vh_check_uninterrupt_tasks(t, timeout, &need_check);
+		if (need_check) {
+			state = READ_ONCE(t->__state);
+			if ((state & TASK_UNINTERRUPTIBLE) &&
+			    !(state & TASK_WAKEKILL) &&
+			    !(state & TASK_NOLOAD))
+				check_hung_task(t, timeout);
+		}
 	}
+	trace_android_vh_check_uninterrupt_tasks_done(NULL);
  unlock:
 	rcu_read_unlock();
 	if (hung_task_show_lock)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 8ac37e8..4ca134b9 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -14,6 +14,7 @@
 #include <linux/interrupt.h>
 #include <linux/kernel_stat.h>
 #include <linux/irqdomain.h>
+#include <linux/wakeup_reason.h>
 
 #include <trace/events/irq.h>
 
@@ -509,8 +510,22 @@ static bool irq_may_run(struct irq_desc *desc)
 	 * If the interrupt is not in progress and is not an armed
 	 * wakeup interrupt, proceed.
 	 */
-	if (!irqd_has_set(&desc->irq_data, mask))
+	if (!irqd_has_set(&desc->irq_data, mask)) {
+#ifdef CONFIG_PM_SLEEP
+		if (unlikely(desc->no_suspend_depth &&
+			     irqd_is_wakeup_set(&desc->irq_data))) {
+			unsigned int irq = irq_desc_get_irq(desc);
+			const char *name = "(unnamed)";
+
+			if (desc->action && desc->action->name)
+				name = desc->action->name;
+
+			log_abnormal_wakeup_reason("misconfigured IRQ %u %s",
+						   irq, name);
+		}
+#endif
 		return true;
+	}
 
 	/*
 	 * If the interrupt is an armed wakeup source, mark it pending
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index fd09962..53b6358 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -355,9 +355,7 @@ struct irq_desc *irq_to_desc(unsigned int irq)
 {
 	return radix_tree_lookup(&irq_desc_tree, irq);
 }
-#ifdef CONFIG_KVM_BOOK3S_64_HV_MODULE
 EXPORT_SYMBOL_GPL(irq_to_desc);
-#endif
 
 static void delete_irq_desc(unsigned int irq)
 {
@@ -936,6 +934,7 @@ unsigned int kstat_irqs_cpu(unsigned int irq, int cpu)
 	return desc && desc->kstat_irqs ?
 			*per_cpu_ptr(desc->kstat_irqs, cpu) : 0;
 }
+EXPORT_SYMBOL_GPL(kstat_irqs_cpu);
 
 static bool irq_is_nmi(struct irq_desc *desc)
 {
@@ -993,3 +992,4 @@ void __irq_set_lockdep_class(unsigned int irq, struct lock_class_key *lock_class
 }
 EXPORT_SYMBOL_GPL(__irq_set_lockdep_class);
 #endif
+EXPORT_SYMBOL_GPL(kstat_irqs_usr);
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 7afa40f..fa98e18 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -170,6 +170,7 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
 	return true;
 #endif /* CONFIG_SMP */
 }
+EXPORT_SYMBOL_GPL(irq_work_queue_on);
 
 bool irq_work_needs_cpu(void)
 {
diff --git a/kernel/kthread.c b/kernel/kthread.c
index f97fd01..2610210 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -541,6 +541,7 @@ void kthread_bind_mask(struct task_struct *p, const struct cpumask *mask)
 {
 	__kthread_bind_mask(p, mask, TASK_UNINTERRUPTIBLE);
 }
+EXPORT_SYMBOL_GPL(kthread_bind_mask);
 
 /**
  * kthread_bind - bind a just-created kthread to a cpu.
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 4487359..7e2a83a 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -31,6 +31,7 @@
 
 #ifndef CONFIG_PREEMPT_RT
 #include "lock_events.h"
+#include <trace/hooks/rwsem.h>
 
 /*
  * The least significant 2 bits of the owner value has the following
@@ -330,6 +331,7 @@ void __init_rwsem(struct rw_semaphore *sem, const char *name,
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 	osq_lock_init(&sem->osq);
 #endif
+	trace_android_vh_rwsem_init(sem);
 }
 EXPORT_SYMBOL(__init_rwsem);
 
@@ -1008,6 +1010,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	long rcnt = (count >> RWSEM_READER_SHIFT);
 	struct rwsem_waiter waiter;
 	DEFINE_WAKE_Q(wake_q);
+	bool already_on_list = false;
 
 	/*
 	 * To prevent a constant stream of readers from starving a sleeping
@@ -1064,12 +1067,17 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 		}
 		adjustment += RWSEM_FLAG_WAITERS;
 	}
-	rwsem_add_waiter(sem, &waiter);
+	trace_android_vh_alter_rwsem_list_add(
+					&waiter,
+					sem, &already_on_list);
+	if (!already_on_list)
+		rwsem_add_waiter(sem, &waiter);
 
 	/* we're now waiting on the lock, but no longer actively locking */
 	count = atomic_long_add_return(adjustment, &sem->count);
 
 	rwsem_cond_wake_waiter(sem, count, &wake_q);
+	trace_android_vh_rwsem_wake(sem);
 	raw_spin_unlock_irq(&sem->wait_lock);
 
 	if (!wake_q_empty(&wake_q))
@@ -1117,6 +1125,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 {
 	struct rwsem_waiter waiter;
 	DEFINE_WAKE_Q(wake_q);
+	bool already_on_list = false;
 
 	/* do optimistic spinning and steal lock if possible */
 	if (rwsem_can_spin_on_owner(sem) && rwsem_optimistic_spin(sem)) {
@@ -1134,7 +1143,11 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	waiter.handoff_set = false;
 
 	raw_spin_lock_irq(&sem->wait_lock);
-	rwsem_add_waiter(sem, &waiter);
+	trace_android_vh_alter_rwsem_list_add(
+					&waiter,
+					sem, &already_on_list);
+	if (!already_on_list)
+		rwsem_add_waiter(sem, &waiter);
 
 	/* we're now waiting on the lock */
 	if (rwsem_first_waiter(sem) != &waiter) {
@@ -1153,6 +1166,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 		atomic_long_or(RWSEM_FLAG_WAITERS, &sem->count);
 	}
 
+	trace_android_vh_rwsem_wake(sem);
 	/* wait until we successfully acquire the lock */
 	set_current_state(state);
 	trace_contention_begin(sem, LCB_F_WRITE);
@@ -1612,6 +1626,7 @@ EXPORT_SYMBOL(up_read);
 void up_write(struct rw_semaphore *sem)
 {
 	rwsem_release(&sem->dep_map, _RET_IP_);
+	trace_android_vh_rwsem_write_finished(sem);
 	__up_write(sem);
 }
 EXPORT_SYMBOL(up_write);
@@ -1622,6 +1637,7 @@ EXPORT_SYMBOL(up_write);
 void downgrade_write(struct rw_semaphore *sem)
 {
 	lock_downgrade(&sem->dep_map, _RET_IP_);
+	trace_android_vh_rwsem_write_finished(sem);
 	__downgrade_write(sem);
 }
 EXPORT_SYMBOL(downgrade_write);
diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
index 26ea5d0..4f0ccf6 100644
--- a/kernel/module/Kconfig
+++ b/kernel/module/Kconfig
@@ -117,6 +117,19 @@
 	  Reject unsigned modules or signed modules for which we don't have a
 	  key.  Without this, such modules will simply taint the kernel.
 
+config MODULE_SIG_PROTECT
+	bool "Android GKI module protection"
+	depends on MODULE_SIG && !MODULE_SIG_FORCE
+	help
+	  Enables Android GKI symbol and export protection support.
+
+	  This modifies the behavior of MODULE_SIG_FORCE as follows:
+	  - Allows Android GKI Modules signed using MODULE_SIG_ALL during build.
+	  - Allows other modules to load if they do not access Android GKI
+	    protected symbols and do not export symbols already exported by
+	    the Android GKI modules. Loading fails with -EACCES (Permission
+	    denied) if these symbol access conditions are not met.
+
 config MODULE_SIG_ALL
 	bool "Automatically sign all modules"
 	default y
diff --git a/kernel/module/Makefile b/kernel/module/Makefile
index 948efea..6a55a3f 100644
--- a/kernel/module/Makefile
+++ b/kernel/module/Makefile
@@ -10,6 +10,7 @@
 obj-y += main.o strict_rwx.o
 obj-$(CONFIG_MODULE_DECOMPRESS) += decompress.o
 obj-$(CONFIG_MODULE_SIG) += signing.o
+obj-$(CONFIG_MODULE_SIG_PROTECT) += gki_module.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
 obj-$(CONFIG_MODULES_TREE_LOOKUP) += tree_lookup.o
 obj-$(CONFIG_DEBUG_KMEMLEAK) += debug_kmemleak.o
@@ -19,3 +20,28 @@
 obj-$(CONFIG_KGDB_KDB) += kdb.o
 obj-$(CONFIG_MODVERSIONS) += version.o
 obj-$(CONFIG_MODULE_UNLOAD_TAINT_TRACKING) += tracking.o
+
+#
+# ANDROID: GKI: Generate header files required for gki_module.o
+#
+# Dependencies on generated files need to be listed explicitly
+$(obj)/gki_module.o: $(obj)/gki_module_protected_exports.h \
+			$(obj)/gki_module_unprotected.h
+
+ALL_KMI_SYMBOLS := all_kmi_symbols
+
+$(obj)/gki_module_unprotected.h: $(ALL_KMI_SYMBOLS) \
+				$(srctree)/scripts/gen_gki_modules_headers.sh
+	$(Q)$(CONFIG_SHELL) $(srctree)/scripts/gen_gki_modules_headers.sh $@ \
+	"$(srctree)" \
+	$(ALL_KMI_SYMBOLS)
+
+# Generate the union of all KMI symbol lists for arm64; empty for other arches
+$(ALL_KMI_SYMBOLS): $(if $(filter arm64,$(ARCH)),$(wildcard $(srctree)/android/abi_gki_aarch64_*),)
+	$(if $(strip $^),cat $^ > $(ALL_KMI_SYMBOLS), echo "" > $(ALL_KMI_SYMBOLS))
+
+$(obj)/gki_module_protected_exports.h: $(srctree)/android/abi_gki_protected_exports \
+				$(srctree)/scripts/gen_gki_modules_headers.sh
+	$(Q)$(CONFIG_SHELL) $(srctree)/scripts/gen_gki_modules_headers.sh $@ \
+	"$(srctree)" \
+	$<
diff --git a/kernel/module/gki_module.c b/kernel/module/gki_module.c
new file mode 100644
index 0000000..4f124f9
--- /dev/null
+++ b/kernel/module/gki_module.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2022 Google LLC
+ * Author: ramjiyani@google.com (Ramji Jiyani)
+ */
+
+#include <linux/bsearch.h>
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/printk.h>
+#include <linux/string.h>
+
+/*
+ * Build time generated header files
+ *
+ * gki_module_protected_exports.h -- Symbols protected from _export_ by unsigned modules
+ * gki_module_unprotected.h -- Symbols allowed to _access_ by unsigned modules
+ */
+#include "gki_module_protected_exports.h"
+#include "gki_module_unprotected.h"
+
+#define MAX_STRCMP_LEN (max(MAX_UNPROTECTED_NAME_LEN, MAX_PROTECTED_EXPORTS_NAME_LEN))
+
+/* bsearch() comparision callback */
+static int cmp_name(const void *sym, const void *protected_sym)
+{
+	return strncmp(sym, protected_sym, MAX_STRCMP_LEN);
+}
+
+/**
+ * gki_is_module_protected_export - Is a symbol exported from a protected GKI module?
+ *
+ * @name:	Symbol being checked against exported symbols from protected GKI modules
+ */
+bool gki_is_module_protected_export(const char *name)
+{
+	if (NR_UNPROTECTED_SYMBOLS) {
+		return bsearch(name, gki_protected_exports_symbols, NR_PROTECTED_EXPORTS_SYMBOLS,
+		       MAX_PROTECTED_EXPORTS_NAME_LEN, cmp_name) != NULL;
+	} else {
+		/*
+		 * If there are no symbols in the unprotected list, there is
+		 * no KMI enforcement, so exports need no protection.
+		 * Treat everything as exportable in this case.
+		 */
+		return false;
+	}
+}
+
+/**
+ * gki_is_module_unprotected_symbol - Is a symbol unprotected for unsigned module?
+ *
+ * @name:	Symbol being checked in list of unprotected symbols
+ */
+bool gki_is_module_unprotected_symbol(const char *name)
+{
+	if (NR_UNPROTECTED_SYMBOLS) {
+		return bsearch(name, gki_unprotected_symbols, NR_UNPROTECTED_SYMBOLS,
+				MAX_UNPROTECTED_NAME_LEN, cmp_name) != NULL;
+	} else {
+		/*
+		 * If there are no symbols in the unprotected list,
+		 * there is no KMI enforcement for this kernel.
+		 * Treat everything as accessible in this case.
+		 */
+		return true;
+	}
+}
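
For reference, the two generated headers consumed above are expected to provide sorted, fixed-width string tables plus their dimensions, roughly as in this sketch; the real contents come from scripts/gen_gki_modules_headers.sh, and the symbol names here are placeholders:

	/* gki_module_unprotected.h (generated) -- illustrative sketch */
	#define NR_UNPROTECTED_SYMBOLS   3
	#define MAX_UNPROTECTED_NAME_LEN 64

	static const char gki_unprotected_symbols[NR_UNPROTECTED_SYMBOLS]
					[MAX_UNPROTECTED_NAME_LEN] = {
		"down_read",
		"schedule",
		"up_read",	/* entries sorted so bsearch() works */
	};

The fixed row width matches the element size passed to bsearch(), and the sort order is what makes the binary search valid.
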
diff --git a/kernel/module/internal.h b/kernel/module/internal.h
index 2e2bf23..d51b047 100644
--- a/kernel/module/internal.h
+++ b/kernel/module/internal.h
@@ -303,3 +303,17 @@ static inline int same_magic(const char *amagic, const char *bmagic, bool has_cr
 	return strcmp(amagic, bmagic) == 0;
 }
 #endif /* CONFIG_MODVERSIONS */
+
+#ifdef CONFIG_MODULE_SIG_PROTECT
+extern bool gki_is_module_unprotected_symbol(const char *name);
+extern bool gki_is_module_protected_export(const char *name);
+#else
+static inline bool gki_is_module_unprotected_symbol(const char *name)
+{
+	return true;
+}
+static inline bool gki_is_module_protected_export(const char *name)
+{
+	return false;
+}
+#endif /* CONFIG_MODULE_SIG_PROTECT */
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 7a62734..fbe7cb2 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -60,6 +60,9 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/module.h>
 
+#undef CREATE_TRACE_POINTS
+#include <trace/hooks/module.h>
+
 /*
  * Mutex protects:
  * 1) List of modules (also safely readable with preempt_disable),
@@ -1095,6 +1098,21 @@ static const struct kernel_symbol *resolve_symbol(struct module *mod,
 		goto getname;
 	}
 
+	/*
+	 * ANDROID: GKI:
+	 * For an unsigned module, a symbol resolves only if:
+	 * 1. The symbol is in the unprotected symbol list, OR
+	 * 2. The symbol's owner, if any, is itself an unsigned module
+	 *    rather than a signed GKI module, which protects symbols
+	 *    exported by signed GKI modules.
+	 */
+	if (!mod->sig_ok &&
+	    !gki_is_module_unprotected_symbol(name) &&
+	    fsa.owner && fsa.owner->sig_ok) {
+		fsa.sym = ERR_PTR(-EACCES);
+		goto getname;
+	}
+
 	err = ref_module(mod, fsa.owner);
 	if (err) {
 		fsa.sym = ERR_PTR(err);
@@ -1191,6 +1209,7 @@ static void free_module(struct module *mod)
 
 	/* This may be empty, but that's OK */
 	module_arch_freeing_init(mod);
+	trace_android_rvh_set_module_init_rw_nx(mod);
 	module_memfree(mod->init_layout.base);
 	kfree(mod->args);
 	percpu_modfree(mod);
@@ -1199,6 +1218,7 @@ static void free_module(struct module *mod)
 	lockdep_free_key_range(mod->data_layout.base, mod->data_layout.size);
 
 	/* Finally, free the core (containing the module structure) */
+	trace_android_rvh_set_module_core_rw_nx(mod);
 	module_memfree(mod->core_layout.base);
 #ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
 	vfree(mod->data_layout.base);
@@ -1247,6 +1267,14 @@ static int verify_exported_symbols(struct module *mod)
 				.name	= kernel_symbol_name(s),
 				.gplok	= true,
 			};
+
+			if (!mod->sig_ok && gki_is_module_protected_export(
+						kernel_symbol_name(s))) {
+				pr_err("%s: exports protected symbol %s\n",
+				       mod->name, kernel_symbol_name(s));
+				return -EACCES;
+			}
+
 			if (find_symbol(&fsa)) {
 				pr_err("%s: exports duplicate symbol %s"
 				       " (owned by %s)\n",
@@ -1327,9 +1355,15 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
 			     ignore_undef_symbol(info->hdr->e_machine, name)))
 				break;
 
-			ret = PTR_ERR(ksym) ?: -ENOENT;
-			pr_warn("%s: Unknown symbol %s (err %d)\n",
-				mod->name, name, ret);
+			if (PTR_ERR(ksym) == -EACCES) {
+				ret = -EACCES;
+				pr_warn("%s: Protected symbol: %s (err %d)\n",
+					mod->name, name, ret);
+			} else {
+				ret = PTR_ERR(ksym) ?: -ENOENT;
+				pr_warn("%s: Unknown symbol %s (err %d)\n",
+					mod->name, name, ret);
+			}
 			break;
 
 		default:
@@ -2342,7 +2376,9 @@ static void module_deallocate(struct module *mod, struct load_info *info)
 {
 	percpu_modfree(mod);
 	module_arch_freeing_init(mod);
+	trace_android_rvh_set_module_init_rw_nx(mod);
 	module_memfree(mod->init_layout.base);
+	trace_android_rvh_set_module_core_rw_nx(mod);
 	module_memfree(mod->core_layout.base);
 #ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
 	vfree(mod->data_layout.base);
@@ -2495,8 +2531,10 @@ static noinline int do_init_module(struct module *mod)
 	rcu_assign_pointer(mod->kallsyms, &mod->core_kallsyms);
 #endif
 	module_enable_ro(mod, true);
+	trace_android_rvh_set_module_permit_after_init(mod);
 	mod_tree_remove_init(mod);
 	module_arch_freeing_init(mod);
+	trace_android_rvh_set_module_init_rw_nx(mod);
 	mod->init_layout.base = NULL;
 	mod->init_layout.size = 0;
 	mod->init_layout.ro_size = 0;
@@ -2626,6 +2664,7 @@ static int complete_formation(struct module *mod, struct load_info *info)
 	module_enable_ro(mod, false);
 	module_enable_nx(mod);
 	module_enable_x(mod);
+	trace_android_rvh_set_module_permit_before_init(mod);
 
 	/*
 	 * Mark state as coming so strong_try_module_get() ignores us,
@@ -2765,6 +2804,8 @@ static int load_module(struct load_info *info, const char __user *uargs,
 			       "kernel\n", mod->name);
 		add_taint_module(mod, TAINT_UNSIGNED_MODULE, LOCKDEP_STILL_OK);
 	}
+#else
+	mod->sig_ok = 0;
 #endif
 
 	/* To avoid stressing percpu allocator, do this once we're unique. */
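
Condensing the resolve_symbol() check added above, an unsigned module's symbol lookup under MODULE_SIG_PROTECT behaves like this model (illustrative only):

	/* Illustrative model of the GKI resolve check added above */
	static bool gki_may_resolve(const struct module *mod, const char *name,
				    const struct module *owner)
	{
		if (mod->sig_ok)				/* signed GKI module */
			return true;
		if (gki_is_module_unprotected_symbol(name))	/* in the KMI list */
			return true;
		/* Protected symbol: denied only when the exporter is signed */
		return !owner || !owner->sig_ok;
	}
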
diff --git a/kernel/module/signing.c b/kernel/module/signing.c
index a2ff4242..7ffb2ae 100644
--- a/kernel/module/signing.c
+++ b/kernel/module/signing.c
@@ -19,8 +19,20 @@
 #undef MODULE_PARAM_PREFIX
 #define MODULE_PARAM_PREFIX "module."
 
+/*
+ * ANDROID: GKI:
+ * Only enforce signature if SIG_PROTECT is not set
+ */
+#ifndef CONFIG_MODULE_SIG_PROTECT
 static bool sig_enforce = IS_ENABLED(CONFIG_MODULE_SIG_FORCE);
 module_param(sig_enforce, bool_enable_only, 0644);
+void set_module_sig_enforced(void)
+{
+	sig_enforce = true;
+}
+#else
+#define sig_enforce false
+#endif
 
 /*
  * Export sig_enforce kernel cmdline parameter to allow other subsystems rely
@@ -32,11 +44,6 @@ bool is_module_sig_enforced(void)
 }
 EXPORT_SYMBOL(is_module_sig_enforced);
 
-void set_module_sig_enforced(void)
-{
-	sig_enforce = true;
-}
-
 /*
  * Verify the signature on a module.
  */
@@ -121,5 +128,13 @@ int module_sig_check(struct load_info *info, int flags)
 		return -EKEYREJECTED;
 	}
 
+/*
+ * ANDROID: GKI: Do not prevent loading of unsigned modules,
+ * as all modules except GKI modules are unsigned.
+ */
+#ifndef CONFIG_MODULE_SIG_PROTECT
 	return security_locked_down(LOCKDOWN_MODULE_SIGNATURE);
+#else
+	return 0;
+#endif
 }
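
The net effect on module loading can be summarized as follows (an illustrative outcome matrix, not literal code):

	/*
	 * Illustrative outcomes of module_sig_check():
	 *
	 *   signature state    MODULE_SIG_FORCE         MODULE_SIG_PROTECT
	 *   ---------------    ----------------         ------------------
	 *   valid              loads, sig_ok = true     loads, sig_ok = true
	 *   missing/invalid    -EKEYREJECTED            loads, sig_ok = false;
	 *                                               GKI symbol checks apply
	 *                                               at symbol resolution
	 */
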
diff --git a/kernel/pid.c b/kernel/pid.c
index 3fbc5e4..9efcf71 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -421,6 +421,7 @@ struct task_struct *find_task_by_vpid(pid_t vnr)
 {
 	return find_task_by_pid_ns(vnr, task_active_pid_ns(current));
 }
+EXPORT_SYMBOL_GPL(find_task_by_vpid);
 
 struct task_struct *find_get_task_by_vpid(pid_t nr)
 {
diff --git a/kernel/power/Makefile b/kernel/power/Makefile
index 874ad83..3e84020 100644
--- a/kernel/power/Makefile
+++ b/kernel/power/Makefile
@@ -21,4 +21,5 @@
 
 obj-$(CONFIG_MAGIC_SYSRQ)	+= poweroff.o
 
+obj-$(CONFIG_SUSPEND)		+= wakeup_reason.o
 obj-$(CONFIG_ENERGY_MODEL)	+= energy_model.o
diff --git a/kernel/power/process.c b/kernel/power/process.c
index ddd9988..9b3581a 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -81,18 +81,21 @@ static int try_to_freeze_tasks(bool user_only)
 	elapsed = ktime_sub(end, start);
 	elapsed_msecs = ktime_to_ms(elapsed);
 
-	if (todo) {
+	if (wakeup) {
 		pr_cont("\n");
-		pr_err("Freezing of tasks %s after %d.%03d seconds "
-		       "(%d tasks refusing to freeze, wq_busy=%d):\n",
-		       wakeup ? "aborted" : "failed",
+		pr_err("Freezing of tasks aborted after %d.%03d seconds",
+		       elapsed_msecs / 1000, elapsed_msecs % 1000);
+	} else if (todo) {
+		pr_cont("\n");
+		pr_err("Freezing of tasks failed after %d.%03d seconds"
+		       " (%d tasks refusing to freeze, wq_busy=%d):\n",
 		       elapsed_msecs / 1000, elapsed_msecs % 1000,
 		       todo - wq_busy, wq_busy);
 
 		if (wq_busy)
 			show_all_workqueues();
 
-		if (!wakeup || pm_debug_messages_on) {
+		if (pm_debug_messages_on) {
 			read_lock(&tasklist_lock);
 			for_each_process_thread(g, p) {
 				if (p != current && freezing(p) && !frozen(p))
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index fa3bf16..c61c378 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -30,6 +30,7 @@
 #include <trace/events/power.h>
 #include <linux/compiler.h>
 #include <linux/moduleparam.h>
+#include <linux/wakeup_reason.h>
 
 #include "power.h"
 
@@ -138,6 +139,8 @@ static void s2idle_loop(void)
 			break;
 		}
 
+		clear_wakeup_reasons();
+
 		if (s2idle_ops && s2idle_ops->check)
 			s2idle_ops->check();
 
@@ -367,6 +370,7 @@ static int suspend_prepare(suspend_state_t state)
 	if (!error)
 		return 0;
 
+	log_suspend_abort_reason("One or more tasks refusing to freeze");
 	suspend_stats.failed_freeze++;
 	dpm_save_failed_step(SUSPEND_FREEZE);
 	pm_notifier_call_chain(PM_POST_SUSPEND);
@@ -396,7 +400,7 @@ void __weak arch_suspend_enable_irqs(void)
  */
 static int suspend_enter(suspend_state_t state, bool *wakeup)
 {
-	int error;
+	int error, last_dev;
 
 	error = platform_suspend_prepare(state);
 	if (error)
@@ -404,7 +408,11 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
 
 	error = dpm_suspend_late(PMSG_SUSPEND);
 	if (error) {
+		last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1;
+		last_dev %= REC_FAILED_NUM;
 		pr_err("late suspend of devices failed\n");
+		log_suspend_abort_reason("late suspend of %s device failed",
+					 suspend_stats.failed_devs[last_dev]);
 		goto Platform_finish;
 	}
 	error = platform_suspend_prepare_late(state);
@@ -413,7 +421,11 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
 
 	error = dpm_suspend_noirq(PMSG_SUSPEND);
 	if (error) {
+		last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1;
+		last_dev %= REC_FAILED_NUM;
 		pr_err("noirq suspend of devices failed\n");
+		log_suspend_abort_reason("noirq suspend of %s device failed",
+					 suspend_stats.failed_devs[last_dev]);
 		goto Platform_early_resume;
 	}
 	error = platform_suspend_prepare_noirq(state);
@@ -429,8 +441,10 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
 	}
 
 	error = pm_sleep_disable_secondary_cpus();
-	if (error || suspend_test(TEST_CPUS))
+	if (error || suspend_test(TEST_CPUS)) {
+		log_suspend_abort_reason("Disabling non-boot cpus failed");
 		goto Enable_cpus;
+	}
 
 	arch_suspend_disable_irqs();
 	BUG_ON(!irqs_disabled());
@@ -501,6 +515,8 @@ int suspend_devices_and_enter(suspend_state_t state)
 	error = dpm_suspend_start(PMSG_SUSPEND);
 	if (error) {
 		pr_err("Some devices failed to suspend, or early wake event detected\n");
+		log_suspend_abort_reason(
+				"Some devices failed to suspend, or early wake event detected");
 		goto Recover_platform;
 	}
 	suspend_test_finish("suspend devices");
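
The last_dev arithmetic above indexes the most recent entry in the suspend_stats.failed_devs ring buffer; last_failed_dev points at the *next* slot to fill, so the newest record sits one slot behind it, modulo the ring size. For example, assuming REC_FAILED_NUM is 2 as defined in include/linux/suspend.h:

	/*
	 *   last_failed_dev == 0 -> last_dev = (0 + 2 - 1) % 2 = 1
	 *   last_failed_dev == 1 -> last_dev = (1 + 2 - 1) % 2 = 0
	 */
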
diff --git a/kernel/power/wakeup_reason.c b/kernel/power/wakeup_reason.c
new file mode 100644
index 0000000..8fefaa3
--- /dev/null
+++ b/kernel/power/wakeup_reason.c
@@ -0,0 +1,438 @@
+/*
+ * kernel/power/wakeup_reason.c
+ *
+ * Logs the reasons which caused the kernel to resume from
+ * the suspend mode.
+ *
+ * Copyright (C) 2020 Google, Inc.
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/wakeup_reason.h>
+#include <linux/kernel.h>
+#include <linux/irq.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+#include <linux/init.h>
+#include <linux/spinlock.h>
+#include <linux/notifier.h>
+#include <linux/suspend.h>
+#include <linux/slab.h>
+
+/*
+ * struct wakeup_irq_node - stores data and relationships for IRQs logged as
+ * either base or nested wakeup reasons during suspend/resume flow.
+ * @siblings - for membership on leaf or parent IRQ lists
+ * @irq      - the IRQ number
+ * @irq_name - the name associated with the IRQ, or a default if none
+ */
+struct wakeup_irq_node {
+	struct list_head siblings;
+	int irq;
+	const char *irq_name;
+};
+
+enum wakeup_reason_flag {
+	RESUME_NONE = 0,
+	RESUME_IRQ,
+	RESUME_ABORT,
+	RESUME_ABNORMAL,
+};
+
+static DEFINE_SPINLOCK(wakeup_reason_lock);
+
+static LIST_HEAD(leaf_irqs);   /* kept in ascending IRQ sorted order */
+static LIST_HEAD(parent_irqs); /* unordered */
+
+static struct kmem_cache *wakeup_irq_nodes_cache;
+
+static const char *default_irq_name = "(unnamed)";
+
+static struct kobject *kobj;
+
+static bool capture_reasons;
+static int wakeup_reason;
+static char non_irq_wake_reason[MAX_SUSPEND_ABORT_LEN];
+
+static ktime_t last_monotime; /* monotonic time before last suspend */
+static ktime_t curr_monotime; /* monotonic time after last suspend */
+static ktime_t last_stime; /* monotonic boottime offset before last suspend */
+static ktime_t curr_stime; /* monotonic boottime offset after last suspend */
+
+static void init_node(struct wakeup_irq_node *p, int irq)
+{
+	struct irq_desc *desc;
+
+	INIT_LIST_HEAD(&p->siblings);
+
+	p->irq = irq;
+	desc = irq_to_desc(irq);
+	if (desc && desc->action && desc->action->name)
+		p->irq_name = desc->action->name;
+	else
+		p->irq_name = default_irq_name;
+}
+
+static struct wakeup_irq_node *create_node(int irq)
+{
+	struct wakeup_irq_node *result;
+
+	result = kmem_cache_alloc(wakeup_irq_nodes_cache, GFP_ATOMIC);
+	if (unlikely(!result))
+		pr_warn("Failed to log wakeup IRQ %d\n", irq);
+	else
+		init_node(result, irq);
+
+	return result;
+}
+
+static void delete_list(struct list_head *head)
+{
+	struct wakeup_irq_node *n;
+
+	while (!list_empty(head)) {
+		n = list_first_entry(head, struct wakeup_irq_node, siblings);
+		list_del(&n->siblings);
+		kmem_cache_free(wakeup_irq_nodes_cache, n);
+	}
+}
+
+static bool add_sibling_node_sorted(struct list_head *head, int irq)
+{
+	struct wakeup_irq_node *n = NULL;
+	struct list_head *predecessor = head;
+
+	if (unlikely(WARN_ON(!head)))
+		return false;
+
+	if (!list_empty(head))
+		list_for_each_entry(n, head, siblings) {
+			if (n->irq < irq)
+				predecessor = &n->siblings;
+			else if (n->irq == irq)
+				return true;
+			else
+				break;
+		}
+
+	n = create_node(irq);
+	if (n) {
+		list_add(&n->siblings, predecessor);
+		return true;
+	}
+
+	return false;
+}
+
+static struct wakeup_irq_node *find_node_in_list(struct list_head *head,
+						 int irq)
+{
+	struct wakeup_irq_node *n;
+
+	if (unlikely(WARN_ON(!head)))
+		return NULL;
+
+	list_for_each_entry(n, head, siblings)
+		if (n->irq == irq)
+			return n;
+
+	return NULL;
+}
+
+void log_irq_wakeup_reason(int irq)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&wakeup_reason_lock, flags);
+	if (wakeup_reason == RESUME_ABNORMAL || wakeup_reason == RESUME_ABORT) {
+		spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+		return;
+	}
+
+	if (!capture_reasons) {
+		spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+		return;
+	}
+
+	if (find_node_in_list(&parent_irqs, irq) == NULL)
+		add_sibling_node_sorted(&leaf_irqs, irq);
+
+	wakeup_reason = RESUME_IRQ;
+	spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+}
+
+void log_threaded_irq_wakeup_reason(int irq, int parent_irq)
+{
+	struct wakeup_irq_node *parent;
+	unsigned long flags;
+
+	/*
+	 * Intentionally unsynchronized.  Calls that come in after we have
+	 * resumed should have a fast exit path since there's no work to be
+	 * done, and any coherence issue that could cause a wrong value here is
+	 * both highly improbable - given the set/clear timing - and very low
+	 * impact (parent IRQ gets logged instead of the specific child).
+	 */
+	if (!capture_reasons)
+		return;
+
+	spin_lock_irqsave(&wakeup_reason_lock, flags);
+
+	if (wakeup_reason == RESUME_ABNORMAL || wakeup_reason == RESUME_ABORT) {
+		spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+		return;
+	}
+
+	if (!capture_reasons || (find_node_in_list(&leaf_irqs, irq) != NULL)) {
+		spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+		return;
+	}
+
+	parent = find_node_in_list(&parent_irqs, parent_irq);
+	if (parent != NULL)
+		add_sibling_node_sorted(&leaf_irqs, irq);
+	else {
+		parent = find_node_in_list(&leaf_irqs, parent_irq);
+		if (parent != NULL) {
+			list_del_init(&parent->siblings);
+			list_add_tail(&parent->siblings, &parent_irqs);
+			add_sibling_node_sorted(&leaf_irqs, irq);
+		}
+	}
+
+	spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+}
+EXPORT_SYMBOL_GPL(log_threaded_irq_wakeup_reason);
+
+static void __log_abort_or_abnormal_wake(bool abort, const char *fmt,
+					 va_list args)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&wakeup_reason_lock, flags);
+
+	/* Suspend abort or abnormal wake reason has already been logged. */
+	if (wakeup_reason != RESUME_NONE) {
+		spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+		return;
+	}
+
+	if (abort)
+		wakeup_reason = RESUME_ABORT;
+	else
+		wakeup_reason = RESUME_ABNORMAL;
+
+	vsnprintf(non_irq_wake_reason, MAX_SUSPEND_ABORT_LEN, fmt, args);
+
+	spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+}
+
+void log_suspend_abort_reason(const char *fmt, ...)
+{
+	va_list args;
+
+	va_start(args, fmt);
+	__log_abort_or_abnormal_wake(true, fmt, args);
+	va_end(args);
+}
+EXPORT_SYMBOL_GPL(log_suspend_abort_reason);
+
+void log_abnormal_wakeup_reason(const char *fmt, ...)
+{
+	va_list args;
+
+	va_start(args, fmt);
+	__log_abort_or_abnormal_wake(false, fmt, args);
+	va_end(args);
+}
+EXPORT_SYMBOL_GPL(log_abnormal_wakeup_reason);
+
+void clear_wakeup_reasons(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&wakeup_reason_lock, flags);
+
+	delete_list(&leaf_irqs);
+	delete_list(&parent_irqs);
+	wakeup_reason = RESUME_NONE;
+	capture_reasons = true;
+
+	spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+}
+
+static void print_wakeup_sources(void)
+{
+	struct wakeup_irq_node *n;
+	unsigned long flags;
+
+	spin_lock_irqsave(&wakeup_reason_lock, flags);
+
+	capture_reasons = false;
+
+	if (wakeup_reason == RESUME_ABORT) {
+		pr_info("Abort: %s\n", non_irq_wake_reason);
+		spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+		return;
+	}
+
+	if (wakeup_reason == RESUME_IRQ && !list_empty(&leaf_irqs))
+		list_for_each_entry(n, &leaf_irqs, siblings)
+			pr_info("Resume caused by IRQ %d, %s\n", n->irq,
+				n->irq_name);
+	else if (wakeup_reason == RESUME_ABNORMAL)
+		pr_info("Resume caused by %s\n", non_irq_wake_reason);
+	else
+		pr_info("Resume cause unknown\n");
+
+	spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+}
+
+static ssize_t last_resume_reason_show(struct kobject *kobj,
+				       struct kobj_attribute *attr, char *buf)
+{
+	ssize_t buf_offset = 0;
+	struct wakeup_irq_node *n;
+	unsigned long flags;
+
+	spin_lock_irqsave(&wakeup_reason_lock, flags);
+
+	if (wakeup_reason == RESUME_ABORT) {
+		buf_offset = scnprintf(buf, PAGE_SIZE, "Abort: %s",
+				       non_irq_wake_reason);
+		spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+		return buf_offset;
+	}
+
+	if (wakeup_reason == RESUME_IRQ && !list_empty(&leaf_irqs))
+		list_for_each_entry(n, &leaf_irqs, siblings)
+			buf_offset += scnprintf(buf + buf_offset,
+						PAGE_SIZE - buf_offset,
+						"%d %s\n", n->irq, n->irq_name);
+	else if (wakeup_reason == RESUME_ABNORMAL)
+		buf_offset = scnprintf(buf, PAGE_SIZE, "-1 %s",
+				       non_irq_wake_reason);
+
+	spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+
+	return buf_offset;
+}
+
+static ssize_t last_suspend_time_show(struct kobject *kobj,
+			struct kobj_attribute *attr, char *buf)
+{
+	struct timespec64 sleep_time;
+	struct timespec64 total_time;
+	struct timespec64 suspend_resume_time;
+
+	/*
+	 * total_time is calculated from the monotonic boottime because,
+	 * unlike CLOCK_MONOTONIC, it includes the time spent in suspend.
+	 */
+	total_time = ktime_to_timespec64(ktime_sub(curr_stime, last_stime));
+
+	/*
+	 * suspend_resume_time is calculated as monotonic (CLOCK_MONOTONIC)
+	 * time interval before entering suspend and post suspend.
+	 */
+	suspend_resume_time =
+		ktime_to_timespec64(ktime_sub(curr_monotime, last_monotime));
+
+	/* sleep_time = total_time - suspend_resume_time */
+	sleep_time = timespec64_sub(total_time, suspend_resume_time);
+
+	/* Export suspend_resume_time and sleep_time in pair here. */
+	return sprintf(buf, "%llu.%09lu %llu.%09lu\n",
+		       (unsigned long long)suspend_resume_time.tv_sec,
+		       suspend_resume_time.tv_nsec,
+		       (unsigned long long)sleep_time.tv_sec,
+		       sleep_time.tv_nsec);
+}
+
+static struct kobj_attribute resume_reason = __ATTR_RO(last_resume_reason);
+static struct kobj_attribute suspend_time = __ATTR_RO(last_suspend_time);
+
+static struct attribute *attrs[] = {
+	&resume_reason.attr,
+	&suspend_time.attr,
+	NULL,
+};
+static struct attribute_group attr_group = {
+	.attrs = attrs,
+};
+
+/* Detects a suspend and clears all previous wakeup reasons */
+static int wakeup_reason_pm_event(struct notifier_block *notifier,
+		unsigned long pm_event, void *unused)
+{
+	switch (pm_event) {
+	case PM_SUSPEND_PREPARE:
+		/* monotonic time since boot */
+		last_monotime = ktime_get();
+		/* monotonic time since boot including the time spent in suspend */
+		last_stime = ktime_get_boottime();
+		clear_wakeup_reasons();
+		break;
+	case PM_POST_SUSPEND:
+		/* monotonic time since boot */
+		curr_monotime = ktime_get();
+		/* monotonic time since boot including the time spent in suspend */
+		curr_stime = ktime_get_boottime();
+		print_wakeup_sources();
+		break;
+	default:
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block wakeup_reason_pm_notifier_block = {
+	.notifier_call = wakeup_reason_pm_event,
+};
+
+static int __init wakeup_reason_init(void)
+{
+	if (register_pm_notifier(&wakeup_reason_pm_notifier_block)) {
+		pr_warn("[%s] failed to register PM notifier\n", __func__);
+		goto fail;
+	}
+
+	kobj = kobject_create_and_add("wakeup_reasons", kernel_kobj);
+	if (!kobj) {
+		pr_warn("[%s] failed to create a sysfs kobject\n", __func__);
+		goto fail_unregister_pm_notifier;
+	}
+
+	if (sysfs_create_group(kobj, &attr_group)) {
+		pr_warn("[%s] failed to create a sysfs group\n", __func__);
+		goto fail_kobject_put;
+	}
+
+	wakeup_irq_nodes_cache =
+		kmem_cache_create("wakeup_irq_node_cache",
+				  sizeof(struct wakeup_irq_node), 0, 0, NULL);
+	if (!wakeup_irq_nodes_cache)
+		goto fail_remove_group;
+
+	return 0;
+
+fail_remove_group:
+	sysfs_remove_group(kobj, &attr_group);
+fail_kobject_put:
+	kobject_put(kobj);
+fail_unregister_pm_notifier:
+	unregister_pm_notifier(&wakeup_reason_pm_notifier_block);
+fail:
+	return 1;
+}
+
+late_initcall(wakeup_reason_init);
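
Userspace reads the results through the sysfs group registered above; a minimal reader might look like this sketch, where the path follows from kobject_create_and_add("wakeup_reasons", kernel_kobj):

	#include <stdio.h>

	int main(void)
	{
		FILE *f = fopen("/sys/kernel/wakeup_reasons/last_resume_reason", "r");
		char line[256];

		if (!f)
			return 1;
		while (fgets(line, sizeof(line), f))
			fputs(line, stdout);	/* e.g. "12 gpio_keys" per wakeup IRQ */
		fclose(f);
		return 0;
	}
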
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e4f1e74..580d26c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -54,6 +54,8 @@
 #include <trace/events/initcall.h>
 #define CREATE_TRACE_POINTS
 #include <trace/events/printk.h>
+#undef CREATE_TRACE_POINTS
+#include <trace/hooks/printk.h>
 
 #include "printk_ringbuffer.h"
 #include "console_cmdline.h"
@@ -2544,6 +2546,12 @@ void resume_console(void)
  */
 static int console_cpu_notify(unsigned int cpu)
 {
+	int flag = 0;
+
+	trace_android_vh_printk_hotplug(&flag);
+	if (flag)
+		return 0;
+
 	if (!cpuhp_tasks_frozen) {
 		/* If trylock fails, someone else is doing the printing */
 		if (console_trylock())
@@ -3523,6 +3531,7 @@ int _printk_deferred(const char *fmt, ...)
 
 	return r;
 }
+EXPORT_SYMBOL_GPL(_printk_deferred);
 
 /*
  * printk rate limiting, lifted from the networking subsystem.
diff --git a/kernel/reboot.c b/kernel/reboot.c
index 3bba88c..a032312 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -35,6 +35,7 @@ EXPORT_SYMBOL(cad_pid);
 enum reboot_mode reboot_mode DEFAULT_REBOOT_MODE;
 EXPORT_SYMBOL_GPL(reboot_mode);
 enum reboot_mode panic_reboot_mode = REBOOT_UNDEFINED;
+EXPORT_SYMBOL_GPL(panic_reboot_mode);
 
 /*
  * This variable is used privately to keep track of whether or not
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 976092b..ff1738c 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -32,3 +32,4 @@
 obj-y += fair.o
 obj-y += build_policy.o
 obj-y += build_utility.o
+obj-$(CONFIG_ANDROID_VENDOR_HOOKS) += vendor_hooks.o
diff --git a/kernel/sched/OWNERS b/kernel/sched/OWNERS
new file mode 100644
index 0000000..09a9f8e
--- /dev/null
+++ b/kernel/sched/OWNERS
@@ -0,0 +1,4 @@
+connoro@google.com
+elavila@google.com
+qperret@google.com
+tkjos@google.com
diff --git a/kernel/sched/android.h b/kernel/sched/android.h
new file mode 100644
index 0000000..5fe6bb7
--- /dev/null
+++ b/kernel/sched/android.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Android scheduler hooks and modifications
+ *
+ * Put all of the android-specific scheduler hooks and changes
+ * in this .h file to make merges and modifications easier.  It's also
+ * simpler to notice what is, and is not, an upstream change this way over time.
+ */
+
+
+/*
+ * task_may_not_preempt - check whether a task may not be preemptible soon
+ */
+static inline bool task_may_not_preempt(struct task_struct *task, int cpu)
+{
+	return false;
+}
+
+static inline bool uclamp_boosted(struct task_struct *p)
+{
+	return false;
+}
+
+static inline bool uclamp_latency_sensitive(struct task_struct *p)
+{
+	return false;
+}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f730b6f..0d1676a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -95,6 +95,9 @@
 #include "../../io_uring/io-wq.h"
 #include "../smpboot.h"
 
+#include <trace/hooks/sched.h>
+#include <trace/hooks/cgroup.h>
+
 /*
  * Export tracepoints that act as a bare tracehook (ie: have no trace event
  * associated with them) to allow external modules to probe them.
@@ -110,8 +113,10 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_overutilized_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_cfs_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_se_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_update_nr_running_tp);
+EXPORT_TRACEPOINT_SYMBOL_GPL(sched_switch);
 
 DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
+EXPORT_SYMBOL_GPL(runqueues);
 
 #ifdef CONFIG_SCHED_DEBUG
 /*
@@ -126,6 +131,7 @@ DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
 const_debug unsigned int sysctl_sched_features =
 #include "features.h"
 	0;
+EXPORT_SYMBOL_GPL(sysctl_sched_features);
 #undef SCHED_FEAT
 
 /*
@@ -551,6 +557,7 @@ void raw_spin_rq_lock_nested(struct rq *rq, int subclass)
 		raw_spin_unlock(lock);
 	}
 }
+EXPORT_SYMBOL_GPL(raw_spin_rq_lock_nested);
 
 bool raw_spin_rq_trylock(struct rq *rq)
 {
@@ -580,6 +587,7 @@ void raw_spin_rq_unlock(struct rq *rq)
 {
 	raw_spin_unlock(rq_lockp(rq));
 }
+EXPORT_SYMBOL_GPL(raw_spin_rq_unlock);
 
 #ifdef CONFIG_SMP
 /*
@@ -598,6 +606,7 @@ void double_rq_lock(struct rq *rq1, struct rq *rq2)
 
 	double_rq_clock_clear_update(rq1, rq2);
 }
+EXPORT_SYMBOL_GPL(double_rq_lock);
 #endif
 
 /*
@@ -623,6 +632,7 @@ struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 			cpu_relax();
 	}
 }
+EXPORT_SYMBOL_GPL(__task_rq_lock);
 
 /*
  * task_rq_lock - lock p->pi_lock and lock the rq @p resides on.
@@ -665,6 +675,7 @@ struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 			cpu_relax();
 	}
 }
+EXPORT_SYMBOL_GPL(task_rq_lock);
 
 /*
  * RQ-clock updating methods:
@@ -722,7 +733,7 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 	if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
 		update_irq_load_avg(rq, irq_delta + steal);
 #endif
-	update_rq_clock_pelt(rq, delta);
+	update_rq_clock_task_mult(rq, delta);
 }
 
 void update_rq_clock(struct rq *rq)
@@ -746,6 +757,7 @@ void update_rq_clock(struct rq *rq)
 	rq->clock += delta;
 	update_rq_clock_task(rq, delta);
 }
+EXPORT_SYMBOL_GPL(update_rq_clock);
 
 #ifdef CONFIG_SCHED_HRTICK
 /*
@@ -944,6 +956,7 @@ static bool __wake_q_add(struct wake_q_head *head, struct task_struct *task)
 	 */
 	*head->lastp = node;
 	head->lastp = &node->next;
+	head->count++;
 	return true;
 }
 
@@ -999,12 +1012,14 @@ void wake_up_q(struct wake_q_head *head)
 		/* Task can safely be re-inserted now: */
 		node = node->next;
 		task->wake_q.next = NULL;
+		task->wake_q_count = head->count;
 
 		/*
 		 * wake_up_process() executes a full barrier, which pairs with
 		 * the queueing in wake_q_add() so as not to miss wakeups.
 		 */
 		wake_up_process(task);
+		task->wake_q_count = 0;
 		put_task_struct(task);
 	}
 }
@@ -1039,6 +1054,7 @@ void resched_curr(struct rq *rq)
 	else
 		trace_sched_wake_idle_without_ipi(cpu);
 }
+EXPORT_SYMBOL_GPL(resched_curr);
 
 void resched_cpu(int cpu)
 {
@@ -1066,6 +1082,11 @@ int get_nohz_timer_target(void)
 	int i, cpu = smp_processor_id(), default_cpu = -1;
 	struct sched_domain *sd;
 	const struct cpumask *hk_mask;
+	bool done = false;
+
+	trace_android_rvh_get_nohz_timer_target(&cpu, &done);
+	if (done)
+		return cpu;
 
 	if (housekeeping_cpu(cpu, HK_TYPE_TIMER)) {
 		if (!idle_cpu(cpu))
@@ -1341,6 +1362,7 @@ static struct uclamp_se uclamp_default[UCLAMP_CNT];
  *   * An admin modifying the cgroup cpu.uclamp.{min, max}
  */
 DEFINE_STATIC_KEY_FALSE(sched_uclamp_used);
+EXPORT_SYMBOL_GPL(sched_uclamp_used);
 
 /* Integer rounded range for each bucket */
 #define UCLAMP_BUCKET_DELTA DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS)
@@ -1487,6 +1509,12 @@ uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
 {
 	struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
 	struct uclamp_se uc_max = uclamp_default[clamp_id];
+	struct uclamp_se uc_eff;
+	int ret = 0;
+
+	trace_android_rvh_uclamp_eff_get(p, clamp_id, &uc_max, &uc_eff, &ret);
+	if (ret)
+		return uc_eff;
 
 	/* System default restrictions always apply */
 	if (unlikely(uc_req.value > uc_max.value))
@@ -1507,6 +1535,7 @@ unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id)
 
 	return (unsigned long)uc_eff.value;
 }
+EXPORT_SYMBOL_GPL(uclamp_eff_value);
 
 /*
  * When a task is enqueued on a rq, the clamp bucket currently defined by the
@@ -1934,12 +1963,14 @@ static void __setscheduler_uclamp(struct task_struct *p,
 	    attr->sched_util_min != -1) {
 		uclamp_se_set(&p->uclamp_req[UCLAMP_MIN],
 			      attr->sched_util_min, true);
+		trace_android_vh_setscheduler_uclamp(p, UCLAMP_MIN, attr->sched_util_min);
 	}
 
 	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX &&
 	    attr->sched_util_max != -1) {
 		uclamp_se_set(&p->uclamp_req[UCLAMP_MAX],
 			      attr->sched_util_max, true);
+		trace_android_vh_setscheduler_uclamp(p, UCLAMP_MAX, attr->sched_util_max);
 	}
 }
 
@@ -2057,7 +2088,9 @@ static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 	}
 
 	uclamp_rq_inc(rq, p);
+	trace_android_rvh_enqueue_task(rq, p, flags);
 	p->sched_class->enqueue_task(rq, p, flags);
+	trace_android_rvh_after_enqueue_task(rq, p, flags);
 
 	if (sched_core_enabled(rq))
 		sched_core_enqueue(rq, p);
@@ -2077,7 +2110,9 @@ static inline void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
 	}
 
 	uclamp_rq_dec(rq, p);
+	trace_android_rvh_dequeue_task(rq, p, flags);
 	p->sched_class->dequeue_task(rq, p, flags);
+	trace_android_rvh_after_dequeue_task(rq, p, flags);
 }
 
 void activate_task(struct rq *rq, struct task_struct *p, int flags)
@@ -2086,6 +2121,7 @@ void activate_task(struct rq *rq, struct task_struct *p, int flags)
 
 	p->on_rq = TASK_ON_RQ_QUEUED;
 }
+EXPORT_SYMBOL_GPL(activate_task);
 
 void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
 {
@@ -2093,6 +2129,7 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
 
 	dequeue_task(rq, p, flags);
 }
+EXPORT_SYMBOL_GPL(deactivate_task);
 
 static inline int __normal_prio(int policy, int rt_prio, int nice)
 {
@@ -2185,6 +2222,7 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
 	if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr))
 		rq_clock_skip_update(rq);
 }
+EXPORT_SYMBOL_GPL(check_preempt_curr);
 
 #ifdef CONFIG_SMP
 
@@ -2267,6 +2305,8 @@ static inline bool rq_has_pinned_tasks(struct rq *rq)
  */
 static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
 {
+	bool allowed = true;
+
 	/* When not in the task's cpumask, no point in looking further. */
 	if (!cpumask_test_cpu(cpu, p->cpus_ptr))
 		return false;
@@ -2275,14 +2315,20 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
 	if (is_migration_disabled(p))
 		return cpu_online(cpu);
 
+	/* Let the vendor hook veto this CPU for any of the cases below. */
+	trace_android_rvh_is_cpu_allowed(p, cpu, &allowed);
+
 	/* Non kernel threads are not allowed during either online or offline. */
 	if (!(p->flags & PF_KTHREAD))
-		return cpu_active(cpu) && task_cpu_possible(cpu, p);
+		return cpu_active(cpu) && task_cpu_possible(cpu, p) && allowed;
 
 	/* KTHREAD_IS_PER_CPU is always allowed. */
 	if (kthread_is_per_cpu(p))
 		return cpu_online(cpu);
 
+	if (!allowed)
+		return false;
+
 	/* Regular kernel threads don't get to stay during offline. */
 	if (cpu_dying(cpu))
 		return false;
@@ -2313,12 +2359,24 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
 static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
 				   struct task_struct *p, int new_cpu)
 {
+	int detached = 0;
+
 	lockdep_assert_rq_held(rq);
 
+	/*
+	 * The vendor hook may drop the rq lock temporarily, so pass the
+	 * rq flags along so it can unpin the lock. The rq lock must be
+	 * held again when the hook returns.
+	 */
+	trace_android_rvh_migrate_queued_task(rq, rf, p, new_cpu, &detached);
+	if (detached)
+		goto attach;
+
 	deactivate_task(rq, p, DEQUEUE_NOCLOCK);
 	set_task_cpu(p, new_cpu);
-	rq_unlock(rq, rf);
 
+attach:
+	rq_unlock(rq, rf);
 	rq = cpu_rq(new_cpu);
 
 	rq_lock(rq, rf);
@@ -2356,8 +2414,8 @@ struct set_affinity_pending {
  * So we race with normal scheduler movements, but that's OK, as long
  * as the task is no longer on this CPU.
  */
-static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
-				 struct task_struct *p, int dest_cpu)
+struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
+			  struct task_struct *p, int dest_cpu)
 {
 	/* Affinity changed (again). */
 	if (!is_cpu_allowed(p, dest_cpu))
@@ -2368,6 +2426,7 @@ static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
 
 	return rq;
 }
+EXPORT_SYMBOL_GPL(__migrate_task);
 
 /*
  * migration_cpu_stop - this will be executed by a highprio stopper thread
@@ -2923,6 +2982,8 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
 	 * immediately required to distribute the tasks within their new mask.
 	 */
 	dest_cpu = cpumask_any_and_distribute(cpu_valid_mask, new_mask);
+	trace_android_rvh_set_cpus_allowed_by_task(cpu_valid_mask, new_mask, p, &dest_cpu);
+
 	if (dest_cpu >= nr_cpu_ids) {
 		ret = -EINVAL;
 		goto out;
@@ -3152,12 +3213,13 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 		p->se.nr_migrations++;
 		rseq_migrate(p);
 		perf_event_task_migrate(p);
+		trace_android_rvh_set_task_cpu(p, new_cpu);
 	}
 
 	__set_task_cpu(p, new_cpu);
 }
+EXPORT_SYMBOL_GPL(set_task_cpu);
 
-#ifdef CONFIG_NUMA_BALANCING
 static void __migrate_swap_task(struct task_struct *p, int cpu)
 {
 	if (task_on_rq_queued(p)) {
@@ -3272,7 +3334,7 @@ int migrate_swap(struct task_struct *cur, struct task_struct *p,
 out:
 	return ret;
 }
-#endif /* CONFIG_NUMA_BALANCING */
+EXPORT_SYMBOL_GPL(migrate_swap);
 
 /*
  * wait_task_inactive - wait for a thread to unschedule.
@@ -3429,12 +3491,16 @@ EXPORT_SYMBOL_GPL(kick_process);
  * select_task_rq() below may allow selection of !active CPUs in order
  * to satisfy the above rules.
  */
-static int select_fallback_rq(int cpu, struct task_struct *p)
+int select_fallback_rq(int cpu, struct task_struct *p)
 {
 	int nid = cpu_to_node(cpu);
 	const struct cpumask *nodemask = NULL;
 	enum { cpuset, possible, fail } state = cpuset;
-	int dest_cpu;
+	int dest_cpu = -1;
+
+	trace_android_rvh_select_fallback_rq(cpu, p, &dest_cpu);
+	if (dest_cpu >= 0)
+		return dest_cpu;
 
 	/*
 	 * If the node that the CPU is on has been offlined, cpu_to_node()
@@ -3499,6 +3565,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 
 	return dest_cpu;
 }
+EXPORT_SYMBOL_GPL(select_fallback_rq);
 
 /*
  * The caller (fork, wakeup) owns p->pi_lock, ->cpus_ptr is stable.
@@ -3675,6 +3742,9 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 {
 	int en_flags = ENQUEUE_WAKEUP | ENQUEUE_NOCLOCK;
 
+	if (wake_flags & WF_SYNC)
+		en_flags |= ENQUEUE_WAKEUP_SYNC;
+
 	lockdep_assert_rq_held(rq);
 
 	if (p->sched_contributes_to_load)
@@ -3816,6 +3886,7 @@ void wake_up_if_idle(int cpu)
 out:
 	rcu_read_unlock();
 }
+EXPORT_SYMBOL_GPL(wake_up_if_idle);
 
 bool cpus_share_cache(int this_cpu, int that_cpu)
 {
@@ -3867,7 +3938,12 @@ static inline bool ttwu_queue_cond(struct task_struct *p, int cpu)
 
 static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
 {
-	if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(p, cpu)) {
+	bool cond = false;
+
+	trace_android_rvh_ttwu_cond(cpu, &cond);
+
+	if ((sched_feat(TTWU_QUEUE) && ttwu_queue_cond(p, cpu)) ||
+			cond) {
 		sched_clock_cpu(cpu); /* Sync clocks across CPUs */
 		__ttwu_queue_wakelist(p, cpu, wake_flags);
 		return true;
@@ -4135,6 +4211,9 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	if (READ_ONCE(p->on_rq) && ttwu_runnable(p, wake_flags))
 		goto unlock;
 
+	if (READ_ONCE(p->__state) & TASK_UNINTERRUPTIBLE)
+		trace_sched_blocked_reason(p);
+
 #ifdef CONFIG_SMP
 	/*
 	 * Ensure we load p->on_cpu _after_ p->on_rq, otherwise it would be
@@ -4203,6 +4282,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	 */
 	smp_cond_load_acquire(&p->on_cpu, !VAL);
 
+	trace_android_rvh_try_to_wake_up(p);
+
 	cpu = select_task_rq(p, p->wake_cpu, wake_flags | WF_TTWU);
 	if (task_cpu(p) != cpu) {
 		if (p->in_iowait) {
@@ -4222,8 +4303,10 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 unlock:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 out:
-	if (success)
+	if (success) {
+		trace_android_rvh_try_to_wake_up_success(p);
 		ttwu_stat(p, task_cpu(p), wake_flags);
+	}
 	preempt_enable();
 
 	return success;
@@ -4383,6 +4466,8 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	p->se.cfs_rq			= NULL;
 #endif
 
+	trace_android_rvh_sched_fork_init(p);
+
 #ifdef CONFIG_SCHEDSTATS
 	/* Even if schedstat is disabled, there should not be garbage */
 	memset(&p->stats, 0, sizeof(p->stats));
@@ -4590,6 +4675,8 @@ late_initcall(sched_core_sysctl_init);
  */
 int sched_fork(unsigned long clone_flags, struct task_struct *p)
 {
+	trace_android_rvh_sched_fork(p);
+
 	__sched_fork(clone_flags, p);
 	/*
 	 * We mark the process as NEW here. This guarantees that
@@ -4602,6 +4689,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 	 * Make sure we do not leak PI boosting priority to the child.
 	 */
 	p->prio = current->normal_prio;
+	trace_android_rvh_prepare_prio_fork(p);
 
 	uclamp_fork(p);
 
@@ -4634,6 +4722,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 		p->sched_class = &fair_sched_class;
 
 	init_entity_runnable_average(&p->se);
+	trace_android_rvh_finish_prio_fork(p);
 
 
 #ifdef CONFIG_SCHED_INFO
@@ -4713,6 +4802,8 @@ void wake_up_new_task(struct task_struct *p)
 	struct rq_flags rf;
 	struct rq *rq;
 
+	trace_android_rvh_wake_up_new_task(p);
+
 	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
 	WRITE_ONCE(p->__state, TASK_RUNNING);
 #ifdef CONFIG_SMP
@@ -4731,6 +4822,7 @@ void wake_up_new_task(struct task_struct *p)
 	rq = __task_rq_lock(p, &rf);
 	update_rq_clock(rq);
 	post_init_entity_util_avg(p);
+	trace_android_rvh_new_task_stats(p);
 
 	activate_task(rq, p, ENQUEUE_NOCLOCK);
 	trace_sched_wakeup_new(p);
@@ -4904,6 +4996,7 @@ struct balance_callback balance_push_callback = {
 	.next = NULL,
 	.func = balance_push,
 };
+EXPORT_SYMBOL_GPL(balance_push_callback);
 
 static inline struct balance_callback *
 __splice_balance_callbacks(struct rq *rq, bool split)
@@ -4935,10 +5028,11 @@ static inline struct balance_callback *splice_balance_callbacks(struct rq *rq)
 	return __splice_balance_callbacks(rq, true);
 }
 
-static void __balance_callbacks(struct rq *rq)
+void __balance_callbacks(struct rq *rq)
 {
 	do_balance_callbacks(rq, __splice_balance_callbacks(rq, false));
 }
+EXPORT_SYMBOL_GPL(__balance_callbacks);
 
 static inline void balance_callbacks(struct rq *rq, struct balance_callback *head)
 {
@@ -5145,6 +5239,8 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 		if (prev->sched_class->task_dead)
 			prev->sched_class->task_dead(prev);
 
+		trace_android_rvh_flush_task(prev);
+
 		/* Task is done with its stack. */
 		put_task_stack(prev);
 
@@ -5350,6 +5446,11 @@ void sched_exec(void)
 	struct task_struct *p = current;
 	unsigned long flags;
 	int dest_cpu;
+	bool cond = false;
+
+	trace_android_rvh_sched_exec(&cond);
+	if (cond)
+		return;
 
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	dest_cpu = p->sched_class->select_task_rq(p, task_cpu(p), WF_EXEC);
@@ -5506,6 +5607,8 @@ void scheduler_tick(void)
 	rq_lock(rq, &rf);
 
 	update_rq_clock(rq);
+	trace_android_rvh_tick_entry(rq);
+
 	thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq));
 	update_thermal_load_avg(rq_clock_thermal(rq), rq, thermal_pressure);
 	curr->sched_class->task_tick(rq, curr, 0);
@@ -5525,6 +5628,8 @@ void scheduler_tick(void)
 	rq->idle_balance = idle_cpu(cpu);
 	trigger_load_balance(rq);
 #endif
+
+	trace_android_vh_scheduler_tick(rq);
 }
 
 #ifdef CONFIG_NO_HZ_FULL
@@ -5780,6 +5885,8 @@ static noinline void __schedule_bug(struct task_struct *prev)
 	}
 	check_panic_on_warn("scheduling while atomic");
 
+	trace_android_rvh_schedule_bug(prev);
+
 	dump_stack();
 	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
 }
@@ -6519,6 +6626,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	rq->last_seen_need_resched_ns = 0;
 #endif
 
+	trace_android_rvh_schedule(prev, next, rq);
 	if (likely(prev != next)) {
 		rq->nr_switches++;
 		/*
@@ -6871,7 +6979,7 @@ asmlinkage __visible void __sched preempt_schedule_irq(void)
 int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int wake_flags,
 			  void *key)
 {
-	WARN_ON_ONCE(IS_ENABLED(CONFIG_SCHED_DEBUG) && wake_flags & ~WF_SYNC);
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_SCHED_DEBUG) && wake_flags & ~(WF_SYNC | WF_ANDROID_VENDOR));
 	return try_to_wake_up(curr->private, mode, wake_flags);
 }
 EXPORT_SYMBOL(default_wake_function);
@@ -6924,6 +7032,7 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
 	struct rq_flags rf;
 	struct rq *rq;
 
+	trace_android_rvh_rtmutex_prepare_setprio(p, pi_task);
 	/* XXX used to be waiter->prio, not waiter->task->prio */
 	prio = __rt_effective_prio(pi_task, p->normal_prio);
 
@@ -7042,12 +7151,13 @@ static inline int rt_effective_prio(struct task_struct *p, int prio)
 
 void set_user_nice(struct task_struct *p, long nice)
 {
-	bool queued, running;
+	bool queued, running, allowed = false;
 	int old_prio;
 	struct rq_flags rf;
 	struct rq *rq;
 
-	if (task_nice(p) == nice || nice < MIN_NICE || nice > MAX_NICE)
+	trace_android_rvh_set_user_nice(p, &nice, &allowed);
+	if ((task_nice(p) == nice || nice < MIN_NICE || nice > MAX_NICE) && !allowed)
 		return;
 	/*
 	 * We have to be careful, if called from sys_setpriority(),
@@ -7212,6 +7322,7 @@ int available_idle_cpu(int cpu)
 
 	return 1;
 }
+EXPORT_SYMBOL_GPL(available_idle_cpu);
 
 /**
  * idle_task - return the idle task for a given CPU.
@@ -7521,9 +7632,6 @@ static int __sched_setscheduler(struct task_struct *p,
 			return retval;
 	}
 
-	if (pi)
-		cpuset_read_lock();
-
 	/*
 	 * Make sure no PI-waiters arrive (or leave) while we are
 	 * changing the priority of the task:
@@ -7598,8 +7706,6 @@ static int __sched_setscheduler(struct task_struct *p,
 	if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
 		policy = oldpolicy = -1;
 		task_rq_unlock(rq, p, &rf);
-		if (pi)
-			cpuset_read_unlock();
 		goto recheck;
 	}
 
@@ -7665,10 +7771,8 @@ static int __sched_setscheduler(struct task_struct *p,
 	head = splice_balance_callbacks(rq);
 	task_rq_unlock(rq, p, &rf);
 
-	if (pi) {
-		cpuset_read_unlock();
+	if (pi)
 		rt_mutex_adjust_pi(p);
-	}
 
 	/* Run balance callbacks after we've adjusted the PI chain: */
 	balance_callbacks(rq, head);
@@ -7678,8 +7782,6 @@ static int __sched_setscheduler(struct task_struct *p,
 
 unlock:
 	task_rq_unlock(rq, p, &rf);
-	if (pi)
-		cpuset_read_unlock();
 	return retval;
 }
 
@@ -7718,11 +7820,13 @@ int sched_setscheduler(struct task_struct *p, int policy,
 {
 	return _sched_setscheduler(p, policy, param, true);
 }
+EXPORT_SYMBOL_GPL(sched_setscheduler);
 
 int sched_setattr(struct task_struct *p, const struct sched_attr *attr)
 {
 	return __sched_setscheduler(p, attr, true, true);
 }
+EXPORT_SYMBOL_GPL(sched_setattr);
 
 int sched_setattr_nocheck(struct task_struct *p, const struct sched_attr *attr)
 {
@@ -7748,6 +7852,7 @@ int sched_setscheduler_nocheck(struct task_struct *p, int policy,
 {
 	return _sched_setscheduler(p, policy, param, false);
 }
+EXPORT_SYMBOL_GPL(sched_setscheduler_nocheck);
 
 /*
  * SCHED_FIFO is a broken scheduler model; that is, it is fundamentally
@@ -7809,14 +7914,9 @@ do_sched_setscheduler(pid_t pid, int policy, struct sched_param __user *param)
 	rcu_read_lock();
 	retval = -ESRCH;
 	p = find_process_by_pid(pid);
-	if (likely(p))
-		get_task_struct(p);
-	rcu_read_unlock();
-
-	if (likely(p)) {
+	if (p != NULL)
 		retval = sched_setscheduler(p, policy, &lparam);
-		put_task_struct(p);
-	}
+	rcu_read_unlock();
 
 	return retval;
 }
@@ -8214,6 +8314,8 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 		goto out_put_task;
 
 	retval = __sched_setaffinity(p, in_mask);
+	trace_android_rvh_sched_setaffinity(p, in_mask, &retval);
+
 out_put_task:
 	put_task_struct(p);
 	return retval;
@@ -8273,6 +8375,7 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
 
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	cpumask_and(mask, &p->cpus_mask, cpu_active_mask);
+	trace_android_rvh_sched_getaffinity(p, mask);
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 
 out_unlock:
@@ -8328,6 +8431,8 @@ static void do_sched_yield(void)
 	schedstat_inc(rq->yld_count);
 	current->sched_class->yield_task(rq);
 
+	trace_android_rvh_do_sched_yield(rq);
+
 	preempt_disable();
 	rq_unlock_irq(rq, &rf);
 	sched_preempt_enable_no_resched();
@@ -9174,6 +9279,24 @@ void idle_task_exit(void)
 	/* finish_cpu(), as ran on the BP, will clean up the active_mm state */
 }
 
+struct task_struct *pick_migrate_task(struct rq *rq)
+{
+	const struct sched_class *class;
+	struct task_struct *next;
+
+	for_each_class(class) {
+		next = class->pick_next_task(rq);
+		if (next) {
+			next->sched_class->put_prev_task(rq, next);
+			return next;
+		}
+	}
+
+	/* The idle class should always have a runnable task */
+	BUG();
+}
+EXPORT_SYMBOL_GPL(pick_migrate_task);
+
 static int __balance_push_cpu_stop(void *arg)
 {
 	struct task_struct *p = arg;
@@ -9519,6 +9642,7 @@ int sched_cpu_starting(unsigned int cpu)
 	sched_core_cpu_starting(cpu);
 	sched_rq_cpu_starting(cpu);
 	sched_tick_start(cpu);
+	trace_android_rvh_sched_cpu_starting(cpu);
 	return 0;
 }
 
@@ -9592,6 +9716,8 @@ int sched_cpu_dying(unsigned int cpu)
 	}
 	rq_unlock_irqrestore(rq, &rf);
 
+	trace_android_rvh_sched_cpu_dying(cpu);
+
 	calc_load_migrate(rq);
 	update_max_interval();
 	hrtick_clear(rq);
@@ -9652,7 +9778,9 @@ int in_sched_functions(unsigned long addr)
  * Every task in system belongs to this group at bootup.
  */
 struct task_group root_task_group;
+EXPORT_SYMBOL_GPL(root_task_group);
 LIST_HEAD(task_groups);
+EXPORT_SYMBOL_GPL(task_groups);
 
 /* Cacheline aligned slab cache for task_group */
 static struct kmem_cache *task_group_cache __read_mostly;
@@ -9935,6 +10063,8 @@ void __might_resched(const char *file, int line, unsigned int offsets)
 	print_preempt_disable_ip(offsets & MIGHT_RESCHED_PREEMPT_MASK,
 				 preempt_disable_ip);
 
+	trace_android_rvh_schedule_bug(NULL);
+
 	dump_stack();
 	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
 }
@@ -10314,6 +10444,7 @@ static int cpu_cgroup_css_online(struct cgroup_subsys_state *css)
 	mutex_unlock(&uclamp_mutex);
 #endif
 
+	trace_android_rvh_cpu_cgroup_online(css);
 	return 0;
 }
 
@@ -10355,6 +10486,8 @@ static void cpu_cgroup_attach(struct cgroup_taskset *tset)
 
 	cgroup_taskset_for_each(task, css, tset)
 		sched_move_task(task);
+
+	trace_android_rvh_cpu_cgroup_attach(tset);
 }
 
 #ifdef CONFIG_UCLAMP_TASK_GROUP
@@ -10532,6 +10665,27 @@ static int cpu_uclamp_max_show(struct seq_file *sf, void *v)
 	cpu_uclamp_print(sf, UCLAMP_MAX);
 	return 0;
 }
+
+static int cpu_uclamp_ls_write_u64(struct cgroup_subsys_state *css,
+				   struct cftype *cftype, u64 ls)
+{
+	struct task_group *tg;
+
+	if (ls > 1)
+		return -EINVAL;
+	tg = css_tg(css);
+	tg->latency_sensitive = (unsigned int) ls;
+
+	return 0;
+}
+
+static u64 cpu_uclamp_ls_read_u64(struct cgroup_subsys_state *css,
+				  struct cftype *cft)
+{
+	struct task_group *tg = css_tg(css);
+
+	return (u64) tg->latency_sensitive;
+}
 #endif /* CONFIG_UCLAMP_TASK_GROUP */
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
@@ -10974,6 +11128,12 @@ static struct cftype cpu_legacy_files[] = {
 		.seq_show = cpu_uclamp_max_show,
 		.write = cpu_uclamp_max_write,
 	},
+	{
+		.name = "uclamp.latency_sensitive",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_u64 = cpu_uclamp_ls_read_u64,
+		.write_u64 = cpu_uclamp_ls_write_u64,
+	},
 #endif
 	{ }	/* Terminate */
 };
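
The cpu.uclamp.latency_sensitive entry above (cgroup v1) is mirrored in the v2 cpu_files table below; both accept only 0 or 1 (cpu_uclamp_ls_write_u64() rejects anything larger with -EINVAL) and are hidden on the root group via CFTYPE_NOT_ON_ROOT. On a typical Android cgroup-v1 mount the toggle would look like "echo 1 > /dev/cpuctl/top-app/cpu.uclamp.latency_sensitive" (the mount point and group name are assumptions); the flag only has an effect once a helper such as the uclamp_latency_sensitive() sketch earlier consults it.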
@@ -11172,6 +11332,12 @@ static struct cftype cpu_files[] = {
 		.seq_show = cpu_uclamp_max_show,
 		.write = cpu_uclamp_max_write,
 	},
+	{
+		.name = "uclamp.latency_sensitive",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_u64 = cpu_uclamp_ls_read_u64,
+		.write_u64 = cpu_uclamp_ls_write_u64,
+	},
 #endif
 	{ }	/* terminate */
 };
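
The trace_android_rvh_* calls threaded through core.c are restricted vendor hooks: they may fire with scheduler locks held and cannot be unregistered, so handlers must be built in or live in modules that never unload. A minimal registration sketch against the select_fallback_rq() hook, with the probe signature derived from its call site above (the handler, demo policy, and module names are illustrative):

	#include <linux/module.h>
	#include <linux/sched.h>
	#include <linux/cpumask.h>
	#include <trace/hooks/sched.h>

	/* select_fallback_rq() initializes dest_cpu to -1 and returns early
	 * with any non-negative CPU the hook writes back.
	 */
	static void my_select_fallback_rq(void *unused, int cpu,
					  struct task_struct *p, int *dest_cpu)
	{
		/* Demo policy: keep the task put if its CPU is still usable;
		 * otherwise leave *dest_cpu at -1 for the stock fallback walk.
		 */
		if (cpu_active(cpu) && cpumask_test_cpu(cpu, p->cpus_ptr))
			*dest_cpu = cpu;
	}

	static int __init rvh_sched_demo_init(void)
	{
		return register_trace_android_rvh_select_fallback_rq(
				my_select_fallback_rq, NULL);
	}
	module_init(rvh_sched_demo_init);
	MODULE_LICENSE("GPL");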
diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
index 5252fb1..b3a8120 100644
--- a/kernel/sched/cpufreq.c
+++ b/kernel/sched/cpufreq.c
@@ -72,3 +72,4 @@ bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
 		(policy->dvfs_possible_from_any_cpu &&
 		 rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data)));
 }
+EXPORT_SYMBOL_GPL(cpufreq_this_cpu_can_update);
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 1207c78..919f8e9 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -834,29 +834,3 @@ struct cpufreq_governor *cpufreq_default_governor(void)
 #endif
 
 cpufreq_governor_init(schedutil_gov);
-
-#ifdef CONFIG_ENERGY_MODEL
-static void rebuild_sd_workfn(struct work_struct *work)
-{
-	rebuild_sched_domains_energy();
-}
-static DECLARE_WORK(rebuild_sd_work, rebuild_sd_workfn);
-
-/*
- * EAS shouldn't be attempted without sugov, so rebuild the sched_domains
- * on governor changes to make sure the scheduler knows about it.
- */
-void sched_cpufreq_governor_change(struct cpufreq_policy *policy,
-				  struct cpufreq_governor *old_gov)
-{
-	if (old_gov == &schedutil_gov || policy->governor == &schedutil_gov) {
-		/*
-		 * When called from the cpufreq_register_driver() path, the
-		 * cpu_hotplug_lock is already held, so use a work item to
-		 * avoid nested locking in rebuild_sched_domains().
-		 */
-		schedule_work(&rebuild_sd_work);
-	}
-
-}
-#endif
diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
index a286e72..64e0cce 100644
--- a/kernel/sched/cpupri.c
+++ b/kernel/sched/cpupri.c
@@ -195,6 +195,7 @@ int cpupri_find_fitness(struct cpupri *cp, struct task_struct *p,
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(cpupri_find_fitness);
 
 /**
  * cpupri_set - update the CPU priority setting
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 95fc778..a485f3b0 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -2,6 +2,8 @@
 /*
  * Simple CPU accounting cgroup controller
  */
+#include <linux/cpufreq_times.h>
+#include <trace/hooks/sched.h>
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 
@@ -17,6 +19,7 @@
  * compromise in place of having locks on each irq in account_system_time.
  */
 DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
+EXPORT_PER_CPU_SYMBOL_GPL(cpu_irqtime);
 
 static int sched_clock_irqtime;
 
@@ -52,6 +55,7 @@ void irqtime_account_irq(struct task_struct *curr, unsigned int offset)
 	unsigned int pc;
 	s64 delta;
 	int cpu;
+	bool irq_start = true;
 
 	if (!sched_clock_irqtime)
 		return;
@@ -67,10 +71,15 @@ void irqtime_account_irq(struct task_struct *curr, unsigned int offset)
 	 * in that case, so as not to confuse the scheduler with a special task
 	 * that does not consume any time, but still wants to run.
 	 */
-	if (pc & HARDIRQ_MASK)
+	if (pc & HARDIRQ_MASK) {
 		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
-	else if ((pc & SOFTIRQ_OFFSET) && curr != this_cpu_ksoftirqd())
+		irq_start = false;
+	} else if ((pc & SOFTIRQ_OFFSET) && curr != this_cpu_ksoftirqd()) {
 		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
+		irq_start = false;
+	}
+
+	trace_android_rvh_account_irq(curr, cpu, delta, irq_start);
 }
 
 static u64 irqtime_tick_accounted(u64 maxtime)
@@ -129,6 +138,9 @@ void account_user_time(struct task_struct *p, u64 cputime)
 
 	/* Account for user time used */
 	acct_account_cputime(p);
+
+	/* Account power usage for user time */
+	cpufreq_acct_update_power(p, cputime);
 }
 
 /*
@@ -173,6 +185,9 @@ void account_system_index_time(struct task_struct *p,
 
 	/* Account for system time used */
 	acct_account_cputime(p);
+
+	/* Account power usage for system time */
+	cpufreq_acct_update_power(p, cputime);
 }
 
 /*
@@ -472,6 +487,7 @@ void thread_group_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
 	*ut = cputime.utime;
 	*st = cputime.stime;
 }
+EXPORT_SYMBOL_GPL(thread_group_cputime_adjusted);
 
 #else /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE: */
 
@@ -642,6 +658,8 @@ void thread_group_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
 	thread_group_cputime(p, &cputime);
 	cputime_adjust(&cputime, &p->signal->prev_cputime, ut, st);
 }
+EXPORT_SYMBOL_GPL(thread_group_cputime_adjusted);
+
 #endif /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
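
Two Android accounting paths are threaded into cputime.c: cpufreq_acct_update_power() (from <linux/cpufreq_times.h>) charges user and system ticks to per-frequency time stats, and a restricted hook observes every irqtime slice. A handler sketch with the signature implied by the irqtime_account_irq() call site (note irq_start is false when the slice was charged as hardirq or softirq time); it would be registered with register_trace_android_rvh_account_irq(), as in the earlier core.c sketch:

	#include <trace/hooks/sched.h>

	static void my_account_irq(void *unused, struct task_struct *curr,
				   int cpu, s64 delta, bool irq_start)
	{
		/* delta is in nanoseconds; e.g. fold it into a vendor-side
		 * per-CPU irq-pressure signal here.
		 */
	}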
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 1637b65..3ceee0e 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -47,9 +47,10 @@ static unsigned long nsec_low(unsigned long long nsec)
 #define SCHED_FEAT(name, enabled)	\
 	#name ,
 
-static const char * const sched_feat_names[] = {
+const char * const sched_feat_names[] = {
 #include "features.h"
 };
+EXPORT_SYMBOL_GPL(sched_feat_names);
 
 #undef SCHED_FEAT
 
@@ -78,6 +79,7 @@ static int sched_feat_show(struct seq_file *m, void *v)
 struct static_key sched_feat_keys[__SCHED_FEAT_NR] = {
 #include "features.h"
 };
+EXPORT_SYMBOL_GPL(sched_feat_keys);
 
 #undef SCHED_FEAT
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2c3d0d4..99388b6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -56,6 +56,8 @@
 #include "stats.h"
 #include "autogroup.h"
 
+#include <trace/hooks/sched.h>
+
 /*
  * Targeted preemption latency for CPU-bound tasks:
  *
@@ -70,6 +72,7 @@
  * (default: 6ms * (1 + ilog(ncpus)), units: nanoseconds)
  */
 unsigned int sysctl_sched_latency			= 6000000ULL;
+EXPORT_SYMBOL_GPL(sysctl_sched_latency);
 static unsigned int normalized_sysctl_sched_latency	= 6000000ULL;
 
 /*
@@ -627,11 +630,13 @@ static inline bool __entity_less(struct rb_node *a, const struct rb_node *b)
  */
 static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+	trace_android_rvh_enqueue_entity(cfs_rq, se);
 	rb_add_cached(&se->run_node, &cfs_rq->tasks_timeline, __entity_less);
 }
 
 static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+	trace_android_rvh_dequeue_entity(cfs_rq, se);
 	rb_erase_cached(&se->run_node, &cfs_rq->tasks_timeline);
 }
 
@@ -4348,6 +4353,11 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 {
 	long last_ewma_diff, last_enqueued_diff;
 	struct util_est ue;
+	int ret = 0;
+
+	trace_android_rvh_util_est_update(cfs_rq, p, task_sleep, &ret);
+	if (ret)
+		return;
 
 	if (!sched_feat(UTIL_EST))
 		return;
@@ -4561,7 +4571,10 @@ static inline int task_fits_cpu(struct task_struct *p, int cpu)
 
 static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
 {
-	if (!sched_asym_cpucap_active())
+	bool need_update = true;
+
+	trace_android_rvh_update_misfit_status(p, rq, &need_update);
+	if (!sched_asym_cpucap_active() || !need_update)
 		return;
 
 	if (!p || p->nr_cpus_allowed == 1) {
@@ -4671,6 +4684,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 
 	/* ensure we never gain time by being placed backwards. */
 	se->vruntime = max_vruntime(se->vruntime, vruntime);
+	trace_android_rvh_place_entity(cfs_rq, se, initial, &vruntime);
 }
 
 static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
@@ -4879,9 +4893,14 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 	unsigned long ideal_runtime, delta_exec;
 	struct sched_entity *se;
 	s64 delta;
+	bool skip_preempt = false;
 
 	ideal_runtime = sched_slice(cfs_rq, curr);
 	delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
+	trace_android_rvh_check_preempt_tick(current, &ideal_runtime, &skip_preempt,
+			delta_exec, cfs_rq, curr, sysctl_sched_min_granularity);
+	if (skip_preempt)
+		return;
 	if (delta_exec > ideal_runtime) {
 		resched_curr(rq_of(cfs_rq));
 		/*
@@ -4910,8 +4929,7 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 		resched_curr(rq_of(cfs_rq));
 }
 
-static void
-set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
+void set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	clear_buddies(cfs_rq, se);
 
@@ -4947,6 +4965,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 
 	se->prev_sum_exec_runtime = se->sum_exec_runtime;
 }
+EXPORT_SYMBOL_GPL(set_next_entity);
 
 static int
 wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se);
@@ -4962,7 +4981,11 @@ static struct sched_entity *
 pick_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 {
 	struct sched_entity *left = __pick_first_entity(cfs_rq);
-	struct sched_entity *se;
+	struct sched_entity *se = NULL;
+
+	trace_android_rvh_pick_next_entity(cfs_rq, curr, &se);
+	if (se)
+		goto done;
 
 	/*
 	 * If curr is set we have to see if its left of the leftmost entity
@@ -5004,6 +5027,7 @@ pick_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 		se = cfs_rq->last;
 	}
 
+done:
 	return se;
 }
 
@@ -5066,6 +5090,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
 
 	if (cfs_rq->nr_running > 1)
 		check_preempt_tick(cfs_rq, curr);
+	trace_android_rvh_entity_tick(cfs_rq, curr);
 }
 
 
@@ -5991,6 +6016,11 @@ static inline bool cpu_overutilized(int cpu)
 {
 	unsigned long rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
 	unsigned long rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
+	int overutilized = -1;
+
+	trace_android_rvh_cpu_overutilized(cpu, &overutilized);
+	if (overutilized != -1)
+		return overutilized;
 
 	return !util_fits_cpu(cpu_util_cfs(cpu), rq_util_min, rq_util_max, cpu);
 }
@@ -6079,6 +6109,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		flags = ENQUEUE_WAKEUP;
 	}
 
+	trace_android_rvh_enqueue_task_fair(rq, p, flags);
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
 
@@ -6169,6 +6200,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		flags |= DEQUEUE_SLEEP;
 	}
 
+	trace_android_rvh_dequeue_task_fair(rq, p, flags);
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
 
@@ -7177,7 +7209,7 @@ compute_energy(struct energy_env *eenv, struct perf_domain *pd,
  * other use-cases too. So, until someone finds a better way to solve this,
  * let's keep things simple by re-using the existing slow path.
  */
-static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
+static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu, int sync)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
 	unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;
@@ -7188,12 +7220,27 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 	struct sched_domain *sd;
 	struct perf_domain *pd;
 	struct energy_env eenv;
+	int new_cpu = INT_MAX;
+
+	trace_android_rvh_find_energy_efficient_cpu(p, prev_cpu, sync, &new_cpu);
+	if (new_cpu != INT_MAX)
+		return new_cpu;
+
+	sync_entity_load_avg(&p->se);
 
 	rcu_read_lock();
 	pd = rcu_dereference(rd->pd);
 	if (!pd || READ_ONCE(rd->overutilized))
 		goto unlock;
 
+	cpu = smp_processor_id();
+	if (sync && cpu_rq(cpu)->nr_running == 1 &&
+	    cpumask_test_cpu(cpu, p->cpus_ptr) &&
+	    task_fits_cpu(p, cpu)) {
+		rcu_read_unlock();
+		return cpu;
+	}
+
 	/*
 	 * Energy-aware wake-up happens on the lowest sched_domain starting
 	 * from sd_asym_cpucapacity spanning over this_cpu and prev_cpu.
@@ -7206,7 +7253,6 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 
 	target = prev_cpu;
 
-	sync_entity_load_avg(&p->se);
 	if (!uclamp_task_util(p, p_util_min, p_util_max))
 		goto unlock;
 
@@ -7351,9 +7397,18 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 	int cpu = smp_processor_id();
 	int new_cpu = prev_cpu;
 	int want_affine = 0;
+	int target_cpu = -1;
 	/* SD_flags and WF_flags share the first nibble */
 	int sd_flag = wake_flags & 0xF;
 
+	if (trace_android_rvh_select_task_rq_fair_enabled() &&
+	    !(sd_flag & SD_BALANCE_FORK))
+		sync_entity_load_avg(&p->se);
+	trace_android_rvh_select_task_rq_fair(p, prev_cpu, sd_flag,
+			wake_flags, &target_cpu);
+	if (target_cpu >= 0)
+		return target_cpu;
+
 	/*
 	 * required for stable ->cpus_allowed
 	 */
@@ -7362,7 +7417,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 		record_wakee(p);
 
 		if (sched_energy_enabled()) {
-			new_cpu = find_energy_efficient_cpu(p, prev_cpu);
+			new_cpu = find_energy_efficient_cpu(p, prev_cpu, sync);
 			if (new_cpu >= 0)
 				return new_cpu;
 			new_cpu = prev_cpu;
@@ -7558,9 +7613,14 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
 	int scale = cfs_rq->nr_running >= sched_nr_latency;
 	int next_buddy_marked = 0;
 	int cse_is_idle, pse_is_idle;
+	bool ignore = false;
+	bool preempt = false;
 
 	if (unlikely(se == pse))
 		return;
+	trace_android_rvh_check_preempt_wakeup_ignore(curr, &ignore);
+	if (ignore)
+		return;
 
 	/*
 	 * This is possible from callers such as attach_tasks(), in which we
@@ -7617,6 +7677,13 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
 		return;
 
 	update_curr(cfs_rq_of(se));
+	trace_android_rvh_check_preempt_wakeup(rq, p, &preempt, &ignore,
+			wake_flags, se, pse, next_buddy_marked, sysctl_sched_wakeup_granularity);
+	if (preempt)
+		goto preempt;
+	if (ignore)
+		return;
+
 	if (wakeup_preempt_entity(se, pse) == 1) {
 		/*
 		 * Bias pick_next to pick the sched entity that is
@@ -7684,9 +7751,10 @@ struct task_struct *
 pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	struct cfs_rq *cfs_rq = &rq->cfs;
-	struct sched_entity *se;
-	struct task_struct *p;
+	struct sched_entity *se = NULL;
+	struct task_struct *p = NULL;
 	int new_tasks;
+	bool repick = false;
 
 again:
 	if (!sched_fair_runnable(rq))
@@ -7740,7 +7808,7 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 	} while (cfs_rq);
 
 	p = task_of(se);
-
+	trace_android_rvh_replace_next_task_fair(rq, &p, &se, &repick, false, prev);
 	/*
 	 * Since we haven't yet done put_prev_entity and if the selected task
 	 * is a different task than we started out with, try and touch the
@@ -7773,6 +7841,10 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 	if (prev)
 		put_prev_task(rq, prev);
 
+	trace_android_rvh_replace_next_task_fair(rq, &p, &se, &repick, true, prev);
+	if (repick)
+		goto done;
+
 	do {
 		se = pick_next_entity(cfs_rq, NULL);
 		set_next_entity(cfs_rq, se);
@@ -8014,7 +8086,8 @@ static bool yield_to_task_fair(struct rq *rq, struct task_struct *p)
  *      rewrite all of this once again.]
  */
 
-static unsigned long __read_mostly max_load_balance_interval = HZ/10;
+unsigned long __read_mostly max_load_balance_interval = HZ/10;
+EXPORT_SYMBOL_GPL(max_load_balance_interval);
 
 enum fbq_type { regular, remote, all };
 
@@ -8094,6 +8167,7 @@ struct lb_env {
 	enum fbq_type		fbq_type;
 	enum migration_type	migration_type;
 	struct list_head	tasks;
+	struct rq_flags		*src_rq_rf;
 };
 
 /*
@@ -8208,9 +8282,14 @@ static
 int can_migrate_task(struct task_struct *p, struct lb_env *env)
 {
 	int tsk_cache_hot;
+	int can_migrate = 1;
 
 	lockdep_assert_rq_held(env->src_rq);
 
+	trace_android_rvh_can_migrate_task(p, env->dst_cpu, &can_migrate);
+	if (!can_migrate)
+		return 0;
+
 	/*
 	 * We do not migrate tasks that are:
 	 * 1) throttled_lb_pair, or
@@ -8298,8 +8377,20 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
  */
 static void detach_task(struct task_struct *p, struct lb_env *env)
 {
+	int detached = 0;
+
 	lockdep_assert_rq_held(env->src_rq);
 
+	/*
+	 * The vendor hook may drop the rq lock temporarily, so pass the
+	 * rq flags along so it can unpin the lock. The rq lock must be
+	 * held again when the hook returns.
+	 */
+	trace_android_rvh_migrate_queued_task(env->src_rq, env->src_rq_rf, p,
+					      env->dst_cpu, &detached);
+	if (detached)
+		return;
+
 	deactivate_task(env->src_rq, p, DEQUEUE_NOCLOCK);
 	set_task_cpu(p, env->dst_cpu);
 }
@@ -8830,6 +8921,7 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu)
 	if (!capacity)
 		capacity = 1;
 
+	trace_android_rvh_update_cpu_capacity(cpu, &capacity);
 	cpu_rq(cpu)->cpu_capacity = capacity;
 	trace_sched_cpu_capacity_tp(cpu_rq(cpu));
 
@@ -10061,8 +10153,12 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 
 	if (sched_energy_enabled()) {
 		struct root_domain *rd = env->dst_rq->rd;
+		int out_balance = 1;
 
-		if (rcu_dereference(rd->pd) && !READ_ONCE(rd->overutilized))
+		trace_android_rvh_find_busiest_group(sds.busiest, env->dst_rq,
+					&out_balance);
+		if (rcu_dereference(rd->pd) && !READ_ONCE(rd->overutilized)
+					&& out_balance)
 			goto out_balanced;
 	}
 
@@ -10181,7 +10277,12 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 	struct rq *busiest = NULL, *rq;
 	unsigned long busiest_util = 0, busiest_load = 0, busiest_capacity = 1;
 	unsigned int busiest_nr = 0;
-	int i;
+	int i, done = 0;
+
+	trace_android_rvh_find_busiest_queue(env->dst_cpu, group, env->cpus,
+					     &busiest, &done);
+	if (done)
+		return busiest;
 
 	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
 		unsigned long capacity, load, util;
@@ -10483,6 +10584,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 
 more_balance:
 		rq_lock_irqsave(busiest, &rf);
+		env.src_rq_rf = &rf;
 		update_rq_clock(busiest);
 
 		/*
@@ -10778,6 +10880,7 @@ static int active_load_balance_cpu_stop(void *data)
 			.src_rq		= busiest_rq,
 			.idle		= CPU_IDLE,
 			.flags		= LBF_ACTIVE_LB,
+			.src_rq_rf	= &rf,
 		};
 
 		schedstat_inc(sd->alb_count);
@@ -10859,6 +10962,10 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 	int need_serialize, need_decay = 0;
 	u64 max_cost = 0;
 
+	trace_android_rvh_sched_rebalance_domains(rq, &continue_balancing);
+	if (!continue_balancing)
+		return;
+
 	rcu_read_lock();
 	for_each_domain(cpu, sd) {
 		/*
@@ -10945,9 +11052,13 @@ static inline int on_null_domain(struct rq *rq)
 
 static inline int find_new_ilb(void)
 {
-	int ilb;
+	int ilb = -1;
 	const struct cpumask *hk_mask;
 
+	trace_android_rvh_find_new_ilb(nohz.idle_cpus_mask, &ilb);
+	if (ilb >= 0)
+		return ilb;
+
 	hk_mask = housekeeping_cpumask(HK_TYPE_MISC);
 
 	for_each_cpu_and(ilb, nohz.idle_cpus_mask, hk_mask) {
@@ -11009,6 +11120,7 @@ static void nohz_balancer_kick(struct rq *rq)
 	struct sched_domain *sd;
 	int nr_busy, i, cpu = rq->cpu;
 	unsigned int flags = 0;
+	int done = 0;
 
 	if (unlikely(rq->idle_balance))
 		return;
@@ -11033,6 +11145,10 @@ static void nohz_balancer_kick(struct rq *rq)
 	if (time_before(now, nohz.next_balance))
 		goto out;
 
+	trace_android_rvh_sched_nohz_balancer_kick(rq, &flags, &done);
+	if (done)
+		goto out;
+
 	if (rq->nr_running >= 2) {
 		flags = NOHZ_STATS_KICK | NOHZ_BALANCE_KICK;
 		goto out;
@@ -11438,6 +11554,11 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 	u64 t0, t1, curr_cost = 0;
 	struct sched_domain *sd;
 	int pulled_task = 0;
+	int done = 0;
+
+	trace_android_rvh_sched_newidle_balance(this_rq, rf, &pulled_task, &done);
+	if (done)
+		return pulled_task;
 
 	update_misfit_status(NULL, this_rq);
 
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 0f31076..3e94c96 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -302,6 +302,7 @@ int __update_load_avg_blocked_se(u64 now, struct sched_entity *se)
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(__update_load_avg_blocked_se);
 
 int __update_load_avg_se(u64 now, struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
@@ -467,3 +468,63 @@ int update_irq_load_avg(struct rq *rq, u64 running)
 	return ret;
 }
 #endif
+
+__read_mostly unsigned int sched_pelt_lshift;
+
+#ifdef CONFIG_SYSCTL
+static unsigned int sysctl_sched_pelt_multiplier = 1;
+
+int sched_pelt_multiplier(struct ctl_table *table, int write, void *buffer,
+			  size_t *lenp, loff_t *ppos)
+{
+	static DEFINE_MUTEX(mutex);
+	unsigned int old;
+	int ret;
+
+	mutex_lock(&mutex);
+	old = sysctl_sched_pelt_multiplier;
+	ret = proc_dointvec(table, write, buffer, lenp, ppos);
+	if (ret)
+		goto undo;
+	if (!write)
+		goto done;
+
+	switch (sysctl_sched_pelt_multiplier) {
+	case 1:
+		fallthrough;
+	case 2:
+		fallthrough;
+	case 4:
+		WRITE_ONCE(sched_pelt_lshift,
+			   sysctl_sched_pelt_multiplier >> 1);
+		goto done;
+	default:
+		ret = -EINVAL;
+	}
+
+undo:
+	sysctl_sched_pelt_multiplier = old;
+done:
+	mutex_unlock(&mutex);
+
+	return ret;
+}
+
+static struct ctl_table sched_pelt_sysctls[] = {
+	{
+		.procname       = "sched_pelt_multiplier",
+		.data           = &sysctl_sched_pelt_multiplier,
+		.maxlen         = sizeof(unsigned int),
+		.mode           = 0644,
+		.proc_handler   = sched_pelt_multiplier,
+	},
+	{}
+};
+
+static int __init sched_pelt_sysctl_init(void)
+{
+	register_sysctl_init("kernel", sched_pelt_sysctls);
+	return 0;
+}
+late_initcall(sched_pelt_sysctl_init);
+#endif
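
Only multipliers of 1, 2, and 4 are accepted; the handler maps the written value to a shift count with ">> 1" (1 -> 0, 2 -> 1, 4 -> 2), and any other value restores the previous setting and returns -EINVAL. Since register_sysctl_init() publishes the table under "kernel", the knob shows up as /proc/sys/kernel/sched_pelt_multiplier, e.g. "echo 4 > /proc/sys/kernel/sched_pelt_multiplier" makes PELT time advance four times faster (see the pelt.h changes below).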
diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
index 3a0e0dc..9b35b50 100644
--- a/kernel/sched/pelt.h
+++ b/kernel/sched/pelt.h
@@ -61,6 +61,14 @@ static inline void cfs_se_util_change(struct sched_avg *avg)
 	WRITE_ONCE(avg->util_est.enqueued, enqueued);
 }
 
+static inline u64 rq_clock_task_mult(struct rq *rq)
+{
+	lockdep_assert_rq_held(rq);
+	assert_clock_updated(rq);
+
+	return rq->clock_task_mult;
+}
+
 static inline u64 rq_clock_pelt(struct rq *rq)
 {
 	lockdep_assert_rq_held(rq);
@@ -72,7 +80,7 @@ static inline u64 rq_clock_pelt(struct rq *rq)
 /* The rq is idle, we can sync to clock_task */
 static inline void _update_idle_rq_clock_pelt(struct rq *rq)
 {
-	rq->clock_pelt  = rq_clock_task(rq);
+	rq->clock_pelt = rq_clock_task_mult(rq);
 
 	u64_u32_store(rq->clock_idle, rq_clock(rq));
 	/* Paired with smp_rmb in migrate_se_pelt_lag() */
@@ -121,6 +129,27 @@ static inline void update_rq_clock_pelt(struct rq *rq, s64 delta)
 	rq->clock_pelt += delta;
 }
 
+extern unsigned int sched_pelt_lshift;
+
+/*
+ * absolute time   |1      |2      |3      |4      |5      |6      |
+ * @ mult = 1      --------****************--------****************-
+ * @ mult = 2      --------********----------------********---------
+ * @ mult = 4      --------****--------------------****-------------
+ * clock task mult
+ * @ mult = 2      |   |   |2  |3  |   |   |   |   |5  |6  |   |   |
+ * @ mult = 4      | | | | |2|3| | | | | | | | | | |5|6| | | | | | |
+ *
+ */
+static inline void update_rq_clock_task_mult(struct rq *rq, s64 delta)
+{
+	delta <<= READ_ONCE(sched_pelt_lshift);
+
+	rq->clock_task_mult += delta;
+
+	update_rq_clock_pelt(rq, delta);
+}
+
 /*
  * When rq becomes idle, we have to check if it has lost idle time
  * because it was fully busy. A rq is fully used when the /Sum util_sum
@@ -147,7 +176,7 @@ static inline void update_idle_rq_clock_pelt(struct rq *rq)
 	 * rq's clock_task.
 	 */
 	if (util_sum >= divider)
-		rq->lost_idle_time += rq_clock_task(rq) - rq->clock_pelt;
+		rq->lost_idle_time += rq_clock_task_mult(rq) - rq->clock_pelt;
 
 	_update_idle_rq_clock_pelt(rq);
 }
@@ -218,13 +247,18 @@ update_irq_load_avg(struct rq *rq, u64 running)
 	return 0;
 }
 
-static inline u64 rq_clock_pelt(struct rq *rq)
+static inline u64 rq_clock_task_mult(struct rq *rq)
 {
 	return rq_clock_task(rq);
 }
 
+static inline u64 rq_clock_pelt(struct rq *rq)
+{
+	return rq_clock_task_mult(rq);
+}
+
 static inline void
-update_rq_clock_pelt(struct rq *rq, s64 delta) { }
+update_rq_clock_task_mult(struct rq *rq, s64 delta) { }
 
 static inline void
 update_idle_rq_clock_pelt(struct rq *rq) { }
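
A worked example of the multiplier, assuming the standard ~32 ms PELT half-life: with sched_pelt_lshift = 1, update_rq_clock_task_mult() doubles each delta before accumulating it into clock_task_mult and passing it on to update_rq_clock_pelt(), so a 4 ms tick advances PELT time by 8 ms and utilization signals ramp and decay as if the half-life were ~16 ms of wall time. The stub variants at the end keep rq_clock_task_mult() an alias for rq_clock_task(), making the multiplier a no-op in the stubbed configuration.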
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ed2a47e..3f02aa9 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -4,6 +4,8 @@
  * policies)
  */
 
+#include <trace/hooks/sched.h>
+
 int sched_rr_timeslice = RR_TIMESLICE;
 /* More than 4 hours if BW_SHIFT equals 20. */
 static const u64 max_rt_runtime = MAX_BW;
@@ -1020,6 +1022,13 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
 		if (likely(rt_b->rt_runtime)) {
 			rt_rq->rt_throttled = 1;
 			printk_deferred_once("sched: RT throttling activated\n");
+
+			trace_android_vh_dump_throttled_rt_tasks(
+				raw_smp_processor_id(),
+				rq_clock(rq_of_rt_rq(rt_rq)),
+				sched_rt_period(rt_rq),
+				runtime,
+				hrtimer_get_expires_ns(&rt_b->rt_period_timer));
 		} else {
 			/*
 			 * In case we did anyway, make it go away,
@@ -1528,6 +1537,27 @@ static void dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 	enqueue_top_rt_rq(&rq->rt);
 }
 
+#ifdef CONFIG_SMP
+static inline bool should_honor_rt_sync(struct rq *rq, struct task_struct *p,
+					bool sync)
+{
+	/*
+	 * If the waker is CFS, then an RT sync wakeup would preempt the waker
+	 * and force it to run for a likely small time after the RT wakee is
+	 * done. So, only honor RT sync wakeups from RT wakers.
+	 */
+	return sync && task_has_rt_policy(rq->curr) &&
+		p->prio <= rq->rt.highest_prio.next &&
+		rq->rt.rt_nr_running <= 2;
+}
+#else
+static inline bool should_honor_rt_sync(struct rq *rq, struct task_struct *p,
+					bool sync)
+{
+	return false;
+}
+#endif
+
 /*
  * Adding/removing a task to/from a priority array:
  */
@@ -1535,6 +1565,7 @@ static void
 enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct sched_rt_entity *rt_se = &p->rt;
+	bool sync = !!(flags & ENQUEUE_WAKEUP_SYNC);
 
 	if (flags & ENQUEUE_WAKEUP)
 		rt_se->timeout = 0;
@@ -1544,7 +1575,8 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
 
 	enqueue_rt_entity(rt_se, flags);
 
-	if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
+	if (!task_current(rq, p) && p->nr_cpus_allowed > 1 &&
+	    !should_honor_rt_sync(rq, p, sync))
 		enqueue_pushable_task(rq, p);
 }
 
@@ -1595,12 +1627,47 @@ static void yield_task_rt(struct rq *rq)
 #ifdef CONFIG_SMP
 static int find_lowest_rq(struct task_struct *task);
 
+#ifdef CONFIG_RT_SOFTIRQ_AWARE_SCHED
+/*
+ * Return whether the given cpu is currently non-preemptible
+ * while handling a potentially long softirq, or if the current
+ * task is likely to block preemptions soon because it is a
+ * ksoftirqd thread that is handling softirqs.
+ */
+static bool cpu_busy_with_softirqs(int cpu)
+{
+	u32 softirqs = per_cpu(active_softirqs, cpu) |
+		       __cpu_softirq_pending(cpu);
+
+	return softirqs & LONG_SOFTIRQ_MASK;
+}
+#else
+static bool cpu_busy_with_softirqs(int cpu)
+{
+	return false;
+}
+#endif /* CONFIG_RT_SOFTIRQ_AWARE_SCHED */
+
+static bool rt_task_fits_cpu(struct task_struct *p, int cpu)
+{
+	return rt_task_fits_capacity(p, cpu) && !cpu_busy_with_softirqs(cpu);
+}
+
 static int
 select_task_rq_rt(struct task_struct *p, int cpu, int flags)
 {
 	struct task_struct *curr;
 	struct rq *rq;
+	struct rq *this_cpu_rq;
 	bool test;
+	int target_cpu = -1;
+	bool sync = !!(flags & WF_SYNC);
+	int this_cpu;
+
+	trace_android_rvh_select_task_rq_rt(p, cpu, flags & 0xF,
+					flags, &target_cpu);
+	if (target_cpu >= 0)
+		return target_cpu;
 
 	/* For anything but wake ups, just return the task_cpu */
 	if (!(flags & (WF_TTWU | WF_FORK)))
@@ -1610,6 +1677,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
 
 	rcu_read_lock();
 	curr = READ_ONCE(rq->curr); /* unlocked access */
+	this_cpu = smp_processor_id();
+	this_cpu_rq = cpu_rq(this_cpu);
 
 	/*
 	 * If the current task on @p's runqueue is an RT task, then
@@ -1633,22 +1702,33 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
 	 * This test is optimistic, if we get it wrong the load-balancer
 	 * will have to sort it out.
 	 *
-	 * We take into account the capacity of the CPU to ensure it fits the
-	 * requirement of the task - which is only important on heterogeneous
-	 * systems like big.LITTLE.
+	 * We use rt_task_fits_cpu() to evaluate if the CPU is busy with
+	 * potentially long-running softirq work, as well as take into
+	 * account the capacity of the CPU to ensure it fits the
+	 * requirement of the task - which is only important on
+	 * heterogeneous systems like big.LITTLE.
 	 */
 	test = curr &&
 	       unlikely(rt_task(curr)) &&
 	       (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);
 
-	if (test || !rt_task_fits_capacity(p, cpu)) {
+	/*
+	 * Respect the sync flag as long as the task can run on this CPU.
+	 */
+	if (should_honor_rt_sync(this_cpu_rq, p, sync) &&
+	    cpumask_test_cpu(this_cpu, p->cpus_ptr)) {
+		cpu = this_cpu;
+		goto out_unlock;
+	}
+
+	if (test || !rt_task_fits_cpu(p, cpu)) {
 		int target = find_lowest_rq(p);
 
 		/*
 		 * Bail out if we were forcing a migration to find a better
 		 * fitting CPU but our search failed.
 		 */
-		if (!test && target != -1 && !rt_task_fits_capacity(p, target))
+		if (!test && target != -1 && !rt_task_fits_cpu(p, target))
 			goto out_unlock;
 
 		/*
@@ -1697,6 +1777,8 @@ static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
 static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
 {
 	if (!on_rt_rq(&p->rt) && need_pull_rt_task(rq, p)) {
+		int done = 0;
+
 		/*
 		 * This is OK, because current is on_cpu, which avoids it being
 		 * picked for load-balance and preemption/IRQs are still
@@ -1704,7 +1786,9 @@ static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
 		 * not yet started the picking loop.
 		 */
 		rq_unpin_lock(rq, rf);
-		pull_rt_task(rq);
+		trace_android_rvh_sched_balance_rt(rq, p, &done);
+		if (!done)
+			pull_rt_task(rq);
 		rq_repin_lock(rq, rf);
 	}
 
@@ -1856,7 +1940,7 @@ static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
  * Return the highest pushable rq's task, which is suitable to be executed
  * on the CPU, NULL otherwise
  */
-static struct task_struct *pick_highest_pushable_task(struct rq *rq, int cpu)
+struct task_struct *pick_highest_pushable_task(struct rq *rq, int cpu)
 {
 	struct plist_head *head = &rq->rt.pushable_tasks;
 	struct task_struct *p;
@@ -1871,6 +1955,7 @@ static struct task_struct *pick_highest_pushable_task(struct rq *rq, int cpu)
 
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(pick_highest_pushable_task);
 
 static DEFINE_PER_CPU(cpumask_var_t, local_cpu_mask);
 
@@ -1879,7 +1964,7 @@ static int find_lowest_rq(struct task_struct *task)
 	struct sched_domain *sd;
 	struct cpumask *lowest_mask = this_cpu_cpumask_var_ptr(local_cpu_mask);
 	int this_cpu = smp_processor_id();
-	int cpu      = task_cpu(task);
+	int cpu      = -1;
 	int ret;
 
 	/* Make sure the mask is initialized first */
@@ -1890,23 +1975,32 @@ static int find_lowest_rq(struct task_struct *task)
 		return -1; /* No other targets possible */
 
 	/*
-	 * If we're on asym system ensure we consider the different capacities
-	 * of the CPUs when searching for the lowest_mask.
+	 * If we're using the softirq optimization or if we are
+	 * on asym system, ensure we consider the softirq processing
+	 * or different capacities of the CPUs when searching for the
+	 * lowest_mask.
 	 */
-	if (sched_asym_cpucap_active()) {
+	if (IS_ENABLED(CONFIG_RT_SOFTIRQ_AWARE_SCHED) ||
+	    sched_asym_cpucap_active()) {
 
 		ret = cpupri_find_fitness(&task_rq(task)->rd->cpupri,
 					  task, lowest_mask,
-					  rt_task_fits_capacity);
+					  rt_task_fits_cpu);
 	} else {
 
 		ret = cpupri_find(&task_rq(task)->rd->cpupri,
 				  task, lowest_mask);
 	}
 
+	trace_android_rvh_find_lowest_rq(task, lowest_mask, ret, &cpu);
+	if (cpu >= 0)
+		return cpu;
+
 	if (!ret)
 		return -1; /* No targets found */
 
+	cpu = task_cpu(task);
+
 	/*
 	 * At this point we have built a mask of CPUs representing the
 	 * lowest priority tasks in the system.  Now we want to elect
@@ -2238,6 +2332,9 @@ static int rto_next_cpu(struct root_domain *rd)
 		/* When rto_cpu is -1 this acts like cpumask_first() */
 		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
 
+		/* This can be any CPU in rd->rto_mask, even a halted one; let the hook update it. */
+		trace_android_rvh_rto_next_cpu(rd->rto_cpu, rd->rto_mask, &cpu);
+
 		rd->rto_cpu = cpu;
 
 		if (cpu < nr_cpu_ids)
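
Two independent RT policies land in rt.c here. Sync wakeups: ENQUEUE_WAKEUP_SYNC (set from WF_SYNC in ttwu_do_activate() earlier) lets should_honor_rt_sync() keep an RT wakee local instead of marking it pushable, but only when the waker is itself RT, p->prio <= rq->rt.highest_prio.next, and at most two RT tasks are queued. Softirq awareness: under CONFIG_RT_SOFTIRQ_AWARE_SCHED, rt_task_fits_cpu() additionally rejects CPUs whose active or pending softirqs intersect LONG_SOFTIRQ_MASK, and find_lowest_rq() now routes through cpupri_find_fitness() whenever that config is enabled (not only on asymmetric-capacity systems) so the filter is applied; initializing cpu to -1 there also lets the find_lowest_rq vendor hook choose a CPU before task_cpu(task) is consulted.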
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d6d488e..9228036 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -68,6 +68,8 @@
 #include <linux/wait_api.h>
 #include <linux/wait_bit.h>
 #include <linux/workqueue_api.h>
+#include <linux/android_vendor.h>
+#include "android.h"
 
 #include <trace/events/power.h>
 #include <trace/events/sched.h>
@@ -416,6 +418,10 @@ struct task_group {
 	struct uclamp_se	uclamp_req[UCLAMP_CNT];
 	/* Effective clamp values used for a task group */
 	struct uclamp_se	uclamp[UCLAMP_CNT];
+	/* Latency-sensitive flag used for a task group */
+	unsigned int		latency_sensitive;
+
+	ANDROID_VENDOR_DATA_ARRAY(1, 4);
 #endif
 
 };
@@ -881,6 +887,8 @@ struct root_domain {
 	 * CPUs of the rd. Protected by RCU.
 	 */
 	struct perf_domain __rcu *pd;
+
+	ANDROID_VENDOR_DATA_ARRAY(1, 1);
 };
 
 extern void init_defrootdomain(void);
@@ -892,6 +900,7 @@ extern void sched_put_rd(struct root_domain *rd);
 #ifdef HAVE_RT_PUSH_IPI
 extern void rto_push_irq_work_func(struct irq_work *work);
 #endif
+extern struct task_struct *pick_highest_pushable_task(struct rq *rq, int cpu);
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_UCLAMP_TASK
@@ -1015,6 +1024,7 @@ struct rq {
 	u64			clock;
 	/* Ensure that all clocks are in the same cache line */
 	u64			clock_task ____cacheline_aligned;
+	u64			clock_task_mult;
 	u64			clock_pelt;
 	unsigned long		lost_idle_time;
 	u64			clock_pelt_idle;
@@ -1150,6 +1160,8 @@ struct rq {
 	unsigned int		core_forceidle_occupation;
 	u64			core_forceidle_start;
 #endif
+
+	ANDROID_VENDOR_DATA_ARRAY(1, 1);
 };
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
@@ -1550,6 +1562,11 @@ struct rq_flags {
 #endif
 };
 
+#ifdef CONFIG_SMP
+extern struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
+				 struct task_struct *p, int dest_cpu);
+#endif
+
 extern struct balance_callback balance_push_callback;
 
 /*
@@ -1716,8 +1733,6 @@ enum numa_faults_stats {
 };
 extern void sched_setnuma(struct task_struct *p, int node);
 extern int migrate_task_to(struct task_struct *p, int cpu);
-extern int migrate_swap(struct task_struct *p, struct task_struct *t,
-			int cpu, int scpu);
 extern void init_numa_balancing(unsigned long clone_flags, struct task_struct *p);
 #else
 static inline void
@@ -1728,6 +1743,9 @@ init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
 
 #ifdef CONFIG_SMP
 
+extern int migrate_swap(struct task_struct *p, struct task_struct *t,
+			int cpu, int scpu);
+
 static inline void
 queue_balance_callback(struct rq *rq,
 		       struct balance_callback *head,
@@ -2010,6 +2028,8 @@ static __always_inline bool static_branch_##name(struct static_key *key) \
 #undef SCHED_FEAT
 
 extern struct static_key sched_feat_keys[__SCHED_FEAT_NR];
+extern const char * const sched_feat_names[__SCHED_FEAT_NR];
+
 #define sched_feat(x) (static_branch_##x(&sched_feat_keys[__SCHED_FEAT_##x]))
 
 #else /* !CONFIG_JUMP_LABEL */
@@ -2084,6 +2104,8 @@ static inline int task_on_rq_migrating(struct task_struct *p)
 #define WF_SYNC     0x10 /* Waker goes to sleep after wakeup */
 #define WF_MIGRATED 0x20 /* Internal use, task got migrated */
 
+#define WF_ANDROID_VENDOR	0x1000 /* Vendor specific for Android */
+
 #ifdef CONFIG_SMP
 static_assert(WF_EXEC == SD_BALANCE_EXEC);
 static_assert(WF_FORK == SD_BALANCE_FORK);
@@ -2142,6 +2164,8 @@ extern const u32		sched_prio_to_wmult[40];
 #define ENQUEUE_MIGRATED	0x00
 #endif
 
+#define ENQUEUE_WAKEUP_SYNC	0x80
+
 #define RETRY_TASK		((void *)-1UL)
 
 struct sched_class {
@@ -2308,6 +2332,7 @@ static inline struct task_struct *get_push_task(struct rq *rq)
 
 extern int push_cpu_stop(void *arg);
 
+extern unsigned long __read_mostly max_load_balance_interval;
 #endif
 
 #ifdef CONFIG_CPU_IDLE
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8739c2a..c480058 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -3,6 +3,8 @@
  * Scheduler topology setup/handling methods
  */
 
+#include <trace/hooks/sched.h>
+
 DEFINE_MUTEX(sched_domains_mutex);
 
 /* Protected by sched_domains_mutex: */
@@ -347,8 +349,7 @@ static void sched_energy_set(bool has_eas)
  *    2. the SD_ASYM_CPUCAPACITY flag is set in the sched_domain hierarchy.
  *    3. no SMT is detected.
  *    4. the EM complexity is low enough to keep scheduling overheads low;
- *    5. schedutil is driving the frequency of all CPUs of the rd;
- *    6. frequency invariance support is present;
+ *    5. frequency invariance support is present;
  *
  * The complexity of the Energy Model is defined as:
  *
@@ -368,21 +369,23 @@ static void sched_energy_set(bool has_eas)
  */
 #define EM_MAX_COMPLEXITY 2048
 
-extern struct cpufreq_governor schedutil_gov;
 static bool build_perf_domains(const struct cpumask *cpu_map)
 {
 	int i, nr_pd = 0, nr_ps = 0, nr_cpus = cpumask_weight(cpu_map);
 	struct perf_domain *pd = NULL, *tmp;
 	int cpu = cpumask_first(cpu_map);
 	struct root_domain *rd = cpu_rq(cpu)->rd;
-	struct cpufreq_policy *policy;
-	struct cpufreq_governor *gov;
+	bool eas_check = false;
 
 	if (!sysctl_sched_energy_aware)
 		goto free;
 
-	/* EAS is enabled for asymmetric CPU capacity topologies. */
-	if (!per_cpu(sd_asym_cpucapacity, cpu)) {
+	/*
+	 * EAS is enabled for asymmetric CPU capacity topologies.
+	 * Allow vendor to override if desired.
+	 */
+	trace_android_rvh_build_perf_domains(&eas_check);
+	if (!per_cpu(sd_asym_cpucapacity, cpu) && !eas_check) {
 		if (sched_debug()) {
 			pr_info("rd %*pbl: CPUs do not have asymmetric capacities\n",
 					cpumask_pr_args(cpu_map));
@@ -410,19 +413,6 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
 		if (find_pd(pd, i))
 			continue;
 
-		/* Do not attempt EAS if schedutil is not being used. */
-		policy = cpufreq_cpu_get(i);
-		if (!policy)
-			goto free;
-		gov = policy->governor;
-		cpufreq_cpu_put(policy);
-		if (gov != &schedutil_gov) {
-			if (rd->pd)
-				pr_warn("rd %*pbl: Disabling EAS, schedutil is mandatory\n",
-						cpumask_pr_args(cpu_map));
-			goto free;
-		}
-
 		/* Create the new pd and add it to the local list. */
 		tmp = pd_init(i);
 		if (!tmp)
@@ -2388,6 +2378,7 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 		pr_info("root domain span: %*pbl (max cpu_capacity = %lu)\n",
 			cpumask_pr_args(cpu_map), rq->rd->max_cpu_capacity);
 	}
+	trace_android_vh_build_sched_domains(has_asym);
 
 	ret = 0;
 error:
diff --git a/kernel/sched/vendor_hooks.c b/kernel/sched/vendor_hooks.c
new file mode 100644
index 0000000..35d9846
--- /dev/null
+++ b/kernel/sched/vendor_hooks.c
@@ -0,0 +1,88 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* vendor_hook.c
+ *
+ * Copyright 2022 Google LLC
+ */
+#include <linux/sched/cputime.h>
+#include "sched.h"
+#include "pelt.h"
+#include "smp.h"
+
+#define CREATE_TRACE_POINTS
+#include <trace/hooks/vendor_hooks.h>
+#include <linux/tracepoint.h>
+#include <trace/hooks/sched.h>
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_select_task_rq_fair);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_select_task_rq_rt);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_select_fallback_rq);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_scheduler_tick);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_enqueue_task);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_dequeue_task);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_can_migrate_task);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_find_lowest_rq);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_prepare_prio_fork);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_finish_prio_fork);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_rtmutex_prepare_setprio);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_set_user_nice);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_setscheduler);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_find_busiest_group);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_dump_throttled_rt_tasks);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_jiffies_update);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_sched_newidle_balance);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_sched_nohz_balancer_kick);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_sched_rebalance_domains);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_find_busiest_queue);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_migrate_queued_task);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_cpu_overutilized);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_sched_setaffinity);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_update_cpus_allowed);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_build_sched_domains);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_check_preempt_tick);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_check_preempt_wakeup_ignore);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_replace_next_task_fair);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_sched_balance_rt);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_pick_next_entity);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_check_preempt_wakeup);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_set_cpus_allowed_by_task);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_free_task);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_uclamp_eff_get);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_after_enqueue_task);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_after_dequeue_task);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_enqueue_entity);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_dequeue_entity);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_entity_tick);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_enqueue_task_fair);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_dequeue_task_fair);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_util_est_update);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_setscheduler_uclamp);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_set_task_cpu);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_try_to_wake_up);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_try_to_wake_up_success);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_sched_fork);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_wake_up_new_task);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_new_task_stats);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_flush_task);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_tick_entry);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_schedule);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_sched_cpu_starting);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_sched_cpu_dying);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_account_irq);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_place_entity);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_build_perf_domains);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_update_cpu_capacity);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_update_misfit_status);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_rto_next_cpu);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_is_cpu_allowed);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_get_nohz_timer_target);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_sched_getaffinity);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_do_sched_yield);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_sched_fork_init);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_ttwu_cond);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_schedule_bug);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_sched_exec);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_update_topology_flags_workfn);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_update_thermal_stats);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_do_wake_up_sync);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_set_wake_flags);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_find_new_ilb);
+EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_find_energy_efficient_cpu);
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 9860bb9a..ba93b6c 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -4,6 +4,7 @@
  *
  * (C) 2004 Nadia Yvette Chambers, Oracle
  */
+#include <trace/hooks/sched.h>
 
 void __init_waitqueue_head(struct wait_queue_head *wq_head, const char *name, struct lock_class_key *key)
 {
@@ -198,10 +199,13 @@ EXPORT_SYMBOL_GPL(__wake_up_locked_key_bookmark);
 void __wake_up_sync_key(struct wait_queue_head *wq_head, unsigned int mode,
 			void *key)
 {
+	int wake_flags = WF_SYNC;
+
 	if (unlikely(!wq_head))
 		return;
 
-	__wake_up_common_lock(wq_head, mode, 1, WF_SYNC, key);
+	trace_android_vh_set_wake_flags(&wake_flags, &mode);
+	__wake_up_common_lock(wq_head, mode, 1, wake_flags, key);
 }
 EXPORT_SYMBOL_GPL(__wake_up_sync_key);
 
diff --git a/kernel/softirq.c b/kernel/softirq.c
index c8a6913..a63d8b8 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -33,6 +33,9 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/irq.h>
 
+EXPORT_TRACEPOINT_SYMBOL_GPL(irq_handler_entry);
+EXPORT_TRACEPOINT_SYMBOL_GPL(irq_handler_exit);
+
 /*
    - No shared variables, all the data are CPU local.
    - If a softirq needs serialization, let it serialize itself
@@ -59,6 +62,22 @@ EXPORT_PER_CPU_SYMBOL(irq_stat);
 static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
 
 DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
+EXPORT_PER_CPU_SYMBOL_GPL(ksoftirqd);
+
+#ifdef CONFIG_RT_SOFTIRQ_AWARE_SCHED
+/*
+ * active_softirqs -- a per-CPU mask of the softirqs currently being handled,
+ * with the expectation that approximate answers are acceptable, so no
+ * synchronization is used.
+ */
+DEFINE_PER_CPU(u32, active_softirqs);
+static inline void set_active_softirqs(u32 pending)
+{
+	__this_cpu_write(active_softirqs, pending);
+}
+#else /* CONFIG_RT_SOFTIRQ_AWARE_SCHED */
+static inline void set_active_softirqs(u32 pending) { }
+#endif /* CONFIG_RT_SOFTIRQ_AWARE_SCHED */
 
 const char * const softirq_to_name[NR_SOFTIRQS] = {
 	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "IRQ_POLL",
@@ -80,6 +99,7 @@ static void wakeup_softirqd(void)
 		wake_up_process(tsk);
 }
 
+#ifndef CONFIG_RT_SOFTIRQ_AWARE_SCHED
 /*
  * If ksoftirqd is scheduled, we do not want to process pending softirqs
  * right now. Let ksoftirqd handle this at its own rate, to get fairness,
@@ -94,6 +114,9 @@ static bool ksoftirqd_running(unsigned long pending)
 		return false;
 	return tsk && task_is_running(tsk) && !__kthread_should_park(tsk);
 }
+#else
+#define ksoftirqd_running(pending) (false)
+#endif /* CONFIG_RT_SOFTIRQ_AWARE_SCHED */
 
 #ifdef CONFIG_TRACE_IRQFLAGS
 DEFINE_PER_CPU(int, hardirqs_enabled);
@@ -525,6 +548,21 @@ static inline bool lockdep_softirq_start(void) { return false; }
 static inline void lockdep_softirq_end(bool in_hardirq) { }
 #endif
 
+#ifdef CONFIG_RT_SOFTIRQ_AWARE_SCHED
+static __u32 softirq_deferred_for_rt(__u32 *pending)
+{
+	__u32 deferred = 0;
+
+	if (rt_task(current)) {
+		deferred = *pending & LONG_SOFTIRQ_MASK;
+		*pending &= ~LONG_SOFTIRQ_MASK;
+	}
+	return deferred;
+}
+#else
+#define softirq_deferred_for_rt(x) (0)
+#endif
+
 asmlinkage __visible void __softirq_entry __do_softirq(void)
 {
 	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
@@ -532,6 +570,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	int max_restart = MAX_SOFTIRQ_RESTART;
 	struct softirq_action *h;
 	bool in_hardirq;
+	__u32 deferred;
 	__u32 pending;
 	int softirq_bit;
 
@@ -543,14 +582,17 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	current->flags &= ~PF_MEMALLOC;
 
 	pending = local_softirq_pending();
+	deferred = softirq_deferred_for_rt(&pending);
 
 	softirq_handle_begin();
+
 	in_hardirq = lockdep_softirq_start();
 	account_softirq_enter(current);
 
 restart:
 	/* Reset the pending bitmask before enabling irqs */
-	set_softirq_pending(0);
+	set_softirq_pending(deferred);
+	set_active_softirqs(pending);
 
 	local_irq_enable();
 
@@ -580,6 +622,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 		pending >>= softirq_bit;
 	}
 
+	set_active_softirqs(0);
 	if (!IS_ENABLED(CONFIG_PREEMPT_RT) &&
 	    __this_cpu_read(ksoftirqd) == current)
 		rcu_softirq_qs();
@@ -587,14 +630,17 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	local_irq_disable();
 
 	pending = local_softirq_pending();
+	deferred = softirq_deferred_for_rt(&pending);
+
 	if (pending) {
 		if (time_before(jiffies, end) && !need_resched() &&
 		    --max_restart)
 			goto restart;
-
-		wakeup_softirqd();
 	}
 
+	if (pending | deferred)
+		wakeup_softirqd();
+
 	account_softirq_exit(current);
 	lockdep_softirq_end(in_hardirq);
 	softirq_handle_end();
@@ -793,10 +839,15 @@ static void tasklet_action_common(struct softirq_action *a,
 		if (tasklet_trylock(t)) {
 			if (!atomic_read(&t->count)) {
 				if (tasklet_clear_sched(t)) {
-					if (t->use_callback)
+					if (t->use_callback) {
+						trace_tasklet_entry(t->callback);
 						t->callback(t);
-					else
+						trace_tasklet_exit(t->callback);
+					} else {
+						trace_tasklet_entry(t->func);
 						t->func(t->data);
+						trace_tasklet_exit(t->func);
+					}
 				}
 				tasklet_unlock(t);
 				continue;
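
The softirq changes above defer long-running softirq classes while an RT task is current: softirq_deferred_for_rt() strips them from the pending mask, the deferred bits are written back via set_softirq_pending(deferred), and ksoftirqd is woken for whatever remains. A compressed userspace sketch of that bit juggling, assuming a hypothetical mask value (the real LONG_SOFTIRQ_MASK is defined elsewhere in this patch series):

/*
 * Sketch of the RT-aware softirq deferral in __do_softirq(): classes
 * in LONG_SOFTIRQ_MASK are stripped from 'pending' while an RT task
 * runs, handed back via set_softirq_pending(), and ksoftirqd is woken
 * for them. The mask value below is illustrative only.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define LONG_SOFTIRQ_MASK 0x3cU		/* hypothetical long-running classes */

static bool current_is_rt;

static uint32_t softirq_deferred_for_rt(uint32_t *pending)
{
	uint32_t deferred = 0;

	if (current_is_rt) {
		deferred = *pending & LONG_SOFTIRQ_MASK;
		*pending &= ~LONG_SOFTIRQ_MASK;
	}
	return deferred;
}

int main(void)
{
	uint32_t pending = 0xffU, deferred;

	current_is_rt = true;
	deferred = softirq_deferred_for_rt(&pending);

	printf("handled inline: 0x%x, deferred: 0x%x\n", pending, deferred);
	if (pending | deferred)
		printf("wakeup_softirqd()\n");	/* ksoftirqd picks up the rest */
	return 0;
}
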
diff --git a/kernel/stacktrace.c b/kernel/stacktrace.c
index 9ed5ce9..af34650 100644
--- a/kernel/stacktrace.c
+++ b/kernel/stacktrace.c
@@ -151,6 +151,7 @@ unsigned int stack_trace_save_tsk(struct task_struct *tsk, unsigned long *store,
 	put_task_stack(tsk);
 	return c.len;
 }
+EXPORT_SYMBOL_GPL(stack_trace_save_tsk);
 
 /**
  * stack_trace_save_regs - Save a stack trace based on pt_regs into a storage array
@@ -174,6 +175,7 @@ unsigned int stack_trace_save_regs(struct pt_regs *regs, unsigned long *store,
 	arch_stack_walk(consume_entry, &c, current, regs);
 	return c.len;
 }
+EXPORT_SYMBOL_GPL(stack_trace_save_regs);
 
 #ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
 /**
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index cedb17b..feffcfd 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -152,6 +152,7 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
 	wait_for_completion(&done.completion);
 	return done.ret;
 }
+EXPORT_SYMBOL_GPL(stop_one_cpu);
 
 /* This controls the threads on each CPU. */
 enum multi_stop_state {
@@ -387,6 +388,7 @@ bool stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
 	*work_buf = (struct cpu_stop_work){ .fn = fn, .arg = arg, .caller = _RET_IP_, };
 	return cpu_stop_queue_work(cpu, work_buf);
 }
+EXPORT_SYMBOL_GPL(stop_one_cpu_nowait);
 
 static bool queue_stop_cpus_work(const struct cpumask *cpumask,
 				 cpu_stop_fn_t fn, void *arg,
diff --git a/kernel/sys.c b/kernel/sys.c
index 88b31f09..1c7b1ea 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -76,6 +76,8 @@
 
 #include "uid16.h"
 
+#include <trace/hooks/sys.h>
+
 #ifndef SET_UNALIGN_CTL
 # define SET_UNALIGN_CTL(a, b)	(-EINVAL)
 #endif
@@ -2632,6 +2634,7 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		error = -EINVAL;
 		break;
 	}
+	trace_android_vh_syscall_prctl_finished(option, me);
 	return error;
 }
 
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 8464c5a..2d945d3 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -17,6 +17,7 @@
 #include <linux/sched_clock.h>
 #include <linux/seqlock.h>
 #include <linux/bitops.h>
+#include <trace/hooks/epoch.h>
 
 #include "timekeeping.h"
 
@@ -269,6 +270,7 @@ int sched_clock_suspend(void)
 	update_sched_clock();
 	hrtimer_cancel(&sched_clock_timer);
 	rd->read_sched_clock = suspended_sched_clock_read;
+	trace_android_vh_show_suspend_epoch_val(rd->epoch_ns, rd->epoch_cyc);
 
 	return 0;
 }
@@ -280,6 +282,7 @@ void sched_clock_resume(void)
 	rd->epoch_cyc = cd.actual_read_sched_clock();
 	hrtimer_start(&sched_clock_timer, cd.wrap_kt, HRTIMER_MODE_REL_HARD);
 	rd->read_sched_clock = cd.actual_read_sched_clock;
+	trace_android_vh_show_resume_epoch_val(rd->epoch_cyc);
 }
 
 static struct syscore_ops sched_clock_ops = {
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 4678935..5c47cc5 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -17,6 +17,7 @@
 #include <linux/sched.h>
 #include <linux/module.h>
 #include <trace/events/power.h>
+#include <trace/hooks/sched.h>
 
 #include <asm/irq_regs.h>
 
@@ -95,6 +96,7 @@ static void tick_periodic(int cpu)
 		write_seqcount_end(&jiffies_seq);
 		raw_spin_unlock(&jiffies_lock);
 		update_wall_time();
+		trace_android_vh_jiffies_update(NULL);
 	}
 
 	update_process_times(user_mode(get_irq_regs()));
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index b0e3c920..1e67c83 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -26,6 +26,7 @@
 #include <linux/posix-timers.h>
 #include <linux/context_tracking.h>
 #include <linux/mm.h>
+#include <trace/hooks/sched.h>
 
 #include <asm/irq_regs.h>
 
@@ -195,8 +196,10 @@ static void tick_sched_do_timer(struct tick_sched *ts, ktime_t now)
 #endif
 
 	/* Check, if the jiffies need an update */
-	if (tick_do_timer_cpu == cpu)
+	if (tick_do_timer_cpu == cpu) {
 		tick_do_update_jiffies64(now);
+		trace_android_vh_jiffies_update(NULL);
+	}
 
 	/*
 	 * If jiffies update stalled for too long (timekeeper in stop_machine()
@@ -1248,6 +1251,7 @@ ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
 
 	return ktime_sub(next_event, now);
 }
+EXPORT_SYMBOL_GPL(tick_nohz_get_sleep_length);
 
 /**
  * tick_nohz_get_idle_calls_cpu - return the current idle calls counter value
@@ -1261,6 +1265,7 @@ unsigned long tick_nohz_get_idle_calls_cpu(int cpu)
 
 	return ts->idle_calls;
 }
+EXPORT_SYMBOL_GPL(tick_nohz_get_idle_calls_cpu);
 
 /**
  * tick_nohz_get_idle_calls - return the current idle calls counter value
diff --git a/kernel/time/time.c b/kernel/time/time.c
index 526257b..f2e92bd 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -686,6 +686,7 @@ u64 nsec_to_clock_t(u64 x)
 	return div_u64(x * 9, (9ull * NSEC_PER_SEC + (USER_HZ / 2)) / USER_HZ);
 #endif
 }
+EXPORT_SYMBOL_GPL(nsec_to_clock_t);
 
 u64 jiffies64_to_nsecs(u64 j)
 {
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index f72b9f1..a96d698 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1055,9 +1055,11 @@ noinstr time64_t __ktime_get_real_seconds(void)
 void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
+	u32 mono_mult, mono_shift;
 	unsigned int seq;
 	ktime_t base_raw;
 	ktime_t base_real;
+	ktime_t base_boot;
 	u64 nsec_raw;
 	u64 nsec_real;
 	u64 now;
@@ -1072,14 +1074,21 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
 		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
 		base_real = ktime_add(tk->tkr_mono.base,
 				      tk_core.timekeeper.offs_real);
+		base_boot = ktime_add(tk->tkr_mono.base,
+				      tk_core.timekeeper.offs_boot);
 		base_raw = tk->tkr_raw.base;
 		nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
 		nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
+		mono_mult = tk->tkr_mono.mult;
+		mono_shift = tk->tkr_mono.shift;
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
 	systime_snapshot->cycles = now;
 	systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
+	systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
 	systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
+	systime_snapshot->mono_shift = mono_shift;
+	systime_snapshot->mono_mult = mono_mult;
 }
 EXPORT_SYMBOL_GPL(ktime_get_snapshot);
 
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 717fcb9..3cd0eb3 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -56,6 +56,11 @@
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/timer.h>
+#undef CREATE_TRACE_POINTS
+#include <trace/hooks/timer.h>
+
+EXPORT_TRACEPOINT_SYMBOL_GPL(hrtimer_expire_entry);
+EXPORT_TRACEPOINT_SYMBOL_GPL(hrtimer_expire_exit);
 
 __visible u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES;
 
@@ -525,6 +530,7 @@ static inline unsigned calc_index(unsigned long expires, unsigned lvl,
 	 *
 	 * Round up with level granularity to prevent this.
 	 */
+	trace_android_vh_timer_calc_index(lvl, &expires);
 	expires = (expires >> LVL_SHIFT(lvl)) + 1;
 	*bucket_expiry = expires << LVL_SHIFT(lvl);
 	return LVL_OFFS(lvl) + (expires & LVL_MASK);
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index 21bb161..3ca551c 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -18,4 +18,6 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(suspend_resume);
 EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
 EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_frequency);
 EXPORT_TRACEPOINT_SYMBOL_GPL(powernv_throttle);
-
+EXPORT_TRACEPOINT_SYMBOL_GPL(device_pm_callback_start);
+EXPORT_TRACEPOINT_SYMBOL_GPL(device_pm_callback_end);
+EXPORT_TRACEPOINT_SYMBOL_GPL(clock_set_rate);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index b21bf14..0325277 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2008 Steven Rostedt <srostedt@redhat.com>
  */
 #include <linux/trace_recursion.h>
+#include <linux/ring_buffer_ext.h>
 #include <linux/trace_events.h>
 #include <linux/ring_buffer.h>
 #include <linux/trace_clock.h>
@@ -132,23 +133,6 @@ int ring_buffer_print_entry_header(struct trace_seq *s)
 /* Used for individual buffers (after the counter) */
 #define RB_BUFFER_OFF		(1 << 20)
 
-#define BUF_PAGE_HDR_SIZE offsetof(struct buffer_data_page, data)
-
-#define RB_EVNT_HDR_SIZE (offsetof(struct ring_buffer_event, array))
-#define RB_ALIGNMENT		4U
-#define RB_MAX_SMALL_DATA	(RB_ALIGNMENT * RINGBUF_TYPE_DATA_TYPE_LEN_MAX)
-#define RB_EVNT_MIN_SIZE	8U	/* two 32bit words */
-
-#ifndef CONFIG_HAVE_64BIT_ALIGNED_ACCESS
-# define RB_FORCE_8BYTE_ALIGNMENT	0
-# define RB_ARCH_ALIGNMENT		RB_ALIGNMENT
-#else
-# define RB_FORCE_8BYTE_ALIGNMENT	1
-# define RB_ARCH_ALIGNMENT		8U
-#endif
-
-#define RB_ALIGN_DATA		__aligned(RB_ARCH_ALIGNMENT)
-
 /* define RINGBUF_TYPE_DATA for 'case RINGBUF_TYPE_DATA:' */
 #define RINGBUF_TYPE_DATA 0 ... RINGBUF_TYPE_DATA_TYPE_LEN_MAX
 
@@ -291,10 +275,6 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_data);
 #define for_each_online_buffer_cpu(buffer, cpu)		\
 	for_each_cpu_and(cpu, buffer->cpumask, cpu_online_mask)
 
-#define TS_SHIFT	27
-#define TS_MASK		((1ULL << TS_SHIFT) - 1)
-#define TS_DELTA_TEST	(~TS_MASK)
-
 static u64 rb_event_time_stamp(struct ring_buffer_event *event)
 {
 	u64 ts;
@@ -311,12 +291,6 @@ static u64 rb_event_time_stamp(struct ring_buffer_event *event)
 /* Missed count stored at end */
 #define RB_MISSED_STORED	(1 << 30)
 
-struct buffer_data_page {
-	u64		 time_stamp;	/* page time stamp */
-	local_t		 commit;	/* write committed index */
-	unsigned char	 data[] RB_ALIGN_DATA;	/* data of buffer page */
-};
-
 /*
  * Note, the buffer_page list must be first. The buffer pages
  * are allocated in cache lines, which means that each buffer
@@ -364,18 +338,6 @@ static void free_buffer_page(struct buffer_page *bpage)
 	kfree(bpage);
 }
 
-/*
- * We need to fit the time_stamp delta into 27 bits.
- */
-static inline int test_time_stamp(u64 delta)
-{
-	if (delta & TS_DELTA_TEST)
-		return 1;
-	return 0;
-}
-
-#define BUF_PAGE_SIZE (PAGE_SIZE - BUF_PAGE_HDR_SIZE)
-
 /* Max payload is BUF_PAGE_SIZE - header (8bytes) */
 #define BUF_MAX_DATA_SIZE (BUF_PAGE_SIZE - (sizeof(u32) * 2))
 
@@ -555,6 +517,8 @@ struct trace_buffer {
 
 	struct rb_irq_work		irq_work;
 	bool				time_stamp_abs;
+
+	struct ring_buffer_ext_cb	*ext_cb;
 };
 
 struct ring_buffer_iter {
@@ -760,6 +724,16 @@ static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set)
 }
 #endif
 
+static inline bool has_ext_writer(struct trace_buffer *buffer)
+{
+	return !!buffer->ext_cb;
+}
+
+static inline bool rb_has_ext_writer(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	return has_ext_writer(cpu_buffer->buffer);
+}
+
 /*
  * Enable this to make sure that the event passed to
  * ring_buffer_event_time_stamp() is not committed and also
@@ -1895,6 +1869,26 @@ struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 }
 EXPORT_SYMBOL_GPL(__ring_buffer_alloc);
 
+struct trace_buffer *ring_buffer_alloc_ext(unsigned long size,
+					   struct ring_buffer_ext_cb *cb)
+{
+	struct trace_buffer *buffer;
+
+	if (!cb || !cb->update_footers || !cb->swap_reader)
+		return NULL;
+
+	buffer = ring_buffer_alloc(size, RB_FL_OVERWRITE);
+	if (!buffer)
+		return NULL;
+
+	WARN_ON(cpuhp_state_remove_instance(CPUHP_TRACE_RB_PREPARE,
+					    &buffer->node));
+	buffer->ext_cb = cb;
+	atomic_set(&buffer->record_disabled, 1);
+
+	return buffer;
+}
+
 /**
  * ring_buffer_free - free a ring buffer.
  * @buffer: the buffer to free.
@@ -1904,7 +1898,9 @@ ring_buffer_free(struct trace_buffer *buffer)
 {
 	int cpu;
 
-	cpuhp_state_remove_instance(CPUHP_TRACE_RB_PREPARE, &buffer->node);
+	if (!has_ext_writer(buffer))
+		cpuhp_state_remove_instance(CPUHP_TRACE_RB_PREPARE,
+					    &buffer->node);
 
 	for_each_buffer_cpu(buffer, cpu)
 		rb_free_cpu_buffer(buffer->buffers[cpu]);
@@ -2173,6 +2169,8 @@ int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
 	unsigned long nr_pages;
 	int cpu, err;
 
+	if (unlikely(has_ext_writer(buffer)))
+		return -EINVAL;
 	/*
 	 * Always succeed at resizing a non-existent buffer:
 	 */
@@ -3488,7 +3486,7 @@ static void check_buffer(struct ring_buffer_per_cpu *cpu_buffer,
 		return;
 
 	/*
-	 * If this interrupted another event, 
+	 * If this interrupted another event,
 	 */
 	if (atomic_inc_return(this_cpu_ptr(&checking)) != 1)
 		goto out;
@@ -3898,6 +3896,9 @@ void ring_buffer_discard_commit(struct trace_buffer *buffer,
 	struct ring_buffer_per_cpu *cpu_buffer;
 	int cpu;
 
+	if (unlikely(has_ext_writer(buffer)))
+		return;
+
 	/* The event is discarded regardless */
 	rb_event_discard(event);
 
@@ -4053,6 +4054,9 @@ EXPORT_SYMBOL_GPL(ring_buffer_record_disable);
  */
 void ring_buffer_record_enable(struct trace_buffer *buffer)
 {
+	if (unlikely(has_ext_writer(buffer)))
+		return;
+
 	atomic_dec(&buffer->record_disabled);
 }
 EXPORT_SYMBOL_GPL(ring_buffer_record_enable);
@@ -4096,6 +4100,9 @@ void ring_buffer_record_on(struct trace_buffer *buffer)
 	unsigned int rd;
 	unsigned int new_rd;
 
+	if (unlikely(has_ext_writer(buffer)))
+		return;
+
 	do {
 		rd = atomic_read(&buffer->record_disabled);
 		new_rd = rd & ~RB_BUFFER_OFF;
@@ -4540,49 +4547,119 @@ rb_update_iter_read_stamp(struct ring_buffer_iter *iter,
 	return;
 }
 
-static struct buffer_page *
-rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
+static void __set_head_page_flag(struct buffer_page *head, int flag)
 {
-	struct buffer_page *reader = NULL;
-	unsigned long overwrite;
-	unsigned long flags;
-	int nr_loops = 0;
-	int ret;
+	struct list_head *prev = head->list.prev;
 
-	local_irq_save(flags);
-	arch_spin_lock(&cpu_buffer->lock);
+	prev->next = (struct list_head *)(((unsigned long)prev->next & ~RB_FLAG_MASK) | flag);
+}
 
- again:
+static int __read_footer_reader_status(struct buffer_page *bpage)
+{
+	struct rb_ext_page_footer *footer = rb_ext_page_get_footer(bpage->page);
+
+	return atomic_read(&footer->reader_status);
+}
+
+static int __read_footer_writer_status(struct buffer_page *bpage)
+{
+	struct rb_ext_page_footer *footer = rb_ext_page_get_footer(bpage->page);
+
+	return atomic_read(&footer->writer_status);
+}
+
+static struct buffer_page *
+ring_buffer_search_footer(struct buffer_page *start, unsigned long flag)
+{
+	bool search_writer = flag == RB_PAGE_FT_COMMIT;
+	struct buffer_page *bpage = start;
+	unsigned long status;
+	int cnt = 0;
+again:
+	do {
+		status = search_writer ? __read_footer_writer_status(bpage) :
+					 __read_footer_reader_status(bpage);
+		if (flag & status)
+			return bpage;
+
+		rb_inc_page(&bpage);
+	} while (bpage != start);
+
 	/*
-	 * This should normally only loop twice. But because the
-	 * start of the reader inserts an empty page, it causes
-	 * a case where we will loop three times. There should be no
-	 * reason to loop four times (that I know of).
+	 * There's a chance the writer is in the middle of moving the flag and
+	 * we might not find anything after a first round. Let's try again.
 	 */
-	if (RB_WARN_ON(cpu_buffer, ++nr_loops > 3)) {
-		reader = NULL;
-		goto out;
+	if (cnt++ < 3)
+		goto again;
+
+	return NULL;
+}
+
+static struct buffer_page *
+noinline rb_swap_reader_page_ext(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	struct buffer_page *new_reader, *new_rb_page, *new_head;
+	struct rb_ext_page_footer *footer;
+	unsigned long overrun;
+
+	if (cpu_buffer->buffer->ext_cb->swap_reader(cpu_buffer->cpu)) {
+		WARN_ON(1);
+		return NULL;
 	}
 
-	reader = cpu_buffer->reader_page;
+	new_rb_page = cpu_buffer->reader_page;
 
-	/* If there's more to read, return this page */
-	if (cpu_buffer->reader_page->read < rb_page_size(reader))
-		goto out;
+	/*
+	 * Find what page is the new reader... starting with the latest known
+	 * head.
+	 */
+	new_reader = ring_buffer_search_footer(cpu_buffer->head_page,
+					       RB_PAGE_FT_READER);
+	if (!new_reader) {
+		WARN_ON(1);
+		return NULL;
+	}
 
-	/* Never should we have an index greater than the size */
-	if (RB_WARN_ON(cpu_buffer,
-		       cpu_buffer->reader_page->read > rb_page_size(reader)))
-		goto out;
+	/* ... and install it into the ring buffer in place of the old head */
+	rb_list_head_clear(&new_reader->list);
+	new_rb_page->list.next = new_reader->list.next;
+	new_rb_page->list.prev = new_reader->list.prev;
+	new_rb_page->list.next->prev = &new_rb_page->list;
+	new_rb_page->list.prev->next = &new_rb_page->list;
 
-	/* check if we caught up to the tail */
-	reader = NULL;
-	if (cpu_buffer->commit_page == cpu_buffer->reader_page)
-		goto out;
+	cpu_buffer->reader_page = new_reader;
+	cpu_buffer->reader_page->read = 0;
 
-	/* Don't bother swapping if the ring buffer is empty */
-	if (rb_num_of_entries(cpu_buffer) == 0)
-		goto out;
+	/* Install the new head page */
+	new_head = new_rb_page;
+	rb_inc_page(&new_head);
+	cpu_buffer->head_page = new_head;
+
+	/*
+	 * cpu_buffer->pages just needs to point to the buffer; it
+	 * has no specific buffer page to point to. Let's move it out
+	 * of our way so we don't accidentally swap it.
+	 */
+	cpu_buffer->pages = &new_head->list;
+
+	__set_head_page_flag(new_head, RB_PAGE_HEAD);
+
+	footer = rb_ext_page_get_footer(new_reader->page);
+	overrun = footer->stats.overrun;
+	if (overrun != cpu_buffer->last_overrun) {
+		cpu_buffer->lost_events = overrun - cpu_buffer->last_overrun;
+		cpu_buffer->last_overrun = overrun;
+	}
+
+	return new_reader;
+}
+
+static struct buffer_page *
+rb_swap_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	struct buffer_page *reader;
+	unsigned long overwrite;
+	int ret;
 
 	/*
 	 * Reset the reader page to size zero.
@@ -4598,7 +4675,8 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 	 */
 	reader = rb_set_head_page(cpu_buffer);
 	if (!reader)
-		goto out;
+		return NULL;
+
 	cpu_buffer->reader_page->list.next = rb_list_head(reader->list.next);
 	cpu_buffer->reader_page->list.prev = reader->list.prev;
 
@@ -4662,7 +4740,60 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 		cpu_buffer->last_overrun = overwrite;
 	}
 
-	goto again;
+	return reader;
+}
+
+static struct buffer_page *
+rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	struct buffer_page *reader = NULL;
+	unsigned long flags;
+	int nr_loops = 0;
+	unsigned int page_size;
+
+	local_irq_save(flags);
+	arch_spin_lock(&cpu_buffer->lock);
+
+ again:
+	/*
+	 * This should normally only loop twice. But because the
+	 * start of the reader inserts an empty page, it causes
+	 * a case where we will loop three times. There should be no
+	 * reason to loop four times (that I know of).
+	 */
+	if (RB_WARN_ON(cpu_buffer, ++nr_loops > 3)) {
+		reader = NULL;
+		goto out;
+	}
+
+	reader = cpu_buffer->reader_page;
+
+	/* If there's more to read, return this page */
+	if (cpu_buffer->reader_page->read < rb_page_size(reader))
+		goto out;
+
+	page_size = rb_page_size(reader);
+	/* Never should we have an index greater than the size */
+	if (RB_WARN_ON(cpu_buffer,
+		       cpu_buffer->reader_page->read > page_size))
+		goto out;
+
+	/* check if we caught up to the tail */
+	reader = NULL;
+	if (cpu_buffer->commit_page == cpu_buffer->reader_page)
+		goto out;
+
+	/* Don't bother swapping if the ring buffer is empty */
+	if (rb_num_of_entries(cpu_buffer) == 0)
+		goto out;
+
+	if (rb_has_ext_writer(cpu_buffer))
+		reader = rb_swap_reader_page_ext(cpu_buffer);
+	else
+		reader = rb_swap_reader_page(cpu_buffer);
+
+	if (reader)
+		goto again;
 
  out:
 	/* Update the read_stamp on the first event */
@@ -5082,6 +5213,73 @@ ring_buffer_consume(struct trace_buffer *buffer, int cpu, u64 *ts,
 }
 EXPORT_SYMBOL_GPL(ring_buffer_consume);
 
+static void ring_buffer_update_view(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	struct rb_ext_page_footer *footer;
+	struct buffer_page *bpage;
+
+	if (!rb_has_ext_writer(cpu_buffer))
+		return;
+
+	raw_spin_lock_irq(&cpu_buffer->reader_lock);
+	arch_spin_lock(&cpu_buffer->lock);
+
+	cpu_buffer->buffer->ext_cb->update_footers(cpu_buffer->cpu);
+
+	bpage = cpu_buffer->reader_page;
+	footer = rb_ext_page_get_footer(bpage->page);
+
+	local_set(&cpu_buffer->entries, footer->stats.entries);
+	local_set(&cpu_buffer->pages_touched, footer->stats.pages_touched);
+	local_set(&cpu_buffer->overrun, footer->stats.overrun);
+
+	/* Update the commit page */
+	bpage = ring_buffer_search_footer(cpu_buffer->commit_page,
+					  RB_PAGE_FT_COMMIT);
+	if (!bpage) {
+		WARN_ON(1);
+		goto unlock;
+	}
+	cpu_buffer->commit_page = bpage;
+
+	/* Update the head page */
+	bpage = ring_buffer_search_footer(cpu_buffer->head_page,
+					  RB_PAGE_FT_HEAD);
+	if (!bpage) {
+		WARN_ON(1);
+		goto unlock;
+	}
+
+	/* Reset the previous RB_PAGE_HEAD flag */
+	__set_head_page_flag(cpu_buffer->head_page, RB_PAGE_NORMAL);
+
+	/* Set RB_PAGE_HEAD flag pointing to the new head */
+	__set_head_page_flag(bpage, RB_PAGE_HEAD);
+
+	cpu_buffer->reader_page->list.next = &cpu_buffer->head_page->list;
+
+	cpu_buffer->head_page = bpage;
+
+unlock:
+	arch_spin_unlock(&cpu_buffer->lock);
+	raw_spin_unlock_irq(&cpu_buffer->reader_lock);
+}
+
+int ring_buffer_poke(struct trace_buffer *buffer, int cpu)
+{
+	struct ring_buffer_per_cpu *cpu_buffer;
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return -EINVAL;
+
+	cpu_buffer = buffer->buffers[cpu];
+
+	ring_buffer_update_view(cpu_buffer);
+	rb_wakeups(buffer, cpu_buffer);
+
+	return 0;
+}
+
 /**
  * ring_buffer_read_prepare - Prepare for a non consuming read of the buffer
  * @buffer: The ring buffer to read from
@@ -5128,6 +5326,8 @@ ring_buffer_read_prepare(struct trace_buffer *buffer, int cpu, gfp_t flags)
 
 	atomic_inc(&cpu_buffer->resize_disabled);
 
+	ring_buffer_update_view(cpu_buffer);
+
 	return iter;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_read_prepare);
@@ -5488,6 +5688,9 @@ int ring_buffer_swap_cpu(struct trace_buffer *buffer_a,
 	struct ring_buffer_per_cpu *cpu_buffer_b;
 	int ret = -EINVAL;
 
+	if (unlikely(has_ext_writer(buffer_a) || has_ext_writer(buffer_b)))
+		return -EINVAL;
+
 	if (!cpumask_test_cpu(cpu, buffer_a->cpumask) ||
 	    !cpumask_test_cpu(cpu, buffer_b->cpumask))
 		goto out;
@@ -5788,14 +5991,21 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
 		cpu_buffer->read += rb_page_entries(reader);
 		cpu_buffer->read_bytes += BUF_PAGE_SIZE;
 
-		/* swap the pages */
-		rb_init_page(bpage);
-		bpage = reader->page;
-		reader->page = *data_page;
-		local_set(&reader->write, 0);
-		local_set(&reader->entries, 0);
-		reader->read = 0;
-		*data_page = bpage;
+		if (unlikely(has_ext_writer(buffer))) {
+			u64 commit = local_read(&reader->page->commit);
+
+			memcpy(bpage, reader->page,
+			       BUF_PAGE_HDR_SIZE + commit);
+		} else {
+			/* swap the pages */
+			rb_init_page(bpage);
+			bpage = reader->page;
+			reader->page = *data_page;
+			local_set(&reader->write, 0);
+			local_set(&reader->entries, 0);
+			reader->read = 0;
+			*data_page = bpage;
+		}
 
 		/*
 		 * Use the real_end for the data size,
@@ -5883,6 +6093,74 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node)
 	return 0;
 }
 
+#define TRACE_BUFFER_PACK_HDR_SIZE offsetof(struct trace_buffer_pack, __data)
+#define RING_BUFFER_PACK_HDR_SIZE offsetof(struct ring_buffer_pack, page_va)
+
+size_t trace_buffer_pack_size(struct trace_buffer *trace_buffer)
+{
+	size_t size = 0;
+	int cpu;
+
+	for_each_buffer_cpu(trace_buffer, cpu) {
+		struct ring_buffer_per_cpu *rb = trace_buffer->buffers[cpu];
+		size += rb->nr_pages * sizeof(unsigned long);
+		size += RING_BUFFER_PACK_HDR_SIZE;
+	}
+
+	size += TRACE_BUFFER_PACK_HDR_SIZE;
+
+	return size;
+}
+
+int trace_buffer_pack(struct trace_buffer *trace_buffer,
+		      struct trace_buffer_pack *pack)
+{
+	struct ring_buffer_pack *cpu_pack;
+	int cpu = -1, pack_cpu, j;
+
+	if (!has_ext_writer(trace_buffer))
+		return -EINVAL;
+
+	pack->nr_cpus = cpumask_weight(trace_buffer->cpumask);
+	pack->total_pages = 0;
+
+	for_each_ring_buffer_pack(cpu_pack, pack_cpu, pack) {
+		struct ring_buffer_per_cpu *rb;
+		unsigned long flags, nr_pages;
+		struct buffer_page *bpage;
+
+		cpu = cpumask_next(cpu, trace_buffer->cpumask);
+		if (cpu >= nr_cpu_ids) {
+			WARN_ON(1);
+			break;
+		}
+
+		rb = trace_buffer->buffers[cpu];
+
+		local_irq_save(flags);
+		arch_spin_lock(&rb->lock);
+
+		bpage = rb->head_page;
+		nr_pages = rb->nr_pages;
+
+		pack->total_pages += nr_pages + 1;
+
+		cpu_pack->cpu = cpu;
+		cpu_pack->reader_page_va = (unsigned long)rb->reader_page->page;
+		cpu_pack->nr_pages = nr_pages;
+
+		for (j = 0; j < nr_pages; j++) {
+			cpu_pack->page_va[j] = (unsigned long)bpage->page;
+			rb_inc_page(&bpage);
+		}
+
+		arch_spin_unlock(&rb->lock);
+		local_irq_restore(flags);
+	}
+
+	return 0;
+}
+
 #ifdef CONFIG_RING_BUFFER_STARTUP_TEST
 /*
  * This is a basic integrity check of the ring buffer.
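
With an external writer attached, the reader cannot trust its own bookkeeping; ring_buffer_search_footer() instead walks the circular page list looking for the flag the writer published in each page footer, retrying a few full rounds in case the writer is mid-move. A standalone sketch of that bounded circular scan, with a plain array standing in for the linked buffer pages:

/*
 * Sketch of ring_buffer_search_footer(): walk a circular list of
 * pages looking for a status flag published by an external writer,
 * retrying a bounded number of rounds since the writer may be moving
 * the flag concurrently.
 */
#include <stdio.h>

#define NR_PAGES 8

static unsigned long footer_status[NR_PAGES];	/* written by the "writer" */

static int search_footer(int start, unsigned long flag)
{
	int cnt, i;

	for (cnt = 0; cnt < 3; cnt++) {		/* bounded retries */
		i = start;
		do {
			if (footer_status[i] & flag)
				return i;
			i = (i + 1) % NR_PAGES;	/* rb_inc_page() */
		} while (i != start);
	}
	return -1;				/* writer never settled */
}

int main(void)
{
	footer_status[5] = 0x2;	/* pretend the writer flagged page 5 */
	printf("found flag on page %d\n", search_footer(0, 0x2));
	return 0;
}
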
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 546e84ae..3fe42bc 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -49,6 +49,7 @@
 #include <linux/fsnotify.h>
 #include <linux/irq_work.h>
 #include <linux/workqueue.h>
+#include <trace/hooks/ftrace_dump.h>
 
 #include "trace.h"
 #include "trace_output.h"
@@ -9874,8 +9875,17 @@ fs_initcall(tracer_init_tracefs);
 static int trace_panic_handler(struct notifier_block *this,
 			       unsigned long event, void *unused)
 {
+	bool ftrace_check = false;
+
+	trace_android_vh_ftrace_oops_enter(&ftrace_check);
+
+	if (ftrace_check)
+		return NOTIFY_OK;
+
 	if (ftrace_dump_on_oops)
 		ftrace_dump(ftrace_dump_on_oops);
+
+	trace_android_vh_ftrace_oops_exit(&ftrace_check);
 	return NOTIFY_OK;
 }
 
@@ -9889,6 +9899,13 @@ static int trace_die_handler(struct notifier_block *self,
 			     unsigned long val,
 			     void *data)
 {
+	bool ftrace_check = false;
+
+	trace_android_vh_ftrace_oops_enter(&ftrace_check);
+
+	if (ftrace_check)
+		return NOTIFY_OK;
+
 	switch (val) {
 	case DIE_OOPS:
 		if (ftrace_dump_on_oops)
@@ -9897,6 +9914,8 @@ static int trace_die_handler(struct notifier_block *self,
 	default:
 		break;
 	}
+
+	trace_android_vh_ftrace_oops_exit(&ftrace_check);
 	return NOTIFY_OK;
 }
 
@@ -9921,6 +9940,8 @@ static struct notifier_block trace_die_notifier = {
 void
 trace_printk_seq(struct trace_seq *s)
 {
+	bool dump_printk = true;
+
 	/* Probably should print a warning here. */
 	if (s->seq.len >= TRACE_MAX_PRINT)
 		s->seq.len = TRACE_MAX_PRINT;
@@ -9936,7 +9957,9 @@ trace_printk_seq(struct trace_seq *s)
 	/* should be zero ended, but we are paranoid. */
 	s->buffer[s->seq.len] = 0;
 
-	printk(KERN_TRACE "%s", s->buffer);
+	trace_android_vh_ftrace_dump_buffer(s, &dump_printk);
+	if (dump_printk)
+		printk(KERN_TRACE "%s", s->buffer);
 
 	trace_seq_init(s);
 }
@@ -9975,6 +9998,8 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 	unsigned int old_userobj;
 	unsigned long flags;
 	int cnt = 0, cpu;
+	bool ftrace_check = false;
+	unsigned long size;
 
 	/* Only allow one dump user at a time. */
 	if (atomic_inc_return(&dump_running) != 1) {
@@ -9999,6 +10024,8 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 
 	for_each_tracing_cpu(cpu) {
 		atomic_inc(&per_cpu_ptr(iter.array_buffer->data, cpu)->disabled);
+		size = ring_buffer_size(iter.array_buffer->buffer, cpu);
+		trace_android_vh_ftrace_size_check(size, &ftrace_check);
 	}
 
 	old_userobj = tr->trace_flags & TRACE_ITER_SYM_USEROBJ;
@@ -10006,6 +10033,9 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 	/* don't look at user memory in panic mode */
 	tr->trace_flags &= ~TRACE_ITER_SYM_USEROBJ;
 
+	if (ftrace_check)
+		goto out_enable;
+
 	switch (oops_dump_mode) {
 	case DUMP_ALL:
 		iter.cpu_file = RING_BUFFER_ALL_CPUS;
@@ -10036,6 +10066,7 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 	 */
 
 	while (!trace_empty(&iter)) {
+		ftrace_check = true;
 
 		if (!cnt)
 			printk(KERN_TRACE "---------------------------------\n");
@@ -10043,7 +10074,9 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 		cnt++;
 
 		trace_iterator_reset(&iter);
-		iter.iter_flags |= TRACE_FILE_LAT_FMT;
+		trace_android_vh_ftrace_format_check(&ftrace_check);
+		if (ftrace_check)
+			iter.iter_flags |= TRACE_FILE_LAT_FMT;
 
 		if (trace_find_next_entry_inc(&iter) != NULL) {
 			int ret;
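
ftrace_dump() keeps its "only one dump user at a time" rule with an atomic increment-and-test on dump_running; the loser decrements back out and returns. A C11 sketch of the same gate, with stdatomic standing in for the kernel's atomic_t:

/*
 * Sketch of the single-dumper gate in ftrace_dump(): atomic
 * increment-and-test, with the loser decrementing back out.
 */
#include <stdatomic.h>
#include <stdio.h>

static atomic_int dump_running;

static int try_dump(const char *who)
{
	if (atomic_fetch_add(&dump_running, 1) + 1 != 1) {
		atomic_fetch_sub(&dump_running, 1);	/* lost the race */
		printf("%s: dump already running\n", who);
		return -1;
	}
	printf("%s: dumping\n", who);
	/* ... dump work ... */
	atomic_fetch_sub(&dump_running, 1);		/* release */
	return 0;
}

int main(void)
{
	try_dump("panic path");			/* wins */
	atomic_fetch_add(&dump_running, 1);	/* simulate a dump in flight */
	try_dump("oops path");			/* backs off */
	return 0;
}
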
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
index 1e130da..070befb 100644
--- a/kernel/trace/trace_preemptirq.c
+++ b/kernel/trace/trace_preemptirq.c
@@ -14,6 +14,8 @@
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/preemptirq.h>
+#undef CREATE_TRACE_POINTS
+#include <trace/hooks/preemptirq.h>
 
 #ifdef CONFIG_TRACE_IRQFLAGS
 /* Per-cpu variable to prevent redundant calls when IRQs already off */
@@ -28,8 +30,11 @@ static DEFINE_PER_CPU(int, tracing_irq_cpu);
 void trace_hardirqs_on_prepare(void)
 {
 	if (this_cpu_read(tracing_irq_cpu)) {
-		if (!in_nmi())
+		if (!in_nmi()) {
 			trace_irq_enable(CALLER_ADDR0, CALLER_ADDR1);
+			trace_android_rvh_irqs_enable(CALLER_ADDR0,
+						      CALLER_ADDR1);
+		}
 		tracer_hardirqs_on(CALLER_ADDR0, CALLER_ADDR1);
 		this_cpu_write(tracing_irq_cpu, 0);
 	}
@@ -40,8 +45,11 @@ NOKPROBE_SYMBOL(trace_hardirqs_on_prepare);
 void trace_hardirqs_on(void)
 {
 	if (this_cpu_read(tracing_irq_cpu)) {
-		if (!in_nmi())
+		if (!in_nmi()) {
 			trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+			trace_android_rvh_irqs_enable(CALLER_ADDR0,
+						      CALLER_ADDR1);
+		}
 		tracer_hardirqs_on(CALLER_ADDR0, CALLER_ADDR1);
 		this_cpu_write(tracing_irq_cpu, 0);
 	}
@@ -63,8 +71,11 @@ void trace_hardirqs_off_finish(void)
 	if (!this_cpu_read(tracing_irq_cpu)) {
 		this_cpu_write(tracing_irq_cpu, 1);
 		tracer_hardirqs_off(CALLER_ADDR0, CALLER_ADDR1);
-		if (!in_nmi())
+		if (!in_nmi()) {
 			trace_irq_disable(CALLER_ADDR0, CALLER_ADDR1);
+			trace_android_rvh_irqs_disable(CALLER_ADDR0,
+						       CALLER_ADDR1);
+		}
 	}
 
 }
@@ -78,8 +89,11 @@ void trace_hardirqs_off(void)
 	if (!this_cpu_read(tracing_irq_cpu)) {
 		this_cpu_write(tracing_irq_cpu, 1);
 		tracer_hardirqs_off(CALLER_ADDR0, CALLER_ADDR1);
-		if (!in_nmi())
+		if (!in_nmi()) {
 			trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+			trace_android_rvh_irqs_disable(CALLER_ADDR0,
+						       CALLER_ADDR1);
+		}
 	}
 }
 EXPORT_SYMBOL(trace_hardirqs_off);
@@ -88,8 +102,11 @@ NOKPROBE_SYMBOL(trace_hardirqs_off);
 __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
 {
 	if (this_cpu_read(tracing_irq_cpu)) {
-		if (!in_nmi())
+		if (!in_nmi()) {
 			trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
+			trace_android_rvh_irqs_enable(CALLER_ADDR0,
+						      caller_addr);
+		}
 		tracer_hardirqs_on(CALLER_ADDR0, caller_addr);
 		this_cpu_write(tracing_irq_cpu, 0);
 	}
@@ -107,8 +124,11 @@ __visible void trace_hardirqs_off_caller(unsigned long caller_addr)
 	if (!this_cpu_read(tracing_irq_cpu)) {
 		this_cpu_write(tracing_irq_cpu, 1);
 		tracer_hardirqs_off(CALLER_ADDR0, caller_addr);
-		if (!in_nmi())
+		if (!in_nmi()) {
 			trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
+			trace_android_rvh_irqs_disable(CALLER_ADDR0,
+						       caller_addr);
+		}
 	}
 }
 EXPORT_SYMBOL(trace_hardirqs_off_caller);
@@ -119,15 +139,19 @@ NOKPROBE_SYMBOL(trace_hardirqs_off_caller);
 
 void trace_preempt_on(unsigned long a0, unsigned long a1)
 {
-	if (!in_nmi())
+	if (!in_nmi()) {
 		trace_preempt_enable_rcuidle(a0, a1);
+		trace_android_rvh_preempt_enable(a0, a1);
+	}
 	tracer_preempt_on(a0, a1);
 }
 
 void trace_preempt_off(unsigned long a0, unsigned long a1)
 {
-	if (!in_nmi())
+	if (!in_nmi()) {
 		trace_preempt_disable_rcuidle(a0, a1);
+		trace_android_rvh_preempt_disable(a0, a1);
+	}
 	tracer_preempt_off(a0, a1);
 }
 #endif
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index f23144a..a644ecc 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -785,3 +785,82 @@ void syscall_unregfunc(void)
 	}
 }
 #endif
+
+#ifdef CONFIG_ANDROID_VENDOR_HOOKS
+
+static void *rvh_zalloc_funcs(int count)
+{
+	return kzalloc(sizeof(struct tracepoint_func) * count, GFP_KERNEL);
+}
+
+#define ANDROID_RVH_NR_PROBES_MAX	2
+static int rvh_func_add(struct tracepoint *tp, struct tracepoint_func *func)
+{
+	int i;
+
+	if (!static_key_enabled(&tp->key)) {
+		/* '+ 1' for the last NULL element */
+		tp->funcs = rvh_zalloc_funcs(ANDROID_RVH_NR_PROBES_MAX + 1);
+		if (!tp->funcs)
+			return -ENOMEM;
+	}
+
+	for (i = 0; i < ANDROID_RVH_NR_PROBES_MAX; i++) {
+		if (!tp->funcs[i].func) {
+			if (!static_key_enabled(&tp->key))
+				tp->funcs[i].data = func->data;
+			WRITE_ONCE(tp->funcs[i].func, func->func);
+
+			return 0;
+		}
+	}
+
+	return -EBUSY;
+}
+
+static int android_rvh_add_func(struct tracepoint *tp, struct tracepoint_func *func)
+{
+	int ret;
+
+	if (tp->regfunc && !static_key_enabled(&tp->key)) {
+		ret = tp->regfunc();
+		if (ret < 0)
+			return ret;
+	}
+
+	ret = rvh_func_add(tp, func);
+	if (ret)
+		return ret;
+	tracepoint_update_call(tp, tp->funcs);
+	static_key_enable(&tp->key);
+
+	return 0;
+}
+
+int android_rvh_probe_register(struct tracepoint *tp, void *probe, void *data)
+{
+	struct tracepoint_func tp_func;
+	int ret;
+
+	/*
+	 * Once the static key has been flipped, the array may be read
+	 * concurrently. Although __traceiter_*() always checks .func first,
+	 * it doesn't enforce read->read dependencies, and we can't strongly
+	 * guarantee it will see the correct .data for the second element
+	 * without adding smp_load_acquire() in the fast path. But this is a
+	 * corner case which is unlikely to be needed by anybody in practice,
+	 * so let's just forbid it and keep the fast path clean.
+	 */
+	if (WARN_ON(static_key_enabled(&tp->key) && data))
+		return -EINVAL;
+
+	mutex_lock(&tracepoints_mutex);
+	tp_func.func = probe;
+	tp_func.data = data;
+	ret = android_rvh_add_func(tp, &tp_func);
+	mutex_unlock(&tracepoints_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(android_rvh_probe_register);
+#endif
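
rvh_func_add() above implements permanently-attached probes with a small fixed array: it is allocated once with a trailing NULL terminator, and a new probe is installed into the first free slot, publishing .func last so a concurrent iterator sees either NULL or a complete entry. A userspace sketch of that fixed-slot registration (plain C types; the kernel additionally flips the tracepoint's static key):

/*
 * Sketch of fixed-slot probe registration: a NULL-terminated array
 * with a hard cap of two entries, filled first-empty-slot-wins, and
 * probes never detach.
 */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_PROBES_MAX 2

struct probe { void (*func)(void *data); void *data; };

static struct probe *funcs;

static int probe_add(void (*fn)(void *), void *data)
{
	int i;

	if (!funcs) {
		/* '+ 1' for the terminating NULL element */
		funcs = calloc(NR_PROBES_MAX + 1, sizeof(*funcs));
		if (!funcs)
			return -ENOMEM;
	}

	for (i = 0; i < NR_PROBES_MAX; i++) {
		if (!funcs[i].func) {
			funcs[i].data = data;
			funcs[i].func = fn;	/* publish .func last */
			return 0;
		}
	}
	return -EBUSY;				/* both slots taken */
}

static void p1(void *d) { printf("p1 %s\n", (char *)d); }
static void p2(void *d) { printf("p2 %s\n", (char *)d); }

int main(void)
{
	probe_add(p1, "a");
	probe_add(p2, "b");
	printf("third add: %d\n", probe_add(p1, "c"));	/* -EBUSY */
	for (struct probe *p = funcs; p->func; p++)
		p->func(p->data);
	return 0;
}
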
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7cd5f5e..afa1767 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -54,6 +54,10 @@
 
 #include "workqueue_internal.h"
 
+#include <trace/hooks/wqlockup.h>
+/* events/workqueue.h uses default TRACE_INCLUDE_PATH */
+#undef TRACE_INCLUDE_PATH
+
 enum {
 	/*
 	 * worker_pool flags
@@ -381,6 +385,9 @@ static void show_one_worker_pool(struct worker_pool *pool);
 #define CREATE_TRACE_POINTS
 #include <trace/events/workqueue.h>
 
+EXPORT_TRACEPOINT_SYMBOL_GPL(workqueue_execute_start);
+EXPORT_TRACEPOINT_SYMBOL_GPL(workqueue_execute_end);
+
 #define assert_rcu_or_pool_mutex()					\
 	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&			\
 			 !lockdep_is_held(&wq_pool_mutex),		\
@@ -5871,6 +5878,7 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 			pr_cont_pool_info(pool);
 			pr_cont(" stuck for %us!\n",
 				jiffies_to_msecs(now - pool_ts) / 1000);
+			trace_android_vh_wq_lockup_pool(pool->cpu, pool_ts);
 		}
 	}
 
diff --git a/lib/crypto/aes.c b/lib/crypto/aes.c
index 827fe89..5fc78e5 100644
--- a/lib/crypto/aes.c
+++ b/lib/crypto/aes.c
@@ -7,6 +7,7 @@
 #include <linux/crypto.h>
 #include <linux/module.h>
 #include <asm/unaligned.h>
+#include <trace/hooks/fips140.h>
 
 /*
  * Emit the sbox as volatile const to prevent the compiler from doing
@@ -189,6 +190,13 @@ int aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key,
 	u32 rc, i, j;
 	int err;
 
+#ifndef __DISABLE_EXPORTS
+	err = -(MAX_ERRNO + 1);
+	trace_android_vh_aes_expandkey(ctx, in_key, key_len, &err);
+	if (err != -(MAX_ERRNO + 1))
+		return err;
+#endif
+
 	err = aes_check_keylen(key_len);
 	if (err)
 		return err;
@@ -261,6 +269,13 @@ void aes_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in)
 	int rounds = 6 + ctx->key_length / 4;
 	u32 st0[4], st1[4];
 	int round;
+#ifndef __DISABLE_EXPORTS
+	int hook_inuse = 0;
+
+	trace_android_vh_aes_encrypt(ctx, out, in, &hook_inuse);
+	if (hook_inuse)
+		return;
+#endif
 
 	st0[0] = ctx->key_enc[0] ^ get_unaligned_le32(in);
 	st0[1] = ctx->key_enc[1] ^ get_unaligned_le32(in + 4);
@@ -312,6 +327,13 @@ void aes_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in)
 	int rounds = 6 + ctx->key_length / 4;
 	u32 st0[4], st1[4];
 	int round;
+#ifndef __DISABLE_EXPORTS
+	int hook_inuse = 0;
+
+	trace_android_vh_aes_decrypt(ctx, out, in, &hook_inuse);
+	if (hook_inuse)
+		return;
+#endif
 
 	st0[0] = ctx->key_dec[0] ^ get_unaligned_le32(in);
 	st0[1] = ctx->key_dec[1] ^ get_unaligned_le32(in + 4);
diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index 72a4b0b..3f04750 100644
--- a/lib/crypto/sha256.c
+++ b/lib/crypto/sha256.c
@@ -17,6 +17,7 @@
 #include <linux/string.h>
 #include <crypto/sha2.h>
 #include <asm/unaligned.h>
+#include <trace/hooks/fips140.h>
 
 static const u32 SHA256_K[] = {
 	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
@@ -200,6 +201,14 @@ void sha256(const u8 *data, unsigned int len, u8 *out)
 {
 	struct sha256_state sctx;
 
+#ifndef __DISABLE_EXPORTS
+	int hook_inuse = 0;
+
+	trace_android_vh_sha256(data, len, out, &hook_inuse);
+	if (hook_inuse)
+		return;
+#endif
+
 	sha256_init(&sctx);
 	sha256_update(&sctx, data, len);
 	sha256_final(&sctx, out);
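
The fips140 hooks above use two takeover conventions: aes_encrypt()/aes_decrypt()/sha256() test a plain hook_inuse flag, while aes_expandkey() primes err with the impossible value -(MAX_ERRNO + 1) and returns early if the hook changed it. A sketch of the sentinel variant with a plain function pointer (MAX_ERRNO matches include/linux/err.h; the hook policy here is invented):

/*
 * Sketch of the "impossible sentinel" takeover in aes_expandkey():
 * err is primed with a value no implementation can legally return, so
 * any change means the hook handled the call and the generic code is
 * skipped.
 */
#include <errno.h>
#include <stdio.h>

#define MAX_ERRNO 4095			/* as in include/linux/err.h */

static void (*expandkey_hook)(int key_len, int *err);

static void fips_hook(int key_len, int *err)
{
	/* invented policy: accept only valid AES key sizes */
	*err = (key_len == 16 || key_len == 24 || key_len == 32) ? 0 : -EINVAL;
}

static int expandkey(int key_len)
{
	int err = -(MAX_ERRNO + 1);	/* sentinel: not a valid errno */

	if (expandkey_hook)
		expandkey_hook(key_len, &err);
	if (err != -(MAX_ERRNO + 1))
		return err;		/* hook took over entirely */

	/* ... generic key expansion would run here ... */
	return 0;
}

int main(void)
{
	printf("no hook:      %d\n", expandkey(12));
	expandkey_hook = fips_hook;
	printf("hooked, bad:  %d\n", expandkey(12));
	printf("hooked, good: %d\n", expandkey(16));
	return 0;
}
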
diff --git a/lib/trace_readwrite.c b/lib/trace_readwrite.c
index 88637038b..62b4e8b 100644
--- a/lib/trace_readwrite.c
+++ b/lib/trace_readwrite.c
@@ -14,33 +14,33 @@
 
 #ifdef CONFIG_TRACE_MMIO_ACCESS
 void log_write_mmio(u64 val, u8 width, volatile void __iomem *addr,
-		    unsigned long caller_addr)
+		    unsigned long caller_addr, unsigned long caller_addr0)
 {
-	trace_rwmmio_write(caller_addr, val, width, addr);
+	trace_rwmmio_write(caller_addr, caller_addr0, val, width, addr);
 }
 EXPORT_SYMBOL_GPL(log_write_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(rwmmio_write);
 
 void log_post_write_mmio(u64 val, u8 width, volatile void __iomem *addr,
-			 unsigned long caller_addr)
+			 unsigned long caller_addr, unsigned long caller_addr0)
 {
-	trace_rwmmio_post_write(caller_addr, val, width, addr);
+	trace_rwmmio_post_write(caller_addr, caller_addr0, val, width, addr);
 }
 EXPORT_SYMBOL_GPL(log_post_write_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(rwmmio_post_write);
 
 void log_read_mmio(u8 width, const volatile void __iomem *addr,
-		   unsigned long caller_addr)
+		   unsigned long caller_addr, unsigned long caller_addr0)
 {
-	trace_rwmmio_read(caller_addr, width, addr);
+	trace_rwmmio_read(caller_addr, caller_addr0, width, addr);
 }
 EXPORT_SYMBOL_GPL(log_read_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(rwmmio_read);
 
 void log_post_read_mmio(u64 val, u8 width, const volatile void __iomem *addr,
-			unsigned long caller_addr)
+			unsigned long caller_addr, unsigned long caller_addr0)
 {
-	trace_rwmmio_post_read(caller_addr, val, width, addr);
+	trace_rwmmio_post_read(caller_addr, caller_addr0, val, width, addr);
 }
 EXPORT_SYMBOL_GPL(log_post_read_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(rwmmio_post_read);
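
The log_*_mmio() helpers gain a second caller address so the rwmmio trace can show both the immediate caller and the outer call site. A sketch of collecting two levels of caller with the GCC/Clang __builtin_return_address() builtin (an illustrative wrapper, not the kernel's rwmmio machinery):

/*
 * Sketch of threading two caller addresses into a logging helper: one
 * identifies the immediate call site, the other the function that
 * called into it.
 */
#include <stdint.h>
#include <stdio.h>

static void log_write(uint64_t val, unsigned long caller,
		      unsigned long caller0)
{
	printf("write %llu from %#lx (outer frame %#lx)\n",
	       (unsigned long long)val, caller, caller0);
}

static void writel_sketch(uint64_t val, unsigned long caller0)
{
	/* immediate caller of this helper */
	log_write(val, (unsigned long)__builtin_return_address(0), caller0);
}

/* call-site macro: also records the caller of the enclosing function */
#define writel_logged(v) \
	writel_sketch((v), (unsigned long)__builtin_return_address(0))

static void driver_write(void)
{
	writel_logged(42);
}

int main(void)
{
	driver_write();
	return 0;
}
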
diff --git a/mm/Kconfig b/mm/Kconfig
index 57e1d8c..f41b963 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -542,6 +542,13 @@
 	bool
 
 #
+# support for memory relinquish
+config MEMORY_RELINQUISH
+	def_bool y
+	depends on ARCH_HAS_MEM_RELINQUISH
+	depends on MEMORY_BALLOON || PAGE_REPORTING
+
+#
 # support for memory balloon compaction
 config BALLOON_COMPACTION
 	bool "Allow for balloon memory compaction/migration"
@@ -1073,6 +1080,11 @@
 config IO_MAPPING
 	bool
 
+# Some architectures want callbacks for all IO mappings in order to
+# track the physical addresses that get used as devices.
+config ARCH_HAS_IOREMAP_PHYS_HOOKS
+	bool
+
 config SECRETMEM
 	def_bool ARCH_HAS_SET_DIRECT_MAP && !EMBEDDED
 
diff --git a/mm/OWNERS b/mm/OWNERS
new file mode 100644
index 0000000..5f97cfd
--- /dev/null
+++ b/mm/OWNERS
@@ -0,0 +1,3 @@
+kaleshsingh@google.com
+surenb@google.com
+minchan@google.com
diff --git a/mm/cma.c b/mm/cma.c
index 4a978e0..332f3c6 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -24,6 +24,7 @@
 #include <linux/memblock.h>
 #include <linux/err.h>
 #include <linux/mm.h>
+#include <linux/module.h>
 #include <linux/sizes.h>
 #include <linux/slab.h>
 #include <linux/log2.h>
@@ -31,6 +32,8 @@
 #include <linux/highmem.h>
 #include <linux/io.h>
 #include <linux/kmemleak.h>
+#include <linux/sched.h>
+#include <linux/jiffies.h>
 #include <trace/events/cma.h>
 
 #include "cma.h"
@@ -53,6 +56,7 @@ const char *cma_get_name(const struct cma *cma)
 {
 	return cma->name;
 }
+EXPORT_SYMBOL_GPL(cma_get_name);
 
 static unsigned long cma_bitmap_aligned_mask(const struct cma *cma,
 					     unsigned int align_order)
@@ -431,6 +435,8 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 	unsigned long i;
 	struct page *page = NULL;
 	int ret = -ENOMEM;
+	int num_attempts = 0;
+	int max_retries = 5;
 
 	if (!cma || !cma->count || !cma->bitmap)
 		goto out;
@@ -457,8 +463,28 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 				bitmap_maxno, start, bitmap_count, mask,
 				offset);
 		if (bitmap_no >= bitmap_maxno) {
-			spin_unlock_irq(&cma->lock);
-			break;
+			if ((num_attempts < max_retries) && (ret == -EBUSY)) {
+				spin_unlock_irq(&cma->lock);
+
+				if (fatal_signal_pending(current))
+					break;
+
+				/*
+				 * The page may be momentarily pinned by some
+				 * other process which has been scheduled out,
+				 * e.g. in the exit path, during an unmap call,
+				 * or across fork, and so cannot be freed yet.
+				 * Sleep for 100ms and retry the allocation.
+				 */
+				start = 0;
+				ret = -ENOMEM;
+				schedule_timeout_killable(msecs_to_jiffies(100));
+				num_attempts++;
+				continue;
+			} else {
+				spin_unlock_irq(&cma->lock);
+				break;
+			}
 		}
 		bitmap_set(cma->bitmap, bitmap_no, bitmap_count);
 		/*
@@ -522,6 +548,7 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 
 	return page;
 }
+EXPORT_SYMBOL_GPL(cma_alloc);
 
 bool cma_pages_valid(struct cma *cma, const struct page *pages,
 		     unsigned long count)
@@ -572,6 +599,7 @@ bool cma_release(struct cma *cma, const struct page *pages,
 
 	return true;
 }
+EXPORT_SYMBOL_GPL(cma_release);
 
 int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
 {
@@ -586,3 +614,4 @@ int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(cma_for_each_area);
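
A note on the retry loop added to cma_alloc() above: it bounds transient
failures (momentarily pinned pages) at five attempts with a 100ms killable
sleep in between. A minimal userspace C sketch of the same policy, with a
hypothetical try_alloc() standing in for the bitmap search and pfn-range
allocation:

#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Pretend the target range is pinned for the first two attempts. */
static int try_alloc(int attempt)
{
	return attempt < 2 ? -EBUSY : 0;
}

int main(void)
{
	const int max_retries = 5;
	const struct timespec delay = { .tv_nsec = 100 * 1000 * 1000 };
	int ret = -ENOMEM;

	for (int attempt = 0; attempt < max_retries; attempt++) {
		ret = try_alloc(attempt);
		if (ret != -EBUSY)
			break;
		/* Mirrors schedule_timeout_killable(msecs_to_jiffies(100)). */
		nanosleep(&delay, NULL);
	}
	printf("allocation %s\n", ret ? "failed" : "succeeded");
	return 0;
}
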
diff --git a/mm/compaction.c b/mm/compaction.c
index 8238e83..2d9bc2e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -744,6 +744,31 @@ isolate_freepages_range(struct compact_control *cc,
 	return pfn;
 }
 
+#ifdef CONFIG_COMPACTION
+unsigned long isolate_and_split_free_page(struct page *page,
+						struct list_head *list)
+{
+	unsigned long isolated;
+	unsigned int order;
+
+	if (!PageBuddy(page))
+		return 0;
+
+	order = buddy_order(page);
+	isolated = __isolate_free_page(page, order);
+	if (!isolated)
+		return 0;
+
+	set_page_private(page, order);
+	list_add(&page->lru, list);
+
+	split_map_pages(list);
+
+	return isolated;
+}
+EXPORT_SYMBOL_GPL(isolate_and_split_free_page);
+#endif
+
 /* Similar to reclaim, but different enough that they don't share logic */
 static bool too_many_isolated(pg_data_t *pgdat)
 {
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 833bf2c..1d0008e 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -95,19 +95,24 @@ asmlinkage void kasan_unpoison_task_stack_below(const void *watermark)
 }
 #endif /* CONFIG_KASAN_STACK */
 
-void __kasan_unpoison_pages(struct page *page, unsigned int order, bool init)
+bool __kasan_unpoison_pages(struct page *page, unsigned int order, bool init)
 {
 	u8 tag;
 	unsigned long i;
 
 	if (unlikely(PageHighMem(page)))
-		return;
+		return false;
+
+	if (!kasan_sample_page_alloc(order))
+		return false;
 
 	tag = kasan_random_tag();
 	kasan_unpoison(set_tag(page_address(page), tag),
 		       PAGE_SIZE << order, init);
 	for (i = 0; i < (1 << order); i++)
 		page_kasan_tag_set(page + i, tag);
+
+	return true;
 }
 
 void __kasan_poison_pages(struct page *page, unsigned int order, bool init)
diff --git a/mm/kasan/hw_tags.c b/mm/kasan/hw_tags.c
index b22c4f4..d1bcb02 100644
--- a/mm/kasan/hw_tags.c
+++ b/mm/kasan/hw_tags.c
@@ -59,6 +59,24 @@ EXPORT_SYMBOL_GPL(kasan_mode);
 /* Whether to enable vmalloc tagging. */
 DEFINE_STATIC_KEY_TRUE(kasan_flag_vmalloc);
 
+#define PAGE_ALLOC_SAMPLE_DEFAULT	1
+#define PAGE_ALLOC_SAMPLE_ORDER_DEFAULT	3
+
+/*
+ * Sampling interval of page_alloc allocation (un)poisoning.
+ * Defaults to no sampling.
+ */
+unsigned long kasan_page_alloc_sample = PAGE_ALLOC_SAMPLE_DEFAULT;
+
+/*
+ * Minimum order of page_alloc allocations to be affected by sampling.
+ * The default value is chosen to match both
+ * PAGE_ALLOC_COSTLY_ORDER and SKB_FRAG_PAGE_ORDER.
+ */
+unsigned int kasan_page_alloc_sample_order = PAGE_ALLOC_SAMPLE_ORDER_DEFAULT;
+
+DEFINE_PER_CPU(long, kasan_page_alloc_skip);
+
 /* kasan=off/on */
 static int __init early_kasan_flag(char *arg)
 {
@@ -122,6 +140,48 @@ static inline const char *kasan_mode_info(void)
 		return "sync";
 }
 
+/* kasan.page_alloc.sample=<sampling interval> */
+static int __init early_kasan_flag_page_alloc_sample(char *arg)
+{
+	int rv;
+
+	if (!arg)
+		return -EINVAL;
+
+	rv = kstrtoul(arg, 0, &kasan_page_alloc_sample);
+	if (rv)
+		return rv;
+
+	if (!kasan_page_alloc_sample || kasan_page_alloc_sample > LONG_MAX) {
+		kasan_page_alloc_sample = PAGE_ALLOC_SAMPLE_DEFAULT;
+		return -EINVAL;
+	}
+
+	return 0;
+}
+early_param("kasan.page_alloc.sample", early_kasan_flag_page_alloc_sample);
+
+/* kasan.page_alloc.sample.order=<minimum page order> */
+static int __init early_kasan_flag_page_alloc_sample_order(char *arg)
+{
+	int rv;
+
+	if (!arg)
+		return -EINVAL;
+
+	rv = kstrtouint(arg, 0, &kasan_page_alloc_sample_order);
+	if (rv)
+		return rv;
+
+	if (kasan_page_alloc_sample_order > INT_MAX) {
+		kasan_page_alloc_sample_order = PAGE_ALLOC_SAMPLE_ORDER_DEFAULT;
+		return -EINVAL;
+	}
+
+	return 0;
+}
+early_param("kasan.page_alloc.sample.order", early_kasan_flag_page_alloc_sample_order);
+
 /*
  * kasan_init_hw_tags_cpu() is called for each CPU.
  * Not marked as __init as a CPU can be hot-plugged after boot.
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index abbcc1b..8b6e0983 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -42,6 +42,10 @@ enum kasan_mode {
 
 extern enum kasan_mode kasan_mode __ro_after_init;
 
+extern unsigned long kasan_page_alloc_sample;
+extern unsigned int kasan_page_alloc_sample_order;
+DECLARE_PER_CPU(long, kasan_page_alloc_skip);
+
 static inline bool kasan_vmalloc_enabled(void)
 {
 	return static_branch_likely(&kasan_flag_vmalloc);
@@ -57,6 +61,24 @@ static inline bool kasan_sync_fault_possible(void)
 	return kasan_mode == KASAN_MODE_SYNC || kasan_mode == KASAN_MODE_ASYMM;
 }
 
+static inline bool kasan_sample_page_alloc(unsigned int order)
+{
+	/* Fast-path for when sampling is disabled. */
+	if (kasan_page_alloc_sample == 1)
+		return true;
+
+	if (order < kasan_page_alloc_sample_order)
+		return true;
+
+	if (this_cpu_dec_return(kasan_page_alloc_skip) < 0) {
+		this_cpu_write(kasan_page_alloc_skip,
+			       kasan_page_alloc_sample - 1);
+		return true;
+	}
+
+	return false;
+}
+
 #else /* CONFIG_KASAN_HW_TAGS */
 
 static inline bool kasan_async_fault_possible(void)
@@ -69,6 +91,11 @@ static inline bool kasan_sync_fault_possible(void)
 	return true;
 }
 
+static inline bool kasan_sample_page_alloc(unsigned int order)
+{
+	return true;
+}
+
 #endif /* CONFIG_KASAN_HW_TAGS */
 
 #ifdef CONFIG_KASAN_GENERIC
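
For reference, kasan_sample_page_alloc() above checks every allocation below
the order cutoff and one in every kasan_page_alloc_sample large allocations,
via a per-CPU countdown. A single-CPU userspace sketch of the same arithmetic
(all names are local to this example):

#include <stdbool.h>
#include <stdio.h>

static long sample_interval = 4;      /* kasan.page_alloc.sample */
static unsigned int sample_order = 3; /* kasan.page_alloc.sample.order */
static long skip;                     /* a per-CPU counter in the kernel */

static bool sample_page_alloc(unsigned int order)
{
	if (sample_interval == 1)
		return true;  /* sampling disabled: check everything */
	if (order < sample_order)
		return true;  /* small allocations are always checked */
	if (--skip < 0) {
		skip = sample_interval - 1;  /* 1 in sample_interval */
		return true;
	}
	return false;
}

int main(void)
{
	for (int i = 0; i < 8; i++)
		printf("order-3 alloc %d: %s\n", i,
		       sample_page_alloc(3) ? "checked" : "skipped");
	return 0;
}
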
diff --git a/mm/madvise.c b/mm/madvise.c
index d03e149..621b252 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -321,6 +321,21 @@ static long madvise_willneed(struct vm_area_struct *vma,
 	return 0;
 }
 
+static inline bool can_do_file_pageout(struct vm_area_struct *vma)
+{
+	if (!vma->vm_file)
+		return false;
+	/*
+	 * paging out pagecache only for non-anonymous mappings that correspond
+	 * to the files the calling process could (if tried) open for writing;
+	 * otherwise we'd be including shared non-exclusive mappings, which
+	 * opens a side channel.
+	 */
+	return inode_owner_or_capable(&init_user_ns,
+				      file_inode(vma->vm_file)) ||
+	       file_permission(vma->vm_file, MAY_WRITE) == 0;
+}
+
 static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 				unsigned long addr, unsigned long end,
 				struct mm_walk *walk)
@@ -334,10 +349,14 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 	spinlock_t *ptl;
 	struct page *page = NULL;
 	LIST_HEAD(page_list);
+	bool pageout_anon_only_filter;
 
 	if (fatal_signal_pending(current))
 		return -EINTR;
 
+	pageout_anon_only_filter = pageout && !vma_is_anonymous(vma) &&
+					!can_do_file_pageout(vma);
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	if (pmd_trans_huge(*pmd)) {
 		pmd_t orig_pmd;
@@ -364,6 +383,9 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		if (page_mapcount(page) != 1)
 			goto huge_unlock;
 
+		if (pageout_anon_only_filter && !PageAnon(page))
+			goto huge_unlock;
+
 		if (next - addr != HPAGE_PMD_SIZE) {
 			int err;
 
@@ -432,6 +454,8 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		if (PageTransCompound(page)) {
 			if (page_mapcount(page) != 1)
 				break;
+			if (pageout_anon_only_filter && !PageAnon(page))
+				break;
 			get_page(page);
 			if (!trylock_page(page)) {
 				put_page(page);
@@ -459,6 +483,9 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		if (!PageLRU(page) || page_mapcount(page) != 1)
 			continue;
 
+		if (pageout_anon_only_filter && !PageAnon(page))
+			continue;
+
 		VM_BUG_ON_PAGE(PageTransCompound(page), page);
 
 		if (pte_young(ptent)) {
@@ -553,23 +580,6 @@ static void madvise_pageout_page_range(struct mmu_gather *tlb,
 	tlb_end_vma(tlb, vma);
 }
 
-static inline bool can_do_pageout(struct vm_area_struct *vma)
-{
-	if (vma_is_anonymous(vma))
-		return true;
-	if (!vma->vm_file)
-		return false;
-	/*
-	 * paging out pagecache only for non-anonymous mappings that correspond
-	 * to the files the calling process could (if tried) open for writing;
-	 * otherwise we'd be including shared non-exclusive mappings, which
-	 * opens a side channel.
-	 */
-	return inode_owner_or_capable(&init_user_ns,
-				      file_inode(vma->vm_file)) ||
-	       file_permission(vma->vm_file, MAY_WRITE) == 0;
-}
-
 static long madvise_pageout(struct vm_area_struct *vma,
 			struct vm_area_struct **prev,
 			unsigned long start_addr, unsigned long end_addr)
@@ -581,7 +591,14 @@ static long madvise_pageout(struct vm_area_struct *vma,
 	if (!can_madv_lru_vma(vma))
 		return -EINVAL;
 
-	if (!can_do_pageout(vma))
+	/*
+	 * If the VMA belongs to a private file mapping, there can be private
+	 * dirty pages that may be paged out even when this process is neither
+	 * the owner of nor write-capable on the file. So we additionally
+	 * allow private file mappings to page out dirty anon pages.
+	 */
+	if (!vma_is_anonymous(vma) && (!can_do_file_pageout(vma) &&
+				(vma->vm_flags & VM_MAYSHARE)))
 		return 0;
 
 	lru_add_drain();
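
With the filter above, MADV_PAGEOUT on a private file-backed VMA still
reclaims its anonymous (CoW-dirtied) pages even when the caller cannot write
the file, while fully anonymous mappings behave as before. A small sketch of
the anonymous case from userspace (assumes a kernel and libc exposing
MADV_PAGEOUT):

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 4 * 1024 * 1024;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(buf, 0x5a, len);	/* fault in the anon pages */

	/* Ask the kernel to reclaim them right away. */
	if (madvise(buf, len, MADV_PAGEOUT))
		perror("madvise(MADV_PAGEOUT)");
	return 0;
}
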
diff --git a/mm/memblock.c b/mm/memblock.c
index fc3d8fb..ff53d24 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1677,6 +1677,7 @@ phys_addr_t __init_memblock memblock_end_of_DRAM(void)
 
 	return (memblock.memory.regions[idx].base + memblock.memory.regions[idx].size);
 }
+EXPORT_SYMBOL_GPL(memblock_end_of_DRAM);
 
 static phys_addr_t __init_memblock __find_max_addr(phys_addr_t limit)
 {
diff --git a/mm/memory.c b/mm/memory.c
index f6f93e5..fbefac4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -77,6 +77,7 @@
 #include <linux/ptrace.h>
 #include <linux/vmalloc.h>
 #include <linux/sched/sysctl.h>
+#include <linux/set_memory.h>
 
 #include <trace/events/kmem.h>
 
@@ -166,6 +167,7 @@ void mm_trace_rss_stat(struct mm_struct *mm, int member, long count)
 {
 	trace_rss_stat(mm, member, count);
 }
+EXPORT_SYMBOL_GPL(mm_trace_rss_stat);
 
 #if defined(SPLIT_RSS_COUNTING)
 
@@ -5846,3 +5848,13 @@ void ptlock_free(struct page *page)
 	kmem_cache_free(page_ptl_cachep, page->ptl);
 }
 #endif
+
+int set_direct_map_range_uncached(unsigned long addr, unsigned long numpages)
+{
+#ifdef CONFIG_ARM64
+	return arch_set_direct_map_range_uncached(addr, numpages);
+#else
+	return -EOPNOTSUPP;
+#endif
+}
+EXPORT_SYMBOL_GPL(set_direct_map_range_uncached);
diff --git a/mm/migrate.c b/mm/migrate.c
index dff3335..91e8ef3 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -167,6 +167,7 @@ void putback_movable_pages(struct list_head *l)
 		}
 	}
 }
+EXPORT_SYMBOL_GPL(putback_movable_pages);
 
 /*
  * Restore a potential migration pte to a working pte entry
@@ -1602,6 +1603,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 
 	return rc;
 }
+EXPORT_SYMBOL_GPL(migrate_pages);
 
 struct page *alloc_migration_target(struct page *page, unsigned long private)
 {
diff --git a/mm/mmap.c b/mm/mmap.c
index 1777148..47c95ec 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1633,6 +1633,7 @@ unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info)
 	trace_vm_unmapped_area(addr, info);
 	return addr;
 }
+EXPORT_SYMBOL_GPL(vm_unmapped_area);
 
 /* Get an address range which is currently unmapped.
  * For shmat() with addr=0.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6e60657..5f5a9528 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -585,6 +585,24 @@ unsigned long get_pfnblock_flags_mask(const struct page *page,
 {
 	return __get_pfnblock_flags_mask(page, pfn, mask);
 }
+EXPORT_SYMBOL_GPL(get_pfnblock_flags_mask);
+
+int isolate_anon_lru_page(struct page *page)
+{
+	int ret;
+
+	if (!PageLRU(page) || !PageAnon(page))
+		return -EINVAL;
+
+	if (!get_page_unless_zero(page))
+		return -EINVAL;
+
+	ret = isolate_lru_page(page);
+	put_page(page);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(isolate_anon_lru_page);
 
 static __always_inline int get_pfnblock_migratetype(const struct page *page,
 					unsigned long pfn)
@@ -1367,6 +1385,8 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
  *    see the comment next to it.
  * 3. Skipping poisoning is requested via __GFP_SKIP_KASAN_POISON,
  *    see the comment next to it.
+ * 4. The allocation is excluded from being checked due to sampling,
+ *    see the call to kasan_unpoison_pages.
  *
  * Poisoning pages during deferred memory init will greatly lengthen the
  * process and cause problem in large memory systems as the deferred pages
@@ -2476,7 +2496,8 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 {
 	bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) &&
 			!should_skip_init(gfp_flags);
-	bool init_tags = init && (gfp_flags & __GFP_ZEROTAGS);
+	bool zero_tags = init && (gfp_flags & __GFP_ZEROTAGS);
+	bool reset_tags = true;
 	int i;
 
 	set_page_private(page, 0);
@@ -2499,30 +2520,43 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 	 */
 
 	/*
-	 * If memory tags should be zeroed (which happens only when memory
-	 * should be initialized as well).
+	 * If memory tags should be zeroed
+	 * (which happens only when memory should be initialized as well).
 	 */
-	if (init_tags) {
-		/* Initialize both memory and tags. */
+	if (zero_tags) {
+		/* Initialize both memory and memory tags. */
 		for (i = 0; i != 1 << order; ++i)
 			tag_clear_highpage(page + i);
 
-		/* Note that memory is already initialized by the loop above. */
+		/* Take note that memory was initialized by the loop above. */
 		init = false;
 	}
 	if (!should_skip_kasan_unpoison(gfp_flags)) {
-		/* Unpoison shadow memory or set memory tags. */
-		kasan_unpoison_pages(page, order, init);
-
-		/* Note that memory is already initialized by KASAN. */
-		if (kasan_has_integrated_init())
-			init = false;
-	} else {
-		/* Ensure page_address() dereferencing does not fault. */
+		/* Try unpoisoning (or setting tags) and initializing memory. */
+		if (kasan_unpoison_pages(page, order, init)) {
+			/* Take note that memory was initialized by KASAN. */
+			if (kasan_has_integrated_init())
+				init = false;
+			/* Take note that memory tags were set by KASAN. */
+			reset_tags = false;
+		} else {
+			/*
+			 * KASAN decided to exclude this allocation from being
+			 * (un)poisoned due to sampling. Make KASAN skip
+			 * poisoning when the allocation is freed.
+			 */
+			SetPageSkipKASanPoison(page);
+		}
+	}
+	/*
+	 * If memory tags have not been set by KASAN, reset the page tags to
+	 * ensure page_address() dereferencing does not fault.
+	 */
+	if (reset_tags) {
 		for (i = 0; i != 1 << order; ++i)
 			page_kasan_tag_reset(page + i);
 	}
-	/* If memory is still not initialized, do it now. */
+	/* If memory is still not initialized, initialize it now. */
 	if (init)
 		kernel_init_pages(page, 1 << order);
 	/* Propagate __GFP_SKIP_KASAN_POISON to page flags. */
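
The reworked post_alloc_hook() above juggles three facts: whether memory
still needs initializing, whether tags were zeroed, and whether KASAN set the
tags (so they need not be reset). A toy userspace model of that bookkeeping;
kasan_unpoison() and the printed actions are stand-ins for the kernel calls,
not the real API:

#include <stdbool.h>
#include <stdio.h>

static bool kasan_unpoison(bool sampled_out) { return !sampled_out; }
static bool kasan_has_integrated_init(void) { return true; }

static void post_alloc(bool init, bool zero_tags, bool sampled_out)
{
	bool reset_tags = true;

	if (zero_tags) {
		puts("tag_clear_highpage: zero memory and tags");
		init = false;		/* memory initialized above */
	}
	if (kasan_unpoison(sampled_out)) {
		if (kasan_has_integrated_init())
			init = false;	/* KASAN initialized memory */
		reset_tags = false;	/* KASAN set the tags */
	} else {
		puts("SetPageSkipKASanPoison: skip poisoning on free");
	}
	if (reset_tags)
		puts("page_kasan_tag_reset: keep page_address() safe");
	if (init)
		puts("kernel_init_pages: plain init of the memory");
}

int main(void)
{
	post_alloc(true, false, false);	/* normal checked allocation */
	puts("--");
	post_alloc(true, false, true);	/* allocation sampled out by KASAN */
	return 0;
}
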
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 382958e..5c4b1fb 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -7,6 +7,7 @@
 #include <linux/module.h>
 #include <linux/delay.h>
 #include <linux/scatterlist.h>
+#include <linux/mem_relinquish.h>
 
 #include "page_reporting.h"
 #include "internal.h"
@@ -120,7 +121,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone,
 	unsigned int page_len = PAGE_SIZE << order;
 	struct page *page, *next;
 	long budget;
-	int err = 0;
+	int i, err = 0;
 
 	/*
 	 * Perform early check, if free area is empty there is
@@ -175,6 +176,10 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone,
 			--(*offset);
 			sg_set_page(&sgl[*offset], page, page_len, 0);
 
+			/* Notify the hypervisor that these pages are reclaimable. */
+			for (i = 0; i < (1 << order); i++)
+				page_relinquish(page + i);
+
 			continue;
 		}
 
diff --git a/mm/percpu.c b/mm/percpu.c
index 27697b2..5457a2c 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -2417,6 +2417,7 @@ phys_addr_t per_cpu_ptr_to_phys(void *addr)
 		return page_to_phys(pcpu_addr_to_page(addr)) +
 		       offset_in_page(addr);
 }
+EXPORT_SYMBOL_GPL(per_cpu_ptr_to_phys);
 
 /**
  * pcpu_alloc_alloc_info - allocate percpu allocation info
diff --git a/mm/util.c b/mm/util.c
index 12984e7..4afa7dd 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -29,6 +29,10 @@
 #include "internal.h"
 #include "swap.h"
 
+#ifndef __GENSYMS__
+#include <trace/hooks/syscall_check.h>
+#endif
+
 /**
  * kfree_const - conditionally free memory
  * @x: pointer to the memory
@@ -524,6 +528,7 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 		if (populate)
 			mm_populate(ret, populate);
 	}
+	trace_android_vh_check_mmap_file(file, prot, flag, ret);
 	return ret;
 }
 
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ccaa4619..bdcbde0 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -40,6 +40,7 @@
 #include <linux/uaccess.h>
 #include <linux/hugetlb.h>
 #include <linux/sched/mm.h>
+#include <linux/io.h>
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
@@ -317,12 +318,17 @@ int ioremap_page_range(unsigned long addr, unsigned long end,
 {
 	int err;
 
-	err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
+	prot = pgprot_nx(prot);
+	err = vmap_range_noflush(addr, end, phys_addr, prot,
 				 ioremap_max_page_shift);
 	flush_cache_vmap(addr, end);
 	if (!err)
 		kmsan_ioremap_page_range(addr, end, phys_addr, prot,
 					 ioremap_max_page_shift);
+
+	if (IS_ENABLED(CONFIG_ARCH_HAS_IOREMAP_PHYS_HOOKS) && !err)
+		ioremap_phys_range_hook(phys_addr, end - addr, prot);
+
 	return err;
 }
 
@@ -2696,6 +2702,10 @@ static void __vunmap(const void *addr, int deallocate_pages)
 
 	kasan_poison_vmalloc(area->addr, get_vm_area_size(area));
 
+	if (IS_ENABLED(CONFIG_ARCH_HAS_IOREMAP_PHYS_HOOKS) &&
+	    area->flags & VM_IOREMAP)
+		iounmap_phys_range_hook(area->phys_addr, get_vm_area_size(area));
+
 	vm_remove_mappings(area, deallocate_pages);
 
 	if (deallocate_pages) {
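
CONFIG_ARCH_HAS_IOREMAP_PHYS_HOOKS is only the switch; the hook bodies come
from the architecture. A minimal sketch of what an arch selecting it might
provide, matching the call sites in ioremap_page_range() and __vunmap() above
(the pr_debug bodies are illustrative, not taken from any real arch):

#include <linux/io.h>
#include <linux/printk.h>

void ioremap_phys_range_hook(phys_addr_t phys_addr, size_t size,
			     pgprot_t prot)
{
	/* Record that [phys_addr, phys_addr + size) is now device-mapped. */
	pr_debug("ioremap hook: phys %pa, size %zu\n", &phys_addr, size);
}

void iounmap_phys_range_hook(phys_addr_t phys_addr, size_t size)
{
	/* Forget the range again when the mapping is torn down. */
	pr_debug("iounmap hook: phys %pa, size %zu\n", &phys_addr, size);
}
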
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 96eb9da..22981e3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -68,6 +68,9 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/vmscan.h>
 
+#undef CREATE_TRACE_POINTS
+#include <trace/hooks/vmscan.h>
+
 struct scan_control {
 	/* How many pages shrink_list() should reclaim */
 	unsigned long nr_to_reclaim;
@@ -2907,6 +2910,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	enum scan_balance scan_balance;
 	unsigned long ap, fp;
 	enum lru_list lru;
+	bool balance_anon_file_reclaim = false;
 
 	/* If we have no swap space, do not bother scanning anon folios. */
 	if (!sc->may_swap || !can_reclaim_anon_pages(memcg, pgdat->node_id, sc)) {
@@ -2944,11 +2948,15 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 		goto out;
 	}
 
+	trace_android_rvh_set_balance_anon_file_reclaim(&balance_anon_file_reclaim);
+
 	/*
 	 * If there is enough inactive page cache, we do not reclaim
-	 * anything from the anonymous working right now.
+	 * anything from the anonymous working set right now. But when
+	 * balancing anon and file pages for reclaim, allow swapping of anon
+	 * pages even if there are plenty of inactive file cache pages.
 	 */
-	if (sc->cache_trim_mode) {
+	if (!balance_anon_file_reclaim && sc->cache_trim_mode) {
 		scan_balance = SCAN_FILE;
 		goto out;
 	}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b2371d7..436d3ef 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1176,9 +1176,7 @@ const char * const vmstat_text[] = {
 	"nr_zone_write_pending",
 	"nr_mlock",
 	"nr_bounce",
-#if IS_ENABLED(CONFIG_ZSMALLOC)
 	"nr_zspages",
-#endif
 	"nr_free_cma",
 
 	/* enum numa_stat_item counters */
diff --git a/modules.bzl b/modules.bzl
new file mode 100644
index 0000000..6473dcb
--- /dev/null
+++ b/modules.bzl
@@ -0,0 +1,59 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2022 The Android Open Source Project
+
+"""
+This module contains a full list of kernel modules
+compiled by GKI.
+"""
+
+COMMON_GKI_MODULES_LIST = [
+    # keep sorted
+    "drivers/block/zram/zram.ko",
+    "drivers/bluetooth/btbcm.ko",
+    "drivers/bluetooth/btqca.ko",
+    "drivers/bluetooth/btsdio.ko",
+    "drivers/bluetooth/hci_uart.ko",
+    "drivers/net/can/dev/can-dev.ko",
+    "drivers/net/can/slcan/slcan.ko",
+    "drivers/net/can/vcan.ko",
+    "drivers/net/ppp/bsd_comp.ko",
+    "drivers/net/ppp/ppp_deflate.ko",
+    "drivers/net/ppp/ppp_generic.ko",
+    "drivers/net/ppp/ppp_mppe.ko",
+    "drivers/net/ppp/pppox.ko",
+    "drivers/net/ppp/pptp.ko",
+    "drivers/net/slip/slhc.ko",
+    "drivers/usb/class/cdc-acm.ko",
+    "drivers/usb/serial/ftdi_sio.ko",
+    "drivers/usb/serial/usbserial.ko",
+    "lib/crypto/libarc4.ko",
+    "mm/zsmalloc.ko",
+    "net/6lowpan/6lowpan.ko",
+    "net/6lowpan/nhc_dest.ko",
+    "net/6lowpan/nhc_fragment.ko",
+    "net/6lowpan/nhc_hop.ko",
+    "net/6lowpan/nhc_ipv6.ko",
+    "net/6lowpan/nhc_mobility.ko",
+    "net/6lowpan/nhc_routing.ko",
+    "net/6lowpan/nhc_udp.ko",
+    "net/8021q/8021q.ko",
+    "net/bluetooth/bluetooth.ko",
+    "net/bluetooth/hidp/hidp.ko",
+    "net/bluetooth/rfcomm/rfcomm.ko",
+    "net/can/can.ko",
+    "net/can/can-bcm.ko",
+    "net/can/can-gw.ko",
+    "net/can/can-raw.ko",
+    "net/ieee802154/6lowpan/ieee802154_6lowpan.ko",
+    "net/ieee802154/ieee802154.ko",
+    "net/ieee802154/ieee802154_socket.ko",
+    "net/l2tp/l2tp_core.ko",
+    "net/l2tp/l2tp_ppp.ko",
+    "net/mac80211/mac80211.ko",
+    "net/mac802154/mac802154.ko",
+    "net/nfc/nfc.ko",
+    "net/rfkill/rfkill.ko",
+    "net/tipc/diag.ko",
+    "net/tipc/tipc.ko",
+    "net/wireless/cfg80211.ko",
+]
diff --git a/net/OWNERS b/net/OWNERS
new file mode 100644
index 0000000..cbbfa70
--- /dev/null
+++ b/net/OWNERS
@@ -0,0 +1,2 @@
+lorenzo@google.com
+maze@google.com
diff --git a/net/TEST_MAPPING b/net/TEST_MAPPING
new file mode 100644
index 0000000..1eb8d43
--- /dev/null
+++ b/net/TEST_MAPPING
@@ -0,0 +1,12 @@
+{
+  "presubmit": [
+    {
+      "name": "CtsNetTestCases",
+      "options": [
+        {
+          "exclude-annotation": "com.android.testutils.SkipPresubmit"
+        }
+      ]
+    }
+  ]
+}
diff --git a/net/core/dev.c b/net/core/dev.c
index 70e0685..0df7c9f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -150,6 +150,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/prandom.h>
 #include <linux/once_lite.h>
+#include <trace/hooks/net.h>
 
 #include "dev.h"
 #include "net-sysfs.h"
@@ -535,6 +536,12 @@ static inline void netdev_set_addr_lockdep_class(struct net_device *dev)
 
 static inline struct list_head *ptype_head(const struct packet_type *pt)
 {
+	struct list_head vendor_pt = { .next = NULL, };
+
+	trace_android_vh_ptype_head(pt, &vendor_pt);
+	if (vendor_pt.next)
+		return vendor_pt.next;
+
 	if (pt->type == htons(ETH_P_ALL))
 		return pt->dev ? &pt->dev->ptype_all : &ptype_all;
 	else
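
A vendor module attaches to the hook above to divert selected packet types
onto its own handler list. The sketch below follows the usual Android
vendor-hook registration pattern; the handler name, ethertype, and module are
all hypothetical, and the register call is an assumed product of the
DECLARE_HOOK machinery:

#include <linux/module.h>
#include <linux/netdevice.h>
#include <trace/hooks/net.h>

static LIST_HEAD(my_ptype_list);

/* Divert an experimental ethertype to a private handler list. */
static void my_ptype_head(void *data, const struct packet_type *pt,
			  struct list_head *vendor_pt)
{
	if (pt->type == cpu_to_be16(0x88b5))
		vendor_pt->next = &my_ptype_list;
}

static int __init my_init(void)
{
	return register_trace_android_vh_ptype_head(my_ptype_head, NULL);
}
module_init(my_init);
MODULE_LICENSE("GPL");
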
diff --git a/net/core/net-traces.c b/net/core/net-traces.c
index c40cd8dd..82a8a5c 100644
--- a/net/core/net-traces.c
+++ b/net/core/net-traces.c
@@ -35,13 +35,11 @@
 #include <trace/events/tcp.h>
 #include <trace/events/fib.h>
 #include <trace/events/qdisc.h>
-#if IS_ENABLED(CONFIG_BRIDGE)
 #include <trace/events/bridge.h>
 EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_add);
 EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_external_learn_add);
 EXPORT_TRACEPOINT_SYMBOL_GPL(fdb_delete);
 EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_update);
-#endif
 
 #if IS_ENABLED(CONFIG_PAGE_POOL)
 #include <trace/events/page_pool.h>
@@ -56,6 +54,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(neigh_event_send_dead);
 EXPORT_TRACEPOINT_SYMBOL_GPL(neigh_cleanup_and_release);
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kfree_skb);
+EXPORT_TRACEPOINT_SYMBOL_GPL(consume_skb);
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(napi_poll);
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 30407b2..c71c228 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -135,6 +135,7 @@
 #include <net/bpf_sk_storage.h>
 
 #include <trace/events/sock.h>
+#include <trace/hooks/sched.h>
 
 #include <net/tcp.h>
 #include <net/busy_poll.h>
@@ -3271,9 +3272,19 @@ void sock_def_readable(struct sock *sk)
 
 	rcu_read_lock();
 	wq = rcu_dereference(sk->sk_wq);
-	if (skwq_has_sleeper(wq))
+
+	if (skwq_has_sleeper(wq)) {
+		int done = 0;
+
+		trace_android_vh_do_wake_up_sync(&wq->wait, &done, sk);
+		if (done)
+			goto out;
+
 		wake_up_interruptible_sync_poll(&wq->wait, EPOLLIN | EPOLLPRI |
 						EPOLLRDNORM | EPOLLRDBAND);
+	}
+
+out:
 	sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
 	rcu_read_unlock();
 }
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index e6c7edc..c967e5f 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -211,6 +211,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
 	.accept_ra_rt_info_max_plen = 0,
 #endif
 #endif
+	.accept_ra_rt_table	= 0,
 	.proxy_ndp		= 0,
 	.accept_source_route	= 0,	/* we do not accept RH0 by default. */
 	.disable_ipv6		= 0,
@@ -271,6 +272,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
 	.accept_ra_rt_info_max_plen = 0,
 #endif
 #endif
+	.accept_ra_rt_table	= 0,
 	.proxy_ndp		= 0,
 	.accept_source_route	= 0,	/* we do not accept RH0 by default. */
 	.disable_ipv6		= 0,
@@ -2390,6 +2392,26 @@ static void ipv6_gen_rnd_iid(struct in6_addr *addr)
 		goto regen;
 }
 
+u32 addrconf_rt_table(const struct net_device *dev, u32 default_table)
+{
+	struct inet6_dev *idev = in6_dev_get(dev);
+	int sysctl;
+	u32 table;
+
+	if (!idev)
+		return default_table;
+	sysctl = idev->cnf.accept_ra_rt_table;
+	if (sysctl == 0) {
+		table = default_table;
+	} else if (sysctl > 0) {
+		table = (u32) sysctl;
+	} else {
+		table = (unsigned) dev->ifindex + (-sysctl);
+	}
+	in6_dev_put(idev);
+	return table;
+}
+
 /*
  *	Add prefix route.
  */
@@ -2400,7 +2422,7 @@ addrconf_prefix_route(struct in6_addr *pfx, int plen, u32 metric,
 		      u32 flags, gfp_t gfp_flags)
 {
 	struct fib6_config cfg = {
-		.fc_table = l3mdev_fib_table(dev) ? : RT6_TABLE_PREFIX,
+		.fc_table = l3mdev_fib_table(dev) ? : addrconf_rt_table(dev, RT6_TABLE_PREFIX),
 		.fc_metric = metric ? : IP6_RT_PRIO_ADDRCONF,
 		.fc_ifindex = dev->ifindex,
 		.fc_expires = expires,
@@ -2435,7 +2457,7 @@ static struct fib6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
 	struct fib6_node *fn;
 	struct fib6_info *rt = NULL;
 	struct fib6_table *table;
-	u32 tb_id = l3mdev_fib_table(dev) ? : RT6_TABLE_PREFIX;
+	u32 tb_id = l3mdev_fib_table(dev) ? : addrconf_rt_table(dev, RT6_TABLE_PREFIX);
 
 	table = fib6_get_table(dev_net(dev), tb_id);
 	if (!table)
@@ -6829,6 +6851,13 @@ static const struct ctl_table addrconf_sysctl[] = {
 #endif
 #endif
 	{
+		.procname	= "accept_ra_rt_table",
+		.data		= &ipv6_devconf.accept_ra_rt_table,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{
 		.procname	= "proxy_ndp",
 		.data		= &ipv6_devconf.proxy_ndp,
 		.maxlen		= sizeof(int),
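
The new accept_ra_rt_table sysctl (exposed as
/proc/sys/net/ipv6/conf/<dev>/accept_ra_rt_table) selects the routing table
for RA-learned routes: 0 keeps the default table, a positive value names one
fixed table for all interfaces, and a negative value spreads routes across
per-interface tables at ifindex + (-sysctl). A standalone sketch of that
selection, mirroring addrconf_rt_table() above (names local to the example):

#include <stdio.h>

static unsigned int ra_route_table(int sysctl, int ifindex,
				   unsigned int default_table)
{
	if (sysctl == 0)
		return default_table;
	if (sysctl > 0)
		return (unsigned int)sysctl;
	return (unsigned int)ifindex + (unsigned int)(-sysctl);
}

int main(void)
{
	printf("sysctl=0,     ifindex=3 -> table %u\n",
	       ra_route_table(0, 3, 254));
	printf("sysctl=100,   ifindex=3 -> table %u\n",
	       ra_route_table(100, 3, 254));
	printf("sysctl=-1000, ifindex=3 -> table %u\n",
	       ra_route_table(-1000, 3, 254));
	return 0;
}
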
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 2f355f0..b6dac89 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4251,7 +4251,7 @@ static struct fib6_info *rt6_get_route_info(struct net *net,
 					   const struct in6_addr *gwaddr,
 					   struct net_device *dev)
 {
-	u32 tb_id = l3mdev_fib_table(dev) ? : RT6_TABLE_INFO;
+	u32 tb_id = l3mdev_fib_table(dev) ? : addrconf_rt_table(dev, RT6_TABLE_INFO);
 	int ifindex = dev->ifindex;
 	struct fib6_node *fn;
 	struct fib6_info *rt = NULL;
@@ -4305,7 +4305,7 @@ static struct fib6_info *rt6_add_route_info(struct net *net,
 		.fc_nlinfo.nl_net = net,
 	};
 
-	cfg.fc_table = l3mdev_fib_table(dev) ? : RT6_TABLE_INFO;
+	cfg.fc_table = l3mdev_fib_table(dev) ? : addrconf_rt_table(dev, RT6_TABLE_INFO);
 	cfg.fc_dst = *prefix;
 	cfg.fc_gateway = *gwaddr;
 
@@ -4323,7 +4323,7 @@ struct fib6_info *rt6_get_dflt_router(struct net *net,
 				     const struct in6_addr *addr,
 				     struct net_device *dev)
 {
-	u32 tb_id = l3mdev_fib_table(dev) ? : RT6_TABLE_DFLT;
+	u32 tb_id = l3mdev_fib_table(dev) ? : addrconf_rt_table(dev, RT6_TABLE_DFLT);
 	struct fib6_info *rt;
 	struct fib6_table *table;
 
@@ -4358,7 +4358,7 @@ struct fib6_info *rt6_add_dflt_router(struct net *net,
 				     u32 defrtr_usr_metric)
 {
 	struct fib6_config cfg = {
-		.fc_table	= l3mdev_fib_table(dev) ? : RT6_TABLE_DFLT,
+		.fc_table	= l3mdev_fib_table(dev) ? : addrconf_rt_table(dev, RT6_TABLE_DFLT),
 		.fc_metric	= defrtr_usr_metric,
 		.fc_ifindex	= dev->ifindex,
 		.fc_flags	= RTF_GATEWAY | RTF_ADDRCONF | RTF_DEFAULT |
@@ -4383,47 +4383,24 @@ struct fib6_info *rt6_add_dflt_router(struct net *net,
 	return rt6_get_dflt_router(net, gwaddr, dev);
 }
 
-static void __rt6_purge_dflt_routers(struct net *net,
-				     struct fib6_table *table)
+static int rt6_addrconf_purge(struct fib6_info *rt, void *arg)
 {
-	struct fib6_info *rt;
+	struct net_device *dev = fib6_info_nh_dev(rt);
+	struct inet6_dev *idev = dev ? __in6_dev_get(dev) : NULL;
 
-restart:
-	rcu_read_lock();
-	for_each_fib6_node_rt_rcu(&table->tb6_root) {
-		struct net_device *dev = fib6_info_nh_dev(rt);
-		struct inet6_dev *idev = dev ? __in6_dev_get(dev) : NULL;
-
-		if (rt->fib6_flags & (RTF_DEFAULT | RTF_ADDRCONF) &&
-		    (!idev || idev->cnf.accept_ra != 2) &&
-		    fib6_info_hold_safe(rt)) {
-			rcu_read_unlock();
-			ip6_del_rt(net, rt, false);
-			goto restart;
-		}
+	if (rt->fib6_flags & (RTF_DEFAULT | RTF_ADDRCONF) &&
+	    (!idev || idev->cnf.accept_ra != 2)) {
+		/* Delete this route. See fib6_clean_tree() */
+		return -1;
 	}
-	rcu_read_unlock();
 
-	table->flags &= ~RT6_TABLE_HAS_DFLT_ROUTER;
+	/* Continue walking */
+	return 0;
 }
 
 void rt6_purge_dflt_routers(struct net *net)
 {
-	struct fib6_table *table;
-	struct hlist_head *head;
-	unsigned int h;
-
-	rcu_read_lock();
-
-	for (h = 0; h < FIB6_TABLE_HASHSZ; h++) {
-		head = &net->ipv6.fib_table_hash[h];
-		hlist_for_each_entry_rcu(table, head, tb6_hlist) {
-			if (table->flags & RT6_TABLE_HAS_DFLT_ROUTER)
-				__rt6_purge_dflt_routers(net, table);
-		}
-	}
-
-	rcu_read_unlock();
+	fib6_clean_all(net, rt6_addrconf_purge, NULL);
 }
 
 static void rtmsg_to_fib6_config(struct net *net,
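
rt6_purge_dflt_routers() now delegates the walk to fib6_clean_all(), whose
callback contract drives deletion: return -1 and the core removes the route,
return 0 and the walk continues. A toy model of that contract over a flat
array (types and names invented for this example):

#include <stdio.h>

struct route { int flags; int keep; };
#define RTF_DEFAULT 0x1

/* Mirrors rt6_addrconf_purge(): -1 means "delete", 0 means "keep going". */
static int purge_cb(struct route *rt)
{
	return (rt->flags & RTF_DEFAULT) ? -1 : 0;
}

static void clean_all(struct route *tbl, int n, int (*cb)(struct route *))
{
	for (int i = 0; i < n; i++)
		if (cb(&tbl[i]) == -1)
			tbl[i].keep = 0;	/* the core deletes it */
}

int main(void)
{
	struct route tbl[] = { { RTF_DEFAULT, 1 }, { 0, 1 }, { RTF_DEFAULT, 1 } };

	clean_all(tbl, 3, purge_cb);
	for (int i = 0; i < 3; i++)
		printf("route %d: %s\n", i, tbl[i].keep ? "kept" : "deleted");
	return 0;
}
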
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 4b8d046..09b2c54 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -1523,6 +1523,29 @@
 	  If you want to compile it as a module, say M here and read
 	  <file:Documentation/kbuild/modules.rst>.  If unsure, say `N'.
 
+config NETFILTER_XT_MATCH_QUOTA2
+	tristate '"quota2" match support'
+	depends on NETFILTER_ADVANCED
+	help
+	  This option adds a `quota2' match, which allows matching on a
+	  byte counter that is counted correctly rather than per CPU.
+	  It also allows naming the quotas.
+	  This is based on http://xtables-addons.git.sourceforge.net
+
+	  If you want to compile it as a module, say M here and read
+	  <file:Documentation/kbuild/modules.rst>.  If unsure, say `N'.
+
+config NETFILTER_XT_MATCH_QUOTA2_LOG
+	bool '"quota2" Netfilter LOG support'
+	depends on NETFILTER_XT_MATCH_QUOTA2
+	default n
+	help
+	  This option allows `quota2' to log ONCE when a quota limit
+	  is passed. It logs via NETLINK using the NETLINK_NFLOG family.
+	  It logs similarly to how ipt_ULOG would, but without packet data.
+
+	  If unsure, say `N'.
+
 config NETFILTER_XT_MATCH_RATEEST
 	tristate '"rateest" match support'
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 0f060d10..7756b2b 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -206,6 +206,7 @@
 obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_QUOTA) += xt_quota.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_QUOTA2) += xt_quota2.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_RATEEST) += xt_rateest.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_REALM) += xt_realm.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_RECENT) += xt_recent.o
diff --git a/net/netfilter/xt_IDLETIMER.c b/net/netfilter/xt_IDLETIMER.c
index 0f8bb0b..30ba8d7 100644
--- a/net/netfilter/xt_IDLETIMER.c
+++ b/net/netfilter/xt_IDLETIMER.c
@@ -28,6 +28,11 @@
 #include <linux/kobject.h>
 #include <linux/workqueue.h>
 #include <linux/sysfs.h>
+#include <linux/suspend.h>
+#include <net/sock.h>
+#include <net/inet_sock.h>
+
+#define NLMSG_MAX_SIZE 64
 
 struct idletimer_tg {
 	struct list_head entry;
@@ -38,15 +43,112 @@ struct idletimer_tg {
 	struct kobject *kobj;
 	struct device_attribute attr;
 
+	struct timespec64 delayed_timer_trigger;
+	struct timespec64 last_modified_timer;
+	struct timespec64 last_suspend_time;
+	struct notifier_block pm_nb;
+
+	int timeout;
 	unsigned int refcnt;
 	u8 timer_type;
+
+	bool work_pending;
+	bool send_nl_msg;
+	bool active;
+	uid_t uid;
+	bool suspend_time_valid;
 };
 
 static LIST_HEAD(idletimer_tg_list);
 static DEFINE_MUTEX(list_mutex);
+static DEFINE_SPINLOCK(timestamp_lock);
 
 static struct kobject *idletimer_tg_kobj;
 
+static bool check_for_delayed_trigger(struct idletimer_tg *timer,
+				      struct timespec64 *ts)
+{
+	bool state;
+	struct timespec64 temp;
+	spin_lock_bh(&timestamp_lock);
+	timer->work_pending = false;
+	if ((ts->tv_sec - timer->last_modified_timer.tv_sec) > timer->timeout ||
+	    timer->delayed_timer_trigger.tv_sec != 0) {
+		state = false;
+		temp.tv_sec = timer->timeout;
+		temp.tv_nsec = 0;
+		if (timer->delayed_timer_trigger.tv_sec != 0) {
+			temp = timespec64_add(timer->delayed_timer_trigger,
+					      temp);
+			ts->tv_sec = temp.tv_sec;
+			ts->tv_nsec = temp.tv_nsec;
+			timer->delayed_timer_trigger.tv_sec = 0;
+			timer->work_pending = true;
+			schedule_work(&timer->work);
+		} else {
+			temp = timespec64_add(timer->last_modified_timer, temp);
+			ts->tv_sec = temp.tv_sec;
+			ts->tv_nsec = temp.tv_nsec;
+		}
+	} else {
+		state = timer->active;
+	}
+	spin_unlock_bh(&timestamp_lock);
+	return state;
+}
+
+static void notify_netlink_uevent(const char *iface, struct idletimer_tg *timer)
+{
+	char iface_msg[NLMSG_MAX_SIZE];
+	char state_msg[NLMSG_MAX_SIZE];
+	char timestamp_msg[NLMSG_MAX_SIZE];
+	char uid_msg[NLMSG_MAX_SIZE];
+	char *envp[] = { iface_msg, state_msg, timestamp_msg, uid_msg, NULL };
+	int res;
+	struct timespec64 ts;
+	u64 time_ns;
+	bool state;
+
+	res = snprintf(iface_msg, NLMSG_MAX_SIZE, "INTERFACE=%s",
+		       iface);
+	if (NLMSG_MAX_SIZE <= res) {
+		pr_err("message too long (%d)\n", res);
+		return;
+	}
+
+	ts = ktime_to_timespec64(ktime_get_boottime());
+	state = check_for_delayed_trigger(timer, &ts);
+	res = snprintf(state_msg, NLMSG_MAX_SIZE, "STATE=%s",
+		       state ? "active" : "inactive");
+
+	if (NLMSG_MAX_SIZE <= res) {
+		pr_err("message too long (%d)\n", res);
+		return;
+	}
+
+	if (state) {
+		res = snprintf(uid_msg, NLMSG_MAX_SIZE, "UID=%u", timer->uid);
+		if (NLMSG_MAX_SIZE <= res)
+			pr_err("message too long (%d)\n", res);
+	} else {
+		res = snprintf(uid_msg, NLMSG_MAX_SIZE, "UID=");
+		if (NLMSG_MAX_SIZE <= res)
+			pr_err("message too long (%d)\n", res);
+	}
+
+	time_ns = timespec64_to_ns(&ts);
+	res = snprintf(timestamp_msg, NLMSG_MAX_SIZE, "TIME_NS=%llu", time_ns);
+	if (NLMSG_MAX_SIZE <= res) {
+		timestamp_msg[0] = '\0';
+		pr_err("message too long (%d)\n", res);
+	}
+
+	pr_debug("putting nlmsg: <%s> <%s> <%s> <%s>\n", iface_msg, state_msg,
+		 timestamp_msg, uid_msg);
+	kobject_uevent_env(idletimer_tg_kobj, KOBJ_CHANGE, envp);
+	return;
+}
+
 static
 struct idletimer_tg *__idletimer_tg_find_by_label(const char *label)
 {
@@ -67,6 +169,7 @@ static ssize_t idletimer_tg_show(struct device *dev,
 	unsigned long expires = 0;
 	struct timespec64 ktimespec = {};
 	long time_diff = 0;
+	unsigned long now = jiffies;
 
 	mutex_lock(&list_mutex);
 
@@ -78,15 +181,19 @@ static ssize_t idletimer_tg_show(struct device *dev,
 			time_diff = ktimespec.tv_sec;
 		} else {
 			expires = timer->timer.expires;
-			time_diff = jiffies_to_msecs(expires - jiffies) / 1000;
+			time_diff = jiffies_to_msecs(expires - now) / 1000;
 		}
 	}
 
 	mutex_unlock(&list_mutex);
 
-	if (time_after(expires, jiffies) || ktimespec.tv_sec > 0)
+	if (time_after(expires, now) || ktimespec.tv_sec > 0)
 		return sysfs_emit(buf, "%ld\n", time_diff);
 
+	if (timer && timer->send_nl_msg)
+		return sysfs_emit(buf, "0 %d\n",
+				  jiffies_to_msecs(now - expires) / 1000);
+
 	return sysfs_emit(buf, "0\n");
 }
 
@@ -96,6 +203,9 @@ static void idletimer_tg_work(struct work_struct *work)
 						  work);
 
 	sysfs_notify(idletimer_tg_kobj, NULL, timer->attr.attr.name);
+
+	if (timer->send_nl_msg)
+		notify_netlink_uevent(timer->attr.attr.name, timer);
 }
 
 static void idletimer_tg_expired(struct timer_list *t)
@@ -104,7 +214,62 @@ static void idletimer_tg_expired(struct timer_list *t)
 
 	pr_debug("timer %s expired\n", timer->attr.attr.name);
 
+	spin_lock_bh(&timestamp_lock);
+	timer->active = false;
+	timer->work_pending = true;
 	schedule_work(&timer->work);
+	spin_unlock_bh(&timestamp_lock);
+}
+
+static int idletimer_resume(struct notifier_block *notifier,
+			    unsigned long pm_event, void *unused)
+{
+	struct timespec64 ts;
+	unsigned long time_diff, now = jiffies;
+	struct idletimer_tg *timer = container_of(notifier,
+						  struct idletimer_tg, pm_nb);
+	if (!timer)
+		return NOTIFY_DONE;
+
+	switch (pm_event) {
+	case PM_SUSPEND_PREPARE:
+		timer->last_suspend_time =
+			ktime_to_timespec64(ktime_get_boottime());
+		timer->suspend_time_valid = true;
+		break;
+	case PM_POST_SUSPEND:
+		if (!timer->suspend_time_valid)
+			break;
+		timer->suspend_time_valid = false;
+
+		spin_lock_bh(&timestamp_lock);
+		if (!timer->active) {
+			spin_unlock_bh(&timestamp_lock);
+			break;
+		}
+		/* Since jiffies are not updated while suspended, "now"
+		 * represents the time at which the system suspended. */
+		if (time_after(timer->timer.expires, now)) {
+			ts = ktime_to_timespec64(ktime_get_boottime());
+			ts = timespec64_sub(ts, timer->last_suspend_time);
+			time_diff = timespec64_to_jiffies(&ts);
+			if (timer->timer.expires > (time_diff + now)) {
+				mod_timer_pending(&timer->timer,
+						  (timer->timer.expires - time_diff));
+			} else {
+				del_timer(&timer->timer);
+				timer->timer.expires = 0;
+				timer->active = false;
+				timer->work_pending = true;
+				schedule_work(&timer->work);
+			}
+		}
+		spin_unlock_bh(&timestamp_lock);
+		break;
+	default:
+		break;
+	}
+	return NOTIFY_DONE;
 }
 
 static enum alarmtimer_restart idletimer_tg_alarmproc(struct alarm *alarm,
@@ -158,17 +323,34 @@ static int idletimer_tg_create(struct idletimer_tg_info *info)
 
 	ret = sysfs_create_file(idletimer_tg_kobj, &info->timer->attr.attr);
 	if (ret < 0) {
-		pr_debug("couldn't add file to sysfs");
+		pr_debug("couldn't add file to sysfs\n");
 		goto out_free_attr;
 	}
 
 	list_add(&info->timer->entry, &idletimer_tg_list);
-
-	timer_setup(&info->timer->timer, idletimer_tg_expired, 0);
+	pr_debug("timer type value is 0.\n");
+	info->timer->timer_type = 0;
 	info->timer->refcnt = 1;
+	info->timer->send_nl_msg = false;
+	info->timer->active = true;
+	info->timer->timeout = info->timeout;
+
+	info->timer->delayed_timer_trigger.tv_sec = 0;
+	info->timer->delayed_timer_trigger.tv_nsec = 0;
+	info->timer->work_pending = false;
+	info->timer->uid = 0;
+	info->timer->last_modified_timer =
+		ktime_to_timespec64(ktime_get_boottime());
+
+	info->timer->pm_nb.notifier_call = idletimer_resume;
+	ret = register_pm_notifier(&info->timer->pm_nb);
+	if (ret)
+		printk(KERN_WARNING "[%s] Failed to register pm notifier %d\n",
+		       __func__, ret);
 
 	INIT_WORK(&info->timer->work, idletimer_tg_work);
 
+	timer_setup(&info->timer->timer, idletimer_tg_expired, 0);
 	mod_timer(&info->timer->timer,
 		  msecs_to_jiffies(info->timeout * 1000) + jiffies);
 
@@ -186,7 +368,7 @@ static int idletimer_tg_create_v1(struct idletimer_tg_info_v1 *info)
 {
 	int ret;
 
-	info->timer = kmalloc(sizeof(*info->timer), GFP_KERNEL);
+	info->timer = kzalloc(sizeof(*info->timer), GFP_KERNEL);
 	if (!info->timer) {
 		ret = -ENOMEM;
 		goto out;
@@ -207,7 +389,7 @@ static int idletimer_tg_create_v1(struct idletimer_tg_info_v1 *info)
 
 	ret = sysfs_create_file(idletimer_tg_kobj, &info->timer->attr.attr);
 	if (ret < 0) {
-		pr_debug("couldn't add file to sysfs");
+		pr_debug("couldn't add file to sysfs\n");
 		goto out_free_attr;
 	}
 
@@ -215,9 +397,25 @@ static int idletimer_tg_create_v1(struct idletimer_tg_info_v1 *info)
 	kobject_uevent(idletimer_tg_kobj,KOBJ_ADD);
 
 	list_add(&info->timer->entry, &idletimer_tg_list);
-	pr_debug("timer type value is %u", info->timer_type);
+	pr_debug("timer type value is %u\n", info->timer_type);
 	info->timer->timer_type = info->timer_type;
 	info->timer->refcnt = 1;
+	info->timer->send_nl_msg = (info->send_nl_msg != 0);
+	info->timer->active = true;
+	info->timer->timeout = info->timeout;
+
+	info->timer->delayed_timer_trigger.tv_sec = 0;
+	info->timer->delayed_timer_trigger.tv_nsec = 0;
+	info->timer->work_pending = false;
+	info->timer->uid = 0;
+	info->timer->last_modified_timer =
+		ktime_to_timespec64(ktime_get_boottime());
+
+	info->timer->pm_nb.notifier_call = idletimer_resume;
+	ret = register_pm_notifier(&info->timer->pm_nb);
+	if (ret)
+		printk(KERN_WARNING "[%s] Failed to register pm notifier %d\n",
+		       __func__, ret);
 
 	INIT_WORK(&info->timer->work, idletimer_tg_work);
 
@@ -231,7 +429,7 @@ static int idletimer_tg_create_v1(struct idletimer_tg_info_v1 *info)
 	} else {
 		timer_setup(&info->timer->timer, idletimer_tg_expired, 0);
 		mod_timer(&info->timer->timer,
-				msecs_to_jiffies(info->timeout * 1000) + jiffies);
+			  msecs_to_jiffies(info->timeout * 1000) + jiffies);
 	}
 
 	return 0;
@@ -244,6 +442,41 @@ static int idletimer_tg_create_v1(struct idletimer_tg_info_v1 *info)
 	return ret;
 }
 
+static void reset_timer(struct idletimer_tg * const info_timer,
+			const __u32 info_timeout,
+			struct sk_buff *skb)
+{
+	unsigned long now = jiffies;
+	bool timer_prev;
+
+	spin_lock_bh(&timestamp_lock);
+	timer_prev = info_timer->active;
+	info_timer->active = true;
+	/* timer_prev guards against jiffies-overflow problems in time_before() */
+	if (!timer_prev || time_before(info_timer->timer.expires, now)) {
+		pr_debug("Starting Checkentry timer (Expired, Jiffies): %lu, %lu\n",
+			 info_timer->timer.expires, now);
+
+		/* Store the uid responsible for waking up the radio. */
+		if (skb && (skb->sk)) {
+			info_timer->uid = from_kuid_munged(current_user_ns(),
+							   sock_i_uid(skb_to_full_sk(skb)));
+		}
+
+		/* Check whether there is a pending inactive notification. */
+		if (info_timer->work_pending)
+			info_timer->delayed_timer_trigger = info_timer->last_modified_timer;
+		else {
+			info_timer->work_pending = true;
+			schedule_work(&info_timer->work);
+		}
+	}
+
+	info_timer->last_modified_timer = ktime_to_timespec64(ktime_get_boottime());
+	mod_timer(&info_timer->timer, msecs_to_jiffies(info_timeout * 1000) + now);
+	spin_unlock_bh(&timestamp_lock);
+}
+
 /*
  * The actual xt_tables plugin.
  */
@@ -251,12 +484,21 @@ static unsigned int idletimer_tg_target(struct sk_buff *skb,
 					 const struct xt_action_param *par)
 {
 	const struct idletimer_tg_info *info = par->targinfo;
+	unsigned long now = jiffies;
 
 	pr_debug("resetting timer %s, timeout period %u\n",
 		 info->label, info->timeout);
 
-	mod_timer(&info->timer->timer,
-		  msecs_to_jiffies(info->timeout * 1000) + jiffies);
+	info->timer->active = true;
+
+	if (time_before(info->timer->timer.expires, now)) {
+		schedule_work(&info->timer->work);
+		pr_debug("Starting timer %s (Expired, Jiffies): %lu, %lu\n",
+			 info->label, info->timer->timer.expires, now);
+	}
+
+	/* TODO: Avoid modifying timers on each packet */
+	reset_timer(info->timer, info->timeout, skb);
 
 	return XT_CONTINUE;
 }
@@ -268,6 +510,7 @@ static unsigned int idletimer_tg_target_v1(struct sk_buff *skb,
 					 const struct xt_action_param *par)
 {
 	const struct idletimer_tg_info_v1 *info = par->targinfo;
+	unsigned long now = jiffies;
 
 	pr_debug("resetting timer %s, timeout period %u\n",
 		 info->label, info->timeout);
@@ -276,8 +519,16 @@ static unsigned int idletimer_tg_target_v1(struct sk_buff *skb,
 		ktime_t tout = ktime_set(info->timeout, 0);
 		alarm_start_relative(&info->timer->alarm, tout);
 	} else {
-		mod_timer(&info->timer->timer,
-				msecs_to_jiffies(info->timeout * 1000) + jiffies);
+		info->timer->active = true;
+
+		if (time_before(info->timer->timer.expires, now)) {
+			schedule_work(&info->timer->work);
+			pr_debug("Starting timer %s (Expired, Jiffies): %lu, %lu\n",
+				 info->label, info->timer->timer.expires, now);
+		}
+
+		/* TODO: Avoid modifying timers on each packet */
+		reset_timer(info->timer, info->timeout, skb);
 	}
 
 	return XT_CONTINUE;
@@ -321,9 +572,7 @@ static int idletimer_tg_checkentry(const struct xt_tgchk_param *par)
 	info->timer = __idletimer_tg_find_by_label(info->label);
 	if (info->timer) {
 		info->timer->refcnt++;
-		mod_timer(&info->timer->timer,
-			  msecs_to_jiffies(info->timeout * 1000) + jiffies);
-
+		reset_timer(info->timer, info->timeout, NULL);
 		pr_debug("increased refcnt of timer %s to %u\n",
 			 info->label, info->timer->refcnt);
 	} else {
@@ -346,9 +595,6 @@ static int idletimer_tg_checkentry_v1(const struct xt_tgchk_param *par)
 
 	pr_debug("checkentry targinfo%s\n", info->label);
 
-	if (info->send_nl_msg)
-		return -EOPNOTSUPP;
-
 	ret = idletimer_tg_helper((struct idletimer_tg_info *)info);
 	if(ret < 0)
 	{
@@ -361,6 +607,11 @@ static int idletimer_tg_checkentry_v1(const struct xt_tgchk_param *par)
 		return -EINVAL;
 	}
 
+	if (info->send_nl_msg > 1) {
+		pr_debug("invalid value for send_nl_msg\n");
+		return -EINVAL;
+	}
+
 	mutex_lock(&list_mutex);
 
 	info->timer = __idletimer_tg_find_by_label(info->label);
@@ -383,8 +634,7 @@ static int idletimer_tg_checkentry_v1(const struct xt_tgchk_param *par)
 				alarm_start_relative(&info->timer->alarm, tout);
 			}
 		} else {
-				mod_timer(&info->timer->timer,
-					msecs_to_jiffies(info->timeout * 1000) + jiffies);
+			reset_timer(info->timer, info->timeout, NULL);
 		}
 		pr_debug("increased refcnt of timer %s to %u\n",
 			 info->label, info->timer->refcnt);
@@ -414,8 +664,9 @@ static void idletimer_tg_destroy(const struct xt_tgdtor_param *par)
 
 		list_del(&info->timer->entry);
 		del_timer_sync(&info->timer->timer);
-		cancel_work_sync(&info->timer->work);
 		sysfs_remove_file(idletimer_tg_kobj, &info->timer->attr.attr);
+		unregister_pm_notifier(&info->timer->pm_nb);
+		cancel_work_sync(&info->timer->work);
 		kfree(info->timer->attr.attr.name);
 		kfree(info->timer);
 	} else {
@@ -443,8 +694,9 @@ static void idletimer_tg_destroy_v1(const struct xt_tgdtor_param *par)
 		} else {
 			del_timer_sync(&info->timer->timer);
 		}
-		cancel_work_sync(&info->timer->work);
 		sysfs_remove_file(idletimer_tg_kobj, &info->timer->attr.attr);
+		unregister_pm_notifier(&info->timer->pm_nb);
+		cancel_work_sync(&info->timer->work);
 		kfree(info->timer->attr.attr.name);
 		kfree(info->timer);
 	} else {
@@ -540,3 +792,4 @@ MODULE_DESCRIPTION("Xtables: idle time monitor");
 MODULE_LICENSE("GPL v2");
 MODULE_ALIAS("ipt_IDLETIMER");
 MODULE_ALIAS("ip6t_IDLETIMER");
+MODULE_ALIAS("arpt_IDLETIMER");
diff --git a/net/netfilter/xt_quota2.c b/net/netfilter/xt_quota2.c
new file mode 100644
index 0000000..27acc86
--- /dev/null
+++ b/net/netfilter/xt_quota2.c
@@ -0,0 +1,397 @@
+/*
+ * xt_quota2 - enhanced xt_quota that can count upwards and in packets
+ * as a minimal accounting match.
+ * by Jan Engelhardt <jengelh@medozas.de>, 2008
+ *
+ * Originally based on xt_quota.c:
+ * 	netfilter module to enforce network quotas
+ * 	Sam Johnston <samj@samj.net>
+ *
+ *	This program is free software; you can redistribute it and/or modify
+ *	it under the terms of the GNU General Public License; either
+ *	version 2 of the License, as published by the Free Software Foundation.
+ */
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/skbuff.h>
+#include <linux/spinlock.h>
+#include <linux/atomic.h>
+#include <net/netlink.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_quota2.h>
+
+#ifdef CONFIG_NETFILTER_XT_MATCH_QUOTA2_LOG
+/* For compatibility, these definitions are copied from the
+ * deprecated header file <linux/netfilter_ipv4/ipt_ULOG.h> */
+#define ULOG_MAC_LEN	80
+#define ULOG_PREFIX_LEN	32
+
+/* Format of the ULOG packets passed through netlink */
+typedef struct ulog_packet_msg {
+	unsigned long mark;
+	long timestamp_sec;
+	long timestamp_usec;
+	unsigned int hook;
+	char indev_name[IFNAMSIZ];
+	char outdev_name[IFNAMSIZ];
+	size_t data_len;
+	char prefix[ULOG_PREFIX_LEN];
+	unsigned char mac_len;
+	unsigned char mac[ULOG_MAC_LEN];
+	unsigned char payload[0];
+} ulog_packet_msg_t;
+#endif
+
+/**
+ * @lock:	lock to protect quota writers from each other
+ */
+struct xt_quota_counter {
+	u_int64_t quota;
+	spinlock_t lock;
+	struct list_head list;
+	atomic_t ref;
+	char name[sizeof(((struct xt_quota_mtinfo2 *)NULL)->name)];
+	struct proc_dir_entry *procfs_entry;
+};
+
+#ifdef CONFIG_NETFILTER_XT_MATCH_QUOTA2_LOG
+/* Harald's favorite number +1 :D, from ipt_ULOG.c */
+static unsigned int qlog_nl_event = 112;
+module_param_named(event_num, qlog_nl_event, uint, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(event_num,
+		 "Event number for NETLINK_NFLOG message. 0 disables log."
+		 "111 is what ipt_ULOG uses.");
+static struct sock *nflognl;
+#endif
+
+static LIST_HEAD(counter_list);
+static DEFINE_SPINLOCK(counter_list_lock);
+
+static struct proc_dir_entry *proc_xt_quota;
+static unsigned int quota_list_perms = S_IRUGO | S_IWUSR;
+static kuid_t quota_list_uid = KUIDT_INIT(0);
+static kgid_t quota_list_gid = KGIDT_INIT(0);
+module_param_named(perms, quota_list_perms, uint, S_IRUGO | S_IWUSR);
+
+#ifdef CONFIG_NETFILTER_XT_MATCH_QUOTA2_LOG
+static void quota2_log(unsigned int hooknum,
+		       const struct sk_buff *skb,
+		       const struct net_device *in,
+		       const struct net_device *out,
+		       const char *prefix)
+{
+	ulog_packet_msg_t *pm;
+	struct sk_buff *log_skb;
+	size_t size;
+	struct nlmsghdr *nlh;
+
+	if (!qlog_nl_event)
+		return;
+
+	size = NLMSG_SPACE(sizeof(*pm));
+	size = max(size, (size_t)NLMSG_GOODSIZE);
+	log_skb = alloc_skb(size, GFP_ATOMIC);
+	if (!log_skb) {
+		pr_err("xt_quota2: cannot alloc skb for logging\n");
+		return;
+	}
+
+	nlh = nlmsg_put(log_skb, /*pid*/0, /*seq*/0, qlog_nl_event,
+			sizeof(*pm), 0);
+	if (!nlh) {
+		pr_err("xt_quota2: nlmsg_put failed\n");
+		kfree_skb(log_skb);
+		return;
+	}
+	pm = nlmsg_data(nlh);
+	memset(pm, 0, sizeof(*pm));
+	if (skb->tstamp == 0)
+		__net_timestamp((struct sk_buff *)skb);
+	pm->hook = hooknum;
+	if (prefix != NULL)
+		strlcpy(pm->prefix, prefix, sizeof(pm->prefix));
+	if (in)
+		strlcpy(pm->indev_name, in->name, sizeof(pm->indev_name));
+	if (out)
+		strlcpy(pm->outdev_name, out->name, sizeof(pm->outdev_name));
+
+	NETLINK_CB(log_skb).dst_group = 1;
+	pr_debug("throwing 1 packets to netlink group 1\n");
+	netlink_broadcast(nflognl, log_skb, 0, 1, GFP_ATOMIC);
+}
+#else
+static void quota2_log(unsigned int hooknum,
+		       const struct sk_buff *skb,
+		       const struct net_device *in,
+		       const struct net_device *out,
+		       const char *prefix)
+{
+}
+#endif  /* if+else CONFIG_NETFILTER_XT_MATCH_QUOTA2_LOG */
+
+static ssize_t quota_proc_read(struct file *file, char __user *buf,
+			   size_t size, loff_t *ppos)
+{
+	struct xt_quota_counter *e = pde_data(file_inode(file));
+	char tmp[24];
+	size_t tmp_size;
+
+	spin_lock_bh(&e->lock);
+	tmp_size = scnprintf(tmp, sizeof(tmp), "%llu\n", e->quota);
+	spin_unlock_bh(&e->lock);
+	return simple_read_from_buffer(buf, size, ppos, tmp, tmp_size);
+}
+
+static ssize_t quota_proc_write(struct file *file, const char __user *input,
+                            size_t size, loff_t *ppos)
+{
+	struct xt_quota_counter *e = pde_data(file_inode(file));
+	char buf[sizeof("18446744073709551616")];
+
+	if (size > sizeof(buf))
+		size = sizeof(buf);
+	if (copy_from_user(buf, input, size) != 0)
+		return -EFAULT;
+	buf[sizeof(buf)-1] = '\0';
+	if (size < sizeof(buf))
+		buf[size] = '\0';
+
+	spin_lock_bh(&e->lock);
+	e->quota = simple_strtoull(buf, NULL, 0);
+	spin_unlock_bh(&e->lock);
+	return size;
+}
+
+static const struct proc_ops q2_counter_fops = {
+	.proc_read	= quota_proc_read,
+	.proc_write	= quota_proc_write,
+	.proc_lseek	= default_llseek,
+};
+
+static struct xt_quota_counter *
+q2_new_counter(const struct xt_quota_mtinfo2 *q, bool anon)
+{
+	struct xt_quota_counter *e;
+	unsigned int size;
+
+	/* Do not need all the procfs things for anonymous counters. */
+	size = anon ? offsetof(typeof(*e), list) : sizeof(*e);
+	e = kmalloc(size, GFP_KERNEL);
+	if (e == NULL)
+		return NULL;
+
+	e->quota = q->quota;
+	spin_lock_init(&e->lock);
+	if (!anon) {
+		INIT_LIST_HEAD(&e->list);
+		atomic_set(&e->ref, 1);
+		strlcpy(e->name, q->name, sizeof(e->name));
+	}
+	return e;
+}
+
+/**
+ * q2_get_counter - get ref to counter or create new
+ * @name:	name of counter
+ */
+static struct xt_quota_counter *
+q2_get_counter(const struct xt_quota_mtinfo2 *q)
+{
+	struct proc_dir_entry *p;
+	struct xt_quota_counter *e = NULL;
+	struct xt_quota_counter *new_e;
+
+	if (*q->name == '\0')
+		return q2_new_counter(q, true);
+
+	/* No need to hold a lock while getting a new counter */
+	new_e = q2_new_counter(q, false);
+	if (new_e == NULL)
+		goto out;
+
+	spin_lock_bh(&counter_list_lock);
+	list_for_each_entry(e, &counter_list, list)
+		if (strcmp(e->name, q->name) == 0) {
+			atomic_inc(&e->ref);
+			spin_unlock_bh(&counter_list_lock);
+			kfree(new_e);
+			pr_debug("xt_quota2: old counter name=%s", e->name);
+			return e;
+		}
+	e = new_e;
+	pr_debug("xt_quota2: new_counter name=%s", e->name);
+	list_add_tail(&e->list, &counter_list);
+	/* An entry with a refcount of 1 is not directly destructible.
+	 * This function has not yet returned the new entry, so iptables
+	 * holds no reference with which to destroy it.
+	 * For another rule to destroy it, this function would first have
+	 * to be re-invoked and acquire a new ref for the same named quota.
+	 * Nobody will access e->procfs_entry either.
+	 * So it is safe to release the lock here. */
+	spin_unlock_bh(&counter_list_lock);
+
+	/* proc_create_data() may sleep, so do not call it under the spinlock */
+	p = e->procfs_entry = proc_create_data(e->name, quota_list_perms,
+	                      proc_xt_quota, &q2_counter_fops, e);
+
+	if (IS_ERR_OR_NULL(p)) {
+		spin_lock_bh(&counter_list_lock);
+		list_del(&e->list);
+		spin_unlock_bh(&counter_list_lock);
+		goto out;
+	}
+	proc_set_user(p, quota_list_uid, quota_list_gid);
+	return e;
+
+ out:
+	kfree(e);
+	return NULL;
+}
+
+static int quota_mt2_check(const struct xt_mtchk_param *par)
+{
+	struct xt_quota_mtinfo2 *q = par->matchinfo;
+
+	pr_debug("xt_quota2: check() flags=0x%04x", q->flags);
+
+	if (q->flags & ~XT_QUOTA_MASK)
+		return -EINVAL;
+
+	q->name[sizeof(q->name)-1] = '\0';
+	if (*q->name == '.' || strchr(q->name, '/') != NULL) {
+		printk(KERN_ERR "xt_quota.3: illegal name\n");
+		return -EINVAL;
+	}
+
+	q->master = q2_get_counter(q);
+	if (q->master == NULL) {
+		printk(KERN_ERR "xt_quota.3: memory alloc failure\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void quota_mt2_destroy(const struct xt_mtdtor_param *par)
+{
+	struct xt_quota_mtinfo2 *q = par->matchinfo;
+	struct xt_quota_counter *e = q->master;
+
+	if (*q->name == '\0') {
+		kfree(e);
+		return;
+	}
+
+	spin_lock_bh(&counter_list_lock);
+	if (!atomic_dec_and_test(&e->ref)) {
+		spin_unlock_bh(&counter_list_lock);
+		return;
+	}
+
+	list_del(&e->list);
+	spin_unlock_bh(&counter_list_lock);
+	remove_proc_entry(e->name, proc_xt_quota);
+	kfree(e);
+}
+
+static bool
+quota_mt2(const struct sk_buff *skb, struct xt_action_param *par)
+{
+	struct xt_quota_mtinfo2 *q = (void *)par->matchinfo;
+	struct xt_quota_counter *e = q->master;
+	int charge = (q->flags & XT_QUOTA_PACKET) ? 1 : skb->len;
+	bool no_change = q->flags & XT_QUOTA_NO_CHANGE;
+	bool ret = q->flags & XT_QUOTA_INVERT;
+
+	spin_lock_bh(&e->lock);
+	if (q->flags & XT_QUOTA_GROW) {
+		/*
+		 * While no_change is pointless in "grow" mode, we will
+		 * implement it here simply to have a consistent behavior.
+		 */
+		if (!no_change)
+			e->quota += charge;
+		ret = true; /* note: does not respect inversion (bug??) */
+	} else {
+		if (e->quota > charge) {
+			if (!no_change)
+				e->quota -= charge;
+			ret = !ret;
+		} else if (e->quota) {
+			/* We are transitioning, log that fact. */
+			quota2_log(xt_hooknum(par),
+				   skb,
+				   xt_in(par),
+				   xt_out(par),
+				   q->name);
+			/* we do not allow even small packets from now on */
+			e->quota = 0;
+		}
+	}
+	spin_unlock_bh(&e->lock);
+	return ret;
+}
+
+static struct xt_match quota_mt2_reg[] __read_mostly = {
+	{
+		.name       = "quota2",
+		.revision   = 3,
+		.family     = NFPROTO_IPV4,
+		.checkentry = quota_mt2_check,
+		.match      = quota_mt2,
+		.destroy    = quota_mt2_destroy,
+		.matchsize  = sizeof(struct xt_quota_mtinfo2),
+		.usersize   = offsetof(struct xt_quota_mtinfo2, master),
+		.me         = THIS_MODULE,
+	},
+	{
+		.name       = "quota2",
+		.revision   = 3,
+		.family     = NFPROTO_IPV6,
+		.checkentry = quota_mt2_check,
+		.match      = quota_mt2,
+		.destroy    = quota_mt2_destroy,
+		.matchsize  = sizeof(struct xt_quota_mtinfo2),
+		.usersize   = offsetof(struct xt_quota_mtinfo2, master),
+		.me         = THIS_MODULE,
+	},
+};
+
+static int __init quota_mt2_init(void)
+{
+	int ret;
+	pr_debug("xt_quota2: init()");
+
+#ifdef CONFIG_NETFILTER_XT_MATCH_QUOTA2_LOG
+	nflognl = netlink_kernel_create(&init_net, NETLINK_NFLOG, NULL);
+	if (!nflognl)
+		return -ENOMEM;
+#endif
+
+	proc_xt_quota = proc_mkdir("xt_quota", init_net.proc_net);
+	if (proc_xt_quota == NULL)
+		return -EACCES;
+
+	ret = xt_register_matches(quota_mt2_reg, ARRAY_SIZE(quota_mt2_reg));
+	if (ret < 0)
+		remove_proc_entry("xt_quota", init_net.proc_net);
+	pr_debug("xt_quota2: init() %d", ret);
+	return ret;
+}
+
+static void __exit quota_mt2_exit(void)
+{
+	xt_unregister_matches(quota_mt2_reg, ARRAY_SIZE(quota_mt2_reg));
+	remove_proc_entry("xt_quota", init_net.proc_net);
+}
+
+module_init(quota_mt2_init);
+module_exit(quota_mt2_exit);
+MODULE_DESCRIPTION("Xtables: countdown quota match; up counter");
+MODULE_AUTHOR("Sam Johnston <samj@samj.net>");
+MODULE_AUTHOR("Jan Engelhardt <jengelh@medozas.de>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("ipt_quota2");
+MODULE_ALIAS("ip6t_quota2");
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index a9980e9b..b6ab3c0 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -26,6 +26,10 @@
 /* Threshold for detecting small packets to copy */
 #define GOOD_COPY_LEN  128
 
+uint virtio_transport_max_vsock_pkt_buf_size = 64 * 1024;
+module_param(virtio_transport_max_vsock_pkt_buf_size, uint, 0444);
+EXPORT_SYMBOL_GPL(virtio_transport_max_vsock_pkt_buf_size);
+
 static const struct virtio_transport *
 virtio_transport_get_ops(struct vsock_sock *vsk)
 {
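
The new virtio_transport_max_vsock_pkt_buf_size parameter is read-only at
runtime (mode 0444), so it can only be set at module load time or on the
kernel command line, but it remains visible through sysfs. A small
userspace sketch for reading it back (the module directory in the path is
an assumption; it depends on which module this object file is linked into):

	#include <stdio.h>

	int main(void)
	{
		unsigned int val;
		/* path is illustrative; adjust to the actual module name */
		FILE *f = fopen("/sys/module/vmw_vsock_virtio_transport_common"
				"/parameters/virtio_transport_max_vsock_pkt_buf_size",
				"r");

		if (!f || fscanf(f, "%u", &val) != 1)
			return 1;
		printf("max vsock pkt buf size: %u\n", val);
		fclose(f);
		return 0;
	}
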
diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c
index 094734f..8ff6e87 100644
--- a/net/xfrm/xfrm_algo.c
+++ b/net/xfrm/xfrm_algo.c
@@ -237,7 +237,7 @@ static struct xfrm_algo_desc aalg_list[] = {
 
 	.uinfo = {
 		.auth = {
-			.icv_truncbits = 96,
+			.icv_truncbits = IS_ENABLED(CONFIG_GKI_NET_XFRM_HACKS) ? 128 : 96,
 			.icv_fullbits = 256,
 		}
 	},
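
IS_ENABLED() expands to a constant 0 or 1, so the ternary above is folded
at compile time; for this entry (evidently HMAC-SHA-256, given
.icv_fullbits = 256) GKI kernels select the RFC 4868 128-bit ICV
truncation instead of the legacy Linux 96-bit value. The assignment is
equivalent to:

	#if IS_ENABLED(CONFIG_GKI_NET_XFRM_HACKS)
			.icv_truncbits = 128,	/* RFC 4868 HMAC-SHA-256-128 */
	#else
			.icv_truncbits = 96,	/* legacy Linux truncation */
	#endif
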
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 9a5e79a..93e790d 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -480,13 +480,11 @@ static int xfrm_outer_mode_output(struct xfrm_state *x, struct sk_buff *skb)
 	return -EOPNOTSUPP;
 }
 
-#if IS_ENABLED(CONFIG_NET_PKTGEN)
 int pktgen_xfrm_outer_mode_output(struct xfrm_state *x, struct sk_buff *skb)
 {
 	return xfrm_outer_mode_output(x, skb);
 }
 EXPORT_SYMBOL_GPL(pktgen_xfrm_outer_mode_output);
-#endif
 
 static int xfrm_output_one(struct sk_buff *skb, int err)
 {
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 3d2fe77..8499779 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2442,19 +2442,22 @@ int xfrm_user_policy(struct sock *sk, int optname, sockptr_t optval, int optlen)
 	if (IS_ERR(data))
 		return PTR_ERR(data);
 
-	if (in_compat_syscall()) {
-		struct xfrm_translator *xtr = xfrm_get_translator();
+	/* Use the 64-bit / untranslated format on Android, even for compat */
+	if (!IS_ENABLED(CONFIG_GKI_NET_XFRM_HACKS) || IS_ENABLED(CONFIG_XFRM_USER_COMPAT)) {
+		if (in_compat_syscall()) {
+			struct xfrm_translator *xtr = xfrm_get_translator();
 
-		if (!xtr) {
-			kfree(data);
-			return -EOPNOTSUPP;
-		}
+			if (!xtr) {
+				kfree(data);
+				return -EOPNOTSUPP;
+			}
 
-		err = xtr->xlate_user_policy_sockptr(&data, optlen);
-		xfrm_put_translator(xtr);
-		if (err) {
-			kfree(data);
-			return err;
+			err = xtr->xlate_user_policy_sockptr(&data, optlen);
+			xfrm_put_translator(xtr);
+			if (err) {
+				kfree(data);
+				return err;
+			}
 		}
 	}
 
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index e73f9ef..9605896 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -3007,19 +3007,22 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (!netlink_net_capable(skb, CAP_NET_ADMIN))
 		return -EPERM;
 
-	if (in_compat_syscall()) {
-		struct xfrm_translator *xtr = xfrm_get_translator();
+	/* Use the 64-bit / untranslated format on Android, even for compat */
+	if (!IS_ENABLED(CONFIG_GKI_NET_XFRM_HACKS) || IS_ENABLED(CONFIG_XFRM_USER_COMPAT)) {
+		if (in_compat_syscall()) {
+			struct xfrm_translator *xtr = xfrm_get_translator();
 
-		if (!xtr)
-			return -EOPNOTSUPP;
+			if (!xtr)
+				return -EOPNOTSUPP;
 
-		nlh64 = xtr->rcv_msg_compat(nlh, link->nla_max,
-					    link->nla_pol, extack);
-		xfrm_put_translator(xtr);
-		if (IS_ERR(nlh64))
-			return PTR_ERR(nlh64);
-		if (nlh64)
-			nlh = nlh64;
+			nlh64 = xtr->rcv_msg_compat(nlh, link->nla_max,
+						    link->nla_pol, extack);
+			xfrm_put_translator(xtr);
+			if (IS_ERR(nlh64))
+				return PTR_ERR(nlh64);
+			if (nlh64)
+				nlh = nlh64;
+		}
 	}
 
 	if ((type == (XFRM_MSG_GETSA - XFRM_MSG_BASE) ||
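
Both of the preceding hunks (xfrm_state.c and xfrm_user.c) wrap the
existing compat-translation path in the same gate, which is easier to see
in isolation. A distilled sketch, where translate_payload() is a
hypothetical stand-in for the translator calls:

	if (!IS_ENABLED(CONFIG_GKI_NET_XFRM_HACKS) ||
	    IS_ENABLED(CONFIG_XFRM_USER_COMPAT)) {
		if (in_compat_syscall())
			translate_payload();	/* hypothetical stand-in */
	}
	/* else: 32-bit callers must already use the 64-bit format */
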
diff --git a/samples/crypto/fips140_lab_util.c b/samples/crypto/fips140_lab_util.c
new file mode 100644
index 0000000..5f8e901
--- /dev/null
+++ b/samples/crypto/fips140_lab_util.c
@@ -0,0 +1,638 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2021 Google LLC
+ *
+ * This program provides commands that dump certain types of output from the
+ * fips140 kernel module, as required by the FIPS lab for evaluation purposes.
+ *
+ * While the fips140 kernel module can only be accessed directly by other kernel
+ * code, an easy-to-use userspace utility program was desired for lab testing.
+ * When possible, this program uses AF_ALG to access the crypto algorithms; this
+ * requires that the kernel has AF_ALG enabled.  Where AF_ALG isn't sufficient,
+ * a custom device node /dev/fips140 is used instead; this requires that the
+ * fips140 module is loaded and has evaluation testing support compiled in.
+ *
+ * This program can be compiled and run on an Android device as follows:
+ *
+ *	NDK_DIR=$HOME/android-ndk-r23b  # adjust directory path as needed
+ *	$NDK_DIR/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android31-clang \
+ *		fips140_lab_util.c -O2 -Wall -o fips140_lab_util
+ *	adb push fips140_lab_util /data/local/tmp/
+ *	adb root
+ *	adb shell /data/local/tmp/fips140_lab_util
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <limits.h>
+#include <linux/if_alg.h>
+#include <stdarg.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/sysmacros.h>
+#include <unistd.h>
+
+#include "../../crypto/fips140-eval-testing-uapi.h"
+
+/* ---------------------------------------------------------------------------
+ *			       Utility functions
+ * ---------------------------------------------------------------------------*/
+
+#define ARRAY_SIZE(A)	(sizeof(A) / sizeof((A)[0]))
+#define MIN(a, b)	((a) < (b) ? (a) : (b))
+#define MAX(a, b)	((a) > (b) ? (a) : (b))
+
+static void __attribute__((noreturn))
+do_die(const char *format, va_list va, int err)
+{
+	fputs("ERROR: ", stderr);
+	vfprintf(stderr, format, va);
+	if (err)
+		fprintf(stderr, ": %s", strerror(err));
+	putc('\n', stderr);
+	exit(1);
+}
+
+static void __attribute__((noreturn, format(printf, 1, 2)))
+die_errno(const char *format, ...)
+{
+	va_list va;
+
+	va_start(va, format);
+	do_die(format, va, errno);
+	va_end(va);
+}
+
+static void __attribute__((noreturn, format(printf, 1, 2)))
+die(const char *format, ...)
+{
+	va_list va;
+
+	va_start(va, format);
+	do_die(format, va, 0);
+	va_end(va);
+}
+
+static void __attribute__((noreturn))
+assertion_failed(const char *expr, const char *file, int line)
+{
+	die("Assertion failed: %s at %s:%d", expr, file, line);
+}
+
+#define ASSERT(e) ({ if (!(e)) assertion_failed(#e, __FILE__, __LINE__); })
+
+static void rand_bytes(uint8_t *bytes, size_t count)
+{
+	size_t i;
+
+	for (i = 0; i < count; i++)
+		bytes[i] = rand();
+}
+
+static const char *booltostr(bool b)
+{
+	return b ? "true" : "false";
+}
+
+static const char *bytes_to_hex(const uint8_t *bytes, size_t count)
+{
+	static char hex[1025];
+	size_t i;
+
+	ASSERT(count <= 512);
+	for (i = 0; i < count; i++)
+		sprintf(&hex[2*i], "%02x", bytes[i]);
+	return hex;
+}
+
+static void full_write(int fd, const void *buf, size_t count)
+{
+	while (count) {
+		ssize_t ret = write(fd, buf, count);
+
+		if (ret < 0)
+			die_errno("write failed");
+		buf += ret;
+		count -= ret;
+	}
+}
+
+enum {
+	OPT_AMOUNT,
+	OPT_ITERATIONS,
+};
+
+static void usage(void);
+
+/* ---------------------------------------------------------------------------
+ *			      /dev/fips140 ioctls
+ * ---------------------------------------------------------------------------*/
+
+static int get_fips140_device_number(void)
+{
+	FILE *f;
+	char line[128];
+	int number;
+	char name[32];
+
+	f = fopen("/proc/devices", "r");
+	if (!f)
+		die_errno("Failed to open /proc/devices");
+	while (fgets(line, sizeof(line), f)) {
+		if (sscanf(line, "%d %31s", &number, name) == 2 &&
+		    strcmp(name, "fips140") == 0) {
+			fclose(f);
+			return number;
+		}
+	}
+	fclose(f);
+	die("fips140 device node is unavailable.\n"
+"The fips140 device node is only available when the fips140 module is loaded\n"
+"and has been built with evaluation testing support.");
+}
+
+static void create_fips140_node_if_needed(void)
+{
+	struct stat stbuf;
+	int major;
+
+	if (stat("/dev/fips140", &stbuf) == 0)
+		return;
+
+	major = get_fips140_device_number();
+	if (mknod("/dev/fips140", S_IFCHR | 0600, makedev(major, 1)) != 0)
+		die_errno("Failed to create fips140 device node");
+}
+
+static int fips140_dev_fd = -1;
+
+static int fips140_ioctl(int cmd, const void *arg)
+{
+	if (fips140_dev_fd < 0) {
+		create_fips140_node_if_needed();
+		fips140_dev_fd = open("/dev/fips140", O_RDONLY);
+		if (fips140_dev_fd < 0)
+			die_errno("Failed to open /dev/fips140");
+	}
+	return ioctl(fips140_dev_fd, cmd, arg);
+}
+
+static bool fips140_is_approved_service(const char *name)
+{
+	int ret = fips140_ioctl(FIPS140_IOCTL_IS_APPROVED_SERVICE, name);
+
+	if (ret < 0)
+		die_errno("FIPS140_IOCTL_IS_APPROVED_SERVICE unexpectedly failed");
+	if (ret == 1)
+		return true;
+	if (ret == 0)
+		return false;
+	die("FIPS140_IOCTL_IS_APPROVED_SERVICE returned unexpected value %d",
+	    ret);
+}
+
+static const char *fips140_module_version(void)
+{
+	static char buf[256];
+	int ret;
+
+	memset(buf, 0, sizeof(buf));
+	ret = fips140_ioctl(FIPS140_IOCTL_MODULE_VERSION, buf);
+	if (ret < 0)
+		die_errno("FIPS140_IOCTL_MODULE_VERSION unexpectedly failed");
+	if (ret != 0)
+		die("FIPS140_IOCTL_MODULE_VERSION returned unexpected value %d",
+		    ret);
+	return buf;
+}
+
+/* ---------------------------------------------------------------------------
+ *				AF_ALG utilities
+ * ---------------------------------------------------------------------------*/
+
+#define AF_ALG_MAX_RNG_REQUEST_SIZE	128
+
+static int get_alg_fd(const char *alg_type, const char *alg_name)
+{
+	struct sockaddr_alg addr = {};
+	int alg_fd;
+
+	alg_fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
+	if (alg_fd < 0)
+		die("Failed to create AF_ALG socket.\n"
+"AF_ALG is only available when it has been enabled in the kernel.\n");
+
+	strncpy((char *)addr.salg_type, alg_type, sizeof(addr.salg_type) - 1);
+	strncpy((char *)addr.salg_name, alg_name, sizeof(addr.salg_name) - 1);
+
+	if (bind(alg_fd, (void *)&addr, sizeof(addr)) != 0)
+		die_errno("Failed to bind AF_ALG socket to %s %s",
+			  alg_type, alg_name);
+	return alg_fd;
+}
+
+static int get_req_fd(int alg_fd, const char *alg_name)
+{
+	int req_fd = accept(alg_fd, NULL, NULL);
+
+	if (req_fd < 0)
+		die_errno("Failed to get request file descriptor for %s",
+			  alg_name);
+	return req_fd;
+}
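+
+/*
+ * get_alg_fd() and get_req_fd() implement the standard AF_ALG handshake:
+ * bind a transform name on one socket, then accept() a request socket on
+ * which the actual operations are performed.  Outside this utility, the
+ * same flow in minimal form (error handling omitted) looks like:
+ *
+ *	struct sockaddr_alg addr = {
+ *		.salg_family = AF_ALG,
+ *		.salg_type = "hash",
+ *		.salg_name = "sha256",
+ *	};
+ *	unsigned char digest[32];
+ *	int alg_fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
+ *	int req_fd;
+ *
+ *	bind(alg_fd, (struct sockaddr *)&addr, sizeof(addr));
+ *	req_fd = accept(alg_fd, NULL, NULL);
+ *	write(req_fd, "abc", 3);		// message to hash
+ *	read(req_fd, digest, sizeof(digest));	// digest comes back
+ */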
+
+/* ---------------------------------------------------------------------------
+ *			   dump_jitterentropy command
+ * ---------------------------------------------------------------------------*/
+
+static void dump_from_jent_fd(int fd, size_t count)
+{
+	uint8_t buf[AF_ALG_MAX_RNG_REQUEST_SIZE];
+
+	while (count) {
+		ssize_t ret;
+
+		memset(buf, 0, sizeof(buf));
+		ret = read(fd, buf, MIN(count, sizeof(buf)));
+		if (ret < 0)
+			die_errno("error reading from jitterentropy_rng");
+		full_write(STDOUT_FILENO, buf, ret);
+		count -= ret;
+	}
+}
+
+static int cmd_dump_jitterentropy(int argc, char *argv[])
+{
+	static const struct option longopts[] = {
+		{ "amount", required_argument, NULL, OPT_AMOUNT },
+		{ "iterations", required_argument, NULL, OPT_ITERATIONS },
+		{ NULL, 0, NULL, 0 },
+	};
+	size_t amount = 128;
+	size_t iterations = 1;
+	size_t i;
+	int c;
+
+	while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
+		switch (c) {
+		case OPT_AMOUNT:
+			amount = strtoul(optarg, NULL, 0);
+			if (amount <= 0 || amount >= ULONG_MAX)
+				die("invalid argument to --amount");
+			break;
+		case OPT_ITERATIONS:
+			iterations = strtoul(optarg, NULL, 0);
+			if (iterations <= 0 || iterations >= ULONG_MAX)
+				die("invalid argument to --iterations");
+			break;
+		default:
+			usage();
+			return 1;
+		}
+	}
+
+	for (i = 0; i < iterations; i++) {
+		int alg_fd = get_alg_fd("rng", "jitterentropy_rng");
+		int req_fd = get_req_fd(alg_fd, "jitterentropy_rng");
+
+		dump_from_jent_fd(req_fd, amount);
+
+		close(req_fd);
+		close(alg_fd);
+	}
+	return 0;
+}
+
+/* ---------------------------------------------------------------------------
+ *			  show_invalid_inputs command
+ * ---------------------------------------------------------------------------*/
+
+enum direction {
+	UNSPECIFIED,
+	DECRYPT,
+	ENCRYPT,
+};
+
+static const struct invalid_input_test {
+	const char *alg_type;
+	const char *alg_name;
+	const char *key;
+	size_t key_size;
+	const char *msg;
+	size_t msg_size;
+	const char *iv;
+	size_t iv_size;
+	enum direction direction;
+	int setkey_error;
+	int crypt_error;
+} invalid_input_tests[] = {
+	{
+		.alg_type = "skcipher",
+		.alg_name = "cbc(aes)",
+		.key_size = 16,
+	}, {
+		.alg_type = "skcipher",
+		.alg_name = "cbc(aes)",
+		.key_size = 17,
+		.setkey_error = EINVAL,
+	}, {
+		.alg_type = "skcipher",
+		.alg_name = "cbc(aes)",
+		.key_size = 24,
+	}, {
+		.alg_type = "skcipher",
+		.alg_name = "cbc(aes)",
+		.key_size = 32,
+	}, {
+		.alg_type = "skcipher",
+		.alg_name = "cbc(aes)",
+		.key_size = 33,
+		.setkey_error = EINVAL,
+	}, {
+		.alg_type = "skcipher",
+		.alg_name = "cbc(aes)",
+		.key_size = 16,
+		.msg_size = 1,
+		.direction = DECRYPT,
+		.crypt_error = EINVAL,
+	}, {
+		.alg_type = "skcipher",
+		.alg_name = "cbc(aes)",
+		.key_size = 16,
+		.msg_size = 16,
+		.direction = ENCRYPT,
+	}, {
+		.alg_type = "skcipher",
+		.alg_name = "cbc(aes)",
+		.key_size = 16,
+		.msg_size = 17,
+		.direction = ENCRYPT,
+		.crypt_error = EINVAL,
+	}, {
+		.alg_type = "hash",
+		.alg_name = "cmac(aes)",
+		.key_size = 29,
+		.setkey_error = EINVAL,
+	}, {
+		.alg_type = "skcipher",
+		.alg_name = "xts(aes)",
+		.key_size = 32,
+	}, {
+		.alg_type = "skcipher",
+		.alg_name = "xts(aes)",
+		.key = "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
+		       "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
+		.key_size = 32,
+		.setkey_error = EINVAL,
+	}
+};
+
+static const char *describe_crypt_op(const struct invalid_input_test *t)
+{
+	if (t->direction == ENCRYPT)
+		return "encryption";
+	if (t->direction == DECRYPT)
+		return "decryption";
+	if (strcmp(t->alg_type, "hash") == 0)
+		return "hashing";
+	ASSERT(0);
+}
+
+static bool af_alg_setkey(const struct invalid_input_test *t, int alg_fd)
+{
+	const uint8_t *key = (const uint8_t *)t->key;
+	uint8_t _key[t->key_size];
+
+	if (t->key_size == 0)
+		return true;
+
+	if (t->key == NULL) {
+		rand_bytes(_key, t->key_size);
+		key = _key;
+	}
+	if (setsockopt(alg_fd, SOL_ALG, ALG_SET_KEY, key, t->key_size) != 0) {
+		printf("%s: setting %zu-byte key failed with error '%s'\n",
+		       t->alg_name, t->key_size, strerror(errno));
+		printf("\tkey was %s\n\n", bytes_to_hex(key, t->key_size));
+		ASSERT(t->setkey_error == errno);
+		return false;
+	}
+	printf("%s: setting %zu-byte key succeeded\n",
+	       t->alg_name, t->key_size);
+	printf("\tkey was %s\n\n", bytes_to_hex(key, t->key_size));
+	ASSERT(t->setkey_error == 0);
+	return true;
+}
+
+static void af_alg_process_msg(const struct invalid_input_test *t, int alg_fd)
+{
+	struct iovec iov;
+	struct msghdr hdr = {
+		.msg_iov = &iov,
+		.msg_iovlen = 1,
+	};
+	const uint8_t *msg = (const uint8_t *)t->msg;
+	uint8_t *_msg = NULL;
+	uint8_t *output = NULL;
+	uint8_t *control = NULL;
+	size_t controllen = 0;
+	struct cmsghdr *cmsg;
+	int req_fd;
+
+	if (t->msg_size == 0)
+		return;
+
+	req_fd = get_req_fd(alg_fd, t->alg_name);
+
+	if (t->msg == NULL) {
+		_msg = malloc(t->msg_size);
+		rand_bytes(_msg, t->msg_size);
+		msg = _msg;
+	}
+	output = malloc(t->msg_size);
+	iov.iov_base = (void *)msg;
+	iov.iov_len = t->msg_size;
+
+	if (t->direction != UNSPECIFIED)
+		controllen += CMSG_SPACE(sizeof(uint32_t));
+	if (t->iv_size)
+		controllen += CMSG_SPACE(sizeof(struct af_alg_iv) + t->iv_size);
+	control = calloc(1, controllen);
+	hdr.msg_control = control;
+	hdr.msg_controllen = controllen;
+	cmsg = CMSG_FIRSTHDR(&hdr);
+	if (t->direction != UNSPECIFIED) {
+		cmsg->cmsg_level = SOL_ALG;
+		cmsg->cmsg_type = ALG_SET_OP;
+		cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t));
+		*(uint32_t *)CMSG_DATA(cmsg) = t->direction == DECRYPT ?
+				ALG_OP_DECRYPT : ALG_OP_ENCRYPT;
+		cmsg = CMSG_NXTHDR(&hdr, cmsg);
+	}
+	if (t->iv_size) {
+		struct af_alg_iv *alg_iv;
+
+		cmsg->cmsg_level = SOL_ALG;
+		cmsg->cmsg_type = ALG_SET_IV;
+		cmsg->cmsg_len = CMSG_LEN(sizeof(*alg_iv) + t->iv_size);
+		alg_iv = (struct af_alg_iv *)CMSG_DATA(cmsg);
+		alg_iv->ivlen = t->iv_size;
+		memcpy(alg_iv->iv, t->iv, t->iv_size);
+	}
+
+	if (sendmsg(req_fd, &hdr, 0) != t->msg_size)
+		die_errno("sendmsg failed");
+
+	if (read(req_fd, output, t->msg_size) != t->msg_size) {
+		printf("%s: %s of %zu-byte message failed with error '%s'\n",
+		       t->alg_name, describe_crypt_op(t), t->msg_size,
+		       strerror(errno));
+		printf("\tmessage was %s\n\n", bytes_to_hex(msg, t->msg_size));
+		ASSERT(t->crypt_error == errno);
+	} else {
+		printf("%s: %s of %zu-byte message succeeded\n",
+		       t->alg_name, describe_crypt_op(t), t->msg_size);
+		printf("\tmessage was %s\n\n", bytes_to_hex(msg, t->msg_size));
+		ASSERT(t->crypt_error == 0);
+	}
+	free(_msg);
+	free(output);
+	free(control);
+	close(req_fd);
+}
+
+static void test_invalid_input(const struct invalid_input_test *t)
+{
+	int alg_fd = get_alg_fd(t->alg_type, t->alg_name);
+
+	if (af_alg_setkey(t, alg_fd))
+		af_alg_process_msg(t, alg_fd);
+
+	close(alg_fd);
+}
+
+static int cmd_show_invalid_inputs(int argc, char *argv[])
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(invalid_input_tests); i++)
+		test_invalid_input(&invalid_input_tests[i]);
+	return 0;
+}
+
+/* ---------------------------------------------------------------------------
+ *			  show_module_version command
+ * ---------------------------------------------------------------------------*/
+
+static int cmd_show_module_version(int argc, char *argv[])
+{
+	printf("fips140_module_version() => \"%s\"\n",
+	       fips140_module_version());
+	return 0;
+}
+
+/* ---------------------------------------------------------------------------
+ *			show_service_indicators command
+ * ---------------------------------------------------------------------------*/
+
+static const char * const default_services_to_show[] = {
+	"aes",
+	"cbc(aes)",
+	"cbcmac(aes)",
+	"cmac(aes)",
+	"ctr(aes)",
+	"cts(cbc(aes))",
+	"ecb(aes)",
+	"essiv(cbc(aes),sha256)",
+	"gcm(aes)",
+	"hmac(sha1)",
+	"hmac(sha224)",
+	"hmac(sha256)",
+	"hmac(sha384)",
+	"hmac(sha512)",
+	"jitterentropy_rng",
+	"sha1",
+	"sha224",
+	"sha256",
+	"sha384",
+	"sha512",
+	"stdrng",
+	"xcbc(aes)",
+	"xts(aes)",
+};
+
+static int cmd_show_service_indicators(int argc, char *argv[])
+{
+	const char * const *services = default_services_to_show;
+	int count = ARRAY_SIZE(default_services_to_show);
+	int i;
+
+	if (argc > 1) {
+		services = (const char **)(argv + 1);
+		count = argc - 1;
+	}
+	for (i = 0; i < count; i++) {
+		printf("fips140_is_approved_service(\"%s\") => %s\n",
+		       services[i],
+		       booltostr(fips140_is_approved_service(services[i])));
+	}
+	return 0;
+}
+
+/* ---------------------------------------------------------------------------
+ *				     main()
+ * ---------------------------------------------------------------------------*/
+
+static const struct command {
+	const char *name;
+	int (*func)(int argc, char *argv[]);
+} commands[] = {
+	{ "dump_jitterentropy", cmd_dump_jitterentropy },
+	{ "show_invalid_inputs", cmd_show_invalid_inputs },
+	{ "show_module_version", cmd_show_module_version },
+	{ "show_service_indicators", cmd_show_service_indicators },
+};
+
+static void usage(void)
+{
+	fprintf(stderr,
+"Usage:\n"
+"       fips140_lab_util dump_jitterentropy [OPTION]...\n"
+"       fips140_lab_util show_invalid_inputs\n"
+"       fips140_lab_util show_module_version\n"
+"       fips140_lab_util show_service_indicators [SERVICE]...\n"
+"\n"
+"Options for dump_jitterentropy:\n"
+"  --amount=AMOUNT      Amount to dump in bytes per iteration (default 128)\n"
+"  --iterations=COUNT   Number of start-up iterations (default 1)\n"
+	);
+}
+
+int main(int argc, char *argv[])
+{
+	int i;
+
+	if (argc < 2) {
+		usage();
+		return 2;
+	}
+	for (i = 1; i < argc; i++) {
+		if (strcmp(argv[i], "--help") == 0) {
+			usage();
+			return 2;
+		}
+	}
+
+	for (i = 0; i < ARRAY_SIZE(commands); i++) {
+		if (strcmp(commands[i].name, argv[1]) == 0)
+			return commands[i].func(argc - 1, argv + 1);
+	}
+	fprintf(stderr, "Unknown command: %s\n\n", argv[1]);
+	usage();
+	return 2;
+}
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 3aa384c..2e20d34 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -242,7 +242,9 @@
 
 ld_flags       = $(KBUILD_LDFLAGS) $(ldflags-y) $(LDFLAGS_$(@F))
 
-DTC_INCLUDE    := $(srctree)/scripts/dtc/include-prefixes
+# ANDROID: Allow DTC_INCLUDE to be set by the BUILD_CONFIG. This allows one to
+# compile an out-of-tree device tree.
+DTC_INCLUDE    += $(srctree)/scripts/dtc/include-prefixes
 
 dtc_cpp_flags  = -Wp,-MMD,$(depfile).pre.tmp -nostdinc                    \
 		 $(addprefix -I,$(DTC_INCLUDE))                          \
@@ -455,10 +457,10 @@
       cmd_lzo_with_size = { cat $(real-prereqs) | $(KLZOP) -9; $(size_append); } > $@
 
 quiet_cmd_lz4 = LZ4     $@
-      cmd_lz4 = cat $(real-prereqs) | $(LZ4) -l -c1 stdin stdout > $@
+      cmd_lz4 = cat $(real-prereqs) | $(LZ4) -l -12 --favor-decSpeed stdin stdout > $@
 
 quiet_cmd_lz4_with_size = LZ4     $@
-      cmd_lz4_with_size = { cat $(real-prereqs) | $(LZ4) -l -c1 stdin stdout; \
+      cmd_lz4_with_size = { cat $(real-prereqs) | $(LZ4) -l -12 --favor-decSpeed stdin stdout; \
                   $(size_append); } > $@
 
 # U-Boot mkimage
diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index 25bedd8..440f0b28 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -12,6 +12,8 @@
 # for c_flags
 include $(srctree)/scripts/Makefile.lib
 
+mixed-build-prefix = $(if $(KBUILD_MIXED_TREE),$(KBUILD_MIXED_TREE)/)
+
 # find all modules listed in modules.order
 modules := $(sort $(shell cat $(MODORDER)))
 
@@ -39,13 +41,13 @@
 
 quiet_cmd_btf_ko = BTF [M] $@
       cmd_btf_ko = 							\
-	if [ ! -f vmlinux ]; then					\
-		printf "Skipping BTF generation for %s due to unavailability of vmlinux\n" $@ 1>&2; \
+	if [ ! -f $(mixed-build-prefix)vmlinux ]; then					\
+		printf "Skipping BTF generation for %s due to unavailability of $(mixed-build-prefix)vmlinux\n" $@ 1>&2; \
 	elif [ -n "$(CONFIG_RUST)" ] && $(srctree)/scripts/is_rust_module.sh $@; then 		\
 		printf "Skipping BTF generation for %s because it's a Rust module\n" $@ 1>&2; \
 	else								\
-		LLVM_OBJCOPY="$(OBJCOPY)" $(PAHOLE) -J $(PAHOLE_FLAGS) --btf_base vmlinux $@; \
-		$(RESOLVE_BTFIDS) -b vmlinux $@; 			\
+		LLVM_OBJCOPY="$(OBJCOPY)" $(PAHOLE) -J $(PAHOLE_FLAGS) --btf_base $(mixed-build-prefix)vmlinux $@; \
+		$(RESOLVE_BTFIDS) -b $(mixed-build-prefix)vmlinux $@; 			\
 	fi;
 
 # Same as newer-prereqs, but allows to exclude specified extra dependencies
@@ -57,8 +59,8 @@
 	printf '%s\n' 'cmd_$@ := $(make-cmd)' > $(dot-target).cmd, @:)
 
 # Re-generate module BTFs if either module's .ko or vmlinux changed
-$(modules): %.ko: %.o %.mod.o scripts/module.lds $(and $(CONFIG_DEBUG_INFO_BTF_MODULES),$(KBUILD_BUILTIN),vmlinux) FORCE
-	+$(call if_changed_except,ld_ko_o,vmlinux)
+$(modules): %.ko: %.o %.mod.o scripts/module.lds $(and $(CONFIG_DEBUG_INFO_BTF_MODULES),$(KBUILD_BUILTIN),$(mixed-build-prefix)vmlinux) FORCE
+	+$(call if_changed_except,ld_ko_o,$(mixed-build-prefix)vmlinux)
 ifdef CONFIG_DEBUG_INFO_BTF_MODULES
 	+$(if $(newer-prereqs),$(call cmd,btf_ko))
 endif
diff --git a/scripts/Makefile.modinst b/scripts/Makefile.modinst
index 10df89b9..8019c71 100644
--- a/scripts/Makefile.modinst
+++ b/scripts/Makefile.modinst
@@ -27,6 +27,10 @@
 suffix-$(CONFIG_MODULE_COMPRESS_ZSTD)	:= .zst
 
 modules := $(patsubst $(extmod_prefix)%, $(dst)/%$(suffix-y), $(modules))
+ifneq ($(KBUILD_EXTMOD),)
+extmod_suffix := $(shell echo "${KBUILD_EXTMOD}" | md5sum | cut -d " " -f 1)
+modules += $(dst)/modules.order.$(extmod_suffix)
+endif
 
 __modinst: $(modules)
 	@:
@@ -86,6 +90,12 @@
 	$(call cmd,strip)
 	$(call cmd,sign)
 
+ifneq ($(KBUILD_EXTMOD),)
+$(dst)/modules.order.$(extmod_suffix): $(MODORDER) FORCE
+	$(call cmd,install)
+	@sed -i "s:^$(KBUILD_EXTMOD):$(INSTALL_MOD_DIR):g" $@
+endif
+
 else
 
 $(dst)/%.ko: FORCE
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index e41dee6..9aaf186 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -38,6 +38,8 @@
 include include/config/auto.conf
 include $(srctree)/scripts/Kbuild.include
 
+mixed-build-prefix = $(if $(KBUILD_MIXED_TREE),$(KBUILD_MIXED_TREE)/)
+
 modpost-args =										\
 	$(if $(CONFIG_MODVERSIONS),-m)							\
 	$(if $(CONFIG_MODULE_SRCVERSION_ALL),-a)					\
@@ -58,8 +60,10 @@
 
 # This is used to retrieve symbol versions generated by genksyms.
 ifdef CONFIG_MODVERSIONS
+ifndef KBUILD_MIXED_TREE
 vmlinux.symvers Module.symvers: .vmlinux.objs
 endif
+endif
 
 # Ignore libgcc.a
 # Some architectures do '$(CC) --print-libgcc-file-name' to borrow libgcc.a
@@ -74,9 +78,17 @@
 		esac			\
 	done > $@
 
+quiet_cmd_vmlinux_symvers = GEN     $@
+      cmd_vmlinux_symvers = grep "\<vmlinux\s\+EXPORT" $< > $@
+
+quiet_cmd_cat_symvers = GEN     $@
+      cmd_cat_symvers = cat $(real-prereqs) > $@
+
+ifndef KBUILD_MIXED_TREE
 targets += .vmlinux.objs
 .vmlinux.objs: vmlinux.a $(KBUILD_VMLINUX_LIBS) FORCE
 	$(call if_changed,vmlinux_objs)
+endif
 
 vmlinux.o-if-present := $(wildcard vmlinux.o)
 output-symdump := vmlinux.symvers
@@ -84,6 +96,27 @@
 ifdef KBUILD_MODULES
 output-symdump := $(if $(vmlinux.o-if-present), Module.symvers, modules-only.symvers)
 missing-input := $(filter-out $(vmlinux.o-if-present),vmlinux.o)
+
+targets += vmlinux.symvers
+vmlinux.symvers: Module.symvers FORCE
+	$(call if_changed,vmlinux_symvers)
+
+__modpost: $(if $(vmlinux.o-if-present),vmlinux.symvers,)
+endif
+
+# Overwrite values for mixed building (overwriting makes merges easier)
+ifdef KBUILD_MIXED_TREE
+targets += Module.symvers
+Module.symvers: $(mixed-build-prefix)vmlinux.symvers modules-only.symvers FORCE
+	$(call if_changed,cat_symvers)
+
+__modpost: Module.symvers
+
+vmlinux.o-if-present :=
+vmlinux.symvers-if-present := $(wildcard $(mixed-build-prefix)vmlinux.symvers)
+modpost-args += $(addprefix -i ,$(vmlinux.symvers-if-present))
+output-symdump := modules-only.symvers
+missing-input := $(filter-out $(vmlinux.symvers-if-present),$(mixed-build-prefix)vmlinux.symvers)
 endif
 
 else
diff --git a/scripts/depmod.sh b/scripts/depmod.sh
index 3643b4f..2de598d 100755
--- a/scripts/depmod.sh
+++ b/scripts/depmod.sh
@@ -3,14 +3,15 @@
 #
 # A depmod wrapper used by the toplevel Makefile
 
-if test $# -ne 2; then
-	echo "Usage: $0 /sbin/depmod <kernelrelease>" >&2
+if test $# -ne 2 -a $# -ne 3; then
+	echo "Usage: $0 /sbin/depmod <kernelrelease> [System.map folder]" >&2
 	exit 1
 fi
 DEPMOD=$1
 KERNELRELEASE=$2
+KBUILD_MIXED_TREE=$3
 
-if ! test -r System.map ; then
+if ! test -r ${KBUILD_MIXED_TREE}System.map ; then
 	echo "Warning: modules_install: missing 'System.map' file. Skipping depmod." >&2
 	exit 0
 fi
@@ -41,7 +42,7 @@
 	KERNELRELEASE=99.98.$KERNELRELEASE
 fi
 
-set -- -ae -F System.map
+set -- -ae -F ${KBUILD_MIXED_TREE}System.map
 if test -n "$INSTALL_MOD_PATH"; then
 	set -- "$@" -b "$INSTALL_MOD_PATH"
 fi
diff --git a/scripts/gen_gki_modules_headers.sh b/scripts/gen_gki_modules_headers.sh
new file mode 100755
index 0000000..a8fc072
--- /dev/null
+++ b/scripts/gen_gki_modules_headers.sh
@@ -0,0 +1,107 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Copyright 2022 Google LLC
+# Author: ramjiyani@google.com (Ramji Jiyani)
+#
+
+#
+# Generates header file with list of unprotected symbols
+#
+# Called By: KERNEL_SRC/kernel/Makefile if CONFIG_MODULE_SIG_PROTECT=y
+#
+# gki_module_unprotected.h: Symbols allowed to be _accessed_ by unsigned modules
+#
+# If no valid symbol file exists, still generate valid C header files so that
+# compilation can proceed with no symbols to protect
+#
+
+# Collect arguments from Makefile
+TARGET=$1
+SRCTREE=$2
+SYMBOL_LIST=$3
+
+set -e
+
+#
+# Common Definitions
+#
+# Use "make V=1" to debug this script.
+case "$KBUILD_VERBOSE" in
+*1*)
+	set -x
+	;;
+esac
+
+#
+# generate_header():
+# Args: $1 = Name of the header file
+#       $2 = Input symbol list
+#       $3 = Symbol type ("unprotected")
+#
+generate_header() {
+	local header_file=$1
+	local symbol_file=$2
+	local symbol_type=$3
+
+	if [ -f "${header_file}" ]; then
+		rm -f -- "${header_file}"
+	fi
+
+	# If symbol_file exists, preprocess it and find the maximum name length
+	if [  -s "${symbol_file}" ]; then
+		# Remove whitespace, empty lines, and symbol-list markers, if any
+		sed -i '/^[[:space:]]*$/d; /^#/d; /\[abi_symbol_list\]/d' "${symbol_file}"
+
+		# Sort in byte order for kernel binary search at runtime
+		LC_ALL=C sort -u -o "${symbol_file}" "${symbol_file}"
+
+		# Trim whitespace; +1 for NUL termination
+		local max_name_len=$(awk '
+				{
+					$1=$1;
+					if ( length > L ) {
+						L=length
+					}
+				} END { print ++L }' "${symbol_file}")
+	else
+		# Set to 1 to generate valid C header file
+		local max_name_len=1
+	fi
+
+	# Header generation
+	cat > "${header_file}" <<- EOT
+	/*
+	 * DO NOT EDIT
+	 *
+	 * Build generated header file with ${symbol_type}
+	 */
+
+	#define NR_$(printf ${symbol_type} | tr [:lower:] [:upper:])_SYMBOLS \\
+	$(printf '\t')(ARRAY_SIZE(gki_${symbol_type}_symbols))
+	#define MAX_$(printf ${symbol_type} | tr [:lower:] [:upper:])_NAME_LEN (${max_name_len})
+
+	static const char gki_${symbol_type}_symbols[][MAX_$(printf ${symbol_type} |
+							tr [:lower:] [:upper:])_NAME_LEN] = {
+	EOT
+
+	# If a valid symbol_file is present, quote each symbol and add it to the array
+	if [  -s "${symbol_file}" ]; then
+		sed -e 's/^[ \t]*/\t"/;s/[ \t]*$/",/' "${symbol_file}" >> "${header_file}"
+	fi
+
+	# Terminate the file
+	echo "};" >> "${header_file}"
+}
+
+if [ "$(basename "${TARGET}")" = "gki_module_unprotected.h" ]; then
+	# Union of vendor symbol lists
+	GKI_VENDOR_SYMBOLS="${SYMBOL_LIST}"
+	generate_header "${TARGET}" "${GKI_VENDOR_SYMBOLS}" "unprotected"
+else
+	# Sorted list of exported symbols
+	GKI_EXPORTED_SYMBOLS="${SYMBOL_LIST}"
+
+	generate_header "${TARGET}" "${GKI_EXPORTED_SYMBOLS}" "protected_exports"
+fi
+
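
To make the output concrete: given a symbol list containing two
hypothetical names, the unprotected-header branch would emit roughly the
following (the name-length macro comes from the awk computation above,
i.e. the longest name plus one byte for NUL termination):

	/*
	 * DO NOT EDIT
	 *
	 * Build generated header file with unprotected
	 */

	#define NR_UNPROTECTED_SYMBOLS \
		(ARRAY_SIZE(gki_unprotected_symbols))
	#define MAX_UNPROTECTED_NAME_LEN (17)

	static const char gki_unprotected_symbols[][MAX_UNPROTECTED_NAME_LEN] = {
		"example_symbol_a",
		"example_symbol_b",
	};
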
diff --git a/scripts/module.lds.S b/scripts/module.lds.S
index da4bddd..023c498 100644
--- a/scripts/module.lds.S
+++ b/scripts/module.lds.S
@@ -27,7 +27,36 @@
 	__kcfi_traps 		: { KEEP(*(.kcfi_traps)) }
 #endif
 
-#ifdef CONFIG_LTO_CLANG
+#if IS_ENABLED(CONFIG_CRYPTO_FIPS140_MOD)
+	/*
+	 * The FIPS140 module incorporates copies of builtin code, which gets
+	 * integrity checked at module load time, and registered in a way that
+	 * ensures that the integrity checked versions supersede the builtin
+	 * ones.  These objects are compiled as builtin code, and so their init
+	 * hooks will be exported from the binary in the same way as builtin
+	 * initcalls are, i.e., annotated with a level that defines the order
+	 * in which the hooks are expected to be invoked.
+	 */
+#define INIT_CALLS_LEVEL(level)						\
+		KEEP(*(.initcall##level##.init*))			\
+		KEEP(*(.initcall##level##s.init*))
+
+	.initcalls : {
+		*(.initcalls._start)
+		INIT_CALLS_LEVEL(0)
+		INIT_CALLS_LEVEL(1)
+		INIT_CALLS_LEVEL(2)
+		INIT_CALLS_LEVEL(3)
+		INIT_CALLS_LEVEL(4)
+		INIT_CALLS_LEVEL(5)
+		INIT_CALLS_LEVEL(rootfs)
+		INIT_CALLS_LEVEL(6)
+		INIT_CALLS_LEVEL(7)
+		*(.initcalls._end)
+	}
+#endif
+
+#if defined(CONFIG_LTO_CLANG) || IS_ENABLED(CONFIG_CRYPTO_FIPS140_MOD)
 	/*
 	 * With CONFIG_LTO_CLANG, LLD always enables -fdata-sections and
 	 * -ffunction-sections, which increases the size of the final module.
@@ -44,8 +73,17 @@
 	}
 
 	.rodata : {
+		*(.rodata.._start)
 		*(.rodata .rodata.[0-9a-zA-Z_]*)
 		*(.rodata..L*)
+		*(.rodata.._end)
+	}
+
+	.text : {
+		*(.text.._start)
+		*(.text .text.[0-9a-zA-Z_]*)
+		*(.text.._end)
+		*(.text.._fips140_unchecked)
 	}
 #endif
 }
diff --git a/scripts/setlocalversion b/scripts/setlocalversion
index af4754a..1b733ae 100755
--- a/scripts/setlocalversion
+++ b/scripts/setlocalversion
@@ -11,12 +11,14 @@
 #
 
 usage() {
-	echo "Usage: $0 [--save-scmversion] [srctree]" >&2
+	echo "Usage: $0 [--save-scmversion] [srctree] [branch] [kmi-generation]" >&2
 	exit 1
 }
 
 scm_only=false
 srctree=.
+android_release=
+kmi_generation=
 if test "$1" = "--save-scmversion"; then
 	scm_only=true
 	shift
@@ -25,6 +27,22 @@
 	srctree=$1
 	shift
 fi
+if test $# -gt 0; then
+	# Extract the Android release version. If there is no match, then return 255
+	# and clear the var $android_release
+	android_release=`echo "$1" | sed -e '/android[0-9]\{2,\}/!{q255}; \
+		s/^\(android[0-9]\{2,\}\)-.*/\1/'`
+	if test $? -ne 0; then
+		android_release=
+	fi
+	shift
+
+	if test $# -gt 0; then
+		kmi_generation=$1
+		[ $(expr $kmi_generation : '^[0-9]\+$') -eq 0 ] && usage
+		shift
+	fi
+fi
 if test $# -gt 0 -o ! -d "$srctree"; then
 	usage
 fi
@@ -44,8 +62,13 @@
 	fi
 
 	# Check for git and a git repo.
-	if test -z "$(git rev-parse --show-cdup 2>/dev/null)" &&
-	   head=$(git rev-parse --verify HEAD 2>/dev/null); then
+	if head=$(git rev-parse --verify HEAD 2>/dev/null); then
+
+		if [ -n "$android_release" ] && [ -n "$kmi_generation" ]; then
+			printf '%s' "-$android_release-$kmi_generation"
+		elif [ -n "$android_release" ]; then
+			printf '%s' "-$android_release"
+		fi
 
 		# If we are at a tagged commit (like "v2.6.30-rc6"), we ignore
 		# it, because this version is defined in the top level Makefile.
@@ -142,4 +165,9 @@
 	res="$res${scm:++}"
 fi
 
+# finally, add the abXXX number if BUILD_NUMBER is set
+if test -n "${BUILD_NUMBER}"; then
+	res="$res-ab${BUILD_NUMBER}"
+fi
+
 echo "$res"
diff --git a/scripts/sign-file.c b/scripts/sign-file.c
index 598ef54..cadbc26 100644
--- a/scripts/sign-file.c
+++ b/scripts/sign-file.c
@@ -99,6 +99,7 @@ static void display_openssl_errors(int l)
 	}
 }
 
+#ifndef OPENSSL_NO_ENGINE
 static void drain_openssl_errors(void)
 {
 	const char *file;
@@ -108,6 +109,7 @@ static void drain_openssl_errors(void)
 		return;
 	while (ERR_get_error_line(&file, &line)) {}
 }
+#endif
 
 #define ERR(cond, fmt, ...)				\
 	do {						\
@@ -142,7 +144,9 @@ static int pem_pw_cb(char *buf, int len, int w, void *v)
 static EVP_PKEY *read_private_key(const char *private_key_name)
 {
 	EVP_PKEY *private_key;
+	BIO *b;
 
+#ifndef OPENSSL_NO_ENGINE
 	if (!strncmp(private_key_name, "pkcs11:", 7)) {
 		ENGINE *e;
 
@@ -160,17 +164,16 @@ static EVP_PKEY *read_private_key(const char *private_key_name)
 		private_key = ENGINE_load_private_key(e, private_key_name,
 						      NULL, NULL);
 		ERR(!private_key, "%s", private_key_name);
-	} else {
-		BIO *b;
-
-		b = BIO_new_file(private_key_name, "rb");
-		ERR(!b, "%s", private_key_name);
-		private_key = PEM_read_bio_PrivateKey(b, NULL, pem_pw_cb,
-						      NULL);
-		ERR(!private_key, "%s", private_key_name);
-		BIO_free(b);
+		return private_key;
 	}
+#endif
 
+	b = BIO_new_file(private_key_name, "rb");
+	ERR(!b, "%s", private_key_name);
+	private_key = PEM_read_bio_PrivateKey(b, NULL, pem_pw_cb,
+					      NULL);
+	ERR(!private_key, "%s", private_key_name);
+	BIO_free(b);
 	return private_key;
 }
 
diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 9a43af0..e897b1c 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -44,6 +44,9 @@
 #define avc_cache_stats_incr(field)	do {} while (0)
 #endif
 
+#undef CREATE_TRACE_POINTS
+#include <trace/hooks/avc.h>
+
 struct avc_entry {
 	u32			ssid;
 	u32			tsid;
@@ -441,6 +444,7 @@ static void avc_node_free(struct rcu_head *rhead)
 
 static void avc_node_delete(struct selinux_avc *avc, struct avc_node *node)
 {
+	trace_android_rvh_selinux_avc_node_delete(node);
 	hlist_del_rcu(&node->list);
 	call_rcu(&node->rhead, avc_node_free);
 	atomic_dec(&avc->avc_cache.active_nodes);
@@ -457,6 +461,7 @@ static void avc_node_kill(struct selinux_avc *avc, struct avc_node *node)
 static void avc_node_replace(struct selinux_avc *avc,
 			     struct avc_node *new, struct avc_node *old)
 {
+	trace_android_rvh_selinux_avc_node_replace(old, new);
 	hlist_replace_rcu(&old->list, &new->list);
 	call_rcu(&old->rhead, avc_node_free);
 	atomic_dec(&avc->avc_cache.active_nodes);
@@ -566,8 +571,10 @@ static struct avc_node *avc_lookup(struct selinux_avc *avc,
 	avc_cache_stats_incr(lookups);
 	node = avc_search_node(avc, ssid, tsid, tclass);
 
-	if (node)
+	if (node) {
+		trace_android_rvh_selinux_avc_lookup(node, ssid, tsid, tclass);
 		return node;
+	}
 
 	avc_cache_stats_incr(misses);
 	return NULL;
@@ -652,6 +659,7 @@ static struct avc_node *avc_insert(struct selinux_avc *avc,
 		}
 	}
 	hlist_add_head_rcu(&node->list, head);
+	trace_android_rvh_selinux_avc_insert(node);
 found:
 	spin_unlock_irqrestore(lock, flag);
 	return node;
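
These restricted vendor hooks are consumed by out-of-tree vendor modules.
A sketch of attaching to the lookup hook, assuming the usual
restricted-vendor-hook registration API generated by the hook declaration
in trace/hooks/avc.h (the handler signature is inferred from the
trace_android_rvh_selinux_avc_lookup() call site above, plus the
tracepoint convention of passing the registration cookie first):

	static void my_avc_lookup(void *unused, struct avc_node *node,
				  u32 ssid, u32 tsid, u16 tclass)
	{
		/* e.g. mirror hot AVC entries into a vendor-side cache */
	}

	static int __init my_hooks_init(void)
	{
		return register_trace_android_rvh_selinux_avc_lookup(
						my_avc_lookup, NULL);
	}
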
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index a3c3807..d743c30 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -117,7 +117,8 @@ const struct security_class_mapping secclass_map[] = {
 	  { COMMON_IPC_PERMS, NULL } },
 	{ "netlink_route_socket",
 	  { COMMON_SOCK_PERMS,
-	    "nlmsg_read", "nlmsg_write", NULL } },
+	    "nlmsg_read", "nlmsg_write", "nlmsg_readpriv", "nlmsg_getneigh",
+	    NULL } },
 	{ "netlink_tcpdiag_socket",
 	  { COMMON_SOCK_PERMS,
 	    "nlmsg_read", "nlmsg_write", NULL } },
diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index 393aff4..ad59c4c0 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -99,6 +99,8 @@ struct selinux_state {
 	bool checkreqprot;
 	bool initialized;
 	bool policycap[__POLICYDB_CAP_MAX];
+	bool android_netlink_route;
+	bool android_netlink_getneigh;
 
 	struct page *status_page;
 	struct mutex status_lock;
@@ -230,6 +232,20 @@ static inline bool selinux_policycap_ioctl_skip_cloexec(void)
 	return READ_ONCE(state->policycap[POLICYDB_CAP_IOCTL_SKIP_CLOEXEC]);
 }
 
+static inline bool selinux_android_nlroute_getlink(void)
+{
+	struct selinux_state *state = &selinux_state;
+
+	return state->android_netlink_route;
+}
+
+static inline bool selinux_android_nlroute_getneigh(void)
+{
+	struct selinux_state *state = &selinux_state;
+
+	return state->android_netlink_getneigh;
+}
+
 struct selinux_policy_convert_data;
 
 struct selinux_load_state {
@@ -463,5 +479,6 @@ extern void avtab_cache_init(void);
 extern void ebitmap_cache_init(void);
 extern void hashtab_cache_init(void);
 extern int security_sidtab_hash_stats(struct selinux_state *state, char *page);
+extern void selinux_nlmsg_init(void);
 
 #endif /* _SELINUX_SECURITY_H_ */
diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index 2ee7b4ed..32200ec 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -25,7 +25,7 @@ struct nlmsg_perm {
 	u32	perm;
 };
 
-static const struct nlmsg_perm nlmsg_route_perms[] = {
+static struct nlmsg_perm nlmsg_route_perms[] = {
 	{ RTM_NEWLINK,		NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_DELLINK,		NETLINK_ROUTE_SOCKET__NLMSG_WRITE },
 	{ RTM_GETLINK,		NETLINK_ROUTE_SOCKET__NLMSG_READ  },
@@ -216,3 +216,43 @@ int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm)
 
 	return err;
 }
+
+static void nlmsg_set_perm_for_type(u32 perm, u16 type)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(nlmsg_route_perms); i++) {
+		if (nlmsg_route_perms[i].nlmsg_type == type) {
+			nlmsg_route_perms[i].perm = perm;
+			break;
+		}
+	}
+}
+
+/*
+ * Use nlmsg_readpriv as the permission for RTM_GETLINK messages if the
+ * netlink_route_getlink policy capability is set. Otherwise use nlmsg_read.
+ * Similarly, use nlmsg_getneigh for RTM_GETNEIGH and RTM_GETNEIGHTBL if the
+ * netlink_route_getneigh policy capability is set. Otherwise use nlmsg_read.
+ */
+void selinux_nlmsg_init(void)
+{
+	if (selinux_android_nlroute_getlink())
+		nlmsg_set_perm_for_type(NETLINK_ROUTE_SOCKET__NLMSG_READPRIV,
+					RTM_GETLINK);
+	else
+		nlmsg_set_perm_for_type(NETLINK_ROUTE_SOCKET__NLMSG_READ,
+					RTM_GETLINK);
+
+	if (selinux_android_nlroute_getneigh()) {
+		nlmsg_set_perm_for_type(NETLINK_ROUTE_SOCKET__NLMSG_GETNEIGH,
+					RTM_GETNEIGH);
+		nlmsg_set_perm_for_type(NETLINK_ROUTE_SOCKET__NLMSG_GETNEIGH,
+					RTM_GETNEIGHTBL);
+	} else {
+		nlmsg_set_perm_for_type(NETLINK_ROUTE_SOCKET__NLMSG_READ,
+					RTM_GETNEIGH);
+		nlmsg_set_perm_for_type(NETLINK_ROUTE_SOCKET__NLMSG_READ,
+					RTM_GETNEIGHTBL);
+	}
+}
diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c
index adcfb63..a36825a 100644
--- a/security/selinux/ss/policydb.c
+++ b/security/selinux/ss/policydb.c
@@ -2485,6 +2485,14 @@ int policydb_read(struct policydb *p, void *fp)
 	p->reject_unknown = !!(le32_to_cpu(buf[1]) & REJECT_UNKNOWN);
 	p->allow_unknown = !!(le32_to_cpu(buf[1]) & ALLOW_UNKNOWN);
 
+	if ((le32_to_cpu(buf[1]) & POLICYDB_CONFIG_ANDROID_NETLINK_ROUTE)) {
+		p->android_netlink_route = 1;
+	}
+
+	if ((le32_to_cpu(buf[1]) & POLICYDB_CONFIG_ANDROID_NETLINK_GETNEIGH)) {
+		p->android_netlink_getneigh = 1;
+	}
+
 	if (p->policyvers >= POLICYDB_VERSION_POLCAP) {
 		rc = ebitmap_read(&p->policycaps, fp);
 		if (rc)
diff --git a/security/selinux/ss/policydb.h b/security/selinux/ss/policydb.h
index ffc4e7b..1d75811 100644
--- a/security/selinux/ss/policydb.h
+++ b/security/selinux/ss/policydb.h
@@ -238,6 +238,8 @@ struct genfs {
 /* The policy database */
 struct policydb {
 	int mls_enabled;
+	int android_netlink_route;
+	int android_netlink_getneigh;
 
 	/* symbol tables */
 	struct symtab symtab[SYM_NUM];
@@ -334,6 +336,8 @@ extern struct role_trans_datum *policydb_roletr_search(
 	struct policydb *p, struct role_trans_key *key);
 
 #define POLICYDB_CONFIG_MLS    1
+#define POLICYDB_CONFIG_ANDROID_NETLINK_ROUTE    (1 << 31)
+#define POLICYDB_CONFIG_ANDROID_NETLINK_GETNEIGH (1 << 30)
 
 /* the config flags related to unknown classes/perms are bits 2 and 3 */
 #define REJECT_UNKNOWN	0x00000002
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index 64a6a37..d52ec72 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -68,6 +68,8 @@
 #include "policycap_names.h"
 #include "ima.h"
 
+#include <trace/hooks/selinux.h>
+
 struct convert_context_args {
 	struct selinux_state *state;
 	struct policydb *oldp;
@@ -2166,6 +2168,10 @@ static void security_load_policycaps(struct selinux_state *state,
 			pr_info("SELinux:  unknown policy capability %u\n",
 				i);
 	}
+
+	state->android_netlink_route = p->android_netlink_route;
+	state->android_netlink_getneigh = p->android_netlink_getneigh;
+	selinux_nlmsg_init();
 }
 
 static int security_preserve_bools(struct selinux_policy *oldpolicy,
@@ -2259,6 +2265,7 @@ void selinux_policy_commit(struct selinux_state *state,
 		 */
 		selinux_mark_initialized(state);
 		selinux_complete_init();
+		trace_android_rvh_selinux_is_initialized(state);
 	}
 
 	/* Free the old policy */
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index a409fbe..49e0fa73 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -3225,6 +3225,39 @@ int snd_soc_get_dai_id(struct device_node *ep)
 }
 EXPORT_SYMBOL_GPL(snd_soc_get_dai_id);
 
+/**
+ * snd_soc_info_multi_ext - external single mixer info callback
+ * @kcontrol: mixer control
+ * @uinfo: control element information
+ *
+ * Callback to provide information about a single external mixer control
+ * that accepts multiple inputs.
+ *
+ * Returns 0 for success.
+ */
+int snd_soc_info_multi_ext(struct snd_kcontrol *kcontrol,
+	struct snd_ctl_elem_info *uinfo)
+{
+	struct soc_multi_mixer_control *mc =
+		(struct soc_multi_mixer_control *)kcontrol->private_value;
+	int platform_max;
+
+	if (!mc->platform_max)
+		mc->platform_max = mc->max;
+	platform_max = mc->platform_max;
+
+	if (platform_max == 1 && !strnstr(kcontrol->id.name, " Volume", 30))
+		uinfo->type = SNDRV_CTL_ELEM_TYPE_BOOLEAN;
+	else
+		uinfo->type = SNDRV_CTL_ELEM_TYPE_INTEGER;
+
+	uinfo->count = mc->count;
+	uinfo->value.integer.min = 0;
+	uinfo->value.integer.max = platform_max;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(snd_soc_info_multi_ext);
+
 int snd_soc_get_dai_name(const struct of_phandle_args *args,
 				const char **dai_name)
 {
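
snd_soc_info_multi_ext() is meant to be plugged into a kcontrol's .info
slot. A sketch of such a control definition follows; the
soc_multi_mixer_control layout is assumed from the fields the callback
dereferences (max, platform_max, count), and the get/put handlers are
hypothetical driver callbacks:

	static int my_get(struct snd_kcontrol *k, struct snd_ctl_elem_value *v);
	static int my_put(struct snd_kcontrol *k, struct snd_ctl_elem_value *v);

	static struct soc_multi_mixer_control my_mc = {
		.max = 255,
		.platform_max = 255,
		.count = 4,		/* four integer values per control */
	};

	static const struct snd_kcontrol_new my_ctl = {
		.iface = SNDRV_CTL_ELEM_IFACE_MIXER,
		.name = "Example Multi Volume",
		.info = snd_soc_info_multi_ext,
		.get = my_get,			/* hypothetical */
		.put = my_put,			/* hypothetical */
		.private_value = (unsigned long)&my_mc,
	};
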
diff --git a/tools/crypto/gen_fips140_testvecs.py b/tools/crypto/gen_fips140_testvecs.py
new file mode 100755
index 0000000..825c487
--- /dev/null
+++ b/tools/crypto/gen_fips140_testvecs.py
@@ -0,0 +1,125 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Copyright 2021 Google LLC
+#
+# Generate most of the test vectors for the FIPS 140 cryptographic self-tests.
+#
+# Usage:
+#    tools/crypto/gen_fips140_testvecs.py > crypto/fips140-generated-testvecs.h
+#
+# Prerequisites:
+#    Debian:      apt-get install python3-pycryptodome python3-cryptography
+#    Arch Linux:  pacman -S python-pycryptodomex python-cryptography
+
+import hashlib
+import hmac
+import os
+
+import Cryptodome.Cipher.AES
+import Cryptodome.Hash.CMAC
+import Cryptodome.Util.Counter
+
+import cryptography.hazmat.primitives.ciphers
+import cryptography.hazmat.primitives.ciphers.algorithms
+import cryptography.hazmat.primitives.ciphers.modes
+
+scriptname = os.path.basename(__file__)
+
+message     = bytes('This is a 32-byte test message.\0', 'ascii')
+aes_key     = bytes('128-bit AES key\0', 'ascii')
+aes_xts_key = bytes('This is an AES-128-XTS key.\0\0\0\0\0', 'ascii')
+aes_iv      = bytes('ABCDEFGHIJKLMNOP', 'ascii')
+assoc       = bytes('associated data string', 'ascii')
+hmac_key    = bytes('128-bit HMAC key', 'ascii')
+
+def warn_generated():
+    print(f'''/*
+ * This header was automatically generated by {scriptname}.
+ * Don't edit it directly.
+ */''')
+
+def is_string_value(value):
+    return (value.isascii() and
+            all(c == '\x00' or c.isprintable() for c in str(value, 'ascii')))
+
+def format_value(value, is_string):
+    if is_string:
+        return value
+    hexstr = ''
+    for byte in value:
+        hexstr += f'\\x{byte:02x}'
+    return hexstr
+
+def print_value(name, value):
+    is_string = is_string_value(value)
+    hdr = f'static const u8 fips_{name}[{len(value)}] __initconst ='
+    print(hdr, end='')
+    if is_string:
+        value = str(value, 'ascii').rstrip('\x00')
+        chars_per_byte = 1
+    else:
+        chars_per_byte = 4
+    bytes_per_line = 64 // chars_per_byte
+
+    if len(hdr) + (chars_per_byte * len(value)) + 4 <= 80:
+        print(f' "{format_value(value, is_string)}"', end='')
+    else:
+        for chunk in [value[i:i+bytes_per_line]
+                      for i in range(0, len(value), bytes_per_line)]:
+            print(f'\n\t"{format_value(chunk, is_string)}"', end='')
+    print(';')
+    print('')
+
+def generate_aes_testvecs():
+    print_value('aes_key', aes_key)
+    print_value('aes_iv', aes_iv)
+
+    cbc = Cryptodome.Cipher.AES.new(aes_key, Cryptodome.Cipher.AES.MODE_CBC,
+                                    iv=aes_iv)
+    print_value('aes_cbc_ciphertext', cbc.encrypt(message))
+
+    ecb = Cryptodome.Cipher.AES.new(aes_key, Cryptodome.Cipher.AES.MODE_ECB)
+    print_value('aes_ecb_ciphertext', ecb.encrypt(message))
+
+    ctr = Cryptodome.Cipher.AES.new(aes_key, Cryptodome.Cipher.AES.MODE_CTR,
+                                    nonce=bytes(), initial_value=aes_iv)
+    print_value('aes_ctr_ciphertext', ctr.encrypt(message))
+
+    print_value('aes_gcm_assoc', assoc)
+    gcm = Cryptodome.Cipher.AES.new(aes_key, Cryptodome.Cipher.AES.MODE_GCM,
+                                    nonce=aes_iv[:12], mac_len=16)
+    gcm.update(assoc)
+    raw_ciphertext, tag = gcm.encrypt_and_digest(message)
+    print_value('aes_gcm_ciphertext', raw_ciphertext + tag)
+
+    # Unfortunately, pycryptodome doesn't support XTS, so for it we need to use
+    # a different Python package (the "cryptography" package).
+    print_value('aes_xts_key', aes_xts_key)
+    xts = cryptography.hazmat.primitives.ciphers.Cipher(
+        cryptography.hazmat.primitives.ciphers.algorithms.AES(aes_xts_key),
+        cryptography.hazmat.primitives.ciphers.modes.XTS(aes_iv)).encryptor()
+    ciphertext = xts.update(message) + xts.finalize()
+    print_value('aes_xts_ciphertext', ciphertext)
+
+    cmac = Cryptodome.Hash.CMAC.new(aes_key, ciphermod=Cryptodome.Cipher.AES)
+    cmac.update(message)
+    print_value('aes_cmac_digest', cmac.digest())
+
+def generate_sha_testvecs():
+    print_value('hmac_key', hmac_key)
+    for alg in ['sha1', 'sha256', 'hmac_sha256', 'sha512']:
+        if alg.startswith('hmac_'):
+            h = hmac.new(hmac_key, message, alg.removeprefix('hmac_'))
+        else:
+            h = hashlib.new(alg, message)
+        print_value(f'{alg}_digest', h.digest())
+
+print('/* SPDX-License-Identifier: GPL-2.0-only */')
+print('/* Copyright 2021 Google LLC */')
+print('')
+warn_generated()
+print('')
+print_value('message', message)
+generate_aes_testvecs()
+generate_sha_testvecs()
+warn_generated()
diff --git a/tools/testing/selftests/filesystems/fuse/.gitignore b/tools/testing/selftests/filesystems/fuse/.gitignore
new file mode 100644
index 0000000..3ee9a27
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fuse/.gitignore
@@ -0,0 +1,2 @@
+fuse_test
+*.raw
diff --git a/tools/testing/selftests/filesystems/fuse/Makefile b/tools/testing/selftests/filesystems/fuse/Makefile
new file mode 100644
index 0000000..261d760
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fuse/Makefile
@@ -0,0 +1,34 @@
+# SPDX-License-Identifier: GPL-2.0
+CFLAGS += -D_FILE_OFFSET_BITS=64 -Wall -Werror -I../.. -I../../../../.. -I../../../../include
+LDLIBS := -lpthread -lelf
+TEST_GEN_PROGS := fuse_test fuse_daemon
+TEST_GEN_FILES := \
+	test_bpf.bpf \
+	fd_bpf.bpf \
+	fd.sh \
+
+EXTRA_CLEAN := *.bpf
+BPF_FLAGS = -Wall -Werror -O2 -g -emit-llvm \
+	    -I ../../../../../include \
+	    -idirafter /usr/lib/gcc/x86_64-linux-gnu/10/include \
+	    -idirafter /usr/local/include \
+	    -idirafter /usr/include/x86_64-linux-gnu \
+	    -idirafter /usr/include \
+
+include ../../lib.mk
+
+# Put after include ../../lib.mk since that changes $(TEST_GEN_PROGS)
+# Otherwise you get multiple targets, this becomes the default, and it's a mess
+EXTRA_SOURCES := bpf_loader.c
+$(TEST_GEN_PROGS) : $(EXTRA_SOURCES)
+
+$(OUTPUT)/%.ir: %.c
+	clang $(BPF_FLAGS) -c $< -o $@
+
+$(OUTPUT)/%.bpf: $(OUTPUT)/%.ir
+	llc -march=bpf -filetype=obj -o $@ $<
+
+$(OUTPUT)/fd.sh: fd.txt
+	cp $< $@
+	chmod 755 $@
+
diff --git a/tools/testing/selftests/filesystems/fuse/OWNERS b/tools/testing/selftests/filesystems/fuse/OWNERS
new file mode 100644
index 0000000..5eb371e
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fuse/OWNERS
@@ -0,0 +1,2 @@
+# include OWNERS from the authoritative android-mainline branch
+include kernel/common:android-mainline:/tools/testing/selftests/filesystems/incfs/OWNERS
diff --git a/tools/testing/selftests/filesystems/fuse/bpf_loader.c b/tools/testing/selftests/filesystems/fuse/bpf_loader.c
new file mode 100644
index 0000000..5bf26ea
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fuse/bpf_loader.c
@@ -0,0 +1,791 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021 Google LLC
+ */
+
+#include "test_fuse.h"
+
+#include <dirent.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <gelf.h>
+#include <libelf.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <sys/statfs.h>
+#include <sys/xattr.h>
+
+#include <linux/unistd.h>
+
+#include <include/uapi/linux/fuse.h>
+#include <include/uapi/linux/bpf.h>
+
+struct _test_options test_options;
+
+struct s s(const char *s1)
+{
+	struct s s = {0};
+
+	if (!s1)
+		return s;
+
+	s.s = malloc(strlen(s1) + 1);
+	if (!s.s)
+		return s;
+
+	strcpy(s.s, s1);
+	return s;
+}
+
+struct s sn(const char *s1, const char *s2)
+{
+	struct s s = {0};
+
+	if (!s1)
+		return s;
+
+	s.s = malloc(s2 - s1 + 1);
+	if (!s.s)
+		return s;
+
+	strncpy(s.s, s1, s2 - s1);
+	s.s[s2 - s1] = 0;
+	return s;
+}
+
+int s_cmp(struct s s1, struct s s2)
+{
+	int result = -1;
+
+	if (!s1.s || !s2.s)
+		goto out;
+	result = strcmp(s1.s, s2.s);
+out:
+	free(s1.s);
+	free(s2.s);
+	return result;
+}
+
+struct s s_cat(struct s s1, struct s s2)
+{
+	struct s s = {0};
+
+	if (!s1.s || !s2.s)
+		goto out;
+
+	s.s = malloc(strlen(s1.s) + strlen(s2.s) + 1);
+	if (!s.s)
+		goto out;
+
+	strcpy(s.s, s1.s);
+	strcat(s.s, s2.s);
+out:
+	free(s1.s);
+	free(s2.s);
+	return s;
+}
+
+struct s s_splitleft(struct s s1, char c)
+{
+	struct s s = {0};
+	char *split;
+
+	if (!s1.s)
+		return s;
+
+	split = strchr(s1.s, c);
+	if (split)
+		s = sn(s1.s, split);
+
+	free(s1.s);
+	return s;
+}
+
+struct s s_splitright(struct s s1, char c)
+{
+	struct s s2 = {0};
+	char *split;
+
+	if (!s1.s)
+		return s2;
+
+	split = strchr(s1.s, c);
+	if (split)
+		s2 = s(split + 1);
+
+	free(s1.s);
+	return s2;
+}
+
+struct s s_word(struct s s1, char c, size_t n)
+{
+	while (n--)
+		s1 = s_splitright(s1, c);
+	return s_splitleft(s1, c);
+}
+
+struct s s_path(struct s s1, struct s s2)
+{
+	return s_cat(s_cat(s1, s("/")), s2);
+}
+
+struct s s_pathn(size_t n, struct s s1, ...)
+{
+	va_list argp;
+
+	va_start(argp, s1);
+	while (--n)
+		s1 = s_path(s1, va_arg(argp, struct s));
+	va_end(argp);
+	return s1;
+}
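+
+/*
+ * All of the struct s helpers are consuming: each call frees the .s
+ * buffers of its arguments and returns a freshly allocated value, so
+ * call sites can nest expressions without manual cleanup, e.g. with the
+ * s_* syscall wrappers defined below (paths are hypothetical):
+ *
+ *	s_mkdir(s_path(s("/tmp"), s("fuse-test")), 0777);
+ *	int fd = s_creat(s_pathn(3, s("/tmp"), s("fuse-test"), s("f")), 0644);
+ */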
+
+int s_link(struct s src_pathname, struct s dst_pathname)
+{
+	int res;
+
+	if (src_pathname.s && dst_pathname.s) {
+		res = link(src_pathname.s, dst_pathname.s);
+	} else {
+		res = -1;
+		errno = ENOMEM;
+	}
+
+	free(src_pathname.s);
+	free(dst_pathname.s);
+	return res;
+}
+
+int s_symlink(struct s src_pathname, struct s dst_pathname)
+{
+	int res;
+
+	if (src_pathname.s && dst_pathname.s) {
+		res = symlink(src_pathname.s, dst_pathname.s);
+	} else {
+		res = -1;
+		errno = ENOMEM;
+	}
+
+	free(src_pathname.s);
+	free(dst_pathname.s);
+	return res;
+}
+
+
+int s_mkdir(struct s pathname, mode_t mode)
+{
+	int res;
+
+	if (!pathname.s) {
+		errno = ENOMEM;
+		return -1;
+	}
+
+	res = mkdir(pathname.s, mode);
+	free(pathname.s);
+	return res;
+}
+
+int s_rmdir(struct s pathname)
+{
+	int res;
+
+	if (!pathname.s) {
+		errno = ENOMEM;
+		return -1;
+	}
+
+	res = rmdir(pathname.s);
+	free(pathname.s);
+	return res;
+}
+
+int s_unlink(struct s pathname)
+{
+	int res;
+
+	if (!pathname.s) {
+		errno = ENOMEM;
+		return -1;
+	}
+
+	res = unlink(pathname.s);
+	free(pathname.s);
+	return res;
+}
+
+int s_open(struct s pathname, int flags, ...)
+{
+	va_list ap;
+	int res;
+
+	va_start(ap, flags);
+	if (!pathname.s) {
+		errno = ENOMEM;
+		va_end(ap);
+		return -1;
+	}
+
+	if (flags & (O_CREAT | O_TMPFILE))
+		res = open(pathname.s, flags, va_arg(ap, mode_t));
+	else
+		res = open(pathname.s, flags);
+
+	free(pathname.s);
+	va_end(ap);
+	return res;
+}
+
+int s_openat(int dirfd, struct s pathname, int flags, ...)
+{
+	va_list ap;
+	int res;
+
+	va_start(ap, flags);
+	if (!pathname.s) {
+		errno = ENOMEM;
+		va_end(ap);
+		return -1;
+	}
+
+	if (flags & (O_CREAT | O_TMPFILE))
+		res = openat(dirfd, pathname.s, flags, va_arg(ap, mode_t));
+	else
+		res = openat(dirfd, pathname.s, flags);
+
+	free(pathname.s);
+	va_end(ap);
+	return res;
+}
+
+int s_creat(struct s pathname, mode_t mode)
+{
+	int res;
+
+	if (!pathname.s) {
+		errno = ENOMEM;
+		return -1;
+	}
+
+	res = open(pathname.s, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, mode);
+	free(pathname.s);
+	return res;
+}
+
+int s_mkfifo(struct s pathname, mode_t mode)
+{
+	int res;
+
+	if (!pathname.s) {
+		errno = ENOMEM;
+		return -1;
+	}
+
+	res = mknod(pathname.s, S_IFIFO | mode, 0);
+	free(pathname.s);
+	return res;
+}
+
+int s_stat(struct s pathname, struct stat *st)
+{
+	int res;
+
+	if (!pathname.s) {
+		errno = ENOMEM;
+		return -1;
+	}
+
+	res = stat(pathname.s, st);
+	free(pathname.s);
+	return res;
+}
+
+int s_statfs(struct s pathname, struct statfs *st)
+{
+	int res;
+
+	if (!pathname.s) {
+		errno = ENOMEM;
+		return -1;
+	}
+
+	res = statfs(pathname.s, st);
+	free(pathname.s);
+	return res;
+}
+
+DIR *s_opendir(struct s pathname)
+{
+	DIR *res;
+
+	if (!pathname.s) {
+		errno = ENOMEM;
+		return NULL;
+	}
+
+	res = opendir(pathname.s);
+	free(pathname.s);
+	return res;
+}
+
+int s_getxattr(struct s pathname, const char name[], void *value, size_t size,
+	       ssize_t *ret_size)
+{
+	if (!pathname.s) {
+		errno = ENOMEM;
+		return -1;
+	}
+
+	*ret_size = getxattr(pathname.s, name, value, size);
+	free(pathname.s);
+	return *ret_size >= 0 ? 0 : -1;
+}
+
+int s_listxattr(struct s pathname, void *list, size_t size, ssize_t *ret_size)
+{
+	if (!pathname.s) {
+		errno = ENOMEM;
+		return -1;
+	}
+
+	*ret_size = listxattr(pathname.s, list, size);
+	free(pathname.s);
+	return *ret_size >= 0 ? 0 : -1;
+}
+
+int s_setxattr(struct s pathname, const char name[], const void *value, size_t size, int flags)
+{
+	int res;
+
+	if (!pathname.s) {
+		errno = ENOMEM;
+		return -1;
+	}
+
+	res = setxattr(pathname.s, name, value, size, flags);
+	free(pathname.s);
+	return res;
+}
+
+int s_removexattr(struct s pathname, const char name[])
+{
+	int res;
+
+	if (!pathname.s) {
+		errno = ENOMEM;
+		return -1;
+	}
+
+	res = removexattr(pathname.s, name);
+	free(pathname.s);
+	return res;
+}
+
+int s_rename(struct s oldpathname, struct s newpathname)
+{
+	int res;
+
+	if (!oldpathname.s || !newpathname.s) {
+		errno = ENOMEM;
+		return -1;
+	}
+
+	res = rename(oldpathname.s, newpathname.s);
+	free(oldpathname.s);
+	free(newpathname.s);
+	return res;
+}
+
+int s_fuse_attr(struct s pathname, struct fuse_attr *fuse_attr_out)
+{
+	struct stat st;
+	int result = TEST_FAILURE;
+
+	TESTSYSCALL(s_stat(pathname, &st));
+
+	fuse_attr_out->ino = st.st_ino;
+	fuse_attr_out->mode = st.st_mode;
+	fuse_attr_out->nlink = st.st_nlink;
+	fuse_attr_out->uid = st.st_uid;
+	fuse_attr_out->gid = st.st_gid;
+	fuse_attr_out->rdev = st.st_rdev;
+	fuse_attr_out->size = st.st_size;
+	fuse_attr_out->blksize = st.st_blksize;
+	fuse_attr_out->blocks = st.st_blocks;
+	fuse_attr_out->atime = st.st_atime;
+	fuse_attr_out->mtime = st.st_mtime;
+	fuse_attr_out->ctime = st.st_ctime;
+	fuse_attr_out->atimensec = UINT32_MAX;
+	fuse_attr_out->mtimensec = UINT32_MAX;
+	fuse_attr_out->ctimensec = UINT32_MAX;
+
+	result = TEST_SUCCESS;
+out:
+	return result;
+}
+
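+/*
+ * Scan /proc/mounts for the tracefs mount point, falling back to the
+ * "tracing" directory under debugfs if only that is mounted.
+ */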
+struct s tracing_folder(void)
+{
+	struct s trace = {0};
+	FILE *mounts = NULL;
+	char *line = NULL;
+	size_t size = 0;
+
+	TEST(mounts = fopen("/proc/mounts", "re"), mounts);
+	while (getline(&line, &size, mounts) != -1) {
+		if (!s_cmp(s_word(sn(line, line + size), ' ', 2),
+			   s("tracefs"))) {
+			trace = s_word(sn(line, line + size), ' ', 1);
+			break;
+		}
+
+		if (!s_cmp(s_word(sn(line, line + size), ' ', 2), s("debugfs")))
+			trace = s_path(s_word(sn(line, line + size), ' ', 1),
+				       s("tracing"));
+	}
+
+out:
+	free(line);
+	if (mounts)
+		fclose(mounts);
+	return trace;
+}
+
+int tracing_on(void)
+{
+	int result = TEST_FAILURE;
+	int tracing_on = -1;
+
+	TEST(tracing_on = s_open(s_path(tracing_folder(), s("tracing_on")),
+				 O_WRONLY | O_CLOEXEC),
+	     tracing_on != -1);
+	TESTEQUAL(write(tracing_on, "1", 1), 1);
+	result = TEST_SUCCESS;
+out:
+	close(tracing_on);
+	return result;
+}
+
+char *concat_file_name(const char *dir, const char *file)
+{
+	char full_name[FILENAME_MAX] = "";
+	int res;
+
+	res = snprintf(full_name, ARRAY_SIZE(full_name), "%s/%s", dir, file);
+	if (res < 0 || res >= FILENAME_MAX)
+		return NULL;
+	return strdup(full_name);
+}
+
+char *setup_mount_dir(const char *name)
+{
+	struct stat st;
+	char *current_dir = getcwd(NULL, 0);
+	char *mount_dir = concat_file_name(current_dir, name);
+
+	free(current_dir);
+	if (!mount_dir)
+		return NULL;
+
+	if (stat(mount_dir, &st) == 0) {
+		if (S_ISDIR(st.st_mode))
+			return mount_dir;
+
+		ksft_print_msg("%s is a file, not a directory.\n", mount_dir);
+		free(mount_dir);
+		return NULL;
+	}
+
+	if (mkdir(mount_dir, 0777)) {
+		ksft_print_msg("Can't create mount dir.\n");
+		free(mount_dir);
+		return NULL;
+	}
+
+	return mount_dir;
+}
+
+int delete_dir_tree(const char *dir_path, bool remove_root)
+{
+	DIR *dir = NULL;
+	struct dirent *dp;
+	int result = 0;
+
+	dir = opendir(dir_path);
+	if (!dir) {
+		result = -errno;
+		goto out;
+	}
+
+	while ((dp = readdir(dir))) {
+		char *full_path;
+
+		if (!strcmp(dp->d_name, ".") || !strcmp(dp->d_name, ".."))
+			continue;
+
+		full_path = concat_file_name(dir_path, dp->d_name);
+		if (dp->d_type == DT_DIR)
+			result = delete_dir_tree(full_path, true);
+		else
+			result = unlink(full_path);
+		free(full_path);
+		if (result)
+			goto out;
+	}
+
+out:
+	if (dir)
+		closedir(dir);
+	if (!result && remove_root)
+		rmdir(dir_path);
+	return result;
+}
+
+static int mount_fuse_maybe_init(const char *mount_dir, int bpf_fd, int dir_fd,
+			     int *fuse_dev_ptr, bool init)
+{
+	int result = TEST_FAILURE;
+	int fuse_dev = -1;
+	char options[FILENAME_MAX];
+	uint8_t bytes_in[FUSE_MIN_READ_BUFFER];
+	uint8_t bytes_out[FUSE_MIN_READ_BUFFER];
+
+	DECL_FUSE_IN(init);
+
+	TEST(fuse_dev = open("/dev/fuse", O_RDWR | O_CLOEXEC), fuse_dev != -1);
+	snprintf(options, FILENAME_MAX, "fd=%d,user_id=0,group_id=0,rootmode=0040000",
+		 fuse_dev);
+	if (bpf_fd != -1)
+		snprintf(options + strlen(options),
+			 sizeof(options) - strlen(options),
+			 ",root_bpf=%d", bpf_fd);
+	if (dir_fd != -1)
+		snprintf(options + strlen(options),
+			 sizeof(options) - strlen(options),
+			 ",root_dir=%d", dir_fd);
+	TESTSYSCALL(mount("ABC", mount_dir, "fuse", 0, options));
+
+	if (init) {
+		TESTFUSEIN(FUSE_INIT, init_in);
+		TESTEQUAL(init_in->major, FUSE_KERNEL_VERSION);
+		TESTEQUAL(init_in->minor, FUSE_KERNEL_MINOR_VERSION);
+		TESTFUSEOUT1(fuse_init_out, ((struct fuse_init_out) {
+			.major = FUSE_KERNEL_VERSION,
+			.minor = FUSE_KERNEL_MINOR_VERSION,
+			.max_readahead = 4096,
+			.flags = 0,
+			.max_background = 0,
+			.congestion_threshold = 0,
+			.max_write = 4096,
+			.time_gran = 1000,
+			.max_pages = 12,
+			.map_alignment = 4096,
+		}));
+	}
+
+	*fuse_dev_ptr = fuse_dev;
+	fuse_dev = -1;
+	result = TEST_SUCCESS;
+out:
+	close(fuse_dev);
+	return result;
+}
+
+int mount_fuse(const char *mount_dir, int bpf_fd, int dir_fd, int *fuse_dev_ptr)
+{
+	return mount_fuse_maybe_init(mount_dir, bpf_fd, dir_fd, fuse_dev_ptr,
+				     true);
+}
+
+int mount_fuse_no_init(const char *mount_dir, int bpf_fd, int dir_fd,
+		       int *fuse_dev_ptr)
+{
+	return mount_fuse_maybe_init(mount_dir, bpf_fd, dir_fd, fuse_dev_ptr,
+				     false);
+}
+
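+/* Host-side mirror of the map definitions placed in SEC("maps") */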
+struct fuse_bpf_map {
+	unsigned int map_type;
+	size_t key_size;
+	size_t value_size;
+	unsigned int max_entries;
+};
+
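+/*
+ * For each symbol in the "maps" section, create the corresponding bpf
+ * map with BPF_MAP_CREATE and record its fd and symbol value so the
+ * program's load instructions can be relocated against it later.
+ */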
+static int install_maps(Elf_Data *maps, int maps_index, Elf *elf,
+			Elf_Data *symbols, int symbol_index,
+			struct map_relocation **mr, size_t *map_count)
+{
+	int result = TEST_FAILURE;
+	int i;
+	GElf_Sym symbol;
+
+	TESTNE((void *)symbols, NULL);
+
+	for (i = 0; i < symbols->d_size / sizeof(symbol); ++i) {
+		TESTNE((void *)gelf_getsym(symbols, i, &symbol), 0);
+		if (symbol.st_shndx == maps_index) {
+			struct fuse_bpf_map *map;
+			union bpf_attr attr;
+			int map_fd;
+
+			map = (struct fuse_bpf_map *)
+				((char *)maps->d_buf + symbol.st_value);
+
+			attr = (union bpf_attr) {
+				.map_type = map->map_type,
+				.key_size = map->key_size,
+				.value_size = map->value_size,
+				.max_entries = map->max_entries,
+			};
+
+			TEST(*mr = realloc(*mr, ++*map_count *
+					   sizeof(struct map_relocation)),
+			     *mr);
+			TEST(map_fd = syscall(__NR_bpf, BPF_MAP_CREATE,
+					      &attr, sizeof(attr)),
+			     map_fd != -1);
+			(*mr)[*map_count - 1] = (struct map_relocation) {
+				.name = strdup(elf_strptr(elf, symbol_index,
+							  symbol.st_name)),
+				.fd = map_fd,
+				.value = symbol.st_value,
+			};
+		}
+	}
+
+	result = TEST_SUCCESS;
+out:
+	return result;
+}
+
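+/*
+ * Walk the relocation entries for the program section, patching each
+ * instruction that references a map symbol to BPF_PSEUDO_MAP_FD with
+ * the fd recorded by install_maps().
+ */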
+static inline int relocate_maps(GElf_Shdr *rel_header, Elf_Data *rel_data,
+			 Elf_Data *prog_data, Elf_Data *symbol_data,
+			 struct map_relocation *map_relocations,
+			 size_t map_count)
+{
+	int result = TEST_FAILURE;
+	int i;
+	struct bpf_insn *insns = (struct bpf_insn *) prog_data->d_buf;
+
+	for (i = 0; i < rel_header->sh_size / rel_header->sh_entsize; ++i) {
+		GElf_Sym sym;
+		GElf_Rel rel;
+		unsigned int insn_idx;
+		int map_idx;
+
+		gelf_getrel(rel_data, i, &rel);
+		insn_idx = rel.r_offset / sizeof(struct bpf_insn);
+		insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
+
+		gelf_getsym(symbol_data, GELF_R_SYM(rel.r_info), &sym);
+		for (map_idx = 0; map_idx < map_count; map_idx++) {
+			if (map_relocations[map_idx].value == sym.st_value) {
+				insns[insn_idx].imm =
+					map_relocations[map_idx].fd;
+				break;
+			}
+		}
+		TESTNE(map_idx, map_count);
+	}
+
+	result = TEST_SUCCESS;
+out:
+	return result;
+}
+
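+/*
+ * Load the named section of a bpf ELF object shipped alongside the
+ * test binary: create and relocate its maps, read the dynamically
+ * assigned program type from /sys/fs/fuse/bpf_prog_type_fuse, then
+ * load the program with BPF_PROG_LOAD.
+ */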
+int install_elf_bpf(const char *file, const char *section, int *fd,
+		    struct map_relocation **map_relocations, size_t *map_count)
+{
+	int result = TEST_FAILURE;
+	char path[PATH_MAX] = {};
+	char *last_slash;
+	int filter_fd = -1;
+	union bpf_attr bpf_attr;
+	static char log[1 << 20];
+	Elf *elf = NULL;
+	GElf_Ehdr ehdr;
+	Elf_Data *data_prog = NULL, *data_maps = NULL, *data_symbols = NULL;
+	int maps_index = -1, symbol_index = -1, prog_index = -1;
+	int i;
+	int bpf_prog_type_fuse_fd = -1;
+	char buffer[10] = {0};
+	int bpf_prog_type_fuse;
+
+	/* readlink() does not NUL-terminate; path is zeroed, keep one spare */
+	TESTNE(readlink("/proc/self/exe", path, PATH_MAX - 1), -1);
+	TEST(last_slash = strrchr(path, '/'), last_slash);
+	strcpy(last_slash + 1, file);
+	TEST(filter_fd = open(path, O_RDONLY | O_CLOEXEC), filter_fd != -1);
+	TESTNE(elf_version(EV_CURRENT), EV_NONE);
+	TEST(elf = elf_begin(filter_fd, ELF_C_READ, NULL), elf);
+	TESTEQUAL((void *) gelf_getehdr(elf, &ehdr), &ehdr);
+	for (i = 1; i < ehdr.e_shnum; i++) {
+		char *shname;
+		GElf_Shdr shdr;
+		Elf_Scn *scn;
+
+		TEST(scn = elf_getscn(elf, i), scn);
+		TESTEQUAL((void *)gelf_getshdr(scn, &shdr), &shdr);
+		TEST(shname = elf_strptr(elf, ehdr.e_shstrndx, shdr.sh_name),
+		     shname);
+
+		if (!strcmp(shname, "maps")) {
+			TEST(data_maps = elf_getdata(scn, 0), data_maps);
+			maps_index = i;
+		} else if (shdr.sh_type == SHT_SYMTAB) {
+			TEST(data_symbols = elf_getdata(scn, 0), data_symbols);
+			symbol_index = shdr.sh_link;
+		} else if (!strcmp(shname, section)) {
+			TEST(data_prog = elf_getdata(scn, 0), data_prog);
+			prog_index = i;
+		}
+	}
+	TESTNE((void *) data_prog, NULL);
+
+	if (data_maps)
+		TESTEQUAL(install_maps(data_maps, maps_index, elf,
+				       data_symbols, symbol_index,
+				       map_relocations, map_count), 0);
+
+	/* Now relocate maps */
+	for (i = 1; i < ehdr.e_shnum; i++) {
+		GElf_Shdr rel_header;
+		Elf_Scn *scn;
+		Elf_Data *rel_data;
+
+		TEST(scn = elf_getscn(elf, i), scn);
+		TESTEQUAL((void *)gelf_getshdr(scn, &rel_header),
+			&rel_header);
+		if (rel_header.sh_type != SHT_REL)
+			continue;
+		TEST(rel_data = elf_getdata(scn, 0), rel_data);
+
+		if (rel_header.sh_info != prog_index)
+			continue;
+		TESTEQUAL(relocate_maps(&rel_header, rel_data,
+					data_prog, data_symbols,
+					*map_relocations, *map_count),
+			  0);
+	}
+
+	TEST(bpf_prog_type_fuse_fd = open("/sys/fs/fuse/bpf_prog_type_fuse",
+					  O_RDONLY | O_CLOEXEC),
+	     bpf_prog_type_fuse_fd != -1);
+	TESTGE(read(bpf_prog_type_fuse_fd, buffer, sizeof(buffer) - 1), 1);
+	TEST(bpf_prog_type_fuse = strtol(buffer, NULL, 10),
+	     bpf_prog_type_fuse != 0);
+
+	bpf_attr = (union bpf_attr) {
+		.prog_type = bpf_prog_type_fuse,
+		.insn_cnt = data_prog->d_size / sizeof(struct bpf_insn),
+		.insns = ptr_to_u64(data_prog->d_buf),
+		.license = ptr_to_u64("GPL"),
+		.log_buf = test_options.verbose ? ptr_to_u64(log) : 0,
+		.log_size = test_options.verbose ? sizeof(log) : 0,
+		.log_level = test_options.verbose ? 2 : 0,
+	};
+	*fd = syscall(__NR_bpf, BPF_PROG_LOAD, &bpf_attr, sizeof(bpf_attr));
+	if (test_options.verbose)
+		ksft_print_msg("%s\n", log);
+	if (*fd == -1 && errno == ENOSPC)
+		ksft_print_msg("bpf log size too small!\n");
+	TESTNE(*fd, -1);
+
+	result = TEST_SUCCESS;
+out:
+	close(filter_fd);
+	close(bpf_prog_type_fuse_fd);
+	return result;
+}
diff --git a/tools/testing/selftests/filesystems/fuse/fd.txt b/tools/testing/selftests/filesystems/fuse/fd.txt
new file mode 100644
index 0000000..15ce771
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fuse/fd.txt
@@ -0,0 +1,21 @@
+fuse_daemon $*
+cd fd-dst
+ls
+cd show
+ls
+fsstress -s 123 -d . -p 4 -n 100 -l5
+echo test > wibble
+ls
+cat wibble
+fallocate -l 1000 wobble
+mkdir testdir
+mkdir tmpdir
+rmdir tmpdir
+touch tmp
+mv tmp tmp2
+rm tmp2
+
+# FUSE_LINK
+echo "ln_src contents" > ln_src
+ln ln_src ln_link
+cat ln_link
diff --git a/tools/testing/selftests/filesystems/fuse/fd_bpf.c b/tools/testing/selftests/filesystems/fuse/fd_bpf.c
new file mode 100644
index 0000000..3cd82d6
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fuse/fd_bpf.c
@@ -0,0 +1,252 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+// Copyright (c) 2021 Google LLC
+
+#include "test_fuse_bpf.h"
+
+SEC("maps") struct fuse_bpf_map test_map = {
+	BPF_MAP_TYPE_ARRAY,
+	sizeof(uint32_t),
+	sizeof(uint32_t),
+	1000,
+};
+
+SEC("maps") struct fuse_bpf_map test_map2 = {
+	BPF_MAP_TYPE_HASH,
+	sizeof(uint32_t),
+	sizeof(uint64_t),
+	76,
+};
+
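+/*
+ * Prefilter for every opcode: log the request with bpf_printk and let
+ * the backing filesystem handle it. Lookups on the root node also
+ * request FUSE_BPF_USER_FILTER so the userspace daemon sees them.
+ */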
+SEC("test_daemon") int trace_daemon(struct fuse_bpf_args *fa)
+{
+	uint64_t uid_gid = bpf_get_current_uid_gid();
+	uint32_t uid = uid_gid & 0xffffffff;
+	uint64_t pid_tgid = bpf_get_current_pid_tgid();
+	uint32_t pid = pid_tgid & 0xffffffff;
+	uint32_t key = 23;
+	uint32_t *pvalue;
+
+	pvalue = bpf_map_lookup_elem(&test_map, &key);
+	if (pvalue) {
+		uint32_t value = *pvalue;
+
+		bpf_printk("pid %u uid %u value %u", pid, uid, value);
+		value++;
+		bpf_map_update_elem(&test_map, &key, &value, BPF_ANY);
+	}
+
+	switch (fa->opcode) {
+	case FUSE_ACCESS | FUSE_PREFILTER: {
+		bpf_printk("Access: %d", fa->nodeid);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_GETATTR | FUSE_PREFILTER: {
+		const struct fuse_getattr_in *fgi = fa->in_args[0].value;
+
+		bpf_printk("Get Attr %d", fgi->fh);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_SETATTR | FUSE_PREFILTER: {
+		const struct fuse_setattr_in *fsi = fa->in_args[0].value;
+
+		bpf_printk("Set Attr %d", fsi->fh);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_OPENDIR | FUSE_PREFILTER: {
+		bpf_printk("Open Dir: %d", fa->nodeid);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_READDIR | FUSE_PREFILTER: {
+		const struct fuse_read_in *fri = fa->in_args[0].value;
+
+		bpf_printk("Read Dir: fh: %lu, offset: %lu", fri->fh,
+			   fri->offset);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_LOOKUP | FUSE_PREFILTER: {
+		const char *name = fa->in_args[0].value;
+
+		bpf_printk("Lookup: %lx %s", fa->nodeid, name);
+		if (fa->nodeid == 1)
+			return FUSE_BPF_USER_FILTER | FUSE_BPF_BACKING;
+		else
+			return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_MKNOD | FUSE_PREFILTER: {
+		const struct fuse_mknod_in *fmi = fa->in_args[0].value;
+		const char *name = fa->in_args[1].value;
+
+		bpf_printk("mknod %s %x %x", name, fmi->rdev | fmi->mode,
+			   fmi->umask);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_MKDIR | FUSE_PREFILTER: {
+		const struct fuse_mkdir_in *fmi = fa->in_args[0].value;
+		const char *name = fa->in_args[1].value;
+
+		bpf_printk("mkdir: %s %x %x", name, fmi->mode, fmi->umask);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_RMDIR | FUSE_PREFILTER: {
+		const char *name = fa->in_args[0].value;
+
+		bpf_printk("rmdir: %s", name);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_RENAME | FUSE_PREFILTER: {
+		const char *oldname = fa->in_args[1].value;
+		const char *newname = fa->in_args[2].value;
+
+		bpf_printk("rename from %s", oldname);
+		bpf_printk("rename to %s", newname);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_RENAME2 | FUSE_PREFILTER: {
+		const struct fuse_rename2_in *fri = fa->in_args[0].value;
+		uint32_t flags = fri->flags;
+		const char *oldname = fa->in_args[1].value;
+		const char *newname = fa->in_args[2].value;
+
+		bpf_printk("rename(%x) from %s", flags, oldname);
+		bpf_printk("rename to %s", newname);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_UNLINK | FUSE_PREFILTER: {
+		const char *name = fa->in_args[0].value;
+
+		bpf_printk("unlink: %s", name);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_LINK | FUSE_PREFILTER: {
+		const struct fuse_link_in *fli = fa->in_args[0].value;
+		const char *dst_name = fa->in_args[1].value;
+
+		bpf_printk("Link: %d %s", fli->oldnodeid, dst_name);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_SYMLINK | FUSE_PREFILTER: {
+		const char *link_name = fa->in_args[0].value;
+		const char *link_dest = fa->in_args[1].value;
+
+		bpf_printk("symlink from %s", link_name);
+		bpf_printk("symlink to %s", link_dest);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_READLINK | FUSE_PREFILTER: {
+		const char *link_name = fa->in_args[0].value;
+
+		bpf_printk("readlink from %s", link_name);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_RELEASE | FUSE_PREFILTER: {
+		const struct fuse_release_in *fri = fa->in_args[0].value;
+
+		bpf_printk("Release: %d", fri->fh);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_RELEASEDIR | FUSE_PREFILTER: {
+		const struct fuse_release_in *fri = fa->in_args[0].value;
+
+		bpf_printk("Release Dir: %d", fri->fh);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_CREATE | FUSE_PREFILTER: {
+		bpf_printk("Create %s", fa->in_args[1].value);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_OPEN | FUSE_PREFILTER: {
+		bpf_printk("Open: %d", fa->nodeid);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_READ | FUSE_PREFILTER: {
+		const struct fuse_read_in *fri = fa->in_args[0].value;
+
+		bpf_printk("Read: fh: %lu, offset %lu, size %lu",
+			   fri->fh, fri->offset, fri->size);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_WRITE | FUSE_PREFILTER: {
+		const struct fuse_write_in *fwi = fa->in_args[0].value;
+
+		bpf_printk("Write: fh: %lu, offset %lu, size %lu",
+			   fwi->fh, fwi->offset, fwi->size);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_FLUSH | FUSE_PREFILTER: {
+		const struct fuse_flush_in *ffi = fa->in_args[0].value;
+
+		bpf_printk("Flush %d", ffi->fh);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_FALLOCATE | FUSE_PREFILTER: {
+		const struct fuse_fallocate_in *ffa = fa->in_args[0].value;
+
+		bpf_printk("Fallocate %d %lu", ffa->fh, ffa->length);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_GETXATTR | FUSE_PREFILTER: {
+		const char *name = fa->in_args[1].value;
+
+		bpf_printk("Getxattr %d %s", fa->nodeid, name);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_LISTXATTR | FUSE_PREFILTER: {
+		const char *name = fa->in_args[1].value;
+
+		bpf_printk("Listxattr %d %s", fa->nodeid, name);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_SETXATTR | FUSE_PREFILTER: {
+		const char *name = fa->in_args[1].value;
+
+		bpf_printk("Setxattr %d %s", fa->nodeid, name);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_STATFS | FUSE_PREFILTER: {
+		bpf_printk("statfs %d", fa->nodeid);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_LSEEK | FUSE_PREFILTER: {
+		const struct fuse_lseek_in *fli = fa->in_args[0].value;
+
+		bpf_printk("lseek type:%d, offset:%lld", fli->whence, fli->offset);
+		return FUSE_BPF_BACKING;
+	}
+
+	default:
+		if (fa->opcode & FUSE_PREFILTER)
+			bpf_printk("prefilter *** UNKNOWN *** opcode: %d",
+				   fa->opcode & FUSE_OPCODE_FILTER);
+		else if (fa->opcode & FUSE_POSTFILTER)
+			bpf_printk("postfilter *** UNKNOWN *** opcode: %d",
+				   fa->opcode & FUSE_OPCODE_FILTER);
+		else
+			bpf_printk("*** UNKNOWN *** opcode: %d", fa->opcode);
+		return FUSE_BPF_BACKING;
+	}
+}
diff --git a/tools/testing/selftests/filesystems/fuse/fuse_daemon.c b/tools/testing/selftests/filesystems/fuse/fuse_daemon.c
new file mode 100644
index 0000000..1b6f8c2a
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fuse/fuse_daemon.c
@@ -0,0 +1,294 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021 Google LLC
+ */
+
+#include "test_fuse.h"
+
+#include <errno.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <sys/wait.h>
+
+#include <linux/unistd.h>
+
+#include <include/uapi/linux/fuse.h>
+#include <include/uapi/linux/bpf.h>
+
+bool user_messages;
+bool kernel_messages;
+
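+/*
+ * With -k, fork a child that copies the kernel's trace_pipe to stdout
+ * line by line for the lifetime of the test.
+ */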
+static int display_trace(void)
+{
+	int pid = -1;
+	int tp = -1;
+	char c;
+	ssize_t bytes_read;
+	static char line[256] = {0};
+
+	if (!kernel_messages)
+		return TEST_SUCCESS;
+
+	TEST(pid = fork(), pid != -1);
+	if (pid != 0)
+		return pid;
+
+	TESTEQUAL(tracing_on(), 0);
+	TEST(tp = s_open(s_path(tracing_folder(), s("trace_pipe")),
+			 O_RDONLY | O_CLOEXEC), tp != -1);
+	for (;;) {
+		TEST(bytes_read = read(tp, &c, sizeof(c)),
+		     bytes_read == 1);
+		if (c == '\n') {
+			printf("%s\n", line);
+			line[0] = 0;
+		} else if (strlen(line) < sizeof(line) - 1) {
+			size_t len = strlen(line);
+
+			line[len] = c;
+			line[len + 1] = 0;
+		}
+	}
+out:
+	if (pid == 0) {
+		close(tp);
+		exit(TEST_FAILURE);
+	}
+	return pid;
+}
+
+static const char *fuse_opcode_to_string(int opcode)
+{
+	switch (opcode & FUSE_OPCODE_FILTER) {
+	case FUSE_LOOKUP:
+		return "FUSE_LOOKUP";
+	case FUSE_FORGET:
+		return "FUSE_FORGET";
+	case FUSE_GETATTR:
+		return "FUSE_GETATTR";
+	case FUSE_SETATTR:
+		return "FUSE_SETATTR";
+	case FUSE_READLINK:
+		return "FUSE_READLINK";
+	case FUSE_SYMLINK:
+		return "FUSE_SYMLINK";
+	case FUSE_MKNOD:
+		return "FUSE_MKNOD";
+	case FUSE_MKDIR:
+		return "FUSE_MKDIR";
+	case FUSE_UNLINK:
+		return "FUSE_UNLINK";
+	case FUSE_RMDIR:
+		return "FUSE_RMDIR";
+	case FUSE_RENAME:
+		return "FUSE_RENAME";
+	case FUSE_LINK:
+		return "FUSE_LINK";
+	case FUSE_OPEN:
+		return "FUSE_OPEN";
+	case FUSE_READ:
+		return "FUSE_READ";
+	case FUSE_WRITE:
+		return "FUSE_WRITE";
+	case FUSE_STATFS:
+		return "FUSE_STATFS";
+	case FUSE_RELEASE:
+		return "FUSE_RELEASE";
+	case FUSE_FSYNC:
+		return "FUSE_FSYNC";
+	case FUSE_SETXATTR:
+		return "FUSE_SETXATTR";
+	case FUSE_GETXATTR:
+		return "FUSE_GETXATTR";
+	case FUSE_LISTXATTR:
+		return "FUSE_LISTXATTR";
+	case FUSE_REMOVEXATTR:
+		return "FUSE_REMOVEXATTR";
+	case FUSE_FLUSH:
+		return "FUSE_FLUSH";
+	case FUSE_INIT:
+		return "FUSE_INIT";
+	case FUSE_OPENDIR:
+		return "FUSE_OPENDIR";
+	case FUSE_READDIR:
+		return "FUSE_READDIR";
+	case FUSE_RELEASEDIR:
+		return "FUSE_RELEASEDIR";
+	case FUSE_FSYNCDIR:
+		return "FUSE_FSYNCDIR";
+	case FUSE_GETLK:
+		return "FUSE_GETLK";
+	case FUSE_SETLK:
+		return "FUSE_SETLK";
+	case FUSE_SETLKW:
+		return "FUSE_SETLKW";
+	case FUSE_ACCESS:
+		return "FUSE_ACCESS";
+	case FUSE_CREATE:
+		return "FUSE_CREATE";
+	case FUSE_INTERRUPT:
+		return "FUSE_INTERRUPT";
+	case FUSE_BMAP:
+		return "FUSE_BMAP";
+	case FUSE_DESTROY:
+		return "FUSE_DESTROY";
+	case FUSE_IOCTL:
+		return "FUSE_IOCTL";
+	case FUSE_POLL:
+		return "FUSE_POLL";
+	case FUSE_NOTIFY_REPLY:
+		return "FUSE_NOTIFY_REPLY";
+	case FUSE_BATCH_FORGET:
+		return "FUSE_BATCH_FORGET";
+	case FUSE_FALLOCATE:
+		return "FUSE_FALLOCATE";
+	case FUSE_READDIRPLUS:
+		return "FUSE_READDIRPLUS";
+	case FUSE_RENAME2:
+		return "FUSE_RENAME2";
+	case FUSE_LSEEK:
+		return "FUSE_LSEEK";
+	case FUSE_COPY_FILE_RANGE:
+		return "FUSE_COPY_FILE_RANGE";
+	case FUSE_SETUPMAPPING:
+		return "FUSE_SETUPMAPPING";
+	case FUSE_REMOVEMAPPING:
+		return "FUSE_REMOVEMAPPING";
+	//case FUSE_SYNCFS:
+	//	return "FUSE_SYNCFS";
+	case CUSE_INIT:
+		return "CUSE_INIT";
+	case CUSE_INIT_BSWAP_RESERVED:
+		return "CUSE_INIT_BSWAP_RESERVED";
+	case FUSE_INIT_BSWAP_RESERVED:
+		return "FUSE_INIT_BSWAP_RESERVED";
+	}
+	return "?";
+}
+
+static int parse_options(int argc, char *const *argv)
+{
+	signed char c;
+
+	while ((c = getopt(argc, argv, "kuv")) != -1)
+		switch (c) {
+		case 'v':
+			test_options.verbose = true;
+			break;
+
+		case 'u':
+			user_messages = true;
+			break;
+
+		case 'k':
+			kernel_messages = true;
+			break;
+
+		default:
+			return -EINVAL;
+		}
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int result = TEST_FAILURE;
+	int trace_pid = -1;
+	char *mount_dir = NULL;
+	char *src_dir = NULL;
+	int bpf_fd = -1;
+	int src_fd = -1;
+	int fuse_dev = -1;
+	struct map_relocation *map_relocations = NULL;
+	size_t map_count = 0;
+	int i;
+
+	if (geteuid() != 0)
+		ksft_print_msg("Not running as root, mount might fail.\n");
+	TESTEQUAL(parse_options(argc, argv), 0);
+
+	TEST(trace_pid = display_trace(), trace_pid != -1);
+
+	delete_dir_tree("fd-src", true);
+	TEST(src_dir = setup_mount_dir("fd-src"), src_dir);
+	delete_dir_tree("fd-dst", true);
+	TEST(mount_dir = setup_mount_dir("fd-dst"), mount_dir);
+
+	TESTEQUAL(install_elf_bpf("fd_bpf.bpf", "test_daemon", &bpf_fd,
+				  &map_relocations, &map_count), 0);
+
+	TEST(src_fd = open("fd-src", O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTSYSCALL(mkdirat(src_fd, "show", 0777));
+	TESTSYSCALL(mkdirat(src_fd, "hide", 0777));
+
+	for (i = 0; i < map_count; ++i)
+		if (!strcmp(map_relocations[i].name, "test_map")) {
+			uint32_t key = 23;
+			uint32_t value = 1234;
+			union bpf_attr attr = {
+				.map_fd = map_relocations[i].fd,
+				.key    = ptr_to_u64(&key),
+				.value  = ptr_to_u64(&value),
+				.flags  = BPF_ANY,
+			};
+			TESTSYSCALL(syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM,
+					    &attr, sizeof(attr)));
+		}
+
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	if (fork())
+		return 0;
+
+	for (;;) {
+		uint8_t bytes_in[FUSE_MIN_READ_BUFFER];
+		uint8_t bytes_out[FUSE_MIN_READ_BUFFER] __maybe_unused;
+		struct fuse_in_header *in_header =
+			(struct fuse_in_header *)bytes_in;
+		ssize_t res = read(fuse_dev, bytes_in, sizeof(bytes_in));
+
+		if (res == -1)
+			break;
+
+		switch (in_header->opcode) {
+		case FUSE_LOOKUP | FUSE_PREFILTER: {
+			char *name = (char *)(bytes_in + sizeof(*in_header));
+
+			if (user_messages)
+				printf("Lookup %s\n", name);
+			if (!strcmp(name, "hide"))
+				TESTFUSEOUTERROR(-ENOENT);
+			else
+				TESTFUSEOUTREAD(name, strlen(name) + 1);
+			break;
+		}
+		default:
+			if (user_messages) {
+				printf("opcode is %d (%s)\n", in_header->opcode,
+				       fuse_opcode_to_string(
+					       in_header->opcode));
+			}
+			break;
+		}
+	}
+
+	result = TEST_SUCCESS;
+
+out:
+	for (i = 0; i < map_count; ++i) {
+		free(map_relocations[i].name);
+		close(map_relocations[i].fd);
+	}
+	free(map_relocations);
+	umount2(mount_dir, MNT_FORCE);
+	delete_dir_tree(mount_dir, true);
+	free(mount_dir);
+	delete_dir_tree(src_dir, true);
+	free(src_dir);
+	if (trace_pid != -1)
+		kill(trace_pid, SIGKILL);
+	return result;
+}
diff --git a/tools/testing/selftests/filesystems/fuse/fuse_test.c b/tools/testing/selftests/filesystems/fuse/fuse_test.c
new file mode 100644
index 0000000..c23f75b
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fuse/fuse_test.c
@@ -0,0 +1,2142 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2021 Google LLC
+ */
+#define _GNU_SOURCE
+
+#include "test_fuse.h"
+
+#include <errno.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/inotify.h>
+#include <sys/mman.h>
+#include <sys/mount.h>
+#include <sys/syscall.h>
+#include <sys/wait.h>
+
+#include <linux/capability.h>
+#include <linux/random.h>
+
+#include <include/uapi/linux/fuse.h>
+#include <include/uapi/linux/bpf.h>
+
+static const char *ft_src = "ft-src";
+static const char *ft_dst = "ft-dst";
+
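+/*
+ * fill_buffer()/test_buffer() generate a deterministic pseudo-random
+ * pattern keyed by (file, block), so a read can verify exactly which
+ * source served each page.
+ */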
+static void fill_buffer(uint8_t *data, size_t len, int file, int block)
+{
+	int i;
+	int seed = 7919 * file + block;
+
+	for (i = 0; i < len; i++) {
+		seed = 1103515245 * seed + 12345;
+		data[i] = (uint8_t)(seed >> (i % 13));
+	}
+}
+
+static bool test_buffer(uint8_t *data, size_t len, int file, int block)
+{
+	int i;
+	int seed = 7919 * file + block;
+
+	for (i = 0; i < len; i++) {
+		seed = 1103515245 * seed + 12345;
+		if (data[i] != (uint8_t)(seed >> (i % 13)))
+			return false;
+	}
+
+	return true;
+}
+
+static int create_file(int dir, struct s name, int index, size_t blocks)
+{
+	int result = TEST_FAILURE;
+	int fd = -1;
+	int i;
+	uint8_t data[PAGE_SIZE];
+
+	TEST(fd = s_openat(dir, name, O_CREAT | O_WRONLY, 0777), fd != -1);
+	for (i = 0; i < blocks; ++i) {
+		fill_buffer(data, PAGE_SIZE, index, i);
+		TESTEQUAL(write(fd, data, sizeof(data)), PAGE_SIZE);
+	}
+	TESTSYSCALL(close(fd));
+	fd = -1;
+	result = TEST_SUCCESS;
+
+out:
+	close(fd);
+	return result;
+}
+
+static int bpf_clear_trace(void)
+{
+	int result = TEST_FAILURE;
+	int tp = -1;
+
+	TEST(tp = s_open(s_path(tracing_folder(), s("trace")),
+			 O_WRONLY | O_TRUNC | O_CLOEXEC), tp != -1);
+
+	result = TEST_SUCCESS;
+out:
+	close(tp);
+	return result;
+}
+
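+/*
+ * Poll trace_pipe (non-blocking) for a line containing substr; with
+ * present == false, succeed only if the pipe drains without a match.
+ */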
+static int bpf_test_trace_maybe(const char *substr, bool present)
+{
+	int result = TEST_FAILURE;
+	int tp = -1;
+	char trace_buffer[4096] = {};
+	ssize_t bytes_read;
+
+	TEST(tp = s_open(s_path(tracing_folder(), s("trace_pipe")),
+			 O_RDONLY | O_CLOEXEC),
+	     tp != -1);
+	fcntl(tp, F_SETFL, O_NONBLOCK);
+
+	for (;;) {
+		bytes_read = read(tp, trace_buffer,
+				  sizeof(trace_buffer) - 1);
+		if (present)
+			TESTCOND(bytes_read > 0);
+		else if (bytes_read <= 0) {
+			result = TEST_SUCCESS;
+			break;
+		}
+		/* NUL-terminate so the strstr() below cannot overrun */
+		trace_buffer[bytes_read] = 0;
+
+		if (test_options.verbose)
+			ksft_print_msg("%s\n", trace_buffer);
+
+		if (strstr(trace_buffer, substr)) {
+			if (present)
+				result = TEST_SUCCESS;
+			break;
+		}
+	}
+out:
+	close(tp);
+	return result;
+}
+
+static int bpf_test_trace(const char *substr)
+{
+	return bpf_test_trace_maybe(substr, true);
+}
+
+static int bpf_test_no_trace(const char *substr)
+{
+	return bpf_test_trace_maybe(substr, false);
+}
+
+static int basic_test(const char *mount_dir)
+{
+	const char *test_name = "test";
+	const char *test_data = "data";
+
+	int result = TEST_FAILURE;
+	int fuse_dev = -1;
+	char *filename = NULL;
+	int fd = -1;
+	FUSE_DECLARE_DAEMON;
+
+	TESTEQUAL(mount_fuse(mount_dir, -1, -1, &fuse_dev), 0);
+	FUSE_START_DAEMON();
+	if (action) {
+		char data[256];
+
+		filename = concat_file_name(mount_dir, test_name);
+		TESTERR(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+		TESTEQUAL(read(fd, data, strlen(test_data)), strlen(test_data));
+		TESTCOND(!strcmp(data, test_data));
+		TESTSYSCALL(close(fd));
+		fd = -1;
+	} else {
+		DECL_FUSE_IN(open);
+		DECL_FUSE_IN(read);
+		DECL_FUSE_IN(flush);
+		DECL_FUSE_IN(release);
+
+		TESTFUSELOOKUP(test_name, 0);
+		TESTFUSEOUT1(fuse_entry_out, ((struct fuse_entry_out) {
+			.nodeid		= 2,
+			.generation	= 1,
+			.attr.ino = 100,
+			.attr.size = 4,
+			.attr.blksize = 512,
+			.attr.mode = S_IFREG | 0777,
+			}));
+
+		TESTFUSEIN(FUSE_OPEN, open_in);
+		TESTFUSEOUT1(fuse_open_out, ((struct fuse_open_out) {
+			.fh = 1,
+			.open_flags = open_in->flags,
+		}));
+
+		//TESTFUSEINNULL(FUSE_CANONICAL_PATH);
+		//TESTFUSEOUTREAD("ignored", 7);
+
+		TESTFUSEIN(FUSE_READ, read_in);
+		TESTFUSEOUTREAD(test_data, strlen(test_data));
+
+		TESTFUSEIN(FUSE_FLUSH, flush_in);
+		TESTFUSEOUTEMPTY();
+
+		TESTFUSEIN(FUSE_RELEASE, release_in);
+		TESTFUSEOUTEMPTY();
+		exit(TEST_SUCCESS);
+	}
+	FUSE_END_DAEMON();
+	close(fuse_dev);
+	close(fd);
+	free(filename);
+	umount(mount_dir);
+	return result;
+}
+
+static int bpf_test_real(const char *mount_dir)
+{
+	const char *test_name = "real";
+	const char *test_data = "Weebles wobble but they don't fall down";
+	int result = TEST_FAILURE;
+	int bpf_fd = -1;
+	int src_fd = -1;
+	int fuse_dev = -1;
+	char *filename = NULL;
+	int fd = -1;
+	char read_buffer[256] = {};
+	ssize_t bytes_read;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TEST(fd = openat(src_fd, test_name, O_CREAT | O_RDWR | O_CLOEXEC, 0777),
+	     fd != -1);
+	TESTEQUAL(write(fd, test_data, strlen(test_data)), strlen(test_data));
+	TESTSYSCALL(close(fd));
+	fd = -1;
+
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_trace",
+				  &bpf_fd, NULL, NULL), 0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	filename = concat_file_name(mount_dir, test_name);
+	TESTERR(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+	bytes_read = read(fd, read_buffer, strlen(test_data));
+	TESTEQUAL(bytes_read, strlen(test_data));
+	TESTEQUAL(strcmp(test_data, read_buffer), 0);
+	TESTEQUAL(bpf_test_trace("read"), 0);
+
+	result = TEST_SUCCESS;
+out:
+	close(fuse_dev);
+	close(fd);
+	free(filename);
+	umount(mount_dir);
+	close(src_fd);
+	close(bpf_fd);
+	return result;
+}
+
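+/*
+ * The daemon serves the first page of the file (pattern (2, 0)) while
+ * the second page comes straight from the backing file (pattern (1, 1)),
+ * demonstrating a partially backed file.
+ */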
+static int bpf_test_partial(const char *mount_dir)
+{
+	const char *test_name = "partial";
+	int result = TEST_FAILURE;
+	int bpf_fd = -1;
+	int src_fd = -1;
+	int fuse_dev = -1;
+	char *filename = NULL;
+	int fd = -1;
+	FUSE_DECLARE_DAEMON;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(create_file(src_fd, s(test_name), 1, 2), 0);
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_trace",
+				  &bpf_fd, NULL, NULL), 0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	FUSE_START_DAEMON();
+	if (action) {
+		uint8_t data[PAGE_SIZE];
+
+		TEST(filename = concat_file_name(mount_dir, test_name),
+		     filename);
+		TESTERR(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+		TESTEQUAL(read(fd, data, PAGE_SIZE), PAGE_SIZE);
+		TESTEQUAL(bpf_test_trace("read"), 0);
+		TESTCOND(test_buffer(data, PAGE_SIZE, 2, 0));
+		TESTCOND(!test_buffer(data, PAGE_SIZE, 1, 0));
+		TESTEQUAL(read(fd, data, PAGE_SIZE), PAGE_SIZE);
+		TESTCOND(test_buffer(data, PAGE_SIZE, 1, 1));
+		TESTCOND(!test_buffer(data, PAGE_SIZE, 2, 1));
+		TESTSYSCALL(close(fd));
+		fd = -1;
+	} else {
+		DECL_FUSE(open);
+		DECL_FUSE(read);
+		DECL_FUSE(release);
+		uint8_t data[PAGE_SIZE];
+
+		TESTFUSEIN2(FUSE_OPEN | FUSE_POSTFILTER, open_in, open_out);
+		TESTFUSEOUT1(fuse_open_out, ((struct fuse_open_out) {
+			.fh = 1,
+			.open_flags = open_in->flags,
+		}));
+
+		TESTFUSEIN(FUSE_READ, read_in);
+		fill_buffer(data, PAGE_SIZE, 2, 0);
+		TESTFUSEOUTREAD(data, PAGE_SIZE);
+
+		TESTFUSEIN(FUSE_RELEASE, release_in);
+		TESTFUSEOUTEMPTY();
+		exit(TEST_SUCCESS);
+	}
+	FUSE_END_DAEMON();
+	close(fuse_dev);
+	close(fd);
+	free(filename);
+	umount(mount_dir);
+	close(src_fd);
+	close(bpf_fd);
+	return result;
+}
+
+static int bpf_test_attrs(const char *mount_dir)
+{
+	const char *test_name = "partial";
+	int result = TEST_FAILURE;
+	int bpf_fd = -1;
+	int src_fd = -1;
+	int fuse_dev = -1;
+	char *filename = NULL;
+	struct stat st;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(create_file(src_fd, s(test_name), 1, 2), 0);
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_trace",
+				  &bpf_fd, NULL, NULL), 0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	TEST(filename = concat_file_name(mount_dir, test_name), filename);
+	TESTSYSCALL(stat(filename, &st));
+	TESTSYSCALL(chmod(filename, 0111));
+	TESTSYSCALL(stat(filename, &st));
+	TESTEQUAL(st.st_mode & 0777, 0111);
+	TESTSYSCALL(chmod(filename, 0777));
+	TESTSYSCALL(stat(filename, &st));
+	TESTEQUAL(st.st_mode & 0777, 0777);
+	TESTSYSCALL(chown(filename, 5, 6));
+	TESTSYSCALL(stat(filename, &st));
+	TESTEQUAL(st.st_uid, 5);
+	TESTEQUAL(st.st_gid, 6);
+
+	result = TEST_SUCCESS;
+out:
+	close(fuse_dev);
+	free(filename);
+	umount(mount_dir);
+	close(src_fd);
+	close(bpf_fd);
+	return result;
+}
+
+static int bpf_test_readdir(const char *mount_dir)
+{
+	static const char * const names[] = {
+		"real", "partial", "fake", ".", ".."
+	};
+	bool used[ARRAY_SIZE(names)] = { false };
+	int result = TEST_FAILURE;
+	int bpf_fd = -1;
+	int src_fd = -1;
+	int fuse_dev = -1;
+	DIR *dir = NULL;
+	struct dirent *dirent;
+	FUSE_DECLARE_DAEMON;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(create_file(src_fd, s(names[0]), 1, 2), 0);
+	TESTEQUAL(create_file(src_fd, s(names[1]), 1, 2), 0);
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_trace",
+				  &bpf_fd, NULL, NULL), 0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	FUSE_START_DAEMON();
+	if (action) {
+		int i, j;
+
+		TEST(dir = s_opendir(s(mount_dir)), dir);
+		TESTEQUAL(bpf_test_trace("opendir"), 0);
+
+		for (i = 0; i < ARRAY_SIZE(names); ++i) {
+			TEST(dirent = readdir(dir), dirent);
+
+			for (j = 0; j < ARRAY_SIZE(names); ++j)
+				if (!used[j] &&
+				    strcmp(names[j], dirent->d_name) == 0) {
+					used[j] = true;
+					break;
+				}
+			TESTNE(j, ARRAY_SIZE(names));
+		}
+		TEST(dirent = readdir(dir), dirent == NULL);
+		TESTSYSCALL(closedir(dir));
+		dir = NULL;
+		TESTEQUAL(bpf_test_trace("readdir"), 0);
+	} else {
+		struct fuse_in_header *in_header =
+			(struct fuse_in_header *)bytes_in;
+		ssize_t res = read(fuse_dev, bytes_in, sizeof(bytes_in));
+		struct fuse_read_out *read_out =
+			(struct fuse_read_out *) (bytes_in +
+					sizeof(*in_header) +
+					sizeof(struct fuse_read_in));
+		struct fuse_dirent *fuse_dirent =
+			(struct fuse_dirent *) (bytes_in + res);
+
+		TESTGE(res, sizeof(*in_header) + sizeof(struct fuse_read_in));
+		TESTEQUAL(in_header->opcode, FUSE_READDIR | FUSE_POSTFILTER);
+		*fuse_dirent = (struct fuse_dirent) {
+			.ino = 100,
+			.off = 5,
+			.namelen = strlen("fake"),
+			.type = DT_REG,
+		};
+		strcpy((char *)(bytes_in + res + sizeof(*fuse_dirent)), "fake");
+		res += FUSE_DIRENT_ALIGN(sizeof(*fuse_dirent) + strlen("fake") +
+					 1);
+		TESTFUSEDIROUTREAD(read_out,
+				bytes_in +
+				   sizeof(struct fuse_in_header) +
+				   sizeof(struct fuse_read_in) +
+				   sizeof(struct fuse_read_out),
+				res - sizeof(struct fuse_in_header) -
+				    sizeof(struct fuse_read_in) -
+				    sizeof(struct fuse_read_out));
+		res = read(fuse_dev, bytes_in, sizeof(bytes_in));
+		TESTEQUAL(res, sizeof(*in_header) +
+			  sizeof(struct fuse_read_in) +
+			  sizeof(struct fuse_read_out));
+		TESTEQUAL(in_header->opcode, FUSE_READDIR | FUSE_POSTFILTER);
+		TESTFUSEDIROUTREAD(read_out, bytes_in, 0);
+		exit(TEST_SUCCESS);
+	}
+	FUSE_END_DAEMON();
+	closedir(dir);
+	close(fuse_dev);
+	umount(mount_dir);
+	close(src_fd);
+	close(bpf_fd);
+	return result;
+}
+
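+/*
+ * The daemon postfilters FUSE_READDIR and hides every other non-dot
+ * entry, setting the again flag in fuse_read_out to make the kernel
+ * reissue the request; the test expects exactly half of the files
+ * plus "." and "..".
+ */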
+static int bpf_test_redact_readdir(const char *mount_dir)
+{
+	static const char * const names[] = {
+		"f1", "f2", "f3", "f4", "f5", "f6", ".", ".."
+	};
+	bool used[ARRAY_SIZE(names)] = { false };
+	int num_shown = (ARRAY_SIZE(names) - 2) / 2 + 2;
+	int result = TEST_FAILURE;
+	int bpf_fd = -1;
+	int src_fd = -1;
+	int fuse_dev = -1;
+	DIR *dir = NULL;
+	struct dirent *dirent;
+	int i;
+	int count = 0;
+	FUSE_DECLARE_DAEMON;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	for (i = 0; i < ARRAY_SIZE(names) - 2; i++)
+		TESTEQUAL(create_file(src_fd, s(names[i]), 1, 2), 0);
+
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_readdir_redact",
+				  &bpf_fd, NULL, NULL), 0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	FUSE_START_DAEMON();
+	if (action) {
+		int j;
+
+		TEST(dir = s_opendir(s(mount_dir)), dir);
+		errno = 0;
+		while ((dirent = readdir(dir))) {
+
+			for (j = 0; j < ARRAY_SIZE(names); ++j)
+				if (!used[j] &&
+				    strcmp(names[j], dirent->d_name) == 0) {
+					used[j] = true;
+					count++;
+					break;
+				}
+			TESTNE(j, ARRAY_SIZE(names));
+			TESTGE(num_shown, count);
+		}
+		/* readdir() returning NULL with errno still 0 means EOF */
+		TESTEQUAL(errno, 0);
+		TESTEQUAL(count, num_shown);
+		TESTSYSCALL(closedir(dir));
+		dir = NULL;
+	} else {
+		bool skip = true;
+
+		for (i = 0; i < ARRAY_SIZE(names) + 1; i++) {
+			uint8_t bytes_in[FUSE_MIN_READ_BUFFER];
+			uint8_t bytes_out[FUSE_MIN_READ_BUFFER];
+			struct fuse_in_header *in_header =
+				(struct fuse_in_header *)bytes_in;
+			ssize_t res = read(fuse_dev, bytes_in, sizeof(bytes_in));
+			int length_out = 0;
+			uint8_t *pos;
+			uint8_t *dirs_in;
+			uint8_t *dirs_out;
+			struct fuse_read_in *fuse_read_in;
+			struct fuse_read_out *fuse_read_out_in;
+			struct fuse_read_out *fuse_read_out_out;
+			struct fuse_dirent *fuse_dirent_in = NULL;
+			struct fuse_dirent *next = NULL;
+			bool again = false;
+			int dir_ent_len = 0;
+
+			TESTGE(res, sizeof(struct fuse_in_header) +
+					sizeof(struct fuse_read_in) +
+					sizeof(struct fuse_read_out));
+
+			pos = bytes_in + sizeof(struct fuse_in_header);
+			fuse_read_in = (struct fuse_read_in *) pos;
+			pos += sizeof(*fuse_read_in);
+			fuse_read_out_in = (struct fuse_read_out *) pos;
+			pos += sizeof(*fuse_read_out_in);
+			dirs_in = pos;
+
+			pos = bytes_out + sizeof(struct fuse_out_header);
+			fuse_read_out_out = (struct fuse_read_out *) pos;
+			pos += sizeof(*fuse_read_out_out);
+			dirs_out = pos;
+
+			if (dirs_in < bytes_in + res) {
+				bool is_dot;
+
+				fuse_dirent_in = (struct fuse_dirent *) dirs_in;
+				is_dot = (fuse_dirent_in->namelen == 1 &&
+						!strncmp(fuse_dirent_in->name, ".", 1)) ||
+					 (fuse_dirent_in->namelen == 2 &&
+						!strncmp(fuse_dirent_in->name, "..", 2));
+
+				dir_ent_len = FUSE_DIRENT_ALIGN(
+					sizeof(*fuse_dirent_in) +
+					fuse_dirent_in->namelen);
+
+				if (dirs_in + dir_ent_len < bytes_in + res)
+					next = (struct fuse_dirent *)
+							(dirs_in + dir_ent_len);
+
+				if (!skip || is_dot) {
+					memcpy(dirs_out, fuse_dirent_in,
+					       sizeof(struct fuse_dirent) +
+					       fuse_dirent_in->namelen);
+					length_out += dir_ent_len;
+				}
+				again = ((skip && !is_dot) && next);
+
+				if (!is_dot)
+					skip = !skip;
+			}
+
+			fuse_read_out_out->offset = next ? next->off :
+					fuse_read_out_in->offset;
+			fuse_read_out_out->again = again;
+
+			{
+				struct fuse_out_header *out_header =
+					(struct fuse_out_header *)bytes_out;
+
+				*out_header = (struct fuse_out_header) {
+					.len = sizeof(*out_header) +
+					       sizeof(*fuse_read_out_out) +
+					       length_out,
+					.unique = in_header->unique,
+				};
+				TESTEQUAL(write(fuse_dev, bytes_out,
+						out_header->len),
+					  out_header->len);
+			}
+		}
+		exit(TEST_SUCCESS);
+	}
+	FUSE_END_DAEMON();
+	closedir(dir);
+	close(fuse_dev);
+	umount(mount_dir);
+	close(src_fd);
+	close(bpf_fd);
+	return result;
+}
+
+/*
+ * This test exists more to show what classic fuse does with a creat() in
+ * a subdirectory than to test any new functionality.
+ */
+static int bpf_test_creat(const char *mount_dir)
+{
+	const char *dir_name = "show";
+	const char *file_name = "file";
+	int result = TEST_FAILURE;
+	int fuse_dev = -1;
+	int fd = -1;
+	FUSE_DECLARE_DAEMON;
+
+	TESTEQUAL(mount_fuse(mount_dir, -1, -1, &fuse_dev), 0);
+
+	FUSE_START_DAEMON();
+	if (action) {
+		TEST(fd = s_creat(s_path(s_path(s(mount_dir), s(dir_name)),
+					 s(file_name)),
+				  0777),
+		     fd != -1);
+		TESTSYSCALL(close(fd));
+	} else {
+		DECL_FUSE_IN(create);
+		DECL_FUSE_IN(release);
+		DECL_FUSE_IN(flush);
+
+		TESTFUSELOOKUP(dir_name, 0);
+		TESTFUSEOUT1(fuse_entry_out, ((struct fuse_entry_out) {
+			.nodeid		= 3,
+			.generation	= 1,
+			.attr.ino = 100,
+			.attr.size = 4,
+			.attr.blksize = 512,
+			.attr.mode = S_IFDIR | 0777,
+			}));
+
+		TESTFUSELOOKUP(file_name, 0);
+		TESTFUSEOUTERROR(-ENOENT);
+
+		TESTFUSEINEXT(FUSE_CREATE, create_in, strlen(file_name) + 1);
+		TESTFUSEOUT2(fuse_entry_out, ((struct fuse_entry_out) {
+			.nodeid		= 2,
+			.generation	= 1,
+			.attr.ino = 200,
+			.attr.size = 4,
+			.attr.blksize = 512,
+			.attr.mode = S_IFREG,
+			}),
+			fuse_open_out, ((struct fuse_open_out) {
+			.fh = 1,
+			.open_flags = create_in->flags,
+			}));
+
+		//TESTFUSEINNULL(FUSE_CANONICAL_PATH);
+		//TESTFUSEOUTREAD("ignored", 7);
+
+		TESTFUSEIN(FUSE_FLUSH, flush_in);
+		TESTFUSEOUTEMPTY();
+
+		TESTFUSEIN(FUSE_RELEASE, release_in);
+		TESTFUSEOUTEMPTY();
+		exit(TEST_SUCCESS);
+	}
+	FUSE_END_DAEMON();
+	close(fuse_dev);
+	umount(mount_dir);
+	return result;
+}
+
+static int bpf_test_hidden_entries(const char *mount_dir)
+{
+	static const char * const dir_names[] = {
+		"show",
+		"hide",
+	};
+	const char *file_name = "file";
+	const char *data = "The quick brown fox jumps over the lazy dog\n";
+	int result = TEST_FAILURE;
+	int src_fd = -1;
+	int bpf_fd = -1;
+	int fuse_dev = -1;
+	int fd = -1;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTSYSCALL(mkdirat(src_fd, dir_names[0], 0777));
+	TESTSYSCALL(mkdirat(src_fd, dir_names[1], 0777));
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_hidden",
+				  &bpf_fd, NULL, NULL), 0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	TEST(fd = s_creat(s_path(s_path(s(mount_dir), s(dir_names[0])),
+				 s(file_name)),
+			  0777),
+	     fd != -1);
+	TESTSYSCALL(fallocate(fd, 0, 0, 4096));
+	TESTEQUAL(write(fd, data, strlen(data)), strlen(data));
+	TESTSYSCALL(close(fd));
+	TESTEQUAL(bpf_test_trace("Create"), 0);
+
+	result = TEST_SUCCESS;
+out:
+	close(fuse_dev);
+	umount(mount_dir);
+	close(bpf_fd);
+	close(src_fd);
+	return result;
+}
+
+static int bpf_test_dir(const char *mount_dir)
+{
+	const char *dir_name = "dir";
+	int result = TEST_FAILURE;
+	int src_fd = -1;
+	int bpf_fd = -1;
+	int fuse_dev = -1;
+	struct stat st;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_trace",
+				  &bpf_fd, NULL, NULL), 0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	TESTSYSCALL(s_mkdir(s_path(s(mount_dir), s(dir_name)), 0777));
+	TESTEQUAL(bpf_test_trace("mkdir"), 0);
+	TESTSYSCALL(s_stat(s_path(s(ft_src), s(dir_name)), &st));
+	TESTSYSCALL(s_rmdir(s_path(s(mount_dir), s(dir_name))));
+	TESTEQUAL(s_stat(s_path(s(ft_src), s(dir_name)), &st), -1);
+	TESTEQUAL(errno, ENOENT);
+	result = TEST_SUCCESS;
+out:
+	close(fuse_dev);
+	umount(mount_dir);
+	close(bpf_fd);
+	close(src_fd);
+	return result;
+}
+
+static int bpf_test_file(const char *mount_dir, bool close_first)
+{
+	const char *file_name = "real";
+	int result = TEST_FAILURE;
+	int src_fd = -1;
+	int bpf_fd = -1;
+	int fuse_dev = -1;
+	int fd = -1;
+	struct stat st;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_trace",
+			  &bpf_fd, NULL, NULL), 0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	TEST(fd = s_creat(s_path(s(mount_dir), s(file_name)),
+			  0777),
+	     fd != -1);
+	TESTEQUAL(bpf_test_trace("Create"), 0);
+	if (close_first) {
+		TESTSYSCALL(close(fd));
+		fd = -1;
+	}
+	TESTSYSCALL(s_stat(s_path(s(ft_src), s(file_name)), &st));
+	TESTSYSCALL(s_unlink(s_path(s(mount_dir), s(file_name))));
+	TESTEQUAL(bpf_test_trace("unlink"), 0);
+	TESTEQUAL(s_stat(s_path(s(ft_src), s(file_name)), &st), -1);
+	TESTEQUAL(errno, ENOENT);
+	if (!close_first) {
+		TESTSYSCALL(close(fd));
+		fd = -1;
+	}
+	result = TEST_SUCCESS;
+out:
+	close(fd);
+	close(fuse_dev);
+	umount(mount_dir);
+	close(bpf_fd);
+	close(src_fd);
+	return result;
+}
+
+static int bpf_test_file_early_close(const char *mount_dir)
+{
+	return bpf_test_file(mount_dir, true);
+}
+
+static int bpf_test_file_late_close(const char *mount_dir)
+{
+	return bpf_test_file(mount_dir, false);
+}
+
+static int bpf_test_alter_errcode_bpf(const char *mount_dir)
+{
+	const char *dir_name = "dir";
+	int result = TEST_FAILURE;
+	int src_fd = -1;
+	int bpf_fd = -1;
+	int fuse_dev = -1;
+	struct stat st;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_error",
+				  &bpf_fd, NULL, NULL), 0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	TESTSYSCALL(s_mkdir(s_path(s(mount_dir), s(dir_name)), 0777));
+	//TESTEQUAL(bpf_test_trace("mkdir"), 0);
+	TESTSYSCALL(s_stat(s_path(s(ft_src), s(dir_name)), &st));
+	TESTEQUAL(s_mkdir(s_path(s(mount_dir), s(dir_name)), 0777), -1);
+	TESTEQUAL(errno, EPERM);
+	TESTSYSCALL(s_rmdir(s_path(s(mount_dir), s(dir_name))));
+	TESTEQUAL(s_stat(s_path(s(ft_src), s(dir_name)), &st), -1);
+	TESTEQUAL(errno, ENOENT);
+	result = TEST_SUCCESS;
+out:
+	close(fuse_dev);
+	umount(mount_dir);
+	close(bpf_fd);
+	close(src_fd);
+	return result;
+}
+
+static int bpf_test_alter_errcode_userspace(const char *mount_dir)
+{
+	const char *dir_name = "doesnotexist";
+	int result = TEST_FAILURE;
+	int src_fd = -1;
+	int bpf_fd = -1;
+	int fuse_dev = -1;
+	FUSE_DECLARE_DAEMON;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_error",
+				  &bpf_fd, NULL, NULL), 0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	FUSE_START_DAEMON();
+	if (action) {
+		TESTEQUAL(s_unlink(s_path(s(mount_dir), s(dir_name))),
+		     -1);
+		TESTEQUAL(errno, ENOMEM);
+	} else {
+		TESTFUSELOOKUP("doesnotexist", FUSE_POSTFILTER);
+		TESTFUSEOUTERROR(-ENOMEM);
+		exit(TEST_SUCCESS);
+	}
+	FUSE_END_DAEMON();
+	close(fuse_dev);
+	umount(mount_dir);
+	close(bpf_fd);
+	close(src_fd);
+	return result;
+}
+
+static int bpf_test_mknod(const char *mount_dir)
+{
+	const char *file_name = "real";
+	int result = TEST_FAILURE;
+	int src_fd = -1;
+	int bpf_fd = -1;
+	int fuse_dev = -1;
+	struct stat st;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_trace",
+				  &bpf_fd, NULL, NULL), 0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	TESTSYSCALL(s_mkfifo(s_path(s(mount_dir), s(file_name)), 0777));
+	TESTEQUAL(bpf_test_trace("mknod"), 0);
+	TESTSYSCALL(s_stat(s_path(s(ft_src), s(file_name)), &st));
+	TESTSYSCALL(s_unlink(s_path(s(mount_dir), s(file_name))));
+	TESTEQUAL(bpf_test_trace("unlink"), 0);
+	TESTEQUAL(s_stat(s_path(s(ft_src), s(file_name)), &st), -1);
+	TESTEQUAL(errno, ENOENT);
+	result = TEST_SUCCESS;
+out:
+	close(fuse_dev);
+	umount(mount_dir);
+	close(bpf_fd);
+	close(src_fd);
+	return result;
+}
+
+static int bpf_test_largedir(const char *mount_dir)
+{
+	const char *show = "show";
+	const int files = 1000;
+
+	int result = TEST_FAILURE;
+	int src_fd = -1;
+	int bpf_fd = -1;
+	int fuse_dev = -1;
+	struct map_relocation *map_relocations = NULL;
+	size_t map_count = 0;
+	FUSE_DECLARE_DAEMON;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(install_elf_bpf("fd_bpf.bpf", "test_daemon",
+			  &bpf_fd, &map_relocations, &map_count), 0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	FUSE_START_DAEMON();
+	if (action) {
+		int i;
+		int fd;
+		DIR *dir = NULL;
+		struct dirent *dirent;
+
+		TESTSYSCALL(s_mkdir(s_path(s(mount_dir), s(show)), 0777));
+		for (i = 0; i < files; ++i) {
+			char filename[NAME_MAX];
+
+			sprintf(filename, "%d", i);
+			TEST(fd = s_creat(s_path(s_path(s(mount_dir), s(show)),
+						 s(filename)), 0777), fd != -1);
+			TESTSYSCALL(close(fd));
+		}
+
+		TEST(dir = s_opendir(s_path(s(mount_dir), s(show))), dir);
+		for (dirent = readdir(dir); dirent; dirent = readdir(dir))
+			;
+		closedir(dir);
+	} else {
+		int i;
+
+		for (i = 0; i < files + 2; ++i) {
+			TESTFUSELOOKUP(show, FUSE_PREFILTER);
+			TESTFUSEOUTREAD(show, 5);
+		}
+		exit(TEST_SUCCESS);
+	}
+	FUSE_END_DAEMON();
+	close(fuse_dev);
+	umount(mount_dir);
+	close(bpf_fd);
+	close(src_fd);
+	return result;
+}
+
+static int bpf_test_link(const char *mount_dir)
+{
+	const char *file_name = "real";
+	const char *link_name = "partial";
+	int result = TEST_FAILURE;
+	int fd = -1;
+	int src_fd = -1;
+	int bpf_fd = -1;
+	int fuse_dev = -1;
+	struct stat st;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_trace", &bpf_fd, NULL,
+				  NULL),
+		  0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	TEST(fd = s_creat(s_path(s(mount_dir), s(file_name)), 0777), fd != -1);
+	TESTEQUAL(bpf_test_trace("Create"), 0);
+	TESTSYSCALL(s_stat(s_path(s(ft_src), s(file_name)), &st));
+
+	TESTSYSCALL(s_link(s_path(s(mount_dir), s(file_name)),
+			   s_path(s(mount_dir), s(link_name))));
+
+	TESTEQUAL(bpf_test_trace("link"), 0);
+	TESTSYSCALL(s_stat(s_path(s(ft_src), s(link_name)), &st));
+
+	TESTSYSCALL(s_unlink(s_path(s(mount_dir), s(link_name))));
+	TESTEQUAL(bpf_test_trace("unlink"), 0);
+	TESTEQUAL(s_stat(s_path(s(ft_src), s(link_name)), &st), -1);
+	TESTEQUAL(errno, ENOENT);
+
+	TESTSYSCALL(s_unlink(s_path(s(mount_dir), s(file_name))));
+	TESTEQUAL(bpf_test_trace("unlink"), 0);
+	TESTEQUAL(s_stat(s_path(s(ft_src), s(file_name)), &st), -1);
+	TESTEQUAL(errno, ENOENT);
+
+	result = TEST_SUCCESS;
+out:
+	close(fd);
+	close(fuse_dev);
+	umount(mount_dir);
+	close(bpf_fd);
+	close(src_fd);
+	return result;
+}
+
+static int bpf_test_symlink(const char *mount_dir)
+{
+	const char *test_name = "real";
+	const char *symlink_name = "partial";
+	const char *test_data = "Weebles wobble but they don't fall down";
+	int result = TEST_FAILURE;
+	int bpf_fd = -1;
+	int src_fd = -1;
+	int fuse_dev = -1;
+	int fd = -1;
+	char read_buffer[256] = {};
+	ssize_t bytes_read;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TEST(fd = openat(src_fd, test_name, O_CREAT | O_RDWR | O_CLOEXEC, 0777),
+	     fd != -1);
+	TESTEQUAL(write(fd, test_data, strlen(test_data)), strlen(test_data));
+	TESTSYSCALL(close(fd));
+	fd = -1;
+
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_trace",
+				  &bpf_fd, NULL, NULL), 0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	TESTSYSCALL(s_symlink(s_path(s(mount_dir), s(test_name)),
+				   s_path(s(mount_dir), s(symlink_name))));
+	TESTEQUAL(bpf_test_trace("symlink"), 0);
+
+	TESTERR(fd = s_open(s_path(s(mount_dir), s(symlink_name)), O_RDONLY | O_CLOEXEC), fd != -1);
+	bytes_read = read(fd, read_buffer, strlen(test_data));
+	TESTEQUAL(bpf_test_trace("readlink"), 0);
+	TESTEQUAL(bytes_read, strlen(test_data));
+	TESTEQUAL(strcmp(test_data, read_buffer), 0);
+
+	result = TEST_SUCCESS;
+out:
+	close(fuse_dev);
+	close(fd);
+	umount(mount_dir);
+	close(src_fd);
+	close(bpf_fd);
+	return result;
+}
+
+static int bpf_test_xattr(const char *mount_dir)
+{
+	static const char file_name[] = "real";
+	static const char xattr_name[] = "user.xattr_test_name";
+	static const char xattr_value[] = "this_is_a_test";
+	const size_t xattr_size = sizeof(xattr_value);
+	char xattr_value_ret[256];
+	ssize_t xattr_size_ret;
+	int result = TEST_FAILURE;
+	int fd = -1;
+	int src_fd = -1;
+	int bpf_fd = -1;
+	int fuse_dev = -1;
+	struct stat st;
+
+	memset(xattr_value_ret, '\0', sizeof(xattr_value_ret));
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_trace", &bpf_fd, NULL,
+				  NULL),
+		  0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	TEST(fd = s_creat(s_path(s(mount_dir), s(file_name)), 0777), fd != -1);
+	TESTEQUAL(bpf_test_trace("Create"), 0);
+	TESTSYSCALL(close(fd));
+
+	TESTSYSCALL(s_stat(s_path(s(ft_src), s(file_name)), &st));
+	TEST(result = s_getxattr(s_path(s(mount_dir), s(file_name)), xattr_name,
+				 xattr_value_ret, sizeof(xattr_value_ret),
+				 &xattr_size_ret),
+	     result == -1);
+	TESTEQUAL(errno, ENODATA);
+	TESTEQUAL(bpf_test_trace("getxattr"), 0);
+
+	TESTSYSCALL(s_listxattr(s_path(s(mount_dir), s(file_name)),
+				xattr_value_ret, sizeof(xattr_value_ret),
+				&xattr_size_ret));
+	TESTEQUAL(bpf_test_trace("listxattr"), 0);
+	TESTEQUAL(xattr_size_ret, 0);
+
+	TESTSYSCALL(s_setxattr(s_path(s(mount_dir), s(file_name)), xattr_name,
+			       xattr_value, xattr_size, 0));
+	TESTEQUAL(bpf_test_trace("setxattr"), 0);
+
+	TESTSYSCALL(s_listxattr(s_path(s(mount_dir), s(file_name)),
+				xattr_value_ret, sizeof(xattr_value_ret),
+				&xattr_size_ret));
+	TESTEQUAL(bpf_test_trace("listxattr"), 0);
+	TESTEQUAL(xattr_size_ret, sizeof(xattr_name));
+	TESTEQUAL(strcmp(xattr_name, xattr_value_ret), 0);
+
+	TESTSYSCALL(s_getxattr(s_path(s(mount_dir), s(file_name)), xattr_name,
+			       xattr_value_ret, sizeof(xattr_value_ret),
+			       &xattr_size_ret));
+	TESTEQUAL(bpf_test_trace("getxattr"), 0);
+	TESTEQUAL(xattr_size, xattr_size_ret);
+	TESTEQUAL(strcmp(xattr_value, xattr_value_ret), 0);
+
+	TESTSYSCALL(s_removexattr(s_path(s(mount_dir), s(file_name)), xattr_name));
+	TESTEQUAL(bpf_test_trace("removexattr"), 0);
+
+	TESTEQUAL(s_getxattr(s_path(s(mount_dir), s(file_name)), xattr_name,
+			       xattr_value_ret, sizeof(xattr_value_ret),
+			       &xattr_size_ret), -1);
+	TESTEQUAL(errno, ENODATA);
+
+	TESTSYSCALL(s_unlink(s_path(s(mount_dir), s(file_name))));
+	TESTEQUAL(bpf_test_trace("unlink"), 0);
+	TESTEQUAL(s_stat(s_path(s(ft_src), s(file_name)), &st), -1);
+	TESTEQUAL(errno, ENOENT);
+
+	result = TEST_SUCCESS;
+out:
+	close(fuse_dev);
+	umount(mount_dir);
+	close(bpf_fd);
+	close(src_fd);
+	return result;
+}
+
+static int bpf_test_set_backing(const char *mount_dir)
+{
+	const char *backing_name = "backing";
+	const char *test_data = "data";
+	const char *test_name = "test";
+
+	int result = TEST_FAILURE;
+	int fuse_dev = -1;
+	int fd = -1;
+	FUSE_DECLARE_DAEMON;
+
+	TESTEQUAL(mount_fuse_no_init(mount_dir, -1, -1, &fuse_dev), 0);
+	FUSE_START_DAEMON();
+	if (action) {
+		char data[256] = {0};
+
+		TESTERR(fd = s_open(s_path(s(mount_dir), s(test_name)),
+				    O_RDONLY | O_CLOEXEC), fd != -1);
+		TESTEQUAL(read(fd, data, strlen(test_data)), strlen(test_data));
+		TESTCOND(!strcmp(data, test_data));
+		TESTSYSCALL(close(fd));
+		fd = -1;
+		TESTSYSCALL(umount(mount_dir));
+	} else {
+		int bpf_fd  = -1;
+		int backing_fd = -1;
+
+		TESTERR(backing_fd = s_creat(s_path(s(ft_src), s(backing_name)), 0777),
+			backing_fd != -1);
+		TESTEQUAL(write(backing_fd, test_data, strlen(test_data)),
+			  strlen(test_data));
+		TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_simple",
+					  &bpf_fd, NULL, NULL), 0);
+
+		TESTFUSEINIT();
+		TESTFUSELOOKUP(test_name, 0);
+		TESTFUSEOUT2(fuse_entry_out, ((struct fuse_entry_out) {0}),
+			     fuse_entry_bpf_out, ((struct fuse_entry_bpf_out) {
+			.backing_action = FUSE_ACTION_REPLACE,
+			.backing_fd = backing_fd,
+			.bpf_action = FUSE_ACTION_REPLACE,
+			.bpf_fd = bpf_fd,
+			}));
+		read(fuse_dev, bytes_in, sizeof(bytes_in));
+		TESTSYSCALL(close(bpf_fd));
+		TESTSYSCALL(close(backing_fd));
+		exit(TEST_SUCCESS);
+	}
+	FUSE_END_DAEMON();
+	close(fuse_dev);
+	close(fd);
+	umount(mount_dir);
+	return result;
+}
+
+static int bpf_test_remove_backing(const char *mount_dir)
+{
+	const char *folder1 = "folder1";
+	const char *folder2 = "folder2";
+	const char *file = "file1";
+	const char *contents1 = "contents1";
+	const char *contents2 = "contents2";
+
+	int result = TEST_FAILURE;
+	int fuse_dev = -1;
+	int fd = -1;
+	int src_fd = -1;
+	int bpf_fd = -1;
+	char data[256] = {0};
+	FUSE_DECLARE_DAEMON;
+
+	/*
+	 * Create folder1/file
+	 *        folder2/file
+	 *
+	 * The test installs a bpf program into the mount.
+	 * The bpf program postfilters the root lookup to the daemon.
+	 * The daemon removes the bpf program and redirects opens on folder1
+	 * to folder2.
+	 * The test opens folder1/file, which is redirected to folder2.
+	 * The test checks that no traces are logged for file and that its
+	 * contents are those of folder2/file.
+	 */
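+	/*
+	 * The redirect below is expressed through fuse_entry_bpf_out:
+	 * bpf_action = FUSE_ACTION_REMOVE drops the filter and
+	 * backing_action = FUSE_ACTION_REPLACE points the node at the new
+	 * backing fd (see the daemon branch).
+	 */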
+	TESTEQUAL(bpf_clear_trace(), 0);
+	TESTSYSCALL(s_mkdir(s_path(s(ft_src), s(folder1)), 0777));
+	TEST(fd = s_creat(s_pathn(3, s(ft_src), s(folder1), s(file)), 0777),
+	     fd != -1);
+	TESTEQUAL(write(fd, contents1, strlen(contents1)), strlen(contents1));
+	TESTSYSCALL(close(fd));
+	TESTSYSCALL(s_mkdir(s_path(s(ft_src), s(folder2)), 0777));
+	TEST(fd = s_creat(s_pathn(3, s(ft_src), s(folder2), s(file)), 0777),
+	     fd != -1);
+	TESTEQUAL(write(fd, contents2, strlen(contents2)), strlen(contents2));
+	TESTSYSCALL(close(fd));
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_passthrough", &bpf_fd,
+				  NULL, NULL), 0);
+	TESTEQUAL(mount_fuse_no_init(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	FUSE_START_DAEMON();
+	if (action) {
+		TESTERR(fd = s_open(s_pathn(3, s(mount_dir), s(folder1),
+					    s(file)),
+				    O_RDONLY | O_CLOEXEC), fd != -1);
+		TESTEQUAL(read(fd, data, sizeof(data)), strlen(contents2));
+		TESTCOND(!strcmp(data, contents2));
+		TESTEQUAL(bpf_test_no_trace("file"), 0);
+		TESTSYSCALL(close(fd));
+		fd = -1;
+		TESTSYSCALL(umount(mount_dir));
+	} else {
+		struct {
+			char name[8];
+			struct fuse_entry_out feo;
+			struct fuse_entry_bpf_out febo;
+		} __packed in;
+		int backing_fd = -1;
+
+		TESTFUSEINIT();
+		TESTFUSEIN(FUSE_LOOKUP | FUSE_POSTFILTER, &in);
+		TEST(backing_fd = s_open(s_path(s(ft_src), s(folder2)),
+				 O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+		     backing_fd != -1);
+		TESTFUSEOUT2(fuse_entry_out, ((struct fuse_entry_out) {0}),
+			     fuse_entry_bpf_out, ((struct fuse_entry_bpf_out) {
+			.bpf_action = FUSE_ACTION_REMOVE,
+			.backing_action = FUSE_ACTION_REPLACE,
+			.backing_fd = backing_fd,
+			}));
+
+		while (read(fuse_dev, bytes_in, sizeof(bytes_in)) != -1)
+			;
+		TESTSYSCALL(close(backing_fd));
+		exit(TEST_SUCCESS);
+	}
+	FUSE_END_DAEMON();
+	close(fuse_dev);
+	close(fd);
+	close(src_fd);
+	close(bpf_fd);
+	umount(mount_dir);
+	return result;
+}
+
+static int bpf_test_dir_rename(const char *mount_dir)
+{
+	const char *dir_name = "dir";
+	const char *dir_name2 = "dir2";
+	int result = TEST_FAILURE;
+	int src_fd = -1;
+	int bpf_fd = -1;
+	int fuse_dev = -1;
+	struct stat st;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_trace",
+				  &bpf_fd, NULL, NULL), 0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	TESTSYSCALL(s_mkdir(s_path(s(mount_dir), s(dir_name)), 0777));
+	TESTEQUAL(bpf_test_trace("mkdir"), 0);
+	TESTSYSCALL(s_stat(s_path(s(ft_src), s(dir_name)), &st));
+	TESTSYSCALL(s_rename(s_path(s(mount_dir), s(dir_name)),
+			     s_path(s(mount_dir), s(dir_name2))));
+	TESTEQUAL(s_stat(s_path(s(ft_src), s(dir_name)), &st), -1);
+	TESTEQUAL(errno, ENOENT);
+	TESTSYSCALL(s_stat(s_path(s(ft_src), s(dir_name2)), &st));
+	result = TEST_SUCCESS;
+out:
+	close(fuse_dev);
+	umount(mount_dir);
+	close(bpf_fd);
+	close(src_fd);
+	return result;
+}
+
+static int bpf_test_file_rename(const char *mount_dir)
+{
+	const char *dir = "dir";
+	const char *file1 = "file1";
+	const char *file2 = "file2";
+	int result = TEST_FAILURE;
+	int src_fd = -1;
+	int bpf_fd = -1;
+	int fuse_dev = -1;
+	int fd = -1;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_trace",
+				  &bpf_fd, NULL, NULL), 0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	TESTSYSCALL(s_mkdir(s_path(s(mount_dir), s(dir)), 0777));
+	TEST(fd = s_creat(s_pathn(3, s(mount_dir), s(dir), s(file1)), 0777),
+	     fd != -1);
+	TESTSYSCALL(s_rename(s_pathn(3, s(mount_dir), s(dir), s(file1)),
+			     s_pathn(3, s(mount_dir), s(dir), s(file2))));
+	result = TEST_SUCCESS;
+out:
+	close(fd);
+	umount(mount_dir);
+	close(fuse_dev);
+	close(bpf_fd);
+	close(src_fd);
+	return result;
+}
+
+static int mmap_test(const char *mount_dir)
+{
+	const char *file = "file";
+	int result = TEST_FAILURE;
+	int src_fd = -1;
+	int fuse_dev = -1;
+	int fd = -1;
+	char *addr = NULL;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(mount_fuse(mount_dir, -1, src_fd, &fuse_dev), 0);
+	TEST(fd = s_open(s_path(s(mount_dir), s(file)),
+			 O_CREAT | O_RDWR | O_CLOEXEC, 0777),
+	     fd != -1);
+	/* size the file so the 4096-byte mapping below is fully backed */
+	TESTSYSCALL(fallocate(fd, 0, 0, 4096));
+	TEST(addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0),
+	     addr != (void *) -1);
+	memset(addr, 'a', 4096);
+
+	result = TEST_SUCCESS;
+out:
+	munmap(addr, 4096);
+	close(fd);
+	umount(mount_dir);
+	close(fuse_dev);
+	close(src_fd);
+	return result;
+}
+
+static int readdir_perms_test(const char *mount_dir)
+{
+	int result = TEST_FAILURE;
+	struct __user_cap_header_struct uchs = { _LINUX_CAPABILITY_VERSION_3 };
+	struct __user_cap_data_struct ucds[2];
+	int src_fd = -1;
+	int fuse_dev = -1;
+	DIR *dir = NULL;
+
+	/* Must remove capabilities for this test. */
+	TESTSYSCALL(syscall(SYS_capget, &uchs, ucds));
+	ucds[0].effective &= ~(1 << CAP_DAC_OVERRIDE | 1 << CAP_DAC_READ_SEARCH);
+	TESTSYSCALL(syscall(SYS_capset, &uchs, ucds));
+
+	/* This is what we are testing in fuseland. First test without fuse, */
+	TESTSYSCALL(mkdir("test", 0111));
+	TEST(dir = opendir("test"), dir == NULL);
+	closedir(dir);
+	dir = NULL;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(mount_fuse(mount_dir, -1, src_fd, &fuse_dev), 0);
+
+	TESTSYSCALL(s_mkdir(s_path(s(mount_dir), s("test")), 0111));
+	TEST(dir = s_opendir(s_path(s(mount_dir), s("test"))), dir == NULL);
+
+	result = TEST_SUCCESS;
+out:
+	ucds[0].effective |= 1 << CAP_DAC_OVERRIDE | 1 << CAP_DAC_READ_SEARCH;
+	syscall(SYS_capset, &uchs, ucds);
+
+	closedir(dir);
+	s_rmdir(s_path(s(mount_dir), s("test")));
+	umount(mount_dir);
+	close(fuse_dev);
+	close(src_fd);
+	rmdir("test");
+	return result;
+}
+
+static int inotify_test(const char *mount_dir)
+{
+	int result = TEST_FAILURE;
+	int src_fd = -1;
+	int fuse_dev = -1;
+	struct s dir;
+	int inotify_fd = -1;
+	int watch;
+	int fd = -1;
+	char buffer[sizeof(struct inotify_event) + NAME_MAX + 1];
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(mount_fuse(mount_dir, -1, src_fd, &fuse_dev), 0);
+
+	TEST(inotify_fd = inotify_init1(IN_CLOEXEC), inotify_fd != -1);
+	dir = s_path(s(mount_dir), s("dir"));
+	TESTSYSCALL(mkdir(dir.s, 0777));
+	TEST(watch = inotify_add_watch(inotify_fd, dir.s, IN_CREATE),
+	     watch != -1);
+	TEST(fd = s_creat(s_path(s(ft_src), s("dir/file")), 0777), fd != -1);
+	// The read returns two struct lengths: the name "file" (plus its NUL)
+	// is padded up to the next multiple of sizeof(struct inotify_event)
+	TESTEQUAL(read(inotify_fd, &buffer, sizeof(buffer)),
+		  sizeof(struct inotify_event) * 2);
+
+	result = TEST_SUCCESS;
+out:
+	close(fd);
+	s_unlink(s_path(s(ft_src), s("dir/file")));
+	close(inotify_fd);
+	rmdir(dir.s);
+	free(dir.s);
+	umount(mount_dir);
+	close(fuse_dev);
+	close(src_fd);
+	return result;
+}
+
+static int bpf_test_statfs(const char *mount_dir)
+{
+	int result = TEST_FAILURE;
+	int src_fd = -1;
+	int bpf_fd = -1;
+	int fuse_dev = -1;
+	int fd = -1;
+	struct statfs st;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_trace",
+				  &bpf_fd, NULL, NULL), 0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	TESTSYSCALL(s_statfs(s(mount_dir), &st));
+	TESTEQUAL(bpf_test_trace("statfs"), 0);
+	TESTEQUAL(st.f_type, 0x65735546);	/* FUSE_SUPER_MAGIC */
+	result = TEST_SUCCESS;
+out:
+	close(fd);
+	umount(mount_dir);
+	close(fuse_dev);
+	close(bpf_fd);
+	close(src_fd);
+	return result;
+}
+
+static int bpf_test_lseek(const char *mount_dir)
+{
+	const char *file = "real";
+	const char *test_data = "data";
+	int result = TEST_FAILURE;
+	int src_fd = -1;
+	int bpf_fd = -1;
+	int fuse_dev = -1;
+	int fd = -1;
+
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TEST(fd = openat(src_fd, file, O_CREAT | O_RDWR | O_CLOEXEC, 0777),
+	     fd != -1);
+	TESTEQUAL(write(fd, test_data, strlen(test_data)), strlen(test_data));
+	TESTSYSCALL(close(fd));
+	fd = -1;
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_trace",
+				  &bpf_fd, NULL, NULL), 0);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+
+	TEST(fd = s_open(s_path(s(mount_dir), s(file)), O_RDONLY | O_CLOEXEC),
+	     fd != -1);
+	TESTEQUAL(lseek(fd, 3, SEEK_SET), 3);
+	TESTEQUAL(bpf_test_trace("lseek"), 0);
+	TESTEQUAL(lseek(fd, 5, SEEK_END), 9);
+	TESTEQUAL(bpf_test_trace("lseek"), 0);
+	TESTEQUAL(lseek(fd, 1, SEEK_CUR), 10);
+	TESTEQUAL(bpf_test_trace("lseek"), 0);
+	TESTEQUAL(lseek(fd, 1, SEEK_DATA), 1);
+	TESTEQUAL(bpf_test_trace("lseek"), 0);
+	result = TEST_SUCCESS;
+out:
+	close(fd);
+	umount(mount_dir);
+	close(fuse_dev);
+	close(bpf_fd);
+	close(src_fd);
+	return result;
+}
+
+/*
+ * State:
+ * Original: dst/folder1/content.txt
+ *                  ^
+ *                  |
+ *                  |
+ * Backing:  src/folder1/content.txt
+ *
+ * Step 1:  open(folder1) - set backing to src/folder1
+ * Check 1: cat(content.txt) - check that the fuse daemon receives no call
+ *                             and the content matches
+ * Step 2:  readdirplus(dst)
+ * Check 2: cat(content.txt) - check that the fuse daemon receives no call
+ *                             and the content matches
+ */
+static int bpf_test_readdirplus_not_overriding_backing(const char *mount_dir)
+{
+	const char *folder1 = "folder1";
+	const char *content_file = "content.txt";
+	const char *content = "hello world";
+
+	int result = TEST_FAILURE;
+	int fuse_dev = -1;
+	int src_fd = -1;
+	int content_fd = -1;
+	FUSE_DECLARE_DAEMON;
+
+	TESTSYSCALL(s_mkdir(s_path(s(ft_src), s(folder1)), 0777));
+	TEST(content_fd = s_creat(s_pathn(3, s(ft_src), s(folder1), s(content_file)), 0777),
+		content_fd != -1);
+	TESTEQUAL(write(content_fd, content, strlen(content)), strlen(content));
+	TESTEQUAL(mount_fuse_no_init(mount_dir, -1, -1, &fuse_dev), 0);
+
+	FUSE_START_DAEMON();
+	if (action) {
+		DIR *open_mount_dir = NULL;
+		struct dirent *mount_dirent;
+		int dst_folder1_fd = -1;
+		int dst_content_fd = -1;
+		char content_buffer[12];
+
+		// Step 1: Lookup folder1
+		TESTERR(dst_folder1_fd = s_open(s_path(s(mount_dir), s(folder1)),
+			O_RDONLY | O_CLOEXEC), dst_folder1_fd != -1);
+
+		// Check 1: Read content file (backed)
+		TESTERR(dst_content_fd =
+			s_open(s_pathn(3, s(mount_dir), s(folder1), s(content_file)),
+			O_RDONLY | O_CLOEXEC), dst_content_fd != -1);
+
+		TESTEQUAL(read(dst_content_fd, content_buffer, strlen(content)),
+			  strlen(content));
+		TESTEQUAL(strncmp(content, content_buffer, strlen(content)), 0);
+
+		TESTSYSCALL(close(dst_content_fd));
+		dst_content_fd = -1;
+		TESTSYSCALL(close(dst_folder1_fd));
+		dst_folder1_fd = -1;
+		memset(content_buffer, 0, strlen(content));
+
+		// Step 2: readdir folder 1
+		TEST(open_mount_dir = s_opendir(s(mount_dir)),
+			open_mount_dir != NULL);
+		TEST(mount_dirent = readdir(open_mount_dir), mount_dirent != NULL);
+		TESTSYSCALL(closedir(open_mount_dir));
+		open_mount_dir = NULL;
+
+		// Check 2: Read content file again (must be backed)
+		TESTERR(dst_content_fd =
+			s_open(s_pathn(3, s(mount_dir), s(folder1), s(content_file)),
+			O_RDONLY | O_CLOEXEC), dst_content_fd != -1);
+
+		TESTEQUAL(read(dst_content_fd, content_buffer, strlen(content)),
+			  strlen(content));
+		TESTEQUAL(strncmp(content, content_buffer, strlen(content)), 0);
+
+		TESTSYSCALL(close(dst_content_fd));
+		dst_content_fd = -1;
+	} else {
+		ssize_t read_size = 0;
+		struct fuse_in_header *in_header = (struct fuse_in_header *)bytes_in;
+		struct fuse_read_out *read_out = NULL;
+		struct fuse_attr attr = {};
+		int backing_fd = -1;
+		DECL_FUSE_IN(open);
+		DECL_FUSE_IN(getattr);
+
+		TESTFUSEINITFLAGS(FUSE_DO_READDIRPLUS | FUSE_READDIRPLUS_AUTO);
+
+		// Step 1: Lookup folder 1 with backing
+		TESTFUSELOOKUP(folder1, 0);
+		TESTSYSCALL(s_fuse_attr(s_path(s(ft_src), s(folder1)), &attr));
+		TEST(backing_fd = s_open(s_path(s(ft_src), s(folder1)),
+					 O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+		     backing_fd != -1);
+		TESTFUSEOUT2(fuse_entry_out, ((struct fuse_entry_out) {
+				.nodeid = attr.ino,
+				.generation = 0,
+				.entry_valid = UINT64_MAX,
+				.attr_valid = UINT64_MAX,
+				.entry_valid_nsec = UINT32_MAX,
+				.attr_valid_nsec = UINT32_MAX,
+				.attr = attr,
+			     }), fuse_entry_bpf_out, ((struct fuse_entry_bpf_out) {
+				.backing_action = FUSE_ACTION_REPLACE,
+				.backing_fd = backing_fd,
+			     }));
+		TESTSYSCALL(close(backing_fd));
+
+		// Step 2: Open root dir
+		TESTFUSEIN(FUSE_OPENDIR, open_in);
+		TESTFUSEOUT1(fuse_open_out, ((struct fuse_open_out) {
+			.fh = 100,
+			.open_flags = open_in->flags
+		}));
+
+		// Step 2: Handle getattr
+		TESTFUSEIN(FUSE_GETATTR, getattr_in);
+		TESTSYSCALL(s_fuse_attr(s(ft_src), &attr));
+		TESTFUSEOUT1(fuse_attr_out, ((struct fuse_attr_out) {
+			.attr_valid = UINT64_MAX,
+			.attr_valid_nsec = UINT32_MAX,
+			.attr = attr
+		}));
+
+		// Step 2: Handle readdirplus
+		TEST(read_size = read(fuse_dev, bytes_in, sizeof(bytes_in)),
+		     read_size > 0);
+		TESTEQUAL(in_header->opcode, FUSE_READDIRPLUS);
+
+		struct fuse_direntplus *dirent_plus =
+			(struct fuse_direntplus *) (bytes_in + read_size);
+		struct fuse_dirent dirent;
+		struct fuse_entry_out entry_out;
+
+		read_out = (struct fuse_read_out *) (bytes_in +
+					sizeof(*in_header) +
+					sizeof(struct fuse_read_in));
+
+		TESTSYSCALL(s_fuse_attr(s_path(s(ft_src), s(folder1)), &attr));
+
+		dirent = (struct fuse_dirent) {
+			.ino = attr.ino,
+			.off = 1,
+			.namelen = strlen(folder1),
+			.type = DT_DIR	/* folder1 was created with mkdir */
+		};
+		entry_out = (struct fuse_entry_out) {
+			.nodeid = attr.ino,
+			.generation = 0,
+			.entry_valid = UINT64_MAX,
+			.attr_valid = UINT64_MAX,
+			.entry_valid_nsec = UINT32_MAX,
+			.attr_valid_nsec = UINT32_MAX,
+			.attr = attr
+		};
+		*dirent_plus = (struct fuse_direntplus) {
+			.dirent = dirent,
+			.entry_out = entry_out
+		};
+
+		strcpy((char *)(bytes_in + read_size + sizeof(*dirent_plus)), folder1);
+		read_size += FUSE_DIRENT_ALIGN(sizeof(*dirent_plus) + strlen(folder1) +
+					1);
+		TESTFUSEDIROUTREAD(read_out,
+				bytes_in +
+				sizeof(struct fuse_in_header) +
+				sizeof(struct fuse_read_in) +
+				sizeof(struct fuse_read_out),
+				read_size - sizeof(struct fuse_in_header) -
+					sizeof(struct fuse_read_in) -
+					sizeof(struct fuse_read_out));
+		exit(TEST_SUCCESS);
+	}
+	FUSE_END_DAEMON();
+	close(fuse_dev);
+	close(content_fd);
+	close(src_fd);
+	umount(mount_dir);
+	return result;
+}
+
+static int bpf_test_no_readdirplus_without_nodeid(const char *mount_dir)
+{
+	const char *folder1 = "folder1";
+	const char *folder2 = "folder2";
+	int result = TEST_FAILURE;
+	int fuse_dev = -1;
+	int src_fd = -1;
+	int content_fd = -1;
+	int bpf_fd = -1;
+	FUSE_DECLARE_DAEMON;
+
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_readdirplus",
+					  &bpf_fd, NULL, NULL), 0);
+	TESTSYSCALL(s_mkdir(s_path(s(ft_src), s(folder1)), 0777));
+	TESTSYSCALL(s_mkdir(s_path(s(ft_src), s(folder2)), 0777));
+	TESTEQUAL(mount_fuse_no_init(mount_dir, -1, -1, &fuse_dev), 0);
+	FUSE_START_DAEMON();
+	if (action) {
+		DIR *open_dir = NULL;
+		struct dirent *dirent;
+
+		// Folder 1: Readdir with no nodeid
+		TEST(open_dir = s_opendir(s_path(s(ft_dst), s(folder1))),
+				open_dir != NULL);
+		TEST(dirent = readdir(open_dir), dirent == NULL);
+		TESTCOND(errno == EINVAL);
+		TESTSYSCALL(closedir(open_dir));
+		open_dir = NULL;
+
+		// Folder 2: Readdir with a nodeid
+		TEST(open_dir = s_opendir(s_path(s(ft_dst), s(folder2))),
+				open_dir != NULL);
+		TEST(dirent = readdir(open_dir), dirent == NULL);
+		TESTCOND(errno == EINVAL);
+		TESTSYSCALL(closedir(open_dir));
+		open_dir = NULL;
+	} else {
+		ssize_t read_size;
+		struct fuse_in_header *in_header = (struct fuse_in_header *)bytes_in;
+		struct fuse_attr attr = {};
+		int backing_fd = -1;
+
+		TESTFUSEINITFLAGS(FUSE_DO_READDIRPLUS | FUSE_READDIRPLUS_AUTO);
+
+		// folder 1: Set 0 as nodeid, Expect READDIR
+		TESTFUSELOOKUP(folder1, 0);
+		TEST(backing_fd = s_open(s_path(s(ft_src), s(folder1)),
+					 O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+		     backing_fd != -1);
+		TESTFUSEOUT2(fuse_entry_out, ((struct fuse_entry_out) {
+				.nodeid = 0,
+				.generation = 0,
+				.entry_valid = UINT64_MAX,
+				.attr_valid = UINT64_MAX,
+				.entry_valid_nsec = UINT32_MAX,
+				.attr_valid_nsec = UINT32_MAX,
+				.attr = attr,
+			     }), fuse_entry_bpf_out, ((struct fuse_entry_bpf_out) {
+				.backing_action = FUSE_ACTION_REPLACE,
+				.backing_fd = backing_fd,
+				.bpf_action = FUSE_ACTION_REPLACE,
+				.bpf_fd = bpf_fd,
+			     }));
+		TESTSYSCALL(close(backing_fd));
+		TEST(read_size = read(fuse_dev, bytes_in, sizeof(bytes_in)), read_size > 0);
+		TESTEQUAL(in_header->opcode, FUSE_READDIR);
+		TESTFUSEOUTERROR(-EINVAL);
+
+		// folder 2: Set 10 as nodeid, Expect READDIRPLUS
+		TESTFUSELOOKUP(folder2, 0);
+		TEST(backing_fd = s_open(s_path(s(ft_src), s(folder2)),
+					 O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+		     backing_fd != -1);
+		TESTFUSEOUT2(fuse_entry_out, ((struct fuse_entry_out) {
+				.nodeid = 10,
+				.generation = 0,
+				.entry_valid = UINT64_MAX,
+				.attr_valid = UINT64_MAX,
+				.entry_valid_nsec = UINT32_MAX,
+				.attr_valid_nsec = UINT32_MAX,
+				.attr = attr,
+			     }), fuse_entry_bpf_out, ((struct fuse_entry_bpf_out) {
+				.backing_action = FUSE_ACTION_REPLACE,
+				.backing_fd = backing_fd,
+				.bpf_action = FUSE_ACTION_REPLACE,
+				.bpf_fd = bpf_fd,
+			     }));
+		TESTSYSCALL(close(backing_fd));
+		TEST(read_size = read(fuse_dev, bytes_in, sizeof(bytes_in)), read_size > 0);
+		TESTEQUAL(in_header->opcode, FUSE_READDIRPLUS);
+		TESTFUSEOUTERROR(-EINVAL);
+		exit(TEST_SUCCESS);
+	}
+	FUSE_END_DAEMON();
+	close(fuse_dev);
+	close(content_fd);
+	close(src_fd);
+	close(bpf_fd);
+	umount(mount_dir);
+	return result;
+}
+
+/*
+ * State:
+ * Original: dst/folder1/content.txt
+ *                  ^
+ *                  |
+ *                  |
+ * Backing:  src/folder1/content.txt
+ *
+ * Step 1:  open(folder1) - lookup folder1 with entry_timeout set to 0
+ * Step 2:  open(folder1) - lookup folder1 again to trigger revalidate, which
+ *                          will set the backing fd
+ *
+ * Check 1: cat(content.txt) - check that the fuse daemon receives no call
+ *                             and the content matches
+ */
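+/*
+ * Note: a lookup issued from revalidate may not change an existing backing
+ * node, so the daemon answers a third lookup once the failed revalidate
+ * forces a fresh one (see the comments in the daemon branch below).
+ */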
+static int bpf_test_revalidate_handle_backing_fd(const char *mount_dir)
+{
+	const char *folder1 = "folder1";
+	const char *content_file = "content.txt";
+	const char *content = "hello world";
+	int result = TEST_FAILURE;
+	int fuse_dev = -1;
+	int src_fd = -1;
+	int content_fd = -1;
+	FUSE_DECLARE_DAEMON;
+
+	TESTSYSCALL(s_mkdir(s_path(s(ft_src), s(folder1)), 0777));
+	TEST(content_fd = s_creat(s_pathn(3, s(ft_src), s(folder1), s(content_file)), 0777),
+		content_fd != -1);
+	TESTEQUAL(write(content_fd, content, strlen(content)), strlen(content));
+	TESTSYSCALL(close(content_fd));
+	content_fd = -1;
+	TESTEQUAL(mount_fuse_no_init(mount_dir, -1, -1, &fuse_dev), 0);
+	FUSE_START_DAEMON();
+	if (action) {
+		int dst_folder1_fd = -1;
+		int dst_content_fd = -1;
+		char content_buffer[9] = {0};
+		// Step 1: Lookup folder1
+		TESTERR(dst_folder1_fd = s_open(s_path(s(mount_dir), s(folder1)),
+			O_RDONLY | O_CLOEXEC), dst_folder1_fd != -1);
+		TESTSYSCALL(close(dst_folder1_fd));
+		dst_folder1_fd = -1;
+		// Step 2: Lookup folder1 again
+		TESTERR(dst_folder1_fd = s_open(s_path(s(mount_dir), s(folder1)),
+			O_RDONLY | O_CLOEXEC), dst_folder1_fd != -1);
+		TESTSYSCALL(close(dst_folder1_fd));
+		dst_folder1_fd = -1;
+		// Check 1: Read content file (must be backed)
+		TESTERR(dst_content_fd =
+			s_open(s_pathn(3, s(mount_dir), s(folder1), s(content_file)),
+			O_RDONLY | O_CLOEXEC), dst_content_fd != -1);
+		TESTEQUAL(read(dst_content_fd, content_buffer, strlen(content)),
+			  strlen(content));
+		TESTEQUAL(strncmp(content, content_buffer, strlen(content)), 0);
+		TESTSYSCALL(close(dst_content_fd));
+		dst_content_fd = -1;
+	} else {
+		struct fuse_attr attr = {};
+		int backing_fd = -1;
+
+		TESTFUSEINITFLAGS(FUSE_DO_READDIRPLUS | FUSE_READDIRPLUS_AUTO);
+		// Step 1: Lookup folder1; set entry_timeout to 0 to trigger
+		// revalidate later
+		TESTFUSELOOKUP(folder1, 0);
+		TESTSYSCALL(s_fuse_attr(s_path(s(ft_src), s(folder1)), &attr));
+		TEST(backing_fd = s_open(s_path(s(ft_src), s(folder1)),
+					 O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+		     backing_fd != -1);
+		TESTFUSEOUT2(fuse_entry_out, ((struct fuse_entry_out) {
+				.nodeid = attr.ino,
+				.generation = 0,
+				.entry_valid = 0,
+				.attr_valid = UINT64_MAX,
+				.entry_valid_nsec = 0,
+				.attr_valid_nsec = UINT32_MAX,
+				.attr = attr,
+			     }), fuse_entry_bpf_out, ((struct fuse_entry_bpf_out) {
+				.backing_action = FUSE_ACTION_REPLACE,
+				.backing_fd = backing_fd,
+			     }));
+		TESTSYSCALL(close(backing_fd));
+		// Step 2: Lookup folder1 as a reaction to the revalidate call.
+		// This attempts to change the backing node, which is not allowed on revalidate
+		TESTFUSELOOKUP(folder1, 0);
+		TESTSYSCALL(s_fuse_attr(s_path(s(ft_src), s(folder1)), &attr));
+		TEST(backing_fd = s_open(s_path(s(ft_src), s(folder1)),
+					 O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+		     backing_fd != -1);
+		TESTFUSEOUT2(fuse_entry_out, ((struct fuse_entry_out) {
+				.nodeid = attr.ino,
+				.generation = 0,
+				.entry_valid = UINT64_MAX,
+				.attr_valid = UINT64_MAX,
+				.entry_valid_nsec = UINT32_MAX,
+				.attr_valid_nsec = UINT32_MAX,
+				.attr = attr,
+			     }), fuse_entry_bpf_out, ((struct fuse_entry_bpf_out) {
+				.backing_action = FUSE_ACTION_REPLACE,
+				.backing_fd = backing_fd,
+			     }));
+		TESTSYSCALL(close(backing_fd));
+
+		// Lookup folder1 as a reaction to failed revalidate
+		TESTFUSELOOKUP(folder1, 0);
+		TESTSYSCALL(s_fuse_attr(s_path(s(ft_src), s(folder1)), &attr));
+		TEST(backing_fd = s_open(s_path(s(ft_src), s(folder1)),
+					 O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+		     backing_fd != -1);
+		TESTFUSEOUT2(fuse_entry_out, ((struct fuse_entry_out) {
+				.nodeid = attr.ino,
+				.generation = 0,
+				.entry_valid = UINT64_MAX,
+				.attr_valid = UINT64_MAX,
+				.entry_valid_nsec = UINT32_MAX,
+				.attr_valid_nsec = UINT32_MAX,
+				.attr = attr,
+			     }), fuse_entry_bpf_out, ((struct fuse_entry_bpf_out) {
+				.backing_action = FUSE_ACTION_REPLACE,
+				.backing_fd = backing_fd,
+			     }));
+		TESTSYSCALL(close(backing_fd));
+		exit(TEST_SUCCESS);
+	}
+	FUSE_END_DAEMON();
+	close(fuse_dev);
+	close(content_fd);
+	close(src_fd);
+	umount(mount_dir);
+	return result;
+}
+
+static int bpf_test_lookup_postfilter(const char *mount_dir)
+{
+	const char *file1_name = "file1";
+	const char *file2_name = "file2";
+	const char *file3_name = "file3";
+	int result = TEST_FAILURE;
+	int bpf_fd = -1;
+	int src_fd = -1;
+	int fuse_dev = -1;
+	int file_fd = -1;
+	FUSE_DECLARE_DAEMON;
+
+	TEST(file_fd = s_creat(s_path(s(ft_src), s(file1_name)), 0777),
+	     file_fd != -1);
+	TESTSYSCALL(close(file_fd));
+	TEST(file_fd = s_creat(s_path(s(ft_src), s(file2_name)), 0777),
+	     file_fd != -1);
+	TESTSYSCALL(close(file_fd));
+	file_fd = -1;
+	TESTEQUAL(install_elf_bpf("test_bpf.bpf", "test_lookup_postfilter",
+					  &bpf_fd, NULL, NULL), 0);
+	TEST(src_fd = open(ft_src, O_DIRECTORY | O_RDONLY | O_CLOEXEC),
+	     src_fd != -1);
+	TESTEQUAL(mount_fuse(mount_dir, bpf_fd, src_fd, &fuse_dev), 0);
+	FUSE_START_DAEMON();
+	if (action) {
+		int fd = -1;
+
+		TESTEQUAL(s_open(s_path(s(mount_dir), s(file1_name)), O_RDONLY),
+			  -1);
+		TESTEQUAL(errno, ENOENT);
+		TEST(fd = s_open(s_path(s(mount_dir), s(file2_name)), O_RDONLY),
+		     fd != -1);
+		TESTSYSCALL(close(fd));
+		TESTEQUAL(s_open(s_path(s(mount_dir), s(file3_name)), O_RDONLY),
+			  -1);
+	} else {
+		struct fuse_in_postfilter_header *in_header =
+				(struct fuse_in_postfilter_header *)bytes_in;
+		struct fuse_entry_out *feo;
+		struct fuse_entry_bpf_out *febo;
+
+		TESTFUSELOOKUP(file1_name, FUSE_POSTFILTER);
+		TESTFUSEOUTERROR(-ENOENT);
+
+		TESTFUSELOOKUP(file2_name, FUSE_POSTFILTER);
+		feo = (struct fuse_entry_out *) (bytes_in +
+			sizeof(struct fuse_in_header) +	strlen(file2_name) + 1);
+		febo = (struct fuse_entry_bpf_out *) ((char *)feo +
+			sizeof(*feo));
+		TESTFUSEOUT2(fuse_entry_out, *feo, fuse_entry_bpf_out, *febo);
+
+		TESTFUSELOOKUP(file3_name, FUSE_POSTFILTER);
+		TESTEQUAL(in_header->error_in, -ENOENT);
+		TESTFUSEOUTERROR(-ENOENT);
+		exit(TEST_SUCCESS);
+	}
+	FUSE_END_DAEMON();
+	close(file_fd);
+	close(fuse_dev);
+	umount(mount_dir);
+	close(src_fd);
+	close(bpf_fd);
+	return result;
+}
+
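+/*
+ * Select tests from a comma-separated list of 1-based numbers and ranges,
+ * e.g. "2,4-6,9-" runs tests 2, 4..6 and 9..tests; an open start ("-6")
+ * begins at 1 and an open end ("4-") runs through the last test.
+ */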
+static void parse_range(char *ranges, bool *run_test, size_t tests)
+{
+	size_t i;
+	char *range;
+
+	for (i = 0; i < tests; ++i)
+		run_test[i] = false;
+
+	range = strtok(ranges, ",");
+	while (range) {
+		char *dash = strchr(range, '-');
+
+		if (dash) {
+			size_t start = 1, end = tests;
+			char *end_ptr;
+
+			if (dash > range) {
+				start = strtol(range, &end_ptr, 10);
+				if (*end_ptr != '-' || start <= 0 || start > tests)
+					ksft_exit_fail_msg("Bad range\n");
+			}
+
+			if (dash[1]) {
+				end = strtol(dash + 1, &end_ptr, 10);
+				if (*end_ptr || end <= start || end > tests)
+					ksft_exit_fail_msg("Bad range\n");
+			}
+
+			for (i = start; i <= end; ++i)
+				run_test[i - 1] = true;
+		} else {
+			char *end;
+			long value = strtol(range, &end, 10);
+
+			if (*end || value <= 0 || value > tests)
+				ksft_exit_fail_msg("Bad range\n");
+			run_test[value - 1] = true;
+		}
+		range = strtok(NULL, ",");
+	}
+}
+
+static int parse_options(int argc, char *const *argv, bool *run_test,
+			 size_t tests)
+{
+	signed char c;
+
+	while ((c = getopt(argc, argv, "f:t:v")) != -1)
+		switch (c) {
+		case 'f':
+			test_options.file = strtol(optarg, NULL, 10);
+			break;
+
+		case 't':
+			parse_range(optarg, run_test, tests);
+			break;
+
+		case 'v':
+			test_options.verbose = true;
+			break;
+
+		default:
+			return -EINVAL;
+		}
+
+	return 0;
+}
+
+struct test_case {
+	int (*pfunc)(const char *dir);
+	const char *name;
+};
+
+static void run_one_test(const char *mount_dir,
+			 const struct test_case *test_case)
+{
+	ksft_print_msg("Running %s\n", test_case->name);
+	if (test_case->pfunc(mount_dir) == TEST_SUCCESS)
+		ksft_test_result_pass("%s\n", test_case->name);
+	else
+		ksft_test_result_fail("%s\n", test_case->name);
+}
+
+int main(int argc, char *argv[])
+{
+	char *mount_dir = NULL;
+	char *src_dir = NULL;
+	int i;
+	int fd, count;
+
+#define MAKE_TEST(test)                                                        \
+	{                                                                      \
+		test, #test                                                    \
+	}
+	const struct test_case cases[] = {
+		MAKE_TEST(basic_test),
+		MAKE_TEST(bpf_test_real),
+		MAKE_TEST(bpf_test_partial),
+		MAKE_TEST(bpf_test_attrs),
+		MAKE_TEST(bpf_test_readdir),
+		MAKE_TEST(bpf_test_creat),
+		MAKE_TEST(bpf_test_hidden_entries),
+		MAKE_TEST(bpf_test_dir),
+		MAKE_TEST(bpf_test_file_early_close),
+		MAKE_TEST(bpf_test_file_late_close),
+		MAKE_TEST(bpf_test_mknod),
+		MAKE_TEST(bpf_test_largedir),
+		MAKE_TEST(bpf_test_link),
+		MAKE_TEST(bpf_test_symlink),
+		MAKE_TEST(bpf_test_xattr),
+		MAKE_TEST(bpf_test_redact_readdir),
+		MAKE_TEST(bpf_test_set_backing),
+		MAKE_TEST(bpf_test_remove_backing),
+		MAKE_TEST(bpf_test_dir_rename),
+		MAKE_TEST(bpf_test_file_rename),
+		MAKE_TEST(bpf_test_alter_errcode_bpf),
+		MAKE_TEST(bpf_test_alter_errcode_userspace),
+		MAKE_TEST(mmap_test),
+		MAKE_TEST(readdir_perms_test),
+		MAKE_TEST(inotify_test),
+		MAKE_TEST(bpf_test_statfs),
+		MAKE_TEST(bpf_test_lseek),
+		MAKE_TEST(bpf_test_readdirplus_not_overriding_backing),
+		MAKE_TEST(bpf_test_no_readdirplus_without_nodeid),
+		MAKE_TEST(bpf_test_revalidate_handle_backing_fd),
+		MAKE_TEST(bpf_test_lookup_postfilter),
+	};
+#undef MAKE_TEST
+
+	bool run_test[ARRAY_SIZE(cases)];
+
+	for (int i = 0; i < ARRAY_SIZE(cases); ++i)
+		run_test[i] = true;
+
+	if (parse_options(argc, argv, run_test, ARRAY_SIZE(cases)))
+		ksft_exit_fail_msg("Bad options\n");
+
+	// Seed randomness pool for testing on QEMU
+	// NOTE - this abuses the concept of randomness - do *not* ever do this
+	// on a machine for production use - the device will think it has good
+	// randomness when it does not.
+	fd = open("/dev/urandom", O_WRONLY | O_CLOEXEC);
+	count = 4096;
+	for (int i = 0; i < 128; ++i)
+		ioctl(fd, RNDADDTOENTCNT, &count);
+	close(fd);
+
+	ksft_print_header();
+
+	if (geteuid() != 0)
+		ksft_print_msg("Not a root, might fail to mount.\n");
+
+	if (tracing_on() != TEST_SUCCESS)
+		ksft_exit_fail_msg("Can't turn on tracing\n");
+
+	src_dir = setup_mount_dir(ft_src);
+	mount_dir = setup_mount_dir(ft_dst);
+	if (src_dir == NULL || mount_dir == NULL)
+		ksft_exit_fail_msg("Can't create a mount dir\n");
+
+	ksft_set_plan(ARRAY_SIZE(run_test));
+
+	for (i = 0; i < ARRAY_SIZE(run_test); ++i)
+		if (run_test[i]) {
+			delete_dir_tree(mount_dir, false);
+			delete_dir_tree(src_dir, false);
+			run_one_test(mount_dir, &cases[i]);
+		} else
+			ksft_cnt.ksft_xskip++;
+
+	umount2(mount_dir, MNT_FORCE);
+	delete_dir_tree(mount_dir, true);
+	delete_dir_tree(src_dir, true);
+	return !ksft_get_fail_cnt() ? ksft_exit_pass() : ksft_exit_fail();
+}
diff --git a/tools/testing/selftests/filesystems/fuse/test_bpf.c b/tools/testing/selftests/filesystems/fuse/test_bpf.c
new file mode 100644
index 0000000..032cb11
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fuse/test_bpf.c
@@ -0,0 +1,507 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+// Copyright (c) 2022 Google LLC
+
+#include "test_fuse_bpf.h"
+
+SEC("test_readdir_redact")
+/* return FUSE_BPF_BACKING to use backing fs, 0 to pass to usermode */
+int readdir_test(struct fuse_bpf_args *fa)
+{
+	switch (fa->opcode) {
+	case FUSE_READDIR | FUSE_PREFILTER: {
+		const struct fuse_read_in *fri = fa->in_args[0].value;
+
+		bpf_printk("readdir %d", fri->fh);
+		return FUSE_BPF_BACKING | FUSE_BPF_POST_FILTER;
+	}
+
+	case FUSE_READDIR | FUSE_POSTFILTER: {
+		const struct fuse_read_in *fri = fa->in_args[0].value;
+
+		bpf_printk("readdir postfilter %x", fri->fh);
+		return FUSE_BPF_USER_FILTER;
+	}
+
+	default:
+		bpf_printk("opcode %d", fa->opcode);
+		return FUSE_BPF_BACKING;
+	}
+}
+
+SEC("test_trace")
+/* return FUSE_BPF_BACKING to use backing fs, 0 to pass to usermode */
+int trace_test(struct fuse_bpf_args *fa)
+{
+	switch (fa->opcode) {
+	case FUSE_LOOKUP | FUSE_PREFILTER: {
+		/* real and partial use backing file */
+		const char *name = fa->in_args[0].value;
+		bool backing = false;
+
+		if (strcmp(name, "real") == 0 || strcmp(name, "partial") == 0)
+			backing = true;
+
+		if (strcmp(name, "dir") == 0)
+			backing = true;
+		if (strcmp(name, "dir2") == 0)
+			backing = true;
+
+		if (strcmp(name, "file1") == 0)
+			backing = true;
+		if (strcmp(name, "file2") == 0)
+			backing = true;
+
+		bpf_printk("lookup %s %d", name, backing);
+		return backing ? FUSE_BPF_BACKING | FUSE_BPF_POST_FILTER : 0;
+	}
+
+	case FUSE_LOOKUP | FUSE_POSTFILTER: {
+		const char *name = fa->in_args[0].value;
+		struct fuse_entry_out *feo = fa->out_args[0].value;
+
+		if (strcmp(name, "real") == 0)
+			feo->nodeid = 5;
+		else if (strcmp(name, "partial") == 0)
+			feo->nodeid = 6;
+
+		bpf_printk("post-lookup %s %d", name, feo->nodeid);
+		return 0;
+	}
+
+	case FUSE_ACCESS | FUSE_PREFILTER: {
+		bpf_printk("Access: %d", fa->nodeid);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_CREATE | FUSE_PREFILTER:
+		bpf_printk("Create: %d", fa->nodeid);
+		return FUSE_BPF_BACKING;
+
+	case FUSE_MKNOD | FUSE_PREFILTER: {
+		const struct fuse_mknod_in *fmi = fa->in_args[0].value;
+		const char *name = fa->in_args[1].value;
+
+		bpf_printk("mknod %s %x %x", name, fmi->rdev | fmi->mode, fmi->umask);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_MKDIR | FUSE_PREFILTER: {
+		const struct fuse_mkdir_in *fmi = fa->in_args[0].value;
+		const char *name = fa->in_args[1].value;
+
+		bpf_printk("mkdir %s %x %x", name, fmi->mode, fmi->umask);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_RMDIR | FUSE_PREFILTER: {
+		const char *name = fa->in_args[0].value;
+
+		bpf_printk("rmdir %s", name);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_RENAME | FUSE_PREFILTER: {
+		const char *oldname = fa->in_args[1].value;
+		const char *newname = fa->in_args[2].value;
+
+		bpf_printk("rename from %s", oldname);
+		bpf_printk("rename to %s", newname);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_RENAME2 | FUSE_PREFILTER: {
+		const struct fuse_rename2_in *fri = fa->in_args[0].value;
+		uint32_t flags = fri->flags;
+		const char *oldname = fa->in_args[1].value;
+		const char *newname = fa->in_args[2].value;
+
+		bpf_printk("rename(%x) from %s", flags, oldname);
+		bpf_printk("rename to %s", newname);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_UNLINK | FUSE_PREFILTER: {
+		const char *name = fa->in_args[0].value;
+
+		bpf_printk("unlink %s", name);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_LINK | FUSE_PREFILTER: {
+		const struct fuse_link_in *fli = fa->in_args[0].value;
+		const char *link_name = fa->in_args[1].value;
+
+		bpf_printk("link %d %s", fli->oldnodeid, link_name);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_SYMLINK | FUSE_PREFILTER: {
+		const char *link_name = fa->in_args[0].value;
+		const char *link_dest = fa->in_args[1].value;
+
+		bpf_printk("symlink from %s", link_name);
+		bpf_printk("symlink to %s", link_dest);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_READLINK | FUSE_PREFILTER: {
+		const char *link_name = fa->in_args[0].value;
+
+		bpf_printk("readlink from", link_name);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_OPEN | FUSE_PREFILTER: {
+		int backing = 0;
+
+		switch (fa->nodeid) {
+		case 5:
+			backing = FUSE_BPF_BACKING;
+			break;
+
+		case 6:
+			backing = FUSE_BPF_BACKING | FUSE_BPF_POST_FILTER;
+			break;
+
+		default:
+			break;
+		}
+
+		bpf_printk("open %d %d", fa->nodeid, backing);
+		return backing;
+	}
+
+	case FUSE_OPEN | FUSE_POSTFILTER:
+		bpf_printk("open postfilter");
+		return FUSE_BPF_USER_FILTER;
+
+	case FUSE_READ | FUSE_PREFILTER: {
+		const struct fuse_read_in *fri = fa->in_args[0].value;
+
+		bpf_printk("read %llu %llu", fri->fh, fri->offset);
+		if (fri->fh == 1 && fri->offset == 0)
+			return 0;
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_GETATTR | FUSE_PREFILTER: {
+		/* real and partial use backing file */
+		int backing = 0;
+
+		switch (fa->nodeid) {
+		case 1:
+		case 5:
+		case 6:
+		/*
+		 * TODO: Find a better solution.
+		 * Case 100 stops clang from compiling this switch into a
+		 * jump table, which bpf rejects.
+		 */
+		case 100:
+			backing = FUSE_BPF_BACKING;
+			break;
+		}
+
+		bpf_printk("getattr %d %d", fa->nodeid, backing);
+		return backing;
+	}
+
+	case FUSE_SETATTR | FUSE_PREFILTER: {
+		/* real and partial use backing file */
+		int backing = 0;
+
+		switch (fa->nodeid) {
+		case 1:
+		case 5:
+		case 6:
+		/* TODO See above */
+		case 100:
+			backing = FUSE_BPF_BACKING;
+			break;
+		}
+
+		bpf_printk("setattr %d %d", fa->nodeid, backing);
+		return backing;
+	}
+
+	case FUSE_OPENDIR | FUSE_PREFILTER: {
+		int backing = 0;
+
+		switch (fa->nodeid) {
+		case 1:
+			backing = FUSE_BPF_BACKING | FUSE_BPF_POST_FILTER;
+			break;
+		}
+
+		bpf_printk("opendir %d %d", fa->nodeid, backing);
+		return backing;
+	}
+
+	case FUSE_OPENDIR | FUSE_POSTFILTER: {
+		struct fuse_open_out *foo = fa->out_args[0].value;
+
+		foo->fh = 2;
+		bpf_printk("opendir postfilter");
+		return 0;
+	}
+
+	case FUSE_READDIR | FUSE_PREFILTER: {
+		const struct fuse_read_in *fri = fa->in_args[0].value;
+		int backing = 0;
+
+		if (fri->fh == 2)
+			backing = FUSE_BPF_BACKING | FUSE_BPF_POST_FILTER;
+
+		bpf_printk("readdir %d %d", fri->fh, backing);
+		return backing;
+	}
+
+	case FUSE_READDIR | FUSE_POSTFILTER: {
+		const struct fuse_read_in *fri = fa->in_args[0].value;
+		int backing = 0;
+
+		if (fri->fh == 2)
+			backing = FUSE_BPF_USER_FILTER | FUSE_BPF_BACKING |
+				  FUSE_BPF_POST_FILTER;
+
+		bpf_printk("readdir postfilter %d %d", fri->fh, backing);
+		return backing;
+	}
+
+	case FUSE_FLUSH | FUSE_PREFILTER: {
+		const struct fuse_flush_in *ffi = fa->in_args[0].value;
+
+		bpf_printk("Flush %d", ffi->fh);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_GETXATTR | FUSE_PREFILTER: {
+		const struct fuse_flush_in *ffi = fa->in_args[0].value;
+		const char *name = fa->in_args[1].value;
+
+		bpf_printk("getxattr %d %s", ffi->fh, name);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_LISTXATTR | FUSE_PREFILTER: {
+		const struct fuse_flush_in *ffi = fa->in_args[0].value;
+		const char *name = fa->in_args[1].value;
+
+		bpf_printk("listxattr %d %s", ffi->fh, name);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_SETXATTR | FUSE_PREFILTER: {
+		const struct fuse_flush_in *ffi = fa->in_args[0].value;
+		const char *name = fa->in_args[1].value;
+		unsigned int size = fa->in_args[2].size;
+
+		bpf_printk("setxattr %d %s %u", ffi->fh, name, size);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_REMOVEXATTR | FUSE_PREFILTER: {
+		const char *name = fa->in_args[0].value;
+
+		bpf_printk("removexattr %s", name);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_CANONICAL_PATH | FUSE_PREFILTER: {
+		bpf_printk("canonical_path");
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_STATFS | FUSE_PREFILTER: {
+		bpf_printk("statfs");
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_LSEEK | FUSE_PREFILTER: {
+		const struct fuse_lseek_in *fli = fa->in_args[0].value;
+
+		bpf_printk("lseek type:%d, offset:%lld", fli->whence, fli->offset);
+		return FUSE_BPF_BACKING;
+	}
+
+	default:
+		bpf_printk("Unknown opcode %d", fa->opcode);
+		return 0;
+	}
+}
+
+SEC("test_hidden")
+int trace_hidden(struct fuse_bpf_args *fa)
+{
+	switch (fa->opcode) {
+	case FUSE_LOOKUP | FUSE_PREFILTER: {
+		const char *name = fa->in_args[0].value;
+
+		bpf_printk("Lookup: %s", name);
+		if (!strcmp(name, "show"))
+			return FUSE_BPF_BACKING;
+		if (!strcmp(name, "hide"))
+			return -ENOENT;
+
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_ACCESS | FUSE_PREFILTER: {
+		bpf_printk("Access: %d", fa->nodeid);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_CREATE | FUSE_PREFILTER:
+		bpf_printk("Create: %d", fa->nodeid);
+		return FUSE_BPF_BACKING;
+
+	case FUSE_WRITE | FUSE_PREFILTER:
+	// TODO: Clang combines similar printk calls, causing BPF to complain
+	//	bpf_printk("Write: %d", fa->nodeid);
+		return FUSE_BPF_BACKING;
+
+	case FUSE_FLUSH | FUSE_PREFILTER: {
+	//	const struct fuse_flush_in *ffi = fa->in_args[0].value;
+
+	//	bpf_printk("Flush %d", ffi->fh);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_RELEASE | FUSE_PREFILTER: {
+	//	const struct fuse_release_in *fri = fa->in_args[0].value;
+
+	//	bpf_printk("Release %d", fri->fh);
+		return FUSE_BPF_BACKING;
+	}
+
+	case FUSE_FALLOCATE | FUSE_PREFILTER:
+	//	bpf_printk("fallocate %d", fa->nodeid);
+		return FUSE_BPF_BACKING;
+
+	case FUSE_CANONICAL_PATH | FUSE_PREFILTER: {
+		return FUSE_BPF_BACKING;
+	}
+	default:
+		bpf_printk("Unknown opcode: %d", fa->opcode);
+		return 0;
+	}
+}
+
+SEC("test_simple")
+int trace_simple(struct fuse_bpf_args *fa)
+{
+	if (fa->opcode & FUSE_PREFILTER)
+		bpf_printk("prefilter opcode: %d",
+			   fa->opcode & FUSE_OPCODE_FILTER);
+	else if (fa->opcode & FUSE_POSTFILTER)
+		bpf_printk("postfilter opcode: %d",
+			   fa->opcode & FUSE_OPCODE_FILTER);
+	else
+		bpf_printk("*** UNKNOWN *** opcode: %d", fa->opcode);
+	return FUSE_BPF_BACKING;
+}
+
+SEC("test_passthrough")
+int trace_daemon(struct fuse_bpf_args *fa)
+{
+	switch (fa->opcode) {
+	case FUSE_LOOKUP | FUSE_PREFILTER: {
+		const char *name = fa->in_args[0].value;
+
+		bpf_printk("Lookup prefilter: %lx %s", fa->nodeid, name);
+		return FUSE_BPF_BACKING | FUSE_BPF_POST_FILTER;
+	}
+
+	case FUSE_LOOKUP | FUSE_POSTFILTER: {
+		const char *name = fa->in_args[0].value;
+		struct fuse_entry_bpf_out *febo = fa->out_args[1].value;
+
+		bpf_printk("Lookup postfilter: %lx %s %lu", fa->nodeid, name);
+		febo->bpf_action = FUSE_ACTION_REMOVE;
+
+		return FUSE_BPF_USER_FILTER;
+	}
+
+	default:
+		if (fa->opcode & FUSE_PREFILTER)
+			bpf_printk("prefilter opcode: %d",
+				   fa->opcode & FUSE_OPCODE_FILTER);
+		else if (fa->opcode & FUSE_POSTFILTER)
+			bpf_printk("postfilter opcode: %d",
+				   fa->opcode & FUSE_OPCODE_FILTER);
+		else
+			bpf_printk("*** UNKNOWN *** opcode: %d", fa->opcode);
+		return FUSE_BPF_BACKING;
+	}
+}
+
+SEC("test_error")
+/* return FUSE_BPF_BACKING to use backing fs, 0 to pass to usermode */
+int error_test(struct fuse_bpf_args *fa)
+{
+	switch (fa->opcode) {
+	case FUSE_MKDIR | FUSE_PREFILTER: {
+		bpf_printk("mkdir");
+		return FUSE_BPF_BACKING | FUSE_BPF_POST_FILTER;
+	}
+	case FUSE_MKDIR | FUSE_POSTFILTER: {
+		bpf_printk("mkdir postfilter");
+		if (fa->error_in == -EEXIST)
+			return -EPERM;
+
+		return 0;
+	}
+
+	case FUSE_LOOKUP | FUSE_PREFILTER: {
+		const char *name = fa->in_args[0].value;
+
+		bpf_printk("lookup prefilter %s", name);
+		return FUSE_BPF_BACKING | FUSE_BPF_POST_FILTER;
+	}
+	case FUSE_LOOKUP | FUSE_POSTFILTER: {
+		const char *name = fa->in_args[0].value;
+
+		bpf_printk("lookup postfilter %s %d", name, fa->error_in);
+		if (strcmp(name, "doesnotexist") == 0/* && fa->error_in == -EEXIST*/) {
+			bpf_printk("lookup postfilter doesnotexist");
+			return FUSE_BPF_USER_FILTER;
+		}
+		bpf_printk("meh");
+		return 0;
+	}
+
+	default:
+		if (fa->opcode & FUSE_PREFILTER)
+			bpf_printk("prefilter opcode: %d",
+				   fa->opcode & FUSE_OPCODE_FILTER);
+		else if (fa->opcode & FUSE_POSTFILTER)
+			bpf_printk("postfilter opcode: %d",
+				   fa->opcode & FUSE_OPCODE_FILTER);
+		else
+			bpf_printk("*** UNKNOWN *** opcode: %d", fa->opcode);
+		return FUSE_BPF_BACKING;
+	}
+}
+
+SEC("test_readdirplus")
+int readdirplus_test(struct fuse_bpf_args *fa)
+{
+	switch (fa->opcode) {
+	case FUSE_READDIR | FUSE_PREFILTER: {
+		return 0;
+	}
+	}
+	return FUSE_BPF_BACKING;
+}
+
+SEC("test_lookup_postfilter")
+int lookuppostfilter_test(struct fuse_bpf_args *fa)
+{
+	switch (fa->opcode) {
+	case FUSE_LOOKUP | FUSE_PREFILTER:
+		return FUSE_BPF_BACKING | FUSE_BPF_POST_FILTER;
+	case FUSE_LOOKUP | FUSE_POSTFILTER:
+		return FUSE_BPF_USER_FILTER;
+	default:
+		return FUSE_BPF_BACKING;
+	}
+}
diff --git a/tools/testing/selftests/filesystems/fuse/test_framework.h b/tools/testing/selftests/filesystems/fuse/test_framework.h
new file mode 100644
index 0000000..efc6f53
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fuse/test_framework.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2021 Google LLC
+ */
+
+#ifndef _TEST_FRAMEWORK_H
+#define _TEST_FRAMEWORK_H
+
+#include <stdbool.h>
+#include <stdio.h>
+#include <linux/compiler.h>
+
+#ifdef __ANDROID__
+static int test_case_pass;
+static int test_case_fail;
+#define ksft_print_msg			printf
+#define ksft_test_result_pass(...)	({test_case_pass++; printf(__VA_ARGS__); })
+#define ksft_test_result_fail(...)	({test_case_fail++; printf(__VA_ARGS__); })
+#define ksft_exit_fail_msg(...)		printf(__VA_ARGS__)
+#define ksft_print_header()
+#define ksft_set_plan(cnt)
+#define ksft_get_fail_cnt()		test_case_fail
+#define ksft_exit_pass()		0
+#define ksft_exit_fail()		1
+#else
+#include <kselftest.h>
+#endif
+
+#define TEST_FAILURE 1
+#define TEST_SUCCESS 0
+
+#define ptr_to_u64(p) ((__u64)p)
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define le16_to_cpu(x)          (x)
+#define le32_to_cpu(x)          (x)
+#define le64_to_cpu(x)          (x)
+#else
+#error Big endian not supported!
+#endif
+
+struct _test_options {
+	int file;
+	bool verbose;
+};
+
+extern struct _test_options test_options;
+
+#define TESTCOND(condition)						\
+	do {								\
+		if (!(condition)) {					\
+			ksft_print_msg("%s failed %d\n",		\
+				       __func__, __LINE__);		\
+			goto out;					\
+		} else if (test_options.verbose)			\
+			ksft_print_msg("%s succeeded %d\n",		\
+				       __func__, __LINE__);		\
+	} while (false)
+
+#define TESTCONDERR(condition)						\
+	do {								\
+		if (!(condition)) {					\
+			ksft_print_msg("%s failed %d\n",		\
+				       __func__, __LINE__);		\
+			ksft_print_msg("Error %d (\"%s\")\n",		\
+				       errno, strerror(errno));		\
+			goto out;					\
+		} else if (test_options.verbose)			\
+			ksft_print_msg("%s succeeded %d\n",		\
+				       __func__, __LINE__);		\
+	} while (false)
+
+#define TEST(statement, condition)					\
+	do {								\
+		statement;						\
+		TESTCOND(condition);					\
+	} while (false)
+
+#define TESTERR(statement, condition)					\
+	do {								\
+		statement;						\
+		TESTCONDERR(condition);					\
+	} while (false)
+
+enum _operator {
+	_eq,
+	_ne,
+	_ge,
+};
+
+static const char * const _operator_name[] = {
+	"==",
+	"!=",
+	">=",
+};
+
+#define _TEST_OPERATOR(name, _type, format_specifier)			\
+static inline int _test_operator_##name(const char *func, int line,	\
+				_type a, _type b, enum _operator o)	\
+{									\
+	bool pass;							\
+	switch (o) {							\
+	case _eq:							\
+		pass = a == b;						\
+		break;							\
+	case _ne:							\
+		pass = a != b;						\
+		break;							\
+	case _ge:							\
+		pass = a >= b;						\
+		break;							\
+	}								\
+									\
+	if (!pass)							\
+		ksft_print_msg("Failed: %s at line %d, "		\
+			       format_specifier " %s "			\
+			       format_specifier	"\n",			\
+			       func, line, a, _operator_name[o], b);	\
+	else if (test_options.verbose)					\
+		ksft_print_msg("Passed: %s at line %d, "		\
+			       format_specifier " %s "			\
+			       format_specifier "\n",			\
+			       func, line, a, _operator_name[o], b);	\
+									\
+	return pass ? TEST_SUCCESS : TEST_FAILURE;			\
+}
+
+_TEST_OPERATOR(i, int, "%d")
+_TEST_OPERATOR(ui, unsigned int, "%u")
+_TEST_OPERATOR(lui, unsigned long, "%lu")
+_TEST_OPERATOR(ss, ssize_t, "%zd")
+_TEST_OPERATOR(vp, void *, "%p")
+_TEST_OPERATOR(cp, char *, "%p")
+
+#define _CALL_TO(_type, name, a, b, o)					\
+	_test_operator_##name(__func__, __LINE__,			\
+				  (_type) (long long) (a),		\
+				  (_type) (long long) (b), o)
+
+#define TESTOPERATOR(a, b, o)						\
+	do {								\
+		if (_Generic((a),					\
+			int : _CALL_TO(int, i, a, b, o),		\
+			unsigned int : _CALL_TO(unsigned int, ui, a, b, o),	\
+			unsigned long : _CALL_TO(unsigned long, lui, a, b, o),	\
+			ssize_t : _CALL_TO(ssize_t, ss, a, b, o),		\
+			void * : _CALL_TO(void *, vp, a, b, o),		\
+			char * : _CALL_TO(char *, cp, a, b, o)		\
+		))							\
+			goto out;					\
+	} while (false)
+
+#define TESTEQUAL(a, b) TESTOPERATOR(a, b, _eq)
+#define TESTNE(a, b) TESTOPERATOR(a, b, _ne)
+#define TESTGE(a, b) TESTOPERATOR(a, b, _ge)
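+
+/*
+ * Illustrative usage: TESTEQUAL(write(fd, data, len), len) prints the
+ * failing comparison and jumps to the caller's "out:" label on mismatch,
+ * so every test function must provide an "out:" cleanup path.
+ */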
+
+/* For testing a syscall that returns 0 on success and sets errno otherwise */
+#define TESTSYSCALL(statement) TESTCONDERR((statement) == 0)
+
+static inline void print_bytes(const void *data, size_t size)
+{
+	const char *bytes = data;
+	int i;
+
+	for (i = 0; i < size; ++i) {
+		if (i % 0x10 == 0)
+			printf("%08x:", i);
+		printf("%02x ", (unsigned int) (unsigned char) bytes[i]);
+		if (i % 0x10 == 0x0f)
+			printf("\n");
+	}
+
+	if (i % 0x10 != 0)
+		printf("\n");
+}
+
+#endif
diff --git a/tools/testing/selftests/filesystems/fuse/test_fuse.h b/tools/testing/selftests/filesystems/fuse/test_fuse.h
new file mode 100644
index 0000000..69dadc9
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fuse/test_fuse.h
@@ -0,0 +1,337 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2021 Google LLC
+ */
+
+#ifndef TEST_FUSE__H
+#define TEST_FUSE__H
+
+#define _GNU_SOURCE
+
+#include "test_framework.h"
+
+#include <dirent.h>
+#include <sys/stat.h>
+#include <sys/statfs.h>
+#include <sys/types.h>
+
+#include <include/uapi/linux/android_fuse.h>
+#include <include/uapi/linux/fuse.h>
+
+#define PAGE_SIZE 4096
+#define FUSE_POSTFILTER 0x20000
+
+extern struct _test_options test_options;
+
+/* Slow but semantically easy string functions */
+
+/*
+ * struct s just wraps a char pointer
+ * It is a pointer to a malloc'd string, or null
+ * All consumers handle null input correctly
+ * All consumers free the string
+ */
+struct s {
+	char *s;
+};
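+
+/*
+ * Illustrative usage (as in fuse_test.c): the s_* helpers consume and free
+ * their struct s arguments, so paths compose in a single expression:
+ *
+ *	fd = s_open(s_path(s(mount_dir), s("file")), O_RDONLY | O_CLOEXEC);
+ */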
+
+struct s s(const char *s1);
+struct s sn(const char *s1, const char *s2);
+int s_cmp(struct s s1, struct s s2);
+struct s s_cat(struct s s1, struct s s2);
+struct s s_splitleft(struct s s1, char c);
+struct s s_splitright(struct s s1, char c);
+struct s s_word(struct s s1, char c, size_t n);
+struct s s_path(struct s s1, struct s s2);
+struct s s_pathn(size_t n, struct s s1, ...);
+int s_link(struct s src_pathname, struct s dst_pathname);
+int s_symlink(struct s src_pathname, struct s dst_pathname);
+int s_mkdir(struct s pathname, mode_t mode);
+int s_rmdir(struct s pathname);
+int s_unlink(struct s pathname);
+int s_open(struct s pathname, int flags, ...);
+int s_openat(int dirfd, struct s pathname, int flags, ...);
+int s_creat(struct s pathname, mode_t mode);
+int s_mkfifo(struct s pathname, mode_t mode);
+int s_stat(struct s pathname, struct stat *st);
+int s_statfs(struct s pathname, struct statfs *st);
+int s_fuse_attr(struct s pathname, struct fuse_attr *fuse_attr_out);
+DIR *s_opendir(struct s pathname);
+int s_getxattr(struct s pathname, const char name[], void *value, size_t size,
+	       ssize_t *ret_size);
+int s_listxattr(struct s pathname, void *list, size_t size, ssize_t *ret_size);
+int s_setxattr(struct s pathname, const char name[], const void *value,
+	       size_t size, int flags);
+int s_removexattr(struct s pathname, const char name[]);
+int s_rename(struct s oldpathname, struct s newpathname);
+
+struct s tracing_folder(void);
+int tracing_on(void);
+
+char *concat_file_name(const char *dir, const char *file);
+char *setup_mount_dir(const char *name);
+int delete_dir_tree(const char *dir_path, bool remove_root);
+
+#define TESTFUSEINNULL(_opcode)						\
+	do {								\
+		struct fuse_in_header *in_header =			\
+				(struct fuse_in_header *)bytes_in;	\
+		ssize_t res = read(fuse_dev, &bytes_in,			\
+			sizeof(bytes_in));				\
+									\
+		TESTEQUAL(in_header->opcode, _opcode);			\
+		TESTEQUAL(res, sizeof(*in_header));			\
+	} while (false)
+
+#define TESTFUSEIN(_opcode, in_struct)					\
+	do {								\
+		struct fuse_in_header *in_header =			\
+				(struct fuse_in_header *)bytes_in;	\
+		ssize_t res = read(fuse_dev, &bytes_in,			\
+			sizeof(bytes_in));				\
+									\
+		TESTEQUAL(in_header->opcode, _opcode);			\
+		TESTEQUAL(res, sizeof(*in_header) + sizeof(*in_struct));\
+	} while (false)
+
+#define TESTFUSEIN2(_opcode, in_struct1, in_struct2)			\
+	do {								\
+		struct fuse_in_header *in_header =			\
+				(struct fuse_in_header *)bytes_in;	\
+		ssize_t res = read(fuse_dev, &bytes_in,			\
+			sizeof(bytes_in));				\
+									\
+		TESTEQUAL(in_header->opcode, _opcode);			\
+		TESTEQUAL(res, sizeof(*in_header) + sizeof(*in_struct1) \
+						+ sizeof(*in_struct2)); \
+		in_struct1 = (void *)(bytes_in + sizeof(*in_header));	\
+		in_struct2 = (void *)(bytes_in + sizeof(*in_header)	\
+				      + sizeof(*in_struct1));		\
+	} while (false)
+
+#define TESTFUSEINEXT(_opcode, in_struct, extra)			\
+	do {								\
+		struct fuse_in_header *in_header =			\
+				(struct fuse_in_header *)bytes_in;	\
+		ssize_t res = read(fuse_dev, &bytes_in,			\
+			sizeof(bytes_in));				\
+									\
+		TESTEQUAL(in_header->opcode, _opcode);			\
+		TESTEQUAL(res,						\
+		       sizeof(*in_header) + sizeof(*in_struct) + extra);\
+	} while (false)
+
+#define TESTFUSEINUNKNOWN()						\
+	do {								\
+		struct fuse_in_header *in_header =			\
+				(struct fuse_in_header *)bytes_in;	\
+		ssize_t res = read(fuse_dev, &bytes_in,			\
+			sizeof(bytes_in));				\
+									\
+		TESTGE(res, sizeof(*in_header));			\
+		TESTEQUAL(in_header->opcode, -1);			\
+	} while (false)
+
+/* Special case lookup since it is asymmetric */
+#define TESTFUSELOOKUP(expected, filter)				\
+	do {								\
+		struct fuse_in_header *in_header =			\
+				(struct fuse_in_header *)bytes_in;	\
+		char *name = (char *) (bytes_in + sizeof(*in_header));	\
+		ssize_t res;						\
+									\
+		TEST(res = read(fuse_dev, &bytes_in, sizeof(bytes_in)),	\
+			  res != -1);					\
+		/* TODO once we handle forgets properly, remove */	\
+		if (in_header->opcode == FUSE_FORGET)			\
+			continue;					\
+		if (in_header->opcode == FUSE_BATCH_FORGET)		\
+			continue;					\
+		TESTGE(res, sizeof(*in_header));			\
+		TESTEQUAL(in_header->opcode,				\
+			FUSE_LOOKUP | filter);				\
+		TESTEQUAL(res,						\
+			  sizeof(*in_header) + strlen(expected) + 1 +	\
+				(filter == FUSE_POSTFILTER ?		\
+				sizeof(struct fuse_entry_out) +		\
+				sizeof(struct fuse_entry_bpf_out) : 0));\
+		TESTCOND(!strcmp(name, expected));			\
+		break;							\
+	} while (true)
+
+#define TESTFUSEOUTEMPTY()						\
+	do {								\
+		struct fuse_in_header *in_header =			\
+				(struct fuse_in_header *)bytes_in;	\
+		struct fuse_out_header *out_header =			\
+			(struct fuse_out_header *)bytes_out;		\
+									\
+		*out_header = (struct fuse_out_header) {		\
+			.len = sizeof(*out_header),			\
+			.unique = in_header->unique,			\
+		};							\
+		TESTEQUAL(write(fuse_dev, bytes_out, out_header->len),	\
+			  out_header->len);				\
+	} while (false)
+
+#define TESTFUSEOUTERROR(errno)						\
+	do {								\
+		struct fuse_in_header *in_header =			\
+				(struct fuse_in_header *)bytes_in;	\
+		struct fuse_out_header *out_header =			\
+			(struct fuse_out_header *)bytes_out;		\
+									\
+		*out_header = (struct fuse_out_header) {		\
+			.len = sizeof(*out_header),			\
+			.error = errno,					\
+			.unique = in_header->unique,			\
+		};							\
+		TESTEQUAL(write(fuse_dev, bytes_out, out_header->len),	\
+			  out_header->len);				\
+	} while (false)
+
+#define TESTFUSEOUTREAD(data, length)					\
+	do {								\
+		struct fuse_in_header *in_header =			\
+				(struct fuse_in_header *)bytes_in;	\
+		struct fuse_out_header *out_header =			\
+			(struct fuse_out_header *)bytes_out;		\
+									\
+		*out_header = (struct fuse_out_header) {		\
+			.len = sizeof(*out_header) + length,		\
+			.unique = in_header->unique,			\
+		};							\
+		memcpy(bytes_out + sizeof(*out_header), data, length);	\
+		TESTEQUAL(write(fuse_dev, bytes_out, out_header->len),	\
+			  out_header->len);				\
+	} while (false)
+
+#define TESTFUSEDIROUTREAD(read_out, data, length)			\
+	do {								\
+		struct fuse_in_header *in_header =			\
+				(struct fuse_in_header *)bytes_in;	\
+		struct fuse_out_header *out_header =			\
+			(struct fuse_out_header *)bytes_out;		\
+									\
+		*out_header = (struct fuse_out_header) {		\
+			.len = sizeof(*out_header) +			\
+			       sizeof(*read_out) + length,		\
+			.unique = in_header->unique,			\
+		};							\
+		memcpy(bytes_out + sizeof(*out_header) +		\
+				sizeof(*read_out), data, length);	\
+		memcpy(bytes_out + sizeof(*out_header),			\
+				read_out, sizeof(*read_out));		\
+		TESTEQUAL(write(fuse_dev, bytes_out, out_header->len),	\
+			  out_header->len);				\
+	} while (false)
+
+#define TESTFUSEOUT1(type1, obj1)					\
+	do {								\
+		*(struct fuse_out_header *) bytes_out			\
+			= (struct fuse_out_header) {			\
+			.len = sizeof(struct fuse_out_header)		\
+				+ sizeof(struct type1),			\
+			.unique = ((struct fuse_in_header *)		\
+				   bytes_in)->unique,			\
+		};							\
+		*(struct type1 *) (bytes_out				\
+			+ sizeof(struct fuse_out_header))		\
+			= obj1;						\
+		TESTEQUAL(write(fuse_dev, bytes_out,			\
+			((struct fuse_out_header *)bytes_out)->len),	\
+			((struct fuse_out_header *)bytes_out)->len);	\
+	} while (false)
+
+#define TESTFUSEOUT2(type1, obj1, type2, obj2)				\
+	do {								\
+		*(struct fuse_out_header *) bytes_out			\
+			= (struct fuse_out_header) {			\
+			.len = sizeof(struct fuse_out_header)		\
+				+ sizeof(struct type1)			\
+				+ sizeof(struct type2),			\
+			.unique = ((struct fuse_in_header *)		\
+				   bytes_in)->unique,			\
+		};							\
+		*(struct type1 *) (bytes_out				\
+			+ sizeof(struct fuse_out_header))		\
+			= obj1;						\
+		*(struct type2 *) (bytes_out				\
+			+ sizeof(struct fuse_out_header)		\
+			+ sizeof(struct type1))				\
+			= obj2;						\
+		TESTEQUAL(write(fuse_dev, bytes_out,			\
+			((struct fuse_out_header *)bytes_out)->len),	\
+			((struct fuse_out_header *)bytes_out)->len);	\
+	} while (false)
+
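+/*
+ * Perform the FUSE handshake: read the FUSE_INIT request, check the
+ * kernel's advertised version, and reply with a fuse_init_out carrying
+ * the given connection flags.
+ */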
+#define TESTFUSEINITFLAGS(fuse_connection_flags)			\
+	do {								\
+		DECL_FUSE_IN(init);					\
+									\
+		TESTFUSEIN(FUSE_INIT, init_in);				\
+		TESTEQUAL(init_in->major, FUSE_KERNEL_VERSION);		\
+		TESTEQUAL(init_in->minor, FUSE_KERNEL_MINOR_VERSION);	\
+		TESTFUSEOUT1(fuse_init_out, ((struct fuse_init_out) {	\
+			.major = FUSE_KERNEL_VERSION,			\
+			.minor = FUSE_KERNEL_MINOR_VERSION,		\
+			.max_readahead = 4096,				\
+			.flags = fuse_connection_flags,			\
+			.max_background = 0,				\
+			.congestion_threshold = 0,			\
+			.max_write = 4096,				\
+			.time_gran = 1000,				\
+			.max_pages = 12,				\
+			.map_alignment = 4096,				\
+		}));							\
+	} while (false)
+
+#define TESTFUSEINIT()							\
+	TESTFUSEINITFLAGS(0)
+
+#define DECL_FUSE_IN(name)						\
+	struct fuse_##name##_in *name##_in =				\
+		(struct fuse_##name##_in *)				\
+		(bytes_in + sizeof(struct fuse_in_header))
+
+#define DECL_FUSE(name)							\
+	struct fuse_##name##_in *name##_in __maybe_unused;		\
+	struct fuse_##name##_out *name##_out __maybe_unused
+
+#define FUSE_DECLARE_DAEMON						\
+	int daemon = -1;						\
+	int status;							\
+	bool action;							\
+	uint8_t bytes_in[FUSE_MIN_READ_BUFFER] __maybe_unused;		\
+	uint8_t bytes_out[FUSE_MIN_READ_BUFFER]	__maybe_unused
+
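+/*
+ * Fork-based test harness: after FUSE_START_DAEMON the child (daemon == 0)
+ * acts as the userspace FUSE daemon while the parent (action == true)
+ * drives the test; FUSE_END_DAEMON reaps the child and checks its status.
+ */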
+#define FUSE_START_DAEMON()						\
+	do {								\
+		TEST(daemon = fork(), daemon != -1);			\
+		action = daemon != 0;					\
+	} while (false)
+
+#define FUSE_END_DAEMON()						\
+	do {								\
+		TESTEQUAL(waitpid(daemon, &status, 0), daemon);		\
+		TESTEQUAL(status, TEST_SUCCESS);			\
+		result = TEST_SUCCESS;					\
+out:									\
+		if (!daemon)						\
+			exit(TEST_FAILURE);				\
+	} while (false)
+
+
+struct map_relocation {
+	char *name;
+	int fd;
+	int value;
+};
+
+int mount_fuse(const char *mount_dir, int bpf_fd, int dir_fd,
+	       int *fuse_dev_ptr);
+int mount_fuse_no_init(const char *mount_dir, int bpf_fd, int dir_fd,
+	       int *fuse_dev_ptr);
+int install_elf_bpf(const char *file, const char *section, int *fd,
+		    struct map_relocation **map_relocations, size_t *map_count);
+#endif
diff --git a/tools/testing/selftests/filesystems/fuse/test_fuse_bpf.h b/tools/testing/selftests/filesystems/fuse/test_fuse_bpf.h
new file mode 100644
index 0000000..9097626
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fuse/test_fuse_bpf.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2022 Google LLC
+ */
+
+#ifndef TEST_FUSE__BPF__H
+#define TEST_FUSE__BPF__H
+
+#define __EXPORTED_HEADERS__
+#define __KERNEL__
+
+#ifdef __ANDROID__
+#include <stdint.h>
+#endif
+
+#include <uapi/linux/types.h>
+#include <uapi/linux/bpf.h>
+#include <uapi/linux/android_fuse.h>
+#include <uapi/linux/fuse.h>
+#include <uapi/linux/errno.h>
+
+#define SEC(NAME) __section(NAME)
+
+struct fuse_bpf_map {
+	int map_type;
+	size_t key_size;
+	size_t value_size;
+	int max_entries;
+};
+
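+/*
+ * Minimal BPF helper declarations; the integer constants are the helper
+ * IDs from enum bpf_func_id (e.g. BPF_FUNC_map_lookup_elem == 1,
+ * BPF_FUNC_trace_printk == 6).
+ */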
+static void *(*bpf_map_lookup_elem)(struct fuse_bpf_map *map, void *key)
+	= (void *) 1;
+
+static void *(*bpf_map_update_elem)(struct fuse_bpf_map *map, void *key,
+				    void *value, int flags)
+	= (void *) 2;
+
+static long (*bpf_trace_printk)(const char *fmt, __u32 fmt_size, ...)
+	= (void *) 6;
+
+static long (*bpf_get_current_pid_tgid)(void)
+	= (void *) 14;
+
+static long (*bpf_get_current_uid_gid)(void)
+	= (void *) 15;
+
+#define bpf_printk(fmt, ...)					\
+	({							\
+		char ____fmt[] = fmt;				\
+		bpf_trace_printk(____fmt, sizeof(____fmt),	\
+				 ##__VA_ARGS__);		\
+	})
+
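+/* Equality-only strcmp: returns 0 on match, -1 otherwise (never orders). */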
+SEC("dummy") inline int strcmp(const char *a, const char *b)
+{
+	int i;
+
+	for (i = 0; i < __builtin_strlen(b) + 1; ++i)
+		if (a[i] != b[i])
+			return -1;
+
+	return 0;
+}
+
+#endif
diff --git a/tools/testing/selftests/filesystems/incfs/.gitignore b/tools/testing/selftests/filesystems/incfs/.gitignore
new file mode 100644
index 0000000..f0e3cd9
--- /dev/null
+++ b/tools/testing/selftests/filesystems/incfs/.gitignore
@@ -0,0 +1,3 @@
+incfs_test
+incfs_stress
+incfs_perf
diff --git a/tools/testing/selftests/filesystems/incfs/Makefile b/tools/testing/selftests/filesystems/incfs/Makefile
new file mode 100644
index 0000000..5a2f630
--- /dev/null
+++ b/tools/testing/selftests/filesystems/incfs/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+CFLAGS += -D_FILE_OFFSET_BITS=64 -Wall -Werror -I../.. -I../../../../.. -fno-omit-frame-pointer -fsanitize=address -g
+LDLIBS := -llz4 -lzstd -lcrypto -lpthread -fsanitize=address
+TEST_GEN_PROGS := incfs_test incfs_stress incfs_perf
+
+include ../../lib.mk
+
+# Put after include ../../lib.mk since that changes $(TEST_GEN_PROGS)
+# Otherwise you get multiple targets, this becomes the default, and it's a mess
+EXTRA_SOURCES := utils.c
+$(TEST_GEN_PROGS) : $(EXTRA_SOURCES)
diff --git a/tools/testing/selftests/filesystems/incfs/OWNERS b/tools/testing/selftests/filesystems/incfs/OWNERS
new file mode 100644
index 0000000..f26e11c
--- /dev/null
+++ b/tools/testing/selftests/filesystems/incfs/OWNERS
@@ -0,0 +1 @@
+file:/fs/incfs/OWNERS
diff --git a/tools/testing/selftests/filesystems/incfs/incfs_perf.c b/tools/testing/selftests/filesystems/incfs/incfs_perf.c
new file mode 100644
index 0000000..ed36bbd
--- /dev/null
+++ b/tools/testing/selftests/filesystems/incfs/incfs_perf.c
@@ -0,0 +1,717 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Google LLC
+ */
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <lz4.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <time.h>
+#include <ctype.h>
+#include <unistd.h>
+
+#include "utils.h"
+
+#define err_msg(...)                                                           \
+	do {                                                                   \
+		fprintf(stderr, "%s: (%d) ", TAG, __LINE__);                   \
+		fprintf(stderr, __VA_ARGS__);                                  \
+		fprintf(stderr, " (%s)\n", strerror(errno));                   \
+	} while (false)
+
+#define TAG "incfs_perf"
+
+struct options {
+	int blocks; /* -b number of diff block sizes */
+	bool no_cleanup; /* -c don't clean up after */
+	const char *test_dir; /* -d working directory */
+	const char *file_types; /* -f sScCvV */
+	bool no_native; /* -n don't test native files */
+	bool no_random; /* -r don't do random reads */
+	bool no_linear; /* -R random reads only */
+	size_t size; /* -s file size as power of 2 */
+	int tries; /* -t times to run test */
+};
+
+enum flags {
+	SHUFFLE = 1,
+	COMPRESS = 2,
+	VERIFY = 4,
+	LAST_FLAG = 8,
+};
+
+void print_help(void)
+{
+	puts(
+	"incfs_perf. Performance test tool for incfs\n"
+	"\tTests read performance of incfs by creating files of various types,\n"
+	"\tflushing caches and then reading them back.\n"
+	"\tEach file is read with different block sizes and average\n"
+	"\tthroughput in bytes/second and memory usage are reported for\n"
+	"\teach block size\n"
+	"\tNative files are tested for comparison\n"
+	"\tNative files are created in native folder, incfs files are created\n"
+	"\tin src folder which is mounted on dst folder\n"
+	"\n"
+	"\t-bn (default 8) number of different block sizes, starting at 4096\n"
+	"\t                and doubling\n"
+	"\t-c		   don't Clean up - leave files and mount point\n"
+	"\t-d dir          create directories in dir\n"
+	"\t-fs|Sc|Cv|V     restrict which files are created.\n"
+	"\t                s blocks not shuffled, S blocks shuffled\n"
+	"\t                c blocks not compressed, C blocks compressed\n"
+	"\t                v files not verified, V files verified\n"
+	"\t                If a letter is omitted, both options are tested\n"
+	"\t                If no letters are given, incfs is not tested\n"
+	"\t-n              Don't test native files\n"
+	"\t-r              No random reads (sequential only)\n"
+	"\t-R              Random reads only (no sequential)\n"
+	"\t-sn (default 30) File size as power of 2\n"
+	"\t-tn (default 5) Number of tries per file. Results are averaged\n"
+	);
+}
+
+int parse_options(int argc, char *const *argv, struct options *options)
+{
+	signed char c;
+
+	/* Set defaults here */
+	*options = (struct options){
+		.blocks = 8,
+		.test_dir = ".",
+		.tries = 5,
+		.size = 30,
+	};
+
+	/* Load options from command line here */
+	while ((c = getopt(argc, argv, "b:cd:f::hnrRs:t:")) != -1) {
+		switch (c) {
+		case 'b':
+			options->blocks = strtol(optarg, NULL, 10);
+			break;
+
+		case 'c':
+			options->no_cleanup = true;
+			break;
+
+		case 'd':
+			options->test_dir = optarg;
+			break;
+
+		case 'f':
+			if (optarg)
+				options->file_types = optarg;
+			else
+				options->file_types = "sS";
+			break;
+
+		case 'h':
+			print_help();
+			exit(0);
+
+		case 'n':
+			options->no_native = true;
+			break;
+
+		case 'r':
+			options->no_random = true;
+			break;
+
+		case 'R':
+			options->no_linear = true;
+			break;
+
+		case 's':
+			options->size = strtol(optarg, NULL, 10);
+			break;
+
+		case 't':
+			options->tries = strtol(optarg, NULL, 10);
+			break;
+
+		default:
+			print_help();
+			return -EINVAL;
+		}
+	}
+
+	options->size = 1L << options->size;
+
+	return 0;
+}
+
+void shuffle(size_t *buffer, size_t size)
+{
+	size_t i;
+
+	for (i = 0; i < size; ++i) {
+		size_t j = i + random() % (size - i);
+		size_t temp = buffer[i];
+
+		buffer[i] = buffer[j];
+		buffer[j] = temp;
+	}
+}
+
+int get_free_memory(void)
+{
+	FILE *meminfo = fopen("/proc/meminfo", "re");
+	char field[256];
+	char value[256] = {};
+
+	if (!meminfo)
+		return -ENOENT;
+
+	while (fscanf(meminfo, "%255[^:]: %255s kB\n", field, value) == 2) {
+		if (!strcmp(field, "MemFree"))
+			break;
+		*value = 0;
+	}
+
+	fclose(meminfo);
+
+	if (!*value)
+		return -ENOENT;
+
+	return strtol(value, NULL, 10);
+}
+
+int write_data(int cmd_fd, int dir_fd, const char *name, size_t size, int flags)
+{
+	int fd = openat(dir_fd, name, O_RDWR | O_CLOEXEC);
+	struct incfs_permit_fill permit_fill = {
+		.file_descriptor = fd,
+	};
+	int block_count = 1 + (size - 1) / INCFS_DATA_FILE_BLOCK_SIZE;
+	size_t *blocks = malloc(sizeof(size_t) * block_count);
+	int error = 0;
+	size_t i;
+	uint8_t data[INCFS_DATA_FILE_BLOCK_SIZE] = {};
+	uint8_t compressed_data[INCFS_DATA_FILE_BLOCK_SIZE] = {};
+	struct incfs_fill_block fill_block = {
+		.compression = COMPRESSION_NONE,
+		.data_len = sizeof(data),
+		.data = ptr_to_u64(data),
+	};
+
+	if (!blocks) {
+		err_msg("Out of memory");
+		error = -errno;
+		goto out;
+	}
+
+	if (fd == -1) {
+		err_msg("Could not open file for writing %s", name);
+		error = -errno;
+		goto out;
+	}
+
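+	/*
+	 * Filling is only allowed on fds blessed via INCFS_IOC_PERMIT_FILL
+	 * on the command (pending reads) fd; without this, the FILL_BLOCKS
+	 * ioctls below would be rejected.
+	 */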
+	if (ioctl(cmd_fd, INCFS_IOC_PERMIT_FILL, &permit_fill)) {
+		err_msg("Failed to call PERMIT_FILL");
+		error = -errno;
+		goto out;
+	}
+
+	for (i = 0; i < block_count; ++i)
+		blocks[i] = i;
+
+	if (flags & SHUFFLE)
+		shuffle(blocks, block_count);
+
+	if (flags & COMPRESS) {
+		size_t comp_size = LZ4_compress_default(
+			(char *)data, (char *)compressed_data, sizeof(data),
+			ARRAY_SIZE(compressed_data));
+
+		if (comp_size <= 0) {
+			error = -EBADMSG;
+			goto out;
+		}
+		fill_block.compression = COMPRESSION_LZ4;
+		fill_block.data = ptr_to_u64(compressed_data);
+		fill_block.data_len = comp_size;
+	}
+
+	for (i = 0; i < block_count; ++i) {
+		struct incfs_fill_blocks fill_blocks = {
+			.count = 1,
+			.fill_blocks = ptr_to_u64(&fill_block),
+		};
+
+		fill_block.block_index = blocks[i];
+		int written = ioctl(fd, INCFS_IOC_FILL_BLOCKS, &fill_blocks);
+
+		if (written != 1) {
+			error = -errno;
+			err_msg("Failed to write block %lu in file %s", i,
+				name);
+			break;
+		}
+	}
+
+out:
+	free(blocks);
+	close(fd);
+	sync();
+	return error;
+}
+
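+/*
+ * For each block size, drop the page cache, read the whole file (either
+ * sequentially or at shuffled offsets), and report the average throughput
+ * in bytes/second plus the average drop in MemFree (kB) across the reads.
+ */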
+int measure_read_throughput_internal(const char *tag, int dir, const char *name,
+				     const struct options *options, bool random)
+{
+	int block;
+
+	if (random)
+		printf("%32s(random)", tag);
+	else
+		printf("%40s", tag);
+
+	for (block = 0; block < options->blocks; ++block) {
+		size_t buffer_size;
+		char *buffer;
+		int try;
+		double time = 0;
+		double throughput;
+		int memory = 0;
+
+		buffer_size = 1 << (block + 12);
+		buffer = malloc(buffer_size);
+
+		for (try = 0; try < options->tries; ++try) {
+			int err;
+			struct timespec start_time, end_time;
+			off_t i;
+			int fd;
+			size_t offsets_size = options->size / buffer_size;
+			size_t *offsets =
+				malloc(offsets_size * sizeof(*offsets));
+			int start_memory, end_memory;
+
+			if (!offsets) {
+				err_msg("Not enough memory");
+				return -ENOMEM;
+			}
+
+			for (i = 0; i < offsets_size; ++i)
+				offsets[i] = i * buffer_size;
+
+			if (random)
+				shuffle(offsets, offsets_size);
+
+			err = drop_caches();
+			if (err) {
+				err_msg("Failed to drop caches");
+				return err;
+			}
+
+			start_memory = get_free_memory();
+			if (start_memory < 0) {
+				err_msg("Failed to get start memory");
+				return start_memory;
+			}
+
+			fd = openat(dir, name, O_RDONLY | O_CLOEXEC);
+			if (fd == -1) {
+				err_msg("Failed to open file");
+				return -errno;
+			}
+
+			err = clock_gettime(CLOCK_MONOTONIC, &start_time);
+			if (err) {
+				err_msg("Failed to get start time");
+				return err;
+			}
+
+			for (i = 0; i < offsets_size; ++i)
+				if (pread(fd, buffer, buffer_size,
+					  offsets[i]) != buffer_size) {
+					err_msg("Failed to read file");
+					err = -errno;
+					goto fail;
+				}
+
+			err = clock_gettime(CLOCK_MONOTONIC, &end_time);
+			if (err) {
+				err_msg("Failed to get end time");
+				goto fail;
+			}
+
+			end_memory = get_free_memory();
+			if (end_memory < 0) {
+				err_msg("Failed to get end memory");
+				return end_memory;
+			}
+
+			time += end_time.tv_sec - start_time.tv_sec;
+			time += (end_time.tv_nsec - start_time.tv_nsec) / 1e9;
+
+			close(fd);
+			fd = -1;
+			memory += start_memory - end_memory;
+
+fail:
+			free(offsets);
+			close(fd);
+			if (err)
+				return err;
+		}
+
+		throughput = options->size * options->tries / time;
+		printf("%10.3e %10d", throughput, memory / options->tries);
+		free(buffer);
+	}
+
+	printf("\n");
+	return 0;
+}
+
+int measure_read_throughput(const char *tag, int dir, const char *name,
+			    const struct options *options)
+{
+	int err = 0;
+
+	if (!options->no_linear)
+		err = measure_read_throughput_internal(tag, dir, name, options,
+						       false);
+
+	if (!err && !options->no_random)
+		err = measure_read_throughput_internal(tag, dir, name, options,
+						       true);
+	return err;
+}
+
+int test_native_file(int dir, const struct options *options)
+{
+	const char *name = "file";
+	int fd;
+	char buffer[4096] = {};
+	off_t i;
+	int err;
+
+	fd = openat(dir, name, O_CREAT | O_WRONLY | O_CLOEXEC, 0600);
+	if (fd == -1) {
+		err_msg("Could not open native file");
+		return -errno;
+	}
+
+	for (i = 0; i < options->size; i += sizeof(buffer))
+		if (pwrite(fd, buffer, sizeof(buffer), i) != sizeof(buffer)) {
+			err_msg("Failed to write file");
+			err = -errno;
+			goto fail;
+		}
+
+	close(fd);
+	sync();
+	fd = -1;
+
+	err = measure_read_throughput("native", dir, name, options);
+
+fail:
+	close(fd);
+	return err;
+}
+
+struct hash_block {
+	char data[INCFS_DATA_FILE_BLOCK_SIZE];
+};
+
+static struct hash_block *build_mtree(size_t size, char *root_hash,
+				      int *mtree_block_count)
+{
+	char data[INCFS_DATA_FILE_BLOCK_SIZE] = {};
+	const int digest_size = SHA256_DIGEST_SIZE;
+	const int hash_per_block = INCFS_DATA_FILE_BLOCK_SIZE / digest_size;
+	int block_count = 0;
+	int hash_block_count = 0;
+	int total_tree_block_count = 0;
+	int tree_lvl_index[INCFS_MAX_MTREE_LEVELS] = {};
+	int tree_lvl_count[INCFS_MAX_MTREE_LEVELS] = {};
+	int levels_count = 0;
+	int i, level;
+	struct hash_block *mtree;
+
+	if (size == 0)
+		return NULL;
+
+	block_count = 1 + (size - 1) / INCFS_DATA_FILE_BLOCK_SIZE;
+	hash_block_count = block_count;
+	for (i = 0; hash_block_count > 1; i++) {
+		hash_block_count = (hash_block_count + hash_per_block - 1) /
+				   hash_per_block;
+		tree_lvl_count[i] = hash_block_count;
+		total_tree_block_count += hash_block_count;
+	}
+	levels_count = i;
+
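+	/*
+	 * Blocks are laid out root-first: tree_lvl_index[i] is the offset of
+	 * level i within mtree, with level 0 (the leaf hashes) stored last.
+	 */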
+	for (i = 0; i < levels_count; i++) {
+		int prev_lvl_base = (i == 0) ? total_tree_block_count :
+					       tree_lvl_index[i - 1];
+
+		tree_lvl_index[i] = prev_lvl_base - tree_lvl_count[i];
+	}
+
+	*mtree_block_count = total_tree_block_count;
+	mtree = calloc(total_tree_block_count, sizeof(*mtree));
+	if (!mtree)
+		return NULL;
+	/* Build level 0 hashes. */
+	for (i = 0; i < block_count; i++) {
+		int block_index = tree_lvl_index[0] + i / hash_per_block;
+		int block_off = (i % hash_per_block) * digest_size;
+		char *hash_ptr = mtree[block_index].data + block_off;
+
+		sha256(data, INCFS_DATA_FILE_BLOCK_SIZE, hash_ptr);
+	}
+
+	/* Build higher levels of hash tree. */
+	for (level = 1; level < levels_count; level++) {
+		int prev_lvl_base = tree_lvl_index[level - 1];
+		int prev_lvl_count = tree_lvl_count[level - 1];
+
+		for (i = 0; i < prev_lvl_count; i++) {
+			int block_index =
+				i / hash_per_block + tree_lvl_index[level];
+			int block_off = (i % hash_per_block) * digest_size;
+			char *hash_ptr = mtree[block_index].data + block_off;
+
+			sha256(mtree[i + prev_lvl_base].data,
+			       INCFS_DATA_FILE_BLOCK_SIZE, hash_ptr);
+		}
+	}
+
+	/* Calculate root hash from the top block */
+	sha256(mtree[0].data, INCFS_DATA_FILE_BLOCK_SIZE, root_hash);
+
+	return mtree;
+}
+
+static int load_hash_tree(int cmd_fd, int dir, const char *name,
+			  struct hash_block *mtree, int mtree_block_count)
+{
+	int err;
+	int i;
+	int fd;
+	struct incfs_fill_block *fill_block_array =
+		calloc(mtree_block_count, sizeof(struct incfs_fill_block));
+	struct incfs_fill_blocks fill_blocks = {
+		.count = mtree_block_count,
+		.fill_blocks = ptr_to_u64(fill_block_array),
+	};
+	struct incfs_permit_fill permit_fill;
+
+	if (!fill_block_array)
+		return -ENOMEM;
+
+	for (i = 0; i < fill_blocks.count; i++) {
+		fill_block_array[i] = (struct incfs_fill_block){
+			.block_index = i,
+			.data_len = INCFS_DATA_FILE_BLOCK_SIZE,
+			.data = ptr_to_u64(mtree[i].data),
+			.flags = INCFS_BLOCK_FLAGS_HASH
+		};
+	}
+
+	fd = openat(dir, name, O_RDONLY | O_CLOEXEC);
+	if (fd < 0) {
+		err = errno;
+		goto failure;
+	}
+
+	permit_fill.file_descriptor = fd;
+	if (ioctl(cmd_fd, INCFS_IOC_PERMIT_FILL, &permit_fill)) {
+		err_msg("Failed to call PERMIT_FILL");
+		err = -errno;
+		goto failure;
+	}
+
+	err = ioctl(fd, INCFS_IOC_FILL_BLOCKS, &fill_blocks);
+	close(fd);
+	if (err < fill_blocks.count)
+		err = errno;
+	else
+		err = 0;
+
+failure:
+	free(fill_block_array);
+	return err;
+}
+
+int test_incfs_file(int dst_dir, const struct options *options, int flags)
+{
+	int cmd_file = openat(dst_dir, INCFS_PENDING_READS_FILENAME,
+			      O_RDONLY | O_CLOEXEC);
+	int err;
+	char name[4];
+	incfs_uuid_t id;
+	char tag[256];
+
+	snprintf(name, sizeof(name), "%c%c%c",
+		 flags & SHUFFLE ? 'S' : 's',
+		 flags & COMPRESS ? 'C' : 'c',
+		 flags & VERIFY ? 'V' : 'v');
+
+	if (cmd_file == -1) {
+		err_msg("Could not open command file");
+		return -errno;
+	}
+
+	if (flags & VERIFY) {
+		char root_hash[INCFS_MAX_HASH_SIZE];
+		int mtree_block_count;
+		struct hash_block *mtree = build_mtree(options->size, root_hash,
+						       &mtree_block_count);
+
+		if (!mtree) {
+			err_msg("Failed to build hash tree");
+			err = -ENOMEM;
+			goto fail;
+		}
+
+		err = crypto_emit_file(cmd_file, NULL, name, &id, options->size,
+				       root_hash, "add_data");
+
+		if (!err)
+			err = load_hash_tree(cmd_file, dst_dir, name, mtree,
+					     mtree_block_count);
+
+		free(mtree);
+	} else
+		err = emit_file(cmd_file, NULL, name, &id, options->size, NULL);
+
+	if (err) {
+		err_msg("Failed to create file %s", name);
+		goto fail;
+	}
+
+	err = write_data(cmd_file, dst_dir, name, options->size, flags);
+	if (err)
+		goto fail;
+
+	snprintf(tag, sizeof(tag), "incfs%s%s%s",
+		 flags & SHUFFLE ? "(shuffle)" : "",
+		 flags & COMPRESS ? "(compress)" : "",
+		 flags & VERIFY ? "(verify)" : "");
+
+	err = measure_read_throughput(tag, dst_dir, name, options);
+
+fail:
+	close(cmd_file);
+	return err;
+}
+
+bool skip(struct options const *options, int flag, char c)
+{
+	if (!options->file_types)
+		return false;
+
+	if (flag && strchr(options->file_types, tolower(c)))
+		return true;
+
+	if (!flag && strchr(options->file_types, toupper(c)))
+		return true;
+
+	return false;
+}
+
+int main(int argc, char *const *argv)
+{
+	struct options options;
+	int err;
+	const char *native_dir = "native";
+	const char *src_dir = "src";
+	const char *dst_dir = "dst";
+	int native_dir_fd = -1;
+	int src_dir_fd = -1;
+	int dst_dir_fd = -1;
+	int block;
+	int flags;
+
+	err = parse_options(argc, argv, &options);
+	if (err)
+		return err;
+
+	err = chdir(options.test_dir);
+	if (err) {
+		err_msg("Failed to change to %s", options.test_dir);
+		return -errno;
+	}
+
+	/* Clean up any interrupted previous runs */
+	while (!umount(dst_dir))
+		;
+
+	err = remove_dir(native_dir) || remove_dir(src_dir) ||
+	      remove_dir(dst_dir);
+	if (err)
+		return err;
+
+	err = mkdir(native_dir, 0700);
+	if (err) {
+		err_msg("Failed to make directory %s", native_dir);
+		err = -errno;
+		goto cleanup;
+	}
+
+	err = mkdir(src_dir, 0700);
+	if (err) {
+		err_msg("Failed to make directory %s", src_dir);
+		err = -errno;
+		goto cleanup;
+	}
+
+	err = mkdir(dst_dir, 0700);
+	if (err) {
+		err_msg("Failed to make directory %s", dst_dir);
+		err = -errno;
+		goto cleanup;
+	}
+
+	err = mount_fs_opt(dst_dir, src_dir, "readahead=0,rlog_pages=0", 0);
+	if (err) {
+		err_msg("Failed to mount incfs");
+		goto cleanup;
+	}
+
+	native_dir_fd = open(native_dir, O_RDONLY | O_CLOEXEC);
+	src_dir_fd = open(src_dir, O_RDONLY | O_CLOEXEC);
+	dst_dir_fd = open(dst_dir, O_RDONLY | O_CLOEXEC);
+	if (native_dir_fd == -1 || src_dir_fd == -1 || dst_dir_fd == -1) {
+		err_msg("Failed to open native, src or dst dir");
+		err = -errno;
+		goto cleanup;
+	}
+
+	printf("%40s", "");
+	for (block = 0; block < options.blocks; ++block)
+		printf("%21d", 1 << (block + 12));
+	printf("\n");
+
+	if (!err && !options.no_native)
+		err = test_native_file(native_dir_fd, &options);
+
+	for (flags = 0; flags < LAST_FLAG && !err; ++flags) {
+		if (skip(&options, flags & SHUFFLE, 's') ||
+		    skip(&options, flags & COMPRESS, 'c') ||
+		    skip(&options, flags & VERIFY, 'v'))
+			continue;
+		err = test_incfs_file(dst_dir_fd, &options, flags);
+	}
+
+cleanup:
+	close(native_dir_fd);
+	close(src_dir_fd);
+	close(dst_dir_fd);
+	if (!options.no_cleanup) {
+		umount(dst_dir);
+		remove_dir(native_dir);
+		remove_dir(dst_dir);
+		remove_dir(src_dir);
+	}
+
+	return err;
+}
diff --git a/tools/testing/selftests/filesystems/incfs/incfs_stress.c b/tools/testing/selftests/filesystems/incfs/incfs_stress.c
new file mode 100644
index 0000000..a1d4917
--- /dev/null
+++ b/tools/testing/selftests/filesystems/incfs/incfs_stress.c
@@ -0,0 +1,322 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Google LLC
+ */
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <pthread.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "utils.h"
+
+#define err_msg(...)                                                           \
+	do {                                                                   \
+		fprintf(stderr, "%s: (%d) ", TAG, __LINE__);                   \
+		fprintf(stderr, __VA_ARGS__);                                  \
+		fprintf(stderr, " (%s)\n", strerror(errno));                   \
+	} while (false)
+
+#define TAG "incfs_stress"
+
+struct options {
+	bool no_cleanup; /* -c */
+	const char *test_dir; /* -d */
+	unsigned int rng_seed; /* -g */
+	int num_reads; /* -n */
+	int readers; /* -r */
+	int size; /* -s */
+	int timeout; /* -t */
+};
+
+struct read_data {
+	const char *filename;
+	int dir_fd;
+	size_t filesize;
+	int num_reads;
+	unsigned int rng_seed;
+};
+
+int cancel_threads;
+
+int parse_options(int argc, char *const *argv, struct options *options)
+{
+	signed char c;
+
+	/* Set defaults here */
+	*options = (struct options){
+		.test_dir = ".",
+		.num_reads = 1000,
+		.readers = 10,
+		.size = 10,
+	};
+
+	/* Load options from command line here */
+	while ((c = getopt(argc, argv, "cd:g:n:r:s:t:")) != -1) {
+		switch (c) {
+		case 'c':
+			options->no_cleanup = true;
+			break;
+
+		case 'd':
+			options->test_dir = optarg;
+			break;
+
+		case 'g':
+			options->rng_seed = strtol(optarg, NULL, 10);
+			break;
+
+		case 'n':
+			options->num_reads = strtol(optarg, NULL, 10);
+			break;
+
+		case 'r':
+			options->readers = strtol(optarg, NULL, 10);
+			break;
+
+		case 's':
+			options->size = strtol(optarg, NULL, 10);
+			break;
+
+		case 't':
+			options->timeout = strtol(optarg, NULL, 10);
+			break;
+		}
+	}
+
+	return 0;
+}
+
+void *reader(void *data)
+{
+	struct read_data *read_data = (struct read_data *)data;
+	int i;
+	int fd = -1;
+	void *buffer = malloc(read_data->filesize);
+
+	if (!buffer) {
+		err_msg("Failed to alloc read buffer");
+		goto out;
+	}
+
+	fd = openat(read_data->dir_fd, read_data->filename,
+		    O_RDONLY | O_CLOEXEC);
+	if (fd == -1) {
+		err_msg("Failed to open file");
+		goto out;
+	}
+
+	for (i = 0; i < read_data->num_reads && !cancel_threads; ++i) {
+		off_t offset = rnd(read_data->filesize, &read_data->rng_seed);
+		size_t count =
+			rnd(read_data->filesize - offset, &read_data->rng_seed);
+		ssize_t err = pread(fd, buffer, count, offset);
+
+		if (err != count)
+			err_msg("failed to read with value %zd", err);
+	}
+
+out:
+	close(fd);
+	free(read_data);
+	free(buffer);
+	return NULL;
+}
+
+int write_data(int cmd_fd, int dir_fd, const char *name, size_t size)
+{
+	int fd = openat(dir_fd, name, O_RDWR | O_CLOEXEC);
+	struct incfs_permit_fill permit_fill = {
+		.file_descriptor = fd,
+	};
+	int error = 0;
+	int i;
+	int block_count = 1 + (size - 1) / INCFS_DATA_FILE_BLOCK_SIZE;
+
+	if (fd == -1) {
+		err_msg("Could not open file for writing %s", name);
+		return -errno;
+	}
+
+	if (ioctl(cmd_fd, INCFS_IOC_PERMIT_FILL, &permit_fill)) {
+		err_msg("Failed to call PERMIT_FILL");
+		error = -errno;
+		goto out;
+	}
+
+	for (i = 0; i < block_count; ++i) {
+		uint8_t data[INCFS_DATA_FILE_BLOCK_SIZE] = {};
+		size_t block_size =
+			size > i * INCFS_DATA_FILE_BLOCK_SIZE ?
+				INCFS_DATA_FILE_BLOCK_SIZE :
+				size - (i * INCFS_DATA_FILE_BLOCK_SIZE);
+		struct incfs_fill_block fill_block = {
+			.compression = COMPRESSION_NONE,
+			.block_index = i,
+			.data_len = block_size,
+			.data = ptr_to_u64(data),
+		};
+		struct incfs_fill_blocks fill_blocks = {
+			.count = 1,
+			.fill_blocks = ptr_to_u64(&fill_block),
+		};
+		int written = ioctl(fd, INCFS_IOC_FILL_BLOCKS, &fill_blocks);
+
+		if (written != 1) {
+			error = -errno;
+			err_msg("Failed to write block %d in file %s", i, name);
+			break;
+		}
+	}
+out:
+	close(fd);
+	return error;
+}
+
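+/*
+ * Core stress scenario: create one incfs file of randomized size, spawn
+ * options->readers threads doing random preads against it, and fill the
+ * file's blocks from this thread while the readers run.
+ */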
+int test_files(int src_dir, int dst_dir, struct options const *options)
+{
+	unsigned int seed = options->rng_seed;
+	int cmd_file = openat(dst_dir, INCFS_PENDING_READS_FILENAME,
+			      O_RDONLY | O_CLOEXEC);
+	int err;
+	const char *name = "001";
+	incfs_uuid_t id;
+	size_t size;
+	int i;
+	pthread_t *threads = NULL;
+
+	size = 1 << (rnd(options->size, &seed) + 12);
+	size += rnd(size, &seed);
+
+	if (cmd_file == -1) {
+		err_msg("Could not open command file");
+		return -errno;
+	}
+
+	err = emit_file(cmd_file, NULL, name, &id, size, NULL);
+	if (err) {
+		err_msg("Failed to create file %s", name);
+		return err;
+	}
+
+	threads = malloc(sizeof(pthread_t) * options->readers);
+	if (!threads) {
+		err_msg("Could not allocate memory for threads");
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < options->readers; ++i) {
+		struct read_data *read_data = malloc(sizeof(*read_data));
+
+		if (!read_data) {
+			err_msg("Failed to allocate read_data");
+			err = -ENOMEM;
+			break;
+		}
+
+		*read_data = (struct read_data){
+			.filename = name,
+			.dir_fd = dst_dir,
+			.filesize = size,
+			.num_reads = options->num_reads,
+			.rng_seed = seed,
+		};
+
+		rnd(0, &seed);
+
+		err = pthread_create(threads + i, 0, reader, read_data);
+		if (err) {
+			err_msg("Failed to create thread");
+			free(read_data);
+			break;
+		}
+	}
+
+	if (err)
+		cancel_threads = 1;
+	else
+		err = write_data(cmd_file, dst_dir, name, size);
+
+	for (; i > 0; --i) {
+		if (pthread_join(threads[i - 1], NULL)) {
+			err_msg("FATAL: failed to join thread");
+			exit(-errno);
+		}
+	}
+
+	free(threads);
+	close(cmd_file);
+	return err;
+}
+
+int main(int argc, char *const *argv)
+{
+	struct options options;
+	int err;
+	const char *src_dir = "src";
+	const char *dst_dir = "dst";
+	int src_dir_fd = -1;
+	int dst_dir_fd = -1;
+
+	err = parse_options(argc, argv, &options);
+	if (err)
+		return err;
+
+	err = chdir(options.test_dir);
+	if (err) {
+		err_msg("Failed to change to %s", options.test_dir);
+		return -errno;
+	}
+
+	err = remove_dir(src_dir) || remove_dir(dst_dir);
+	if (err)
+		return err;
+
+	err = mkdir(src_dir, 0700);
+	if (err) {
+		err_msg("Failed to make directory %s", src_dir);
+		err = -errno;
+		goto cleanup;
+	}
+
+	err = mkdir(dst_dir, 0700);
+	if (err) {
+		err_msg("Failed to make directory %s", dst_dir);
+		err = -errno;
+		goto cleanup;
+	}
+
+	err = mount_fs(dst_dir, src_dir, options.timeout);
+	if (err) {
+		err_msg("Failed to mount incfs");
+		goto cleanup;
+	}
+
+	src_dir_fd = open(src_dir, O_RDONLY | O_CLOEXEC);
+	dst_dir_fd = open(dst_dir, O_RDONLY | O_CLOEXEC);
+	if (src_dir_fd == -1 || dst_dir_fd == -1) {
+		err_msg("Failed to open src or dst dir");
+		err = -errno;
+		goto cleanup;
+	}
+
+	err = test_files(src_dir_fd, dst_dir_fd, &options);
+
+cleanup:
+	close(src_dir_fd);
+	close(dst_dir_fd);
+	if (!options.no_cleanup) {
+		umount(dst_dir);
+		remove_dir(dst_dir);
+		remove_dir(src_dir);
+	}
+
+	return err;
+}
diff --git a/tools/testing/selftests/filesystems/incfs/incfs_test.c b/tools/testing/selftests/filesystems/incfs/incfs_test.c
new file mode 100644
index 0000000..10c15fa
--- /dev/null
+++ b/tools/testing/selftests/filesystems/incfs/incfs_test.c
@@ -0,0 +1,4803 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018 Google LLC
+ */
+#define _GNU_SOURCE
+
+#include <alloca.h>
+#include <dirent.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <lz4.h>
+#include <poll.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <time.h>
+#include <unistd.h>
+#include <zstd.h>
+
+#include <sys/inotify.h>
+#include <sys/mman.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/xattr.h>
+#include <sys/statvfs.h>
+
+#include <linux/random.h>
+#include <linux/stat.h>
+#include <linux/unistd.h>
+
+#include <openssl/pem.h>
+#include <openssl/x509.h>
+
+#include <kselftest.h>
+#include <include/uapi/linux/fsverity.h>
+
+#include "utils.h"
+
+/* Can't include uapi/linux/fs.h because it clashes with mount.h */
+#define	FS_IOC_GETFLAGS			_IOR('f', 1, long)
+#define FS_VERITY_FL			0x00100000 /* Verity protected inode */
+
+#define TEST_SKIP 2
+#define TEST_FAILURE 1
+#define TEST_SUCCESS 0
+
+#define INCFS_ROOT_INODE 0
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define le16_to_cpu(x)          (x)
+#define le32_to_cpu(x)          (x)
+#define le64_to_cpu(x)          (x)
+#else
+#error Big endian not supported!
+#endif
+
+struct {
+	int file;
+	int test;
+	bool verbose;
+} options;
+
+#define TESTCOND(condition)						\
+	do {								\
+		if (!(condition)) {					\
+			ksft_print_msg("%s failed %d\n",		\
+				       __func__, __LINE__);		\
+			goto out;					\
+		} else if (options.verbose)				\
+			ksft_print_msg("%s succeeded %d\n",		\
+				       __func__, __LINE__);		\
+	} while (false)
+
+#define TEST(statement, condition)					\
+	do {								\
+		statement;						\
+		TESTCOND(condition);					\
+	} while (false)
+
+#define TESTEQUAL(statement, res)					\
+	TESTCOND((statement) == (res))
+
+#define TESTNE(statement, res)					\
+	TESTCOND((statement) != (res))
+
+#define TESTSYSCALL(statement)						\
+	do {								\
+		int res = statement;					\
+									\
+		if (res)						\
+			ksft_print_msg("Failed: %s (%d)\n",		\
+				       strerror(errno), errno);		\
+		TESTEQUAL(res, 0);					\
+	} while (false)
+
+void print_bytes(const void *data, size_t size)
+{
+	const uint8_t *bytes = data;
+	int i;
+
+	for (i = 0; i < size; ++i) {
+		if (i % 0x10 == 0)
+			printf("%08x:", i);
+		printf("%02x ", (unsigned int) bytes[i]);
+		if (i % 0x10 == 0x0f)
+			printf("\n");
+	}
+
+	if (i % 0x10 != 0)
+		printf("\n");
+}
+
+struct hash_block {
+	char data[INCFS_DATA_FILE_BLOCK_SIZE];
+};
+
+struct test_signature {
+	void *data;
+	size_t size;
+
+	char add_data[100];
+	size_t add_data_size;
+};
+
+struct test_file {
+	int index;
+	incfs_uuid_t id;
+	char *name;
+	off_t size;
+	char root_hash[INCFS_MAX_HASH_SIZE];
+	struct hash_block *mtree;
+	int mtree_block_count;
+	struct test_signature sig;
+	unsigned char *verity_sig;
+	size_t verity_sig_size;
+};
+
+struct test_files_set {
+	struct test_file *files;
+	int files_count;
+};
+
+struct linux_dirent64 {
+	uint64_t       d_ino;
+	int64_t        d_off;
+	unsigned short d_reclen;
+	unsigned char  d_type;
+	char	       d_name[0];
+} __packed;
+
+struct test_files_set get_test_files_set(void)
+{
+	static struct test_file files[] = {
+		{ .index = 0, .name = "file_one_byte", .size = 1 },
+		{ .index = 1,
+		  .name = "file_one_block",
+		  .size = INCFS_DATA_FILE_BLOCK_SIZE },
+		{ .index = 2,
+		  .name = "file_one_and_a_half_blocks",
+		  .size = INCFS_DATA_FILE_BLOCK_SIZE +
+			  INCFS_DATA_FILE_BLOCK_SIZE / 2 },
+		{ .index = 3,
+		  .name = "file_three",
+		  .size = 300 * INCFS_DATA_FILE_BLOCK_SIZE + 3 },
+		{ .index = 4,
+		  .name = "file_four",
+		  .size = 400 * INCFS_DATA_FILE_BLOCK_SIZE + 7 },
+		{ .index = 5,
+		  .name = "file_five",
+		  .size = 500 * INCFS_DATA_FILE_BLOCK_SIZE + 7 },
+		{ .index = 6,
+		  .name = "file_six",
+		  .size = 600 * INCFS_DATA_FILE_BLOCK_SIZE + 7 },
+		{ .index = 7,
+		  .name = "file_seven",
+		  .size = 700 * INCFS_DATA_FILE_BLOCK_SIZE + 7 },
+		{ .index = 8,
+		  .name = "file_eight",
+		  .size = 800 * INCFS_DATA_FILE_BLOCK_SIZE + 7 },
+		{ .index = 9,
+		  .name = "file_nine",
+		  .size = 900 * INCFS_DATA_FILE_BLOCK_SIZE + 7 },
+		{ .index = 10, .name = "file_big", .size = 500 * 1024 * 1024 }
+	};
+
+	if (options.file)
+		return (struct test_files_set) {
+			.files = files + options.file - 1,
+			.files_count = 1,
+		};
+
+	return (struct test_files_set){ .files = files,
+					.files_count = ARRAY_SIZE(files) };
+}
+
+struct test_files_set get_small_test_files_set(void)
+{
+	static struct test_file files[] = {
+		{ .index = 0, .name = "file_one_byte", .size = 1 },
+		{ .index = 1,
+		  .name = "file_one_block",
+		  .size = INCFS_DATA_FILE_BLOCK_SIZE },
+		{ .index = 2,
+		  .name = "file_one_and_a_half_blocks",
+		  .size = INCFS_DATA_FILE_BLOCK_SIZE +
+			  INCFS_DATA_FILE_BLOCK_SIZE / 2 },
+		{ .index = 3,
+		  .name = "file_three",
+		  .size = 300 * INCFS_DATA_FILE_BLOCK_SIZE + 3 },
+		{ .index = 4,
+		  .name = "file_four",
+		  .size = 400 * INCFS_DATA_FILE_BLOCK_SIZE + 7 }
+	};
+	return (struct test_files_set){ .files = files,
+					.files_count = ARRAY_SIZE(files) };
+}
+
+static int get_file_block_seed(int file, int block)
+{
+	return 7919 * file + block;
+}
+
+static loff_t min(loff_t a, loff_t b)
+{
+	return a < b ? a : b;
+}
+
+static int ilog2(size_t n)
+{
+	int l = 0;
+
+	while (n > 1) {
+		++l;
+		n >>= 1;
+	}
+	return l;
+}
+
+static pid_t flush_and_fork(void)
+{
+	fflush(stdout);
+	return fork();
+}
+
+static void print_error(char *msg)
+{
+	ksft_print_msg("%s: %s\n", msg, strerror(errno));
+}
+
+static int wait_for_process(pid_t pid)
+{
+	int status;
+	int wait_res;
+
+	wait_res = waitpid(pid, &status, 0);
+	if (wait_res <= 0) {
+		print_error("Can't wait for the child");
+		return -EINVAL;
+	}
+	if (!WIFEXITED(status)) {
+		ksft_print_msg("Unexpected child status pid=%d\n", pid);
+		return -EINVAL;
+	}
+	status = WEXITSTATUS(status);
+	if (status != 0)
+		return status;
+	return 0;
+}
+
+static void rnd_buf(uint8_t *data, size_t len, unsigned int seed)
+{
+	int i;
+
+	for (i = 0; i < len; i++) {
+		seed = 1103515245 * seed + 12345;
+		data[i] = (uint8_t)(seed >> (i % 13));
+	}
+}
+
+char *bin2hex(char *dst, const void *src, size_t count)
+{
+	const unsigned char *_src = src;
+	static const char hex_asc[] = "0123456789abcdef";
+
+	while (count--) {
+		unsigned char x = *_src++;
+
+		*dst++ = hex_asc[(x & 0xf0) >> 4];
+		*dst++ = hex_asc[(x & 0x0f)];
+	}
+	*dst = 0;
+	return dst;
+}
+
+static char *get_index_filename(const char *mnt_dir, incfs_uuid_t id)
+{
+	char path[FILENAME_MAX];
+	char str_id[1 + 2 * sizeof(id)];
+
+	bin2hex(str_id, id.bytes, sizeof(id.bytes));
+	snprintf(path, ARRAY_SIZE(path), "%s/.index/%s", mnt_dir, str_id);
+
+	return strdup(path);
+}
+
+static char *get_incomplete_filename(const char *mnt_dir, incfs_uuid_t id)
+{
+	char path[FILENAME_MAX];
+	char str_id[1 + 2 * sizeof(id)];
+
+	bin2hex(str_id, id.bytes, sizeof(id.bytes));
+	snprintf(path, ARRAY_SIZE(path), "%s/.incomplete/%s", mnt_dir, str_id);
+
+	return strdup(path);
+}
+
+int open_file_by_id(const char *mnt_dir, incfs_uuid_t id, bool use_ioctl)
+{
+	char *path = get_index_filename(mnt_dir, id);
+	int cmd_fd = open_commands_file(mnt_dir);
+	int fd = open(path, O_RDWR | O_CLOEXEC);
+	struct incfs_permit_fill permit_fill = {
+		.file_descriptor = fd,
+	};
+	int error = 0;
+
+	if (fd < 0) {
+		print_error("Can't open file by id.");
+		error = -errno;
+		goto out;
+	}
+
+	if (use_ioctl && ioctl(cmd_fd, INCFS_IOC_PERMIT_FILL, &permit_fill)) {
+		print_error("Failed to call PERMIT_FILL");
+		error = -errno;
+		goto out;
+	}
+
+	if (ioctl(fd, INCFS_IOC_PERMIT_FILL, &permit_fill) != -1) {
+		print_error(
+			"Successfully called PERMIT_FILL on non pending_read file");
+		error = -EPERM;
+		goto out;
+	}
+
+out:
+	free(path);
+	close(cmd_fd);
+
+	if (error) {
+		close(fd);
+		return error;
+	}
+
+	return fd;
+}
+
+int get_file_attr(const char *mnt_dir, incfs_uuid_t id, char *value, int size)
+{
+	char *path = get_index_filename(mnt_dir, id);
+	int res;
+
+	res = getxattr(path, INCFS_XATTR_METADATA_NAME, value, size);
+	if (res < 0)
+		res = -errno;
+
+	free(path);
+	return res;
+}
+
+static bool same_id(incfs_uuid_t *id1, incfs_uuid_t *id2)
+{
+	return !memcmp(id1->bytes, id2->bytes, sizeof(id1->bytes));
+}
+
+ssize_t ZSTD_compress_default(char *data, char *comp_data, size_t data_size,
+					size_t comp_size)
+{
+	return ZSTD_compress(comp_data, comp_size, data, data_size, 1);
+}
+
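+/*
+ * Fill up to 32 of the given blocks with one INCFS_IOC_FILL_BLOCKS call,
+ * alternating LZ4, ZSTD and uncompressed payloads based on
+ * (file index + block index) % 4. First verifies that filling fails on a
+ * fd that was not blessed with PERMIT_FILL; returns blocks written.
+ */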
+static int emit_test_blocks(const char *mnt_dir, struct test_file *file,
+			int blocks[], int count)
+{
+	uint8_t data[INCFS_DATA_FILE_BLOCK_SIZE];
+	uint8_t comp_data[2 * INCFS_DATA_FILE_BLOCK_SIZE];
+	int block_count = (count > 32) ? 32 : count;
+	int data_buf_size = 2 * INCFS_DATA_FILE_BLOCK_SIZE * block_count;
+	uint8_t *data_buf = malloc(data_buf_size);
+	uint8_t *current_data = data_buf;
+	uint8_t *data_end = data_buf + data_buf_size;
+	struct incfs_fill_block *block_buf =
+		calloc(block_count, sizeof(struct incfs_fill_block));
+	struct incfs_fill_blocks fill_blocks = {
+		.count = block_count,
+		.fill_blocks = ptr_to_u64(block_buf),
+	};
+	ssize_t write_res = 0;
+	int fd = -1;
+	int error = 0;
+	int i = 0;
+	int blocks_written = 0;
+
+	for (i = 0; i < block_count; i++) {
+		int block_index = blocks[i];
+		bool compress_zstd = (file->index + block_index) % 4 == 2;
+		bool compress_lz4 = (file->index + block_index) % 4 == 0;
+		int seed = get_file_block_seed(file->index, block_index);
+		off_t block_offset =
+			((off_t)block_index) * INCFS_DATA_FILE_BLOCK_SIZE;
+		size_t block_size = 0;
+
+		if (block_offset > file->size) {
+			error = -EINVAL;
+			break;
+		}
+		if (file->size - block_offset >
+			INCFS_DATA_FILE_BLOCK_SIZE)
+			block_size = INCFS_DATA_FILE_BLOCK_SIZE;
+		else
+			block_size = file->size - block_offset;
+
+		rnd_buf(data, block_size, seed);
+		if (compress_lz4) {
+			size_t comp_size = LZ4_compress_default((char *)data,
+					(char *)comp_data, block_size,
+					ARRAY_SIZE(comp_data));
+
+			if (comp_size <= 0) {
+				error = -EBADMSG;
+				break;
+			}
+			if (current_data + comp_size > data_end) {
+				error = -ENOMEM;
+				break;
+			}
+			memcpy(current_data, comp_data, comp_size);
+			block_size = comp_size;
+			block_buf[i].compression = COMPRESSION_LZ4;
+		} else if (compress_zstd) {
+			size_t comp_size = ZSTD_compress(comp_data,
+					ARRAY_SIZE(comp_data), data, block_size,
+					1);
+
+			if (comp_size <= 0) {
+				error = -EBADMSG;
+				break;
+			}
+			if (current_data + comp_size > data_end) {
+				error = -ENOMEM;
+				break;
+			}
+			memcpy(current_data, comp_data, comp_size);
+			block_size = comp_size;
+			block_buf[i].compression = COMPRESSION_ZSTD;
+		} else {
+			if (current_data + block_size > data_end) {
+				error = -ENOMEM;
+				break;
+			}
+			memcpy(current_data, data, block_size);
+			block_buf[i].compression = COMPRESSION_NONE;
+		}
+
+		block_buf[i].block_index = block_index;
+		block_buf[i].data_len = block_size;
+		block_buf[i].data = ptr_to_u64(current_data);
+		current_data += block_size;
+	}
+
+	if (!error) {
+		fd = open_file_by_id(mnt_dir, file->id, false);
+		if (fd < 0) {
+			error = -errno;
+			goto out;
+		}
+		write_res = ioctl(fd, INCFS_IOC_FILL_BLOCKS, &fill_blocks);
+		if (write_res >= 0) {
+			ksft_print_msg("Writing via a normal fd unexpectedly succeeded\n");
+			error = -EPERM;
+			goto out;
+		}
+
+		close(fd);
+		fd = open_file_by_id(mnt_dir, file->id, true);
+		if (fd < 0) {
+			error = -errno;
+			goto out;
+		}
+		write_res = ioctl(fd, INCFS_IOC_FILL_BLOCKS, &fill_blocks);
+		if (write_res < 0)
+			error = -errno;
+		else
+			blocks_written = write_res;
+	}
+	if (error) {
+		ksft_print_msg(
+			"Writing data block error. Write returned: %zd. Error: %s\n",
+			write_res, strerror(-error));
+	}
+
+out:
+	free(block_buf);
+	free(data_buf);
+	close(fd);
+	return (error < 0) ? error : blocks_written;
+}
+
+static int emit_test_block(const char *mnt_dir, struct test_file *file,
+				int block_index)
+{
+	int res = emit_test_blocks(mnt_dir, file, &block_index, 1);
+
+	if (res == 0)
+		return -EINVAL;
+	if (res == 1)
+		return 0;
+	return res;
+}
+
+static void shuffle(int array[], int count, unsigned int seed)
+{
+	int i;
+
+	for (i = 0; i < count - 1; i++) {
+		int items_left = count - i;
+		int shuffle_index;
+		int v;
+
+		seed = 1103515245 * seed + 12345;
+		shuffle_index = i + seed % items_left;
+
+		v = array[shuffle_index];
+		array[shuffle_index] = array[i];
+		array[i] = v;
+	}
+}
+
+static int emit_test_file_data(const char *mount_dir, struct test_file *file)
+{
+	int i;
+	int block_cnt = 1 + (file->size - 1) / INCFS_DATA_FILE_BLOCK_SIZE;
+	int *block_indexes = NULL;
+	int result = 0;
+	int blocks_written = 0;
+
+	if (file->size == 0)
+		return 0;
+
+	block_indexes = calloc(block_cnt, sizeof(*block_indexes));
+	for (i = 0; i < block_cnt; i++)
+		block_indexes[i] = i;
+	shuffle(block_indexes, block_cnt, file->index);
+
+	for (i = 0; i < block_cnt; i += blocks_written) {
+		blocks_written = emit_test_blocks(mount_dir, file,
+					block_indexes + i, block_cnt - i);
+		if (blocks_written < 0) {
+			result = blocks_written;
+			goto out;
+		}
+		if (blocks_written == 0) {
+			result = -EIO;
+			goto out;
+		}
+	}
+out:
+	free(block_indexes);
+	return result;
+}
+
+static loff_t read_whole_file(const char *filename)
+{
+	int fd = -1;
+	loff_t result;
+	loff_t bytes_read = 0;
+	uint8_t buff[16 * 1024];
+
+	fd = open(filename, O_RDONLY | O_CLOEXEC);
+	if (fd <= 0)
+		return fd;
+
+	while (1) {
+		int read_result = read(fd, buff, ARRAY_SIZE(buff));
+
+		if (read_result < 0) {
+			print_error("Error during reading from a file.");
+			result = -errno;
+			goto cleanup;
+		} else if (read_result == 0)
+			break;
+
+		bytes_read += read_result;
+	}
+	result = bytes_read;
+
+cleanup:
+	close(fd);
+	return result;
+}
+
+static int read_test_file(uint8_t *buf, size_t len, char *filename,
+			  int block_idx)
+{
+	int fd = -1;
+	int result;
+	int bytes_read = 0;
+	size_t bytes_to_read = len;
+	off_t offset = ((off_t)block_idx) * INCFS_DATA_FILE_BLOCK_SIZE;
+
+	fd = open(filename, O_RDONLY | O_CLOEXEC);
+	if (fd <= 0)
+		return fd;
+
+	if (lseek(fd, offset, SEEK_SET) != offset) {
+		print_error("Seek error");
+		result = -errno;
+		goto cleanup;
+	}
+
+	while (bytes_read < bytes_to_read) {
+		int read_result =
+			read(fd, buf + bytes_read, bytes_to_read - bytes_read);
+		if (read_result < 0) {
+			result = -errno;
+			goto cleanup;
+		} else if (read_result == 0)
+			break;
+
+		bytes_read += read_result;
+	}
+	result = bytes_read;
+
+cleanup:
+	close(fd);
+	return result;
+}
+
+static char *create_backing_dir(const char *mount_dir)
+{
+	struct stat st;
+	char backing_dir_name[255];
+
+	snprintf(backing_dir_name, ARRAY_SIZE(backing_dir_name), "%s-src",
+		 mount_dir);
+
+	if (stat(backing_dir_name, &st) == 0) {
+		if (S_ISDIR(st.st_mode)) {
+			int error = delete_dir_tree(backing_dir_name);
+
+			if (error) {
+				ksft_print_msg(
+				      "Can't delete existing backing dir. %d\n",
+				      error);
+				return NULL;
+			}
+		} else {
+			if (unlink(backing_dir_name)) {
+				print_error("Can't clear backing dir");
+				return NULL;
+			}
+		}
+	}
+
+	if (mkdir(backing_dir_name, 0777)) {
+		if (errno != EEXIST) {
+			print_error("Can't open/create backing dir");
+			return NULL;
+		}
+	}
+
+	return strdup(backing_dir_name);
+}
+
+static int validate_test_file_content_with_seed(const char *mount_dir,
+						struct test_file *file,
+						unsigned int shuffle_seed)
+{
+	int error = -1;
+	char *filename = concat_file_name(mount_dir, file->name);
+	off_t size = file->size;
+	loff_t actual_size = get_file_size(filename);
+	int block_cnt = 1 + (size - 1) / INCFS_DATA_FILE_BLOCK_SIZE;
+	int *block_indexes = NULL;
+	int i;
+
+	block_indexes = alloca(sizeof(int) * block_cnt);
+	for (i = 0; i < block_cnt; i++)
+		block_indexes[i] = i;
+
+	if (shuffle_seed != 0)
+		shuffle(block_indexes, block_cnt, shuffle_seed);
+
+	if (actual_size != size) {
+		ksft_print_msg(
+			"File size doesn't match. name: %s expected size:%ld actual size:%ld\n",
+			filename, size, actual_size);
+		error = -1;
+		goto failure;
+	}
+
+	for (i = 0; i < block_cnt; i++) {
+		int block_idx = block_indexes[i];
+		uint8_t expected_block[INCFS_DATA_FILE_BLOCK_SIZE];
+		uint8_t actual_block[INCFS_DATA_FILE_BLOCK_SIZE];
+		int seed = get_file_block_seed(file->index, block_idx);
+		size_t bytes_to_compare = min(
+			(off_t)INCFS_DATA_FILE_BLOCK_SIZE,
+			size - ((off_t)block_idx) * INCFS_DATA_FILE_BLOCK_SIZE);
+		int read_result =
+			read_test_file(actual_block, INCFS_DATA_FILE_BLOCK_SIZE,
+				       filename, block_idx);
+		if (read_result < 0) {
+			ksft_print_msg(
+				"Error reading block %d from file %s. Error: %s\n",
+				block_idx, filename, strerror(-read_result));
+			error = read_result;
+			goto failure;
+		}
+		rnd_buf(expected_block, INCFS_DATA_FILE_BLOCK_SIZE, seed);
+		if (memcmp(expected_block, actual_block, bytes_to_compare)) {
+			ksft_print_msg(
+				"File contents don't match. name: %s block:%d\n",
+				file->name, block_idx);
+			error = -2;
+			goto failure;
+		}
+	}
+	free(filename);
+	return 0;
+
+failure:
+	free(filename);
+	return error;
+}
+
+static int validate_test_file_content(const char *mount_dir,
+				      struct test_file *file)
+{
+	return validate_test_file_content_with_seed(mount_dir, file, 0);
+}
+
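+/*
+ * Service pending reads: block on the commands file for up to a second,
+ * match each pending read to its test file by UUID, and emit the block
+ * the reader is waiting for.
+ */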
+static int data_producer(const char *mount_dir, struct test_files_set *test_set)
+{
+	int ret = 0;
+	int timeout_ms = 1000;
+	struct incfs_pending_read_info prs[100] = {};
+	int prs_size = ARRAY_SIZE(prs);
+	int fd = open_commands_file(mount_dir);
+
+	if (fd < 0)
+		return -errno;
+
+	while ((ret = wait_for_pending_reads(fd, timeout_ms, prs, prs_size)) >
+	       0) {
+		int read_count = ret;
+		int i;
+
+		for (i = 0; i < read_count; i++) {
+			int j = 0;
+			struct test_file *file = NULL;
+
+			for (j = 0; j < test_set->files_count; j++) {
+				bool same = same_id(&(test_set->files[j].id),
+					&(prs[i].file_id));
+
+				if (same) {
+					file = &test_set->files[j];
+					break;
+				}
+			}
+			if (!file) {
+				ksft_print_msg(
+					"Unknown file in pending reads.\n");
+				break;
+			}
+
+			ret = emit_test_block(mount_dir, file,
+				prs[i].block_index);
+			if (ret < 0) {
+				ksft_print_msg("Emitting test data error: %s\n",
+						strerror(-ret));
+				break;
+			}
+		}
+	}
+	close(fd);
+	return ret;
+}
+
+static int data_producer2(const char *mount_dir,
+			  struct test_files_set *test_set)
+{
+	int ret = 0;
+	int timeout_ms = 1000;
+	struct incfs_pending_read_info2 prs[100] = {};
+	int prs_size = ARRAY_SIZE(prs);
+	int fd = open_commands_file(mount_dir);
+
+	if (fd < 0)
+		return -errno;
+
+	while ((ret = wait_for_pending_reads2(fd, timeout_ms, prs, prs_size)) >
+	       0) {
+		int read_count = ret;
+		int i;
+
+		for (i = 0; i < read_count; i++) {
+			int j = 0;
+			struct test_file *file = NULL;
+
+			for (j = 0; j < test_set->files_count; j++) {
+				bool same = same_id(&(test_set->files[j].id),
+					&(prs[i].file_id));
+
+				if (same) {
+					file = &test_set->files[j];
+					break;
+				}
+			}
+			if (!file) {
+				ksft_print_msg(
+					"Unknown file in pending reads.\n");
+				break;
+			}
+
+			ret = emit_test_block(mount_dir, file,
+				prs[i].block_index);
+			if (ret < 0) {
+				ksft_print_msg("Emitting test data error: %s\n",
+						strerror(-ret));
+				break;
+			}
+		}
+	}
+	close(fd);
+	return ret;
+}
+
+static int build_mtree(struct test_file *file)
+{
+	char data[INCFS_DATA_FILE_BLOCK_SIZE] = {};
+	const int digest_size = SHA256_DIGEST_SIZE;
+	const int hash_per_block = INCFS_DATA_FILE_BLOCK_SIZE / digest_size;
+	int block_count = 0;
+	int hash_block_count = 0;
+	int total_tree_block_count = 0;
+	int tree_lvl_index[INCFS_MAX_MTREE_LEVELS] = {};
+	int tree_lvl_count[INCFS_MAX_MTREE_LEVELS] = {};
+	int levels_count = 0;
+	int i, level;
+
+	if (file->size == 0)
+		return 0;
+
+	block_count = 1 + (file->size - 1) / INCFS_DATA_FILE_BLOCK_SIZE;
+	hash_block_count = block_count;
+	for (i = 0; hash_block_count > 1; i++) {
+		hash_block_count = (hash_block_count + hash_per_block - 1)
+			/ hash_per_block;
+		tree_lvl_count[i] = hash_block_count;
+		total_tree_block_count += hash_block_count;
+	}
+	levels_count = i;
+
+	for (i = 0; i < levels_count; i++) {
+		int prev_lvl_base = (i == 0) ? total_tree_block_count :
+			tree_lvl_index[i - 1];
+
+		tree_lvl_index[i] = prev_lvl_base - tree_lvl_count[i];
+	}
+
+	file->mtree_block_count = total_tree_block_count;
+	if (block_count == 1) {
+		int seed = get_file_block_seed(file->index, 0);
+
+		memset(data, 0, INCFS_DATA_FILE_BLOCK_SIZE);
+		rnd_buf((uint8_t *)data, file->size, seed);
+		sha256(data, INCFS_DATA_FILE_BLOCK_SIZE, file->root_hash);
+		return 0;
+	}
+
+	file->mtree = calloc(total_tree_block_count, sizeof(*file->mtree));
+	if (!file->mtree)
+		return -ENOMEM;
+	/* Build level 0 hashes. */
+	for (i = 0; i < block_count; i++) {
+		off_t offset = i * INCFS_DATA_FILE_BLOCK_SIZE;
+		size_t block_size = INCFS_DATA_FILE_BLOCK_SIZE;
+		int block_index = tree_lvl_index[0] +
+					i / hash_per_block;
+		int block_off = (i % hash_per_block) * digest_size;
+		int seed = get_file_block_seed(file->index, i);
+		char *hash_ptr = file->mtree[block_index].data + block_off;
+
+		if (file->size - offset < block_size) {
+			block_size = file->size - offset;
+			memset(data, 0, INCFS_DATA_FILE_BLOCK_SIZE);
+		}
+
+		rnd_buf((uint8_t *)data, block_size, seed);
+		sha256(data, INCFS_DATA_FILE_BLOCK_SIZE, hash_ptr);
+	}
+
+	/* Build higher levels of hash tree. */
+	for (level = 1; level < levels_count; level++) {
+		int prev_lvl_base = tree_lvl_index[level - 1];
+		int prev_lvl_count = tree_lvl_count[level - 1];
+
+		for (i = 0; i < prev_lvl_count; i++) {
+			int block_index =
+				i / hash_per_block + tree_lvl_index[level];
+			int block_off = (i % hash_per_block) * digest_size;
+			char *hash_ptr =
+				file->mtree[block_index].data + block_off;
+
+			sha256(file->mtree[i + prev_lvl_base].data,
+			       INCFS_DATA_FILE_BLOCK_SIZE, hash_ptr);
+		}
+	}
+
+	/* Calculate root hash from the top block */
+	sha256(file->mtree[0].data,
+		INCFS_DATA_FILE_BLOCK_SIZE, file->root_hash);
+
+	return 0;
+}
+
+static int load_hash_tree(const char *mount_dir, struct test_file *file)
+{
+	int err;
+	int i;
+	int fd;
+	struct incfs_fill_blocks fill_blocks = {
+		.count = file->mtree_block_count,
+	};
+	struct incfs_fill_block *fill_block_array =
+		calloc(fill_blocks.count, sizeof(struct incfs_fill_block));
+
+	if (fill_blocks.count == 0)
+		return 0;
+
+	if (!fill_block_array)
+		return -ENOMEM;
+	fill_blocks.fill_blocks = ptr_to_u64(fill_block_array);
+
+	for (i = 0; i < fill_blocks.count; i++) {
+		fill_block_array[i] = (struct incfs_fill_block){
+			.block_index = i,
+			.data_len = INCFS_DATA_FILE_BLOCK_SIZE,
+			.data = ptr_to_u64(file->mtree[i].data),
+			.flags = INCFS_BLOCK_FLAGS_HASH
+		};
+	}
+
+	fd = open_file_by_id(mount_dir, file->id, false);
+	if (fd < 0) {
+		err = errno;
+		goto failure;
+	}
+
+	err = ioctl(fd, INCFS_IOC_FILL_BLOCKS, &fill_blocks);
+	close(fd);
+	if (err >= 0) {
+		err = -EPERM;
+		goto failure;
+	}
+
+	fd = open_file_by_id(mount_dir, file->id, true);
+	if (fd < 0) {
+		err = errno;
+		goto failure;
+	}
+
+	err = ioctl(fd, INCFS_IOC_FILL_BLOCKS, &fill_blocks);
+	close(fd);
+	if (err < fill_blocks.count)
+		err = errno;
+	else
+		err = 0;
+
+failure:
+	free(fill_block_array);
+	return err;
+}
+
+static int cant_touch_index_test(const char *mount_dir)
+{
+	char *file_name = "test_file";
+	int file_size = 123;
+	incfs_uuid_t file_id;
+	char *index_path = concat_file_name(mount_dir, ".index");
+	char *subdir = concat_file_name(index_path, "subdir");
+	char *dst_name = concat_file_name(mount_dir, "something");
+	char *filename_in_index = NULL;
+	char *file_path = concat_file_name(mount_dir, file_name);
+	char *backing_dir;
+	int cmd_fd = -1;
+	int err;
+
+	backing_dir = create_backing_dir(mount_dir);
+	if (!backing_dir)
+		goto failure;
+
+	/* Mount FS and release the backing file. */
+	if (mount_fs(mount_dir, backing_dir, 50) != 0)
+		goto failure;
+	free(backing_dir);
+
+	cmd_fd = open_commands_file(mount_dir);
+	if (cmd_fd < 0)
+		goto failure;
+
+
+	err = mkdir(subdir, 0777);
+	if (err == 0 || errno != EBUSY) {
+		print_error("Shouldn't be able to create subdir in index\n");
+		goto failure;
+	}
+
+	err = rmdir(index_path);
+	if (err == 0 || errno != EBUSY) {
+		print_error(".index directory should not be removed\n");
+		goto failure;
+	}
+
+	err = emit_file(cmd_fd, ".index", file_name, &file_id,
+				file_size, NULL);
+	if (err != -EBUSY) {
+		print_error("Shouldn't be able to crate a file in index\n");
+		goto failure;
+	}
+
+	err = emit_file(cmd_fd, NULL, file_name, &file_id,
+				file_size, NULL);
+	if (err < 0)
+		goto failure;
+	filename_in_index = get_index_filename(mount_dir, file_id);
+
+	err = unlink(filename_in_index);
+	if (err == 0 || errno != EBUSY) {
+		print_error("Shouldn't be delete from index\n");
+		goto failure;
+	}
+
+	err = rename(filename_in_index, dst_name);
+	if (err == 0 || errno != EBUSY) {
+		print_error("Shouldn't be able to move from index\n");
+		goto failure;
+	}
+
+	free(filename_in_index);
+	filename_in_index = concat_file_name(index_path, "abc");
+	err = link(file_path, filename_in_index);
+	if (err == 0 || errno != EBUSY) {
+		print_error("Shouldn't be able to link inside index\n");
+		goto failure;
+	}
+
+	err = rename(index_path, dst_name);
+	if (err == 0 || errno != EBUSY) {
+		print_error("Shouldn't rename .index directory\n");
+		goto failure;
+	}
+
+	close(cmd_fd);
+	free(subdir);
+	free(index_path);
+	free(dst_name);
+	free(filename_in_index);
+	if (umount(mount_dir) != 0) {
+		print_error("Can't unmout FS");
+		goto failure;
+	}
+
+	return TEST_SUCCESS;
+
+failure:
+	free(subdir);
+	free(dst_name);
+	free(index_path);
+	free(filename_in_index);
+	close(cmd_fd);
+	umount(mount_dir);
+	return TEST_FAILURE;
+}
+
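+/*
+ * Walk a directory with getdents64 and check that exactly the expected
+ * incfs pseudo-files (plus "." and "..") and file_count other entries
+ * are present, each at most once.
+ */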
+static bool iterate_directory(const char *dir_to_iterate, bool root,
+			      int file_count)
+{
+	struct expected_name {
+		const char *name;
+		bool root_only;
+		bool found;
+	} names[] = {
+		{INCFS_LOG_FILENAME, true, false},
+		{INCFS_PENDING_READS_FILENAME, true, false},
+		{INCFS_BLOCKS_WRITTEN_FILENAME, true, false},
+		{".index", true, false},
+		{".incomplete", true, false},
+		{"..", false, false},
+		{".", false, false},
+	};
+
+	bool pass = true, found;
+	int i;
+
+	/* Test directory iteration */
+	int fd = open(dir_to_iterate, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+
+	if (fd < 0) {
+		print_error("Can't open directory\n");
+		return false;
+	}
+
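+	/*
+	 * Read entries one at a time, probing buffer sizes upwards from the
+	 * minimum: getdents64 should keep failing with EINVAL until the
+	 * buffer fits the next entry.
+	 */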
+	for (;;) {
+		/* Enough space for one dirent with any name up to NAME_MAX */
+		char buf[sizeof(struct linux_dirent64) + NAME_MAX];
+		struct linux_dirent64 *dirent = (struct linux_dirent64 *) buf;
+		int nread;
+		int i;
+
+		for (i = 0; i < NAME_MAX; ++i) {
+			nread = syscall(__NR_getdents64, fd, buf,
+					 sizeof(struct linux_dirent64) + i);
+
+			if (nread >= 0)
+				break;
+			if (errno != EINVAL)
+				break;
+		}
+
+		if (nread == 0)
+			break;
+		if (nread < 0) {
+			print_error("Error iterating directory\n");
+			pass = false;
+			goto failure;
+		}
+
+		/* Expected size is rounded up to an 8-byte boundary. Not sure
+		 * if this is a universal truth or just happenstance, but it is
+		 * a useful test for the moment.
+		 */
+		if (nread != (((sizeof(struct linux_dirent64)
+				+ strlen(dirent->d_name) + 1) + 7) & ~7)) {
+			print_error("Wrong dirent size");
+			pass = false;
+			goto failure;
+		}
+
+		found = false;
+		for (i = 0; i < sizeof(names) / sizeof(*names); ++i)
+			if (!strcmp(dirent->d_name, names[i].name)) {
+				if (names[i].root_only && !root) {
+					print_error("Root file error");
+					pass = false;
+					goto failure;
+				}
+
+				if (names[i].found) {
+					print_error("File appears twice");
+					pass = false;
+					goto failure;
+				}
+
+				names[i].found = true;
+				found = true;
+				break;
+			}
+
+		if (!found)
+			--file_count;
+	}
+
+	for (i = 0; i < sizeof(names) / sizeof(*names); ++i) {
+		if (!names[i].found)
+			if (root || !names[i].root_only) {
+				print_error("Expected file not present");
+				pass = false;
+				goto failure;
+			}
+	}
+
+	if (file_count) {
+		print_error("Wrong number of files\n");
+		pass = false;
+		goto failure;
+	}
+
+failure:
+	close(fd);
+	return pass;
+}
+
+static int basic_file_ops_test(const char *mount_dir)
+{
+	struct test_files_set test = get_test_files_set();
+	const int file_num = test.files_count;
+	char *subdir1 = concat_file_name(mount_dir, "subdir1");
+	char *subdir2 = concat_file_name(mount_dir, "subdir2");
+	char *backing_dir;
+	int cmd_fd = -1;
+	int i, err;
+
+	backing_dir = create_backing_dir(mount_dir);
+	if (!backing_dir)
+		goto failure;
+
+	/* Mount FS and release the backing file. */
+	if (mount_fs(mount_dir, backing_dir, 50) != 0)
+		goto failure;
+	free(backing_dir);
+
+	cmd_fd = open_commands_file(mount_dir);
+	if (cmd_fd < 0)
+		goto failure;
+
+	err = mkdir(subdir1, 0777);
+	if (err < 0 && errno != EEXIST) {
+		print_error("Can't create subdir1\n");
+		goto failure;
+	}
+
+	err = mkdir(subdir2, 0777);
+	if (err < 0 && errno != EEXIST) {
+		print_error("Can't create subdir2\n");
+		goto failure;
+	}
+
+	/* Create all test files in subdir1 directory */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+		loff_t size;
+		char *file_path = concat_file_name(subdir1, file->name);
+
+		err = emit_file(cmd_fd, "subdir1", file->name, &file->id,
+				     file->size, NULL);
+		if (err < 0)
+			goto failure;
+
+		size = get_file_size(file_path);
+		free(file_path);
+		if (size != file->size) {
+			ksft_print_msg("Wrong size %lld of %s.\n",
+				size, file->name);
+			goto failure;
+		}
+	}
+
+	if (!iterate_directory(subdir1, false, file_num))
+		goto failure;
+
+	/* Link the files to subdir2 */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+		char *src_name = concat_file_name(subdir1, file->name);
+		char *dst_name = concat_file_name(subdir2, file->name);
+		loff_t size;
+
+		err = link(src_name, dst_name);
+		if (err < 0) {
+			print_error("Can't move file\n");
+			goto failure;
+		}
+
+		size = get_file_size(dst_name);
+		if (size != file->size) {
+			ksft_print_msg("Wrong size %lld of %s.\n",
+				size, file->name);
+			goto failure;
+		}
+		free(src_name);
+		free(dst_name);
+	}
+
+	/* Move the files from subdir2 to the mount dir */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+		char *src_name = concat_file_name(subdir2, file->name);
+		char *dst_name = concat_file_name(mount_dir, file->name);
+		loff_t size;
+
+		err = rename(src_name, dst_name);
+		if (err < 0) {
+			print_error("Can't move file\n");
+			goto failure;
+		}
+
+		size = get_file_size(dst_name);
+		if (size != file->size) {
+			ksft_print_msg("Wrong size %lld of %s.\n",
+				size, file->name);
+			goto failure;
+		}
+		free(src_name);
+		free(dst_name);
+	}
+
+	/* +2 because there are 2 subdirs */
+	if (!iterate_directory(mount_dir, true, file_num + 2))
+		goto failure;
+
+	/* Open and close all files from the mount dir */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+		char *path = concat_file_name(mount_dir, file->name);
+		int fd;
+
+		fd = open(path, O_RDWR | O_CLOEXEC);
+		free(path);
+		if (fd <= 0) {
+			print_error("Can't open file");
+			goto failure;
+		}
+		if (close(fd)) {
+			print_error("Can't close file");
+			goto failure;
+		}
+	}
+
+	/* Delete all files from the mount dir */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+		char *path = concat_file_name(mount_dir, file->name);
+
+		err = unlink(path);
+		free(path);
+		if (err < 0) {
+			print_error("Can't unlink file");
+			goto failure;
+		}
+	}
+
+	err = delete_dir_tree(subdir1);
+	if (err) {
+		ksft_print_msg("Error deleting subdir1 %d", err);
+		goto failure;
+	}
+
+	err = rmdir(subdir2);
+	if (err) {
+		print_error("Error deleting subdir2");
+		goto failure;
+	}
+
+	close(cmd_fd);
+	cmd_fd = -1;
+	if (umount(mount_dir) != 0) {
+		print_error("Can't unmout FS");
+		goto failure;
+	}
+
+	return TEST_SUCCESS;
+
+failure:
+	close(cmd_fd);
+	umount(mount_dir);
+	return TEST_FAILURE;
+}
+
+static int dynamic_files_and_data_test(const char *mount_dir)
+{
+	struct test_files_set test = get_test_files_set();
+	const int file_num = test.files_count;
+	const int missing_file_idx = 5;
+	int cmd_fd = -1;
+	char *backing_dir;
+	int i;
+
+	backing_dir = create_backing_dir(mount_dir);
+	if (!backing_dir)
+		goto failure;
+
+	/* Mount FS and release the backing file. */
+	if (mount_fs(mount_dir, backing_dir, 50) != 0)
+		goto failure;
+	free(backing_dir);
+
+	cmd_fd = open_commands_file(mount_dir);
+	if (cmd_fd < 0)
+		goto failure;
+
+	/* Check that test files don't exist in the filesystem. */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+		char *filename = concat_file_name(mount_dir, file->name);
+
+		if (access(filename, F_OK) != -1) {
+			ksft_print_msg(
+				"File %s somehow already exists in a clean FS.\n",
+				filename);
+			goto failure;
+		}
+		free(filename);
+	}
+
+	/* Write test data into the command file. */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+		int res;
+
+		res = emit_file(cmd_fd, NULL, file->name, &file->id,
+				     file->size, NULL);
+		if (res < 0) {
+			ksft_print_msg("Error %s emiting file %s.\n",
+				       strerror(-res), file->name);
+			goto failure;
+		}
+
+		/*
+		 * Skip writing data to one file so we can check
+		 * that it's missing later.
+		 */
+		if (i == missing_file_idx)
+			continue;
+
+		res = emit_test_file_data(mount_dir, file);
+		if (res) {
+			ksft_print_msg("Error %s emiting data for %s.\n",
+				       strerror(-res), file->name);
+			goto failure;
+		}
+	}
+
+	/* Validate contents of the FS */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+
+		if (i == missing_file_idx) {
+			/*
+			 * No data has been written to this file,
+			 * so check for a read error.
+			 */
+			uint8_t buf;
+			char *filename =
+				concat_file_name(mount_dir, file->name);
+			int res = read_test_file(&buf, 1, filename, 0);
+
+			free(filename);
+			if (res > 0) {
+				ksft_print_msg(
+					"Data present, even though never writtern.\n");
+				goto failure;
+			}
+			if (res != -ETIME) {
+				ksft_print_msg("Wrong error code: %d.\n", res);
+				goto failure;
+			}
+		} else {
+			if (validate_test_file_content(mount_dir, file) < 0)
+				goto failure;
+		}
+	}
+
+	close(cmd_fd);
+	cmd_fd = -1;
+	if (umount(mount_dir) != 0) {
+		print_error("Can't unmout FS");
+		goto failure;
+	}
+
+	return TEST_SUCCESS;
+
+failure:
+	close(cmd_fd);
+	umount(mount_dir);
+	return TEST_FAILURE;
+}
+
+static int concurrent_reads_and_writes_test(const char *mount_dir)
+{
+	struct test_files_set test = get_test_files_set();
+	const int file_num = test.files_count;
+	/* Validate each file from that many child processes. */
+	const int child_multiplier = 3;
+	int cmd_fd = -1;
+	char *backing_dir;
+	int status;
+	int i;
+	pid_t producer_pid;
+	pid_t *child_pids = alloca(child_multiplier * file_num * sizeof(pid_t));
+
+	backing_dir = create_backing_dir(mount_dir);
+	if (!backing_dir)
+		goto failure;
+
+	/* Mount FS and release the backing file. */
+	if (mount_fs(mount_dir, backing_dir, 500) != 0)
+		goto failure;
+	free(backing_dir);
+
+	cmd_fd = open_commands_file(mount_dir);
+	if (cmd_fd < 0)
+		goto failure;
+
+	/* Tell FS about the files, without actually providing the data. */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+		int res;
+
+		res = emit_file(cmd_fd, NULL, file->name, &file->id,
+				     file->size, NULL);
+		if (res)
+			goto failure;
+	}
+
+	/* Start child processes accessing data in the files */
+	for (i = 0; i < file_num * child_multiplier; i++) {
+		struct test_file *file = &test.files[i / child_multiplier];
+		pid_t child_pid = flush_and_fork();
+
+		if (child_pid == 0) {
+			/* This is a child process, do the data validation. */
+			int ret = validate_test_file_content_with_seed(
+				mount_dir, file, i);
+			if (ret >= 0) {
+				/* Zero exit status if data is valid. */
+				exit(0);
+			}
+
+			/* Positive status if validation error found. */
+			exit(-ret);
+		} else if (child_pid > 0) {
+			child_pids[i] = child_pid;
+		} else {
+			print_error("Fork error");
+			goto failure;
+		}
+	}
+
+	producer_pid = flush_and_fork();
+	if (producer_pid == 0) {
+		int ret;
+		/*
+		 * This is a child that should provide data to
+		 * pending reads.
+		 */
+
+		ret = data_producer(mount_dir, &test);
+		exit(-ret);
+	} else {
+		status = wait_for_process(producer_pid);
+		if (status != 0) {
+			ksft_print_msg("Data produces failed. %d(%s) ", status,
+				       strerror(status));
+			goto failure;
+		}
+	}
+
+	/* Check that all children have finished with 0 exit status */
+	for (i = 0; i < file_num * child_multiplier; i++) {
+		struct test_file *file = &test.files[i / child_multiplier];
+
+		status = wait_for_process(child_pids[i]);
+		if (status != 0) {
+			ksft_print_msg(
+				"Validation for the file %s failed with code %d (%s)\n",
+				file->name, status, strerror(status));
+			goto failure;
+		}
+	}
+
+	/* Check that there are no pending reads left */
+	{
+		struct incfs_pending_read_info prs[1] = {};
+		int timeout = 0;
+		int read_count = wait_for_pending_reads(cmd_fd, timeout, prs,
+							ARRAY_SIZE(prs));
+
+		if (read_count) {
+			ksft_print_msg(
+				"Pending reads pending when all data written\n");
+			goto failure;
+		}
+	}
+
+	close(cmd_fd);
+	cmd_fd = -1;
+	if (umount(mount_dir) != 0) {
+		print_error("Can't unmout FS");
+		goto failure;
+	}
+
+	return TEST_SUCCESS;
+
+failure:
+	close(cmd_fd);
+	umount(mount_dir);
+	return TEST_FAILURE;
+}
+
+static int work_after_remount_test(const char *mount_dir)
+{
+	struct test_files_set test = get_test_files_set();
+	const int file_num = test.files_count;
+	const int file_num_stage1 = file_num / 2;
+	const int file_num_stage2 = file_num;
+	char *backing_dir = NULL;
+	int i = 0;
+	int cmd_fd = -1;
+
+	backing_dir = create_backing_dir(mount_dir);
+	if (!backing_dir)
+		goto failure;
+
+	/* Mount FS and release the backing file. */
+	if (mount_fs(mount_dir, backing_dir, 50) != 0)
+		goto failure;
+
+	cmd_fd = open_commands_file(mount_dir);
+	if (cmd_fd < 0)
+		goto failure;
+
+	/* Write first half of the data into the command file. (stage 1) */
+	for (i = 0; i < file_num_stage1; i++) {
+		struct test_file *file = &test.files[i];
+
+		if (emit_file(cmd_fd, NULL, file->name, &file->id,
+				     file->size, NULL))
+			goto failure;
+
+		if (emit_test_file_data(mount_dir, file))
+			goto failure;
+	}
+
+	/* Unmount and mount again, to see that data is persistent. */
+	close(cmd_fd);
+	cmd_fd = -1;
+	if (umount(mount_dir) != 0) {
+		print_error("Can't unmout FS");
+		goto failure;
+	}
+
+	if (mount_fs(mount_dir, backing_dir, 50) != 0)
+		goto failure;
+
+	cmd_fd = open_commands_file(mount_dir);
+	if (cmd_fd < 0)
+		goto failure;
+
+	/* Write the second half of the data into the command file. (stage 2) */
+	for (; i < file_num_stage2; i++) {
+		struct test_file *file = &test.files[i];
+		int res = emit_file(cmd_fd, NULL, file->name, &file->id,
+				     file->size, NULL);
+
+		if (res)
+			goto failure;
+
+		if (emit_test_file_data(mount_dir, file))
+			goto failure;
+	}
+
+	/* Validate contents of the FS */
+	for (i = 0; i < file_num_stage2; i++) {
+		struct test_file *file = &test.files[i];
+
+		if (validate_test_file_content(mount_dir, file) < 0)
+			goto failure;
+	}
+
+	/* Delete all files */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+		char *filename = concat_file_name(mount_dir, file->name);
+		char *filename_in_index = get_index_filename(mount_dir,
+							file->id);
+
+		if (access(filename, F_OK) != 0) {
+			ksft_print_msg("File %s is not visible.\n", filename);
+			goto failure;
+		}
+
+		if (access(filename_in_index, F_OK) != 0) {
+			ksft_print_msg("File %s is not visible.\n",
+				filename_in_index);
+			goto failure;
+		}
+
+		unlink(filename);
+
+		if (access(filename, F_OK) != -1) {
+			ksft_print_msg("File %s is still present.\n", filename);
+			goto failure;
+		}
+
+		if (access(filename_in_index, F_OK) != -1) {
+			ksft_print_msg("File %s is still present.\n",
+				filename_in_index);
+			goto failure;
+		}
+		free(filename);
+		free(filename_in_index);
+	}
+
+	/* Unmount and mount again, to see that deleted files stay deleted. */
+	close(cmd_fd);
+	cmd_fd = -1;
+	if (umount(mount_dir) != 0) {
+		print_error("Can't unmout FS");
+		goto failure;
+	}
+
+	if (mount_fs(mount_dir, backing_dir, 50) != 0)
+		goto failure;
+
+	cmd_fd = open_commands_file(mount_dir);
+	if (cmd_fd < 0)
+		goto failure;
+
+	/* Validate all deleted files are still deleted. */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+		char *filename = concat_file_name(mount_dir, file->name);
+
+		if (access(filename, F_OK) != -1) {
+			ksft_print_msg("File %s is still visible.\n", filename);
+			goto failure;
+		}
+		free(filename);
+	}
+
+	/* Final unmount */
+	close(cmd_fd);
+	free(backing_dir);
+	cmd_fd = -1;
+	if (umount(mount_dir) != 0) {
+		print_error("Can't unmout FS");
+		goto failure;
+	}
+
+	return TEST_SUCCESS;
+
+failure:
+	close(cmd_fd);
+	free(backing_dir);
+	umount(mount_dir);
+	return TEST_FAILURE;
+}
+
+static int attribute_test(const char *mount_dir)
+{
+	char file_attr[] = "metadata123123";
+	char attr_buf[INCFS_MAX_FILE_ATTR_SIZE] = {};
+	int cmd_fd = -1;
+	incfs_uuid_t file_id;
+	int attr_res = 0;
+	char *backing_dir;
+
+	backing_dir = create_backing_dir(mount_dir);
+	if (!backing_dir)
+		goto failure;
+
+	/* Mount FS and release the backing file. */
+	if (mount_fs(mount_dir, backing_dir, 50) != 0)
+		goto failure;
+
+	cmd_fd = open_commands_file(mount_dir);
+	if (cmd_fd < 0)
+		goto failure;
+
+	if (emit_file(cmd_fd, NULL, "file", &file_id, 12, file_attr))
+		goto failure;
+
+	/* Test attribute values */
+	attr_res = get_file_attr(mount_dir, file_id, attr_buf,
+		ARRAY_SIZE(attr_buf));
+	if (attr_res != strlen(file_attr)) {
+		ksft_print_msg("Get file attr error: %d\n", attr_res);
+		goto failure;
+	}
+	if (strcmp(attr_buf, file_attr) != 0) {
+		ksft_print_msg("Incorrect file attr value: '%s'", attr_buf);
+		goto failure;
+	}
+
+	/* Unmount and mount again, to see that attributes are persistent. */
+	close(cmd_fd);
+	cmd_fd = -1;
+	if (umount(mount_dir) != 0) {
+		print_error("Can't unmout FS");
+		goto failure;
+	}
+
+	if (mount_fs(mount_dir, backing_dir, 50) != 0)
+		goto failure;
+
+	cmd_fd = open_commands_file(mount_dir);
+	if (cmd_fd < 0)
+		goto failure;
+
+	/* Test attribute values again after remount */
+	attr_res = get_file_attr(mount_dir, file_id, attr_buf,
+		ARRAY_SIZE(attr_buf));
+	if (attr_res != strlen(file_attr)) {
+		ksft_print_msg("Get dir attr error: %d\n", attr_res);
+		goto failure;
+	}
+	if (strcmp(attr_buf, file_attr) != 0) {
+		ksft_print_msg("Incorrect file attr value: '%s'", attr_buf);
+		goto failure;
+	}
+
+	/* Final unmount */
+	close(cmd_fd);
+	free(backing_dir);
+	cmd_fd = -1;
+	if (umount(mount_dir) != 0) {
+		print_error("Can't unmout FS");
+		goto failure;
+	}
+
+	return TEST_SUCCESS;
+
+failure:
+	close(cmd_fd);
+	free(backing_dir);
+	umount(mount_dir);
+	return TEST_FAILURE;
+}
+
+static int child_procs_waiting_for_data_test(const char *mount_dir)
+{
+	struct test_files_set test = get_test_files_set();
+	const int file_num = test.files_count;
+	int cmd_fd = -1;
+	int i;
+	pid_t *child_pids = alloca(file_num * sizeof(pid_t));
+	char *backing_dir;
+
+	backing_dir = create_backing_dir(mount_dir);
+	if (!backing_dir)
+		goto failure;
+
+	/* Mount FS and release the backing file.  (10s wait time) */
+	if (mount_fs(mount_dir, backing_dir, 10000) != 0)
+		goto failure;
+
+	cmd_fd = open_commands_file(mount_dir);
+	if (cmd_fd < 0)
+		goto failure;
+
+	/* Tell FS about the files, without actually providing the data. */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+
+		if (emit_file(cmd_fd, NULL, file->name, &file->id,
+			      file->size, NULL) < 0)
+			goto failure;
+	}
+
+	/* Start child processes accessing data in the files */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+		pid_t child_pid = flush_and_fork();
+
+		if (child_pid == 0) {
+			/* This is a child process, do the data validation. */
+			int ret = validate_test_file_content(mount_dir, file);
+
+			if (ret >= 0) {
+				/* Zero exit status if data is valid. */
+				exit(0);
+			}
+
+			/* Positive status if validation error found. */
+			exit(-ret);
+		} else if (child_pid > 0) {
+			child_pids[i] = child_pid;
+		} else {
+			print_error("Fork error");
+			goto failure;
+		}
+	}
+
+	/* Write test data into the command file. */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+
+		if (emit_test_file_data(mount_dir, file))
+			goto failure;
+	}
+
+	/* Check that all children have finished with 0 exit status */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+		int status = wait_for_process(child_pids[i]);
+
+		if (status != 0) {
+			ksft_print_msg(
+				"Validation for the file %s failed with code %d (%s)\n",
+				file->name, status, strerror(status));
+			goto failure;
+		}
+	}
+
+	close(cmd_fd);
+	free(backing_dir);
+	cmd_fd = -1;
+	if (umount(mount_dir) != 0) {
+		print_error("Can't unmout FS");
+		goto failure;
+	}
+
+	return TEST_SUCCESS;
+
+failure:
+	close(cmd_fd);
+	free(backing_dir);
+	umount(mount_dir);
+	return TEST_FAILURE;
+}
+
+static int multiple_providers_test(const char *mount_dir)
+{
+	struct test_files_set test = get_test_files_set();
+	const int file_num = test.files_count;
+	const int producer_count = 5;
+	int cmd_fd = -1;
+	int status;
+	int i;
+	pid_t *producer_pids = alloca(producer_count * sizeof(pid_t));
+	char *backing_dir;
+
+	backing_dir = create_backing_dir(mount_dir);
+	if (!backing_dir)
+		goto failure;
+
+	/* Mount FS and release the backing file.  (10s wait time) */
+	if (mount_fs_opt(mount_dir, backing_dir,
+			 "read_timeout_ms=10000,report_uid", false) != 0)
+		goto failure;
+
+	cmd_fd = open_commands_file(mount_dir);
+	if (cmd_fd < 0)
+		goto failure;
+
+	/* Tell FS about the files, without actually providing the data. */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+
+		if (emit_file(cmd_fd, NULL, file->name, &file->id,
+				     file->size, NULL) < 0)
+			goto failure;
+	}
+
+	/* Start producer processes */
+	for (i = 0; i < producer_count; i++) {
+		pid_t producer_pid = flush_and_fork();
+
+		if (producer_pid == 0) {
+			int ret;
+			/*
+			 * This is a child that should provide data to
+			 * pending reads.
+			 */
+
+			ret = data_producer2(mount_dir, &test);
+			exit(-ret);
+		} else if (producer_pid > 0) {
+			producer_pids[i] = producer_pid;
+		} else {
+			print_error("Fork error");
+			goto failure;
+		}
+	}
+
+	/* Validate FS content */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+		char *filename = concat_file_name(mount_dir, file->name);
+		loff_t read_result = read_whole_file(filename);
+
+		free(filename);
+		if (read_result != file->size) {
+			ksft_print_msg(
+				"Error validating file %s. Result: %ld\n",
+				file->name, read_result);
+			goto failure;
+		}
+	}
+
+	/* Check that all producers have finished with 0 exit status */
+	for (i = 0; i < producer_count; i++) {
+		status = wait_for_process(producer_pids[i]);
+		if (status != 0) {
+			ksft_print_msg("Producer %d failed with code (%s)\n", i,
+				       strerror(status));
+			goto failure;
+		}
+	}
+
+	close(cmd_fd);
+	free(backing_dir);
+	cmd_fd = -1;
+	if (umount(mount_dir) != 0) {
+		print_error("Can't unmout FS");
+		goto failure;
+	}
+
+	return TEST_SUCCESS;
+
+failure:
+	close(cmd_fd);
+	free(backing_dir);
+	umount(mount_dir);
+	return TEST_FAILURE;
+}
+
+static int validate_hash_tree(const char *mount_dir, struct test_file *file)
+{
+	int result = TEST_FAILURE;
+	char *filename = NULL;
+	int fd = -1;
+	unsigned char *buf;
+	int i, err;
+
+	TEST(filename = concat_file_name(mount_dir, file->name), filename);
+	TEST(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+	TEST(buf = malloc(INCFS_DATA_FILE_BLOCK_SIZE * 8), buf);
+
+	for (i = 0; i < file->mtree_block_count; ) {
+		int blocks_to_read = i % 7 + 1;
+		struct fsverity_read_metadata_arg args = {
+			.metadata_type = FS_VERITY_METADATA_TYPE_MERKLE_TREE,
+			.offset = i * INCFS_DATA_FILE_BLOCK_SIZE,
+			.length = blocks_to_read * INCFS_DATA_FILE_BLOCK_SIZE,
+			.buf_ptr = ptr_to_u64(buf),
+		};
+
+		TEST(err = ioctl(fd, FS_IOC_READ_VERITY_METADATA, &args),
+		     err == min(args.length, (file->mtree_block_count - i) *
+					     INCFS_DATA_FILE_BLOCK_SIZE));
+		TESTEQUAL(memcmp(buf, file->mtree[i].data, err), 0);
+
+		i += blocks_to_read;
+	}
+
+	result = TEST_SUCCESS;
+
+out:
+	free(buf);
+	close(fd);
+	free(filename);
+	return result;
+}
+
+static int hash_tree_test(const char *mount_dir)
+{
+	int result = TEST_FAILURE;
+	char *backing_dir;
+	struct test_files_set test = get_test_files_set();
+	const int file_num = test.files_count;
+	const int corrupted_file_idx = 5;
+	int i = 0;
+	int cmd_fd = -1;
+
+	backing_dir = create_backing_dir(mount_dir);
+	if (!backing_dir)
+		goto failure;
+
+	/* Mount FS and release the backing file. */
+	if (mount_fs(mount_dir, backing_dir, 50) != 0)
+		goto failure;
+
+	cmd_fd = open_commands_file(mount_dir);
+	if (cmd_fd < 0)
+		goto failure;
+
+	/* Write hashes and data. */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+		int res;
+
+		build_mtree(file);
+		res = crypto_emit_file(cmd_fd, NULL, file->name, &file->id,
+				       file->size, file->root_hash,
+				       file->sig.add_data);
+
+		if (i == corrupted_file_idx) {
+			/* Corrupt the third block's hash */
+			file->mtree[0].data[2 * SHA256_DIGEST_SIZE] ^= 0xff;
+		}
+		if (emit_test_file_data(mount_dir, file))
+			goto failure;
+
+		res = load_hash_tree(mount_dir, file);
+		if (res) {
+			ksft_print_msg("Can't load hashes for %s. error: %s\n",
+				file->name, strerror(-res));
+			goto failure;
+		}
+	}
+
+	/* Validate data */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+
+		if (i == corrupted_file_idx) {
+			uint8_t data[INCFS_DATA_FILE_BLOCK_SIZE];
+			char *filename =
+				concat_file_name(mount_dir, file->name);
+			int res;
+
+			res = read_test_file(data, INCFS_DATA_FILE_BLOCK_SIZE,
+					     filename, 2);
+			free(filename);
+			if (res != -EBADMSG) {
+				ksft_print_msg("Hash violation missed1. %d\n",
+					       res);
+				goto failure;
+			}
+		} else if (validate_test_file_content(mount_dir, file) < 0)
+			goto failure;
+		else if (validate_hash_tree(mount_dir, file) < 0)
+			goto failure;
+	}
+
+	/* Unmount and mount again, to see that hashes are persistent. */
+	close(cmd_fd);
+	cmd_fd = -1;
+	if (umount(mount_dir) != 0) {
+		print_error("Can't unmout FS");
+		goto failure;
+	}
+	if (mount_fs(mount_dir, backing_dir, 50) != 0)
+		goto failure;
+
+	cmd_fd = open_commands_file(mount_dir);
+	if (cmd_fd < 0)
+		goto failure;
+
+	/* Validate data again */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+
+		if (i == corrupted_file_idx) {
+			uint8_t data[INCFS_DATA_FILE_BLOCK_SIZE];
+			char *filename =
+				concat_file_name(mount_dir, file->name);
+			int res;
+
+			res = read_test_file(data, INCFS_DATA_FILE_BLOCK_SIZE,
+					     filename, 2);
+			free(filename);
+			if (res != -EBADMSG) {
+				ksft_print_msg("Hash violation missed2. %d\n",
+					       res);
+				goto failure;
+			}
+		} else if (validate_test_file_content(mount_dir, file) < 0)
+			goto failure;
+	}
+	result = TEST_SUCCESS;
+
+failure:
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+
+		free(file->mtree);
+	}
+
+	close(cmd_fd);
+	free(backing_dir);
+	umount(mount_dir);
+	return result;
+}
+
+enum expected_log { FULL_LOG, NO_LOG, PARTIAL_LOG };
+
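+/*
+ * Read every block of the file, skipping some blocks to create gaps, then
+ * check that the read log contains matching records: correct file id and
+ * block indexes, non-zero timestamps, and consecutive serial numbers.
+ */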
+static int validate_logs(const char *mount_dir, int log_fd,
+			 struct test_file *file,
+			 enum expected_log expected_log,
+			 bool report_uid, bool expect_data)
+{
+	int result = TEST_FAILURE;
+	uint8_t data[INCFS_DATA_FILE_BLOCK_SIZE];
+	struct incfs_pending_read_info prs[2048] = {};
+	struct incfs_pending_read_info2 prs2[2048] = {};
+	struct incfs_pending_read_info *previous_record = NULL;
+	int prs_size = ARRAY_SIZE(prs);
+	int block_count = 1 + (file->size - 1) / INCFS_DATA_FILE_BLOCK_SIZE;
+	int expected_read_count, read_count, block_index, read_index;
+	char *filename = NULL;
+	int fd = -1;
+
+	TEST(filename = concat_file_name(mount_dir, file->name), filename);
+	TEST(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+
+	if (block_count > prs_size)
+		block_count = prs_size;
+	expected_read_count = block_count;
+
+	for (block_index = 0; block_index < block_count; block_index++) {
+		int result = pread(fd, data, sizeof(data),
+			    INCFS_DATA_FILE_BLOCK_SIZE * block_index);
+
+		/* Make some read logs of type SAME_FILE_NEXT_BLOCK */
+		if (block_index % 100 == 10)
+			usleep(20000);
+
+		/* Skip some blocks to make logs of type SAME_FILE */
+		if (block_index % 10 == 5) {
+			++block_index;
+			--expected_read_count;
+		}
+
+		if (expect_data)
+			TESTCOND(result > 0);
+
+		if (!expect_data)
+			TESTEQUAL(result, -1);
+	}
+
+	if (report_uid)
+		read_count = wait_for_pending_reads2(log_fd,
+				expected_log == NO_LOG ? 10 : 0,
+				prs2, prs_size);
+	else
+		read_count = wait_for_pending_reads(log_fd,
+				expected_log == NO_LOG ? 10 : 0,
+				prs, prs_size);
+
+	if (expected_log == NO_LOG)
+		TESTEQUAL(read_count, 0);
+
+	if (expected_log == PARTIAL_LOG)
+		TESTCOND(read_count > 0 &&
+			 read_count <= expected_read_count);
+
+	if (expected_log == FULL_LOG)
+		TESTEQUAL(read_count, expected_read_count);
+
+	/* If fewer reads were logged than expected, advance block_index accordingly */
+	for (block_index = 0, read_index = 0;
+	     read_index < expected_read_count - read_count;
+	     block_index++, read_index++)
+		if (block_index % 10 == 5)
+			++block_index;
+
+	for (read_index = 0; read_index < read_count;
+	     block_index++, read_index++) {
+		struct incfs_pending_read_info *record = report_uid ?
+			(struct incfs_pending_read_info *) &prs2[read_index] :
+			&prs[read_index];
+
+		TESTCOND(same_id(&record->file_id, &file->id));
+		TESTEQUAL(record->block_index, block_index);
+		TESTNE(record->timestamp_us, 0);
+		if (previous_record)
+			TESTEQUAL(record->serial_number,
+				  previous_record->serial_number + 1);
+
+		previous_record = record;
+		if (block_index % 10 == 5)
+			++block_index;
+	}
+
+	result = TEST_SUCCESS;
+out:
+	close(fd);
+	free(filename);
+	return result;
+}
+
+static int read_log_test(const char *mount_dir)
+{
+	int result = TEST_FAILURE;
+	struct test_files_set test = get_test_files_set();
+	const int file_num = test.files_count;
+	int i = 0;
+	int cmd_fd = -1, log_fd = -1;
+	char *backing_dir = NULL;
+
+	/* Create files */
+	TEST(backing_dir = create_backing_dir(mount_dir), backing_dir);
+	TESTEQUAL(mount_fs_opt(mount_dir, backing_dir,
+			       "readahead=0,report_uid,read_timeout_ms=0",
+				 false), 0);
+	TEST(cmd_fd = open_commands_file(mount_dir), cmd_fd != -1);
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+
+		TESTEQUAL(emit_file(cmd_fd, NULL, file->name, &file->id,
+				    file->size, NULL), 0);
+	}
+	close(cmd_fd);
+	cmd_fd = -1;
+
+	/* Validate logs */
+	TEST(log_fd = open_log_file(mount_dir), log_fd != -1);
+	for (i = 0; i < file_num; i++)
+		TESTEQUAL(validate_logs(mount_dir, log_fd, &test.files[i],
+					FULL_LOG, true, false), 0);
+
+	/* Unmount and mount again without report_uid */
+	close(log_fd);
+	log_fd = -1;
+	TESTEQUAL(umount(mount_dir), 0);
+	TESTEQUAL(mount_fs_opt(mount_dir, backing_dir,
+			       "readahead=0,read_timeout_ms=0", false), 0);
+
+	TEST(log_fd = open_log_file(mount_dir), log_fd != -1);
+	for (i = 0; i < file_num; i++)
+		TESTEQUAL(validate_logs(mount_dir, log_fd, &test.files[i],
+					FULL_LOG, false, false), 0);
+
+	/* No read log to make sure poll doesn't crash */
+	close(log_fd);
+	log_fd = -1;
+	TESTEQUAL(umount(mount_dir), 0);
+	TESTEQUAL(mount_fs_opt(mount_dir, backing_dir,
+			       "readahead=0,rlog_pages=0,read_timeout_ms=0",
+			       false), 0);
+
+	TEST(log_fd = open_log_file(mount_dir), log_fd != -1);
+	for (i = 0; i < file_num; i++)
+		TESTEQUAL(validate_logs(mount_dir, log_fd, &test.files[i],
+					NO_LOG, false, false), 0);
+
+	/* Remount and check that logs start working again */
+	TESTEQUAL(mount_fs_opt(mount_dir, backing_dir,
+			       "readahead=0,rlog_pages=1,read_timeout_ms=0",
+			       true), 0);
+	for (i = 0; i < file_num; i++)
+		TESTEQUAL(validate_logs(mount_dir, log_fd, &test.files[i],
+					PARTIAL_LOG, false, false), 0);
+
+	/* Remount and check that logs continue working */
+	TESTEQUAL(mount_fs_opt(mount_dir, backing_dir,
+			       "readahead=0,rlog_pages=4,read_timeout_ms=0",
+			       true), 0);
+	for (i = 0; i < file_num; i++)
+		TESTEQUAL(validate_logs(mount_dir, log_fd, &test.files[i],
+					FULL_LOG, false, false), 0);
+
+	/* Check logs work with data */
+	for (i = 0; i < file_num; i++) {
+		TESTEQUAL(emit_test_file_data(mount_dir, &test.files[i]), 0);
+		TESTEQUAL(validate_logs(mount_dir, log_fd, &test.files[i],
+					FULL_LOG, false, true), 0);
+	}
+
+	/* Final unmount */
+	close(log_fd);
+	log_fd = -1;
+	TESTEQUAL(umount(mount_dir), 0);
+
+	result = TEST_SUCCESS;
+out:
+	close(cmd_fd);
+	close(log_fd);
+	free(backing_dir);
+	umount(mount_dir);
+	return result;
+}
+
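+/*
+ * Fill every other pair of data blocks, checking after each batch that
+ * poll() reports the blocks-written file readable and that its counter
+ * advances by exactly the number of blocks just written.
+ */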
+static int emit_partial_test_file_data(const char *mount_dir,
+				       struct test_file *file)
+{
+	int i, j;
+	int block_cnt = 1 + (file->size - 1) / INCFS_DATA_FILE_BLOCK_SIZE;
+	int *block_indexes = NULL;
+	int result = 0;
+	int blocks_written = 0;
+	int bw_fd = -1;
+	char buffer[20];
+	struct pollfd pollfd;
+	long blocks_written_total, blocks_written_new_total;
+
+	if (file->size == 0)
+		return 0;
+
+	bw_fd = open_blocks_written_file(mount_dir);
+	if (bw_fd == -1)
+		return -errno;
+
+	result = read(bw_fd, buffer, sizeof(buffer) - 1);
+	if (result <= 0) {
+		result = -EIO;
+		goto out;
+	}
+
+	buffer[result] = 0;
+	blocks_written_total = strtol(buffer, NULL, 10);
+	result = 0;
+
+	pollfd = (struct pollfd) {
+		.fd = bw_fd,
+		.events = POLLIN,
+	};
+
+	result = poll(&pollfd, 1, 0);
+	if (result) {
+		result = -EIO;
+		goto out;
+	}
+
+	/* Emit 2 blocks, skip 2 blocks, etc. */
+	block_indexes = calloc(block_cnt, sizeof(*block_indexes));
+	if (!block_indexes) {
+		result = -ENOMEM;
+		goto out;
+	}
+	for (i = 0, j = 0; i < block_cnt; ++i)
+		if ((i & 2) == 0) {
+			block_indexes[j] = i;
+			++j;
+		}
+
+	for (i = 0; i < j; i += blocks_written) {
+		blocks_written = emit_test_blocks(mount_dir, file,
+						  block_indexes + i, j - i);
+		if (blocks_written < 0) {
+			result = blocks_written;
+			goto out;
+		}
+		if (blocks_written == 0) {
+			result = -EIO;
+			goto out;
+		}
+
+		result = poll(&pollfd, 1, 0);
+		if (result != 1 || pollfd.revents != POLLIN) {
+			result = -EIO;
+			goto out;
+		}
+
+		result = read(bw_fd, buffer, sizeof(buffer) - 1);
+		if (result <= 0) {
+			result = -EIO;
+			goto out;
+		}
+		buffer[result] = 0;
+		blocks_written_new_total = strtol(buffer, NULL, 10);
+
+		if (blocks_written_new_total - blocks_written_total
+		    != blocks_written) {
+			result = -EIO;
+			goto out;
+		}
+
+		blocks_written_total = blocks_written_new_total;
+		result = 0;
+	}
+out:
+	free(block_indexes);
+	close(bw_fd);
+	return result;
+}
+
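+/*
+ * With the 2-blocks-filled/2-blocks-skipped pattern from
+ * emit_partial_test_file_data(), filled ranges are expected to be
+ * [4i, 4i + 2). Large files (509+ data blocks) need more than the 128
+ * ranges this buffer holds, which must surface as ERANGE.
+ */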
+static int validate_ranges(const char *mount_dir, struct test_file *file)
+{
+	int block_cnt = 1 + (file->size - 1) / INCFS_DATA_FILE_BLOCK_SIZE;
+	char *filename = concat_file_name(mount_dir, file->name);
+	int fd;
+	struct incfs_filled_range ranges[128];
+	struct incfs_get_filled_blocks_args fba = {
+		.range_buffer = ptr_to_u64(ranges),
+		.range_buffer_size = sizeof(ranges),
+	};
+	int error = TEST_SUCCESS;
+	int i;
+	int range_cnt;
+	int cmd_fd = -1;
+	struct incfs_permit_fill permit_fill;
+
+	fd = open(filename, O_RDONLY | O_CLOEXEC);
+	free(filename);
+	if (fd <= 0)
+		return TEST_FAILURE;
+
+	error = ioctl(fd, INCFS_IOC_GET_FILLED_BLOCKS, &fba);
+	if (error != -1 || errno != EPERM) {
+		ksft_print_msg("INCFS_IOC_GET_FILLED_BLOCKS not blocked\n");
+		error = -EPERM;
+		goto out;
+	}
+
+	cmd_fd = open_commands_file(mount_dir);
+	permit_fill.file_descriptor = fd;
+	if (ioctl(cmd_fd, INCFS_IOC_PERMIT_FILL, &permit_fill)) {
+		print_error("INCFS_IOC_PERMIT_FILL failed");
+		return -EPERM;
+		goto out;
+	}
+
+	error = ioctl(fd, INCFS_IOC_GET_FILLED_BLOCKS, &fba);
+	if (error && errno != ERANGE)
+		goto out;
+
+	if (error && errno == ERANGE && block_cnt < 509)
+		goto out;
+
+	if (!error && block_cnt >= 509) {
+		error = -ERANGE;
+		goto out;
+	}
+
+	if (fba.total_blocks_out != block_cnt) {
+		error = -EINVAL;
+		goto out;
+	}
+
+	if (fba.data_blocks_out != block_cnt) {
+		error = -EINVAL;
+		goto out;
+	}
+
+	range_cnt = (block_cnt + 3) / 4;
+	if (range_cnt > 128)
+		range_cnt = 128;
+	if (range_cnt != fba.range_buffer_size_out / sizeof(*ranges)) {
+		error = -ERANGE;
+		goto out;
+	}
+
+	error = TEST_SUCCESS;
+	for (i = 0; i < fba.range_buffer_size_out / sizeof(*ranges) - 1; ++i)
+		if (ranges[i].begin != i * 4 || ranges[i].end != i * 4 + 2) {
+			error = -EINVAL;
+			goto out;
+		}
+
+	if (ranges[i].begin != i * 4 ||
+	    (ranges[i].end != i * 4 + 1 && ranges[i].end != i * 4 + 2)) {
+		error = -EINVAL;
+		goto out;
+	}
+
+	for (i = 0; i < 64; ++i) {
+		fba.start_index = i * 2;
+		fba.end_index = i * 2 + 2;
+		error = ioctl(fd, INCFS_IOC_GET_FILLED_BLOCKS, &fba);
+		if (error)
+			goto out;
+
+		if (fba.total_blocks_out != block_cnt) {
+			error = -EINVAL;
+			goto out;
+		}
+
+		if (fba.start_index >= block_cnt) {
+			if (fba.index_out != fba.start_index) {
+				error = -EINVAL;
+				goto out;
+			}
+
+			break;
+		}
+
+		if (i % 2) {
+			if (fba.range_buffer_size_out != 0) {
+				error = -EINVAL;
+				goto out;
+			}
+		} else {
+			if (fba.range_buffer_size_out != sizeof(*ranges)) {
+				error = -EINVAL;
+				goto out;
+			}
+
+			if (ranges[0].begin != i * 2) {
+				error = -EINVAL;
+				goto out;
+			}
+
+			if (ranges[0].end != i * 2 + 1 &&
+			    ranges[0].end != i * 2 + 2) {
+				error = -EINVAL;
+				goto out;
+			}
+		}
+	}
+
+out:
+	close(fd);
+	close(cmd_fd);
+	return error;
+}
+
+static int get_blocks_test(const char *mount_dir)
+{
+	char *backing_dir;
+	int cmd_fd = -1;
+	int i;
+	struct test_files_set test = get_test_files_set();
+	const int file_num = test.files_count;
+
+	backing_dir = create_backing_dir(mount_dir);
+	if (!backing_dir)
+		goto failure;
+
+	if (mount_fs_opt(mount_dir, backing_dir, "readahead=0", false) != 0)
+		goto failure;
+
+	cmd_fd = open_commands_file(mount_dir);
+	if (cmd_fd < 0)
+		goto failure;
+
+	/* Write data. */
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+
+		if (emit_file(cmd_fd, NULL, file->name, &file->id, file->size,
+			      NULL))
+			goto failure;
+
+		if (emit_partial_test_file_data(mount_dir, file))
+			goto failure;
+	}
+
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+
+		if (validate_ranges(mount_dir, file))
+			goto failure;
+
+		/*
+		 * The smallest files are filled completely, so this checks that
+		 * the fast get_filled_blocks path is not causing issues
+		 */
+		if (validate_ranges(mount_dir, file))
+			goto failure;
+	}
+
+	close(cmd_fd);
+	umount(mount_dir);
+	free(backing_dir);
+	return TEST_SUCCESS;
+
+failure:
+	close(cmd_fd);
+	umount(mount_dir);
+	free(backing_dir);
+	return TEST_FAILURE;
+}
+
+static int emit_partial_test_file_hash(const char *mount_dir,
+				       struct test_file *file)
+{
+	int err;
+	int fd;
+	struct incfs_fill_blocks fill_blocks = {
+		.count = 1,
+	};
+	struct incfs_fill_block *fill_block_array =
+		calloc(fill_blocks.count, sizeof(struct incfs_fill_block));
+	uint8_t data[INCFS_DATA_FILE_BLOCK_SIZE];
+
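+	/*
+	 * One 4K hash block holds 4096 / 32 = 128 SHA-256 digests, so files
+	 * of up to 128 data blocks (512KB) have a single-block hash tree and
+	 * nothing at block index 1; skip those.
+	 */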
+	if (file->size <= 4096 / 32 * 4096)
+		return 0;
+
+	if (!fill_block_array)
+		return -ENOMEM;
+	fill_blocks.fill_blocks = ptr_to_u64(fill_block_array);
+
+	rnd_buf(data, sizeof(data), 0);
+
+	fill_block_array[0] =
+		(struct incfs_fill_block){ .block_index = 1,
+					   .data_len =
+						   INCFS_DATA_FILE_BLOCK_SIZE,
+					   .data = ptr_to_u64(data),
+					   .flags = INCFS_BLOCK_FLAGS_HASH };
+
+	fd = open_file_by_id(mount_dir, file->id, true);
+	if (fd < 0) {
+		err = errno;
+		goto failure;
+	}
+
+	err = ioctl(fd, INCFS_IOC_FILL_BLOCKS, &fill_blocks);
+	close(fd);
+	if (err < fill_blocks.count)
+		err = errno;
+	else
+		err = 0;
+
+failure:
+	free(fill_block_array);
+	return err;
+}
+
+static int validate_hash_ranges(const char *mount_dir, struct test_file *file)
+{
+	int block_cnt = 1 + (file->size - 1) / INCFS_DATA_FILE_BLOCK_SIZE;
+	char *filename = concat_file_name(mount_dir, file->name);
+	int fd;
+	struct incfs_filled_range ranges[128];
+	struct incfs_get_filled_blocks_args fba = {
+		.range_buffer = ptr_to_u64(ranges),
+		.range_buffer_size = sizeof(ranges),
+	};
+	int error = TEST_SUCCESS;
+	int file_blocks = (file->size + INCFS_DATA_FILE_BLOCK_SIZE - 1) /
+			  INCFS_DATA_FILE_BLOCK_SIZE;
+	int cmd_fd = -1;
+	struct incfs_permit_fill permit_fill;
+
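+	/*
+	 * Files with a single-block hash tree were skipped by
+	 * emit_partial_test_file_hash(); skip them here too.
+	 */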
+	if (file->size <= 4096 / 32 * 4096)
+		return 0;
+
+	fd = open(filename, O_RDONLY | O_CLOEXEC);
+	free(filename);
+	if (fd <= 0)
+		return TEST_FAILURE;
+
+	error = ioctl(fd, INCFS_IOC_GET_FILLED_BLOCKS, &fba);
+	if (error != -1 || errno != EPERM) {
+		ksft_print_msg("INCFS_IOC_GET_FILLED_BLOCKS not blocked\n");
+		error = -EPERM;
+		goto out;
+	}
+
+	cmd_fd = open_commands_file(mount_dir);
+	permit_fill.file_descriptor = fd;
+	if (ioctl(cmd_fd, INCFS_IOC_PERMIT_FILL, &permit_fill)) {
+		print_error("INCFS_IOC_PERMIT_FILL failed");
+		error = -EPERM;
+		goto out;
+	}
+
+	error = ioctl(fd, INCFS_IOC_GET_FILLED_BLOCKS, &fba);
+	if (error)
+		goto out;
+
+	if (fba.total_blocks_out <= block_cnt) {
+		error = -EINVAL;
+		goto out;
+	}
+
+	if (fba.data_blocks_out != block_cnt) {
+		error = -EINVAL;
+		goto out;
+	}
+
+	if (fba.range_buffer_size_out != sizeof(struct incfs_filled_range)) {
+		error = -EINVAL;
+		goto out;
+	}
+
+	if (ranges[0].begin != file_blocks + 1 ||
+	    ranges[0].end != file_blocks + 2) {
+		error = -EINVAL;
+		goto out;
+	}
+
+out:
+	close(cmd_fd);
+	close(fd);
+	return error;
+}
+
+static int get_hash_blocks_test(const char *mount_dir)
+{
+	char *backing_dir;
+	int cmd_fd = -1;
+	int i;
+	struct test_files_set test = get_test_files_set();
+	const int file_num = test.files_count;
+
+	backing_dir = create_backing_dir(mount_dir);
+	if (!backing_dir)
+		goto failure;
+
+	if (mount_fs_opt(mount_dir, backing_dir, "readahead=0", false) != 0)
+		goto failure;
+
+	cmd_fd = open_commands_file(mount_dir);
+	if (cmd_fd < 0)
+		goto failure;
+
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+
+		if (crypto_emit_file(cmd_fd, NULL, file->name, &file->id,
+				     file->size, file->root_hash,
+				     file->sig.add_data))
+			goto failure;
+
+		if (emit_partial_test_file_hash(mount_dir, file))
+			goto failure;
+	}
+
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+
+		if (validate_hash_ranges(mount_dir, file))
+			goto failure;
+	}
+
+	close(cmd_fd);
+	umount(mount_dir);
+	free(backing_dir);
+	return TEST_SUCCESS;
+
+failure:
+	close(cmd_fd);
+	umount(mount_dir);
+	free(backing_dir);
+	return TEST_FAILURE;
+}
+
+#define THREE_GB (3LL * 1024 * 1024 * 1024)
+#define FOUR_GB (4LL * 1024 * 1024 * 1024) /* Have 1GB of margin */
+static int large_file_test(const char *mount_dir)
+{
+	char *backing_dir;
+	int cmd_fd = -1;
+	int i;
+	int result = TEST_FAILURE, ret;
+	uint8_t data[INCFS_DATA_FILE_BLOCK_SIZE] = {};
+	int block_count = THREE_GB / INCFS_DATA_FILE_BLOCK_SIZE;
+	struct incfs_fill_block *block_buf =
+		calloc(block_count, sizeof(struct incfs_fill_block));
+	struct incfs_fill_blocks fill_blocks = {
+		.count = block_count,
+		.fill_blocks = ptr_to_u64(block_buf),
+	};
+	incfs_uuid_t id;
+	int fd = -1;
+	struct statvfs svfs;
+	unsigned long long free_disksz;
+
+	ret = statvfs(mount_dir, &svfs);
+	if (ret) {
+		ksft_print_msg("Can't get disk size. Skipping %s...\n", __func__);
+		return TEST_SKIP;
+	}
+
+	free_disksz = (unsigned long long)svfs.f_bavail * svfs.f_bsize;
+
+	if (FOUR_GB > free_disksz) {
+		ksft_print_msg("Not enough free disk space (%lldMB). Skipping %s...\n",
+				free_disksz >> 20, __func__);
+		return TEST_SKIP;
+	}
+
+	backing_dir = create_backing_dir(mount_dir);
+	if (!backing_dir)
+		goto failure;
+
+	if (!block_buf)
+		goto failure;
+
+	if (mount_fs_opt(mount_dir, backing_dir, "readahead=0", false) != 0)
+		goto failure;
+
+	cmd_fd = open_commands_file(mount_dir);
+	if (cmd_fd < 0)
+		goto failure;
+
+	if (emit_file(cmd_fd, NULL, "very_large_file", &id,
+		      (uint64_t)block_count * INCFS_DATA_FILE_BLOCK_SIZE,
+		      NULL) < 0)
+		goto failure;
+
+	for (i = 0; i < block_count; i++) {
+		block_buf[i].compression = COMPRESSION_NONE;
+		block_buf[i].block_index = i;
+		block_buf[i].data_len = INCFS_DATA_FILE_BLOCK_SIZE;
+		block_buf[i].data = ptr_to_u64(data);
+	}
+
+	fd = open_file_by_id(mount_dir, id, true);
+	if (fd < 0)
+		goto failure;
+
+	if (ioctl(fd, INCFS_IOC_FILL_BLOCKS, &fill_blocks) != block_count)
+		goto failure;
+
+	if (emit_file(cmd_fd, NULL, "very_very_large_file", &id, 1LL << 40,
+		      NULL) < 0)
+		goto failure;
+
+	result = TEST_SUCCESS;
+
+failure:
+	close(fd);
+	close(cmd_fd);
+	unlink("very_large_file");
+	umount(mount_dir);
+	free(backing_dir);
+	free(block_buf);
+	return result;
+}
+
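+/*
+ * Compare the mapped file block by block against the region of the
+ * original file starting at the mapping offset.
+ */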
+static int validate_mapped_file(const char *orig_name, const char *name,
+				size_t size, size_t offset)
+{
+	struct stat st;
+	int orig_fd = -1, fd = -1;
+	size_t block;
+	int result = TEST_FAILURE;
+
+	if (stat(name, &st)) {
+		ksft_print_msg("Failed to stat %s with error %s\n",
+			       name, strerror(errno));
+		goto failure;
+	}
+
+	if (size != st.st_size) {
+		ksft_print_msg("Mismatched file sizes for file %s - expected %llu, got %llu\n",
+				   name, size, st.st_size);
+		goto failure;
+	}
+
+	fd = open(name, O_RDONLY | O_CLOEXEC);
+	if (fd == -1) {
+		ksft_print_msg("Failed to open %s with error %s\n", name,
+			       strerror(errno));
+		goto failure;
+	}
+
+	orig_fd = open(orig_name, O_RDONLY | O_CLOEXEC);
+	if (orig_fd == -1) {
+		ksft_print_msg("Failed to open %s with error %s\n", orig_name,
+			       strerror(errno));
+		goto failure;
+	}
+
+	for (block = 0; block < size; block += INCFS_DATA_FILE_BLOCK_SIZE) {
+		uint8_t orig_data[INCFS_DATA_FILE_BLOCK_SIZE];
+		uint8_t data[INCFS_DATA_FILE_BLOCK_SIZE];
+		ssize_t orig_read, mapped_read;
+
+		orig_read = pread(orig_fd, orig_data,
+				 INCFS_DATA_FILE_BLOCK_SIZE, block + offset);
+		mapped_read = pread(fd, data, INCFS_DATA_FILE_BLOCK_SIZE,
+				    block);
+
+		if (orig_read < mapped_read ||
+		    mapped_read != min(size - block,
+				       INCFS_DATA_FILE_BLOCK_SIZE)) {
+			ksft_print_msg("Failed to read enough data: %llu %llu %llu %lld %lld\n",
+				       block, size, offset, orig_read,
+				       mapped_read);
+			goto failure;
+		}
+
+		if (memcmp(orig_data, data, mapped_read)) {
+			ksft_print_msg("Data doesn't match: %llu %llu %llu %lld %lld\n",
+				       block, size, offset, orig_read,
+				       mapped_read);
+			goto failure;
+		}
+	}
+
+	result = TEST_SUCCESS;
+
+failure:
+	close(orig_fd);
+	close(fd);
+	return result;
+}
+
+static int mapped_file_test(const char *mount_dir)
+{
+	char *backing_dir;
+	int result = TEST_FAILURE;
+	int cmd_fd = -1;
+	int i;
+	struct test_files_set test = get_test_files_set();
+	const int file_num = test.files_count;
+
+	backing_dir = create_backing_dir(mount_dir);
+	if (!backing_dir)
+		goto failure;
+
+	if (mount_fs_opt(mount_dir, backing_dir, "readahead=0", false) != 0)
+		goto failure;
+
+	cmd_fd = open_commands_file(mount_dir);
+	if (cmd_fd < 0)
+		goto failure;
+
+	for (i = 0; i < file_num; ++i) {
+		struct test_file *file = &test.files[i];
+		size_t blocks = file->size / INCFS_DATA_FILE_BLOCK_SIZE;
+		size_t mapped_offset = blocks / 4 *
+			INCFS_DATA_FILE_BLOCK_SIZE;
+		size_t mapped_size = file->size / 4 * 3 - mapped_offset;
+		struct incfs_create_mapped_file_args mfa;
+		char mapped_file_name[FILENAME_MAX];
+		char orig_file_path[PATH_MAX];
+		char mapped_file_path[PATH_MAX];
+
+		if (emit_file(cmd_fd, NULL, file->name, &file->id, file->size,
+					NULL) < 0)
+			goto failure;
+
+		if (emit_test_file_data(mount_dir, file))
+			goto failure;
+
+		if (snprintf(mapped_file_name, ARRAY_SIZE(mapped_file_name),
+					"%s.mapped", file->name) < 0)
+			goto failure;
+
+		mfa = (struct incfs_create_mapped_file_args) {
+			.size = mapped_size,
+			.mode = 0664,
+			.file_name = ptr_to_u64(mapped_file_name),
+			.source_file_id = file->id,
+			.source_offset = mapped_offset,
+		};
+
+		result = ioctl(cmd_fd, INCFS_IOC_CREATE_MAPPED_FILE, &mfa);
+		if (result) {
+			ksft_print_msg(
+				"Failed to create mapped file with error %d\n",
+				result);
+			goto failure;
+		}
+
+		result = snprintf(orig_file_path,
+				  ARRAY_SIZE(orig_file_path), "%s/%s",
+				  mount_dir, file->name);
+
+		if (result < 0 || result >= ARRAY_SIZE(mapped_file_path)) {
+			result = TEST_FAILURE;
+			goto failure;
+		}
+
+		result = snprintf(mapped_file_path,
+				  ARRAY_SIZE(mapped_file_path), "%s/%s",
+				  mount_dir, mapped_file_name);
+
+		if (result < 0 || result >= ARRAY_SIZE(mapped_file_path)) {
+			result = TEST_FAILURE;
+			goto failure;
+		}
+
+		result = validate_mapped_file(orig_file_path, mapped_file_path,
+					      mapped_size, mapped_offset);
+		if (result)
+			goto failure;
+	}
+
+failure:
+	close(cmd_fd);
+	umount(mount_dir);
+	free(backing_dir);
+	return result;
+}
+
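+/*
+ * A hand-crafted minimal backing file in the v1 on-disk format, used below
+ * to check that current kernels still mount files created by older incfs.
+ */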
+static const char v1_file[] = {
+	/* Header */
+	/* 0x00: Magic number */
+	0x49, 0x4e, 0x43, 0x46, 0x53, 0x00, 0x00, 0x00,
+	/* 0x08: Version */
+	0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	/* 0x10: Header size */
+	0x38, 0x00,
+	/* 0x12: Block size */
+	0x00, 0x10,
+	/* 0x14: Flags */
+	0x00, 0x00, 0x00, 0x00,
+	/* 0x18: First md offset */
+	0x46, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	/* 0x20: File size */
+	0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	/* 0x28: UUID */
+	0x8c, 0x7d, 0xd9, 0x22, 0xad, 0x47, 0x49, 0x4f,
+	0xc0, 0x2c, 0x38, 0x8e, 0x12, 0xc0, 0x0e, 0xac,
+
+	/* 0x38: Attribute */
+	0x6d, 0x65, 0x74, 0x61, 0x64, 0x61, 0x74, 0x61,
+	0x31, 0x32, 0x33, 0x31, 0x32, 0x33,
+
+	/* Attribute md record */
+	/* 0x46: Type */
+	0x02,
+	/* 0x47: Size */
+	0x25, 0x00,
+	/* 0x49: CRC */
+	0x9a, 0xef, 0xef, 0x72,
+	/* 0x4d: Next md offset */
+	0x75, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	/* 0x55: Prev md offset */
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	/* 0x5d: fa_offset */
+	0x38, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	/* 0x65: fa_size */
+	0x0e, 0x00,
+	/* 0x67: fa_crc */
+	0xfb, 0x5e, 0x72, 0x89,
+
+	/* Blockmap table */
+	/* 0x6b: First 10-byte entry */
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+
+	/* Blockmap md record */
+	/* 0x75: Type */
+	0x01,
+	/* 0x76: Size */
+	0x23, 0x00,
+	/* 0x78: CRC */
+	0x74, 0x45, 0xd3, 0xb9,
+	/* 0x7c: Next md offset */
+	0x00, 0x00, 0x00, 0x00,	0x00, 0x00, 0x00, 0x00,
+	/* 0x84: Prev md offset */
+	0x46, 0x00, 0x00, 0x00,	0x00, 0x00, 0x00, 0x00,
+	/* 0x8c: blockmap offset */
+	0x6b, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	/* 0x94: blockmap count */
+	0x01, 0x00, 0x00, 0x00,
+};
+
+static int compatibility_test(const char *mount_dir)
+{
+	static const char *name = "file";
+	int result = TEST_FAILURE;
+	char *backing_dir = NULL;
+	char *filename = NULL;
+	int fd = -1;
+	uint64_t size = 0x0c;
+
+	TEST(backing_dir = create_backing_dir(mount_dir), backing_dir);
+	TEST(filename = concat_file_name(backing_dir, name), filename);
+	TEST(fd = open(filename, O_CREAT | O_WRONLY | O_CLOEXEC, 0777),
+	     fd != -1);
+	TESTEQUAL(write(fd, v1_file, sizeof(v1_file)), sizeof(v1_file));
+	TESTEQUAL(fsetxattr(fd, INCFS_XATTR_SIZE_NAME, &size, sizeof(size), 0),
+		  0);
+	TESTEQUAL(mount_fs(mount_dir, backing_dir, 50), 0);
+	free(filename);
+	TEST(filename = concat_file_name(mount_dir, name), filename);
+	close(fd);
+	TEST(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+
+	result = TEST_SUCCESS;
+out:
+	close(fd);
+	umount(mount_dir);
+	free(backing_dir);
+	free(filename);
+	return result;
+}
+
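+/*
+ * Walk the backing file's metadata chain (the first record offset lives at
+ * byte 24 of the header; type 4 appears to be the status record), verify
+ * the stored data/hash blocks-written counters, then zero them out so the
+ * next mount has to recount.
+ */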
+static int zero_blocks_written_count(int fd, uint32_t data_blocks_written,
+				     uint32_t hash_blocks_written)
+{
+	int test_result = TEST_FAILURE;
+	uint64_t offset;
+	uint8_t type;
+	uint32_t bw;
+
+	/* Get first md record */
+	TESTEQUAL(pread(fd, &offset, sizeof(offset), 24), sizeof(offset));
+
+	/* Find status md record */
+	for (;;) {
+		TESTNE(offset, 0);
+		TESTEQUAL(pread(fd, &type, sizeof(type), le64_to_cpu(offset)),
+			  sizeof(type));
+		if (type == 4)
+			break;
+		TESTEQUAL(pread(fd, &offset, sizeof(offset),
+				le64_to_cpu(offset) + 7),
+			  sizeof(offset));
+	}
+
+	/* Read blocks_written */
+	offset = le64_to_cpu(offset);
+	TESTEQUAL(pread(fd, &bw, sizeof(bw), offset + 23), sizeof(bw));
+	TESTEQUAL(le32_to_cpu(bw), data_blocks_written);
+	TESTEQUAL(pread(fd, &bw, sizeof(bw), offset + 27), sizeof(bw));
+	TESTEQUAL(le32_to_cpu(bw), hash_blocks_written);
+
+	/* Write out zero */
+	bw = 0;
+	TESTEQUAL(pwrite(fd, &bw, sizeof(bw), offset + 23), sizeof(bw));
+	TESTEQUAL(pwrite(fd, &bw, sizeof(bw), offset + 27), sizeof(bw));
+
+	test_result = TEST_SUCCESS;
+out:
+	return test_result;
+}
+
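+/*
+ * Check INCFS_IOC_GET_BLOCK_COUNT, then zero the counters stored in the
+ * backing file and remount: the filled counts should read back as zero
+ * until a full INCFS_IOC_GET_FILLED_BLOCKS scan restores them.
+ */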
+static int validate_block_count(const char *mount_dir, const char *backing_dir,
+				struct test_file *file,
+				int total_data_blocks, int filled_data_blocks,
+				int total_hash_blocks, int filled_hash_blocks)
+{
+	char *filename = NULL;
+	char *backing_filename = NULL;
+	int fd = -1;
+	struct incfs_get_block_count_args bca = {};
+	int test_result = TEST_FAILURE;
+	struct incfs_filled_range ranges[128];
+	struct incfs_get_filled_blocks_args fba = {
+		.range_buffer = ptr_to_u64(ranges),
+		.range_buffer_size = sizeof(ranges),
+	};
+	int cmd_fd = -1;
+	struct incfs_permit_fill permit_fill;
+
+	TEST(filename = concat_file_name(mount_dir, file->name), filename);
+	TEST(backing_filename = concat_file_name(backing_dir, file->name),
+	     backing_filename);
+	TEST(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+
+	TESTEQUAL(ioctl(fd, INCFS_IOC_GET_BLOCK_COUNT, &bca), 0);
+	TESTEQUAL(bca.total_data_blocks_out, total_data_blocks);
+	TESTEQUAL(bca.filled_data_blocks_out, filled_data_blocks);
+	TESTEQUAL(bca.total_hash_blocks_out, total_hash_blocks);
+	TESTEQUAL(bca.filled_hash_blocks_out, filled_hash_blocks);
+
+	close(fd);
+	TESTEQUAL(umount(mount_dir), 0);
+	TEST(fd = open(backing_filename, O_RDWR | O_CLOEXEC), fd != -1);
+	TESTEQUAL(zero_blocks_written_count(fd, filled_data_blocks,
+					    filled_hash_blocks),
+		  TEST_SUCCESS);
+	close(fd);
+	fd = -1;
+	TESTEQUAL(mount_fs_opt(mount_dir, backing_dir, "readahead=0", false),
+		  0);
+	TEST(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+
+	TESTEQUAL(ioctl(fd, INCFS_IOC_GET_BLOCK_COUNT, &bca), 0);
+	TESTEQUAL(bca.total_data_blocks_out, total_data_blocks);
+	TESTEQUAL(bca.filled_data_blocks_out, 0);
+	TESTEQUAL(bca.total_hash_blocks_out, total_hash_blocks);
+	TESTEQUAL(bca.filled_hash_blocks_out, 0);
+
+	TEST(cmd_fd = open_commands_file(mount_dir), cmd_fd != -1);
+	permit_fill.file_descriptor = fd;
+	TESTEQUAL(ioctl(cmd_fd, INCFS_IOC_PERMIT_FILL, &permit_fill), 0);
+	do {
+		ioctl(fd, INCFS_IOC_GET_FILLED_BLOCKS, &fba);
+		fba.start_index = fba.index_out + 1;
+	} while (fba.index_out < fba.total_blocks_out);
+
+	TESTEQUAL(ioctl(fd, INCFS_IOC_GET_BLOCK_COUNT, &bca), 0);
+	TESTEQUAL(bca.total_data_blocks_out, total_data_blocks);
+	TESTEQUAL(bca.filled_data_blocks_out, filled_data_blocks);
+	TESTEQUAL(bca.total_hash_blocks_out, total_hash_blocks);
+	TESTEQUAL(bca.filled_hash_blocks_out, filled_hash_blocks);
+
+	test_result = TEST_SUCCESS;
+out:
+	close(cmd_fd);
+	close(fd);
+	free(filename);
+	free(backing_filename);
+	return test_result;
+}
+
+static int validate_data_block_count(const char *mount_dir,
+				     const char *backing_dir,
+				     struct test_file *file)
+{
+	const int total_data_blocks = 1 + (file->size - 1) /
+						INCFS_DATA_FILE_BLOCK_SIZE;
+	const int filled_data_blocks = (total_data_blocks + 1) / 2;
+
+	int test_result = TEST_FAILURE;
+	char *filename = NULL;
+	char *incomplete_filename = NULL;
+	struct stat stat_buf_incomplete, stat_buf_file;
+	int fd = -1;
+	struct incfs_get_block_count_args bca = {};
+	int i;
+
+	TEST(filename = concat_file_name(mount_dir, file->name), filename);
+	TEST(incomplete_filename = get_incomplete_filename(mount_dir, file->id),
+	     incomplete_filename);
+
+	TESTEQUAL(stat(filename, &stat_buf_file), 0);
+	TESTEQUAL(stat(incomplete_filename, &stat_buf_incomplete), 0);
+	TESTEQUAL(stat_buf_file.st_ino, stat_buf_incomplete.st_ino);
+	TESTEQUAL(stat_buf_file.st_nlink, 3);
+
+	TEST(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+	TESTEQUAL(ioctl(fd, INCFS_IOC_GET_BLOCK_COUNT, &bca), 0);
+	TESTEQUAL(bca.total_data_blocks_out, total_data_blocks);
+	TESTEQUAL(bca.filled_data_blocks_out, 0);
+	TESTEQUAL(bca.total_hash_blocks_out, 0);
+	TESTEQUAL(bca.filled_hash_blocks_out, 0);
+
+	for (i = 0; i < total_data_blocks; i += 2)
+		TESTEQUAL(emit_test_block(mount_dir, file, i), 0);
+
+	TESTEQUAL(ioctl(fd, INCFS_IOC_GET_BLOCK_COUNT, &bca), 0);
+	TESTEQUAL(bca.total_data_blocks_out, total_data_blocks);
+	TESTEQUAL(bca.filled_data_blocks_out, filled_data_blocks);
+	TESTEQUAL(bca.total_hash_blocks_out, 0);
+	TESTEQUAL(bca.filled_hash_blocks_out, 0);
+	close(fd);
+	fd = -1;
+
+	TESTEQUAL(validate_block_count(mount_dir, backing_dir, file,
+				       total_data_blocks, filled_data_blocks,
+				       0, 0),
+		  0);
+
+	for (i = 1; i < total_data_blocks; i += 2)
+		TESTEQUAL(emit_test_block(mount_dir, file, i), 0);
+
+	TESTEQUAL(stat(incomplete_filename, &stat_buf_incomplete), -1);
+	TESTEQUAL(errno, ENOENT);
+
+	test_result = TEST_SUCCESS;
+out:
+	close(fd);
+	free(incomplete_filename);
+	free(filename);
+	return test_result;
+}
+
+static int data_block_count_test(const char *mount_dir)
+{
+	int result = TEST_FAILURE;
+	char *backing_dir;
+	int cmd_fd = -1;
+	int i;
+	struct test_files_set test = get_test_files_set();
+
+	TEST(backing_dir = create_backing_dir(mount_dir), backing_dir);
+	TESTEQUAL(mount_fs_opt(mount_dir, backing_dir, "readahead=0", false),
+		  0);
+
+	for (i = 0; i < test.files_count; ++i) {
+		struct test_file *file = &test.files[i];
+
+		TEST(cmd_fd = open_commands_file(mount_dir), cmd_fd != -1);
+		TESTEQUAL(emit_file(cmd_fd, NULL, file->name, &file->id,
+				    file->size,	NULL),
+			  0);
+		close(cmd_fd);
+		cmd_fd = -1;
+
+		TESTEQUAL(validate_data_block_count(mount_dir, backing_dir,
+						    file),
+			  0);
+	}
+
+	result = TEST_SUCCESS;
+out:
+	close(cmd_fd);
+	umount(mount_dir);
+	free(backing_dir);
+	return result;
+}
+
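+/*
+ * Compute the expected Merkle tree block count from the file size, then
+ * check that emitting a partial hash is reflected in filled_hash_blocks_out,
+ * both directly and after a remount.
+ */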
+static int validate_hash_block_count(const char *mount_dir,
+				     const char *backing_dir,
+				     struct test_file *file)
+{
+	const int digest_size = SHA256_DIGEST_SIZE;
+	const int hash_per_block = INCFS_DATA_FILE_BLOCK_SIZE / digest_size;
+	const int total_data_blocks = 1 + (file->size - 1) /
+					INCFS_DATA_FILE_BLOCK_SIZE;
+
+	int result = TEST_FAILURE;
+	int hash_layer = total_data_blocks;
+	int total_hash_blocks = 0;
+	int filled_hash_blocks;
+	char *filename = NULL;
+	int fd = -1;
+	struct incfs_get_block_count_args bca = {};
+
+	while (hash_layer > 1) {
+		hash_layer = (hash_layer + hash_per_block - 1) / hash_per_block;
+		total_hash_blocks += hash_layer;
+	}
+	filled_hash_blocks = total_hash_blocks > 1 ? 1 : 0;
+
+	TEST(filename = concat_file_name(mount_dir, file->name), filename);
+	TEST(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+
+	TESTEQUAL(ioctl(fd, INCFS_IOC_GET_BLOCK_COUNT, &bca), 0);
+	TESTEQUAL(bca.total_data_blocks_out, total_data_blocks);
+	TESTEQUAL(bca.filled_data_blocks_out, 0);
+	TESTEQUAL(bca.total_hash_blocks_out, total_hash_blocks);
+	TESTEQUAL(bca.filled_hash_blocks_out, 0);
+
+	TESTEQUAL(emit_partial_test_file_hash(mount_dir, file), 0);
+
+	TESTEQUAL(ioctl(fd, INCFS_IOC_GET_BLOCK_COUNT, &bca), 0);
+	TESTEQUAL(bca.total_data_blocks_out, total_data_blocks);
+	TESTEQUAL(bca.filled_data_blocks_out, 0);
+	TESTEQUAL(bca.total_hash_blocks_out, total_hash_blocks);
+	TESTEQUAL(bca.filled_hash_blocks_out, filled_hash_blocks);
+	close(fd);
+	fd = -1;
+
+	if (filled_hash_blocks)
+		TESTEQUAL(validate_block_count(mount_dir, backing_dir, file,
+				       total_data_blocks, 0,
+				       total_hash_blocks, filled_hash_blocks),
+		  0);
+
+	result = TEST_SUCCESS;
+out:
+	close(fd);
+	free(filename);
+	return result;
+}
+
+static int hash_block_count_test(const char *mount_dir)
+{
+	int result = TEST_FAILURE;
+	char *backing_dir;
+	int cmd_fd = -1;
+	int i;
+	struct test_files_set test = get_test_files_set();
+
+	TEST(backing_dir = create_backing_dir(mount_dir), backing_dir);
+	TESTEQUAL(mount_fs_opt(mount_dir, backing_dir, "readahead=0", false),
+		  0);
+
+	for (i = 0; i < test.files_count; i++) {
+		struct test_file *file = &test.files[i];
+
+		TEST(cmd_fd = open_commands_file(mount_dir), cmd_fd != -1);
+		TESTEQUAL(crypto_emit_file(cmd_fd, NULL, file->name, &file->id,
+				     file->size, file->root_hash,
+				     file->sig.add_data),
+			  0);
+		close(cmd_fd);
+		cmd_fd = -1;
+
+		TESTEQUAL(validate_hash_block_count(mount_dir, backing_dir,
+						    &test.files[i]),
+			  0);
+	}
+
+	result = TEST_SUCCESS;
+out:
+	close(cmd_fd);
+	umount(mount_dir);
+	free(backing_dir);
+	return result;
+}
+
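+/* Check that the time elapsed since *start is expected_ms +/- 100ms */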
+static int is_close(struct timespec *start, int expected_ms)
+{
+	const int allowed_variance = 100;
+	int result = TEST_FAILURE;
+	struct timespec finish;
+	int diff;
+
+	TESTEQUAL(clock_gettime(CLOCK_MONOTONIC, &finish), 0);
+	diff = (finish.tv_sec - start->tv_sec) * 1000 +
+		(finish.tv_nsec - start->tv_nsec) / 1000000;
+
+	TESTCOND(diff >= expected_ms - allowed_variance);
+	TESTCOND(diff <= expected_ms + allowed_variance);
+	result = TEST_SUCCESS;
+out:
+	return result;
+}
+
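+/*
+ * Exercise INCFS_IOC_SET_READ_TIMEOUTS/INCFS_IOC_GET_READ_TIMEOUTS: per-UID
+ * min read, min pending and max pending times are verified against wall
+ * clock reads from this UID and from UID 2 in a forked child.
+ */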
+static int per_uid_read_timeouts_test(const char *mount_dir)
+{
+	struct test_file file = {
+		.name = "file",
+		.size = 16 * INCFS_DATA_FILE_BLOCK_SIZE
+	};
+
+	int result = TEST_FAILURE;
+	char *backing_dir = NULL;
+	int pid = -1;
+	int cmd_fd = -1;
+	char *filename = NULL;
+	int fd = -1;
+	struct timespec start;
+	char buffer[4096];
+	struct incfs_per_uid_read_timeouts purt_get[1];
+	struct incfs_get_read_timeouts_args grt = {
+		ptr_to_u64(purt_get),
+		sizeof(purt_get)
+	};
+	struct incfs_per_uid_read_timeouts purt_set[] = {
+		{
+			.uid = 0,
+			.min_time_us = 1000000,
+			.min_pending_time_us = 2000000,
+			.max_pending_time_us = 3000000,
+		},
+	};
+	struct incfs_set_read_timeouts_args srt = {
+		ptr_to_u64(purt_set),
+		sizeof(purt_set)
+	};
+	int status;
+
+	TEST(backing_dir = create_backing_dir(mount_dir), backing_dir);
+	TESTEQUAL(mount_fs_opt(mount_dir, backing_dir,
+			       "read_timeout_ms=1000,readahead=0", false), 0);
+
+	TEST(cmd_fd = open_commands_file(mount_dir), cmd_fd != -1);
+	TESTEQUAL(emit_file(cmd_fd, NULL, file.name, &file.id, file.size,
+			    NULL), 0);
+
+	TEST(filename = concat_file_name(mount_dir, file.name), filename);
+	TEST(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+	TESTEQUAL(fcntl(fd, F_SETFD, fcntl(fd, F_GETFD) | FD_CLOEXEC), 0);
+
+	/* With default mount options, reads fail after 1000ms */
+	TESTEQUAL(clock_gettime(CLOCK_MONOTONIC, &start), 0);
+	TESTEQUAL(pread(fd, buffer, sizeof(buffer), 0), -1);
+	TESTEQUAL(is_close(&start, 1000), 0);
+
+	grt.timeouts_array_size = 0;
+	TESTEQUAL(ioctl(cmd_fd, INCFS_IOC_GET_READ_TIMEOUTS, &grt), 0);
+	TESTEQUAL(grt.timeouts_array_size_out, 0);
+
+	/* Set it to 3000 */
+	TESTEQUAL(ioctl(cmd_fd, INCFS_IOC_SET_READ_TIMEOUTS, &srt), 0);
+	TESTEQUAL(clock_gettime(CLOCK_MONOTONIC, &start), 0);
+	TESTEQUAL(pread(fd, buffer, sizeof(buffer), 0), -1);
+	TESTEQUAL(is_close(&start, 3000), 0);
+	TESTEQUAL(ioctl(cmd_fd, INCFS_IOC_GET_READ_TIMEOUTS, &grt), -1);
+	TESTEQUAL(errno, E2BIG);
+	TESTEQUAL(grt.timeouts_array_size_out, sizeof(purt_get));
+	grt.timeouts_array_size = sizeof(purt_get);
+	TESTEQUAL(ioctl(cmd_fd, INCFS_IOC_GET_READ_TIMEOUTS, &grt), 0);
+	TESTEQUAL(grt.timeouts_array_size_out, sizeof(purt_get));
+	TESTEQUAL(purt_get[0].uid, purt_set[0].uid);
+	TESTEQUAL(purt_get[0].min_time_us, purt_set[0].min_time_us);
+	TESTEQUAL(purt_get[0].min_pending_time_us,
+		  purt_set[0].min_pending_time_us);
+	TESTEQUAL(purt_get[0].max_pending_time_us,
+		  purt_set[0].max_pending_time_us);
+
+	/* Still 1000 in UID 2 */
+	TESTEQUAL(clock_gettime(CLOCK_MONOTONIC, &start), 0);
+	TEST(pid = fork(), pid != -1);
+	if (pid == 0) {
+		TESTEQUAL(setuid(2), 0);
+		TESTEQUAL(pread(fd, buffer, sizeof(buffer), 0), -1);
+		exit(0);
+	}
+	TESTNE(wait(&status), -1);
+	TESTEQUAL(WEXITSTATUS(status), 0);
+	TESTEQUAL(is_close(&start, 1000), 0);
+
+	/* Set it to default */
+	purt_set[0].max_pending_time_us = UINT32_MAX;
+	TESTEQUAL(ioctl(cmd_fd, INCFS_IOC_SET_READ_TIMEOUTS, &srt), 0);
+	TESTEQUAL(clock_gettime(CLOCK_MONOTONIC, &start), 0);
+	TESTEQUAL(pread(fd, buffer, sizeof(buffer), 0), -1);
+	TESTEQUAL(is_close(&start, 1000), 0);
+
+	/* Test min read time */
+	TESTEQUAL(emit_test_block(mount_dir, &file, 0), 0);
+	TESTEQUAL(clock_gettime(CLOCK_MONOTONIC, &start), 0);
+	TESTEQUAL(pread(fd, buffer, sizeof(buffer), 0), sizeof(buffer));
+	TESTEQUAL(is_close(&start, 1000), 0);
+
+	/* Test min pending time */
+	purt_set[0].uid = 2;
+	TESTEQUAL(ioctl(cmd_fd, INCFS_IOC_SET_READ_TIMEOUTS, &srt), 0);
+	TESTEQUAL(clock_gettime(CLOCK_MONOTONIC, &start), 0);
+	TEST(pid = fork(), pid != -1);
+	if (pid == 0) {
+		TESTEQUAL(setuid(2), 0);
+		TESTEQUAL(pread(fd, buffer, sizeof(buffer), sizeof(buffer)),
+			  sizeof(buffer));
+		exit(0);
+	}
+	sleep(1);
+	TESTEQUAL(emit_test_block(mount_dir, &file, 1), 0);
+	TESTNE(wait(&status), -1);
+	TESTEQUAL(WEXITSTATUS(status), 0);
+	TESTEQUAL(is_close(&start, 2000), 0);
+
+	/* Clear timeouts */
+	srt.timeouts_array_size = 0;
+	TESTEQUAL(ioctl(cmd_fd, INCFS_IOC_SET_READ_TIMEOUTS, &srt), 0);
+	grt.timeouts_array_size = 0;
+	TESTEQUAL(ioctl(cmd_fd, INCFS_IOC_GET_READ_TIMEOUTS, &grt), 0);
+	TESTEQUAL(grt.timeouts_array_size_out, 0);
+
+	result = TEST_SUCCESS;
+out:
+	close(fd);
+
+	if (pid == 0)
+		exit(result);
+
+	free(filename);
+	close(cmd_fd);
+	umount(mount_dir);
+	free(backing_dir);
+	return result;
+}
+
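+/*
+ * Verify the inotify events emitted by the mount root, .index and
+ * .incomplete directories as a file is created, filled and unlinked.
+ */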
+#define DIRS 3
+static int inotify_test(const char *mount_dir)
+{
+	const char *mapped_file_name = "mapped_name";
+	struct test_file file = {
+		.name = "file",
+		.size = 16 * INCFS_DATA_FILE_BLOCK_SIZE
+	};
+
+	int result = TEST_FAILURE;
+	char *backing_dir = NULL, *index_dir = NULL, *incomplete_dir = NULL;
+	char *file_name = NULL;
+	int cmd_fd = -1;
+	int notify_fd = -1;
+	int wds[DIRS];
+	char buffer[DIRS * (sizeof(struct inotify_event) + NAME_MAX + 1)];
+	char *ptr = buffer;
+	struct inotify_event *event;
+	struct inotify_event *events[DIRS] = {};
+	const char *names[DIRS] = {};
+	int res;
+	char id[sizeof(incfs_uuid_t) * 2 + 1];
+	struct incfs_create_mapped_file_args mfa;
+
+	/* File creation should trigger inotify events in .index and .incomplete */
+	TEST(backing_dir = create_backing_dir(mount_dir), backing_dir);
+	TEST(index_dir = concat_file_name(mount_dir, ".index"), index_dir);
+	TEST(incomplete_dir = concat_file_name(mount_dir, ".incomplete"),
+	     incomplete_dir);
+	TESTEQUAL(mount_fs(mount_dir, backing_dir, 50), 0);
+	TEST(cmd_fd = open_commands_file(mount_dir), cmd_fd != -1);
+	TEST(notify_fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC),
+	     notify_fd != -1);
+	TEST(wds[0] = inotify_add_watch(notify_fd, mount_dir,
+					IN_CREATE | IN_DELETE),
+	     wds[0] != -1);
+	TEST(wds[1] = inotify_add_watch(notify_fd, index_dir,
+					IN_CREATE | IN_DELETE),
+	     wds[1] != -1);
+	TEST(wds[2] = inotify_add_watch(notify_fd, incomplete_dir,
+					IN_CREATE | IN_DELETE),
+	     wds[2] != -1);
+	TESTEQUAL(emit_file(cmd_fd, NULL, file.name, &file.id, file.size,
+			   NULL), 0);
+	TEST(res = read(notify_fd, buffer, sizeof(buffer)), res != -1);
+
+	while (ptr < buffer + res) {
+		int i;
+
+		event = (struct inotify_event *) ptr;
+		TESTCOND(ptr + sizeof(*event) <= buffer + res);
+		for (i = 0; i < DIRS; ++i)
+			if (event->wd == wds[i]) {
+				TESTEQUAL(events[i], NULL);
+				events[i] = event;
+				ptr += sizeof(*event);
+				names[i] = ptr;
+				ptr += events[i]->len;
+				TESTCOND(ptr <= buffer + res);
+				break;
+			}
+		TESTCOND(i < DIRS);
+	}
+
+	TESTNE(events[0], NULL);
+	TESTNE(events[1], NULL);
+	TESTNE(events[2], NULL);
+
+	bin2hex(id, file.id.bytes, sizeof(incfs_uuid_t));
+
+	TESTEQUAL(events[0]->mask, IN_CREATE);
+	TESTEQUAL(events[1]->mask, IN_CREATE);
+	TESTEQUAL(events[2]->mask, IN_CREATE);
+	TESTEQUAL(strcmp(names[0], file.name), 0);
+	TESTEQUAL(strcmp(names[1], id), 0);
+	TESTEQUAL(strcmp(names[2], id), 0);
+
+	/* Creating a mapped file triggers inotify event */
+	mfa = (struct incfs_create_mapped_file_args) {
+		.size = INCFS_DATA_FILE_BLOCK_SIZE,
+		.mode = 0664,
+		.file_name = ptr_to_u64(mapped_file_name),
+		.source_file_id = file.id,
+		.source_offset = INCFS_DATA_FILE_BLOCK_SIZE,
+	};
+
+	TEST(res = ioctl(cmd_fd, INCFS_IOC_CREATE_MAPPED_FILE, &mfa),
+	     res != -1);
+	TEST(res = read(notify_fd, buffer, sizeof(buffer)), res != -1);
+	event = (struct inotify_event *) buffer;
+	TESTEQUAL(event->wd, wds[0]);
+	TESTEQUAL(event->mask, IN_CREATE);
+	TESTEQUAL(strcmp(event->name, mapped_file_name), 0);
+
+	/* File completion should trigger an inotify event in .incomplete */
+	TESTEQUAL(emit_test_file_data(mount_dir, &file), 0);
+	TEST(res = read(notify_fd, buffer, sizeof(buffer)), res != -1);
+	event = (struct inotify_event *) buffer;
+	TESTEQUAL(event->wd, wds[2]);
+	TESTEQUAL(event->mask, IN_DELETE);
+	TESTEQUAL(strcmp(event->name, id), 0);
+
+	/* Unlinking should trigger inotify events in the mount root and .index */
+	TEST(file_name = concat_file_name(mount_dir, file.name), file_name);
+	TESTEQUAL(unlink(file_name), 0);
+	TEST(res = read(notify_fd, buffer, sizeof(buffer)), res != -1);
+	memset(events, 0, sizeof(events));
+	memset(names, 0, sizeof(names));
+	for (ptr = buffer; ptr < buffer + res;) {
+		int i;
+
+		event = (struct inotify_event *) ptr;
+		TESTCOND(ptr + sizeof(*event) <= buffer + res);
+		for (i = 0; i < DIRS; ++i)
+			if (event->wd == wds[i]) {
+				TESTEQUAL(events[i], NULL);
+				events[i] = event;
+				ptr += sizeof(*event);
+				names[i] = ptr;
+				ptr += events[i]->len;
+				TESTCOND(ptr <= buffer + res);
+				break;
+			}
+		TESTCOND(i < DIRS);
+	}
+
+	TESTNE(events[0], NULL);
+	TESTNE(events[1], NULL);
+	TESTEQUAL(events[2], NULL);
+
+	TESTEQUAL(events[0]->mask, IN_DELETE);
+	TESTEQUAL(events[1]->mask, IN_DELETE);
+	TESTEQUAL(strcmp(names[0], file.name), 0);
+	TESTEQUAL(strcmp(names[1], id), 0);
+
+	result = TEST_SUCCESS;
+out:
+	free(file_name);
+	close(notify_fd);
+	close(cmd_fd);
+	umount(mount_dir);
+	free(backing_dir);
+	free(index_dir);
+	free(incomplete_dir);
+	return result;
+}
+
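+/*
+ * Generate a throwaway 4096-bit RSA key for signing fs-verity digests. This
+ * uses the legacy RSA API, which OpenSSL 3 deprecates.
+ */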
+static EVP_PKEY *create_key(void)
+{
+	EVP_PKEY *pkey = NULL;
+	RSA *rsa = NULL;
+	BIGNUM *bn = NULL;
+
+	pkey = EVP_PKEY_new();
+	if (!pkey)
+		goto fail;
+
+	bn = BN_new();
+	if (!bn)
+		goto fail;
+	BN_set_word(bn, RSA_F4);
+
+	rsa = RSA_new();
+	if (!rsa)
+		goto fail;
+
+	RSA_generate_key_ex(rsa, 4096, bn, NULL);
+	EVP_PKEY_assign_RSA(pkey, rsa);
+
+	BN_free(bn);
+	return pkey;
+
+fail:
+	BN_free(bn);
+	EVP_PKEY_free(pkey);
+	return NULL;
+}
+
+static X509 *get_cert(EVP_PKEY *key)
+{
+	X509 *x509 = NULL;
+	X509_NAME *name = NULL;
+
+	x509 = X509_new();
+	if (!x509)
+		return NULL;
+
+	ASN1_INTEGER_set(X509_get_serialNumber(x509), 1);
+	X509_gmtime_adj(X509_get_notBefore(x509), 0);
+	X509_gmtime_adj(X509_get_notAfter(x509), 31536000L);
+	X509_set_pubkey(x509, key);
+
+	name = X509_get_subject_name(x509);
+	X509_NAME_add_entry_by_txt(name, "C", MBSTRING_ASC,
+		   (const unsigned char *)"US", -1, -1, 0);
+	X509_NAME_add_entry_by_txt(name, "ST", MBSTRING_ASC,
+		   (const unsigned char *)"CA", -1, -1, 0);
+	X509_NAME_add_entry_by_txt(name, "L", MBSTRING_ASC,
+		   (const unsigned char *)"San Jose", -1, -1, 0);
+	X509_NAME_add_entry_by_txt(name, "O", MBSTRING_ASC,
+		   (const unsigned char *)"Example", -1, -1, 0);
+	X509_NAME_add_entry_by_txt(name, "OU", MBSTRING_ASC,
+		   (const unsigned char *)"Org", -1, -1, 0);
+	X509_NAME_add_entry_by_txt(name, "CN", MBSTRING_ASC,
+		   (const unsigned char *)"www.example.com", -1, -1, 0);
+	X509_set_issuer_name(x509, name);
+
+	if (!X509_sign(x509, key, EVP_sha256())) {
+		X509_free(x509);
+		return NULL;
+	}
+
+	return x509;
+}
+
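+/*
+ * Produce a detached PKCS#7 signature of @data with @key/@cert; the caller
+ * owns the buffer returned via *sig.
+ */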
+static int sign(EVP_PKEY *key, X509 *cert, const char *data, size_t len,
+		unsigned char **sig, size_t *sig_len)
+{
+	const int pkcs7_flags = PKCS7_BINARY | PKCS7_NOATTR | PKCS7_PARTIAL |
+			  PKCS7_DETACHED;
+	const EVP_MD *md = EVP_sha256();
+
+	int result = TEST_FAILURE;
+
+	BIO *bio = NULL;
+	PKCS7 *p7 = NULL;
+	unsigned char *bio_buffer;
+
+	TEST(bio = BIO_new_mem_buf(data, len), bio);
+	TEST(p7 = PKCS7_sign(NULL, NULL, NULL, bio, pkcs7_flags), p7);
+	TESTNE(PKCS7_sign_add_signer(p7, cert, key, md, pkcs7_flags), 0);
+	TESTEQUAL(PKCS7_final(p7, bio, pkcs7_flags), 1);
+	TEST(*sig_len = i2d_PKCS7(p7, NULL), *sig_len);
+	TEST(bio_buffer = malloc(*sig_len), bio_buffer);
+	*sig = bio_buffer;
+	TEST(*sig_len = i2d_PKCS7(p7, &bio_buffer), *sig_len);
+	TESTEQUAL(PKCS7_verify(p7, NULL, NULL, bio, NULL,
+			 pkcs7_flags | PKCS7_NOVERIFY | PKCS7_NOSIGS), 1);
+
+	result = TEST_SUCCESS;
+out:
+	PKCS7_free(p7);
+	BIO_free(bio);
+	return result;
+}
+
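+/*
+ * Probe for fs-verity support by attempting FS_IOC_ENABLE_VERITY on a
+ * scratch file: EOPNOTSUPP means the kernel lacks it.
+ */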
+static int verity_installed(const char *mount_dir, int cmd_fd, bool *installed)
+{
+	int result = TEST_FAILURE;
+	char *filename = NULL;
+	int fd = -1;
+	struct test_file *file = &get_test_files_set().files[0];
+
+	TESTEQUAL(emit_file(cmd_fd, NULL, file->name, &file->id, file->size,
+			    NULL), 0);
+	TEST(filename = concat_file_name(mount_dir, file->name), filename);
+	TEST(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+	TESTEQUAL(ioctl(fd, FS_IOC_ENABLE_VERITY, NULL), -1);
+	*installed = errno != EOPNOTSUPP;
+
+	result = TEST_SUCCESS;
+out:
+	close(fd);
+	if (filename)
+		remove(filename);
+	free(filename);
+	return result;
+}
+
+static int enable_verity(const char *mount_dir, struct test_file *file,
+			   EVP_PKEY *key, X509 *cert, bool use_signatures)
+{
+	int result = TEST_FAILURE;
+	char *filename = NULL;
+	int fd = -1;
+	struct fsverity_enable_arg fear = {
+		.version = 1,
+		.hash_algorithm = FS_VERITY_HASH_ALG_SHA256,
+		.block_size = INCFS_DATA_FILE_BLOCK_SIZE,
+		.sig_size = 0,
+		.sig_ptr = 0,
+	};
+	struct {
+		__u8 version;           /* must be 1 */
+		__u8 hash_algorithm;    /* Merkle tree hash algorithm */
+		__u8 log_blocksize;     /* log2 of size of data and tree blocks */
+		__u8 salt_size;         /* size of salt in bytes; 0 if none */
+		__le32 sig_size;        /* must be 0 */
+		__le64 data_size;       /* size of file the Merkle tree is built over */
+		__u8 root_hash[64];     /* Merkle tree root hash */
+		__u8 salt[32];          /* salt prepended to each hashed block */
+		__u8 __reserved[144];   /* must be 0's */
+	} __packed fsverity_descriptor = {
+		.version = 1,
+		.hash_algorithm = 1,
+		.log_blocksize = 12,
+		.data_size = file->size,
+	};
+	struct {
+		char magic[8];                  /* must be "FSVerity" */
+		__le16 digest_algorithm;
+		__le16 digest_size;
+		__u8 digest[32];
+	} __packed fsverity_signed_digest =  {
+		.digest_algorithm = 1,
+		.digest_size = 32
+	};
+	unsigned char *sig = NULL;
+	size_t sig_size = 0;
+	uint64_t flags;
+	struct statx statxbuf = {};
+
+	memcpy(fsverity_signed_digest.magic, "FSVerity", 8);
+
+	TEST(filename = concat_file_name(mount_dir, file->name), filename);
+	TESTEQUAL(syscall(__NR_statx, AT_FDCWD, filename, 0, STATX_ALL,
+			  &statxbuf), 0);
+	TESTEQUAL(statxbuf.stx_attributes_mask & STATX_ATTR_VERITY,
+		  STATX_ATTR_VERITY);
+	TESTEQUAL(statxbuf.stx_attributes & STATX_ATTR_VERITY, 0);
+	TEST(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+	TESTEQUAL(ioctl(fd, FS_IOC_GETFLAGS, &flags), 0);
+	TESTEQUAL(flags & FS_VERITY_FL, 0);
+
+	/* First try to enable verity with a digest that is still zeroed */
+	if (key) {
+		TESTEQUAL(sign(key, cert, (void *)&fsverity_signed_digest,
+			    sizeof(fsverity_signed_digest), &sig, &sig_size),
+			  0);
+
+		fear.sig_size = sig_size;
+		fear.sig_ptr = ptr_to_u64(sig);
+		TESTEQUAL(ioctl(fd, FS_IOC_ENABLE_VERITY, &fear), -1);
+	}
+
+	/* Now try with correct digest */
+	memcpy(fsverity_descriptor.root_hash, file->root_hash, 32);
+	sha256((char *)&fsverity_descriptor, sizeof(fsverity_descriptor),
+	       (char *)fsverity_signed_digest.digest);
+
+	if (ioctl(fd, FS_IOC_ENABLE_VERITY, NULL) == -1 &&
+	    errno == EOPNOTSUPP) {
+		result = TEST_SUCCESS;
+		goto out;
+	}
+
+	free(sig);
+	sig = NULL;
+
+	if (key)
+		TESTEQUAL(sign(key, cert, (void *)&fsverity_signed_digest,
+			       sizeof(fsverity_signed_digest),
+			       &sig, &sig_size),
+		  0);
+
+	if (use_signatures) {
+		fear.sig_size = sig_size;
+		file->verity_sig_size = sig_size;
+		fear.sig_ptr = ptr_to_u64(sig);
+		file->verity_sig = sig;
+		sig = NULL;
+	} else {
+		fear.sig_size = 0;
+		fear.sig_ptr = 0;
+	}
+	TESTEQUAL(ioctl(fd, FS_IOC_ENABLE_VERITY, &fear), 0);
+
+	result = TEST_SUCCESS;
+out:
+	free(sig);
+	close(fd);
+	free(filename);
+	return result;
+}
+
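+/* Despite the name, this only checks that @buf contains @size zero bytes */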
+static int memzero(const unsigned char *buf, size_t size)
+{
+	size_t i;
+
+	for (i = 0; i < size; ++i)
+		if (buf[i])
+			return -1;
+	return 0;
+}
+
+static int validate_verity(const char *mount_dir, struct test_file *file)
+{
+	int result = TEST_FAILURE;
+	char *filename = NULL;
+	int fd = -1;
+	uint64_t flags;
+	struct fsverity_digest *digest;
+	struct statx statxbuf = {};
+	struct fsverity_read_metadata_arg frma = {};
+	uint8_t *buf = NULL;
+	struct fsverity_descriptor desc;
+
+	TEST(digest = malloc(sizeof(struct fsverity_digest) +
+			     INCFS_MAX_HASH_SIZE), digest != NULL);
+	TEST(filename = concat_file_name(mount_dir, file->name), filename);
+	TESTEQUAL(syscall(__NR_statx, AT_FDCWD, filename, 0, STATX_ALL,
+			  &statxbuf), 0);
+	TESTEQUAL(statxbuf.stx_attributes & STATX_ATTR_VERITY,
+		  STATX_ATTR_VERITY);
+	TEST(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+	TESTEQUAL(ioctl(fd, FS_IOC_GETFLAGS, &flags), 0);
+	TESTEQUAL(flags & FS_VERITY_FL, FS_VERITY_FL);
+	digest->digest_size = INCFS_MAX_HASH_SIZE;
+	TESTEQUAL(ioctl(fd, FS_IOC_MEASURE_VERITY, digest), 0);
+	TESTEQUAL(digest->digest_algorithm, FS_VERITY_HASH_ALG_SHA256);
+	TESTEQUAL(digest->digest_size, 32);
+
+	if (file->verity_sig) {
+		TEST(buf = malloc(file->verity_sig_size), buf);
+		frma = (struct fsverity_read_metadata_arg) {
+			.metadata_type = FS_VERITY_METADATA_TYPE_SIGNATURE,
+			.length = file->verity_sig_size,
+			.buf_ptr = ptr_to_u64(buf),
+		};
+		TESTEQUAL(ioctl(fd, FS_IOC_READ_VERITY_METADATA, &frma),
+			  file->verity_sig_size);
+		TESTEQUAL(memcmp(buf, file->verity_sig, file->verity_sig_size),
+			  0);
+	} else {
+		frma = (struct fsverity_read_metadata_arg) {
+			.metadata_type = FS_VERITY_METADATA_TYPE_SIGNATURE,
+		};
+		TESTEQUAL(ioctl(fd, FS_IOC_READ_VERITY_METADATA, &frma), -1);
+		TESTEQUAL(errno, ENODATA);
+	}
+
+	frma = (struct fsverity_read_metadata_arg) {
+		.metadata_type = FS_VERITY_METADATA_TYPE_DESCRIPTOR,
+		.length = sizeof(desc),
+		.buf_ptr = ptr_to_u64(&desc),
+	};
+	TESTEQUAL(ioctl(fd, FS_IOC_READ_VERITY_METADATA, &frma),
+		  sizeof(desc));
+	TESTEQUAL(desc.version, 1);
+	TESTEQUAL(desc.hash_algorithm, FS_VERITY_HASH_ALG_SHA256);
+	TESTEQUAL(desc.log_blocksize, ilog2(INCFS_DATA_FILE_BLOCK_SIZE));
+	TESTEQUAL(desc.salt_size, 0);
+	TESTEQUAL(desc.__reserved_0x04, 0);
+	TESTEQUAL(desc.data_size, file->size);
+	TESTEQUAL(memcmp(desc.root_hash, file->root_hash, SHA256_DIGEST_SIZE),
+		  0);
+	TESTEQUAL(memzero(desc.root_hash + SHA256_DIGEST_SIZE,
+			  sizeof(desc.root_hash) - SHA256_DIGEST_SIZE), 0);
+	TESTEQUAL(memzero(desc.salt, sizeof(desc.salt)), 0);
+	TESTEQUAL(memzero(desc.__reserved, sizeof(desc.__reserved)), 0);
+
+	result = TEST_SUCCESS;
+out:
+	free(buf);
+	close(fd);
+	free(filename);
+	free(digest);
+	return result;
+}
+
+static int verity_test_optional_sigs(const char *mount_dir, bool use_signatures)
+{
+	int result = TEST_FAILURE;
+	char *backing_dir = NULL;
+	bool installed;
+	int cmd_fd = -1;
+	int i;
+	struct test_files_set test = get_test_files_set();
+	const int file_num = test.files_count;
+	EVP_PKEY *key = NULL;
+	X509 *cert = NULL;
+	BIO *mem = NULL;
+	long len;
+	void *ptr;
+	FILE *proc_key_fd = NULL;
+	char *line = NULL;
+	size_t read = 0;
+	int key_id = -1;
+
+	TEST(backing_dir = create_backing_dir(mount_dir), backing_dir);
+	TESTEQUAL(mount_fs_opt(mount_dir, backing_dir, "readahead=0", false),
+		  0);
+	TEST(cmd_fd = open_commands_file(mount_dir), cmd_fd != -1);
+	TESTEQUAL(verity_installed(mount_dir, cmd_fd, &installed), 0);
+	if (!installed) {
+		result = TEST_SUCCESS;
+		goto out;
+	}
+	TEST(key = create_key(), key);
+	TEST(cert = get_cert(key), cert);
+
+	TEST(proc_key_fd = fopen("/proc/keys", "r"), proc_key_fd != NULL);
+	while (getline(&line, &read, proc_key_fd) != -1)
+		if (strstr(line, ".fs-verity"))
+			key_id = strtol(line, NULL, 16);
+
+	TEST(mem = BIO_new(BIO_s_mem()), mem != NULL);
+	TESTEQUAL(i2d_X509_bio(mem, cert), 1);
+	TEST(len = BIO_get_mem_data(mem, &ptr), len != 0);
+	TESTCOND(key_id == -1
+		 || syscall(__NR_add_key, "asymmetric", "test:key", ptr, len,
+			    key_id) != -1);
+
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+
+		build_mtree(file);
+		TESTEQUAL(crypto_emit_file(cmd_fd, NULL, file->name, &file->id,
+				     file->size, file->root_hash,
+				     file->sig.add_data), 0);
+
+		TESTEQUAL(load_hash_tree(mount_dir, file), 0);
+		TESTEQUAL(enable_verity(mount_dir, file, key, cert,
+					use_signatures),
+			  0);
+	}
+
+	for (i = 0; i < file_num; i++)
+		TESTEQUAL(validate_verity(mount_dir, &test.files[i]), 0);
+
+	close(cmd_fd);
+	cmd_fd = -1;
+	TESTEQUAL(umount(mount_dir), 0);
+	TESTEQUAL(mount_fs_opt(mount_dir, backing_dir, "readahead=0", false),
+		  0);
+
+	for (i = 0; i < file_num; i++)
+		TESTEQUAL(validate_verity(mount_dir, &test.files[i]), 0);
+
+	result = TEST_SUCCESS;
+out:
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+
+		free(file->mtree);
+		free(file->verity_sig);
+
+		file->mtree = NULL;
+		file->verity_sig = NULL;
+	}
+
+	free(line);
+	BIO_free(mem);
+	X509_free(cert);
+	EVP_PKEY_free(key);
+	if (proc_key_fd)
+		fclose(proc_key_fd);
+	close(cmd_fd);
+	umount(mount_dir);
+	free(backing_dir);
+	return result;
+}
+
+static int verity_test(const char *mount_dir)
+{
+	int result = TEST_FAILURE;
+
+	TESTEQUAL(verity_test_optional_sigs(mount_dir, true), TEST_SUCCESS);
+	TESTEQUAL(verity_test_optional_sigs(mount_dir, false), TEST_SUCCESS);
+	result = TEST_SUCCESS;
+out:
+	return result;
+}
+
+static int verity_file_valid(const char *mount_dir, struct test_file *file)
+{
+	int result = TEST_FAILURE;
+	char *filename = NULL;
+	int fd = -1;
+	uint8_t buffer[INCFS_DATA_FILE_BLOCK_SIZE];
+	struct incfs_get_file_sig_args gfsa = {
+		.file_signature = ptr_to_u64(buffer),
+		.file_signature_buf_size = sizeof(buffer),
+	};
+	int i;
+
+	TEST(filename = concat_file_name(mount_dir, file->name), filename);
+	TEST(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+	TESTEQUAL(ioctl(fd, INCFS_IOC_READ_FILE_SIGNATURE, &gfsa), 0);
+	for (i = 0; i < file->size; i += sizeof(buffer))
+		TESTEQUAL(pread(fd, buffer, sizeof(buffer), i),
+			  file->size - i > sizeof(buffer) ?
+				sizeof(buffer) : file->size - i);
+
+	result = TEST_SUCCESS;
+out:
+	close(fd);
+	free(filename);
+	return result;
+}
+
+static int enable_verity_test(const char *mount_dir)
+{
+	int result = TEST_FAILURE;
+	char *backing_dir = NULL;
+	bool installed;
+	int cmd_fd = -1;
+	struct test_files_set test = get_test_files_set();
+	int i;
+
+	TEST(backing_dir = create_backing_dir(mount_dir), backing_dir);
+	TESTEQUAL(mount_fs(mount_dir, backing_dir, 0), 0);
+	TEST(cmd_fd = open_commands_file(mount_dir), cmd_fd != -1);
+	TESTEQUAL(verity_installed(mount_dir, cmd_fd, &installed), 0);
+	if (!installed) {
+		result = TEST_SUCCESS;
+		goto out;
+	}
+	for (i = 0; i < test.files_count; ++i) {
+		struct test_file *file = &test.files[i];
+
+		TESTEQUAL(emit_file(cmd_fd, NULL, file->name, &file->id,
+				     file->size, NULL), 0);
+		TESTEQUAL(emit_test_file_data(mount_dir, file), 0);
+		TESTEQUAL(enable_verity(mount_dir, file, NULL, NULL, false), 0);
+	}
+
+	/* Check files are valid on disk */
+	close(cmd_fd);
+	cmd_fd = -1;
+	TESTEQUAL(umount(mount_dir), 0);
+	TESTEQUAL(mount_fs(mount_dir, backing_dir, 0), 0);
+	for (i = 0; i < test.files_count; ++i)
+		TESTEQUAL(verity_file_valid(mount_dir, &test.files[i]), 0);
+
+	result = TEST_SUCCESS;
+out:
+	close(cmd_fd);
+	umount(mount_dir);
+	free(backing_dir);
+	return result;
+}
+
+static int mmap_test(const char *mount_dir)
+{
+	int result = TEST_FAILURE;
+	char *backing_dir = NULL;
+	int cmd_fd = -1;
+	/*
+	 * File is big enough to have a two layer tree with two hashes in the
+	 * higher level, so we can corrupt the second one
+	 */
+	int shas_per_block = INCFS_DATA_FILE_BLOCK_SIZE / SHA256_DIGEST_SIZE;
+	struct test_file file = {
+		  .name = "file",
+		  .size = INCFS_DATA_FILE_BLOCK_SIZE * shas_per_block * 2,
+	};
+	char *filename = NULL;
+	int fd = -1;
+	char *addr = (void *)-1;
+
+	TEST(backing_dir = create_backing_dir(mount_dir), backing_dir);
+	TESTEQUAL(mount_fs(mount_dir, backing_dir, 0), 0);
+	TEST(cmd_fd = open_commands_file(mount_dir), cmd_fd != -1);
+
+	TESTEQUAL(build_mtree(&file), 0);
+	file.mtree[1].data[INCFS_DATA_FILE_BLOCK_SIZE] ^= 0xff;
+	TESTEQUAL(crypto_emit_file(cmd_fd, NULL, file.name, &file.id,
+			       file.size, file.root_hash,
+			       file.sig.add_data), 0);
+	TESTEQUAL(emit_test_file_data(mount_dir, &file), 0);
+	TESTEQUAL(load_hash_tree(mount_dir, &file), 0);
+	TEST(filename = concat_file_name(mount_dir, file.name), filename);
+	TEST(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+	TEST(addr = mmap(NULL, file.size, PROT_READ, MAP_PRIVATE, fd, 0),
+	     addr != (void *) -1);
+	TESTEQUAL(mlock(addr, INCFS_DATA_FILE_BLOCK_SIZE), 0);
+	TESTEQUAL(munlock(addr, INCFS_DATA_FILE_BLOCK_SIZE), 0);
+	TESTEQUAL(mlock(addr + shas_per_block * INCFS_DATA_FILE_BLOCK_SIZE,
+			INCFS_DATA_FILE_BLOCK_SIZE), -1);
+	TESTEQUAL(mlock(addr + (shas_per_block - 1) *
+			       INCFS_DATA_FILE_BLOCK_SIZE,
+			INCFS_DATA_FILE_BLOCK_SIZE), 0);
+	TESTEQUAL(munlock(addr + (shas_per_block - 1) *
+				 INCFS_DATA_FILE_BLOCK_SIZE,
+			  INCFS_DATA_FILE_BLOCK_SIZE), 0);
+	TESTEQUAL(mlock(addr + (shas_per_block - 1) *
+			       INCFS_DATA_FILE_BLOCK_SIZE,
+			INCFS_DATA_FILE_BLOCK_SIZE * 2), -1);
+	TESTEQUAL(munmap(addr, file.size), 0);
+
+	result = TEST_SUCCESS;
+out:
+	free(file.mtree);
+	close(fd);
+	free(filename);
+	close(cmd_fd);
+	umount(mount_dir);
+	free(backing_dir);
+	return result;
+}
+
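+/*
+ * Preallocating the backing file inflates its block count; writing the
+ * actual data should shrink it back down.
+ */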
+static int truncate_test(const char *mount_dir)
+{
+	int result = TEST_FAILURE;
+	char *backing_dir = NULL;
+	int cmd_fd = -1;
+	struct test_file file = {
+		  .name = "file",
+		  .size = INCFS_DATA_FILE_BLOCK_SIZE,
+	};
+	char *backing_file = NULL;
+	int fd = -1;
+	struct stat st;
+
+	TEST(backing_dir = create_backing_dir(mount_dir), backing_dir);
+	TESTEQUAL(mount_fs(mount_dir, backing_dir, 0), 0);
+	TEST(cmd_fd = open_commands_file(mount_dir), cmd_fd != -1);
+	TESTEQUAL(emit_file(cmd_fd, NULL, file.name, &file.id, file.size, NULL),
+		  0);
+	TEST(backing_file = concat_file_name(backing_dir, file.name),
+	     backing_file);
+	TEST(fd = open(backing_file, O_RDWR | O_CLOEXEC), fd != -1);
+	TESTEQUAL(stat(backing_file, &st), 0);
+	TESTCOND(st.st_blocks < 128);
+	TESTEQUAL(fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 1 << 24), 0);
+	TESTEQUAL(stat(backing_file, &st), 0);
+	TESTCOND(st.st_blocks > 32768);
+	TESTEQUAL(emit_test_file_data(mount_dir, &file), 0);
+	TESTEQUAL(stat(backing_file, &st), 0);
+	TESTCOND(st.st_blocks < 128);
+
+	result = TEST_SUCCESS;
+out:
+	close(fd);
+	free(backing_file);
+	close(cmd_fd);
+	umount(mount_dir);
+	free(backing_dir);
+	return result;
+}
+
+static int stat_file_test(const char *mount_dir, int cmd_fd,
+			  struct test_file *file)
+{
+	int result = TEST_FAILURE;
+	struct stat st;
+	char *filename = NULL;
+
+	TESTEQUAL(emit_file(cmd_fd, NULL, file->name, &file->id,
+				     file->size, NULL), 0);
+	TEST(filename = concat_file_name(mount_dir, file->name), filename);
+	TESTEQUAL(stat(filename, &st), 0);
+	TESTCOND(st.st_blocks < 32);
+	TESTEQUAL(emit_test_file_data(mount_dir, file), 0);
+	TESTEQUAL(stat(filename, &st), 0);
+	TESTCOND(st.st_blocks > file->size / 512);
+
+	result = TEST_SUCCESS;
+out:
+	free(filename);
+	return result;
+}
+
+static int stat_test(const char *mount_dir)
+{
+	int result = TEST_FAILURE;
+	char *backing_dir = NULL;
+	int cmd_fd = -1;
+	int i;
+	struct test_files_set test = get_test_files_set();
+	const int file_num = test.files_count;
+
+	TEST(backing_dir = create_backing_dir(mount_dir), backing_dir);
+	TESTEQUAL(mount_fs(mount_dir, backing_dir, 0), 0);
+	TEST(cmd_fd = open_commands_file(mount_dir), cmd_fd != -1);
+
+	for (i = 0; i < file_num; i++) {
+		struct test_file *file = &test.files[i];
+
+		TESTEQUAL(stat_file_test(mount_dir, cmd_fd, file), 0);
+	}
+
+	result = TEST_SUCCESS;
+out:
+	close(cmd_fd);
+	umount(mount_dir);
+	free(backing_dir);
+	return result;
+}
+
+#define SYSFS_DIR "/sys/fs/incremental-fs/instances/test_node/"
+
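+/* Read a sysfs counter of the test_node instance and compare it to @value */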
+static int sysfs_test_value(const char *name, uint64_t value)
+{
+	int result = TEST_FAILURE;
+	char *filename = NULL;
+	FILE *file = NULL;
+	uint64_t res;
+
+	TEST(filename = concat_file_name(SYSFS_DIR, name), filename);
+	TEST(file = fopen(filename, "re"), file);
+	TESTEQUAL(fscanf(file, "%lu", &res), 1);
+	TESTEQUAL(res, value);
+
+	result = TEST_SUCCESS;
+out:
+	if (file)
+		fclose(file);
+	free(filename);
+	return result;
+}
+
+static int sysfs_test_value_range(const char *name, uint64_t low, uint64_t high)
+{
+	int result = TEST_FAILURE;
+	char *filename = NULL;
+	FILE *file = NULL;
+	uint64_t res;
+
+	TEST(filename = concat_file_name(SYSFS_DIR, name), filename);
+	TEST(file = fopen(filename, "re"), file);
+	TESTEQUAL(fscanf(file, "%lu", &res), 1);
+	TESTCOND(res >= low && res <= high);
+
+	result = TEST_SUCCESS;
+out:
+	if (file)
+		fclose(file);
+	free(filename);
+	return result;
+}
+
+static int ioctl_test_last_error(int cmd_fd, const incfs_uuid_t *file_id,
+				 int page, int error)
+{
+	int result = TEST_FAILURE;
+	struct incfs_get_last_read_error_args glre;
+
+	TESTEQUAL(ioctl(cmd_fd, INCFS_IOC_GET_LAST_READ_ERROR, &glre), 0);
+	if (file_id)
+		TESTEQUAL(memcmp(&glre.file_id_out, file_id, sizeof(*file_id)),
+			  0);
+
+	TESTEQUAL(glre.page_out, page);
+	TESTEQUAL(glre.errno_out, error);
+	result = TEST_SUCCESS;
+out:
+	return result;
+}
+
+static int sysfs_test(const char *mount_dir)
+{
+	int result = TEST_FAILURE;
+	char *backing_dir = NULL;
+	int cmd_fd = -1;
+	struct test_file file = {
+		  .name = "file",
+		  .size = INCFS_DATA_FILE_BLOCK_SIZE,
+	};
+	char *filename = NULL;
+	int fd = -1;
+	int pid = -1;
+	char buffer[32];
+	char *null_buf = NULL;
+	int status;
+	struct incfs_per_uid_read_timeouts purt_set[] = {
+		{
+			.uid = 0,
+			.min_time_us = 1000000,
+			.min_pending_time_us = 1000000,
+			.max_pending_time_us = 2000000,
+		},
+	};
+	struct incfs_set_read_timeouts_args srt = {
+		ptr_to_u64(purt_set),
+		sizeof(purt_set)
+	};
+
+	TEST(backing_dir = create_backing_dir(mount_dir), backing_dir);
+	TESTEQUAL(mount_fs_opt(mount_dir, backing_dir, "sysfs_name=test_node",
+			       false),
+		  0);
+	TEST(cmd_fd = open_commands_file(mount_dir), cmd_fd != -1);
+	TESTEQUAL(build_mtree(&file), 0);
+	file.root_hash[0] ^= 0xff;
+	TESTEQUAL(crypto_emit_file(cmd_fd, NULL, file.name, &file.id, file.size,
+				   file.root_hash, file.sig.add_data),
+		  0);
+	TEST(filename = concat_file_name(mount_dir, file.name), filename);
+	TEST(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+	TESTEQUAL(ioctl_test_last_error(cmd_fd, NULL, 0, 0), 0);
+	TESTEQUAL(sysfs_test_value("reads_failed_timed_out", 0), 0);
+	TESTEQUAL(read(fd, null_buf, 1), -1);
+	TESTEQUAL(ioctl_test_last_error(cmd_fd, &file.id, 0, -ETIME), 0);
+	TESTEQUAL(sysfs_test_value("reads_failed_timed_out", 2), 0);
+
+	TESTEQUAL(emit_test_file_data(mount_dir, &file), 0);
+	TESTEQUAL(sysfs_test_value("reads_failed_hash_verification", 0), 0);
+	TESTEQUAL(read(fd, null_buf, 1), -1);
+	TESTEQUAL(sysfs_test_value("reads_failed_hash_verification", 1), 0);
+	TESTSYSCALL(close(fd));
+	fd = -1;
+
+	TESTSYSCALL(unlink(filename));
+	TESTEQUAL(mount_fs_opt(mount_dir, backing_dir,
+			       "read_timeout_ms=10000,sysfs_name=test_node",
+			       true),
+		  0);
+	TESTEQUAL(emit_file(cmd_fd, NULL, file.name, &file.id, file.size, NULL),
+		  0);
+	TEST(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+	TESTSYSCALL(fcntl(fd, F_SETFD, fcntl(fd, F_GETFD) | FD_CLOEXEC));
+	TEST(pid = fork(), pid != -1);
+	if (pid == 0) {
+		TESTEQUAL(read(fd, buffer, sizeof(buffer)), sizeof(buffer));
+		exit(0);
+	}
+	sleep(1);
+	TESTEQUAL(sysfs_test_value("reads_delayed_pending", 0), 0);
+	TESTEQUAL(emit_test_file_data(mount_dir, &file), 0);
+	TESTNE(wait(&status), -1);
+	TESTEQUAL(status, 0);
+	TESTEQUAL(sysfs_test_value("reads_delayed_pending", 1), 0);
+	/* Allow +/- 10% */
+	TESTEQUAL(sysfs_test_value_range("reads_delayed_pending_us", 900000, 1100000),
+		  0);
+
+	TESTSYSCALL(close(fd));
+	fd = -1;
+
+	TESTSYSCALL(unlink(filename));
+	TESTEQUAL(ioctl(cmd_fd, INCFS_IOC_SET_READ_TIMEOUTS, &srt), 0);
+	TESTEQUAL(emit_file(cmd_fd, NULL, file.name, &file.id, file.size, NULL),
+		  0);
+	TEST(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+	TESTEQUAL(sysfs_test_value("reads_delayed_min", 0), 0);
+	TESTEQUAL(emit_test_file_data(mount_dir, &file), 0);
+	TESTEQUAL(read(fd, buffer, sizeof(buffer)), sizeof(buffer));
+	TESTEQUAL(sysfs_test_value("reads_delayed_min", 1), 0);
+	/* This should be exact */
+	TESTEQUAL(sysfs_test_value("reads_delayed_min_us", 1000000), 0);
+
+	TESTSYSCALL(close(fd));
+	fd = -1;
+
+	TESTSYSCALL(unlink(filename));
+	TESTEQUAL(emit_file(cmd_fd, NULL, file.name, &file.id, file.size, NULL),
+		  0);
+	TEST(fd = open(filename, O_RDONLY | O_CLOEXEC), fd != -1);
+	TESTSYSCALL(fcntl(fd, F_SETFD, fcntl(fd, F_GETFD) | FD_CLOEXEC));
+	TEST(pid = fork(), pid != -1);
+	if (pid == 0) {
+		TESTEQUAL(read(fd, buffer, sizeof(buffer)), sizeof(buffer));
+		exit(0);
+	}
+	usleep(500000);
+	TESTEQUAL(sysfs_test_value("reads_delayed_pending", 1), 0);
+	TESTEQUAL(sysfs_test_value("reads_delayed_min", 1), 0);
+	TESTEQUAL(emit_test_file_data(mount_dir, &file), 0);
+	TESTNE(wait(&status), -1);
+	TESTEQUAL(status, 0);
+	TESTEQUAL(sysfs_test_value("reads_delayed_pending", 2), 0);
+	TESTEQUAL(sysfs_test_value("reads_delayed_min", 2), 0);
+	/* Exact 1000000 plus 500000 +/- 10% */
+	TESTEQUAL(sysfs_test_value_range("reads_delayed_min_us", 1450000, 1550000), 0);
+	/* Allow +/- 10% */
+	TESTEQUAL(sysfs_test_value_range("reads_delayed_pending_us", 1350000, 1650000),
+		  0);
+
+	result = TEST_SUCCESS;
+out:
+	if (pid == 0)
+		exit(result);
+	free(file.mtree);
+	free(filename);
+	close(fd);
+	close(cmd_fd);
+	umount(mount_dir);
+	free(backing_dir);
+	return result;
+}
+
+static int sysfs_test_directories(bool one_present, bool two_present)
+{
+	int result = TEST_FAILURE;
+	struct stat st;
+
+	TESTEQUAL(stat("/sys/fs/incremental-fs/instances/1", &st),
+		  one_present ? 0 : -1);
+	if (one_present)
+		TESTCOND(S_ISDIR(st.st_mode));
+	else
+		TESTEQUAL(errno, ENOENT);
+	TESTEQUAL(stat("/sys/fs/incremental-fs/instances/2", &st),
+		  two_present ? 0 : -1);
+	if (two_present)
+		TESTCOND(S_ISDIR(st.st_mode));
+	else
+		TESTEQUAL(errno, ENOENT);
+
+	result = TEST_SUCCESS;
+out:
+	return result;
+}
+
+static int sysfs_rename_test(const char *mount_dir)
+{
+	int result = TEST_FAILURE;
+	char *backing_dir = NULL;
+	char *mount_dir2 = NULL;
+	int fd = -1;
+	char c;
+
+	/* Mount with no node */
+	TEST(backing_dir = create_backing_dir(mount_dir), backing_dir);
+	TESTEQUAL(mount_fs(mount_dir, backing_dir, 0), 0);
+	TESTEQUAL(sysfs_test_directories(false, false), 0);
+
+	/* Remount with node */
+	TESTEQUAL(mount_fs_opt(mount_dir, backing_dir, "sysfs_name=1", true),
+		  0);
+	TESTEQUAL(sysfs_test_directories(true, false), 0);
+	TEST(fd = open("/sys/fs/incremental-fs/instances/1/reads_delayed_min",
+		       O_RDONLY | O_CLOEXEC), fd != -1);
+	TESTEQUAL(pread(fd, &c, 1, 0), 1);
+	TESTEQUAL(c, '0');
+	TESTEQUAL(pread(fd, &c, 1, 0), 1);
+	TESTEQUAL(c, '0');
+
+	/* Rename node */
+	TESTEQUAL(mount_fs_opt(mount_dir, backing_dir, "sysfs_name=2", true),
+		  0);
+	TESTEQUAL(sysfs_test_directories(false, true), 0);
+	TESTEQUAL(pread(fd, &c, 1, 0), -1);
+
+	/* Try mounting another instance with same node name */
+	TEST(mount_dir2 = concat_file_name(backing_dir, "incfs-mount-dir2"),
+	     mount_dir2);
+	rmdir(mount_dir2); /* In case we crashed before */
+	TESTSYSCALL(mkdir(mount_dir2, 0777));
+	TESTEQUAL(mount_fs_opt(mount_dir2, backing_dir, "sysfs_name=2", false),
+		  -1);
+
+	/* Try mounting another instance then remounting with existing name */
+	TESTEQUAL(mount_fs(mount_dir2, backing_dir, 0), 0);
+	TESTEQUAL(mount_fs_opt(mount_dir2, backing_dir, "sysfs_name=2", true),
+		  -1);
+
+	/* Remount with no node */
+	TESTEQUAL(mount_fs_opt(mount_dir, backing_dir, "", true),
+		  0);
+	TESTEQUAL(sysfs_test_directories(false, false), 0);
+
+	result = TEST_SUCCESS;
+out:
+	umount(mount_dir2);
+	rmdir(mount_dir2);
+	free(mount_dir2);
+	close(fd);
+	umount(mount_dir);
+	free(backing_dir);
+	return result;
+}
+
+static int stacked_mount_test(const char *mount_dir)
+{
+	int result = TEST_FAILURE;
+	char *backing_dir = NULL;
+
+	/* Mount with no node */
+	TEST(backing_dir = create_backing_dir(mount_dir), backing_dir);
+	TESTEQUAL(mount_fs(mount_dir, backing_dir, 0), 0);
+	/* Try mounting another instance with same name */
+	TESTEQUAL(mount_fs(mount_dir, backing_dir, 0), 0);
+	/* Try unmounting the first instance */
+	TESTEQUAL(umount_fs(mount_dir), 0);
+	/* Try unmounting the second instance */
+	TESTEQUAL(umount_fs(mount_dir), 0);
+	result = TEST_SUCCESS;
+out:
+	/* Cleanup */
+	rmdir(mount_dir);
+	rmdir(backing_dir);
+	free(backing_dir);
+	return result;
+}
+
+static char *setup_mount_dir(void)
+{
+	struct stat st;
+	char *current_dir = getcwd(NULL, 0);
+	char *mount_dir = concat_file_name(current_dir, "incfs-mount-dir");
+
+	free(current_dir);
+	if (stat(mount_dir, &st) == 0) {
+		if (S_ISDIR(st.st_mode))
+			return mount_dir;
+
+		ksft_print_msg("%s is a file, not a dir.\n", mount_dir);
+		free(mount_dir);
+		return NULL;
+	}
+
+	if (mkdir(mount_dir, 0777)) {
+		print_error("Can't create mount dir.");
+		free(mount_dir);
+		return NULL;
+	}
+
+	return mount_dir;
+}
+
+int parse_options(int argc, char *const *argv)
+{
+	signed char c;
+
+	while ((c = getopt(argc, argv, "f:t:v")) != -1)
+		switch (c) {
+		case 'f':
+			options.file = strtol(optarg, NULL, 10);
+			break;
+
+		case 't':
+			options.test = strtol(optarg, NULL, 10);
+			break;
+
+		case 'v':
+			options.verbose = true;
+			break;
+
+		default:
+			return -EINVAL;
+		}
+
+	return 0;
+}
+
+struct test_case {
+	int (*pfunc)(const char *dir);
+	const char *name;
+};
+
+void run_one_test(const char *mount_dir, struct test_case *test_case)
+{
+	int ret;
+
+	ksft_print_msg("Running %s\n", test_case->name);
+	ret = test_case->pfunc(mount_dir);
+
+	if (ret == TEST_SUCCESS)
+		ksft_test_result_pass("%s\n", test_case->name);
+	else if (ret == TEST_SKIP)
+		ksft_test_result_skip("%s\n", test_case->name);
+	else
+		ksft_test_result_fail("%s\n", test_case->name);
+}
+
+int main(int argc, char *argv[])
+{
+	char *mount_dir = NULL;
+	int i;
+	int fd, count;
+
+	if (parse_options(argc, argv))
+		ksft_exit_fail_msg("Bad options\n");
+
+	// Seed randomness pool for testing on QEMU
+	// NOTE - this abuses the concept of randomness - do *not* ever do this
+	// on a machine for production use - the device will think it has good
+	// randomness when it does not.
+	fd = open("/dev/urandom", O_WRONLY | O_CLOEXEC);
+	count = 4096;
+	for (i = 0; i < 128; ++i)
+		ioctl(fd, RNDADDTOENTCNT, &count);
+	close(fd);
+
+	ksft_print_header();
+
+	if (geteuid() != 0)
+		ksft_print_msg("Not running as root, mounts might fail.\n");
+
+	mount_dir = setup_mount_dir();
+	if (mount_dir == NULL)
+		ksft_exit_fail_msg("Can't create a mount dir\n");
+
+#define MAKE_TEST(test)                                                        \
+	{                                                                      \
+		test, #test                                                    \
+	}
+	struct test_case cases[] = {
+		MAKE_TEST(basic_file_ops_test),
+		MAKE_TEST(cant_touch_index_test),
+		MAKE_TEST(dynamic_files_and_data_test),
+		MAKE_TEST(concurrent_reads_and_writes_test),
+		MAKE_TEST(attribute_test),
+		MAKE_TEST(work_after_remount_test),
+		MAKE_TEST(child_procs_waiting_for_data_test),
+		MAKE_TEST(multiple_providers_test),
+		MAKE_TEST(hash_tree_test),
+		MAKE_TEST(read_log_test),
+		MAKE_TEST(get_blocks_test),
+		MAKE_TEST(get_hash_blocks_test),
+		MAKE_TEST(large_file_test),
+		MAKE_TEST(mapped_file_test),
+		MAKE_TEST(compatibility_test),
+		MAKE_TEST(data_block_count_test),
+		MAKE_TEST(hash_block_count_test),
+		MAKE_TEST(per_uid_read_timeouts_test),
+		MAKE_TEST(inotify_test),
+		MAKE_TEST(verity_test),
+		MAKE_TEST(enable_verity_test),
+		MAKE_TEST(mmap_test),
+		MAKE_TEST(truncate_test),
+		MAKE_TEST(stat_test),
+		MAKE_TEST(sysfs_test),
+		MAKE_TEST(sysfs_rename_test),
+		MAKE_TEST(stacked_mount_test),
+	};
+#undef MAKE_TEST
+
+	if (options.test) {
+		if (options.test <= 0 || options.test > ARRAY_SIZE(cases))
+			ksft_exit_fail_msg("Invalid test\n");
+
+		ksft_set_plan(1);
+		run_one_test(mount_dir, &cases[options.test - 1]);
+	} else {
+		ksft_set_plan(ARRAY_SIZE(cases));
+		for (i = 0; i < ARRAY_SIZE(cases); ++i)
+			run_one_test(mount_dir, &cases[i]);
+	}
+
+	umount2(mount_dir, MNT_FORCE);
+	rmdir(mount_dir);
+	return !ksft_get_fail_cnt() ? ksft_exit_pass() : ksft_exit_fail();
+}
diff --git a/tools/testing/selftests/filesystems/incfs/utils.c b/tools/testing/selftests/filesystems/incfs/utils.c
new file mode 100644
index 0000000..d7deb53
--- /dev/null
+++ b/tools/testing/selftests/filesystems/incfs/utils.c
@@ -0,0 +1,391 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018 Google LLC
+ */
+#include <dirent.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/ioctl.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+
+#include <openssl/sha.h>
+#include <openssl/md5.h>
+
+#include "utils.h"
+
+#ifndef __S_IFREG
+#define __S_IFREG S_IFREG
+#endif
+
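+/* Return a pseudo-random number in [0, max], derived from rand_r(seed) */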
+unsigned int rnd(unsigned int max, unsigned int *seed)
+{
+	return rand_r(seed) * ((uint64_t)max + 1) / ((uint64_t)RAND_MAX + 1);
+}
+
+int remove_dir(const char *dir)
+{
+	int err = rmdir(dir);
+
+	if (err && errno == ENOTEMPTY) {
+		err = delete_dir_tree(dir);
+		if (err)
+			return err;
+		return 0;
+	}
+
+	if (err && errno != ENOENT)
+		return -errno;
+
+	return 0;
+}
+
+int drop_caches(void)
+{
+	int drop_caches =
+		open("/proc/sys/vm/drop_caches", O_WRONLY | O_CLOEXEC);
+	int i;
+
+	if (drop_caches == -1)
+		return -errno;
+	i = write(drop_caches, "3", 1);
+	close(drop_caches);
+
+	if (i != 1)
+		return -errno;
+
+	return 0;
+}
+
+int mount_fs(const char *mount_dir, const char *backing_dir,
+	     int read_timeout_ms)
+{
+	static const char fs_name[] = INCFS_NAME;
+	char mount_options[512];
+	int result;
+
+	snprintf(mount_options, ARRAY_SIZE(mount_options),
+		 "read_timeout_ms=%u",
+		  read_timeout_ms);
+
+	result = mount(backing_dir, mount_dir, fs_name, 0, mount_options);
+	if (result != 0)
+		perror("Error mounting fs.");
+	return result;
+}
+
+int umount_fs(const char *mount_dir)
+{
+	int result;
+
+	result = umount(mount_dir);
+	if (result != 0)
+		perror("Error unmounting fs.");
+	return result;
+}
+
+int mount_fs_opt(const char *mount_dir, const char *backing_dir,
+		 const char *opt, bool remount)
+{
+	static const char fs_name[] = INCFS_NAME;
+	int result;
+
+	result = mount(backing_dir, mount_dir, fs_name,
+		       remount ? MS_REMOUNT : 0, opt);
+	if (result != 0)
+		perror("Error mounting fs.");
+	return result;
+}
+
+struct hash_section {
+	uint32_t algorithm;
+	uint8_t log2_blocksize;
+	uint32_t salt_size;
+	/* no salt */
+	uint32_t hash_size;
+	uint8_t hash[SHA256_DIGEST_SIZE];
+} __packed;
+
+struct signature_blob {
+	uint32_t version;
+	uint32_t hash_section_size;
+	struct hash_section hash_section;
+	uint32_t signing_section_size;
+	uint8_t signing_section[];
+} __packed;
+
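+/*
+ * Build an incfs signature blob: a versioned header, a SHA-256 hash section
+ * and a free-form signing section holding @add_data. Returns the blob size,
+ * or 0 on allocation failure.
+ */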
+size_t format_signature(void **buf, const char *root_hash, const char *add_data)
+{
+	size_t size = sizeof(struct signature_blob) + strlen(add_data) + 1;
+	struct signature_blob *sb = malloc(size);
+
+	if (!sb)
+		return 0;
+
+	*sb = (struct signature_blob){
+		.version = INCFS_SIGNATURE_VERSION,
+		.hash_section_size = sizeof(struct hash_section),
+		.hash_section =
+			(struct hash_section){
+				.algorithm = INCFS_HASH_TREE_SHA256,
+				.log2_blocksize = 12,
+				.salt_size = 0,
+				.hash_size = SHA256_DIGEST_SIZE,
+			},
+		.signing_section_size = strlen(add_data) + 1,
+	};
+
+	memcpy(sb->hash_section.hash, root_hash, SHA256_DIGEST_SIZE);
+	memcpy((char *)sb->signing_section, add_data, strlen(add_data) + 1);
+	*buf = sb;
+	return size;
+}
+
+int crypto_emit_file(int fd, const char *dir, const char *filename,
+		     incfs_uuid_t *id_out, size_t size, const char *root_hash,
+		     const char *add_data)
+{
+	int mode = __S_IFREG | 0555;
+	void *signature;
+	int error = 0;
+
+	struct incfs_new_file_args args = {
+			.size = size,
+			.mode = mode,
+			.file_name = ptr_to_u64(filename),
+			.directory_path = ptr_to_u64(dir),
+			.file_attr = 0,
+			.file_attr_len = 0
+	};
+
+	args.signature_size = format_signature(&signature, root_hash, add_data);
+	args.signature_info = ptr_to_u64(signature);
+
+	md5(filename, strlen(filename), (char *)args.file_id.bytes);
+
+	if (ioctl(fd, INCFS_IOC_CREATE_FILE, &args) != 0) {
+		error = -errno;
+		goto out;
+	}
+
+	*id_out = args.file_id;
+
+out:
+	free(signature);
+	return error;
+}
+
+int emit_file(int fd, const char *dir, const char *filename,
+	      incfs_uuid_t *id_out, size_t size, const char *attr)
+{
+	int mode = __S_IFREG | 0555;
+	struct incfs_new_file_args args = { .size = size,
+					    .mode = mode,
+					    .file_name = ptr_to_u64(filename),
+					    .directory_path = ptr_to_u64(dir),
+					    .signature_info = ptr_to_u64(NULL),
+					    .signature_size = 0,
+					    .file_attr = ptr_to_u64(attr),
+					    .file_attr_len =
+						    attr ? strlen(attr) : 0 };
+
+	md5(filename, strlen(filename), (char *)args.file_id.bytes);
+
+	if (ioctl(fd, INCFS_IOC_CREATE_FILE, &args) != 0)
+		return -errno;
+
+	*id_out = args.file_id;
+	return 0;
+}
+
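+/* Stub kept to satisfy the header's prototype; always reports success */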
+int get_file_bmap(int cmd_fd, int ino, unsigned char *buf, int buf_size)
+{
+	return 0;
+}
+
+int get_file_signature(int fd, unsigned char *buf, int buf_size)
+{
+	struct incfs_get_file_sig_args args = {
+		.file_signature = ptr_to_u64(buf),
+		.file_signature_buf_size = buf_size
+	};
+
+	if (ioctl(fd, INCFS_IOC_READ_FILE_SIGNATURE, &args) == 0)
+		return args.file_signature_len_out;
+	return -errno;
+}
+
+loff_t get_file_size(const char *name)
+{
+	struct stat st;
+
+	if (stat(name, &st) == 0)
+		return st.st_size;
+	return -ENOENT;
+}
+
+int open_commands_file(const char *mount_dir)
+{
+	char cmd_file[255];
+	int cmd_fd;
+
+	snprintf(cmd_file, ARRAY_SIZE(cmd_file),
+			"%s/%s", mount_dir, INCFS_PENDING_READS_FILENAME);
+	cmd_fd = open(cmd_file, O_RDONLY | O_CLOEXEC);
+
+	if (cmd_fd < 0)
+		perror("Can't open commands file");
+	return cmd_fd;
+}
+
+int open_log_file(const char *mount_dir)
+{
+	char file[255];
+	int fd;
+
+	snprintf(file, ARRAY_SIZE(file), "%s/.log", mount_dir);
+	fd = open(file, O_RDWR | O_CLOEXEC);
+	if (fd < 0)
+		perror("Can't open log file");
+	return fd;
+}
+
+int open_blocks_written_file(const char *mount_dir)
+{
+	char file[255];
+	int fd;
+
+	snprintf(file, ARRAY_SIZE(file),
+			"%s/%s", mount_dir, INCFS_BLOCKS_WRITTEN_FILENAME);
+	fd = open(file, O_RDONLY | O_CLOEXEC);
+
+	if (fd < 0)
+		perror("Can't open blocks_written file");
+	return fd;
+}
+
+int wait_for_pending_reads(int fd, int timeout_ms,
+	struct incfs_pending_read_info *prs, int prs_count)
+{
+	ssize_t read_res = 0;
+
+	if (timeout_ms > 0) {
+		int poll_res = 0;
+		struct pollfd pollfd = {
+			.fd = fd,
+			.events = POLLIN
+		};
+
+		poll_res = poll(&pollfd, 1, timeout_ms);
+		if (poll_res < 0)
+			return -errno;
+		if (poll_res == 0)
+			return 0;
+		if (!(pollfd.revents & POLLIN))
+			return 0;
+	}
+
+	read_res = read(fd, prs, prs_count * sizeof(*prs));
+	if (read_res < 0)
+		return -errno;
+
+	return read_res / sizeof(*prs);
+}
+
+int wait_for_pending_reads2(int fd, int timeout_ms,
+	struct incfs_pending_read_info2 *prs, int prs_count)
+{
+	ssize_t read_res = 0;
+
+	if (timeout_ms > 0) {
+		int poll_res = 0;
+		struct pollfd pollfd = {
+			.fd = fd,
+			.events = POLLIN
+		};
+
+		poll_res = poll(&pollfd, 1, timeout_ms);
+		if (poll_res < 0)
+			return -errno;
+		if (poll_res == 0)
+			return 0;
+		if (!(pollfd.revents & POLLIN))
+			return 0;
+	}
+
+	read_res = read(fd, prs, prs_count * sizeof(*prs));
+	if (read_res < 0)
+		return -errno;
+
+	return read_res / sizeof(*prs);
+}
+
+char *concat_file_name(const char *dir, const char *file)
+{
+	char full_name[FILENAME_MAX] = "";
+
+	if (snprintf(full_name, ARRAY_SIZE(full_name), "%s/%s", dir, file) < 0)
+		return NULL;
+	return strdup(full_name);
+}
+
+int delete_dir_tree(const char *dir_path)
+{
+	DIR *dir = NULL;
+	struct dirent *dp;
+	int result = 0;
+
+	dir = opendir(dir_path);
+	if (!dir) {
+		result = -errno;
+		goto out;
+	}
+
+	while ((dp = readdir(dir))) {
+		char *full_path;
+
+		if (!strcmp(dp->d_name, ".") || !strcmp(dp->d_name, ".."))
+			continue;
+
+		full_path = concat_file_name(dir_path, dp->d_name);
+		if (!full_path) {
+			result = -ENOMEM;
+			goto out;
+		}
+		if (dp->d_type == DT_DIR)
+			result = delete_dir_tree(full_path);
+		else
+			result = unlink(full_path);
+		free(full_path);
+		if (result)
+			goto out;
+	}
+
+out:
+	if (dir)
+		closedir(dir);
+	if (!result)
+		rmdir(dir_path);
+	return result;
+}
+
+void sha256(const char *data, size_t dsize, char *hash)
+{
+	SHA256_CTX ctx;
+
+	SHA256_Init(&ctx);
+	SHA256_Update(&ctx, data, dsize);
+	SHA256_Final((unsigned char *)hash, &ctx);
+}
+
+void md5(const char *data, size_t dsize, char *hash)
+{
+	MD5_CTX ctx;
+
+	MD5_Init(&ctx);
+	MD5_Update(&ctx, data, dsize);
+	MD5_Final((unsigned char *)hash, &ctx);
+}
diff --git a/tools/testing/selftests/filesystems/incfs/utils.h b/tools/testing/selftests/filesystems/incfs/utils.h
new file mode 100644
index 0000000..17a1ac5
--- /dev/null
+++ b/tools/testing/selftests/filesystems/incfs/utils.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2019 Google LLC
+ */
+#include <stdbool.h>
+#include <sys/stat.h>
+
+#include <include/uapi/linux/incrementalfs.h>
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))
+
+#define __packed __attribute__((__packed__))
+
+#ifdef __LP64__
+#define ptr_to_u64(p) ((__u64)(p))
+#else
+#define ptr_to_u64(p) ((__u64)(__u32)(p))
+#endif
+
+#define SHA256_DIGEST_SIZE 32
+#define INCFS_MAX_MTREE_LEVELS 8
+
+unsigned int rnd(unsigned int max, unsigned int *seed);
+
+int remove_dir(const char *dir);
+
+int drop_caches(void);
+
+int mount_fs(const char *mount_dir, const char *backing_dir,
+	     int read_timeout_ms);
+
+int umount_fs(const char *mount_dir);
+
+int mount_fs_opt(const char *mount_dir, const char *backing_dir,
+		 const char *opt, bool remount);
+
+int get_file_bmap(int cmd_fd, int ino, unsigned char *buf, int buf_size);
+
+int get_file_signature(int fd, unsigned char *buf, int buf_size);
+
+int emit_node(int fd, char *filename, int *ino_out, int parent_ino,
+		size_t size, mode_t mode, char *attr);
+
+int emit_file(int fd, const char *dir, const char *filename,
+	      incfs_uuid_t *id_out, size_t size, const char *attr);
+
+int crypto_emit_file(int fd, const char *dir, const char *filename,
+		     incfs_uuid_t *id_out, size_t size, const char *root_hash,
+		     const char *add_data);
+
+loff_t get_file_size(const char *name);
+
+int open_commands_file(const char *mount_dir);
+
+int open_log_file(const char *mount_dir);
+
+int open_blocks_written_file(const char *mount_dir);
+
+int wait_for_pending_reads(int fd, int timeout_ms,
+	struct incfs_pending_read_info *prs, int prs_count);
+
+int wait_for_pending_reads2(int fd, int timeout_ms,
+	struct incfs_pending_read_info2 *prs, int prs_count);
+
+char *concat_file_name(const char *dir, const char *file);
+
+void sha256(const char *data, size_t dsize, char *hash);
+
+void md5(const char *data, size_t dsize, char *hash);
+
+int delete_dir_tree(const char *path);