Merge 4.9.133 into android-4.9 Changes in 4.9.133 mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly fbdev/omapfb: fix omapfb_memory_read infoleak xen-netback: fix input validation in xenvif_set_hash_mapping() x86/vdso: Fix asm constraints on vDSO syscall fallbacks x86/vdso: Fix vDSO syscall fallback asm constraint regression PCI: Reprogram bridge prefetch registers on resume mac80211: fix setting IEEE80211_KEY_FLAG_RX_MGMT for AP mode keys PM / core: Clear the direct_complete flag on errors dm cache metadata: ignore hints array being too small during resize dm cache: fix resize crash if user doesn't reload cache table xhci: Add missing CAS workaround for Intel Sunrise Point xHCI usb: xhci-mtk: resume USB3 roothub first USB: serial: simple: add Motorola Tetra MTP6550 id tty: Drop tty->count on tty_reopen() failure of: unittest: Disable interrupt node tests for old world MAC systems ext4: add corruption check in ext4_xattr_set_entry() ext4: always verify the magic number in xattr blocks cgroup: Fix deadlock in cpu hotplug path ath10k: fix use-after-free in ath10k_wmi_cmd_send_nowait ath10k: fix kernel panic issue during pci probe powerpc/fadump: Return error when fadump registration fails ARC: clone syscall to setp r25 as thread pointer x86/mm: Expand static page table for fixmap space f2fs: fix invalid memory access ucma: fix a use-after-free in ucma_resolve_ip() ubifs: Check for name being NULL while mounting ath10k: fix scan crash due to incorrect length calculation ebtables: arpreply: Add the standard target sanity check x86/fpu: Remove use_eager_fpu() x86/fpu: Remove struct fpu::counter Revert "perf: sync up x86/.../cpufeatures.h" x86/fpu: Finish excising 'eagerfpu' Linux 4.9.133 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
diff --git a/.gitignore b/.gitignore index c2ed4ec..4105cfb 100644 --- a/.gitignore +++ b/.gitignore
@@ -33,6 +33,7 @@ *.lzo *.patch *.gcno +*.ll modules.builtin Module.symvers *.dwo @@ -114,3 +115,6 @@ # Kdevelop4 *.kdev4 + +# fetched Android config fragments +kernel/configs/android-*.cfg
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX index 3acc4f1..4a5a887 100644 --- a/Documentation/00-INDEX +++ b/Documentation/00-INDEX
@@ -436,6 +436,8 @@ - info on the magic SysRq key. target/ - directory with info on generating TCM v4 fabric .ko modules +tee.txt + - info on the TEE subsystem and drivers this_cpu_ops.txt - List rationale behind and the way to use this_cpu operations. thermal/
diff --git a/Documentation/ABI/obsolete/sysfs-block-zram b/Documentation/ABI/obsolete/sysfs-block-zram deleted file mode 100644 index 720ea92..0000000 --- a/Documentation/ABI/obsolete/sysfs-block-zram +++ /dev/null
@@ -1,119 +0,0 @@ -What: /sys/block/zram<id>/num_reads -Date: August 2015 -Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> -Description: - The num_reads file is read-only and specifies the number of - reads (failed or successful) done on this device. - Now accessible via zram<id>/stat node. - -What: /sys/block/zram<id>/num_writes -Date: August 2015 -Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> -Description: - The num_writes file is read-only and specifies the number of - writes (failed or successful) done on this device. - Now accessible via zram<id>/stat node. - -What: /sys/block/zram<id>/invalid_io -Date: August 2015 -Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> -Description: - The invalid_io file is read-only and specifies the number of - non-page-size-aligned I/O requests issued to this device. - Now accessible via zram<id>/io_stat node. - -What: /sys/block/zram<id>/failed_reads -Date: August 2015 -Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> -Description: - The failed_reads file is read-only and specifies the number of - failed reads happened on this device. - Now accessible via zram<id>/io_stat node. - -What: /sys/block/zram<id>/failed_writes -Date: August 2015 -Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> -Description: - The failed_writes file is read-only and specifies the number of - failed writes happened on this device. - Now accessible via zram<id>/io_stat node. - -What: /sys/block/zram<id>/notify_free -Date: August 2015 -Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> -Description: - The notify_free file is read-only. Depending on device usage - scenario it may account a) the number of pages freed because - of swap slot free notifications or b) the number of pages freed - because of REQ_DISCARD requests sent by bio. The former ones - are sent to a swap block device when a swap slot is freed, which - implies that this disk is being used as a swap disk. The latter - ones are sent by filesystem mounted with discard option, - whenever some data blocks are getting discarded. - Now accessible via zram<id>/io_stat node. - -What: /sys/block/zram<id>/zero_pages -Date: August 2015 -Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> -Description: - The zero_pages file is read-only and specifies number of zero - filled pages written to this disk. No memory is allocated for - such pages. - Now accessible via zram<id>/mm_stat node. - -What: /sys/block/zram<id>/orig_data_size -Date: August 2015 -Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> -Description: - The orig_data_size file is read-only and specifies uncompressed - size of data stored in this disk. This excludes zero-filled - pages (zero_pages) since no memory is allocated for them. - Unit: bytes - Now accessible via zram<id>/mm_stat node. - -What: /sys/block/zram<id>/compr_data_size -Date: August 2015 -Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> -Description: - The compr_data_size file is read-only and specifies compressed - size of data stored in this disk. So, compression ratio can be - calculated using orig_data_size and this statistic. - Unit: bytes - Now accessible via zram<id>/mm_stat node. - -What: /sys/block/zram<id>/mem_used_total -Date: August 2015 -Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> -Description: - The mem_used_total file is read-only and specifies the amount - of memory, including allocator fragmentation and metadata - overhead, allocated for this disk. So, allocator space - efficiency can be calculated using compr_data_size and this - statistic. - Unit: bytes - Now accessible via zram<id>/mm_stat node. - -What: /sys/block/zram<id>/mem_used_max -Date: August 2015 -Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> -Description: - The mem_used_max file is read/write and specifies the amount - of maximum memory zram have consumed to store compressed data. - For resetting the value, you should write "0". Otherwise, - you could see -EINVAL. - Unit: bytes - Downgraded to write-only node: so it's possible to set new - value only; its current value is stored in zram<id>/mm_stat - node. - -What: /sys/block/zram<id>/mem_limit -Date: August 2015 -Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> -Description: - The mem_limit file is read/write and specifies the maximum - amount of memory ZRAM can use to store the compressed data. - The limit could be changed in run time and "0" means disable - the limit. No limit is the initial state. Unit: bytes - Downgraded to write-only node: so it's possible to set new - value only; its current value is stored in zram<id>/mm_stat - node.
diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram index 4518d30..c1513c7 100644 --- a/Documentation/ABI/testing/sysfs-block-zram +++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -22,41 +22,6 @@ device. The reset operation frees all the memory associated with this device. -What: /sys/block/zram<id>/num_reads -Date: August 2010 -Contact: Nitin Gupta <ngupta@vflare.org> -Description: - The num_reads file is read-only and specifies the number of - reads (failed or successful) done on this device. - -What: /sys/block/zram<id>/num_writes -Date: August 2010 -Contact: Nitin Gupta <ngupta@vflare.org> -Description: - The num_writes file is read-only and specifies the number of - writes (failed or successful) done on this device. - -What: /sys/block/zram<id>/invalid_io -Date: August 2010 -Contact: Nitin Gupta <ngupta@vflare.org> -Description: - The invalid_io file is read-only and specifies the number of - non-page-size-aligned I/O requests issued to this device. - -What: /sys/block/zram<id>/failed_reads -Date: February 2014 -Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> -Description: - The failed_reads file is read-only and specifies the number of - failed reads happened on this device. - -What: /sys/block/zram<id>/failed_writes -Date: February 2014 -Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> -Description: - The failed_writes file is read-only and specifies the number of - failed writes happened on this device. - What: /sys/block/zram<id>/max_comp_streams Date: February 2014 Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> @@ -73,74 +38,24 @@ available and selected compression algorithms, change compression algorithm selection. -What: /sys/block/zram<id>/notify_free -Date: August 2010 -Contact: Nitin Gupta <ngupta@vflare.org> -Description: - The notify_free file is read-only. Depending on device usage - scenario it may account a) the number of pages freed because - of swap slot free notifications or b) the number of pages freed - because of REQ_DISCARD requests sent by bio. The former ones - are sent to a swap block device when a swap slot is freed, which - implies that this disk is being used as a swap disk. The latter - ones are sent by filesystem mounted with discard option, - whenever some data blocks are getting discarded. - -What: /sys/block/zram<id>/zero_pages -Date: August 2010 -Contact: Nitin Gupta <ngupta@vflare.org> -Description: - The zero_pages file is read-only and specifies number of zero - filled pages written to this disk. No memory is allocated for - such pages. - -What: /sys/block/zram<id>/orig_data_size -Date: August 2010 -Contact: Nitin Gupta <ngupta@vflare.org> -Description: - The orig_data_size file is read-only and specifies uncompressed - size of data stored in this disk. This excludes zero-filled - pages (zero_pages) since no memory is allocated for them. - Unit: bytes - -What: /sys/block/zram<id>/compr_data_size -Date: August 2010 -Contact: Nitin Gupta <ngupta@vflare.org> -Description: - The compr_data_size file is read-only and specifies compressed - size of data stored in this disk. So, compression ratio can be - calculated using orig_data_size and this statistic. - Unit: bytes - -What: /sys/block/zram<id>/mem_used_total -Date: August 2010 -Contact: Nitin Gupta <ngupta@vflare.org> -Description: - The mem_used_total file is read-only and specifies the amount - of memory, including allocator fragmentation and metadata - overhead, allocated for this disk. So, allocator space - efficiency can be calculated using compr_data_size and this - statistic. - Unit: bytes - What: /sys/block/zram<id>/mem_used_max Date: August 2014 Contact: Minchan Kim <minchan@kernel.org> Description: - The mem_used_max file is read/write and specifies the amount - of maximum memory zram have consumed to store compressed data. - For resetting the value, you should write "0". Otherwise, - you could see -EINVAL. + The mem_used_max file is write-only and is used to reset + the counter of maximum memory zram have consumed to store + compressed data. For resetting the value, you should write + "0". Otherwise, you could see -EINVAL. Unit: bytes What: /sys/block/zram<id>/mem_limit Date: August 2014 Contact: Minchan Kim <minchan@kernel.org> Description: - The mem_limit file is read/write and specifies the maximum - amount of memory ZRAM can use to store the compressed data. The - limit could be changed in run time and "0" means disable the - limit. No limit is the initial state. Unit: bytes + The mem_limit file is write-only and specifies the maximum + amount of memory ZRAM can use to store the compressed data. + The limit could be changed in run time and "0" means disable + the limit. No limit is the initial state. Unit: bytes What: /sys/block/zram<id>/compact Date: August 2015 @@ -175,3 +90,11 @@ device's debugging info useful for kernel developers. Its format is not documented intentionally and may change anytime without any notice. + +What: /sys/block/zram<id>/backing_dev +Date: June 2017 +Contact: Minchan Kim <minchan@kernel.org> +Description: + The backing_dev file is read-write and set up backing + device for zram to write incompressible pages. + For using, user should enable CONFIG_ZRAM_WRITEBACK.
diff --git a/Documentation/ABI/testing/sysfs-class-dual-role-usb b/Documentation/ABI/testing/sysfs-class-dual-role-usb new file mode 100644 index 0000000..a900fd7 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-dual-role-usb
@@ -0,0 +1,71 @@ +What: /sys/class/dual_role_usb/.../ +Date: June 2015 +Contact: Badhri Jagan Sridharan<badhri@google.com> +Description: + Provide a generic interface to monitor and change + the state of dual role usb ports. The name here + refers to the name mentioned in the + dual_role_phy_desc that is passed while registering + the dual_role_phy_intstance through + devm_dual_role_instance_register. + +What: /sys/class/dual_role_usb/.../supported_modes +Date: June 2015 +Contact: Badhri Jagan Sridharan<badhri@google.com> +Description: + This is a static node, once initialized this + is not expected to change during runtime. "dfp" + refers to "downstream facing port" i.e. port can + only act as host. "ufp" refers to "upstream + facing port" i.e. port can only act as device. + "dfp ufp" refers to "dual role port" i.e. the port + can either be a host port or a device port. + +What: /sys/class/dual_role_usb/.../mode +Date: June 2015 +Contact: Badhri Jagan Sridharan<badhri@google.com> +Description: + The mode node refers to the current mode in which the + port is operating. "dfp" for host ports. "ufp" for device + ports and "none" when cable is not connected. + + On devices where the USB mode is software-controllable, + userspace can change the mode by writing "dfp" or "ufp". + On devices where the USB mode is fixed in hardware, + this attribute is read-only. + +What: /sys/class/dual_role_usb/.../power_role +Date: June 2015 +Contact: Badhri Jagan Sridharan<badhri@google.com> +Description: + The power_role node mentions whether the port + is "sink"ing or "source"ing power. "none" if + they are not connected. + + On devices implementing USB Power Delivery, + userspace can control the power role by writing "sink" or + "source". On devices without USB-PD, this attribute is + read-only. + +What: /sys/class/dual_role_usb/.../data_role +Date: June 2015 +Contact: Badhri Jagan Sridharan<badhri@google.com> +Description: + The data_role node mentions whether the port + is acting as "host" or "device" for USB data connection. + "none" if there is no active data link. + + On devices implementing USB Power Delivery, userspace + can control the data role by writing "host" or "device". + On devices without USB-PD, this attribute is read-only + +What: /sys/class/dual_role_usb/.../powers_vconn +Date: June 2015 +Contact: Badhri Jagan Sridharan<badhri@google.com> +Description: + The powers_vconn node mentions whether the port + is supplying power for VCONN pin. + + On devices with software control of VCONN, + userspace can disable the power supply to VCONN by writing "n", + or enable the power supply by writing "y".
diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs index a809f60..3bbb9fe 100644 --- a/Documentation/ABI/testing/sysfs-fs-f2fs +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@@ -51,12 +51,41 @@ Controls the dirty page count condition for the in-place-update policies. +What: /sys/fs/f2fs/<disk>/min_seq_blocks +Date: August 2018 +Contact: "Jaegeuk Kim" <jaegeuk@kernel.org> +Description: + Controls the dirty page count condition for batched sequential + writes in ->writepages. + + +What: /sys/fs/f2fs/<disk>/min_hot_blocks +Date: March 2017 +Contact: "Jaegeuk Kim" <jaegeuk@kernel.org> +Description: + Controls the dirty page count condition for redefining hot data. + +What: /sys/fs/f2fs/<disk>/min_ssr_sections +Date: October 2017 +Contact: "Chao Yu" <yuchao0@huawei.com> +Description: + Controls the fee section threshold to trigger SSR allocation. + What: /sys/fs/f2fs/<disk>/max_small_discards Date: November 2013 Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com> Description: Controls the issue rate of small discard commands. +What: /sys/fs/f2fs/<disk>/discard_granularity +Date: July 2017 +Contact: "Chao Yu" <yuchao0@huawei.com> +Description: + Controls discard granularity of inner discard thread, inner thread + will not issue discards with size that is smaller than granularity. + The unit size is one block, now only support configuring in range + of [1, 512]. + What: /sys/fs/f2fs/<disk>/max_victim_search Date: January 2014 Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com> @@ -80,6 +109,7 @@ Contact: "Jaegeuk Kim" <jaegeuk@kernel.org> Description: Controls the trimming rate in batch mode. + <deprecated> What: /sys/fs/f2fs/<disk>/cp_interval Date: October 2015 @@ -93,6 +123,12 @@ Description: Controls the idle timing. +What: /sys/fs/f2fs/<disk>/iostat_enable +Date: August 2017 +Contact: "Chao Yu" <yuchao0@huawei.com> +Description: + Controls to enable/disable IO stat. + What: /sys/fs/f2fs/<disk>/ra_nid_pages Date: October 2015 Contact: "Chao Yu" <chao2.yu@samsung.com> @@ -112,3 +148,67 @@ Contact: "Shuoran Liu" <liushuoran@huawei.com> Description: Shows total written kbytes issued to disk. + +What: /sys/fs/f2fs/<disk>/feature +Date: July 2017 +Contact: "Jaegeuk Kim" <jaegeuk@kernel.org> +Description: + Shows all enabled features in current device. + +What: /sys/fs/f2fs/<disk>/inject_rate +Date: May 2016 +Contact: "Sheng Yong" <shengyong1@huawei.com> +Description: + Controls the injection rate. + +What: /sys/fs/f2fs/<disk>/inject_type +Date: May 2016 +Contact: "Sheng Yong" <shengyong1@huawei.com> +Description: + Controls the injection type. + +What: /sys/fs/f2fs/<disk>/reserved_blocks +Date: June 2017 +Contact: "Chao Yu" <yuchao0@huawei.com> +Description: + Controls target reserved blocks in system, the threshold + is soft, it could exceed current available user space. + +What: /sys/fs/f2fs/<disk>/current_reserved_blocks +Date: October 2017 +Contact: "Yunlong Song" <yunlong.song@huawei.com> +Contact: "Chao Yu" <yuchao0@huawei.com> +Description: + Shows current reserved blocks in system, it may be temporarily + smaller than target_reserved_blocks, but will gradually + increase to target_reserved_blocks when more free blocks are + freed by user later. + +What: /sys/fs/f2fs/<disk>/gc_urgent +Date: August 2017 +Contact: "Jaegeuk Kim" <jaegeuk@kernel.org> +Description: + Do background GC agressively + +What: /sys/fs/f2fs/<disk>/gc_urgent_sleep_time +Date: August 2017 +Contact: "Jaegeuk Kim" <jaegeuk@kernel.org> +Description: + Controls sleep time of GC urgent mode + +What: /sys/fs/f2fs/<disk>/readdir_ra +Date: November 2017 +Contact: "Sheng Yong" <shengyong1@huawei.com> +Description: + Controls readahead inode block in readdir. + +What: /sys/fs/f2fs/<disk>/extension_list +Date: Feburary 2018 +Contact: "Chao Yu" <yuchao0@huawei.com> +Description: + Used to control configure extension list: + - Query: cat /sys/fs/f2fs/<disk>/extension_list + - Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list + - Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list + - [h] means add/del hot file extension + - [c] means add/del cold file extension
diff --git a/Documentation/ABI/testing/sysfs-kernel-wakeup_reasons b/Documentation/ABI/testing/sysfs-kernel-wakeup_reasons new file mode 100644 index 0000000..acb19b9 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-wakeup_reasons
@@ -0,0 +1,16 @@ +What: /sys/kernel/wakeup_reasons/last_resume_reason +Date: February 2014 +Contact: Ruchi Kandoi <kandoiruchi@google.com> +Description: + The /sys/kernel/wakeup_reasons/last_resume_reason is + used to report wakeup reasons after system exited suspend. + +What: /sys/kernel/wakeup_reasons/last_suspend_time +Date: March 2015 +Contact: jinqian <jinqian@google.com> +Description: + The /sys/kernel/wakeup_reasons/last_suspend_time is + used to report time spent in last suspend cycle. It contains + two numbers (in seconds) separated by space. First number is + the time spent in suspend and resume processes. Second number + is the time spent in sleep state. \ No newline at end of file
diff --git a/Documentation/android.txt b/Documentation/android.txt new file mode 100644 index 0000000..0f40a78 --- /dev/null +++ b/Documentation/android.txt
@@ -0,0 +1,121 @@ + ============= + A N D R O I D + ============= + +Copyright (C) 2009 Google, Inc. +Written by Mike Chan <mike@android.com> + +CONTENTS: +--------- + +1. Android + 1.1 Required enabled config options + 1.2 Required disabled config options + 1.3 Recommended enabled config options +2. Contact + + +1. Android +========== + +Android (www.android.com) is an open source operating system for mobile devices. +This document describes configurations needed to run the Android framework on +top of the Linux kernel. + +To see a working defconfig look at msm_defconfig or goldfish_defconfig +which can be found at http://android.git.kernel.org in kernel/common.git +and kernel/msm.git + + +1.1 Required enabled config options +----------------------------------- +After building a standard defconfig, ensure that these options are enabled in +your .config or defconfig if they are not already. Based off the msm_defconfig. +You should keep the rest of the default options enabled in the defconfig +unless you know what you are doing. + +ANDROID_PARANOID_NETWORK +ASHMEM +CONFIG_FB_MODE_HELPERS +CONFIG_FONT_8x16 +CONFIG_FONT_8x8 +CONFIG_YAFFS_SHORT_NAMES_IN_RAM +DAB +EARLYSUSPEND +FB +FB_CFB_COPYAREA +FB_CFB_FILLRECT +FB_CFB_IMAGEBLIT +FB_DEFERRED_IO +FB_TILEBLITTING +HIGH_RES_TIMERS +INOTIFY +INOTIFY_USER +INPUT_EVDEV +INPUT_GPIO +INPUT_MISC +LEDS_CLASS +LEDS_GPIO +LOCK_KERNEL +LkOGGER +LOW_MEMORY_KILLER +MISC_DEVICES +NEW_LEDS +NO_HZ +POWER_SUPPLY +PREEMPT +RAMFS +RTC_CLASS +RTC_LIB +SWITCH +SWITCH_GPIO +TMPFS +UID_STAT +UID16 +USB_FUNCTION +USB_FUNCTION_ADB +USER_WAKELOCK +VIDEO_OUTPUT_CONTROL +WAKELOCK +YAFFS_AUTO_YAFFS2 +YAFFS_FS +YAFFS_YAFFS1 +YAFFS_YAFFS2 + + +1.2 Required disabled config options +------------------------------------ +CONFIG_YAFFS_DISABLE_LAZY_LOAD +DNOTIFY + + +1.3 Recommended enabled config options +------------------------------ +ANDROID_PMEM +PSTORE_CONSOLE +PSTORE_RAM +SCHEDSTATS +DEBUG_PREEMPT +DEBUG_MUTEXES +DEBUG_SPINLOCK_SLEEP +DEBUG_INFO +FRAME_POINTER +CPU_FREQ +CPU_FREQ_TABLE +CPU_FREQ_DEFAULT_GOV_ONDEMAND +CPU_FREQ_GOV_ONDEMAND +CRC_CCITT +EMBEDDED +INPUT_TOUCHSCREEN +I2C +I2C_BOARDINFO +LOG_BUF_SHIFT=17 +SERIAL_CORE +SERIAL_CORE_CONSOLE + + +2. Contact +========== +website: http://android.git.kernel.org + +mailing-lists: android-kernel@googlegroups.com
diff --git a/Documentation/block/00-INDEX b/Documentation/block/00-INDEX index e55103a..a542b9f 100644 --- a/Documentation/block/00-INDEX +++ b/Documentation/block/00-INDEX
@@ -30,3 +30,9 @@ - Switching I/O schedulers at runtime writeback_cache_control.txt - Control of volatile write back caches +mmc-max-speed.txt + - eMMC layer speed simulation, related to /sys/block/mmcblk*/ + attributes: + max_read_speed + max_write_speed + cache_size
diff --git a/Documentation/block/mmc-max-speed.txt b/Documentation/block/mmc-max-speed.txt new file mode 100644 index 0000000..3f052b9 --- /dev/null +++ b/Documentation/block/mmc-max-speed.txt
@@ -0,0 +1,38 @@ +eMMC Block layer simulation speed controls in /sys/block/mmcblk*/ +=============================================== + +Turned on with CONFIG_MMC_SIMULATE_MAX_SPEED which enables MMC device speed +limiting. Used to test and simulate the behavior of the system when +confronted with a slow MMC. + +Enables max_read_speed, max_write_speed and cache_size attributes and module +default parameters to control the write or read maximum KB/second speed +behaviors. + +NB: There is room for improving the algorithm for aspects tied directly to +eMMC specific behavior. For instance, wear leveling and stalls from an +exhausted erase pool. We would expect that if there was a need to provide +similar speed simulation controls to other types of block devices, aspects of +their behavior are modelled separately (e.g. head seek times, heat assist, +shingling and rotational latency). + +/sys/block/mmcblk0/max_read_speed: + +Number of KB/second reads allowed to the block device. Used to test and +simulate the behavior of the system when confronted with a slow reading MMC. +Set to 0 or "off" to place no speed limit. + +/sys/block/mmcblk0/max_write_speed: + +Number of KB/second writes allowed to the block device. Used to test and +simulate the behavior of the system when confronted with a slow writing MMC. +Set to 0 or "off" to place no speed limit. + +/sys/block/mmcblk0/cache_size: + +Number of MB of high speed memory or high speed SLC cache expected on the +eMMC device being simulated. Used to help simulate the write-back behavior +more accurately. The assumption is the cache has no delay, but draws down +in the background to the MLC/TLC primary store at the max_write_speed rate. +Any write speed delays will show up when the cache is full, or when an I/O +request to flush is issued.
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt index 0535ae1..875b2b5 100644 --- a/Documentation/blockdev/zram.txt +++ b/Documentation/blockdev/zram.txt
@@ -161,42 +161,15 @@ disksize RW show and set the device's disk size initstate RO shows the initialization state of the device reset WO trigger device reset -num_reads RO the number of reads -failed_reads RO the number of failed reads -num_write RO the number of writes -failed_writes RO the number of failed writes -invalid_io RO the number of non-page-size-aligned I/O requests +mem_used_max WO reset the `mem_used_max' counter (see later) +mem_limit WO specifies the maximum amount of memory ZRAM can use + to store the compressed data max_comp_streams RW the number of possible concurrent compress operations comp_algorithm RW show and change the compression algorithm -notify_free RO the number of notifications to free pages (either - slot free notifications or REQ_DISCARD requests) -zero_pages RO the number of zero filled pages written to this disk -orig_data_size RO uncompressed size of data stored in this disk -compr_data_size RO compressed size of data stored in this disk -mem_used_total RO the amount of memory allocated for this disk -mem_used_max RW the maximum amount of memory zram have consumed to - store the data (to reset this counter to the actual - current value, write 1 to this attribute) -mem_limit RW the maximum amount of memory ZRAM can use to store - the compressed data -pages_compacted RO the number of pages freed during compaction - (available only via zram<id>/mm_stat node) compact WO trigger memory compaction debug_stat RO this file is used for zram debugging purposes +backing_dev RW set up backend storage for zram to write out -WARNING -======= -per-stat sysfs attributes are considered to be deprecated. -The basic strategy is: --- the existing RW nodes will be downgraded to WO nodes (in linux 4.11) --- deprecated RO sysfs nodes will eventually be removed (in linux 4.11) - -The list of deprecated attributes can be found here: -Documentation/ABI/obsolete/sysfs-block-zram - -Basically, every attribute that has its own read accessible sysfs node -(e.g. num_reads) *AND* is accessible via one of the stat files (zram<id>/stat -or zram<id>/io_stat or zram<id>/mm_stat) is considered to be deprecated. User space is advised to use the following files to read the device statistics. @@ -211,22 +184,41 @@ layer and, thus, not available in zram<id>/stat file. It consists of a single line of text and contains the following stats separated by whitespace: - failed_reads - failed_writes - invalid_io - notify_free + failed_reads the number of failed reads + failed_writes the number of failed writes + invalid_io the number of non-page-size-aligned I/O requests + notify_free Depending on device usage scenario it may account + a) the number of pages freed because of swap slot free + notifications or b) the number of pages freed because of + REQ_DISCARD requests sent by bio. The former ones are + sent to a swap block device when a swap slot is freed, + which implies that this disk is being used as a swap disk. + The latter ones are sent by filesystem mounted with + discard option, whenever some data blocks are getting + discarded. File /sys/block/zram<id>/mm_stat The stat file represents device's mm statistics. It consists of a single line of text and contains the following stats separated by whitespace: - orig_data_size - compr_data_size - mem_used_total - mem_limit - mem_used_max - zero_pages - num_migrated + orig_data_size uncompressed size of data stored in this disk. + This excludes same-element-filled pages (same_pages) since + no memory is allocated for them. + Unit: bytes + compr_data_size compressed size of data stored in this disk + mem_used_total the amount of memory allocated for this disk. This + includes allocator fragmentation and metadata overhead, + allocated for this disk. So, allocator space efficiency + can be calculated using compr_data_size and this statistic. + Unit: bytes + mem_limit the maximum amount of memory ZRAM can use to store + the compressed data + mem_used_max the maximum amount of memory zram have consumed to + store the data + same_pages the number of same element filled pages written to this disk. + No memory is allocated for such pages. + pages_compacted the number of pages freed during compaction + huge_pages the number of incompressible pages 9) Deactivate: swapoff /dev/zram0 @@ -241,5 +233,39 @@ resets the disksize to zero. You must set the disksize again before reusing the device. +* Optional Feature + += writeback + +With incompressible pages, there is no memory saving with zram. +Instead, with CONFIG_ZRAM_WRITEBACK, zram can write incompressible page +to backing storage rather than keeping it in memory. +User should set up backing device via /sys/block/zramX/backing_dev +before disksize setting. + += memory tracking + +With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the +zram block. It could be useful to catch cold or incompressible +pages of the process with*pagemap. +If you enable the feature, you could see block state via +/sys/kernel/debug/zram/zram0/block_state". The output is as follows, + + 300 75.033841 .wh + 301 63.806904 s.. + 302 63.806919 ..h + +First column is zram's block index. +Second column is access time since the system was booted +Third column is state of the block. +(s: same page +w: written page to backing store +h: huge page) + +First line of above example says 300th block is accessed at 75.033841sec +and the block's state is huge so it is written back to the backing +storage. It's a debugging feature so anyone shouldn't rely on it to work +properly. + Nitin Gupta ngupta@vflare.org
diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt index c15aa75..0cf9a6b 100644 --- a/Documentation/cpu-freq/governors.txt +++ b/Documentation/cpu-freq/governors.txt
@@ -28,6 +28,7 @@ 2.3 Userspace 2.4 Ondemand 2.5 Conservative +2.6 Interactive 3. The Governor Interface in the CPUfreq Core @@ -218,6 +219,91 @@ speed. Load for frequency increase is still evaluated every sampling rate. +2.6 Interactive +--------------- + +The CPUfreq governor "interactive" is designed for latency-sensitive, +interactive workloads. This governor sets the CPU speed depending on +usage, similar to "ondemand" and "conservative" governors, but with a +different set of configurable behaviors. + +The tunable values for this governor are: + +above_hispeed_delay: When speed is at or above hispeed_freq, wait for +this long before raising speed in response to continued high load. +The format is a single delay value, optionally followed by pairs of +CPU speeds and the delay to use at or above those speeds. Colons can +be used between the speeds and associated delays for readability. For +example: + + 80000 1300000:200000 1500000:40000 + +uses delay 80000 uS until CPU speed 1.3 GHz, at which speed delay +200000 uS is used until speed 1.5 GHz, at which speed (and above) +delay 40000 uS is used. If speeds are specified these must appear in +ascending order. Default is 20000 uS. + +boost: If non-zero, immediately boost speed of all CPUs to at least +hispeed_freq until zero is written to this attribute. If zero, allow +CPU speeds to drop below hispeed_freq according to load as usual. +Default is zero. + +boostpulse: On each write, immediately boost speed of all CPUs to +hispeed_freq for at least the period of time specified by +boostpulse_duration, after which speeds are allowed to drop below +hispeed_freq according to load as usual. Its a write-only file. + +boostpulse_duration: Length of time to hold CPU speed at hispeed_freq +on a write to boostpulse, before allowing speed to drop according to +load as usual. Default is 80000 uS. + +go_hispeed_load: The CPU load at which to ramp to hispeed_freq. +Default is 99%. + +hispeed_freq: An intermediate "high speed" at which to initially ramp +when CPU load hits the value specified in go_hispeed_load. If load +stays high for the amount of time specified in above_hispeed_delay, +then speed may be bumped higher. Default is the maximum speed allowed +by the policy at governor initialization time. + +io_is_busy: If set, the governor accounts IO time as CPU busy time. + +min_sample_time: The minimum amount of time to spend at the current +frequency before ramping down. Default is 80000 uS. + +target_loads: CPU load values used to adjust speed to influence the +current CPU load toward that value. In general, the lower the target +load, the more often the governor will raise CPU speeds to bring load +below the target. The format is a single target load, optionally +followed by pairs of CPU speeds and CPU loads to target at or above +those speeds. Colons can be used between the speeds and associated +target loads for readability. For example: + + 85 1000000:90 1700000:99 + +targets CPU load 85% below speed 1GHz, 90% at or above 1GHz, until +1.7GHz and above, at which load 99% is targeted. If speeds are +specified these must appear in ascending order. Higher target load +values are typically specified for higher speeds, that is, target load +values also usually appear in an ascending order. The default is +target load 90% for all speeds. + +timer_rate: Sample rate for reevaluating CPU load when the CPU is not +idle. A deferrable timer is used, such that the CPU will not be woken +from idle to service this timer until something else needs to run. +(The maximum time to allow deferring this timer when not running at +minimum speed is configurable via timer_slack.) Default is 20000 uS. + +timer_slack: Maximum additional time to defer handling the governor +sampling timer beyond timer_rate when running at speeds above the +minimum. For platforms that consume additional power at idle when +CPUs are running at speeds greater than minimum, this places an upper +bound on how long the timer will be deferred prior to re-evaluating +load and dropping speed. For example, if timer_rate is 20000uS and +timer_slack is 10000uS then timers will be deferred for up to 30msec +when not at lowest speed. A value of -1 means defer timers +indefinitely at all speeds. Default is 80000 uS. + 3. The Governor Interface in the CPUfreq Core =============================================
diff --git a/Documentation/device-mapper/boot.txt b/Documentation/device-mapper/boot.txt new file mode 100644 index 0000000..adcaad5 --- /dev/null +++ b/Documentation/device-mapper/boot.txt
@@ -0,0 +1,42 @@ +Boot time creation of mapped devices +=================================== + +It is possible to configure a device mapper device to act as the root +device for your system in two ways. + +The first is to build an initial ramdisk which boots to a minimal +userspace which configures the device, then pivot_root(8) in to it. + +For simple device mapper configurations, it is possible to boot directly +using the following kernel command line: + +dm="<name> <uuid> <ro>,table line 1,...,table line n" + +name = the name to associate with the device + after boot, udev, if used, will use that name to label + the device node. +uuid = may be 'none' or the UUID desired for the device. +ro = may be "ro" or "rw". If "ro", the device and device table will be + marked read-only. + +Each table line may be as normal when using the dmsetup tool except for +two variations: +1. Any use of commas will be interpreted as a newline +2. Quotation marks cannot be escaped and cannot be used without + terminating the dm= argument. + +Unless renamed by udev, the device node created will be dm-0 as the +first minor number for the device-mapper is used during early creation. + +Example +======= + +- Booting to a linear array made up of user-mode linux block devices: + + dm="lroot none 0, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" \ + root=/dev/dm-0 + +Will boot to a rw dm-linear target of 8192 sectors split across two +block devices identified by their major:minor numbers. After boot, udev +will rename this target to /dev/mapper/lroot (depending on the rules). +No uuid was assigned.
diff --git a/Documentation/device-mapper/verity.txt b/Documentation/device-mapper/verity.txt index 89fd8f9..b3d2e4a 100644 --- a/Documentation/device-mapper/verity.txt +++ b/Documentation/device-mapper/verity.txt
@@ -109,6 +109,17 @@ This is the offset, in <data_block_size> blocks, from the start of the FEC device to the beginning of the encoding data. +check_at_most_once + Verify data blocks only the first time they are read from the data device, + rather than every time. This reduces the overhead of dm-verity so that it + can be used on systems that are memory and/or CPU constrained. However, it + provides a reduced level of security because only offline tampering of the + data device's content will be detected, not online tampering. + + Hash blocks are still verified each time they are read from the hash device, + since verification of hash blocks is less performance critical than data + blocks, and a hash block will not be verified any more after all the data + blocks it covers have been verified anyway. Theory of operation ===================
diff --git a/Documentation/devicetree/bindings/arm/firmware/linaro,optee-tz.txt b/Documentation/devicetree/bindings/arm/firmware/linaro,optee-tz.txt new file mode 100644 index 0000000..d38834c --- /dev/null +++ b/Documentation/devicetree/bindings/arm/firmware/linaro,optee-tz.txt
@@ -0,0 +1,31 @@ +OP-TEE Device Tree Bindings + +OP-TEE is a piece of software using hardware features to provide a Trusted +Execution Environment. The security can be provided with ARM TrustZone, but +also by virtualization or a separate chip. + +We're using "linaro" as the first part of the compatible property for +the reference implementation maintained by Linaro. + +* OP-TEE based on ARM TrustZone required properties: + +- compatible : should contain "linaro,optee-tz" + +- method : The method of calling the OP-TEE Trusted OS. Permitted + values are: + + "smc" : SMC #0, with the register assignments specified + in drivers/tee/optee/optee_smc.h + + "hvc" : HVC #0, with the register assignments specified + in drivers/tee/optee/optee_smc.h + + + +Example: + firmware { + optee { + compatible = "linaro,optee-tz"; + method = "smc"; + }; + };
diff --git a/Documentation/devicetree/bindings/misc/memory-state-time.txt b/Documentation/devicetree/bindings/misc/memory-state-time.txt new file mode 100644 index 0000000..c99a506 --- /dev/null +++ b/Documentation/devicetree/bindings/misc/memory-state-time.txt
@@ -0,0 +1,8 @@ +Memory bandwidth and frequency state tracking + +Required properties: +- compatible : should be: + "memory-state-time" +- freq-tbl: Should contain entries with each frequency in Hz. +- bw-buckets: Should contain upper-bound limits for each bandwidth bucket in Mbps. + Must match the framework power_profile.xml for the device.
diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt index bceffff..d0526e0 100644 --- a/Documentation/devicetree/bindings/vendor-prefixes.txt +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -156,6 +156,7 @@ lantiq Lantiq Semiconductor lenovo Lenovo Group Ltd. lg LG Corporation +linaro Linaro Limited linux Linux-specific binding lltc Linear Technology Corporation lsi LSI Corp. (LSI Logic)
diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt index 753dd4f..d9a0f69 100644 --- a/Documentation/filesystems/f2fs.txt +++ b/Documentation/filesystems/f2fs.txt
@@ -154,9 +154,62 @@ enabled by default. data_flush Enable data flushing before checkpoint in order to persist data of regular and symlink. +fault_injection=%d Enable fault injection in all supported types with + specified injection rate. +fault_type=%d Support configuring fault injection type, should be + enabled with fault_injection option, fault type value + is shown below, it supports single or combined type. + Type_Name Type_Value + FAULT_KMALLOC 0x000000001 + FAULT_KVMALLOC 0x000000002 + FAULT_PAGE_ALLOC 0x000000004 + FAULT_PAGE_GET 0x000000008 + FAULT_ALLOC_BIO 0x000000010 + FAULT_ALLOC_NID 0x000000020 + FAULT_ORPHAN 0x000000040 + FAULT_BLOCK 0x000000080 + FAULT_DIR_DEPTH 0x000000100 + FAULT_EVICT_INODE 0x000000200 + FAULT_TRUNCATE 0x000000400 + FAULT_IO 0x000000800 + FAULT_CHECKPOINT 0x000001000 + FAULT_DISCARD 0x000002000 mode=%s Control block allocation mode which supports "adaptive" and "lfs". In "lfs" mode, there should be no random writes towards main area. +io_bits=%u Set the bit size of write IO requests. It should be set + with "mode=lfs". +usrquota Enable plain user disk quota accounting. +grpquota Enable plain group disk quota accounting. +prjquota Enable plain project quota accounting. +usrjquota=<file> Appoint specified file and type during mount, so that quota +grpjquota=<file> information can be properly updated during recovery flow, +prjjquota=<file> <quota file>: must be in root directory; +jqfmt=<quota type> <quota type>: [vfsold,vfsv0,vfsv1]. +offusrjquota Turn off user journelled quota. +offgrpjquota Turn off group journelled quota. +offprjjquota Turn off project journelled quota. +quota Enable plain user disk quota accounting. +noquota Disable all plain disk quota option. +whint_mode=%s Control which write hints are passed down to block + layer. This supports "off", "user-based", and + "fs-based". In "off" mode (default), f2fs does not pass + down hints. In "user-based" mode, f2fs tries to pass + down hints given by users. And in "fs-based" mode, f2fs + passes down hints with its policy. +alloc_mode=%s Adjust block allocation policy, which supports "reuse" + and "default". +fsync_mode=%s Control the policy of fsync. Currently supports "posix", + "strict", and "nobarrier". In "posix" mode, which is + default, fsync will follow POSIX semantics and does a + light operation to improve the filesystem performance. + In "strict" mode, fsync will be heavy and behaves in line + with xfs, ext4 and btrfs, where xfstest generic/342 will + pass, but the performance will regress. "nobarrier" is + based on "posix", but doesn't issue flush command for + non-atomic files likewise "nobarrier" mount option. +test_dummy_encryption Enable dummy encryption, which provides a fake fscrypt + context. The fake fscrypt context is used by xfstests. ================================================================================ DEBUGFS ENTRIES @@ -202,6 +255,15 @@ gc_idle = 1 will select the Cost Benefit approach & setting gc_idle = 2 will select the greedy approach. + gc_urgent This parameter controls triggering background GCs + urgently or not. Setting gc_urgent = 0 [default] + makes back to default behavior, while if it is set + to 1, background thread starts to do GC by given + gc_urgent_sleep_time interval. + + gc_urgent_sleep_time This parameter controls sleep time for gc_urgent. + 500 ms is set by default. See above gc_urgent. + reclaim_segments This parameter controls the number of prefree segments to be reclaimed. If the number of prefree segments is larger than the number of segments
diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst new file mode 100644 index 0000000..48b424d --- /dev/null +++ b/Documentation/filesystems/fscrypt.rst
@@ -0,0 +1,626 @@ +===================================== +Filesystem-level encryption (fscrypt) +===================================== + +Introduction +============ + +fscrypt is a library which filesystems can hook into to support +transparent encryption of files and directories. + +Note: "fscrypt" in this document refers to the kernel-level portion, +implemented in ``fs/crypto/``, as opposed to the userspace tool +`fscrypt <https://github.com/google/fscrypt>`_. This document only +covers the kernel-level portion. For command-line examples of how to +use encryption, see the documentation for the userspace tool `fscrypt +<https://github.com/google/fscrypt>`_. Also, it is recommended to use +the fscrypt userspace tool, or other existing userspace tools such as +`fscryptctl <https://github.com/google/fscryptctl>`_ or `Android's key +management system +<https://source.android.com/security/encryption/file-based>`_, over +using the kernel's API directly. Using existing tools reduces the +chance of introducing your own security bugs. (Nevertheless, for +completeness this documentation covers the kernel's API anyway.) + +Unlike dm-crypt, fscrypt operates at the filesystem level rather than +at the block device level. This allows it to encrypt different files +with different keys and to have unencrypted files on the same +filesystem. This is useful for multi-user systems where each user's +data-at-rest needs to be cryptographically isolated from the others. +However, except for filenames, fscrypt does not encrypt filesystem +metadata. + +Unlike eCryptfs, which is a stacked filesystem, fscrypt is integrated +directly into supported filesystems --- currently ext4, F2FS, and +UBIFS. This allows encrypted files to be read and written without +caching both the decrypted and encrypted pages in the pagecache, +thereby nearly halving the memory used and bringing it in line with +unencrypted files. Similarly, half as many dentries and inodes are +needed. eCryptfs also limits encrypted filenames to 143 bytes, +causing application compatibility issues; fscrypt allows the full 255 +bytes (NAME_MAX). Finally, unlike eCryptfs, the fscrypt API can be +used by unprivileged users, with no need to mount anything. + +fscrypt does not support encrypting files in-place. Instead, it +supports marking an empty directory as encrypted. Then, after +userspace provides the key, all regular files, directories, and +symbolic links created in that directory tree are transparently +encrypted. + +Threat model +============ + +Offline attacks +--------------- + +Provided that userspace chooses a strong encryption key, fscrypt +protects the confidentiality of file contents and filenames in the +event of a single point-in-time permanent offline compromise of the +block device content. fscrypt does not protect the confidentiality of +non-filename metadata, e.g. file sizes, file permissions, file +timestamps, and extended attributes. Also, the existence and location +of holes (unallocated blocks which logically contain all zeroes) in +files is not protected. + +fscrypt is not guaranteed to protect confidentiality or authenticity +if an attacker is able to manipulate the filesystem offline prior to +an authorized user later accessing the filesystem. + +Online attacks +-------------- + +fscrypt (and storage encryption in general) can only provide limited +protection, if any at all, against online attacks. In detail: + +fscrypt is only resistant to side-channel attacks, such as timing or +electromagnetic attacks, to the extent that the underlying Linux +Cryptographic API algorithms are. If a vulnerable algorithm is used, +such as a table-based implementation of AES, it may be possible for an +attacker to mount a side channel attack against the online system. +Side channel attacks may also be mounted against applications +consuming decrypted data. + +After an encryption key has been provided, fscrypt is not designed to +hide the plaintext file contents or filenames from other users on the +same system, regardless of the visibility of the keyring key. +Instead, existing access control mechanisms such as file mode bits, +POSIX ACLs, LSMs, or mount namespaces should be used for this purpose. +Also note that as long as the encryption keys are *anywhere* in +memory, an online attacker can necessarily compromise them by mounting +a physical attack or by exploiting any kernel security vulnerability +which provides an arbitrary memory read primitive. + +While it is ostensibly possible to "evict" keys from the system, +recently accessed encrypted files will remain accessible at least +until the filesystem is unmounted or the VFS caches are dropped, e.g. +using ``echo 2 > /proc/sys/vm/drop_caches``. Even after that, if the +RAM is compromised before being powered off, it will likely still be +possible to recover portions of the plaintext file contents, if not +some of the encryption keys as well. (Since Linux v4.12, all +in-kernel keys related to fscrypt are sanitized before being freed. +However, userspace would need to do its part as well.) + +Currently, fscrypt does not prevent a user from maliciously providing +an incorrect key for another user's existing encrypted files. A +protection against this is planned. + +Key hierarchy +============= + +Master Keys +----------- + +Each encrypted directory tree is protected by a *master key*. Master +keys can be up to 64 bytes long, and must be at least as long as the +greater of the key length needed by the contents and filenames +encryption modes being used. For example, if AES-256-XTS is used for +contents encryption, the master key must be 64 bytes (512 bits). Note +that the XTS mode is defined to require a key twice as long as that +required by the underlying block cipher. + +To "unlock" an encrypted directory tree, userspace must provide the +appropriate master key. There can be any number of master keys, each +of which protects any number of directory trees on any number of +filesystems. + +Userspace should generate master keys either using a cryptographically +secure random number generator, or by using a KDF (Key Derivation +Function). Note that whenever a KDF is used to "stretch" a +lower-entropy secret such as a passphrase, it is critical that a KDF +designed for this purpose be used, such as scrypt, PBKDF2, or Argon2. + +Per-file keys +------------- + +Master keys are not used to encrypt file contents or names directly. +Instead, a unique key is derived for each encrypted file, including +each regular file, directory, and symbolic link. This has several +advantages: + +- In cryptosystems, the same key material should never be used for + different purposes. Using the master key as both an XTS key for + contents encryption and as a CTS-CBC key for filenames encryption + would violate this rule. +- Per-file keys simplify the choice of IVs (Initialization Vectors) + for contents encryption. Without per-file keys, to ensure IV + uniqueness both the inode and logical block number would need to be + encoded in the IVs. This would make it impossible to renumber + inodes, which e.g. ``resize2fs`` can do when resizing an ext4 + filesystem. With per-file keys, it is sufficient to encode just the + logical block number in the IVs. +- Per-file keys strengthen the encryption of filenames, where IVs are + reused out of necessity. With a unique key per directory, IV reuse + is limited to within a single directory. +- Per-file keys allow individual files to be securely erased simply by + securely erasing their keys. (Not yet implemented.) + +A KDF (Key Derivation Function) is used to derive per-file keys from +the master key. This is done instead of wrapping a randomly-generated +key for each file because it reduces the size of the encryption xattr, +which for some filesystems makes the xattr more likely to fit in-line +in the filesystem's inode table. With a KDF, only a 16-byte nonce is +required --- long enough to make key reuse extremely unlikely. A +wrapped key, on the other hand, would need to be up to 64 bytes --- +the length of an AES-256-XTS key. Furthermore, currently there is no +requirement to support unlocking a file with multiple alternative +master keys or to support rotating master keys. Instead, the master +keys may be wrapped in userspace, e.g. as done by the `fscrypt +<https://github.com/google/fscrypt>`_ tool. + +The current KDF encrypts the master key using the 16-byte nonce as an +AES-128-ECB key. The output is used as the derived key. If the +output is longer than needed, then it is truncated to the needed +length. Truncation is the norm for directories and symlinks, since +those use the CTS-CBC encryption mode which requires a key half as +long as that required by the XTS encryption mode. + +Note: this KDF meets the primary security requirement, which is to +produce unique derived keys that preserve the entropy of the master +key, assuming that the master key is already a good pseudorandom key. +However, it is nonstandard and has some problems such as being +reversible, so it is generally considered to be a mistake! It may be +replaced with HKDF or another more standard KDF in the future. + +Encryption modes and usage +========================== + +fscrypt allows one encryption mode to be specified for file contents +and one encryption mode to be specified for filenames. Different +directory trees are permitted to use different encryption modes. +Currently, the following pairs of encryption modes are supported: + +- AES-256-XTS for contents and AES-256-CTS-CBC for filenames +- AES-128-CBC for contents and AES-128-CTS-CBC for filenames +- Speck128/256-XTS for contents and Speck128/256-CTS-CBC for filenames + +It is strongly recommended to use AES-256-XTS for contents encryption. +AES-128-CBC was added only for low-powered embedded devices with +crypto accelerators such as CAAM or CESA that do not support XTS. + +Similarly, Speck128/256 support was only added for older or low-end +CPUs which cannot do AES fast enough -- especially ARM CPUs which have +NEON instructions but not the Cryptography Extensions -- and for which +it would not otherwise be feasible to use encryption at all. It is +not recommended to use Speck on CPUs that have AES instructions. +Speck support is only available if it has been enabled in the crypto +API via CONFIG_CRYPTO_SPECK. Also, on ARM platforms, to get +acceptable performance CONFIG_CRYPTO_SPECK_NEON must be enabled. + +New encryption modes can be added relatively easily, without changes +to individual filesystems. However, authenticated encryption (AE) +modes are not currently supported because of the difficulty of dealing +with ciphertext expansion. + +For file contents, each filesystem block is encrypted independently. +Currently, only the case where the filesystem block size is equal to +the system's page size (usually 4096 bytes) is supported. With the +XTS mode of operation (recommended), the logical block number within +the file is used as the IV. With the CBC mode of operation (not +recommended), ESSIV is used; specifically, the IV for CBC is the +logical block number encrypted with AES-256, where the AES-256 key is +the SHA-256 hash of the inode's data encryption key. + +For filenames, the full filename is encrypted at once. Because of the +requirements to retain support for efficient directory lookups and +filenames of up to 255 bytes, a constant initialization vector (IV) is +used. However, each encrypted directory uses a unique key, which +limits IV reuse to within a single directory. Note that IV reuse in +the context of CTS-CBC encryption means that when the original +filenames share a common prefix at least as long as the cipher block +size (16 bytes for AES), the corresponding encrypted filenames will +also share a common prefix. This is undesirable; it may be fixed in +the future by switching to an encryption mode that is a strong +pseudorandom permutation on arbitrary-length messages, e.g. the HEH +(Hash-Encrypt-Hash) mode. + +Since filenames are encrypted with the CTS-CBC mode of operation, the +plaintext and ciphertext filenames need not be multiples of the AES +block size, i.e. 16 bytes. However, the minimum size that can be +encrypted is 16 bytes, so shorter filenames are NUL-padded to 16 bytes +before being encrypted. In addition, to reduce leakage of filename +lengths via their ciphertexts, all filenames are NUL-padded to the +next 4, 8, 16, or 32-byte boundary (configurable). 32 is recommended +since this provides the best confidentiality, at the cost of making +directory entries consume slightly more space. Note that since NUL +(``\0``) is not otherwise a valid character in filenames, the padding +will never produce duplicate plaintexts. + +Symbolic link targets are considered a type of filename and are +encrypted in the same way as filenames in directory entries. Each +symlink also uses a unique key; hence, the hardcoded IV is not a +problem for symlinks. + +User API +======== + +Setting an encryption policy +---------------------------- + +The FS_IOC_SET_ENCRYPTION_POLICY ioctl sets an encryption policy on an +empty directory or verifies that a directory or regular file already +has the specified encryption policy. It takes in a pointer to a +:c:type:`struct fscrypt_policy`, defined as follows:: + + #define FS_KEY_DESCRIPTOR_SIZE 8 + + struct fscrypt_policy { + __u8 version; + __u8 contents_encryption_mode; + __u8 filenames_encryption_mode; + __u8 flags; + __u8 master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE]; + }; + +This structure must be initialized as follows: + +- ``version`` must be 0. + +- ``contents_encryption_mode`` and ``filenames_encryption_mode`` must + be set to constants from ``<linux/fs.h>`` which identify the + encryption modes to use. If unsure, use + FS_ENCRYPTION_MODE_AES_256_XTS (1) for ``contents_encryption_mode`` + and FS_ENCRYPTION_MODE_AES_256_CTS (4) for + ``filenames_encryption_mode``. + +- ``flags`` must be set to a value from ``<linux/fs.h>`` which + identifies the amount of NUL-padding to use when encrypting + filenames. If unsure, use FS_POLICY_FLAGS_PAD_32 (0x3). + +- ``master_key_descriptor`` specifies how to find the master key in + the keyring; see `Adding keys`_. It is up to userspace to choose a + unique ``master_key_descriptor`` for each master key. The e4crypt + and fscrypt tools use the first 8 bytes of + ``SHA-512(SHA-512(master_key))``, but this particular scheme is not + required. Also, the master key need not be in the keyring yet when + FS_IOC_SET_ENCRYPTION_POLICY is executed. However, it must be added + before any files can be created in the encrypted directory. + +If the file is not yet encrypted, then FS_IOC_SET_ENCRYPTION_POLICY +verifies that the file is an empty directory. If so, the specified +encryption policy is assigned to the directory, turning it into an +encrypted directory. After that, and after providing the +corresponding master key as described in `Adding keys`_, all regular +files, directories (recursively), and symlinks created in the +directory will be encrypted, inheriting the same encryption policy. +The filenames in the directory's entries will be encrypted as well. + +Alternatively, if the file is already encrypted, then +FS_IOC_SET_ENCRYPTION_POLICY validates that the specified encryption +policy exactly matches the actual one. If they match, then the ioctl +returns 0. Otherwise, it fails with EEXIST. This works on both +regular files and directories, including nonempty directories. + +Note that the ext4 filesystem does not allow the root directory to be +encrypted, even if it is empty. Users who want to encrypt an entire +filesystem with one key should consider using dm-crypt instead. + +FS_IOC_SET_ENCRYPTION_POLICY can fail with the following errors: + +- ``EACCES``: the file is not owned by the process's uid, nor does the + process have the CAP_FOWNER capability in a namespace with the file + owner's uid mapped +- ``EEXIST``: the file is already encrypted with an encryption policy + different from the one specified +- ``EINVAL``: an invalid encryption policy was specified (invalid + version, mode(s), or flags) +- ``ENOTDIR``: the file is unencrypted and is a regular file, not a + directory +- ``ENOTEMPTY``: the file is unencrypted and is a nonempty directory +- ``ENOTTY``: this type of filesystem does not implement encryption +- ``EOPNOTSUPP``: the kernel was not configured with encryption + support for this filesystem, or the filesystem superblock has not + had encryption enabled on it. (For example, to use encryption on an + ext4 filesystem, CONFIG_EXT4_ENCRYPTION must be enabled in the + kernel config, and the superblock must have had the "encrypt" + feature flag enabled using ``tune2fs -O encrypt`` or ``mkfs.ext4 -O + encrypt``.) +- ``EPERM``: this directory may not be encrypted, e.g. because it is + the root directory of an ext4 filesystem +- ``EROFS``: the filesystem is readonly + +Getting an encryption policy +---------------------------- + +The FS_IOC_GET_ENCRYPTION_POLICY ioctl retrieves the :c:type:`struct +fscrypt_policy`, if any, for a directory or regular file. See above +for the struct definition. No additional permissions are required +beyond the ability to open the file. + +FS_IOC_GET_ENCRYPTION_POLICY can fail with the following errors: + +- ``EINVAL``: the file is encrypted, but it uses an unrecognized + encryption context format +- ``ENODATA``: the file is not encrypted +- ``ENOTTY``: this type of filesystem does not implement encryption +- ``EOPNOTSUPP``: the kernel was not configured with encryption + support for this filesystem + +Note: if you only need to know whether a file is encrypted or not, on +most filesystems it is also possible to use the FS_IOC_GETFLAGS ioctl +and check for FS_ENCRYPT_FL, or to use the statx() system call and +check for STATX_ATTR_ENCRYPTED in stx_attributes. + +Getting the per-filesystem salt +------------------------------- + +Some filesystems, such as ext4 and F2FS, also support the deprecated +ioctl FS_IOC_GET_ENCRYPTION_PWSALT. This ioctl retrieves a randomly +generated 16-byte value stored in the filesystem superblock. This +value is intended to used as a salt when deriving an encryption key +from a passphrase or other low-entropy user credential. + +FS_IOC_GET_ENCRYPTION_PWSALT is deprecated. Instead, prefer to +generate and manage any needed salt(s) in userspace. + +Adding keys +----------- + +To provide a master key, userspace must add it to an appropriate +keyring using the add_key() system call (see: +``Documentation/security/keys/core.rst``). The key type must be +"logon"; keys of this type are kept in kernel memory and cannot be +read back by userspace. The key description must be "fscrypt:" +followed by the 16-character lower case hex representation of the +``master_key_descriptor`` that was set in the encryption policy. The +key payload must conform to the following structure:: + + #define FS_MAX_KEY_SIZE 64 + + struct fscrypt_key { + u32 mode; + u8 raw[FS_MAX_KEY_SIZE]; + u32 size; + }; + +``mode`` is ignored; just set it to 0. The actual key is provided in +``raw`` with ``size`` indicating its size in bytes. That is, the +bytes ``raw[0..size-1]`` (inclusive) are the actual key. + +The key description prefix "fscrypt:" may alternatively be replaced +with a filesystem-specific prefix such as "ext4:". However, the +filesystem-specific prefixes are deprecated and should not be used in +new programs. + +There are several different types of keyrings in which encryption keys +may be placed, such as a session keyring, a user session keyring, or a +user keyring. Each key must be placed in a keyring that is "attached" +to all processes that might need to access files encrypted with it, in +the sense that request_key() will find the key. Generally, if only +processes belonging to a specific user need to access a given +encrypted directory and no session keyring has been installed, then +that directory's key should be placed in that user's user session +keyring or user keyring. Otherwise, a session keyring should be +installed if needed, and the key should be linked into that session +keyring, or in a keyring linked into that session keyring. + +Note: introducing the complex visibility semantics of keyrings here +was arguably a mistake --- especially given that by design, after any +process successfully opens an encrypted file (thereby setting up the +per-file key), possessing the keyring key is not actually required for +any process to read/write the file until its in-memory inode is +evicted. In the future there probably should be a way to provide keys +directly to the filesystem instead, which would make the intended +semantics clearer. + +Access semantics +================ + +With the key +------------ + +With the encryption key, encrypted regular files, directories, and +symlinks behave very similarly to their unencrypted counterparts --- +after all, the encryption is intended to be transparent. However, +astute users may notice some differences in behavior: + +- Unencrypted files, or files encrypted with a different encryption + policy (i.e. different key, modes, or flags), cannot be renamed or + linked into an encrypted directory; see `Encryption policy + enforcement`_. Attempts to do so will fail with EPERM. However, + encrypted files can be renamed within an encrypted directory, or + into an unencrypted directory. + +- Direct I/O is not supported on encrypted files. Attempts to use + direct I/O on such files will fall back to buffered I/O. + +- The fallocate operations FALLOC_FL_COLLAPSE_RANGE, + FALLOC_FL_INSERT_RANGE, and FALLOC_FL_ZERO_RANGE are not supported + on encrypted files and will fail with EOPNOTSUPP. + +- Online defragmentation of encrypted files is not supported. The + EXT4_IOC_MOVE_EXT and F2FS_IOC_MOVE_RANGE ioctls will fail with + EOPNOTSUPP. + +- The ext4 filesystem does not support data journaling with encrypted + regular files. It will fall back to ordered data mode instead. + +- DAX (Direct Access) is not supported on encrypted files. + +- The st_size of an encrypted symlink will not necessarily give the + length of the symlink target as required by POSIX. It will actually + give the length of the ciphertext, which will be slightly longer + than the plaintext due to NUL-padding and an extra 2-byte overhead. + +- The maximum length of an encrypted symlink is 2 bytes shorter than + the maximum length of an unencrypted symlink. For example, on an + EXT4 filesystem with a 4K block size, unencrypted symlinks can be up + to 4095 bytes long, while encrypted symlinks can only be up to 4093 + bytes long (both lengths excluding the terminating null). + +Note that mmap *is* supported. This is possible because the pagecache +for an encrypted file contains the plaintext, not the ciphertext. + +Without the key +--------------- + +Some filesystem operations may be performed on encrypted regular +files, directories, and symlinks even before their encryption key has +been provided: + +- File metadata may be read, e.g. using stat(). + +- Directories may be listed, in which case the filenames will be + listed in an encoded form derived from their ciphertext. The + current encoding algorithm is described in `Filename hashing and + encoding`_. The algorithm is subject to change, but it is + guaranteed that the presented filenames will be no longer than + NAME_MAX bytes, will not contain the ``/`` or ``\0`` characters, and + will uniquely identify directory entries. + + The ``.`` and ``..`` directory entries are special. They are always + present and are not encrypted or encoded. + +- Files may be deleted. That is, nondirectory files may be deleted + with unlink() as usual, and empty directories may be deleted with + rmdir() as usual. Therefore, ``rm`` and ``rm -r`` will work as + expected. + +- Symlink targets may be read and followed, but they will be presented + in encrypted form, similar to filenames in directories. Hence, they + are unlikely to point to anywhere useful. + +Without the key, regular files cannot be opened or truncated. +Attempts to do so will fail with ENOKEY. This implies that any +regular file operations that require a file descriptor, such as +read(), write(), mmap(), fallocate(), and ioctl(), are also forbidden. + +Also without the key, files of any type (including directories) cannot +be created or linked into an encrypted directory, nor can a name in an +encrypted directory be the source or target of a rename, nor can an +O_TMPFILE temporary file be created in an encrypted directory. All +such operations will fail with ENOKEY. + +It is not currently possible to backup and restore encrypted files +without the encryption key. This would require special APIs which +have not yet been implemented. + +Encryption policy enforcement +============================= + +After an encryption policy has been set on a directory, all regular +files, directories, and symbolic links created in that directory +(recursively) will inherit that encryption policy. Special files --- +that is, named pipes, device nodes, and UNIX domain sockets --- will +not be encrypted. + +Except for those special files, it is forbidden to have unencrypted +files, or files encrypted with a different encryption policy, in an +encrypted directory tree. Attempts to link or rename such a file into +an encrypted directory will fail with EPERM. This is also enforced +during ->lookup() to provide limited protection against offline +attacks that try to disable or downgrade encryption in known locations +where applications may later write sensitive data. It is recommended +that systems implementing a form of "verified boot" take advantage of +this by validating all top-level encryption policies prior to access. + +Implementation details +====================== + +Encryption context +------------------ + +An encryption policy is represented on-disk by a :c:type:`struct +fscrypt_context`. It is up to individual filesystems to decide where +to store it, but normally it would be stored in a hidden extended +attribute. It should *not* be exposed by the xattr-related system +calls such as getxattr() and setxattr() because of the special +semantics of the encryption xattr. (In particular, there would be +much confusion if an encryption policy were to be added to or removed +from anything other than an empty directory.) The struct is defined +as follows:: + + #define FS_KEY_DESCRIPTOR_SIZE 8 + #define FS_KEY_DERIVATION_NONCE_SIZE 16 + + struct fscrypt_context { + u8 format; + u8 contents_encryption_mode; + u8 filenames_encryption_mode; + u8 flags; + u8 master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE]; + u8 nonce[FS_KEY_DERIVATION_NONCE_SIZE]; + }; + +Note that :c:type:`struct fscrypt_context` contains the same +information as :c:type:`struct fscrypt_policy` (see `Setting an +encryption policy`_), except that :c:type:`struct fscrypt_context` +also contains a nonce. The nonce is randomly generated by the kernel +and is used to derive the inode's encryption key as described in +`Per-file keys`_. + +Data path changes +----------------- + +For the read path (->readpage()) of regular files, filesystems can +read the ciphertext into the page cache and decrypt it in-place. The +page lock must be held until decryption has finished, to prevent the +page from becoming visible to userspace prematurely. + +For the write path (->writepage()) of regular files, filesystems +cannot encrypt data in-place in the page cache, since the cached +plaintext must be preserved. Instead, filesystems must encrypt into a +temporary buffer or "bounce page", then write out the temporary +buffer. Some filesystems, such as UBIFS, already use temporary +buffers regardless of encryption. Other filesystems, such as ext4 and +F2FS, have to allocate bounce pages specially for encryption. + +Filename hashing and encoding +----------------------------- + +Modern filesystems accelerate directory lookups by using indexed +directories. An indexed directory is organized as a tree keyed by +filename hashes. When a ->lookup() is requested, the filesystem +normally hashes the filename being looked up so that it can quickly +find the corresponding directory entry, if any. + +With encryption, lookups must be supported and efficient both with and +without the encryption key. Clearly, it would not work to hash the +plaintext filenames, since the plaintext filenames are unavailable +without the key. (Hashing the plaintext filenames would also make it +impossible for the filesystem's fsck tool to optimize encrypted +directories.) Instead, filesystems hash the ciphertext filenames, +i.e. the bytes actually stored on-disk in the directory entries. When +asked to do a ->lookup() with the key, the filesystem just encrypts +the user-supplied name to get the ciphertext. + +Lookups without the key are more complicated. The raw ciphertext may +contain the ``\0`` and ``/`` characters, which are illegal in +filenames. Therefore, readdir() must base64-encode the ciphertext for +presentation. For most filenames, this works fine; on ->lookup(), the +filesystem just base64-decodes the user-supplied name to get back to +the raw ciphertext. + +However, for very long filenames, base64 encoding would cause the +filename length to exceed NAME_MAX. To prevent this, readdir() +actually presents long filenames in an abbreviated form which encodes +a strong "hash" of the ciphertext filename, along with the optional +filesystem-specific hash(es) needed for directory lookups. This +allows the filesystem to still, with a high degree of confidence, map +the filename given in ->lookup() back to a particular directory entry +that was previously listed by readdir(). See :c:type:`struct +fscrypt_digested_name` in the source for more details. + +Note that the precise way that filenames are presented to userspace +without the key is subject to change in the future. It is only meant +as a way to temporarily present valid filenames so that commands like +``rm -r`` work as expected on encrypted directories.
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 74329fd..6e027ae 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt
@@ -392,6 +392,8 @@ [stack] = the stack of the main process [vdso] = the "virtual dynamic shared object", the kernel system call handler + [anon:<name>] = an anonymous mapping that has been + named by userspace or if empty, the mapping is anonymous. @@ -419,6 +421,7 @@ MMUPageSize: 4 kB Locked: 0 kB VmFlags: rd ex mr mw me dw +Name: name from userspace the first of these lines shows the same information as is displayed for the mapping in /proc/PID/maps. The remaining lines show the size of the mapping @@ -486,6 +489,9 @@ be present in all further kernel releases. Things get changed, the flags may be vanished or the reverse -- new added. +The "Name" field will only be present on a mapping that has been named by +userspace, and will show the name passed in by userspace. + This file is only present if the CONFIG_MMU kernel configuration option is enabled.
diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst index 53b872c..db86cda 100644 --- a/Documentation/gpu/drm-kms.rst +++ b/Documentation/gpu/drm-kms.rst
@@ -308,6 +308,12 @@ .. kernel-doc:: drivers/gpu/drm/drm_color_mgmt.c :export: +Explicit Fencing Properties +--------------------------- + +.. kernel-doc:: drivers/gpu/drm/drm_atomic.c + :doc: explicit fencing properties + Existing KMS Properties -----------------------
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt index 81c7f2b..efb38da 100644 --- a/Documentation/ioctl/ioctl-number.txt +++ b/Documentation/ioctl/ioctl-number.txt
@@ -308,6 +308,7 @@ 0xA3 80-8F Port ACL in development: <mailto:tlewis@mindspring.com> 0xA3 90-9F linux/dtlk.h +0xA4 00-1F uapi/linux/tee.h Generic TEE subsystem 0xAA 00-3F linux/uapi/linux/userfaultfd.h 0xAB 00-1F linux/nbd.h 0xAC 00-1F linux/raw.h
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index f9f67be..ea79e69 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt
@@ -87,6 +87,7 @@ BLACKFIN Blackfin architecture is enabled. CLK Common clock infrastructure is enabled. CMA Contiguous Memory Area support is enabled. + DM Device mapper support is enabled. DRM Direct Rendering Management support is enabled. DYNAMIC_DEBUG Build in debug messages and enable them at runtime EDD BIOS Enhanced Disk Drive Services (EDD) is enabled @@ -1035,6 +1036,11 @@ dis_ucode_ldr [X86] Disable the microcode loader. + dm= [DM] Allows early creation of a device-mapper device. + See Documentation/device-mapper/boot.txt. + + dmasound= [HW,OSS] Sound subsystem buff + dma_debug=off If the kernel is compiled with DMA_API_DEBUG support, this option disables the debugging code at boot. @@ -1897,6 +1903,12 @@ kernel and module base offset ASLR (Address Space Layout Randomization). + kasan_multi_shot + [KNL] Enforce KASAN (Kernel Address Sanitizer) to print + report on every invalid memory access. Without this + parameter KASAN will print report only for the first + invalid access. + keepinitrd [HW,ARM] kernelcore= [KNL,X86,IA-64,PPC] @@ -3968,6 +3980,14 @@ last alloc / free. For more information see Documentation/vm/slub.txt. + slub_memcg_sysfs= [MM, SLUB] + Determines whether to enable sysfs directories for + memory cgroup sub-caches. 1 to enable, 0 to disable. + The default is determined by CONFIG_SLUB_MEMCG_SYSFS_ON. + Enabling this can lead to a very high number of debug + directories and files being created under + /sys/kernel/slub. + slub_max_order= [MM, SLUB] Determines the maximum allowed order for slabs. A high setting may cause OOMs due to memory
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 3db8c67d..f9154c5 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt
@@ -603,6 +603,16 @@ Note that that additional client or server features are only effective if the basic support (0x1 and 0x2) are enabled respectively. +tcp_fwmark_accept - BOOLEAN + If set, incoming connections to listening sockets that do not have a + socket mark will set the mark of the accepting socket to the fwmark of + the incoming SYN packet. This will cause all packets on that connection + (starting from the first SYNACK) to be sent with that fwmark. The + listening socket's mark is unchanged. Listening sockets that already + have a fwmark set via setsockopt(SOL_SOCKET, SO_MARK, ...) are + unaffected. + Default: 0 + tcp_syn_retries - INTEGER Number of times initial SYNs for an active TCP connection attempt will be retransmitted. Should not be higher than 127. Default value @@ -1442,11 +1452,20 @@ Functional default: enabled if accept_ra is enabled. disabled if accept_ra is disabled. +accept_ra_rt_info_min_plen - INTEGER + Minimum prefix length of Route Information in RA. + + Route Information w/ prefix smaller than this variable shall + be ignored. + + Functional default: 0 if accept_ra_rtr_pref is enabled. + -1 if accept_ra_rtr_pref is disabled. + accept_ra_rt_info_max_plen - INTEGER Maximum prefix length of Route Information in RA. - Route Information w/ prefix larger than or equal to this - variable shall be ignored. + Route Information w/ prefix larger than this variable shall + be ignored. Functional default: 0 if accept_ra_rtr_pref is enabled. -1 if accept_ra_rtr_pref is disabled.
diff --git a/Documentation/scheduler/sched-energy.txt b/Documentation/scheduler/sched-energy.txt new file mode 100644 index 0000000..dab2f90 --- /dev/null +++ b/Documentation/scheduler/sched-energy.txt
@@ -0,0 +1,362 @@ +Energy cost model for energy-aware scheduling (EXPERIMENTAL) + +Introduction +============= + +The basic energy model uses platform energy data stored in sched_group_energy +data structures attached to the sched_groups in the sched_domain hierarchy. The +energy cost model offers two functions that can be used to guide scheduling +decisions: + +1. static unsigned int sched_group_energy(struct energy_env *eenv) +2. static int energy_diff(struct energy_env *eenv) + +sched_group_energy() estimates the energy consumed by all cpus in a specific +sched_group including any shared resources owned exclusively by this group of +cpus. Resources shared with other cpus are excluded (e.g. later level caches). + +energy_diff() estimates the total energy impact of a utilization change. That +is, adding, removing, or migrating utilization (tasks). + +Both functions use a struct energy_env to specify the scenario to be evaluated: + + struct energy_env { + struct sched_group *sg_top; + struct sched_group *sg_cap; + int cap_idx; + int util_delta; + int src_cpu; + int dst_cpu; + int energy; + }; + +sg_top: sched_group to be evaluated. Not used by energy_diff(). + +sg_cap: sched_group covering the cpus in the same frequency domain. Set by +sched_group_energy(). + +cap_idx: Capacity state to be used for energy calculations. Set by +find_new_capacity(). + +util_delta: Amount of utilization to be added, removed, or migrated. + +src_cpu: Source cpu from where 'util_delta' utilization is removed. Should be +-1 if no source (e.g. task wake-up). + +dst_cpu: Destination cpu where 'util_delta' utilization is added. Should be -1 +if utilization is removed (e.g. terminating tasks). + +energy: Result of sched_group_energy(). + +The metric used to represent utilization is the actual per-entity running time +averaged over time using a geometric series. Very similar to the existing +per-entity load-tracking, but _not_ scaled by task priority and capped by the +capacity of the cpu. The latter property does mean that utilization may +underestimate the compute requirements for task on fully/over utilized cpus. +The greatest potential for energy savings without affecting performance too much +is scenarios where the system isn't fully utilized. If the system is deemed +fully utilized load-balancing should be done with task load (includes task +priority) instead in the interest of fairness and performance. + + +Background and Terminology +=========================== + +To make it clear from the start: + +energy = [joule] (resource like a battery on powered devices) +power = energy/time = [joule/second] = [watt] + +The goal of energy-aware scheduling is to minimize energy, while still getting +the job done. That is, we want to maximize: + + performance [inst/s] + -------------------- + power [W] + +which is equivalent to minimizing: + + energy [J] + ----------- + instruction + +while still getting 'good' performance. It is essentially an alternative +optimization objective to the current performance-only objective for the +scheduler. This alternative considers two objectives: energy-efficiency and +performance. Hence, there needs to be a user controllable knob to switch the +objective. Since it is early days, this is currently a sched_feature +(ENERGY_AWARE). + +The idea behind introducing an energy cost model is to allow the scheduler to +evaluate the implications of its decisions rather than applying energy-saving +techniques blindly that may only have positive effects on some platforms. At +the same time, the energy cost model must be as simple as possible to minimize +the scheduler latency impact. + +Platform topology +------------------ + +The system topology (cpus, caches, and NUMA information, not peripherals) is +represented in the scheduler by the sched_domain hierarchy which has +sched_groups attached at each level that covers one or more cpus (see +sched-domains.txt for more details). To add energy awareness to the scheduler +we need to consider power and frequency domains. + +Power domain: + +A power domain is a part of the system that can be powered on/off +independently. Power domains are typically organized in a hierarchy where you +may be able to power down just a cpu or a group of cpus along with any +associated resources (e.g. shared caches). Powering up a cpu means that all +power domains it is a part of in the hierarchy must be powered up. Hence, it is +more expensive to power up the first cpu that belongs to a higher level power +domain than powering up additional cpus in the same high level domain. Two +level power domain hierarchy example: + + Power source + +-------------------------------+----... +per group PD G G + | +----------+ | + +--------+-------| Shared | (other groups) +per-cpu PD G G | resource | + | | +----------+ + +-------+ +-------+ + | CPU 0 | | CPU 1 | + +-------+ +-------+ + +Frequency domain: + +Frequency domains (P-states) typically cover the same group of cpus as one of +the power domain levels. That is, there might be several smaller power domains +sharing the same frequency (P-state) or there might be a power domain spanning +multiple frequency domains. + +From a scheduling point of view there is no need to know the actual frequencies +[Hz]. All the scheduler cares about is the compute capacity available at the +current state (P-state) the cpu is in and any other available states. For that +reason, and to also factor in any cpu micro-architecture differences, compute +capacity scaling states are called 'capacity states' in this document. For SMP +systems this is equivalent to P-states. For mixed micro-architecture systems +(like ARM big.LITTLE) it is P-states scaled according to the micro-architecture +performance relative to the other cpus in the system. + +Energy modelling: +------------------ + +Due to the hierarchical nature of the power domains, the most obvious way to +model energy costs is therefore to associate power and energy costs with +domains (groups of cpus). Energy costs of shared resources are associated with +the group of cpus that share the resources, only the cost of powering the +cpu itself and any private resources (e.g. private L1 caches) is associated +with the per-cpu groups (lowest level). + +For example, for an SMP system with per-cpu power domains and a cluster level +(group of cpus) power domain we get the overall energy costs to be: + + energy = energy_cluster + n * energy_cpu + +where 'n' is the number of cpus powered up and energy_cluster is the cost paid +as soon as any cpu in the cluster is powered up. + +The power and frequency domains can naturally be mapped onto the existing +sched_domain hierarchy and sched_groups by adding the necessary data to the +existing data structures. + +The energy model considers energy consumption from two contributors (shown in +the illustration below): + +1. Busy energy: Energy consumed while a cpu and the higher level groups that it +belongs to are busy running tasks. Busy energy is associated with the state of +the cpu, not an event. The time the cpu spends in this state varies. Thus, the +most obvious platform parameter for this contribution is busy power +(energy/time). + +2. Idle energy: Energy consumed while a cpu and higher level groups that it +belongs to are idle (in a C-state). Like busy energy, idle energy is associated +with the state of the cpu. Thus, the platform parameter for this contribution +is idle power (energy/time). + +Energy consumed during transitions from an idle-state (C-state) to a busy state +(P-state) or going the other way is ignored by the model to simplify the energy +model calculations. + + + Power + ^ + | busy->idle idle->busy + | transition transition + | + | _ __ + | / \ / \__________________ + |______________/ \ / + | \ / + | Busy \ Idle / Busy + | low P-state \____________/ high P-state + | + +------------------------------------------------------------> time + +Busy |--------------| |-----------------| + +Wakeup |------| |------| + +Idle |------------| + + +The basic algorithm +==================== + +The basic idea is to determine the total energy impact when utilization is +added or removed by estimating the impact at each level in the sched_domain +hierarchy starting from the bottom (sched_group contains just a single cpu). +The energy cost comes from busy time (sched_group is awake because one or more +cpus are busy) and idle time (in an idle-state). Energy model numbers account +for energy costs associated with all cpus in the sched_group as a group. + + for_each_domain(cpu, sd) { + sg = sched_group_of(cpu) + energy_before = curr_util(sg) * busy_power(sg) + + (1-curr_util(sg)) * idle_power(sg) + energy_after = new_util(sg) * busy_power(sg) + + (1-new_util(sg)) * idle_power(sg) + energy_diff += energy_before - energy_after + + } + + return energy_diff + +{curr, new}_util: The cpu utilization at the lowest level and the overall +non-idle time for the entire group for higher levels. Utilization is in the +range 0.0 to 1.0 in the pseudo-code. + +busy_power: The power consumption of the sched_group. + +idle_power: The power consumption of the sched_group when idle. + +Note: It is a fundamental assumption that the utilization is (roughly) scale +invariant. Task utilization tracking factors in any frequency scaling and +performance scaling differences due to difference cpu microarchitectures such +that task utilization can be used across the entire system. + + +Platform energy data +===================== + +struct sched_group_energy can be attached to sched_groups in the sched_domain +hierarchy and has the following members: + +cap_states: + List of struct capacity_state representing the supported capacity states + (P-states). struct capacity_state has two members: cap and power, which + represents the compute capacity and the busy_power of the state. The + list must be ordered by capacity low->high. + +nr_cap_states: + Number of capacity states in cap_states list. + +idle_states: + List of struct idle_state containing idle_state power cost for each + idle-state supported by the system orderd by shallowest state first. + All states must be included at all level in the hierarchy, i.e. a + sched_group spanning just a single cpu must also include coupled + idle-states (cluster states). In addition to the cpuidle idle-states, + the list must also contain an entry for the idling using the arch + default idle (arch_idle_cpu()). Despite this state may not be a true + hardware idle-state it is considered the shallowest idle-state in the + energy model and must be the first entry. cpus may enter this state + (possibly 'active idling') if cpuidle decides not enter a cpuidle + idle-state. Default idle may not be used when cpuidle is enabled. + In this case, it should just be a copy of the first cpuidle idle-state. + +nr_idle_states: + Number of idle states in idle_states list. + +There are no unit requirements for the energy cost data. Data can be normalized +with any reference, however, the normalization must be consistent across all +energy cost data. That is, one bogo-joule/watt must be the same quantity for +data, but we don't care what it is. + +A recipe for platform characterization +======================================= + +Obtaining the actual model data for a particular platform requires some way of +measuring power/energy. There isn't a tool to help with this (yet). This +section provides a recipe for use as reference. It covers the steps used to +characterize the ARM TC2 development platform. This sort of measurements is +expected to be done anyway when tuning cpuidle and cpufreq for a given +platform. + +The energy model needs two types of data (struct sched_group_energy holds +these) for each sched_group where energy costs should be taken into account: + +1. Capacity state information + +A list containing the compute capacity and power consumption when fully +utilized attributed to the group as a whole for each available capacity state. +At the lowest level (group contains just a single cpu) this is the power of the +cpu alone without including power consumed by resources shared with other cpus. +It basically needs to fit the basic modelling approach described in "Background +and Terminology" section: + + energy_system = energy_shared + n * energy_cpu + +for a system containing 'n' busy cpus. Only 'energy_cpu' should be included at +the lowest level. 'energy_shared' is included at the next level which +represents the group of cpus among which the resources are shared. + +This model is, of course, a simplification of reality. Thus, power/energy +attributions might not always exactly represent how the hardware is designed. +Also, busy power is likely to depend on the workload. It is therefore +recommended to use a representative mix of workloads when characterizing the +capacity states. + +If the group has no capacity scaling support, the list will contain a single +state where power is the busy power attributed to the group. The capacity +should be set to a default value (1024). + +When frequency domains include multiple power domains, the group representing +the frequency domain and all child groups share capacity states. This must be +indicated by setting the SD_SHARE_CAP_STATES sched_domain flag. All groups at +all levels that share the capacity state must have the list of capacity states +with the power set to the contribution of the individual group. + +2. Idle power information + +Stored in the idle_states list. The power number is the group idle power +consumption in each idle state as well when the group is idle but has not +entered an idle-state ('active idle' as mentioned earlier). Due to the way the +energy model is defined, the idle power of the deepest group idle state can +alternatively be accounted for in the parent group busy power. In that case the +group idle state power values are offset such that the idle power of the +deepest state is zero. It is less intuitive, but it is easier to measure as +idle power consumed by the group and the busy/idle power of the parent group +cannot be distinguished without per group measurement points. + +Measuring capacity states and idle power: + +The capacity states' capacity and power can be estimated by running a benchmark +workload at each available capacity state. By restricting the benchmark to run +on subsets of cpus it is possible to extrapolate the power consumption of +shared resources. + +ARM TC2 has two clusters of two and three cpus respectively. Each cluster has a +shared L2 cache. TC2 has on-chip energy counters per cluster. Running a +benchmark workload on just one cpu in a cluster means that power is consumed in +the cluster (higher level group) and a single cpu (lowest level group). Adding +another benchmark task to another cpu increases the power consumption by the +amount consumed by the additional cpu. Hence, it is possible to extrapolate the +cluster busy power. + +For platforms that don't have energy counters or equivalent instrumentation +built-in, it may be possible to use an external DAQ to acquire similar data. + +If the benchmark includes some performance score (for example sysbench cpu +benchmark), this can be used to record the compute capacity. + +Measuring idle power requires insight into the idle state implementation on the +particular platform. Specifically, if the platform has coupled idle-states (or +package states). To measure non-coupled per-cpu idle-states it is necessary to +keep one cpu busy to keep any shared resources alive to isolate the idle power +of the cpu from idle/busy power of the shared resources. The cpu can be tricked +into different per-cpu idle states by disabling the other states. Based on +various combinations of measurements with specific cpus busy and disabling +idle-states it is possible to extrapolate the idle-state power.
diff --git a/Documentation/scheduler/sched-tune.txt b/Documentation/scheduler/sched-tune.txt new file mode 100644 index 0000000..9bd2231 --- /dev/null +++ b/Documentation/scheduler/sched-tune.txt
@@ -0,0 +1,366 @@ + Central, scheduler-driven, power-performance control + (EXPERIMENTAL) + +Abstract +======== + +The topic of a single simple power-performance tunable, that is wholly +scheduler centric, and has well defined and predictable properties has come up +on several occasions in the past [1,2]. With techniques such as a scheduler +driven DVFS [3], we now have a good framework for implementing such a tunable. +This document describes the overall ideas behind its design and implementation. + + +Table of Contents +================= + +1. Motivation +2. Introduction +3. Signal Boosting Strategy +4. OPP selection using boosted CPU utilization +5. Per task group boosting +6. Question and Answers + - What about "auto" mode? + - What about boosting on a congested system? + - How CPUs are boosted when we have tasks with multiple boost values? +7. References + + +1. Motivation +============= + +Sched-DVFS [3] is a new event-driven cpufreq governor which allows the +scheduler to select the optimal DVFS operating point (OPP) for running a task +allocated to a CPU. The introduction of sched-DVFS enables running workloads at +the most energy efficient OPPs. + +However, sometimes it may be desired to intentionally boost the performance of +a workload even if that could imply a reasonable increase in energy +consumption. For example, in order to reduce the response time of a task, we +may want to run the task at a higher OPP than the one that is actually required +by it's CPU bandwidth demand. + +This last requirement is especially important if we consider that one of the +main goals of the sched-DVFS component is to replace all currently available +CPUFreq policies. Since sched-DVFS is event based, as opposed to the sampling +driven governors we currently have, it is already more responsive at selecting +the optimal OPP to run tasks allocated to a CPU. However, just tracking the +actual task load demand may not be enough from a performance standpoint. For +example, it is not possible to get behaviors similar to those provided by the +"performance" and "interactive" CPUFreq governors. + +This document describes an implementation of a tunable, stacked on top of the +sched-DVFS which extends its functionality to support task performance +boosting. + +By "performance boosting" we mean the reduction of the time required to +complete a task activation, i.e. the time elapsed from a task wakeup to its +next deactivation (e.g. because it goes back to sleep or it terminates). For +example, if we consider a simple periodic task which executes the same workload +for 5[s] every 20[s] while running at a certain OPP, a boosted execution of +that task must complete each of its activations in less than 5[s]. + +A previous attempt [5] to introduce such a boosting feature has not been +successful mainly because of the complexity of the proposed solution. The +approach described in this document exposes a single simple interface to +user-space. This single tunable knob allows the tuning of system wide +scheduler behaviours ranging from energy efficiency at one end through to +incremental performance boosting at the other end. This first tunable affects +all tasks. However, a more advanced extension of the concept is also provided +which uses CGroups to boost the performance of only selected tasks while using +the energy efficient default for all others. + +The rest of this document introduces in more details the proposed solution +which has been named SchedTune. + + +2. Introduction +=============== + +SchedTune exposes a simple user-space interface with a single power-performance +tunable: + + /proc/sys/kernel/sched_cfs_boost + +This permits expressing a boost value as an integer in the range [0..100]. + +A value of 0 (default) configures the CFS scheduler for maximum energy +efficiency. This means that sched-DVFS runs the tasks at the minimum OPP +required to satisfy their workload demand. +A value of 100 configures scheduler for maximum performance, which translates +to the selection of the maximum OPP on that CPU. + +The range between 0 and 100 can be set to satisfy other scenarios suitably. For +example to satisfy interactive response or depending on other system events +(battery level etc). + +A CGroup based extension is also provided, which permits further user-space +defined task classification to tune the scheduler for different goals depending +on the specific nature of the task, e.g. background vs interactive vs +low-priority. + +The overall design of the SchedTune module is built on top of "Per-Entity Load +Tracking" (PELT) signals and sched-DVFS by introducing a bias on the Operating +Performance Point (OPP) selection. +Each time a task is allocated on a CPU, sched-DVFS has the opportunity to tune +the operating frequency of that CPU to better match the workload demand. The +selection of the actual OPP being activated is influenced by the global boost +value, or the boost value for the task CGroup when in use. + +This simple biasing approach leverages existing frameworks, which means minimal +modifications to the scheduler, and yet it allows to achieve a range of +different behaviours all from a single simple tunable knob. +The only new concept introduced is that of signal boosting. + + +3. Signal Boosting Strategy +=========================== + +The whole PELT machinery works based on the value of a few load tracking signals +which basically track the CPU bandwidth requirements for tasks and the capacity +of CPUs. The basic idea behind the SchedTune knob is to artificially inflate +some of these load tracking signals to make a task or RQ appears more demanding +that it actually is. + +Which signals have to be inflated depends on the specific "consumer". However, +independently from the specific (signal, consumer) pair, it is important to +define a simple and possibly consistent strategy for the concept of boosting a +signal. + +A boosting strategy defines how the "abstract" user-space defined +sched_cfs_boost value is translated into an internal "margin" value to be added +to a signal to get its inflated value: + + margin := boosting_strategy(sched_cfs_boost, signal) + boosted_signal := signal + margin + +Different boosting strategies were identified and analyzed before selecting the +one found to be most effective. + +Signal Proportional Compensation (SPC) +-------------------------------------- + +In this boosting strategy the sched_cfs_boost value is used to compute a +margin which is proportional to the complement of the original signal. +When a signal has a maximum possible value, its complement is defined as +the delta from the actual value and its possible maximum. + +Since the tunable implementation uses signals which have SCHED_LOAD_SCALE as +the maximum possible value, the margin becomes: + + margin := sched_cfs_boost * (SCHED_LOAD_SCALE - signal) + +Using this boosting strategy: +- a 100% sched_cfs_boost means that the signal is scaled to the maximum value +- each value in the range of sched_cfs_boost effectively inflates the signal in + question by a quantity which is proportional to the maximum value. + +For example, by applying the SPC boosting strategy to the selection of the OPP +to run a task it is possible to achieve these behaviors: + +- 0% boosting: run the task at the minimum OPP required by its workload +- 100% boosting: run the task at the maximum OPP available for the CPU +- 50% boosting: run at the half-way OPP between minimum and maximum + +Which means that, at 50% boosting, a task will be scheduled to run at half of +the maximum theoretically achievable performance on the specific target +platform. + +A graphical representation of an SPC boosted signal is represented in the +following figure where: + a) "-" represents the original signal + b) "b" represents a 50% boosted signal + c) "p" represents a 100% boosted signal + + + ^ + | SCHED_LOAD_SCALE + +-----------------------------------------------------------------+ + |pppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp + | + | boosted_signal + | bbbbbbbbbbbbbbbbbbbbbbbb + | + | original signal + | bbbbbbbbbbbbbbbbbbbbbbbb+----------------------+ + | | + |bbbbbbbbbbbbbbbbbb | + | | + | | + | | + | +-----------------------+ + | | + | | + | | + |------------------+ + | + | + +-----------------------------------------------------------------------> + +The plot above shows a ramped load signal (titled 'original_signal') and it's +boosted equivalent. For each step of the original signal the boosted signal +corresponding to a 50% boost is midway from the original signal and the upper +bound. Boosting by 100% generates a boosted signal which is always saturated to +the upper bound. + + +4. OPP selection using boosted CPU utilization +============================================== + +It is worth calling out that the implementation does not introduce any new load +signals. Instead, it provides an API to tune existing signals. This tuning is +done on demand and only in scheduler code paths where it is sensible to do so. +The new API calls are defined to return either the default signal or a boosted +one, depending on the value of sched_cfs_boost. This is a clean an non invasive +modification of the existing existing code paths. + +The signal representing a CPU's utilization is boosted according to the +previously described SPC boosting strategy. To sched-DVFS, this allows a CPU +(ie CFS run-queue) to appear more used then it actually is. + +Thus, with the sched_cfs_boost enabled we have the following main functions to +get the current utilization of a CPU: + + cpu_util() + boosted_cpu_util() + +The new boosted_cpu_util() is similar to the first but returns a boosted +utilization signal which is a function of the sched_cfs_boost value. + +This function is used in the CFS scheduler code paths where sched-DVFS needs to +decide the OPP to run a CPU at. +For example, this allows selecting the highest OPP for a CPU which has +the boost value set to 100%. + + +5. Per task group boosting +========================== + +The availability of a single knob which is used to boost all tasks in the +system is certainly a simple solution but it quite likely doesn't fit many +utilization scenarios, especially in the mobile device space. + +For example, on battery powered devices there usually are many background +services which are long running and need energy efficient scheduling. On the +other hand, some applications are more performance sensitive and require an +interactive response and/or maximum performance, regardless of the energy cost. +To better service such scenarios, the SchedTune implementation has an extension +that provides a more fine grained boosting interface. + +A new CGroup controller, namely "schedtune", could be enabled which allows to +defined and configure task groups with different boosting values. +Tasks that require special performance can be put into separate CGroups. +The value of the boost associated with the tasks in this group can be specified +using a single knob exposed by the CGroup controller: + + schedtune.boost + +This knob allows the definition of a boost value that is to be used for +SPC boosting of all tasks attached to this group. + +The current schedtune controller implementation is really simple and has these +main characteristics: + + 1) It is only possible to create 1 level depth hierarchies + + The root control groups define the system-wide boost value to be applied + by default to all tasks. Its direct subgroups are named "boost groups" and + they define the boost value for specific set of tasks. + Further nested subgroups are not allowed since they do not have a sensible + meaning from a user-space standpoint. + + 2) It is possible to define only a limited number of "boost groups" + + This number is defined at compile time and by default configured to 16. + This is a design decision motivated by two main reasons: + a) In a real system we do not expect utilization scenarios with more then few + boost groups. For example, a reasonable collection of groups could be + just "background", "interactive" and "performance". + b) It simplifies the implementation considerably, especially for the code + which has to compute the per CPU boosting once there are multiple + RUNNABLE tasks with different boost values. + +Such a simple design should allow servicing the main utilization scenarios identified +so far. It provides a simple interface which can be used to manage the +power-performance of all tasks or only selected tasks. +Moreover, this interface can be easily integrated by user-space run-times (e.g. +Android, ChromeOS) to implement a QoS solution for task boosting based on tasks +classification, which has been a long standing requirement. + +Setup and usage +--------------- + +0. Use a kernel with CGROUP_SCHEDTUNE support enabled + +1. Check that the "schedtune" CGroup controller is available: + + root@linaro-nano:~# cat /proc/cgroups + #subsys_name hierarchy num_cgroups enabled + cpuset 0 1 1 + cpu 0 1 1 + schedtune 0 1 1 + +2. Mount a tmpfs to create the CGroups mount point (Optional) + + root@linaro-nano:~# sudo mount -t tmpfs cgroups /sys/fs/cgroup + +3. Mount the "schedtune" controller + + root@linaro-nano:~# mkdir /sys/fs/cgroup/stune + root@linaro-nano:~# sudo mount -t cgroup -o schedtune stune /sys/fs/cgroup/stune + +4. Setup the system-wide boost value (Optional) + + If not configured the root control group has a 0% boost value, which + basically disables boosting for all tasks in the system thus running in + an energy-efficient mode. + + root@linaro-nano:~# echo $SYSBOOST > /sys/fs/cgroup/stune/schedtune.boost + +5. Create task groups and configure their specific boost value (Optional) + + For example here we create a "performance" boost group configure to boost + all its tasks to 100% + + root@linaro-nano:~# mkdir /sys/fs/cgroup/stune/performance + root@linaro-nano:~# echo 100 > /sys/fs/cgroup/stune/performance/schedtune.boost + +6. Move tasks into the boost group + + For example, the following moves the tasks with PID $TASKPID (and all its + threads) into the "performance" boost group. + + root@linaro-nano:~# echo "TASKPID > /sys/fs/cgroup/stune/performance/cgroup.procs + +This simple configuration allows only the threads of the $TASKPID task to run, +when needed, at the highest OPP in the most capable CPU of the system. + + +6. Question and Answers +======================= + +What about "auto" mode? +----------------------- + +The 'auto' mode as described in [5] can be implemented by interfacing SchedTune +with some suitable user-space element. This element could use the exposed +system-wide or cgroup based interface. + +How are multiple groups of tasks with different boost values managed? +--------------------------------------------------------------------- + +The current SchedTune implementation keeps track of the boosted RUNNABLE tasks +on a CPU. Once sched-DVFS selects the OPP to run a CPU at, the CPU utilization +is boosted with a value which is the maximum of the boost values of the +currently RUNNABLE tasks in its RQ. + +This allows sched-DVFS to boost a CPU only while there are boosted tasks ready +to run and switch back to the energy efficient mode as soon as the last boosted +task is dequeued. + + +7. References +============= +[1] http://lwn.net/Articles/552889 +[2] http://lkml.org/lkml/2012/5/18/91 +[3] http://lkml.org/lkml/2015/6/26/620
diff --git a/Documentation/security/keys.txt b/Documentation/security/keys.txt index 3849814..0e03baf 100644 --- a/Documentation/security/keys.txt +++ b/Documentation/security/keys.txt
@@ -1151,8 +1151,21 @@ usage. This is called key->payload.rcu_data0. The following accessors wrap the RCU calls to this element: - rcu_assign_keypointer(struct key *key, void *data); - void *rcu_dereference_key(struct key *key); + (a) Set or change the first payload pointer: + + rcu_assign_keypointer(struct key *key, void *data); + + (b) Read the first payload pointer with the key semaphore held: + + [const] void *dereference_key_locked([const] struct key *key); + + Note that the return value will inherit its constness from the key + parameter. Static analysis will give an error if it things the lock + isn't held. + + (c) Read the first payload pointer with the RCU read lock held: + + const void *dereference_key_rcu(const struct key *key); ===================
diff --git a/Documentation/sync.txt b/Documentation/sync.txt new file mode 100644 index 0000000..a2d05e7 --- /dev/null +++ b/Documentation/sync.txt
@@ -0,0 +1,75 @@ +Motivation: + +In complicated DMA pipelines such as graphics (multimedia, camera, gpu, display) +a consumer of a buffer needs to know when the producer has finished producing +it. Likewise the producer needs to know when the consumer is finished with the +buffer so it can reuse it. A particular buffer may be consumed by multiple +consumers which will retain the buffer for different amounts of time. In +addition, a consumer may consume multiple buffers atomically. +The sync framework adds an API which allows synchronization between the +producers and consumers in a generic way while also allowing platforms which +have shared hardware synchronization primitives to exploit them. + +Goals: + * provide a generic API for expressing synchronization dependencies + * allow drivers to exploit hardware synchronization between hardware + blocks + * provide a userspace API that allows a compositor to manage + dependencies. + * provide rich telemetry data to allow debugging slowdowns and stalls of + the graphics pipeline. + +Objects: + * sync_timeline + * sync_pt + * sync_fence + +sync_timeline: + +A sync_timeline is an abstract monotonically increasing counter. In general, +each driver/hardware block context will have one of these. They can be backed +by the appropriate hardware or rely on the generic sw_sync implementation. +Timelines are only ever created through their specific implementations +(i.e. sw_sync.) + +sync_pt: + +A sync_pt is an abstract value which marks a point on a sync_timeline. Sync_pts +have a single timeline parent. They have 3 states: active, signaled, and error. +They start in active state and transition, once, to either signaled (when the +timeline counter advances beyond the sync_pt’s value) or error state. + +sync_fence: + +Sync_fences are the primary primitives used by drivers to coordinate +synchronization of their buffers. They are a collection of sync_pts which may +or may not have the same timeline parent. A sync_pt can only exist in one fence +and the fence's list of sync_pts is immutable once created. Fences can be +waited on synchronously or asynchronously. Two fences can also be merged to +create a third fence containing a copy of the two fences’ sync_pts. Fences are +backed by file descriptors to allow userspace to coordinate the display pipeline +dependencies. + +Use: + +A driver implementing sync support should have a work submission function which: + * takes a fence argument specifying when to begin work + * asynchronously queues that work to kick off when the fence is signaled + * returns a fence to indicate when its work will be done. + * signals the returned fence once the work is completed. + +Consider an imaginary display driver that has the following API: +/* + * assumes buf is ready to be displayed. + * blocks until the buffer is on screen. + */ + void display_buffer(struct dma_buf *buf); + +The new API will become: +/* + * will display buf when fence is signaled. + * returns immediately with a fence that will signal when buf + * is no longer displayed. + */ +struct sync_fence* display_buffer(struct dma_buf *buf, + struct sync_fence *fence);
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index ffab8b5..52daff6 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt
@@ -659,12 +659,14 @@ perf_event_paranoid: Controls use of the performance events system by unprivileged -users (without CAP_SYS_ADMIN). The default value is 2. +users (without CAP_SYS_ADMIN). The default value is 3 if +CONFIG_SECURITY_PERF_EVENTS_RESTRICT is set, or 2 otherwise. -1: Allow use of (almost) all events by all users >=0: Disallow raw tracepoint access by users without CAP_IOC_LOCK >=1: Disallow CPU event access by users without CAP_SYS_ADMIN >=2: Disallow kernel profiling by users without CAP_SYS_ADMIN +>=3: Disallow all event access by users without CAP_SYS_ADMIN ==============================================================
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 95ccbe6..206c9b0 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt
@@ -30,6 +30,7 @@ - dirty_writeback_centisecs - drop_caches - extfrag_threshold +- extra_free_kbytes - hugepages_treat_as_movable - hugetlb_shm_group - laptop_mode @@ -240,6 +241,21 @@ ============================================================== +extra_free_kbytes + +This parameter tells the VM to keep extra free memory between the threshold +where background reclaim (kswapd) kicks in, and the threshold where direct +reclaim (by allocating processes) kicks in. + +This is useful for workloads that require low latency memory allocations +and have a bounded burstiness in memory allocations, for example a +realtime application that receives and transmits network traffic +(causing in-kernel memory allocations) with a maximum total message burst +size of 200MB may need 200MB of extra free memory to avoid direct reclaim +related latencies. + +============================================================== + hugepages_treat_as_movable This parameter controls whether we can allocate hugepages from ZONE_MOVABLE
diff --git a/Documentation/tee.txt b/Documentation/tee.txt new file mode 100644 index 0000000..56ea85f --- /dev/null +++ b/Documentation/tee.txt
@@ -0,0 +1,127 @@ +============= +TEE subsystem +============= + +This document describes the TEE subsystem in Linux. + +A TEE (Trusted Execution Environment) is a trusted OS running in some +secure environment, for example, TrustZone on ARM CPUs, or a separate +secure co-processor etc. A TEE driver handles the details needed to +communicate with the TEE. + +This subsystem deals with: + +- Registration of TEE drivers + +- Managing shared memory between Linux and the TEE + +- Providing a generic API to the TEE + +The TEE interface +================= + +include/uapi/linux/tee.h defines the generic interface to a TEE. + +User space (the client) connects to the driver by opening /dev/tee[0-9]* or +/dev/teepriv[0-9]*. + +- TEE_IOC_SHM_ALLOC allocates shared memory and returns a file descriptor + which user space can mmap. When user space doesn't need the file + descriptor any more, it should be closed. When shared memory isn't needed + any longer it should be unmapped with munmap() to allow the reuse of + memory. + +- TEE_IOC_VERSION lets user space know which TEE this driver handles and + the its capabilities. + +- TEE_IOC_OPEN_SESSION opens a new session to a Trusted Application. + +- TEE_IOC_INVOKE invokes a function in a Trusted Application. + +- TEE_IOC_CANCEL may cancel an ongoing TEE_IOC_OPEN_SESSION or TEE_IOC_INVOKE. + +- TEE_IOC_CLOSE_SESSION closes a session to a Trusted Application. + +There are two classes of clients, normal clients and supplicants. The latter is +a helper process for the TEE to access resources in Linux, for example file +system access. A normal client opens /dev/tee[0-9]* and a supplicant opens +/dev/teepriv[0-9]. + +Much of the communication between clients and the TEE is opaque to the +driver. The main job for the driver is to receive requests from the +clients, forward them to the TEE and send back the results. In the case of +supplicants the communication goes in the other direction, the TEE sends +requests to the supplicant which then sends back the result. + +OP-TEE driver +============= + +The OP-TEE driver handles OP-TEE [1] based TEEs. Currently it is only the ARM +TrustZone based OP-TEE solution that is supported. + +Lowest level of communication with OP-TEE builds on ARM SMC Calling +Convention (SMCCC) [2], which is the foundation for OP-TEE's SMC interface +[3] used internally by the driver. Stacked on top of that is OP-TEE Message +Protocol [4]. + +OP-TEE SMC interface provides the basic functions required by SMCCC and some +additional functions specific for OP-TEE. The most interesting functions are: + +- OPTEE_SMC_FUNCID_CALLS_UID (part of SMCCC) returns the version information + which is then returned by TEE_IOC_VERSION + +- OPTEE_SMC_CALL_GET_OS_UUID returns the particular OP-TEE implementation, used + to tell, for instance, a TrustZone OP-TEE apart from an OP-TEE running on a + separate secure co-processor. + +- OPTEE_SMC_CALL_WITH_ARG drives the OP-TEE message protocol + +- OPTEE_SMC_GET_SHM_CONFIG lets the driver and OP-TEE agree on which memory + range to used for shared memory between Linux and OP-TEE. + +The GlobalPlatform TEE Client API [5] is implemented on top of the generic +TEE API. + +Picture of the relationship between the different components in the +OP-TEE architecture:: + + User space Kernel Secure world + ~~~~~~~~~~ ~~~~~~ ~~~~~~~~~~~~ + +--------+ +-------------+ + | Client | | Trusted | + +--------+ | Application | + /\ +-------------+ + || +----------+ /\ + || |tee- | || + || |supplicant| \/ + || +----------+ +-------------+ + \/ /\ | TEE Internal| + +-------+ || | API | + + TEE | || +--------+--------+ +-------------+ + | Client| || | TEE | OP-TEE | | OP-TEE | + | API | \/ | subsys | driver | | Trusted OS | + +-------+----------------+----+-------+----+-----------+-------------+ + | Generic TEE API | | OP-TEE MSG | + | IOCTL (TEE_IOC_*) | | SMCCC (OPTEE_SMC_CALL_*) | + +-----------------------------+ +------------------------------+ + +RPC (Remote Procedure Call) are requests from secure world to kernel driver +or tee-supplicant. An RPC is identified by a special range of SMCCC return +values from OPTEE_SMC_CALL_WITH_ARG. RPC messages which are intended for the +kernel are handled by the kernel driver. Other RPC messages will be forwarded to +tee-supplicant without further involvement of the driver, except switching +shared memory buffer representation. + +References +========== + +[1] https://github.com/OP-TEE/optee_os + +[2] http://infocenter.arm.com/help/topic/com.arm.doc.den0028a/index.html + +[3] drivers/tee/optee/optee_smc.h + +[4] drivers/tee/optee/optee_msg.h + +[5] http://www.globalplatform.org/specificationsdevice.asp look for + "TEE Client API Specification v1.0" and click download.
diff --git a/Documentation/trace/events-power.txt b/Documentation/trace/events-power.txt index 21d514ce..4d817d5 100644 --- a/Documentation/trace/events-power.txt +++ b/Documentation/trace/events-power.txt
@@ -25,6 +25,7 @@ cpu_idle "state=%lu cpu_id=%lu" cpu_frequency "state=%lu cpu_id=%lu" +cpu_frequency_limits "min=%lu max=%lu cpu_id=%lu" A suspend event is used to indicate the system going in and out of the suspend mode:
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt index 185c39f..91723ed 100644 --- a/Documentation/trace/ftrace.txt +++ b/Documentation/trace/ftrace.txt
@@ -362,6 +362,26 @@ to correlate events across hypervisor/guest if tb_offset is known. + mono: This uses the fast monotonic clock (CLOCK_MONOTONIC) + which is monotonic and is subject to NTP rate adjustments. + + mono_raw: + This is the raw monotonic clock (CLOCK_MONOTONIC_RAW) + which is montonic but is not subject to any rate adjustments + and ticks at the same rate as the hardware clocksource. + + boot: This is the boot clock (CLOCK_BOOTTIME) and is based on the + fast monotonic clock, but also accounts for time spent in + suspend. Since the clock access is designed for use in + tracing in the suspend path, some side effects are possible + if clock is accessed after the suspend time is accounted before + the fast mono clock is updated. In this case, the clock update + appears to happen slightly sooner than it normally would have. + Also on 32-bit systems, it's possible that the 64-bit boot offset + sees a partial update. These effects are rare and post + processing should be able to handle them. See comments in the + ktime_get_boot_fast_ns() function for more information. + To set a clock, simply echo the clock name into this file. echo global > trace_clock @@ -2102,6 +2122,35 @@ 1) 1.449 us | } +You can disable the hierarchical function call formatting and instead print a +flat list of function entry and return events. This uses the format described +in the Output Formatting section and respects all the trace options that +control that formatting. Hierarchical formatting is the default. + + hierachical: echo nofuncgraph-flat > trace_options + flat: echo funcgraph-flat > trace_options + + ie: + + # tracer: function_graph + # + # entries-in-buffer/entries-written: 68355/68355 #P:2 + # + # _-----=> irqs-off + # / _----=> need-resched + # | / _---=> hardirq/softirq + # || / _--=> preempt-depth + # ||| / delay + # TASK-PID CPU# |||| TIMESTAMP FUNCTION + # | | | |||| | | + sh-1806 [001] d... 198.843443: graph_ent: func=_raw_spin_lock + sh-1806 [001] d... 198.843445: graph_ent: func=__raw_spin_lock + sh-1806 [001] d..1 198.843447: graph_ret: func=__raw_spin_lock + sh-1806 [001] d..1 198.843449: graph_ret: func=_raw_spin_lock + sh-1806 [001] d..1 198.843451: graph_ent: func=_raw_spin_unlock_irqrestore + sh-1806 [001] d... 198.843453: graph_ret: func=_raw_spin_unlock_irqrestore + + You might find other useful features for this tracer in the following "dynamic ftrace" section such as tracing only specific functions or tasks.
diff --git a/Kbuild b/Kbuild index 3d0ae15..94c75276 100644 --- a/Kbuild +++ b/Kbuild
@@ -7,31 +7,6 @@ # 4) Check for missing system calls # 5) Generate constants.py (may need bounds.h) -# Default sed regexp - multiline due to syntax constraints -define sed-y - "/^->/{s:->#\(.*\):/* \1 */:; \ - s:^->\([^ ]*\) [\$$#]*\([-0-9]*\) \(.*\):#define \1 \2 /* \3 */:; \ - s:^->\([^ ]*\) [\$$#]*\([^ ]*\) \(.*\):#define \1 \2 /* \3 */:; \ - s:->::; p;}" -endef - -# Use filechk to avoid rebuilds when a header changes, but the resulting file -# does not -define filechk_offsets - (set -e; \ - echo "#ifndef $2"; \ - echo "#define $2"; \ - echo "/*"; \ - echo " * DO NOT MODIFY."; \ - echo " *"; \ - echo " * This file was generated by Kbuild"; \ - echo " */"; \ - echo ""; \ - sed -ne $(sed-y); \ - echo ""; \ - echo "#endif" ) -endef - ##### # 1) Generate bounds.h
diff --git a/MAINTAINERS b/MAINTAINERS index 63cefa6..a419303 100644 --- a/MAINTAINERS +++ b/MAINTAINERS
@@ -9010,6 +9010,11 @@ F: drivers/oprofile/ F: include/linux/oprofile.h +OP-TEE DRIVER +M: Jens Wiklander <jens.wiklander@linaro.org> +S: Maintained +F: drivers/tee/optee/ + ORACLE CLUSTER FILESYSTEM 2 (OCFS2) M: Mark Fasheh <mfasheh@versity.com> M: Joel Becker <jlbec@evilplan.org> @@ -10655,6 +10660,14 @@ F: include/linux/stm.h F: include/uapi/linux/stm.h +TEE SUBSYSTEM +M: Jens Wiklander <jens.wiklander@linaro.org> +S: Maintained +F: include/linux/tee_drv.h +F: include/uapi/linux/tee.h +F: drivers/tee/ +F: Documentation/tee.txt + THUNDERBOLT DRIVER M: Andreas Noever <andreas.noever@gmail.com> S: Maintained
diff --git a/Makefile b/Makefile index 18090f8..aa5d18a 100644 --- a/Makefile +++ b/Makefile
@@ -303,7 +303,7 @@ HOSTCC = gcc HOSTCXX = g++ -HOSTCFLAGS = -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer -std=gnu89 +HOSTCFLAGS := -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer -std=gnu89 HOSTCXXFLAGS = -O2 ifeq ($(shell $(HOSTCC) -v 2>&1 | grep -c "clang version"), 1) @@ -348,6 +348,7 @@ # Make variables (CC, etc...) AS = $(CROSS_COMPILE)as LD = $(CROSS_COMPILE)ld +LDGOLD = $(CROSS_COMPILE)ld.gold CC = $(CROSS_COMPILE)gcc CPP = $(CC) -E AR = $(CROSS_COMPILE)ar @@ -624,6 +625,20 @@ CFLAGS_KCOV := $(call cc-option,-fsanitize-coverage=trace-pc,) export CFLAGS_GCOV CFLAGS_KCOV +# Make toolchain changes before including arch/$(SRCARCH)/Makefile to ensure +# ar/cc/ld-* macros return correct values. +ifdef CONFIG_LTO_CLANG +# use GNU gold with LLVMgold for LTO linking, and LD for vmlinux_link +LDFINAL_vmlinux := $(LD) +LD := $(LDGOLD) +LDFLAGS += -plugin LLVMgold.so +# use llvm-ar for building symbol tables from IR files, and llvm-dis instead +# of objdump for processing symbol versions and exports +LLVM_AR := llvm-ar +LLVM_DIS := llvm-dis +export LLVM_AR LLVM_DIS +endif + # The arch Makefile can set ARCH_{CPP,A,C}FLAGS to override the default # values of the respective KBUILD_* variables ARCH_CPPFLAGS := @@ -643,8 +658,56 @@ KBUILD_CFLAGS += $(call cc-option,-fdata-sections,) endif +ifdef CONFIG_LTO_CLANG +lto-clang-flags := -flto -fvisibility=hidden + +# allow disabling only clang LTO where needed +DISABLE_LTO_CLANG := -fno-lto -fvisibility=default +export DISABLE_LTO_CLANG +endif + +ifdef CONFIG_LTO +lto-flags := $(lto-clang-flags) +KBUILD_CFLAGS += $(lto-flags) + +DISABLE_LTO := $(DISABLE_LTO_CLANG) +export DISABLE_LTO + +# LDFINAL_vmlinux and LDFLAGS_FINAL_vmlinux can be set to override +# the linker and flags for vmlinux_link. +export LDFINAL_vmlinux LDFLAGS_FINAL_vmlinux +endif + +ifdef CONFIG_CFI_CLANG +cfi-clang-flags += -fsanitize=cfi +DISABLE_CFI_CLANG := -fno-sanitize=cfi +ifdef CONFIG_MODULES +cfi-clang-flags += -fsanitize-cfi-cross-dso +DISABLE_CFI_CLANG += -fno-sanitize-cfi-cross-dso +endif +ifdef CONFIG_CFI_PERMISSIVE +cfi-clang-flags += -fsanitize-recover=cfi -fno-sanitize-trap=cfi +endif + +# also disable CFI when LTO is disabled +DISABLE_LTO_CLANG += $(DISABLE_CFI_CLANG) +# allow disabling only clang CFI where needed +export DISABLE_CFI_CLANG +endif + +ifdef CONFIG_CFI +# cfi-flags are re-tested in prepare-compiler-check +cfi-flags := $(cfi-clang-flags) +KBUILD_CFLAGS += $(cfi-flags) + +DISABLE_CFI := $(DISABLE_CFI_CLANG) +DISABLE_LTO += $(DISABLE_CFI) +export DISABLE_CFI +endif + ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE -KBUILD_CFLAGS += -Os $(call cc-disable-warning,maybe-uninitialized,) +KBUILD_CFLAGS += $(call cc-option,-Oz,-Os) +KBUILD_CFLAGS += $(call cc-disable-warning,maybe-uninitialized,) else ifdef CONFIG_PROFILE_ALL_BRANCHES KBUILD_CFLAGS += -O2 $(call cc-disable-warning,maybe-uninitialized,) @@ -704,11 +767,22 @@ KBUILD_CFLAGS += $(stackp-flag) ifeq ($(cc-name),clang) +ifneq ($(CROSS_COMPILE),) +CLANG_TRIPLE ?= $(CROSS_COMPILE) +CLANG_TARGET := --target=$(notdir $(CLANG_TRIPLE:%-=%)) +GCC_TOOLCHAIN := $(realpath $(dir $(shell which $(LD)))/..) +endif +ifneq ($(GCC_TOOLCHAIN),) +CLANG_GCC_TC := --gcc-toolchain=$(GCC_TOOLCHAIN) +endif +KBUILD_CFLAGS += $(CLANG_TARGET) $(CLANG_GCC_TC) +KBUILD_AFLAGS += $(CLANG_TARGET) $(CLANG_GCC_TC) KBUILD_CPPFLAGS += $(call cc-option,-Qunused-arguments,) -KBUILD_CPPFLAGS += $(call cc-option,-Wno-unknown-warning-option,) KBUILD_CFLAGS += $(call cc-disable-warning, unused-variable) KBUILD_CFLAGS += $(call cc-disable-warning, format-invalid-specifier) KBUILD_CFLAGS += $(call cc-disable-warning, gnu) +KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member) +KBUILD_CFLAGS += $(call cc-disable-warning, duplicate-decl-specifier) # Quiet clang warning: comparison of unsigned expression < 0 is always false KBUILD_CFLAGS += $(call cc-disable-warning, tautological-compare) # CLANG uses a _MergedGlobals as optimization, but this breaks modpost, as the @@ -716,6 +790,8 @@ # See modpost pattern 2 KBUILD_CFLAGS += $(call cc-option, -mno-global-merge,) KBUILD_CFLAGS += $(call cc-option, -fcatch-undefined-behavior) +KBUILD_CFLAGS += $(call cc-option, -no-integrated-as) +KBUILD_AFLAGS += $(call cc-option, -no-integrated-as) else # These warnings generated too much noise in a regular build. @@ -1080,6 +1156,22 @@ # CC_STACKPROTECTOR_STRONG! Why did it build with _REGULAR?!") PHONY += prepare-compiler-check prepare-compiler-check: FORCE +# Make sure we're using a supported toolchain with LTO_CLANG +ifdef CONFIG_LTO_CLANG + ifneq ($(call clang-ifversion, -ge, 0500, y), y) + @echo Cannot use CONFIG_LTO_CLANG: requires clang 5.0 or later >&2 && exit 1 + endif + ifneq ($(call gold-ifversion, -ge, 112000000, y), y) + @echo Cannot use CONFIG_LTO_CLANG: requires GNU gold 1.12 or later >&2 && exit 1 + endif +endif +# Make sure compiler supports LTO flags +ifdef lto-flags + ifeq ($(call cc-option, $(lto-flags)),) + @echo Cannot use CONFIG_LTO: $(lto-flags) not supported by compiler \ + >&2 && exit 1 + endif +endif # Make sure compiler supports requested stack protector flag. ifdef stackp-name ifeq ($(call cc-option, $(stackp-flag)),) @@ -1094,6 +1186,11 @@ $(stackp-flag) available but compiler is broken >&2 && exit 1 endif endif +ifdef cfi-flags + ifeq ($(call cc-option, $(cfi-flags)),) + @echo Cannot use CONFIG_CFI: $(cfi-flags) not supported by compiler >&2 && exit 1 + endif +endif @: # Generate some files @@ -1379,6 +1476,8 @@ @echo ' (default: $$(INSTALL_MOD_PATH)/lib/firmware)' @echo ' dir/ - Build all files in dir and below' @echo ' dir/file.[ois] - Build specified target only' + @echo ' dir/file.ll - Build the LLVM assembly file' + @echo ' (requires compiler support for LLVM assembly generation)' @echo ' dir/file.lst - Build specified mixed source/assembly target only' @echo ' (requires a recent binutils and recent build (System.map))' @echo ' dir/file.ko - Build module including final link' @@ -1563,7 +1662,9 @@ -o -name '*.symtypes' -o -name 'modules.order' \ -o -name modules.builtin -o -name '.tmp_*.o.*' \ -o -name '*.c.[012]*.*' \ - -o -name '*.gcno' \) -type f -print | xargs rm -f + -o -name '*.ll' \ + -o -name '*.gcno' \ + -o -name '*.*.symversions' \) -type f -print | xargs rm -f # Generate tags for editors # --------------------------------------------------------------------------- @@ -1666,6 +1767,8 @@ $(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@) %.symtypes: %.c prepare scripts FORCE $(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@) +%.ll: %.c prepare scripts FORCE + $(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@) # Modules /: prepare scripts FORCE
diff --git a/arch/Kconfig b/arch/Kconfig index b39d0f93..c8e6cdf 100644 --- a/arch/Kconfig +++ b/arch/Kconfig
@@ -492,6 +492,75 @@ sections (e.g., '.text.init'). Typically '.' in section names is used to distinguish them from label names / C identifiers. +config LTO + def_bool n + +config ARCH_SUPPORTS_LTO_CLANG + bool + help + An architecture should select this option it supports: + - compiling with clang, + - compiling inline assembly with clang's integrated assembler, + - and linking with either lld or GNU gold w/ LLVMgold. + +choice + prompt "Link-Time Optimization (LTO) (EXPERIMENTAL)" + default LTO_NONE + help + This option turns on Link-Time Optimization (LTO). + +config LTO_NONE + bool "None" + +config LTO_CLANG + bool "Use clang Link Time Optimization (LTO) (EXPERIMENTAL)" + depends on ARCH_SUPPORTS_LTO_CLANG + depends on !FTRACE_MCOUNT_RECORD || HAVE_C_RECORDMCOUNT + depends on !KASAN + select LTO + select THIN_ARCHIVES + select LD_DEAD_CODE_DATA_ELIMINATION + help + This option enables clang's Link Time Optimization (LTO), which allows + the compiler to optimize the kernel globally at link time. If you + enable this option, the compiler generates LLVM IR instead of object + files, and the actual compilation from IR occurs at the LTO link step, + which may take several minutes. + + If you select this option, you must compile the kernel with clang >= + 5.0 (make CC=clang) and GNU gold from binutils >= 2.27, and have the + LLVMgold plug-in in LD_LIBRARY_PATH. + +endchoice + +config CFI + bool + +config CFI_PERMISSIVE + bool "Use CFI in permissive mode" + depends on CFI + help + When selected, Control Flow Integrity (CFI) violations result in a + warning instead of a kernel panic. This option is useful for finding + CFI violations in drivers during development. + +config CFI_CLANG + bool "Use clang Control Flow Integrity (CFI) (EXPERIMENTAL)" + depends on LTO_CLANG + depends on KALLSYMS + select CFI + help + This option enables clang Control Flow Integrity (CFI), which adds + runtime checking for indirect function calls. + +config CFI_CLANG_SHADOW + bool "Use CFI shadow to speed up cross-module checks" + default y + depends on CFI_CLANG + help + If you select this option, the kernel builds a fast look-up table of + CFI check functions in loaded modules to reduce overhead. + config HAVE_ARCH_WITHIN_STACK_FRAMES bool help
diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h index 9e46d6e..fa47df6a 100644 --- a/arch/alpha/include/uapi/asm/socket.h +++ b/arch/alpha/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@ #define SO_CNX_ADVICE 53 +#define SO_COOKIE 57 + #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index b5d529fd..00be82f 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig
@@ -1836,6 +1836,15 @@ help Say Y if you want to run Linux in a Virtual Machine on Xen on ARM. +config ARM_FLUSH_CONSOLE_ON_RESTART + bool "Force flush the console on restart" + help + If the console is locked while the system is rebooted, the messages + in the temporary logbuffer would not have propogated to all the + console drivers. This option forces the console lock to be + released if it failed to be acquired, which will cause all the + pending messages to be flushed. + endmenu menu "Boot options" @@ -1864,6 +1873,21 @@ This was deprecated in 2001 and announced to live on for 5 years. Some old boot loaders still use this way. +config BUILD_ARM_APPENDED_DTB_IMAGE + bool "Build a concatenated zImage/dtb by default" + depends on OF + help + Enabling this option will cause a concatenated zImage and list of + DTBs to be built by default (instead of a standalone zImage.) + The image will built in arch/arm/boot/zImage-dtb + +config BUILD_ARM_APPENDED_DTB_IMAGE_NAMES + string "Default dtb names" + depends on BUILD_ARM_APPENDED_DTB_IMAGE + help + Space separated list of names of dtbs to append when + building a concatenated zImage-dtb. + # Compressed boot loader in ROM. Yes, we really want to ask about # TEXT and BSS so we preserve their values in the config files. config ZBOOT_ROM_TEXT
diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug index d83f7c3..17dcd94 100644 --- a/arch/arm/Kconfig.debug +++ b/arch/arm/Kconfig.debug
@@ -1723,6 +1723,14 @@ kernel low-level debugging functions. Add earlyprintk to your kernel parameters to enable this console. +config EARLY_PRINTK_DIRECT + bool "Early printk direct" + depends on DEBUG_LL + help + Say Y here if you want to have an early console using the + kernel low-level debugging functions and EARLY_PRINTK is + not early enough. + config ARM_KPROBES_TEST tristate "Kprobes test module" depends on KPROBES && MODULES
diff --git a/arch/arm/Makefile b/arch/arm/Makefile index 6be9ee1..b53a7b4 100644 --- a/arch/arm/Makefile +++ b/arch/arm/Makefile
@@ -298,6 +298,8 @@ # Default target when executing plain make ifeq ($(CONFIG_XIP_KERNEL),y) KBUILD_IMAGE := xipImage +else ifeq ($(CONFIG_BUILD_ARM_APPENDED_DTB_IMAGE),y) +KBUILD_IMAGE := zImage-dtb else KBUILD_IMAGE := zImage endif @@ -349,6 +351,9 @@ $(Q)$(MAKE) $(build)=arch/arm/vdso $@ endif +zImage-dtb: vmlinux scripts dtbs + $(Q)$(MAKE) $(build)=$(boot) MACHINE=$(MACHINE) $(boot)/$@ + # We use MRPROPER_FILES and CLEAN_FILES now archclean: $(Q)$(MAKE) $(clean)=$(boot)
diff --git a/arch/arm/boot/.gitignore b/arch/arm/boot/.gitignore index 3c79f85..ad7a025 100644 --- a/arch/arm/boot/.gitignore +++ b/arch/arm/boot/.gitignore
@@ -4,3 +4,4 @@ bootpImage uImage *.dtb +zImage-dtb \ No newline at end of file
diff --git a/arch/arm/boot/Makefile b/arch/arm/boot/Makefile index 50f8d1be..da75630 100644 --- a/arch/arm/boot/Makefile +++ b/arch/arm/boot/Makefile
@@ -16,6 +16,7 @@ ifneq ($(MACHINE),) include $(MACHINE)/Makefile.boot endif +include $(srctree)/arch/arm/boot/dts/Makefile # Note: the following conditions must always be true: # ZRELADDR == virt_to_phys(PAGE_OFFSET + TEXT_OFFSET) @@ -29,6 +30,14 @@ targets := Image zImage xipImage bootpImage uImage +DTB_NAMES := $(subst $\",,$(CONFIG_BUILD_ARM_APPENDED_DTB_IMAGE_NAMES)) +ifneq ($(DTB_NAMES),) +DTB_LIST := $(addsuffix .dtb,$(DTB_NAMES)) +else +DTB_LIST := $(dtb-y) +endif +DTB_OBJS := $(addprefix $(obj)/dts/,$(DTB_LIST)) + ifeq ($(CONFIG_XIP_KERNEL),y) $(obj)/xipImage: vmlinux FORCE @@ -55,6 +64,10 @@ $(obj)/zImage: $(obj)/compressed/vmlinux FORCE $(call if_changed,objcopy) +$(obj)/zImage-dtb: $(obj)/zImage $(DTB_OBJS) FORCE + $(call if_changed,cat) + @echo ' Kernel: $@ is ready' + endif ifneq ($(LOADADDR),)
diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S index fc6d541..51fc9fb 100644 --- a/arch/arm/boot/compressed/head.S +++ b/arch/arm/boot/compressed/head.S
@@ -781,6 +781,8 @@ bic r6, r6, #1 << 31 @ 32-bit translation system bic r6, r6, #(7 << 0) | (1 << 4) @ use only ttbr0 mcrne p15, 0, r3, c2, c0, 0 @ load page table pointer + mcrne p15, 0, r0, c8, c7, 0 @ flush I,D TLBs + mcr p15, 0, r0, c7, c5, 4 @ ISB mcrne p15, 0, r1, c3, c0, 0 @ load domain access control mcrne p15, 0, r6, c2, c0, 2 @ load ttb control #endif
diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index 7037201..54f95d3 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile
@@ -960,5 +960,15 @@ dtstree := $(srctree)/$(src) dtb-$(CONFIG_OF_ALL_DTBS) := $(patsubst $(dtstree)/%.dts,%.dtb, $(wildcard $(dtstree)/*.dts)) -always := $(dtb-y) +DTB_NAMES := $(subst $\",,$(CONFIG_BUILD_ARM_APPENDED_DTB_IMAGE_NAMES)) +ifneq ($(DTB_NAMES),) +DTB_LIST := $(addsuffix .dtb,$(DTB_NAMES)) +else +DTB_LIST := $(dtb-y) +endif + +targets += dtbs dtbs_install +targets += $(DTB_LIST) + +always := $(DTB_LIST) clean-files := *.dtb
diff --git a/arch/arm/common/Kconfig b/arch/arm/common/Kconfig index 9353184..ce01364 100644 --- a/arch/arm/common/Kconfig +++ b/arch/arm/common/Kconfig
@@ -17,3 +17,7 @@ config SHARP_SCOOP bool + +config FIQ_GLUE + bool + select FIQ
diff --git a/arch/arm/common/Makefile b/arch/arm/common/Makefile index 27f23b1..04aca89 100644 --- a/arch/arm/common/Makefile +++ b/arch/arm/common/Makefile
@@ -4,6 +4,7 @@ obj-y += firmware.o +obj-$(CONFIG_FIQ_GLUE) += fiq_glue.o fiq_glue_setup.o obj-$(CONFIG_ICST) += icst.o obj-$(CONFIG_SA1111) += sa1111.o obj-$(CONFIG_DMABOUNCE) += dmabounce.o
diff --git a/arch/arm/common/fiq_glue.S b/arch/arm/common/fiq_glue.S new file mode 100644 index 0000000..24b42ce --- /dev/null +++ b/arch/arm/common/fiq_glue.S
@@ -0,0 +1,118 @@ +/* + * Copyright (C) 2008 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/linkage.h> +#include <asm/assembler.h> + + .text + + .global fiq_glue_end + + /* fiq stack: r0-r15,cpsr,spsr of interrupted mode */ + +ENTRY(fiq_glue) + /* store pc, cpsr from previous mode, reserve space for spsr */ + mrs r12, spsr + sub lr, lr, #4 + subs r10, #1 + bne nested_fiq + + str r12, [sp, #-8]! + str lr, [sp, #-4]! + + /* store r8-r14 from previous mode */ + sub sp, sp, #(7 * 4) + stmia sp, {r8-r14}^ + nop + + /* store r0-r7 from previous mode */ + stmfd sp!, {r0-r7} + + /* setup func(data,regs) arguments */ + mov r0, r9 + mov r1, sp + mov r3, r8 + + mov r7, sp + + /* Get sp and lr from non-user modes */ + and r4, r12, #MODE_MASK + cmp r4, #USR_MODE + beq fiq_from_usr_mode + + mov r7, sp + orr r4, r4, #(PSR_I_BIT | PSR_F_BIT) + msr cpsr_c, r4 + str sp, [r7, #(4 * 13)] + str lr, [r7, #(4 * 14)] + mrs r5, spsr + str r5, [r7, #(4 * 17)] + + cmp r4, #(SVC_MODE | PSR_I_BIT | PSR_F_BIT) + /* use fiq stack if we reenter this mode */ + subne sp, r7, #(4 * 3) + +fiq_from_usr_mode: + msr cpsr_c, #(SVC_MODE | PSR_I_BIT | PSR_F_BIT) + mov r2, sp + sub sp, r7, #12 + stmfd sp!, {r2, ip, lr} + /* call func(data,regs) */ + blx r3 + ldmfd sp, {r2, ip, lr} + mov sp, r2 + + /* restore/discard saved state */ + cmp r4, #USR_MODE + beq fiq_from_usr_mode_exit + + msr cpsr_c, r4 + ldr sp, [r7, #(4 * 13)] + ldr lr, [r7, #(4 * 14)] + msr spsr_cxsf, r5 + +fiq_from_usr_mode_exit: + msr cpsr_c, #(FIQ_MODE | PSR_I_BIT | PSR_F_BIT) + + ldmfd sp!, {r0-r7} + ldr lr, [sp, #(4 * 7)] + ldr r12, [sp, #(4 * 8)] + add sp, sp, #(10 * 4) +exit_fiq: + msr spsr_cxsf, r12 + add r10, #1 + cmp r11, #0 + moveqs pc, lr + bx r11 /* jump to custom fiq return function */ + +nested_fiq: + orr r12, r12, #(PSR_F_BIT) + b exit_fiq + +fiq_glue_end: + +ENTRY(fiq_glue_setup) /* func, data, sp, smc call number */ + stmfd sp!, {r4} + mrs r4, cpsr + msr cpsr_c, #(FIQ_MODE | PSR_I_BIT | PSR_F_BIT) + movs r8, r0 + mov r9, r1 + mov sp, r2 + mov r11, r3 + moveq r10, #0 + movne r10, #1 + msr cpsr_c, r4 + ldmfd sp!, {r4} + bx lr +
diff --git a/arch/arm/common/fiq_glue_setup.c b/arch/arm/common/fiq_glue_setup.c new file mode 100644 index 0000000..8cb1b61 --- /dev/null +++ b/arch/arm/common/fiq_glue_setup.c
@@ -0,0 +1,147 @@ +/* + * Copyright (C) 2010 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <linux/kernel.h> +#include <linux/percpu.h> +#include <linux/slab.h> +#include <asm/fiq.h> +#include <asm/fiq_glue.h> + +extern unsigned char fiq_glue, fiq_glue_end; +extern void fiq_glue_setup(void *func, void *data, void *sp, + fiq_return_handler_t fiq_return_handler); + +static struct fiq_handler fiq_debbuger_fiq_handler = { + .name = "fiq_glue", +}; +DEFINE_PER_CPU(void *, fiq_stack); +static struct fiq_glue_handler *current_handler; +static fiq_return_handler_t fiq_return_handler; +static DEFINE_MUTEX(fiq_glue_lock); + +static void fiq_glue_setup_helper(void *info) +{ + struct fiq_glue_handler *handler = info; + fiq_glue_setup(handler->fiq, handler, + __get_cpu_var(fiq_stack) + THREAD_START_SP, + fiq_return_handler); +} + +int fiq_glue_register_handler(struct fiq_glue_handler *handler) +{ + int ret; + int cpu; + + if (!handler || !handler->fiq) + return -EINVAL; + + mutex_lock(&fiq_glue_lock); + if (fiq_stack) { + ret = -EBUSY; + goto err_busy; + } + + for_each_possible_cpu(cpu) { + void *stack; + stack = (void *)__get_free_pages(GFP_KERNEL, THREAD_SIZE_ORDER); + if (WARN_ON(!stack)) { + ret = -ENOMEM; + goto err_alloc_fiq_stack; + } + per_cpu(fiq_stack, cpu) = stack; + } + + ret = claim_fiq(&fiq_debbuger_fiq_handler); + if (WARN_ON(ret)) + goto err_claim_fiq; + + current_handler = handler; + on_each_cpu(fiq_glue_setup_helper, handler, true); + set_fiq_handler(&fiq_glue, &fiq_glue_end - &fiq_glue); + + mutex_unlock(&fiq_glue_lock); + return 0; + +err_claim_fiq: +err_alloc_fiq_stack: + for_each_possible_cpu(cpu) { + __free_pages(per_cpu(fiq_stack, cpu), THREAD_SIZE_ORDER); + per_cpu(fiq_stack, cpu) = NULL; + } +err_busy: + mutex_unlock(&fiq_glue_lock); + return ret; +} + +static void fiq_glue_update_return_handler(void (*fiq_return)(void)) +{ + fiq_return_handler = fiq_return; + if (current_handler) + on_each_cpu(fiq_glue_setup_helper, current_handler, true); +} + +int fiq_glue_set_return_handler(void (*fiq_return)(void)) +{ + int ret; + + mutex_lock(&fiq_glue_lock); + if (fiq_return_handler) { + ret = -EBUSY; + goto err_busy; + } + fiq_glue_update_return_handler(fiq_return); + ret = 0; +err_busy: + mutex_unlock(&fiq_glue_lock); + + return ret; +} +EXPORT_SYMBOL(fiq_glue_set_return_handler); + +int fiq_glue_clear_return_handler(void (*fiq_return)(void)) +{ + int ret; + + mutex_lock(&fiq_glue_lock); + if (WARN_ON(fiq_return_handler != fiq_return)) { + ret = -EINVAL; + goto err_inval; + } + fiq_glue_update_return_handler(NULL); + ret = 0; +err_inval: + mutex_unlock(&fiq_glue_lock); + + return ret; +} +EXPORT_SYMBOL(fiq_glue_clear_return_handler); + +/** + * fiq_glue_resume - Restore fiqs after suspend or low power idle states + * + * This must be called before calling local_fiq_enable after returning from a + * power state where the fiq mode registers were lost. If a driver provided + * a resume hook when it registered the handler it will be called. + */ + +void fiq_glue_resume(void) +{ + if (!current_handler) + return; + fiq_glue_setup(current_handler->fiq, current_handler, + __get_cpu_var(fiq_stack) + THREAD_START_SP, + fiq_return_handler); + if (current_handler->resume) + current_handler->resume(current_handler); +} +
diff --git a/arch/arm/configs/ranchu_defconfig b/arch/arm/configs/ranchu_defconfig new file mode 100644 index 0000000..49e7bbd --- /dev/null +++ b/arch/arm/configs/ranchu_defconfig
@@ -0,0 +1,316 @@ +# CONFIG_LOCALVERSION_AUTO is not set +CONFIG_AUDIT=y +CONFIG_NO_HZ=y +CONFIG_HIGH_RES_TIMERS=y +CONFIG_TASKSTATS=y +CONFIG_TASK_DELAY_ACCT=y +CONFIG_TASK_XACCT=y +CONFIG_TASK_IO_ACCOUNTING=y +CONFIG_IKCONFIG=y +CONFIG_IKCONFIG_PROC=y +CONFIG_LOG_BUF_SHIFT=14 +CONFIG_CGROUPS=y +CONFIG_CGROUP_DEBUG=y +CONFIG_CGROUP_FREEZER=y +CONFIG_CPUSETS=y +CONFIG_CGROUP_CPUACCT=y +CONFIG_CGROUP_SCHED=y +CONFIG_RT_GROUP_SCHED=y +CONFIG_BLK_DEV_INITRD=y +CONFIG_KALLSYMS_ALL=y +CONFIG_EMBEDDED=y +CONFIG_PROFILING=y +CONFIG_OPROFILE=y +CONFIG_ARCH_MMAP_RND_BITS=16 +# CONFIG_BLK_DEV_BSG is not set +# CONFIG_IOSCHED_DEADLINE is not set +# CONFIG_IOSCHED_CFQ is not set +CONFIG_ARCH_VIRT=y +CONFIG_ARM_KERNMEM_PERMS=y +CONFIG_SMP=y +CONFIG_PREEMPT=y +CONFIG_AEABI=y +CONFIG_HIGHMEM=y +CONFIG_KSM=y +CONFIG_SECCOMP=y +CONFIG_CMDLINE="console=ttyAMA0" +CONFIG_VFP=y +CONFIG_NEON=y +# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set +CONFIG_PM_AUTOSLEEP=y +CONFIG_PM_WAKELOCKS=y +CONFIG_PM_WAKELOCKS_LIMIT=0 +# CONFIG_PM_WAKELOCKS_GC is not set +CONFIG_PM_DEBUG=y +CONFIG_NET=y +CONFIG_PACKET=y +CONFIG_UNIX=y +CONFIG_XFRM_USER=y +CONFIG_NET_KEY=y +CONFIG_INET=y +CONFIG_INET_DIAG_DESTROY=y +CONFIG_IP_MULTICAST=y +CONFIG_IP_ADVANCED_ROUTER=y +CONFIG_IP_MULTIPLE_TABLES=y +CONFIG_IP_PNP=y +CONFIG_IP_PNP_DHCP=y +CONFIG_IP_PNP_BOOTP=y +CONFIG_INET_ESP=y +# CONFIG_INET_LRO is not set +CONFIG_IPV6_ROUTER_PREF=y +CONFIG_IPV6_ROUTE_INFO=y +CONFIG_IPV6_OPTIMISTIC_DAD=y +CONFIG_INET6_AH=y +CONFIG_INET6_ESP=y +CONFIG_INET6_IPCOMP=y +CONFIG_IPV6_MIP6=y +CONFIG_IPV6_MULTIPLE_TABLES=y +CONFIG_NETFILTER=y +CONFIG_NF_CONNTRACK=y +CONFIG_NF_CONNTRACK_SECMARK=y +CONFIG_NF_CONNTRACK_EVENTS=y +CONFIG_NF_CT_PROTO_DCCP=y +CONFIG_NF_CT_PROTO_SCTP=y +CONFIG_NF_CT_PROTO_UDPLITE=y +CONFIG_NF_CONNTRACK_AMANDA=y +CONFIG_NF_CONNTRACK_FTP=y +CONFIG_NF_CONNTRACK_H323=y +CONFIG_NF_CONNTRACK_IRC=y +CONFIG_NF_CONNTRACK_NETBIOS_NS=y +CONFIG_NF_CONNTRACK_PPTP=y +CONFIG_NF_CONNTRACK_SANE=y +CONFIG_NF_CONNTRACK_TFTP=y +CONFIG_NF_CT_NETLINK=y +CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y +CONFIG_NETFILTER_XT_TARGET_CONNMARK=y +CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y +CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y +CONFIG_NETFILTER_XT_TARGET_MARK=y +CONFIG_NETFILTER_XT_TARGET_NFLOG=y +CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y +CONFIG_NETFILTER_XT_TARGET_TPROXY=y +CONFIG_NETFILTER_XT_TARGET_TRACE=y +CONFIG_NETFILTER_XT_TARGET_SECMARK=y +CONFIG_NETFILTER_XT_TARGET_TCPMSS=y +CONFIG_NETFILTER_XT_MATCH_COMMENT=y +CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y +CONFIG_NETFILTER_XT_MATCH_CONNMARK=y +CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y +CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y +CONFIG_NETFILTER_XT_MATCH_HELPER=y +CONFIG_NETFILTER_XT_MATCH_IPRANGE=y +CONFIG_NETFILTER_XT_MATCH_LENGTH=y +CONFIG_NETFILTER_XT_MATCH_LIMIT=y +CONFIG_NETFILTER_XT_MATCH_MAC=y +CONFIG_NETFILTER_XT_MATCH_MARK=y +CONFIG_NETFILTER_XT_MATCH_POLICY=y +CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y +CONFIG_NETFILTER_XT_MATCH_QTAGUID=y +CONFIG_NETFILTER_XT_MATCH_QUOTA=y +CONFIG_NETFILTER_XT_MATCH_QUOTA2=y +CONFIG_NETFILTER_XT_MATCH_SOCKET=y +CONFIG_NETFILTER_XT_MATCH_STATE=y +CONFIG_NETFILTER_XT_MATCH_STATISTIC=y +CONFIG_NETFILTER_XT_MATCH_STRING=y +CONFIG_NETFILTER_XT_MATCH_TIME=y +CONFIG_NETFILTER_XT_MATCH_U32=y +CONFIG_NF_CONNTRACK_IPV4=y +CONFIG_IP_NF_IPTABLES=y +CONFIG_IP_NF_MATCH_AH=y +CONFIG_IP_NF_MATCH_ECN=y +CONFIG_IP_NF_MATCH_TTL=y +CONFIG_IP_NF_FILTER=y +CONFIG_IP_NF_TARGET_REJECT=y +CONFIG_IP_NF_MANGLE=y +CONFIG_IP_NF_RAW=y +CONFIG_IP_NF_SECURITY=y +CONFIG_IP_NF_ARPTABLES=y +CONFIG_IP_NF_ARPFILTER=y +CONFIG_IP_NF_ARP_MANGLE=y +CONFIG_NF_CONNTRACK_IPV6=y +CONFIG_IP6_NF_IPTABLES=y +CONFIG_IP6_NF_FILTER=y +CONFIG_IP6_NF_TARGET_REJECT=y +CONFIG_IP6_NF_MANGLE=y +CONFIG_IP6_NF_RAW=y +CONFIG_BRIDGE=y +CONFIG_NET_SCHED=y +CONFIG_NET_SCH_HTB=y +CONFIG_NET_CLS_U32=y +CONFIG_NET_EMATCH=y +CONFIG_NET_EMATCH_U32=y +CONFIG_NET_CLS_ACT=y +# CONFIG_WIRELESS is not set +CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" +CONFIG_MTD=y +CONFIG_MTD_CMDLINE_PARTS=y +CONFIG_MTD_BLOCK=y +CONFIG_MTD_CFI=y +CONFIG_MTD_CFI_INTELEXT=y +CONFIG_MTD_CFI_AMDSTD=y +CONFIG_BLK_DEV_LOOP=y +CONFIG_BLK_DEV_RAM=y +CONFIG_BLK_DEV_RAM_SIZE=8192 +CONFIG_VIRTIO_BLK=y +CONFIG_MD=y +CONFIG_BLK_DEV_DM=y +CONFIG_DM_CRYPT=y +CONFIG_DM_UEVENT=y +CONFIG_DM_VERITY=y +CONFIG_DM_VERITY_FEC=y +CONFIG_NETDEVICES=y +CONFIG_TUN=y +CONFIG_VIRTIO_NET=y +CONFIG_SMSC911X=y +CONFIG_PPP=y +CONFIG_PPP_BSDCOMP=y +CONFIG_PPP_DEFLATE=y +CONFIG_PPP_MPPE=y +CONFIG_PPPOLAC=y +CONFIG_PPPOPNS=y +CONFIG_USB_USBNET=y +# CONFIG_WLAN is not set +CONFIG_INPUT_EVDEV=y +CONFIG_INPUT_KEYRESET=y +CONFIG_KEYBOARD_GOLDFISH_EVENTS=y +# CONFIG_INPUT_MOUSE is not set +CONFIG_INPUT_JOYSTICK=y +CONFIG_JOYSTICK_XPAD=y +CONFIG_JOYSTICK_XPAD_FF=y +CONFIG_JOYSTICK_XPAD_LEDS=y +CONFIG_INPUT_TABLET=y +CONFIG_TABLET_USB_ACECAD=y +CONFIG_TABLET_USB_AIPTEK=y +CONFIG_TABLET_USB_GTCO=y +CONFIG_TABLET_USB_HANWANG=y +CONFIG_TABLET_USB_KBTAB=y +CONFIG_INPUT_MISC=y +CONFIG_INPUT_KEYCHORD=y +CONFIG_INPUT_UINPUT=y +CONFIG_INPUT_GPIO=y +# CONFIG_SERIO_SERPORT is not set +CONFIG_SERIO_AMBAKMI=y +# CONFIG_VT is not set +# CONFIG_LEGACY_PTYS is not set +# CONFIG_DEVMEM is not set +# CONFIG_DEVKMEM is not set +CONFIG_SERIAL_AMBA_PL011=y +CONFIG_SERIAL_AMBA_PL011_CONSOLE=y +CONFIG_VIRTIO_CONSOLE=y +# CONFIG_HW_RANDOM is not set +# CONFIG_HWMON is not set +CONFIG_MEDIA_SUPPORT=y +CONFIG_FB=y +CONFIG_FB_GOLDFISH=y +CONFIG_FB_SIMPLE=y +CONFIG_BACKLIGHT_LCD_SUPPORT=y +CONFIG_LOGO=y +# CONFIG_LOGO_LINUX_MONO is not set +# CONFIG_LOGO_LINUX_VGA16 is not set +CONFIG_SOUND=y +CONFIG_SND=y +CONFIG_HIDRAW=y +CONFIG_UHID=y +CONFIG_HID_A4TECH=y +CONFIG_HID_ACRUX=y +CONFIG_HID_ACRUX_FF=y +CONFIG_HID_APPLE=y +CONFIG_HID_BELKIN=y +CONFIG_HID_CHERRY=y +CONFIG_HID_CHICONY=y +CONFIG_HID_PRODIKEYS=y +CONFIG_HID_CYPRESS=y +CONFIG_HID_DRAGONRISE=y +CONFIG_DRAGONRISE_FF=y +CONFIG_HID_EMS_FF=y +CONFIG_HID_ELECOM=y +CONFIG_HID_EZKEY=y +CONFIG_HID_HOLTEK=y +CONFIG_HID_KEYTOUCH=y +CONFIG_HID_KYE=y +CONFIG_HID_UCLOGIC=y +CONFIG_HID_WALTOP=y +CONFIG_HID_GYRATION=y +CONFIG_HID_TWINHAN=y +CONFIG_HID_KENSINGTON=y +CONFIG_HID_LCPOWER=y +CONFIG_HID_LOGITECH=y +CONFIG_HID_LOGITECH_DJ=y +CONFIG_LOGITECH_FF=y +CONFIG_LOGIRUMBLEPAD2_FF=y +CONFIG_LOGIG940_FF=y +CONFIG_HID_MAGICMOUSE=y +CONFIG_HID_MICROSOFT=y +CONFIG_HID_MONTEREY=y +CONFIG_HID_MULTITOUCH=y +CONFIG_HID_NTRIG=y +CONFIG_HID_ORTEK=y +CONFIG_HID_PANTHERLORD=y +CONFIG_PANTHERLORD_FF=y +CONFIG_HID_PETALYNX=y +CONFIG_HID_PICOLCD=y +CONFIG_HID_PRIMAX=y +CONFIG_HID_ROCCAT=y +CONFIG_HID_SAITEK=y +CONFIG_HID_SAMSUNG=y +CONFIG_HID_SONY=y +CONFIG_HID_SPEEDLINK=y +CONFIG_HID_SUNPLUS=y +CONFIG_HID_GREENASIA=y +CONFIG_GREENASIA_FF=y +CONFIG_HID_SMARTJOYPLUS=y +CONFIG_SMARTJOYPLUS_FF=y +CONFIG_HID_TIVO=y +CONFIG_HID_TOPSEED=y +CONFIG_HID_THRUSTMASTER=y +CONFIG_HID_WACOM=y +CONFIG_HID_WIIMOTE=y +CONFIG_HID_ZEROPLUS=y +CONFIG_HID_ZYDACRON=y +CONFIG_USB_HIDDEV=y +CONFIG_USB_ANNOUNCE_NEW_DEVICES=y +CONFIG_USB_EHCI_HCD=y +CONFIG_USB_OTG_WAKELOCK=y +CONFIG_RTC_CLASS=y +CONFIG_RTC_DRV_PL031=y +CONFIG_VIRTIO_MMIO=y +CONFIG_STAGING=y +CONFIG_ASHMEM=y +CONFIG_ANDROID_LOW_MEMORY_KILLER=y +CONFIG_SYNC=y +CONFIG_SW_SYNC=y +CONFIG_SW_SYNC_USER=y +CONFIG_ION=y +CONFIG_GOLDFISH_AUDIO=y +CONFIG_GOLDFISH=y +CONFIG_GOLDFISH_PIPE=y +CONFIG_ANDROID=y +CONFIG_ANDROID_BINDER_IPC=y +CONFIG_EXT4_FS=y +CONFIG_EXT4_FS_SECURITY=y +CONFIG_QUOTA=y +CONFIG_FUSE_FS=y +CONFIG_CUSE=y +CONFIG_MSDOS_FS=y +CONFIG_VFAT_FS=y +CONFIG_TMPFS=y +CONFIG_TMPFS_POSIX_ACL=y +CONFIG_PSTORE=y +CONFIG_PSTORE_CONSOLE=y +CONFIG_PSTORE_RAM=y +CONFIG_NFS_FS=y +CONFIG_ROOT_NFS=y +CONFIG_NLS_CODEPAGE_437=y +CONFIG_NLS_ISO8859_1=y +CONFIG_DEBUG_INFO=y +CONFIG_MAGIC_SYSRQ=y +CONFIG_DETECT_HUNG_TASK=y +CONFIG_PANIC_TIMEOUT=5 +# CONFIG_SCHED_DEBUG is not set +CONFIG_SCHEDSTATS=y +CONFIG_TIMER_STATS=y +CONFIG_ENABLE_DEFAULT_TRACERS=y +CONFIG_SECURITY=y +CONFIG_SECURITY_NETWORK=y +CONFIG_SECURITY_SELINUX=y +CONFIG_VIRTUALIZATION=y
diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig index 27ed1b1..72aa5ec 100644 --- a/arch/arm/crypto/Kconfig +++ b/arch/arm/crypto/Kconfig
@@ -120,4 +120,11 @@ that uses the 64x64 to 128 bit polynomial multiplication (vmull.p64) that is part of the ARMv8 Crypto Extensions +config CRYPTO_SPECK_NEON + tristate "NEON accelerated Speck cipher algorithms" + depends on KERNEL_MODE_NEON + select CRYPTO_BLKCIPHER + select CRYPTO_GF128MUL + select CRYPTO_SPECK + endif
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile index fc51507..1d0448a 100644 --- a/arch/arm/crypto/Makefile +++ b/arch/arm/crypto/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o +obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o @@ -36,6 +37,7 @@ sha2-arm-ce-y := sha2-ce-core.o sha2-ce-glue.o aes-arm-ce-y := aes-ce-core.o aes-ce-glue.o ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o +speck-neon-y := speck-neon-core.o speck-neon-glue.o quiet_cmd_perl = PERL $@ cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/speck-neon-core.S b/arch/arm/crypto/speck-neon-core.S new file mode 100644 index 0000000..3c1e203 --- /dev/null +++ b/arch/arm/crypto/speck-neon-core.S
@@ -0,0 +1,432 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * NEON-accelerated implementation of Speck128-XTS and Speck64-XTS + * + * Copyright (c) 2018 Google, Inc + * + * Author: Eric Biggers <ebiggers@google.com> + */ + +#include <linux/linkage.h> + + .text + .fpu neon + + // arguments + ROUND_KEYS .req r0 // const {u64,u32} *round_keys + NROUNDS .req r1 // int nrounds + DST .req r2 // void *dst + SRC .req r3 // const void *src + NBYTES .req r4 // unsigned int nbytes + TWEAK .req r5 // void *tweak + + // registers which hold the data being encrypted/decrypted + X0 .req q0 + X0_L .req d0 + X0_H .req d1 + Y0 .req q1 + Y0_H .req d3 + X1 .req q2 + X1_L .req d4 + X1_H .req d5 + Y1 .req q3 + Y1_H .req d7 + X2 .req q4 + X2_L .req d8 + X2_H .req d9 + Y2 .req q5 + Y2_H .req d11 + X3 .req q6 + X3_L .req d12 + X3_H .req d13 + Y3 .req q7 + Y3_H .req d15 + + // the round key, duplicated in all lanes + ROUND_KEY .req q8 + ROUND_KEY_L .req d16 + ROUND_KEY_H .req d17 + + // index vector for vtbl-based 8-bit rotates + ROTATE_TABLE .req d18 + + // multiplication table for updating XTS tweaks + GF128MUL_TABLE .req d19 + GF64MUL_TABLE .req d19 + + // current XTS tweak value(s) + TWEAKV .req q10 + TWEAKV_L .req d20 + TWEAKV_H .req d21 + + TMP0 .req q12 + TMP0_L .req d24 + TMP0_H .req d25 + TMP1 .req q13 + TMP2 .req q14 + TMP3 .req q15 + + .align 4 +.Lror64_8_table: + .byte 1, 2, 3, 4, 5, 6, 7, 0 +.Lror32_8_table: + .byte 1, 2, 3, 0, 5, 6, 7, 4 +.Lrol64_8_table: + .byte 7, 0, 1, 2, 3, 4, 5, 6 +.Lrol32_8_table: + .byte 3, 0, 1, 2, 7, 4, 5, 6 +.Lgf128mul_table: + .byte 0, 0x87 + .fill 14 +.Lgf64mul_table: + .byte 0, 0x1b, (0x1b << 1), (0x1b << 1) ^ 0x1b + .fill 12 + +/* + * _speck_round_128bytes() - Speck encryption round on 128 bytes at a time + * + * Do one Speck encryption round on the 128 bytes (8 blocks for Speck128, 16 for + * Speck64) stored in X0-X3 and Y0-Y3, using the round key stored in all lanes + * of ROUND_KEY. 'n' is the lane size: 64 for Speck128, or 32 for Speck64. + * + * The 8-bit rotates are implemented using vtbl instead of vshr + vsli because + * the vtbl approach is faster on some processors and the same speed on others. + */ +.macro _speck_round_128bytes n + + // x = ror(x, 8) + vtbl.8 X0_L, {X0_L}, ROTATE_TABLE + vtbl.8 X0_H, {X0_H}, ROTATE_TABLE + vtbl.8 X1_L, {X1_L}, ROTATE_TABLE + vtbl.8 X1_H, {X1_H}, ROTATE_TABLE + vtbl.8 X2_L, {X2_L}, ROTATE_TABLE + vtbl.8 X2_H, {X2_H}, ROTATE_TABLE + vtbl.8 X3_L, {X3_L}, ROTATE_TABLE + vtbl.8 X3_H, {X3_H}, ROTATE_TABLE + + // x += y + vadd.u\n X0, Y0 + vadd.u\n X1, Y1 + vadd.u\n X2, Y2 + vadd.u\n X3, Y3 + + // x ^= k + veor X0, ROUND_KEY + veor X1, ROUND_KEY + veor X2, ROUND_KEY + veor X3, ROUND_KEY + + // y = rol(y, 3) + vshl.u\n TMP0, Y0, #3 + vshl.u\n TMP1, Y1, #3 + vshl.u\n TMP2, Y2, #3 + vshl.u\n TMP3, Y3, #3 + vsri.u\n TMP0, Y0, #(\n - 3) + vsri.u\n TMP1, Y1, #(\n - 3) + vsri.u\n TMP2, Y2, #(\n - 3) + vsri.u\n TMP3, Y3, #(\n - 3) + + // y ^= x + veor Y0, TMP0, X0 + veor Y1, TMP1, X1 + veor Y2, TMP2, X2 + veor Y3, TMP3, X3 +.endm + +/* + * _speck_unround_128bytes() - Speck decryption round on 128 bytes at a time + * + * This is the inverse of _speck_round_128bytes(). + */ +.macro _speck_unround_128bytes n + + // y ^= x + veor TMP0, Y0, X0 + veor TMP1, Y1, X1 + veor TMP2, Y2, X2 + veor TMP3, Y3, X3 + + // y = ror(y, 3) + vshr.u\n Y0, TMP0, #3 + vshr.u\n Y1, TMP1, #3 + vshr.u\n Y2, TMP2, #3 + vshr.u\n Y3, TMP3, #3 + vsli.u\n Y0, TMP0, #(\n - 3) + vsli.u\n Y1, TMP1, #(\n - 3) + vsli.u\n Y2, TMP2, #(\n - 3) + vsli.u\n Y3, TMP3, #(\n - 3) + + // x ^= k + veor X0, ROUND_KEY + veor X1, ROUND_KEY + veor X2, ROUND_KEY + veor X3, ROUND_KEY + + // x -= y + vsub.u\n X0, Y0 + vsub.u\n X1, Y1 + vsub.u\n X2, Y2 + vsub.u\n X3, Y3 + + // x = rol(x, 8); + vtbl.8 X0_L, {X0_L}, ROTATE_TABLE + vtbl.8 X0_H, {X0_H}, ROTATE_TABLE + vtbl.8 X1_L, {X1_L}, ROTATE_TABLE + vtbl.8 X1_H, {X1_H}, ROTATE_TABLE + vtbl.8 X2_L, {X2_L}, ROTATE_TABLE + vtbl.8 X2_H, {X2_H}, ROTATE_TABLE + vtbl.8 X3_L, {X3_L}, ROTATE_TABLE + vtbl.8 X3_H, {X3_H}, ROTATE_TABLE +.endm + +.macro _xts128_precrypt_one dst_reg, tweak_buf, tmp + + // Load the next source block + vld1.8 {\dst_reg}, [SRC]! + + // Save the current tweak in the tweak buffer + vst1.8 {TWEAKV}, [\tweak_buf:128]! + + // XOR the next source block with the current tweak + veor \dst_reg, TWEAKV + + /* + * Calculate the next tweak by multiplying the current one by x, + * modulo p(x) = x^128 + x^7 + x^2 + x + 1. + */ + vshr.u64 \tmp, TWEAKV, #63 + vshl.u64 TWEAKV, #1 + veor TWEAKV_H, \tmp\()_L + vtbl.8 \tmp\()_H, {GF128MUL_TABLE}, \tmp\()_H + veor TWEAKV_L, \tmp\()_H +.endm + +.macro _xts64_precrypt_two dst_reg, tweak_buf, tmp + + // Load the next two source blocks + vld1.8 {\dst_reg}, [SRC]! + + // Save the current two tweaks in the tweak buffer + vst1.8 {TWEAKV}, [\tweak_buf:128]! + + // XOR the next two source blocks with the current two tweaks + veor \dst_reg, TWEAKV + + /* + * Calculate the next two tweaks by multiplying the current ones by x^2, + * modulo p(x) = x^64 + x^4 + x^3 + x + 1. + */ + vshr.u64 \tmp, TWEAKV, #62 + vshl.u64 TWEAKV, #2 + vtbl.8 \tmp\()_L, {GF64MUL_TABLE}, \tmp\()_L + vtbl.8 \tmp\()_H, {GF64MUL_TABLE}, \tmp\()_H + veor TWEAKV, \tmp +.endm + +/* + * _speck_xts_crypt() - Speck-XTS encryption/decryption + * + * Encrypt or decrypt NBYTES bytes of data from the SRC buffer to the DST buffer + * using Speck-XTS, specifically the variant with a block size of '2n' and round + * count given by NROUNDS. The expanded round keys are given in ROUND_KEYS, and + * the current XTS tweak value is given in TWEAK. It's assumed that NBYTES is a + * nonzero multiple of 128. + */ +.macro _speck_xts_crypt n, decrypting + push {r4-r7} + mov r7, sp + + /* + * The first four parameters were passed in registers r0-r3. Load the + * additional parameters, which were passed on the stack. + */ + ldr NBYTES, [sp, #16] + ldr TWEAK, [sp, #20] + + /* + * If decrypting, modify the ROUND_KEYS parameter to point to the last + * round key rather than the first, since for decryption the round keys + * are used in reverse order. + */ +.if \decrypting +.if \n == 64 + add ROUND_KEYS, ROUND_KEYS, NROUNDS, lsl #3 + sub ROUND_KEYS, #8 +.else + add ROUND_KEYS, ROUND_KEYS, NROUNDS, lsl #2 + sub ROUND_KEYS, #4 +.endif +.endif + + // Load the index vector for vtbl-based 8-bit rotates +.if \decrypting + ldr r12, =.Lrol\n\()_8_table +.else + ldr r12, =.Lror\n\()_8_table +.endif + vld1.8 {ROTATE_TABLE}, [r12:64] + + // One-time XTS preparation + + /* + * Allocate stack space to store 128 bytes worth of tweaks. For + * performance, this space is aligned to a 16-byte boundary so that we + * can use the load/store instructions that declare 16-byte alignment. + */ + sub sp, #128 + bic sp, #0xf + +.if \n == 64 + // Load first tweak + vld1.8 {TWEAKV}, [TWEAK] + + // Load GF(2^128) multiplication table + ldr r12, =.Lgf128mul_table + vld1.8 {GF128MUL_TABLE}, [r12:64] +.else + // Load first tweak + vld1.8 {TWEAKV_L}, [TWEAK] + + // Load GF(2^64) multiplication table + ldr r12, =.Lgf64mul_table + vld1.8 {GF64MUL_TABLE}, [r12:64] + + // Calculate second tweak, packing it together with the first + vshr.u64 TMP0_L, TWEAKV_L, #63 + vtbl.u8 TMP0_L, {GF64MUL_TABLE}, TMP0_L + vshl.u64 TWEAKV_H, TWEAKV_L, #1 + veor TWEAKV_H, TMP0_L +.endif + +.Lnext_128bytes_\@: + + /* + * Load the source blocks into {X,Y}[0-3], XOR them with their XTS tweak + * values, and save the tweaks on the stack for later. Then + * de-interleave the 'x' and 'y' elements of each block, i.e. make it so + * that the X[0-3] registers contain only the second halves of blocks, + * and the Y[0-3] registers contain only the first halves of blocks. + * (Speck uses the order (y, x) rather than the more intuitive (x, y).) + */ + mov r12, sp +.if \n == 64 + _xts128_precrypt_one X0, r12, TMP0 + _xts128_precrypt_one Y0, r12, TMP0 + _xts128_precrypt_one X1, r12, TMP0 + _xts128_precrypt_one Y1, r12, TMP0 + _xts128_precrypt_one X2, r12, TMP0 + _xts128_precrypt_one Y2, r12, TMP0 + _xts128_precrypt_one X3, r12, TMP0 + _xts128_precrypt_one Y3, r12, TMP0 + vswp X0_L, Y0_H + vswp X1_L, Y1_H + vswp X2_L, Y2_H + vswp X3_L, Y3_H +.else + _xts64_precrypt_two X0, r12, TMP0 + _xts64_precrypt_two Y0, r12, TMP0 + _xts64_precrypt_two X1, r12, TMP0 + _xts64_precrypt_two Y1, r12, TMP0 + _xts64_precrypt_two X2, r12, TMP0 + _xts64_precrypt_two Y2, r12, TMP0 + _xts64_precrypt_two X3, r12, TMP0 + _xts64_precrypt_two Y3, r12, TMP0 + vuzp.32 Y0, X0 + vuzp.32 Y1, X1 + vuzp.32 Y2, X2 + vuzp.32 Y3, X3 +.endif + + // Do the cipher rounds + + mov r12, ROUND_KEYS + mov r6, NROUNDS + +.Lnext_round_\@: +.if \decrypting +.if \n == 64 + vld1.64 ROUND_KEY_L, [r12] + sub r12, #8 + vmov ROUND_KEY_H, ROUND_KEY_L +.else + vld1.32 {ROUND_KEY_L[],ROUND_KEY_H[]}, [r12] + sub r12, #4 +.endif + _speck_unround_128bytes \n +.else +.if \n == 64 + vld1.64 ROUND_KEY_L, [r12]! + vmov ROUND_KEY_H, ROUND_KEY_L +.else + vld1.32 {ROUND_KEY_L[],ROUND_KEY_H[]}, [r12]! +.endif + _speck_round_128bytes \n +.endif + subs r6, r6, #1 + bne .Lnext_round_\@ + + // Re-interleave the 'x' and 'y' elements of each block +.if \n == 64 + vswp X0_L, Y0_H + vswp X1_L, Y1_H + vswp X2_L, Y2_H + vswp X3_L, Y3_H +.else + vzip.32 Y0, X0 + vzip.32 Y1, X1 + vzip.32 Y2, X2 + vzip.32 Y3, X3 +.endif + + // XOR the encrypted/decrypted blocks with the tweaks we saved earlier + mov r12, sp + vld1.8 {TMP0, TMP1}, [r12:128]! + vld1.8 {TMP2, TMP3}, [r12:128]! + veor X0, TMP0 + veor Y0, TMP1 + veor X1, TMP2 + veor Y1, TMP3 + vld1.8 {TMP0, TMP1}, [r12:128]! + vld1.8 {TMP2, TMP3}, [r12:128]! + veor X2, TMP0 + veor Y2, TMP1 + veor X3, TMP2 + veor Y3, TMP3 + + // Store the ciphertext in the destination buffer + vst1.8 {X0, Y0}, [DST]! + vst1.8 {X1, Y1}, [DST]! + vst1.8 {X2, Y2}, [DST]! + vst1.8 {X3, Y3}, [DST]! + + // Continue if there are more 128-byte chunks remaining, else return + subs NBYTES, #128 + bne .Lnext_128bytes_\@ + + // Store the next tweak +.if \n == 64 + vst1.8 {TWEAKV}, [TWEAK] +.else + vst1.8 {TWEAKV_L}, [TWEAK] +.endif + + mov sp, r7 + pop {r4-r7} + bx lr +.endm + +ENTRY(speck128_xts_encrypt_neon) + _speck_xts_crypt n=64, decrypting=0 +ENDPROC(speck128_xts_encrypt_neon) + +ENTRY(speck128_xts_decrypt_neon) + _speck_xts_crypt n=64, decrypting=1 +ENDPROC(speck128_xts_decrypt_neon) + +ENTRY(speck64_xts_encrypt_neon) + _speck_xts_crypt n=32, decrypting=0 +ENDPROC(speck64_xts_encrypt_neon) + +ENTRY(speck64_xts_decrypt_neon) + _speck_xts_crypt n=32, decrypting=1 +ENDPROC(speck64_xts_decrypt_neon)
diff --git a/arch/arm/crypto/speck-neon-glue.c b/arch/arm/crypto/speck-neon-glue.c new file mode 100644 index 0000000..ea36c3a --- /dev/null +++ b/arch/arm/crypto/speck-neon-glue.c
@@ -0,0 +1,314 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * NEON-accelerated implementation of Speck128-XTS and Speck64-XTS + * + * Copyright (c) 2018 Google, Inc + * + * Note: the NIST recommendation for XTS only specifies a 128-bit block size, + * but a 64-bit version (needed for Speck64) is fairly straightforward; the math + * is just done in GF(2^64) instead of GF(2^128), with the reducing polynomial + * x^64 + x^4 + x^3 + x + 1 from the original XEX paper (Rogaway, 2004: + * "Efficient Instantiations of Tweakable Blockciphers and Refinements to Modes + * OCB and PMAC"), represented as 0x1B. + */ + +#include <asm/hwcap.h> +#include <asm/neon.h> +#include <asm/simd.h> +#include <crypto/algapi.h> +#include <crypto/gf128mul.h> +#include <crypto/speck.h> +#include <crypto/xts.h> +#include <linux/kernel.h> +#include <linux/module.h> + +/* The assembly functions only handle multiples of 128 bytes */ +#define SPECK_NEON_CHUNK_SIZE 128 + +/* Speck128 */ + +struct speck128_xts_tfm_ctx { + struct speck128_tfm_ctx main_key; + struct speck128_tfm_ctx tweak_key; +}; + +asmlinkage void speck128_xts_encrypt_neon(const u64 *round_keys, int nrounds, + void *dst, const void *src, + unsigned int nbytes, void *tweak); + +asmlinkage void speck128_xts_decrypt_neon(const u64 *round_keys, int nrounds, + void *dst, const void *src, + unsigned int nbytes, void *tweak); + +typedef void (*speck128_crypt_one_t)(const struct speck128_tfm_ctx *, + u8 *, const u8 *); +typedef void (*speck128_xts_crypt_many_t)(const u64 *, int, void *, + const void *, unsigned int, void *); + +static __always_inline int +__speck128_xts_crypt(struct blkcipher_desc *desc, struct scatterlist *dst, + struct scatterlist *src, unsigned int nbytes, + speck128_crypt_one_t crypt_one, + speck128_xts_crypt_many_t crypt_many) +{ + struct crypto_blkcipher *tfm = desc->tfm; + const struct speck128_xts_tfm_ctx *ctx = crypto_blkcipher_ctx(tfm); + struct blkcipher_walk walk; + le128 tweak; + int err; + + blkcipher_walk_init(&walk, dst, src, nbytes); + err = blkcipher_walk_virt_block(desc, &walk, SPECK_NEON_CHUNK_SIZE); + + crypto_speck128_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv); + + while (walk.nbytes > 0) { + unsigned int nbytes = walk.nbytes; + u8 *dst = walk.dst.virt.addr; + const u8 *src = walk.src.virt.addr; + + if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) { + unsigned int count; + + count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE); + kernel_neon_begin(); + (*crypt_many)(ctx->main_key.round_keys, + ctx->main_key.nrounds, + dst, src, count, &tweak); + kernel_neon_end(); + dst += count; + src += count; + nbytes -= count; + } + + /* Handle any remainder with generic code */ + while (nbytes >= sizeof(tweak)) { + le128_xor((le128 *)dst, (const le128 *)src, &tweak); + (*crypt_one)(&ctx->main_key, dst, dst); + le128_xor((le128 *)dst, (const le128 *)dst, &tweak); + gf128mul_x_ble((be128 *)&tweak, (const be128 *)&tweak); + + dst += sizeof(tweak); + src += sizeof(tweak); + nbytes -= sizeof(tweak); + } + err = blkcipher_walk_done(desc, &walk, nbytes); + } + + return err; +} + +static int speck128_xts_encrypt(struct blkcipher_desc *desc, + struct scatterlist *dst, + struct scatterlist *src, + unsigned int nbytes) +{ + return __speck128_xts_crypt(desc, dst, src, nbytes, + crypto_speck128_encrypt, + speck128_xts_encrypt_neon); +} + +static int speck128_xts_decrypt(struct blkcipher_desc *desc, + struct scatterlist *dst, + struct scatterlist *src, + unsigned int nbytes) +{ + return __speck128_xts_crypt(desc, dst, src, nbytes, + crypto_speck128_decrypt, + speck128_xts_decrypt_neon); +} + +static int speck128_xts_setkey(struct crypto_tfm *tfm, const u8 *key, + unsigned int keylen) +{ + struct speck128_xts_tfm_ctx *ctx = crypto_tfm_ctx(tfm); + int err; + + if (keylen % 2) + return -EINVAL; + + keylen /= 2; + + err = crypto_speck128_setkey(&ctx->main_key, key, keylen); + if (err) + return err; + + return crypto_speck128_setkey(&ctx->tweak_key, key + keylen, keylen); +} + +/* Speck64 */ + +struct speck64_xts_tfm_ctx { + struct speck64_tfm_ctx main_key; + struct speck64_tfm_ctx tweak_key; +}; + +asmlinkage void speck64_xts_encrypt_neon(const u32 *round_keys, int nrounds, + void *dst, const void *src, + unsigned int nbytes, void *tweak); + +asmlinkage void speck64_xts_decrypt_neon(const u32 *round_keys, int nrounds, + void *dst, const void *src, + unsigned int nbytes, void *tweak); + +typedef void (*speck64_crypt_one_t)(const struct speck64_tfm_ctx *, + u8 *, const u8 *); +typedef void (*speck64_xts_crypt_many_t)(const u32 *, int, void *, + const void *, unsigned int, void *); + +static __always_inline int +__speck64_xts_crypt(struct blkcipher_desc *desc, struct scatterlist *dst, + struct scatterlist *src, unsigned int nbytes, + speck64_crypt_one_t crypt_one, + speck64_xts_crypt_many_t crypt_many) +{ + struct crypto_blkcipher *tfm = desc->tfm; + const struct speck64_xts_tfm_ctx *ctx = crypto_blkcipher_ctx(tfm); + struct blkcipher_walk walk; + __le64 tweak; + int err; + + blkcipher_walk_init(&walk, dst, src, nbytes); + err = blkcipher_walk_virt_block(desc, &walk, SPECK_NEON_CHUNK_SIZE); + + crypto_speck64_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv); + + while (walk.nbytes > 0) { + unsigned int nbytes = walk.nbytes; + u8 *dst = walk.dst.virt.addr; + const u8 *src = walk.src.virt.addr; + + if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) { + unsigned int count; + + count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE); + kernel_neon_begin(); + (*crypt_many)(ctx->main_key.round_keys, + ctx->main_key.nrounds, + dst, src, count, &tweak); + kernel_neon_end(); + dst += count; + src += count; + nbytes -= count; + } + + /* Handle any remainder with generic code */ + while (nbytes >= sizeof(tweak)) { + *(__le64 *)dst = *(__le64 *)src ^ tweak; + (*crypt_one)(&ctx->main_key, dst, dst); + *(__le64 *)dst ^= tweak; + tweak = cpu_to_le64((le64_to_cpu(tweak) << 1) ^ + ((tweak & cpu_to_le64(1ULL << 63)) ? + 0x1B : 0)); + dst += sizeof(tweak); + src += sizeof(tweak); + nbytes -= sizeof(tweak); + } + err = blkcipher_walk_done(desc, &walk, nbytes); + } + + return err; +} + +static int speck64_xts_encrypt(struct blkcipher_desc *desc, + struct scatterlist *dst, struct scatterlist *src, + unsigned int nbytes) +{ + return __speck64_xts_crypt(desc, dst, src, nbytes, + crypto_speck64_encrypt, + speck64_xts_encrypt_neon); +} + +static int speck64_xts_decrypt(struct blkcipher_desc *desc, + struct scatterlist *dst, struct scatterlist *src, + unsigned int nbytes) +{ + return __speck64_xts_crypt(desc, dst, src, nbytes, + crypto_speck64_decrypt, + speck64_xts_decrypt_neon); +} + +static int speck64_xts_setkey(struct crypto_tfm *tfm, const u8 *key, + unsigned int keylen) +{ + struct speck64_xts_tfm_ctx *ctx = crypto_tfm_ctx(tfm); + int err; + + if (keylen % 2) + return -EINVAL; + + keylen /= 2; + + err = crypto_speck64_setkey(&ctx->main_key, key, keylen); + if (err) + return err; + + return crypto_speck64_setkey(&ctx->tweak_key, key + keylen, keylen); +} + +static struct crypto_alg speck_algs[] = { + { + .cra_name = "xts(speck128)", + .cra_driver_name = "xts-speck128-neon", + .cra_priority = 300, + .cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER, + .cra_blocksize = SPECK128_BLOCK_SIZE, + .cra_type = &crypto_blkcipher_type, + .cra_ctxsize = sizeof(struct speck128_xts_tfm_ctx), + .cra_alignmask = 7, + .cra_module = THIS_MODULE, + .cra_u = { + .blkcipher = { + .min_keysize = 2 * SPECK128_128_KEY_SIZE, + .max_keysize = 2 * SPECK128_256_KEY_SIZE, + .ivsize = SPECK128_BLOCK_SIZE, + .setkey = speck128_xts_setkey, + .encrypt = speck128_xts_encrypt, + .decrypt = speck128_xts_decrypt, + } + } + }, { + .cra_name = "xts(speck64)", + .cra_driver_name = "xts-speck64-neon", + .cra_priority = 300, + .cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER, + .cra_blocksize = SPECK64_BLOCK_SIZE, + .cra_type = &crypto_blkcipher_type, + .cra_ctxsize = sizeof(struct speck64_xts_tfm_ctx), + .cra_alignmask = 7, + .cra_module = THIS_MODULE, + .cra_u = { + .blkcipher = { + .min_keysize = 2 * SPECK64_96_KEY_SIZE, + .max_keysize = 2 * SPECK64_128_KEY_SIZE, + .ivsize = SPECK64_BLOCK_SIZE, + .setkey = speck64_xts_setkey, + .encrypt = speck64_xts_encrypt, + .decrypt = speck64_xts_decrypt, + } + } + } +}; + +static int __init speck_neon_module_init(void) +{ + if (!(elf_hwcap & HWCAP_NEON)) + return -ENODEV; + return crypto_register_algs(speck_algs, ARRAY_SIZE(speck_algs)); +} + +static void __exit speck_neon_module_exit(void) +{ + crypto_unregister_algs(speck_algs, ARRAY_SIZE(speck_algs)); +} + +module_init(speck_neon_module_init); +module_exit(speck_neon_module_exit); + +MODULE_DESCRIPTION("Speck block cipher (NEON-accelerated)"); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>"); +MODULE_ALIAS_CRYPTO("xts(speck128)"); +MODULE_ALIAS_CRYPTO("xts-speck128-neon"); +MODULE_ALIAS_CRYPTO("xts(speck64)"); +MODULE_ALIAS_CRYPTO("xts-speck64-neon");
diff --git a/arch/arm/include/asm/elf.h b/arch/arm/include/asm/elf.h index f13ae15..d2315ff 100644 --- a/arch/arm/include/asm/elf.h +++ b/arch/arm/include/asm/elf.h
@@ -112,8 +112,12 @@ int dump_task_regs(struct task_struct *t, elf_gregset_t *elfregs); #define CORE_DUMP_USE_REGSET #define ELF_EXEC_PAGESIZE 4096 -/* This is the base location for PIE (ET_DYN with INTERP) loads. */ -#define ELF_ET_DYN_BASE 0x400000UL +/* This is the location that an ET_DYN program is loaded if exec'ed. Typical + use of this is to invoke "./ld.so someprog" to test out a new version of + the loader. We need to make sure that it is out of the way of the program + that it will "exec", and that there is sufficient room for the brk. */ + +#define ELF_ET_DYN_BASE (TASK_SIZE / 3 * 2) /* When the program starts, a1 contains a pointer to a function to be registered with atexit, as per the SVR4 ABI. A value of 0 means we
diff --git a/arch/arm/include/asm/fiq_glue.h b/arch/arm/include/asm/fiq_glue.h new file mode 100644 index 0000000..a9e244f9 --- /dev/null +++ b/arch/arm/include/asm/fiq_glue.h
@@ -0,0 +1,33 @@ +/* + * Copyright (C) 2010 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#ifndef __ASM_FIQ_GLUE_H +#define __ASM_FIQ_GLUE_H + +struct fiq_glue_handler { + void (*fiq)(struct fiq_glue_handler *h, void *regs, void *svc_sp); + void (*resume)(struct fiq_glue_handler *h); +}; +typedef void (*fiq_return_handler_t)(void); + +int fiq_glue_register_handler(struct fiq_glue_handler *handler); +int fiq_glue_set_return_handler(fiq_return_handler_t fiq_return); +int fiq_glue_clear_return_handler(fiq_return_handler_t fiq_return); + +#ifdef CONFIG_FIQ_GLUE +void fiq_glue_resume(void); +#else +static inline void fiq_glue_resume(void) {} +#endif + +#endif
diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h index 370f7a7..4e9b957 100644 --- a/arch/arm/include/asm/topology.h +++ b/arch/arm/include/asm/topology.h
@@ -3,6 +3,7 @@ #ifdef CONFIG_ARM_CPU_TOPOLOGY +#include <linux/cpufreq.h> #include <linux/cpumask.h> struct cputopo_arm { @@ -24,6 +25,14 @@ void init_cpu_topology(void); void store_cpu_topology(unsigned int cpuid); const struct cpumask *cpu_coregroup_mask(int cpu); +#ifdef CONFIG_CPU_FREQ +#define arch_scale_freq_capacity cpufreq_scale_freq_capacity +#define arch_scale_max_freq_capacity cpufreq_scale_max_freq_capacity +#define arch_scale_min_freq_capacity cpufreq_scale_min_freq_capacity +#endif +#define arch_scale_cpu_capacity scale_cpu_capacity +extern unsigned long scale_cpu_capacity(struct sched_domain *sd, int cpu); + #else static inline void init_cpu_topology(void) { }
diff --git a/arch/arm/include/asm/traps.h b/arch/arm/include/asm/traps.h index f555bb3..683d923 100644 --- a/arch/arm/include/asm/traps.h +++ b/arch/arm/include/asm/traps.h
@@ -18,7 +18,6 @@ struct undef_hook { void register_undef_hook(struct undef_hook *hook); void unregister_undef_hook(struct undef_hook *hook); -#ifdef CONFIG_FUNCTION_GRAPH_TRACER static inline int __in_irqentry_text(unsigned long ptr) { extern char __irqentry_text_start[]; @@ -27,12 +26,6 @@ static inline int __in_irqentry_text(unsigned long ptr) return ptr >= (unsigned long)&__irqentry_text_start && ptr < (unsigned long)&__irqentry_text_end; } -#else -static inline int __in_irqentry_text(unsigned long ptr) -{ - return 0; -} -#endif static inline int in_exception_text(unsigned long ptr) {
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S index 10c3283..0dcd9e7 100644 --- a/arch/arm/kernel/entry-common.S +++ b/arch/arm/kernel/entry-common.S
@@ -12,6 +12,7 @@ #include <asm/unistd.h> #include <asm/ftrace.h> #include <asm/unwind.h> +#include <asm/memory.h> #ifdef CONFIG_NEED_RET_TO_USER #include <mach/entry-macro.S> @@ -35,6 +36,9 @@ UNWIND(.fnstart ) UNWIND(.cantunwind ) disable_irq_notrace @ disable interrupts + ldr r2, [tsk, #TI_ADDR_LIMIT] + cmp r2, #TASK_SIZE + blne addr_limit_check_failed ldr r1, [tsk, #TI_FLAGS] @ re-check for syscall tracing tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK bne fast_work_pending @@ -61,6 +65,9 @@ UNWIND(.cantunwind ) str r0, [sp, #S_R0 + S_OFF]! @ save returned r0 disable_irq_notrace @ disable interrupts + ldr r2, [tsk, #TI_ADDR_LIMIT] + cmp r2, #TASK_SIZE + blne addr_limit_check_failed ldr r1, [tsk, #TI_FLAGS] @ re-check for syscall tracing tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK beq no_work_pending @@ -93,6 +100,9 @@ ret_slow_syscall: disable_irq_notrace @ disable interrupts ENTRY(ret_to_user_from_irq) + ldr r2, [tsk, #TI_ADDR_LIMIT] + cmp r2, #TASK_SIZE + blne addr_limit_check_failed ldr r1, [tsk, #TI_FLAGS] tst r1, #_TIF_WORK_MASK bne slow_work_pending
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c index 9232cae..f3c6622 100644 --- a/arch/arm/kernel/kgdb.c +++ b/arch/arm/kernel/kgdb.c
@@ -140,6 +140,8 @@ int kgdb_arch_handle_exception(int exception_vector, int signo, static int kgdb_brk_fn(struct pt_regs *regs, unsigned int instr) { + if (user_mode(regs)) + return -1; kgdb_handle_exception(1, SIGTRAP, 0, regs); return 0; @@ -147,6 +149,8 @@ static int kgdb_brk_fn(struct pt_regs *regs, unsigned int instr) static int kgdb_compiled_brk_fn(struct pt_regs *regs, unsigned int instr) { + if (user_mode(regs)) + return -1; compiled_break = 1; kgdb_handle_exception(1, SIGTRAP, 0, regs);
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c index 91d2d5b0..38ad8b9 100644 --- a/arch/arm/kernel/process.c +++ b/arch/arm/kernel/process.c
@@ -80,6 +80,7 @@ void arch_cpu_idle_prepare(void) void arch_cpu_idle_enter(void) { + idle_notifier_call_chain(IDLE_START); ledtrig_cpu(CPU_LED_IDLE_START); #ifdef CONFIG_PL310_ERRATA_769419 wmb(); @@ -89,6 +90,78 @@ void arch_cpu_idle_enter(void) void arch_cpu_idle_exit(void) { ledtrig_cpu(CPU_LED_IDLE_END); + idle_notifier_call_chain(IDLE_END); +} + +/* + * dump a block of kernel memory from around the given address + */ +static void show_data(unsigned long addr, int nbytes, const char *name) +{ + int i, j; + int nlines; + u32 *p; + + /* + * don't attempt to dump non-kernel addresses or + * values that are probably just small negative numbers + */ + if (addr < PAGE_OFFSET || addr > -256UL) + return; + + printk("\n%s: %#lx:\n", name, addr); + + /* + * round address down to a 32 bit boundary + * and always dump a multiple of 32 bytes + */ + p = (u32 *)(addr & ~(sizeof(u32) - 1)); + nbytes += (addr & (sizeof(u32) - 1)); + nlines = (nbytes + 31) / 32; + + + for (i = 0; i < nlines; i++) { + /* + * just display low 16 bits of address to keep + * each line of the dump < 80 characters + */ + printk("%04lx ", (unsigned long)p & 0xffff); + for (j = 0; j < 8; j++) { + u32 data; + if (probe_kernel_address(p, data)) { + pr_cont(" ********"); + } else { + pr_cont(" %08x", data); + } + ++p; + } + pr_cont("\n"); + } +} + +static void show_extra_register_data(struct pt_regs *regs, int nbytes) +{ + mm_segment_t fs; + + fs = get_fs(); + set_fs(KERNEL_DS); + show_data(regs->ARM_pc - nbytes, nbytes * 2, "PC"); + show_data(regs->ARM_lr - nbytes, nbytes * 2, "LR"); + show_data(regs->ARM_sp - nbytes, nbytes * 2, "SP"); + show_data(regs->ARM_ip - nbytes, nbytes * 2, "IP"); + show_data(regs->ARM_fp - nbytes, nbytes * 2, "FP"); + show_data(regs->ARM_r0 - nbytes, nbytes * 2, "R0"); + show_data(regs->ARM_r1 - nbytes, nbytes * 2, "R1"); + show_data(regs->ARM_r2 - nbytes, nbytes * 2, "R2"); + show_data(regs->ARM_r3 - nbytes, nbytes * 2, "R3"); + show_data(regs->ARM_r4 - nbytes, nbytes * 2, "R4"); + show_data(regs->ARM_r5 - nbytes, nbytes * 2, "R5"); + show_data(regs->ARM_r6 - nbytes, nbytes * 2, "R6"); + show_data(regs->ARM_r7 - nbytes, nbytes * 2, "R7"); + show_data(regs->ARM_r8 - nbytes, nbytes * 2, "R8"); + show_data(regs->ARM_r9 - nbytes, nbytes * 2, "R9"); + show_data(regs->ARM_r10 - nbytes, nbytes * 2, "R10"); + set_fs(fs); } void __show_regs(struct pt_regs *regs) @@ -182,6 +255,8 @@ void __show_regs(struct pt_regs *regs) printk("Control: %08x%s\n", ctrl, buf); } #endif + + show_extra_register_data(regs, 128); } void show_regs(struct pt_regs * regs)
diff --git a/arch/arm/kernel/reboot.c b/arch/arm/kernel/reboot.c index 3fa867a..d704df8 100644 --- a/arch/arm/kernel/reboot.c +++ b/arch/arm/kernel/reboot.c
@@ -6,6 +6,7 @@ * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. */ +#include <linux/console.h> #include <linux/cpu.h> #include <linux/delay.h> #include <linux/reboot.h> @@ -122,6 +123,31 @@ void machine_power_off(void) pm_power_off(); } +#ifdef CONFIG_ARM_FLUSH_CONSOLE_ON_RESTART +void arm_machine_flush_console(void) +{ + printk("\n"); + pr_emerg("Restarting %s\n", linux_banner); + if (console_trylock()) { + console_unlock(); + return; + } + + mdelay(50); + + local_irq_disable(); + if (!console_trylock()) + pr_emerg("arm_restart: Console was locked! Busting\n"); + else + pr_emerg("arm_restart: Console was locked!\n"); + console_unlock(); +} +#else +void arm_machine_flush_console(void) +{ +} +#endif + /* * Restart requires that the secondary CPUs stop performing any activity * while the primary CPU resets the system. Systems with a single CPU can @@ -138,6 +164,10 @@ void machine_restart(char *cmd) local_irq_disable(); smp_send_stop(); + /* Flush the console to make sure all the relevant messages make it + * out to the console drivers */ + arm_machine_flush_console(); + if (arm_pm_restart) arm_pm_restart(reboot_mode, cmd); else
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c index 7b8f214..304e684 100644 --- a/arch/arm/kernel/signal.c +++ b/arch/arm/kernel/signal.c
@@ -14,6 +14,7 @@ #include <linux/uaccess.h> #include <linux/tracehook.h> #include <linux/uprobes.h> +#include <linux/syscalls.h> #include <asm/elf.h> #include <asm/cacheflush.h> @@ -631,3 +632,9 @@ struct page *get_signal_page(void) return page; } + +/* Defer to generic check */ +asmlinkage void addr_limit_check_failed(void) +{ + addr_limit_user_check(); +}
diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c index ec279d1..07b0812 100644 --- a/arch/arm/kernel/topology.c +++ b/arch/arm/kernel/topology.c
@@ -42,7 +42,7 @@ */ static DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE; -unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu) +unsigned long scale_cpu_capacity(struct sched_domain *sd, int cpu) { return per_cpu(cpu_scale, cpu); } @@ -153,6 +153,8 @@ static void __init parse_dt_topology(void) } +static const struct sched_group_energy * const cpu_core_energy(int cpu); + /* * Look for a customed capacity of a CPU in the cpu_capacity table during the * boot. The update of all CPUs is in O(n^2) for heteregeneous system but the @@ -160,10 +162,14 @@ static void __init parse_dt_topology(void) */ static void update_cpu_capacity(unsigned int cpu) { - if (!cpu_capacity(cpu)) - return; + unsigned long capacity = SCHED_CAPACITY_SCALE; - set_capacity_scale(cpu, cpu_capacity(cpu) / middle_capacity); + if (cpu_core_energy(cpu)) { + int max_cap_idx = cpu_core_energy(cpu)->nr_cap_states - 1; + capacity = cpu_core_energy(cpu)->cap_states[max_cap_idx].cap; + } + + set_capacity_scale(cpu, capacity); pr_info("CPU%u: update cpu_capacity %lu\n", cpu, arch_scale_cpu_capacity(NULL, cpu)); @@ -275,17 +281,138 @@ void store_cpu_topology(unsigned int cpuid) cpu_topology[cpuid].socket_id, mpidr); } +/* + * ARM TC2 specific energy cost model data. There are no unit requirements for + * the data. Data can be normalized to any reference point, but the + * normalization must be consistent. That is, one bogo-joule/watt must be the + * same quantity for all data, but we don't care what it is. + */ +static struct idle_state idle_states_cluster_a7[] = { + { .power = 25 }, /* arch_cpu_idle() (active idle) = WFI */ + { .power = 25 }, /* WFI */ + { .power = 10 }, /* cluster-sleep-l */ + }; + +static struct idle_state idle_states_cluster_a15[] = { + { .power = 70 }, /* arch_cpu_idle() (active idle) = WFI */ + { .power = 70 }, /* WFI */ + { .power = 25 }, /* cluster-sleep-b */ + }; + +static struct capacity_state cap_states_cluster_a7[] = { + /* Cluster only power */ + { .cap = 150, .power = 2967, }, /* 350 MHz */ + { .cap = 172, .power = 2792, }, /* 400 MHz */ + { .cap = 215, .power = 2810, }, /* 500 MHz */ + { .cap = 258, .power = 2815, }, /* 600 MHz */ + { .cap = 301, .power = 2919, }, /* 700 MHz */ + { .cap = 344, .power = 2847, }, /* 800 MHz */ + { .cap = 387, .power = 3917, }, /* 900 MHz */ + { .cap = 430, .power = 4905, }, /* 1000 MHz */ + }; + +static struct capacity_state cap_states_cluster_a15[] = { + /* Cluster only power */ + { .cap = 426, .power = 7920, }, /* 500 MHz */ + { .cap = 512, .power = 8165, }, /* 600 MHz */ + { .cap = 597, .power = 8172, }, /* 700 MHz */ + { .cap = 682, .power = 8195, }, /* 800 MHz */ + { .cap = 768, .power = 8265, }, /* 900 MHz */ + { .cap = 853, .power = 8446, }, /* 1000 MHz */ + { .cap = 938, .power = 11426, }, /* 1100 MHz */ + { .cap = 1024, .power = 15200, }, /* 1200 MHz */ + }; + +static struct sched_group_energy energy_cluster_a7 = { + .nr_idle_states = ARRAY_SIZE(idle_states_cluster_a7), + .idle_states = idle_states_cluster_a7, + .nr_cap_states = ARRAY_SIZE(cap_states_cluster_a7), + .cap_states = cap_states_cluster_a7, +}; + +static struct sched_group_energy energy_cluster_a15 = { + .nr_idle_states = ARRAY_SIZE(idle_states_cluster_a15), + .idle_states = idle_states_cluster_a15, + .nr_cap_states = ARRAY_SIZE(cap_states_cluster_a15), + .cap_states = cap_states_cluster_a15, +}; + +static struct idle_state idle_states_core_a7[] = { + { .power = 0 }, /* arch_cpu_idle (active idle) = WFI */ + { .power = 0 }, /* WFI */ + { .power = 0 }, /* cluster-sleep-l */ + }; + +static struct idle_state idle_states_core_a15[] = { + { .power = 0 }, /* arch_cpu_idle (active idle) = WFI */ + { .power = 0 }, /* WFI */ + { .power = 0 }, /* cluster-sleep-b */ + }; + +static struct capacity_state cap_states_core_a7[] = { + /* Power per cpu */ + { .cap = 150, .power = 187, }, /* 350 MHz */ + { .cap = 172, .power = 275, }, /* 400 MHz */ + { .cap = 215, .power = 334, }, /* 500 MHz */ + { .cap = 258, .power = 407, }, /* 600 MHz */ + { .cap = 301, .power = 447, }, /* 700 MHz */ + { .cap = 344, .power = 549, }, /* 800 MHz */ + { .cap = 387, .power = 761, }, /* 900 MHz */ + { .cap = 430, .power = 1024, }, /* 1000 MHz */ + }; + +static struct capacity_state cap_states_core_a15[] = { + /* Power per cpu */ + { .cap = 426, .power = 2021, }, /* 500 MHz */ + { .cap = 512, .power = 2312, }, /* 600 MHz */ + { .cap = 597, .power = 2756, }, /* 700 MHz */ + { .cap = 682, .power = 3125, }, /* 800 MHz */ + { .cap = 768, .power = 3524, }, /* 900 MHz */ + { .cap = 853, .power = 3846, }, /* 1000 MHz */ + { .cap = 938, .power = 5177, }, /* 1100 MHz */ + { .cap = 1024, .power = 6997, }, /* 1200 MHz */ + }; + +static struct sched_group_energy energy_core_a7 = { + .nr_idle_states = ARRAY_SIZE(idle_states_core_a7), + .idle_states = idle_states_core_a7, + .nr_cap_states = ARRAY_SIZE(cap_states_core_a7), + .cap_states = cap_states_core_a7, +}; + +static struct sched_group_energy energy_core_a15 = { + .nr_idle_states = ARRAY_SIZE(idle_states_core_a15), + .idle_states = idle_states_core_a15, + .nr_cap_states = ARRAY_SIZE(cap_states_core_a15), + .cap_states = cap_states_core_a15, +}; + +/* sd energy functions */ +static inline +const struct sched_group_energy * const cpu_cluster_energy(int cpu) +{ + return cpu_topology[cpu].socket_id ? &energy_cluster_a7 : + &energy_cluster_a15; +} + +static inline +const struct sched_group_energy * const cpu_core_energy(int cpu) +{ + return cpu_topology[cpu].socket_id ? &energy_core_a7 : + &energy_core_a15; +} + static inline int cpu_corepower_flags(void) { - return SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN; + return SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN | \ + SD_SHARE_CAP_STATES; } static struct sched_domain_topology_level arm_topology[] = { #ifdef CONFIG_SCHED_MC - { cpu_corepower_mask, cpu_corepower_flags, SD_INIT_NAME(GMC) }, - { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) }, + { cpu_coregroup_mask, cpu_corepower_flags, cpu_core_energy, SD_INIT_NAME(MC) }, #endif - { cpu_cpu_mask, SD_INIT_NAME(DIE) }, + { cpu_cpu_mask, NULL, cpu_cluster_energy, SD_INIT_NAME(DIE) }, { NULL, }, };
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S index 2465995..11da0f5 100644 --- a/arch/arm/mm/cache-v6.S +++ b/arch/arm/mm/cache-v6.S
@@ -270,6 +270,11 @@ * - end - virtual end address of region */ ENTRY(v6_dma_flush_range) +#ifdef CONFIG_CACHE_FLUSH_RANGE_LIMIT + sub r2, r1, r0 + cmp r2, #CONFIG_CACHE_FLUSH_RANGE_LIMIT + bhi v6_dma_flush_dcache_all +#endif #ifdef CONFIG_DMA_CACHE_RWFO ldrb r2, [r0] @ read for ownership strb r2, [r0] @ write for ownership @@ -292,6 +297,18 @@ mcr p15, 0, r0, c7, c10, 4 @ drain write buffer ret lr +#ifdef CONFIG_CACHE_FLUSH_RANGE_LIMIT +v6_dma_flush_dcache_all: + mov r0, #0 +#ifdef HARVARD_CACHE + mcr p15, 0, r0, c7, c14, 0 @ D cache clean+invalidate +#else + mcr p15, 0, r0, c7, c15, 0 @ Cache clean+invalidate +#endif + mcr p15, 0, r0, c7, c10, 4 @ drain write buffer + mov pc, lr +#endif + /* * dma_map_area(start, size, dir) * - start - kernel virtual start address
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c index f7861dc..a9ef54db 100644 --- a/arch/arm/mm/fault.c +++ b/arch/arm/mm/fault.c
@@ -273,10 +273,10 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs) local_irq_enable(); /* - * If we're in an interrupt or have no user + * If we're in an interrupt, or have no irqs, or have no user * context, we must not take the fault.. */ - if (faulthandler_disabled() || !mm) + if (faulthandler_disabled() || irqs_disabled() || !mm) goto no_context; if (user_mode(regs))
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 3e43874..514dd68 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig
@@ -15,6 +15,7 @@ select ARCH_HAS_SG_CHAIN select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_USE_CMPXCHG_LOCKREF + select ARCH_SUPPORTS_LTO_CLANG select ARCH_SUPPORTS_ATOMIC_RMW select ARCH_SUPPORTS_NUMA_BALANCING select ARCH_WANT_COMPAT_IPC_PARSE_VERSION @@ -109,6 +110,7 @@ select POWER_SUPPLY select SPARSE_IRQ select SYSCTL_EXCEPTION_TRACE + select THREAD_INFO_IN_TASK help ARM 64-bit (AArch64) Linux support. @@ -417,7 +419,7 @@ config ARM64_ERRATUM_843419 bool "Cortex-A53: 843419: A load or store might access an incorrect address" - default y + default y if !LTO_CLANG select ARM64_MODULE_CMODEL_LARGE if MODULES help This option links the kernel with '--fix-cortex-a53-843419' and @@ -852,6 +854,14 @@ If unsure, say Y endif +config ARM64_SW_TTBR0_PAN + bool "Emulate Privileged Access Never using TTBR0_EL1 switching" + help + Enabling this option prevents the kernel from accessing + user-space memory directly by pointing TTBR0_EL1 to a reserved + zeroed area and reserved ASID. The user access routines + restore the valid TTBR0_EL1 temporarily. + menu "ARMv8.1 architectural features" config ARM64_HW_AFDBM @@ -977,7 +987,7 @@ config RANDOMIZE_MODULE_REGION_FULL bool "Randomize the module region independently from the core kernel" - depends on RANDOMIZE_BASE && !DYNAMIC_FTRACE + depends on RANDOMIZE_BASE && !DYNAMIC_FTRACE && !LTO_CLANG default y help Randomizes the location of the module region without considering the @@ -1011,6 +1021,23 @@ entering them here. As a minimum, you should specify the the root device (e.g. root=/dev/nfs). +choice + prompt "Kernel command line type" if CMDLINE != "" + default CMDLINE_FROM_BOOTLOADER + +config CMDLINE_FROM_BOOTLOADER + bool "Use bootloader kernel arguments if available" + help + Uses the command-line options passed by the boot loader. If + the boot loader doesn't provide any, the default kernel command + string provided in CMDLINE will be used. + +config CMDLINE_EXTEND + bool "Extend bootloader kernel arguments" + help + The command-line arguments provided by the boot loader will be + appended to the default kernel command string. + config CMDLINE_FORCE bool "Always use the default kernel command string" help @@ -1018,6 +1045,7 @@ loader passes other arguments to the kernel. This is useful if you cannot or don't want to change the command-line options your boot loader passes to the kernel. +endchoice config EFI_STUB bool @@ -1050,6 +1078,41 @@ However, even with this option, the resultant kernel should continue to boot on existing non-UEFI platforms. +config BUILD_ARM64_APPENDED_DTB_IMAGE + bool "Build a concatenated Image.gz/dtb by default" + depends on OF + help + Enabling this option will cause a concatenated Image.gz and list of + DTBs to be built by default (instead of a standalone Image.gz.) + The image will built in arch/arm64/boot/Image.gz-dtb + +choice + prompt "Appended DTB Kernel Image name" + depends on BUILD_ARM64_APPENDED_DTB_IMAGE + help + Enabling this option will cause a specific kernel image Image or + Image.gz to be used for final image creation. + The image will built in arch/arm64/boot/IMAGE-NAME-dtb + + config IMG_GZ_DTB + bool "Image.gz-dtb" + config IMG_DTB + bool "Image-dtb" +endchoice + +config BUILD_ARM64_APPENDED_KERNEL_IMAGE_NAME + string + depends on BUILD_ARM64_APPENDED_DTB_IMAGE + default "Image.gz-dtb" if IMG_GZ_DTB + default "Image-dtb" if IMG_DTB + +config BUILD_ARM64_APPENDED_DTB_IMAGE_NAMES + string "Default dtb names" + depends on BUILD_ARM64_APPENDED_DTB_IMAGE + help + Space separated list of names of dtbs to append when + building a concatenated Image.gz-dtb. + endmenu menu "Userspace binary formats"
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index 92110c2..a4f7ecb 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile
@@ -26,8 +26,17 @@ ifeq ($(call ld-option, --fix-cortex-a53-843419),) $(warning ld does not support --fix-cortex-a53-843419; kernel may be susceptible to erratum) else + ifeq ($(call gold-ifversion, -lt, 114000000, y), y) +$(warning This version of GNU gold may generate incorrect code with --fix-cortex-a53-843419;\ + see https://sourceware.org/bugzilla/show_bug.cgi?id=21491) + endif LDFLAGS_vmlinux += --fix-cortex-a53-843419 endif +else + ifeq ($(ld-name),gold) +# Pass --no-fix-cortex-a53-843419 to ensure the erratum fix is disabled +LDFLAGS += --no-fix-cortex-a53-843419 + endif endif KBUILD_DEFCONFIG := defconfig @@ -41,9 +50,17 @@ endif endif -KBUILD_CFLAGS += -mgeneral-regs-only $(lseinstr) +ifeq ($(cc-name),clang) +# This is a workaround for https://bugs.llvm.org/show_bug.cgi?id=30792. +# TODO: revert when this is fixed in LLVM. +KBUILD_CFLAGS += -mno-implicit-float +else +KBUILD_CFLAGS += -mgeneral-regs-only +endif +KBUILD_CFLAGS += $(lseinstr) KBUILD_CFLAGS += -fno-asynchronous-unwind-tables KBUILD_CFLAGS += $(call cc-option, -mpc-relative-literal-loads) +KBUILD_CFLAGS += -fno-pic KBUILD_AFLAGS += $(lseinstr) ifeq ($(CONFIG_CPU_BIG_ENDIAN), y) @@ -62,6 +79,10 @@ ifeq ($(CONFIG_ARM64_MODULE_CMODEL_LARGE), y) KBUILD_CFLAGS_MODULE += -mcmodel=large +ifeq ($(CONFIG_LTO_CLANG), y) +# Code model is not stored in LLVM IR, so we need to pass it also to LLVMgold +KBUILD_LDFLAGS_MODULE += -plugin-opt=-code-model=large +endif endif ifeq ($(CONFIG_ARM64_MODULE_PLTS),y) @@ -80,6 +101,10 @@ TEXT_OFFSET := 0x00080000 endif +ifeq ($(cc-name),clang) +KBUILD_CFLAGS += $(call cc-disable-warning, asm-operand-widths) +endif + # KASAN_SHADOW_OFFSET = VA_START + (1 << (VA_BITS - 3)) - (1 << 61) # in 32-bit arithmetic KASAN_SHADOW_OFFSET := $(shell printf "0x%08x00000000\n" $$(( \ @@ -98,7 +123,12 @@ core-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a # Default target when executing plain make +ifeq ($(CONFIG_BUILD_ARM64_APPENDED_DTB_IMAGE),y) +KBUILD_IMAGE := $(subst $\",,$(CONFIG_BUILD_ARM64_APPENDED_KERNEL_IMAGE_NAME)) +else KBUILD_IMAGE := Image.gz +endif + KBUILD_DTBS := dtbs all: $(KBUILD_IMAGE) $(KBUILD_DTBS) @@ -125,6 +155,9 @@ dtbs_install: $(Q)$(MAKE) $(dtbinst)=$(boot)/dts +Image-dtb Image.gz-dtb: vmlinux scripts dtbs + $(Q)$(MAKE) $(build)=$(boot) $(boot)/$@ + PHONY += vdso_install vdso_install: $(Q)$(MAKE) $(build)=arch/arm64/kernel/vdso $@
diff --git a/arch/arm64/boot/.gitignore b/arch/arm64/boot/.gitignore index 8dab0bb..34e3520 100644 --- a/arch/arm64/boot/.gitignore +++ b/arch/arm64/boot/.gitignore
@@ -1,2 +1,4 @@ Image +Image-dtb Image.gz +Image.gz-dtb
diff --git a/arch/arm64/boot/Makefile b/arch/arm64/boot/Makefile index 1f012c5..2c8cb86 100644 --- a/arch/arm64/boot/Makefile +++ b/arch/arm64/boot/Makefile
@@ -14,16 +14,29 @@ # Based on the ia64 boot/Makefile. # +include $(srctree)/arch/arm64/boot/dts/Makefile + OBJCOPYFLAGS_Image :=-O binary -R .note -R .note.gnu.build-id -R .comment -S targets := Image Image.gz +DTB_NAMES := $(subst $\",,$(CONFIG_BUILD_ARM64_APPENDED_DTB_IMAGE_NAMES)) +ifneq ($(DTB_NAMES),) +DTB_LIST := $(addsuffix .dtb,$(DTB_NAMES)) +else +DTB_LIST := $(dtb-y) +endif +DTB_OBJS := $(addprefix $(obj)/dts/,$(DTB_LIST)) + $(obj)/Image: vmlinux FORCE $(call if_changed,objcopy) $(obj)/Image.bz2: $(obj)/Image FORCE $(call if_changed,bzip2) +$(obj)/Image-dtb: $(obj)/Image $(DTB_OBJS) FORCE + $(call if_changed,cat) + $(obj)/Image.gz: $(obj)/Image FORCE $(call if_changed,gzip) @@ -36,6 +49,9 @@ $(obj)/Image.lzo: $(obj)/Image FORCE $(call if_changed,lzo) +$(obj)/Image.gz-dtb: $(obj)/Image.gz $(DTB_OBJS) FORCE + $(call if_changed,cat) + install: $(CONFIG_SHELL) $(srctree)/$(src)/install.sh $(KERNELRELEASE) \ $(obj)/Image System.map "$(INSTALL_PATH)"
diff --git a/arch/arm64/boot/dts/Makefile b/arch/arm64/boot/dts/Makefile index 6684f97c..7ad2cf0 100644 --- a/arch/arm64/boot/dts/Makefile +++ b/arch/arm64/boot/dts/Makefile
@@ -28,3 +28,17 @@ dtb-$(CONFIG_OF_ALL_DTBS) := $(patsubst $(dtstree)/%.dts,%.dtb, $(foreach d,$(dts-dirs), $(wildcard $(dtstree)/$(d)/*.dts))) always := $(dtb-y) + +targets += dtbs + +DTB_NAMES := $(subst $\",,$(CONFIG_BUILD_ARM64_APPENDED_DTB_IMAGE_NAMES)) +ifneq ($(DTB_NAMES),) +DTB_LIST := $(addsuffix .dtb,$(DTB_NAMES)) +else +DTB_LIST := $(dtb-y) +endif +targets += $(DTB_LIST) + +dtbs: $(addprefix $(obj)/, $(DTB_LIST)) + +clean-files := dts/*.dtb *.dtb
diff --git a/arch/arm64/configs/ranchu64_defconfig b/arch/arm64/configs/ranchu64_defconfig new file mode 100644 index 0000000..fc55008 --- /dev/null +++ b/arch/arm64/configs/ranchu64_defconfig
@@ -0,0 +1,312 @@ +# CONFIG_LOCALVERSION_AUTO is not set +# CONFIG_SWAP is not set +CONFIG_POSIX_MQUEUE=y +CONFIG_AUDIT=y +CONFIG_NO_HZ=y +CONFIG_HIGH_RES_TIMERS=y +CONFIG_BSD_PROCESS_ACCT=y +CONFIG_BSD_PROCESS_ACCT_V3=y +CONFIG_TASKSTATS=y +CONFIG_TASK_DELAY_ACCT=y +CONFIG_TASK_XACCT=y +CONFIG_TASK_IO_ACCOUNTING=y +CONFIG_IKCONFIG=y +CONFIG_IKCONFIG_PROC=y +CONFIG_LOG_BUF_SHIFT=14 +CONFIG_CGROUP_DEBUG=y +CONFIG_CGROUP_FREEZER=y +CONFIG_CGROUP_CPUACCT=y +CONFIG_RT_GROUP_SCHED=y +CONFIG_SCHED_AUTOGROUP=y +CONFIG_BLK_DEV_INITRD=y +CONFIG_KALLSYMS_ALL=y +CONFIG_EMBEDDED=y +# CONFIG_COMPAT_BRK is not set +CONFIG_PROFILING=y +CONFIG_ARCH_MMAP_RND_BITS=24 +CONFIG_ARCH_MMAP_RND_COMPAT_BITS=16 +# CONFIG_BLK_DEV_BSG is not set +# CONFIG_IOSCHED_DEADLINE is not set +CONFIG_ARCH_VEXPRESS=y +CONFIG_NR_CPUS=4 +CONFIG_PREEMPT=y +CONFIG_KSM=y +CONFIG_SECCOMP=y +CONFIG_ARMV8_DEPRECATED=y +CONFIG_SWP_EMULATION=y +CONFIG_CP15_BARRIER_EMULATION=y +CONFIG_SETEND_EMULATION=y +CONFIG_CMDLINE="console=ttyAMA0" +# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set +CONFIG_COMPAT=y +CONFIG_PM_AUTOSLEEP=y +CONFIG_PM_WAKELOCKS=y +CONFIG_PM_WAKELOCKS_LIMIT=0 +# CONFIG_PM_WAKELOCKS_GC is not set +CONFIG_PM_DEBUG=y +CONFIG_NET=y +CONFIG_PACKET=y +CONFIG_UNIX=y +CONFIG_XFRM_USER=y +CONFIG_NET_KEY=y +CONFIG_INET=y +CONFIG_INET_DIAG_DESTROY=y +CONFIG_IP_MULTICAST=y +CONFIG_IP_ADVANCED_ROUTER=y +CONFIG_IP_MULTIPLE_TABLES=y +CONFIG_IP_PNP=y +CONFIG_IP_PNP_DHCP=y +CONFIG_IP_PNP_BOOTP=y +CONFIG_INET_ESP=y +# CONFIG_INET_LRO is not set +CONFIG_IPV6_ROUTER_PREF=y +CONFIG_IPV6_ROUTE_INFO=y +CONFIG_IPV6_OPTIMISTIC_DAD=y +CONFIG_INET6_AH=y +CONFIG_INET6_ESP=y +CONFIG_INET6_IPCOMP=y +CONFIG_IPV6_MIP6=y +CONFIG_IPV6_MULTIPLE_TABLES=y +CONFIG_NETFILTER=y +CONFIG_NF_CONNTRACK=y +CONFIG_NF_CONNTRACK_SECMARK=y +CONFIG_NF_CONNTRACK_EVENTS=y +CONFIG_NF_CT_PROTO_DCCP=y +CONFIG_NF_CT_PROTO_SCTP=y +CONFIG_NF_CT_PROTO_UDPLITE=y +CONFIG_NF_CONNTRACK_AMANDA=y +CONFIG_NF_CONNTRACK_FTP=y +CONFIG_NF_CONNTRACK_H323=y +CONFIG_NF_CONNTRACK_IRC=y +CONFIG_NF_CONNTRACK_NETBIOS_NS=y +CONFIG_NF_CONNTRACK_PPTP=y +CONFIG_NF_CONNTRACK_SANE=y +CONFIG_NF_CONNTRACK_TFTP=y +CONFIG_NF_CT_NETLINK=y +CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y +CONFIG_NETFILTER_XT_TARGET_CONNMARK=y +CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y +CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y +CONFIG_NETFILTER_XT_TARGET_MARK=y +CONFIG_NETFILTER_XT_TARGET_NFLOG=y +CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y +CONFIG_NETFILTER_XT_TARGET_TPROXY=y +CONFIG_NETFILTER_XT_TARGET_TRACE=y +CONFIG_NETFILTER_XT_TARGET_SECMARK=y +CONFIG_NETFILTER_XT_TARGET_TCPMSS=y +CONFIG_NETFILTER_XT_MATCH_COMMENT=y +CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y +CONFIG_NETFILTER_XT_MATCH_CONNMARK=y +CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y +CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y +CONFIG_NETFILTER_XT_MATCH_HELPER=y +CONFIG_NETFILTER_XT_MATCH_IPRANGE=y +CONFIG_NETFILTER_XT_MATCH_LENGTH=y +CONFIG_NETFILTER_XT_MATCH_LIMIT=y +CONFIG_NETFILTER_XT_MATCH_MAC=y +CONFIG_NETFILTER_XT_MATCH_MARK=y +CONFIG_NETFILTER_XT_MATCH_POLICY=y +CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y +CONFIG_NETFILTER_XT_MATCH_QTAGUID=y +CONFIG_NETFILTER_XT_MATCH_QUOTA=y +CONFIG_NETFILTER_XT_MATCH_QUOTA2=y +CONFIG_NETFILTER_XT_MATCH_SOCKET=y +CONFIG_NETFILTER_XT_MATCH_STATE=y +CONFIG_NETFILTER_XT_MATCH_STATISTIC=y +CONFIG_NETFILTER_XT_MATCH_STRING=y +CONFIG_NETFILTER_XT_MATCH_TIME=y +CONFIG_NETFILTER_XT_MATCH_U32=y +CONFIG_NF_CONNTRACK_IPV4=y +CONFIG_IP_NF_IPTABLES=y +CONFIG_IP_NF_MATCH_AH=y +CONFIG_IP_NF_MATCH_ECN=y +CONFIG_IP_NF_MATCH_RPFILTER=y +CONFIG_IP_NF_MATCH_TTL=y +CONFIG_IP_NF_FILTER=y +CONFIG_IP_NF_TARGET_REJECT=y +CONFIG_IP_NF_MANGLE=y +CONFIG_IP_NF_TARGET_ECN=y +CONFIG_IP_NF_TARGET_TTL=y +CONFIG_IP_NF_RAW=y +CONFIG_IP_NF_SECURITY=y +CONFIG_IP_NF_ARPTABLES=y +CONFIG_IP_NF_ARPFILTER=y +CONFIG_IP_NF_ARP_MANGLE=y +CONFIG_NF_CONNTRACK_IPV6=y +CONFIG_IP6_NF_IPTABLES=y +CONFIG_IP6_NF_MATCH_AH=y +CONFIG_IP6_NF_MATCH_EUI64=y +CONFIG_IP6_NF_MATCH_FRAG=y +CONFIG_IP6_NF_MATCH_OPTS=y +CONFIG_IP6_NF_MATCH_HL=y +CONFIG_IP6_NF_MATCH_IPV6HEADER=y +CONFIG_IP6_NF_MATCH_MH=y +CONFIG_IP6_NF_MATCH_RT=y +CONFIG_IP6_NF_TARGET_HL=y +CONFIG_IP6_NF_FILTER=y +CONFIG_IP6_NF_TARGET_REJECT=y +CONFIG_IP6_NF_MANGLE=y +CONFIG_IP6_NF_RAW=y +CONFIG_BRIDGE=y +CONFIG_NET_SCHED=y +CONFIG_NET_SCH_HTB=y +CONFIG_NET_CLS_U32=y +CONFIG_NET_EMATCH=y +CONFIG_NET_EMATCH_U32=y +CONFIG_NET_CLS_ACT=y +# CONFIG_WIRELESS is not set +CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" +CONFIG_BLK_DEV_LOOP=y +CONFIG_BLK_DEV_RAM=y +CONFIG_BLK_DEV_RAM_SIZE=8192 +CONFIG_VIRTIO_BLK=y +CONFIG_SCSI=y +# CONFIG_SCSI_PROC_FS is not set +CONFIG_BLK_DEV_SD=y +# CONFIG_SCSI_LOWLEVEL is not set +CONFIG_MD=y +CONFIG_BLK_DEV_DM=y +CONFIG_DM_CRYPT=y +CONFIG_DM_UEVENT=y +CONFIG_DM_VERITY=y +CONFIG_DM_VERITY_FEC=y +CONFIG_NETDEVICES=y +CONFIG_TUN=y +CONFIG_VIRTIO_NET=y +CONFIG_SMC91X=y +CONFIG_PPP=y +CONFIG_PPP_BSDCOMP=y +CONFIG_PPP_DEFLATE=y +CONFIG_PPP_MPPE=y +CONFIG_PPPOLAC=y +CONFIG_PPPOPNS=y +# CONFIG_WLAN is not set +CONFIG_INPUT_EVDEV=y +CONFIG_INPUT_KEYRESET=y +CONFIG_KEYBOARD_GOLDFISH_EVENTS=y +# CONFIG_INPUT_MOUSE is not set +CONFIG_INPUT_JOYSTICK=y +CONFIG_INPUT_TABLET=y +CONFIG_INPUT_MISC=y +CONFIG_INPUT_KEYCHORD=y +CONFIG_INPUT_UINPUT=y +CONFIG_INPUT_GPIO=y +# CONFIG_SERIO_SERPORT is not set +# CONFIG_VT is not set +# CONFIG_LEGACY_PTYS is not set +# CONFIG_DEVMEM is not set +# CONFIG_DEVKMEM is not set +CONFIG_SERIAL_AMBA_PL011=y +CONFIG_SERIAL_AMBA_PL011_CONSOLE=y +CONFIG_VIRTIO_CONSOLE=y +# CONFIG_HW_RANDOM is not set +CONFIG_BATTERY_GOLDFISH=y +# CONFIG_HWMON is not set +CONFIG_MEDIA_SUPPORT=y +CONFIG_FB=y +CONFIG_FB_GOLDFISH=y +CONFIG_FB_SIMPLE=y +CONFIG_BACKLIGHT_LCD_SUPPORT=y +CONFIG_LOGO=y +# CONFIG_LOGO_LINUX_MONO is not set +# CONFIG_LOGO_LINUX_VGA16 is not set +CONFIG_SOUND=y +CONFIG_SND=y +CONFIG_HIDRAW=y +CONFIG_UHID=y +CONFIG_HID_A4TECH=y +CONFIG_HID_ACRUX=y +CONFIG_HID_ACRUX_FF=y +CONFIG_HID_APPLE=y +CONFIG_HID_BELKIN=y +CONFIG_HID_CHERRY=y +CONFIG_HID_CHICONY=y +CONFIG_HID_PRODIKEYS=y +CONFIG_HID_CYPRESS=y +CONFIG_HID_DRAGONRISE=y +CONFIG_DRAGONRISE_FF=y +CONFIG_HID_EMS_FF=y +CONFIG_HID_ELECOM=y +CONFIG_HID_EZKEY=y +CONFIG_HID_KEYTOUCH=y +CONFIG_HID_KYE=y +CONFIG_HID_WALTOP=y +CONFIG_HID_GYRATION=y +CONFIG_HID_TWINHAN=y +CONFIG_HID_KENSINGTON=y +CONFIG_HID_LCPOWER=y +CONFIG_HID_LOGITECH=y +CONFIG_HID_LOGITECH_DJ=y +CONFIG_LOGITECH_FF=y +CONFIG_LOGIRUMBLEPAD2_FF=y +CONFIG_LOGIG940_FF=y +CONFIG_HID_MAGICMOUSE=y +CONFIG_HID_MICROSOFT=y +CONFIG_HID_MONTEREY=y +CONFIG_HID_MULTITOUCH=y +CONFIG_HID_ORTEK=y +CONFIG_HID_PANTHERLORD=y +CONFIG_PANTHERLORD_FF=y +CONFIG_HID_PETALYNX=y +CONFIG_HID_PICOLCD=y +CONFIG_HID_PRIMAX=y +CONFIG_HID_SAITEK=y +CONFIG_HID_SAMSUNG=y +CONFIG_HID_SPEEDLINK=y +CONFIG_HID_SUNPLUS=y +CONFIG_HID_GREENASIA=y +CONFIG_GREENASIA_FF=y +CONFIG_HID_SMARTJOYPLUS=y +CONFIG_SMARTJOYPLUS_FF=y +CONFIG_HID_TIVO=y +CONFIG_HID_TOPSEED=y +CONFIG_HID_THRUSTMASTER=y +CONFIG_HID_WACOM=y +CONFIG_HID_WIIMOTE=y +CONFIG_HID_ZEROPLUS=y +CONFIG_HID_ZYDACRON=y +# CONFIG_USB_SUPPORT is not set +CONFIG_RTC_CLASS=y +CONFIG_VIRTIO_MMIO=y +CONFIG_STAGING=y +CONFIG_ASHMEM=y +CONFIG_ANDROID_TIMED_GPIO=y +CONFIG_ANDROID_LOW_MEMORY_KILLER=y +CONFIG_SYNC=y +CONFIG_SW_SYNC=y +CONFIG_SW_SYNC_USER=y +CONFIG_ION=y +CONFIG_GOLDFISH_AUDIO=y +CONFIG_GOLDFISH=y +CONFIG_GOLDFISH_PIPE=y +# CONFIG_IOMMU_SUPPORT is not set +CONFIG_ANDROID=y +CONFIG_ANDROID_BINDER_IPC=y +CONFIG_EXT2_FS=y +CONFIG_EXT4_FS=y +CONFIG_EXT4_FS_SECURITY=y +CONFIG_QUOTA=y +CONFIG_FUSE_FS=y +CONFIG_CUSE=y +CONFIG_MSDOS_FS=y +CONFIG_VFAT_FS=y +CONFIG_TMPFS=y +CONFIG_TMPFS_POSIX_ACL=y +# CONFIG_MISC_FILESYSTEMS is not set +CONFIG_NFS_FS=y +CONFIG_ROOT_NFS=y +CONFIG_NLS_CODEPAGE_437=y +CONFIG_NLS_ISO8859_1=y +CONFIG_DEBUG_INFO=y +CONFIG_DEBUG_FS=y +CONFIG_MAGIC_SYSRQ=y +CONFIG_PANIC_TIMEOUT=5 +# CONFIG_SCHED_DEBUG is not set +CONFIG_SCHEDSTATS=y +CONFIG_TIMER_STATS=y +# CONFIG_FTRACE is not set +CONFIG_ATOMIC64_SELFTEST=y +CONFIG_DEBUG_RODATA=y +CONFIG_SECURITY=y +CONFIG_SECURITY_NETWORK=y +CONFIG_SECURITY_SELINUX=y
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig index 2cf32e9..2b10984 100644 --- a/arch/arm64/crypto/Kconfig +++ b/arch/arm64/crypto/Kconfig
@@ -53,4 +53,11 @@ tristate "CRC32 and CRC32C using optional ARMv8 instructions" depends on ARM64 select CRYPTO_HASH + +config CRYPTO_SPECK_NEON + tristate "NEON accelerated Speck cipher algorithms" + depends on KERNEL_MODE_NEON + select CRYPTO_BLKCIPHER + select CRYPTO_GF128MUL + select CRYPTO_SPECK endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile index abb79b3..9908765 100644 --- a/arch/arm64/crypto/Makefile +++ b/arch/arm64/crypto/Makefile
@@ -18,7 +18,8 @@ ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o -CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto +aes-ce-cipher-y := aes-ce-cipher-glue.o aes-ce-cipher-core.o +CFLAGS_aes-ce-cipher-core.o += -march=armv8-a+crypto -Wa,-march=armv8-a+crypto $(DISABLE_LTO) obj-$(CONFIG_CRYPTO_AES_ARM64_CE_CCM) += aes-ce-ccm.o aes-ce-ccm-y := aes-ce-ccm-glue.o aes-ce-ccm-core.o @@ -29,6 +30,9 @@ obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o aes-neon-blk-y := aes-glue-neon.o aes-neon.o +obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o +speck-neon-y := speck-neon-core.o speck-neon-glue.o + AFLAGS_aes-ce.o := -DINTERLEAVE=4 AFLAGS_aes-neon.o := -DINTERLEAVE=4
diff --git a/arch/arm64/crypto/aes-ce-cipher.c b/arch/arm64/crypto/aes-ce-cipher-core.c similarity index 77% rename from arch/arm64/crypto/aes-ce-cipher.c rename to arch/arm64/crypto/aes-ce-cipher-core.c index 50d9fe1..9f41917 100644 --- a/arch/arm64/crypto/aes-ce-cipher.c +++ b/arch/arm64/crypto/aes-ce-cipher-core.c
@@ -1,5 +1,5 @@ /* - * aes-ce-cipher.c - core AES cipher using ARMv8 Crypto Extensions + * aes-ce-cipher-core.c - core AES cipher using ARMv8 Crypto Extensions * * Copyright (C) 2013 - 2014 Linaro Ltd <ard.biesheuvel@linaro.org> * @@ -10,16 +10,10 @@ #include <asm/neon.h> #include <crypto/aes.h> -#include <linux/cpufeature.h> #include <linux/crypto.h> -#include <linux/module.h> #include "aes-ce-setkey.h" -MODULE_DESCRIPTION("Synchronous AES cipher using ARMv8 Crypto Extensions"); -MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>"); -MODULE_LICENSE("GPL v2"); - struct aes_block { u8 b[AES_BLOCK_SIZE]; }; @@ -36,7 +30,7 @@ static int num_rounds(struct crypto_aes_ctx *ctx) return 6 + ctx->key_length / 4; } -static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[]) +void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[]) { struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); struct aes_block *out = (struct aes_block *)dst; @@ -81,7 +75,7 @@ static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[]) kernel_neon_end(); } -static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[]) +void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[]) { struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); struct aes_block *out = (struct aes_block *)dst; @@ -223,48 +217,3 @@ int ce_aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key, return 0; } EXPORT_SYMBOL(ce_aes_expandkey); - -int ce_aes_setkey(struct crypto_tfm *tfm, const u8 *in_key, - unsigned int key_len) -{ - struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); - int ret; - - ret = ce_aes_expandkey(ctx, in_key, key_len); - if (!ret) - return 0; - - tfm->crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN; - return -EINVAL; -} -EXPORT_SYMBOL(ce_aes_setkey); - -static struct crypto_alg aes_alg = { - .cra_name = "aes", - .cra_driver_name = "aes-ce", - .cra_priority = 250, - .cra_flags = CRYPTO_ALG_TYPE_CIPHER, - .cra_blocksize = AES_BLOCK_SIZE, - .cra_ctxsize = sizeof(struct crypto_aes_ctx), - .cra_module = THIS_MODULE, - .cra_cipher = { - .cia_min_keysize = AES_MIN_KEY_SIZE, - .cia_max_keysize = AES_MAX_KEY_SIZE, - .cia_setkey = ce_aes_setkey, - .cia_encrypt = aes_cipher_encrypt, - .cia_decrypt = aes_cipher_decrypt - } -}; - -static int __init aes_mod_init(void) -{ - return crypto_register_alg(&aes_alg); -} - -static void __exit aes_mod_exit(void) -{ - crypto_unregister_alg(&aes_alg); -} - -module_cpu_feature_match(AES, aes_mod_init); -module_exit(aes_mod_exit);
diff --git a/arch/arm64/crypto/aes-ce-cipher-glue.c b/arch/arm64/crypto/aes-ce-cipher-glue.c new file mode 100644 index 0000000..442949e --- /dev/null +++ b/arch/arm64/crypto/aes-ce-cipher-glue.c
@@ -0,0 +1,83 @@ +/* + * aes-ce-cipher.c - core AES cipher using ARMv8 Crypto Extensions + * + * Copyright (C) 2013 - 2014 Linaro Ltd <ard.biesheuvel@linaro.org> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include <crypto/aes.h> +#include <linux/cpufeature.h> +#include <linux/crypto.h> +#include <linux/module.h> + +#include "aes-ce-setkey.h" + +MODULE_DESCRIPTION("Synchronous AES cipher using ARMv8 Crypto Extensions"); +MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>"); +MODULE_LICENSE("GPL v2"); + +extern void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[]); +extern void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[]); + +#ifdef CONFIG_CFI_CLANG +static inline void __cfi_aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[]) +{ + aes_cipher_encrypt(tfm, dst, src); +} + +static inline void __cfi_aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[]) +{ + aes_cipher_decrypt(tfm, dst, src); +} + +#define aes_cipher_encrypt __cfi_aes_cipher_encrypt +#define aes_cipher_decrypt __cfi_aes_cipher_decrypt +#endif + +int ce_aes_setkey(struct crypto_tfm *tfm, const u8 *in_key, + unsigned int key_len) +{ + struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); + int ret; + + ret = ce_aes_expandkey(ctx, in_key, key_len); + if (!ret) + return 0; + + tfm->crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN; + return -EINVAL; +} +EXPORT_SYMBOL(ce_aes_setkey); + +static struct crypto_alg aes_alg = { + .cra_name = "aes", + .cra_driver_name = "aes-ce", + .cra_priority = 250, + .cra_flags = CRYPTO_ALG_TYPE_CIPHER, + .cra_blocksize = AES_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct crypto_aes_ctx), + .cra_module = THIS_MODULE, + .cra_cipher = { + .cia_min_keysize = AES_MIN_KEY_SIZE, + .cia_max_keysize = AES_MAX_KEY_SIZE, + .cia_setkey = ce_aes_setkey, + .cia_encrypt = aes_cipher_encrypt, + .cia_decrypt = aes_cipher_decrypt + } +}; + +static int __init aes_mod_init(void) +{ + return crypto_register_alg(&aes_alg); +} + +static void __exit aes_mod_exit(void) +{ + crypto_unregister_alg(&aes_alg); +} + +module_cpu_feature_match(AES, aes_mod_init); +module_exit(aes_mod_exit);
diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S index c98e7e8..85504087 100644 --- a/arch/arm64/crypto/sha1-ce-core.S +++ b/arch/arm64/crypto/sha1-ce-core.S
@@ -82,7 +82,8 @@ ldr dgb, [x0, #16] /* load sha1_ce_state::finalize */ - ldr w4, [x0, #:lo12:sha1_ce_offsetof_finalize] + ldr_l w4, sha1_ce_offsetof_finalize, x4 + ldr w4, [x0, x4] /* load input */ 0: ld1 {v8.4s-v11.4s}, [x1], #64 @@ -132,7 +133,8 @@ * the padding is handled by the C code in that case. */ cbz x4, 3f - ldr x4, [x0, #:lo12:sha1_ce_offsetof_count] + ldr_l w4, sha1_ce_offsetof_count, x4 + ldr x4, [x0, x4] movi v9.2d, #0 mov x8, #0x80000000 movi v10.2d, #0
diff --git a/arch/arm64/crypto/sha1-ce-glue.c b/arch/arm64/crypto/sha1-ce-glue.c index aefda986..f08ecf4 100644 --- a/arch/arm64/crypto/sha1-ce-glue.c +++ b/arch/arm64/crypto/sha1-ce-glue.c
@@ -17,9 +17,6 @@ #include <linux/crypto.h> #include <linux/module.h> -#define ASM_EXPORT(sym, val) \ - asm(".globl " #sym "; .set " #sym ", %0" :: "I"(val)); - MODULE_DESCRIPTION("SHA1 secure hash using ARMv8 Crypto Extensions"); MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>"); MODULE_LICENSE("GPL v2"); @@ -31,6 +28,17 @@ struct sha1_ce_state { asmlinkage void sha1_ce_transform(struct sha1_ce_state *sst, u8 const *src, int blocks); +#ifdef CONFIG_CFI_CLANG +static inline void __cfi_sha1_ce_transform(struct sha1_state *sst, + u8 const *src, int blocks) +{ + sha1_ce_transform((struct sha1_ce_state *)sst, src, blocks); +} +#define sha1_ce_transform __cfi_sha1_ce_transform +#endif + +const u32 sha1_ce_offsetof_count = offsetof(struct sha1_ce_state, sst.count); +const u32 sha1_ce_offsetof_finalize = offsetof(struct sha1_ce_state, finalize); static int sha1_ce_update(struct shash_desc *desc, const u8 *data, unsigned int len) @@ -52,11 +60,6 @@ static int sha1_ce_finup(struct shash_desc *desc, const u8 *data, struct sha1_ce_state *sctx = shash_desc_ctx(desc); bool finalize = !sctx->sst.count && !(len % SHA1_BLOCK_SIZE); - ASM_EXPORT(sha1_ce_offsetof_count, - offsetof(struct sha1_ce_state, sst.count)); - ASM_EXPORT(sha1_ce_offsetof_finalize, - offsetof(struct sha1_ce_state, finalize)); - /* * Allow the asm code to perform the finalization if there is no * partial data and the input is a round multiple of the block size.
diff --git a/arch/arm64/crypto/sha2-ce-core.S b/arch/arm64/crypto/sha2-ce-core.S index 01cfee0..679c6c0 100644 --- a/arch/arm64/crypto/sha2-ce-core.S +++ b/arch/arm64/crypto/sha2-ce-core.S
@@ -88,7 +88,8 @@ ld1 {dgav.4s, dgbv.4s}, [x0] /* load sha256_ce_state::finalize */ - ldr w4, [x0, #:lo12:sha256_ce_offsetof_finalize] + ldr_l w4, sha256_ce_offsetof_finalize, x4 + ldr w4, [x0, x4] /* load input */ 0: ld1 {v16.4s-v19.4s}, [x1], #64 @@ -136,7 +137,8 @@ * the padding is handled by the C code in that case. */ cbz x4, 3f - ldr x4, [x0, #:lo12:sha256_ce_offsetof_count] + ldr_l w4, sha256_ce_offsetof_count, x4 + ldr x4, [x0, x4] movi v17.2d, #0 mov x8, #0x80000000 movi v18.2d, #0
diff --git a/arch/arm64/crypto/sha2-ce-glue.c b/arch/arm64/crypto/sha2-ce-glue.c index 7cd5875..c243221 100644 --- a/arch/arm64/crypto/sha2-ce-glue.c +++ b/arch/arm64/crypto/sha2-ce-glue.c
@@ -17,9 +17,6 @@ #include <linux/crypto.h> #include <linux/module.h> -#define ASM_EXPORT(sym, val) \ - asm(".globl " #sym "; .set " #sym ", %0" :: "I"(val)); - MODULE_DESCRIPTION("SHA-224/SHA-256 secure hash using ARMv8 Crypto Extensions"); MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>"); MODULE_LICENSE("GPL v2"); @@ -31,6 +28,19 @@ struct sha256_ce_state { asmlinkage void sha2_ce_transform(struct sha256_ce_state *sst, u8 const *src, int blocks); +#ifdef CONFIG_CFI_CLANG +static inline void __cfi_sha2_ce_transform(struct sha256_state *sst, + u8 const *src, int blocks) +{ + sha2_ce_transform((struct sha256_ce_state *)sst, src, blocks); +} +#define sha2_ce_transform __cfi_sha2_ce_transform +#endif + +const u32 sha256_ce_offsetof_count = offsetof(struct sha256_ce_state, + sst.count); +const u32 sha256_ce_offsetof_finalize = offsetof(struct sha256_ce_state, + finalize); static int sha256_ce_update(struct shash_desc *desc, const u8 *data, unsigned int len) @@ -52,11 +62,6 @@ static int sha256_ce_finup(struct shash_desc *desc, const u8 *data, struct sha256_ce_state *sctx = shash_desc_ctx(desc); bool finalize = !sctx->sst.count && !(len % SHA256_BLOCK_SIZE); - ASM_EXPORT(sha256_ce_offsetof_count, - offsetof(struct sha256_ce_state, sst.count)); - ASM_EXPORT(sha256_ce_offsetof_finalize, - offsetof(struct sha256_ce_state, finalize)); - /* * Allow the asm code to perform the finalization if there is no * partial data and the input is a round multiple of the block size.
diff --git a/arch/arm64/crypto/speck-neon-core.S b/arch/arm64/crypto/speck-neon-core.S new file mode 100644 index 0000000..b144634 --- /dev/null +++ b/arch/arm64/crypto/speck-neon-core.S
@@ -0,0 +1,352 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * ARM64 NEON-accelerated implementation of Speck128-XTS and Speck64-XTS + * + * Copyright (c) 2018 Google, Inc + * + * Author: Eric Biggers <ebiggers@google.com> + */ + +#include <linux/linkage.h> + + .text + + // arguments + ROUND_KEYS .req x0 // const {u64,u32} *round_keys + NROUNDS .req w1 // int nrounds + NROUNDS_X .req x1 + DST .req x2 // void *dst + SRC .req x3 // const void *src + NBYTES .req w4 // unsigned int nbytes + TWEAK .req x5 // void *tweak + + // registers which hold the data being encrypted/decrypted + // (underscores avoid a naming collision with ARM64 registers x0-x3) + X_0 .req v0 + Y_0 .req v1 + X_1 .req v2 + Y_1 .req v3 + X_2 .req v4 + Y_2 .req v5 + X_3 .req v6 + Y_3 .req v7 + + // the round key, duplicated in all lanes + ROUND_KEY .req v8 + + // index vector for tbl-based 8-bit rotates + ROTATE_TABLE .req v9 + ROTATE_TABLE_Q .req q9 + + // temporary registers + TMP0 .req v10 + TMP1 .req v11 + TMP2 .req v12 + TMP3 .req v13 + + // multiplication table for updating XTS tweaks + GFMUL_TABLE .req v14 + GFMUL_TABLE_Q .req q14 + + // next XTS tweak value(s) + TWEAKV_NEXT .req v15 + + // XTS tweaks for the blocks currently being encrypted/decrypted + TWEAKV0 .req v16 + TWEAKV1 .req v17 + TWEAKV2 .req v18 + TWEAKV3 .req v19 + TWEAKV4 .req v20 + TWEAKV5 .req v21 + TWEAKV6 .req v22 + TWEAKV7 .req v23 + + .align 4 +.Lror64_8_table: + .octa 0x080f0e0d0c0b0a090007060504030201 +.Lror32_8_table: + .octa 0x0c0f0e0d080b0a090407060500030201 +.Lrol64_8_table: + .octa 0x0e0d0c0b0a09080f0605040302010007 +.Lrol32_8_table: + .octa 0x0e0d0c0f0a09080b0605040702010003 +.Lgf128mul_table: + .octa 0x00000000000000870000000000000001 +.Lgf64mul_table: + .octa 0x0000000000000000000000002d361b00 + +/* + * _speck_round_128bytes() - Speck encryption round on 128 bytes at a time + * + * Do one Speck encryption round on the 128 bytes (8 blocks for Speck128, 16 for + * Speck64) stored in X0-X3 and Y0-Y3, using the round key stored in all lanes + * of ROUND_KEY. 'n' is the lane size: 64 for Speck128, or 32 for Speck64. + * 'lanes' is the lane specifier: "2d" for Speck128 or "4s" for Speck64. + */ +.macro _speck_round_128bytes n, lanes + + // x = ror(x, 8) + tbl X_0.16b, {X_0.16b}, ROTATE_TABLE.16b + tbl X_1.16b, {X_1.16b}, ROTATE_TABLE.16b + tbl X_2.16b, {X_2.16b}, ROTATE_TABLE.16b + tbl X_3.16b, {X_3.16b}, ROTATE_TABLE.16b + + // x += y + add X_0.\lanes, X_0.\lanes, Y_0.\lanes + add X_1.\lanes, X_1.\lanes, Y_1.\lanes + add X_2.\lanes, X_2.\lanes, Y_2.\lanes + add X_3.\lanes, X_3.\lanes, Y_3.\lanes + + // x ^= k + eor X_0.16b, X_0.16b, ROUND_KEY.16b + eor X_1.16b, X_1.16b, ROUND_KEY.16b + eor X_2.16b, X_2.16b, ROUND_KEY.16b + eor X_3.16b, X_3.16b, ROUND_KEY.16b + + // y = rol(y, 3) + shl TMP0.\lanes, Y_0.\lanes, #3 + shl TMP1.\lanes, Y_1.\lanes, #3 + shl TMP2.\lanes, Y_2.\lanes, #3 + shl TMP3.\lanes, Y_3.\lanes, #3 + sri TMP0.\lanes, Y_0.\lanes, #(\n - 3) + sri TMP1.\lanes, Y_1.\lanes, #(\n - 3) + sri TMP2.\lanes, Y_2.\lanes, #(\n - 3) + sri TMP3.\lanes, Y_3.\lanes, #(\n - 3) + + // y ^= x + eor Y_0.16b, TMP0.16b, X_0.16b + eor Y_1.16b, TMP1.16b, X_1.16b + eor Y_2.16b, TMP2.16b, X_2.16b + eor Y_3.16b, TMP3.16b, X_3.16b +.endm + +/* + * _speck_unround_128bytes() - Speck decryption round on 128 bytes at a time + * + * This is the inverse of _speck_round_128bytes(). + */ +.macro _speck_unround_128bytes n, lanes + + // y ^= x + eor TMP0.16b, Y_0.16b, X_0.16b + eor TMP1.16b, Y_1.16b, X_1.16b + eor TMP2.16b, Y_2.16b, X_2.16b + eor TMP3.16b, Y_3.16b, X_3.16b + + // y = ror(y, 3) + ushr Y_0.\lanes, TMP0.\lanes, #3 + ushr Y_1.\lanes, TMP1.\lanes, #3 + ushr Y_2.\lanes, TMP2.\lanes, #3 + ushr Y_3.\lanes, TMP3.\lanes, #3 + sli Y_0.\lanes, TMP0.\lanes, #(\n - 3) + sli Y_1.\lanes, TMP1.\lanes, #(\n - 3) + sli Y_2.\lanes, TMP2.\lanes, #(\n - 3) + sli Y_3.\lanes, TMP3.\lanes, #(\n - 3) + + // x ^= k + eor X_0.16b, X_0.16b, ROUND_KEY.16b + eor X_1.16b, X_1.16b, ROUND_KEY.16b + eor X_2.16b, X_2.16b, ROUND_KEY.16b + eor X_3.16b, X_3.16b, ROUND_KEY.16b + + // x -= y + sub X_0.\lanes, X_0.\lanes, Y_0.\lanes + sub X_1.\lanes, X_1.\lanes, Y_1.\lanes + sub X_2.\lanes, X_2.\lanes, Y_2.\lanes + sub X_3.\lanes, X_3.\lanes, Y_3.\lanes + + // x = rol(x, 8) + tbl X_0.16b, {X_0.16b}, ROTATE_TABLE.16b + tbl X_1.16b, {X_1.16b}, ROTATE_TABLE.16b + tbl X_2.16b, {X_2.16b}, ROTATE_TABLE.16b + tbl X_3.16b, {X_3.16b}, ROTATE_TABLE.16b +.endm + +.macro _next_xts_tweak next, cur, tmp, n +.if \n == 64 + /* + * Calculate the next tweak by multiplying the current one by x, + * modulo p(x) = x^128 + x^7 + x^2 + x + 1. + */ + sshr \tmp\().2d, \cur\().2d, #63 + and \tmp\().16b, \tmp\().16b, GFMUL_TABLE.16b + shl \next\().2d, \cur\().2d, #1 + ext \tmp\().16b, \tmp\().16b, \tmp\().16b, #8 + eor \next\().16b, \next\().16b, \tmp\().16b +.else + /* + * Calculate the next two tweaks by multiplying the current ones by x^2, + * modulo p(x) = x^64 + x^4 + x^3 + x + 1. + */ + ushr \tmp\().2d, \cur\().2d, #62 + shl \next\().2d, \cur\().2d, #2 + tbl \tmp\().16b, {GFMUL_TABLE.16b}, \tmp\().16b + eor \next\().16b, \next\().16b, \tmp\().16b +.endif +.endm + +/* + * _speck_xts_crypt() - Speck-XTS encryption/decryption + * + * Encrypt or decrypt NBYTES bytes of data from the SRC buffer to the DST buffer + * using Speck-XTS, specifically the variant with a block size of '2n' and round + * count given by NROUNDS. The expanded round keys are given in ROUND_KEYS, and + * the current XTS tweak value is given in TWEAK. It's assumed that NBYTES is a + * nonzero multiple of 128. + */ +.macro _speck_xts_crypt n, lanes, decrypting + + /* + * If decrypting, modify the ROUND_KEYS parameter to point to the last + * round key rather than the first, since for decryption the round keys + * are used in reverse order. + */ +.if \decrypting + mov NROUNDS, NROUNDS /* zero the high 32 bits */ +.if \n == 64 + add ROUND_KEYS, ROUND_KEYS, NROUNDS_X, lsl #3 + sub ROUND_KEYS, ROUND_KEYS, #8 +.else + add ROUND_KEYS, ROUND_KEYS, NROUNDS_X, lsl #2 + sub ROUND_KEYS, ROUND_KEYS, #4 +.endif +.endif + + // Load the index vector for tbl-based 8-bit rotates +.if \decrypting + ldr ROTATE_TABLE_Q, .Lrol\n\()_8_table +.else + ldr ROTATE_TABLE_Q, .Lror\n\()_8_table +.endif + + // One-time XTS preparation +.if \n == 64 + // Load first tweak + ld1 {TWEAKV0.16b}, [TWEAK] + + // Load GF(2^128) multiplication table + ldr GFMUL_TABLE_Q, .Lgf128mul_table +.else + // Load first tweak + ld1 {TWEAKV0.8b}, [TWEAK] + + // Load GF(2^64) multiplication table + ldr GFMUL_TABLE_Q, .Lgf64mul_table + + // Calculate second tweak, packing it together with the first + ushr TMP0.2d, TWEAKV0.2d, #63 + shl TMP1.2d, TWEAKV0.2d, #1 + tbl TMP0.8b, {GFMUL_TABLE.16b}, TMP0.8b + eor TMP0.8b, TMP0.8b, TMP1.8b + mov TWEAKV0.d[1], TMP0.d[0] +.endif + +.Lnext_128bytes_\@: + + // Calculate XTS tweaks for next 128 bytes + _next_xts_tweak TWEAKV1, TWEAKV0, TMP0, \n + _next_xts_tweak TWEAKV2, TWEAKV1, TMP0, \n + _next_xts_tweak TWEAKV3, TWEAKV2, TMP0, \n + _next_xts_tweak TWEAKV4, TWEAKV3, TMP0, \n + _next_xts_tweak TWEAKV5, TWEAKV4, TMP0, \n + _next_xts_tweak TWEAKV6, TWEAKV5, TMP0, \n + _next_xts_tweak TWEAKV7, TWEAKV6, TMP0, \n + _next_xts_tweak TWEAKV_NEXT, TWEAKV7, TMP0, \n + + // Load the next source blocks into {X,Y}[0-3] + ld1 {X_0.16b-Y_1.16b}, [SRC], #64 + ld1 {X_2.16b-Y_3.16b}, [SRC], #64 + + // XOR the source blocks with their XTS tweaks + eor TMP0.16b, X_0.16b, TWEAKV0.16b + eor Y_0.16b, Y_0.16b, TWEAKV1.16b + eor TMP1.16b, X_1.16b, TWEAKV2.16b + eor Y_1.16b, Y_1.16b, TWEAKV3.16b + eor TMP2.16b, X_2.16b, TWEAKV4.16b + eor Y_2.16b, Y_2.16b, TWEAKV5.16b + eor TMP3.16b, X_3.16b, TWEAKV6.16b + eor Y_3.16b, Y_3.16b, TWEAKV7.16b + + /* + * De-interleave the 'x' and 'y' elements of each block, i.e. make it so + * that the X[0-3] registers contain only the second halves of blocks, + * and the Y[0-3] registers contain only the first halves of blocks. + * (Speck uses the order (y, x) rather than the more intuitive (x, y).) + */ + uzp2 X_0.\lanes, TMP0.\lanes, Y_0.\lanes + uzp1 Y_0.\lanes, TMP0.\lanes, Y_0.\lanes + uzp2 X_1.\lanes, TMP1.\lanes, Y_1.\lanes + uzp1 Y_1.\lanes, TMP1.\lanes, Y_1.\lanes + uzp2 X_2.\lanes, TMP2.\lanes, Y_2.\lanes + uzp1 Y_2.\lanes, TMP2.\lanes, Y_2.\lanes + uzp2 X_3.\lanes, TMP3.\lanes, Y_3.\lanes + uzp1 Y_3.\lanes, TMP3.\lanes, Y_3.\lanes + + // Do the cipher rounds + mov x6, ROUND_KEYS + mov w7, NROUNDS +.Lnext_round_\@: +.if \decrypting + ld1r {ROUND_KEY.\lanes}, [x6] + sub x6, x6, #( \n / 8 ) + _speck_unround_128bytes \n, \lanes +.else + ld1r {ROUND_KEY.\lanes}, [x6], #( \n / 8 ) + _speck_round_128bytes \n, \lanes +.endif + subs w7, w7, #1 + bne .Lnext_round_\@ + + // Re-interleave the 'x' and 'y' elements of each block + zip1 TMP0.\lanes, Y_0.\lanes, X_0.\lanes + zip2 Y_0.\lanes, Y_0.\lanes, X_0.\lanes + zip1 TMP1.\lanes, Y_1.\lanes, X_1.\lanes + zip2 Y_1.\lanes, Y_1.\lanes, X_1.\lanes + zip1 TMP2.\lanes, Y_2.\lanes, X_2.\lanes + zip2 Y_2.\lanes, Y_2.\lanes, X_2.\lanes + zip1 TMP3.\lanes, Y_3.\lanes, X_3.\lanes + zip2 Y_3.\lanes, Y_3.\lanes, X_3.\lanes + + // XOR the encrypted/decrypted blocks with the tweaks calculated earlier + eor X_0.16b, TMP0.16b, TWEAKV0.16b + eor Y_0.16b, Y_0.16b, TWEAKV1.16b + eor X_1.16b, TMP1.16b, TWEAKV2.16b + eor Y_1.16b, Y_1.16b, TWEAKV3.16b + eor X_2.16b, TMP2.16b, TWEAKV4.16b + eor Y_2.16b, Y_2.16b, TWEAKV5.16b + eor X_3.16b, TMP3.16b, TWEAKV6.16b + eor Y_3.16b, Y_3.16b, TWEAKV7.16b + mov TWEAKV0.16b, TWEAKV_NEXT.16b + + // Store the ciphertext in the destination buffer + st1 {X_0.16b-Y_1.16b}, [DST], #64 + st1 {X_2.16b-Y_3.16b}, [DST], #64 + + // Continue if there are more 128-byte chunks remaining + subs NBYTES, NBYTES, #128 + bne .Lnext_128bytes_\@ + + // Store the next tweak and return +.if \n == 64 + st1 {TWEAKV_NEXT.16b}, [TWEAK] +.else + st1 {TWEAKV_NEXT.8b}, [TWEAK] +.endif + ret +.endm + +ENTRY(speck128_xts_encrypt_neon) + _speck_xts_crypt n=64, lanes=2d, decrypting=0 +ENDPROC(speck128_xts_encrypt_neon) + +ENTRY(speck128_xts_decrypt_neon) + _speck_xts_crypt n=64, lanes=2d, decrypting=1 +ENDPROC(speck128_xts_decrypt_neon) + +ENTRY(speck64_xts_encrypt_neon) + _speck_xts_crypt n=32, lanes=4s, decrypting=0 +ENDPROC(speck64_xts_encrypt_neon) + +ENTRY(speck64_xts_decrypt_neon) + _speck_xts_crypt n=32, lanes=4s, decrypting=1 +ENDPROC(speck64_xts_decrypt_neon)
diff --git a/arch/arm64/crypto/speck-neon-glue.c b/arch/arm64/crypto/speck-neon-glue.c new file mode 100644 index 0000000..c7d1856 --- /dev/null +++ b/arch/arm64/crypto/speck-neon-glue.c
@@ -0,0 +1,308 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * NEON-accelerated implementation of Speck128-XTS and Speck64-XTS + * (64-bit version; based on the 32-bit version) + * + * Copyright (c) 2018 Google, Inc + */ + +#include <asm/hwcap.h> +#include <asm/neon.h> +#include <asm/simd.h> +#include <crypto/algapi.h> +#include <crypto/gf128mul.h> +#include <crypto/speck.h> +#include <crypto/xts.h> +#include <linux/kernel.h> +#include <linux/module.h> + +/* The assembly functions only handle multiples of 128 bytes */ +#define SPECK_NEON_CHUNK_SIZE 128 + +/* Speck128 */ + +struct speck128_xts_tfm_ctx { + struct speck128_tfm_ctx main_key; + struct speck128_tfm_ctx tweak_key; +}; + +asmlinkage void speck128_xts_encrypt_neon(const u64 *round_keys, int nrounds, + void *dst, const void *src, + unsigned int nbytes, void *tweak); + +asmlinkage void speck128_xts_decrypt_neon(const u64 *round_keys, int nrounds, + void *dst, const void *src, + unsigned int nbytes, void *tweak); + +typedef void (*speck128_crypt_one_t)(const struct speck128_tfm_ctx *, + u8 *, const u8 *); +typedef void (*speck128_xts_crypt_many_t)(const u64 *, int, void *, + const void *, unsigned int, void *); + +static __always_inline int +__speck128_xts_crypt(struct blkcipher_desc *desc, struct scatterlist *dst, + struct scatterlist *src, unsigned int nbytes, + speck128_crypt_one_t crypt_one, + speck128_xts_crypt_many_t crypt_many) +{ + struct crypto_blkcipher *tfm = desc->tfm; + const struct speck128_xts_tfm_ctx *ctx = crypto_blkcipher_ctx(tfm); + struct blkcipher_walk walk; + le128 tweak; + int err; + + blkcipher_walk_init(&walk, dst, src, nbytes); + err = blkcipher_walk_virt_block(desc, &walk, SPECK_NEON_CHUNK_SIZE); + + crypto_speck128_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv); + + while (walk.nbytes > 0) { + unsigned int nbytes = walk.nbytes; + u8 *dst = walk.dst.virt.addr; + const u8 *src = walk.src.virt.addr; + + if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) { + unsigned int count; + + count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE); + kernel_neon_begin(); + (*crypt_many)(ctx->main_key.round_keys, + ctx->main_key.nrounds, + dst, src, count, &tweak); + kernel_neon_end(); + dst += count; + src += count; + nbytes -= count; + } + + /* Handle any remainder with generic code */ + while (nbytes >= sizeof(tweak)) { + le128_xor((le128 *)dst, (const le128 *)src, &tweak); + (*crypt_one)(&ctx->main_key, dst, dst); + le128_xor((le128 *)dst, (const le128 *)dst, &tweak); + gf128mul_x_ble((be128 *)&tweak, (const be128 *)&tweak); + + dst += sizeof(tweak); + src += sizeof(tweak); + nbytes -= sizeof(tweak); + } + err = blkcipher_walk_done(desc, &walk, nbytes); + } + + return err; +} + +static int speck128_xts_encrypt(struct blkcipher_desc *desc, + struct scatterlist *dst, + struct scatterlist *src, + unsigned int nbytes) +{ + return __speck128_xts_crypt(desc, dst, src, nbytes, + crypto_speck128_encrypt, + speck128_xts_encrypt_neon); +} + +static int speck128_xts_decrypt(struct blkcipher_desc *desc, + struct scatterlist *dst, + struct scatterlist *src, + unsigned int nbytes) +{ + return __speck128_xts_crypt(desc, dst, src, nbytes, + crypto_speck128_decrypt, + speck128_xts_decrypt_neon); +} + +static int speck128_xts_setkey(struct crypto_tfm *tfm, const u8 *key, + unsigned int keylen) +{ + struct speck128_xts_tfm_ctx *ctx = crypto_tfm_ctx(tfm); + int err; + + if (keylen % 2) + return -EINVAL; + + keylen /= 2; + + err = crypto_speck128_setkey(&ctx->main_key, key, keylen); + if (err) + return err; + + return crypto_speck128_setkey(&ctx->tweak_key, key + keylen, keylen); +} + +/* Speck64 */ + +struct speck64_xts_tfm_ctx { + struct speck64_tfm_ctx main_key; + struct speck64_tfm_ctx tweak_key; +}; + +asmlinkage void speck64_xts_encrypt_neon(const u32 *round_keys, int nrounds, + void *dst, const void *src, + unsigned int nbytes, void *tweak); + +asmlinkage void speck64_xts_decrypt_neon(const u32 *round_keys, int nrounds, + void *dst, const void *src, + unsigned int nbytes, void *tweak); + +typedef void (*speck64_crypt_one_t)(const struct speck64_tfm_ctx *, + u8 *, const u8 *); +typedef void (*speck64_xts_crypt_many_t)(const u32 *, int, void *, + const void *, unsigned int, void *); + +static __always_inline int +__speck64_xts_crypt(struct blkcipher_desc *desc, struct scatterlist *dst, + struct scatterlist *src, unsigned int nbytes, + speck64_crypt_one_t crypt_one, + speck64_xts_crypt_many_t crypt_many) +{ + struct crypto_blkcipher *tfm = desc->tfm; + const struct speck64_xts_tfm_ctx *ctx = crypto_blkcipher_ctx(tfm); + struct blkcipher_walk walk; + __le64 tweak; + int err; + + blkcipher_walk_init(&walk, dst, src, nbytes); + err = blkcipher_walk_virt_block(desc, &walk, SPECK_NEON_CHUNK_SIZE); + + crypto_speck64_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv); + + while (walk.nbytes > 0) { + unsigned int nbytes = walk.nbytes; + u8 *dst = walk.dst.virt.addr; + const u8 *src = walk.src.virt.addr; + + if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) { + unsigned int count; + + count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE); + kernel_neon_begin(); + (*crypt_many)(ctx->main_key.round_keys, + ctx->main_key.nrounds, + dst, src, count, &tweak); + kernel_neon_end(); + dst += count; + src += count; + nbytes -= count; + } + + /* Handle any remainder with generic code */ + while (nbytes >= sizeof(tweak)) { + *(__le64 *)dst = *(__le64 *)src ^ tweak; + (*crypt_one)(&ctx->main_key, dst, dst); + *(__le64 *)dst ^= tweak; + tweak = cpu_to_le64((le64_to_cpu(tweak) << 1) ^ + ((tweak & cpu_to_le64(1ULL << 63)) ? + 0x1B : 0)); + dst += sizeof(tweak); + src += sizeof(tweak); + nbytes -= sizeof(tweak); + } + err = blkcipher_walk_done(desc, &walk, nbytes); + } + + return err; +} + +static int speck64_xts_encrypt(struct blkcipher_desc *desc, + struct scatterlist *dst, struct scatterlist *src, + unsigned int nbytes) +{ + return __speck64_xts_crypt(desc, dst, src, nbytes, + crypto_speck64_encrypt, + speck64_xts_encrypt_neon); +} + +static int speck64_xts_decrypt(struct blkcipher_desc *desc, + struct scatterlist *dst, struct scatterlist *src, + unsigned int nbytes) +{ + return __speck64_xts_crypt(desc, dst, src, nbytes, + crypto_speck64_decrypt, + speck64_xts_decrypt_neon); +} + +static int speck64_xts_setkey(struct crypto_tfm *tfm, const u8 *key, + unsigned int keylen) +{ + struct speck64_xts_tfm_ctx *ctx = crypto_tfm_ctx(tfm); + int err; + + if (keylen % 2) + return -EINVAL; + + keylen /= 2; + + err = crypto_speck64_setkey(&ctx->main_key, key, keylen); + if (err) + return err; + + return crypto_speck64_setkey(&ctx->tweak_key, key + keylen, keylen); +} + +static struct crypto_alg speck_algs[] = { + { + .cra_name = "xts(speck128)", + .cra_driver_name = "xts-speck128-neon", + .cra_priority = 300, + .cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER, + .cra_blocksize = SPECK128_BLOCK_SIZE, + .cra_type = &crypto_blkcipher_type, + .cra_ctxsize = sizeof(struct speck128_xts_tfm_ctx), + .cra_alignmask = 7, + .cra_module = THIS_MODULE, + .cra_u = { + .blkcipher = { + .min_keysize = 2 * SPECK128_128_KEY_SIZE, + .max_keysize = 2 * SPECK128_256_KEY_SIZE, + .ivsize = SPECK128_BLOCK_SIZE, + .setkey = speck128_xts_setkey, + .encrypt = speck128_xts_encrypt, + .decrypt = speck128_xts_decrypt, + } + } + }, { + .cra_name = "xts(speck64)", + .cra_driver_name = "xts-speck64-neon", + .cra_priority = 300, + .cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER, + .cra_blocksize = SPECK64_BLOCK_SIZE, + .cra_type = &crypto_blkcipher_type, + .cra_ctxsize = sizeof(struct speck64_xts_tfm_ctx), + .cra_alignmask = 7, + .cra_module = THIS_MODULE, + .cra_u = { + .blkcipher = { + .min_keysize = 2 * SPECK64_96_KEY_SIZE, + .max_keysize = 2 * SPECK64_128_KEY_SIZE, + .ivsize = SPECK64_BLOCK_SIZE, + .setkey = speck64_xts_setkey, + .encrypt = speck64_xts_encrypt, + .decrypt = speck64_xts_decrypt, + } + } + } +}; + +static int __init speck_neon_module_init(void) +{ + if (!(elf_hwcap & HWCAP_ASIMD)) + return -ENODEV; + return crypto_register_algs(speck_algs, ARRAY_SIZE(speck_algs)); +} + +static void __exit speck_neon_module_exit(void) +{ + crypto_unregister_algs(speck_algs, ARRAY_SIZE(speck_algs)); +} + +module_init(speck_neon_module_init); +module_exit(speck_neon_module_exit); + +MODULE_DESCRIPTION("Speck block cipher (NEON-accelerated)"); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>"); +MODULE_ALIAS_CRYPTO("xts(speck128)"); +MODULE_ALIAS_CRYPTO("xts-speck128-neon"); +MODULE_ALIAS_CRYPTO("xts(speck64)"); +MODULE_ALIAS_CRYPTO("xts-speck64-neon");
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild index 44e1d7f..28196b1 100644 --- a/arch/arm64/include/asm/Kbuild +++ b/arch/arm64/include/asm/Kbuild
@@ -1,7 +1,6 @@ generic-y += bugs.h generic-y += clkdev.h generic-y += cputime.h -generic-y += current.h generic-y += delay.h generic-y += div64.h generic-y += dma.h
diff --git a/arch/arm64/include/asm/arch_gicv3.h b/arch/arm64/include/asm/arch_gicv3.h index f8ae6d6..fecf7bf 100644 --- a/arch/arm64/include/asm/arch_gicv3.h +++ b/arch/arm64/include/asm/arch_gicv3.h
@@ -83,14 +83,20 @@ #define read_gicreg(r) \ ({ \ u64 reg; \ - asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \ + asm volatile(DEFINE_MRS_S \ + "mrs_s %0, " __stringify(r) "\n" \ + UNDEFINE_MRS_S \ + : "=r" (reg)); \ reg; \ }) #define write_gicreg(v,r) \ do { \ u64 __val = (v); \ - asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\ + asm volatile(DEFINE_MSR_S \ + "msr_s " __stringify(r) ", %0\n" \ + UNDEFINE_MSR_S \ + : : "r" (__val)); \ } while (0) /* @@ -102,13 +108,19 @@ static inline void gic_write_eoir(u32 irq) { - asm volatile("msr_s " __stringify(ICC_EOIR1_EL1) ", %0" : : "r" ((u64)irq)); + asm volatile(DEFINE_MSR_S + "msr_s " __stringify(ICC_EOIR1_EL1) ", %0\n" + UNDEFINE_MSR_S + : : "r" ((u64)irq)); isb(); } static inline void gic_write_dir(u32 irq) { - asm volatile("msr_s " __stringify(ICC_DIR_EL1) ", %0" : : "r" ((u64)irq)); + asm volatile(DEFINE_MSR_S + "msr_s " __stringify(ICC_DIR_EL1) ", %0\n" + UNDEFINE_MSR_S + : : "r" ((u64)irq)); isb(); } @@ -116,7 +128,10 @@ static inline u64 gic_read_iar_common(void) { u64 irqstat; - asm volatile("mrs_s %0, " __stringify(ICC_IAR1_EL1) : "=r" (irqstat)); + asm volatile(DEFINE_MRS_S + "mrs_s %0, " __stringify(ICC_IAR1_EL1) "\n" + UNDEFINE_MRS_S + : "=r" (irqstat)); dsb(sy); return irqstat; } @@ -135,7 +150,9 @@ static inline u64 gic_read_iar_cavium_thunderx(void) asm volatile( "nop;nop;nop;nop\n\t" "nop;nop;nop;nop\n\t" + DEFINE_MRS_S "mrs_s %0, " __stringify(ICC_IAR1_EL1) "\n\t" + UNDEFINE_MRS_S "nop;nop;nop;nop" : "=r" (irqstat)); mb(); @@ -145,43 +162,68 @@ static inline u64 gic_read_iar_cavium_thunderx(void) static inline void gic_write_pmr(u32 val) { - asm volatile("msr_s " __stringify(ICC_PMR_EL1) ", %0" : : "r" ((u64)val)); + asm volatile(DEFINE_MSR_S + "msr_s " __stringify(ICC_PMR_EL1) ", %0\n" + UNDEFINE_MSR_S + : : "r" ((u64)val)); + /* As per the architecture specification */ + mb(); } static inline void gic_write_ctlr(u32 val) { - asm volatile("msr_s " __stringify(ICC_CTLR_EL1) ", %0" : : "r" ((u64)val)); + asm volatile(DEFINE_MSR_S + "msr_s " __stringify(ICC_CTLR_EL1) ", %0\n" + UNDEFINE_MSR_S + : : "r" ((u64)val)); isb(); } static inline void gic_write_grpen1(u32 val) { - asm volatile("msr_s " __stringify(ICC_GRPEN1_EL1) ", %0" : : "r" ((u64)val)); + asm volatile(DEFINE_MSR_S + "msr_s " __stringify(ICC_GRPEN1_EL1) ", %0\n" + UNDEFINE_MSR_S + : : "r" ((u64)val)); isb(); } static inline void gic_write_sgi1r(u64 val) { - asm volatile("msr_s " __stringify(ICC_SGI1R_EL1) ", %0" : : "r" (val)); + asm volatile(DEFINE_MSR_S + "msr_s " __stringify(ICC_SGI1R_EL1) ", %0\n" + UNDEFINE_MSR_S + : : "r" (val)); + /* As per the architecture specification */ + mb(); } static inline u32 gic_read_sre(void) { u64 val; - asm volatile("mrs_s %0, " __stringify(ICC_SRE_EL1) : "=r" (val)); + asm volatile(DEFINE_MRS_S + "mrs_s %0, " __stringify(ICC_SRE_EL1) "\n" + UNDEFINE_MRS_S + : "=r" (val)); return val; } static inline void gic_write_sre(u32 val) { - asm volatile("msr_s " __stringify(ICC_SRE_EL1) ", %0" : : "r" ((u64)val)); + asm volatile(DEFINE_MSR_S + "msr_s " __stringify(ICC_SRE_EL1) ", %0\n" + UNDEFINE_MSR_S + : : "r" ((u64)val)); isb(); } static inline void gic_write_bpr1(u32 val) { - asm volatile("msr_s " __stringify(ICC_BPR1_EL1) ", %0" : : "r" (val)); + asm volatile(DEFINE_MSR_S + "msr_s " __stringify(ICC_BPR1_EL1) ", %x0\n" + UNDEFINE_MSR_S + : : "rZ" (val)); } #define gic_read_typer(c) readq_relaxed(c)
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index 3f85bbc..148849c 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h
@@ -42,6 +42,15 @@ msr daifclr, #2 .endm + .macro save_and_disable_irq, flags + mrs \flags, daif + msr daifset, #2 + .endm + + .macro restore_irq, flags + msr daif, \flags + .endm + /* * Enable and disable debug exceptions. */ @@ -451,6 +460,13 @@ alternative_endif movk \reg, :abs_g0_nc:\val .endm +/* + * Return the current thread_info. + */ + .macro get_thread_info, rd + mrs \rd, sp_el0 + .endm + .macro pte_to_phys, phys, pte and \phys, \pte, #(((1 << (48 - PAGE_SHIFT)) - 1) << PAGE_SHIFT) .endm
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 15868eca..1dc16f5 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h
@@ -221,6 +221,12 @@ static inline bool system_supports_mixed_endian_el0(void) return id_aa64mmfr0_mixed_endian_el0(read_system_reg(SYS_ID_AA64MMFR0_EL1)); } +static inline bool system_uses_ttbr0_pan(void) +{ + return IS_ENABLED(CONFIG_ARM64_SW_TTBR0_PAN) && + !cpus_have_cap(ARM64_HAS_PAN); +} + #define ARM64_SSBD_UNKNOWN -1 #define ARM64_SSBD_FORCE_DISABLE 0 #define ARM64_SSBD_KERNEL 1
diff --git a/arch/arm64/include/asm/current.h b/arch/arm64/include/asm/current.h new file mode 100644 index 0000000..86c4041 --- /dev/null +++ b/arch/arm64/include/asm/current.h
@@ -0,0 +1,30 @@ +#ifndef __ASM_CURRENT_H +#define __ASM_CURRENT_H + +#include <linux/compiler.h> + +#include <asm/sysreg.h> + +#ifndef __ASSEMBLY__ + +struct task_struct; + +/* + * We don't use read_sysreg() as we want the compiler to cache the value where + * possible. + */ +static __always_inline struct task_struct *get_current(void) +{ + unsigned long sp_el0; + + asm ("mrs %0, sp_el0" : "=r" (sp_el0)); + + return (struct task_struct *)sp_el0; +} + +#define current get_current() + +#endif /* __ASSEMBLY__ */ + +#endif /* __ASM_CURRENT_H */ +
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h index a9e54aa..6c4f9e8 100644 --- a/arch/arm64/include/asm/efi.h +++ b/arch/arm64/include/asm/efi.h
@@ -1,6 +1,7 @@ #ifndef _ASM_EFI_H #define _ASM_EFI_H +#include <asm/cpufeature.h> #include <asm/io.h> #include <asm/mmu_context.h> #include <asm/neon.h> @@ -54,6 +55,9 @@ int efi_set_mapping_permissions(struct mm_struct *mm, efi_memory_desc_t *md); #define alloc_screen_info(x...) &screen_info #define free_screen_info(x...) +/* redeclare as 'hidden' so the compiler will generate relative references */ +extern struct screen_info screen_info __attribute__((__visibility__("hidden"))); + static inline void efifb_setup_from_dmi(struct screen_info *si, const char *opt) { } @@ -75,7 +79,32 @@ static inline void efifb_setup_from_dmi(struct screen_info *si, const char *opt) static inline void efi_set_pgd(struct mm_struct *mm) { - switch_mm(NULL, mm, NULL); + __switch_mm(mm); + + if (system_uses_ttbr0_pan()) { + if (mm != current->active_mm) { + /* + * Update the current thread's saved ttbr0 since it is + * restored as part of a return from exception. Enable + * access to the valid TTBR0_EL1 and invoke the errata + * workaround directly since there is no return from + * exception when invoking the EFI run-time services. + */ + update_saved_ttbr0(current, mm); + uaccess_ttbr0_enable(); + post_ttbr_update_workaround(); + } else { + /* + * Defer the switch to the current thread's TTBR0_EL1 + * until uaccess_enable(). Restore the current + * thread's saved ttbr0 corresponding to its active_mm + * (if different from init_mm). + */ + uaccess_ttbr0_disable(); + if (current->active_mm != &init_mm) + update_saved_ttbr0(current, current->active_mm); + } + } } void efi_virtmap_load(void);
diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h index 1fb02307..40a8a94 100644 --- a/arch/arm64/include/asm/elf.h +++ b/arch/arm64/include/asm/elf.h
@@ -169,7 +169,7 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm, #ifdef CONFIG_COMPAT /* PIE load location for compat arm. Must match ARM ELF_ET_DYN_BASE. */ -#define COMPAT_ELF_ET_DYN_BASE 0x000400000UL +#define COMPAT_ELF_ET_DYN_BASE (2 * TASK_SIZE_32 / 3) /* AArch32 registers. */ #define COMPAT_ELF_NGREG 18
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h index d14c478..85997c0 100644 --- a/arch/arm64/include/asm/esr.h +++ b/arch/arm64/include/asm/esr.h
@@ -175,6 +175,12 @@ #define ESR_ELx_SYS64_ISS_SYS_CTR_READ (ESR_ELx_SYS64_ISS_SYS_CTR | \ ESR_ELx_SYS64_ISS_DIR_READ) +#define ESR_ELx_SYS64_ISS_SYS_CNTVCT (ESR_ELx_SYS64_ISS_SYS_VAL(3, 3, 2, 14, 0) | \ + ESR_ELx_SYS64_ISS_DIR_READ) + +#define ESR_ELx_SYS64_ISS_SYS_CNTFRQ (ESR_ELx_SYS64_ISS_SYS_VAL(3, 3, 0, 14, 0) | \ + ESR_ELx_SYS64_ISS_DIR_READ) + #ifndef __ASSEMBLY__ #include <asm/types.h>
diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h index 2a5090f..a891bb6 100644 --- a/arch/arm64/include/asm/futex.h +++ b/arch/arm64/include/asm/futex.h
@@ -21,15 +21,12 @@ #include <linux/futex.h> #include <linux/uaccess.h> -#include <asm/alternative.h> -#include <asm/cpufeature.h> #include <asm/errno.h> -#include <asm/sysreg.h> #define __futex_atomic_op(insn, ret, oldval, uaddr, tmp, oparg) \ +do { \ + uaccess_enable(); \ asm volatile( \ - ALTERNATIVE("nop", SET_PSTATE_PAN(0), ARM64_HAS_PAN, \ - CONFIG_ARM64_PAN) \ " prfm pstl1strm, %2\n" \ "1: ldxr %w1, %2\n" \ insn "\n" \ @@ -44,11 +41,11 @@ " .popsection\n" \ _ASM_EXTABLE(1b, 4b) \ _ASM_EXTABLE(2b, 4b) \ - ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_HAS_PAN, \ - CONFIG_ARM64_PAN) \ : "=&r" (ret), "=&r" (oldval), "+Q" (*uaddr), "=&r" (tmp) \ : "r" (oparg), "Ir" (-EFAULT) \ - : "memory") + : "memory"); \ + uaccess_disable(); \ +} while (0) static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr) @@ -102,8 +99,8 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *_uaddr, return -EFAULT; uaddr = __uaccess_mask_ptr(_uaddr); + uaccess_enable(); asm volatile("// futex_atomic_cmpxchg_inatomic\n" -ALTERNATIVE("nop", SET_PSTATE_PAN(0), ARM64_HAS_PAN, CONFIG_ARM64_PAN) " prfm pstl1strm, %2\n" "1: ldxr %w1, %2\n" " sub %w3, %w1, %w4\n" @@ -118,10 +115,10 @@ ALTERNATIVE("nop", SET_PSTATE_PAN(0), ARM64_HAS_PAN, CONFIG_ARM64_PAN) " .popsection\n" _ASM_EXTABLE(1b, 4b) _ASM_EXTABLE(2b, 4b) -ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN) : "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp) : "r" (oldval), "r" (newval), "Ir" (-EFAULT) : "memory"); + uaccess_disable(); *uval = val; return ret;
diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h index 9510ace..b6b167a 100644 --- a/arch/arm64/include/asm/hw_breakpoint.h +++ b/arch/arm64/include/asm/hw_breakpoint.h
@@ -77,7 +77,11 @@ static inline void decode_ctrl_reg(u32 reg, /* Lengths */ #define ARM_BREAKPOINT_LEN_1 0x1 #define ARM_BREAKPOINT_LEN_2 0x3 +#define ARM_BREAKPOINT_LEN_3 0x7 #define ARM_BREAKPOINT_LEN_4 0xf +#define ARM_BREAKPOINT_LEN_5 0x1f +#define ARM_BREAKPOINT_LEN_6 0x3f +#define ARM_BREAKPOINT_LEN_7 0x7f #define ARM_BREAKPOINT_LEN_8 0xff /* Kernel stepping */ @@ -119,7 +123,7 @@ struct perf_event; struct pmu; extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, - int *gen_len, int *gen_type); + int *gen_len, int *gen_type, int *offset); extern int arch_check_bp_in_kernelspace(struct perf_event *bp); extern int arch_validate_hwbkpt_settings(struct perf_event *bp); extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h index 7e51d1b..7803343 100644 --- a/arch/arm64/include/asm/kernel-pgtable.h +++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -19,6 +19,7 @@ #ifndef __ASM_KERNEL_PGTABLE_H #define __ASM_KERNEL_PGTABLE_H +#include <asm/pgtable.h> #include <asm/sparsemem.h> /* @@ -54,6 +55,12 @@ #define SWAPPER_DIR_SIZE (SWAPPER_PGTABLE_LEVELS * PAGE_SIZE) #define IDMAP_DIR_SIZE (IDMAP_PGTABLE_LEVELS * PAGE_SIZE) +#ifdef CONFIG_ARM64_SW_TTBR0_PAN +#define RESERVED_TTBR0_SIZE (PAGE_SIZE) +#else +#define RESERVED_TTBR0_SIZE (0) +#endif + /* Initial memory map size */ #if ARM64_SWAPPER_USES_SECTION_MAPS #define SWAPPER_BLOCK_SHIFT SECTION_SHIFT
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h index b18e852..36d6758 100644 --- a/arch/arm64/include/asm/kvm_hyp.h +++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -29,7 +29,9 @@ ({ \ u64 reg; \ asm volatile(ALTERNATIVE("mrs %0, " __stringify(r##nvh),\ - "mrs_s %0, " __stringify(r##vh),\ + DEFINE_MRS_S \ + "mrs_s %0, " __stringify(r##vh) "\n"\ + UNDEFINE_MRS_S, \ ARM64_HAS_VIRT_HOST_EXTN) \ : "=r" (reg)); \ reg; \ @@ -39,7 +41,9 @@ do { \ u64 __val = (u64)(v); \ asm volatile(ALTERNATIVE("msr " __stringify(r##nvh) ", %x0",\ - "msr_s " __stringify(r##vh) ", %x0",\ + DEFINE_MSR_S \ + "msr_s " __stringify(r##vh) ", %x0\n"\ + UNDEFINE_MSR_S, \ ARM64_HAS_VIRT_HOST_EXTN) \ : : "rZ" (__val)); \ } while (0)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 547519a..4287acb 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -47,7 +47,7 @@ * If the page is in the bottom half, we have to use the top half. If * the page is in the top half, we have to use the bottom half: * - * T = __virt_to_phys(__hyp_idmap_text_start) + * T = __pa_symbol(__hyp_idmap_text_start) * if (T & BIT(VA_BITS - 1)) * HYP_VA_MIN = 0 //idmap in upper half * else @@ -290,7 +290,7 @@ static inline void __kvm_flush_dcache_pud(pud_t pud) kvm_flush_dcache_to_poc(page_address(page), PUD_SIZE); } -#define kvm_virt_to_phys(x) __virt_to_phys((unsigned long)(x)) +#define kvm_virt_to_phys(x) __pa_symbol(x) void kvm_set_way_flush(struct kvm_vcpu *vcpu); void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h index ba917be..3450209 100644 --- a/arch/arm64/include/asm/memory.h +++ b/arch/arm64/include/asm/memory.h
@@ -152,6 +152,11 @@ extern u64 kimage_vaddr; /* the offset between the kernel virtual and physical mappings */ extern u64 kimage_voffset; +static inline unsigned long kaslr_offset(void) +{ + return kimage_vaddr - KIMAGE_VADDR; +} + /* * Allow all memory at the discovery stage. We will clip it later. */ @@ -192,6 +197,7 @@ static inline void *phys_to_virt(phys_addr_t x) #define __va(x) ((void *)__phys_to_virt((phys_addr_t)(x))) #define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) #define virt_to_pfn(x) __phys_to_pfn(__virt_to_phys(x)) +#define sym_to_pfn(x) __phys_to_pfn(__pa_symbol(x)) /* * virt_to_page(k) convert a _valid_ virtual address to struct page *
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h index 6ac34c7..18b9f15 100644 --- a/arch/arm64/include/asm/mmu.h +++ b/arch/arm64/include/asm/mmu.h
@@ -16,7 +16,9 @@ #ifndef __ASM_MMU_H #define __ASM_MMU_H + #define USER_ASID_FLAG (UL(1) << 48) +#define TTBR_ASID_MASK (UL(0xffff) << 48) #ifndef __ASSEMBLY__
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h index b96c4799..c0ae91c 100644 --- a/arch/arm64/include/asm/mmu_context.h +++ b/arch/arm64/include/asm/mmu_context.h
@@ -23,6 +23,7 @@ #include <linux/sched.h> #include <asm/cacheflush.h> +#include <asm/cpufeature.h> #include <asm/proc-fns.h> #include <asm-generic/mm_hooks.h> #include <asm/cputype.h> @@ -44,7 +45,7 @@ static inline void contextidr_thread_switch(struct task_struct *next) */ static inline void cpu_set_reserved_ttbr0(void) { - unsigned long ttbr = virt_to_phys(empty_zero_page); + unsigned long ttbr = __pa_symbol(empty_zero_page); write_sysreg(ttbr, ttbr0_el1); isb(); @@ -110,7 +111,7 @@ static inline void cpu_uninstall_idmap(void) local_flush_tlb_all(); cpu_set_default_tcr_t0sz(); - if (mm != &init_mm) + if (mm != &init_mm && !system_uses_ttbr0_pan()) cpu_switch_mm(mm->pgd, mm); } @@ -120,14 +121,14 @@ static inline void cpu_install_idmap(void) local_flush_tlb_all(); cpu_set_idmap_tcr_t0sz(); - cpu_switch_mm(idmap_pg_dir, &init_mm); + cpu_switch_mm(lm_alias(idmap_pg_dir), &init_mm); } /* * Atomically replaces the active TTBR1_EL1 PGD with a new VA-compatible PGD, * avoiding the possibility of conflicting TLB entries being allocated. */ -static inline void cpu_replace_ttbr1(pgd_t *pgd) +static inline void __nocfi cpu_replace_ttbr1(pgd_t *pgd) { typedef void (ttbr_replace_func)(phys_addr_t); extern ttbr_replace_func idmap_cpu_replace_ttbr1; @@ -135,7 +136,7 @@ static inline void cpu_replace_ttbr1(pgd_t *pgd) phys_addr_t pgd_phys = virt_to_phys(pgd); - replace_phys = (void *)virt_to_phys(idmap_cpu_replace_ttbr1); + replace_phys = (void *)__pa_symbol(idmap_cpu_replace_ttbr1); cpu_install_idmap(); replace_phys(pgd_phys); @@ -170,21 +171,28 @@ enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk) { } -/* - * This is the actual mm switch as far as the scheduler - * is concerned. No registers are touched. We avoid - * calling the CPU specific function when the mm hasn't - * actually changed. - */ -static inline void -switch_mm(struct mm_struct *prev, struct mm_struct *next, - struct task_struct *tsk) +#ifdef CONFIG_ARM64_SW_TTBR0_PAN +static inline void update_saved_ttbr0(struct task_struct *tsk, + struct mm_struct *mm) +{ + if (system_uses_ttbr0_pan()) { + u64 ttbr; + BUG_ON(mm->pgd == swapper_pg_dir); + ttbr = virt_to_phys(mm->pgd) | ASID(mm) << 48; + WRITE_ONCE(task_thread_info(tsk)->ttbr0, ttbr); + } +} +#else +static inline void update_saved_ttbr0(struct task_struct *tsk, + struct mm_struct *mm) +{ +} +#endif + +static inline void __switch_mm(struct mm_struct *next) { unsigned int cpu = smp_processor_id(); - if (prev == next) - return; - /* * init_mm.pgd does not contain any user mappings and it is always * active for kernel addresses in TTBR1. Just set the reserved TTBR0. @@ -197,9 +205,28 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next, check_and_switch_context(next, cpu); } +static inline void +switch_mm(struct mm_struct *prev, struct mm_struct *next, + struct task_struct *tsk) +{ + if (prev != next) + __switch_mm(next); + + /* + * Update the saved TTBR0_EL1 of the scheduled-in task as the previous + * value may have not been initialised yet (activate_mm caller) or the + * ASID has changed since the last run (following the context switch + * of another thread of the same process). Avoid setting the reserved + * TTBR0_EL1 to swapper_pg_dir (init_mm; e.g. via idle_task_exit). + */ + if (next != &init_mm) + update_saved_ttbr0(tsk, next); +} + #define deactivate_mm(tsk,mm) do { } while (0) -#define activate_mm(prev,next) switch_mm(prev, next, NULL) +#define activate_mm(prev,next) switch_mm(prev, next, current) void verify_cpu_asid_bits(void); +void post_ttbr_update_workaround(void); #endif
diff --git a/arch/arm64/include/asm/percpu.h b/arch/arm64/include/asm/percpu.h index 0d55157..a2f6bd2 100644 --- a/arch/arm64/include/asm/percpu.h +++ b/arch/arm64/include/asm/percpu.h
@@ -16,6 +16,7 @@ #ifndef __ASM_PERCPU_H #define __ASM_PERCPU_H +#include <asm/stack_pointer.h> #include <asm/alternative.h> static inline void set_my_cpu_offset(unsigned long off)
diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h index 38b6a2b..8d5cbec 100644 --- a/arch/arm64/include/asm/perf_event.h +++ b/arch/arm64/include/asm/perf_event.h
@@ -17,6 +17,8 @@ #ifndef __ASM_PERF_EVENT_H #define __ASM_PERF_EVENT_H +#include <asm/stack_pointer.h> + #define ARMV8_PMU_MAX_COUNTERS 32 #define ARMV8_PMU_COUNTER_MASK (ARMV8_PMU_MAX_COUNTERS - 1)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 3a30a39..7bcd8b9 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h
@@ -52,7 +52,7 @@ extern void __pgd_error(const char *file, int line, unsigned long val); * for zero-mapped memory areas etc.. */ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)]; -#define ZERO_PAGE(vaddr) pfn_to_page(PHYS_PFN(__pa(empty_zero_page))) +#define ZERO_PAGE(vaddr) phys_to_page(__pa_symbol(empty_zero_page)) #define pte_ERROR(pte) __pte_error(__FILE__, __LINE__, pte_val(pte))
diff --git a/arch/arm64/include/asm/signal32.h b/arch/arm64/include/asm/signal32.h index eeaa975..81abea0 100644 --- a/arch/arm64/include/asm/signal32.h +++ b/arch/arm64/include/asm/signal32.h
@@ -22,8 +22,6 @@ #define AARCH32_KERN_SIGRET_CODE_OFFSET 0x500 -extern const compat_ulong_t aarch32_sigret_code[6]; - int compat_setup_frame(int usig, struct ksignal *ksig, sigset_t *set, struct pt_regs *regs); int compat_setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set,
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h index 0226447..d050d72 100644 --- a/arch/arm64/include/asm/smp.h +++ b/arch/arm64/include/asm/smp.h
@@ -29,11 +29,22 @@ #ifndef __ASSEMBLY__ +#include <asm/percpu.h> + #include <linux/threads.h> #include <linux/cpumask.h> #include <linux/thread_info.h> -#define raw_smp_processor_id() (current_thread_info()->cpu) +DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number); + +/* + * We don't use this_cpu_read(cpu_number) as that has implicit writes to + * preempt_count, and associated (compiler) barriers, that we'd like to avoid + * the expense of. If we're preemptible, the value can be stale at use anyway. + * And we can't use this_cpu_ptr() either, as that winds up recursing back + * here under CONFIG_DEBUG_PREEMPT=y. + */ +#define raw_smp_processor_id() (*raw_cpu_ptr(&cpu_number)) struct seq_file; @@ -73,6 +84,7 @@ asmlinkage void secondary_start_kernel(void); */ struct secondary_data { void *stack; + struct task_struct *task; long status; };
diff --git a/arch/arm64/include/asm/stack_pointer.h b/arch/arm64/include/asm/stack_pointer.h new file mode 100644 index 0000000..ffcdf74 --- /dev/null +++ b/arch/arm64/include/asm/stack_pointer.h
@@ -0,0 +1,9 @@ +#ifndef __ASM_STACK_POINTER_H +#define __ASM_STACK_POINTER_H + +/* + * how to get the current stack pointer from C + */ +register unsigned long current_stack_pointer asm ("sp"); + +#endif /* __ASM_STACK_POINTER_H */
diff --git a/arch/arm64/include/asm/suspend.h b/arch/arm64/include/asm/suspend.h index b8a313f..de5600f 100644 --- a/arch/arm64/include/asm/suspend.h +++ b/arch/arm64/include/asm/suspend.h
@@ -1,7 +1,7 @@ #ifndef __ASM_SUSPEND_H #define __ASM_SUSPEND_H -#define NR_CTX_REGS 10 +#define NR_CTX_REGS 12 #define NR_CALLEE_SAVED_REGS 12 /*
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 88bbe36..8546afc 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h
@@ -246,20 +246,39 @@ #include <linux/types.h> -asm( -" .irp num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30\n" -" .equ .L__reg_num_x\\num, \\num\n" -" .endr\n" +#define __DEFINE_MRS_MSR_S_REGNUM \ +" .irp num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30\n" \ +" .equ .L__reg_num_x\\num, \\num\n" \ +" .endr\n" \ " .equ .L__reg_num_xzr, 31\n" -"\n" -" .macro mrs_s, rt, sreg\n" -" .inst 0xd5200000|(\\sreg)|(.L__reg_num_\\rt)\n" + +#define DEFINE_MRS_S \ + __DEFINE_MRS_MSR_S_REGNUM \ +" .macro mrs_s, rt, sreg\n" \ +" .inst 0xd5200000|(\\sreg)|(.L__reg_num_\\rt)\n" \ " .endm\n" -"\n" -" .macro msr_s, sreg, rt\n" -" .inst 0xd5000000|(\\sreg)|(.L__reg_num_\\rt)\n" + +#define DEFINE_MSR_S \ + __DEFINE_MRS_MSR_S_REGNUM \ +" .macro msr_s, sreg, rt\n" \ +" .inst 0xd5000000|(\\sreg)|(.L__reg_num_\\rt)\n" \ " .endm\n" -); + +#define UNDEFINE_MRS_S \ +" .purgem mrs_s\n" + +#define UNDEFINE_MSR_S \ +" .purgem msr_s\n" + +#define __mrs_s(r, v) \ + DEFINE_MRS_S \ +" mrs_s %0, " __stringify(r) "\n" \ + UNDEFINE_MRS_S : "=r" (v) + +#define __msr_s(r, v) \ + DEFINE_MSR_S \ +" msr_s " __stringify(r) ", %x0\n" \ + UNDEFINE_MSR_S : : "rZ" (v) /* * Unlike read_cpuid, calls to read_sysreg are never expected to be @@ -285,15 +304,15 @@ asm( * For registers without architectural names, or simply unsupported by * GAS. */ -#define read_sysreg_s(r) ({ \ - u64 __val; \ - asm volatile("mrs_s %0, " __stringify(r) : "=r" (__val)); \ - __val; \ +#define read_sysreg_s(r) ({ \ + u64 __val; \ + asm volatile(__mrs_s(r, __val)); \ + __val; \ }) -#define write_sysreg_s(v, r) do { \ - u64 __val = (u64)v; \ - asm volatile("msr_s " __stringify(r) ", %x0" : : "rZ" (__val)); \ +#define write_sysreg_s(v, r) do { \ + u64 __val = (u64)(v); \ + asm volatile(__msr_s(r, __val)); \ } while (0) static inline void config_sctlr_el1(u32 clear, u32 set)
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h index 0dd1bc1..bbd5dc7 100644 --- a/arch/arm64/include/asm/thread_info.h +++ b/arch/arm64/include/asm/thread_info.h
@@ -36,58 +36,31 @@ struct task_struct; +#include <asm/stack_pointer.h> #include <asm/types.h> typedef unsigned long mm_segment_t; /* * low level task data that entry.S needs immediate access to. - * __switch_to() assumes cpu_context follows immediately after cpu_domain. */ struct thread_info { unsigned long flags; /* low level flags */ mm_segment_t addr_limit; /* address limit */ - struct task_struct *task; /* main task structure */ +#ifdef CONFIG_ARM64_SW_TTBR0_PAN + u64 ttbr0; /* saved TTBR0_EL1 */ +#endif int preempt_count; /* 0 => preemptable, <0 => bug */ - int cpu; /* cpu */ }; #define INIT_THREAD_INFO(tsk) \ { \ - .task = &tsk, \ - .flags = 0, \ .preempt_count = INIT_PREEMPT_COUNT, \ .addr_limit = KERNEL_DS, \ } -#define init_thread_info (init_thread_union.thread_info) #define init_stack (init_thread_union.stack) -/* - * how to get the current stack pointer from C - */ -register unsigned long current_stack_pointer asm ("sp"); - -/* - * how to get the thread information struct from C - */ -static inline struct thread_info *current_thread_info(void) __attribute_const__; - -/* - * struct thread_info can be accessed directly via sp_el0. - * - * We don't use read_sysreg() as we want the compiler to cache the value where - * possible. - */ -static inline struct thread_info *current_thread_info(void) -{ - unsigned long sp_el0; - - asm ("mrs %0, sp_el0" : "=r" (sp_el0)); - - return (struct thread_info *)sp_el0; -} - #define thread_saved_pc(tsk) \ ((unsigned long)(tsk->thread.cpu_context.pc)) #define thread_saved_sp(tsk) \ @@ -112,6 +85,7 @@ static inline struct thread_info *current_thread_info(void) #define TIF_NEED_RESCHED 1 #define TIF_NOTIFY_RESUME 2 /* callback before returning to user */ #define TIF_FOREIGN_FPSTATE 3 /* CPU's FP state is not current's */ +#define TIF_FSCHECK 4 /* Check FS is USER_DS on return */ #define TIF_NOHZ 7 #define TIF_SYSCALL_TRACE 8 #define TIF_SYSCALL_AUDIT 9 @@ -133,10 +107,12 @@ static inline struct thread_info *current_thread_info(void) #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) #define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT) #define _TIF_SECCOMP (1 << TIF_SECCOMP) +#define _TIF_FSCHECK (1 << TIF_FSCHECK) #define _TIF_32BIT (1 << TIF_32BIT) #define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ - _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE) + _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \ + _TIF_FSCHECK) #define _TIF_SYSCALL_WORK (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \ _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h index 8b57339..a4f86ded 100644 --- a/arch/arm64/include/asm/topology.h +++ b/arch/arm64/include/asm/topology.h
@@ -31,6 +31,17 @@ int pcibus_to_node(struct pci_bus *bus); cpumask_of_node(pcibus_to_node(bus))) #endif /* CONFIG_NUMA */ +struct sched_domain; +#ifdef CONFIG_CPU_FREQ +#define arch_scale_freq_capacity cpufreq_scale_freq_capacity +extern unsigned long cpufreq_scale_freq_capacity(struct sched_domain *sd, int cpu); +#define arch_scale_max_freq_capacity cpufreq_scale_max_freq_capacity +extern unsigned long cpufreq_scale_max_freq_capacity(struct sched_domain *sd, int cpu); +#define arch_scale_min_freq_capacity cpufreq_scale_min_freq_capacity +extern unsigned long cpufreq_scale_min_freq_capacity(struct sched_domain *sd, int cpu); +#endif +#define arch_scale_cpu_capacity scale_cpu_capacity +extern unsigned long scale_cpu_capacity(struct sched_domain *sd, int cpu); #include <asm-generic/topology.h>
diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h index 02e9035..47a9066 100644 --- a/arch/arm64/include/asm/traps.h +++ b/arch/arm64/include/asm/traps.h
@@ -37,18 +37,11 @@ void unregister_undef_hook(struct undef_hook *hook); void arm64_notify_segfault(struct pt_regs *regs, unsigned long addr); -#ifdef CONFIG_FUNCTION_GRAPH_TRACER static inline int __in_irqentry_text(unsigned long ptr) { return ptr >= (unsigned long)&__irqentry_text_start && ptr < (unsigned long)&__irqentry_text_end; } -#else -static inline int __in_irqentry_text(unsigned long ptr) -{ - return 0; -} -#endif static inline int in_exception_text(unsigned long ptr) {
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h index 1d047d6..2163279 100644 --- a/arch/arm64/include/asm/uaccess.h +++ b/arch/arm64/include/asm/uaccess.h
@@ -18,6 +18,13 @@ #ifndef __ASM_UACCESS_H #define __ASM_UACCESS_H +#include <asm/alternative.h> +#include <asm/kernel-pgtable.h> +#include <asm/mmu.h> +#include <asm/sysreg.h> + +#ifndef __ASSEMBLY__ + /* * User space memory access functions */ @@ -26,11 +33,9 @@ #include <linux/string.h> #include <linux/thread_info.h> -#include <asm/alternative.h> #include <asm/cpufeature.h> #include <asm/processor.h> #include <asm/ptrace.h> -#include <asm/sysreg.h> #include <asm/errno.h> #include <asm/memory.h> #include <asm/compiler.h> @@ -67,6 +72,9 @@ static inline void set_fs(mm_segment_t fs) { current_thread_info()->addr_limit = fs; + /* On user-mode return, check fs is correct */ + set_thread_flag(TIF_FSCHECK); + /* * Prevent a mispredicted conditional call to set_fs from forwarding * the wrong address limit to access_ok under speculation. @@ -136,6 +144,115 @@ static inline unsigned long __range_ok(unsigned long addr, unsigned long size) " .popsection\n" /* + * User access enabling/disabling. + */ +#ifdef CONFIG_ARM64_SW_TTBR0_PAN +static inline void __uaccess_ttbr0_disable(void) +{ + unsigned long flags, ttbr; + + local_irq_save(flags); + ttbr = read_sysreg(ttbr1_el1); + ttbr &= ~TTBR_ASID_MASK; + /* reserved_ttbr0 placed at the end of swapper_pg_dir */ + write_sysreg(ttbr + SWAPPER_DIR_SIZE, ttbr0_el1); + isb(); + /* Set reserved ASID */ + write_sysreg(ttbr, ttbr1_el1); + isb(); + local_irq_restore(flags); +} + +static inline void __uaccess_ttbr0_enable(void) +{ + unsigned long flags, ttbr0, ttbr1; + + /* + * Disable interrupts to avoid preemption between reading the 'ttbr0' + * variable and the MSR. A context switch could trigger an ASID + * roll-over and an update of 'ttbr0'. + */ + local_irq_save(flags); + ttbr0 = READ_ONCE(current_thread_info()->ttbr0); + + /* Restore active ASID */ + ttbr1 = read_sysreg(ttbr1_el1); + ttbr1 &= ~TTBR_ASID_MASK; /* safety measure */ + ttbr1 |= ttbr0 & TTBR_ASID_MASK; + write_sysreg(ttbr1, ttbr1_el1); + isb(); + + /* Restore user page table */ + write_sysreg(ttbr0, ttbr0_el1); + isb(); + local_irq_restore(flags); +} + +static inline bool uaccess_ttbr0_disable(void) +{ + if (!system_uses_ttbr0_pan()) + return false; + __uaccess_ttbr0_disable(); + return true; +} + +static inline bool uaccess_ttbr0_enable(void) +{ + if (!system_uses_ttbr0_pan()) + return false; + __uaccess_ttbr0_enable(); + return true; +} +#else +static inline bool uaccess_ttbr0_disable(void) +{ + return false; +} + +static inline bool uaccess_ttbr0_enable(void) +{ + return false; +} +#endif + +#define __uaccess_disable(alt) \ +do { \ + if (!uaccess_ttbr0_disable()) \ + asm(ALTERNATIVE("nop", SET_PSTATE_PAN(1), alt, \ + CONFIG_ARM64_PAN)); \ +} while (0) + +#define __uaccess_enable(alt) \ +do { \ + if (!uaccess_ttbr0_enable()) \ + asm(ALTERNATIVE("nop", SET_PSTATE_PAN(0), alt, \ + CONFIG_ARM64_PAN)); \ +} while (0) + +static inline void uaccess_disable(void) +{ + __uaccess_disable(ARM64_HAS_PAN); +} + +static inline void uaccess_enable(void) +{ + __uaccess_enable(ARM64_HAS_PAN); +} + +/* + * These functions are no-ops when UAO is present. + */ +static inline void uaccess_disable_not_uao(void) +{ + __uaccess_disable(ARM64_ALT_PAN_NOT_UAO); +} + +static inline void uaccess_enable_not_uao(void) +{ + __uaccess_enable(ARM64_ALT_PAN_NOT_UAO); +} + +/* * Sanitise a uaccess pointer such that it becomes NULL if above the * current addr_limit. */ @@ -182,8 +299,7 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr) do { \ unsigned long __gu_val; \ __chk_user_ptr(ptr); \ - asm(ALTERNATIVE("nop", SET_PSTATE_PAN(0), ARM64_ALT_PAN_NOT_UAO,\ - CONFIG_ARM64_PAN)); \ + uaccess_enable_not_uao(); \ switch (sizeof(*(ptr))) { \ case 1: \ __get_user_asm("ldrb", "ldtrb", "%w", __gu_val, (ptr), \ @@ -198,15 +314,14 @@ do { \ (err), ARM64_HAS_UAO); \ break; \ case 8: \ - __get_user_asm("ldr", "ldtr", "%", __gu_val, (ptr), \ + __get_user_asm("ldr", "ldtr", "%x", __gu_val, (ptr), \ (err), ARM64_HAS_UAO); \ break; \ default: \ BUILD_BUG(); \ } \ + uaccess_disable_not_uao(); \ (x) = (__force __typeof__(*(ptr)))__gu_val; \ - asm(ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_ALT_PAN_NOT_UAO,\ - CONFIG_ARM64_PAN)); \ } while (0) #define __get_user_check(x, ptr, err) \ @@ -256,8 +371,7 @@ do { \ do { \ __typeof__(*(ptr)) __pu_val = (x); \ __chk_user_ptr(ptr); \ - asm(ALTERNATIVE("nop", SET_PSTATE_PAN(0), ARM64_ALT_PAN_NOT_UAO,\ - CONFIG_ARM64_PAN)); \ + uaccess_enable_not_uao(); \ switch (sizeof(*(ptr))) { \ case 1: \ __put_user_asm("strb", "sttrb", "%w", __pu_val, (ptr), \ @@ -272,14 +386,13 @@ do { \ (err), ARM64_HAS_UAO); \ break; \ case 8: \ - __put_user_asm("str", "sttr", "%", __pu_val, (ptr), \ + __put_user_asm("str", "sttr", "%x", __pu_val, (ptr), \ (err), ARM64_HAS_UAO); \ break; \ default: \ BUILD_BUG(); \ } \ - asm(ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_ALT_PAN_NOT_UAO,\ - CONFIG_ARM64_PAN)); \ + uaccess_disable_not_uao(); \ } while (0) #define __put_user_check(x, ptr, err) \ @@ -379,4 +492,77 @@ extern long strncpy_from_user(char *dest, const char __user *src, long count); extern __must_check long strlen_user(const char __user *str); extern __must_check long strnlen_user(const char __user *str, long n); +#else /* __ASSEMBLY__ */ + +#include <asm/assembler.h> + +/* + * User access enabling/disabling macros. + */ +#ifdef CONFIG_ARM64_SW_TTBR0_PAN + .macro __uaccess_ttbr0_disable, tmp1 + mrs \tmp1, ttbr1_el1 // swapper_pg_dir + bic \tmp1, \tmp1, #TTBR_ASID_MASK + add \tmp1, \tmp1, #SWAPPER_DIR_SIZE // reserved_ttbr0 at the end of swapper_pg_dir + msr ttbr0_el1, \tmp1 // set reserved TTBR0_EL1 + isb + sub \tmp1, \tmp1, #SWAPPER_DIR_SIZE + msr ttbr1_el1, \tmp1 // set reserved ASID + isb + .endm + + .macro __uaccess_ttbr0_enable, tmp1, tmp2 + get_thread_info \tmp1 + ldr \tmp1, [\tmp1, #TSK_TI_TTBR0] // load saved TTBR0_EL1 + mrs \tmp2, ttbr1_el1 + extr \tmp2, \tmp2, \tmp1, #48 + ror \tmp2, \tmp2, #16 + msr ttbr1_el1, \tmp2 // set the active ASID + isb + msr ttbr0_el1, \tmp1 // set the non-PAN TTBR0_EL1 + isb + .endm + + .macro uaccess_ttbr0_disable, tmp1, tmp2 +alternative_if_not ARM64_HAS_PAN + save_and_disable_irq \tmp2 // avoid preemption + __uaccess_ttbr0_disable \tmp1 + restore_irq \tmp2 +alternative_else_nop_endif + .endm + + .macro uaccess_ttbr0_enable, tmp1, tmp2, tmp3 +alternative_if_not ARM64_HAS_PAN + save_and_disable_irq \tmp3 // avoid preemption + __uaccess_ttbr0_enable \tmp1, \tmp2 + restore_irq \tmp3 +alternative_else_nop_endif + .endm +#else + .macro uaccess_ttbr0_disable, tmp1, tmp2 + .endm + + .macro uaccess_ttbr0_enable, tmp1, tmp2, tmp3 + .endm +#endif + +/* + * These macros are no-ops when UAO is present. + */ + .macro uaccess_disable_not_uao, tmp1, tmp2 + uaccess_ttbr0_disable \tmp1, \tmp2 +alternative_if ARM64_ALT_PAN_NOT_UAO + SET_PSTATE_PAN(1) +alternative_else_nop_endif + .endm + + .macro uaccess_enable_not_uao, tmp1, tmp2, tmp3 + uaccess_ttbr0_enable \tmp1, \tmp2, \tmp3 +alternative_if ARM64_ALT_PAN_NOT_UAO + SET_PSTATE_PAN(0) +alternative_else_nop_endif + .endm + +#endif /* __ASSEMBLY__ */ + #endif /* __ASM_UACCESS_H */
diff --git a/arch/arm64/kernel/acpi_parking_protocol.c b/arch/arm64/kernel/acpi_parking_protocol.c index a32b401..1f5655c 100644 --- a/arch/arm64/kernel/acpi_parking_protocol.c +++ b/arch/arm64/kernel/acpi_parking_protocol.c
@@ -17,6 +17,7 @@ * along with this program. If not, see <http://www.gnu.org/licenses/>. */ #include <linux/acpi.h> +#include <linux/mm.h> #include <linux/types.h> #include <asm/cpu_ops.h> @@ -109,7 +110,7 @@ static int acpi_parking_protocol_cpu_boot(unsigned int cpu) * that read this address need to convert this address to the * Boot-Loader's endianness before jumping. */ - writeq_relaxed(__pa(secondary_entry), &mailbox->entry_point); + writeq_relaxed(__pa_symbol(secondary_entry), &mailbox->entry_point); writel_relaxed(cpu_entry->gic_cpu_id, &mailbox->cpu_id); arch_send_wakeup_ipi_mask(cpumask_of(cpu));
diff --git a/arch/arm64/kernel/armv8_deprecated.c b/arch/arm64/kernel/armv8_deprecated.c index c0ede23..29d2ad8 100644 --- a/arch/arm64/kernel/armv8_deprecated.c +++ b/arch/arm64/kernel/armv8_deprecated.c
@@ -14,7 +14,6 @@ #include <linux/slab.h> #include <linux/sysctl.h> -#include <asm/alternative.h> #include <asm/cpufeature.h> #include <asm/insn.h> #include <asm/opcodes.h> @@ -285,10 +284,10 @@ static void __init register_insn_emulation_sysctl(struct ctl_table *table) #define __SWP_LL_SC_LOOPS 4 #define __user_swpX_asm(data, addr, res, temp, temp2, B) \ +do { \ + uaccess_enable(); \ __asm__ __volatile__( \ " mov %w3, %w7\n" \ - ALTERNATIVE("nop", SET_PSTATE_PAN(0), ARM64_HAS_PAN, \ - CONFIG_ARM64_PAN) \ "0: ldxr"B" %w2, [%4]\n" \ "1: stxr"B" %w0, %w1, [%4]\n" \ " cbz %w0, 2f\n" \ @@ -306,13 +305,13 @@ static void __init register_insn_emulation_sysctl(struct ctl_table *table) " .popsection" \ _ASM_EXTABLE(0b, 4b) \ _ASM_EXTABLE(1b, 4b) \ - ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_HAS_PAN, \ - CONFIG_ARM64_PAN) \ : "=&r" (res), "+r" (data), "=&r" (temp), "=&r" (temp2) \ : "r" ((unsigned long)addr), "i" (-EAGAIN), \ "i" (-EFAULT), \ "i" (__SWP_LL_SC_LOOPS) \ - : "memory") + : "memory"); \ + uaccess_disable(); \ +} while (0) #define __user_swp_asm(data, addr, res, temp, temp2) \ __user_swpX_asm(data, addr, res, temp, temp2, "")
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index bd239b1..61ae19e 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c
@@ -37,11 +37,13 @@ int main(void) { DEFINE(TSK_ACTIVE_MM, offsetof(struct task_struct, active_mm)); BLANK(); - DEFINE(TI_FLAGS, offsetof(struct thread_info, flags)); - DEFINE(TI_PREEMPT, offsetof(struct thread_info, preempt_count)); - DEFINE(TI_ADDR_LIMIT, offsetof(struct thread_info, addr_limit)); - DEFINE(TI_TASK, offsetof(struct thread_info, task)); - DEFINE(TI_CPU, offsetof(struct thread_info, cpu)); + DEFINE(TSK_TI_FLAGS, offsetof(struct task_struct, thread_info.flags)); + DEFINE(TSK_TI_PREEMPT, offsetof(struct task_struct, thread_info.preempt_count)); + DEFINE(TSK_TI_ADDR_LIMIT, offsetof(struct task_struct, thread_info.addr_limit)); +#ifdef CONFIG_ARM64_SW_TTBR0_PAN + DEFINE(TSK_TI_TTBR0, offsetof(struct task_struct, thread_info.ttbr0)); +#endif + DEFINE(TSK_STACK, offsetof(struct task_struct, stack)); BLANK(); DEFINE(THREAD_CPU_CONTEXT, offsetof(struct task_struct, thread.cpu_context)); BLANK(); @@ -124,6 +126,7 @@ int main(void) DEFINE(TZ_DSTTIME, offsetof(struct timezone, tz_dsttime)); BLANK(); DEFINE(CPU_BOOT_STACK, offsetof(struct secondary_data, stack)); + DEFINE(CPU_BOOT_TASK, offsetof(struct secondary_data, task)); BLANK(); #ifdef CONFIG_KVM_ARM_HOST DEFINE(VCPU_CONTEXT, offsetof(struct kvm_vcpu, arch.ctxt));
diff --git a/arch/arm64/kernel/cpu-reset.h b/arch/arm64/kernel/cpu-reset.h index d4e9ecb..6c2b1b4 100644 --- a/arch/arm64/kernel/cpu-reset.h +++ b/arch/arm64/kernel/cpu-reset.h
@@ -24,7 +24,7 @@ static inline void __noreturn cpu_soft_restart(unsigned long el2_switch, el2_switch = el2_switch && !is_kernel_in_hyp_mode() && is_hyp_mode_available(); - restart = (void *)virt_to_phys(__cpu_soft_restart); + restart = (void *)__pa_symbol(__cpu_soft_restart); cpu_install_idmap(); restart(el2_switch, entry, arg0, arg1, arg2);
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index a3ab7df..5053072 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c
@@ -23,6 +23,7 @@ #include <linux/sort.h> #include <linux/stop_machine.h> #include <linux/types.h> +#include <linux/mm.h> #include <asm/cpu.h> #include <asm/cpufeature.h> #include <asm/cpu_ops.h> @@ -739,7 +740,7 @@ static bool runs_at_el2(const struct arm64_cpu_capabilities *entry, int __unused static bool hyp_offset_low(const struct arm64_cpu_capabilities *entry, int __unused) { - phys_addr_t idmap_addr = virt_to_phys(__hyp_idmap_text_start); + phys_addr_t idmap_addr = __pa_symbol(__hyp_idmap_text_start); /* * Activate the lower HYP offset only if: @@ -791,7 +792,7 @@ static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry, ID_AA64PFR0_CSV3_SHIFT); } -static int kpti_install_ng_mappings(void *__unused) +static int __nocfi kpti_install_ng_mappings(void *__unused) { typedef void (kpti_remap_fn)(int, int, phys_addr_t); extern kpti_remap_fn idmap_kpti_install_ng_mappings;
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index ca978d7..ecebb48c7 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S
@@ -32,7 +32,9 @@ #include <asm/memory.h> #include <asm/mmu.h> #include <asm/processor.h> +#include <asm/ptrace.h> #include <asm/thread_info.h> +#include <asm/uaccess.h> #include <asm/asm-uaccess.h> #include <asm/unistd.h> #include <asm/kernel-pgtable.h> @@ -105,7 +107,7 @@ alternative_cb_end ldr_this_cpu \tmp2, arm64_ssbd_callback_required, \tmp1 cbz \tmp2, \targ - ldr \tmp2, [tsk, #TI_FLAGS] + ldr \tmp2, [tsk, #TSK_TI_FLAGS] tbnz \tmp2, #TIF_SSBD, \targ mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2 mov w1, #\state @@ -137,9 +139,8 @@ .if \el == 0 mrs x21, sp_el0 - mov tsk, sp - and tsk, tsk, #~(THREAD_SIZE - 1) // Ensure MDSCR_EL1.SS is clear, - ldr x19, [tsk, #TI_FLAGS] // since we can unmask debug + ldr_this_cpu tsk, __entry_task, x20 // Ensure MDSCR_EL1.SS is clear, + ldr x19, [tsk, #TSK_TI_FLAGS] // since we can unmask debug disable_step_tsk x19, x20 // exceptions when scheduling. apply_ssbd 1, 1f, x22, x23 @@ -155,15 +156,41 @@ add x21, sp, #S_FRAME_SIZE get_thread_info tsk /* Save the task's original addr_limit and set USER_DS */ - ldr x20, [tsk, #TI_ADDR_LIMIT] + ldr x20, [tsk, #TSK_TI_ADDR_LIMIT] str x20, [sp, #S_ORIG_ADDR_LIMIT] mov x20, #USER_DS - str x20, [tsk, #TI_ADDR_LIMIT] + str x20, [tsk, #TSK_TI_ADDR_LIMIT] /* No need to reset PSTATE.UAO, hardware's already set it to 0 for us */ .endif /* \el == 0 */ mrs x22, elr_el1 mrs x23, spsr_el1 stp lr, x21, [sp, #S_LR] + +#ifdef CONFIG_ARM64_SW_TTBR0_PAN + /* + * Set the TTBR0 PAN bit in SPSR. When the exception is taken from + * EL0, there is no need to check the state of TTBR0_EL1 since + * accesses are always enabled. + * Note that the meaning of this bit differs from the ARMv8.1 PAN + * feature as all TTBR0_EL1 accesses are disabled, not just those to + * user mappings. + */ +alternative_if ARM64_HAS_PAN + b 1f // skip TTBR0 PAN +alternative_else_nop_endif + + .if \el != 0 + mrs x21, ttbr0_el1 + tst x21, #TTBR_ASID_MASK // Check for the reserved ASID + orr x23, x23, #PSR_PAN_BIT // Set the emulated PAN in the saved SPSR + b.eq 1f // TTBR0 access already disabled + and x23, x23, #~PSR_PAN_BIT // Clear the emulated PAN in the saved SPSR + .endif + + __uaccess_ttbr0_disable x21 +1: +#endif + stp x22, x23, [sp, #S_PC] /* @@ -194,7 +221,7 @@ .if \el != 0 /* Restore the task's original addr_limit. */ ldr x20, [sp, #S_ORIG_ADDR_LIMIT] - str x20, [tsk, #TI_ADDR_LIMIT] + str x20, [tsk, #TSK_TI_ADDR_LIMIT] /* No need to restore UAO, it will be restored from SPSR_EL1 */ .endif @@ -202,6 +229,40 @@ ldp x21, x22, [sp, #S_PC] // load ELR, SPSR .if \el == 0 ct_user_enter + .endif + +#ifdef CONFIG_ARM64_SW_TTBR0_PAN + /* + * Restore access to TTBR0_EL1. If returning to EL0, no need for SPSR + * PAN bit checking. + */ +alternative_if ARM64_HAS_PAN + b 2f // skip TTBR0 PAN +alternative_else_nop_endif + + .if \el != 0 + tbnz x22, #22, 1f // Skip re-enabling TTBR0 access if the PSR_PAN_BIT is set + .endif + + __uaccess_ttbr0_enable x0, x1 + + .if \el == 0 + /* + * Enable errata workarounds only if returning to user. The only + * workaround currently required for TTBR0_EL1 changes are for the + * Cavium erratum 27456 (broadcast TLBI instructions may cause I-cache + * corruption). + */ + bl post_ttbr_update_workaround + .endif +1: + .if \el != 0 + and x22, x22, #~PSR_PAN_BIT // ARMv8.0 CPUs do not understand this bit + .endif +2: +#endif + + .if \el == 0 ldr x23, [sp, #S_SP] // load return stack pointer msr sp_el0, x23 tst x22, #PSR_MODE32_BIT // native task? @@ -221,6 +282,7 @@ apply_ssbd 0, 5f, x0, x1 5: .endif + msr elr_el1, x21 // set up the return data msr spsr_el1, x22 ldp x0, x1, [sp, #16 * 0] @@ -257,21 +319,18 @@ .endif .endm - .macro get_thread_info, rd - mrs \rd, sp_el0 - .endm - .macro irq_stack_entry mov x19, sp // preserve the original sp /* - * Compare sp with the current thread_info, if the top - * ~(THREAD_SIZE - 1) bits match, we are on a task stack, and - * should switch to the irq stack. + * Compare sp with the base of the task stack. + * If the top ~(THREAD_SIZE - 1) bits match, we are on a task stack, + * and should switch to the irq stack. */ - and x25, x19, #~(THREAD_SIZE - 1) - cmp x25, tsk - b.ne 9998f + ldr x25, [tsk, TSK_STACK] + eor x25, x25, x19 + and x25, x25, #~(THREAD_SIZE - 1) + cbnz x25, 9998f adr_this_cpu x25, irq_stack, x26 mov x26, #IRQ_STACK_START_SP @@ -501,9 +560,9 @@ irq_handler #ifdef CONFIG_PREEMPT - ldr w24, [tsk, #TI_PREEMPT] // get preempt count + ldr w24, [tsk, #TSK_TI_PREEMPT] // get preempt count cbnz w24, 1f // preempt count != 0 - ldr x0, [tsk, #TI_FLAGS] // get flags + ldr x0, [tsk, #TSK_TI_FLAGS] // get flags tbz x0, #TIF_NEED_RESCHED, 1f // needs rescheduling? bl el1_preempt 1: @@ -518,7 +577,7 @@ el1_preempt: mov x24, lr 1: bl preempt_schedule_irq // irq en/disable is done inside - ldr x0, [tsk, #TI_FLAGS] // get new tasks TI_FLAGS + ldr x0, [tsk, #TSK_TI_FLAGS] // get new tasks TI_FLAGS tbnz x0, #TIF_NEED_RESCHED, 1b // needs rescheduling? ret x24 #endif @@ -757,8 +816,7 @@ ldp x29, x9, [x8], #16 ldr lr, [x8] mov sp, x9 - and x9, x9, #~(THREAD_SIZE - 1) - msr sp_el0, x9 + msr sp_el0, x1 ret ENDPROC(cpu_switch_to) @@ -769,7 +827,7 @@ ret_fast_syscall: disable_irq // disable interrupts str x0, [sp, #S_X0] // returned x0 - ldr x1, [tsk, #TI_FLAGS] // re-check for syscall tracing + ldr x1, [tsk, #TSK_TI_FLAGS] // re-check for syscall tracing and x2, x1, #_TIF_SYSCALL_WORK cbnz x2, ret_fast_syscall_trace and x2, x1, #_TIF_WORK_MASK @@ -789,14 +847,14 @@ #ifdef CONFIG_TRACE_IRQFLAGS bl trace_hardirqs_on // enabled while in userspace #endif - ldr x1, [tsk, #TI_FLAGS] // re-check for single-step + ldr x1, [tsk, #TSK_TI_FLAGS] // re-check for single-step b finish_ret_to_user /* * "slow" syscall return path. */ ret_to_user: disable_irq // disable interrupts - ldr x1, [tsk, #TI_FLAGS] + ldr x1, [tsk, #TSK_TI_FLAGS] and x2, x1, #_TIF_WORK_MASK cbnz x2, work_pending finish_ret_to_user: @@ -829,7 +887,7 @@ enable_dbg_and_irq ct_user_exit 1 - ldr x16, [tsk, #TI_FLAGS] // check for syscall hooks + ldr x16, [tsk, #TSK_TI_FLAGS] // check for syscall hooks tst x16, #_TIF_SYSCALL_WORK b.ne __sys_trace cmp scno, sc_nr // check upper syscall limit @@ -891,14 +949,24 @@ .macro tramp_map_kernel, tmp mrs \tmp, ttbr1_el1 - sub \tmp, \tmp, #SWAPPER_DIR_SIZE + sub \tmp, \tmp, #(SWAPPER_DIR_SIZE + RESERVED_TTBR0_SIZE) bic \tmp, \tmp, #USER_ASID_FLAG msr ttbr1_el1, \tmp +#ifdef CONFIG_ARCH_MSM8996 + /* ASID already in \tmp[63:48] */ + movk \tmp, #:abs_g2_nc:(TRAMP_VALIAS >> 12) + movk \tmp, #:abs_g1_nc:(TRAMP_VALIAS >> 12) + /* 2MB boundary containing the vectors, so we nobble the walk cache */ + movk \tmp, #:abs_g0_nc:((TRAMP_VALIAS & ~(SZ_2M - 1)) >> 12) + isb + tlbi vae1, \tmp + dsb nsh +#endif /* CONFIG_ARCH_MSM8996 */ .endm .macro tramp_unmap_kernel, tmp mrs \tmp, ttbr1_el1 - add \tmp, \tmp, #SWAPPER_DIR_SIZE + add \tmp, \tmp, #(SWAPPER_DIR_SIZE + RESERVED_TTBR0_SIZE) orr \tmp, \tmp, #USER_ASID_FLAG msr ttbr1_el1, \tmp /* @@ -925,7 +993,9 @@ tramp_map_kernel x30 #ifdef CONFIG_RANDOMIZE_BASE adr x30, tramp_vectors + PAGE_SIZE +#ifndef CONFIG_ARCH_MSM8996 isb +#endif ldr x30, [x30] #else ldr x30, =vectors
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index fa52817..d479185 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S
@@ -326,14 +326,14 @@ * dirty cache lines being evicted. */ adrp x0, idmap_pg_dir - adrp x1, swapper_pg_dir + SWAPPER_DIR_SIZE + adrp x1, swapper_pg_dir + SWAPPER_DIR_SIZE + RESERVED_TTBR0_SIZE bl __inval_cache_range /* * Clear the idmap and swapper page tables. */ adrp x0, idmap_pg_dir - adrp x6, swapper_pg_dir + SWAPPER_DIR_SIZE + adrp x6, swapper_pg_dir + SWAPPER_DIR_SIZE + RESERVED_TTBR0_SIZE 1: stp xzr, xzr, [x0], #16 stp xzr, xzr, [x0], #16 stp xzr, xzr, [x0], #16 @@ -412,7 +412,7 @@ * tables again to remove any speculatively loaded cache lines. */ adrp x0, idmap_pg_dir - adrp x1, swapper_pg_dir + SWAPPER_DIR_SIZE + adrp x1, swapper_pg_dir + SWAPPER_DIR_SIZE + RESERVED_TTBR0_SIZE dmb sy bl __inval_cache_range @@ -428,7 +428,8 @@ __primary_switched: adrp x4, init_thread_union add sp, x4, #THREAD_SIZE - msr sp_el0, x4 // Save thread_info + adr_l x5, init_task + msr sp_el0, x5 // Save thread_info adr_l x8, vectors // load VBAR_EL1 with virtual msr vbar_el1, x8 // vector table address @@ -700,10 +701,10 @@ isb adr_l x0, secondary_data - ldr x0, [x0, #CPU_BOOT_STACK] // get secondary_data.stack - mov sp, x0 - and x0, x0, #~(THREAD_SIZE - 1) - msr sp_el0, x0 // save thread_info + ldr x1, [x0, #CPU_BOOT_STACK] // get secondary_data.stack + mov sp, x1 + ldr x2, [x0, #CPU_BOOT_TASK] + msr sp_el0, x2 mov x29, #0 b secondary_start_kernel ENDPROC(__secondary_switched)
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c index f6e71c7..db603cd 100644 --- a/arch/arm64/kernel/hibernate.c +++ b/arch/arm64/kernel/hibernate.c
@@ -50,9 +50,6 @@ */ extern int in_suspend; -/* Find a symbols alias in the linear map */ -#define LMADDR(x) phys_to_virt(virt_to_phys(x)) - /* Do we need to reset el2? */ #define el2_reset_needed() (is_hyp_mode_available() && !is_kernel_in_hyp_mode()) @@ -102,8 +99,8 @@ static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i) int pfn_is_nosave(unsigned long pfn) { - unsigned long nosave_begin_pfn = virt_to_pfn(&__nosave_begin); - unsigned long nosave_end_pfn = virt_to_pfn(&__nosave_end - 1); + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin); + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1); return (pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn); } @@ -125,12 +122,12 @@ int arch_hibernation_header_save(void *addr, unsigned int max_size) return -EOVERFLOW; arch_hdr_invariants(&hdr->invariants); - hdr->ttbr1_el1 = virt_to_phys(swapper_pg_dir); + hdr->ttbr1_el1 = __pa_symbol(swapper_pg_dir); hdr->reenter_kernel = _cpu_resume; /* We can't use __hyp_get_vectors() because kvm may still be loaded */ if (el2_reset_needed()) - hdr->__hyp_stub_vectors = virt_to_phys(__hyp_stub_vectors); + hdr->__hyp_stub_vectors = __pa_symbol(__hyp_stub_vectors); else hdr->__hyp_stub_vectors = 0; @@ -471,7 +468,6 @@ int swsusp_arch_resume(void) void *zero_page; size_t exit_size; pgd_t *tmp_pg_dir; - void *lm_restore_pblist; phys_addr_t phys_hibernate_exit; void __noreturn (*hibernate_exit)(phys_addr_t, phys_addr_t, void *, void *, phys_addr_t, phys_addr_t); @@ -492,12 +488,6 @@ int swsusp_arch_resume(void) goto out; /* - * Since we only copied the linear map, we need to find restore_pblist's - * linear map address. - */ - lm_restore_pblist = LMADDR(restore_pblist); - - /* * We need a zero page that is zero before & after resume in order to * to break before make on the ttbr1 page tables. */ @@ -548,7 +538,7 @@ int swsusp_arch_resume(void) } hibernate_exit(virt_to_phys(tmp_pg_dir), resume_hdr.ttbr1_el1, - resume_hdr.reenter_kernel, lm_restore_pblist, + resume_hdr.reenter_kernel, restore_pblist, resume_hdr.__hyp_stub_vectors, virt_to_phys(zero_page)); out:
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c index 0b9e5f6..fb0082a 100644 --- a/arch/arm64/kernel/hw_breakpoint.c +++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -318,9 +318,21 @@ static int get_hbp_len(u8 hbp_len) case ARM_BREAKPOINT_LEN_2: len_in_bytes = 2; break; + case ARM_BREAKPOINT_LEN_3: + len_in_bytes = 3; + break; case ARM_BREAKPOINT_LEN_4: len_in_bytes = 4; break; + case ARM_BREAKPOINT_LEN_5: + len_in_bytes = 5; + break; + case ARM_BREAKPOINT_LEN_6: + len_in_bytes = 6; + break; + case ARM_BREAKPOINT_LEN_7: + len_in_bytes = 7; + break; case ARM_BREAKPOINT_LEN_8: len_in_bytes = 8; break; @@ -350,7 +362,7 @@ int arch_check_bp_in_kernelspace(struct perf_event *bp) * to generic breakpoint descriptions. */ int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, - int *gen_len, int *gen_type) + int *gen_len, int *gen_type, int *offset) { /* Type */ switch (ctrl.type) { @@ -370,17 +382,33 @@ int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, return -EINVAL; } + if (!ctrl.len) + return -EINVAL; + *offset = __ffs(ctrl.len); + /* Len */ - switch (ctrl.len) { + switch (ctrl.len >> *offset) { case ARM_BREAKPOINT_LEN_1: *gen_len = HW_BREAKPOINT_LEN_1; break; case ARM_BREAKPOINT_LEN_2: *gen_len = HW_BREAKPOINT_LEN_2; break; + case ARM_BREAKPOINT_LEN_3: + *gen_len = HW_BREAKPOINT_LEN_3; + break; case ARM_BREAKPOINT_LEN_4: *gen_len = HW_BREAKPOINT_LEN_4; break; + case ARM_BREAKPOINT_LEN_5: + *gen_len = HW_BREAKPOINT_LEN_5; + break; + case ARM_BREAKPOINT_LEN_6: + *gen_len = HW_BREAKPOINT_LEN_6; + break; + case ARM_BREAKPOINT_LEN_7: + *gen_len = HW_BREAKPOINT_LEN_7; + break; case ARM_BREAKPOINT_LEN_8: *gen_len = HW_BREAKPOINT_LEN_8; break; @@ -424,9 +452,21 @@ static int arch_build_bp_info(struct perf_event *bp) case HW_BREAKPOINT_LEN_2: info->ctrl.len = ARM_BREAKPOINT_LEN_2; break; + case HW_BREAKPOINT_LEN_3: + info->ctrl.len = ARM_BREAKPOINT_LEN_3; + break; case HW_BREAKPOINT_LEN_4: info->ctrl.len = ARM_BREAKPOINT_LEN_4; break; + case HW_BREAKPOINT_LEN_5: + info->ctrl.len = ARM_BREAKPOINT_LEN_5; + break; + case HW_BREAKPOINT_LEN_6: + info->ctrl.len = ARM_BREAKPOINT_LEN_6; + break; + case HW_BREAKPOINT_LEN_7: + info->ctrl.len = ARM_BREAKPOINT_LEN_7; + break; case HW_BREAKPOINT_LEN_8: info->ctrl.len = ARM_BREAKPOINT_LEN_8; break; @@ -518,18 +558,17 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp) default: return -EINVAL; } - - info->address &= ~alignment_mask; - info->ctrl.len <<= offset; } else { if (info->ctrl.type == ARM_BREAKPOINT_EXECUTE) alignment_mask = 0x3; else alignment_mask = 0x7; - if (info->address & alignment_mask) - return -EINVAL; + offset = info->address & alignment_mask; } + info->address &= ~alignment_mask; + info->ctrl.len <<= offset; + /* * Disallow per-task kernel breakpoints since these would * complicate the stepping code. @@ -662,12 +701,47 @@ static int breakpoint_handler(unsigned long unused, unsigned int esr, } NOKPROBE_SYMBOL(breakpoint_handler); +/* + * Arm64 hardware does not always report a watchpoint hit address that matches + * one of the watchpoints set. It can also report an address "near" the + * watchpoint if a single instruction access both watched and unwatched + * addresses. There is no straight-forward way, short of disassembling the + * offending instruction, to map that address back to the watchpoint. This + * function computes the distance of the memory access from the watchpoint as a + * heuristic for the likelyhood that a given access triggered the watchpoint. + * + * See Section D2.10.5 "Determining the memory location that caused a Watchpoint + * exception" of ARMv8 Architecture Reference Manual for details. + * + * The function returns the distance of the address from the bytes watched by + * the watchpoint. In case of an exact match, it returns 0. + */ +static u64 get_distance_from_watchpoint(unsigned long addr, u64 val, + struct arch_hw_breakpoint_ctrl *ctrl) +{ + u64 wp_low, wp_high; + u32 lens, lene; + + lens = __ffs(ctrl->len); + lene = __fls(ctrl->len); + + wp_low = val + lens; + wp_high = val + lene; + if (addr < wp_low) + return wp_low - addr; + else if (addr > wp_high) + return addr - wp_high; + else + return 0; +} + static int watchpoint_handler(unsigned long addr, unsigned int esr, struct pt_regs *regs) { - int i, step = 0, *kernel_step, access; + int i, step = 0, *kernel_step, access, closest_match = 0; + u64 min_dist = -1, dist; u32 ctrl_reg; - u64 val, alignment_mask; + u64 val; struct perf_event *wp, **slots; struct debug_info *debug_info; struct arch_hw_breakpoint *info; @@ -676,35 +750,15 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr, slots = this_cpu_ptr(wp_on_reg); debug_info = ¤t->thread.debug; + /* + * Find all watchpoints that match the reported address. If no exact + * match is found. Attribute the hit to the closest watchpoint. + */ + rcu_read_lock(); for (i = 0; i < core_num_wrps; ++i) { - rcu_read_lock(); - wp = slots[i]; - if (wp == NULL) - goto unlock; - - info = counter_arch_bp(wp); - /* AArch32 watchpoints are either 4 or 8 bytes aligned. */ - if (is_compat_task()) { - if (info->ctrl.len == ARM_BREAKPOINT_LEN_8) - alignment_mask = 0x7; - else - alignment_mask = 0x3; - } else { - alignment_mask = 0x7; - } - - /* Check if the watchpoint value matches. */ - val = read_wb_reg(AARCH64_DBG_REG_WVR, i); - if (val != (untagged_addr(addr) & ~alignment_mask)) - goto unlock; - - /* Possible match, check the byte address select to confirm. */ - ctrl_reg = read_wb_reg(AARCH64_DBG_REG_WCR, i); - decode_ctrl_reg(ctrl_reg, &ctrl); - if (!((1 << (addr & alignment_mask)) & ctrl.len)) - goto unlock; + continue; /* * Check that the access type matches. @@ -713,18 +767,41 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr, access = (esr & AARCH64_ESR_ACCESS_MASK) ? HW_BREAKPOINT_W : HW_BREAKPOINT_R; if (!(access & hw_breakpoint_type(wp))) - goto unlock; + continue; + /* Check if the watchpoint value and byte select match. */ + val = read_wb_reg(AARCH64_DBG_REG_WVR, i); + ctrl_reg = read_wb_reg(AARCH64_DBG_REG_WCR, i); + decode_ctrl_reg(ctrl_reg, &ctrl); + dist = get_distance_from_watchpoint(addr, val, &ctrl); + if (dist < min_dist) { + min_dist = dist; + closest_match = i; + } + /* Is this an exact match? */ + if (dist != 0) + continue; + + info = counter_arch_bp(wp); info->trigger = addr; perf_bp_event(wp, regs); /* Do we need to handle the stepping? */ if (is_default_overflow_handler(wp)) step = 1; - -unlock: - rcu_read_unlock(); } + if (min_dist > 0 && min_dist != -1) { + /* No exact match found. */ + wp = slots[closest_match]; + info = counter_arch_bp(wp); + info->trigger = addr; + perf_bp_event(wp, regs); + + /* Do we need to handle the stepping? */ + if (is_default_overflow_handler(wp)) + step = 1; + } + rcu_read_unlock(); if (!step) return 0;
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c index 6f2ac4f..f607b38 100644 --- a/arch/arm64/kernel/insn.c +++ b/arch/arm64/kernel/insn.c
@@ -97,7 +97,7 @@ static void __kprobes *patch_map(void *addr, int fixmap) if (module && IS_ENABLED(CONFIG_DEBUG_SET_MODULE_RONX)) page = vmalloc_to_page(addr); else if (!module) - page = pfn_to_page(PHYS_PFN(__pa(addr))); + page = phys_to_page(__pa_symbol(addr)); else return addr;
diff --git a/arch/arm64/kernel/io.c b/arch/arm64/kernel/io.c index 354be2a..79b1738 100644 --- a/arch/arm64/kernel/io.c +++ b/arch/arm64/kernel/io.c
@@ -25,8 +25,7 @@ */ void __memcpy_fromio(void *to, const volatile void __iomem *from, size_t count) { - while (count && (!IS_ALIGNED((unsigned long)from, 8) || - !IS_ALIGNED((unsigned long)to, 8))) { + while (count && !IS_ALIGNED((unsigned long)from, 8)) { *(u8 *)to = __raw_readb(from); from++; to++; @@ -54,23 +53,22 @@ EXPORT_SYMBOL(__memcpy_fromio); */ void __memcpy_toio(volatile void __iomem *to, const void *from, size_t count) { - while (count && (!IS_ALIGNED((unsigned long)to, 8) || - !IS_ALIGNED((unsigned long)from, 8))) { - __raw_writeb(*(volatile u8 *)from, to); + while (count && !IS_ALIGNED((unsigned long)to, 8)) { + __raw_writeb(*(u8 *)from, to); from++; to++; count--; } while (count >= 8) { - __raw_writeq(*(volatile u64 *)from, to); + __raw_writeq(*(u64 *)from, to); from += 8; to += 8; count -= 8; } while (count) { - __raw_writeb(*(volatile u8 *)from, to); + __raw_writeb(*(u8 *)from, to); from++; to++; count--;
diff --git a/arch/arm64/kernel/module.lds b/arch/arm64/kernel/module.lds index 8949f6c..05881e2 100644 --- a/arch/arm64/kernel/module.lds +++ b/arch/arm64/kernel/module.lds
@@ -1,3 +1,3 @@ SECTIONS { - .plt (NOLOAD) : { BYTE(0) } + .plt : { BYTE(0) } }
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index e917d11..ce1c933 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c
@@ -45,6 +45,7 @@ #include <linux/personality.h> #include <linux/notifier.h> #include <trace/events/power.h> +#include <linux/percpu.h> #include <asm/alternative.h> #include <asm/compat.h> @@ -166,6 +167,70 @@ void machine_restart(char *cmd) while (1); } +/* + * dump a block of kernel memory from around the given address + */ +static void show_data(unsigned long addr, int nbytes, const char *name) +{ + int i, j; + int nlines; + u32 *p; + + /* + * don't attempt to dump non-kernel addresses or + * values that are probably just small negative numbers + */ + if (addr < PAGE_OFFSET || addr > -256UL) + return; + + printk("\n%s: %#lx:\n", name, addr); + + /* + * round address down to a 32 bit boundary + * and always dump a multiple of 32 bytes + */ + p = (u32 *)(addr & ~(sizeof(u32) - 1)); + nbytes += (addr & (sizeof(u32) - 1)); + nlines = (nbytes + 31) / 32; + + + for (i = 0; i < nlines; i++) { + /* + * just display low 16 bits of address to keep + * each line of the dump < 80 characters + */ + printk("%04lx ", (unsigned long)p & 0xffff); + for (j = 0; j < 8; j++) { + u32 data; + if (probe_kernel_address(p, data)) { + pr_cont(" ********"); + } else { + pr_cont(" %08x", data); + } + ++p; + } + pr_cont("\n"); + } +} + +static void show_extra_register_data(struct pt_regs *regs, int nbytes) +{ + mm_segment_t fs; + unsigned int i; + + fs = get_fs(); + set_fs(KERNEL_DS); + show_data(regs->pc - nbytes, nbytes * 2, "PC"); + show_data(regs->regs[30] - nbytes, nbytes * 2, "LR"); + show_data(regs->sp - nbytes, nbytes * 2, "SP"); + for (i = 0; i < 30; i++) { + char name[4]; + snprintf(name, sizeof(name), "X%u", i); + show_data(regs->regs[i] - nbytes, nbytes * 2, name); + } + set_fs(fs); +} + void __show_regs(struct pt_regs *regs) { int i, top_reg; @@ -201,6 +266,8 @@ void __show_regs(struct pt_regs *regs) pr_cont("\n"); } + if (!user_mode(regs)) + show_extra_register_data(regs, 128); printk("\n"); } @@ -331,6 +398,20 @@ void uao_thread_switch(struct task_struct *next) } /* + * We store our current task in sp_el0, which is clobbered by userspace. Keep a + * shadow copy so that we can restore this upon entry from userspace. + * + * This is *only* for exception entry from EL0, and is not valid until we + * __switch_to() a user task. + */ +DEFINE_PER_CPU(struct task_struct *, __entry_task); + +static void entry_task_switch(struct task_struct *next) +{ + __this_cpu_write(__entry_task, next); +} + +/* * Thread switching. */ struct task_struct *__switch_to(struct task_struct *prev, @@ -342,6 +423,7 @@ struct task_struct *__switch_to(struct task_struct *prev, tls_thread_switch(next); hw_breakpoint_thread_switch(next); contextidr_thread_switch(next); + entry_task_switch(next); uao_thread_switch(next); /* @@ -359,27 +441,35 @@ struct task_struct *__switch_to(struct task_struct *prev, unsigned long get_wchan(struct task_struct *p) { struct stackframe frame; - unsigned long stack_page; + unsigned long stack_page, ret = 0; int count = 0; if (!p || p == current || p->state == TASK_RUNNING) return 0; + stack_page = (unsigned long)try_get_task_stack(p); + if (!stack_page) + return 0; + frame.fp = thread_saved_fp(p); frame.sp = thread_saved_sp(p); frame.pc = thread_saved_pc(p); #ifdef CONFIG_FUNCTION_GRAPH_TRACER frame.graph = p->curr_ret_stack; #endif - stack_page = (unsigned long)task_stack_page(p); do { if (frame.sp < stack_page || frame.sp >= stack_page + THREAD_SIZE || unwind_frame(p, &frame)) - return 0; - if (!in_sched_functions(frame.pc)) - return frame.pc; + goto out; + if (!in_sched_functions(frame.pc)) { + ret = frame.pc; + goto out; + } } while (count ++ < 16); - return 0; + +out: + put_task_stack(p); + return ret; } unsigned long arch_align_stack(unsigned long sp)
diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c index 42816be..e8edbf1 100644 --- a/arch/arm64/kernel/psci.c +++ b/arch/arm64/kernel/psci.c
@@ -20,6 +20,7 @@ #include <linux/smp.h> #include <linux/delay.h> #include <linux/psci.h> +#include <linux/mm.h> #include <uapi/linux/psci.h> @@ -45,7 +46,7 @@ static int __init cpu_psci_cpu_prepare(unsigned int cpu) static int cpu_psci_cpu_boot(unsigned int cpu) { - int err = psci_ops.cpu_on(cpu_logical_map(cpu), __pa(secondary_entry)); + int err = psci_ops.cpu_on(cpu_logical_map(cpu), __pa_symbol(secondary_entry)); if (err) pr_err("failed to boot CPU%d (%d)\n", cpu, err);
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c index 8eedeef..a22161c 100644 --- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c
@@ -327,13 +327,13 @@ static int ptrace_hbp_fill_attr_ctrl(unsigned int note_type, struct arch_hw_breakpoint_ctrl ctrl, struct perf_event_attr *attr) { - int err, len, type, disabled = !ctrl.enabled; + int err, len, type, offset, disabled = !ctrl.enabled; attr->disabled = disabled; if (disabled) return 0; - err = arch_bp_generic_fields(ctrl, &len, &type); + err = arch_bp_generic_fields(ctrl, &len, &type, &offset); if (err) return err; @@ -352,6 +352,7 @@ static int ptrace_hbp_fill_attr_ctrl(unsigned int note_type, attr->bp_len = len; attr->bp_type = type; + attr->bp_addr += offset; return 0; } @@ -404,7 +405,7 @@ static int ptrace_hbp_get_addr(unsigned int note_type, if (IS_ERR(bp)) return PTR_ERR(bp); - *addr = bp ? bp->attr.bp_addr : 0; + *addr = bp ? counter_arch_bp(bp)->address : 0; return 0; }
diff --git a/arch/arm64/kernel/return_address.c b/arch/arm64/kernel/return_address.c index 1718706f..12a87f2 100644 --- a/arch/arm64/kernel/return_address.c +++ b/arch/arm64/kernel/return_address.c
@@ -12,6 +12,7 @@ #include <linux/export.h> #include <linux/ftrace.h> +#include <asm/stack_pointer.h> #include <asm/stacktrace.h> struct return_address_data {
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index f534f492..b522209 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c
@@ -42,6 +42,7 @@ #include <linux/of_fdt.h> #include <linux/efi.h> #include <linux/psci.h> +#include <linux/mm.h> #include <asm/acpi.h> #include <asm/fixmap.h> @@ -199,10 +200,10 @@ static void __init request_standard_resources(void) struct memblock_region *region; struct resource *res; - kernel_code.start = virt_to_phys(_text); - kernel_code.end = virt_to_phys(__init_begin - 1); - kernel_data.start = virt_to_phys(_sdata); - kernel_data.end = virt_to_phys(_end - 1); + kernel_code.start = __pa_symbol(_text); + kernel_code.end = __pa_symbol(__init_begin - 1); + kernel_data.start = __pa_symbol(_sdata); + kernel_data.end = __pa_symbol(_end - 1); for_each_memblock(memory, region) { res = alloc_bootmem_low(sizeof(*res)); @@ -291,6 +292,15 @@ void __init setup_arch(char **cmdline_p) smp_init_cpus(); smp_build_mpidr_hash(); +#ifdef CONFIG_ARM64_SW_TTBR0_PAN + /* + * Make sure init_thread_info.ttbr0 always generates translation + * faults in case uaccess_enable() is inadvertently called by the init + * thread. + */ + init_task.thread_info.ttbr0 = __pa_symbol(empty_zero_page); +#endif + #ifdef CONFIG_VT #if defined(CONFIG_VGA_CONSOLE) conswitchp = &vga_con; @@ -329,11 +339,11 @@ subsys_initcall(topology_init); static int dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p) { - u64 const kaslr_offset = kimage_vaddr - KIMAGE_VADDR; + const unsigned long offset = kaslr_offset(); - if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_offset > 0) { - pr_emerg("Kernel Offset: 0x%llx from 0x%lx\n", - kaslr_offset, KIMAGE_VADDR); + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && offset > 0) { + pr_emerg("Kernel Offset: 0x%lx from 0x%lx\n", + offset, KIMAGE_VADDR); } else { pr_emerg("Kernel Offset: disabled\n"); }
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 404dd67..c59e675 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c
@@ -25,6 +25,7 @@ #include <linux/uaccess.h> #include <linux/tracehook.h> #include <linux/ratelimit.h> +#include <linux/syscalls.h> #include <asm/debug-monitors.h> #include <asm/elf.h> @@ -408,7 +409,11 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, * Update the trace code with the current status. */ trace_hardirqs_off(); + do { + /* Check valid user FS if needed */ + addr_limit_user_check(); + if (thread_flags & _TIF_NEED_RESCHED) { schedule(); } else {
diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S index 0030d69..5a4fbcc 100644 --- a/arch/arm64/kernel/sleep.S +++ b/arch/arm64/kernel/sleep.S
@@ -125,9 +125,6 @@ /* load sp from context */ ldr x2, [x0, #CPU_CTX_SP] mov sp, x2 - /* save thread_info */ - and x2, x2, #~(THREAD_SIZE - 1) - msr sp_el0, x2 /* * cpu_do_resume expects x0 to contain context address pointer */
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index cfd33f1..a0d9a8c 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c
@@ -58,6 +58,9 @@ #define CREATE_TRACE_POINTS #include <trace/events/ipi.h> +DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number); +EXPORT_PER_CPU_SYMBOL(cpu_number); + /* * as from 2.5, kernels no longer have an init_tasks structure * so we need some other way of telling a new secondary core @@ -146,6 +149,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) * We need to tell the secondary core where to find its stack and the * page tables. */ + secondary_data.task = idle; secondary_data.stack = task_stack_page(idle) + THREAD_START_SP; update_cpu_boot_status(CPU_MMU_OFF); __flush_dcache_area(&secondary_data, sizeof(secondary_data)); @@ -170,6 +174,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) pr_err("CPU%u: failed to boot: %d\n", cpu, ret); } + secondary_data.task = NULL; secondary_data.stack = NULL; status = READ_ONCE(secondary_data.status); if (ret && status) { @@ -208,7 +213,10 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) asmlinkage notrace void secondary_start_kernel(void) { struct mm_struct *mm = &init_mm; - unsigned int cpu = smp_processor_id(); + unsigned int cpu; + + cpu = task_cpu(current); + set_my_cpu_offset(per_cpu_offset(cpu)); /* * All kernel threads share the same mm context; grab a @@ -217,8 +225,6 @@ asmlinkage notrace void secondary_start_kernel(void) atomic_inc(&mm->mm_count); current->active_mm = mm; - set_my_cpu_offset(per_cpu_offset(smp_processor_id())); - /* * TTBR0 is only used for the identity mapping at this stage. Make it * point to zero page to avoid speculatively fetching new entries. @@ -718,6 +724,8 @@ void __init smp_prepare_cpus(unsigned int max_cpus) */ for_each_possible_cpu(cpu) { + per_cpu(cpu_number, cpu) = cpu; + if (cpu == smp_processor_id()) continue;
diff --git a/arch/arm64/kernel/smp_spin_table.c b/arch/arm64/kernel/smp_spin_table.c index 9a00eee..9303465 100644 --- a/arch/arm64/kernel/smp_spin_table.c +++ b/arch/arm64/kernel/smp_spin_table.c
@@ -21,6 +21,7 @@ #include <linux/of.h> #include <linux/smp.h> #include <linux/types.h> +#include <linux/mm.h> #include <asm/cacheflush.h> #include <asm/cpu_ops.h> @@ -98,7 +99,7 @@ static int smp_spin_table_cpu_prepare(unsigned int cpu) * boot-loader's endianess before jumping. This is mandated by * the boot protocol. */ - writeq_relaxed(__pa(secondary_holding_pen), release_addr); + writeq_relaxed(__pa_symbol(secondary_holding_pen), release_addr); __flush_dcache_area((__force void *)release_addr, sizeof(*release_addr));
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c index 0cc01e0..5201beb 100644 --- a/arch/arm64/kernel/stacktrace.c +++ b/arch/arm64/kernel/stacktrace.c
@@ -22,6 +22,7 @@ #include <linux/stacktrace.h> #include <asm/irq.h> +#include <asm/stack_pointer.h> #include <asm/stacktrace.h> /* @@ -133,7 +134,6 @@ void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame, break; } } -EXPORT_SYMBOL(walk_stackframe); #ifdef CONFIG_STACKTRACE struct stack_trace_data { @@ -186,6 +186,9 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace) struct stack_trace_data data; struct stackframe frame; + if (!try_get_task_stack(tsk)) + return; + data.trace = trace; data.skip = trace->skip; @@ -207,6 +210,8 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace) walk_stackframe(tsk, &frame, save_trace, &data); if (trace->nr_entries < trace->max_entries) trace->entries[trace->nr_entries++] = ULONG_MAX; + + put_task_stack(tsk); } void save_stack_trace(struct stack_trace *trace)
diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c index 1dbf609..e12f2d0 100644 --- a/arch/arm64/kernel/suspend.c +++ b/arch/arm64/kernel/suspend.c
@@ -47,12 +47,6 @@ void notrace __cpu_suspend_exit(void) cpu_uninstall_idmap(); /* - * Restore per-cpu offset before any kernel - * subsystem relying on it has a chance to run. - */ - set_my_cpu_offset(per_cpu_offset(cpu)); - - /* * PSTATE was not saved over suspend/resume, re-enable any detected * features that might not have been set correctly. */
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index 694f6de..ed73426 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c
@@ -19,10 +19,24 @@ #include <linux/nodemask.h> #include <linux/of.h> #include <linux/sched.h> +#include <linux/sched.h> +#include <linux/sched_energy.h> #include <asm/cputype.h> #include <asm/topology.h> +static DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE; + +unsigned long scale_cpu_capacity(struct sched_domain *sd, int cpu) +{ + return per_cpu(cpu_scale, cpu); +} + +static void set_capacity_scale(unsigned int cpu, unsigned long capacity) +{ + per_cpu(cpu_scale, cpu) = capacity; +} + static int __init get_cpu_for_node(struct device_node *node) { struct device_node *cpu_node; @@ -206,11 +220,72 @@ static int __init parse_dt_topology(void) struct cpu_topology cpu_topology[NR_CPUS]; EXPORT_SYMBOL_GPL(cpu_topology); +/* sd energy functions */ +static inline +const struct sched_group_energy * const cpu_cluster_energy(int cpu) +{ + struct sched_group_energy *sge = sge_array[cpu][SD_LEVEL1]; + + if (!sge) { + pr_warn("Invalid sched_group_energy for Cluster%d\n", cpu); + return NULL; + } + + return sge; +} + +static inline +const struct sched_group_energy * const cpu_core_energy(int cpu) +{ + struct sched_group_energy *sge = sge_array[cpu][SD_LEVEL0]; + + if (!sge) { + pr_warn("Invalid sched_group_energy for CPU%d\n", cpu); + return NULL; + } + + return sge; +} + const struct cpumask *cpu_coregroup_mask(int cpu) { return &cpu_topology[cpu].core_sibling; } +static int cpu_cpu_flags(void) +{ + return SD_ASYM_CPUCAPACITY; +} + +static inline int cpu_corepower_flags(void) +{ + return SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN | \ + SD_SHARE_CAP_STATES; +} + +static struct sched_domain_topology_level arm64_topology[] = { +#ifdef CONFIG_SCHED_MC + { cpu_coregroup_mask, cpu_corepower_flags, cpu_core_energy, SD_INIT_NAME(MC) }, +#endif + { cpu_cpu_mask, cpu_cpu_flags, cpu_cluster_energy, SD_INIT_NAME(DIE) }, + { NULL, }, +}; + +static void update_cpu_capacity(unsigned int cpu) +{ + unsigned long capacity = SCHED_CAPACITY_SCALE; + + if (cpu_core_energy(cpu)) { + int max_cap_idx = cpu_core_energy(cpu)->nr_cap_states - 1; + capacity = cpu_core_energy(cpu)->cap_states[max_cap_idx].cap; + } + + set_capacity_scale(cpu, capacity); + + pr_info("CPU%d: update cpu_capacity %lu\n", + cpu, arch_scale_cpu_capacity(NULL, cpu)); +} + static void update_siblings_masks(unsigned int cpuid) { struct cpu_topology *cpu_topo, *cpuid_topo = &cpu_topology[cpuid]; @@ -272,6 +347,7 @@ void store_cpu_topology(unsigned int cpuid) topology_populated: update_siblings_masks(cpuid); + update_cpu_capacity(cpuid); } static void __init reset_cpu_topology(void) @@ -302,4 +378,8 @@ void __init init_cpu_topology(void) */ if (of_have_populated_dt() && parse_dt_topology()) reset_cpu_topology(); + else + set_sched_topology(arm64_topology); + + init_sched_energy_costs(); }
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c index 5963be2..86a87ae 100644 --- a/arch/arm64/kernel/traps.c +++ b/arch/arm64/kernel/traps.c
@@ -33,11 +33,13 @@ #include <linux/syscalls.h> #include <asm/atomic.h> +#include <asm/barrier.h> #include <asm/bug.h> #include <asm/debug-monitors.h> #include <asm/esr.h> #include <asm/insn.h> #include <asm/traps.h> +#include <asm/stack_pointer.h> #include <asm/stacktrace.h> #include <asm/exception.h> #include <asm/system_misc.h> @@ -147,6 +149,9 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) if (!tsk) tsk = current; + if (!try_get_task_stack(tsk)) + return; + /* * Switching between stacks is valid when tracing current and in * non-preemptible context. @@ -212,6 +217,8 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) stack + sizeof(struct pt_regs)); } } + + put_task_stack(tsk); } void show_stack(struct task_struct *tsk, unsigned long *sp) @@ -227,10 +234,9 @@ void show_stack(struct task_struct *tsk, unsigned long *sp) #endif #define S_SMP " SMP" -static int __die(const char *str, int err, struct thread_info *thread, - struct pt_regs *regs) +static int __die(const char *str, int err, struct pt_regs *regs) { - struct task_struct *tsk = thread->task; + struct task_struct *tsk = current; static int die_counter; int ret; @@ -245,7 +251,8 @@ static int __die(const char *str, int err, struct thread_info *thread, print_modules(); __show_regs(regs); pr_emerg("Process %.*s (pid: %d, stack limit = 0x%p)\n", - TASK_COMM_LEN, tsk->comm, task_pid_nr(tsk), thread + 1); + TASK_COMM_LEN, tsk->comm, task_pid_nr(tsk), + end_of_stack(tsk)); if (!user_mode(regs)) { dump_mem(KERN_EMERG, "Stack: ", regs->sp, @@ -264,7 +271,6 @@ static DEFINE_RAW_SPINLOCK(die_lock); */ void die(const char *str, struct pt_regs *regs, int err) { - struct thread_info *thread = current_thread_info(); int ret; oops_enter(); @@ -272,9 +278,9 @@ void die(const char *str, struct pt_regs *regs, int err) raw_spin_lock_irq(&die_lock); console_verbose(); bust_spinlocks(1); - ret = __die(str, err, thread, regs); + ret = __die(str, err, regs); - if (regs && kexec_should_crash(thread->task)) + if (regs && kexec_should_crash(current)) crash_kexec(regs); bust_spinlocks(0); @@ -435,9 +441,10 @@ int cpu_enable_cache_maint_trap(void *__unused) } #define __user_cache_maint(insn, address, res) \ - if (address >= user_addr_max()) \ + if (address >= user_addr_max()) { \ res = -EFAULT; \ - else \ + } else { \ + uaccess_ttbr0_enable(); \ asm volatile ( \ "1: " insn ", %1\n" \ " mov %w0, #0\n" \ @@ -449,7 +456,9 @@ int cpu_enable_cache_maint_trap(void *__unused) " .popsection\n" \ _ASM_EXTABLE(1b, 3b) \ : "=r" (res) \ - : "r" (address), "i" (-EFAULT) ) + : "r" (address), "i" (-EFAULT)); \ + uaccess_ttbr0_disable(); \ + } static void user_cache_maint_handler(unsigned int esr, struct pt_regs *regs) { @@ -492,6 +501,25 @@ static void ctr_read_handler(unsigned int esr, struct pt_regs *regs) regs->pc += 4; } +static void cntvct_read_handler(unsigned int esr, struct pt_regs *regs) +{ + int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> ESR_ELx_SYS64_ISS_RT_SHIFT; + + isb(); + if (rt != 31) + regs->regs[rt] = arch_counter_get_cntvct(); + regs->pc += 4; +} + +static void cntfrq_read_handler(unsigned int esr, struct pt_regs *regs) +{ + int rt = (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> ESR_ELx_SYS64_ISS_RT_SHIFT; + + if (rt != 31) + regs->regs[rt] = read_sysreg(cntfrq_el0); + regs->pc += 4; +} + struct sys64_hook { unsigned int esr_mask; unsigned int esr_val; @@ -510,6 +538,18 @@ static struct sys64_hook sys64_hooks[] = { .esr_val = ESR_ELx_SYS64_ISS_SYS_CTR_READ, .handler = ctr_read_handler, }, + { + /* Trap read access to CNTVCT_EL0 */ + .esr_mask = ESR_ELx_SYS64_ISS_SYS_OP_MASK, + .esr_val = ESR_ELx_SYS64_ISS_SYS_CNTVCT, + .handler = cntvct_read_handler, + }, + { + /* Trap read access to CNTFRQ_EL0 */ + .esr_mask = ESR_ELx_SYS64_ISS_SYS_OP_MASK, + .esr_val = ESR_ELx_SYS64_ISS_SYS_CNTFRQ, + .handler = cntfrq_read_handler, + }, {}, };
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c index 4bcfe01..7492d90 100644 --- a/arch/arm64/kernel/vdso.c +++ b/arch/arm64/kernel/vdso.c
@@ -123,6 +123,7 @@ static int __init vdso_init(void) { int i; struct page **vdso_pagelist; + unsigned long pfn; if (memcmp(&vdso_start, "\177ELF", 4)) { pr_err("vDSO is not a valid ELF object!\n"); @@ -140,11 +141,14 @@ static int __init vdso_init(void) return -ENOMEM; /* Grab the vDSO data page. */ - vdso_pagelist[0] = pfn_to_page(PHYS_PFN(__pa(vdso_data))); + vdso_pagelist[0] = phys_to_page(__pa_symbol(vdso_data)); + /* Grab the vDSO code pages. */ + pfn = sym_to_pfn(&vdso_start); + for (i = 0; i < vdso_pages; i++) - vdso_pagelist[i + 1] = pfn_to_page(PHYS_PFN(__pa(&vdso_start)) + i); + vdso_pagelist[i + 1] = pfn_to_page(pfn + i); vdso_spec[0].pages = &vdso_pagelist[0]; vdso_spec[1].pages = &vdso_pagelist[1]; @@ -216,10 +220,8 @@ void update_vsyscall(struct timekeeper *tk) if (!use_syscall) { /* tkr_mono.cycle_last == tkr_raw.cycle_last */ vdso_data->cs_cycle_last = tk->tkr_mono.cycle_last; - vdso_data->raw_time_sec = tk->raw_time.tv_sec; - vdso_data->raw_time_nsec = (tk->raw_time.tv_nsec << - tk->tkr_raw.shift) + - tk->tkr_raw.xtime_nsec; + vdso_data->raw_time_sec = tk->raw_sec; + vdso_data->raw_time_nsec = tk->tkr_raw.xtime_nsec; vdso_data->xtime_clock_sec = tk->xtime_sec; vdso_data->xtime_clock_nsec = tk->tkr_mono.xtime_nsec; vdso_data->cs_mono_mult = tk->tkr_mono.mult;
diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile index 62c84f7..88fef38 100644 --- a/arch/arm64/kernel/vdso/Makefile +++ b/arch/arm64/kernel/vdso/Makefile
@@ -14,6 +14,7 @@ ccflags-y := -shared -fno-common -fno-builtin ccflags-y += -nostdlib -Wl,-soname=linux-vdso.so.1 \ $(call cc-ldoption, -Wl$(comma)--hash-style=sysv) +ccflags-y += $(DISABLE_LTO) # Disable gcov profiling for VDSO code GCOV_PROFILE := n
diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S index 76320e9..c39872a 100644 --- a/arch/arm64/kernel/vdso/gettimeofday.S +++ b/arch/arm64/kernel/vdso/gettimeofday.S
@@ -309,7 +309,7 @@ b.ne 4f ldr x2, 6f 2: - cbz w1, 3f + cbz x1, 3f stp xzr, x2, [x1] 3: /* res == NULL. */
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S index 6a58455..402a621 100644 --- a/arch/arm64/kernel/vmlinux.lds.S +++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -60,7 +60,7 @@ #define TRAMP_TEXT \ . = ALIGN(PAGE_SIZE); \ VMLINUX_SYMBOL(__entry_tramp_text_start) = .; \ - *(.entry.tramp.text) \ + KEEP(*(.entry.tramp.text)) \ . = ALIGN(PAGE_SIZE); \ VMLINUX_SYMBOL(__entry_tramp_text_end) = .; #else @@ -179,11 +179,11 @@ . = ALIGN(4); .altinstructions : { __alt_instructions = .; - *(.altinstructions) + KEEP(*(.altinstructions)) __alt_instructions_end = .; } .altinstr_replacement : { - *(.altinstr_replacement) + KEEP(*(.altinstr_replacement)) } .rela : ALIGN(8) { *(.rela .rela*) @@ -228,6 +228,11 @@ swapper_pg_dir = .; . += SWAPPER_DIR_SIZE; +#ifdef CONFIG_ARM64_SW_TTBR0_PAN + reserved_ttbr0 = .; + . += RESERVED_TTBR0_SIZE; +#endif + #ifdef CONFIG_UNMAP_KERNEL_AT_EL0 tramp_pg_dir = .; . += PAGE_SIZE;
diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile index 48b0354..c470a7c 100644 --- a/arch/arm64/kvm/hyp/Makefile +++ b/arch/arm64/kvm/hyp/Makefile
@@ -4,6 +4,10 @@ ccflags-y += -fno-stack-protector -DDISABLE_BRANCH_PROFILING +ifeq ($(cc-name),clang) +ccflags-y += -fno-jump-tables +endif + KVM=../../../../virt/kvm obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v2-sr.o
diff --git a/arch/arm64/lib/clear_user.S b/arch/arm64/lib/clear_user.S index efbf610e..b581e16 100644 --- a/arch/arm64/lib/clear_user.S +++ b/arch/arm64/lib/clear_user.S
@@ -17,10 +17,7 @@ */ #include <linux/linkage.h> -#include <asm/alternative.h> -#include <asm/assembler.h> -#include <asm/cpufeature.h> -#include <asm/sysreg.h> +#include <asm/uaccess.h> .text @@ -33,8 +30,7 @@ * Alignment fixed up by hardware. */ ENTRY(__arch_clear_user) -ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(0)), ARM64_ALT_PAN_NOT_UAO, \ - CONFIG_ARM64_PAN) + uaccess_enable_not_uao x2, x3, x4 mov x2, x1 // save the size for fixup return subs x1, x1, #8 b.mi 2f @@ -54,8 +50,7 @@ b.mi 5f uao_user_alternative 9f, strb, sttrb, wzr, x0, 0 5: mov x0, #0 -ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_ALT_PAN_NOT_UAO, \ - CONFIG_ARM64_PAN) + uaccess_disable_not_uao x2, x3 ret ENDPROC(__arch_clear_user)
diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S index 4fd67ea..c7a7d96 100644 --- a/arch/arm64/lib/copy_from_user.S +++ b/arch/arm64/lib/copy_from_user.S
@@ -16,11 +16,8 @@ #include <linux/linkage.h> -#include <asm/alternative.h> -#include <asm/assembler.h> #include <asm/cache.h> -#include <asm/cpufeature.h> -#include <asm/sysreg.h> +#include <asm/uaccess.h> /* * Copy from user space to a kernel buffer (alignment handled by the hardware) @@ -67,12 +64,10 @@ end .req x5 ENTRY(__arch_copy_from_user) -ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(0)), ARM64_ALT_PAN_NOT_UAO, \ - CONFIG_ARM64_PAN) + uaccess_enable_not_uao x3, x4, x5 add end, x0, x2 #include "copy_template.S" -ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_ALT_PAN_NOT_UAO, \ - CONFIG_ARM64_PAN) + uaccess_disable_not_uao x3, x4 mov x0, #0 // Nothing to copy ret ENDPROC(__arch_copy_from_user)
diff --git a/arch/arm64/lib/copy_in_user.S b/arch/arm64/lib/copy_in_user.S index 841bf8f..800779e 100644 --- a/arch/arm64/lib/copy_in_user.S +++ b/arch/arm64/lib/copy_in_user.S
@@ -18,11 +18,8 @@ #include <linux/linkage.h> -#include <asm/alternative.h> -#include <asm/assembler.h> #include <asm/cache.h> -#include <asm/cpufeature.h> -#include <asm/sysreg.h> +#include <asm/uaccess.h> /* * Copy from user space to user space (alignment handled by the hardware) @@ -68,12 +65,10 @@ end .req x5 ENTRY(__arch_copy_in_user) -ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(0)), ARM64_ALT_PAN_NOT_UAO, \ - CONFIG_ARM64_PAN) + uaccess_enable_not_uao x3, x4, x5 add end, x0, x2 #include "copy_template.S" -ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_ALT_PAN_NOT_UAO, \ - CONFIG_ARM64_PAN) + uaccess_disable_not_uao x3, x4 mov x0, #0 ret ENDPROC(__arch_copy_in_user)
diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S index 7a7efe2..f6cfcc0 100644 --- a/arch/arm64/lib/copy_to_user.S +++ b/arch/arm64/lib/copy_to_user.S
@@ -16,11 +16,8 @@ #include <linux/linkage.h> -#include <asm/alternative.h> -#include <asm/assembler.h> #include <asm/cache.h> -#include <asm/cpufeature.h> -#include <asm/sysreg.h> +#include <asm/uaccess.h> /* * Copy to user space from a kernel buffer (alignment handled by the hardware) @@ -66,12 +63,10 @@ end .req x5 ENTRY(__arch_copy_to_user) -ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(0)), ARM64_ALT_PAN_NOT_UAO, \ - CONFIG_ARM64_PAN) + uaccess_enable_not_uao x3, x4, x5 add end, x0, x2 #include "copy_template.S" -ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_ALT_PAN_NOT_UAO, \ - CONFIG_ARM64_PAN) + uaccess_disable_not_uao x3, x4 mov x0, #0 ret ENDPROC(__arch_copy_to_user)
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S index 58b5a90..82ecb6c 100644 --- a/arch/arm64/mm/cache.S +++ b/arch/arm64/mm/cache.S
@@ -23,6 +23,7 @@ #include <asm/assembler.h> #include <asm/cpufeature.h> #include <asm/alternative.h> +#include <asm/uaccess.h> /* * flush_icache_range(start,end) @@ -48,6 +49,7 @@ * - end - virtual end address of region */ ENTRY(__flush_cache_user_range) + uaccess_ttbr0_enable x2, x3, x4 dcache_line_size x2, x3 sub x3, x2, #1 bic x4, x0, x3 @@ -69,10 +71,12 @@ dsb ish isb mov x0, #0 +1: + uaccess_ttbr0_disable x1, x2 ret 9: mov x0, #-EFAULT - ret + b 1b ENDPROC(flush_icache_range) ENDPROC(__flush_cache_user_range)
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c index 62d976e..c841cce 100644 --- a/arch/arm64/mm/context.c +++ b/arch/arm64/mm/context.c
@@ -233,7 +233,12 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu) arm64_apply_bp_hardening(); - cpu_switch_mm(mm->pgd, mm); + /* + * Defer TTBR0_EL1 setting for user threads to uaccess_enable() when + * emulating PAN. + */ + if (!system_uses_ttbr0_pan()) + cpu_switch_mm(mm->pgd, mm); } /* Errata workaround post TTBRx_EL1 update. */
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index cab3574..6dda5ff 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c
@@ -174,7 +174,7 @@ static void *__dma_alloc(struct device *dev, size_t size, /* create a coherent mapping */ page = virt_to_page(ptr); coherent_ptr = dma_common_contiguous_remap(page, size, VM_USERMAP, - prot, NULL); + prot, __builtin_return_address(0)); if (!coherent_ptr) goto no_map;
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index ad49ae8..b9bff22 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c
@@ -286,13 +286,19 @@ static int __do_page_fault(struct mm_struct *mm, unsigned long addr, return fault; } -static inline bool is_permission_fault(unsigned int esr) +static inline bool is_permission_fault(unsigned int esr, struct pt_regs *regs) { unsigned int ec = ESR_ELx_EC(esr); unsigned int fsc_type = esr & ESR_ELx_FSC_TYPE; - return (ec == ESR_ELx_EC_DABT_CUR && fsc_type == ESR_ELx_FSC_PERM) || - (ec == ESR_ELx_EC_IABT_CUR && fsc_type == ESR_ELx_FSC_PERM); + if (ec != ESR_ELx_EC_DABT_CUR && ec != ESR_ELx_EC_IABT_CUR) + return false; + + if (system_uses_ttbr0_pan()) + return fsc_type == ESR_ELx_FSC_FAULT && + (regs->pstate & PSR_PAN_BIT); + else + return fsc_type == ESR_ELx_FSC_PERM; } static bool is_el0_instruction_abort(unsigned int esr) @@ -332,7 +338,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr, mm_flags |= FAULT_FLAG_WRITE; } - if (is_permission_fault(esr) && (addr < TASK_SIZE)) { + if (addr < TASK_SIZE && is_permission_fault(esr, regs)) { /* regs->orig_addr_limit may be 0 if we entered from EL0 */ if (regs->orig_addr_limit == KERNEL_DS) die("Accessing user space memory with fs=KERNEL_DS", regs, esr);
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index fa6b2fa..c410c9e 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c
@@ -36,6 +36,7 @@ #include <linux/efi.h> #include <linux/swiotlb.h> #include <linux/vmalloc.h> +#include <linux/mm.h> #include <asm/boot.h> #include <asm/fixmap.h> @@ -213,8 +214,8 @@ void __init arm64_memblock_init(void) * linear mapping. Take care not to clip the kernel which may be * high in memory. */ - memblock_remove(max_t(u64, memstart_addr + linear_region_size, __pa(_end)), - ULLONG_MAX); + memblock_remove(max_t(u64, memstart_addr + linear_region_size, + __pa_symbol(_end)), ULLONG_MAX); if (memstart_addr + linear_region_size < memblock_end_of_DRAM()) { /* ensure that memstart_addr remains sufficiently aligned */ memstart_addr = round_up(memblock_end_of_DRAM() - linear_region_size, @@ -229,7 +230,7 @@ void __init arm64_memblock_init(void) */ if (memory_limit != (phys_addr_t)ULLONG_MAX) { memblock_mem_limit_remove_map(memory_limit); - memblock_add(__pa(_text), (u64)(_end - _text)); + memblock_add(__pa_symbol(_text), (u64)(_end - _text)); } if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && initrd_start) { @@ -282,7 +283,7 @@ void __init arm64_memblock_init(void) * Register the kernel text, kernel data, initrd, and initial * pagetables with memblock. */ - memblock_reserve(__pa(_text), _end - _text); + memblock_reserve(__pa_symbol(_text), _end - _text); #ifdef CONFIG_BLK_DEV_INITRD if (initrd_start) { memblock_reserve(initrd_start, initrd_end - initrd_start); @@ -492,7 +493,8 @@ void __init mem_init(void) void free_initmem(void) { - free_reserved_area(__va(__pa(__init_begin)), __va(__pa(__init_end)), + free_reserved_area(lm_alias(__init_begin), + lm_alias(__init_end), 0, "unused kernel"); /* * Unmap the __init region but leave the VM area in place. This
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c index 757009d..201d918 100644 --- a/arch/arm64/mm/kasan_init.c +++ b/arch/arm64/mm/kasan_init.c
@@ -15,6 +15,7 @@ #include <linux/kernel.h> #include <linux/memblock.h> #include <linux/start_kernel.h> +#include <linux/mm.h> #include <asm/mmu_context.h> #include <asm/kernel-pgtable.h> @@ -26,6 +27,13 @@ static pgd_t tmp_pg_dir[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE); +/* + * The p*d_populate functions call virt_to_phys implicitly so they can't be used + * directly on kernel symbols (bm_p*d). All the early functions are called too + * early to use lm_alias so __p*d_populate functions must be used to populate + * with the physical address from __pa_symbol. + */ + static void __init kasan_early_pte_populate(pmd_t *pmd, unsigned long addr, unsigned long end) { @@ -33,12 +41,12 @@ static void __init kasan_early_pte_populate(pmd_t *pmd, unsigned long addr, unsigned long next; if (pmd_none(*pmd)) - pmd_populate_kernel(&init_mm, pmd, kasan_zero_pte); + __pmd_populate(pmd, __pa_symbol(kasan_zero_pte), PMD_TYPE_TABLE); pte = pte_offset_kimg(pmd, addr); do { next = addr + PAGE_SIZE; - set_pte(pte, pfn_pte(virt_to_pfn(kasan_zero_page), + set_pte(pte, pfn_pte(sym_to_pfn(kasan_zero_page), PAGE_KERNEL)); } while (pte++, addr = next, addr != end && pte_none(*pte)); } @@ -51,7 +59,7 @@ static void __init kasan_early_pmd_populate(pud_t *pud, unsigned long next; if (pud_none(*pud)) - pud_populate(&init_mm, pud, kasan_zero_pmd); + __pud_populate(pud, __pa_symbol(kasan_zero_pmd), PMD_TYPE_TABLE); pmd = pmd_offset_kimg(pud, addr); do { @@ -68,7 +76,7 @@ static void __init kasan_early_pud_populate(pgd_t *pgd, unsigned long next; if (pgd_none(*pgd)) - pgd_populate(&init_mm, pgd, kasan_zero_pud); + __pgd_populate(pgd, __pa_symbol(kasan_zero_pud), PUD_TYPE_TABLE); pud = pud_offset_kimg(pgd, addr); do { @@ -148,7 +156,7 @@ void __init kasan_init(void) */ memcpy(tmp_pg_dir, swapper_pg_dir, sizeof(tmp_pg_dir)); dsb(ishst); - cpu_replace_ttbr1(tmp_pg_dir); + cpu_replace_ttbr1(lm_alias(tmp_pg_dir)); clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END); @@ -199,10 +207,10 @@ void __init kasan_init(void) */ for (i = 0; i < PTRS_PER_PTE; i++) set_pte(&kasan_zero_pte[i], - pfn_pte(virt_to_pfn(kasan_zero_page), PAGE_KERNEL_RO)); + pfn_pte(sym_to_pfn(kasan_zero_page), PAGE_KERNEL_RO)); memset(kasan_zero_page, 0, PAGE_SIZE); - cpu_replace_ttbr1(swapper_pg_dir); + cpu_replace_ttbr1(lm_alias(swapper_pg_dir)); /* At this point kasan is fully initialized. Enable error messages */ init_task.kasan_depth = 0;
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 0a56898..7ae943e 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c
@@ -30,6 +30,7 @@ #include <linux/io.h> #include <linux/slab.h> #include <linux/stop_machine.h> +#include <linux/mm.h> #include <asm/barrier.h> #include <asm/cputype.h> @@ -319,8 +320,8 @@ static void create_mapping_late(phys_addr_t phys, unsigned long virt, static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end) { - unsigned long kernel_start = __pa(_text); - unsigned long kernel_end = __pa(__init_begin); + unsigned long kernel_start = __pa_symbol(_text); + unsigned long kernel_end = __pa_symbol(__init_begin); /* * Take care not to create a writable alias for the @@ -387,21 +388,21 @@ void mark_rodata_ro(void) unsigned long section_size; section_size = (unsigned long)_etext - (unsigned long)_text; - create_mapping_late(__pa(_text), (unsigned long)_text, + create_mapping_late(__pa_symbol(_text), (unsigned long)_text, section_size, PAGE_KERNEL_ROX); /* * mark .rodata as read only. Use __init_begin rather than __end_rodata * to cover NOTES and EXCEPTION_TABLE. */ section_size = (unsigned long)__init_begin - (unsigned long)__start_rodata; - create_mapping_late(__pa(__start_rodata), (unsigned long)__start_rodata, + create_mapping_late(__pa_symbol(__start_rodata), (unsigned long)__start_rodata, section_size, PAGE_KERNEL_RO); } static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end, pgprot_t prot, struct vm_struct *vma) { - phys_addr_t pa_start = __pa(va_start); + phys_addr_t pa_start = __pa_symbol(va_start); unsigned long size = va_end - va_start; BUG_ON(!PAGE_ALIGNED(pa_start)); @@ -480,7 +481,7 @@ static void __init map_kernel(pgd_t *pgd) */ BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES)); set_pud(pud_set_fixmap_offset(pgd, FIXADDR_START), - __pud(__pa(bm_pmd) | PUD_TYPE_TABLE)); + __pud(__pa_symbol(bm_pmd) | PUD_TYPE_TABLE)); pud_clear_fixmap(); } else { BUG(); @@ -511,7 +512,7 @@ void __init paging_init(void) */ cpu_replace_ttbr1(__va(pgd_phys)); memcpy(swapper_pg_dir, pgd, PGD_SIZE); - cpu_replace_ttbr1(swapper_pg_dir); + cpu_replace_ttbr1(lm_alias(swapper_pg_dir)); pgd_clear_fixmap(); memblock_free(pgd_phys, PAGE_SIZE); @@ -520,7 +521,7 @@ void __init paging_init(void) * We only reuse the PGD from the swapper_pg_dir, not the pud + pmd * allocated with it. */ - memblock_free(__pa(swapper_pg_dir) + PAGE_SIZE, + memblock_free(__pa_symbol(swapper_pg_dir) + PAGE_SIZE, SWAPPER_DIR_SIZE - PAGE_SIZE); } @@ -631,6 +632,12 @@ static inline pte_t * fixmap_pte(unsigned long addr) return &bm_pte[pte_index(addr)]; } +/* + * The p*d_populate functions call virt_to_phys implicitly so they can't be used + * directly on kernel symbols (bm_p*d). This function is called too early to use + * lm_alias so __p*d_populate functions must be used to populate with the + * physical address from __pa_symbol. + */ void __init early_fixmap_init(void) { pgd_t *pgd; @@ -640,7 +647,7 @@ void __init early_fixmap_init(void) pgd = pgd_offset_k(addr); if (CONFIG_PGTABLE_LEVELS > 3 && - !(pgd_none(*pgd) || pgd_page_paddr(*pgd) == __pa(bm_pud))) { + !(pgd_none(*pgd) || pgd_page_paddr(*pgd) == __pa_symbol(bm_pud))) { /* * We only end up here if the kernel mapping and the fixmap * share the top level pgd entry, which should only happen on @@ -649,12 +656,14 @@ void __init early_fixmap_init(void) BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES)); pud = pud_offset_kimg(pgd, addr); } else { - pgd_populate(&init_mm, pgd, bm_pud); + if (pgd_none(*pgd)) + __pgd_populate(pgd, __pa_symbol(bm_pud), PUD_TYPE_TABLE); pud = fixmap_pud(addr); } - pud_populate(&init_mm, pud, bm_pmd); + if (pud_none(*pud)) + __pud_populate(pud, __pa_symbol(bm_pmd), PMD_TYPE_TABLE); pmd = fixmap_pmd(addr); - pmd_populate_kernel(&init_mm, pmd, bm_pte); + __pmd_populate(pmd, __pa_symbol(bm_pte), PMD_TYPE_TABLE); /* * The boot-ioremap range spans multiple pmds, for which
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S index 18d96d3..b5f5e98 100644 --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S
@@ -70,11 +70,14 @@ mrs x8, mdscr_el1 mrs x9, oslsr_el1 mrs x10, sctlr_el1 + mrs x11, tpidr_el1 + mrs x12, sp_el0 stp x2, x3, [x0] stp x4, xzr, [x0, #16] stp x5, x6, [x0, #32] stp x7, x8, [x0, #48] stp x9, x10, [x0, #64] + stp x11, x12, [x0, #80] ret ENDPROC(cpu_do_suspend) @@ -90,6 +93,7 @@ ldp x6, x8, [x0, #32] ldp x9, x10, [x0, #48] ldp x11, x12, [x0, #64] + ldp x13, x14, [x0, #80] msr tpidr_el0, x2 msr tpidrro_el0, x3 msr contextidr_el1, x4 @@ -112,6 +116,8 @@ msr mdscr_el1, x10 msr sctlr_el1, x12 + msr tpidr_el1, x13 + msr sp_el0, x14 /* * Restore oslsr_el1 by writing oslar_el1 */ @@ -134,6 +140,9 @@ ENTRY(cpu_do_switch_mm) mrs x2, ttbr1_el1 mmid x1, x1 // get mm->context.id +#ifdef CONFIG_ARM64_SW_TTBR0_PAN + bfi x0, x1, #48, #16 // set the ASID field in TTBR0 +#endif bfi x2, x1, #48, #16 // set the ASID msr ttbr1_el1, x2 // in TTBR1 (since TCR.A1 is set) isb
diff --git a/arch/arm64/xen/hypercall.S b/arch/arm64/xen/hypercall.S index 329c8027..69711f2 100644 --- a/arch/arm64/xen/hypercall.S +++ b/arch/arm64/xen/hypercall.S
@@ -49,6 +49,7 @@ #include <linux/linkage.h> #include <asm/assembler.h> +#include <asm/uaccess.h> #include <xen/interface/xen.h> @@ -91,6 +92,20 @@ mov x2, x3 mov x3, x4 mov x4, x5 + /* + * Privcmd calls are issued by the userspace. The kernel needs to + * enable access to TTBR0_EL1 as the hypervisor would issue stage 1 + * translations to user memory via AT instructions. Since AT + * instructions are not affected by the PAN bit (ARMv8.1), we only + * need the explicit uaccess_enable/disable if the TTBR0 PAN emulation + * is enabled (it implies that hardware UAO and PAN disabled). + */ + uaccess_ttbr0_enable x6, x7, x8 hvc XEN_IMM + + /* + * Disable userspace access from kernel once the hyp call completed. + */ + uaccess_ttbr0_disable x6, x7 ret ENDPROC(privcmd_call);
diff --git a/arch/avr32/include/uapi/asm/socket.h b/arch/avr32/include/uapi/asm/socket.h index 1fd147f..5f10f9b 100644 --- a/arch/avr32/include/uapi/asm/socket.h +++ b/arch/avr32/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@ #define SO_CNX_ADVICE 53 +#define SO_COOKIE 57 + #endif /* _UAPI__ASM_AVR32_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h index afbc98f0..ed960d3 100644 --- a/arch/frv/include/uapi/asm/socket.h +++ b/arch/frv/include/uapi/asm/socket.h
@@ -90,5 +90,7 @@ #define SO_CNX_ADVICE 53 +#define SO_COOKIE 57 + #endif /* _ASM_SOCKET_H */
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h index 0018fad..9790d13 100644 --- a/arch/ia64/include/uapi/asm/socket.h +++ b/arch/ia64/include/uapi/asm/socket.h
@@ -99,4 +99,6 @@ #define SO_CNX_ADVICE 53 +#define SO_COOKIE 57 + #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/ia64/kernel/Makefile b/arch/ia64/kernel/Makefile index 3686d6a..9edda5466 100644 --- a/arch/ia64/kernel/Makefile +++ b/arch/ia64/kernel/Makefile
@@ -50,32 +50,10 @@ # The gate DSO image is built using a special linker script. include $(src)/Makefile.gate -# Calculate NR_IRQ = max(IA64_NATIVE_NR_IRQS, XEN_NR_IRQS, ...) based on config -define sed-y - "/^->/{s:^->\([^ ]*\) [\$$#]*\([^ ]*\) \(.*\):#define \1 \2 /* \3 */:; s:->::; p;}" -endef -quiet_cmd_nr_irqs = GEN $@ -define cmd_nr_irqs - (set -e; \ - echo "#ifndef __ASM_NR_IRQS_H__"; \ - echo "#define __ASM_NR_IRQS_H__"; \ - echo "/*"; \ - echo " * DO NOT MODIFY."; \ - echo " *"; \ - echo " * This file was generated by Kbuild"; \ - echo " *"; \ - echo " */"; \ - echo ""; \ - sed -ne $(sed-y) $<; \ - echo ""; \ - echo "#endif" ) > $@ -endef - # We use internal kbuild rules to avoid the "is up to date" message from make arch/$(SRCARCH)/kernel/nr-irqs.s: arch/$(SRCARCH)/kernel/nr-irqs.c $(Q)mkdir -p $(dir $@) $(call if_changed_dep,cc_s_c) -include/generated/nr-irqs.h: arch/$(SRCARCH)/kernel/nr-irqs.s - $(Q)mkdir -p $(dir $@) - $(call cmd,nr_irqs) +include/generated/nr-irqs.h: arch/$(SRCARCH)/kernel/nr-irqs.s FORCE + $(call filechk,offsets,__ASM_NR_IRQS_H__)
diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h index 5fe42fc..ad25676 100644 --- a/arch/m32r/include/uapi/asm/socket.h +++ b/arch/m32r/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@ #define SO_CNX_ADVICE 53 +#define SO_COOKIE 57 + #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h index 2027240a..2f106d0 100644 --- a/arch/mips/include/uapi/asm/socket.h +++ b/arch/mips/include/uapi/asm/socket.h
@@ -108,4 +108,6 @@ #define SO_CNX_ADVICE 53 +#define SO_COOKIE 57 + #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h index 5129f23..69f9618 100644 --- a/arch/mn10300/include/uapi/asm/socket.h +++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@ #define SO_CNX_ADVICE 53 +#define SO_COOKIE 57 + #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h index 9c935d717d..b96a193 100644 --- a/arch/parisc/include/uapi/asm/socket.h +++ b/arch/parisc/include/uapi/asm/socket.h
@@ -89,4 +89,6 @@ #define SO_CNX_ADVICE 0x402E +#define SO_COOKIE 0x4032 + #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h index 1672e33..e78550f 100644 --- a/arch/powerpc/include/uapi/asm/socket.h +++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@ #define SO_CNX_ADVICE 53 +#define SO_COOKIE 57 + #endif /* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/powerpc/platforms/pseries/cmm.c b/arch/powerpc/platforms/pseries/cmm.c index 66e7227..85018a1 100644 --- a/arch/powerpc/platforms/pseries/cmm.c +++ b/arch/powerpc/platforms/pseries/cmm.c
@@ -708,7 +708,7 @@ static void cmm_exit(void) * Return value: * 0 on success / other on failure **/ -static int cmm_set_disable(const char *val, struct kernel_param *kp) +static int cmm_set_disable(const char *val, const struct kernel_param *kp) { int disable = simple_strtoul(val, NULL, 10);
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h index 41b51c2..04fe908 100644 --- a/arch/s390/include/uapi/asm/socket.h +++ b/arch/s390/include/uapi/asm/socket.h
@@ -96,4 +96,6 @@ #define SO_CNX_ADVICE 53 +#define SO_COOKIE 57 + #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h index 31aede3..de15f0a 100644 --- a/arch/sparc/include/uapi/asm/socket.h +++ b/arch/sparc/include/uapi/asm/socket.h
@@ -86,6 +86,8 @@ #define SO_CNX_ADVICE 0x0037 +#define SO_COOKIE 0x003b + /* Security levels - as per NRL IPv6 - don't actually do anything */ #define SO_SECURITY_AUTHENTICATION 0x5001 #define SO_SECURITY_ENCRYPTION_TRANSPORT 0x5002
diff --git a/arch/x86/Makefile b/arch/x86/Makefile index f408bab..93b170e 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile
@@ -11,6 +11,16 @@ KBUILD_DEFCONFIG := $(ARCH)_defconfig endif +# For gcc stack alignment is specified with -mpreferred-stack-boundary, +# clang has the option -mstack-alignment for that purpose. +ifneq ($(call cc-option, -mpreferred-stack-boundary=4),) + cc_stack_align4 := -mpreferred-stack-boundary=2 + cc_stack_align8 := -mpreferred-stack-boundary=3 +else ifneq ($(call cc-option, -mstack-alignment=16),) + cc_stack_align4 := -mstack-alignment=4 + cc_stack_align8 := -mstack-alignment=8 +endif + # How to compile the 16-bit code. Note we always compile for -march=i386; # that way we can complain to the user if the CPU is insufficient. # @@ -24,10 +34,11 @@ -DDISABLE_BRANCH_PROFILING \ -Wall -Wstrict-prototypes -march=i386 -mregparm=3 \ -fno-strict-aliasing -fomit-frame-pointer -fno-pic \ - -mno-mmx -mno-sse \ - $(call cc-option, -ffreestanding) \ - $(call cc-option, -fno-stack-protector) \ - $(call cc-option, -mpreferred-stack-boundary=2) + -mno-mmx -mno-sse + +REALMODE_CFLAGS += $(call __cc-option, $(CC), $(REALMODE_CFLAGS), -ffreestanding) +REALMODE_CFLAGS += $(call __cc-option, $(CC), $(REALMODE_CFLAGS), -fno-stack-protector) +REALMODE_CFLAGS += $(call __cc-option, $(CC), $(REALMODE_CFLAGS), $(cc_stack_align4)) export REALMODE_CFLAGS # BITS is used as extension for files which are available in a 32 bit @@ -64,8 +75,10 @@ # with nonstandard options KBUILD_CFLAGS += -fno-pic - # prevent gcc from keeping the stack 16 byte aligned - KBUILD_CFLAGS += $(call cc-option,-mpreferred-stack-boundary=2) + # Align the stack to the register width instead of using the default + # alignment of 16 bytes. This reduces stack usage and the number of + # alignment instructions. + KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align4)) # Disable unit-at-a-time mode on pre-gcc-4.0 compilers, it makes gcc use # a lot more stack due to the lack of sharing of stacklots: @@ -88,17 +101,25 @@ KBUILD_CFLAGS += -m64 # Align jump targets to 1 byte, not the default 16 bytes: - KBUILD_CFLAGS += -falign-jumps=1 + KBUILD_CFLAGS += $(call cc-option,-falign-jumps=1) # Pack loops tightly as well: - KBUILD_CFLAGS += -falign-loops=1 + KBUILD_CFLAGS += $(call cc-option,-falign-loops=1) # Don't autogenerate traditional x87 instructions KBUILD_CFLAGS += $(call cc-option,-mno-80387) KBUILD_CFLAGS += $(call cc-option,-mno-fp-ret-in-387) - # Use -mpreferred-stack-boundary=3 if supported. - KBUILD_CFLAGS += $(call cc-option,-mpreferred-stack-boundary=3) + KBUILD_CFLAGS += -fno-pic + + # By default gcc and clang use a stack alignment of 16 bytes for x86. + # However the standard kernel entry on x86-64 leaves the stack on an + # 8-byte boundary. If the compiler isn't informed about the actual + # alignment it will generate extra alignment instructions for the + # default alignment which keep the stack *mis*aligned. + # Furthermore an alignment to the register width reduces stack usage + # and the number of alignment instructions. + KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align8)) # Use -mskip-rax-setup if supported. KBUILD_CFLAGS += $(call cc-option,-mskip-rax-setup)
diff --git a/arch/x86/boot/string.c b/arch/x86/boot/string.c index 9e240fc..08dfce0 100644 --- a/arch/x86/boot/string.c +++ b/arch/x86/boot/string.c
@@ -16,6 +16,15 @@ #include "ctype.h" #include "string.h" +/* + * Undef these macros so that the functions that we provide + * here will have the correct names regardless of how string.h + * may have chosen to #define them. + */ +#undef memcpy +#undef memset +#undef memcmp + int memcmp(const void *s1, const void *s2, size_t len) { bool diff;
diff --git a/arch/x86/configs/i386_ranchu_defconfig b/arch/x86/configs/i386_ranchu_defconfig new file mode 100644 index 0000000..a1c83c4 --- /dev/null +++ b/arch/x86/configs/i386_ranchu_defconfig
@@ -0,0 +1,424 @@ +# CONFIG_64BIT is not set +# CONFIG_LOCALVERSION_AUTO is not set +CONFIG_POSIX_MQUEUE=y +CONFIG_AUDIT=y +CONFIG_NO_HZ=y +CONFIG_HIGH_RES_TIMERS=y +CONFIG_BSD_PROCESS_ACCT=y +CONFIG_TASKSTATS=y +CONFIG_TASK_DELAY_ACCT=y +CONFIG_TASK_XACCT=y +CONFIG_TASK_IO_ACCOUNTING=y +CONFIG_CGROUPS=y +CONFIG_CGROUP_DEBUG=y +CONFIG_CGROUP_FREEZER=y +CONFIG_CGROUP_CPUACCT=y +CONFIG_CGROUP_SCHED=y +CONFIG_RT_GROUP_SCHED=y +CONFIG_BLK_DEV_INITRD=y +CONFIG_CC_OPTIMIZE_FOR_SIZE=y +CONFIG_SYSCTL_SYSCALL=y +CONFIG_KALLSYMS_ALL=y +CONFIG_EMBEDDED=y +# CONFIG_COMPAT_BRK is not set +CONFIG_ARCH_MMAP_RND_BITS=16 +CONFIG_PARTITION_ADVANCED=y +CONFIG_OSF_PARTITION=y +CONFIG_AMIGA_PARTITION=y +CONFIG_MAC_PARTITION=y +CONFIG_BSD_DISKLABEL=y +CONFIG_MINIX_SUBPARTITION=y +CONFIG_SOLARIS_X86_PARTITION=y +CONFIG_UNIXWARE_DISKLABEL=y +CONFIG_SGI_PARTITION=y +CONFIG_SUN_PARTITION=y +CONFIG_KARMA_PARTITION=y +CONFIG_SMP=y +CONFIG_X86_BIGSMP=y +CONFIG_MCORE2=y +CONFIG_X86_GENERIC=y +CONFIG_HPET_TIMER=y +CONFIG_NR_CPUS=512 +CONFIG_PREEMPT=y +# CONFIG_X86_MCE is not set +CONFIG_X86_REBOOTFIXUPS=y +CONFIG_X86_MSR=y +CONFIG_X86_CPUID=y +CONFIG_KSM=y +CONFIG_CMA=y +# CONFIG_MTRR_SANITIZER is not set +CONFIG_EFI=y +CONFIG_EFI_STUB=y +CONFIG_HZ_100=y +CONFIG_PHYSICAL_START=0x100000 +CONFIG_PM_AUTOSLEEP=y +CONFIG_PM_WAKELOCKS=y +CONFIG_PM_WAKELOCKS_LIMIT=0 +# CONFIG_PM_WAKELOCKS_GC is not set +CONFIG_PM_DEBUG=y +CONFIG_CPU_FREQ=y +# CONFIG_CPU_FREQ_STAT is not set +CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y +CONFIG_CPU_FREQ_GOV_USERSPACE=y +CONFIG_PCIEPORTBUS=y +# CONFIG_PCIEASPM is not set +CONFIG_PCCARD=y +CONFIG_YENTA=y +CONFIG_HOTPLUG_PCI=y +# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set +CONFIG_BINFMT_MISC=y +CONFIG_NET=y +CONFIG_PACKET=y +CONFIG_UNIX=y +CONFIG_XFRM_USER=y +CONFIG_NET_KEY=y +CONFIG_INET=y +CONFIG_IP_MULTICAST=y +CONFIG_IP_ADVANCED_ROUTER=y +CONFIG_IP_MULTIPLE_TABLES=y +CONFIG_IP_ROUTE_MULTIPATH=y +CONFIG_IP_ROUTE_VERBOSE=y +CONFIG_IP_PNP=y +CONFIG_IP_PNP_DHCP=y +CONFIG_IP_PNP_BOOTP=y +CONFIG_IP_PNP_RARP=y +CONFIG_IP_MROUTE=y +CONFIG_IP_PIMSM_V1=y +CONFIG_IP_PIMSM_V2=y +CONFIG_SYN_COOKIES=y +CONFIG_INET_ESP=y +# CONFIG_INET_XFRM_MODE_BEET is not set +# CONFIG_INET_LRO is not set +CONFIG_INET_DIAG_DESTROY=y +CONFIG_IPV6_ROUTER_PREF=y +CONFIG_IPV6_ROUTE_INFO=y +CONFIG_IPV6_OPTIMISTIC_DAD=y +CONFIG_INET6_AH=y +CONFIG_INET6_ESP=y +CONFIG_INET6_IPCOMP=y +CONFIG_IPV6_MIP6=y +CONFIG_IPV6_MULTIPLE_TABLES=y +CONFIG_NETLABEL=y +CONFIG_NETFILTER=y +CONFIG_NF_CONNTRACK=y +CONFIG_NF_CONNTRACK_SECMARK=y +CONFIG_NF_CONNTRACK_EVENTS=y +CONFIG_NF_CT_PROTO_DCCP=y +CONFIG_NF_CT_PROTO_SCTP=y +CONFIG_NF_CT_PROTO_UDPLITE=y +CONFIG_NF_CONNTRACK_AMANDA=y +CONFIG_NF_CONNTRACK_FTP=y +CONFIG_NF_CONNTRACK_H323=y +CONFIG_NF_CONNTRACK_IRC=y +CONFIG_NF_CONNTRACK_NETBIOS_NS=y +CONFIG_NF_CONNTRACK_PPTP=y +CONFIG_NF_CONNTRACK_SANE=y +CONFIG_NF_CONNTRACK_TFTP=y +CONFIG_NF_CT_NETLINK=y +CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y +CONFIG_NETFILTER_XT_TARGET_CONNMARK=y +CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y +CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y +CONFIG_NETFILTER_XT_TARGET_MARK=y +CONFIG_NETFILTER_XT_TARGET_NFLOG=y +CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y +CONFIG_NETFILTER_XT_TARGET_TPROXY=y +CONFIG_NETFILTER_XT_TARGET_TRACE=y +CONFIG_NETFILTER_XT_TARGET_SECMARK=y +CONFIG_NETFILTER_XT_TARGET_TCPMSS=y +CONFIG_NETFILTER_XT_MATCH_COMMENT=y +CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y +CONFIG_NETFILTER_XT_MATCH_CONNMARK=y +CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y +CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y +CONFIG_NETFILTER_XT_MATCH_HELPER=y +CONFIG_NETFILTER_XT_MATCH_IPRANGE=y +CONFIG_NETFILTER_XT_MATCH_LENGTH=y +CONFIG_NETFILTER_XT_MATCH_LIMIT=y +CONFIG_NETFILTER_XT_MATCH_MAC=y +CONFIG_NETFILTER_XT_MATCH_MARK=y +CONFIG_NETFILTER_XT_MATCH_POLICY=y +CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y +CONFIG_NETFILTER_XT_MATCH_QTAGUID=y +CONFIG_NETFILTER_XT_MATCH_QUOTA=y +CONFIG_NETFILTER_XT_MATCH_QUOTA2=y +CONFIG_NETFILTER_XT_MATCH_SOCKET=y +CONFIG_NETFILTER_XT_MATCH_STATE=y +CONFIG_NETFILTER_XT_MATCH_STATISTIC=y +CONFIG_NETFILTER_XT_MATCH_STRING=y +CONFIG_NETFILTER_XT_MATCH_TIME=y +CONFIG_NETFILTER_XT_MATCH_U32=y +CONFIG_NF_CONNTRACK_IPV4=y +CONFIG_IP_NF_IPTABLES=y +CONFIG_IP_NF_MATCH_AH=y +CONFIG_IP_NF_MATCH_ECN=y +CONFIG_IP_NF_MATCH_TTL=y +CONFIG_IP_NF_FILTER=y +CONFIG_IP_NF_TARGET_REJECT=y +CONFIG_IP_NF_MANGLE=y +CONFIG_IP_NF_RAW=y +CONFIG_IP_NF_SECURITY=y +CONFIG_IP_NF_ARPTABLES=y +CONFIG_IP_NF_ARPFILTER=y +CONFIG_IP_NF_ARP_MANGLE=y +CONFIG_NF_CONNTRACK_IPV6=y +CONFIG_IP6_NF_IPTABLES=y +CONFIG_IP6_NF_FILTER=y +CONFIG_IP6_NF_TARGET_REJECT=y +CONFIG_IP6_NF_MANGLE=y +CONFIG_IP6_NF_RAW=y +CONFIG_NET_SCHED=y +CONFIG_NET_SCH_HTB=y +CONFIG_NET_CLS_U32=y +CONFIG_NET_EMATCH=y +CONFIG_NET_EMATCH_U32=y +CONFIG_NET_CLS_ACT=y +CONFIG_CFG80211=y +CONFIG_MAC80211=y +CONFIG_MAC80211_LEDS=y +CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" +CONFIG_DMA_CMA=y +CONFIG_CMA_SIZE_MBYTES=16 +CONFIG_CONNECTOR=y +CONFIG_BLK_DEV_LOOP=y +CONFIG_BLK_DEV_RAM=y +CONFIG_BLK_DEV_RAM_SIZE=8192 +CONFIG_VIRTIO_BLK=y +CONFIG_BLK_DEV_SD=y +CONFIG_BLK_DEV_SR=y +CONFIG_BLK_DEV_SR_VENDOR=y +CONFIG_CHR_DEV_SG=y +CONFIG_SCSI_CONSTANTS=y +CONFIG_SCSI_SPI_ATTRS=y +CONFIG_SCSI_ISCSI_ATTRS=y +# CONFIG_SCSI_LOWLEVEL is not set +CONFIG_ATA=y +CONFIG_SATA_AHCI=y +CONFIG_ATA_PIIX=y +CONFIG_PATA_AMD=y +CONFIG_PATA_OLDPIIX=y +CONFIG_PATA_SCH=y +CONFIG_PATA_MPIIX=y +CONFIG_ATA_GENERIC=y +CONFIG_MD=y +CONFIG_BLK_DEV_MD=y +CONFIG_BLK_DEV_DM=y +CONFIG_DM_DEBUG=y +CONFIG_DM_CRYPT=y +CONFIG_DM_MIRROR=y +CONFIG_DM_ZERO=y +CONFIG_DM_UEVENT=y +CONFIG_DM_VERITY=y +CONFIG_DM_VERITY_FEC=y +CONFIG_NETDEVICES=y +CONFIG_NETCONSOLE=y +CONFIG_TUN=y +CONFIG_VIRTIO_NET=y +CONFIG_BNX2=y +CONFIG_TIGON3=y +CONFIG_NET_TULIP=y +CONFIG_E100=y +CONFIG_E1000=y +CONFIG_E1000E=y +CONFIG_SKY2=y +CONFIG_NE2K_PCI=y +CONFIG_FORCEDETH=y +CONFIG_8139TOO=y +# CONFIG_8139TOO_PIO is not set +CONFIG_R8169=y +CONFIG_FDDI=y +CONFIG_PPP=y +CONFIG_PPP_BSDCOMP=y +CONFIG_PPP_DEFLATE=y +CONFIG_PPP_MPPE=y +CONFIG_PPPOLAC=y +CONFIG_PPPOPNS=y +CONFIG_USB_USBNET=y +CONFIG_INPUT_POLLDEV=y +# CONFIG_INPUT_MOUSEDEV_PSAUX is not set +CONFIG_INPUT_EVDEV=y +CONFIG_INPUT_KEYRESET=y +# CONFIG_KEYBOARD_ATKBD is not set +CONFIG_KEYBOARD_GOLDFISH_EVENTS=y +# CONFIG_INPUT_MOUSE is not set +CONFIG_INPUT_JOYSTICK=y +CONFIG_JOYSTICK_XPAD=y +CONFIG_JOYSTICK_XPAD_FF=y +CONFIG_JOYSTICK_XPAD_LEDS=y +CONFIG_INPUT_TABLET=y +CONFIG_TABLET_USB_ACECAD=y +CONFIG_TABLET_USB_AIPTEK=y +CONFIG_TABLET_USB_GTCO=y +CONFIG_TABLET_USB_HANWANG=y +CONFIG_TABLET_USB_KBTAB=y +CONFIG_INPUT_TOUCHSCREEN=y +CONFIG_INPUT_MISC=y +CONFIG_INPUT_KEYCHORD=y +CONFIG_INPUT_UINPUT=y +CONFIG_INPUT_GPIO=y +# CONFIG_SERIO is not set +# CONFIG_VT is not set +# CONFIG_LEGACY_PTYS is not set +CONFIG_SERIAL_NONSTANDARD=y +# CONFIG_DEVMEM is not set +# CONFIG_DEVKMEM is not set +CONFIG_SERIAL_8250=y +CONFIG_SERIAL_8250_CONSOLE=y +CONFIG_VIRTIO_CONSOLE=y +CONFIG_NVRAM=y +CONFIG_I2C_I801=y +CONFIG_BATTERY_GOLDFISH=y +CONFIG_WATCHDOG=y +CONFIG_MEDIA_SUPPORT=y +CONFIG_AGP=y +CONFIG_AGP_AMD64=y +CONFIG_AGP_INTEL=y +CONFIG_DRM=y +CONFIG_FB_MODE_HELPERS=y +CONFIG_FB_TILEBLITTING=y +CONFIG_FB_EFI=y +CONFIG_FB_GOLDFISH=y +CONFIG_BACKLIGHT_LCD_SUPPORT=y +# CONFIG_LCD_CLASS_DEVICE is not set +CONFIG_SOUND=y +CONFIG_SND=y +CONFIG_HIDRAW=y +CONFIG_UHID=y +CONFIG_HID_A4TECH=y +CONFIG_HID_ACRUX=y +CONFIG_HID_ACRUX_FF=y +CONFIG_HID_APPLE=y +CONFIG_HID_BELKIN=y +CONFIG_HID_CHERRY=y +CONFIG_HID_CHICONY=y +CONFIG_HID_PRODIKEYS=y +CONFIG_HID_CYPRESS=y +CONFIG_HID_DRAGONRISE=y +CONFIG_DRAGONRISE_FF=y +CONFIG_HID_EMS_FF=y +CONFIG_HID_ELECOM=y +CONFIG_HID_EZKEY=y +CONFIG_HID_HOLTEK=y +CONFIG_HID_KEYTOUCH=y +CONFIG_HID_KYE=y +CONFIG_HID_UCLOGIC=y +CONFIG_HID_WALTOP=y +CONFIG_HID_GYRATION=y +CONFIG_HID_TWINHAN=y +CONFIG_HID_KENSINGTON=y +CONFIG_HID_LCPOWER=y +CONFIG_HID_LOGITECH=y +CONFIG_HID_LOGITECH_DJ=y +CONFIG_LOGITECH_FF=y +CONFIG_LOGIRUMBLEPAD2_FF=y +CONFIG_LOGIG940_FF=y +CONFIG_HID_MAGICMOUSE=y +CONFIG_HID_MICROSOFT=y +CONFIG_HID_MONTEREY=y +CONFIG_HID_MULTITOUCH=y +CONFIG_HID_NTRIG=y +CONFIG_HID_ORTEK=y +CONFIG_HID_PANTHERLORD=y +CONFIG_PANTHERLORD_FF=y +CONFIG_HID_PETALYNX=y +CONFIG_HID_PICOLCD=y +CONFIG_HID_PRIMAX=y +CONFIG_HID_ROCCAT=y +CONFIG_HID_SAITEK=y +CONFIG_HID_SAMSUNG=y +CONFIG_HID_SONY=y +CONFIG_HID_SPEEDLINK=y +CONFIG_HID_SUNPLUS=y +CONFIG_HID_GREENASIA=y +CONFIG_GREENASIA_FF=y +CONFIG_HID_SMARTJOYPLUS=y +CONFIG_SMARTJOYPLUS_FF=y +CONFIG_HID_TIVO=y +CONFIG_HID_TOPSEED=y +CONFIG_HID_THRUSTMASTER=y +CONFIG_HID_WACOM=y +CONFIG_HID_WIIMOTE=y +CONFIG_HID_ZEROPLUS=y +CONFIG_HID_ZYDACRON=y +CONFIG_HID_PID=y +CONFIG_USB_HIDDEV=y +CONFIG_USB_ANNOUNCE_NEW_DEVICES=y +CONFIG_USB_MON=y +CONFIG_USB_EHCI_HCD=y +# CONFIG_USB_EHCI_TT_NEWSCHED is not set +CONFIG_USB_OHCI_HCD=y +CONFIG_USB_UHCI_HCD=y +CONFIG_USB_PRINTER=y +CONFIG_USB_STORAGE=y +CONFIG_USB_OTG_WAKELOCK=y +CONFIG_EDAC=y +CONFIG_RTC_CLASS=y +# CONFIG_RTC_HCTOSYS is not set +CONFIG_DMADEVICES=y +CONFIG_VIRTIO_PCI=y +CONFIG_STAGING=y +CONFIG_ASHMEM=y +CONFIG_ANDROID_LOW_MEMORY_KILLER=y +CONFIG_SYNC=y +CONFIG_SW_SYNC=y +CONFIG_SYNC_FILE=y +CONFIG_ION=y +CONFIG_GOLDFISH_AUDIO=y +CONFIG_SND_HDA_INTEL=y +CONFIG_GOLDFISH=y +CONFIG_GOLDFISH_PIPE=y +CONFIG_GOLDFISH_SYNC=y +CONFIG_ANDROID=y +CONFIG_ANDROID_BINDER_IPC=y +CONFIG_ISCSI_IBFT_FIND=y +CONFIG_EXT4_FS=y +CONFIG_EXT4_FS_SECURITY=y +CONFIG_QUOTA=y +CONFIG_QUOTA_NETLINK_INTERFACE=y +# CONFIG_PRINT_QUOTA_WARNING is not set +CONFIG_FUSE_FS=y +CONFIG_ISO9660_FS=y +CONFIG_JOLIET=y +CONFIG_ZISOFS=y +CONFIG_MSDOS_FS=y +CONFIG_VFAT_FS=y +CONFIG_PROC_KCORE=y +CONFIG_TMPFS=y +CONFIG_TMPFS_POSIX_ACL=y +CONFIG_HUGETLBFS=y +CONFIG_PSTORE=y +CONFIG_PSTORE_CONSOLE=y +CONFIG_PSTORE_RAM=y +# CONFIG_NETWORK_FILESYSTEMS is not set +CONFIG_NLS_DEFAULT="utf8" +CONFIG_NLS_CODEPAGE_437=y +CONFIG_NLS_ASCII=y +CONFIG_NLS_ISO8859_1=y +CONFIG_NLS_UTF8=y +CONFIG_PRINTK_TIME=y +CONFIG_DEBUG_INFO=y +# CONFIG_ENABLE_WARN_DEPRECATED is not set +# CONFIG_ENABLE_MUST_CHECK is not set +CONFIG_FRAME_WARN=2048 +# CONFIG_UNUSED_SYMBOLS is not set +CONFIG_MAGIC_SYSRQ=y +CONFIG_DEBUG_MEMORY_INIT=y +CONFIG_PANIC_TIMEOUT=5 +CONFIG_SCHEDSTATS=y +CONFIG_TIMER_STATS=y +CONFIG_SCHED_TRACER=y +CONFIG_BLK_DEV_IO_TRACE=y +CONFIG_PROVIDE_OHCI1394_DMA_INIT=y +CONFIG_KEYS=y +CONFIG_SECURITY=y +CONFIG_SECURITY_NETWORK=y +CONFIG_SECURITY_SELINUX=y +CONFIG_CRYPTO_AES_586=y +CONFIG_CRYPTO_TWOFISH=y +CONFIG_ASYMMETRIC_KEY_TYPE=y +CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y +CONFIG_X509_CERTIFICATE_PARSER=y +CONFIG_PKCS7_MESSAGE_PARSER=y +CONFIG_PKCS7_TEST_KEY=y +# CONFIG_VIRTUALIZATION is not set +CONFIG_CRC_T10DIF=y
diff --git a/arch/x86/configs/x86_64_cuttlefish_defconfig b/arch/x86/configs/x86_64_cuttlefish_defconfig new file mode 100644 index 0000000..98543be --- /dev/null +++ b/arch/x86/configs/x86_64_cuttlefish_defconfig
@@ -0,0 +1,476 @@ +CONFIG_POSIX_MQUEUE=y +# CONFIG_FHANDLE is not set +# CONFIG_USELIB is not set +CONFIG_AUDIT=y +CONFIG_NO_HZ=y +CONFIG_HIGH_RES_TIMERS=y +CONFIG_BSD_PROCESS_ACCT=y +CONFIG_TASKSTATS=y +CONFIG_TASK_DELAY_ACCT=y +CONFIG_TASK_XACCT=y +CONFIG_TASK_IO_ACCOUNTING=y +CONFIG_IKCONFIG=y +CONFIG_IKCONFIG_PROC=y +CONFIG_CGROUPS=y +CONFIG_CGROUP_DEBUG=y +CONFIG_CGROUP_FREEZER=y +CONFIG_CGROUP_CPUACCT=y +CONFIG_MEMCG=y +CONFIG_MEMCG_SWAP=y +CONFIG_CGROUP_SCHED=y +CONFIG_RT_GROUP_SCHED=y +CONFIG_CGROUP_BPF=y +CONFIG_NAMESPACES=y +CONFIG_BLK_DEV_INITRD=y +# CONFIG_RD_LZ4 is not set +CONFIG_KALLSYMS_ALL=y +# CONFIG_PCSPKR_PLATFORM is not set +CONFIG_BPF_SYSCALL=y +CONFIG_EMBEDDED=y +# CONFIG_COMPAT_BRK is not set +CONFIG_PROFILING=y +CONFIG_OPROFILE=y +CONFIG_KPROBES=y +CONFIG_JUMP_LABEL=y +CONFIG_CC_STACKPROTECTOR_STRONG=y +CONFIG_MODULES=y +CONFIG_MODULE_UNLOAD=y +CONFIG_MODVERSIONS=y +CONFIG_PARTITION_ADVANCED=y +CONFIG_SMP=y +CONFIG_HYPERVISOR_GUEST=y +CONFIG_PARAVIRT=y +CONFIG_PARAVIRT_SPINLOCKS=y +CONFIG_MCORE2=y +CONFIG_PROCESSOR_SELECT=y +# CONFIG_CPU_SUP_CENTAUR is not set +CONFIG_NR_CPUS=8 +CONFIG_PREEMPT=y +# CONFIG_MICROCODE is not set +CONFIG_X86_MSR=y +CONFIG_X86_CPUID=y +CONFIG_KSM=y +CONFIG_DEFAULT_MMAP_MIN_ADDR=65536 +CONFIG_TRANSPARENT_HUGEPAGE=y +CONFIG_ZSMALLOC=y +# CONFIG_MTRR is not set +CONFIG_HZ_100=y +CONFIG_KEXEC=y +CONFIG_CRASH_DUMP=y +CONFIG_PHYSICAL_START=0x200000 +CONFIG_RANDOMIZE_BASE=y +CONFIG_PHYSICAL_ALIGN=0x1000000 +CONFIG_CMDLINE_BOOL=y +CONFIG_CMDLINE="console=ttyS0 reboot=p nopti" +CONFIG_PM_AUTOSLEEP=y +CONFIG_PM_WAKELOCKS=y +CONFIG_PM_WAKELOCKS_LIMIT=0 +# CONFIG_PM_WAKELOCKS_GC is not set +CONFIG_PM_DEBUG=y +CONFIG_ACPI_PROCFS_POWER=y +# CONFIG_ACPI_FAN is not set +# CONFIG_ACPI_THERMAL is not set +# CONFIG_X86_PM_TIMER is not set +CONFIG_CPU_FREQ=y +CONFIG_CPU_FREQ_GOV_ONDEMAND=y +CONFIG_X86_ACPI_CPUFREQ=y +# CONFIG_X86_ACPI_CPUFREQ_CPB is not set +CONFIG_PCI_MMCONFIG=y +CONFIG_PCI_MSI=y +# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set +CONFIG_BINFMT_MISC=y +CONFIG_IA32_EMULATION=y +CONFIG_NET=y +CONFIG_PACKET=y +CONFIG_UNIX=y +CONFIG_XFRM_USER=y +CONFIG_NET_KEY=y +CONFIG_INET=y +CONFIG_IP_MULTICAST=y +CONFIG_IP_ADVANCED_ROUTER=y +CONFIG_IP_MULTIPLE_TABLES=y +CONFIG_IP_ROUTE_MULTIPATH=y +CONFIG_IP_ROUTE_VERBOSE=y +CONFIG_IP_MROUTE=y +CONFIG_IP_PIMSM_V1=y +CONFIG_IP_PIMSM_V2=y +CONFIG_SYN_COOKIES=y +CONFIG_NET_IPVTI=y +CONFIG_INET_ESP=y +# CONFIG_INET_XFRM_MODE_BEET is not set +CONFIG_INET_DIAG_DESTROY=y +CONFIG_TCP_CONG_ADVANCED=y +# CONFIG_TCP_CONG_BIC is not set +# CONFIG_TCP_CONG_WESTWOOD is not set +# CONFIG_TCP_CONG_HTCP is not set +CONFIG_TCP_MD5SIG=y +CONFIG_IPV6_ROUTER_PREF=y +CONFIG_IPV6_ROUTE_INFO=y +CONFIG_IPV6_OPTIMISTIC_DAD=y +CONFIG_INET6_AH=y +CONFIG_INET6_ESP=y +CONFIG_INET6_IPCOMP=y +CONFIG_IPV6_MIP6=y +CONFIG_IPV6_VTI=y +CONFIG_IPV6_MULTIPLE_TABLES=y +CONFIG_NETLABEL=y +CONFIG_NETFILTER=y +CONFIG_NF_CONNTRACK=y +CONFIG_NF_CONNTRACK_SECMARK=y +CONFIG_NF_CONNTRACK_EVENTS=y +CONFIG_NF_CT_PROTO_DCCP=y +CONFIG_NF_CT_PROTO_SCTP=y +CONFIG_NF_CT_PROTO_UDPLITE=y +CONFIG_NF_CONNTRACK_AMANDA=y +CONFIG_NF_CONNTRACK_FTP=y +CONFIG_NF_CONNTRACK_H323=y +CONFIG_NF_CONNTRACK_IRC=y +CONFIG_NF_CONNTRACK_NETBIOS_NS=y +CONFIG_NF_CONNTRACK_PPTP=y +CONFIG_NF_CONNTRACK_SANE=y +CONFIG_NF_CONNTRACK_TFTP=y +CONFIG_NF_CT_NETLINK=y +CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y +CONFIG_NETFILTER_XT_TARGET_CONNMARK=y +CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y +CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y +CONFIG_NETFILTER_XT_TARGET_MARK=y +CONFIG_NETFILTER_XT_TARGET_NFLOG=y +CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y +CONFIG_NETFILTER_XT_TARGET_TPROXY=y +CONFIG_NETFILTER_XT_TARGET_TRACE=y +CONFIG_NETFILTER_XT_TARGET_SECMARK=y +CONFIG_NETFILTER_XT_TARGET_TCPMSS=y +CONFIG_NETFILTER_XT_MATCH_BPF=y +CONFIG_NETFILTER_XT_MATCH_COMMENT=y +CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y +CONFIG_NETFILTER_XT_MATCH_CONNMARK=y +CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y +CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y +CONFIG_NETFILTER_XT_MATCH_HELPER=y +CONFIG_NETFILTER_XT_MATCH_IPRANGE=y +CONFIG_NETFILTER_XT_MATCH_LENGTH=y +CONFIG_NETFILTER_XT_MATCH_LIMIT=y +CONFIG_NETFILTER_XT_MATCH_MAC=y +CONFIG_NETFILTER_XT_MATCH_MARK=y +CONFIG_NETFILTER_XT_MATCH_POLICY=y +CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y +CONFIG_NETFILTER_XT_MATCH_QTAGUID=y +CONFIG_NETFILTER_XT_MATCH_QUOTA=y +CONFIG_NETFILTER_XT_MATCH_QUOTA2=y +CONFIG_NETFILTER_XT_MATCH_SOCKET=y +CONFIG_NETFILTER_XT_MATCH_STATE=y +CONFIG_NETFILTER_XT_MATCH_STATISTIC=y +CONFIG_NETFILTER_XT_MATCH_STRING=y +CONFIG_NETFILTER_XT_MATCH_TIME=y +CONFIG_NETFILTER_XT_MATCH_U32=y +CONFIG_NF_CONNTRACK_IPV4=y +CONFIG_IP_NF_IPTABLES=y +CONFIG_IP_NF_MATCH_AH=y +CONFIG_IP_NF_MATCH_ECN=y +CONFIG_IP_NF_MATCH_TTL=y +CONFIG_IP_NF_FILTER=y +CONFIG_IP_NF_TARGET_REJECT=y +CONFIG_IP_NF_NAT=y +CONFIG_IP_NF_TARGET_MASQUERADE=y +CONFIG_IP_NF_TARGET_NETMAP=y +CONFIG_IP_NF_TARGET_REDIRECT=y +CONFIG_IP_NF_MANGLE=y +CONFIG_IP_NF_RAW=y +CONFIG_IP_NF_SECURITY=y +CONFIG_IP_NF_ARPTABLES=y +CONFIG_IP_NF_ARPFILTER=y +CONFIG_IP_NF_ARP_MANGLE=y +CONFIG_NF_CONNTRACK_IPV6=y +CONFIG_IP6_NF_IPTABLES=y +CONFIG_IP6_NF_MATCH_IPV6HEADER=y +CONFIG_IP6_NF_MATCH_RPFILTER=y +CONFIG_IP6_NF_FILTER=y +CONFIG_IP6_NF_TARGET_REJECT=y +CONFIG_IP6_NF_MANGLE=y +CONFIG_IP6_NF_RAW=y +CONFIG_NET_SCHED=y +CONFIG_NET_SCH_HTB=y +CONFIG_NET_CLS_U32=y +CONFIG_NET_EMATCH=y +CONFIG_NET_EMATCH_U32=y +CONFIG_NET_CLS_ACT=y +CONFIG_CFG80211=y +CONFIG_MAC80211=y +CONFIG_RFKILL=y +CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" +CONFIG_DEVTMPFS=y +CONFIG_DEBUG_DEVRES=y +CONFIG_OF=y +CONFIG_OF_UNITTEST=y +# CONFIG_PNP_DEBUG_MESSAGES is not set +CONFIG_ZRAM=y +CONFIG_BLK_DEV_LOOP=y +CONFIG_BLK_DEV_RAM=y +CONFIG_BLK_DEV_RAM_SIZE=8192 +CONFIG_VIRTIO_BLK=y +CONFIG_UID_SYS_STATS=y +CONFIG_MEMORY_STATE_TIME=y +CONFIG_SCSI=y +CONFIG_BLK_DEV_SD=y +CONFIG_BLK_DEV_SR=y +CONFIG_BLK_DEV_SR_VENDOR=y +CONFIG_CHR_DEV_SG=y +CONFIG_SCSI_CONSTANTS=y +CONFIG_SCSI_SPI_ATTRS=y +CONFIG_SCSI_VIRTIO=y +CONFIG_MD=y +CONFIG_BLK_DEV_DM=y +CONFIG_DM_CRYPT=y +CONFIG_DM_MIRROR=y +CONFIG_DM_ZERO=y +CONFIG_DM_UEVENT=y +CONFIG_DM_VERITY=y +CONFIG_DM_VERITY_HASH_PREFETCH_MIN_SIZE=1 +CONFIG_DM_VERITY_FEC=y +CONFIG_DM_ANDROID_VERITY=y +CONFIG_NETDEVICES=y +CONFIG_NETCONSOLE=y +CONFIG_NETCONSOLE_DYNAMIC=y +CONFIG_TUN=y +CONFIG_VIRTIO_NET=y +# CONFIG_ETHERNET is not set +CONFIG_PPP=y +CONFIG_PPP_BSDCOMP=y +CONFIG_PPP_DEFLATE=y +CONFIG_PPP_MPPE=y +CONFIG_PPPOLAC=y +CONFIG_PPPOPNS=y +CONFIG_USB_USBNET=y +# CONFIG_USB_NET_AX8817X is not set +# CONFIG_USB_NET_AX88179_178A is not set +# CONFIG_USB_NET_CDCETHER is not set +# CONFIG_USB_NET_CDC_NCM is not set +# CONFIG_USB_NET_NET1080 is not set +# CONFIG_USB_NET_CDC_SUBSET is not set +# CONFIG_USB_NET_ZAURUS is not set +# CONFIG_WLAN_VENDOR_ADMTEK is not set +# CONFIG_WLAN_VENDOR_ATH is not set +# CONFIG_WLAN_VENDOR_ATMEL is not set +# CONFIG_WLAN_VENDOR_BROADCOM is not set +# CONFIG_WLAN_VENDOR_CISCO is not set +# CONFIG_WLAN_VENDOR_INTEL is not set +# CONFIG_WLAN_VENDOR_INTERSIL is not set +# CONFIG_WLAN_VENDOR_MARVELL is not set +# CONFIG_WLAN_VENDOR_MEDIATEK is not set +# CONFIG_WLAN_VENDOR_RALINK is not set +# CONFIG_WLAN_VENDOR_REALTEK is not set +# CONFIG_WLAN_VENDOR_RSI is not set +# CONFIG_WLAN_VENDOR_ST is not set +# CONFIG_WLAN_VENDOR_TI is not set +# CONFIG_WLAN_VENDOR_ZYDAS is not set +CONFIG_MAC80211_HWSIM=y +CONFIG_INPUT_EVDEV=y +CONFIG_INPUT_KEYRESET=y +# CONFIG_INPUT_KEYBOARD is not set +# CONFIG_INPUT_MOUSE is not set +CONFIG_INPUT_JOYSTICK=y +CONFIG_JOYSTICK_XPAD=y +CONFIG_JOYSTICK_XPAD_FF=y +CONFIG_JOYSTICK_XPAD_LEDS=y +CONFIG_INPUT_TABLET=y +CONFIG_TABLET_USB_ACECAD=y +CONFIG_TABLET_USB_AIPTEK=y +CONFIG_TABLET_USB_GTCO=y +CONFIG_TABLET_USB_HANWANG=y +CONFIG_TABLET_USB_KBTAB=y +CONFIG_INPUT_MISC=y +CONFIG_INPUT_KEYCHORD=y +CONFIG_INPUT_UINPUT=y +CONFIG_INPUT_GPIO=y +# CONFIG_SERIO_I8042 is not set +# CONFIG_VT is not set +# CONFIG_LEGACY_PTYS is not set +# CONFIG_DEVMEM is not set +# CONFIG_DEVKMEM is not set +CONFIG_SERIAL_8250=y +# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set +CONFIG_SERIAL_8250_CONSOLE=y +CONFIG_SERIAL_8250_NR_UARTS=48 +CONFIG_SERIAL_8250_EXTENDED=y +CONFIG_SERIAL_8250_MANY_PORTS=y +CONFIG_SERIAL_8250_SHARE_IRQ=y +CONFIG_VIRTIO_CONSOLE=y +CONFIG_HW_RANDOM=y +# CONFIG_HW_RANDOM_INTEL is not set +# CONFIG_HW_RANDOM_AMD is not set +# CONFIG_HW_RANDOM_VIA is not set +CONFIG_HW_RANDOM_VIRTIO=y +CONFIG_HPET=y +# CONFIG_HPET_MMAP_DEFAULT is not set +# CONFIG_DEVPORT is not set +# CONFIG_ACPI_I2C_OPREGION is not set +# CONFIG_I2C_COMPAT is not set +# CONFIG_I2C_HELPER_AUTO is not set +CONFIG_PTP_1588_CLOCK=y +# CONFIG_HWMON is not set +# CONFIG_X86_PKG_TEMP_THERMAL is not set +CONFIG_WATCHDOG=y +CONFIG_SOFT_WATCHDOG=y +CONFIG_MEDIA_SUPPORT=y +# CONFIG_DVB_TUNER_DIB0070 is not set +# CONFIG_DVB_TUNER_DIB0090 is not set +# CONFIG_VGA_ARB is not set +CONFIG_DRM=y +# CONFIG_DRM_FBDEV_EMULATION is not set +CONFIG_DRM_VIRTIO_GPU=y +CONFIG_FB=y +CONFIG_SOUND=y +CONFIG_SND=y +CONFIG_HIDRAW=y +CONFIG_UHID=y +CONFIG_HID_A4TECH=y +CONFIG_HID_ACRUX=y +CONFIG_HID_ACRUX_FF=y +CONFIG_HID_APPLE=y +CONFIG_HID_BELKIN=y +CONFIG_HID_CHERRY=y +CONFIG_HID_CHICONY=y +CONFIG_HID_PRODIKEYS=y +CONFIG_HID_CYPRESS=y +CONFIG_HID_DRAGONRISE=y +CONFIG_DRAGONRISE_FF=y +CONFIG_HID_EMS_FF=y +CONFIG_HID_ELECOM=y +CONFIG_HID_EZKEY=y +CONFIG_HID_HOLTEK=y +CONFIG_HID_KEYTOUCH=y +CONFIG_HID_KYE=y +CONFIG_HID_UCLOGIC=y +CONFIG_HID_WALTOP=y +CONFIG_HID_GYRATION=y +CONFIG_HID_TWINHAN=y +CONFIG_HID_KENSINGTON=y +CONFIG_HID_LCPOWER=y +CONFIG_HID_LOGITECH=y +CONFIG_HID_LOGITECH_DJ=y +CONFIG_LOGITECH_FF=y +CONFIG_LOGIRUMBLEPAD2_FF=y +CONFIG_LOGIG940_FF=y +CONFIG_HID_MAGICMOUSE=y +CONFIG_HID_MICROSOFT=y +CONFIG_HID_MONTEREY=y +CONFIG_HID_MULTITOUCH=y +CONFIG_HID_NTRIG=y +CONFIG_HID_ORTEK=y +CONFIG_HID_PANTHERLORD=y +CONFIG_PANTHERLORD_FF=y +CONFIG_HID_PETALYNX=y +CONFIG_HID_PICOLCD=y +CONFIG_HID_PRIMAX=y +CONFIG_HID_ROCCAT=y +CONFIG_HID_SAITEK=y +CONFIG_HID_SAMSUNG=y +CONFIG_HID_SONY=y +CONFIG_HID_SPEEDLINK=y +CONFIG_HID_SUNPLUS=y +CONFIG_HID_GREENASIA=y +CONFIG_GREENASIA_FF=y +CONFIG_HID_SMARTJOYPLUS=y +CONFIG_SMARTJOYPLUS_FF=y +CONFIG_HID_TIVO=y +CONFIG_HID_TOPSEED=y +CONFIG_HID_THRUSTMASTER=y +CONFIG_HID_WACOM=y +CONFIG_HID_WIIMOTE=y +CONFIG_HID_ZEROPLUS=y +CONFIG_HID_ZYDACRON=y +CONFIG_USB_HIDDEV=y +CONFIG_USB_ANNOUNCE_NEW_DEVICES=y +CONFIG_USB_EHCI_HCD=y +CONFIG_USB_GADGET=y +CONFIG_USB_DUMMY_HCD=y +CONFIG_USB_CONFIGFS=y +CONFIG_USB_CONFIGFS_F_FS=y +CONFIG_USB_CONFIGFS_F_MTP=y +CONFIG_USB_CONFIGFS_F_PTP=y +CONFIG_USB_CONFIGFS_F_ACC=y +CONFIG_USB_CONFIGFS_F_AUDIO_SRC=y +CONFIG_USB_CONFIGFS_UEVENT=y +CONFIG_USB_CONFIGFS_F_MIDI=y +CONFIG_RTC_CLASS=y +# CONFIG_RTC_HCTOSYS is not set +CONFIG_SW_SYNC=y +CONFIG_VIRTIO_PCI=y +CONFIG_VIRTIO_BALLOON=y +CONFIG_VIRTIO_MMIO=y +CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES=y +CONFIG_STAGING=y +CONFIG_ASHMEM=y +CONFIG_ANDROID_VSOC=y +CONFIG_ION=y +# CONFIG_X86_PLATFORM_DEVICES is not set +# CONFIG_IOMMU_SUPPORT is not set +CONFIG_ANDROID=y +CONFIG_ANDROID_BINDER_IPC=y +# CONFIG_FIRMWARE_MEMMAP is not set +CONFIG_EXT4_FS=y +CONFIG_EXT4_FS_POSIX_ACL=y +CONFIG_EXT4_FS_SECURITY=y +CONFIG_EXT4_ENCRYPTION=y +CONFIG_F2FS_FS=y +CONFIG_F2FS_FS_SECURITY=y +CONFIG_F2FS_FS_ENCRYPTION=y +CONFIG_QUOTA=y +CONFIG_QUOTA_NETLINK_INTERFACE=y +# CONFIG_PRINT_QUOTA_WARNING is not set +CONFIG_QFMT_V2=y +CONFIG_AUTOFS4_FS=y +CONFIG_FUSE_FS=y +CONFIG_MSDOS_FS=y +CONFIG_VFAT_FS=y +CONFIG_PROC_KCORE=y +CONFIG_TMPFS=y +CONFIG_TMPFS_POSIX_ACL=y +CONFIG_HUGETLBFS=y +CONFIG_SDCARD_FS=y +CONFIG_PSTORE=y +CONFIG_PSTORE_CONSOLE=y +CONFIG_PSTORE_RAM=y +CONFIG_NLS_DEFAULT="utf8" +CONFIG_NLS_CODEPAGE_437=y +CONFIG_NLS_ASCII=y +CONFIG_NLS_ISO8859_1=y +CONFIG_NLS_UTF8=y +CONFIG_PRINTK_TIME=y +CONFIG_DEBUG_INFO=y +# CONFIG_ENABLE_WARN_DEPRECATED is not set +# CONFIG_ENABLE_MUST_CHECK is not set +CONFIG_FRAME_WARN=1024 +# CONFIG_UNUSED_SYMBOLS is not set +CONFIG_MAGIC_SYSRQ=y +CONFIG_DEBUG_STACK_USAGE=y +CONFIG_DEBUG_MEMORY_INIT=y +CONFIG_DEBUG_STACKOVERFLOW=y +CONFIG_LOCKUP_DETECTOR=y +CONFIG_PANIC_TIMEOUT=5 +CONFIG_SCHEDSTATS=y +CONFIG_TIMER_STATS=y +CONFIG_RCU_CPU_STALL_TIMEOUT=60 +CONFIG_ENABLE_DEFAULT_TRACERS=y +CONFIG_UPROBE_EVENT=y +CONFIG_IO_DELAY_NONE=y +CONFIG_DEBUG_BOOT_PARAMS=y +CONFIG_OPTIMIZE_INLINING=y +CONFIG_SECURITY_PERF_EVENTS_RESTRICT=y +CONFIG_SECURITY=y +CONFIG_SECURITY_NETWORK=y +CONFIG_SECURITY_PATH=y +CONFIG_HARDENED_USERCOPY=y +CONFIG_SECURITY_SELINUX=y +CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1 +CONFIG_CRYPTO_RSA=y +# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set +CONFIG_CRYPTO_SHA512=y +CONFIG_CRYPTO_LZ4=y +CONFIG_CRYPTO_ZSTD=y +CONFIG_ASYMMETRIC_KEY_TYPE=y +CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y +CONFIG_X509_CERTIFICATE_PARSER=y +CONFIG_SYSTEM_TRUSTED_KEYRING=y +CONFIG_SYSTEM_TRUSTED_KEYS="verity_dev_keys.x509"
diff --git a/arch/x86/configs/x86_64_ranchu_defconfig b/arch/x86/configs/x86_64_ranchu_defconfig new file mode 100644 index 0000000..d50434f --- /dev/null +++ b/arch/x86/configs/x86_64_ranchu_defconfig
@@ -0,0 +1,419 @@ +# CONFIG_LOCALVERSION_AUTO is not set +CONFIG_POSIX_MQUEUE=y +CONFIG_AUDIT=y +CONFIG_NO_HZ=y +CONFIG_HIGH_RES_TIMERS=y +CONFIG_BSD_PROCESS_ACCT=y +CONFIG_TASKSTATS=y +CONFIG_TASK_DELAY_ACCT=y +CONFIG_TASK_XACCT=y +CONFIG_TASK_IO_ACCOUNTING=y +CONFIG_CGROUPS=y +CONFIG_CGROUP_DEBUG=y +CONFIG_CGROUP_FREEZER=y +CONFIG_CGROUP_CPUACCT=y +CONFIG_CGROUP_SCHED=y +CONFIG_RT_GROUP_SCHED=y +CONFIG_BLK_DEV_INITRD=y +CONFIG_CC_OPTIMIZE_FOR_SIZE=y +CONFIG_SYSCTL_SYSCALL=y +CONFIG_KALLSYMS_ALL=y +CONFIG_EMBEDDED=y +# CONFIG_COMPAT_BRK is not set +CONFIG_ARCH_MMAP_RND_BITS=32 +CONFIG_ARCH_MMAP_RND_COMPAT_BITS=16 +CONFIG_PARTITION_ADVANCED=y +CONFIG_OSF_PARTITION=y +CONFIG_AMIGA_PARTITION=y +CONFIG_MAC_PARTITION=y +CONFIG_BSD_DISKLABEL=y +CONFIG_MINIX_SUBPARTITION=y +CONFIG_SOLARIS_X86_PARTITION=y +CONFIG_UNIXWARE_DISKLABEL=y +CONFIG_SGI_PARTITION=y +CONFIG_SUN_PARTITION=y +CONFIG_KARMA_PARTITION=y +CONFIG_SMP=y +CONFIG_MCORE2=y +CONFIG_MAXSMP=y +CONFIG_PREEMPT=y +# CONFIG_X86_MCE is not set +CONFIG_X86_MSR=y +CONFIG_X86_CPUID=y +CONFIG_KSM=y +CONFIG_CMA=y +# CONFIG_MTRR_SANITIZER is not set +CONFIG_EFI=y +CONFIG_EFI_STUB=y +CONFIG_HZ_100=y +CONFIG_PHYSICAL_START=0x100000 +CONFIG_PM_AUTOSLEEP=y +CONFIG_PM_WAKELOCKS=y +CONFIG_PM_WAKELOCKS_LIMIT=0 +# CONFIG_PM_WAKELOCKS_GC is not set +CONFIG_PM_DEBUG=y +CONFIG_CPU_FREQ=y +# CONFIG_CPU_FREQ_STAT is not set +CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y +CONFIG_CPU_FREQ_GOV_USERSPACE=y +CONFIG_PCI_MMCONFIG=y +CONFIG_PCIEPORTBUS=y +# CONFIG_PCIEASPM is not set +CONFIG_PCCARD=y +CONFIG_YENTA=y +CONFIG_HOTPLUG_PCI=y +# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set +CONFIG_BINFMT_MISC=y +CONFIG_IA32_EMULATION=y +CONFIG_NET=y +CONFIG_PACKET=y +CONFIG_UNIX=y +CONFIG_XFRM_USER=y +CONFIG_NET_KEY=y +CONFIG_INET=y +CONFIG_IP_MULTICAST=y +CONFIG_IP_ADVANCED_ROUTER=y +CONFIG_IP_MULTIPLE_TABLES=y +CONFIG_IP_ROUTE_MULTIPATH=y +CONFIG_IP_ROUTE_VERBOSE=y +CONFIG_IP_PNP=y +CONFIG_IP_PNP_DHCP=y +CONFIG_IP_PNP_BOOTP=y +CONFIG_IP_PNP_RARP=y +CONFIG_IP_MROUTE=y +CONFIG_IP_PIMSM_V1=y +CONFIG_IP_PIMSM_V2=y +CONFIG_SYN_COOKIES=y +CONFIG_INET_ESP=y +# CONFIG_INET_XFRM_MODE_BEET is not set +# CONFIG_INET_LRO is not set +CONFIG_INET_DIAG_DESTROY=y +CONFIG_IPV6_ROUTER_PREF=y +CONFIG_IPV6_ROUTE_INFO=y +CONFIG_IPV6_OPTIMISTIC_DAD=y +CONFIG_INET6_AH=y +CONFIG_INET6_ESP=y +CONFIG_INET6_IPCOMP=y +CONFIG_IPV6_MIP6=y +CONFIG_IPV6_MULTIPLE_TABLES=y +CONFIG_NETLABEL=y +CONFIG_NETFILTER=y +CONFIG_NF_CONNTRACK=y +CONFIG_NF_CONNTRACK_SECMARK=y +CONFIG_NF_CONNTRACK_EVENTS=y +CONFIG_NF_CT_PROTO_DCCP=y +CONFIG_NF_CT_PROTO_SCTP=y +CONFIG_NF_CT_PROTO_UDPLITE=y +CONFIG_NF_CONNTRACK_AMANDA=y +CONFIG_NF_CONNTRACK_FTP=y +CONFIG_NF_CONNTRACK_H323=y +CONFIG_NF_CONNTRACK_IRC=y +CONFIG_NF_CONNTRACK_NETBIOS_NS=y +CONFIG_NF_CONNTRACK_PPTP=y +CONFIG_NF_CONNTRACK_SANE=y +CONFIG_NF_CONNTRACK_TFTP=y +CONFIG_NF_CT_NETLINK=y +CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y +CONFIG_NETFILTER_XT_TARGET_CONNMARK=y +CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y +CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y +CONFIG_NETFILTER_XT_TARGET_MARK=y +CONFIG_NETFILTER_XT_TARGET_NFLOG=y +CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y +CONFIG_NETFILTER_XT_TARGET_TPROXY=y +CONFIG_NETFILTER_XT_TARGET_TRACE=y +CONFIG_NETFILTER_XT_TARGET_SECMARK=y +CONFIG_NETFILTER_XT_TARGET_TCPMSS=y +CONFIG_NETFILTER_XT_MATCH_COMMENT=y +CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y +CONFIG_NETFILTER_XT_MATCH_CONNMARK=y +CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y +CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y +CONFIG_NETFILTER_XT_MATCH_HELPER=y +CONFIG_NETFILTER_XT_MATCH_IPRANGE=y +CONFIG_NETFILTER_XT_MATCH_LENGTH=y +CONFIG_NETFILTER_XT_MATCH_LIMIT=y +CONFIG_NETFILTER_XT_MATCH_MAC=y +CONFIG_NETFILTER_XT_MATCH_MARK=y +CONFIG_NETFILTER_XT_MATCH_POLICY=y +CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y +CONFIG_NETFILTER_XT_MATCH_QTAGUID=y +CONFIG_NETFILTER_XT_MATCH_QUOTA=y +CONFIG_NETFILTER_XT_MATCH_QUOTA2=y +CONFIG_NETFILTER_XT_MATCH_SOCKET=y +CONFIG_NETFILTER_XT_MATCH_STATE=y +CONFIG_NETFILTER_XT_MATCH_STATISTIC=y +CONFIG_NETFILTER_XT_MATCH_STRING=y +CONFIG_NETFILTER_XT_MATCH_TIME=y +CONFIG_NETFILTER_XT_MATCH_U32=y +CONFIG_NF_CONNTRACK_IPV4=y +CONFIG_IP_NF_IPTABLES=y +CONFIG_IP_NF_MATCH_AH=y +CONFIG_IP_NF_MATCH_ECN=y +CONFIG_IP_NF_MATCH_TTL=y +CONFIG_IP_NF_FILTER=y +CONFIG_IP_NF_TARGET_REJECT=y +CONFIG_IP_NF_MANGLE=y +CONFIG_IP_NF_RAW=y +CONFIG_IP_NF_SECURITY=y +CONFIG_IP_NF_ARPTABLES=y +CONFIG_IP_NF_ARPFILTER=y +CONFIG_IP_NF_ARP_MANGLE=y +CONFIG_NF_CONNTRACK_IPV6=y +CONFIG_IP6_NF_IPTABLES=y +CONFIG_IP6_NF_FILTER=y +CONFIG_IP6_NF_TARGET_REJECT=y +CONFIG_IP6_NF_MANGLE=y +CONFIG_IP6_NF_RAW=y +CONFIG_NET_SCHED=y +CONFIG_NET_SCH_HTB=y +CONFIG_NET_CLS_U32=y +CONFIG_NET_EMATCH=y +CONFIG_NET_EMATCH_U32=y +CONFIG_NET_CLS_ACT=y +CONFIG_CFG80211=y +CONFIG_MAC80211=y +CONFIG_MAC80211_LEDS=y +CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" +CONFIG_DMA_CMA=y +CONFIG_CONNECTOR=y +CONFIG_BLK_DEV_LOOP=y +CONFIG_BLK_DEV_RAM=y +CONFIG_BLK_DEV_RAM_SIZE=8192 +CONFIG_VIRTIO_BLK=y +CONFIG_BLK_DEV_SD=y +CONFIG_BLK_DEV_SR=y +CONFIG_BLK_DEV_SR_VENDOR=y +CONFIG_CHR_DEV_SG=y +CONFIG_SCSI_CONSTANTS=y +CONFIG_SCSI_SPI_ATTRS=y +CONFIG_SCSI_ISCSI_ATTRS=y +# CONFIG_SCSI_LOWLEVEL is not set +CONFIG_ATA=y +CONFIG_SATA_AHCI=y +CONFIG_ATA_PIIX=y +CONFIG_PATA_AMD=y +CONFIG_PATA_OLDPIIX=y +CONFIG_PATA_SCH=y +CONFIG_PATA_MPIIX=y +CONFIG_ATA_GENERIC=y +CONFIG_MD=y +CONFIG_BLK_DEV_MD=y +CONFIG_BLK_DEV_DM=y +CONFIG_DM_DEBUG=y +CONFIG_DM_CRYPT=y +CONFIG_DM_MIRROR=y +CONFIG_DM_ZERO=y +CONFIG_DM_UEVENT=y +CONFIG_DM_VERITY=y +CONFIG_DM_VERITY_FEC=y +CONFIG_NETDEVICES=y +CONFIG_NETCONSOLE=y +CONFIG_TUN=y +CONFIG_VIRTIO_NET=y +CONFIG_BNX2=y +CONFIG_TIGON3=y +CONFIG_NET_TULIP=y +CONFIG_E100=y +CONFIG_E1000=y +CONFIG_E1000E=y +CONFIG_SKY2=y +CONFIG_NE2K_PCI=y +CONFIG_FORCEDETH=y +CONFIG_8139TOO=y +# CONFIG_8139TOO_PIO is not set +CONFIG_R8169=y +CONFIG_FDDI=y +CONFIG_PPP=y +CONFIG_PPP_BSDCOMP=y +CONFIG_PPP_DEFLATE=y +CONFIG_PPP_MPPE=y +CONFIG_PPPOLAC=y +CONFIG_PPPOPNS=y +CONFIG_USB_USBNET=y +CONFIG_INPUT_POLLDEV=y +# CONFIG_INPUT_MOUSEDEV_PSAUX is not set +CONFIG_INPUT_EVDEV=y +CONFIG_INPUT_KEYRESET=y +# CONFIG_KEYBOARD_ATKBD is not set +CONFIG_KEYBOARD_GOLDFISH_EVENTS=y +# CONFIG_INPUT_MOUSE is not set +CONFIG_INPUT_JOYSTICK=y +CONFIG_JOYSTICK_XPAD=y +CONFIG_JOYSTICK_XPAD_FF=y +CONFIG_JOYSTICK_XPAD_LEDS=y +CONFIG_INPUT_TABLET=y +CONFIG_TABLET_USB_ACECAD=y +CONFIG_TABLET_USB_AIPTEK=y +CONFIG_TABLET_USB_GTCO=y +CONFIG_TABLET_USB_HANWANG=y +CONFIG_TABLET_USB_KBTAB=y +CONFIG_INPUT_TOUCHSCREEN=y +CONFIG_INPUT_MISC=y +CONFIG_INPUT_KEYCHORD=y +CONFIG_INPUT_UINPUT=y +CONFIG_INPUT_GPIO=y +# CONFIG_SERIO is not set +# CONFIG_VT is not set +# CONFIG_LEGACY_PTYS is not set +CONFIG_SERIAL_NONSTANDARD=y +# CONFIG_DEVMEM is not set +# CONFIG_DEVKMEM is not set +CONFIG_SERIAL_8250=y +CONFIG_SERIAL_8250_CONSOLE=y +CONFIG_VIRTIO_CONSOLE=y +CONFIG_NVRAM=y +CONFIG_I2C_I801=y +CONFIG_BATTERY_GOLDFISH=y +CONFIG_WATCHDOG=y +CONFIG_MEDIA_SUPPORT=y +CONFIG_AGP=y +CONFIG_AGP_AMD64=y +CONFIG_AGP_INTEL=y +CONFIG_DRM=y +CONFIG_FB_MODE_HELPERS=y +CONFIG_FB_TILEBLITTING=y +CONFIG_FB_EFI=y +CONFIG_FB_GOLDFISH=y +CONFIG_BACKLIGHT_LCD_SUPPORT=y +# CONFIG_LCD_CLASS_DEVICE is not set +CONFIG_SOUND=y +CONFIG_SND=y +CONFIG_HIDRAW=y +CONFIG_UHID=y +CONFIG_HID_A4TECH=y +CONFIG_HID_ACRUX=y +CONFIG_HID_ACRUX_FF=y +CONFIG_HID_APPLE=y +CONFIG_HID_BELKIN=y +CONFIG_HID_CHERRY=y +CONFIG_HID_CHICONY=y +CONFIG_HID_PRODIKEYS=y +CONFIG_HID_CYPRESS=y +CONFIG_HID_DRAGONRISE=y +CONFIG_DRAGONRISE_FF=y +CONFIG_HID_EMS_FF=y +CONFIG_HID_ELECOM=y +CONFIG_HID_EZKEY=y +CONFIG_HID_HOLTEK=y +CONFIG_HID_KEYTOUCH=y +CONFIG_HID_KYE=y +CONFIG_HID_UCLOGIC=y +CONFIG_HID_WALTOP=y +CONFIG_HID_GYRATION=y +CONFIG_HID_TWINHAN=y +CONFIG_HID_KENSINGTON=y +CONFIG_HID_LCPOWER=y +CONFIG_HID_LOGITECH=y +CONFIG_HID_LOGITECH_DJ=y +CONFIG_LOGITECH_FF=y +CONFIG_LOGIRUMBLEPAD2_FF=y +CONFIG_LOGIG940_FF=y +CONFIG_HID_MAGICMOUSE=y +CONFIG_HID_MICROSOFT=y +CONFIG_HID_MONTEREY=y +CONFIG_HID_MULTITOUCH=y +CONFIG_HID_NTRIG=y +CONFIG_HID_ORTEK=y +CONFIG_HID_PANTHERLORD=y +CONFIG_PANTHERLORD_FF=y +CONFIG_HID_PETALYNX=y +CONFIG_HID_PICOLCD=y +CONFIG_HID_PRIMAX=y +CONFIG_HID_ROCCAT=y +CONFIG_HID_SAITEK=y +CONFIG_HID_SAMSUNG=y +CONFIG_HID_SONY=y +CONFIG_HID_SPEEDLINK=y +CONFIG_HID_SUNPLUS=y +CONFIG_HID_GREENASIA=y +CONFIG_GREENASIA_FF=y +CONFIG_HID_SMARTJOYPLUS=y +CONFIG_SMARTJOYPLUS_FF=y +CONFIG_HID_TIVO=y +CONFIG_HID_TOPSEED=y +CONFIG_HID_THRUSTMASTER=y +CONFIG_HID_WACOM=y +CONFIG_HID_WIIMOTE=y +CONFIG_HID_ZEROPLUS=y +CONFIG_HID_ZYDACRON=y +CONFIG_HID_PID=y +CONFIG_USB_HIDDEV=y +CONFIG_USB_ANNOUNCE_NEW_DEVICES=y +CONFIG_USB_MON=y +CONFIG_USB_EHCI_HCD=y +# CONFIG_USB_EHCI_TT_NEWSCHED is not set +CONFIG_USB_OHCI_HCD=y +CONFIG_USB_UHCI_HCD=y +CONFIG_USB_PRINTER=y +CONFIG_USB_STORAGE=y +CONFIG_USB_OTG_WAKELOCK=y +CONFIG_EDAC=y +CONFIG_RTC_CLASS=y +# CONFIG_RTC_HCTOSYS is not set +CONFIG_DMADEVICES=y +CONFIG_VIRTIO_PCI=y +CONFIG_STAGING=y +CONFIG_ASHMEM=y +CONFIG_ANDROID_LOW_MEMORY_KILLER=y +CONFIG_SYNC=y +CONFIG_SW_SYNC=y +CONFIG_SYNC_FILE=y +CONFIG_ION=y +CONFIG_GOLDFISH_AUDIO=y +CONFIG_SND_HDA_INTEL=y +CONFIG_GOLDFISH=y +CONFIG_GOLDFISH_PIPE=y +CONFIG_GOLDFISH_SYNC=y +CONFIG_ANDROID=y +CONFIG_ANDROID_BINDER_IPC=y +CONFIG_ISCSI_IBFT_FIND=y +CONFIG_EXT4_FS=y +CONFIG_EXT4_FS_SECURITY=y +CONFIG_QUOTA=y +CONFIG_QUOTA_NETLINK_INTERFACE=y +# CONFIG_PRINT_QUOTA_WARNING is not set +CONFIG_FUSE_FS=y +CONFIG_ISO9660_FS=y +CONFIG_JOLIET=y +CONFIG_ZISOFS=y +CONFIG_MSDOS_FS=y +CONFIG_VFAT_FS=y +CONFIG_PROC_KCORE=y +CONFIG_TMPFS=y +CONFIG_TMPFS_POSIX_ACL=y +CONFIG_HUGETLBFS=y +CONFIG_PSTORE=y +CONFIG_PSTORE_CONSOLE=y +CONFIG_PSTORE_RAM=y +# CONFIG_NETWORK_FILESYSTEMS is not set +CONFIG_NLS_DEFAULT="utf8" +CONFIG_NLS_CODEPAGE_437=y +CONFIG_NLS_ASCII=y +CONFIG_NLS_ISO8859_1=y +CONFIG_NLS_UTF8=y +CONFIG_PRINTK_TIME=y +CONFIG_DEBUG_INFO=y +# CONFIG_ENABLE_WARN_DEPRECATED is not set +# CONFIG_ENABLE_MUST_CHECK is not set +# CONFIG_UNUSED_SYMBOLS is not set +CONFIG_MAGIC_SYSRQ=y +CONFIG_DEBUG_MEMORY_INIT=y +CONFIG_PANIC_TIMEOUT=5 +CONFIG_SCHEDSTATS=y +CONFIG_TIMER_STATS=y +CONFIG_SCHED_TRACER=y +CONFIG_BLK_DEV_IO_TRACE=y +CONFIG_PROVIDE_OHCI1394_DMA_INIT=y +CONFIG_KEYS=y +CONFIG_SECURITY=y +CONFIG_SECURITY_NETWORK=y +CONFIG_SECURITY_SELINUX=y +CONFIG_CRYPTO_TWOFISH=y +CONFIG_ASYMMETRIC_KEY_TYPE=y +CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y +CONFIG_X509_CERTIFICATE_PARSER=y +CONFIG_PKCS7_MESSAGE_PARSER=y +CONFIG_PKCS7_TEST_KEY=y +# CONFIG_VIRTUALIZATION is not set +CONFIG_CRC_T10DIF=y
diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S index a916c4a..5f6a5af 100644 --- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S +++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
@@ -65,7 +65,6 @@ #include <linux/linkage.h> #include <asm/inst.h> -#define CONCAT(a,b) a##b #define VMOVDQ vmovdqu #define xdata0 %xmm0 @@ -92,8 +91,6 @@ #define num_bytes %r8 #define tmp %r10 -#define DDQ(i) CONCAT(ddq_add_,i) -#define XMM(i) CONCAT(%xmm, i) #define DDQ_DATA 0 #define XDATA 1 #define KEY_128 1 @@ -131,12 +128,12 @@ /* generate a unique variable for ddq_add_x */ .macro setddq n - var_ddq_add = DDQ(\n) + var_ddq_add = ddq_add_\n .endm /* generate a unique variable for xmm register */ .macro setxdata n - var_xdata = XMM(\n) + var_xdata = %xmm\n .endm /* club the numeric 'id' to the symbol 'name' */
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index b0cd306..b83eafa 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c
@@ -22,6 +22,7 @@ #include <linux/user-return-notifier.h> #include <linux/nospec.h> #include <linux/uprobes.h> +#include <linux/syscalls.h> #include <asm/desc.h> #include <asm/traps.h> @@ -180,6 +181,8 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs) struct thread_info *ti = current_thread_info(); u32 cached_flags; + addr_limit_user_check(); + if (IS_ENABLED(CONFIG_PROVE_LOCKING) && WARN_ON(!irqs_disabled())) local_irq_disable();
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 870e941..d7649924e 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S
@@ -626,13 +626,8 @@ #endif /* Make sure APIC interrupt handlers end up in the irqentry section: */ -#if defined(CONFIG_FUNCTION_GRAPH_TRACER) || defined(CONFIG_KASAN) -# define PUSH_SECTION_IRQENTRY .pushsection .irqentry.text, "ax" -# define POP_SECTION_IRQENTRY .popsection -#else -# define PUSH_SECTION_IRQENTRY -# define POP_SECTION_IRQENTRY -#endif +#define PUSH_SECTION_IRQENTRY .pushsection .irqentry.text, "ax" +#define POP_SECTION_IRQENTRY .popsection .macro apicinterrupt num sym do_sym PUSH_SECTION_IRQENTRY
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile index d540966..51a858e 100644 --- a/arch/x86/entry/vdso/Makefile +++ b/arch/x86/entry/vdso/Makefile
@@ -171,7 +171,8 @@ sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@' VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=both) \ - $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS) + $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS) \ + $(filter --target=% --gcc-toolchain=%,$(KBUILD_CFLAGS)) GCOV_PROFILE := n #
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h index deca9b9..6665000 100644 --- a/arch/x86/include/asm/alternative.h +++ b/arch/x86/include/asm/alternative.h
@@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void *start, void *end) #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2, \ output, input...) \ { \ - register void *__sp asm(_ASM_SP); \ asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1,\ "call %P[new2]", feature2) \ - : output, "+r" (__sp) \ + : output, ASM_CALL_CONSTRAINT \ : [old] "i" (oldfunc), [new1] "i" (newfunc1), \ [new2] "i" (newfunc2), ## input); \ }
diff --git a/arch/x86/include/asm/idle.h b/arch/x86/include/asm/idle.h index c5d1785..02bab09 100644 --- a/arch/x86/include/asm/idle.h +++ b/arch/x86/include/asm/idle.h
@@ -1,13 +1,6 @@ #ifndef _ASM_X86_IDLE_H #define _ASM_X86_IDLE_H -#define IDLE_START 1 -#define IDLE_END 2 - -struct notifier_block; -void idle_notifier_register(struct notifier_block *n); -void idle_notifier_unregister(struct notifier_block *n); - #ifdef CONFIG_X86_64 void enter_idle(void); void exit_idle(void);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 04b7971..98dbad6 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h
@@ -462,8 +462,8 @@ int paravirt_disable_iospace(void); */ #ifdef CONFIG_X86_32 #define PVOP_VCALL_ARGS \ - unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx; \ - register void *__sp asm("esp") + unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx; + #define PVOP_CALL_ARGS PVOP_VCALL_ARGS #define PVOP_CALL_ARG1(x) "a" ((unsigned long)(x)) @@ -483,8 +483,8 @@ int paravirt_disable_iospace(void); /* [re]ax isn't an arg, but the return val */ #define PVOP_VCALL_ARGS \ unsigned long __edi = __edi, __esi = __esi, \ - __edx = __edx, __ecx = __ecx, __eax = __eax; \ - register void *__sp asm("rsp") + __edx = __edx, __ecx = __ecx, __eax = __eax; + #define PVOP_CALL_ARGS PVOP_VCALL_ARGS #define PVOP_CALL_ARG1(x) "D" ((unsigned long)(x)) @@ -523,7 +523,7 @@ int paravirt_disable_iospace(void); asm volatile(pre \ paravirt_alt(PARAVIRT_CALL) \ post \ - : call_clbr, "+r" (__sp) \ + : call_clbr, ASM_CALL_CONSTRAINT \ : paravirt_type(op), \ paravirt_clobber(clbr), \ ##__VA_ARGS__ \ @@ -533,7 +533,7 @@ int paravirt_disable_iospace(void); asm volatile(pre \ paravirt_alt(PARAVIRT_CALL) \ post \ - : call_clbr, "+r" (__sp) \ + : call_clbr, ASM_CALL_CONSTRAINT \ : paravirt_type(op), \ paravirt_clobber(clbr), \ ##__VA_ARGS__ \ @@ -560,7 +560,7 @@ int paravirt_disable_iospace(void); asm volatile(pre \ paravirt_alt(PARAVIRT_CALL) \ post \ - : call_clbr, "+r" (__sp) \ + : call_clbr, ASM_CALL_CONSTRAINT \ : paravirt_type(op), \ paravirt_clobber(clbr), \ ##__VA_ARGS__ \
diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h index 17f2186..4939f6ed 100644 --- a/arch/x86/include/asm/preempt.h +++ b/arch/x86/include/asm/preempt.h
@@ -94,19 +94,14 @@ static __always_inline bool should_resched(int preempt_offset) #ifdef CONFIG_PREEMPT extern asmlinkage void ___preempt_schedule(void); -# define __preempt_schedule() \ -({ \ - register void *__sp asm(_ASM_SP); \ - asm volatile ("call ___preempt_schedule" : "+r"(__sp)); \ -}) +# define __preempt_schedule() \ + asm volatile ("call ___preempt_schedule" : ASM_CALL_CONSTRAINT) extern asmlinkage void preempt_schedule(void); extern asmlinkage void ___preempt_schedule_notrace(void); -# define __preempt_schedule_notrace() \ -({ \ - register void *__sp asm(_ASM_SP); \ - asm volatile ("call ___preempt_schedule_notrace" : "+r"(__sp)); \ -}) +# define __preempt_schedule_notrace() \ + asm volatile ("call ___preempt_schedule_notrace" : ASM_CALL_CONSTRAINT) + extern asmlinkage void preempt_schedule_notrace(void); #endif
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h index a34e0d4..7116b79 100644 --- a/arch/x86/include/asm/rwsem.h +++ b/arch/x86/include/asm/rwsem.h
@@ -103,7 +103,6 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem) ({ \ long tmp; \ struct rw_semaphore* ret; \ - register void *__sp asm(_ASM_SP); \ \ asm volatile("# beginning down_write\n\t" \ LOCK_PREFIX " xadd %1,(%4)\n\t" \ @@ -114,7 +113,8 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem) " call " slow_path "\n" \ "1:\n" \ "# ending down_write" \ - : "+m" (sem->count), "=d" (tmp), "=a" (ret), "+r" (__sp) \ + : "+m" (sem->count), "=d" (tmp), \ + "=a" (ret), ASM_CALL_CONSTRAINT \ : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS) \ : "memory", "cc"); \ ret; \
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index 2d8788a..83252c2 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h
@@ -101,6 +101,7 @@ struct thread_info { #define TIF_SYSCALL_TRACEPOINT 28 /* syscall tracepoint instrumentation */ #define TIF_ADDR32 29 /* 32-bit address space on 64 bits */ #define TIF_X32 30 /* 32-bit native x86-64 binary */ +#define TIF_FSCHECK 31 /* Check FS is USER_DS on return */ #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) @@ -124,6 +125,7 @@ struct thread_info { #define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT) #define _TIF_ADDR32 (1 << TIF_ADDR32) #define _TIF_X32 (1 << TIF_X32) +#define _TIF_FSCHECK (1 << TIF_FSCHECK) /* * work to do in syscall_trace_enter(). Also includes TIF_NOHZ for @@ -137,7 +139,7 @@ struct thread_info { /* work to do on any return to user space */ #define _TIF_ALLWORK_MASK \ ((0x0000FFFF & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT | \ - _TIF_NOHZ) + _TIF_NOHZ | _TIF_FSCHECK) /* flags to check in __switch_to() */ #define _TIF_WORK_CTXSW \
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index a8d85a6..76a12ab 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h
@@ -8,6 +8,7 @@ #include <linux/kasan-checks.h> #include <linux/thread_info.h> #include <linux/string.h> +#include <linux/sched.h> #include <asm/asm.h> #include <asm/page.h> #include <asm/smap.h> @@ -31,7 +32,12 @@ #define get_ds() (KERNEL_DS) #define get_fs() (current->thread.addr_limit) -#define set_fs(x) (current->thread.addr_limit = (x)) +static inline void set_fs(mm_segment_t fs) +{ + current->thread.addr_limit = fs; + /* On user-mode return, check fs is correct */ + set_thread_flag(TIF_FSCHECK); +} #define segment_eq(a, b) ((a).seg == (b).seg) @@ -171,11 +177,11 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL)) ({ \ int __ret_gu; \ register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \ - register void *__sp asm(_ASM_SP); \ __chk_user_ptr(ptr); \ might_fault(); \ asm volatile("call __get_user_%P4" \ - : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \ + : "=a" (__ret_gu), "=r" (__val_gu), \ + ASM_CALL_CONSTRAINT \ : "0" (ptr), "i" (sizeof(*(ptr)))); \ (x) = (__force __typeof__(*(ptr))) __val_gu; \ __builtin_expect(__ret_gu, 0); \
diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h index ccdc23d..1fdcb8c 100644 --- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h
@@ -112,10 +112,9 @@ extern struct { char _entry[32]; } hypercall_page[]; register unsigned long __arg2 asm(__HYPERCALL_ARG2REG) = __arg2; \ register unsigned long __arg3 asm(__HYPERCALL_ARG3REG) = __arg3; \ register unsigned long __arg4 asm(__HYPERCALL_ARG4REG) = __arg4; \ - register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5; \ - register void *__sp asm(_ASM_SP); + register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5; -#define __HYPERCALL_0PARAM "=r" (__res), "+r" (__sp) +#define __HYPERCALL_0PARAM "=r" (__res), ASM_CALL_CONSTRAINT #define __HYPERCALL_1PARAM __HYPERCALL_0PARAM, "+r" (__arg1) #define __HYPERCALL_2PARAM __HYPERCALL_1PARAM, "+r" (__arg2) #define __HYPERCALL_3PARAM __HYPERCALL_2PARAM, "+r" (__arg3)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 00a9047..e9195a13 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c
@@ -68,19 +68,6 @@ EXPORT_PER_CPU_SYMBOL(cpu_tss); #ifdef CONFIG_X86_64 static DEFINE_PER_CPU(unsigned char, is_idle); -static ATOMIC_NOTIFIER_HEAD(idle_notifier); - -void idle_notifier_register(struct notifier_block *n) -{ - atomic_notifier_chain_register(&idle_notifier, n); -} -EXPORT_SYMBOL_GPL(idle_notifier_register); - -void idle_notifier_unregister(struct notifier_block *n) -{ - atomic_notifier_chain_unregister(&idle_notifier, n); -} -EXPORT_SYMBOL_GPL(idle_notifier_unregister); #endif /* @@ -397,14 +384,14 @@ static inline void play_dead(void) void enter_idle(void) { this_cpu_write(is_idle, 1); - atomic_notifier_call_chain(&idle_notifier, IDLE_START, NULL); + idle_notifier_call_chain(IDLE_START); } static void __exit_idle(void) { if (x86_test_and_clear_bit_percpu(0, is_idle) == 0) return; - atomic_notifier_call_chain(&idle_notifier, IDLE_END, NULL); + idle_notifier_call_chain(IDLE_END); } /* Called from interrupts to signify idle end */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 8888d894..ff7696c 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c
@@ -8907,7 +8907,6 @@ static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx) static void vmx_handle_external_intr(struct kvm_vcpu *vcpu) { u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO); - register void *__sp asm(_ASM_SP); /* * If external interrupt exists, IF bit is set in rflags/eflags on the @@ -8941,7 +8940,7 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu) #ifdef CONFIG_X86_64 [sp]"=&r"(tmp), #endif - "+r"(__sp) + ASM_CALL_CONSTRAINT : THUNK_TARGET(entry), [ss]"i"(__KERNEL_DS),
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 5c419b8..0da3945 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c
@@ -758,7 +758,6 @@ no_context(struct pt_regs *regs, unsigned long error_code, if (is_vmalloc_addr((void *)address) && (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) || address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) { - register void *__sp asm("rsp"); unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) - sizeof(void *); /* * We're likely to be running with very little stack space @@ -773,7 +772,7 @@ no_context(struct pt_regs *regs, unsigned long error_code, asm volatile ("movq %[stack], %%rsp\n\t" "call handle_stack_overflow\n\t" "1: jmp 1b" - : "+r" (__sp) + : ASM_CALL_CONSTRAINT : "D" ("kernel stack overflow (page fault)"), "S" (regs), "d" (address), [stack] "rm" (stack));
diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c index 842ca3c..18f9dad 100644 --- a/arch/x86/oprofile/nmi_int.c +++ b/arch/x86/oprofile/nmi_int.c
@@ -615,7 +615,7 @@ enum __force_cpu_type { static int force_cpu_type; -static int set_cpu_type(const char *str, struct kernel_param *kp) +static int set_cpu_type(const char *str, const struct kernel_param *kp) { if (!strcmp(str, "timer")) { force_cpu_type = timer;
diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h index 81435d9..fc7ca28 100644 --- a/arch/xtensa/include/uapi/asm/socket.h +++ b/arch/xtensa/include/uapi/asm/socket.h
@@ -101,4 +101,6 @@ #define SO_CNX_ADVICE 53 +#define SO_COOKIE 57 + #endif /* _XTENSA_SOCKET_H */
diff --git a/block/blk-core.c b/block/blk-core.c index 77b99bf..c13af61 100644 --- a/block/blk-core.c +++ b/block/blk-core.c
@@ -40,6 +40,8 @@ #include "blk.h" #include "blk-mq.h" +#include <linux/math64.h> + EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_remap); EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_remap); EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_complete); @@ -3569,3 +3571,52 @@ int __init blk_dev_init(void) return 0; } + +/* + * Blk IO latency support. We want this to be as cheap as possible, so doing + * this lockless (and avoiding atomics), a few off by a few errors in this + * code is not harmful, and we don't want to do anything that is + * perf-impactful. + * TODO : If necessary, we can make the histograms per-cpu and aggregate + * them when printing them out. + */ +ssize_t +blk_latency_hist_show(char* name, struct io_latency_state *s, char *buf, + int buf_size) +{ + int i; + int bytes_written = 0; + u_int64_t num_elem, elem; + int pct; + u_int64_t average; + + num_elem = s->latency_elems; + if (num_elem > 0) { + average = div64_u64(s->latency_sum, s->latency_elems); + bytes_written += scnprintf(buf + bytes_written, + buf_size - bytes_written, + "IO svc_time %s Latency Histogram (n = %llu," + " average = %llu):\n", name, num_elem, average); + for (i = 0; + i < ARRAY_SIZE(latency_x_axis_us); + i++) { + elem = s->latency_y_axis[i]; + pct = div64_u64(elem * 100, num_elem); + bytes_written += scnprintf(buf + bytes_written, + PAGE_SIZE - bytes_written, + "\t< %6lluus%15llu%15d%%\n", + latency_x_axis_us[i], + elem, pct); + } + /* Last element in y-axis table is overflow */ + elem = s->latency_y_axis[i]; + pct = div64_u64(elem * 100, num_elem); + bytes_written += scnprintf(buf + bytes_written, + PAGE_SIZE - bytes_written, + "\t>=%6lluus%15llu%15d%%\n", + latency_x_axis_us[i - 1], elem, pct); + } + + return bytes_written; +} +EXPORT_SYMBOL(blk_latency_hist_show);
diff --git a/build.config.cuttlefish.x86_64 b/build.config.cuttlefish.x86_64 new file mode 100644 index 0000000..8d56143 --- /dev/null +++ b/build.config.cuttlefish.x86_64
@@ -0,0 +1,16 @@ +ARCH=x86_64 +BRANCH=android-4.9 +CLANG_TRIPLE=x86_64-linux-gnu- +CROSS_COMPILE=x86_64-linux-androidkernel- +DEFCONFIG=x86_64_cuttlefish_defconfig +EXTRA_CMDS='' +KERNEL_DIR=common +POST_DEFCONFIG_CMDS="check_defconfig" +CLANG_PREBUILT_BIN=prebuilts-master/clang/host/linux-x86/clang-r328903/bin +LINUX_GCC_CROSS_COMPILE_PREBUILTS_BIN=prebuilts/gcc/linux-x86/x86/x86_64-linux-android-4.9/bin +FILES=" +arch/x86/boot/bzImage +vmlinux +System.map +" +STOP_SHIP_TRACEPRINTK=1
diff --git a/build.config.goldfish.arm b/build.config.goldfish.arm new file mode 100644 index 0000000..ff5646a --- /dev/null +++ b/build.config.goldfish.arm
@@ -0,0 +1,13 @@ +ARCH=arm +BRANCH=android-4.4 +CROSS_COMPILE=arm-linux-androidkernel- +DEFCONFIG=ranchu_defconfig +EXTRA_CMDS='' +KERNEL_DIR=common +LINUX_GCC_CROSS_COMPILE_PREBUILTS_BIN=prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/bin +FILES=" +arch/arm/boot/zImage +vmlinux +System.map +" +STOP_SHIP_TRACEPRINTK=1
diff --git a/build.config.goldfish.arm64 b/build.config.goldfish.arm64 new file mode 100644 index 0000000..4c896a67 --- /dev/null +++ b/build.config.goldfish.arm64
@@ -0,0 +1,13 @@ +ARCH=arm64 +BRANCH=android-4.4 +CROSS_COMPILE=aarch64-linux-android- +DEFCONFIG=ranchu64_defconfig +EXTRA_CMDS='' +KERNEL_DIR=common +LINUX_GCC_CROSS_COMPILE_PREBUILTS_BIN=prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/bin +FILES=" +arch/arm64/boot/Image +vmlinux +System.map +" +STOP_SHIP_TRACEPRINTK=1
diff --git a/build.config.goldfish.mips b/build.config.goldfish.mips new file mode 100644 index 0000000..9a14a44 --- /dev/null +++ b/build.config.goldfish.mips
@@ -0,0 +1,12 @@ +ARCH=mips +BRANCH=android-4.4 +CROSS_COMPILE=mips64el-linux-android- +DEFCONFIG=ranchu_defconfig +EXTRA_CMDS='' +KERNEL_DIR=common +LINUX_GCC_CROSS_COMPILE_PREBUILTS_BIN=prebuilts/gcc/linux-x86/mips/mips64el-linux-android-4.9/bin +FILES=" +vmlinux +System.map +" +STOP_SHIP_TRACEPRINTK=1
diff --git a/build.config.goldfish.mips64 b/build.config.goldfish.mips64 new file mode 100644 index 0000000..6ad9759 --- /dev/null +++ b/build.config.goldfish.mips64
@@ -0,0 +1,12 @@ +ARCH=mips +BRANCH=android-4.4 +CROSS_COMPILE=mips64el-linux-android- +DEFCONFIG=ranchu64_defconfig +EXTRA_CMDS='' +KERNEL_DIR=common +LINUX_GCC_CROSS_COMPILE_PREBUILTS_BIN=prebuilts/gcc/linux-x86/mips/mips64el-linux-android-4.9/bin +FILES=" +vmlinux +System.map +" +STOP_SHIP_TRACEPRINTK=1
diff --git a/build.config.goldfish.x86 b/build.config.goldfish.x86 new file mode 100644 index 0000000..2266c62 --- /dev/null +++ b/build.config.goldfish.x86
@@ -0,0 +1,13 @@ +ARCH=x86 +BRANCH=android-4.4 +CROSS_COMPILE=x86_64-linux-android- +DEFCONFIG=i386_ranchu_defconfig +EXTRA_CMDS='' +KERNEL_DIR=common +LINUX_GCC_CROSS_COMPILE_PREBUILTS_BIN=prebuilts/gcc/linux-x86/x86/x86_64-linux-android-4.9/bin +FILES=" +arch/x86/boot/bzImage +vmlinux +System.map +" +STOP_SHIP_TRACEPRINTK=1
diff --git a/build.config.goldfish.x86_64 b/build.config.goldfish.x86_64 new file mode 100644 index 0000000..08c42c2 --- /dev/null +++ b/build.config.goldfish.x86_64
@@ -0,0 +1,13 @@ +ARCH=x86_64 +BRANCH=android-4.4 +CROSS_COMPILE=x86_64-linux-android- +DEFCONFIG=x86_64_ranchu_defconfig +EXTRA_CMDS='' +KERNEL_DIR=common +LINUX_GCC_CROSS_COMPILE_PREBUILTS_BIN=prebuilts/gcc/linux-x86/x86/x86_64-linux-android-4.9/bin +FILES=" +arch/x86/boot/bzImage +vmlinux +System.map +" +STOP_SHIP_TRACEPRINTK=1
diff --git a/certs/system_keyring.c b/certs/system_keyring.c index 2476650..8cde8ea8 100644 --- a/certs/system_keyring.c +++ b/certs/system_keyring.c
@@ -241,5 +241,46 @@ int verify_pkcs7_signature(const void *data, size_t len, return ret; } EXPORT_SYMBOL_GPL(verify_pkcs7_signature); - #endif /* CONFIG_SYSTEM_DATA_VERIFICATION */ + +/** + * verify_signature_one - Verify a signature with keys from given keyring + * @sig: The signature to be verified + * @trusted_keys: Trusted keys to use (NULL for builtin trusted keys only, + * (void *)1UL for all trusted keys). + * @keyid: key description (not partial) + */ +int verify_signature_one(const struct public_key_signature *sig, + struct key *trusted_keys, const char *keyid) +{ + key_ref_t ref; + struct key *key; + int ret; + + if (!sig) + return -EBADMSG; + if (!trusted_keys) { + trusted_keys = builtin_trusted_keys; + } else if (trusted_keys == (void *)1UL) { +#ifdef CONFIG_SECONDARY_TRUSTED_KEYRING + trusted_keys = secondary_trusted_keys; +#else + trusted_keys = builtin_trusted_keys; +#endif + } + + ref = keyring_search(make_key_ref(trusted_keys, 1), + &key_type_asymmetric, keyid); + if (IS_ERR(ref)) { + pr_err("Asymmetric key (%s) not found in keyring(%s)\n", + keyid, trusted_keys->description); + return -ENOKEY; + } + + key = key_ref_to_ptr(ref); + ret = verify_signature(key, sig); + key_put(key); + return ret; +} +EXPORT_SYMBOL_GPL(verify_signature_one); +
diff --git a/crypto/Kconfig b/crypto/Kconfig index ab0d93a..64f50b7 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig
@@ -1444,6 +1444,20 @@ See also: <http://www.cl.cam.ac.uk/~rja14/serpent.html> +config CRYPTO_SPECK + tristate "Speck cipher algorithm" + select CRYPTO_ALGAPI + help + Speck is a lightweight block cipher that is tuned for optimal + performance in software (rather than hardware). + + Speck may not be as secure as AES, and should only be used on systems + where AES is not fast enough. + + See also: <https://eprint.iacr.org/2013/404.pdf> + + If unsure, say N. + config CRYPTO_TEA tristate "TEA, XTEA and XETA cipher algorithms" select CRYPTO_ALGAPI @@ -1608,6 +1622,15 @@ help This is the LZ4 high compression mode algorithm. +config CRYPTO_ZSTD + tristate "Zstd compression algorithm" + select CRYPTO_ALGAPI + select CRYPTO_ACOMP2 + select ZSTD_COMPRESS + select ZSTD_DECOMPRESS + help + This is the zstd algorithm. + comment "Random Number Generation" config CRYPTO_ANSI_CPRNG
diff --git a/crypto/Makefile b/crypto/Makefile index 9e52b3c..8a455d0 100644 --- a/crypto/Makefile +++ b/crypto/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_CRYPTO_KHAZAD) += khazad.o obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o obj-$(CONFIG_CRYPTO_SEED) += seed.o +obj-$(CONFIG_CRYPTO_SPECK) += speck.o obj-$(CONFIG_CRYPTO_SALSA20) += salsa20_generic.o obj-$(CONFIG_CRYPTO_CHACHA20) += chacha20_generic.o obj-$(CONFIG_CRYPTO_POLY1305) += poly1305_generic.o @@ -132,6 +133,7 @@ obj-$(CONFIG_CRYPTO_USER_API_SKCIPHER) += algif_skcipher.o obj-$(CONFIG_CRYPTO_USER_API_RNG) += algif_rng.o obj-$(CONFIG_CRYPTO_USER_API_AEAD) += algif_aead.o +obj-$(CONFIG_CRYPTO_ZSTD) += zstd.o # # generic algorithms and the async_tx api
diff --git a/crypto/api.c b/crypto/api.c index abf53e6..f12d6b9 100644 --- a/crypto/api.c +++ b/crypto/api.c
@@ -24,6 +24,7 @@ #include <linux/sched.h> #include <linux/slab.h> #include <linux/string.h> +#include <linux/completion.h> #include "internal.h" LIST_HEAD(crypto_alg_list); @@ -611,5 +612,17 @@ int crypto_has_alg(const char *name, u32 type, u32 mask) } EXPORT_SYMBOL_GPL(crypto_has_alg); +void crypto_req_done(struct crypto_async_request *req, int err) +{ + struct crypto_wait *wait = req->data; + + if (err == -EINPROGRESS) + return; + + wait->err = err; + complete(&wait->completion); +} +EXPORT_SYMBOL_GPL(crypto_req_done); + MODULE_DESCRIPTION("Cryptographic core API"); MODULE_LICENSE("GPL");
diff --git a/crypto/speck.c b/crypto/speck.c new file mode 100644 index 0000000..58aa9f7 --- /dev/null +++ b/crypto/speck.c
@@ -0,0 +1,307 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Speck: a lightweight block cipher + * + * Copyright (c) 2018 Google, Inc + * + * Speck has 10 variants, including 5 block sizes. For now we only implement + * the variants Speck128/128, Speck128/192, Speck128/256, Speck64/96, and + * Speck64/128. Speck${B}/${K} denotes the variant with a block size of B bits + * and a key size of K bits. The Speck128 variants are believed to be the most + * secure variants, and they use the same block size and key sizes as AES. The + * Speck64 variants are less secure, but on 32-bit processors are usually + * faster. The remaining variants (Speck32, Speck48, and Speck96) are even less + * secure and/or not as well suited for implementation on either 32-bit or + * 64-bit processors, so are omitted. + * + * Reference: "The Simon and Speck Families of Lightweight Block Ciphers" + * https://eprint.iacr.org/2013/404.pdf + * + * In a correspondence, the Speck designers have also clarified that the words + * should be interpreted in little-endian format, and the words should be + * ordered such that the first word of each block is 'y' rather than 'x', and + * the first key word (rather than the last) becomes the first round key. + */ + +#include <asm/unaligned.h> +#include <crypto/speck.h> +#include <linux/bitops.h> +#include <linux/crypto.h> +#include <linux/init.h> +#include <linux/module.h> + +/* Speck128 */ + +static __always_inline void speck128_round(u64 *x, u64 *y, u64 k) +{ + *x = ror64(*x, 8); + *x += *y; + *x ^= k; + *y = rol64(*y, 3); + *y ^= *x; +} + +static __always_inline void speck128_unround(u64 *x, u64 *y, u64 k) +{ + *y ^= *x; + *y = ror64(*y, 3); + *x ^= k; + *x -= *y; + *x = rol64(*x, 8); +} + +void crypto_speck128_encrypt(const struct speck128_tfm_ctx *ctx, + u8 *out, const u8 *in) +{ + u64 y = get_unaligned_le64(in); + u64 x = get_unaligned_le64(in + 8); + int i; + + for (i = 0; i < ctx->nrounds; i++) + speck128_round(&x, &y, ctx->round_keys[i]); + + put_unaligned_le64(y, out); + put_unaligned_le64(x, out + 8); +} +EXPORT_SYMBOL_GPL(crypto_speck128_encrypt); + +static void speck128_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) +{ + crypto_speck128_encrypt(crypto_tfm_ctx(tfm), out, in); +} + +void crypto_speck128_decrypt(const struct speck128_tfm_ctx *ctx, + u8 *out, const u8 *in) +{ + u64 y = get_unaligned_le64(in); + u64 x = get_unaligned_le64(in + 8); + int i; + + for (i = ctx->nrounds - 1; i >= 0; i--) + speck128_unround(&x, &y, ctx->round_keys[i]); + + put_unaligned_le64(y, out); + put_unaligned_le64(x, out + 8); +} +EXPORT_SYMBOL_GPL(crypto_speck128_decrypt); + +static void speck128_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) +{ + crypto_speck128_decrypt(crypto_tfm_ctx(tfm), out, in); +} + +int crypto_speck128_setkey(struct speck128_tfm_ctx *ctx, const u8 *key, + unsigned int keylen) +{ + u64 l[3]; + u64 k; + int i; + + switch (keylen) { + case SPECK128_128_KEY_SIZE: + k = get_unaligned_le64(key); + l[0] = get_unaligned_le64(key + 8); + ctx->nrounds = SPECK128_128_NROUNDS; + for (i = 0; i < ctx->nrounds; i++) { + ctx->round_keys[i] = k; + speck128_round(&l[0], &k, i); + } + break; + case SPECK128_192_KEY_SIZE: + k = get_unaligned_le64(key); + l[0] = get_unaligned_le64(key + 8); + l[1] = get_unaligned_le64(key + 16); + ctx->nrounds = SPECK128_192_NROUNDS; + for (i = 0; i < ctx->nrounds; i++) { + ctx->round_keys[i] = k; + speck128_round(&l[i % 2], &k, i); + } + break; + case SPECK128_256_KEY_SIZE: + k = get_unaligned_le64(key); + l[0] = get_unaligned_le64(key + 8); + l[1] = get_unaligned_le64(key + 16); + l[2] = get_unaligned_le64(key + 24); + ctx->nrounds = SPECK128_256_NROUNDS; + for (i = 0; i < ctx->nrounds; i++) { + ctx->round_keys[i] = k; + speck128_round(&l[i % 3], &k, i); + } + break; + default: + return -EINVAL; + } + + return 0; +} +EXPORT_SYMBOL_GPL(crypto_speck128_setkey); + +static int speck128_setkey(struct crypto_tfm *tfm, const u8 *key, + unsigned int keylen) +{ + return crypto_speck128_setkey(crypto_tfm_ctx(tfm), key, keylen); +} + +/* Speck64 */ + +static __always_inline void speck64_round(u32 *x, u32 *y, u32 k) +{ + *x = ror32(*x, 8); + *x += *y; + *x ^= k; + *y = rol32(*y, 3); + *y ^= *x; +} + +static __always_inline void speck64_unround(u32 *x, u32 *y, u32 k) +{ + *y ^= *x; + *y = ror32(*y, 3); + *x ^= k; + *x -= *y; + *x = rol32(*x, 8); +} + +void crypto_speck64_encrypt(const struct speck64_tfm_ctx *ctx, + u8 *out, const u8 *in) +{ + u32 y = get_unaligned_le32(in); + u32 x = get_unaligned_le32(in + 4); + int i; + + for (i = 0; i < ctx->nrounds; i++) + speck64_round(&x, &y, ctx->round_keys[i]); + + put_unaligned_le32(y, out); + put_unaligned_le32(x, out + 4); +} +EXPORT_SYMBOL_GPL(crypto_speck64_encrypt); + +static void speck64_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) +{ + crypto_speck64_encrypt(crypto_tfm_ctx(tfm), out, in); +} + +void crypto_speck64_decrypt(const struct speck64_tfm_ctx *ctx, + u8 *out, const u8 *in) +{ + u32 y = get_unaligned_le32(in); + u32 x = get_unaligned_le32(in + 4); + int i; + + for (i = ctx->nrounds - 1; i >= 0; i--) + speck64_unround(&x, &y, ctx->round_keys[i]); + + put_unaligned_le32(y, out); + put_unaligned_le32(x, out + 4); +} +EXPORT_SYMBOL_GPL(crypto_speck64_decrypt); + +static void speck64_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) +{ + crypto_speck64_decrypt(crypto_tfm_ctx(tfm), out, in); +} + +int crypto_speck64_setkey(struct speck64_tfm_ctx *ctx, const u8 *key, + unsigned int keylen) +{ + u32 l[3]; + u32 k; + int i; + + switch (keylen) { + case SPECK64_96_KEY_SIZE: + k = get_unaligned_le32(key); + l[0] = get_unaligned_le32(key + 4); + l[1] = get_unaligned_le32(key + 8); + ctx->nrounds = SPECK64_96_NROUNDS; + for (i = 0; i < ctx->nrounds; i++) { + ctx->round_keys[i] = k; + speck64_round(&l[i % 2], &k, i); + } + break; + case SPECK64_128_KEY_SIZE: + k = get_unaligned_le32(key); + l[0] = get_unaligned_le32(key + 4); + l[1] = get_unaligned_le32(key + 8); + l[2] = get_unaligned_le32(key + 12); + ctx->nrounds = SPECK64_128_NROUNDS; + for (i = 0; i < ctx->nrounds; i++) { + ctx->round_keys[i] = k; + speck64_round(&l[i % 3], &k, i); + } + break; + default: + return -EINVAL; + } + + return 0; +} +EXPORT_SYMBOL_GPL(crypto_speck64_setkey); + +static int speck64_setkey(struct crypto_tfm *tfm, const u8 *key, + unsigned int keylen) +{ + return crypto_speck64_setkey(crypto_tfm_ctx(tfm), key, keylen); +} + +/* Algorithm definitions */ + +static struct crypto_alg speck_algs[] = { + { + .cra_name = "speck128", + .cra_driver_name = "speck128-generic", + .cra_priority = 100, + .cra_flags = CRYPTO_ALG_TYPE_CIPHER, + .cra_blocksize = SPECK128_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct speck128_tfm_ctx), + .cra_module = THIS_MODULE, + .cra_u = { + .cipher = { + .cia_min_keysize = SPECK128_128_KEY_SIZE, + .cia_max_keysize = SPECK128_256_KEY_SIZE, + .cia_setkey = speck128_setkey, + .cia_encrypt = speck128_encrypt, + .cia_decrypt = speck128_decrypt + } + } + }, { + .cra_name = "speck64", + .cra_driver_name = "speck64-generic", + .cra_priority = 100, + .cra_flags = CRYPTO_ALG_TYPE_CIPHER, + .cra_blocksize = SPECK64_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct speck64_tfm_ctx), + .cra_module = THIS_MODULE, + .cra_u = { + .cipher = { + .cia_min_keysize = SPECK64_96_KEY_SIZE, + .cia_max_keysize = SPECK64_128_KEY_SIZE, + .cia_setkey = speck64_setkey, + .cia_encrypt = speck64_encrypt, + .cia_decrypt = speck64_decrypt + } + } + } +}; + +static int __init speck_module_init(void) +{ + return crypto_register_algs(speck_algs, ARRAY_SIZE(speck_algs)); +} + +static void __exit speck_module_exit(void) +{ + crypto_unregister_algs(speck_algs, ARRAY_SIZE(speck_algs)); +} + +module_init(speck_module_init); +module_exit(speck_module_exit); + +MODULE_DESCRIPTION("Speck block cipher (generic)"); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>"); +MODULE_ALIAS_CRYPTO("speck128"); +MODULE_ALIAS_CRYPTO("speck128-generic"); +MODULE_ALIAS_CRYPTO("speck64"); +MODULE_ALIAS_CRYPTO("speck64-generic");
diff --git a/crypto/testmgr.c b/crypto/testmgr.c index 62dffa0..f1df5d2 100644 --- a/crypto/testmgr.c +++ b/crypto/testmgr.c
@@ -3238,6 +3238,36 @@ static const struct alg_test_desc alg_test_descs[] = { } } }, { + .alg = "ecb(speck128)", + .test = alg_test_skcipher, + .suite = { + .cipher = { + .enc = { + .vecs = speck128_enc_tv_template, + .count = ARRAY_SIZE(speck128_enc_tv_template) + }, + .dec = { + .vecs = speck128_dec_tv_template, + .count = ARRAY_SIZE(speck128_dec_tv_template) + } + } + } + }, { + .alg = "ecb(speck64)", + .test = alg_test_skcipher, + .suite = { + .cipher = { + .enc = { + .vecs = speck64_enc_tv_template, + .count = ARRAY_SIZE(speck64_enc_tv_template) + }, + .dec = { + .vecs = speck64_dec_tv_template, + .count = ARRAY_SIZE(speck64_dec_tv_template) + } + } + } + }, { .alg = "ecb(tea)", .test = alg_test_skcipher, .suite = { @@ -4058,6 +4088,36 @@ static const struct alg_test_desc alg_test_descs[] = { } } }, { + .alg = "xts(speck128)", + .test = alg_test_skcipher, + .suite = { + .cipher = { + .enc = { + .vecs = speck128_xts_enc_tv_template, + .count = ARRAY_SIZE(speck128_xts_enc_tv_template) + }, + .dec = { + .vecs = speck128_xts_dec_tv_template, + .count = ARRAY_SIZE(speck128_xts_dec_tv_template) + } + } + } + }, { + .alg = "xts(speck64)", + .test = alg_test_skcipher, + .suite = { + .cipher = { + .enc = { + .vecs = speck64_xts_enc_tv_template, + .count = ARRAY_SIZE(speck64_xts_enc_tv_template) + }, + .dec = { + .vecs = speck64_xts_dec_tv_template, + .count = ARRAY_SIZE(speck64_xts_dec_tv_template) + } + } + } + }, { .alg = "xts(twofish)", .test = alg_test_skcipher, .suite = { @@ -4072,6 +4132,22 @@ static const struct alg_test_desc alg_test_descs[] = { } } } + }, { + .alg = "zstd", + .test = alg_test_comp, + .fips_allowed = 1, + .suite = { + .comp = { + .comp = { + .vecs = zstd_comp_tv_template, + .count = ZSTD_COMP_TEST_VECTORS + }, + .decomp = { + .vecs = zstd_decomp_tv_template, + .count = ZSTD_DECOMP_TEST_VECTORS + } + } + } } };
diff --git a/crypto/testmgr.h b/crypto/testmgr.h index 9033088..1f40d17 100644 --- a/crypto/testmgr.h +++ b/crypto/testmgr.h
@@ -13622,6 +13622,1492 @@ static struct cipher_testvec serpent_xts_dec_tv_template[] = { }, }; +/* + * Speck test vectors taken from the original paper: + * "The Simon and Speck Families of Lightweight Block Ciphers" + * https://eprint.iacr.org/2013/404.pdf + * + * Note that the paper does not make byte and word order clear. But it was + * confirmed with the authors that the intended orders are little endian byte + * order and (y, x) word order. Equivalently, the printed test vectors, when + * looking at only the bytes (ignoring the whitespace that divides them into + * words), are backwards: the left-most byte is actually the one with the + * highest memory address, while the right-most byte is actually the one with + * the lowest memory address. + */ + +static struct cipher_testvec speck128_enc_tv_template[] = { + { /* Speck128/128 */ + .key = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", + .klen = 16, + .input = "\x20\x6d\x61\x64\x65\x20\x69\x74" + "\x20\x65\x71\x75\x69\x76\x61\x6c", + .ilen = 16, + .result = "\x18\x0d\x57\x5c\xdf\xfe\x60\x78" + "\x65\x32\x78\x79\x51\x98\x5d\xa6", + .rlen = 16, + }, { /* Speck128/192 */ + .key = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17", + .klen = 24, + .input = "\x65\x6e\x74\x20\x74\x6f\x20\x43" + "\x68\x69\x65\x66\x20\x48\x61\x72", + .ilen = 16, + .result = "\x86\x18\x3c\xe0\x5d\x18\xbc\xf9" + "\x66\x55\x13\x13\x3a\xcf\xe4\x1b", + .rlen = 16, + }, { /* Speck128/256 */ + .key = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f", + .klen = 32, + .input = "\x70\x6f\x6f\x6e\x65\x72\x2e\x20" + "\x49\x6e\x20\x74\x68\x6f\x73\x65", + .ilen = 16, + .result = "\x43\x8f\x18\x9c\x8d\xb4\xee\x4e" + "\x3e\xf5\xc0\x05\x04\x01\x09\x41", + .rlen = 16, + }, +}; + +static struct cipher_testvec speck128_dec_tv_template[] = { + { /* Speck128/128 */ + .key = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", + .klen = 16, + .input = "\x18\x0d\x57\x5c\xdf\xfe\x60\x78" + "\x65\x32\x78\x79\x51\x98\x5d\xa6", + .ilen = 16, + .result = "\x20\x6d\x61\x64\x65\x20\x69\x74" + "\x20\x65\x71\x75\x69\x76\x61\x6c", + .rlen = 16, + }, { /* Speck128/192 */ + .key = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17", + .klen = 24, + .input = "\x86\x18\x3c\xe0\x5d\x18\xbc\xf9" + "\x66\x55\x13\x13\x3a\xcf\xe4\x1b", + .ilen = 16, + .result = "\x65\x6e\x74\x20\x74\x6f\x20\x43" + "\x68\x69\x65\x66\x20\x48\x61\x72", + .rlen = 16, + }, { /* Speck128/256 */ + .key = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f", + .klen = 32, + .input = "\x43\x8f\x18\x9c\x8d\xb4\xee\x4e" + "\x3e\xf5\xc0\x05\x04\x01\x09\x41", + .ilen = 16, + .result = "\x70\x6f\x6f\x6e\x65\x72\x2e\x20" + "\x49\x6e\x20\x74\x68\x6f\x73\x65", + .rlen = 16, + }, +}; + +/* + * Speck128-XTS test vectors, taken from the AES-XTS test vectors with the + * result recomputed with Speck128 as the cipher + */ + +static struct cipher_testvec speck128_xts_enc_tv_template[] = { + { + .key = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .klen = 32, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .ilen = 32, + .result = "\xbe\xa0\xe7\x03\xd7\xfe\xab\x62" + "\x3b\x99\x4a\x64\x74\x77\xac\xed" + "\xd8\xf4\xa6\xcf\xae\xb9\x07\x42" + "\x51\xd9\xb6\x1d\xe0\x5e\xbc\x54", + .rlen = 32, + }, { + .key = "\x11\x11\x11\x11\x11\x11\x11\x11" + "\x11\x11\x11\x11\x11\x11\x11\x11" + "\x22\x22\x22\x22\x22\x22\x22\x22" + "\x22\x22\x22\x22\x22\x22\x22\x22", + .klen = 32, + .iv = "\x33\x33\x33\x33\x33\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44", + .ilen = 32, + .result = "\xfb\x53\x81\x75\x6f\x9f\x34\xad" + "\x7e\x01\xed\x7b\xcc\xda\x4e\x4a" + "\xd4\x84\xa4\x53\xd5\x88\x73\x1b" + "\xfd\xcb\xae\x0d\xf3\x04\xee\xe6", + .rlen = 32, + }, { + .key = "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8" + "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0" + "\x22\x22\x22\x22\x22\x22\x22\x22" + "\x22\x22\x22\x22\x22\x22\x22\x22", + .klen = 32, + .iv = "\x33\x33\x33\x33\x33\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44", + .ilen = 32, + .result = "\x21\x52\x84\x15\xd1\xf7\x21\x55" + "\xd9\x75\x4a\xd3\xc5\xdb\x9f\x7d" + "\xda\x63\xb2\xf1\x82\xb0\x89\x59" + "\x86\xd4\xaa\xaa\xdd\xff\x4f\x92", + .rlen = 32, + }, { + .key = "\x27\x18\x28\x18\x28\x45\x90\x45" + "\x23\x53\x60\x28\x74\x71\x35\x26" + "\x31\x41\x59\x26\x53\x58\x97\x93" + "\x23\x84\x62\x64\x33\x83\x27\x95", + .klen = 32, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" + "\x20\x21\x22\x23\x24\x25\x26\x27" + "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" + "\x30\x31\x32\x33\x34\x35\x36\x37" + "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" + "\x40\x41\x42\x43\x44\x45\x46\x47" + "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" + "\x50\x51\x52\x53\x54\x55\x56\x57" + "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" + "\x60\x61\x62\x63\x64\x65\x66\x67" + "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" + "\x70\x71\x72\x73\x74\x75\x76\x77" + "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" + "\x80\x81\x82\x83\x84\x85\x86\x87" + "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" + "\x90\x91\x92\x93\x94\x95\x96\x97" + "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" + "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7" + "\xa8\xa9\xaa\xab\xac\xad\xae\xaf" + "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7" + "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" + "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7" + "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" + "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7" + "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" + "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7" + "\xe8\xe9\xea\xeb\xec\xed\xee\xef" + "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7" + "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff" + "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" + "\x20\x21\x22\x23\x24\x25\x26\x27" + "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" + "\x30\x31\x32\x33\x34\x35\x36\x37" + "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" + "\x40\x41\x42\x43\x44\x45\x46\x47" + "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" + "\x50\x51\x52\x53\x54\x55\x56\x57" + "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" + "\x60\x61\x62\x63\x64\x65\x66\x67" + "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" + "\x70\x71\x72\x73\x74\x75\x76\x77" + "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" + "\x80\x81\x82\x83\x84\x85\x86\x87" + "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" + "\x90\x91\x92\x93\x94\x95\x96\x97" + "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" + "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7" + "\xa8\xa9\xaa\xab\xac\xad\xae\xaf" + "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7" + "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" + "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7" + "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" + "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7" + "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" + "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7" + "\xe8\xe9\xea\xeb\xec\xed\xee\xef" + "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7" + "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff", + .ilen = 512, + .result = "\x57\xb5\xf8\x71\x6e\x6d\xdd\x82" + "\x53\xd0\xed\x2d\x30\xc1\x20\xef" + "\x70\x67\x5e\xff\x09\x70\xbb\xc1" + "\x3a\x7b\x48\x26\xd9\x0b\xf4\x48" + "\xbe\xce\xb1\xc7\xb2\x67\xc4\xa7" + "\x76\xf8\x36\x30\xb7\xb4\x9a\xd9" + "\xf5\x9d\xd0\x7b\xc1\x06\x96\x44" + "\x19\xc5\x58\x84\x63\xb9\x12\x68" + "\x68\xc7\xaa\x18\x98\xf2\x1f\x5c" + "\x39\xa6\xd8\x32\x2b\xc3\x51\xfd" + "\x74\x79\x2e\xb4\x44\xd7\x69\xc4" + "\xfc\x29\xe6\xed\x26\x1e\xa6\x9d" + "\x1c\xbe\x00\x0e\x7f\x3a\xca\xfb" + "\x6d\x13\x65\xa0\xf9\x31\x12\xe2" + "\x26\xd1\xec\x2b\x0a\x8b\x59\x99" + "\xa7\x49\xa0\x0e\x09\x33\x85\x50" + "\xc3\x23\xca\x7a\xdd\x13\x45\x5f" + "\xde\x4c\xa7\xcb\x00\x8a\x66\x6f" + "\xa2\xb6\xb1\x2e\xe1\xa0\x18\xf6" + "\xad\xf3\xbd\xeb\xc7\xef\x55\x4f" + "\x79\x91\x8d\x36\x13\x7b\xd0\x4a" + "\x6c\x39\xfb\x53\xb8\x6f\x02\x51" + "\xa5\x20\xac\x24\x1c\x73\x59\x73" + "\x58\x61\x3a\x87\x58\xb3\x20\x56" + "\x39\x06\x2b\x4d\xd3\x20\x2b\x89" + "\x3f\xa2\xf0\x96\xeb\x7f\xa4\xcd" + "\x11\xae\xbd\xcb\x3a\xb4\xd9\x91" + "\x09\x35\x71\x50\x65\xac\x92\xe3" + "\x7b\x32\xc0\x7a\xdd\xd4\xc3\x92" + "\x6f\xeb\x79\xde\x6f\xd3\x25\xc9" + "\xcd\x63\xf5\x1e\x7a\x3b\x26\x9d" + "\x77\x04\x80\xa9\xbf\x38\xb5\xbd" + "\xb8\x05\x07\xbd\xfd\xab\x7b\xf8" + "\x2a\x26\xcc\x49\x14\x6d\x55\x01" + "\x06\x94\xd8\xb2\x2d\x53\x83\x1b" + "\x8f\xd4\xdd\x57\x12\x7e\x18\xba" + "\x8e\xe2\x4d\x80\xef\x7e\x6b\x9d" + "\x24\xa9\x60\xa4\x97\x85\x86\x2a" + "\x01\x00\x09\xf1\xcb\x4a\x24\x1c" + "\xd8\xf6\xe6\x5b\xe7\x5d\xf2\xc4" + "\x97\x1c\x10\xc6\x4d\x66\x4f\x98" + "\x87\x30\xac\xd5\xea\x73\x49\x10" + "\x80\xea\xe5\x5f\x4d\x5f\x03\x33" + "\x66\x02\x35\x3d\x60\x06\x36\x4f" + "\x14\x1c\xd8\x07\x1f\x78\xd0\xf8" + "\x4f\x6c\x62\x7c\x15\xa5\x7c\x28" + "\x7c\xcc\xeb\x1f\xd1\x07\x90\x93" + "\x7e\xc2\xa8\x3a\x80\xc0\xf5\x30" + "\xcc\x75\xcf\x16\x26\xa9\x26\x3b" + "\xe7\x68\x2f\x15\x21\x5b\xe4\x00" + "\xbd\x48\x50\xcd\x75\x70\xc4\x62" + "\xbb\x41\xfb\x89\x4a\x88\x3b\x3b" + "\x51\x66\x02\x69\x04\x97\x36\xd4" + "\x75\xae\x0b\xa3\x42\xf8\xca\x79" + "\x8f\x93\xe9\xcc\x38\xbd\xd6\xd2" + "\xf9\x70\x4e\xc3\x6a\x8e\x25\xbd" + "\xea\x15\x5a\xa0\x85\x7e\x81\x0d" + "\x03\xe7\x05\x39\xf5\x05\x26\xee" + "\xec\xaa\x1f\x3d\xc9\x98\x76\x01" + "\x2c\xf4\xfc\xa3\x88\x77\x38\xc4" + "\x50\x65\x50\x6d\x04\x1f\xdf\x5a" + "\xaa\xf2\x01\xa9\xc1\x8d\xee\xca" + "\x47\x26\xef\x39\xb8\xb4\xf2\xd1" + "\xd6\xbb\x1b\x2a\xc1\x34\x14\xcf", + .rlen = 512, + }, { + .key = "\x27\x18\x28\x18\x28\x45\x90\x45" + "\x23\x53\x60\x28\x74\x71\x35\x26" + "\x62\x49\x77\x57\x24\x70\x93\x69" + "\x99\x59\x57\x49\x66\x96\x76\x27" + "\x31\x41\x59\x26\x53\x58\x97\x93" + "\x23\x84\x62\x64\x33\x83\x27\x95" + "\x02\x88\x41\x97\x16\x93\x99\x37" + "\x51\x05\x82\x09\x74\x94\x45\x92", + .klen = 64, + .iv = "\xff\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" + "\x20\x21\x22\x23\x24\x25\x26\x27" + "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" + "\x30\x31\x32\x33\x34\x35\x36\x37" + "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" + "\x40\x41\x42\x43\x44\x45\x46\x47" + "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" + "\x50\x51\x52\x53\x54\x55\x56\x57" + "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" + "\x60\x61\x62\x63\x64\x65\x66\x67" + "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" + "\x70\x71\x72\x73\x74\x75\x76\x77" + "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" + "\x80\x81\x82\x83\x84\x85\x86\x87" + "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" + "\x90\x91\x92\x93\x94\x95\x96\x97" + "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" + "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7" + "\xa8\xa9\xaa\xab\xac\xad\xae\xaf" + "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7" + "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" + "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7" + "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" + "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7" + "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" + "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7" + "\xe8\xe9\xea\xeb\xec\xed\xee\xef" + "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7" + "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff" + "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" + "\x20\x21\x22\x23\x24\x25\x26\x27" + "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" + "\x30\x31\x32\x33\x34\x35\x36\x37" + "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" + "\x40\x41\x42\x43\x44\x45\x46\x47" + "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" + "\x50\x51\x52\x53\x54\x55\x56\x57" + "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" + "\x60\x61\x62\x63\x64\x65\x66\x67" + "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" + "\x70\x71\x72\x73\x74\x75\x76\x77" + "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" + "\x80\x81\x82\x83\x84\x85\x86\x87" + "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" + "\x90\x91\x92\x93\x94\x95\x96\x97" + "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" + "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7" + "\xa8\xa9\xaa\xab\xac\xad\xae\xaf" + "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7" + "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" + "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7" + "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" + "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7" + "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" + "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7" + "\xe8\xe9\xea\xeb\xec\xed\xee\xef" + "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7" + "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff", + .ilen = 512, + .result = "\xc5\x85\x2a\x4b\x73\xe4\xf6\xf1" + "\x7e\xf9\xf6\xe9\xa3\x73\x36\xcb" + "\xaa\xb6\x22\xb0\x24\x6e\x3d\x73" + "\x92\x99\xde\xd3\x76\xed\xcd\x63" + "\x64\x3a\x22\x57\xc1\x43\x49\xd4" + "\x79\x36\x31\x19\x62\xae\x10\x7e" + "\x7d\xcf\x7a\xe2\x6b\xce\x27\xfa" + "\xdc\x3d\xd9\x83\xd3\x42\x4c\xe0" + "\x1b\xd6\x1d\x1a\x6f\xd2\x03\x00" + "\xfc\x81\x99\x8a\x14\x62\xf5\x7e" + "\x0d\xe7\x12\xe8\x17\x9d\x0b\xec" + "\xe2\xf7\xc9\xa7\x63\xd1\x79\xb6" + "\x62\x62\x37\xfe\x0a\x4c\x4a\x37" + "\x70\xc7\x5e\x96\x5f\xbc\x8e\x9e" + "\x85\x3c\x4f\x26\x64\x85\xbc\x68" + "\xb0\xe0\x86\x5e\x26\x41\xce\x11" + "\x50\xda\x97\x14\xe9\x9e\xc7\x6d" + "\x3b\xdc\x43\xde\x2b\x27\x69\x7d" + "\xfc\xb0\x28\xbd\x8f\xb1\xc6\x31" + "\x14\x4d\xf0\x74\x37\xfd\x07\x25" + "\x96\x55\xe5\xfc\x9e\x27\x2a\x74" + "\x1b\x83\x4d\x15\x83\xac\x57\xa0" + "\xac\xa5\xd0\x38\xef\x19\x56\x53" + "\x25\x4b\xfc\xce\x04\x23\xe5\x6b" + "\xf6\xc6\x6c\x32\x0b\xb3\x12\xc5" + "\xed\x22\x34\x1c\x5d\xed\x17\x06" + "\x36\xa3\xe6\x77\xb9\x97\x46\xb8" + "\xe9\x3f\x7e\xc7\xbc\x13\x5c\xdc" + "\x6e\x3f\x04\x5e\xd1\x59\xa5\x82" + "\x35\x91\x3d\x1b\xe4\x97\x9f\x92" + "\x1c\x5e\x5f\x6f\x41\xd4\x62\xa1" + "\x8d\x39\xfc\x42\xfb\x38\x80\xb9" + "\x0a\xe3\xcc\x6a\x93\xd9\x7a\xb1" + "\xe9\x69\xaf\x0a\x6b\x75\x38\xa7" + "\xa1\xbf\xf7\xda\x95\x93\x4b\x78" + "\x19\xf5\x94\xf9\xd2\x00\x33\x37" + "\xcf\xf5\x9e\x9c\xf3\xcc\xa6\xee" + "\x42\xb2\x9e\x2c\x5f\x48\x23\x26" + "\x15\x25\x17\x03\x3d\xfe\x2c\xfc" + "\xeb\xba\xda\xe0\x00\x05\xb6\xa6" + "\x07\xb3\xe8\x36\x5b\xec\x5b\xbf" + "\xd6\x5b\x00\x74\xc6\x97\xf1\x6a" + "\x49\xa1\xc3\xfa\x10\x52\xb9\x14" + "\xad\xb7\x73\xf8\x78\x12\xc8\x59" + "\x17\x80\x4c\x57\x39\xf1\x6d\x80" + "\x25\x77\x0f\x5e\x7d\xf0\xaf\x21" + "\xec\xce\xb7\xc8\x02\x8a\xed\x53" + "\x2c\x25\x68\x2e\x1f\x85\x5e\x67" + "\xd1\x07\x7a\x3a\x89\x08\xe0\x34" + "\xdc\xdb\x26\xb4\x6b\x77\xfc\x40" + "\x31\x15\x72\xa0\xf0\x73\xd9\x3b" + "\xd5\xdb\xfe\xfc\x8f\xa9\x44\xa2" + "\x09\x9f\xc6\x33\xe5\xe2\x88\xe8" + "\xf3\xf0\x1a\xf4\xce\x12\x0f\xd6" + "\xf7\x36\xe6\xa4\xf4\x7a\x10\x58" + "\xcc\x1f\x48\x49\x65\x47\x75\xe9" + "\x28\xe1\x65\x7b\xf2\xc4\xb5\x07" + "\xf2\xec\x76\xd8\x8f\x09\xf3\x16" + "\xa1\x51\x89\x3b\xeb\x96\x42\xac" + "\x65\xe0\x67\x63\x29\xdc\xb4\x7d" + "\xf2\x41\x51\x6a\xcb\xde\x3c\xfb" + "\x66\x8d\x13\xca\xe0\x59\x2a\x00" + "\xc9\x53\x4c\xe6\x9e\xe2\x73\xd5" + "\x67\x19\xb2\xbd\x9a\x63\xd7\x5c", + .rlen = 512, + .also_non_np = 1, + .np = 3, + .tap = { 512 - 20, 4, 16 }, + } +}; + +static struct cipher_testvec speck128_xts_dec_tv_template[] = { + { + .key = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .klen = 32, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\xbe\xa0\xe7\x03\xd7\xfe\xab\x62" + "\x3b\x99\x4a\x64\x74\x77\xac\xed" + "\xd8\xf4\xa6\xcf\xae\xb9\x07\x42" + "\x51\xd9\xb6\x1d\xe0\x5e\xbc\x54", + .ilen = 32, + .result = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .rlen = 32, + }, { + .key = "\x11\x11\x11\x11\x11\x11\x11\x11" + "\x11\x11\x11\x11\x11\x11\x11\x11" + "\x22\x22\x22\x22\x22\x22\x22\x22" + "\x22\x22\x22\x22\x22\x22\x22\x22", + .klen = 32, + .iv = "\x33\x33\x33\x33\x33\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\xfb\x53\x81\x75\x6f\x9f\x34\xad" + "\x7e\x01\xed\x7b\xcc\xda\x4e\x4a" + "\xd4\x84\xa4\x53\xd5\x88\x73\x1b" + "\xfd\xcb\xae\x0d\xf3\x04\xee\xe6", + .ilen = 32, + .result = "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44", + .rlen = 32, + }, { + .key = "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8" + "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0" + "\x22\x22\x22\x22\x22\x22\x22\x22" + "\x22\x22\x22\x22\x22\x22\x22\x22", + .klen = 32, + .iv = "\x33\x33\x33\x33\x33\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x21\x52\x84\x15\xd1\xf7\x21\x55" + "\xd9\x75\x4a\xd3\xc5\xdb\x9f\x7d" + "\xda\x63\xb2\xf1\x82\xb0\x89\x59" + "\x86\xd4\xaa\xaa\xdd\xff\x4f\x92", + .ilen = 32, + .result = "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44", + .rlen = 32, + }, { + .key = "\x27\x18\x28\x18\x28\x45\x90\x45" + "\x23\x53\x60\x28\x74\x71\x35\x26" + "\x31\x41\x59\x26\x53\x58\x97\x93" + "\x23\x84\x62\x64\x33\x83\x27\x95", + .klen = 32, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x57\xb5\xf8\x71\x6e\x6d\xdd\x82" + "\x53\xd0\xed\x2d\x30\xc1\x20\xef" + "\x70\x67\x5e\xff\x09\x70\xbb\xc1" + "\x3a\x7b\x48\x26\xd9\x0b\xf4\x48" + "\xbe\xce\xb1\xc7\xb2\x67\xc4\xa7" + "\x76\xf8\x36\x30\xb7\xb4\x9a\xd9" + "\xf5\x9d\xd0\x7b\xc1\x06\x96\x44" + "\x19\xc5\x58\x84\x63\xb9\x12\x68" + "\x68\xc7\xaa\x18\x98\xf2\x1f\x5c" + "\x39\xa6\xd8\x32\x2b\xc3\x51\xfd" + "\x74\x79\x2e\xb4\x44\xd7\x69\xc4" + "\xfc\x29\xe6\xed\x26\x1e\xa6\x9d" + "\x1c\xbe\x00\x0e\x7f\x3a\xca\xfb" + "\x6d\x13\x65\xa0\xf9\x31\x12\xe2" + "\x26\xd1\xec\x2b\x0a\x8b\x59\x99" + "\xa7\x49\xa0\x0e\x09\x33\x85\x50" + "\xc3\x23\xca\x7a\xdd\x13\x45\x5f" + "\xde\x4c\xa7\xcb\x00\x8a\x66\x6f" + "\xa2\xb6\xb1\x2e\xe1\xa0\x18\xf6" + "\xad\xf3\xbd\xeb\xc7\xef\x55\x4f" + "\x79\x91\x8d\x36\x13\x7b\xd0\x4a" + "\x6c\x39\xfb\x53\xb8\x6f\x02\x51" + "\xa5\x20\xac\x24\x1c\x73\x59\x73" + "\x58\x61\x3a\x87\x58\xb3\x20\x56" + "\x39\x06\x2b\x4d\xd3\x20\x2b\x89" + "\x3f\xa2\xf0\x96\xeb\x7f\xa4\xcd" + "\x11\xae\xbd\xcb\x3a\xb4\xd9\x91" + "\x09\x35\x71\x50\x65\xac\x92\xe3" + "\x7b\x32\xc0\x7a\xdd\xd4\xc3\x92" + "\x6f\xeb\x79\xde\x6f\xd3\x25\xc9" + "\xcd\x63\xf5\x1e\x7a\x3b\x26\x9d" + "\x77\x04\x80\xa9\xbf\x38\xb5\xbd" + "\xb8\x05\x07\xbd\xfd\xab\x7b\xf8" + "\x2a\x26\xcc\x49\x14\x6d\x55\x01" + "\x06\x94\xd8\xb2\x2d\x53\x83\x1b" + "\x8f\xd4\xdd\x57\x12\x7e\x18\xba" + "\x8e\xe2\x4d\x80\xef\x7e\x6b\x9d" + "\x24\xa9\x60\xa4\x97\x85\x86\x2a" + "\x01\x00\x09\xf1\xcb\x4a\x24\x1c" + "\xd8\xf6\xe6\x5b\xe7\x5d\xf2\xc4" + "\x97\x1c\x10\xc6\x4d\x66\x4f\x98" + "\x87\x30\xac\xd5\xea\x73\x49\x10" + "\x80\xea\xe5\x5f\x4d\x5f\x03\x33" + "\x66\x02\x35\x3d\x60\x06\x36\x4f" + "\x14\x1c\xd8\x07\x1f\x78\xd0\xf8" + "\x4f\x6c\x62\x7c\x15\xa5\x7c\x28" + "\x7c\xcc\xeb\x1f\xd1\x07\x90\x93" + "\x7e\xc2\xa8\x3a\x80\xc0\xf5\x30" + "\xcc\x75\xcf\x16\x26\xa9\x26\x3b" + "\xe7\x68\x2f\x15\x21\x5b\xe4\x00" + "\xbd\x48\x50\xcd\x75\x70\xc4\x62" + "\xbb\x41\xfb\x89\x4a\x88\x3b\x3b" + "\x51\x66\x02\x69\x04\x97\x36\xd4" + "\x75\xae\x0b\xa3\x42\xf8\xca\x79" + "\x8f\x93\xe9\xcc\x38\xbd\xd6\xd2" + "\xf9\x70\x4e\xc3\x6a\x8e\x25\xbd" + "\xea\x15\x5a\xa0\x85\x7e\x81\x0d" + "\x03\xe7\x05\x39\xf5\x05\x26\xee" + "\xec\xaa\x1f\x3d\xc9\x98\x76\x01" + "\x2c\xf4\xfc\xa3\x88\x77\x38\xc4" + "\x50\x65\x50\x6d\x04\x1f\xdf\x5a" + "\xaa\xf2\x01\xa9\xc1\x8d\xee\xca" + "\x47\x26\xef\x39\xb8\xb4\xf2\xd1" + "\xd6\xbb\x1b\x2a\xc1\x34\x14\xcf", + .ilen = 512, + .result = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" + "\x20\x21\x22\x23\x24\x25\x26\x27" + "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" + "\x30\x31\x32\x33\x34\x35\x36\x37" + "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" + "\x40\x41\x42\x43\x44\x45\x46\x47" + "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" + "\x50\x51\x52\x53\x54\x55\x56\x57" + "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" + "\x60\x61\x62\x63\x64\x65\x66\x67" + "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" + "\x70\x71\x72\x73\x74\x75\x76\x77" + "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" + "\x80\x81\x82\x83\x84\x85\x86\x87" + "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" + "\x90\x91\x92\x93\x94\x95\x96\x97" + "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" + "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7" + "\xa8\xa9\xaa\xab\xac\xad\xae\xaf" + "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7" + "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" + "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7" + "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" + "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7" + "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" + "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7" + "\xe8\xe9\xea\xeb\xec\xed\xee\xef" + "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7" + "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff" + "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" + "\x20\x21\x22\x23\x24\x25\x26\x27" + "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" + "\x30\x31\x32\x33\x34\x35\x36\x37" + "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" + "\x40\x41\x42\x43\x44\x45\x46\x47" + "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" + "\x50\x51\x52\x53\x54\x55\x56\x57" + "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" + "\x60\x61\x62\x63\x64\x65\x66\x67" + "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" + "\x70\x71\x72\x73\x74\x75\x76\x77" + "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" + "\x80\x81\x82\x83\x84\x85\x86\x87" + "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" + "\x90\x91\x92\x93\x94\x95\x96\x97" + "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" + "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7" + "\xa8\xa9\xaa\xab\xac\xad\xae\xaf" + "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7" + "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" + "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7" + "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" + "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7" + "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" + "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7" + "\xe8\xe9\xea\xeb\xec\xed\xee\xef" + "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7" + "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff", + .rlen = 512, + }, { + .key = "\x27\x18\x28\x18\x28\x45\x90\x45" + "\x23\x53\x60\x28\x74\x71\x35\x26" + "\x62\x49\x77\x57\x24\x70\x93\x69" + "\x99\x59\x57\x49\x66\x96\x76\x27" + "\x31\x41\x59\x26\x53\x58\x97\x93" + "\x23\x84\x62\x64\x33\x83\x27\x95" + "\x02\x88\x41\x97\x16\x93\x99\x37" + "\x51\x05\x82\x09\x74\x94\x45\x92", + .klen = 64, + .iv = "\xff\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\xc5\x85\x2a\x4b\x73\xe4\xf6\xf1" + "\x7e\xf9\xf6\xe9\xa3\x73\x36\xcb" + "\xaa\xb6\x22\xb0\x24\x6e\x3d\x73" + "\x92\x99\xde\xd3\x76\xed\xcd\x63" + "\x64\x3a\x22\x57\xc1\x43\x49\xd4" + "\x79\x36\x31\x19\x62\xae\x10\x7e" + "\x7d\xcf\x7a\xe2\x6b\xce\x27\xfa" + "\xdc\x3d\xd9\x83\xd3\x42\x4c\xe0" + "\x1b\xd6\x1d\x1a\x6f\xd2\x03\x00" + "\xfc\x81\x99\x8a\x14\x62\xf5\x7e" + "\x0d\xe7\x12\xe8\x17\x9d\x0b\xec" + "\xe2\xf7\xc9\xa7\x63\xd1\x79\xb6" + "\x62\x62\x37\xfe\x0a\x4c\x4a\x37" + "\x70\xc7\x5e\x96\x5f\xbc\x8e\x9e" + "\x85\x3c\x4f\x26\x64\x85\xbc\x68" + "\xb0\xe0\x86\x5e\x26\x41\xce\x11" + "\x50\xda\x97\x14\xe9\x9e\xc7\x6d" + "\x3b\xdc\x43\xde\x2b\x27\x69\x7d" + "\xfc\xb0\x28\xbd\x8f\xb1\xc6\x31" + "\x14\x4d\xf0\x74\x37\xfd\x07\x25" + "\x96\x55\xe5\xfc\x9e\x27\x2a\x74" + "\x1b\x83\x4d\x15\x83\xac\x57\xa0" + "\xac\xa5\xd0\x38\xef\x19\x56\x53" + "\x25\x4b\xfc\xce\x04\x23\xe5\x6b" + "\xf6\xc6\x6c\x32\x0b\xb3\x12\xc5" + "\xed\x22\x34\x1c\x5d\xed\x17\x06" + "\x36\xa3\xe6\x77\xb9\x97\x46\xb8" + "\xe9\x3f\x7e\xc7\xbc\x13\x5c\xdc" + "\x6e\x3f\x04\x5e\xd1\x59\xa5\x82" + "\x35\x91\x3d\x1b\xe4\x97\x9f\x92" + "\x1c\x5e\x5f\x6f\x41\xd4\x62\xa1" + "\x8d\x39\xfc\x42\xfb\x38\x80\xb9" + "\x0a\xe3\xcc\x6a\x93\xd9\x7a\xb1" + "\xe9\x69\xaf\x0a\x6b\x75\x38\xa7" + "\xa1\xbf\xf7\xda\x95\x93\x4b\x78" + "\x19\xf5\x94\xf9\xd2\x00\x33\x37" + "\xcf\xf5\x9e\x9c\xf3\xcc\xa6\xee" + "\x42\xb2\x9e\x2c\x5f\x48\x23\x26" + "\x15\x25\x17\x03\x3d\xfe\x2c\xfc" + "\xeb\xba\xda\xe0\x00\x05\xb6\xa6" + "\x07\xb3\xe8\x36\x5b\xec\x5b\xbf" + "\xd6\x5b\x00\x74\xc6\x97\xf1\x6a" + "\x49\xa1\xc3\xfa\x10\x52\xb9\x14" + "\xad\xb7\x73\xf8\x78\x12\xc8\x59" + "\x17\x80\x4c\x57\x39\xf1\x6d\x80" + "\x25\x77\x0f\x5e\x7d\xf0\xaf\x21" + "\xec\xce\xb7\xc8\x02\x8a\xed\x53" + "\x2c\x25\x68\x2e\x1f\x85\x5e\x67" + "\xd1\x07\x7a\x3a\x89\x08\xe0\x34" + "\xdc\xdb\x26\xb4\x6b\x77\xfc\x40" + "\x31\x15\x72\xa0\xf0\x73\xd9\x3b" + "\xd5\xdb\xfe\xfc\x8f\xa9\x44\xa2" + "\x09\x9f\xc6\x33\xe5\xe2\x88\xe8" + "\xf3\xf0\x1a\xf4\xce\x12\x0f\xd6" + "\xf7\x36\xe6\xa4\xf4\x7a\x10\x58" + "\xcc\x1f\x48\x49\x65\x47\x75\xe9" + "\x28\xe1\x65\x7b\xf2\xc4\xb5\x07" + "\xf2\xec\x76\xd8\x8f\x09\xf3\x16" + "\xa1\x51\x89\x3b\xeb\x96\x42\xac" + "\x65\xe0\x67\x63\x29\xdc\xb4\x7d" + "\xf2\x41\x51\x6a\xcb\xde\x3c\xfb" + "\x66\x8d\x13\xca\xe0\x59\x2a\x00" + "\xc9\x53\x4c\xe6\x9e\xe2\x73\xd5" + "\x67\x19\xb2\xbd\x9a\x63\xd7\x5c", + .ilen = 512, + .result = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" + "\x20\x21\x22\x23\x24\x25\x26\x27" + "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" + "\x30\x31\x32\x33\x34\x35\x36\x37" + "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" + "\x40\x41\x42\x43\x44\x45\x46\x47" + "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" + "\x50\x51\x52\x53\x54\x55\x56\x57" + "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" + "\x60\x61\x62\x63\x64\x65\x66\x67" + "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" + "\x70\x71\x72\x73\x74\x75\x76\x77" + "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" + "\x80\x81\x82\x83\x84\x85\x86\x87" + "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" + "\x90\x91\x92\x93\x94\x95\x96\x97" + "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" + "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7" + "\xa8\xa9\xaa\xab\xac\xad\xae\xaf" + "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7" + "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" + "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7" + "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" + "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7" + "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" + "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7" + "\xe8\xe9\xea\xeb\xec\xed\xee\xef" + "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7" + "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff" + "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" + "\x20\x21\x22\x23\x24\x25\x26\x27" + "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" + "\x30\x31\x32\x33\x34\x35\x36\x37" + "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" + "\x40\x41\x42\x43\x44\x45\x46\x47" + "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" + "\x50\x51\x52\x53\x54\x55\x56\x57" + "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" + "\x60\x61\x62\x63\x64\x65\x66\x67" + "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" + "\x70\x71\x72\x73\x74\x75\x76\x77" + "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" + "\x80\x81\x82\x83\x84\x85\x86\x87" + "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" + "\x90\x91\x92\x93\x94\x95\x96\x97" + "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" + "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7" + "\xa8\xa9\xaa\xab\xac\xad\xae\xaf" + "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7" + "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" + "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7" + "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" + "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7" + "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" + "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7" + "\xe8\xe9\xea\xeb\xec\xed\xee\xef" + "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7" + "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff", + .rlen = 512, + .also_non_np = 1, + .np = 3, + .tap = { 512 - 20, 4, 16 }, + } +}; + +static struct cipher_testvec speck64_enc_tv_template[] = { + { /* Speck64/96 */ + .key = "\x00\x01\x02\x03\x08\x09\x0a\x0b" + "\x10\x11\x12\x13", + .klen = 12, + .input = "\x65\x61\x6e\x73\x20\x46\x61\x74", + .ilen = 8, + .result = "\x6c\x94\x75\x41\xec\x52\x79\x9f", + .rlen = 8, + }, { /* Speck64/128 */ + .key = "\x00\x01\x02\x03\x08\x09\x0a\x0b" + "\x10\x11\x12\x13\x18\x19\x1a\x1b", + .klen = 16, + .input = "\x2d\x43\x75\x74\x74\x65\x72\x3b", + .ilen = 8, + .result = "\x8b\x02\x4e\x45\x48\xa5\x6f\x8c", + .rlen = 8, + }, +}; + +static struct cipher_testvec speck64_dec_tv_template[] = { + { /* Speck64/96 */ + .key = "\x00\x01\x02\x03\x08\x09\x0a\x0b" + "\x10\x11\x12\x13", + .klen = 12, + .input = "\x6c\x94\x75\x41\xec\x52\x79\x9f", + .ilen = 8, + .result = "\x65\x61\x6e\x73\x20\x46\x61\x74", + .rlen = 8, + }, { /* Speck64/128 */ + .key = "\x00\x01\x02\x03\x08\x09\x0a\x0b" + "\x10\x11\x12\x13\x18\x19\x1a\x1b", + .klen = 16, + .input = "\x8b\x02\x4e\x45\x48\xa5\x6f\x8c", + .ilen = 8, + .result = "\x2d\x43\x75\x74\x74\x65\x72\x3b", + .rlen = 8, + }, +}; + +/* + * Speck64-XTS test vectors, taken from the AES-XTS test vectors with the result + * recomputed with Speck64 as the cipher, and key lengths adjusted + */ + +static struct cipher_testvec speck64_xts_enc_tv_template[] = { + { + .key = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .klen = 24, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .ilen = 32, + .result = "\x84\xaf\x54\x07\x19\xd4\x7c\xa6" + "\xe4\xfe\xdf\xc4\x1f\x34\xc3\xc2" + "\x80\xf5\x72\xe7\xcd\xf0\x99\x22" + "\x35\xa7\x2f\x06\xef\xdc\x51\xaa", + .rlen = 32, + }, { + .key = "\x11\x11\x11\x11\x11\x11\x11\x11" + "\x11\x11\x11\x11\x11\x11\x11\x11" + "\x22\x22\x22\x22\x22\x22\x22\x22", + .klen = 24, + .iv = "\x33\x33\x33\x33\x33\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44", + .ilen = 32, + .result = "\x12\x56\x73\xcd\x15\x87\xa8\x59" + "\xcf\x84\xae\xd9\x1c\x66\xd6\x9f" + "\xb3\x12\x69\x7e\x36\xeb\x52\xff" + "\x62\xdd\xba\x90\xb3\xe1\xee\x99", + .rlen = 32, + }, { + .key = "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8" + "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0" + "\x22\x22\x22\x22\x22\x22\x22\x22", + .klen = 24, + .iv = "\x33\x33\x33\x33\x33\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44", + .ilen = 32, + .result = "\x15\x1b\xe4\x2c\xa2\x5a\x2d\x2c" + "\x27\x36\xc0\xbf\x5d\xea\x36\x37" + "\x2d\x1a\x88\xbc\x66\xb5\xd0\x0b" + "\xa1\xbc\x19\xb2\x0f\x3b\x75\x34", + .rlen = 32, + }, { + .key = "\x27\x18\x28\x18\x28\x45\x90\x45" + "\x23\x53\x60\x28\x74\x71\x35\x26" + "\x31\x41\x59\x26\x53\x58\x97\x93", + .klen = 24, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" + "\x20\x21\x22\x23\x24\x25\x26\x27" + "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" + "\x30\x31\x32\x33\x34\x35\x36\x37" + "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" + "\x40\x41\x42\x43\x44\x45\x46\x47" + "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" + "\x50\x51\x52\x53\x54\x55\x56\x57" + "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" + "\x60\x61\x62\x63\x64\x65\x66\x67" + "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" + "\x70\x71\x72\x73\x74\x75\x76\x77" + "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" + "\x80\x81\x82\x83\x84\x85\x86\x87" + "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" + "\x90\x91\x92\x93\x94\x95\x96\x97" + "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" + "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7" + "\xa8\xa9\xaa\xab\xac\xad\xae\xaf" + "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7" + "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" + "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7" + "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" + "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7" + "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" + "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7" + "\xe8\xe9\xea\xeb\xec\xed\xee\xef" + "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7" + "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff" + "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" + "\x20\x21\x22\x23\x24\x25\x26\x27" + "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" + "\x30\x31\x32\x33\x34\x35\x36\x37" + "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" + "\x40\x41\x42\x43\x44\x45\x46\x47" + "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" + "\x50\x51\x52\x53\x54\x55\x56\x57" + "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" + "\x60\x61\x62\x63\x64\x65\x66\x67" + "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" + "\x70\x71\x72\x73\x74\x75\x76\x77" + "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" + "\x80\x81\x82\x83\x84\x85\x86\x87" + "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" + "\x90\x91\x92\x93\x94\x95\x96\x97" + "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" + "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7" + "\xa8\xa9\xaa\xab\xac\xad\xae\xaf" + "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7" + "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" + "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7" + "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" + "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7" + "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" + "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7" + "\xe8\xe9\xea\xeb\xec\xed\xee\xef" + "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7" + "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff", + .ilen = 512, + .result = "\xaf\xa1\x81\xa6\x32\xbb\x15\x8e" + "\xf8\x95\x2e\xd3\xe6\xee\x7e\x09" + "\x0c\x1a\xf5\x02\x97\x8b\xe3\xb3" + "\x11\xc7\x39\x96\xd0\x95\xf4\x56" + "\xf4\xdd\x03\x38\x01\x44\x2c\xcf" + "\x88\xae\x8e\x3c\xcd\xe7\xaa\x66" + "\xfe\x3d\xc6\xfb\x01\x23\x51\x43" + "\xd5\xd2\x13\x86\x94\x34\xe9\x62" + "\xf9\x89\xe3\xd1\x7b\xbe\xf8\xef" + "\x76\x35\x04\x3f\xdb\x23\x9d\x0b" + "\x85\x42\xb9\x02\xd6\xcc\xdb\x96" + "\xa7\x6b\x27\xb6\xd4\x45\x8f\x7d" + "\xae\xd2\x04\xd5\xda\xc1\x7e\x24" + "\x8c\x73\xbe\x48\x7e\xcf\x65\x28" + "\x29\xe5\xbe\x54\x30\xcb\x46\x95" + "\x4f\x2e\x8a\x36\xc8\x27\xc5\xbe" + "\xd0\x1a\xaf\xab\x26\xcd\x9e\x69" + "\xa1\x09\x95\x71\x26\xe9\xc4\xdf" + "\xe6\x31\xc3\x46\xda\xaf\x0b\x41" + "\x1f\xab\xb1\x8e\xd6\xfc\x0b\xb3" + "\x82\xc0\x37\x27\xfc\x91\xa7\x05" + "\xfb\xc5\xdc\x2b\x74\x96\x48\x43" + "\x5d\x9c\x19\x0f\x60\x63\x3a\x1f" + "\x6f\xf0\x03\xbe\x4d\xfd\xc8\x4a" + "\xc6\xa4\x81\x6d\xc3\x12\x2a\x5c" + "\x07\xff\xf3\x72\x74\x48\xb5\x40" + "\x50\xb5\xdd\x90\x43\x31\x18\x15" + "\x7b\xf2\xa6\xdb\x83\xc8\x4b\x4a" + "\x29\x93\x90\x8b\xda\x07\xf0\x35" + "\x6d\x90\x88\x09\x4e\x83\xf5\x5b" + "\x94\x12\xbb\x33\x27\x1d\x3f\x23" + "\x51\xa8\x7c\x07\xa2\xae\x77\xa6" + "\x50\xfd\xcc\xc0\x4f\x80\x7a\x9f" + "\x66\xdd\xcd\x75\x24\x8b\x33\xf7" + "\x20\xdb\x83\x9b\x4f\x11\x63\x6e" + "\xcf\x37\xef\xc9\x11\x01\x5c\x45" + "\x32\x99\x7c\x3c\x9e\x42\x89\xe3" + "\x70\x6d\x15\x9f\xb1\xe6\xb6\x05" + "\xfe\x0c\xb9\x49\x2d\x90\x6d\xcc" + "\x5d\x3f\xc1\xfe\x89\x0a\x2e\x2d" + "\xa0\xa8\x89\x3b\x73\x39\xa5\x94" + "\x4c\xa4\xa6\xbb\xa7\x14\x46\x89" + "\x10\xff\xaf\xef\xca\xdd\x4f\x80" + "\xb3\xdf\x3b\xab\xd4\xe5\x5a\xc7" + "\x33\xca\x00\x8b\x8b\x3f\xea\xec" + "\x68\x8a\xc2\x6d\xfd\xd4\x67\x0f" + "\x22\x31\xe1\x0e\xfe\x5a\x04\xd5" + "\x64\xa3\xf1\x1a\x76\x28\xcc\x35" + "\x36\xa7\x0a\x74\xf7\x1c\x44\x9b" + "\xc7\x1b\x53\x17\x02\xea\xd1\xad" + "\x13\x51\x73\xc0\xa0\xb2\x05\x32" + "\xa8\xa2\x37\x2e\xe1\x7a\x3a\x19" + "\x26\xb4\x6c\x62\x5d\xb3\x1a\x1d" + "\x59\xda\xee\x1a\x22\x18\xda\x0d" + "\x88\x0f\x55\x8b\x72\x62\xfd\xc1" + "\x69\x13\xcd\x0d\x5f\xc1\x09\x52" + "\xee\xd6\xe3\x84\x4d\xee\xf6\x88" + "\xaf\x83\xdc\x76\xf4\xc0\x93\x3f" + "\x4a\x75\x2f\xb0\x0b\x3e\xc4\x54" + "\x7d\x69\x8d\x00\x62\x77\x0d\x14" + "\xbe\x7c\xa6\x7d\xc5\x24\x4f\xf3" + "\x50\xf7\x5f\xf4\xc2\xca\x41\x97" + "\x37\xbe\x75\x74\xcd\xf0\x75\x6e" + "\x25\x23\x94\xbd\xda\x8d\xb0\xd4", + .rlen = 512, + }, { + .key = "\x27\x18\x28\x18\x28\x45\x90\x45" + "\x23\x53\x60\x28\x74\x71\x35\x26" + "\x62\x49\x77\x57\x24\x70\x93\x69" + "\x99\x59\x57\x49\x66\x96\x76\x27", + .klen = 32, + .iv = "\xff\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" + "\x20\x21\x22\x23\x24\x25\x26\x27" + "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" + "\x30\x31\x32\x33\x34\x35\x36\x37" + "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" + "\x40\x41\x42\x43\x44\x45\x46\x47" + "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" + "\x50\x51\x52\x53\x54\x55\x56\x57" + "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" + "\x60\x61\x62\x63\x64\x65\x66\x67" + "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" + "\x70\x71\x72\x73\x74\x75\x76\x77" + "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" + "\x80\x81\x82\x83\x84\x85\x86\x87" + "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" + "\x90\x91\x92\x93\x94\x95\x96\x97" + "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" + "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7" + "\xa8\xa9\xaa\xab\xac\xad\xae\xaf" + "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7" + "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" + "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7" + "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" + "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7" + "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" + "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7" + "\xe8\xe9\xea\xeb\xec\xed\xee\xef" + "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7" + "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff" + "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" + "\x20\x21\x22\x23\x24\x25\x26\x27" + "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" + "\x30\x31\x32\x33\x34\x35\x36\x37" + "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" + "\x40\x41\x42\x43\x44\x45\x46\x47" + "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" + "\x50\x51\x52\x53\x54\x55\x56\x57" + "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" + "\x60\x61\x62\x63\x64\x65\x66\x67" + "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" + "\x70\x71\x72\x73\x74\x75\x76\x77" + "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" + "\x80\x81\x82\x83\x84\x85\x86\x87" + "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" + "\x90\x91\x92\x93\x94\x95\x96\x97" + "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" + "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7" + "\xa8\xa9\xaa\xab\xac\xad\xae\xaf" + "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7" + "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" + "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7" + "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" + "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7" + "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" + "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7" + "\xe8\xe9\xea\xeb\xec\xed\xee\xef" + "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7" + "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff", + .ilen = 512, + .result = "\x55\xed\x71\xd3\x02\x8e\x15\x3b" + "\xc6\x71\x29\x2d\x3e\x89\x9f\x59" + "\x68\x6a\xcc\x8a\x56\x97\xf3\x95" + "\x4e\x51\x08\xda\x2a\xf8\x6f\x3c" + "\x78\x16\xea\x80\xdb\x33\x75\x94" + "\xf9\x29\xc4\x2b\x76\x75\x97\xc7" + "\xf2\x98\x2c\xf9\xff\xc8\xd5\x2b" + "\x18\xf1\xaf\xcf\x7c\xc5\x0b\xee" + "\xad\x3c\x76\x7c\xe6\x27\xa2\x2a" + "\xe4\x66\xe1\xab\xa2\x39\xfc\x7c" + "\xf5\xec\x32\x74\xa3\xb8\x03\x88" + "\x52\xfc\x2e\x56\x3f\xa1\xf0\x9f" + "\x84\x5e\x46\xed\x20\x89\xb6\x44" + "\x8d\xd0\xed\x54\x47\x16\xbe\x95" + "\x8a\xb3\x6b\x72\xc4\x32\x52\x13" + "\x1b\xb0\x82\xbe\xac\xf9\x70\xa6" + "\x44\x18\xdd\x8c\x6e\xca\x6e\x45" + "\x8f\x1e\x10\x07\x57\x25\x98\x7b" + "\x17\x8c\x78\xdd\x80\xa7\xd9\xd8" + "\x63\xaf\xb9\x67\x57\xfd\xbc\xdb" + "\x44\xe9\xc5\x65\xd1\xc7\x3b\xff" + "\x20\xa0\x80\x1a\xc3\x9a\xad\x5e" + "\x5d\x3b\xd3\x07\xd9\xf5\xfd\x3d" + "\x4a\x8b\xa8\xd2\x6e\x7a\x51\x65" + "\x6c\x8e\x95\xe0\x45\xc9\x5f\x4a" + "\x09\x3c\x3d\x71\x7f\x0c\x84\x2a" + "\xc8\x48\x52\x1a\xc2\xd5\xd6\x78" + "\x92\x1e\xa0\x90\x2e\xea\xf0\xf3" + "\xdc\x0f\xb1\xaf\x0d\x9b\x06\x2e" + "\x35\x10\x30\x82\x0d\xe7\xc5\x9b" + "\xde\x44\x18\xbd\x9f\xd1\x45\xa9" + "\x7b\x7a\x4a\xad\x35\x65\x27\xca" + "\xb2\xc3\xd4\x9b\x71\x86\x70\xee" + "\xf1\x89\x3b\x85\x4b\x5b\xaa\xaf" + "\xfc\x42\xc8\x31\x59\xbe\x16\x60" + "\x4f\xf9\xfa\x12\xea\xd0\xa7\x14" + "\xf0\x7a\xf3\xd5\x8d\xbd\x81\xef" + "\x52\x7f\x29\x51\x94\x20\x67\x3c" + "\xd1\xaf\x77\x9f\x22\x5a\x4e\x63" + "\xe7\xff\x73\x25\xd1\xdd\x96\x8a" + "\x98\x52\x6d\xf3\xac\x3e\xf2\x18" + "\x6d\xf6\x0a\x29\xa6\x34\x3d\xed" + "\xe3\x27\x0d\x9d\x0a\x02\x44\x7e" + "\x5a\x7e\x67\x0f\x0a\x9e\xd6\xad" + "\x91\xe6\x4d\x81\x8c\x5c\x59\xaa" + "\xfb\xeb\x56\x53\xd2\x7d\x4c\x81" + "\x65\x53\x0f\x41\x11\xbd\x98\x99" + "\xf9\xc6\xfa\x51\x2e\xa3\xdd\x8d" + "\x84\x98\xf9\x34\xed\x33\x2a\x1f" + "\x82\xed\xc1\x73\x98\xd3\x02\xdc" + "\xe6\xc2\x33\x1d\xa2\xb4\xca\x76" + "\x63\x51\x34\x9d\x96\x12\xae\xce" + "\x83\xc9\x76\x5e\xa4\x1b\x53\x37" + "\x17\xd5\xc0\x80\x1d\x62\xf8\x3d" + "\x54\x27\x74\xbb\x10\x86\x57\x46" + "\x68\xe1\xed\x14\xe7\x9d\xfc\x84" + "\x47\xbc\xc2\xf8\x19\x4b\x99\xcf" + "\x7a\xe9\xc4\xb8\x8c\x82\x72\x4d" + "\x7b\x4f\x38\x55\x36\x71\x64\xc1" + "\xfc\x5c\x75\x52\x33\x02\x18\xf8" + "\x17\xe1\x2b\xc2\x43\x39\xbd\x76" + "\x9b\x63\x76\x32\x2f\x19\x72\x10" + "\x9f\x21\x0c\xf1\x66\x50\x7f\xa5" + "\x0d\x1f\x46\xe0\xba\xd3\x2f\x3c", + .rlen = 512, + .also_non_np = 1, + .np = 3, + .tap = { 512 - 20, 4, 16 }, + } +}; + +static struct cipher_testvec speck64_xts_dec_tv_template[] = { + { + .key = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .klen = 24, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x84\xaf\x54\x07\x19\xd4\x7c\xa6" + "\xe4\xfe\xdf\xc4\x1f\x34\xc3\xc2" + "\x80\xf5\x72\xe7\xcd\xf0\x99\x22" + "\x35\xa7\x2f\x06\xef\xdc\x51\xaa", + .ilen = 32, + .result = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .rlen = 32, + }, { + .key = "\x11\x11\x11\x11\x11\x11\x11\x11" + "\x11\x11\x11\x11\x11\x11\x11\x11" + "\x22\x22\x22\x22\x22\x22\x22\x22", + .klen = 24, + .iv = "\x33\x33\x33\x33\x33\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x12\x56\x73\xcd\x15\x87\xa8\x59" + "\xcf\x84\xae\xd9\x1c\x66\xd6\x9f" + "\xb3\x12\x69\x7e\x36\xeb\x52\xff" + "\x62\xdd\xba\x90\xb3\xe1\xee\x99", + .ilen = 32, + .result = "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44", + .rlen = 32, + }, { + .key = "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8" + "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0" + "\x22\x22\x22\x22\x22\x22\x22\x22", + .klen = 24, + .iv = "\x33\x33\x33\x33\x33\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x15\x1b\xe4\x2c\xa2\x5a\x2d\x2c" + "\x27\x36\xc0\xbf\x5d\xea\x36\x37" + "\x2d\x1a\x88\xbc\x66\xb5\xd0\x0b" + "\xa1\xbc\x19\xb2\x0f\x3b\x75\x34", + .ilen = 32, + .result = "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44" + "\x44\x44\x44\x44\x44\x44\x44\x44", + .rlen = 32, + }, { + .key = "\x27\x18\x28\x18\x28\x45\x90\x45" + "\x23\x53\x60\x28\x74\x71\x35\x26" + "\x31\x41\x59\x26\x53\x58\x97\x93", + .klen = 24, + .iv = "\x00\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\xaf\xa1\x81\xa6\x32\xbb\x15\x8e" + "\xf8\x95\x2e\xd3\xe6\xee\x7e\x09" + "\x0c\x1a\xf5\x02\x97\x8b\xe3\xb3" + "\x11\xc7\x39\x96\xd0\x95\xf4\x56" + "\xf4\xdd\x03\x38\x01\x44\x2c\xcf" + "\x88\xae\x8e\x3c\xcd\xe7\xaa\x66" + "\xfe\x3d\xc6\xfb\x01\x23\x51\x43" + "\xd5\xd2\x13\x86\x94\x34\xe9\x62" + "\xf9\x89\xe3\xd1\x7b\xbe\xf8\xef" + "\x76\x35\x04\x3f\xdb\x23\x9d\x0b" + "\x85\x42\xb9\x02\xd6\xcc\xdb\x96" + "\xa7\x6b\x27\xb6\xd4\x45\x8f\x7d" + "\xae\xd2\x04\xd5\xda\xc1\x7e\x24" + "\x8c\x73\xbe\x48\x7e\xcf\x65\x28" + "\x29\xe5\xbe\x54\x30\xcb\x46\x95" + "\x4f\x2e\x8a\x36\xc8\x27\xc5\xbe" + "\xd0\x1a\xaf\xab\x26\xcd\x9e\x69" + "\xa1\x09\x95\x71\x26\xe9\xc4\xdf" + "\xe6\x31\xc3\x46\xda\xaf\x0b\x41" + "\x1f\xab\xb1\x8e\xd6\xfc\x0b\xb3" + "\x82\xc0\x37\x27\xfc\x91\xa7\x05" + "\xfb\xc5\xdc\x2b\x74\x96\x48\x43" + "\x5d\x9c\x19\x0f\x60\x63\x3a\x1f" + "\x6f\xf0\x03\xbe\x4d\xfd\xc8\x4a" + "\xc6\xa4\x81\x6d\xc3\x12\x2a\x5c" + "\x07\xff\xf3\x72\x74\x48\xb5\x40" + "\x50\xb5\xdd\x90\x43\x31\x18\x15" + "\x7b\xf2\xa6\xdb\x83\xc8\x4b\x4a" + "\x29\x93\x90\x8b\xda\x07\xf0\x35" + "\x6d\x90\x88\x09\x4e\x83\xf5\x5b" + "\x94\x12\xbb\x33\x27\x1d\x3f\x23" + "\x51\xa8\x7c\x07\xa2\xae\x77\xa6" + "\x50\xfd\xcc\xc0\x4f\x80\x7a\x9f" + "\x66\xdd\xcd\x75\x24\x8b\x33\xf7" + "\x20\xdb\x83\x9b\x4f\x11\x63\x6e" + "\xcf\x37\xef\xc9\x11\x01\x5c\x45" + "\x32\x99\x7c\x3c\x9e\x42\x89\xe3" + "\x70\x6d\x15\x9f\xb1\xe6\xb6\x05" + "\xfe\x0c\xb9\x49\x2d\x90\x6d\xcc" + "\x5d\x3f\xc1\xfe\x89\x0a\x2e\x2d" + "\xa0\xa8\x89\x3b\x73\x39\xa5\x94" + "\x4c\xa4\xa6\xbb\xa7\x14\x46\x89" + "\x10\xff\xaf\xef\xca\xdd\x4f\x80" + "\xb3\xdf\x3b\xab\xd4\xe5\x5a\xc7" + "\x33\xca\x00\x8b\x8b\x3f\xea\xec" + "\x68\x8a\xc2\x6d\xfd\xd4\x67\x0f" + "\x22\x31\xe1\x0e\xfe\x5a\x04\xd5" + "\x64\xa3\xf1\x1a\x76\x28\xcc\x35" + "\x36\xa7\x0a\x74\xf7\x1c\x44\x9b" + "\xc7\x1b\x53\x17\x02\xea\xd1\xad" + "\x13\x51\x73\xc0\xa0\xb2\x05\x32" + "\xa8\xa2\x37\x2e\xe1\x7a\x3a\x19" + "\x26\xb4\x6c\x62\x5d\xb3\x1a\x1d" + "\x59\xda\xee\x1a\x22\x18\xda\x0d" + "\x88\x0f\x55\x8b\x72\x62\xfd\xc1" + "\x69\x13\xcd\x0d\x5f\xc1\x09\x52" + "\xee\xd6\xe3\x84\x4d\xee\xf6\x88" + "\xaf\x83\xdc\x76\xf4\xc0\x93\x3f" + "\x4a\x75\x2f\xb0\x0b\x3e\xc4\x54" + "\x7d\x69\x8d\x00\x62\x77\x0d\x14" + "\xbe\x7c\xa6\x7d\xc5\x24\x4f\xf3" + "\x50\xf7\x5f\xf4\xc2\xca\x41\x97" + "\x37\xbe\x75\x74\xcd\xf0\x75\x6e" + "\x25\x23\x94\xbd\xda\x8d\xb0\xd4", + .ilen = 512, + .result = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" + "\x20\x21\x22\x23\x24\x25\x26\x27" + "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" + "\x30\x31\x32\x33\x34\x35\x36\x37" + "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" + "\x40\x41\x42\x43\x44\x45\x46\x47" + "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" + "\x50\x51\x52\x53\x54\x55\x56\x57" + "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" + "\x60\x61\x62\x63\x64\x65\x66\x67" + "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" + "\x70\x71\x72\x73\x74\x75\x76\x77" + "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" + "\x80\x81\x82\x83\x84\x85\x86\x87" + "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" + "\x90\x91\x92\x93\x94\x95\x96\x97" + "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" + "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7" + "\xa8\xa9\xaa\xab\xac\xad\xae\xaf" + "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7" + "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" + "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7" + "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" + "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7" + "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" + "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7" + "\xe8\xe9\xea\xeb\xec\xed\xee\xef" + "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7" + "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff" + "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" + "\x20\x21\x22\x23\x24\x25\x26\x27" + "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" + "\x30\x31\x32\x33\x34\x35\x36\x37" + "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" + "\x40\x41\x42\x43\x44\x45\x46\x47" + "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" + "\x50\x51\x52\x53\x54\x55\x56\x57" + "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" + "\x60\x61\x62\x63\x64\x65\x66\x67" + "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" + "\x70\x71\x72\x73\x74\x75\x76\x77" + "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" + "\x80\x81\x82\x83\x84\x85\x86\x87" + "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" + "\x90\x91\x92\x93\x94\x95\x96\x97" + "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" + "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7" + "\xa8\xa9\xaa\xab\xac\xad\xae\xaf" + "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7" + "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" + "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7" + "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" + "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7" + "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" + "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7" + "\xe8\xe9\xea\xeb\xec\xed\xee\xef" + "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7" + "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff", + .rlen = 512, + }, { + .key = "\x27\x18\x28\x18\x28\x45\x90\x45" + "\x23\x53\x60\x28\x74\x71\x35\x26" + "\x62\x49\x77\x57\x24\x70\x93\x69" + "\x99\x59\x57\x49\x66\x96\x76\x27", + .klen = 32, + .iv = "\xff\x00\x00\x00\x00\x00\x00\x00" + "\x00\x00\x00\x00\x00\x00\x00\x00", + .input = "\x55\xed\x71\xd3\x02\x8e\x15\x3b" + "\xc6\x71\x29\x2d\x3e\x89\x9f\x59" + "\x68\x6a\xcc\x8a\x56\x97\xf3\x95" + "\x4e\x51\x08\xda\x2a\xf8\x6f\x3c" + "\x78\x16\xea\x80\xdb\x33\x75\x94" + "\xf9\x29\xc4\x2b\x76\x75\x97\xc7" + "\xf2\x98\x2c\xf9\xff\xc8\xd5\x2b" + "\x18\xf1\xaf\xcf\x7c\xc5\x0b\xee" + "\xad\x3c\x76\x7c\xe6\x27\xa2\x2a" + "\xe4\x66\xe1\xab\xa2\x39\xfc\x7c" + "\xf5\xec\x32\x74\xa3\xb8\x03\x88" + "\x52\xfc\x2e\x56\x3f\xa1\xf0\x9f" + "\x84\x5e\x46\xed\x20\x89\xb6\x44" + "\x8d\xd0\xed\x54\x47\x16\xbe\x95" + "\x8a\xb3\x6b\x72\xc4\x32\x52\x13" + "\x1b\xb0\x82\xbe\xac\xf9\x70\xa6" + "\x44\x18\xdd\x8c\x6e\xca\x6e\x45" + "\x8f\x1e\x10\x07\x57\x25\x98\x7b" + "\x17\x8c\x78\xdd\x80\xa7\xd9\xd8" + "\x63\xaf\xb9\x67\x57\xfd\xbc\xdb" + "\x44\xe9\xc5\x65\xd1\xc7\x3b\xff" + "\x20\xa0\x80\x1a\xc3\x9a\xad\x5e" + "\x5d\x3b\xd3\x07\xd9\xf5\xfd\x3d" + "\x4a\x8b\xa8\xd2\x6e\x7a\x51\x65" + "\x6c\x8e\x95\xe0\x45\xc9\x5f\x4a" + "\x09\x3c\x3d\x71\x7f\x0c\x84\x2a" + "\xc8\x48\x52\x1a\xc2\xd5\xd6\x78" + "\x92\x1e\xa0\x90\x2e\xea\xf0\xf3" + "\xdc\x0f\xb1\xaf\x0d\x9b\x06\x2e" + "\x35\x10\x30\x82\x0d\xe7\xc5\x9b" + "\xde\x44\x18\xbd\x9f\xd1\x45\xa9" + "\x7b\x7a\x4a\xad\x35\x65\x27\xca" + "\xb2\xc3\xd4\x9b\x71\x86\x70\xee" + "\xf1\x89\x3b\x85\x4b\x5b\xaa\xaf" + "\xfc\x42\xc8\x31\x59\xbe\x16\x60" + "\x4f\xf9\xfa\x12\xea\xd0\xa7\x14" + "\xf0\x7a\xf3\xd5\x8d\xbd\x81\xef" + "\x52\x7f\x29\x51\x94\x20\x67\x3c" + "\xd1\xaf\x77\x9f\x22\x5a\x4e\x63" + "\xe7\xff\x73\x25\xd1\xdd\x96\x8a" + "\x98\x52\x6d\xf3\xac\x3e\xf2\x18" + "\x6d\xf6\x0a\x29\xa6\x34\x3d\xed" + "\xe3\x27\x0d\x9d\x0a\x02\x44\x7e" + "\x5a\x7e\x67\x0f\x0a\x9e\xd6\xad" + "\x91\xe6\x4d\x81\x8c\x5c\x59\xaa" + "\xfb\xeb\x56\x53\xd2\x7d\x4c\x81" + "\x65\x53\x0f\x41\x11\xbd\x98\x99" + "\xf9\xc6\xfa\x51\x2e\xa3\xdd\x8d" + "\x84\x98\xf9\x34\xed\x33\x2a\x1f" + "\x82\xed\xc1\x73\x98\xd3\x02\xdc" + "\xe6\xc2\x33\x1d\xa2\xb4\xca\x76" + "\x63\x51\x34\x9d\x96\x12\xae\xce" + "\x83\xc9\x76\x5e\xa4\x1b\x53\x37" + "\x17\xd5\xc0\x80\x1d\x62\xf8\x3d" + "\x54\x27\x74\xbb\x10\x86\x57\x46" + "\x68\xe1\xed\x14\xe7\x9d\xfc\x84" + "\x47\xbc\xc2\xf8\x19\x4b\x99\xcf" + "\x7a\xe9\xc4\xb8\x8c\x82\x72\x4d" + "\x7b\x4f\x38\x55\x36\x71\x64\xc1" + "\xfc\x5c\x75\x52\x33\x02\x18\xf8" + "\x17\xe1\x2b\xc2\x43\x39\xbd\x76" + "\x9b\x63\x76\x32\x2f\x19\x72\x10" + "\x9f\x21\x0c\xf1\x66\x50\x7f\xa5" + "\x0d\x1f\x46\xe0\xba\xd3\x2f\x3c", + .ilen = 512, + .result = "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" + "\x20\x21\x22\x23\x24\x25\x26\x27" + "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" + "\x30\x31\x32\x33\x34\x35\x36\x37" + "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" + "\x40\x41\x42\x43\x44\x45\x46\x47" + "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" + "\x50\x51\x52\x53\x54\x55\x56\x57" + "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" + "\x60\x61\x62\x63\x64\x65\x66\x67" + "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" + "\x70\x71\x72\x73\x74\x75\x76\x77" + "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" + "\x80\x81\x82\x83\x84\x85\x86\x87" + "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" + "\x90\x91\x92\x93\x94\x95\x96\x97" + "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" + "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7" + "\xa8\xa9\xaa\xab\xac\xad\xae\xaf" + "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7" + "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" + "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7" + "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" + "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7" + "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" + "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7" + "\xe8\xe9\xea\xeb\xec\xed\xee\xef" + "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7" + "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff" + "\x00\x01\x02\x03\x04\x05\x06\x07" + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" + "\x10\x11\x12\x13\x14\x15\x16\x17" + "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" + "\x20\x21\x22\x23\x24\x25\x26\x27" + "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" + "\x30\x31\x32\x33\x34\x35\x36\x37" + "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" + "\x40\x41\x42\x43\x44\x45\x46\x47" + "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" + "\x50\x51\x52\x53\x54\x55\x56\x57" + "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" + "\x60\x61\x62\x63\x64\x65\x66\x67" + "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" + "\x70\x71\x72\x73\x74\x75\x76\x77" + "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" + "\x80\x81\x82\x83\x84\x85\x86\x87" + "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" + "\x90\x91\x92\x93\x94\x95\x96\x97" + "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" + "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7" + "\xa8\xa9\xaa\xab\xac\xad\xae\xaf" + "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7" + "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" + "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7" + "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" + "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7" + "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" + "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7" + "\xe8\xe9\xea\xeb\xec\xed\xee\xef" + "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7" + "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff", + .rlen = 512, + .also_non_np = 1, + .np = 3, + .tap = { 512 - 20, 4, 16 }, + } +}; + /* Cast6 test vectors from RFC 2612 */ #define CAST6_ENC_TEST_VECTORS 4 #define CAST6_DEC_TEST_VECTORS 4 @@ -34545,4 +36031,78 @@ static struct comp_testvec lz4hc_decomp_tv_template[] = { }, }; +#define ZSTD_COMP_TEST_VECTORS 2 +#define ZSTD_DECOMP_TEST_VECTORS 2 + +static struct comp_testvec zstd_comp_tv_template[] = { + { + .inlen = 68, + .outlen = 39, + .input = "The algorithm is zstd. " + "The algorithm is zstd. " + "The algorithm is zstd.", + .output = "\x28\xb5\x2f\xfd\x00\x50\xf5\x00\x00\xb8\x54\x68\x65" + "\x20\x61\x6c\x67\x6f\x72\x69\x74\x68\x6d\x20\x69\x73" + "\x20\x7a\x73\x74\x64\x2e\x20\x01\x00\x55\x73\x36\x01" + , + }, + { + .inlen = 244, + .outlen = 151, + .input = "zstd, short for Zstandard, is a fast lossless " + "compression algorithm, targeting real-time " + "compression scenarios at zlib-level and better " + "compression ratios. The zstd compression library " + "provides in-memory compression and decompression " + "functions.", + .output = "\x28\xb5\x2f\xfd\x00\x50\x75\x04\x00\x42\x4b\x1e\x17" + "\x90\x81\x31\x00\xf2\x2f\xe4\x36\xc9\xef\x92\x88\x32" + "\xc9\xf2\x24\x94\xd8\x68\x9a\x0f\x00\x0c\xc4\x31\x6f" + "\x0d\x0c\x38\xac\x5c\x48\x03\xcd\x63\x67\xc0\xf3\xad" + "\x4e\x90\xaa\x78\xa0\xa4\xc5\x99\xda\x2f\xb6\x24\x60" + "\xe2\x79\x4b\xaa\xb6\x6b\x85\x0b\xc9\xc6\x04\x66\x86" + "\xe2\xcc\xe2\x25\x3f\x4f\x09\xcd\xb8\x9d\xdb\xc1\x90" + "\xa9\x11\xbc\x35\x44\x69\x2d\x9c\x64\x4f\x13\x31\x64" + "\xcc\xfb\x4d\x95\x93\x86\x7f\x33\x7f\x1a\xef\xe9\x30" + "\xf9\x67\xa1\x94\x0a\x69\x0f\x60\xcd\xc3\xab\x99\xdc" + "\x42\xed\x97\x05\x00\x33\xc3\x15\x95\x3a\x06\xa0\x0e" + "\x20\xa9\x0e\x82\xb9\x43\x45\x01", + }, +}; + +static struct comp_testvec zstd_decomp_tv_template[] = { + { + .inlen = 43, + .outlen = 68, + .input = "\x28\xb5\x2f\xfd\x04\x50\xf5\x00\x00\xb8\x54\x68\x65" + "\x20\x61\x6c\x67\x6f\x72\x69\x74\x68\x6d\x20\x69\x73" + "\x20\x7a\x73\x74\x64\x2e\x20\x01\x00\x55\x73\x36\x01" + "\x6b\xf4\x13\x35", + .output = "The algorithm is zstd. " + "The algorithm is zstd. " + "The algorithm is zstd.", + }, + { + .inlen = 155, + .outlen = 244, + .input = "\x28\xb5\x2f\xfd\x04\x50\x75\x04\x00\x42\x4b\x1e\x17" + "\x90\x81\x31\x00\xf2\x2f\xe4\x36\xc9\xef\x92\x88\x32" + "\xc9\xf2\x24\x94\xd8\x68\x9a\x0f\x00\x0c\xc4\x31\x6f" + "\x0d\x0c\x38\xac\x5c\x48\x03\xcd\x63\x67\xc0\xf3\xad" + "\x4e\x90\xaa\x78\xa0\xa4\xc5\x99\xda\x2f\xb6\x24\x60" + "\xe2\x79\x4b\xaa\xb6\x6b\x85\x0b\xc9\xc6\x04\x66\x86" + "\xe2\xcc\xe2\x25\x3f\x4f\x09\xcd\xb8\x9d\xdb\xc1\x90" + "\xa9\x11\xbc\x35\x44\x69\x2d\x9c\x64\x4f\x13\x31\x64" + "\xcc\xfb\x4d\x95\x93\x86\x7f\x33\x7f\x1a\xef\xe9\x30" + "\xf9\x67\xa1\x94\x0a\x69\x0f\x60\xcd\xc3\xab\x99\xdc" + "\x42\xed\x97\x05\x00\x33\xc3\x15\x95\x3a\x06\xa0\x0e" + "\x20\xa9\x0e\x82\xb9\x43\x45\x01\xaa\x6d\xda\x0d", + .output = "zstd, short for Zstandard, is a fast lossless " + "compression algorithm, targeting real-time " + "compression scenarios at zlib-level and better " + "compression ratios. The zstd compression library " + "provides in-memory compression and decompression " + "functions.", + }, +}; #endif /* _CRYPTO_TESTMGR_H */
diff --git a/crypto/zstd.c b/crypto/zstd.c new file mode 100644 index 0000000..9bfd28f8 --- /dev/null +++ b/crypto/zstd.c
@@ -0,0 +1,209 @@ +/* + * Cryptographic API. + * + * Copyright (c) 2017-present, Facebook, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + */ +#include <linux/crypto.h> +#include <linux/init.h> +#include <linux/interrupt.h> +#include <linux/mm.h> +#include <linux/module.h> +#include <linux/net.h> +#include <linux/vmalloc.h> +#include <linux/zstd.h> + + +#define ZSTD_DEF_LEVEL 3 + +struct zstd_ctx { + ZSTD_CCtx *cctx; + ZSTD_DCtx *dctx; + void *cwksp; + void *dwksp; +}; + +static ZSTD_parameters zstd_params(void) +{ + return ZSTD_getParams(ZSTD_DEF_LEVEL, 0, 0); +} + +static int zstd_comp_init(struct zstd_ctx *ctx) +{ + int ret = 0; + const ZSTD_parameters params = zstd_params(); + const size_t wksp_size = ZSTD_CCtxWorkspaceBound(params.cParams); + + ctx->cwksp = vzalloc(wksp_size); + if (!ctx->cwksp) { + ret = -ENOMEM; + goto out; + } + + ctx->cctx = ZSTD_initCCtx(ctx->cwksp, wksp_size); + if (!ctx->cctx) { + ret = -EINVAL; + goto out_free; + } +out: + return ret; +out_free: + vfree(ctx->cwksp); + goto out; +} + +static int zstd_decomp_init(struct zstd_ctx *ctx) +{ + int ret = 0; + const size_t wksp_size = ZSTD_DCtxWorkspaceBound(); + + ctx->dwksp = vzalloc(wksp_size); + if (!ctx->dwksp) { + ret = -ENOMEM; + goto out; + } + + ctx->dctx = ZSTD_initDCtx(ctx->dwksp, wksp_size); + if (!ctx->dctx) { + ret = -EINVAL; + goto out_free; + } +out: + return ret; +out_free: + vfree(ctx->dwksp); + goto out; +} + +static void zstd_comp_exit(struct zstd_ctx *ctx) +{ + vfree(ctx->cwksp); + ctx->cwksp = NULL; + ctx->cctx = NULL; +} + +static void zstd_decomp_exit(struct zstd_ctx *ctx) +{ + vfree(ctx->dwksp); + ctx->dwksp = NULL; + ctx->dctx = NULL; +} + +static int __zstd_init(void *ctx) +{ + int ret; + + ret = zstd_comp_init(ctx); + if (ret) + return ret; + ret = zstd_decomp_init(ctx); + if (ret) + zstd_comp_exit(ctx); + return ret; +} + +static int zstd_init(struct crypto_tfm *tfm) +{ + struct zstd_ctx *ctx = crypto_tfm_ctx(tfm); + + return __zstd_init(ctx); +} + +static void __zstd_exit(void *ctx) +{ + zstd_comp_exit(ctx); + zstd_decomp_exit(ctx); +} + +static void zstd_exit(struct crypto_tfm *tfm) +{ + struct zstd_ctx *ctx = crypto_tfm_ctx(tfm); + + __zstd_exit(ctx); +} + +static int __zstd_compress(const u8 *src, unsigned int slen, + u8 *dst, unsigned int *dlen, void *ctx) +{ + size_t out_len; + struct zstd_ctx *zctx = ctx; + const ZSTD_parameters params = zstd_params(); + + out_len = ZSTD_compressCCtx(zctx->cctx, dst, *dlen, src, slen, params); + if (ZSTD_isError(out_len)) + return -EINVAL; + *dlen = out_len; + return 0; +} + +static int zstd_compress(struct crypto_tfm *tfm, const u8 *src, + unsigned int slen, u8 *dst, unsigned int *dlen) +{ + struct zstd_ctx *ctx = crypto_tfm_ctx(tfm); + + return __zstd_compress(src, slen, dst, dlen, ctx); +} + +static int __zstd_decompress(const u8 *src, unsigned int slen, + u8 *dst, unsigned int *dlen, void *ctx) +{ + size_t out_len; + struct zstd_ctx *zctx = ctx; + + out_len = ZSTD_decompressDCtx(zctx->dctx, dst, *dlen, src, slen); + if (ZSTD_isError(out_len)) + return -EINVAL; + *dlen = out_len; + return 0; +} + +static int zstd_decompress(struct crypto_tfm *tfm, const u8 *src, + unsigned int slen, u8 *dst, unsigned int *dlen) +{ + struct zstd_ctx *ctx = crypto_tfm_ctx(tfm); + + return __zstd_decompress(src, slen, dst, dlen, ctx); +} + +static struct crypto_alg alg = { + .cra_name = "zstd", + .cra_flags = CRYPTO_ALG_TYPE_COMPRESS, + .cra_ctxsize = sizeof(struct zstd_ctx), + .cra_module = THIS_MODULE, + .cra_init = zstd_init, + .cra_exit = zstd_exit, + .cra_u = { .compress = { + .coa_compress = zstd_compress, + .coa_decompress = zstd_decompress } } +}; + +static int __init zstd_mod_init(void) +{ + int ret; + + ret = crypto_register_alg(&alg); + if (ret) + return ret; + + return ret; +} + +static void __exit zstd_mod_fini(void) +{ + crypto_unregister_alg(&alg); +} + +module_init(zstd_mod_init); +module_exit(zstd_mod_fini); + +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("Zstd Compression Algorithm"); +MODULE_ALIAS_CRYPTO("zstd");
diff --git a/drivers/Kconfig b/drivers/Kconfig index e1e2066..de581c1 100644 --- a/drivers/Kconfig +++ b/drivers/Kconfig
@@ -202,4 +202,6 @@ source "drivers/fpga/Kconfig" +source "drivers/tee/Kconfig" + endmenu
diff --git a/drivers/Makefile b/drivers/Makefile index 7c3d58dc..dc932be 100644 --- a/drivers/Makefile +++ b/drivers/Makefile
@@ -175,3 +175,4 @@ obj-$(CONFIG_ANDROID) += android/ obj-$(CONFIG_NVMEM) += nvmem/ obj-$(CONFIG_FPGA) += fpga/ +obj-$(CONFIG_TEE) += tee/
diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c index e19f530..f7c4301 100644 --- a/drivers/acpi/button.c +++ b/drivers/acpi/button.c
@@ -556,7 +556,8 @@ static int acpi_button_remove(struct acpi_device *device) return 0; } -static int param_set_lid_init_state(const char *val, struct kernel_param *kp) +static int param_set_lid_init_state(const char *val, + const struct kernel_param *kp) { int result = 0; @@ -574,7 +575,8 @@ static int param_set_lid_init_state(const char *val, struct kernel_param *kp) return result; } -static int param_get_lid_init_state(char *buffer, struct kernel_param *kp) +static int param_get_lid_init_state(char *buffer, + const struct kernel_param *kp) { switch (lid_init_state) { case ACPI_BUTTON_LID_INIT_OPEN:
diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c index 307b3e2..7b665aa 100644 --- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c
@@ -1891,7 +1891,8 @@ static const struct dev_pm_ops acpi_ec_pm = { SET_SYSTEM_SLEEP_PM_OPS(acpi_ec_suspend, acpi_ec_resume) }; -static int param_set_event_clearing(const char *val, struct kernel_param *kp) +static int param_set_event_clearing(const char *val, + const struct kernel_param *kp) { int result = 0; @@ -1909,7 +1910,8 @@ static int param_set_event_clearing(const char *val, struct kernel_param *kp) return result; } -static int param_get_event_clearing(char *buffer, struct kernel_param *kp) +static int param_get_event_clearing(char *buffer, + const struct kernel_param *kp) { switch (ec_event_clearing) { case ACPI_EC_EVT_TIMING_STATUS:
diff --git a/drivers/acpi/sysfs.c b/drivers/acpi/sysfs.c index cf05ae9..34328e1 100644 --- a/drivers/acpi/sysfs.c +++ b/drivers/acpi/sysfs.c
@@ -227,7 +227,8 @@ module_param_cb(trace_method_name, ¶m_ops_trace_method, &trace_method_name, module_param_cb(trace_debug_layer, ¶m_ops_trace_attrib, &acpi_gbl_trace_dbg_layer, 0644); module_param_cb(trace_debug_level, ¶m_ops_trace_attrib, &acpi_gbl_trace_dbg_level, 0644); -static int param_set_trace_state(const char *val, struct kernel_param *kp) +static int param_set_trace_state(const char *val, + const struct kernel_param *kp) { acpi_status status; const char *method = trace_method_name; @@ -263,7 +264,7 @@ static int param_set_trace_state(const char *val, struct kernel_param *kp) return 0; } -static int param_get_trace_state(char *buffer, struct kernel_param *kp) +static int param_get_trace_state(char *buffer, const struct kernel_param *kp) { if (!(acpi_gbl_trace_flags & ACPI_TRACE_ENABLED)) return sprintf(buffer, "disable"); @@ -292,7 +293,8 @@ MODULE_PARM_DESC(aml_debug_output, "To enable/disable the ACPI Debug Object output."); /* /sys/module/acpi/parameters/acpica_version */ -static int param_get_acpica_version(char *buffer, struct kernel_param *kp) +static int param_get_acpica_version(char *buffer, + const struct kernel_param *kp) { int result;
diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig index bdfc6c6..63ed9ce 100644 --- a/drivers/android/Kconfig +++ b/drivers/android/Kconfig
@@ -9,7 +9,7 @@ config ANDROID_BINDER_IPC bool "Android Binder IPC Driver" - depends on MMU + depends on MMU && !M68K default n ---help--- Binder is used in Android for both communication between processes, @@ -19,18 +19,27 @@ Android process, using Binder to identify, invoke and pass arguments between said processes. -config ANDROID_BINDER_IPC_32BIT - bool - depends on !64BIT && ANDROID_BINDER_IPC - default y +config ANDROID_BINDER_DEVICES + string "Android Binder devices" + depends on ANDROID_BINDER_IPC + default "binder,hwbinder,vndbinder" ---help--- - The Binder API has been changed to support both 32 and 64bit - applications in a mixed environment. + Default value for the binder.devices parameter. - Enable this to support an old 32-bit Android user-space (v4.4 and - earlier). + The binder.devices parameter is a comma-separated list of strings + that specifies the names of the binder device nodes that will be + created. Each binder device has its own context manager, and is + therefore logically separated from the other devices. - Note that enabling this will break newer Android user-space. +config ANDROID_BINDER_IPC_SELFTEST + bool "Android Binder IPC Driver Selftest" + depends on ANDROID_BINDER_IPC + ---help--- + This feature allows binder selftest to run. + + Binder selftest checks the allocation and free of binder buffers + exhaustively with combinations of various buffer sizes and + alignments. endif # if ANDROID
diff --git a/drivers/android/Makefile b/drivers/android/Makefile index 3b7e4b0..a01254c 100644 --- a/drivers/android/Makefile +++ b/drivers/android/Makefile
@@ -1,3 +1,4 @@ ccflags-y += -I$(src) # needed for trace events -obj-$(CONFIG_ANDROID_BINDER_IPC) += binder.o +obj-$(CONFIG_ANDROID_BINDER_IPC) += binder.o binder_alloc.o +obj-$(CONFIG_ANDROID_BINDER_IPC_SELFTEST) += binder_alloc_selftest.o
diff --git a/drivers/android/binder.c b/drivers/android/binder.c index 49199bd..21c8e3c 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c
@@ -15,6 +15,40 @@ * */ +/* + * Locking overview + * + * There are 3 main spinlocks which must be acquired in the + * order shown: + * + * 1) proc->outer_lock : protects binder_ref + * binder_proc_lock() and binder_proc_unlock() are + * used to acq/rel. + * 2) node->lock : protects most fields of binder_node. + * binder_node_lock() and binder_node_unlock() are + * used to acq/rel + * 3) proc->inner_lock : protects the thread and node lists + * (proc->threads, proc->waiting_threads, proc->nodes) + * and all todo lists associated with the binder_proc + * (proc->todo, thread->todo, proc->delivered_death and + * node->async_todo), as well as thread->transaction_stack + * binder_inner_proc_lock() and binder_inner_proc_unlock() + * are used to acq/rel + * + * Any lock under procA must never be nested under any lock at the same + * level or below on procB. + * + * Functions that require a lock held on entry indicate which lock + * in the suffix of the function name: + * + * foo_olocked() : requires node->outer_lock + * foo_nlocked() : requires node->lock + * foo_ilocked() : requires proc->inner_lock + * foo_oilocked(): requires proc->outer_lock and proc->inner_lock + * foo_nilocked(): requires node->lock and proc->inner_lock + * ... + */ + #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <asm/cacheflush.h> @@ -24,7 +58,6 @@ #include <linux/fs.h> #include <linux/list.h> #include <linux/miscdevice.h> -#include <linux/mm.h> #include <linux/module.h> #include <linux/mutex.h> #include <linux/nsproxy.h> @@ -34,31 +67,27 @@ #include <linux/sched.h> #include <linux/seq_file.h> #include <linux/uaccess.h> -#include <linux/vmalloc.h> -#include <linux/slab.h> #include <linux/pid_namespace.h> #include <linux/security.h> - -#ifdef CONFIG_ANDROID_BINDER_IPC_32BIT -#define BINDER_IPC_32BIT 1 -#endif +#include <linux/spinlock.h> #include <uapi/linux/android/binder.h> +#include "binder_alloc.h" #include "binder_trace.h" -static DEFINE_MUTEX(binder_main_lock); -static DEFINE_MUTEX(binder_deferred_lock); -static DEFINE_MUTEX(binder_mmap_lock); - -static HLIST_HEAD(binder_procs); static HLIST_HEAD(binder_deferred_list); +static DEFINE_MUTEX(binder_deferred_lock); + +static HLIST_HEAD(binder_devices); +static HLIST_HEAD(binder_procs); +static DEFINE_MUTEX(binder_procs_lock); + static HLIST_HEAD(binder_dead_nodes); +static DEFINE_SPINLOCK(binder_dead_nodes_lock); static struct dentry *binder_debugfs_dir_entry_root; static struct dentry *binder_debugfs_dir_entry_proc; -static struct binder_node *binder_context_mgr_node; -static kuid_t binder_context_mgr_uid = INVALID_UID; -static int binder_last_id; +static atomic_t binder_last_id; #define BINDER_DEBUG_ENTRY(name) \ static int binder_##name##_open(struct inode *inode, struct file *file) \ @@ -104,22 +133,21 @@ enum { BINDER_DEBUG_TRANSACTION_COMPLETE = 1U << 10, BINDER_DEBUG_FREE_BUFFER = 1U << 11, BINDER_DEBUG_INTERNAL_REFS = 1U << 12, - BINDER_DEBUG_BUFFER_ALLOC = 1U << 13, - BINDER_DEBUG_PRIORITY_CAP = 1U << 14, - BINDER_DEBUG_BUFFER_ALLOC_ASYNC = 1U << 15, + BINDER_DEBUG_PRIORITY_CAP = 1U << 13, + BINDER_DEBUG_SPINLOCKS = 1U << 14, }; static uint32_t binder_debug_mask = BINDER_DEBUG_USER_ERROR | BINDER_DEBUG_FAILED_TRANSACTION | BINDER_DEBUG_DEAD_TRANSACTION; -module_param_named(debug_mask, binder_debug_mask, uint, S_IWUSR | S_IRUGO); +module_param_named(debug_mask, binder_debug_mask, uint, 0644); -static bool binder_debug_no_lock; -module_param_named(proc_no_lock, binder_debug_no_lock, bool, S_IWUSR | S_IRUGO); +static char *binder_devices_param = CONFIG_ANDROID_BINDER_DEVICES; +module_param_named(devices, binder_devices_param, charp, S_IRUGO); static DECLARE_WAIT_QUEUE_HEAD(binder_user_error_wait); static int binder_stop_on_user_error; static int binder_set_stop_on_user_error(const char *val, - struct kernel_param *kp) + const struct kernel_param *kp) { int ret; @@ -129,7 +157,7 @@ static int binder_set_stop_on_user_error(const char *val, return ret; } module_param_call(stop_on_user_error, binder_set_stop_on_user_error, - param_get_int, &binder_stop_on_user_error, S_IWUSR | S_IRUGO); + param_get_int, &binder_stop_on_user_error, 0644); #define binder_debug(mask, x...) \ do { \ @@ -145,6 +173,17 @@ module_param_call(stop_on_user_error, binder_set_stop_on_user_error, binder_stop_on_user_error = 2; \ } while (0) +#define to_flat_binder_object(hdr) \ + container_of(hdr, struct flat_binder_object, hdr) + +#define to_binder_fd_object(hdr) container_of(hdr, struct binder_fd_object, hdr) + +#define to_binder_buffer_object(hdr) \ + container_of(hdr, struct binder_buffer_object, hdr) + +#define to_binder_fd_array_object(hdr) \ + container_of(hdr, struct binder_fd_array_object, hdr) + enum binder_stat_types { BINDER_STAT_PROC, BINDER_STAT_THREAD, @@ -157,26 +196,27 @@ enum binder_stat_types { }; struct binder_stats { - int br[_IOC_NR(BR_FAILED_REPLY) + 1]; - int bc[_IOC_NR(BC_DEAD_BINDER_DONE) + 1]; - int obj_created[BINDER_STAT_COUNT]; - int obj_deleted[BINDER_STAT_COUNT]; + atomic_t br[_IOC_NR(BR_FAILED_REPLY) + 1]; + atomic_t bc[_IOC_NR(BC_REPLY_SG) + 1]; + atomic_t obj_created[BINDER_STAT_COUNT]; + atomic_t obj_deleted[BINDER_STAT_COUNT]; }; static struct binder_stats binder_stats; static inline void binder_stats_deleted(enum binder_stat_types type) { - binder_stats.obj_deleted[type]++; + atomic_inc(&binder_stats.obj_deleted[type]); } static inline void binder_stats_created(enum binder_stat_types type) { - binder_stats.obj_created[type]++; + atomic_inc(&binder_stats.obj_created[type]); } struct binder_transaction_log_entry { int debug_id; + int debug_id_done; int call_type; int from_proc; int from_thread; @@ -186,10 +226,14 @@ struct binder_transaction_log_entry { int to_node; int data_size; int offsets_size; + int return_error_line; + uint32_t return_error; + uint32_t return_error_param; + const char *context_name; }; struct binder_transaction_log { - int next; - int full; + atomic_t cur; + bool full; struct binder_transaction_log_entry entry[32]; }; static struct binder_transaction_log binder_transaction_log; @@ -199,22 +243,50 @@ static struct binder_transaction_log_entry *binder_transaction_log_add( struct binder_transaction_log *log) { struct binder_transaction_log_entry *e; + unsigned int cur = atomic_inc_return(&log->cur); - e = &log->entry[log->next]; + if (cur >= ARRAY_SIZE(log->entry)) + log->full = true; + e = &log->entry[cur % ARRAY_SIZE(log->entry)]; + WRITE_ONCE(e->debug_id_done, 0); + /* + * write-barrier to synchronize access to e->debug_id_done. + * We make sure the initialized 0 value is seen before + * memset() other fields are zeroed by memset. + */ + smp_wmb(); memset(e, 0, sizeof(*e)); - log->next++; - if (log->next == ARRAY_SIZE(log->entry)) { - log->next = 0; - log->full = 1; - } return e; } +struct binder_context { + struct binder_node *binder_context_mgr_node; + struct mutex context_mgr_node_lock; + + kuid_t binder_context_mgr_uid; + const char *name; +}; + +struct binder_device { + struct hlist_node hlist; + struct miscdevice miscdev; + struct binder_context context; +}; + +/** + * struct binder_work - work enqueued on a worklist + * @entry: node enqueued on list + * @type: type of work to be performed + * + * There are separate work lists for proc, thread, and node (async). + */ struct binder_work { struct list_head entry; + enum { BINDER_WORK_TRANSACTION = 1, BINDER_WORK_TRANSACTION_COMPLETE, + BINDER_WORK_RETURN_ERROR, BINDER_WORK_NODE, BINDER_WORK_DEAD_BINDER, BINDER_WORK_DEAD_BINDER_AND_CLEAR, @@ -222,8 +294,76 @@ struct binder_work { } type; }; +struct binder_error { + struct binder_work work; + uint32_t cmd; +}; + +/** + * struct binder_node - binder node bookkeeping + * @debug_id: unique ID for debugging + * (invariant after initialized) + * @lock: lock for node fields + * @work: worklist element for node work + * (protected by @proc->inner_lock) + * @rb_node: element for proc->nodes tree + * (protected by @proc->inner_lock) + * @dead_node: element for binder_dead_nodes list + * (protected by binder_dead_nodes_lock) + * @proc: binder_proc that owns this node + * (invariant after initialized) + * @refs: list of references on this node + * (protected by @lock) + * @internal_strong_refs: used to take strong references when + * initiating a transaction + * (protected by @proc->inner_lock if @proc + * and by @lock) + * @local_weak_refs: weak user refs from local process + * (protected by @proc->inner_lock if @proc + * and by @lock) + * @local_strong_refs: strong user refs from local process + * (protected by @proc->inner_lock if @proc + * and by @lock) + * @tmp_refs: temporary kernel refs + * (protected by @proc->inner_lock while @proc + * is valid, and by binder_dead_nodes_lock + * if @proc is NULL. During inc/dec and node release + * it is also protected by @lock to provide safety + * as the node dies and @proc becomes NULL) + * @ptr: userspace pointer for node + * (invariant, no lock needed) + * @cookie: userspace cookie for node + * (invariant, no lock needed) + * @has_strong_ref: userspace notified of strong ref + * (protected by @proc->inner_lock if @proc + * and by @lock) + * @pending_strong_ref: userspace has acked notification of strong ref + * (protected by @proc->inner_lock if @proc + * and by @lock) + * @has_weak_ref: userspace notified of weak ref + * (protected by @proc->inner_lock if @proc + * and by @lock) + * @pending_weak_ref: userspace has acked notification of weak ref + * (protected by @proc->inner_lock if @proc + * and by @lock) + * @has_async_transaction: async transaction to node in progress + * (protected by @lock) + * @sched_policy: minimum scheduling policy for node + * (invariant after initialized) + * @accept_fds: file descriptor operations supported for node + * (invariant after initialized) + * @min_priority: minimum scheduling priority + * (invariant after initialized) + * @inherit_rt: inherit RT scheduling policy from caller + * (invariant after initialized) + * @async_todo: list of async work items + * (protected by @proc->inner_lock) + * + * Bookkeeping structure for binder nodes. + */ struct binder_node { int debug_id; + spinlock_t lock; struct binder_work work; union { struct rb_node rb_node; @@ -234,97 +374,198 @@ struct binder_node { int internal_strong_refs; int local_weak_refs; int local_strong_refs; + int tmp_refs; binder_uintptr_t ptr; binder_uintptr_t cookie; - unsigned has_strong_ref:1; - unsigned pending_strong_ref:1; - unsigned has_weak_ref:1; - unsigned pending_weak_ref:1; - unsigned has_async_transaction:1; - unsigned accept_fds:1; - unsigned min_priority:8; + struct { + /* + * bitfield elements protected by + * proc inner_lock + */ + u8 has_strong_ref:1; + u8 pending_strong_ref:1; + u8 has_weak_ref:1; + u8 pending_weak_ref:1; + }; + struct { + /* + * invariant after initialization + */ + u8 sched_policy:2; + u8 inherit_rt:1; + u8 accept_fds:1; + u8 min_priority; + }; + bool has_async_transaction; struct list_head async_todo; }; struct binder_ref_death { + /** + * @work: worklist element for death notifications + * (protected by inner_lock of the proc that + * this ref belongs to) + */ struct binder_work work; binder_uintptr_t cookie; }; +/** + * struct binder_ref_data - binder_ref counts and id + * @debug_id: unique ID for the ref + * @desc: unique userspace handle for ref + * @strong: strong ref count (debugging only if not locked) + * @weak: weak ref count (debugging only if not locked) + * + * Structure to hold ref count and ref id information. Since + * the actual ref can only be accessed with a lock, this structure + * is used to return information about the ref to callers of + * ref inc/dec functions. + */ +struct binder_ref_data { + int debug_id; + uint32_t desc; + int strong; + int weak; +}; + +/** + * struct binder_ref - struct to track references on nodes + * @data: binder_ref_data containing id, handle, and current refcounts + * @rb_node_desc: node for lookup by @data.desc in proc's rb_tree + * @rb_node_node: node for lookup by @node in proc's rb_tree + * @node_entry: list entry for node->refs list in target node + * (protected by @node->lock) + * @proc: binder_proc containing ref + * @node: binder_node of target node. When cleaning up a + * ref for deletion in binder_cleanup_ref, a non-NULL + * @node indicates the node must be freed + * @death: pointer to death notification (ref_death) if requested + * (protected by @node->lock) + * + * Structure to track references from procA to target node (on procB). This + * structure is unsafe to access without holding @proc->outer_lock. + */ struct binder_ref { /* Lookups needed: */ /* node + proc => ref (transaction) */ /* desc + proc => ref (transaction, inc/dec ref) */ /* node => refs + procs (proc exit) */ - int debug_id; + struct binder_ref_data data; struct rb_node rb_node_desc; struct rb_node rb_node_node; struct hlist_node node_entry; struct binder_proc *proc; struct binder_node *node; - uint32_t desc; - int strong; - int weak; struct binder_ref_death *death; }; -struct binder_buffer { - struct list_head entry; /* free and allocated entries by address */ - struct rb_node rb_node; /* free entry by size or allocated entry */ - /* by address */ - unsigned free:1; - unsigned allow_user_free:1; - unsigned async_transaction:1; - unsigned debug_id:29; - - struct binder_transaction *transaction; - - struct binder_node *target_node; - size_t data_size; - size_t offsets_size; - uint8_t data[0]; -}; - enum binder_deferred_state { BINDER_DEFERRED_PUT_FILES = 0x01, BINDER_DEFERRED_FLUSH = 0x02, BINDER_DEFERRED_RELEASE = 0x04, }; +/** + * struct binder_priority - scheduler policy and priority + * @sched_policy scheduler policy + * @prio [100..139] for SCHED_NORMAL, [0..99] for FIFO/RT + * + * The binder driver supports inheriting the following scheduler policies: + * SCHED_NORMAL + * SCHED_BATCH + * SCHED_FIFO + * SCHED_RR + */ +struct binder_priority { + unsigned int sched_policy; + int prio; +}; + +/** + * struct binder_proc - binder process bookkeeping + * @proc_node: element for binder_procs list + * @threads: rbtree of binder_threads in this proc + * (protected by @inner_lock) + * @nodes: rbtree of binder nodes associated with + * this proc ordered by node->ptr + * (protected by @inner_lock) + * @refs_by_desc: rbtree of refs ordered by ref->desc + * (protected by @outer_lock) + * @refs_by_node: rbtree of refs ordered by ref->node + * (protected by @outer_lock) + * @waiting_threads: threads currently waiting for proc work + * (protected by @inner_lock) + * @pid PID of group_leader of process + * (invariant after initialized) + * @tsk task_struct for group_leader of process + * (invariant after initialized) + * @files files_struct for process + * (protected by @files_lock) + * @files_lock mutex to protect @files + * @deferred_work_node: element for binder_deferred_list + * (protected by binder_deferred_lock) + * @deferred_work: bitmap of deferred work to perform + * (protected by binder_deferred_lock) + * @is_dead: process is dead and awaiting free + * when outstanding transactions are cleaned up + * (protected by @inner_lock) + * @todo: list of work for this process + * (protected by @inner_lock) + * @stats: per-process binder statistics + * (atomics, no lock needed) + * @delivered_death: list of delivered death notification + * (protected by @inner_lock) + * @max_threads: cap on number of binder threads + * (protected by @inner_lock) + * @requested_threads: number of binder threads requested but not + * yet started. In current implementation, can + * only be 0 or 1. + * (protected by @inner_lock) + * @requested_threads_started: number binder threads started + * (protected by @inner_lock) + * @tmp_ref: temporary reference to indicate proc is in use + * (protected by @inner_lock) + * @default_priority: default scheduler priority + * (invariant after initialized) + * @debugfs_entry: debugfs node + * @alloc: binder allocator bookkeeping + * @context: binder_context for this proc + * (invariant after initialized) + * @inner_lock: can nest under outer_lock and/or node lock + * @outer_lock: no nesting under innor or node lock + * Lock order: 1) outer, 2) node, 3) inner + * + * Bookkeeping structure for binder processes + */ struct binder_proc { struct hlist_node proc_node; struct rb_root threads; struct rb_root nodes; struct rb_root refs_by_desc; struct rb_root refs_by_node; + struct list_head waiting_threads; int pid; - struct vm_area_struct *vma; - struct mm_struct *vma_vm_mm; struct task_struct *tsk; struct files_struct *files; + struct mutex files_lock; struct hlist_node deferred_work_node; int deferred_work; - void *buffer; - ptrdiff_t user_buffer_offset; + bool is_dead; - struct list_head buffers; - struct rb_root free_buffers; - struct rb_root allocated_buffers; - size_t free_async_space; - - struct page **pages; - size_t buffer_size; - uint32_t buffer_free; struct list_head todo; - wait_queue_head_t wait; struct binder_stats stats; struct list_head delivered_death; int max_threads; int requested_threads; int requested_threads_started; - int ready_threads; - long default_priority; + int tmp_ref; + struct binder_priority default_priority; struct dentry *debugfs_entry; + struct binder_alloc alloc; + struct binder_context *context; + spinlock_t inner_lock; + spinlock_t outer_lock; }; enum { @@ -333,22 +574,63 @@ enum { BINDER_LOOPER_STATE_EXITED = 0x04, BINDER_LOOPER_STATE_INVALID = 0x08, BINDER_LOOPER_STATE_WAITING = 0x10, - BINDER_LOOPER_STATE_NEED_RETURN = 0x20 + BINDER_LOOPER_STATE_POLL = 0x20, }; +/** + * struct binder_thread - binder thread bookkeeping + * @proc: binder process for this thread + * (invariant after initialization) + * @rb_node: element for proc->threads rbtree + * (protected by @proc->inner_lock) + * @waiting_thread_node: element for @proc->waiting_threads list + * (protected by @proc->inner_lock) + * @pid: PID for this thread + * (invariant after initialization) + * @looper: bitmap of looping state + * (only accessed by this thread) + * @looper_needs_return: looping thread needs to exit driver + * (no lock needed) + * @transaction_stack: stack of in-progress transactions for this thread + * (protected by @proc->inner_lock) + * @todo: list of work to do for this thread + * (protected by @proc->inner_lock) + * @process_todo: whether work in @todo should be processed + * (protected by @proc->inner_lock) + * @return_error: transaction errors reported by this thread + * (only accessed by this thread) + * @reply_error: transaction errors reported by target thread + * (protected by @proc->inner_lock) + * @wait: wait queue for thread work + * @stats: per-thread statistics + * (atomics, no lock needed) + * @tmp_ref: temporary reference to indicate thread is in use + * (atomic since @proc->inner_lock cannot + * always be acquired) + * @is_dead: thread is dead and awaiting free + * when outstanding transactions are cleaned up + * (protected by @proc->inner_lock) + * @task: struct task_struct for this thread + * + * Bookkeeping structure for binder threads. + */ struct binder_thread { struct binder_proc *proc; struct rb_node rb_node; + struct list_head waiting_thread_node; int pid; - int looper; + int looper; /* only modified by this thread */ + bool looper_need_return; /* can be written by other thread */ struct binder_transaction *transaction_stack; struct list_head todo; - uint32_t return_error; /* Write failed, return error code in read buf */ - uint32_t return_error2; /* Write failed, return error code in read */ - /* buffer. Used when sending a reply to a dead process that */ - /* we are also waiting on */ + bool process_todo; + struct binder_error return_error; + struct binder_error reply_error; wait_queue_head_t wait; struct binder_stats stats; + atomic_t tmp_ref; + bool is_dead; + struct task_struct *task; }; struct binder_transaction { @@ -365,30 +647,324 @@ struct binder_transaction { struct binder_buffer *buffer; unsigned int code; unsigned int flags; - long priority; - long saved_priority; + struct binder_priority priority; + struct binder_priority saved_priority; + bool set_priority_called; kuid_t sender_euid; + /** + * @lock: protects @from, @to_proc, and @to_thread + * + * @from, @to_proc, and @to_thread can be set to NULL + * during thread teardown + */ + spinlock_t lock; }; +/** + * binder_proc_lock() - Acquire outer lock for given binder_proc + * @proc: struct binder_proc to acquire + * + * Acquires proc->outer_lock. Used to protect binder_ref + * structures associated with the given proc. + */ +#define binder_proc_lock(proc) _binder_proc_lock(proc, __LINE__) +static void +_binder_proc_lock(struct binder_proc *proc, int line) +{ + binder_debug(BINDER_DEBUG_SPINLOCKS, + "%s: line=%d\n", __func__, line); + spin_lock(&proc->outer_lock); +} + +/** + * binder_proc_unlock() - Release spinlock for given binder_proc + * @proc: struct binder_proc to acquire + * + * Release lock acquired via binder_proc_lock() + */ +#define binder_proc_unlock(_proc) _binder_proc_unlock(_proc, __LINE__) +static void +_binder_proc_unlock(struct binder_proc *proc, int line) +{ + binder_debug(BINDER_DEBUG_SPINLOCKS, + "%s: line=%d\n", __func__, line); + spin_unlock(&proc->outer_lock); +} + +/** + * binder_inner_proc_lock() - Acquire inner lock for given binder_proc + * @proc: struct binder_proc to acquire + * + * Acquires proc->inner_lock. Used to protect todo lists + */ +#define binder_inner_proc_lock(proc) _binder_inner_proc_lock(proc, __LINE__) +static void +_binder_inner_proc_lock(struct binder_proc *proc, int line) +{ + binder_debug(BINDER_DEBUG_SPINLOCKS, + "%s: line=%d\n", __func__, line); + spin_lock(&proc->inner_lock); +} + +/** + * binder_inner_proc_unlock() - Release inner lock for given binder_proc + * @proc: struct binder_proc to acquire + * + * Release lock acquired via binder_inner_proc_lock() + */ +#define binder_inner_proc_unlock(proc) _binder_inner_proc_unlock(proc, __LINE__) +static void +_binder_inner_proc_unlock(struct binder_proc *proc, int line) +{ + binder_debug(BINDER_DEBUG_SPINLOCKS, + "%s: line=%d\n", __func__, line); + spin_unlock(&proc->inner_lock); +} + +/** + * binder_node_lock() - Acquire spinlock for given binder_node + * @node: struct binder_node to acquire + * + * Acquires node->lock. Used to protect binder_node fields + */ +#define binder_node_lock(node) _binder_node_lock(node, __LINE__) +static void +_binder_node_lock(struct binder_node *node, int line) +{ + binder_debug(BINDER_DEBUG_SPINLOCKS, + "%s: line=%d\n", __func__, line); + spin_lock(&node->lock); +} + +/** + * binder_node_unlock() - Release spinlock for given binder_proc + * @node: struct binder_node to acquire + * + * Release lock acquired via binder_node_lock() + */ +#define binder_node_unlock(node) _binder_node_unlock(node, __LINE__) +static void +_binder_node_unlock(struct binder_node *node, int line) +{ + binder_debug(BINDER_DEBUG_SPINLOCKS, + "%s: line=%d\n", __func__, line); + spin_unlock(&node->lock); +} + +/** + * binder_node_inner_lock() - Acquire node and inner locks + * @node: struct binder_node to acquire + * + * Acquires node->lock. If node->proc also acquires + * proc->inner_lock. Used to protect binder_node fields + */ +#define binder_node_inner_lock(node) _binder_node_inner_lock(node, __LINE__) +static void +_binder_node_inner_lock(struct binder_node *node, int line) +{ + binder_debug(BINDER_DEBUG_SPINLOCKS, + "%s: line=%d\n", __func__, line); + spin_lock(&node->lock); + if (node->proc) + binder_inner_proc_lock(node->proc); +} + +/** + * binder_node_unlock() - Release node and inner locks + * @node: struct binder_node to acquire + * + * Release lock acquired via binder_node_lock() + */ +#define binder_node_inner_unlock(node) _binder_node_inner_unlock(node, __LINE__) +static void +_binder_node_inner_unlock(struct binder_node *node, int line) +{ + struct binder_proc *proc = node->proc; + + binder_debug(BINDER_DEBUG_SPINLOCKS, + "%s: line=%d\n", __func__, line); + if (proc) + binder_inner_proc_unlock(proc); + spin_unlock(&node->lock); +} + +static bool binder_worklist_empty_ilocked(struct list_head *list) +{ + return list_empty(list); +} + +/** + * binder_worklist_empty() - Check if no items on the work list + * @proc: binder_proc associated with list + * @list: list to check + * + * Return: true if there are no items on list, else false + */ +static bool binder_worklist_empty(struct binder_proc *proc, + struct list_head *list) +{ + bool ret; + + binder_inner_proc_lock(proc); + ret = binder_worklist_empty_ilocked(list); + binder_inner_proc_unlock(proc); + return ret; +} + +/** + * binder_enqueue_work_ilocked() - Add an item to the work list + * @work: struct binder_work to add to list + * @target_list: list to add work to + * + * Adds the work to the specified list. Asserts that work + * is not already on a list. + * + * Requires the proc->inner_lock to be held. + */ +static void +binder_enqueue_work_ilocked(struct binder_work *work, + struct list_head *target_list) +{ + BUG_ON(target_list == NULL); + BUG_ON(work->entry.next && !list_empty(&work->entry)); + list_add_tail(&work->entry, target_list); +} + +/** + * binder_enqueue_deferred_thread_work_ilocked() - Add deferred thread work + * @thread: thread to queue work to + * @work: struct binder_work to add to list + * + * Adds the work to the todo list of the thread. Doesn't set the process_todo + * flag, which means that (if it wasn't already set) the thread will go to + * sleep without handling this work when it calls read. + * + * Requires the proc->inner_lock to be held. + */ +static void +binder_enqueue_deferred_thread_work_ilocked(struct binder_thread *thread, + struct binder_work *work) +{ + binder_enqueue_work_ilocked(work, &thread->todo); +} + +/** + * binder_enqueue_thread_work_ilocked() - Add an item to the thread work list + * @thread: thread to queue work to + * @work: struct binder_work to add to list + * + * Adds the work to the todo list of the thread, and enables processing + * of the todo queue. + * + * Requires the proc->inner_lock to be held. + */ +static void +binder_enqueue_thread_work_ilocked(struct binder_thread *thread, + struct binder_work *work) +{ + binder_enqueue_work_ilocked(work, &thread->todo); + thread->process_todo = true; +} + +/** + * binder_enqueue_thread_work() - Add an item to the thread work list + * @thread: thread to queue work to + * @work: struct binder_work to add to list + * + * Adds the work to the todo list of the thread, and enables processing + * of the todo queue. + */ +static void +binder_enqueue_thread_work(struct binder_thread *thread, + struct binder_work *work) +{ + binder_inner_proc_lock(thread->proc); + binder_enqueue_thread_work_ilocked(thread, work); + binder_inner_proc_unlock(thread->proc); +} + +static void +binder_dequeue_work_ilocked(struct binder_work *work) +{ + list_del_init(&work->entry); +} + +/** + * binder_dequeue_work() - Removes an item from the work list + * @proc: binder_proc associated with list + * @work: struct binder_work to remove from list + * + * Removes the specified work item from whatever list it is on. + * Can safely be called if work is not on any list. + */ +static void +binder_dequeue_work(struct binder_proc *proc, struct binder_work *work) +{ + binder_inner_proc_lock(proc); + binder_dequeue_work_ilocked(work); + binder_inner_proc_unlock(proc); +} + +static struct binder_work *binder_dequeue_work_head_ilocked( + struct list_head *list) +{ + struct binder_work *w; + + w = list_first_entry_or_null(list, struct binder_work, entry); + if (w) + list_del_init(&w->entry); + return w; +} + +/** + * binder_dequeue_work_head() - Dequeues the item at head of list + * @proc: binder_proc associated with list + * @list: list to dequeue head + * + * Removes the head of the list if there are items on the list + * + * Return: pointer dequeued binder_work, NULL if list was empty + */ +static struct binder_work *binder_dequeue_work_head( + struct binder_proc *proc, + struct list_head *list) +{ + struct binder_work *w; + + binder_inner_proc_lock(proc); + w = binder_dequeue_work_head_ilocked(list); + binder_inner_proc_unlock(proc); + return w; +} + static void binder_defer_work(struct binder_proc *proc, enum binder_deferred_state defer); +static void binder_free_thread(struct binder_thread *thread); +static void binder_free_proc(struct binder_proc *proc); +static void binder_inc_node_tmpref_ilocked(struct binder_node *node); static int task_get_unused_fd_flags(struct binder_proc *proc, int flags) { - struct files_struct *files = proc->files; unsigned long rlim_cur; unsigned long irqs; + int ret; - if (files == NULL) - return -ESRCH; - - if (!lock_task_sighand(proc->tsk, &irqs)) - return -EMFILE; - + mutex_lock(&proc->files_lock); + if (proc->files == NULL) { + ret = -ESRCH; + goto err; + } + if (!lock_task_sighand(proc->tsk, &irqs)) { + ret = -EMFILE; + goto err; + } rlim_cur = task_rlimit(proc->tsk, RLIMIT_NOFILE); unlock_task_sighand(proc->tsk, &irqs); - return __alloc_fd(files, 0, rlim_cur, flags); + ret = __alloc_fd(proc->files, 0, rlim_cur, flags); +err: + mutex_unlock(&proc->files_lock); + return ret; } /* @@ -397,8 +973,10 @@ static int task_get_unused_fd_flags(struct binder_proc *proc, int flags) static void task_fd_install( struct binder_proc *proc, unsigned int fd, struct file *file) { + mutex_lock(&proc->files_lock); if (proc->files) __fd_install(proc->files, fd, file); + mutex_unlock(&proc->files_lock); } /* @@ -408,9 +986,11 @@ static long task_close_fd(struct binder_proc *proc, unsigned int fd) { int retval; - if (proc->files == NULL) - return -ESRCH; - + mutex_lock(&proc->files_lock); + if (proc->files == NULL) { + retval = -ESRCH; + goto err; + } retval = __close_fd(proc->files, fd); /* can't restart close syscall because file table entry was cleared */ if (unlikely(retval == -ERESTARTSYS || @@ -418,457 +998,287 @@ static long task_close_fd(struct binder_proc *proc, unsigned int fd) retval == -ERESTARTNOHAND || retval == -ERESTART_RESTARTBLOCK)) retval = -EINTR; - +err: + mutex_unlock(&proc->files_lock); return retval; } -static inline void binder_lock(const char *tag) +static bool binder_has_work_ilocked(struct binder_thread *thread, + bool do_proc_work) { - trace_binder_lock(tag); - mutex_lock(&binder_main_lock); - trace_binder_locked(tag); + return thread->process_todo || + thread->looper_need_return || + (do_proc_work && + !binder_worklist_empty_ilocked(&thread->proc->todo)); } -static inline void binder_unlock(const char *tag) +static bool binder_has_work(struct binder_thread *thread, bool do_proc_work) { - trace_binder_unlock(tag); - mutex_unlock(&binder_main_lock); + bool has_work; + + binder_inner_proc_lock(thread->proc); + has_work = binder_has_work_ilocked(thread, do_proc_work); + binder_inner_proc_unlock(thread->proc); + + return has_work; } -static void binder_set_nice(long nice) +static bool binder_available_for_proc_work_ilocked(struct binder_thread *thread) { - long min_nice; + return !thread->transaction_stack && + binder_worklist_empty_ilocked(&thread->todo) && + (thread->looper & (BINDER_LOOPER_STATE_ENTERED | + BINDER_LOOPER_STATE_REGISTERED)); +} - if (can_nice(current, nice)) { - set_user_nice(current, nice); +static void binder_wakeup_poll_threads_ilocked(struct binder_proc *proc, + bool sync) +{ + struct rb_node *n; + struct binder_thread *thread; + + for (n = rb_first(&proc->threads); n != NULL; n = rb_next(n)) { + thread = rb_entry(n, struct binder_thread, rb_node); + if (thread->looper & BINDER_LOOPER_STATE_POLL && + binder_available_for_proc_work_ilocked(thread)) { + if (sync) + wake_up_interruptible_sync(&thread->wait); + else + wake_up_interruptible(&thread->wait); + } + } +} + +/** + * binder_select_thread_ilocked() - selects a thread for doing proc work. + * @proc: process to select a thread from + * + * Note that calling this function moves the thread off the waiting_threads + * list, so it can only be woken up by the caller of this function, or a + * signal. Therefore, callers *should* always wake up the thread this function + * returns. + * + * Return: If there's a thread currently waiting for process work, + * returns that thread. Otherwise returns NULL. + */ +static struct binder_thread * +binder_select_thread_ilocked(struct binder_proc *proc) +{ + struct binder_thread *thread; + + assert_spin_locked(&proc->inner_lock); + thread = list_first_entry_or_null(&proc->waiting_threads, + struct binder_thread, + waiting_thread_node); + + if (thread) + list_del_init(&thread->waiting_thread_node); + + return thread; +} + +/** + * binder_wakeup_thread_ilocked() - wakes up a thread for doing proc work. + * @proc: process to wake up a thread in + * @thread: specific thread to wake-up (may be NULL) + * @sync: whether to do a synchronous wake-up + * + * This function wakes up a thread in the @proc process. + * The caller may provide a specific thread to wake-up in + * the @thread parameter. If @thread is NULL, this function + * will wake up threads that have called poll(). + * + * Note that for this function to work as expected, callers + * should first call binder_select_thread() to find a thread + * to handle the work (if they don't have a thread already), + * and pass the result into the @thread parameter. + */ +static void binder_wakeup_thread_ilocked(struct binder_proc *proc, + struct binder_thread *thread, + bool sync) +{ + assert_spin_locked(&proc->inner_lock); + + if (thread) { + if (sync) + wake_up_interruptible_sync(&thread->wait); + else + wake_up_interruptible(&thread->wait); return; } - min_nice = rlimit_to_nice(current->signal->rlim[RLIMIT_NICE].rlim_cur); - binder_debug(BINDER_DEBUG_PRIORITY_CAP, - "%d: nice value %ld not allowed use %ld instead\n", - current->pid, nice, min_nice); - set_user_nice(current, min_nice); - if (min_nice <= MAX_NICE) - return; - binder_user_error("%d RLIMIT_NICE not set\n", current->pid); + + /* Didn't find a thread waiting for proc work; this can happen + * in two scenarios: + * 1. All threads are busy handling transactions + * In that case, one of those threads should call back into + * the kernel driver soon and pick up this work. + * 2. Threads are using the (e)poll interface, in which case + * they may be blocked on the waitqueue without having been + * added to waiting_threads. For this case, we just iterate + * over all threads not handling transaction work, and + * wake them all up. We wake all because we don't know whether + * a thread that called into (e)poll is handling non-binder + * work currently. + */ + binder_wakeup_poll_threads_ilocked(proc, sync); } -static size_t binder_buffer_size(struct binder_proc *proc, - struct binder_buffer *buffer) +static void binder_wakeup_proc_ilocked(struct binder_proc *proc) { - if (list_is_last(&buffer->entry, &proc->buffers)) - return proc->buffer + proc->buffer_size - (void *)buffer->data; - return (size_t)list_entry(buffer->entry.next, - struct binder_buffer, entry) - (size_t)buffer->data; + struct binder_thread *thread = binder_select_thread_ilocked(proc); + + binder_wakeup_thread_ilocked(proc, thread, /* sync = */false); } -static void binder_insert_free_buffer(struct binder_proc *proc, - struct binder_buffer *new_buffer) +static bool is_rt_policy(int policy) { - struct rb_node **p = &proc->free_buffers.rb_node; - struct rb_node *parent = NULL; - struct binder_buffer *buffer; - size_t buffer_size; - size_t new_buffer_size; - - BUG_ON(!new_buffer->free); - - new_buffer_size = binder_buffer_size(proc, new_buffer); - - binder_debug(BINDER_DEBUG_BUFFER_ALLOC, - "%d: add free buffer, size %zd, at %p\n", - proc->pid, new_buffer_size, new_buffer); - - while (*p) { - parent = *p; - buffer = rb_entry(parent, struct binder_buffer, rb_node); - BUG_ON(!buffer->free); - - buffer_size = binder_buffer_size(proc, buffer); - - if (new_buffer_size < buffer_size) - p = &parent->rb_left; - else - p = &parent->rb_right; - } - rb_link_node(&new_buffer->rb_node, parent, p); - rb_insert_color(&new_buffer->rb_node, &proc->free_buffers); + return policy == SCHED_FIFO || policy == SCHED_RR; } -static void binder_insert_allocated_buffer(struct binder_proc *proc, - struct binder_buffer *new_buffer) +static bool is_fair_policy(int policy) { - struct rb_node **p = &proc->allocated_buffers.rb_node; - struct rb_node *parent = NULL; - struct binder_buffer *buffer; - - BUG_ON(new_buffer->free); - - while (*p) { - parent = *p; - buffer = rb_entry(parent, struct binder_buffer, rb_node); - BUG_ON(buffer->free); - - if (new_buffer < buffer) - p = &parent->rb_left; - else if (new_buffer > buffer) - p = &parent->rb_right; - else - BUG(); - } - rb_link_node(&new_buffer->rb_node, parent, p); - rb_insert_color(&new_buffer->rb_node, &proc->allocated_buffers); + return policy == SCHED_NORMAL || policy == SCHED_BATCH; } -static struct binder_buffer *binder_buffer_lookup(struct binder_proc *proc, - uintptr_t user_ptr) +static bool binder_supported_policy(int policy) { - struct rb_node *n = proc->allocated_buffers.rb_node; - struct binder_buffer *buffer; - struct binder_buffer *kern_ptr; - - kern_ptr = (struct binder_buffer *)(user_ptr - proc->user_buffer_offset - - offsetof(struct binder_buffer, data)); - - while (n) { - buffer = rb_entry(n, struct binder_buffer, rb_node); - BUG_ON(buffer->free); - - if (kern_ptr < buffer) - n = n->rb_left; - else if (kern_ptr > buffer) - n = n->rb_right; - else - return buffer; - } - return NULL; + return is_fair_policy(policy) || is_rt_policy(policy); } -static int binder_update_page_range(struct binder_proc *proc, int allocate, - void *start, void *end, - struct vm_area_struct *vma) +static int to_userspace_prio(int policy, int kernel_priority) { - void *page_addr; - unsigned long user_page_addr; - struct page **page; - struct mm_struct *mm; - - binder_debug(BINDER_DEBUG_BUFFER_ALLOC, - "%d: %s pages %p-%p\n", proc->pid, - allocate ? "allocate" : "free", start, end); - - if (end <= start) - return 0; - - trace_binder_update_page_range(proc, allocate, start, end); - - if (vma) - mm = NULL; + if (is_fair_policy(policy)) + return PRIO_TO_NICE(kernel_priority); else - mm = get_task_mm(proc->tsk); - - if (mm) { - down_write(&mm->mmap_sem); - vma = proc->vma; - if (vma && mm != proc->vma_vm_mm) { - pr_err("%d: vma mm and task mm mismatch\n", - proc->pid); - vma = NULL; - } - } - - if (allocate == 0) - goto free_range; - - if (vma == NULL) { - pr_err("%d: binder_alloc_buf failed to map pages in userspace, no vma\n", - proc->pid); - goto err_no_vma; - } - - for (page_addr = start; page_addr < end; page_addr += PAGE_SIZE) { - int ret; - - page = &proc->pages[(page_addr - proc->buffer) / PAGE_SIZE]; - - BUG_ON(*page); - *page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO); - if (*page == NULL) { - pr_err("%d: binder_alloc_buf failed for page at %p\n", - proc->pid, page_addr); - goto err_alloc_page_failed; - } - ret = map_kernel_range_noflush((unsigned long)page_addr, - PAGE_SIZE, PAGE_KERNEL, page); - flush_cache_vmap((unsigned long)page_addr, - (unsigned long)page_addr + PAGE_SIZE); - if (ret != 1) { - pr_err("%d: binder_alloc_buf failed to map page at %p in kernel\n", - proc->pid, page_addr); - goto err_map_kernel_failed; - } - user_page_addr = - (uintptr_t)page_addr + proc->user_buffer_offset; - ret = vm_insert_page(vma, user_page_addr, page[0]); - if (ret) { - pr_err("%d: binder_alloc_buf failed to map page at %lx in userspace\n", - proc->pid, user_page_addr); - goto err_vm_insert_page_failed; - } - /* vm_insert_page does not seem to increment the refcount */ - } - if (mm) { - up_write(&mm->mmap_sem); - mmput(mm); - } - return 0; - -free_range: - for (page_addr = end - PAGE_SIZE; page_addr >= start; - page_addr -= PAGE_SIZE) { - page = &proc->pages[(page_addr - proc->buffer) / PAGE_SIZE]; - if (vma) - zap_page_range(vma, (uintptr_t)page_addr + - proc->user_buffer_offset, PAGE_SIZE, NULL); -err_vm_insert_page_failed: - unmap_kernel_range((unsigned long)page_addr, PAGE_SIZE); -err_map_kernel_failed: - __free_page(*page); - *page = NULL; -err_alloc_page_failed: - ; - } -err_no_vma: - if (mm) { - up_write(&mm->mmap_sem); - mmput(mm); - } - return -ENOMEM; + return MAX_USER_RT_PRIO - 1 - kernel_priority; } -static struct binder_buffer *binder_alloc_buf(struct binder_proc *proc, - size_t data_size, - size_t offsets_size, int is_async) +static int to_kernel_prio(int policy, int user_priority) { - struct rb_node *n = proc->free_buffers.rb_node; - struct binder_buffer *buffer; - size_t buffer_size; - struct rb_node *best_fit = NULL; - void *has_page_addr; - void *end_page_addr; - size_t size; + if (is_fair_policy(policy)) + return NICE_TO_PRIO(user_priority); + else + return MAX_USER_RT_PRIO - 1 - user_priority; +} - if (proc->vma == NULL) { - pr_err("%d: binder_alloc_buf, no vma\n", - proc->pid); - return NULL; - } +static void binder_do_set_priority(struct task_struct *task, + struct binder_priority desired, + bool verify) +{ + int priority; /* user-space prio value */ + bool has_cap_nice; + unsigned int policy = desired.sched_policy; - size = ALIGN(data_size, sizeof(void *)) + - ALIGN(offsets_size, sizeof(void *)); + if (task->policy == policy && task->normal_prio == desired.prio) + return; - if (size < data_size || size < offsets_size) { - binder_user_error("%d: got transaction with invalid size %zd-%zd\n", - proc->pid, data_size, offsets_size); - return NULL; - } + has_cap_nice = has_capability_noaudit(task, CAP_SYS_NICE); - if (is_async && - proc->free_async_space < size + sizeof(struct binder_buffer)) { - binder_debug(BINDER_DEBUG_BUFFER_ALLOC, - "%d: binder_alloc_buf size %zd failed, no async space left\n", - proc->pid, size); - return NULL; - } + priority = to_userspace_prio(policy, desired.prio); - while (n) { - buffer = rb_entry(n, struct binder_buffer, rb_node); - BUG_ON(!buffer->free); - buffer_size = binder_buffer_size(proc, buffer); + if (verify && is_rt_policy(policy) && !has_cap_nice) { + long max_rtprio = task_rlimit(task, RLIMIT_RTPRIO); - if (size < buffer_size) { - best_fit = n; - n = n->rb_left; - } else if (size > buffer_size) - n = n->rb_right; - else { - best_fit = n; - break; + if (max_rtprio == 0) { + policy = SCHED_NORMAL; + priority = MIN_NICE; + } else if (priority > max_rtprio) { + priority = max_rtprio; } } - if (best_fit == NULL) { - pr_err("%d: binder_alloc_buf size %zd failed, no address space\n", - proc->pid, size); - return NULL; - } - if (n == NULL) { - buffer = rb_entry(best_fit, struct binder_buffer, rb_node); - buffer_size = binder_buffer_size(proc, buffer); - } - binder_debug(BINDER_DEBUG_BUFFER_ALLOC, - "%d: binder_alloc_buf size %zd got buffer %p size %zd\n", - proc->pid, size, buffer, buffer_size); + if (verify && is_fair_policy(policy) && !has_cap_nice) { + long min_nice = rlimit_to_nice(task_rlimit(task, RLIMIT_NICE)); - has_page_addr = - (void *)(((uintptr_t)buffer->data + buffer_size) & PAGE_MASK); - if (n == NULL) { - if (size + sizeof(struct binder_buffer) + 4 >= buffer_size) - buffer_size = size; /* no room for other buffers */ - else - buffer_size = size + sizeof(struct binder_buffer); - } - end_page_addr = - (void *)PAGE_ALIGN((uintptr_t)buffer->data + buffer_size); - if (end_page_addr > has_page_addr) - end_page_addr = has_page_addr; - if (binder_update_page_range(proc, 1, - (void *)PAGE_ALIGN((uintptr_t)buffer->data), end_page_addr, NULL)) - return NULL; - - rb_erase(best_fit, &proc->free_buffers); - buffer->free = 0; - binder_insert_allocated_buffer(proc, buffer); - if (buffer_size != size) { - struct binder_buffer *new_buffer = (void *)buffer->data + size; - - list_add(&new_buffer->entry, &buffer->entry); - new_buffer->free = 1; - binder_insert_free_buffer(proc, new_buffer); - } - binder_debug(BINDER_DEBUG_BUFFER_ALLOC, - "%d: binder_alloc_buf size %zd got %p\n", - proc->pid, size, buffer); - buffer->data_size = data_size; - buffer->offsets_size = offsets_size; - buffer->async_transaction = is_async; - if (is_async) { - proc->free_async_space -= size + sizeof(struct binder_buffer); - binder_debug(BINDER_DEBUG_BUFFER_ALLOC_ASYNC, - "%d: binder_alloc_buf size %zd async free %zd\n", - proc->pid, size, proc->free_async_space); - } - - return buffer; -} - -static void *buffer_start_page(struct binder_buffer *buffer) -{ - return (void *)((uintptr_t)buffer & PAGE_MASK); -} - -static void *buffer_end_page(struct binder_buffer *buffer) -{ - return (void *)(((uintptr_t)(buffer + 1) - 1) & PAGE_MASK); -} - -static void binder_delete_free_buffer(struct binder_proc *proc, - struct binder_buffer *buffer) -{ - struct binder_buffer *prev, *next = NULL; - int free_page_end = 1; - int free_page_start = 1; - - BUG_ON(proc->buffers.next == &buffer->entry); - prev = list_entry(buffer->entry.prev, struct binder_buffer, entry); - BUG_ON(!prev->free); - if (buffer_end_page(prev) == buffer_start_page(buffer)) { - free_page_start = 0; - if (buffer_end_page(prev) == buffer_end_page(buffer)) - free_page_end = 0; - binder_debug(BINDER_DEBUG_BUFFER_ALLOC, - "%d: merge free, buffer %p share page with %p\n", - proc->pid, buffer, prev); - } - - if (!list_is_last(&buffer->entry, &proc->buffers)) { - next = list_entry(buffer->entry.next, - struct binder_buffer, entry); - if (buffer_start_page(next) == buffer_end_page(buffer)) { - free_page_end = 0; - if (buffer_start_page(next) == - buffer_start_page(buffer)) - free_page_start = 0; - binder_debug(BINDER_DEBUG_BUFFER_ALLOC, - "%d: merge free, buffer %p share page with %p\n", - proc->pid, buffer, prev); + if (min_nice > MAX_NICE) { + binder_user_error("%d RLIMIT_NICE not set\n", + task->pid); + return; + } else if (priority < min_nice) { + priority = min_nice; } } - list_del(&buffer->entry); - if (free_page_start || free_page_end) { - binder_debug(BINDER_DEBUG_BUFFER_ALLOC, - "%d: merge free, buffer %p do not share page%s%s with %p or %p\n", - proc->pid, buffer, free_page_start ? "" : " end", - free_page_end ? "" : " start", prev, next); - binder_update_page_range(proc, 0, free_page_start ? - buffer_start_page(buffer) : buffer_end_page(buffer), - (free_page_end ? buffer_end_page(buffer) : - buffer_start_page(buffer)) + PAGE_SIZE, NULL); + + if (policy != desired.sched_policy || + to_kernel_prio(policy, priority) != desired.prio) + binder_debug(BINDER_DEBUG_PRIORITY_CAP, + "%d: priority %d not allowed, using %d instead\n", + task->pid, desired.prio, + to_kernel_prio(policy, priority)); + + trace_binder_set_priority(task->tgid, task->pid, task->normal_prio, + to_kernel_prio(policy, priority), + desired.prio); + + /* Set the actual priority */ + if (task->policy != policy || is_rt_policy(policy)) { + struct sched_param params; + + params.sched_priority = is_rt_policy(policy) ? priority : 0; + + sched_setscheduler_nocheck(task, + policy | SCHED_RESET_ON_FORK, + ¶ms); } + if (is_fair_policy(policy)) + set_user_nice(task, priority); } -static void binder_free_buf(struct binder_proc *proc, - struct binder_buffer *buffer) +static void binder_set_priority(struct task_struct *task, + struct binder_priority desired) { - size_t size, buffer_size; - - buffer_size = binder_buffer_size(proc, buffer); - - size = ALIGN(buffer->data_size, sizeof(void *)) + - ALIGN(buffer->offsets_size, sizeof(void *)); - - binder_debug(BINDER_DEBUG_BUFFER_ALLOC, - "%d: binder_free_buf %p size %zd buffer_size %zd\n", - proc->pid, buffer, size, buffer_size); - - BUG_ON(buffer->free); - BUG_ON(size > buffer_size); - BUG_ON(buffer->transaction != NULL); - BUG_ON((void *)buffer < proc->buffer); - BUG_ON((void *)buffer > proc->buffer + proc->buffer_size); - - if (buffer->async_transaction) { - proc->free_async_space += size + sizeof(struct binder_buffer); - - binder_debug(BINDER_DEBUG_BUFFER_ALLOC_ASYNC, - "%d: binder_free_buf size %zd async free %zd\n", - proc->pid, size, proc->free_async_space); - } - - binder_update_page_range(proc, 0, - (void *)PAGE_ALIGN((uintptr_t)buffer->data), - (void *)(((uintptr_t)buffer->data + buffer_size) & PAGE_MASK), - NULL); - rb_erase(&buffer->rb_node, &proc->allocated_buffers); - buffer->free = 1; - if (!list_is_last(&buffer->entry, &proc->buffers)) { - struct binder_buffer *next = list_entry(buffer->entry.next, - struct binder_buffer, entry); - - if (next->free) { - rb_erase(&next->rb_node, &proc->free_buffers); - binder_delete_free_buffer(proc, next); - } - } - if (proc->buffers.next != &buffer->entry) { - struct binder_buffer *prev = list_entry(buffer->entry.prev, - struct binder_buffer, entry); - - if (prev->free) { - binder_delete_free_buffer(proc, buffer); - rb_erase(&prev->rb_node, &proc->free_buffers); - buffer = prev; - } - } - binder_insert_free_buffer(proc, buffer); + binder_do_set_priority(task, desired, /* verify = */ true); } -static struct binder_node *binder_get_node(struct binder_proc *proc, - binder_uintptr_t ptr) +static void binder_restore_priority(struct task_struct *task, + struct binder_priority desired) +{ + binder_do_set_priority(task, desired, /* verify = */ false); +} + +static void binder_transaction_priority(struct task_struct *task, + struct binder_transaction *t, + struct binder_priority node_prio, + bool inherit_rt) +{ + struct binder_priority desired_prio = t->priority; + + if (t->set_priority_called) + return; + + t->set_priority_called = true; + t->saved_priority.sched_policy = task->policy; + t->saved_priority.prio = task->normal_prio; + + if (!inherit_rt && is_rt_policy(desired_prio.sched_policy)) { + desired_prio.prio = NICE_TO_PRIO(0); + desired_prio.sched_policy = SCHED_NORMAL; + } + + if (node_prio.prio < t->priority.prio || + (node_prio.prio == t->priority.prio && + node_prio.sched_policy == SCHED_FIFO)) { + /* + * In case the minimum priority on the node is + * higher (lower value), use that priority. If + * the priority is the same, but the node uses + * SCHED_FIFO, prefer SCHED_FIFO, since it can + * run unbounded, unlike SCHED_RR. + */ + desired_prio = node_prio; + } + + binder_set_priority(task, desired_prio); +} + +static struct binder_node *binder_get_node_ilocked(struct binder_proc *proc, + binder_uintptr_t ptr) { struct rb_node *n = proc->nodes.rb_node; struct binder_node *node; + assert_spin_locked(&proc->inner_lock); + while (n) { node = rb_entry(n, struct binder_node, rb_node); @@ -876,21 +1286,47 @@ static struct binder_node *binder_get_node(struct binder_proc *proc, n = n->rb_left; else if (ptr > node->ptr) n = n->rb_right; - else + else { + /* + * take an implicit weak reference + * to ensure node stays alive until + * call to binder_put_node() + */ + binder_inc_node_tmpref_ilocked(node); return node; + } } return NULL; } -static struct binder_node *binder_new_node(struct binder_proc *proc, - binder_uintptr_t ptr, - binder_uintptr_t cookie) +static struct binder_node *binder_get_node(struct binder_proc *proc, + binder_uintptr_t ptr) +{ + struct binder_node *node; + + binder_inner_proc_lock(proc); + node = binder_get_node_ilocked(proc, ptr); + binder_inner_proc_unlock(proc); + return node; +} + +static struct binder_node *binder_init_node_ilocked( + struct binder_proc *proc, + struct binder_node *new_node, + struct flat_binder_object *fp) { struct rb_node **p = &proc->nodes.rb_node; struct rb_node *parent = NULL; struct binder_node *node; + binder_uintptr_t ptr = fp ? fp->binder : 0; + binder_uintptr_t cookie = fp ? fp->cookie : 0; + __u32 flags = fp ? fp->flags : 0; + s8 priority; + + assert_spin_locked(&proc->inner_lock); while (*p) { + parent = *p; node = rb_entry(parent, struct binder_node, rb_node); @@ -898,39 +1334,86 @@ static struct binder_node *binder_new_node(struct binder_proc *proc, p = &(*p)->rb_left; else if (ptr > node->ptr) p = &(*p)->rb_right; - else - return NULL; + else { + /* + * A matching node is already in + * the rb tree. Abandon the init + * and return it. + */ + binder_inc_node_tmpref_ilocked(node); + return node; + } } - - node = kzalloc(sizeof(*node), GFP_KERNEL); - if (node == NULL) - return NULL; + node = new_node; binder_stats_created(BINDER_STAT_NODE); + node->tmp_refs++; rb_link_node(&node->rb_node, parent, p); rb_insert_color(&node->rb_node, &proc->nodes); - node->debug_id = ++binder_last_id; + node->debug_id = atomic_inc_return(&binder_last_id); node->proc = proc; node->ptr = ptr; node->cookie = cookie; node->work.type = BINDER_WORK_NODE; + priority = flags & FLAT_BINDER_FLAG_PRIORITY_MASK; + node->sched_policy = (flags & FLAT_BINDER_FLAG_SCHED_POLICY_MASK) >> + FLAT_BINDER_FLAG_SCHED_POLICY_SHIFT; + node->min_priority = to_kernel_prio(node->sched_policy, priority); + node->accept_fds = !!(flags & FLAT_BINDER_FLAG_ACCEPTS_FDS); + node->inherit_rt = !!(flags & FLAT_BINDER_FLAG_INHERIT_RT); + spin_lock_init(&node->lock); INIT_LIST_HEAD(&node->work.entry); INIT_LIST_HEAD(&node->async_todo); binder_debug(BINDER_DEBUG_INTERNAL_REFS, "%d:%d node %d u%016llx c%016llx created\n", proc->pid, current->pid, node->debug_id, (u64)node->ptr, (u64)node->cookie); + return node; } -static int binder_inc_node(struct binder_node *node, int strong, int internal, - struct list_head *target_list) +static struct binder_node *binder_new_node(struct binder_proc *proc, + struct flat_binder_object *fp) { + struct binder_node *node; + struct binder_node *new_node = kzalloc(sizeof(*node), GFP_KERNEL); + + if (!new_node) + return NULL; + binder_inner_proc_lock(proc); + node = binder_init_node_ilocked(proc, new_node, fp); + binder_inner_proc_unlock(proc); + if (node != new_node) + /* + * The node was already added by another thread + */ + kfree(new_node); + + return node; +} + +static void binder_free_node(struct binder_node *node) +{ + kfree(node); + binder_stats_deleted(BINDER_STAT_NODE); +} + +static int binder_inc_node_nilocked(struct binder_node *node, int strong, + int internal, + struct list_head *target_list) +{ + struct binder_proc *proc = node->proc; + + assert_spin_locked(&node->lock); + if (proc) + assert_spin_locked(&proc->inner_lock); if (strong) { if (internal) { if (target_list == NULL && node->internal_strong_refs == 0 && - !(node == binder_context_mgr_node && - node->has_strong_ref)) { + !(node->proc && + node == node->proc->context-> + binder_context_mgr_node && + node->has_strong_ref)) { pr_err("invalid inc strong node for %d\n", node->debug_id); return -EINVAL; @@ -939,8 +1422,19 @@ static int binder_inc_node(struct binder_node *node, int strong, int internal, } else node->local_strong_refs++; if (!node->has_strong_ref && target_list) { - list_del_init(&node->work.entry); - list_add_tail(&node->work.entry, target_list); + binder_dequeue_work_ilocked(&node->work); + /* + * Note: this function is the only place where we queue + * directly to a thread->todo without using the + * corresponding binder_enqueue_thread_work() helper + * functions; in this case it's ok to not set the + * process_todo flag, since we know this node work will + * always be followed by other work that starts queue + * processing: in case of synchronous transactions, a + * BR_REPLY or BR_ERROR; in case of oneway + * transactions, a BR_TRANSACTION_COMPLETE. + */ + binder_enqueue_work_ilocked(&node->work, target_list); } } else { if (!internal) @@ -951,58 +1445,172 @@ static int binder_inc_node(struct binder_node *node, int strong, int internal, node->debug_id); return -EINVAL; } - list_add_tail(&node->work.entry, target_list); + /* + * See comment above + */ + binder_enqueue_work_ilocked(&node->work, target_list); } } return 0; } -static int binder_dec_node(struct binder_node *node, int strong, int internal) +static int binder_inc_node(struct binder_node *node, int strong, int internal, + struct list_head *target_list) { + int ret; + + binder_node_inner_lock(node); + ret = binder_inc_node_nilocked(node, strong, internal, target_list); + binder_node_inner_unlock(node); + + return ret; +} + +static bool binder_dec_node_nilocked(struct binder_node *node, + int strong, int internal) +{ + struct binder_proc *proc = node->proc; + + assert_spin_locked(&node->lock); + if (proc) + assert_spin_locked(&proc->inner_lock); if (strong) { if (internal) node->internal_strong_refs--; else node->local_strong_refs--; if (node->local_strong_refs || node->internal_strong_refs) - return 0; + return false; } else { if (!internal) node->local_weak_refs--; - if (node->local_weak_refs || !hlist_empty(&node->refs)) - return 0; + if (node->local_weak_refs || node->tmp_refs || + !hlist_empty(&node->refs)) + return false; } - if (node->proc && (node->has_strong_ref || node->has_weak_ref)) { + + if (proc && (node->has_strong_ref || node->has_weak_ref)) { if (list_empty(&node->work.entry)) { - list_add_tail(&node->work.entry, &node->proc->todo); - wake_up_interruptible(&node->proc->wait); + binder_enqueue_work_ilocked(&node->work, &proc->todo); + binder_wakeup_proc_ilocked(proc); } } else { if (hlist_empty(&node->refs) && !node->local_strong_refs && - !node->local_weak_refs) { - list_del_init(&node->work.entry); - if (node->proc) { - rb_erase(&node->rb_node, &node->proc->nodes); + !node->local_weak_refs && !node->tmp_refs) { + if (proc) { + binder_dequeue_work_ilocked(&node->work); + rb_erase(&node->rb_node, &proc->nodes); binder_debug(BINDER_DEBUG_INTERNAL_REFS, "refless node %d deleted\n", node->debug_id); } else { + BUG_ON(!list_empty(&node->work.entry)); + spin_lock(&binder_dead_nodes_lock); + /* + * tmp_refs could have changed so + * check it again + */ + if (node->tmp_refs) { + spin_unlock(&binder_dead_nodes_lock); + return false; + } hlist_del(&node->dead_node); + spin_unlock(&binder_dead_nodes_lock); binder_debug(BINDER_DEBUG_INTERNAL_REFS, "dead node %d deleted\n", node->debug_id); } - kfree(node); - binder_stats_deleted(BINDER_STAT_NODE); + return true; } } - - return 0; + return false; } +static void binder_dec_node(struct binder_node *node, int strong, int internal) +{ + bool free_node; -static struct binder_ref *binder_get_ref(struct binder_proc *proc, - u32 desc, bool need_strong_ref) + binder_node_inner_lock(node); + free_node = binder_dec_node_nilocked(node, strong, internal); + binder_node_inner_unlock(node); + if (free_node) + binder_free_node(node); +} + +static void binder_inc_node_tmpref_ilocked(struct binder_node *node) +{ + /* + * No call to binder_inc_node() is needed since we + * don't need to inform userspace of any changes to + * tmp_refs + */ + node->tmp_refs++; +} + +/** + * binder_inc_node_tmpref() - take a temporary reference on node + * @node: node to reference + * + * Take reference on node to prevent the node from being freed + * while referenced only by a local variable. The inner lock is + * needed to serialize with the node work on the queue (which + * isn't needed after the node is dead). If the node is dead + * (node->proc is NULL), use binder_dead_nodes_lock to protect + * node->tmp_refs against dead-node-only cases where the node + * lock cannot be acquired (eg traversing the dead node list to + * print nodes) + */ +static void binder_inc_node_tmpref(struct binder_node *node) +{ + binder_node_lock(node); + if (node->proc) + binder_inner_proc_lock(node->proc); + else + spin_lock(&binder_dead_nodes_lock); + binder_inc_node_tmpref_ilocked(node); + if (node->proc) + binder_inner_proc_unlock(node->proc); + else + spin_unlock(&binder_dead_nodes_lock); + binder_node_unlock(node); +} + +/** + * binder_dec_node_tmpref() - remove a temporary reference on node + * @node: node to reference + * + * Release temporary reference on node taken via binder_inc_node_tmpref() + */ +static void binder_dec_node_tmpref(struct binder_node *node) +{ + bool free_node; + + binder_node_inner_lock(node); + if (!node->proc) + spin_lock(&binder_dead_nodes_lock); + node->tmp_refs--; + BUG_ON(node->tmp_refs < 0); + if (!node->proc) + spin_unlock(&binder_dead_nodes_lock); + /* + * Call binder_dec_node() to check if all refcounts are 0 + * and cleanup is needed. Calling with strong=0 and internal=1 + * causes no actual reference to be released in binder_dec_node(). + * If that changes, a change is needed here too. + */ + free_node = binder_dec_node_nilocked(node, 0, 1); + binder_node_inner_unlock(node); + if (free_node) + binder_free_node(node); +} + +static void binder_put_node(struct binder_node *node) +{ + binder_dec_node_tmpref(node); +} + +static struct binder_ref *binder_get_ref_olocked(struct binder_proc *proc, + u32 desc, bool need_strong_ref) { struct rb_node *n = proc->refs_by_desc.rb_node; struct binder_ref *ref; @@ -1010,11 +1618,11 @@ static struct binder_ref *binder_get_ref(struct binder_proc *proc, while (n) { ref = rb_entry(n, struct binder_ref, rb_node_desc); - if (desc < ref->desc) { + if (desc < ref->data.desc) { n = n->rb_left; - } else if (desc > ref->desc) { + } else if (desc > ref->data.desc) { n = n->rb_right; - } else if (need_strong_ref && !ref->strong) { + } else if (need_strong_ref && !ref->data.strong) { binder_user_error("tried to use weak ref as strong ref\n"); return NULL; } else { @@ -1024,13 +1632,34 @@ static struct binder_ref *binder_get_ref(struct binder_proc *proc, return NULL; } -static struct binder_ref *binder_get_ref_for_node(struct binder_proc *proc, - struct binder_node *node) +/** + * binder_get_ref_for_node_olocked() - get the ref associated with given node + * @proc: binder_proc that owns the ref + * @node: binder_node of target + * @new_ref: newly allocated binder_ref to be initialized or %NULL + * + * Look up the ref for the given node and return it if it exists + * + * If it doesn't exist and the caller provides a newly allocated + * ref, initialize the fields of the newly allocated ref and insert + * into the given proc rb_trees and node refs list. + * + * Return: the ref for node. It is possible that another thread + * allocated/initialized the ref first in which case the + * returned ref would be different than the passed-in + * new_ref. new_ref must be kfree'd by the caller in + * this case. + */ +static struct binder_ref *binder_get_ref_for_node_olocked( + struct binder_proc *proc, + struct binder_node *node, + struct binder_ref *new_ref) { - struct rb_node *n; + struct binder_context *context = proc->context; struct rb_node **p = &proc->refs_by_node.rb_node; struct rb_node *parent = NULL; - struct binder_ref *ref, *new_ref; + struct binder_ref *ref; + struct rb_node *n; while (*p) { parent = *p; @@ -1043,22 +1672,22 @@ static struct binder_ref *binder_get_ref_for_node(struct binder_proc *proc, else return ref; } - new_ref = kzalloc(sizeof(*ref), GFP_KERNEL); - if (new_ref == NULL) + if (!new_ref) return NULL; + binder_stats_created(BINDER_STAT_REF); - new_ref->debug_id = ++binder_last_id; + new_ref->data.debug_id = atomic_inc_return(&binder_last_id); new_ref->proc = proc; new_ref->node = node; rb_link_node(&new_ref->rb_node_node, parent, p); rb_insert_color(&new_ref->rb_node_node, &proc->refs_by_node); - new_ref->desc = (node == binder_context_mgr_node) ? 0 : 1; + new_ref->data.desc = (node == context->binder_context_mgr_node) ? 0 : 1; for (n = rb_first(&proc->refs_by_desc); n != NULL; n = rb_next(n)) { ref = rb_entry(n, struct binder_ref, rb_node_desc); - if (ref->desc > new_ref->desc) + if (ref->data.desc > new_ref->data.desc) break; - new_ref->desc = ref->desc + 1; + new_ref->data.desc = ref->data.desc + 1; } p = &proc->refs_by_desc.rb_node; @@ -1066,121 +1695,423 @@ static struct binder_ref *binder_get_ref_for_node(struct binder_proc *proc, parent = *p; ref = rb_entry(parent, struct binder_ref, rb_node_desc); - if (new_ref->desc < ref->desc) + if (new_ref->data.desc < ref->data.desc) p = &(*p)->rb_left; - else if (new_ref->desc > ref->desc) + else if (new_ref->data.desc > ref->data.desc) p = &(*p)->rb_right; else BUG(); } rb_link_node(&new_ref->rb_node_desc, parent, p); rb_insert_color(&new_ref->rb_node_desc, &proc->refs_by_desc); - if (node) { - hlist_add_head(&new_ref->node_entry, &node->refs); - binder_debug(BINDER_DEBUG_INTERNAL_REFS, - "%d new ref %d desc %d for node %d\n", - proc->pid, new_ref->debug_id, new_ref->desc, - node->debug_id); - } else { - binder_debug(BINDER_DEBUG_INTERNAL_REFS, - "%d new ref %d desc %d for dead node\n", - proc->pid, new_ref->debug_id, new_ref->desc); - } + binder_node_lock(node); + hlist_add_head(&new_ref->node_entry, &node->refs); + + binder_debug(BINDER_DEBUG_INTERNAL_REFS, + "%d new ref %d desc %d for node %d\n", + proc->pid, new_ref->data.debug_id, new_ref->data.desc, + node->debug_id); + binder_node_unlock(node); return new_ref; } -static void binder_delete_ref(struct binder_ref *ref) +static void binder_cleanup_ref_olocked(struct binder_ref *ref) { + bool delete_node = false; + binder_debug(BINDER_DEBUG_INTERNAL_REFS, "%d delete ref %d desc %d for node %d\n", - ref->proc->pid, ref->debug_id, ref->desc, + ref->proc->pid, ref->data.debug_id, ref->data.desc, ref->node->debug_id); rb_erase(&ref->rb_node_desc, &ref->proc->refs_by_desc); rb_erase(&ref->rb_node_node, &ref->proc->refs_by_node); - if (ref->strong) - binder_dec_node(ref->node, 1, 1); + + binder_node_inner_lock(ref->node); + if (ref->data.strong) + binder_dec_node_nilocked(ref->node, 1, 1); + hlist_del(&ref->node_entry); - binder_dec_node(ref->node, 0, 1); + delete_node = binder_dec_node_nilocked(ref->node, 0, 1); + binder_node_inner_unlock(ref->node); + /* + * Clear ref->node unless we want the caller to free the node + */ + if (!delete_node) { + /* + * The caller uses ref->node to determine + * whether the node needs to be freed. Clear + * it since the node is still alive. + */ + ref->node = NULL; + } + if (ref->death) { binder_debug(BINDER_DEBUG_DEAD_BINDER, "%d delete ref %d desc %d has death notification\n", - ref->proc->pid, ref->debug_id, ref->desc); - list_del(&ref->death->work.entry); - kfree(ref->death); + ref->proc->pid, ref->data.debug_id, + ref->data.desc); + binder_dequeue_work(ref->proc, &ref->death->work); binder_stats_deleted(BINDER_STAT_DEATH); } - kfree(ref); binder_stats_deleted(BINDER_STAT_REF); } -static int binder_inc_ref(struct binder_ref *ref, int strong, - struct list_head *target_list) +/** + * binder_inc_ref_olocked() - increment the ref for given handle + * @ref: ref to be incremented + * @strong: if true, strong increment, else weak + * @target_list: list to queue node work on + * + * Increment the ref. @ref->proc->outer_lock must be held on entry + * + * Return: 0, if successful, else errno + */ +static int binder_inc_ref_olocked(struct binder_ref *ref, int strong, + struct list_head *target_list) { int ret; if (strong) { - if (ref->strong == 0) { + if (ref->data.strong == 0) { ret = binder_inc_node(ref->node, 1, 1, target_list); if (ret) return ret; } - ref->strong++; + ref->data.strong++; } else { - if (ref->weak == 0) { + if (ref->data.weak == 0) { ret = binder_inc_node(ref->node, 0, 1, target_list); if (ret) return ret; } - ref->weak++; + ref->data.weak++; } return 0; } - -static int binder_dec_ref(struct binder_ref *ref, int strong) +/** + * binder_dec_ref() - dec the ref for given handle + * @ref: ref to be decremented + * @strong: if true, strong decrement, else weak + * + * Decrement the ref. + * + * Return: true if ref is cleaned up and ready to be freed + */ +static bool binder_dec_ref_olocked(struct binder_ref *ref, int strong) { if (strong) { - if (ref->strong == 0) { + if (ref->data.strong == 0) { binder_user_error("%d invalid dec strong, ref %d desc %d s %d w %d\n", - ref->proc->pid, ref->debug_id, - ref->desc, ref->strong, ref->weak); - return -EINVAL; + ref->proc->pid, ref->data.debug_id, + ref->data.desc, ref->data.strong, + ref->data.weak); + return false; } - ref->strong--; - if (ref->strong == 0) { - int ret; - - ret = binder_dec_node(ref->node, strong, 1); - if (ret) - return ret; - } + ref->data.strong--; + if (ref->data.strong == 0) + binder_dec_node(ref->node, strong, 1); } else { - if (ref->weak == 0) { + if (ref->data.weak == 0) { binder_user_error("%d invalid dec weak, ref %d desc %d s %d w %d\n", - ref->proc->pid, ref->debug_id, - ref->desc, ref->strong, ref->weak); - return -EINVAL; + ref->proc->pid, ref->data.debug_id, + ref->data.desc, ref->data.strong, + ref->data.weak); + return false; } - ref->weak--; + ref->data.weak--; } - if (ref->strong == 0 && ref->weak == 0) - binder_delete_ref(ref); - return 0; + if (ref->data.strong == 0 && ref->data.weak == 0) { + binder_cleanup_ref_olocked(ref); + return true; + } + return false; } -static void binder_pop_transaction(struct binder_thread *target_thread, - struct binder_transaction *t) +/** + * binder_get_node_from_ref() - get the node from the given proc/desc + * @proc: proc containing the ref + * @desc: the handle associated with the ref + * @need_strong_ref: if true, only return node if ref is strong + * @rdata: the id/refcount data for the ref + * + * Given a proc and ref handle, return the associated binder_node + * + * Return: a binder_node or NULL if not found or not strong when strong required + */ +static struct binder_node *binder_get_node_from_ref( + struct binder_proc *proc, + u32 desc, bool need_strong_ref, + struct binder_ref_data *rdata) { - if (target_thread) { - BUG_ON(target_thread->transaction_stack != t); - BUG_ON(target_thread->transaction_stack->from != target_thread); - target_thread->transaction_stack = - target_thread->transaction_stack->from_parent; - t->from = NULL; + struct binder_node *node; + struct binder_ref *ref; + + binder_proc_lock(proc); + ref = binder_get_ref_olocked(proc, desc, need_strong_ref); + if (!ref) + goto err_no_ref; + node = ref->node; + /* + * Take an implicit reference on the node to ensure + * it stays alive until the call to binder_put_node() + */ + binder_inc_node_tmpref(node); + if (rdata) + *rdata = ref->data; + binder_proc_unlock(proc); + + return node; + +err_no_ref: + binder_proc_unlock(proc); + return NULL; +} + +/** + * binder_free_ref() - free the binder_ref + * @ref: ref to free + * + * Free the binder_ref. Free the binder_node indicated by ref->node + * (if non-NULL) and the binder_ref_death indicated by ref->death. + */ +static void binder_free_ref(struct binder_ref *ref) +{ + if (ref->node) + binder_free_node(ref->node); + kfree(ref->death); + kfree(ref); +} + +/** + * binder_update_ref_for_handle() - inc/dec the ref for given handle + * @proc: proc containing the ref + * @desc: the handle associated with the ref + * @increment: true=inc reference, false=dec reference + * @strong: true=strong reference, false=weak reference + * @rdata: the id/refcount data for the ref + * + * Given a proc and ref handle, increment or decrement the ref + * according to "increment" arg. + * + * Return: 0 if successful, else errno + */ +static int binder_update_ref_for_handle(struct binder_proc *proc, + uint32_t desc, bool increment, bool strong, + struct binder_ref_data *rdata) +{ + int ret = 0; + struct binder_ref *ref; + bool delete_ref = false; + + binder_proc_lock(proc); + ref = binder_get_ref_olocked(proc, desc, strong); + if (!ref) { + ret = -EINVAL; + goto err_no_ref; } - t->need_reply = 0; + if (increment) + ret = binder_inc_ref_olocked(ref, strong, NULL); + else + delete_ref = binder_dec_ref_olocked(ref, strong); + + if (rdata) + *rdata = ref->data; + binder_proc_unlock(proc); + + if (delete_ref) + binder_free_ref(ref); + return ret; + +err_no_ref: + binder_proc_unlock(proc); + return ret; +} + +/** + * binder_dec_ref_for_handle() - dec the ref for given handle + * @proc: proc containing the ref + * @desc: the handle associated with the ref + * @strong: true=strong reference, false=weak reference + * @rdata: the id/refcount data for the ref + * + * Just calls binder_update_ref_for_handle() to decrement the ref. + * + * Return: 0 if successful, else errno + */ +static int binder_dec_ref_for_handle(struct binder_proc *proc, + uint32_t desc, bool strong, struct binder_ref_data *rdata) +{ + return binder_update_ref_for_handle(proc, desc, false, strong, rdata); +} + + +/** + * binder_inc_ref_for_node() - increment the ref for given proc/node + * @proc: proc containing the ref + * @node: target node + * @strong: true=strong reference, false=weak reference + * @target_list: worklist to use if node is incremented + * @rdata: the id/refcount data for the ref + * + * Given a proc and node, increment the ref. Create the ref if it + * doesn't already exist + * + * Return: 0 if successful, else errno + */ +static int binder_inc_ref_for_node(struct binder_proc *proc, + struct binder_node *node, + bool strong, + struct list_head *target_list, + struct binder_ref_data *rdata) +{ + struct binder_ref *ref; + struct binder_ref *new_ref = NULL; + int ret = 0; + + binder_proc_lock(proc); + ref = binder_get_ref_for_node_olocked(proc, node, NULL); + if (!ref) { + binder_proc_unlock(proc); + new_ref = kzalloc(sizeof(*ref), GFP_KERNEL); + if (!new_ref) + return -ENOMEM; + binder_proc_lock(proc); + ref = binder_get_ref_for_node_olocked(proc, node, new_ref); + } + ret = binder_inc_ref_olocked(ref, strong, target_list); + *rdata = ref->data; + binder_proc_unlock(proc); + if (new_ref && ref != new_ref) + /* + * Another thread created the ref first so + * free the one we allocated + */ + kfree(new_ref); + return ret; +} + +static void binder_pop_transaction_ilocked(struct binder_thread *target_thread, + struct binder_transaction *t) +{ + BUG_ON(!target_thread); + assert_spin_locked(&target_thread->proc->inner_lock); + BUG_ON(target_thread->transaction_stack != t); + BUG_ON(target_thread->transaction_stack->from != target_thread); + target_thread->transaction_stack = + target_thread->transaction_stack->from_parent; + t->from = NULL; +} + +/** + * binder_thread_dec_tmpref() - decrement thread->tmp_ref + * @thread: thread to decrement + * + * A thread needs to be kept alive while being used to create or + * handle a transaction. binder_get_txn_from() is used to safely + * extract t->from from a binder_transaction and keep the thread + * indicated by t->from from being freed. When done with that + * binder_thread, this function is called to decrement the + * tmp_ref and free if appropriate (thread has been released + * and no transaction being processed by the driver) + */ +static void binder_thread_dec_tmpref(struct binder_thread *thread) +{ + /* + * atomic is used to protect the counter value while + * it cannot reach zero or thread->is_dead is false + */ + binder_inner_proc_lock(thread->proc); + atomic_dec(&thread->tmp_ref); + if (thread->is_dead && !atomic_read(&thread->tmp_ref)) { + binder_inner_proc_unlock(thread->proc); + binder_free_thread(thread); + return; + } + binder_inner_proc_unlock(thread->proc); +} + +/** + * binder_proc_dec_tmpref() - decrement proc->tmp_ref + * @proc: proc to decrement + * + * A binder_proc needs to be kept alive while being used to create or + * handle a transaction. proc->tmp_ref is incremented when + * creating a new transaction or the binder_proc is currently in-use + * by threads that are being released. When done with the binder_proc, + * this function is called to decrement the counter and free the + * proc if appropriate (proc has been released, all threads have + * been released and not currenly in-use to process a transaction). + */ +static void binder_proc_dec_tmpref(struct binder_proc *proc) +{ + binder_inner_proc_lock(proc); + proc->tmp_ref--; + if (proc->is_dead && RB_EMPTY_ROOT(&proc->threads) && + !proc->tmp_ref) { + binder_inner_proc_unlock(proc); + binder_free_proc(proc); + return; + } + binder_inner_proc_unlock(proc); +} + +/** + * binder_get_txn_from() - safely extract the "from" thread in transaction + * @t: binder transaction for t->from + * + * Atomically return the "from" thread and increment the tmp_ref + * count for the thread to ensure it stays alive until + * binder_thread_dec_tmpref() is called. + * + * Return: the value of t->from + */ +static struct binder_thread *binder_get_txn_from( + struct binder_transaction *t) +{ + struct binder_thread *from; + + spin_lock(&t->lock); + from = t->from; + if (from) + atomic_inc(&from->tmp_ref); + spin_unlock(&t->lock); + return from; +} + +/** + * binder_get_txn_from_and_acq_inner() - get t->from and acquire inner lock + * @t: binder transaction for t->from + * + * Same as binder_get_txn_from() except it also acquires the proc->inner_lock + * to guarantee that the thread cannot be released while operating on it. + * The caller must call binder_inner_proc_unlock() to release the inner lock + * as well as call binder_dec_thread_txn() to release the reference. + * + * Return: the value of t->from + */ +static struct binder_thread *binder_get_txn_from_and_acq_inner( + struct binder_transaction *t) +{ + struct binder_thread *from; + + from = binder_get_txn_from(t); + if (!from) + return NULL; + binder_inner_proc_lock(from->proc); + if (t->from) { + BUG_ON(from != t->from); + return from; + } + binder_inner_proc_unlock(from->proc); + binder_thread_dec_tmpref(from); + return NULL; +} + +static void binder_free_transaction(struct binder_transaction *t) +{ if (t->buffer) t->buffer->transaction = NULL; kfree(t); @@ -1195,30 +2126,34 @@ static void binder_send_failed_reply(struct binder_transaction *t, BUG_ON(t->flags & TF_ONE_WAY); while (1) { - target_thread = t->from; + target_thread = binder_get_txn_from_and_acq_inner(t); if (target_thread) { - if (target_thread->return_error != BR_OK && - target_thread->return_error2 == BR_OK) { - target_thread->return_error2 = - target_thread->return_error; - target_thread->return_error = BR_OK; - } - if (target_thread->return_error == BR_OK) { - binder_debug(BINDER_DEBUG_FAILED_TRANSACTION, - "send failed reply for transaction %d to %d:%d\n", - t->debug_id, - target_thread->proc->pid, - target_thread->pid); + binder_debug(BINDER_DEBUG_FAILED_TRANSACTION, + "send failed reply for transaction %d to %d:%d\n", + t->debug_id, + target_thread->proc->pid, + target_thread->pid); - binder_pop_transaction(target_thread, t); - target_thread->return_error = error_code; + binder_pop_transaction_ilocked(target_thread, t); + if (target_thread->reply_error.cmd == BR_OK) { + target_thread->reply_error.cmd = error_code; + binder_enqueue_thread_work_ilocked( + target_thread, + &target_thread->reply_error.work); wake_up_interruptible(&target_thread->wait); } else { - pr_err("reply failed, target thread, %d:%d, has error code %d already\n", - target_thread->proc->pid, - target_thread->pid, - target_thread->return_error); + /* + * Cannot get here for normal operation, but + * we can if multiple synchronous transactions + * are sent without blocking for responses. + * Just ignore the 2nd error in this case. + */ + pr_warn("Unexpected reply error: %u\n", + target_thread->reply_error.cmd); } + binder_inner_proc_unlock(target_thread->proc); + binder_thread_dec_tmpref(target_thread); + binder_free_transaction(t); return; } next = t->from_parent; @@ -1227,7 +2162,7 @@ static void binder_send_failed_reply(struct binder_transaction *t, "send failed reply for transaction %d, target dead\n", t->debug_id); - binder_pop_transaction(target_thread, t); + binder_free_transaction(t); if (next == NULL) { binder_debug(BINDER_DEBUG_DEAD_BINDER, "reply failed, no target thread at root\n"); @@ -1240,43 +2175,212 @@ static void binder_send_failed_reply(struct binder_transaction *t, } } +/** + * binder_cleanup_transaction() - cleans up undelivered transaction + * @t: transaction that needs to be cleaned up + * @reason: reason the transaction wasn't delivered + * @error_code: error to return to caller (if synchronous call) + */ +static void binder_cleanup_transaction(struct binder_transaction *t, + const char *reason, + uint32_t error_code) +{ + if (t->buffer->target_node && !(t->flags & TF_ONE_WAY)) { + binder_send_failed_reply(t, error_code); + } else { + binder_debug(BINDER_DEBUG_DEAD_TRANSACTION, + "undelivered transaction %d, %s\n", + t->debug_id, reason); + binder_free_transaction(t); + } +} + +/** + * binder_validate_object() - checks for a valid metadata object in a buffer. + * @buffer: binder_buffer that we're parsing. + * @offset: offset in the buffer at which to validate an object. + * + * Return: If there's a valid metadata object at @offset in @buffer, the + * size of that object. Otherwise, it returns zero. + */ +static size_t binder_validate_object(struct binder_buffer *buffer, u64 offset) +{ + /* Check if we can read a header first */ + struct binder_object_header *hdr; + size_t object_size = 0; + + if (buffer->data_size < sizeof(*hdr) || + offset > buffer->data_size - sizeof(*hdr) || + !IS_ALIGNED(offset, sizeof(u32))) + return 0; + + /* Ok, now see if we can read a complete object. */ + hdr = (struct binder_object_header *)(buffer->data + offset); + switch (hdr->type) { + case BINDER_TYPE_BINDER: + case BINDER_TYPE_WEAK_BINDER: + case BINDER_TYPE_HANDLE: + case BINDER_TYPE_WEAK_HANDLE: + object_size = sizeof(struct flat_binder_object); + break; + case BINDER_TYPE_FD: + object_size = sizeof(struct binder_fd_object); + break; + case BINDER_TYPE_PTR: + object_size = sizeof(struct binder_buffer_object); + break; + case BINDER_TYPE_FDA: + object_size = sizeof(struct binder_fd_array_object); + break; + default: + return 0; + } + if (offset <= buffer->data_size - object_size && + buffer->data_size >= object_size) + return object_size; + else + return 0; +} + +/** + * binder_validate_ptr() - validates binder_buffer_object in a binder_buffer. + * @b: binder_buffer containing the object + * @index: index in offset array at which the binder_buffer_object is + * located + * @start: points to the start of the offset array + * @num_valid: the number of valid offsets in the offset array + * + * Return: If @index is within the valid range of the offset array + * described by @start and @num_valid, and if there's a valid + * binder_buffer_object at the offset found in index @index + * of the offset array, that object is returned. Otherwise, + * %NULL is returned. + * Note that the offset found in index @index itself is not + * verified; this function assumes that @num_valid elements + * from @start were previously verified to have valid offsets. + */ +static struct binder_buffer_object *binder_validate_ptr(struct binder_buffer *b, + binder_size_t index, + binder_size_t *start, + binder_size_t num_valid) +{ + struct binder_buffer_object *buffer_obj; + binder_size_t *offp; + + if (index >= num_valid) + return NULL; + + offp = start + index; + buffer_obj = (struct binder_buffer_object *)(b->data + *offp); + if (buffer_obj->hdr.type != BINDER_TYPE_PTR) + return NULL; + + return buffer_obj; +} + +/** + * binder_validate_fixup() - validates pointer/fd fixups happen in order. + * @b: transaction buffer + * @objects_start start of objects buffer + * @buffer: binder_buffer_object in which to fix up + * @offset: start offset in @buffer to fix up + * @last_obj: last binder_buffer_object that we fixed up in + * @last_min_offset: minimum fixup offset in @last_obj + * + * Return: %true if a fixup in buffer @buffer at offset @offset is + * allowed. + * + * For safety reasons, we only allow fixups inside a buffer to happen + * at increasing offsets; additionally, we only allow fixup on the last + * buffer object that was verified, or one of its parents. + * + * Example of what is allowed: + * + * A + * B (parent = A, offset = 0) + * C (parent = A, offset = 16) + * D (parent = C, offset = 0) + * E (parent = A, offset = 32) // min_offset is 16 (C.parent_offset) + * + * Examples of what is not allowed: + * + * Decreasing offsets within the same parent: + * A + * C (parent = A, offset = 16) + * B (parent = A, offset = 0) // decreasing offset within A + * + * Referring to a parent that wasn't the last object or any of its parents: + * A + * B (parent = A, offset = 0) + * C (parent = A, offset = 0) + * C (parent = A, offset = 16) + * D (parent = B, offset = 0) // B is not A or any of A's parents + */ +static bool binder_validate_fixup(struct binder_buffer *b, + binder_size_t *objects_start, + struct binder_buffer_object *buffer, + binder_size_t fixup_offset, + struct binder_buffer_object *last_obj, + binder_size_t last_min_offset) +{ + if (!last_obj) { + /* Nothing to fix up in */ + return false; + } + + while (last_obj != buffer) { + /* + * Safe to retrieve the parent of last_obj, since it + * was already previously verified by the driver. + */ + if ((last_obj->flags & BINDER_BUFFER_FLAG_HAS_PARENT) == 0) + return false; + last_min_offset = last_obj->parent_offset + sizeof(uintptr_t); + last_obj = (struct binder_buffer_object *) + (b->data + *(objects_start + last_obj->parent)); + } + return (fixup_offset >= last_min_offset); +} + static void binder_transaction_buffer_release(struct binder_proc *proc, struct binder_buffer *buffer, binder_size_t *failed_at) { - binder_size_t *offp, *off_end; + binder_size_t *offp, *off_start, *off_end; int debug_id = buffer->debug_id; binder_debug(BINDER_DEBUG_TRANSACTION, - "%d buffer release %d, size %zd-%zd, failed at %p\n", + "%d buffer release %d, size %zd-%zd, failed at %pK\n", proc->pid, buffer->debug_id, buffer->data_size, buffer->offsets_size, failed_at); if (buffer->target_node) binder_dec_node(buffer->target_node, 1, 0); - offp = (binder_size_t *)(buffer->data + - ALIGN(buffer->data_size, sizeof(void *))); + off_start = (binder_size_t *)(buffer->data + + ALIGN(buffer->data_size, sizeof(void *))); if (failed_at) off_end = failed_at; else - off_end = (void *)offp + buffer->offsets_size; - for (; offp < off_end; offp++) { - struct flat_binder_object *fp; + off_end = (void *)off_start + buffer->offsets_size; + for (offp = off_start; offp < off_end; offp++) { + struct binder_object_header *hdr; + size_t object_size = binder_validate_object(buffer, *offp); - if (*offp > buffer->data_size - sizeof(*fp) || - buffer->data_size < sizeof(*fp) || - !IS_ALIGNED(*offp, sizeof(u32))) { - pr_err("transaction release %d bad offset %lld, size %zd\n", + if (object_size == 0) { + pr_err("transaction release %d bad object at offset %lld, size %zd\n", debug_id, (u64)*offp, buffer->data_size); continue; } - fp = (struct flat_binder_object *)(buffer->data + *offp); - switch (fp->type) { + hdr = (struct binder_object_header *)(buffer->data + *offp); + switch (hdr->type) { case BINDER_TYPE_BINDER: case BINDER_TYPE_WEAK_BINDER: { - struct binder_node *node = binder_get_node(proc, fp->binder); + struct flat_binder_object *fp; + struct binder_node *node; + fp = to_flat_binder_object(hdr); + node = binder_get_node(proc, fp->binder); if (node == NULL) { pr_err("transaction release %d bad node %016llx\n", debug_id, (u64)fp->binder); @@ -1285,90 +2389,560 @@ static void binder_transaction_buffer_release(struct binder_proc *proc, binder_debug(BINDER_DEBUG_TRANSACTION, " node %d u%016llx\n", node->debug_id, (u64)node->ptr); - binder_dec_node(node, fp->type == BINDER_TYPE_BINDER, 0); + binder_dec_node(node, hdr->type == BINDER_TYPE_BINDER, + 0); + binder_put_node(node); } break; case BINDER_TYPE_HANDLE: case BINDER_TYPE_WEAK_HANDLE: { - struct binder_ref *ref; + struct flat_binder_object *fp; + struct binder_ref_data rdata; + int ret; - ref = binder_get_ref(proc, fp->handle, - fp->type == BINDER_TYPE_HANDLE); + fp = to_flat_binder_object(hdr); + ret = binder_dec_ref_for_handle(proc, fp->handle, + hdr->type == BINDER_TYPE_HANDLE, &rdata); - if (ref == NULL) { - pr_err("transaction release %d bad handle %d\n", - debug_id, fp->handle); + if (ret) { + pr_err("transaction release %d bad handle %d, ret = %d\n", + debug_id, fp->handle, ret); break; } binder_debug(BINDER_DEBUG_TRANSACTION, - " ref %d desc %d (node %d)\n", - ref->debug_id, ref->desc, ref->node->debug_id); - binder_dec_ref(ref, fp->type == BINDER_TYPE_HANDLE); + " ref %d desc %d\n", + rdata.debug_id, rdata.desc); } break; - case BINDER_TYPE_FD: - binder_debug(BINDER_DEBUG_TRANSACTION, - " fd %d\n", fp->handle); - if (failed_at) - task_close_fd(proc, fp->handle); - break; + case BINDER_TYPE_FD: { + struct binder_fd_object *fp = to_binder_fd_object(hdr); + binder_debug(BINDER_DEBUG_TRANSACTION, + " fd %d\n", fp->fd); + if (failed_at) + task_close_fd(proc, fp->fd); + } break; + case BINDER_TYPE_PTR: + /* + * Nothing to do here, this will get cleaned up when the + * transaction buffer gets freed + */ + break; + case BINDER_TYPE_FDA: { + struct binder_fd_array_object *fda; + struct binder_buffer_object *parent; + uintptr_t parent_buffer; + u32 *fd_array; + size_t fd_index; + binder_size_t fd_buf_size; + + fda = to_binder_fd_array_object(hdr); + parent = binder_validate_ptr(buffer, fda->parent, + off_start, + offp - off_start); + if (!parent) { + pr_err("transaction release %d bad parent offset", + debug_id); + continue; + } + /* + * Since the parent was already fixed up, convert it + * back to kernel address space to access it + */ + parent_buffer = parent->buffer - + binder_alloc_get_user_buffer_offset( + &proc->alloc); + + fd_buf_size = sizeof(u32) * fda->num_fds; + if (fda->num_fds >= SIZE_MAX / sizeof(u32)) { + pr_err("transaction release %d invalid number of fds (%lld)\n", + debug_id, (u64)fda->num_fds); + continue; + } + if (fd_buf_size > parent->length || + fda->parent_offset > parent->length - fd_buf_size) { + /* No space for all file descriptors here. */ + pr_err("transaction release %d not enough space for %lld fds in buffer\n", + debug_id, (u64)fda->num_fds); + continue; + } + fd_array = (u32 *)(parent_buffer + (uintptr_t)fda->parent_offset); + for (fd_index = 0; fd_index < fda->num_fds; fd_index++) + task_close_fd(proc, fd_array[fd_index]); + } break; default: pr_err("transaction release %d bad object type %x\n", - debug_id, fp->type); + debug_id, hdr->type); break; } } } +static int binder_translate_binder(struct flat_binder_object *fp, + struct binder_transaction *t, + struct binder_thread *thread) +{ + struct binder_node *node; + struct binder_proc *proc = thread->proc; + struct binder_proc *target_proc = t->to_proc; + struct binder_ref_data rdata; + int ret = 0; + + node = binder_get_node(proc, fp->binder); + if (!node) { + node = binder_new_node(proc, fp); + if (!node) + return -ENOMEM; + } + if (fp->cookie != node->cookie) { + binder_user_error("%d:%d sending u%016llx node %d, cookie mismatch %016llx != %016llx\n", + proc->pid, thread->pid, (u64)fp->binder, + node->debug_id, (u64)fp->cookie, + (u64)node->cookie); + ret = -EINVAL; + goto done; + } + if (security_binder_transfer_binder(proc->tsk, target_proc->tsk)) { + ret = -EPERM; + goto done; + } + + ret = binder_inc_ref_for_node(target_proc, node, + fp->hdr.type == BINDER_TYPE_BINDER, + &thread->todo, &rdata); + if (ret) + goto done; + + if (fp->hdr.type == BINDER_TYPE_BINDER) + fp->hdr.type = BINDER_TYPE_HANDLE; + else + fp->hdr.type = BINDER_TYPE_WEAK_HANDLE; + fp->binder = 0; + fp->handle = rdata.desc; + fp->cookie = 0; + + trace_binder_transaction_node_to_ref(t, node, &rdata); + binder_debug(BINDER_DEBUG_TRANSACTION, + " node %d u%016llx -> ref %d desc %d\n", + node->debug_id, (u64)node->ptr, + rdata.debug_id, rdata.desc); +done: + binder_put_node(node); + return ret; +} + +static int binder_translate_handle(struct flat_binder_object *fp, + struct binder_transaction *t, + struct binder_thread *thread) +{ + struct binder_proc *proc = thread->proc; + struct binder_proc *target_proc = t->to_proc; + struct binder_node *node; + struct binder_ref_data src_rdata; + int ret = 0; + + node = binder_get_node_from_ref(proc, fp->handle, + fp->hdr.type == BINDER_TYPE_HANDLE, &src_rdata); + if (!node) { + binder_user_error("%d:%d got transaction with invalid handle, %d\n", + proc->pid, thread->pid, fp->handle); + return -EINVAL; + } + if (security_binder_transfer_binder(proc->tsk, target_proc->tsk)) { + ret = -EPERM; + goto done; + } + + binder_node_lock(node); + if (node->proc == target_proc) { + if (fp->hdr.type == BINDER_TYPE_HANDLE) + fp->hdr.type = BINDER_TYPE_BINDER; + else + fp->hdr.type = BINDER_TYPE_WEAK_BINDER; + fp->binder = node->ptr; + fp->cookie = node->cookie; + if (node->proc) + binder_inner_proc_lock(node->proc); + binder_inc_node_nilocked(node, + fp->hdr.type == BINDER_TYPE_BINDER, + 0, NULL); + if (node->proc) + binder_inner_proc_unlock(node->proc); + trace_binder_transaction_ref_to_node(t, node, &src_rdata); + binder_debug(BINDER_DEBUG_TRANSACTION, + " ref %d desc %d -> node %d u%016llx\n", + src_rdata.debug_id, src_rdata.desc, node->debug_id, + (u64)node->ptr); + binder_node_unlock(node); + } else { + struct binder_ref_data dest_rdata; + + binder_node_unlock(node); + ret = binder_inc_ref_for_node(target_proc, node, + fp->hdr.type == BINDER_TYPE_HANDLE, + NULL, &dest_rdata); + if (ret) + goto done; + + fp->binder = 0; + fp->handle = dest_rdata.desc; + fp->cookie = 0; + trace_binder_transaction_ref_to_ref(t, node, &src_rdata, + &dest_rdata); + binder_debug(BINDER_DEBUG_TRANSACTION, + " ref %d desc %d -> ref %d desc %d (node %d)\n", + src_rdata.debug_id, src_rdata.desc, + dest_rdata.debug_id, dest_rdata.desc, + node->debug_id); + } +done: + binder_put_node(node); + return ret; +} + +static int binder_translate_fd(int fd, + struct binder_transaction *t, + struct binder_thread *thread, + struct binder_transaction *in_reply_to) +{ + struct binder_proc *proc = thread->proc; + struct binder_proc *target_proc = t->to_proc; + int target_fd; + struct file *file; + int ret; + bool target_allows_fd; + + if (in_reply_to) + target_allows_fd = !!(in_reply_to->flags & TF_ACCEPT_FDS); + else + target_allows_fd = t->buffer->target_node->accept_fds; + if (!target_allows_fd) { + binder_user_error("%d:%d got %s with fd, %d, but target does not allow fds\n", + proc->pid, thread->pid, + in_reply_to ? "reply" : "transaction", + fd); + ret = -EPERM; + goto err_fd_not_accepted; + } + + file = fget(fd); + if (!file) { + binder_user_error("%d:%d got transaction with invalid fd, %d\n", + proc->pid, thread->pid, fd); + ret = -EBADF; + goto err_fget; + } + ret = security_binder_transfer_file(proc->tsk, target_proc->tsk, file); + if (ret < 0) { + ret = -EPERM; + goto err_security; + } + + target_fd = task_get_unused_fd_flags(target_proc, O_CLOEXEC); + if (target_fd < 0) { + ret = -ENOMEM; + goto err_get_unused_fd; + } + task_fd_install(target_proc, target_fd, file); + trace_binder_transaction_fd(t, fd, target_fd); + binder_debug(BINDER_DEBUG_TRANSACTION, " fd %d -> %d\n", + fd, target_fd); + + return target_fd; + +err_get_unused_fd: +err_security: + fput(file); +err_fget: +err_fd_not_accepted: + return ret; +} + +static int binder_translate_fd_array(struct binder_fd_array_object *fda, + struct binder_buffer_object *parent, + struct binder_transaction *t, + struct binder_thread *thread, + struct binder_transaction *in_reply_to) +{ + binder_size_t fdi, fd_buf_size, num_installed_fds; + int target_fd; + uintptr_t parent_buffer; + u32 *fd_array; + struct binder_proc *proc = thread->proc; + struct binder_proc *target_proc = t->to_proc; + + fd_buf_size = sizeof(u32) * fda->num_fds; + if (fda->num_fds >= SIZE_MAX / sizeof(u32)) { + binder_user_error("%d:%d got transaction with invalid number of fds (%lld)\n", + proc->pid, thread->pid, (u64)fda->num_fds); + return -EINVAL; + } + if (fd_buf_size > parent->length || + fda->parent_offset > parent->length - fd_buf_size) { + /* No space for all file descriptors here. */ + binder_user_error("%d:%d not enough space to store %lld fds in buffer\n", + proc->pid, thread->pid, (u64)fda->num_fds); + return -EINVAL; + } + /* + * Since the parent was already fixed up, convert it + * back to the kernel address space to access it + */ + parent_buffer = parent->buffer - + binder_alloc_get_user_buffer_offset(&target_proc->alloc); + fd_array = (u32 *)(parent_buffer + (uintptr_t)fda->parent_offset); + if (!IS_ALIGNED((unsigned long)fd_array, sizeof(u32))) { + binder_user_error("%d:%d parent offset not aligned correctly.\n", + proc->pid, thread->pid); + return -EINVAL; + } + for (fdi = 0; fdi < fda->num_fds; fdi++) { + target_fd = binder_translate_fd(fd_array[fdi], t, thread, + in_reply_to); + if (target_fd < 0) + goto err_translate_fd_failed; + fd_array[fdi] = target_fd; + } + return 0; + +err_translate_fd_failed: + /* + * Failed to allocate fd or security error, free fds + * installed so far. + */ + num_installed_fds = fdi; + for (fdi = 0; fdi < num_installed_fds; fdi++) + task_close_fd(target_proc, fd_array[fdi]); + return target_fd; +} + +static int binder_fixup_parent(struct binder_transaction *t, + struct binder_thread *thread, + struct binder_buffer_object *bp, + binder_size_t *off_start, + binder_size_t num_valid, + struct binder_buffer_object *last_fixup_obj, + binder_size_t last_fixup_min_off) +{ + struct binder_buffer_object *parent; + u8 *parent_buffer; + struct binder_buffer *b = t->buffer; + struct binder_proc *proc = thread->proc; + struct binder_proc *target_proc = t->to_proc; + + if (!(bp->flags & BINDER_BUFFER_FLAG_HAS_PARENT)) + return 0; + + parent = binder_validate_ptr(b, bp->parent, off_start, num_valid); + if (!parent) { + binder_user_error("%d:%d got transaction with invalid parent offset or type\n", + proc->pid, thread->pid); + return -EINVAL; + } + + if (!binder_validate_fixup(b, off_start, + parent, bp->parent_offset, + last_fixup_obj, + last_fixup_min_off)) { + binder_user_error("%d:%d got transaction with out-of-order buffer fixup\n", + proc->pid, thread->pid); + return -EINVAL; + } + + if (parent->length < sizeof(binder_uintptr_t) || + bp->parent_offset > parent->length - sizeof(binder_uintptr_t)) { + /* No space for a pointer here! */ + binder_user_error("%d:%d got transaction with invalid parent offset\n", + proc->pid, thread->pid); + return -EINVAL; + } + parent_buffer = (u8 *)((uintptr_t)parent->buffer - + binder_alloc_get_user_buffer_offset( + &target_proc->alloc)); + *(binder_uintptr_t *)(parent_buffer + bp->parent_offset) = bp->buffer; + + return 0; +} + +/** + * binder_proc_transaction() - sends a transaction to a process and wakes it up + * @t: transaction to send + * @proc: process to send the transaction to + * @thread: thread in @proc to send the transaction to (may be NULL) + * + * This function queues a transaction to the specified process. It will try + * to find a thread in the target process to handle the transaction and + * wake it up. If no thread is found, the work is queued to the proc + * waitqueue. + * + * If the @thread parameter is not NULL, the transaction is always queued + * to the waitlist of that specific thread. + * + * Return: true if the transactions was successfully queued + * false if the target process or thread is dead + */ +static bool binder_proc_transaction(struct binder_transaction *t, + struct binder_proc *proc, + struct binder_thread *thread) +{ + struct binder_node *node = t->buffer->target_node; + struct binder_priority node_prio; + bool oneway = !!(t->flags & TF_ONE_WAY); + bool pending_async = false; + + BUG_ON(!node); + binder_node_lock(node); + node_prio.prio = node->min_priority; + node_prio.sched_policy = node->sched_policy; + + if (oneway) { + BUG_ON(thread); + if (node->has_async_transaction) { + pending_async = true; + } else { + node->has_async_transaction = true; + } + } + + binder_inner_proc_lock(proc); + + if (proc->is_dead || (thread && thread->is_dead)) { + binder_inner_proc_unlock(proc); + binder_node_unlock(node); + return false; + } + + if (!thread && !pending_async) + thread = binder_select_thread_ilocked(proc); + + if (thread) { + binder_transaction_priority(thread->task, t, node_prio, + node->inherit_rt); + binder_enqueue_thread_work_ilocked(thread, &t->work); + } else if (!pending_async) { + binder_enqueue_work_ilocked(&t->work, &proc->todo); + } else { + binder_enqueue_work_ilocked(&t->work, &node->async_todo); + } + + if (!pending_async) + binder_wakeup_thread_ilocked(proc, thread, !oneway /* sync */); + + binder_inner_proc_unlock(proc); + binder_node_unlock(node); + + return true; +} + +/** + * binder_get_node_refs_for_txn() - Get required refs on node for txn + * @node: struct binder_node for which to get refs + * @proc: returns @node->proc if valid + * @error: if no @proc then returns BR_DEAD_REPLY + * + * User-space normally keeps the node alive when creating a transaction + * since it has a reference to the target. The local strong ref keeps it + * alive if the sending process dies before the target process processes + * the transaction. If the source process is malicious or has a reference + * counting bug, relying on the local strong ref can fail. + * + * Since user-space can cause the local strong ref to go away, we also take + * a tmpref on the node to ensure it survives while we are constructing + * the transaction. We also need a tmpref on the proc while we are + * constructing the transaction, so we take that here as well. + * + * Return: The target_node with refs taken or NULL if no @node->proc is NULL. + * Also sets @proc if valid. If the @node->proc is NULL indicating that the + * target proc has died, @error is set to BR_DEAD_REPLY + */ +static struct binder_node *binder_get_node_refs_for_txn( + struct binder_node *node, + struct binder_proc **procp, + uint32_t *error) +{ + struct binder_node *target_node = NULL; + + binder_node_inner_lock(node); + if (node->proc) { + target_node = node; + binder_inc_node_nilocked(node, 1, 0, NULL); + binder_inc_node_tmpref_ilocked(node); + node->proc->tmp_ref++; + *procp = node->proc; + } else + *error = BR_DEAD_REPLY; + binder_node_inner_unlock(node); + + return target_node; +} + static void binder_transaction(struct binder_proc *proc, struct binder_thread *thread, - struct binder_transaction_data *tr, int reply) + struct binder_transaction_data *tr, int reply, + binder_size_t extra_buffers_size) { + int ret; struct binder_transaction *t; struct binder_work *tcomplete; - binder_size_t *offp, *off_end; + binder_size_t *offp, *off_end, *off_start; binder_size_t off_min; - struct binder_proc *target_proc; + u8 *sg_bufp, *sg_buf_end; + struct binder_proc *target_proc = NULL; struct binder_thread *target_thread = NULL; struct binder_node *target_node = NULL; - struct list_head *target_list; - wait_queue_head_t *target_wait; struct binder_transaction *in_reply_to = NULL; struct binder_transaction_log_entry *e; - uint32_t return_error; + uint32_t return_error = 0; + uint32_t return_error_param = 0; + uint32_t return_error_line = 0; + struct binder_buffer_object *last_fixup_obj = NULL; + binder_size_t last_fixup_min_off = 0; + struct binder_context *context = proc->context; + int t_debug_id = atomic_inc_return(&binder_last_id); e = binder_transaction_log_add(&binder_transaction_log); + e->debug_id = t_debug_id; e->call_type = reply ? 2 : !!(tr->flags & TF_ONE_WAY); e->from_proc = proc->pid; e->from_thread = thread->pid; e->target_handle = tr->target.handle; e->data_size = tr->data_size; e->offsets_size = tr->offsets_size; + e->context_name = proc->context->name; if (reply) { + binder_inner_proc_lock(proc); in_reply_to = thread->transaction_stack; if (in_reply_to == NULL) { + binder_inner_proc_unlock(proc); binder_user_error("%d:%d got reply transaction with no transaction stack\n", proc->pid, thread->pid); return_error = BR_FAILED_REPLY; + return_error_param = -EPROTO; + return_error_line = __LINE__; goto err_empty_call_stack; } - binder_set_nice(in_reply_to->saved_priority); if (in_reply_to->to_thread != thread) { + spin_lock(&in_reply_to->lock); binder_user_error("%d:%d got reply transaction with bad transaction stack, transaction %d has target %d:%d\n", proc->pid, thread->pid, in_reply_to->debug_id, in_reply_to->to_proc ? in_reply_to->to_proc->pid : 0, in_reply_to->to_thread ? in_reply_to->to_thread->pid : 0); + spin_unlock(&in_reply_to->lock); + binder_inner_proc_unlock(proc); return_error = BR_FAILED_REPLY; + return_error_param = -EPROTO; + return_error_line = __LINE__; in_reply_to = NULL; goto err_bad_call_stack; } thread->transaction_stack = in_reply_to->to_parent; - target_thread = in_reply_to->from; + binder_inner_proc_unlock(proc); + target_thread = binder_get_txn_from_and_acq_inner(in_reply_to); if (target_thread == NULL) { return_error = BR_DEAD_REPLY; + return_error_line = __LINE__; goto err_dead_binder; } if (target_thread->transaction_stack != in_reply_to) { @@ -1377,106 +2951,156 @@ static void binder_transaction(struct binder_proc *proc, target_thread->transaction_stack ? target_thread->transaction_stack->debug_id : 0, in_reply_to->debug_id); + binder_inner_proc_unlock(target_thread->proc); return_error = BR_FAILED_REPLY; + return_error_param = -EPROTO; + return_error_line = __LINE__; in_reply_to = NULL; target_thread = NULL; goto err_dead_binder; } target_proc = target_thread->proc; + target_proc->tmp_ref++; + binder_inner_proc_unlock(target_thread->proc); } else { if (tr->target.handle) { struct binder_ref *ref; - ref = binder_get_ref(proc, tr->target.handle, true); - if (ref == NULL) { + /* + * There must already be a strong ref + * on this node. If so, do a strong + * increment on the node to ensure it + * stays alive until the transaction is + * done. + */ + binder_proc_lock(proc); + ref = binder_get_ref_olocked(proc, tr->target.handle, + true); + if (ref) { + target_node = binder_get_node_refs_for_txn( + ref->node, &target_proc, + &return_error); + } else { binder_user_error("%d:%d got transaction to invalid handle\n", - proc->pid, thread->pid); + proc->pid, thread->pid); return_error = BR_FAILED_REPLY; + } + binder_proc_unlock(proc); + } else { + mutex_lock(&context->context_mgr_node_lock); + target_node = context->binder_context_mgr_node; + if (target_node) + target_node = binder_get_node_refs_for_txn( + target_node, &target_proc, + &return_error); + else + return_error = BR_DEAD_REPLY; + mutex_unlock(&context->context_mgr_node_lock); + if (target_node && target_proc == proc) { + binder_user_error("%d:%d got transaction to context manager from process owning it\n", + proc->pid, thread->pid); + return_error = BR_FAILED_REPLY; + return_error_param = -EINVAL; + return_error_line = __LINE__; goto err_invalid_target_handle; } - target_node = ref->node; - } else { - target_node = binder_context_mgr_node; - if (target_node == NULL) { - return_error = BR_DEAD_REPLY; - goto err_no_context_mgr_node; - } } - e->to_node = target_node->debug_id; - target_proc = target_node->proc; - if (target_proc == NULL) { - return_error = BR_DEAD_REPLY; + if (!target_node) { + /* + * return_error is set above + */ + return_error_param = -EINVAL; + return_error_line = __LINE__; goto err_dead_binder; } + e->to_node = target_node->debug_id; if (security_binder_transaction(proc->tsk, target_proc->tsk) < 0) { return_error = BR_FAILED_REPLY; + return_error_param = -EPERM; + return_error_line = __LINE__; goto err_invalid_target_handle; } + binder_inner_proc_lock(proc); if (!(tr->flags & TF_ONE_WAY) && thread->transaction_stack) { struct binder_transaction *tmp; tmp = thread->transaction_stack; if (tmp->to_thread != thread) { + spin_lock(&tmp->lock); binder_user_error("%d:%d got new transaction with bad transaction stack, transaction %d has target %d:%d\n", proc->pid, thread->pid, tmp->debug_id, tmp->to_proc ? tmp->to_proc->pid : 0, tmp->to_thread ? tmp->to_thread->pid : 0); + spin_unlock(&tmp->lock); + binder_inner_proc_unlock(proc); return_error = BR_FAILED_REPLY; + return_error_param = -EPROTO; + return_error_line = __LINE__; goto err_bad_call_stack; } while (tmp) { - if (tmp->from && tmp->from->proc == target_proc) - target_thread = tmp->from; + struct binder_thread *from; + + spin_lock(&tmp->lock); + from = tmp->from; + if (from && from->proc == target_proc) { + atomic_inc(&from->tmp_ref); + target_thread = from; + spin_unlock(&tmp->lock); + break; + } + spin_unlock(&tmp->lock); tmp = tmp->from_parent; } } + binder_inner_proc_unlock(proc); } - if (target_thread) { + if (target_thread) e->to_thread = target_thread->pid; - target_list = &target_thread->todo; - target_wait = &target_thread->wait; - } else { - target_list = &target_proc->todo; - target_wait = &target_proc->wait; - } e->to_proc = target_proc->pid; /* TODO: reuse incoming transaction for reply */ t = kzalloc(sizeof(*t), GFP_KERNEL); if (t == NULL) { return_error = BR_FAILED_REPLY; + return_error_param = -ENOMEM; + return_error_line = __LINE__; goto err_alloc_t_failed; } binder_stats_created(BINDER_STAT_TRANSACTION); + spin_lock_init(&t->lock); tcomplete = kzalloc(sizeof(*tcomplete), GFP_KERNEL); if (tcomplete == NULL) { return_error = BR_FAILED_REPLY; + return_error_param = -ENOMEM; + return_error_line = __LINE__; goto err_alloc_tcomplete_failed; } binder_stats_created(BINDER_STAT_TRANSACTION_COMPLETE); - t->debug_id = ++binder_last_id; - e->debug_id = t->debug_id; + t->debug_id = t_debug_id; if (reply) binder_debug(BINDER_DEBUG_TRANSACTION, - "%d:%d BC_REPLY %d -> %d:%d, data %016llx-%016llx size %lld-%lld\n", + "%d:%d BC_REPLY %d -> %d:%d, data %016llx-%016llx size %lld-%lld-%lld\n", proc->pid, thread->pid, t->debug_id, target_proc->pid, target_thread->pid, (u64)tr->data.ptr.buffer, (u64)tr->data.ptr.offsets, - (u64)tr->data_size, (u64)tr->offsets_size); + (u64)tr->data_size, (u64)tr->offsets_size, + (u64)extra_buffers_size); else binder_debug(BINDER_DEBUG_TRANSACTION, - "%d:%d BC_TRANSACTION %d -> %d - node %d, data %016llx-%016llx size %lld-%lld\n", + "%d:%d BC_TRANSACTION %d -> %d - node %d, data %016llx-%016llx size %lld-%lld-%lld\n", proc->pid, thread->pid, t->debug_id, target_proc->pid, target_node->debug_id, (u64)tr->data.ptr.buffer, (u64)tr->data.ptr.offsets, - (u64)tr->data_size, (u64)tr->offsets_size); + (u64)tr->data_size, (u64)tr->offsets_size, + (u64)extra_buffers_size); if (!reply && !(tr->flags & TF_ONE_WAY)) t->from = thread; @@ -1487,14 +3111,30 @@ static void binder_transaction(struct binder_proc *proc, t->to_thread = target_thread; t->code = tr->code; t->flags = tr->flags; - t->priority = task_nice(current); + if (!(t->flags & TF_ONE_WAY) && + binder_supported_policy(current->policy)) { + /* Inherit supported policies for synchronous transactions */ + t->priority.sched_policy = current->policy; + t->priority.prio = current->normal_prio; + } else { + /* Otherwise, fall back to the default priority */ + t->priority = target_proc->default_priority; + } trace_binder_transaction(reply, t, target_node); - t->buffer = binder_alloc_buf(target_proc, tr->data_size, - tr->offsets_size, !reply && (t->flags & TF_ONE_WAY)); - if (t->buffer == NULL) { - return_error = BR_FAILED_REPLY; + t->buffer = binder_alloc_new_buf(&target_proc->alloc, tr->data_size, + tr->offsets_size, extra_buffers_size, + !reply && (t->flags & TF_ONE_WAY)); + if (IS_ERR(t->buffer)) { + /* + * -ESRCH indicates VMA cleared. The target is dying. + */ + return_error_param = PTR_ERR(t->buffer); + return_error = return_error_param == -ESRCH ? + BR_DEAD_REPLY : BR_FAILED_REPLY; + return_error_line = __LINE__; + t->buffer = NULL; goto err_binder_alloc_buf_failed; } t->buffer->allow_user_free = 0; @@ -1502,17 +3142,17 @@ static void binder_transaction(struct binder_proc *proc, t->buffer->transaction = t; t->buffer->target_node = target_node; trace_binder_transaction_alloc_buf(t->buffer); - if (target_node) - binder_inc_node(target_node, 1, 0, NULL); - - offp = (binder_size_t *)(t->buffer->data + - ALIGN(tr->data_size, sizeof(void *))); + off_start = (binder_size_t *)(t->buffer->data + + ALIGN(tr->data_size, sizeof(void *))); + offp = off_start; if (copy_from_user(t->buffer->data, (const void __user *)(uintptr_t) tr->data.ptr.buffer, tr->data_size)) { binder_user_error("%d:%d got transaction with invalid data ptr\n", proc->pid, thread->pid); return_error = BR_FAILED_REPLY; + return_error_param = -EFAULT; + return_error_line = __LINE__; goto err_copy_data_failed; } if (copy_from_user(offp, (const void __user *)(uintptr_t) @@ -1520,231 +3160,253 @@ static void binder_transaction(struct binder_proc *proc, binder_user_error("%d:%d got transaction with invalid offsets ptr\n", proc->pid, thread->pid); return_error = BR_FAILED_REPLY; + return_error_param = -EFAULT; + return_error_line = __LINE__; goto err_copy_data_failed; } if (!IS_ALIGNED(tr->offsets_size, sizeof(binder_size_t))) { binder_user_error("%d:%d got transaction with invalid offsets size, %lld\n", proc->pid, thread->pid, (u64)tr->offsets_size); return_error = BR_FAILED_REPLY; + return_error_param = -EINVAL; + return_error_line = __LINE__; goto err_bad_offset; } - off_end = (void *)offp + tr->offsets_size; + if (!IS_ALIGNED(extra_buffers_size, sizeof(u64))) { + binder_user_error("%d:%d got transaction with unaligned buffers size, %lld\n", + proc->pid, thread->pid, + (u64)extra_buffers_size); + return_error = BR_FAILED_REPLY; + return_error_param = -EINVAL; + return_error_line = __LINE__; + goto err_bad_offset; + } + off_end = (void *)off_start + tr->offsets_size; + sg_bufp = (u8 *)(PTR_ALIGN(off_end, sizeof(void *))); + sg_buf_end = sg_bufp + extra_buffers_size; off_min = 0; for (; offp < off_end; offp++) { - struct flat_binder_object *fp; + struct binder_object_header *hdr; + size_t object_size = binder_validate_object(t->buffer, *offp); - if (*offp > t->buffer->data_size - sizeof(*fp) || - *offp < off_min || - t->buffer->data_size < sizeof(*fp) || - !IS_ALIGNED(*offp, sizeof(u32))) { - binder_user_error("%d:%d got transaction with invalid offset, %lld (min %lld, max %lld)\n", + if (object_size == 0 || *offp < off_min) { + binder_user_error("%d:%d got transaction with invalid offset (%lld, min %lld max %lld) or object.\n", proc->pid, thread->pid, (u64)*offp, (u64)off_min, - (u64)(t->buffer->data_size - - sizeof(*fp))); + (u64)t->buffer->data_size); return_error = BR_FAILED_REPLY; + return_error_param = -EINVAL; + return_error_line = __LINE__; goto err_bad_offset; } - fp = (struct flat_binder_object *)(t->buffer->data + *offp); - off_min = *offp + sizeof(struct flat_binder_object); - switch (fp->type) { + + hdr = (struct binder_object_header *)(t->buffer->data + *offp); + off_min = *offp + object_size; + switch (hdr->type) { case BINDER_TYPE_BINDER: case BINDER_TYPE_WEAK_BINDER: { - struct binder_ref *ref; - struct binder_node *node = binder_get_node(proc, fp->binder); + struct flat_binder_object *fp; - if (node == NULL) { - node = binder_new_node(proc, fp->binder, fp->cookie); - if (node == NULL) { - return_error = BR_FAILED_REPLY; - goto err_binder_new_node_failed; - } - node->min_priority = fp->flags & FLAT_BINDER_FLAG_PRIORITY_MASK; - node->accept_fds = !!(fp->flags & FLAT_BINDER_FLAG_ACCEPTS_FDS); - } - if (fp->cookie != node->cookie) { - binder_user_error("%d:%d sending u%016llx node %d, cookie mismatch %016llx != %016llx\n", - proc->pid, thread->pid, - (u64)fp->binder, node->debug_id, - (u64)fp->cookie, (u64)node->cookie); + fp = to_flat_binder_object(hdr); + ret = binder_translate_binder(fp, t, thread); + if (ret < 0) { return_error = BR_FAILED_REPLY; - goto err_binder_get_ref_for_node_failed; + return_error_param = ret; + return_error_line = __LINE__; + goto err_translate_failed; } - if (security_binder_transfer_binder(proc->tsk, - target_proc->tsk)) { - return_error = BR_FAILED_REPLY; - goto err_binder_get_ref_for_node_failed; - } - ref = binder_get_ref_for_node(target_proc, node); - if (ref == NULL) { - return_error = BR_FAILED_REPLY; - goto err_binder_get_ref_for_node_failed; - } - if (fp->type == BINDER_TYPE_BINDER) - fp->type = BINDER_TYPE_HANDLE; - else - fp->type = BINDER_TYPE_WEAK_HANDLE; - fp->binder = 0; - fp->handle = ref->desc; - fp->cookie = 0; - binder_inc_ref(ref, fp->type == BINDER_TYPE_HANDLE, - &thread->todo); - - trace_binder_transaction_node_to_ref(t, node, ref); - binder_debug(BINDER_DEBUG_TRANSACTION, - " node %d u%016llx -> ref %d desc %d\n", - node->debug_id, (u64)node->ptr, - ref->debug_id, ref->desc); } break; case BINDER_TYPE_HANDLE: case BINDER_TYPE_WEAK_HANDLE: { - struct binder_ref *ref; + struct flat_binder_object *fp; - ref = binder_get_ref(proc, fp->handle, - fp->type == BINDER_TYPE_HANDLE); - - if (ref == NULL) { - binder_user_error("%d:%d got transaction with invalid handle, %d\n", - proc->pid, - thread->pid, fp->handle); + fp = to_flat_binder_object(hdr); + ret = binder_translate_handle(fp, t, thread); + if (ret < 0) { return_error = BR_FAILED_REPLY; - goto err_binder_get_ref_failed; - } - if (security_binder_transfer_binder(proc->tsk, - target_proc->tsk)) { - return_error = BR_FAILED_REPLY; - goto err_binder_get_ref_failed; - } - if (ref->node->proc == target_proc) { - if (fp->type == BINDER_TYPE_HANDLE) - fp->type = BINDER_TYPE_BINDER; - else - fp->type = BINDER_TYPE_WEAK_BINDER; - fp->binder = ref->node->ptr; - fp->cookie = ref->node->cookie; - binder_inc_node(ref->node, fp->type == BINDER_TYPE_BINDER, 0, NULL); - trace_binder_transaction_ref_to_node(t, ref); - binder_debug(BINDER_DEBUG_TRANSACTION, - " ref %d desc %d -> node %d u%016llx\n", - ref->debug_id, ref->desc, ref->node->debug_id, - (u64)ref->node->ptr); - } else { - struct binder_ref *new_ref; - - new_ref = binder_get_ref_for_node(target_proc, ref->node); - if (new_ref == NULL) { - return_error = BR_FAILED_REPLY; - goto err_binder_get_ref_for_node_failed; - } - fp->binder = 0; - fp->handle = new_ref->desc; - fp->cookie = 0; - binder_inc_ref(new_ref, fp->type == BINDER_TYPE_HANDLE, NULL); - trace_binder_transaction_ref_to_ref(t, ref, - new_ref); - binder_debug(BINDER_DEBUG_TRANSACTION, - " ref %d desc %d -> ref %d desc %d (node %d)\n", - ref->debug_id, ref->desc, new_ref->debug_id, - new_ref->desc, ref->node->debug_id); + return_error_param = ret; + return_error_line = __LINE__; + goto err_translate_failed; } } break; case BINDER_TYPE_FD: { - int target_fd; - struct file *file; + struct binder_fd_object *fp = to_binder_fd_object(hdr); + int target_fd = binder_translate_fd(fp->fd, t, thread, + in_reply_to); - if (reply) { - if (!(in_reply_to->flags & TF_ACCEPT_FDS)) { - binder_user_error("%d:%d got reply with fd, %d, but target does not allow fds\n", - proc->pid, thread->pid, fp->handle); - return_error = BR_FAILED_REPLY; - goto err_fd_not_allowed; - } - } else if (!target_node->accept_fds) { - binder_user_error("%d:%d got transaction with fd, %d, but target does not allow fds\n", - proc->pid, thread->pid, fp->handle); - return_error = BR_FAILED_REPLY; - goto err_fd_not_allowed; - } - - file = fget(fp->handle); - if (file == NULL) { - binder_user_error("%d:%d got transaction with invalid fd, %d\n", - proc->pid, thread->pid, fp->handle); - return_error = BR_FAILED_REPLY; - goto err_fget_failed; - } - if (security_binder_transfer_file(proc->tsk, - target_proc->tsk, - file) < 0) { - fput(file); - return_error = BR_FAILED_REPLY; - goto err_get_unused_fd_failed; - } - target_fd = task_get_unused_fd_flags(target_proc, O_CLOEXEC); if (target_fd < 0) { - fput(file); return_error = BR_FAILED_REPLY; - goto err_get_unused_fd_failed; + return_error_param = target_fd; + return_error_line = __LINE__; + goto err_translate_failed; } - task_fd_install(target_proc, target_fd, file); - trace_binder_transaction_fd(t, fp->handle, target_fd); - binder_debug(BINDER_DEBUG_TRANSACTION, - " fd %d -> %d\n", fp->handle, target_fd); - /* TODO: fput? */ - fp->binder = 0; - fp->handle = target_fd; + fp->pad_binder = 0; + fp->fd = target_fd; } break; + case BINDER_TYPE_FDA: { + struct binder_fd_array_object *fda = + to_binder_fd_array_object(hdr); + struct binder_buffer_object *parent = + binder_validate_ptr(t->buffer, fda->parent, + off_start, + offp - off_start); + if (!parent) { + binder_user_error("%d:%d got transaction with invalid parent offset or type\n", + proc->pid, thread->pid); + return_error = BR_FAILED_REPLY; + return_error_param = -EINVAL; + return_error_line = __LINE__; + goto err_bad_parent; + } + if (!binder_validate_fixup(t->buffer, off_start, + parent, fda->parent_offset, + last_fixup_obj, + last_fixup_min_off)) { + binder_user_error("%d:%d got transaction with out-of-order buffer fixup\n", + proc->pid, thread->pid); + return_error = BR_FAILED_REPLY; + return_error_param = -EINVAL; + return_error_line = __LINE__; + goto err_bad_parent; + } + ret = binder_translate_fd_array(fda, parent, t, thread, + in_reply_to); + if (ret < 0) { + return_error = BR_FAILED_REPLY; + return_error_param = ret; + return_error_line = __LINE__; + goto err_translate_failed; + } + last_fixup_obj = parent; + last_fixup_min_off = + fda->parent_offset + sizeof(u32) * fda->num_fds; + } break; + case BINDER_TYPE_PTR: { + struct binder_buffer_object *bp = + to_binder_buffer_object(hdr); + size_t buf_left = sg_buf_end - sg_bufp; + if (bp->length > buf_left) { + binder_user_error("%d:%d got transaction with too large buffer\n", + proc->pid, thread->pid); + return_error = BR_FAILED_REPLY; + return_error_param = -EINVAL; + return_error_line = __LINE__; + goto err_bad_offset; + } + if (copy_from_user(sg_bufp, + (const void __user *)(uintptr_t) + bp->buffer, bp->length)) { + binder_user_error("%d:%d got transaction with invalid offsets ptr\n", + proc->pid, thread->pid); + return_error_param = -EFAULT; + return_error = BR_FAILED_REPLY; + return_error_line = __LINE__; + goto err_copy_data_failed; + } + /* Fixup buffer pointer to target proc address space */ + bp->buffer = (uintptr_t)sg_bufp + + binder_alloc_get_user_buffer_offset( + &target_proc->alloc); + sg_bufp += ALIGN(bp->length, sizeof(u64)); + + ret = binder_fixup_parent(t, thread, bp, off_start, + offp - off_start, + last_fixup_obj, + last_fixup_min_off); + if (ret < 0) { + return_error = BR_FAILED_REPLY; + return_error_param = ret; + return_error_line = __LINE__; + goto err_translate_failed; + } + last_fixup_obj = bp; + last_fixup_min_off = 0; + } break; default: binder_user_error("%d:%d got transaction with invalid object type, %x\n", - proc->pid, thread->pid, fp->type); + proc->pid, thread->pid, hdr->type); return_error = BR_FAILED_REPLY; + return_error_param = -EINVAL; + return_error_line = __LINE__; goto err_bad_object_type; } } + tcomplete->type = BINDER_WORK_TRANSACTION_COMPLETE; + t->work.type = BINDER_WORK_TRANSACTION; + if (reply) { + binder_enqueue_thread_work(thread, tcomplete); + binder_inner_proc_lock(target_proc); + if (target_thread->is_dead) { + binder_inner_proc_unlock(target_proc); + goto err_dead_proc_or_thread; + } BUG_ON(t->buffer->async_transaction != 0); - binder_pop_transaction(target_thread, in_reply_to); + binder_pop_transaction_ilocked(target_thread, in_reply_to); + binder_enqueue_thread_work_ilocked(target_thread, &t->work); + binder_inner_proc_unlock(target_proc); + wake_up_interruptible_sync(&target_thread->wait); + binder_restore_priority(current, in_reply_to->saved_priority); + binder_free_transaction(in_reply_to); } else if (!(t->flags & TF_ONE_WAY)) { BUG_ON(t->buffer->async_transaction != 0); + binder_inner_proc_lock(proc); + /* + * Defer the TRANSACTION_COMPLETE, so we don't return to + * userspace immediately; this allows the target process to + * immediately start processing this transaction, reducing + * latency. We will then return the TRANSACTION_COMPLETE when + * the target replies (or there is an error). + */ + binder_enqueue_deferred_thread_work_ilocked(thread, tcomplete); t->need_reply = 1; t->from_parent = thread->transaction_stack; thread->transaction_stack = t; + binder_inner_proc_unlock(proc); + if (!binder_proc_transaction(t, target_proc, target_thread)) { + binder_inner_proc_lock(proc); + binder_pop_transaction_ilocked(thread, t); + binder_inner_proc_unlock(proc); + goto err_dead_proc_or_thread; + } } else { BUG_ON(target_node == NULL); BUG_ON(t->buffer->async_transaction != 1); - if (target_node->has_async_transaction) { - target_list = &target_node->async_todo; - target_wait = NULL; - } else - target_node->has_async_transaction = 1; + binder_enqueue_thread_work(thread, tcomplete); + if (!binder_proc_transaction(t, target_proc, NULL)) + goto err_dead_proc_or_thread; } - t->work.type = BINDER_WORK_TRANSACTION; - list_add_tail(&t->work.entry, target_list); - tcomplete->type = BINDER_WORK_TRANSACTION_COMPLETE; - list_add_tail(&tcomplete->entry, &thread->todo); - if (target_wait) { - if (reply || !(t->flags & TF_ONE_WAY)) - wake_up_interruptible_sync(target_wait); - else - wake_up_interruptible(target_wait); - } + if (target_thread) + binder_thread_dec_tmpref(target_thread); + binder_proc_dec_tmpref(target_proc); + if (target_node) + binder_dec_node_tmpref(target_node); + /* + * write barrier to synchronize with initialization + * of log entry + */ + smp_wmb(); + WRITE_ONCE(e->debug_id_done, t_debug_id); return; -err_get_unused_fd_failed: -err_fget_failed: -err_fd_not_allowed: -err_binder_get_ref_for_node_failed: -err_binder_get_ref_failed: -err_binder_new_node_failed: +err_dead_proc_or_thread: + return_error = BR_DEAD_REPLY; + return_error_line = __LINE__; + binder_dequeue_work(proc, tcomplete); +err_translate_failed: err_bad_object_type: err_bad_offset: +err_bad_parent: err_copy_data_failed: trace_binder_transaction_failed_buffer_release(t->buffer); binder_transaction_buffer_release(target_proc, t->buffer, offp); + if (target_node) + binder_dec_node_tmpref(target_node); + target_node = NULL; t->buffer->transaction = NULL; - binder_free_buf(target_proc, t->buffer); + binder_alloc_free_buf(&target_proc->alloc, t->buffer); err_binder_alloc_buf_failed: kfree(tcomplete); binder_stats_deleted(BINDER_STAT_TRANSACTION_COMPLETE); @@ -1756,25 +3418,48 @@ static void binder_transaction(struct binder_proc *proc, err_empty_call_stack: err_dead_binder: err_invalid_target_handle: -err_no_context_mgr_node: + if (target_thread) + binder_thread_dec_tmpref(target_thread); + if (target_proc) + binder_proc_dec_tmpref(target_proc); + if (target_node) { + binder_dec_node(target_node, 1, 0); + binder_dec_node_tmpref(target_node); + } + binder_debug(BINDER_DEBUG_FAILED_TRANSACTION, - "%d:%d transaction failed %d, size %lld-%lld\n", - proc->pid, thread->pid, return_error, - (u64)tr->data_size, (u64)tr->offsets_size); + "%d:%d transaction failed %d/%d, size %lld-%lld line %d\n", + proc->pid, thread->pid, return_error, return_error_param, + (u64)tr->data_size, (u64)tr->offsets_size, + return_error_line); { struct binder_transaction_log_entry *fe; + e->return_error = return_error; + e->return_error_param = return_error_param; + e->return_error_line = return_error_line; fe = binder_transaction_log_add(&binder_transaction_log_failed); *fe = *e; + /* + * write barrier to synchronize with initialization + * of log entry + */ + smp_wmb(); + WRITE_ONCE(e->debug_id_done, t_debug_id); + WRITE_ONCE(fe->debug_id_done, t_debug_id); } - BUG_ON(thread->return_error != BR_OK); + BUG_ON(thread->return_error.cmd != BR_OK); if (in_reply_to) { - thread->return_error = BR_TRANSACTION_COMPLETE; + binder_restore_priority(current, in_reply_to->saved_priority); + thread->return_error.cmd = BR_TRANSACTION_COMPLETE; + binder_enqueue_thread_work(thread, &thread->return_error.work); binder_send_failed_reply(in_reply_to, return_error); - } else - thread->return_error = return_error; + } else { + thread->return_error.cmd = return_error; + binder_enqueue_thread_work(thread, &thread->return_error.work); + } } static int binder_thread_write(struct binder_proc *proc, @@ -1783,19 +3468,22 @@ static int binder_thread_write(struct binder_proc *proc, binder_size_t *consumed) { uint32_t cmd; + struct binder_context *context = proc->context; void __user *buffer = (void __user *)(uintptr_t)binder_buffer; void __user *ptr = buffer + *consumed; void __user *end = buffer + size; - while (ptr < end && thread->return_error == BR_OK) { + while (ptr < end && thread->return_error.cmd == BR_OK) { + int ret; + if (get_user(cmd, (uint32_t __user *)ptr)) return -EFAULT; ptr += sizeof(uint32_t); trace_binder_command(cmd); if (_IOC_NR(cmd) < ARRAY_SIZE(binder_stats.bc)) { - binder_stats.bc[_IOC_NR(cmd)]++; - proc->stats.bc[_IOC_NR(cmd)]++; - thread->stats.bc[_IOC_NR(cmd)]++; + atomic_inc(&binder_stats.bc[_IOC_NR(cmd)]); + atomic_inc(&proc->stats.bc[_IOC_NR(cmd)]); + atomic_inc(&thread->stats.bc[_IOC_NR(cmd)]); } switch (cmd) { case BC_INCREFS: @@ -1803,53 +3491,61 @@ static int binder_thread_write(struct binder_proc *proc, case BC_RELEASE: case BC_DECREFS: { uint32_t target; - struct binder_ref *ref; const char *debug_string; + bool strong = cmd == BC_ACQUIRE || cmd == BC_RELEASE; + bool increment = cmd == BC_INCREFS || cmd == BC_ACQUIRE; + struct binder_ref_data rdata; if (get_user(target, (uint32_t __user *)ptr)) return -EFAULT; + ptr += sizeof(uint32_t); - if (target == 0 && binder_context_mgr_node && - (cmd == BC_INCREFS || cmd == BC_ACQUIRE)) { - ref = binder_get_ref_for_node(proc, - binder_context_mgr_node); - if (ref->desc != target) { - binder_user_error("%d:%d tried to acquire reference to desc 0, got %d instead\n", - proc->pid, thread->pid, - ref->desc); - } - } else - ref = binder_get_ref(proc, target, - cmd == BC_ACQUIRE || - cmd == BC_RELEASE); - if (ref == NULL) { - binder_user_error("%d:%d refcount change on invalid ref %d\n", - proc->pid, thread->pid, target); - break; + ret = -1; + if (increment && !target) { + struct binder_node *ctx_mgr_node; + mutex_lock(&context->context_mgr_node_lock); + ctx_mgr_node = context->binder_context_mgr_node; + if (ctx_mgr_node) + ret = binder_inc_ref_for_node( + proc, ctx_mgr_node, + strong, NULL, &rdata); + mutex_unlock(&context->context_mgr_node_lock); + } + if (ret) + ret = binder_update_ref_for_handle( + proc, target, increment, strong, + &rdata); + if (!ret && rdata.desc != target) { + binder_user_error("%d:%d tried to acquire reference to desc %d, got %d instead\n", + proc->pid, thread->pid, + target, rdata.desc); } switch (cmd) { case BC_INCREFS: debug_string = "IncRefs"; - binder_inc_ref(ref, 0, NULL); break; case BC_ACQUIRE: debug_string = "Acquire"; - binder_inc_ref(ref, 1, NULL); break; case BC_RELEASE: debug_string = "Release"; - binder_dec_ref(ref, 1); break; case BC_DECREFS: default: debug_string = "DecRefs"; - binder_dec_ref(ref, 0); + break; + } + if (ret) { + binder_user_error("%d:%d %s %d refcount change on invalid ref %d ret %d\n", + proc->pid, thread->pid, debug_string, + strong, target, ret); break; } binder_debug(BINDER_DEBUG_USER_REFS, - "%d:%d %s ref %d desc %d s %d w %d for node %d\n", - proc->pid, thread->pid, debug_string, ref->debug_id, - ref->desc, ref->strong, ref->weak, ref->node->debug_id); + "%d:%d %s ref %d desc %d s %d w %d\n", + proc->pid, thread->pid, debug_string, + rdata.debug_id, rdata.desc, rdata.strong, + rdata.weak); break; } case BC_INCREFS_DONE: @@ -1857,6 +3553,7 @@ static int binder_thread_write(struct binder_proc *proc, binder_uintptr_t node_ptr; binder_uintptr_t cookie; struct binder_node *node; + bool free_node; if (get_user(node_ptr, (binder_uintptr_t __user *)ptr)) return -EFAULT; @@ -1881,13 +3578,17 @@ static int binder_thread_write(struct binder_proc *proc, "BC_INCREFS_DONE" : "BC_ACQUIRE_DONE", (u64)node_ptr, node->debug_id, (u64)cookie, (u64)node->cookie); + binder_put_node(node); break; } + binder_node_inner_lock(node); if (cmd == BC_ACQUIRE_DONE) { if (node->pending_strong_ref == 0) { binder_user_error("%d:%d BC_ACQUIRE_DONE node %d has no pending acquire request\n", proc->pid, thread->pid, node->debug_id); + binder_node_inner_unlock(node); + binder_put_node(node); break; } node->pending_strong_ref = 0; @@ -1896,16 +3597,23 @@ static int binder_thread_write(struct binder_proc *proc, binder_user_error("%d:%d BC_INCREFS_DONE node %d has no pending increfs request\n", proc->pid, thread->pid, node->debug_id); + binder_node_inner_unlock(node); + binder_put_node(node); break; } node->pending_weak_ref = 0; } - binder_dec_node(node, cmd == BC_ACQUIRE_DONE, 0); + free_node = binder_dec_node_nilocked(node, + cmd == BC_ACQUIRE_DONE, 0); + WARN_ON(free_node); binder_debug(BINDER_DEBUG_USER_REFS, - "%d:%d %s node %d ls %d lw %d\n", + "%d:%d %s node %d ls %d lw %d tr %d\n", proc->pid, thread->pid, cmd == BC_INCREFS_DONE ? "BC_INCREFS_DONE" : "BC_ACQUIRE_DONE", - node->debug_id, node->local_strong_refs, node->local_weak_refs); + node->debug_id, node->local_strong_refs, + node->local_weak_refs, node->tmp_refs); + binder_node_inner_unlock(node); + binder_put_node(node); break; } case BC_ATTEMPT_ACQUIRE: @@ -1923,7 +3631,8 @@ static int binder_thread_write(struct binder_proc *proc, return -EFAULT; ptr += sizeof(binder_uintptr_t); - buffer = binder_buffer_lookup(proc, data_ptr); + buffer = binder_alloc_prepare_to_free(&proc->alloc, + data_ptr); if (buffer == NULL) { binder_user_error("%d:%d BC_FREE_BUFFER u%016llx no match\n", proc->pid, thread->pid, (u64)data_ptr); @@ -1945,18 +3654,41 @@ static int binder_thread_write(struct binder_proc *proc, buffer->transaction = NULL; } if (buffer->async_transaction && buffer->target_node) { - BUG_ON(!buffer->target_node->has_async_transaction); - if (list_empty(&buffer->target_node->async_todo)) - buffer->target_node->has_async_transaction = 0; - else - list_move_tail(buffer->target_node->async_todo.next, &thread->todo); + struct binder_node *buf_node; + struct binder_work *w; + + buf_node = buffer->target_node; + binder_node_inner_lock(buf_node); + BUG_ON(!buf_node->has_async_transaction); + BUG_ON(buf_node->proc != proc); + w = binder_dequeue_work_head_ilocked( + &buf_node->async_todo); + if (!w) { + buf_node->has_async_transaction = false; + } else { + binder_enqueue_work_ilocked( + w, &proc->todo); + binder_wakeup_proc_ilocked(proc); + } + binder_node_inner_unlock(buf_node); } trace_binder_transaction_buffer_release(buffer); binder_transaction_buffer_release(proc, buffer, NULL); - binder_free_buf(proc, buffer); + binder_alloc_free_buf(&proc->alloc, buffer); break; } + case BC_TRANSACTION_SG: + case BC_REPLY_SG: { + struct binder_transaction_data_sg tr; + + if (copy_from_user(&tr, ptr, sizeof(tr))) + return -EFAULT; + ptr += sizeof(tr); + binder_transaction(proc, thread, &tr.transaction_data, + cmd == BC_REPLY_SG, tr.buffers_size); + break; + } case BC_TRANSACTION: case BC_REPLY: { struct binder_transaction_data tr; @@ -1964,7 +3696,8 @@ static int binder_thread_write(struct binder_proc *proc, if (copy_from_user(&tr, ptr, sizeof(tr))) return -EFAULT; ptr += sizeof(tr); - binder_transaction(proc, thread, &tr, cmd == BC_REPLY); + binder_transaction(proc, thread, &tr, + cmd == BC_REPLY, 0); break; } @@ -1972,6 +3705,7 @@ static int binder_thread_write(struct binder_proc *proc, binder_debug(BINDER_DEBUG_THREADS, "%d:%d BC_REGISTER_LOOPER\n", proc->pid, thread->pid); + binder_inner_proc_lock(proc); if (thread->looper & BINDER_LOOPER_STATE_ENTERED) { thread->looper |= BINDER_LOOPER_STATE_INVALID; binder_user_error("%d:%d ERROR: BC_REGISTER_LOOPER called after BC_ENTER_LOOPER\n", @@ -1985,6 +3719,7 @@ static int binder_thread_write(struct binder_proc *proc, proc->requested_threads_started++; } thread->looper |= BINDER_LOOPER_STATE_REGISTERED; + binder_inner_proc_unlock(proc); break; case BC_ENTER_LOOPER: binder_debug(BINDER_DEBUG_THREADS, @@ -2009,7 +3744,7 @@ static int binder_thread_write(struct binder_proc *proc, uint32_t target; binder_uintptr_t cookie; struct binder_ref *ref; - struct binder_ref_death *death; + struct binder_ref_death *death = NULL; if (get_user(target, (uint32_t __user *)ptr)) return -EFAULT; @@ -2017,7 +3752,28 @@ static int binder_thread_write(struct binder_proc *proc, if (get_user(cookie, (binder_uintptr_t __user *)ptr)) return -EFAULT; ptr += sizeof(binder_uintptr_t); - ref = binder_get_ref(proc, target, false); + if (cmd == BC_REQUEST_DEATH_NOTIFICATION) { + /* + * Allocate memory for death notification + * before taking lock + */ + death = kzalloc(sizeof(*death), GFP_KERNEL); + if (death == NULL) { + WARN_ON(thread->return_error.cmd != + BR_OK); + thread->return_error.cmd = BR_ERROR; + binder_enqueue_thread_work( + thread, + &thread->return_error.work); + binder_debug( + BINDER_DEBUG_FAILED_TRANSACTION, + "%d:%d BC_REQUEST_DEATH_NOTIFICATION failed\n", + proc->pid, thread->pid); + break; + } + } + binder_proc_lock(proc); + ref = binder_get_ref_olocked(proc, target, false); if (ref == NULL) { binder_user_error("%d:%d %s invalid ref %d\n", proc->pid, thread->pid, @@ -2025,6 +3781,8 @@ static int binder_thread_write(struct binder_proc *proc, "BC_REQUEST_DEATH_NOTIFICATION" : "BC_CLEAR_DEATH_NOTIFICATION", target); + binder_proc_unlock(proc); + kfree(death); break; } @@ -2034,21 +3792,18 @@ static int binder_thread_write(struct binder_proc *proc, cmd == BC_REQUEST_DEATH_NOTIFICATION ? "BC_REQUEST_DEATH_NOTIFICATION" : "BC_CLEAR_DEATH_NOTIFICATION", - (u64)cookie, ref->debug_id, ref->desc, - ref->strong, ref->weak, ref->node->debug_id); + (u64)cookie, ref->data.debug_id, + ref->data.desc, ref->data.strong, + ref->data.weak, ref->node->debug_id); + binder_node_lock(ref->node); if (cmd == BC_REQUEST_DEATH_NOTIFICATION) { if (ref->death) { binder_user_error("%d:%d BC_REQUEST_DEATH_NOTIFICATION death notification already set\n", proc->pid, thread->pid); - break; - } - death = kzalloc(sizeof(*death), GFP_KERNEL); - if (death == NULL) { - thread->return_error = BR_ERROR; - binder_debug(BINDER_DEBUG_FAILED_TRANSACTION, - "%d:%d BC_REQUEST_DEATH_NOTIFICATION failed\n", - proc->pid, thread->pid); + binder_node_unlock(ref->node); + binder_proc_unlock(proc); + kfree(death); break; } binder_stats_created(BINDER_STAT_DEATH); @@ -2057,17 +3812,19 @@ static int binder_thread_write(struct binder_proc *proc, ref->death = death; if (ref->node->proc == NULL) { ref->death->work.type = BINDER_WORK_DEAD_BINDER; - if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) { - list_add_tail(&ref->death->work.entry, &thread->todo); - } else { - list_add_tail(&ref->death->work.entry, &proc->todo); - wake_up_interruptible(&proc->wait); - } + + binder_inner_proc_lock(proc); + binder_enqueue_work_ilocked( + &ref->death->work, &proc->todo); + binder_wakeup_proc_ilocked(proc); + binder_inner_proc_unlock(proc); } } else { if (ref->death == NULL) { binder_user_error("%d:%d BC_CLEAR_DEATH_NOTIFICATION death notification not active\n", proc->pid, thread->pid); + binder_node_unlock(ref->node); + binder_proc_unlock(proc); break; } death = ref->death; @@ -2076,22 +3833,35 @@ static int binder_thread_write(struct binder_proc *proc, proc->pid, thread->pid, (u64)death->cookie, (u64)cookie); + binder_node_unlock(ref->node); + binder_proc_unlock(proc); break; } ref->death = NULL; + binder_inner_proc_lock(proc); if (list_empty(&death->work.entry)) { death->work.type = BINDER_WORK_CLEAR_DEATH_NOTIFICATION; - if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) { - list_add_tail(&death->work.entry, &thread->todo); - } else { - list_add_tail(&death->work.entry, &proc->todo); - wake_up_interruptible(&proc->wait); + if (thread->looper & + (BINDER_LOOPER_STATE_REGISTERED | + BINDER_LOOPER_STATE_ENTERED)) + binder_enqueue_thread_work_ilocked( + thread, + &death->work); + else { + binder_enqueue_work_ilocked( + &death->work, + &proc->todo); + binder_wakeup_proc_ilocked( + proc); } } else { BUG_ON(death->work.type != BINDER_WORK_DEAD_BINDER); death->work.type = BINDER_WORK_DEAD_BINDER_AND_CLEAR; } + binder_inner_proc_unlock(proc); } + binder_node_unlock(ref->node); + binder_proc_unlock(proc); } break; case BC_DEAD_BINDER_DONE: { struct binder_work *w; @@ -2102,8 +3872,13 @@ static int binder_thread_write(struct binder_proc *proc, return -EFAULT; ptr += sizeof(cookie); - list_for_each_entry(w, &proc->delivered_death, entry) { - struct binder_ref_death *tmp_death = container_of(w, struct binder_ref_death, work); + binder_inner_proc_lock(proc); + list_for_each_entry(w, &proc->delivered_death, + entry) { + struct binder_ref_death *tmp_death = + container_of(w, + struct binder_ref_death, + work); if (tmp_death->cookie == cookie) { death = tmp_death; @@ -2111,25 +3886,31 @@ static int binder_thread_write(struct binder_proc *proc, } } binder_debug(BINDER_DEBUG_DEAD_BINDER, - "%d:%d BC_DEAD_BINDER_DONE %016llx found %p\n", + "%d:%d BC_DEAD_BINDER_DONE %016llx found %pK\n", proc->pid, thread->pid, (u64)cookie, death); if (death == NULL) { binder_user_error("%d:%d BC_DEAD_BINDER_DONE %016llx not found\n", proc->pid, thread->pid, (u64)cookie); + binder_inner_proc_unlock(proc); break; } - - list_del_init(&death->work.entry); + binder_dequeue_work_ilocked(&death->work); if (death->work.type == BINDER_WORK_DEAD_BINDER_AND_CLEAR) { death->work.type = BINDER_WORK_CLEAR_DEATH_NOTIFICATION; - if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) { - list_add_tail(&death->work.entry, &thread->todo); - } else { - list_add_tail(&death->work.entry, &proc->todo); - wake_up_interruptible(&proc->wait); + if (thread->looper & + (BINDER_LOOPER_STATE_REGISTERED | + BINDER_LOOPER_STATE_ENTERED)) + binder_enqueue_thread_work_ilocked( + thread, &death->work); + else { + binder_enqueue_work_ilocked( + &death->work, + &proc->todo); + binder_wakeup_proc_ilocked(proc); } } + binder_inner_proc_unlock(proc); } break; default: @@ -2147,23 +3928,73 @@ static void binder_stat_br(struct binder_proc *proc, { trace_binder_return(cmd); if (_IOC_NR(cmd) < ARRAY_SIZE(binder_stats.br)) { - binder_stats.br[_IOC_NR(cmd)]++; - proc->stats.br[_IOC_NR(cmd)]++; - thread->stats.br[_IOC_NR(cmd)]++; + atomic_inc(&binder_stats.br[_IOC_NR(cmd)]); + atomic_inc(&proc->stats.br[_IOC_NR(cmd)]); + atomic_inc(&thread->stats.br[_IOC_NR(cmd)]); } } -static int binder_has_proc_work(struct binder_proc *proc, - struct binder_thread *thread) +static int binder_put_node_cmd(struct binder_proc *proc, + struct binder_thread *thread, + void __user **ptrp, + binder_uintptr_t node_ptr, + binder_uintptr_t node_cookie, + int node_debug_id, + uint32_t cmd, const char *cmd_name) { - return !list_empty(&proc->todo) || - (thread->looper & BINDER_LOOPER_STATE_NEED_RETURN); + void __user *ptr = *ptrp; + + if (put_user(cmd, (uint32_t __user *)ptr)) + return -EFAULT; + ptr += sizeof(uint32_t); + + if (put_user(node_ptr, (binder_uintptr_t __user *)ptr)) + return -EFAULT; + ptr += sizeof(binder_uintptr_t); + + if (put_user(node_cookie, (binder_uintptr_t __user *)ptr)) + return -EFAULT; + ptr += sizeof(binder_uintptr_t); + + binder_stat_br(proc, thread, cmd); + binder_debug(BINDER_DEBUG_USER_REFS, "%d:%d %s %d u%016llx c%016llx\n", + proc->pid, thread->pid, cmd_name, node_debug_id, + (u64)node_ptr, (u64)node_cookie); + + *ptrp = ptr; + return 0; } -static int binder_has_thread_work(struct binder_thread *thread) +static int binder_wait_for_work(struct binder_thread *thread, + bool do_proc_work) { - return !list_empty(&thread->todo) || thread->return_error != BR_OK || - (thread->looper & BINDER_LOOPER_STATE_NEED_RETURN); + DEFINE_WAIT(wait); + struct binder_proc *proc = thread->proc; + int ret = 0; + + freezer_do_not_count(); + binder_inner_proc_lock(proc); + for (;;) { + prepare_to_wait(&thread->wait, &wait, TASK_INTERRUPTIBLE); + if (binder_has_work_ilocked(thread, do_proc_work)) + break; + if (do_proc_work) + list_add(&thread->waiting_thread_node, + &proc->waiting_threads); + binder_inner_proc_unlock(proc); + schedule(); + binder_inner_proc_lock(proc); + list_del_init(&thread->waiting_thread_node); + if (signal_pending(current)) { + ret = -ERESTARTSYS; + break; + } + } + finish_wait(&thread->wait, &wait); + binder_inner_proc_unlock(proc); + freezer_count(); + + return ret; } static int binder_thread_read(struct binder_proc *proc, @@ -2185,37 +4016,15 @@ static int binder_thread_read(struct binder_proc *proc, } retry: - wait_for_proc_work = thread->transaction_stack == NULL && - list_empty(&thread->todo); - - if (thread->return_error != BR_OK && ptr < end) { - if (thread->return_error2 != BR_OK) { - if (put_user(thread->return_error2, (uint32_t __user *)ptr)) - return -EFAULT; - ptr += sizeof(uint32_t); - binder_stat_br(proc, thread, thread->return_error2); - if (ptr == end) - goto done; - thread->return_error2 = BR_OK; - } - if (put_user(thread->return_error, (uint32_t __user *)ptr)) - return -EFAULT; - ptr += sizeof(uint32_t); - binder_stat_br(proc, thread, thread->return_error); - thread->return_error = BR_OK; - goto done; - } - + binder_inner_proc_lock(proc); + wait_for_proc_work = binder_available_for_proc_work_ilocked(thread); + binder_inner_proc_unlock(proc); thread->looper |= BINDER_LOOPER_STATE_WAITING; - if (wait_for_proc_work) - proc->ready_threads++; - - binder_unlock(__func__); trace_binder_wait_for_work(wait_for_proc_work, !!thread->transaction_stack, - !list_empty(&thread->todo)); + !binder_worklist_empty(proc, &thread->todo)); if (wait_for_proc_work) { if (!(thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED))) { @@ -2224,24 +4033,16 @@ static int binder_thread_read(struct binder_proc *proc, wait_event_interruptible(binder_user_error_wait, binder_stop_on_user_error < 2); } - binder_set_nice(proc->default_priority); - if (non_block) { - if (!binder_has_proc_work(proc, thread)) - ret = -EAGAIN; - } else - ret = wait_event_freezable_exclusive(proc->wait, binder_has_proc_work(proc, thread)); - } else { - if (non_block) { - if (!binder_has_thread_work(thread)) - ret = -EAGAIN; - } else - ret = wait_event_freezable(thread->wait, binder_has_thread_work(thread)); + binder_restore_priority(current, proc->default_priority); } - binder_lock(__func__); + if (non_block) { + if (!binder_has_work(thread, wait_for_proc_work)) + ret = -EAGAIN; + } else { + ret = binder_wait_for_work(thread, wait_for_proc_work); + } - if (wait_for_proc_work) - proc->ready_threads--; thread->looper &= ~BINDER_LOOPER_STATE_WAITING; if (ret) @@ -2250,31 +4051,55 @@ static int binder_thread_read(struct binder_proc *proc, while (1) { uint32_t cmd; struct binder_transaction_data tr; - struct binder_work *w; + struct binder_work *w = NULL; + struct list_head *list = NULL; struct binder_transaction *t = NULL; + struct binder_thread *t_from; - if (!list_empty(&thread->todo)) { - w = list_first_entry(&thread->todo, struct binder_work, - entry); - } else if (!list_empty(&proc->todo) && wait_for_proc_work) { - w = list_first_entry(&proc->todo, struct binder_work, - entry); - } else { + binder_inner_proc_lock(proc); + if (!binder_worklist_empty_ilocked(&thread->todo)) + list = &thread->todo; + else if (!binder_worklist_empty_ilocked(&proc->todo) && + wait_for_proc_work) + list = &proc->todo; + else { + binder_inner_proc_unlock(proc); + /* no data added */ - if (ptr - buffer == 4 && - !(thread->looper & BINDER_LOOPER_STATE_NEED_RETURN)) + if (ptr - buffer == 4 && !thread->looper_need_return) goto retry; break; } - if (end - ptr < sizeof(tr) + 4) + if (end - ptr < sizeof(tr) + 4) { + binder_inner_proc_unlock(proc); break; + } + w = binder_dequeue_work_head_ilocked(list); + if (binder_worklist_empty_ilocked(&thread->todo)) + thread->process_todo = false; switch (w->type) { case BINDER_WORK_TRANSACTION: { + binder_inner_proc_unlock(proc); t = container_of(w, struct binder_transaction, work); } break; + case BINDER_WORK_RETURN_ERROR: { + struct binder_error *e = container_of( + w, struct binder_error, work); + + WARN_ON(e->cmd == BR_OK); + binder_inner_proc_unlock(proc); + if (put_user(e->cmd, (uint32_t __user *)ptr)) + return -EFAULT; + cmd = e->cmd; + e->cmd = BR_OK; + ptr += sizeof(uint32_t); + + binder_stat_br(proc, thread, cmd); + } break; case BINDER_WORK_TRANSACTION_COMPLETE: { + binder_inner_proc_unlock(proc); cmd = BR_TRANSACTION_COMPLETE; if (put_user(cmd, (uint32_t __user *)ptr)) return -EFAULT; @@ -2284,113 +4109,134 @@ static int binder_thread_read(struct binder_proc *proc, binder_debug(BINDER_DEBUG_TRANSACTION_COMPLETE, "%d:%d BR_TRANSACTION_COMPLETE\n", proc->pid, thread->pid); - - list_del(&w->entry); kfree(w); binder_stats_deleted(BINDER_STAT_TRANSACTION_COMPLETE); } break; case BINDER_WORK_NODE: { struct binder_node *node = container_of(w, struct binder_node, work); - uint32_t cmd = BR_NOOP; - const char *cmd_name; - int strong = node->internal_strong_refs || node->local_strong_refs; - int weak = !hlist_empty(&node->refs) || node->local_weak_refs || strong; + int strong, weak; + binder_uintptr_t node_ptr = node->ptr; + binder_uintptr_t node_cookie = node->cookie; + int node_debug_id = node->debug_id; + int has_weak_ref; + int has_strong_ref; + void __user *orig_ptr = ptr; - if (weak && !node->has_weak_ref) { - cmd = BR_INCREFS; - cmd_name = "BR_INCREFS"; + BUG_ON(proc != node->proc); + strong = node->internal_strong_refs || + node->local_strong_refs; + weak = !hlist_empty(&node->refs) || + node->local_weak_refs || + node->tmp_refs || strong; + has_strong_ref = node->has_strong_ref; + has_weak_ref = node->has_weak_ref; + + if (weak && !has_weak_ref) { node->has_weak_ref = 1; node->pending_weak_ref = 1; node->local_weak_refs++; - } else if (strong && !node->has_strong_ref) { - cmd = BR_ACQUIRE; - cmd_name = "BR_ACQUIRE"; + } + if (strong && !has_strong_ref) { node->has_strong_ref = 1; node->pending_strong_ref = 1; node->local_strong_refs++; - } else if (!strong && node->has_strong_ref) { - cmd = BR_RELEASE; - cmd_name = "BR_RELEASE"; + } + if (!strong && has_strong_ref) node->has_strong_ref = 0; - } else if (!weak && node->has_weak_ref) { - cmd = BR_DECREFS; - cmd_name = "BR_DECREFS"; + if (!weak && has_weak_ref) node->has_weak_ref = 0; - } - if (cmd != BR_NOOP) { - if (put_user(cmd, (uint32_t __user *)ptr)) - return -EFAULT; - ptr += sizeof(uint32_t); - if (put_user(node->ptr, - (binder_uintptr_t __user *)ptr)) - return -EFAULT; - ptr += sizeof(binder_uintptr_t); - if (put_user(node->cookie, - (binder_uintptr_t __user *)ptr)) - return -EFAULT; - ptr += sizeof(binder_uintptr_t); + if (!weak && !strong) { + binder_debug(BINDER_DEBUG_INTERNAL_REFS, + "%d:%d node %d u%016llx c%016llx deleted\n", + proc->pid, thread->pid, + node_debug_id, + (u64)node_ptr, + (u64)node_cookie); + rb_erase(&node->rb_node, &proc->nodes); + binder_inner_proc_unlock(proc); + binder_node_lock(node); + /* + * Acquire the node lock before freeing the + * node to serialize with other threads that + * may have been holding the node lock while + * decrementing this node (avoids race where + * this thread frees while the other thread + * is unlocking the node after the final + * decrement) + */ + binder_node_unlock(node); + binder_free_node(node); + } else + binder_inner_proc_unlock(proc); - binder_stat_br(proc, thread, cmd); - binder_debug(BINDER_DEBUG_USER_REFS, - "%d:%d %s %d u%016llx c%016llx\n", - proc->pid, thread->pid, cmd_name, - node->debug_id, - (u64)node->ptr, (u64)node->cookie); - } else { - list_del_init(&w->entry); - if (!weak && !strong) { - binder_debug(BINDER_DEBUG_INTERNAL_REFS, - "%d:%d node %d u%016llx c%016llx deleted\n", - proc->pid, thread->pid, - node->debug_id, - (u64)node->ptr, - (u64)node->cookie); - rb_erase(&node->rb_node, &proc->nodes); - kfree(node); - binder_stats_deleted(BINDER_STAT_NODE); - } else { - binder_debug(BINDER_DEBUG_INTERNAL_REFS, - "%d:%d node %d u%016llx c%016llx state unchanged\n", - proc->pid, thread->pid, - node->debug_id, - (u64)node->ptr, - (u64)node->cookie); - } - } + if (weak && !has_weak_ref) + ret = binder_put_node_cmd( + proc, thread, &ptr, node_ptr, + node_cookie, node_debug_id, + BR_INCREFS, "BR_INCREFS"); + if (!ret && strong && !has_strong_ref) + ret = binder_put_node_cmd( + proc, thread, &ptr, node_ptr, + node_cookie, node_debug_id, + BR_ACQUIRE, "BR_ACQUIRE"); + if (!ret && !strong && has_strong_ref) + ret = binder_put_node_cmd( + proc, thread, &ptr, node_ptr, + node_cookie, node_debug_id, + BR_RELEASE, "BR_RELEASE"); + if (!ret && !weak && has_weak_ref) + ret = binder_put_node_cmd( + proc, thread, &ptr, node_ptr, + node_cookie, node_debug_id, + BR_DECREFS, "BR_DECREFS"); + if (orig_ptr == ptr) + binder_debug(BINDER_DEBUG_INTERNAL_REFS, + "%d:%d node %d u%016llx c%016llx state unchanged\n", + proc->pid, thread->pid, + node_debug_id, + (u64)node_ptr, + (u64)node_cookie); + if (ret) + return ret; } break; case BINDER_WORK_DEAD_BINDER: case BINDER_WORK_DEAD_BINDER_AND_CLEAR: case BINDER_WORK_CLEAR_DEATH_NOTIFICATION: { struct binder_ref_death *death; uint32_t cmd; + binder_uintptr_t cookie; death = container_of(w, struct binder_ref_death, work); if (w->type == BINDER_WORK_CLEAR_DEATH_NOTIFICATION) cmd = BR_CLEAR_DEATH_NOTIFICATION_DONE; else cmd = BR_DEAD_BINDER; - if (put_user(cmd, (uint32_t __user *)ptr)) - return -EFAULT; - ptr += sizeof(uint32_t); - if (put_user(death->cookie, - (binder_uintptr_t __user *)ptr)) - return -EFAULT; - ptr += sizeof(binder_uintptr_t); - binder_stat_br(proc, thread, cmd); + cookie = death->cookie; + binder_debug(BINDER_DEBUG_DEATH_NOTIFICATION, "%d:%d %s %016llx\n", proc->pid, thread->pid, cmd == BR_DEAD_BINDER ? "BR_DEAD_BINDER" : "BR_CLEAR_DEATH_NOTIFICATION_DONE", - (u64)death->cookie); - + (u64)cookie); if (w->type == BINDER_WORK_CLEAR_DEATH_NOTIFICATION) { - list_del(&w->entry); + binder_inner_proc_unlock(proc); kfree(death); binder_stats_deleted(BINDER_STAT_DEATH); - } else - list_move(&w->entry, &proc->delivered_death); + } else { + binder_enqueue_work_ilocked( + w, &proc->delivered_death); + binder_inner_proc_unlock(proc); + } + if (put_user(cmd, (uint32_t __user *)ptr)) + return -EFAULT; + ptr += sizeof(uint32_t); + if (put_user(cookie, + (binder_uintptr_t __user *)ptr)) + return -EFAULT; + ptr += sizeof(binder_uintptr_t); + binder_stat_br(proc, thread, cmd); if (cmd == BR_DEAD_BINDER) goto done; /* DEAD_BINDER notifications can cause transactions */ } break; @@ -2402,16 +4248,14 @@ static int binder_thread_read(struct binder_proc *proc, BUG_ON(t->buffer == NULL); if (t->buffer->target_node) { struct binder_node *target_node = t->buffer->target_node; + struct binder_priority node_prio; tr.target.ptr = target_node->ptr; tr.cookie = target_node->cookie; - t->saved_priority = task_nice(current); - if (t->priority < target_node->min_priority && - !(t->flags & TF_ONE_WAY)) - binder_set_nice(t->priority); - else if (!(t->flags & TF_ONE_WAY) || - t->saved_priority > target_node->min_priority) - binder_set_nice(target_node->min_priority); + node_prio.sched_policy = target_node->sched_policy; + node_prio.prio = target_node->min_priority; + binder_transaction_priority(current, t, node_prio, + target_node->inherit_rt); cmd = BR_TRANSACTION; } else { tr.target.ptr = 0; @@ -2422,8 +4266,9 @@ static int binder_thread_read(struct binder_proc *proc, tr.flags = t->flags; tr.sender_euid = from_kuid(current_user_ns(), t->sender_euid); - if (t->from) { - struct task_struct *sender = t->from->proc->tsk; + t_from = binder_get_txn_from(t); + if (t_from) { + struct task_struct *sender = t_from->proc->tsk; tr.sender_pid = task_tgid_nr_ns(sender, task_active_pid_ns(current)); @@ -2433,18 +4278,32 @@ static int binder_thread_read(struct binder_proc *proc, tr.data_size = t->buffer->data_size; tr.offsets_size = t->buffer->offsets_size; - tr.data.ptr.buffer = (binder_uintptr_t)( - (uintptr_t)t->buffer->data + - proc->user_buffer_offset); + tr.data.ptr.buffer = (binder_uintptr_t) + ((uintptr_t)t->buffer->data + + binder_alloc_get_user_buffer_offset(&proc->alloc)); tr.data.ptr.offsets = tr.data.ptr.buffer + ALIGN(t->buffer->data_size, sizeof(void *)); - if (put_user(cmd, (uint32_t __user *)ptr)) + if (put_user(cmd, (uint32_t __user *)ptr)) { + if (t_from) + binder_thread_dec_tmpref(t_from); + + binder_cleanup_transaction(t, "put_user failed", + BR_FAILED_REPLY); + return -EFAULT; + } ptr += sizeof(uint32_t); - if (copy_to_user(ptr, &tr, sizeof(tr))) + if (copy_to_user(ptr, &tr, sizeof(tr))) { + if (t_from) + binder_thread_dec_tmpref(t_from); + + binder_cleanup_transaction(t, "copy_to_user failed", + BR_FAILED_REPLY); + return -EFAULT; + } ptr += sizeof(tr); trace_binder_transaction_received(t); @@ -2454,21 +4313,22 @@ static int binder_thread_read(struct binder_proc *proc, proc->pid, thread->pid, (cmd == BR_TRANSACTION) ? "BR_TRANSACTION" : "BR_REPLY", - t->debug_id, t->from ? t->from->proc->pid : 0, - t->from ? t->from->pid : 0, cmd, + t->debug_id, t_from ? t_from->proc->pid : 0, + t_from ? t_from->pid : 0, cmd, t->buffer->data_size, t->buffer->offsets_size, (u64)tr.data.ptr.buffer, (u64)tr.data.ptr.offsets); - list_del(&t->work.entry); + if (t_from) + binder_thread_dec_tmpref(t_from); t->buffer->allow_user_free = 1; if (cmd == BR_TRANSACTION && !(t->flags & TF_ONE_WAY)) { + binder_inner_proc_lock(thread->proc); t->to_parent = thread->transaction_stack; t->to_thread = thread; thread->transaction_stack = t; + binder_inner_proc_unlock(thread->proc); } else { - t->buffer->transaction = NULL; - kfree(t); - binder_stats_deleted(BINDER_STAT_TRANSACTION); + binder_free_transaction(t); } break; } @@ -2476,45 +4336,52 @@ static int binder_thread_read(struct binder_proc *proc, done: *consumed = ptr - buffer; - if (proc->requested_threads + proc->ready_threads == 0 && + binder_inner_proc_lock(proc); + if (proc->requested_threads == 0 && + list_empty(&thread->proc->waiting_threads) && proc->requested_threads_started < proc->max_threads && (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) /* the user-space code fails to */ /*spawn a new thread if we leave this out */) { proc->requested_threads++; + binder_inner_proc_unlock(proc); binder_debug(BINDER_DEBUG_THREADS, "%d:%d BR_SPAWN_LOOPER\n", proc->pid, thread->pid); if (put_user(BR_SPAWN_LOOPER, (uint32_t __user *)buffer)) return -EFAULT; binder_stat_br(proc, thread, BR_SPAWN_LOOPER); - } + } else + binder_inner_proc_unlock(proc); return 0; } -static void binder_release_work(struct list_head *list) +static void binder_release_work(struct binder_proc *proc, + struct list_head *list) { struct binder_work *w; - while (!list_empty(list)) { - w = list_first_entry(list, struct binder_work, entry); - list_del_init(&w->entry); + while (1) { + w = binder_dequeue_work_head(proc, list); + if (!w) + return; + switch (w->type) { case BINDER_WORK_TRANSACTION: { struct binder_transaction *t; t = container_of(w, struct binder_transaction, work); - if (t->buffer->target_node && - !(t->flags & TF_ONE_WAY)) { - binder_send_failed_reply(t, BR_DEAD_REPLY); - } else { - binder_debug(BINDER_DEBUG_DEAD_TRANSACTION, - "undelivered transaction %d\n", - t->debug_id); - t->buffer->transaction = NULL; - kfree(t); - binder_stats_deleted(BINDER_STAT_TRANSACTION); - } + + binder_cleanup_transaction(t, "process died.", + BR_DEAD_REPLY); + } break; + case BINDER_WORK_RETURN_ERROR: { + struct binder_error *e = container_of( + w, struct binder_error, work); + + binder_debug(BINDER_DEBUG_DEAD_TRANSACTION, + "undelivered TRANSACTION_ERROR: %u\n", + e->cmd); } break; case BINDER_WORK_TRANSACTION_COMPLETE: { binder_debug(BINDER_DEBUG_DEAD_TRANSACTION, @@ -2542,7 +4409,8 @@ static void binder_release_work(struct list_head *list) } -static struct binder_thread *binder_get_thread(struct binder_proc *proc) +static struct binder_thread *binder_get_thread_ilocked( + struct binder_proc *proc, struct binder_thread *new_thread) { struct binder_thread *thread = NULL; struct rb_node *parent = NULL; @@ -2557,38 +4425,102 @@ static struct binder_thread *binder_get_thread(struct binder_proc *proc) else if (current->pid > thread->pid) p = &(*p)->rb_right; else - break; + return thread; } - if (*p == NULL) { - thread = kzalloc(sizeof(*thread), GFP_KERNEL); - if (thread == NULL) + if (!new_thread) + return NULL; + thread = new_thread; + binder_stats_created(BINDER_STAT_THREAD); + thread->proc = proc; + thread->pid = current->pid; + get_task_struct(current); + thread->task = current; + atomic_set(&thread->tmp_ref, 0); + init_waitqueue_head(&thread->wait); + INIT_LIST_HEAD(&thread->todo); + rb_link_node(&thread->rb_node, parent, p); + rb_insert_color(&thread->rb_node, &proc->threads); + thread->looper_need_return = true; + thread->return_error.work.type = BINDER_WORK_RETURN_ERROR; + thread->return_error.cmd = BR_OK; + thread->reply_error.work.type = BINDER_WORK_RETURN_ERROR; + thread->reply_error.cmd = BR_OK; + INIT_LIST_HEAD(&new_thread->waiting_thread_node); + return thread; +} + +static struct binder_thread *binder_get_thread(struct binder_proc *proc) +{ + struct binder_thread *thread; + struct binder_thread *new_thread; + + binder_inner_proc_lock(proc); + thread = binder_get_thread_ilocked(proc, NULL); + binder_inner_proc_unlock(proc); + if (!thread) { + new_thread = kzalloc(sizeof(*thread), GFP_KERNEL); + if (new_thread == NULL) return NULL; - binder_stats_created(BINDER_STAT_THREAD); - thread->proc = proc; - thread->pid = current->pid; - init_waitqueue_head(&thread->wait); - INIT_LIST_HEAD(&thread->todo); - rb_link_node(&thread->rb_node, parent, p); - rb_insert_color(&thread->rb_node, &proc->threads); - thread->looper |= BINDER_LOOPER_STATE_NEED_RETURN; - thread->return_error = BR_OK; - thread->return_error2 = BR_OK; + binder_inner_proc_lock(proc); + thread = binder_get_thread_ilocked(proc, new_thread); + binder_inner_proc_unlock(proc); + if (thread != new_thread) + kfree(new_thread); } return thread; } -static int binder_free_thread(struct binder_proc *proc, - struct binder_thread *thread) +static void binder_free_proc(struct binder_proc *proc) +{ + BUG_ON(!list_empty(&proc->todo)); + BUG_ON(!list_empty(&proc->delivered_death)); + binder_alloc_deferred_release(&proc->alloc); + put_task_struct(proc->tsk); + binder_stats_deleted(BINDER_STAT_PROC); + kfree(proc); +} + +static void binder_free_thread(struct binder_thread *thread) +{ + BUG_ON(!list_empty(&thread->todo)); + binder_stats_deleted(BINDER_STAT_THREAD); + binder_proc_dec_tmpref(thread->proc); + put_task_struct(thread->task); + kfree(thread); +} + +static int binder_thread_release(struct binder_proc *proc, + struct binder_thread *thread) { struct binder_transaction *t; struct binder_transaction *send_reply = NULL; int active_transactions = 0; + struct binder_transaction *last_t = NULL; + binder_inner_proc_lock(thread->proc); + /* + * take a ref on the proc so it survives + * after we remove this thread from proc->threads. + * The corresponding dec is when we actually + * free the thread in binder_free_thread() + */ + proc->tmp_ref++; + /* + * take a ref on this thread to ensure it + * survives while we are releasing it + */ + atomic_inc(&thread->tmp_ref); rb_erase(&thread->rb_node, &proc->threads); t = thread->transaction_stack; - if (t && t->to_thread == thread) - send_reply = t; + if (t) { + spin_lock(&t->lock); + if (t->to_thread == thread) + send_reply = t; + } + thread->is_dead = true; + while (t) { + last_t = t; active_transactions++; binder_debug(BINDER_DEBUG_DEAD_TRANSACTION, "release %d:%d transaction %d %s, still active\n", @@ -2609,12 +4541,37 @@ static int binder_free_thread(struct binder_proc *proc, t = t->from_parent; } else BUG(); + spin_unlock(&last_t->lock); + if (t) + spin_lock(&t->lock); } + + /* + * If this thread used poll, make sure we remove the waitqueue + * from any epoll data structures holding it with POLLFREE. + * waitqueue_active() is safe to use here because we're holding + * the inner lock. + */ + if ((thread->looper & BINDER_LOOPER_STATE_POLL) && + waitqueue_active(&thread->wait)) { + wake_up_poll(&thread->wait, POLLHUP | POLLFREE); + } + + binder_inner_proc_unlock(thread->proc); + + /* + * This is needed to avoid races between wake_up_poll() above and + * and ep_remove_waitqueue() called for other reasons (eg the epoll file + * descriptor being closed); ep_remove_waitqueue() holds an RCU read + * lock, so we can be sure it's done after calling synchronize_rcu(). + */ + if (thread->looper & BINDER_LOOPER_STATE_POLL) + synchronize_rcu(); + if (send_reply) binder_send_failed_reply(send_reply, BR_DEAD_REPLY); - binder_release_work(&thread->todo); - kfree(thread); - binder_stats_deleted(BINDER_STAT_THREAD); + binder_release_work(proc, &thread->todo); + binder_thread_dec_tmpref(thread); return active_transactions; } @@ -2623,34 +4580,23 @@ static unsigned int binder_poll(struct file *filp, { struct binder_proc *proc = filp->private_data; struct binder_thread *thread = NULL; - int wait_for_proc_work; - - binder_lock(__func__); + bool wait_for_proc_work; thread = binder_get_thread(proc); - if (!thread) { - binder_unlock(__func__); + if (!thread) return POLLERR; - } - wait_for_proc_work = thread->transaction_stack == NULL && - list_empty(&thread->todo) && thread->return_error == BR_OK; + binder_inner_proc_lock(thread->proc); + thread->looper |= BINDER_LOOPER_STATE_POLL; + wait_for_proc_work = binder_available_for_proc_work_ilocked(thread); - binder_unlock(__func__); + binder_inner_proc_unlock(thread->proc); - if (wait_for_proc_work) { - if (binder_has_proc_work(proc, thread)) - return POLLIN; - poll_wait(filp, &proc->wait, wait); - if (binder_has_proc_work(proc, thread)) - return POLLIN; - } else { - if (binder_has_thread_work(thread)) - return POLLIN; - poll_wait(filp, &thread->wait, wait); - if (binder_has_thread_work(thread)) - return POLLIN; - } + poll_wait(filp, &thread->wait, wait); + + if (binder_has_work(thread, wait_for_proc_work)) + return POLLIN; + return 0; } @@ -2697,8 +4643,10 @@ static int binder_ioctl_write_read(struct file *filp, &bwr.read_consumed, filp->f_flags & O_NONBLOCK); trace_binder_read_done(ret); - if (!list_empty(&proc->todo)) - wake_up_interruptible(&proc->wait); + binder_inner_proc_lock(proc); + if (!binder_worklist_empty_ilocked(&proc->todo)) + binder_wakeup_proc_ilocked(proc); + binder_inner_proc_unlock(proc); if (ret < 0) { if (copy_to_user(ubuf, &bwr, sizeof(bwr))) ret = -EFAULT; @@ -2722,9 +4670,12 @@ static int binder_ioctl_set_ctx_mgr(struct file *filp) { int ret = 0; struct binder_proc *proc = filp->private_data; + struct binder_context *context = proc->context; + struct binder_node *new_node; kuid_t curr_euid = current_euid(); - if (binder_context_mgr_node != NULL) { + mutex_lock(&context->context_mgr_node_lock); + if (context->binder_context_mgr_node) { pr_err("BINDER_SET_CONTEXT_MGR already set\n"); ret = -EBUSY; goto out; @@ -2732,31 +4683,96 @@ static int binder_ioctl_set_ctx_mgr(struct file *filp) ret = security_binder_set_context_mgr(proc->tsk); if (ret < 0) goto out; - if (uid_valid(binder_context_mgr_uid)) { - if (!uid_eq(binder_context_mgr_uid, curr_euid)) { + if (uid_valid(context->binder_context_mgr_uid)) { + if (!uid_eq(context->binder_context_mgr_uid, curr_euid)) { pr_err("BINDER_SET_CONTEXT_MGR bad uid %d != %d\n", from_kuid(&init_user_ns, curr_euid), from_kuid(&init_user_ns, - binder_context_mgr_uid)); + context->binder_context_mgr_uid)); ret = -EPERM; goto out; } } else { - binder_context_mgr_uid = curr_euid; + context->binder_context_mgr_uid = curr_euid; } - binder_context_mgr_node = binder_new_node(proc, 0, 0); - if (binder_context_mgr_node == NULL) { + new_node = binder_new_node(proc, NULL); + if (!new_node) { ret = -ENOMEM; goto out; } - binder_context_mgr_node->local_weak_refs++; - binder_context_mgr_node->local_strong_refs++; - binder_context_mgr_node->has_strong_ref = 1; - binder_context_mgr_node->has_weak_ref = 1; + binder_node_lock(new_node); + new_node->local_weak_refs++; + new_node->local_strong_refs++; + new_node->has_strong_ref = 1; + new_node->has_weak_ref = 1; + context->binder_context_mgr_node = new_node; + binder_node_unlock(new_node); + binder_put_node(new_node); out: + mutex_unlock(&context->context_mgr_node_lock); return ret; } +static int binder_ioctl_get_node_info_for_ref(struct binder_proc *proc, + struct binder_node_info_for_ref *info) +{ + struct binder_node *node; + struct binder_context *context = proc->context; + __u32 handle = info->handle; + + if (info->strong_count || info->weak_count || info->reserved1 || + info->reserved2 || info->reserved3) { + binder_user_error("%d BINDER_GET_NODE_INFO_FOR_REF: only handle may be non-zero.", + proc->pid); + return -EINVAL; + } + + /* This ioctl may only be used by the context manager */ + mutex_lock(&context->context_mgr_node_lock); + if (!context->binder_context_mgr_node || + context->binder_context_mgr_node->proc != proc) { + mutex_unlock(&context->context_mgr_node_lock); + return -EPERM; + } + mutex_unlock(&context->context_mgr_node_lock); + + node = binder_get_node_from_ref(proc, handle, true, NULL); + if (!node) + return -EINVAL; + + info->strong_count = node->local_strong_refs + + node->internal_strong_refs; + info->weak_count = node->local_weak_refs; + + binder_put_node(node); + + return 0; +} + +static int binder_ioctl_get_node_debug_info(struct binder_proc *proc, + struct binder_node_debug_info *info) { + struct rb_node *n; + binder_uintptr_t ptr = info->ptr; + + memset(info, 0, sizeof(*info)); + + binder_inner_proc_lock(proc); + for (n = rb_first(&proc->nodes); n != NULL; n = rb_next(n)) { + struct binder_node *node = rb_entry(n, struct binder_node, + rb_node); + if (node->ptr > ptr) { + info->ptr = node->ptr; + info->cookie = node->cookie; + info->has_strong_ref = node->has_strong_ref; + info->has_weak_ref = node->has_weak_ref; + break; + } + } + binder_inner_proc_unlock(proc); + + return 0; +} + static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) { int ret; @@ -2768,13 +4784,14 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) /*pr_info("binder_ioctl: %d:%d %x %lx\n", proc->pid, current->pid, cmd, arg);*/ + binder_selftest_alloc(&proc->alloc); + trace_binder_ioctl(cmd, arg); ret = wait_event_interruptible(binder_user_error_wait, binder_stop_on_user_error < 2); if (ret) goto err_unlocked; - binder_lock(__func__); thread = binder_get_thread(proc); if (thread == NULL) { ret = -ENOMEM; @@ -2787,12 +4804,19 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) if (ret) goto err; break; - case BINDER_SET_MAX_THREADS: - if (copy_from_user(&proc->max_threads, ubuf, sizeof(proc->max_threads))) { + case BINDER_SET_MAX_THREADS: { + int max_threads; + + if (copy_from_user(&max_threads, ubuf, + sizeof(max_threads))) { ret = -EINVAL; goto err; } + binder_inner_proc_lock(proc); + proc->max_threads = max_threads; + binder_inner_proc_unlock(proc); break; + } case BINDER_SET_CONTEXT_MGR: ret = binder_ioctl_set_ctx_mgr(filp); if (ret) @@ -2801,7 +4825,7 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) case BINDER_THREAD_EXIT: binder_debug(BINDER_DEBUG_THREADS, "%d:%d exit\n", proc->pid, thread->pid); - binder_free_thread(proc, thread); + binder_thread_release(proc, thread); thread = NULL; break; case BINDER_VERSION: { @@ -2818,6 +4842,43 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) } break; } + case BINDER_GET_NODE_INFO_FOR_REF: { + struct binder_node_info_for_ref info; + + if (copy_from_user(&info, ubuf, sizeof(info))) { + ret = -EFAULT; + goto err; + } + + ret = binder_ioctl_get_node_info_for_ref(proc, &info); + if (ret < 0) + goto err; + + if (copy_to_user(ubuf, &info, sizeof(info))) { + ret = -EFAULT; + goto err; + } + + break; + } + case BINDER_GET_NODE_DEBUG_INFO: { + struct binder_node_debug_info info; + + if (copy_from_user(&info, ubuf, sizeof(info))) { + ret = -EFAULT; + goto err; + } + + ret = binder_ioctl_get_node_debug_info(proc, &info); + if (ret < 0) + goto err; + + if (copy_to_user(ubuf, &info, sizeof(info))) { + ret = -EFAULT; + goto err; + } + break; + } default: ret = -EINVAL; goto err; @@ -2825,8 +4886,7 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) ret = 0; err: if (thread) - thread->looper &= ~BINDER_LOOPER_STATE_NEED_RETURN; - binder_unlock(__func__); + thread->looper_need_return = false; wait_event_interruptible(binder_user_error_wait, binder_stop_on_user_error < 2); if (ret && ret != -ERESTARTSYS) pr_info("%d:%d ioctl %x %lx returned %d\n", proc->pid, current->pid, cmd, arg, ret); @@ -2855,8 +4915,7 @@ static void binder_vma_close(struct vm_area_struct *vma) proc->pid, vma->vm_start, vma->vm_end, (vma->vm_end - vma->vm_start) / SZ_1K, vma->vm_flags, (unsigned long)pgprot_val(vma->vm_page_prot)); - proc->vma = NULL; - proc->vma_vm_mm = NULL; + binder_alloc_vma_close(&proc->alloc); binder_defer_work(proc, BINDER_DEFERRED_PUT_FILES); } @@ -2874,10 +4933,8 @@ static const struct vm_operations_struct binder_vm_ops = { static int binder_mmap(struct file *filp, struct vm_area_struct *vma) { int ret; - struct vm_struct *area; struct binder_proc *proc = filp->private_data; const char *failure_string; - struct binder_buffer *buffer; if (proc->tsk != current->group_leader) return -EINVAL; @@ -2886,8 +4943,8 @@ static int binder_mmap(struct file *filp, struct vm_area_struct *vma) vma->vm_end = vma->vm_start + SZ_4M; binder_debug(BINDER_DEBUG_OPEN_CLOSE, - "binder_mmap: %d %lx-%lx (%ld K) vma %lx pagep %lx\n", - proc->pid, vma->vm_start, vma->vm_end, + "%s: %d %lx-%lx (%ld K) vma %lx pagep %lx\n", + __func__, proc->pid, vma->vm_start, vma->vm_end, (vma->vm_end - vma->vm_start) / SZ_1K, vma->vm_flags, (unsigned long)pgprot_val(vma->vm_page_prot)); @@ -2896,76 +4953,22 @@ static int binder_mmap(struct file *filp, struct vm_area_struct *vma) failure_string = "bad vm_flags"; goto err_bad_arg; } - vma->vm_flags = (vma->vm_flags | VM_DONTCOPY) & ~VM_MAYWRITE; - - mutex_lock(&binder_mmap_lock); - if (proc->buffer) { - ret = -EBUSY; - failure_string = "already mapped"; - goto err_already_mapped; - } - - area = get_vm_area(vma->vm_end - vma->vm_start, VM_IOREMAP); - if (area == NULL) { - ret = -ENOMEM; - failure_string = "get_vm_area"; - goto err_get_vm_area_failed; - } - proc->buffer = area->addr; - proc->user_buffer_offset = vma->vm_start - (uintptr_t)proc->buffer; - mutex_unlock(&binder_mmap_lock); - -#ifdef CONFIG_CPU_CACHE_VIPT - if (cache_is_vipt_aliasing()) { - while (CACHE_COLOUR((vma->vm_start ^ (uint32_t)proc->buffer))) { - pr_info("binder_mmap: %d %lx-%lx maps %p bad alignment\n", proc->pid, vma->vm_start, vma->vm_end, proc->buffer); - vma->vm_start += PAGE_SIZE; - } - } -#endif - proc->pages = kzalloc(sizeof(proc->pages[0]) * ((vma->vm_end - vma->vm_start) / PAGE_SIZE), GFP_KERNEL); - if (proc->pages == NULL) { - ret = -ENOMEM; - failure_string = "alloc page array"; - goto err_alloc_pages_failed; - } - proc->buffer_size = vma->vm_end - vma->vm_start; + vma->vm_flags |= VM_DONTCOPY | VM_MIXEDMAP; + vma->vm_flags &= ~VM_MAYWRITE; vma->vm_ops = &binder_vm_ops; vma->vm_private_data = proc; - if (binder_update_page_range(proc, 1, proc->buffer, proc->buffer + PAGE_SIZE, vma)) { - ret = -ENOMEM; - failure_string = "alloc small buf"; - goto err_alloc_small_buf_failed; - } - buffer = proc->buffer; - INIT_LIST_HEAD(&proc->buffers); - list_add(&buffer->entry, &proc->buffers); - buffer->free = 1; - binder_insert_free_buffer(proc, buffer); - proc->free_async_space = proc->buffer_size / 2; - barrier(); + ret = binder_alloc_mmap_handler(&proc->alloc, vma); + if (ret) + return ret; + mutex_lock(&proc->files_lock); proc->files = get_files_struct(current); - proc->vma = vma; - proc->vma_vm_mm = vma->vm_mm; - - /*pr_info("binder_mmap: %d %lx-%lx maps %p\n", - proc->pid, vma->vm_start, vma->vm_end, proc->buffer);*/ + mutex_unlock(&proc->files_lock); return 0; -err_alloc_small_buf_failed: - kfree(proc->pages); - proc->pages = NULL; -err_alloc_pages_failed: - mutex_lock(&binder_mmap_lock); - vfree(proc->buffer); - proc->buffer = NULL; -err_get_vm_area_failed: -err_already_mapped: - mutex_unlock(&binder_mmap_lock); err_bad_arg: - pr_err("binder_mmap: %d %lx-%lx %s failed %d\n", + pr_err("%s: %d %lx-%lx %s failed %d\n", __func__, proc->pid, vma->vm_start, vma->vm_end, failure_string, ret); return ret; } @@ -2973,35 +4976,58 @@ static int binder_mmap(struct file *filp, struct vm_area_struct *vma) static int binder_open(struct inode *nodp, struct file *filp) { struct binder_proc *proc; + struct binder_device *binder_dev; - binder_debug(BINDER_DEBUG_OPEN_CLOSE, "binder_open: %d:%d\n", + binder_debug(BINDER_DEBUG_OPEN_CLOSE, "%s: %d:%d\n", __func__, current->group_leader->pid, current->pid); proc = kzalloc(sizeof(*proc), GFP_KERNEL); if (proc == NULL) return -ENOMEM; + spin_lock_init(&proc->inner_lock); + spin_lock_init(&proc->outer_lock); get_task_struct(current->group_leader); proc->tsk = current->group_leader; + mutex_init(&proc->files_lock); INIT_LIST_HEAD(&proc->todo); - init_waitqueue_head(&proc->wait); - proc->default_priority = task_nice(current); + if (binder_supported_policy(current->policy)) { + proc->default_priority.sched_policy = current->policy; + proc->default_priority.prio = current->normal_prio; + } else { + proc->default_priority.sched_policy = SCHED_NORMAL; + proc->default_priority.prio = NICE_TO_PRIO(0); + } - binder_lock(__func__); + binder_dev = container_of(filp->private_data, struct binder_device, + miscdev); + proc->context = &binder_dev->context; + binder_alloc_init(&proc->alloc); binder_stats_created(BINDER_STAT_PROC); - hlist_add_head(&proc->proc_node, &binder_procs); proc->pid = current->group_leader->pid; INIT_LIST_HEAD(&proc->delivered_death); + INIT_LIST_HEAD(&proc->waiting_threads); filp->private_data = proc; - binder_unlock(__func__); + mutex_lock(&binder_procs_lock); + hlist_add_head(&proc->proc_node, &binder_procs); + mutex_unlock(&binder_procs_lock); if (binder_debugfs_dir_entry_proc) { char strbuf[11]; snprintf(strbuf, sizeof(strbuf), "%u", proc->pid); - proc->debugfs_entry = debugfs_create_file(strbuf, S_IRUGO, - binder_debugfs_dir_entry_proc, proc, &binder_proc_fops); + /* + * proc debug entries are shared between contexts, so + * this will fail if the process tries to open the driver + * again with a different context. The priting code will + * anyway print all contexts that a given PID has, so this + * is not a problem. + */ + proc->debugfs_entry = debugfs_create_file(strbuf, 0444, + binder_debugfs_dir_entry_proc, + (void *)(unsigned long)proc->pid, + &binder_proc_fops); } return 0; @@ -3021,16 +5047,17 @@ static void binder_deferred_flush(struct binder_proc *proc) struct rb_node *n; int wake_count = 0; + binder_inner_proc_lock(proc); for (n = rb_first(&proc->threads); n != NULL; n = rb_next(n)) { struct binder_thread *thread = rb_entry(n, struct binder_thread, rb_node); - thread->looper |= BINDER_LOOPER_STATE_NEED_RETURN; + thread->looper_need_return = true; if (thread->looper & BINDER_LOOPER_STATE_WAITING) { wake_up_interruptible(&thread->wait); wake_count++; } } - wake_up_interruptible_all(&proc->wait); + binder_inner_proc_unlock(proc); binder_debug(BINDER_DEBUG_OPEN_CLOSE, "binder_flush: %d woke %d threads\n", proc->pid, @@ -3051,13 +5078,21 @@ static int binder_node_release(struct binder_node *node, int refs) { struct binder_ref *ref; int death = 0; + struct binder_proc *proc = node->proc; - list_del_init(&node->work.entry); - binder_release_work(&node->async_todo); + binder_release_work(proc, &node->async_todo); - if (hlist_empty(&node->refs)) { - kfree(node); - binder_stats_deleted(BINDER_STAT_NODE); + binder_node_lock(node); + binder_inner_proc_lock(proc); + binder_dequeue_work_ilocked(&node->work); + /* + * The caller must have taken a temporary ref on the node, + */ + BUG_ON(!node->tmp_refs); + if (hlist_empty(&node->refs) && node->tmp_refs == 1) { + binder_inner_proc_unlock(proc); + binder_node_unlock(node); + binder_free_node(node); return refs; } @@ -3065,59 +5100,84 @@ static int binder_node_release(struct binder_node *node, int refs) node->proc = NULL; node->local_strong_refs = 0; node->local_weak_refs = 0; + binder_inner_proc_unlock(proc); + + spin_lock(&binder_dead_nodes_lock); hlist_add_head(&node->dead_node, &binder_dead_nodes); + spin_unlock(&binder_dead_nodes_lock); hlist_for_each_entry(ref, &node->refs, node_entry) { refs++; - - if (!ref->death) + /* + * Need the node lock to synchronize + * with new notification requests and the + * inner lock to synchronize with queued + * death notifications. + */ + binder_inner_proc_lock(ref->proc); + if (!ref->death) { + binder_inner_proc_unlock(ref->proc); continue; + } death++; - if (list_empty(&ref->death->work.entry)) { - ref->death->work.type = BINDER_WORK_DEAD_BINDER; - list_add_tail(&ref->death->work.entry, - &ref->proc->todo); - wake_up_interruptible(&ref->proc->wait); - } else - BUG(); + BUG_ON(!list_empty(&ref->death->work.entry)); + ref->death->work.type = BINDER_WORK_DEAD_BINDER; + binder_enqueue_work_ilocked(&ref->death->work, + &ref->proc->todo); + binder_wakeup_proc_ilocked(ref->proc); + binder_inner_proc_unlock(ref->proc); } binder_debug(BINDER_DEBUG_DEAD_BINDER, "node %d now dead, refs %d, death %d\n", node->debug_id, refs, death); + binder_node_unlock(node); + binder_put_node(node); return refs; } static void binder_deferred_release(struct binder_proc *proc) { - struct binder_transaction *t; + struct binder_context *context = proc->context; struct rb_node *n; - int threads, nodes, incoming_refs, outgoing_refs, buffers, - active_transactions, page_count; + int threads, nodes, incoming_refs, outgoing_refs, active_transactions; - BUG_ON(proc->vma); BUG_ON(proc->files); + mutex_lock(&binder_procs_lock); hlist_del(&proc->proc_node); + mutex_unlock(&binder_procs_lock); - if (binder_context_mgr_node && binder_context_mgr_node->proc == proc) { + mutex_lock(&context->context_mgr_node_lock); + if (context->binder_context_mgr_node && + context->binder_context_mgr_node->proc == proc) { binder_debug(BINDER_DEBUG_DEAD_BINDER, "%s: %d context_mgr_node gone\n", __func__, proc->pid); - binder_context_mgr_node = NULL; + context->binder_context_mgr_node = NULL; } + mutex_unlock(&context->context_mgr_node_lock); + binder_inner_proc_lock(proc); + /* + * Make sure proc stays alive after we + * remove all the threads + */ + proc->tmp_ref++; + proc->is_dead = true; threads = 0; active_transactions = 0; while ((n = rb_first(&proc->threads))) { struct binder_thread *thread; thread = rb_entry(n, struct binder_thread, rb_node); + binder_inner_proc_unlock(proc); threads++; - active_transactions += binder_free_thread(proc, thread); + active_transactions += binder_thread_release(proc, thread); + binder_inner_proc_lock(proc); } nodes = 0; @@ -3127,73 +5187,42 @@ static void binder_deferred_release(struct binder_proc *proc) node = rb_entry(n, struct binder_node, rb_node); nodes++; + /* + * take a temporary ref on the node before + * calling binder_node_release() which will either + * kfree() the node or call binder_put_node() + */ + binder_inc_node_tmpref_ilocked(node); rb_erase(&node->rb_node, &proc->nodes); + binder_inner_proc_unlock(proc); incoming_refs = binder_node_release(node, incoming_refs); + binder_inner_proc_lock(proc); } + binder_inner_proc_unlock(proc); outgoing_refs = 0; + binder_proc_lock(proc); while ((n = rb_first(&proc->refs_by_desc))) { struct binder_ref *ref; ref = rb_entry(n, struct binder_ref, rb_node_desc); outgoing_refs++; - binder_delete_ref(ref); + binder_cleanup_ref_olocked(ref); + binder_proc_unlock(proc); + binder_free_ref(ref); + binder_proc_lock(proc); } + binder_proc_unlock(proc); - binder_release_work(&proc->todo); - binder_release_work(&proc->delivered_death); - - buffers = 0; - while ((n = rb_first(&proc->allocated_buffers))) { - struct binder_buffer *buffer; - - buffer = rb_entry(n, struct binder_buffer, rb_node); - - t = buffer->transaction; - if (t) { - t->buffer = NULL; - buffer->transaction = NULL; - pr_err("release proc %d, transaction %d, not freed\n", - proc->pid, t->debug_id); - /*BUG();*/ - } - - binder_free_buf(proc, buffer); - buffers++; - } - - binder_stats_deleted(BINDER_STAT_PROC); - - page_count = 0; - if (proc->pages) { - int i; - - for (i = 0; i < proc->buffer_size / PAGE_SIZE; i++) { - void *page_addr; - - if (!proc->pages[i]) - continue; - - page_addr = proc->buffer + i * PAGE_SIZE; - binder_debug(BINDER_DEBUG_BUFFER_ALLOC, - "%s: %d: page %d at %p not freed\n", - __func__, proc->pid, i, page_addr); - unmap_kernel_range((unsigned long)page_addr, PAGE_SIZE); - __free_page(proc->pages[i]); - page_count++; - } - kfree(proc->pages); - vfree(proc->buffer); - } - - put_task_struct(proc->tsk); + binder_release_work(proc, &proc->todo); + binder_release_work(proc, &proc->delivered_death); binder_debug(BINDER_DEBUG_OPEN_CLOSE, - "%s: %d threads %d, nodes %d (ref %d), refs %d, active transactions %d, buffers %d, pages %d\n", + "%s: %d threads %d, nodes %d (ref %d), refs %d, active transactions %d\n", __func__, proc->pid, threads, nodes, incoming_refs, - outgoing_refs, active_transactions, buffers, page_count); + outgoing_refs, active_transactions); - kfree(proc); + binder_proc_dec_tmpref(proc); } static void binder_deferred_func(struct work_struct *work) @@ -3204,7 +5233,6 @@ static void binder_deferred_func(struct work_struct *work) int defer; do { - binder_lock(__func__); mutex_lock(&binder_deferred_lock); if (!hlist_empty(&binder_deferred_list)) { proc = hlist_entry(binder_deferred_list.first, @@ -3220,9 +5248,11 @@ static void binder_deferred_func(struct work_struct *work) files = NULL; if (defer & BINDER_DEFERRED_PUT_FILES) { + mutex_lock(&proc->files_lock); files = proc->files; if (files) proc->files = NULL; + mutex_unlock(&proc->files_lock); } if (defer & BINDER_DEFERRED_FLUSH) @@ -3231,7 +5261,6 @@ static void binder_deferred_func(struct work_struct *work) if (defer & BINDER_DEFERRED_RELEASE) binder_deferred_release(proc); /* frees proc */ - binder_unlock(__func__); if (files) put_files_struct(files); } while (proc); @@ -3251,41 +5280,52 @@ binder_defer_work(struct binder_proc *proc, enum binder_deferred_state defer) mutex_unlock(&binder_deferred_lock); } -static void print_binder_transaction(struct seq_file *m, const char *prefix, - struct binder_transaction *t) +static void print_binder_transaction_ilocked(struct seq_file *m, + struct binder_proc *proc, + const char *prefix, + struct binder_transaction *t) { + struct binder_proc *to_proc; + struct binder_buffer *buffer = t->buffer; + + spin_lock(&t->lock); + to_proc = t->to_proc; seq_printf(m, - "%s %d: %p from %d:%d to %d:%d code %x flags %x pri %ld r%d", + "%s %d: %pK from %d:%d to %d:%d code %x flags %x pri %d:%d r%d", prefix, t->debug_id, t, t->from ? t->from->proc->pid : 0, t->from ? t->from->pid : 0, - t->to_proc ? t->to_proc->pid : 0, + to_proc ? to_proc->pid : 0, t->to_thread ? t->to_thread->pid : 0, - t->code, t->flags, t->priority, t->need_reply); - if (t->buffer == NULL) { + t->code, t->flags, t->priority.sched_policy, + t->priority.prio, t->need_reply); + spin_unlock(&t->lock); + + if (proc != to_proc) { + /* + * Can only safely deref buffer if we are holding the + * correct proc inner lock for this node + */ + seq_puts(m, "\n"); + return; + } + + if (buffer == NULL) { seq_puts(m, " buffer free\n"); return; } - if (t->buffer->target_node) - seq_printf(m, " node %d", - t->buffer->target_node->debug_id); - seq_printf(m, " size %zd:%zd data %p\n", - t->buffer->data_size, t->buffer->offsets_size, - t->buffer->data); -} - -static void print_binder_buffer(struct seq_file *m, const char *prefix, - struct binder_buffer *buffer) -{ - seq_printf(m, "%s %d: %p size %zd:%zd %s\n", - prefix, buffer->debug_id, buffer->data, + if (buffer->target_node) + seq_printf(m, " node %d", buffer->target_node->debug_id); + seq_printf(m, " size %zd:%zd data %pK\n", buffer->data_size, buffer->offsets_size, - buffer->transaction ? "active" : "delivered"); + buffer->data); } -static void print_binder_work(struct seq_file *m, const char *prefix, - const char *transaction_prefix, - struct binder_work *w) +static void print_binder_work_ilocked(struct seq_file *m, + struct binder_proc *proc, + const char *prefix, + const char *transaction_prefix, + struct binder_work *w) { struct binder_node *node; struct binder_transaction *t; @@ -3293,8 +5333,16 @@ static void print_binder_work(struct seq_file *m, const char *prefix, switch (w->type) { case BINDER_WORK_TRANSACTION: t = container_of(w, struct binder_transaction, work); - print_binder_transaction(m, transaction_prefix, t); + print_binder_transaction_ilocked( + m, proc, transaction_prefix, t); break; + case BINDER_WORK_RETURN_ERROR: { + struct binder_error *e = container_of( + w, struct binder_error, work); + + seq_printf(m, "%stransaction error: %u\n", + prefix, e->cmd); + } break; case BINDER_WORK_TRANSACTION_COMPLETE: seq_printf(m, "%stransaction complete\n", prefix); break; @@ -3319,40 +5367,46 @@ static void print_binder_work(struct seq_file *m, const char *prefix, } } -static void print_binder_thread(struct seq_file *m, - struct binder_thread *thread, - int print_always) +static void print_binder_thread_ilocked(struct seq_file *m, + struct binder_thread *thread, + int print_always) { struct binder_transaction *t; struct binder_work *w; size_t start_pos = m->count; size_t header_pos; - seq_printf(m, " thread %d: l %02x\n", thread->pid, thread->looper); + seq_printf(m, " thread %d: l %02x need_return %d tr %d\n", + thread->pid, thread->looper, + thread->looper_need_return, + atomic_read(&thread->tmp_ref)); header_pos = m->count; t = thread->transaction_stack; while (t) { if (t->from == thread) { - print_binder_transaction(m, - " outgoing transaction", t); + print_binder_transaction_ilocked(m, thread->proc, + " outgoing transaction", t); t = t->from_parent; } else if (t->to_thread == thread) { - print_binder_transaction(m, + print_binder_transaction_ilocked(m, thread->proc, " incoming transaction", t); t = t->to_parent; } else { - print_binder_transaction(m, " bad transaction", t); + print_binder_transaction_ilocked(m, thread->proc, + " bad transaction", t); t = NULL; } } list_for_each_entry(w, &thread->todo, entry) { - print_binder_work(m, " ", " pending transaction", w); + print_binder_work_ilocked(m, thread->proc, " ", + " pending transaction", w); } if (!print_always && m->count == header_pos) m->count = start_pos; } -static void print_binder_node(struct seq_file *m, struct binder_node *node) +static void print_binder_node_nilocked(struct seq_file *m, + struct binder_node *node) { struct binder_ref *ref; struct binder_work *w; @@ -3362,27 +5416,35 @@ static void print_binder_node(struct seq_file *m, struct binder_node *node) hlist_for_each_entry(ref, &node->refs, node_entry) count++; - seq_printf(m, " node %d: u%016llx c%016llx hs %d hw %d ls %d lw %d is %d iw %d", + seq_printf(m, " node %d: u%016llx c%016llx pri %d:%d hs %d hw %d ls %d lw %d is %d iw %d tr %d", node->debug_id, (u64)node->ptr, (u64)node->cookie, + node->sched_policy, node->min_priority, node->has_strong_ref, node->has_weak_ref, node->local_strong_refs, node->local_weak_refs, - node->internal_strong_refs, count); + node->internal_strong_refs, count, node->tmp_refs); if (count) { seq_puts(m, " proc"); hlist_for_each_entry(ref, &node->refs, node_entry) seq_printf(m, " %d", ref->proc->pid); } seq_puts(m, "\n"); - list_for_each_entry(w, &node->async_todo, entry) - print_binder_work(m, " ", - " pending async transaction", w); + if (node->proc) { + list_for_each_entry(w, &node->async_todo, entry) + print_binder_work_ilocked(m, node->proc, " ", + " pending async transaction", w); + } } -static void print_binder_ref(struct seq_file *m, struct binder_ref *ref) +static void print_binder_ref_olocked(struct seq_file *m, + struct binder_ref *ref) { - seq_printf(m, " ref %d: desc %d %snode %d s %d w %d d %p\n", - ref->debug_id, ref->desc, ref->node->proc ? "" : "dead ", - ref->node->debug_id, ref->strong, ref->weak, ref->death); + binder_node_lock(ref->node); + seq_printf(m, " ref %d: desc %d %snode %d s %d w %d d %pK\n", + ref->data.debug_id, ref->data.desc, + ref->node->proc ? "" : "dead ", + ref->node->debug_id, ref->data.strong, + ref->data.weak, ref->death); + binder_node_unlock(ref->node); } static void print_binder_proc(struct seq_file *m, @@ -3392,35 +5454,60 @@ static void print_binder_proc(struct seq_file *m, struct rb_node *n; size_t start_pos = m->count; size_t header_pos; + struct binder_node *last_node = NULL; seq_printf(m, "proc %d\n", proc->pid); + seq_printf(m, "context %s\n", proc->context->name); header_pos = m->count; + binder_inner_proc_lock(proc); for (n = rb_first(&proc->threads); n != NULL; n = rb_next(n)) - print_binder_thread(m, rb_entry(n, struct binder_thread, + print_binder_thread_ilocked(m, rb_entry(n, struct binder_thread, rb_node), print_all); + for (n = rb_first(&proc->nodes); n != NULL; n = rb_next(n)) { struct binder_node *node = rb_entry(n, struct binder_node, rb_node); - if (print_all || node->has_async_transaction) - print_binder_node(m, node); + /* + * take a temporary reference on the node so it + * survives and isn't removed from the tree + * while we print it. + */ + binder_inc_node_tmpref_ilocked(node); + /* Need to drop inner lock to take node lock */ + binder_inner_proc_unlock(proc); + if (last_node) + binder_put_node(last_node); + binder_node_inner_lock(node); + print_binder_node_nilocked(m, node); + binder_node_inner_unlock(node); + last_node = node; + binder_inner_proc_lock(proc); } + binder_inner_proc_unlock(proc); + if (last_node) + binder_put_node(last_node); + if (print_all) { + binder_proc_lock(proc); for (n = rb_first(&proc->refs_by_desc); n != NULL; n = rb_next(n)) - print_binder_ref(m, rb_entry(n, struct binder_ref, - rb_node_desc)); + print_binder_ref_olocked(m, rb_entry(n, + struct binder_ref, + rb_node_desc)); + binder_proc_unlock(proc); } - for (n = rb_first(&proc->allocated_buffers); n != NULL; n = rb_next(n)) - print_binder_buffer(m, " buffer", - rb_entry(n, struct binder_buffer, rb_node)); + binder_alloc_print_allocated(m, &proc->alloc); + binder_inner_proc_lock(proc); list_for_each_entry(w, &proc->todo, entry) - print_binder_work(m, " ", " pending transaction", w); + print_binder_work_ilocked(m, proc, " ", + " pending transaction", w); list_for_each_entry(w, &proc->delivered_death, entry) { seq_puts(m, " has delivered dead binder\n"); break; } + binder_inner_proc_unlock(proc); if (!print_all && m->count == header_pos) m->count = start_pos; } @@ -3463,7 +5550,9 @@ static const char * const binder_command_strings[] = { "BC_EXIT_LOOPER", "BC_REQUEST_DEATH_NOTIFICATION", "BC_CLEAR_DEATH_NOTIFICATION", - "BC_DEAD_BINDER_DONE" + "BC_DEAD_BINDER_DONE", + "BC_TRANSACTION_SG", + "BC_REPLY_SG", }; static const char * const binder_objstat_strings[] = { @@ -3484,17 +5573,21 @@ static void print_binder_stats(struct seq_file *m, const char *prefix, BUILD_BUG_ON(ARRAY_SIZE(stats->bc) != ARRAY_SIZE(binder_command_strings)); for (i = 0; i < ARRAY_SIZE(stats->bc); i++) { - if (stats->bc[i]) + int temp = atomic_read(&stats->bc[i]); + + if (temp) seq_printf(m, "%s%s: %d\n", prefix, - binder_command_strings[i], stats->bc[i]); + binder_command_strings[i], temp); } BUILD_BUG_ON(ARRAY_SIZE(stats->br) != ARRAY_SIZE(binder_return_strings)); for (i = 0; i < ARRAY_SIZE(stats->br); i++) { - if (stats->br[i]) + int temp = atomic_read(&stats->br[i]); + + if (temp) seq_printf(m, "%s%s: %d\n", prefix, - binder_return_strings[i], stats->br[i]); + binder_return_strings[i], temp); } BUILD_BUG_ON(ARRAY_SIZE(stats->obj_created) != @@ -3502,11 +5595,15 @@ static void print_binder_stats(struct seq_file *m, const char *prefix, BUILD_BUG_ON(ARRAY_SIZE(stats->obj_created) != ARRAY_SIZE(stats->obj_deleted)); for (i = 0; i < ARRAY_SIZE(stats->obj_created); i++) { - if (stats->obj_created[i] || stats->obj_deleted[i]) - seq_printf(m, "%s%s: active %d total %d\n", prefix, + int created = atomic_read(&stats->obj_created[i]); + int deleted = atomic_read(&stats->obj_deleted[i]); + + if (created || deleted) + seq_printf(m, "%s%s: active %d total %d\n", + prefix, binder_objstat_strings[i], - stats->obj_created[i] - stats->obj_deleted[i], - stats->obj_created[i]); + created - deleted, + created); } } @@ -3514,50 +5611,61 @@ static void print_binder_proc_stats(struct seq_file *m, struct binder_proc *proc) { struct binder_work *w; + struct binder_thread *thread; struct rb_node *n; - int count, strong, weak; + int count, strong, weak, ready_threads; + size_t free_async_space = + binder_alloc_get_free_async_space(&proc->alloc); seq_printf(m, "proc %d\n", proc->pid); + seq_printf(m, "context %s\n", proc->context->name); count = 0; + ready_threads = 0; + binder_inner_proc_lock(proc); for (n = rb_first(&proc->threads); n != NULL; n = rb_next(n)) count++; + + list_for_each_entry(thread, &proc->waiting_threads, waiting_thread_node) + ready_threads++; + seq_printf(m, " threads: %d\n", count); seq_printf(m, " requested threads: %d+%d/%d\n" " ready threads %d\n" " free async space %zd\n", proc->requested_threads, proc->requested_threads_started, proc->max_threads, - proc->ready_threads, proc->free_async_space); + ready_threads, + free_async_space); count = 0; for (n = rb_first(&proc->nodes); n != NULL; n = rb_next(n)) count++; + binder_inner_proc_unlock(proc); seq_printf(m, " nodes: %d\n", count); count = 0; strong = 0; weak = 0; + binder_proc_lock(proc); for (n = rb_first(&proc->refs_by_desc); n != NULL; n = rb_next(n)) { struct binder_ref *ref = rb_entry(n, struct binder_ref, rb_node_desc); count++; - strong += ref->strong; - weak += ref->weak; + strong += ref->data.strong; + weak += ref->data.weak; } + binder_proc_unlock(proc); seq_printf(m, " refs: %d s %d w %d\n", count, strong, weak); - count = 0; - for (n = rb_first(&proc->allocated_buffers); n != NULL; n = rb_next(n)) - count++; + count = binder_alloc_get_allocated_count(&proc->alloc); seq_printf(m, " buffers: %d\n", count); + binder_alloc_print_pages(m, &proc->alloc); + count = 0; + binder_inner_proc_lock(proc); list_for_each_entry(w, &proc->todo, entry) { - switch (w->type) { - case BINDER_WORK_TRANSACTION: + if (w->type == BINDER_WORK_TRANSACTION) count++; - break; - default: - break; - } } + binder_inner_proc_unlock(proc); seq_printf(m, " pending transactions: %d\n", count); print_binder_stats(m, " ", &proc->stats); @@ -3568,107 +5676,131 @@ static int binder_state_show(struct seq_file *m, void *unused) { struct binder_proc *proc; struct binder_node *node; - int do_lock = !binder_debug_no_lock; - - if (do_lock) - binder_lock(__func__); + struct binder_node *last_node = NULL; seq_puts(m, "binder state:\n"); + spin_lock(&binder_dead_nodes_lock); if (!hlist_empty(&binder_dead_nodes)) seq_puts(m, "dead nodes:\n"); - hlist_for_each_entry(node, &binder_dead_nodes, dead_node) - print_binder_node(m, node); + hlist_for_each_entry(node, &binder_dead_nodes, dead_node) { + /* + * take a temporary reference on the node so it + * survives and isn't removed from the list + * while we print it. + */ + node->tmp_refs++; + spin_unlock(&binder_dead_nodes_lock); + if (last_node) + binder_put_node(last_node); + binder_node_lock(node); + print_binder_node_nilocked(m, node); + binder_node_unlock(node); + last_node = node; + spin_lock(&binder_dead_nodes_lock); + } + spin_unlock(&binder_dead_nodes_lock); + if (last_node) + binder_put_node(last_node); + mutex_lock(&binder_procs_lock); hlist_for_each_entry(proc, &binder_procs, proc_node) print_binder_proc(m, proc, 1); - if (do_lock) - binder_unlock(__func__); + mutex_unlock(&binder_procs_lock); + return 0; } static int binder_stats_show(struct seq_file *m, void *unused) { struct binder_proc *proc; - int do_lock = !binder_debug_no_lock; - - if (do_lock) - binder_lock(__func__); seq_puts(m, "binder stats:\n"); print_binder_stats(m, "", &binder_stats); + mutex_lock(&binder_procs_lock); hlist_for_each_entry(proc, &binder_procs, proc_node) print_binder_proc_stats(m, proc); - if (do_lock) - binder_unlock(__func__); + mutex_unlock(&binder_procs_lock); + return 0; } static int binder_transactions_show(struct seq_file *m, void *unused) { struct binder_proc *proc; - int do_lock = !binder_debug_no_lock; - - if (do_lock) - binder_lock(__func__); seq_puts(m, "binder transactions:\n"); + mutex_lock(&binder_procs_lock); hlist_for_each_entry(proc, &binder_procs, proc_node) print_binder_proc(m, proc, 0); - if (do_lock) - binder_unlock(__func__); + mutex_unlock(&binder_procs_lock); + return 0; } static int binder_proc_show(struct seq_file *m, void *unused) { struct binder_proc *itr; - struct binder_proc *proc = m->private; - int do_lock = !binder_debug_no_lock; - bool valid_proc = false; + int pid = (unsigned long)m->private; - if (do_lock) - binder_lock(__func__); - + mutex_lock(&binder_procs_lock); hlist_for_each_entry(itr, &binder_procs, proc_node) { - if (itr == proc) { - valid_proc = true; - break; + if (itr->pid == pid) { + seq_puts(m, "binder proc state:\n"); + print_binder_proc(m, itr, 1); } } - if (valid_proc) { - seq_puts(m, "binder proc state:\n"); - print_binder_proc(m, proc, 1); - } - if (do_lock) - binder_unlock(__func__); + mutex_unlock(&binder_procs_lock); + return 0; } static void print_binder_transaction_log_entry(struct seq_file *m, struct binder_transaction_log_entry *e) { + int debug_id = READ_ONCE(e->debug_id_done); + /* + * read barrier to guarantee debug_id_done read before + * we print the log values + */ + smp_rmb(); seq_printf(m, - "%d: %s from %d:%d to %d:%d node %d handle %d size %d:%d\n", + "%d: %s from %d:%d to %d:%d context %s node %d handle %d size %d:%d ret %d/%d l=%d", e->debug_id, (e->call_type == 2) ? "reply" : ((e->call_type == 1) ? "async" : "call "), e->from_proc, - e->from_thread, e->to_proc, e->to_thread, e->to_node, - e->target_handle, e->data_size, e->offsets_size); + e->from_thread, e->to_proc, e->to_thread, e->context_name, + e->to_node, e->target_handle, e->data_size, e->offsets_size, + e->return_error, e->return_error_param, + e->return_error_line); + /* + * read-barrier to guarantee read of debug_id_done after + * done printing the fields of the entry + */ + smp_rmb(); + seq_printf(m, debug_id && debug_id == READ_ONCE(e->debug_id_done) ? + "\n" : " (incomplete)\n"); } static int binder_transaction_log_show(struct seq_file *m, void *unused) { struct binder_transaction_log *log = m->private; + unsigned int log_cur = atomic_read(&log->cur); + unsigned int count; + unsigned int cur; int i; - if (log->full) { - for (i = log->next; i < ARRAY_SIZE(log->entry); i++) - print_binder_transaction_log_entry(m, &log->entry[i]); + count = log_cur + 1; + cur = count < ARRAY_SIZE(log->entry) && !log->full ? + 0 : count % ARRAY_SIZE(log->entry); + if (count > ARRAY_SIZE(log->entry) || log->full) + count = ARRAY_SIZE(log->entry); + for (i = 0; i < count; i++) { + unsigned int index = cur++ % ARRAY_SIZE(log->entry); + + print_binder_transaction_log_entry(m, &log->entry[index]); } - for (i = 0; i < log->next; i++) - print_binder_transaction_log_entry(m, &log->entry[i]); return 0; } @@ -3683,53 +5815,118 @@ static const struct file_operations binder_fops = { .release = binder_release, }; -static struct miscdevice binder_miscdev = { - .minor = MISC_DYNAMIC_MINOR, - .name = "binder", - .fops = &binder_fops -}; - BINDER_DEBUG_ENTRY(state); BINDER_DEBUG_ENTRY(stats); BINDER_DEBUG_ENTRY(transactions); BINDER_DEBUG_ENTRY(transaction_log); +static int __init init_binder_device(const char *name) +{ + int ret; + struct binder_device *binder_device; + + binder_device = kzalloc(sizeof(*binder_device), GFP_KERNEL); + if (!binder_device) + return -ENOMEM; + + binder_device->miscdev.fops = &binder_fops; + binder_device->miscdev.minor = MISC_DYNAMIC_MINOR; + binder_device->miscdev.name = name; + + binder_device->context.binder_context_mgr_uid = INVALID_UID; + binder_device->context.name = name; + mutex_init(&binder_device->context.context_mgr_node_lock); + + ret = misc_register(&binder_device->miscdev); + if (ret < 0) { + kfree(binder_device); + return ret; + } + + hlist_add_head(&binder_device->hlist, &binder_devices); + + return ret; +} + static int __init binder_init(void) { int ret; + char *device_name, *device_names, *device_tmp; + struct binder_device *device; + struct hlist_node *tmp; + + ret = binder_alloc_shrinker_init(); + if (ret) + return ret; + + atomic_set(&binder_transaction_log.cur, ~0U); + atomic_set(&binder_transaction_log_failed.cur, ~0U); binder_debugfs_dir_entry_root = debugfs_create_dir("binder", NULL); if (binder_debugfs_dir_entry_root) binder_debugfs_dir_entry_proc = debugfs_create_dir("proc", binder_debugfs_dir_entry_root); - ret = misc_register(&binder_miscdev); + if (binder_debugfs_dir_entry_root) { debugfs_create_file("state", - S_IRUGO, + 0444, binder_debugfs_dir_entry_root, NULL, &binder_state_fops); debugfs_create_file("stats", - S_IRUGO, + 0444, binder_debugfs_dir_entry_root, NULL, &binder_stats_fops); debugfs_create_file("transactions", - S_IRUGO, + 0444, binder_debugfs_dir_entry_root, NULL, &binder_transactions_fops); debugfs_create_file("transaction_log", - S_IRUGO, + 0444, binder_debugfs_dir_entry_root, &binder_transaction_log, &binder_transaction_log_fops); debugfs_create_file("failed_transaction_log", - S_IRUGO, + 0444, binder_debugfs_dir_entry_root, &binder_transaction_log_failed, &binder_transaction_log_fops); } + + /* + * Copy the module_parameter string, because we don't want to + * tokenize it in-place. + */ + device_names = kzalloc(strlen(binder_devices_param) + 1, GFP_KERNEL); + if (!device_names) { + ret = -ENOMEM; + goto err_alloc_device_names_failed; + } + strcpy(device_names, binder_devices_param); + + device_tmp = device_names; + while ((device_name = strsep(&device_tmp, ","))) { + ret = init_binder_device(device_name); + if (ret) + goto err_init_binder_device_failed; + } + + return ret; + +err_init_binder_device_failed: + hlist_for_each_entry_safe(device, tmp, &binder_devices, hlist) { + misc_deregister(&device->miscdev); + hlist_del(&device->hlist); + kfree(device); + } + + kfree(device_names); + +err_alloc_device_names_failed: + debugfs_remove_recursive(binder_debugfs_dir_entry_root); + return ret; }
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c new file mode 100644 index 0000000..bec6c0a --- /dev/null +++ b/drivers/android/binder_alloc.c
@@ -0,0 +1,1022 @@ +/* binder_alloc.c + * + * Android IPC Subsystem + * + * Copyright (C) 2007-2017 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <asm/cacheflush.h> +#include <linux/list.h> +#include <linux/mm.h> +#include <linux/module.h> +#include <linux/rtmutex.h> +#include <linux/rbtree.h> +#include <linux/seq_file.h> +#include <linux/vmalloc.h> +#include <linux/slab.h> +#include <linux/sched.h> +#include <linux/list_lru.h> +#include "binder_alloc.h" +#include "binder_trace.h" + +struct list_lru binder_alloc_lru; + +static DEFINE_MUTEX(binder_alloc_mmap_lock); + +enum { + BINDER_DEBUG_OPEN_CLOSE = 1U << 1, + BINDER_DEBUG_BUFFER_ALLOC = 1U << 2, + BINDER_DEBUG_BUFFER_ALLOC_ASYNC = 1U << 3, +}; +static uint32_t binder_alloc_debug_mask; + +module_param_named(debug_mask, binder_alloc_debug_mask, + uint, 0644); + +#define binder_alloc_debug(mask, x...) \ + do { \ + if (binder_alloc_debug_mask & mask) \ + pr_info(x); \ + } while (0) + +static struct binder_buffer *binder_buffer_next(struct binder_buffer *buffer) +{ + return list_entry(buffer->entry.next, struct binder_buffer, entry); +} + +static struct binder_buffer *binder_buffer_prev(struct binder_buffer *buffer) +{ + return list_entry(buffer->entry.prev, struct binder_buffer, entry); +} + +static size_t binder_alloc_buffer_size(struct binder_alloc *alloc, + struct binder_buffer *buffer) +{ + if (list_is_last(&buffer->entry, &alloc->buffers)) + return (u8 *)alloc->buffer + + alloc->buffer_size - (u8 *)buffer->data; + return (u8 *)binder_buffer_next(buffer)->data - (u8 *)buffer->data; +} + +static void binder_insert_free_buffer(struct binder_alloc *alloc, + struct binder_buffer *new_buffer) +{ + struct rb_node **p = &alloc->free_buffers.rb_node; + struct rb_node *parent = NULL; + struct binder_buffer *buffer; + size_t buffer_size; + size_t new_buffer_size; + + BUG_ON(!new_buffer->free); + + new_buffer_size = binder_alloc_buffer_size(alloc, new_buffer); + + binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, + "%d: add free buffer, size %zd, at %pK\n", + alloc->pid, new_buffer_size, new_buffer); + + while (*p) { + parent = *p; + buffer = rb_entry(parent, struct binder_buffer, rb_node); + BUG_ON(!buffer->free); + + buffer_size = binder_alloc_buffer_size(alloc, buffer); + + if (new_buffer_size < buffer_size) + p = &parent->rb_left; + else + p = &parent->rb_right; + } + rb_link_node(&new_buffer->rb_node, parent, p); + rb_insert_color(&new_buffer->rb_node, &alloc->free_buffers); +} + +static void binder_insert_allocated_buffer_locked( + struct binder_alloc *alloc, struct binder_buffer *new_buffer) +{ + struct rb_node **p = &alloc->allocated_buffers.rb_node; + struct rb_node *parent = NULL; + struct binder_buffer *buffer; + + BUG_ON(new_buffer->free); + + while (*p) { + parent = *p; + buffer = rb_entry(parent, struct binder_buffer, rb_node); + BUG_ON(buffer->free); + + if (new_buffer->data < buffer->data) + p = &parent->rb_left; + else if (new_buffer->data > buffer->data) + p = &parent->rb_right; + else + BUG(); + } + rb_link_node(&new_buffer->rb_node, parent, p); + rb_insert_color(&new_buffer->rb_node, &alloc->allocated_buffers); +} + +static struct binder_buffer *binder_alloc_prepare_to_free_locked( + struct binder_alloc *alloc, + uintptr_t user_ptr) +{ + struct rb_node *n = alloc->allocated_buffers.rb_node; + struct binder_buffer *buffer; + void *kern_ptr; + + kern_ptr = (void *)(user_ptr - alloc->user_buffer_offset); + + while (n) { + buffer = rb_entry(n, struct binder_buffer, rb_node); + BUG_ON(buffer->free); + + if (kern_ptr < buffer->data) + n = n->rb_left; + else if (kern_ptr > buffer->data) + n = n->rb_right; + else { + /* + * Guard against user threads attempting to + * free the buffer twice + */ + if (buffer->free_in_progress) { + pr_err("%d:%d FREE_BUFFER u%016llx user freed buffer twice\n", + alloc->pid, current->pid, (u64)user_ptr); + return NULL; + } + buffer->free_in_progress = 1; + return buffer; + } + } + return NULL; +} + +/** + * binder_alloc_buffer_lookup() - get buffer given user ptr + * @alloc: binder_alloc for this proc + * @user_ptr: User pointer to buffer data + * + * Validate userspace pointer to buffer data and return buffer corresponding to + * that user pointer. Search the rb tree for buffer that matches user data + * pointer. + * + * Return: Pointer to buffer or NULL + */ +struct binder_buffer *binder_alloc_prepare_to_free(struct binder_alloc *alloc, + uintptr_t user_ptr) +{ + struct binder_buffer *buffer; + + mutex_lock(&alloc->mutex); + buffer = binder_alloc_prepare_to_free_locked(alloc, user_ptr); + mutex_unlock(&alloc->mutex); + return buffer; +} + +static int binder_update_page_range(struct binder_alloc *alloc, int allocate, + void *start, void *end) +{ + void *page_addr; + unsigned long user_page_addr; + struct binder_lru_page *page; + struct vm_area_struct *vma = NULL; + struct mm_struct *mm = NULL; + bool need_mm = false; + + binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, + "%d: %s pages %pK-%pK\n", alloc->pid, + allocate ? "allocate" : "free", start, end); + + if (end <= start) + return 0; + + trace_binder_update_page_range(alloc, allocate, start, end); + + if (allocate == 0) + goto free_range; + + for (page_addr = start; page_addr < end; page_addr += PAGE_SIZE) { + page = &alloc->pages[(page_addr - alloc->buffer) / PAGE_SIZE]; + if (!page->page_ptr) { + need_mm = true; + break; + } + } + + if (need_mm && mmget_not_zero(alloc->vma_vm_mm)) + mm = alloc->vma_vm_mm; + + if (mm) { + down_read(&mm->mmap_sem); + vma = alloc->vma; + } + + if (!vma && need_mm) { + pr_err("%d: binder_alloc_buf failed to map pages in userspace, no vma\n", + alloc->pid); + goto err_no_vma; + } + + for (page_addr = start; page_addr < end; page_addr += PAGE_SIZE) { + int ret; + bool on_lru; + size_t index; + + index = (page_addr - alloc->buffer) / PAGE_SIZE; + page = &alloc->pages[index]; + + if (page->page_ptr) { + trace_binder_alloc_lru_start(alloc, index); + + on_lru = list_lru_del(&binder_alloc_lru, &page->lru); + WARN_ON(!on_lru); + + trace_binder_alloc_lru_end(alloc, index); + continue; + } + + if (WARN_ON(!vma)) + goto err_page_ptr_cleared; + + trace_binder_alloc_page_start(alloc, index); + page->page_ptr = alloc_page(GFP_KERNEL | + __GFP_HIGHMEM | + __GFP_ZERO); + if (!page->page_ptr) { + pr_err("%d: binder_alloc_buf failed for page at %pK\n", + alloc->pid, page_addr); + goto err_alloc_page_failed; + } + page->alloc = alloc; + INIT_LIST_HEAD(&page->lru); + + ret = map_kernel_range_noflush((unsigned long)page_addr, + PAGE_SIZE, PAGE_KERNEL, + &page->page_ptr); + flush_cache_vmap((unsigned long)page_addr, + (unsigned long)page_addr + PAGE_SIZE); + if (ret != 1) { + pr_err("%d: binder_alloc_buf failed to map page at %pK in kernel\n", + alloc->pid, page_addr); + goto err_map_kernel_failed; + } + user_page_addr = + (uintptr_t)page_addr + alloc->user_buffer_offset; + ret = vm_insert_page(vma, user_page_addr, page[0].page_ptr); + if (ret) { + pr_err("%d: binder_alloc_buf failed to map page at %lx in userspace\n", + alloc->pid, user_page_addr); + goto err_vm_insert_page_failed; + } + + if (index + 1 > alloc->pages_high) + alloc->pages_high = index + 1; + + trace_binder_alloc_page_end(alloc, index); + /* vm_insert_page does not seem to increment the refcount */ + } + if (mm) { + up_read(&mm->mmap_sem); + mmput(mm); + } + return 0; + +free_range: + for (page_addr = end - PAGE_SIZE; page_addr >= start; + page_addr -= PAGE_SIZE) { + bool ret; + size_t index; + + index = (page_addr - alloc->buffer) / PAGE_SIZE; + page = &alloc->pages[index]; + + trace_binder_free_lru_start(alloc, index); + + ret = list_lru_add(&binder_alloc_lru, &page->lru); + WARN_ON(!ret); + + trace_binder_free_lru_end(alloc, index); + continue; + +err_vm_insert_page_failed: + unmap_kernel_range((unsigned long)page_addr, PAGE_SIZE); +err_map_kernel_failed: + __free_page(page->page_ptr); + page->page_ptr = NULL; +err_alloc_page_failed: +err_page_ptr_cleared: + ; + } +err_no_vma: + if (mm) { + up_read(&mm->mmap_sem); + mmput(mm); + } + return vma ? -ENOMEM : -ESRCH; +} + +static struct binder_buffer *binder_alloc_new_buf_locked( + struct binder_alloc *alloc, + size_t data_size, + size_t offsets_size, + size_t extra_buffers_size, + int is_async) +{ + struct rb_node *n = alloc->free_buffers.rb_node; + struct binder_buffer *buffer; + size_t buffer_size; + struct rb_node *best_fit = NULL; + void *has_page_addr; + void *end_page_addr; + size_t size, data_offsets_size; + int ret; + + if (alloc->vma == NULL) { + pr_err("%d: binder_alloc_buf, no vma\n", + alloc->pid); + return ERR_PTR(-ESRCH); + } + + data_offsets_size = ALIGN(data_size, sizeof(void *)) + + ALIGN(offsets_size, sizeof(void *)); + + if (data_offsets_size < data_size || data_offsets_size < offsets_size) { + binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, + "%d: got transaction with invalid size %zd-%zd\n", + alloc->pid, data_size, offsets_size); + return ERR_PTR(-EINVAL); + } + size = data_offsets_size + ALIGN(extra_buffers_size, sizeof(void *)); + if (size < data_offsets_size || size < extra_buffers_size) { + binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, + "%d: got transaction with invalid extra_buffers_size %zd\n", + alloc->pid, extra_buffers_size); + return ERR_PTR(-EINVAL); + } + if (is_async && + alloc->free_async_space < size + sizeof(struct binder_buffer)) { + binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, + "%d: binder_alloc_buf size %zd failed, no async space left\n", + alloc->pid, size); + return ERR_PTR(-ENOSPC); + } + + /* Pad 0-size buffers so they get assigned unique addresses */ + size = max(size, sizeof(void *)); + + while (n) { + buffer = rb_entry(n, struct binder_buffer, rb_node); + BUG_ON(!buffer->free); + buffer_size = binder_alloc_buffer_size(alloc, buffer); + + if (size < buffer_size) { + best_fit = n; + n = n->rb_left; + } else if (size > buffer_size) + n = n->rb_right; + else { + best_fit = n; + break; + } + } + if (best_fit == NULL) { + size_t allocated_buffers = 0; + size_t largest_alloc_size = 0; + size_t total_alloc_size = 0; + size_t free_buffers = 0; + size_t largest_free_size = 0; + size_t total_free_size = 0; + + for (n = rb_first(&alloc->allocated_buffers); n != NULL; + n = rb_next(n)) { + buffer = rb_entry(n, struct binder_buffer, rb_node); + buffer_size = binder_alloc_buffer_size(alloc, buffer); + allocated_buffers++; + total_alloc_size += buffer_size; + if (buffer_size > largest_alloc_size) + largest_alloc_size = buffer_size; + } + for (n = rb_first(&alloc->free_buffers); n != NULL; + n = rb_next(n)) { + buffer = rb_entry(n, struct binder_buffer, rb_node); + buffer_size = binder_alloc_buffer_size(alloc, buffer); + free_buffers++; + total_free_size += buffer_size; + if (buffer_size > largest_free_size) + largest_free_size = buffer_size; + } + pr_err("%d: binder_alloc_buf size %zd failed, no address space\n", + alloc->pid, size); + pr_err("allocated: %zd (num: %zd largest: %zd), free: %zd (num: %zd largest: %zd)\n", + total_alloc_size, allocated_buffers, largest_alloc_size, + total_free_size, free_buffers, largest_free_size); + return ERR_PTR(-ENOSPC); + } + if (n == NULL) { + buffer = rb_entry(best_fit, struct binder_buffer, rb_node); + buffer_size = binder_alloc_buffer_size(alloc, buffer); + } + + binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, + "%d: binder_alloc_buf size %zd got buffer %pK size %zd\n", + alloc->pid, size, buffer, buffer_size); + + has_page_addr = + (void *)(((uintptr_t)buffer->data + buffer_size) & PAGE_MASK); + WARN_ON(n && buffer_size != size); + end_page_addr = + (void *)PAGE_ALIGN((uintptr_t)buffer->data + size); + if (end_page_addr > has_page_addr) + end_page_addr = has_page_addr; + ret = binder_update_page_range(alloc, 1, + (void *)PAGE_ALIGN((uintptr_t)buffer->data), end_page_addr); + if (ret) + return ERR_PTR(ret); + + if (buffer_size != size) { + struct binder_buffer *new_buffer; + + new_buffer = kzalloc(sizeof(*buffer), GFP_KERNEL); + if (!new_buffer) { + pr_err("%s: %d failed to alloc new buffer struct\n", + __func__, alloc->pid); + goto err_alloc_buf_struct_failed; + } + new_buffer->data = (u8 *)buffer->data + size; + list_add(&new_buffer->entry, &buffer->entry); + new_buffer->free = 1; + binder_insert_free_buffer(alloc, new_buffer); + } + + rb_erase(best_fit, &alloc->free_buffers); + buffer->free = 0; + buffer->free_in_progress = 0; + binder_insert_allocated_buffer_locked(alloc, buffer); + binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, + "%d: binder_alloc_buf size %zd got %pK\n", + alloc->pid, size, buffer); + buffer->data_size = data_size; + buffer->offsets_size = offsets_size; + buffer->async_transaction = is_async; + buffer->extra_buffers_size = extra_buffers_size; + if (is_async) { + alloc->free_async_space -= size + sizeof(struct binder_buffer); + binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC_ASYNC, + "%d: binder_alloc_buf size %zd async free %zd\n", + alloc->pid, size, alloc->free_async_space); + } + return buffer; + +err_alloc_buf_struct_failed: + binder_update_page_range(alloc, 0, + (void *)PAGE_ALIGN((uintptr_t)buffer->data), + end_page_addr); + return ERR_PTR(-ENOMEM); +} + +/** + * binder_alloc_new_buf() - Allocate a new binder buffer + * @alloc: binder_alloc for this proc + * @data_size: size of user data buffer + * @offsets_size: user specified buffer offset + * @extra_buffers_size: size of extra space for meta-data (eg, security context) + * @is_async: buffer for async transaction + * + * Allocate a new buffer given the requested sizes. Returns + * the kernel version of the buffer pointer. The size allocated + * is the sum of the three given sizes (each rounded up to + * pointer-sized boundary) + * + * Return: The allocated buffer or %NULL if error + */ +struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc, + size_t data_size, + size_t offsets_size, + size_t extra_buffers_size, + int is_async) +{ + struct binder_buffer *buffer; + + mutex_lock(&alloc->mutex); + buffer = binder_alloc_new_buf_locked(alloc, data_size, offsets_size, + extra_buffers_size, is_async); + mutex_unlock(&alloc->mutex); + return buffer; +} + +static void *buffer_start_page(struct binder_buffer *buffer) +{ + return (void *)((uintptr_t)buffer->data & PAGE_MASK); +} + +static void *prev_buffer_end_page(struct binder_buffer *buffer) +{ + return (void *)(((uintptr_t)(buffer->data) - 1) & PAGE_MASK); +} + +static void binder_delete_free_buffer(struct binder_alloc *alloc, + struct binder_buffer *buffer) +{ + struct binder_buffer *prev, *next = NULL; + bool to_free = true; + BUG_ON(alloc->buffers.next == &buffer->entry); + prev = binder_buffer_prev(buffer); + BUG_ON(!prev->free); + if (prev_buffer_end_page(prev) == buffer_start_page(buffer)) { + to_free = false; + binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, + "%d: merge free, buffer %pK share page with %pK\n", + alloc->pid, buffer->data, prev->data); + } + + if (!list_is_last(&buffer->entry, &alloc->buffers)) { + next = binder_buffer_next(buffer); + if (buffer_start_page(next) == buffer_start_page(buffer)) { + to_free = false; + binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, + "%d: merge free, buffer %pK share page with %pK\n", + alloc->pid, + buffer->data, + next->data); + } + } + + if (PAGE_ALIGNED(buffer->data)) { + binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, + "%d: merge free, buffer start %pK is page aligned\n", + alloc->pid, buffer->data); + to_free = false; + } + + if (to_free) { + binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, + "%d: merge free, buffer %pK do not share page with %pK or %pK\n", + alloc->pid, buffer->data, + prev->data, next ? next->data : NULL); + binder_update_page_range(alloc, 0, buffer_start_page(buffer), + buffer_start_page(buffer) + PAGE_SIZE); + } + list_del(&buffer->entry); + kfree(buffer); +} + +static void binder_free_buf_locked(struct binder_alloc *alloc, + struct binder_buffer *buffer) +{ + size_t size, buffer_size; + + buffer_size = binder_alloc_buffer_size(alloc, buffer); + + size = ALIGN(buffer->data_size, sizeof(void *)) + + ALIGN(buffer->offsets_size, sizeof(void *)) + + ALIGN(buffer->extra_buffers_size, sizeof(void *)); + + binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, + "%d: binder_free_buf %pK size %zd buffer_size %zd\n", + alloc->pid, buffer, size, buffer_size); + + BUG_ON(buffer->free); + BUG_ON(size > buffer_size); + BUG_ON(buffer->transaction != NULL); + BUG_ON(buffer->data < alloc->buffer); + BUG_ON(buffer->data > alloc->buffer + alloc->buffer_size); + + if (buffer->async_transaction) { + alloc->free_async_space += size + sizeof(struct binder_buffer); + + binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC_ASYNC, + "%d: binder_free_buf size %zd async free %zd\n", + alloc->pid, size, alloc->free_async_space); + } + + binder_update_page_range(alloc, 0, + (void *)PAGE_ALIGN((uintptr_t)buffer->data), + (void *)(((uintptr_t)buffer->data + buffer_size) & PAGE_MASK)); + + rb_erase(&buffer->rb_node, &alloc->allocated_buffers); + buffer->free = 1; + if (!list_is_last(&buffer->entry, &alloc->buffers)) { + struct binder_buffer *next = binder_buffer_next(buffer); + + if (next->free) { + rb_erase(&next->rb_node, &alloc->free_buffers); + binder_delete_free_buffer(alloc, next); + } + } + if (alloc->buffers.next != &buffer->entry) { + struct binder_buffer *prev = binder_buffer_prev(buffer); + + if (prev->free) { + binder_delete_free_buffer(alloc, buffer); + rb_erase(&prev->rb_node, &alloc->free_buffers); + buffer = prev; + } + } + binder_insert_free_buffer(alloc, buffer); +} + +/** + * binder_alloc_free_buf() - free a binder buffer + * @alloc: binder_alloc for this proc + * @buffer: kernel pointer to buffer + * + * Free the buffer allocated via binder_alloc_new_buffer() + */ +void binder_alloc_free_buf(struct binder_alloc *alloc, + struct binder_buffer *buffer) +{ + mutex_lock(&alloc->mutex); + binder_free_buf_locked(alloc, buffer); + mutex_unlock(&alloc->mutex); +} + +/** + * binder_alloc_mmap_handler() - map virtual address space for proc + * @alloc: alloc structure for this proc + * @vma: vma passed to mmap() + * + * Called by binder_mmap() to initialize the space specified in + * vma for allocating binder buffers + * + * Return: + * 0 = success + * -EBUSY = address space already mapped + * -ENOMEM = failed to map memory to given address space + */ +int binder_alloc_mmap_handler(struct binder_alloc *alloc, + struct vm_area_struct *vma) +{ + int ret; + struct vm_struct *area; + const char *failure_string; + struct binder_buffer *buffer; + + mutex_lock(&binder_alloc_mmap_lock); + if (alloc->buffer) { + ret = -EBUSY; + failure_string = "already mapped"; + goto err_already_mapped; + } + + area = get_vm_area(vma->vm_end - vma->vm_start, VM_ALLOC); + if (area == NULL) { + ret = -ENOMEM; + failure_string = "get_vm_area"; + goto err_get_vm_area_failed; + } + alloc->buffer = area->addr; + alloc->user_buffer_offset = + vma->vm_start - (uintptr_t)alloc->buffer; + mutex_unlock(&binder_alloc_mmap_lock); + +#ifdef CONFIG_CPU_CACHE_VIPT + if (cache_is_vipt_aliasing()) { + while (CACHE_COLOUR( + (vma->vm_start ^ (uint32_t)alloc->buffer))) { + pr_info("%s: %d %lx-%lx maps %pK bad alignment\n", + __func__, alloc->pid, vma->vm_start, + vma->vm_end, alloc->buffer); + vma->vm_start += PAGE_SIZE; + } + } +#endif + alloc->pages = kzalloc(sizeof(alloc->pages[0]) * + ((vma->vm_end - vma->vm_start) / PAGE_SIZE), + GFP_KERNEL); + if (alloc->pages == NULL) { + ret = -ENOMEM; + failure_string = "alloc page array"; + goto err_alloc_pages_failed; + } + alloc->buffer_size = vma->vm_end - vma->vm_start; + + buffer = kzalloc(sizeof(*buffer), GFP_KERNEL); + if (!buffer) { + ret = -ENOMEM; + failure_string = "alloc buffer struct"; + goto err_alloc_buf_struct_failed; + } + + buffer->data = alloc->buffer; + list_add(&buffer->entry, &alloc->buffers); + buffer->free = 1; + binder_insert_free_buffer(alloc, buffer); + alloc->free_async_space = alloc->buffer_size / 2; + barrier(); + alloc->vma = vma; + alloc->vma_vm_mm = vma->vm_mm; + /* Same as mmgrab() in later kernel versions */ + atomic_inc(&alloc->vma_vm_mm->mm_count); + + return 0; + +err_alloc_buf_struct_failed: + kfree(alloc->pages); + alloc->pages = NULL; +err_alloc_pages_failed: + mutex_lock(&binder_alloc_mmap_lock); + vfree(alloc->buffer); + alloc->buffer = NULL; +err_get_vm_area_failed: +err_already_mapped: + mutex_unlock(&binder_alloc_mmap_lock); + pr_err("%s: %d %lx-%lx %s failed %d\n", __func__, + alloc->pid, vma->vm_start, vma->vm_end, failure_string, ret); + return ret; +} + + +void binder_alloc_deferred_release(struct binder_alloc *alloc) +{ + struct rb_node *n; + int buffers, page_count; + struct binder_buffer *buffer; + + BUG_ON(alloc->vma); + + buffers = 0; + mutex_lock(&alloc->mutex); + while ((n = rb_first(&alloc->allocated_buffers))) { + buffer = rb_entry(n, struct binder_buffer, rb_node); + + /* Transaction should already have been freed */ + BUG_ON(buffer->transaction); + + binder_free_buf_locked(alloc, buffer); + buffers++; + } + + while (!list_empty(&alloc->buffers)) { + buffer = list_first_entry(&alloc->buffers, + struct binder_buffer, entry); + WARN_ON(!buffer->free); + + list_del(&buffer->entry); + WARN_ON_ONCE(!list_empty(&alloc->buffers)); + kfree(buffer); + } + + page_count = 0; + if (alloc->pages) { + int i; + + for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) { + void *page_addr; + bool on_lru; + + if (!alloc->pages[i].page_ptr) + continue; + + on_lru = list_lru_del(&binder_alloc_lru, + &alloc->pages[i].lru); + page_addr = alloc->buffer + i * PAGE_SIZE; + binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, + "%s: %d: page %d at %pK %s\n", + __func__, alloc->pid, i, page_addr, + on_lru ? "on lru" : "active"); + unmap_kernel_range((unsigned long)page_addr, PAGE_SIZE); + __free_page(alloc->pages[i].page_ptr); + page_count++; + } + kfree(alloc->pages); + vfree(alloc->buffer); + } + mutex_unlock(&alloc->mutex); + if (alloc->vma_vm_mm) + mmdrop(alloc->vma_vm_mm); + + binder_alloc_debug(BINDER_DEBUG_OPEN_CLOSE, + "%s: %d buffers %d, pages %d\n", + __func__, alloc->pid, buffers, page_count); +} + +static void print_binder_buffer(struct seq_file *m, const char *prefix, + struct binder_buffer *buffer) +{ + seq_printf(m, "%s %d: %pK size %zd:%zd:%zd %s\n", + prefix, buffer->debug_id, buffer->data, + buffer->data_size, buffer->offsets_size, + buffer->extra_buffers_size, + buffer->transaction ? "active" : "delivered"); +} + +/** + * binder_alloc_print_allocated() - print buffer info + * @m: seq_file for output via seq_printf() + * @alloc: binder_alloc for this proc + * + * Prints information about every buffer associated with + * the binder_alloc state to the given seq_file + */ +void binder_alloc_print_allocated(struct seq_file *m, + struct binder_alloc *alloc) +{ + struct rb_node *n; + + mutex_lock(&alloc->mutex); + for (n = rb_first(&alloc->allocated_buffers); n != NULL; n = rb_next(n)) + print_binder_buffer(m, " buffer", + rb_entry(n, struct binder_buffer, rb_node)); + mutex_unlock(&alloc->mutex); +} + +/** + * binder_alloc_print_pages() - print page usage + * @m: seq_file for output via seq_printf() + * @alloc: binder_alloc for this proc + */ +void binder_alloc_print_pages(struct seq_file *m, + struct binder_alloc *alloc) +{ + struct binder_lru_page *page; + int i; + int active = 0; + int lru = 0; + int free = 0; + + mutex_lock(&alloc->mutex); + for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) { + page = &alloc->pages[i]; + if (!page->page_ptr) + free++; + else if (list_empty(&page->lru)) + active++; + else + lru++; + } + mutex_unlock(&alloc->mutex); + seq_printf(m, " pages: %d:%d:%d\n", active, lru, free); + seq_printf(m, " pages high watermark: %zu\n", alloc->pages_high); +} + +/** + * binder_alloc_get_allocated_count() - return count of buffers + * @alloc: binder_alloc for this proc + * + * Return: count of allocated buffers + */ +int binder_alloc_get_allocated_count(struct binder_alloc *alloc) +{ + struct rb_node *n; + int count = 0; + + mutex_lock(&alloc->mutex); + for (n = rb_first(&alloc->allocated_buffers); n != NULL; n = rb_next(n)) + count++; + mutex_unlock(&alloc->mutex); + return count; +} + + +/** + * binder_alloc_vma_close() - invalidate address space + * @alloc: binder_alloc for this proc + * + * Called from binder_vma_close() when releasing address space. + * Clears alloc->vma to prevent new incoming transactions from + * allocating more buffers. + */ +void binder_alloc_vma_close(struct binder_alloc *alloc) +{ + WRITE_ONCE(alloc->vma, NULL); +} + +/** + * binder_alloc_free_page() - shrinker callback to free pages + * @item: item to free + * @lock: lock protecting the item + * @cb_arg: callback argument + * + * Called from list_lru_walk() in binder_shrink_scan() to free + * up pages when the system is under memory pressure. + */ +enum lru_status binder_alloc_free_page(struct list_head *item, + struct list_lru_one *lru, + spinlock_t *lock, + void *cb_arg) +{ + struct mm_struct *mm = NULL; + struct binder_lru_page *page = container_of(item, + struct binder_lru_page, + lru); + struct binder_alloc *alloc; + uintptr_t page_addr; + size_t index; + struct vm_area_struct *vma; + + alloc = page->alloc; + if (!mutex_trylock(&alloc->mutex)) + goto err_get_alloc_mutex_failed; + + if (!page->page_ptr) + goto err_page_already_freed; + + index = page - alloc->pages; + page_addr = (uintptr_t)alloc->buffer + index * PAGE_SIZE; + vma = alloc->vma; + if (vma) { + if (!mmget_not_zero(alloc->vma_vm_mm)) + goto err_mmget; + mm = alloc->vma_vm_mm; + if (!down_write_trylock(&mm->mmap_sem)) + goto err_down_write_mmap_sem_failed; + } + + list_lru_isolate(lru, item); + spin_unlock(lock); + + if (vma) { + trace_binder_unmap_user_start(alloc, index); + + zap_page_range(vma, + page_addr + + alloc->user_buffer_offset, + PAGE_SIZE, NULL); + + trace_binder_unmap_user_end(alloc, index); + + up_write(&mm->mmap_sem); + mmput(mm); + } + + trace_binder_unmap_kernel_start(alloc, index); + + unmap_kernel_range(page_addr, PAGE_SIZE); + __free_page(page->page_ptr); + page->page_ptr = NULL; + + trace_binder_unmap_kernel_end(alloc, index); + + spin_lock(lock); + mutex_unlock(&alloc->mutex); + return LRU_REMOVED_RETRY; + +err_down_write_mmap_sem_failed: + mmput_async(mm); +err_mmget: +err_page_already_freed: + mutex_unlock(&alloc->mutex); +err_get_alloc_mutex_failed: + return LRU_SKIP; +} + +static unsigned long +binder_shrink_count(struct shrinker *shrink, struct shrink_control *sc) +{ + unsigned long ret = list_lru_count(&binder_alloc_lru); + return ret; +} + +static unsigned long +binder_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) +{ + unsigned long ret; + + ret = list_lru_walk(&binder_alloc_lru, binder_alloc_free_page, + NULL, sc->nr_to_scan); + return ret; +} + +static struct shrinker binder_shrinker = { + .count_objects = binder_shrink_count, + .scan_objects = binder_shrink_scan, + .seeks = DEFAULT_SEEKS, +}; + +/** + * binder_alloc_init() - called by binder_open() for per-proc initialization + * @alloc: binder_alloc for this proc + * + * Called from binder_open() to initialize binder_alloc fields for + * new binder proc + */ +void binder_alloc_init(struct binder_alloc *alloc) +{ + alloc->pid = current->group_leader->pid; + mutex_init(&alloc->mutex); + INIT_LIST_HEAD(&alloc->buffers); +} + +int binder_alloc_shrinker_init(void) +{ + int ret = list_lru_init(&binder_alloc_lru); + + if (ret == 0) { + ret = register_shrinker(&binder_shrinker); + if (ret) + list_lru_destroy(&binder_alloc_lru); + } + return ret; +}
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h new file mode 100644 index 0000000..9ef64e5 --- /dev/null +++ b/drivers/android/binder_alloc.h
@@ -0,0 +1,188 @@ +/* + * Copyright (C) 2017 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#ifndef _LINUX_BINDER_ALLOC_H +#define _LINUX_BINDER_ALLOC_H + +#include <linux/rbtree.h> +#include <linux/list.h> +#include <linux/mm.h> +#include <linux/rtmutex.h> +#include <linux/vmalloc.h> +#include <linux/slab.h> +#include <linux/list_lru.h> + +extern struct list_lru binder_alloc_lru; +struct binder_transaction; + +/** + * struct binder_buffer - buffer used for binder transactions + * @entry: entry alloc->buffers + * @rb_node: node for allocated_buffers/free_buffers rb trees + * @free: true if buffer is free + * @allow_user_free: describe the second member of struct blah, + * @async_transaction: describe the second member of struct blah, + * @debug_id: describe the second member of struct blah, + * @transaction: describe the second member of struct blah, + * @target_node: describe the second member of struct blah, + * @data_size: describe the second member of struct blah, + * @offsets_size: describe the second member of struct blah, + * @extra_buffers_size: describe the second member of struct blah, + * @data:i describe the second member of struct blah, + * + * Bookkeeping structure for binder transaction buffers + */ +struct binder_buffer { + struct list_head entry; /* free and allocated entries by address */ + struct rb_node rb_node; /* free entry by size or allocated entry */ + /* by address */ + unsigned free:1; + unsigned allow_user_free:1; + unsigned async_transaction:1; + unsigned free_in_progress:1; + unsigned debug_id:28; + + struct binder_transaction *transaction; + + struct binder_node *target_node; + size_t data_size; + size_t offsets_size; + size_t extra_buffers_size; + void *data; +}; + +/** + * struct binder_lru_page - page object used for binder shrinker + * @page_ptr: pointer to physical page in mmap'd space + * @lru: entry in binder_alloc_lru + * @alloc: binder_alloc for a proc + */ +struct binder_lru_page { + struct list_head lru; + struct page *page_ptr; + struct binder_alloc *alloc; +}; + +/** + * struct binder_alloc - per-binder proc state for binder allocator + * @vma: vm_area_struct passed to mmap_handler + * (invarient after mmap) + * @tsk: tid for task that called init for this proc + * (invariant after init) + * @vma_vm_mm: copy of vma->vm_mm (invarient after mmap) + * @buffer: base of per-proc address space mapped via mmap + * @user_buffer_offset: offset between user and kernel VAs for buffer + * @buffers: list of all buffers for this proc + * @free_buffers: rb tree of buffers available for allocation + * sorted by size + * @allocated_buffers: rb tree of allocated buffers sorted by address + * @free_async_space: VA space available for async buffers. This is + * initialized at mmap time to 1/2 the full VA space + * @pages: array of binder_lru_page + * @buffer_size: size of address space specified via mmap + * @pid: pid for associated binder_proc (invariant after init) + * @pages_high: high watermark of offset in @pages + * + * Bookkeeping structure for per-proc address space management for binder + * buffers. It is normally initialized during binder_init() and binder_mmap() + * calls. The address space is used for both user-visible buffers and for + * struct binder_buffer objects used to track the user buffers + */ +struct binder_alloc { + struct mutex mutex; + struct vm_area_struct *vma; + struct mm_struct *vma_vm_mm; + void *buffer; + ptrdiff_t user_buffer_offset; + struct list_head buffers; + struct rb_root free_buffers; + struct rb_root allocated_buffers; + size_t free_async_space; + struct binder_lru_page *pages; + size_t buffer_size; + uint32_t buffer_free; + int pid; + size_t pages_high; +}; + +#ifdef CONFIG_ANDROID_BINDER_IPC_SELFTEST +void binder_selftest_alloc(struct binder_alloc *alloc); +#else +static inline void binder_selftest_alloc(struct binder_alloc *alloc) {} +#endif +enum lru_status binder_alloc_free_page(struct list_head *item, + struct list_lru_one *lru, + spinlock_t *lock, void *cb_arg); +extern struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc, + size_t data_size, + size_t offsets_size, + size_t extra_buffers_size, + int is_async); +extern void binder_alloc_init(struct binder_alloc *alloc); +extern int binder_alloc_shrinker_init(void); +extern void binder_alloc_vma_close(struct binder_alloc *alloc); +extern struct binder_buffer * +binder_alloc_prepare_to_free(struct binder_alloc *alloc, + uintptr_t user_ptr); +extern void binder_alloc_free_buf(struct binder_alloc *alloc, + struct binder_buffer *buffer); +extern int binder_alloc_mmap_handler(struct binder_alloc *alloc, + struct vm_area_struct *vma); +extern void binder_alloc_deferred_release(struct binder_alloc *alloc); +extern int binder_alloc_get_allocated_count(struct binder_alloc *alloc); +extern void binder_alloc_print_allocated(struct seq_file *m, + struct binder_alloc *alloc); +void binder_alloc_print_pages(struct seq_file *m, + struct binder_alloc *alloc); + +/** + * binder_alloc_get_free_async_space() - get free space available for async + * @alloc: binder_alloc for this proc + * + * Return: the bytes remaining in the address-space for async transactions + */ +static inline size_t +binder_alloc_get_free_async_space(struct binder_alloc *alloc) +{ + size_t free_async_space; + + mutex_lock(&alloc->mutex); + free_async_space = alloc->free_async_space; + mutex_unlock(&alloc->mutex); + return free_async_space; +} + +/** + * binder_alloc_get_user_buffer_offset() - get offset between kernel/user addrs + * @alloc: binder_alloc for this proc + * + * Return: the offset between kernel and user-space addresses to use for + * virtual address conversion + */ +static inline ptrdiff_t +binder_alloc_get_user_buffer_offset(struct binder_alloc *alloc) +{ + /* + * user_buffer_offset is constant if vma is set and + * undefined if vma is not set. It is possible to + * get here with !alloc->vma if the target process + * is dying while a transaction is being initiated. + * Returning the old value is ok in this case and + * the transaction will fail. + */ + return alloc->user_buffer_offset; +} + +#endif /* _LINUX_BINDER_ALLOC_H */ +
diff --git a/drivers/android/binder_alloc_selftest.c b/drivers/android/binder_alloc_selftest.c new file mode 100644 index 0000000..8bd7bce --- /dev/null +++ b/drivers/android/binder_alloc_selftest.c
@@ -0,0 +1,310 @@ +/* binder_alloc_selftest.c + * + * Android IPC Subsystem + * + * Copyright (C) 2017 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/mm_types.h> +#include <linux/err.h> +#include "binder_alloc.h" + +#define BUFFER_NUM 5 +#define BUFFER_MIN_SIZE (PAGE_SIZE / 8) + +static bool binder_selftest_run = true; +static int binder_selftest_failures; +static DEFINE_MUTEX(binder_selftest_lock); + +/** + * enum buf_end_align_type - Page alignment of a buffer + * end with regard to the end of the previous buffer. + * + * In the pictures below, buf2 refers to the buffer we + * are aligning. buf1 refers to previous buffer by addr. + * Symbol [ means the start of a buffer, ] means the end + * of a buffer, and | means page boundaries. + */ +enum buf_end_align_type { + /** + * @SAME_PAGE_UNALIGNED: The end of this buffer is on + * the same page as the end of the previous buffer and + * is not page aligned. Examples: + * buf1 ][ buf2 ][ ... + * buf1 ]|[ buf2 ][ ... + */ + SAME_PAGE_UNALIGNED = 0, + /** + * @SAME_PAGE_ALIGNED: When the end of the previous buffer + * is not page aligned, the end of this buffer is on the + * same page as the end of the previous buffer and is page + * aligned. When the previous buffer is page aligned, the + * end of this buffer is aligned to the next page boundary. + * Examples: + * buf1 ][ buf2 ]| ... + * buf1 ]|[ buf2 ]| ... + */ + SAME_PAGE_ALIGNED, + /** + * @NEXT_PAGE_UNALIGNED: The end of this buffer is on + * the page next to the end of the previous buffer and + * is not page aligned. Examples: + * buf1 ][ buf2 | buf2 ][ ... + * buf1 ]|[ buf2 | buf2 ][ ... + */ + NEXT_PAGE_UNALIGNED, + /** + * @NEXT_PAGE_ALIGNED: The end of this buffer is on + * the page next to the end of the previous buffer and + * is page aligned. Examples: + * buf1 ][ buf2 | buf2 ]| ... + * buf1 ]|[ buf2 | buf2 ]| ... + */ + NEXT_PAGE_ALIGNED, + /** + * @NEXT_NEXT_UNALIGNED: The end of this buffer is on + * the page that follows the page after the end of the + * previous buffer and is not page aligned. Examples: + * buf1 ][ buf2 | buf2 | buf2 ][ ... + * buf1 ]|[ buf2 | buf2 | buf2 ][ ... + */ + NEXT_NEXT_UNALIGNED, + LOOP_END, +}; + +static void pr_err_size_seq(size_t *sizes, int *seq) +{ + int i; + + pr_err("alloc sizes: "); + for (i = 0; i < BUFFER_NUM; i++) + pr_cont("[%zu]", sizes[i]); + pr_cont("\n"); + pr_err("free seq: "); + for (i = 0; i < BUFFER_NUM; i++) + pr_cont("[%d]", seq[i]); + pr_cont("\n"); +} + +static bool check_buffer_pages_allocated(struct binder_alloc *alloc, + struct binder_buffer *buffer, + size_t size) +{ + void *page_addr, *end; + int page_index; + + end = (void *)PAGE_ALIGN((uintptr_t)buffer->data + size); + page_addr = buffer->data; + for (; page_addr < end; page_addr += PAGE_SIZE) { + page_index = (page_addr - alloc->buffer) / PAGE_SIZE; + if (!alloc->pages[page_index].page_ptr || + !list_empty(&alloc->pages[page_index].lru)) { + pr_err("expect alloc but is %s at page index %d\n", + alloc->pages[page_index].page_ptr ? + "lru" : "free", page_index); + return false; + } + } + return true; +} + +static void binder_selftest_alloc_buf(struct binder_alloc *alloc, + struct binder_buffer *buffers[], + size_t *sizes, int *seq) +{ + int i; + + for (i = 0; i < BUFFER_NUM; i++) { + buffers[i] = binder_alloc_new_buf(alloc, sizes[i], 0, 0, 0); + if (IS_ERR(buffers[i]) || + !check_buffer_pages_allocated(alloc, buffers[i], + sizes[i])) { + pr_err_size_seq(sizes, seq); + binder_selftest_failures++; + } + } +} + +static void binder_selftest_free_buf(struct binder_alloc *alloc, + struct binder_buffer *buffers[], + size_t *sizes, int *seq, size_t end) +{ + int i; + + for (i = 0; i < BUFFER_NUM; i++) + binder_alloc_free_buf(alloc, buffers[seq[i]]); + + for (i = 0; i < end / PAGE_SIZE; i++) { + /** + * Error message on a free page can be false positive + * if binder shrinker ran during binder_alloc_free_buf + * calls above. + */ + if (list_empty(&alloc->pages[i].lru)) { + pr_err_size_seq(sizes, seq); + pr_err("expect lru but is %s at page index %d\n", + alloc->pages[i].page_ptr ? "alloc" : "free", i); + binder_selftest_failures++; + } + } +} + +static void binder_selftest_free_page(struct binder_alloc *alloc) +{ + int i; + unsigned long count; + + while ((count = list_lru_count(&binder_alloc_lru))) { + list_lru_walk(&binder_alloc_lru, binder_alloc_free_page, + NULL, count); + } + + for (i = 0; i < (alloc->buffer_size / PAGE_SIZE); i++) { + if (alloc->pages[i].page_ptr) { + pr_err("expect free but is %s at page index %d\n", + list_empty(&alloc->pages[i].lru) ? + "alloc" : "lru", i); + binder_selftest_failures++; + } + } +} + +static void binder_selftest_alloc_free(struct binder_alloc *alloc, + size_t *sizes, int *seq, size_t end) +{ + struct binder_buffer *buffers[BUFFER_NUM]; + + binder_selftest_alloc_buf(alloc, buffers, sizes, seq); + binder_selftest_free_buf(alloc, buffers, sizes, seq, end); + + /* Allocate from lru. */ + binder_selftest_alloc_buf(alloc, buffers, sizes, seq); + if (list_lru_count(&binder_alloc_lru)) + pr_err("lru list should be empty but is not\n"); + + binder_selftest_free_buf(alloc, buffers, sizes, seq, end); + binder_selftest_free_page(alloc); +} + +static bool is_dup(int *seq, int index, int val) +{ + int i; + + for (i = 0; i < index; i++) { + if (seq[i] == val) + return true; + } + return false; +} + +/* Generate BUFFER_NUM factorial free orders. */ +static void binder_selftest_free_seq(struct binder_alloc *alloc, + size_t *sizes, int *seq, + int index, size_t end) +{ + int i; + + if (index == BUFFER_NUM) { + binder_selftest_alloc_free(alloc, sizes, seq, end); + return; + } + for (i = 0; i < BUFFER_NUM; i++) { + if (is_dup(seq, index, i)) + continue; + seq[index] = i; + binder_selftest_free_seq(alloc, sizes, seq, index + 1, end); + } +} + +static void binder_selftest_alloc_size(struct binder_alloc *alloc, + size_t *end_offset) +{ + int i; + int seq[BUFFER_NUM] = {0}; + size_t front_sizes[BUFFER_NUM]; + size_t back_sizes[BUFFER_NUM]; + size_t last_offset, offset = 0; + + for (i = 0; i < BUFFER_NUM; i++) { + last_offset = offset; + offset = end_offset[i]; + front_sizes[i] = offset - last_offset; + back_sizes[BUFFER_NUM - i - 1] = front_sizes[i]; + } + /* + * Buffers share the first or last few pages. + * Only BUFFER_NUM - 1 buffer sizes are adjustable since + * we need one giant buffer before getting to the last page. + */ + back_sizes[0] += alloc->buffer_size - end_offset[BUFFER_NUM - 1]; + binder_selftest_free_seq(alloc, front_sizes, seq, 0, + end_offset[BUFFER_NUM - 1]); + binder_selftest_free_seq(alloc, back_sizes, seq, 0, alloc->buffer_size); +} + +static void binder_selftest_alloc_offset(struct binder_alloc *alloc, + size_t *end_offset, int index) +{ + int align; + size_t end, prev; + + if (index == BUFFER_NUM) { + binder_selftest_alloc_size(alloc, end_offset); + return; + } + prev = index == 0 ? 0 : end_offset[index - 1]; + end = prev; + + BUILD_BUG_ON(BUFFER_MIN_SIZE * BUFFER_NUM >= PAGE_SIZE); + + for (align = SAME_PAGE_UNALIGNED; align < LOOP_END; align++) { + if (align % 2) + end = ALIGN(end, PAGE_SIZE); + else + end += BUFFER_MIN_SIZE; + end_offset[index] = end; + binder_selftest_alloc_offset(alloc, end_offset, index + 1); + } +} + +/** + * binder_selftest_alloc() - Test alloc and free of buffer pages. + * @alloc: Pointer to alloc struct. + * + * Allocate BUFFER_NUM buffers to cover all page alignment cases, + * then free them in all orders possible. Check that pages are + * correctly allocated, put onto lru when buffers are freed, and + * are freed when binder_alloc_free_page is called. + */ +void binder_selftest_alloc(struct binder_alloc *alloc) +{ + size_t end_offset[BUFFER_NUM]; + + if (!binder_selftest_run) + return; + mutex_lock(&binder_selftest_lock); + if (!binder_selftest_run || !alloc->vma) + goto done; + pr_info("STARTED\n"); + binder_selftest_alloc_offset(alloc, end_offset, 0); + binder_selftest_run = false; + if (binder_selftest_failures > 0) + pr_info("%d tests FAILED\n", binder_selftest_failures); + else + pr_info("PASSED\n"); + +done: + mutex_unlock(&binder_selftest_lock); +}
diff --git a/drivers/android/binder_trace.h b/drivers/android/binder_trace.h index 7f20f3d..b11dffc5 100644 --- a/drivers/android/binder_trace.h +++ b/drivers/android/binder_trace.h
@@ -23,7 +23,8 @@ struct binder_buffer; struct binder_node; struct binder_proc; -struct binder_ref; +struct binder_alloc; +struct binder_ref_data; struct binder_thread; struct binder_transaction; @@ -84,6 +85,30 @@ DEFINE_BINDER_FUNCTION_RETURN_EVENT(binder_ioctl_done); DEFINE_BINDER_FUNCTION_RETURN_EVENT(binder_write_done); DEFINE_BINDER_FUNCTION_RETURN_EVENT(binder_read_done); +TRACE_EVENT(binder_set_priority, + TP_PROTO(int proc, int thread, unsigned int old_prio, + unsigned int desired_prio, unsigned int new_prio), + TP_ARGS(proc, thread, old_prio, new_prio, desired_prio), + + TP_STRUCT__entry( + __field(int, proc) + __field(int, thread) + __field(unsigned int, old_prio) + __field(unsigned int, new_prio) + __field(unsigned int, desired_prio) + ), + TP_fast_assign( + __entry->proc = proc; + __entry->thread = thread; + __entry->old_prio = old_prio; + __entry->new_prio = new_prio; + __entry->desired_prio = desired_prio; + ), + TP_printk("proc=%d thread=%d old=%d => new=%d desired=%d", + __entry->proc, __entry->thread, __entry->old_prio, + __entry->new_prio, __entry->desired_prio) +); + TRACE_EVENT(binder_wait_for_work, TP_PROTO(bool proc_work, bool transaction_stack, bool thread_todo), TP_ARGS(proc_work, transaction_stack, thread_todo), @@ -146,8 +171,8 @@ TRACE_EVENT(binder_transaction_received, TRACE_EVENT(binder_transaction_node_to_ref, TP_PROTO(struct binder_transaction *t, struct binder_node *node, - struct binder_ref *ref), - TP_ARGS(t, node, ref), + struct binder_ref_data *rdata), + TP_ARGS(t, node, rdata), TP_STRUCT__entry( __field(int, debug_id) @@ -160,8 +185,8 @@ TRACE_EVENT(binder_transaction_node_to_ref, __entry->debug_id = t->debug_id; __entry->node_debug_id = node->debug_id; __entry->node_ptr = node->ptr; - __entry->ref_debug_id = ref->debug_id; - __entry->ref_desc = ref->desc; + __entry->ref_debug_id = rdata->debug_id; + __entry->ref_desc = rdata->desc; ), TP_printk("transaction=%d node=%d src_ptr=0x%016llx ==> dest_ref=%d dest_desc=%d", __entry->debug_id, __entry->node_debug_id, @@ -170,8 +195,9 @@ TRACE_EVENT(binder_transaction_node_to_ref, ); TRACE_EVENT(binder_transaction_ref_to_node, - TP_PROTO(struct binder_transaction *t, struct binder_ref *ref), - TP_ARGS(t, ref), + TP_PROTO(struct binder_transaction *t, struct binder_node *node, + struct binder_ref_data *rdata), + TP_ARGS(t, node, rdata), TP_STRUCT__entry( __field(int, debug_id) @@ -182,10 +208,10 @@ TRACE_EVENT(binder_transaction_ref_to_node, ), TP_fast_assign( __entry->debug_id = t->debug_id; - __entry->ref_debug_id = ref->debug_id; - __entry->ref_desc = ref->desc; - __entry->node_debug_id = ref->node->debug_id; - __entry->node_ptr = ref->node->ptr; + __entry->ref_debug_id = rdata->debug_id; + __entry->ref_desc = rdata->desc; + __entry->node_debug_id = node->debug_id; + __entry->node_ptr = node->ptr; ), TP_printk("transaction=%d node=%d src_ref=%d src_desc=%d ==> dest_ptr=0x%016llx", __entry->debug_id, __entry->node_debug_id, @@ -194,9 +220,10 @@ TRACE_EVENT(binder_transaction_ref_to_node, ); TRACE_EVENT(binder_transaction_ref_to_ref, - TP_PROTO(struct binder_transaction *t, struct binder_ref *src_ref, - struct binder_ref *dest_ref), - TP_ARGS(t, src_ref, dest_ref), + TP_PROTO(struct binder_transaction *t, struct binder_node *node, + struct binder_ref_data *src_ref, + struct binder_ref_data *dest_ref), + TP_ARGS(t, node, src_ref, dest_ref), TP_STRUCT__entry( __field(int, debug_id) @@ -208,7 +235,7 @@ TRACE_EVENT(binder_transaction_ref_to_ref, ), TP_fast_assign( __entry->debug_id = t->debug_id; - __entry->node_debug_id = src_ref->node->debug_id; + __entry->node_debug_id = node->debug_id; __entry->src_ref_debug_id = src_ref->debug_id; __entry->src_ref_desc = src_ref->desc; __entry->dest_ref_debug_id = dest_ref->debug_id; @@ -268,9 +295,9 @@ DEFINE_EVENT(binder_buffer_class, binder_transaction_failed_buffer_release, TP_ARGS(buffer)); TRACE_EVENT(binder_update_page_range, - TP_PROTO(struct binder_proc *proc, bool allocate, + TP_PROTO(struct binder_alloc *alloc, bool allocate, void *start, void *end), - TP_ARGS(proc, allocate, start, end), + TP_ARGS(alloc, allocate, start, end), TP_STRUCT__entry( __field(int, proc) __field(bool, allocate) @@ -278,9 +305,9 @@ TRACE_EVENT(binder_update_page_range, __field(size_t, size) ), TP_fast_assign( - __entry->proc = proc->pid; + __entry->proc = alloc->pid; __entry->allocate = allocate; - __entry->offset = start - proc->buffer; + __entry->offset = start - alloc->buffer; __entry->size = end - start; ), TP_printk("proc=%d allocate=%d offset=%zu size=%zu", @@ -288,6 +315,61 @@ TRACE_EVENT(binder_update_page_range, __entry->offset, __entry->size) ); +DECLARE_EVENT_CLASS(binder_lru_page_class, + TP_PROTO(const struct binder_alloc *alloc, size_t page_index), + TP_ARGS(alloc, page_index), + TP_STRUCT__entry( + __field(int, proc) + __field(size_t, page_index) + ), + TP_fast_assign( + __entry->proc = alloc->pid; + __entry->page_index = page_index; + ), + TP_printk("proc=%d page_index=%zu", + __entry->proc, __entry->page_index) +); + +DEFINE_EVENT(binder_lru_page_class, binder_alloc_lru_start, + TP_PROTO(const struct binder_alloc *alloc, size_t page_index), + TP_ARGS(alloc, page_index)); + +DEFINE_EVENT(binder_lru_page_class, binder_alloc_lru_end, + TP_PROTO(const struct binder_alloc *alloc, size_t page_index), + TP_ARGS(alloc, page_index)); + +DEFINE_EVENT(binder_lru_page_class, binder_free_lru_start, + TP_PROTO(const struct binder_alloc *alloc, size_t page_index), + TP_ARGS(alloc, page_index)); + +DEFINE_EVENT(binder_lru_page_class, binder_free_lru_end, + TP_PROTO(const struct binder_alloc *alloc, size_t page_index), + TP_ARGS(alloc, page_index)); + +DEFINE_EVENT(binder_lru_page_class, binder_alloc_page_start, + TP_PROTO(const struct binder_alloc *alloc, size_t page_index), + TP_ARGS(alloc, page_index)); + +DEFINE_EVENT(binder_lru_page_class, binder_alloc_page_end, + TP_PROTO(const struct binder_alloc *alloc, size_t page_index), + TP_ARGS(alloc, page_index)); + +DEFINE_EVENT(binder_lru_page_class, binder_unmap_user_start, + TP_PROTO(const struct binder_alloc *alloc, size_t page_index), + TP_ARGS(alloc, page_index)); + +DEFINE_EVENT(binder_lru_page_class, binder_unmap_user_end, + TP_PROTO(const struct binder_alloc *alloc, size_t page_index), + TP_ARGS(alloc, page_index)); + +DEFINE_EVENT(binder_lru_page_class, binder_unmap_kernel_start, + TP_PROTO(const struct binder_alloc *alloc, size_t page_index), + TP_ARGS(alloc, page_index)); + +DEFINE_EVENT(binder_lru_page_class, binder_unmap_kernel_end, + TP_PROTO(const struct binder_alloc *alloc, size_t page_index), + TP_ARGS(alloc, page_index)); + TRACE_EVENT(binder_command, TP_PROTO(uint32_t cmd), TP_ARGS(cmd),
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 9851721..574d08f 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c
@@ -33,6 +33,7 @@ #include <linux/cpufreq.h> #include <linux/cpuidle.h> #include <linux/timer.h> +#include <linux/wakeup_reason.h> #include "../base.h" #include "power.h" @@ -1353,6 +1354,7 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async) pm_callback_t callback = NULL; char *info = NULL; int error = 0; + char suspend_abort[MAX_SUSPEND_ABORT_LEN]; DECLARE_DPM_WATCHDOG_ON_STACK(wd); TRACE_DEVICE(dev); @@ -1375,6 +1377,9 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async) pm_wakeup_event(dev, 0); if (pm_wakeup_pending()) { + pm_get_active_wakeup_sources(suspend_abort, + MAX_SUSPEND_ABORT_LEN); + log_suspend_abort_reason(suspend_abort); dev->power.direct_complete = false; async_error = -EBUSY; goto Complete;
diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c index 23ee46a..e494a93 100644 --- a/drivers/base/power/opp/core.c +++ b/drivers/base/power/opp/core.c
@@ -708,7 +708,7 @@ static void _remove_opp_dev(struct opp_device *opp_dev, struct opp_table *opp_table) { opp_debug_unregister(opp_dev, opp_table); - list_del(&opp_dev->node); + list_del_rcu(&opp_dev->node); call_srcu(&opp_table->srcu_head.srcu, &opp_dev->rcu_head, _kfree_opp_dev_rcu); }
diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index f98121f1..90c16d8 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c
@@ -15,6 +15,7 @@ #include <linux/seq_file.h> #include <linux/debugfs.h> #include <linux/pm_wakeirq.h> +#include <linux/types.h> #include <trace/events/power.h> #include "power.h" @@ -804,6 +805,37 @@ void pm_wakeup_event(struct device *dev, unsigned int msec) } EXPORT_SYMBOL_GPL(pm_wakeup_event); +void pm_get_active_wakeup_sources(char *pending_wakeup_source, size_t max) +{ + struct wakeup_source *ws, *last_active_ws = NULL; + int len = 0; + bool active = false; + + rcu_read_lock(); + list_for_each_entry_rcu(ws, &wakeup_sources, entry) { + if (ws->active && len < max) { + if (!active) + len += scnprintf(pending_wakeup_source, max, + "Pending Wakeup Sources: "); + len += scnprintf(pending_wakeup_source + len, max - len, + "%s ", ws->name); + active = true; + } else if (!active && + (!last_active_ws || + ktime_to_ns(ws->last_time) > + ktime_to_ns(last_active_ws->last_time))) { + last_active_ws = ws; + } + } + if (!active && last_active_ws) { + scnprintf(pending_wakeup_source, max, + "Last active Wakeup Source: %s", + last_active_ws->name); + } + rcu_read_unlock(); +} +EXPORT_SYMBOL_GPL(pm_get_active_wakeup_sources); + void pm_print_active_wakeup_sources(void) { struct wakeup_source *ws; @@ -1011,7 +1043,7 @@ static int print_wakeup_source_stats(struct seq_file *m, active_time = ktime_set(0, 0); } - seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t%lld\t\t%lld\t\t%lld\t\t%lld\t\t%lld\n", + seq_printf(m, "%-32s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t%lld\t\t%lld\t\t%lld\t\t%lld\t\t%lld\n", ws->name, active_count, ws->event_count, ws->wakeup_count, ws->expire_count, ktime_to_ms(active_time), ktime_to_ms(total_time), @@ -1032,7 +1064,7 @@ static int wakeup_sources_stats_show(struct seq_file *m, void *unused) struct wakeup_source *ws; int srcuidx; - seq_puts(m, "name\t\tactive_count\tevent_count\twakeup_count\t" + seq_puts(m, "name\t\t\t\t\tactive_count\tevent_count\twakeup_count\t" "expire_count\tactive_since\ttotal_time\tmax_time\t" "last_change\tprevent_suspend_time\n");
diff --git a/drivers/base/syscore.c b/drivers/base/syscore.c index 8d98a32..96c34a9 100644 --- a/drivers/base/syscore.c +++ b/drivers/base/syscore.c
@@ -11,6 +11,7 @@ #include <linux/module.h> #include <linux/suspend.h> #include <trace/events/power.h> +#include <linux/wakeup_reason.h> static LIST_HEAD(syscore_ops_list); static DEFINE_MUTEX(syscore_ops_lock); @@ -75,6 +76,8 @@ int syscore_suspend(void) return 0; err_out: + log_suspend_abort_reason("System core suspend callback %pF failed", + ops->suspend); pr_err("PM: System core suspend callback %pF failed.\n", ops->suspend); list_for_each_entry_continue(ops, &syscore_ops_list, node)
diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig index b8ecba6..cb53957 100644 --- a/drivers/block/zram/Kconfig +++ b/drivers/block/zram/Kconfig
@@ -12,4 +12,26 @@ It has several use cases, for example: /tmp storage, use as swap disks and maybe many more. - See zram.txt for more information. + See Documentation/blockdev/zram.txt for more information. + +config ZRAM_WRITEBACK + bool "Write back incompressible page to backing device" + depends on ZRAM + default n + help + With incompressible page, there is no memory saving to keep it + in memory. Instead, write it out to backing device. + For this feature, admin should set up backing device via + /sys/block/zramX/backing_dev. + + See Documentation/blockdev/zram.txt for more information. + +config ZRAM_MEMORY_TRACKING + bool "Track zRam block status" + depends on ZRAM && DEBUG_FS + help + With this feature, admin can track the state of allocated blocks + of zRAM. Admin could see the information via + /sys/kernel/debug/zram/zramX/block_state. + + See Documentation/blockdev/zram.txt for more information.
diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c index 4b5cd3a..c084a7f 100644 --- a/drivers/block/zram/zcomp.c +++ b/drivers/block/zram/zcomp.c
@@ -32,6 +32,9 @@ static const char * const backends[] = { #if IS_ENABLED(CONFIG_CRYPTO_842) "842", #endif +#if IS_ENABLED(CONFIG_CRYPTO_ZSTD) + "zstd", +#endif NULL };
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index b7c0b69..5698cd2 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c
@@ -31,6 +31,7 @@ #include <linux/err.h> #include <linux/idr.h> #include <linux/sysfs.h> +#include <linux/debugfs.h> #include "zram_drv.h" @@ -43,82 +44,105 @@ static const char *default_compressor = "lzo"; /* Module params (documentation at end) */ static unsigned int num_devices = 1; +/* + * Pages that compress to sizes equals or greater than this are stored + * uncompressed in memory. + */ +static size_t huge_class_size; -static inline void deprecated_attr_warn(const char *name) +static void zram_free_page(struct zram *zram, size_t index); + +static void zram_slot_lock(struct zram *zram, u32 index) { - pr_warn_once("%d (%s) Attribute %s (and others) will be removed. %s\n", - task_pid_nr(current), - current->comm, - name, - "See zram documentation."); + bit_spin_lock(ZRAM_LOCK, &zram->table[index].value); } -#define ZRAM_ATTR_RO(name) \ -static ssize_t name##_show(struct device *d, \ - struct device_attribute *attr, char *b) \ -{ \ - struct zram *zram = dev_to_zram(d); \ - \ - deprecated_attr_warn(__stringify(name)); \ - return scnprintf(b, PAGE_SIZE, "%llu\n", \ - (u64)atomic64_read(&zram->stats.name)); \ -} \ -static DEVICE_ATTR_RO(name); +static void zram_slot_unlock(struct zram *zram, u32 index) +{ + bit_spin_unlock(ZRAM_LOCK, &zram->table[index].value); +} static inline bool init_done(struct zram *zram) { return zram->disksize; } +static inline bool zram_allocated(struct zram *zram, u32 index) +{ + + return (zram->table[index].value >> (ZRAM_FLAG_SHIFT + 1)) || + zram->table[index].handle; +} + static inline struct zram *dev_to_zram(struct device *dev) { return (struct zram *)dev_to_disk(dev)->private_data; } +static unsigned long zram_get_handle(struct zram *zram, u32 index) +{ + return zram->table[index].handle; +} + +static void zram_set_handle(struct zram *zram, u32 index, unsigned long handle) +{ + zram->table[index].handle = handle; +} + /* flag operations require table entry bit_spin_lock() being held */ -static int zram_test_flag(struct zram_meta *meta, u32 index, +static bool zram_test_flag(struct zram *zram, u32 index, enum zram_pageflags flag) { - return meta->table[index].value & BIT(flag); + return zram->table[index].value & BIT(flag); } -static void zram_set_flag(struct zram_meta *meta, u32 index, +static void zram_set_flag(struct zram *zram, u32 index, enum zram_pageflags flag) { - meta->table[index].value |= BIT(flag); + zram->table[index].value |= BIT(flag); } -static void zram_clear_flag(struct zram_meta *meta, u32 index, +static void zram_clear_flag(struct zram *zram, u32 index, enum zram_pageflags flag) { - meta->table[index].value &= ~BIT(flag); + zram->table[index].value &= ~BIT(flag); } -static size_t zram_get_obj_size(struct zram_meta *meta, u32 index) +static inline void zram_set_element(struct zram *zram, u32 index, + unsigned long element) { - return meta->table[index].value & (BIT(ZRAM_FLAG_SHIFT) - 1); + zram->table[index].element = element; } -static void zram_set_obj_size(struct zram_meta *meta, +static unsigned long zram_get_element(struct zram *zram, u32 index) +{ + return zram->table[index].element; +} + +static size_t zram_get_obj_size(struct zram *zram, u32 index) +{ + return zram->table[index].value & (BIT(ZRAM_FLAG_SHIFT) - 1); +} + +static void zram_set_obj_size(struct zram *zram, u32 index, size_t size) { - unsigned long flags = meta->table[index].value >> ZRAM_FLAG_SHIFT; + unsigned long flags = zram->table[index].value >> ZRAM_FLAG_SHIFT; - meta->table[index].value = (flags << ZRAM_FLAG_SHIFT) | size; + zram->table[index].value = (flags << ZRAM_FLAG_SHIFT) | size; } +#if PAGE_SIZE != 4096 static inline bool is_partial_io(struct bio_vec *bvec) { return bvec->bv_len != PAGE_SIZE; } - -static void zram_revalidate_disk(struct zram *zram) +#else +static inline bool is_partial_io(struct bio_vec *bvec) { - revalidate_disk(zram->disk); - /* revalidate_disk reset the BDI_CAP_STABLE_WRITES so set again */ - zram->disk->queue->backing_dev_info.capabilities |= - BDI_CAP_STABLE_WRITES; + return false; } +#endif /* * Check if request is within bounds and aligned on zram logical blocks. @@ -146,8 +170,7 @@ static inline bool valid_io_request(struct zram *zram, static void update_position(u32 *index, int *offset, struct bio_vec *bvec) { - if (*offset + bvec->bv_len >= PAGE_SIZE) - (*index)++; + *index += (*offset + bvec->bv_len) / PAGE_SIZE; *offset = (*offset + bvec->bv_len) % PAGE_SIZE; } @@ -166,36 +189,41 @@ static inline void update_used_max(struct zram *zram, } while (old_max != cur_max); } -static bool page_zero_filled(void *ptr) +static inline void zram_fill_page(char *ptr, unsigned long len, + unsigned long value) +{ + int i; + unsigned long *page = (unsigned long *)ptr; + + WARN_ON_ONCE(!IS_ALIGNED(len, sizeof(unsigned long))); + + if (likely(value == 0)) { + memset(ptr, 0, len); + } else { + for (i = 0; i < len / sizeof(*page); i++) + page[i] = value; + } +} + +static bool page_same_filled(void *ptr, unsigned long *element) { unsigned int pos; unsigned long *page; + unsigned long val; page = (unsigned long *)ptr; + val = page[0]; - for (pos = 0; pos != PAGE_SIZE / sizeof(*page); pos++) { - if (page[pos]) + for (pos = 1; pos < PAGE_SIZE / sizeof(*page); pos++) { + if (val != page[pos]) return false; } + *element = val; + return true; } -static void handle_zero_page(struct bio_vec *bvec) -{ - struct page *page = bvec->bv_page; - void *user_mem; - - user_mem = kmap_atomic(page); - if (is_partial_io(bvec)) - memset(user_mem + bvec->bv_offset, 0, bvec->bv_len); - else - clear_page(user_mem); - kunmap_atomic(user_mem); - - flush_dcache_page(page); -} - static ssize_t initstate_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -217,47 +245,6 @@ static ssize_t disksize_show(struct device *dev, return scnprintf(buf, PAGE_SIZE, "%llu\n", zram->disksize); } -static ssize_t orig_data_size_show(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct zram *zram = dev_to_zram(dev); - - deprecated_attr_warn("orig_data_size"); - return scnprintf(buf, PAGE_SIZE, "%llu\n", - (u64)(atomic64_read(&zram->stats.pages_stored)) << PAGE_SHIFT); -} - -static ssize_t mem_used_total_show(struct device *dev, - struct device_attribute *attr, char *buf) -{ - u64 val = 0; - struct zram *zram = dev_to_zram(dev); - - deprecated_attr_warn("mem_used_total"); - down_read(&zram->init_lock); - if (init_done(zram)) { - struct zram_meta *meta = zram->meta; - val = zs_get_total_pages(meta->mem_pool); - } - up_read(&zram->init_lock); - - return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); -} - -static ssize_t mem_limit_show(struct device *dev, - struct device_attribute *attr, char *buf) -{ - u64 val; - struct zram *zram = dev_to_zram(dev); - - deprecated_attr_warn("mem_limit"); - down_read(&zram->init_lock); - val = zram->limit_pages; - up_read(&zram->init_lock); - - return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); -} - static ssize_t mem_limit_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { @@ -276,21 +263,6 @@ static ssize_t mem_limit_store(struct device *dev, return len; } -static ssize_t mem_used_max_show(struct device *dev, - struct device_attribute *attr, char *buf) -{ - u64 val = 0; - struct zram *zram = dev_to_zram(dev); - - deprecated_attr_warn("mem_used_max"); - down_read(&zram->init_lock); - if (init_done(zram)) - val = atomic_long_read(&zram->stats.max_used_pages); - up_read(&zram->init_lock); - - return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); -} - static ssize_t mem_used_max_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { @@ -304,15 +276,485 @@ static ssize_t mem_used_max_store(struct device *dev, down_read(&zram->init_lock); if (init_done(zram)) { - struct zram_meta *meta = zram->meta; atomic_long_set(&zram->stats.max_used_pages, - zs_get_total_pages(meta->mem_pool)); + zs_get_total_pages(zram->mem_pool)); } up_read(&zram->init_lock); return len; } +#ifdef CONFIG_ZRAM_WRITEBACK +static bool zram_wb_enabled(struct zram *zram) +{ + return zram->backing_dev; +} + +static void reset_bdev(struct zram *zram) +{ + struct block_device *bdev; + + if (!zram_wb_enabled(zram)) + return; + + bdev = zram->bdev; + if (zram->old_block_size) + set_blocksize(bdev, zram->old_block_size); + blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL); + /* hope filp_close flush all of IO */ + filp_close(zram->backing_dev, NULL); + zram->backing_dev = NULL; + zram->old_block_size = 0; + zram->bdev = NULL; + + kvfree(zram->bitmap); + zram->bitmap = NULL; +} + +static ssize_t backing_dev_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct zram *zram = dev_to_zram(dev); + struct file *file = zram->backing_dev; + char *p; + ssize_t ret; + + down_read(&zram->init_lock); + if (!zram_wb_enabled(zram)) { + memcpy(buf, "none\n", 5); + up_read(&zram->init_lock); + return 5; + } + + p = file_path(file, buf, PAGE_SIZE - 1); + if (IS_ERR(p)) { + ret = PTR_ERR(p); + goto out; + } + + ret = strlen(p); + memmove(buf, p, ret); + buf[ret++] = '\n'; +out: + up_read(&zram->init_lock); + return ret; +} + +static ssize_t backing_dev_store(struct device *dev, + struct device_attribute *attr, const char *buf, size_t len) +{ + char *file_name; + size_t sz; + struct file *backing_dev = NULL; + struct inode *inode; + struct address_space *mapping; + unsigned int bitmap_sz, old_block_size = 0; + unsigned long nr_pages, *bitmap = NULL; + struct block_device *bdev = NULL; + int err; + struct zram *zram = dev_to_zram(dev); + gfp_t kmalloc_flags; + + file_name = kmalloc(PATH_MAX, GFP_KERNEL); + if (!file_name) + return -ENOMEM; + + down_write(&zram->init_lock); + if (init_done(zram)) { + pr_info("Can't setup backing device for initialized device\n"); + err = -EBUSY; + goto out; + } + + strlcpy(file_name, buf, PATH_MAX); + /* ignore trailing newline */ + sz = strlen(file_name); + if (sz > 0 && file_name[sz - 1] == '\n') + file_name[sz - 1] = 0x00; + + backing_dev = filp_open(file_name, O_RDWR|O_LARGEFILE, 0); + if (IS_ERR(backing_dev)) { + err = PTR_ERR(backing_dev); + backing_dev = NULL; + goto out; + } + + mapping = backing_dev->f_mapping; + inode = mapping->host; + + /* Support only block device in this moment */ + if (!S_ISBLK(inode->i_mode)) { + err = -ENOTBLK; + goto out; + } + + bdev = bdgrab(I_BDEV(inode)); + err = blkdev_get(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL, zram); + if (err < 0) + goto out; + + nr_pages = i_size_read(inode) >> PAGE_SHIFT; + bitmap_sz = BITS_TO_LONGS(nr_pages) * sizeof(long); + kmalloc_flags = GFP_KERNEL | __GFP_ZERO; + if (bitmap_sz > PAGE_SIZE) + kmalloc_flags |= __GFP_NOWARN | __GFP_NORETRY; + + bitmap = kmalloc_node(bitmap_sz, kmalloc_flags, NUMA_NO_NODE); + if (!bitmap && bitmap_sz > PAGE_SIZE) + bitmap = vzalloc(bitmap_sz); + + if (!bitmap) { + err = -ENOMEM; + goto out; + } + + old_block_size = block_size(bdev); + err = set_blocksize(bdev, PAGE_SIZE); + if (err) + goto out; + + reset_bdev(zram); + spin_lock_init(&zram->bitmap_lock); + + zram->old_block_size = old_block_size; + zram->bdev = bdev; + zram->backing_dev = backing_dev; + zram->bitmap = bitmap; + zram->nr_pages = nr_pages; + up_write(&zram->init_lock); + + pr_info("setup backing device %s\n", file_name); + kfree(file_name); + + return len; +out: + if (bitmap) + kvfree(bitmap); + + if (bdev) + blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL); + + if (backing_dev) + filp_close(backing_dev, NULL); + + up_write(&zram->init_lock); + + kfree(file_name); + + return err; +} + +static unsigned long get_entry_bdev(struct zram *zram) +{ + unsigned long entry; + + spin_lock(&zram->bitmap_lock); + /* skip 0 bit to confuse zram.handle = 0 */ + entry = find_next_zero_bit(zram->bitmap, zram->nr_pages, 1); + if (entry == zram->nr_pages) { + spin_unlock(&zram->bitmap_lock); + return 0; + } + + set_bit(entry, zram->bitmap); + spin_unlock(&zram->bitmap_lock); + + return entry; +} + +static void put_entry_bdev(struct zram *zram, unsigned long entry) +{ + int was_set; + + spin_lock(&zram->bitmap_lock); + was_set = test_and_clear_bit(entry, zram->bitmap); + spin_unlock(&zram->bitmap_lock); + WARN_ON_ONCE(!was_set); +} + +static void zram_page_end_io(struct bio *bio) +{ + struct page *page = bio->bi_io_vec[0].bv_page; + + page_endio(page, op_is_write(bio_op(bio)), bio->bi_error); + bio_put(bio); +} + +/* + * Returns 1 if the submission is successful. + */ +static int read_from_bdev_async(struct zram *zram, struct bio_vec *bvec, + unsigned long entry, struct bio *parent) +{ + struct bio *bio; + + bio = bio_alloc(GFP_ATOMIC, 1); + if (!bio) + return -ENOMEM; + + bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9); + bio->bi_bdev = zram->bdev; + if (!bio_add_page(bio, bvec->bv_page, bvec->bv_len, bvec->bv_offset)) { + bio_put(bio); + return -EIO; + } + + if (!parent) { + bio->bi_opf = REQ_OP_READ; + bio->bi_end_io = zram_page_end_io; + } else { + bio->bi_opf = parent->bi_opf; + bio_chain(bio, parent); + } + + submit_bio(bio); + return 1; +} + +struct zram_work { + struct work_struct work; + struct zram *zram; + unsigned long entry; + struct bio *bio; +}; + +#if PAGE_SIZE != 4096 +static void zram_sync_read(struct work_struct *work) +{ + struct bio_vec bvec; + struct zram_work *zw = container_of(work, struct zram_work, work); + struct zram *zram = zw->zram; + unsigned long entry = zw->entry; + struct bio *bio = zw->bio; + + read_from_bdev_async(zram, &bvec, entry, bio); +} + +/* + * Block layer want one ->make_request_fn to be active at a time + * so if we use chained IO with parent IO in same context, + * it's a deadlock. To avoid, it, it uses worker thread context. + */ +static int read_from_bdev_sync(struct zram *zram, struct bio_vec *bvec, + unsigned long entry, struct bio *bio) +{ + struct zram_work work; + + work.zram = zram; + work.entry = entry; + work.bio = bio; + + INIT_WORK_ONSTACK(&work.work, zram_sync_read); + queue_work(system_unbound_wq, &work.work); + flush_work(&work.work); + destroy_work_on_stack(&work.work); + + return 1; +} +#else +static int read_from_bdev_sync(struct zram *zram, struct bio_vec *bvec, + unsigned long entry, struct bio *bio) +{ + WARN_ON(1); + return -EIO; +} +#endif + +static int read_from_bdev(struct zram *zram, struct bio_vec *bvec, + unsigned long entry, struct bio *parent, bool sync) +{ + if (sync) + return read_from_bdev_sync(zram, bvec, entry, parent); + else + return read_from_bdev_async(zram, bvec, entry, parent); +} + +static int write_to_bdev(struct zram *zram, struct bio_vec *bvec, + u32 index, struct bio *parent, + unsigned long *pentry) +{ + struct bio *bio; + unsigned long entry; + + bio = bio_alloc(GFP_ATOMIC, 1); + if (!bio) + return -ENOMEM; + + entry = get_entry_bdev(zram); + if (!entry) { + bio_put(bio); + return -ENOSPC; + } + + bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9); + bio->bi_bdev = zram->bdev; + if (!bio_add_page(bio, bvec->bv_page, bvec->bv_len, + bvec->bv_offset)) { + bio_put(bio); + put_entry_bdev(zram, entry); + return -EIO; + } + + if (!parent) { + bio->bi_opf = REQ_OP_WRITE | REQ_SYNC; + bio->bi_end_io = zram_page_end_io; + } else { + bio->bi_opf = parent->bi_opf; + bio_chain(bio, parent); + } + + submit_bio(bio); + *pentry = entry; + + return 0; +} + +static void zram_wb_clear(struct zram *zram, u32 index) +{ + unsigned long entry; + + zram_clear_flag(zram, index, ZRAM_WB); + entry = zram_get_element(zram, index); + zram_set_element(zram, index, 0); + put_entry_bdev(zram, entry); +} + +#else +static bool zram_wb_enabled(struct zram *zram) { return false; } +static inline void reset_bdev(struct zram *zram) {}; +static int write_to_bdev(struct zram *zram, struct bio_vec *bvec, + u32 index, struct bio *parent, + unsigned long *pentry) + +{ + return -EIO; +} + +static int read_from_bdev(struct zram *zram, struct bio_vec *bvec, + unsigned long entry, struct bio *parent, bool sync) +{ + return -EIO; +} +static void zram_wb_clear(struct zram *zram, u32 index) {} +#endif + +#ifdef CONFIG_ZRAM_MEMORY_TRACKING + +static struct dentry *zram_debugfs_root; + +static void zram_debugfs_create(void) +{ + zram_debugfs_root = debugfs_create_dir("zram", NULL); +} + +static void zram_debugfs_destroy(void) +{ + debugfs_remove_recursive(zram_debugfs_root); +} + +static void zram_accessed(struct zram *zram, u32 index) +{ + zram->table[index].ac_time = ktime_get_boottime(); +} + +static void zram_reset_access(struct zram *zram, u32 index) +{ + zram->table[index].ac_time.tv64 = 0; +} + +static ssize_t read_block_state(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + char *kbuf; + ssize_t index, written = 0; + struct zram *zram = file->private_data; + unsigned long nr_pages = zram->disksize >> PAGE_SHIFT; + struct timespec64 ts; + + gfp_t kmalloc_flags; + + kmalloc_flags = GFP_KERNEL; + if (count > PAGE_SIZE) + kmalloc_flags |= __GFP_NOWARN | __GFP_NORETRY; + + kbuf = kmalloc_node(count, kmalloc_flags, NUMA_NO_NODE); + if (!kbuf && count > PAGE_SIZE) + kbuf = vmalloc(count); + if (!kbuf) + return -ENOMEM; + + down_read(&zram->init_lock); + if (!init_done(zram)) { + up_read(&zram->init_lock); + kvfree(kbuf); + return -EINVAL; + } + + for (index = *ppos; index < nr_pages; index++) { + int copied; + + zram_slot_lock(zram, index); + if (!zram_allocated(zram, index)) + goto next; + + ts = ktime_to_timespec64(zram->table[index].ac_time); + copied = snprintf(kbuf + written, count, + "%12zd %12lld.%06lu %c%c%c\n", + index, (s64)ts.tv_sec, + ts.tv_nsec / NSEC_PER_USEC, + zram_test_flag(zram, index, ZRAM_SAME) ? 's' : '.', + zram_test_flag(zram, index, ZRAM_WB) ? 'w' : '.', + zram_test_flag(zram, index, ZRAM_HUGE) ? 'h' : '.'); + + if (count < copied) { + zram_slot_unlock(zram, index); + break; + } + written += copied; + count -= copied; +next: + zram_slot_unlock(zram, index); + *ppos += 1; + } + + up_read(&zram->init_lock); + if (copy_to_user(buf, kbuf, written)) + written = -EFAULT; + kvfree(kbuf); + + return written; +} + +static const struct file_operations proc_zram_block_state_op = { + .open = simple_open, + .read = read_block_state, + .llseek = default_llseek, +}; + +static void zram_debugfs_register(struct zram *zram) +{ + if (!zram_debugfs_root) + return; + + zram->debugfs_dir = debugfs_create_dir(zram->disk->disk_name, + zram_debugfs_root); + debugfs_create_file("block_state", 0400, zram->debugfs_dir, + zram, &proc_zram_block_state_op); +} + +static void zram_debugfs_unregister(struct zram *zram) +{ + debugfs_remove_recursive(zram->debugfs_dir); +} +#else +static void zram_debugfs_create(void) {}; +static void zram_debugfs_destroy(void) {}; +static void zram_accessed(struct zram *zram, u32 index) {}; +static void zram_reset_access(struct zram *zram, u32 index) {}; +static void zram_debugfs_register(struct zram *zram) {}; +static void zram_debugfs_unregister(struct zram *zram) {}; +#endif + /* * We switched to per-cpu streams and this attr is not needed anymore. * However, we will keep it around for some time, because: @@ -351,7 +793,7 @@ static ssize_t comp_algorithm_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { struct zram *zram = dev_to_zram(dev); - char compressor[CRYPTO_MAX_ALG_NAME]; + char compressor[ARRAY_SIZE(zram->compressor)]; size_t sz; strlcpy(compressor, buf, sizeof(compressor)); @@ -370,7 +812,7 @@ static ssize_t comp_algorithm_store(struct device *dev, return -EBUSY; } - strlcpy(zram->compressor, compressor, sizeof(compressor)); + strcpy(zram->compressor, compressor); up_write(&zram->init_lock); return len; } @@ -379,7 +821,6 @@ static ssize_t compact_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { struct zram *zram = dev_to_zram(dev); - struct zram_meta *meta; down_read(&zram->init_lock); if (!init_done(zram)) { @@ -387,8 +828,7 @@ static ssize_t compact_store(struct device *dev, return -EINVAL; } - meta = zram->meta; - zs_compact(meta->mem_pool); + zs_compact(zram->mem_pool); up_read(&zram->init_lock); return len; @@ -425,22 +865,23 @@ static ssize_t mm_stat_show(struct device *dev, down_read(&zram->init_lock); if (init_done(zram)) { - mem_used = zs_get_total_pages(zram->meta->mem_pool); - zs_pool_stats(zram->meta->mem_pool, &pool_stats); + mem_used = zs_get_total_pages(zram->mem_pool); + zs_pool_stats(zram->mem_pool, &pool_stats); } orig_size = atomic64_read(&zram->stats.pages_stored); max_used = atomic_long_read(&zram->stats.max_used_pages); ret = scnprintf(buf, PAGE_SIZE, - "%8llu %8llu %8llu %8lu %8ld %8llu %8lu\n", + "%8llu %8llu %8llu %8lu %8ld %8llu %8lu %8llu\n", orig_size << PAGE_SHIFT, (u64)atomic64_read(&zram->stats.compr_data_size), mem_used << PAGE_SHIFT, zram->limit_pages << PAGE_SHIFT, max_used << PAGE_SHIFT, - (u64)atomic64_read(&zram->stats.zero_pages), - pool_stats.pages_compacted); + (u64)atomic64_read(&zram->stats.same_pages), + pool_stats.pages_compacted, + (u64)atomic64_read(&zram->stats.huge_pages)); up_read(&zram->init_lock); return ret; @@ -466,74 +907,38 @@ static ssize_t debug_stat_show(struct device *dev, static DEVICE_ATTR_RO(io_stat); static DEVICE_ATTR_RO(mm_stat); static DEVICE_ATTR_RO(debug_stat); -ZRAM_ATTR_RO(num_reads); -ZRAM_ATTR_RO(num_writes); -ZRAM_ATTR_RO(failed_reads); -ZRAM_ATTR_RO(failed_writes); -ZRAM_ATTR_RO(invalid_io); -ZRAM_ATTR_RO(notify_free); -ZRAM_ATTR_RO(zero_pages); -ZRAM_ATTR_RO(compr_data_size); -static inline bool zram_meta_get(struct zram *zram) -{ - if (atomic_inc_not_zero(&zram->refcount)) - return true; - return false; -} - -static inline void zram_meta_put(struct zram *zram) -{ - atomic_dec(&zram->refcount); -} - -static void zram_meta_free(struct zram_meta *meta, u64 disksize) +static void zram_meta_free(struct zram *zram, u64 disksize) { size_t num_pages = disksize >> PAGE_SHIFT; size_t index; /* Free all pages that are still in this zram device */ - for (index = 0; index < num_pages; index++) { - unsigned long handle = meta->table[index].handle; + for (index = 0; index < num_pages; index++) + zram_free_page(zram, index); - if (!handle) - continue; - - zs_free(meta->mem_pool, handle); - } - - zs_destroy_pool(meta->mem_pool); - vfree(meta->table); - kfree(meta); + zs_destroy_pool(zram->mem_pool); + vfree(zram->table); } -static struct zram_meta *zram_meta_alloc(char *pool_name, u64 disksize) +static bool zram_meta_alloc(struct zram *zram, u64 disksize) { size_t num_pages; - struct zram_meta *meta = kmalloc(sizeof(*meta), GFP_KERNEL); - - if (!meta) - return NULL; num_pages = disksize >> PAGE_SHIFT; - meta->table = vzalloc(num_pages * sizeof(*meta->table)); - if (!meta->table) { - pr_err("Error allocating zram address table\n"); - goto out_error; + zram->table = vzalloc(num_pages * sizeof(*zram->table)); + if (!zram->table) + return false; + + zram->mem_pool = zs_create_pool(zram->disk->disk_name); + if (!zram->mem_pool) { + vfree(zram->table); + return false; } - meta->mem_pool = zs_create_pool(pool_name); - if (!meta->mem_pool) { - pr_err("Error creating memory pool\n"); - goto out_error; - } - - return meta; - -out_error: - vfree(meta->table); - kfree(meta); - return NULL; + if (!huge_class_size) + huge_class_size = zs_huge_class_size(zram->mem_pool); + return true; } /* @@ -543,191 +948,194 @@ static struct zram_meta *zram_meta_alloc(char *pool_name, u64 disksize) */ static void zram_free_page(struct zram *zram, size_t index) { - struct zram_meta *meta = zram->meta; - unsigned long handle = meta->table[index].handle; + unsigned long handle; - if (unlikely(!handle)) { - /* - * No memory is allocated for zero filled pages. - * Simply clear zero page flag. - */ - if (zram_test_flag(meta, index, ZRAM_ZERO)) { - zram_clear_flag(meta, index, ZRAM_ZERO); - atomic64_dec(&zram->stats.zero_pages); - } + zram_reset_access(zram, index); + + if (zram_test_flag(zram, index, ZRAM_HUGE)) { + zram_clear_flag(zram, index, ZRAM_HUGE); + atomic64_dec(&zram->stats.huge_pages); + } + + if (zram_wb_enabled(zram) && zram_test_flag(zram, index, ZRAM_WB)) { + zram_wb_clear(zram, index); + atomic64_dec(&zram->stats.pages_stored); return; } - zs_free(meta->mem_pool, handle); + /* + * No memory is allocated for same element filled pages. + * Simply clear same page flag. + */ + if (zram_test_flag(zram, index, ZRAM_SAME)) { + zram_clear_flag(zram, index, ZRAM_SAME); + zram_set_element(zram, index, 0); + atomic64_dec(&zram->stats.same_pages); + atomic64_dec(&zram->stats.pages_stored); + return; + } - atomic64_sub(zram_get_obj_size(meta, index), + handle = zram_get_handle(zram, index); + if (!handle) + return; + + zs_free(zram->mem_pool, handle); + + atomic64_sub(zram_get_obj_size(zram, index), &zram->stats.compr_data_size); atomic64_dec(&zram->stats.pages_stored); - meta->table[index].handle = 0; - zram_set_obj_size(meta, index, 0); + zram_set_handle(zram, index, 0); + zram_set_obj_size(zram, index, 0); } -static int zram_decompress_page(struct zram *zram, char *mem, u32 index) +static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index, + struct bio *bio, bool partial_io) { - int ret = 0; - unsigned char *cmem; - struct zram_meta *meta = zram->meta; + int ret; unsigned long handle; unsigned int size; + void *src, *dst; - bit_spin_lock(ZRAM_ACCESS, &meta->table[index].value); - handle = meta->table[index].handle; - size = zram_get_obj_size(meta, index); + if (zram_wb_enabled(zram)) { + zram_slot_lock(zram, index); + if (zram_test_flag(zram, index, ZRAM_WB)) { + struct bio_vec bvec; - if (!handle || zram_test_flag(meta, index, ZRAM_ZERO)) { - bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value); - memset(mem, 0, PAGE_SIZE); + zram_slot_unlock(zram, index); + + bvec.bv_page = page; + bvec.bv_len = PAGE_SIZE; + bvec.bv_offset = 0; + return read_from_bdev(zram, &bvec, + zram_get_element(zram, index), + bio, partial_io); + } + zram_slot_unlock(zram, index); + } + + zram_slot_lock(zram, index); + handle = zram_get_handle(zram, index); + if (!handle || zram_test_flag(zram, index, ZRAM_SAME)) { + unsigned long value; + void *mem; + + value = handle ? zram_get_element(zram, index) : 0; + mem = kmap_atomic(page); + zram_fill_page(mem, PAGE_SIZE, value); + kunmap_atomic(mem); + zram_slot_unlock(zram, index); return 0; } - cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_RO); + size = zram_get_obj_size(zram, index); + + src = zs_map_object(zram->mem_pool, handle, ZS_MM_RO); if (size == PAGE_SIZE) { - memcpy(mem, cmem, PAGE_SIZE); + dst = kmap_atomic(page); + memcpy(dst, src, PAGE_SIZE); + kunmap_atomic(dst); + ret = 0; } else { struct zcomp_strm *zstrm = zcomp_stream_get(zram->comp); - ret = zcomp_decompress(zstrm, cmem, size, mem); + dst = kmap_atomic(page); + ret = zcomp_decompress(zstrm, src, size, dst); + kunmap_atomic(dst); zcomp_stream_put(zram->comp); } - zs_unmap_object(meta->mem_pool, handle); - bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value); + zs_unmap_object(zram->mem_pool, handle); + zram_slot_unlock(zram, index); /* Should NEVER happen. Return bio error if it does. */ - if (unlikely(ret)) { - pr_err("Decompression failed! err=%d, page=%u\n", ret, index); - return ret; - } - - return 0; -} - -static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec, - u32 index, int offset) -{ - int ret; - struct page *page; - unsigned char *user_mem, *uncmem = NULL; - struct zram_meta *meta = zram->meta; - page = bvec->bv_page; - - bit_spin_lock(ZRAM_ACCESS, &meta->table[index].value); - if (unlikely(!meta->table[index].handle) || - zram_test_flag(meta, index, ZRAM_ZERO)) { - bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value); - handle_zero_page(bvec); - return 0; - } - bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value); - - if (is_partial_io(bvec)) - /* Use a temporary buffer to decompress the page */ - uncmem = kmalloc(PAGE_SIZE, GFP_NOIO); - - user_mem = kmap_atomic(page); - if (!is_partial_io(bvec)) - uncmem = user_mem; - - if (!uncmem) { - pr_err("Unable to allocate temp memory\n"); - ret = -ENOMEM; - goto out_cleanup; - } - - ret = zram_decompress_page(zram, uncmem, index); - /* Should NEVER happen. Return bio error if it does. */ if (unlikely(ret)) - goto out_cleanup; + pr_err("Decompression failed! err=%d, page=%u\n", ret, index); - if (is_partial_io(bvec)) - memcpy(user_mem + bvec->bv_offset, uncmem + offset, - bvec->bv_len); - - flush_dcache_page(page); - ret = 0; -out_cleanup: - kunmap_atomic(user_mem); - if (is_partial_io(bvec)) - kfree(uncmem); return ret; } -static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, - int offset) +static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec, + u32 index, int offset, struct bio *bio) { - int ret = 0; - unsigned int clen; - unsigned long handle = 0; + int ret; struct page *page; - unsigned char *user_mem, *cmem, *src, *uncmem = NULL; - struct zram_meta *meta = zram->meta; - struct zcomp_strm *zstrm = NULL; - unsigned long alloced_pages; page = bvec->bv_page; if (is_partial_io(bvec)) { - /* - * This is a partial IO. We need to read the full page - * before to write the changes. - */ - uncmem = kmalloc(PAGE_SIZE, GFP_NOIO); - if (!uncmem) { - ret = -ENOMEM; - goto out; - } - ret = zram_decompress_page(zram, uncmem, index); - if (ret) - goto out; + /* Use a temporary buffer to decompress the page */ + page = alloc_page(GFP_NOIO|__GFP_HIGHMEM); + if (!page) + return -ENOMEM; } + ret = __zram_bvec_read(zram, page, index, bio, is_partial_io(bvec)); + if (unlikely(ret)) + goto out; + + if (is_partial_io(bvec)) { + void *dst = kmap_atomic(bvec->bv_page); + void *src = kmap_atomic(page); + + memcpy(dst + bvec->bv_offset, src + offset, bvec->bv_len); + kunmap_atomic(src); + kunmap_atomic(dst); + } +out: + if (is_partial_io(bvec)) + __free_page(page); + + return ret; +} + +static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, + u32 index, struct bio *bio) +{ + int ret = 0; + unsigned long alloced_pages; + unsigned long handle = 0; + unsigned int comp_len = 0; + void *src, *dst, *mem; + struct zcomp_strm *zstrm; + struct page *page = bvec->bv_page; + unsigned long element = 0; + enum zram_pageflags flags = 0; + bool allow_wb = true; + + mem = kmap_atomic(page); + if (page_same_filled(mem, &element)) { + kunmap_atomic(mem); + /* Free memory associated with this sector now. */ + flags = ZRAM_SAME; + atomic64_inc(&zram->stats.same_pages); + goto out; + } + kunmap_atomic(mem); + compress_again: - user_mem = kmap_atomic(page); - if (is_partial_io(bvec)) { - memcpy(uncmem + offset, user_mem + bvec->bv_offset, - bvec->bv_len); - kunmap_atomic(user_mem); - user_mem = NULL; - } else { - uncmem = user_mem; - } - - if (page_zero_filled(uncmem)) { - if (user_mem) - kunmap_atomic(user_mem); - /* Free memory associated with this sector now. */ - bit_spin_lock(ZRAM_ACCESS, &meta->table[index].value); - zram_free_page(zram, index); - zram_set_flag(meta, index, ZRAM_ZERO); - bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value); - - atomic64_inc(&zram->stats.zero_pages); - ret = 0; - goto out; - } - zstrm = zcomp_stream_get(zram->comp); - ret = zcomp_compress(zstrm, uncmem, &clen); - if (!is_partial_io(bvec)) { - kunmap_atomic(user_mem); - user_mem = NULL; - uncmem = NULL; - } + src = kmap_atomic(page); + ret = zcomp_compress(zstrm, src, &comp_len); + kunmap_atomic(src); if (unlikely(ret)) { + zcomp_stream_put(zram->comp); pr_err("Compression failed! err=%d\n", ret); - goto out; + zs_free(zram->mem_pool, handle); + return ret; } - src = zstrm->buffer; - if (unlikely(clen > max_zpage_size)) { - clen = PAGE_SIZE; - if (is_partial_io(bvec)) - src = uncmem; + if (unlikely(comp_len >= huge_class_size)) { + if (zram_wb_enabled(zram) && allow_wb) { + zcomp_stream_put(zram->comp); + ret = write_to_bdev(zram, bvec, index, bio, &element); + if (!ret) { + flags = ZRAM_WB; + ret = 1; + goto out; + } + allow_wb = false; + goto compress_again; + } } /* @@ -744,71 +1152,108 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, * from the slow path and handle has already been allocated. */ if (!handle) - handle = zs_malloc(meta->mem_pool, clen, + handle = zs_malloc(zram->mem_pool, comp_len, __GFP_KSWAPD_RECLAIM | __GFP_NOWARN | __GFP_HIGHMEM | __GFP_MOVABLE); if (!handle) { zcomp_stream_put(zram->comp); - zstrm = NULL; - atomic64_inc(&zram->stats.writestall); - - handle = zs_malloc(meta->mem_pool, clen, + handle = zs_malloc(zram->mem_pool, comp_len, GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE); if (handle) goto compress_again; - - pr_err("Error allocating memory for compressed page: %u, size=%u\n", - index, clen); - ret = -ENOMEM; - goto out; + return -ENOMEM; } - alloced_pages = zs_get_total_pages(meta->mem_pool); + alloced_pages = zs_get_total_pages(zram->mem_pool); update_used_max(zram, alloced_pages); if (zram->limit_pages && alloced_pages > zram->limit_pages) { - zs_free(meta->mem_pool, handle); - ret = -ENOMEM; - goto out; + zcomp_stream_put(zram->comp); + zs_free(zram->mem_pool, handle); + return -ENOMEM; } - cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); + dst = zs_map_object(zram->mem_pool, handle, ZS_MM_WO); - if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { + src = zstrm->buffer; + if (comp_len == PAGE_SIZE) src = kmap_atomic(page); - memcpy(cmem, src, PAGE_SIZE); + memcpy(dst, src, comp_len); + if (comp_len == PAGE_SIZE) kunmap_atomic(src); - } else { - memcpy(cmem, src, clen); - } zcomp_stream_put(zram->comp); - zstrm = NULL; - zs_unmap_object(meta->mem_pool, handle); - + zs_unmap_object(zram->mem_pool, handle); + atomic64_add(comp_len, &zram->stats.compr_data_size); +out: /* * Free memory associated with this sector * before overwriting unused sectors. */ - bit_spin_lock(ZRAM_ACCESS, &meta->table[index].value); + zram_slot_lock(zram, index); zram_free_page(zram, index); - meta->table[index].handle = handle; - zram_set_obj_size(meta, index, clen); - bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value); + if (comp_len == PAGE_SIZE) { + zram_set_flag(zram, index, ZRAM_HUGE); + atomic64_inc(&zram->stats.huge_pages); + } + + if (flags) { + zram_set_flag(zram, index, flags); + zram_set_element(zram, index, element); + } else { + zram_set_handle(zram, index, handle); + zram_set_obj_size(zram, index, comp_len); + } + zram_slot_unlock(zram, index); /* Update stats */ - atomic64_add(clen, &zram->stats.compr_data_size); atomic64_inc(&zram->stats.pages_stored); + return ret; +} + +static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, + u32 index, int offset, struct bio *bio) +{ + int ret; + struct page *page = NULL; + void *src; + struct bio_vec vec; + + vec = *bvec; + if (is_partial_io(bvec)) { + void *dst; + /* + * This is a partial IO. We need to read the full page + * before to write the changes. + */ + page = alloc_page(GFP_NOIO|__GFP_HIGHMEM); + if (!page) + return -ENOMEM; + + ret = __zram_bvec_read(zram, page, index, bio, true); + if (ret) + goto out; + + src = kmap_atomic(bvec->bv_page); + dst = kmap_atomic(page); + memcpy(dst + offset, src + bvec->bv_offset, bvec->bv_len); + kunmap_atomic(dst); + kunmap_atomic(src); + + vec.bv_page = page; + vec.bv_len = PAGE_SIZE; + vec.bv_offset = 0; + } + + ret = __zram_bvec_write(zram, &vec, index, bio); out: - if (zstrm) - zcomp_stream_put(zram->comp); if (is_partial_io(bvec)) - kfree(uncmem); + __free_page(page); return ret; } @@ -821,7 +1266,6 @@ static void zram_bio_discard(struct zram *zram, u32 index, int offset, struct bio *bio) { size_t n = bio->bi_iter.bi_size; - struct zram_meta *meta = zram->meta; /* * zram manages data in physical block size units. Because logical block @@ -842,17 +1286,22 @@ static void zram_bio_discard(struct zram *zram, u32 index, } while (n >= PAGE_SIZE) { - bit_spin_lock(ZRAM_ACCESS, &meta->table[index].value); + zram_slot_lock(zram, index); zram_free_page(zram, index); - bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value); + zram_slot_unlock(zram, index); atomic64_inc(&zram->stats.notify_free); index++; n -= PAGE_SIZE; } } +/* + * Returns errno if it has some problem. Otherwise return 0 or 1. + * Returns 0 if IO request was done synchronously + * Returns 1 if IO request was successfully submitted. + */ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index, - int offset, bool is_write) + int offset, bool is_write, struct bio *bio) { unsigned long start_time = jiffies; int rw_acct = is_write ? REQ_OP_WRITE : REQ_OP_READ; @@ -863,15 +1312,20 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index, if (!is_write) { atomic64_inc(&zram->stats.num_reads); - ret = zram_bvec_read(zram, bvec, index, offset); + ret = zram_bvec_read(zram, bvec, index, offset, bio); + flush_dcache_page(bvec->bv_page); } else { atomic64_inc(&zram->stats.num_writes); - ret = zram_bvec_write(zram, bvec, index, offset); + ret = zram_bvec_write(zram, bvec, index, offset, bio); } generic_end_io_acct(rw_acct, &zram->disk->part0, start_time); - if (unlikely(ret)) { + zram_slot_lock(zram, index); + zram_accessed(zram, index); + zram_slot_unlock(zram, index); + + if (unlikely(ret < 0)) { if (!is_write) atomic64_inc(&zram->stats.failed_reads); else @@ -899,34 +1353,21 @@ static void __zram_make_request(struct zram *zram, struct bio *bio) } bio_for_each_segment(bvec, bio, iter) { - int max_transfer_size = PAGE_SIZE - offset; + struct bio_vec bv = bvec; + unsigned int unwritten = bvec.bv_len; - if (bvec.bv_len > max_transfer_size) { - /* - * zram_bvec_rw() can only make operation on a single - * zram page. Split the bio vector. - */ - struct bio_vec bv; - - bv.bv_page = bvec.bv_page; - bv.bv_len = max_transfer_size; - bv.bv_offset = bvec.bv_offset; - + do { + bv.bv_len = min_t(unsigned int, PAGE_SIZE - offset, + unwritten); if (zram_bvec_rw(zram, &bv, index, offset, - op_is_write(bio_op(bio))) < 0) + op_is_write(bio_op(bio)), bio) < 0) goto out; - bv.bv_len = bvec.bv_len - max_transfer_size; - bv.bv_offset += max_transfer_size; - if (zram_bvec_rw(zram, &bv, index + 1, 0, - op_is_write(bio_op(bio))) < 0) - goto out; - } else - if (zram_bvec_rw(zram, &bvec, index, offset, - op_is_write(bio_op(bio))) < 0) - goto out; + bv.bv_offset += bv.bv_len; + unwritten -= bv.bv_len; - update_position(&index, &offset, &bvec); + update_position(&index, &offset, &bv); + } while (unwritten); } bio_endio(bio); @@ -943,22 +1384,15 @@ static blk_qc_t zram_make_request(struct request_queue *queue, struct bio *bio) { struct zram *zram = queue->queuedata; - if (unlikely(!zram_meta_get(zram))) - goto error; - - blk_queue_split(queue, &bio, queue->bio_split); - if (!valid_io_request(zram, bio->bi_iter.bi_sector, bio->bi_iter.bi_size)) { atomic64_inc(&zram->stats.invalid_io); - goto put_zram; + goto error; } __zram_make_request(zram, bio); - zram_meta_put(zram); return BLK_QC_T_NONE; -put_zram: - zram_meta_put(zram); + error: bio_io_error(bio); return BLK_QC_T_NONE; @@ -968,45 +1402,39 @@ static void zram_slot_free_notify(struct block_device *bdev, unsigned long index) { struct zram *zram; - struct zram_meta *meta; zram = bdev->bd_disk->private_data; - meta = zram->meta; - bit_spin_lock(ZRAM_ACCESS, &meta->table[index].value); + zram_slot_lock(zram, index); zram_free_page(zram, index); - bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value); + zram_slot_unlock(zram, index); atomic64_inc(&zram->stats.notify_free); } static int zram_rw_page(struct block_device *bdev, sector_t sector, struct page *page, bool is_write) { - int offset, err = -EIO; + int offset, ret; u32 index; struct zram *zram; struct bio_vec bv; zram = bdev->bd_disk->private_data; - if (unlikely(!zram_meta_get(zram))) - goto out; if (!valid_io_request(zram, sector, PAGE_SIZE)) { atomic64_inc(&zram->stats.invalid_io); - err = -EINVAL; - goto put_zram; + ret = -EINVAL; + goto out; } index = sector >> SECTORS_PER_PAGE_SHIFT; - offset = sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT; + offset = (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT; bv.bv_page = page; bv.bv_len = PAGE_SIZE; bv.bv_offset = 0; - err = zram_bvec_rw(zram, &bv, index, offset, is_write); -put_zram: - zram_meta_put(zram); + ret = zram_bvec_rw(zram, &bv, index, offset, is_write, NULL); out: /* * If I/O fails, just return error(ie, non-zero) without @@ -1016,14 +1444,24 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector, * bio->bi_end_io does things to handle the error * (e.g., SetPageError, set_page_dirty and extra works). */ - if (err == 0) + if (unlikely(ret < 0)) + return ret; + + switch (ret) { + case 0: page_endio(page, is_write, 0); - return err; + break; + case 1: + ret = 0; + break; + default: + WARN_ON(1); + } + return ret; } static void zram_reset_device(struct zram *zram) { - struct zram_meta *meta; struct zcomp *comp; u64 disksize; @@ -1036,23 +1474,8 @@ static void zram_reset_device(struct zram *zram) return; } - meta = zram->meta; comp = zram->comp; disksize = zram->disksize; - /* - * Refcount will go down to 0 eventually and r/w handler - * cannot handle further I/O so it will bail out by - * check zram_meta_get. - */ - zram_meta_put(zram); - /* - * We want to free zram_meta in process context to avoid - * deadlock between reclaim path and any other locks. - */ - wait_event(zram->io_done, atomic_read(&zram->refcount) == 0); - - /* Reset stats */ - memset(&zram->stats, 0, sizeof(zram->stats)); zram->disksize = 0; set_capacity(zram->disk, 0); @@ -1060,8 +1483,10 @@ static void zram_reset_device(struct zram *zram) up_write(&zram->init_lock); /* I/O operation under all of CPU are done so let's free */ - zram_meta_free(meta, disksize); + zram_meta_free(zram, disksize); + memset(&zram->stats, 0, sizeof(zram->stats)); zcomp_destroy(comp); + reset_bdev(zram); } static ssize_t disksize_store(struct device *dev, @@ -1069,7 +1494,6 @@ static ssize_t disksize_store(struct device *dev, { u64 disksize; struct zcomp *comp; - struct zram_meta *meta; struct zram *zram = dev_to_zram(dev); int err; @@ -1077,10 +1501,18 @@ static ssize_t disksize_store(struct device *dev, if (!disksize) return -EINVAL; + down_write(&zram->init_lock); + if (init_done(zram)) { + pr_info("Cannot change disksize for initialized device\n"); + err = -EBUSY; + goto out_unlock; + } + disksize = PAGE_ALIGN(disksize); - meta = zram_meta_alloc(zram->disk->disk_name, disksize); - if (!meta) - return -ENOMEM; + if (!zram_meta_alloc(zram, disksize)) { + err = -ENOMEM; + goto out_unlock; + } comp = zcomp_create(zram->compressor); if (IS_ERR(comp)) { @@ -1090,29 +1522,19 @@ static ssize_t disksize_store(struct device *dev, goto out_free_meta; } - down_write(&zram->init_lock); - if (init_done(zram)) { - pr_info("Cannot change disksize for initialized device\n"); - err = -EBUSY; - goto out_destroy_comp; - } - - init_waitqueue_head(&zram->io_done); - atomic_set(&zram->refcount, 1); - zram->meta = meta; zram->comp = comp; zram->disksize = disksize; set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT); - zram_revalidate_disk(zram); + + revalidate_disk(zram->disk); up_write(&zram->init_lock); return len; -out_destroy_comp: - up_write(&zram->init_lock); - zcomp_destroy(comp); out_free_meta: - zram_meta_free(meta, disksize); + zram_meta_free(zram, disksize); +out_unlock: + up_write(&zram->init_lock); return err; } @@ -1151,7 +1573,7 @@ static ssize_t reset_store(struct device *dev, /* Make sure all the pending I/O are finished */ fsync_bdev(bdev); zram_reset_device(zram); - zram_revalidate_disk(zram); + revalidate_disk(zram->disk); bdput(bdev); mutex_lock(&bdev->bd_mutex); @@ -1187,39 +1609,33 @@ static DEVICE_ATTR_WO(compact); static DEVICE_ATTR_RW(disksize); static DEVICE_ATTR_RO(initstate); static DEVICE_ATTR_WO(reset); -static DEVICE_ATTR_RO(orig_data_size); -static DEVICE_ATTR_RO(mem_used_total); -static DEVICE_ATTR_RW(mem_limit); -static DEVICE_ATTR_RW(mem_used_max); +static DEVICE_ATTR_WO(mem_limit); +static DEVICE_ATTR_WO(mem_used_max); static DEVICE_ATTR_RW(max_comp_streams); static DEVICE_ATTR_RW(comp_algorithm); +#ifdef CONFIG_ZRAM_WRITEBACK +static DEVICE_ATTR_RW(backing_dev); +#endif static struct attribute *zram_disk_attrs[] = { &dev_attr_disksize.attr, &dev_attr_initstate.attr, &dev_attr_reset.attr, - &dev_attr_num_reads.attr, - &dev_attr_num_writes.attr, - &dev_attr_failed_reads.attr, - &dev_attr_failed_writes.attr, &dev_attr_compact.attr, - &dev_attr_invalid_io.attr, - &dev_attr_notify_free.attr, - &dev_attr_zero_pages.attr, - &dev_attr_orig_data_size.attr, - &dev_attr_compr_data_size.attr, - &dev_attr_mem_used_total.attr, &dev_attr_mem_limit.attr, &dev_attr_mem_used_max.attr, &dev_attr_max_comp_streams.attr, &dev_attr_comp_algorithm.attr, +#ifdef CONFIG_ZRAM_WRITEBACK + &dev_attr_backing_dev.attr, +#endif &dev_attr_io_stat.attr, &dev_attr_mm_stat.attr, &dev_attr_debug_stat.attr, NULL, }; -static struct attribute_group zram_disk_attr_group = { +static const struct attribute_group zram_disk_attr_group = { .attrs = zram_disk_attrs, }; @@ -1276,6 +1692,7 @@ static int zram_add(void) /* zram devices sort of resembles non-rotational disks */ queue_flag_set_unlocked(QUEUE_FLAG_NONROT, zram->disk->queue); queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, zram->disk->queue); + /* * To ensure that we always get PAGE_SIZE aligned * and n*PAGE_SIZED sized I/O requests. @@ -1286,8 +1703,6 @@ static int zram_add(void) blk_queue_io_min(zram->disk->queue, PAGE_SIZE); blk_queue_io_opt(zram->disk->queue, PAGE_SIZE); zram->disk->queue->limits.discard_granularity = PAGE_SIZE; - zram->disk->queue->limits.max_sectors = SECTORS_PER_PAGE; - zram->disk->queue->limits.chunk_sectors = 0; blk_queue_max_discard_sectors(zram->disk->queue, UINT_MAX); /* * zram_bio_discard() will clear all logical blocks if logical block @@ -1303,6 +1718,8 @@ static int zram_add(void) zram->disk->queue->limits.discard_zeroes_data = 0; queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, zram->disk->queue); + zram->disk->queue->backing_dev_info.capabilities |= + BDI_CAP_STABLE_WRITES; add_disk(zram->disk); ret = sysfs_create_group(&disk_to_dev(zram->disk)->kobj, @@ -1313,8 +1730,8 @@ static int zram_add(void) goto out_free_disk; } strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor)); - zram->meta = NULL; + zram_debugfs_register(zram); pr_info("Added device: %s\n", zram->disk->disk_name); return device_id; @@ -1348,6 +1765,7 @@ static int zram_remove(struct zram *zram) zram->claim = true; mutex_unlock(&bdev->bd_mutex); + zram_debugfs_unregister(zram); /* * Remove sysfs first, so no one will perform a disksize * store while we destroy the devices. This also helps during @@ -1365,8 +1783,8 @@ static int zram_remove(struct zram *zram) pr_info("Removed device: %s\n", zram->disk->disk_name); - blk_cleanup_queue(zram->disk->queue); del_gendisk(zram->disk); + blk_cleanup_queue(zram->disk->queue); put_disk(zram->disk); kfree(zram); return 0; @@ -1446,6 +1864,7 @@ static void destroy_devices(void) { class_unregister(&zram_control_class); idr_for_each(&zram_index_idr, &zram_remove_cb, NULL); + zram_debugfs_destroy(); idr_destroy(&zram_index_idr); unregister_blkdev(zram_major, "zram"); } @@ -1460,6 +1879,7 @@ static int __init zram_init(void) return ret; } + zram_debugfs_create(); zram_major = register_blkdev(0, "zram"); if (zram_major <= 0) { pr_err("Unable to get major number\n");
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h index 74fcf10..3a1cac4 100644 --- a/drivers/block/zram/zram_drv.h +++ b/drivers/block/zram/zram_drv.h
@@ -21,22 +21,6 @@ #include "zcomp.h" -/*-- Configurable parameters */ - -/* - * Pages that compress to size greater than this are stored - * uncompressed in memory. - */ -static const size_t max_zpage_size = PAGE_SIZE / 4 * 3; - -/* - * NOTE: max_zpage_size must be less than or equal to: - * ZS_MAX_ALLOC_SIZE. Otherwise, zs_malloc() would - * always return failure. - */ - -/*-- End of configurable params */ - #define SECTOR_SHIFT 9 #define SECTORS_PER_PAGE_SHIFT (PAGE_SHIFT - SECTOR_SHIFT) #define SECTORS_PER_PAGE (1 << SECTORS_PER_PAGE_SHIFT) @@ -60,9 +44,11 @@ static const size_t max_zpage_size = PAGE_SIZE / 4 * 3; /* Flags for zram pages (table[page_no].value) */ enum zram_pageflags { - /* Page consists entirely of zeros */ - ZRAM_ZERO = ZRAM_FLAG_SHIFT, - ZRAM_ACCESS, /* page is now accessed */ + /* zram slot is locked */ + ZRAM_LOCK = ZRAM_FLAG_SHIFT, + ZRAM_SAME, /* Page consists the same element */ + ZRAM_WB, /* page is stored on backing_device */ + ZRAM_HUGE, /* Incompressible page */ __NR_ZRAM_PAGEFLAGS, }; @@ -71,8 +57,14 @@ enum zram_pageflags { /* Allocated for each disk page */ struct zram_table_entry { - unsigned long handle; + union { + unsigned long handle; + unsigned long element; + }; unsigned long value; +#ifdef CONFIG_ZRAM_MEMORY_TRACKING + ktime_t ac_time; +#endif }; struct zram_stats { @@ -83,19 +75,16 @@ struct zram_stats { atomic64_t failed_writes; /* can happen when memory is too low */ atomic64_t invalid_io; /* non-page-aligned I/O requests */ atomic64_t notify_free; /* no. of swap slot free notifications */ - atomic64_t zero_pages; /* no. of zero filled pages */ + atomic64_t same_pages; /* no. of same element filled pages */ + atomic64_t huge_pages; /* no. of huge pages */ atomic64_t pages_stored; /* no. of pages currently stored */ atomic_long_t max_used_pages; /* no. of maximum pages stored */ atomic64_t writestall; /* no. of write slow paths */ }; -struct zram_meta { +struct zram { struct zram_table_entry *table; struct zs_pool *mem_pool; -}; - -struct zram { - struct zram_meta *meta; struct zcomp *comp; struct gendisk *disk; /* Prevent concurrent execution of device init */ @@ -106,9 +95,6 @@ struct zram { unsigned long limit_pages; struct zram_stats stats; - atomic_t refcount; /* refcount for zram_meta */ - /* wait all IO under all of cpu are done */ - wait_queue_head_t io_done; /* * This is the limit on amount of *uncompressed* worth of data * we can store in a disk. @@ -119,5 +105,16 @@ struct zram { * zram is claimed so open request will be failed */ bool claim; /* Protected by bdev->bd_mutex */ +#ifdef CONFIG_ZRAM_WRITEBACK + struct file *backing_dev; + struct block_device *bdev; + unsigned int old_block_size; + unsigned long *bitmap; + unsigned long nr_pages; + spinlock_t bitmap_lock; +#endif +#ifdef CONFIG_ZRAM_MEMORY_TRACKING + struct dentry *debugfs_dir; +#endif }; #endif
diff --git a/drivers/char/ipmi/ipmi_poweroff.c b/drivers/char/ipmi/ipmi_poweroff.c index 9f2e3be..676c910 100644 --- a/drivers/char/ipmi/ipmi_poweroff.c +++ b/drivers/char/ipmi/ipmi_poweroff.c
@@ -66,7 +66,7 @@ static void (*specific_poweroff_func)(ipmi_user_t user); /* Holds the old poweroff function so we can restore it on removal. */ static void (*old_poweroff_func)(void); -static int set_param_ifnum(const char *val, struct kernel_param *kp) +static int set_param_ifnum(const char *val, const struct kernel_param *kp) { int rv = param_set_int(val, kp); if (rv)
diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index e0a5315..89adeb4 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -1344,7 +1344,7 @@ static unsigned int num_slave_addrs; #define IPMI_MEM_ADDR_SPACE 1 static const char * const addr_space_to_str[] = { "i/o", "mem" }; -static int hotmod_handler(const char *val, struct kernel_param *kp); +static int hotmod_handler(const char *val, const struct kernel_param *kp); module_param_call(hotmod, hotmod_handler, NULL, NULL, 0200); MODULE_PARM_DESC(hotmod, "Add and remove interfaces. See" @@ -1814,7 +1814,7 @@ static struct smi_info *smi_info_alloc(void) return info; } -static int hotmod_handler(const char *val, struct kernel_param *kp) +static int hotmod_handler(const char *val, const struct kernel_param *kp) { char *str = kstrdup(val, GFP_KERNEL); int rv;
diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig index e2c6e43..0bed22f 100644 --- a/drivers/clocksource/Kconfig +++ b/drivers/clocksource/Kconfig
@@ -305,6 +305,14 @@ This must be disabled for hardware validation purposes to detect any hardware anomalies of missing events. +config ARM_ARCH_TIMER_VCT_ACCESS + bool "Support for ARM architected timer virtual counter access in userspace" + default !ARM64 + depends on ARM_ARCH_TIMER + help + This option enables support for reading the ARM architected timer's + virtual counter in userspace. + config FSL_ERRATUM_A008585 bool "Workaround for Freescale/NXP Erratum A-008585" default y
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c index a2503db..e3bc592 100644 --- a/drivers/clocksource/arm_arch_timer.c +++ b/drivers/clocksource/arm_arch_timer.c
@@ -449,7 +449,10 @@ static void arch_counter_set_user_access(void) | ARCH_TIMER_USR_PCT_ACCESS_EN); /* Enable user access to the virtual counter */ - cntkctl |= ARCH_TIMER_USR_VCT_ACCESS_EN; + if (IS_ENABLED(CONFIG_ARM_ARCH_TIMER_VCT_ACCESS)) + cntkctl |= ARCH_TIMER_USR_VCT_ACCESS_EN; + else + cntkctl &= ~ARCH_TIMER_USR_VCT_ACCESS_EN; arch_timer_set_cntkctl(cntkctl); }
diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig index cac26fb..c70ed51 100644 --- a/drivers/cpufreq/Kconfig +++ b/drivers/cpufreq/Kconfig
@@ -45,6 +45,15 @@ If in doubt, say N. +config CPU_FREQ_TIMES + bool "CPU frequency time-in-state statistics" + default y + help + This driver exports CPU time-in-state information through procfs file + system. + + If in doubt, say N. + choice prompt "Default CPUFreq governor" default CPU_FREQ_DEFAULT_GOV_USERSPACE if ARM_SA1100_CPUFREQ || ARM_SA1110_CPUFREQ @@ -102,6 +111,16 @@ governor. If unsure have a look at the help section of the driver. Fallback governor will be the performance governor. +config CPU_FREQ_DEFAULT_GOV_INTERACTIVE + bool "interactive" + select CPU_FREQ_GOV_INTERACTIVE + select CPU_FREQ_GOV_PERFORMANCE + help + Use the CPUFreq governor 'interactive' as default. This allows + you to get a full dynamic cpu frequency capable system by simply + loading your cpufreq low-level hardware driver, using the + 'interactive' governor for latency-sensitive workloads. + config CPU_FREQ_DEFAULT_GOV_SCHEDUTIL bool "schedutil" depends on SMP @@ -193,6 +212,26 @@ If in doubt, say N. +config CPU_FREQ_GOV_INTERACTIVE + tristate "'interactive' cpufreq policy governor" + depends on CPU_FREQ + select CPU_FREQ_GOV_ATTR_SET + select IRQ_WORK + help + 'interactive' - This driver adds a dynamic cpufreq policy governor + designed for latency-sensitive workloads. + + This governor attempts to reduce the latency of clock + increases so that the system is more responsive to + interactive workloads. + + To compile this driver as a module, choose M here: the + module will be called cpufreq_interactive. + + For details, take a look at linux/Documentation/cpu-freq. + + If in doubt, say N. + config CPU_FREQ_GOV_SCHEDUTIL bool "'schedutil' cpufreq policy governor" depends on CPU_FREQ && SMP
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile index 0a9b6a09..f10a18d 100644 --- a/drivers/cpufreq/Makefile +++ b/drivers/cpufreq/Makefile
@@ -4,12 +4,16 @@ # CPUfreq stats obj-$(CONFIG_CPU_FREQ_STAT) += cpufreq_stats.o -# CPUfreq governors +# CPUfreq times +obj-$(CONFIG_CPU_FREQ_TIMES) += cpufreq_times.o + +# CPUfreq governors obj-$(CONFIG_CPU_FREQ_GOV_PERFORMANCE) += cpufreq_performance.o obj-$(CONFIG_CPU_FREQ_GOV_POWERSAVE) += cpufreq_powersave.o obj-$(CONFIG_CPU_FREQ_GOV_USERSPACE) += cpufreq_userspace.o obj-$(CONFIG_CPU_FREQ_GOV_ONDEMAND) += cpufreq_ondemand.o obj-$(CONFIG_CPU_FREQ_GOV_CONSERVATIVE) += cpufreq_conservative.o +obj-$(CONFIG_CPU_FREQ_GOV_INTERACTIVE) += cpufreq_interactive.o obj-$(CONFIG_CPU_FREQ_GOV_COMMON) += cpufreq_governor.o obj-$(CONFIG_CPU_FREQ_GOV_ATTR_SET) += cpufreq_governor_attr_set.o
diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c index 4d3ec92..e4cbfe5 100644 --- a/drivers/cpufreq/cpufreq-dt.c +++ b/drivers/cpufreq/cpufreq-dt.c
@@ -280,6 +280,13 @@ static int cpufreq_init(struct cpufreq_policy *policy) policy->cpuinfo.transition_latency = transition_latency; + /* + * Android: set default parameters for parity between schedutil and + * schedfreq + */ + policy->up_transition_delay_us = transition_latency / NSEC_PER_USEC; + policy->down_transition_delay_us = 50000; /* 50ms */ + return 0; out_free_cpufreq_table:
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index af5eff6..a63581b 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c
@@ -19,6 +19,7 @@ #include <linux/cpu.h> #include <linux/cpufreq.h> +#include <linux/cpufreq_times.h> #include <linux/delay.h> #include <linux/device.h> #include <linux/init.h> @@ -29,6 +30,9 @@ #include <linux/suspend.h> #include <linux/syscore_ops.h> #include <linux/tick.h> +#ifdef CONFIG_SMP +#include <linux/sched.h> +#endif #include <trace/events/power.h> static LIST_HEAD(cpufreq_policy_list); @@ -117,6 +121,12 @@ bool have_governor_per_policy(void) } EXPORT_SYMBOL_GPL(have_governor_per_policy); +bool cpufreq_driver_is_slow(void) +{ + return !(cpufreq_driver->flags & CPUFREQ_DRIVER_FAST); +} +EXPORT_SYMBOL_GPL(cpufreq_driver_is_slow); + struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy) { if (have_governor_per_policy()) @@ -301,6 +311,92 @@ static void adjust_jiffies(unsigned long val, struct cpufreq_freqs *ci) #endif } +/********************************************************************* + * FREQUENCY INVARIANT CPU CAPACITY * + *********************************************************************/ + +static DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE; +static DEFINE_PER_CPU(unsigned long, max_freq_cpu); +static DEFINE_PER_CPU(unsigned long, max_freq_scale) = SCHED_CAPACITY_SCALE; +static DEFINE_PER_CPU(unsigned long, min_freq_scale); + +static void +scale_freq_capacity(const cpumask_t *cpus, unsigned long cur_freq, + unsigned long max_freq) +{ + unsigned long scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq; + int cpu; + + for_each_cpu(cpu, cpus) { + per_cpu(freq_scale, cpu) = scale; + per_cpu(max_freq_cpu, cpu) = max_freq; + } + + pr_debug("cpus %*pbl cur freq/max freq %lu/%lu kHz freq scale %lu\n", + cpumask_pr_args(cpus), cur_freq, max_freq, scale); +} + +unsigned long cpufreq_scale_freq_capacity(struct sched_domain *sd, int cpu) +{ + return per_cpu(freq_scale, cpu); +} + +static void +scale_max_freq_capacity(const cpumask_t *cpus, unsigned long policy_max_freq) +{ + unsigned long scale, max_freq; + int cpu = cpumask_first(cpus); + + if (cpu >= nr_cpu_ids) + return; + + max_freq = per_cpu(max_freq_cpu, cpu); + + if (!max_freq) + return; + + scale = (policy_max_freq << SCHED_CAPACITY_SHIFT) / max_freq; + + for_each_cpu(cpu, cpus) + per_cpu(max_freq_scale, cpu) = scale; + + pr_debug("cpus %*pbl policy max freq/max freq %lu/%lu kHz max freq scale %lu\n", + cpumask_pr_args(cpus), policy_max_freq, max_freq, scale); +} + +unsigned long cpufreq_scale_max_freq_capacity(struct sched_domain *sd, int cpu) +{ + return per_cpu(max_freq_scale, cpu); +} + +static void +scale_min_freq_capacity(const cpumask_t *cpus, unsigned long policy_min_freq) +{ + unsigned long scale, max_freq; + int cpu = cpumask_first(cpus); + + if (cpu >= nr_cpu_ids) + return; + + max_freq = per_cpu(max_freq_cpu, cpu); + + if (!max_freq) + return; + + scale = (policy_min_freq << SCHED_CAPACITY_SHIFT) / max_freq; + + for_each_cpu(cpu, cpus) + per_cpu(min_freq_scale, cpu) = scale; + + pr_debug("cpus %*pbl policy min freq/max freq %lu/%lu kHz min freq scale %lu\n", + cpumask_pr_args(cpus), policy_min_freq, max_freq, scale); +} + +unsigned long cpufreq_scale_min_freq_capacity(struct sched_domain *sd, int cpu) +{ + return per_cpu(min_freq_scale, cpu); +} + static void __cpufreq_notify_transition(struct cpufreq_policy *policy, struct cpufreq_freqs *freqs, unsigned int state) { @@ -339,6 +435,7 @@ static void __cpufreq_notify_transition(struct cpufreq_policy *policy, (unsigned long)freqs->new, (unsigned long)freqs->cpu); trace_cpu_frequency(freqs->new, freqs->cpu); cpufreq_stats_record_transition(policy, freqs->new); + cpufreq_times_record_transition(freqs); srcu_notifier_call_chain(&cpufreq_transition_notifier_list, CPUFREQ_POSTCHANGE, freqs); if (likely(policy) && likely(policy->cpu == freqs->cpu)) @@ -378,6 +475,9 @@ static void cpufreq_notify_post_transition(struct cpufreq_policy *policy, void cpufreq_freq_transition_begin(struct cpufreq_policy *policy, struct cpufreq_freqs *freqs) { +#ifdef CONFIG_SMP + int cpu; +#endif /* * Catch double invocations of _begin() which lead to self-deadlock. @@ -405,6 +505,12 @@ void cpufreq_freq_transition_begin(struct cpufreq_policy *policy, spin_unlock(&policy->transition_lock); + scale_freq_capacity(policy->cpus, freqs->new, policy->cpuinfo.max_freq); +#ifdef CONFIG_SMP + for_each_cpu(cpu, policy->cpus) + trace_cpu_capacity(capacity_curr_of(cpu), cpu); +#endif + cpufreq_notify_transition(policy, freqs, CPUFREQ_PRECHANGE); } EXPORT_SYMBOL_GPL(cpufreq_freq_transition_begin); @@ -1257,6 +1363,7 @@ static int cpufreq_online(unsigned int cpu) goto out_exit_policy; cpufreq_stats_create_table(policy); + cpufreq_times_create_policy(policy); blocking_notifier_call_chain(&cpufreq_policy_notifier_list, CPUFREQ_CREATE_POLICY, policy); @@ -2201,8 +2308,12 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, blocking_notifier_call_chain(&cpufreq_policy_notifier_list, CPUFREQ_NOTIFY, new_policy); + scale_max_freq_capacity(policy->cpus, policy->max); + scale_min_freq_capacity(policy->cpus, policy->min); + policy->min = new_policy->min; policy->max = new_policy->max; + trace_cpu_frequency_limits(policy->max, policy->min, policy->cpu); policy->cached_target_freq = UINT_MAX;
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index 00a7435..0fe2518 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -302,7 +302,10 @@ static void cs_start(struct cpufreq_policy *policy) dbs_info->requested_freq = policy->cur; } -static struct dbs_governor cs_governor = { +#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE +static +#endif +struct dbs_governor cs_governor = { .gov = CPUFREQ_DBS_GOVERNOR_INITIALIZER("conservative"), .kobj_type = { .default_attrs = cs_attributes }, .gov_dbs_timer = cs_dbs_timer,
diff --git a/drivers/cpufreq/cpufreq_interactive.c b/drivers/cpufreq/cpufreq_interactive.c new file mode 100644 index 0000000..5a77d91 --- /dev/null +++ b/drivers/cpufreq/cpufreq_interactive.c
@@ -0,0 +1,1411 @@ +/* + * drivers/cpufreq/cpufreq_interactive.c + * + * Copyright (C) 2010-2016 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * Author: Mike Chan (mike@android.com) + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/cpu.h> +#include <linux/cpumask.h> +#include <linux/cpufreq.h> +#include <linux/irq_work.h> +#include <linux/module.h> +#include <linux/moduleparam.h> +#include <linux/rwsem.h> +#include <linux/sched.h> +#include <linux/sched/rt.h> +#include <linux/tick.h> +#include <linux/time.h> +#include <linux/timer.h> +#include <linux/kthread.h> +#include <linux/slab.h> + +#define CREATE_TRACE_POINTS +#include <trace/events/cpufreq_interactive.h> + +#define gov_attr_ro(_name) \ +static struct governor_attr _name = \ +__ATTR(_name, 0444, show_##_name, NULL) + +#define gov_attr_wo(_name) \ +static struct governor_attr _name = \ +__ATTR(_name, 0200, NULL, store_##_name) + +#define gov_attr_rw(_name) \ +static struct governor_attr _name = \ +__ATTR(_name, 0644, show_##_name, store_##_name) + +/* Separate instance required for each 'interactive' directory in sysfs */ +struct interactive_tunables { + struct gov_attr_set attr_set; + + /* Hi speed to bump to from lo speed when load burst (default max) */ + unsigned int hispeed_freq; + + /* Go to hi speed when CPU load at or above this value. */ +#define DEFAULT_GO_HISPEED_LOAD 99 + unsigned long go_hispeed_load; + + /* Target load. Lower values result in higher CPU speeds. */ + spinlock_t target_loads_lock; + unsigned int *target_loads; + int ntarget_loads; + + /* + * The minimum amount of time to spend at a frequency before we can ramp + * down. + */ +#define DEFAULT_MIN_SAMPLE_TIME (80 * USEC_PER_MSEC) + unsigned long min_sample_time; + + /* The sample rate of the timer used to increase frequency */ + unsigned long sampling_rate; + + /* + * Wait this long before raising speed above hispeed, by default a + * single timer interval. + */ + spinlock_t above_hispeed_delay_lock; + unsigned int *above_hispeed_delay; + int nabove_hispeed_delay; + + /* Non-zero means indefinite speed boost active */ + int boost; + /* Duration of a boot pulse in usecs */ + int boostpulse_duration; + /* End time of boost pulse in ktime converted to usecs */ + u64 boostpulse_endtime; + bool boosted; + + /* + * Max additional time to wait in idle, beyond sampling_rate, at speeds + * above minimum before wakeup to reduce speed, or -1 if unnecessary. + */ +#define DEFAULT_TIMER_SLACK (4 * DEFAULT_SAMPLING_RATE) + unsigned long timer_slack_delay; + unsigned long timer_slack; + bool io_is_busy; +}; + +/* Separate instance required for each 'struct cpufreq_policy' */ +struct interactive_policy { + struct cpufreq_policy *policy; + struct interactive_tunables *tunables; + struct list_head tunables_hook; +}; + +/* Separate instance required for each CPU */ +struct interactive_cpu { + struct update_util_data update_util; + struct interactive_policy *ipolicy; + + struct irq_work irq_work; + u64 last_sample_time; + unsigned long next_sample_jiffies; + bool work_in_progress; + + struct rw_semaphore enable_sem; + struct timer_list slack_timer; + + spinlock_t load_lock; /* protects the next 4 fields */ + u64 time_in_idle; + u64 time_in_idle_timestamp; + u64 cputime_speedadj; + u64 cputime_speedadj_timestamp; + + spinlock_t target_freq_lock; /*protects target freq */ + unsigned int target_freq; + + unsigned int floor_freq; + u64 pol_floor_val_time; /* policy floor_validate_time */ + u64 loc_floor_val_time; /* per-cpu floor_validate_time */ + u64 pol_hispeed_val_time; /* policy hispeed_validate_time */ + u64 loc_hispeed_val_time; /* per-cpu hispeed_validate_time */ +}; + +static DEFINE_PER_CPU(struct interactive_cpu, interactive_cpu); + +/* Realtime thread handles frequency scaling */ +static struct task_struct *speedchange_task; +static cpumask_t speedchange_cpumask; +static spinlock_t speedchange_cpumask_lock; + +/* Target load. Lower values result in higher CPU speeds. */ +#define DEFAULT_TARGET_LOAD 90 +static unsigned int default_target_loads[] = {DEFAULT_TARGET_LOAD}; + +#define DEFAULT_SAMPLING_RATE (20 * USEC_PER_MSEC) +#define DEFAULT_ABOVE_HISPEED_DELAY DEFAULT_SAMPLING_RATE +static unsigned int default_above_hispeed_delay[] = { + DEFAULT_ABOVE_HISPEED_DELAY +}; + +/* Iterate over interactive policies for tunables */ +#define for_each_ipolicy(__ip) \ + list_for_each_entry(__ip, &tunables->attr_set.policy_list, tunables_hook) + +static struct interactive_tunables *global_tunables; +static DEFINE_MUTEX(global_tunables_lock); + +static inline void update_slack_delay(struct interactive_tunables *tunables) +{ + tunables->timer_slack_delay = usecs_to_jiffies(tunables->timer_slack + + tunables->sampling_rate); +} + +static bool timer_slack_required(struct interactive_cpu *icpu) +{ + struct interactive_policy *ipolicy = icpu->ipolicy; + struct interactive_tunables *tunables = ipolicy->tunables; + + if (tunables->timer_slack < 0) + return false; + + if (icpu->target_freq > ipolicy->policy->min) + return true; + + return false; +} + +static void gov_slack_timer_start(struct interactive_cpu *icpu, int cpu) +{ + struct interactive_tunables *tunables = icpu->ipolicy->tunables; + + icpu->slack_timer.expires = jiffies + tunables->timer_slack_delay; + add_timer_on(&icpu->slack_timer, cpu); +} + +static void gov_slack_timer_modify(struct interactive_cpu *icpu) +{ + struct interactive_tunables *tunables = icpu->ipolicy->tunables; + + mod_timer(&icpu->slack_timer, jiffies + tunables->timer_slack_delay); +} + +static void slack_timer_resched(struct interactive_cpu *icpu, int cpu, + bool modify) +{ + struct interactive_tunables *tunables = icpu->ipolicy->tunables; + unsigned long flags; + + spin_lock_irqsave(&icpu->load_lock, flags); + + icpu->time_in_idle = get_cpu_idle_time(cpu, + &icpu->time_in_idle_timestamp, + tunables->io_is_busy); + icpu->cputime_speedadj = 0; + icpu->cputime_speedadj_timestamp = icpu->time_in_idle_timestamp; + + if (timer_slack_required(icpu)) { + if (modify) + gov_slack_timer_modify(icpu); + else + gov_slack_timer_start(icpu, cpu); + } + + spin_unlock_irqrestore(&icpu->load_lock, flags); +} + +static unsigned int +freq_to_above_hispeed_delay(struct interactive_tunables *tunables, + unsigned int freq) +{ + unsigned long flags; + unsigned int ret; + int i; + + spin_lock_irqsave(&tunables->above_hispeed_delay_lock, flags); + + for (i = 0; i < tunables->nabove_hispeed_delay - 1 && + freq >= tunables->above_hispeed_delay[i + 1]; i += 2) + ; + + ret = tunables->above_hispeed_delay[i]; + spin_unlock_irqrestore(&tunables->above_hispeed_delay_lock, flags); + + return ret; +} + +static unsigned int freq_to_targetload(struct interactive_tunables *tunables, + unsigned int freq) +{ + unsigned long flags; + unsigned int ret; + int i; + + spin_lock_irqsave(&tunables->target_loads_lock, flags); + + for (i = 0; i < tunables->ntarget_loads - 1 && + freq >= tunables->target_loads[i + 1]; i += 2) + ; + + ret = tunables->target_loads[i]; + spin_unlock_irqrestore(&tunables->target_loads_lock, flags); + return ret; +} + +/* + * If increasing frequencies never map to a lower target load then + * choose_freq() will find the minimum frequency that does not exceed its + * target load given the current load. + */ +static unsigned int choose_freq(struct interactive_cpu *icpu, + unsigned int loadadjfreq) +{ + struct cpufreq_policy *policy = icpu->ipolicy->policy; + struct cpufreq_frequency_table *freq_table = policy->freq_table; + unsigned int prevfreq, freqmin = 0, freqmax = UINT_MAX, tl; + unsigned int freq = policy->cur; + int index; + + do { + prevfreq = freq; + tl = freq_to_targetload(icpu->ipolicy->tunables, freq); + + /* + * Find the lowest frequency where the computed load is less + * than or equal to the target load. + */ + + index = cpufreq_frequency_table_target(policy, loadadjfreq / tl, + CPUFREQ_RELATION_L); + + freq = freq_table[index].frequency; + + if (freq > prevfreq) { + /* The previous frequency is too low */ + freqmin = prevfreq; + + if (freq < freqmax) + continue; + + /* Find highest frequency that is less than freqmax */ + index = cpufreq_frequency_table_target(policy, + freqmax - 1, CPUFREQ_RELATION_H); + + freq = freq_table[index].frequency; + + if (freq == freqmin) { + /* + * The first frequency below freqmax has already + * been found to be too low. freqmax is the + * lowest speed we found that is fast enough. + */ + freq = freqmax; + break; + } + } else if (freq < prevfreq) { + /* The previous frequency is high enough. */ + freqmax = prevfreq; + + if (freq > freqmin) + continue; + + /* Find lowest frequency that is higher than freqmin */ + index = cpufreq_frequency_table_target(policy, + freqmin + 1, CPUFREQ_RELATION_L); + + freq = freq_table[index].frequency; + + /* + * If freqmax is the first frequency above + * freqmin then we have already found that + * this speed is fast enough. + */ + if (freq == freqmax) + break; + } + + /* If same frequency chosen as previous then done. */ + } while (freq != prevfreq); + + return freq; +} + +static u64 update_load(struct interactive_cpu *icpu, int cpu) +{ + struct interactive_tunables *tunables = icpu->ipolicy->tunables; + u64 now_idle, now, active_time, delta_idle, delta_time; + + now_idle = get_cpu_idle_time(cpu, &now, tunables->io_is_busy); + delta_idle = (now_idle - icpu->time_in_idle); + delta_time = (now - icpu->time_in_idle_timestamp); + + if (delta_time <= delta_idle) + active_time = 0; + else + active_time = delta_time - delta_idle; + + icpu->cputime_speedadj += active_time * icpu->ipolicy->policy->cur; + + icpu->time_in_idle = now_idle; + icpu->time_in_idle_timestamp = now; + + return now; +} + +/* Re-evaluate load to see if a frequency change is required or not */ +static void eval_target_freq(struct interactive_cpu *icpu) +{ + struct interactive_tunables *tunables = icpu->ipolicy->tunables; + struct cpufreq_policy *policy = icpu->ipolicy->policy; + struct cpufreq_frequency_table *freq_table = policy->freq_table; + u64 cputime_speedadj, now, max_fvtime; + unsigned int new_freq, loadadjfreq, index, delta_time; + unsigned long flags; + int cpu_load; + int cpu = smp_processor_id(); + + spin_lock_irqsave(&icpu->load_lock, flags); + now = update_load(icpu, smp_processor_id()); + delta_time = (unsigned int)(now - icpu->cputime_speedadj_timestamp); + cputime_speedadj = icpu->cputime_speedadj; + spin_unlock_irqrestore(&icpu->load_lock, flags); + + if (WARN_ON_ONCE(!delta_time)) + return; + + spin_lock_irqsave(&icpu->target_freq_lock, flags); + do_div(cputime_speedadj, delta_time); + loadadjfreq = (unsigned int)cputime_speedadj * 100; + cpu_load = loadadjfreq / policy->cur; + tunables->boosted = tunables->boost || + now < tunables->boostpulse_endtime; + + if (cpu_load >= tunables->go_hispeed_load || tunables->boosted) { + if (policy->cur < tunables->hispeed_freq) { + new_freq = tunables->hispeed_freq; + } else { + new_freq = choose_freq(icpu, loadadjfreq); + + if (new_freq < tunables->hispeed_freq) + new_freq = tunables->hispeed_freq; + } + } else { + new_freq = choose_freq(icpu, loadadjfreq); + if (new_freq > tunables->hispeed_freq && + policy->cur < tunables->hispeed_freq) + new_freq = tunables->hispeed_freq; + } + + if (policy->cur >= tunables->hispeed_freq && + new_freq > policy->cur && + now - icpu->pol_hispeed_val_time < freq_to_above_hispeed_delay(tunables, policy->cur)) { + trace_cpufreq_interactive_notyet(cpu, cpu_load, + icpu->target_freq, policy->cur, new_freq); + goto exit; + } + + icpu->loc_hispeed_val_time = now; + + index = cpufreq_frequency_table_target(policy, new_freq, + CPUFREQ_RELATION_L); + new_freq = freq_table[index].frequency; + + /* + * Do not scale below floor_freq unless we have been at or above the + * floor frequency for the minimum sample time since last validated. + */ + max_fvtime = max(icpu->pol_floor_val_time, icpu->loc_floor_val_time); + if (new_freq < icpu->floor_freq && icpu->target_freq >= policy->cur) { + if (now - max_fvtime < tunables->min_sample_time) { + trace_cpufreq_interactive_notyet(cpu, cpu_load, + icpu->target_freq, policy->cur, new_freq); + goto exit; + } + } + + /* + * Update the timestamp for checking whether speed has been held at + * or above the selected frequency for a minimum of min_sample_time, + * if not boosted to hispeed_freq. If boosted to hispeed_freq then we + * allow the speed to drop as soon as the boostpulse duration expires + * (or the indefinite boost is turned off). + */ + + if (!tunables->boosted || new_freq > tunables->hispeed_freq) { + icpu->floor_freq = new_freq; + if (icpu->target_freq >= policy->cur || new_freq >= policy->cur) + icpu->loc_floor_val_time = now; + } + + if (icpu->target_freq == new_freq && + icpu->target_freq <= policy->cur) { + trace_cpufreq_interactive_already(cpu, cpu_load, + icpu->target_freq, policy->cur, new_freq); + goto exit; + } + + trace_cpufreq_interactive_target(cpu, cpu_load, icpu->target_freq, + policy->cur, new_freq); + + icpu->target_freq = new_freq; + spin_unlock_irqrestore(&icpu->target_freq_lock, flags); + + spin_lock_irqsave(&speedchange_cpumask_lock, flags); + cpumask_set_cpu(cpu, &speedchange_cpumask); + spin_unlock_irqrestore(&speedchange_cpumask_lock, flags); + + wake_up_process(speedchange_task); + return; + +exit: + spin_unlock_irqrestore(&icpu->target_freq_lock, flags); +} + +static void cpufreq_interactive_update(struct interactive_cpu *icpu) +{ + eval_target_freq(icpu); + slack_timer_resched(icpu, smp_processor_id(), true); +} + +static void cpufreq_interactive_idle_end(void) +{ + struct interactive_cpu *icpu = &per_cpu(interactive_cpu, + smp_processor_id()); + + if (!down_read_trylock(&icpu->enable_sem)) + return; + + if (icpu->ipolicy) { + /* + * We haven't sampled load for more than sampling_rate time, do + * it right now. + */ + if (time_after_eq(jiffies, icpu->next_sample_jiffies)) + cpufreq_interactive_update(icpu); + } + + up_read(&icpu->enable_sem); +} + +static void cpufreq_interactive_get_policy_info(struct cpufreq_policy *policy, + unsigned int *pmax_freq, + u64 *phvt, u64 *pfvt) +{ + struct interactive_cpu *icpu; + u64 hvt = ~0ULL, fvt = 0; + unsigned int max_freq = 0, i; + + for_each_cpu(i, policy->cpus) { + icpu = &per_cpu(interactive_cpu, i); + + fvt = max(fvt, icpu->loc_floor_val_time); + if (icpu->target_freq > max_freq) { + max_freq = icpu->target_freq; + hvt = icpu->loc_hispeed_val_time; + } else if (icpu->target_freq == max_freq) { + hvt = min(hvt, icpu->loc_hispeed_val_time); + } + } + + *pmax_freq = max_freq; + *phvt = hvt; + *pfvt = fvt; +} + +static void cpufreq_interactive_adjust_cpu(unsigned int cpu, + struct cpufreq_policy *policy) +{ + struct interactive_cpu *icpu; + u64 hvt, fvt; + unsigned int max_freq; + int i; + + cpufreq_interactive_get_policy_info(policy, &max_freq, &hvt, &fvt); + + for_each_cpu(i, policy->cpus) { + icpu = &per_cpu(interactive_cpu, i); + icpu->pol_floor_val_time = fvt; + } + + if (max_freq != policy->cur) { + __cpufreq_driver_target(policy, max_freq, CPUFREQ_RELATION_H); + for_each_cpu(i, policy->cpus) { + icpu = &per_cpu(interactive_cpu, i); + icpu->pol_hispeed_val_time = hvt; + } + } + + trace_cpufreq_interactive_setspeed(cpu, max_freq, policy->cur); +} + +static int cpufreq_interactive_speedchange_task(void *data) +{ + unsigned int cpu; + cpumask_t tmp_mask; + unsigned long flags; + +again: + set_current_state(TASK_INTERRUPTIBLE); + spin_lock_irqsave(&speedchange_cpumask_lock, flags); + + if (cpumask_empty(&speedchange_cpumask)) { + spin_unlock_irqrestore(&speedchange_cpumask_lock, flags); + schedule(); + + if (kthread_should_stop()) + return 0; + + spin_lock_irqsave(&speedchange_cpumask_lock, flags); + } + + set_current_state(TASK_RUNNING); + tmp_mask = speedchange_cpumask; + cpumask_clear(&speedchange_cpumask); + spin_unlock_irqrestore(&speedchange_cpumask_lock, flags); + + for_each_cpu(cpu, &tmp_mask) { + struct interactive_cpu *icpu = &per_cpu(interactive_cpu, cpu); + struct cpufreq_policy *policy; + + if (unlikely(!down_read_trylock(&icpu->enable_sem))) + continue; + + if (likely(icpu->ipolicy)) { + policy = icpu->ipolicy->policy; + cpufreq_interactive_adjust_cpu(cpu, policy); + } + + up_read(&icpu->enable_sem); + } + + goto again; +} + +static void cpufreq_interactive_boost(struct interactive_tunables *tunables) +{ + struct interactive_policy *ipolicy; + struct cpufreq_policy *policy; + struct interactive_cpu *icpu; + unsigned long flags[2]; + bool wakeup = false; + int i; + + tunables->boosted = true; + + spin_lock_irqsave(&speedchange_cpumask_lock, flags[0]); + + for_each_ipolicy(ipolicy) { + policy = ipolicy->policy; + + for_each_cpu(i, policy->cpus) { + icpu = &per_cpu(interactive_cpu, i); + + if (!down_read_trylock(&icpu->enable_sem)) + continue; + + if (!icpu->ipolicy) { + up_read(&icpu->enable_sem); + continue; + } + + spin_lock_irqsave(&icpu->target_freq_lock, flags[1]); + if (icpu->target_freq < tunables->hispeed_freq) { + icpu->target_freq = tunables->hispeed_freq; + cpumask_set_cpu(i, &speedchange_cpumask); + icpu->pol_hispeed_val_time = ktime_to_us(ktime_get()); + wakeup = true; + } + spin_unlock_irqrestore(&icpu->target_freq_lock, flags[1]); + + up_read(&icpu->enable_sem); + } + } + + spin_unlock_irqrestore(&speedchange_cpumask_lock, flags[0]); + + if (wakeup) + wake_up_process(speedchange_task); +} + +static int cpufreq_interactive_notifier(struct notifier_block *nb, + unsigned long val, void *data) +{ + struct cpufreq_freqs *freq = data; + struct interactive_cpu *icpu = &per_cpu(interactive_cpu, freq->cpu); + unsigned long flags; + + if (val != CPUFREQ_POSTCHANGE) + return 0; + + if (!down_read_trylock(&icpu->enable_sem)) + return 0; + + if (!icpu->ipolicy) { + up_read(&icpu->enable_sem); + return 0; + } + + spin_lock_irqsave(&icpu->load_lock, flags); + update_load(icpu, freq->cpu); + spin_unlock_irqrestore(&icpu->load_lock, flags); + + up_read(&icpu->enable_sem); + + return 0; +} + +static struct notifier_block cpufreq_notifier_block = { + .notifier_call = cpufreq_interactive_notifier, +}; + +static unsigned int *get_tokenized_data(const char *buf, int *num_tokens) +{ + const char *cp = buf; + int ntokens = 1, i = 0; + unsigned int *tokenized_data; + int err = -EINVAL; + + while ((cp = strpbrk(cp + 1, " :"))) + ntokens++; + + if (!(ntokens & 0x1)) + goto err; + + tokenized_data = kcalloc(ntokens, sizeof(*tokenized_data), GFP_KERNEL); + if (!tokenized_data) { + err = -ENOMEM; + goto err; + } + + cp = buf; + while (i < ntokens) { + if (kstrtouint(cp, 0, &tokenized_data[i++]) < 0) + goto err_kfree; + + cp = strpbrk(cp, " :"); + if (!cp) + break; + cp++; + } + + if (i != ntokens) + goto err_kfree; + + *num_tokens = ntokens; + return tokenized_data; + +err_kfree: + kfree(tokenized_data); +err: + return ERR_PTR(err); +} + +/* Interactive governor sysfs interface */ +static struct interactive_tunables *to_tunables(struct gov_attr_set *attr_set) +{ + return container_of(attr_set, struct interactive_tunables, attr_set); +} + +#define show_one(file_name, type) \ +static ssize_t show_##file_name(struct gov_attr_set *attr_set, char *buf) \ +{ \ + struct interactive_tunables *tunables = to_tunables(attr_set); \ + return sprintf(buf, type "\n", tunables->file_name); \ +} + +static ssize_t show_target_loads(struct gov_attr_set *attr_set, char *buf) +{ + struct interactive_tunables *tunables = to_tunables(attr_set); + unsigned long flags; + ssize_t ret = 0; + int i; + + spin_lock_irqsave(&tunables->target_loads_lock, flags); + + for (i = 0; i < tunables->ntarget_loads; i++) + ret += sprintf(buf + ret, "%u%s", tunables->target_loads[i], + i & 0x1 ? ":" : " "); + + sprintf(buf + ret - 1, "\n"); + spin_unlock_irqrestore(&tunables->target_loads_lock, flags); + + return ret; +} + +static ssize_t store_target_loads(struct gov_attr_set *attr_set, + const char *buf, size_t count) +{ + struct interactive_tunables *tunables = to_tunables(attr_set); + unsigned int *new_target_loads; + unsigned long flags; + int ntokens; + + new_target_loads = get_tokenized_data(buf, &ntokens); + if (IS_ERR(new_target_loads)) + return PTR_ERR(new_target_loads); + + spin_lock_irqsave(&tunables->target_loads_lock, flags); + if (tunables->target_loads != default_target_loads) + kfree(tunables->target_loads); + tunables->target_loads = new_target_loads; + tunables->ntarget_loads = ntokens; + spin_unlock_irqrestore(&tunables->target_loads_lock, flags); + + return count; +} + +static ssize_t show_above_hispeed_delay(struct gov_attr_set *attr_set, + char *buf) +{ + struct interactive_tunables *tunables = to_tunables(attr_set); + unsigned long flags; + ssize_t ret = 0; + int i; + + spin_lock_irqsave(&tunables->above_hispeed_delay_lock, flags); + + for (i = 0; i < tunables->nabove_hispeed_delay; i++) + ret += sprintf(buf + ret, "%u%s", + tunables->above_hispeed_delay[i], + i & 0x1 ? ":" : " "); + + sprintf(buf + ret - 1, "\n"); + spin_unlock_irqrestore(&tunables->above_hispeed_delay_lock, flags); + + return ret; +} + +static ssize_t store_above_hispeed_delay(struct gov_attr_set *attr_set, + const char *buf, size_t count) +{ + struct interactive_tunables *tunables = to_tunables(attr_set); + unsigned int *new_above_hispeed_delay = NULL; + unsigned long flags; + int ntokens; + + new_above_hispeed_delay = get_tokenized_data(buf, &ntokens); + if (IS_ERR(new_above_hispeed_delay)) + return PTR_ERR(new_above_hispeed_delay); + + spin_lock_irqsave(&tunables->above_hispeed_delay_lock, flags); + if (tunables->above_hispeed_delay != default_above_hispeed_delay) + kfree(tunables->above_hispeed_delay); + tunables->above_hispeed_delay = new_above_hispeed_delay; + tunables->nabove_hispeed_delay = ntokens; + spin_unlock_irqrestore(&tunables->above_hispeed_delay_lock, flags); + + return count; +} + +static ssize_t store_hispeed_freq(struct gov_attr_set *attr_set, + const char *buf, size_t count) +{ + struct interactive_tunables *tunables = to_tunables(attr_set); + unsigned long int val; + int ret; + + ret = kstrtoul(buf, 0, &val); + if (ret < 0) + return ret; + + tunables->hispeed_freq = val; + + return count; +} + +static ssize_t store_go_hispeed_load(struct gov_attr_set *attr_set, + const char *buf, size_t count) +{ + struct interactive_tunables *tunables = to_tunables(attr_set); + unsigned long val; + int ret; + + ret = kstrtoul(buf, 0, &val); + if (ret < 0) + return ret; + + tunables->go_hispeed_load = val; + + return count; +} + +static ssize_t store_min_sample_time(struct gov_attr_set *attr_set, + const char *buf, size_t count) +{ + struct interactive_tunables *tunables = to_tunables(attr_set); + unsigned long val; + int ret; + + ret = kstrtoul(buf, 0, &val); + if (ret < 0) + return ret; + + tunables->min_sample_time = val; + + return count; +} + +static ssize_t show_timer_rate(struct gov_attr_set *attr_set, char *buf) +{ + struct interactive_tunables *tunables = to_tunables(attr_set); + + return sprintf(buf, "%lu\n", tunables->sampling_rate); +} + +static ssize_t store_timer_rate(struct gov_attr_set *attr_set, const char *buf, + size_t count) +{ + struct interactive_tunables *tunables = to_tunables(attr_set); + unsigned long val, val_round; + int ret; + + ret = kstrtoul(buf, 0, &val); + if (ret < 0) + return ret; + + val_round = jiffies_to_usecs(usecs_to_jiffies(val)); + if (val != val_round) + pr_warn("timer_rate not aligned to jiffy. Rounded up to %lu\n", + val_round); + + tunables->sampling_rate = val_round; + + return count; +} + +static ssize_t store_timer_slack(struct gov_attr_set *attr_set, const char *buf, + size_t count) +{ + struct interactive_tunables *tunables = to_tunables(attr_set); + unsigned long val; + int ret; + + ret = kstrtol(buf, 10, &val); + if (ret < 0) + return ret; + + tunables->timer_slack = val; + update_slack_delay(tunables); + + return count; +} + +static ssize_t store_boost(struct gov_attr_set *attr_set, const char *buf, + size_t count) +{ + struct interactive_tunables *tunables = to_tunables(attr_set); + unsigned long val; + int ret; + + ret = kstrtoul(buf, 0, &val); + if (ret < 0) + return ret; + + tunables->boost = val; + + if (tunables->boost) { + trace_cpufreq_interactive_boost("on"); + if (!tunables->boosted) + cpufreq_interactive_boost(tunables); + } else { + tunables->boostpulse_endtime = ktime_to_us(ktime_get()); + trace_cpufreq_interactive_unboost("off"); + } + + return count; +} + +static ssize_t store_boostpulse(struct gov_attr_set *attr_set, const char *buf, + size_t count) +{ + struct interactive_tunables *tunables = to_tunables(attr_set); + unsigned long val; + int ret; + + ret = kstrtoul(buf, 0, &val); + if (ret < 0) + return ret; + + tunables->boostpulse_endtime = ktime_to_us(ktime_get()) + + tunables->boostpulse_duration; + trace_cpufreq_interactive_boost("pulse"); + if (!tunables->boosted) + cpufreq_interactive_boost(tunables); + + return count; +} + +static ssize_t store_boostpulse_duration(struct gov_attr_set *attr_set, + const char *buf, size_t count) +{ + struct interactive_tunables *tunables = to_tunables(attr_set); + unsigned long val; + int ret; + + ret = kstrtoul(buf, 0, &val); + if (ret < 0) + return ret; + + tunables->boostpulse_duration = val; + + return count; +} + +static ssize_t store_io_is_busy(struct gov_attr_set *attr_set, const char *buf, + size_t count) +{ + struct interactive_tunables *tunables = to_tunables(attr_set); + unsigned long val; + int ret; + + ret = kstrtoul(buf, 0, &val); + if (ret < 0) + return ret; + + tunables->io_is_busy = val; + + return count; +} + +show_one(hispeed_freq, "%u"); +show_one(go_hispeed_load, "%lu"); +show_one(min_sample_time, "%lu"); +show_one(timer_slack, "%lu"); +show_one(boost, "%u"); +show_one(boostpulse_duration, "%u"); +show_one(io_is_busy, "%u"); + +gov_attr_rw(target_loads); +gov_attr_rw(above_hispeed_delay); +gov_attr_rw(hispeed_freq); +gov_attr_rw(go_hispeed_load); +gov_attr_rw(min_sample_time); +gov_attr_rw(timer_rate); +gov_attr_rw(timer_slack); +gov_attr_rw(boost); +gov_attr_wo(boostpulse); +gov_attr_rw(boostpulse_duration); +gov_attr_rw(io_is_busy); + +static struct attribute *interactive_attributes[] = { + &target_loads.attr, + &above_hispeed_delay.attr, + &hispeed_freq.attr, + &go_hispeed_load.attr, + &min_sample_time.attr, + &timer_rate.attr, + &timer_slack.attr, + &boost.attr, + &boostpulse.attr, + &boostpulse_duration.attr, + &io_is_busy.attr, + NULL +}; + +static struct kobj_type interactive_tunables_ktype = { + .default_attrs = interactive_attributes, + .sysfs_ops = &governor_sysfs_ops, +}; + +static int cpufreq_interactive_idle_notifier(struct notifier_block *nb, + unsigned long val, void *data) +{ + if (val == IDLE_END) + cpufreq_interactive_idle_end(); + + return 0; +} + +static struct notifier_block cpufreq_interactive_idle_nb = { + .notifier_call = cpufreq_interactive_idle_notifier, +}; + +/* Interactive Governor callbacks */ +struct interactive_governor { + struct cpufreq_governor gov; + unsigned int usage_count; +}; + +static struct interactive_governor interactive_gov; + +#define CPU_FREQ_GOV_INTERACTIVE (&interactive_gov.gov) + +static void irq_work(struct irq_work *irq_work) +{ + struct interactive_cpu *icpu = container_of(irq_work, struct + interactive_cpu, irq_work); + + cpufreq_interactive_update(icpu); + icpu->work_in_progress = false; +} + +static void update_util_handler(struct update_util_data *data, u64 time, + unsigned int flags) +{ + struct interactive_cpu *icpu = container_of(data, + struct interactive_cpu, update_util); + struct interactive_policy *ipolicy = icpu->ipolicy; + struct interactive_tunables *tunables = ipolicy->tunables; + u64 delta_ns; + + /* + * The irq-work may not be allowed to be queued up right now. + * Possible reasons: + * - Work has already been queued up or is in progress. + * - It is too early (too little time from the previous sample). + */ + if (icpu->work_in_progress) + return; + + delta_ns = time - icpu->last_sample_time; + if ((s64)delta_ns < tunables->sampling_rate * NSEC_PER_USEC) + return; + + icpu->last_sample_time = time; + icpu->next_sample_jiffies = usecs_to_jiffies(tunables->sampling_rate) + + jiffies; + + icpu->work_in_progress = true; + irq_work_queue(&icpu->irq_work); +} + +static void gov_set_update_util(struct interactive_policy *ipolicy) +{ + struct cpufreq_policy *policy = ipolicy->policy; + struct interactive_cpu *icpu; + int cpu; + + for_each_cpu(cpu, policy->cpus) { + icpu = &per_cpu(interactive_cpu, cpu); + + icpu->last_sample_time = 0; + icpu->next_sample_jiffies = 0; + cpufreq_add_update_util_hook(cpu, &icpu->update_util, + update_util_handler); + } +} + +static inline void gov_clear_update_util(struct cpufreq_policy *policy) +{ + int i; + + for_each_cpu(i, policy->cpus) + cpufreq_remove_update_util_hook(i); + + synchronize_sched(); +} + +static void icpu_cancel_work(struct interactive_cpu *icpu) +{ + irq_work_sync(&icpu->irq_work); + icpu->work_in_progress = false; + del_timer_sync(&icpu->slack_timer); +} + +static struct interactive_policy * +interactive_policy_alloc(struct cpufreq_policy *policy) +{ + struct interactive_policy *ipolicy; + + ipolicy = kzalloc(sizeof(*ipolicy), GFP_KERNEL); + if (!ipolicy) + return NULL; + + ipolicy->policy = policy; + + return ipolicy; +} + +static void interactive_policy_free(struct interactive_policy *ipolicy) +{ + kfree(ipolicy); +} + +static struct interactive_tunables * +interactive_tunables_alloc(struct interactive_policy *ipolicy) +{ + struct interactive_tunables *tunables; + + tunables = kzalloc(sizeof(*tunables), GFP_KERNEL); + if (!tunables) + return NULL; + + gov_attr_set_init(&tunables->attr_set, &ipolicy->tunables_hook); + if (!have_governor_per_policy()) + global_tunables = tunables; + + ipolicy->tunables = tunables; + + return tunables; +} + +static void interactive_tunables_free(struct interactive_tunables *tunables) +{ + if (!have_governor_per_policy()) + global_tunables = NULL; + + kfree(tunables); +} + +int cpufreq_interactive_init(struct cpufreq_policy *policy) +{ + struct interactive_policy *ipolicy; + struct interactive_tunables *tunables; + int ret; + + /* State should be equivalent to EXIT */ + if (policy->governor_data) + return -EBUSY; + + ipolicy = interactive_policy_alloc(policy); + if (!ipolicy) + return -ENOMEM; + + mutex_lock(&global_tunables_lock); + + if (global_tunables) { + if (WARN_ON(have_governor_per_policy())) { + ret = -EINVAL; + goto free_int_policy; + } + + policy->governor_data = ipolicy; + ipolicy->tunables = global_tunables; + + gov_attr_set_get(&global_tunables->attr_set, + &ipolicy->tunables_hook); + goto out; + } + + tunables = interactive_tunables_alloc(ipolicy); + if (!tunables) { + ret = -ENOMEM; + goto free_int_policy; + } + + tunables->hispeed_freq = policy->max; + tunables->above_hispeed_delay = default_above_hispeed_delay; + tunables->nabove_hispeed_delay = + ARRAY_SIZE(default_above_hispeed_delay); + tunables->go_hispeed_load = DEFAULT_GO_HISPEED_LOAD; + tunables->target_loads = default_target_loads; + tunables->ntarget_loads = ARRAY_SIZE(default_target_loads); + tunables->min_sample_time = DEFAULT_MIN_SAMPLE_TIME; + tunables->boostpulse_duration = DEFAULT_MIN_SAMPLE_TIME; + tunables->sampling_rate = DEFAULT_SAMPLING_RATE; + tunables->timer_slack = DEFAULT_TIMER_SLACK; + update_slack_delay(tunables); + + spin_lock_init(&tunables->target_loads_lock); + spin_lock_init(&tunables->above_hispeed_delay_lock); + + policy->governor_data = ipolicy; + + ret = kobject_init_and_add(&tunables->attr_set.kobj, + &interactive_tunables_ktype, + get_governor_parent_kobj(policy), "%s", + interactive_gov.gov.name); + if (ret) + goto fail; + + /* One time initialization for governor */ + if (!interactive_gov.usage_count++) { + idle_notifier_register(&cpufreq_interactive_idle_nb); + cpufreq_register_notifier(&cpufreq_notifier_block, + CPUFREQ_TRANSITION_NOTIFIER); + } + + out: + mutex_unlock(&global_tunables_lock); + return 0; + + fail: + policy->governor_data = NULL; + interactive_tunables_free(tunables); + + free_int_policy: + mutex_unlock(&global_tunables_lock); + + interactive_policy_free(ipolicy); + pr_err("governor initialization failed (%d)\n", ret); + + return ret; +} + +void cpufreq_interactive_exit(struct cpufreq_policy *policy) +{ + struct interactive_policy *ipolicy = policy->governor_data; + struct interactive_tunables *tunables = ipolicy->tunables; + unsigned int count; + + mutex_lock(&global_tunables_lock); + + /* Last policy using the governor ? */ + if (!--interactive_gov.usage_count) { + cpufreq_unregister_notifier(&cpufreq_notifier_block, + CPUFREQ_TRANSITION_NOTIFIER); + idle_notifier_unregister(&cpufreq_interactive_idle_nb); + } + + count = gov_attr_set_put(&tunables->attr_set, &ipolicy->tunables_hook); + policy->governor_data = NULL; + if (!count) + interactive_tunables_free(tunables); + + mutex_unlock(&global_tunables_lock); + + interactive_policy_free(ipolicy); +} + +int cpufreq_interactive_start(struct cpufreq_policy *policy) +{ + struct interactive_policy *ipolicy = policy->governor_data; + struct interactive_cpu *icpu; + unsigned int cpu; + + for_each_cpu(cpu, policy->cpus) { + icpu = &per_cpu(interactive_cpu, cpu); + + icpu->target_freq = policy->cur; + icpu->floor_freq = icpu->target_freq; + icpu->pol_floor_val_time = ktime_to_us(ktime_get()); + icpu->loc_floor_val_time = icpu->pol_floor_val_time; + icpu->pol_hispeed_val_time = icpu->pol_floor_val_time; + icpu->loc_hispeed_val_time = icpu->pol_floor_val_time; + + down_write(&icpu->enable_sem); + icpu->ipolicy = ipolicy; + up_write(&icpu->enable_sem); + + slack_timer_resched(icpu, cpu, false); + } + + gov_set_update_util(ipolicy); + return 0; +} + +void cpufreq_interactive_stop(struct cpufreq_policy *policy) +{ + struct interactive_policy *ipolicy = policy->governor_data; + struct interactive_cpu *icpu; + unsigned int cpu; + + gov_clear_update_util(ipolicy->policy); + + for_each_cpu(cpu, policy->cpus) { + icpu = &per_cpu(interactive_cpu, cpu); + + icpu_cancel_work(icpu); + + down_write(&icpu->enable_sem); + icpu->ipolicy = NULL; + up_write(&icpu->enable_sem); + } +} + +void cpufreq_interactive_limits(struct cpufreq_policy *policy) +{ + struct interactive_cpu *icpu; + unsigned int cpu; + unsigned long flags; + + cpufreq_policy_apply_limits(policy); + + for_each_cpu(cpu, policy->cpus) { + icpu = &per_cpu(interactive_cpu, cpu); + + spin_lock_irqsave(&icpu->target_freq_lock, flags); + + if (policy->max < icpu->target_freq) + icpu->target_freq = policy->max; + else if (policy->min > icpu->target_freq) + icpu->target_freq = policy->min; + + spin_unlock_irqrestore(&icpu->target_freq_lock, flags); + } +} + +static struct interactive_governor interactive_gov = { + .gov = { + .name = "interactive", + .max_transition_latency = TRANSITION_LATENCY_LIMIT, + .owner = THIS_MODULE, + .init = cpufreq_interactive_init, + .exit = cpufreq_interactive_exit, + .start = cpufreq_interactive_start, + .stop = cpufreq_interactive_stop, + .limits = cpufreq_interactive_limits, + } +}; + +static void cpufreq_interactive_nop_timer(unsigned long data) +{ + /* + * The purpose of slack-timer is to wake up the CPU from IDLE, in order + * to decrease its frequency if it is not set to minimum already. + * + * This is important for platforms where CPU with higher frequencies + * consume higher power even at IDLE. + */ +} + +static int __init cpufreq_interactive_gov_init(void) +{ + struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 }; + struct interactive_cpu *icpu; + unsigned int cpu; + + for_each_possible_cpu(cpu) { + icpu = &per_cpu(interactive_cpu, cpu); + + init_irq_work(&icpu->irq_work, irq_work); + spin_lock_init(&icpu->load_lock); + spin_lock_init(&icpu->target_freq_lock); + init_rwsem(&icpu->enable_sem); + + /* Initialize per-cpu slack-timer */ + init_timer_pinned(&icpu->slack_timer); + icpu->slack_timer.function = cpufreq_interactive_nop_timer; + } + + spin_lock_init(&speedchange_cpumask_lock); + speedchange_task = kthread_create(cpufreq_interactive_speedchange_task, + NULL, "cfinteractive"); + if (IS_ERR(speedchange_task)) + return PTR_ERR(speedchange_task); + + sched_setscheduler_nocheck(speedchange_task, SCHED_FIFO, ¶m); + get_task_struct(speedchange_task); + + /* wake up so the thread does not look hung to the freezer */ + wake_up_process(speedchange_task); + + return cpufreq_register_governor(CPU_FREQ_GOV_INTERACTIVE); +} + +#ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_INTERACTIVE +struct cpufreq_governor *cpufreq_default_governor(void) +{ + return CPU_FREQ_GOV_INTERACTIVE; +} + +fs_initcall(cpufreq_interactive_gov_init); +#else +module_init(cpufreq_interactive_gov_init); +#endif + +static void __exit cpufreq_interactive_gov_exit(void) +{ + cpufreq_unregister_governor(CPU_FREQ_GOV_INTERACTIVE); + kthread_stop(speedchange_task); + put_task_struct(speedchange_task); +} +module_exit(cpufreq_interactive_gov_exit); + +MODULE_AUTHOR("Mike Chan <mike@android.com>"); +MODULE_DESCRIPTION("'cpufreq_interactive' - A dynamic cpufreq governor for Latency sensitive workloads"); +MODULE_LICENSE("GPL");
diff --git a/drivers/cpufreq/cpufreq_performance.c b/drivers/cpufreq/cpufreq_performance.c index dafb679..399428e 100644 --- a/drivers/cpufreq/cpufreq_performance.c +++ b/drivers/cpufreq/cpufreq_performance.c
@@ -22,7 +22,10 @@ static void cpufreq_gov_performance_limits(struct cpufreq_policy *policy) __cpufreq_driver_target(policy, policy->max, CPUFREQ_RELATION_H); } -static struct cpufreq_governor cpufreq_gov_performance = { +#ifdef CONFIG_CPU_FREQ_GOV_PERFORMANCE_MODULE +static +#endif +struct cpufreq_governor cpufreq_gov_performance = { .name = "performance", .owner = THIS_MODULE, .limits = cpufreq_gov_performance_limits,
diff --git a/drivers/cpufreq/cpufreq_powersave.c b/drivers/cpufreq/cpufreq_powersave.c index 78a6510..5daa500 100644 --- a/drivers/cpufreq/cpufreq_powersave.c +++ b/drivers/cpufreq/cpufreq_powersave.c
@@ -22,7 +22,10 @@ static void cpufreq_gov_powersave_limits(struct cpufreq_policy *policy) __cpufreq_driver_target(policy, policy->min, CPUFREQ_RELATION_L); } -static struct cpufreq_governor cpufreq_gov_powersave = { +#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE +static +#endif +struct cpufreq_governor cpufreq_gov_powersave = { .name = "powersave", .limits = cpufreq_gov_powersave_limits, .owner = THIS_MODULE,
diff --git a/drivers/cpufreq/cpufreq_times.c b/drivers/cpufreq/cpufreq_times.c new file mode 100644 index 0000000..0e8754b6 --- /dev/null +++ b/drivers/cpufreq/cpufreq_times.c
@@ -0,0 +1,464 @@ +/* drivers/cpufreq/cpufreq_times.c + * + * Copyright (C) 2018 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/cpufreq.h> +#include <linux/cpufreq_times.h> +#include <linux/cputime.h> +#include <linux/hashtable.h> +#include <linux/init.h> +#include <linux/proc_fs.h> +#include <linux/sched.h> +#include <linux/seq_file.h> +#include <linux/slab.h> +#include <linux/spinlock.h> +#include <linux/threads.h> + +#define UID_HASH_BITS 10 + +static DECLARE_HASHTABLE(uid_hash_table, UID_HASH_BITS); + +static DEFINE_SPINLOCK(task_time_in_state_lock); /* task->time_in_state */ +static DEFINE_SPINLOCK(uid_lock); /* uid_hash_table */ + +struct uid_entry { + uid_t uid; + unsigned int max_state; + struct hlist_node hash; + struct rcu_head rcu; + u64 time_in_state[0]; +}; + +/** + * struct cpu_freqs - per-cpu frequency information + * @offset: start of these freqs' stats in task time_in_state array + * @max_state: number of entries in freq_table + * @last_index: index in freq_table of last frequency switched to + * @freq_table: list of available frequencies + */ +struct cpu_freqs { + unsigned int offset; + unsigned int max_state; + unsigned int last_index; + unsigned int freq_table[0]; +}; + +static struct cpu_freqs *all_freqs[NR_CPUS]; + +static unsigned int next_offset; + + +/* Caller must hold rcu_read_lock() */ +static struct uid_entry *find_uid_entry_rcu(uid_t uid) +{ + struct uid_entry *uid_entry; + + hash_for_each_possible_rcu(uid_hash_table, uid_entry, hash, uid) { + if (uid_entry->uid == uid) + return uid_entry; + } + return NULL; +} + +/* Caller must hold uid lock */ +static struct uid_entry *find_uid_entry_locked(uid_t uid) +{ + struct uid_entry *uid_entry; + + hash_for_each_possible(uid_hash_table, uid_entry, hash, uid) { + if (uid_entry->uid == uid) + return uid_entry; + } + return NULL; +} + +/* Caller must hold uid lock */ +static struct uid_entry *find_or_register_uid_locked(uid_t uid) +{ + struct uid_entry *uid_entry, *temp; + unsigned int max_state = READ_ONCE(next_offset); + size_t alloc_size = sizeof(*uid_entry) + max_state * + sizeof(uid_entry->time_in_state[0]); + + uid_entry = find_uid_entry_locked(uid); + if (uid_entry) { + if (uid_entry->max_state == max_state) + return uid_entry; + /* uid_entry->time_in_state is too small to track all freqs, so + * expand it. + */ + temp = __krealloc(uid_entry, alloc_size, GFP_ATOMIC); + if (!temp) + return uid_entry; + temp->max_state = max_state; + memset(temp->time_in_state + uid_entry->max_state, 0, + (max_state - uid_entry->max_state) * + sizeof(uid_entry->time_in_state[0])); + if (temp != uid_entry) { + hlist_replace_rcu(&uid_entry->hash, &temp->hash); + kfree_rcu(uid_entry, rcu); + } + return temp; + } + + uid_entry = kzalloc(alloc_size, GFP_ATOMIC); + if (!uid_entry) + return NULL; + + uid_entry->uid = uid; + uid_entry->max_state = max_state; + + hash_add_rcu(uid_hash_table, &uid_entry->hash, uid); + + return uid_entry; +} + +static bool freq_index_invalid(unsigned int index) +{ + unsigned int cpu; + struct cpu_freqs *freqs; + + for_each_possible_cpu(cpu) { + freqs = all_freqs[cpu]; + if (!freqs || index < freqs->offset || + freqs->offset + freqs->max_state <= index) + continue; + return freqs->freq_table[index - freqs->offset] == + CPUFREQ_ENTRY_INVALID; + } + return true; +} + +static int single_uid_time_in_state_show(struct seq_file *m, void *ptr) +{ + struct uid_entry *uid_entry; + unsigned int i; + u64 time; + uid_t uid = from_kuid_munged(current_user_ns(), *(kuid_t *)m->private); + + if (uid == overflowuid) + return -EINVAL; + + rcu_read_lock(); + + uid_entry = find_uid_entry_rcu(uid); + if (!uid_entry) { + rcu_read_unlock(); + return 0; + } + + for (i = 0; i < uid_entry->max_state; ++i) { + if (freq_index_invalid(i)) + continue; + time = cputime_to_clock_t(uid_entry->time_in_state[i]); + seq_write(m, &time, sizeof(time)); + } + + rcu_read_unlock(); + + return 0; +} + +static void *uid_seq_start(struct seq_file *seq, loff_t *pos) +{ + if (*pos >= HASH_SIZE(uid_hash_table)) + return NULL; + + return &uid_hash_table[*pos]; +} + +static void *uid_seq_next(struct seq_file *seq, void *v, loff_t *pos) +{ + (*pos)++; + + if (*pos >= HASH_SIZE(uid_hash_table)) + return NULL; + + return &uid_hash_table[*pos]; +} + +static void uid_seq_stop(struct seq_file *seq, void *v) { } + +static int uid_time_in_state_seq_show(struct seq_file *m, void *v) +{ + struct uid_entry *uid_entry; + struct cpu_freqs *freqs, *last_freqs = NULL; + int i, cpu; + + if (v == uid_hash_table) { + seq_puts(m, "uid:"); + for_each_possible_cpu(cpu) { + freqs = all_freqs[cpu]; + if (!freqs || freqs == last_freqs) + continue; + last_freqs = freqs; + for (i = 0; i < freqs->max_state; i++) { + if (freqs->freq_table[i] == + CPUFREQ_ENTRY_INVALID) + continue; + seq_printf(m, " %d", freqs->freq_table[i]); + } + } + seq_putc(m, '\n'); + } + + rcu_read_lock(); + + hlist_for_each_entry_rcu(uid_entry, (struct hlist_head *)v, hash) { + if (uid_entry->max_state) + seq_printf(m, "%d:", uid_entry->uid); + for (i = 0; i < uid_entry->max_state; ++i) { + if (freq_index_invalid(i)) + continue; + seq_printf(m, " %lu", (unsigned long)cputime_to_clock_t( + uid_entry->time_in_state[i])); + } + if (uid_entry->max_state) + seq_putc(m, '\n'); + } + + rcu_read_unlock(); + return 0; +} + +void cpufreq_task_times_init(struct task_struct *p) +{ + unsigned long flags; + + spin_lock_irqsave(&task_time_in_state_lock, flags); + p->time_in_state = NULL; + spin_unlock_irqrestore(&task_time_in_state_lock, flags); + p->max_state = 0; +} + +void cpufreq_task_times_alloc(struct task_struct *p) +{ + void *temp; + unsigned long flags; + unsigned int max_state = READ_ONCE(next_offset); + + /* We use one array to avoid multiple allocs per task */ + temp = kcalloc(max_state, sizeof(p->time_in_state[0]), GFP_ATOMIC); + if (!temp) + return; + + spin_lock_irqsave(&task_time_in_state_lock, flags); + p->time_in_state = temp; + spin_unlock_irqrestore(&task_time_in_state_lock, flags); + p->max_state = max_state; +} + +/* Caller must hold task_time_in_state_lock */ +static int cpufreq_task_times_realloc_locked(struct task_struct *p) +{ + void *temp; + unsigned int max_state = READ_ONCE(next_offset); + + temp = krealloc(p->time_in_state, max_state * sizeof(u64), GFP_ATOMIC); + if (!temp) + return -ENOMEM; + p->time_in_state = temp; + memset(p->time_in_state + p->max_state, 0, + (max_state - p->max_state) * sizeof(u64)); + p->max_state = max_state; + return 0; +} + +void cpufreq_task_times_exit(struct task_struct *p) +{ + unsigned long flags; + void *temp; + + if (!p->time_in_state) + return; + + spin_lock_irqsave(&task_time_in_state_lock, flags); + temp = p->time_in_state; + p->time_in_state = NULL; + spin_unlock_irqrestore(&task_time_in_state_lock, flags); + kfree(temp); +} + +int proc_time_in_state_show(struct seq_file *m, struct pid_namespace *ns, + struct pid *pid, struct task_struct *p) +{ + unsigned int cpu, i; + cputime_t cputime; + unsigned long flags; + struct cpu_freqs *freqs; + struct cpu_freqs *last_freqs = NULL; + + spin_lock_irqsave(&task_time_in_state_lock, flags); + for_each_possible_cpu(cpu) { + freqs = all_freqs[cpu]; + if (!freqs || freqs == last_freqs) + continue; + last_freqs = freqs; + + seq_printf(m, "cpu%u\n", cpu); + for (i = 0; i < freqs->max_state; i++) { + if (freqs->freq_table[i] == CPUFREQ_ENTRY_INVALID) + continue; + cputime = 0; + if (freqs->offset + i < p->max_state && + p->time_in_state) + cputime = p->time_in_state[freqs->offset + i]; + seq_printf(m, "%u %lu\n", freqs->freq_table[i], + (unsigned long)cputime_to_clock_t(cputime)); + } + } + spin_unlock_irqrestore(&task_time_in_state_lock, flags); + return 0; +} + +void cpufreq_acct_update_power(struct task_struct *p, cputime_t cputime) +{ + unsigned long flags; + unsigned int state; + struct uid_entry *uid_entry; + struct cpu_freqs *freqs = all_freqs[task_cpu(p)]; + uid_t uid = from_kuid_munged(current_user_ns(), task_uid(p)); + + if (!freqs || p->flags & PF_EXITING) + return; + + state = freqs->offset + READ_ONCE(freqs->last_index); + + spin_lock_irqsave(&task_time_in_state_lock, flags); + if ((state < p->max_state || !cpufreq_task_times_realloc_locked(p)) && + p->time_in_state) + p->time_in_state[state] += cputime; + spin_unlock_irqrestore(&task_time_in_state_lock, flags); + + spin_lock_irqsave(&uid_lock, flags); + uid_entry = find_or_register_uid_locked(uid); + if (uid_entry && state < uid_entry->max_state) + uid_entry->time_in_state[state] += cputime; + spin_unlock_irqrestore(&uid_lock, flags); +} + +void cpufreq_times_create_policy(struct cpufreq_policy *policy) +{ + int cpu, index; + unsigned int count = 0; + struct cpufreq_frequency_table *pos, *table; + struct cpu_freqs *freqs; + void *tmp; + + if (all_freqs[policy->cpu]) + return; + + table = policy->freq_table; + if (!table) + return; + + cpufreq_for_each_entry(pos, table) + count++; + + tmp = kzalloc(sizeof(*freqs) + sizeof(freqs->freq_table[0]) * count, + GFP_KERNEL); + if (!tmp) + return; + + freqs = tmp; + freqs->max_state = count; + + index = cpufreq_frequency_table_get_index(policy, policy->cur); + if (index >= 0) + WRITE_ONCE(freqs->last_index, index); + + cpufreq_for_each_entry(pos, table) + freqs->freq_table[pos - table] = pos->frequency; + + freqs->offset = next_offset; + WRITE_ONCE(next_offset, freqs->offset + count); + for_each_cpu(cpu, policy->related_cpus) + all_freqs[cpu] = freqs; +} + +void cpufreq_task_times_remove_uids(uid_t uid_start, uid_t uid_end) +{ + struct uid_entry *uid_entry; + struct hlist_node *tmp; + unsigned long flags; + + spin_lock_irqsave(&uid_lock, flags); + + for (; uid_start <= uid_end; uid_start++) { + hash_for_each_possible_safe(uid_hash_table, uid_entry, tmp, + hash, uid_start) { + if (uid_start == uid_entry->uid) { + hash_del_rcu(&uid_entry->hash); + kfree_rcu(uid_entry, rcu); + } + } + } + + spin_unlock_irqrestore(&uid_lock, flags); +} + +void cpufreq_times_record_transition(struct cpufreq_freqs *freq) +{ + int index; + struct cpu_freqs *freqs = all_freqs[freq->cpu]; + struct cpufreq_policy *policy; + + if (!freqs) + return; + + policy = cpufreq_cpu_get(freq->cpu); + if (!policy) + return; + + index = cpufreq_frequency_table_get_index(policy, freq->new); + if (index >= 0) + WRITE_ONCE(freqs->last_index, index); + + cpufreq_cpu_put(policy); +} + +static const struct seq_operations uid_time_in_state_seq_ops = { + .start = uid_seq_start, + .next = uid_seq_next, + .stop = uid_seq_stop, + .show = uid_time_in_state_seq_show, +}; + +static int uid_time_in_state_open(struct inode *inode, struct file *file) +{ + return seq_open(file, &uid_time_in_state_seq_ops); +} + +int single_uid_time_in_state_open(struct inode *inode, struct file *file) +{ + return single_open(file, single_uid_time_in_state_show, + &(inode->i_uid)); +} + +static const struct file_operations uid_time_in_state_fops = { + .open = uid_time_in_state_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release, +}; + +static int __init cpufreq_times_init(void) +{ + proc_create_data("uid_time_in_state", 0444, NULL, + &uid_time_in_state_fops, NULL); + + return 0; +} + +early_initcall(cpufreq_times_init);
diff --git a/drivers/cpufreq/cpufreq_userspace.c b/drivers/cpufreq/cpufreq_userspace.c index bd897e3..765166d 100644 --- a/drivers/cpufreq/cpufreq_userspace.c +++ b/drivers/cpufreq/cpufreq_userspace.c
@@ -118,7 +118,10 @@ static void cpufreq_userspace_policy_limits(struct cpufreq_policy *policy) mutex_unlock(&userspace_mutex); } -static struct cpufreq_governor cpufreq_gov_userspace = { +#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE +static +#endif +struct cpufreq_governor cpufreq_gov_userspace = { .name = "userspace", .init = cpufreq_userspace_policy_init, .exit = cpufreq_userspace_policy_exit,
diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm index 21340e0..f5214480 100644 --- a/drivers/cpuidle/Kconfig.arm +++ b/drivers/cpuidle/Kconfig.arm
@@ -4,6 +4,7 @@ config ARM_CPUIDLE bool "Generic ARM/ARM64 CPU idle Driver" select DT_IDLE_STATES + select CPU_IDLE_MULTIPLE_DRIVERS help Select this to enable generic cpuidle driver for ARM. It provides a generic idle driver whose idle states are configured
diff --git a/drivers/cpuidle/cpuidle-arm.c b/drivers/cpuidle/cpuidle-arm.c index f440d38..f47c545 100644 --- a/drivers/cpuidle/cpuidle-arm.c +++ b/drivers/cpuidle/cpuidle-arm.c
@@ -18,6 +18,7 @@ #include <linux/module.h> #include <linux/of.h> #include <linux/slab.h> +#include <linux/topology.h> #include <asm/cpuidle.h> @@ -44,7 +45,7 @@ static int arm_enter_idle_state(struct cpuidle_device *dev, return CPU_PM_CPU_IDLE_ENTER(arm_cpuidle_suspend, idx); } -static struct cpuidle_driver arm_idle_driver = { +static struct cpuidle_driver arm_idle_driver __initdata = { .name = "arm_idle", .owner = THIS_MODULE, /* @@ -80,30 +81,42 @@ static const struct of_device_id arm_idle_state_match[] __initconst = { static int __init arm_idle_init(void) { int cpu, ret; - struct cpuidle_driver *drv = &arm_idle_driver; + struct cpuidle_driver *drv; struct cpuidle_device *dev; - /* - * Initialize idle states data, starting at index 1. - * This driver is DT only, if no DT idle states are detected (ret == 0) - * let the driver initialization fail accordingly since there is no - * reason to initialize the idle driver if only wfi is supported. - */ - ret = dt_init_idle_driver(drv, arm_idle_state_match, 1); - if (ret <= 0) - return ret ? : -ENODEV; - - ret = cpuidle_register_driver(drv); - if (ret) { - pr_err("Failed to register cpuidle driver\n"); - return ret; - } - - /* - * Call arch CPU operations in order to initialize - * idle states suspend back-end specific data - */ for_each_possible_cpu(cpu) { + + drv = kmemdup(&arm_idle_driver, sizeof(*drv), GFP_KERNEL); + if (!drv) { + ret = -ENOMEM; + goto out_fail; + } + + drv->cpumask = (struct cpumask *)cpumask_of(cpu); + + /* + * Initialize idle states data, starting at index 1. This + * driver is DT only, if no DT idle states are detected (ret + * == 0) let the driver initialization fail accordingly since + * there is no reason to initialize the idle driver if only + * wfi is supported. + */ + ret = dt_init_idle_driver(drv, arm_idle_state_match, 1); + if (ret <= 0) { + ret = ret ? : -ENODEV; + goto out_kfree_drv; + } + + ret = cpuidle_register_driver(drv); + if (ret) { + pr_err("Failed to register cpuidle driver\n"); + goto out_kfree_drv; + } + + /* + * Call arch CPU operations in order to initialize + * idle states suspend back-end specific data + */ ret = arm_cpuidle_init(cpu); /* @@ -115,14 +128,14 @@ static int __init arm_idle_init(void) if (ret) { pr_err("CPU %d failed to init idle CPU ops\n", cpu); - goto out_fail; + goto out_unregister_drv; } dev = kzalloc(sizeof(*dev), GFP_KERNEL); if (!dev) { pr_err("Failed to allocate cpuidle device\n"); ret = -ENOMEM; - goto out_fail; + goto out_unregister_drv; } dev->cpu = cpu; @@ -130,21 +143,28 @@ static int __init arm_idle_init(void) if (ret) { pr_err("Failed to register cpuidle device for CPU %d\n", cpu); - kfree(dev); - goto out_fail; + goto out_kfree_dev; } } return 0; + +out_kfree_dev: + kfree(dev); +out_unregister_drv: + cpuidle_unregister_driver(drv); +out_kfree_drv: + kfree(drv); out_fail: while (--cpu >= 0) { dev = per_cpu(cpuidle_devices, cpu); + drv = cpuidle_get_cpu_driver(dev); cpuidle_unregister_device(dev); + cpuidle_unregister_driver(drv); kfree(dev); + kfree(drv); } - cpuidle_unregister_driver(drv); - return ret; } device_initcall(arm_idle_init);
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 35237c8..439f460 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c
@@ -193,7 +193,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, } /* Take note of the planned idle state. */ - sched_idle_set_state(target_state); + sched_idle_set_state(target_state, index); trace_cpu_idle_rcuidle(index, dev->cpu); time_start = ns_to_ktime(local_clock()); @@ -206,7 +206,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu); /* The cpu is no longer idle or about to enter idle. */ - sched_idle_set_state(NULL); + sched_idle_set_state(NULL, -1); if (broadcast) { if (WARN_ON_ONCE(!irqs_disabled()))
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 03d38c2..65bb6fd 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c
@@ -178,7 +178,12 @@ static inline int performance_multiplier(unsigned long nr_iowaiters, unsigned lo /* for higher loadavg, we are more reluctant */ - mult += 2 * get_loadavg(load); + /* + * this doesn't work as intended - it is almost always 0, but can + * sometimes, depending on workload, spike very high into the hundreds + * even when the average cpu load is under 10%. + */ + /* mult += 2 * get_loadavg(); */ /* for IO wait tasks (per cpu!) we add 5x each */ mult += 10 * nr_iowaiters;
diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c index 04bf298..883b3be 100644 --- a/drivers/dma-buf/fence.c +++ b/drivers/dma-buf/fence.c
@@ -68,6 +68,8 @@ int fence_signal_locked(struct fence *fence) struct fence_cb *cur, *tmp; int ret = 0; + lockdep_assert_held(fence->lock); + if (WARN_ON(!fence)) return -EINVAL; @@ -159,9 +161,6 @@ fence_wait_timeout(struct fence *fence, bool intr, signed long timeout) if (WARN_ON(timeout < 0)) return -EINVAL; - if (timeout == 0) - return fence_is_signaled(fence); - trace_fence_wait_start(fence); ret = fence->ops->wait(fence, intr, timeout); trace_fence_wait_end(fence); @@ -329,8 +328,12 @@ fence_remove_callback(struct fence *fence, struct fence_cb *cb) spin_lock_irqsave(fence->lock, flags); ret = !list_empty(&cb->node); - if (ret) + if (ret) { list_del_init(&cb->node); + if (list_empty(&fence->cb_list)) + if (fence->ops->disable_signaling) + fence->ops->disable_signaling(fence); + } spin_unlock_irqrestore(fence->lock, flags);
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c index 723d8af..82f35a4 100644 --- a/drivers/dma-buf/reservation.c +++ b/drivers/dma-buf/reservation.c
@@ -280,18 +280,24 @@ int reservation_object_get_fences_rcu(struct reservation_object *obj, unsigned *pshared_count, struct fence ***pshared) { - unsigned shared_count = 0; - unsigned retry = 1; - struct fence **shared = NULL, *fence_excl = NULL; - int ret = 0; + struct fence **shared = NULL; + struct fence *fence_excl; + unsigned int shared_count; + int ret = 1; - while (retry) { + do { struct reservation_object_list *fobj; unsigned seq; + unsigned int i; - seq = read_seqcount_begin(&obj->seq); + shared_count = i = 0; rcu_read_lock(); + seq = read_seqcount_begin(&obj->seq); + + fence_excl = rcu_dereference(obj->fence_excl); + if (fence_excl && !fence_get_rcu(fence_excl)) + goto unlock; fobj = rcu_dereference(obj->fence); if (fobj) { @@ -309,52 +315,37 @@ int reservation_object_get_fences_rcu(struct reservation_object *obj, } ret = -ENOMEM; - shared_count = 0; break; } shared = nshared; - memcpy(shared, fobj->shared, sz); shared_count = fobj->shared_count; - } else - shared_count = 0; - fence_excl = rcu_dereference(obj->fence_excl); - - retry = read_seqcount_retry(&obj->seq, seq); - if (retry) - goto unlock; - - if (!fence_excl || fence_get_rcu(fence_excl)) { - unsigned i; for (i = 0; i < shared_count; ++i) { - if (fence_get_rcu(shared[i])) - continue; - - /* uh oh, refcount failed, abort and retry */ - while (i--) - fence_put(shared[i]); - - if (fence_excl) { - fence_put(fence_excl); - fence_excl = NULL; - } - - retry = 1; - break; + shared[i] = rcu_dereference(fobj->shared[i]); + if (!fence_get_rcu(shared[i])) + break; } - } else - retry = 1; + } + if (i != shared_count || read_seqcount_retry(&obj->seq, seq)) { + while (i--) + fence_put(shared[i]); + fence_put(fence_excl); + goto unlock; + } + + ret = 0; unlock: rcu_read_unlock(); - } - *pshared_count = shared_count; - if (shared_count) - *pshared = shared; - else { - *pshared = NULL; + } while (ret); + + if (!shared_count) { kfree(shared); + shared = NULL; } + + *pshared_count = shared_count; + *pshared = shared; *pfence_excl = fence_excl; return ret; @@ -379,10 +370,7 @@ long reservation_object_wait_timeout_rcu(struct reservation_object *obj, { struct fence *fence; unsigned seq, shared_count, i = 0; - long ret = timeout; - - if (!timeout) - return reservation_object_test_signaled_rcu(obj, wait_all); + long ret = timeout ? timeout : 1; retry: fence = NULL; @@ -397,9 +385,6 @@ long reservation_object_wait_timeout_rcu(struct reservation_object *obj, if (fobj) shared_count = fobj->shared_count; - if (read_seqcount_retry(&obj->seq, seq)) - goto unlock_retry; - for (i = 0; i < shared_count; ++i) { struct fence *lfence = rcu_dereference(fobj->shared[i]); @@ -422,9 +407,6 @@ long reservation_object_wait_timeout_rcu(struct reservation_object *obj, if (!shared_count) { struct fence *fence_excl = rcu_dereference(obj->fence_excl); - if (read_seqcount_retry(&obj->seq, seq)) - goto unlock_retry; - if (fence_excl && !test_bit(FENCE_FLAG_SIGNALED_BIT, &fence_excl->flags)) { if (!fence_get_rcu(fence_excl)) @@ -439,6 +421,11 @@ long reservation_object_wait_timeout_rcu(struct reservation_object *obj, rcu_read_unlock(); if (fence) { + if (read_seqcount_retry(&obj->seq, seq)) { + fence_put(fence); + goto retry; + } + ret = fence_wait_timeout(fence, intr, ret); fence_put(fence); if (ret > 0 && wait_all && (i + 1 < shared_count)) @@ -484,12 +471,13 @@ bool reservation_object_test_signaled_rcu(struct reservation_object *obj, bool test_all) { unsigned seq, shared_count; - int ret = true; + int ret; + rcu_read_lock(); retry: + ret = true; shared_count = 0; seq = read_seqcount_begin(&obj->seq); - rcu_read_lock(); if (test_all) { unsigned i; @@ -500,46 +488,35 @@ bool reservation_object_test_signaled_rcu(struct reservation_object *obj, if (fobj) shared_count = fobj->shared_count; - if (read_seqcount_retry(&obj->seq, seq)) - goto unlock_retry; - for (i = 0; i < shared_count; ++i) { struct fence *fence = rcu_dereference(fobj->shared[i]); ret = reservation_object_test_signaled_single(fence); if (ret < 0) - goto unlock_retry; + goto retry; else if (!ret) break; } - /* - * There could be a read_seqcount_retry here, but nothing cares - * about whether it's the old or newer fence pointers that are - * signaled. That race could still have happened after checking - * read_seqcount_retry. If you care, use ww_mutex_lock. - */ + if (read_seqcount_retry(&obj->seq, seq)) + goto retry; } if (!shared_count) { struct fence *fence_excl = rcu_dereference(obj->fence_excl); - if (read_seqcount_retry(&obj->seq, seq)) - goto unlock_retry; - if (fence_excl) { ret = reservation_object_test_signaled_single( fence_excl); if (ret < 0) - goto unlock_retry; + goto retry; + + if (read_seqcount_retry(&obj->seq, seq)) + goto retry; } } rcu_read_unlock(); return ret; - -unlock_retry: - rcu_read_unlock(); - goto retry; } EXPORT_SYMBOL_GPL(reservation_object_test_signaled_rcu);
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c index 4f35114..9dc86d3 100644 --- a/drivers/dma-buf/sw_sync.c +++ b/drivers/dma-buf/sw_sync.c
@@ -169,6 +169,13 @@ static bool timeline_fence_enable_signaling(struct fence *fence) return true; } +static void timeline_fence_disable_signaling(struct fence *fence) +{ + struct sync_pt *pt = container_of(fence, struct sync_pt, base); + + list_del_init(&pt->link); +} + static void timeline_fence_value_str(struct fence *fence, char *str, int size) { @@ -187,6 +194,7 @@ static const struct fence_ops timeline_fence_ops = { .get_driver_name = timeline_fence_get_driver_name, .get_timeline_name = timeline_fence_get_timeline_name, .enable_signaling = timeline_fence_enable_signaling, + .disable_signaling = timeline_fence_disable_signaling, .signaled = timeline_fence_signaled, .wait = fence_default_wait, .release = timeline_fence_release, @@ -360,8 +368,8 @@ static long sw_sync_ioctl_create_fence(struct sync_timeline *obj, } sync_file = sync_file_create(&pt->base); + fence_put(&pt->base); if (!sync_file) { - fence_put(&pt->base); err = -ENOMEM; goto err; }
diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c index f0c374d..e16849fd 100644 --- a/drivers/dma-buf/sync_file.c +++ b/drivers/dma-buf/sync_file.c
@@ -279,7 +279,7 @@ static void sync_file_free(struct kref *kref) struct sync_file *sync_file = container_of(kref, struct sync_file, kref); - if (test_bit(POLL_ENABLED, &sync_file->fence->flags)) + if (test_bit(POLL_ENABLED, &sync_file->flags)) fence_remove_callback(sync_file->fence, &sync_file->cb); fence_put(sync_file->fence); kfree(sync_file); @@ -299,10 +299,10 @@ static unsigned int sync_file_poll(struct file *file, poll_table *wait) poll_wait(file, &sync_file->wq, wait); - if (!poll_does_not_wait(wait) && - !test_and_set_bit(POLL_ENABLED, &sync_file->fence->flags)) { + if (list_empty(&sync_file->cb.node) && + !test_and_set_bit(POLL_ENABLED, &sync_file->flags)) { if (fence_add_callback(sync_file->fence, &sync_file->cb, - fence_check_cb_func) < 0) + fence_check_cb_func) < 0) wake_up_all(&sync_file->wq); }
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c index 40d792e..7335a86 100644 --- a/drivers/edac/edac_mc_sysfs.c +++ b/drivers/edac/edac_mc_sysfs.c
@@ -50,7 +50,7 @@ int edac_mc_get_poll_msec(void) return edac_mc_poll_msec; } -static int edac_set_poll_msec(const char *val, struct kernel_param *kp) +static int edac_set_poll_msec(const char *val, const struct kernel_param *kp) { unsigned long l; int ret;
diff --git a/drivers/edac/edac_module.c b/drivers/edac/edac_module.c index 5f8543b..b0d3284 100644 --- a/drivers/edac/edac_module.c +++ b/drivers/edac/edac_module.c
@@ -19,7 +19,8 @@ #ifdef CONFIG_EDAC_DEBUG -static int edac_set_debug_level(const char *buf, struct kernel_param *kp) +static int edac_set_debug_level(const char *buf, + const struct kernel_param *kp) { unsigned long val; int ret;
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile index 5e23e2d..0e3ca82 100644 --- a/drivers/firmware/efi/libstub/Makefile +++ b/drivers/firmware/efi/libstub/Makefile
@@ -10,7 +10,7 @@ -fPIC -fno-strict-aliasing -mno-red-zone \ -mno-mmx -mno-sse -cflags-$(CONFIG_ARM64) := $(subst -pg,,$(KBUILD_CFLAGS)) +cflags-$(CONFIG_ARM64) := $(subst -pg,,$(KBUILD_CFLAGS)) -fpie cflags-$(CONFIG_ARM) := $(subst -pg,,$(KBUILD_CFLAGS)) -g0 \ -fno-builtin -fpic -mno-single-pic-base @@ -18,7 +18,8 @@ KBUILD_CFLAGS := $(cflags-y) -DDISABLE_BRANCH_PROFILING \ $(call cc-option,-ffreestanding) \ - $(call cc-option,-fno-stack-protector) + $(call cc-option,-fno-stack-protector) \ + $(DISABLE_LTO) GCOV_PROFILE := n KASAN_SANITIZE := n
diff --git a/drivers/firmware/efi/libstub/arm-stub.c b/drivers/firmware/efi/libstub/arm-stub.c index 993aa56..7ec09ea8 100644 --- a/drivers/firmware/efi/libstub/arm-stub.c +++ b/drivers/firmware/efi/libstub/arm-stub.c
@@ -18,8 +18,6 @@ #include "efistub.h" -bool __nokaslr; - static int efi_get_secureboot(efi_system_table_t *sys_table_arg) { static efi_char16_t const sb_var_name[] = { @@ -268,18 +266,6 @@ unsigned long efi_entry(void *handle, efi_system_table_t *sys_table, goto fail; } - /* check whether 'nokaslr' was passed on the command line */ - if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) { - static const u8 default_cmdline[] = CONFIG_CMDLINE; - const u8 *str, *cmdline = cmdline_ptr; - - if (IS_ENABLED(CONFIG_CMDLINE_FORCE)) - cmdline = default_cmdline; - str = strstr(cmdline, "nokaslr"); - if (str == cmdline || (str > cmdline && *(str - 1) == ' ')) - __nokaslr = true; - } - si = setup_graphics(sys_table); status = handle_kernel_image(sys_table, image_addr, &image_size, @@ -291,9 +277,13 @@ unsigned long efi_entry(void *handle, efi_system_table_t *sys_table, goto fail_free_cmdline; } - status = efi_parse_options(cmdline_ptr); - if (status != EFI_SUCCESS) - pr_efi_err(sys_table, "Failed to parse EFI cmdline options\n"); + if (IS_ENABLED(CONFIG_CMDLINE_EXTEND) || + IS_ENABLED(CONFIG_CMDLINE_FORCE) || + cmdline_size == 0) + efi_parse_options(CONFIG_CMDLINE); + + if (!IS_ENABLED(CONFIG_CMDLINE_FORCE) && cmdline_size > 0) + efi_parse_options(cmdline_ptr); secure_boot = efi_get_secureboot(sys_table); if (secure_boot > 0)
diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c index eae693e..f7a6970 100644 --- a/drivers/firmware/efi/libstub/arm64-stub.c +++ b/drivers/firmware/efi/libstub/arm64-stub.c
@@ -9,15 +9,21 @@ * published by the Free Software Foundation. * */ + +/* + * To prevent the compiler from emitting GOT-indirected (and thus absolute) + * references to the section markers, override their visibility as 'hidden' + */ +#pragma GCC visibility push(hidden) +#include <asm/sections.h> +#pragma GCC visibility pop + #include <linux/efi.h> #include <asm/efi.h> -#include <asm/sections.h> #include <asm/sysreg.h> #include "efistub.h" -extern bool __nokaslr; - efi_status_t check_platform_features(efi_system_table_t *sys_table_arg) { u64 tg; @@ -52,7 +58,7 @@ efi_status_t handle_kernel_image(efi_system_table_t *sys_table_arg, u64 phys_seed = 0; if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) { - if (!__nokaslr) { + if (!nokaslr()) { status = efi_get_random_bytes(sys_table_arg, sizeof(phys_seed), (u8 *)&phys_seed);
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c index aded106..2db141c 100644 --- a/drivers/firmware/efi/libstub/efi-stub-helper.c +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -41,6 +41,13 @@ static unsigned long __chunk_size = EFI_READ_CHUNK_SIZE; #define EFI_ALLOC_ALIGN EFI_PAGE_SIZE #endif +static int __section(.data) __nokaslr; + +int __pure nokaslr(void) +{ + return __nokaslr; +} + #define EFI_MMAP_NR_SLACK_SLOTS 8 struct file_info { @@ -351,10 +358,14 @@ void efi_free(efi_system_table_t *sys_table_arg, unsigned long size, * environments, first in the early boot environment of the EFI boot * stub, and subsequently during the kernel boot. */ -efi_status_t efi_parse_options(char *cmdline) +efi_status_t efi_parse_options(char const *cmdline) { char *str; + str = strstr(cmdline, "nokaslr"); + if (str == cmdline || (str && str > cmdline && *(str - 1) == ' ')) + __nokaslr = 1; + /* * If no EFI parameters were specified on the cmdline we've got * nothing to do.
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h index fac6799..d0e5aca 100644 --- a/drivers/firmware/efi/libstub/efistub.h +++ b/drivers/firmware/efi/libstub/efistub.h
@@ -15,6 +15,8 @@ */ #undef __init +extern int __pure nokaslr(void); + void efi_char16_printk(efi_system_table_t *, efi_char16_t *); efi_status_t efi_open_volume(efi_system_table_t *sys_table_arg, void *__image,
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 483059a..43cb33d 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig
@@ -12,6 +12,7 @@ select I2C select I2C_ALGOBIT select DMA_SHARED_BUFFER + select SYNC_FILE help Kernel-level support for the Direct Rendering Infrastructure (DRI) introduced in XFree86 4.0. If you say Y here, you need to select
diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c index dd6fff1..4e4043f 100644 --- a/drivers/gpu/drm/drm_atomic.c +++ b/drivers/gpu/drm/drm_atomic.c
@@ -30,6 +30,7 @@ #include <drm/drm_atomic.h> #include <drm/drm_mode.h> #include <drm/drm_plane_helper.h> +#include <linux/sync_file.h> #include "drm_crtc_internal.h" @@ -292,6 +293,23 @@ drm_atomic_get_crtc_state(struct drm_atomic_state *state, } EXPORT_SYMBOL(drm_atomic_get_crtc_state); +static void set_out_fence_for_crtc(struct drm_atomic_state *state, + struct drm_crtc *crtc, s32 __user *fence_ptr) +{ + state->crtcs[drm_crtc_index(crtc)].out_fence_ptr = fence_ptr; +} + +static s32 __user *get_out_fence_for_crtc(struct drm_atomic_state *state, + struct drm_crtc *crtc) +{ + s32 __user *fence_ptr; + + fence_ptr = state->crtcs[drm_crtc_index(crtc)].out_fence_ptr; + state->crtcs[drm_crtc_index(crtc)].out_fence_ptr = NULL; + + return fence_ptr; +} + /** * drm_atomic_set_mode_for_crtc - set mode for CRTC * @state: the CRTC whose incoming state to update @@ -496,6 +514,16 @@ int drm_atomic_crtc_set_property(struct drm_crtc *crtc, &replaced); state->color_mgmt_changed |= replaced; return ret; + } else if (property == config->prop_out_fence_ptr) { + s32 __user *fence_ptr = u64_to_user_ptr(val); + + if (!fence_ptr) + return 0; + + if (put_user(-1, fence_ptr)) + return -EFAULT; + + set_out_fence_for_crtc(state->state, crtc, fence_ptr); } else if (crtc->funcs->atomic_set_property) return crtc->funcs->atomic_set_property(crtc, state, property, val); else @@ -538,6 +566,8 @@ drm_atomic_crtc_get_property(struct drm_crtc *crtc, *val = (state->ctm) ? state->ctm->base.id : 0; else if (property == config->gamma_lut_property) *val = (state->gamma_lut) ? state->gamma_lut->base.id : 0; + else if (property == config->prop_out_fence_ptr) + *val = 0; else if (crtc->funcs->atomic_get_property) return crtc->funcs->atomic_get_property(crtc, state, property, val); else @@ -693,6 +723,17 @@ int drm_atomic_plane_set_property(struct drm_plane *plane, drm_atomic_set_fb_for_plane(state, fb); if (fb) drm_framebuffer_unreference(fb); + } else if (property == config->prop_in_fence_fd) { + if (state->fence) + return -EINVAL; + + if (U642I64(val) == -1) + return 0; + + state->fence = sync_file_get_fence(val); + if (!state->fence) + return -EINVAL; + } else if (property == config->prop_crtc_id) { struct drm_crtc *crtc = drm_crtc_find(dev, val); return drm_atomic_set_crtc_for_plane(state, crtc); @@ -752,6 +793,8 @@ drm_atomic_plane_get_property(struct drm_plane *plane, if (property == config->prop_fb_id) { *val = (state->fb) ? state->fb->base.id : 0; + } else if (property == config->prop_in_fence_fd) { + *val = -1; } else if (property == config->prop_crtc_id) { *val = (state->crtc) ? state->crtc->base.id : 0; } else if (property == config->prop_crtc_x) { @@ -1154,6 +1197,36 @@ drm_atomic_set_fb_for_plane(struct drm_plane_state *plane_state, EXPORT_SYMBOL(drm_atomic_set_fb_for_plane); /** + * drm_atomic_set_fence_for_plane - set fence for plane + * @plane_state: atomic state object for the plane + * @fence: fence to use for the plane + * + * Helper to setup the plane_state fence in case it is not set yet. + * By using this drivers doesn't need to worry if the user choose + * implicit or explicit fencing. + * + * This function will not set the fence to the state if it was set + * via explicit fencing interfaces on the atomic ioctl. It will + * all drope the reference to the fence as we not storing it + * anywhere. + * + * Otherwise, if plane_state->fence is not set this function we + * just set it with the received implict fence. + */ +void +drm_atomic_set_fence_for_plane(struct drm_plane_state *plane_state, + struct fence *fence) +{ + if (plane_state->fence) { + fence_put(fence); + return; + } + + plane_state->fence = fence; +} +EXPORT_SYMBOL(drm_atomic_set_fence_for_plane); + +/** * drm_atomic_set_crtc_for_connector - set crtc for connector * @conn_state: atomic state object for the connector * @crtc: crtc to use for the connector @@ -1472,11 +1545,9 @@ EXPORT_SYMBOL(drm_atomic_nonblocking_commit); */ static struct drm_pending_vblank_event *create_vblank_event( - struct drm_device *dev, struct drm_file *file_priv, - struct fence *fence, uint64_t user_data) + struct drm_device *dev, uint64_t user_data) { struct drm_pending_vblank_event *e = NULL; - int ret; e = kzalloc(sizeof *e, GFP_KERNEL); if (!e) @@ -1486,17 +1557,6 @@ static struct drm_pending_vblank_event *create_vblank_event( e->event.base.length = sizeof(e->event); e->event.user_data = user_data; - if (file_priv) { - ret = drm_event_reserve_init(dev, file_priv, &e->base, - &e->event.base); - if (ret) { - kfree(e); - return NULL; - } - } - - e->base.fence = fence; - return e; } @@ -1601,6 +1661,206 @@ void drm_atomic_clean_old_fb(struct drm_device *dev, } EXPORT_SYMBOL(drm_atomic_clean_old_fb); +/** + * DOC: explicit fencing properties + * + * Explicit fencing allows userspace to control the buffer synchronization + * between devices. A Fence or a group of fences are transfered to/from + * userspace using Sync File fds and there are two DRM properties for that. + * IN_FENCE_FD on each DRM Plane to send fences to the kernel and + * OUT_FENCE_PTR on each DRM CRTC to receive fences from the kernel. + * + * As a contrast, with implicit fencing the kernel keeps track of any + * ongoing rendering, and automatically ensures that the atomic update waits + * for any pending rendering to complete. For shared buffers represented with + * a struct &dma_buf this is tracked in &reservation_object structures. + * Implicit syncing is how Linux traditionally worked (e.g. DRI2/3 on X.org), + * whereas explicit fencing is what Android wants. + * + * "IN_FENCE_FD”: + * Use this property to pass a fence that DRM should wait on before + * proceeding with the Atomic Commit request and show the framebuffer for + * the plane on the screen. The fence can be either a normal fence or a + * merged one, the sync_file framework will handle both cases and use a + * fence_array if a merged fence is received. Passing -1 here means no + * fences to wait on. + * + * If the Atomic Commit request has the DRM_MODE_ATOMIC_TEST_ONLY flag + * it will only check if the Sync File is a valid one. + * + * On the driver side the fence is stored on the @fence parameter of + * struct &drm_plane_state. Drivers which also support implicit fencing + * should set the implicit fence using drm_atomic_set_fence_for_plane(), + * to make sure there's consistent behaviour between drivers in precedence + * of implicit vs. explicit fencing. + * + * "OUT_FENCE_PTR”: + * Use this property to pass a file descriptor pointer to DRM. Once the + * Atomic Commit request call returns OUT_FENCE_PTR will be filled with + * the file descriptor number of a Sync File. This Sync File contains the + * CRTC fence that will be signaled when all framebuffers present on the + * Atomic Commit * request for that given CRTC are scanned out on the + * screen. + * + * The Atomic Commit request fails if a invalid pointer is passed. If the + * Atomic Commit request fails for any other reason the out fence fd + * returned will be -1. On a Atomic Commit with the + * DRM_MODE_ATOMIC_TEST_ONLY flag the out fence will also be set to -1. + * + * Note that out-fences don't have a special interface to drivers and are + * internally represented by a struct &drm_pending_vblank_event in struct + * &drm_crtc_state, which is also used by the nonblocking atomic commit + * helpers and for the DRM event handling for existing userspace. + */ + +struct drm_out_fence_state { + s32 __user *out_fence_ptr; + struct sync_file *sync_file; + int fd; +}; + +static int setup_out_fence(struct drm_out_fence_state *fence_state, + struct fence *fence) +{ + fence_state->fd = get_unused_fd_flags(O_CLOEXEC); + if (fence_state->fd < 0) + return fence_state->fd; + + if (put_user(fence_state->fd, fence_state->out_fence_ptr)) + return -EFAULT; + + fence_state->sync_file = sync_file_create(fence); + if (!fence_state->sync_file) + return -ENOMEM; + + return 0; +} + +static int prepare_crtc_signaling(struct drm_device *dev, + struct drm_atomic_state *state, + struct drm_mode_atomic *arg, + struct drm_file *file_priv, + struct drm_out_fence_state **fence_state, + unsigned int *num_fences) +{ + struct drm_crtc *crtc; + struct drm_crtc_state *crtc_state; + int i, ret; + + if (arg->flags & DRM_MODE_ATOMIC_TEST_ONLY) + return 0; + + for_each_crtc_in_state(state, crtc, crtc_state, i) { + s32 __user *fence_ptr; + + fence_ptr = get_out_fence_for_crtc(crtc_state->state, crtc); + + if (arg->flags & DRM_MODE_PAGE_FLIP_EVENT || fence_ptr) { + struct drm_pending_vblank_event *e; + + e = create_vblank_event(dev, arg->user_data); + if (!e) + return -ENOMEM; + + crtc_state->event = e; + } + + if (arg->flags & DRM_MODE_PAGE_FLIP_EVENT) { + struct drm_pending_vblank_event *e = crtc_state->event; + + if (!file_priv) + continue; + + ret = drm_event_reserve_init(dev, file_priv, &e->base, + &e->event.base); + if (ret) { + kfree(e); + crtc_state->event = NULL; + return ret; + } + } + + if (fence_ptr) { + struct fence *fence; + struct drm_out_fence_state *f; + + f = krealloc(*fence_state, sizeof(**fence_state) * + (*num_fences + 1), GFP_KERNEL); + if (!f) + return -ENOMEM; + + memset(&f[*num_fences], 0, sizeof(*f)); + + f[*num_fences].out_fence_ptr = fence_ptr; + *fence_state = f; + + fence = drm_crtc_create_fence(crtc); + if (!fence) + return -ENOMEM; + + ret = setup_out_fence(&f[(*num_fences)++], fence); + if (ret) { + fence_put(fence); + return ret; + } + + crtc_state->event->base.fence = fence; + } + } + + return 0; +} + +static void complete_crtc_signaling(struct drm_device *dev, + struct drm_atomic_state *state, + struct drm_out_fence_state *fence_state, + unsigned int num_fences, + bool install_fds) +{ + struct drm_crtc *crtc; + struct drm_crtc_state *crtc_state; + int i; + + if (install_fds) { + for (i = 0; i < num_fences; i++) + fd_install(fence_state[i].fd, + fence_state[i].sync_file->file); + + kfree(fence_state); + return; + } + + for_each_crtc_in_state(state, crtc, crtc_state, i) { + struct drm_pending_vblank_event *event = crtc_state->event; + /* + * Free the allocated event. drm_atomic_helper_setup_commit + * can allocate an event too, so only free it if it's ours + * to prevent a double free in drm_atomic_state_clear. + */ + if (event && (event->base.fence || event->base.file_priv)) { + drm_event_cancel_free(dev, &event->base); + crtc_state->event = NULL; + } + } + + if (!fence_state) + return; + + for (i = 0; i < num_fences; i++) { + if (fence_state[i].sync_file) + fput(fence_state[i].sync_file->file); + if (fence_state[i].fd >= 0) + put_unused_fd(fence_state[i].fd); + + /* If this fails log error to the user */ + if (fence_state[i].out_fence_ptr && + put_user(-1, fence_state[i].out_fence_ptr)) + DRM_DEBUG_ATOMIC("Couldn't clear out_fence_ptr\n"); + } + + kfree(fence_state); +} + int drm_mode_atomic_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { @@ -1613,11 +1873,10 @@ int drm_mode_atomic_ioctl(struct drm_device *dev, struct drm_atomic_state *state; struct drm_modeset_acquire_ctx ctx; struct drm_plane *plane; - struct drm_crtc *crtc; - struct drm_crtc_state *crtc_state; + struct drm_out_fence_state *fence_state; unsigned plane_mask; int ret = 0; - unsigned int i, j; + unsigned int i, j, num_fences; /* disallow for drivers not supporting atomic: */ if (!drm_core_check_feature(dev, DRIVER_ATOMIC)) @@ -1658,6 +1917,8 @@ int drm_mode_atomic_ioctl(struct drm_device *dev, plane_mask = 0; copied_objs = 0; copied_props = 0; + fence_state = NULL; + num_fences = 0; for (i = 0; i < arg->count_objs; i++) { uint32_t obj_id, count_props; @@ -1732,20 +1993,10 @@ int drm_mode_atomic_ioctl(struct drm_device *dev, drm_mode_object_unreference(obj); } - if (arg->flags & DRM_MODE_PAGE_FLIP_EVENT) { - for_each_crtc_in_state(state, crtc, crtc_state, i) { - struct drm_pending_vblank_event *e; - - e = create_vblank_event(dev, file_priv, NULL, - arg->user_data); - if (!e) { - ret = -ENOMEM; - goto out; - } - - crtc_state->event = e; - } - } + ret = prepare_crtc_signaling(dev, state, arg, file_priv, &fence_state, + &num_fences); + if (ret) + goto out; if (arg->flags & DRM_MODE_ATOMIC_TEST_ONLY) { /* @@ -1762,20 +2013,7 @@ int drm_mode_atomic_ioctl(struct drm_device *dev, out: drm_atomic_clean_old_fb(dev, plane_mask, ret); - if (ret && arg->flags & DRM_MODE_PAGE_FLIP_EVENT) { - /* - * Free the allocated event. drm_atomic_helper_setup_commit - * can allocate an event too, so only free it if it's ours - * to prevent a double free in drm_atomic_state_clear. - */ - for_each_crtc_in_state(state, crtc, crtc_state, i) { - struct drm_pending_vblank_event *event = crtc_state->event; - if (event && (event->base.fence || event->base.file_priv)) { - drm_event_cancel_free(dev, &event->base); - crtc_state->event = NULL; - } - } - } + complete_crtc_signaling(dev, state, fence_state, num_fences, !ret); if (ret == -EDEADLK) { drm_atomic_state_clear(state);
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index 50acd79..f34b4e8 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -3166,6 +3166,9 @@ void __drm_atomic_helper_plane_destroy_state(struct drm_plane_state *state) { if (state->fb) drm_framebuffer_unreference(state->fb); + + if (state->fence) + fence_put(state->fence); } EXPORT_SYMBOL(__drm_atomic_helper_plane_destroy_state);
diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c index 2d7bedf..79b3d52 100644 --- a/drivers/gpu/drm/drm_crtc.c +++ b/drivers/gpu/drm/drm_crtc.c
@@ -33,6 +33,7 @@ #include <linux/list.h> #include <linux/slab.h> #include <linux/export.h> +#include <linux/fence.h> #include <drm/drmP.h> #include <drm/drm_crtc.h> #include <drm/drm_edid.h> @@ -141,6 +142,54 @@ static void drm_crtc_unregister_all(struct drm_device *dev) } } +static const struct fence_ops drm_crtc_fence_ops; + +static struct drm_crtc *fence_to_crtc(struct fence *fence) +{ + BUG_ON(fence->ops != &drm_crtc_fence_ops); + return container_of(fence->lock, struct drm_crtc, fence_lock); +} + +static const char *drm_crtc_fence_get_driver_name(struct fence *fence) +{ + struct drm_crtc *crtc = fence_to_crtc(fence); + + return crtc->dev->driver->name; +} + +static const char *drm_crtc_fence_get_timeline_name(struct fence *fence) +{ + struct drm_crtc *crtc = fence_to_crtc(fence); + + return crtc->timeline_name; +} + +static bool drm_crtc_fence_enable_signaling(struct fence *fence) +{ + return true; +} + +static const struct fence_ops drm_crtc_fence_ops = { + .get_driver_name = drm_crtc_fence_get_driver_name, + .get_timeline_name = drm_crtc_fence_get_timeline_name, + .enable_signaling = drm_crtc_fence_enable_signaling, + .wait = fence_default_wait, +}; + +struct fence *drm_crtc_create_fence(struct drm_crtc *crtc) +{ + struct fence *fence; + + fence = kzalloc(sizeof(*fence), GFP_KERNEL); + if (!fence) + return NULL; + + fence_init(fence, &drm_crtc_fence_ops, &crtc->fence_lock, + crtc->fence_context, ++crtc->fence_seqno); + + return fence; +} + /** * drm_crtc_init_with_planes - Initialise a new CRTC object with * specified primary and cursor planes. @@ -198,6 +247,11 @@ int drm_crtc_init_with_planes(struct drm_device *dev, struct drm_crtc *crtc, return -ENOMEM; } + crtc->fence_context = fence_context_alloc(1); + spin_lock_init(&crtc->fence_lock); + snprintf(crtc->timeline_name, sizeof(crtc->timeline_name), + "CRTC:%d-%s", crtc->base.id, crtc->name); + crtc->base.properties = &crtc->properties; list_add_tail(&crtc->head, &config->crtc_list); @@ -213,6 +267,8 @@ int drm_crtc_init_with_planes(struct drm_device *dev, struct drm_crtc *crtc, if (drm_core_check_feature(dev, DRIVER_ATOMIC)) { drm_object_attach_property(&crtc->base, config->prop_active, 0); drm_object_attach_property(&crtc->base, config->prop_mode_id, 0); + drm_object_attach_property(&crtc->base, + config->prop_out_fence_ptr, 0); } return 0; @@ -365,6 +421,18 @@ static int drm_mode_create_standard_properties(struct drm_device *dev) return -ENOMEM; dev->mode_config.prop_fb_id = prop; + prop = drm_property_create_signed_range(dev, DRM_MODE_PROP_ATOMIC, + "IN_FENCE_FD", -1, INT_MAX); + if (!prop) + return -ENOMEM; + dev->mode_config.prop_in_fence_fd = prop; + + prop = drm_property_create_range(dev, DRM_MODE_PROP_ATOMIC, + "OUT_FENCE_PTR", 0, U64_MAX); + if (!prop) + return -ENOMEM; + dev->mode_config.prop_out_fence_ptr = prop; + prop = drm_property_create_object(dev, DRM_MODE_PROP_ATOMIC, "CRTC_ID", DRM_MODE_OBJECT_CRTC); if (!prop)
diff --git a/drivers/gpu/drm/drm_crtc_internal.h b/drivers/gpu/drm/drm_crtc_internal.h index c48ba02..df2b51a 100644 --- a/drivers/gpu/drm/drm_crtc_internal.h +++ b/drivers/gpu/drm/drm_crtc_internal.h
@@ -41,6 +41,8 @@ int drm_crtc_check_viewport(const struct drm_crtc *crtc, const struct drm_display_mode *mode, const struct drm_framebuffer *fb); +struct fence *drm_crtc_create_fence(struct drm_crtc *crtc); + void drm_fb_release(struct drm_file *file_priv); /* dumb buffer support IOCTLs */
diff --git a/drivers/gpu/drm/drm_fb_cma_helper.c b/drivers/gpu/drm/drm_fb_cma_helper.c index 1fd6eac..52629b6 100644 --- a/drivers/gpu/drm/drm_fb_cma_helper.c +++ b/drivers/gpu/drm/drm_fb_cma_helper.c
@@ -18,13 +18,16 @@ */ #include <drm/drmP.h> +#include <drm/drm_atomic.h> #include <drm/drm_crtc.h> #include <drm/drm_fb_helper.h> #include <drm/drm_crtc_helper.h> #include <drm/drm_gem_cma_helper.h> #include <drm/drm_fb_cma_helper.h> +#include <linux/dma-buf.h> #include <linux/dma-mapping.h> #include <linux/module.h> +#include <linux/reservation.h> #define DEFAULT_FBDEFIO_DELAY_MS 50 @@ -265,6 +268,38 @@ struct drm_gem_cma_object *drm_fb_cma_get_gem_obj(struct drm_framebuffer *fb, } EXPORT_SYMBOL_GPL(drm_fb_cma_get_gem_obj); +/** + * drm_fb_cma_prepare_fb() - Prepare CMA framebuffer + * @plane: Which plane + * @state: Plane state attach fence to + * + * This should be put into prepare_fb hook of struct &drm_plane_helper_funcs . + * + * This function checks if the plane FB has an dma-buf attached, extracts + * the exclusive fence and attaches it to plane state for the atomic helper + * to wait on. + * + * There is no need for cleanup_fb for CMA based framebuffer drivers. + */ +int drm_fb_cma_prepare_fb(struct drm_plane *plane, + struct drm_plane_state *state) +{ + struct dma_buf *dma_buf; + struct fence *fence; + + if ((plane->state->fb == state->fb) || !state->fb) + return 0; + + dma_buf = drm_fb_cma_get_gem_obj(state->fb, 0)->base.dma_buf; + if (dma_buf) { + fence = reservation_object_get_excl_rcu(dma_buf->resv); + drm_atomic_set_fence_for_plane(state, fence); + } + + return 0; +} +EXPORT_SYMBOL_GPL(drm_fb_cma_prepare_fb); + #ifdef CONFIG_DEBUG_FS static void drm_fb_cma_describe(struct drm_framebuffer *fb, struct seq_file *m) {
diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c index c37b7b5..129b80b 100644 --- a/drivers/gpu/drm/drm_fops.c +++ b/drivers/gpu/drm/drm_fops.c
@@ -664,6 +664,10 @@ void drm_event_cancel_free(struct drm_device *dev, list_del(&p->pending_link); } spin_unlock_irqrestore(&dev->event_lock, flags); + + if (p->fence) + fence_put(p->fence); + kfree(p); } EXPORT_SYMBOL(drm_event_cancel_free);
diff --git a/drivers/gpu/drm/drm_plane.c b/drivers/gpu/drm/drm_plane.c index 249c0ae..3957ef8 100644 --- a/drivers/gpu/drm/drm_plane.c +++ b/drivers/gpu/drm/drm_plane.c
@@ -137,6 +137,7 @@ int drm_universal_plane_init(struct drm_device *dev, struct drm_plane *plane, if (drm_core_check_feature(dev, DRIVER_ATOMIC)) { drm_object_attach_property(&plane->base, config->prop_fb_id, 0); + drm_object_attach_property(&plane->base, config->prop_in_fence_fd, -1); drm_object_attach_property(&plane->base, config->prop_crtc_id, 0); drm_object_attach_property(&plane->base, config->prop_crtc_x, 0); drm_object_attach_property(&plane->base, config->prop_crtc_y, 0);
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c index 83bf997..5e67e8b 100644 --- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c +++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c
@@ -218,9 +218,10 @@ mdp5_plane_duplicate_state(struct drm_plane *plane) mdp5_state = kmemdup(to_mdp5_plane_state(plane->state), sizeof(*mdp5_state), GFP_KERNEL); + if (!mdp5_state) + return NULL; - if (mdp5_state && mdp5_state->base.fb) - drm_framebuffer_reference(mdp5_state->base.fb); + __drm_atomic_helper_plane_duplicate_state(plane, &mdp5_state->base); mdp5_state->mode_changed = false; mdp5_state->pending = false;
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 52a2a1a..5a18408 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1606,7 +1606,14 @@ EXPORT_SYMBOL(ttm_bo_unmap_virtual); int ttm_bo_wait(struct ttm_buffer_object *bo, bool interruptible, bool no_wait) { - long timeout = no_wait ? 0 : 15 * HZ; + long timeout = 15 * HZ; + + if (no_wait) { + if (reservation_object_test_signaled_rcu(bo->resv, true)) + return 0; + else + return -EBUSY; + } timeout = reservation_object_wait_timeout_rcu(bo->resv, true, interruptible, timeout);
diff --git a/drivers/hid/hid-magicmouse.c b/drivers/hid/hid-magicmouse.c index 20b40ad2..42ed887 100644 --- a/drivers/hid/hid-magicmouse.c +++ b/drivers/hid/hid-magicmouse.c
@@ -34,7 +34,8 @@ module_param(emulate_scroll_wheel, bool, 0644); MODULE_PARM_DESC(emulate_scroll_wheel, "Emulate a scroll wheel"); static unsigned int scroll_speed = 32; -static int param_set_scroll_speed(const char *val, struct kernel_param *kp) { +static int param_set_scroll_speed(const char *val, + const struct kernel_param *kp) { unsigned long speed; if (!val || kstrtoul(val, 0, &speed) || speed > 63) return -EINVAL;
diff --git a/drivers/hid/uhid.c b/drivers/hid/uhid.c index 7f8ff39..e46f656 100644 --- a/drivers/hid/uhid.c +++ b/drivers/hid/uhid.c
@@ -28,6 +28,8 @@ #define UHID_NAME "uhid" #define UHID_BUFSIZE 32 +static DEFINE_MUTEX(uhid_open_mutex); + struct uhid_device { struct mutex devlock; bool running; @@ -142,15 +144,26 @@ static void uhid_hid_stop(struct hid_device *hid) static int uhid_hid_open(struct hid_device *hid) { struct uhid_device *uhid = hid->driver_data; + int retval = 0; - return uhid_queue_event(uhid, UHID_OPEN); + mutex_lock(&uhid_open_mutex); + if (!hid->open++) { + retval = uhid_queue_event(uhid, UHID_OPEN); + if (retval) + hid->open--; + } + mutex_unlock(&uhid_open_mutex); + return retval; } static void uhid_hid_close(struct hid_device *hid) { struct uhid_device *uhid = hid->driver_data; - uhid_queue_event(uhid, UHID_CLOSE); + mutex_lock(&uhid_open_mutex); + if (!--hid->open) + uhid_queue_event(uhid, UHID_CLOSE); + mutex_unlock(&uhid_open_mutex); } static int uhid_hid_parse(struct hid_device *hid)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index 2cd7c71..df63315 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -201,9 +201,23 @@ static void *etm_setup_aux(int event_cpu, void **pages, event_data = alloc_event_data(event_cpu); if (!event_data) return NULL; - INIT_WORK(&event_data->work, free_event_data); + /* + * In theory nothing prevent tracers in a trace session from being + * associated with different sinks, nor having a sink per tracer. But + * until we have HW with this kind of topology we need to assume tracers + * in a trace session are using the same sink. Therefore go through + * the coresight bus and pick the first enabled sink. + * + * When operated from sysFS users are responsible to enable the sink + * while from perf, the perf tools will do it based on the choice made + * on the cmd line. As such the "enable_sink" flag in sysFS is reset. + */ + sink = coresight_get_enabled_sink(true); + if (!sink) + goto err; + mask = &event_data->mask; /* Setup the path for each CPU in a trace session */ @@ -219,28 +233,15 @@ static void *etm_setup_aux(int event_cpu, void **pages, * list of devices from source to sink that can be * referenced later when the path is actually needed. */ - event_data->path[cpu] = coresight_build_path(csdev); + event_data->path[cpu] = coresight_build_path(csdev, sink); if (IS_ERR(event_data->path[cpu])) goto err; } - /* - * In theory nothing prevent tracers in a trace session from being - * associated with different sinks, nor having a sink per tracer. But - * until we have HW with this kind of topology and a way to convey - * sink assignement from the perf cmd line we need to assume tracers - * in a trace session are using the same sink. Therefore pick the sink - * found at the end of the first available path. - */ - cpu = cpumask_first(mask); - /* Grab the sink at the end of the path */ - sink = coresight_get_sink(event_data->path[cpu]); - if (!sink) - goto err; - if (!sink_ops(sink)->alloc_buffer) goto err; + cpu = cpumask_first(mask); /* Get the AUX specific data from the sink buffer */ event_data->snk_config = sink_ops(sink)->alloc_buffer(sink, cpu, pages,
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c index 4db8d6a..83358eb 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.c +++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -1026,7 +1026,8 @@ static int etm4_probe(struct amba_device *adev, const struct amba_id *id) } pm_runtime_put(&adev->dev); - dev_info(dev, "%s initialized\n", (char *)id->data); + dev_info(dev, "CPU%d: %s initialized\n", + drvdata->cpu, (char *)id->data); if (boot_enable) { coresight_enable(drvdata->csdev); @@ -1045,20 +1046,25 @@ static int etm4_probe(struct amba_device *adev, const struct amba_id *id) } static struct amba_id etm4_ids[] = { - { /* ETM 4.0 - Cortex-A53 */ + { .id = 0x000bb95d, .mask = 0x000fffff, - .data = "ETM 4.0", + .data = "Cortex-A53 ETM v4.0", }, - { /* ETM 4.0 - Cortex-A57 */ + { .id = 0x000bb95e, .mask = 0x000fffff, - .data = "ETM 4.0", + .data = "Cortex-A57 ETM v4.0", }, - { /* ETM 4.0 - A72, Maia, HiSilicon */ + { .id = 0x000bb95a, .mask = 0x000fffff, - .data = "ETM 4.0", + .data = "Cortex-A72 ETM v4.0", + }, + { + .id = 0x000bb959, + .mask = 0x000fffff, + .data = "Cortex-A73 ETM v4.0", }, { 0, 0}, };
diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h index 196a14b..ef9d8e9 100644 --- a/drivers/hwtracing/coresight/coresight-priv.h +++ b/drivers/hwtracing/coresight/coresight-priv.h
@@ -111,7 +111,9 @@ static inline void CS_UNLOCK(void __iomem *addr) void coresight_disable_path(struct list_head *path); int coresight_enable_path(struct list_head *path, u32 mode); struct coresight_device *coresight_get_sink(struct list_head *path); -struct list_head *coresight_build_path(struct coresight_device *csdev); +struct coresight_device *coresight_get_enabled_sink(bool reset); +struct list_head *coresight_build_path(struct coresight_device *csdev, + struct coresight_device *sink); void coresight_release_path(struct list_head *path); #ifdef CONFIG_CORESIGHT_SOURCE_ETM3X
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c index d6941ea..1549436 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -70,7 +70,7 @@ static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata) * When operating in sysFS mode the content of the buffer needs to be * read before the TMC is disabled. */ - if (local_read(&drvdata->mode) == CS_MODE_SYSFS) + if (drvdata->mode == CS_MODE_SYSFS) tmc_etb_dump_hw(drvdata); tmc_disable_hw(drvdata); @@ -103,19 +103,14 @@ static void tmc_etf_disable_hw(struct tmc_drvdata *drvdata) CS_LOCK(drvdata->base); } -static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev, u32 mode) +static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev) { int ret = 0; bool used = false; char *buf = NULL; - long val; unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); - /* This shouldn't be happening */ - if (WARN_ON(mode != CS_MODE_SYSFS)) - return -EINVAL; - /* * If we don't have a buffer release the lock and allocate memory. * Otherwise keep the lock and move along. @@ -138,13 +133,12 @@ static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev, u32 mode) goto out; } - val = local_xchg(&drvdata->mode, mode); /* * In sysFS mode we can have multiple writers per sink. Since this * sink is already enabled no memory is needed and the HW need not be * touched. */ - if (val == CS_MODE_SYSFS) + if (drvdata->mode == CS_MODE_SYSFS) goto out; /* @@ -163,6 +157,7 @@ static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev, u32 mode) drvdata->buf = buf; } + drvdata->mode = CS_MODE_SYSFS; tmc_etb_enable_hw(drvdata); out: spin_unlock_irqrestore(&drvdata->spinlock, flags); @@ -177,34 +172,29 @@ static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev, u32 mode) return ret; } -static int tmc_enable_etf_sink_perf(struct coresight_device *csdev, u32 mode) +static int tmc_enable_etf_sink_perf(struct coresight_device *csdev) { int ret = 0; - long val; unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); - /* This shouldn't be happening */ - if (WARN_ON(mode != CS_MODE_PERF)) - return -EINVAL; - spin_lock_irqsave(&drvdata->spinlock, flags); if (drvdata->reading) { ret = -EINVAL; goto out; } - val = local_xchg(&drvdata->mode, mode); /* * In Perf mode there can be only one writer per sink. There * is also no need to continue if the ETB/ETR is already operated * from sysFS. */ - if (val != CS_MODE_DISABLED) { + if (drvdata->mode != CS_MODE_DISABLED) { ret = -EINVAL; goto out; } + drvdata->mode = CS_MODE_PERF; tmc_etb_enable_hw(drvdata); out: spin_unlock_irqrestore(&drvdata->spinlock, flags); @@ -216,9 +206,9 @@ static int tmc_enable_etf_sink(struct coresight_device *csdev, u32 mode) { switch (mode) { case CS_MODE_SYSFS: - return tmc_enable_etf_sink_sysfs(csdev, mode); + return tmc_enable_etf_sink_sysfs(csdev); case CS_MODE_PERF: - return tmc_enable_etf_sink_perf(csdev, mode); + return tmc_enable_etf_sink_perf(csdev); } /* We shouldn't be here */ @@ -227,7 +217,6 @@ static int tmc_enable_etf_sink(struct coresight_device *csdev, u32 mode) static void tmc_disable_etf_sink(struct coresight_device *csdev) { - long val; unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); @@ -237,10 +226,11 @@ static void tmc_disable_etf_sink(struct coresight_device *csdev) return; } - val = local_xchg(&drvdata->mode, CS_MODE_DISABLED); /* Disable the TMC only if it needs to */ - if (val != CS_MODE_DISABLED) + if (drvdata->mode != CS_MODE_DISABLED) { tmc_etb_disable_hw(drvdata); + drvdata->mode = CS_MODE_DISABLED; + } spin_unlock_irqrestore(&drvdata->spinlock, flags); @@ -260,7 +250,7 @@ static int tmc_enable_etf_link(struct coresight_device *csdev, } tmc_etf_enable_hw(drvdata); - local_set(&drvdata->mode, CS_MODE_SYSFS); + drvdata->mode = CS_MODE_SYSFS; spin_unlock_irqrestore(&drvdata->spinlock, flags); dev_info(drvdata->dev, "TMC-ETF enabled\n"); @@ -280,7 +270,7 @@ static void tmc_disable_etf_link(struct coresight_device *csdev, } tmc_etf_disable_hw(drvdata); - local_set(&drvdata->mode, CS_MODE_DISABLED); + drvdata->mode = CS_MODE_DISABLED; spin_unlock_irqrestore(&drvdata->spinlock, flags); dev_info(drvdata->dev, "TMC disabled\n"); @@ -383,7 +373,7 @@ static void tmc_update_etf_buffer(struct coresight_device *csdev, return; /* This shouldn't happen */ - if (WARN_ON_ONCE(local_read(&drvdata->mode) != CS_MODE_PERF)) + if (WARN_ON_ONCE(drvdata->mode != CS_MODE_PERF)) return; CS_UNLOCK(drvdata->base); @@ -504,7 +494,6 @@ const struct coresight_ops tmc_etf_cs_ops = { int tmc_read_prepare_etb(struct tmc_drvdata *drvdata) { - long val; enum tmc_mode mode; int ret = 0; unsigned long flags; @@ -528,9 +517,8 @@ int tmc_read_prepare_etb(struct tmc_drvdata *drvdata) goto out; } - val = local_read(&drvdata->mode); /* Don't interfere if operated from Perf */ - if (val == CS_MODE_PERF) { + if (drvdata->mode == CS_MODE_PERF) { ret = -EINVAL; goto out; } @@ -542,7 +530,7 @@ int tmc_read_prepare_etb(struct tmc_drvdata *drvdata) } /* Disable the TMC if need be */ - if (val == CS_MODE_SYSFS) + if (drvdata->mode == CS_MODE_SYSFS) tmc_etb_disable_hw(drvdata); drvdata->reading = true; @@ -573,7 +561,7 @@ int tmc_read_unprepare_etb(struct tmc_drvdata *drvdata) } /* Re-enable the TMC if need be */ - if (local_read(&drvdata->mode) == CS_MODE_SYSFS) { + if (drvdata->mode == CS_MODE_SYSFS) { /* * The trace run will continue with the same allocated trace * buffer. As such zero-out the buffer so that we don't end
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index 886ea83..2db4857 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -15,11 +15,30 @@ * this program. If not, see <http://www.gnu.org/licenses/>. */ +#include <linux/circ_buf.h> #include <linux/coresight.h> #include <linux/dma-mapping.h> +#include <linux/slab.h> + #include "coresight-priv.h" #include "coresight-tmc.h" +/** + * struct cs_etr_buffer - keep track of a recording session' specifics + * @tmc: generic portion of the TMC buffers + * @paddr: the physical address of a DMA'able contiguous memory area + * @vaddr: the virtual address associated to @paddr + * @size: how much memory we have, starting at @paddr + * @dev: the device @vaddr has been tied to + */ +struct cs_etr_buffers { + struct cs_buffers tmc; + dma_addr_t paddr; + void __iomem *vaddr; + u32 size; + struct device *dev; +}; + static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata) { u32 axictl; @@ -86,26 +105,22 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata) * When operating in sysFS mode the content of the buffer needs to be * read before the TMC is disabled. */ - if (local_read(&drvdata->mode) == CS_MODE_SYSFS) + if (drvdata->mode == CS_MODE_SYSFS) tmc_etr_dump_hw(drvdata); tmc_disable_hw(drvdata); CS_LOCK(drvdata->base); } -static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev, u32 mode) +static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) { int ret = 0; bool used = false; - long val; unsigned long flags; void __iomem *vaddr = NULL; dma_addr_t paddr; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); - /* This shouldn't be happening */ - if (WARN_ON(mode != CS_MODE_SYSFS)) - return -EINVAL; /* * If we don't have a buffer release the lock and allocate memory. @@ -134,13 +149,12 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev, u32 mode) goto out; } - val = local_xchg(&drvdata->mode, mode); /* * In sysFS mode we can have multiple writers per sink. Since this * sink is already enabled no memory is needed and the HW need not be * touched. */ - if (val == CS_MODE_SYSFS) + if (drvdata->mode == CS_MODE_SYSFS) goto out; /* @@ -155,8 +169,7 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev, u32 mode) drvdata->buf = drvdata->vaddr; } - memset(drvdata->vaddr, 0, drvdata->size); - + drvdata->mode = CS_MODE_SYSFS; tmc_etr_enable_hw(drvdata); out: spin_unlock_irqrestore(&drvdata->spinlock, flags); @@ -171,34 +184,29 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev, u32 mode) return ret; } -static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, u32 mode) +static int tmc_enable_etr_sink_perf(struct coresight_device *csdev) { int ret = 0; - long val; unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); - /* This shouldn't be happening */ - if (WARN_ON(mode != CS_MODE_PERF)) - return -EINVAL; - spin_lock_irqsave(&drvdata->spinlock, flags); if (drvdata->reading) { ret = -EINVAL; goto out; } - val = local_xchg(&drvdata->mode, mode); /* * In Perf mode there can be only one writer per sink. There * is also no need to continue if the ETR is already operated * from sysFS. */ - if (val != CS_MODE_DISABLED) { + if (drvdata->mode != CS_MODE_DISABLED) { ret = -EINVAL; goto out; } + drvdata->mode = CS_MODE_PERF; tmc_etr_enable_hw(drvdata); out: spin_unlock_irqrestore(&drvdata->spinlock, flags); @@ -210,9 +218,9 @@ static int tmc_enable_etr_sink(struct coresight_device *csdev, u32 mode) { switch (mode) { case CS_MODE_SYSFS: - return tmc_enable_etr_sink_sysfs(csdev, mode); + return tmc_enable_etr_sink_sysfs(csdev); case CS_MODE_PERF: - return tmc_enable_etr_sink_perf(csdev, mode); + return tmc_enable_etr_sink_perf(csdev); } /* We shouldn't be here */ @@ -221,7 +229,6 @@ static int tmc_enable_etr_sink(struct coresight_device *csdev, u32 mode) static void tmc_disable_etr_sink(struct coresight_device *csdev) { - long val; unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); @@ -231,19 +238,244 @@ static void tmc_disable_etr_sink(struct coresight_device *csdev) return; } - val = local_xchg(&drvdata->mode, CS_MODE_DISABLED); /* Disable the TMC only if it needs to */ - if (val != CS_MODE_DISABLED) + if (drvdata->mode != CS_MODE_DISABLED) { tmc_etr_disable_hw(drvdata); + drvdata->mode = CS_MODE_DISABLED; + } spin_unlock_irqrestore(&drvdata->spinlock, flags); dev_info(drvdata->dev, "TMC-ETR disabled\n"); } +static void *tmc_alloc_etr_buffer(struct coresight_device *csdev, int cpu, + void **pages, int nr_pages, bool overwrite) +{ + int node; + struct cs_etr_buffers *buf; + struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); + + if (cpu == -1) + cpu = smp_processor_id(); + node = cpu_to_node(cpu); + + /* Allocate memory structure for interaction with Perf */ + buf = kzalloc_node(sizeof(struct cs_etr_buffers), GFP_KERNEL, node); + if (!buf) + return NULL; + + buf->dev = drvdata->dev; + buf->size = drvdata->size; + buf->vaddr = dma_alloc_coherent(buf->dev, buf->size, + &buf->paddr, GFP_KERNEL); + if (!buf->vaddr) { + kfree(buf); + return NULL; + } + + buf->tmc.snapshot = overwrite; + buf->tmc.nr_pages = nr_pages; + buf->tmc.data_pages = pages; + + return buf; +} + +static void tmc_free_etr_buffer(void *config) +{ + struct cs_etr_buffers *buf = config; + + dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->paddr); + kfree(buf); +} + +static int tmc_set_etr_buffer(struct coresight_device *csdev, + struct perf_output_handle *handle, + void *sink_config) +{ + int ret = 0; + unsigned long head; + struct cs_etr_buffers *buf = sink_config; + struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); + + /* wrap head around to the amount of space we have */ + head = handle->head & ((buf->tmc.nr_pages << PAGE_SHIFT) - 1); + + /* find the page to write to */ + buf->tmc.cur = head / PAGE_SIZE; + + /* and offset within that page */ + buf->tmc.offset = head % PAGE_SIZE; + + local_set(&buf->tmc.data_size, 0); + + /* Tell the HW where to put the trace data */ + drvdata->vaddr = buf->vaddr; + drvdata->paddr = buf->paddr; + memset(drvdata->vaddr, 0, drvdata->size); + + return ret; +} + +static unsigned long tmc_reset_etr_buffer(struct coresight_device *csdev, + struct perf_output_handle *handle, + void *sink_config, bool *lost) +{ + long size = 0; + struct cs_etr_buffers *buf = sink_config; + struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); + + if (buf) { + /* + * In snapshot mode ->data_size holds the new address of the + * ring buffer's head. The size itself is the whole address + * range since we want the latest information. + */ + if (buf->tmc.snapshot) { + size = buf->tmc.nr_pages << PAGE_SHIFT; + handle->head = local_xchg(&buf->tmc.data_size, size); + } + + /* + * Tell the tracer PMU how much we got in this run and if + * something went wrong along the way. Nobody else can use + * this cs_etr_buffers instance until we are done. As such + * resetting parameters here and squaring off with the ring + * buffer API in the tracer PMU is fine. + */ + *lost = !!local_xchg(&buf->tmc.lost, 0); + size = local_xchg(&buf->tmc.data_size, 0); + } + + /* Get ready for another run */ + drvdata->vaddr = NULL; + drvdata->paddr = 0; + + return size; +} + +static void tmc_update_etr_buffer(struct coresight_device *csdev, + struct perf_output_handle *handle, + void *sink_config) +{ + int i, cur; + u32 *buf_ptr; + u32 read_ptr, write_ptr; + u32 status, to_read; + unsigned long offset; + struct cs_buffers *buf = sink_config; + struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); + + if (!buf) + return; + + /* This shouldn't happen */ + if (WARN_ON_ONCE(drvdata->mode != CS_MODE_PERF)) + return; + + CS_UNLOCK(drvdata->base); + + tmc_flush_and_stop(drvdata); + + read_ptr = readl_relaxed(drvdata->base + TMC_RRP); + write_ptr = readl_relaxed(drvdata->base + TMC_RWP); + + /* + * Get a hold of the status register and see if a wrap around + * has occurred. If so adjust things accordingly. + */ + status = readl_relaxed(drvdata->base + TMC_STS); + if (status & TMC_STS_FULL) { + local_inc(&buf->lost); + to_read = drvdata->size; + } else { + to_read = CIRC_CNT(write_ptr, read_ptr, drvdata->size); + } + + /* + * The TMC RAM buffer may be bigger than the space available in the + * perf ring buffer (handle->size). If so advance the RRP so that we + * get the latest trace data. + */ + if (to_read > handle->size) { + u32 buffer_start, mask = 0; + + /* Read buffer start address in system memory */ + buffer_start = readl_relaxed(drvdata->base + TMC_DBALO); + + /* + * The value written to RRP must be byte-address aligned to + * the width of the trace memory databus _and_ to a frame + * boundary (16 byte), whichever is the biggest. For example, + * for 32-bit, 64-bit and 128-bit wide trace memory, the four + * LSBs must be 0s. For 256-bit wide trace memory, the five + * LSBs must be 0s. + */ + switch (drvdata->memwidth) { + case TMC_MEM_INTF_WIDTH_32BITS: + case TMC_MEM_INTF_WIDTH_64BITS: + case TMC_MEM_INTF_WIDTH_128BITS: + mask = GENMASK(31, 5); + break; + case TMC_MEM_INTF_WIDTH_256BITS: + mask = GENMASK(31, 6); + break; + } + + /* + * Make sure the new size is aligned in accordance with the + * requirement explained above. + */ + to_read = handle->size & mask; + /* Move the RAM read pointer up */ + read_ptr = (write_ptr + drvdata->size) - to_read; + /* Make sure we are still within our limits */ + if (read_ptr > (buffer_start + (drvdata->size - 1))) + read_ptr -= drvdata->size; + /* Tell the HW */ + writel_relaxed(read_ptr, drvdata->base + TMC_RRP); + local_inc(&buf->lost); + } + + cur = buf->cur; + offset = buf->offset; + + /* for every byte to read */ + for (i = 0; i < to_read; i += 4) { + buf_ptr = buf->data_pages[cur] + offset; + *buf_ptr = readl_relaxed(drvdata->base + TMC_RRD); + + offset += 4; + if (offset >= PAGE_SIZE) { + offset = 0; + cur++; + /* wrap around at the end of the buffer */ + cur &= buf->nr_pages - 1; + } + } + + /* + * In snapshot mode all we have to do is communicate to + * perf_aux_output_end() the address of the current head. In full + * trace mode the same function expects a size to move rb->aux_head + * forward. + */ + if (buf->snapshot) + local_set(&buf->data_size, (cur * PAGE_SIZE) + offset); + else + local_add(to_read, &buf->data_size); + + CS_LOCK(drvdata->base); +} + static const struct coresight_ops_sink tmc_etr_sink_ops = { .enable = tmc_enable_etr_sink, .disable = tmc_disable_etr_sink, + .alloc_buffer = tmc_alloc_etr_buffer, + .free_buffer = tmc_free_etr_buffer, + .set_buffer = tmc_set_etr_buffer, + .reset_buffer = tmc_reset_etr_buffer, + .update_buffer = tmc_update_etr_buffer, }; const struct coresight_ops tmc_etr_cs_ops = { @@ -253,7 +485,6 @@ const struct coresight_ops tmc_etr_cs_ops = { int tmc_read_prepare_etr(struct tmc_drvdata *drvdata) { int ret = 0; - long val; unsigned long flags; /* config types are set a boot time and never change */ @@ -266,9 +497,8 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata) goto out; } - val = local_read(&drvdata->mode); /* Don't interfere if operated from Perf */ - if (val == CS_MODE_PERF) { + if (drvdata->mode == CS_MODE_PERF) { ret = -EINVAL; goto out; } @@ -280,7 +510,7 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata) } /* Disable the TMC if need be */ - if (val == CS_MODE_SYSFS) + if (drvdata->mode == CS_MODE_SYSFS) tmc_etr_disable_hw(drvdata); drvdata->reading = true; @@ -303,7 +533,7 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata) spin_lock_irqsave(&drvdata->spinlock, flags); /* RE-enable the TMC if need be */ - if (local_read(&drvdata->mode) == CS_MODE_SYSFS) { + if (drvdata->mode == CS_MODE_SYSFS) { /* * The trace run will continue with the same allocated trace * buffer. The trace buffer is cleared in tmc_etr_enable_hw(),
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h index 44b3ae3..51c0185 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.h +++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -117,7 +117,7 @@ struct tmc_drvdata { void __iomem *vaddr; u32 size; u32 len; - local_t mode; + u32 mode; enum tmc_config_type config_type; enum tmc_mem_intf_width memwidth; u32 trigger_cntr;
diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c index 398e44a..cd41f40 100644 --- a/drivers/hwtracing/coresight/coresight.c +++ b/drivers/hwtracing/coresight/coresight.c
@@ -371,6 +371,52 @@ struct coresight_device *coresight_get_sink(struct list_head *path) return csdev; } +static int coresight_enabled_sink(struct device *dev, void *data) +{ + bool *reset = data; + struct coresight_device *csdev = to_coresight_device(dev); + + if ((csdev->type == CORESIGHT_DEV_TYPE_SINK || + csdev->type == CORESIGHT_DEV_TYPE_LINKSINK) && + csdev->activated) { + /* + * Now that we have a handle on the sink for this session, + * disable the sysFS "enable_sink" flag so that possible + * concurrent perf session that wish to use another sink don't + * trip on it. Doing so has no ramification for the current + * session. + */ + if (*reset) + csdev->activated = false; + + return 1; + } + + return 0; +} + +/** + * coresight_get_enabled_sink - returns the first enabled sink found on the bus + * @deactivate: Whether the 'enable_sink' flag should be reset + * + * When operated from perf the deactivate parameter should be set to 'true'. + * That way the "enabled_sink" flag of the sink that was selected can be reset, + * allowing for other concurrent perf sessions to choose a different sink. + * + * When operated from sysFS users have full control and as such the deactivate + * parameter should be set to 'false', hence mandating users to explicitly + * clear the flag. + */ +struct coresight_device *coresight_get_enabled_sink(bool deactivate) +{ + struct device *dev = NULL; + + dev = bus_find_device(&coresight_bustype, NULL, &deactivate, + coresight_enabled_sink); + + return dev ? to_coresight_device(dev) : NULL; +} + /** * _coresight_build_path - recursively build a path from a @csdev to a sink. * @csdev: The device to start from. @@ -383,6 +429,7 @@ struct coresight_device *coresight_get_sink(struct list_head *path) * last one. */ static int _coresight_build_path(struct coresight_device *csdev, + struct coresight_device *sink, struct list_head *path) { int i; @@ -390,15 +437,15 @@ static int _coresight_build_path(struct coresight_device *csdev, struct coresight_node *node; /* An activated sink has been found. Enqueue the element */ - if ((csdev->type == CORESIGHT_DEV_TYPE_SINK || - csdev->type == CORESIGHT_DEV_TYPE_LINKSINK) && csdev->activated) + if (csdev == sink) goto out; /* Not a sink - recursively explore each port found on this element */ for (i = 0; i < csdev->nr_outport; i++) { struct coresight_device *child_dev = csdev->conns[i].child_dev; - if (child_dev && _coresight_build_path(child_dev, path) == 0) { + if (child_dev && + _coresight_build_path(child_dev, sink, path) == 0) { found = true; break; } @@ -425,18 +472,22 @@ static int _coresight_build_path(struct coresight_device *csdev, return 0; } -struct list_head *coresight_build_path(struct coresight_device *csdev) +struct list_head *coresight_build_path(struct coresight_device *source, + struct coresight_device *sink) { struct list_head *path; int rc; + if (!sink) + return ERR_PTR(-EINVAL); + path = kzalloc(sizeof(struct list_head), GFP_KERNEL); if (!path) return ERR_PTR(-ENOMEM); INIT_LIST_HEAD(path); - rc = _coresight_build_path(csdev, path); + rc = _coresight_build_path(source, sink, path); if (rc) { kfree(path); return ERR_PTR(rc); @@ -500,6 +551,7 @@ static int coresight_validate_source(struct coresight_device *csdev, int coresight_enable(struct coresight_device *csdev) { int cpu, ret = 0; + struct coresight_device *sink; struct list_head *path; enum coresight_dev_subtype_source subtype; @@ -522,7 +574,17 @@ int coresight_enable(struct coresight_device *csdev) goto out; } - path = coresight_build_path(csdev); + /* + * Search for a valid sink for this session but don't reset the + * "enable_sink" flag in sysFS. Users get to do that explicitly. + */ + sink = coresight_get_enabled_sink(false); + if (!sink) { + ret = -EINVAL; + goto out; + } + + path = coresight_build_path(csdev, sink); if (IS_ERR(path)) { pr_err("building path(s) failed\n"); ret = PTR_ERR(path);
diff --git a/drivers/ide/ide.c b/drivers/ide/ide.c index d127ace..6ee866f 100644 --- a/drivers/ide/ide.c +++ b/drivers/ide/ide.c
@@ -244,7 +244,7 @@ struct chs_geom { static unsigned int ide_disks; static struct chs_geom ide_disks_chs[MAX_HWIFS * MAX_DRIVES]; -static int ide_set_disk_chs(const char *str, struct kernel_param *kp) +static int ide_set_disk_chs(const char *str, const struct kernel_param *kp) { unsigned int a, b, c = 0, h = 0, s = 0, i, j = 1; @@ -328,7 +328,7 @@ static void ide_dev_apply_params(ide_drive_t *drive, u8 unit) static unsigned int ide_ignore_cable; -static int ide_set_ignore_cable(const char *s, struct kernel_param *kp) +static int ide_set_ignore_cable(const char *s, const struct kernel_param *kp) { int i, j = 1;
diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c index cedb447..228cb4c 100644 --- a/drivers/infiniband/hw/qib/qib_iba7322.c +++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -150,7 +150,7 @@ static struct kparam_string kp_txselect = { .string = txselect_list, .maxlen = MAX_ATTEN_LEN }; -static int setup_txselect(const char *, struct kernel_param *); +static int setup_txselect(const char *, const struct kernel_param *); module_param_call(txselect, setup_txselect, param_get_string, &kp_txselect, S_IWUSR | S_IRUGO); MODULE_PARM_DESC(txselect, @@ -6177,7 +6177,7 @@ static void set_no_qsfp_atten(struct qib_devdata *dd, int change) } /* handle the txselect parameter changing */ -static int setup_txselect(const char *str, struct kernel_param *kp) +static int setup_txselect(const char *str, const struct kernel_param *kp) { struct qib_devdata *dd; unsigned long val;
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index fe7c6ec..32a26e7 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -80,7 +80,7 @@ module_param(srpt_srq_size, int, 0444); MODULE_PARM_DESC(srpt_srq_size, "Shared receive queue (SRQ) size."); -static int srpt_get_u64_x(char *buffer, struct kernel_param *kp) +static int srpt_get_u64_x(char *buffer, const struct kernel_param *kp) { return sprintf(buffer, "0x%016llx", *(u64 *)kp->arg); }
diff --git a/drivers/input/Kconfig b/drivers/input/Kconfig index 6261874..34ffa02 100644 --- a/drivers/input/Kconfig +++ b/drivers/input/Kconfig
@@ -187,6 +187,19 @@ To compile this driver as a module, choose M here: the module will be called apm-power. +config INPUT_KEYRESET + bool "Reset key" + depends on INPUT + select INPUT_KEYCOMBO + ---help--- + Say Y here if you want to reboot when some keys are pressed; + +config INPUT_KEYCOMBO + bool "Key combo" + depends on INPUT + ---help--- + Say Y here if you want to take action when some keys are pressed; + comment "Input Device Drivers" source "drivers/input/keyboard/Kconfig"
diff --git a/drivers/input/Makefile b/drivers/input/Makefile index 595820b..6a3281c 100644 --- a/drivers/input/Makefile +++ b/drivers/input/Makefile
@@ -26,5 +26,7 @@ obj-$(CONFIG_INPUT_MISC) += misc/ obj-$(CONFIG_INPUT_APMPOWER) += apm-power.o +obj-$(CONFIG_INPUT_KEYRESET) += keyreset.o +obj-$(CONFIG_INPUT_KEYCOMBO) += keycombo.o obj-$(CONFIG_RMI4_CORE) += rmi4/
diff --git a/drivers/input/keyboard/goldfish_events.c b/drivers/input/keyboard/goldfish_events.c index f6e643b..c877e56 100644 --- a/drivers/input/keyboard/goldfish_events.c +++ b/drivers/input/keyboard/goldfish_events.c
@@ -17,6 +17,7 @@ #include <linux/interrupt.h> #include <linux/types.h> #include <linux/input.h> +#include <linux/input/mt.h> #include <linux/kernel.h> #include <linux/platform_device.h> #include <linux/slab.h> @@ -24,6 +25,8 @@ #include <linux/io.h> #include <linux/acpi.h> +#define GOLDFISH_MAX_FINGERS 5 + enum { REG_READ = 0x00, REG_SET_PAGE = 0x00, @@ -52,7 +55,21 @@ static irqreturn_t events_interrupt(int irq, void *dev_id) value = __raw_readl(edev->addr + REG_READ); input_event(edev->input, type, code, value); - input_sync(edev->input); + // Send an extra (EV_SYN, SYN_REPORT, 0x0) event + // if a key was pressed. Some keyboard device + // drivers may only send the EV_KEY event and + // not EV_SYN. + // Note that sending an extra SYN_REPORT is not + // necessary nor correct protocol with other + // devices such as touchscreens, which will send + // their own SYN_REPORT's when sufficient event + // information has been collected (e.g., for + // touchscreens, when pressure and X/Y coordinates + // have been received). Hence, we will only send + // this extra SYN_REPORT if type == EV_KEY. + if (type == EV_KEY) { + input_sync(edev->input); + } return IRQ_HANDLED; } @@ -154,6 +171,15 @@ static int events_probe(struct platform_device *pdev) input_dev->name = edev->name; input_dev->id.bustype = BUS_HOST; + // Set the Goldfish Device to be multi-touch. + // In the Ranchu kernel, there is multi-touch-specific + // code for handling ABS_MT_SLOT events. + // See drivers/input/input.c:input_handle_abs_event. + // If we do not issue input_mt_init_slots, + // the kernel will filter out needed ABS_MT_SLOT + // events when we touch the screen in more than one place, + // preventing multi-touch with more than one finger from working. + input_mt_init_slots(input_dev, GOLDFISH_MAX_FINGERS, 0); events_import_bits(edev, input_dev->evbit, EV_SYN, EV_MAX); events_import_bits(edev, input_dev->keybit, EV_KEY, KEY_MAX);
diff --git a/drivers/input/keycombo.c b/drivers/input/keycombo.c new file mode 100644 index 0000000..2fba451 --- /dev/null +++ b/drivers/input/keycombo.c
@@ -0,0 +1,261 @@ +/* drivers/input/keycombo.c + * + * Copyright (C) 2014 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/input.h> +#include <linux/keycombo.h> +#include <linux/module.h> +#include <linux/platform_device.h> +#include <linux/reboot.h> +#include <linux/sched.h> +#include <linux/slab.h> + +struct keycombo_state { + struct input_handler input_handler; + unsigned long keybit[BITS_TO_LONGS(KEY_CNT)]; + unsigned long upbit[BITS_TO_LONGS(KEY_CNT)]; + unsigned long key[BITS_TO_LONGS(KEY_CNT)]; + spinlock_t lock; + struct workqueue_struct *wq; + int key_down_target; + int key_down; + int key_up; + struct delayed_work key_down_work; + int delay; + struct work_struct key_up_work; + void (*key_up_fn)(void *); + void (*key_down_fn)(void *); + void *priv; + int key_is_down; + struct wakeup_source combo_held_wake_source; + struct wakeup_source combo_up_wake_source; +}; + +static void do_key_down(struct work_struct *work) +{ + struct delayed_work *dwork = container_of(work, struct delayed_work, + work); + struct keycombo_state *state = container_of(dwork, + struct keycombo_state, key_down_work); + if (state->key_down_fn) + state->key_down_fn(state->priv); +} + +static void do_key_up(struct work_struct *work) +{ + struct keycombo_state *state = container_of(work, struct keycombo_state, + key_up_work); + if (state->key_up_fn) + state->key_up_fn(state->priv); + __pm_relax(&state->combo_up_wake_source); +} + +static void keycombo_event(struct input_handle *handle, unsigned int type, + unsigned int code, int value) +{ + unsigned long flags; + struct keycombo_state *state = handle->private; + + if (type != EV_KEY) + return; + + if (code >= KEY_MAX) + return; + + if (!test_bit(code, state->keybit)) + return; + + spin_lock_irqsave(&state->lock, flags); + if (!test_bit(code, state->key) == !value) + goto done; + __change_bit(code, state->key); + if (test_bit(code, state->upbit)) { + if (value) + state->key_up++; + else + state->key_up--; + } else { + if (value) + state->key_down++; + else + state->key_down--; + } + if (state->key_down == state->key_down_target && state->key_up == 0) { + __pm_stay_awake(&state->combo_held_wake_source); + state->key_is_down = 1; + if (queue_delayed_work(state->wq, &state->key_down_work, + state->delay)) + pr_debug("Key down work already queued!"); + } else if (state->key_is_down) { + if (!cancel_delayed_work(&state->key_down_work)) { + __pm_stay_awake(&state->combo_up_wake_source); + queue_work(state->wq, &state->key_up_work); + } + __pm_relax(&state->combo_held_wake_source); + state->key_is_down = 0; + } +done: + spin_unlock_irqrestore(&state->lock, flags); +} + +static int keycombo_connect(struct input_handler *handler, + struct input_dev *dev, + const struct input_device_id *id) +{ + int i; + int ret; + struct input_handle *handle; + struct keycombo_state *state = + container_of(handler, struct keycombo_state, input_handler); + for (i = 0; i < KEY_MAX; i++) { + if (test_bit(i, state->keybit) && test_bit(i, dev->keybit)) + break; + } + if (i == KEY_MAX) + return -ENODEV; + + handle = kzalloc(sizeof(*handle), GFP_KERNEL); + if (!handle) + return -ENOMEM; + + handle->dev = dev; + handle->handler = handler; + handle->name = KEYCOMBO_NAME; + handle->private = state; + + ret = input_register_handle(handle); + if (ret) + goto err_input_register_handle; + + ret = input_open_device(handle); + if (ret) + goto err_input_open_device; + + return 0; + +err_input_open_device: + input_unregister_handle(handle); +err_input_register_handle: + kfree(handle); + return ret; +} + +static void keycombo_disconnect(struct input_handle *handle) +{ + input_close_device(handle); + input_unregister_handle(handle); + kfree(handle); +} + +static const struct input_device_id keycombo_ids[] = { + { + .flags = INPUT_DEVICE_ID_MATCH_EVBIT, + .evbit = { BIT_MASK(EV_KEY) }, + }, + { }, +}; +MODULE_DEVICE_TABLE(input, keycombo_ids); + +static int keycombo_probe(struct platform_device *pdev) +{ + int ret; + int key, *keyp; + struct keycombo_state *state; + struct keycombo_platform_data *pdata = pdev->dev.platform_data; + + if (!pdata) + return -EINVAL; + + state = kzalloc(sizeof(*state), GFP_KERNEL); + if (!state) + return -ENOMEM; + + spin_lock_init(&state->lock); + keyp = pdata->keys_down; + while ((key = *keyp++)) { + if (key >= KEY_MAX) + continue; + state->key_down_target++; + __set_bit(key, state->keybit); + } + if (pdata->keys_up) { + keyp = pdata->keys_up; + while ((key = *keyp++)) { + if (key >= KEY_MAX) + continue; + __set_bit(key, state->keybit); + __set_bit(key, state->upbit); + } + } + + state->wq = alloc_ordered_workqueue("keycombo", 0); + if (!state->wq) + return -ENOMEM; + + state->priv = pdata->priv; + + if (pdata->key_down_fn) + state->key_down_fn = pdata->key_down_fn; + INIT_DELAYED_WORK(&state->key_down_work, do_key_down); + + if (pdata->key_up_fn) + state->key_up_fn = pdata->key_up_fn; + INIT_WORK(&state->key_up_work, do_key_up); + + wakeup_source_init(&state->combo_held_wake_source, "key combo"); + wakeup_source_init(&state->combo_up_wake_source, "key combo up"); + state->delay = msecs_to_jiffies(pdata->key_down_delay); + + state->input_handler.event = keycombo_event; + state->input_handler.connect = keycombo_connect; + state->input_handler.disconnect = keycombo_disconnect; + state->input_handler.name = KEYCOMBO_NAME; + state->input_handler.id_table = keycombo_ids; + ret = input_register_handler(&state->input_handler); + if (ret) { + kfree(state); + return ret; + } + platform_set_drvdata(pdev, state); + return 0; +} + +int keycombo_remove(struct platform_device *pdev) +{ + struct keycombo_state *state = platform_get_drvdata(pdev); + input_unregister_handler(&state->input_handler); + destroy_workqueue(state->wq); + kfree(state); + return 0; +} + + +struct platform_driver keycombo_driver = { + .driver.name = KEYCOMBO_NAME, + .probe = keycombo_probe, + .remove = keycombo_remove, +}; + +static int __init keycombo_init(void) +{ + return platform_driver_register(&keycombo_driver); +} + +static void __exit keycombo_exit(void) +{ + return platform_driver_unregister(&keycombo_driver); +} + +module_init(keycombo_init); +module_exit(keycombo_exit);
diff --git a/drivers/input/keyreset.c b/drivers/input/keyreset.c new file mode 100644 index 0000000..7e5222ae --- /dev/null +++ b/drivers/input/keyreset.c
@@ -0,0 +1,144 @@ +/* drivers/input/keyreset.c + * + * Copyright (C) 2014 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/input.h> +#include <linux/keyreset.h> +#include <linux/module.h> +#include <linux/platform_device.h> +#include <linux/reboot.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/syscalls.h> +#include <linux/keycombo.h> + +struct keyreset_state { + int restart_requested; + int (*reset_fn)(void); + struct platform_device *pdev_child; + struct work_struct restart_work; +}; + +static void do_restart(struct work_struct *unused) +{ + orderly_reboot(); +} + +static void do_reset_fn(void *priv) +{ + struct keyreset_state *state = priv; + if (state->restart_requested) + panic("keyboard reset failed, %d", state->restart_requested); + if (state->reset_fn) { + state->restart_requested = state->reset_fn(); + } else { + pr_info("keyboard reset\n"); + schedule_work(&state->restart_work); + state->restart_requested = 1; + } +} + +static int keyreset_probe(struct platform_device *pdev) +{ + int ret = -ENOMEM; + struct keycombo_platform_data *pdata_child; + struct keyreset_platform_data *pdata = pdev->dev.platform_data; + int up_size = 0, down_size = 0, size; + int key, *keyp; + struct keyreset_state *state; + + if (!pdata) + return -EINVAL; + state = devm_kzalloc(&pdev->dev, sizeof(*state), GFP_KERNEL); + if (!state) + return -ENOMEM; + + state->pdev_child = platform_device_alloc(KEYCOMBO_NAME, + PLATFORM_DEVID_AUTO); + if (!state->pdev_child) + return -ENOMEM; + state->pdev_child->dev.parent = &pdev->dev; + INIT_WORK(&state->restart_work, do_restart); + + keyp = pdata->keys_down; + while ((key = *keyp++)) { + if (key >= KEY_MAX) + continue; + down_size++; + } + if (pdata->keys_up) { + keyp = pdata->keys_up; + while ((key = *keyp++)) { + if (key >= KEY_MAX) + continue; + up_size++; + } + } + size = sizeof(struct keycombo_platform_data) + + sizeof(int) * (down_size + 1); + pdata_child = devm_kzalloc(&pdev->dev, size, GFP_KERNEL); + if (!pdata_child) + goto error; + memcpy(pdata_child->keys_down, pdata->keys_down, + sizeof(int) * down_size); + if (up_size > 0) { + pdata_child->keys_up = devm_kzalloc(&pdev->dev, up_size + 1, + GFP_KERNEL); + if (!pdata_child->keys_up) + goto error; + memcpy(pdata_child->keys_up, pdata->keys_up, + sizeof(int) * up_size); + if (!pdata_child->keys_up) + goto error; + } + state->reset_fn = pdata->reset_fn; + pdata_child->key_down_fn = do_reset_fn; + pdata_child->priv = state; + pdata_child->key_down_delay = pdata->key_down_delay; + ret = platform_device_add_data(state->pdev_child, pdata_child, size); + if (ret) + goto error; + platform_set_drvdata(pdev, state); + return platform_device_add(state->pdev_child); +error: + platform_device_put(state->pdev_child); + return ret; +} + +int keyreset_remove(struct platform_device *pdev) +{ + struct keyreset_state *state = platform_get_drvdata(pdev); + platform_device_put(state->pdev_child); + return 0; +} + + +struct platform_driver keyreset_driver = { + .driver.name = KEYRESET_NAME, + .probe = keyreset_probe, + .remove = keyreset_remove, +}; + +static int __init keyreset_init(void) +{ + return platform_driver_register(&keyreset_driver); +} + +static void __exit keyreset_exit(void) +{ + return platform_driver_unregister(&keyreset_driver); +} + +module_init(keyreset_init); +module_exit(keyreset_exit);
diff --git a/drivers/input/misc/Kconfig b/drivers/input/misc/Kconfig index 7ffb614..94360fe 100644 --- a/drivers/input/misc/Kconfig +++ b/drivers/input/misc/Kconfig
@@ -367,6 +367,17 @@ To compile this driver as a module, choose M here: the module will be called ati_remote2. +config INPUT_KEYCHORD + tristate "Key chord input driver support" + help + Say Y here if you want to enable the key chord driver + accessible at /dev/keychord. This driver can be used + for receiving notifications when client specified key + combinations are pressed. + + To compile this driver as a module, choose M here: the + module will be called keychord. + config INPUT_KEYSPAN_REMOTE tristate "Keyspan DMR USB remote control" depends on USB_ARCH_HAS_HCD @@ -535,6 +546,11 @@ To compile this driver as a module, choose M here: the module will be called sgi_btns. +config INPUT_GPIO + tristate "GPIO driver support" + help + Say Y here if you want to support gpio based keys, wheels etc... + config HP_SDC_RTC tristate "HP SDC Real Time Clock" depends on (GSC || HP300) && SERIO
diff --git a/drivers/input/misc/Makefile b/drivers/input/misc/Makefile index 0b6d025..64bf231 100644 --- a/drivers/input/misc/Makefile +++ b/drivers/input/misc/Makefile
@@ -36,10 +36,12 @@ obj-$(CONFIG_INPUT_GPIO_BEEPER) += gpio-beeper.o obj-$(CONFIG_INPUT_GPIO_TILT_POLLED) += gpio_tilt_polled.o obj-$(CONFIG_INPUT_GPIO_DECODER) += gpio_decoder.o +obj-$(CONFIG_INPUT_GPIO) += gpio_event.o gpio_matrix.o gpio_input.o gpio_output.o gpio_axis.o obj-$(CONFIG_INPUT_HISI_POWERKEY) += hisi_powerkey.o obj-$(CONFIG_HP_SDC_RTC) += hp_sdc_rtc.o obj-$(CONFIG_INPUT_IMS_PCU) += ims-pcu.o obj-$(CONFIG_INPUT_IXP4XX_BEEPER) += ixp4xx-beeper.o +obj-$(CONFIG_INPUT_KEYCHORD) += keychord.o obj-$(CONFIG_INPUT_KEYSPAN_REMOTE) += keyspan_remote.o obj-$(CONFIG_INPUT_KXTJ9) += kxtj9.o obj-$(CONFIG_INPUT_M68K_BEEP) += m68kspkr.o
diff --git a/drivers/input/misc/gpio_axis.c b/drivers/input/misc/gpio_axis.c new file mode 100644 index 0000000..0acf4a5 --- /dev/null +++ b/drivers/input/misc/gpio_axis.c
@@ -0,0 +1,192 @@ +/* drivers/input/misc/gpio_axis.c + * + * Copyright (C) 2007 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/kernel.h> +#include <linux/gpio.h> +#include <linux/gpio_event.h> +#include <linux/interrupt.h> +#include <linux/slab.h> + +struct gpio_axis_state { + struct gpio_event_input_devs *input_devs; + struct gpio_event_axis_info *info; + uint32_t pos; +}; + +uint16_t gpio_axis_4bit_gray_map_table[] = { + [0x0] = 0x0, [0x1] = 0x1, /* 0000 0001 */ + [0x3] = 0x2, [0x2] = 0x3, /* 0011 0010 */ + [0x6] = 0x4, [0x7] = 0x5, /* 0110 0111 */ + [0x5] = 0x6, [0x4] = 0x7, /* 0101 0100 */ + [0xc] = 0x8, [0xd] = 0x9, /* 1100 1101 */ + [0xf] = 0xa, [0xe] = 0xb, /* 1111 1110 */ + [0xa] = 0xc, [0xb] = 0xd, /* 1010 1011 */ + [0x9] = 0xe, [0x8] = 0xf, /* 1001 1000 */ +}; +uint16_t gpio_axis_4bit_gray_map(struct gpio_event_axis_info *info, uint16_t in) +{ + return gpio_axis_4bit_gray_map_table[in]; +} + +uint16_t gpio_axis_5bit_singletrack_map_table[] = { + [0x10] = 0x00, [0x14] = 0x01, [0x1c] = 0x02, /* 10000 10100 11100 */ + [0x1e] = 0x03, [0x1a] = 0x04, [0x18] = 0x05, /* 11110 11010 11000 */ + [0x08] = 0x06, [0x0a] = 0x07, [0x0e] = 0x08, /* 01000 01010 01110 */ + [0x0f] = 0x09, [0x0d] = 0x0a, [0x0c] = 0x0b, /* 01111 01101 01100 */ + [0x04] = 0x0c, [0x05] = 0x0d, [0x07] = 0x0e, /* 00100 00101 00111 */ + [0x17] = 0x0f, [0x16] = 0x10, [0x06] = 0x11, /* 10111 10110 00110 */ + [0x02] = 0x12, [0x12] = 0x13, [0x13] = 0x14, /* 00010 10010 10011 */ + [0x1b] = 0x15, [0x0b] = 0x16, [0x03] = 0x17, /* 11011 01011 00011 */ + [0x01] = 0x18, [0x09] = 0x19, [0x19] = 0x1a, /* 00001 01001 11001 */ + [0x1d] = 0x1b, [0x15] = 0x1c, [0x11] = 0x1d, /* 11101 10101 10001 */ +}; +uint16_t gpio_axis_5bit_singletrack_map( + struct gpio_event_axis_info *info, uint16_t in) +{ + return gpio_axis_5bit_singletrack_map_table[in]; +} + +static void gpio_event_update_axis(struct gpio_axis_state *as, int report) +{ + struct gpio_event_axis_info *ai = as->info; + int i; + int change; + uint16_t state = 0; + uint16_t pos; + uint16_t old_pos = as->pos; + for (i = ai->count - 1; i >= 0; i--) + state = (state << 1) | gpio_get_value(ai->gpio[i]); + pos = ai->map(ai, state); + if (ai->flags & GPIOEAF_PRINT_RAW) + pr_info("axis %d-%d raw %x, pos %d -> %d\n", + ai->type, ai->code, state, old_pos, pos); + if (report && pos != old_pos) { + if (ai->type == EV_REL) { + change = (ai->decoded_size + pos - old_pos) % + ai->decoded_size; + if (change > ai->decoded_size / 2) + change -= ai->decoded_size; + if (change == ai->decoded_size / 2) { + if (ai->flags & GPIOEAF_PRINT_EVENT) + pr_info("axis %d-%d unknown direction, " + "pos %d -> %d\n", ai->type, + ai->code, old_pos, pos); + change = 0; /* no closest direction */ + } + if (ai->flags & GPIOEAF_PRINT_EVENT) + pr_info("axis %d-%d change %d\n", + ai->type, ai->code, change); + input_report_rel(as->input_devs->dev[ai->dev], + ai->code, change); + } else { + if (ai->flags & GPIOEAF_PRINT_EVENT) + pr_info("axis %d-%d now %d\n", + ai->type, ai->code, pos); + input_event(as->input_devs->dev[ai->dev], + ai->type, ai->code, pos); + } + input_sync(as->input_devs->dev[ai->dev]); + } + as->pos = pos; +} + +static irqreturn_t gpio_axis_irq_handler(int irq, void *dev_id) +{ + struct gpio_axis_state *as = dev_id; + gpio_event_update_axis(as, 1); + return IRQ_HANDLED; +} + +int gpio_event_axis_func(struct gpio_event_input_devs *input_devs, + struct gpio_event_info *info, void **data, int func) +{ + int ret; + int i; + int irq; + struct gpio_event_axis_info *ai; + struct gpio_axis_state *as; + + ai = container_of(info, struct gpio_event_axis_info, info); + if (func == GPIO_EVENT_FUNC_SUSPEND) { + for (i = 0; i < ai->count; i++) + disable_irq(gpio_to_irq(ai->gpio[i])); + return 0; + } + if (func == GPIO_EVENT_FUNC_RESUME) { + for (i = 0; i < ai->count; i++) + enable_irq(gpio_to_irq(ai->gpio[i])); + return 0; + } + + if (func == GPIO_EVENT_FUNC_INIT) { + *data = as = kmalloc(sizeof(*as), GFP_KERNEL); + if (as == NULL) { + ret = -ENOMEM; + goto err_alloc_axis_state_failed; + } + as->input_devs = input_devs; + as->info = ai; + if (ai->dev >= input_devs->count) { + pr_err("gpio_event_axis: bad device index %d >= %d " + "for %d:%d\n", ai->dev, input_devs->count, + ai->type, ai->code); + ret = -EINVAL; + goto err_bad_device_index; + } + + input_set_capability(input_devs->dev[ai->dev], + ai->type, ai->code); + if (ai->type == EV_ABS) { + input_set_abs_params(input_devs->dev[ai->dev], ai->code, + 0, ai->decoded_size - 1, 0, 0); + } + for (i = 0; i < ai->count; i++) { + ret = gpio_request(ai->gpio[i], "gpio_event_axis"); + if (ret < 0) + goto err_request_gpio_failed; + ret = gpio_direction_input(ai->gpio[i]); + if (ret < 0) + goto err_gpio_direction_input_failed; + ret = irq = gpio_to_irq(ai->gpio[i]); + if (ret < 0) + goto err_get_irq_num_failed; + ret = request_irq(irq, gpio_axis_irq_handler, + IRQF_TRIGGER_RISING | + IRQF_TRIGGER_FALLING, + "gpio_event_axis", as); + if (ret < 0) + goto err_request_irq_failed; + } + gpio_event_update_axis(as, 0); + return 0; + } + + ret = 0; + as = *data; + for (i = ai->count - 1; i >= 0; i--) { + free_irq(gpio_to_irq(ai->gpio[i]), as); +err_request_irq_failed: +err_get_irq_num_failed: +err_gpio_direction_input_failed: + gpio_free(ai->gpio[i]); +err_request_gpio_failed: + ; + } +err_bad_device_index: + kfree(as); + *data = NULL; +err_alloc_axis_state_failed: + return ret; +}
diff --git a/drivers/input/misc/gpio_event.c b/drivers/input/misc/gpio_event.c new file mode 100644 index 0000000..90f07eb --- /dev/null +++ b/drivers/input/misc/gpio_event.c
@@ -0,0 +1,228 @@ +/* drivers/input/misc/gpio_event.c + * + * Copyright (C) 2007 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/module.h> +#include <linux/input.h> +#include <linux/gpio_event.h> +#include <linux/hrtimer.h> +#include <linux/platform_device.h> +#include <linux/slab.h> + +struct gpio_event { + struct gpio_event_input_devs *input_devs; + const struct gpio_event_platform_data *info; + void *state[0]; +}; + +static int gpio_input_event( + struct input_dev *dev, unsigned int type, unsigned int code, int value) +{ + int i; + int devnr; + int ret = 0; + int tmp_ret; + struct gpio_event_info **ii; + struct gpio_event *ip = input_get_drvdata(dev); + + for (devnr = 0; devnr < ip->input_devs->count; devnr++) + if (ip->input_devs->dev[devnr] == dev) + break; + if (devnr == ip->input_devs->count) { + pr_err("gpio_input_event: unknown device %p\n", dev); + return -EIO; + } + + for (i = 0, ii = ip->info->info; i < ip->info->info_count; i++, ii++) { + if ((*ii)->event) { + tmp_ret = (*ii)->event(ip->input_devs, *ii, + &ip->state[i], + devnr, type, code, value); + if (tmp_ret) + ret = tmp_ret; + } + } + return ret; +} + +static int gpio_event_call_all_func(struct gpio_event *ip, int func) +{ + int i; + int ret; + struct gpio_event_info **ii; + + if (func == GPIO_EVENT_FUNC_INIT || func == GPIO_EVENT_FUNC_RESUME) { + ii = ip->info->info; + for (i = 0; i < ip->info->info_count; i++, ii++) { + if ((*ii)->func == NULL) { + ret = -ENODEV; + pr_err("gpio_event_probe: Incomplete pdata, " + "no function\n"); + goto err_no_func; + } + if (func == GPIO_EVENT_FUNC_RESUME && (*ii)->no_suspend) + continue; + ret = (*ii)->func(ip->input_devs, *ii, &ip->state[i], + func); + if (ret) { + pr_err("gpio_event_probe: function failed\n"); + goto err_func_failed; + } + } + return 0; + } + + ret = 0; + i = ip->info->info_count; + ii = ip->info->info + i; + while (i > 0) { + i--; + ii--; + if ((func & ~1) == GPIO_EVENT_FUNC_SUSPEND && (*ii)->no_suspend) + continue; + (*ii)->func(ip->input_devs, *ii, &ip->state[i], func & ~1); +err_func_failed: +err_no_func: + ; + } + return ret; +} + +static void __maybe_unused gpio_event_suspend(struct gpio_event *ip) +{ + gpio_event_call_all_func(ip, GPIO_EVENT_FUNC_SUSPEND); + if (ip->info->power) + ip->info->power(ip->info, 0); +} + +static void __maybe_unused gpio_event_resume(struct gpio_event *ip) +{ + if (ip->info->power) + ip->info->power(ip->info, 1); + gpio_event_call_all_func(ip, GPIO_EVENT_FUNC_RESUME); +} + +static int gpio_event_probe(struct platform_device *pdev) +{ + int err; + struct gpio_event *ip; + struct gpio_event_platform_data *event_info; + int dev_count = 1; + int i; + int registered = 0; + + event_info = pdev->dev.platform_data; + if (event_info == NULL) { + pr_err("gpio_event_probe: No pdata\n"); + return -ENODEV; + } + if ((!event_info->name && !event_info->names[0]) || + !event_info->info || !event_info->info_count) { + pr_err("gpio_event_probe: Incomplete pdata\n"); + return -ENODEV; + } + if (!event_info->name) + while (event_info->names[dev_count]) + dev_count++; + ip = kzalloc(sizeof(*ip) + + sizeof(ip->state[0]) * event_info->info_count + + sizeof(*ip->input_devs) + + sizeof(ip->input_devs->dev[0]) * dev_count, GFP_KERNEL); + if (ip == NULL) { + err = -ENOMEM; + pr_err("gpio_event_probe: Failed to allocate private data\n"); + goto err_kp_alloc_failed; + } + ip->input_devs = (void*)&ip->state[event_info->info_count]; + platform_set_drvdata(pdev, ip); + + for (i = 0; i < dev_count; i++) { + struct input_dev *input_dev = input_allocate_device(); + if (input_dev == NULL) { + err = -ENOMEM; + pr_err("gpio_event_probe: " + "Failed to allocate input device\n"); + goto err_input_dev_alloc_failed; + } + input_set_drvdata(input_dev, ip); + input_dev->name = event_info->name ? + event_info->name : event_info->names[i]; + input_dev->event = gpio_input_event; + ip->input_devs->dev[i] = input_dev; + } + ip->input_devs->count = dev_count; + ip->info = event_info; + if (event_info->power) + ip->info->power(ip->info, 1); + + err = gpio_event_call_all_func(ip, GPIO_EVENT_FUNC_INIT); + if (err) + goto err_call_all_func_failed; + + for (i = 0; i < dev_count; i++) { + err = input_register_device(ip->input_devs->dev[i]); + if (err) { + pr_err("gpio_event_probe: Unable to register %s " + "input device\n", ip->input_devs->dev[i]->name); + goto err_input_register_device_failed; + } + registered++; + } + + return 0; + +err_input_register_device_failed: + gpio_event_call_all_func(ip, GPIO_EVENT_FUNC_UNINIT); +err_call_all_func_failed: + if (event_info->power) + ip->info->power(ip->info, 0); + for (i = 0; i < registered; i++) + input_unregister_device(ip->input_devs->dev[i]); + for (i = dev_count - 1; i >= registered; i--) { + input_free_device(ip->input_devs->dev[i]); +err_input_dev_alloc_failed: + ; + } + kfree(ip); +err_kp_alloc_failed: + return err; +} + +static int gpio_event_remove(struct platform_device *pdev) +{ + struct gpio_event *ip = platform_get_drvdata(pdev); + int i; + + gpio_event_call_all_func(ip, GPIO_EVENT_FUNC_UNINIT); + if (ip->info->power) + ip->info->power(ip->info, 0); + for (i = 0; i < ip->input_devs->count; i++) + input_unregister_device(ip->input_devs->dev[i]); + kfree(ip); + return 0; +} + +static struct platform_driver gpio_event_driver = { + .probe = gpio_event_probe, + .remove = gpio_event_remove, + .driver = { + .name = GPIO_EVENT_DEV_NAME, + }, +}; + +module_platform_driver(gpio_event_driver); + +MODULE_DESCRIPTION("GPIO Event Driver"); +MODULE_LICENSE("GPL"); +
diff --git a/drivers/input/misc/gpio_input.c b/drivers/input/misc/gpio_input.c new file mode 100644 index 0000000..eefd027 --- /dev/null +++ b/drivers/input/misc/gpio_input.c
@@ -0,0 +1,390 @@ +/* drivers/input/misc/gpio_input.c + * + * Copyright (C) 2007 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/kernel.h> +#include <linux/gpio.h> +#include <linux/gpio_event.h> +#include <linux/hrtimer.h> +#include <linux/input.h> +#include <linux/interrupt.h> +#include <linux/slab.h> +#include <linux/pm_wakeup.h> + +enum { + DEBOUNCE_UNSTABLE = BIT(0), /* Got irq, while debouncing */ + DEBOUNCE_PRESSED = BIT(1), + DEBOUNCE_NOTPRESSED = BIT(2), + DEBOUNCE_WAIT_IRQ = BIT(3), /* Stable irq state */ + DEBOUNCE_POLL = BIT(4), /* Stable polling state */ + + DEBOUNCE_UNKNOWN = + DEBOUNCE_PRESSED | DEBOUNCE_NOTPRESSED, +}; + +struct gpio_key_state { + struct gpio_input_state *ds; + uint8_t debounce; +}; + +struct gpio_input_state { + struct gpio_event_input_devs *input_devs; + const struct gpio_event_input_info *info; + struct hrtimer timer; + int use_irq; + int debounce_count; + spinlock_t irq_lock; + struct wakeup_source *ws; + struct gpio_key_state key_state[0]; +}; + +static enum hrtimer_restart gpio_event_input_timer_func(struct hrtimer *timer) +{ + int i; + int pressed; + struct gpio_input_state *ds = + container_of(timer, struct gpio_input_state, timer); + unsigned gpio_flags = ds->info->flags; + unsigned npolarity; + int nkeys = ds->info->keymap_size; + const struct gpio_event_direct_entry *key_entry; + struct gpio_key_state *key_state; + unsigned long irqflags; + uint8_t debounce; + bool sync_needed; + +#if 0 + key_entry = kp->keys_info->keymap; + key_state = kp->key_state; + for (i = 0; i < nkeys; i++, key_entry++, key_state++) + pr_info("gpio_read_detect_status %d %d\n", key_entry->gpio, + gpio_read_detect_status(key_entry->gpio)); +#endif + key_entry = ds->info->keymap; + key_state = ds->key_state; + sync_needed = false; + spin_lock_irqsave(&ds->irq_lock, irqflags); + for (i = 0; i < nkeys; i++, key_entry++, key_state++) { + debounce = key_state->debounce; + if (debounce & DEBOUNCE_WAIT_IRQ) + continue; + if (key_state->debounce & DEBOUNCE_UNSTABLE) { + debounce = key_state->debounce = DEBOUNCE_UNKNOWN; + enable_irq(gpio_to_irq(key_entry->gpio)); + if (gpio_flags & GPIOEDF_PRINT_KEY_UNSTABLE) + pr_info("gpio_keys_scan_keys: key %x-%x, %d " + "(%d) continue debounce\n", + ds->info->type, key_entry->code, + i, key_entry->gpio); + } + npolarity = !(gpio_flags & GPIOEDF_ACTIVE_HIGH); + pressed = gpio_get_value(key_entry->gpio) ^ npolarity; + if (debounce & DEBOUNCE_POLL) { + if (pressed == !(debounce & DEBOUNCE_PRESSED)) { + ds->debounce_count++; + key_state->debounce = DEBOUNCE_UNKNOWN; + if (gpio_flags & GPIOEDF_PRINT_KEY_DEBOUNCE) + pr_info("gpio_keys_scan_keys: key %x-" + "%x, %d (%d) start debounce\n", + ds->info->type, key_entry->code, + i, key_entry->gpio); + } + continue; + } + if (pressed && (debounce & DEBOUNCE_NOTPRESSED)) { + if (gpio_flags & GPIOEDF_PRINT_KEY_DEBOUNCE) + pr_info("gpio_keys_scan_keys: key %x-%x, %d " + "(%d) debounce pressed 1\n", + ds->info->type, key_entry->code, + i, key_entry->gpio); + key_state->debounce = DEBOUNCE_PRESSED; + continue; + } + if (!pressed && (debounce & DEBOUNCE_PRESSED)) { + if (gpio_flags & GPIOEDF_PRINT_KEY_DEBOUNCE) + pr_info("gpio_keys_scan_keys: key %x-%x, %d " + "(%d) debounce pressed 0\n", + ds->info->type, key_entry->code, + i, key_entry->gpio); + key_state->debounce = DEBOUNCE_NOTPRESSED; + continue; + } + /* key is stable */ + ds->debounce_count--; + if (ds->use_irq) + key_state->debounce |= DEBOUNCE_WAIT_IRQ; + else + key_state->debounce |= DEBOUNCE_POLL; + if (gpio_flags & GPIOEDF_PRINT_KEYS) + pr_info("gpio_keys_scan_keys: key %x-%x, %d (%d) " + "changed to %d\n", ds->info->type, + key_entry->code, i, key_entry->gpio, pressed); + input_event(ds->input_devs->dev[key_entry->dev], ds->info->type, + key_entry->code, pressed); + sync_needed = true; + } + if (sync_needed) { + for (i = 0; i < ds->input_devs->count; i++) + input_sync(ds->input_devs->dev[i]); + } + +#if 0 + key_entry = kp->keys_info->keymap; + key_state = kp->key_state; + for (i = 0; i < nkeys; i++, key_entry++, key_state++) { + pr_info("gpio_read_detect_status %d %d\n", key_entry->gpio, + gpio_read_detect_status(key_entry->gpio)); + } +#endif + + if (ds->debounce_count) + hrtimer_start(timer, ds->info->debounce_time, HRTIMER_MODE_REL); + else if (!ds->use_irq) + hrtimer_start(timer, ds->info->poll_time, HRTIMER_MODE_REL); + else + __pm_relax(ds->ws); + + spin_unlock_irqrestore(&ds->irq_lock, irqflags); + + return HRTIMER_NORESTART; +} + +static irqreturn_t gpio_event_input_irq_handler(int irq, void *dev_id) +{ + struct gpio_key_state *ks = dev_id; + struct gpio_input_state *ds = ks->ds; + int keymap_index = ks - ds->key_state; + const struct gpio_event_direct_entry *key_entry; + unsigned long irqflags; + int pressed; + + if (!ds->use_irq) + return IRQ_HANDLED; + + key_entry = &ds->info->keymap[keymap_index]; + + if (ds->info->debounce_time.tv64) { + spin_lock_irqsave(&ds->irq_lock, irqflags); + if (ks->debounce & DEBOUNCE_WAIT_IRQ) { + ks->debounce = DEBOUNCE_UNKNOWN; + if (ds->debounce_count++ == 0) { + __pm_stay_awake(ds->ws); + hrtimer_start( + &ds->timer, ds->info->debounce_time, + HRTIMER_MODE_REL); + } + if (ds->info->flags & GPIOEDF_PRINT_KEY_DEBOUNCE) + pr_info("gpio_event_input_irq_handler: " + "key %x-%x, %d (%d) start debounce\n", + ds->info->type, key_entry->code, + keymap_index, key_entry->gpio); + } else { + disable_irq_nosync(irq); + ks->debounce = DEBOUNCE_UNSTABLE; + } + spin_unlock_irqrestore(&ds->irq_lock, irqflags); + } else { + pressed = gpio_get_value(key_entry->gpio) ^ + !(ds->info->flags & GPIOEDF_ACTIVE_HIGH); + if (ds->info->flags & GPIOEDF_PRINT_KEYS) + pr_info("gpio_event_input_irq_handler: key %x-%x, %d " + "(%d) changed to %d\n", + ds->info->type, key_entry->code, keymap_index, + key_entry->gpio, pressed); + input_event(ds->input_devs->dev[key_entry->dev], ds->info->type, + key_entry->code, pressed); + input_sync(ds->input_devs->dev[key_entry->dev]); + } + return IRQ_HANDLED; +} + +static int gpio_event_input_request_irqs(struct gpio_input_state *ds) +{ + int i; + int err; + unsigned int irq; + unsigned long req_flags = IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING; + + for (i = 0; i < ds->info->keymap_size; i++) { + err = irq = gpio_to_irq(ds->info->keymap[i].gpio); + if (err < 0) + goto err_gpio_get_irq_num_failed; + err = request_irq(irq, gpio_event_input_irq_handler, + req_flags, "gpio_keys", &ds->key_state[i]); + if (err) { + pr_err("gpio_event_input_request_irqs: request_irq " + "failed for input %d, irq %d\n", + ds->info->keymap[i].gpio, irq); + goto err_request_irq_failed; + } + if (ds->info->info.no_suspend) { + err = enable_irq_wake(irq); + if (err) { + pr_err("gpio_event_input_request_irqs: " + "enable_irq_wake failed for input %d, " + "irq %d\n", + ds->info->keymap[i].gpio, irq); + goto err_enable_irq_wake_failed; + } + } + } + return 0; + + for (i = ds->info->keymap_size - 1; i >= 0; i--) { + irq = gpio_to_irq(ds->info->keymap[i].gpio); + if (ds->info->info.no_suspend) + disable_irq_wake(irq); +err_enable_irq_wake_failed: + free_irq(irq, &ds->key_state[i]); +err_request_irq_failed: +err_gpio_get_irq_num_failed: + ; + } + return err; +} + +int gpio_event_input_func(struct gpio_event_input_devs *input_devs, + struct gpio_event_info *info, void **data, int func) +{ + int ret; + int i; + unsigned long irqflags; + struct gpio_event_input_info *di; + struct gpio_input_state *ds = *data; + char *wlname; + + di = container_of(info, struct gpio_event_input_info, info); + + if (func == GPIO_EVENT_FUNC_SUSPEND) { + if (ds->use_irq) + for (i = 0; i < di->keymap_size; i++) + disable_irq(gpio_to_irq(di->keymap[i].gpio)); + hrtimer_cancel(&ds->timer); + return 0; + } + if (func == GPIO_EVENT_FUNC_RESUME) { + spin_lock_irqsave(&ds->irq_lock, irqflags); + if (ds->use_irq) + for (i = 0; i < di->keymap_size; i++) + enable_irq(gpio_to_irq(di->keymap[i].gpio)); + hrtimer_start(&ds->timer, ktime_set(0, 0), HRTIMER_MODE_REL); + spin_unlock_irqrestore(&ds->irq_lock, irqflags); + return 0; + } + + if (func == GPIO_EVENT_FUNC_INIT) { + if (ktime_to_ns(di->poll_time) <= 0) + di->poll_time = ktime_set(0, 20 * NSEC_PER_MSEC); + + *data = ds = kzalloc(sizeof(*ds) + sizeof(ds->key_state[0]) * + di->keymap_size, GFP_KERNEL); + if (ds == NULL) { + ret = -ENOMEM; + pr_err("gpio_event_input_func: " + "Failed to allocate private data\n"); + goto err_ds_alloc_failed; + } + ds->debounce_count = di->keymap_size; + ds->input_devs = input_devs; + ds->info = di; + wlname = kasprintf(GFP_KERNEL, "gpio_input:%s%s", + input_devs->dev[0]->name, + (input_devs->count > 1) ? "..." : ""); + + ds->ws = wakeup_source_register(wlname); + kfree(wlname); + if (!ds->ws) { + ret = -ENOMEM; + pr_err("gpio_event_input_func: " + "Failed to allocate wakeup source\n"); + goto err_ws_failed; + } + + spin_lock_init(&ds->irq_lock); + + for (i = 0; i < di->keymap_size; i++) { + int dev = di->keymap[i].dev; + if (dev >= input_devs->count) { + pr_err("gpio_event_input_func: bad device " + "index %d >= %d for key code %d\n", + dev, input_devs->count, + di->keymap[i].code); + ret = -EINVAL; + goto err_bad_keymap; + } + input_set_capability(input_devs->dev[dev], di->type, + di->keymap[i].code); + ds->key_state[i].ds = ds; + ds->key_state[i].debounce = DEBOUNCE_UNKNOWN; + } + + for (i = 0; i < di->keymap_size; i++) { + ret = gpio_request(di->keymap[i].gpio, "gpio_kp_in"); + if (ret) { + pr_err("gpio_event_input_func: gpio_request " + "failed for %d\n", di->keymap[i].gpio); + goto err_gpio_request_failed; + } + ret = gpio_direction_input(di->keymap[i].gpio); + if (ret) { + pr_err("gpio_event_input_func: " + "gpio_direction_input failed for %d\n", + di->keymap[i].gpio); + goto err_gpio_configure_failed; + } + } + + ret = gpio_event_input_request_irqs(ds); + + spin_lock_irqsave(&ds->irq_lock, irqflags); + ds->use_irq = ret == 0; + + pr_info("GPIO Input Driver: Start gpio inputs for %s%s in %s " + "mode\n", input_devs->dev[0]->name, + (input_devs->count > 1) ? "..." : "", + ret == 0 ? "interrupt" : "polling"); + + hrtimer_init(&ds->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); + ds->timer.function = gpio_event_input_timer_func; + hrtimer_start(&ds->timer, ktime_set(0, 0), HRTIMER_MODE_REL); + spin_unlock_irqrestore(&ds->irq_lock, irqflags); + return 0; + } + + ret = 0; + spin_lock_irqsave(&ds->irq_lock, irqflags); + hrtimer_cancel(&ds->timer); + if (ds->use_irq) { + for (i = di->keymap_size - 1; i >= 0; i--) { + int irq = gpio_to_irq(di->keymap[i].gpio); + if (ds->info->info.no_suspend) + disable_irq_wake(irq); + free_irq(irq, &ds->key_state[i]); + } + } + spin_unlock_irqrestore(&ds->irq_lock, irqflags); + + for (i = di->keymap_size - 1; i >= 0; i--) { +err_gpio_configure_failed: + gpio_free(di->keymap[i].gpio); +err_gpio_request_failed: + ; + } +err_bad_keymap: + wakeup_source_unregister(ds->ws); +err_ws_failed: + kfree(ds); +err_ds_alloc_failed: + return ret; +}
diff --git a/drivers/input/misc/gpio_matrix.c b/drivers/input/misc/gpio_matrix.c new file mode 100644 index 0000000..08769dd --- /dev/null +++ b/drivers/input/misc/gpio_matrix.c
@@ -0,0 +1,440 @@ +/* drivers/input/misc/gpio_matrix.c + * + * Copyright (C) 2007 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/kernel.h> +#include <linux/gpio.h> +#include <linux/gpio_event.h> +#include <linux/hrtimer.h> +#include <linux/interrupt.h> +#include <linux/slab.h> + +struct gpio_kp { + struct gpio_event_input_devs *input_devs; + struct gpio_event_matrix_info *keypad_info; + struct hrtimer timer; + struct wakeup_source wake_src; + int current_output; + unsigned int use_irq:1; + unsigned int key_state_changed:1; + unsigned int last_key_state_changed:1; + unsigned int some_keys_pressed:2; + unsigned int disabled_irq:1; + unsigned long keys_pressed[0]; +}; + +static void clear_phantom_key(struct gpio_kp *kp, int out, int in) +{ + struct gpio_event_matrix_info *mi = kp->keypad_info; + int key_index = out * mi->ninputs + in; + unsigned short keyentry = mi->keymap[key_index]; + unsigned short keycode = keyentry & MATRIX_KEY_MASK; + unsigned short dev = keyentry >> MATRIX_CODE_BITS; + + if (!test_bit(keycode, kp->input_devs->dev[dev]->key)) { + if (mi->flags & GPIOKPF_PRINT_PHANTOM_KEYS) + pr_info("gpiomatrix: phantom key %x, %d-%d (%d-%d) " + "cleared\n", keycode, out, in, + mi->output_gpios[out], mi->input_gpios[in]); + __clear_bit(key_index, kp->keys_pressed); + } else { + if (mi->flags & GPIOKPF_PRINT_PHANTOM_KEYS) + pr_info("gpiomatrix: phantom key %x, %d-%d (%d-%d) " + "not cleared\n", keycode, out, in, + mi->output_gpios[out], mi->input_gpios[in]); + } +} + +static int restore_keys_for_input(struct gpio_kp *kp, int out, int in) +{ + int rv = 0; + int key_index; + + key_index = out * kp->keypad_info->ninputs + in; + while (out < kp->keypad_info->noutputs) { + if (test_bit(key_index, kp->keys_pressed)) { + rv = 1; + clear_phantom_key(kp, out, in); + } + key_index += kp->keypad_info->ninputs; + out++; + } + return rv; +} + +static void remove_phantom_keys(struct gpio_kp *kp) +{ + int out, in, inp; + int key_index; + + if (kp->some_keys_pressed < 3) + return; + + for (out = 0; out < kp->keypad_info->noutputs; out++) { + inp = -1; + key_index = out * kp->keypad_info->ninputs; + for (in = 0; in < kp->keypad_info->ninputs; in++, key_index++) { + if (test_bit(key_index, kp->keys_pressed)) { + if (inp == -1) { + inp = in; + continue; + } + if (inp >= 0) { + if (!restore_keys_for_input(kp, out + 1, + inp)) + break; + clear_phantom_key(kp, out, inp); + inp = -2; + } + restore_keys_for_input(kp, out, in); + } + } + } +} + +static void report_key(struct gpio_kp *kp, int key_index, int out, int in) +{ + struct gpio_event_matrix_info *mi = kp->keypad_info; + int pressed = test_bit(key_index, kp->keys_pressed); + unsigned short keyentry = mi->keymap[key_index]; + unsigned short keycode = keyentry & MATRIX_KEY_MASK; + unsigned short dev = keyentry >> MATRIX_CODE_BITS; + + if (pressed != test_bit(keycode, kp->input_devs->dev[dev]->key)) { + if (keycode == KEY_RESERVED) { + if (mi->flags & GPIOKPF_PRINT_UNMAPPED_KEYS) + pr_info("gpiomatrix: unmapped key, %d-%d " + "(%d-%d) changed to %d\n", + out, in, mi->output_gpios[out], + mi->input_gpios[in], pressed); + } else { + if (mi->flags & GPIOKPF_PRINT_MAPPED_KEYS) + pr_info("gpiomatrix: key %x, %d-%d (%d-%d) " + "changed to %d\n", keycode, + out, in, mi->output_gpios[out], + mi->input_gpios[in], pressed); + input_report_key(kp->input_devs->dev[dev], keycode, pressed); + } + } +} + +static void report_sync(struct gpio_kp *kp) +{ + int i; + + for (i = 0; i < kp->input_devs->count; i++) + input_sync(kp->input_devs->dev[i]); +} + +static enum hrtimer_restart gpio_keypad_timer_func(struct hrtimer *timer) +{ + int out, in; + int key_index; + int gpio; + struct gpio_kp *kp = container_of(timer, struct gpio_kp, timer); + struct gpio_event_matrix_info *mi = kp->keypad_info; + unsigned gpio_keypad_flags = mi->flags; + unsigned polarity = !!(gpio_keypad_flags & GPIOKPF_ACTIVE_HIGH); + + out = kp->current_output; + if (out == mi->noutputs) { + out = 0; + kp->last_key_state_changed = kp->key_state_changed; + kp->key_state_changed = 0; + kp->some_keys_pressed = 0; + } else { + key_index = out * mi->ninputs; + for (in = 0; in < mi->ninputs; in++, key_index++) { + gpio = mi->input_gpios[in]; + if (gpio_get_value(gpio) ^ !polarity) { + if (kp->some_keys_pressed < 3) + kp->some_keys_pressed++; + kp->key_state_changed |= !__test_and_set_bit( + key_index, kp->keys_pressed); + } else + kp->key_state_changed |= __test_and_clear_bit( + key_index, kp->keys_pressed); + } + gpio = mi->output_gpios[out]; + if (gpio_keypad_flags & GPIOKPF_DRIVE_INACTIVE) + gpio_set_value(gpio, !polarity); + else + gpio_direction_input(gpio); + out++; + } + kp->current_output = out; + if (out < mi->noutputs) { + gpio = mi->output_gpios[out]; + if (gpio_keypad_flags & GPIOKPF_DRIVE_INACTIVE) + gpio_set_value(gpio, polarity); + else + gpio_direction_output(gpio, polarity); + hrtimer_start(timer, mi->settle_time, HRTIMER_MODE_REL); + return HRTIMER_NORESTART; + } + if (gpio_keypad_flags & GPIOKPF_DEBOUNCE) { + if (kp->key_state_changed) { + hrtimer_start(&kp->timer, mi->debounce_delay, + HRTIMER_MODE_REL); + return HRTIMER_NORESTART; + } + kp->key_state_changed = kp->last_key_state_changed; + } + if (kp->key_state_changed) { + if (gpio_keypad_flags & GPIOKPF_REMOVE_SOME_PHANTOM_KEYS) + remove_phantom_keys(kp); + key_index = 0; + for (out = 0; out < mi->noutputs; out++) + for (in = 0; in < mi->ninputs; in++, key_index++) + report_key(kp, key_index, out, in); + report_sync(kp); + } + if (!kp->use_irq || kp->some_keys_pressed) { + hrtimer_start(timer, mi->poll_time, HRTIMER_MODE_REL); + return HRTIMER_NORESTART; + } + + /* No keys are pressed, reenable interrupt */ + for (out = 0; out < mi->noutputs; out++) { + if (gpio_keypad_flags & GPIOKPF_DRIVE_INACTIVE) + gpio_set_value(mi->output_gpios[out], polarity); + else + gpio_direction_output(mi->output_gpios[out], polarity); + } + for (in = 0; in < mi->ninputs; in++) + enable_irq(gpio_to_irq(mi->input_gpios[in])); + __pm_relax(&kp->wake_src); + return HRTIMER_NORESTART; +} + +static irqreturn_t gpio_keypad_irq_handler(int irq_in, void *dev_id) +{ + int i; + struct gpio_kp *kp = dev_id; + struct gpio_event_matrix_info *mi = kp->keypad_info; + unsigned gpio_keypad_flags = mi->flags; + + if (!kp->use_irq) { + /* ignore interrupt while registering the handler */ + kp->disabled_irq = 1; + disable_irq_nosync(irq_in); + return IRQ_HANDLED; + } + + for (i = 0; i < mi->ninputs; i++) + disable_irq_nosync(gpio_to_irq(mi->input_gpios[i])); + for (i = 0; i < mi->noutputs; i++) { + if (gpio_keypad_flags & GPIOKPF_DRIVE_INACTIVE) + gpio_set_value(mi->output_gpios[i], + !(gpio_keypad_flags & GPIOKPF_ACTIVE_HIGH)); + else + gpio_direction_input(mi->output_gpios[i]); + } + __pm_stay_awake(&kp->wake_src); + hrtimer_start(&kp->timer, ktime_set(0, 0), HRTIMER_MODE_REL); + return IRQ_HANDLED; +} + +static int gpio_keypad_request_irqs(struct gpio_kp *kp) +{ + int i; + int err; + unsigned int irq; + unsigned long request_flags; + struct gpio_event_matrix_info *mi = kp->keypad_info; + + switch (mi->flags & (GPIOKPF_ACTIVE_HIGH|GPIOKPF_LEVEL_TRIGGERED_IRQ)) { + default: + request_flags = IRQF_TRIGGER_FALLING; + break; + case GPIOKPF_ACTIVE_HIGH: + request_flags = IRQF_TRIGGER_RISING; + break; + case GPIOKPF_LEVEL_TRIGGERED_IRQ: + request_flags = IRQF_TRIGGER_LOW; + break; + case GPIOKPF_LEVEL_TRIGGERED_IRQ | GPIOKPF_ACTIVE_HIGH: + request_flags = IRQF_TRIGGER_HIGH; + break; + } + + for (i = 0; i < mi->ninputs; i++) { + err = irq = gpio_to_irq(mi->input_gpios[i]); + if (err < 0) + goto err_gpio_get_irq_num_failed; + err = request_irq(irq, gpio_keypad_irq_handler, request_flags, + "gpio_kp", kp); + if (err) { + pr_err("gpiomatrix: request_irq failed for input %d, " + "irq %d\n", mi->input_gpios[i], irq); + goto err_request_irq_failed; + } + err = enable_irq_wake(irq); + if (err) { + pr_err("gpiomatrix: set_irq_wake failed for input %d, " + "irq %d\n", mi->input_gpios[i], irq); + } + disable_irq(irq); + if (kp->disabled_irq) { + kp->disabled_irq = 0; + enable_irq(irq); + } + } + return 0; + + for (i = mi->noutputs - 1; i >= 0; i--) { + free_irq(gpio_to_irq(mi->input_gpios[i]), kp); +err_request_irq_failed: +err_gpio_get_irq_num_failed: + ; + } + return err; +} + +int gpio_event_matrix_func(struct gpio_event_input_devs *input_devs, + struct gpio_event_info *info, void **data, int func) +{ + int i; + int err; + int key_count; + struct gpio_kp *kp; + struct gpio_event_matrix_info *mi; + + mi = container_of(info, struct gpio_event_matrix_info, info); + if (func == GPIO_EVENT_FUNC_SUSPEND || func == GPIO_EVENT_FUNC_RESUME) { + /* TODO: disable scanning */ + return 0; + } + + if (func == GPIO_EVENT_FUNC_INIT) { + if (mi->keymap == NULL || + mi->input_gpios == NULL || + mi->output_gpios == NULL) { + err = -ENODEV; + pr_err("gpiomatrix: Incomplete pdata\n"); + goto err_invalid_platform_data; + } + key_count = mi->ninputs * mi->noutputs; + + *data = kp = kzalloc(sizeof(*kp) + sizeof(kp->keys_pressed[0]) * + BITS_TO_LONGS(key_count), GFP_KERNEL); + if (kp == NULL) { + err = -ENOMEM; + pr_err("gpiomatrix: Failed to allocate private data\n"); + goto err_kp_alloc_failed; + } + kp->input_devs = input_devs; + kp->keypad_info = mi; + for (i = 0; i < key_count; i++) { + unsigned short keyentry = mi->keymap[i]; + unsigned short keycode = keyentry & MATRIX_KEY_MASK; + unsigned short dev = keyentry >> MATRIX_CODE_BITS; + if (dev >= input_devs->count) { + pr_err("gpiomatrix: bad device index %d >= " + "%d for key code %d\n", + dev, input_devs->count, keycode); + err = -EINVAL; + goto err_bad_keymap; + } + if (keycode && keycode <= KEY_MAX) + input_set_capability(input_devs->dev[dev], + EV_KEY, keycode); + } + + for (i = 0; i < mi->noutputs; i++) { + err = gpio_request(mi->output_gpios[i], "gpio_kp_out"); + if (err) { + pr_err("gpiomatrix: gpio_request failed for " + "output %d\n", mi->output_gpios[i]); + goto err_request_output_gpio_failed; + } + if (gpio_cansleep(mi->output_gpios[i])) { + pr_err("gpiomatrix: unsupported output gpio %d," + " can sleep\n", mi->output_gpios[i]); + err = -EINVAL; + goto err_output_gpio_configure_failed; + } + if (mi->flags & GPIOKPF_DRIVE_INACTIVE) + err = gpio_direction_output(mi->output_gpios[i], + !(mi->flags & GPIOKPF_ACTIVE_HIGH)); + else + err = gpio_direction_input(mi->output_gpios[i]); + if (err) { + pr_err("gpiomatrix: gpio_configure failed for " + "output %d\n", mi->output_gpios[i]); + goto err_output_gpio_configure_failed; + } + } + for (i = 0; i < mi->ninputs; i++) { + err = gpio_request(mi->input_gpios[i], "gpio_kp_in"); + if (err) { + pr_err("gpiomatrix: gpio_request failed for " + "input %d\n", mi->input_gpios[i]); + goto err_request_input_gpio_failed; + } + err = gpio_direction_input(mi->input_gpios[i]); + if (err) { + pr_err("gpiomatrix: gpio_direction_input failed" + " for input %d\n", mi->input_gpios[i]); + goto err_gpio_direction_input_failed; + } + } + kp->current_output = mi->noutputs; + kp->key_state_changed = 1; + + hrtimer_init(&kp->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); + kp->timer.function = gpio_keypad_timer_func; + wakeup_source_init(&kp->wake_src, "gpio_kp"); + err = gpio_keypad_request_irqs(kp); + kp->use_irq = err == 0; + + pr_info("GPIO Matrix Keypad Driver: Start keypad matrix for " + "%s%s in %s mode\n", input_devs->dev[0]->name, + (input_devs->count > 1) ? "..." : "", + kp->use_irq ? "interrupt" : "polling"); + + if (kp->use_irq) + __pm_stay_awake(&kp->wake_src); + hrtimer_start(&kp->timer, ktime_set(0, 0), HRTIMER_MODE_REL); + + return 0; + } + + err = 0; + kp = *data; + + if (kp->use_irq) + for (i = mi->noutputs - 1; i >= 0; i--) + free_irq(gpio_to_irq(mi->input_gpios[i]), kp); + + hrtimer_cancel(&kp->timer); + wakeup_source_trash(&kp->wake_src); + for (i = mi->noutputs - 1; i >= 0; i--) { +err_gpio_direction_input_failed: + gpio_free(mi->input_gpios[i]); +err_request_input_gpio_failed: + ; + } + for (i = mi->noutputs - 1; i >= 0; i--) { +err_output_gpio_configure_failed: + gpio_free(mi->output_gpios[i]); +err_request_output_gpio_failed: + ; + } +err_bad_keymap: + kfree(kp); +err_kp_alloc_failed: +err_invalid_platform_data: + return err; +}
diff --git a/drivers/input/misc/gpio_output.c b/drivers/input/misc/gpio_output.c new file mode 100644 index 0000000..2aac2fa --- /dev/null +++ b/drivers/input/misc/gpio_output.c
@@ -0,0 +1,97 @@ +/* drivers/input/misc/gpio_output.c + * + * Copyright (C) 2007 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/kernel.h> +#include <linux/gpio.h> +#include <linux/gpio_event.h> + +int gpio_event_output_event( + struct gpio_event_input_devs *input_devs, struct gpio_event_info *info, + void **data, unsigned int dev, unsigned int type, + unsigned int code, int value) +{ + int i; + struct gpio_event_output_info *oi; + oi = container_of(info, struct gpio_event_output_info, info); + if (type != oi->type) + return 0; + if (!(oi->flags & GPIOEDF_ACTIVE_HIGH)) + value = !value; + for (i = 0; i < oi->keymap_size; i++) + if (dev == oi->keymap[i].dev && code == oi->keymap[i].code) + gpio_set_value(oi->keymap[i].gpio, value); + return 0; +} + +int gpio_event_output_func( + struct gpio_event_input_devs *input_devs, struct gpio_event_info *info, + void **data, int func) +{ + int ret; + int i; + struct gpio_event_output_info *oi; + oi = container_of(info, struct gpio_event_output_info, info); + + if (func == GPIO_EVENT_FUNC_SUSPEND || func == GPIO_EVENT_FUNC_RESUME) + return 0; + + if (func == GPIO_EVENT_FUNC_INIT) { + int output_level = !(oi->flags & GPIOEDF_ACTIVE_HIGH); + + for (i = 0; i < oi->keymap_size; i++) { + int dev = oi->keymap[i].dev; + if (dev >= input_devs->count) { + pr_err("gpio_event_output_func: bad device " + "index %d >= %d for key code %d\n", + dev, input_devs->count, + oi->keymap[i].code); + ret = -EINVAL; + goto err_bad_keymap; + } + input_set_capability(input_devs->dev[dev], oi->type, + oi->keymap[i].code); + } + + for (i = 0; i < oi->keymap_size; i++) { + ret = gpio_request(oi->keymap[i].gpio, + "gpio_event_output"); + if (ret) { + pr_err("gpio_event_output_func: gpio_request " + "failed for %d\n", oi->keymap[i].gpio); + goto err_gpio_request_failed; + } + ret = gpio_direction_output(oi->keymap[i].gpio, + output_level); + if (ret) { + pr_err("gpio_event_output_func: " + "gpio_direction_output failed for %d\n", + oi->keymap[i].gpio); + goto err_gpio_direction_output_failed; + } + } + return 0; + } + + ret = 0; + for (i = oi->keymap_size - 1; i >= 0; i--) { +err_gpio_direction_output_failed: + gpio_free(oi->keymap[i].gpio); +err_gpio_request_failed: + ; + } +err_bad_keymap: + return ret; +} +
diff --git a/drivers/input/misc/keychord.c b/drivers/input/misc/keychord.c new file mode 100644 index 0000000..82fefdf --- /dev/null +++ b/drivers/input/misc/keychord.c
@@ -0,0 +1,467 @@ +/* + * drivers/input/misc/keychord.c + * + * Copyright (C) 2008 Google, Inc. + * Author: Mike Lockwood <lockwood@android.com> + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * +*/ + +#include <linux/poll.h> +#include <linux/slab.h> +#include <linux/module.h> +#include <linux/init.h> +#include <linux/spinlock.h> +#include <linux/fs.h> +#include <linux/miscdevice.h> +#include <linux/keychord.h> +#include <linux/sched.h> + +#define KEYCHORD_NAME "keychord" +#define BUFFER_SIZE 16 + +MODULE_AUTHOR("Mike Lockwood <lockwood@android.com>"); +MODULE_DESCRIPTION("Key chord input driver"); +MODULE_SUPPORTED_DEVICE("keychord"); +MODULE_LICENSE("GPL"); + +#define NEXT_KEYCHORD(kc) ((struct input_keychord *) \ + ((char *)kc + sizeof(struct input_keychord) + \ + kc->count * sizeof(kc->keycodes[0]))) + +struct keychord_device { + struct input_handler input_handler; + int registered; + + /* list of keychords to monitor */ + struct input_keychord *keychords; + int keychord_count; + + /* bitmask of keys contained in our keychords */ + unsigned long keybit[BITS_TO_LONGS(KEY_CNT)]; + /* current state of the keys */ + unsigned long keystate[BITS_TO_LONGS(KEY_CNT)]; + /* number of keys that are currently pressed */ + int key_down; + + /* second input_device_id is needed for null termination */ + struct input_device_id device_ids[2]; + + spinlock_t lock; + wait_queue_head_t waitq; + unsigned char head; + unsigned char tail; + __u16 buff[BUFFER_SIZE]; + /* Bit to serialize writes to this device */ +#define KEYCHORD_BUSY 0x01 + unsigned long flags; + wait_queue_head_t write_waitq; +}; + +static int check_keychord(struct keychord_device *kdev, + struct input_keychord *keychord) +{ + int i; + + if (keychord->count != kdev->key_down) + return 0; + + for (i = 0; i < keychord->count; i++) { + if (!test_bit(keychord->keycodes[i], kdev->keystate)) + return 0; + } + + /* we have a match */ + return 1; +} + +static void keychord_event(struct input_handle *handle, unsigned int type, + unsigned int code, int value) +{ + struct keychord_device *kdev = handle->private; + struct input_keychord *keychord; + unsigned long flags; + int i, got_chord = 0; + + if (type != EV_KEY || code >= KEY_MAX) + return; + + spin_lock_irqsave(&kdev->lock, flags); + /* do nothing if key state did not change */ + if (!test_bit(code, kdev->keystate) == !value) + goto done; + __change_bit(code, kdev->keystate); + if (value) + kdev->key_down++; + else + kdev->key_down--; + + /* don't notify on key up */ + if (!value) + goto done; + /* ignore this event if it is not one of the keys we are monitoring */ + if (!test_bit(code, kdev->keybit)) + goto done; + + keychord = kdev->keychords; + if (!keychord) + goto done; + + /* check to see if the keyboard state matches any keychords */ + for (i = 0; i < kdev->keychord_count; i++) { + if (check_keychord(kdev, keychord)) { + kdev->buff[kdev->head] = keychord->id; + kdev->head = (kdev->head + 1) % BUFFER_SIZE; + got_chord = 1; + break; + } + /* skip to next keychord */ + keychord = NEXT_KEYCHORD(keychord); + } + +done: + spin_unlock_irqrestore(&kdev->lock, flags); + + if (got_chord) { + pr_info("keychord: got keychord id %d. Any tasks: %d\n", + keychord->id, + !list_empty_careful(&kdev->waitq.task_list)); + wake_up_interruptible(&kdev->waitq); + } +} + +static int keychord_connect(struct input_handler *handler, + struct input_dev *dev, + const struct input_device_id *id) +{ + int i, ret; + struct input_handle *handle; + struct keychord_device *kdev = + container_of(handler, struct keychord_device, input_handler); + + /* + * ignore this input device if it does not contain any keycodes + * that we are monitoring + */ + for (i = 0; i < KEY_MAX; i++) { + if (test_bit(i, kdev->keybit) && test_bit(i, dev->keybit)) + break; + } + if (i == KEY_MAX) + return -ENODEV; + + handle = kzalloc(sizeof(*handle), GFP_KERNEL); + if (!handle) + return -ENOMEM; + + handle->dev = dev; + handle->handler = handler; + handle->name = KEYCHORD_NAME; + handle->private = kdev; + + ret = input_register_handle(handle); + if (ret) + goto err_input_register_handle; + + ret = input_open_device(handle); + if (ret) + goto err_input_open_device; + + pr_info("keychord: using input dev %s for fevent\n", dev->name); + return 0; + +err_input_open_device: + input_unregister_handle(handle); +err_input_register_handle: + kfree(handle); + return ret; +} + +static void keychord_disconnect(struct input_handle *handle) +{ + input_close_device(handle); + input_unregister_handle(handle); + kfree(handle); +} + +/* + * keychord_read is used to read keychord events from the driver + */ +static ssize_t keychord_read(struct file *file, char __user *buffer, + size_t count, loff_t *ppos) +{ + struct keychord_device *kdev = file->private_data; + __u16 id; + int retval; + unsigned long flags; + + if (count < sizeof(id)) + return -EINVAL; + count = sizeof(id); + + if (kdev->head == kdev->tail && (file->f_flags & O_NONBLOCK)) + return -EAGAIN; + + retval = wait_event_interruptible(kdev->waitq, + kdev->head != kdev->tail); + if (retval) + return retval; + + spin_lock_irqsave(&kdev->lock, flags); + /* pop a keychord ID off the queue */ + id = kdev->buff[kdev->tail]; + kdev->tail = (kdev->tail + 1) % BUFFER_SIZE; + spin_unlock_irqrestore(&kdev->lock, flags); + + if (copy_to_user(buffer, &id, count)) + return -EFAULT; + + return count; +} + +/* + * serializes writes on a device. can use mutex_lock_interruptible() + * for this particular use case as well - a matter of preference. + */ +static int +keychord_write_lock(struct keychord_device *kdev) +{ + int ret; + unsigned long flags; + + spin_lock_irqsave(&kdev->lock, flags); + while (kdev->flags & KEYCHORD_BUSY) { + spin_unlock_irqrestore(&kdev->lock, flags); + ret = wait_event_interruptible(kdev->write_waitq, + ((kdev->flags & KEYCHORD_BUSY) == 0)); + if (ret) + return ret; + spin_lock_irqsave(&kdev->lock, flags); + } + kdev->flags |= KEYCHORD_BUSY; + spin_unlock_irqrestore(&kdev->lock, flags); + return 0; +} + +static void +keychord_write_unlock(struct keychord_device *kdev) +{ + unsigned long flags; + + spin_lock_irqsave(&kdev->lock, flags); + kdev->flags &= ~KEYCHORD_BUSY; + spin_unlock_irqrestore(&kdev->lock, flags); + wake_up_interruptible(&kdev->write_waitq); +} + +/* + * keychord_write is used to configure the driver + */ +static ssize_t keychord_write(struct file *file, const char __user *buffer, + size_t count, loff_t *ppos) +{ + struct keychord_device *kdev = file->private_data; + struct input_keychord *keychords = 0; + struct input_keychord *keychord; + int ret, i, key; + unsigned long flags; + size_t resid = count; + size_t key_bytes; + + if (count < sizeof(struct input_keychord) || count > PAGE_SIZE) + return -EINVAL; + keychords = kzalloc(count, GFP_KERNEL); + if (!keychords) + return -ENOMEM; + + /* read list of keychords from userspace */ + if (copy_from_user(keychords, buffer, count)) { + kfree(keychords); + return -EFAULT; + } + + /* + * Serialize writes to this device to prevent various races. + * 1) writers racing here could do duplicate input_unregister_handler() + * calls, resulting in attempting to unlink a node from a list that + * does not exist. + * 2) writers racing here could do duplicate input_register_handler() calls + * below, resulting in a duplicate insertion of a node into the list. + * 3) a double kfree of keychords can occur (in the event that + * input_register_handler() fails below. + */ + ret = keychord_write_lock(kdev); + if (ret) { + kfree(keychords); + return ret; + } + + /* unregister handler before changing configuration */ + if (kdev->registered) { + input_unregister_handler(&kdev->input_handler); + kdev->registered = 0; + } + + spin_lock_irqsave(&kdev->lock, flags); + /* clear any existing configuration */ + kfree(kdev->keychords); + kdev->keychords = 0; + kdev->keychord_count = 0; + kdev->key_down = 0; + memset(kdev->keybit, 0, sizeof(kdev->keybit)); + memset(kdev->keystate, 0, sizeof(kdev->keystate)); + kdev->head = kdev->tail = 0; + + keychord = keychords; + + while (resid > 0) { + /* Is the entire keychord entry header present ? */ + if (resid < sizeof(struct input_keychord)) { + pr_err("keychord: Insufficient bytes present for header %zu\n", + resid); + goto err_unlock_return; + } + resid -= sizeof(struct input_keychord); + if (keychord->count <= 0) { + pr_err("keychord: invalid keycode count %d\n", + keychord->count); + goto err_unlock_return; + } + key_bytes = keychord->count * sizeof(keychord->keycodes[0]); + /* Do we have all the expected keycodes ? */ + if (resid < key_bytes) { + pr_err("keychord: Insufficient bytes present for keycount %zu\n", + resid); + goto err_unlock_return; + } + resid -= key_bytes; + + if (keychord->version != KEYCHORD_VERSION) { + pr_err("keychord: unsupported version %d\n", + keychord->version); + goto err_unlock_return; + } + + /* keep track of the keys we are monitoring in keybit */ + for (i = 0; i < keychord->count; i++) { + key = keychord->keycodes[i]; + if (key < 0 || key >= KEY_CNT) { + pr_err("keychord: keycode %d out of range\n", + key); + goto err_unlock_return; + } + __set_bit(key, kdev->keybit); + } + + kdev->keychord_count++; + keychord = NEXT_KEYCHORD(keychord); + } + + kdev->keychords = keychords; + spin_unlock_irqrestore(&kdev->lock, flags); + + ret = input_register_handler(&kdev->input_handler); + if (ret) { + kfree(keychords); + kdev->keychords = 0; + keychord_write_unlock(kdev); + return ret; + } + kdev->registered = 1; + + keychord_write_unlock(kdev); + + return count; + +err_unlock_return: + spin_unlock_irqrestore(&kdev->lock, flags); + kfree(keychords); + keychord_write_unlock(kdev); + return -EINVAL; +} + +static unsigned int keychord_poll(struct file *file, poll_table *wait) +{ + struct keychord_device *kdev = file->private_data; + + poll_wait(file, &kdev->waitq, wait); + + if (kdev->head != kdev->tail) + return POLLIN | POLLRDNORM; + + return 0; +} + +static int keychord_open(struct inode *inode, struct file *file) +{ + struct keychord_device *kdev; + + kdev = kzalloc(sizeof(struct keychord_device), GFP_KERNEL); + if (!kdev) + return -ENOMEM; + + spin_lock_init(&kdev->lock); + init_waitqueue_head(&kdev->waitq); + init_waitqueue_head(&kdev->write_waitq); + + kdev->input_handler.event = keychord_event; + kdev->input_handler.connect = keychord_connect; + kdev->input_handler.disconnect = keychord_disconnect; + kdev->input_handler.name = KEYCHORD_NAME; + kdev->input_handler.id_table = kdev->device_ids; + + kdev->device_ids[0].flags = INPUT_DEVICE_ID_MATCH_EVBIT; + __set_bit(EV_KEY, kdev->device_ids[0].evbit); + + file->private_data = kdev; + + return 0; +} + +static int keychord_release(struct inode *inode, struct file *file) +{ + struct keychord_device *kdev = file->private_data; + + if (kdev->registered) + input_unregister_handler(&kdev->input_handler); + kfree(kdev->keychords); + kfree(kdev); + + return 0; +} + +static const struct file_operations keychord_fops = { + .owner = THIS_MODULE, + .open = keychord_open, + .release = keychord_release, + .read = keychord_read, + .write = keychord_write, + .poll = keychord_poll, +}; + +static struct miscdevice keychord_misc = { + .fops = &keychord_fops, + .name = KEYCHORD_NAME, + .minor = MISC_DYNAMIC_MINOR, +}; + +static int __init keychord_init(void) +{ + return misc_register(&keychord_misc); +} + +static void __exit keychord_exit(void) +{ + misc_deregister(&keychord_misc); +} + +module_init(keychord_init); +module_exit(keychord_exit);
diff --git a/drivers/isdn/hardware/mISDN/avmfritz.c b/drivers/isdn/hardware/mISDN/avmfritz.c index e3fa1cd..a57b04f 100644 --- a/drivers/isdn/hardware/mISDN/avmfritz.c +++ b/drivers/isdn/hardware/mISDN/avmfritz.c
@@ -156,7 +156,7 @@ _set_debug(struct fritzcard *card) } static int -set_debug(const char *val, struct kernel_param *kp) +set_debug(const char *val, const struct kernel_param *kp) { int ret; struct fritzcard *card;
diff --git a/drivers/isdn/hardware/mISDN/mISDNinfineon.c b/drivers/isdn/hardware/mISDN/mISDNinfineon.c index d5bdbaf..1fc2906 100644 --- a/drivers/isdn/hardware/mISDN/mISDNinfineon.c +++ b/drivers/isdn/hardware/mISDN/mISDNinfineon.c
@@ -244,7 +244,7 @@ _set_debug(struct inf_hw *card) } static int -set_debug(const char *val, struct kernel_param *kp) +set_debug(const char *val, const struct kernel_param *kp) { int ret; struct inf_hw *card;
diff --git a/drivers/isdn/hardware/mISDN/netjet.c b/drivers/isdn/hardware/mISDN/netjet.c index afde4ed..e9fcae4 100644 --- a/drivers/isdn/hardware/mISDN/netjet.c +++ b/drivers/isdn/hardware/mISDN/netjet.c
@@ -111,7 +111,7 @@ _set_debug(struct tiger_hw *card) } static int -set_debug(const char *val, struct kernel_param *kp) +set_debug(const char *val, const struct kernel_param *kp) { int ret; struct tiger_hw *card;
diff --git a/drivers/isdn/hardware/mISDN/speedfax.c b/drivers/isdn/hardware/mISDN/speedfax.c index 9815bb4..1f1446e 100644 --- a/drivers/isdn/hardware/mISDN/speedfax.c +++ b/drivers/isdn/hardware/mISDN/speedfax.c
@@ -94,7 +94,7 @@ _set_debug(struct sfax_hw *card) } static int -set_debug(const char *val, struct kernel_param *kp) +set_debug(const char *val, const struct kernel_param *kp) { int ret; struct sfax_hw *card;
diff --git a/drivers/isdn/hardware/mISDN/w6692.c b/drivers/isdn/hardware/mISDN/w6692.c index 3b067ea..0db6783 100644 --- a/drivers/isdn/hardware/mISDN/w6692.c +++ b/drivers/isdn/hardware/mISDN/w6692.c
@@ -101,7 +101,7 @@ _set_debug(struct w6692_hw *card) } static int -set_debug(const char *val, struct kernel_param *kp) +set_debug(const char *val, const struct kernel_param *kp) { int ret; struct w6692_hw *card;
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig index 197e29d..ebdb791 100644 --- a/drivers/md/Kconfig +++ b/drivers/md/Kconfig
@@ -459,6 +459,21 @@ If unsure, say N. +config DM_VERITY_HASH_PREFETCH_MIN_SIZE_128 + bool "Prefetch size 128" + +config DM_VERITY_HASH_PREFETCH_MIN_SIZE + int "Verity hash prefetch minimum size" + depends on DM_VERITY + range 1 4096 + default 128 if DM_VERITY_HASH_PREFETCH_MIN_SIZE_128 + default 1 + ---help--- + This sets minimum number of hash blocks to prefetch for dm-verity. + For devices like eMMC, having larger prefetch size like 128 can improve + performance with increased memory consumption for keeping more hashes + in RAM. + config DM_VERITY_FEC bool "Verity forward error correction support" depends on DM_VERITY @@ -501,4 +516,53 @@ If unsure, say N. +config DM_VERITY_AVB + tristate "Support AVB specific verity error behavior" + depends on DM_VERITY + ---help--- + Enables Android Verified Boot platform-specific error + behavior. In particular, it will modify the vbmeta partition + specified on the kernel command-line when non-transient error + occurs (followed by a panic). + + If unsure, say N. + +config DM_ANDROID_VERITY + bool "Android verity target support" + depends on BLK_DEV_DM=y + depends on DM_VERITY=y + depends on X509_CERTIFICATE_PARSER + depends on SYSTEM_TRUSTED_KEYRING + depends on CRYPTO_RSA + depends on KEYS + depends on ASYMMETRIC_KEY_TYPE + depends on ASYMMETRIC_PUBLIC_KEY_SUBTYPE + select DM_VERITY_HASH_PREFETCH_MIN_SIZE_128 + ---help--- + This device-mapper target is virtually a VERITY target. This + target is setup by reading the metadata contents piggybacked + to the actual data blocks in the block device. The signature + of the metadata contents are verified against the key included + in the system keyring. Upon success, the underlying verity + target is setup. + +config DM_ANDROID_VERITY_AT_MOST_ONCE_DEFAULT_ENABLED + bool "Verity will validate blocks at most once" + depends on DM_VERITY + ---help--- + Default enables at_most_once option for dm-verity + + Verify data blocks only the first time they are read from the + data device, rather than every time. This reduces the overhead + of dm-verity so that it can be used on systems that are memory + and/or CPU constrained. However, it provides a reduced level + of security because only offline tampering of the data device's + content will be detected, not online tampering. + + Hash blocks are still verified each time they are read from the + hash device, since verification of hash blocks is less performance + critical than data blocks, and a hash block will not be verified + any more after all the data blocks it covers have been verified anyway. + + If unsure, say N. endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile index 3cbda1a..63c3170 100644 --- a/drivers/md/Makefile +++ b/drivers/md/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_DM_CACHE_CLEANER) += dm-cache-cleaner.o obj-$(CONFIG_DM_ERA) += dm-era.o obj-$(CONFIG_DM_LOG_WRITES) += dm-log-writes.o +obj-$(CONFIG_DM_ANDROID_VERITY) += dm-android-verity.o ifeq ($(CONFIG_DM_UEVENT),y) dm-mod-objs += dm-uevent.o @@ -67,3 +68,7 @@ ifeq ($(CONFIG_DM_VERITY_FEC),y) dm-verity-objs += dm-verity-fec.o endif + +ifeq ($(CONFIG_DM_VERITY_AVB),y) +dm-verity-objs += dm-verity-avb.o +endif
diff --git a/drivers/md/dm-android-verity.c b/drivers/md/dm-android-verity.c new file mode 100644 index 0000000..f9491de --- /dev/null +++ b/drivers/md/dm-android-verity.c
@@ -0,0 +1,923 @@ +/* + * Copyright (C) 2015 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/buffer_head.h> +#include <linux/debugfs.h> +#include <linux/delay.h> +#include <linux/device.h> +#include <linux/device-mapper.h> +#include <linux/errno.h> +#include <linux/fs.h> +#include <linux/fcntl.h> +#include <linux/init.h> +#include <linux/kernel.h> +#include <linux/key.h> +#include <linux/module.h> +#include <linux/mount.h> +#include <linux/namei.h> +#include <linux/of.h> +#include <linux/reboot.h> +#include <linux/string.h> +#include <linux/vmalloc.h> + +#include <asm/setup.h> +#include <crypto/hash.h> +#include <crypto/hash_info.h> +#include <crypto/public_key.h> +#include <crypto/sha.h> +#include <keys/asymmetric-type.h> +#include <keys/system_keyring.h> + +#include "dm-verity.h" +#include "dm-android-verity.h" + +static char verifiedbootstate[VERITY_COMMANDLINE_PARAM_LENGTH]; +static char veritymode[VERITY_COMMANDLINE_PARAM_LENGTH]; +static char veritykeyid[VERITY_DEFAULT_KEY_ID_LENGTH]; +static char buildvariant[BUILD_VARIANT]; + +static bool target_added; +static bool verity_enabled = true; +struct dentry *debug_dir; +static int android_verity_ctr(struct dm_target *ti, unsigned argc, char **argv); + +static struct target_type android_verity_target = { + .name = "android-verity", + .version = {1, 0, 0}, + .module = THIS_MODULE, + .ctr = android_verity_ctr, + .dtr = verity_dtr, + .map = verity_map, + .status = verity_status, + .prepare_ioctl = verity_prepare_ioctl, + .iterate_devices = verity_iterate_devices, + .io_hints = verity_io_hints, +}; + +static int __init verified_boot_state_param(char *line) +{ + strlcpy(verifiedbootstate, line, sizeof(verifiedbootstate)); + return 1; +} + +__setup("androidboot.verifiedbootstate=", verified_boot_state_param); + +static int __init verity_mode_param(char *line) +{ + strlcpy(veritymode, line, sizeof(veritymode)); + return 1; +} + +__setup("androidboot.veritymode=", verity_mode_param); + +static int __init verity_keyid_param(char *line) +{ + strlcpy(veritykeyid, line, sizeof(veritykeyid)); + return 1; +} + +__setup("veritykeyid=", verity_keyid_param); + +static int __init verity_buildvariant(char *line) +{ + strlcpy(buildvariant, line, sizeof(buildvariant)); + return 1; +} + +__setup("buildvariant=", verity_buildvariant); + +static inline bool default_verity_key_id(void) +{ + return veritykeyid[0] != '\0'; +} + +static inline bool is_eng(void) +{ + static const char typeeng[] = "eng"; + + return !strncmp(buildvariant, typeeng, sizeof(typeeng)); +} + +static inline bool is_userdebug(void) +{ + static const char typeuserdebug[] = "userdebug"; + + return !strncmp(buildvariant, typeuserdebug, sizeof(typeuserdebug)); +} + +static inline bool is_unlocked(void) +{ + static const char unlocked[] = "orange"; + + return !strncmp(verifiedbootstate, unlocked, sizeof(unlocked)); +} + +static int read_block_dev(struct bio_read *payload, struct block_device *bdev, + sector_t offset, int length) +{ + struct bio *bio; + int err = 0, i; + + payload->number_of_pages = DIV_ROUND_UP(length, PAGE_SIZE); + + bio = bio_alloc(GFP_KERNEL, payload->number_of_pages); + if (!bio) { + DMERR("Error while allocating bio"); + return -ENOMEM; + } + + bio->bi_bdev = bdev; + bio->bi_iter.bi_sector = offset; + bio_set_op_attrs(bio, REQ_OP_READ, READ_SYNC); + + payload->page_io = kzalloc(sizeof(struct page *) * + payload->number_of_pages, GFP_KERNEL); + if (!payload->page_io) { + DMERR("page_io array alloc failed"); + err = -ENOMEM; + goto free_bio; + } + + for (i = 0; i < payload->number_of_pages; i++) { + payload->page_io[i] = alloc_page(GFP_KERNEL); + if (!payload->page_io[i]) { + DMERR("alloc_page failed"); + err = -ENOMEM; + goto free_pages; + } + if (!bio_add_page(bio, payload->page_io[i], PAGE_SIZE, 0)) { + DMERR("bio_add_page error"); + err = -EIO; + goto free_pages; + } + } + + if (!submit_bio_wait(bio)) + /* success */ + goto free_bio; + DMERR("bio read failed"); + err = -EIO; + +free_pages: + for (i = 0; i < payload->number_of_pages; i++) + if (payload->page_io[i]) + __free_page(payload->page_io[i]); + kfree(payload->page_io); +free_bio: + bio_put(bio); + return err; +} + +static inline u64 fec_div_round_up(u64 x, u64 y) +{ + u64 remainder; + + return div64_u64_rem(x, y, &remainder) + + (remainder > 0 ? 1 : 0); +} + +static inline void populate_fec_metadata(struct fec_header *header, + struct fec_ecc_metadata *ecc) +{ + ecc->blocks = fec_div_round_up(le64_to_cpu(header->inp_size), + FEC_BLOCK_SIZE); + ecc->roots = le32_to_cpu(header->roots); + ecc->start = le64_to_cpu(header->inp_size); +} + +static inline int validate_fec_header(struct fec_header *header, u64 offset) +{ + /* move offset to make the sanity check work for backup header + * as well. */ + offset -= offset % FEC_BLOCK_SIZE; + if (le32_to_cpu(header->magic) != FEC_MAGIC || + le32_to_cpu(header->version) != FEC_VERSION || + le32_to_cpu(header->size) != sizeof(struct fec_header) || + le32_to_cpu(header->roots) == 0 || + le32_to_cpu(header->roots) >= FEC_RSM) + return -EINVAL; + + return 0; +} + +static int extract_fec_header(dev_t dev, struct fec_header *fec, + struct fec_ecc_metadata *ecc) +{ + u64 device_size; + struct bio_read payload; + int i, err = 0; + struct block_device *bdev; + + bdev = blkdev_get_by_dev(dev, FMODE_READ, NULL); + + if (IS_ERR_OR_NULL(bdev)) { + DMERR("bdev get error"); + return PTR_ERR(bdev); + } + + device_size = i_size_read(bdev->bd_inode); + + /* fec metadata size is a power of 2 and PAGE_SIZE + * is a power of 2 as well. + */ + BUG_ON(FEC_BLOCK_SIZE > PAGE_SIZE); + /* 512 byte sector alignment */ + BUG_ON(((device_size - FEC_BLOCK_SIZE) % (1 << SECTOR_SHIFT)) != 0); + + err = read_block_dev(&payload, bdev, (device_size - + FEC_BLOCK_SIZE) / (1 << SECTOR_SHIFT), FEC_BLOCK_SIZE); + if (err) { + DMERR("Error while reading verity metadata"); + goto error; + } + + BUG_ON(sizeof(struct fec_header) > PAGE_SIZE); + memcpy(fec, page_address(payload.page_io[0]), + sizeof(*fec)); + + ecc->valid = true; + if (validate_fec_header(fec, device_size - FEC_BLOCK_SIZE)) { + /* Try the backup header */ + memcpy(fec, page_address(payload.page_io[0]) + FEC_BLOCK_SIZE + - sizeof(*fec) , + sizeof(*fec)); + if (validate_fec_header(fec, device_size - + sizeof(struct fec_header))) + ecc->valid = false; + } + + if (ecc->valid) + populate_fec_metadata(fec, ecc); + + for (i = 0; i < payload.number_of_pages; i++) + __free_page(payload.page_io[i]); + kfree(payload.page_io); + +error: + blkdev_put(bdev, FMODE_READ); + return err; +} +static void find_metadata_offset(struct fec_header *fec, + struct block_device *bdev, u64 *metadata_offset) +{ + u64 device_size; + + device_size = i_size_read(bdev->bd_inode); + + if (le32_to_cpu(fec->magic) == FEC_MAGIC) + *metadata_offset = le64_to_cpu(fec->inp_size) - + VERITY_METADATA_SIZE; + else + *metadata_offset = device_size - VERITY_METADATA_SIZE; +} + +static int find_size(dev_t dev, u64 *device_size) +{ + struct block_device *bdev; + + bdev = blkdev_get_by_dev(dev, FMODE_READ, NULL); + if (IS_ERR_OR_NULL(bdev)) { + DMERR("blkdev_get_by_dev failed"); + return PTR_ERR(bdev); + } + + *device_size = i_size_read(bdev->bd_inode); + *device_size >>= SECTOR_SHIFT; + + DMINFO("blkdev size in sectors: %llu", *device_size); + blkdev_put(bdev, FMODE_READ); + return 0; +} + +static int verify_header(struct android_metadata_header *header) +{ + int retval = -EINVAL; + + if (is_userdebug() && le32_to_cpu(header->magic_number) == + VERITY_METADATA_MAGIC_DISABLE) + return VERITY_STATE_DISABLE; + + if (!(le32_to_cpu(header->magic_number) == + VERITY_METADATA_MAGIC_NUMBER) || + (le32_to_cpu(header->magic_number) == + VERITY_METADATA_MAGIC_DISABLE)) { + DMERR("Incorrect magic number"); + return retval; + } + + if (le32_to_cpu(header->protocol_version) != + VERITY_METADATA_VERSION) { + DMERR("Unsupported version %u", + le32_to_cpu(header->protocol_version)); + return retval; + } + + return 0; +} + +static int extract_metadata(dev_t dev, struct fec_header *fec, + struct android_metadata **metadata, + bool *verity_enabled) +{ + struct block_device *bdev; + struct android_metadata_header *header; + int i; + u32 table_length, copy_length, offset; + u64 metadata_offset; + struct bio_read payload; + int err = 0; + + bdev = blkdev_get_by_dev(dev, FMODE_READ, NULL); + + if (IS_ERR_OR_NULL(bdev)) { + DMERR("blkdev_get_by_dev failed"); + return -ENODEV; + } + + find_metadata_offset(fec, bdev, &metadata_offset); + + /* Verity metadata size is a power of 2 and PAGE_SIZE + * is a power of 2 as well. + * PAGE_SIZE is also a multiple of 512 bytes. + */ + if (VERITY_METADATA_SIZE > PAGE_SIZE) + BUG_ON(VERITY_METADATA_SIZE % PAGE_SIZE != 0); + /* 512 byte sector alignment */ + BUG_ON(metadata_offset % (1 << SECTOR_SHIFT) != 0); + + err = read_block_dev(&payload, bdev, metadata_offset / + (1 << SECTOR_SHIFT), VERITY_METADATA_SIZE); + if (err) { + DMERR("Error while reading verity metadata"); + goto blkdev_release; + } + + header = kzalloc(sizeof(*header), GFP_KERNEL); + if (!header) { + DMERR("kzalloc failed for header"); + err = -ENOMEM; + goto free_payload; + } + + memcpy(header, page_address(payload.page_io[0]), + sizeof(*header)); + + DMINFO("bio magic_number:%u protocol_version:%d table_length:%u", + le32_to_cpu(header->magic_number), + le32_to_cpu(header->protocol_version), + le32_to_cpu(header->table_length)); + + err = verify_header(header); + + if (err == VERITY_STATE_DISABLE) { + DMERR("Mounting root with verity disabled"); + *verity_enabled = false; + /* we would still have to read the metadata to figure out + * the data blocks size. Or may be could map the entire + * partition similar to mounting the device. + * + * Reset error as well as the verity_enabled flag is changed. + */ + err = 0; + } else if (err) + goto free_header; + + *metadata = kzalloc(sizeof(**metadata), GFP_KERNEL); + if (!*metadata) { + DMERR("kzalloc for metadata failed"); + err = -ENOMEM; + goto free_header; + } + + (*metadata)->header = header; + table_length = le32_to_cpu(header->table_length); + + if (table_length == 0 || + table_length > (VERITY_METADATA_SIZE - + sizeof(struct android_metadata_header))) { + DMERR("table_length too long"); + err = -EINVAL; + goto free_metadata; + } + + (*metadata)->verity_table = kzalloc(table_length + 1, GFP_KERNEL); + + if (!(*metadata)->verity_table) { + DMERR("kzalloc verity_table failed"); + err = -ENOMEM; + goto free_metadata; + } + + if (sizeof(struct android_metadata_header) + + table_length <= PAGE_SIZE) { + memcpy((*metadata)->verity_table, + page_address(payload.page_io[0]) + + sizeof(struct android_metadata_header), + table_length); + } else { + copy_length = PAGE_SIZE - + sizeof(struct android_metadata_header); + memcpy((*metadata)->verity_table, + page_address(payload.page_io[0]) + + sizeof(struct android_metadata_header), + copy_length); + table_length -= copy_length; + offset = copy_length; + i = 1; + while (table_length != 0) { + if (table_length > PAGE_SIZE) { + memcpy((*metadata)->verity_table + offset, + page_address(payload.page_io[i]), + PAGE_SIZE); + offset += PAGE_SIZE; + table_length -= PAGE_SIZE; + } else { + memcpy((*metadata)->verity_table + offset, + page_address(payload.page_io[i]), + table_length); + table_length = 0; + } + i++; + } + } + (*metadata)->verity_table[table_length] = '\0'; + + DMINFO("verity_table: %s", (*metadata)->verity_table); + goto free_payload; + +free_metadata: + kfree(*metadata); +free_header: + kfree(header); +free_payload: + for (i = 0; i < payload.number_of_pages; i++) + if (payload.page_io[i]) + __free_page(payload.page_io[i]); + kfree(payload.page_io); +blkdev_release: + blkdev_put(bdev, FMODE_READ); + return err; +} + +/* helper functions to extract properties from dts */ +const char *find_dt_value(const char *name) +{ + struct device_node *firmware; + const char *value; + + firmware = of_find_node_by_path("/firmware/android"); + if (!firmware) + return NULL; + value = of_get_property(firmware, name, NULL); + of_node_put(firmware); + + return value; +} + +static int verity_mode(void) +{ + static const char enforcing[] = "enforcing"; + static const char verified_mode_prop[] = "veritymode"; + const char *value; + + value = find_dt_value(verified_mode_prop); + if (!value) + value = veritymode; + if (!strncmp(value, enforcing, sizeof(enforcing) - 1)) + return DM_VERITY_MODE_RESTART; + + return DM_VERITY_MODE_EIO; +} + +static void handle_error(void) +{ + int mode = verity_mode(); + if (mode == DM_VERITY_MODE_RESTART) { + DMERR("triggering restart"); + kernel_restart("dm-verity device corrupted"); + } else { + DMERR("Mounting verity root failed"); + } +} + +static struct public_key_signature *table_make_digest( + enum hash_algo hash, + const void *table, + unsigned long table_len) +{ + struct public_key_signature *pks = NULL; + struct crypto_shash *tfm; + struct shash_desc *desc; + size_t digest_size, desc_size; + int ret; + + /* Allocate the hashing algorithm we're going to need and find out how + * big the hash operational data will be. + */ + tfm = crypto_alloc_shash(hash_algo_name[hash], 0, 0); + if (IS_ERR(tfm)) + return ERR_CAST(tfm); + + desc_size = crypto_shash_descsize(tfm) + sizeof(*desc); + digest_size = crypto_shash_digestsize(tfm); + + /* We allocate the hash operational data storage on the end of out + * context data and the digest output buffer on the end of that. + */ + ret = -ENOMEM; + pks = kzalloc(digest_size + sizeof(*pks) + desc_size, GFP_KERNEL); + if (!pks) + goto error; + + pks->pkey_algo = "rsa"; + pks->hash_algo = hash_algo_name[hash]; + pks->digest = (u8 *)pks + sizeof(*pks) + desc_size; + pks->digest_size = digest_size; + + desc = (struct shash_desc *)(pks + 1); + desc->tfm = tfm; + desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP; + + ret = crypto_shash_init(desc); + if (ret < 0) + goto error; + + ret = crypto_shash_finup(desc, table, table_len, pks->digest); + if (ret < 0) + goto error; + + crypto_free_shash(tfm); + return pks; + +error: + kfree(pks); + crypto_free_shash(tfm); + return ERR_PTR(ret); +} + + +static int verify_verity_signature(char *key_id, + struct android_metadata *metadata) +{ + struct public_key_signature *pks = NULL; + int retval = -EINVAL; + + if (!key_id) + goto error; + + pks = table_make_digest(HASH_ALGO_SHA256, + (const void *)metadata->verity_table, + le32_to_cpu(metadata->header->table_length)); + if (IS_ERR(pks)) { + DMERR("hashing failed"); + retval = PTR_ERR(pks); + pks = NULL; + goto error; + } + + pks->s = kmemdup(&metadata->header->signature[0], RSANUMBYTES, GFP_KERNEL); + if (!pks->s) { + DMERR("Error allocating memory for signature"); + goto error; + } + pks->s_size = RSANUMBYTES; + + retval = verify_signature_one(pks, NULL, key_id); + kfree(pks->s); +error: + kfree(pks); + return retval; +} + +static inline bool test_mult_overflow(sector_t a, u32 b) +{ + sector_t r = (sector_t)~0ULL; + + sector_div(r, b); + return a > r; +} + +static int add_as_linear_device(struct dm_target *ti, char *dev) +{ + /*Move to linear mapping defines*/ + char *linear_table_args[DM_LINEAR_ARGS] = {dev, + DM_LINEAR_TARGET_OFFSET}; + int err = 0; + + android_verity_target.dtr = dm_linear_dtr, + android_verity_target.map = dm_linear_map, + android_verity_target.status = dm_linear_status, + android_verity_target.prepare_ioctl = dm_linear_prepare_ioctl, + android_verity_target.iterate_devices = dm_linear_iterate_devices, + android_verity_target.direct_access = dm_linear_direct_access, + android_verity_target.io_hints = NULL; + + set_disk_ro(dm_disk(dm_table_get_md(ti->table)), 0); + + err = dm_linear_ctr(ti, DM_LINEAR_ARGS, linear_table_args); + + if (!err) { + DMINFO("Added android-verity as a linear target"); + target_added = true; + } else + DMERR("Failed to add android-verity as linear target"); + + return err; +} + +static int create_linear_device(struct dm_target *ti, dev_t dev, + char *target_device) +{ + u64 device_size = 0; + int err = find_size(dev, &device_size); + + if (err) { + DMERR("error finding bdev size"); + handle_error(); + return err; + } + + ti->len = device_size; + err = add_as_linear_device(ti, target_device); + if (err) { + handle_error(); + return err; + } + verity_enabled = false; + return 0; +} + +/* + * Target parameters: + * <key id> Key id of the public key in the system keyring. + * Verity metadata's signature would be verified against + * this. If the key id contains spaces, replace them + * with '#'. + * <block device> The block device for which dm-verity is being setup. + */ +static int android_verity_ctr(struct dm_target *ti, unsigned argc, char **argv) +{ + dev_t uninitialized_var(dev); + struct android_metadata *metadata = NULL; + int err = 0, i, mode; + char *key_id = NULL, *table_ptr, dummy, *target_device; + char *verity_table_args[VERITY_TABLE_ARGS + 2 + VERITY_TABLE_OPT_FEC_ARGS]; + /* One for specifying number of opt args and one for mode */ + sector_t data_sectors; + u32 data_block_size; + unsigned int no_of_args = VERITY_TABLE_ARGS + 2 + VERITY_TABLE_OPT_FEC_ARGS; + struct fec_header uninitialized_var(fec); + struct fec_ecc_metadata uninitialized_var(ecc); + char buf[FEC_ARG_LENGTH], *buf_ptr; + unsigned long long tmpll; + + if (argc == 1) { + /* Use the default keyid */ + if (default_verity_key_id()) + key_id = veritykeyid; + else if (!is_eng()) { + DMERR("veritykeyid= is not set"); + handle_error(); + return -EINVAL; + } + target_device = argv[0]; + } else if (argc == 2) { + key_id = argv[0]; + target_device = argv[1]; + } else { + DMERR("Incorrect number of arguments"); + handle_error(); + return -EINVAL; + } + + dev = name_to_dev_t(target_device); + if (!dev) { + DMERR("no dev found for %s", target_device); + handle_error(); + return -EINVAL; + } + + if (is_eng()) + return create_linear_device(ti, dev, target_device); + + strreplace(key_id, '#', ' '); + + DMINFO("key:%s dev:%s", key_id, target_device); + + if (extract_fec_header(dev, &fec, &ecc)) { + DMERR("Error while extracting fec header"); + handle_error(); + return -EINVAL; + } + + err = extract_metadata(dev, &fec, &metadata, &verity_enabled); + + if (err) { + /* Allow invalid metadata when the device is unlocked */ + if (is_unlocked()) { + DMWARN("Allow invalid metadata when unlocked"); + return create_linear_device(ti, dev, target_device); + } + DMERR("Error while extracting metadata"); + handle_error(); + goto free_metadata; + } + + if (verity_enabled) { + err = verify_verity_signature(key_id, metadata); + + if (err) { + DMERR("Signature verification failed"); + handle_error(); + goto free_metadata; + } else + DMINFO("Signature verification success"); + } + + table_ptr = metadata->verity_table; + + for (i = 0; i < VERITY_TABLE_ARGS; i++) { + verity_table_args[i] = strsep(&table_ptr, " "); + if (verity_table_args[i] == NULL) + break; + } + + if (i != VERITY_TABLE_ARGS) { + DMERR("Verity table not in the expected format"); + err = -EINVAL; + handle_error(); + goto free_metadata; + } + + if (sscanf(verity_table_args[5], "%llu%c", &tmpll, &dummy) + != 1) { + DMERR("Verity table not in the expected format"); + handle_error(); + err = -EINVAL; + goto free_metadata; + } + + if (tmpll > ULONG_MAX) { + DMERR("<num_data_blocks> too large. Forgot to turn on CONFIG_LBDAF?"); + handle_error(); + err = -EINVAL; + goto free_metadata; + } + + data_sectors = tmpll; + + if (sscanf(verity_table_args[3], "%u%c", &data_block_size, &dummy) + != 1) { + DMERR("Verity table not in the expected format"); + handle_error(); + err = -EINVAL; + goto free_metadata; + } + + if (test_mult_overflow(data_sectors, data_block_size >> + SECTOR_SHIFT)) { + DMERR("data_sectors too large"); + handle_error(); + err = -EOVERFLOW; + goto free_metadata; + } + + data_sectors *= data_block_size >> SECTOR_SHIFT; + DMINFO("Data sectors %llu", (unsigned long long)data_sectors); + + /* update target length */ + ti->len = data_sectors; + + /* Setup linear target and free */ + if (!verity_enabled) { + err = add_as_linear_device(ti, target_device); + goto free_metadata; + } + + /*substitute data_dev and hash_dev*/ + verity_table_args[1] = target_device; + verity_table_args[2] = target_device; + + mode = verity_mode(); + + if (ecc.valid && IS_BUILTIN(CONFIG_DM_VERITY_FEC)) { + if (mode) { + err = snprintf(buf, FEC_ARG_LENGTH, + "%u %s " VERITY_TABLE_OPT_FEC_FORMAT, + 1 + VERITY_TABLE_OPT_FEC_ARGS, + mode == DM_VERITY_MODE_RESTART ? + VERITY_TABLE_OPT_RESTART : + VERITY_TABLE_OPT_LOGGING, + target_device, + ecc.start / FEC_BLOCK_SIZE, ecc.blocks, + ecc.roots); + } else { + err = snprintf(buf, FEC_ARG_LENGTH, + "%u " VERITY_TABLE_OPT_FEC_FORMAT, + VERITY_TABLE_OPT_FEC_ARGS, target_device, + ecc.start / FEC_BLOCK_SIZE, ecc.blocks, + ecc.roots); + } + } else if (mode) { + err = snprintf(buf, FEC_ARG_LENGTH, + "2 " VERITY_TABLE_OPT_IGNZERO " %s", + mode == DM_VERITY_MODE_RESTART ? + VERITY_TABLE_OPT_RESTART : VERITY_TABLE_OPT_LOGGING); + } else { + err = snprintf(buf, FEC_ARG_LENGTH, "1 %s", + "ignore_zero_blocks"); + } + + if (err < 0 || err >= FEC_ARG_LENGTH) + goto free_metadata; + + buf_ptr = buf; + + for (i = VERITY_TABLE_ARGS; i < (VERITY_TABLE_ARGS + + VERITY_TABLE_OPT_FEC_ARGS + 2); i++) { + verity_table_args[i] = strsep(&buf_ptr, " "); + if (verity_table_args[i] == NULL) { + no_of_args = i; + break; + } + } + + err = verity_ctr(ti, no_of_args, verity_table_args); + if (err) { + DMERR("android-verity failed to create a verity target"); + } else { + target_added = true; + DMINFO("android-verity created as verity target"); + } + +free_metadata: + if (metadata) { + kfree(metadata->header); + kfree(metadata->verity_table); + } + kfree(metadata); + return err; +} + +static int __init dm_android_verity_init(void) +{ + int r; + struct dentry *file; + + r = dm_register_target(&android_verity_target); + if (r < 0) + DMERR("register failed %d", r); + + /* Tracks the status of the last added target */ + debug_dir = debugfs_create_dir("android_verity", NULL); + + if (IS_ERR_OR_NULL(debug_dir)) { + DMERR("Cannot create android_verity debugfs directory: %ld", + PTR_ERR(debug_dir)); + goto end; + } + + file = debugfs_create_bool("target_added", S_IRUGO, debug_dir, + &target_added); + + if (IS_ERR_OR_NULL(file)) { + DMERR("Cannot create android_verity debugfs directory: %ld", + PTR_ERR(debug_dir)); + debugfs_remove_recursive(debug_dir); + goto end; + } + + file = debugfs_create_bool("verity_enabled", S_IRUGO, debug_dir, + &verity_enabled); + + if (IS_ERR_OR_NULL(file)) { + DMERR("Cannot create android_verity debugfs directory: %ld", + PTR_ERR(debug_dir)); + debugfs_remove_recursive(debug_dir); + } + +end: + return r; +} + +static void __exit dm_android_verity_exit(void) +{ + if (!IS_ERR_OR_NULL(debug_dir)) + debugfs_remove_recursive(debug_dir); + + dm_unregister_target(&android_verity_target); +} + +module_init(dm_android_verity_init); +module_exit(dm_android_verity_exit);
diff --git a/drivers/md/dm-android-verity.h b/drivers/md/dm-android-verity.h new file mode 100644 index 0000000..c8d7ab64 --- /dev/null +++ b/drivers/md/dm-android-verity.h
@@ -0,0 +1,123 @@ +/* + * Copyright (C) 2015 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#ifndef DM_ANDROID_VERITY_H +#define DM_ANDROID_VERITY_H + +#include <crypto/sha.h> + +#define RSANUMBYTES 256 +#define VERITY_METADATA_MAGIC_NUMBER 0xb001b001 +#define VERITY_METADATA_MAGIC_DISABLE 0x46464f56 +#define VERITY_METADATA_VERSION 0 +#define VERITY_STATE_DISABLE 1 +#define DATA_BLOCK_SIZE (4 * 1024) +#define VERITY_METADATA_SIZE (8 * DATA_BLOCK_SIZE) +#define VERITY_TABLE_ARGS 10 +#define VERITY_COMMANDLINE_PARAM_LENGTH 20 +#define BUILD_VARIANT 20 + +/* + * <subject>:<sha1-id> is the format for the identifier. + * subject can either be the Common Name(CN) + Organization Name(O) or + * just the CN if the it is prefixed with O + * From https://tools.ietf.org/html/rfc5280#appendix-A + * ub-organization-name-length INTEGER ::= 64 + * ub-common-name-length INTEGER ::= 64 + * + * http://lxr.free-electrons.com/source/crypto/asymmetric_keys/x509_cert_parser.c?v=3.9#L278 + * ctx->o_size + 2 + ctx->cn_size + 1 + * + 41 characters for ":" and sha1 id + * 64 + 2 + 64 + 1 + 1 + 40 (172) + * setting VERITY_DEFAULT_KEY_ID_LENGTH to 200 characters. + */ +#define VERITY_DEFAULT_KEY_ID_LENGTH 200 + +#define FEC_MAGIC 0xFECFECFE +#define FEC_BLOCK_SIZE (4 * 1024) +#define FEC_VERSION 0 +#define FEC_RSM 255 +#define FEC_ARG_LENGTH 300 + +#define VERITY_TABLE_OPT_RESTART "restart_on_corruption" +#define VERITY_TABLE_OPT_LOGGING "ignore_corruption" +#define VERITY_TABLE_OPT_IGNZERO "ignore_zero_blocks" + +#define VERITY_TABLE_OPT_FEC_FORMAT \ + "use_fec_from_device %s fec_start %llu fec_blocks %llu fec_roots %u ignore_zero_blocks" +#define VERITY_TABLE_OPT_FEC_ARGS 9 + +#define VERITY_DEBUG 0 + +#define DM_MSG_PREFIX "android-verity" + +#define DM_LINEAR_ARGS 2 +#define DM_LINEAR_TARGET_OFFSET "0" + +/* + * There can be two formats. + * if fec is present + * <data_blocks> <verity_tree> <verity_metdata_32K><fec_data><fec_data_4K> + * if fec is not present + * <data_blocks> <verity_tree> <verity_metdata_32K> + */ +struct fec_header { + __le32 magic; + __le32 version; + __le32 size; + __le32 roots; + __le32 fec_size; + __le64 inp_size; + u8 hash[SHA256_DIGEST_SIZE]; +} __attribute__((packed)); + +struct android_metadata_header { + __le32 magic_number; + __le32 protocol_version; + char signature[RSANUMBYTES]; + __le32 table_length; +}; + +struct android_metadata { + struct android_metadata_header *header; + char *verity_table; +}; + +struct fec_ecc_metadata { + bool valid; + u32 roots; + u64 blocks; + u64 rounds; + u64 start; +}; + +struct bio_read { + struct page **page_io; + int number_of_pages; +}; + +extern struct target_type linear_target; + +extern void dm_linear_dtr(struct dm_target *ti); +extern int dm_linear_map(struct dm_target *ti, struct bio *bio); +extern void dm_linear_status(struct dm_target *ti, status_type_t type, + unsigned status_flags, char *result, unsigned maxlen); +extern int dm_linear_prepare_ioctl(struct dm_target *ti, + struct block_device **bdev, fmode_t *mode); +extern int dm_linear_iterate_devices(struct dm_target *ti, + iterate_devices_callout_fn fn, void *data); +extern int dm_linear_ctr(struct dm_target *ti, unsigned int argc, char **argv); +extern long dm_linear_direct_access(struct dm_target *ti, sector_t sector, + void **kaddr, pfn_t *pfn, long size); +#endif /* DM_ANDROID_VERITY_H */
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index 0aedd0e..7a5b75f 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c
@@ -1866,16 +1866,24 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv) } ret = -ENOMEM; - cc->io_queue = alloc_workqueue("kcryptd_io", WQ_MEM_RECLAIM, 1); + cc->io_queue = alloc_workqueue("kcryptd_io", + WQ_HIGHPRI | + WQ_MEM_RECLAIM, + 1); if (!cc->io_queue) { ti->error = "Couldn't create kcryptd io queue"; goto bad; } if (test_bit(DM_CRYPT_SAME_CPU, &cc->flags)) - cc->crypt_queue = alloc_workqueue("kcryptd", WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM, 1); + cc->crypt_queue = alloc_workqueue("kcryptd", + WQ_HIGHPRI | + WQ_MEM_RECLAIM, 1); else - cc->crypt_queue = alloc_workqueue("kcryptd", WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | WQ_UNBOUND, + cc->crypt_queue = alloc_workqueue("kcryptd", + WQ_HIGHPRI | + WQ_MEM_RECLAIM | + WQ_UNBOUND, num_online_cpus()); if (!cc->crypt_queue) { ti->error = "Couldn't create kcryptd queue";
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c index b67414b..f688bfe 100644 --- a/drivers/md/dm-ioctl.c +++ b/drivers/md/dm-ioctl.c
@@ -1927,6 +1927,45 @@ void dm_interface_exit(void) dm_hash_exit(); } + +/** + * dm_ioctl_export - Permanently export a mapped device via the ioctl interface + * @md: Pointer to mapped_device + * @name: Buffer (size DM_NAME_LEN) for name + * @uuid: Buffer (size DM_UUID_LEN) for uuid or NULL if not desired + */ +int dm_ioctl_export(struct mapped_device *md, const char *name, + const char *uuid) +{ + int r = 0; + struct hash_cell *hc; + + if (!md) { + r = -ENXIO; + goto out; + } + + /* The name and uuid can only be set once. */ + mutex_lock(&dm_hash_cells_mutex); + hc = dm_get_mdptr(md); + mutex_unlock(&dm_hash_cells_mutex); + if (hc) { + DMERR("%s: already exported", dm_device_name(md)); + r = -ENXIO; + goto out; + } + + r = dm_hash_insert(name, uuid, md); + if (r) { + DMERR("%s: could not bind to '%s'", dm_device_name(md), name); + goto out; + } + + /* Let udev know we've changed. */ + dm_kobject_uevent(md, KOBJ_CHANGE, dm_get_event_nr(md)); +out: + return r; +} /** * dm_copy_name_and_uuid - Copy mapped device name & uuid into supplied buffers * @md: Pointer to mapped_device
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c index 4788b0b..4ad62d6 100644 --- a/drivers/md/dm-linear.c +++ b/drivers/md/dm-linear.c
@@ -25,7 +25,7 @@ struct linear_c { /* * Construct a linear mapping: <dev_path> <offset> */ -static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv) +int dm_linear_ctr(struct dm_target *ti, unsigned int argc, char **argv) { struct linear_c *lc; unsigned long long tmp; @@ -66,14 +66,16 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv) kfree(lc); return ret; } +EXPORT_SYMBOL_GPL(dm_linear_ctr); -static void linear_dtr(struct dm_target *ti) +void dm_linear_dtr(struct dm_target *ti) { struct linear_c *lc = (struct linear_c *) ti->private; dm_put_device(ti, lc->dev); kfree(lc); } +EXPORT_SYMBOL_GPL(dm_linear_dtr); static sector_t linear_map_sector(struct dm_target *ti, sector_t bi_sector) { @@ -92,14 +94,15 @@ static void linear_map_bio(struct dm_target *ti, struct bio *bio) linear_map_sector(ti, bio->bi_iter.bi_sector); } -static int linear_map(struct dm_target *ti, struct bio *bio) +int dm_linear_map(struct dm_target *ti, struct bio *bio) { linear_map_bio(ti, bio); return DM_MAPIO_REMAPPED; } +EXPORT_SYMBOL_GPL(dm_linear_map); -static void linear_status(struct dm_target *ti, status_type_t type, +void dm_linear_status(struct dm_target *ti, status_type_t type, unsigned status_flags, char *result, unsigned maxlen) { struct linear_c *lc = (struct linear_c *) ti->private; @@ -115,8 +118,9 @@ static void linear_status(struct dm_target *ti, status_type_t type, break; } } +EXPORT_SYMBOL_GPL(dm_linear_status); -static int linear_prepare_ioctl(struct dm_target *ti, +int dm_linear_prepare_ioctl(struct dm_target *ti, struct block_device **bdev, fmode_t *mode) { struct linear_c *lc = (struct linear_c *) ti->private; @@ -132,16 +136,18 @@ static int linear_prepare_ioctl(struct dm_target *ti, return 1; return 0; } +EXPORT_SYMBOL_GPL(dm_linear_prepare_ioctl); -static int linear_iterate_devices(struct dm_target *ti, +int dm_linear_iterate_devices(struct dm_target *ti, iterate_devices_callout_fn fn, void *data) { struct linear_c *lc = ti->private; return fn(ti, lc->dev, lc->start, ti->len, data); } +EXPORT_SYMBOL_GPL(dm_linear_iterate_devices); -static long linear_direct_access(struct dm_target *ti, sector_t sector, +long dm_linear_direct_access(struct dm_target *ti, sector_t sector, void **kaddr, pfn_t *pfn, long size) { struct linear_c *lc = ti->private; @@ -158,18 +164,19 @@ static long linear_direct_access(struct dm_target *ti, sector_t sector, return ret; } +EXPORT_SYMBOL_GPL(dm_linear_direct_access); static struct target_type linear_target = { .name = "linear", .version = {1, 3, 0}, .module = THIS_MODULE, - .ctr = linear_ctr, - .dtr = linear_dtr, - .map = linear_map, - .status = linear_status, - .prepare_ioctl = linear_prepare_ioctl, - .iterate_devices = linear_iterate_devices, - .direct_access = linear_direct_access, + .ctr = dm_linear_ctr, + .dtr = dm_linear_dtr, + .map = dm_linear_map, + .status = dm_linear_status, + .prepare_ioctl = dm_linear_prepare_ioctl, + .iterate_devices = dm_linear_iterate_devices, + .direct_access = dm_linear_direct_access, }; int __init dm_linear_init(void)
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index 5ac239d..0e79a43 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c
@@ -11,6 +11,7 @@ #include <linux/vmalloc.h> #include <linux/blkdev.h> #include <linux/namei.h> +#include <linux/mount.h> #include <linux/ctype.h> #include <linux/string.h> #include <linux/slab.h> @@ -510,14 +511,14 @@ static int adjoin(struct dm_table *table, struct dm_target *ti) * On the other hand, dm-switch needs to process bulk data using messages and * excessive use of GFP_NOIO could cause trouble. */ -static char **realloc_argv(unsigned *array_size, char **old_argv) +static char **realloc_argv(unsigned *size, char **old_argv) { char **argv; unsigned new_size; gfp_t gfp; - if (*array_size) { - new_size = *array_size * 2; + if (*size) { + new_size = *size * 2; gfp = GFP_KERNEL; } else { new_size = 8; @@ -525,8 +526,8 @@ static char **realloc_argv(unsigned *array_size, char **old_argv) } argv = kmalloc(new_size * sizeof(*argv), gfp); if (argv) { - memcpy(argv, old_argv, *array_size * sizeof(*argv)); - *array_size = new_size; + memcpy(argv, old_argv, *size * sizeof(*argv)); + *size = new_size; } kfree(old_argv);
diff --git a/drivers/md/dm-verity-avb.c b/drivers/md/dm-verity-avb.c new file mode 100644 index 0000000..89f95e4 --- /dev/null +++ b/drivers/md/dm-verity-avb.c
@@ -0,0 +1,229 @@ +/* + * Copyright (C) 2017 Google. + * + * This file is released under the GPLv2. + * + * Based on drivers/md/dm-verity-chromeos.c + */ + +#include <linux/device-mapper.h> +#include <linux/module.h> +#include <linux/mount.h> + +#define DM_MSG_PREFIX "verity-avb" + +/* Set via module parameters. */ +static char avb_vbmeta_device[64]; +static char avb_invalidate_on_error[4]; + +static void invalidate_vbmeta_endio(struct bio *bio) +{ + if (bio->bi_error) + DMERR("invalidate_vbmeta_endio: error %d", bio->bi_error); + complete(bio->bi_private); +} + +static int invalidate_vbmeta_submit(struct bio *bio, + struct block_device *bdev, + int op, int access_last_sector, + struct page *page) +{ + DECLARE_COMPLETION_ONSTACK(wait); + + bio->bi_private = &wait; + bio->bi_end_io = invalidate_vbmeta_endio; + bio->bi_bdev = bdev; + bio_set_op_attrs(bio, op, REQ_SYNC | REQ_NOIDLE); + + bio->bi_iter.bi_sector = 0; + if (access_last_sector) { + sector_t last_sector; + + last_sector = (i_size_read(bdev->bd_inode)>>SECTOR_SHIFT) - 1; + bio->bi_iter.bi_sector = last_sector; + } + if (!bio_add_page(bio, page, PAGE_SIZE, 0)) { + DMERR("invalidate_vbmeta_submit: bio_add_page error"); + return -EIO; + } + + submit_bio(bio); + /* Wait up to 2 seconds for completion or fail. */ + if (!wait_for_completion_timeout(&wait, msecs_to_jiffies(2000))) + return -EIO; + return 0; +} + +static int invalidate_vbmeta(dev_t vbmeta_devt) +{ + int ret = 0; + struct block_device *bdev; + struct bio *bio; + struct page *page; + fmode_t dev_mode; + /* Ensure we do synchronous unblocked I/O. We may also need + * sync_bdev() on completion, but it really shouldn't. + */ + int access_last_sector = 0; + + DMINFO("invalidate_vbmeta: acting on device %d:%d", + MAJOR(vbmeta_devt), MINOR(vbmeta_devt)); + + /* First we open the device for reading. */ + dev_mode = FMODE_READ | FMODE_EXCL; + bdev = blkdev_get_by_dev(vbmeta_devt, dev_mode, + invalidate_vbmeta); + if (IS_ERR(bdev)) { + DMERR("invalidate_kernel: could not open device for reading"); + dev_mode = 0; + ret = -ENOENT; + goto failed_to_read; + } + + bio = bio_alloc(GFP_NOIO, 1); + if (!bio) { + ret = -ENOMEM; + goto failed_bio_alloc; + } + + page = alloc_page(GFP_NOIO); + if (!page) { + ret = -ENOMEM; + goto failed_to_alloc_page; + } + + access_last_sector = 0; + ret = invalidate_vbmeta_submit(bio, bdev, REQ_OP_READ, + access_last_sector, page); + if (ret) { + DMERR("invalidate_vbmeta: error reading"); + goto failed_to_submit_read; + } + + /* We have a page. Let's make sure it looks right. */ + if (memcmp("AVB0", page_address(page), 4) == 0) { + /* Stamp it. */ + memcpy(page_address(page), "AVE0", 4); + DMINFO("invalidate_vbmeta: found vbmeta partition"); + } else { + /* Could be this is on a AVB footer, check. Also, since the + * AVB footer is in the last 64 bytes, adjust for the fact that + * we're dealing with 512-byte sectors. + */ + size_t offset = (1<<SECTOR_SHIFT) - 64; + + access_last_sector = 1; + ret = invalidate_vbmeta_submit(bio, bdev, REQ_OP_READ, + access_last_sector, page); + if (ret) { + DMERR("invalidate_vbmeta: error reading"); + goto failed_to_submit_read; + } + if (memcmp("AVBf", page_address(page) + offset, 4) != 0) { + DMERR("invalidate_vbmeta on non-vbmeta partition"); + ret = -EINVAL; + goto invalid_header; + } + /* Stamp it. */ + memcpy(page_address(page) + offset, "AVE0", 4); + DMINFO("invalidate_vbmeta: found vbmeta footer partition"); + } + + /* Now rewrite the changed page - the block dev was being + * changed on read. Let's reopen here. + */ + blkdev_put(bdev, dev_mode); + dev_mode = FMODE_WRITE | FMODE_EXCL; + bdev = blkdev_get_by_dev(vbmeta_devt, dev_mode, + invalidate_vbmeta); + if (IS_ERR(bdev)) { + DMERR("invalidate_vbmeta: could not open device for writing"); + dev_mode = 0; + ret = -ENOENT; + goto failed_to_write; + } + + /* We re-use the same bio to do the write after the read. Need to reset + * it to initialize bio->bi_remaining. + */ + bio_reset(bio); + + ret = invalidate_vbmeta_submit(bio, bdev, REQ_OP_WRITE, + access_last_sector, page); + if (ret) { + DMERR("invalidate_vbmeta: error writing"); + goto failed_to_submit_write; + } + + DMERR("invalidate_vbmeta: completed."); + ret = 0; +failed_to_submit_write: +failed_to_write: +invalid_header: + __free_page(page); +failed_to_submit_read: + /* Technically, we'll leak a page with the pending bio, but + * we're about to reboot anyway. + */ +failed_to_alloc_page: + bio_put(bio); +failed_bio_alloc: + if (dev_mode) + blkdev_put(bdev, dev_mode); +failed_to_read: + return ret; +} + +void dm_verity_avb_error_handler(void) +{ + dev_t dev; + + DMINFO("AVB error handler called for %s", avb_vbmeta_device); + + if (strcmp(avb_invalidate_on_error, "yes") != 0) { + DMINFO("Not configured to invalidate"); + return; + } + + if (avb_vbmeta_device[0] == '\0') { + DMERR("avb_vbmeta_device parameter not set"); + goto fail_no_dev; + } + + dev = name_to_dev_t(avb_vbmeta_device); + if (!dev) { + DMERR("No matching partition for device: %s", + avb_vbmeta_device); + goto fail_no_dev; + } + + invalidate_vbmeta(dev); + +fail_no_dev: + ; +} + +static int __init dm_verity_avb_init(void) +{ + DMINFO("AVB error handler initialized with vbmeta device: %s", + avb_vbmeta_device); + return 0; +} + +static void __exit dm_verity_avb_exit(void) +{ +} + +module_init(dm_verity_avb_init); +module_exit(dm_verity_avb_exit); + +MODULE_AUTHOR("David Zeuthen <zeuthen@google.com>"); +MODULE_DESCRIPTION("AVB-specific error handler for dm-verity"); +MODULE_LICENSE("GPL"); + +/* Declare parameter with no module prefix */ +#undef MODULE_PARAM_PREFIX +#define MODULE_PARAM_PREFIX "androidboot.vbmeta." +module_param_string(device, avb_vbmeta_device, sizeof(avb_vbmeta_device), 0); +module_param_string(invalidate_on_error, avb_invalidate_on_error, + sizeof(avb_invalidate_on_error), 0);
diff --git a/drivers/md/dm-verity-fec.c b/drivers/md/dm-verity-fec.c index 78f3601..3b62315 100644 --- a/drivers/md/dm-verity-fec.c +++ b/drivers/md/dm-verity-fec.c
@@ -11,6 +11,7 @@ #include "dm-verity-fec.h" #include <linux/math64.h> +#include <linux/sysfs.h> #define DM_MSG_PREFIX "verity-fec" @@ -175,9 +176,11 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_fec_io *fio, if (r < 0 && neras) DMERR_LIMIT("%s: FEC %llu: failed to correct: %d", v->data_dev->name, (unsigned long long)rsb, r); - else if (r > 0) + else if (r > 0) { DMWARN_LIMIT("%s: FEC %llu: corrected %d errors", v->data_dev->name, (unsigned long long)rsb, r); + atomic_add_unless(&v->fec->corrected, 1, INT_MAX); + } return r; } @@ -556,6 +559,7 @@ unsigned verity_fec_status_table(struct dm_verity *v, unsigned sz, void verity_fec_dtr(struct dm_verity *v) { struct dm_verity_fec *f = v->fec; + struct kobject *kobj = &f->kobj_holder.kobj; if (!verity_fec_is_enabled(v)) goto out; @@ -572,6 +576,12 @@ void verity_fec_dtr(struct dm_verity *v) if (f->dev) dm_put_device(v->ti, f->dev); + + if (kobj->state_initialized) { + kobject_put(kobj); + wait_for_completion(dm_get_completion_from_kobject(kobj)); + } + out: kfree(f); v->fec = NULL; @@ -660,6 +670,28 @@ int verity_fec_parse_opt_args(struct dm_arg_set *as, struct dm_verity *v, return 0; } +static ssize_t corrected_show(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + struct dm_verity_fec *f = container_of(kobj, struct dm_verity_fec, + kobj_holder.kobj); + + return sprintf(buf, "%d\n", atomic_read(&f->corrected)); +} + +static struct kobj_attribute attr_corrected = __ATTR_RO(corrected); + +static struct attribute *fec_attrs[] = { + &attr_corrected.attr, + NULL +}; + +static struct kobj_type fec_ktype = { + .sysfs_ops = &kobj_sysfs_ops, + .default_attrs = fec_attrs, + .release = dm_kobject_release +}; + /* * Allocate dm_verity_fec for v->fec. Must be called before verity_fec_ctr. */ @@ -683,8 +715,10 @@ int verity_fec_ctr_alloc(struct dm_verity *v) */ int verity_fec_ctr(struct dm_verity *v) { + int r; struct dm_verity_fec *f = v->fec; struct dm_target *ti = v->ti; + struct mapped_device *md = dm_table_get_md(ti->table); u64 hash_blocks; if (!verity_fec_is_enabled(v)) { @@ -692,6 +726,16 @@ int verity_fec_ctr(struct dm_verity *v) return 0; } + /* Create a kobject and sysfs attributes */ + init_completion(&f->kobj_holder.completion); + + r = kobject_init_and_add(&f->kobj_holder.kobj, &fec_ktype, + &disk_to_dev(dm_disk(md))->kobj, "%s", "fec"); + if (r) { + ti->error = "Cannot create kobject"; + return r; + } + /* * FEC is computed over data blocks, possible metadata, and * hash blocks. In other words, FEC covers total of fec_blocks
diff --git a/drivers/md/dm-verity-fec.h b/drivers/md/dm-verity-fec.h index bb31ce8..4db0cae 100644 --- a/drivers/md/dm-verity-fec.h +++ b/drivers/md/dm-verity-fec.h
@@ -12,6 +12,8 @@ #ifndef DM_VERITY_FEC_H #define DM_VERITY_FEC_H +#include "dm.h" +#include "dm-core.h" #include "dm-verity.h" #include <linux/rslib.h> @@ -51,6 +53,8 @@ struct dm_verity_fec { mempool_t *extra_pool; /* mempool for extra buffers */ mempool_t *output_pool; /* mempool for output */ struct kmem_cache *cache; /* cache for buffers */ + atomic_t corrected; /* corrected errors */ + struct dm_kobject_holder kobj_holder; /* for sysfs attributes */ }; /* per-bio data */
diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c index 0aba34a..b03e808 100644 --- a/drivers/md/dm-verity-target.c +++ b/drivers/md/dm-verity-target.c
@@ -19,6 +19,7 @@ #include <linux/module.h> #include <linux/reboot.h> +#include <linux/vmalloc.h> #define DM_MSG_PREFIX "verity" @@ -32,6 +33,7 @@ #define DM_VERITY_OPT_LOGGING "ignore_corruption" #define DM_VERITY_OPT_RESTART "restart_on_corruption" #define DM_VERITY_OPT_IGN_ZEROES "ignore_zero_blocks" +#define DM_VERITY_OPT_AT_MOST_ONCE "check_at_most_once" #define DM_VERITY_OPTS_MAX (2 + DM_VERITY_OPTS_FEC) @@ -233,8 +235,12 @@ static int verity_handle_err(struct dm_verity *v, enum verity_block_type type, if (v->mode == DM_VERITY_MODE_LOGGING) return 0; - if (v->mode == DM_VERITY_MODE_RESTART) + if (v->mode == DM_VERITY_MODE_RESTART) { +#ifdef CONFIG_DM_VERITY_AVB + dm_verity_avb_error_handler(); +#endif kernel_restart("dm-verity device corrupted"); + } return 1; } @@ -395,6 +401,18 @@ static int verity_bv_zero(struct dm_verity *v, struct dm_verity_io *io, } /* + * Moves the bio iter one data block forward. + */ +static inline void verity_bv_skip_block(struct dm_verity *v, + struct dm_verity_io *io, + struct bvec_iter *iter) +{ + struct bio *bio = dm_bio_from_per_bio_data(io, v->ti->per_io_data_size); + + bio_advance_iter(bio, iter, 1 << v->data_dev_block_bits); +} + +/* * Verify one "dm_verity_io" structure. */ static int verity_verify_io(struct dm_verity_io *io) @@ -406,9 +424,16 @@ static int verity_verify_io(struct dm_verity_io *io) for (b = 0; b < io->n_blocks; b++) { int r; + sector_t cur_block = io->block + b; struct shash_desc *desc = verity_io_hash_desc(v, io); - r = verity_hash_for_block(v, io, io->block + b, + if (v->validated_blocks && + likely(test_bit(cur_block, v->validated_blocks))) { + verity_bv_skip_block(v, io, &io->iter); + continue; + } + + r = verity_hash_for_block(v, io, cur_block, verity_io_want_digest(v, io), &is_zero); if (unlikely(r < 0)) @@ -441,13 +466,16 @@ static int verity_verify_io(struct dm_verity_io *io) return r; if (likely(memcmp(verity_io_real_digest(v, io), - verity_io_want_digest(v, io), v->digest_size) == 0)) + verity_io_want_digest(v, io), v->digest_size) == 0)) { + if (v->validated_blocks) + set_bit(cur_block, v->validated_blocks); continue; + } else if (verity_fec_decode(v, io, DM_VERITY_BLOCK_TYPE_DATA, - io->block + b, NULL, &start) == 0) + cur_block, NULL, &start) == 0) continue; else if (verity_handle_err(v, DM_VERITY_BLOCK_TYPE_DATA, - io->block + b)) + cur_block)) return -EIO; } @@ -501,6 +529,7 @@ static void verity_prefetch_io(struct work_struct *work) container_of(work, struct dm_verity_prefetch_work, work); struct dm_verity *v = pw->v; int i; + sector_t prefetch_size; for (i = v->levels - 2; i >= 0; i--) { sector_t hash_block_start; @@ -523,8 +552,14 @@ static void verity_prefetch_io(struct work_struct *work) hash_block_end = v->hash_blocks - 1; } no_prefetch_cluster: + // for emmc, it is more efficient to send bigger read + prefetch_size = max((sector_t)CONFIG_DM_VERITY_HASH_PREFETCH_MIN_SIZE, + hash_block_end - hash_block_start + 1); + if ((hash_block_start + prefetch_size) >= (v->hash_start + v->hash_blocks)) { + prefetch_size = hash_block_end - hash_block_start + 1; + } dm_bufio_prefetch(v->bufio, hash_block_start, - hash_block_end - hash_block_start + 1); + prefetch_size); } kfree(pw); @@ -551,7 +586,7 @@ static void verity_submit_prefetch(struct dm_verity *v, struct dm_verity_io *io) * Bio map function. It allocates dm_verity_io structure and bio vector and * fills them. Then it issues prefetches and the I/O. */ -static int verity_map(struct dm_target *ti, struct bio *bio) +int verity_map(struct dm_target *ti, struct bio *bio) { struct dm_verity *v = ti->private; struct dm_verity_io *io; @@ -592,11 +627,12 @@ static int verity_map(struct dm_target *ti, struct bio *bio) return DM_MAPIO_SUBMITTED; } +EXPORT_SYMBOL_GPL(verity_map); /* * Status: V (valid) or C (corruption found) */ -static void verity_status(struct dm_target *ti, status_type_t type, +void verity_status(struct dm_target *ti, status_type_t type, unsigned status_flags, char *result, unsigned maxlen) { struct dm_verity *v = ti->private; @@ -633,6 +669,8 @@ static void verity_status(struct dm_target *ti, status_type_t type, args += DM_VERITY_OPTS_FEC; if (v->zero_digest) args++; + if (v->validated_blocks) + args++; if (!args) return; DMEMIT(" %u", args); @@ -651,12 +689,15 @@ static void verity_status(struct dm_target *ti, status_type_t type, } if (v->zero_digest) DMEMIT(" " DM_VERITY_OPT_IGN_ZEROES); + if (v->validated_blocks) + DMEMIT(" " DM_VERITY_OPT_AT_MOST_ONCE); sz = verity_fec_status_table(v, sz, result, maxlen); break; } } +EXPORT_SYMBOL_GPL(verity_status); -static int verity_prepare_ioctl(struct dm_target *ti, +int verity_prepare_ioctl(struct dm_target *ti, struct block_device **bdev, fmode_t *mode) { struct dm_verity *v = ti->private; @@ -668,16 +709,18 @@ static int verity_prepare_ioctl(struct dm_target *ti, return 1; return 0; } +EXPORT_SYMBOL_GPL(verity_prepare_ioctl); -static int verity_iterate_devices(struct dm_target *ti, +int verity_iterate_devices(struct dm_target *ti, iterate_devices_callout_fn fn, void *data) { struct dm_verity *v = ti->private; return fn(ti, v->data_dev, v->data_start, ti->len, data); } +EXPORT_SYMBOL_GPL(verity_iterate_devices); -static void verity_io_hints(struct dm_target *ti, struct queue_limits *limits) +void verity_io_hints(struct dm_target *ti, struct queue_limits *limits) { struct dm_verity *v = ti->private; @@ -689,8 +732,9 @@ static void verity_io_hints(struct dm_target *ti, struct queue_limits *limits) blk_limits_io_min(limits, limits->logical_block_size); } +EXPORT_SYMBOL_GPL(verity_io_hints); -static void verity_dtr(struct dm_target *ti) +void verity_dtr(struct dm_target *ti) { struct dm_verity *v = ti->private; @@ -700,6 +744,7 @@ static void verity_dtr(struct dm_target *ti) if (v->bufio) dm_bufio_client_destroy(v->bufio); + vfree(v->validated_blocks); kfree(v->salt); kfree(v->root_digest); kfree(v->zero_digest); @@ -719,6 +764,27 @@ static void verity_dtr(struct dm_target *ti) kfree(v); } +EXPORT_SYMBOL_GPL(verity_dtr); + +static int verity_alloc_most_once(struct dm_verity *v) +{ + struct dm_target *ti = v->ti; + + /* the bitset can only handle INT_MAX blocks */ + if (v->data_blocks > INT_MAX) { + ti->error = "device too large to use check_at_most_once"; + return -E2BIG; + } + + v->validated_blocks = vzalloc(BITS_TO_LONGS(v->data_blocks) * + sizeof(unsigned long)); + if (!v->validated_blocks) { + ti->error = "failed to allocate bitset for check_at_most_once"; + return -ENOMEM; + } + + return 0; +} static int verity_alloc_zero_digest(struct dm_verity *v) { @@ -789,6 +855,12 @@ static int verity_parse_opt_args(struct dm_arg_set *as, struct dm_verity *v) } continue; + } else if (!strcasecmp(arg_name, DM_VERITY_OPT_AT_MOST_ONCE)) { + r = verity_alloc_most_once(v); + if (r) + return r; + continue; + } else if (verity_is_fec_opt_arg(arg_name)) { r = verity_fec_parse_opt_args(as, v, &argc, arg_name); if (r) @@ -817,7 +889,7 @@ static int verity_parse_opt_args(struct dm_arg_set *as, struct dm_verity *v) * <digest> * <salt> Hex string or "-" if no salt. */ -static int verity_ctr(struct dm_target *ti, unsigned argc, char **argv) +int verity_ctr(struct dm_target *ti, unsigned argc, char **argv) { struct dm_verity *v; struct dm_arg_set as; @@ -981,6 +1053,14 @@ static int verity_ctr(struct dm_target *ti, unsigned argc, char **argv) goto bad; } +#ifdef CONFIG_DM_ANDROID_VERITY_AT_MOST_ONCE_DEFAULT_ENABLED + if (!v->validated_blocks) { + r = verity_alloc_most_once(v); + if (r) + goto bad; + } +#endif + v->hash_per_block_bits = __fls((1 << v->hash_dev_block_bits) / v->digest_size); @@ -1053,10 +1133,11 @@ static int verity_ctr(struct dm_target *ti, unsigned argc, char **argv) return r; } +EXPORT_SYMBOL_GPL(verity_ctr); static struct target_type verity_target = { .name = "verity", - .version = {1, 3, 0}, + .version = {1, 4, 0}, .module = THIS_MODULE, .ctr = verity_ctr, .dtr = verity_dtr,
diff --git a/drivers/md/dm-verity.h b/drivers/md/dm-verity.h index fb419f4..d216fc76 100644 --- a/drivers/md/dm-verity.h +++ b/drivers/md/dm-verity.h
@@ -63,6 +63,7 @@ struct dm_verity { sector_t hash_level_block[DM_VERITY_MAX_LEVELS]; struct dm_verity_fec *fec; /* forward error correction */ + unsigned long *validated_blocks; /* bitset blocks validated */ }; struct dm_verity_io { @@ -126,4 +127,15 @@ extern int verity_hash(struct dm_verity *v, struct shash_desc *desc, extern int verity_hash_for_block(struct dm_verity *v, struct dm_verity_io *io, sector_t block, u8 *digest, bool *is_zero); +extern void verity_status(struct dm_target *ti, status_type_t type, + unsigned status_flags, char *result, unsigned maxlen); +extern int verity_prepare_ioctl(struct dm_target *ti, + struct block_device **bdev, fmode_t *mode); +extern int verity_iterate_devices(struct dm_target *ti, + iterate_devices_callout_fn fn, void *data); +extern void verity_io_hints(struct dm_target *ti, struct queue_limits *limits); +extern void verity_dtr(struct dm_target *ti); +extern int verity_ctr(struct dm_target *ti, unsigned argc, char **argv); +extern int verity_map(struct dm_target *ti, struct bio *bio); +extern void dm_verity_avb_error_handler(void); #endif /* DM_VERITY_H */
diff --git a/drivers/md/dm.h b/drivers/md/dm.h index f0aad08..ed25f30 100644 --- a/drivers/md/dm.h +++ b/drivers/md/dm.h
@@ -80,8 +80,6 @@ void dm_set_md_type(struct mapped_device *md, unsigned type); unsigned dm_get_md_type(struct mapped_device *md); struct target_type *dm_get_immutable_target_type(struct mapped_device *md); -int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t); - /* * To check the return value from dm_table_find_target(). */
diff --git a/drivers/md/md.c b/drivers/md/md.c index a7a0e3a..ff53829 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c
@@ -5098,7 +5098,7 @@ static struct kobject *md_probe(dev_t dev, int *part, void *data) return NULL; } -static int add_named_array(const char *val, struct kernel_param *kp) +static int add_named_array(const char *val, const struct kernel_param *kp) { /* val must be "md_*" where * is not all digits. * We allocate an array with a large free minor number, and @@ -8989,11 +8989,11 @@ static __exit void md_exit(void) subsys_initcall(md_init); module_exit(md_exit) -static int get_ro(char *buffer, struct kernel_param *kp) +static int get_ro(char *buffer, const struct kernel_param *kp) { return sprintf(buffer, "%d", start_readonly); } -static int set_ro(const char *val, struct kernel_param *kp) +static int set_ro(const char *val, const struct kernel_param *kp) { return kstrtouint(val, 10, (unsigned int *)&start_readonly); }
diff --git a/drivers/media/pci/tw686x/tw686x-core.c b/drivers/media/pci/tw686x/tw686x-core.c index 71a0453..279d447 100644 --- a/drivers/media/pci/tw686x/tw686x-core.c +++ b/drivers/media/pci/tw686x/tw686x-core.c
@@ -72,12 +72,12 @@ static const char *dma_mode_name(unsigned int mode) } } -static int tw686x_dma_mode_get(char *buffer, struct kernel_param *kp) +static int tw686x_dma_mode_get(char *buffer, const struct kernel_param *kp) { return sprintf(buffer, dma_mode_name(dma_mode)); } -static int tw686x_dma_mode_set(const char *val, struct kernel_param *kp) +static int tw686x_dma_mode_set(const char *val, const struct kernel_param *kp) { if (!strcasecmp(val, dma_mode_name(TW686X_DMA_MODE_MEMCPY))) dma_mode = TW686X_DMA_MODE_MEMCPY;
diff --git a/drivers/media/usb/uvc/uvc_driver.c b/drivers/media/usb/uvc/uvc_driver.c index cde43b63..8412dfc02 100644 --- a/drivers/media/usb/uvc/uvc_driver.c +++ b/drivers/media/usb/uvc/uvc_driver.c
@@ -2184,7 +2184,7 @@ static int uvc_reset_resume(struct usb_interface *intf) * Module parameters */ -static int uvc_clock_param_get(char *buffer, struct kernel_param *kp) +static int uvc_clock_param_get(char *buffer, const struct kernel_param *kp) { if (uvc_clock_param == CLOCK_MONOTONIC) return sprintf(buffer, "CLOCK_MONOTONIC"); @@ -2192,7 +2192,7 @@ static int uvc_clock_param_get(char *buffer, struct kernel_param *kp) return sprintf(buffer, "CLOCK_REALTIME"); } -static int uvc_clock_param_set(const char *val, struct kernel_param *kp) +static int uvc_clock_param_set(const char *val, const struct kernel_param *kp) { if (strncasecmp(val, "clock_", strlen("clock_")) == 0) val += strlen("clock_");
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c index 4510e8a..610caa1 100644 --- a/drivers/media/v4l2-core/v4l2-ioctl.c +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -2455,11 +2455,8 @@ struct v4l2_ioctl_info { unsigned int ioctl; u32 flags; const char * const name; - union { - u32 offset; - int (*func)(const struct v4l2_ioctl_ops *ops, - struct file *file, void *fh, void *p); - } u; + int (*func)(const struct v4l2_ioctl_ops *ops, struct file *file, + void *fh, void *p); void (*debug)(const void *arg, bool write_only); }; @@ -2467,25 +2464,21 @@ struct v4l2_ioctl_info { #define INFO_FL_PRIO (1 << 0) /* This control can be valid if the filehandle passes a control handler. */ #define INFO_FL_CTRL (1 << 1) -/* This is a standard ioctl, no need for special code */ -#define INFO_FL_STD (1 << 2) /* This is ioctl has its own function */ -#define INFO_FL_FUNC (1 << 3) +#define INFO_FL_FUNC (1 << 2) /* Queuing ioctl */ -#define INFO_FL_QUEUE (1 << 4) +#define INFO_FL_QUEUE (1 << 3) /* Zero struct from after the field to the end */ #define INFO_FL_CLEAR(v4l2_struct, field) \ ((offsetof(struct v4l2_struct, field) + \ sizeof(((struct v4l2_struct *)0)->field)) << 16) #define INFO_FL_CLEAR_MASK (_IOC_SIZEMASK << 16) -#define IOCTL_INFO_STD(_ioctl, _vidioc, _debug, _flags) \ - [_IOC_NR(_ioctl)] = { \ - .ioctl = _ioctl, \ - .flags = _flags | INFO_FL_STD, \ - .name = #_ioctl, \ - .u.offset = offsetof(struct v4l2_ioctl_ops, _vidioc), \ - .debug = _debug, \ +#define DEFINE_IOCTL_STD_FNC(_vidioc) \ + static int __v4l_ ## _vidioc ## _fnc( \ + const struct v4l2_ioctl_ops *ops, \ + struct file *file, void *fh, void *p) { \ + return ops->_vidioc(file, fh, p); \ } #define IOCTL_INFO_FNC(_ioctl, _func, _debug, _flags) \ @@ -2493,10 +2486,44 @@ struct v4l2_ioctl_info { .ioctl = _ioctl, \ .flags = _flags | INFO_FL_FUNC, \ .name = #_ioctl, \ - .u.func = _func, \ + .func = _func, \ .debug = _debug, \ } +#define IOCTL_INFO_STD(_ioctl, _vidioc, _debug, _flags) \ + IOCTL_INFO_FNC(_ioctl, __v4l_ ## _vidioc ## _fnc, _debug, _flags) + +DEFINE_IOCTL_STD_FNC(vidioc_g_fbuf) +DEFINE_IOCTL_STD_FNC(vidioc_s_fbuf) +DEFINE_IOCTL_STD_FNC(vidioc_expbuf) +DEFINE_IOCTL_STD_FNC(vidioc_g_std) +DEFINE_IOCTL_STD_FNC(vidioc_g_audio) +DEFINE_IOCTL_STD_FNC(vidioc_s_audio) +DEFINE_IOCTL_STD_FNC(vidioc_g_input) +DEFINE_IOCTL_STD_FNC(vidioc_g_edid) +DEFINE_IOCTL_STD_FNC(vidioc_s_edid) +DEFINE_IOCTL_STD_FNC(vidioc_g_output) +DEFINE_IOCTL_STD_FNC(vidioc_g_audout) +DEFINE_IOCTL_STD_FNC(vidioc_s_audout) +DEFINE_IOCTL_STD_FNC(vidioc_g_selection) +DEFINE_IOCTL_STD_FNC(vidioc_s_selection) +DEFINE_IOCTL_STD_FNC(vidioc_g_jpegcomp) +DEFINE_IOCTL_STD_FNC(vidioc_s_jpegcomp) +DEFINE_IOCTL_STD_FNC(vidioc_enumaudio) +DEFINE_IOCTL_STD_FNC(vidioc_enumaudout) +DEFINE_IOCTL_STD_FNC(vidioc_enum_framesizes) +DEFINE_IOCTL_STD_FNC(vidioc_enum_frameintervals) +DEFINE_IOCTL_STD_FNC(vidioc_g_enc_index) +DEFINE_IOCTL_STD_FNC(vidioc_encoder_cmd) +DEFINE_IOCTL_STD_FNC(vidioc_try_encoder_cmd) +DEFINE_IOCTL_STD_FNC(vidioc_decoder_cmd) +DEFINE_IOCTL_STD_FNC(vidioc_try_decoder_cmd) +DEFINE_IOCTL_STD_FNC(vidioc_s_dv_timings) +DEFINE_IOCTL_STD_FNC(vidioc_g_dv_timings) +DEFINE_IOCTL_STD_FNC(vidioc_enum_dv_timings) +DEFINE_IOCTL_STD_FNC(vidioc_query_dv_timings) +DEFINE_IOCTL_STD_FNC(vidioc_dv_timings_cap) + static struct v4l2_ioctl_info v4l2_ioctls[] = { IOCTL_INFO_FNC(VIDIOC_QUERYCAP, v4l_querycap, v4l_print_querycap, 0), IOCTL_INFO_FNC(VIDIOC_ENUM_FMT, v4l_enum_fmt, v4l_print_fmtdesc, INFO_FL_CLEAR(v4l2_fmtdesc, type)), @@ -2681,14 +2708,8 @@ static long __video_do_ioctl(struct file *file, } write_only = _IOC_DIR(cmd) == _IOC_WRITE; - if (info->flags & INFO_FL_STD) { - typedef int (*vidioc_op)(struct file *file, void *fh, void *p); - const void *p = vfd->ioctl_ops; - const vidioc_op *vidioc = p + info->u.offset; - - ret = (*vidioc)(file, fh, arg); - } else if (info->flags & INFO_FL_FUNC) { - ret = info->u.func(ops, file, fh, arg); + if (info->flags & INFO_FL_FUNC) { + ret = info->func(ops, file, fh, arg); } else if (!ops->vidioc_default) { ret = -ENOTTY; } else {
diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c index 89c7ed1..1cdab6d 100644 --- a/drivers/message/fusion/mptbase.c +++ b/drivers/message/fusion/mptbase.c
@@ -99,7 +99,7 @@ module_param(mpt_channel_mapping, int, 0); MODULE_PARM_DESC(mpt_channel_mapping, " Mapping id's to channels (default=0)"); static int mpt_debug_level; -static int mpt_set_debug_level(const char *val, struct kernel_param *kp); +static int mpt_set_debug_level(const char *val, const struct kernel_param *kp); module_param_call(mpt_debug_level, mpt_set_debug_level, param_get_int, &mpt_debug_level, 0600); MODULE_PARM_DESC(mpt_debug_level, @@ -242,7 +242,7 @@ pci_enable_io_access(struct pci_dev *pdev) pci_write_config_word(pdev, PCI_COMMAND, command_reg); } -static int mpt_set_debug_level(const char *val, struct kernel_param *kp) +static int mpt_set_debug_level(const char *val, const struct kernel_param *kp) { int ret = param_set_int(val, kp); MPT_ADAPTER *ioc;
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 64971ba..9360e6e 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig
@@ -766,6 +766,27 @@ An empty message will only clear the display at driver init time. Any other printf()-formatted message is valid with newline and escape codes. +config UID_SYS_STATS + bool "Per-UID statistics" + depends on PROFILING && TASK_XACCT && TASK_IO_ACCOUNTING + help + Per UID based cpu time statistics exported to /proc/uid_cputime + Per UID based io statistics exported to /proc/uid_io + Per UID based procstat control in /proc/uid_procstat + +config UID_SYS_STATS_DEBUG + bool "Per-TASK statistics" + depends on UID_SYS_STATS + default n + help + Per TASK based io statistics exported to /proc/uid_io + +config MEMORY_STATE_TIME + tristate "Memory freq/bandwidth time statistics" + depends on PROFILING + help + Memory time statistics exported to /sys/kernel/memory_state_time + source "drivers/misc/c2port/Kconfig" source "drivers/misc/eeprom/Kconfig" source "drivers/misc/cb710/Kconfig"
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index 2bf79ba..564406b 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile
@@ -54,6 +54,9 @@ obj-$(CONFIG_CXL_BASE) += cxl/ obj-$(CONFIG_PANEL) += panel.o +obj-$(CONFIG_UID_SYS_STATS) += uid_sys_stats.o +obj-$(CONFIG_MEMORY_STATE_TIME) += memory_state_time.o + lkdtm-$(CONFIG_LKDTM) += lkdtm_core.o lkdtm-$(CONFIG_LKDTM) += lkdtm_bugs.o lkdtm-$(CONFIG_LKDTM) += lkdtm_heap.o @@ -61,6 +64,7 @@ lkdtm-$(CONFIG_LKDTM) += lkdtm_rodata_objcopy.o lkdtm-$(CONFIG_LKDTM) += lkdtm_usercopy.o +CFLAGS_lkdtm_rodata.o += $(DISABLE_LTO) KCOV_INSTRUMENT_lkdtm_rodata.o := n OBJCOPYFLAGS :=
diff --git a/drivers/misc/kgdbts.c b/drivers/misc/kgdbts.c index 99635dd..2290845 100644 --- a/drivers/misc/kgdbts.c +++ b/drivers/misc/kgdbts.c
@@ -1130,7 +1130,8 @@ static void kgdbts_put_char(u8 chr) ts.run_test(0, chr); } -static int param_set_kgdbts_var(const char *kmessage, struct kernel_param *kp) +static int param_set_kgdbts_var(const char *kmessage, + const struct kernel_param *kp) { int len = strlen(kmessage);
diff --git a/drivers/misc/lkdtm.h b/drivers/misc/lkdtm.h index fdf954c..9966891 100644 --- a/drivers/misc/lkdtm.h +++ b/drivers/misc/lkdtm.h
@@ -21,6 +21,9 @@ void lkdtm_SPINLOCKUP(void); void lkdtm_HUNG_TASK(void); void lkdtm_ATOMIC_UNDERFLOW(void); void lkdtm_ATOMIC_OVERFLOW(void); +void lkdtm_CORRUPT_LIST_ADD(void); +void lkdtm_CORRUPT_LIST_DEL(void); +void lkdtm_CORRUPT_USER_DS(void); /* lkdtm_heap.c */ void lkdtm_OVERWRITE_ALLOCATION(void);
diff --git a/drivers/misc/lkdtm_bugs.c b/drivers/misc/lkdtm_bugs.c index 182ae18..da1cf47 100644 --- a/drivers/misc/lkdtm_bugs.c +++ b/drivers/misc/lkdtm_bugs.c
@@ -5,7 +5,13 @@ * test source files. */ #include "lkdtm.h" +#include <linux/list.h> #include <linux/sched.h> +#include <linux/uaccess.h> + +struct lkdtm_list { + struct list_head node; +}; /* * Make sure our attempts to over run the kernel stack doesn't trigger @@ -146,3 +152,75 @@ void lkdtm_ATOMIC_OVERFLOW(void) pr_info("attempting bad atomic overflow\n"); atomic_inc(&over); } + +void lkdtm_CORRUPT_LIST_ADD(void) +{ + /* + * Initially, an empty list via LIST_HEAD: + * test_head.next = &test_head + * test_head.prev = &test_head + */ + LIST_HEAD(test_head); + struct lkdtm_list good, bad; + void *target[2] = { }; + void *redirection = ⌖ + + pr_info("attempting good list addition\n"); + + /* + * Adding to the list performs these actions: + * test_head.next->prev = &good.node + * good.node.next = test_head.next + * good.node.prev = test_head + * test_head.next = good.node + */ + list_add(&good.node, &test_head); + + pr_info("attempting corrupted list addition\n"); + /* + * In simulating this "write what where" primitive, the "what" is + * the address of &bad.node, and the "where" is the address held + * by "redirection". + */ + test_head.next = redirection; + list_add(&bad.node, &test_head); + + if (target[0] == NULL && target[1] == NULL) + pr_err("Overwrite did not happen, but no BUG?!\n"); + else + pr_err("list_add() corruption not detected!\n"); +} + +void lkdtm_CORRUPT_LIST_DEL(void) +{ + LIST_HEAD(test_head); + struct lkdtm_list item; + void *target[2] = { }; + void *redirection = ⌖ + + list_add(&item.node, &test_head); + + pr_info("attempting good list removal\n"); + list_del(&item.node); + + pr_info("attempting corrupted list removal\n"); + list_add(&item.node, &test_head); + + /* As with the list_add() test above, this corrupts "next". */ + item.node.next = redirection; + list_del(&item.node); + + if (target[0] == NULL && target[1] == NULL) + pr_err("Overwrite did not happen, but no BUG?!\n"); + else + pr_err("list_del() corruption not detected!\n"); +} + +void lkdtm_CORRUPT_USER_DS(void) +{ + pr_info("setting bad task size limit\n"); + set_fs(KERNEL_DS); + + /* Make sure we do not keep running with a KERNEL_DS! */ + force_sig(SIGKILL, current); +}
diff --git a/drivers/misc/lkdtm_core.c b/drivers/misc/lkdtm_core.c index b2989f2..b72fb64a 100644 --- a/drivers/misc/lkdtm_core.c +++ b/drivers/misc/lkdtm_core.c
@@ -197,6 +197,9 @@ struct crashtype crashtypes[] = { CRASHTYPE(EXCEPTION), CRASHTYPE(LOOP), CRASHTYPE(OVERFLOW), + CRASHTYPE(CORRUPT_LIST_ADD), + CRASHTYPE(CORRUPT_LIST_DEL), + CRASHTYPE(CORRUPT_USER_DS), CRASHTYPE(CORRUPT_STACK), CRASHTYPE(UNALIGNED_LOAD_STORE_WRITE), CRASHTYPE(OVERWRITE_ALLOCATION),
diff --git a/drivers/misc/memory_state_time.c b/drivers/misc/memory_state_time.c new file mode 100644 index 0000000..ba94dcf --- /dev/null +++ b/drivers/misc/memory_state_time.c
@@ -0,0 +1,462 @@ +/* drivers/misc/memory_state_time.c + * + * Copyright (C) 2016 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/device.h> +#include <linux/err.h> +#include <linux/errno.h> +#include <linux/hashtable.h> +#include <linux/kconfig.h> +#include <linux/kernel.h> +#include <linux/kobject.h> +#include <linux/memory-state-time.h> +#include <linux/module.h> +#include <linux/mutex.h> +#include <linux/of_platform.h> +#include <linux/slab.h> +#include <linux/sysfs.h> +#include <linux/time.h> +#include <linux/timekeeping.h> +#include <linux/workqueue.h> + +#define KERNEL_ATTR_RO(_name) \ +static struct kobj_attribute _name##_attr = __ATTR_RO(_name) + +#define KERNEL_ATTR_RW(_name) \ +static struct kobj_attribute _name##_attr = \ + __ATTR(_name, 0644, _name##_show, _name##_store) + +#define FREQ_HASH_BITS 4 +DECLARE_HASHTABLE(freq_hash_table, FREQ_HASH_BITS); + +static DEFINE_MUTEX(mem_lock); + +#define TAG "memory_state_time" +#define BW_NODE "/soc/memory-state-time" +#define FREQ_TBL "freq-tbl" +#define BW_TBL "bw-buckets" +#define NUM_SOURCES "num-sources" + +#define LOWEST_FREQ 2 + +static int curr_bw; +static int curr_freq; +static u32 *bw_buckets; +static u32 *freq_buckets; +static int num_freqs; +static int num_buckets; +static int registered_bw_sources; +static u64 last_update; +static bool init_success; +static struct workqueue_struct *memory_wq; +static u32 num_sources = 10; +static int *bandwidths; + +struct freq_entry { + int freq; + u64 *buckets; /* Bandwidth buckets. */ + struct hlist_node hash; +}; + +struct queue_container { + struct work_struct update_state; + int value; + u64 time_now; + int id; + struct mutex *lock; +}; + +static int find_bucket(int bw) +{ + int i; + + if (bw_buckets != NULL) { + for (i = 0; i < num_buckets; i++) { + if (bw_buckets[i] > bw) { + pr_debug("Found bucket %d for bandwidth %d\n", + i, bw); + return i; + } + } + return num_buckets - 1; + } + return 0; +} + +static u64 get_time_diff(u64 time_now) +{ + u64 ms; + + ms = time_now - last_update; + last_update = time_now; + return ms; +} + +static ssize_t show_stat_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + int i, j; + int len = 0; + struct freq_entry *freq_entry; + + for (i = 0; i < num_freqs; i++) { + hash_for_each_possible(freq_hash_table, freq_entry, hash, + freq_buckets[i]) { + if (freq_entry->freq == freq_buckets[i]) { + len += scnprintf(buf + len, PAGE_SIZE - len, + "%d ", freq_buckets[i]); + if (len >= PAGE_SIZE) + break; + for (j = 0; j < num_buckets; j++) { + len += scnprintf(buf + len, + PAGE_SIZE - len, + "%llu ", + freq_entry->buckets[j]); + } + len += scnprintf(buf + len, PAGE_SIZE - len, + "\n"); + } + } + } + pr_debug("Current Time: %llu\n", ktime_get_boot_ns()); + return len; +} +KERNEL_ATTR_RO(show_stat); + +static void update_table(u64 time_now) +{ + struct freq_entry *freq_entry; + + pr_debug("Last known bw %d freq %d\n", curr_bw, curr_freq); + hash_for_each_possible(freq_hash_table, freq_entry, hash, curr_freq) { + if (curr_freq == freq_entry->freq) { + freq_entry->buckets[find_bucket(curr_bw)] + += get_time_diff(time_now); + break; + } + } +} + +static bool freq_exists(int freq) +{ + int i; + + for (i = 0; i < num_freqs; i++) { + if (freq == freq_buckets[i]) + return true; + } + return false; +} + +static int calculate_total_bw(int bw, int index) +{ + int i; + int total_bw = 0; + + pr_debug("memory_state_time New bw %d for id %d\n", bw, index); + bandwidths[index] = bw; + for (i = 0; i < registered_bw_sources; i++) + total_bw += bandwidths[i]; + return total_bw; +} + +static void freq_update_do_work(struct work_struct *work) +{ + struct queue_container *freq_state_update + = container_of(work, struct queue_container, + update_state); + if (freq_state_update) { + mutex_lock(&mem_lock); + update_table(freq_state_update->time_now); + curr_freq = freq_state_update->value; + mutex_unlock(&mem_lock); + kfree(freq_state_update); + } +} + +static void bw_update_do_work(struct work_struct *work) +{ + struct queue_container *bw_state_update + = container_of(work, struct queue_container, + update_state); + if (bw_state_update) { + mutex_lock(&mem_lock); + update_table(bw_state_update->time_now); + curr_bw = calculate_total_bw(bw_state_update->value, + bw_state_update->id); + mutex_unlock(&mem_lock); + kfree(bw_state_update); + } +} + +static void memory_state_freq_update(struct memory_state_update_block *ub, + int value) +{ + if (IS_ENABLED(CONFIG_MEMORY_STATE_TIME)) { + if (freq_exists(value) && init_success) { + struct queue_container *freq_container + = kmalloc(sizeof(struct queue_container), + GFP_KERNEL); + if (!freq_container) + return; + INIT_WORK(&freq_container->update_state, + freq_update_do_work); + freq_container->time_now = ktime_get_boot_ns(); + freq_container->value = value; + pr_debug("Scheduling freq update in work queue\n"); + queue_work(memory_wq, &freq_container->update_state); + } else { + pr_debug("Freq does not exist.\n"); + } + } +} + +static void memory_state_bw_update(struct memory_state_update_block *ub, + int value) +{ + if (IS_ENABLED(CONFIG_MEMORY_STATE_TIME)) { + if (init_success) { + struct queue_container *bw_container + = kmalloc(sizeof(struct queue_container), + GFP_KERNEL); + if (!bw_container) + return; + INIT_WORK(&bw_container->update_state, + bw_update_do_work); + bw_container->time_now = ktime_get_boot_ns(); + bw_container->value = value; + bw_container->id = ub->id; + pr_debug("Scheduling bandwidth update in work queue\n"); + queue_work(memory_wq, &bw_container->update_state); + } + } +} + +struct memory_state_update_block *memory_state_register_frequency_source(void) +{ + struct memory_state_update_block *block; + + if (IS_ENABLED(CONFIG_MEMORY_STATE_TIME)) { + pr_debug("Allocating frequency source\n"); + block = kmalloc(sizeof(struct memory_state_update_block), + GFP_KERNEL); + if (!block) + return NULL; + block->update_call = memory_state_freq_update; + return block; + } + pr_err("Config option disabled.\n"); + return NULL; +} +EXPORT_SYMBOL_GPL(memory_state_register_frequency_source); + +struct memory_state_update_block *memory_state_register_bandwidth_source(void) +{ + struct memory_state_update_block *block; + + if (IS_ENABLED(CONFIG_MEMORY_STATE_TIME)) { + pr_debug("Allocating bandwidth source %d\n", + registered_bw_sources); + block = kmalloc(sizeof(struct memory_state_update_block), + GFP_KERNEL); + if (!block) + return NULL; + block->update_call = memory_state_bw_update; + if (registered_bw_sources < num_sources) { + block->id = registered_bw_sources++; + } else { + pr_err("Unable to allocate source; max number reached\n"); + kfree(block); + return NULL; + } + return block; + } + pr_err("Config option disabled.\n"); + return NULL; +} +EXPORT_SYMBOL_GPL(memory_state_register_bandwidth_source); + +/* Buckets are designated by their maximum. + * Returns the buckets decided by the capability of the device. + */ +static int get_bw_buckets(struct device *dev) +{ + int ret, lenb; + struct device_node *node = dev->of_node; + + of_property_read_u32(node, NUM_SOURCES, &num_sources); + if (!of_find_property(node, BW_TBL, &lenb)) { + pr_err("Missing %s property\n", BW_TBL); + return -ENODATA; + } + + bandwidths = devm_kzalloc(dev, + sizeof(*bandwidths) * num_sources, GFP_KERNEL); + if (!bandwidths) + return -ENOMEM; + lenb /= sizeof(*bw_buckets); + bw_buckets = devm_kzalloc(dev, lenb * sizeof(*bw_buckets), + GFP_KERNEL); + if (!bw_buckets) { + devm_kfree(dev, bandwidths); + return -ENOMEM; + } + ret = of_property_read_u32_array(node, BW_TBL, bw_buckets, + lenb); + if (ret < 0) { + devm_kfree(dev, bandwidths); + devm_kfree(dev, bw_buckets); + pr_err("Unable to read bandwidth table from device tree.\n"); + return ret; + } + + curr_bw = 0; + num_buckets = lenb; + return 0; +} + +/* Adds struct freq_entry nodes to the hashtable for each compatible frequency. + * Returns the supported number of frequencies. + */ +static int freq_buckets_init(struct device *dev) +{ + struct freq_entry *freq_entry; + int i; + int ret, lenf; + struct device_node *node = dev->of_node; + + if (!of_find_property(node, FREQ_TBL, &lenf)) { + pr_err("Missing %s property\n", FREQ_TBL); + return -ENODATA; + } + + lenf /= sizeof(*freq_buckets); + freq_buckets = devm_kzalloc(dev, lenf * sizeof(*freq_buckets), + GFP_KERNEL); + if (!freq_buckets) + return -ENOMEM; + pr_debug("freqs found len %d\n", lenf); + ret = of_property_read_u32_array(node, FREQ_TBL, freq_buckets, + lenf); + if (ret < 0) { + devm_kfree(dev, freq_buckets); + pr_err("Unable to read frequency table from device tree.\n"); + return ret; + } + pr_debug("ret freq %d\n", ret); + + num_freqs = lenf; + curr_freq = freq_buckets[LOWEST_FREQ]; + + for (i = 0; i < num_freqs; i++) { + freq_entry = devm_kzalloc(dev, sizeof(struct freq_entry), + GFP_KERNEL); + if (!freq_entry) + return -ENOMEM; + freq_entry->buckets = devm_kzalloc(dev, sizeof(u64)*num_buckets, + GFP_KERNEL); + if (!freq_entry->buckets) { + devm_kfree(dev, freq_entry); + return -ENOMEM; + } + pr_debug("memory_state_time Adding freq to ht %d\n", + freq_buckets[i]); + freq_entry->freq = freq_buckets[i]; + hash_add(freq_hash_table, &freq_entry->hash, freq_buckets[i]); + } + return 0; +} + +struct kobject *memory_kobj; +EXPORT_SYMBOL_GPL(memory_kobj); + +static struct attribute *memory_attrs[] = { + &show_stat_attr.attr, + NULL +}; + +static struct attribute_group memory_attr_group = { + .attrs = memory_attrs, +}; + +static int memory_state_time_probe(struct platform_device *pdev) +{ + int error; + + error = get_bw_buckets(&pdev->dev); + if (error) + return error; + error = freq_buckets_init(&pdev->dev); + if (error) + return error; + last_update = ktime_get_boot_ns(); + init_success = true; + + pr_debug("memory_state_time initialized with num_freqs %d\n", + num_freqs); + return 0; +} + +static const struct of_device_id match_table[] = { + { .compatible = "memory-state-time" }, + {} +}; + +static struct platform_driver memory_state_time_driver = { + .probe = memory_state_time_probe, + .driver = { + .name = "memory-state-time", + .of_match_table = match_table, + .owner = THIS_MODULE, + }, +}; + +static int __init memory_state_time_init(void) +{ + int error; + + hash_init(freq_hash_table); + memory_wq = create_singlethread_workqueue("memory_wq"); + if (!memory_wq) { + pr_err("Unable to create workqueue.\n"); + return -EINVAL; + } + /* + * Create sys/kernel directory for memory_state_time. + */ + memory_kobj = kobject_create_and_add(TAG, kernel_kobj); + if (!memory_kobj) { + pr_err("Unable to allocate memory_kobj for sysfs directory.\n"); + error = -ENOMEM; + goto wq; + } + error = sysfs_create_group(memory_kobj, &memory_attr_group); + if (error) { + pr_err("Unable to create sysfs folder.\n"); + goto kobj; + } + + error = platform_driver_register(&memory_state_time_driver); + if (error) { + pr_err("Unable to register memory_state_time platform driver.\n"); + goto group; + } + return 0; + +group: sysfs_remove_group(memory_kobj, &memory_attr_group); +kobj: kobject_put(memory_kobj); +wq: destroy_workqueue(memory_wq); + return error; +} +module_init(memory_state_time_init);
diff --git a/drivers/misc/uid_sys_stats.c b/drivers/misc/uid_sys_stats.c new file mode 100644 index 0000000..345f229 --- /dev/null +++ b/drivers/misc/uid_sys_stats.c
@@ -0,0 +1,706 @@ +/* drivers/misc/uid_sys_stats.c + * + * Copyright (C) 2014 - 2015 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/atomic.h> +#include <linux/cpufreq_times.h> +#include <linux/err.h> +#include <linux/hashtable.h> +#include <linux/init.h> +#include <linux/kernel.h> +#include <linux/list.h> +#include <linux/mm.h> +#include <linux/proc_fs.h> +#include <linux/profile.h> +#include <linux/rtmutex.h> +#include <linux/sched.h> +#include <linux/seq_file.h> +#include <linux/slab.h> +#include <linux/uaccess.h> + + +#define UID_HASH_BITS 10 +DECLARE_HASHTABLE(hash_table, UID_HASH_BITS); + +static DEFINE_RT_MUTEX(uid_lock); +static struct proc_dir_entry *cpu_parent; +static struct proc_dir_entry *io_parent; +static struct proc_dir_entry *proc_parent; + +struct io_stats { + u64 read_bytes; + u64 write_bytes; + u64 rchar; + u64 wchar; + u64 fsync; +}; + +#define UID_STATE_FOREGROUND 0 +#define UID_STATE_BACKGROUND 1 +#define UID_STATE_BUCKET_SIZE 2 + +#define UID_STATE_TOTAL_CURR 2 +#define UID_STATE_TOTAL_LAST 3 +#define UID_STATE_DEAD_TASKS 4 +#define UID_STATE_SIZE 5 + +#define MAX_TASK_COMM_LEN 256 + +struct task_entry { + char comm[MAX_TASK_COMM_LEN]; + pid_t pid; + struct io_stats io[UID_STATE_SIZE]; + struct hlist_node hash; +}; + +struct uid_entry { + uid_t uid; + cputime_t utime; + cputime_t stime; + cputime_t active_utime; + cputime_t active_stime; + int state; + struct io_stats io[UID_STATE_SIZE]; + struct hlist_node hash; +#ifdef CONFIG_UID_SYS_STATS_DEBUG + DECLARE_HASHTABLE(task_entries, UID_HASH_BITS); +#endif +}; + +static u64 compute_write_bytes(struct task_struct *task) +{ + if (task->ioac.write_bytes <= task->ioac.cancelled_write_bytes) + return 0; + + return task->ioac.write_bytes - task->ioac.cancelled_write_bytes; +} + +static void compute_io_bucket_stats(struct io_stats *io_bucket, + struct io_stats *io_curr, + struct io_stats *io_last, + struct io_stats *io_dead) +{ + /* tasks could switch to another uid group, but its io_last in the + * previous uid group could still be positive. + * therefore before each update, do an overflow check first + */ + int64_t delta; + + delta = io_curr->read_bytes + io_dead->read_bytes - + io_last->read_bytes; + io_bucket->read_bytes += delta > 0 ? delta : 0; + delta = io_curr->write_bytes + io_dead->write_bytes - + io_last->write_bytes; + io_bucket->write_bytes += delta > 0 ? delta : 0; + delta = io_curr->rchar + io_dead->rchar - io_last->rchar; + io_bucket->rchar += delta > 0 ? delta : 0; + delta = io_curr->wchar + io_dead->wchar - io_last->wchar; + io_bucket->wchar += delta > 0 ? delta : 0; + delta = io_curr->fsync + io_dead->fsync - io_last->fsync; + io_bucket->fsync += delta > 0 ? delta : 0; + + io_last->read_bytes = io_curr->read_bytes; + io_last->write_bytes = io_curr->write_bytes; + io_last->rchar = io_curr->rchar; + io_last->wchar = io_curr->wchar; + io_last->fsync = io_curr->fsync; + + memset(io_dead, 0, sizeof(struct io_stats)); +} + +#ifdef CONFIG_UID_SYS_STATS_DEBUG +static void get_full_task_comm(struct task_entry *task_entry, + struct task_struct *task) +{ + int i = 0, offset = 0, len = 0; + /* save one byte for terminating null character */ + int unused_len = MAX_TASK_COMM_LEN - TASK_COMM_LEN - 1; + char buf[unused_len]; + struct mm_struct *mm = task->mm; + + /* fill the first TASK_COMM_LEN bytes with thread name */ + __get_task_comm(task_entry->comm, TASK_COMM_LEN, task); + i = strlen(task_entry->comm); + while (i < TASK_COMM_LEN) + task_entry->comm[i++] = ' '; + + /* next the executable file name */ + if (mm) { + down_read(&mm->mmap_sem); + if (mm->exe_file) { + char *pathname = d_path(&mm->exe_file->f_path, buf, + unused_len); + + if (!IS_ERR(pathname)) { + len = strlcpy(task_entry->comm + i, pathname, + unused_len); + i += len; + task_entry->comm[i++] = ' '; + unused_len--; + } + } + up_read(&mm->mmap_sem); + } + unused_len -= len; + + /* fill the rest with command line argument + * replace each null or new line character + * between args in argv with whitespace */ + len = get_cmdline(task, buf, unused_len); + while (offset < len) { + if (buf[offset] != '\0' && buf[offset] != '\n') + task_entry->comm[i++] = buf[offset]; + else + task_entry->comm[i++] = ' '; + offset++; + } + + /* get rid of trailing whitespaces in case when arg is memset to + * zero before being reset in userspace + */ + while (task_entry->comm[i-1] == ' ') + i--; + task_entry->comm[i] = '\0'; +} + +static struct task_entry *find_task_entry(struct uid_entry *uid_entry, + struct task_struct *task) +{ + struct task_entry *task_entry; + + hash_for_each_possible(uid_entry->task_entries, task_entry, hash, + task->pid) { + if (task->pid == task_entry->pid) { + /* if thread name changed, update the entire command */ + int len = strnchr(task_entry->comm, ' ', TASK_COMM_LEN) + - task_entry->comm; + + if (strncmp(task_entry->comm, task->comm, len)) + get_full_task_comm(task_entry, task); + return task_entry; + } + } + return NULL; +} + +static struct task_entry *find_or_register_task(struct uid_entry *uid_entry, + struct task_struct *task) +{ + struct task_entry *task_entry; + pid_t pid = task->pid; + + task_entry = find_task_entry(uid_entry, task); + if (task_entry) + return task_entry; + + task_entry = kzalloc(sizeof(struct task_entry), GFP_ATOMIC); + if (!task_entry) + return NULL; + + get_full_task_comm(task_entry, task); + + task_entry->pid = pid; + hash_add(uid_entry->task_entries, &task_entry->hash, (unsigned int)pid); + + return task_entry; +} + +static void remove_uid_tasks(struct uid_entry *uid_entry) +{ + struct task_entry *task_entry; + unsigned long bkt_task; + struct hlist_node *tmp_task; + + hash_for_each_safe(uid_entry->task_entries, bkt_task, + tmp_task, task_entry, hash) { + hash_del(&task_entry->hash); + kfree(task_entry); + } +} + +static void set_io_uid_tasks_zero(struct uid_entry *uid_entry) +{ + struct task_entry *task_entry; + unsigned long bkt_task; + + hash_for_each(uid_entry->task_entries, bkt_task, task_entry, hash) { + memset(&task_entry->io[UID_STATE_TOTAL_CURR], 0, + sizeof(struct io_stats)); + } +} + +static void add_uid_tasks_io_stats(struct uid_entry *uid_entry, + struct task_struct *task, int slot) +{ + struct task_entry *task_entry = find_or_register_task(uid_entry, task); + struct io_stats *task_io_slot = &task_entry->io[slot]; + + task_io_slot->read_bytes += task->ioac.read_bytes; + task_io_slot->write_bytes += compute_write_bytes(task); + task_io_slot->rchar += task->ioac.rchar; + task_io_slot->wchar += task->ioac.wchar; + task_io_slot->fsync += task->ioac.syscfs; +} + +static void compute_io_uid_tasks(struct uid_entry *uid_entry) +{ + struct task_entry *task_entry; + unsigned long bkt_task; + + hash_for_each(uid_entry->task_entries, bkt_task, task_entry, hash) { + compute_io_bucket_stats(&task_entry->io[uid_entry->state], + &task_entry->io[UID_STATE_TOTAL_CURR], + &task_entry->io[UID_STATE_TOTAL_LAST], + &task_entry->io[UID_STATE_DEAD_TASKS]); + } +} + +static void show_io_uid_tasks(struct seq_file *m, struct uid_entry *uid_entry) +{ + struct task_entry *task_entry; + unsigned long bkt_task; + + hash_for_each(uid_entry->task_entries, bkt_task, task_entry, hash) { + /* Separated by comma because space exists in task comm */ + seq_printf(m, "task,%s,%lu,%llu,%llu,%llu,%llu,%llu,%llu,%llu,%llu,%llu,%llu\n", + task_entry->comm, + (unsigned long)task_entry->pid, + task_entry->io[UID_STATE_FOREGROUND].rchar, + task_entry->io[UID_STATE_FOREGROUND].wchar, + task_entry->io[UID_STATE_FOREGROUND].read_bytes, + task_entry->io[UID_STATE_FOREGROUND].write_bytes, + task_entry->io[UID_STATE_BACKGROUND].rchar, + task_entry->io[UID_STATE_BACKGROUND].wchar, + task_entry->io[UID_STATE_BACKGROUND].read_bytes, + task_entry->io[UID_STATE_BACKGROUND].write_bytes, + task_entry->io[UID_STATE_FOREGROUND].fsync, + task_entry->io[UID_STATE_BACKGROUND].fsync); + } +} +#else +static void remove_uid_tasks(struct uid_entry *uid_entry) {}; +static void set_io_uid_tasks_zero(struct uid_entry *uid_entry) {}; +static void add_uid_tasks_io_stats(struct uid_entry *uid_entry, + struct task_struct *task, int slot) {}; +static void compute_io_uid_tasks(struct uid_entry *uid_entry) {}; +static void show_io_uid_tasks(struct seq_file *m, + struct uid_entry *uid_entry) {} +#endif + +static struct uid_entry *find_uid_entry(uid_t uid) +{ + struct uid_entry *uid_entry; + hash_for_each_possible(hash_table, uid_entry, hash, uid) { + if (uid_entry->uid == uid) + return uid_entry; + } + return NULL; +} + +static struct uid_entry *find_or_register_uid(uid_t uid) +{ + struct uid_entry *uid_entry; + + uid_entry = find_uid_entry(uid); + if (uid_entry) + return uid_entry; + + uid_entry = kzalloc(sizeof(struct uid_entry), GFP_ATOMIC); + if (!uid_entry) + return NULL; + + uid_entry->uid = uid; +#ifdef CONFIG_UID_SYS_STATS_DEBUG + hash_init(uid_entry->task_entries); +#endif + hash_add(hash_table, &uid_entry->hash, uid); + + return uid_entry; +} + +static int uid_cputime_show(struct seq_file *m, void *v) +{ + struct uid_entry *uid_entry = NULL; + struct task_struct *task, *temp; + struct user_namespace *user_ns = current_user_ns(); + cputime_t utime; + cputime_t stime; + unsigned long bkt; + uid_t uid; + + rt_mutex_lock(&uid_lock); + + hash_for_each(hash_table, bkt, uid_entry, hash) { + uid_entry->active_stime = 0; + uid_entry->active_utime = 0; + } + + rcu_read_lock(); + do_each_thread(temp, task) { + uid = from_kuid_munged(user_ns, task_uid(task)); + if (!uid_entry || uid_entry->uid != uid) + uid_entry = find_or_register_uid(uid); + if (!uid_entry) { + rcu_read_unlock(); + rt_mutex_unlock(&uid_lock); + pr_err("%s: failed to find the uid_entry for uid %d\n", + __func__, uid); + return -ENOMEM; + } + task_cputime_adjusted(task, &utime, &stime); + uid_entry->active_utime += utime; + uid_entry->active_stime += stime; + } while_each_thread(temp, task); + rcu_read_unlock(); + + hash_for_each(hash_table, bkt, uid_entry, hash) { + cputime_t total_utime = uid_entry->utime + + uid_entry->active_utime; + cputime_t total_stime = uid_entry->stime + + uid_entry->active_stime; + seq_printf(m, "%d: %llu %llu\n", uid_entry->uid, + (unsigned long long)jiffies_to_msecs( + cputime_to_jiffies(total_utime)) * USEC_PER_MSEC, + (unsigned long long)jiffies_to_msecs( + cputime_to_jiffies(total_stime)) * USEC_PER_MSEC); + } + + rt_mutex_unlock(&uid_lock); + return 0; +} + +static int uid_cputime_open(struct inode *inode, struct file *file) +{ + return single_open(file, uid_cputime_show, PDE_DATA(inode)); +} + +static const struct file_operations uid_cputime_fops = { + .open = uid_cputime_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static int uid_remove_open(struct inode *inode, struct file *file) +{ + return single_open(file, NULL, NULL); +} + +static ssize_t uid_remove_write(struct file *file, + const char __user *buffer, size_t count, loff_t *ppos) +{ + struct uid_entry *uid_entry; + struct hlist_node *tmp; + char uids[128]; + char *start_uid, *end_uid = NULL; + long int uid_start = 0, uid_end = 0; + + if (count >= sizeof(uids)) + count = sizeof(uids) - 1; + + if (copy_from_user(uids, buffer, count)) + return -EFAULT; + + uids[count] = '\0'; + end_uid = uids; + start_uid = strsep(&end_uid, "-"); + + if (!start_uid || !end_uid) + return -EINVAL; + + if (kstrtol(start_uid, 10, &uid_start) != 0 || + kstrtol(end_uid, 10, &uid_end) != 0) { + return -EINVAL; + } + + /* Also remove uids from /proc/uid_time_in_state */ + cpufreq_task_times_remove_uids(uid_start, uid_end); + + rt_mutex_lock(&uid_lock); + + for (; uid_start <= uid_end; uid_start++) { + hash_for_each_possible_safe(hash_table, uid_entry, tmp, + hash, (uid_t)uid_start) { + if (uid_start == uid_entry->uid) { + remove_uid_tasks(uid_entry); + hash_del(&uid_entry->hash); + kfree(uid_entry); + } + } + } + + rt_mutex_unlock(&uid_lock); + return count; +} + +static const struct file_operations uid_remove_fops = { + .open = uid_remove_open, + .release = single_release, + .write = uid_remove_write, +}; + + +static void add_uid_io_stats(struct uid_entry *uid_entry, + struct task_struct *task, int slot) +{ + struct io_stats *io_slot = &uid_entry->io[slot]; + + io_slot->read_bytes += task->ioac.read_bytes; + io_slot->write_bytes += compute_write_bytes(task); + io_slot->rchar += task->ioac.rchar; + io_slot->wchar += task->ioac.wchar; + io_slot->fsync += task->ioac.syscfs; + + add_uid_tasks_io_stats(uid_entry, task, slot); +} + +static void update_io_stats_all_locked(void) +{ + struct uid_entry *uid_entry = NULL; + struct task_struct *task, *temp; + struct user_namespace *user_ns = current_user_ns(); + unsigned long bkt; + uid_t uid; + + hash_for_each(hash_table, bkt, uid_entry, hash) { + memset(&uid_entry->io[UID_STATE_TOTAL_CURR], 0, + sizeof(struct io_stats)); + set_io_uid_tasks_zero(uid_entry); + } + + rcu_read_lock(); + do_each_thread(temp, task) { + uid = from_kuid_munged(user_ns, task_uid(task)); + if (!uid_entry || uid_entry->uid != uid) + uid_entry = find_or_register_uid(uid); + if (!uid_entry) + continue; + add_uid_io_stats(uid_entry, task, UID_STATE_TOTAL_CURR); + } while_each_thread(temp, task); + rcu_read_unlock(); + + hash_for_each(hash_table, bkt, uid_entry, hash) { + compute_io_bucket_stats(&uid_entry->io[uid_entry->state], + &uid_entry->io[UID_STATE_TOTAL_CURR], + &uid_entry->io[UID_STATE_TOTAL_LAST], + &uid_entry->io[UID_STATE_DEAD_TASKS]); + compute_io_uid_tasks(uid_entry); + } +} + +static void update_io_stats_uid_locked(struct uid_entry *uid_entry) +{ + struct task_struct *task, *temp; + struct user_namespace *user_ns = current_user_ns(); + + memset(&uid_entry->io[UID_STATE_TOTAL_CURR], 0, + sizeof(struct io_stats)); + set_io_uid_tasks_zero(uid_entry); + + rcu_read_lock(); + do_each_thread(temp, task) { + if (from_kuid_munged(user_ns, task_uid(task)) != uid_entry->uid) + continue; + add_uid_io_stats(uid_entry, task, UID_STATE_TOTAL_CURR); + } while_each_thread(temp, task); + rcu_read_unlock(); + + compute_io_bucket_stats(&uid_entry->io[uid_entry->state], + &uid_entry->io[UID_STATE_TOTAL_CURR], + &uid_entry->io[UID_STATE_TOTAL_LAST], + &uid_entry->io[UID_STATE_DEAD_TASKS]); + compute_io_uid_tasks(uid_entry); +} + + +static int uid_io_show(struct seq_file *m, void *v) +{ + struct uid_entry *uid_entry; + unsigned long bkt; + + rt_mutex_lock(&uid_lock); + + update_io_stats_all_locked(); + + hash_for_each(hash_table, bkt, uid_entry, hash) { + seq_printf(m, "%d %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu\n", + uid_entry->uid, + uid_entry->io[UID_STATE_FOREGROUND].rchar, + uid_entry->io[UID_STATE_FOREGROUND].wchar, + uid_entry->io[UID_STATE_FOREGROUND].read_bytes, + uid_entry->io[UID_STATE_FOREGROUND].write_bytes, + uid_entry->io[UID_STATE_BACKGROUND].rchar, + uid_entry->io[UID_STATE_BACKGROUND].wchar, + uid_entry->io[UID_STATE_BACKGROUND].read_bytes, + uid_entry->io[UID_STATE_BACKGROUND].write_bytes, + uid_entry->io[UID_STATE_FOREGROUND].fsync, + uid_entry->io[UID_STATE_BACKGROUND].fsync); + + show_io_uid_tasks(m, uid_entry); + } + + rt_mutex_unlock(&uid_lock); + return 0; +} + +static int uid_io_open(struct inode *inode, struct file *file) +{ + return single_open(file, uid_io_show, PDE_DATA(inode)); +} + +static const struct file_operations uid_io_fops = { + .open = uid_io_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static int uid_procstat_open(struct inode *inode, struct file *file) +{ + return single_open(file, NULL, NULL); +} + +static ssize_t uid_procstat_write(struct file *file, + const char __user *buffer, size_t count, loff_t *ppos) +{ + struct uid_entry *uid_entry; + uid_t uid; + int argc, state; + char input[128]; + + if (count >= sizeof(input)) + return -EINVAL; + + if (copy_from_user(input, buffer, count)) + return -EFAULT; + + input[count] = '\0'; + + argc = sscanf(input, "%u %d", &uid, &state); + if (argc != 2) + return -EINVAL; + + if (state != UID_STATE_BACKGROUND && state != UID_STATE_FOREGROUND) + return -EINVAL; + + rt_mutex_lock(&uid_lock); + + uid_entry = find_or_register_uid(uid); + if (!uid_entry) { + rt_mutex_unlock(&uid_lock); + return -EINVAL; + } + + if (uid_entry->state == state) { + rt_mutex_unlock(&uid_lock); + return count; + } + + update_io_stats_uid_locked(uid_entry); + + uid_entry->state = state; + + rt_mutex_unlock(&uid_lock); + + return count; +} + +static const struct file_operations uid_procstat_fops = { + .open = uid_procstat_open, + .release = single_release, + .write = uid_procstat_write, +}; + +static int process_notifier(struct notifier_block *self, + unsigned long cmd, void *v) +{ + struct task_struct *task = v; + struct uid_entry *uid_entry; + cputime_t utime, stime; + uid_t uid; + + if (!task) + return NOTIFY_OK; + + rt_mutex_lock(&uid_lock); + uid = from_kuid_munged(current_user_ns(), task_uid(task)); + uid_entry = find_or_register_uid(uid); + if (!uid_entry) { + pr_err("%s: failed to find uid %d\n", __func__, uid); + goto exit; + } + + task_cputime_adjusted(task, &utime, &stime); + uid_entry->utime += utime; + uid_entry->stime += stime; + + add_uid_io_stats(uid_entry, task, UID_STATE_DEAD_TASKS); + +exit: + rt_mutex_unlock(&uid_lock); + return NOTIFY_OK; +} + +static struct notifier_block process_notifier_block = { + .notifier_call = process_notifier, +}; + +static int __init proc_uid_sys_stats_init(void) +{ + hash_init(hash_table); + + cpu_parent = proc_mkdir("uid_cputime", NULL); + if (!cpu_parent) { + pr_err("%s: failed to create uid_cputime proc entry\n", + __func__); + goto err; + } + + proc_create_data("remove_uid_range", 0222, cpu_parent, + &uid_remove_fops, NULL); + proc_create_data("show_uid_stat", 0444, cpu_parent, + &uid_cputime_fops, NULL); + + io_parent = proc_mkdir("uid_io", NULL); + if (!io_parent) { + pr_err("%s: failed to create uid_io proc entry\n", + __func__); + goto err; + } + + proc_create_data("stats", 0444, io_parent, + &uid_io_fops, NULL); + + proc_parent = proc_mkdir("uid_procstat", NULL); + if (!proc_parent) { + pr_err("%s: failed to create uid_procstat proc entry\n", + __func__); + goto err; + } + + proc_create_data("set", 0222, proc_parent, + &uid_procstat_fops, NULL); + + profile_event_register(PROFILE_TASK_EXIT, &process_notifier_block); + + return 0; + +err: + remove_proc_subtree("uid_cputime", NULL); + remove_proc_subtree("uid_io", NULL); + remove_proc_subtree("uid_procstat", NULL); + return -ENOMEM; +} + +early_initcall(proc_uid_sys_stats_init);
diff --git a/drivers/mmc/card/Kconfig b/drivers/mmc/card/Kconfig index 5562308..6142ec1 100644 --- a/drivers/mmc/card/Kconfig +++ b/drivers/mmc/card/Kconfig
@@ -68,3 +68,15 @@ This driver is only of interest to those developing or testing a host driver. Most people should say N here. + +config MMC_SIMULATE_MAX_SPEED + bool "Turn on maximum speed control per block device" + depends on MMC_BLOCK + help + Say Y here to enable MMC device speed limiting. Used to test and + simulate the behavior of the system when confronted with a slow MMC. + + Enables max_read_speed, max_write_speed and cache_size attributes to + control the write or read maximum KB/second speed behaviors. + + If unsure, say N here.
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c index 709a872..817fcf8 100644 --- a/drivers/mmc/card/block.c +++ b/drivers/mmc/card/block.c
@@ -287,6 +287,250 @@ static ssize_t force_ro_store(struct device *dev, struct device_attribute *attr, return ret; } +#ifdef CONFIG_MMC_SIMULATE_MAX_SPEED + +static int max_read_speed, max_write_speed, cache_size = 4; + +module_param(max_read_speed, int, S_IRUSR | S_IRGRP); +MODULE_PARM_DESC(max_read_speed, "maximum KB/s read speed 0=off"); +module_param(max_write_speed, int, S_IRUSR | S_IRGRP); +MODULE_PARM_DESC(max_write_speed, "maximum KB/s write speed 0=off"); +module_param(cache_size, int, S_IRUSR | S_IRGRP); +MODULE_PARM_DESC(cache_size, "MB high speed memory or SLC cache"); + +/* + * helper macros and expectations: + * size - unsigned long number of bytes + * jiffies - unsigned long HZ timestamp difference + * speed - unsigned KB/s transfer rate + */ +#define size_and_speed_to_jiffies(size, speed) \ + ((size) * HZ / (speed) / 1024UL) +#define jiffies_and_speed_to_size(jiffies, speed) \ + (((speed) * (jiffies) * 1024UL) / HZ) +#define jiffies_and_size_to_speed(jiffies, size) \ + ((size) * HZ / (jiffies) / 1024UL) + +/* Limits to report warning */ +/* jiffies_and_size_to_speed(10*HZ, queue_max_hw_sectors(q) * 512UL) ~ 25 */ +#define MIN_SPEED(q) 250 /* 10 times faster than a floppy disk */ +#define MAX_SPEED(q) jiffies_and_size_to_speed(1, queue_max_sectors(q) * 512UL) + +#define speed_valid(speed) ((speed) > 0) + +static const char off[] = "off\n"; + +static int max_speed_show(int speed, char *buf) +{ + if (speed) + return scnprintf(buf, PAGE_SIZE, "%uKB/s\n", speed); + else + return scnprintf(buf, PAGE_SIZE, off); +} + +static int max_speed_store(const char *buf, struct request_queue *q) +{ + unsigned int limit, set = 0; + + if (!strncasecmp(off, buf, sizeof(off) - 2)) + return set; + if (kstrtouint(buf, 0, &set) || (set > INT_MAX)) + return -EINVAL; + if (set == 0) + return set; + limit = MAX_SPEED(q); + if (set > limit) + pr_warn("max speed %u ineffective above %u\n", set, limit); + limit = MIN_SPEED(q); + if (set < limit) + pr_warn("max speed %u painful below %u\n", set, limit); + return set; +} + +static ssize_t max_write_speed_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct mmc_blk_data *md = mmc_blk_get(dev_to_disk(dev)); + int ret = max_speed_show(atomic_read(&md->queue.max_write_speed), buf); + + mmc_blk_put(md); + return ret; +} + +static ssize_t max_write_speed_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct mmc_blk_data *md = mmc_blk_get(dev_to_disk(dev)); + int set = max_speed_store(buf, md->queue.queue); + + if (set < 0) { + mmc_blk_put(md); + return set; + } + + atomic_set(&md->queue.max_write_speed, set); + mmc_blk_put(md); + return count; +} + +static const DEVICE_ATTR(max_write_speed, S_IRUGO | S_IWUSR, + max_write_speed_show, max_write_speed_store); + +static ssize_t max_read_speed_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct mmc_blk_data *md = mmc_blk_get(dev_to_disk(dev)); + int ret = max_speed_show(atomic_read(&md->queue.max_read_speed), buf); + + mmc_blk_put(md); + return ret; +} + +static ssize_t max_read_speed_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct mmc_blk_data *md = mmc_blk_get(dev_to_disk(dev)); + int set = max_speed_store(buf, md->queue.queue); + + if (set < 0) { + mmc_blk_put(md); + return set; + } + + atomic_set(&md->queue.max_read_speed, set); + mmc_blk_put(md); + return count; +} + +static const DEVICE_ATTR(max_read_speed, S_IRUGO | S_IWUSR, + max_read_speed_show, max_read_speed_store); + +static ssize_t cache_size_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct mmc_blk_data *md = mmc_blk_get(dev_to_disk(dev)); + struct mmc_queue *mq = &md->queue; + int cache_size = atomic_read(&mq->cache_size); + int ret; + + if (!cache_size) + ret = scnprintf(buf, PAGE_SIZE, off); + else { + int speed = atomic_read(&mq->max_write_speed); + + if (!speed_valid(speed)) + ret = scnprintf(buf, PAGE_SIZE, "%uMB\n", cache_size); + else { /* We accept race between cache_jiffies and cache_used */ + unsigned long size = jiffies_and_speed_to_size( + jiffies - mq->cache_jiffies, speed); + long used = atomic_long_read(&mq->cache_used); + + if (size >= used) + size = 0; + else + size = (used - size) * 100 / cache_size + / 1024UL / 1024UL; + + ret = scnprintf(buf, PAGE_SIZE, "%uMB %lu%% used\n", + cache_size, size); + } + } + + mmc_blk_put(md); + return ret; +} + +static ssize_t cache_size_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct mmc_blk_data *md; + unsigned int set = 0; + + if (strncasecmp(off, buf, sizeof(off) - 2) + && (kstrtouint(buf, 0, &set) || (set > INT_MAX))) + return -EINVAL; + + md = mmc_blk_get(dev_to_disk(dev)); + atomic_set(&md->queue.cache_size, set); + mmc_blk_put(md); + return count; +} + +static const DEVICE_ATTR(cache_size, S_IRUGO | S_IWUSR, + cache_size_show, cache_size_store); + +/* correct for write-back */ +static long mmc_blk_cache_used(struct mmc_queue *mq, unsigned long waitfor) +{ + long used = 0; + int speed = atomic_read(&mq->max_write_speed); + + if (speed_valid(speed)) { + unsigned long size = jiffies_and_speed_to_size( + waitfor - mq->cache_jiffies, speed); + used = atomic_long_read(&mq->cache_used); + + if (size >= used) + used = 0; + else + used -= size; + } + + atomic_long_set(&mq->cache_used, used); + mq->cache_jiffies = waitfor; + + return used; +} + +static void mmc_blk_simulate_delay( + struct mmc_queue *mq, + struct request *req, + unsigned long waitfor) +{ + int max_speed; + + if (!req) + return; + + max_speed = (rq_data_dir(req) == READ) + ? atomic_read(&mq->max_read_speed) + : atomic_read(&mq->max_write_speed); + if (speed_valid(max_speed)) { + unsigned long bytes = blk_rq_bytes(req); + + if (rq_data_dir(req) != READ) { + int cache_size = atomic_read(&mq->cache_size); + + if (cache_size) { + unsigned long size = cache_size * 1024L * 1024L; + long used = mmc_blk_cache_used(mq, waitfor); + + used += bytes; + atomic_long_set(&mq->cache_used, used); + bytes = 0; + if (used > size) + bytes = used - size; + } + } + waitfor += size_and_speed_to_jiffies(bytes, max_speed); + if (time_is_after_jiffies(waitfor)) { + long msecs = jiffies_to_msecs(waitfor - jiffies); + + if (likely(msecs > 0)) + msleep(msecs); + } + } +} + +#else + +#define mmc_blk_simulate_delay(mq, req, waitfor) + +#endif + static int mmc_blk_open(struct block_device *bdev, fmode_t mode) { struct mmc_blk_data *md = mmc_blk_get(bdev->bd_disk); @@ -1284,6 +1528,23 @@ static int mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req) if (ret) ret = -EIO; +#ifdef CONFIG_MMC_SIMULATE_MAX_SPEED + else if (atomic_read(&mq->cache_size)) { + long used = mmc_blk_cache_used(mq, jiffies); + + if (used) { + int speed = atomic_read(&mq->max_write_speed); + + if (speed_valid(speed)) { + unsigned long msecs = jiffies_to_msecs( + size_and_speed_to_jiffies( + used, speed)); + if (msecs) + msleep(msecs); + } + } + } +#endif blk_end_request_all(req, ret); return ret ? 0 : 1; @@ -1965,6 +2226,9 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc) struct mmc_async_req *areq; const u8 packed_nr = 2; u8 reqs = 0; +#ifdef CONFIG_MMC_SIMULATE_MAX_SPEED + unsigned long waitfor = jiffies; +#endif if (!rqc && !mq->mqrq_prev->req) return 0; @@ -2015,6 +2279,8 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc) */ mmc_blk_reset_success(md, type); + mmc_blk_simulate_delay(mq, rqc, waitfor); + if (mmc_packed_cmd(mq_rq->cmd_type)) { ret = mmc_blk_end_packed_req(mq_rq); break; @@ -2437,6 +2703,14 @@ static void mmc_blk_remove_req(struct mmc_blk_data *md) card->ext_csd.boot_ro_lockable) device_remove_file(disk_to_dev(md->disk), &md->power_ro_lock); +#ifdef CONFIG_MMC_SIMULATE_MAX_SPEED + device_remove_file(disk_to_dev(md->disk), + &dev_attr_max_write_speed); + device_remove_file(disk_to_dev(md->disk), + &dev_attr_max_read_speed); + device_remove_file(disk_to_dev(md->disk), + &dev_attr_cache_size); +#endif del_gendisk(md->disk); } @@ -2471,6 +2745,24 @@ static int mmc_add_disk(struct mmc_blk_data *md) ret = device_create_file(disk_to_dev(md->disk), &md->force_ro); if (ret) goto force_ro_fail; +#ifdef CONFIG_MMC_SIMULATE_MAX_SPEED + atomic_set(&md->queue.max_write_speed, max_write_speed); + ret = device_create_file(disk_to_dev(md->disk), + &dev_attr_max_write_speed); + if (ret) + goto max_write_speed_fail; + atomic_set(&md->queue.max_read_speed, max_read_speed); + ret = device_create_file(disk_to_dev(md->disk), + &dev_attr_max_read_speed); + if (ret) + goto max_read_speed_fail; + atomic_set(&md->queue.cache_size, cache_size); + atomic_long_set(&md->queue.cache_used, 0); + md->queue.cache_jiffies = jiffies; + ret = device_create_file(disk_to_dev(md->disk), &dev_attr_cache_size); + if (ret) + goto cache_size_fail; +#endif if ((md->area_type & MMC_BLK_DATA_AREA_BOOT) && card->ext_csd.boot_ro_lockable) { @@ -2495,6 +2787,14 @@ static int mmc_add_disk(struct mmc_blk_data *md) return ret; power_ro_lock_fail: +#ifdef CONFIG_MMC_SIMULATE_MAX_SPEED + device_remove_file(disk_to_dev(md->disk), &dev_attr_cache_size); +cache_size_fail: + device_remove_file(disk_to_dev(md->disk), &dev_attr_max_read_speed); +max_read_speed_fail: + device_remove_file(disk_to_dev(md->disk), &dev_attr_max_write_speed); +max_write_speed_fail: +#endif device_remove_file(disk_to_dev(md->disk), &md->force_ro); force_ro_fail: del_gendisk(md->disk);
diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c index 8037f73..1810f76 100644 --- a/drivers/mmc/card/queue.c +++ b/drivers/mmc/card/queue.c
@@ -19,6 +19,7 @@ #include <linux/mmc/card.h> #include <linux/mmc/host.h> +#include <linux/sched/rt.h> #include "queue.h" #include "block.h" @@ -53,6 +54,11 @@ static int mmc_queue_thread(void *d) { struct mmc_queue *mq = d; struct request_queue *q = mq->queue; + struct sched_param scheduler_params = {0}; + + scheduler_params.sched_priority = 1; + + sched_setscheduler(current, SCHED_FIFO, &scheduler_params); current->flags |= PF_MEMALLOC;
diff --git a/drivers/mmc/card/queue.h b/drivers/mmc/card/queue.h index 342f1e3..fe58d31 100644 --- a/drivers/mmc/card/queue.h +++ b/drivers/mmc/card/queue.h
@@ -62,6 +62,14 @@ struct mmc_queue { struct mmc_queue_req mqrq[2]; struct mmc_queue_req *mqrq_cur; struct mmc_queue_req *mqrq_prev; +#ifdef CONFIG_MMC_SIMULATE_MAX_SPEED + atomic_t max_write_speed; + atomic_t max_read_speed; + atomic_t cache_size; + /* i/o tracking */ + atomic_long_t cache_used; + unsigned long cache_jiffies; +#endif }; extern int mmc_init_queue(struct mmc_queue *, struct mmc_card *, spinlock_t *,
diff --git a/drivers/mmc/core/Kconfig b/drivers/mmc/core/Kconfig index 250f223..daad32f 100644 --- a/drivers/mmc/core/Kconfig +++ b/drivers/mmc/core/Kconfig
@@ -22,3 +22,18 @@ This driver can also be built as a module. If so, the module will be called pwrseq_simple. + +config MMC_EMBEDDED_SDIO + boolean "MMC embedded SDIO device support (EXPERIMENTAL)" + help + If you say Y here, support will be added for embedded SDIO + devices which do not contain the necessary enumeration + support in hardware to be properly detected. + +config MMC_PARANOID_SD_INIT + bool "Enable paranoid SD card initialization (EXPERIMENTAL)" + help + If you say Y here, the MMC layer will be extra paranoid + about re-trying SD init requests. This can be a useful + work-around for buggy controllers and hardware. Enable + if you are experiencing issues with SD detection.
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c index cff58297..e19c370 100644 --- a/drivers/mmc/core/core.c +++ b/drivers/mmc/core/core.c
@@ -201,6 +201,20 @@ void mmc_request_done(struct mmc_host *host, struct mmc_request *mrq) pr_debug("%s: %d bytes transferred: %d\n", mmc_hostname(host), mrq->data->bytes_xfered, mrq->data->error); +#ifdef CONFIG_BLOCK + if (mrq->lat_hist_enabled) { + ktime_t completion; + u_int64_t delta_us; + + completion = ktime_get(); + delta_us = ktime_us_delta(completion, + mrq->io_start); + blk_update_latency_hist( + (mrq->data->flags & MMC_DATA_READ) ? + &host->io_lat_read : + &host->io_lat_write, delta_us); + } +#endif } if (mrq->stop) { @@ -699,8 +713,16 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host, } } - if (!err && areq) + if (!err && areq) { +#ifdef CONFIG_BLOCK + if (host->latency_hist_enabled) { + areq->mrq->io_start = ktime_get(); + areq->mrq->lat_hist_enabled = 1; + } else + areq->mrq->lat_hist_enabled = 0; +#endif start_err = __mmc_start_data_req(host, areq->mrq); + } if (host->areq) mmc_post_req(host, host->areq->mrq, 0); @@ -2051,7 +2073,7 @@ void mmc_init_erase(struct mmc_card *card) } static unsigned int mmc_mmc_erase_timeout(struct mmc_card *card, - unsigned int arg, unsigned int qty) + unsigned int arg, unsigned int qty) { unsigned int erase_timeout; @@ -3034,6 +3056,22 @@ void mmc_init_context_info(struct mmc_host *host) init_waitqueue_head(&host->context_info.wait); } +#ifdef CONFIG_MMC_EMBEDDED_SDIO +void mmc_set_embedded_sdio_data(struct mmc_host *host, + struct sdio_cis *cis, + struct sdio_cccr *cccr, + struct sdio_embedded_func *funcs, + int num_funcs) +{ + host->embedded_sdio_data.cis = cis; + host->embedded_sdio_data.cccr = cccr; + host->embedded_sdio_data.funcs = funcs; + host->embedded_sdio_data.num_funcs = num_funcs; +} + +EXPORT_SYMBOL(mmc_set_embedded_sdio_data); +#endif + static int __init mmc_init(void) { int ret; @@ -3066,6 +3104,63 @@ static void __exit mmc_exit(void) mmc_unregister_bus(); } +#ifdef CONFIG_BLOCK +static ssize_t +latency_hist_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct mmc_host *host = cls_dev_to_mmc_host(dev); + size_t written_bytes; + + written_bytes = blk_latency_hist_show("Read", &host->io_lat_read, + buf, PAGE_SIZE); + written_bytes += blk_latency_hist_show("Write", &host->io_lat_write, + buf + written_bytes, PAGE_SIZE - written_bytes); + + return written_bytes; +} + +/* + * Values permitted 0, 1, 2. + * 0 -> Disable IO latency histograms (default) + * 1 -> Enable IO latency histograms + * 2 -> Zero out IO latency histograms + */ +static ssize_t +latency_hist_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct mmc_host *host = cls_dev_to_mmc_host(dev); + long value; + + if (kstrtol(buf, 0, &value)) + return -EINVAL; + if (value == BLK_IO_LAT_HIST_ZERO) { + memset(&host->io_lat_read, 0, sizeof(host->io_lat_read)); + memset(&host->io_lat_write, 0, sizeof(host->io_lat_write)); + } else if (value == BLK_IO_LAT_HIST_ENABLE || + value == BLK_IO_LAT_HIST_DISABLE) + host->latency_hist_enabled = value; + return count; +} + +static DEVICE_ATTR(latency_hist, S_IRUGO | S_IWUSR, + latency_hist_show, latency_hist_store); + +void +mmc_latency_hist_sysfs_init(struct mmc_host *host) +{ + if (device_create_file(&host->class_dev, &dev_attr_latency_hist)) + dev_err(&host->class_dev, + "Failed to create latency_hist sysfs entry\n"); +} + +void +mmc_latency_hist_sysfs_exit(struct mmc_host *host) +{ + device_remove_file(&host->class_dev, &dev_attr_latency_hist); +} +#endif + subsys_initcall(mmc_init); module_exit(mmc_exit);
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c index 848b345..07f8891 100644 --- a/drivers/mmc/core/host.c +++ b/drivers/mmc/core/host.c
@@ -31,8 +31,6 @@ #include "slot-gpio.h" #include "pwrseq.h" -#define cls_dev_to_mmc_host(d) container_of(d, struct mmc_host, class_dev) - static DEFINE_IDA(mmc_host_ida); static DEFINE_SPINLOCK(mmc_host_lock); @@ -428,8 +426,13 @@ int mmc_add_host(struct mmc_host *host) mmc_add_host_debugfs(host); #endif +#ifdef CONFIG_BLOCK + mmc_latency_hist_sysfs_init(host); +#endif + mmc_start_host(host); - mmc_register_pm_notifier(host); + if (!(host->pm_flags & MMC_PM_IGNORE_PM_NOTIFY)) + mmc_register_pm_notifier(host); return 0; } @@ -446,13 +449,18 @@ EXPORT_SYMBOL(mmc_add_host); */ void mmc_remove_host(struct mmc_host *host) { - mmc_unregister_pm_notifier(host); + if (!(host->pm_flags & MMC_PM_IGNORE_PM_NOTIFY)) + mmc_unregister_pm_notifier(host); mmc_stop_host(host); #ifdef CONFIG_DEBUG_FS mmc_remove_host_debugfs(host); #endif +#ifdef CONFIG_BLOCK + mmc_latency_hist_sysfs_exit(host); +#endif + device_del(&host->class_dev); led_trigger_unregister_simple(host->led);
diff --git a/drivers/mmc/core/host.h b/drivers/mmc/core/host.h index 992bf53..bf38533 100644 --- a/drivers/mmc/core/host.h +++ b/drivers/mmc/core/host.h
@@ -12,6 +12,8 @@ #define _MMC_CORE_HOST_H #include <linux/mmc/host.h> +#define cls_dev_to_mmc_host(d) container_of(d, struct mmc_host, class_dev) + int mmc_register_host_class(void); void mmc_unregister_host_class(void); @@ -21,5 +23,8 @@ void mmc_retune_hold(struct mmc_host *host); void mmc_retune_release(struct mmc_host *host); int mmc_retune(struct mmc_host *host); +void mmc_latency_hist_sysfs_init(struct mmc_host *host); +void mmc_latency_hist_sysfs_exit(struct mmc_host *host); + #endif
diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c index 0c6de9f..3e5954f 100644 --- a/drivers/mmc/core/mmc.c +++ b/drivers/mmc/core/mmc.c
@@ -617,6 +617,12 @@ static int mmc_decode_ext_csd(struct mmc_card *card, u8 *ext_csd) card->ext_csd.ffu_capable = (ext_csd[EXT_CSD_SUPPORTED_MODE] & 0x1) && !(ext_csd[EXT_CSD_FW_CONFIG] & 0x1); + + card->ext_csd.pre_eol_info = ext_csd[EXT_CSD_PRE_EOL_INFO]; + card->ext_csd.device_life_time_est_typ_a = + ext_csd[EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]; + card->ext_csd.device_life_time_est_typ_b = + ext_csd[EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]; } out: return err; @@ -746,6 +752,11 @@ MMC_DEV_ATTR(manfid, "0x%06x\n", card->cid.manfid); MMC_DEV_ATTR(name, "%s\n", card->cid.prod_name); MMC_DEV_ATTR(oemid, "0x%04x\n", card->cid.oemid); MMC_DEV_ATTR(prv, "0x%x\n", card->cid.prv); +MMC_DEV_ATTR(rev, "0x%x\n", card->ext_csd.rev); +MMC_DEV_ATTR(pre_eol_info, "%02x\n", card->ext_csd.pre_eol_info); +MMC_DEV_ATTR(life_time, "0x%02x 0x%02x\n", + card->ext_csd.device_life_time_est_typ_a, + card->ext_csd.device_life_time_est_typ_b); MMC_DEV_ATTR(serial, "0x%08x\n", card->cid.serial); MMC_DEV_ATTR(enhanced_area_offset, "%llu\n", card->ext_csd.enhanced_area_offset); @@ -799,6 +810,9 @@ static struct attribute *mmc_std_attrs[] = { &dev_attr_name.attr, &dev_attr_oemid.attr, &dev_attr_prv.attr, + &dev_attr_rev.attr, + &dev_attr_pre_eol_info.attr, + &dev_attr_life_time.attr, &dev_attr_serial.attr, &dev_attr_enhanced_area_offset.attr, &dev_attr_enhanced_area_size.attr,
diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c index f09148a..ad70934 100644 --- a/drivers/mmc/core/sd.c +++ b/drivers/mmc/core/sd.c
@@ -847,6 +847,9 @@ int mmc_sd_setup_card(struct mmc_host *host, struct mmc_card *card, bool reinit) { int err; +#ifdef CONFIG_MMC_PARANOID_SD_INIT + int retries; +#endif if (!reinit) { /* @@ -873,7 +876,26 @@ int mmc_sd_setup_card(struct mmc_host *host, struct mmc_card *card, /* * Fetch switch information from card. */ +#ifdef CONFIG_MMC_PARANOID_SD_INIT + for (retries = 1; retries <= 3; retries++) { + err = mmc_read_switch(card); + if (!err) { + if (retries > 1) { + printk(KERN_WARNING + "%s: recovered\n", + mmc_hostname(host)); + } + break; + } else { + printk(KERN_WARNING + "%s: read switch failed (attempt %d)\n", + mmc_hostname(host), retries); + } + } +#else err = mmc_read_switch(card); +#endif + if (err) return err; } @@ -1071,7 +1093,10 @@ static int mmc_sd_alive(struct mmc_host *host) */ static void mmc_sd_detect(struct mmc_host *host) { - int err; + int err = 0; +#ifdef CONFIG_MMC_PARANOID_SD_INIT + int retries = 5; +#endif BUG_ON(!host); BUG_ON(!host->card); @@ -1081,7 +1106,23 @@ static void mmc_sd_detect(struct mmc_host *host) /* * Just check if our card has been removed. */ +#ifdef CONFIG_MMC_PARANOID_SD_INIT + while(retries) { + err = mmc_send_status(host->card, NULL); + if (err) { + retries--; + udelay(5); + continue; + } + break; + } + if (!retries) { + printk(KERN_ERR "%s(%s): Unable to re-detect card (%d)\n", + __func__, mmc_hostname(host), err); + } +#else err = _mmc_detect_card_removed(host); +#endif mmc_put_card(host->card); @@ -1143,6 +1184,9 @@ static int mmc_sd_suspend(struct mmc_host *host) static int _mmc_sd_resume(struct mmc_host *host) { int err = 0; +#ifdef CONFIG_MMC_PARANOID_SD_INIT + int retries; +#endif BUG_ON(!host); BUG_ON(!host->card); @@ -1153,7 +1197,23 @@ static int _mmc_sd_resume(struct mmc_host *host) goto out; mmc_power_up(host, host->card->ocr); +#ifdef CONFIG_MMC_PARANOID_SD_INIT + retries = 5; + while (retries) { + err = mmc_sd_init_card(host, host->card->ocr, host->card); + + if (err) { + printk(KERN_ERR "%s: Re-init card rc = %d (retries = %d)\n", + mmc_hostname(host), err, retries); + mdelay(5); + retries--; + continue; + } + break; + } +#else err = mmc_sd_init_card(host, host->card->ocr, host->card); +#endif mmc_card_clr_suspended(host->card); out: @@ -1228,6 +1288,9 @@ int mmc_attach_sd(struct mmc_host *host) { int err; u32 ocr, rocr; +#ifdef CONFIG_MMC_PARANOID_SD_INIT + int retries; +#endif BUG_ON(!host); WARN_ON(!host->claimed); @@ -1264,9 +1327,27 @@ int mmc_attach_sd(struct mmc_host *host) /* * Detect and init the card. */ +#ifdef CONFIG_MMC_PARANOID_SD_INIT + retries = 5; + while (retries) { + err = mmc_sd_init_card(host, rocr, NULL); + if (err) { + retries--; + continue; + } + break; + } + + if (!retries) { + printk(KERN_ERR "%s: mmc_sd_init_card() failure (err = %d)\n", + mmc_hostname(host), err); + goto err; + } +#else err = mmc_sd_init_card(host, rocr, NULL); if (err) goto err; +#endif mmc_release_host(host); err = mmc_add_card(host->card);
diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c index bd44ba8..b5ec3c8 100644 --- a/drivers/mmc/core/sdio.c +++ b/drivers/mmc/core/sdio.c
@@ -10,6 +10,7 @@ */ #include <linux/err.h> +#include <linux/module.h> #include <linux/pm_runtime.h> #include <linux/mmc/host.h> @@ -21,6 +22,7 @@ #include "core.h" #include "bus.h" +#include "host.h" #include "sd.h" #include "sdio_bus.h" #include "mmc_ops.h" @@ -28,6 +30,10 @@ #include "sdio_ops.h" #include "sdio_cis.h" +#ifdef CONFIG_MMC_EMBEDDED_SDIO +#include <linux/mmc/sdio_ids.h> +#endif + static int sdio_read_fbr(struct sdio_func *func) { int ret; @@ -697,19 +703,35 @@ static int mmc_sdio_init_card(struct mmc_host *host, u32 ocr, goto finish; } - /* - * Read the common registers. - */ - err = sdio_read_cccr(card, ocr); - if (err) - goto remove; +#ifdef CONFIG_MMC_EMBEDDED_SDIO + if (host->embedded_sdio_data.cccr) + memcpy(&card->cccr, host->embedded_sdio_data.cccr, sizeof(struct sdio_cccr)); + else { +#endif + /* + * Read the common registers. + */ + err = sdio_read_cccr(card, ocr); + if (err) + goto remove; +#ifdef CONFIG_MMC_EMBEDDED_SDIO + } +#endif - /* - * Read the common CIS tuples. - */ - err = sdio_read_common_cis(card); - if (err) - goto remove; +#ifdef CONFIG_MMC_EMBEDDED_SDIO + if (host->embedded_sdio_data.cis) + memcpy(&card->cis, host->embedded_sdio_data.cis, sizeof(struct sdio_cis)); + else { +#endif + /* + * Read the common CIS tuples. + */ + err = sdio_read_common_cis(card); + if (err) + goto remove; +#ifdef CONFIG_MMC_EMBEDDED_SDIO + } +#endif if (oldcard) { int same = (card->cis.vendor == oldcard->cis.vendor && @@ -1118,14 +1140,36 @@ int mmc_attach_sdio(struct mmc_host *host) funcs = (ocr & 0x70000000) >> 28; card->sdio_funcs = 0; +#ifdef CONFIG_MMC_EMBEDDED_SDIO + if (host->embedded_sdio_data.funcs) + card->sdio_funcs = funcs = host->embedded_sdio_data.num_funcs; +#endif + /* * Initialize (but don't add) all present functions. */ for (i = 0; i < funcs; i++, card->sdio_funcs++) { - err = sdio_init_func(host->card, i + 1); - if (err) - goto remove; +#ifdef CONFIG_MMC_EMBEDDED_SDIO + if (host->embedded_sdio_data.funcs) { + struct sdio_func *tmp; + tmp = sdio_alloc_func(host->card); + if (IS_ERR(tmp)) + goto remove; + tmp->num = (i + 1); + card->sdio_func[i] = tmp; + tmp->class = host->embedded_sdio_data.funcs[i].f_class; + tmp->max_blksize = host->embedded_sdio_data.funcs[i].f_maxblksize; + tmp->vendor = card->cis.vendor; + tmp->device = card->cis.device; + } else { +#endif + err = sdio_init_func(host->card, i + 1); + if (err) + goto remove; +#ifdef CONFIG_MMC_EMBEDDED_SDIO + } +#endif /* * Enable Runtime PM for this func (if supported) */ @@ -1173,3 +1217,42 @@ int mmc_attach_sdio(struct mmc_host *host) return err; } +int sdio_reset_comm(struct mmc_card *card) +{ + struct mmc_host *host = card->host; + u32 ocr; + u32 rocr; + int err; + + printk("%s():\n", __func__); + mmc_claim_host(host); + + mmc_retune_disable(host); + + mmc_go_idle(host); + + mmc_set_clock(host, host->f_min); + + err = mmc_send_io_op_cond(host, 0, &ocr); + if (err) + goto err; + + rocr = mmc_select_voltage(host, ocr); + if (!rocr) { + err = -EINVAL; + goto err; + } + + err = mmc_sdio_init_card(host, rocr, card, 0); + if (err) + goto err; + + mmc_release_host(host); + return 0; +err: + printk("%s: Error resetting SDIO communications (%d)\n", + mmc_hostname(host), err); + mmc_release_host(host); + return err; +} +EXPORT_SYMBOL(sdio_reset_comm);
diff --git a/drivers/mmc/core/sdio_bus.c b/drivers/mmc/core/sdio_bus.c index d56a3b6..528524a2 100644 --- a/drivers/mmc/core/sdio_bus.c +++ b/drivers/mmc/core/sdio_bus.c
@@ -28,6 +28,10 @@ #include "sdio_cis.h" #include "sdio_bus.h" +#ifdef CONFIG_MMC_EMBEDDED_SDIO +#include <linux/mmc/host.h> +#endif + #define to_sdio_driver(d) container_of(d, struct sdio_driver, drv) /* show configuration fields */ @@ -263,7 +267,14 @@ static void sdio_release_func(struct device *dev) { struct sdio_func *func = dev_to_sdio_func(dev); - sdio_free_func_cis(func); +#ifdef CONFIG_MMC_EMBEDDED_SDIO + /* + * If this device is embedded then we never allocated + * cis tables for this func + */ + if (!func->card->host->embedded_sdio_data.funcs) +#endif + sdio_free_func_cis(func); kfree(func->info); kfree(func->tmpbuf);
diff --git a/drivers/mmc/core/sdio_io.c b/drivers/mmc/core/sdio_io.c index 406e5f0..3734cba 100644 --- a/drivers/mmc/core/sdio_io.c +++ b/drivers/mmc/core/sdio_io.c
@@ -390,6 +390,39 @@ u8 sdio_readb(struct sdio_func *func, unsigned int addr, int *err_ret) EXPORT_SYMBOL_GPL(sdio_readb); /** + * sdio_readb_ext - read a single byte from a SDIO function + * @func: SDIO function to access + * @addr: address to read + * @err_ret: optional status value from transfer + * @in: value to add to argument + * + * Reads a single byte from the address space of a given SDIO + * function. If there is a problem reading the address, 0xff + * is returned and @err_ret will contain the error code. + */ +unsigned char sdio_readb_ext(struct sdio_func *func, unsigned int addr, + int *err_ret, unsigned in) +{ + int ret; + unsigned char val; + + BUG_ON(!func); + + if (err_ret) + *err_ret = 0; + + ret = mmc_io_rw_direct(func->card, 0, func->num, addr, (u8)in, &val); + if (ret) { + if (err_ret) + *err_ret = ret; + return 0xFF; + } + + return val; +} +EXPORT_SYMBOL_GPL(sdio_readb_ext); + +/** * sdio_writeb - write a single byte to a SDIO function * @func: SDIO function to access * @b: byte to write
diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c index 7c887f1..62fd690 100644 --- a/drivers/mtd/devices/block2mtd.c +++ b/drivers/mtd/devices/block2mtd.c
@@ -431,7 +431,7 @@ static int block2mtd_setup2(const char *val) } -static int block2mtd_setup(const char *val, struct kernel_param *kp) +static int block2mtd_setup(const char *val, const struct kernel_param *kp) { #ifdef MODULE return block2mtd_setup2(val);
diff --git a/drivers/mtd/devices/phram.c b/drivers/mtd/devices/phram.c index 8b66e52..7287696 100644 --- a/drivers/mtd/devices/phram.c +++ b/drivers/mtd/devices/phram.c
@@ -266,7 +266,7 @@ static int phram_setup(const char *val) return ret; } -static int phram_param_call(const char *val, struct kernel_param *kp) +static int phram_param_call(const char *val, const struct kernel_param *kp) { #ifdef MODULE return phram_setup(val);
diff --git a/drivers/mtd/nand/Kconfig b/drivers/mtd/nand/Kconfig index b254090..50ee1ba 100644 --- a/drivers/mtd/nand/Kconfig +++ b/drivers/mtd/nand/Kconfig
@@ -1,3 +1,10 @@ +config MTD_NAND_IDS + tristate "Include chip ids for known NAND devices." + depends on MTD + help + Useful for NAND drivers that do not use the NAND subsystem but + still like to take advantage of the known chip information. + config MTD_NAND_ECC tristate @@ -109,9 +116,6 @@ config MTD_NAND_OMAP_BCH_BUILD def_tristate MTD_NAND_OMAP2 && MTD_NAND_OMAP_BCH -config MTD_NAND_IDS - tristate - config MTD_NAND_RICOH tristate "Ricoh xD card reader" default n
diff --git a/drivers/mtd/ubi/build.c b/drivers/mtd/ubi/build.c index ad2b57c6..541c179 100644 --- a/drivers/mtd/ubi/build.c +++ b/drivers/mtd/ubi/build.c
@@ -1403,7 +1403,7 @@ static int __init bytes_str_to_int(const char *str) * This function returns zero in case of success and a negative error code in * case of error. */ -static int __init ubi_mtd_param_parse(const char *val, struct kernel_param *kp) +static int __init ubi_mtd_param_parse(const char *val, const struct kernel_param *kp) { int i, len; struct mtd_dev_param *p;
diff --git a/drivers/net/ppp/Kconfig b/drivers/net/ppp/Kconfig index 1373c6d..282aec4 100644 --- a/drivers/net/ppp/Kconfig +++ b/drivers/net/ppp/Kconfig
@@ -149,6 +149,23 @@ tunnels. L2TP is replacing PPTP for VPN uses. if TTY +config PPPOLAC + tristate "PPP on L2TP Access Concentrator" + depends on PPP && INET + help + L2TP (RFC 2661) is a tunneling protocol widely used in virtual private + networks. This driver handles L2TP data packets between a UDP socket + and a PPP channel, but only permits one session per socket. Thus it is + fairly simple and suited for clients. + +config PPPOPNS + tristate "PPP on PPTP Network Server" + depends on PPP && INET + help + PPTP (RFC 2637) is a tunneling protocol widely used in virtual private + networks. This driver handles PPTP data packets between a RAW socket + and a PPP channel. It is fairly simple and easy to use. + config PPP_ASYNC tristate "PPP support for async serial ports" depends on PPP
diff --git a/drivers/net/ppp/Makefile b/drivers/net/ppp/Makefile index a6b6297..d283d03c 100644 --- a/drivers/net/ppp/Makefile +++ b/drivers/net/ppp/Makefile
@@ -11,3 +11,5 @@ obj-$(CONFIG_PPPOE) += pppox.o pppoe.o obj-$(CONFIG_PPPOL2TP) += pppox.o obj-$(CONFIG_PPTP) += pppox.o pptp.o +obj-$(CONFIG_PPPOLAC) += pppox.o pppolac.o +obj-$(CONFIG_PPPOPNS) += pppox.o pppopns.o
diff --git a/drivers/net/ppp/pppolac.c b/drivers/net/ppp/pppolac.c new file mode 100644 index 0000000..3a45cf80 --- /dev/null +++ b/drivers/net/ppp/pppolac.c
@@ -0,0 +1,450 @@ +/* drivers/net/pppolac.c + * + * Driver for PPP on L2TP Access Concentrator / PPPoLAC Socket (RFC 2661) + * + * Copyright (C) 2009 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +/* This driver handles L2TP data packets between a UDP socket and a PPP channel. + * The socket must keep connected, and only one session per socket is permitted. + * Sequencing of outgoing packets is controlled by LNS. Incoming packets with + * sequences are reordered within a sliding window of one second. Currently + * reordering only happens when a packet is received. It is done for simplicity + * since no additional locks or threads are required. This driver only works on + * IPv4 due to the lack of UDP encapsulation support in IPv6. */ + +#include <linux/module.h> +#include <linux/jiffies.h> +#include <linux/workqueue.h> +#include <linux/skbuff.h> +#include <linux/file.h> +#include <linux/netdevice.h> +#include <linux/net.h> +#include <linux/udp.h> +#include <linux/ppp_defs.h> +#include <linux/if_ppp.h> +#include <linux/if_pppox.h> +#include <linux/ppp_channel.h> +#include <net/tcp_states.h> +#include <asm/uaccess.h> + +#define L2TP_CONTROL_BIT 0x80 +#define L2TP_LENGTH_BIT 0x40 +#define L2TP_SEQUENCE_BIT 0x08 +#define L2TP_OFFSET_BIT 0x02 +#define L2TP_VERSION 0x02 +#define L2TP_VERSION_MASK 0x0F + +#define PPP_ADDR 0xFF +#define PPP_CTRL 0x03 + +union unaligned { + __u32 u32; +} __attribute__((packed)); + +static inline union unaligned *unaligned(void *ptr) +{ + return (union unaligned *)ptr; +} + +struct meta { + __u32 sequence; + __u32 timestamp; +}; + +static inline struct meta *skb_meta(struct sk_buff *skb) +{ + return (struct meta *)skb->cb; +} + +/******************************************************************************/ + +static int pppolac_recv_core(struct sock *sk_udp, struct sk_buff *skb) +{ + struct sock *sk = (struct sock *)sk_udp->sk_user_data; + struct pppolac_opt *opt = &pppox_sk(sk)->proto.lac; + struct meta *meta = skb_meta(skb); + __u32 now = jiffies; + __u8 bits; + __u8 *ptr; + + /* Drop the packet if L2TP header is missing. */ + if (skb->len < sizeof(struct udphdr) + 6) + goto drop; + + /* Put it back if it is a control packet. */ + if (skb->data[sizeof(struct udphdr)] & L2TP_CONTROL_BIT) + return opt->backlog_rcv(sk_udp, skb); + + /* Skip UDP header. */ + skb_pull(skb, sizeof(struct udphdr)); + + /* Check the version. */ + if ((skb->data[1] & L2TP_VERSION_MASK) != L2TP_VERSION) + goto drop; + bits = skb->data[0]; + ptr = &skb->data[2]; + + /* Check the length if it is present. */ + if (bits & L2TP_LENGTH_BIT) { + if ((ptr[0] << 8 | ptr[1]) != skb->len) + goto drop; + ptr += 2; + } + + /* Skip all fields including optional ones. */ + if (!skb_pull(skb, 6 + (bits & L2TP_SEQUENCE_BIT ? 4 : 0) + + (bits & L2TP_LENGTH_BIT ? 2 : 0) + + (bits & L2TP_OFFSET_BIT ? 2 : 0))) + goto drop; + + /* Skip the offset padding if it is present. */ + if (bits & L2TP_OFFSET_BIT && + !skb_pull(skb, skb->data[-2] << 8 | skb->data[-1])) + goto drop; + + /* Check the tunnel and the session. */ + if (unaligned(ptr)->u32 != opt->local) + goto drop; + + /* Check the sequence if it is present. */ + if (bits & L2TP_SEQUENCE_BIT) { + meta->sequence = ptr[4] << 8 | ptr[5]; + if ((__s16)(meta->sequence - opt->recv_sequence) < 0) + goto drop; + } + + /* Skip PPP address and control if they are present. */ + if (skb->len >= 2 && skb->data[0] == PPP_ADDR && + skb->data[1] == PPP_CTRL) + skb_pull(skb, 2); + + /* Fix PPP protocol if it is compressed. */ + if (skb->len >= 1 && skb->data[0] & 1) + skb_push(skb, 1)[0] = 0; + + /* Drop the packet if PPP protocol is missing. */ + if (skb->len < 2) + goto drop; + + /* Perform reordering if sequencing is enabled. */ + atomic_set(&opt->sequencing, bits & L2TP_SEQUENCE_BIT); + if (bits & L2TP_SEQUENCE_BIT) { + struct sk_buff *skb1; + + /* Insert the packet into receive queue in order. */ + skb_set_owner_r(skb, sk); + skb_queue_walk(&sk->sk_receive_queue, skb1) { + struct meta *meta1 = skb_meta(skb1); + __s16 order = meta->sequence - meta1->sequence; + if (order == 0) + goto drop; + if (order < 0) { + meta->timestamp = meta1->timestamp; + skb_insert(skb1, skb, &sk->sk_receive_queue); + skb = NULL; + break; + } + } + if (skb) { + meta->timestamp = now; + skb_queue_tail(&sk->sk_receive_queue, skb); + } + + /* Remove packets from receive queue as long as + * 1. the receive buffer is full, + * 2. they are queued longer than one second, or + * 3. there are no missing packets before them. */ + skb_queue_walk_safe(&sk->sk_receive_queue, skb, skb1) { + meta = skb_meta(skb); + if (atomic_read(&sk->sk_rmem_alloc) < sk->sk_rcvbuf && + now - meta->timestamp < HZ && + meta->sequence != opt->recv_sequence) + break; + skb_unlink(skb, &sk->sk_receive_queue); + opt->recv_sequence = (__u16)(meta->sequence + 1); + skb_orphan(skb); + ppp_input(&pppox_sk(sk)->chan, skb); + } + return NET_RX_SUCCESS; + } + + /* Flush receive queue if sequencing is disabled. */ + skb_queue_purge(&sk->sk_receive_queue); + skb_orphan(skb); + ppp_input(&pppox_sk(sk)->chan, skb); + return NET_RX_SUCCESS; +drop: + kfree_skb(skb); + return NET_RX_DROP; +} + +static int pppolac_recv(struct sock *sk_udp, struct sk_buff *skb) +{ + sock_hold(sk_udp); + sk_receive_skb(sk_udp, skb, 0); + return 0; +} + +static struct sk_buff_head delivery_queue; + +static void pppolac_xmit_core(struct work_struct *delivery_work) +{ + mm_segment_t old_fs = get_fs(); + struct sk_buff *skb; + + set_fs(KERNEL_DS); + while ((skb = skb_dequeue(&delivery_queue))) { + struct sock *sk_udp = skb->sk; + struct kvec iov = {.iov_base = skb->data, .iov_len = skb->len}; + struct msghdr msg = { + .msg_flags = MSG_NOSIGNAL | MSG_DONTWAIT, + }; + + iov_iter_kvec(&msg.msg_iter, WRITE | ITER_KVEC, &iov, 1, + skb->len); + sk_udp->sk_prot->sendmsg(sk_udp, &msg, skb->len); + kfree_skb(skb); + } + set_fs(old_fs); +} + +static DECLARE_WORK(delivery_work, pppolac_xmit_core); + +static int pppolac_xmit(struct ppp_channel *chan, struct sk_buff *skb) +{ + struct sock *sk_udp = (struct sock *)chan->private; + struct pppolac_opt *opt = &pppox_sk(sk_udp->sk_user_data)->proto.lac; + + /* Install PPP address and control. */ + skb_push(skb, 2); + skb->data[0] = PPP_ADDR; + skb->data[1] = PPP_CTRL; + + /* Install L2TP header. */ + if (atomic_read(&opt->sequencing)) { + skb_push(skb, 10); + skb->data[0] = L2TP_SEQUENCE_BIT; + skb->data[6] = opt->xmit_sequence >> 8; + skb->data[7] = opt->xmit_sequence; + skb->data[8] = 0; + skb->data[9] = 0; + opt->xmit_sequence++; + } else { + skb_push(skb, 6); + skb->data[0] = 0; + } + skb->data[1] = L2TP_VERSION; + unaligned(&skb->data[2])->u32 = opt->remote; + + /* Now send the packet via the delivery queue. */ + skb_set_owner_w(skb, sk_udp); + skb_queue_tail(&delivery_queue, skb); + schedule_work(&delivery_work); + return 1; +} + +/******************************************************************************/ + +static struct ppp_channel_ops pppolac_channel_ops = { + .start_xmit = pppolac_xmit, +}; + +static int pppolac_connect(struct socket *sock, struct sockaddr *useraddr, + int addrlen, int flags) +{ + struct sock *sk = sock->sk; + struct pppox_sock *po = pppox_sk(sk); + struct sockaddr_pppolac *addr = (struct sockaddr_pppolac *)useraddr; + struct socket *sock_udp = NULL; + struct sock *sk_udp; + int error; + + if (addrlen != sizeof(struct sockaddr_pppolac) || + !addr->local.tunnel || !addr->local.session || + !addr->remote.tunnel || !addr->remote.session) { + return -EINVAL; + } + + lock_sock(sk); + error = -EALREADY; + if (sk->sk_state != PPPOX_NONE) + goto out; + + sock_udp = sockfd_lookup(addr->udp_socket, &error); + if (!sock_udp) + goto out; + sk_udp = sock_udp->sk; + lock_sock(sk_udp); + + /* Remove this check when IPv6 supports UDP encapsulation. */ + error = -EAFNOSUPPORT; + if (sk_udp->sk_family != AF_INET) + goto out; + error = -EPROTONOSUPPORT; + if (sk_udp->sk_protocol != IPPROTO_UDP) + goto out; + error = -EDESTADDRREQ; + if (sk_udp->sk_state != TCP_ESTABLISHED) + goto out; + error = -EBUSY; + if (udp_sk(sk_udp)->encap_type || sk_udp->sk_user_data) + goto out; + if (!sk_udp->sk_bound_dev_if) { + struct dst_entry *dst = sk_dst_get(sk_udp); + error = -ENODEV; + if (!dst) + goto out; + sk_udp->sk_bound_dev_if = dst->dev->ifindex; + dst_release(dst); + } + + po->chan.hdrlen = 12; + po->chan.private = sk_udp; + po->chan.ops = &pppolac_channel_ops; + po->chan.mtu = PPP_MRU - 80; + po->proto.lac.local = unaligned(&addr->local)->u32; + po->proto.lac.remote = unaligned(&addr->remote)->u32; + atomic_set(&po->proto.lac.sequencing, 1); + po->proto.lac.backlog_rcv = sk_udp->sk_backlog_rcv; + + error = ppp_register_channel(&po->chan); + if (error) + goto out; + + sk->sk_state = PPPOX_CONNECTED; + udp_sk(sk_udp)->encap_type = UDP_ENCAP_L2TPINUDP; + udp_sk(sk_udp)->encap_rcv = pppolac_recv; + sk_udp->sk_backlog_rcv = pppolac_recv_core; + sk_udp->sk_user_data = sk; +out: + if (sock_udp) { + release_sock(sk_udp); + if (error) + sockfd_put(sock_udp); + } + release_sock(sk); + return error; +} + +static int pppolac_release(struct socket *sock) +{ + struct sock *sk = sock->sk; + + if (!sk) + return 0; + + lock_sock(sk); + if (sock_flag(sk, SOCK_DEAD)) { + release_sock(sk); + return -EBADF; + } + + if (sk->sk_state != PPPOX_NONE) { + struct sock *sk_udp = (struct sock *)pppox_sk(sk)->chan.private; + lock_sock(sk_udp); + skb_queue_purge(&sk->sk_receive_queue); + pppox_unbind_sock(sk); + udp_sk(sk_udp)->encap_type = 0; + udp_sk(sk_udp)->encap_rcv = NULL; + sk_udp->sk_backlog_rcv = pppox_sk(sk)->proto.lac.backlog_rcv; + sk_udp->sk_user_data = NULL; + release_sock(sk_udp); + sockfd_put(sk_udp->sk_socket); + } + + sock_orphan(sk); + sock->sk = NULL; + release_sock(sk); + sock_put(sk); + return 0; +} + +/******************************************************************************/ + +static struct proto pppolac_proto = { + .name = "PPPOLAC", + .owner = THIS_MODULE, + .obj_size = sizeof(struct pppox_sock), +}; + +static struct proto_ops pppolac_proto_ops = { + .family = PF_PPPOX, + .owner = THIS_MODULE, + .release = pppolac_release, + .bind = sock_no_bind, + .connect = pppolac_connect, + .socketpair = sock_no_socketpair, + .accept = sock_no_accept, + .getname = sock_no_getname, + .poll = sock_no_poll, + .ioctl = pppox_ioctl, + .listen = sock_no_listen, + .shutdown = sock_no_shutdown, + .setsockopt = sock_no_setsockopt, + .getsockopt = sock_no_getsockopt, + .sendmsg = sock_no_sendmsg, + .recvmsg = sock_no_recvmsg, + .mmap = sock_no_mmap, +}; + +static int pppolac_create(struct net *net, struct socket *sock, int kern) +{ + struct sock *sk; + + sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pppolac_proto, kern); + if (!sk) + return -ENOMEM; + + sock_init_data(sock, sk); + sock->state = SS_UNCONNECTED; + sock->ops = &pppolac_proto_ops; + sk->sk_protocol = PX_PROTO_OLAC; + sk->sk_state = PPPOX_NONE; + return 0; +} + +/******************************************************************************/ + +static struct pppox_proto pppolac_pppox_proto = { + .create = pppolac_create, + .owner = THIS_MODULE, +}; + +static int __init pppolac_init(void) +{ + int error; + + error = proto_register(&pppolac_proto, 0); + if (error) + return error; + + error = register_pppox_proto(PX_PROTO_OLAC, &pppolac_pppox_proto); + if (error) + proto_unregister(&pppolac_proto); + else + skb_queue_head_init(&delivery_queue); + return error; +} + +static void __exit pppolac_exit(void) +{ + unregister_pppox_proto(PX_PROTO_OLAC); + proto_unregister(&pppolac_proto); +} + +module_init(pppolac_init); +module_exit(pppolac_exit); + +MODULE_DESCRIPTION("PPP on L2TP Access Concentrator (PPPoLAC)"); +MODULE_AUTHOR("Chia-chi Yeh <chiachi@android.com>"); +MODULE_LICENSE("GPL");
diff --git a/drivers/net/ppp/pppopns.c b/drivers/net/ppp/pppopns.c new file mode 100644 index 0000000..cdb4fa1 --- /dev/null +++ b/drivers/net/ppp/pppopns.c
@@ -0,0 +1,429 @@ +/* drivers/net/pppopns.c + * + * Driver for PPP on PPTP Network Server / PPPoPNS Socket (RFC 2637) + * + * Copyright (C) 2009 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +/* This driver handles PPTP data packets between a RAW socket and a PPP channel. + * The socket is created in the kernel space and connected to the same address + * of the control socket. Outgoing packets are always sent with sequences but + * without acknowledgements. Incoming packets with sequences are reordered + * within a sliding window of one second. Currently reordering only happens when + * a packet is received. It is done for simplicity since no additional locks or + * threads are required. This driver should work on both IPv4 and IPv6. */ + +#include <linux/module.h> +#include <linux/jiffies.h> +#include <linux/workqueue.h> +#include <linux/skbuff.h> +#include <linux/file.h> +#include <linux/netdevice.h> +#include <linux/net.h> +#include <linux/ppp_defs.h> +#include <linux/if.h> +#include <linux/if_ppp.h> +#include <linux/if_pppox.h> +#include <linux/ppp_channel.h> +#include <asm/uaccess.h> + +#define GRE_HEADER_SIZE 8 + +#define PPTP_GRE_BITS htons(0x2001) +#define PPTP_GRE_BITS_MASK htons(0xEF7F) +#define PPTP_GRE_SEQ_BIT htons(0x1000) +#define PPTP_GRE_ACK_BIT htons(0x0080) +#define PPTP_GRE_TYPE htons(0x880B) + +#define PPP_ADDR 0xFF +#define PPP_CTRL 0x03 + +struct header { + __u16 bits; + __u16 type; + __u16 length; + __u16 call; + __u32 sequence; +} __attribute__((packed)); + +struct meta { + __u32 sequence; + __u32 timestamp; +}; + +static inline struct meta *skb_meta(struct sk_buff *skb) +{ + return (struct meta *)skb->cb; +} + +/******************************************************************************/ + +static int pppopns_recv_core(struct sock *sk_raw, struct sk_buff *skb) +{ + struct sock *sk = (struct sock *)sk_raw->sk_user_data; + struct pppopns_opt *opt = &pppox_sk(sk)->proto.pns; + struct meta *meta = skb_meta(skb); + __u32 now = jiffies; + struct header *hdr; + + /* Skip transport header */ + skb_pull(skb, skb_transport_header(skb) - skb->data); + + /* Drop the packet if GRE header is missing. */ + if (skb->len < GRE_HEADER_SIZE) + goto drop; + hdr = (struct header *)skb->data; + + /* Check the header. */ + if (hdr->type != PPTP_GRE_TYPE || hdr->call != opt->local || + (hdr->bits & PPTP_GRE_BITS_MASK) != PPTP_GRE_BITS) + goto drop; + + /* Skip all fields including optional ones. */ + if (!skb_pull(skb, GRE_HEADER_SIZE + + (hdr->bits & PPTP_GRE_SEQ_BIT ? 4 : 0) + + (hdr->bits & PPTP_GRE_ACK_BIT ? 4 : 0))) + goto drop; + + /* Check the length. */ + if (skb->len != ntohs(hdr->length)) + goto drop; + + /* Check the sequence if it is present. */ + if (hdr->bits & PPTP_GRE_SEQ_BIT) { + meta->sequence = ntohl(hdr->sequence); + if ((__s32)(meta->sequence - opt->recv_sequence) < 0) + goto drop; + } + + /* Skip PPP address and control if they are present. */ + if (skb->len >= 2 && skb->data[0] == PPP_ADDR && + skb->data[1] == PPP_CTRL) + skb_pull(skb, 2); + + /* Fix PPP protocol if it is compressed. */ + if (skb->len >= 1 && skb->data[0] & 1) + skb_push(skb, 1)[0] = 0; + + /* Drop the packet if PPP protocol is missing. */ + if (skb->len < 2) + goto drop; + + /* Perform reordering if sequencing is enabled. */ + if (hdr->bits & PPTP_GRE_SEQ_BIT) { + struct sk_buff *skb1; + + /* Insert the packet into receive queue in order. */ + skb_set_owner_r(skb, sk); + skb_queue_walk(&sk->sk_receive_queue, skb1) { + struct meta *meta1 = skb_meta(skb1); + __s32 order = meta->sequence - meta1->sequence; + if (order == 0) + goto drop; + if (order < 0) { + meta->timestamp = meta1->timestamp; + skb_insert(skb1, skb, &sk->sk_receive_queue); + skb = NULL; + break; + } + } + if (skb) { + meta->timestamp = now; + skb_queue_tail(&sk->sk_receive_queue, skb); + } + + /* Remove packets from receive queue as long as + * 1. the receive buffer is full, + * 2. they are queued longer than one second, or + * 3. there are no missing packets before them. */ + skb_queue_walk_safe(&sk->sk_receive_queue, skb, skb1) { + meta = skb_meta(skb); + if (atomic_read(&sk->sk_rmem_alloc) < sk->sk_rcvbuf && + now - meta->timestamp < HZ && + meta->sequence != opt->recv_sequence) + break; + skb_unlink(skb, &sk->sk_receive_queue); + opt->recv_sequence = meta->sequence + 1; + skb_orphan(skb); + ppp_input(&pppox_sk(sk)->chan, skb); + } + return NET_RX_SUCCESS; + } + + /* Flush receive queue if sequencing is disabled. */ + skb_queue_purge(&sk->sk_receive_queue); + skb_orphan(skb); + ppp_input(&pppox_sk(sk)->chan, skb); + return NET_RX_SUCCESS; +drop: + kfree_skb(skb); + return NET_RX_DROP; +} + +static void pppopns_recv(struct sock *sk_raw) +{ + struct sk_buff *skb; + while ((skb = skb_dequeue(&sk_raw->sk_receive_queue))) { + sock_hold(sk_raw); + sk_receive_skb(sk_raw, skb, 0); + } +} + +static struct sk_buff_head delivery_queue; + +static void pppopns_xmit_core(struct work_struct *delivery_work) +{ + mm_segment_t old_fs = get_fs(); + struct sk_buff *skb; + + set_fs(KERNEL_DS); + while ((skb = skb_dequeue(&delivery_queue))) { + struct sock *sk_raw = skb->sk; + struct kvec iov = {.iov_base = skb->data, .iov_len = skb->len}; + struct msghdr msg = { + .msg_flags = MSG_NOSIGNAL | MSG_DONTWAIT, + }; + + iov_iter_kvec(&msg.msg_iter, WRITE | ITER_KVEC, &iov, 1, + skb->len); + sk_raw->sk_prot->sendmsg(sk_raw, &msg, skb->len); + kfree_skb(skb); + } + set_fs(old_fs); +} + +static DECLARE_WORK(delivery_work, pppopns_xmit_core); + +static int pppopns_xmit(struct ppp_channel *chan, struct sk_buff *skb) +{ + struct sock *sk_raw = (struct sock *)chan->private; + struct pppopns_opt *opt = &pppox_sk(sk_raw->sk_user_data)->proto.pns; + struct header *hdr; + __u16 length; + + /* Install PPP address and control. */ + skb_push(skb, 2); + skb->data[0] = PPP_ADDR; + skb->data[1] = PPP_CTRL; + length = skb->len; + + /* Install PPTP GRE header. */ + hdr = (struct header *)skb_push(skb, 12); + hdr->bits = PPTP_GRE_BITS | PPTP_GRE_SEQ_BIT; + hdr->type = PPTP_GRE_TYPE; + hdr->length = htons(length); + hdr->call = opt->remote; + hdr->sequence = htonl(opt->xmit_sequence); + opt->xmit_sequence++; + + /* Now send the packet via the delivery queue. */ + skb_set_owner_w(skb, sk_raw); + skb_queue_tail(&delivery_queue, skb); + schedule_work(&delivery_work); + return 1; +} + +/******************************************************************************/ + +static struct ppp_channel_ops pppopns_channel_ops = { + .start_xmit = pppopns_xmit, +}; + +static int pppopns_connect(struct socket *sock, struct sockaddr *useraddr, + int addrlen, int flags) +{ + struct sock *sk = sock->sk; + struct pppox_sock *po = pppox_sk(sk); + struct sockaddr_pppopns *addr = (struct sockaddr_pppopns *)useraddr; + struct sockaddr_storage ss; + struct socket *sock_tcp = NULL; + struct socket *sock_raw = NULL; + struct sock *sk_tcp; + struct sock *sk_raw; + int error; + + if (addrlen != sizeof(struct sockaddr_pppopns)) + return -EINVAL; + + lock_sock(sk); + error = -EALREADY; + if (sk->sk_state != PPPOX_NONE) + goto out; + + sock_tcp = sockfd_lookup(addr->tcp_socket, &error); + if (!sock_tcp) + goto out; + sk_tcp = sock_tcp->sk; + error = -EPROTONOSUPPORT; + if (sk_tcp->sk_protocol != IPPROTO_TCP) + goto out; + addrlen = sizeof(struct sockaddr_storage); + error = kernel_getpeername(sock_tcp, (struct sockaddr *)&ss, &addrlen); + if (error) + goto out; + if (!sk_tcp->sk_bound_dev_if) { + struct dst_entry *dst = sk_dst_get(sk_tcp); + error = -ENODEV; + if (!dst) + goto out; + sk_tcp->sk_bound_dev_if = dst->dev->ifindex; + dst_release(dst); + } + + error = sock_create(ss.ss_family, SOCK_RAW, IPPROTO_GRE, &sock_raw); + if (error) + goto out; + sk_raw = sock_raw->sk; + sk_raw->sk_bound_dev_if = sk_tcp->sk_bound_dev_if; + error = kernel_connect(sock_raw, (struct sockaddr *)&ss, addrlen, 0); + if (error) + goto out; + + po->chan.hdrlen = 14; + po->chan.private = sk_raw; + po->chan.ops = &pppopns_channel_ops; + po->chan.mtu = PPP_MRU - 80; + po->proto.pns.local = addr->local; + po->proto.pns.remote = addr->remote; + po->proto.pns.data_ready = sk_raw->sk_data_ready; + po->proto.pns.backlog_rcv = sk_raw->sk_backlog_rcv; + + error = ppp_register_channel(&po->chan); + if (error) + goto out; + + sk->sk_state = PPPOX_CONNECTED; + lock_sock(sk_raw); + sk_raw->sk_data_ready = pppopns_recv; + sk_raw->sk_backlog_rcv = pppopns_recv_core; + sk_raw->sk_user_data = sk; + release_sock(sk_raw); +out: + if (sock_tcp) + sockfd_put(sock_tcp); + if (error && sock_raw) + sock_release(sock_raw); + release_sock(sk); + return error; +} + +static int pppopns_release(struct socket *sock) +{ + struct sock *sk = sock->sk; + + if (!sk) + return 0; + + lock_sock(sk); + if (sock_flag(sk, SOCK_DEAD)) { + release_sock(sk); + return -EBADF; + } + + if (sk->sk_state != PPPOX_NONE) { + struct sock *sk_raw = (struct sock *)pppox_sk(sk)->chan.private; + lock_sock(sk_raw); + skb_queue_purge(&sk->sk_receive_queue); + pppox_unbind_sock(sk); + sk_raw->sk_data_ready = pppox_sk(sk)->proto.pns.data_ready; + sk_raw->sk_backlog_rcv = pppox_sk(sk)->proto.pns.backlog_rcv; + sk_raw->sk_user_data = NULL; + release_sock(sk_raw); + sock_release(sk_raw->sk_socket); + } + + sock_orphan(sk); + sock->sk = NULL; + release_sock(sk); + sock_put(sk); + return 0; +} + +/******************************************************************************/ + +static struct proto pppopns_proto = { + .name = "PPPOPNS", + .owner = THIS_MODULE, + .obj_size = sizeof(struct pppox_sock), +}; + +static struct proto_ops pppopns_proto_ops = { + .family = PF_PPPOX, + .owner = THIS_MODULE, + .release = pppopns_release, + .bind = sock_no_bind, + .connect = pppopns_connect, + .socketpair = sock_no_socketpair, + .accept = sock_no_accept, + .getname = sock_no_getname, + .poll = sock_no_poll, + .ioctl = pppox_ioctl, + .listen = sock_no_listen, + .shutdown = sock_no_shutdown, + .setsockopt = sock_no_setsockopt, + .getsockopt = sock_no_getsockopt, + .sendmsg = sock_no_sendmsg, + .recvmsg = sock_no_recvmsg, + .mmap = sock_no_mmap, +}; + +static int pppopns_create(struct net *net, struct socket *sock, int kern) +{ + struct sock *sk; + + sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pppopns_proto, kern); + if (!sk) + return -ENOMEM; + + sock_init_data(sock, sk); + sock->state = SS_UNCONNECTED; + sock->ops = &pppopns_proto_ops; + sk->sk_protocol = PX_PROTO_OPNS; + sk->sk_state = PPPOX_NONE; + return 0; +} + +/******************************************************************************/ + +static struct pppox_proto pppopns_pppox_proto = { + .create = pppopns_create, + .owner = THIS_MODULE, +}; + +static int __init pppopns_init(void) +{ + int error; + + error = proto_register(&pppopns_proto, 0); + if (error) + return error; + + error = register_pppox_proto(PX_PROTO_OPNS, &pppopns_pppox_proto); + if (error) + proto_unregister(&pppopns_proto); + else + skb_queue_head_init(&delivery_queue); + return error; +} + +static void __exit pppopns_exit(void) +{ + unregister_pppox_proto(PX_PROTO_OPNS); + proto_unregister(&pppopns_proto); +} + +module_init(pppopns_init); +module_exit(pppopns_exit); + +MODULE_DESCRIPTION("PPP on PPTP Network Server (PPPoPNS)"); +MODULE_AUTHOR("Chia-chi Yeh <chiachi@android.com>"); +MODULE_LICENSE("GPL");
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index eb6dc28..5f51668 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c
@@ -2019,6 +2019,12 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd, int le; int ret; +#ifdef CONFIG_ANDROID_PARANOID_NETWORK + if (cmd != TUNGETIFF && !capable(CAP_NET_ADMIN)) { + return -EPERM; + } +#endif + if (cmd == TUNSETIFF || cmd == TUNSETQUEUE || _IOC_TYPE(cmd) == 0x89) { if (copy_from_user(&ifr, argp, ifreq_len)) return -EFAULT;
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c index c221597..00f2c0b 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -2774,7 +2774,6 @@ static s32 brcmf_inform_single_bss(struct brcmf_cfg80211_info *cfg, struct brcmf_bss_info_le *bi) { struct wiphy *wiphy = cfg_to_wiphy(cfg); - struct ieee80211_channel *notify_channel; struct cfg80211_bss *bss; struct ieee80211_supported_band *band; struct brcmu_chan ch; @@ -2784,7 +2783,7 @@ static s32 brcmf_inform_single_bss(struct brcmf_cfg80211_info *cfg, u16 notify_interval; u8 *notify_ie; size_t notify_ielen; - s32 notify_signal; + struct cfg80211_inform_bss bss_data = {}; if (le32_to_cpu(bi->length) > WL_BSS_INFO_MAX) { brcmf_err("Bss info is larger than buffer. Discarding\n"); @@ -2804,27 +2803,28 @@ static s32 brcmf_inform_single_bss(struct brcmf_cfg80211_info *cfg, band = wiphy->bands[NL80211_BAND_5GHZ]; freq = ieee80211_channel_to_frequency(channel, band->band); - notify_channel = ieee80211_get_channel(wiphy, freq); + bss_data.chan = ieee80211_get_channel(wiphy, freq); + bss_data.scan_width = NL80211_BSS_CHAN_WIDTH_20; + bss_data.boottime_ns = ktime_to_ns(ktime_get_boottime()); notify_capability = le16_to_cpu(bi->capability); notify_interval = le16_to_cpu(bi->beacon_period); notify_ie = (u8 *)bi + le16_to_cpu(bi->ie_offset); notify_ielen = le32_to_cpu(bi->ie_length); - notify_signal = (s16)le16_to_cpu(bi->RSSI) * 100; + bss_data.signal = (s16)le16_to_cpu(bi->RSSI) * 100; brcmf_dbg(CONN, "bssid: %pM\n", bi->BSSID); brcmf_dbg(CONN, "Channel: %d(%d)\n", channel, freq); brcmf_dbg(CONN, "Capability: %X\n", notify_capability); brcmf_dbg(CONN, "Beacon interval: %d\n", notify_interval); - brcmf_dbg(CONN, "Signal: %d\n", notify_signal); + brcmf_dbg(CONN, "Signal: %d\n", bss_data.signal); - bss = cfg80211_inform_bss(wiphy, notify_channel, - CFG80211_BSS_FTYPE_UNKNOWN, - (const u8 *)bi->BSSID, - 0, notify_capability, - notify_interval, notify_ie, - notify_ielen, notify_signal, - GFP_KERNEL); + bss = cfg80211_inform_bss_data(wiphy, &bss_data, + CFG80211_BSS_FTYPE_UNKNOWN, + (const u8 *)bi->BSSID, + 0, notify_capability, + notify_interval, notify_ie, + notify_ielen, GFP_KERNEL); if (!bss) return -ENOMEM;
diff --git a/drivers/net/wireless/ti/wlcore/init.c b/drivers/net/wireless/ti/wlcore/init.c index d0b7734..b7974b4 100644 --- a/drivers/net/wireless/ti/wlcore/init.c +++ b/drivers/net/wireless/ti/wlcore/init.c
@@ -549,6 +549,11 @@ static int wl12xx_init_ap_role(struct wl1271 *wl, struct wl12xx_vif *wlvif) { int ret; + /* Disable filtering */ + ret = wl1271_acx_group_address_tbl(wl, wlvif, false, NULL, 0); + if (ret < 0) + return ret; + ret = wl1271_acx_ap_max_tx_retry(wl, wlvif); if (ret < 0) return ret;
diff --git a/drivers/nfc/fdp/i2c.c b/drivers/nfc/fdp/i2c.c index 712936f..fbd26ec 100644 --- a/drivers/nfc/fdp/i2c.c +++ b/drivers/nfc/fdp/i2c.c
@@ -177,6 +177,16 @@ static int fdp_nci_i2c_read(struct fdp_i2c_phy *phy, struct sk_buff **skb) /* Packet that contains a length */ if (tmp[0] == 0 && tmp[1] == 0) { phy->next_read_size = (tmp[2] << 8) + tmp[3] + 3; + /* + * Ensure next_read_size does not exceed sizeof(tmp) + * for reading that many bytes during next iteration + */ + if (phy->next_read_size > FDP_NCI_I2C_MAX_PAYLOAD) { + dev_dbg(&client->dev, "%s: corrupted packet\n", + __func__); + phy->next_read_size = 5; + goto flush; + } } else { phy->next_read_size = FDP_NCI_I2C_MIN_PAYLOAD;
diff --git a/drivers/nfc/st21nfca/dep.c b/drivers/nfc/st21nfca/dep.c index 798a32b..2062852 100644 --- a/drivers/nfc/st21nfca/dep.c +++ b/drivers/nfc/st21nfca/dep.c
@@ -217,7 +217,8 @@ static int st21nfca_tm_recv_atr_req(struct nfc_hci_dev *hdev, atr_req = (struct st21nfca_atr_req *)skb->data; - if (atr_req->length < sizeof(struct st21nfca_atr_req)) { + if (atr_req->length < sizeof(struct st21nfca_atr_req) || + atr_req->length > skb->len) { r = -EPROTO; goto exit; }
diff --git a/drivers/nfc/st21nfca/se.c b/drivers/nfc/st21nfca/se.c index 3a98563..6e84e12 100644 --- a/drivers/nfc/st21nfca/se.c +++ b/drivers/nfc/st21nfca/se.c
@@ -320,23 +320,33 @@ int st21nfca_connectivity_event_received(struct nfc_hci_dev *hdev, u8 host, * AID 81 5 to 16 * PARAMETERS 82 0 to 255 */ - if (skb->len < NFC_MIN_AID_LENGTH + 2 && + if (skb->len < NFC_MIN_AID_LENGTH + 2 || skb->data[0] != NFC_EVT_TRANSACTION_AID_TAG) return -EPROTO; + /* + * Buffer should have enough space for at least + * two tag fields + two length fields + aid_len (skb->data[1]) + */ + if (skb->len < skb->data[1] + 4) + return -EPROTO; + transaction = (struct nfc_evt_transaction *)devm_kzalloc(dev, skb->len - 2, GFP_KERNEL); transaction->aid_len = skb->data[1]; memcpy(transaction->aid, &skb->data[2], transaction->aid_len); - - /* Check next byte is PARAMETERS tag (82) */ - if (skb->data[transaction->aid_len + 2] != - NFC_EVT_TRANSACTION_PARAMS_TAG) - return -EPROTO; - transaction->params_len = skb->data[transaction->aid_len + 3]; + + /* Check next byte is PARAMETERS tag (82) and the length field */ + if (skb->data[transaction->aid_len + 2] != + NFC_EVT_TRANSACTION_PARAMS_TAG || + skb->len < transaction->aid_len + transaction->params_len + 4) { + devm_kfree(dev, transaction); + return -EPROTO; + } + memcpy(transaction->params, skb->data + transaction->aid_len + 4, transaction->params_len);
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index e9360d5..e37a2a5 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c
@@ -1063,42 +1063,66 @@ int __init early_init_dt_scan_memory(unsigned long node, const char *uname, return 0; } +/* + * Convert configs to something easy to use in C code + */ +#if defined(CONFIG_CMDLINE_FORCE) +static const int overwrite_incoming_cmdline = 1; +static const int read_dt_cmdline; +static const int concat_cmdline; +#elif defined(CONFIG_CMDLINE_EXTEND) +static const int overwrite_incoming_cmdline; +static const int read_dt_cmdline = 1; +static const int concat_cmdline = 1; +#else /* CMDLINE_FROM_BOOTLOADER */ +static const int overwrite_incoming_cmdline; +static const int read_dt_cmdline = 1; +static const int concat_cmdline; +#endif + +#ifdef CONFIG_CMDLINE +static const char *config_cmdline = CONFIG_CMDLINE; +#else +static const char *config_cmdline = ""; +#endif + int __init early_init_dt_scan_chosen(unsigned long node, const char *uname, int depth, void *data) { - int l; - const char *p; + int l = 0; + const char *p = NULL; + char *cmdline = data; pr_debug("search \"chosen\", depth: %d, uname: %s\n", depth, uname); - if (depth != 1 || !data || + if (depth != 1 || !cmdline || (strcmp(uname, "chosen") != 0 && strcmp(uname, "chosen@0") != 0)) return 0; early_init_dt_check_for_initrd(node); - /* Retrieve command line */ - p = of_get_flat_dt_prop(node, "bootargs", &l); - if (p != NULL && l > 0) - strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE)); + /* Put CONFIG_CMDLINE in if forced or if data had nothing in it to start */ + if (overwrite_incoming_cmdline || !cmdline[0]) + strlcpy(cmdline, config_cmdline, COMMAND_LINE_SIZE); - /* - * CONFIG_CMDLINE is meant to be a default in case nothing else - * managed to set the command line, unless CONFIG_CMDLINE_FORCE - * is set in which case we override whatever was found earlier. - */ -#ifdef CONFIG_CMDLINE -#if defined(CONFIG_CMDLINE_EXTEND) - strlcat(data, " ", COMMAND_LINE_SIZE); - strlcat(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE); -#elif defined(CONFIG_CMDLINE_FORCE) - strlcpy(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE); -#else - /* No arguments from boot loader, use kernel's cmdl*/ - if (!((char *)data)[0]) - strlcpy(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE); -#endif -#endif /* CONFIG_CMDLINE */ + /* Retrieve command line unless forcing */ + if (read_dt_cmdline) + p = of_get_flat_dt_prop(node, "bootargs", &l); + + if (p != NULL && l > 0) { + if (concat_cmdline) { + int cmdline_len; + int copy_len; + strlcat(cmdline, " ", COMMAND_LINE_SIZE); + cmdline_len = strlen(cmdline); + copy_len = COMMAND_LINE_SIZE - cmdline_len - 1; + copy_len = min((int)l, copy_len); + strncpy(cmdline + cmdline_len, p, copy_len); + cmdline[cmdline_len + copy_len] = '\0'; + } else { + strlcpy(cmdline, p, min((int)l, COMMAND_LINE_SIZE)); + } + } pr_debug("Command line is: %s\n", (char*)data);
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c index 6643a7bc..e38bb48 100644 --- a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c
@@ -786,7 +786,8 @@ void pci_disable_link_state(struct pci_dev *pdev, int state) } EXPORT_SYMBOL(pci_disable_link_state); -static int pcie_aspm_set_policy(const char *val, struct kernel_param *kp) +static int pcie_aspm_set_policy(const char *val, + const struct kernel_param *kp) { int i; struct pcie_link_state *link; @@ -813,7 +814,7 @@ static int pcie_aspm_set_policy(const char *val, struct kernel_param *kp) return 0; } -static int pcie_aspm_get_policy(char *buffer, struct kernel_param *kp) +static int pcie_aspm_get_policy(char *buffer, const struct kernel_param *kp) { int i, cnt = 0; for (i = 0; i < ARRAY_SIZE(policy_str); i++)
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index af82edc..f51759d 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c
@@ -1018,7 +1018,7 @@ int arm_pmu_device_probe(struct platform_device *pdev, const struct pmu_probe_info *probe_table) { const struct of_device_id *of_id; - const int (*init_fn)(struct arm_pmu *); + int (*init_fn)(struct arm_pmu *); struct device_node *node = pdev->dev.of_node; struct arm_pmu *pmu; int ret = -ENODEV;
diff --git a/drivers/platform/goldfish/Makefile b/drivers/platform/goldfish/Makefile index d348712..277a820 100644 --- a/drivers/platform/goldfish/Makefile +++ b/drivers/platform/goldfish/Makefile
@@ -2,4 +2,5 @@ # Makefile for Goldfish platform specific drivers # obj-$(CONFIG_GOLDFISH_BUS) += pdev_bus.o -obj-$(CONFIG_GOLDFISH_PIPE) += goldfish_pipe.o +obj-$(CONFIG_GOLDFISH_PIPE) += goldfish_pipe_all.o +goldfish_pipe_all-objs := goldfish_pipe.o goldfish_pipe_v2.o
diff --git a/drivers/platform/goldfish/goldfish_pipe.c b/drivers/platform/goldfish/goldfish_pipe.c index 1aba2c7..91e0a56 100644 --- a/drivers/platform/goldfish/goldfish_pipe.c +++ b/drivers/platform/goldfish/goldfish_pipe.c
@@ -15,52 +15,11 @@ * */ -/* This source file contains the implementation of a special device driver - * that intends to provide a *very* fast communication channel between the - * guest system and the QEMU emulator. - * - * Usage from the guest is simply the following (error handling simplified): - * - * int fd = open("/dev/qemu_pipe",O_RDWR); - * .... write() or read() through the pipe. - * - * This driver doesn't deal with the exact protocol used during the session. - * It is intended to be as simple as something like: - * - * // do this _just_ after opening the fd to connect to a specific - * // emulator service. - * const char* msg = "<pipename>"; - * if (write(fd, msg, strlen(msg)+1) < 0) { - * ... could not connect to <pipename> service - * close(fd); - * } - * - * // after this, simply read() and write() to communicate with the - * // service. Exact protocol details left as an exercise to the reader. - * - * This driver is very fast because it doesn't copy any data through - * intermediate buffers, since the emulator is capable of translating - * guest user addresses into host ones. - * - * Note that we must however ensure that each user page involved in the - * exchange is properly mapped during a transfer. +/* This source file contains the implementation of the legacy version of + * a goldfish pipe device driver. See goldfish_pipe_v2.c for the current + * version. */ - -#include <linux/module.h> -#include <linux/interrupt.h> -#include <linux/kernel.h> -#include <linux/spinlock.h> -#include <linux/miscdevice.h> -#include <linux/platform_device.h> -#include <linux/poll.h> -#include <linux/sched.h> -#include <linux/bitops.h> -#include <linux/slab.h> -#include <linux/io.h> -#include <linux/goldfish.h> -#include <linux/dma-mapping.h> -#include <linux/mm.h> -#include <linux/acpi.h> +#include "goldfish_pipe.h" /* * IMPORTANT: The following constants must match the ones used and defined @@ -110,29 +69,15 @@ #define PIPE_WAKE_READ (1 << 1) /* pipe can now be read from */ #define PIPE_WAKE_WRITE (1 << 2) /* pipe can now be written to */ -struct access_params { - unsigned long channel; - u32 size; - unsigned long address; - u32 cmd; - u32 result; - /* reserved for future extension */ - u32 flags; -}; +#define MAX_PAGES_TO_GRAB 32 -/* The global driver data. Holds a reference to the i/o page used to - * communicate with the emulator, and a wake queue for blocked tasks - * waiting to be awoken. - */ -struct goldfish_pipe_dev { - spinlock_t lock; - unsigned char __iomem *base; - struct access_params *aps; - int irq; - u32 version; -}; +#define DEBUG 0 -static struct goldfish_pipe_dev pipe_dev[1]; +#if DEBUG +#define DPRINT(...) { printk(KERN_ERR __VA_ARGS__); } +#else +#define DPRINT(...) +#endif /* This data type models a given pipe instance */ struct goldfish_pipe { @@ -142,6 +87,15 @@ struct goldfish_pipe { wait_queue_head_t wake_queue; }; +struct access_params { + unsigned long channel; + u32 size; + unsigned long address; + u32 cmd; + u32 result; + /* reserved for future extension */ + u32 flags; +}; /* Bit flags for the 'flags' field */ enum { @@ -231,8 +185,10 @@ static int setup_access_params_addr(struct platform_device *pdev, if (valid_batchbuffer_addr(dev, aps)) { dev->aps = aps; return 0; - } else + } else { + devm_kfree(&pdev->dev, aps); return -1; + } } /* A value that will not be set by qemu emulator */ @@ -269,6 +225,7 @@ static ssize_t goldfish_pipe_read_write(struct file *filp, char __user *buffer, struct goldfish_pipe *pipe = filp->private_data; struct goldfish_pipe_dev *dev = pipe->dev; unsigned long address, address_end; + struct page* pages[MAX_PAGES_TO_GRAB] = {}; int count = 0, ret = -EINVAL; /* If the emulator already closed the pipe, no need to go further */ @@ -293,45 +250,61 @@ static ssize_t goldfish_pipe_read_write(struct file *filp, char __user *buffer, while (address < address_end) { unsigned long page_end = (address & PAGE_MASK) + PAGE_SIZE; - unsigned long next = page_end < address_end ? page_end - : address_end; - unsigned long avail = next - address; - int status, wakeBit; - struct page *page; - - /* Either vaddr or paddr depending on the device version */ - unsigned long xaddr; + unsigned long next, avail; + int status, wakeBit, page_i, num_contiguous_pages; + long first_page, last_page, requested_pages; + unsigned long xaddr, xaddr_prev, xaddr_i; /* - * We grab the pages on a page-by-page basis in case user - * space gives us a potentially huge buffer but the read only - * returns a small amount, then there's no need to pin that - * much memory to the process. + * Attempt to grab multiple physically contiguous pages. */ - down_read(¤t->mm->mmap_sem); - ret = get_user_pages(address, 1, is_write ? 0 : FOLL_WRITE, - &page, NULL); - up_read(¤t->mm->mmap_sem); - if (ret < 0) - break; - - if (dev->version) { - /* Device version 1 or newer (qemu-android) expects the - * physical address. - */ - xaddr = page_to_phys(page) | (address & ~PAGE_MASK); - } else { - /* Device version 0 (classic emulator) expects the - * virtual address. - */ - xaddr = address; + first_page = address & PAGE_MASK; + last_page = (address_end - 1) & PAGE_MASK; + requested_pages = ((last_page - first_page) >> PAGE_SHIFT) + 1; + if (requested_pages > MAX_PAGES_TO_GRAB) { + requested_pages = MAX_PAGES_TO_GRAB; } + ret = get_user_pages_fast(first_page, requested_pages, + !is_write, pages); + + DPRINT("%s: requested pages: %d %d %p\n", __FUNCTION__, + ret, requested_pages, first_page); + if (ret == 0) { + DPRINT("%s: error: (requested pages == 0) (wanted %d)\n", + __FUNCTION__, requested_pages); + mutex_unlock(&pipe->lock); + return ret; + } + if (ret < 0) { + DPRINT("%s: (requested pages < 0) %d \n", + __FUNCTION__, requested_pages); + mutex_unlock(&pipe->lock); + return ret; + } + + xaddr = page_to_phys(pages[0]) | (address & ~PAGE_MASK); + xaddr_prev = xaddr; + num_contiguous_pages = ret == 0 ? 0 : 1; + for (page_i = 1; page_i < ret; page_i++) { + xaddr_i = page_to_phys(pages[page_i]) | (address & ~PAGE_MASK); + if (xaddr_i == xaddr_prev + PAGE_SIZE) { + page_end += PAGE_SIZE; + xaddr_prev = xaddr_i; + num_contiguous_pages++; + } else { + DPRINT("%s: discontinuous page boundary: %d pages instead\n", + __FUNCTION__, page_i); + break; + } + } + next = page_end < address_end ? page_end : address_end; + avail = next - address; /* Now, try to transfer the bytes in the current page */ spin_lock_irqsave(&dev->lock, irq_flags); if (access_with_param(dev, - is_write ? CMD_WRITE_BUFFER : CMD_READ_BUFFER, - xaddr, avail, pipe, &status)) { + is_write ? CMD_WRITE_BUFFER : CMD_READ_BUFFER, + xaddr, avail, pipe, &status)) { gf_write_ptr(pipe, dev->base + PIPE_REG_CHANNEL, dev->base + PIPE_REG_CHANNEL_HIGH); writel(avail, dev->base + PIPE_REG_SIZE); @@ -344,9 +317,13 @@ static ssize_t goldfish_pipe_read_write(struct file *filp, char __user *buffer, } spin_unlock_irqrestore(&dev->lock, irq_flags); - if (status > 0 && !is_write) - set_page_dirty(page); - put_page(page); + for (page_i = 0; page_i < ret; page_i++) { + if (status > 0 && !is_write && + page_i < num_contiguous_pages) { + set_page_dirty(pages[page_i]); + } + put_page(pages[page_i]); + } if (status > 0) { /* Correct transfer */ count += status; @@ -368,7 +345,7 @@ static ssize_t goldfish_pipe_read_write(struct file *filp, char __user *buffer, */ if (status != PIPE_ERROR_AGAIN) pr_info_ratelimited("goldfish_pipe: backend returned error %d on %s\n", - status, is_write ? "write" : "read"); + status, is_write ? "write" : "read"); ret = 0; break; } @@ -378,7 +355,7 @@ static ssize_t goldfish_pipe_read_write(struct file *filp, char __user *buffer, * non-blocking mode, just return the error code. */ if (status != PIPE_ERROR_AGAIN || - (filp->f_flags & O_NONBLOCK) != 0) { + (filp->f_flags & O_NONBLOCK) != 0) { ret = goldfish_pipe_error_convert(status); break; } @@ -392,7 +369,7 @@ static ssize_t goldfish_pipe_read_write(struct file *filp, char __user *buffer, /* Tell the emulator we're going to wait for a wake event */ goldfish_cmd(pipe, - is_write ? CMD_WAKE_ON_WRITE : CMD_WAKE_ON_READ); + is_write ? CMD_WAKE_ON_WRITE : CMD_WAKE_ON_READ); /* Unlock the pipe, then wait for the wake signal */ mutex_unlock(&pipe->lock); @@ -538,6 +515,8 @@ static int goldfish_pipe_open(struct inode *inode, struct file *file) pipe->dev = dev; mutex_init(&pipe->lock); + DPRINT("%s: call. pipe_dev pipe_dev=0x%lx new_pipe_addr=0x%lx file=0x%lx\n", __FUNCTION__, pipe_dev, pipe, file); + // spin lock init, write head of list, i guess init_waitqueue_head(&pipe->wake_queue); /* @@ -560,6 +539,7 @@ static int goldfish_pipe_release(struct inode *inode, struct file *filp) { struct goldfish_pipe *pipe = filp->private_data; + DPRINT("%s: call. pipe=0x%lx file=0x%lx\n", __FUNCTION__, pipe, filp); /* The guest is closing the channel, so tell the emulator right now */ goldfish_cmd(pipe, CMD_CLOSE); kfree(pipe); @@ -576,98 +556,33 @@ static const struct file_operations goldfish_pipe_fops = { .release = goldfish_pipe_release, }; -static struct miscdevice goldfish_pipe_device = { +static struct miscdevice goldfish_pipe_dev = { .minor = MISC_DYNAMIC_MINOR, .name = "goldfish_pipe", .fops = &goldfish_pipe_fops, }; -static int goldfish_pipe_probe(struct platform_device *pdev) +int goldfish_pipe_device_init_v1(struct platform_device *pdev) { - int err; - struct resource *r; struct goldfish_pipe_dev *dev = pipe_dev; - - /* not thread safe, but this should not happen */ - WARN_ON(dev->base != NULL); - - spin_lock_init(&dev->lock); - - r = platform_get_resource(pdev, IORESOURCE_MEM, 0); - if (r == NULL || resource_size(r) < PAGE_SIZE) { - dev_err(&pdev->dev, "can't allocate i/o page\n"); - return -EINVAL; - } - dev->base = devm_ioremap(&pdev->dev, r->start, PAGE_SIZE); - if (dev->base == NULL) { - dev_err(&pdev->dev, "ioremap failed\n"); - return -EINVAL; - } - - r = platform_get_resource(pdev, IORESOURCE_IRQ, 0); - if (r == NULL) { - err = -EINVAL; - goto error; - } - dev->irq = r->start; - - err = devm_request_irq(&pdev->dev, dev->irq, goldfish_pipe_interrupt, + int err = devm_request_irq(&pdev->dev, dev->irq, goldfish_pipe_interrupt, IRQF_SHARED, "goldfish_pipe", dev); if (err) { - dev_err(&pdev->dev, "unable to allocate IRQ\n"); - goto error; + dev_err(&pdev->dev, "unable to allocate IRQ for v1\n"); + return err; } - err = misc_register(&goldfish_pipe_device); + err = misc_register(&goldfish_pipe_dev); if (err) { - dev_err(&pdev->dev, "unable to register device\n"); - goto error; + dev_err(&pdev->dev, "unable to register v1 device\n"); + return err; } + setup_access_params_addr(pdev, dev); - - /* Although the pipe device in the classic Android emulator does not - * recognize the 'version' register, it won't treat this as an error - * either and will simply return 0, which is fine. - */ - dev->version = readl(dev->base + PIPE_REG_VERSION); return 0; - -error: - dev->base = NULL; - return err; } -static int goldfish_pipe_remove(struct platform_device *pdev) +void goldfish_pipe_device_deinit_v1(struct platform_device *pdev) { - struct goldfish_pipe_dev *dev = pipe_dev; - misc_deregister(&goldfish_pipe_device); - dev->base = NULL; - return 0; + misc_deregister(&goldfish_pipe_dev); } - -static const struct acpi_device_id goldfish_pipe_acpi_match[] = { - { "GFSH0003", 0 }, - { }, -}; -MODULE_DEVICE_TABLE(acpi, goldfish_pipe_acpi_match); - -static const struct of_device_id goldfish_pipe_of_match[] = { - { .compatible = "google,android-pipe", }, - {}, -}; -MODULE_DEVICE_TABLE(of, goldfish_pipe_of_match); - -static struct platform_driver goldfish_pipe = { - .probe = goldfish_pipe_probe, - .remove = goldfish_pipe_remove, - .driver = { - .name = "goldfish_pipe", - .owner = THIS_MODULE, - .of_match_table = goldfish_pipe_of_match, - .acpi_match_table = ACPI_PTR(goldfish_pipe_acpi_match), - } -}; - -module_platform_driver(goldfish_pipe); -MODULE_AUTHOR("David Turner <digit@google.com>"); -MODULE_LICENSE("GPL");
diff --git a/drivers/platform/goldfish/goldfish_pipe.h b/drivers/platform/goldfish/goldfish_pipe.h new file mode 100644 index 0000000..6cd1b63 --- /dev/null +++ b/drivers/platform/goldfish/goldfish_pipe.h
@@ -0,0 +1,92 @@ +/* + * Copyright (C) 2016 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ +#ifndef GOLDFISH_PIPE_H +#define GOLDFISH_PIPE_H + +#include <linux/module.h> +#include <linux/interrupt.h> +#include <linux/kernel.h> +#include <linux/spinlock.h> +#include <linux/miscdevice.h> +#include <linux/platform_device.h> +#include <linux/poll.h> +#include <linux/sched.h> +#include <linux/bitops.h> +#include <linux/slab.h> +#include <linux/io.h> +#include <linux/goldfish.h> +#include <linux/dma-mapping.h> +#include <linux/mm.h> +#include <linux/acpi.h> + + +/* Initialize the legacy version of the pipe device driver */ +int goldfish_pipe_device_init_v1(struct platform_device *pdev); + +/* Deinitialize the legacy version of the pipe device driver */ +void goldfish_pipe_device_deinit_v1(struct platform_device *pdev); + +/* Forward declarations for the device struct */ +struct goldfish_pipe; +struct goldfish_pipe_device_buffers; + +/* The global driver data. Holds a reference to the i/o page used to + * communicate with the emulator, and a wake queue for blocked tasks + * waiting to be awoken. + */ +struct goldfish_pipe_dev { + /* + * Global device spinlock. Protects the following members: + * - pipes, pipes_capacity + * - [*pipes, *pipes + pipes_capacity) - array data + * - first_signalled_pipe, + * goldfish_pipe::prev_signalled, + * goldfish_pipe::next_signalled, + * goldfish_pipe::signalled_flags - all singnalled-related fields, + * in all allocated pipes + * - open_command_params - PIPE_CMD_OPEN-related buffers + * + * It looks like a lot of different fields, but the trick is that the only + * operation that happens often is the signalled pipes array manipulation. + * That's why it's OK for now to keep the rest of the fields under the same + * lock. If we notice too much contention because of PIPE_CMD_OPEN, + * then we should add a separate lock there. + */ + spinlock_t lock; + + /* + * Array of the pipes of |pipes_capacity| elements, + * indexed by goldfish_pipe::id + */ + struct goldfish_pipe **pipes; + u32 pipes_capacity; + + /* Pointers to the buffers host uses for interaction with this driver */ + struct goldfish_pipe_dev_buffers *buffers; + + /* Head of a doubly linked list of signalled pipes */ + struct goldfish_pipe *first_signalled_pipe; + + /* Some device-specific data */ + int irq; + int version; + unsigned char __iomem *base; + + /* v1-specific access parameters */ + struct access_params *aps; +}; + +extern struct goldfish_pipe_dev pipe_dev[1]; + +#endif /* GOLDFISH_PIPE_H */
diff --git a/drivers/platform/goldfish/goldfish_pipe_v2.c b/drivers/platform/goldfish/goldfish_pipe_v2.c new file mode 100644 index 0000000..ad373ed --- /dev/null +++ b/drivers/platform/goldfish/goldfish_pipe_v2.c
@@ -0,0 +1,889 @@ +/* + * Copyright (C) 2012 Intel, Inc. + * Copyright (C) 2013 Intel, Inc. + * Copyright (C) 2014 Linaro Limited + * Copyright (C) 2011-2016 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +/* This source file contains the implementation of a special device driver + * that intends to provide a *very* fast communication channel between the + * guest system and the QEMU emulator. + * + * Usage from the guest is simply the following (error handling simplified): + * + * int fd = open("/dev/qemu_pipe",O_RDWR); + * .... write() or read() through the pipe. + * + * This driver doesn't deal with the exact protocol used during the session. + * It is intended to be as simple as something like: + * + * // do this _just_ after opening the fd to connect to a specific + * // emulator service. + * const char* msg = "<pipename>"; + * if (write(fd, msg, strlen(msg)+1) < 0) { + * ... could not connect to <pipename> service + * close(fd); + * } + * + * // after this, simply read() and write() to communicate with the + * // service. Exact protocol details left as an exercise to the reader. + * + * This driver is very fast because it doesn't copy any data through + * intermediate buffers, since the emulator is capable of translating + * guest user addresses into host ones. + * + * Note that we must however ensure that each user page involved in the + * exchange is properly mapped during a transfer. + */ + +#include "goldfish_pipe.h" + + +/* + * Update this when something changes in the driver's behavior so the host + * can benefit from knowing it + */ +enum { + PIPE_DRIVER_VERSION = 2, + PIPE_CURRENT_DEVICE_VERSION = 2 +}; + +/* + * IMPORTANT: The following constants must match the ones used and defined + * in external/qemu/hw/goldfish_pipe.c in the Android source tree. + */ + +/* List of bitflags returned in status of CMD_POLL command */ +enum PipePollFlags { + PIPE_POLL_IN = 1 << 0, + PIPE_POLL_OUT = 1 << 1, + PIPE_POLL_HUP = 1 << 2 +}; + +/* Possible status values used to signal errors - see goldfish_pipe_error_convert */ +enum PipeErrors { + PIPE_ERROR_INVAL = -1, + PIPE_ERROR_AGAIN = -2, + PIPE_ERROR_NOMEM = -3, + PIPE_ERROR_IO = -4 +}; + +/* Bit-flags used to signal events from the emulator */ +enum PipeWakeFlags { + PIPE_WAKE_CLOSED = 1 << 0, /* emulator closed pipe */ + PIPE_WAKE_READ = 1 << 1, /* pipe can now be read from */ + PIPE_WAKE_WRITE = 1 << 2 /* pipe can now be written to */ +}; + +/* Bit flags for the 'flags' field */ +enum PipeFlagsBits { + BIT_CLOSED_ON_HOST = 0, /* pipe closed by host */ + BIT_WAKE_ON_WRITE = 1, /* want to be woken on writes */ + BIT_WAKE_ON_READ = 2, /* want to be woken on reads */ +}; + +enum PipeRegs { + PIPE_REG_CMD = 0, + + PIPE_REG_SIGNAL_BUFFER_HIGH = 4, + PIPE_REG_SIGNAL_BUFFER = 8, + PIPE_REG_SIGNAL_BUFFER_COUNT = 12, + + PIPE_REG_OPEN_BUFFER_HIGH = 20, + PIPE_REG_OPEN_BUFFER = 24, + + PIPE_REG_VERSION = 36, + + PIPE_REG_GET_SIGNALLED = 48, +}; + +enum PipeCmdCode { + PIPE_CMD_OPEN = 1, /* to be used by the pipe device itself */ + PIPE_CMD_CLOSE, + PIPE_CMD_POLL, + PIPE_CMD_WRITE, + PIPE_CMD_WAKE_ON_WRITE, + PIPE_CMD_READ, + PIPE_CMD_WAKE_ON_READ, + + /* + * TODO(zyy): implement a deferred read/write execution to allow parallel + * processing of pipe operations on the host. + */ + PIPE_CMD_WAKE_ON_DONE_IO, +}; + +enum { + MAX_BUFFERS_PER_COMMAND = 336, + MAX_SIGNALLED_PIPES = 64, + INITIAL_PIPES_CAPACITY = 64 +}; + +struct goldfish_pipe_dev; +struct goldfish_pipe; +struct goldfish_pipe_command; + +/* A per-pipe command structure, shared with the host */ +struct goldfish_pipe_command { + s32 cmd; /* PipeCmdCode, guest -> host */ + s32 id; /* pipe id, guest -> host */ + s32 status; /* command execution status, host -> guest */ + s32 reserved; /* to pad to 64-bit boundary */ + union { + /* Parameters for PIPE_CMD_{READ,WRITE} */ + struct { + u32 buffers_count; /* number of buffers, guest -> host */ + s32 consumed_size; /* number of consumed bytes, host -> guest */ + u64 ptrs[MAX_BUFFERS_PER_COMMAND]; /* buffer pointers, guest -> host */ + u32 sizes[MAX_BUFFERS_PER_COMMAND]; /* buffer sizes, guest -> host */ + } rw_params; + }; +}; + +/* A single signalled pipe information */ +struct signalled_pipe_buffer { + u32 id; + u32 flags; +}; + +/* Parameters for the PIPE_CMD_OPEN command */ +struct open_command_param { + u64 command_buffer_ptr; + u32 rw_params_max_count; +}; + +/* Device-level set of buffers shared with the host */ +struct goldfish_pipe_dev_buffers { + struct open_command_param open_command_params; + struct signalled_pipe_buffer signalled_pipe_buffers[MAX_SIGNALLED_PIPES]; +}; + +/* This data type models a given pipe instance */ +struct goldfish_pipe { + u32 id; /* pipe ID - index into goldfish_pipe_dev::pipes array */ + unsigned long flags; /* The wake flags pipe is waiting for + * Note: not protected with any lock, uses atomic operations + * and barriers to make it thread-safe. + */ + unsigned long signalled_flags; /* wake flags host have signalled, + * - protected by goldfish_pipe_dev::lock */ + + struct goldfish_pipe_command *command_buffer; /* A pointer to command buffer */ + + /* doubly linked list of signalled pipes, protected by goldfish_pipe_dev::lock */ + struct goldfish_pipe *prev_signalled; + struct goldfish_pipe *next_signalled; + + /* + * A pipe's own lock. Protects the following: + * - *command_buffer - makes sure a command can safely write its parameters + * to the host and read the results back. + */ + struct mutex lock; + + wait_queue_head_t wake_queue; /* A wake queue for sleeping until host signals an event */ + struct goldfish_pipe_dev *dev; /* Pointer to the parent goldfish_pipe_dev instance */ +}; + +struct goldfish_pipe_dev pipe_dev[1] = {}; + +static int goldfish_cmd_locked(struct goldfish_pipe *pipe, enum PipeCmdCode cmd) +{ + pipe->command_buffer->cmd = cmd; + pipe->command_buffer->status = PIPE_ERROR_INVAL; /* failure by default */ + writel(pipe->id, pipe->dev->base + PIPE_REG_CMD); + return pipe->command_buffer->status; +} + +static int goldfish_cmd(struct goldfish_pipe *pipe, enum PipeCmdCode cmd) +{ + int status; + if (mutex_lock_interruptible(&pipe->lock)) + return PIPE_ERROR_IO; + status = goldfish_cmd_locked(pipe, cmd); + mutex_unlock(&pipe->lock); + return status; +} + +/* + * This function converts an error code returned by the emulator through + * the PIPE_REG_STATUS i/o register into a valid negative errno value. + */ +static int goldfish_pipe_error_convert(int status) +{ + switch (status) { + case PIPE_ERROR_AGAIN: + return -EAGAIN; + case PIPE_ERROR_NOMEM: + return -ENOMEM; + case PIPE_ERROR_IO: + return -EIO; + default: + return -EINVAL; + } +} + +static int pin_user_pages(unsigned long first_page, unsigned long last_page, + unsigned last_page_size, int is_write, + struct page *pages[MAX_BUFFERS_PER_COMMAND], unsigned *iter_last_page_size) +{ + int ret; + int requested_pages = ((last_page - first_page) >> PAGE_SHIFT) + 1; + if (requested_pages > MAX_BUFFERS_PER_COMMAND) { + requested_pages = MAX_BUFFERS_PER_COMMAND; + *iter_last_page_size = PAGE_SIZE; + } else { + *iter_last_page_size = last_page_size; + } + + ret = get_user_pages_fast( + first_page, requested_pages, !is_write, pages); + if (ret <= 0) + return -EFAULT; + if (ret < requested_pages) + *iter_last_page_size = PAGE_SIZE; + return ret; + +} + +static void release_user_pages(struct page **pages, int pages_count, + int is_write, s32 consumed_size) +{ + int i; + for (i = 0; i < pages_count; i++) { + if (!is_write && consumed_size > 0) { + set_page_dirty(pages[i]); + } + put_page(pages[i]); + } +} + +/* Populate the call parameters, merging adjacent pages together */ +static void populate_rw_params( + struct page **pages, int pages_count, + unsigned long address, unsigned long address_end, + unsigned long first_page, unsigned long last_page, + unsigned iter_last_page_size, int is_write, + struct goldfish_pipe_command *command) +{ + /* + * Process the first page separately - it's the only page that + * needs special handling for its start address. + */ + unsigned long xaddr = page_to_phys(pages[0]); + unsigned long xaddr_prev = xaddr; + int buffer_idx = 0; + int i = 1; + int size_on_page = first_page == last_page + ? (int)(address_end - address) + : (PAGE_SIZE - (address & ~PAGE_MASK)); + command->rw_params.ptrs[0] = (u64)(xaddr | (address & ~PAGE_MASK)); + command->rw_params.sizes[0] = size_on_page; + for (; i < pages_count; ++i) { + xaddr = page_to_phys(pages[i]); + size_on_page = (i == pages_count - 1) ? iter_last_page_size : PAGE_SIZE; + if (xaddr == xaddr_prev + PAGE_SIZE) { + command->rw_params.sizes[buffer_idx] += size_on_page; + } else { + ++buffer_idx; + command->rw_params.ptrs[buffer_idx] = (u64)xaddr; + command->rw_params.sizes[buffer_idx] = size_on_page; + } + xaddr_prev = xaddr; + } + command->rw_params.buffers_count = buffer_idx + 1; +} + +static int transfer_max_buffers(struct goldfish_pipe* pipe, + unsigned long address, unsigned long address_end, int is_write, + unsigned long last_page, unsigned int last_page_size, + s32* consumed_size, int* status) +{ + struct page *pages[MAX_BUFFERS_PER_COMMAND]; + unsigned long first_page = address & PAGE_MASK; + unsigned int iter_last_page_size; + int pages_count = pin_user_pages(first_page, last_page, + last_page_size, is_write, + pages, &iter_last_page_size); + if (pages_count < 0) + return pages_count; + + /* Serialize access to the pipe command buffers */ + if (mutex_lock_interruptible(&pipe->lock)) + return -ERESTARTSYS; + + populate_rw_params(pages, pages_count, address, address_end, + first_page, last_page, iter_last_page_size, is_write, + pipe->command_buffer); + + /* Transfer the data */ + *status = goldfish_cmd_locked(pipe, + is_write ? PIPE_CMD_WRITE : PIPE_CMD_READ); + + *consumed_size = pipe->command_buffer->rw_params.consumed_size; + + mutex_unlock(&pipe->lock); + + release_user_pages(pages, pages_count, is_write, *consumed_size); + + return 0; +} + +static int wait_for_host_signal(struct goldfish_pipe *pipe, int is_write) +{ + u32 wakeBit = is_write ? BIT_WAKE_ON_WRITE : BIT_WAKE_ON_READ; + set_bit(wakeBit, &pipe->flags); + + /* Tell the emulator we're going to wait for a wake event */ + (void)goldfish_cmd(pipe, + is_write ? PIPE_CMD_WAKE_ON_WRITE : PIPE_CMD_WAKE_ON_READ); + + while (test_bit(wakeBit, &pipe->flags)) { + if (wait_event_interruptible( + pipe->wake_queue, + !test_bit(wakeBit, &pipe->flags))) + return -ERESTARTSYS; + + if (test_bit(BIT_CLOSED_ON_HOST, &pipe->flags)) + return -EIO; + } + + return 0; +} + +static ssize_t goldfish_pipe_read_write(struct file *filp, + char __user *buffer, size_t bufflen, int is_write) +{ + struct goldfish_pipe *pipe = filp->private_data; + int count = 0, ret = -EINVAL; + unsigned long address, address_end, last_page; + unsigned int last_page_size; + + /* If the emulator already closed the pipe, no need to go further */ + if (unlikely(test_bit(BIT_CLOSED_ON_HOST, &pipe->flags))) + return -EIO; + /* Null reads or writes succeeds */ + if (unlikely(bufflen == 0)) + return 0; + /* Check the buffer range for access */ + if (unlikely(!access_ok(is_write ? VERIFY_WRITE : VERIFY_READ, + buffer, bufflen))) + return -EFAULT; + + address = (unsigned long)buffer; + address_end = address + bufflen; + last_page = (address_end - 1) & PAGE_MASK; + last_page_size = ((address_end - 1) & ~PAGE_MASK) + 1; + + while (address < address_end) { + s32 consumed_size; + int status; + ret = transfer_max_buffers(pipe, address, address_end, is_write, + last_page, last_page_size, &consumed_size, &status); + if (ret < 0) + break; + + if (consumed_size > 0) { + /* No matter what's the status, we've transfered something */ + count += consumed_size; + address += consumed_size; + } + if (status > 0) + continue; + if (status == 0) { + /* EOF */ + ret = 0; + break; + } + if (count > 0) { + /* + * An error occured, but we already transfered + * something on one of the previous iterations. + * Just return what we already copied and log this + * err. + */ + if (status != PIPE_ERROR_AGAIN) + pr_info_ratelimited("goldfish_pipe: backend error %d on %s\n", + status, is_write ? "write" : "read"); + break; + } + + /* + * If the error is not PIPE_ERROR_AGAIN, or if we are in + * non-blocking mode, just return the error code. + */ + if (status != PIPE_ERROR_AGAIN || (filp->f_flags & O_NONBLOCK) != 0) { + ret = goldfish_pipe_error_convert(status); + break; + } + + status = wait_for_host_signal(pipe, is_write); + if (status < 0) + return status; + } + + if (count > 0) + return count; + return ret; +} + +static ssize_t goldfish_pipe_read(struct file *filp, char __user *buffer, + size_t bufflen, loff_t *ppos) +{ + return goldfish_pipe_read_write(filp, buffer, bufflen, /* is_write */ 0); +} + +static ssize_t goldfish_pipe_write(struct file *filp, + const char __user *buffer, size_t bufflen, + loff_t *ppos) +{ + return goldfish_pipe_read_write(filp, + /* cast away the const */(char __user *)buffer, bufflen, + /* is_write */ 1); +} + +static unsigned int goldfish_pipe_poll(struct file *filp, poll_table *wait) +{ + struct goldfish_pipe *pipe = filp->private_data; + unsigned int mask = 0; + int status; + + poll_wait(filp, &pipe->wake_queue, wait); + + status = goldfish_cmd(pipe, PIPE_CMD_POLL); + if (status < 0) { + return -ERESTARTSYS; + } + + if (status & PIPE_POLL_IN) + mask |= POLLIN | POLLRDNORM; + if (status & PIPE_POLL_OUT) + mask |= POLLOUT | POLLWRNORM; + if (status & PIPE_POLL_HUP) + mask |= POLLHUP; + if (test_bit(BIT_CLOSED_ON_HOST, &pipe->flags)) + mask |= POLLERR; + + return mask; +} + +static void signalled_pipes_add_locked(struct goldfish_pipe_dev *dev, + u32 id, u32 flags) +{ + struct goldfish_pipe *pipe; + + BUG_ON(id >= dev->pipes_capacity); + + pipe = dev->pipes[id]; + if (!pipe) + return; + pipe->signalled_flags |= flags; + + if (pipe->prev_signalled || pipe->next_signalled + || dev->first_signalled_pipe == pipe) + return; /* already in the list */ + pipe->next_signalled = dev->first_signalled_pipe; + if (dev->first_signalled_pipe) { + dev->first_signalled_pipe->prev_signalled = pipe; + } + dev->first_signalled_pipe = pipe; +} + +static void signalled_pipes_remove_locked(struct goldfish_pipe_dev *dev, + struct goldfish_pipe *pipe) { + if (pipe->prev_signalled) + pipe->prev_signalled->next_signalled = pipe->next_signalled; + if (pipe->next_signalled) + pipe->next_signalled->prev_signalled = pipe->prev_signalled; + if (pipe == dev->first_signalled_pipe) + dev->first_signalled_pipe = pipe->next_signalled; + pipe->prev_signalled = NULL; + pipe->next_signalled = NULL; +} + +static struct goldfish_pipe *signalled_pipes_pop_front(struct goldfish_pipe_dev *dev, + int *wakes) +{ + struct goldfish_pipe *pipe; + unsigned long flags; + spin_lock_irqsave(&dev->lock, flags); + + pipe = dev->first_signalled_pipe; + if (pipe) { + *wakes = pipe->signalled_flags; + pipe->signalled_flags = 0; + /* + * This is an optimized version of signalled_pipes_remove_locked() - + * we want to make it as fast as possible to wake the sleeping pipe + * operations faster + */ + dev->first_signalled_pipe = pipe->next_signalled; + if (dev->first_signalled_pipe) + dev->first_signalled_pipe->prev_signalled = NULL; + pipe->next_signalled = NULL; + } + + spin_unlock_irqrestore(&dev->lock, flags); + return pipe; +} + +static void goldfish_interrupt_task(unsigned long unused) +{ + struct goldfish_pipe_dev *dev = pipe_dev; + /* Iterate over the signalled pipes and wake them one by one */ + struct goldfish_pipe *pipe; + int wakes; + while ((pipe = signalled_pipes_pop_front(dev, &wakes)) != NULL) { + if (wakes & PIPE_WAKE_CLOSED) { + pipe->flags = 1 << BIT_CLOSED_ON_HOST; + } else { + if (wakes & PIPE_WAKE_READ) + clear_bit(BIT_WAKE_ON_READ, &pipe->flags); + if (wakes & PIPE_WAKE_WRITE) + clear_bit(BIT_WAKE_ON_WRITE, &pipe->flags); + } + /* + * wake_up_interruptible() implies a write barrier, so don't explicitly + * add another one here. + */ + wake_up_interruptible(&pipe->wake_queue); + } +} +DECLARE_TASKLET(goldfish_interrupt_tasklet, goldfish_interrupt_task, 0); + +/* + * The general idea of the interrupt handling: + * + * 1. device raises an interrupt if there's at least one signalled pipe + * 2. IRQ handler reads the signalled pipes and their count from the device + * 3. device writes them into a shared buffer and returns the count + * it only resets the IRQ if it has returned all signalled pipes, + * otherwise it leaves it raised, so IRQ handler will be called + * again for the next chunk + * 4. IRQ handler adds all returned pipes to the device's signalled pipes list + * 5. IRQ handler launches a tasklet to process the signalled pipes from the + * list in a separate context + */ +static irqreturn_t goldfish_pipe_interrupt(int irq, void *dev_id) +{ + u32 count; + u32 i; + unsigned long flags; + struct goldfish_pipe_dev *dev = dev_id; + if (dev != pipe_dev) + return IRQ_NONE; + + /* Request the signalled pipes from the device */ + spin_lock_irqsave(&dev->lock, flags); + + count = readl(dev->base + PIPE_REG_GET_SIGNALLED); + if (count == 0) { + spin_unlock_irqrestore(&dev->lock, flags); + return IRQ_NONE; + } + if (count > MAX_SIGNALLED_PIPES) + count = MAX_SIGNALLED_PIPES; + + for (i = 0; i < count; ++i) + signalled_pipes_add_locked(dev, + dev->buffers->signalled_pipe_buffers[i].id, + dev->buffers->signalled_pipe_buffers[i].flags); + + spin_unlock_irqrestore(&dev->lock, flags); + + tasklet_schedule(&goldfish_interrupt_tasklet); + return IRQ_HANDLED; +} + +static int get_free_pipe_id_locked(struct goldfish_pipe_dev *dev) +{ + int id; + for (id = 0; id < dev->pipes_capacity; ++id) + if (!dev->pipes[id]) + return id; + + { + /* Reallocate the array */ + u32 new_capacity = 2 * dev->pipes_capacity; + struct goldfish_pipe **pipes = + kcalloc(new_capacity, sizeof(*pipes), + GFP_ATOMIC); + if (!pipes) + return -ENOMEM; + memcpy(pipes, dev->pipes, sizeof(*pipes) * dev->pipes_capacity); + kfree(dev->pipes); + dev->pipes = pipes; + id = dev->pipes_capacity; + dev->pipes_capacity = new_capacity; + } + return id; +} + +/** + * goldfish_pipe_open - open a channel to the AVD + * @inode: inode of device + * @file: file struct of opener + * + * Create a new pipe link between the emulator and the use application. + * Each new request produces a new pipe. + * + * Note: we use the pipe ID as a mux. All goldfish emulations are 32bit + * right now so this is fine. A move to 64bit will need this addressing + */ +static int goldfish_pipe_open(struct inode *inode, struct file *file) +{ + struct goldfish_pipe_dev *dev = pipe_dev; + unsigned long flags; + int id; + int status; + + /* Allocate new pipe kernel object */ + struct goldfish_pipe *pipe = kzalloc(sizeof(*pipe), GFP_KERNEL); + if (pipe == NULL) + return -ENOMEM; + + pipe->dev = dev; + mutex_init(&pipe->lock); + init_waitqueue_head(&pipe->wake_queue); + + /* + * Command buffer needs to be allocated on its own page to make sure it is + * physically contiguous in host's address space. + */ + pipe->command_buffer = + (struct goldfish_pipe_command*)__get_free_page(GFP_KERNEL); + if (!pipe->command_buffer) { + status = -ENOMEM; + goto err_pipe; + } + + spin_lock_irqsave(&dev->lock, flags); + + id = get_free_pipe_id_locked(dev); + if (id < 0) { + status = id; + goto err_id_locked; + } + + dev->pipes[id] = pipe; + pipe->id = id; + pipe->command_buffer->id = id; + + /* Now tell the emulator we're opening a new pipe. */ + dev->buffers->open_command_params.rw_params_max_count = + MAX_BUFFERS_PER_COMMAND; + dev->buffers->open_command_params.command_buffer_ptr = + (u64)(unsigned long)__pa(pipe->command_buffer); + status = goldfish_cmd_locked(pipe, PIPE_CMD_OPEN); + spin_unlock_irqrestore(&dev->lock, flags); + if (status < 0) + goto err_cmd; + /* All is done, save the pipe into the file's private data field */ + file->private_data = pipe; + return 0; + +err_cmd: + spin_lock_irqsave(&dev->lock, flags); + dev->pipes[id] = NULL; +err_id_locked: + spin_unlock_irqrestore(&dev->lock, flags); + free_page((unsigned long)pipe->command_buffer); +err_pipe: + kfree(pipe); + return status; +} + +static int goldfish_pipe_release(struct inode *inode, struct file *filp) +{ + unsigned long flags; + struct goldfish_pipe *pipe = filp->private_data; + struct goldfish_pipe_dev *dev = pipe->dev; + + /* The guest is closing the channel, so tell the emulator right now */ + (void)goldfish_cmd(pipe, PIPE_CMD_CLOSE); + + spin_lock_irqsave(&dev->lock, flags); + dev->pipes[pipe->id] = NULL; + signalled_pipes_remove_locked(dev, pipe); + spin_unlock_irqrestore(&dev->lock, flags); + + filp->private_data = NULL; + free_page((unsigned long)pipe->command_buffer); + kfree(pipe); + return 0; +} + +static const struct file_operations goldfish_pipe_fops = { + .owner = THIS_MODULE, + .read = goldfish_pipe_read, + .write = goldfish_pipe_write, + .poll = goldfish_pipe_poll, + .open = goldfish_pipe_open, + .release = goldfish_pipe_release, +}; + +static struct miscdevice goldfish_pipe_dev = { + .minor = MISC_DYNAMIC_MINOR, + .name = "goldfish_pipe", + .fops = &goldfish_pipe_fops, +}; + +static int goldfish_pipe_device_init_v2(struct platform_device *pdev) +{ + char *page; + struct goldfish_pipe_dev *dev = pipe_dev; + int err = devm_request_irq(&pdev->dev, dev->irq, goldfish_pipe_interrupt, + IRQF_SHARED, "goldfish_pipe", dev); + if (err) { + dev_err(&pdev->dev, "unable to allocate IRQ for v2\n"); + return err; + } + + err = misc_register(&goldfish_pipe_dev); + if (err) { + dev_err(&pdev->dev, "unable to register v2 device\n"); + return err; + } + + dev->first_signalled_pipe = NULL; + dev->pipes_capacity = INITIAL_PIPES_CAPACITY; + dev->pipes = kcalloc(dev->pipes_capacity, sizeof(*dev->pipes), GFP_KERNEL); + if (!dev->pipes) + return -ENOMEM; + + /* + * We're going to pass two buffers, open_command_params and + * signalled_pipe_buffers, to the host. This means each of those buffers + * needs to be contained in a single physical page. The easiest choice is + * to just allocate a page and place the buffers in it. + */ + BUG_ON(sizeof(*dev->buffers) > PAGE_SIZE); + page = (char*)__get_free_page(GFP_KERNEL); + if (!page) { + kfree(dev->pipes); + return -ENOMEM; + } + dev->buffers = (struct goldfish_pipe_dev_buffers*)page; + + /* Send the buffer addresses to the host */ + { + u64 paddr = __pa(&dev->buffers->signalled_pipe_buffers); + writel((u32)(unsigned long)(paddr >> 32), dev->base + PIPE_REG_SIGNAL_BUFFER_HIGH); + writel((u32)(unsigned long)paddr, dev->base + PIPE_REG_SIGNAL_BUFFER); + writel((u32)MAX_SIGNALLED_PIPES, dev->base + PIPE_REG_SIGNAL_BUFFER_COUNT); + + paddr = __pa(&dev->buffers->open_command_params); + writel((u32)(unsigned long)(paddr >> 32), dev->base + PIPE_REG_OPEN_BUFFER_HIGH); + writel((u32)(unsigned long)paddr, dev->base + PIPE_REG_OPEN_BUFFER); + } + return 0; +} + +static void goldfish_pipe_device_deinit_v2(struct platform_device *pdev) { + struct goldfish_pipe_dev *dev = pipe_dev; + misc_deregister(&goldfish_pipe_dev); + kfree(dev->pipes); + free_page((unsigned long)dev->buffers); +} + +static int goldfish_pipe_probe(struct platform_device *pdev) +{ + int err; + struct resource *r; + struct goldfish_pipe_dev *dev = pipe_dev; + + BUG_ON(sizeof(struct goldfish_pipe_command) > PAGE_SIZE); + + /* not thread safe, but this should not happen */ + WARN_ON(dev->base != NULL); + + spin_lock_init(&dev->lock); + + r = platform_get_resource(pdev, IORESOURCE_MEM, 0); + if (r == NULL || resource_size(r) < PAGE_SIZE) { + dev_err(&pdev->dev, "can't allocate i/o page\n"); + return -EINVAL; + } + dev->base = devm_ioremap(&pdev->dev, r->start, PAGE_SIZE); + if (dev->base == NULL) { + dev_err(&pdev->dev, "ioremap failed\n"); + return -EINVAL; + } + + r = platform_get_resource(pdev, IORESOURCE_IRQ, 0); + if (r == NULL) { + err = -EINVAL; + goto error; + } + dev->irq = r->start; + + /* + * Exchange the versions with the host device + * + * Note: v1 driver used to not report its version, so we write it before + * reading device version back: this allows the host implementation to + * detect the old driver (if there was no version write before read). + */ + writel((u32)PIPE_DRIVER_VERSION, dev->base + PIPE_REG_VERSION); + dev->version = readl(dev->base + PIPE_REG_VERSION); + if (dev->version < PIPE_CURRENT_DEVICE_VERSION) { + /* initialize the old device version */ + err = goldfish_pipe_device_init_v1(pdev); + } else { + /* Host device supports the new interface */ + err = goldfish_pipe_device_init_v2(pdev); + } + if (!err) + return 0; + +error: + dev->base = NULL; + return err; +} + +static int goldfish_pipe_remove(struct platform_device *pdev) +{ + struct goldfish_pipe_dev *dev = pipe_dev; + if (dev->version < PIPE_CURRENT_DEVICE_VERSION) + goldfish_pipe_device_deinit_v1(pdev); + else + goldfish_pipe_device_deinit_v2(pdev); + dev->base = NULL; + return 0; +} + +static const struct acpi_device_id goldfish_pipe_acpi_match[] = { + { "GFSH0003", 0 }, + { }, +}; +MODULE_DEVICE_TABLE(acpi, goldfish_pipe_acpi_match); + +static const struct of_device_id goldfish_pipe_of_match[] = { + { .compatible = "google,android-pipe", }, + {}, +}; +MODULE_DEVICE_TABLE(of, goldfish_pipe_of_match); + +static struct platform_driver goldfish_pipe_driver = { + .probe = goldfish_pipe_probe, + .remove = goldfish_pipe_remove, + .driver = { + .name = "goldfish_pipe", + .of_match_table = goldfish_pipe_of_match, + .acpi_match_table = ACPI_PTR(goldfish_pipe_acpi_match), + } +}; + +module_platform_driver(goldfish_pipe_driver); +MODULE_AUTHOR("David Turner <digit@google.com>"); +MODULE_LICENSE("GPL");
diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c index b65ce75..60ee94e 100644 --- a/drivers/platform/x86/thinkpad_acpi.c +++ b/drivers/platform/x86/thinkpad_acpi.c
@@ -9526,7 +9526,7 @@ static struct ibm_init_struct ibms_init[] __initdata = { }, }; -static int __init set_ibm_param(const char *val, struct kernel_param *kp) +static int __init set_ibm_param(const char *val, const struct kernel_param *kp) { unsigned int i; struct ibm_struct *ibm;
diff --git a/drivers/power/supply/power_supply_sysfs.c b/drivers/power/supply/power_supply_sysfs.c index bcde8d1..fdb824f 100644 --- a/drivers/power/supply/power_supply_sysfs.c +++ b/drivers/power/supply/power_supply_sysfs.c
@@ -107,7 +107,10 @@ static ssize_t power_supply_show_property(struct device *dev, else if (off >= POWER_SUPPLY_PROP_MODEL_NAME) return sprintf(buf, "%s\n", value.strval); - return sprintf(buf, "%d\n", value.intval); + if (off == POWER_SUPPLY_PROP_CHARGE_COUNTER_EXT) + return sprintf(buf, "%lld\n", value.int64val); + else + return sprintf(buf, "%d\n", value.intval); } static ssize_t power_supply_store_property(struct device *dev, @@ -198,6 +201,12 @@ static struct device_attribute power_supply_attrs[] = { POWER_SUPPLY_ATTR(scope), POWER_SUPPLY_ATTR(charge_term_current), POWER_SUPPLY_ATTR(calibrate), + /* Local extensions */ + POWER_SUPPLY_ATTR(usb_hc), + POWER_SUPPLY_ATTR(usb_otg), + POWER_SUPPLY_ATTR(charge_enabled), + /* Local extensions of type int64_t */ + POWER_SUPPLY_ATTR(charge_counter_ext), /* Properties of type `const char *' */ POWER_SUPPLY_ATTR(model_name), POWER_SUPPLY_ATTR(manufacturer),
diff --git a/drivers/rtc/rtc-palmas.c b/drivers/rtc/rtc-palmas.c index 4bcfb88..34aea38 100644 --- a/drivers/rtc/rtc-palmas.c +++ b/drivers/rtc/rtc-palmas.c
@@ -45,6 +45,42 @@ struct palmas_rtc { /* Total number of RTC registers needed to set time*/ #define PALMAS_NUM_TIME_REGS (PALMAS_YEARS_REG - PALMAS_SECONDS_REG + 1) +/* + * Special bin2bcd mapping to deal with bcd storage of year. + * + * 0-69 -> 0xD0 + * 70-99 (1970 - 1999) -> 0xD0 - 0xF9 (correctly rolls to 0x00) + * 100-199 (2000 - 2099) -> 0x00 - 0x99 (does not roll to 0xA0 :-( ) + * 200-229 (2100 - 2129) -> 0xA0 - 0xC9 (really for completeness) + * 230- -> 0xC9 + * + * Confirmed: the only transition that does not work correctly for this rtc + * clock is the transition from 2099 to 2100, it proceeds to 2000. We will + * accept this issue since the clock retains and transitions the year correctly + * in all other conditions. + */ +static unsigned char year_bin2bcd(int val) +{ + if (val < 70) + return 0xD0; + if (val < 100) + return bin2bcd(val - 20) | 0x80; /* KISS leverage of bin2bcd */ + if (val >= 230) + return 0xC9; + if (val >= 200) + return bin2bcd(val - 180) | 0x80; + return bin2bcd(val - 100); +} + +static int year_bcd2bin(unsigned char val) +{ + if (val >= 0xD0) + return bcd2bin(val & 0x7F) + 20; + if (val >= 0xA0) + return bcd2bin(val & 0x7F) + 180; + return bcd2bin(val) + 100; +} + static int palmas_rtc_read_time(struct device *dev, struct rtc_time *tm) { unsigned char rtc_data[PALMAS_NUM_TIME_REGS]; @@ -71,7 +107,7 @@ static int palmas_rtc_read_time(struct device *dev, struct rtc_time *tm) tm->tm_hour = bcd2bin(rtc_data[2]); tm->tm_mday = bcd2bin(rtc_data[3]); tm->tm_mon = bcd2bin(rtc_data[4]) - 1; - tm->tm_year = bcd2bin(rtc_data[5]) + 100; + tm->tm_year = year_bcd2bin(rtc_data[5]); return ret; } @@ -87,7 +123,7 @@ static int palmas_rtc_set_time(struct device *dev, struct rtc_time *tm) rtc_data[2] = bin2bcd(tm->tm_hour); rtc_data[3] = bin2bcd(tm->tm_mday); rtc_data[4] = bin2bcd(tm->tm_mon + 1); - rtc_data[5] = bin2bcd(tm->tm_year - 100); + rtc_data[5] = year_bin2bcd(tm->tm_year); /* Stop RTC while updating the RTC time registers */ ret = palmas_update_bits(palmas, PALMAS_RTC_BASE, PALMAS_RTC_CTRL_REG, @@ -142,7 +178,7 @@ static int palmas_rtc_read_alarm(struct device *dev, struct rtc_wkalrm *alm) alm->time.tm_hour = bcd2bin(alarm_data[2]); alm->time.tm_mday = bcd2bin(alarm_data[3]); alm->time.tm_mon = bcd2bin(alarm_data[4]) - 1; - alm->time.tm_year = bcd2bin(alarm_data[5]) + 100; + alm->time.tm_year = year_bcd2bin(alarm_data[5]); ret = palmas_read(palmas, PALMAS_RTC_BASE, PALMAS_RTC_INTERRUPTS_REG, &int_val); @@ -173,7 +209,7 @@ static int palmas_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *alm) alarm_data[2] = bin2bcd(alm->time.tm_hour); alarm_data[3] = bin2bcd(alm->time.tm_mday); alarm_data[4] = bin2bcd(alm->time.tm_mon + 1); - alarm_data[5] = bin2bcd(alm->time.tm_year - 100); + alarm_data[5] = year_bin2bcd(alm->time.tm_year); ret = palmas_bulk_write(palmas, PALMAS_RTC_BASE, PALMAS_ALARM_SECONDS_REG, alarm_data, PALMAS_NUM_TIME_REGS);
diff --git a/drivers/scsi/fcoe/fcoe_transport.c b/drivers/scsi/fcoe/fcoe_transport.c index 375c536..c5eb0c4 100644 --- a/drivers/scsi/fcoe/fcoe_transport.c +++ b/drivers/scsi/fcoe/fcoe_transport.c
@@ -32,13 +32,13 @@ MODULE_AUTHOR("Open-FCoE.org"); MODULE_DESCRIPTION("FIP discovery protocol and FCoE transport for FCoE HBAs"); MODULE_LICENSE("GPL v2"); -static int fcoe_transport_create(const char *, struct kernel_param *); -static int fcoe_transport_destroy(const char *, struct kernel_param *); +static int fcoe_transport_create(const char *, const struct kernel_param *); +static int fcoe_transport_destroy(const char *, const struct kernel_param *); static int fcoe_transport_show(char *buffer, const struct kernel_param *kp); static struct fcoe_transport *fcoe_transport_lookup(struct net_device *device); static struct fcoe_transport *fcoe_netdev_map_lookup(struct net_device *device); -static int fcoe_transport_enable(const char *, struct kernel_param *); -static int fcoe_transport_disable(const char *, struct kernel_param *); +static int fcoe_transport_enable(const char *, const struct kernel_param *); +static int fcoe_transport_disable(const char *, const struct kernel_param *); static int libfcoe_device_notification(struct notifier_block *notifier, ulong event, void *ptr); @@ -865,7 +865,8 @@ EXPORT_SYMBOL(fcoe_ctlr_destroy_store); * * Returns: 0 for success */ -static int fcoe_transport_create(const char *buffer, struct kernel_param *kp) +static int fcoe_transport_create(const char *buffer, + const struct kernel_param *kp) { int rc = -ENODEV; struct net_device *netdev = NULL; @@ -930,7 +931,8 @@ static int fcoe_transport_create(const char *buffer, struct kernel_param *kp) * * Returns: 0 for success */ -static int fcoe_transport_destroy(const char *buffer, struct kernel_param *kp) +static int fcoe_transport_destroy(const char *buffer, + const struct kernel_param *kp) { int rc = -ENODEV; struct net_device *netdev = NULL; @@ -974,7 +976,8 @@ static int fcoe_transport_destroy(const char *buffer, struct kernel_param *kp) * * Returns: 0 for success */ -static int fcoe_transport_disable(const char *buffer, struct kernel_param *kp) +static int fcoe_transport_disable(const char *buffer, + const struct kernel_param *kp) { int rc = -ENODEV; struct net_device *netdev = NULL; @@ -1008,7 +1011,8 @@ static int fcoe_transport_disable(const char *buffer, struct kernel_param *kp) * * Returns: 0 for success */ -static int fcoe_transport_enable(const char *buffer, struct kernel_param *kp) +static int fcoe_transport_enable(const char *buffer, + const struct kernel_param *kp) { int rc = -ENODEV; struct net_device *netdev = NULL;
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index a1a5ceb..8e83e34 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -105,7 +105,7 @@ _base_get_ioc_facts(struct MPT3SAS_ADAPTER *ioc); * */ static int -_scsih_set_fwfault_debug(const char *val, struct kernel_param *kp) +_scsih_set_fwfault_debug(const char *val, const struct kernel_param *kp) { int ret = param_set_int(val, kp); struct MPT3SAS_ADAPTER *ioc;
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index ec48c01..caa0045 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -281,7 +281,7 @@ struct _scsi_io_transfer { * Note: The logging levels are defined in mpt3sas_debug.h. */ static int -_scsih_set_debug_level(const char *val, struct kernel_param *kp) +_scsih_set_debug_level(const char *val, const struct kernel_param *kp) { int ret = param_set_int(val, kp); struct MPT3SAS_ADAPTER *ioc;
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index f857086..d0f6f2c 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c
@@ -41,6 +41,7 @@ #include <linux/devfreq.h> #include <linux/nls.h> #include <linux/of.h> +#include <linux/blkdev.h> #include "ufshcd.h" #include "ufs_quirks.h" #include "unipro.h" @@ -1469,6 +1470,17 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd) clear_bit_unlock(tag, &hba->lrb_in_use); goto out; } + + /* IO svc time latency histogram */ + if (hba != NULL && cmd->request != NULL) { + if (hba->latency_hist_enabled && + (cmd->request->cmd_type == REQ_TYPE_FS)) { + cmd->request->lat_hist_io_start = ktime_get(); + cmd->request->lat_hist_enabled = 1; + } else + cmd->request->lat_hist_enabled = 0; + } + WARN_ON(hba->clk_gating.state != CLKS_ON); lrbp = &hba->lrb[tag]; @@ -3677,6 +3689,7 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba, struct scsi_cmnd *cmd; int result; int index; + struct request *req; for_each_set_bit(index, &completed_reqs, hba->nutrs) { lrbp = &hba->lrb[index]; @@ -3688,6 +3701,22 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba, /* Mark completed command as NULL in LRB */ lrbp->cmd = NULL; clear_bit_unlock(index, &hba->lrb_in_use); + req = cmd->request; + if (req) { + /* Update IO svc time latency histogram */ + if (req->lat_hist_enabled) { + ktime_t completion; + u_int64_t delta_us; + + completion = ktime_get(); + delta_us = ktime_us_delta(completion, + req->lat_hist_io_start); + blk_update_latency_hist( + (rq_data_dir(req) == READ) ? + &hba->io_lat_read : + &hba->io_lat_write, delta_us); + } + } /* Do not touch lrbp after scsi done */ cmd->scsi_done(cmd); __ufshcd_release(hba); @@ -6467,6 +6496,61 @@ int ufshcd_shutdown(struct ufs_hba *hba) } EXPORT_SYMBOL(ufshcd_shutdown); +/* + * Values permitted 0, 1, 2. + * 0 -> Disable IO latency histograms (default) + * 1 -> Enable IO latency histograms + * 2 -> Zero out IO latency histograms + */ +static ssize_t +latency_hist_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct ufs_hba *hba = dev_get_drvdata(dev); + long value; + + if (kstrtol(buf, 0, &value)) + return -EINVAL; + if (value == BLK_IO_LAT_HIST_ZERO) { + memset(&hba->io_lat_read, 0, sizeof(hba->io_lat_read)); + memset(&hba->io_lat_write, 0, sizeof(hba->io_lat_write)); + } else if (value == BLK_IO_LAT_HIST_ENABLE || + value == BLK_IO_LAT_HIST_DISABLE) + hba->latency_hist_enabled = value; + return count; +} + +ssize_t +latency_hist_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct ufs_hba *hba = dev_get_drvdata(dev); + size_t written_bytes; + + written_bytes = blk_latency_hist_show("Read", &hba->io_lat_read, + buf, PAGE_SIZE); + written_bytes += blk_latency_hist_show("Write", &hba->io_lat_write, + buf + written_bytes, PAGE_SIZE - written_bytes); + + return written_bytes; +} + +static DEVICE_ATTR(latency_hist, S_IRUGO | S_IWUSR, + latency_hist_show, latency_hist_store); + +static void +ufshcd_init_latency_hist(struct ufs_hba *hba) +{ + if (device_create_file(hba->dev, &dev_attr_latency_hist)) + dev_err(hba->dev, "Failed to create latency_hist sysfs entry\n"); +} + +static void +ufshcd_exit_latency_hist(struct ufs_hba *hba) +{ + device_create_file(hba->dev, &dev_attr_latency_hist); +} + /** * ufshcd_remove - de-allocate SCSI host and host memory space * data structure memory @@ -6482,6 +6566,7 @@ void ufshcd_remove(struct ufs_hba *hba) scsi_host_put(hba->host); ufshcd_exit_clk_gating(hba); + ufshcd_exit_latency_hist(hba); if (ufshcd_is_clkscaling_enabled(hba)) devfreq_remove_device(hba->devfreq); ufshcd_hba_exit(hba); @@ -6798,6 +6883,8 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq) /* Hold auto suspend until async scan completes */ pm_runtime_get_sync(dev); + ufshcd_init_latency_hist(hba); + /* * We are assuming that device wasn't put in sleep/power-down * state exclusively during the boot stage before kernel. @@ -6814,6 +6901,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq) scsi_remove_host(hba->host); exit_gating: ufshcd_exit_clk_gating(hba); + ufshcd_exit_latency_hist(hba); out_disable: hba->is_irq_enabled = false; scsi_host_put(host);
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h index 6dbd2e1..096e667 100644 --- a/drivers/scsi/ufs/ufshcd.h +++ b/drivers/scsi/ufs/ufshcd.h
@@ -575,6 +575,10 @@ struct ufs_hba { bool is_urgent_bkops_lvl_checked; struct ufs_desc_size desc_size; + + int latency_hist_enabled; + struct io_latency_state io_lat_read; + struct io_latency_state io_lat_write; }; /* Returns true if clocks can be gated. Otherwise false */
diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig index 6c00d6f..a17c483 100644 --- a/drivers/staging/android/Kconfig +++ b/drivers/staging/android/Kconfig
@@ -24,8 +24,28 @@ scripts (/init.rc), and it defines priority values with minimum free memory size for each priority. +config ANDROID_LOW_MEMORY_KILLER_AUTODETECT_OOM_ADJ_VALUES + bool "Android Low Memory Killer: detect oom_adj values" + depends on ANDROID_LOW_MEMORY_KILLER + default y + ---help--- + Detect oom_adj values written to + /sys/module/lowmemorykiller/parameters/adj and convert them + to oom_score_adj values. + +config ANDROID_VSOC + tristate "Android Virtual SoC support" + default n + depends on PCI_MSI + ---help--- + This option adds support for the Virtual SoC driver needed to boot + a 'cuttlefish' Android image inside QEmu. The driver interacts with + a QEmu ivshmem device. If built as a module, it will be called vsoc. + source "drivers/staging/android/ion/Kconfig" +source "drivers/staging/android/fiq_debugger/Kconfig" + endif # if ANDROID endmenu
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile index 7ed1be7..93c5f5a 100644 --- a/drivers/staging/android/Makefile +++ b/drivers/staging/android/Makefile
@@ -1,6 +1,8 @@ ccflags-y += -I$(src) # needed for trace events obj-y += ion/ +obj-$(CONFIG_FIQ_DEBUGGER) += fiq_debugger/ obj-$(CONFIG_ASHMEM) += ashmem.o obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER) += lowmemorykiller.o +obj-$(CONFIG_ANDROID_VSOC) += vsoc.o
diff --git a/drivers/staging/android/TODO b/drivers/staging/android/TODO index 64d8c87..edfb680 100644 --- a/drivers/staging/android/TODO +++ b/drivers/staging/android/TODO
@@ -33,5 +33,14 @@ - clean up and ABI check for security issues - move it to drivers/base/dma-buf +vsoc.c, uapi/vsoc_shm.h + - The current driver uses the same wait queue for all of the futexes in a + region. This will cause false wakeups in regions with a large number of + waiting threads. We should eventually use multiple queues and select the + queue based on the region. + - Add debugfs support for examining the permissions of regions. + - Remove VSOC_WAIT_FOR_INCOMING_INTERRUPT ioctl. This functionality has been + superseded by the futex and is there for legacy reasons. + Please send patches to Greg Kroah-Hartman <greg@kroah.com> and Cc: Arve Hjønnevåg <arve@android.com> and Riley Andrews <riandrews@android.com>
diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c index c6314d1..5af176b 100644 --- a/drivers/staging/android/ashmem.c +++ b/drivers/staging/android/ashmem.c
@@ -415,22 +415,14 @@ static int ashmem_mmap(struct file *file, struct vm_area_struct *vma) } get_file(asma->file); - /* - * XXX - Reworked to use shmem_zero_setup() instead of - * shmem_set_file while we're in staging. -jstultz - */ - if (vma->vm_flags & VM_SHARED) { - ret = shmem_zero_setup(vma); - if (ret) { - fput(asma->file); - goto out; - } + if (vma->vm_flags & VM_SHARED) + shmem_set_file(vma, asma->file); + else { + if (vma->vm_file) + fput(vma->vm_file); + vma->vm_file = asma->file; } - if (vma->vm_file) - fput(vma->vm_file); - vma->vm_file = asma->file; - out: mutex_unlock(&ashmem_mutex); return ret; @@ -467,9 +459,9 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) loff_t start = range->pgstart * PAGE_SIZE; loff_t end = (range->pgend + 1) * PAGE_SIZE; - vfs_fallocate(range->asma->file, - FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, - start, end - start); + range->asma->file->f_op->fallocate(range->asma->file, + FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, + start, end - start); range->purged = ASHMEM_WAS_PURGED; lru_del(range);
diff --git a/drivers/staging/android/fiq_debugger/Kconfig b/drivers/staging/android/fiq_debugger/Kconfig new file mode 100644 index 0000000..60fc224 --- /dev/null +++ b/drivers/staging/android/fiq_debugger/Kconfig
@@ -0,0 +1,58 @@ +config FIQ_DEBUGGER + bool "FIQ Mode Serial Debugger" + default n + depends on ARM || ARM64 + help + The FIQ serial debugger can accept commands even when the + kernel is unresponsive due to being stuck with interrupts + disabled. + +config FIQ_DEBUGGER_NO_SLEEP + bool "Keep serial debugger active" + depends on FIQ_DEBUGGER + default n + help + Enables the serial debugger at boot. Passing + fiq_debugger.no_sleep on the kernel commandline will + override this config option. + +config FIQ_DEBUGGER_WAKEUP_IRQ_ALWAYS_ON + bool "Don't disable wakeup IRQ when debugger is active" + depends on FIQ_DEBUGGER + default n + help + Don't disable the wakeup irq when enabling the uart clock. This will + cause extra interrupts, but it makes the serial debugger usable with + on some MSM radio builds that ignore the uart clock request in power + collapse. + +config FIQ_DEBUGGER_CONSOLE + bool "Console on FIQ Serial Debugger port" + depends on FIQ_DEBUGGER + default n + help + Enables a console so that printk messages are displayed on + the debugger serial port as the occur. + +config FIQ_DEBUGGER_CONSOLE_DEFAULT_ENABLE + bool "Put the FIQ debugger into console mode by default" + depends on FIQ_DEBUGGER_CONSOLE + default n + help + If enabled, this puts the fiq debugger into console mode by default. + Otherwise, the fiq debugger will start out in debug mode. + +config FIQ_DEBUGGER_UART_OVERLAY + bool "Install uart DT overlay" + depends on FIQ_DEBUGGER + select OF_OVERLAY + default n + help + If enabled, fiq debugger is calling fiq_debugger_uart_overlay() + that will apply overlay uart_overlay@0 to disable proper uart. + +config FIQ_WATCHDOG + bool + select FIQ_DEBUGGER + select PSTORE_RAM + default n
diff --git a/drivers/staging/android/fiq_debugger/Makefile b/drivers/staging/android/fiq_debugger/Makefile new file mode 100644 index 0000000..a7ca487 --- /dev/null +++ b/drivers/staging/android/fiq_debugger/Makefile
@@ -0,0 +1,4 @@ +obj-y += fiq_debugger.o +obj-$(CONFIG_ARM) += fiq_debugger_arm.o +obj-$(CONFIG_ARM64) += fiq_debugger_arm64.o +obj-$(CONFIG_FIQ_WATCHDOG) += fiq_watchdog.o
diff --git a/drivers/staging/android/fiq_debugger/fiq_debugger.c b/drivers/staging/android/fiq_debugger/fiq_debugger.c new file mode 100644 index 0000000..d9bc1253 --- /dev/null +++ b/drivers/staging/android/fiq_debugger/fiq_debugger.c
@@ -0,0 +1,1246 @@ +/* + * drivers/staging/android/fiq_debugger.c + * + * Serial Debugger Interface accessed through an FIQ interrupt. + * + * Copyright (C) 2008 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <stdarg.h> +#include <linux/module.h> +#include <linux/io.h> +#include <linux/console.h> +#include <linux/interrupt.h> +#include <linux/clk.h> +#include <linux/platform_device.h> +#include <linux/kernel_stat.h> +#include <linux/kmsg_dump.h> +#include <linux/irq.h> +#include <linux/delay.h> +#include <linux/reboot.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/smp.h> +#include <linux/timer.h> +#include <linux/tty.h> +#include <linux/tty_flip.h> + +#ifdef CONFIG_FIQ_GLUE +#include <asm/fiq_glue.h> +#endif + +#ifdef CONFIG_FIQ_DEBUGGER_UART_OVERLAY +#include <linux/of.h> +#endif + +#include <linux/uaccess.h> + +#include "fiq_debugger.h" +#include "fiq_debugger_priv.h" +#include "fiq_debugger_ringbuf.h" + +#define DEBUG_MAX 64 +#define MAX_UNHANDLED_FIQ_COUNT 1000000 + +#define MAX_FIQ_DEBUGGER_PORTS 4 + +struct fiq_debugger_state { +#ifdef CONFIG_FIQ_GLUE + struct fiq_glue_handler handler; +#endif + struct fiq_debugger_output output; + + int fiq; + int uart_irq; + int signal_irq; + int wakeup_irq; + bool wakeup_irq_no_set_wake; + struct clk *clk; + struct fiq_debugger_pdata *pdata; + struct platform_device *pdev; + + char debug_cmd[DEBUG_MAX]; + int debug_busy; + int debug_abort; + + char debug_buf[DEBUG_MAX]; + int debug_count; + + bool no_sleep; + bool debug_enable; + bool ignore_next_wakeup_irq; + struct timer_list sleep_timer; + spinlock_t sleep_timer_lock; + bool uart_enabled; + struct wakeup_source debugger_wake_src; + bool console_enable; + int current_cpu; + atomic_t unhandled_fiq_count; + bool in_fiq; + + struct work_struct work; + spinlock_t work_lock; + char work_cmd[DEBUG_MAX]; + +#ifdef CONFIG_FIQ_DEBUGGER_CONSOLE + spinlock_t console_lock; + struct console console; + struct tty_port tty_port; + struct fiq_debugger_ringbuf *tty_rbuf; + bool syslog_dumping; +#endif + + unsigned int last_irqs[NR_IRQS]; + unsigned int last_local_timer_irqs[NR_CPUS]; +}; + +#ifdef CONFIG_FIQ_DEBUGGER_CONSOLE +struct tty_driver *fiq_tty_driver; +#endif + +#ifdef CONFIG_FIQ_DEBUGGER_NO_SLEEP +static bool initial_no_sleep = true; +#else +static bool initial_no_sleep; +#endif + +#ifdef CONFIG_FIQ_DEBUGGER_CONSOLE_DEFAULT_ENABLE +static bool initial_debug_enable = true; +static bool initial_console_enable = true; +#else +static bool initial_debug_enable; +static bool initial_console_enable; +#endif + +static bool fiq_kgdb_enable; +static bool fiq_debugger_disable; + +module_param_named(no_sleep, initial_no_sleep, bool, 0644); +module_param_named(debug_enable, initial_debug_enable, bool, 0644); +module_param_named(console_enable, initial_console_enable, bool, 0644); +module_param_named(kgdb_enable, fiq_kgdb_enable, bool, 0644); +module_param_named(disable, fiq_debugger_disable, bool, 0644); + +#ifdef CONFIG_FIQ_DEBUGGER_WAKEUP_IRQ_ALWAYS_ON +static inline +void fiq_debugger_enable_wakeup_irq(struct fiq_debugger_state *state) {} +static inline +void fiq_debugger_disable_wakeup_irq(struct fiq_debugger_state *state) {} +#else +static inline +void fiq_debugger_enable_wakeup_irq(struct fiq_debugger_state *state) +{ + if (state->wakeup_irq < 0) + return; + enable_irq(state->wakeup_irq); + if (!state->wakeup_irq_no_set_wake) + enable_irq_wake(state->wakeup_irq); +} +static inline +void fiq_debugger_disable_wakeup_irq(struct fiq_debugger_state *state) +{ + if (state->wakeup_irq < 0) + return; + disable_irq_nosync(state->wakeup_irq); + if (!state->wakeup_irq_no_set_wake) + disable_irq_wake(state->wakeup_irq); +} +#endif + +static inline bool fiq_debugger_have_fiq(struct fiq_debugger_state *state) +{ + return (state->fiq >= 0); +} + +#ifdef CONFIG_FIQ_GLUE +static void fiq_debugger_force_irq(struct fiq_debugger_state *state) +{ + unsigned int irq = state->signal_irq; + + if (WARN_ON(!fiq_debugger_have_fiq(state))) + return; + if (state->pdata->force_irq) { + state->pdata->force_irq(state->pdev, irq); + } else { + struct irq_chip *chip = irq_get_chip(irq); + if (chip && chip->irq_retrigger) + chip->irq_retrigger(irq_get_irq_data(irq)); + } +} +#endif + +static void fiq_debugger_uart_enable(struct fiq_debugger_state *state) +{ + if (state->clk) + clk_enable(state->clk); + if (state->pdata->uart_enable) + state->pdata->uart_enable(state->pdev); +} + +static void fiq_debugger_uart_disable(struct fiq_debugger_state *state) +{ + if (state->pdata->uart_disable) + state->pdata->uart_disable(state->pdev); + if (state->clk) + clk_disable(state->clk); +} + +static void fiq_debugger_uart_flush(struct fiq_debugger_state *state) +{ + if (state->pdata->uart_flush) + state->pdata->uart_flush(state->pdev); +} + +static void fiq_debugger_putc(struct fiq_debugger_state *state, char c) +{ + state->pdata->uart_putc(state->pdev, c); +} + +static void fiq_debugger_puts(struct fiq_debugger_state *state, char *s) +{ + unsigned c; + while ((c = *s++)) { + if (c == '\n') + fiq_debugger_putc(state, '\r'); + fiq_debugger_putc(state, c); + } +} + +static void fiq_debugger_prompt(struct fiq_debugger_state *state) +{ + fiq_debugger_puts(state, "debug> "); +} + +static void fiq_debugger_dump_kernel_log(struct fiq_debugger_state *state) +{ + char buf[512]; + size_t len; + struct kmsg_dumper dumper = { .active = true }; + + + kmsg_dump_rewind_nolock(&dumper); + while (kmsg_dump_get_line_nolock(&dumper, true, buf, + sizeof(buf) - 1, &len)) { + buf[len] = 0; + fiq_debugger_puts(state, buf); + } +} + +static void fiq_debugger_printf(struct fiq_debugger_output *output, + const char *fmt, ...) +{ + struct fiq_debugger_state *state; + char buf[256]; + va_list ap; + + state = container_of(output, struct fiq_debugger_state, output); + va_start(ap, fmt); + vsnprintf(buf, sizeof(buf), fmt, ap); + va_end(ap); + + fiq_debugger_puts(state, buf); +} + +/* Safe outside fiq context */ +static int fiq_debugger_printf_nfiq(void *cookie, const char *fmt, ...) +{ + struct fiq_debugger_state *state = cookie; + char buf[256]; + va_list ap; + unsigned long irq_flags; + + va_start(ap, fmt); + vsnprintf(buf, 128, fmt, ap); + va_end(ap); + + local_irq_save(irq_flags); + fiq_debugger_puts(state, buf); + fiq_debugger_uart_flush(state); + local_irq_restore(irq_flags); + return state->debug_abort; +} + +static void fiq_debugger_dump_irqs(struct fiq_debugger_state *state) +{ + int n; + struct irq_desc *desc; + + fiq_debugger_printf(&state->output, + "irqnr total since-last status name\n"); + for_each_irq_desc(n, desc) { + struct irqaction *act = desc->action; + if (!act && !kstat_irqs(n)) + continue; + fiq_debugger_printf(&state->output, "%5d: %10u %11u %8x %s\n", n, + kstat_irqs(n), + kstat_irqs(n) - state->last_irqs[n], + desc->status_use_accessors, + (act && act->name) ? act->name : "???"); + state->last_irqs[n] = kstat_irqs(n); + } +} + +static void fiq_debugger_do_ps(struct fiq_debugger_state *state) +{ + struct task_struct *g; + struct task_struct *p; + unsigned task_state; + static const char stat_nam[] = "RSDTtZX"; + + fiq_debugger_printf(&state->output, "pid ppid prio task pc\n"); + read_lock(&tasklist_lock); + do_each_thread(g, p) { + task_state = p->state ? __ffs(p->state) + 1 : 0; + fiq_debugger_printf(&state->output, + "%5d %5d %4d ", p->pid, p->parent->pid, p->prio); + fiq_debugger_printf(&state->output, "%-13.13s %c", p->comm, + task_state >= sizeof(stat_nam) ? '?' : stat_nam[task_state]); + if (task_state == TASK_RUNNING) + fiq_debugger_printf(&state->output, " running\n"); + else + fiq_debugger_printf(&state->output, " %08lx\n", + thread_saved_pc(p)); + } while_each_thread(g, p); + read_unlock(&tasklist_lock); +} + +#ifdef CONFIG_FIQ_DEBUGGER_CONSOLE +static void fiq_debugger_begin_syslog_dump(struct fiq_debugger_state *state) +{ + state->syslog_dumping = true; +} + +static void fiq_debugger_end_syslog_dump(struct fiq_debugger_state *state) +{ + state->syslog_dumping = false; +} +#else +extern int do_syslog(int type, char __user *bug, int count); +static void fiq_debugger_begin_syslog_dump(struct fiq_debugger_state *state) +{ + do_syslog(5 /* clear */, NULL, 0); +} + +static void fiq_debugger_end_syslog_dump(struct fiq_debugger_state *state) +{ + fiq_debugger_dump_kernel_log(state); +} +#endif + +static void fiq_debugger_do_sysrq(struct fiq_debugger_state *state, char rq) +{ + if ((rq == 'g' || rq == 'G') && !fiq_kgdb_enable) { + fiq_debugger_printf(&state->output, "sysrq-g blocked\n"); + return; + } + fiq_debugger_begin_syslog_dump(state); + handle_sysrq(rq); + fiq_debugger_end_syslog_dump(state); +} + +#ifdef CONFIG_KGDB +static void fiq_debugger_do_kgdb(struct fiq_debugger_state *state) +{ + if (!fiq_kgdb_enable) { + fiq_debugger_printf(&state->output, "kgdb through fiq debugger not enabled\n"); + return; + } + + fiq_debugger_printf(&state->output, "enabling console and triggering kgdb\n"); + state->console_enable = true; + handle_sysrq('g'); +} +#endif + +static void fiq_debugger_schedule_work(struct fiq_debugger_state *state, + char *cmd) +{ + unsigned long flags; + + spin_lock_irqsave(&state->work_lock, flags); + if (state->work_cmd[0] != '\0') { + fiq_debugger_printf(&state->output, "work command processor busy\n"); + spin_unlock_irqrestore(&state->work_lock, flags); + return; + } + + strlcpy(state->work_cmd, cmd, sizeof(state->work_cmd)); + spin_unlock_irqrestore(&state->work_lock, flags); + + schedule_work(&state->work); +} + +static void fiq_debugger_work(struct work_struct *work) +{ + struct fiq_debugger_state *state; + char work_cmd[DEBUG_MAX]; + char *cmd; + unsigned long flags; + + state = container_of(work, struct fiq_debugger_state, work); + + spin_lock_irqsave(&state->work_lock, flags); + + strlcpy(work_cmd, state->work_cmd, sizeof(work_cmd)); + state->work_cmd[0] = '\0'; + + spin_unlock_irqrestore(&state->work_lock, flags); + + cmd = work_cmd; + if (!strncmp(cmd, "reboot", 6)) { + cmd += 6; + while (*cmd == ' ') + cmd++; + if (*cmd != '\0') + kernel_restart(cmd); + else + kernel_restart(NULL); + } else { + fiq_debugger_printf(&state->output, "unknown work command '%s'\n", + work_cmd); + } +} + +/* This function CANNOT be called in FIQ context */ +static void fiq_debugger_irq_exec(struct fiq_debugger_state *state, char *cmd) +{ + if (!strcmp(cmd, "ps")) + fiq_debugger_do_ps(state); + if (!strcmp(cmd, "sysrq")) + fiq_debugger_do_sysrq(state, 'h'); + if (!strncmp(cmd, "sysrq ", 6)) + fiq_debugger_do_sysrq(state, cmd[6]); +#ifdef CONFIG_KGDB + if (!strcmp(cmd, "kgdb")) + fiq_debugger_do_kgdb(state); +#endif + if (!strncmp(cmd, "reboot", 6)) + fiq_debugger_schedule_work(state, cmd); +} + +static void fiq_debugger_help(struct fiq_debugger_state *state) +{ + fiq_debugger_printf(&state->output, + "FIQ Debugger commands:\n" + " pc PC status\n" + " regs Register dump\n" + " allregs Extended Register dump\n" + " bt Stack trace\n" + " reboot [<c>] Reboot with command <c>\n" + " reset [<c>] Hard reset with command <c>\n" + " irqs Interupt status\n" + " kmsg Kernel log\n" + " version Kernel version\n"); + fiq_debugger_printf(&state->output, + " sleep Allow sleep while in FIQ\n" + " nosleep Disable sleep while in FIQ\n" + " console Switch terminal to console\n" + " cpu Current CPU\n" + " cpu <number> Switch to CPU<number>\n"); + fiq_debugger_printf(&state->output, + " ps Process list\n" + " sysrq sysrq options\n" + " sysrq <param> Execute sysrq with <param>\n"); +#ifdef CONFIG_KGDB + fiq_debugger_printf(&state->output, + " kgdb Enter kernel debugger\n"); +#endif +} + +static void fiq_debugger_take_affinity(void *info) +{ + struct fiq_debugger_state *state = info; + struct cpumask cpumask; + + cpumask_clear(&cpumask); + cpumask_set_cpu(get_cpu(), &cpumask); + + irq_set_affinity(state->uart_irq, &cpumask); +} + +static void fiq_debugger_switch_cpu(struct fiq_debugger_state *state, int cpu) +{ + if (!fiq_debugger_have_fiq(state)) + smp_call_function_single(cpu, fiq_debugger_take_affinity, state, + false); + state->current_cpu = cpu; +} + +static bool fiq_debugger_fiq_exec(struct fiq_debugger_state *state, + const char *cmd, const struct pt_regs *regs, + void *svc_sp) +{ + bool signal_helper = false; + + if (!strcmp(cmd, "help") || !strcmp(cmd, "?")) { + fiq_debugger_help(state); + } else if (!strcmp(cmd, "pc")) { + fiq_debugger_dump_pc(&state->output, regs); + } else if (!strcmp(cmd, "regs")) { + fiq_debugger_dump_regs(&state->output, regs); + } else if (!strcmp(cmd, "allregs")) { + fiq_debugger_dump_allregs(&state->output, regs); + } else if (!strcmp(cmd, "bt")) { + fiq_debugger_dump_stacktrace(&state->output, regs, 100, svc_sp); + } else if (!strncmp(cmd, "reset", 5)) { + cmd += 5; + while (*cmd == ' ') + cmd++; + if (*cmd) { + char tmp_cmd[32]; + strlcpy(tmp_cmd, cmd, sizeof(tmp_cmd)); + machine_restart(tmp_cmd); + } else { + machine_restart(NULL); + } + } else if (!strcmp(cmd, "irqs")) { + fiq_debugger_dump_irqs(state); + } else if (!strcmp(cmd, "kmsg")) { + fiq_debugger_dump_kernel_log(state); + } else if (!strcmp(cmd, "version")) { + fiq_debugger_printf(&state->output, "%s\n", linux_banner); + } else if (!strcmp(cmd, "sleep")) { + state->no_sleep = false; + fiq_debugger_printf(&state->output, "enabling sleep\n"); + } else if (!strcmp(cmd, "nosleep")) { + state->no_sleep = true; + fiq_debugger_printf(&state->output, "disabling sleep\n"); + } else if (!strcmp(cmd, "console")) { + fiq_debugger_printf(&state->output, "console mode\n"); + fiq_debugger_uart_flush(state); + state->console_enable = true; + } else if (!strcmp(cmd, "cpu")) { + fiq_debugger_printf(&state->output, "cpu %d\n", state->current_cpu); + } else if (!strncmp(cmd, "cpu ", 4)) { + unsigned long cpu = 0; + if (kstrtoul(cmd + 4, 10, &cpu) == 0) + fiq_debugger_switch_cpu(state, cpu); + else + fiq_debugger_printf(&state->output, "invalid cpu\n"); + fiq_debugger_printf(&state->output, "cpu %d\n", state->current_cpu); + } else { + if (state->debug_busy) { + fiq_debugger_printf(&state->output, + "command processor busy. trying to abort.\n"); + state->debug_abort = -1; + } else { + strcpy(state->debug_cmd, cmd); + state->debug_busy = 1; + } + + return true; + } + if (!state->console_enable) + fiq_debugger_prompt(state); + + return signal_helper; +} + +static void fiq_debugger_sleep_timer_expired(unsigned long data) +{ + struct fiq_debugger_state *state = (struct fiq_debugger_state *)data; + unsigned long flags; + + spin_lock_irqsave(&state->sleep_timer_lock, flags); + if (state->uart_enabled && !state->no_sleep) { + if (state->debug_enable && !state->console_enable) { + state->debug_enable = false; + fiq_debugger_printf_nfiq(state, + "suspending fiq debugger\n"); + } + state->ignore_next_wakeup_irq = true; + fiq_debugger_uart_disable(state); + state->uart_enabled = false; + fiq_debugger_enable_wakeup_irq(state); + } + __pm_relax(&state->debugger_wake_src); + spin_unlock_irqrestore(&state->sleep_timer_lock, flags); +} + +static void fiq_debugger_handle_wakeup(struct fiq_debugger_state *state) +{ + unsigned long flags; + + spin_lock_irqsave(&state->sleep_timer_lock, flags); + if (state->wakeup_irq >= 0 && state->ignore_next_wakeup_irq) { + state->ignore_next_wakeup_irq = false; + } else if (!state->uart_enabled) { + __pm_stay_awake(&state->debugger_wake_src); + fiq_debugger_uart_enable(state); + state->uart_enabled = true; + fiq_debugger_disable_wakeup_irq(state); + mod_timer(&state->sleep_timer, jiffies + HZ / 2); + } + spin_unlock_irqrestore(&state->sleep_timer_lock, flags); +} + +static irqreturn_t fiq_debugger_wakeup_irq_handler(int irq, void *dev) +{ + struct fiq_debugger_state *state = dev; + + if (!state->no_sleep) + fiq_debugger_puts(state, "WAKEUP\n"); + fiq_debugger_handle_wakeup(state); + + return IRQ_HANDLED; +} + +static +void fiq_debugger_handle_console_irq_context(struct fiq_debugger_state *state) +{ +#if defined(CONFIG_FIQ_DEBUGGER_CONSOLE) + if (state->tty_port.ops) { + int i; + int count = fiq_debugger_ringbuf_level(state->tty_rbuf); + for (i = 0; i < count; i++) { + int c = fiq_debugger_ringbuf_peek(state->tty_rbuf, 0); + tty_insert_flip_char(&state->tty_port, c, TTY_NORMAL); + if (!fiq_debugger_ringbuf_consume(state->tty_rbuf, 1)) + pr_warn("fiq tty failed to consume byte\n"); + } + tty_flip_buffer_push(&state->tty_port); + } +#endif +} + +static void fiq_debugger_handle_irq_context(struct fiq_debugger_state *state) +{ + if (!state->no_sleep) { + unsigned long flags; + + spin_lock_irqsave(&state->sleep_timer_lock, flags); + __pm_stay_awake(&state->debugger_wake_src); + mod_timer(&state->sleep_timer, jiffies + HZ * 5); + spin_unlock_irqrestore(&state->sleep_timer_lock, flags); + } + fiq_debugger_handle_console_irq_context(state); + if (state->debug_busy) { + fiq_debugger_irq_exec(state, state->debug_cmd); + if (!state->console_enable) + fiq_debugger_prompt(state); + state->debug_busy = 0; + } +} + +static int fiq_debugger_getc(struct fiq_debugger_state *state) +{ + return state->pdata->uart_getc(state->pdev); +} + +static bool fiq_debugger_handle_uart_interrupt(struct fiq_debugger_state *state, + int this_cpu, const struct pt_regs *regs, void *svc_sp) +{ + int c; + static int last_c; + int count = 0; + bool signal_helper = false; + + if (this_cpu != state->current_cpu) { + if (state->in_fiq) + return false; + + if (atomic_inc_return(&state->unhandled_fiq_count) != + MAX_UNHANDLED_FIQ_COUNT) + return false; + + fiq_debugger_printf(&state->output, + "fiq_debugger: cpu %d not responding, " + "reverting to cpu %d\n", state->current_cpu, + this_cpu); + + atomic_set(&state->unhandled_fiq_count, 0); + fiq_debugger_switch_cpu(state, this_cpu); + return false; + } + + state->in_fiq = true; + + while ((c = fiq_debugger_getc(state)) != FIQ_DEBUGGER_NO_CHAR) { + count++; + if (!state->debug_enable) { + if ((c == 13) || (c == 10)) { + state->debug_enable = true; + state->debug_count = 0; + fiq_debugger_prompt(state); + } + } else if (c == FIQ_DEBUGGER_BREAK) { + state->console_enable = false; + fiq_debugger_puts(state, "fiq debugger mode\n"); + state->debug_count = 0; + fiq_debugger_prompt(state); +#ifdef CONFIG_FIQ_DEBUGGER_CONSOLE + } else if (state->console_enable && state->tty_rbuf) { + fiq_debugger_ringbuf_push(state->tty_rbuf, c); + signal_helper = true; +#endif + } else if ((c >= ' ') && (c < 127)) { + if (state->debug_count < (DEBUG_MAX - 1)) { + state->debug_buf[state->debug_count++] = c; + fiq_debugger_putc(state, c); + } + } else if ((c == 8) || (c == 127)) { + if (state->debug_count > 0) { + state->debug_count--; + fiq_debugger_putc(state, 8); + fiq_debugger_putc(state, ' '); + fiq_debugger_putc(state, 8); + } + } else if ((c == 13) || (c == 10)) { + if (c == '\r' || (c == '\n' && last_c != '\r')) { + fiq_debugger_putc(state, '\r'); + fiq_debugger_putc(state, '\n'); + } + if (state->debug_count) { + state->debug_buf[state->debug_count] = 0; + state->debug_count = 0; + signal_helper |= + fiq_debugger_fiq_exec(state, + state->debug_buf, + regs, svc_sp); + } else { + fiq_debugger_prompt(state); + } + } + last_c = c; + } + if (!state->console_enable) + fiq_debugger_uart_flush(state); + if (state->pdata->fiq_ack) + state->pdata->fiq_ack(state->pdev, state->fiq); + + /* poke sleep timer if necessary */ + if (state->debug_enable && !state->no_sleep) + signal_helper = true; + + atomic_set(&state->unhandled_fiq_count, 0); + state->in_fiq = false; + + return signal_helper; +} + +#ifdef CONFIG_FIQ_GLUE +static void fiq_debugger_fiq(struct fiq_glue_handler *h, + const struct pt_regs *regs, void *svc_sp) +{ + struct fiq_debugger_state *state = + container_of(h, struct fiq_debugger_state, handler); + unsigned int this_cpu = THREAD_INFO(svc_sp)->cpu; + bool need_irq; + + need_irq = fiq_debugger_handle_uart_interrupt(state, this_cpu, regs, + svc_sp); + if (need_irq) + fiq_debugger_force_irq(state); +} +#endif + +/* + * When not using FIQs, we only use this single interrupt as an entry point. + * This just effectively takes over the UART interrupt and does all the work + * in this context. + */ +static irqreturn_t fiq_debugger_uart_irq(int irq, void *dev) +{ + struct fiq_debugger_state *state = dev; + bool not_done; + + fiq_debugger_handle_wakeup(state); + + /* handle the debugger irq in regular context */ + not_done = fiq_debugger_handle_uart_interrupt(state, smp_processor_id(), + get_irq_regs(), + current_thread_info()); + if (not_done) + fiq_debugger_handle_irq_context(state); + + return IRQ_HANDLED; +} + +/* + * If FIQs are used, not everything can happen in fiq context. + * FIQ handler does what it can and then signals this interrupt to finish the + * job in irq context. + */ +static irqreturn_t fiq_debugger_signal_irq(int irq, void *dev) +{ + struct fiq_debugger_state *state = dev; + + if (state->pdata->force_irq_ack) + state->pdata->force_irq_ack(state->pdev, state->signal_irq); + + fiq_debugger_handle_irq_context(state); + + return IRQ_HANDLED; +} + +#ifdef CONFIG_FIQ_GLUE +static void fiq_debugger_resume(struct fiq_glue_handler *h) +{ + struct fiq_debugger_state *state = + container_of(h, struct fiq_debugger_state, handler); + if (state->pdata->uart_resume) + state->pdata->uart_resume(state->pdev); +} +#endif + +#if defined(CONFIG_FIQ_DEBUGGER_CONSOLE) +struct tty_driver *fiq_debugger_console_device(struct console *co, int *index) +{ + *index = co->index; + return fiq_tty_driver; +} + +static void fiq_debugger_console_write(struct console *co, + const char *s, unsigned int count) +{ + struct fiq_debugger_state *state; + unsigned long flags; + + state = container_of(co, struct fiq_debugger_state, console); + + if (!state->console_enable && !state->syslog_dumping) + return; + + fiq_debugger_uart_enable(state); + spin_lock_irqsave(&state->console_lock, flags); + while (count--) { + if (*s == '\n') + fiq_debugger_putc(state, '\r'); + fiq_debugger_putc(state, *s++); + } + fiq_debugger_uart_flush(state); + spin_unlock_irqrestore(&state->console_lock, flags); + fiq_debugger_uart_disable(state); +} + +static struct console fiq_debugger_console = { + .name = "ttyFIQ", + .device = fiq_debugger_console_device, + .write = fiq_debugger_console_write, + .flags = CON_PRINTBUFFER | CON_ANYTIME | CON_ENABLED, +}; + +int fiq_tty_open(struct tty_struct *tty, struct file *filp) +{ + int line = tty->index; + struct fiq_debugger_state **states = tty->driver->driver_state; + struct fiq_debugger_state *state = states[line]; + + return tty_port_open(&state->tty_port, tty, filp); +} + +void fiq_tty_close(struct tty_struct *tty, struct file *filp) +{ + tty_port_close(tty->port, tty, filp); +} + +int fiq_tty_write(struct tty_struct *tty, const unsigned char *buf, int count) +{ + int i; + int line = tty->index; + struct fiq_debugger_state **states = tty->driver->driver_state; + struct fiq_debugger_state *state = states[line]; + + if (!state->console_enable) + return count; + + fiq_debugger_uart_enable(state); + spin_lock_irq(&state->console_lock); + for (i = 0; i < count; i++) + fiq_debugger_putc(state, *buf++); + spin_unlock_irq(&state->console_lock); + fiq_debugger_uart_disable(state); + + return count; +} + +int fiq_tty_write_room(struct tty_struct *tty) +{ + return 16; +} + +#ifdef CONFIG_CONSOLE_POLL +static int fiq_tty_poll_init(struct tty_driver *driver, int line, char *options) +{ + return 0; +} + +static int fiq_tty_poll_get_char(struct tty_driver *driver, int line) +{ + struct fiq_debugger_state **states = driver->driver_state; + struct fiq_debugger_state *state = states[line]; + int c = NO_POLL_CHAR; + + fiq_debugger_uart_enable(state); + if (fiq_debugger_have_fiq(state)) { + int count = fiq_debugger_ringbuf_level(state->tty_rbuf); + if (count > 0) { + c = fiq_debugger_ringbuf_peek(state->tty_rbuf, 0); + fiq_debugger_ringbuf_consume(state->tty_rbuf, 1); + } + } else { + c = fiq_debugger_getc(state); + if (c == FIQ_DEBUGGER_NO_CHAR) + c = NO_POLL_CHAR; + } + fiq_debugger_uart_disable(state); + + return c; +} + +static void fiq_tty_poll_put_char(struct tty_driver *driver, int line, char ch) +{ + struct fiq_debugger_state **states = driver->driver_state; + struct fiq_debugger_state *state = states[line]; + fiq_debugger_uart_enable(state); + fiq_debugger_putc(state, ch); + fiq_debugger_uart_disable(state); +} +#endif + +static const struct tty_port_operations fiq_tty_port_ops; + +static const struct tty_operations fiq_tty_driver_ops = { + .write = fiq_tty_write, + .write_room = fiq_tty_write_room, + .open = fiq_tty_open, + .close = fiq_tty_close, +#ifdef CONFIG_CONSOLE_POLL + .poll_init = fiq_tty_poll_init, + .poll_get_char = fiq_tty_poll_get_char, + .poll_put_char = fiq_tty_poll_put_char, +#endif +}; + +static int fiq_debugger_tty_init(void) +{ + int ret; + struct fiq_debugger_state **states = NULL; + + states = kzalloc(sizeof(*states) * MAX_FIQ_DEBUGGER_PORTS, GFP_KERNEL); + if (!states) { + pr_err("Failed to allocate fiq debugger state structres\n"); + return -ENOMEM; + } + + fiq_tty_driver = alloc_tty_driver(MAX_FIQ_DEBUGGER_PORTS); + if (!fiq_tty_driver) { + pr_err("Failed to allocate fiq debugger tty\n"); + ret = -ENOMEM; + goto err_free_state; + } + + fiq_tty_driver->owner = THIS_MODULE; + fiq_tty_driver->driver_name = "fiq-debugger"; + fiq_tty_driver->name = "ttyFIQ"; + fiq_tty_driver->type = TTY_DRIVER_TYPE_SERIAL; + fiq_tty_driver->subtype = SERIAL_TYPE_NORMAL; + fiq_tty_driver->init_termios = tty_std_termios; + fiq_tty_driver->flags = TTY_DRIVER_REAL_RAW | + TTY_DRIVER_DYNAMIC_DEV; + fiq_tty_driver->driver_state = states; + + fiq_tty_driver->init_termios.c_cflag = + B115200 | CS8 | CREAD | HUPCL | CLOCAL; + fiq_tty_driver->init_termios.c_ispeed = 115200; + fiq_tty_driver->init_termios.c_ospeed = 115200; + + tty_set_operations(fiq_tty_driver, &fiq_tty_driver_ops); + + ret = tty_register_driver(fiq_tty_driver); + if (ret) { + pr_err("Failed to register fiq tty: %d\n", ret); + goto err_free_tty; + } + + pr_info("Registered FIQ tty driver\n"); + return 0; + +err_free_tty: + put_tty_driver(fiq_tty_driver); + fiq_tty_driver = NULL; +err_free_state: + kfree(states); + return ret; +} + +static int fiq_debugger_tty_init_one(struct fiq_debugger_state *state) +{ + int ret; + struct device *tty_dev; + struct fiq_debugger_state **states = fiq_tty_driver->driver_state; + + states[state->pdev->id] = state; + + state->tty_rbuf = fiq_debugger_ringbuf_alloc(1024); + if (!state->tty_rbuf) { + pr_err("Failed to allocate fiq debugger ringbuf\n"); + ret = -ENOMEM; + goto err; + } + + tty_port_init(&state->tty_port); + state->tty_port.ops = &fiq_tty_port_ops; + + tty_dev = tty_port_register_device(&state->tty_port, fiq_tty_driver, + state->pdev->id, &state->pdev->dev); + if (IS_ERR(tty_dev)) { + pr_err("Failed to register fiq debugger tty device\n"); + ret = PTR_ERR(tty_dev); + goto err; + } + + device_set_wakeup_capable(tty_dev, 1); + + pr_info("Registered fiq debugger ttyFIQ%d\n", state->pdev->id); + + return 0; + +err: + fiq_debugger_ringbuf_free(state->tty_rbuf); + state->tty_rbuf = NULL; + return ret; +} +#endif + +static int fiq_debugger_dev_suspend(struct device *dev) +{ + struct platform_device *pdev = to_platform_device(dev); + struct fiq_debugger_state *state = platform_get_drvdata(pdev); + + if (state->pdata->uart_dev_suspend) + return state->pdata->uart_dev_suspend(pdev); + return 0; +} + +static int fiq_debugger_dev_resume(struct device *dev) +{ + struct platform_device *pdev = to_platform_device(dev); + struct fiq_debugger_state *state = platform_get_drvdata(pdev); + + if (state->pdata->uart_dev_resume) + return state->pdata->uart_dev_resume(pdev); + return 0; +} + +static int fiq_debugger_probe(struct platform_device *pdev) +{ + int ret; + struct fiq_debugger_pdata *pdata = dev_get_platdata(&pdev->dev); + struct fiq_debugger_state *state; + int fiq; + int uart_irq; + + if (pdev->id >= MAX_FIQ_DEBUGGER_PORTS) + return -EINVAL; + + if (!pdata->uart_getc || !pdata->uart_putc) + return -EINVAL; + if ((pdata->uart_enable && !pdata->uart_disable) || + (!pdata->uart_enable && pdata->uart_disable)) + return -EINVAL; + + fiq = platform_get_irq_byname(pdev, "fiq"); + uart_irq = platform_get_irq_byname(pdev, "uart_irq"); + + /* uart_irq mode and fiq mode are mutually exclusive, but one of them + * is required */ + if ((uart_irq < 0 && fiq < 0) || (uart_irq >= 0 && fiq >= 0)) + return -EINVAL; + if (fiq >= 0 && !pdata->fiq_enable) + return -EINVAL; + + state = kzalloc(sizeof(*state), GFP_KERNEL); + state->output.printf = fiq_debugger_printf; + setup_timer(&state->sleep_timer, fiq_debugger_sleep_timer_expired, + (unsigned long)state); + state->pdata = pdata; + state->pdev = pdev; + state->no_sleep = initial_no_sleep; + state->debug_enable = initial_debug_enable; + state->console_enable = initial_console_enable; + + state->fiq = fiq; + state->uart_irq = uart_irq; + state->signal_irq = platform_get_irq_byname(pdev, "signal"); + state->wakeup_irq = platform_get_irq_byname(pdev, "wakeup"); + + INIT_WORK(&state->work, fiq_debugger_work); + spin_lock_init(&state->work_lock); + + platform_set_drvdata(pdev, state); + + spin_lock_init(&state->sleep_timer_lock); + + if (state->wakeup_irq < 0 && fiq_debugger_have_fiq(state)) + state->no_sleep = true; + state->ignore_next_wakeup_irq = !state->no_sleep; + + wakeup_source_init(&state->debugger_wake_src, "serial-debug"); + + state->clk = clk_get(&pdev->dev, NULL); + if (IS_ERR(state->clk)) + state->clk = NULL; + + /* do not call pdata->uart_enable here since uart_init may still + * need to do some initialization before uart_enable can work. + * So, only try to manage the clock during init. + */ + if (state->clk) + clk_enable(state->clk); + + if (pdata->uart_init) { + ret = pdata->uart_init(pdev); + if (ret) + goto err_uart_init; + } + + fiq_debugger_printf_nfiq(state, + "<hit enter %sto activate fiq debugger>\n", + state->no_sleep ? "" : "twice "); + +#ifdef CONFIG_FIQ_GLUE + if (fiq_debugger_have_fiq(state)) { + state->handler.fiq = fiq_debugger_fiq; + state->handler.resume = fiq_debugger_resume; + ret = fiq_glue_register_handler(&state->handler); + if (ret) { + pr_err("%s: could not install fiq handler\n", __func__); + goto err_register_irq; + } + + pdata->fiq_enable(pdev, state->fiq, 1); + } else +#endif + { + ret = request_irq(state->uart_irq, fiq_debugger_uart_irq, + IRQF_NO_SUSPEND, "debug", state); + if (ret) { + pr_err("%s: could not install irq handler\n", __func__); + goto err_register_irq; + } + + /* for irq-only mode, we want this irq to wake us up, if it + * can. + */ + enable_irq_wake(state->uart_irq); + } + + if (state->clk) + clk_disable(state->clk); + + if (state->signal_irq >= 0) { + ret = request_irq(state->signal_irq, fiq_debugger_signal_irq, + IRQF_TRIGGER_RISING, "debug-signal", state); + if (ret) + pr_err("serial_debugger: could not install signal_irq"); + } + + if (state->wakeup_irq >= 0) { + ret = request_irq(state->wakeup_irq, + fiq_debugger_wakeup_irq_handler, + IRQF_TRIGGER_FALLING, + "debug-wakeup", state); + if (ret) { + pr_err("serial_debugger: " + "could not install wakeup irq\n"); + state->wakeup_irq = -1; + } else { + ret = enable_irq_wake(state->wakeup_irq); + if (ret) { + pr_err("serial_debugger: " + "could not enable wakeup\n"); + state->wakeup_irq_no_set_wake = true; + } + } + } + if (state->no_sleep) + fiq_debugger_handle_wakeup(state); + +#if defined(CONFIG_FIQ_DEBUGGER_CONSOLE) + spin_lock_init(&state->console_lock); + state->console = fiq_debugger_console; + state->console.index = pdev->id; + if (!console_set_on_cmdline) + add_preferred_console(state->console.name, + state->console.index, NULL); + register_console(&state->console); + fiq_debugger_tty_init_one(state); +#endif + return 0; + +err_register_irq: + if (pdata->uart_free) + pdata->uart_free(pdev); +err_uart_init: + if (state->clk) + clk_disable(state->clk); + if (state->clk) + clk_put(state->clk); + wakeup_source_trash(&state->debugger_wake_src); + platform_set_drvdata(pdev, NULL); + kfree(state); + return ret; +} + +static const struct dev_pm_ops fiq_debugger_dev_pm_ops = { + .suspend = fiq_debugger_dev_suspend, + .resume = fiq_debugger_dev_resume, +}; + +static struct platform_driver fiq_debugger_driver = { + .probe = fiq_debugger_probe, + .driver = { + .name = "fiq_debugger", + .pm = &fiq_debugger_dev_pm_ops, + }, +}; + +#if defined(CONFIG_FIQ_DEBUGGER_UART_OVERLAY) +int fiq_debugger_uart_overlay(void) +{ + struct device_node *onp = of_find_node_by_path("/uart_overlay@0"); + int ret; + + if (!onp) { + pr_err("serial_debugger: uart overlay not found\n"); + return -ENODEV; + } + + ret = of_overlay_create(onp); + if (ret < 0) { + pr_err("serial_debugger: fail to create overlay: %d\n", ret); + of_node_put(onp); + return ret; + } + + pr_info("serial_debugger: uart overlay applied\n"); + return 0; +} +#endif + +static int __init fiq_debugger_init(void) +{ + if (fiq_debugger_disable) { + pr_err("serial_debugger: disabled\n"); + return -ENODEV; + } +#if defined(CONFIG_FIQ_DEBUGGER_CONSOLE) + fiq_debugger_tty_init(); +#endif +#if defined(CONFIG_FIQ_DEBUGGER_UART_OVERLAY) + fiq_debugger_uart_overlay(); +#endif + return platform_driver_register(&fiq_debugger_driver); +} + +postcore_initcall(fiq_debugger_init);
diff --git a/drivers/staging/android/fiq_debugger/fiq_debugger.h b/drivers/staging/android/fiq_debugger/fiq_debugger.h new file mode 100644 index 0000000..c9ec4f8 --- /dev/null +++ b/drivers/staging/android/fiq_debugger/fiq_debugger.h
@@ -0,0 +1,64 @@ +/* + * drivers/staging/android/fiq_debugger/fiq_debugger.h + * + * Copyright (C) 2010 Google, Inc. + * Author: Colin Cross <ccross@android.com> + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#ifndef _ARCH_ARM_MACH_TEGRA_FIQ_DEBUGGER_H_ +#define _ARCH_ARM_MACH_TEGRA_FIQ_DEBUGGER_H_ + +#include <linux/serial_core.h> + +#define FIQ_DEBUGGER_NO_CHAR NO_POLL_CHAR +#define FIQ_DEBUGGER_BREAK 0x00ff0100 + +#define FIQ_DEBUGGER_FIQ_IRQ_NAME "fiq" +#define FIQ_DEBUGGER_SIGNAL_IRQ_NAME "signal" +#define FIQ_DEBUGGER_WAKEUP_IRQ_NAME "wakeup" + +/** + * struct fiq_debugger_pdata - fiq debugger platform data + * @uart_resume: used to restore uart state right before enabling + * the fiq. + * @uart_enable: Do the work necessary to communicate with the uart + * hw (enable clocks, etc.). This must be ref-counted. + * @uart_disable: Do the work necessary to disable the uart hw + * (disable clocks, etc.). This must be ref-counted. + * @uart_dev_suspend: called during PM suspend, generally not needed + * for real fiq mode debugger. + * @uart_dev_resume: called during PM resume, generally not needed + * for real fiq mode debugger. + */ +struct fiq_debugger_pdata { + int (*uart_init)(struct platform_device *pdev); + void (*uart_free)(struct platform_device *pdev); + int (*uart_resume)(struct platform_device *pdev); + int (*uart_getc)(struct platform_device *pdev); + void (*uart_putc)(struct platform_device *pdev, unsigned int c); + void (*uart_flush)(struct platform_device *pdev); + void (*uart_enable)(struct platform_device *pdev); + void (*uart_disable)(struct platform_device *pdev); + + int (*uart_dev_suspend)(struct platform_device *pdev); + int (*uart_dev_resume)(struct platform_device *pdev); + + void (*fiq_enable)(struct platform_device *pdev, unsigned int fiq, + bool enable); + void (*fiq_ack)(struct platform_device *pdev, unsigned int fiq); + + void (*force_irq)(struct platform_device *pdev, unsigned int irq); + void (*force_irq_ack)(struct platform_device *pdev, unsigned int irq); +}; + +#endif
diff --git a/drivers/staging/android/fiq_debugger/fiq_debugger_arm.c b/drivers/staging/android/fiq_debugger/fiq_debugger_arm.c new file mode 100644 index 0000000..8b3e013 --- /dev/null +++ b/drivers/staging/android/fiq_debugger/fiq_debugger_arm.c
@@ -0,0 +1,240 @@ +/* + * Copyright (C) 2014 Google, Inc. + * Author: Colin Cross <ccross@android.com> + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/ptrace.h> +#include <linux/uaccess.h> + +#include <asm/stacktrace.h> + +#include "fiq_debugger_priv.h" + +static char *mode_name(unsigned cpsr) +{ + switch (cpsr & MODE_MASK) { + case USR_MODE: return "USR"; + case FIQ_MODE: return "FIQ"; + case IRQ_MODE: return "IRQ"; + case SVC_MODE: return "SVC"; + case ABT_MODE: return "ABT"; + case UND_MODE: return "UND"; + case SYSTEM_MODE: return "SYS"; + default: return "???"; + } +} + +void fiq_debugger_dump_pc(struct fiq_debugger_output *output, + const struct pt_regs *regs) +{ + output->printf(output, " pc %08x cpsr %08x mode %s\n", + regs->ARM_pc, regs->ARM_cpsr, mode_name(regs->ARM_cpsr)); +} + +void fiq_debugger_dump_regs(struct fiq_debugger_output *output, + const struct pt_regs *regs) +{ + output->printf(output, + " r0 %08x r1 %08x r2 %08x r3 %08x\n", + regs->ARM_r0, regs->ARM_r1, regs->ARM_r2, regs->ARM_r3); + output->printf(output, + " r4 %08x r5 %08x r6 %08x r7 %08x\n", + regs->ARM_r4, regs->ARM_r5, regs->ARM_r6, regs->ARM_r7); + output->printf(output, + " r8 %08x r9 %08x r10 %08x r11 %08x mode %s\n", + regs->ARM_r8, regs->ARM_r9, regs->ARM_r10, regs->ARM_fp, + mode_name(regs->ARM_cpsr)); + output->printf(output, + " ip %08x sp %08x lr %08x pc %08x cpsr %08x\n", + regs->ARM_ip, regs->ARM_sp, regs->ARM_lr, regs->ARM_pc, + regs->ARM_cpsr); +} + +struct mode_regs { + unsigned long sp_svc; + unsigned long lr_svc; + unsigned long spsr_svc; + + unsigned long sp_abt; + unsigned long lr_abt; + unsigned long spsr_abt; + + unsigned long sp_und; + unsigned long lr_und; + unsigned long spsr_und; + + unsigned long sp_irq; + unsigned long lr_irq; + unsigned long spsr_irq; + + unsigned long r8_fiq; + unsigned long r9_fiq; + unsigned long r10_fiq; + unsigned long r11_fiq; + unsigned long r12_fiq; + unsigned long sp_fiq; + unsigned long lr_fiq; + unsigned long spsr_fiq; +}; + +static void __naked get_mode_regs(struct mode_regs *regs) +{ + asm volatile ( + "mrs r1, cpsr\n" + "msr cpsr_c, #0xd3 @(SVC_MODE | PSR_I_BIT | PSR_F_BIT)\n" + "stmia r0!, {r13 - r14}\n" + "mrs r2, spsr\n" + "msr cpsr_c, #0xd7 @(ABT_MODE | PSR_I_BIT | PSR_F_BIT)\n" + "stmia r0!, {r2, r13 - r14}\n" + "mrs r2, spsr\n" + "msr cpsr_c, #0xdb @(UND_MODE | PSR_I_BIT | PSR_F_BIT)\n" + "stmia r0!, {r2, r13 - r14}\n" + "mrs r2, spsr\n" + "msr cpsr_c, #0xd2 @(IRQ_MODE | PSR_I_BIT | PSR_F_BIT)\n" + "stmia r0!, {r2, r13 - r14}\n" + "mrs r2, spsr\n" + "msr cpsr_c, #0xd1 @(FIQ_MODE | PSR_I_BIT | PSR_F_BIT)\n" + "stmia r0!, {r2, r8 - r14}\n" + "mrs r2, spsr\n" + "stmia r0!, {r2}\n" + "msr cpsr_c, r1\n" + "bx lr\n"); +} + + +void fiq_debugger_dump_allregs(struct fiq_debugger_output *output, + const struct pt_regs *regs) +{ + struct mode_regs mode_regs; + unsigned long mode = regs->ARM_cpsr & MODE_MASK; + + fiq_debugger_dump_regs(output, regs); + get_mode_regs(&mode_regs); + + output->printf(output, + "%csvc: sp %08x lr %08x spsr %08x\n", + mode == SVC_MODE ? '*' : ' ', + mode_regs.sp_svc, mode_regs.lr_svc, mode_regs.spsr_svc); + output->printf(output, + "%cabt: sp %08x lr %08x spsr %08x\n", + mode == ABT_MODE ? '*' : ' ', + mode_regs.sp_abt, mode_regs.lr_abt, mode_regs.spsr_abt); + output->printf(output, + "%cund: sp %08x lr %08x spsr %08x\n", + mode == UND_MODE ? '*' : ' ', + mode_regs.sp_und, mode_regs.lr_und, mode_regs.spsr_und); + output->printf(output, + "%cirq: sp %08x lr %08x spsr %08x\n", + mode == IRQ_MODE ? '*' : ' ', + mode_regs.sp_irq, mode_regs.lr_irq, mode_regs.spsr_irq); + output->printf(output, + "%cfiq: r8 %08x r9 %08x r10 %08x r11 %08x r12 %08x\n", + mode == FIQ_MODE ? '*' : ' ', + mode_regs.r8_fiq, mode_regs.r9_fiq, mode_regs.r10_fiq, + mode_regs.r11_fiq, mode_regs.r12_fiq); + output->printf(output, + " fiq: sp %08x lr %08x spsr %08x\n", + mode_regs.sp_fiq, mode_regs.lr_fiq, mode_regs.spsr_fiq); +} + +struct stacktrace_state { + struct fiq_debugger_output *output; + unsigned int depth; +}; + +static int report_trace(struct stackframe *frame, void *d) +{ + struct stacktrace_state *sts = d; + + if (sts->depth) { + sts->output->printf(sts->output, + " pc: %p (%pF), lr %p (%pF), sp %p, fp %p\n", + frame->pc, frame->pc, frame->lr, frame->lr, + frame->sp, frame->fp); + sts->depth--; + return 0; + } + sts->output->printf(sts->output, " ...\n"); + + return sts->depth == 0; +} + +struct frame_tail { + struct frame_tail *fp; + unsigned long sp; + unsigned long lr; +} __attribute__((packed)); + +static struct frame_tail *user_backtrace(struct fiq_debugger_output *output, + struct frame_tail *tail) +{ + struct frame_tail buftail[2]; + + /* Also check accessibility of one struct frame_tail beyond */ + if (!access_ok(VERIFY_READ, tail, sizeof(buftail))) { + output->printf(output, " invalid frame pointer %p\n", + tail); + return NULL; + } + if (__copy_from_user_inatomic(buftail, tail, sizeof(buftail))) { + output->printf(output, + " failed to copy frame pointer %p\n", tail); + return NULL; + } + + output->printf(output, " %p\n", buftail[0].lr); + + /* frame pointers should strictly progress back up the stack + * (towards higher addresses) */ + if (tail >= buftail[0].fp) + return NULL; + + return buftail[0].fp-1; +} + +void fiq_debugger_dump_stacktrace(struct fiq_debugger_output *output, + const struct pt_regs *regs, unsigned int depth, void *ssp) +{ + struct frame_tail *tail; + struct thread_info *real_thread_info = THREAD_INFO(ssp); + struct stacktrace_state sts; + + sts.depth = depth; + sts.output = output; + *current_thread_info() = *real_thread_info; + + if (!current) + output->printf(output, "current NULL\n"); + else + output->printf(output, "pid: %d comm: %s\n", + current->pid, current->comm); + fiq_debugger_dump_regs(output, regs); + + if (!user_mode(regs)) { + struct stackframe frame; + frame.fp = regs->ARM_fp; + frame.sp = regs->ARM_sp; + frame.lr = regs->ARM_lr; + frame.pc = regs->ARM_pc; + output->printf(output, + " pc: %p (%pF), lr %p (%pF), sp %p, fp %p\n", + regs->ARM_pc, regs->ARM_pc, regs->ARM_lr, regs->ARM_lr, + regs->ARM_sp, regs->ARM_fp); + walk_stackframe(&frame, report_trace, &sts); + return; + } + + tail = ((struct frame_tail *) regs->ARM_fp) - 1; + while (depth-- && tail && !((unsigned long) tail & 3)) + tail = user_backtrace(output, tail); +}
diff --git a/drivers/staging/android/fiq_debugger/fiq_debugger_arm64.c b/drivers/staging/android/fiq_debugger/fiq_debugger_arm64.c new file mode 100644 index 0000000..97246bc --- /dev/null +++ b/drivers/staging/android/fiq_debugger/fiq_debugger_arm64.c
@@ -0,0 +1,202 @@ +/* + * Copyright (C) 2014 Google, Inc. + * Author: Colin Cross <ccross@android.com> + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/ptrace.h> +#include <asm/stacktrace.h> + +#include "fiq_debugger_priv.h" + +static char *mode_name(const struct pt_regs *regs) +{ + if (compat_user_mode(regs)) { + return "USR"; + } else { + switch (processor_mode(regs)) { + case PSR_MODE_EL0t: return "EL0t"; + case PSR_MODE_EL1t: return "EL1t"; + case PSR_MODE_EL1h: return "EL1h"; + case PSR_MODE_EL2t: return "EL2t"; + case PSR_MODE_EL2h: return "EL2h"; + default: return "???"; + } + } +} + +void fiq_debugger_dump_pc(struct fiq_debugger_output *output, + const struct pt_regs *regs) +{ + output->printf(output, " pc %016lx cpsr %08lx mode %s\n", + regs->pc, regs->pstate, mode_name(regs)); +} + +void fiq_debugger_dump_regs_aarch32(struct fiq_debugger_output *output, + const struct pt_regs *regs) +{ + output->printf(output, " r0 %08x r1 %08x r2 %08x r3 %08x\n", + regs->compat_usr(0), regs->compat_usr(1), + regs->compat_usr(2), regs->compat_usr(3)); + output->printf(output, " r4 %08x r5 %08x r6 %08x r7 %08x\n", + regs->compat_usr(4), regs->compat_usr(5), + regs->compat_usr(6), regs->compat_usr(7)); + output->printf(output, " r8 %08x r9 %08x r10 %08x r11 %08x\n", + regs->compat_usr(8), regs->compat_usr(9), + regs->compat_usr(10), regs->compat_usr(11)); + output->printf(output, " ip %08x sp %08x lr %08x pc %08x\n", + regs->compat_usr(12), regs->compat_sp, + regs->compat_lr, regs->pc); + output->printf(output, " cpsr %08x (%s)\n", + regs->pstate, mode_name(regs)); +} + +void fiq_debugger_dump_regs_aarch64(struct fiq_debugger_output *output, + const struct pt_regs *regs) +{ + + output->printf(output, " x0 %016lx x1 %016lx\n", + regs->regs[0], regs->regs[1]); + output->printf(output, " x2 %016lx x3 %016lx\n", + regs->regs[2], regs->regs[3]); + output->printf(output, " x4 %016lx x5 %016lx\n", + regs->regs[4], regs->regs[5]); + output->printf(output, " x6 %016lx x7 %016lx\n", + regs->regs[6], regs->regs[7]); + output->printf(output, " x8 %016lx x9 %016lx\n", + regs->regs[8], regs->regs[9]); + output->printf(output, " x10 %016lx x11 %016lx\n", + regs->regs[10], regs->regs[11]); + output->printf(output, " x12 %016lx x13 %016lx\n", + regs->regs[12], regs->regs[13]); + output->printf(output, " x14 %016lx x15 %016lx\n", + regs->regs[14], regs->regs[15]); + output->printf(output, " x16 %016lx x17 %016lx\n", + regs->regs[16], regs->regs[17]); + output->printf(output, " x18 %016lx x19 %016lx\n", + regs->regs[18], regs->regs[19]); + output->printf(output, " x20 %016lx x21 %016lx\n", + regs->regs[20], regs->regs[21]); + output->printf(output, " x22 %016lx x23 %016lx\n", + regs->regs[22], regs->regs[23]); + output->printf(output, " x24 %016lx x25 %016lx\n", + regs->regs[24], regs->regs[25]); + output->printf(output, " x26 %016lx x27 %016lx\n", + regs->regs[26], regs->regs[27]); + output->printf(output, " x28 %016lx x29 %016lx\n", + regs->regs[28], regs->regs[29]); + output->printf(output, " x30 %016lx sp %016lx\n", + regs->regs[30], regs->sp); + output->printf(output, " pc %016lx cpsr %08x (%s)\n", + regs->pc, regs->pstate, mode_name(regs)); +} + +void fiq_debugger_dump_regs(struct fiq_debugger_output *output, + const struct pt_regs *regs) +{ + if (compat_user_mode(regs)) + fiq_debugger_dump_regs_aarch32(output, regs); + else + fiq_debugger_dump_regs_aarch64(output, regs); +} + +#define READ_SPECIAL_REG(x) ({ \ + u64 val; \ + asm volatile ("mrs %0, " # x : "=r"(val)); \ + val; \ +}) + +void fiq_debugger_dump_allregs(struct fiq_debugger_output *output, + const struct pt_regs *regs) +{ + u32 pstate = READ_SPECIAL_REG(CurrentEl); + bool in_el2 = (pstate & PSR_MODE_MASK) >= PSR_MODE_EL2t; + + fiq_debugger_dump_regs(output, regs); + + output->printf(output, " sp_el0 %016lx\n", + READ_SPECIAL_REG(sp_el0)); + + if (in_el2) + output->printf(output, " sp_el1 %016lx\n", + READ_SPECIAL_REG(sp_el1)); + + output->printf(output, " elr_el1 %016lx\n", + READ_SPECIAL_REG(elr_el1)); + + output->printf(output, " spsr_el1 %08lx\n", + READ_SPECIAL_REG(spsr_el1)); + + if (in_el2) { + output->printf(output, " spsr_irq %08lx\n", + READ_SPECIAL_REG(spsr_irq)); + output->printf(output, " spsr_abt %08lx\n", + READ_SPECIAL_REG(spsr_abt)); + output->printf(output, " spsr_und %08lx\n", + READ_SPECIAL_REG(spsr_und)); + output->printf(output, " spsr_fiq %08lx\n", + READ_SPECIAL_REG(spsr_fiq)); + output->printf(output, " spsr_el2 %08lx\n", + READ_SPECIAL_REG(elr_el2)); + output->printf(output, " spsr_el2 %08lx\n", + READ_SPECIAL_REG(spsr_el2)); + } +} + +struct stacktrace_state { + struct fiq_debugger_output *output; + unsigned int depth; +}; + +static int report_trace(struct stackframe *frame, void *d) +{ + struct stacktrace_state *sts = d; + + if (sts->depth) { + sts->output->printf(sts->output, "%pF:\n", frame->pc); + sts->output->printf(sts->output, + " pc %016lx sp %016lx fp %016lx\n", + frame->pc, frame->sp, frame->fp); + sts->depth--; + return 0; + } + sts->output->printf(sts->output, " ...\n"); + + return sts->depth == 0; +} + +void fiq_debugger_dump_stacktrace(struct fiq_debugger_output *output, + const struct pt_regs *regs, unsigned int depth, void *ssp) +{ + struct thread_info *real_thread_info = THREAD_INFO(ssp); + struct stacktrace_state sts; + + sts.depth = depth; + sts.output = output; + *current_thread_info() = *real_thread_info; + + if (!current) + output->printf(output, "current NULL\n"); + else + output->printf(output, "pid: %d comm: %s\n", + current->pid, current->comm); + fiq_debugger_dump_regs(output, regs); + + if (!user_mode(regs)) { + struct stackframe frame; + frame.fp = regs->regs[29]; + frame.sp = regs->sp; + frame.pc = regs->pc; + output->printf(output, "\n"); + walk_stackframe(current, &frame, report_trace, &sts); + } +}
diff --git a/drivers/staging/android/fiq_debugger/fiq_debugger_priv.h b/drivers/staging/android/fiq_debugger/fiq_debugger_priv.h new file mode 100644 index 0000000..d5d051f --- /dev/null +++ b/drivers/staging/android/fiq_debugger/fiq_debugger_priv.h
@@ -0,0 +1,37 @@ +/* + * Copyright (C) 2014 Google, Inc. + * Author: Colin Cross <ccross@android.com> + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#ifndef _FIQ_DEBUGGER_PRIV_H_ +#define _FIQ_DEBUGGER_PRIV_H_ + +#define THREAD_INFO(sp) ((struct thread_info *) \ + ((unsigned long)(sp) & ~(THREAD_SIZE - 1))) + +struct fiq_debugger_output { + void (*printf)(struct fiq_debugger_output *output, const char *fmt, ...); +}; + +struct pt_regs; + +void fiq_debugger_dump_pc(struct fiq_debugger_output *output, + const struct pt_regs *regs); +void fiq_debugger_dump_regs(struct fiq_debugger_output *output, + const struct pt_regs *regs); +void fiq_debugger_dump_allregs(struct fiq_debugger_output *output, + const struct pt_regs *regs); +void fiq_debugger_dump_stacktrace(struct fiq_debugger_output *output, + const struct pt_regs *regs, unsigned int depth, void *ssp); + +#endif
diff --git a/drivers/staging/android/fiq_debugger/fiq_debugger_ringbuf.h b/drivers/staging/android/fiq_debugger/fiq_debugger_ringbuf.h new file mode 100644 index 0000000..10c3c5d0 --- /dev/null +++ b/drivers/staging/android/fiq_debugger/fiq_debugger_ringbuf.h
@@ -0,0 +1,94 @@ +/* + * drivers/staging/android/fiq_debugger/fiq_debugger_ringbuf.h + * + * simple lockless ringbuffer + * + * Copyright (C) 2010 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <linux/kernel.h> +#include <linux/slab.h> + +struct fiq_debugger_ringbuf { + int len; + int head; + int tail; + u8 buf[]; +}; + + +static inline struct fiq_debugger_ringbuf *fiq_debugger_ringbuf_alloc(int len) +{ + struct fiq_debugger_ringbuf *rbuf; + + rbuf = kzalloc(sizeof(*rbuf) + len, GFP_KERNEL); + if (rbuf == NULL) + return NULL; + + rbuf->len = len; + rbuf->head = 0; + rbuf->tail = 0; + smp_mb(); + + return rbuf; +} + +static inline void fiq_debugger_ringbuf_free(struct fiq_debugger_ringbuf *rbuf) +{ + kfree(rbuf); +} + +static inline int fiq_debugger_ringbuf_level(struct fiq_debugger_ringbuf *rbuf) +{ + int level = rbuf->head - rbuf->tail; + + if (level < 0) + level = rbuf->len + level; + + return level; +} + +static inline int fiq_debugger_ringbuf_room(struct fiq_debugger_ringbuf *rbuf) +{ + return rbuf->len - fiq_debugger_ringbuf_level(rbuf) - 1; +} + +static inline u8 +fiq_debugger_ringbuf_peek(struct fiq_debugger_ringbuf *rbuf, int i) +{ + return rbuf->buf[(rbuf->tail + i) % rbuf->len]; +} + +static inline int +fiq_debugger_ringbuf_consume(struct fiq_debugger_ringbuf *rbuf, int count) +{ + count = min(count, fiq_debugger_ringbuf_level(rbuf)); + + rbuf->tail = (rbuf->tail + count) % rbuf->len; + smp_mb(); + + return count; +} + +static inline int +fiq_debugger_ringbuf_push(struct fiq_debugger_ringbuf *rbuf, u8 datum) +{ + if (fiq_debugger_ringbuf_room(rbuf) == 0) + return 0; + + rbuf->buf[rbuf->head] = datum; + smp_mb(); + rbuf->head = (rbuf->head + 1) % rbuf->len; + smp_mb(); + + return 1; +}
diff --git a/drivers/staging/android/fiq_debugger/fiq_watchdog.c b/drivers/staging/android/fiq_debugger/fiq_watchdog.c new file mode 100644 index 0000000..194b541 --- /dev/null +++ b/drivers/staging/android/fiq_debugger/fiq_watchdog.c
@@ -0,0 +1,56 @@ +/* + * Copyright (C) 2014 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/kernel.h> +#include <linux/spinlock.h> +#include <linux/pstore_ram.h> + +#include "fiq_watchdog.h" +#include "fiq_debugger_priv.h" + +static DEFINE_RAW_SPINLOCK(fiq_watchdog_lock); + +static void fiq_watchdog_printf(struct fiq_debugger_output *output, + const char *fmt, ...) +{ + char buf[256]; + va_list ap; + int len; + + va_start(ap, fmt); + len = vscnprintf(buf, sizeof(buf), fmt, ap); + va_end(ap); + + ramoops_console_write_buf(buf, len); +} + +struct fiq_debugger_output fiq_watchdog_output = { + .printf = fiq_watchdog_printf, +}; + +void fiq_watchdog_triggered(const struct pt_regs *regs, void *svc_sp) +{ + char msg[24]; + int len; + + raw_spin_lock(&fiq_watchdog_lock); + + len = scnprintf(msg, sizeof(msg), "watchdog fiq cpu %d\n", + THREAD_INFO(svc_sp)->cpu); + ramoops_console_write_buf(msg, len); + + fiq_debugger_dump_stacktrace(&fiq_watchdog_output, regs, 100, svc_sp); + + raw_spin_unlock(&fiq_watchdog_lock); +}
diff --git a/drivers/staging/android/fiq_debugger/fiq_watchdog.h b/drivers/staging/android/fiq_debugger/fiq_watchdog.h new file mode 100644 index 0000000..c6b507f --- /dev/null +++ b/drivers/staging/android/fiq_debugger/fiq_watchdog.h
@@ -0,0 +1,20 @@ +/* + * Copyright (C) 2014 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#ifndef _FIQ_WATCHDOG_H_ +#define _FIQ_WATCHDOG_H_ + +void fiq_watchdog_triggered(const struct pt_regs *regs, void *svc_sp); + +#endif
diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 806e9b3..3af862c 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c
@@ -622,7 +622,7 @@ static int ion_debug_client_show(struct seq_file *s, void *unused) mutex_lock(&debugfs_mutex); if (!is_client_alive(client)) { - seq_printf(s, "ion_client 0x%p dead, can't dump its buffers\n", + seq_printf(s, "ion_client 0x%pK dead, can't dump its buffers\n", client); mutex_unlock(&debugfs_mutex); return 0; @@ -775,7 +775,6 @@ void ion_client_destroy(struct ion_client *client) struct ion_device *dev = client->dev; struct rb_node *n; - pr_debug("%s: %d\n", __func__, __LINE__); mutex_lock(&debugfs_mutex); while ((n = rb_first(&client->handles))) { struct ion_handle *handle = rb_entry(n, struct ion_handle, @@ -848,9 +847,6 @@ static void ion_buffer_sync_for_device(struct ion_buffer *buffer, int pages = PAGE_ALIGN(buffer->size) / PAGE_SIZE; int i; - pr_debug("%s: syncing for device %s\n", __func__, - dev ? dev_name(dev) : "null"); - if (!ion_buffer_fault_user_mappings(buffer)) return; @@ -904,7 +900,6 @@ static void ion_vm_open(struct vm_area_struct *vma) mutex_lock(&buffer->lock); list_add(&vma_list->list, &buffer->vmas); mutex_unlock(&buffer->lock); - pr_debug("%s: adding %p\n", __func__, vma); } static void ion_vm_close(struct vm_area_struct *vma) @@ -912,14 +907,12 @@ static void ion_vm_close(struct vm_area_struct *vma) struct ion_buffer *buffer = vma->vm_private_data; struct ion_vma_list *vma_list, *tmp; - pr_debug("%s\n", __func__); mutex_lock(&buffer->lock); list_for_each_entry_safe(vma_list, tmp, &buffer->vmas, list) { if (vma_list->vma != vma) continue; list_del(&vma_list->list); kfree(vma_list); - pr_debug("%s: deleting %p\n", __func__, vma); break; } mutex_unlock(&buffer->lock); @@ -1231,7 +1224,6 @@ static int ion_release(struct inode *inode, struct file *file) { struct ion_client *client = file->private_data; - pr_debug("%s: %d\n", __func__, __LINE__); ion_client_destroy(client); return 0; } @@ -1243,7 +1235,6 @@ static int ion_open(struct inode *inode, struct file *file) struct ion_client *client; char debug_name[64]; - pr_debug("%s: %d\n", __func__, __LINE__); snprintf(debug_name, 64, "%u", task_pid_nr(current->group_leader)); client = ion_client_create(dev, debug_name); if (IS_ERR(client))
diff --git a/drivers/staging/android/ion/ion_cma_heap.c b/drivers/staging/android/ion/ion_cma_heap.c index 6c7de74..0ca90ba 100644 --- a/drivers/staging/android/ion/ion_cma_heap.c +++ b/drivers/staging/android/ion/ion_cma_heap.c
@@ -49,8 +49,6 @@ static int ion_cma_allocate(struct ion_heap *heap, struct ion_buffer *buffer, struct device *dev = cma_heap->dev; struct ion_cma_buffer_info *info; - dev_dbg(dev, "Request buffer allocation len %ld\n", len); - if (buffer->flags & ION_FLAG_CACHED) return -EINVAL; @@ -79,7 +77,6 @@ static int ion_cma_allocate(struct ion_heap *heap, struct ion_buffer *buffer, /* keep this for memory release */ buffer->priv_virt = info; buffer->sg_table = info->table; - dev_dbg(dev, "Allocate buffer %p\n", buffer); return 0; free_table: @@ -97,7 +94,6 @@ static void ion_cma_free(struct ion_buffer *buffer) struct device *dev = cma_heap->dev; struct ion_cma_buffer_info *info = buffer->priv_virt; - dev_dbg(dev, "Release buffer %p\n", buffer); /* release memory */ dma_free_coherent(dev, buffer->size, info->cpu_addr, info->handle); /* release sg table */
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c index ec3b665..687be36 100644 --- a/drivers/staging/android/lowmemorykiller.c +++ b/drivers/staging/android/lowmemorykiller.c
@@ -43,6 +43,9 @@ #include <linux/profile.h> #include <linux/notifier.h> +#define CREATE_TRACE_POINTS +#include "trace/lowmemorykiller.h" + static u32 lowmem_debug_level = 1; static short lowmem_adj[6] = { 0, @@ -93,6 +96,7 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc) int other_free = global_page_state(NR_FREE_PAGES) - totalreserve_pages; int other_file = global_node_page_state(NR_FILE_PAGES) - global_node_page_state(NR_SHMEM) - + global_node_page_state(NR_UNEVICTABLE) - total_swapcache_pages(); if (lowmem_adj_size < array_size) @@ -160,23 +164,27 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc) p->comm, p->pid, oom_score_adj, tasksize); } if (selected) { + long cache_size = other_file * (long)(PAGE_SIZE / 1024); + long cache_limit = minfree * (long)(PAGE_SIZE / 1024); + long free = other_free * (long)(PAGE_SIZE / 1024); + task_lock(selected); send_sig(SIGKILL, selected, 0); if (selected->mm) task_set_lmk_waiting(selected); task_unlock(selected); - lowmem_print(1, "Killing '%s' (%d), adj %hd,\n" + trace_lowmemory_kill(selected, cache_size, cache_limit, free); + lowmem_print(1, "Killing '%s' (%d) (tgid %d), adj %hd,\n" " to free %ldkB on behalf of '%s' (%d) because\n" " cache %ldkB is below limit %ldkB for oom_score_adj %hd\n" " Free memory is %ldkB above reserved\n", - selected->comm, selected->pid, + selected->comm, selected->pid, selected->tgid, selected_oom_score_adj, selected_tasksize * (long)(PAGE_SIZE / 1024), current->comm, current->pid, - other_file * (long)(PAGE_SIZE / 1024), - minfree * (long)(PAGE_SIZE / 1024), + cache_size, cache_limit, min_score_adj, - other_free * (long)(PAGE_SIZE / 1024)); + free); lowmem_deathpending_timeout = jiffies + HZ; rem += selected_tasksize; } @@ -200,12 +208,96 @@ static int __init lowmem_init(void) } device_initcall(lowmem_init); +#ifdef CONFIG_ANDROID_LOW_MEMORY_KILLER_AUTODETECT_OOM_ADJ_VALUES +static short lowmem_oom_adj_to_oom_score_adj(short oom_adj) +{ + if (oom_adj == OOM_ADJUST_MAX) + return OOM_SCORE_ADJ_MAX; + else + return (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE; +} + +static void lowmem_autodetect_oom_adj_values(void) +{ + int i; + short oom_adj; + short oom_score_adj; + int array_size = ARRAY_SIZE(lowmem_adj); + + if (lowmem_adj_size < array_size) + array_size = lowmem_adj_size; + + if (array_size <= 0) + return; + + oom_adj = lowmem_adj[array_size - 1]; + if (oom_adj > OOM_ADJUST_MAX) + return; + + oom_score_adj = lowmem_oom_adj_to_oom_score_adj(oom_adj); + if (oom_score_adj <= OOM_ADJUST_MAX) + return; + + lowmem_print(1, "lowmem_shrink: convert oom_adj to oom_score_adj:\n"); + for (i = 0; i < array_size; i++) { + oom_adj = lowmem_adj[i]; + oom_score_adj = lowmem_oom_adj_to_oom_score_adj(oom_adj); + lowmem_adj[i] = oom_score_adj; + lowmem_print(1, "oom_adj %d => oom_score_adj %d\n", + oom_adj, oom_score_adj); + } +} + +static int lowmem_adj_array_set(const char *val, const struct kernel_param *kp) +{ + int ret; + + ret = param_array_ops.set(val, kp); + + /* HACK: Autodetect oom_adj values in lowmem_adj array */ + lowmem_autodetect_oom_adj_values(); + + return ret; +} + +static int lowmem_adj_array_get(char *buffer, const struct kernel_param *kp) +{ + return param_array_ops.get(buffer, kp); +} + +static void lowmem_adj_array_free(void *arg) +{ + param_array_ops.free(arg); +} + +static struct kernel_param_ops lowmem_adj_array_ops = { + .set = lowmem_adj_array_set, + .get = lowmem_adj_array_get, + .free = lowmem_adj_array_free, +}; + +static const struct kparam_array __param_arr_adj = { + .max = ARRAY_SIZE(lowmem_adj), + .num = &lowmem_adj_size, + .ops = ¶m_ops_short, + .elemsize = sizeof(lowmem_adj[0]), + .elem = lowmem_adj, +}; +#endif + /* * not really modular, but the easiest way to keep compat with existing * bootargs behaviour is to continue using module_param here. */ module_param_named(cost, lowmem_shrinker.seeks, int, 0644); +#ifdef CONFIG_ANDROID_LOW_MEMORY_KILLER_AUTODETECT_OOM_ADJ_VALUES +module_param_cb(adj, &lowmem_adj_array_ops, + .arr = &__param_arr_adj, + 0644); +__MODULE_PARM_TYPE(adj, "array of short"); +#else module_param_array_named(adj, lowmem_adj, short, &lowmem_adj_size, 0644); +#endif module_param_array_named(minfree, lowmem_minfree, uint, &lowmem_minfree_size, 0644); module_param_named(debug_level, lowmem_debug_level, uint, 0644);
diff --git a/drivers/staging/android/trace/lowmemorykiller.h b/drivers/staging/android/trace/lowmemorykiller.h new file mode 100644 index 0000000..f43d3fa --- /dev/null +++ b/drivers/staging/android/trace/lowmemorykiller.h
@@ -0,0 +1,41 @@ +#undef TRACE_SYSTEM +#define TRACE_INCLUDE_PATH ../../drivers/staging/android/trace +#define TRACE_SYSTEM lowmemorykiller + +#if !defined(_TRACE_LOWMEMORYKILLER_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_LOWMEMORYKILLER_H + +#include <linux/tracepoint.h> + +TRACE_EVENT(lowmemory_kill, + TP_PROTO(struct task_struct *killed_task, long cache_size, \ + long cache_limit, long free), + + TP_ARGS(killed_task, cache_size, cache_limit, free), + + TP_STRUCT__entry( + __array(char, comm, TASK_COMM_LEN) + __field(pid_t, pid) + __field(long, pagecache_size) + __field(long, pagecache_limit) + __field(long, free) + ), + + TP_fast_assign( + memcpy(__entry->comm, killed_task->comm, TASK_COMM_LEN); + __entry->pid = killed_task->pid; + __entry->pagecache_size = cache_size; + __entry->pagecache_limit = cache_limit; + __entry->free = free; + ), + + TP_printk("%s (%d), page cache %ldkB (limit %ldkB), free %ldKb", + __entry->comm, __entry->pid, __entry->pagecache_size, + __entry->pagecache_limit, __entry->free) +); + + +#endif /* if !defined(_TRACE_LOWMEMORYKILLER_H) || defined(TRACE_HEADER_MULTI_READ) */ + +/* This part must be outside protection */ +#include <trace/define_trace.h>
diff --git a/drivers/staging/android/uapi/vsoc_shm.h b/drivers/staging/android/uapi/vsoc_shm.h new file mode 100644 index 0000000..741b138 --- /dev/null +++ b/drivers/staging/android/uapi/vsoc_shm.h
@@ -0,0 +1,303 @@ +/* + * Copyright (C) 2017 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#ifndef _UAPI_LINUX_VSOC_SHM_H +#define _UAPI_LINUX_VSOC_SHM_H + +#include <linux/types.h> + +/** + * A permission is a token that permits a receiver to read and/or write an area + * of memory within a Vsoc region. + * + * An fd_scoped permission grants both read and write access, and can be + * attached to a file description (see open(2)). + * Ownership of the area can then be shared by passing a file descriptor + * among processes. + * + * begin_offset and end_offset define the area of memory that is controlled by + * the permission. owner_offset points to a word, also in shared memory, that + * controls ownership of the area. + * + * ownership of the region expires when the associated file description is + * released. + * + * At most one permission can be attached to each file description. + * + * This is useful when implementing HALs like gralloc that scope and pass + * ownership of shared resources via file descriptors. + * + * The caller is responsibe for doing any fencing. + * + * The calling process will normally identify a currently free area of + * memory. It will construct a proposed fd_scoped_permission_arg structure: + * + * begin_offset and end_offset describe the area being claimed + * + * owner_offset points to the location in shared memory that indicates the + * owner of the area. + * + * owned_value is the value that will be stored in owner_offset iff the + * permission can be granted. It must be different than VSOC_REGION_FREE. + * + * Two fd_scoped_permission structures are compatible if they vary only by + * their owned_value fields. + * + * The driver ensures that, for any group of simultaneous callers proposing + * compatible fd_scoped_permissions, it will accept exactly one of the + * propopsals. The other callers will get a failure with errno of EAGAIN. + * + * A process receiving a file descriptor can identify the region being + * granted using the VSOC_GET_FD_SCOPED_PERMISSION ioctl. + */ +struct fd_scoped_permission { + __u32 begin_offset; + __u32 end_offset; + __u32 owner_offset; + __u32 owned_value; +}; + +/* + * This value represents a free area of memory. The driver expects to see this + * value at owner_offset when creating a permission otherwise it will not do it, + * and will write this value back once the permission is no longer needed. + */ +#define VSOC_REGION_FREE ((__u32)0) + +/** + * ioctl argument for VSOC_CREATE_FD_SCOPE_PERMISSION + */ +struct fd_scoped_permission_arg { + struct fd_scoped_permission perm; + __s32 managed_region_fd; +}; + +#define VSOC_NODE_FREE ((__u32)0) + +/* + * Describes a signal table in shared memory. Each non-zero entry in the + * table indicates that the receiver should signal the futex at the given + * offset. Offsets are relative to the region, not the shared memory window. + * + * interrupt_signalled_offset is used to reliably signal interrupts across the + * vmm boundary. There are two roles: transmitter and receiver. For example, + * in the host_to_guest_signal_table the host is the transmitter and the + * guest is the receiver. The protocol is as follows: + * + * 1. The transmitter should convert the offset of the futex to an offset + * in the signal table [0, (1 << num_nodes_lg2)) + * The transmitter can choose any appropriate hashing algorithm, including + * hash = futex_offset & ((1 << num_nodes_lg2) - 1) + * + * 3. The transmitter should atomically compare and swap futex_offset with 0 + * at hash. There are 3 possible outcomes + * a. The swap fails because the futex_offset is already in the table. + * The transmitter should stop. + * b. Some other offset is in the table. This is a hash collision. The + * transmitter should move to another table slot and try again. One + * possible algorithm: + * hash = (hash + 1) & ((1 << num_nodes_lg2) - 1) + * c. The swap worked. Continue below. + * + * 3. The transmitter atomically swaps 1 with the value at the + * interrupt_signalled_offset. There are two outcomes: + * a. The prior value was 1. In this case an interrupt has already been + * posted. The transmitter is done. + * b. The prior value was 0, indicating that the receiver may be sleeping. + * The transmitter will issue an interrupt. + * + * 4. On waking the receiver immediately exchanges a 0 with the + * interrupt_signalled_offset. If it receives a 0 then this a spurious + * interrupt. That may occasionally happen in the current protocol, but + * should be rare. + * + * 5. The receiver scans the signal table by atomicaly exchanging 0 at each + * location. If a non-zero offset is returned from the exchange the + * receiver wakes all sleepers at the given offset: + * futex((int*)(region_base + old_value), FUTEX_WAKE, MAX_INT); + * + * 6. The receiver thread then does a conditional wait, waking immediately + * if the value at interrupt_signalled_offset is non-zero. This catches cases + * here additional signals were posted while the table was being scanned. + * On the guest the wait is handled via the VSOC_WAIT_FOR_INCOMING_INTERRUPT + * ioctl. + */ +struct vsoc_signal_table_layout { + /* log_2(Number of signal table entries) */ + __u32 num_nodes_lg2; + /* + * Offset to the first signal table entry relative to the start of the + * region + */ + __u32 futex_uaddr_table_offset; + /* + * Offset to an atomic_t / atomic uint32_t. A non-zero value indicates + * that one or more offsets are currently posted in the table. + * semi-unique access to an entry in the table + */ + __u32 interrupt_signalled_offset; +}; + +#define VSOC_REGION_WHOLE ((__s32)0) +#define VSOC_DEVICE_NAME_SZ 16 + +/** + * Each HAL would (usually) talk to a single device region + * Mulitple entities care about these regions: + * - The ivshmem_server will populate the regions in shared memory + * - The guest kernel will read the region, create minor device nodes, and + * allow interested parties to register for FUTEX_WAKE events in the region + * - HALs will access via the minor device nodes published by the guest kernel + * - Host side processes will access the region via the ivshmem_server: + * 1. Pass name to ivshmem_server at a UNIX socket + * 2. ivshmemserver will reply with 2 fds: + * - host->guest doorbell fd + * - guest->host doorbell fd + * - fd for the shared memory region + * - region offset + * 3. Start a futex receiver thread on the doorbell fd pointed at the + * signal_nodes + */ +struct vsoc_device_region { + __u16 current_version; + __u16 min_compatible_version; + __u32 region_begin_offset; + __u32 region_end_offset; + __u32 offset_of_region_data; + struct vsoc_signal_table_layout guest_to_host_signal_table; + struct vsoc_signal_table_layout host_to_guest_signal_table; + /* Name of the device. Must always be terminated with a '\0', so + * the longest supported device name is 15 characters. + */ + char device_name[VSOC_DEVICE_NAME_SZ]; + /* There are two ways that permissions to access regions are handled: + * - When subdivided_by is VSOC_REGION_WHOLE, any process that can + * open the device node for the region gains complete access to it. + * - When subdivided is set processes that open the region cannot + * access it. Access to a sub-region must be established by invoking + * the VSOC_CREATE_FD_SCOPE_PERMISSION ioctl on the region + * referenced in subdivided_by, providing a fileinstance + * (represented by a fd) opened on this region. + */ + __u32 managed_by; +}; + +/* + * The vsoc layout descriptor. + * The first 4K should be reserved for the shm header and region descriptors. + * The regions should be page aligned. + */ + +struct vsoc_shm_layout_descriptor { + __u16 major_version; + __u16 minor_version; + + /* size of the shm. This may be redundant but nice to have */ + __u32 size; + + /* number of shared memory regions */ + __u32 region_count; + + /* The offset to the start of region descriptors */ + __u32 vsoc_region_desc_offset; +}; + +/* + * This specifies the current version that should be stored in + * vsoc_shm_layout_descriptor.major_version and + * vsoc_shm_layout_descriptor.minor_version. + * It should be updated only if the vsoc_device_region and + * vsoc_shm_layout_descriptor structures have changed. + * Versioning within each region is transferred + * via the min_compatible_version and current_version fields in + * vsoc_device_region. The driver does not consult these fields: they are left + * for the HALs and host processes and will change independently of the layout + * version. + */ +#define CURRENT_VSOC_LAYOUT_MAJOR_VERSION 2 +#define CURRENT_VSOC_LAYOUT_MINOR_VERSION 0 + +#define VSOC_CREATE_FD_SCOPED_PERMISSION \ + _IOW(0xF5, 0, struct fd_scoped_permission) +#define VSOC_GET_FD_SCOPED_PERMISSION _IOR(0xF5, 1, struct fd_scoped_permission) + +/* + * This is used to signal the host to scan the guest_to_host_signal_table + * for new futexes to wake. This sends an interrupt if one is not already + * in flight. + */ +#define VSOC_MAYBE_SEND_INTERRUPT_TO_HOST _IO(0xF5, 2) + +/* + * When this returns the guest will scan host_to_guest_signal_table to + * check for new futexes to wake. + */ +/* TODO(ghartman): Consider moving this to the bottom half */ +#define VSOC_WAIT_FOR_INCOMING_INTERRUPT _IO(0xF5, 3) + +/* + * Guest HALs will use this to retrieve the region description after + * opening their device node. + */ +#define VSOC_DESCRIBE_REGION _IOR(0xF5, 4, struct vsoc_device_region) + +/* + * Wake any threads that may be waiting for a host interrupt on this region. + * This is mostly used during shutdown. + */ +#define VSOC_SELF_INTERRUPT _IO(0xF5, 5) + +/* + * This is used to signal the host to scan the guest_to_host_signal_table + * for new futexes to wake. This sends an interrupt unconditionally. + */ +#define VSOC_SEND_INTERRUPT_TO_HOST _IO(0xF5, 6) + +enum wait_types { + VSOC_WAIT_UNDEFINED = 0, + VSOC_WAIT_IF_EQUAL = 1, + VSOC_WAIT_IF_EQUAL_TIMEOUT = 2 +}; + +/* + * Wait for a condition to be true + * + * Note, this is sized and aligned so the 32 bit and 64 bit layouts are + * identical. + */ +struct vsoc_cond_wait { + /* Input: Offset of the 32 bit word to check */ + __u32 offset; + /* Input: Value that will be compared with the offset */ + __u32 value; + /* Monotonic time to wake at in seconds */ + __u64 wake_time_sec; + /* Input: Monotonic time to wait in nanoseconds */ + __u32 wake_time_nsec; + /* Input: Type of wait */ + __u32 wait_type; + /* Output: Number of times the thread woke before returning. */ + __u32 wakes; + /* Ensure that we're 8-byte aligned and 8 byte length for 32/64 bit + * compatibility. + */ + __u32 reserved_1; +}; + +#define VSOC_COND_WAIT _IOWR(0xF5, 7, struct vsoc_cond_wait) + +/* Wake any local threads waiting at the offset given in arg */ +#define VSOC_COND_WAKE _IO(0xF5, 8) + +#endif /* _UAPI_LINUX_VSOC_SHM_H */
diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c new file mode 100644 index 0000000..954ed2c --- /dev/null +++ b/drivers/staging/android/vsoc.c
@@ -0,0 +1,1165 @@ +/* + * drivers/android/staging/vsoc.c + * + * Android Virtual System on a Chip (VSoC) driver + * + * Copyright (C) 2017 Google, Inc. + * + * Author: ghartman@google.com + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * + * Based on drivers/char/kvm_ivshmem.c - driver for KVM Inter-VM shared memory + * Copyright 2009 Cam Macdonell <cam@cs.ualberta.ca> + * + * Based on cirrusfb.c and 8139cp.c: + * Copyright 1999-2001 Jeff Garzik + * Copyright 2001-2004 Jeff Garzik + */ + +#include <linux/dma-mapping.h> +#include <linux/freezer.h> +#include <linux/futex.h> +#include <linux/init.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/mutex.h> +#include <linux/pci.h> +#include <linux/proc_fs.h> +#include <linux/sched.h> +#include <linux/syscalls.h> +#include <linux/uaccess.h> +#include <linux/interrupt.h> +#include <linux/mutex.h> +#include <linux/cdev.h> +#include <linux/file.h> +#include "uapi/vsoc_shm.h" + +#define VSOC_DEV_NAME "vsoc" + +/* + * Description of the ivshmem-doorbell PCI device used by QEmu. These + * constants follow docs/specs/ivshmem-spec.txt, which can be found in + * the QEmu repository. This was last reconciled with the version that + * came out with 2.8 + */ + +/* + * These constants are determined KVM Inter-VM shared memory device + * register offsets + */ +enum { + INTR_MASK = 0x00, /* Interrupt Mask */ + INTR_STATUS = 0x04, /* Interrupt Status */ + IV_POSITION = 0x08, /* VM ID */ + DOORBELL = 0x0c, /* Doorbell */ +}; + +static const int REGISTER_BAR; /* Equal to 0 */ +static const int MAX_REGISTER_BAR_LEN = 0x100; +/* + * The MSI-x BAR is not used directly. + * + * static const int MSI_X_BAR = 1; + */ +static const int SHARED_MEMORY_BAR = 2; + +struct vsoc_region_data { + char name[VSOC_DEVICE_NAME_SZ + 1]; + wait_queue_head_t interrupt_wait_queue; + /* TODO(b/73664181): Use multiple futex wait queues */ + wait_queue_head_t futex_wait_queue; + /* Flag indicating that an interrupt has been signalled by the host. */ + atomic_t *incoming_signalled; + /* Flag indicating the guest has signalled the host. */ + atomic_t *outgoing_signalled; + bool irq_requested; + bool device_created; +}; + +struct vsoc_device { + /* Kernel virtual address of REGISTER_BAR. */ + void __iomem *regs; + /* Physical address of SHARED_MEMORY_BAR. */ + phys_addr_t shm_phys_start; + /* Kernel virtual address of SHARED_MEMORY_BAR. */ + void __iomem *kernel_mapped_shm; + /* Size of the entire shared memory window in bytes. */ + size_t shm_size; + /* + * Pointer to the virtual address of the shared memory layout structure. + * This is probably identical to kernel_mapped_shm, but saving this + * here saves a lot of annoying casts. + */ + struct vsoc_shm_layout_descriptor *layout; + /* + * Points to a table of region descriptors in the kernel's virtual + * address space. Calculated from + * vsoc_shm_layout_descriptor.vsoc_region_desc_offset + */ + struct vsoc_device_region *regions; + /* Head of a list of permissions that have been granted. */ + struct list_head permissions; + struct pci_dev *dev; + /* Per-region (and therefore per-interrupt) information. */ + struct vsoc_region_data *regions_data; + /* + * Table of msi-x entries. This has to be separated from struct + * vsoc_region_data because the kernel deals with them as an array. + */ + struct msix_entry *msix_entries; + /* Mutex that protectes the permission list */ + struct mutex mtx; + /* Major number assigned by the kernel */ + int major; + /* Character device assigned by the kernel */ + struct cdev cdev; + /* Device class assigned by the kernel */ + struct class *class; + /* + * Flags that indicate what we've initialized. These are used to do an + * orderly cleanup of the device. + */ + bool enabled_device; + bool requested_regions; + bool cdev_added; + bool class_added; + bool msix_enabled; +}; + +static struct vsoc_device vsoc_dev; + +/* + * TODO(ghartman): Add a /sys filesystem entry that summarizes the permissions. + */ + +struct fd_scoped_permission_node { + struct fd_scoped_permission permission; + struct list_head list; +}; + +struct vsoc_private_data { + struct fd_scoped_permission_node *fd_scoped_permission_node; +}; + +static long vsoc_ioctl(struct file *, unsigned int, unsigned long); +static int vsoc_mmap(struct file *, struct vm_area_struct *); +static int vsoc_open(struct inode *, struct file *); +static int vsoc_release(struct inode *, struct file *); +static ssize_t vsoc_read(struct file *, char __user *, size_t, loff_t *); +static ssize_t vsoc_write(struct file *, const char __user *, size_t, loff_t *); +static loff_t vsoc_lseek(struct file *filp, loff_t offset, int origin); +static int do_create_fd_scoped_permission( + struct vsoc_device_region *region_p, + struct fd_scoped_permission_node *np, + struct fd_scoped_permission_arg __user *arg); +static void do_destroy_fd_scoped_permission( + struct vsoc_device_region *owner_region_p, + struct fd_scoped_permission *perm); +static long do_vsoc_describe_region(struct file *, + struct vsoc_device_region __user *); +static ssize_t vsoc_get_area(struct file *filp, __u32 *perm_off); + +/** + * Validate arguments on entry points to the driver. + */ +inline int vsoc_validate_inode(struct inode *inode) +{ + if (iminor(inode) >= vsoc_dev.layout->region_count) { + dev_err(&vsoc_dev.dev->dev, + "describe_region: invalid region %d\n", iminor(inode)); + return -ENODEV; + } + return 0; +} + +inline int vsoc_validate_filep(struct file *filp) +{ + int ret = vsoc_validate_inode(file_inode(filp)); + + if (ret) + return ret; + if (!filp->private_data) { + dev_err(&vsoc_dev.dev->dev, + "No private data on fd, region %d\n", + iminor(file_inode(filp))); + return -EBADFD; + } + return 0; +} + +/* Converts from shared memory offset to virtual address */ +static inline void *shm_off_to_virtual_addr(__u32 offset) +{ + return (void __force *)vsoc_dev.kernel_mapped_shm + offset; +} + +/* Converts from shared memory offset to physical address */ +static inline phys_addr_t shm_off_to_phys_addr(__u32 offset) +{ + return vsoc_dev.shm_phys_start + offset; +} + +/** + * Convenience functions to obtain the region from the inode or file. + * Dangerous to call before validating the inode/file. + */ +static inline struct vsoc_device_region *vsoc_region_from_inode( + struct inode *inode) +{ + return &vsoc_dev.regions[iminor(inode)]; +} + +static inline struct vsoc_device_region *vsoc_region_from_filep( + struct file *inode) +{ + return vsoc_region_from_inode(file_inode(inode)); +} + +static inline uint32_t vsoc_device_region_size(struct vsoc_device_region *r) +{ + return r->region_end_offset - r->region_begin_offset; +} + +static const struct file_operations vsoc_ops = { + .owner = THIS_MODULE, + .open = vsoc_open, + .mmap = vsoc_mmap, + .read = vsoc_read, + .unlocked_ioctl = vsoc_ioctl, + .compat_ioctl = vsoc_ioctl, + .write = vsoc_write, + .llseek = vsoc_lseek, + .release = vsoc_release, +}; + +static struct pci_device_id vsoc_id_table[] = { + {0x1af4, 0x1110, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, + {0}, +}; + +MODULE_DEVICE_TABLE(pci, vsoc_id_table); + +static void vsoc_remove_device(struct pci_dev *pdev); +static int vsoc_probe_device(struct pci_dev *pdev, + const struct pci_device_id *ent); + +static struct pci_driver vsoc_pci_driver = { + .name = "vsoc", + .id_table = vsoc_id_table, + .probe = vsoc_probe_device, + .remove = vsoc_remove_device, +}; + +static int do_create_fd_scoped_permission( + struct vsoc_device_region *region_p, + struct fd_scoped_permission_node *np, + struct fd_scoped_permission_arg __user *arg) +{ + struct file *managed_filp; + s32 managed_fd; + atomic_t *owner_ptr = NULL; + struct vsoc_device_region *managed_region_p; + + if (copy_from_user(&np->permission, &arg->perm, sizeof(*np)) || + copy_from_user(&managed_fd, + &arg->managed_region_fd, sizeof(managed_fd))) { + return -EFAULT; + } + managed_filp = fdget(managed_fd).file; + /* Check that it's a valid fd, */ + if (!managed_filp || vsoc_validate_filep(managed_filp)) + return -EPERM; + /* EEXIST if the given fd already has a permission. */ + if (((struct vsoc_private_data *)managed_filp->private_data)-> + fd_scoped_permission_node) + return -EEXIST; + managed_region_p = vsoc_region_from_filep(managed_filp); + /* Check that the provided region is managed by this one */ + if (&vsoc_dev.regions[managed_region_p->managed_by] != region_p) + return -EPERM; + /* The area must be well formed and have non-zero size */ + if (np->permission.begin_offset >= np->permission.end_offset) + return -EINVAL; + /* The area must fit in the memory window */ + if (np->permission.end_offset > + vsoc_device_region_size(managed_region_p)) + return -ERANGE; + /* The area must be in the region data section */ + if (np->permission.begin_offset < + managed_region_p->offset_of_region_data) + return -ERANGE; + /* The area must be page aligned */ + if (!PAGE_ALIGNED(np->permission.begin_offset) || + !PAGE_ALIGNED(np->permission.end_offset)) + return -EINVAL; + /* Owner offset must be naturally aligned in the window */ + if (np->permission.owner_offset & + (sizeof(np->permission.owner_offset) - 1)) + return -EINVAL; + /* The owner flag must reside in the owner memory */ + if (np->permission.owner_offset + sizeof(np->permission.owner_offset) > + vsoc_device_region_size(region_p)) + return -ERANGE; + /* The owner flag must reside in the data section */ + if (np->permission.owner_offset < region_p->offset_of_region_data) + return -EINVAL; + /* The owner value must change to claim the memory */ + if (np->permission.owned_value == VSOC_REGION_FREE) + return -EINVAL; + owner_ptr = + (atomic_t *)shm_off_to_virtual_addr(region_p->region_begin_offset + + np->permission.owner_offset); + /* We've already verified that this is in the shared memory window, so + * it should be safe to write to this address. + */ + if (atomic_cmpxchg(owner_ptr, + VSOC_REGION_FREE, + np->permission.owned_value) != VSOC_REGION_FREE) { + return -EBUSY; + } + ((struct vsoc_private_data *)managed_filp->private_data)-> + fd_scoped_permission_node = np; + /* The file offset needs to be adjusted if the calling + * process did any read/write operations on the fd + * before creating the permission. + */ + if (managed_filp->f_pos) { + if (managed_filp->f_pos > np->permission.end_offset) { + /* If the offset is beyond the permission end, set it + * to the end. + */ + managed_filp->f_pos = np->permission.end_offset; + } else { + /* If the offset is within the permission interval + * keep it there otherwise reset it to zero. + */ + if (managed_filp->f_pos < np->permission.begin_offset) { + managed_filp->f_pos = 0; + } else { + managed_filp->f_pos -= + np->permission.begin_offset; + } + } + } + return 0; +} + +static void do_destroy_fd_scoped_permission_node( + struct vsoc_device_region *owner_region_p, + struct fd_scoped_permission_node *node) +{ + if (node) { + do_destroy_fd_scoped_permission(owner_region_p, + &node->permission); + mutex_lock(&vsoc_dev.mtx); + list_del(&node->list); + mutex_unlock(&vsoc_dev.mtx); + kfree(node); + } +} + +static void do_destroy_fd_scoped_permission( + struct vsoc_device_region *owner_region_p, + struct fd_scoped_permission *perm) +{ + atomic_t *owner_ptr = NULL; + int prev = 0; + + if (!perm) + return; + owner_ptr = (atomic_t *)shm_off_to_virtual_addr( + owner_region_p->region_begin_offset + perm->owner_offset); + prev = atomic_xchg(owner_ptr, VSOC_REGION_FREE); + if (prev != perm->owned_value) + dev_err(&vsoc_dev.dev->dev, + "%x-%x: owner (%s) %x: expected to be %x was %x", + perm->begin_offset, perm->end_offset, + owner_region_p->device_name, perm->owner_offset, + perm->owned_value, prev); +} + +static long do_vsoc_describe_region(struct file *filp, + struct vsoc_device_region __user *dest) +{ + struct vsoc_device_region *region_p; + int retval = vsoc_validate_filep(filp); + + if (retval) + return retval; + region_p = vsoc_region_from_filep(filp); + if (copy_to_user(dest, region_p, sizeof(*region_p))) + return -EFAULT; + return 0; +} + +/** + * Implements the inner logic of cond_wait. Copies to and from userspace are + * done in the helper function below. + */ +static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg) +{ + DEFINE_WAIT(wait); + u32 region_number = iminor(file_inode(filp)); + struct vsoc_region_data *data = vsoc_dev.regions_data + region_number; + struct hrtimer_sleeper timeout, *to = NULL; + int ret = 0; + struct vsoc_device_region *region_p = vsoc_region_from_filep(filp); + atomic_t *address = NULL; + struct timespec ts; + + /* Ensure that the offset is aligned */ + if (arg->offset & (sizeof(uint32_t) - 1)) + return -EADDRNOTAVAIL; + /* Ensure that the offset is within shared memory */ + if (((uint64_t)arg->offset) + region_p->region_begin_offset + + sizeof(uint32_t) > region_p->region_end_offset) + return -E2BIG; + address = shm_off_to_virtual_addr(region_p->region_begin_offset + + arg->offset); + + /* Ensure that the type of wait is valid */ + switch (arg->wait_type) { + case VSOC_WAIT_IF_EQUAL: + break; + case VSOC_WAIT_IF_EQUAL_TIMEOUT: + to = &timeout; + break; + default: + return -EINVAL; + } + + if (to) { + /* Copy the user-supplied timesec into the kernel structure. + * We do things this way to flatten differences between 32 bit + * and 64 bit timespecs. + */ + ts.tv_sec = arg->wake_time_sec; + ts.tv_nsec = arg->wake_time_nsec; + + if (!timespec_valid(&ts)) + return -EINVAL; + hrtimer_init_on_stack(&to->timer, CLOCK_MONOTONIC, + HRTIMER_MODE_ABS); + hrtimer_set_expires_range_ns(&to->timer, timespec_to_ktime(ts), + current->timer_slack_ns); + + hrtimer_init_sleeper(to, current); + } + + while (1) { + prepare_to_wait(&data->futex_wait_queue, &wait, + TASK_INTERRUPTIBLE); + /* + * Check the sentinel value after prepare_to_wait. If the value + * changes after this check the writer will call signal, + * changing the task state from INTERRUPTIBLE to RUNNING. That + * will ensure that schedule() will eventually schedule this + * task. + */ + if (atomic_read(address) != arg->value) { + ret = 0; + break; + } + if (to) { + hrtimer_start_expires(&to->timer, HRTIMER_MODE_ABS); + if (likely(to->task)) + freezable_schedule(); + hrtimer_cancel(&to->timer); + if (!to->task) { + ret = -ETIMEDOUT; + break; + } + } else { + freezable_schedule(); + } + /* Count the number of times that we woke up. This is useful + * for unit testing. + */ + ++arg->wakes; + if (signal_pending(current)) { + ret = -EINTR; + break; + } + } + finish_wait(&data->futex_wait_queue, &wait); + if (to) + destroy_hrtimer_on_stack(&to->timer); + return ret; +} + +/** + * Handles the details of copying from/to userspace to ensure that the copies + * happen on all of the return paths of cond_wait. + */ +static int do_vsoc_cond_wait(struct file *filp, + struct vsoc_cond_wait __user *untrusted_in) +{ + struct vsoc_cond_wait arg; + int rval = 0; + + if (copy_from_user(&arg, untrusted_in, sizeof(arg))) + return -EFAULT; + /* wakes is an out parameter. Initialize it to something sensible. */ + arg.wakes = 0; + rval = handle_vsoc_cond_wait(filp, &arg); + if (copy_to_user(untrusted_in, &arg, sizeof(arg))) + return -EFAULT; + return rval; +} + +static int do_vsoc_cond_wake(struct file *filp, uint32_t offset) +{ + struct vsoc_device_region *region_p = vsoc_region_from_filep(filp); + u32 region_number = iminor(file_inode(filp)); + struct vsoc_region_data *data = vsoc_dev.regions_data + region_number; + /* Ensure that the offset is aligned */ + if (offset & (sizeof(uint32_t) - 1)) + return -EADDRNOTAVAIL; + /* Ensure that the offset is within shared memory */ + if (((uint64_t)offset) + region_p->region_begin_offset + + sizeof(uint32_t) > region_p->region_end_offset) + return -E2BIG; + /* + * TODO(b/73664181): Use multiple futex wait queues. + * We need to wake every sleeper when the condition changes. Typically + * only a single thread will be waiting on the condition, but there + * are exceptions. The worst case is about 10 threads. + */ + wake_up_interruptible_all(&data->futex_wait_queue); + return 0; +} + +static long vsoc_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) +{ + int rv = 0; + struct vsoc_device_region *region_p; + u32 reg_num; + struct vsoc_region_data *reg_data; + int retval = vsoc_validate_filep(filp); + + if (retval) + return retval; + region_p = vsoc_region_from_filep(filp); + reg_num = iminor(file_inode(filp)); + reg_data = vsoc_dev.regions_data + reg_num; + switch (cmd) { + case VSOC_CREATE_FD_SCOPED_PERMISSION: + { + struct fd_scoped_permission_node *node = NULL; + + node = kzalloc(sizeof(*node), GFP_KERNEL); + /* We can't allocate memory for the permission */ + if (!node) + return -ENOMEM; + INIT_LIST_HEAD(&node->list); + rv = do_create_fd_scoped_permission( + region_p, + node, + (struct fd_scoped_permission_arg __user *)arg); + if (!rv) { + mutex_lock(&vsoc_dev.mtx); + list_add(&node->list, &vsoc_dev.permissions); + mutex_unlock(&vsoc_dev.mtx); + } else { + kfree(node); + return rv; + } + } + break; + + case VSOC_GET_FD_SCOPED_PERMISSION: + { + struct fd_scoped_permission_node *node = + ((struct vsoc_private_data *)filp->private_data)-> + fd_scoped_permission_node; + if (!node) + return -ENOENT; + if (copy_to_user + ((struct fd_scoped_permission __user *)arg, + &node->permission, sizeof(node->permission))) + return -EFAULT; + } + break; + + case VSOC_MAYBE_SEND_INTERRUPT_TO_HOST: + if (!atomic_xchg( + reg_data->outgoing_signalled, + 1)) { + writel(reg_num, vsoc_dev.regs + DOORBELL); + return 0; + } else { + return -EBUSY; + } + break; + + case VSOC_SEND_INTERRUPT_TO_HOST: + writel(reg_num, vsoc_dev.regs + DOORBELL); + return 0; + + case VSOC_WAIT_FOR_INCOMING_INTERRUPT: + wait_event_interruptible( + reg_data->interrupt_wait_queue, + (atomic_read(reg_data->incoming_signalled) != 0)); + break; + + case VSOC_DESCRIBE_REGION: + return do_vsoc_describe_region( + filp, + (struct vsoc_device_region __user *)arg); + + case VSOC_SELF_INTERRUPT: + atomic_set(reg_data->incoming_signalled, 1); + wake_up_interruptible(®_data->interrupt_wait_queue); + break; + + case VSOC_COND_WAIT: + return do_vsoc_cond_wait(filp, + (struct vsoc_cond_wait __user *)arg); + case VSOC_COND_WAKE: + return do_vsoc_cond_wake(filp, arg); + + default: + return -EINVAL; + } + return 0; +} + +static ssize_t vsoc_read(struct file *filp, char __user *buffer, size_t len, + loff_t *poffset) +{ + __u32 area_off; + const void *area_p; + ssize_t area_len; + int retval = vsoc_validate_filep(filp); + + if (retval) + return retval; + area_len = vsoc_get_area(filp, &area_off); + area_p = shm_off_to_virtual_addr(area_off); + area_p += *poffset; + area_len -= *poffset; + if (area_len <= 0) + return 0; + if (area_len < len) + len = area_len; + if (copy_to_user(buffer, area_p, len)) + return -EFAULT; + *poffset += len; + return len; +} + +static loff_t vsoc_lseek(struct file *filp, loff_t offset, int origin) +{ + ssize_t area_len = 0; + int retval = vsoc_validate_filep(filp); + + if (retval) + return retval; + area_len = vsoc_get_area(filp, NULL); + switch (origin) { + case SEEK_SET: + break; + + case SEEK_CUR: + if (offset > 0 && offset + filp->f_pos < 0) + return -EOVERFLOW; + offset += filp->f_pos; + break; + + case SEEK_END: + if (offset > 0 && offset + area_len < 0) + return -EOVERFLOW; + offset += area_len; + break; + + case SEEK_DATA: + if (offset >= area_len) + return -EINVAL; + if (offset < 0) + offset = 0; + break; + + case SEEK_HOLE: + /* Next hole is always the end of the region, unless offset is + * beyond that + */ + if (offset < area_len) + offset = area_len; + break; + + default: + return -EINVAL; + } + + if (offset < 0 || offset > area_len) + return -EINVAL; + filp->f_pos = offset; + + return offset; +} + +static ssize_t vsoc_write(struct file *filp, const char __user *buffer, + size_t len, loff_t *poffset) +{ + __u32 area_off; + void *area_p; + ssize_t area_len; + int retval = vsoc_validate_filep(filp); + + if (retval) + return retval; + area_len = vsoc_get_area(filp, &area_off); + area_p = shm_off_to_virtual_addr(area_off); + area_p += *poffset; + area_len -= *poffset; + if (area_len <= 0) + return 0; + if (area_len < len) + len = area_len; + if (copy_from_user(area_p, buffer, len)) + return -EFAULT; + *poffset += len; + return len; +} + +static irqreturn_t vsoc_interrupt(int irq, void *region_data_v) +{ + struct vsoc_region_data *region_data = + (struct vsoc_region_data *)region_data_v; + int reg_num = region_data - vsoc_dev.regions_data; + + if (unlikely(!region_data)) + return IRQ_NONE; + + if (unlikely(reg_num < 0 || + reg_num >= vsoc_dev.layout->region_count)) { + dev_err(&vsoc_dev.dev->dev, + "invalid irq @%p reg_num=0x%04x\n", + region_data, reg_num); + return IRQ_NONE; + } + if (unlikely(vsoc_dev.regions_data + reg_num != region_data)) { + dev_err(&vsoc_dev.dev->dev, + "irq not aligned @%p reg_num=0x%04x\n", + region_data, reg_num); + return IRQ_NONE; + } + wake_up_interruptible(®ion_data->interrupt_wait_queue); + return IRQ_HANDLED; +} + +static int vsoc_probe_device(struct pci_dev *pdev, + const struct pci_device_id *ent) +{ + int result; + int i; + resource_size_t reg_size; + dev_t devt; + + vsoc_dev.dev = pdev; + result = pci_enable_device(pdev); + if (result) { + dev_err(&pdev->dev, + "pci_enable_device failed %s: error %d\n", + pci_name(pdev), result); + return result; + } + vsoc_dev.enabled_device = true; + result = pci_request_regions(pdev, "vsoc"); + if (result < 0) { + dev_err(&pdev->dev, "pci_request_regions failed\n"); + vsoc_remove_device(pdev); + return -EBUSY; + } + vsoc_dev.requested_regions = true; + /* Set up the control registers in BAR 0 */ + reg_size = pci_resource_len(pdev, REGISTER_BAR); + if (reg_size > MAX_REGISTER_BAR_LEN) + vsoc_dev.regs = + pci_iomap(pdev, REGISTER_BAR, MAX_REGISTER_BAR_LEN); + else + vsoc_dev.regs = pci_iomap(pdev, REGISTER_BAR, reg_size); + + if (!vsoc_dev.regs) { + dev_err(&pdev->dev, + "cannot map registers of size %zu\n", + (size_t)reg_size); + vsoc_remove_device(pdev); + return -EBUSY; + } + + /* Map the shared memory in BAR 2 */ + vsoc_dev.shm_phys_start = pci_resource_start(pdev, SHARED_MEMORY_BAR); + vsoc_dev.shm_size = pci_resource_len(pdev, SHARED_MEMORY_BAR); + + dev_info(&pdev->dev, "shared memory @ DMA %pa size=0x%zx\n", + &vsoc_dev.shm_phys_start, vsoc_dev.shm_size); + vsoc_dev.kernel_mapped_shm = pci_iomap_wc(pdev, SHARED_MEMORY_BAR, 0); + if (!vsoc_dev.kernel_mapped_shm) { + dev_err(&vsoc_dev.dev->dev, "cannot iomap region\n"); + vsoc_remove_device(pdev); + return -EBUSY; + } + + vsoc_dev.layout = (struct vsoc_shm_layout_descriptor __force *) + vsoc_dev.kernel_mapped_shm; + dev_info(&pdev->dev, "major_version: %d\n", + vsoc_dev.layout->major_version); + dev_info(&pdev->dev, "minor_version: %d\n", + vsoc_dev.layout->minor_version); + dev_info(&pdev->dev, "size: 0x%x\n", vsoc_dev.layout->size); + dev_info(&pdev->dev, "regions: %d\n", vsoc_dev.layout->region_count); + if (vsoc_dev.layout->major_version != + CURRENT_VSOC_LAYOUT_MAJOR_VERSION) { + dev_err(&vsoc_dev.dev->dev, + "driver supports only major_version %d\n", + CURRENT_VSOC_LAYOUT_MAJOR_VERSION); + vsoc_remove_device(pdev); + return -EBUSY; + } + result = alloc_chrdev_region(&devt, 0, vsoc_dev.layout->region_count, + VSOC_DEV_NAME); + if (result) { + dev_err(&vsoc_dev.dev->dev, "alloc_chrdev_region failed\n"); + vsoc_remove_device(pdev); + return -EBUSY; + } + vsoc_dev.major = MAJOR(devt); + cdev_init(&vsoc_dev.cdev, &vsoc_ops); + vsoc_dev.cdev.owner = THIS_MODULE; + result = cdev_add(&vsoc_dev.cdev, devt, vsoc_dev.layout->region_count); + if (result) { + dev_err(&vsoc_dev.dev->dev, "cdev_add error\n"); + vsoc_remove_device(pdev); + return -EBUSY; + } + vsoc_dev.cdev_added = true; + vsoc_dev.class = class_create(THIS_MODULE, VSOC_DEV_NAME); + if (IS_ERR(vsoc_dev.class)) { + dev_err(&vsoc_dev.dev->dev, "class_create failed\n"); + vsoc_remove_device(pdev); + return PTR_ERR(vsoc_dev.class); + } + vsoc_dev.class_added = true; + vsoc_dev.regions = (struct vsoc_device_region __force *) + ((void *)vsoc_dev.layout + + vsoc_dev.layout->vsoc_region_desc_offset); + vsoc_dev.msix_entries = kcalloc( + vsoc_dev.layout->region_count, + sizeof(vsoc_dev.msix_entries[0]), GFP_KERNEL); + if (!vsoc_dev.msix_entries) { + dev_err(&vsoc_dev.dev->dev, + "unable to allocate msix_entries\n"); + vsoc_remove_device(pdev); + return -ENOSPC; + } + vsoc_dev.regions_data = kcalloc( + vsoc_dev.layout->region_count, + sizeof(vsoc_dev.regions_data[0]), GFP_KERNEL); + if (!vsoc_dev.regions_data) { + dev_err(&vsoc_dev.dev->dev, + "unable to allocate regions' data\n"); + vsoc_remove_device(pdev); + return -ENOSPC; + } + for (i = 0; i < vsoc_dev.layout->region_count; ++i) + vsoc_dev.msix_entries[i].entry = i; + + result = pci_enable_msix_exact(vsoc_dev.dev, vsoc_dev.msix_entries, + vsoc_dev.layout->region_count); + if (result) { + dev_info(&pdev->dev, "pci_enable_msix failed: %d\n", result); + vsoc_remove_device(pdev); + return -ENOSPC; + } + /* Check that all regions are well formed */ + for (i = 0; i < vsoc_dev.layout->region_count; ++i) { + const struct vsoc_device_region *region = vsoc_dev.regions + i; + + if (!PAGE_ALIGNED(region->region_begin_offset) || + !PAGE_ALIGNED(region->region_end_offset)) { + dev_err(&vsoc_dev.dev->dev, + "region %d not aligned (%x:%x)", i, + region->region_begin_offset, + region->region_end_offset); + vsoc_remove_device(pdev); + return -EFAULT; + } + if (region->region_begin_offset >= region->region_end_offset || + region->region_end_offset > vsoc_dev.shm_size) { + dev_err(&vsoc_dev.dev->dev, + "region %d offsets are wrong: %x %x %zx", + i, region->region_begin_offset, + region->region_end_offset, vsoc_dev.shm_size); + vsoc_remove_device(pdev); + return -EFAULT; + } + if (region->managed_by >= vsoc_dev.layout->region_count) { + dev_err(&vsoc_dev.dev->dev, + "region %d has invalid owner: %u", + i, region->managed_by); + vsoc_remove_device(pdev); + return -EFAULT; + } + } + vsoc_dev.msix_enabled = true; + for (i = 0; i < vsoc_dev.layout->region_count; ++i) { + const struct vsoc_device_region *region = vsoc_dev.regions + i; + size_t name_sz = sizeof(vsoc_dev.regions_data[i].name) - 1; + const struct vsoc_signal_table_layout *h_to_g_signal_table = + ®ion->host_to_guest_signal_table; + const struct vsoc_signal_table_layout *g_to_h_signal_table = + ®ion->guest_to_host_signal_table; + + vsoc_dev.regions_data[i].name[name_sz] = '\0'; + memcpy(vsoc_dev.regions_data[i].name, region->device_name, + name_sz); + dev_info(&pdev->dev, "region %d name=%s\n", + i, vsoc_dev.regions_data[i].name); + init_waitqueue_head( + &vsoc_dev.regions_data[i].interrupt_wait_queue); + init_waitqueue_head(&vsoc_dev.regions_data[i].futex_wait_queue); + vsoc_dev.regions_data[i].incoming_signalled = + shm_off_to_virtual_addr(region->region_begin_offset) + + h_to_g_signal_table->interrupt_signalled_offset; + vsoc_dev.regions_data[i].outgoing_signalled = + shm_off_to_virtual_addr(region->region_begin_offset) + + g_to_h_signal_table->interrupt_signalled_offset; + result = request_irq( + vsoc_dev.msix_entries[i].vector, + vsoc_interrupt, 0, + vsoc_dev.regions_data[i].name, + vsoc_dev.regions_data + i); + if (result) { + dev_info(&pdev->dev, + "request_irq failed irq=%d vector=%d\n", + i, vsoc_dev.msix_entries[i].vector); + vsoc_remove_device(pdev); + return -ENOSPC; + } + vsoc_dev.regions_data[i].irq_requested = true; + if (!device_create(vsoc_dev.class, NULL, + MKDEV(vsoc_dev.major, i), + NULL, vsoc_dev.regions_data[i].name)) { + dev_err(&vsoc_dev.dev->dev, "device_create failed\n"); + vsoc_remove_device(pdev); + return -EBUSY; + } + vsoc_dev.regions_data[i].device_created = true; + } + return 0; +} + +/* + * This should undo all of the allocations in the probe function in reverse + * order. + * + * Notes: + * + * The device may have been partially initialized, so double check + * that the allocations happened. + * + * This function may be called multiple times, so mark resources as freed + * as they are deallocated. + */ +static void vsoc_remove_device(struct pci_dev *pdev) +{ + int i; + /* + * pdev is the first thing to be set on probe and the last thing + * to be cleared here. If it's NULL then there is no cleanup. + */ + if (!pdev || !vsoc_dev.dev) + return; + dev_info(&pdev->dev, "remove_device\n"); + if (vsoc_dev.regions_data) { + for (i = 0; i < vsoc_dev.layout->region_count; ++i) { + if (vsoc_dev.regions_data[i].device_created) { + device_destroy(vsoc_dev.class, + MKDEV(vsoc_dev.major, i)); + vsoc_dev.regions_data[i].device_created = false; + } + if (vsoc_dev.regions_data[i].irq_requested) + free_irq(vsoc_dev.msix_entries[i].vector, NULL); + vsoc_dev.regions_data[i].irq_requested = false; + } + kfree(vsoc_dev.regions_data); + vsoc_dev.regions_data = NULL; + } + if (vsoc_dev.msix_enabled) { + pci_disable_msix(pdev); + vsoc_dev.msix_enabled = false; + } + kfree(vsoc_dev.msix_entries); + vsoc_dev.msix_entries = NULL; + vsoc_dev.regions = NULL; + if (vsoc_dev.class_added) { + class_destroy(vsoc_dev.class); + vsoc_dev.class_added = false; + } + if (vsoc_dev.cdev_added) { + cdev_del(&vsoc_dev.cdev); + vsoc_dev.cdev_added = false; + } + if (vsoc_dev.major && vsoc_dev.layout) { + unregister_chrdev_region(MKDEV(vsoc_dev.major, 0), + vsoc_dev.layout->region_count); + vsoc_dev.major = 0; + } + vsoc_dev.layout = NULL; + if (vsoc_dev.kernel_mapped_shm) { + pci_iounmap(pdev, vsoc_dev.kernel_mapped_shm); + vsoc_dev.kernel_mapped_shm = NULL; + } + if (vsoc_dev.regs) { + pci_iounmap(pdev, vsoc_dev.regs); + vsoc_dev.regs = NULL; + } + if (vsoc_dev.requested_regions) { + pci_release_regions(pdev); + vsoc_dev.requested_regions = false; + } + if (vsoc_dev.enabled_device) { + pci_disable_device(pdev); + vsoc_dev.enabled_device = false; + } + /* Do this last: it indicates that the device is not initialized. */ + vsoc_dev.dev = NULL; +} + +static void __exit vsoc_cleanup_module(void) +{ + vsoc_remove_device(vsoc_dev.dev); + pci_unregister_driver(&vsoc_pci_driver); +} + +static int __init vsoc_init_module(void) +{ + int err = -ENOMEM; + + INIT_LIST_HEAD(&vsoc_dev.permissions); + mutex_init(&vsoc_dev.mtx); + + err = pci_register_driver(&vsoc_pci_driver); + if (err < 0) + return err; + return 0; +} + +static int vsoc_open(struct inode *inode, struct file *filp) +{ + /* Can't use vsoc_validate_filep because filp is still incomplete */ + int ret = vsoc_validate_inode(inode); + + if (ret) + return ret; + filp->private_data = + kzalloc(sizeof(struct vsoc_private_data), GFP_KERNEL); + if (!filp->private_data) + return -ENOMEM; + return 0; +} + +static int vsoc_release(struct inode *inode, struct file *filp) +{ + struct vsoc_private_data *private_data = NULL; + struct fd_scoped_permission_node *node = NULL; + struct vsoc_device_region *owner_region_p = NULL; + int retval = vsoc_validate_filep(filp); + + if (retval) + return retval; + private_data = (struct vsoc_private_data *)filp->private_data; + if (!private_data) + return 0; + + node = private_data->fd_scoped_permission_node; + if (node) { + owner_region_p = vsoc_region_from_inode(inode); + if (owner_region_p->managed_by != VSOC_REGION_WHOLE) { + owner_region_p = + &vsoc_dev.regions[owner_region_p->managed_by]; + } + do_destroy_fd_scoped_permission_node(owner_region_p, node); + private_data->fd_scoped_permission_node = NULL; + } + kfree(private_data); + filp->private_data = NULL; + + return 0; +} + +/* + * Returns the device relative offset and length of the area specified by the + * fd scoped permission. If there is no fd scoped permission set, a default + * permission covering the entire region is assumed, unless the region is owned + * by another one, in which case the default is a permission with zero size. + */ +static ssize_t vsoc_get_area(struct file *filp, __u32 *area_offset) +{ + __u32 off = 0; + ssize_t length = 0; + struct vsoc_device_region *region_p; + struct fd_scoped_permission *perm; + + region_p = vsoc_region_from_filep(filp); + off = region_p->region_begin_offset; + perm = &((struct vsoc_private_data *)filp->private_data)-> + fd_scoped_permission_node->permission; + if (perm) { + off += perm->begin_offset; + length = perm->end_offset - perm->begin_offset; + } else if (region_p->managed_by == VSOC_REGION_WHOLE) { + /* No permission set and the regions is not owned by another, + * default to full region access. + */ + length = vsoc_device_region_size(region_p); + } else { + /* return zero length, access is denied. */ + length = 0; + } + if (area_offset) + *area_offset = off; + return length; +} + +static int vsoc_mmap(struct file *filp, struct vm_area_struct *vma) +{ + unsigned long len = vma->vm_end - vma->vm_start; + __u32 area_off; + phys_addr_t mem_off; + ssize_t area_len; + int retval = vsoc_validate_filep(filp); + + if (retval) + return retval; + area_len = vsoc_get_area(filp, &area_off); + /* Add the requested offset */ + area_off += (vma->vm_pgoff << PAGE_SHIFT); + area_len -= (vma->vm_pgoff << PAGE_SHIFT); + if (area_len < len) + return -EINVAL; + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + mem_off = shm_off_to_phys_addr(area_off); + if (io_remap_pfn_range(vma, vma->vm_start, mem_off >> PAGE_SHIFT, + len, vma->vm_page_prot)) + return -EAGAIN; + return 0; +} + +module_init(vsoc_init_module); +module_exit(vsoc_cleanup_module); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Greg Hartman <ghartman@google.com>"); +MODULE_DESCRIPTION("VSoC interpretation of QEmu's ivshmem device"); +MODULE_VERSION("1.0");
diff --git a/drivers/staging/goldfish/Kconfig b/drivers/staging/goldfish/Kconfig index 4e09460..d293bbc 100644 --- a/drivers/staging/goldfish/Kconfig +++ b/drivers/staging/goldfish/Kconfig
@@ -4,6 +4,14 @@ ---help--- Emulated audio channel for the Goldfish Android Virtual Device +config GOLDFISH_SYNC + tristate "Goldfish AVD Sync Driver" + depends on GOLDFISH + depends on SW_SYNC + depends on SYNC_FILE + ---help--- + Emulated sync fences for the Goldfish Android Virtual Device + config MTD_GOLDFISH_NAND tristate "Goldfish NAND device" depends on GOLDFISH
diff --git a/drivers/staging/goldfish/Makefile b/drivers/staging/goldfish/Makefile index dec34ad..3313fce 100644 --- a/drivers/staging/goldfish/Makefile +++ b/drivers/staging/goldfish/Makefile
@@ -4,3 +4,9 @@ obj-$(CONFIG_GOLDFISH_AUDIO) += goldfish_audio.o obj-$(CONFIG_MTD_GOLDFISH_NAND) += goldfish_nand.o + +# and sync + +ccflags-y := -Idrivers/staging/android +goldfish_sync-objs := goldfish_sync_timeline_fence.o goldfish_sync_timeline.o +obj-$(CONFIG_GOLDFISH_SYNC) += goldfish_sync.o
diff --git a/drivers/staging/goldfish/goldfish_audio.c b/drivers/staging/goldfish/goldfish_audio.c index bd55995..0bb0ee2 100644 --- a/drivers/staging/goldfish/goldfish_audio.c +++ b/drivers/staging/goldfish/goldfish_audio.c
@@ -28,6 +28,7 @@ #include <linux/uaccess.h> #include <linux/slab.h> #include <linux/goldfish.h> +#include <linux/acpi.h> MODULE_AUTHOR("Google, Inc."); MODULE_DESCRIPTION("Android QEMU Audio Driver"); @@ -116,6 +117,7 @@ static ssize_t goldfish_audio_read(struct file *fp, char __user *buf, size_t count, loff_t *pos) { struct goldfish_audio *data = fp->private_data; + unsigned long irq_flags; int length; int result = 0; @@ -129,6 +131,10 @@ static ssize_t goldfish_audio_read(struct file *fp, char __user *buf, wait_event_interruptible(data->wait, data->buffer_status & AUDIO_INT_READ_BUFFER_FULL); + spin_lock_irqsave(&data->lock, irq_flags); + data->buffer_status &= ~AUDIO_INT_READ_BUFFER_FULL; + spin_unlock_irqrestore(&data->lock, irq_flags); + length = AUDIO_READ(data, AUDIO_READ_BUFFER_AVAILABLE); /* copy data to user space */ @@ -351,12 +357,19 @@ static const struct of_device_id goldfish_audio_of_match[] = { }; MODULE_DEVICE_TABLE(of, goldfish_audio_of_match); +static const struct acpi_device_id goldfish_audio_acpi_match[] = { + { "GFSH0005", 0 }, + { }, +}; +MODULE_DEVICE_TABLE(acpi, goldfish_audio_acpi_match); + static struct platform_driver goldfish_audio_driver = { .probe = goldfish_audio_probe, .remove = goldfish_audio_remove, .driver = { .name = "goldfish_audio", .of_match_table = goldfish_audio_of_match, + .acpi_match_table = ACPI_PTR(goldfish_audio_acpi_match), } };
diff --git a/drivers/staging/goldfish/goldfish_sync_timeline.c b/drivers/staging/goldfish/goldfish_sync_timeline.c new file mode 100644 index 0000000..5bef4c6 --- /dev/null +++ b/drivers/staging/goldfish/goldfish_sync_timeline.c
@@ -0,0 +1,962 @@ +/* + * Copyright (C) 2016 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/fdtable.h> +#include <linux/file.h> +#include <linux/init.h> +#include <linux/miscdevice.h> +#include <linux/module.h> +#include <linux/kernel.h> +#include <linux/platform_device.h> + +#include <linux/interrupt.h> +#include <linux/kref.h> +#include <linux/spinlock.h> +#include <linux/types.h> + +#include <linux/io.h> +#include <linux/mm.h> +#include <linux/acpi.h> + +#include <linux/string.h> + +#include <linux/fs.h> +#include <linux/syscalls.h> +#include <linux/sync_file.h> +#include <linux/fence.h> + +#include "goldfish_sync_timeline_fence.h" + +#define ERR(...) printk(KERN_ERR __VA_ARGS__); + +#define INFO(...) printk(KERN_INFO __VA_ARGS__); + +#define DPRINT(...) pr_debug(__VA_ARGS__); + +#define DTRACE() DPRINT("%s: enter", __func__) + +/* The Goldfish sync driver is designed to provide a interface + * between the underlying host's sync device and the kernel's + * fence sync framework.. + * The purpose of the device/driver is to enable lightweight + * creation and signaling of timelines and fences + * in order to synchronize the guest with host-side graphics events. + * + * Each time the interrupt trips, the driver + * may perform a sync operation. + */ + +/* The operations are: */ + +/* Ready signal - used to mark when irq should lower */ +#define CMD_SYNC_READY 0 + +/* Create a new timeline. writes timeline handle */ +#define CMD_CREATE_SYNC_TIMELINE 1 + +/* Create a fence object. reads timeline handle and time argument. + * Writes fence fd to the SYNC_REG_HANDLE register. */ +#define CMD_CREATE_SYNC_FENCE 2 + +/* Increments timeline. reads timeline handle and time argument */ +#define CMD_SYNC_TIMELINE_INC 3 + +/* Destroys a timeline. reads timeline handle */ +#define CMD_DESTROY_SYNC_TIMELINE 4 + +/* Starts a wait on the host with + * the given glsync object and sync thread handle. */ +#define CMD_TRIGGER_HOST_WAIT 5 + +/* The register layout is: */ + +#define SYNC_REG_BATCH_COMMAND 0x00 /* host->guest batch commands */ +#define SYNC_REG_BATCH_GUESTCOMMAND 0x04 /* guest->host batch commands */ +#define SYNC_REG_BATCH_COMMAND_ADDR 0x08 /* communicate physical address of host->guest batch commands */ +#define SYNC_REG_BATCH_COMMAND_ADDR_HIGH 0x0c /* 64-bit part */ +#define SYNC_REG_BATCH_GUESTCOMMAND_ADDR 0x10 /* communicate physical address of guest->host commands */ +#define SYNC_REG_BATCH_GUESTCOMMAND_ADDR_HIGH 0x14 /* 64-bit part */ +#define SYNC_REG_INIT 0x18 /* signals that the device has been probed */ + +/* There is an ioctl associated with goldfish sync driver. + * Make it conflict with ioctls that are not likely to be used + * in the emulator. + * + * '@' 00-0F linux/radeonfb.h conflict! + * '@' 00-0F drivers/video/aty/aty128fb.c conflict! + */ +#define GOLDFISH_SYNC_IOC_MAGIC '@' + +#define GOLDFISH_SYNC_IOC_QUEUE_WORK _IOWR(GOLDFISH_SYNC_IOC_MAGIC, 0, struct goldfish_sync_ioctl_info) + +/* The above definitions (command codes, register layout, ioctl definitions) + * need to be in sync with the following files: + * + * Host-side (emulator): + * external/qemu/android/emulation/goldfish_sync.h + * external/qemu-android/hw/misc/goldfish_sync.c + * + * Guest-side (system image): + * device/generic/goldfish-opengl/system/egl/goldfish_sync.h + * device/generic/goldfish/ueventd.ranchu.rc + * platform/build/target/board/generic/sepolicy/file_contexts + */ +struct goldfish_sync_hostcmd { + /* sorted for alignment */ + uint64_t handle; + uint64_t hostcmd_handle; + uint32_t cmd; + uint32_t time_arg; +}; + +struct goldfish_sync_guestcmd { + uint64_t host_command; /* uint64_t for alignment */ + uint64_t glsync_handle; + uint64_t thread_handle; + uint64_t guest_timeline_handle; +}; + +#define GOLDFISH_SYNC_MAX_CMDS 32 + +struct goldfish_sync_state { + char __iomem *reg_base; + int irq; + + /* Spinlock protects |to_do| / |to_do_end|. */ + spinlock_t lock; + /* |mutex_lock| protects all concurrent access + * to timelines for both kernel and user space. */ + struct mutex mutex_lock; + + /* Buffer holding commands issued from host. */ + struct goldfish_sync_hostcmd to_do[GOLDFISH_SYNC_MAX_CMDS]; + uint32_t to_do_end; + + /* Addresses for the reading or writing + * of individual commands. The host can directly write + * to |batch_hostcmd| (and then this driver immediately + * copies contents to |to_do|). This driver either replies + * through |batch_hostcmd| or simply issues a + * guest->host command through |batch_guestcmd|. + */ + struct goldfish_sync_hostcmd *batch_hostcmd; + struct goldfish_sync_guestcmd *batch_guestcmd; + + /* Used to give this struct itself to a work queue + * function for executing actual sync commands. */ + struct work_struct work_item; +}; + +static struct goldfish_sync_state global_sync_state[1]; + +struct goldfish_sync_timeline_obj { + struct goldfish_sync_timeline *sync_tl; + uint32_t current_time; + /* We need to be careful about when we deallocate + * this |goldfish_sync_timeline_obj| struct. + * In order to ensure proper cleanup, we need to + * consider the triggered host-side wait that may + * still be in flight when the guest close()'s a + * goldfish_sync device's sync context fd (and + * destroys the |sync_tl| field above). + * The host-side wait may raise IRQ + * and tell the kernel to increment the timeline _after_ + * the |sync_tl| has already been set to null. + * + * From observations on OpenGL apps and CTS tests, this + * happens at some very low probability upon context + * destruction or process close, but it does happen + * and it needs to be handled properly. Otherwise, + * if we clean up the surrounding |goldfish_sync_timeline_obj| + * too early, any |handle| field of any host->guest command + * might not even point to a null |sync_tl| field, + * but to garbage memory or even a reclaimed |sync_tl|. + * If we do not count such "pending waits" and kfree the object + * immediately upon |goldfish_sync_timeline_destroy|, + * we might get mysterous RCU stalls after running a long + * time because the garbage memory that is being read + * happens to be interpretable as a |spinlock_t| struct + * that is currently in the locked state. + * + * To track when to free the |goldfish_sync_timeline_obj| + * itself, we maintain a kref. + * The kref essentially counts the timeline itself plus + * the number of waits in flight. kref_init/kref_put + * are issued on + * |goldfish_sync_timeline_create|/|goldfish_sync_timeline_destroy| + * and kref_get/kref_put are issued on + * |goldfish_sync_fence_create|/|goldfish_sync_timeline_inc|. + * + * The timeline is destroyed after reference count + * reaches zero, which would happen after + * |goldfish_sync_timeline_destroy| and all pending + * |goldfish_sync_timeline_inc|'s are fulfilled. + * + * NOTE (1): We assume that |fence_create| and + * |timeline_inc| calls are 1:1, otherwise the kref scheme + * will not work. This is a valid assumption as long + * as the host-side virtual device implementation + * does not insert any timeline increments + * that we did not trigger from here. + * + * NOTE (2): The use of kref by itself requires no locks, + * but this does not mean everything works without locks. + * Related timeline operations do require a lock of some sort, + * or at least are not proven to work without it. + * In particualr, we assume that all the operations + * done on the |kref| field above are done in contexts where + * |global_sync_state->mutex_lock| is held. Do not + * remove that lock until everything is proven to work + * without it!!! */ + struct kref kref; +}; + +/* We will call |delete_timeline_obj| when the last reference count + * of the kref is decremented. This deletes the sync + * timeline object along with the wrapper itself. */ +static void delete_timeline_obj(struct kref* kref) { + struct goldfish_sync_timeline_obj* obj = + container_of(kref, struct goldfish_sync_timeline_obj, kref); + + goldfish_sync_timeline_put_internal(obj->sync_tl); + obj->sync_tl = NULL; + kfree(obj); +} + +static uint64_t gensym_ctr; +static void gensym(char *dst) +{ + sprintf(dst, "goldfish_sync:gensym:%llu", gensym_ctr); + gensym_ctr++; +} + +/* |goldfish_sync_timeline_create| assumes that |global_sync_state->mutex_lock| + * is held. */ +static struct goldfish_sync_timeline_obj* +goldfish_sync_timeline_create(void) +{ + + char timeline_name[256]; + struct goldfish_sync_timeline *res_sync_tl = NULL; + struct goldfish_sync_timeline_obj *res; + + DTRACE(); + + gensym(timeline_name); + + res_sync_tl = goldfish_sync_timeline_create_internal(timeline_name); + if (!res_sync_tl) { + ERR("Failed to create goldfish_sw_sync timeline."); + return NULL; + } + + res = kzalloc(sizeof(struct goldfish_sync_timeline_obj), GFP_KERNEL); + res->sync_tl = res_sync_tl; + res->current_time = 0; + kref_init(&res->kref); + + DPRINT("new timeline_obj=0x%p", res); + return res; +} + +/* |goldfish_sync_fence_create| assumes that |global_sync_state->mutex_lock| + * is held. */ +static int +goldfish_sync_fence_create(struct goldfish_sync_timeline_obj *obj, + uint32_t val) +{ + + int fd; + char fence_name[256]; + struct sync_pt *syncpt = NULL; + struct sync_file *sync_file_obj = NULL; + struct goldfish_sync_timeline *tl; + + DTRACE(); + + if (!obj) return -1; + + tl = obj->sync_tl; + + syncpt = goldfish_sync_pt_create_internal( + tl, sizeof(struct sync_pt) + 4, val); + if (!syncpt) { + ERR("could not create sync point! " + "goldfish_sync_timeline=0x%p val=%d", + tl, val); + return -1; + } + + fd = get_unused_fd_flags(O_CLOEXEC); + if (fd < 0) { + ERR("could not get unused fd for sync fence. " + "errno=%d", fd); + goto err_cleanup_pt; + } + + gensym(fence_name); + + sync_file_obj = sync_file_create(&syncpt->base); + if (!sync_file_obj) { + ERR("could not create sync fence! " + "goldfish_sync_timeline=0x%p val=%d sync_pt=0x%p", + tl, val, syncpt); + goto err_cleanup_fd_pt; + } + + DPRINT("installing sync fence into fd %d sync_file_obj=0x%p", + fd, sync_file_obj); + fd_install(fd, sync_file_obj->file); + kref_get(&obj->kref); + + return fd; + +err_cleanup_fd_pt: + put_unused_fd(fd); +err_cleanup_pt: + fence_put(&syncpt->base); + return -1; +} + +/* |goldfish_sync_timeline_inc| assumes that |global_sync_state->mutex_lock| + * is held. */ +static void +goldfish_sync_timeline_inc(struct goldfish_sync_timeline_obj *obj, uint32_t inc) +{ + DTRACE(); + /* Just give up if someone else nuked the timeline. + * Whoever it was won't care that it doesn't get signaled. */ + if (!obj) return; + + DPRINT("timeline_obj=0x%p", obj); + goldfish_sync_timeline_signal_internal(obj->sync_tl, inc); + DPRINT("incremented timeline. increment max_time"); + obj->current_time += inc; + + /* Here, we will end up deleting the timeline object if it + * turns out that this call was a pending increment after + * |goldfish_sync_timeline_destroy| was called. */ + kref_put(&obj->kref, delete_timeline_obj); + DPRINT("done"); +} + +/* |goldfish_sync_timeline_destroy| assumes + * that |global_sync_state->mutex_lock| is held. */ +static void +goldfish_sync_timeline_destroy(struct goldfish_sync_timeline_obj *obj) +{ + DTRACE(); + /* See description of |goldfish_sync_timeline_obj| for why we + * should not immediately destroy |obj| */ + kref_put(&obj->kref, delete_timeline_obj); +} + +static inline void +goldfish_sync_cmd_queue(struct goldfish_sync_state *sync_state, + uint32_t cmd, + uint64_t handle, + uint32_t time_arg, + uint64_t hostcmd_handle) +{ + struct goldfish_sync_hostcmd *to_add; + + DTRACE(); + + BUG_ON(sync_state->to_do_end == GOLDFISH_SYNC_MAX_CMDS); + + to_add = &sync_state->to_do[sync_state->to_do_end]; + + to_add->cmd = cmd; + to_add->handle = handle; + to_add->time_arg = time_arg; + to_add->hostcmd_handle = hostcmd_handle; + + sync_state->to_do_end += 1; +} + +static inline void +goldfish_sync_hostcmd_reply(struct goldfish_sync_state *sync_state, + uint32_t cmd, + uint64_t handle, + uint32_t time_arg, + uint64_t hostcmd_handle) +{ + unsigned long irq_flags; + struct goldfish_sync_hostcmd *batch_hostcmd = + sync_state->batch_hostcmd; + + DTRACE(); + + spin_lock_irqsave(&sync_state->lock, irq_flags); + + batch_hostcmd->cmd = cmd; + batch_hostcmd->handle = handle; + batch_hostcmd->time_arg = time_arg; + batch_hostcmd->hostcmd_handle = hostcmd_handle; + writel(0, sync_state->reg_base + SYNC_REG_BATCH_COMMAND); + + spin_unlock_irqrestore(&sync_state->lock, irq_flags); +} + +static inline void +goldfish_sync_send_guestcmd(struct goldfish_sync_state *sync_state, + uint32_t cmd, + uint64_t glsync_handle, + uint64_t thread_handle, + uint64_t timeline_handle) +{ + unsigned long irq_flags; + struct goldfish_sync_guestcmd *batch_guestcmd = + sync_state->batch_guestcmd; + + DTRACE(); + + spin_lock_irqsave(&sync_state->lock, irq_flags); + + batch_guestcmd->host_command = (uint64_t)cmd; + batch_guestcmd->glsync_handle = (uint64_t)glsync_handle; + batch_guestcmd->thread_handle = (uint64_t)thread_handle; + batch_guestcmd->guest_timeline_handle = (uint64_t)timeline_handle; + writel(0, sync_state->reg_base + SYNC_REG_BATCH_GUESTCOMMAND); + + spin_unlock_irqrestore(&sync_state->lock, irq_flags); +} + +/* |goldfish_sync_interrupt| handles IRQ raises from the virtual device. + * In the context of OpenGL, this interrupt will fire whenever we need + * to signal a fence fd in the guest, with the command + * |CMD_SYNC_TIMELINE_INC|. + * However, because this function will be called in an interrupt context, + * it is necessary to do the actual work of signaling off of interrupt context. + * The shared work queue is used for this purpose. At the end when + * all pending commands are intercepted by the interrupt handler, + * we call |schedule_work|, which will later run the actual + * desired sync command in |goldfish_sync_work_item_fn|. + */ +static irqreturn_t goldfish_sync_interrupt(int irq, void *dev_id) +{ + + struct goldfish_sync_state *sync_state = dev_id; + + uint32_t nextcmd; + uint32_t command_r; + uint64_t handle_rw; + uint32_t time_r; + uint64_t hostcmd_handle_rw; + + int count = 0; + + DTRACE(); + + sync_state = dev_id; + + spin_lock(&sync_state->lock); + + for (;;) { + + readl(sync_state->reg_base + SYNC_REG_BATCH_COMMAND); + nextcmd = sync_state->batch_hostcmd->cmd; + + if (nextcmd == 0) + break; + + command_r = nextcmd; + handle_rw = sync_state->batch_hostcmd->handle; + time_r = sync_state->batch_hostcmd->time_arg; + hostcmd_handle_rw = sync_state->batch_hostcmd->hostcmd_handle; + + goldfish_sync_cmd_queue( + sync_state, + command_r, + handle_rw, + time_r, + hostcmd_handle_rw); + + count++; + } + + spin_unlock(&sync_state->lock); + + schedule_work(&sync_state->work_item); + + return (count == 0) ? IRQ_NONE : IRQ_HANDLED; +} + +/* |goldfish_sync_work_item_fn| does the actual work of servicing + * host->guest sync commands. This function is triggered whenever + * the IRQ for the goldfish sync device is raised. Once it starts + * running, it grabs the contents of the buffer containing the + * commands it needs to execute (there may be multiple, because + * our IRQ is active high and not edge triggered), and then + * runs all of them one after the other. + */ +static void goldfish_sync_work_item_fn(struct work_struct *input) +{ + + struct goldfish_sync_state *sync_state; + int sync_fence_fd; + + struct goldfish_sync_timeline_obj *timeline; + uint64_t timeline_ptr; + + uint64_t hostcmd_handle; + + uint32_t cmd; + uint64_t handle; + uint32_t time_arg; + + struct goldfish_sync_hostcmd *todo; + uint32_t todo_end; + + unsigned long irq_flags; + + struct goldfish_sync_hostcmd to_run[GOLDFISH_SYNC_MAX_CMDS]; + uint32_t i = 0; + + sync_state = container_of(input, struct goldfish_sync_state, work_item); + + mutex_lock(&sync_state->mutex_lock); + + spin_lock_irqsave(&sync_state->lock, irq_flags); { + + todo_end = sync_state->to_do_end; + + DPRINT("num sync todos: %u", sync_state->to_do_end); + + for (i = 0; i < todo_end; i++) + to_run[i] = sync_state->to_do[i]; + + /* We expect that commands will come in at a slow enough rate + * so that incoming items will not be more than + * GOLDFISH_SYNC_MAX_CMDS. + * + * This is because the way the sync device is used, + * it's only for managing buffer data transfers per frame, + * with a sequential dependency between putting things in + * to_do and taking them out. Once a set of commands is + * queued up in to_do, the user of the device waits for + * them to be processed before queuing additional commands, + * which limits the rate at which commands come in + * to the rate at which we take them out here. + * + * We also don't expect more than MAX_CMDS to be issued + * at once; there is a correspondence between + * which buffers need swapping to the (display / buffer queue) + * to particular commands, and we don't expect there to be + * enough display or buffer queues in operation at once + * to overrun GOLDFISH_SYNC_MAX_CMDS. + */ + sync_state->to_do_end = 0; + + } spin_unlock_irqrestore(&sync_state->lock, irq_flags); + + for (i = 0; i < todo_end; i++) { + DPRINT("todo index: %u", i); + + todo = &to_run[i]; + + cmd = todo->cmd; + + handle = (uint64_t)todo->handle; + time_arg = todo->time_arg; + hostcmd_handle = (uint64_t)todo->hostcmd_handle; + + DTRACE(); + + timeline = (struct goldfish_sync_timeline_obj *)(uintptr_t)handle; + + switch (cmd) { + case CMD_SYNC_READY: + break; + case CMD_CREATE_SYNC_TIMELINE: + DPRINT("exec CMD_CREATE_SYNC_TIMELINE: " + "handle=0x%llx time_arg=%d", + handle, time_arg); + timeline = goldfish_sync_timeline_create(); + timeline_ptr = (uintptr_t)timeline; + goldfish_sync_hostcmd_reply(sync_state, CMD_CREATE_SYNC_TIMELINE, + timeline_ptr, + 0, + hostcmd_handle); + DPRINT("sync timeline created: %p", timeline); + break; + case CMD_CREATE_SYNC_FENCE: + DPRINT("exec CMD_CREATE_SYNC_FENCE: " + "handle=0x%llx time_arg=%d", + handle, time_arg); + sync_fence_fd = goldfish_sync_fence_create(timeline, time_arg); + goldfish_sync_hostcmd_reply(sync_state, CMD_CREATE_SYNC_FENCE, + sync_fence_fd, + 0, + hostcmd_handle); + break; + case CMD_SYNC_TIMELINE_INC: + DPRINT("exec CMD_SYNC_TIMELINE_INC: " + "handle=0x%llx time_arg=%d", + handle, time_arg); + goldfish_sync_timeline_inc(timeline, time_arg); + break; + case CMD_DESTROY_SYNC_TIMELINE: + DPRINT("exec CMD_DESTROY_SYNC_TIMELINE: " + "handle=0x%llx time_arg=%d", + handle, time_arg); + goldfish_sync_timeline_destroy(timeline); + break; + } + DPRINT("Done executing sync command"); + } + mutex_unlock(&sync_state->mutex_lock); +} + +/* Guest-side interface: file operations */ + +/* Goldfish sync context and ioctl info. + * + * When a sync context is created by open()-ing the goldfish sync device, we + * create a sync context (|goldfish_sync_context|). + * + * Currently, the only data required to track is the sync timeline itself + * along with the current time, which are all packed up in the + * |goldfish_sync_timeline_obj| field. We use a |goldfish_sync_context| + * as the filp->private_data. + * + * Next, when a sync context user requests that work be queued and a fence + * fd provided, we use the |goldfish_sync_ioctl_info| struct, which holds + * information about which host handles to touch for this particular + * queue-work operation. We need to know about the host-side sync thread + * and the particular host-side GLsync object. We also possibly write out + * a file descriptor. + */ +struct goldfish_sync_context { + struct goldfish_sync_timeline_obj *timeline; +}; + +struct goldfish_sync_ioctl_info { + uint64_t host_glsync_handle_in; + uint64_t host_syncthread_handle_in; + int fence_fd_out; +}; + +static int goldfish_sync_open(struct inode *inode, struct file *file) +{ + + struct goldfish_sync_context *sync_context; + + DTRACE(); + + mutex_lock(&global_sync_state->mutex_lock); + + sync_context = kzalloc(sizeof(struct goldfish_sync_context), GFP_KERNEL); + + if (sync_context == NULL) { + ERR("Creation of goldfish sync context failed!"); + mutex_unlock(&global_sync_state->mutex_lock); + return -ENOMEM; + } + + sync_context->timeline = NULL; + + file->private_data = sync_context; + + DPRINT("successfully create a sync context @0x%p", sync_context); + + mutex_unlock(&global_sync_state->mutex_lock); + + return 0; +} + +static int goldfish_sync_release(struct inode *inode, struct file *file) +{ + + struct goldfish_sync_context *sync_context; + + DTRACE(); + + mutex_lock(&global_sync_state->mutex_lock); + + sync_context = file->private_data; + + if (sync_context->timeline) + goldfish_sync_timeline_destroy(sync_context->timeline); + + sync_context->timeline = NULL; + + kfree(sync_context); + + mutex_unlock(&global_sync_state->mutex_lock); + + return 0; +} + +/* |goldfish_sync_ioctl| is the guest-facing interface of goldfish sync + * and is used in conjunction with eglCreateSyncKHR to queue up the + * actual work of waiting for the EGL sync command to complete, + * possibly returning a fence fd to the guest. + */ +static long goldfish_sync_ioctl(struct file *file, + unsigned int cmd, + unsigned long arg) +{ + struct goldfish_sync_context *sync_context_data; + struct goldfish_sync_timeline_obj *timeline; + int fd_out; + struct goldfish_sync_ioctl_info ioctl_data; + + DTRACE(); + + sync_context_data = file->private_data; + fd_out = -1; + + switch (cmd) { + case GOLDFISH_SYNC_IOC_QUEUE_WORK: + + DPRINT("exec GOLDFISH_SYNC_IOC_QUEUE_WORK"); + + mutex_lock(&global_sync_state->mutex_lock); + + if (copy_from_user(&ioctl_data, + (void __user *)arg, + sizeof(ioctl_data))) { + ERR("Failed to copy memory for ioctl_data from user."); + mutex_unlock(&global_sync_state->mutex_lock); + return -EFAULT; + } + + if (ioctl_data.host_syncthread_handle_in == 0) { + DPRINT("Error: zero host syncthread handle!!!"); + mutex_unlock(&global_sync_state->mutex_lock); + return -EFAULT; + } + + if (!sync_context_data->timeline) { + DPRINT("no timeline yet, create one."); + sync_context_data->timeline = goldfish_sync_timeline_create(); + DPRINT("timeline: 0x%p", &sync_context_data->timeline); + } + + timeline = sync_context_data->timeline; + fd_out = goldfish_sync_fence_create(timeline, + timeline->current_time + 1); + DPRINT("Created fence with fd %d and current time %u (timeline: 0x%p)", + fd_out, + sync_context_data->timeline->current_time + 1, + sync_context_data->timeline); + + ioctl_data.fence_fd_out = fd_out; + + if (copy_to_user((void __user *)arg, + &ioctl_data, + sizeof(ioctl_data))) { + DPRINT("Error, could not copy to user!!!"); + + sys_close(fd_out); + /* We won't be doing an increment, kref_put immediately. */ + kref_put(&timeline->kref, delete_timeline_obj); + mutex_unlock(&global_sync_state->mutex_lock); + return -EFAULT; + } + + /* We are now about to trigger a host-side wait; + * accumulate on |pending_waits|. */ + goldfish_sync_send_guestcmd(global_sync_state, + CMD_TRIGGER_HOST_WAIT, + ioctl_data.host_glsync_handle_in, + ioctl_data.host_syncthread_handle_in, + (uint64_t)(uintptr_t)(sync_context_data->timeline)); + + mutex_unlock(&global_sync_state->mutex_lock); + return 0; + default: + return -ENOTTY; + } +} + +static const struct file_operations goldfish_sync_fops = { + .owner = THIS_MODULE, + .open = goldfish_sync_open, + .release = goldfish_sync_release, + .unlocked_ioctl = goldfish_sync_ioctl, + .compat_ioctl = goldfish_sync_ioctl, +}; + +static struct miscdevice goldfish_sync_device = { + .name = "goldfish_sync", + .fops = &goldfish_sync_fops, +}; + + +static bool setup_verify_batch_cmd_addr(struct goldfish_sync_state *sync_state, + void *batch_addr, + uint32_t addr_offset, + uint32_t addr_offset_high) +{ + uint64_t batch_addr_phys; + uint32_t batch_addr_phys_test_lo; + uint32_t batch_addr_phys_test_hi; + + if (!batch_addr) { + ERR("Could not use batch command address!"); + return false; + } + + batch_addr_phys = virt_to_phys(batch_addr); + writel((uint32_t)(batch_addr_phys), + sync_state->reg_base + addr_offset); + writel((uint32_t)(batch_addr_phys >> 32), + sync_state->reg_base + addr_offset_high); + + batch_addr_phys_test_lo = + readl(sync_state->reg_base + addr_offset); + batch_addr_phys_test_hi = + readl(sync_state->reg_base + addr_offset_high); + + if (virt_to_phys(batch_addr) != + (((uint64_t)batch_addr_phys_test_hi << 32) | + batch_addr_phys_test_lo)) { + ERR("Invalid batch command address!"); + return false; + } + + return true; +} + +int goldfish_sync_probe(struct platform_device *pdev) +{ + struct resource *ioresource; + struct goldfish_sync_state *sync_state = global_sync_state; + int status; + + DTRACE(); + + sync_state->to_do_end = 0; + + spin_lock_init(&sync_state->lock); + mutex_init(&sync_state->mutex_lock); + + platform_set_drvdata(pdev, sync_state); + + ioresource = platform_get_resource(pdev, IORESOURCE_MEM, 0); + if (ioresource == NULL) { + ERR("platform_get_resource failed"); + return -ENODEV; + } + + sync_state->reg_base = + devm_ioremap(&pdev->dev, ioresource->start, PAGE_SIZE); + if (sync_state->reg_base == NULL) { + ERR("Could not ioremap"); + return -ENOMEM; + } + + sync_state->irq = platform_get_irq(pdev, 0); + if (sync_state->irq < 0) { + ERR("Could not platform_get_irq"); + return -ENODEV; + } + + status = devm_request_irq(&pdev->dev, + sync_state->irq, + goldfish_sync_interrupt, + IRQF_SHARED, + pdev->name, + sync_state); + if (status) { + ERR("request_irq failed"); + return -ENODEV; + } + + INIT_WORK(&sync_state->work_item, + goldfish_sync_work_item_fn); + + misc_register(&goldfish_sync_device); + + /* Obtain addresses for batch send/recv of commands. */ + { + struct goldfish_sync_hostcmd *batch_addr_hostcmd; + struct goldfish_sync_guestcmd *batch_addr_guestcmd; + + batch_addr_hostcmd = + devm_kzalloc(&pdev->dev, sizeof(struct goldfish_sync_hostcmd), + GFP_KERNEL); + batch_addr_guestcmd = + devm_kzalloc(&pdev->dev, sizeof(struct goldfish_sync_guestcmd), + GFP_KERNEL); + + if (!setup_verify_batch_cmd_addr(sync_state, + batch_addr_hostcmd, + SYNC_REG_BATCH_COMMAND_ADDR, + SYNC_REG_BATCH_COMMAND_ADDR_HIGH)) { + ERR("goldfish_sync: Could not setup batch command address"); + return -ENODEV; + } + + if (!setup_verify_batch_cmd_addr(sync_state, + batch_addr_guestcmd, + SYNC_REG_BATCH_GUESTCOMMAND_ADDR, + SYNC_REG_BATCH_GUESTCOMMAND_ADDR_HIGH)) { + ERR("goldfish_sync: Could not setup batch guest command address"); + return -ENODEV; + } + + sync_state->batch_hostcmd = batch_addr_hostcmd; + sync_state->batch_guestcmd = batch_addr_guestcmd; + } + + INFO("goldfish_sync: Initialized goldfish sync device"); + + writel(0, sync_state->reg_base + SYNC_REG_INIT); + + return 0; +} + +static int goldfish_sync_remove(struct platform_device *pdev) +{ + struct goldfish_sync_state *sync_state = global_sync_state; + + DTRACE(); + + misc_deregister(&goldfish_sync_device); + memset(sync_state, 0, sizeof(struct goldfish_sync_state)); + return 0; +} + +static const struct of_device_id goldfish_sync_of_match[] = { + { .compatible = "google,goldfish-sync", }, + {}, +}; +MODULE_DEVICE_TABLE(of, goldfish_sync_of_match); + +static const struct acpi_device_id goldfish_sync_acpi_match[] = { + { "GFSH0006", 0 }, + { }, +}; + +MODULE_DEVICE_TABLE(acpi, goldfish_sync_acpi_match); + +static struct platform_driver goldfish_sync = { + .probe = goldfish_sync_probe, + .remove = goldfish_sync_remove, + .driver = { + .name = "goldfish_sync", + .of_match_table = goldfish_sync_of_match, + .acpi_match_table = ACPI_PTR(goldfish_sync_acpi_match), + } +}; + +module_platform_driver(goldfish_sync); + +MODULE_AUTHOR("Google, Inc."); +MODULE_DESCRIPTION("Android QEMU Sync Driver"); +MODULE_LICENSE("GPL"); +MODULE_VERSION("1.0");
diff --git a/drivers/staging/goldfish/goldfish_sync_timeline_fence.c b/drivers/staging/goldfish/goldfish_sync_timeline_fence.c new file mode 100644 index 0000000..e671618 --- /dev/null +++ b/drivers/staging/goldfish/goldfish_sync_timeline_fence.c
@@ -0,0 +1,254 @@ +#include <linux/slab.h> +#include <linux/fs.h> +#include <linux/syscalls.h> +#include <linux/sync_file.h> +#include <linux/fence.h> + +#include "goldfish_sync_timeline_fence.h" + +/* + * Timeline-based sync for Goldfish Sync + * Based on "Sync File validation framework" + * (drivers/dma-buf/sw_sync.c) + * + * Copyright (C) 2017 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +/** + * struct goldfish_sync_timeline - sync object + * @kref: reference count on fence. + * @name: name of the goldfish_sync_timeline. Useful for debugging + * @child_list_head: list of children sync_pts for this goldfish_sync_timeline + * @child_list_lock: lock protecting @child_list_head and fence.status + * @active_list_head: list of active (unsignaled/errored) sync_pts + */ +struct goldfish_sync_timeline { + struct kref kref; + char name[32]; + + /* protected by child_list_lock */ + u64 context; + int value; + + struct list_head child_list_head; + spinlock_t child_list_lock; + + struct list_head active_list_head; +}; + +static inline struct goldfish_sync_timeline *fence_parent(struct fence *fence) +{ + return container_of(fence->lock, struct goldfish_sync_timeline, + child_list_lock); +} + +static const struct fence_ops goldfish_sync_timeline_fence_ops; + +static inline struct sync_pt *goldfish_sync_fence_to_sync_pt(struct fence *fence) +{ + if (fence->ops != &goldfish_sync_timeline_fence_ops) + return NULL; + return container_of(fence, struct sync_pt, base); +} + +/** + * goldfish_sync_timeline_create_internal() - creates a sync object + * @name: sync_timeline name + * + * Creates a new sync_timeline. Returns the sync_timeline object or NULL in + * case of error. + */ +struct goldfish_sync_timeline +*goldfish_sync_timeline_create_internal(const char *name) +{ + struct goldfish_sync_timeline *obj; + + obj = kzalloc(sizeof(*obj), GFP_KERNEL); + if (!obj) + return NULL; + + kref_init(&obj->kref); + obj->context = fence_context_alloc(1); + strlcpy(obj->name, name, sizeof(obj->name)); + + INIT_LIST_HEAD(&obj->child_list_head); + INIT_LIST_HEAD(&obj->active_list_head); + spin_lock_init(&obj->child_list_lock); + + return obj; +} + +static void goldfish_sync_timeline_free_internal(struct kref *kref) +{ + struct goldfish_sync_timeline *obj = + container_of(kref, struct goldfish_sync_timeline, kref); + + kfree(obj); +} + +static void goldfish_sync_timeline_get_internal( + struct goldfish_sync_timeline *obj) +{ + kref_get(&obj->kref); +} + +void goldfish_sync_timeline_put_internal(struct goldfish_sync_timeline *obj) +{ + kref_put(&obj->kref, goldfish_sync_timeline_free_internal); +} + +/** + * goldfish_sync_timeline_signal() - + * signal a status change on a goldfish_sync_timeline + * @obj: sync_timeline to signal + * @inc: num to increment on timeline->value + * + * A sync implementation should call this any time one of it's fences + * has signaled or has an error condition. + */ +void goldfish_sync_timeline_signal_internal(struct goldfish_sync_timeline *obj, + unsigned int inc) +{ + unsigned long flags; + struct sync_pt *pt, *next; + + spin_lock_irqsave(&obj->child_list_lock, flags); + + obj->value += inc; + + list_for_each_entry_safe(pt, next, &obj->active_list_head, + active_list) { + if (fence_is_signaled_locked(&pt->base)) + list_del_init(&pt->active_list); + } + + spin_unlock_irqrestore(&obj->child_list_lock, flags); +} + +/** + * goldfish_sync_pt_create_internal() - creates a sync pt + * @parent: fence's parent sync_timeline + * @size: size to allocate for this pt + * @inc: value of the fence + * + * Creates a new sync_pt as a child of @parent. @size bytes will be + * allocated allowing for implementation specific data to be kept after + * the generic sync_timeline struct. Returns the sync_pt object or + * NULL in case of error. + */ +struct sync_pt *goldfish_sync_pt_create_internal( + struct goldfish_sync_timeline *obj, int size, + unsigned int value) +{ + unsigned long flags; + struct sync_pt *pt; + + if (size < sizeof(*pt)) + return NULL; + + pt = kzalloc(size, GFP_KERNEL); + if (!pt) + return NULL; + + spin_lock_irqsave(&obj->child_list_lock, flags); + goldfish_sync_timeline_get_internal(obj); + fence_init(&pt->base, &goldfish_sync_timeline_fence_ops, &obj->child_list_lock, + obj->context, value); + list_add_tail(&pt->child_list, &obj->child_list_head); + INIT_LIST_HEAD(&pt->active_list); + spin_unlock_irqrestore(&obj->child_list_lock, flags); + return pt; +} + +static const char *goldfish_sync_timeline_fence_get_driver_name( + struct fence *fence) +{ + return "sw_sync"; +} + +static const char *goldfish_sync_timeline_fence_get_timeline_name( + struct fence *fence) +{ + struct goldfish_sync_timeline *parent = fence_parent(fence); + + return parent->name; +} + +static void goldfish_sync_timeline_fence_release(struct fence *fence) +{ + struct sync_pt *pt = goldfish_sync_fence_to_sync_pt(fence); + struct goldfish_sync_timeline *parent = fence_parent(fence); + unsigned long flags; + + spin_lock_irqsave(fence->lock, flags); + list_del(&pt->child_list); + if (!list_empty(&pt->active_list)) + list_del(&pt->active_list); + spin_unlock_irqrestore(fence->lock, flags); + + goldfish_sync_timeline_put_internal(parent); + fence_free(fence); +} + +static bool goldfish_sync_timeline_fence_signaled(struct fence *fence) +{ + struct goldfish_sync_timeline *parent = fence_parent(fence); + + return (fence->seqno > parent->value) ? false : true; +} + +static bool goldfish_sync_timeline_fence_enable_signaling(struct fence *fence) +{ + struct sync_pt *pt = goldfish_sync_fence_to_sync_pt(fence); + struct goldfish_sync_timeline *parent = fence_parent(fence); + + if (goldfish_sync_timeline_fence_signaled(fence)) + return false; + + list_add_tail(&pt->active_list, &parent->active_list_head); + return true; +} + +static void goldfish_sync_timeline_fence_disable_signaling(struct fence *fence) +{ + struct sync_pt *pt = container_of(fence, struct sync_pt, base); + + list_del_init(&pt->active_list); +} + +static void goldfish_sync_timeline_fence_value_str(struct fence *fence, + char *str, int size) +{ + snprintf(str, size, "%d", fence->seqno); +} + +static void goldfish_sync_timeline_fence_timeline_value_str( + struct fence *fence, + char *str, int size) +{ + struct goldfish_sync_timeline *parent = fence_parent(fence); + + snprintf(str, size, "%d", parent->value); +} + +static const struct fence_ops goldfish_sync_timeline_fence_ops = { + .get_driver_name = goldfish_sync_timeline_fence_get_driver_name, + .get_timeline_name = goldfish_sync_timeline_fence_get_timeline_name, + .enable_signaling = goldfish_sync_timeline_fence_enable_signaling, + .disable_signaling = goldfish_sync_timeline_fence_disable_signaling, + .signaled = goldfish_sync_timeline_fence_signaled, + .wait = fence_default_wait, + .release = goldfish_sync_timeline_fence_release, + .fence_value_str = goldfish_sync_timeline_fence_value_str, + .timeline_value_str = goldfish_sync_timeline_fence_timeline_value_str, +};
diff --git a/drivers/staging/goldfish/goldfish_sync_timeline_fence.h b/drivers/staging/goldfish/goldfish_sync_timeline_fence.h new file mode 100644 index 0000000..fc25924 --- /dev/null +++ b/drivers/staging/goldfish/goldfish_sync_timeline_fence.h
@@ -0,0 +1,58 @@ +#include <linux/sync_file.h> +#include <linux/fence.h> + +/** + * struct sync_pt - sync_pt object + * @base: base fence object + * @child_list: sync timeline child's list + * @active_list: sync timeline active child's list + */ +struct sync_pt { + struct fence base; + struct list_head child_list; + struct list_head active_list; +}; + +/** + * goldfish_sync_timeline_create_internal() - creates a sync object + * @name: goldfish_sync_timeline name + * + * Creates a new goldfish_sync_timeline. + * Returns the goldfish_sync_timeline object or NULL in case of error. + */ +struct goldfish_sync_timeline +*goldfish_sync_timeline_create_internal(const char *name); + +/** + * goldfish_sync_pt_create_internal() - creates a sync pt + * @parent: fence's parent goldfish_sync_timeline + * @size: size to allocate for this pt + * @inc: value of the fence + * + * Creates a new sync_pt as a child of @parent. @size bytes will be + * allocated allowing for implementation specific data to be kept after + * the generic sync_timeline struct. Returns the sync_pt object or + * NULL in case of error. + */ +struct sync_pt +*goldfish_sync_pt_create_internal(struct goldfish_sync_timeline *obj, + int size, unsigned int value); + +/** + * goldfish_sync_timeline_signal_internal() - + * signal a status change on a sync_timeline + * @obj: goldfish_sync_timeline to signal + * @inc: num to increment on timeline->value + * + * A sync implementation should call this any time one of it's fences + * has signaled or has an error condition. + */ +void goldfish_sync_timeline_signal_internal(struct goldfish_sync_timeline *obj, + unsigned int inc); + +/** + * goldfish_sync_timeline_put_internal() - dec refcount of a sync_timeline + * and clean up memory if it was the last ref. + * @obj: goldfish_sync_timeline to decref + */ +void goldfish_sync_timeline_put_internal(struct goldfish_sync_timeline *obj);
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c index f56ea64..6a146f4 100644 --- a/drivers/staging/lustre/lustre/mdc/mdc_request.c +++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -1219,9 +1219,9 @@ struct readpage_param { * in PAGE_SIZE (if PAGE_SIZE greater than LU_PAGE_SIZE), and the * lu_dirpage for this integrated page will be adjusted. **/ -static int mdc_read_page_remote(void *data, struct page *page0) +static int mdc_read_page_remote(struct file *data, struct page *page0) { - struct readpage_param *rp = data; + struct readpage_param *rp = (struct readpage_param *)data; struct page **page_pool; struct page *page; struct lu_dirpage *dp;
diff --git a/drivers/tee/Kconfig b/drivers/tee/Kconfig new file mode 100644 index 0000000..a6df12d --- /dev/null +++ b/drivers/tee/Kconfig
@@ -0,0 +1,19 @@ +# Generic Trusted Execution Environment Configuration +config TEE + tristate "Trusted Execution Environment support" + depends on HAVE_ARM_SMCCC || COMPILE_TEST + select DMA_SHARED_BUFFER + select GENERIC_ALLOCATOR + help + This implements a generic interface towards a Trusted Execution + Environment (TEE). + +if TEE + +menu "TEE drivers" + +source "drivers/tee/optee/Kconfig" + +endmenu + +endif
diff --git a/drivers/tee/Makefile b/drivers/tee/Makefile new file mode 100644 index 0000000..7a4e4a1 --- /dev/null +++ b/drivers/tee/Makefile
@@ -0,0 +1,5 @@ +obj-$(CONFIG_TEE) += tee.o +tee-objs += tee_core.o +tee-objs += tee_shm.o +tee-objs += tee_shm_pool.o +obj-$(CONFIG_OPTEE) += optee/
diff --git a/drivers/tee/optee/Kconfig b/drivers/tee/optee/Kconfig new file mode 100644 index 0000000..0126de8 --- /dev/null +++ b/drivers/tee/optee/Kconfig
@@ -0,0 +1,7 @@ +# OP-TEE Trusted Execution Environment Configuration +config OPTEE + tristate "OP-TEE" + depends on HAVE_ARM_SMCCC + help + This implements the OP-TEE Trusted Execution Environment (TEE) + driver.
diff --git a/drivers/tee/optee/Makefile b/drivers/tee/optee/Makefile new file mode 100644 index 0000000..220cf42 --- /dev/null +++ b/drivers/tee/optee/Makefile
@@ -0,0 +1,6 @@ +obj-$(CONFIG_OPTEE) += optee.o +optee-objs += core.o +optee-objs += call.o +optee-objs += rpc.o +optee-objs += supp.o +optee-objs += shm_pool.o
diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c new file mode 100644 index 0000000..a5afbe6 --- /dev/null +++ b/drivers/tee/optee/call.c
@@ -0,0 +1,662 @@ +/* + * Copyright (c) 2015, Linaro Limited + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ +#include <linux/arm-smccc.h> +#include <linux/device.h> +#include <linux/err.h> +#include <linux/errno.h> +#include <linux/mm.h> +#include <linux/slab.h> +#include <linux/tee_drv.h> +#include <linux/types.h> +#include <linux/uaccess.h> +#include "optee_private.h" +#include "optee_smc.h" + +struct optee_call_waiter { + struct list_head list_node; + struct completion c; +}; + +static void optee_cq_wait_init(struct optee_call_queue *cq, + struct optee_call_waiter *w) +{ + /* + * We're preparing to make a call to secure world. In case we can't + * allocate a thread in secure world we'll end up waiting in + * optee_cq_wait_for_completion(). + * + * Normally if there's no contention in secure world the call will + * complete and we can cleanup directly with optee_cq_wait_final(). + */ + mutex_lock(&cq->mutex); + + /* + * We add ourselves to the queue, but we don't wait. This + * guarantees that we don't lose a completion if secure world + * returns busy and another thread just exited and try to complete + * someone. + */ + init_completion(&w->c); + list_add_tail(&w->list_node, &cq->waiters); + + mutex_unlock(&cq->mutex); +} + +static void optee_cq_wait_for_completion(struct optee_call_queue *cq, + struct optee_call_waiter *w) +{ + wait_for_completion(&w->c); + + mutex_lock(&cq->mutex); + + /* Move to end of list to get out of the way for other waiters */ + list_del(&w->list_node); + reinit_completion(&w->c); + list_add_tail(&w->list_node, &cq->waiters); + + mutex_unlock(&cq->mutex); +} + +static void optee_cq_complete_one(struct optee_call_queue *cq) +{ + struct optee_call_waiter *w; + + list_for_each_entry(w, &cq->waiters, list_node) { + if (!completion_done(&w->c)) { + complete(&w->c); + break; + } + } +} + +static void optee_cq_wait_final(struct optee_call_queue *cq, + struct optee_call_waiter *w) +{ + /* + * We're done with the call to secure world. The thread in secure + * world that was used for this call is now available for some + * other task to use. + */ + mutex_lock(&cq->mutex); + + /* Get out of the list */ + list_del(&w->list_node); + + /* Wake up one eventual waiting task */ + optee_cq_complete_one(cq); + + /* + * If we're completed we've got a completion from another task that + * was just done with its call to secure world. Since yet another + * thread now is available in secure world wake up another eventual + * waiting task. + */ + if (completion_done(&w->c)) + optee_cq_complete_one(cq); + + mutex_unlock(&cq->mutex); +} + +/* Requires the filpstate mutex to be held */ +static struct optee_session *find_session(struct optee_context_data *ctxdata, + u32 session_id) +{ + struct optee_session *sess; + + list_for_each_entry(sess, &ctxdata->sess_list, list_node) + if (sess->session_id == session_id) + return sess; + + return NULL; +} + +/** + * optee_do_call_with_arg() - Do an SMC to OP-TEE in secure world + * @ctx: calling context + * @parg: physical address of message to pass to secure world + * + * Does and SMC to OP-TEE in secure world and handles eventual resulting + * Remote Procedure Calls (RPC) from OP-TEE. + * + * Returns return code from secure world, 0 is OK + */ +u32 optee_do_call_with_arg(struct tee_context *ctx, phys_addr_t parg) +{ + struct optee *optee = tee_get_drvdata(ctx->teedev); + struct optee_call_waiter w; + struct optee_rpc_param param = { }; + struct optee_call_ctx call_ctx = { }; + u32 ret; + + param.a0 = OPTEE_SMC_CALL_WITH_ARG; + reg_pair_from_64(¶m.a1, ¶m.a2, parg); + /* Initialize waiter */ + optee_cq_wait_init(&optee->call_queue, &w); + while (true) { + struct arm_smccc_res res; + + optee->invoke_fn(param.a0, param.a1, param.a2, param.a3, + param.a4, param.a5, param.a6, param.a7, + &res); + + if (res.a0 == OPTEE_SMC_RETURN_ETHREAD_LIMIT) { + /* + * Out of threads in secure world, wait for a thread + * become available. + */ + optee_cq_wait_for_completion(&optee->call_queue, &w); + } else if (OPTEE_SMC_RETURN_IS_RPC(res.a0)) { + param.a0 = res.a0; + param.a1 = res.a1; + param.a2 = res.a2; + param.a3 = res.a3; + optee_handle_rpc(ctx, ¶m, &call_ctx); + } else { + ret = res.a0; + break; + } + } + + optee_rpc_finalize_call(&call_ctx); + /* + * We're done with our thread in secure world, if there's any + * thread waiters wake up one. + */ + optee_cq_wait_final(&optee->call_queue, &w); + + return ret; +} + +static struct tee_shm *get_msg_arg(struct tee_context *ctx, size_t num_params, + struct optee_msg_arg **msg_arg, + phys_addr_t *msg_parg) +{ + int rc; + struct tee_shm *shm; + struct optee_msg_arg *ma; + + shm = tee_shm_alloc(ctx, OPTEE_MSG_GET_ARG_SIZE(num_params), + TEE_SHM_MAPPED); + if (IS_ERR(shm)) + return shm; + + ma = tee_shm_get_va(shm, 0); + if (IS_ERR(ma)) { + rc = PTR_ERR(ma); + goto out; + } + + rc = tee_shm_get_pa(shm, 0, msg_parg); + if (rc) + goto out; + + memset(ma, 0, OPTEE_MSG_GET_ARG_SIZE(num_params)); + ma->num_params = num_params; + *msg_arg = ma; +out: + if (rc) { + tee_shm_free(shm); + return ERR_PTR(rc); + } + + return shm; +} + +int optee_open_session(struct tee_context *ctx, + struct tee_ioctl_open_session_arg *arg, + struct tee_param *param) +{ + struct optee_context_data *ctxdata = ctx->data; + int rc; + struct tee_shm *shm; + struct optee_msg_arg *msg_arg; + phys_addr_t msg_parg; + struct optee_session *sess = NULL; + + /* +2 for the meta parameters added below */ + shm = get_msg_arg(ctx, arg->num_params + 2, &msg_arg, &msg_parg); + if (IS_ERR(shm)) + return PTR_ERR(shm); + + msg_arg->cmd = OPTEE_MSG_CMD_OPEN_SESSION; + msg_arg->cancel_id = arg->cancel_id; + + /* + * Initialize and add the meta parameters needed when opening a + * session. + */ + msg_arg->params[0].attr = OPTEE_MSG_ATTR_TYPE_VALUE_INPUT | + OPTEE_MSG_ATTR_META; + msg_arg->params[1].attr = OPTEE_MSG_ATTR_TYPE_VALUE_INPUT | + OPTEE_MSG_ATTR_META; + memcpy(&msg_arg->params[0].u.value, arg->uuid, sizeof(arg->uuid)); + memcpy(&msg_arg->params[1].u.value, arg->uuid, sizeof(arg->clnt_uuid)); + msg_arg->params[1].u.value.c = arg->clnt_login; + + rc = optee_to_msg_param(msg_arg->params + 2, arg->num_params, param); + if (rc) + goto out; + + sess = kzalloc(sizeof(*sess), GFP_KERNEL); + if (!sess) { + rc = -ENOMEM; + goto out; + } + + if (optee_do_call_with_arg(ctx, msg_parg)) { + msg_arg->ret = TEEC_ERROR_COMMUNICATION; + msg_arg->ret_origin = TEEC_ORIGIN_COMMS; + } + + if (msg_arg->ret == TEEC_SUCCESS) { + /* A new session has been created, add it to the list. */ + sess->session_id = msg_arg->session; + mutex_lock(&ctxdata->mutex); + list_add(&sess->list_node, &ctxdata->sess_list); + mutex_unlock(&ctxdata->mutex); + } else { + kfree(sess); + } + + if (optee_from_msg_param(param, arg->num_params, msg_arg->params + 2)) { + arg->ret = TEEC_ERROR_COMMUNICATION; + arg->ret_origin = TEEC_ORIGIN_COMMS; + /* Close session again to avoid leakage */ + optee_close_session(ctx, msg_arg->session); + } else { + arg->session = msg_arg->session; + arg->ret = msg_arg->ret; + arg->ret_origin = msg_arg->ret_origin; + } +out: + tee_shm_free(shm); + + return rc; +} + +int optee_close_session(struct tee_context *ctx, u32 session) +{ + struct optee_context_data *ctxdata = ctx->data; + struct tee_shm *shm; + struct optee_msg_arg *msg_arg; + phys_addr_t msg_parg; + struct optee_session *sess; + + /* Check that the session is valid and remove it from the list */ + mutex_lock(&ctxdata->mutex); + sess = find_session(ctxdata, session); + if (sess) + list_del(&sess->list_node); + mutex_unlock(&ctxdata->mutex); + if (!sess) + return -EINVAL; + kfree(sess); + + shm = get_msg_arg(ctx, 0, &msg_arg, &msg_parg); + if (IS_ERR(shm)) + return PTR_ERR(shm); + + msg_arg->cmd = OPTEE_MSG_CMD_CLOSE_SESSION; + msg_arg->session = session; + optee_do_call_with_arg(ctx, msg_parg); + + tee_shm_free(shm); + return 0; +} + +int optee_invoke_func(struct tee_context *ctx, struct tee_ioctl_invoke_arg *arg, + struct tee_param *param) +{ + struct optee_context_data *ctxdata = ctx->data; + struct tee_shm *shm; + struct optee_msg_arg *msg_arg; + phys_addr_t msg_parg; + struct optee_session *sess; + int rc; + + /* Check that the session is valid */ + mutex_lock(&ctxdata->mutex); + sess = find_session(ctxdata, arg->session); + mutex_unlock(&ctxdata->mutex); + if (!sess) + return -EINVAL; + + shm = get_msg_arg(ctx, arg->num_params, &msg_arg, &msg_parg); + if (IS_ERR(shm)) + return PTR_ERR(shm); + msg_arg->cmd = OPTEE_MSG_CMD_INVOKE_COMMAND; + msg_arg->func = arg->func; + msg_arg->session = arg->session; + msg_arg->cancel_id = arg->cancel_id; + + rc = optee_to_msg_param(msg_arg->params, arg->num_params, param); + if (rc) + goto out; + + if (optee_do_call_with_arg(ctx, msg_parg)) { + msg_arg->ret = TEEC_ERROR_COMMUNICATION; + msg_arg->ret_origin = TEEC_ORIGIN_COMMS; + } + + if (optee_from_msg_param(param, arg->num_params, msg_arg->params)) { + msg_arg->ret = TEEC_ERROR_COMMUNICATION; + msg_arg->ret_origin = TEEC_ORIGIN_COMMS; + } + + arg->ret = msg_arg->ret; + arg->ret_origin = msg_arg->ret_origin; +out: + tee_shm_free(shm); + return rc; +} + +int optee_cancel_req(struct tee_context *ctx, u32 cancel_id, u32 session) +{ + struct optee_context_data *ctxdata = ctx->data; + struct tee_shm *shm; + struct optee_msg_arg *msg_arg; + phys_addr_t msg_parg; + struct optee_session *sess; + + /* Check that the session is valid */ + mutex_lock(&ctxdata->mutex); + sess = find_session(ctxdata, session); + mutex_unlock(&ctxdata->mutex); + if (!sess) + return -EINVAL; + + shm = get_msg_arg(ctx, 0, &msg_arg, &msg_parg); + if (IS_ERR(shm)) + return PTR_ERR(shm); + + msg_arg->cmd = OPTEE_MSG_CMD_CANCEL; + msg_arg->session = session; + msg_arg->cancel_id = cancel_id; + optee_do_call_with_arg(ctx, msg_parg); + + tee_shm_free(shm); + return 0; +} + +/** + * optee_enable_shm_cache() - Enables caching of some shared memory allocation + * in OP-TEE + * @optee: main service struct + */ +void optee_enable_shm_cache(struct optee *optee) +{ + struct optee_call_waiter w; + + /* We need to retry until secure world isn't busy. */ + optee_cq_wait_init(&optee->call_queue, &w); + while (true) { + struct arm_smccc_res res; + + optee->invoke_fn(OPTEE_SMC_ENABLE_SHM_CACHE, 0, 0, 0, 0, 0, 0, + 0, &res); + if (res.a0 == OPTEE_SMC_RETURN_OK) + break; + optee_cq_wait_for_completion(&optee->call_queue, &w); + } + optee_cq_wait_final(&optee->call_queue, &w); +} + +/** + * optee_disable_shm_cache() - Disables caching of some shared memory allocation + * in OP-TEE + * @optee: main service struct + */ +void optee_disable_shm_cache(struct optee *optee) +{ + struct optee_call_waiter w; + + /* We need to retry until secure world isn't busy. */ + optee_cq_wait_init(&optee->call_queue, &w); + while (true) { + union { + struct arm_smccc_res smccc; + struct optee_smc_disable_shm_cache_result result; + } res; + + optee->invoke_fn(OPTEE_SMC_DISABLE_SHM_CACHE, 0, 0, 0, 0, 0, 0, + 0, &res.smccc); + if (res.result.status == OPTEE_SMC_RETURN_ENOTAVAIL) + break; /* All shm's freed */ + if (res.result.status == OPTEE_SMC_RETURN_OK) { + struct tee_shm *shm; + + shm = reg_pair_to_ptr(res.result.shm_upper32, + res.result.shm_lower32); + tee_shm_free(shm); + } else { + optee_cq_wait_for_completion(&optee->call_queue, &w); + } + } + optee_cq_wait_final(&optee->call_queue, &w); +} + +#define PAGELIST_ENTRIES_PER_PAGE \ + ((OPTEE_MSG_NONCONTIG_PAGE_SIZE / sizeof(u64)) - 1) + +/** + * optee_fill_pages_list() - write list of user pages to given shared + * buffer. + * + * @dst: page-aligned buffer where list of pages will be stored + * @pages: array of pages that represents shared buffer + * @num_pages: number of entries in @pages + * @page_offset: offset of user buffer from page start + * + * @dst should be big enough to hold list of user page addresses and + * links to the next pages of buffer + */ +void optee_fill_pages_list(u64 *dst, struct page **pages, int num_pages, + size_t page_offset) +{ + int n = 0; + phys_addr_t optee_page; + /* + * Refer to OPTEE_MSG_ATTR_NONCONTIG description in optee_msg.h + * for details. + */ + struct { + u64 pages_list[PAGELIST_ENTRIES_PER_PAGE]; + u64 next_page_data; + } *pages_data; + + /* + * Currently OP-TEE uses 4k page size and it does not looks + * like this will change in the future. On other hand, there are + * no know ARM architectures with page size < 4k. + * Thus the next built assert looks redundant. But the following + * code heavily relies on this assumption, so it is better be + * safe than sorry. + */ + BUILD_BUG_ON(PAGE_SIZE < OPTEE_MSG_NONCONTIG_PAGE_SIZE); + + pages_data = (void *)dst; + /* + * If linux page is bigger than 4k, and user buffer offset is + * larger than 4k/8k/12k/etc this will skip first 4k pages, + * because they bear no value data for OP-TEE. + */ + optee_page = page_to_phys(*pages) + + round_down(page_offset, OPTEE_MSG_NONCONTIG_PAGE_SIZE); + + while (true) { + pages_data->pages_list[n++] = optee_page; + + if (n == PAGELIST_ENTRIES_PER_PAGE) { + pages_data->next_page_data = + virt_to_phys(pages_data + 1); + pages_data++; + n = 0; + } + + optee_page += OPTEE_MSG_NONCONTIG_PAGE_SIZE; + if (!(optee_page & ~PAGE_MASK)) { + if (!--num_pages) + break; + pages++; + optee_page = page_to_phys(*pages); + } + } +} + +/* + * The final entry in each pagelist page is a pointer to the next + * pagelist page. + */ +static size_t get_pages_list_size(size_t num_entries) +{ + int pages = DIV_ROUND_UP(num_entries, PAGELIST_ENTRIES_PER_PAGE); + + return pages * OPTEE_MSG_NONCONTIG_PAGE_SIZE; +} + +u64 *optee_allocate_pages_list(size_t num_entries) +{ + return alloc_pages_exact(get_pages_list_size(num_entries), GFP_KERNEL); +} + +void optee_free_pages_list(void *list, size_t num_entries) +{ + free_pages_exact(list, get_pages_list_size(num_entries)); +} + +static bool is_normal_memory(pgprot_t p) +{ +#if defined(CONFIG_ARM) + return (pgprot_val(p) & L_PTE_MT_MASK) == L_PTE_MT_WRITEALLOC; +#elif defined(CONFIG_ARM64) + return (pgprot_val(p) & PTE_ATTRINDX_MASK) == PTE_ATTRINDX(MT_NORMAL); +#else +#error "Unuspported architecture" +#endif +} + +static int __check_mem_type(struct vm_area_struct *vma, unsigned long end) +{ + while (vma && is_normal_memory(vma->vm_page_prot)) { + if (vma->vm_end >= end) + return 0; + vma = vma->vm_next; + } + + return -EINVAL; +} + +static int check_mem_type(unsigned long start, size_t num_pages) +{ + struct mm_struct *mm = current->mm; + int rc; + + down_read(&mm->mmap_sem); + rc = __check_mem_type(find_vma(mm, start), + start + num_pages * PAGE_SIZE); + up_read(&mm->mmap_sem); + + return rc; +} + +int optee_shm_register(struct tee_context *ctx, struct tee_shm *shm, + struct page **pages, size_t num_pages, + unsigned long start) +{ + struct tee_shm *shm_arg = NULL; + struct optee_msg_arg *msg_arg; + u64 *pages_list; + phys_addr_t msg_parg; + int rc; + + if (!num_pages) + return -EINVAL; + + rc = check_mem_type(start, num_pages); + if (rc) + return rc; + + pages_list = optee_allocate_pages_list(num_pages); + if (!pages_list) + return -ENOMEM; + + shm_arg = get_msg_arg(ctx, 1, &msg_arg, &msg_parg); + if (IS_ERR(shm_arg)) { + rc = PTR_ERR(shm_arg); + goto out; + } + + optee_fill_pages_list(pages_list, pages, num_pages, + tee_shm_get_page_offset(shm)); + + msg_arg->cmd = OPTEE_MSG_CMD_REGISTER_SHM; + msg_arg->params->attr = OPTEE_MSG_ATTR_TYPE_TMEM_OUTPUT | + OPTEE_MSG_ATTR_NONCONTIG; + msg_arg->params->u.tmem.shm_ref = (unsigned long)shm; + msg_arg->params->u.tmem.size = tee_shm_get_size(shm); + /* + * In the least bits of msg_arg->params->u.tmem.buf_ptr we + * store buffer offset from 4k page, as described in OP-TEE ABI. + */ + msg_arg->params->u.tmem.buf_ptr = virt_to_phys(pages_list) | + (tee_shm_get_page_offset(shm) & (OPTEE_MSG_NONCONTIG_PAGE_SIZE - 1)); + + if (optee_do_call_with_arg(ctx, msg_parg) || + msg_arg->ret != TEEC_SUCCESS) + rc = -EINVAL; + + tee_shm_free(shm_arg); +out: + optee_free_pages_list(pages_list, num_pages); + return rc; +} + +int optee_shm_unregister(struct tee_context *ctx, struct tee_shm *shm) +{ + struct tee_shm *shm_arg; + struct optee_msg_arg *msg_arg; + phys_addr_t msg_parg; + int rc = 0; + + shm_arg = get_msg_arg(ctx, 1, &msg_arg, &msg_parg); + if (IS_ERR(shm_arg)) + return PTR_ERR(shm_arg); + + msg_arg->cmd = OPTEE_MSG_CMD_UNREGISTER_SHM; + + msg_arg->params[0].attr = OPTEE_MSG_ATTR_TYPE_RMEM_INPUT; + msg_arg->params[0].u.rmem.shm_ref = (unsigned long)shm; + + if (optee_do_call_with_arg(ctx, msg_parg) || + msg_arg->ret != TEEC_SUCCESS) + rc = -EINVAL; + tee_shm_free(shm_arg); + return rc; +} + +int optee_shm_register_supp(struct tee_context *ctx, struct tee_shm *shm, + struct page **pages, size_t num_pages, + unsigned long start) +{ + /* + * We don't want to register supplicant memory in OP-TEE. + * Instead information about it will be passed in RPC code. + */ + return check_mem_type(start, num_pages); +} + +int optee_shm_unregister_supp(struct tee_context *ctx, struct tee_shm *shm) +{ + return 0; +}
diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c new file mode 100644 index 0000000..e9843c5 --- /dev/null +++ b/drivers/tee/optee/core.c
@@ -0,0 +1,705 @@ +/* + * Copyright (c) 2015, Linaro Limited + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/arm-smccc.h> +#include <linux/errno.h> +#include <linux/io.h> +#include <linux/module.h> +#include <linux/of.h> +#include <linux/of_platform.h> +#include <linux/platform_device.h> +#include <linux/slab.h> +#include <linux/string.h> +#include <linux/tee_drv.h> +#include <linux/types.h> +#include <linux/uaccess.h> +#include "optee_private.h" +#include "optee_smc.h" +#include "shm_pool.h" + +#define DRIVER_NAME "optee" + +#define OPTEE_SHM_NUM_PRIV_PAGES 1 + +/** + * optee_from_msg_param() - convert from OPTEE_MSG parameters to + * struct tee_param + * @params: subsystem internal parameter representation + * @num_params: number of elements in the parameter arrays + * @msg_params: OPTEE_MSG parameters + * Returns 0 on success or <0 on failure + */ +int optee_from_msg_param(struct tee_param *params, size_t num_params, + const struct optee_msg_param *msg_params) +{ + int rc; + size_t n; + struct tee_shm *shm; + phys_addr_t pa; + + for (n = 0; n < num_params; n++) { + struct tee_param *p = params + n; + const struct optee_msg_param *mp = msg_params + n; + u32 attr = mp->attr & OPTEE_MSG_ATTR_TYPE_MASK; + + switch (attr) { + case OPTEE_MSG_ATTR_TYPE_NONE: + p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_NONE; + memset(&p->u, 0, sizeof(p->u)); + break; + case OPTEE_MSG_ATTR_TYPE_VALUE_INPUT: + case OPTEE_MSG_ATTR_TYPE_VALUE_OUTPUT: + case OPTEE_MSG_ATTR_TYPE_VALUE_INOUT: + p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT + + attr - OPTEE_MSG_ATTR_TYPE_VALUE_INPUT; + p->u.value.a = mp->u.value.a; + p->u.value.b = mp->u.value.b; + p->u.value.c = mp->u.value.c; + break; + case OPTEE_MSG_ATTR_TYPE_TMEM_INPUT: + case OPTEE_MSG_ATTR_TYPE_TMEM_OUTPUT: + case OPTEE_MSG_ATTR_TYPE_TMEM_INOUT: + p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT + + attr - OPTEE_MSG_ATTR_TYPE_TMEM_INPUT; + p->u.memref.size = mp->u.tmem.size; + shm = (struct tee_shm *)(unsigned long) + mp->u.tmem.shm_ref; + if (!shm) { + p->u.memref.shm_offs = 0; + p->u.memref.shm = NULL; + break; + } + rc = tee_shm_get_pa(shm, 0, &pa); + if (rc) + return rc; + p->u.memref.shm_offs = mp->u.tmem.buf_ptr - pa; + p->u.memref.shm = shm; + + /* Check that the memref is covered by the shm object */ + if (p->u.memref.size) { + size_t o = p->u.memref.shm_offs + + p->u.memref.size - 1; + + rc = tee_shm_get_pa(shm, o, NULL); + if (rc) + return rc; + } + break; + case OPTEE_MSG_ATTR_TYPE_RMEM_INPUT: + case OPTEE_MSG_ATTR_TYPE_RMEM_OUTPUT: + case OPTEE_MSG_ATTR_TYPE_RMEM_INOUT: + p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT + + attr - OPTEE_MSG_ATTR_TYPE_RMEM_INPUT; + p->u.memref.size = mp->u.rmem.size; + shm = (struct tee_shm *)(unsigned long) + mp->u.rmem.shm_ref; + + if (!shm) { + p->u.memref.shm_offs = 0; + p->u.memref.shm = NULL; + break; + } + p->u.memref.shm_offs = mp->u.rmem.offs; + p->u.memref.shm = shm; + + break; + + default: + return -EINVAL; + } + } + return 0; +} + +static int to_msg_param_tmp_mem(struct optee_msg_param *mp, + const struct tee_param *p) +{ + int rc; + phys_addr_t pa; + + mp->attr = OPTEE_MSG_ATTR_TYPE_TMEM_INPUT + p->attr - + TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT; + + mp->u.tmem.shm_ref = (unsigned long)p->u.memref.shm; + mp->u.tmem.size = p->u.memref.size; + + if (!p->u.memref.shm) { + mp->u.tmem.buf_ptr = 0; + return 0; + } + + rc = tee_shm_get_pa(p->u.memref.shm, p->u.memref.shm_offs, &pa); + if (rc) + return rc; + + mp->u.tmem.buf_ptr = pa; + mp->attr |= OPTEE_MSG_ATTR_CACHE_PREDEFINED << + OPTEE_MSG_ATTR_CACHE_SHIFT; + + return 0; +} + +static int to_msg_param_reg_mem(struct optee_msg_param *mp, + const struct tee_param *p) +{ + mp->attr = OPTEE_MSG_ATTR_TYPE_RMEM_INPUT + p->attr - + TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT; + + mp->u.rmem.shm_ref = (unsigned long)p->u.memref.shm; + mp->u.rmem.size = p->u.memref.size; + mp->u.rmem.offs = p->u.memref.shm_offs; + return 0; +} + +/** + * optee_to_msg_param() - convert from struct tee_params to OPTEE_MSG parameters + * @msg_params: OPTEE_MSG parameters + * @num_params: number of elements in the parameter arrays + * @params: subsystem itnernal parameter representation + * Returns 0 on success or <0 on failure + */ +int optee_to_msg_param(struct optee_msg_param *msg_params, size_t num_params, + const struct tee_param *params) +{ + int rc; + size_t n; + + for (n = 0; n < num_params; n++) { + const struct tee_param *p = params + n; + struct optee_msg_param *mp = msg_params + n; + + switch (p->attr) { + case TEE_IOCTL_PARAM_ATTR_TYPE_NONE: + mp->attr = TEE_IOCTL_PARAM_ATTR_TYPE_NONE; + memset(&mp->u, 0, sizeof(mp->u)); + break; + case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT: + case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_OUTPUT: + case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT: + mp->attr = OPTEE_MSG_ATTR_TYPE_VALUE_INPUT + p->attr - + TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT; + mp->u.value.a = p->u.value.a; + mp->u.value.b = p->u.value.b; + mp->u.value.c = p->u.value.c; + break; + case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT: + case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT: + case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT: + if (tee_shm_is_registered(p->u.memref.shm)) + rc = to_msg_param_reg_mem(mp, p); + else + rc = to_msg_param_tmp_mem(mp, p); + if (rc) + return rc; + break; + default: + return -EINVAL; + } + } + return 0; +} + +static void optee_get_version(struct tee_device *teedev, + struct tee_ioctl_version_data *vers) +{ + struct tee_ioctl_version_data v = { + .impl_id = TEE_IMPL_ID_OPTEE, + .impl_caps = TEE_OPTEE_CAP_TZ, + .gen_caps = TEE_GEN_CAP_GP, + }; + struct optee *optee = tee_get_drvdata(teedev); + + if (optee->sec_caps & OPTEE_SMC_SEC_CAP_DYNAMIC_SHM) + v.gen_caps |= TEE_GEN_CAP_REG_MEM; + *vers = v; +} + +static int optee_open(struct tee_context *ctx) +{ + struct optee_context_data *ctxdata; + struct tee_device *teedev = ctx->teedev; + struct optee *optee = tee_get_drvdata(teedev); + + ctxdata = kzalloc(sizeof(*ctxdata), GFP_KERNEL); + if (!ctxdata) + return -ENOMEM; + + if (teedev == optee->supp_teedev) { + bool busy = true; + + mutex_lock(&optee->supp.mutex); + if (!optee->supp.ctx) { + busy = false; + optee->supp.ctx = ctx; + } + mutex_unlock(&optee->supp.mutex); + if (busy) { + kfree(ctxdata); + return -EBUSY; + } + } + + mutex_init(&ctxdata->mutex); + INIT_LIST_HEAD(&ctxdata->sess_list); + + ctx->data = ctxdata; + return 0; +} + +static void optee_release(struct tee_context *ctx) +{ + struct optee_context_data *ctxdata = ctx->data; + struct tee_device *teedev = ctx->teedev; + struct optee *optee = tee_get_drvdata(teedev); + struct tee_shm *shm; + struct optee_msg_arg *arg = NULL; + phys_addr_t parg; + struct optee_session *sess; + struct optee_session *sess_tmp; + + if (!ctxdata) + return; + + shm = tee_shm_alloc(ctx, sizeof(struct optee_msg_arg), TEE_SHM_MAPPED); + if (!IS_ERR(shm)) { + arg = tee_shm_get_va(shm, 0); + /* + * If va2pa fails for some reason, we can't call into + * secure world, only free the memory. Secure OS will leak + * sessions and finally refuse more sessions, but we will + * at least let normal world reclaim its memory. + */ + if (!IS_ERR(arg)) + if (tee_shm_va2pa(shm, arg, &parg)) + arg = NULL; /* prevent usage of parg below */ + } + + list_for_each_entry_safe(sess, sess_tmp, &ctxdata->sess_list, + list_node) { + list_del(&sess->list_node); + if (!IS_ERR_OR_NULL(arg)) { + memset(arg, 0, sizeof(*arg)); + arg->cmd = OPTEE_MSG_CMD_CLOSE_SESSION; + arg->session = sess->session_id; + optee_do_call_with_arg(ctx, parg); + } + kfree(sess); + } + kfree(ctxdata); + + if (!IS_ERR(shm)) + tee_shm_free(shm); + + ctx->data = NULL; + + if (teedev == optee->supp_teedev) + optee_supp_release(&optee->supp); +} + +static const struct tee_driver_ops optee_ops = { + .get_version = optee_get_version, + .open = optee_open, + .release = optee_release, + .open_session = optee_open_session, + .close_session = optee_close_session, + .invoke_func = optee_invoke_func, + .cancel_req = optee_cancel_req, + .shm_register = optee_shm_register, + .shm_unregister = optee_shm_unregister, +}; + +static const struct tee_desc optee_desc = { + .name = DRIVER_NAME "-clnt", + .ops = &optee_ops, + .owner = THIS_MODULE, +}; + +static const struct tee_driver_ops optee_supp_ops = { + .get_version = optee_get_version, + .open = optee_open, + .release = optee_release, + .supp_recv = optee_supp_recv, + .supp_send = optee_supp_send, + .shm_register = optee_shm_register_supp, + .shm_unregister = optee_shm_unregister_supp, +}; + +static const struct tee_desc optee_supp_desc = { + .name = DRIVER_NAME "-supp", + .ops = &optee_supp_ops, + .owner = THIS_MODULE, + .flags = TEE_DESC_PRIVILEGED, +}; + +static bool optee_msg_api_uid_is_optee_api(optee_invoke_fn *invoke_fn) +{ + struct arm_smccc_res res; + + invoke_fn(OPTEE_SMC_CALLS_UID, 0, 0, 0, 0, 0, 0, 0, &res); + + if (res.a0 == OPTEE_MSG_UID_0 && res.a1 == OPTEE_MSG_UID_1 && + res.a2 == OPTEE_MSG_UID_2 && res.a3 == OPTEE_MSG_UID_3) + return true; + return false; +} + +static bool optee_msg_api_revision_is_compatible(optee_invoke_fn *invoke_fn) +{ + union { + struct arm_smccc_res smccc; + struct optee_smc_calls_revision_result result; + } res; + + invoke_fn(OPTEE_SMC_CALLS_REVISION, 0, 0, 0, 0, 0, 0, 0, &res.smccc); + + if (res.result.major == OPTEE_MSG_REVISION_MAJOR && + (int)res.result.minor >= OPTEE_MSG_REVISION_MINOR) + return true; + return false; +} + +static bool optee_msg_exchange_capabilities(optee_invoke_fn *invoke_fn, + u32 *sec_caps) +{ + union { + struct arm_smccc_res smccc; + struct optee_smc_exchange_capabilities_result result; + } res; + u32 a1 = 0; + + /* + * TODO This isn't enough to tell if it's UP system (from kernel + * point of view) or not, is_smp() returns the the information + * needed, but can't be called directly from here. + */ + if (!IS_ENABLED(CONFIG_SMP) || nr_cpu_ids == 1) + a1 |= OPTEE_SMC_NSEC_CAP_UNIPROCESSOR; + + invoke_fn(OPTEE_SMC_EXCHANGE_CAPABILITIES, a1, 0, 0, 0, 0, 0, 0, + &res.smccc); + + if (res.result.status != OPTEE_SMC_RETURN_OK) + return false; + + *sec_caps = res.result.capabilities; + return true; +} + +static struct tee_shm_pool * +optee_config_shm_memremap(optee_invoke_fn *invoke_fn, void **memremaped_shm, + u32 sec_caps) +{ + union { + struct arm_smccc_res smccc; + struct optee_smc_get_shm_config_result result; + } res; + unsigned long vaddr; + phys_addr_t paddr; + size_t size; + phys_addr_t begin; + phys_addr_t end; + void *va; + struct tee_shm_pool_mgr *priv_mgr; + struct tee_shm_pool_mgr *dmabuf_mgr; + void *rc; + + invoke_fn(OPTEE_SMC_GET_SHM_CONFIG, 0, 0, 0, 0, 0, 0, 0, &res.smccc); + if (res.result.status != OPTEE_SMC_RETURN_OK) { + pr_info("shm service not available\n"); + return ERR_PTR(-ENOENT); + } + + if (res.result.settings != OPTEE_SMC_SHM_CACHED) { + pr_err("only normal cached shared memory supported\n"); + return ERR_PTR(-EINVAL); + } + + begin = roundup(res.result.start, PAGE_SIZE); + end = rounddown(res.result.start + res.result.size, PAGE_SIZE); + paddr = begin; + size = end - begin; + + if (size < 2 * OPTEE_SHM_NUM_PRIV_PAGES * PAGE_SIZE) { + pr_err("too small shared memory area\n"); + return ERR_PTR(-EINVAL); + } + + va = memremap(paddr, size, MEMREMAP_WB); + if (!va) { + pr_err("shared memory ioremap failed\n"); + return ERR_PTR(-EINVAL); + } + vaddr = (unsigned long)va; + + /* + * If OP-TEE can work with unregistered SHM, we will use own pool + * for private shm + */ + if (sec_caps & OPTEE_SMC_SEC_CAP_DYNAMIC_SHM) { + rc = optee_shm_pool_alloc_pages(); + if (IS_ERR(rc)) + goto err_memunmap; + priv_mgr = rc; + } else { + const size_t sz = OPTEE_SHM_NUM_PRIV_PAGES * PAGE_SIZE; + + rc = tee_shm_pool_mgr_alloc_res_mem(vaddr, paddr, sz, + 3 /* 8 bytes aligned */); + if (IS_ERR(rc)) + goto err_memunmap; + priv_mgr = rc; + + vaddr += sz; + paddr += sz; + size -= sz; + } + + rc = tee_shm_pool_mgr_alloc_res_mem(vaddr, paddr, size, PAGE_SHIFT); + if (IS_ERR(rc)) + goto err_free_priv_mgr; + dmabuf_mgr = rc; + + rc = tee_shm_pool_alloc(priv_mgr, dmabuf_mgr); + if (IS_ERR(rc)) + goto err_free_dmabuf_mgr; + + *memremaped_shm = va; + + return rc; + +err_free_dmabuf_mgr: + tee_shm_pool_mgr_destroy(dmabuf_mgr); +err_free_priv_mgr: + tee_shm_pool_mgr_destroy(priv_mgr); +err_memunmap: + memunmap(va); + return rc; +} + +/* Simple wrapper functions to be able to use a function pointer */ +static void optee_smccc_smc(unsigned long a0, unsigned long a1, + unsigned long a2, unsigned long a3, + unsigned long a4, unsigned long a5, + unsigned long a6, unsigned long a7, + struct arm_smccc_res *res) +{ + arm_smccc_smc(a0, a1, a2, a3, a4, a5, a6, a7, res); +} + +static void optee_smccc_hvc(unsigned long a0, unsigned long a1, + unsigned long a2, unsigned long a3, + unsigned long a4, unsigned long a5, + unsigned long a6, unsigned long a7, + struct arm_smccc_res *res) +{ + arm_smccc_hvc(a0, a1, a2, a3, a4, a5, a6, a7, res); +} + +static optee_invoke_fn *get_invoke_func(struct device_node *np) +{ + const char *method; + + pr_info("probing for conduit method from DT.\n"); + + if (of_property_read_string(np, "method", &method)) { + pr_warn("missing \"method\" property\n"); + return ERR_PTR(-ENXIO); + } + + if (!strcmp("hvc", method)) + return optee_smccc_hvc; + else if (!strcmp("smc", method)) + return optee_smccc_smc; + + pr_warn("invalid \"method\" property: %s\n", method); + return ERR_PTR(-EINVAL); +} + +static struct optee *optee_probe(struct device_node *np) +{ + optee_invoke_fn *invoke_fn; + struct tee_shm_pool *pool; + struct optee *optee = NULL; + void *memremaped_shm = NULL; + struct tee_device *teedev; + u32 sec_caps; + int rc; + + invoke_fn = get_invoke_func(np); + if (IS_ERR(invoke_fn)) + return (void *)invoke_fn; + + if (!optee_msg_api_uid_is_optee_api(invoke_fn)) { + pr_warn("api uid mismatch\n"); + return ERR_PTR(-EINVAL); + } + + if (!optee_msg_api_revision_is_compatible(invoke_fn)) { + pr_warn("api revision mismatch\n"); + return ERR_PTR(-EINVAL); + } + + if (!optee_msg_exchange_capabilities(invoke_fn, &sec_caps)) { + pr_warn("capabilities mismatch\n"); + return ERR_PTR(-EINVAL); + } + + /* + * We have no other option for shared memory, if secure world + * doesn't have any reserved memory we can use we can't continue. + */ + if (!(sec_caps & OPTEE_SMC_SEC_CAP_HAVE_RESERVED_SHM)) + return ERR_PTR(-EINVAL); + + pool = optee_config_shm_memremap(invoke_fn, &memremaped_shm, sec_caps); + if (IS_ERR(pool)) + return (void *)pool; + + optee = kzalloc(sizeof(*optee), GFP_KERNEL); + if (!optee) { + rc = -ENOMEM; + goto err; + } + + optee->invoke_fn = invoke_fn; + optee->sec_caps = sec_caps; + + teedev = tee_device_alloc(&optee_desc, NULL, pool, optee); + if (IS_ERR(teedev)) { + rc = PTR_ERR(teedev); + goto err; + } + optee->teedev = teedev; + + teedev = tee_device_alloc(&optee_supp_desc, NULL, pool, optee); + if (IS_ERR(teedev)) { + rc = PTR_ERR(teedev); + goto err; + } + optee->supp_teedev = teedev; + + rc = tee_device_register(optee->teedev); + if (rc) + goto err; + + rc = tee_device_register(optee->supp_teedev); + if (rc) + goto err; + + mutex_init(&optee->call_queue.mutex); + INIT_LIST_HEAD(&optee->call_queue.waiters); + optee_wait_queue_init(&optee->wait_queue); + optee_supp_init(&optee->supp); + optee->memremaped_shm = memremaped_shm; + optee->pool = pool; + + optee_enable_shm_cache(optee); + + pr_info("initialized driver\n"); + return optee; +err: + if (optee) { + /* + * tee_device_unregister() is safe to call even if the + * devices hasn't been registered with + * tee_device_register() yet. + */ + tee_device_unregister(optee->supp_teedev); + tee_device_unregister(optee->teedev); + kfree(optee); + } + if (pool) + tee_shm_pool_free(pool); + if (memremaped_shm) + memunmap(memremaped_shm); + return ERR_PTR(rc); +} + +static void optee_remove(struct optee *optee) +{ + /* + * Ask OP-TEE to free all cached shared memory objects to decrease + * reference counters and also avoid wild pointers in secure world + * into the old shared memory range. + */ + optee_disable_shm_cache(optee); + + /* + * The two devices has to be unregistered before we can free the + * other resources. + */ + tee_device_unregister(optee->supp_teedev); + tee_device_unregister(optee->teedev); + + tee_shm_pool_free(optee->pool); + if (optee->memremaped_shm) + memunmap(optee->memremaped_shm); + optee_wait_queue_exit(&optee->wait_queue); + optee_supp_uninit(&optee->supp); + mutex_destroy(&optee->call_queue.mutex); + + kfree(optee); +} + +static const struct of_device_id optee_match[] = { + { .compatible = "linaro,optee-tz" }, + {}, +}; + +static struct optee *optee_svc; + +static int __init optee_driver_init(void) +{ + struct device_node *fw_np; + struct device_node *np; + struct optee *optee; + + /* Node is supposed to be below /firmware */ + fw_np = of_find_node_by_name(NULL, "firmware"); + if (!fw_np) + return -ENODEV; + + np = of_find_matching_node(fw_np, optee_match); + if (!np) + return -ENODEV; + + optee = optee_probe(np); + of_node_put(np); + + if (IS_ERR(optee)) + return PTR_ERR(optee); + + optee_svc = optee; + + return 0; +} +module_init(optee_driver_init); + +static void __exit optee_driver_exit(void) +{ + struct optee *optee = optee_svc; + + optee_svc = NULL; + if (optee) + optee_remove(optee); +} +module_exit(optee_driver_exit); + +MODULE_AUTHOR("Linaro"); +MODULE_DESCRIPTION("OP-TEE driver"); +MODULE_SUPPORTED_DEVICE(""); +MODULE_VERSION("1.0"); +MODULE_LICENSE("GPL v2");
diff --git a/drivers/tee/optee/optee_msg.h b/drivers/tee/optee/optee_msg.h new file mode 100644 index 0000000..3050490 --- /dev/null +++ b/drivers/tee/optee/optee_msg.h
@@ -0,0 +1,444 @@ +/* + * Copyright (c) 2015-2016, Linaro Limited + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * 1. Redistributions of source code must retain the above copyright notice, + * this list of conditions and the following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ +#ifndef _OPTEE_MSG_H +#define _OPTEE_MSG_H + +#include <linux/bitops.h> +#include <linux/types.h> + +/* + * This file defines the OP-TEE message protocol used to communicate + * with an instance of OP-TEE running in secure world. + * + * This file is divided into three sections. + * 1. Formatting of messages. + * 2. Requests from normal world + * 3. Requests from secure world, Remote Procedure Call (RPC), handled by + * tee-supplicant. + */ + +/***************************************************************************** + * Part 1 - formatting of messages + *****************************************************************************/ + +#define OPTEE_MSG_ATTR_TYPE_NONE 0x0 +#define OPTEE_MSG_ATTR_TYPE_VALUE_INPUT 0x1 +#define OPTEE_MSG_ATTR_TYPE_VALUE_OUTPUT 0x2 +#define OPTEE_MSG_ATTR_TYPE_VALUE_INOUT 0x3 +#define OPTEE_MSG_ATTR_TYPE_RMEM_INPUT 0x5 +#define OPTEE_MSG_ATTR_TYPE_RMEM_OUTPUT 0x6 +#define OPTEE_MSG_ATTR_TYPE_RMEM_INOUT 0x7 +#define OPTEE_MSG_ATTR_TYPE_TMEM_INPUT 0x9 +#define OPTEE_MSG_ATTR_TYPE_TMEM_OUTPUT 0xa +#define OPTEE_MSG_ATTR_TYPE_TMEM_INOUT 0xb + +#define OPTEE_MSG_ATTR_TYPE_MASK GENMASK(7, 0) + +/* + * Meta parameter to be absorbed by the Secure OS and not passed + * to the Trusted Application. + * + * Currently only used with OPTEE_MSG_CMD_OPEN_SESSION. + */ +#define OPTEE_MSG_ATTR_META BIT(8) + +/* + * Pointer to a list of pages used to register user-defined SHM buffer. + * Used with OPTEE_MSG_ATTR_TYPE_TMEM_*. + * buf_ptr should point to the beginning of the buffer. Buffer will contain + * list of page addresses. OP-TEE core can reconstruct contiguous buffer from + * that page addresses list. Page addresses are stored as 64 bit values. + * Last entry on a page should point to the next page of buffer. + * Every entry in buffer should point to a 4k page beginning (12 least + * significant bits must be equal to zero). + * + * 12 least significant bints of optee_msg_param.u.tmem.buf_ptr should hold page + * offset of the user buffer. + * + * So, entries should be placed like members of this structure: + * + * struct page_data { + * uint64_t pages_array[OPTEE_MSG_NONCONTIG_PAGE_SIZE/sizeof(uint64_t) - 1]; + * uint64_t next_page_data; + * }; + * + * Structure is designed to exactly fit into the page size + * OPTEE_MSG_NONCONTIG_PAGE_SIZE which is a standard 4KB page. + * + * The size of 4KB is chosen because this is the smallest page size for ARM + * architectures. If REE uses larger pages, it should divide them to 4KB ones. + */ +#define OPTEE_MSG_ATTR_NONCONTIG BIT(9) + +/* + * Memory attributes for caching passed with temp memrefs. The actual value + * used is defined outside the message protocol with the exception of + * OPTEE_MSG_ATTR_CACHE_PREDEFINED which means the attributes already + * defined for the memory range should be used. If optee_smc.h is used as + * bearer of this protocol OPTEE_SMC_SHM_* is used for values. + */ +#define OPTEE_MSG_ATTR_CACHE_SHIFT 16 +#define OPTEE_MSG_ATTR_CACHE_MASK GENMASK(2, 0) +#define OPTEE_MSG_ATTR_CACHE_PREDEFINED 0 + +/* + * Same values as TEE_LOGIN_* from TEE Internal API + */ +#define OPTEE_MSG_LOGIN_PUBLIC 0x00000000 +#define OPTEE_MSG_LOGIN_USER 0x00000001 +#define OPTEE_MSG_LOGIN_GROUP 0x00000002 +#define OPTEE_MSG_LOGIN_APPLICATION 0x00000004 +#define OPTEE_MSG_LOGIN_APPLICATION_USER 0x00000005 +#define OPTEE_MSG_LOGIN_APPLICATION_GROUP 0x00000006 + +/* + * Page size used in non-contiguous buffer entries + */ +#define OPTEE_MSG_NONCONTIG_PAGE_SIZE 4096 + +/** + * struct optee_msg_param_tmem - temporary memory reference parameter + * @buf_ptr: Address of the buffer + * @size: Size of the buffer + * @shm_ref: Temporary shared memory reference, pointer to a struct tee_shm + * + * Secure and normal world communicates pointers as physical address + * instead of the virtual address. This is because secure and normal world + * have completely independent memory mapping. Normal world can even have a + * hypervisor which need to translate the guest physical address (AKA IPA + * in ARM documentation) to a real physical address before passing the + * structure to secure world. + */ +struct optee_msg_param_tmem { + u64 buf_ptr; + u64 size; + u64 shm_ref; +}; + +/** + * struct optee_msg_param_rmem - registered memory reference parameter + * @offs: Offset into shared memory reference + * @size: Size of the buffer + * @shm_ref: Shared memory reference, pointer to a struct tee_shm + */ +struct optee_msg_param_rmem { + u64 offs; + u64 size; + u64 shm_ref; +}; + +/** + * struct optee_msg_param_value - opaque value parameter + * + * Value parameters are passed unchecked between normal and secure world. + */ +struct optee_msg_param_value { + u64 a; + u64 b; + u64 c; +}; + +/** + * struct optee_msg_param - parameter used together with struct optee_msg_arg + * @attr: attributes + * @tmem: parameter by temporary memory reference + * @rmem: parameter by registered memory reference + * @value: parameter by opaque value + * + * @attr & OPTEE_MSG_ATTR_TYPE_MASK indicates if tmem, rmem or value is used in + * the union. OPTEE_MSG_ATTR_TYPE_VALUE_* indicates value, + * OPTEE_MSG_ATTR_TYPE_TMEM_* indicates @tmem and + * OPTEE_MSG_ATTR_TYPE_RMEM_* indicates @rmem, + * OPTEE_MSG_ATTR_TYPE_NONE indicates that none of the members are used. + */ +struct optee_msg_param { + u64 attr; + union { + struct optee_msg_param_tmem tmem; + struct optee_msg_param_rmem rmem; + struct optee_msg_param_value value; + } u; +}; + +/** + * struct optee_msg_arg - call argument + * @cmd: Command, one of OPTEE_MSG_CMD_* or OPTEE_MSG_RPC_CMD_* + * @func: Trusted Application function, specific to the Trusted Application, + * used if cmd == OPTEE_MSG_CMD_INVOKE_COMMAND + * @session: In parameter for all OPTEE_MSG_CMD_* except + * OPTEE_MSG_CMD_OPEN_SESSION where it's an output parameter instead + * @cancel_id: Cancellation id, a unique value to identify this request + * @ret: return value + * @ret_origin: origin of the return value + * @num_params: number of parameters supplied to the OS Command + * @params: the parameters supplied to the OS Command + * + * All normal calls to Trusted OS uses this struct. If cmd requires further + * information than what these field holds it can be passed as a parameter + * tagged as meta (setting the OPTEE_MSG_ATTR_META bit in corresponding + * attrs field). All parameters tagged as meta has to come first. + * + * Temp memref parameters can be fragmented if supported by the Trusted OS + * (when optee_smc.h is bearer of this protocol this is indicated with + * OPTEE_SMC_SEC_CAP_UNREGISTERED_SHM). If a logical memref parameter is + * fragmented then has all but the last fragment the + * OPTEE_MSG_ATTR_FRAGMENT bit set in attrs. Even if a memref is fragmented + * it will still be presented as a single logical memref to the Trusted + * Application. + */ +struct optee_msg_arg { + u32 cmd; + u32 func; + u32 session; + u32 cancel_id; + u32 pad; + u32 ret; + u32 ret_origin; + u32 num_params; + + /* num_params tells the actual number of element in params */ + struct optee_msg_param params[0]; +}; + +/** + * OPTEE_MSG_GET_ARG_SIZE - return size of struct optee_msg_arg + * + * @num_params: Number of parameters embedded in the struct optee_msg_arg + * + * Returns the size of the struct optee_msg_arg together with the number + * of embedded parameters. + */ +#define OPTEE_MSG_GET_ARG_SIZE(num_params) \ + (sizeof(struct optee_msg_arg) + \ + sizeof(struct optee_msg_param) * (num_params)) + +/***************************************************************************** + * Part 2 - requests from normal world + *****************************************************************************/ + +/* + * Return the following UID if using API specified in this file without + * further extensions: + * 384fb3e0-e7f8-11e3-af63-0002a5d5c51b. + * Represented in 4 32-bit words in OPTEE_MSG_UID_0, OPTEE_MSG_UID_1, + * OPTEE_MSG_UID_2, OPTEE_MSG_UID_3. + */ +#define OPTEE_MSG_UID_0 0x384fb3e0 +#define OPTEE_MSG_UID_1 0xe7f811e3 +#define OPTEE_MSG_UID_2 0xaf630002 +#define OPTEE_MSG_UID_3 0xa5d5c51b +#define OPTEE_MSG_FUNCID_CALLS_UID 0xFF01 + +/* + * Returns 2.0 if using API specified in this file without further + * extensions. Represented in 2 32-bit words in OPTEE_MSG_REVISION_MAJOR + * and OPTEE_MSG_REVISION_MINOR + */ +#define OPTEE_MSG_REVISION_MAJOR 2 +#define OPTEE_MSG_REVISION_MINOR 0 +#define OPTEE_MSG_FUNCID_CALLS_REVISION 0xFF03 + +/* + * Get UUID of Trusted OS. + * + * Used by non-secure world to figure out which Trusted OS is installed. + * Note that returned UUID is the UUID of the Trusted OS, not of the API. + * + * Returns UUID in 4 32-bit words in the same way as + * OPTEE_MSG_FUNCID_CALLS_UID described above. + */ +#define OPTEE_MSG_OS_OPTEE_UUID_0 0x486178e0 +#define OPTEE_MSG_OS_OPTEE_UUID_1 0xe7f811e3 +#define OPTEE_MSG_OS_OPTEE_UUID_2 0xbc5e0002 +#define OPTEE_MSG_OS_OPTEE_UUID_3 0xa5d5c51b +#define OPTEE_MSG_FUNCID_GET_OS_UUID 0x0000 + +/* + * Get revision of Trusted OS. + * + * Used by non-secure world to figure out which version of the Trusted OS + * is installed. Note that the returned revision is the revision of the + * Trusted OS, not of the API. + * + * Returns revision in 2 32-bit words in the same way as + * OPTEE_MSG_CALLS_REVISION described above. + */ +#define OPTEE_MSG_FUNCID_GET_OS_REVISION 0x0001 + +/* + * Do a secure call with struct optee_msg_arg as argument + * The OPTEE_MSG_CMD_* below defines what goes in struct optee_msg_arg::cmd + * + * OPTEE_MSG_CMD_OPEN_SESSION opens a session to a Trusted Application. + * The first two parameters are tagged as meta, holding two value + * parameters to pass the following information: + * param[0].u.value.a-b uuid of Trusted Application + * param[1].u.value.a-b uuid of Client + * param[1].u.value.c Login class of client OPTEE_MSG_LOGIN_* + * + * OPTEE_MSG_CMD_INVOKE_COMMAND invokes a command a previously opened + * session to a Trusted Application. struct optee_msg_arg::func is Trusted + * Application function, specific to the Trusted Application. + * + * OPTEE_MSG_CMD_CLOSE_SESSION closes a previously opened session to + * Trusted Application. + * + * OPTEE_MSG_CMD_CANCEL cancels a currently invoked command. + * + * OPTEE_MSG_CMD_REGISTER_SHM registers a shared memory reference. The + * information is passed as: + * [in] param[0].attr OPTEE_MSG_ATTR_TYPE_TMEM_INPUT + * [| OPTEE_MSG_ATTR_FRAGMENT] + * [in] param[0].u.tmem.buf_ptr physical address (of first fragment) + * [in] param[0].u.tmem.size size (of first fragment) + * [in] param[0].u.tmem.shm_ref holds shared memory reference + * ... + * The shared memory can optionally be fragmented, temp memrefs can follow + * each other with all but the last with the OPTEE_MSG_ATTR_FRAGMENT bit set. + * + * OPTEE_MSG_CMD_UNREGISTER_SHM unregisteres a previously registered shared + * memory reference. The information is passed as: + * [in] param[0].attr OPTEE_MSG_ATTR_TYPE_RMEM_INPUT + * [in] param[0].u.rmem.shm_ref holds shared memory reference + * [in] param[0].u.rmem.offs 0 + * [in] param[0].u.rmem.size 0 + */ +#define OPTEE_MSG_CMD_OPEN_SESSION 0 +#define OPTEE_MSG_CMD_INVOKE_COMMAND 1 +#define OPTEE_MSG_CMD_CLOSE_SESSION 2 +#define OPTEE_MSG_CMD_CANCEL 3 +#define OPTEE_MSG_CMD_REGISTER_SHM 4 +#define OPTEE_MSG_CMD_UNREGISTER_SHM 5 +#define OPTEE_MSG_FUNCID_CALL_WITH_ARG 0x0004 + +/***************************************************************************** + * Part 3 - Requests from secure world, RPC + *****************************************************************************/ + +/* + * All RPC is done with a struct optee_msg_arg as bearer of information, + * struct optee_msg_arg::arg holds values defined by OPTEE_MSG_RPC_CMD_* below + * + * RPC communication with tee-supplicant is reversed compared to normal + * client communication desribed above. The supplicant receives requests + * and sends responses. + */ + +/* + * Load a TA into memory, defined in tee-supplicant + */ +#define OPTEE_MSG_RPC_CMD_LOAD_TA 0 + +/* + * Reserved + */ +#define OPTEE_MSG_RPC_CMD_RPMB 1 + +/* + * File system access, defined in tee-supplicant + */ +#define OPTEE_MSG_RPC_CMD_FS 2 + +/* + * Get time + * + * Returns number of seconds and nano seconds since the Epoch, + * 1970-01-01 00:00:00 +0000 (UTC). + * + * [out] param[0].u.value.a Number of seconds + * [out] param[0].u.value.b Number of nano seconds. + */ +#define OPTEE_MSG_RPC_CMD_GET_TIME 3 + +/* + * Wait queue primitive, helper for secure world to implement a wait queue. + * + * If secure world need to wait for a secure world mutex it issues a sleep + * request instead of spinning in secure world. Conversely is a wakeup + * request issued when a secure world mutex with a thread waiting thread is + * unlocked. + * + * Waiting on a key + * [in] param[0].u.value.a OPTEE_MSG_RPC_WAIT_QUEUE_SLEEP + * [in] param[0].u.value.b wait key + * + * Waking up a key + * [in] param[0].u.value.a OPTEE_MSG_RPC_WAIT_QUEUE_WAKEUP + * [in] param[0].u.value.b wakeup key + */ +#define OPTEE_MSG_RPC_CMD_WAIT_QUEUE 4 +#define OPTEE_MSG_RPC_WAIT_QUEUE_SLEEP 0 +#define OPTEE_MSG_RPC_WAIT_QUEUE_WAKEUP 1 + +/* + * Suspend execution + * + * [in] param[0].value .a number of milliseconds to suspend + */ +#define OPTEE_MSG_RPC_CMD_SUSPEND 5 + +/* + * Allocate a piece of shared memory + * + * Shared memory can optionally be fragmented, to support that additional + * spare param entries are allocated to make room for eventual fragments. + * The spare param entries has .attr = OPTEE_MSG_ATTR_TYPE_NONE when + * unused. All returned temp memrefs except the last should have the + * OPTEE_MSG_ATTR_FRAGMENT bit set in the attr field. + * + * [in] param[0].u.value.a type of memory one of + * OPTEE_MSG_RPC_SHM_TYPE_* below + * [in] param[0].u.value.b requested size + * [in] param[0].u.value.c required alignment + * + * [out] param[0].u.tmem.buf_ptr physical address (of first fragment) + * [out] param[0].u.tmem.size size (of first fragment) + * [out] param[0].u.tmem.shm_ref shared memory reference + * ... + * [out] param[n].u.tmem.buf_ptr physical address + * [out] param[n].u.tmem.size size + * [out] param[n].u.tmem.shm_ref shared memory reference (same value + * as in param[n-1].u.tmem.shm_ref) + */ +#define OPTEE_MSG_RPC_CMD_SHM_ALLOC 6 +/* Memory that can be shared with a non-secure user space application */ +#define OPTEE_MSG_RPC_SHM_TYPE_APPL 0 +/* Memory only shared with non-secure kernel */ +#define OPTEE_MSG_RPC_SHM_TYPE_KERNEL 1 + +/* + * Free shared memory previously allocated with OPTEE_MSG_RPC_CMD_SHM_ALLOC + * + * [in] param[0].u.value.a type of memory one of + * OPTEE_MSG_RPC_SHM_TYPE_* above + * [in] param[0].u.value.b value of shared memory reference + * returned in param[0].u.tmem.shm_ref + * above + */ +#define OPTEE_MSG_RPC_CMD_SHM_FREE 7 + +#endif /* _OPTEE_MSG_H */
diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h new file mode 100644 index 0000000..35e7938 --- /dev/null +++ b/drivers/tee/optee/optee_private.h
@@ -0,0 +1,199 @@ +/* + * Copyright (c) 2015, Linaro Limited + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#ifndef OPTEE_PRIVATE_H +#define OPTEE_PRIVATE_H + +#include <linux/arm-smccc.h> +#include <linux/semaphore.h> +#include <linux/tee_drv.h> +#include <linux/types.h> +#include "optee_msg.h" + +#define OPTEE_MAX_ARG_SIZE 1024 + +/* Some Global Platform error codes used in this driver */ +#define TEEC_SUCCESS 0x00000000 +#define TEEC_ERROR_BAD_PARAMETERS 0xFFFF0006 +#define TEEC_ERROR_COMMUNICATION 0xFFFF000E +#define TEEC_ERROR_OUT_OF_MEMORY 0xFFFF000C + +#define TEEC_ORIGIN_COMMS 0x00000002 + +typedef void (optee_invoke_fn)(unsigned long, unsigned long, unsigned long, + unsigned long, unsigned long, unsigned long, + unsigned long, unsigned long, + struct arm_smccc_res *); + +struct optee_call_queue { + /* Serializes access to this struct */ + struct mutex mutex; + struct list_head waiters; +}; + +struct optee_wait_queue { + /* Serializes access to this struct */ + struct mutex mu; + struct list_head db; +}; + +/** + * struct optee_supp - supplicant synchronization struct + * @ctx the context of current connected supplicant. + * if !NULL the supplicant device is available for use, + * else busy + * @mutex: held while accessing content of this struct + * @req_id: current request id if supplicant is doing synchronous + * communication, else -1 + * @reqs: queued request not yet retrieved by supplicant + * @idr: IDR holding all requests currently being processed + * by supplicant + * @reqs_c: completion used by supplicant when waiting for a + * request to be queued. + */ +struct optee_supp { + /* Serializes access to this struct */ + struct mutex mutex; + struct tee_context *ctx; + + int req_id; + struct list_head reqs; + struct idr idr; + struct completion reqs_c; +}; + +/** + * struct optee - main service struct + * @supp_teedev: supplicant device + * @teedev: client device + * @invoke_fn: function to issue smc or hvc + * @call_queue: queue of threads waiting to call @invoke_fn + * @wait_queue: queue of threads from secure world waiting for a + * secure world sync object + * @supp: supplicant synchronization struct for RPC to supplicant + * @pool: shared memory pool + * @memremaped_shm virtual address of memory in shared memory pool + * @sec_caps: secure world capabilities defined by + * OPTEE_SMC_SEC_CAP_* in optee_smc.h + */ +struct optee { + struct tee_device *supp_teedev; + struct tee_device *teedev; + optee_invoke_fn *invoke_fn; + struct optee_call_queue call_queue; + struct optee_wait_queue wait_queue; + struct optee_supp supp; + struct tee_shm_pool *pool; + void *memremaped_shm; + u32 sec_caps; +}; + +struct optee_session { + struct list_head list_node; + u32 session_id; +}; + +struct optee_context_data { + /* Serializes access to this struct */ + struct mutex mutex; + struct list_head sess_list; +}; + +struct optee_rpc_param { + u32 a0; + u32 a1; + u32 a2; + u32 a3; + u32 a4; + u32 a5; + u32 a6; + u32 a7; +}; + +/* Holds context that is preserved during one STD call */ +struct optee_call_ctx { + /* information about pages list used in last allocation */ + void *pages_list; + size_t num_entries; +}; + +void optee_handle_rpc(struct tee_context *ctx, struct optee_rpc_param *param, + struct optee_call_ctx *call_ctx); +void optee_rpc_finalize_call(struct optee_call_ctx *call_ctx); + +void optee_wait_queue_init(struct optee_wait_queue *wq); +void optee_wait_queue_exit(struct optee_wait_queue *wq); + +u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params, + struct tee_param *param); + +int optee_supp_read(struct tee_context *ctx, void __user *buf, size_t len); +int optee_supp_write(struct tee_context *ctx, void __user *buf, size_t len); +void optee_supp_init(struct optee_supp *supp); +void optee_supp_uninit(struct optee_supp *supp); +void optee_supp_release(struct optee_supp *supp); + +int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params, + struct tee_param *param); +int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params, + struct tee_param *param); + +u32 optee_do_call_with_arg(struct tee_context *ctx, phys_addr_t parg); +int optee_open_session(struct tee_context *ctx, + struct tee_ioctl_open_session_arg *arg, + struct tee_param *param); +int optee_close_session(struct tee_context *ctx, u32 session); +int optee_invoke_func(struct tee_context *ctx, struct tee_ioctl_invoke_arg *arg, + struct tee_param *param); +int optee_cancel_req(struct tee_context *ctx, u32 cancel_id, u32 session); + +void optee_enable_shm_cache(struct optee *optee); +void optee_disable_shm_cache(struct optee *optee); + +int optee_shm_register(struct tee_context *ctx, struct tee_shm *shm, + struct page **pages, size_t num_pages, + unsigned long start); +int optee_shm_unregister(struct tee_context *ctx, struct tee_shm *shm); + +int optee_shm_register_supp(struct tee_context *ctx, struct tee_shm *shm, + struct page **pages, size_t num_pages, + unsigned long start); +int optee_shm_unregister_supp(struct tee_context *ctx, struct tee_shm *shm); + +int optee_from_msg_param(struct tee_param *params, size_t num_params, + const struct optee_msg_param *msg_params); +int optee_to_msg_param(struct optee_msg_param *msg_params, size_t num_params, + const struct tee_param *params); + +u64 *optee_allocate_pages_list(size_t num_entries); +void optee_free_pages_list(void *array, size_t num_entries); +void optee_fill_pages_list(u64 *dst, struct page **pages, int num_pages, + size_t page_offset); + +/* + * Small helpers + */ + +static inline void *reg_pair_to_ptr(u32 reg0, u32 reg1) +{ + return (void *)(unsigned long)(((u64)reg0 << 32) | reg1); +} + +static inline void reg_pair_from_64(u32 *reg0, u32 *reg1, u64 val) +{ + *reg0 = val >> 32; + *reg1 = val; +} + +#endif /*OPTEE_PRIVATE_H*/
diff --git a/drivers/tee/optee/optee_smc.h b/drivers/tee/optee/optee_smc.h new file mode 100644 index 0000000..7cd3272 --- /dev/null +++ b/drivers/tee/optee/optee_smc.h
@@ -0,0 +1,457 @@ +/* + * Copyright (c) 2015-2016, Linaro Limited + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * 1. Redistributions of source code must retain the above copyright notice, + * this list of conditions and the following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ +#ifndef OPTEE_SMC_H +#define OPTEE_SMC_H + +#include <linux/arm-smccc.h> +#include <linux/bitops.h> + +#define OPTEE_SMC_STD_CALL_VAL(func_num) \ + ARM_SMCCC_CALL_VAL(ARM_SMCCC_STD_CALL, ARM_SMCCC_SMC_32, \ + ARM_SMCCC_OWNER_TRUSTED_OS, (func_num)) +#define OPTEE_SMC_FAST_CALL_VAL(func_num) \ + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, ARM_SMCCC_SMC_32, \ + ARM_SMCCC_OWNER_TRUSTED_OS, (func_num)) + +/* + * Function specified by SMC Calling convention. + */ +#define OPTEE_SMC_FUNCID_CALLS_COUNT 0xFF00 +#define OPTEE_SMC_CALLS_COUNT \ + ARM_SMCCC_CALL_VAL(OPTEE_SMC_FAST_CALL, SMCCC_SMC_32, \ + SMCCC_OWNER_TRUSTED_OS_END, \ + OPTEE_SMC_FUNCID_CALLS_COUNT) + +/* + * Normal cached memory (write-back), shareable for SMP systems and not + * shareable for UP systems. + */ +#define OPTEE_SMC_SHM_CACHED 1 + +/* + * a0..a7 is used as register names in the descriptions below, on arm32 + * that translates to r0..r7 and on arm64 to w0..w7. In both cases it's + * 32-bit registers. + */ + +/* + * Function specified by SMC Calling convention + * + * Return one of the following UIDs if using API specified in this file + * without further extentions: + * 65cb6b93-af0c-4617-8ed6-644a8d1140f8 + * see also OPTEE_SMC_UID_* in optee_msg.h + */ +#define OPTEE_SMC_FUNCID_CALLS_UID OPTEE_MSG_FUNCID_CALLS_UID +#define OPTEE_SMC_CALLS_UID \ + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, ARM_SMCCC_SMC_32, \ + ARM_SMCCC_OWNER_TRUSTED_OS_END, \ + OPTEE_SMC_FUNCID_CALLS_UID) + +/* + * Function specified by SMC Calling convention + * + * Returns 2.0 if using API specified in this file without further extentions. + * see also OPTEE_MSG_REVISION_* in optee_msg.h + */ +#define OPTEE_SMC_FUNCID_CALLS_REVISION OPTEE_MSG_FUNCID_CALLS_REVISION +#define OPTEE_SMC_CALLS_REVISION \ + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, ARM_SMCCC_SMC_32, \ + ARM_SMCCC_OWNER_TRUSTED_OS_END, \ + OPTEE_SMC_FUNCID_CALLS_REVISION) + +struct optee_smc_calls_revision_result { + unsigned long major; + unsigned long minor; + unsigned long reserved0; + unsigned long reserved1; +}; + +/* + * Get UUID of Trusted OS. + * + * Used by non-secure world to figure out which Trusted OS is installed. + * Note that returned UUID is the UUID of the Trusted OS, not of the API. + * + * Returns UUID in a0-4 in the same way as OPTEE_SMC_CALLS_UID + * described above. + */ +#define OPTEE_SMC_FUNCID_GET_OS_UUID OPTEE_MSG_FUNCID_GET_OS_UUID +#define OPTEE_SMC_CALL_GET_OS_UUID \ + OPTEE_SMC_FAST_CALL_VAL(OPTEE_SMC_FUNCID_GET_OS_UUID) + +/* + * Get revision of Trusted OS. + * + * Used by non-secure world to figure out which version of the Trusted OS + * is installed. Note that the returned revision is the revision of the + * Trusted OS, not of the API. + * + * Returns revision in a0-1 in the same way as OPTEE_SMC_CALLS_REVISION + * described above. + */ +#define OPTEE_SMC_FUNCID_GET_OS_REVISION OPTEE_MSG_FUNCID_GET_OS_REVISION +#define OPTEE_SMC_CALL_GET_OS_REVISION \ + OPTEE_SMC_FAST_CALL_VAL(OPTEE_SMC_FUNCID_GET_OS_REVISION) + +/* + * Call with struct optee_msg_arg as argument + * + * Call register usage: + * a0 SMC Function ID, OPTEE_SMC*CALL_WITH_ARG + * a1 Upper 32bit of a 64bit physical pointer to a struct optee_msg_arg + * a2 Lower 32bit of a 64bit physical pointer to a struct optee_msg_arg + * a3 Cache settings, not used if physical pointer is in a predefined shared + * memory area else per OPTEE_SMC_SHM_* + * a4-6 Not used + * a7 Hypervisor Client ID register + * + * Normal return register usage: + * a0 Return value, OPTEE_SMC_RETURN_* + * a1-3 Not used + * a4-7 Preserved + * + * OPTEE_SMC_RETURN_ETHREAD_LIMIT return register usage: + * a0 Return value, OPTEE_SMC_RETURN_ETHREAD_LIMIT + * a1-3 Preserved + * a4-7 Preserved + * + * RPC return register usage: + * a0 Return value, OPTEE_SMC_RETURN_IS_RPC(val) + * a1-2 RPC parameters + * a3-7 Resume information, must be preserved + * + * Possible return values: + * OPTEE_SMC_RETURN_UNKNOWN_FUNCTION Trusted OS does not recognize this + * function. + * OPTEE_SMC_RETURN_OK Call completed, result updated in + * the previously supplied struct + * optee_msg_arg. + * OPTEE_SMC_RETURN_ETHREAD_LIMIT Number of Trusted OS threads exceeded, + * try again later. + * OPTEE_SMC_RETURN_EBADADDR Bad physcial pointer to struct + * optee_msg_arg. + * OPTEE_SMC_RETURN_EBADCMD Bad/unknown cmd in struct optee_msg_arg + * OPTEE_SMC_RETURN_IS_RPC() Call suspended by RPC call to normal + * world. + */ +#define OPTEE_SMC_FUNCID_CALL_WITH_ARG OPTEE_MSG_FUNCID_CALL_WITH_ARG +#define OPTEE_SMC_CALL_WITH_ARG \ + OPTEE_SMC_STD_CALL_VAL(OPTEE_SMC_FUNCID_CALL_WITH_ARG) + +/* + * Get Shared Memory Config + * + * Returns the Secure/Non-secure shared memory config. + * + * Call register usage: + * a0 SMC Function ID, OPTEE_SMC_GET_SHM_CONFIG + * a1-6 Not used + * a7 Hypervisor Client ID register + * + * Have config return register usage: + * a0 OPTEE_SMC_RETURN_OK + * a1 Physical address of start of SHM + * a2 Size of of SHM + * a3 Cache settings of memory, as defined by the + * OPTEE_SMC_SHM_* values above + * a4-7 Preserved + * + * Not available register usage: + * a0 OPTEE_SMC_RETURN_ENOTAVAIL + * a1-3 Not used + * a4-7 Preserved + */ +#define OPTEE_SMC_FUNCID_GET_SHM_CONFIG 7 +#define OPTEE_SMC_GET_SHM_CONFIG \ + OPTEE_SMC_FAST_CALL_VAL(OPTEE_SMC_FUNCID_GET_SHM_CONFIG) + +struct optee_smc_get_shm_config_result { + unsigned long status; + unsigned long start; + unsigned long size; + unsigned long settings; +}; + +/* + * Exchanges capabilities between normal world and secure world + * + * Call register usage: + * a0 SMC Function ID, OPTEE_SMC_EXCHANGE_CAPABILITIES + * a1 bitfield of normal world capabilities OPTEE_SMC_NSEC_CAP_* + * a2-6 Not used + * a7 Hypervisor Client ID register + * + * Normal return register usage: + * a0 OPTEE_SMC_RETURN_OK + * a1 bitfield of secure world capabilities OPTEE_SMC_SEC_CAP_* + * a2-7 Preserved + * + * Error return register usage: + * a0 OPTEE_SMC_RETURN_ENOTAVAIL, can't use the capabilities from normal world + * a1 bitfield of secure world capabilities OPTEE_SMC_SEC_CAP_* + * a2-7 Preserved + */ +/* Normal world works as a uniprocessor system */ +#define OPTEE_SMC_NSEC_CAP_UNIPROCESSOR BIT(0) +/* Secure world has reserved shared memory for normal world to use */ +#define OPTEE_SMC_SEC_CAP_HAVE_RESERVED_SHM BIT(0) +/* Secure world can communicate via previously unregistered shared memory */ +#define OPTEE_SMC_SEC_CAP_UNREGISTERED_SHM BIT(1) + +/* + * Secure world supports commands "register/unregister shared memory", + * secure world accepts command buffers located in any parts of non-secure RAM + */ +#define OPTEE_SMC_SEC_CAP_DYNAMIC_SHM BIT(2) + +#define OPTEE_SMC_FUNCID_EXCHANGE_CAPABILITIES 9 +#define OPTEE_SMC_EXCHANGE_CAPABILITIES \ + OPTEE_SMC_FAST_CALL_VAL(OPTEE_SMC_FUNCID_EXCHANGE_CAPABILITIES) + +struct optee_smc_exchange_capabilities_result { + unsigned long status; + unsigned long capabilities; + unsigned long reserved0; + unsigned long reserved1; +}; + +/* + * Disable and empties cache of shared memory objects + * + * Secure world can cache frequently used shared memory objects, for + * example objects used as RPC arguments. When secure world is idle this + * function returns one shared memory reference to free. To disable the + * cache and free all cached objects this function has to be called until + * it returns OPTEE_SMC_RETURN_ENOTAVAIL. + * + * Call register usage: + * a0 SMC Function ID, OPTEE_SMC_DISABLE_SHM_CACHE + * a1-6 Not used + * a7 Hypervisor Client ID register + * + * Normal return register usage: + * a0 OPTEE_SMC_RETURN_OK + * a1 Upper 32bit of a 64bit Shared memory cookie + * a2 Lower 32bit of a 64bit Shared memory cookie + * a3-7 Preserved + * + * Cache empty return register usage: + * a0 OPTEE_SMC_RETURN_ENOTAVAIL + * a1-7 Preserved + * + * Not idle return register usage: + * a0 OPTEE_SMC_RETURN_EBUSY + * a1-7 Preserved + */ +#define OPTEE_SMC_FUNCID_DISABLE_SHM_CACHE 10 +#define OPTEE_SMC_DISABLE_SHM_CACHE \ + OPTEE_SMC_FAST_CALL_VAL(OPTEE_SMC_FUNCID_DISABLE_SHM_CACHE) + +struct optee_smc_disable_shm_cache_result { + unsigned long status; + unsigned long shm_upper32; + unsigned long shm_lower32; + unsigned long reserved0; +}; + +/* + * Enable cache of shared memory objects + * + * Secure world can cache frequently used shared memory objects, for + * example objects used as RPC arguments. When secure world is idle this + * function returns OPTEE_SMC_RETURN_OK and the cache is enabled. If + * secure world isn't idle OPTEE_SMC_RETURN_EBUSY is returned. + * + * Call register usage: + * a0 SMC Function ID, OPTEE_SMC_ENABLE_SHM_CACHE + * a1-6 Not used + * a7 Hypervisor Client ID register + * + * Normal return register usage: + * a0 OPTEE_SMC_RETURN_OK + * a1-7 Preserved + * + * Not idle return register usage: + * a0 OPTEE_SMC_RETURN_EBUSY + * a1-7 Preserved + */ +#define OPTEE_SMC_FUNCID_ENABLE_SHM_CACHE 11 +#define OPTEE_SMC_ENABLE_SHM_CACHE \ + OPTEE_SMC_FAST_CALL_VAL(OPTEE_SMC_FUNCID_ENABLE_SHM_CACHE) + +/* + * Resume from RPC (for example after processing a foreign interrupt) + * + * Call register usage: + * a0 SMC Function ID, OPTEE_SMC_CALL_RETURN_FROM_RPC + * a1-3 Value of a1-3 when OPTEE_SMC_CALL_WITH_ARG returned + * OPTEE_SMC_RETURN_RPC in a0 + * + * Return register usage is the same as for OPTEE_SMC_*CALL_WITH_ARG above. + * + * Possible return values + * OPTEE_SMC_RETURN_UNKNOWN_FUNCTION Trusted OS does not recognize this + * function. + * OPTEE_SMC_RETURN_OK Original call completed, result + * updated in the previously supplied. + * struct optee_msg_arg + * OPTEE_SMC_RETURN_RPC Call suspended by RPC call to normal + * world. + * OPTEE_SMC_RETURN_ERESUME Resume failed, the opaque resume + * information was corrupt. + */ +#define OPTEE_SMC_FUNCID_RETURN_FROM_RPC 3 +#define OPTEE_SMC_CALL_RETURN_FROM_RPC \ + OPTEE_SMC_STD_CALL_VAL(OPTEE_SMC_FUNCID_RETURN_FROM_RPC) + +#define OPTEE_SMC_RETURN_RPC_PREFIX_MASK 0xFFFF0000 +#define OPTEE_SMC_RETURN_RPC_PREFIX 0xFFFF0000 +#define OPTEE_SMC_RETURN_RPC_FUNC_MASK 0x0000FFFF + +#define OPTEE_SMC_RETURN_GET_RPC_FUNC(ret) \ + ((ret) & OPTEE_SMC_RETURN_RPC_FUNC_MASK) + +#define OPTEE_SMC_RPC_VAL(func) ((func) | OPTEE_SMC_RETURN_RPC_PREFIX) + +/* + * Allocate memory for RPC parameter passing. The memory is used to hold a + * struct optee_msg_arg. + * + * "Call" register usage: + * a0 This value, OPTEE_SMC_RETURN_RPC_ALLOC + * a1 Size in bytes of required argument memory + * a2 Not used + * a3 Resume information, must be preserved + * a4-5 Not used + * a6-7 Resume information, must be preserved + * + * "Return" register usage: + * a0 SMC Function ID, OPTEE_SMC_CALL_RETURN_FROM_RPC. + * a1 Upper 32bits of 64bit physical pointer to allocated + * memory, (a1 == 0 && a2 == 0) if size was 0 or if memory can't + * be allocated. + * a2 Lower 32bits of 64bit physical pointer to allocated + * memory, (a1 == 0 && a2 == 0) if size was 0 or if memory can't + * be allocated + * a3 Preserved + * a4 Upper 32bits of 64bit Shared memory cookie used when freeing + * the memory or doing an RPC + * a5 Lower 32bits of 64bit Shared memory cookie used when freeing + * the memory or doing an RPC + * a6-7 Preserved + */ +#define OPTEE_SMC_RPC_FUNC_ALLOC 0 +#define OPTEE_SMC_RETURN_RPC_ALLOC \ + OPTEE_SMC_RPC_VAL(OPTEE_SMC_RPC_FUNC_ALLOC) + +/* + * Free memory previously allocated by OPTEE_SMC_RETURN_RPC_ALLOC + * + * "Call" register usage: + * a0 This value, OPTEE_SMC_RETURN_RPC_FREE + * a1 Upper 32bits of 64bit shared memory cookie belonging to this + * argument memory + * a2 Lower 32bits of 64bit shared memory cookie belonging to this + * argument memory + * a3-7 Resume information, must be preserved + * + * "Return" register usage: + * a0 SMC Function ID, OPTEE_SMC_CALL_RETURN_FROM_RPC. + * a1-2 Not used + * a3-7 Preserved + */ +#define OPTEE_SMC_RPC_FUNC_FREE 2 +#define OPTEE_SMC_RETURN_RPC_FREE \ + OPTEE_SMC_RPC_VAL(OPTEE_SMC_RPC_FUNC_FREE) + +/* + * Deliver foreign interrupt to normal world. + * + * "Call" register usage: + * a0 OPTEE_SMC_RETURN_RPC_FOREIGN_INTR + * a1-7 Resume information, must be preserved + * + * "Return" register usage: + * a0 SMC Function ID, OPTEE_SMC_CALL_RETURN_FROM_RPC. + * a1-7 Preserved + */ +#define OPTEE_SMC_RPC_FUNC_FOREIGN_INTR 4 +#define OPTEE_SMC_RETURN_RPC_FOREIGN_INTR \ + OPTEE_SMC_RPC_VAL(OPTEE_SMC_RPC_FUNC_FOREIGN_INTR) + +/* + * Do an RPC request. The supplied struct optee_msg_arg tells which + * request to do and the parameters for the request. The following fields + * are used (the rest are unused): + * - cmd the Request ID + * - ret return value of the request, filled in by normal world + * - num_params number of parameters for the request + * - params the parameters + * - param_attrs attributes of the parameters + * + * "Call" register usage: + * a0 OPTEE_SMC_RETURN_RPC_CMD + * a1 Upper 32bit of a 64bit Shared memory cookie holding a + * struct optee_msg_arg, must be preserved, only the data should + * be updated + * a2 Lower 32bit of a 64bit Shared memory cookie holding a + * struct optee_msg_arg, must be preserved, only the data should + * be updated + * a3-7 Resume information, must be preserved + * + * "Return" register usage: + * a0 SMC Function ID, OPTEE_SMC_CALL_RETURN_FROM_RPC. + * a1-2 Not used + * a3-7 Preserved + */ +#define OPTEE_SMC_RPC_FUNC_CMD 5 +#define OPTEE_SMC_RETURN_RPC_CMD \ + OPTEE_SMC_RPC_VAL(OPTEE_SMC_RPC_FUNC_CMD) + +/* Returned in a0 */ +#define OPTEE_SMC_RETURN_UNKNOWN_FUNCTION 0xFFFFFFFF + +/* Returned in a0 only from Trusted OS functions */ +#define OPTEE_SMC_RETURN_OK 0x0 +#define OPTEE_SMC_RETURN_ETHREAD_LIMIT 0x1 +#define OPTEE_SMC_RETURN_EBUSY 0x2 +#define OPTEE_SMC_RETURN_ERESUME 0x3 +#define OPTEE_SMC_RETURN_EBADADDR 0x4 +#define OPTEE_SMC_RETURN_EBADCMD 0x5 +#define OPTEE_SMC_RETURN_ENOMEM 0x6 +#define OPTEE_SMC_RETURN_ENOTAVAIL 0x7 +#define OPTEE_SMC_RETURN_IS_RPC(ret) __optee_smc_return_is_rpc((ret)) + +static inline bool __optee_smc_return_is_rpc(u32 ret) +{ + return ret != OPTEE_SMC_RETURN_UNKNOWN_FUNCTION && + (ret & OPTEE_SMC_RETURN_RPC_PREFIX_MASK) == + OPTEE_SMC_RETURN_RPC_PREFIX; +} + +#endif /* OPTEE_SMC_H */
diff --git a/drivers/tee/optee/rpc.c b/drivers/tee/optee/rpc.c new file mode 100644 index 0000000..41aea12 --- /dev/null +++ b/drivers/tee/optee/rpc.c
@@ -0,0 +1,452 @@ +/* + * Copyright (c) 2015-2016, Linaro Limited + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/delay.h> +#include <linux/device.h> +#include <linux/slab.h> +#include <linux/tee_drv.h> +#include "optee_private.h" +#include "optee_smc.h" + +struct wq_entry { + struct list_head link; + struct completion c; + u32 key; +}; + +void optee_wait_queue_init(struct optee_wait_queue *priv) +{ + mutex_init(&priv->mu); + INIT_LIST_HEAD(&priv->db); +} + +void optee_wait_queue_exit(struct optee_wait_queue *priv) +{ + mutex_destroy(&priv->mu); +} + +static void handle_rpc_func_cmd_get_time(struct optee_msg_arg *arg) +{ + struct timespec64 ts; + + if (arg->num_params != 1) + goto bad; + if ((arg->params[0].attr & OPTEE_MSG_ATTR_TYPE_MASK) != + OPTEE_MSG_ATTR_TYPE_VALUE_OUTPUT) + goto bad; + + getnstimeofday64(&ts); + arg->params[0].u.value.a = ts.tv_sec; + arg->params[0].u.value.b = ts.tv_nsec; + + arg->ret = TEEC_SUCCESS; + return; +bad: + arg->ret = TEEC_ERROR_BAD_PARAMETERS; +} + +static struct wq_entry *wq_entry_get(struct optee_wait_queue *wq, u32 key) +{ + struct wq_entry *w; + + mutex_lock(&wq->mu); + + list_for_each_entry(w, &wq->db, link) + if (w->key == key) + goto out; + + w = kmalloc(sizeof(*w), GFP_KERNEL); + if (w) { + init_completion(&w->c); + w->key = key; + list_add_tail(&w->link, &wq->db); + } +out: + mutex_unlock(&wq->mu); + return w; +} + +static void wq_sleep(struct optee_wait_queue *wq, u32 key) +{ + struct wq_entry *w = wq_entry_get(wq, key); + + if (w) { + wait_for_completion(&w->c); + mutex_lock(&wq->mu); + list_del(&w->link); + mutex_unlock(&wq->mu); + kfree(w); + } +} + +static void wq_wakeup(struct optee_wait_queue *wq, u32 key) +{ + struct wq_entry *w = wq_entry_get(wq, key); + + if (w) + complete(&w->c); +} + +static void handle_rpc_func_cmd_wq(struct optee *optee, + struct optee_msg_arg *arg) +{ + if (arg->num_params != 1) + goto bad; + + if ((arg->params[0].attr & OPTEE_MSG_ATTR_TYPE_MASK) != + OPTEE_MSG_ATTR_TYPE_VALUE_INPUT) + goto bad; + + switch (arg->params[0].u.value.a) { + case OPTEE_MSG_RPC_WAIT_QUEUE_SLEEP: + wq_sleep(&optee->wait_queue, arg->params[0].u.value.b); + break; + case OPTEE_MSG_RPC_WAIT_QUEUE_WAKEUP: + wq_wakeup(&optee->wait_queue, arg->params[0].u.value.b); + break; + default: + goto bad; + } + + arg->ret = TEEC_SUCCESS; + return; +bad: + arg->ret = TEEC_ERROR_BAD_PARAMETERS; +} + +static void handle_rpc_func_cmd_wait(struct optee_msg_arg *arg) +{ + u32 msec_to_wait; + + if (arg->num_params != 1) + goto bad; + + if ((arg->params[0].attr & OPTEE_MSG_ATTR_TYPE_MASK) != + OPTEE_MSG_ATTR_TYPE_VALUE_INPUT) + goto bad; + + msec_to_wait = arg->params[0].u.value.a; + + /* Go to interruptible sleep */ + msleep_interruptible(msec_to_wait); + + arg->ret = TEEC_SUCCESS; + return; +bad: + arg->ret = TEEC_ERROR_BAD_PARAMETERS; +} + +static void handle_rpc_supp_cmd(struct tee_context *ctx, + struct optee_msg_arg *arg) +{ + struct tee_param *params; + + arg->ret_origin = TEEC_ORIGIN_COMMS; + + params = kmalloc_array(arg->num_params, sizeof(struct tee_param), + GFP_KERNEL); + if (!params) { + arg->ret = TEEC_ERROR_OUT_OF_MEMORY; + return; + } + + if (optee_from_msg_param(params, arg->num_params, arg->params)) { + arg->ret = TEEC_ERROR_BAD_PARAMETERS; + goto out; + } + + arg->ret = optee_supp_thrd_req(ctx, arg->cmd, arg->num_params, params); + + if (optee_to_msg_param(arg->params, arg->num_params, params)) + arg->ret = TEEC_ERROR_BAD_PARAMETERS; +out: + kfree(params); +} + +static struct tee_shm *cmd_alloc_suppl(struct tee_context *ctx, size_t sz) +{ + u32 ret; + struct tee_param param; + struct optee *optee = tee_get_drvdata(ctx->teedev); + struct tee_shm *shm; + + param.attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT; + param.u.value.a = OPTEE_MSG_RPC_SHM_TYPE_APPL; + param.u.value.b = sz; + param.u.value.c = 0; + + ret = optee_supp_thrd_req(ctx, OPTEE_MSG_RPC_CMD_SHM_ALLOC, 1, ¶m); + if (ret) + return ERR_PTR(-ENOMEM); + + mutex_lock(&optee->supp.mutex); + /* Increases count as secure world doesn't have a reference */ + shm = tee_shm_get_from_id(optee->supp.ctx, param.u.value.c); + mutex_unlock(&optee->supp.mutex); + return shm; +} + +static void handle_rpc_func_cmd_shm_alloc(struct tee_context *ctx, + struct optee_msg_arg *arg, + struct optee_call_ctx *call_ctx) +{ + phys_addr_t pa; + struct tee_shm *shm; + size_t sz; + size_t n; + + arg->ret_origin = TEEC_ORIGIN_COMMS; + + if (!arg->num_params || + arg->params[0].attr != OPTEE_MSG_ATTR_TYPE_VALUE_INPUT) { + arg->ret = TEEC_ERROR_BAD_PARAMETERS; + return; + } + + for (n = 1; n < arg->num_params; n++) { + if (arg->params[n].attr != OPTEE_MSG_ATTR_TYPE_NONE) { + arg->ret = TEEC_ERROR_BAD_PARAMETERS; + return; + } + } + + sz = arg->params[0].u.value.b; + switch (arg->params[0].u.value.a) { + case OPTEE_MSG_RPC_SHM_TYPE_APPL: + shm = cmd_alloc_suppl(ctx, sz); + break; + case OPTEE_MSG_RPC_SHM_TYPE_KERNEL: + shm = tee_shm_alloc(ctx, sz, TEE_SHM_MAPPED); + break; + default: + arg->ret = TEEC_ERROR_BAD_PARAMETERS; + return; + } + + if (IS_ERR(shm)) { + arg->ret = TEEC_ERROR_OUT_OF_MEMORY; + return; + } + + if (tee_shm_get_pa(shm, 0, &pa)) { + arg->ret = TEEC_ERROR_BAD_PARAMETERS; + goto bad; + } + + sz = tee_shm_get_size(shm); + + if (tee_shm_is_registered(shm)) { + struct page **pages; + u64 *pages_list; + size_t page_num; + + pages = tee_shm_get_pages(shm, &page_num); + if (!pages || !page_num) { + arg->ret = TEEC_ERROR_OUT_OF_MEMORY; + goto bad; + } + + pages_list = optee_allocate_pages_list(page_num); + if (!pages_list) { + arg->ret = TEEC_ERROR_OUT_OF_MEMORY; + goto bad; + } + + call_ctx->pages_list = pages_list; + call_ctx->num_entries = page_num; + + arg->params[0].attr = OPTEE_MSG_ATTR_TYPE_TMEM_OUTPUT | + OPTEE_MSG_ATTR_NONCONTIG; + /* + * In the least bits of u.tmem.buf_ptr we store buffer offset + * from 4k page, as described in OP-TEE ABI. + */ + arg->params[0].u.tmem.buf_ptr = virt_to_phys(pages_list) | + (tee_shm_get_page_offset(shm) & + (OPTEE_MSG_NONCONTIG_PAGE_SIZE - 1)); + arg->params[0].u.tmem.size = tee_shm_get_size(shm); + arg->params[0].u.tmem.shm_ref = (unsigned long)shm; + + optee_fill_pages_list(pages_list, pages, page_num, + tee_shm_get_page_offset(shm)); + } else { + arg->params[0].attr = OPTEE_MSG_ATTR_TYPE_TMEM_OUTPUT; + arg->params[0].u.tmem.buf_ptr = pa; + arg->params[0].u.tmem.size = sz; + arg->params[0].u.tmem.shm_ref = (unsigned long)shm; + } + + arg->ret = TEEC_SUCCESS; + return; +bad: + tee_shm_free(shm); +} + +static void cmd_free_suppl(struct tee_context *ctx, struct tee_shm *shm) +{ + struct tee_param param; + + param.attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT; + param.u.value.a = OPTEE_MSG_RPC_SHM_TYPE_APPL; + param.u.value.b = tee_shm_get_id(shm); + param.u.value.c = 0; + + /* + * Match the tee_shm_get_from_id() in cmd_alloc_suppl() as secure + * world has released its reference. + * + * It's better to do this before sending the request to supplicant + * as we'd like to let the process doing the initial allocation to + * do release the last reference too in order to avoid stacking + * many pending fput() on the client process. This could otherwise + * happen if secure world does many allocate and free in a single + * invoke. + */ + tee_shm_put(shm); + + optee_supp_thrd_req(ctx, OPTEE_MSG_RPC_CMD_SHM_FREE, 1, ¶m); +} + +static void handle_rpc_func_cmd_shm_free(struct tee_context *ctx, + struct optee_msg_arg *arg) +{ + struct tee_shm *shm; + + arg->ret_origin = TEEC_ORIGIN_COMMS; + + if (arg->num_params != 1 || + arg->params[0].attr != OPTEE_MSG_ATTR_TYPE_VALUE_INPUT) { + arg->ret = TEEC_ERROR_BAD_PARAMETERS; + return; + } + + shm = (struct tee_shm *)(unsigned long)arg->params[0].u.value.b; + switch (arg->params[0].u.value.a) { + case OPTEE_MSG_RPC_SHM_TYPE_APPL: + cmd_free_suppl(ctx, shm); + break; + case OPTEE_MSG_RPC_SHM_TYPE_KERNEL: + tee_shm_free(shm); + break; + default: + arg->ret = TEEC_ERROR_BAD_PARAMETERS; + } + arg->ret = TEEC_SUCCESS; +} + +static void free_pages_list(struct optee_call_ctx *call_ctx) +{ + if (call_ctx->pages_list) { + optee_free_pages_list(call_ctx->pages_list, + call_ctx->num_entries); + call_ctx->pages_list = NULL; + call_ctx->num_entries = 0; + } +} + +void optee_rpc_finalize_call(struct optee_call_ctx *call_ctx) +{ + free_pages_list(call_ctx); +} + +static void handle_rpc_func_cmd(struct tee_context *ctx, struct optee *optee, + struct tee_shm *shm, + struct optee_call_ctx *call_ctx) +{ + struct optee_msg_arg *arg; + + arg = tee_shm_get_va(shm, 0); + if (IS_ERR(arg)) { + pr_err("%s: tee_shm_get_va %p failed\n", __func__, shm); + return; + } + + switch (arg->cmd) { + case OPTEE_MSG_RPC_CMD_GET_TIME: + handle_rpc_func_cmd_get_time(arg); + break; + case OPTEE_MSG_RPC_CMD_WAIT_QUEUE: + handle_rpc_func_cmd_wq(optee, arg); + break; + case OPTEE_MSG_RPC_CMD_SUSPEND: + handle_rpc_func_cmd_wait(arg); + break; + case OPTEE_MSG_RPC_CMD_SHM_ALLOC: + free_pages_list(call_ctx); + handle_rpc_func_cmd_shm_alloc(ctx, arg, call_ctx); + break; + case OPTEE_MSG_RPC_CMD_SHM_FREE: + handle_rpc_func_cmd_shm_free(ctx, arg); + break; + default: + handle_rpc_supp_cmd(ctx, arg); + } +} + +/** + * optee_handle_rpc() - handle RPC from secure world + * @ctx: context doing the RPC + * @param: value of registers for the RPC + * @call_ctx: call context. Preserved during one OP-TEE invocation + * + * Result of RPC is written back into @param. + */ +void optee_handle_rpc(struct tee_context *ctx, struct optee_rpc_param *param, + struct optee_call_ctx *call_ctx) +{ + struct tee_device *teedev = ctx->teedev; + struct optee *optee = tee_get_drvdata(teedev); + struct tee_shm *shm; + phys_addr_t pa; + + switch (OPTEE_SMC_RETURN_GET_RPC_FUNC(param->a0)) { + case OPTEE_SMC_RPC_FUNC_ALLOC: + shm = tee_shm_alloc(ctx, param->a1, TEE_SHM_MAPPED); + if (!IS_ERR(shm) && !tee_shm_get_pa(shm, 0, &pa)) { + reg_pair_from_64(¶m->a1, ¶m->a2, pa); + reg_pair_from_64(¶m->a4, ¶m->a5, + (unsigned long)shm); + } else { + param->a1 = 0; + param->a2 = 0; + param->a4 = 0; + param->a5 = 0; + } + break; + case OPTEE_SMC_RPC_FUNC_FREE: + shm = reg_pair_to_ptr(param->a1, param->a2); + tee_shm_free(shm); + break; + case OPTEE_SMC_RPC_FUNC_FOREIGN_INTR: + /* + * A foreign interrupt was raised while secure world was + * executing, since they are handled in Linux a dummy RPC is + * performed to let Linux take the interrupt through the normal + * vector. + */ + break; + case OPTEE_SMC_RPC_FUNC_CMD: + shm = reg_pair_to_ptr(param->a1, param->a2); + handle_rpc_func_cmd(ctx, optee, shm, call_ctx); + break; + default: + pr_warn("Unknown RPC func 0x%x\n", + (u32)OPTEE_SMC_RETURN_GET_RPC_FUNC(param->a0)); + break; + } + + param->a0 = OPTEE_SMC_CALL_RETURN_FROM_RPC; +}
diff --git a/drivers/tee/optee/shm_pool.c b/drivers/tee/optee/shm_pool.c new file mode 100644 index 0000000..4939781 --- /dev/null +++ b/drivers/tee/optee/shm_pool.c
@@ -0,0 +1,75 @@ +/* + * Copyright (c) 2015, Linaro Limited + * Copyright (c) 2017, EPAM Systems + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ +#include <linux/device.h> +#include <linux/dma-buf.h> +#include <linux/genalloc.h> +#include <linux/slab.h> +#include <linux/tee_drv.h> +#include "optee_private.h" +#include "optee_smc.h" +#include "shm_pool.h" + +static int pool_op_alloc(struct tee_shm_pool_mgr *poolm, + struct tee_shm *shm, size_t size) +{ + unsigned int order = get_order(size); + struct page *page; + + page = alloc_pages(GFP_KERNEL | __GFP_ZERO, order); + if (!page) + return -ENOMEM; + + shm->kaddr = page_address(page); + shm->paddr = page_to_phys(page); + shm->size = PAGE_SIZE << order; + + return 0; +} + +static void pool_op_free(struct tee_shm_pool_mgr *poolm, + struct tee_shm *shm) +{ + free_pages((unsigned long)shm->kaddr, get_order(shm->size)); + shm->kaddr = NULL; +} + +static void pool_op_destroy_poolmgr(struct tee_shm_pool_mgr *poolm) +{ + kfree(poolm); +} + +static const struct tee_shm_pool_mgr_ops pool_ops = { + .alloc = pool_op_alloc, + .free = pool_op_free, + .destroy_poolmgr = pool_op_destroy_poolmgr, +}; + +/** + * optee_shm_pool_alloc_pages() - create page-based allocator pool + * + * This pool is used when OP-TEE supports dymanic SHM. In this case + * command buffers and such are allocated from kernel's own memory. + */ +struct tee_shm_pool_mgr *optee_shm_pool_alloc_pages(void) +{ + struct tee_shm_pool_mgr *mgr = kzalloc(sizeof(*mgr), GFP_KERNEL); + + if (!mgr) + return ERR_PTR(-ENOMEM); + + mgr->ops = &pool_ops; + + return mgr; +}
diff --git a/drivers/tee/optee/shm_pool.h b/drivers/tee/optee/shm_pool.h new file mode 100644 index 0000000..4e753c3 --- /dev/null +++ b/drivers/tee/optee/shm_pool.h
@@ -0,0 +1,23 @@ +/* + * Copyright (c) 2015, Linaro Limited + * Copyright (c) 2016, EPAM Systems + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#ifndef SHM_POOL_H +#define SHM_POOL_H + +#include <linux/tee_drv.h> + +struct tee_shm_pool_mgr *optee_shm_pool_alloc_pages(void); + +#endif
diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c new file mode 100644 index 0000000..df35fc0 --- /dev/null +++ b/drivers/tee/optee/supp.c
@@ -0,0 +1,382 @@ +/* + * Copyright (c) 2015, Linaro Limited + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ +#include <linux/device.h> +#include <linux/slab.h> +#include <linux/uaccess.h> +#include "optee_private.h" + +struct optee_supp_req { + struct list_head link; + + bool busy; + u32 func; + u32 ret; + size_t num_params; + struct tee_param *param; + + struct completion c; +}; + +void optee_supp_init(struct optee_supp *supp) +{ + memset(supp, 0, sizeof(*supp)); + mutex_init(&supp->mutex); + init_completion(&supp->reqs_c); + idr_init(&supp->idr); + INIT_LIST_HEAD(&supp->reqs); + supp->req_id = -1; +} + +void optee_supp_uninit(struct optee_supp *supp) +{ + mutex_destroy(&supp->mutex); + idr_destroy(&supp->idr); +} + +void optee_supp_release(struct optee_supp *supp) +{ + int id; + struct optee_supp_req *req; + struct optee_supp_req *req_tmp; + + mutex_lock(&supp->mutex); + + /* Abort all request retrieved by supplicant */ + idr_for_each_entry(&supp->idr, req, id) { + req->busy = false; + idr_remove(&supp->idr, id); + req->ret = TEEC_ERROR_COMMUNICATION; + complete(&req->c); + } + + /* Abort all queued requests */ + list_for_each_entry_safe(req, req_tmp, &supp->reqs, link) { + list_del(&req->link); + req->ret = TEEC_ERROR_COMMUNICATION; + complete(&req->c); + } + + supp->ctx = NULL; + supp->req_id = -1; + + mutex_unlock(&supp->mutex); +} + +/** + * optee_supp_thrd_req() - request service from supplicant + * @ctx: context doing the request + * @func: function requested + * @num_params: number of elements in @param array + * @param: parameters for function + * + * Returns result of operation to be passed to secure world + */ +u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params, + struct tee_param *param) + +{ + struct optee *optee = tee_get_drvdata(ctx->teedev); + struct optee_supp *supp = &optee->supp; + struct optee_supp_req *req = kzalloc(sizeof(*req), GFP_KERNEL); + bool interruptable; + u32 ret; + + if (!req) + return TEEC_ERROR_OUT_OF_MEMORY; + + init_completion(&req->c); + req->func = func; + req->num_params = num_params; + req->param = param; + + /* Insert the request in the request list */ + mutex_lock(&supp->mutex); + list_add_tail(&req->link, &supp->reqs); + mutex_unlock(&supp->mutex); + + /* Tell an eventual waiter there's a new request */ + complete(&supp->reqs_c); + + /* + * Wait for supplicant to process and return result, once we've + * returned from wait_for_completion(&req->c) successfully we have + * exclusive access again. + */ + while (wait_for_completion_interruptible(&req->c)) { + mutex_lock(&supp->mutex); + interruptable = !supp->ctx; + if (interruptable) { + /* + * There's no supplicant available and since the + * supp->mutex currently is held none can + * become available until the mutex released + * again. + * + * Interrupting an RPC to supplicant is only + * allowed as a way of slightly improving the user + * experience in case the supplicant hasn't been + * started yet. During normal operation the supplicant + * will serve all requests in a timely manner and + * interrupting then wouldn't make sense. + */ + interruptable = !req->busy; + if (!req->busy) + list_del(&req->link); + } + mutex_unlock(&supp->mutex); + + if (interruptable) { + req->ret = TEEC_ERROR_COMMUNICATION; + break; + } + } + + ret = req->ret; + kfree(req); + + return ret; +} + +static struct optee_supp_req *supp_pop_entry(struct optee_supp *supp, + int num_params, int *id) +{ + struct optee_supp_req *req; + + if (supp->req_id != -1) { + /* + * Supplicant should not mix synchronous and asnynchronous + * requests. + */ + return ERR_PTR(-EINVAL); + } + + if (list_empty(&supp->reqs)) + return NULL; + + req = list_first_entry(&supp->reqs, struct optee_supp_req, link); + + if (num_params < req->num_params) { + /* Not enough room for parameters */ + return ERR_PTR(-EINVAL); + } + + *id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL); + if (*id < 0) + return ERR_PTR(-ENOMEM); + + list_del(&req->link); + req->busy = true; + + return req; +} + +static int supp_check_recv_params(size_t num_params, struct tee_param *params, + size_t *num_meta) +{ + size_t n; + + if (!num_params) + return -EINVAL; + + /* + * If there's memrefs we need to decrease those as they where + * increased earlier and we'll even refuse to accept any below. + */ + for (n = 0; n < num_params; n++) + if (tee_param_is_memref(params + n) && params[n].u.memref.shm) + tee_shm_put(params[n].u.memref.shm); + + /* + * We only expect parameters as TEE_IOCTL_PARAM_ATTR_TYPE_NONE with + * or without the TEE_IOCTL_PARAM_ATTR_META bit set. + */ + for (n = 0; n < num_params; n++) + if (params[n].attr && + params[n].attr != TEE_IOCTL_PARAM_ATTR_META) + return -EINVAL; + + /* At most we'll need one meta parameter so no need to check for more */ + if (params->attr == TEE_IOCTL_PARAM_ATTR_META) + *num_meta = 1; + else + *num_meta = 0; + + return 0; +} + +/** + * optee_supp_recv() - receive request for supplicant + * @ctx: context receiving the request + * @func: requested function in supplicant + * @num_params: number of elements allocated in @param, updated with number + * used elements + * @param: space for parameters for @func + * + * Returns 0 on success or <0 on failure + */ +int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params, + struct tee_param *param) +{ + struct tee_device *teedev = ctx->teedev; + struct optee *optee = tee_get_drvdata(teedev); + struct optee_supp *supp = &optee->supp; + struct optee_supp_req *req = NULL; + int id; + size_t num_meta; + int rc; + + rc = supp_check_recv_params(*num_params, param, &num_meta); + if (rc) + return rc; + + while (true) { + mutex_lock(&supp->mutex); + req = supp_pop_entry(supp, *num_params - num_meta, &id); + mutex_unlock(&supp->mutex); + + if (req) { + if (IS_ERR(req)) + return PTR_ERR(req); + break; + } + + /* + * If we didn't get a request we'll block in + * wait_for_completion() to avoid needless spinning. + * + * This is where supplicant will be hanging most of + * the time, let's make this interruptable so we + * can easily restart supplicant if needed. + */ + if (wait_for_completion_interruptible(&supp->reqs_c)) + return -ERESTARTSYS; + } + + if (num_meta) { + /* + * tee-supplicant support meta parameters -> requsts can be + * processed asynchronously. + */ + param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT | + TEE_IOCTL_PARAM_ATTR_META; + param->u.value.a = id; + param->u.value.b = 0; + param->u.value.c = 0; + } else { + mutex_lock(&supp->mutex); + supp->req_id = id; + mutex_unlock(&supp->mutex); + } + + *func = req->func; + *num_params = req->num_params + num_meta; + memcpy(param + num_meta, req->param, + sizeof(struct tee_param) * req->num_params); + + return 0; +} + +static struct optee_supp_req *supp_pop_req(struct optee_supp *supp, + size_t num_params, + struct tee_param *param, + size_t *num_meta) +{ + struct optee_supp_req *req; + int id; + size_t nm; + const u32 attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT | + TEE_IOCTL_PARAM_ATTR_META; + + if (!num_params) + return ERR_PTR(-EINVAL); + + if (supp->req_id == -1) { + if (param->attr != attr) + return ERR_PTR(-EINVAL); + id = param->u.value.a; + nm = 1; + } else { + id = supp->req_id; + nm = 0; + } + + req = idr_find(&supp->idr, id); + if (!req) + return ERR_PTR(-ENOENT); + + if ((num_params - nm) != req->num_params) + return ERR_PTR(-EINVAL); + + req->busy = false; + idr_remove(&supp->idr, id); + supp->req_id = -1; + *num_meta = nm; + + return req; +} + +/** + * optee_supp_send() - send result of request from supplicant + * @ctx: context sending result + * @ret: return value of request + * @num_params: number of parameters returned + * @param: returned parameters + * + * Returns 0 on success or <0 on failure. + */ +int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params, + struct tee_param *param) +{ + struct tee_device *teedev = ctx->teedev; + struct optee *optee = tee_get_drvdata(teedev); + struct optee_supp *supp = &optee->supp; + struct optee_supp_req *req; + size_t n; + size_t num_meta; + + mutex_lock(&supp->mutex); + req = supp_pop_req(supp, num_params, param, &num_meta); + mutex_unlock(&supp->mutex); + + if (IS_ERR(req)) { + /* Something is wrong, let supplicant restart. */ + return PTR_ERR(req); + } + + /* Update out and in/out parameters */ + for (n = 0; n < req->num_params; n++) { + struct tee_param *p = req->param + n; + + switch (p->attr & TEE_IOCTL_PARAM_ATTR_TYPE_MASK) { + case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_OUTPUT: + case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT: + p->u.value.a = param[n + num_meta].u.value.a; + p->u.value.b = param[n + num_meta].u.value.b; + p->u.value.c = param[n + num_meta].u.value.c; + break; + case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT: + case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT: + p->u.memref.size = param[n + num_meta].u.memref.size; + break; + default: + break; + } + } + req->ret = ret; + + /* Let the requesting thread continue */ + complete(&req->c); + + return 0; +}
diff --git a/drivers/tee/tee_core.c b/drivers/tee/tee_core.c new file mode 100644 index 0000000..6c4b200 --- /dev/null +++ b/drivers/tee/tee_core.c
@@ -0,0 +1,949 @@ +/* + * Copyright (c) 2015-2016, Linaro Limited + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#define pr_fmt(fmt) "%s: " fmt, __func__ + +#include <linux/cdev.h> +#include <linux/device.h> +#include <linux/fs.h> +#include <linux/idr.h> +#include <linux/module.h> +#include <linux/slab.h> +#include <linux/tee_drv.h> +#include <linux/uaccess.h> +#include "tee_private.h" + +#define TEE_NUM_DEVICES 32 + +#define TEE_IOCTL_PARAM_SIZE(x) (sizeof(struct tee_param) * (x)) + +/* + * Unprivileged devices in the lower half range and privileged devices in + * the upper half range. + */ +static DECLARE_BITMAP(dev_mask, TEE_NUM_DEVICES); +static DEFINE_SPINLOCK(driver_lock); + +static struct class *tee_class; +static dev_t tee_devt; + +static int tee_open(struct inode *inode, struct file *filp) +{ + int rc; + struct tee_device *teedev; + struct tee_context *ctx; + + teedev = container_of(inode->i_cdev, struct tee_device, cdev); + if (!tee_device_get(teedev)) + return -EINVAL; + + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + if (!ctx) { + rc = -ENOMEM; + goto err; + } + + kref_init(&ctx->refcount); + ctx->teedev = teedev; + INIT_LIST_HEAD(&ctx->list_shm); + filp->private_data = ctx; + rc = teedev->desc->ops->open(ctx); + if (rc) + goto err; + + return 0; +err: + kfree(ctx); + tee_device_put(teedev); + return rc; +} + +void teedev_ctx_get(struct tee_context *ctx) +{ + if (ctx->releasing) + return; + + kref_get(&ctx->refcount); +} + +static void teedev_ctx_release(struct kref *ref) +{ + struct tee_context *ctx = container_of(ref, struct tee_context, + refcount); + ctx->releasing = true; + ctx->teedev->desc->ops->release(ctx); + kfree(ctx); +} + +void teedev_ctx_put(struct tee_context *ctx) +{ + if (ctx->releasing) + return; + + kref_put(&ctx->refcount, teedev_ctx_release); +} + +static void teedev_close_context(struct tee_context *ctx) +{ + tee_device_put(ctx->teedev); + teedev_ctx_put(ctx); +} + +static int tee_release(struct inode *inode, struct file *filp) +{ + teedev_close_context(filp->private_data); + return 0; +} + +static int tee_ioctl_version(struct tee_context *ctx, + struct tee_ioctl_version_data __user *uvers) +{ + struct tee_ioctl_version_data vers; + + ctx->teedev->desc->ops->get_version(ctx->teedev, &vers); + + if (ctx->teedev->desc->flags & TEE_DESC_PRIVILEGED) + vers.gen_caps |= TEE_GEN_CAP_PRIVILEGED; + + if (copy_to_user(uvers, &vers, sizeof(vers))) + return -EFAULT; + + return 0; +} + +static int tee_ioctl_shm_alloc(struct tee_context *ctx, + struct tee_ioctl_shm_alloc_data __user *udata) +{ + long ret; + struct tee_ioctl_shm_alloc_data data; + struct tee_shm *shm; + + if (copy_from_user(&data, udata, sizeof(data))) + return -EFAULT; + + /* Currently no input flags are supported */ + if (data.flags) + return -EINVAL; + + shm = tee_shm_alloc(ctx, data.size, TEE_SHM_MAPPED | TEE_SHM_DMA_BUF); + if (IS_ERR(shm)) + return PTR_ERR(shm); + + data.id = shm->id; + data.flags = shm->flags; + data.size = shm->size; + + if (copy_to_user(udata, &data, sizeof(data))) + ret = -EFAULT; + else + ret = tee_shm_get_fd(shm); + + /* + * When user space closes the file descriptor the shared memory + * should be freed or if tee_shm_get_fd() failed then it will + * be freed immediately. + */ + tee_shm_put(shm); + return ret; +} + +static int +tee_ioctl_shm_register(struct tee_context *ctx, + struct tee_ioctl_shm_register_data __user *udata) +{ + long ret; + struct tee_ioctl_shm_register_data data; + struct tee_shm *shm; + + if (copy_from_user(&data, udata, sizeof(data))) + return -EFAULT; + + /* Currently no input flags are supported */ + if (data.flags) + return -EINVAL; + + shm = tee_shm_register(ctx, data.addr, data.length, + TEE_SHM_DMA_BUF | TEE_SHM_USER_MAPPED); + if (IS_ERR(shm)) + return PTR_ERR(shm); + + data.id = shm->id; + data.flags = shm->flags; + data.length = shm->size; + + if (copy_to_user(udata, &data, sizeof(data))) + ret = -EFAULT; + else + ret = tee_shm_get_fd(shm); + /* + * When user space closes the file descriptor the shared memory + * should be freed or if tee_shm_get_fd() failed then it will + * be freed immediately. + */ + tee_shm_put(shm); + return ret; +} + +static int params_from_user(struct tee_context *ctx, struct tee_param *params, + size_t num_params, + struct tee_ioctl_param __user *uparams) +{ + size_t n; + + for (n = 0; n < num_params; n++) { + struct tee_shm *shm; + struct tee_ioctl_param ip; + + if (copy_from_user(&ip, uparams + n, sizeof(ip))) + return -EFAULT; + + /* All unused attribute bits has to be zero */ + if (ip.attr & ~TEE_IOCTL_PARAM_ATTR_MASK) + return -EINVAL; + + params[n].attr = ip.attr; + switch (ip.attr & TEE_IOCTL_PARAM_ATTR_TYPE_MASK) { + case TEE_IOCTL_PARAM_ATTR_TYPE_NONE: + case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_OUTPUT: + break; + case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT: + case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT: + params[n].u.value.a = ip.a; + params[n].u.value.b = ip.b; + params[n].u.value.c = ip.c; + break; + case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT: + case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT: + case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT: + /* + * If we fail to get a pointer to a shared memory + * object (and increase the ref count) from an + * identifier we return an error. All pointers that + * has been added in params have an increased ref + * count. It's the callers responibility to do + * tee_shm_put() on all resolved pointers. + */ + shm = tee_shm_get_from_id(ctx, ip.c); + if (IS_ERR(shm)) + return PTR_ERR(shm); + + params[n].u.memref.shm_offs = ip.a; + params[n].u.memref.size = ip.b; + params[n].u.memref.shm = shm; + break; + default: + /* Unknown attribute */ + return -EINVAL; + } + } + return 0; +} + +static int params_to_user(struct tee_ioctl_param __user *uparams, + size_t num_params, struct tee_param *params) +{ + size_t n; + + for (n = 0; n < num_params; n++) { + struct tee_ioctl_param __user *up = uparams + n; + struct tee_param *p = params + n; + + switch (p->attr) { + case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_OUTPUT: + case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT: + if (put_user(p->u.value.a, &up->a) || + put_user(p->u.value.b, &up->b) || + put_user(p->u.value.c, &up->c)) + return -EFAULT; + break; + case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT: + case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT: + if (put_user((u64)p->u.memref.size, &up->b)) + return -EFAULT; + default: + break; + } + } + return 0; +} + +static int tee_ioctl_open_session(struct tee_context *ctx, + struct tee_ioctl_buf_data __user *ubuf) +{ + int rc; + size_t n; + struct tee_ioctl_buf_data buf; + struct tee_ioctl_open_session_arg __user *uarg; + struct tee_ioctl_open_session_arg arg; + struct tee_ioctl_param __user *uparams = NULL; + struct tee_param *params = NULL; + bool have_session = false; + + if (!ctx->teedev->desc->ops->open_session) + return -EINVAL; + + if (copy_from_user(&buf, ubuf, sizeof(buf))) + return -EFAULT; + + if (buf.buf_len > TEE_MAX_ARG_SIZE || + buf.buf_len < sizeof(struct tee_ioctl_open_session_arg)) + return -EINVAL; + + uarg = u64_to_user_ptr(buf.buf_ptr); + if (copy_from_user(&arg, uarg, sizeof(arg))) + return -EFAULT; + + if (sizeof(arg) + TEE_IOCTL_PARAM_SIZE(arg.num_params) != buf.buf_len) + return -EINVAL; + + if (arg.num_params) { + params = kcalloc(arg.num_params, sizeof(struct tee_param), + GFP_KERNEL); + if (!params) + return -ENOMEM; + uparams = uarg->params; + rc = params_from_user(ctx, params, arg.num_params, uparams); + if (rc) + goto out; + } + + rc = ctx->teedev->desc->ops->open_session(ctx, &arg, params); + if (rc) + goto out; + have_session = true; + + if (put_user(arg.session, &uarg->session) || + put_user(arg.ret, &uarg->ret) || + put_user(arg.ret_origin, &uarg->ret_origin)) { + rc = -EFAULT; + goto out; + } + rc = params_to_user(uparams, arg.num_params, params); +out: + /* + * If we've succeeded to open the session but failed to communicate + * it back to user space, close the session again to avoid leakage. + */ + if (rc && have_session && ctx->teedev->desc->ops->close_session) + ctx->teedev->desc->ops->close_session(ctx, arg.session); + + if (params) { + /* Decrease ref count for all valid shared memory pointers */ + for (n = 0; n < arg.num_params; n++) + if (tee_param_is_memref(params + n) && + params[n].u.memref.shm) + tee_shm_put(params[n].u.memref.shm); + kfree(params); + } + + return rc; +} + +static int tee_ioctl_invoke(struct tee_context *ctx, + struct tee_ioctl_buf_data __user *ubuf) +{ + int rc; + size_t n; + struct tee_ioctl_buf_data buf; + struct tee_ioctl_invoke_arg __user *uarg; + struct tee_ioctl_invoke_arg arg; + struct tee_ioctl_param __user *uparams = NULL; + struct tee_param *params = NULL; + + if (!ctx->teedev->desc->ops->invoke_func) + return -EINVAL; + + if (copy_from_user(&buf, ubuf, sizeof(buf))) + return -EFAULT; + + if (buf.buf_len > TEE_MAX_ARG_SIZE || + buf.buf_len < sizeof(struct tee_ioctl_invoke_arg)) + return -EINVAL; + + uarg = u64_to_user_ptr(buf.buf_ptr); + if (copy_from_user(&arg, uarg, sizeof(arg))) + return -EFAULT; + + if (sizeof(arg) + TEE_IOCTL_PARAM_SIZE(arg.num_params) != buf.buf_len) + return -EINVAL; + + if (arg.num_params) { + params = kcalloc(arg.num_params, sizeof(struct tee_param), + GFP_KERNEL); + if (!params) + return -ENOMEM; + uparams = uarg->params; + rc = params_from_user(ctx, params, arg.num_params, uparams); + if (rc) + goto out; + } + + rc = ctx->teedev->desc->ops->invoke_func(ctx, &arg, params); + if (rc) + goto out; + + if (put_user(arg.ret, &uarg->ret) || + put_user(arg.ret_origin, &uarg->ret_origin)) { + rc = -EFAULT; + goto out; + } + rc = params_to_user(uparams, arg.num_params, params); +out: + if (params) { + /* Decrease ref count for all valid shared memory pointers */ + for (n = 0; n < arg.num_params; n++) + if (tee_param_is_memref(params + n) && + params[n].u.memref.shm) + tee_shm_put(params[n].u.memref.shm); + kfree(params); + } + return rc; +} + +static int tee_ioctl_cancel(struct tee_context *ctx, + struct tee_ioctl_cancel_arg __user *uarg) +{ + struct tee_ioctl_cancel_arg arg; + + if (!ctx->teedev->desc->ops->cancel_req) + return -EINVAL; + + if (copy_from_user(&arg, uarg, sizeof(arg))) + return -EFAULT; + + return ctx->teedev->desc->ops->cancel_req(ctx, arg.cancel_id, + arg.session); +} + +static int +tee_ioctl_close_session(struct tee_context *ctx, + struct tee_ioctl_close_session_arg __user *uarg) +{ + struct tee_ioctl_close_session_arg arg; + + if (!ctx->teedev->desc->ops->close_session) + return -EINVAL; + + if (copy_from_user(&arg, uarg, sizeof(arg))) + return -EFAULT; + + return ctx->teedev->desc->ops->close_session(ctx, arg.session); +} + +static int params_to_supp(struct tee_context *ctx, + struct tee_ioctl_param __user *uparams, + size_t num_params, struct tee_param *params) +{ + size_t n; + + for (n = 0; n < num_params; n++) { + struct tee_ioctl_param ip; + struct tee_param *p = params + n; + + ip.attr = p->attr; + switch (p->attr & TEE_IOCTL_PARAM_ATTR_TYPE_MASK) { + case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT: + case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT: + ip.a = p->u.value.a; + ip.b = p->u.value.b; + ip.c = p->u.value.c; + break; + case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT: + case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT: + case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT: + ip.b = p->u.memref.size; + if (!p->u.memref.shm) { + ip.a = 0; + ip.c = (u64)-1; /* invalid shm id */ + break; + } + ip.a = p->u.memref.shm_offs; + ip.c = p->u.memref.shm->id; + break; + default: + ip.a = 0; + ip.b = 0; + ip.c = 0; + break; + } + + if (copy_to_user(uparams + n, &ip, sizeof(ip))) + return -EFAULT; + } + + return 0; +} + +static int tee_ioctl_supp_recv(struct tee_context *ctx, + struct tee_ioctl_buf_data __user *ubuf) +{ + int rc; + struct tee_ioctl_buf_data buf; + struct tee_iocl_supp_recv_arg __user *uarg; + struct tee_param *params; + u32 num_params; + u32 func; + + if (!ctx->teedev->desc->ops->supp_recv) + return -EINVAL; + + if (copy_from_user(&buf, ubuf, sizeof(buf))) + return -EFAULT; + + if (buf.buf_len > TEE_MAX_ARG_SIZE || + buf.buf_len < sizeof(struct tee_iocl_supp_recv_arg)) + return -EINVAL; + + uarg = u64_to_user_ptr(buf.buf_ptr); + if (get_user(num_params, &uarg->num_params)) + return -EFAULT; + + if (sizeof(*uarg) + TEE_IOCTL_PARAM_SIZE(num_params) != buf.buf_len) + return -EINVAL; + + params = kcalloc(num_params, sizeof(struct tee_param), GFP_KERNEL); + if (!params) + return -ENOMEM; + + rc = params_from_user(ctx, params, num_params, uarg->params); + if (rc) + goto out; + + rc = ctx->teedev->desc->ops->supp_recv(ctx, &func, &num_params, params); + if (rc) + goto out; + + if (put_user(func, &uarg->func) || + put_user(num_params, &uarg->num_params)) { + rc = -EFAULT; + goto out; + } + + rc = params_to_supp(ctx, uarg->params, num_params, params); +out: + kfree(params); + return rc; +} + +static int params_from_supp(struct tee_param *params, size_t num_params, + struct tee_ioctl_param __user *uparams) +{ + size_t n; + + for (n = 0; n < num_params; n++) { + struct tee_param *p = params + n; + struct tee_ioctl_param ip; + + if (copy_from_user(&ip, uparams + n, sizeof(ip))) + return -EFAULT; + + /* All unused attribute bits has to be zero */ + if (ip.attr & ~TEE_IOCTL_PARAM_ATTR_MASK) + return -EINVAL; + + p->attr = ip.attr; + switch (ip.attr & TEE_IOCTL_PARAM_ATTR_TYPE_MASK) { + case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_OUTPUT: + case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT: + /* Only out and in/out values can be updated */ + p->u.value.a = ip.a; + p->u.value.b = ip.b; + p->u.value.c = ip.c; + break; + case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT: + case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT: + /* + * Only the size of the memref can be updated. + * Since we don't have access to the original + * parameters here, only store the supplied size. + * The driver will copy the updated size into the + * original parameters. + */ + p->u.memref.shm = NULL; + p->u.memref.shm_offs = 0; + p->u.memref.size = ip.b; + break; + default: + memset(&p->u, 0, sizeof(p->u)); + break; + } + } + return 0; +} + +static int tee_ioctl_supp_send(struct tee_context *ctx, + struct tee_ioctl_buf_data __user *ubuf) +{ + long rc; + struct tee_ioctl_buf_data buf; + struct tee_iocl_supp_send_arg __user *uarg; + struct tee_param *params; + u32 num_params; + u32 ret; + + /* Not valid for this driver */ + if (!ctx->teedev->desc->ops->supp_send) + return -EINVAL; + + if (copy_from_user(&buf, ubuf, sizeof(buf))) + return -EFAULT; + + if (buf.buf_len > TEE_MAX_ARG_SIZE || + buf.buf_len < sizeof(struct tee_iocl_supp_send_arg)) + return -EINVAL; + + uarg = u64_to_user_ptr(buf.buf_ptr); + if (get_user(ret, &uarg->ret) || + get_user(num_params, &uarg->num_params)) + return -EFAULT; + + if (sizeof(*uarg) + TEE_IOCTL_PARAM_SIZE(num_params) > buf.buf_len) + return -EINVAL; + + params = kcalloc(num_params, sizeof(struct tee_param), GFP_KERNEL); + if (!params) + return -ENOMEM; + + rc = params_from_supp(params, num_params, uarg->params); + if (rc) + goto out; + + rc = ctx->teedev->desc->ops->supp_send(ctx, ret, num_params, params); +out: + kfree(params); + return rc; +} + +static long tee_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) +{ + struct tee_context *ctx = filp->private_data; + void __user *uarg = (void __user *)arg; + + switch (cmd) { + case TEE_IOC_VERSION: + return tee_ioctl_version(ctx, uarg); + case TEE_IOC_SHM_ALLOC: + return tee_ioctl_shm_alloc(ctx, uarg); + case TEE_IOC_SHM_REGISTER: + return tee_ioctl_shm_register(ctx, uarg); + case TEE_IOC_OPEN_SESSION: + return tee_ioctl_open_session(ctx, uarg); + case TEE_IOC_INVOKE: + return tee_ioctl_invoke(ctx, uarg); + case TEE_IOC_CANCEL: + return tee_ioctl_cancel(ctx, uarg); + case TEE_IOC_CLOSE_SESSION: + return tee_ioctl_close_session(ctx, uarg); + case TEE_IOC_SUPPL_RECV: + return tee_ioctl_supp_recv(ctx, uarg); + case TEE_IOC_SUPPL_SEND: + return tee_ioctl_supp_send(ctx, uarg); + default: + return -EINVAL; + } +} + +static const struct file_operations tee_fops = { + .owner = THIS_MODULE, + .open = tee_open, + .release = tee_release, + .unlocked_ioctl = tee_ioctl, + .compat_ioctl = tee_ioctl, +}; + +static void tee_release_device(struct device *dev) +{ + struct tee_device *teedev = container_of(dev, struct tee_device, dev); + + spin_lock(&driver_lock); + clear_bit(teedev->id, dev_mask); + spin_unlock(&driver_lock); + mutex_destroy(&teedev->mutex); + idr_destroy(&teedev->idr); + kfree(teedev); +} + +/** + * tee_device_alloc() - Allocate a new struct tee_device instance + * @teedesc: Descriptor for this driver + * @dev: Parent device for this device + * @pool: Shared memory pool, NULL if not used + * @driver_data: Private driver data for this device + * + * Allocates a new struct tee_device instance. The device is + * removed by tee_device_unregister(). + * + * @returns a pointer to a 'struct tee_device' or an ERR_PTR on failure + */ +struct tee_device *tee_device_alloc(const struct tee_desc *teedesc, + struct device *dev, + struct tee_shm_pool *pool, + void *driver_data) +{ + struct tee_device *teedev; + void *ret; + int rc; + int offs = 0; + + if (!teedesc || !teedesc->name || !teedesc->ops || + !teedesc->ops->get_version || !teedesc->ops->open || + !teedesc->ops->release || !pool) + return ERR_PTR(-EINVAL); + + teedev = kzalloc(sizeof(*teedev), GFP_KERNEL); + if (!teedev) { + ret = ERR_PTR(-ENOMEM); + goto err; + } + + if (teedesc->flags & TEE_DESC_PRIVILEGED) + offs = TEE_NUM_DEVICES / 2; + + spin_lock(&driver_lock); + teedev->id = find_next_zero_bit(dev_mask, TEE_NUM_DEVICES, offs); + if (teedev->id < TEE_NUM_DEVICES) + set_bit(teedev->id, dev_mask); + spin_unlock(&driver_lock); + + if (teedev->id >= TEE_NUM_DEVICES) { + ret = ERR_PTR(-ENOMEM); + goto err; + } + + snprintf(teedev->name, sizeof(teedev->name), "tee%s%d", + teedesc->flags & TEE_DESC_PRIVILEGED ? "priv" : "", + teedev->id - offs); + + teedev->dev.class = tee_class; + teedev->dev.release = tee_release_device; + teedev->dev.parent = dev; + + teedev->dev.devt = MKDEV(MAJOR(tee_devt), teedev->id); + + rc = dev_set_name(&teedev->dev, "%s", teedev->name); + if (rc) { + ret = ERR_PTR(rc); + goto err_devt; + } + + cdev_init(&teedev->cdev, &tee_fops); + teedev->cdev.owner = teedesc->owner; + teedev->cdev.kobj.parent = &teedev->dev.kobj; + + dev_set_drvdata(&teedev->dev, driver_data); + device_initialize(&teedev->dev); + + /* 1 as tee_device_unregister() does one final tee_device_put() */ + teedev->num_users = 1; + init_completion(&teedev->c_no_users); + mutex_init(&teedev->mutex); + idr_init(&teedev->idr); + + teedev->desc = teedesc; + teedev->pool = pool; + + return teedev; +err_devt: + unregister_chrdev_region(teedev->dev.devt, 1); +err: + pr_err("could not register %s driver\n", + teedesc->flags & TEE_DESC_PRIVILEGED ? "privileged" : "client"); + if (teedev && teedev->id < TEE_NUM_DEVICES) { + spin_lock(&driver_lock); + clear_bit(teedev->id, dev_mask); + spin_unlock(&driver_lock); + } + kfree(teedev); + return ret; +} +EXPORT_SYMBOL_GPL(tee_device_alloc); + +static ssize_t implementation_id_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct tee_device *teedev = container_of(dev, struct tee_device, dev); + struct tee_ioctl_version_data vers; + + teedev->desc->ops->get_version(teedev, &vers); + return scnprintf(buf, PAGE_SIZE, "%d\n", vers.impl_id); +} +static DEVICE_ATTR_RO(implementation_id); + +static struct attribute *tee_dev_attrs[] = { + &dev_attr_implementation_id.attr, + NULL +}; + +static const struct attribute_group tee_dev_group = { + .attrs = tee_dev_attrs, +}; + +/** + * tee_device_register() - Registers a TEE device + * @teedev: Device to register + * + * tee_device_unregister() need to be called to remove the @teedev if + * this function fails. + * + * @returns < 0 on failure + */ +int tee_device_register(struct tee_device *teedev) +{ + int rc; + + if (teedev->flags & TEE_DEVICE_FLAG_REGISTERED) { + dev_err(&teedev->dev, "attempt to register twice\n"); + return -EINVAL; + } + + rc = cdev_add(&teedev->cdev, teedev->dev.devt, 1); + if (rc) { + dev_err(&teedev->dev, + "unable to cdev_add() %s, major %d, minor %d, err=%d\n", + teedev->name, MAJOR(teedev->dev.devt), + MINOR(teedev->dev.devt), rc); + return rc; + } + + rc = device_add(&teedev->dev); + if (rc) { + dev_err(&teedev->dev, + "unable to device_add() %s, major %d, minor %d, err=%d\n", + teedev->name, MAJOR(teedev->dev.devt), + MINOR(teedev->dev.devt), rc); + goto err_device_add; + } + + rc = sysfs_create_group(&teedev->dev.kobj, &tee_dev_group); + if (rc) { + dev_err(&teedev->dev, + "failed to create sysfs attributes, err=%d\n", rc); + goto err_sysfs_create_group; + } + + teedev->flags |= TEE_DEVICE_FLAG_REGISTERED; + return 0; + +err_sysfs_create_group: + device_del(&teedev->dev); +err_device_add: + cdev_del(&teedev->cdev); + return rc; +} +EXPORT_SYMBOL_GPL(tee_device_register); + +void tee_device_put(struct tee_device *teedev) +{ + mutex_lock(&teedev->mutex); + /* Shouldn't put in this state */ + if (!WARN_ON(!teedev->desc)) { + teedev->num_users--; + if (!teedev->num_users) { + teedev->desc = NULL; + complete(&teedev->c_no_users); + } + } + mutex_unlock(&teedev->mutex); +} + +bool tee_device_get(struct tee_device *teedev) +{ + mutex_lock(&teedev->mutex); + if (!teedev->desc) { + mutex_unlock(&teedev->mutex); + return false; + } + teedev->num_users++; + mutex_unlock(&teedev->mutex); + return true; +} + +/** + * tee_device_unregister() - Removes a TEE device + * @teedev: Device to unregister + * + * This function should be called to remove the @teedev even if + * tee_device_register() hasn't been called yet. Does nothing if + * @teedev is NULL. + */ +void tee_device_unregister(struct tee_device *teedev) +{ + if (!teedev) + return; + + if (teedev->flags & TEE_DEVICE_FLAG_REGISTERED) { + sysfs_remove_group(&teedev->dev.kobj, &tee_dev_group); + cdev_del(&teedev->cdev); + device_del(&teedev->dev); + } + + tee_device_put(teedev); + wait_for_completion(&teedev->c_no_users); + + /* + * No need to take a mutex any longer now since teedev->desc was + * set to NULL before teedev->c_no_users was completed. + */ + + teedev->pool = NULL; + + put_device(&teedev->dev); +} +EXPORT_SYMBOL_GPL(tee_device_unregister); + +/** + * tee_get_drvdata() - Return driver_data pointer + * @teedev: Device containing the driver_data pointer + * @returns the driver_data pointer supplied to tee_register(). + */ +void *tee_get_drvdata(struct tee_device *teedev) +{ + return dev_get_drvdata(&teedev->dev); +} +EXPORT_SYMBOL_GPL(tee_get_drvdata); + +static int __init tee_init(void) +{ + int rc; + + tee_class = class_create(THIS_MODULE, "tee"); + if (IS_ERR(tee_class)) { + pr_err("couldn't create class\n"); + return PTR_ERR(tee_class); + } + + rc = alloc_chrdev_region(&tee_devt, 0, TEE_NUM_DEVICES, "tee"); + if (rc) { + pr_err("failed to allocate char dev region\n"); + class_destroy(tee_class); + tee_class = NULL; + } + + return rc; +} + +static void __exit tee_exit(void) +{ + class_destroy(tee_class); + tee_class = NULL; + unregister_chrdev_region(tee_devt, TEE_NUM_DEVICES); +} + +subsys_initcall(tee_init); +module_exit(tee_exit); + +MODULE_AUTHOR("Linaro"); +MODULE_DESCRIPTION("TEE Driver"); +MODULE_VERSION("1.0"); +MODULE_LICENSE("GPL v2");
diff --git a/drivers/tee/tee_private.h b/drivers/tee/tee_private.h new file mode 100644 index 0000000..85d99d6 --- /dev/null +++ b/drivers/tee/tee_private.h
@@ -0,0 +1,79 @@ +/* + * Copyright (c) 2015-2016, Linaro Limited + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ +#ifndef TEE_PRIVATE_H +#define TEE_PRIVATE_H + +#include <linux/cdev.h> +#include <linux/completion.h> +#include <linux/device.h> +#include <linux/kref.h> +#include <linux/mutex.h> +#include <linux/types.h> + +/** + * struct tee_shm_pool - shared memory pool + * @private_mgr: pool manager for shared memory only between kernel + * and secure world + * @dma_buf_mgr: pool manager for shared memory exported to user space + */ +struct tee_shm_pool { + struct tee_shm_pool_mgr *private_mgr; + struct tee_shm_pool_mgr *dma_buf_mgr; +}; + +#define TEE_DEVICE_FLAG_REGISTERED 0x1 +#define TEE_MAX_DEV_NAME_LEN 32 + +/** + * struct tee_device - TEE Device representation + * @name: name of device + * @desc: description of device + * @id: unique id of device + * @flags: represented by TEE_DEVICE_FLAG_REGISTERED above + * @dev: embedded basic device structure + * @cdev: embedded cdev + * @num_users: number of active users of this device + * @c_no_user: completion used when unregistering the device + * @mutex: mutex protecting @num_users and @idr + * @idr: register of shared memory object allocated on this device + * @pool: shared memory pool + */ +struct tee_device { + char name[TEE_MAX_DEV_NAME_LEN]; + const struct tee_desc *desc; + int id; + unsigned int flags; + + struct device dev; + struct cdev cdev; + + size_t num_users; + struct completion c_no_users; + struct mutex mutex; /* protects num_users and idr */ + + struct idr idr; + struct tee_shm_pool *pool; +}; + +int tee_shm_init(void); + +int tee_shm_get_fd(struct tee_shm *shm); + +bool tee_device_get(struct tee_device *teedev); +void tee_device_put(struct tee_device *teedev); + +void teedev_ctx_get(struct tee_context *ctx); +void teedev_ctx_put(struct tee_context *ctx); + +#endif /*TEE_PRIVATE_H*/
diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c new file mode 100644 index 0000000..ed2d71c --- /dev/null +++ b/drivers/tee/tee_shm.c
@@ -0,0 +1,510 @@ +/* + * Copyright (c) 2015-2016, Linaro Limited + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ +#include <linux/device.h> +#include <linux/dma-buf.h> +#include <linux/fdtable.h> +#include <linux/idr.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/tee_drv.h> +#include "tee_private.h" + +static void tee_shm_release(struct tee_shm *shm) +{ + struct tee_device *teedev = shm->teedev; + + mutex_lock(&teedev->mutex); + idr_remove(&teedev->idr, shm->id); + if (shm->ctx) + list_del(&shm->link); + mutex_unlock(&teedev->mutex); + + if (shm->flags & TEE_SHM_POOL) { + struct tee_shm_pool_mgr *poolm; + + if (shm->flags & TEE_SHM_DMA_BUF) + poolm = teedev->pool->dma_buf_mgr; + else + poolm = teedev->pool->private_mgr; + + poolm->ops->free(poolm, shm); + } else if (shm->flags & TEE_SHM_REGISTER) { + size_t n; + int rc = teedev->desc->ops->shm_unregister(shm->ctx, shm); + + if (rc) + dev_err(teedev->dev.parent, + "unregister shm %p failed: %d", shm, rc); + + for (n = 0; n < shm->num_pages; n++) + put_page(shm->pages[n]); + + kfree(shm->pages); + } + + if (shm->ctx) + teedev_ctx_put(shm->ctx); + + kfree(shm); + + tee_device_put(teedev); +} + +static struct sg_table *tee_shm_op_map_dma_buf(struct dma_buf_attachment + *attach, enum dma_data_direction dir) +{ + return NULL; +} + +static void tee_shm_op_unmap_dma_buf(struct dma_buf_attachment *attach, + struct sg_table *table, + enum dma_data_direction dir) +{ +} + +static void tee_shm_op_release(struct dma_buf *dmabuf) +{ + struct tee_shm *shm = dmabuf->priv; + + tee_shm_release(shm); +} + +static void *tee_shm_op_kmap_atomic(struct dma_buf *dmabuf, unsigned long pgnum) +{ + return NULL; +} + +static void *tee_shm_op_kmap(struct dma_buf *dmabuf, unsigned long pgnum) +{ + return NULL; +} + +static int tee_shm_op_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma) +{ + struct tee_shm *shm = dmabuf->priv; + size_t size = vma->vm_end - vma->vm_start; + + /* Refuse sharing shared memory provided by application */ + if (shm->flags & TEE_SHM_REGISTER) + return -EINVAL; + + return remap_pfn_range(vma, vma->vm_start, shm->paddr >> PAGE_SHIFT, + size, vma->vm_page_prot); +} + +static const struct dma_buf_ops tee_shm_dma_buf_ops = { + .map_dma_buf = tee_shm_op_map_dma_buf, + .unmap_dma_buf = tee_shm_op_unmap_dma_buf, + .release = tee_shm_op_release, + .kmap_atomic = tee_shm_op_kmap_atomic, + .kmap = tee_shm_op_kmap, + .mmap = tee_shm_op_mmap, +}; + +static struct tee_shm *__tee_shm_alloc(struct tee_context *ctx, + struct tee_device *teedev, + size_t size, u32 flags) +{ + struct tee_shm_pool_mgr *poolm = NULL; + struct tee_shm *shm; + void *ret; + int rc; + + if (ctx && ctx->teedev != teedev) { + dev_err(teedev->dev.parent, "ctx and teedev mismatch\n"); + return ERR_PTR(-EINVAL); + } + + if (!(flags & TEE_SHM_MAPPED)) { + dev_err(teedev->dev.parent, + "only mapped allocations supported\n"); + return ERR_PTR(-EINVAL); + } + + if ((flags & ~(TEE_SHM_MAPPED | TEE_SHM_DMA_BUF))) { + dev_err(teedev->dev.parent, "invalid shm flags 0x%x", flags); + return ERR_PTR(-EINVAL); + } + + if (!tee_device_get(teedev)) + return ERR_PTR(-EINVAL); + + if (!teedev->pool) { + /* teedev has been detached from driver */ + ret = ERR_PTR(-EINVAL); + goto err_dev_put; + } + + shm = kzalloc(sizeof(*shm), GFP_KERNEL); + if (!shm) { + ret = ERR_PTR(-ENOMEM); + goto err_dev_put; + } + + shm->flags = flags | TEE_SHM_POOL; + shm->teedev = teedev; + shm->ctx = ctx; + if (flags & TEE_SHM_DMA_BUF) + poolm = teedev->pool->dma_buf_mgr; + else + poolm = teedev->pool->private_mgr; + + rc = poolm->ops->alloc(poolm, shm, size); + if (rc) { + ret = ERR_PTR(rc); + goto err_kfree; + } + + mutex_lock(&teedev->mutex); + shm->id = idr_alloc(&teedev->idr, shm, 1, 0, GFP_KERNEL); + mutex_unlock(&teedev->mutex); + if (shm->id < 0) { + ret = ERR_PTR(shm->id); + goto err_pool_free; + } + + if (flags & TEE_SHM_DMA_BUF) { + DEFINE_DMA_BUF_EXPORT_INFO(exp_info); + + exp_info.ops = &tee_shm_dma_buf_ops; + exp_info.size = shm->size; + exp_info.flags = O_RDWR; + exp_info.priv = shm; + + shm->dmabuf = dma_buf_export(&exp_info); + if (IS_ERR(shm->dmabuf)) { + ret = ERR_CAST(shm->dmabuf); + goto err_rem; + } + } + + if (ctx) { + teedev_ctx_get(ctx); + mutex_lock(&teedev->mutex); + list_add_tail(&shm->link, &ctx->list_shm); + mutex_unlock(&teedev->mutex); + } + + return shm; +err_rem: + mutex_lock(&teedev->mutex); + idr_remove(&teedev->idr, shm->id); + mutex_unlock(&teedev->mutex); +err_pool_free: + poolm->ops->free(poolm, shm); +err_kfree: + kfree(shm); +err_dev_put: + tee_device_put(teedev); + return ret; +} + +/** + * tee_shm_alloc() - Allocate shared memory + * @ctx: Context that allocates the shared memory + * @size: Requested size of shared memory + * @flags: Flags setting properties for the requested shared memory. + * + * Memory allocated as global shared memory is automatically freed when the + * TEE file pointer is closed. The @flags field uses the bits defined by + * TEE_SHM_* in <linux/tee_drv.h>. TEE_SHM_MAPPED must currently always be + * set. If TEE_SHM_DMA_BUF global shared memory will be allocated and + * associated with a dma-buf handle, else driver private memory. + */ +struct tee_shm *tee_shm_alloc(struct tee_context *ctx, size_t size, u32 flags) +{ + return __tee_shm_alloc(ctx, ctx->teedev, size, flags); +} +EXPORT_SYMBOL_GPL(tee_shm_alloc); + +struct tee_shm *tee_shm_priv_alloc(struct tee_device *teedev, size_t size) +{ + return __tee_shm_alloc(NULL, teedev, size, TEE_SHM_MAPPED); +} +EXPORT_SYMBOL_GPL(tee_shm_priv_alloc); + +struct tee_shm *tee_shm_register(struct tee_context *ctx, unsigned long addr, + size_t length, u32 flags) +{ + struct tee_device *teedev = ctx->teedev; + const u32 req_flags = TEE_SHM_DMA_BUF | TEE_SHM_USER_MAPPED; + struct tee_shm *shm; + void *ret; + int rc; + int num_pages; + unsigned long start; + + if (flags != req_flags) + return ERR_PTR(-ENOTSUPP); + + if (!tee_device_get(teedev)) + return ERR_PTR(-EINVAL); + + if (!teedev->desc->ops->shm_register || + !teedev->desc->ops->shm_unregister) { + tee_device_put(teedev); + return ERR_PTR(-ENOTSUPP); + } + + teedev_ctx_get(ctx); + + shm = kzalloc(sizeof(*shm), GFP_KERNEL); + if (!shm) { + ret = ERR_PTR(-ENOMEM); + goto err; + } + + shm->flags = flags | TEE_SHM_REGISTER; + shm->teedev = teedev; + shm->ctx = ctx; + shm->id = -1; + start = rounddown(addr, PAGE_SIZE); + shm->offset = addr - start; + shm->size = length; + num_pages = (roundup(addr + length, PAGE_SIZE) - start) / PAGE_SIZE; + shm->pages = kcalloc(num_pages, sizeof(*shm->pages), GFP_KERNEL); + if (!shm->pages) { + ret = ERR_PTR(-ENOMEM); + goto err; + } + + rc = get_user_pages_fast(start, num_pages, 1, shm->pages); + if (rc > 0) + shm->num_pages = rc; + if (rc != num_pages) { + if (rc >= 0) + rc = -ENOMEM; + ret = ERR_PTR(rc); + goto err; + } + + mutex_lock(&teedev->mutex); + shm->id = idr_alloc(&teedev->idr, shm, 1, 0, GFP_KERNEL); + mutex_unlock(&teedev->mutex); + + if (shm->id < 0) { + ret = ERR_PTR(shm->id); + goto err; + } + + rc = teedev->desc->ops->shm_register(ctx, shm, shm->pages, + shm->num_pages, start); + if (rc) { + ret = ERR_PTR(rc); + goto err; + } + + if (flags & TEE_SHM_DMA_BUF) { + DEFINE_DMA_BUF_EXPORT_INFO(exp_info); + + exp_info.ops = &tee_shm_dma_buf_ops; + exp_info.size = shm->size; + exp_info.flags = O_RDWR; + exp_info.priv = shm; + + shm->dmabuf = dma_buf_export(&exp_info); + if (IS_ERR(shm->dmabuf)) { + ret = ERR_CAST(shm->dmabuf); + teedev->desc->ops->shm_unregister(ctx, shm); + goto err; + } + } + + mutex_lock(&teedev->mutex); + list_add_tail(&shm->link, &ctx->list_shm); + mutex_unlock(&teedev->mutex); + + return shm; +err: + if (shm) { + size_t n; + + if (shm->id >= 0) { + mutex_lock(&teedev->mutex); + idr_remove(&teedev->idr, shm->id); + mutex_unlock(&teedev->mutex); + } + if (shm->pages) { + for (n = 0; n < shm->num_pages; n++) + put_page(shm->pages[n]); + kfree(shm->pages); + } + } + kfree(shm); + teedev_ctx_put(ctx); + tee_device_put(teedev); + return ret; +} +EXPORT_SYMBOL_GPL(tee_shm_register); + +/** + * tee_shm_get_fd() - Increase reference count and return file descriptor + * @shm: Shared memory handle + * @returns user space file descriptor to shared memory + */ +int tee_shm_get_fd(struct tee_shm *shm) +{ + int fd; + + if (!(shm->flags & TEE_SHM_DMA_BUF)) + return -EINVAL; + + fd = dma_buf_fd(shm->dmabuf, O_CLOEXEC); + if (fd >= 0) + get_dma_buf(shm->dmabuf); + return fd; +} + +/** + * tee_shm_free() - Free shared memory + * @shm: Handle to shared memory to free + */ +void tee_shm_free(struct tee_shm *shm) +{ + /* + * dma_buf_put() decreases the dmabuf reference counter and will + * call tee_shm_release() when the last reference is gone. + * + * In the case of driver private memory we call tee_shm_release + * directly instead as it doesn't have a reference counter. + */ + if (shm->flags & TEE_SHM_DMA_BUF) + dma_buf_put(shm->dmabuf); + else + tee_shm_release(shm); +} +EXPORT_SYMBOL_GPL(tee_shm_free); + +/** + * tee_shm_va2pa() - Get physical address of a virtual address + * @shm: Shared memory handle + * @va: Virtual address to tranlsate + * @pa: Returned physical address + * @returns 0 on success and < 0 on failure + */ +int tee_shm_va2pa(struct tee_shm *shm, void *va, phys_addr_t *pa) +{ + if (!(shm->flags & TEE_SHM_MAPPED)) + return -EINVAL; + /* Check that we're in the range of the shm */ + if ((char *)va < (char *)shm->kaddr) + return -EINVAL; + if ((char *)va >= ((char *)shm->kaddr + shm->size)) + return -EINVAL; + + return tee_shm_get_pa( + shm, (unsigned long)va - (unsigned long)shm->kaddr, pa); +} +EXPORT_SYMBOL_GPL(tee_shm_va2pa); + +/** + * tee_shm_pa2va() - Get virtual address of a physical address + * @shm: Shared memory handle + * @pa: Physical address to tranlsate + * @va: Returned virtual address + * @returns 0 on success and < 0 on failure + */ +int tee_shm_pa2va(struct tee_shm *shm, phys_addr_t pa, void **va) +{ + if (!(shm->flags & TEE_SHM_MAPPED)) + return -EINVAL; + /* Check that we're in the range of the shm */ + if (pa < shm->paddr) + return -EINVAL; + if (pa >= (shm->paddr + shm->size)) + return -EINVAL; + + if (va) { + void *v = tee_shm_get_va(shm, pa - shm->paddr); + + if (IS_ERR(v)) + return PTR_ERR(v); + *va = v; + } + return 0; +} +EXPORT_SYMBOL_GPL(tee_shm_pa2va); + +/** + * tee_shm_get_va() - Get virtual address of a shared memory plus an offset + * @shm: Shared memory handle + * @offs: Offset from start of this shared memory + * @returns virtual address of the shared memory + offs if offs is within + * the bounds of this shared memory, else an ERR_PTR + */ +void *tee_shm_get_va(struct tee_shm *shm, size_t offs) +{ + if (!(shm->flags & TEE_SHM_MAPPED)) + return ERR_PTR(-EINVAL); + if (offs >= shm->size) + return ERR_PTR(-EINVAL); + return (char *)shm->kaddr + offs; +} +EXPORT_SYMBOL_GPL(tee_shm_get_va); + +/** + * tee_shm_get_pa() - Get physical address of a shared memory plus an offset + * @shm: Shared memory handle + * @offs: Offset from start of this shared memory + * @pa: Physical address to return + * @returns 0 if offs is within the bounds of this shared memory, else an + * error code. + */ +int tee_shm_get_pa(struct tee_shm *shm, size_t offs, phys_addr_t *pa) +{ + if (offs >= shm->size) + return -EINVAL; + if (pa) + *pa = shm->paddr + offs; + return 0; +} +EXPORT_SYMBOL_GPL(tee_shm_get_pa); + +/** + * tee_shm_get_from_id() - Find shared memory object and increase reference + * count + * @ctx: Context owning the shared memory + * @id: Id of shared memory object + * @returns a pointer to 'struct tee_shm' on success or an ERR_PTR on failure + */ +struct tee_shm *tee_shm_get_from_id(struct tee_context *ctx, int id) +{ + struct tee_device *teedev; + struct tee_shm *shm; + + if (!ctx) + return ERR_PTR(-EINVAL); + + teedev = ctx->teedev; + mutex_lock(&teedev->mutex); + shm = idr_find(&teedev->idr, id); + if (!shm || shm->ctx != ctx) + shm = ERR_PTR(-EINVAL); + else if (shm->flags & TEE_SHM_DMA_BUF) + get_dma_buf(shm->dmabuf); + mutex_unlock(&teedev->mutex); + return shm; +} +EXPORT_SYMBOL_GPL(tee_shm_get_from_id); + +/** + * tee_shm_put() - Decrease reference count on a shared memory handle + * @shm: Shared memory handle + */ +void tee_shm_put(struct tee_shm *shm) +{ + if (shm->flags & TEE_SHM_DMA_BUF) + dma_buf_put(shm->dmabuf); +} +EXPORT_SYMBOL_GPL(tee_shm_put);
diff --git a/drivers/tee/tee_shm_pool.c b/drivers/tee/tee_shm_pool.c new file mode 100644 index 0000000..e6d4b9e --- /dev/null +++ b/drivers/tee/tee_shm_pool.c
@@ -0,0 +1,195 @@ +/* + * Copyright (c) 2015, Linaro Limited + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ +#include <linux/device.h> +#include <linux/dma-buf.h> +#include <linux/genalloc.h> +#include <linux/slab.h> +#include <linux/tee_drv.h> +#include "tee_private.h" + +static int pool_op_gen_alloc(struct tee_shm_pool_mgr *poolm, + struct tee_shm *shm, size_t size) +{ + unsigned long va; + struct gen_pool *genpool = poolm->private_data; + size_t s = roundup(size, 1 << genpool->min_alloc_order); + + va = gen_pool_alloc(genpool, s); + if (!va) + return -ENOMEM; + + memset((void *)va, 0, s); + shm->kaddr = (void *)va; + shm->paddr = gen_pool_virt_to_phys(genpool, va); + shm->size = s; + return 0; +} + +static void pool_op_gen_free(struct tee_shm_pool_mgr *poolm, + struct tee_shm *shm) +{ + gen_pool_free(poolm->private_data, (unsigned long)shm->kaddr, + shm->size); + shm->kaddr = NULL; +} + +static void pool_op_gen_destroy_poolmgr(struct tee_shm_pool_mgr *poolm) +{ + gen_pool_destroy(poolm->private_data); + kfree(poolm); +} + +static const struct tee_shm_pool_mgr_ops pool_ops_generic = { + .alloc = pool_op_gen_alloc, + .free = pool_op_gen_free, + .destroy_poolmgr = pool_op_gen_destroy_poolmgr, +}; + +/** + * tee_shm_pool_alloc_res_mem() - Create a shared memory pool from reserved + * memory range + * @priv_info: Information for driver private shared memory pool + * @dmabuf_info: Information for dma-buf shared memory pool + * + * Start and end of pools will must be page aligned. + * + * Allocation with the flag TEE_SHM_DMA_BUF set will use the range supplied + * in @dmabuf, others will use the range provided by @priv. + * + * @returns pointer to a 'struct tee_shm_pool' or an ERR_PTR on failure. + */ +struct tee_shm_pool * +tee_shm_pool_alloc_res_mem(struct tee_shm_pool_mem_info *priv_info, + struct tee_shm_pool_mem_info *dmabuf_info) +{ + struct tee_shm_pool_mgr *priv_mgr; + struct tee_shm_pool_mgr *dmabuf_mgr; + void *rc; + + /* + * Create the pool for driver private shared memory + */ + rc = tee_shm_pool_mgr_alloc_res_mem(priv_info->vaddr, priv_info->paddr, + priv_info->size, + 3 /* 8 byte aligned */); + if (IS_ERR(rc)) + return rc; + priv_mgr = rc; + + /* + * Create the pool for dma_buf shared memory + */ + rc = tee_shm_pool_mgr_alloc_res_mem(dmabuf_info->vaddr, + dmabuf_info->paddr, + dmabuf_info->size, PAGE_SHIFT); + if (IS_ERR(rc)) + goto err_free_priv_mgr; + dmabuf_mgr = rc; + + rc = tee_shm_pool_alloc(priv_mgr, dmabuf_mgr); + if (IS_ERR(rc)) + goto err_free_dmabuf_mgr; + + return rc; + +err_free_dmabuf_mgr: + tee_shm_pool_mgr_destroy(dmabuf_mgr); +err_free_priv_mgr: + tee_shm_pool_mgr_destroy(priv_mgr); + + return rc; +} +EXPORT_SYMBOL_GPL(tee_shm_pool_alloc_res_mem); + +struct tee_shm_pool_mgr *tee_shm_pool_mgr_alloc_res_mem(unsigned long vaddr, + phys_addr_t paddr, + size_t size, + int min_alloc_order) +{ + const size_t page_mask = PAGE_SIZE - 1; + struct tee_shm_pool_mgr *mgr; + int rc; + + /* Start and end must be page aligned */ + if (vaddr & page_mask || paddr & page_mask || size & page_mask) + return ERR_PTR(-EINVAL); + + mgr = kzalloc(sizeof(*mgr), GFP_KERNEL); + if (!mgr) + return ERR_PTR(-ENOMEM); + + mgr->private_data = gen_pool_create(min_alloc_order, -1); + if (!mgr->private_data) { + rc = -ENOMEM; + goto err; + } + + gen_pool_set_algo(mgr->private_data, gen_pool_best_fit, NULL); + rc = gen_pool_add_virt(mgr->private_data, vaddr, paddr, size, -1); + if (rc) { + gen_pool_destroy(mgr->private_data); + goto err; + } + + mgr->ops = &pool_ops_generic; + + return mgr; +err: + kfree(mgr); + + return ERR_PTR(rc); +} +EXPORT_SYMBOL_GPL(tee_shm_pool_mgr_alloc_res_mem); + +static bool check_mgr_ops(struct tee_shm_pool_mgr *mgr) +{ + return mgr && mgr->ops && mgr->ops->alloc && mgr->ops->free && + mgr->ops->destroy_poolmgr; +} + +struct tee_shm_pool *tee_shm_pool_alloc(struct tee_shm_pool_mgr *priv_mgr, + struct tee_shm_pool_mgr *dmabuf_mgr) +{ + struct tee_shm_pool *pool; + + if (!check_mgr_ops(priv_mgr) || !check_mgr_ops(dmabuf_mgr)) + return ERR_PTR(-EINVAL); + + pool = kzalloc(sizeof(*pool), GFP_KERNEL); + if (!pool) + return ERR_PTR(-ENOMEM); + + pool->private_mgr = priv_mgr; + pool->dma_buf_mgr = dmabuf_mgr; + + return pool; +} +EXPORT_SYMBOL_GPL(tee_shm_pool_alloc); + +/** + * tee_shm_pool_free() - Free a shared memory pool + * @pool: The shared memory pool to free + * + * There must be no remaining shared memory allocated from this pool when + * this function is called. + */ +void tee_shm_pool_free(struct tee_shm_pool *pool) +{ + if (pool->private_mgr) + tee_shm_pool_mgr_destroy(pool->private_mgr); + if (pool->dma_buf_mgr) + tee_shm_pool_mgr_destroy(pool->dma_buf_mgr); + kfree(pool); +} +EXPORT_SYMBOL_GPL(tee_shm_pool_free);
diff --git a/drivers/thermal/hisi_thermal.c b/drivers/thermal/hisi_thermal.c index c5285ed..2d855a9 100644 --- a/drivers/thermal/hisi_thermal.c +++ b/drivers/thermal/hisi_thermal.c
@@ -23,243 +23,450 @@ #include <linux/module.h> #include <linux/platform_device.h> #include <linux/io.h> +#include <linux/of_device.h> #include "thermal_core.h" -#define TEMP0_TH (0x4) -#define TEMP0_RST_TH (0x8) -#define TEMP0_CFG (0xC) -#define TEMP0_EN (0x10) -#define TEMP0_INT_EN (0x14) -#define TEMP0_INT_CLR (0x18) -#define TEMP0_RST_MSK (0x1C) -#define TEMP0_VALUE (0x28) +#define HI6220_TEMP0_LAG (0x0) +#define HI6220_TEMP0_TH (0x4) +#define HI6220_TEMP0_RST_TH (0x8) +#define HI6220_TEMP0_CFG (0xC) +#define HI6220_TEMP0_CFG_SS_MSK (0xF000) +#define HI6220_TEMP0_CFG_HDAK_MSK (0x30) +#define HI6220_TEMP0_EN (0x10) +#define HI6220_TEMP0_INT_EN (0x14) +#define HI6220_TEMP0_INT_CLR (0x18) +#define HI6220_TEMP0_RST_MSK (0x1C) +#define HI6220_TEMP0_VALUE (0x28) -#define HISI_TEMP_BASE (-60000) -#define HISI_TEMP_RESET (100000) -#define HISI_TEMP_STEP (784) +#define HI3660_OFFSET(chan) ((chan) * 0x40) +#define HI3660_TEMP(chan) (HI3660_OFFSET(chan) + 0x1C) +#define HI3660_TH(chan) (HI3660_OFFSET(chan) + 0x20) +#define HI3660_LAG(chan) (HI3660_OFFSET(chan) + 0x28) +#define HI3660_INT_EN(chan) (HI3660_OFFSET(chan) + 0x2C) +#define HI3660_INT_CLR(chan) (HI3660_OFFSET(chan) + 0x30) -#define HISI_MAX_SENSORS 4 +#define HI6220_TEMP_BASE (-60000) +#define HI6220_TEMP_RESET (100000) +#define HI6220_TEMP_STEP (785) +#define HI6220_TEMP_LAG (3500) + +#define HI3660_TEMP_BASE (-63780) +#define HI3660_TEMP_STEP (205) +#define HI3660_TEMP_LAG (4000) + +#define HI6220_DEFAULT_SENSOR 2 +#define HI3660_DEFAULT_SENSOR 1 struct hisi_thermal_sensor { - struct hisi_thermal_data *thermal; struct thermal_zone_device *tzd; - - long sensor_temp; uint32_t id; uint32_t thres_temp; }; struct hisi_thermal_data { - struct mutex thermal_lock; /* protects register data */ + int (*get_temp)(struct hisi_thermal_data *data); + int (*enable_sensor)(struct hisi_thermal_data *data); + int (*disable_sensor)(struct hisi_thermal_data *data); + int (*irq_handler)(struct hisi_thermal_data *data); struct platform_device *pdev; struct clk *clk; - struct hisi_thermal_sensor sensors[HISI_MAX_SENSORS]; - - int irq, irq_bind_sensor; - bool irq_enabled; - + struct hisi_thermal_sensor sensor; void __iomem *regs; + int irq; }; /* * The temperature computation on the tsensor is as follow: * Unit: millidegree Celsius - * Step: 255/200 (0.7843) + * Step: 200/255 (0.7843) * Temperature base: -60°C * - * The register is programmed in temperature steps, every step is 784 + * The register is programmed in temperature steps, every step is 785 * millidegree and begins at -60 000 m°C * * The temperature from the steps: * - * Temp = TempBase + (steps x 784) + * Temp = TempBase + (steps x 785) * * and the steps from the temperature: * - * steps = (Temp - TempBase) / 784 + * steps = (Temp - TempBase) / 785 * */ -static inline int hisi_thermal_step_to_temp(int step) +static inline int hi6220_thermal_step_to_temp(int step) { - return HISI_TEMP_BASE + (step * HISI_TEMP_STEP); + return HI6220_TEMP_BASE + (step * HI6220_TEMP_STEP); } -static inline long hisi_thermal_temp_to_step(long temp) +static inline int hi6220_thermal_temp_to_step(int temp) { - return (temp - HISI_TEMP_BASE) / HISI_TEMP_STEP; + return DIV_ROUND_UP(temp - HI6220_TEMP_BASE, HI6220_TEMP_STEP); } -static inline long hisi_thermal_round_temp(int temp) +/* + * for Hi3660, + * Step: 189/922 (0.205) + * Temperature base: -63.780°C + * + * The register is programmed in temperature steps, every step is 205 + * millidegree and begins at -63 780 m°C + */ +static inline int hi3660_thermal_step_to_temp(int step) { - return hisi_thermal_step_to_temp( - hisi_thermal_temp_to_step(temp)); + return HI3660_TEMP_BASE + step * HI3660_TEMP_STEP; } -static long hisi_thermal_get_sensor_temp(struct hisi_thermal_data *data, - struct hisi_thermal_sensor *sensor) +static inline int hi3660_thermal_temp_to_step(int temp) { - long val; - - mutex_lock(&data->thermal_lock); - - /* disable interrupt */ - writel(0x0, data->regs + TEMP0_INT_EN); - writel(0x1, data->regs + TEMP0_INT_CLR); - - /* disable module firstly */ - writel(0x0, data->regs + TEMP0_EN); - - /* select sensor id */ - writel((sensor->id << 12), data->regs + TEMP0_CFG); - - /* enable module */ - writel(0x1, data->regs + TEMP0_EN); - - usleep_range(3000, 5000); - - val = readl(data->regs + TEMP0_VALUE); - val = hisi_thermal_step_to_temp(val); - - mutex_unlock(&data->thermal_lock); - - return val; + return DIV_ROUND_UP(temp - HI3660_TEMP_BASE, HI3660_TEMP_STEP); } -static void hisi_thermal_enable_bind_irq_sensor - (struct hisi_thermal_data *data) +/* + * The lag register contains 5 bits encoding the temperature in steps. + * + * Each time the temperature crosses the threshold boundary, an + * interrupt is raised. It could be when the temperature is going + * above the threshold or below. However, if the temperature is + * fluctuating around this value due to the load, we can receive + * several interrupts which may not desired. + * + * We can setup a temperature representing the delta between the + * threshold and the current temperature when the temperature is + * decreasing. + * + * For instance: the lag register is 5°C, the threshold is 65°C, when + * the temperature reaches 65°C an interrupt is raised and when the + * temperature decrease to 65°C - 5°C another interrupt is raised. + * + * A very short lag can lead to an interrupt storm, a long lag + * increase the latency to react to the temperature changes. In our + * case, that is not really a problem as we are polling the + * temperature. + * + * [0:4] : lag register + * + * The temperature is coded in steps, cf. HI6220_TEMP_STEP. + * + * Min : 0x00 : 0.0 °C + * Max : 0x1F : 24.3 °C + * + * The 'value' parameter is in milliCelsius. + */ +static inline void hi6220_thermal_set_lag(void __iomem *addr, int value) { - struct hisi_thermal_sensor *sensor; - - mutex_lock(&data->thermal_lock); - - sensor = &data->sensors[data->irq_bind_sensor]; - - /* setting the hdak time */ - writel(0x0, data->regs + TEMP0_CFG); - - /* disable module firstly */ - writel(0x0, data->regs + TEMP0_RST_MSK); - writel(0x0, data->regs + TEMP0_EN); - - /* select sensor id */ - writel((sensor->id << 12), data->regs + TEMP0_CFG); - - /* enable for interrupt */ - writel(hisi_thermal_temp_to_step(sensor->thres_temp) | 0x0FFFFFF00, - data->regs + TEMP0_TH); - - writel(hisi_thermal_temp_to_step(HISI_TEMP_RESET), - data->regs + TEMP0_RST_TH); - - /* enable module */ - writel(0x1, data->regs + TEMP0_RST_MSK); - writel(0x1, data->regs + TEMP0_EN); - - writel(0x0, data->regs + TEMP0_INT_CLR); - writel(0x1, data->regs + TEMP0_INT_EN); - - usleep_range(3000, 5000); - - mutex_unlock(&data->thermal_lock); + writel(DIV_ROUND_UP(value, HI6220_TEMP_STEP) & 0x1F, + addr + HI6220_TEMP0_LAG); } -static void hisi_thermal_disable_sensor(struct hisi_thermal_data *data) +static inline void hi6220_thermal_alarm_clear(void __iomem *addr, int value) { - mutex_lock(&data->thermal_lock); + writel(value, addr + HI6220_TEMP0_INT_CLR); +} +static inline void hi6220_thermal_alarm_enable(void __iomem *addr, int value) +{ + writel(value, addr + HI6220_TEMP0_INT_EN); +} + +static inline void hi6220_thermal_alarm_set(void __iomem *addr, int temp) +{ + writel(hi6220_thermal_temp_to_step(temp) | 0x0FFFFFF00, + addr + HI6220_TEMP0_TH); +} + +static inline void hi6220_thermal_reset_set(void __iomem *addr, int temp) +{ + writel(hi6220_thermal_temp_to_step(temp), addr + HI6220_TEMP0_RST_TH); +} + +static inline void hi6220_thermal_reset_enable(void __iomem *addr, int value) +{ + writel(value, addr + HI6220_TEMP0_RST_MSK); +} + +static inline void hi6220_thermal_enable(void __iomem *addr, int value) +{ + writel(value, addr + HI6220_TEMP0_EN); +} + +static inline int hi6220_thermal_get_temperature(void __iomem *addr) +{ + return hi6220_thermal_step_to_temp(readl(addr + HI6220_TEMP0_VALUE)); +} + +/* + * [0:6] lag register + * + * The temperature is coded in steps, cf. HI3660_TEMP_STEP. + * + * Min : 0x00 : 0.0 °C + * Max : 0x7F : 26.0 °C + * + */ +static inline void hi3660_thermal_set_lag(void __iomem *addr, + int id, int value) +{ + writel(DIV_ROUND_UP(value, HI3660_TEMP_STEP) & 0x7F, + addr + HI3660_LAG(id)); +} + +static inline void hi3660_thermal_alarm_clear(void __iomem *addr, + int id, int value) +{ + writel(value, addr + HI3660_INT_CLR(id)); +} + +static inline void hi3660_thermal_alarm_enable(void __iomem *addr, + int id, int value) +{ + writel(value, addr + HI3660_INT_EN(id)); +} + +static inline void hi3660_thermal_alarm_set(void __iomem *addr, + int id, int value) +{ + writel(value, addr + HI3660_TH(id)); +} + +static inline int hi3660_thermal_get_temperature(void __iomem *addr, int id) +{ + return hi3660_thermal_step_to_temp(readl(addr + HI3660_TEMP(id))); +} + +/* + * Temperature configuration register - Sensor selection + * + * Bits [19:12] + * + * 0x0: local sensor (default) + * 0x1: remote sensor 1 (ACPU cluster 1) + * 0x2: remote sensor 2 (ACPU cluster 0) + * 0x3: remote sensor 3 (G3D) + */ +static inline void hi6220_thermal_sensor_select(void __iomem *addr, int sensor) +{ + writel((readl(addr + HI6220_TEMP0_CFG) & ~HI6220_TEMP0_CFG_SS_MSK) | + (sensor << 12), addr + HI6220_TEMP0_CFG); +} + +/* + * Temperature configuration register - Hdak conversion polling interval + * + * Bits [5:4] + * + * 0x0 : 0.768 ms + * 0x1 : 6.144 ms + * 0x2 : 49.152 ms + * 0x3 : 393.216 ms + */ +static inline void hi6220_thermal_hdak_set(void __iomem *addr, int value) +{ + writel((readl(addr + HI6220_TEMP0_CFG) & ~HI6220_TEMP0_CFG_HDAK_MSK) | + (value << 4), addr + HI6220_TEMP0_CFG); +} + +static int hi6220_thermal_irq_handler(struct hisi_thermal_data *data) +{ + hi6220_thermal_alarm_clear(data->regs, 1); + return 0; +} + +static int hi3660_thermal_irq_handler(struct hisi_thermal_data *data) +{ + hi3660_thermal_alarm_clear(data->regs, data->sensor.id, 1); + return 0; +} + +static int hi6220_thermal_get_temp(struct hisi_thermal_data *data) +{ + return hi6220_thermal_get_temperature(data->regs); +} + +static int hi3660_thermal_get_temp(struct hisi_thermal_data *data) +{ + return hi3660_thermal_get_temperature(data->regs, data->sensor.id); +} + +static int hi6220_thermal_disable_sensor(struct hisi_thermal_data *data) +{ /* disable sensor module */ - writel(0x0, data->regs + TEMP0_INT_EN); - writel(0x0, data->regs + TEMP0_RST_MSK); - writel(0x0, data->regs + TEMP0_EN); + hi6220_thermal_enable(data->regs, 0); + hi6220_thermal_alarm_enable(data->regs, 0); + hi6220_thermal_reset_enable(data->regs, 0); - mutex_unlock(&data->thermal_lock); -} - -static int hisi_thermal_get_temp(void *_sensor, int *temp) -{ - struct hisi_thermal_sensor *sensor = _sensor; - struct hisi_thermal_data *data = sensor->thermal; - - int sensor_id = -1, i; - long max_temp = 0; - - *temp = hisi_thermal_get_sensor_temp(data, sensor); - - sensor->sensor_temp = *temp; - - for (i = 0; i < HISI_MAX_SENSORS; i++) { - if (!data->sensors[i].tzd) - continue; - - if (data->sensors[i].sensor_temp >= max_temp) { - max_temp = data->sensors[i].sensor_temp; - sensor_id = i; - } - } - - /* If no sensor has been enabled, then skip to enable irq */ - if (sensor_id == -1) - return 0; - - mutex_lock(&data->thermal_lock); - data->irq_bind_sensor = sensor_id; - mutex_unlock(&data->thermal_lock); - - dev_dbg(&data->pdev->dev, "id=%d, irq=%d, temp=%d, thres=%d\n", - sensor->id, data->irq_enabled, *temp, sensor->thres_temp); - /* - * Bind irq to sensor for two cases: - * Reenable alarm IRQ if temperature below threshold; - * if irq has been enabled, always set it; - */ - if (data->irq_enabled) { - hisi_thermal_enable_bind_irq_sensor(data); - return 0; - } - - if (max_temp < sensor->thres_temp) { - data->irq_enabled = true; - hisi_thermal_enable_bind_irq_sensor(data); - enable_irq(data->irq); - } + clk_disable_unprepare(data->clk); return 0; } -static struct thermal_zone_of_device_ops hisi_of_thermal_ops = { +static int hi3660_thermal_disable_sensor(struct hisi_thermal_data *data) +{ + /* disable sensor module */ + hi3660_thermal_alarm_enable(data->regs, data->sensor.id, 0); + return 0; +} + +static int hi6220_thermal_enable_sensor(struct hisi_thermal_data *data) +{ + struct hisi_thermal_sensor *sensor = &data->sensor; + int ret; + + /* enable clock for tsensor */ + ret = clk_prepare_enable(data->clk); + if (ret) + return ret; + + /* disable module firstly */ + hi6220_thermal_reset_enable(data->regs, 0); + hi6220_thermal_enable(data->regs, 0); + + /* select sensor id */ + hi6220_thermal_sensor_select(data->regs, sensor->id); + + /* setting the hdak time */ + hi6220_thermal_hdak_set(data->regs, 0); + + /* setting lag value between current temp and the threshold */ + hi6220_thermal_set_lag(data->regs, HI6220_TEMP_LAG); + + /* enable for interrupt */ + hi6220_thermal_alarm_set(data->regs, sensor->thres_temp); + + hi6220_thermal_reset_set(data->regs, HI6220_TEMP_RESET); + + /* enable module */ + hi6220_thermal_reset_enable(data->regs, 1); + hi6220_thermal_enable(data->regs, 1); + + hi6220_thermal_alarm_clear(data->regs, 0); + hi6220_thermal_alarm_enable(data->regs, 1); + + return 0; +} + +static int hi3660_thermal_enable_sensor(struct hisi_thermal_data *data) +{ + unsigned int value; + struct hisi_thermal_sensor *sensor = &data->sensor; + + /* disable interrupt */ + hi3660_thermal_alarm_enable(data->regs, sensor->id, 0); + + /* setting lag value between current temp and the threshold */ + hi3660_thermal_set_lag(data->regs, sensor->id, HI3660_TEMP_LAG); + + /* set interrupt threshold */ + value = hi3660_thermal_temp_to_step(sensor->thres_temp); + hi3660_thermal_alarm_set(data->regs, sensor->id, value); + + /* enable interrupt */ + hi3660_thermal_alarm_clear(data->regs, sensor->id, 1); + hi3660_thermal_alarm_enable(data->regs, sensor->id, 1); + + return 0; +} + +static int hi6220_thermal_probe(struct hisi_thermal_data *data) +{ + struct platform_device *pdev = data->pdev; + struct device *dev = &pdev->dev; + struct resource *res; + int ret; + + data->get_temp = hi6220_thermal_get_temp; + data->enable_sensor = hi6220_thermal_enable_sensor; + data->disable_sensor = hi6220_thermal_disable_sensor; + data->irq_handler = hi6220_thermal_irq_handler; + + res = platform_get_resource(pdev, IORESOURCE_MEM, 0); + data->regs = devm_ioremap_resource(dev, res); + if (IS_ERR(data->regs)) { + dev_err(dev, "failed to get io address\n"); + return PTR_ERR(data->regs); + } + + data->clk = devm_clk_get(dev, "thermal_clk"); + if (IS_ERR(data->clk)) { + ret = PTR_ERR(data->clk); + if (ret != -EPROBE_DEFER) + dev_err(dev, "failed to get thermal clk: %d\n", ret); + return ret; + } + + data->irq = platform_get_irq(pdev, 0); + if (data->irq < 0) + return data->irq; + + data->sensor.id = HI6220_DEFAULT_SENSOR; + + return 0; +} + +static int hi3660_thermal_probe(struct hisi_thermal_data *data) +{ + struct platform_device *pdev = data->pdev; + struct device *dev = &pdev->dev; + struct resource *res; + + data->get_temp = hi3660_thermal_get_temp; + data->enable_sensor = hi3660_thermal_enable_sensor; + data->disable_sensor = hi3660_thermal_disable_sensor; + data->irq_handler = hi3660_thermal_irq_handler; + + res = platform_get_resource(pdev, IORESOURCE_MEM, 0); + data->regs = devm_ioremap_resource(dev, res); + if (IS_ERR(data->regs)) { + dev_err(dev, "failed to get io address\n"); + return PTR_ERR(data->regs); + } + + data->irq = platform_get_irq(pdev, 0); + if (data->irq < 0) + return data->irq; + + data->sensor.id = HI3660_DEFAULT_SENSOR; + + return 0; +} + +static int hisi_thermal_get_temp(void *__data, int *temp) +{ + struct hisi_thermal_data *data = __data; + struct hisi_thermal_sensor *sensor = &data->sensor; + + *temp = data->get_temp(data); + + dev_dbg(&data->pdev->dev, "id=%d, temp=%d, thres=%d\n", + sensor->id, *temp, sensor->thres_temp); + + return 0; +} + +static const struct thermal_zone_of_device_ops hisi_of_thermal_ops = { .get_temp = hisi_thermal_get_temp, }; -static irqreturn_t hisi_thermal_alarm_irq(int irq, void *dev) -{ - struct hisi_thermal_data *data = dev; - - disable_irq_nosync(irq); - data->irq_enabled = false; - - return IRQ_WAKE_THREAD; -} - static irqreturn_t hisi_thermal_alarm_irq_thread(int irq, void *dev) { struct hisi_thermal_data *data = dev; - struct hisi_thermal_sensor *sensor; - int i; + struct hisi_thermal_sensor *sensor = &data->sensor; + int temp = 0; - mutex_lock(&data->thermal_lock); - sensor = &data->sensors[data->irq_bind_sensor]; + data->irq_handler(data); - dev_crit(&data->pdev->dev, "THERMAL ALARM: T > %d\n", - sensor->thres_temp); - mutex_unlock(&data->thermal_lock); + hisi_thermal_get_temp(data, &temp); - for (i = 0; i < HISI_MAX_SENSORS; i++) { - if (!data->sensors[i].tzd) - continue; + if (temp >= sensor->thres_temp) { + dev_crit(&data->pdev->dev, "THERMAL ALARM: %d > %d\n", + temp, sensor->thres_temp); - thermal_zone_device_update(data->sensors[i].tzd, + thermal_zone_device_update(data->sensor.tzd, THERMAL_EVENT_UNSPECIFIED); + + } else { + dev_crit(&data->pdev->dev, "THERMAL ALARM stopped: %d < %d\n", + temp, sensor->thres_temp); } return IRQ_HANDLED; @@ -267,17 +474,14 @@ static irqreturn_t hisi_thermal_alarm_irq_thread(int irq, void *dev) static int hisi_thermal_register_sensor(struct platform_device *pdev, struct hisi_thermal_data *data, - struct hisi_thermal_sensor *sensor, - int index) + struct hisi_thermal_sensor *sensor) { int ret, i; const struct thermal_trip *trip; - sensor->id = index; - sensor->thermal = data; - sensor->tzd = devm_thermal_zone_of_sensor_register(&pdev->dev, - sensor->id, sensor, &hisi_of_thermal_ops); + sensor->id, data, + &hisi_of_thermal_ops); if (IS_ERR(sensor->tzd)) { ret = PTR_ERR(sensor->tzd); sensor->tzd = NULL; @@ -290,7 +494,7 @@ static int hisi_thermal_register_sensor(struct platform_device *pdev, for (i = 0; i < of_thermal_get_ntrips(sensor->tzd); i++) { if (trip[i].type == THERMAL_TRIP_PASSIVE) { - sensor->thres_temp = hisi_thermal_round_temp(trip[i].temperature); + sensor->thres_temp = trip[i].temperature; break; } } @@ -299,7 +503,14 @@ static int hisi_thermal_register_sensor(struct platform_device *pdev, } static const struct of_device_id of_hisi_thermal_match[] = { - { .compatible = "hisilicon,tsensor" }, + { + .compatible = "hisilicon,tsensor", + .data = hi6220_thermal_probe + }, + { + .compatible = "hisilicon,hi3660-tsensor", + .data = hi3660_thermal_probe + }, { /* end */ } }; MODULE_DEVICE_TABLE(of, of_hisi_thermal_match); @@ -316,69 +527,51 @@ static void hisi_thermal_toggle_sensor(struct hisi_thermal_sensor *sensor, static int hisi_thermal_probe(struct platform_device *pdev) { struct hisi_thermal_data *data; - struct resource *res; - int i; + int const (*platform_probe)(struct hisi_thermal_data *); + struct device *dev = &pdev->dev; int ret; - data = devm_kzalloc(&pdev->dev, sizeof(*data), GFP_KERNEL); + data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL); if (!data) return -ENOMEM; - mutex_init(&data->thermal_lock); data->pdev = pdev; - - res = platform_get_resource(pdev, IORESOURCE_MEM, 0); - data->regs = devm_ioremap_resource(&pdev->dev, res); - if (IS_ERR(data->regs)) { - dev_err(&pdev->dev, "failed to get io address\n"); - return PTR_ERR(data->regs); - } - - data->irq = platform_get_irq(pdev, 0); - if (data->irq < 0) - return data->irq; - platform_set_drvdata(pdev, data); - data->clk = devm_clk_get(&pdev->dev, "thermal_clk"); - if (IS_ERR(data->clk)) { - ret = PTR_ERR(data->clk); - if (ret != -EPROBE_DEFER) - dev_err(&pdev->dev, - "failed to get thermal clk: %d\n", ret); - return ret; + platform_probe = of_device_get_match_data(dev); + if (!platform_probe) { + dev_err(dev, "failed to get probe func\n"); + return -EINVAL; } - /* enable clock for thermal */ - ret = clk_prepare_enable(data->clk); + ret = platform_probe(data); + if (ret) + return ret; + + ret = hisi_thermal_register_sensor(pdev, data, + &data->sensor); if (ret) { - dev_err(&pdev->dev, "failed to enable thermal clk: %d\n", ret); + dev_err(dev, "failed to register thermal sensor: %d\n", ret); return ret; } - hisi_thermal_enable_bind_irq_sensor(data); - data->irq_enabled = true; - - for (i = 0; i < HISI_MAX_SENSORS; ++i) { - ret = hisi_thermal_register_sensor(pdev, data, - &data->sensors[i], i); - if (ret) - dev_err(&pdev->dev, - "failed to register thermal sensor: %d\n", ret); - else - hisi_thermal_toggle_sensor(&data->sensors[i], true); - } - - ret = devm_request_threaded_irq(&pdev->dev, data->irq, - hisi_thermal_alarm_irq, - hisi_thermal_alarm_irq_thread, - 0, "hisi_thermal", data); - if (ret < 0) { - dev_err(&pdev->dev, "failed to request alarm irq: %d\n", ret); + ret = data->enable_sensor(data); + if (ret) { + dev_err(dev, "Failed to setup the sensor: %d\n", ret); return ret; } - enable_irq(data->irq); + if (data->irq) { + ret = devm_request_threaded_irq(dev, data->irq, NULL, + hisi_thermal_alarm_irq_thread, + IRQF_ONESHOT, "hisi_thermal", data); + if (ret < 0) { + dev_err(dev, "failed to request alarm irq: %d\n", ret); + return ret; + } + } + + hisi_thermal_toggle_sensor(&data->sensor, true); return 0; } @@ -386,19 +579,11 @@ static int hisi_thermal_probe(struct platform_device *pdev) static int hisi_thermal_remove(struct platform_device *pdev) { struct hisi_thermal_data *data = platform_get_drvdata(pdev); - int i; + struct hisi_thermal_sensor *sensor = &data->sensor; - for (i = 0; i < HISI_MAX_SENSORS; i++) { - struct hisi_thermal_sensor *sensor = &data->sensors[i]; + hisi_thermal_toggle_sensor(sensor, false); - if (!sensor->tzd) - continue; - - hisi_thermal_toggle_sensor(sensor, false); - } - - hisi_thermal_disable_sensor(data); - clk_disable_unprepare(data->clk); + data->disable_sensor(data); return 0; } @@ -408,10 +593,7 @@ static int hisi_thermal_suspend(struct device *dev) { struct hisi_thermal_data *data = dev_get_drvdata(dev); - hisi_thermal_disable_sensor(data); - data->irq_enabled = false; - - clk_disable_unprepare(data->clk); + data->disable_sensor(data); return 0; } @@ -419,16 +601,8 @@ static int hisi_thermal_suspend(struct device *dev) static int hisi_thermal_resume(struct device *dev) { struct hisi_thermal_data *data = dev_get_drvdata(dev); - int ret; - ret = clk_prepare_enable(data->clk); - if (ret) - return ret; - - data->irq_enabled = true; - hisi_thermal_enable_bind_irq_sensor(data); - - return 0; + return data->enable_sensor(data); } #endif
diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c index a260cde..5532c44 100644 --- a/drivers/tty/serial/kgdboc.c +++ b/drivers/tty/serial/kgdboc.c
@@ -245,7 +245,8 @@ static void kgdboc_put_char(u8 chr) kgdb_tty_line, chr); } -static int param_set_kgdboc_var(const char *kmessage, struct kernel_param *kp) +static int param_set_kgdboc_var(const char *kmessage, + const struct kernel_param *kp) { int len = strlen(kmessage);
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c index 53e6db8..17c2ee2 100644 --- a/drivers/tty/serial/serial_core.c +++ b/drivers/tty/serial/serial_core.c
@@ -131,6 +131,9 @@ static void __uart_start(struct tty_struct *tty) struct uart_state *state = tty->driver_data; struct uart_port *port = state->uart_port; + if (port && port->ops->wake_peer) + port->ops->wake_peer(port); + if (port && !uart_tx_stopped(port)) port->ops->start_tx(port); }
diff --git a/drivers/usb/gadget/Kconfig b/drivers/usb/gadget/Kconfig index f3ee80e..f6cce5a 100644 --- a/drivers/usb/gadget/Kconfig +++ b/drivers/usb/gadget/Kconfig
@@ -209,6 +209,18 @@ config USB_F_TCM tristate +config USB_F_MTP + tristate + +config USB_F_PTP + tristate + +config USB_F_AUDIO_SRC + tristate + +config USB_F_ACC + tristate + # this first set of drivers all depend on bulk-capable hardware. config USB_CONFIGFS @@ -362,6 +374,44 @@ implemented in kernel space (for instance Ethernet, serial or mass storage) and other are implemented in user space. +config USB_CONFIGFS_F_MTP + boolean "MTP gadget" + depends on USB_CONFIGFS + select USB_F_MTP + help + USB gadget MTP support + +config USB_CONFIGFS_F_PTP + boolean "PTP gadget" + depends on USB_CONFIGFS && USB_CONFIGFS_F_MTP + select USB_F_PTP + help + USB gadget PTP support + +config USB_CONFIGFS_F_ACC + boolean "Accessory gadget" + depends on USB_CONFIGFS + select USB_F_ACC + help + USB gadget Accessory support + +config USB_CONFIGFS_F_AUDIO_SRC + boolean "Audio Source gadget" + depends on USB_CONFIGFS && USB_CONFIGFS_F_ACC + depends on SND + select SND_PCM + select USB_F_AUDIO_SRC + help + USB gadget Audio Source support + +config USB_CONFIGFS_UEVENT + boolean "Uevent notification of Gadget state" + depends on USB_CONFIGFS + help + Enable uevent notifications to userspace when the gadget + state changes. The gadget can be in any of the following + three states: "CONNECTED/DISCONNECTED/CONFIGURED" + config USB_CONFIGFS_F_UAC1 bool "Audio Class 1.0" depends on USB_CONFIGFS
diff --git a/drivers/usb/gadget/composite.c b/drivers/usb/gadget/composite.c index 2c022a0..a8b4ca0 100644 --- a/drivers/usb/gadget/composite.c +++ b/drivers/usb/gadget/composite.c
@@ -1996,6 +1996,12 @@ void composite_disconnect(struct usb_gadget *gadget) struct usb_composite_dev *cdev = get_gadget_data(gadget); unsigned long flags; + if (cdev == NULL) { + WARN(1, "%s: Calling disconnect on a Gadget that is \ + not connected\n", __func__); + return; + } + /* REVISIT: should we have config and device level * disconnect callbacks? */
diff --git a/drivers/usb/gadget/configfs.c b/drivers/usb/gadget/configfs.c index a5ca409..b1d22d8 100644 --- a/drivers/usb/gadget/configfs.c +++ b/drivers/usb/gadget/configfs.c
@@ -9,6 +9,31 @@ #include "u_f.h" #include "u_os_desc.h" +#ifdef CONFIG_USB_CONFIGFS_UEVENT +#include <linux/platform_device.h> +#include <linux/kdev_t.h> +#include <linux/usb/ch9.h> + +#ifdef CONFIG_USB_CONFIGFS_F_ACC +extern int acc_ctrlrequest(struct usb_composite_dev *cdev, + const struct usb_ctrlrequest *ctrl); +void acc_disconnect(void); +#endif +static struct class *android_class; +static struct device *android_device; +static int index; + +struct device *create_function_device(char *name) +{ + if (android_device && !IS_ERR(android_device)) + return device_create(android_class, android_device, + MKDEV(0, index++), NULL, name); + else + return ERR_PTR(-EINVAL); +} +EXPORT_SYMBOL_GPL(create_function_device); +#endif + int check_user_usb_string(const char *name, struct usb_gadget_strings *stringtab_dev) { @@ -60,6 +85,12 @@ struct gadget_info { bool use_os_desc; char b_vendor_code; char qw_sign[OS_STRING_QW_SIGN_LEN]; +#ifdef CONFIG_USB_CONFIGFS_UEVENT + bool connected; + bool sw_connected; + struct work_struct work; + struct device *dev; +#endif }; static inline struct gadget_info *to_gadget_info(struct config_item *item) @@ -265,7 +296,7 @@ static ssize_t gadget_dev_desc_UDC_store(struct config_item *item, mutex_lock(&gi->lock); - if (!strlen(name)) { + if (!strlen(name) || strcmp(name, "none") == 0) { ret = unregister_gadget(gi); if (ret) goto err; @@ -1369,6 +1400,60 @@ static int configfs_composite_bind(struct usb_gadget *gadget, return ret; } +#ifdef CONFIG_USB_CONFIGFS_UEVENT +static void android_work(struct work_struct *data) +{ + struct gadget_info *gi = container_of(data, struct gadget_info, work); + struct usb_composite_dev *cdev = &gi->cdev; + char *disconnected[2] = { "USB_STATE=DISCONNECTED", NULL }; + char *connected[2] = { "USB_STATE=CONNECTED", NULL }; + char *configured[2] = { "USB_STATE=CONFIGURED", NULL }; + /* 0-connected 1-configured 2-disconnected*/ + bool status[3] = { false, false, false }; + unsigned long flags; + bool uevent_sent = false; + + spin_lock_irqsave(&cdev->lock, flags); + if (cdev->config) + status[1] = true; + + if (gi->connected != gi->sw_connected) { + if (gi->connected) + status[0] = true; + else + status[2] = true; + gi->sw_connected = gi->connected; + } + spin_unlock_irqrestore(&cdev->lock, flags); + + if (status[0]) { + kobject_uevent_env(&android_device->kobj, + KOBJ_CHANGE, connected); + pr_info("%s: sent uevent %s\n", __func__, connected[0]); + uevent_sent = true; + } + + if (status[1]) { + kobject_uevent_env(&android_device->kobj, + KOBJ_CHANGE, configured); + pr_info("%s: sent uevent %s\n", __func__, configured[0]); + uevent_sent = true; + } + + if (status[2]) { + kobject_uevent_env(&android_device->kobj, + KOBJ_CHANGE, disconnected); + pr_info("%s: sent uevent %s\n", __func__, disconnected[0]); + uevent_sent = true; + } + + if (!uevent_sent) { + pr_info("%s: did not send uevent (%d %d %p)\n", __func__, + gi->connected, gi->sw_connected, cdev->config); + } +} +#endif + static void configfs_composite_unbind(struct usb_gadget *gadget) { struct usb_composite_dev *cdev; @@ -1388,14 +1473,91 @@ static void configfs_composite_unbind(struct usb_gadget *gadget) set_gadget_data(gadget, NULL); } +#ifdef CONFIG_USB_CONFIGFS_UEVENT +static int android_setup(struct usb_gadget *gadget, + const struct usb_ctrlrequest *c) +{ + struct usb_composite_dev *cdev = get_gadget_data(gadget); + unsigned long flags; + struct gadget_info *gi = container_of(cdev, struct gadget_info, cdev); + int value = -EOPNOTSUPP; + struct usb_function_instance *fi; + + spin_lock_irqsave(&cdev->lock, flags); + if (!gi->connected) { + gi->connected = 1; + schedule_work(&gi->work); + } + spin_unlock_irqrestore(&cdev->lock, flags); + list_for_each_entry(fi, &gi->available_func, cfs_list) { + if (fi != NULL && fi->f != NULL && fi->f->setup != NULL) { + value = fi->f->setup(fi->f, c); + if (value >= 0) + break; + } + } + +#ifdef CONFIG_USB_CONFIGFS_F_ACC + if (value < 0) + value = acc_ctrlrequest(cdev, c); +#endif + + if (value < 0) + value = composite_setup(gadget, c); + + spin_lock_irqsave(&cdev->lock, flags); + if (c->bRequest == USB_REQ_SET_CONFIGURATION && + cdev->config) { + schedule_work(&gi->work); + } + spin_unlock_irqrestore(&cdev->lock, flags); + + return value; +} + +static void android_disconnect(struct usb_gadget *gadget) +{ + struct usb_composite_dev *cdev = get_gadget_data(gadget); + struct gadget_info *gi = container_of(cdev, struct gadget_info, cdev); + + /* FIXME: There's a race between usb_gadget_udc_stop() which is likely + * to set the gadget driver to NULL in the udc driver and this drivers + * gadget disconnect fn which likely checks for the gadget driver to + * be a null ptr. It happens that unbind (doing set_gadget_data(NULL)) + * is called before the gadget driver is set to NULL and the udc driver + * calls disconnect fn which results in cdev being a null ptr. + */ + if (cdev == NULL) { + WARN(1, "%s: gadget driver already disconnected\n", __func__); + return; + } + + /* accessory HID support can be active while the + accessory function is not actually enabled, + so we need to inform it when we are disconnected. + */ + +#ifdef CONFIG_USB_CONFIGFS_F_ACC + acc_disconnect(); +#endif + gi->connected = 0; + schedule_work(&gi->work); + composite_disconnect(gadget); +} +#endif + static const struct usb_gadget_driver configfs_driver_template = { .bind = configfs_composite_bind, .unbind = configfs_composite_unbind, - +#ifdef CONFIG_USB_CONFIGFS_UEVENT + .setup = android_setup, + .reset = android_disconnect, + .disconnect = android_disconnect, +#else .setup = composite_setup, .reset = composite_disconnect, .disconnect = composite_disconnect, - +#endif .suspend = composite_suspend, .resume = composite_resume, @@ -1407,6 +1569,89 @@ static const struct usb_gadget_driver configfs_driver_template = { .match_existing_only = 1, }; +#ifdef CONFIG_USB_CONFIGFS_UEVENT +static ssize_t state_show(struct device *pdev, struct device_attribute *attr, + char *buf) +{ + struct gadget_info *dev = dev_get_drvdata(pdev); + struct usb_composite_dev *cdev; + char *state = "DISCONNECTED"; + unsigned long flags; + + if (!dev) + goto out; + + cdev = &dev->cdev; + + if (!cdev) + goto out; + + spin_lock_irqsave(&cdev->lock, flags); + if (cdev->config) + state = "CONFIGURED"; + else if (dev->connected) + state = "CONNECTED"; + spin_unlock_irqrestore(&cdev->lock, flags); +out: + return sprintf(buf, "%s\n", state); +} + +static DEVICE_ATTR(state, S_IRUGO, state_show, NULL); + +static struct device_attribute *android_usb_attributes[] = { + &dev_attr_state, + NULL +}; + +static int android_device_create(struct gadget_info *gi) +{ + struct device_attribute **attrs; + struct device_attribute *attr; + + INIT_WORK(&gi->work, android_work); + android_device = device_create(android_class, NULL, + MKDEV(0, 0), NULL, "android0"); + if (IS_ERR(android_device)) + return PTR_ERR(android_device); + + dev_set_drvdata(android_device, gi); + + attrs = android_usb_attributes; + while ((attr = *attrs++)) { + int err; + + err = device_create_file(android_device, attr); + if (err) { + device_destroy(android_device->class, + android_device->devt); + return err; + } + } + + return 0; +} + +static void android_device_destroy(void) +{ + struct device_attribute **attrs; + struct device_attribute *attr; + + attrs = android_usb_attributes; + while ((attr = *attrs++)) + device_remove_file(android_device, attr); + device_destroy(android_device->class, android_device->devt); +} +#else +static inline int android_device_create(struct gadget_info *gi) +{ + return 0; +} + +static inline void android_device_destroy(void) +{ +} +#endif + static struct config_group *gadgets_make( struct config_group *group, const char *name) @@ -1458,7 +1703,11 @@ static struct config_group *gadgets_make( if (!gi->composite.gadget_driver.function) goto err; + if (android_device_create(gi) < 0) + goto err; + return &gi->group; + err: kfree(gi); return ERR_PTR(-ENOMEM); @@ -1467,6 +1716,7 @@ static struct config_group *gadgets_make( static void gadgets_drop(struct config_group *group, struct config_item *item) { config_item_put(item); + android_device_destroy(); } static struct configfs_group_operations gadgets_ops = { @@ -1506,6 +1756,13 @@ static int __init gadget_cfs_init(void) config_group_init(&gadget_subsys.su_group); ret = configfs_register_subsystem(&gadget_subsys); + +#ifdef CONFIG_USB_CONFIGFS_UEVENT + android_class = class_create(THIS_MODULE, "android_usb"); + if (IS_ERR(android_class)) + return PTR_ERR(android_class); +#endif + return ret; } module_init(gadget_cfs_init); @@ -1513,5 +1770,10 @@ module_init(gadget_cfs_init); static void __exit gadget_cfs_exit(void) { configfs_unregister_subsystem(&gadget_subsys); +#ifdef CONFIG_USB_CONFIGFS_UEVENT + if (!IS_ERR(android_class)) + class_destroy(android_class); +#endif + } module_exit(gadget_cfs_exit);
diff --git a/drivers/usb/gadget/function/Makefile b/drivers/usb/gadget/function/Makefile index cb8c225..78682d5 100644 --- a/drivers/usb/gadget/function/Makefile +++ b/drivers/usb/gadget/function/Makefile
@@ -46,3 +46,11 @@ obj-$(CONFIG_USB_F_PRINTER) += usb_f_printer.o usb_f_tcm-y := f_tcm.o obj-$(CONFIG_USB_F_TCM) += usb_f_tcm.o +usb_f_mtp-y := f_mtp.o +obj-$(CONFIG_USB_F_MTP) += usb_f_mtp.o +usb_f_ptp-y := f_ptp.o +obj-$(CONFIG_USB_F_PTP) += usb_f_ptp.o +usb_f_audio_source-y := f_audio_source.o +obj-$(CONFIG_USB_F_AUDIO_SRC) += usb_f_audio_source.o +usb_f_accessory-y := f_accessory.o +obj-$(CONFIG_USB_F_ACC) += usb_f_accessory.o
diff --git a/drivers/usb/gadget/function/f_accessory.c b/drivers/usb/gadget/function/f_accessory.c new file mode 100644 index 0000000..7aa2656 --- /dev/null +++ b/drivers/usb/gadget/function/f_accessory.c
@@ -0,0 +1,1352 @@ +/* + * Gadget Function Driver for Android USB accessories + * + * Copyright (C) 2011 Google, Inc. + * Author: Mike Lockwood <lockwood@android.com> + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +/* #define DEBUG */ +/* #define VERBOSE_DEBUG */ + +#include <linux/module.h> +#include <linux/init.h> +#include <linux/poll.h> +#include <linux/delay.h> +#include <linux/wait.h> +#include <linux/err.h> +#include <linux/interrupt.h> +#include <linux/kthread.h> +#include <linux/freezer.h> + +#include <linux/types.h> +#include <linux/file.h> +#include <linux/device.h> +#include <linux/miscdevice.h> + +#include <linux/hid.h> +#include <linux/hiddev.h> +#include <linux/usb.h> +#include <linux/usb/ch9.h> +#include <linux/usb/f_accessory.h> + +#include <linux/configfs.h> +#include <linux/usb/composite.h> + +#define MAX_INST_NAME_LEN 40 +#define BULK_BUFFER_SIZE 16384 +#define ACC_STRING_SIZE 256 + +#define PROTOCOL_VERSION 2 + +/* String IDs */ +#define INTERFACE_STRING_INDEX 0 + +/* number of tx and rx requests to allocate */ +#define TX_REQ_MAX 4 +#define RX_REQ_MAX 2 + +struct acc_hid_dev { + struct list_head list; + struct hid_device *hid; + struct acc_dev *dev; + /* accessory defined ID */ + int id; + /* HID report descriptor */ + u8 *report_desc; + /* length of HID report descriptor */ + int report_desc_len; + /* number of bytes of report_desc we have received so far */ + int report_desc_offset; +}; + +struct acc_dev { + struct usb_function function; + struct usb_composite_dev *cdev; + spinlock_t lock; + + struct usb_ep *ep_in; + struct usb_ep *ep_out; + + /* online indicates state of function_set_alt & function_unbind + * set to 1 when we connect + */ + int online:1; + + /* disconnected indicates state of open & release + * Set to 1 when we disconnect. + * Not cleared until our file is closed. + */ + int disconnected:1; + + /* strings sent by the host */ + char manufacturer[ACC_STRING_SIZE]; + char model[ACC_STRING_SIZE]; + char description[ACC_STRING_SIZE]; + char version[ACC_STRING_SIZE]; + char uri[ACC_STRING_SIZE]; + char serial[ACC_STRING_SIZE]; + + /* for acc_complete_set_string */ + int string_index; + + /* set to 1 if we have a pending start request */ + int start_requested; + + int audio_mode; + + /* synchronize access to our device file */ + atomic_t open_excl; + + struct list_head tx_idle; + + wait_queue_head_t read_wq; + wait_queue_head_t write_wq; + struct usb_request *rx_req[RX_REQ_MAX]; + int rx_done; + + /* delayed work for handling ACCESSORY_START */ + struct delayed_work start_work; + + /* worker for registering and unregistering hid devices */ + struct work_struct hid_work; + + /* list of active HID devices */ + struct list_head hid_list; + + /* list of new HID devices to register */ + struct list_head new_hid_list; + + /* list of dead HID devices to unregister */ + struct list_head dead_hid_list; +}; + +static struct usb_interface_descriptor acc_interface_desc = { + .bLength = USB_DT_INTERFACE_SIZE, + .bDescriptorType = USB_DT_INTERFACE, + .bInterfaceNumber = 0, + .bNumEndpoints = 2, + .bInterfaceClass = USB_CLASS_VENDOR_SPEC, + .bInterfaceSubClass = USB_SUBCLASS_VENDOR_SPEC, + .bInterfaceProtocol = 0, +}; + +static struct usb_endpoint_descriptor acc_highspeed_in_desc = { + .bLength = USB_DT_ENDPOINT_SIZE, + .bDescriptorType = USB_DT_ENDPOINT, + .bEndpointAddress = USB_DIR_IN, + .bmAttributes = USB_ENDPOINT_XFER_BULK, + .wMaxPacketSize = __constant_cpu_to_le16(512), +}; + +static struct usb_endpoint_descriptor acc_highspeed_out_desc = { + .bLength = USB_DT_ENDPOINT_SIZE, + .bDescriptorType = USB_DT_ENDPOINT, + .bEndpointAddress = USB_DIR_OUT, + .bmAttributes = USB_ENDPOINT_XFER_BULK, + .wMaxPacketSize = __constant_cpu_to_le16(512), +}; + +static struct usb_endpoint_descriptor acc_fullspeed_in_desc = { + .bLength = USB_DT_ENDPOINT_SIZE, + .bDescriptorType = USB_DT_ENDPOINT, + .bEndpointAddress = USB_DIR_IN, + .bmAttributes = USB_ENDPOINT_XFER_BULK, +}; + +static struct usb_endpoint_descriptor acc_fullspeed_out_desc = { + .bLength = USB_DT_ENDPOINT_SIZE, + .bDescriptorType = USB_DT_ENDPOINT, + .bEndpointAddress = USB_DIR_OUT, + .bmAttributes = USB_ENDPOINT_XFER_BULK, +}; + +static struct usb_descriptor_header *fs_acc_descs[] = { + (struct usb_descriptor_header *) &acc_interface_desc, + (struct usb_descriptor_header *) &acc_fullspeed_in_desc, + (struct usb_descriptor_header *) &acc_fullspeed_out_desc, + NULL, +}; + +static struct usb_descriptor_header *hs_acc_descs[] = { + (struct usb_descriptor_header *) &acc_interface_desc, + (struct usb_descriptor_header *) &acc_highspeed_in_desc, + (struct usb_descriptor_header *) &acc_highspeed_out_desc, + NULL, +}; + +static struct usb_string acc_string_defs[] = { + [INTERFACE_STRING_INDEX].s = "Android Accessory Interface", + { }, /* end of list */ +}; + +static struct usb_gadget_strings acc_string_table = { + .language = 0x0409, /* en-US */ + .strings = acc_string_defs, +}; + +static struct usb_gadget_strings *acc_strings[] = { + &acc_string_table, + NULL, +}; + +/* temporary variable used between acc_open() and acc_gadget_bind() */ +static struct acc_dev *_acc_dev; + +struct acc_instance { + struct usb_function_instance func_inst; + const char *name; +}; + +static inline struct acc_dev *func_to_dev(struct usb_function *f) +{ + return container_of(f, struct acc_dev, function); +} + +static struct usb_request *acc_request_new(struct usb_ep *ep, int buffer_size) +{ + struct usb_request *req = usb_ep_alloc_request(ep, GFP_KERNEL); + + if (!req) + return NULL; + + /* now allocate buffers for the requests */ + req->buf = kmalloc(buffer_size, GFP_KERNEL); + if (!req->buf) { + usb_ep_free_request(ep, req); + return NULL; + } + + return req; +} + +static void acc_request_free(struct usb_request *req, struct usb_ep *ep) +{ + if (req) { + kfree(req->buf); + usb_ep_free_request(ep, req); + } +} + +/* add a request to the tail of a list */ +static void req_put(struct acc_dev *dev, struct list_head *head, + struct usb_request *req) +{ + unsigned long flags; + + spin_lock_irqsave(&dev->lock, flags); + list_add_tail(&req->list, head); + spin_unlock_irqrestore(&dev->lock, flags); +} + +/* remove a request from the head of a list */ +static struct usb_request *req_get(struct acc_dev *dev, struct list_head *head) +{ + unsigned long flags; + struct usb_request *req; + + spin_lock_irqsave(&dev->lock, flags); + if (list_empty(head)) { + req = 0; + } else { + req = list_first_entry(head, struct usb_request, list); + list_del(&req->list); + } + spin_unlock_irqrestore(&dev->lock, flags); + return req; +} + +static void acc_set_disconnected(struct acc_dev *dev) +{ + dev->disconnected = 1; +} + +static void acc_complete_in(struct usb_ep *ep, struct usb_request *req) +{ + struct acc_dev *dev = _acc_dev; + + if (req->status == -ESHUTDOWN) { + pr_debug("acc_complete_in set disconnected"); + acc_set_disconnected(dev); + } + + req_put(dev, &dev->tx_idle, req); + + wake_up(&dev->write_wq); +} + +static void acc_complete_out(struct usb_ep *ep, struct usb_request *req) +{ + struct acc_dev *dev = _acc_dev; + + dev->rx_done = 1; + if (req->status == -ESHUTDOWN) { + pr_debug("acc_complete_out set disconnected"); + acc_set_disconnected(dev); + } + + wake_up(&dev->read_wq); +} + +static void acc_complete_set_string(struct usb_ep *ep, struct usb_request *req) +{ + struct acc_dev *dev = ep->driver_data; + char *string_dest = NULL; + int length = req->actual; + + if (req->status != 0) { + pr_err("acc_complete_set_string, err %d\n", req->status); + return; + } + + switch (dev->string_index) { + case ACCESSORY_STRING_MANUFACTURER: + string_dest = dev->manufacturer; + break; + case ACCESSORY_STRING_MODEL: + string_dest = dev->model; + break; + case ACCESSORY_STRING_DESCRIPTION: + string_dest = dev->description; + break; + case ACCESSORY_STRING_VERSION: + string_dest = dev->version; + break; + case ACCESSORY_STRING_URI: + string_dest = dev->uri; + break; + case ACCESSORY_STRING_SERIAL: + string_dest = dev->serial; + break; + } + if (string_dest) { + unsigned long flags; + + if (length >= ACC_STRING_SIZE) + length = ACC_STRING_SIZE - 1; + + spin_lock_irqsave(&dev->lock, flags); + memcpy(string_dest, req->buf, length); + /* ensure zero termination */ + string_dest[length] = 0; + spin_unlock_irqrestore(&dev->lock, flags); + } else { + pr_err("unknown accessory string index %d\n", + dev->string_index); + } +} + +static void acc_complete_set_hid_report_desc(struct usb_ep *ep, + struct usb_request *req) +{ + struct acc_hid_dev *hid = req->context; + struct acc_dev *dev = hid->dev; + int length = req->actual; + + if (req->status != 0) { + pr_err("acc_complete_set_hid_report_desc, err %d\n", + req->status); + return; + } + + memcpy(hid->report_desc + hid->report_desc_offset, req->buf, length); + hid->report_desc_offset += length; + if (hid->report_desc_offset == hid->report_desc_len) { + /* After we have received the entire report descriptor + * we schedule work to initialize the HID device + */ + schedule_work(&dev->hid_work); + } +} + +static void acc_complete_send_hid_event(struct usb_ep *ep, + struct usb_request *req) +{ + struct acc_hid_dev *hid = req->context; + int length = req->actual; + + if (req->status != 0) { + pr_err("acc_complete_send_hid_event, err %d\n", req->status); + return; + } + + hid_report_raw_event(hid->hid, HID_INPUT_REPORT, req->buf, length, 1); +} + +static int acc_hid_parse(struct hid_device *hid) +{ + struct acc_hid_dev *hdev = hid->driver_data; + + hid_parse_report(hid, hdev->report_desc, hdev->report_desc_len); + return 0; +} + +static int acc_hid_start(struct hid_device *hid) +{ + return 0; +} + +static void acc_hid_stop(struct hid_device *hid) +{ +} + +static int acc_hid_open(struct hid_device *hid) +{ + return 0; +} + +static void acc_hid_close(struct hid_device *hid) +{ +} + +static int acc_hid_raw_request(struct hid_device *hid, unsigned char reportnum, + __u8 *buf, size_t len, unsigned char rtype, int reqtype) +{ + return 0; +} + +static struct hid_ll_driver acc_hid_ll_driver = { + .parse = acc_hid_parse, + .start = acc_hid_start, + .stop = acc_hid_stop, + .open = acc_hid_open, + .close = acc_hid_close, + .raw_request = acc_hid_raw_request, +}; + +static struct acc_hid_dev *acc_hid_new(struct acc_dev *dev, + int id, int desc_len) +{ + struct acc_hid_dev *hdev; + + hdev = kzalloc(sizeof(*hdev), GFP_ATOMIC); + if (!hdev) + return NULL; + hdev->report_desc = kzalloc(desc_len, GFP_ATOMIC); + if (!hdev->report_desc) { + kfree(hdev); + return NULL; + } + hdev->dev = dev; + hdev->id = id; + hdev->report_desc_len = desc_len; + + return hdev; +} + +static struct acc_hid_dev *acc_hid_get(struct list_head *list, int id) +{ + struct acc_hid_dev *hid; + + list_for_each_entry(hid, list, list) { + if (hid->id == id) + return hid; + } + return NULL; +} + +static int acc_register_hid(struct acc_dev *dev, int id, int desc_length) +{ + struct acc_hid_dev *hid; + unsigned long flags; + + /* report descriptor length must be > 0 */ + if (desc_length <= 0) + return -EINVAL; + + spin_lock_irqsave(&dev->lock, flags); + /* replace HID if one already exists with this ID */ + hid = acc_hid_get(&dev->hid_list, id); + if (!hid) + hid = acc_hid_get(&dev->new_hid_list, id); + if (hid) + list_move(&hid->list, &dev->dead_hid_list); + + hid = acc_hid_new(dev, id, desc_length); + if (!hid) { + spin_unlock_irqrestore(&dev->lock, flags); + return -ENOMEM; + } + + list_add(&hid->list, &dev->new_hid_list); + spin_unlock_irqrestore(&dev->lock, flags); + + /* schedule work to register the HID device */ + schedule_work(&dev->hid_work); + return 0; +} + +static int acc_unregister_hid(struct acc_dev *dev, int id) +{ + struct acc_hid_dev *hid; + unsigned long flags; + + spin_lock_irqsave(&dev->lock, flags); + hid = acc_hid_get(&dev->hid_list, id); + if (!hid) + hid = acc_hid_get(&dev->new_hid_list, id); + if (!hid) { + spin_unlock_irqrestore(&dev->lock, flags); + return -EINVAL; + } + + list_move(&hid->list, &dev->dead_hid_list); + spin_unlock_irqrestore(&dev->lock, flags); + + schedule_work(&dev->hid_work); + return 0; +} + +static int create_bulk_endpoints(struct acc_dev *dev, + struct usb_endpoint_descriptor *in_desc, + struct usb_endpoint_descriptor *out_desc) +{ + struct usb_composite_dev *cdev = dev->cdev; + struct usb_request *req; + struct usb_ep *ep; + int i; + + DBG(cdev, "create_bulk_endpoints dev: %p\n", dev); + + ep = usb_ep_autoconfig(cdev->gadget, in_desc); + if (!ep) { + DBG(cdev, "usb_ep_autoconfig for ep_in failed\n"); + return -ENODEV; + } + DBG(cdev, "usb_ep_autoconfig for ep_in got %s\n", ep->name); + ep->driver_data = dev; /* claim the endpoint */ + dev->ep_in = ep; + + ep = usb_ep_autoconfig(cdev->gadget, out_desc); + if (!ep) { + DBG(cdev, "usb_ep_autoconfig for ep_out failed\n"); + return -ENODEV; + } + DBG(cdev, "usb_ep_autoconfig for ep_out got %s\n", ep->name); + ep->driver_data = dev; /* claim the endpoint */ + dev->ep_out = ep; + + /* now allocate requests for our endpoints */ + for (i = 0; i < TX_REQ_MAX; i++) { + req = acc_request_new(dev->ep_in, BULK_BUFFER_SIZE); + if (!req) + goto fail; + req->complete = acc_complete_in; + req_put(dev, &dev->tx_idle, req); + } + for (i = 0; i < RX_REQ_MAX; i++) { + req = acc_request_new(dev->ep_out, BULK_BUFFER_SIZE); + if (!req) + goto fail; + req->complete = acc_complete_out; + dev->rx_req[i] = req; + } + + return 0; + +fail: + pr_err("acc_bind() could not allocate requests\n"); + while ((req = req_get(dev, &dev->tx_idle))) + acc_request_free(req, dev->ep_in); + for (i = 0; i < RX_REQ_MAX; i++) + acc_request_free(dev->rx_req[i], dev->ep_out); + return -1; +} + +static ssize_t acc_read(struct file *fp, char __user *buf, + size_t count, loff_t *pos) +{ + struct acc_dev *dev = fp->private_data; + struct usb_request *req; + ssize_t r = count; + unsigned xfer; + int ret = 0; + + pr_debug("acc_read(%zu)\n", count); + + if (dev->disconnected) { + pr_debug("acc_read disconnected"); + return -ENODEV; + } + + if (count > BULK_BUFFER_SIZE) + count = BULK_BUFFER_SIZE; + + /* we will block until we're online */ + pr_debug("acc_read: waiting for online\n"); + ret = wait_event_interruptible(dev->read_wq, dev->online); + if (ret < 0) { + r = ret; + goto done; + } + + if (dev->rx_done) { + // last req cancelled. try to get it. + req = dev->rx_req[0]; + goto copy_data; + } + +requeue_req: + /* queue a request */ + req = dev->rx_req[0]; + req->length = count; + dev->rx_done = 0; + ret = usb_ep_queue(dev->ep_out, req, GFP_KERNEL); + if (ret < 0) { + r = -EIO; + goto done; + } else { + pr_debug("rx %p queue\n", req); + } + + /* wait for a request to complete */ + ret = wait_event_interruptible(dev->read_wq, dev->rx_done); + if (ret < 0) { + r = ret; + ret = usb_ep_dequeue(dev->ep_out, req); + if (ret != 0) { + // cancel failed. There can be a data already received. + // it will be retrieved in the next read. + pr_debug("acc_read: cancelling failed %d", ret); + } + goto done; + } + +copy_data: + dev->rx_done = 0; + if (dev->online) { + /* If we got a 0-len packet, throw it back and try again. */ + if (req->actual == 0) + goto requeue_req; + + pr_debug("rx %p %u\n", req, req->actual); + xfer = (req->actual < count) ? req->actual : count; + r = xfer; + if (copy_to_user(buf, req->buf, xfer)) + r = -EFAULT; + } else + r = -EIO; + +done: + pr_debug("acc_read returning %zd\n", r); + return r; +} + +static ssize_t acc_write(struct file *fp, const char __user *buf, + size_t count, loff_t *pos) +{ + struct acc_dev *dev = fp->private_data; + struct usb_request *req = 0; + ssize_t r = count; + unsigned xfer; + int ret; + + pr_debug("acc_write(%zu)\n", count); + + if (!dev->online || dev->disconnected) { + pr_debug("acc_write disconnected or not online"); + return -ENODEV; + } + + while (count > 0) { + if (!dev->online) { + pr_debug("acc_write dev->error\n"); + r = -EIO; + break; + } + + /* get an idle tx request to use */ + req = 0; + ret = wait_event_interruptible(dev->write_wq, + ((req = req_get(dev, &dev->tx_idle)) || !dev->online)); + if (!req) { + r = ret; + break; + } + + if (count > BULK_BUFFER_SIZE) { + xfer = BULK_BUFFER_SIZE; + /* ZLP, They will be more TX requests so not yet. */ + req->zero = 0; + } else { + xfer = count; + /* If the data length is a multple of the + * maxpacket size then send a zero length packet(ZLP). + */ + req->zero = ((xfer % dev->ep_in->maxpacket) == 0); + } + if (copy_from_user(req->buf, buf, xfer)) { + r = -EFAULT; + break; + } + + req->length = xfer; + ret = usb_ep_queue(dev->ep_in, req, GFP_KERNEL); + if (ret < 0) { + pr_debug("acc_write: xfer error %d\n", ret); + r = -EIO; + break; + } + + buf += xfer; + count -= xfer; + + /* zero this so we don't try to free it on error exit */ + req = 0; + } + + if (req) + req_put(dev, &dev->tx_idle, req); + + pr_debug("acc_write returning %zd\n", r); + return r; +} + +static long acc_ioctl(struct file *fp, unsigned code, unsigned long value) +{ + struct acc_dev *dev = fp->private_data; + char *src = NULL; + int ret; + + switch (code) { + case ACCESSORY_GET_STRING_MANUFACTURER: + src = dev->manufacturer; + break; + case ACCESSORY_GET_STRING_MODEL: + src = dev->model; + break; + case ACCESSORY_GET_STRING_DESCRIPTION: + src = dev->description; + break; + case ACCESSORY_GET_STRING_VERSION: + src = dev->version; + break; + case ACCESSORY_GET_STRING_URI: + src = dev->uri; + break; + case ACCESSORY_GET_STRING_SERIAL: + src = dev->serial; + break; + case ACCESSORY_IS_START_REQUESTED: + return dev->start_requested; + case ACCESSORY_GET_AUDIO_MODE: + return dev->audio_mode; + } + if (!src) + return -EINVAL; + + ret = strlen(src) + 1; + if (copy_to_user((void __user *)value, src, ret)) + ret = -EFAULT; + return ret; +} + +static int acc_open(struct inode *ip, struct file *fp) +{ + printk(KERN_INFO "acc_open\n"); + if (atomic_xchg(&_acc_dev->open_excl, 1)) + return -EBUSY; + + _acc_dev->disconnected = 0; + fp->private_data = _acc_dev; + return 0; +} + +static int acc_release(struct inode *ip, struct file *fp) +{ + printk(KERN_INFO "acc_release\n"); + + WARN_ON(!atomic_xchg(&_acc_dev->open_excl, 0)); + /* indicate that we are disconnected + * still could be online so don't touch online flag + */ + _acc_dev->disconnected = 1; + return 0; +} + +/* file operations for /dev/usb_accessory */ +static const struct file_operations acc_fops = { + .owner = THIS_MODULE, + .read = acc_read, + .write = acc_write, + .unlocked_ioctl = acc_ioctl, + .open = acc_open, + .release = acc_release, +}; + +static int acc_hid_probe(struct hid_device *hdev, + const struct hid_device_id *id) +{ + int ret; + + ret = hid_parse(hdev); + if (ret) + return ret; + return hid_hw_start(hdev, HID_CONNECT_DEFAULT); +} + +static struct miscdevice acc_device = { + .minor = MISC_DYNAMIC_MINOR, + .name = "usb_accessory", + .fops = &acc_fops, +}; + +static const struct hid_device_id acc_hid_table[] = { + { HID_USB_DEVICE(HID_ANY_ID, HID_ANY_ID) }, + { } +}; + +static struct hid_driver acc_hid_driver = { + .name = "USB accessory", + .id_table = acc_hid_table, + .probe = acc_hid_probe, +}; + +static void acc_complete_setup_noop(struct usb_ep *ep, struct usb_request *req) +{ + /* + * Default no-op function when nothing needs to be done for the + * setup request + */ +} + +int acc_ctrlrequest(struct usb_composite_dev *cdev, + const struct usb_ctrlrequest *ctrl) +{ + struct acc_dev *dev = _acc_dev; + int value = -EOPNOTSUPP; + struct acc_hid_dev *hid; + int offset; + u8 b_requestType = ctrl->bRequestType; + u8 b_request = ctrl->bRequest; + u16 w_index = le16_to_cpu(ctrl->wIndex); + u16 w_value = le16_to_cpu(ctrl->wValue); + u16 w_length = le16_to_cpu(ctrl->wLength); + unsigned long flags; + +/* + printk(KERN_INFO "acc_ctrlrequest " + "%02x.%02x v%04x i%04x l%u\n", + b_requestType, b_request, + w_value, w_index, w_length); +*/ + + if (b_requestType == (USB_DIR_OUT | USB_TYPE_VENDOR)) { + if (b_request == ACCESSORY_START) { + dev->start_requested = 1; + schedule_delayed_work( + &dev->start_work, msecs_to_jiffies(10)); + value = 0; + cdev->req->complete = acc_complete_setup_noop; + } else if (b_request == ACCESSORY_SEND_STRING) { + dev->string_index = w_index; + cdev->gadget->ep0->driver_data = dev; + cdev->req->complete = acc_complete_set_string; + value = w_length; + } else if (b_request == ACCESSORY_SET_AUDIO_MODE && + w_index == 0 && w_length == 0) { + dev->audio_mode = w_value; + cdev->req->complete = acc_complete_setup_noop; + value = 0; + } else if (b_request == ACCESSORY_REGISTER_HID) { + cdev->req->complete = acc_complete_setup_noop; + value = acc_register_hid(dev, w_value, w_index); + } else if (b_request == ACCESSORY_UNREGISTER_HID) { + cdev->req->complete = acc_complete_setup_noop; + value = acc_unregister_hid(dev, w_value); + } else if (b_request == ACCESSORY_SET_HID_REPORT_DESC) { + spin_lock_irqsave(&dev->lock, flags); + hid = acc_hid_get(&dev->new_hid_list, w_value); + spin_unlock_irqrestore(&dev->lock, flags); + if (!hid) { + value = -EINVAL; + goto err; + } + offset = w_index; + if (offset != hid->report_desc_offset + || offset + w_length > hid->report_desc_len) { + value = -EINVAL; + goto err; + } + cdev->req->context = hid; + cdev->req->complete = acc_complete_set_hid_report_desc; + value = w_length; + } else if (b_request == ACCESSORY_SEND_HID_EVENT) { + spin_lock_irqsave(&dev->lock, flags); + hid = acc_hid_get(&dev->hid_list, w_value); + spin_unlock_irqrestore(&dev->lock, flags); + if (!hid) { + value = -EINVAL; + goto err; + } + cdev->req->context = hid; + cdev->req->complete = acc_complete_send_hid_event; + value = w_length; + } + } else if (b_requestType == (USB_DIR_IN | USB_TYPE_VENDOR)) { + if (b_request == ACCESSORY_GET_PROTOCOL) { + *((u16 *)cdev->req->buf) = PROTOCOL_VERSION; + value = sizeof(u16); + cdev->req->complete = acc_complete_setup_noop; + /* clear any string left over from a previous session */ + memset(dev->manufacturer, 0, sizeof(dev->manufacturer)); + memset(dev->model, 0, sizeof(dev->model)); + memset(dev->description, 0, sizeof(dev->description)); + memset(dev->version, 0, sizeof(dev->version)); + memset(dev->uri, 0, sizeof(dev->uri)); + memset(dev->serial, 0, sizeof(dev->serial)); + dev->start_requested = 0; + dev->audio_mode = 0; + } + } + + if (value >= 0) { + cdev->req->zero = 0; + cdev->req->length = value; + value = usb_ep_queue(cdev->gadget->ep0, cdev->req, GFP_ATOMIC); + if (value < 0) + ERROR(cdev, "%s setup response queue error\n", + __func__); + } + +err: + if (value == -EOPNOTSUPP) + VDBG(cdev, + "unknown class-specific control req " + "%02x.%02x v%04x i%04x l%u\n", + ctrl->bRequestType, ctrl->bRequest, + w_value, w_index, w_length); + return value; +} +EXPORT_SYMBOL_GPL(acc_ctrlrequest); + +static int +__acc_function_bind(struct usb_configuration *c, + struct usb_function *f, bool configfs) +{ + struct usb_composite_dev *cdev = c->cdev; + struct acc_dev *dev = func_to_dev(f); + int id; + int ret; + + DBG(cdev, "acc_function_bind dev: %p\n", dev); + + if (configfs) { + if (acc_string_defs[INTERFACE_STRING_INDEX].id == 0) { + ret = usb_string_id(c->cdev); + if (ret < 0) + return ret; + acc_string_defs[INTERFACE_STRING_INDEX].id = ret; + acc_interface_desc.iInterface = ret; + } + dev->cdev = c->cdev; + } + ret = hid_register_driver(&acc_hid_driver); + if (ret) + return ret; + + dev->start_requested = 0; + + /* allocate interface ID(s) */ + id = usb_interface_id(c, f); + if (id < 0) + return id; + acc_interface_desc.bInterfaceNumber = id; + + /* allocate endpoints */ + ret = create_bulk_endpoints(dev, &acc_fullspeed_in_desc, + &acc_fullspeed_out_desc); + if (ret) + return ret; + + /* support high speed hardware */ + if (gadget_is_dualspeed(c->cdev->gadget)) { + acc_highspeed_in_desc.bEndpointAddress = + acc_fullspeed_in_desc.bEndpointAddress; + acc_highspeed_out_desc.bEndpointAddress = + acc_fullspeed_out_desc.bEndpointAddress; + } + + DBG(cdev, "%s speed %s: IN/%s, OUT/%s\n", + gadget_is_dualspeed(c->cdev->gadget) ? "dual" : "full", + f->name, dev->ep_in->name, dev->ep_out->name); + return 0; +} + +static int +acc_function_bind_configfs(struct usb_configuration *c, + struct usb_function *f) { + return __acc_function_bind(c, f, true); +} + +static void +kill_all_hid_devices(struct acc_dev *dev) +{ + struct acc_hid_dev *hid; + struct list_head *entry, *temp; + unsigned long flags; + + /* do nothing if usb accessory device doesn't exist */ + if (!dev) + return; + + spin_lock_irqsave(&dev->lock, flags); + list_for_each_safe(entry, temp, &dev->hid_list) { + hid = list_entry(entry, struct acc_hid_dev, list); + list_del(&hid->list); + list_add(&hid->list, &dev->dead_hid_list); + } + list_for_each_safe(entry, temp, &dev->new_hid_list) { + hid = list_entry(entry, struct acc_hid_dev, list); + list_del(&hid->list); + list_add(&hid->list, &dev->dead_hid_list); + } + spin_unlock_irqrestore(&dev->lock, flags); + + schedule_work(&dev->hid_work); +} + +static void +acc_hid_unbind(struct acc_dev *dev) +{ + hid_unregister_driver(&acc_hid_driver); + kill_all_hid_devices(dev); +} + +static void +acc_function_unbind(struct usb_configuration *c, struct usb_function *f) +{ + struct acc_dev *dev = func_to_dev(f); + struct usb_request *req; + int i; + + dev->online = 0; /* clear online flag */ + wake_up(&dev->read_wq); /* unblock reads on closure */ + wake_up(&dev->write_wq); /* likewise for writes */ + + while ((req = req_get(dev, &dev->tx_idle))) + acc_request_free(req, dev->ep_in); + for (i = 0; i < RX_REQ_MAX; i++) + acc_request_free(dev->rx_req[i], dev->ep_out); + + acc_hid_unbind(dev); +} + +static void acc_start_work(struct work_struct *data) +{ + char *envp[2] = { "ACCESSORY=START", NULL }; + + kobject_uevent_env(&acc_device.this_device->kobj, KOBJ_CHANGE, envp); +} + +static int acc_hid_init(struct acc_hid_dev *hdev) +{ + struct hid_device *hid; + int ret; + + hid = hid_allocate_device(); + if (IS_ERR(hid)) + return PTR_ERR(hid); + + hid->ll_driver = &acc_hid_ll_driver; + hid->dev.parent = acc_device.this_device; + + hid->bus = BUS_USB; + hid->vendor = HID_ANY_ID; + hid->product = HID_ANY_ID; + hid->driver_data = hdev; + ret = hid_add_device(hid); + if (ret) { + pr_err("can't add hid device: %d\n", ret); + hid_destroy_device(hid); + return ret; + } + + hdev->hid = hid; + return 0; +} + +static void acc_hid_delete(struct acc_hid_dev *hid) +{ + kfree(hid->report_desc); + kfree(hid); +} + +static void acc_hid_work(struct work_struct *data) +{ + struct acc_dev *dev = _acc_dev; + struct list_head *entry, *temp; + struct acc_hid_dev *hid; + struct list_head new_list, dead_list; + unsigned long flags; + + INIT_LIST_HEAD(&new_list); + + spin_lock_irqsave(&dev->lock, flags); + + /* copy hids that are ready for initialization to new_list */ + list_for_each_safe(entry, temp, &dev->new_hid_list) { + hid = list_entry(entry, struct acc_hid_dev, list); + if (hid->report_desc_offset == hid->report_desc_len) + list_move(&hid->list, &new_list); + } + + if (list_empty(&dev->dead_hid_list)) { + INIT_LIST_HEAD(&dead_list); + } else { + /* move all of dev->dead_hid_list to dead_list */ + dead_list.prev = dev->dead_hid_list.prev; + dead_list.next = dev->dead_hid_list.next; + dead_list.next->prev = &dead_list; + dead_list.prev->next = &dead_list; + INIT_LIST_HEAD(&dev->dead_hid_list); + } + + spin_unlock_irqrestore(&dev->lock, flags); + + /* register new HID devices */ + list_for_each_safe(entry, temp, &new_list) { + hid = list_entry(entry, struct acc_hid_dev, list); + if (acc_hid_init(hid)) { + pr_err("can't add HID device %p\n", hid); + acc_hid_delete(hid); + } else { + spin_lock_irqsave(&dev->lock, flags); + list_move(&hid->list, &dev->hid_list); + spin_unlock_irqrestore(&dev->lock, flags); + } + } + + /* remove dead HID devices */ + list_for_each_safe(entry, temp, &dead_list) { + hid = list_entry(entry, struct acc_hid_dev, list); + list_del(&hid->list); + if (hid->hid) + hid_destroy_device(hid->hid); + acc_hid_delete(hid); + } +} + +static int acc_function_set_alt(struct usb_function *f, + unsigned intf, unsigned alt) +{ + struct acc_dev *dev = func_to_dev(f); + struct usb_composite_dev *cdev = f->config->cdev; + int ret; + + DBG(cdev, "acc_function_set_alt intf: %d alt: %d\n", intf, alt); + + ret = config_ep_by_speed(cdev->gadget, f, dev->ep_in); + if (ret) + return ret; + + ret = usb_ep_enable(dev->ep_in); + if (ret) + return ret; + + ret = config_ep_by_speed(cdev->gadget, f, dev->ep_out); + if (ret) + return ret; + + ret = usb_ep_enable(dev->ep_out); + if (ret) { + usb_ep_disable(dev->ep_in); + return ret; + } + + dev->online = 1; + dev->disconnected = 0; /* if online then not disconnected */ + + /* readers may be blocked waiting for us to go online */ + wake_up(&dev->read_wq); + return 0; +} + +static void acc_function_disable(struct usb_function *f) +{ + struct acc_dev *dev = func_to_dev(f); + struct usb_composite_dev *cdev = dev->cdev; + + DBG(cdev, "acc_function_disable\n"); + acc_set_disconnected(dev); /* this now only sets disconnected */ + dev->online = 0; /* so now need to clear online flag here too */ + usb_ep_disable(dev->ep_in); + usb_ep_disable(dev->ep_out); + + /* readers may be blocked waiting for us to go online */ + wake_up(&dev->read_wq); + + VDBG(cdev, "%s disabled\n", dev->function.name); +} + +static int acc_setup(void) +{ + struct acc_dev *dev; + int ret; + + dev = kzalloc(sizeof(*dev), GFP_KERNEL); + if (!dev) + return -ENOMEM; + + spin_lock_init(&dev->lock); + init_waitqueue_head(&dev->read_wq); + init_waitqueue_head(&dev->write_wq); + atomic_set(&dev->open_excl, 0); + INIT_LIST_HEAD(&dev->tx_idle); + INIT_LIST_HEAD(&dev->hid_list); + INIT_LIST_HEAD(&dev->new_hid_list); + INIT_LIST_HEAD(&dev->dead_hid_list); + INIT_DELAYED_WORK(&dev->start_work, acc_start_work); + INIT_WORK(&dev->hid_work, acc_hid_work); + + /* _acc_dev must be set before calling usb_gadget_register_driver */ + _acc_dev = dev; + + ret = misc_register(&acc_device); + if (ret) + goto err; + + return 0; + +err: + kfree(dev); + pr_err("USB accessory gadget driver failed to initialize\n"); + return ret; +} + +void acc_disconnect(void) +{ + /* unregister all HID devices if USB is disconnected */ + kill_all_hid_devices(_acc_dev); +} +EXPORT_SYMBOL_GPL(acc_disconnect); + +static void acc_cleanup(void) +{ + misc_deregister(&acc_device); + kfree(_acc_dev); + _acc_dev = NULL; +} +static struct acc_instance *to_acc_instance(struct config_item *item) +{ + return container_of(to_config_group(item), struct acc_instance, + func_inst.group); +} + +static void acc_attr_release(struct config_item *item) +{ + struct acc_instance *fi_acc = to_acc_instance(item); + + usb_put_function_instance(&fi_acc->func_inst); +} + +static struct configfs_item_operations acc_item_ops = { + .release = acc_attr_release, +}; + +static struct config_item_type acc_func_type = { + .ct_item_ops = &acc_item_ops, + .ct_owner = THIS_MODULE, +}; + +static struct acc_instance *to_fi_acc(struct usb_function_instance *fi) +{ + return container_of(fi, struct acc_instance, func_inst); +} + +static int acc_set_inst_name(struct usb_function_instance *fi, const char *name) +{ + struct acc_instance *fi_acc; + char *ptr; + int name_len; + + name_len = strlen(name) + 1; + if (name_len > MAX_INST_NAME_LEN) + return -ENAMETOOLONG; + + ptr = kstrndup(name, name_len, GFP_KERNEL); + if (!ptr) + return -ENOMEM; + + fi_acc = to_fi_acc(fi); + fi_acc->name = ptr; + return 0; +} + +static void acc_free_inst(struct usb_function_instance *fi) +{ + struct acc_instance *fi_acc; + + fi_acc = to_fi_acc(fi); + kfree(fi_acc->name); + acc_cleanup(); +} + +static struct usb_function_instance *acc_alloc_inst(void) +{ + struct acc_instance *fi_acc; + struct acc_dev *dev; + int err; + + fi_acc = kzalloc(sizeof(*fi_acc), GFP_KERNEL); + if (!fi_acc) + return ERR_PTR(-ENOMEM); + fi_acc->func_inst.set_inst_name = acc_set_inst_name; + fi_acc->func_inst.free_func_inst = acc_free_inst; + + err = acc_setup(); + if (err) { + kfree(fi_acc); + pr_err("Error setting ACCESSORY\n"); + return ERR_PTR(err); + } + + config_group_init_type_name(&fi_acc->func_inst.group, + "", &acc_func_type); + dev = _acc_dev; + return &fi_acc->func_inst; +} + +static void acc_free(struct usb_function *f) +{ +/*NO-OP: no function specific resource allocation in mtp_alloc*/ +} + +int acc_ctrlrequest_configfs(struct usb_function *f, + const struct usb_ctrlrequest *ctrl) { + if (f->config != NULL && f->config->cdev != NULL) + return acc_ctrlrequest(f->config->cdev, ctrl); + else + return -1; +} + +static struct usb_function *acc_alloc(struct usb_function_instance *fi) +{ + struct acc_dev *dev = _acc_dev; + + pr_info("acc_alloc\n"); + + dev->function.name = "accessory"; + dev->function.strings = acc_strings, + dev->function.fs_descriptors = fs_acc_descs; + dev->function.hs_descriptors = hs_acc_descs; + dev->function.bind = acc_function_bind_configfs; + dev->function.unbind = acc_function_unbind; + dev->function.set_alt = acc_function_set_alt; + dev->function.disable = acc_function_disable; + dev->function.free_func = acc_free; + dev->function.setup = acc_ctrlrequest_configfs; + + return &dev->function; +} +DECLARE_USB_FUNCTION_INIT(accessory, acc_alloc_inst, acc_alloc); +MODULE_LICENSE("GPL");
diff --git a/drivers/usb/gadget/function/f_audio_source.c b/drivers/usb/gadget/function/f_audio_source.c new file mode 100644 index 0000000..8124af3 --- /dev/null +++ b/drivers/usb/gadget/function/f_audio_source.c
@@ -0,0 +1,1071 @@ +/* + * Gadget Function Driver for USB audio source device + * + * Copyright (C) 2012 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/device.h> +#include <linux/usb/audio.h> +#include <linux/wait.h> +#include <linux/pm_qos.h> +#include <sound/core.h> +#include <sound/initval.h> +#include <sound/pcm.h> + +#include <linux/usb.h> +#include <linux/usb_usual.h> +#include <linux/usb/ch9.h> +#include <linux/configfs.h> +#include <linux/usb/composite.h> +#include <linux/module.h> +#include <linux/moduleparam.h> +#define SAMPLE_RATE 44100 +#define FRAMES_PER_MSEC (SAMPLE_RATE / 1000) + +#define IN_EP_MAX_PACKET_SIZE 256 + +/* Number of requests to allocate */ +#define IN_EP_REQ_COUNT 4 + +#define AUDIO_AC_INTERFACE 0 +#define AUDIO_AS_INTERFACE 1 +#define AUDIO_NUM_INTERFACES 2 +#define MAX_INST_NAME_LEN 40 + +/* B.3.1 Standard AC Interface Descriptor */ +static struct usb_interface_descriptor ac_interface_desc = { + .bLength = USB_DT_INTERFACE_SIZE, + .bDescriptorType = USB_DT_INTERFACE, + .bNumEndpoints = 0, + .bInterfaceClass = USB_CLASS_AUDIO, + .bInterfaceSubClass = USB_SUBCLASS_AUDIOCONTROL, +}; + +DECLARE_UAC_AC_HEADER_DESCRIPTOR(2); + +#define UAC_DT_AC_HEADER_LENGTH UAC_DT_AC_HEADER_SIZE(AUDIO_NUM_INTERFACES) +/* 1 input terminal, 1 output terminal and 1 feature unit */ +#define UAC_DT_TOTAL_LENGTH (UAC_DT_AC_HEADER_LENGTH \ + + UAC_DT_INPUT_TERMINAL_SIZE + UAC_DT_OUTPUT_TERMINAL_SIZE \ + + UAC_DT_FEATURE_UNIT_SIZE(0)) +/* B.3.2 Class-Specific AC Interface Descriptor */ +static struct uac1_ac_header_descriptor_2 ac_header_desc = { + .bLength = UAC_DT_AC_HEADER_LENGTH, + .bDescriptorType = USB_DT_CS_INTERFACE, + .bDescriptorSubtype = UAC_HEADER, + .bcdADC = __constant_cpu_to_le16(0x0100), + .wTotalLength = __constant_cpu_to_le16(UAC_DT_TOTAL_LENGTH), + .bInCollection = AUDIO_NUM_INTERFACES, + .baInterfaceNr = { + [0] = AUDIO_AC_INTERFACE, + [1] = AUDIO_AS_INTERFACE, + } +}; + +#define INPUT_TERMINAL_ID 1 +static struct uac_input_terminal_descriptor input_terminal_desc = { + .bLength = UAC_DT_INPUT_TERMINAL_SIZE, + .bDescriptorType = USB_DT_CS_INTERFACE, + .bDescriptorSubtype = UAC_INPUT_TERMINAL, + .bTerminalID = INPUT_TERMINAL_ID, + .wTerminalType = UAC_INPUT_TERMINAL_MICROPHONE, + .bAssocTerminal = 0, + .wChannelConfig = 0x3, +}; + +DECLARE_UAC_FEATURE_UNIT_DESCRIPTOR(0); + +#define FEATURE_UNIT_ID 2 +static struct uac_feature_unit_descriptor_0 feature_unit_desc = { + .bLength = UAC_DT_FEATURE_UNIT_SIZE(0), + .bDescriptorType = USB_DT_CS_INTERFACE, + .bDescriptorSubtype = UAC_FEATURE_UNIT, + .bUnitID = FEATURE_UNIT_ID, + .bSourceID = INPUT_TERMINAL_ID, + .bControlSize = 2, +}; + +#define OUTPUT_TERMINAL_ID 3 +static struct uac1_output_terminal_descriptor output_terminal_desc = { + .bLength = UAC_DT_OUTPUT_TERMINAL_SIZE, + .bDescriptorType = USB_DT_CS_INTERFACE, + .bDescriptorSubtype = UAC_OUTPUT_TERMINAL, + .bTerminalID = OUTPUT_TERMINAL_ID, + .wTerminalType = UAC_TERMINAL_STREAMING, + .bAssocTerminal = FEATURE_UNIT_ID, + .bSourceID = FEATURE_UNIT_ID, +}; + +/* B.4.1 Standard AS Interface Descriptor */ +static struct usb_interface_descriptor as_interface_alt_0_desc = { + .bLength = USB_DT_INTERFACE_SIZE, + .bDescriptorType = USB_DT_INTERFACE, + .bAlternateSetting = 0, + .bNumEndpoints = 0, + .bInterfaceClass = USB_CLASS_AUDIO, + .bInterfaceSubClass = USB_SUBCLASS_AUDIOSTREAMING, +}; + +static struct usb_interface_descriptor as_interface_alt_1_desc = { + .bLength = USB_DT_INTERFACE_SIZE, + .bDescriptorType = USB_DT_INTERFACE, + .bAlternateSetting = 1, + .bNumEndpoints = 1, + .bInterfaceClass = USB_CLASS_AUDIO, + .bInterfaceSubClass = USB_SUBCLASS_AUDIOSTREAMING, +}; + +/* B.4.2 Class-Specific AS Interface Descriptor */ +static struct uac1_as_header_descriptor as_header_desc = { + .bLength = UAC_DT_AS_HEADER_SIZE, + .bDescriptorType = USB_DT_CS_INTERFACE, + .bDescriptorSubtype = UAC_AS_GENERAL, + .bTerminalLink = INPUT_TERMINAL_ID, + .bDelay = 1, + .wFormatTag = UAC_FORMAT_TYPE_I_PCM, +}; + +DECLARE_UAC_FORMAT_TYPE_I_DISCRETE_DESC(1); + +static struct uac_format_type_i_discrete_descriptor_1 as_type_i_desc = { + .bLength = UAC_FORMAT_TYPE_I_DISCRETE_DESC_SIZE(1), + .bDescriptorType = USB_DT_CS_INTERFACE, + .bDescriptorSubtype = UAC_FORMAT_TYPE, + .bFormatType = UAC_FORMAT_TYPE_I, + .bSubframeSize = 2, + .bBitResolution = 16, + .bSamFreqType = 1, +}; + +/* Standard ISO IN Endpoint Descriptor for highspeed */ +static struct usb_endpoint_descriptor hs_as_in_ep_desc = { + .bLength = USB_DT_ENDPOINT_AUDIO_SIZE, + .bDescriptorType = USB_DT_ENDPOINT, + .bEndpointAddress = USB_DIR_IN, + .bmAttributes = USB_ENDPOINT_SYNC_SYNC + | USB_ENDPOINT_XFER_ISOC, + .wMaxPacketSize = __constant_cpu_to_le16(IN_EP_MAX_PACKET_SIZE), + .bInterval = 4, /* poll 1 per millisecond */ +}; + +/* Standard ISO IN Endpoint Descriptor for highspeed */ +static struct usb_endpoint_descriptor fs_as_in_ep_desc = { + .bLength = USB_DT_ENDPOINT_AUDIO_SIZE, + .bDescriptorType = USB_DT_ENDPOINT, + .bEndpointAddress = USB_DIR_IN, + .bmAttributes = USB_ENDPOINT_SYNC_SYNC + | USB_ENDPOINT_XFER_ISOC, + .wMaxPacketSize = __constant_cpu_to_le16(IN_EP_MAX_PACKET_SIZE), + .bInterval = 1, /* poll 1 per millisecond */ +}; + +/* Class-specific AS ISO OUT Endpoint Descriptor */ +static struct uac_iso_endpoint_descriptor as_iso_in_desc = { + .bLength = UAC_ISO_ENDPOINT_DESC_SIZE, + .bDescriptorType = USB_DT_CS_ENDPOINT, + .bDescriptorSubtype = UAC_EP_GENERAL, + .bmAttributes = 1, + .bLockDelayUnits = 1, + .wLockDelay = __constant_cpu_to_le16(1), +}; + +static struct usb_descriptor_header *hs_audio_desc[] = { + (struct usb_descriptor_header *)&ac_interface_desc, + (struct usb_descriptor_header *)&ac_header_desc, + + (struct usb_descriptor_header *)&input_terminal_desc, + (struct usb_descriptor_header *)&output_terminal_desc, + (struct usb_descriptor_header *)&feature_unit_desc, + + (struct usb_descriptor_header *)&as_interface_alt_0_desc, + (struct usb_descriptor_header *)&as_interface_alt_1_desc, + (struct usb_descriptor_header *)&as_header_desc, + + (struct usb_descriptor_header *)&as_type_i_desc, + + (struct usb_descriptor_header *)&hs_as_in_ep_desc, + (struct usb_descriptor_header *)&as_iso_in_desc, + NULL, +}; + +static struct usb_descriptor_header *fs_audio_desc[] = { + (struct usb_descriptor_header *)&ac_interface_desc, + (struct usb_descriptor_header *)&ac_header_desc, + + (struct usb_descriptor_header *)&input_terminal_desc, + (struct usb_descriptor_header *)&output_terminal_desc, + (struct usb_descriptor_header *)&feature_unit_desc, + + (struct usb_descriptor_header *)&as_interface_alt_0_desc, + (struct usb_descriptor_header *)&as_interface_alt_1_desc, + (struct usb_descriptor_header *)&as_header_desc, + + (struct usb_descriptor_header *)&as_type_i_desc, + + (struct usb_descriptor_header *)&fs_as_in_ep_desc, + (struct usb_descriptor_header *)&as_iso_in_desc, + NULL, +}; + +static struct snd_pcm_hardware audio_hw_info = { + .info = SNDRV_PCM_INFO_MMAP | + SNDRV_PCM_INFO_MMAP_VALID | + SNDRV_PCM_INFO_BATCH | + SNDRV_PCM_INFO_INTERLEAVED | + SNDRV_PCM_INFO_BLOCK_TRANSFER, + + .formats = SNDRV_PCM_FMTBIT_S16_LE, + .channels_min = 2, + .channels_max = 2, + .rate_min = SAMPLE_RATE, + .rate_max = SAMPLE_RATE, + + .buffer_bytes_max = 1024 * 1024, + .period_bytes_min = 64, + .period_bytes_max = 512 * 1024, + .periods_min = 2, + .periods_max = 1024, +}; + +/*-------------------------------------------------------------------------*/ + +struct audio_source_config { + int card; + int device; +}; + +struct audio_dev { + struct usb_function func; + struct snd_card *card; + struct snd_pcm *pcm; + struct snd_pcm_substream *substream; + + struct list_head idle_reqs; + struct usb_ep *in_ep; + + spinlock_t lock; + + /* beginning, end and current position in our buffer */ + void *buffer_start; + void *buffer_end; + void *buffer_pos; + + /* byte size of a "period" */ + unsigned int period; + /* bytes sent since last call to snd_pcm_period_elapsed */ + unsigned int period_offset; + /* time we started playing */ + ktime_t start_time; + /* number of frames sent since start_time */ + s64 frames_sent; + struct audio_source_config *config; + /* for creating and issuing QoS requests */ + struct pm_qos_request pm_qos; +}; + +static inline struct audio_dev *func_to_audio(struct usb_function *f) +{ + return container_of(f, struct audio_dev, func); +} + +/*-------------------------------------------------------------------------*/ + +struct audio_source_instance { + struct usb_function_instance func_inst; + const char *name; + struct audio_source_config *config; + struct device *audio_device; +}; + +static void audio_source_attr_release(struct config_item *item); + +static struct configfs_item_operations audio_source_item_ops = { + .release = audio_source_attr_release, +}; + +static struct config_item_type audio_source_func_type = { + .ct_item_ops = &audio_source_item_ops, + .ct_owner = THIS_MODULE, +}; + +static ssize_t audio_source_pcm_show(struct device *dev, + struct device_attribute *attr, char *buf); + +static DEVICE_ATTR(pcm, S_IRUGO, audio_source_pcm_show, NULL); + +static struct device_attribute *audio_source_function_attributes[] = { + &dev_attr_pcm, + NULL +}; + +/*--------------------------------------------------------------------------*/ + +static struct usb_request *audio_request_new(struct usb_ep *ep, int buffer_size) +{ + struct usb_request *req = usb_ep_alloc_request(ep, GFP_KERNEL); + + if (!req) + return NULL; + + req->buf = kmalloc(buffer_size, GFP_KERNEL); + if (!req->buf) { + usb_ep_free_request(ep, req); + return NULL; + } + req->length = buffer_size; + return req; +} + +static void audio_request_free(struct usb_request *req, struct usb_ep *ep) +{ + if (req) { + kfree(req->buf); + usb_ep_free_request(ep, req); + } +} + +static void audio_req_put(struct audio_dev *audio, struct usb_request *req) +{ + unsigned long flags; + + spin_lock_irqsave(&audio->lock, flags); + list_add_tail(&req->list, &audio->idle_reqs); + spin_unlock_irqrestore(&audio->lock, flags); +} + +static struct usb_request *audio_req_get(struct audio_dev *audio) +{ + unsigned long flags; + struct usb_request *req; + + spin_lock_irqsave(&audio->lock, flags); + if (list_empty(&audio->idle_reqs)) { + req = 0; + } else { + req = list_first_entry(&audio->idle_reqs, struct usb_request, + list); + list_del(&req->list); + } + spin_unlock_irqrestore(&audio->lock, flags); + return req; +} + +/* send the appropriate number of packets to match our bitrate */ +static void audio_send(struct audio_dev *audio) +{ + struct snd_pcm_runtime *runtime; + struct usb_request *req; + int length, length1, length2, ret; + s64 msecs; + s64 frames; + ktime_t now; + + /* audio->substream will be null if we have been closed */ + if (!audio->substream) + return; + /* audio->buffer_pos will be null if we have been stopped */ + if (!audio->buffer_pos) + return; + + runtime = audio->substream->runtime; + + /* compute number of frames to send */ + now = ktime_get(); + msecs = div_s64((ktime_to_ns(now) - ktime_to_ns(audio->start_time)), + 1000000); + frames = div_s64((msecs * SAMPLE_RATE), 1000); + + /* Readjust our frames_sent if we fall too far behind. + * If we get too far behind it is better to drop some frames than + * to keep sending data too fast in an attempt to catch up. + */ + if (frames - audio->frames_sent > 10 * FRAMES_PER_MSEC) + audio->frames_sent = frames - FRAMES_PER_MSEC; + + frames -= audio->frames_sent; + + /* We need to send something to keep the pipeline going */ + if (frames <= 0) + frames = FRAMES_PER_MSEC; + + while (frames > 0) { + req = audio_req_get(audio); + if (!req) + break; + + length = frames_to_bytes(runtime, frames); + if (length > IN_EP_MAX_PACKET_SIZE) + length = IN_EP_MAX_PACKET_SIZE; + + if (audio->buffer_pos + length > audio->buffer_end) + length1 = audio->buffer_end - audio->buffer_pos; + else + length1 = length; + memcpy(req->buf, audio->buffer_pos, length1); + if (length1 < length) { + /* Wrap around and copy remaining length + * at beginning of buffer. + */ + length2 = length - length1; + memcpy(req->buf + length1, audio->buffer_start, + length2); + audio->buffer_pos = audio->buffer_start + length2; + } else { + audio->buffer_pos += length1; + if (audio->buffer_pos >= audio->buffer_end) + audio->buffer_pos = audio->buffer_start; + } + + req->length = length; + ret = usb_ep_queue(audio->in_ep, req, GFP_ATOMIC); + if (ret < 0) { + pr_err("usb_ep_queue failed ret: %d\n", ret); + audio_req_put(audio, req); + break; + } + + frames -= bytes_to_frames(runtime, length); + audio->frames_sent += bytes_to_frames(runtime, length); + } +} + +static void audio_control_complete(struct usb_ep *ep, struct usb_request *req) +{ + /* nothing to do here */ +} + +static void audio_data_complete(struct usb_ep *ep, struct usb_request *req) +{ + struct audio_dev *audio = req->context; + + pr_debug("audio_data_complete req->status %d req->actual %d\n", + req->status, req->actual); + + audio_req_put(audio, req); + + if (!audio->buffer_start || req->status) + return; + + audio->period_offset += req->actual; + if (audio->period_offset >= audio->period) { + snd_pcm_period_elapsed(audio->substream); + audio->period_offset = 0; + } + audio_send(audio); +} + +static int audio_set_endpoint_req(struct usb_function *f, + const struct usb_ctrlrequest *ctrl) +{ + int value = -EOPNOTSUPP; + u16 ep = le16_to_cpu(ctrl->wIndex); + u16 len = le16_to_cpu(ctrl->wLength); + u16 w_value = le16_to_cpu(ctrl->wValue); + + pr_debug("bRequest 0x%x, w_value 0x%04x, len %d, endpoint %d\n", + ctrl->bRequest, w_value, len, ep); + + switch (ctrl->bRequest) { + case UAC_SET_CUR: + case UAC_SET_MIN: + case UAC_SET_MAX: + case UAC_SET_RES: + value = len; + break; + default: + break; + } + + return value; +} + +static int audio_get_endpoint_req(struct usb_function *f, + const struct usb_ctrlrequest *ctrl) +{ + struct usb_composite_dev *cdev = f->config->cdev; + int value = -EOPNOTSUPP; + u8 ep = ((le16_to_cpu(ctrl->wIndex) >> 8) & 0xFF); + u16 len = le16_to_cpu(ctrl->wLength); + u16 w_value = le16_to_cpu(ctrl->wValue); + u8 *buf = cdev->req->buf; + + pr_debug("bRequest 0x%x, w_value 0x%04x, len %d, endpoint %d\n", + ctrl->bRequest, w_value, len, ep); + + if (w_value == UAC_EP_CS_ATTR_SAMPLE_RATE << 8) { + switch (ctrl->bRequest) { + case UAC_GET_CUR: + case UAC_GET_MIN: + case UAC_GET_MAX: + case UAC_GET_RES: + /* return our sample rate */ + buf[0] = (u8)SAMPLE_RATE; + buf[1] = (u8)(SAMPLE_RATE >> 8); + buf[2] = (u8)(SAMPLE_RATE >> 16); + value = 3; + break; + default: + break; + } + } + + return value; +} + +static int +audio_setup(struct usb_function *f, const struct usb_ctrlrequest *ctrl) +{ + struct usb_composite_dev *cdev = f->config->cdev; + struct usb_request *req = cdev->req; + int value = -EOPNOTSUPP; + u16 w_index = le16_to_cpu(ctrl->wIndex); + u16 w_value = le16_to_cpu(ctrl->wValue); + u16 w_length = le16_to_cpu(ctrl->wLength); + + /* composite driver infrastructure handles everything; interface + * activation uses set_alt(). + */ + switch (ctrl->bRequestType) { + case USB_DIR_OUT | USB_TYPE_CLASS | USB_RECIP_ENDPOINT: + value = audio_set_endpoint_req(f, ctrl); + break; + + case USB_DIR_IN | USB_TYPE_CLASS | USB_RECIP_ENDPOINT: + value = audio_get_endpoint_req(f, ctrl); + break; + } + + /* respond with data transfer or status phase? */ + if (value >= 0) { + pr_debug("audio req%02x.%02x v%04x i%04x l%d\n", + ctrl->bRequestType, ctrl->bRequest, + w_value, w_index, w_length); + req->zero = 0; + req->length = value; + req->complete = audio_control_complete; + value = usb_ep_queue(cdev->gadget->ep0, req, GFP_ATOMIC); + if (value < 0) + pr_err("audio response on err %d\n", value); + } + + /* device either stalls (value < 0) or reports success */ + return value; +} + +static int audio_set_alt(struct usb_function *f, unsigned intf, unsigned alt) +{ + struct audio_dev *audio = func_to_audio(f); + struct usb_composite_dev *cdev = f->config->cdev; + int ret; + + pr_debug("audio_set_alt intf %d, alt %d\n", intf, alt); + + ret = config_ep_by_speed(cdev->gadget, f, audio->in_ep); + if (ret) + return ret; + + usb_ep_enable(audio->in_ep); + return 0; +} + +static void audio_disable(struct usb_function *f) +{ + struct audio_dev *audio = func_to_audio(f); + + pr_debug("audio_disable\n"); + usb_ep_disable(audio->in_ep); +} + +static void audio_free_func(struct usb_function *f) +{ + /* no-op */ +} + +/*-------------------------------------------------------------------------*/ + +static void audio_build_desc(struct audio_dev *audio) +{ + u8 *sam_freq; + int rate; + + /* Set channel numbers */ + input_terminal_desc.bNrChannels = 2; + as_type_i_desc.bNrChannels = 2; + + /* Set sample rates */ + rate = SAMPLE_RATE; + sam_freq = as_type_i_desc.tSamFreq[0]; + memcpy(sam_freq, &rate, 3); +} + + +static int snd_card_setup(struct usb_configuration *c, + struct audio_source_config *config); +static struct audio_source_instance *to_fi_audio_source( + const struct usb_function_instance *fi); + + +/* audio function driver setup/binding */ +static int +audio_bind(struct usb_configuration *c, struct usb_function *f) +{ + struct usb_composite_dev *cdev = c->cdev; + struct audio_dev *audio = func_to_audio(f); + int status; + struct usb_ep *ep; + struct usb_request *req; + int i; + int err; + + if (IS_ENABLED(CONFIG_USB_CONFIGFS)) { + struct audio_source_instance *fi_audio = + to_fi_audio_source(f->fi); + struct audio_source_config *config = + fi_audio->config; + + err = snd_card_setup(c, config); + if (err) + return err; + } + + audio_build_desc(audio); + + /* allocate instance-specific interface IDs, and patch descriptors */ + status = usb_interface_id(c, f); + if (status < 0) + goto fail; + ac_interface_desc.bInterfaceNumber = status; + + /* AUDIO_AC_INTERFACE */ + ac_header_desc.baInterfaceNr[0] = status; + + status = usb_interface_id(c, f); + if (status < 0) + goto fail; + as_interface_alt_0_desc.bInterfaceNumber = status; + as_interface_alt_1_desc.bInterfaceNumber = status; + + /* AUDIO_AS_INTERFACE */ + ac_header_desc.baInterfaceNr[1] = status; + + status = -ENODEV; + + /* allocate our endpoint */ + ep = usb_ep_autoconfig(cdev->gadget, &fs_as_in_ep_desc); + if (!ep) + goto fail; + audio->in_ep = ep; + ep->driver_data = audio; /* claim */ + + if (gadget_is_dualspeed(c->cdev->gadget)) + hs_as_in_ep_desc.bEndpointAddress = + fs_as_in_ep_desc.bEndpointAddress; + + f->fs_descriptors = fs_audio_desc; + f->hs_descriptors = hs_audio_desc; + + for (i = 0, status = 0; i < IN_EP_REQ_COUNT && status == 0; i++) { + req = audio_request_new(ep, IN_EP_MAX_PACKET_SIZE); + if (req) { + req->context = audio; + req->complete = audio_data_complete; + audio_req_put(audio, req); + } else + status = -ENOMEM; + } + +fail: + return status; +} + +static void +audio_unbind(struct usb_configuration *c, struct usb_function *f) +{ + struct audio_dev *audio = func_to_audio(f); + struct usb_request *req; + + while ((req = audio_req_get(audio))) + audio_request_free(req, audio->in_ep); + + snd_card_free_when_closed(audio->card); + audio->card = NULL; + audio->pcm = NULL; + audio->substream = NULL; + audio->in_ep = NULL; + + if (IS_ENABLED(CONFIG_USB_CONFIGFS)) { + struct audio_source_instance *fi_audio = + to_fi_audio_source(f->fi); + struct audio_source_config *config = + fi_audio->config; + + config->card = -1; + config->device = -1; + } +} + +static void audio_pcm_playback_start(struct audio_dev *audio) +{ + audio->start_time = ktime_get(); + audio->frames_sent = 0; + audio_send(audio); +} + +static void audio_pcm_playback_stop(struct audio_dev *audio) +{ + unsigned long flags; + + spin_lock_irqsave(&audio->lock, flags); + audio->buffer_start = 0; + audio->buffer_end = 0; + audio->buffer_pos = 0; + spin_unlock_irqrestore(&audio->lock, flags); +} + +static int audio_pcm_open(struct snd_pcm_substream *substream) +{ + struct snd_pcm_runtime *runtime = substream->runtime; + struct audio_dev *audio = substream->private_data; + + runtime->private_data = audio; + runtime->hw = audio_hw_info; + snd_pcm_limit_hw_rates(runtime); + runtime->hw.channels_max = 2; + + audio->substream = substream; + + /* Add the QoS request and set the latency to 0 */ + pm_qos_add_request(&audio->pm_qos, PM_QOS_CPU_DMA_LATENCY, 0); + + return 0; +} + +static int audio_pcm_close(struct snd_pcm_substream *substream) +{ + struct audio_dev *audio = substream->private_data; + unsigned long flags; + + spin_lock_irqsave(&audio->lock, flags); + + /* Remove the QoS request */ + pm_qos_remove_request(&audio->pm_qos); + + audio->substream = NULL; + spin_unlock_irqrestore(&audio->lock, flags); + + return 0; +} + +static int audio_pcm_hw_params(struct snd_pcm_substream *substream, + struct snd_pcm_hw_params *params) +{ + unsigned int channels = params_channels(params); + unsigned int rate = params_rate(params); + + if (rate != SAMPLE_RATE) + return -EINVAL; + if (channels != 2) + return -EINVAL; + + return snd_pcm_lib_alloc_vmalloc_buffer(substream, + params_buffer_bytes(params)); +} + +static int audio_pcm_hw_free(struct snd_pcm_substream *substream) +{ + return snd_pcm_lib_free_vmalloc_buffer(substream); +} + +static int audio_pcm_prepare(struct snd_pcm_substream *substream) +{ + struct snd_pcm_runtime *runtime = substream->runtime; + struct audio_dev *audio = runtime->private_data; + + audio->period = snd_pcm_lib_period_bytes(substream); + audio->period_offset = 0; + audio->buffer_start = runtime->dma_area; + audio->buffer_end = audio->buffer_start + + snd_pcm_lib_buffer_bytes(substream); + audio->buffer_pos = audio->buffer_start; + + return 0; +} + +static snd_pcm_uframes_t audio_pcm_pointer(struct snd_pcm_substream *substream) +{ + struct snd_pcm_runtime *runtime = substream->runtime; + struct audio_dev *audio = runtime->private_data; + ssize_t bytes = audio->buffer_pos - audio->buffer_start; + + /* return offset of next frame to fill in our buffer */ + return bytes_to_frames(runtime, bytes); +} + +static int audio_pcm_playback_trigger(struct snd_pcm_substream *substream, + int cmd) +{ + struct audio_dev *audio = substream->runtime->private_data; + int ret = 0; + + switch (cmd) { + case SNDRV_PCM_TRIGGER_START: + case SNDRV_PCM_TRIGGER_RESUME: + audio_pcm_playback_start(audio); + break; + + case SNDRV_PCM_TRIGGER_STOP: + case SNDRV_PCM_TRIGGER_SUSPEND: + audio_pcm_playback_stop(audio); + break; + + default: + ret = -EINVAL; + } + + return ret; +} + +static struct audio_dev _audio_dev = { + .func = { + .name = "audio_source", + .bind = audio_bind, + .unbind = audio_unbind, + .set_alt = audio_set_alt, + .setup = audio_setup, + .disable = audio_disable, + .free_func = audio_free_func, + }, + .lock = __SPIN_LOCK_UNLOCKED(_audio_dev.lock), + .idle_reqs = LIST_HEAD_INIT(_audio_dev.idle_reqs), +}; + +static struct snd_pcm_ops audio_playback_ops = { + .open = audio_pcm_open, + .close = audio_pcm_close, + .ioctl = snd_pcm_lib_ioctl, + .hw_params = audio_pcm_hw_params, + .hw_free = audio_pcm_hw_free, + .prepare = audio_pcm_prepare, + .trigger = audio_pcm_playback_trigger, + .pointer = audio_pcm_pointer, +}; + +int audio_source_bind_config(struct usb_configuration *c, + struct audio_source_config *config) +{ + struct audio_dev *audio; + int err; + + config->card = -1; + config->device = -1; + + audio = &_audio_dev; + + err = snd_card_setup(c, config); + if (err) + return err; + + err = usb_add_function(c, &audio->func); + if (err) + goto add_fail; + + return 0; + +add_fail: + snd_card_free(audio->card); + return err; +} + +static int snd_card_setup(struct usb_configuration *c, + struct audio_source_config *config) +{ + struct audio_dev *audio; + struct snd_card *card; + struct snd_pcm *pcm; + int err; + + audio = &_audio_dev; + + err = snd_card_new(&c->cdev->gadget->dev, + SNDRV_DEFAULT_IDX1, SNDRV_DEFAULT_STR1, + THIS_MODULE, 0, &card); + if (err) + return err; + + err = snd_pcm_new(card, "USB audio source", 0, 1, 0, &pcm); + if (err) + goto pcm_fail; + + pcm->private_data = audio; + pcm->info_flags = 0; + audio->pcm = pcm; + + strlcpy(pcm->name, "USB gadget audio", sizeof(pcm->name)); + + snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_PLAYBACK, &audio_playback_ops); + snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV, + NULL, 0, 64 * 1024); + + strlcpy(card->driver, "audio_source", sizeof(card->driver)); + strlcpy(card->shortname, card->driver, sizeof(card->shortname)); + strlcpy(card->longname, "USB accessory audio source", + sizeof(card->longname)); + + err = snd_card_register(card); + if (err) + goto register_fail; + + config->card = pcm->card->number; + config->device = pcm->device; + audio->card = card; + return 0; + +register_fail: +pcm_fail: + snd_card_free(audio->card); + return err; +} + +static struct audio_source_instance *to_audio_source_instance( + struct config_item *item) +{ + return container_of(to_config_group(item), struct audio_source_instance, + func_inst.group); +} + +static struct audio_source_instance *to_fi_audio_source( + const struct usb_function_instance *fi) +{ + return container_of(fi, struct audio_source_instance, func_inst); +} + +static void audio_source_attr_release(struct config_item *item) +{ + struct audio_source_instance *fi_audio = to_audio_source_instance(item); + + usb_put_function_instance(&fi_audio->func_inst); +} + +static int audio_source_set_inst_name(struct usb_function_instance *fi, + const char *name) +{ + struct audio_source_instance *fi_audio; + char *ptr; + int name_len; + + name_len = strlen(name) + 1; + if (name_len > MAX_INST_NAME_LEN) + return -ENAMETOOLONG; + + ptr = kstrndup(name, name_len, GFP_KERNEL); + if (!ptr) + return -ENOMEM; + + fi_audio = to_fi_audio_source(fi); + fi_audio->name = ptr; + + return 0; +} + +static void audio_source_free_inst(struct usb_function_instance *fi) +{ + struct audio_source_instance *fi_audio; + + fi_audio = to_fi_audio_source(fi); + device_destroy(fi_audio->audio_device->class, + fi_audio->audio_device->devt); + kfree(fi_audio->name); + kfree(fi_audio->config); +} + +static ssize_t audio_source_pcm_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct audio_source_instance *fi_audio = dev_get_drvdata(dev); + struct audio_source_config *config = fi_audio->config; + + /* print PCM card and device numbers */ + return sprintf(buf, "%d %d\n", config->card, config->device); +} + +struct device *create_function_device(char *name); + +static struct usb_function_instance *audio_source_alloc_inst(void) +{ + struct audio_source_instance *fi_audio; + struct device_attribute **attrs; + struct device_attribute *attr; + struct device *dev; + void *err_ptr; + int err = 0; + + fi_audio = kzalloc(sizeof(*fi_audio), GFP_KERNEL); + if (!fi_audio) + return ERR_PTR(-ENOMEM); + + fi_audio->func_inst.set_inst_name = audio_source_set_inst_name; + fi_audio->func_inst.free_func_inst = audio_source_free_inst; + + fi_audio->config = kzalloc(sizeof(struct audio_source_config), + GFP_KERNEL); + if (!fi_audio->config) { + err_ptr = ERR_PTR(-ENOMEM); + goto fail_audio; + } + + config_group_init_type_name(&fi_audio->func_inst.group, "", + &audio_source_func_type); + dev = create_function_device("f_audio_source"); + + if (IS_ERR(dev)) { + err_ptr = dev; + goto fail_audio_config; + } + + fi_audio->config->card = -1; + fi_audio->config->device = -1; + fi_audio->audio_device = dev; + + attrs = audio_source_function_attributes; + if (attrs) { + while ((attr = *attrs++) && !err) + err = device_create_file(dev, attr); + if (err) { + err_ptr = ERR_PTR(-EINVAL); + goto fail_device; + } + } + + dev_set_drvdata(dev, fi_audio); + _audio_dev.config = fi_audio->config; + + return &fi_audio->func_inst; + +fail_device: + device_destroy(dev->class, dev->devt); +fail_audio_config: + kfree(fi_audio->config); +fail_audio: + kfree(fi_audio); + return err_ptr; + +} + +static struct usb_function *audio_source_alloc(struct usb_function_instance *fi) +{ + return &_audio_dev.func; +} + +DECLARE_USB_FUNCTION_INIT(audio_source, audio_source_alloc_inst, + audio_source_alloc); +MODULE_LICENSE("GPL");
diff --git a/drivers/usb/gadget/function/f_midi.c b/drivers/usb/gadget/function/f_midi.c index 70ac196..7d6a48c5 100644 --- a/drivers/usb/gadget/function/f_midi.c +++ b/drivers/usb/gadget/function/f_midi.c
@@ -1168,6 +1168,65 @@ static void f_midi_free_inst(struct usb_function_instance *f) kfree(opts); } +#ifdef CONFIG_USB_CONFIGFS_UEVENT +extern struct device *create_function_device(char *name); +static ssize_t alsa_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct usb_function_instance *fi_midi = dev_get_drvdata(dev); + struct f_midi *midi; + + if (!fi_midi->f) + dev_warn(dev, "f_midi: function not set\n"); + + if (fi_midi && fi_midi->f) { + midi = func_to_midi(fi_midi->f); + if (midi->rmidi && midi->rmidi->card) + return sprintf(buf, "%d %d\n", + midi->rmidi->card->number, midi->rmidi->device); + } + + /* print PCM card and device numbers */ + return sprintf(buf, "%d %d\n", -1, -1); +} + +static DEVICE_ATTR(alsa, S_IRUGO, alsa_show, NULL); + +static struct device_attribute *alsa_function_attributes[] = { + &dev_attr_alsa, + NULL +}; + +static int create_alsa_device(struct usb_function_instance *fi) +{ + struct device *dev; + struct device_attribute **attrs; + struct device_attribute *attr; + int err = 0; + + dev = create_function_device("f_midi"); + if (IS_ERR(dev)) + return PTR_ERR(dev); + + attrs = alsa_function_attributes; + if (attrs) { + while ((attr = *attrs++) && !err) + err = device_create_file(dev, attr); + if (err) { + device_destroy(dev->class, dev->devt); + return -EINVAL; + } + } + dev_set_drvdata(dev, fi); + return 0; +} +#else +static int create_alsa_device(struct usb_function_instance *fi) +{ + return 0; +} +#endif + static struct usb_function_instance *f_midi_alloc_inst(void) { struct f_midi_opts *opts; @@ -1185,6 +1244,11 @@ static struct usb_function_instance *f_midi_alloc_inst(void) opts->in_ports = 1; opts->out_ports = 1; + if (create_alsa_device(&opts->func_inst)) { + kfree(opts); + return ERR_PTR(-ENODEV); + } + config_group_init_type_name(&opts->func_inst.group, "", &midi_func_type); @@ -1202,6 +1266,7 @@ static void f_midi_free(struct usb_function *f) mutex_lock(&opts->lock); kfifo_free(&midi->in_req_fifo); kfree(midi); + opts->func_inst.f = NULL; --opts->refcnt; mutex_unlock(&opts->lock); } @@ -1281,6 +1346,7 @@ static struct usb_function *f_midi_alloc(struct usb_function_instance *fi) midi->func.disable = f_midi_disable; midi->func.free_func = f_midi_free; + fi->f = &midi->func; return &midi->func; setup_fail:
diff --git a/drivers/usb/gadget/function/f_mtp.c b/drivers/usb/gadget/function/f_mtp.c new file mode 100644 index 0000000..54f7ebb --- /dev/null +++ b/drivers/usb/gadget/function/f_mtp.c
@@ -0,0 +1,1554 @@ +/* + * Gadget Function Driver for MTP + * + * Copyright (C) 2010 Google, Inc. + * Author: Mike Lockwood <lockwood@android.com> + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +/* #define DEBUG */ +/* #define VERBOSE_DEBUG */ + +#include <linux/module.h> +#include <linux/init.h> +#include <linux/poll.h> +#include <linux/delay.h> +#include <linux/wait.h> +#include <linux/err.h> +#include <linux/interrupt.h> + +#include <linux/types.h> +#include <linux/file.h> +#include <linux/device.h> +#include <linux/miscdevice.h> + +#include <linux/usb.h> +#include <linux/usb_usual.h> +#include <linux/usb/ch9.h> +#include <linux/usb/f_mtp.h> +#include <linux/configfs.h> +#include <linux/usb/composite.h> + +#include "configfs.h" + +#define MTP_BULK_BUFFER_SIZE 16384 +#define INTR_BUFFER_SIZE 28 +#define MAX_INST_NAME_LEN 40 +#define MTP_MAX_FILE_SIZE 0xFFFFFFFFL + +/* String IDs */ +#define INTERFACE_STRING_INDEX 0 + +/* values for mtp_dev.state */ +#define STATE_OFFLINE 0 /* initial state, disconnected */ +#define STATE_READY 1 /* ready for userspace calls */ +#define STATE_BUSY 2 /* processing userspace calls */ +#define STATE_CANCELED 3 /* transaction canceled by host */ +#define STATE_ERROR 4 /* error from completion routine */ + +/* number of tx and rx requests to allocate */ +#define TX_REQ_MAX 4 +#define RX_REQ_MAX 2 +#define INTR_REQ_MAX 5 + +/* ID for Microsoft MTP OS String */ +#define MTP_OS_STRING_ID 0xEE + +/* MTP class reqeusts */ +#define MTP_REQ_CANCEL 0x64 +#define MTP_REQ_GET_EXT_EVENT_DATA 0x65 +#define MTP_REQ_RESET 0x66 +#define MTP_REQ_GET_DEVICE_STATUS 0x67 + +/* constants for device status */ +#define MTP_RESPONSE_OK 0x2001 +#define MTP_RESPONSE_DEVICE_BUSY 0x2019 +#define DRIVER_NAME "mtp" + +static const char mtp_shortname[] = DRIVER_NAME "_usb"; + +struct mtp_dev { + struct usb_function function; + struct usb_composite_dev *cdev; + spinlock_t lock; + + struct usb_ep *ep_in; + struct usb_ep *ep_out; + struct usb_ep *ep_intr; + + int state; + + /* synchronize access to our device file */ + atomic_t open_excl; + /* to enforce only one ioctl at a time */ + atomic_t ioctl_excl; + + struct list_head tx_idle; + struct list_head intr_idle; + + wait_queue_head_t read_wq; + wait_queue_head_t write_wq; + wait_queue_head_t intr_wq; + struct usb_request *rx_req[RX_REQ_MAX]; + int rx_done; + + /* for processing MTP_SEND_FILE, MTP_RECEIVE_FILE and + * MTP_SEND_FILE_WITH_HEADER ioctls on a work queue + */ + struct workqueue_struct *wq; + struct work_struct send_file_work; + struct work_struct receive_file_work; + struct file *xfer_file; + loff_t xfer_file_offset; + int64_t xfer_file_length; + unsigned xfer_send_header; + uint16_t xfer_command; + uint32_t xfer_transaction_id; + int xfer_result; +}; + +static struct usb_interface_descriptor mtp_interface_desc = { + .bLength = USB_DT_INTERFACE_SIZE, + .bDescriptorType = USB_DT_INTERFACE, + .bInterfaceNumber = 0, + .bNumEndpoints = 3, + .bInterfaceClass = USB_CLASS_VENDOR_SPEC, + .bInterfaceSubClass = USB_SUBCLASS_VENDOR_SPEC, + .bInterfaceProtocol = 0, +}; + +static struct usb_interface_descriptor ptp_interface_desc = { + .bLength = USB_DT_INTERFACE_SIZE, + .bDescriptorType = USB_DT_INTERFACE, + .bInterfaceNumber = 0, + .bNumEndpoints = 3, + .bInterfaceClass = USB_CLASS_STILL_IMAGE, + .bInterfaceSubClass = 1, + .bInterfaceProtocol = 1, +}; + +static struct usb_endpoint_descriptor mtp_ss_in_desc = { + .bLength = USB_DT_ENDPOINT_SIZE, + .bDescriptorType = USB_DT_ENDPOINT, + .bEndpointAddress = USB_DIR_IN, + .bmAttributes = USB_ENDPOINT_XFER_BULK, + .wMaxPacketSize = __constant_cpu_to_le16(1024), +}; + +static struct usb_ss_ep_comp_descriptor mtp_ss_in_comp_desc = { + .bLength = sizeof(mtp_ss_in_comp_desc), + .bDescriptorType = USB_DT_SS_ENDPOINT_COMP, + /* .bMaxBurst = DYNAMIC, */ +}; + +static struct usb_endpoint_descriptor mtp_ss_out_desc = { + .bLength = USB_DT_ENDPOINT_SIZE, + .bDescriptorType = USB_DT_ENDPOINT, + .bEndpointAddress = USB_DIR_OUT, + .bmAttributes = USB_ENDPOINT_XFER_BULK, + .wMaxPacketSize = __constant_cpu_to_le16(1024), +}; + +static struct usb_ss_ep_comp_descriptor mtp_ss_out_comp_desc = { + .bLength = sizeof(mtp_ss_out_comp_desc), + .bDescriptorType = USB_DT_SS_ENDPOINT_COMP, + /* .bMaxBurst = DYNAMIC, */ +}; + +static struct usb_endpoint_descriptor mtp_highspeed_in_desc = { + .bLength = USB_DT_ENDPOINT_SIZE, + .bDescriptorType = USB_DT_ENDPOINT, + .bEndpointAddress = USB_DIR_IN, + .bmAttributes = USB_ENDPOINT_XFER_BULK, + .wMaxPacketSize = __constant_cpu_to_le16(512), +}; + +static struct usb_endpoint_descriptor mtp_highspeed_out_desc = { + .bLength = USB_DT_ENDPOINT_SIZE, + .bDescriptorType = USB_DT_ENDPOINT, + .bEndpointAddress = USB_DIR_OUT, + .bmAttributes = USB_ENDPOINT_XFER_BULK, + .wMaxPacketSize = __constant_cpu_to_le16(512), +}; + +static struct usb_endpoint_descriptor mtp_fullspeed_in_desc = { + .bLength = USB_DT_ENDPOINT_SIZE, + .bDescriptorType = USB_DT_ENDPOINT, + .bEndpointAddress = USB_DIR_IN, + .bmAttributes = USB_ENDPOINT_XFER_BULK, +}; + +static struct usb_endpoint_descriptor mtp_fullspeed_out_desc = { + .bLength = USB_DT_ENDPOINT_SIZE, + .bDescriptorType = USB_DT_ENDPOINT, + .bEndpointAddress = USB_DIR_OUT, + .bmAttributes = USB_ENDPOINT_XFER_BULK, +}; + +static struct usb_endpoint_descriptor mtp_intr_desc = { + .bLength = USB_DT_ENDPOINT_SIZE, + .bDescriptorType = USB_DT_ENDPOINT, + .bEndpointAddress = USB_DIR_IN, + .bmAttributes = USB_ENDPOINT_XFER_INT, + .wMaxPacketSize = __constant_cpu_to_le16(INTR_BUFFER_SIZE), + .bInterval = 6, +}; + +static struct usb_ss_ep_comp_descriptor mtp_intr_ss_comp_desc = { + .bLength = sizeof(mtp_intr_ss_comp_desc), + .bDescriptorType = USB_DT_SS_ENDPOINT_COMP, + .wBytesPerInterval = cpu_to_le16(INTR_BUFFER_SIZE), +}; + +static struct usb_descriptor_header *fs_mtp_descs[] = { + (struct usb_descriptor_header *) &mtp_interface_desc, + (struct usb_descriptor_header *) &mtp_fullspeed_in_desc, + (struct usb_descriptor_header *) &mtp_fullspeed_out_desc, + (struct usb_descriptor_header *) &mtp_intr_desc, + NULL, +}; + +static struct usb_descriptor_header *hs_mtp_descs[] = { + (struct usb_descriptor_header *) &mtp_interface_desc, + (struct usb_descriptor_header *) &mtp_highspeed_in_desc, + (struct usb_descriptor_header *) &mtp_highspeed_out_desc, + (struct usb_descriptor_header *) &mtp_intr_desc, + NULL, +}; + +static struct usb_descriptor_header *ss_mtp_descs[] = { + (struct usb_descriptor_header *) &mtp_interface_desc, + (struct usb_descriptor_header *) &mtp_ss_in_desc, + (struct usb_descriptor_header *) &mtp_ss_in_comp_desc, + (struct usb_descriptor_header *) &mtp_ss_out_desc, + (struct usb_descriptor_header *) &mtp_ss_out_comp_desc, + (struct usb_descriptor_header *) &mtp_intr_desc, + (struct usb_descriptor_header *) &mtp_intr_ss_comp_desc, + NULL, +}; + +static struct usb_descriptor_header *fs_ptp_descs[] = { + (struct usb_descriptor_header *) &ptp_interface_desc, + (struct usb_descriptor_header *) &mtp_fullspeed_in_desc, + (struct usb_descriptor_header *) &mtp_fullspeed_out_desc, + (struct usb_descriptor_header *) &mtp_intr_desc, + NULL, +}; + +static struct usb_descriptor_header *hs_ptp_descs[] = { + (struct usb_descriptor_header *) &ptp_interface_desc, + (struct usb_descriptor_header *) &mtp_highspeed_in_desc, + (struct usb_descriptor_header *) &mtp_highspeed_out_desc, + (struct usb_descriptor_header *) &mtp_intr_desc, + NULL, +}; + +static struct usb_descriptor_header *ss_ptp_descs[] = { + (struct usb_descriptor_header *) &ptp_interface_desc, + (struct usb_descriptor_header *) &mtp_ss_in_desc, + (struct usb_descriptor_header *) &mtp_ss_in_comp_desc, + (struct usb_descriptor_header *) &mtp_ss_out_desc, + (struct usb_descriptor_header *) &mtp_ss_out_comp_desc, + (struct usb_descriptor_header *) &mtp_intr_desc, + (struct usb_descriptor_header *) &mtp_intr_ss_comp_desc, + NULL, +}; + +static struct usb_string mtp_string_defs[] = { + /* Naming interface "MTP" so libmtp will recognize us */ + [INTERFACE_STRING_INDEX].s = "MTP", + { }, /* end of list */ +}; + +static struct usb_gadget_strings mtp_string_table = { + .language = 0x0409, /* en-US */ + .strings = mtp_string_defs, +}; + +static struct usb_gadget_strings *mtp_strings[] = { + &mtp_string_table, + NULL, +}; + +/* Microsoft MTP OS String */ +static u8 mtp_os_string[] = { + 18, /* sizeof(mtp_os_string) */ + USB_DT_STRING, + /* Signature field: "MSFT100" */ + 'M', 0, 'S', 0, 'F', 0, 'T', 0, '1', 0, '0', 0, '0', 0, + /* vendor code */ + 1, + /* padding */ + 0 +}; + +/* Microsoft Extended Configuration Descriptor Header Section */ +struct mtp_ext_config_desc_header { + __le32 dwLength; + __u16 bcdVersion; + __le16 wIndex; + __u8 bCount; + __u8 reserved[7]; +}; + +/* Microsoft Extended Configuration Descriptor Function Section */ +struct mtp_ext_config_desc_function { + __u8 bFirstInterfaceNumber; + __u8 bInterfaceCount; + __u8 compatibleID[8]; + __u8 subCompatibleID[8]; + __u8 reserved[6]; +}; + +/* MTP Extended Configuration Descriptor */ +struct { + struct mtp_ext_config_desc_header header; + struct mtp_ext_config_desc_function function; +} mtp_ext_config_desc = { + .header = { + .dwLength = __constant_cpu_to_le32(sizeof(mtp_ext_config_desc)), + .bcdVersion = __constant_cpu_to_le16(0x0100), + .wIndex = __constant_cpu_to_le16(4), + .bCount = 1, + }, + .function = { + .bFirstInterfaceNumber = 0, + .bInterfaceCount = 1, + .compatibleID = { 'M', 'T', 'P' }, + }, +}; + +struct mtp_device_status { + __le16 wLength; + __le16 wCode; +}; + +struct mtp_data_header { + /* length of packet, including this header */ + __le32 length; + /* container type (2 for data packet) */ + __le16 type; + /* MTP command code */ + __le16 command; + /* MTP transaction ID */ + __le32 transaction_id; +}; + +struct mtp_instance { + struct usb_function_instance func_inst; + const char *name; + struct mtp_dev *dev; + char mtp_ext_compat_id[16]; + struct usb_os_desc mtp_os_desc; +}; + +/* temporary variable used between mtp_open() and mtp_gadget_bind() */ +static struct mtp_dev *_mtp_dev; + +static inline struct mtp_dev *func_to_mtp(struct usb_function *f) +{ + return container_of(f, struct mtp_dev, function); +} + +static struct usb_request *mtp_request_new(struct usb_ep *ep, int buffer_size) +{ + struct usb_request *req = usb_ep_alloc_request(ep, GFP_KERNEL); + + if (!req) + return NULL; + + /* now allocate buffers for the requests */ + req->buf = kmalloc(buffer_size, GFP_KERNEL); + if (!req->buf) { + usb_ep_free_request(ep, req); + return NULL; + } + + return req; +} + +static void mtp_request_free(struct usb_request *req, struct usb_ep *ep) +{ + if (req) { + kfree(req->buf); + usb_ep_free_request(ep, req); + } +} + +static inline int mtp_lock(atomic_t *excl) +{ + if (atomic_inc_return(excl) == 1) { + return 0; + } else { + atomic_dec(excl); + return -1; + } +} + +static inline void mtp_unlock(atomic_t *excl) +{ + atomic_dec(excl); +} + +/* add a request to the tail of a list */ +static void mtp_req_put(struct mtp_dev *dev, struct list_head *head, + struct usb_request *req) +{ + unsigned long flags; + + spin_lock_irqsave(&dev->lock, flags); + list_add_tail(&req->list, head); + spin_unlock_irqrestore(&dev->lock, flags); +} + +/* remove a request from the head of a list */ +static struct usb_request +*mtp_req_get(struct mtp_dev *dev, struct list_head *head) +{ + unsigned long flags; + struct usb_request *req; + + spin_lock_irqsave(&dev->lock, flags); + if (list_empty(head)) { + req = 0; + } else { + req = list_first_entry(head, struct usb_request, list); + list_del(&req->list); + } + spin_unlock_irqrestore(&dev->lock, flags); + return req; +} + +static void mtp_complete_in(struct usb_ep *ep, struct usb_request *req) +{ + struct mtp_dev *dev = _mtp_dev; + + if (req->status != 0) + dev->state = STATE_ERROR; + + mtp_req_put(dev, &dev->tx_idle, req); + + wake_up(&dev->write_wq); +} + +static void mtp_complete_out(struct usb_ep *ep, struct usb_request *req) +{ + struct mtp_dev *dev = _mtp_dev; + + dev->rx_done = 1; + if (req->status != 0) + dev->state = STATE_ERROR; + + wake_up(&dev->read_wq); +} + +static void mtp_complete_intr(struct usb_ep *ep, struct usb_request *req) +{ + struct mtp_dev *dev = _mtp_dev; + + if (req->status != 0) + dev->state = STATE_ERROR; + + mtp_req_put(dev, &dev->intr_idle, req); + + wake_up(&dev->intr_wq); +} + +static int mtp_create_bulk_endpoints(struct mtp_dev *dev, + struct usb_endpoint_descriptor *in_desc, + struct usb_endpoint_descriptor *out_desc, + struct usb_endpoint_descriptor *intr_desc) +{ + struct usb_composite_dev *cdev = dev->cdev; + struct usb_request *req; + struct usb_ep *ep; + int i; + + DBG(cdev, "create_bulk_endpoints dev: %p\n", dev); + + ep = usb_ep_autoconfig(cdev->gadget, in_desc); + if (!ep) { + DBG(cdev, "usb_ep_autoconfig for ep_in failed\n"); + return -ENODEV; + } + DBG(cdev, "usb_ep_autoconfig for ep_in got %s\n", ep->name); + ep->driver_data = dev; /* claim the endpoint */ + dev->ep_in = ep; + + ep = usb_ep_autoconfig(cdev->gadget, out_desc); + if (!ep) { + DBG(cdev, "usb_ep_autoconfig for ep_out failed\n"); + return -ENODEV; + } + DBG(cdev, "usb_ep_autoconfig for mtp ep_out got %s\n", ep->name); + ep->driver_data = dev; /* claim the endpoint */ + dev->ep_out = ep; + + ep = usb_ep_autoconfig(cdev->gadget, intr_desc); + if (!ep) { + DBG(cdev, "usb_ep_autoconfig for ep_intr failed\n"); + return -ENODEV; + } + DBG(cdev, "usb_ep_autoconfig for mtp ep_intr got %s\n", ep->name); + ep->driver_data = dev; /* claim the endpoint */ + dev->ep_intr = ep; + + /* now allocate requests for our endpoints */ + for (i = 0; i < TX_REQ_MAX; i++) { + req = mtp_request_new(dev->ep_in, MTP_BULK_BUFFER_SIZE); + if (!req) + goto fail; + req->complete = mtp_complete_in; + mtp_req_put(dev, &dev->tx_idle, req); + } + for (i = 0; i < RX_REQ_MAX; i++) { + req = mtp_request_new(dev->ep_out, MTP_BULK_BUFFER_SIZE); + if (!req) + goto fail; + req->complete = mtp_complete_out; + dev->rx_req[i] = req; + } + for (i = 0; i < INTR_REQ_MAX; i++) { + req = mtp_request_new(dev->ep_intr, INTR_BUFFER_SIZE); + if (!req) + goto fail; + req->complete = mtp_complete_intr; + mtp_req_put(dev, &dev->intr_idle, req); + } + + return 0; + +fail: + pr_err("mtp_bind() could not allocate requests\n"); + return -1; +} + +static ssize_t mtp_read(struct file *fp, char __user *buf, + size_t count, loff_t *pos) +{ + struct mtp_dev *dev = fp->private_data; + struct usb_composite_dev *cdev = dev->cdev; + struct usb_request *req; + ssize_t r = count; + unsigned xfer; + int ret = 0; + size_t len = 0; + + DBG(cdev, "mtp_read(%zu)\n", count); + + /* we will block until we're online */ + DBG(cdev, "mtp_read: waiting for online state\n"); + ret = wait_event_interruptible(dev->read_wq, + dev->state != STATE_OFFLINE); + if (ret < 0) { + r = ret; + goto done; + } + spin_lock_irq(&dev->lock); + if (dev->ep_out->desc) { + len = usb_ep_align_maybe(cdev->gadget, dev->ep_out, count); + if (len > MTP_BULK_BUFFER_SIZE) { + spin_unlock_irq(&dev->lock); + return -EINVAL; + } + } + + if (dev->state == STATE_CANCELED) { + /* report cancelation to userspace */ + dev->state = STATE_READY; + spin_unlock_irq(&dev->lock); + return -ECANCELED; + } + dev->state = STATE_BUSY; + spin_unlock_irq(&dev->lock); + +requeue_req: + /* queue a request */ + req = dev->rx_req[0]; + req->length = len; + dev->rx_done = 0; + ret = usb_ep_queue(dev->ep_out, req, GFP_KERNEL); + if (ret < 0) { + r = -EIO; + goto done; + } else { + DBG(cdev, "rx %p queue\n", req); + } + + /* wait for a request to complete */ + ret = wait_event_interruptible(dev->read_wq, dev->rx_done); + if (ret < 0) { + r = ret; + usb_ep_dequeue(dev->ep_out, req); + goto done; + } + if (dev->state == STATE_BUSY) { + /* If we got a 0-len packet, throw it back and try again. */ + if (req->actual == 0) + goto requeue_req; + + DBG(cdev, "rx %p %d\n", req, req->actual); + xfer = (req->actual < count) ? req->actual : count; + r = xfer; + if (copy_to_user(buf, req->buf, xfer)) + r = -EFAULT; + } else + r = -EIO; + +done: + spin_lock_irq(&dev->lock); + if (dev->state == STATE_CANCELED) + r = -ECANCELED; + else if (dev->state != STATE_OFFLINE) + dev->state = STATE_READY; + spin_unlock_irq(&dev->lock); + + DBG(cdev, "mtp_read returning %zd\n", r); + return r; +} + +static ssize_t mtp_write(struct file *fp, const char __user *buf, + size_t count, loff_t *pos) +{ + struct mtp_dev *dev = fp->private_data; + struct usb_composite_dev *cdev = dev->cdev; + struct usb_request *req = 0; + ssize_t r = count; + unsigned xfer; + int sendZLP = 0; + int ret; + + DBG(cdev, "mtp_write(%zu)\n", count); + + spin_lock_irq(&dev->lock); + if (dev->state == STATE_CANCELED) { + /* report cancelation to userspace */ + dev->state = STATE_READY; + spin_unlock_irq(&dev->lock); + return -ECANCELED; + } + if (dev->state == STATE_OFFLINE) { + spin_unlock_irq(&dev->lock); + return -ENODEV; + } + dev->state = STATE_BUSY; + spin_unlock_irq(&dev->lock); + + /* we need to send a zero length packet to signal the end of transfer + * if the transfer size is aligned to a packet boundary. + */ + if ((count & (dev->ep_in->maxpacket - 1)) == 0) + sendZLP = 1; + + while (count > 0 || sendZLP) { + /* so we exit after sending ZLP */ + if (count == 0) + sendZLP = 0; + + if (dev->state != STATE_BUSY) { + DBG(cdev, "mtp_write dev->error\n"); + r = -EIO; + break; + } + + /* get an idle tx request to use */ + req = 0; + ret = wait_event_interruptible(dev->write_wq, + ((req = mtp_req_get(dev, &dev->tx_idle)) + || dev->state != STATE_BUSY)); + if (!req) { + r = ret; + break; + } + + if (count > MTP_BULK_BUFFER_SIZE) + xfer = MTP_BULK_BUFFER_SIZE; + else + xfer = count; + if (xfer && copy_from_user(req->buf, buf, xfer)) { + r = -EFAULT; + break; + } + + req->length = xfer; + ret = usb_ep_queue(dev->ep_in, req, GFP_KERNEL); + if (ret < 0) { + DBG(cdev, "mtp_write: xfer error %d\n", ret); + r = -EIO; + break; + } + + buf += xfer; + count -= xfer; + + /* zero this so we don't try to free it on error exit */ + req = 0; + } + + if (req) + mtp_req_put(dev, &dev->tx_idle, req); + + spin_lock_irq(&dev->lock); + if (dev->state == STATE_CANCELED) + r = -ECANCELED; + else if (dev->state != STATE_OFFLINE) + dev->state = STATE_READY; + spin_unlock_irq(&dev->lock); + + DBG(cdev, "mtp_write returning %zd\n", r); + return r; +} + +/* read from a local file and write to USB */ +static void send_file_work(struct work_struct *data) +{ + struct mtp_dev *dev = container_of(data, struct mtp_dev, + send_file_work); + struct usb_composite_dev *cdev = dev->cdev; + struct usb_request *req = 0; + struct mtp_data_header *header; + struct file *filp; + loff_t offset; + int64_t count; + int xfer, ret, hdr_size; + int r = 0; + int sendZLP = 0; + + /* read our parameters */ + smp_rmb(); + filp = dev->xfer_file; + offset = dev->xfer_file_offset; + count = dev->xfer_file_length; + + DBG(cdev, "send_file_work(%lld %lld)\n", offset, count); + + if (dev->xfer_send_header) { + hdr_size = sizeof(struct mtp_data_header); + count += hdr_size; + } else { + hdr_size = 0; + } + + /* we need to send a zero length packet to signal the end of transfer + * if the transfer size is aligned to a packet boundary. + */ + if ((count & (dev->ep_in->maxpacket - 1)) == 0) + sendZLP = 1; + + while (count > 0 || sendZLP) { + /* so we exit after sending ZLP */ + if (count == 0) + sendZLP = 0; + + /* get an idle tx request to use */ + req = 0; + ret = wait_event_interruptible(dev->write_wq, + (req = mtp_req_get(dev, &dev->tx_idle)) + || dev->state != STATE_BUSY); + if (dev->state == STATE_CANCELED) { + r = -ECANCELED; + break; + } + if (!req) { + r = ret; + break; + } + + if (count > MTP_BULK_BUFFER_SIZE) + xfer = MTP_BULK_BUFFER_SIZE; + else + xfer = count; + + if (hdr_size) { + /* prepend MTP data header */ + header = (struct mtp_data_header *)req->buf; + /* + * set file size with header according to + * MTP Specification v1.0 + */ + header->length = (count > MTP_MAX_FILE_SIZE) ? + MTP_MAX_FILE_SIZE : __cpu_to_le32(count); + header->type = __cpu_to_le16(2); /* data packet */ + header->command = __cpu_to_le16(dev->xfer_command); + header->transaction_id = + __cpu_to_le32(dev->xfer_transaction_id); + } + + ret = vfs_read(filp, req->buf + hdr_size, xfer - hdr_size, + &offset); + if (ret < 0) { + r = ret; + break; + } + xfer = ret + hdr_size; + hdr_size = 0; + + req->length = xfer; + ret = usb_ep_queue(dev->ep_in, req, GFP_KERNEL); + if (ret < 0) { + DBG(cdev, "send_file_work: xfer error %d\n", ret); + dev->state = STATE_ERROR; + r = -EIO; + break; + } + + count -= xfer; + + /* zero this so we don't try to free it on error exit */ + req = 0; + } + + if (req) + mtp_req_put(dev, &dev->tx_idle, req); + + DBG(cdev, "send_file_work returning %d\n", r); + /* write the result */ + dev->xfer_result = r; + smp_wmb(); +} + +/* read from USB and write to a local file */ +static void receive_file_work(struct work_struct *data) +{ + struct mtp_dev *dev = container_of(data, struct mtp_dev, + receive_file_work); + struct usb_composite_dev *cdev = dev->cdev; + struct usb_request *read_req = NULL, *write_req = NULL; + struct file *filp; + loff_t offset; + int64_t count, len; + int ret, cur_buf = 0; + int r = 0; + + /* read our parameters */ + smp_rmb(); + filp = dev->xfer_file; + offset = dev->xfer_file_offset; + count = dev->xfer_file_length; + + DBG(cdev, "receive_file_work(%lld)\n", count); + + while (count > 0 || write_req) { + if (count > 0) { + /* queue a request */ + read_req = dev->rx_req[cur_buf]; + cur_buf = (cur_buf + 1) % RX_REQ_MAX; + + len = usb_ep_align_maybe(cdev->gadget, dev->ep_out, count); + if (len > MTP_BULK_BUFFER_SIZE) + len = MTP_BULK_BUFFER_SIZE; + read_req->length = len; + dev->rx_done = 0; + ret = usb_ep_queue(dev->ep_out, read_req, GFP_KERNEL); + if (ret < 0) { + r = -EIO; + dev->state = STATE_ERROR; + break; + } + } + + if (write_req) { + DBG(cdev, "rx %p %d\n", write_req, write_req->actual); + ret = vfs_write(filp, write_req->buf, write_req->actual, + &offset); + DBG(cdev, "vfs_write %d\n", ret); + if (ret != write_req->actual) { + r = -EIO; + dev->state = STATE_ERROR; + break; + } + write_req = NULL; + } + + if (read_req) { + /* wait for our last read to complete */ + ret = wait_event_interruptible(dev->read_wq, + dev->rx_done || dev->state != STATE_BUSY); + if (dev->state == STATE_CANCELED) { + r = -ECANCELED; + if (!dev->rx_done) + usb_ep_dequeue(dev->ep_out, read_req); + break; + } + if (read_req->status) { + r = read_req->status; + break; + } + /* if xfer_file_length is 0xFFFFFFFF, then we read until + * we get a zero length packet + */ + if (count != 0xFFFFFFFF) + count -= read_req->actual; + if (read_req->actual < read_req->length) { + /* + * short packet is used to signal EOF for + * sizes > 4 gig + */ + DBG(cdev, "got short packet\n"); + count = 0; + } + + write_req = read_req; + read_req = NULL; + } + } + + DBG(cdev, "receive_file_work returning %d\n", r); + /* write the result */ + dev->xfer_result = r; + smp_wmb(); +} + +static int mtp_send_event(struct mtp_dev *dev, struct mtp_event *event) +{ + struct usb_request *req = NULL; + int ret; + int length = event->length; + + DBG(dev->cdev, "mtp_send_event(%zu)\n", event->length); + + if (length < 0 || length > INTR_BUFFER_SIZE) + return -EINVAL; + if (dev->state == STATE_OFFLINE) + return -ENODEV; + + ret = wait_event_interruptible_timeout(dev->intr_wq, + (req = mtp_req_get(dev, &dev->intr_idle)), + msecs_to_jiffies(1000)); + if (!req) + return -ETIME; + + if (copy_from_user(req->buf, (void __user *)event->data, length)) { + mtp_req_put(dev, &dev->intr_idle, req); + return -EFAULT; + } + req->length = length; + ret = usb_ep_queue(dev->ep_intr, req, GFP_KERNEL); + if (ret) + mtp_req_put(dev, &dev->intr_idle, req); + + return ret; +} + +static long mtp_ioctl(struct file *fp, unsigned code, unsigned long value) +{ + struct mtp_dev *dev = fp->private_data; + struct file *filp = NULL; + int ret = -EINVAL; + + if (mtp_lock(&dev->ioctl_excl)) + return -EBUSY; + + switch (code) { + case MTP_SEND_FILE: + case MTP_RECEIVE_FILE: + case MTP_SEND_FILE_WITH_HEADER: + { + struct mtp_file_range mfr; + struct work_struct *work; + + spin_lock_irq(&dev->lock); + if (dev->state == STATE_CANCELED) { + /* report cancelation to userspace */ + dev->state = STATE_READY; + spin_unlock_irq(&dev->lock); + ret = -ECANCELED; + goto out; + } + if (dev->state == STATE_OFFLINE) { + spin_unlock_irq(&dev->lock); + ret = -ENODEV; + goto out; + } + dev->state = STATE_BUSY; + spin_unlock_irq(&dev->lock); + + if (copy_from_user(&mfr, (void __user *)value, sizeof(mfr))) { + ret = -EFAULT; + goto fail; + } + /* hold a reference to the file while we are working with it */ + filp = fget(mfr.fd); + if (!filp) { + ret = -EBADF; + goto fail; + } + + /* write the parameters */ + dev->xfer_file = filp; + dev->xfer_file_offset = mfr.offset; + dev->xfer_file_length = mfr.length; + smp_wmb(); + + if (code == MTP_SEND_FILE_WITH_HEADER) { + work = &dev->send_file_work; + dev->xfer_send_header = 1; + dev->xfer_command = mfr.command; + dev->xfer_transaction_id = mfr.transaction_id; + } else if (code == MTP_SEND_FILE) { + work = &dev->send_file_work; + dev->xfer_send_header = 0; + } else { + work = &dev->receive_file_work; + } + + /* We do the file transfer on a work queue so it will run + * in kernel context, which is necessary for vfs_read and + * vfs_write to use our buffers in the kernel address space. + */ + queue_work(dev->wq, work); + /* wait for operation to complete */ + flush_workqueue(dev->wq); + fput(filp); + + /* read the result */ + smp_rmb(); + ret = dev->xfer_result; + break; + } + case MTP_SEND_EVENT: + { + struct mtp_event event; + /* return here so we don't change dev->state below, + * which would interfere with bulk transfer state. + */ + if (copy_from_user(&event, (void __user *)value, sizeof(event))) + ret = -EFAULT; + else + ret = mtp_send_event(dev, &event); + goto out; + } + } + +fail: + spin_lock_irq(&dev->lock); + if (dev->state == STATE_CANCELED) + ret = -ECANCELED; + else if (dev->state != STATE_OFFLINE) + dev->state = STATE_READY; + spin_unlock_irq(&dev->lock); +out: + mtp_unlock(&dev->ioctl_excl); + DBG(dev->cdev, "ioctl returning %d\n", ret); + return ret; +} + +static int mtp_open(struct inode *ip, struct file *fp) +{ + printk(KERN_INFO "mtp_open\n"); + if (mtp_lock(&_mtp_dev->open_excl)) + return -EBUSY; + + /* clear any error condition */ + if (_mtp_dev->state != STATE_OFFLINE) + _mtp_dev->state = STATE_READY; + + fp->private_data = _mtp_dev; + return 0; +} + +static int mtp_release(struct inode *ip, struct file *fp) +{ + printk(KERN_INFO "mtp_release\n"); + + mtp_unlock(&_mtp_dev->open_excl); + return 0; +} + +/* file operations for /dev/mtp_usb */ +static const struct file_operations mtp_fops = { + .owner = THIS_MODULE, + .read = mtp_read, + .write = mtp_write, + .unlocked_ioctl = mtp_ioctl, + .open = mtp_open, + .release = mtp_release, +}; + +static struct miscdevice mtp_device = { + .minor = MISC_DYNAMIC_MINOR, + .name = mtp_shortname, + .fops = &mtp_fops, +}; + +static int mtp_ctrlrequest(struct usb_composite_dev *cdev, + const struct usb_ctrlrequest *ctrl) +{ + struct mtp_dev *dev = _mtp_dev; + int value = -EOPNOTSUPP; + u16 w_index = le16_to_cpu(ctrl->wIndex); + u16 w_value = le16_to_cpu(ctrl->wValue); + u16 w_length = le16_to_cpu(ctrl->wLength); + unsigned long flags; + + VDBG(cdev, "mtp_ctrlrequest " + "%02x.%02x v%04x i%04x l%u\n", + ctrl->bRequestType, ctrl->bRequest, + w_value, w_index, w_length); + + /* Handle MTP OS string */ + if (ctrl->bRequestType == + (USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE) + && ctrl->bRequest == USB_REQ_GET_DESCRIPTOR + && (w_value >> 8) == USB_DT_STRING + && (w_value & 0xFF) == MTP_OS_STRING_ID) { + value = (w_length < sizeof(mtp_os_string) + ? w_length : sizeof(mtp_os_string)); + memcpy(cdev->req->buf, mtp_os_string, value); + } else if ((ctrl->bRequestType & USB_TYPE_MASK) == USB_TYPE_VENDOR) { + /* Handle MTP OS descriptor */ + DBG(cdev, "vendor request: %d index: %d value: %d length: %d\n", + ctrl->bRequest, w_index, w_value, w_length); + + if (ctrl->bRequest == 1 + && (ctrl->bRequestType & USB_DIR_IN) + && (w_index == 4 || w_index == 5)) { + value = (w_length < sizeof(mtp_ext_config_desc) ? + w_length : sizeof(mtp_ext_config_desc)); + memcpy(cdev->req->buf, &mtp_ext_config_desc, value); + } + } else if ((ctrl->bRequestType & USB_TYPE_MASK) == USB_TYPE_CLASS) { + DBG(cdev, "class request: %d index: %d value: %d length: %d\n", + ctrl->bRequest, w_index, w_value, w_length); + + if (ctrl->bRequest == MTP_REQ_CANCEL && w_index == 0 + && w_value == 0) { + DBG(cdev, "MTP_REQ_CANCEL\n"); + + spin_lock_irqsave(&dev->lock, flags); + if (dev->state == STATE_BUSY) { + dev->state = STATE_CANCELED; + wake_up(&dev->read_wq); + wake_up(&dev->write_wq); + } + spin_unlock_irqrestore(&dev->lock, flags); + + /* We need to queue a request to read the remaining + * bytes, but we don't actually need to look at + * the contents. + */ + value = w_length; + } else if (ctrl->bRequest == MTP_REQ_GET_DEVICE_STATUS + && w_index == 0 && w_value == 0) { + struct mtp_device_status *status = cdev->req->buf; + + status->wLength = + __constant_cpu_to_le16(sizeof(*status)); + + DBG(cdev, "MTP_REQ_GET_DEVICE_STATUS\n"); + spin_lock_irqsave(&dev->lock, flags); + /* device status is "busy" until we report + * the cancelation to userspace + */ + if (dev->state == STATE_CANCELED) + status->wCode = + __cpu_to_le16(MTP_RESPONSE_DEVICE_BUSY); + else + status->wCode = + __cpu_to_le16(MTP_RESPONSE_OK); + spin_unlock_irqrestore(&dev->lock, flags); + value = sizeof(*status); + } + } + + /* respond with data transfer or status phase? */ + if (value >= 0) { + int rc; + + cdev->req->zero = value < w_length; + cdev->req->length = value; + rc = usb_ep_queue(cdev->gadget->ep0, cdev->req, GFP_ATOMIC); + if (rc < 0) + ERROR(cdev, "%s: response queue error\n", __func__); + } + return value; +} + +static int +mtp_function_bind(struct usb_configuration *c, struct usb_function *f) +{ + struct usb_composite_dev *cdev = c->cdev; + struct mtp_dev *dev = func_to_mtp(f); + int id; + int ret; + struct mtp_instance *fi_mtp; + + dev->cdev = cdev; + DBG(cdev, "mtp_function_bind dev: %p\n", dev); + + /* allocate interface ID(s) */ + id = usb_interface_id(c, f); + if (id < 0) + return id; + mtp_interface_desc.bInterfaceNumber = id; + + if (mtp_string_defs[INTERFACE_STRING_INDEX].id == 0) { + ret = usb_string_id(c->cdev); + if (ret < 0) + return ret; + mtp_string_defs[INTERFACE_STRING_INDEX].id = ret; + mtp_interface_desc.iInterface = ret; + } + + fi_mtp = container_of(f->fi, struct mtp_instance, func_inst); + + if (cdev->use_os_string) { + f->os_desc_table = kzalloc(sizeof(*f->os_desc_table), + GFP_KERNEL); + if (!f->os_desc_table) + return -ENOMEM; + f->os_desc_n = 1; + f->os_desc_table[0].os_desc = &fi_mtp->mtp_os_desc; + } + + /* allocate endpoints */ + ret = mtp_create_bulk_endpoints(dev, &mtp_fullspeed_in_desc, + &mtp_fullspeed_out_desc, &mtp_intr_desc); + if (ret) + return ret; + + /* support high speed hardware */ + if (gadget_is_dualspeed(c->cdev->gadget)) { + mtp_highspeed_in_desc.bEndpointAddress = + mtp_fullspeed_in_desc.bEndpointAddress; + mtp_highspeed_out_desc.bEndpointAddress = + mtp_fullspeed_out_desc.bEndpointAddress; + } + /* support super speed hardware */ + if (gadget_is_superspeed(c->cdev->gadget)) { + unsigned max_burst; + + /* Calculate bMaxBurst, we know packet size is 1024 */ + max_burst = min_t(unsigned, MTP_BULK_BUFFER_SIZE / 1024, 15); + mtp_ss_in_desc.bEndpointAddress = + mtp_fullspeed_in_desc.bEndpointAddress; + mtp_ss_in_comp_desc.bMaxBurst = max_burst; + mtp_ss_out_desc.bEndpointAddress = + mtp_fullspeed_out_desc.bEndpointAddress; + mtp_ss_out_comp_desc.bMaxBurst = max_burst; + } + + DBG(cdev, "%s speed %s: IN/%s, OUT/%s\n", + gadget_is_superspeed(c->cdev->gadget) ? "super" : + (gadget_is_dualspeed(c->cdev->gadget) ? "dual" : "full"), + f->name, dev->ep_in->name, dev->ep_out->name); + return 0; +} + +static void +mtp_function_unbind(struct usb_configuration *c, struct usb_function *f) +{ + struct mtp_dev *dev = func_to_mtp(f); + struct usb_request *req; + int i; + + mtp_string_defs[INTERFACE_STRING_INDEX].id = 0; + while ((req = mtp_req_get(dev, &dev->tx_idle))) + mtp_request_free(req, dev->ep_in); + for (i = 0; i < RX_REQ_MAX; i++) + mtp_request_free(dev->rx_req[i], dev->ep_out); + while ((req = mtp_req_get(dev, &dev->intr_idle))) + mtp_request_free(req, dev->ep_intr); + dev->state = STATE_OFFLINE; + kfree(f->os_desc_table); + f->os_desc_n = 0; +} + +static int mtp_function_set_alt(struct usb_function *f, + unsigned intf, unsigned alt) +{ + struct mtp_dev *dev = func_to_mtp(f); + struct usb_composite_dev *cdev = f->config->cdev; + int ret; + + DBG(cdev, "mtp_function_set_alt intf: %d alt: %d\n", intf, alt); + + ret = config_ep_by_speed(cdev->gadget, f, dev->ep_in); + if (ret) + return ret; + + ret = usb_ep_enable(dev->ep_in); + if (ret) + return ret; + + ret = config_ep_by_speed(cdev->gadget, f, dev->ep_out); + if (ret) + return ret; + + ret = usb_ep_enable(dev->ep_out); + if (ret) { + usb_ep_disable(dev->ep_in); + return ret; + } + + ret = config_ep_by_speed(cdev->gadget, f, dev->ep_intr); + if (ret) + return ret; + + ret = usb_ep_enable(dev->ep_intr); + if (ret) { + usb_ep_disable(dev->ep_out); + usb_ep_disable(dev->ep_in); + return ret; + } + dev->state = STATE_READY; + + /* readers may be blocked waiting for us to go online */ + wake_up(&dev->read_wq); + return 0; +} + +static void mtp_function_disable(struct usb_function *f) +{ + struct mtp_dev *dev = func_to_mtp(f); + struct usb_composite_dev *cdev = dev->cdev; + + DBG(cdev, "mtp_function_disable\n"); + dev->state = STATE_OFFLINE; + usb_ep_disable(dev->ep_in); + usb_ep_disable(dev->ep_out); + usb_ep_disable(dev->ep_intr); + + /* readers may be blocked waiting for us to go online */ + wake_up(&dev->read_wq); + + VDBG(cdev, "%s disabled\n", dev->function.name); +} + +static int __mtp_setup(struct mtp_instance *fi_mtp) +{ + struct mtp_dev *dev; + int ret; + + dev = kzalloc(sizeof(*dev), GFP_KERNEL); + + if (fi_mtp != NULL) + fi_mtp->dev = dev; + + if (!dev) + return -ENOMEM; + + spin_lock_init(&dev->lock); + init_waitqueue_head(&dev->read_wq); + init_waitqueue_head(&dev->write_wq); + init_waitqueue_head(&dev->intr_wq); + atomic_set(&dev->open_excl, 0); + atomic_set(&dev->ioctl_excl, 0); + INIT_LIST_HEAD(&dev->tx_idle); + INIT_LIST_HEAD(&dev->intr_idle); + + dev->wq = create_singlethread_workqueue("f_mtp"); + if (!dev->wq) { + ret = -ENOMEM; + goto err1; + } + INIT_WORK(&dev->send_file_work, send_file_work); + INIT_WORK(&dev->receive_file_work, receive_file_work); + + _mtp_dev = dev; + + ret = misc_register(&mtp_device); + if (ret) + goto err2; + + return 0; + +err2: + destroy_workqueue(dev->wq); +err1: + _mtp_dev = NULL; + kfree(dev); + printk(KERN_ERR "mtp gadget driver failed to initialize\n"); + return ret; +} + +static int mtp_setup_configfs(struct mtp_instance *fi_mtp) +{ + return __mtp_setup(fi_mtp); +} + + +static void mtp_cleanup(void) +{ + struct mtp_dev *dev = _mtp_dev; + + if (!dev) + return; + + misc_deregister(&mtp_device); + destroy_workqueue(dev->wq); + _mtp_dev = NULL; + kfree(dev); +} + +static struct mtp_instance *to_mtp_instance(struct config_item *item) +{ + return container_of(to_config_group(item), struct mtp_instance, + func_inst.group); +} + +static void mtp_attr_release(struct config_item *item) +{ + struct mtp_instance *fi_mtp = to_mtp_instance(item); + + usb_put_function_instance(&fi_mtp->func_inst); +} + +static struct configfs_item_operations mtp_item_ops = { + .release = mtp_attr_release, +}; + +static struct config_item_type mtp_func_type = { + .ct_item_ops = &mtp_item_ops, + .ct_owner = THIS_MODULE, +}; + + +static struct mtp_instance *to_fi_mtp(struct usb_function_instance *fi) +{ + return container_of(fi, struct mtp_instance, func_inst); +} + +static int mtp_set_inst_name(struct usb_function_instance *fi, const char *name) +{ + struct mtp_instance *fi_mtp; + char *ptr; + int name_len; + + name_len = strlen(name) + 1; + if (name_len > MAX_INST_NAME_LEN) + return -ENAMETOOLONG; + + ptr = kstrndup(name, name_len, GFP_KERNEL); + if (!ptr) + return -ENOMEM; + + fi_mtp = to_fi_mtp(fi); + fi_mtp->name = ptr; + + return 0; +} + +static void mtp_free_inst(struct usb_function_instance *fi) +{ + struct mtp_instance *fi_mtp; + + fi_mtp = to_fi_mtp(fi); + kfree(fi_mtp->name); + mtp_cleanup(); + kfree(fi_mtp); +} + +struct usb_function_instance *alloc_inst_mtp_ptp(bool mtp_config) +{ + struct mtp_instance *fi_mtp; + int ret = 0; + struct usb_os_desc *descs[1]; + char *names[1]; + + fi_mtp = kzalloc(sizeof(*fi_mtp), GFP_KERNEL); + if (!fi_mtp) + return ERR_PTR(-ENOMEM); + fi_mtp->func_inst.set_inst_name = mtp_set_inst_name; + fi_mtp->func_inst.free_func_inst = mtp_free_inst; + + fi_mtp->mtp_os_desc.ext_compat_id = fi_mtp->mtp_ext_compat_id; + INIT_LIST_HEAD(&fi_mtp->mtp_os_desc.ext_prop); + descs[0] = &fi_mtp->mtp_os_desc; + names[0] = "MTP"; + + if (mtp_config) { + ret = mtp_setup_configfs(fi_mtp); + if (ret) { + kfree(fi_mtp); + pr_err("Error setting MTP\n"); + return ERR_PTR(ret); + } + } else + fi_mtp->dev = _mtp_dev; + + config_group_init_type_name(&fi_mtp->func_inst.group, + "", &mtp_func_type); + usb_os_desc_prepare_interf_dir(&fi_mtp->func_inst.group, 1, + descs, names, THIS_MODULE); + + return &fi_mtp->func_inst; +} +EXPORT_SYMBOL_GPL(alloc_inst_mtp_ptp); + +static struct usb_function_instance *mtp_alloc_inst(void) +{ + return alloc_inst_mtp_ptp(true); +} + +static int mtp_ctrlreq_configfs(struct usb_function *f, + const struct usb_ctrlrequest *ctrl) +{ + return mtp_ctrlrequest(f->config->cdev, ctrl); +} + +static void mtp_free(struct usb_function *f) +{ + /*NO-OP: no function specific resource allocation in mtp_alloc*/ +} + +struct usb_function *function_alloc_mtp_ptp(struct usb_function_instance *fi, + bool mtp_config) +{ + struct mtp_instance *fi_mtp = to_fi_mtp(fi); + struct mtp_dev *dev; + + /* + * PTP piggybacks on MTP function so make sure we have + * created MTP function before we associate this PTP + * function with a gadget configuration. + */ + if (fi_mtp->dev == NULL) { + pr_err("Error: Create MTP function before linking" + " PTP function with a gadget configuration\n"); + pr_err("\t1: Delete existing PTP function if any\n"); + pr_err("\t2: Create MTP function\n"); + pr_err("\t3: Create and symlink PTP function" + " with a gadget configuration\n"); + return ERR_PTR(-EINVAL); /* Invalid Configuration */ + } + + dev = fi_mtp->dev; + dev->function.name = DRIVER_NAME; + dev->function.strings = mtp_strings; + if (mtp_config) { + dev->function.fs_descriptors = fs_mtp_descs; + dev->function.hs_descriptors = hs_mtp_descs; + dev->function.ss_descriptors = ss_mtp_descs; + } else { + dev->function.fs_descriptors = fs_ptp_descs; + dev->function.hs_descriptors = hs_ptp_descs; + dev->function.ss_descriptors = ss_ptp_descs; + } + dev->function.bind = mtp_function_bind; + dev->function.unbind = mtp_function_unbind; + dev->function.set_alt = mtp_function_set_alt; + dev->function.disable = mtp_function_disable; + dev->function.setup = mtp_ctrlreq_configfs; + dev->function.free_func = mtp_free; + + return &dev->function; +} +EXPORT_SYMBOL_GPL(function_alloc_mtp_ptp); + +static struct usb_function *mtp_alloc(struct usb_function_instance *fi) +{ + return function_alloc_mtp_ptp(fi, true); +} + +DECLARE_USB_FUNCTION_INIT(mtp, mtp_alloc_inst, mtp_alloc); +MODULE_LICENSE("GPL");
diff --git a/drivers/usb/gadget/function/f_mtp.h b/drivers/usb/gadget/function/f_mtp.h new file mode 100644 index 0000000..7adb1ff --- /dev/null +++ b/drivers/usb/gadget/function/f_mtp.h
@@ -0,0 +1,18 @@ +/* + * Copyright (C) 2014 Google, Inc. + * Author: Badhri Jagan Sridharan <badhri@android.com> + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +extern struct usb_function_instance *alloc_inst_mtp_ptp(bool mtp_config); +extern struct usb_function *function_alloc_mtp_ptp( + struct usb_function_instance *fi, bool mtp_config);
diff --git a/drivers/usb/gadget/function/f_ptp.c b/drivers/usb/gadget/function/f_ptp.c new file mode 100644 index 0000000..da3e4d5 --- /dev/null +++ b/drivers/usb/gadget/function/f_ptp.c
@@ -0,0 +1,38 @@ +/* + * Gadget Function Driver for PTP + * + * Copyright (C) 2014 Google, Inc. + * Author: Badhri Jagan Sridharan <badhri@android.com> + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/module.h> +#include <linux/types.h> + +#include <linux/configfs.h> +#include <linux/usb/composite.h> + +#include "f_mtp.h" + +static struct usb_function_instance *ptp_alloc_inst(void) +{ + return alloc_inst_mtp_ptp(false); +} + +static struct usb_function *ptp_alloc(struct usb_function_instance *fi) +{ + return function_alloc_mtp_ptp(fi, false); +} + +DECLARE_USB_FUNCTION_INIT(ptp, ptp_alloc_inst, ptp_alloc); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Badhri Jagan Sridharan");
diff --git a/drivers/usb/phy/Kconfig b/drivers/usb/phy/Kconfig index 125cea1..e47bb36 100644 --- a/drivers/usb/phy/Kconfig +++ b/drivers/usb/phy/Kconfig
@@ -6,6 +6,14 @@ config USB_PHY def_bool n +config USB_OTG_WAKELOCK + bool "Hold a wakelock when USB connected" + depends on PM_WAKELOCKS + select USB_OTG_UTILS + help + Select this to automatically hold a wakelock when USB is + connected, preventing suspend. + # # USB Transceiver Drivers # @@ -209,4 +217,13 @@ Provides read/write operations to the ULPI phy register set for controllers with a viewport register (e.g. Chipidea/ARC controllers). +config DUAL_ROLE_USB_INTF + bool "Generic DUAL ROLE sysfs interface" + depends on SYSFS && USB_PHY + help + A generic sysfs interface to track and change the state of + dual role usb phys. The usb phy drivers can register to + this interface to expose it capabilities to the userspace + and thereby allowing userspace to change the port mode. + endmenu
diff --git a/drivers/usb/phy/Makefile b/drivers/usb/phy/Makefile index b433e5d..f65ac3e 100644 --- a/drivers/usb/phy/Makefile +++ b/drivers/usb/phy/Makefile
@@ -3,6 +3,8 @@ # obj-$(CONFIG_USB_PHY) += phy.o obj-$(CONFIG_OF) += of.o +obj-$(CONFIG_USB_OTG_WAKELOCK) += otg-wakelock.o +obj-$(CONFIG_DUAL_ROLE_USB_INTF) += class-dual-role.o # transceiver drivers, keep the list sorted
diff --git a/drivers/usb/phy/class-dual-role.c b/drivers/usb/phy/class-dual-role.c new file mode 100644 index 0000000..51fcb54 --- /dev/null +++ b/drivers/usb/phy/class-dual-role.c
@@ -0,0 +1,529 @@ +/* + * class-dual-role.c + * + * Copyright (C) 2015 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/ctype.h> +#include <linux/device.h> +#include <linux/usb/class-dual-role.h> +#include <linux/err.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/slab.h> +#include <linux/stat.h> +#include <linux/types.h> + +#define DUAL_ROLE_NOTIFICATION_TIMEOUT 2000 + +static ssize_t dual_role_store_property(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count); +static ssize_t dual_role_show_property(struct device *dev, + struct device_attribute *attr, + char *buf); + +#define DUAL_ROLE_ATTR(_name) \ +{ \ + .attr = { .name = #_name }, \ + .show = dual_role_show_property, \ + .store = dual_role_store_property, \ +} + +static struct device_attribute dual_role_attrs[] = { + DUAL_ROLE_ATTR(supported_modes), + DUAL_ROLE_ATTR(mode), + DUAL_ROLE_ATTR(power_role), + DUAL_ROLE_ATTR(data_role), + DUAL_ROLE_ATTR(powers_vconn), +}; + +struct class *dual_role_class; +EXPORT_SYMBOL_GPL(dual_role_class); + +static struct device_type dual_role_dev_type; + +static char *kstrdupcase(const char *str, gfp_t gfp, bool to_upper) +{ + char *ret, *ustr; + + ustr = ret = kmalloc(strlen(str) + 1, gfp); + + if (!ret) + return NULL; + + while (*str) + *ustr++ = to_upper ? toupper(*str++) : tolower(*str++); + + *ustr = 0; + + return ret; +} + +static void dual_role_changed_work(struct work_struct *work) +{ + struct dual_role_phy_instance *dual_role = + container_of(work, struct dual_role_phy_instance, + changed_work); + + dev_dbg(&dual_role->dev, "%s\n", __func__); + kobject_uevent(&dual_role->dev.kobj, KOBJ_CHANGE); +} + +void dual_role_instance_changed(struct dual_role_phy_instance *dual_role) +{ + dev_dbg(&dual_role->dev, "%s\n", __func__); + pm_wakeup_event(&dual_role->dev, DUAL_ROLE_NOTIFICATION_TIMEOUT); + schedule_work(&dual_role->changed_work); +} +EXPORT_SYMBOL_GPL(dual_role_instance_changed); + +int dual_role_get_property(struct dual_role_phy_instance *dual_role, + enum dual_role_property prop, + unsigned int *val) +{ + return dual_role->desc->get_property(dual_role, prop, val); +} +EXPORT_SYMBOL_GPL(dual_role_get_property); + +int dual_role_set_property(struct dual_role_phy_instance *dual_role, + enum dual_role_property prop, + const unsigned int *val) +{ + if (!dual_role->desc->set_property) + return -ENODEV; + + return dual_role->desc->set_property(dual_role, prop, val); +} +EXPORT_SYMBOL_GPL(dual_role_set_property); + +int dual_role_property_is_writeable(struct dual_role_phy_instance *dual_role, + enum dual_role_property prop) +{ + if (!dual_role->desc->property_is_writeable) + return -ENODEV; + + return dual_role->desc->property_is_writeable(dual_role, prop); +} +EXPORT_SYMBOL_GPL(dual_role_property_is_writeable); + +static void dual_role_dev_release(struct device *dev) +{ + struct dual_role_phy_instance *dual_role = + container_of(dev, struct dual_role_phy_instance, dev); + pr_debug("device: '%s': %s\n", dev_name(dev), __func__); + kfree(dual_role); +} + +static struct dual_role_phy_instance *__must_check +__dual_role_register(struct device *parent, + const struct dual_role_phy_desc *desc) +{ + struct device *dev; + struct dual_role_phy_instance *dual_role; + int rc; + + dual_role = kzalloc(sizeof(*dual_role), GFP_KERNEL); + if (!dual_role) + return ERR_PTR(-ENOMEM); + + dev = &dual_role->dev; + + device_initialize(dev); + + dev->class = dual_role_class; + dev->type = &dual_role_dev_type; + dev->parent = parent; + dev->release = dual_role_dev_release; + dev_set_drvdata(dev, dual_role); + dual_role->desc = desc; + + rc = dev_set_name(dev, "%s", desc->name); + if (rc) + goto dev_set_name_failed; + + INIT_WORK(&dual_role->changed_work, dual_role_changed_work); + + rc = device_init_wakeup(dev, true); + if (rc) + goto wakeup_init_failed; + + rc = device_add(dev); + if (rc) + goto device_add_failed; + + dual_role_instance_changed(dual_role); + + return dual_role; + +device_add_failed: + device_init_wakeup(dev, false); +wakeup_init_failed: +dev_set_name_failed: + put_device(dev); + kfree(dual_role); + + return ERR_PTR(rc); +} + +static void dual_role_instance_unregister(struct dual_role_phy_instance + *dual_role) +{ + cancel_work_sync(&dual_role->changed_work); + device_init_wakeup(&dual_role->dev, false); + device_unregister(&dual_role->dev); +} + +static void devm_dual_role_release(struct device *dev, void *res) +{ + struct dual_role_phy_instance **dual_role = res; + + dual_role_instance_unregister(*dual_role); +} + +struct dual_role_phy_instance *__must_check +devm_dual_role_instance_register(struct device *parent, + const struct dual_role_phy_desc *desc) +{ + struct dual_role_phy_instance **ptr, *dual_role; + + ptr = devres_alloc(devm_dual_role_release, sizeof(*ptr), GFP_KERNEL); + + if (!ptr) + return ERR_PTR(-ENOMEM); + dual_role = __dual_role_register(parent, desc); + if (IS_ERR(dual_role)) { + devres_free(ptr); + } else { + *ptr = dual_role; + devres_add(parent, ptr); + } + return dual_role; +} +EXPORT_SYMBOL_GPL(devm_dual_role_instance_register); + +static int devm_dual_role_match(struct device *dev, void *res, void *data) +{ + struct dual_role_phy_instance **r = res; + + if (WARN_ON(!r || !*r)) + return 0; + + return *r == data; +} + +void devm_dual_role_instance_unregister(struct device *dev, + struct dual_role_phy_instance + *dual_role) +{ + int rc; + + rc = devres_release(dev, devm_dual_role_release, + devm_dual_role_match, dual_role); + WARN_ON(rc); +} +EXPORT_SYMBOL_GPL(devm_dual_role_instance_unregister); + +void *dual_role_get_drvdata(struct dual_role_phy_instance *dual_role) +{ + return dual_role->drv_data; +} +EXPORT_SYMBOL_GPL(dual_role_get_drvdata); + +/***************** Device attribute functions **************************/ + +/* port type */ +static char *supported_modes_text[] = { + "ufp dfp", "dfp", "ufp" +}; + +/* current mode */ +static char *mode_text[] = { + "ufp", "dfp", "none" +}; + +/* Power role */ +static char *pr_text[] = { + "source", "sink", "none" +}; + +/* Data role */ +static char *dr_text[] = { + "host", "device", "none" +}; + +/* Vconn supply */ +static char *vconn_supply_text[] = { + "n", "y" +}; + +static ssize_t dual_role_show_property(struct device *dev, + struct device_attribute *attr, char *buf) +{ + ssize_t ret = 0; + struct dual_role_phy_instance *dual_role = dev_get_drvdata(dev); + const ptrdiff_t off = attr - dual_role_attrs; + unsigned int value; + + if (off == DUAL_ROLE_PROP_SUPPORTED_MODES) { + value = dual_role->desc->supported_modes; + } else { + ret = dual_role_get_property(dual_role, off, &value); + + if (ret < 0) { + if (ret == -ENODATA) + dev_dbg(dev, + "driver has no data for `%s' property\n", + attr->attr.name); + else if (ret != -ENODEV) + dev_err(dev, + "driver failed to report `%s' property: %zd\n", + attr->attr.name, ret); + return ret; + } + } + + if (off == DUAL_ROLE_PROP_SUPPORTED_MODES) { + BUILD_BUG_ON(DUAL_ROLE_PROP_SUPPORTED_MODES_TOTAL != + ARRAY_SIZE(supported_modes_text)); + if (value < DUAL_ROLE_PROP_SUPPORTED_MODES_TOTAL) + return snprintf(buf, PAGE_SIZE, "%s\n", + supported_modes_text[value]); + else + return -EIO; + } else if (off == DUAL_ROLE_PROP_MODE) { + BUILD_BUG_ON(DUAL_ROLE_PROP_MODE_TOTAL != + ARRAY_SIZE(mode_text)); + if (value < DUAL_ROLE_PROP_MODE_TOTAL) + return snprintf(buf, PAGE_SIZE, "%s\n", + mode_text[value]); + else + return -EIO; + } else if (off == DUAL_ROLE_PROP_PR) { + BUILD_BUG_ON(DUAL_ROLE_PROP_PR_TOTAL != ARRAY_SIZE(pr_text)); + if (value < DUAL_ROLE_PROP_PR_TOTAL) + return snprintf(buf, PAGE_SIZE, "%s\n", + pr_text[value]); + else + return -EIO; + } else if (off == DUAL_ROLE_PROP_DR) { + BUILD_BUG_ON(DUAL_ROLE_PROP_DR_TOTAL != ARRAY_SIZE(dr_text)); + if (value < DUAL_ROLE_PROP_DR_TOTAL) + return snprintf(buf, PAGE_SIZE, "%s\n", + dr_text[value]); + else + return -EIO; + } else if (off == DUAL_ROLE_PROP_VCONN_SUPPLY) { + BUILD_BUG_ON(DUAL_ROLE_PROP_VCONN_SUPPLY_TOTAL != + ARRAY_SIZE(vconn_supply_text)); + if (value < DUAL_ROLE_PROP_VCONN_SUPPLY_TOTAL) + return snprintf(buf, PAGE_SIZE, "%s\n", + vconn_supply_text[value]); + else + return -EIO; + } else + return -EIO; +} + +static ssize_t dual_role_store_property(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + ssize_t ret; + struct dual_role_phy_instance *dual_role = dev_get_drvdata(dev); + const ptrdiff_t off = attr - dual_role_attrs; + unsigned int value; + int total, i; + char *dup_buf, **text_array; + bool result = false; + + dup_buf = kstrdupcase(buf, GFP_KERNEL, false); + switch (off) { + case DUAL_ROLE_PROP_MODE: + total = DUAL_ROLE_PROP_MODE_TOTAL; + text_array = mode_text; + break; + case DUAL_ROLE_PROP_PR: + total = DUAL_ROLE_PROP_PR_TOTAL; + text_array = pr_text; + break; + case DUAL_ROLE_PROP_DR: + total = DUAL_ROLE_PROP_DR_TOTAL; + text_array = dr_text; + break; + case DUAL_ROLE_PROP_VCONN_SUPPLY: + ret = strtobool(dup_buf, &result); + value = result; + if (!ret) + goto setprop; + default: + ret = -EINVAL; + goto error; + } + + for (i = 0; i <= total; i++) { + if (i == total) { + ret = -ENOTSUPP; + goto error; + } + if (!strncmp(*(text_array + i), dup_buf, + strlen(*(text_array + i)))) { + value = i; + break; + } + } + +setprop: + ret = dual_role->desc->set_property(dual_role, off, &value); + +error: + kfree(dup_buf); + + if (ret < 0) + return ret; + + return count; +} + +static umode_t dual_role_attr_is_visible(struct kobject *kobj, + struct attribute *attr, int attrno) +{ + struct device *dev = container_of(kobj, struct device, kobj); + struct dual_role_phy_instance *dual_role = dev_get_drvdata(dev); + umode_t mode = S_IRUSR | S_IRGRP | S_IROTH; + int i; + + if (attrno == DUAL_ROLE_PROP_SUPPORTED_MODES) + return mode; + + for (i = 0; i < dual_role->desc->num_properties; i++) { + int property = dual_role->desc->properties[i]; + + if (property == attrno) { + if (dual_role->desc->property_is_writeable && + dual_role_property_is_writeable(dual_role, property) + > 0) + mode |= S_IWUSR; + + return mode; + } + } + + return 0; +} + +static struct attribute *__dual_role_attrs[ARRAY_SIZE(dual_role_attrs) + 1]; + +static struct attribute_group dual_role_attr_group = { + .attrs = __dual_role_attrs, + .is_visible = dual_role_attr_is_visible, +}; + +static const struct attribute_group *dual_role_attr_groups[] = { + &dual_role_attr_group, + NULL, +}; + +void dual_role_init_attrs(struct device_type *dev_type) +{ + int i; + + dev_type->groups = dual_role_attr_groups; + + for (i = 0; i < ARRAY_SIZE(dual_role_attrs); i++) + __dual_role_attrs[i] = &dual_role_attrs[i].attr; +} + +int dual_role_uevent(struct device *dev, struct kobj_uevent_env *env) +{ + struct dual_role_phy_instance *dual_role = dev_get_drvdata(dev); + int ret = 0, j; + char *prop_buf; + char *attrname; + + dev_dbg(dev, "uevent\n"); + + if (!dual_role || !dual_role->desc) { + dev_dbg(dev, "No dual_role phy yet\n"); + return ret; + } + + dev_dbg(dev, "DUAL_ROLE_NAME=%s\n", dual_role->desc->name); + + ret = add_uevent_var(env, "DUAL_ROLE_NAME=%s", dual_role->desc->name); + if (ret) + return ret; + + prop_buf = (char *)get_zeroed_page(GFP_KERNEL); + if (!prop_buf) + return -ENOMEM; + + for (j = 0; j < dual_role->desc->num_properties; j++) { + struct device_attribute *attr; + char *line; + + attr = &dual_role_attrs[dual_role->desc->properties[j]]; + + ret = dual_role_show_property(dev, attr, prop_buf); + if (ret == -ENODEV || ret == -ENODATA) { + ret = 0; + continue; + } + + if (ret < 0) + goto out; + line = strnchr(prop_buf, PAGE_SIZE, '\n'); + if (line) + *line = 0; + + attrname = kstrdupcase(attr->attr.name, GFP_KERNEL, true); + if (!attrname) + ret = -ENOMEM; + + dev_dbg(dev, "prop %s=%s\n", attrname, prop_buf); + + ret = add_uevent_var(env, "DUAL_ROLE_%s=%s", attrname, + prop_buf); + kfree(attrname); + if (ret) + goto out; + } + +out: + free_page((unsigned long)prop_buf); + + return ret; +} + +/******************* Module Init ***********************************/ + +static int __init dual_role_class_init(void) +{ + dual_role_class = class_create(THIS_MODULE, "dual_role_usb"); + + if (IS_ERR(dual_role_class)) + return PTR_ERR(dual_role_class); + + dual_role_class->dev_uevent = dual_role_uevent; + dual_role_init_attrs(&dual_role_dev_type); + + return 0; +} + +static void __exit dual_role_class_exit(void) +{ + class_destroy(dual_role_class); +} + +subsys_initcall(dual_role_class_init); +module_exit(dual_role_class_exit);
diff --git a/drivers/usb/phy/otg-wakelock.c b/drivers/usb/phy/otg-wakelock.c new file mode 100644 index 0000000..ecd7410 --- /dev/null +++ b/drivers/usb/phy/otg-wakelock.c
@@ -0,0 +1,170 @@ +/* + * otg-wakelock.c + * + * Copyright (C) 2011 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include <linux/kernel.h> +#include <linux/device.h> +#include <linux/err.h> +#include <linux/module.h> +#include <linux/notifier.h> +#include <linux/spinlock.h> +#include <linux/usb/otg.h> + +#define TEMPORARY_HOLD_TIME 2000 + +static bool enabled = true; +static struct usb_phy *otgwl_xceiv; +static struct notifier_block otgwl_nb; + +/* + * otgwl_spinlock is held while the VBUS lock is grabbed or dropped and the + * held field is updated to match. + */ + +static DEFINE_SPINLOCK(otgwl_spinlock); + +/* + * Only one lock, but since these 3 fields are associated with each other... + */ + +struct otgwl_lock { + char name[40]; + struct wakeup_source wakesrc; + bool held; +}; + +/* + * VBUS present lock. Also used as a timed lock on charger + * connect/disconnect and USB host disconnect, to allow the system + * to react to the change in power. + */ + +static struct otgwl_lock vbus_lock; + +static void otgwl_hold(struct otgwl_lock *lock) +{ + if (!lock->held) { + __pm_stay_awake(&lock->wakesrc); + lock->held = true; + } +} + +static void otgwl_temporary_hold(struct otgwl_lock *lock) +{ + __pm_wakeup_event(&lock->wakesrc, TEMPORARY_HOLD_TIME); + lock->held = false; +} + +static void otgwl_drop(struct otgwl_lock *lock) +{ + if (lock->held) { + __pm_relax(&lock->wakesrc); + lock->held = false; + } +} + +static void otgwl_handle_event(unsigned long event) +{ + unsigned long irqflags; + + spin_lock_irqsave(&otgwl_spinlock, irqflags); + + if (!enabled) { + otgwl_drop(&vbus_lock); + spin_unlock_irqrestore(&otgwl_spinlock, irqflags); + return; + } + + switch (event) { + case USB_EVENT_VBUS: + case USB_EVENT_ENUMERATED: + otgwl_hold(&vbus_lock); + break; + + case USB_EVENT_NONE: + case USB_EVENT_ID: + case USB_EVENT_CHARGER: + otgwl_temporary_hold(&vbus_lock); + break; + + default: + break; + } + + spin_unlock_irqrestore(&otgwl_spinlock, irqflags); +} + +static int otgwl_otg_notifications(struct notifier_block *nb, + unsigned long event, void *unused) +{ + otgwl_handle_event(event); + return NOTIFY_OK; +} + +static int set_enabled(const char *val, const struct kernel_param *kp) +{ + int rv = param_set_bool(val, kp); + + if (rv) + return rv; + + if (otgwl_xceiv) + otgwl_handle_event(otgwl_xceiv->last_event); + + return 0; +} + +static struct kernel_param_ops enabled_param_ops = { + .set = set_enabled, + .get = param_get_bool, +}; + +module_param_cb(enabled, &enabled_param_ops, &enabled, 0644); +MODULE_PARM_DESC(enabled, "enable wakelock when VBUS present"); + +static int __init otg_wakelock_init(void) +{ + int ret; + struct usb_phy *phy; + + phy = usb_get_phy(USB_PHY_TYPE_USB2); + + if (IS_ERR(phy)) { + pr_err("%s: No USB transceiver found\n", __func__); + return PTR_ERR(phy); + } + otgwl_xceiv = phy; + + snprintf(vbus_lock.name, sizeof(vbus_lock.name), "vbus-%s", + dev_name(otgwl_xceiv->dev)); + wakeup_source_init(&vbus_lock.wakesrc, vbus_lock.name); + + otgwl_nb.notifier_call = otgwl_otg_notifications; + ret = usb_register_notifier(otgwl_xceiv, &otgwl_nb); + + if (ret) { + pr_err("%s: usb_register_notifier on transceiver %s" + " failed\n", __func__, + dev_name(otgwl_xceiv->dev)); + otgwl_xceiv = NULL; + wakeup_source_trash(&vbus_lock.wakesrc); + return ret; + } + + otgwl_handle_event(otgwl_xceiv->last_event); + return ret; +} + +late_initcall(otg_wakelock_init);
diff --git a/drivers/video/console/dummycon.c b/drivers/video/console/dummycon.c index b90ef96..25d5bc3 100644 --- a/drivers/video/console/dummycon.c +++ b/drivers/video/console/dummycon.c
@@ -41,12 +41,55 @@ static void dummycon_init(struct vc_data *vc, int init) vc_resize(vc, DUMMY_COLUMNS, DUMMY_ROWS); } -static int dummycon_dummy(void) +static void dummycon_deinit(struct vc_data *vc) +{ +} + +static void dummycon_clear(struct vc_data *vc, int a, int b, int c, int d) +{ +} + +static void dummycon_putc(struct vc_data *vc, int a, int b, int c) +{ +} + +static void dummycon_putcs(struct vc_data *vc, const unsigned short *s, int a, int b, int c) +{ +} + +static void dummycon_cursor(struct vc_data *vc, int a) +{ +} + +static int dummycon_scroll(struct vc_data *vc, int a, int b, int c, int d) { return 0; } -#define DUMMY (void *)dummycon_dummy +static int dummycon_switch(struct vc_data *vc) +{ + return 0; +} + +static int dummycon_blank(struct vc_data *vc, int a, int b) +{ + return 0; +} + +static int dummycon_font_set(struct vc_data *vc, struct console_font *f, unsigned u) +{ + return 0; +} + +static int dummycon_font_default(struct vc_data *vc, struct console_font *f, char *c) +{ + return 0; +} + +static int dummycon_font_copy(struct vc_data *vc, int a) +{ + return 0; +} /* * The console `switch' structure for the dummy console @@ -58,16 +101,16 @@ const struct consw dummy_con = { .owner = THIS_MODULE, .con_startup = dummycon_startup, .con_init = dummycon_init, - .con_deinit = DUMMY, - .con_clear = DUMMY, - .con_putc = DUMMY, - .con_putcs = DUMMY, - .con_cursor = DUMMY, - .con_scroll = DUMMY, - .con_switch = DUMMY, - .con_blank = DUMMY, - .con_font_set = DUMMY, - .con_font_default = DUMMY, - .con_font_copy = DUMMY, + .con_deinit = dummycon_deinit, + .con_clear = dummycon_clear, + .con_putc = dummycon_putc, + .con_putcs = dummycon_putcs, + .con_cursor = dummycon_cursor, + .con_scroll = dummycon_scroll, + .con_switch = dummycon_switch, + .con_blank = dummycon_blank, + .con_font_set = dummycon_font_set, + .con_font_default = dummycon_font_default, + .con_font_copy = dummycon_font_copy, }; EXPORT_SYMBOL_GPL(dummy_con);
diff --git a/drivers/video/fbdev/goldfishfb.c b/drivers/video/fbdev/goldfishfb.c index 14a93cb..8c93ad1d 100644 --- a/drivers/video/fbdev/goldfishfb.c +++ b/drivers/video/fbdev/goldfishfb.c
@@ -26,6 +26,7 @@ #include <linux/interrupt.h> #include <linux/ioport.h> #include <linux/platform_device.h> +#include <linux/acpi.h> enum { FB_GET_WIDTH = 0x00, @@ -234,7 +235,7 @@ static int goldfish_fb_probe(struct platform_device *pdev) fb->fb.var.activate = FB_ACTIVATE_NOW; fb->fb.var.height = readl(fb->reg_base + FB_GET_PHYS_HEIGHT); fb->fb.var.width = readl(fb->reg_base + FB_GET_PHYS_WIDTH); - fb->fb.var.pixclock = 10000; + fb->fb.var.pixclock = 0; fb->fb.var.red.offset = 11; fb->fb.var.red.length = 5; @@ -305,12 +306,25 @@ static int goldfish_fb_remove(struct platform_device *pdev) return 0; } +static const struct of_device_id goldfish_fb_of_match[] = { + { .compatible = "google,goldfish-fb", }, + {}, +}; +MODULE_DEVICE_TABLE(of, goldfish_fb_of_match); + +static const struct acpi_device_id goldfish_fb_acpi_match[] = { + { "GFSH0004", 0 }, + { }, +}; +MODULE_DEVICE_TABLE(acpi, goldfish_fb_acpi_match); static struct platform_driver goldfish_fb_driver = { .probe = goldfish_fb_probe, .remove = goldfish_fb_remove, .driver = { - .name = "goldfish_fb" + .name = "goldfish_fb", + .of_match_table = goldfish_fb_of_match, + .acpi_match_table = ACPI_PTR(goldfish_fb_acpi_match), } };
diff --git a/drivers/w1/masters/ds2482.c b/drivers/w1/masters/ds2482.c index 2e30db1..fa13fa8 100644 --- a/drivers/w1/masters/ds2482.c +++ b/drivers/w1/masters/ds2482.c
@@ -18,6 +18,8 @@ #include <linux/slab.h> #include <linux/i2c.h> #include <linux/delay.h> +#include <linux/gpio.h> +#include <linux/platform_data/ds2482.h> #include <asm/delay.h> #include "../w1.h" @@ -97,7 +99,8 @@ static const u8 ds2482_chan_rd[8] = static int ds2482_probe(struct i2c_client *client, const struct i2c_device_id *id); static int ds2482_remove(struct i2c_client *client); - +static int ds2482_suspend(struct device *dev); +static int ds2482_resume(struct device *dev); /** * Driver data (common to all clients) @@ -108,9 +111,15 @@ static const struct i2c_device_id ds2482_id[] = { }; MODULE_DEVICE_TABLE(i2c, ds2482_id); +static const struct dev_pm_ops ds2482_pm_ops = { + .suspend = ds2482_suspend, + .resume = ds2482_resume, +}; + static struct i2c_driver ds2482_driver = { .driver = { .name = "ds2482", + .pm = &ds2482_pm_ops, }, .probe = ds2482_probe, .remove = ds2482_remove, @@ -132,6 +141,7 @@ struct ds2482_w1_chan { struct ds2482_data { struct i2c_client *client; struct mutex access_lock; + int slpz_gpio; /* 1-wire interface(s) */ int w1_count; /* 1 or 8 */ @@ -460,11 +470,31 @@ static u8 ds2482_w1_set_pullup(void *data, int delay) return retval; } +static int ds2482_suspend(struct device *dev) +{ + struct i2c_client *client = to_i2c_client(dev); + struct ds2482_data *data = i2c_get_clientdata(client); + + if (data->slpz_gpio >= 0) + gpio_set_value(data->slpz_gpio, 0); + return 0; +} + +static int ds2482_resume(struct device *dev) +{ + struct i2c_client *client = to_i2c_client(dev); + struct ds2482_data *data = i2c_get_clientdata(client); + + if (data->slpz_gpio >= 0) + gpio_set_value(data->slpz_gpio, 1); + return 0; +} static int ds2482_probe(struct i2c_client *client, const struct i2c_device_id *id) { struct ds2482_data *data; + struct ds2482_platform_data *pdata; int err = -ENODEV; int temp1; int idx; @@ -531,6 +561,16 @@ static int ds2482_probe(struct i2c_client *client, } } + pdata = client->dev.platform_data; + data->slpz_gpio = pdata ? pdata->slpz_gpio : -1; + + if (data->slpz_gpio >= 0) { + err = gpio_request_one(data->slpz_gpio, GPIOF_OUT_INIT_HIGH, + "ds2482.slpz"); + if (err < 0) + goto exit_w1_remove; + } + return 0; exit_w1_remove: @@ -555,6 +595,11 @@ static int ds2482_remove(struct i2c_client *client) w1_remove_master_device(&data->w1_ch[idx].w1_bm); } + if (data->slpz_gpio >= 0) { + gpio_set_value(data->slpz_gpio, 0); + gpio_free(data->slpz_gpio); + } + /* Free the memory */ kfree(data); return 0;
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile index 8feab810..7bd882d 100644 --- a/drivers/xen/Makefile +++ b/drivers/xen/Makefile
@@ -7,8 +7,10 @@ nostackp := $(call cc-option, -fno-stack-protector) CFLAGS_features.o := $(nostackp) +ifndef CONFIG_LTO_CLANG CFLAGS_efi.o += -fshort-wchar LDFLAGS += $(call ld-option, --no-wchar-size-warning) +endif dom0-$(CONFIG_ARM64) += arm-device.o dom0-$(CONFIG_PCI) += pci.o
diff --git a/fs/Kconfig b/fs/Kconfig index 4bd03a2..20a8d95 100644 --- a/fs/Kconfig +++ b/fs/Kconfig
@@ -227,6 +227,7 @@ source "fs/adfs/Kconfig" source "fs/affs/Kconfig" source "fs/ecryptfs/Kconfig" +source "fs/sdcardfs/Kconfig" source "fs/hfs/Kconfig" source "fs/hfsplus/Kconfig" source "fs/befs/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile index ed2b632..f207d43 100644 --- a/fs/Makefile +++ b/fs/Makefile
@@ -3,7 +3,7 @@ # # 14 Sep 2000, Christoph Hellwig <hch@infradead.org> # Rewritten to use lists instead of if-statements. -# +# obj-y := open.o read_write.o file_table.o super.o \ char_dev.o stat.o exec.o pipe.o namei.o fcntl.o \ @@ -61,7 +61,7 @@ obj-$(CONFIG_PROFILING) += dcookies.o obj-$(CONFIG_DLM) += dlm/ - + # Do not add any filesystems before this line obj-$(CONFIG_FSCACHE) += fscache/ obj-$(CONFIG_REISERFS_FS) += reiserfs/ @@ -83,6 +83,7 @@ obj-$(CONFIG_HFSPLUS_FS) += hfsplus/ # Before hfs to find wrapped HFS+ obj-$(CONFIG_HFS_FS) += hfs/ obj-$(CONFIG_ECRYPT_FS) += ecryptfs/ +obj-$(CONFIG_SDCARD_FS) += sdcardfs/ obj-$(CONFIG_VXFS_FS) += freevxfs/ obj-$(CONFIG_NFS_FS) += nfs/ obj-$(CONFIG_EXPORTFS) += exportfs/
diff --git a/fs/afs/file.c b/fs/afs/file.c index 7237297..37f97ba 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c
@@ -123,11 +123,10 @@ static void afs_file_readpage_read_complete(struct page *page, /* * read page from file, directory or symlink, given a key to use */ -int afs_page_filler(void *data, struct page *page) +static int __afs_page_filler(struct key *key, struct page *page) { struct inode *inode = page->mapping->host; struct afs_vnode *vnode = AFS_FS_I(inode); - struct key *key = data; size_t len; off_t offset; int ret; @@ -209,6 +208,13 @@ int afs_page_filler(void *data, struct page *page) return ret; } +int afs_page_filler(struct file *data, struct page *page) +{ + struct key *key = (struct key *)data; + + return __afs_page_filler(key, page); +} + /* * read page from file, directory or symlink, given a file to nominate the key * to be used @@ -221,14 +227,14 @@ static int afs_readpage(struct file *file, struct page *page) if (file) { key = file->private_data; ASSERT(key != NULL); - ret = afs_page_filler(key, page); + ret = __afs_page_filler(key, page); } else { struct inode *inode = page->mapping->host; key = afs_request_key(AFS_FS_S(inode->i_sb)->volume->cell); if (IS_ERR(key)) { ret = PTR_ERR(key); } else { - ret = afs_page_filler(key, page); + ret = __afs_page_filler(key, page); key_put(key); } }
diff --git a/fs/afs/internal.h b/fs/afs/internal.h index dd98dcd..b4165cc 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h
@@ -497,7 +497,7 @@ extern const struct file_operations afs_file_operations; extern int afs_open(struct inode *, struct file *); extern int afs_release(struct inode *, struct file *); -extern int afs_page_filler(void *, struct page *); +extern int afs_page_filler(struct file *, struct page *); /* * flock.c
diff --git a/fs/attr.c b/fs/attr.c index c902b3d..c4093c5 100644 --- a/fs/attr.c +++ b/fs/attr.c
@@ -200,7 +200,7 @@ EXPORT_SYMBOL(setattr_copy); * the file open for write, as there can be no conflicting delegation in * that case. */ -int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **delegated_inode) +int notify_change2(struct vfsmount *mnt, struct dentry * dentry, struct iattr * attr, struct inode **delegated_inode) { struct inode *inode = dentry->d_inode; umode_t mode = inode->i_mode; @@ -224,7 +224,7 @@ int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **de return -EPERM; if (!inode_owner_or_capable(inode)) { - error = inode_permission(inode, MAY_WRITE); + error = inode_permission2(mnt, inode, MAY_WRITE); if (error) return error; } @@ -307,7 +307,9 @@ int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **de if (error) return error; - if (inode->i_op->setattr) + if (mnt && inode->i_op->setattr2) + error = inode->i_op->setattr2(mnt, dentry, attr); + else if (inode->i_op->setattr) error = inode->i_op->setattr(dentry, attr); else error = simple_setattr(dentry, attr); @@ -320,4 +322,10 @@ int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **de return error; } +EXPORT_SYMBOL(notify_change2); + +int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **delegated_inode) +{ + return notify_change2(NULL, dentry, attr, delegated_inode); +} EXPORT_SYMBOL(notify_change);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 2b96ca6..a1615e8 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c
@@ -3830,8 +3830,8 @@ int btree_write_cache_pages(struct address_space *mapping, if (wbc->sync_mode == WB_SYNC_ALL) tag_pages_for_writeback(mapping, index, end); while (!done && !nr_to_write_done && (index <= end) && - (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1))) { + (nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, + tag))) { unsigned i; scanned = 1; @@ -3841,11 +3841,6 @@ int btree_write_cache_pages(struct address_space *mapping, if (!PagePrivate(page)) continue; - if (!wbc->range_cyclic && page->index > end) { - done = 1; - break; - } - spin_lock(&mapping->private_lock); if (!PagePrivate(page)) { spin_unlock(&mapping->private_lock); @@ -3978,8 +3973,8 @@ static int extent_write_cache_pages(struct extent_io_tree *tree, tag_pages_for_writeback(mapping, index, end); done_index = index; while (!done && !nr_to_write_done && (index <= end) && - (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1))) { + (nr_pages = pagevec_lookup_range_tag(&pvec, mapping, + &index, end, tag))) { unsigned i; scanned = 1; @@ -4004,12 +3999,6 @@ static int extent_write_cache_pages(struct extent_io_tree *tree, continue; } - if (!wbc->range_cyclic && page->index > end) { - done = 1; - unlock_page(page); - continue; - } - if (wbc->sync_mode != WB_SYNC_NONE) { if (PageWriteback(page)) flush_fn(data);
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index 7b79a54..546d643 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c
@@ -838,21 +838,15 @@ static int ceph_writepages_start(struct address_space *mapping, struct page **pages = NULL, **data_pages; mempool_t *pool = NULL; /* Becomes non-null if mempool used */ struct page *page; - int want; u64 offset = 0, len = 0; max_pages = max_pages_ever; get_more_pages: first = -1; - want = min(end - index, - min((pgoff_t)PAGEVEC_SIZE, - max_pages - (pgoff_t)locked_pages) - 1) - + 1; - pvec_pages = pagevec_lookup_tag(&pvec, mapping, &index, - PAGECACHE_TAG_DIRTY, - want); - dout("pagevec_lookup_tag got %d\n", pvec_pages); + pvec_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, + end, PAGECACHE_TAG_DIRTY); + dout("pagevec_lookup_range_tag got %d\n", pvec_pages); if (!pvec_pages && !locked_pages) break; for (i = 0; i < pvec_pages && locked_pages < max_pages; i++) { @@ -870,12 +864,6 @@ static int ceph_writepages_start(struct address_space *mapping, unlock_page(page); break; } - if (!wbc->range_cyclic && page->index > end) { - dout("end of range %p\n", page); - done = 1; - unlock_page(page); - break; - } if (strip_unit_end && (page->index > strip_unit_end)) { dout("end of strip unit %p\n", page); unlock_page(page);
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index 441d434..c4a27f6 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c
@@ -2439,7 +2439,7 @@ cifs_set_cifscreds(struct smb_vol *vol, struct cifs_ses *ses) } down_read(&key->sem); - upayload = user_key_payload(key); + upayload = user_key_payload_locked(key); if (IS_ERR_OR_NULL(upayload)) { rc = upayload ? PTR_ERR(upayload) : -EINVAL; goto out_key_put;
diff --git a/fs/coredump.c b/fs/coredump.c index 4407e27..00a900a 100644 --- a/fs/coredump.c +++ b/fs/coredump.c
@@ -744,7 +744,7 @@ void do_coredump(const siginfo_t *siginfo) goto close_fail; if (!(cprm.file->f_mode & FMODE_CAN_WRITE)) goto close_fail; - if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file)) + if (do_truncate2(cprm.file->f_path.mnt, cprm.file->f_path.dentry, 0, 0, cprm.file)) goto close_fail; }
diff --git a/fs/crypto/Kconfig b/fs/crypto/Kconfig index 92348fa..08b46e6 100644 --- a/fs/crypto/Kconfig +++ b/fs/crypto/Kconfig
@@ -1,6 +1,5 @@ config FS_ENCRYPTION tristate "FS Encryption (Per-file encryption)" - depends on BLOCK select CRYPTO select CRYPTO_AES select CRYPTO_CBC @@ -8,9 +7,7 @@ select CRYPTO_XTS select CRYPTO_CTS select CRYPTO_CTR - select CRYPTO_SHA256 select KEYS - select ENCRYPTED_KEYS help Enable encryption of files and directories. This feature is similar to ecryptfs, but it is more memory
diff --git a/fs/crypto/Makefile b/fs/crypto/Makefile index f17684c..cb49698 100644 --- a/fs/crypto/Makefile +++ b/fs/crypto/Makefile
@@ -1,3 +1,4 @@ obj-$(CONFIG_FS_ENCRYPTION) += fscrypto.o -fscrypto-y := crypto.o fname.o policy.o keyinfo.o +fscrypto-y := crypto.o fname.o hooks.o keyinfo.o policy.o +fscrypto-$(CONFIG_BLOCK) += bio.o
diff --git a/fs/crypto/bio.c b/fs/crypto/bio.c new file mode 100644 index 0000000..d7b4c48 --- /dev/null +++ b/fs/crypto/bio.c
@@ -0,0 +1,154 @@ +/* + * This contains encryption functions for per-file encryption. + * + * Copyright (C) 2015, Google, Inc. + * Copyright (C) 2015, Motorola Mobility + * + * Written by Michael Halcrow, 2014. + * + * Filename encryption additions + * Uday Savagaonkar, 2014 + * Encryption policy handling additions + * Ildar Muslukhov, 2014 + * Add fscrypt_pullback_bio_page() + * Jaegeuk Kim, 2015. + * + * This has not yet undergone a rigorous security audit. + * + * The usage of AES-XTS should conform to recommendations in NIST + * Special Publication 800-38E and IEEE P1619/D16. + */ + +#include <linux/pagemap.h> +#include <linux/module.h> +#include <linux/bio.h> +#include <linux/namei.h> +#include "fscrypt_private.h" + +static void __fscrypt_decrypt_bio(struct bio *bio, bool done) +{ + struct bio_vec *bv; + int i; + + bio_for_each_segment_all(bv, bio, i) { + struct page *page = bv->bv_page; + int ret = fscrypt_decrypt_page(page->mapping->host, page, + PAGE_SIZE, 0, page->index); + + if (ret) { + WARN_ON_ONCE(1); + SetPageError(page); + } else if (done) { + SetPageUptodate(page); + } + if (done) + unlock_page(page); + } +} + +void fscrypt_decrypt_bio(struct bio *bio) +{ + __fscrypt_decrypt_bio(bio, false); +} +EXPORT_SYMBOL(fscrypt_decrypt_bio); + +static void completion_pages(struct work_struct *work) +{ + struct fscrypt_ctx *ctx = + container_of(work, struct fscrypt_ctx, r.work); + struct bio *bio = ctx->r.bio; + + __fscrypt_decrypt_bio(bio, true); + fscrypt_release_ctx(ctx); + bio_put(bio); +} + +void fscrypt_enqueue_decrypt_bio(struct fscrypt_ctx *ctx, struct bio *bio) +{ + INIT_WORK(&ctx->r.work, completion_pages); + ctx->r.bio = bio; + fscrypt_enqueue_decrypt_work(&ctx->r.work); +} +EXPORT_SYMBOL(fscrypt_enqueue_decrypt_bio); + +void fscrypt_pullback_bio_page(struct page **page, bool restore) +{ + struct fscrypt_ctx *ctx; + struct page *bounce_page; + + /* The bounce data pages are unmapped. */ + if ((*page)->mapping) + return; + + /* The bounce data page is unmapped. */ + bounce_page = *page; + ctx = (struct fscrypt_ctx *)page_private(bounce_page); + + /* restore control page */ + *page = ctx->w.control_page; + + if (restore) + fscrypt_restore_control_page(bounce_page); +} +EXPORT_SYMBOL(fscrypt_pullback_bio_page); + +int fscrypt_zeroout_range(const struct inode *inode, pgoff_t lblk, + sector_t pblk, unsigned int len) +{ + struct fscrypt_ctx *ctx; + struct page *ciphertext_page = NULL; + struct bio *bio; + int ret, err = 0; + + BUG_ON(inode->i_sb->s_blocksize != PAGE_SIZE); + + ctx = fscrypt_get_ctx(inode, GFP_NOFS); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ciphertext_page = fscrypt_alloc_bounce_page(ctx, GFP_NOWAIT); + if (IS_ERR(ciphertext_page)) { + err = PTR_ERR(ciphertext_page); + goto errout; + } + + while (len--) { + err = fscrypt_do_page_crypto(inode, FS_ENCRYPT, lblk, + ZERO_PAGE(0), ciphertext_page, + PAGE_SIZE, 0, GFP_NOFS); + if (err) + goto errout; + + bio = bio_alloc(GFP_NOWAIT, 1); + if (!bio) { + err = -ENOMEM; + goto errout; + } + bio->bi_bdev = inode->i_sb->s_bdev; + bio->bi_iter.bi_sector = + pblk << (inode->i_sb->s_blocksize_bits - 9); + bio_set_op_attrs(bio, REQ_OP_WRITE, 0); + ret = bio_add_page(bio, ciphertext_page, + inode->i_sb->s_blocksize, 0); + if (ret != inode->i_sb->s_blocksize) { + /* should never happen! */ + WARN_ON(1); + bio_put(bio); + err = -EIO; + goto errout; + } + err = submit_bio_wait(bio); + if (err == 0 && bio->bi_error) + err = -EIO; + bio_put(bio); + if (err) + goto errout; + lblk++; + pblk++; + } + err = 0; +errout: + fscrypt_release_ctx(ctx); + return err; +} +EXPORT_SYMBOL(fscrypt_zeroout_range);
diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c index 1a89625..0f46cf5 100644 --- a/fs/crypto/crypto.c +++ b/fs/crypto/crypto.c
@@ -24,10 +24,11 @@ #include <linux/module.h> #include <linux/scatterlist.h> #include <linux/ratelimit.h> -#include <linux/bio.h> #include <linux/dcache.h> #include <linux/namei.h> -#include <linux/fscrypto.h> +#include <crypto/aes.h> +#include <crypto/skcipher.h> +#include "fscrypt_private.h" static unsigned int num_prealloc_crypto_pages = 32; static unsigned int num_prealloc_crypto_ctxs = 128; @@ -50,6 +51,12 @@ static DEFINE_MUTEX(fscrypt_init_mutex); static struct kmem_cache *fscrypt_ctx_cachep; struct kmem_cache *fscrypt_info_cachep; +void fscrypt_enqueue_decrypt_work(struct work_struct *work) +{ + queue_work(fscrypt_read_workqueue, work); +} +EXPORT_SYMBOL(fscrypt_enqueue_decrypt_work); + /** * fscrypt_release_ctx() - Releases an encryption context * @ctx: The encryption context to release. @@ -63,7 +70,7 @@ void fscrypt_release_ctx(struct fscrypt_ctx *ctx) { unsigned long flags; - if (ctx->flags & FS_WRITE_PATH_FL && ctx->w.bounce_page) { + if (ctx->flags & FS_CTX_HAS_BOUNCE_BUFFER_FL && ctx->w.bounce_page) { mempool_free(ctx->w.bounce_page, fscrypt_bounce_page_pool); ctx->w.bounce_page = NULL; } @@ -88,7 +95,7 @@ EXPORT_SYMBOL(fscrypt_release_ctx); * Return: An allocated and initialized encryption context on success; error * value or NULL otherwise. */ -struct fscrypt_ctx *fscrypt_get_ctx(struct inode *inode, gfp_t gfp_flags) +struct fscrypt_ctx *fscrypt_get_ctx(const struct inode *inode, gfp_t gfp_flags) { struct fscrypt_ctx *ctx = NULL; struct fscrypt_info *ci = inode->i_crypt_info; @@ -121,134 +128,147 @@ struct fscrypt_ctx *fscrypt_get_ctx(struct inode *inode, gfp_t gfp_flags) } else { ctx->flags &= ~FS_CTX_REQUIRES_FREE_ENCRYPT_FL; } - ctx->flags &= ~FS_WRITE_PATH_FL; + ctx->flags &= ~FS_CTX_HAS_BOUNCE_BUFFER_FL; return ctx; } EXPORT_SYMBOL(fscrypt_get_ctx); -/** - * page_crypt_complete() - completion callback for page crypto - * @req: The asynchronous cipher request context - * @res: The result of the cipher operation - */ -static void page_crypt_complete(struct crypto_async_request *req, int res) -{ - struct fscrypt_completion_result *ecr = req->data; - - if (res == -EINPROGRESS) - return; - ecr->res = res; - complete(&ecr->completion); -} - -typedef enum { - FS_DECRYPT = 0, - FS_ENCRYPT, -} fscrypt_direction_t; - -static int do_page_crypto(struct inode *inode, - fscrypt_direction_t rw, pgoff_t index, - struct page *src_page, struct page *dest_page, - gfp_t gfp_flags) +int fscrypt_do_page_crypto(const struct inode *inode, fscrypt_direction_t rw, + u64 lblk_num, struct page *src_page, + struct page *dest_page, unsigned int len, + unsigned int offs, gfp_t gfp_flags) { struct { __le64 index; - u8 padding[FS_XTS_TWEAK_SIZE - sizeof(__le64)]; - } xts_tweak; + u8 padding[FS_IV_SIZE - sizeof(__le64)]; + } iv; struct skcipher_request *req = NULL; - DECLARE_FS_COMPLETION_RESULT(ecr); + DECLARE_CRYPTO_WAIT(wait); struct scatterlist dst, src; struct fscrypt_info *ci = inode->i_crypt_info; struct crypto_skcipher *tfm = ci->ci_ctfm; int res = 0; - req = skcipher_request_alloc(tfm, gfp_flags); - if (!req) { - printk_ratelimited(KERN_ERR - "%s: crypto_request_alloc() failed\n", - __func__); - return -ENOMEM; + BUG_ON(len == 0); + + BUILD_BUG_ON(sizeof(iv) != FS_IV_SIZE); + BUILD_BUG_ON(AES_BLOCK_SIZE != FS_IV_SIZE); + iv.index = cpu_to_le64(lblk_num); + memset(iv.padding, 0, sizeof(iv.padding)); + + if (ci->ci_essiv_tfm != NULL) { + crypto_cipher_encrypt_one(ci->ci_essiv_tfm, (u8 *)&iv, + (u8 *)&iv); } + req = skcipher_request_alloc(tfm, gfp_flags); + if (!req) + return -ENOMEM; + skcipher_request_set_callback( req, CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP, - page_crypt_complete, &ecr); - - BUILD_BUG_ON(sizeof(xts_tweak) != FS_XTS_TWEAK_SIZE); - xts_tweak.index = cpu_to_le64(index); - memset(xts_tweak.padding, 0, sizeof(xts_tweak.padding)); + crypto_req_done, &wait); sg_init_table(&dst, 1); - sg_set_page(&dst, dest_page, PAGE_SIZE, 0); + sg_set_page(&dst, dest_page, len, offs); sg_init_table(&src, 1); - sg_set_page(&src, src_page, PAGE_SIZE, 0); - skcipher_request_set_crypt(req, &src, &dst, PAGE_SIZE, &xts_tweak); + sg_set_page(&src, src_page, len, offs); + skcipher_request_set_crypt(req, &src, &dst, len, &iv); if (rw == FS_DECRYPT) - res = crypto_skcipher_decrypt(req); + res = crypto_wait_req(crypto_skcipher_decrypt(req), &wait); else - res = crypto_skcipher_encrypt(req); - if (res == -EINPROGRESS || res == -EBUSY) { - BUG_ON(req->base.data != &ecr); - wait_for_completion(&ecr.completion); - res = ecr.res; - } + res = crypto_wait_req(crypto_skcipher_encrypt(req), &wait); skcipher_request_free(req); if (res) { - printk_ratelimited(KERN_ERR - "%s: crypto_skcipher_encrypt() returned %d\n", - __func__, res); + fscrypt_err(inode->i_sb, + "%scryption failed for inode %lu, block %llu: %d", + (rw == FS_DECRYPT ? "de" : "en"), + inode->i_ino, lblk_num, res); return res; } return 0; } -static struct page *alloc_bounce_page(struct fscrypt_ctx *ctx, gfp_t gfp_flags) +struct page *fscrypt_alloc_bounce_page(struct fscrypt_ctx *ctx, + gfp_t gfp_flags) { ctx->w.bounce_page = mempool_alloc(fscrypt_bounce_page_pool, gfp_flags); if (ctx->w.bounce_page == NULL) return ERR_PTR(-ENOMEM); - ctx->flags |= FS_WRITE_PATH_FL; + ctx->flags |= FS_CTX_HAS_BOUNCE_BUFFER_FL; return ctx->w.bounce_page; } /** * fscypt_encrypt_page() - Encrypts a page - * @inode: The inode for which the encryption should take place - * @plaintext_page: The page to encrypt. Must be locked. - * @gfp_flags: The gfp flag for memory allocation + * @inode: The inode for which the encryption should take place + * @page: The page to encrypt. Must be locked for bounce-page + * encryption. + * @len: Length of data to encrypt in @page and encrypted + * data in returned page. + * @offs: Offset of data within @page and returned + * page holding encrypted data. + * @lblk_num: Logical block number. This must be unique for multiple + * calls with same inode, except when overwriting + * previously written data. + * @gfp_flags: The gfp flag for memory allocation * - * Allocates a ciphertext page and encrypts plaintext_page into it using the ctx - * encryption context. + * Encrypts @page using the ctx encryption context. Performs encryption + * either in-place or into a newly allocated bounce page. + * Called on the page write path. * - * Called on the page write path. The caller must call + * Bounce page allocation is the default. + * In this case, the contents of @page are encrypted and stored in an + * allocated bounce page. @page has to be locked and the caller must call * fscrypt_restore_control_page() on the returned ciphertext page to * release the bounce buffer and the encryption context. * - * Return: An allocated page with the encrypted content on success. Else, an + * In-place encryption is used by setting the FS_CFLG_OWN_PAGES flag in + * fscrypt_operations. Here, the input-page is returned with its content + * encrypted. + * + * Return: A page with the encrypted content on success. Else, an * error value or NULL. */ -struct page *fscrypt_encrypt_page(struct inode *inode, - struct page *plaintext_page, gfp_t gfp_flags) +struct page *fscrypt_encrypt_page(const struct inode *inode, + struct page *page, + unsigned int len, + unsigned int offs, + u64 lblk_num, gfp_t gfp_flags) + { struct fscrypt_ctx *ctx; - struct page *ciphertext_page = NULL; + struct page *ciphertext_page = page; int err; - BUG_ON(!PageLocked(plaintext_page)); + BUG_ON(len % FS_CRYPTO_BLOCK_SIZE != 0); + + if (inode->i_sb->s_cop->flags & FS_CFLG_OWN_PAGES) { + /* with inplace-encryption we just encrypt the page */ + err = fscrypt_do_page_crypto(inode, FS_ENCRYPT, lblk_num, page, + ciphertext_page, len, offs, + gfp_flags); + if (err) + return ERR_PTR(err); + + return ciphertext_page; + } + + BUG_ON(!PageLocked(page)); ctx = fscrypt_get_ctx(inode, gfp_flags); if (IS_ERR(ctx)) return (struct page *)ctx; /* The encryption operation will require a bounce page. */ - ciphertext_page = alloc_bounce_page(ctx, gfp_flags); + ciphertext_page = fscrypt_alloc_bounce_page(ctx, gfp_flags); if (IS_ERR(ciphertext_page)) goto errout; - ctx->w.control_page = plaintext_page; - err = do_page_crypto(inode, FS_ENCRYPT, plaintext_page->index, - plaintext_page, ciphertext_page, - gfp_flags); + ctx->w.control_page = page; + err = fscrypt_do_page_crypto(inode, FS_ENCRYPT, lblk_num, + page, ciphertext_page, len, offs, + gfp_flags); if (err) { ciphertext_page = ERR_PTR(err); goto errout; @@ -265,8 +285,13 @@ struct page *fscrypt_encrypt_page(struct inode *inode, EXPORT_SYMBOL(fscrypt_encrypt_page); /** - * f2crypt_decrypt_page() - Decrypts a page in-place - * @page: The page to decrypt. Must be locked. + * fscrypt_decrypt_page() - Decrypts a page in-place + * @inode: The corresponding inode for the page to decrypt. + * @page: The page to decrypt. Must be locked in case + * it is a writeback page (FS_CFLG_OWN_PAGES unset). + * @len: Number of bytes in @page to be decrypted. + * @offs: Start of data in @page. + * @lblk_num: Logical block number. * * Decrypts page in-place using the ctx encryption context. * @@ -274,76 +299,17 @@ EXPORT_SYMBOL(fscrypt_encrypt_page); * * Return: Zero on success, non-zero otherwise. */ -int fscrypt_decrypt_page(struct page *page) +int fscrypt_decrypt_page(const struct inode *inode, struct page *page, + unsigned int len, unsigned int offs, u64 lblk_num) { - BUG_ON(!PageLocked(page)); + if (!(inode->i_sb->s_cop->flags & FS_CFLG_OWN_PAGES)) + BUG_ON(!PageLocked(page)); - return do_page_crypto(page->mapping->host, - FS_DECRYPT, page->index, page, page, GFP_NOFS); + return fscrypt_do_page_crypto(inode, FS_DECRYPT, lblk_num, page, page, + len, offs, GFP_NOFS); } EXPORT_SYMBOL(fscrypt_decrypt_page); -int fscrypt_zeroout_range(struct inode *inode, pgoff_t lblk, - sector_t pblk, unsigned int len) -{ - struct fscrypt_ctx *ctx; - struct page *ciphertext_page = NULL; - struct bio *bio; - int ret, err = 0; - - BUG_ON(inode->i_sb->s_blocksize != PAGE_SIZE); - - ctx = fscrypt_get_ctx(inode, GFP_NOFS); - if (IS_ERR(ctx)) - return PTR_ERR(ctx); - - ciphertext_page = alloc_bounce_page(ctx, GFP_NOWAIT); - if (IS_ERR(ciphertext_page)) { - err = PTR_ERR(ciphertext_page); - goto errout; - } - - while (len--) { - err = do_page_crypto(inode, FS_ENCRYPT, lblk, - ZERO_PAGE(0), ciphertext_page, - GFP_NOFS); - if (err) - goto errout; - - bio = bio_alloc(GFP_NOWAIT, 1); - if (!bio) { - err = -ENOMEM; - goto errout; - } - bio->bi_bdev = inode->i_sb->s_bdev; - bio->bi_iter.bi_sector = - pblk << (inode->i_sb->s_blocksize_bits - 9); - bio_set_op_attrs(bio, REQ_OP_WRITE, 0); - ret = bio_add_page(bio, ciphertext_page, - inode->i_sb->s_blocksize, 0); - if (ret != inode->i_sb->s_blocksize) { - /* should never happen! */ - WARN_ON(1); - bio_put(bio); - err = -EIO; - goto errout; - } - err = submit_bio_wait(bio); - if ((err == 0) && bio->bi_error) - err = -EIO; - bio_put(bio); - if (err) - goto errout; - lblk++; - pblk++; - } - err = 0; -errout: - fscrypt_release_ctx(ctx); - return err; -} -EXPORT_SYMBOL(fscrypt_zeroout_range); - /* * Validate dentries for encrypted directories to make sure we aren't * potentially caching stale data after a key has been added or @@ -358,12 +324,11 @@ static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags) return -ECHILD; dir = dget_parent(dentry); - if (!d_inode(dir)->i_sb->s_cop->is_encrypted(d_inode(dir))) { + if (!IS_ENCRYPTED(d_inode(dir))) { dput(dir); return 0; } - /* this should eventually be an flag in d_flags */ spin_lock(&dentry->d_lock); cached_with_key = dentry->d_flags & DCACHE_ENCRYPTED_WITH_KEY; spin_unlock(&dentry->d_lock); @@ -390,64 +355,6 @@ static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags) const struct dentry_operations fscrypt_d_ops = { .d_revalidate = fscrypt_d_revalidate, }; -EXPORT_SYMBOL(fscrypt_d_ops); - -/* - * Call fscrypt_decrypt_page on every single page, reusing the encryption - * context. - */ -static void completion_pages(struct work_struct *work) -{ - struct fscrypt_ctx *ctx = - container_of(work, struct fscrypt_ctx, r.work); - struct bio *bio = ctx->r.bio; - struct bio_vec *bv; - int i; - - bio_for_each_segment_all(bv, bio, i) { - struct page *page = bv->bv_page; - int ret = fscrypt_decrypt_page(page); - - if (ret) { - WARN_ON_ONCE(1); - SetPageError(page); - } else { - SetPageUptodate(page); - } - unlock_page(page); - } - fscrypt_release_ctx(ctx); - bio_put(bio); -} - -void fscrypt_decrypt_bio_pages(struct fscrypt_ctx *ctx, struct bio *bio) -{ - INIT_WORK(&ctx->r.work, completion_pages); - ctx->r.bio = bio; - queue_work(fscrypt_read_workqueue, &ctx->r.work); -} -EXPORT_SYMBOL(fscrypt_decrypt_bio_pages); - -void fscrypt_pullback_bio_page(struct page **page, bool restore) -{ - struct fscrypt_ctx *ctx; - struct page *bounce_page; - - /* The bounce data pages are unmapped. */ - if ((*page)->mapping) - return; - - /* The bounce data page is unmapped. */ - bounce_page = *page; - ctx = (struct fscrypt_ctx *)page_private(bounce_page); - - /* restore control page */ - *page = ctx->w.control_page; - - if (restore) - fscrypt_restore_control_page(bounce_page); -} -EXPORT_SYMBOL(fscrypt_pullback_bio_page); void fscrypt_restore_control_page(struct page *page) { @@ -474,16 +381,21 @@ static void fscrypt_destroy(void) /** * fscrypt_initialize() - allocate major buffers for fs encryption. + * @cop_flags: fscrypt operations flags * * We only call this when we start accessing encrypted files, since it * results in memory getting allocated that wouldn't otherwise be used. * * Return: Zero on success, non-zero otherwise. */ -int fscrypt_initialize(void) +int fscrypt_initialize(unsigned int cop_flags) { int i, res = -ENOMEM; + /* No need to allocate a bounce page pool if this FS won't use it. */ + if (cop_flags & FS_CFLG_OWN_PAGES) + return 0; + mutex_lock(&fscrypt_init_mutex); if (fscrypt_bounce_page_pool) goto already_initialized; @@ -510,7 +422,27 @@ int fscrypt_initialize(void) mutex_unlock(&fscrypt_init_mutex); return res; } -EXPORT_SYMBOL(fscrypt_initialize); + +void fscrypt_msg(struct super_block *sb, const char *level, + const char *fmt, ...) +{ + static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST); + struct va_format vaf; + va_list args; + + if (!__ratelimit(&rs)) + return; + + va_start(args, fmt); + vaf.fmt = fmt; + vaf.va = &args; + if (sb) + printk("%sfscrypt (%s): %pV\n", level, sb->s_id, &vaf); + else + printk("%sfscrypt: %pV\n", level, &vaf); + va_end(args); +} /** * fscrypt_init() - Set up for fs encryption. @@ -561,6 +493,8 @@ static void __exit fscrypt_exit(void) destroy_workqueue(fscrypt_read_workqueue); kmem_cache_destroy(fscrypt_ctx_cachep); kmem_cache_destroy(fscrypt_info_cachep); + + fscrypt_essiv_cleanup(); } module_exit(fscrypt_exit);
diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c index e14bb7b..1bdb9f2 100644 --- a/fs/crypto/fname.c +++ b/fs/crypto/fname.c
@@ -12,89 +12,70 @@ #include <linux/scatterlist.h> #include <linux/ratelimit.h> -#include <linux/fscrypto.h> +#include <crypto/skcipher.h> +#include "fscrypt_private.h" -/** - * fname_crypt_complete() - completion callback for filename crypto - * @req: The asynchronous cipher request context - * @res: The result of the cipher operation - */ -static void fname_crypt_complete(struct crypto_async_request *req, int res) +static inline bool fscrypt_is_dot_dotdot(const struct qstr *str) { - struct fscrypt_completion_result *ecr = req->data; + if (str->len == 1 && str->name[0] == '.') + return true; - if (res == -EINPROGRESS) - return; - ecr->res = res; - complete(&ecr->completion); + if (str->len == 2 && str->name[0] == '.' && str->name[1] == '.') + return true; + + return false; } /** * fname_encrypt() - encrypt a filename * - * The caller must have allocated sufficient memory for the @oname string. + * The output buffer must be at least as large as the input buffer. + * Any extra space is filled with NUL padding before encryption. * * Return: 0 on success, -errno on failure */ -static int fname_encrypt(struct inode *inode, - const struct qstr *iname, struct fscrypt_str *oname) +int fname_encrypt(struct inode *inode, const struct qstr *iname, + u8 *out, unsigned int olen) { struct skcipher_request *req = NULL; - DECLARE_FS_COMPLETION_RESULT(ecr); - struct fscrypt_info *ci = inode->i_crypt_info; - struct crypto_skcipher *tfm = ci->ci_ctfm; + DECLARE_CRYPTO_WAIT(wait); + struct crypto_skcipher *tfm = inode->i_crypt_info->ci_ctfm; int res = 0; char iv[FS_CRYPTO_BLOCK_SIZE]; struct scatterlist sg; - int padding = 4 << (ci->ci_flags & FS_POLICY_FLAGS_PAD_MASK); - unsigned int lim; - unsigned int cryptlen; - - lim = inode->i_sb->s_cop->max_namelen(inode); - if (iname->len <= 0 || iname->len > lim) - return -EIO; /* * Copy the filename to the output buffer for encrypting in-place and * pad it with the needed number of NUL bytes. */ - cryptlen = max_t(unsigned int, iname->len, FS_CRYPTO_BLOCK_SIZE); - cryptlen = round_up(cryptlen, padding); - cryptlen = min(cryptlen, lim); - memcpy(oname->name, iname->name, iname->len); - memset(oname->name + iname->len, 0, cryptlen - iname->len); + if (WARN_ON(olen < iname->len)) + return -ENOBUFS; + memcpy(out, iname->name, iname->len); + memset(out + iname->len, 0, olen - iname->len); /* Initialize the IV */ memset(iv, 0, FS_CRYPTO_BLOCK_SIZE); /* Set up the encryption request */ req = skcipher_request_alloc(tfm, GFP_NOFS); - if (!req) { - printk_ratelimited(KERN_ERR - "%s: skcipher_request_alloc() failed\n", __func__); + if (!req) return -ENOMEM; - } skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP, - fname_crypt_complete, &ecr); - sg_init_one(&sg, oname->name, cryptlen); - skcipher_request_set_crypt(req, &sg, &sg, cryptlen, iv); + crypto_req_done, &wait); + sg_init_one(&sg, out, olen); + skcipher_request_set_crypt(req, &sg, &sg, olen, iv); /* Do the encryption */ - res = crypto_skcipher_encrypt(req); - if (res == -EINPROGRESS || res == -EBUSY) { - /* Request is being completed asynchronously; wait for it */ - wait_for_completion(&ecr.completion); - res = ecr.res; - } + res = crypto_wait_req(crypto_skcipher_encrypt(req), &wait); skcipher_request_free(req); if (res < 0) { - printk_ratelimited(KERN_ERR - "%s: Error (error code %d)\n", __func__, res); + fscrypt_err(inode->i_sb, + "Filename encryption failed for inode %lu: %d", + inode->i_ino, res); return res; } - oname->len = cryptlen; return 0; } @@ -110,28 +91,19 @@ static int fname_decrypt(struct inode *inode, struct fscrypt_str *oname) { struct skcipher_request *req = NULL; - DECLARE_FS_COMPLETION_RESULT(ecr); + DECLARE_CRYPTO_WAIT(wait); struct scatterlist src_sg, dst_sg; - struct fscrypt_info *ci = inode->i_crypt_info; - struct crypto_skcipher *tfm = ci->ci_ctfm; + struct crypto_skcipher *tfm = inode->i_crypt_info->ci_ctfm; int res = 0; char iv[FS_CRYPTO_BLOCK_SIZE]; - unsigned lim; - - lim = inode->i_sb->s_cop->max_namelen(inode); - if (iname->len <= 0 || iname->len > lim) - return -EIO; /* Allocate request */ req = skcipher_request_alloc(tfm, GFP_NOFS); - if (!req) { - printk_ratelimited(KERN_ERR - "%s: crypto_request_alloc() failed\n", __func__); + if (!req) return -ENOMEM; - } skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP, - fname_crypt_complete, &ecr); + crypto_req_done, &wait); /* Initialize IV */ memset(iv, 0, FS_CRYPTO_BLOCK_SIZE); @@ -140,15 +112,12 @@ static int fname_decrypt(struct inode *inode, sg_init_one(&src_sg, iname->name, iname->len); sg_init_one(&dst_sg, oname->name, oname->len); skcipher_request_set_crypt(req, &src_sg, &dst_sg, iname->len, iv); - res = crypto_skcipher_decrypt(req); - if (res == -EINPROGRESS || res == -EBUSY) { - wait_for_completion(&ecr.completion); - res = ecr.res; - } + res = crypto_wait_req(crypto_skcipher_decrypt(req), &wait); skcipher_request_free(req); if (res < 0) { - printk_ratelimited(KERN_ERR - "%s: Error (error code %d)\n", __func__, res); + fscrypt_err(inode->i_sb, + "Filename decryption failed for inode %lu: %d", + inode->i_ino, res); return res; } @@ -159,6 +128,8 @@ static int fname_decrypt(struct inode *inode, static const char *lookup_table = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,"; +#define BASE64_CHARS(nbytes) DIV_ROUND_UP((nbytes) * 4, 3) + /** * digest_encode() - * @@ -209,47 +180,52 @@ static int digest_decode(const char *src, int len, char *dst) return cp - dst; } -u32 fscrypt_fname_encrypted_size(struct inode *inode, u32 ilen) +bool fscrypt_fname_encrypted_size(const struct inode *inode, u32 orig_len, + u32 max_len, u32 *encrypted_len_ret) { - int padding = 32; - struct fscrypt_info *ci = inode->i_crypt_info; + int padding = 4 << (inode->i_crypt_info->ci_flags & + FS_POLICY_FLAGS_PAD_MASK); + u32 encrypted_len; - if (ci) - padding = 4 << (ci->ci_flags & FS_POLICY_FLAGS_PAD_MASK); - ilen = max(ilen, (u32)FS_CRYPTO_BLOCK_SIZE); - return round_up(ilen, padding); + if (orig_len > max_len) + return false; + encrypted_len = max(orig_len, (u32)FS_CRYPTO_BLOCK_SIZE); + encrypted_len = round_up(encrypted_len, padding); + *encrypted_len_ret = min(encrypted_len, max_len); + return true; } -EXPORT_SYMBOL(fscrypt_fname_encrypted_size); /** - * fscrypt_fname_crypto_alloc_obuff() - + * fscrypt_fname_alloc_buffer - allocate a buffer for presented filenames * - * Allocates an output buffer that is sufficient for the crypto operation - * specified by the context and the direction. + * Allocate a buffer that is large enough to hold any decrypted or encoded + * filename (null-terminated), for the given maximum encrypted filename length. + * + * Return: 0 on success, -errno on failure */ -int fscrypt_fname_alloc_buffer(struct inode *inode, - u32 ilen, struct fscrypt_str *crypto_str) +int fscrypt_fname_alloc_buffer(const struct inode *inode, + u32 max_encrypted_len, + struct fscrypt_str *crypto_str) { - unsigned int olen = fscrypt_fname_encrypted_size(inode, ilen); + const u32 max_encoded_len = + max_t(u32, BASE64_CHARS(FSCRYPT_FNAME_MAX_UNDIGESTED_SIZE), + 1 + BASE64_CHARS(sizeof(struct fscrypt_digested_name))); + u32 max_presented_len; - crypto_str->len = olen; - if (olen < FS_FNAME_CRYPTO_DIGEST_SIZE * 2) - olen = FS_FNAME_CRYPTO_DIGEST_SIZE * 2; - /* - * Allocated buffer can hold one more character to null-terminate the - * string - */ - crypto_str->name = kmalloc(olen + 1, GFP_NOFS); - if (!(crypto_str->name)) + max_presented_len = max(max_encoded_len, max_encrypted_len); + + crypto_str->name = kmalloc(max_presented_len + 1, GFP_NOFS); + if (!crypto_str->name) return -ENOMEM; + crypto_str->len = max_presented_len; return 0; } EXPORT_SYMBOL(fscrypt_fname_alloc_buffer); /** - * fscrypt_fname_crypto_free_buffer() - + * fscrypt_fname_free_buffer - free the buffer for presented filenames * - * Frees the buffer allocated for crypto operation. + * Free the buffer allocated by fscrypt_fname_alloc_buffer(). */ void fscrypt_fname_free_buffer(struct fscrypt_str *crypto_str) { @@ -266,6 +242,10 @@ EXPORT_SYMBOL(fscrypt_fname_free_buffer); * * The caller must have allocated sufficient memory for the @oname string. * + * If the key is available, we'll decrypt the disk name; otherwise, we'll encode + * it for presentation. Short names are directly base64-encoded, while long + * names are encoded in fscrypt_digested_name format. + * * Return: 0 on success, -errno on failure */ int fscrypt_fname_disk_to_usr(struct inode *inode, @@ -274,7 +254,7 @@ int fscrypt_fname_disk_to_usr(struct inode *inode, struct fscrypt_str *oname) { const struct qstr qname = FSTR_TO_QSTR(iname); - char buf[24]; + struct fscrypt_digested_name digested_name; if (fscrypt_is_dot_dotdot(&qname)) { oname->name[0] = '.'; @@ -289,77 +269,82 @@ int fscrypt_fname_disk_to_usr(struct inode *inode, if (inode->i_crypt_info) return fname_decrypt(inode, iname, oname); - if (iname->len <= FS_FNAME_CRYPTO_DIGEST_SIZE) { + if (iname->len <= FSCRYPT_FNAME_MAX_UNDIGESTED_SIZE) { oname->len = digest_encode(iname->name, iname->len, oname->name); return 0; } if (hash) { - memcpy(buf, &hash, 4); - memcpy(buf + 4, &minor_hash, 4); + digested_name.hash = hash; + digested_name.minor_hash = minor_hash; } else { - memset(buf, 0, 8); + digested_name.hash = 0; + digested_name.minor_hash = 0; } - memcpy(buf + 8, iname->name + ((iname->len - 17) & ~15), 16); + memcpy(digested_name.digest, + FSCRYPT_FNAME_DIGEST(iname->name, iname->len), + FSCRYPT_FNAME_DIGEST_SIZE); oname->name[0] = '_'; - oname->len = 1 + digest_encode(buf, 24, oname->name + 1); + oname->len = 1 + digest_encode((const char *)&digested_name, + sizeof(digested_name), oname->name + 1); return 0; } EXPORT_SYMBOL(fscrypt_fname_disk_to_usr); /** - * fscrypt_fname_usr_to_disk() - converts a filename from user space to disk - * space + * fscrypt_setup_filename() - prepare to search a possibly encrypted directory + * @dir: the directory that will be searched + * @iname: the user-provided filename being searched for + * @lookup: 1 if we're allowed to proceed without the key because it's + * ->lookup() or we're finding the dir_entry for deletion; 0 if we cannot + * proceed without the key because we're going to create the dir_entry. + * @fname: the filename information to be filled in * - * The caller must have allocated sufficient memory for the @oname string. + * Given a user-provided filename @iname, this function sets @fname->disk_name + * to the name that would be stored in the on-disk directory entry, if possible. + * If the directory is unencrypted this is simply @iname. Else, if we have the + * directory's encryption key, then @iname is the plaintext, so we encrypt it to + * get the disk_name. + * + * Else, for keyless @lookup operations, @iname is the presented ciphertext, so + * we decode it to get either the ciphertext disk_name (for short names) or the + * fscrypt_digested_name (for long names). Non-@lookup operations will be + * impossible in this case, so we fail them with ENOKEY. + * + * If successful, fscrypt_free_filename() must be called later to clean up. * * Return: 0 on success, -errno on failure */ -int fscrypt_fname_usr_to_disk(struct inode *inode, - const struct qstr *iname, - struct fscrypt_str *oname) -{ - if (fscrypt_is_dot_dotdot(iname)) { - oname->name[0] = '.'; - oname->name[iname->len - 1] = '.'; - oname->len = iname->len; - return 0; - } - if (inode->i_crypt_info) - return fname_encrypt(inode, iname, oname); - /* - * Without a proper key, a user is not allowed to modify the filenames - * in a directory. Consequently, a user space name cannot be mapped to - * a disk-space name - */ - return -ENOKEY; -} -EXPORT_SYMBOL(fscrypt_fname_usr_to_disk); - int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname, int lookup, struct fscrypt_name *fname) { - int ret = 0, bigname = 0; + int ret; + int digested; memset(fname, 0, sizeof(struct fscrypt_name)); fname->usr_fname = iname; - if (!dir->i_sb->s_cop->is_encrypted(dir) || - fscrypt_is_dot_dotdot(iname)) { + if (!IS_ENCRYPTED(dir) || fscrypt_is_dot_dotdot(iname)) { fname->disk_name.name = (unsigned char *)iname->name; fname->disk_name.len = iname->len; return 0; } ret = fscrypt_get_encryption_info(dir); - if (ret && ret != -EOPNOTSUPP) + if (ret) return ret; if (dir->i_crypt_info) { - ret = fscrypt_fname_alloc_buffer(dir, iname->len, - &fname->crypto_buf); - if (ret) - return ret; - ret = fname_encrypt(dir, iname, &fname->crypto_buf); + if (!fscrypt_fname_encrypted_size(dir, iname->len, + dir->i_sb->s_cop->max_namelen, + &fname->crypto_buf.len)) + return -ENAMETOOLONG; + fname->crypto_buf.name = kmalloc(fname->crypto_buf.len, + GFP_NOFS); + if (!fname->crypto_buf.name) + return -ENOMEM; + + ret = fname_encrypt(dir, iname, fname->crypto_buf.name, + fname->crypto_buf.len); if (ret) goto errout; fname->disk_name.name = fname->crypto_buf.name; @@ -373,25 +358,37 @@ int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname, * We don't have the key and we are doing a lookup; decode the * user-supplied name */ - if (iname->name[0] == '_') - bigname = 1; - if ((bigname && (iname->len != 33)) || (!bigname && (iname->len > 43))) - return -ENOENT; + if (iname->name[0] == '_') { + if (iname->len != + 1 + BASE64_CHARS(sizeof(struct fscrypt_digested_name))) + return -ENOENT; + digested = 1; + } else { + if (iname->len > + BASE64_CHARS(FSCRYPT_FNAME_MAX_UNDIGESTED_SIZE)) + return -ENOENT; + digested = 0; + } - fname->crypto_buf.name = kmalloc(32, GFP_KERNEL); + fname->crypto_buf.name = + kmalloc(max_t(size_t, FSCRYPT_FNAME_MAX_UNDIGESTED_SIZE, + sizeof(struct fscrypt_digested_name)), + GFP_KERNEL); if (fname->crypto_buf.name == NULL) return -ENOMEM; - ret = digest_decode(iname->name + bigname, iname->len - bigname, + ret = digest_decode(iname->name + digested, iname->len - digested, fname->crypto_buf.name); if (ret < 0) { ret = -ENOENT; goto errout; } fname->crypto_buf.len = ret; - if (bigname) { - memcpy(&fname->hash, fname->crypto_buf.name, 4); - memcpy(&fname->minor_hash, fname->crypto_buf.name + 4, 4); + if (digested) { + const struct fscrypt_digested_name *n = + (const void *)fname->crypto_buf.name; + fname->hash = n->hash; + fname->minor_hash = n->minor_hash; } else { fname->disk_name.name = fname->crypto_buf.name; fname->disk_name.len = fname->crypto_buf.len; @@ -399,16 +396,7 @@ int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname, return 0; errout: - fscrypt_fname_free_buffer(&fname->crypto_buf); + kfree(fname->crypto_buf.name); return ret; } EXPORT_SYMBOL(fscrypt_setup_filename); - -void fscrypt_free_filename(struct fscrypt_name *fname) -{ - kfree(fname->crypto_buf.name); - fname->crypto_buf.name = NULL; - fname->usr_fname = NULL; - fname->disk_name.name = NULL; -} -EXPORT_SYMBOL(fscrypt_free_filename);
diff --git a/fs/crypto/fscrypt_private.h b/fs/crypto/fscrypt_private.h new file mode 100644 index 0000000..d608237 --- /dev/null +++ b/fs/crypto/fscrypt_private.h
@@ -0,0 +1,123 @@ +/* + * fscrypt_private.h + * + * Copyright (C) 2015, Google, Inc. + * + * This contains encryption key functions. + * + * Written by Michael Halcrow, Ildar Muslukhov, and Uday Savagaonkar, 2015. + */ + +#ifndef _FSCRYPT_PRIVATE_H +#define _FSCRYPT_PRIVATE_H + +#define __FS_HAS_ENCRYPTION 1 +#include <linux/fscrypt.h> +#include <crypto/hash.h> + +/* Encryption parameters */ +#define FS_IV_SIZE 16 +#define FS_KEY_DERIVATION_NONCE_SIZE 16 + +/** + * Encryption context for inode + * + * Protector format: + * 1 byte: Protector format (1 = this version) + * 1 byte: File contents encryption mode + * 1 byte: File names encryption mode + * 1 byte: Flags + * 8 bytes: Master Key descriptor + * 16 bytes: Encryption Key derivation nonce + */ +struct fscrypt_context { + u8 format; + u8 contents_encryption_mode; + u8 filenames_encryption_mode; + u8 flags; + u8 master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE]; + u8 nonce[FS_KEY_DERIVATION_NONCE_SIZE]; +} __packed; + +#define FS_ENCRYPTION_CONTEXT_FORMAT_V1 1 + +/** + * For encrypted symlinks, the ciphertext length is stored at the beginning + * of the string in little-endian format. + */ +struct fscrypt_symlink_data { + __le16 len; + char encrypted_path[1]; +} __packed; + +/* + * A pointer to this structure is stored in the file system's in-core + * representation of an inode. + */ +struct fscrypt_info { + u8 ci_data_mode; + u8 ci_filename_mode; + u8 ci_flags; + struct crypto_skcipher *ci_ctfm; + struct crypto_cipher *ci_essiv_tfm; + u8 ci_master_key[FS_KEY_DESCRIPTOR_SIZE]; +}; + +typedef enum { + FS_DECRYPT = 0, + FS_ENCRYPT, +} fscrypt_direction_t; + +#define FS_CTX_REQUIRES_FREE_ENCRYPT_FL 0x00000001 +#define FS_CTX_HAS_BOUNCE_BUFFER_FL 0x00000002 + +static inline bool fscrypt_valid_enc_modes(u32 contents_mode, + u32 filenames_mode) +{ + if (contents_mode == FS_ENCRYPTION_MODE_AES_128_CBC && + filenames_mode == FS_ENCRYPTION_MODE_AES_128_CTS) + return true; + + if (contents_mode == FS_ENCRYPTION_MODE_AES_256_XTS && + filenames_mode == FS_ENCRYPTION_MODE_AES_256_CTS) + return true; + + if (contents_mode == FS_ENCRYPTION_MODE_SPECK128_256_XTS && + filenames_mode == FS_ENCRYPTION_MODE_SPECK128_256_CTS) + return true; + + return false; +} + +/* crypto.c */ +extern struct kmem_cache *fscrypt_info_cachep; +extern int fscrypt_initialize(unsigned int cop_flags); +extern int fscrypt_do_page_crypto(const struct inode *inode, + fscrypt_direction_t rw, u64 lblk_num, + struct page *src_page, + struct page *dest_page, + unsigned int len, unsigned int offs, + gfp_t gfp_flags); +extern struct page *fscrypt_alloc_bounce_page(struct fscrypt_ctx *ctx, + gfp_t gfp_flags); +extern const struct dentry_operations fscrypt_d_ops; + +extern void __printf(3, 4) __cold +fscrypt_msg(struct super_block *sb, const char *level, const char *fmt, ...); + +#define fscrypt_warn(sb, fmt, ...) \ + fscrypt_msg(sb, KERN_WARNING, fmt, ##__VA_ARGS__) +#define fscrypt_err(sb, fmt, ...) \ + fscrypt_msg(sb, KERN_ERR, fmt, ##__VA_ARGS__) + +/* fname.c */ +extern int fname_encrypt(struct inode *inode, const struct qstr *iname, + u8 *out, unsigned int olen); +extern bool fscrypt_fname_encrypted_size(const struct inode *inode, + u32 orig_len, u32 max_len, + u32 *encrypted_len_ret); + +/* keyinfo.c */ +extern void __exit fscrypt_essiv_cleanup(void); + +#endif /* _FSCRYPT_PRIVATE_H */
diff --git a/fs/crypto/hooks.c b/fs/crypto/hooks.c new file mode 100644 index 0000000..926e5df --- /dev/null +++ b/fs/crypto/hooks.c
@@ -0,0 +1,271 @@ +/* + * fs/crypto/hooks.c + * + * Encryption hooks for higher-level filesystem operations. + */ + +#include <linux/ratelimit.h> +#include "fscrypt_private.h" + +/** + * fscrypt_file_open - prepare to open a possibly-encrypted regular file + * @inode: the inode being opened + * @filp: the struct file being set up + * + * Currently, an encrypted regular file can only be opened if its encryption key + * is available; access to the raw encrypted contents is not supported. + * Therefore, we first set up the inode's encryption key (if not already done) + * and return an error if it's unavailable. + * + * We also verify that if the parent directory (from the path via which the file + * is being opened) is encrypted, then the inode being opened uses the same + * encryption policy. This is needed as part of the enforcement that all files + * in an encrypted directory tree use the same encryption policy, as a + * protection against certain types of offline attacks. Note that this check is + * needed even when opening an *unencrypted* file, since it's forbidden to have + * an unencrypted file in an encrypted directory. + * + * Return: 0 on success, -ENOKEY if the key is missing, or another -errno code + */ +int fscrypt_file_open(struct inode *inode, struct file *filp) +{ + int err; + struct dentry *dir; + + err = fscrypt_require_key(inode); + if (err) + return err; + + dir = dget_parent(file_dentry(filp)); + if (IS_ENCRYPTED(d_inode(dir)) && + !fscrypt_has_permitted_context(d_inode(dir), inode)) { + fscrypt_warn(inode->i_sb, + "inconsistent encryption contexts: %lu/%lu", + d_inode(dir)->i_ino, inode->i_ino); + err = -EPERM; + } + dput(dir); + return err; +} +EXPORT_SYMBOL_GPL(fscrypt_file_open); + +int __fscrypt_prepare_link(struct inode *inode, struct inode *dir) +{ + int err; + + err = fscrypt_require_key(dir); + if (err) + return err; + + if (!fscrypt_has_permitted_context(dir, inode)) + return -EPERM; + + return 0; +} +EXPORT_SYMBOL_GPL(__fscrypt_prepare_link); + +int __fscrypt_prepare_rename(struct inode *old_dir, struct dentry *old_dentry, + struct inode *new_dir, struct dentry *new_dentry, + unsigned int flags) +{ + int err; + + err = fscrypt_require_key(old_dir); + if (err) + return err; + + err = fscrypt_require_key(new_dir); + if (err) + return err; + + if (old_dir != new_dir) { + if (IS_ENCRYPTED(new_dir) && + !fscrypt_has_permitted_context(new_dir, + d_inode(old_dentry))) + return -EPERM; + + if ((flags & RENAME_EXCHANGE) && + IS_ENCRYPTED(old_dir) && + !fscrypt_has_permitted_context(old_dir, + d_inode(new_dentry))) + return -EPERM; + } + return 0; +} +EXPORT_SYMBOL_GPL(__fscrypt_prepare_rename); + +int __fscrypt_prepare_lookup(struct inode *dir, struct dentry *dentry) +{ + int err = fscrypt_get_encryption_info(dir); + + if (err) + return err; + + if (fscrypt_has_encryption_key(dir)) { + spin_lock(&dentry->d_lock); + dentry->d_flags |= DCACHE_ENCRYPTED_WITH_KEY; + spin_unlock(&dentry->d_lock); + } + + d_set_d_op(dentry, &fscrypt_d_ops); + return 0; +} +EXPORT_SYMBOL_GPL(__fscrypt_prepare_lookup); + +int __fscrypt_prepare_symlink(struct inode *dir, unsigned int len, + unsigned int max_len, + struct fscrypt_str *disk_link) +{ + int err; + + /* + * To calculate the size of the encrypted symlink target we need to know + * the amount of NUL padding, which is determined by the flags set in + * the encryption policy which will be inherited from the directory. + * The easiest way to get access to this is to just load the directory's + * fscrypt_info, since we'll need it to create the dir_entry anyway. + * + * Note: in test_dummy_encryption mode, @dir may be unencrypted. + */ + err = fscrypt_get_encryption_info(dir); + if (err) + return err; + if (!fscrypt_has_encryption_key(dir)) + return -ENOKEY; + + /* + * Calculate the size of the encrypted symlink and verify it won't + * exceed max_len. Note that for historical reasons, encrypted symlink + * targets are prefixed with the ciphertext length, despite this + * actually being redundant with i_size. This decreases by 2 bytes the + * longest symlink target we can accept. + * + * We could recover 1 byte by not counting a null terminator, but + * counting it (even though it is meaningless for ciphertext) is simpler + * for now since filesystems will assume it is there and subtract it. + */ + if (!fscrypt_fname_encrypted_size(dir, len, + max_len - sizeof(struct fscrypt_symlink_data), + &disk_link->len)) + return -ENAMETOOLONG; + disk_link->len += sizeof(struct fscrypt_symlink_data); + + disk_link->name = NULL; + return 0; +} +EXPORT_SYMBOL_GPL(__fscrypt_prepare_symlink); + +int __fscrypt_encrypt_symlink(struct inode *inode, const char *target, + unsigned int len, struct fscrypt_str *disk_link) +{ + int err; + struct qstr iname = QSTR_INIT(target, len); + struct fscrypt_symlink_data *sd; + unsigned int ciphertext_len; + + err = fscrypt_require_key(inode); + if (err) + return err; + + if (disk_link->name) { + /* filesystem-provided buffer */ + sd = (struct fscrypt_symlink_data *)disk_link->name; + } else { + sd = kmalloc(disk_link->len, GFP_NOFS); + if (!sd) + return -ENOMEM; + } + ciphertext_len = disk_link->len - sizeof(*sd); + sd->len = cpu_to_le16(ciphertext_len); + + err = fname_encrypt(inode, &iname, sd->encrypted_path, ciphertext_len); + if (err) { + if (!disk_link->name) + kfree(sd); + return err; + } + /* + * Null-terminating the ciphertext doesn't make sense, but we still + * count the null terminator in the length, so we might as well + * initialize it just in case the filesystem writes it out. + */ + sd->encrypted_path[ciphertext_len] = '\0'; + + if (!disk_link->name) + disk_link->name = (unsigned char *)sd; + return 0; +} +EXPORT_SYMBOL_GPL(__fscrypt_encrypt_symlink); + +/** + * fscrypt_get_symlink - get the target of an encrypted symlink + * @inode: the symlink inode + * @caddr: the on-disk contents of the symlink + * @max_size: size of @caddr buffer + * @done: if successful, will be set up to free the returned target + * + * If the symlink's encryption key is available, we decrypt its target. + * Otherwise, we encode its target for presentation. + * + * This may sleep, so the filesystem must have dropped out of RCU mode already. + * + * Return: the presentable symlink target or an ERR_PTR() + */ +const char *fscrypt_get_symlink(struct inode *inode, const void *caddr, + unsigned int max_size, + struct delayed_call *done) +{ + const struct fscrypt_symlink_data *sd; + struct fscrypt_str cstr, pstr; + int err; + + /* This is for encrypted symlinks only */ + if (WARN_ON(!IS_ENCRYPTED(inode))) + return ERR_PTR(-EINVAL); + + /* + * Try to set up the symlink's encryption key, but we can continue + * regardless of whether the key is available or not. + */ + err = fscrypt_get_encryption_info(inode); + if (err) + return ERR_PTR(err); + + /* + * For historical reasons, encrypted symlink targets are prefixed with + * the ciphertext length, even though this is redundant with i_size. + */ + + if (max_size < sizeof(*sd)) + return ERR_PTR(-EUCLEAN); + sd = caddr; + cstr.name = (unsigned char *)sd->encrypted_path; + cstr.len = le16_to_cpu(sd->len); + + if (cstr.len == 0) + return ERR_PTR(-EUCLEAN); + + if (cstr.len + sizeof(*sd) - 1 > max_size) + return ERR_PTR(-EUCLEAN); + + err = fscrypt_fname_alloc_buffer(inode, cstr.len, &pstr); + if (err) + return ERR_PTR(err); + + err = fscrypt_fname_disk_to_usr(inode, 0, 0, &cstr, &pstr); + if (err) + goto err_kfree; + + err = -EUCLEAN; + if (pstr.name[0] == '\0') + goto err_kfree; + + pstr.name[pstr.len] = '\0'; + set_delayed_call(done, kfree_link, pstr.name); + return pstr.name; + +err_kfree: + kfree(pstr.name); + return ERR_PTR(err); +} +EXPORT_SYMBOL_GPL(fscrypt_get_symlink);
diff --git a/fs/crypto/keyinfo.c b/fs/crypto/keyinfo.c index a755fa1..70c0cca 100644 --- a/fs/crypto/keyinfo.c +++ b/fs/crypto/keyinfo.c
@@ -10,34 +10,28 @@ #include <keys/user-type.h> #include <linux/scatterlist.h> -#include <linux/fscrypto.h> +#include <linux/ratelimit.h> +#include <crypto/aes.h> +#include <crypto/sha.h> +#include <crypto/skcipher.h> +#include "fscrypt_private.h" -static void derive_crypt_complete(struct crypto_async_request *req, int rc) -{ - struct fscrypt_completion_result *ecr = req->data; +static struct crypto_shash *essiv_hash_tfm; - if (rc == -EINPROGRESS) - return; - - ecr->res = rc; - complete(&ecr->completion); -} - -/** - * derive_key_aes() - Derive a key using AES-128-ECB - * @deriving_key: Encryption key used for derivation. - * @source_key: Source key to which to apply derivation. - * @derived_key: Derived key. +/* + * Key derivation function. This generates the derived key by encrypting the + * master key with AES-128-ECB using the inode's nonce as the AES key. * - * Return: Zero on success; non-zero otherwise. + * The master key must be at least as long as the derived key. If the master + * key is longer, then only the first 'derived_keysize' bytes are used. */ -static int derive_key_aes(u8 deriving_key[FS_AES_128_ECB_KEY_SIZE], - u8 source_key[FS_AES_256_XTS_KEY_SIZE], - u8 derived_key[FS_AES_256_XTS_KEY_SIZE]) +static int derive_key_aes(const u8 *master_key, + const struct fscrypt_context *ctx, + u8 *derived_key, unsigned int derived_keysize) { int res = 0; struct skcipher_request *req = NULL; - DECLARE_FS_COMPLETION_RESULT(ecr); + DECLARE_CRYPTO_WAIT(wait); struct scatterlist src_sg, dst_sg; struct crypto_skcipher *tfm = crypto_alloc_skcipher("ecb(aes)", 0, 0); @@ -54,116 +48,163 @@ static int derive_key_aes(u8 deriving_key[FS_AES_128_ECB_KEY_SIZE], } skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP, - derive_crypt_complete, &ecr); - res = crypto_skcipher_setkey(tfm, deriving_key, - FS_AES_128_ECB_KEY_SIZE); + crypto_req_done, &wait); + res = crypto_skcipher_setkey(tfm, ctx->nonce, sizeof(ctx->nonce)); if (res < 0) goto out; - sg_init_one(&src_sg, source_key, FS_AES_256_XTS_KEY_SIZE); - sg_init_one(&dst_sg, derived_key, FS_AES_256_XTS_KEY_SIZE); - skcipher_request_set_crypt(req, &src_sg, &dst_sg, - FS_AES_256_XTS_KEY_SIZE, NULL); - res = crypto_skcipher_encrypt(req); - if (res == -EINPROGRESS || res == -EBUSY) { - wait_for_completion(&ecr.completion); - res = ecr.res; - } + sg_init_one(&src_sg, master_key, derived_keysize); + sg_init_one(&dst_sg, derived_key, derived_keysize); + skcipher_request_set_crypt(req, &src_sg, &dst_sg, derived_keysize, + NULL); + res = crypto_wait_req(crypto_skcipher_encrypt(req), &wait); out: skcipher_request_free(req); crypto_free_skcipher(tfm); return res; } -static int validate_user_key(struct fscrypt_info *crypt_info, - struct fscrypt_context *ctx, u8 *raw_key, - u8 *prefix, int prefix_size) +/* + * Search the current task's subscribed keyrings for a "logon" key with + * description prefix:descriptor, and if found acquire a read lock on it and + * return a pointer to its validated payload in *payload_ret. + */ +static struct key * +find_and_lock_process_key(const char *prefix, + const u8 descriptor[FS_KEY_DESCRIPTOR_SIZE], + unsigned int min_keysize, + const struct fscrypt_key **payload_ret) { - u8 *full_key_descriptor; - struct key *keyring_key; - struct fscrypt_key *master_key; + char *description; + struct key *key; const struct user_key_payload *ukp; - int full_key_len = prefix_size + (FS_KEY_DESCRIPTOR_SIZE * 2) + 1; - int res; + const struct fscrypt_key *payload; - full_key_descriptor = kmalloc(full_key_len, GFP_NOFS); - if (!full_key_descriptor) - return -ENOMEM; + description = kasprintf(GFP_NOFS, "%s%*phN", prefix, + FS_KEY_DESCRIPTOR_SIZE, descriptor); + if (!description) + return ERR_PTR(-ENOMEM); - memcpy(full_key_descriptor, prefix, prefix_size); - sprintf(full_key_descriptor + prefix_size, - "%*phN", FS_KEY_DESCRIPTOR_SIZE, - ctx->master_key_descriptor); - full_key_descriptor[full_key_len - 1] = '\0'; - keyring_key = request_key(&key_type_logon, full_key_descriptor, NULL); - kfree(full_key_descriptor); - if (IS_ERR(keyring_key)) - return PTR_ERR(keyring_key); - down_read(&keyring_key->sem); + key = request_key(&key_type_logon, description, NULL); + kfree(description); + if (IS_ERR(key)) + return key; - if (keyring_key->type != &key_type_logon) { - printk_once(KERN_WARNING - "%s: key type must be logon\n", __func__); - res = -ENOKEY; - goto out; - } - ukp = user_key_payload(keyring_key); - if (!ukp) { - /* key was revoked before we acquired its semaphore */ - res = -EKEYREVOKED; - goto out; - } - if (ukp->datalen != sizeof(struct fscrypt_key)) { - res = -EINVAL; - goto out; - } - master_key = (struct fscrypt_key *)ukp->data; - BUILD_BUG_ON(FS_AES_128_ECB_KEY_SIZE != FS_KEY_DERIVATION_NONCE_SIZE); + down_read(&key->sem); + ukp = user_key_payload_locked(key); - if (master_key->size != FS_AES_256_XTS_KEY_SIZE) { - printk_once(KERN_WARNING - "%s: key size incorrect: %d\n", - __func__, master_key->size); - res = -ENOKEY; - goto out; + if (!ukp) /* was the key revoked before we acquired its semaphore? */ + goto invalid; + + payload = (const struct fscrypt_key *)ukp->data; + + if (ukp->datalen != sizeof(struct fscrypt_key) || + payload->size < 1 || payload->size > FS_MAX_KEY_SIZE) { + fscrypt_warn(NULL, + "key with description '%s' has invalid payload", + key->description); + goto invalid; } - res = derive_key_aes(ctx->nonce, master_key->raw, raw_key); -out: - up_read(&keyring_key->sem); - key_put(keyring_key); - return res; + + if (payload->size < min_keysize) { + fscrypt_warn(NULL, + "key with description '%s' is too short (got %u bytes, need %u+ bytes)", + key->description, payload->size, min_keysize); + goto invalid; + } + + *payload_ret = payload; + return key; + +invalid: + up_read(&key->sem); + key_put(key); + return ERR_PTR(-ENOKEY); } -static int determine_cipher_type(struct fscrypt_info *ci, struct inode *inode, - const char **cipher_str_ret, int *keysize_ret) +/* Find the master key, then derive the inode's actual encryption key */ +static int find_and_derive_key(const struct inode *inode, + const struct fscrypt_context *ctx, + u8 *derived_key, unsigned int derived_keysize) { - if (S_ISREG(inode->i_mode)) { - if (ci->ci_data_mode == FS_ENCRYPTION_MODE_AES_256_XTS) { - *cipher_str_ret = "xts(aes)"; - *keysize_ret = FS_AES_256_XTS_KEY_SIZE; - return 0; - } - pr_warn_once("fscrypto: unsupported contents encryption mode " - "%d for inode %lu\n", - ci->ci_data_mode, inode->i_ino); - return -ENOKEY; + struct key *key; + const struct fscrypt_key *payload; + int err; + + key = find_and_lock_process_key(FS_KEY_DESC_PREFIX, + ctx->master_key_descriptor, + derived_keysize, &payload); + if (key == ERR_PTR(-ENOKEY) && inode->i_sb->s_cop->key_prefix) { + key = find_and_lock_process_key(inode->i_sb->s_cop->key_prefix, + ctx->master_key_descriptor, + derived_keysize, &payload); + } + if (IS_ERR(key)) + return PTR_ERR(key); + err = derive_key_aes(payload->raw, ctx, derived_key, derived_keysize); + up_read(&key->sem); + key_put(key); + return err; +} + +static struct fscrypt_mode { + const char *friendly_name; + const char *cipher_str; + int keysize; + bool logged_impl_name; +} available_modes[] = { + [FS_ENCRYPTION_MODE_AES_256_XTS] = { + .friendly_name = "AES-256-XTS", + .cipher_str = "xts(aes)", + .keysize = 64, + }, + [FS_ENCRYPTION_MODE_AES_256_CTS] = { + .friendly_name = "AES-256-CTS-CBC", + .cipher_str = "cts(cbc(aes))", + .keysize = 32, + }, + [FS_ENCRYPTION_MODE_AES_128_CBC] = { + .friendly_name = "AES-128-CBC", + .cipher_str = "cbc(aes)", + .keysize = 16, + }, + [FS_ENCRYPTION_MODE_AES_128_CTS] = { + .friendly_name = "AES-128-CTS-CBC", + .cipher_str = "cts(cbc(aes))", + .keysize = 16, + }, + [FS_ENCRYPTION_MODE_SPECK128_256_XTS] = { + .friendly_name = "Speck128/256-XTS", + .cipher_str = "xts(speck128)", + .keysize = 64, + }, + [FS_ENCRYPTION_MODE_SPECK128_256_CTS] = { + .friendly_name = "Speck128/256-CTS-CBC", + .cipher_str = "cts(cbc(speck128))", + .keysize = 32, + }, +}; + +static struct fscrypt_mode * +select_encryption_mode(const struct fscrypt_info *ci, const struct inode *inode) +{ + if (!fscrypt_valid_enc_modes(ci->ci_data_mode, ci->ci_filename_mode)) { + fscrypt_warn(inode->i_sb, + "inode %lu uses unsupported encryption modes (contents mode %d, filenames mode %d)", + inode->i_ino, ci->ci_data_mode, + ci->ci_filename_mode); + return ERR_PTR(-EINVAL); } - if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode)) { - if (ci->ci_filename_mode == FS_ENCRYPTION_MODE_AES_256_CTS) { - *cipher_str_ret = "cts(cbc(aes))"; - *keysize_ret = FS_AES_256_CTS_KEY_SIZE; - return 0; - } - pr_warn_once("fscrypto: unsupported filenames encryption mode " - "%d for inode %lu\n", - ci->ci_filename_mode, inode->i_ino); - return -ENOKEY; - } + if (S_ISREG(inode->i_mode)) + return &available_modes[ci->ci_data_mode]; - pr_warn_once("fscrypto: unsupported file type %d for inode %lu\n", - (inode->i_mode & S_IFMT), inode->i_ino); - return -ENOKEY; + if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode)) + return &available_modes[ci->ci_filename_mode]; + + WARN_ONCE(1, "fscrypt: filesystem tried to load encryption info for inode %lu, which is not encryptable (file type %d)\n", + inode->i_ino, (inode->i_mode & S_IFMT)); + return ERR_PTR(-EINVAL); } static void put_crypt_info(struct fscrypt_info *ci) @@ -172,37 +213,104 @@ static void put_crypt_info(struct fscrypt_info *ci) return; crypto_free_skcipher(ci->ci_ctfm); + crypto_free_cipher(ci->ci_essiv_tfm); kmem_cache_free(fscrypt_info_cachep, ci); } +static int derive_essiv_salt(const u8 *key, int keysize, u8 *salt) +{ + struct crypto_shash *tfm = READ_ONCE(essiv_hash_tfm); + + /* init hash transform on demand */ + if (unlikely(!tfm)) { + struct crypto_shash *prev_tfm; + + tfm = crypto_alloc_shash("sha256", 0, 0); + if (IS_ERR(tfm)) { + fscrypt_warn(NULL, + "error allocating SHA-256 transform: %ld", + PTR_ERR(tfm)); + return PTR_ERR(tfm); + } + prev_tfm = cmpxchg(&essiv_hash_tfm, NULL, tfm); + if (prev_tfm) { + crypto_free_shash(tfm); + tfm = prev_tfm; + } + } + + { + SHASH_DESC_ON_STACK(desc, tfm); + desc->tfm = tfm; + desc->flags = 0; + + return crypto_shash_digest(desc, key, keysize, salt); + } +} + +static int init_essiv_generator(struct fscrypt_info *ci, const u8 *raw_key, + int keysize) +{ + int err; + struct crypto_cipher *essiv_tfm; + u8 salt[SHA256_DIGEST_SIZE]; + + essiv_tfm = crypto_alloc_cipher("aes", 0, 0); + if (IS_ERR(essiv_tfm)) + return PTR_ERR(essiv_tfm); + + ci->ci_essiv_tfm = essiv_tfm; + + err = derive_essiv_salt(raw_key, keysize, salt); + if (err) + goto out; + + /* + * Using SHA256 to derive the salt/key will result in AES-256 being + * used for IV generation. File contents encryption will still use the + * configured keysize (AES-128) nevertheless. + */ + err = crypto_cipher_setkey(essiv_tfm, salt, sizeof(salt)); + if (err) + goto out; + +out: + memzero_explicit(salt, sizeof(salt)); + return err; +} + +void __exit fscrypt_essiv_cleanup(void) +{ + crypto_free_shash(essiv_hash_tfm); +} + int fscrypt_get_encryption_info(struct inode *inode) { struct fscrypt_info *crypt_info; struct fscrypt_context ctx; struct crypto_skcipher *ctfm; - const char *cipher_str; - int keysize; + struct fscrypt_mode *mode; u8 *raw_key = NULL; int res; if (inode->i_crypt_info) return 0; - res = fscrypt_initialize(); + res = fscrypt_initialize(inode->i_sb->s_cop->flags); if (res) return res; - if (!inode->i_sb->s_cop->get_context) - return -EOPNOTSUPP; - res = inode->i_sb->s_cop->get_context(inode, &ctx, sizeof(ctx)); if (res < 0) { - if (!fscrypt_dummy_context_enabled(inode)) + if (!fscrypt_dummy_context_enabled(inode) || + IS_ENCRYPTED(inode)) return res; + /* Fake up a context for an unencrypted directory */ + memset(&ctx, 0, sizeof(ctx)); ctx.format = FS_ENCRYPTION_CONTEXT_FORMAT_V1; ctx.contents_encryption_mode = FS_ENCRYPTION_MODE_AES_256_XTS; ctx.filenames_encryption_mode = FS_ENCRYPTION_MODE_AES_256_CTS; - ctx.flags = 0; + memset(ctx.master_key_descriptor, 0x42, FS_KEY_DESCRIPTOR_SIZE); } else if (res != sizeof(ctx)) { return -EINVAL; } @@ -221,60 +329,66 @@ int fscrypt_get_encryption_info(struct inode *inode) crypt_info->ci_data_mode = ctx.contents_encryption_mode; crypt_info->ci_filename_mode = ctx.filenames_encryption_mode; crypt_info->ci_ctfm = NULL; + crypt_info->ci_essiv_tfm = NULL; memcpy(crypt_info->ci_master_key, ctx.master_key_descriptor, sizeof(crypt_info->ci_master_key)); - res = determine_cipher_type(crypt_info, inode, &cipher_str, &keysize); - if (res) + mode = select_encryption_mode(crypt_info, inode); + if (IS_ERR(mode)) { + res = PTR_ERR(mode); goto out; + } /* * This cannot be a stack buffer because it is passed to the scatterlist * crypto API as part of key derivation. */ res = -ENOMEM; - raw_key = kmalloc(FS_MAX_KEY_SIZE, GFP_NOFS); + raw_key = kmalloc(mode->keysize, GFP_NOFS); if (!raw_key) goto out; - if (fscrypt_dummy_context_enabled(inode)) { - memset(raw_key, 0x42, FS_AES_256_XTS_KEY_SIZE); - goto got_key; - } - - res = validate_user_key(crypt_info, &ctx, raw_key, - FS_KEY_DESC_PREFIX, FS_KEY_DESC_PREFIX_SIZE); - if (res && inode->i_sb->s_cop->key_prefix) { - u8 *prefix = NULL; - int prefix_size, res2; - - prefix_size = inode->i_sb->s_cop->key_prefix(inode, &prefix); - res2 = validate_user_key(crypt_info, &ctx, raw_key, - prefix, prefix_size); - if (res2) { - if (res2 == -ENOKEY) - res = -ENOKEY; - goto out; - } - } else if (res) { - goto out; - } -got_key: - ctfm = crypto_alloc_skcipher(cipher_str, 0, 0); - if (!ctfm || IS_ERR(ctfm)) { - res = ctfm ? PTR_ERR(ctfm) : -ENOMEM; - printk(KERN_DEBUG - "%s: error %d (inode %u) allocating crypto tfm\n", - __func__, res, (unsigned) inode->i_ino); - goto out; - } - crypt_info->ci_ctfm = ctfm; - crypto_skcipher_clear_flags(ctfm, ~0); - crypto_skcipher_set_flags(ctfm, CRYPTO_TFM_REQ_WEAK_KEY); - res = crypto_skcipher_setkey(ctfm, raw_key, keysize); + res = find_and_derive_key(inode, &ctx, raw_key, mode->keysize); if (res) goto out; + ctfm = crypto_alloc_skcipher(mode->cipher_str, 0, 0); + if (IS_ERR(ctfm)) { + res = PTR_ERR(ctfm); + fscrypt_warn(inode->i_sb, + "error allocating '%s' transform for inode %lu: %d", + mode->cipher_str, inode->i_ino, res); + goto out; + } + if (unlikely(!mode->logged_impl_name)) { + /* + * fscrypt performance can vary greatly depending on which + * crypto algorithm implementation is used. Help people debug + * performance problems by logging the ->cra_driver_name the + * first time a mode is used. Note that multiple threads can + * race here, but it doesn't really matter. + */ + mode->logged_impl_name = true; + pr_info("fscrypt: %s using implementation \"%s\"\n", + mode->friendly_name, + crypto_skcipher_alg(ctfm)->base.cra_driver_name); + } + crypt_info->ci_ctfm = ctfm; + crypto_skcipher_set_flags(ctfm, CRYPTO_TFM_REQ_WEAK_KEY); + res = crypto_skcipher_setkey(ctfm, raw_key, mode->keysize); + if (res) + goto out; + + if (S_ISREG(inode->i_mode) && + crypt_info->ci_data_mode == FS_ENCRYPTION_MODE_AES_128_CBC) { + res = init_essiv_generator(crypt_info, raw_key, mode->keysize); + if (res) { + fscrypt_warn(inode->i_sb, + "error initializing ESSIV generator for inode %lu: %d", + inode->i_ino, res); + goto out; + } + } if (cmpxchg(&inode->i_crypt_info, NULL, crypt_info) == NULL) crypt_info = NULL; out: @@ -286,19 +400,9 @@ int fscrypt_get_encryption_info(struct inode *inode) } EXPORT_SYMBOL(fscrypt_get_encryption_info); -void fscrypt_put_encryption_info(struct inode *inode, struct fscrypt_info *ci) +void fscrypt_put_encryption_info(struct inode *inode) { - struct fscrypt_info *prev; - - if (ci == NULL) - ci = ACCESS_ONCE(inode->i_crypt_info); - if (ci == NULL) - return; - - prev = cmpxchg(&inode->i_crypt_info, ci, NULL); - if (prev != ci) - return; - - put_crypt_info(ci); + put_crypt_info(inode->i_crypt_info); + inode->i_crypt_info = NULL; } EXPORT_SYMBOL(fscrypt_put_encryption_info);
diff --git a/fs/crypto/policy.c b/fs/crypto/policy.c index c160d2d..2f2c53f 100644 --- a/fs/crypto/policy.c +++ b/fs/crypto/policy.c
@@ -10,76 +10,37 @@ #include <linux/random.h> #include <linux/string.h> -#include <linux/fscrypto.h> #include <linux/mount.h> - -static int inode_has_encryption_context(struct inode *inode) -{ - if (!inode->i_sb->s_cop->get_context) - return 0; - return (inode->i_sb->s_cop->get_context(inode, NULL, 0L) > 0); -} +#include "fscrypt_private.h" /* - * check whether the policy is consistent with the encryption context - * for the inode + * check whether an encryption policy is consistent with an encryption context */ -static int is_encryption_context_consistent_with_policy(struct inode *inode, +static bool is_encryption_context_consistent_with_policy( + const struct fscrypt_context *ctx, const struct fscrypt_policy *policy) { - struct fscrypt_context ctx; - int res; - - if (!inode->i_sb->s_cop->get_context) - return 0; - - res = inode->i_sb->s_cop->get_context(inode, &ctx, sizeof(ctx)); - if (res != sizeof(ctx)) - return 0; - - return (memcmp(ctx.master_key_descriptor, policy->master_key_descriptor, - FS_KEY_DESCRIPTOR_SIZE) == 0 && - (ctx.flags == policy->flags) && - (ctx.contents_encryption_mode == - policy->contents_encryption_mode) && - (ctx.filenames_encryption_mode == - policy->filenames_encryption_mode)); + return memcmp(ctx->master_key_descriptor, policy->master_key_descriptor, + FS_KEY_DESCRIPTOR_SIZE) == 0 && + (ctx->flags == policy->flags) && + (ctx->contents_encryption_mode == + policy->contents_encryption_mode) && + (ctx->filenames_encryption_mode == + policy->filenames_encryption_mode); } static int create_encryption_context_from_policy(struct inode *inode, const struct fscrypt_policy *policy) { struct fscrypt_context ctx; - int res; - - if (!inode->i_sb->s_cop->set_context) - return -EOPNOTSUPP; - - if (inode->i_sb->s_cop->prepare_context) { - res = inode->i_sb->s_cop->prepare_context(inode); - if (res) - return res; - } ctx.format = FS_ENCRYPTION_CONTEXT_FORMAT_V1; memcpy(ctx.master_key_descriptor, policy->master_key_descriptor, FS_KEY_DESCRIPTOR_SIZE); - if (!fscrypt_valid_contents_enc_mode( - policy->contents_encryption_mode)) { - printk(KERN_WARNING - "%s: Invalid contents encryption mode %d\n", __func__, - policy->contents_encryption_mode); + if (!fscrypt_valid_enc_modes(policy->contents_encryption_mode, + policy->filenames_encryption_mode)) return -EINVAL; - } - - if (!fscrypt_valid_filenames_enc_mode( - policy->filenames_encryption_mode)) { - printk(KERN_WARNING - "%s: Invalid filenames encryption mode %d\n", __func__, - policy->filenames_encryption_mode); - return -EINVAL; - } if (policy->flags & ~FS_POLICY_FLAGS_VALID) return -EINVAL; @@ -93,16 +54,20 @@ static int create_encryption_context_from_policy(struct inode *inode, return inode->i_sb->s_cop->set_context(inode, &ctx, sizeof(ctx), NULL); } -int fscrypt_process_policy(struct file *filp, - const struct fscrypt_policy *policy) +int fscrypt_ioctl_set_policy(struct file *filp, const void __user *arg) { + struct fscrypt_policy policy; struct inode *inode = file_inode(filp); int ret; + struct fscrypt_context ctx; + + if (copy_from_user(&policy, arg, sizeof(policy))) + return -EFAULT; if (!inode_owner_or_capable(inode)) return -EACCES; - if (policy->version != 0) + if (policy.version != 0) return -EINVAL; ret = mnt_want_write_file(filp); @@ -111,22 +76,23 @@ int fscrypt_process_policy(struct file *filp, inode_lock(inode); - if (!inode_has_encryption_context(inode)) { + ret = inode->i_sb->s_cop->get_context(inode, &ctx, sizeof(ctx)); + if (ret == -ENODATA) { if (!S_ISDIR(inode->i_mode)) ret = -ENOTDIR; - else if (!inode->i_sb->s_cop->empty_dir) - ret = -EOPNOTSUPP; else if (!inode->i_sb->s_cop->empty_dir(inode)) ret = -ENOTEMPTY; else ret = create_encryption_context_from_policy(inode, - policy); - } else if (!is_encryption_context_consistent_with_policy(inode, - policy)) { - printk(KERN_WARNING - "%s: Policy inconsistent with encryption context\n", - __func__); - ret = -EINVAL; + &policy); + } else if (ret == sizeof(ctx) && + is_encryption_context_consistent_with_policy(&ctx, + &policy)) { + /* The file already uses the same encryption policy. */ + ret = 0; + } else if (ret >= 0 || ret == -ERANGE) { + /* The file already uses a different encryption policy. */ + ret = -EEXIST; } inode_unlock(inode); @@ -134,32 +100,38 @@ int fscrypt_process_policy(struct file *filp, mnt_drop_write_file(filp); return ret; } -EXPORT_SYMBOL(fscrypt_process_policy); +EXPORT_SYMBOL(fscrypt_ioctl_set_policy); -int fscrypt_get_policy(struct inode *inode, struct fscrypt_policy *policy) +int fscrypt_ioctl_get_policy(struct file *filp, void __user *arg) { + struct inode *inode = file_inode(filp); struct fscrypt_context ctx; + struct fscrypt_policy policy; int res; - if (!inode->i_sb->s_cop->get_context || - !inode->i_sb->s_cop->is_encrypted(inode)) + if (!IS_ENCRYPTED(inode)) return -ENODATA; res = inode->i_sb->s_cop->get_context(inode, &ctx, sizeof(ctx)); + if (res < 0 && res != -ERANGE) + return res; if (res != sizeof(ctx)) - return -ENODATA; + return -EINVAL; if (ctx.format != FS_ENCRYPTION_CONTEXT_FORMAT_V1) return -EINVAL; - policy->version = 0; - policy->contents_encryption_mode = ctx.contents_encryption_mode; - policy->filenames_encryption_mode = ctx.filenames_encryption_mode; - policy->flags = ctx.flags; - memcpy(&policy->master_key_descriptor, ctx.master_key_descriptor, + policy.version = 0; + policy.contents_encryption_mode = ctx.contents_encryption_mode; + policy.filenames_encryption_mode = ctx.filenames_encryption_mode; + policy.flags = ctx.flags; + memcpy(policy.master_key_descriptor, ctx.master_key_descriptor, FS_KEY_DESCRIPTOR_SIZE); + + if (copy_to_user(arg, &policy, sizeof(policy))) + return -EFAULT; return 0; } -EXPORT_SYMBOL(fscrypt_get_policy); +EXPORT_SYMBOL(fscrypt_ioctl_get_policy); /** * fscrypt_has_permitted_context() - is a file's encryption policy permitted @@ -194,11 +166,11 @@ int fscrypt_has_permitted_context(struct inode *parent, struct inode *child) return 1; /* No restrictions if the parent directory is unencrypted */ - if (!cops->is_encrypted(parent)) + if (!IS_ENCRYPTED(parent)) return 1; /* Encrypted directories must not contain unencrypted files */ - if (!cops->is_encrypted(child)) + if (!IS_ENCRYPTED(child)) return 0; /* @@ -258,9 +230,9 @@ EXPORT_SYMBOL(fscrypt_has_permitted_context); * @parent: Parent inode from which the context is inherited. * @child: Child inode that inherits the context from @parent. * @fs_data: private data given by FS. - * @preload: preload child i_crypt_info + * @preload: preload child i_crypt_info if true * - * Return: Zero on success, non-zero otherwise + * Return: 0 on success, -errno on failure */ int fscrypt_inherit_context(struct inode *parent, struct inode *child, void *fs_data, bool preload) @@ -269,9 +241,6 @@ int fscrypt_inherit_context(struct inode *parent, struct inode *child, struct fscrypt_info *ci; int res; - if (!parent->i_sb->s_cop->set_context) - return -EOPNOTSUPP; - res = fscrypt_get_encryption_info(parent); if (res < 0) return res; @@ -281,19 +250,11 @@ int fscrypt_inherit_context(struct inode *parent, struct inode *child, return -ENOKEY; ctx.format = FS_ENCRYPTION_CONTEXT_FORMAT_V1; - if (fscrypt_dummy_context_enabled(parent)) { - ctx.contents_encryption_mode = FS_ENCRYPTION_MODE_AES_256_XTS; - ctx.filenames_encryption_mode = FS_ENCRYPTION_MODE_AES_256_CTS; - ctx.flags = 0; - memset(ctx.master_key_descriptor, 0x42, FS_KEY_DESCRIPTOR_SIZE); - res = 0; - } else { - ctx.contents_encryption_mode = ci->ci_data_mode; - ctx.filenames_encryption_mode = ci->ci_filename_mode; - ctx.flags = ci->ci_flags; - memcpy(ctx.master_key_descriptor, ci->ci_master_key, - FS_KEY_DESCRIPTOR_SIZE); - } + ctx.contents_encryption_mode = ci->ci_data_mode; + ctx.filenames_encryption_mode = ci->ci_filename_mode; + ctx.flags = ci->ci_flags; + memcpy(ctx.master_key_descriptor, ci->ci_master_key, + FS_KEY_DESCRIPTOR_SIZE); get_random_bytes(ctx.nonce, FS_KEY_DERIVATION_NONCE_SIZE); res = parent->i_sb->s_cop->set_context(child, &ctx, sizeof(ctx), fs_data);
diff --git a/fs/dcache.c b/fs/dcache.c index f903b86..183e0ae 100644 --- a/fs/dcache.c +++ b/fs/dcache.c
@@ -3257,6 +3257,7 @@ char *d_absolute_path(const struct path *path, return ERR_PTR(error); return res; } +EXPORT_SYMBOL(d_absolute_path); /* * same as __d_path but appends "(deleted)" for unlinked files.
diff --git a/fs/ecryptfs/ecryptfs_kernel.h b/fs/ecryptfs/ecryptfs_kernel.h index a896e46..d4d8ad1 100644 --- a/fs/ecryptfs/ecryptfs_kernel.h +++ b/fs/ecryptfs/ecryptfs_kernel.h
@@ -125,7 +125,7 @@ ecryptfs_get_key_payload_data(struct key *key) if (auth_tok) return auth_tok; - ukp = user_key_payload(key); + ukp = user_key_payload_locked(key); if (!ukp) return ERR_PTR(-EKEYREVOKED);
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 3cbc304..5b96ba7 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c
@@ -34,6 +34,7 @@ #include <linux/mutex.h> #include <linux/anon_inodes.h> #include <linux/device.h> +#include <linux/freezer.h> #include <asm/uaccess.h> #include <asm/io.h> #include <asm/mman.h> @@ -1673,7 +1674,8 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, } spin_unlock_irqrestore(&ep->lock, flags); - if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS)) + if (!freezable_schedule_hrtimeout_range(to, slack, + HRTIMER_MODE_ABS)) timed_out = 1; spin_lock_irqsave(&ep->lock, flags);
diff --git a/fs/exec.c b/fs/exec.c index fcd8642e..df13a19 100644 --- a/fs/exec.c +++ b/fs/exec.c
@@ -1303,7 +1303,7 @@ EXPORT_SYMBOL(flush_old_exec); void would_dump(struct linux_binprm *bprm, struct file *file) { struct inode *inode = file_inode(file); - if (inode_permission(inode, MAY_READ) < 0) { + if (inode_permission2(file->f_path.mnt, inode, MAY_READ) < 0) { struct user_namespace *old, *user_ns; bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c index d8072bc..7f5b735 100644 --- a/fs/exofs/inode.c +++ b/fs/exofs/inode.c
@@ -377,9 +377,8 @@ static int read_exec(struct page_collect *pcol) * and will start a new collection. Eventually caller must submit the last * segment if present. */ -static int readpage_strip(void *data, struct page *page) +static int __readpage_strip(struct page_collect *pcol, struct page *page) { - struct page_collect *pcol = data; struct inode *inode = pcol->inode; struct exofs_i_info *oi = exofs_i(inode); loff_t i_size = i_size_read(inode); @@ -470,6 +469,13 @@ static int readpage_strip(void *data, struct page *page) return ret; } +static int readpage_strip(struct file *data, struct page *page) +{ + struct page_collect *pcol = (struct page_collect *)data; + + return __readpage_strip(pcol, page); +} + static int exofs_readpages(struct file *file, struct address_space *mapping, struct list_head *pages, unsigned nr_pages) { @@ -499,7 +505,7 @@ static int _readpage(struct page *page, bool read_4_write) _pcol_init(&pcol, 1, page->mapping->host); pcol.read_4_write = read_4_write; - ret = readpage_strip(&pcol, page); + ret = __readpage_strip(&pcol, page); if (ret) { EXOFS_ERR("_readpage => %d\n", ret); return ret;
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 43e27d8..0647538 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h
@@ -32,13 +32,15 @@ #include <linux/percpu_counter.h> #include <linux/ratelimit.h> #include <crypto/hash.h> -#include <linux/fscrypto.h> #include <linux/falloc.h> #include <linux/percpu-rwsem.h> #ifdef __KERNEL__ #include <linux/compat.h> #endif +#define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_EXT4_FS_ENCRYPTION) +#include <linux/fscrypt.h> + /* * The fourth extended filesystem constants/structures */ @@ -1342,11 +1344,6 @@ struct ext4_super_block { /* Number of quota types we support */ #define EXT4_MAXQUOTAS 3 -#ifdef CONFIG_EXT4_FS_ENCRYPTION -#define EXT4_KEY_DESC_PREFIX "ext4:" -#define EXT4_KEY_DESC_PREFIX_SIZE 5 -#endif - /* * fourth extended-fs super-block data in memory */ @@ -1516,12 +1513,6 @@ struct ext4_sb_info { /* Barrier between changing inodes' journal flags and writepages ops. */ struct percpu_rw_semaphore s_journal_flag_rwsem; - - /* Encryption support */ -#ifdef CONFIG_EXT4_FS_ENCRYPTION - u8 key_prefix[EXT4_KEY_DESC_PREFIX_SIZE]; - u8 key_prefix_size; -#endif }; static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb) @@ -2272,11 +2263,6 @@ extern unsigned ext4_free_clusters_after_init(struct super_block *sb, struct ext4_group_desc *gdp); ext4_fsblk_t ext4_inode_to_goal_block(struct inode *); -static inline int ext4_sb_has_crypto(struct super_block *sb) -{ - return ext4_has_feature_encrypt(sb); -} - static inline bool ext4_encrypted_inode(struct inode *inode) { return ext4_test_inode_flag(inode, EXT4_INODE_ENCRYPT); @@ -2325,28 +2311,6 @@ static inline int ext4_fname_setup_filename(struct inode *dir, } static inline void ext4_fname_free_filename(struct ext4_filename *fname) { } -#define fscrypt_set_d_op(i) -#define fscrypt_get_ctx fscrypt_notsupp_get_ctx -#define fscrypt_release_ctx fscrypt_notsupp_release_ctx -#define fscrypt_encrypt_page fscrypt_notsupp_encrypt_page -#define fscrypt_decrypt_page fscrypt_notsupp_decrypt_page -#define fscrypt_decrypt_bio_pages fscrypt_notsupp_decrypt_bio_pages -#define fscrypt_pullback_bio_page fscrypt_notsupp_pullback_bio_page -#define fscrypt_restore_control_page fscrypt_notsupp_restore_control_page -#define fscrypt_zeroout_range fscrypt_notsupp_zeroout_range -#define fscrypt_process_policy fscrypt_notsupp_process_policy -#define fscrypt_get_policy fscrypt_notsupp_get_policy -#define fscrypt_has_permitted_context fscrypt_notsupp_has_permitted_context -#define fscrypt_inherit_context fscrypt_notsupp_inherit_context -#define fscrypt_get_encryption_info fscrypt_notsupp_get_encryption_info -#define fscrypt_put_encryption_info fscrypt_notsupp_put_encryption_info -#define fscrypt_setup_filename fscrypt_notsupp_setup_filename -#define fscrypt_free_filename fscrypt_notsupp_free_filename -#define fscrypt_fname_encrypted_size fscrypt_notsupp_fname_encrypted_size -#define fscrypt_fname_alloc_buffer fscrypt_notsupp_fname_alloc_buffer -#define fscrypt_fname_free_buffer fscrypt_notsupp_fname_free_buffer -#define fscrypt_fname_disk_to_usr fscrypt_notsupp_fname_disk_to_usr -#define fscrypt_fname_usr_to_disk fscrypt_notsupp_fname_usr_to_disk #endif /* dir.c */ @@ -2368,17 +2332,16 @@ extern int ext4_find_dest_de(struct inode *dir, struct inode *inode, void *buf, int buf_size, struct ext4_filename *fname, struct ext4_dir_entry_2 **dest_de); -int ext4_insert_dentry(struct inode *dir, - struct inode *inode, - struct ext4_dir_entry_2 *de, - int buf_size, - struct ext4_filename *fname); +void ext4_insert_dentry(struct inode *inode, + struct ext4_dir_entry_2 *de, + int buf_size, + struct ext4_filename *fname); static inline void ext4_update_dx_flag(struct inode *inode) { if (!ext4_has_feature_dir_index(inode->i_sb)) ext4_clear_inode_flag(inode, EXT4_INODE_INDEX); } -static unsigned char ext4_filetype_table[] = { +static const unsigned char ext4_filetype_table[] = { DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK }; @@ -2445,7 +2408,8 @@ extern int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t i, struct ext4_group_desc *desc); extern int ext4_group_add_blocks(handle_t *handle, struct super_block *sb, ext4_fsblk_t block, unsigned long count); -extern int ext4_trim_fs(struct super_block *, struct fstrim_range *); +extern int ext4_trim_fs(struct super_block *, struct fstrim_range *, + unsigned long blkdev_flags); /* inode.c */ int ext4_inode_is_fast_symlink(struct inode *inode); @@ -3065,7 +3029,7 @@ extern int ext4_handle_dirty_dirent_node(handle_t *handle, struct inode *inode, struct buffer_head *bh); #define S_SHIFT 12 -static unsigned char ext4_type_by_mode[S_IFMT >> S_SHIFT] = { +static const unsigned char ext4_type_by_mode[S_IFMT >> S_SHIFT] = { [S_IFREG >> S_SHIFT] = EXT4_FT_REG_FILE, [S_IFDIR >> S_SHIFT] = EXT4_FT_DIR, [S_IFCHR >> S_SHIFT] = EXT4_FT_CHRDEV,
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 4f78e09..004c088 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c
@@ -1072,6 +1072,17 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, if (err) goto fail_drop; + /* + * Since the encryption xattr will always be unique, create it first so + * that it's less likely to end up in an external xattr block and + * prevent its deduplication. + */ + if (encrypt) { + err = fscrypt_inherit_context(dir, inode, handle, true); + if (err) + goto fail_free_drop; + } + err = ext4_init_acl(handle, inode, dir); if (err) goto fail_free_drop; @@ -1093,13 +1104,6 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, ei->i_datasync_tid = handle->h_transaction->t_tid; } - if (encrypt) { - /* give pointer to avoid set_context with journal ops. */ - err = fscrypt_inherit_context(dir, inode, &encrypt, true); - if (err) - goto fail_free_drop; - } - err = ext4_mark_inode_dirty(handle, inode); if (err) { ext4_std_error(sb, err);
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index 211539a..f901b643 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c
@@ -18,6 +18,7 @@ #include "ext4.h" #include "xattr.h" #include "truncate.h" +#include <trace/events/android_fs.h> #define EXT4_XATTR_SYSTEM_DATA "data" #define EXT4_MIN_INLINE_DATA_SIZE ((sizeof(__le32) * EXT4_N_BLOCKS)) @@ -502,6 +503,17 @@ int ext4_readpage_inline(struct inode *inode, struct page *page) return -EAGAIN; } + if (trace_android_fs_dataread_start_enabled()) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_dataread_start(inode, page_offset(page), + PAGE_SIZE, current->pid, + path, current->comm); + } + /* * Current inline data can only exist in the 1st page, * So for all the other pages, just set them uptodate. @@ -513,6 +525,8 @@ int ext4_readpage_inline(struct inode *inode, struct page *page) SetPageUptodate(page); } + trace_android_fs_dataread_end(inode, page_offset(page), PAGE_SIZE); + up_read(&EXT4_I(inode)->xattr_sem); unlock_page(page); @@ -1025,7 +1039,7 @@ static int ext4_add_dirent_to_inline(handle_t *handle, err = ext4_journal_get_write_access(handle, iloc->bh); if (err) return err; - ext4_insert_dentry(dir, inode, de, inline_size, fname); + ext4_insert_dentry(inode, de, inline_size, fname); ext4_show_inline_dir(dir, iloc->bh, inline_start, inline_size);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index f62eca8..ba50c10 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c
@@ -44,6 +44,7 @@ #include "truncate.h" #include <trace/events/ext4.h> +#include <trace/events/android_fs.h> #define MPAGE_DA_EXTENT_TAIL 0x01 @@ -1165,7 +1166,8 @@ static int ext4_block_write_begin(struct page *page, loff_t pos, unsigned len, if (unlikely(err)) page_zero_new_buffers(page, from, to); else if (decrypt) - err = fscrypt_decrypt_page(page); + err = fscrypt_decrypt_page(page->mapping->host, page, + PAGE_SIZE, 0, page->index); return err; } #endif @@ -1182,6 +1184,16 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, pgoff_t index; unsigned from, to; + if (trace_android_fs_datawrite_start_enabled()) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_datawrite_start(inode, pos, len, + current->pid, path, + current->comm); + } trace_ext4_write_begin(inode, pos, len, flags); /* * Reserve one block more for addition to orphan list in case @@ -1320,6 +1332,7 @@ static int ext4_write_end(struct file *file, int i_size_changed = 0; int inline_data = ext4_has_inline_data(inode); + trace_android_fs_datawrite_end(inode, pos, len); trace_ext4_write_end(inode, pos, len, copied); if (inline_data) { ret = ext4_write_inline_data_end(inode, pos, len, @@ -1425,6 +1438,7 @@ static int ext4_journalled_write_end(struct file *file, int size_changed = 0; int inline_data = ext4_has_inline_data(inode); + trace_android_fs_datawrite_end(inode, pos, len); trace_ext4_journalled_write_end(inode, pos, len, copied); from = pos & (PAGE_SIZE - 1); to = from + len; @@ -2553,8 +2567,8 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) mpd->map.m_len = 0; mpd->next_page = index; while (index <= end) { - nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1); + nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, + tag); if (nr_pages == 0) goto out; @@ -2562,16 +2576,6 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) struct page *page = pvec.pages[i]; /* - * At this point, the page may be truncated or - * invalidated (changing page->mapping to NULL), or - * even swizzled back from swapper_space to tmpfs file - * mapping. However, page->index will not change - * because we have a reference on the page. - */ - if (page->index > end) - goto out; - - /* * Accumulated enough dirty pages? This doesn't apply * to WB_SYNC_ALL mode. For integrity sync we have to * keep going because someone may be concurrently @@ -2921,6 +2925,16 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping, len, flags, pagep, fsdata); } *fsdata = (void *)0; + if (trace_android_fs_datawrite_start_enabled()) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_datawrite_start(inode, pos, len, + current->pid, + path, current->comm); + } trace_ext4_da_write_begin(inode, pos, len, flags); if (ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)) { @@ -3039,6 +3053,7 @@ static int ext4_da_write_end(struct file *file, return ext4_write_end(file, mapping, pos, len, copied, page, fsdata); + trace_android_fs_datawrite_end(inode, pos, len); trace_ext4_da_write_end(inode, pos, len, copied); start = pos & (PAGE_SIZE - 1); end = start + copied - 1; @@ -3602,6 +3617,7 @@ static ssize_t ext4_direct_IO(struct kiocb *iocb, struct iov_iter *iter) size_t count = iov_iter_count(iter); loff_t offset = iocb->ki_pos; ssize_t ret; + int rw = iov_iter_rw(iter); #ifdef CONFIG_EXT4_FS_ENCRYPTION if (ext4_encrypted_inode(inode) && S_ISREG(inode->i_mode)) @@ -3618,12 +3634,42 @@ static ssize_t ext4_direct_IO(struct kiocb *iocb, struct iov_iter *iter) if (ext4_has_inline_data(inode)) return 0; + if (trace_android_fs_dataread_start_enabled() && + (rw == READ)) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_dataread_start(inode, offset, count, + current->pid, path, + current->comm); + } + if (trace_android_fs_datawrite_start_enabled() && + (rw == WRITE)) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_datawrite_start(inode, offset, count, + current->pid, path, + current->comm); + } trace_ext4_direct_IO_enter(inode, offset, count, iov_iter_rw(iter)); if (iov_iter_rw(iter) == READ) ret = ext4_direct_IO_read(iocb, iter); else ret = ext4_direct_IO_write(iocb, iter); trace_ext4_direct_IO_exit(inode, offset, count, iov_iter_rw(iter), ret); + + if (trace_android_fs_dataread_start_enabled() && + (rw == READ)) + trace_android_fs_dataread_end(inode, offset, count); + if (trace_android_fs_datawrite_start_enabled() && + (rw == WRITE)) + trace_android_fs_datawrite_end(inode, offset, count); + return ret; } @@ -3774,7 +3820,8 @@ static int __ext4_block_zero_page_range(handle_t *handle, /* We expect the key to be set. */ BUG_ON(!fscrypt_has_encryption_key(inode)); BUG_ON(blocksize != PAGE_SIZE); - WARN_ON_ONCE(fscrypt_decrypt_page(page)); + WARN_ON_ONCE(fscrypt_decrypt_page(page->mapping->host, + page, PAGE_SIZE, 0, page->index)); } } if (ext4_should_journal_data(inode)) { @@ -4392,8 +4439,11 @@ void ext4_set_inode_flags(struct inode *inode) new_fl |= S_DIRSYNC; if (test_opt(inode->i_sb, DAX) && S_ISREG(inode->i_mode)) new_fl |= S_DAX; + if (flags & EXT4_ENCRYPT_FL) + new_fl |= S_ENCRYPTED; inode_set_flags(inode, new_fl, - S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX); + S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX| + S_ENCRYPTED); } /* Propagate flags from i_flags to EXT4_I(inode)->i_flags */
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index bf5ae8e..15ca15c 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c
@@ -191,6 +191,7 @@ static long swap_inode_boot_loader(struct super_block *sb, return err; } +#ifdef CONFIG_EXT4_FS_ENCRYPTION static int uuid_is_zero(__u8 u[16]) { int i; @@ -200,6 +201,7 @@ static int uuid_is_zero(__u8 u[16]) return 0; return 1; } +#endif static int ext4_ioctl_setflags(struct inode *inode, unsigned int flags) @@ -735,11 +737,13 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) return err; } + case FIDTRIM: case FITRIM: { struct request_queue *q = bdev_get_queue(sb->s_bdev); struct fstrim_range range; int ret = 0; + int flags = cmd == FIDTRIM ? BLKDEV_DISCARD_SECURE : 0; if (!capable(CAP_SYS_ADMIN)) return -EPERM; @@ -747,13 +751,16 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) if (!blk_queue_discard(q)) return -EOPNOTSUPP; + if ((flags & BLKDEV_DISCARD_SECURE) && !blk_queue_secure_erase(q)) + return -EOPNOTSUPP; + if (copy_from_user(&range, (struct fstrim_range __user *)arg, sizeof(range))) return -EFAULT; range.minlen = max((unsigned int)range.minlen, q->limits.discard_granularity); - ret = ext4_trim_fs(sb, &range); + ret = ext4_trim_fs(sb, &range, flags); if (ret < 0) return ret; @@ -765,28 +772,19 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) } case EXT4_IOC_PRECACHE_EXTENTS: return ext4_ext_precache(inode); - case EXT4_IOC_SET_ENCRYPTION_POLICY: { -#ifdef CONFIG_EXT4_FS_ENCRYPTION - struct fscrypt_policy policy; + case EXT4_IOC_SET_ENCRYPTION_POLICY: if (!ext4_has_feature_encrypt(sb)) return -EOPNOTSUPP; + return fscrypt_ioctl_set_policy(filp, (const void __user *)arg); - if (copy_from_user(&policy, - (struct fscrypt_policy __user *)arg, - sizeof(policy))) - return -EFAULT; - return fscrypt_process_policy(filp, &policy); -#else - return -EOPNOTSUPP; -#endif - } case EXT4_IOC_GET_ENCRYPTION_PWSALT: { +#ifdef CONFIG_EXT4_FS_ENCRYPTION int err, err2; struct ext4_sb_info *sbi = EXT4_SB(sb); handle_t *handle; - if (!ext4_sb_has_crypto(sb)) + if (!ext4_has_feature_encrypt(sb)) return -EOPNOTSUPP; if (uuid_is_zero(sbi->s_es->s_encrypt_pw_salt)) { err = mnt_want_write_file(filp); @@ -816,30 +814,18 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) sbi->s_es->s_encrypt_pw_salt, 16)) return -EFAULT; return 0; - } - case EXT4_IOC_GET_ENCRYPTION_POLICY: { -#ifdef CONFIG_EXT4_FS_ENCRYPTION - struct fscrypt_policy policy; - int err = 0; - - if (!ext4_encrypted_inode(inode)) - return -ENOENT; - err = fscrypt_get_policy(inode, &policy); - if (err) - return err; - if (copy_to_user((void __user *)arg, &policy, sizeof(policy))) - return -EFAULT; - return 0; #else return -EOPNOTSUPP; #endif } + case EXT4_IOC_GET_ENCRYPTION_POLICY: + return fscrypt_ioctl_get_policy(filp, (void __user *)arg); + case EXT4_IOC_FSGETXATTR: { struct fsxattr fa; memset(&fa, 0, sizeof(struct fsxattr)); - ext4_get_inode_flags(ei); fa.fsx_xflags = ext4_iflags_to_xflags(ei->i_flags & EXT4_FL_USER_VISIBLE); if (ext4_has_feature_project(inode->i_sb)) {
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index a49d0e5d7..3d6f73e 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c
@@ -2775,7 +2775,8 @@ int ext4_mb_release(struct super_block *sb) } static inline int ext4_issue_discard(struct super_block *sb, - ext4_group_t block_group, ext4_grpblk_t cluster, int count) + ext4_group_t block_group, ext4_grpblk_t cluster, int count, + unsigned long flags) { ext4_fsblk_t discard_block; @@ -2784,7 +2785,7 @@ static inline int ext4_issue_discard(struct super_block *sb, count = EXT4_C2B(EXT4_SB(sb), count); trace_ext4_discard_blocks(sb, (unsigned long long) discard_block, count); - return sb_issue_discard(sb, discard_block, count, GFP_NOFS, 0); + return sb_issue_discard(sb, discard_block, count, GFP_NOFS, flags); } /* @@ -2806,7 +2807,7 @@ static void ext4_free_data_callback(struct super_block *sb, if (test_opt(sb, DISCARD)) { err = ext4_issue_discard(sb, entry->efd_group, entry->efd_start_cluster, - entry->efd_count); + entry->efd_count, 0); if (err && err != -EOPNOTSUPP) ext4_msg(sb, KERN_WARNING, "discard request in" " group:%d block:%d count:%d failed" @@ -4865,7 +4866,8 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode, * them with group lock_held */ if (test_opt(sb, DISCARD)) { - err = ext4_issue_discard(sb, block_group, bit, count); + err = ext4_issue_discard(sb, block_group, bit, count, + 0); if (err && err != -EOPNOTSUPP) ext4_msg(sb, KERN_WARNING, "discard request in" " group:%d block:%d count:%lu failed" @@ -5061,13 +5063,15 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb, * @count: number of blocks to TRIM * @group: alloc. group we are working with * @e4b: ext4 buddy for the group + * @blkdev_flags: flags for the block device * * Trim "count" blocks starting at "start" in the "group". To assure that no * one will allocate those blocks, mark it as used in buddy bitmap. This must * be called with under the group lock. */ static int ext4_trim_extent(struct super_block *sb, int start, int count, - ext4_group_t group, struct ext4_buddy *e4b) + ext4_group_t group, struct ext4_buddy *e4b, + unsigned long blkdev_flags) __releases(bitlock) __acquires(bitlock) { @@ -5088,7 +5092,7 @@ __acquires(bitlock) */ mb_mark_used(e4b, &ex); ext4_unlock_group(sb, group); - ret = ext4_issue_discard(sb, group, start, count); + ret = ext4_issue_discard(sb, group, start, count, blkdev_flags); ext4_lock_group(sb, group); mb_free_blocks(NULL, e4b, start, ex.fe_len); return ret; @@ -5101,6 +5105,7 @@ __acquires(bitlock) * @start: first group block to examine * @max: last group block to examine * @minblocks: minimum extent block count + * @blkdev_flags: flags for the block device * * ext4_trim_all_free walks through group's buddy bitmap searching for free * extents. When the free block is found, ext4_trim_extent is called to TRIM @@ -5115,7 +5120,7 @@ __acquires(bitlock) static ext4_grpblk_t ext4_trim_all_free(struct super_block *sb, ext4_group_t group, ext4_grpblk_t start, ext4_grpblk_t max, - ext4_grpblk_t minblocks) + ext4_grpblk_t minblocks, unsigned long blkdev_flags) { void *bitmap; ext4_grpblk_t next, count = 0, free_count = 0; @@ -5148,7 +5153,8 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group, if ((next - start) >= minblocks) { ret = ext4_trim_extent(sb, start, - next - start, group, &e4b); + next - start, group, &e4b, + blkdev_flags); if (ret && ret != -EOPNOTSUPP) break; ret = 0; @@ -5190,6 +5196,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group, * ext4_trim_fs() -- trim ioctl handle function * @sb: superblock for filesystem * @range: fstrim_range structure + * @blkdev_flags: flags for the block device * * start: First Byte to trim * len: number of Bytes to trim from start @@ -5198,7 +5205,8 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group, * start to start+len. For each such a group ext4_trim_all_free function * is invoked to trim all free space. */ -int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range) +int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range, + unsigned long blkdev_flags) { struct ext4_group_info *grp; ext4_group_t group, first_group, last_group; @@ -5254,7 +5262,7 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range) if (grp->bb_free >= minlen) { cnt = ext4_trim_all_free(sb, group, first_cluster, - end, minlen); + end, minlen, blkdev_flags); if (cnt < 0) { ret = cnt; break;
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index bc727c3..9bad755 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c
@@ -1237,37 +1237,24 @@ static void dx_insert_block(struct dx_frame *frame, u32 hash, ext4_lblk_t block) } /* - * NOTE! unlike strncmp, ext4_match returns 1 for success, 0 for failure. + * Test whether a directory entry matches the filename being searched for. * - * `len <= EXT4_NAME_LEN' is guaranteed by caller. - * `de != NULL' is guaranteed by caller. + * Return: %true if the directory entry matches, otherwise %false. */ -static inline int ext4_match(struct ext4_filename *fname, - struct ext4_dir_entry_2 *de) +static inline bool ext4_match(const struct ext4_filename *fname, + const struct ext4_dir_entry_2 *de) { - const void *name = fname_name(fname); - u32 len = fname_len(fname); + struct fscrypt_name f; if (!de->inode) - return 0; + return false; + f.usr_fname = fname->usr_fname; + f.disk_name = fname->disk_name; #ifdef CONFIG_EXT4_FS_ENCRYPTION - if (unlikely(!name)) { - if (fname->usr_fname->name[0] == '_') { - int ret; - if (de->name_len <= 32) - return 0; - ret = memcmp(de->name + ((de->name_len - 17) & ~15), - fname->crypto_buf.name + 8, 16); - return (ret == 0) ? 1 : 0; - } - name = fname->crypto_buf.name; - len = fname->crypto_buf.len; - } + f.crypto_buf = fname->crypto_buf; #endif - if (de->name_len != len) - return 0; - return (memcmp(de->name, name, len) == 0) ? 1 : 0; + return fscrypt_match_name(&f, de->name, de->name_len); } /* @@ -1281,48 +1268,31 @@ int ext4_search_dir(struct buffer_head *bh, char *search_buf, int buf_size, struct ext4_dir_entry_2 * de; char * dlimit; int de_len; - int res; de = (struct ext4_dir_entry_2 *)search_buf; dlimit = search_buf + buf_size; while ((char *) de < dlimit) { /* this code is executed quadratically often */ /* do minimal checking `by hand' */ - if ((char *) de + de->name_len <= dlimit) { - res = ext4_match(fname, de); - if (res < 0) { - res = -1; - goto return_result; - } - if (res > 0) { - /* found a match - just to be sure, do - * a full check */ - if (ext4_check_dir_entry(dir, NULL, de, bh, - bh->b_data, - bh->b_size, offset)) { - res = -1; - goto return_result; - } - *res_dir = de; - res = 1; - goto return_result; - } - + if ((char *) de + de->name_len <= dlimit && + ext4_match(fname, de)) { + /* found a match - just to be sure, do + * a full check */ + if (ext4_check_dir_entry(dir, NULL, de, bh, bh->b_data, + bh->b_size, offset)) + return -1; + *res_dir = de; + return 1; } /* prevent looping on a bad block */ de_len = ext4_rec_len_from_disk(de->rec_len, dir->i_sb->s_blocksize); - if (de_len <= 0) { - res = -1; - goto return_result; - } + if (de_len <= 0) + return -1; offset += de_len; de = (struct ext4_dir_entry_2 *) ((char *) de + de_len); } - - res = 0; -return_result: - return res; + return 0; } static int is_dx_internal_node(struct inode *dir, ext4_lblk_t block, @@ -1576,24 +1546,14 @@ static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, unsi struct inode *inode; struct ext4_dir_entry_2 *de; struct buffer_head *bh; + int err; - if (ext4_encrypted_inode(dir)) { - int res = fscrypt_get_encryption_info(dir); + err = fscrypt_prepare_lookup(dir, dentry, flags); + if (err) + return ERR_PTR(err); - /* - * DCACHE_ENCRYPTED_WITH_KEY is set if the dentry is - * created while the directory was encrypted and we - * have access to the key. - */ - if (fscrypt_has_encryption_key(dir)) - fscrypt_set_encrypted_dentry(dentry); - fscrypt_set_d_op(dentry); - if (res && res != -ENOKEY) - return ERR_PTR(res); - } - - if (dentry->d_name.len > EXT4_NAME_LEN) - return ERR_PTR(-ENAMETOOLONG); + if (dentry->d_name.len > EXT4_NAME_LEN) + return ERR_PTR(-ENAMETOOLONG); bh = ext4_find_entry(dir, &dentry->d_name, &de, NULL); if (IS_ERR(bh)) @@ -1621,16 +1581,9 @@ static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, unsi if (!IS_ERR(inode) && ext4_encrypted_inode(dir) && (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode)) && !fscrypt_has_permitted_context(dir, inode)) { - int nokey = ext4_encrypted_inode(inode) && - !fscrypt_has_encryption_key(inode); - if (nokey) { - iput(inode); - return ERR_PTR(-ENOKEY); - } ext4_warning(inode->i_sb, "Inconsistent encryption contexts: %lu/%lu", - (unsigned long) dir->i_ino, - (unsigned long) inode->i_ino); + dir->i_ino, inode->i_ino); iput(inode); return ERR_PTR(-EPERM); } @@ -1838,24 +1791,15 @@ int ext4_find_dest_de(struct inode *dir, struct inode *inode, int nlen, rlen; unsigned int offset = 0; char *top; - int res; de = (struct ext4_dir_entry_2 *)buf; top = buf + buf_size - reclen; while ((char *) de <= top) { if (ext4_check_dir_entry(dir, NULL, de, bh, - buf, buf_size, offset)) { - res = -EFSCORRUPTED; - goto return_result; - } - /* Provide crypto context and crypto buffer to ext4 match */ - res = ext4_match(fname, de); - if (res < 0) - goto return_result; - if (res > 0) { - res = -EEXIST; - goto return_result; - } + buf, buf_size, offset)) + return -EFSCORRUPTED; + if (ext4_match(fname, de)) + return -EEXIST; nlen = EXT4_DIR_REC_LEN(de->name_len); rlen = ext4_rec_len_from_disk(de->rec_len, buf_size); if ((de->inode ? rlen - nlen : rlen) >= reclen) @@ -1863,22 +1807,17 @@ int ext4_find_dest_de(struct inode *dir, struct inode *inode, de = (struct ext4_dir_entry_2 *)((char *)de + rlen); offset += rlen; } - if ((char *) de > top) - res = -ENOSPC; - else { - *dest_de = de; - res = 0; - } -return_result: - return res; + return -ENOSPC; + + *dest_de = de; + return 0; } -int ext4_insert_dentry(struct inode *dir, - struct inode *inode, - struct ext4_dir_entry_2 *de, - int buf_size, - struct ext4_filename *fname) +void ext4_insert_dentry(struct inode *inode, + struct ext4_dir_entry_2 *de, + int buf_size, + struct ext4_filename *fname) { int nlen, rlen; @@ -1897,7 +1836,6 @@ int ext4_insert_dentry(struct inode *dir, ext4_set_de_type(inode->i_sb, de, inode->i_mode); de->name_len = fname_len(fname); memcpy(de->name, fname_name(fname), fname_len(fname)); - return 0; } /* @@ -1933,11 +1871,8 @@ static int add_dirent_to_buf(handle_t *handle, struct ext4_filename *fname, return err; } - /* By now the buffer is marked for journaling. Due to crypto operations, - * the following function call may fail */ - err = ext4_insert_dentry(dir, inode, de, blocksize, fname); - if (err < 0) - return err; + /* By now the buffer is marked for journaling */ + ext4_insert_dentry(inode, de, blocksize, fname); /* * XXX shouldn't update any times until successful @@ -3081,36 +3016,16 @@ static int ext4_symlink(struct inode *dir, struct inode *inode; int err, len = strlen(symname); int credits; - bool encryption_required; struct fscrypt_str disk_link; - struct fscrypt_symlink_data *sd = NULL; - disk_link.len = len + 1; - disk_link.name = (char *) symname; - - encryption_required = (ext4_encrypted_inode(dir) || - DUMMY_ENCRYPTION_ENABLED(EXT4_SB(dir->i_sb))); - if (encryption_required) { - err = fscrypt_get_encryption_info(dir); - if (err) - return err; - if (!fscrypt_has_encryption_key(dir)) - return -ENOKEY; - disk_link.len = (fscrypt_fname_encrypted_size(dir, len) + - sizeof(struct fscrypt_symlink_data)); - sd = kzalloc(disk_link.len, GFP_KERNEL); - if (!sd) - return -ENOMEM; - } - - if (disk_link.len > dir->i_sb->s_blocksize) { - err = -ENAMETOOLONG; - goto err_free_sd; - } + err = fscrypt_prepare_symlink(dir, symname, len, dir->i_sb->s_blocksize, + &disk_link); + if (err) + return err; err = dquot_initialize(dir); if (err) - goto err_free_sd; + return err; if ((disk_link.len > EXT4_N_BLOCKS * 4)) { /* @@ -3139,27 +3054,18 @@ static int ext4_symlink(struct inode *dir, if (IS_ERR(inode)) { if (handle) ext4_journal_stop(handle); - err = PTR_ERR(inode); - goto err_free_sd; + return PTR_ERR(inode); } - if (encryption_required) { - struct qstr istr; - struct fscrypt_str ostr = - FSTR_INIT(sd->encrypted_path, disk_link.len); - - istr.name = (const unsigned char *) symname; - istr.len = len; - err = fscrypt_fname_usr_to_disk(inode, &istr, &ostr); + if (IS_ENCRYPTED(inode)) { + err = fscrypt_encrypt_symlink(inode, symname, len, &disk_link); if (err) goto err_drop_inode; - sd->len = cpu_to_le16(ostr.len); - disk_link.name = (char *) sd; inode->i_op = &ext4_encrypted_symlink_inode_operations; } if ((disk_link.len > EXT4_N_BLOCKS * 4)) { - if (!encryption_required) + if (!IS_ENCRYPTED(inode)) inode->i_op = &ext4_symlink_inode_operations; inode_nohighmem(inode); ext4_set_aops(inode); @@ -3201,7 +3107,7 @@ static int ext4_symlink(struct inode *dir, } else { /* clear the extent format for fast symlink */ ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS); - if (!encryption_required) { + if (!IS_ENCRYPTED(inode)) { inode->i_op = &ext4_fast_symlink_inode_operations; inode->i_link = (char *)&EXT4_I(inode)->i_data; } @@ -3216,16 +3122,17 @@ static int ext4_symlink(struct inode *dir, if (handle) ext4_journal_stop(handle); - kfree(sd); - return err; + goto out_free_encrypted_link; + err_drop_inode: if (handle) ext4_journal_stop(handle); clear_nlink(inode); unlock_new_inode(inode); iput(inode); -err_free_sd: - kfree(sd); +out_free_encrypted_link: + if (disk_link.name != (unsigned char *)symname) + kfree(disk_link.name); return err; }
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c index 0094923..0718a86 100644 --- a/fs/ext4/page-io.c +++ b/fs/ext4/page-io.c
@@ -24,7 +24,6 @@ #include <linux/slab.h> #include <linux/mm.h> #include <linux/backing-dev.h> -#include <linux/fscrypto.h> #include "ext4_jbd2.h" #include "xattr.h" @@ -470,7 +469,8 @@ int ext4_bio_write_page(struct ext4_io_submit *io, gfp_t gfp_flags = GFP_NOFS; retry_encrypt: - data_page = fscrypt_encrypt_page(inode, page, gfp_flags); + data_page = fscrypt_encrypt_page(inode, page, PAGE_SIZE, 0, + page->index, gfp_flags); if (IS_ERR(data_page)) { ret = PTR_ERR(data_page); if (ret == -ENOMEM && wbc->sync_mode == WB_SYNC_ALL) {
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c index a81b829..c39a12d 100644 --- a/fs/ext4/readpage.c +++ b/fs/ext4/readpage.c
@@ -45,6 +45,7 @@ #include <linux/cleancache.h> #include "ext4.h" +#include <trace/events/android_fs.h> static inline bool ext4_bio_encrypted(struct bio *bio) { @@ -55,6 +56,17 @@ static inline bool ext4_bio_encrypted(struct bio *bio) #endif } +static void +ext4_trace_read_completion(struct bio *bio) +{ + struct page *first_page = bio->bi_io_vec[0].bv_page; + + if (first_page != NULL) + trace_android_fs_dataread_end(first_page->mapping->host, + page_offset(first_page), + bio->bi_iter.bi_size); +} + /* * I/O completion handler for multipage BIOs. * @@ -72,11 +84,14 @@ static void mpage_end_io(struct bio *bio) struct bio_vec *bv; int i; + if (trace_android_fs_dataread_start_enabled()) + ext4_trace_read_completion(bio); + if (ext4_bio_encrypted(bio)) { if (bio->bi_error) { fscrypt_release_ctx(bio->bi_private); } else { - fscrypt_decrypt_bio_pages(bio->bi_private, bio); + fscrypt_enqueue_decrypt_bio(bio->bi_private, bio); return; } } @@ -95,6 +110,30 @@ static void mpage_end_io(struct bio *bio) bio_put(bio); } +static void +ext4_submit_bio_read(struct bio *bio) +{ + if (trace_android_fs_dataread_start_enabled()) { + struct page *first_page = bio->bi_io_vec[0].bv_page; + + if (first_page != NULL) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + first_page->mapping->host); + trace_android_fs_dataread_start( + first_page->mapping->host, + page_offset(first_page), + bio->bi_iter.bi_size, + current->pid, + path, + current->comm); + } + } + submit_bio(bio); +} + int ext4_mpage_readpages(struct address_space *mapping, struct list_head *pages, struct page *page, unsigned nr_pages) @@ -235,7 +274,7 @@ int ext4_mpage_readpages(struct address_space *mapping, */ if (bio && (last_block_in_bio != blocks[0] - 1)) { submit_and_realloc: - submit_bio(bio); + ext4_submit_bio_read(bio); bio = NULL; } if (bio == NULL) { @@ -268,14 +307,14 @@ int ext4_mpage_readpages(struct address_space *mapping, if (((map.m_flags & EXT4_MAP_BOUNDARY) && (relative_block == map.m_len)) || (first_hole != blocks_per_page)) { - submit_bio(bio); + ext4_submit_bio_read(bio); bio = NULL; } else last_block_in_bio = blocks[blocks_per_page - 1]; goto next_page; confused: if (bio) { - submit_bio(bio); + ext4_submit_bio_read(bio); bio = NULL; } if (!PageUptodate(page)) @@ -288,6 +327,6 @@ int ext4_mpage_readpages(struct address_space *mapping, } BUG_ON(pages && !list_empty(pages)); if (bio) - submit_bio(bio); + ext4_submit_bio_read(bio); return 0; }
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index f88d480..031e43d 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c
@@ -1028,9 +1028,7 @@ void ext4_clear_inode(struct inode *inode) jbd2_free_inode(EXT4_I(inode)->jinode); EXT4_I(inode)->jinode = NULL; } -#ifdef CONFIG_EXT4_FS_ENCRYPTION - fscrypt_put_encryption_info(inode, NULL); -#endif + fscrypt_put_encryption_info(inode); } static struct inode *ext4_nfs_get_inode(struct super_block *sb, @@ -1103,80 +1101,86 @@ static int ext4_get_context(struct inode *inode, void *ctx, size_t len) EXT4_XATTR_NAME_ENCRYPTION_CONTEXT, ctx, len); } -static int ext4_key_prefix(struct inode *inode, u8 **key) -{ - *key = EXT4_SB(inode->i_sb)->key_prefix; - return EXT4_SB(inode->i_sb)->key_prefix_size; -} - -static int ext4_prepare_context(struct inode *inode) -{ - return ext4_convert_inline_data(inode); -} - static int ext4_set_context(struct inode *inode, const void *ctx, size_t len, void *fs_data) { - handle_t *handle; - int res, res2; + handle_t *handle = fs_data; + int res, res2, retries = 0; - /* fs_data is null when internally used. */ - if (fs_data) { - res = ext4_xattr_set(inode, EXT4_XATTR_INDEX_ENCRYPTION, - EXT4_XATTR_NAME_ENCRYPTION_CONTEXT, ctx, - len, 0); + res = ext4_convert_inline_data(inode); + if (res) + return res; + + /* + * If a journal handle was specified, then the encryption context is + * being set on a new inode via inheritance and is part of a larger + * transaction to create the inode. Otherwise the encryption context is + * being set on an existing inode in its own transaction. Only in the + * latter case should the "retry on ENOSPC" logic be used. + */ + + if (handle) { + res = ext4_xattr_set_handle(handle, inode, + EXT4_XATTR_INDEX_ENCRYPTION, + EXT4_XATTR_NAME_ENCRYPTION_CONTEXT, + ctx, len, 0); if (!res) { ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT); ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA); + /* + * Update inode->i_flags - S_ENCRYPTED will be enabled, + * S_DAX may be disabled + */ + ext4_set_inode_flags(inode); } return res; } + res = dquot_initialize(inode); + if (res) + return res; +retry: handle = ext4_journal_start(inode, EXT4_HT_MISC, ext4_jbd2_credits_xattr(inode)); if (IS_ERR(handle)) return PTR_ERR(handle); - res = ext4_xattr_set(inode, EXT4_XATTR_INDEX_ENCRYPTION, - EXT4_XATTR_NAME_ENCRYPTION_CONTEXT, ctx, - len, 0); + res = ext4_xattr_set_handle(handle, inode, EXT4_XATTR_INDEX_ENCRYPTION, + EXT4_XATTR_NAME_ENCRYPTION_CONTEXT, + ctx, len, 0); if (!res) { ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT); + /* + * Update inode->i_flags - S_ENCRYPTED will be enabled, + * S_DAX may be disabled + */ + ext4_set_inode_flags(inode); res = ext4_mark_inode_dirty(handle, inode); if (res) EXT4_ERROR_INODE(inode, "Failed to mark inode dirty"); } res2 = ext4_journal_stop(handle); + + if (res == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) + goto retry; if (!res) res = res2; return res; } -static int ext4_dummy_context(struct inode *inode) +static bool ext4_dummy_context(struct inode *inode) { return DUMMY_ENCRYPTION_ENABLED(EXT4_SB(inode->i_sb)); } -static unsigned ext4_max_namelen(struct inode *inode) -{ - return S_ISLNK(inode->i_mode) ? inode->i_sb->s_blocksize : - EXT4_NAME_LEN; -} - -static struct fscrypt_operations ext4_cryptops = { +static const struct fscrypt_operations ext4_cryptops = { + .key_prefix = "ext4:", .get_context = ext4_get_context, - .key_prefix = ext4_key_prefix, - .prepare_context = ext4_prepare_context, .set_context = ext4_set_context, .dummy_context = ext4_dummy_context, - .is_encrypted = ext4_encrypted_inode, .empty_dir = ext4_empty_dir, - .max_namelen = ext4_max_namelen, -}; -#else -static struct fscrypt_operations ext4_cryptops = { - .is_encrypted = ext4_encrypted_inode, + .max_namelen = EXT4_NAME_LEN, }; #endif @@ -3963,7 +3967,9 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) sb->s_op = &ext4_sops; sb->s_export_op = &ext4_export_ops; sb->s_xattr = ext4_xattr_handlers; +#ifdef CONFIG_EXT4_FS_ENCRYPTION sb->s_cop = &ext4_cryptops; +#endif #ifdef CONFIG_QUOTA sb->dq_op = &ext4_quota_operations; if (ext4_has_feature_quota(sb)) @@ -4279,11 +4285,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) ratelimit_state_init(&sbi->s_msg_ratelimit_state, 5 * HZ, 10); kfree(orig_data); -#ifdef CONFIG_EXT4_FS_ENCRYPTION - memcpy(sbi->key_prefix, EXT4_KEY_DESC_PREFIX, - EXT4_KEY_DESC_PREFIX_SIZE); - sbi->key_prefix_size = EXT4_KEY_DESC_PREFIX_SIZE; -#endif return 0; cantfind_ext4:
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c index 557b3b0..d4ce3af 100644 --- a/fs/ext4/symlink.c +++ b/fs/ext4/symlink.c
@@ -27,59 +27,28 @@ static const char *ext4_encrypted_get_link(struct dentry *dentry, struct delayed_call *done) { struct page *cpage = NULL; - char *caddr, *paddr = NULL; - struct fscrypt_str cstr, pstr; - struct fscrypt_symlink_data *sd; - int res; - u32 max_size = inode->i_sb->s_blocksize; + const void *caddr; + unsigned int max_size; + const char *paddr; if (!dentry) return ERR_PTR(-ECHILD); - res = fscrypt_get_encryption_info(inode); - if (res) - return ERR_PTR(res); - if (ext4_inode_is_fast_symlink(inode)) { - caddr = (char *) EXT4_I(inode)->i_data; + caddr = EXT4_I(inode)->i_data; max_size = sizeof(EXT4_I(inode)->i_data); } else { cpage = read_mapping_page(inode->i_mapping, 0, NULL); if (IS_ERR(cpage)) return ERR_CAST(cpage); caddr = page_address(cpage); + max_size = inode->i_sb->s_blocksize; } - /* Symlink is encrypted */ - sd = (struct fscrypt_symlink_data *)caddr; - cstr.name = sd->encrypted_path; - cstr.len = le16_to_cpu(sd->len); - if ((cstr.len + sizeof(struct fscrypt_symlink_data) - 1) > max_size) { - /* Symlink data on the disk is corrupted */ - res = -EFSCORRUPTED; - goto errout; - } - - res = fscrypt_fname_alloc_buffer(inode, cstr.len, &pstr); - if (res) - goto errout; - paddr = pstr.name; - - res = fscrypt_fname_disk_to_usr(inode, 0, 0, &cstr, &pstr); - if (res) - goto errout; - - /* Null-terminate the name */ - paddr[pstr.len] = '\0'; + paddr = fscrypt_get_symlink(inode, caddr, max_size, done); if (cpage) put_page(cpage); - set_delayed_call(done, kfree_link, paddr); return paddr; -errout: - if (cpage) - put_page(cpage); - kfree(paddr); - return ERR_PTR(res); } const struct inode_operations ext4_encrypted_symlink_inode_operations = {
diff --git a/fs/f2fs/Makefile b/fs/f2fs/Makefile index ca949ea..a0dc559 100644 --- a/fs/f2fs/Makefile +++ b/fs/f2fs/Makefile
@@ -2,7 +2,7 @@ f2fs-y := dir.o file.o inode.o namei.o hash.o super.o inline.o f2fs-y += checkpoint.o gc.o data.o node.o segment.o recovery.o -f2fs-y += shrinker.o extent_cache.o +f2fs-y += shrinker.o extent_cache.o sysfs.o f2fs-$(CONFIG_F2FS_STAT_FS) += debug.o f2fs-$(CONFIG_F2FS_FS_XATTR) += xattr.o f2fs-$(CONFIG_F2FS_FS_POSIX_ACL) += acl.o
diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c index 55aa29c..2bb7c9f 100644 --- a/fs/f2fs/acl.c +++ b/fs/f2fs/acl.c
@@ -207,15 +207,16 @@ static int __f2fs_set_acl(struct inode *inode, int type, void *value = NULL; size_t size = 0; int error; + umode_t mode = inode->i_mode; switch (type) { case ACL_TYPE_ACCESS: name_index = F2FS_XATTR_INDEX_POSIX_ACL_ACCESS; if (acl && !ipage) { - error = posix_acl_update_mode(inode, &inode->i_mode, &acl); + error = posix_acl_update_mode(inode, &mode, &acl); if (error) return error; - set_acl_inode(inode, inode->i_mode); + set_acl_inode(inode, mode); } break; @@ -233,7 +234,7 @@ static int __f2fs_set_acl(struct inode *inode, int type, value = f2fs_acl_to_disk(F2FS_I_SB(inode), acl, &size); if (IS_ERR(value)) { clear_inode_flag(inode, FI_ACL_MODE); - return (int)PTR_ERR(value); + return PTR_ERR(value); } } @@ -249,6 +250,9 @@ static int __f2fs_set_acl(struct inode *inode, int type, int f2fs_set_acl(struct inode *inode, struct posix_acl *acl, int type) { + if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) + return -EIO; + return __f2fs_set_acl(inode, type, acl, NULL); } @@ -384,7 +388,7 @@ int f2fs_init_acl(struct inode *inode, struct inode *dir, struct page *ipage, if (error) return error; - f2fs_mark_inode_dirty_sync(inode); + f2fs_mark_inode_dirty_sync(inode, true); if (default_acl) { error = __f2fs_set_acl(inode, ACL_TYPE_DEFAULT, default_acl,
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index aee2a06..6b24eb4 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c
@@ -24,20 +24,20 @@ #include <trace/events/f2fs.h> static struct kmem_cache *ino_entry_slab; -struct kmem_cache *inode_entry_slab; +struct kmem_cache *f2fs_inode_entry_slab; void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool end_io) { + f2fs_build_fault_attr(sbi, 0, 0); set_ckpt_flags(sbi, CP_ERROR_FLAG); - sbi->sb->s_flags |= MS_RDONLY; if (!end_io) - f2fs_flush_merged_bios(sbi); + f2fs_flush_merged_writes(sbi); } /* * We guarantee no failure on the returned page. */ -struct page *grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index) +struct page *f2fs_grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index) { struct address_space *mapping = META_MAPPING(sbi); struct page *page = NULL; @@ -65,11 +65,13 @@ static struct page *__get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index, .sbi = sbi, .type = META, .op = REQ_OP_READ, - .op_flags = READ_SYNC | REQ_META | REQ_PRIO, + .op_flags = REQ_META | REQ_PRIO, .old_blkaddr = index, .new_blkaddr = index, .encrypted_page = NULL, + .is_meta = is_meta, }; + int err; if (unlikely(!is_meta)) fio.op_flags &= ~REQ_META; @@ -84,9 +86,10 @@ static struct page *__get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index, fio.page = page; - if (f2fs_submit_page_bio(&fio)) { + err = f2fs_submit_page_bio(&fio); + if (err) { f2fs_put_page(page, 1); - goto repeat; + return ERR_PTR(err); } lock_page(page); @@ -95,29 +98,46 @@ static struct page *__get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index, goto repeat; } - /* - * if there is any IO error when accessing device, make our filesystem - * readonly and make sure do not write checkpoint with non-uptodate - * meta page. - */ - if (unlikely(!PageUptodate(page))) - f2fs_stop_checkpoint(sbi, false); + if (unlikely(!PageUptodate(page))) { + f2fs_put_page(page, 1); + return ERR_PTR(-EIO); + } out: return page; } -struct page *get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index) +struct page *f2fs_get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index) { return __get_meta_page(sbi, index, true); } +struct page *f2fs_get_meta_page_nofail(struct f2fs_sb_info *sbi, pgoff_t index) +{ + struct page *page; + int count = 0; + +retry: + page = __get_meta_page(sbi, index, true); + if (IS_ERR(page)) { + if (PTR_ERR(page) == -EIO && + ++count <= DEFAULT_RETRY_IO_COUNT) + goto retry; + + f2fs_stop_checkpoint(sbi, false); + f2fs_bug_on(sbi, 1); + } + + return page; +} + /* for POR only */ -struct page *get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index) +struct page *f2fs_get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index) { return __get_meta_page(sbi, index, false); } -bool is_valid_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type) +bool f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi, + block_t blkaddr, int type) { switch (type) { case META_NAT: @@ -137,8 +157,20 @@ bool is_valid_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type) return false; break; case META_POR: + case DATA_GENERIC: if (unlikely(blkaddr >= MAX_BLKADDR(sbi) || - blkaddr < MAIN_BLKADDR(sbi))) + blkaddr < MAIN_BLKADDR(sbi))) { + if (type == DATA_GENERIC) { + f2fs_msg(sbi->sb, KERN_WARNING, + "access invalid blkaddr:%u", blkaddr); + WARN_ON(1); + } + return false; + } + break; + case META_GENERIC: + if (unlikely(blkaddr < SEG0_BLKADDR(sbi) || + blkaddr >= MAIN_BLKADDR(sbi))) return false; break; default: @@ -151,7 +183,7 @@ bool is_valid_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type) /* * Readahead CP/NAT/SIT/SSA pages */ -int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, +int f2fs_ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, int type, bool sync) { struct page *page; @@ -160,8 +192,10 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, .sbi = sbi, .type = META, .op = REQ_OP_READ, - .op_flags = sync ? (READ_SYNC | REQ_META | REQ_PRIO) : REQ_RAHEAD, + .op_flags = sync ? (REQ_META | REQ_PRIO) : REQ_RAHEAD, .encrypted_page = NULL, + .in_list = false, + .is_meta = (type != META_POR), }; struct blk_plug plug; @@ -171,7 +205,7 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, blk_start_plug(&plug); for (; nrpages-- > 0; blkno++) { - if (!is_valid_blkaddr(sbi, blkno, type)) + if (!f2fs_is_valid_blkaddr(sbi, blkno, type)) goto out; switch (type) { @@ -207,17 +241,15 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, } fio.page = page; - fio.old_blkaddr = fio.new_blkaddr; - f2fs_submit_page_mbio(&fio); + f2fs_submit_page_bio(&fio); f2fs_put_page(page, 0); } out: - f2fs_submit_merged_bio(sbi, META, READ); blk_finish_plug(&plug); return blkno - start; } -void ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index) +void f2fs_ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index) { struct page *page; bool readahead = false; @@ -228,33 +260,35 @@ void ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index) f2fs_put_page(page, 0); if (readahead) - ra_meta_pages(sbi, index, MAX_BIO_BLOCKS(sbi), META_POR, true); + f2fs_ra_meta_pages(sbi, index, BIO_MAX_PAGES, META_POR, true); } -static int f2fs_write_meta_page(struct page *page, - struct writeback_control *wbc) +static int __f2fs_write_meta_page(struct page *page, + struct writeback_control *wbc, + enum iostat_type io_type) { struct f2fs_sb_info *sbi = F2FS_P_SB(page); trace_f2fs_writepage(page, META); + if (unlikely(f2fs_cp_error(sbi))) + goto redirty_out; if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING))) goto redirty_out; if (wbc->for_reclaim && page->index < GET_SUM_BLOCK(sbi, 0)) goto redirty_out; - if (unlikely(f2fs_cp_error(sbi))) - goto redirty_out; - write_meta_page(sbi, page); + f2fs_do_write_meta_page(sbi, page, io_type); dec_page_count(sbi, F2FS_DIRTY_META); if (wbc->for_reclaim) - f2fs_submit_merged_bio_cond(sbi, NULL, page, 0, META, WRITE); + f2fs_submit_merged_write_cond(sbi, page->mapping->host, + 0, page->index, META); unlock_page(page); if (unlikely(f2fs_cp_error(sbi))) - f2fs_submit_merged_bio(sbi, META, WRITE); + f2fs_submit_merged_write(sbi, META); return 0; @@ -263,23 +297,33 @@ static int f2fs_write_meta_page(struct page *page, return AOP_WRITEPAGE_ACTIVATE; } +static int f2fs_write_meta_page(struct page *page, + struct writeback_control *wbc) +{ + return __f2fs_write_meta_page(page, wbc, FS_META_IO); +} + static int f2fs_write_meta_pages(struct address_space *mapping, struct writeback_control *wbc) { struct f2fs_sb_info *sbi = F2FS_M_SB(mapping); long diff, written; + if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING))) + goto skip_write; + /* collect a number of dirty meta pages and write together */ if (wbc->for_kupdate || get_pages(sbi, F2FS_DIRTY_META) < nr_pages_to_skip(sbi, META)) goto skip_write; - trace_f2fs_writepages(mapping->host, wbc, META); + /* if locked failed, cp will flush dirty pages instead */ + if (!mutex_trylock(&sbi->cp_mutex)) + goto skip_write; - /* if mounting is failed, skip writing node pages */ - mutex_lock(&sbi->cp_mutex); + trace_f2fs_writepages(mapping->host, wbc, META); diff = nr_pages_to_write(sbi, META, wbc); - written = sync_meta_pages(sbi, META, wbc->nr_to_write); + written = f2fs_sync_meta_pages(sbi, META, wbc->nr_to_write, FS_META_IO); mutex_unlock(&sbi->cp_mutex); wbc->nr_to_write = max((long)0, wbc->nr_to_write - written - diff); return 0; @@ -290,13 +334,14 @@ static int f2fs_write_meta_pages(struct address_space *mapping, return 0; } -long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type, - long nr_to_write) +long f2fs_sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type, + long nr_to_write, enum iostat_type io_type) { struct address_space *mapping = META_MAPPING(sbi); - pgoff_t index = 0, end = ULONG_MAX, prev = ULONG_MAX; + pgoff_t index = 0, prev = ULONG_MAX; struct pagevec pvec; long nwritten = 0; + int nr_pages; struct writeback_control wbc = { .for_reclaim = 0, }; @@ -306,13 +351,9 @@ long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type, blk_start_plug(&plug); - while (index <= end) { - int i, nr_pages; - nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, - PAGECACHE_TAG_DIRTY, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1); - if (unlikely(nr_pages == 0)) - break; + while ((nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, + PAGECACHE_TAG_DIRTY))) { + int i; for (i = 0; i < nr_pages; i++) { struct page *page = pvec.pages[i]; @@ -342,7 +383,7 @@ long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type, if (!clear_page_dirty_for_io(page)) goto continue_unlock; - if (mapping->a_ops->writepage(page, &wbc)) { + if (__f2fs_write_meta_page(page, &wbc, io_type)) { unlock_page(page); break; } @@ -356,7 +397,7 @@ long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type, } stop: if (nwritten) - f2fs_submit_merged_bio(sbi, type, WRITE); + f2fs_submit_merged_write(sbi, type); blk_finish_plug(&plug); @@ -370,7 +411,7 @@ static int f2fs_set_meta_page_dirty(struct page *page) if (!PageUptodate(page)) SetPageUptodate(page); if (!PageDirty(page)) { - f2fs_set_page_dirty_nobuffers(page); + __set_page_dirty_nobuffers(page); inc_page_count(F2FS_P_SB(page), F2FS_DIRTY_META); SetPagePrivate(page); f2fs_trace_pid(page); @@ -390,24 +431,23 @@ const struct address_space_operations f2fs_meta_aops = { #endif }; -static void __add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) +static void __add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, + unsigned int devidx, int type) { struct inode_management *im = &sbi->im[type]; struct ino_entry *e, *tmp; tmp = f2fs_kmem_cache_alloc(ino_entry_slab, GFP_NOFS); -retry: + radix_tree_preload(GFP_NOFS | __GFP_NOFAIL); spin_lock(&im->ino_lock); e = radix_tree_lookup(&im->ino_root, ino); if (!e) { e = tmp; - if (radix_tree_insert(&im->ino_root, ino, e)) { - spin_unlock(&im->ino_lock); - radix_tree_preload_end(); - goto retry; - } + if (unlikely(radix_tree_insert(&im->ino_root, ino, e))) + f2fs_bug_on(sbi, 1); + memset(e, 0, sizeof(struct ino_entry)); e->ino = ino; @@ -415,6 +455,10 @@ static void __add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) if (type != ORPHAN_INO) im->ino_num++; } + + if (type == FLUSH_INO) + f2fs_set_bit(devidx, (char *)&e->dirty_device); + spin_unlock(&im->ino_lock); radix_tree_preload_end(); @@ -440,20 +484,20 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) spin_unlock(&im->ino_lock); } -void add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) +void f2fs_add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) { /* add new dirty ino entry into list */ - __add_ino_entry(sbi, ino, type); + __add_ino_entry(sbi, ino, 0, type); } -void remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) +void f2fs_remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) { /* remove dirty ino entry from list */ __remove_ino_entry(sbi, ino, type); } /* mode should be APPEND_INO or UPDATE_INO */ -bool exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode) +bool f2fs_exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode) { struct inode_management *im = &sbi->im[mode]; struct ino_entry *e; @@ -464,12 +508,12 @@ bool exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode) return e ? true : false; } -void release_ino_entry(struct f2fs_sb_info *sbi, bool all) +void f2fs_release_ino_entry(struct f2fs_sb_info *sbi, bool all) { struct ino_entry *e, *tmp; int i; - for (i = all ? ORPHAN_INO: APPEND_INO; i <= UPDATE_INO; i++) { + for (i = all ? ORPHAN_INO : APPEND_INO; i < MAX_INO_ENTRY; i++) { struct inode_management *im = &sbi->im[i]; spin_lock(&im->ino_lock); @@ -483,19 +527,40 @@ void release_ino_entry(struct f2fs_sb_info *sbi, bool all) } } -int acquire_orphan_inode(struct f2fs_sb_info *sbi) +void f2fs_set_dirty_device(struct f2fs_sb_info *sbi, nid_t ino, + unsigned int devidx, int type) +{ + __add_ino_entry(sbi, ino, devidx, type); +} + +bool f2fs_is_dirty_device(struct f2fs_sb_info *sbi, nid_t ino, + unsigned int devidx, int type) +{ + struct inode_management *im = &sbi->im[type]; + struct ino_entry *e; + bool is_dirty = false; + + spin_lock(&im->ino_lock); + e = radix_tree_lookup(&im->ino_root, ino); + if (e && f2fs_test_bit(devidx, (char *)&e->dirty_device)) + is_dirty = true; + spin_unlock(&im->ino_lock); + return is_dirty; +} + +int f2fs_acquire_orphan_inode(struct f2fs_sb_info *sbi) { struct inode_management *im = &sbi->im[ORPHAN_INO]; int err = 0; spin_lock(&im->ino_lock); -#ifdef CONFIG_F2FS_FAULT_INJECTION if (time_to_inject(sbi, FAULT_ORPHAN)) { spin_unlock(&im->ino_lock); + f2fs_show_injection_info(FAULT_ORPHAN); return -ENOSPC; } -#endif + if (unlikely(im->ino_num >= sbi->max_orphans)) err = -ENOSPC; else @@ -505,7 +570,7 @@ int acquire_orphan_inode(struct f2fs_sb_info *sbi) return err; } -void release_orphan_inode(struct f2fs_sb_info *sbi) +void f2fs_release_orphan_inode(struct f2fs_sb_info *sbi) { struct inode_management *im = &sbi->im[ORPHAN_INO]; @@ -515,14 +580,14 @@ void release_orphan_inode(struct f2fs_sb_info *sbi) spin_unlock(&im->ino_lock); } -void add_orphan_inode(struct inode *inode) +void f2fs_add_orphan_inode(struct inode *inode) { /* add new orphan ino entry into list */ - __add_ino_entry(F2FS_I_SB(inode), inode->i_ino, ORPHAN_INO); - update_inode_page(inode); + __add_ino_entry(F2FS_I_SB(inode), inode->i_ino, 0, ORPHAN_INO); + f2fs_update_inode_page(inode); } -void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) +void f2fs_remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) { /* remove orphan entry from orphan list */ __remove_ino_entry(sbi, ino, ORPHAN_INO); @@ -532,17 +597,7 @@ static int recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) { struct inode *inode; struct node_info ni; - int err = acquire_orphan_inode(sbi); - - if (err) { - set_sbi_flag(sbi, SBI_NEED_FSCK); - f2fs_msg(sbi->sb, KERN_WARNING, - "%s: orphan failed (ino=%x), run fsck to fix.", - __func__, ino); - return err; - } - - __add_ino_entry(sbi, ino, ORPHAN_INO); + int err; inode = f2fs_iget_retry(sbi->sb, ino); if (IS_ERR(inode)) { @@ -554,56 +609,101 @@ static int recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) return PTR_ERR(inode); } + err = dquot_initialize(inode); + if (err) { + iput(inode); + goto err_out; + } + clear_nlink(inode); /* truncate all the data during iput */ iput(inode); - get_node_info(sbi, ino, &ni); + err = f2fs_get_node_info(sbi, ino, &ni); + if (err) + goto err_out; /* ENOMEM was fully retried in f2fs_evict_inode. */ if (ni.blk_addr != NULL_ADDR) { - set_sbi_flag(sbi, SBI_NEED_FSCK); - f2fs_msg(sbi->sb, KERN_WARNING, - "%s: orphan failed (ino=%x), run fsck to fix.", - __func__, ino); - return -EIO; + err = -EIO; + goto err_out; } - __remove_ino_entry(sbi, ino, ORPHAN_INO); return 0; + +err_out: + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: orphan failed (ino=%x), run fsck to fix.", + __func__, ino); + return err; } -int recover_orphan_inodes(struct f2fs_sb_info *sbi) +int f2fs_recover_orphan_inodes(struct f2fs_sb_info *sbi) { block_t start_blk, orphan_blocks, i, j; - int err; + unsigned int s_flags = sbi->sb->s_flags; + int err = 0; +#ifdef CONFIG_QUOTA + int quota_enabled; +#endif if (!is_set_ckpt_flags(sbi, CP_ORPHAN_PRESENT_FLAG)) return 0; + if (s_flags & MS_RDONLY) { + f2fs_msg(sbi->sb, KERN_INFO, "orphan cleanup on readonly fs"); + sbi->sb->s_flags &= ~MS_RDONLY; + } + +#ifdef CONFIG_QUOTA + /* Needed for iput() to work correctly and not trash data */ + sbi->sb->s_flags |= MS_ACTIVE; + + /* + * Turn on quotas which were not enabled for read-only mounts if + * filesystem has quota feature, so that they are updated correctly. + */ + quota_enabled = f2fs_enable_quota_files(sbi, s_flags & MS_RDONLY); +#endif + start_blk = __start_cp_addr(sbi) + 1 + __cp_payload(sbi); orphan_blocks = __start_sum_addr(sbi) - 1 - __cp_payload(sbi); - ra_meta_pages(sbi, start_blk, orphan_blocks, META_CP, true); + f2fs_ra_meta_pages(sbi, start_blk, orphan_blocks, META_CP, true); for (i = 0; i < orphan_blocks; i++) { - struct page *page = get_meta_page(sbi, start_blk + i); + struct page *page; struct f2fs_orphan_block *orphan_blk; + page = f2fs_get_meta_page(sbi, start_blk + i); + if (IS_ERR(page)) { + err = PTR_ERR(page); + goto out; + } + orphan_blk = (struct f2fs_orphan_block *)page_address(page); for (j = 0; j < le32_to_cpu(orphan_blk->entry_count); j++) { nid_t ino = le32_to_cpu(orphan_blk->ino[j]); err = recover_orphan_inode(sbi, ino); if (err) { f2fs_put_page(page, 1); - return err; + goto out; } } f2fs_put_page(page, 1); } /* clear Orphan Flag */ clear_ckpt_flags(sbi, CP_ORPHAN_PRESENT_FLAG); - return 0; +out: +#ifdef CONFIG_QUOTA + /* Turn quotas off */ + if (quota_enabled) + f2fs_quota_off_umount(sbi->sb); +#endif + sbi->sb->s_flags = s_flags; /* Restore MS_RDONLY status */ + + return err; } static void write_orphan_inodes(struct f2fs_sb_info *sbi, block_t start_blk) @@ -629,7 +729,7 @@ static void write_orphan_inodes(struct f2fs_sb_info *sbi, block_t start_blk) /* loop for each orphan inode entry and write them in Jornal block */ list_for_each_entry(orphan, head, list) { if (!page) { - page = grab_meta_page(sbi, start_blk++); + page = f2fs_grab_meta_page(sbi, start_blk++); orphan_blk = (struct f2fs_orphan_block *)page_address(page); memset(orphan_blk, 0, sizeof(*orphan_blk)); @@ -671,19 +771,21 @@ static int get_checkpoint_version(struct f2fs_sb_info *sbi, block_t cp_addr, size_t crc_offset = 0; __u32 crc = 0; - *cp_page = get_meta_page(sbi, cp_addr); + *cp_page = f2fs_get_meta_page(sbi, cp_addr); + if (IS_ERR(*cp_page)) + return PTR_ERR(*cp_page); + *cp_block = (struct f2fs_checkpoint *)page_address(*cp_page); crc_offset = le32_to_cpu((*cp_block)->checksum_offset); - if (crc_offset >= blk_size) { + if (crc_offset > (blk_size - sizeof(__le32))) { f2fs_put_page(*cp_page, 1); f2fs_msg(sbi->sb, KERN_WARNING, "invalid crc_offset: %zu", crc_offset); return -EINVAL; } - crc = le32_to_cpu(*((__le32 *)((unsigned char *)*cp_block - + crc_offset))); + crc = cur_cp_crc(*cp_block); if (!f2fs_crc_valid(sbi, crc, *cp_block, crc_offset)) { f2fs_put_page(*cp_page, 1); f2fs_msg(sbi->sb, KERN_WARNING, "invalid crc value"); @@ -706,6 +808,14 @@ static struct page *validate_checkpoint(struct f2fs_sb_info *sbi, &cp_page_1, version); if (err) return NULL; + + if (le32_to_cpu(cp_block->cp_pack_total_block_count) > + sbi->blocks_per_seg) { + f2fs_msg(sbi->sb, KERN_WARNING, + "invalid cp_pack_total_block_count:%u", + le32_to_cpu(cp_block->cp_pack_total_block_count)); + goto invalid_cp; + } pre_version = *version; cp_addr += le32_to_cpu(cp_block->cp_pack_total_block_count) - 1; @@ -726,7 +836,7 @@ static struct page *validate_checkpoint(struct f2fs_sb_info *sbi, return NULL; } -int get_valid_checkpoint(struct f2fs_sb_info *sbi) +int f2fs_get_valid_checkpoint(struct f2fs_sb_info *sbi) { struct f2fs_checkpoint *cp_block; struct f2fs_super_block *fsb = sbi->raw_super; @@ -738,7 +848,8 @@ int get_valid_checkpoint(struct f2fs_sb_info *sbi) block_t cp_blk_no; int i; - sbi->ckpt = kzalloc(cp_blks * blk_size, GFP_KERNEL); + sbi->ckpt = f2fs_kzalloc(sbi, array_size(blk_size, cp_blks), + GFP_KERNEL); if (!sbi->ckpt) return -ENOMEM; /* @@ -769,15 +880,15 @@ int get_valid_checkpoint(struct f2fs_sb_info *sbi) cp_block = (struct f2fs_checkpoint *)page_address(cur_page); memcpy(sbi->ckpt, cp_block, blk_size); - /* Sanity checking of checkpoint */ - if (sanity_check_ckpt(sbi)) - goto fail_no_cp; - if (cur_page == cp1) sbi->cur_cp_pack = 1; else sbi->cur_cp_pack = 2; + /* Sanity checking of checkpoint */ + if (f2fs_sanity_check_ckpt(sbi)) + goto free_fail_no_cp; + if (cp_blks <= 1) goto done; @@ -789,7 +900,9 @@ int get_valid_checkpoint(struct f2fs_sb_info *sbi) void *sit_bitmap_ptr; unsigned char *ckpt = (unsigned char *)sbi->ckpt; - cur_page = get_meta_page(sbi, cp_blk_no + i); + cur_page = f2fs_get_meta_page(sbi, cp_blk_no + i); + if (IS_ERR(cur_page)) + goto free_fail_no_cp; sit_bitmap_ptr = page_address(cur_page); memcpy(ckpt + i * blk_size, sit_bitmap_ptr, blk_size); f2fs_put_page(cur_page, 1); @@ -799,6 +912,9 @@ int get_valid_checkpoint(struct f2fs_sb_info *sbi) f2fs_put_page(cp2, 1); return 0; +free_fail_no_cp: + f2fs_put_page(cp1, 1); + f2fs_put_page(cp2, 1); fail_no_cp: kfree(sbi->ckpt); return -EINVAL; @@ -813,7 +929,9 @@ static void __add_dirty_inode(struct inode *inode, enum inode_type type) return; set_inode_flag(inode, flag); - list_add_tail(&F2FS_I(inode)->dirty_list, &sbi->inode_list[type]); + if (!f2fs_is_volatile_file(inode)) + list_add_tail(&F2FS_I(inode)->dirty_list, + &sbi->inode_list[type]); stat_inc_dirty_inode(sbi, type); } @@ -829,7 +947,7 @@ static void __remove_dirty_inode(struct inode *inode, enum inode_type type) stat_dec_dirty_inode(F2FS_I_SB(inode), type); } -void update_dirty_page(struct inode *inode, struct page *page) +void f2fs_update_dirty_page(struct inode *inode, struct page *page) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); enum inode_type type = S_ISDIR(inode->i_mode) ? DIR_INODE : FILE_INODE; @@ -848,7 +966,7 @@ void update_dirty_page(struct inode *inode, struct page *page) f2fs_trace_pid(page); } -void remove_dirty_inode(struct inode *inode) +void f2fs_remove_dirty_inode(struct inode *inode) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); enum inode_type type = S_ISDIR(inode->i_mode) ? DIR_INODE : FILE_INODE; @@ -865,12 +983,13 @@ void remove_dirty_inode(struct inode *inode) spin_unlock(&sbi->inode_lock[type]); } -int sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type) +int f2fs_sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type) { struct list_head *head; struct inode *inode; struct f2fs_inode_info *fi; bool is_dir = (type == DIR_INODE); + unsigned long ino = 0; trace_f2fs_sync_dirty_inodes_enter(sbi->sb, is_dir, get_pages(sbi, is_dir ? @@ -889,18 +1008,32 @@ int sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type) F2FS_DIRTY_DENTS : F2FS_DIRTY_DATA)); return 0; } - fi = list_entry(head->next, struct f2fs_inode_info, dirty_list); + fi = list_first_entry(head, struct f2fs_inode_info, dirty_list); inode = igrab(&fi->vfs_inode); spin_unlock(&sbi->inode_lock[type]); if (inode) { + unsigned long cur_ino = inode->i_ino; + + if (is_dir) + F2FS_I(inode)->cp_task = current; + filemap_fdatawrite(inode->i_mapping); + + if (is_dir) + F2FS_I(inode)->cp_task = NULL; + iput(inode); + /* We need to give cpu to another writers. */ + if (ino == cur_ino) + cond_resched(); + else + ino = cur_ino; } else { /* * We should submit bio, since it exists several * wribacking dentry pages in the freeing inode. */ - f2fs_submit_merged_bio(sbi, DATA, WRITE); + f2fs_submit_merged_write(sbi, DATA); cond_resched(); } goto retry; @@ -922,18 +1055,35 @@ int f2fs_sync_inode_meta(struct f2fs_sb_info *sbi) spin_unlock(&sbi->inode_lock[DIRTY_META]); return 0; } - fi = list_entry(head->next, struct f2fs_inode_info, + fi = list_first_entry(head, struct f2fs_inode_info, gdirty_list); inode = igrab(&fi->vfs_inode); spin_unlock(&sbi->inode_lock[DIRTY_META]); if (inode) { - update_inode_page(inode); + sync_inode_metadata(inode, 0); + + /* it's on eviction */ + if (is_inode_flag_set(inode, FI_DIRTY_INODE)) + f2fs_update_inode_page(inode); iput(inode); } - }; + } return 0; } +static void __prepare_cp_block(struct f2fs_sb_info *sbi) +{ + struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi); + struct f2fs_nm_info *nm_i = NM_I(sbi); + nid_t last_nid = nm_i->next_scan_nid; + + next_free_nid(sbi, &last_nid); + ckpt->valid_block_count = cpu_to_le64(valid_user_blocks(sbi)); + ckpt->valid_node_count = cpu_to_le32(valid_node_count(sbi)); + ckpt->valid_inode_count = cpu_to_le32(valid_inode_count(sbi)); + ckpt->next_free_nid = cpu_to_le32(last_nid); +} + /* * Freeze all the FS-operations for checkpoint. */ @@ -954,36 +1104,52 @@ static int block_operations(struct f2fs_sb_info *sbi) /* write all the dirty dentry pages */ if (get_pages(sbi, F2FS_DIRTY_DENTS)) { f2fs_unlock_all(sbi); - err = sync_dirty_inodes(sbi, DIR_INODE); + err = f2fs_sync_dirty_inodes(sbi, DIR_INODE); if (err) goto out; - goto retry_flush_dents; - } - - if (get_pages(sbi, F2FS_DIRTY_IMETA)) { - f2fs_unlock_all(sbi); - err = f2fs_sync_inode_meta(sbi); - if (err) - goto out; + cond_resched(); goto retry_flush_dents; } /* * POR: we should ensure that there are no dirty node pages - * until finishing nat/sit flush. + * until finishing nat/sit flush. inode->i_blocks can be updated. */ + down_write(&sbi->node_change); + + if (get_pages(sbi, F2FS_DIRTY_IMETA)) { + up_write(&sbi->node_change); + f2fs_unlock_all(sbi); + err = f2fs_sync_inode_meta(sbi); + if (err) + goto out; + cond_resched(); + goto retry_flush_dents; + } + retry_flush_nodes: down_write(&sbi->node_write); if (get_pages(sbi, F2FS_DIRTY_NODES)) { up_write(&sbi->node_write); - err = sync_node_pages(sbi, &wbc); + atomic_inc(&sbi->wb_sync_req[NODE]); + err = f2fs_sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO); + atomic_dec(&sbi->wb_sync_req[NODE]); if (err) { + up_write(&sbi->node_change); f2fs_unlock_all(sbi); goto out; } + cond_resched(); goto retry_flush_nodes; } + + /* + * sbi->node_change is used only for AIO write_begin path which produces + * dirty node blocks and some checkpoint values by block allocation. + */ + __prepare_cp_block(sbi); + up_write(&sbi->node_change); out: blk_finish_plug(&plug); return err; @@ -992,19 +1158,20 @@ static int block_operations(struct f2fs_sb_info *sbi) static void unblock_operations(struct f2fs_sb_info *sbi) { up_write(&sbi->node_write); - - build_free_nids(sbi); f2fs_unlock_all(sbi); } -static void wait_on_all_pages_writeback(struct f2fs_sb_info *sbi) +void f2fs_wait_on_all_pages_writeback(struct f2fs_sb_info *sbi) { DEFINE_WAIT(wait); for (;;) { prepare_to_wait(&sbi->cp_wait, &wait, TASK_UNINTERRUPTIBLE); - if (!atomic_read(&sbi->nr_wb_bios)) + if (!get_pages(sbi, F2FS_WB_CP_DATA)) + break; + + if (unlikely(f2fs_cp_error(sbi))) break; io_schedule_timeout(5*HZ); @@ -1016,15 +1183,26 @@ static void update_ckpt_flags(struct f2fs_sb_info *sbi, struct cp_control *cpc) { unsigned long orphan_num = sbi->im[ORPHAN_INO].ino_num; struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi); + unsigned long flags; - spin_lock(&sbi->cp_lock); + spin_lock_irqsave(&sbi->cp_lock, flags); - if (cpc->reason == CP_UMOUNT) + if ((cpc->reason & CP_UMOUNT) && + le32_to_cpu(ckpt->cp_pack_total_block_count) > + sbi->blocks_per_seg - NM_I(sbi)->nat_bits_blocks) + disable_nat_bits(sbi, false); + + if (cpc->reason & CP_TRIMMED) + __set_ckpt_flags(ckpt, CP_TRIMMED_FLAG); + else + __clear_ckpt_flags(ckpt, CP_TRIMMED_FLAG); + + if (cpc->reason & CP_UMOUNT) __set_ckpt_flags(ckpt, CP_UMOUNT_FLAG); else __clear_ckpt_flags(ckpt, CP_UMOUNT_FLAG); - if (cpc->reason == CP_FASTBOOT) + if (cpc->reason & CP_FASTBOOT) __set_ckpt_flags(ckpt, CP_FASTBOOT_FLAG); else __clear_ckpt_flags(ckpt, CP_FASTBOOT_FLAG); @@ -1039,16 +1217,53 @@ static void update_ckpt_flags(struct f2fs_sb_info *sbi, struct cp_control *cpc) /* set this flag to activate crc|cp_ver for recovery */ __set_ckpt_flags(ckpt, CP_CRC_RECOVERY_FLAG); + __clear_ckpt_flags(ckpt, CP_NOCRC_RECOVERY_FLAG); - spin_unlock(&sbi->cp_lock); + spin_unlock_irqrestore(&sbi->cp_lock, flags); +} + +static void commit_checkpoint(struct f2fs_sb_info *sbi, + void *src, block_t blk_addr) +{ + struct writeback_control wbc = { + .for_reclaim = 0, + }; + + /* + * pagevec_lookup_tag and lock_page again will take + * some extra time. Therefore, f2fs_update_meta_pages and + * f2fs_sync_meta_pages are combined in this function. + */ + struct page *page = f2fs_grab_meta_page(sbi, blk_addr); + int err; + + memcpy(page_address(page), src, PAGE_SIZE); + set_page_dirty(page); + + f2fs_wait_on_page_writeback(page, META, true); + f2fs_bug_on(sbi, PageWriteback(page)); + if (unlikely(!clear_page_dirty_for_io(page))) + f2fs_bug_on(sbi, 1); + + /* writeout cp pack 2 page */ + err = __f2fs_write_meta_page(page, &wbc, FS_CP_META_IO); + if (unlikely(err && f2fs_cp_error(sbi))) { + f2fs_put_page(page, 1); + return; + } + + f2fs_bug_on(sbi, err); + f2fs_put_page(page, 0); + + /* submit checkpoint (with barrier if NOBARRIER is not set) */ + f2fs_submit_merged_write(sbi, META_FLUSH); } static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) { struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi); struct f2fs_nm_info *nm_i = NM_I(sbi); - unsigned long orphan_num = sbi->im[ORPHAN_INO].ino_num; - nid_t last_nid = nm_i->next_scan_nid; + unsigned long orphan_num = sbi->im[ORPHAN_INO].ino_num, flags; block_t start_blk; unsigned int data_sum_blocks, orphan_blocks; __u32 crc32 = 0; @@ -1057,22 +1272,20 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) struct super_block *sb = sbi->sb; struct curseg_info *seg_i = CURSEG_I(sbi, CURSEG_HOT_NODE); u64 kbytes_written; + int err; /* Flush all the NAT/SIT pages */ while (get_pages(sbi, F2FS_DIRTY_META)) { - sync_meta_pages(sbi, META, LONG_MAX); + f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO); if (unlikely(f2fs_cp_error(sbi))) - return -EIO; + break; } - next_free_nid(sbi, &last_nid); - /* * modify checkpoint * version number is already updated */ - ckpt->elapsed_time = cpu_to_le64(get_mtime(sbi)); - ckpt->valid_block_count = cpu_to_le64(valid_user_blocks(sbi)); + ckpt->elapsed_time = cpu_to_le64(get_mtime(sbi, true)); ckpt->free_segment_count = cpu_to_le32(free_segments(sbi)); for (i = 0; i < NR_CURSEG_NODE_TYPE; i++) { ckpt->cur_node_segno[i] = @@ -1091,18 +1304,14 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) curseg_alloc_type(sbi, i + CURSEG_HOT_DATA); } - ckpt->valid_node_count = cpu_to_le32(valid_node_count(sbi)); - ckpt->valid_inode_count = cpu_to_le32(valid_inode_count(sbi)); - ckpt->next_free_nid = cpu_to_le32(last_nid); - /* 2 cp + n data seg summary + orphan inode blocks */ - data_sum_blocks = npages_for_summary_flush(sbi, false); - spin_lock(&sbi->cp_lock); + data_sum_blocks = f2fs_npages_for_summary_flush(sbi, false); + spin_lock_irqsave(&sbi->cp_lock, flags); if (data_sum_blocks < NR_CURSEG_DATA_TYPE) __set_ckpt_flags(ckpt, CP_COMPACT_SUM_FLAG); else __clear_ckpt_flags(ckpt, CP_COMPACT_SUM_FLAG); - spin_unlock(&sbi->cp_lock); + spin_unlock_irqrestore(&sbi->cp_lock, flags); orphan_blocks = GET_ORPHAN_BLOCKS(orphan_num); ckpt->cp_pack_start_sum = cpu_to_le32(1 + cp_payload_blks + @@ -1131,16 +1340,33 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) start_blk = __start_cp_next_addr(sbi); - /* need to wait for end_io results */ - wait_on_all_pages_writeback(sbi); - if (unlikely(f2fs_cp_error(sbi))) - return -EIO; + /* write nat bits */ + if (enabled_nat_bits(sbi, cpc)) { + __u64 cp_ver = cur_cp_version(ckpt); + block_t blk; + + cp_ver |= ((__u64)crc32 << 32); + *(__le64 *)nm_i->nat_bits = cpu_to_le64(cp_ver); + + blk = start_blk + sbi->blocks_per_seg - nm_i->nat_bits_blocks; + for (i = 0; i < nm_i->nat_bits_blocks; i++) + f2fs_update_meta_page(sbi, nm_i->nat_bits + + (i << F2FS_BLKSIZE_BITS), blk + i); + + /* Flush all the NAT BITS pages */ + while (get_pages(sbi, F2FS_DIRTY_META)) { + f2fs_sync_meta_pages(sbi, META, LONG_MAX, + FS_CP_META_IO); + if (unlikely(f2fs_cp_error(sbi))) + break; + } + } /* write out checkpoint buffer at block 0 */ - update_meta_page(sbi, ckpt, start_blk++); + f2fs_update_meta_page(sbi, ckpt, start_blk++); for (i = 1; i < 1 + cp_payload_blks; i++) - update_meta_page(sbi, (char *)ckpt + i * F2FS_BLKSIZE, + f2fs_update_meta_page(sbi, (char *)ckpt + i * F2FS_BLKSIZE, start_blk++); if (orphan_num) { @@ -1148,7 +1374,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) start_blk += orphan_blocks; } - write_data_summaries(sbi, start_blk); + f2fs_write_data_summaries(sbi, start_blk); start_blk += data_sum_blocks; /* Record write statistics in the hot node summary */ @@ -1159,38 +1385,41 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) seg_i->journal->info.kbytes_written = cpu_to_le64(kbytes_written); if (__remain_node_summaries(cpc->reason)) { - write_node_summaries(sbi, start_blk); + f2fs_write_node_summaries(sbi, start_blk); start_blk += NR_CURSEG_NODE_TYPE; } - /* writeout checkpoint block */ - update_meta_page(sbi, ckpt, start_blk); - - /* wait for previous submitted node/meta pages writeback */ - wait_on_all_pages_writeback(sbi); - - if (unlikely(f2fs_cp_error(sbi))) - return -EIO; - - filemap_fdatawait_range(NODE_MAPPING(sbi), 0, LLONG_MAX); - filemap_fdatawait_range(META_MAPPING(sbi), 0, LLONG_MAX); - /* update user_block_counts */ sbi->last_valid_block_count = sbi->total_valid_block_count; percpu_counter_set(&sbi->alloc_valid_block_count, 0); - /* Here, we only have one bio having CP pack */ - sync_meta_pages(sbi, META_FLUSH, LONG_MAX); + /* Here, we have one bio having CP pack except cp pack 2 page */ + f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO); /* wait for previous submitted meta pages writeback */ - wait_on_all_pages_writeback(sbi); + f2fs_wait_on_all_pages_writeback(sbi); - release_ino_entry(sbi, false); + /* flush all device cache */ + err = f2fs_flush_device_cache(sbi); + if (err) + return err; - if (unlikely(f2fs_cp_error(sbi))) - return -EIO; + /* barrier and flush checkpoint cp pack 2 page if it can */ + commit_checkpoint(sbi, ckpt, start_blk); + f2fs_wait_on_all_pages_writeback(sbi); - clear_prefree_segments(sbi, cpc); + /* + * invalidate intermediate page cache borrowed from meta inode + * which are used for migration of encrypted inode's blocks. + */ + if (f2fs_sb_has_encrypt(sbi->sb)) + invalidate_mapping_pages(META_MAPPING(sbi), + MAIN_BLKADDR(sbi), MAX_BLKADDR(sbi) - 1); + + f2fs_release_ino_entry(sbi, false); + + f2fs_reset_fsync_node_info(sbi); + clear_sbi_flag(sbi, SBI_IS_DIRTY); clear_sbi_flag(sbi, SBI_NEED_CP); __set_cp_next_pack(sbi); @@ -1205,13 +1434,13 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_DENTS)); - return 0; + return unlikely(f2fs_cp_error(sbi)) ? -EIO : 0; } /* * We guarantee that this checkpoint procedure will not fail. */ -int write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) +int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) { struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi); unsigned long long ckpt_ver; @@ -1220,8 +1449,8 @@ int write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) mutex_lock(&sbi->cp_mutex); if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) && - (cpc->reason == CP_FASTBOOT || cpc->reason == CP_SYNC || - (cpc->reason == CP_DISCARD && !sbi->discard_blks))) + ((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) || + ((cpc->reason & CP_DISCARD) && !sbi->discard_blks))) goto out; if (unlikely(f2fs_cp_error(sbi))) { err = -EIO; @@ -1240,18 +1469,23 @@ int write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish block_ops"); - f2fs_flush_merged_bios(sbi); + f2fs_flush_merged_writes(sbi); /* this is the case of multiple fstrims without any changes */ - if (cpc->reason == CP_DISCARD && !is_sbi_flag_set(sbi, SBI_IS_DIRTY)) { - f2fs_bug_on(sbi, NM_I(sbi)->dirty_nat_cnt); - f2fs_bug_on(sbi, SIT_I(sbi)->dirty_sentries); - f2fs_bug_on(sbi, prefree_segments(sbi)); - flush_sit_entries(sbi, cpc); - clear_prefree_segments(sbi, cpc); - f2fs_wait_all_discard_bio(sbi); - unblock_operations(sbi); - goto out; + if (cpc->reason & CP_DISCARD) { + if (!f2fs_exist_trim_candidates(sbi, cpc)) { + unblock_operations(sbi); + goto out; + } + + if (NM_I(sbi)->dirty_nat_cnt == 0 && + SIT_I(sbi)->dirty_sentries == 0 && + prefree_segments(sbi) == 0) { + f2fs_flush_sit_entries(sbi, cpc); + f2fs_clear_prefree_segments(sbi, cpc); + unblock_operations(sbi); + goto out; + } } /* @@ -1263,18 +1497,20 @@ int write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) ckpt->checkpoint_ver = cpu_to_le64(++ckpt_ver); /* write cached NAT/SIT entries to NAT/SIT area */ - flush_nat_entries(sbi); - flush_sit_entries(sbi, cpc); + f2fs_flush_nat_entries(sbi, cpc); + f2fs_flush_sit_entries(sbi, cpc); /* unlock all the fs_lock[] in do_checkpoint() */ err = do_checkpoint(sbi, cpc); - - f2fs_wait_all_discard_bio(sbi); + if (err) + f2fs_release_discard_addrs(sbi); + else + f2fs_clear_prefree_segments(sbi, cpc); unblock_operations(sbi); stat_inc_cp_count(sbi->stat_info); - if (cpc->reason == CP_RECOVERY) + if (cpc->reason & CP_RECOVERY) f2fs_msg(sbi->sb, KERN_NOTICE, "checkpoint: version = %llx", ckpt_ver); @@ -1286,7 +1522,7 @@ int write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) return err; } -void init_ino_entry_info(struct f2fs_sb_info *sbi) +void f2fs_init_ino_entry_info(struct f2fs_sb_info *sbi) { int i; @@ -1304,23 +1540,23 @@ void init_ino_entry_info(struct f2fs_sb_info *sbi) F2FS_ORPHANS_PER_BLOCK; } -int __init create_checkpoint_caches(void) +int __init f2fs_create_checkpoint_caches(void) { ino_entry_slab = f2fs_kmem_cache_create("f2fs_ino_entry", sizeof(struct ino_entry)); if (!ino_entry_slab) return -ENOMEM; - inode_entry_slab = f2fs_kmem_cache_create("f2fs_inode_entry", + f2fs_inode_entry_slab = f2fs_kmem_cache_create("f2fs_inode_entry", sizeof(struct inode_entry)); - if (!inode_entry_slab) { + if (!f2fs_inode_entry_slab) { kmem_cache_destroy(ino_entry_slab); return -ENOMEM; } return 0; } -void destroy_checkpoint_caches(void) +void f2fs_destroy_checkpoint_caches(void) { kmem_cache_destroy(ino_entry_slab); - kmem_cache_destroy(inode_entry_slab); + kmem_cache_destroy(f2fs_inode_entry_slab); }
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index ae354ac..51ed82a 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c
@@ -19,8 +19,6 @@ #include <linux/bio.h> #include <linux/prefetch.h> #include <linux/uio.h> -#include <linux/mm.h> -#include <linux/memcontrol.h> #include <linux/cleancache.h> #include "f2fs.h" @@ -28,41 +26,122 @@ #include "segment.h" #include "trace.h" #include <trace/events/f2fs.h> +#include <trace/events/android_fs.h> -static void f2fs_read_end_io(struct bio *bio) +#define NUM_PREALLOC_POST_READ_CTXS 128 + +static struct kmem_cache *bio_post_read_ctx_cache; +static mempool_t *bio_post_read_ctx_pool; + +static bool __is_cp_guaranteed(struct page *page) { - struct bio_vec *bvec; + struct address_space *mapping = page->mapping; + struct inode *inode; + struct f2fs_sb_info *sbi; + + if (!mapping) + return false; + + inode = mapping->host; + sbi = F2FS_I_SB(inode); + + if (inode->i_ino == F2FS_META_INO(sbi) || + inode->i_ino == F2FS_NODE_INO(sbi) || + S_ISDIR(inode->i_mode) || + (S_ISREG(inode->i_mode) && + is_inode_flag_set(inode, FI_ATOMIC_FILE)) || + is_cold_data(page)) + return true; + return false; +} + +/* postprocessing steps for read bios */ +enum bio_post_read_step { + STEP_INITIAL = 0, + STEP_DECRYPT, +}; + +struct bio_post_read_ctx { + struct bio *bio; + struct work_struct work; + unsigned int cur_step; + unsigned int enabled_steps; +}; + +static void __read_end_io(struct bio *bio) +{ + struct page *page; + struct bio_vec *bv; int i; -#ifdef CONFIG_F2FS_FAULT_INJECTION - if (time_to_inject(F2FS_P_SB(bio->bi_io_vec->bv_page), FAULT_IO)) - bio->bi_error = -EIO; -#endif + bio_for_each_segment_all(bv, bio, i) { + page = bv->bv_page; - if (f2fs_bio_encrypted(bio)) { - if (bio->bi_error) { - fscrypt_release_ctx(bio->bi_private); - } else { - fscrypt_decrypt_bio_pages(bio->bi_private, bio); - return; - } - } - - bio_for_each_segment_all(bvec, bio, i) { - struct page *page = bvec->bv_page; - - if (!bio->bi_error) { - if (!PageUptodate(page)) - SetPageUptodate(page); - } else { + /* PG_error was set if any post_read step failed */ + if (bio->bi_error || PageError(page)) { ClearPageUptodate(page); SetPageError(page); + } else { + SetPageUptodate(page); } unlock_page(page); } + if (bio->bi_private) + mempool_free(bio->bi_private, bio_post_read_ctx_pool); bio_put(bio); } +static void bio_post_read_processing(struct bio_post_read_ctx *ctx); + +static void decrypt_work(struct work_struct *work) +{ + struct bio_post_read_ctx *ctx = + container_of(work, struct bio_post_read_ctx, work); + + fscrypt_decrypt_bio(ctx->bio); + + bio_post_read_processing(ctx); +} + +static void bio_post_read_processing(struct bio_post_read_ctx *ctx) +{ + switch (++ctx->cur_step) { + case STEP_DECRYPT: + if (ctx->enabled_steps & (1 << STEP_DECRYPT)) { + INIT_WORK(&ctx->work, decrypt_work); + fscrypt_enqueue_decrypt_work(&ctx->work); + return; + } + ctx->cur_step++; + /* fall-through */ + default: + __read_end_io(ctx->bio); + } +} + +static bool f2fs_bio_post_read_required(struct bio *bio) +{ + return bio->bi_private && !bio->bi_error; +} + +static void f2fs_read_end_io(struct bio *bio) +{ + if (time_to_inject(F2FS_P_SB(bio->bi_io_vec->bv_page), FAULT_IO)) { + f2fs_show_injection_info(FAULT_IO); + bio->bi_error = -EIO; + } + + if (f2fs_bio_post_read_required(bio)) { + struct bio_post_read_ctx *ctx = bio->bi_private; + + ctx->cur_step = STEP_INITIAL; + bio_post_read_processing(ctx); + return; + } + + __read_end_io(bio); +} + static void f2fs_write_end_io(struct bio *bio) { struct f2fs_sb_info *sbi = bio->bi_private; @@ -71,16 +150,37 @@ static void f2fs_write_end_io(struct bio *bio) bio_for_each_segment_all(bvec, bio, i) { struct page *page = bvec->bv_page; + enum count_type type = WB_DATA_TYPE(page); + + if (IS_DUMMY_WRITTEN_PAGE(page)) { + set_page_private(page, (unsigned long)NULL); + ClearPagePrivate(page); + unlock_page(page); + mempool_free(page, sbi->write_io_dummy); + + if (unlikely(bio->bi_error)) + f2fs_stop_checkpoint(sbi, true); + continue; + } fscrypt_pullback_bio_page(&page, true); if (unlikely(bio->bi_error)) { mapping_set_error(page->mapping, -EIO); - f2fs_stop_checkpoint(sbi, true); + if (type == F2FS_WB_CP_DATA) + f2fs_stop_checkpoint(sbi, true); } + + f2fs_bug_on(sbi, page->mapping == NODE_MAPPING(sbi) && + page->index != nid_of_node(page)); + + dec_page_count(sbi, type); + if (f2fs_in_warm_node_list(sbi, page)) + f2fs_del_fsync_node_entry(sbi, page); + clear_cold_data(page); end_page_writeback(page); } - if (atomic_dec_and_test(&sbi->nr_wb_bios) && + if (!get_pages(sbi, F2FS_WB_CP_DATA) && wq_has_sleeper(&sbi->cp_wait)) wake_up(&sbi->cp_wait); @@ -88,19 +188,68 @@ static void f2fs_write_end_io(struct bio *bio) } /* + * Return true, if pre_bio's bdev is same as its target device. + */ +struct block_device *f2fs_target_device(struct f2fs_sb_info *sbi, + block_t blk_addr, struct bio *bio) +{ + struct block_device *bdev = sbi->sb->s_bdev; + int i; + + for (i = 0; i < sbi->s_ndevs; i++) { + if (FDEV(i).start_blk <= blk_addr && + FDEV(i).end_blk >= blk_addr) { + blk_addr -= FDEV(i).start_blk; + bdev = FDEV(i).bdev; + break; + } + } + if (bio) { + bio->bi_bdev = bdev; + bio->bi_iter.bi_sector = SECTOR_FROM_BLOCK(blk_addr); + } + return bdev; +} + +int f2fs_target_device_index(struct f2fs_sb_info *sbi, block_t blkaddr) +{ + int i; + + for (i = 0; i < sbi->s_ndevs; i++) + if (FDEV(i).start_blk <= blkaddr && FDEV(i).end_blk >= blkaddr) + return i; + return 0; +} + +static bool __same_bdev(struct f2fs_sb_info *sbi, + block_t blk_addr, struct bio *bio) +{ + return f2fs_target_device(sbi, blk_addr, NULL) == bio->bi_bdev; +} + +/* * Low-level block read/write IO operations. */ static struct bio *__bio_alloc(struct f2fs_sb_info *sbi, block_t blk_addr, - int npages, bool is_read) + struct writeback_control *wbc, + int npages, bool is_read, + enum page_type type, enum temp_type temp) { struct bio *bio; - bio = f2fs_bio_alloc(npages); + bio = f2fs_bio_alloc(sbi, npages, true); - bio->bi_bdev = sbi->sb->s_bdev; - bio->bi_iter.bi_sector = SECTOR_FROM_BLOCK(blk_addr); - bio->bi_end_io = is_read ? f2fs_read_end_io : f2fs_write_end_io; - bio->bi_private = is_read ? NULL : sbi; + f2fs_target_device(sbi, blk_addr, bio); + if (is_read) { + bio->bi_end_io = f2fs_read_end_io; + bio->bi_private = NULL; + } else { + bio->bi_end_io = f2fs_write_end_io; + bio->bi_private = sbi; + bio->bi_write_hint = f2fs_io_type_to_rw_hint(sbi, type, temp); + } + if (wbc) + wbc_init_bio(wbc, bio); return bio; } @@ -109,11 +258,45 @@ static inline void __submit_bio(struct f2fs_sb_info *sbi, struct bio *bio, enum page_type type) { if (!is_read_io(bio_op(bio))) { - atomic_inc(&sbi->nr_wb_bios); - if (f2fs_sb_mounted_hmsmr(sbi->sb) && - current->plug && (type == DATA || type == NODE)) + unsigned int start; + + if (type != DATA && type != NODE) + goto submit_io; + + if (test_opt(sbi, LFS) && current->plug) blk_finish_plug(current->plug); + + start = bio->bi_iter.bi_size >> F2FS_BLKSIZE_BITS; + start %= F2FS_IO_SIZE(sbi); + + if (start == 0) + goto submit_io; + + /* fill dummy pages */ + for (; start < F2FS_IO_SIZE(sbi); start++) { + struct page *page = + mempool_alloc(sbi->write_io_dummy, + GFP_NOIO | __GFP_ZERO | __GFP_NOFAIL); + f2fs_bug_on(sbi, !page); + + SetPagePrivate(page); + set_page_private(page, (unsigned long)DUMMY_WRITTEN_PAGE); + lock_page(page); + if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) + f2fs_bug_on(sbi, 1); + } + /* + * In the NODE case, we lose next block address chain. So, we + * need to do checkpoint in f2fs_sync_file. + */ + if (type == NODE) + set_sbi_flag(sbi, SBI_NEED_CP); } +submit_io: + if (is_read_io(bio_op(bio))) + trace_f2fs_submit_read_bio(sbi->sb, type, bio); + else + trace_f2fs_submit_write_bio(sbi->sb, type, bio); submit_bio(bio); } @@ -124,19 +307,19 @@ static void __submit_merged_bio(struct f2fs_bio_info *io) if (!io->bio) return; - if (is_read_io(fio->op)) - trace_f2fs_submit_read_bio(io->sbi->sb, fio, io->bio); - else - trace_f2fs_submit_write_bio(io->sbi->sb, fio, io->bio); - bio_set_op_attrs(io->bio, fio->op, fio->op_flags); + if (is_read_io(fio->op)) + trace_f2fs_prepare_read_bio(io->sbi->sb, fio->type, io->bio); + else + trace_f2fs_prepare_write_bio(io->sbi->sb, fio->type, io->bio); + __submit_bio(io->sbi, io->bio, fio->type); io->bio = NULL; } -static bool __has_merged_page(struct f2fs_bio_info *io, struct inode *inode, - struct page *page, nid_t ino) +static bool __has_merged_page(struct f2fs_bio_info *io, + struct inode *inode, nid_t ino, pgoff_t idx) { struct bio_vec *bvec; struct page *target; @@ -145,7 +328,7 @@ static bool __has_merged_page(struct f2fs_bio_info *io, struct inode *inode, if (!io->bio) return false; - if (!inode && !page && !ino) + if (!inode && !ino) return true; bio_for_each_segment_all(bvec, io->bio, i) { @@ -155,10 +338,11 @@ static bool __has_merged_page(struct f2fs_bio_info *io, struct inode *inode, else target = fscrypt_control_page(bvec->bv_page); + if (idx != target->index) + continue; + if (inode && inode == target->mapping->host) return true; - if (page && page == target) - return true; if (ino && ino == ino_of_node(target)) return true; } @@ -167,72 +351,88 @@ static bool __has_merged_page(struct f2fs_bio_info *io, struct inode *inode, } static bool has_merged_page(struct f2fs_sb_info *sbi, struct inode *inode, - struct page *page, nid_t ino, - enum page_type type) + nid_t ino, pgoff_t idx, enum page_type type) { enum page_type btype = PAGE_TYPE_OF_BIO(type); - struct f2fs_bio_info *io = &sbi->write_io[btype]; - bool ret; + enum temp_type temp; + struct f2fs_bio_info *io; + bool ret = false; - down_read(&io->io_rwsem); - ret = __has_merged_page(io, inode, page, ino); - up_read(&io->io_rwsem); + for (temp = HOT; temp < NR_TEMP_TYPE; temp++) { + io = sbi->write_io[btype] + temp; + + down_read(&io->io_rwsem); + ret = __has_merged_page(io, inode, ino, idx); + up_read(&io->io_rwsem); + + /* TODO: use HOT temp only for meta pages now. */ + if (ret || btype == META) + break; + } return ret; } -static void __f2fs_submit_merged_bio(struct f2fs_sb_info *sbi, - struct inode *inode, struct page *page, - nid_t ino, enum page_type type, int rw) +static void __f2fs_submit_merged_write(struct f2fs_sb_info *sbi, + enum page_type type, enum temp_type temp) { enum page_type btype = PAGE_TYPE_OF_BIO(type); - struct f2fs_bio_info *io; - - io = is_read_io(rw) ? &sbi->read_io : &sbi->write_io[btype]; + struct f2fs_bio_info *io = sbi->write_io[btype] + temp; down_write(&io->io_rwsem); - if (!__has_merged_page(io, inode, page, ino)) - goto out; - /* change META to META_FLUSH in the checkpoint procedure */ if (type >= META_FLUSH) { io->fio.type = META_FLUSH; io->fio.op = REQ_OP_WRITE; - if (test_opt(sbi, NOBARRIER)) - io->fio.op_flags = WRITE_FLUSH | REQ_META | REQ_PRIO; - else - io->fio.op_flags = WRITE_FLUSH_FUA | REQ_META | - REQ_PRIO; + io->fio.op_flags = REQ_META | REQ_PRIO | REQ_SYNC; + if (!test_opt(sbi, NOBARRIER)) + io->fio.op_flags |= REQ_PREFLUSH | REQ_FUA; } __submit_merged_bio(io); -out: up_write(&io->io_rwsem); } -void f2fs_submit_merged_bio(struct f2fs_sb_info *sbi, enum page_type type, - int rw) +static void __submit_merged_write_cond(struct f2fs_sb_info *sbi, + struct inode *inode, nid_t ino, pgoff_t idx, + enum page_type type, bool force) { - __f2fs_submit_merged_bio(sbi, NULL, NULL, 0, type, rw); + enum temp_type temp; + + if (!force && !has_merged_page(sbi, inode, ino, idx, type)) + return; + + for (temp = HOT; temp < NR_TEMP_TYPE; temp++) { + + __f2fs_submit_merged_write(sbi, type, temp); + + /* TODO: use HOT temp only for meta pages now. */ + if (type >= META) + break; + } } -void f2fs_submit_merged_bio_cond(struct f2fs_sb_info *sbi, - struct inode *inode, struct page *page, - nid_t ino, enum page_type type, int rw) +void f2fs_submit_merged_write(struct f2fs_sb_info *sbi, enum page_type type) { - if (has_merged_page(sbi, inode, page, ino, type)) - __f2fs_submit_merged_bio(sbi, inode, page, ino, type, rw); + __submit_merged_write_cond(sbi, NULL, 0, 0, type, true); } -void f2fs_flush_merged_bios(struct f2fs_sb_info *sbi) +void f2fs_submit_merged_write_cond(struct f2fs_sb_info *sbi, + struct inode *inode, nid_t ino, pgoff_t idx, + enum page_type type) { - f2fs_submit_merged_bio(sbi, DATA, WRITE); - f2fs_submit_merged_bio(sbi, NODE, WRITE); - f2fs_submit_merged_bio(sbi, META, WRITE); + __submit_merged_write_cond(sbi, inode, ino, idx, type, false); +} + +void f2fs_flush_merged_writes(struct f2fs_sb_info *sbi) +{ + f2fs_submit_merged_write(sbi, DATA); + f2fs_submit_merged_write(sbi, NODE); + f2fs_submit_merged_write(sbi, META); } /* * Fill the locked page with data located in the block address. - * Return unlocked page. + * A caller needs to unlock the page on failure. */ int f2fs_submit_page_bio(struct f2fs_io_info *fio) { @@ -240,11 +440,16 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio) struct page *page = fio->encrypted_page ? fio->encrypted_page : fio->page; + if (!f2fs_is_valid_blkaddr(fio->sbi, fio->new_blkaddr, + __is_meta_io(fio) ? META_GENERIC : DATA_GENERIC)) + return -EFAULT; + trace_f2fs_submit_page_bio(page, fio); f2fs_trace_ios(fio, 0); /* Allocate a new bio */ - bio = __bio_alloc(fio->sbi, fio->new_blkaddr, 1, is_read_io(fio->op)); + bio = __bio_alloc(fio->sbi, fio->new_blkaddr, fio->io_wbc, + 1, is_read_io(fio->op), fio->type, fio->temp); if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) { bio_put(bio); @@ -253,60 +458,149 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio) bio_set_op_attrs(bio, fio->op, fio->op_flags); __submit_bio(fio->sbi, bio, fio->type); + + if (!is_read_io(fio->op)) + inc_page_count(fio->sbi, WB_DATA_TYPE(fio->page)); return 0; } -void f2fs_submit_page_mbio(struct f2fs_io_info *fio) +void f2fs_submit_page_write(struct f2fs_io_info *fio) { struct f2fs_sb_info *sbi = fio->sbi; enum page_type btype = PAGE_TYPE_OF_BIO(fio->type); - struct f2fs_bio_info *io; - bool is_read = is_read_io(fio->op); + struct f2fs_bio_info *io = sbi->write_io[btype] + fio->temp; struct page *bio_page; - io = is_read ? &sbi->read_io : &sbi->write_io[btype]; - - if (fio->old_blkaddr != NEW_ADDR) - verify_block_addr(sbi, fio->old_blkaddr); - verify_block_addr(sbi, fio->new_blkaddr); + f2fs_bug_on(sbi, is_read_io(fio->op)); down_write(&io->io_rwsem); - - if (io->bio && (io->last_block_in_bio != fio->new_blkaddr - 1 || - (io->fio.op != fio->op || io->fio.op_flags != fio->op_flags))) - __submit_merged_bio(io); -alloc_new: - if (io->bio == NULL) { - int bio_blocks = MAX_BIO_BLOCKS(sbi); - - io->bio = __bio_alloc(sbi, fio->new_blkaddr, - bio_blocks, is_read); - io->fio = *fio; +next: + if (fio->in_list) { + spin_lock(&io->io_lock); + if (list_empty(&io->io_list)) { + spin_unlock(&io->io_lock); + goto out; + } + fio = list_first_entry(&io->io_list, + struct f2fs_io_info, list); + list_del(&fio->list); + spin_unlock(&io->io_lock); } + if (__is_valid_data_blkaddr(fio->old_blkaddr)) + verify_block_addr(fio, fio->old_blkaddr); + verify_block_addr(fio, fio->new_blkaddr); + bio_page = fio->encrypted_page ? fio->encrypted_page : fio->page; - if (bio_add_page(io->bio, bio_page, PAGE_SIZE, 0) < - PAGE_SIZE) { + /* set submitted = true as a return value */ + fio->submitted = true; + + inc_page_count(sbi, WB_DATA_TYPE(bio_page)); + + if (io->bio && (io->last_block_in_bio != fio->new_blkaddr - 1 || + (io->fio.op != fio->op || io->fio.op_flags != fio->op_flags) || + !__same_bdev(sbi, fio->new_blkaddr, io->bio))) + __submit_merged_bio(io); +alloc_new: + if (io->bio == NULL) { + if ((fio->type == DATA || fio->type == NODE) && + fio->new_blkaddr & F2FS_IO_SIZE_MASK(sbi)) { + dec_page_count(sbi, WB_DATA_TYPE(bio_page)); + fio->retry = true; + goto skip; + } + io->bio = __bio_alloc(sbi, fio->new_blkaddr, fio->io_wbc, + BIO_MAX_PAGES, false, + fio->type, fio->temp); + io->fio = *fio; + } + + if (bio_add_page(io->bio, bio_page, PAGE_SIZE, 0) < PAGE_SIZE) { __submit_merged_bio(io); goto alloc_new; } + if (fio->io_wbc) + wbc_account_io(fio->io_wbc, bio_page, PAGE_SIZE); + io->last_block_in_bio = fio->new_blkaddr; f2fs_trace_ios(fio, 0); + trace_f2fs_submit_page_write(fio->page, fio); +skip: + if (fio->in_list) + goto next; +out: up_write(&io->io_rwsem); - trace_f2fs_submit_page_mbio(fio->page, fio); +} + +static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr, + unsigned nr_pages, unsigned op_flag) +{ + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + struct bio *bio; + struct bio_post_read_ctx *ctx; + unsigned int post_read_steps = 0; + + if (!f2fs_is_valid_blkaddr(sbi, blkaddr, DATA_GENERIC)) + return ERR_PTR(-EFAULT); + + bio = f2fs_bio_alloc(sbi, min_t(int, nr_pages, BIO_MAX_PAGES), false); + if (!bio) + return ERR_PTR(-ENOMEM); + f2fs_target_device(sbi, blkaddr, bio); + bio->bi_end_io = f2fs_read_end_io; + bio_set_op_attrs(bio, REQ_OP_READ, op_flag); + + if (f2fs_encrypted_file(inode)) + post_read_steps |= 1 << STEP_DECRYPT; + if (post_read_steps) { + ctx = mempool_alloc(bio_post_read_ctx_pool, GFP_NOFS); + if (!ctx) { + bio_put(bio); + return ERR_PTR(-ENOMEM); + } + ctx->bio = bio; + ctx->enabled_steps = post_read_steps; + bio->bi_private = ctx; + + /* wait the page to be moved by cleaning */ + f2fs_wait_on_block_writeback(sbi, blkaddr); + } + + return bio; +} + +/* This can handle encryption stuffs */ +static int f2fs_submit_page_read(struct inode *inode, struct page *page, + block_t blkaddr) +{ + struct bio *bio = f2fs_grab_read_bio(inode, blkaddr, 1, 0); + + if (IS_ERR(bio)) + return PTR_ERR(bio); + + if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) { + bio_put(bio); + return -EFAULT; + } + __submit_bio(F2FS_I_SB(inode), bio, DATA); + return 0; } static void __set_data_blkaddr(struct dnode_of_data *dn) { struct f2fs_node *rn = F2FS_NODE(dn->node_page); __le32 *addr_array; + int base = 0; + + if (IS_INODE(dn->node_page) && f2fs_has_extra_attr(dn->inode)) + base = get_extra_isize(dn->inode); /* Get physical address of data block */ addr_array = blkaddr_in_node(rn); - addr_array[dn->ofs_in_node] = cpu_to_le32(dn->data_blkaddr); + addr_array[base + dn->ofs_in_node] = cpu_to_le32(dn->data_blkaddr); } /* @@ -315,7 +609,7 @@ static void __set_data_blkaddr(struct dnode_of_data *dn) * ->node_page * update block addresses in the node page */ -void set_data_blkaddr(struct dnode_of_data *dn) +void f2fs_set_data_blkaddr(struct dnode_of_data *dn) { f2fs_wait_on_page_writeback(dn->node_page, NODE, true); __set_data_blkaddr(dn); @@ -326,22 +620,23 @@ void set_data_blkaddr(struct dnode_of_data *dn) void f2fs_update_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr) { dn->data_blkaddr = blkaddr; - set_data_blkaddr(dn); + f2fs_set_data_blkaddr(dn); f2fs_update_extent_cache(dn); } /* dn->ofs_in_node will be returned with up-to-date last block pointer */ -int reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count) +int f2fs_reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count) { struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode); + int err; if (!count) return 0; if (unlikely(is_inode_flag_set(dn->inode, FI_NO_ALLOC))) return -EPERM; - if (unlikely(!inc_valid_block_count(sbi, dn->inode, &count))) - return -ENOSPC; + if (unlikely((err = inc_valid_block_count(sbi, dn->inode, &count)))) + return err; trace_f2fs_reserve_new_blocks(dn->inode, dn->nid, dn->ofs_in_node, count); @@ -349,8 +644,8 @@ int reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count) f2fs_wait_on_page_writeback(dn->node_page, NODE, true); for (; count > 0; dn->ofs_in_node++) { - block_t blkaddr = - datablock_addr(dn->node_page, dn->ofs_in_node); + block_t blkaddr = datablock_addr(dn->inode, + dn->node_page, dn->ofs_in_node); if (blkaddr == NULL_ADDR) { dn->data_blkaddr = NEW_ADDR; __set_data_blkaddr(dn); @@ -364,12 +659,12 @@ int reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count) } /* Should keep dn->ofs_in_node unchanged */ -int reserve_new_block(struct dnode_of_data *dn) +int f2fs_reserve_new_block(struct dnode_of_data *dn) { unsigned int ofs_in_node = dn->ofs_in_node; int ret; - ret = reserve_new_blocks(dn, 1); + ret = f2fs_reserve_new_blocks(dn, 1); dn->ofs_in_node = ofs_in_node; return ret; } @@ -379,12 +674,12 @@ int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index) bool need_put = dn->inode_page ? false : true; int err; - err = get_dnode_of_data(dn, index, ALLOC_NODE); + err = f2fs_get_dnode_of_data(dn, index, ALLOC_NODE); if (err) return err; if (dn->data_blkaddr == NULL_ADDR) - err = reserve_new_block(dn); + err = f2fs_reserve_new_block(dn); if (err || need_put) f2fs_put_dnode(dn); return err; @@ -392,7 +687,7 @@ int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index) int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index) { - struct extent_info ei; + struct extent_info ei = {0,0,0}; struct inode *inode = dn->inode; if (f2fs_lookup_extent_cache(inode, index, &ei)) { @@ -403,24 +698,14 @@ int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index) return f2fs_reserve_block(dn, index); } -struct page *get_read_data_page(struct inode *inode, pgoff_t index, +struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index, int op_flags, bool for_write) { struct address_space *mapping = inode->i_mapping; struct dnode_of_data dn; struct page *page; - struct extent_info ei; + struct extent_info ei = {0,0,0}; int err; - struct f2fs_io_info fio = { - .sbi = F2FS_I_SB(inode), - .type = DATA, - .op = REQ_OP_READ, - .op_flags = op_flags, - .encrypted_page = NULL, - }; - - if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode)) - return read_mapping_page(mapping, index, NULL); page = f2fs_grab_cache_page(mapping, index, for_write); if (!page) @@ -432,7 +717,7 @@ struct page *get_read_data_page(struct inode *inode, pgoff_t index, } set_new_dnode(&dn, inode, NULL, NULL, 0); - err = get_dnode_of_data(&dn, index, LOOKUP_NODE); + err = f2fs_get_dnode_of_data(&dn, index, LOOKUP_NODE); if (err) goto put_err; f2fs_put_dnode(&dn); @@ -451,7 +736,8 @@ struct page *get_read_data_page(struct inode *inode, pgoff_t index, * A new dentry page is allocated but not able to be written, since its * new inode page couldn't be allocated due to -ENOSPC. * In such the case, its blkaddr can be remained as NEW_ADDR. - * see, f2fs_add_link -> get_new_data_page -> init_inode_metadata. + * see, f2fs_add_link -> f2fs_get_new_data_page -> + * f2fs_init_inode_metadata. */ if (dn.data_blkaddr == NEW_ADDR) { zero_user_segment(page, 0, PAGE_SIZE); @@ -461,9 +747,7 @@ struct page *get_read_data_page(struct inode *inode, pgoff_t index, return page; } - fio.new_blkaddr = fio.old_blkaddr = dn.data_blkaddr; - fio.page = page; - err = f2fs_submit_page_bio(&fio); + err = f2fs_submit_page_read(inode, page, dn.data_blkaddr); if (err) goto put_err; return page; @@ -473,7 +757,7 @@ struct page *get_read_data_page(struct inode *inode, pgoff_t index, return ERR_PTR(err); } -struct page *find_data_page(struct inode *inode, pgoff_t index) +struct page *f2fs_find_data_page(struct inode *inode, pgoff_t index) { struct address_space *mapping = inode->i_mapping; struct page *page; @@ -483,7 +767,7 @@ struct page *find_data_page(struct inode *inode, pgoff_t index) return page; f2fs_put_page(page, 0); - page = get_read_data_page(inode, index, READ_SYNC, false); + page = f2fs_get_read_data_page(inode, index, 0, false); if (IS_ERR(page)) return page; @@ -503,13 +787,13 @@ struct page *find_data_page(struct inode *inode, pgoff_t index) * Because, the callers, functions in dir.c and GC, should be able to know * whether this page exists or not. */ -struct page *get_lock_data_page(struct inode *inode, pgoff_t index, +struct page *f2fs_get_lock_data_page(struct inode *inode, pgoff_t index, bool for_write) { struct address_space *mapping = inode->i_mapping; struct page *page; repeat: - page = get_read_data_page(inode, index, READ_SYNC, for_write); + page = f2fs_get_read_data_page(inode, index, 0, for_write); if (IS_ERR(page)) return page; @@ -535,7 +819,7 @@ struct page *get_lock_data_page(struct inode *inode, pgoff_t index, * Note that, ipage is set only by make_empty_dir, and if any error occur, * ipage should be released by this function. */ -struct page *get_new_data_page(struct inode *inode, +struct page *f2fs_get_new_data_page(struct inode *inode, struct page *ipage, pgoff_t index, bool new_i_size) { struct address_space *mapping = inode->i_mapping; @@ -574,7 +858,7 @@ struct page *get_new_data_page(struct inode *inode, /* if ipage exists, blkaddr should be NEW_ADDR */ f2fs_bug_on(F2FS_I_SB(inode), ipage); - page = get_lock_data_page(inode, index, true); + page = f2fs_get_lock_data_page(inode, index, true); if (IS_ERR(page)) return page; } @@ -585,38 +869,43 @@ struct page *get_new_data_page(struct inode *inode, return page; } -static int __allocate_data_block(struct dnode_of_data *dn) +static int __allocate_data_block(struct dnode_of_data *dn, int seg_type) { struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode); struct f2fs_summary sum; struct node_info ni; - int seg = CURSEG_WARM_DATA; + block_t old_blkaddr; pgoff_t fofs; blkcnt_t count = 1; + int err; if (unlikely(is_inode_flag_set(dn->inode, FI_NO_ALLOC))) return -EPERM; - dn->data_blkaddr = datablock_addr(dn->node_page, dn->ofs_in_node); + err = f2fs_get_node_info(sbi, dn->nid, &ni); + if (err) + return err; + + dn->data_blkaddr = datablock_addr(dn->inode, + dn->node_page, dn->ofs_in_node); if (dn->data_blkaddr == NEW_ADDR) goto alloc; - if (unlikely(!inc_valid_block_count(sbi, dn->inode, &count))) - return -ENOSPC; + if (unlikely((err = inc_valid_block_count(sbi, dn->inode, &count)))) + return err; alloc: - get_node_info(sbi, dn->nid, &ni); set_summary(&sum, dn->nid, dn->ofs_in_node, ni.version); - - if (dn->ofs_in_node == 0 && dn->inode_page == dn->node_page) - seg = CURSEG_DIRECT_IO; - - allocate_data_block(sbi, NULL, dn->data_blkaddr, &dn->data_blkaddr, - &sum, seg); - set_data_blkaddr(dn); + old_blkaddr = dn->data_blkaddr; + f2fs_allocate_data_block(sbi, NULL, old_blkaddr, &dn->data_blkaddr, + &sum, seg_type, NULL, false); + if (GET_SEGNO(sbi, old_blkaddr) != NULL_SEGNO) + invalidate_mapping_pages(META_MAPPING(sbi), + old_blkaddr, old_blkaddr); + f2fs_set_data_blkaddr(dn); /* update i_size */ - fofs = start_bidx_of_node(ofs_of_node(dn->node_page), dn->inode) + + fofs = f2fs_start_bidx_of_node(ofs_of_node(dn->node_page), dn->inode) + dn->ofs_in_node; if (i_size_read(dn->inode) < ((loff_t)(fofs + 1) << PAGE_SHIFT)) f2fs_i_size_write(dn->inode, @@ -624,11 +913,23 @@ static int __allocate_data_block(struct dnode_of_data *dn) return 0; } -ssize_t f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from) +int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from) { struct inode *inode = file_inode(iocb->ki_filp); struct f2fs_map_blocks map; - ssize_t ret = 0; + int flag; + int err = 0; + bool direct_io = iocb->ki_flags & IOCB_DIRECT; + + /* convert inline data for Direct I/O*/ + if (direct_io) { + err = f2fs_convert_inline_inode(inode); + if (err) + return err; + } + + if (is_inode_flag_set(inode, FI_NO_PREALLOC)) + return 0; map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos); map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from)); @@ -638,21 +939,49 @@ ssize_t f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from) map.m_len = 0; map.m_next_pgofs = NULL; + map.m_next_extent = NULL; + map.m_seg_type = NO_CHECK_TYPE; - if (iocb->ki_flags & IOCB_DIRECT) { - ret = f2fs_convert_inline_inode(inode); - if (ret) - return ret; - return f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_DIO); + if (direct_io) { + map.m_seg_type = f2fs_rw_hint_to_seg_type(iocb->ki_hint); + flag = f2fs_force_buffered_io(inode, WRITE) ? + F2FS_GET_BLOCK_PRE_AIO : + F2FS_GET_BLOCK_PRE_DIO; + goto map_blocks; } - if (iocb->ki_pos + iov_iter_count(from) > MAX_INLINE_DATA) { - ret = f2fs_convert_inline_inode(inode); - if (ret) - return ret; + if (iocb->ki_pos + iov_iter_count(from) > MAX_INLINE_DATA(inode)) { + err = f2fs_convert_inline_inode(inode); + if (err) + return err; } - if (!f2fs_has_inline_data(inode)) - return f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_AIO); - return ret; + if (f2fs_has_inline_data(inode)) + return err; + + flag = F2FS_GET_BLOCK_PRE_AIO; + +map_blocks: + err = f2fs_map_blocks(inode, &map, 1, flag); + if (map.m_len > 0 && err == -ENOSPC) { + if (!direct_io) + set_inode_flag(inode, FI_NO_PREALLOC); + err = 0; + } + return err; +} + +static inline void __do_map_lock(struct f2fs_sb_info *sbi, int flag, bool lock) +{ + if (flag == F2FS_GET_BLOCK_PRE_AIO) { + if (lock) + down_read(&sbi->node_change); + else + up_read(&sbi->node_change); + } else { + if (lock) + f2fs_lock_op(sbi); + else + f2fs_unlock_op(sbi); + } } /* @@ -675,9 +1004,9 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int err = 0, ofs = 1; unsigned int ofs_in_node, last_ofs_in_node; blkcnt_t prealloc; - struct extent_info ei; - bool allocated = false; + struct extent_info ei = {0,0,0}; block_t blkaddr; + unsigned int start_pgofs; if (!maxblocks) return 0; @@ -693,16 +1022,18 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, map->m_pblk = ei.blk + pgofs - ei.fofs; map->m_len = min((pgoff_t)maxblocks, ei.fofs + ei.len - pgofs); map->m_flags = F2FS_MAP_MAPPED; + if (map->m_next_extent) + *map->m_next_extent = pgofs + map->m_len; goto out; } next_dnode: if (create) - f2fs_lock_op(sbi); + __do_map_lock(sbi, flag, true); /* When reading holes, we need its node page */ set_new_dnode(&dn, inode, NULL, NULL, 0); - err = get_dnode_of_data(&dn, pgofs, mode); + err = f2fs_get_dnode_of_data(&dn, pgofs, mode); if (err) { if (flag == F2FS_GET_BLOCK_BMAP) map->m_pblk = 0; @@ -710,19 +1041,29 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, err = 0; if (map->m_next_pgofs) *map->m_next_pgofs = - get_next_page_offset(&dn, pgofs); + f2fs_get_next_page_offset(&dn, pgofs); + if (map->m_next_extent) + *map->m_next_extent = + f2fs_get_next_page_offset(&dn, pgofs); } goto unlock_out; } + start_pgofs = pgofs; prealloc = 0; last_ofs_in_node = ofs_in_node = dn.ofs_in_node; end_offset = ADDRS_PER_PAGE(dn.node_page, inode); next_block: - blkaddr = datablock_addr(dn.node_page, dn.ofs_in_node); + blkaddr = datablock_addr(dn.inode, dn.node_page, dn.ofs_in_node); - if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR) { + if (__is_valid_data_blkaddr(blkaddr) && + !f2fs_is_valid_blkaddr(sbi, blkaddr, DATA_GENERIC)) { + err = -EFAULT; + goto sync_out; + } + + if (!is_valid_data_blkaddr(sbi, blkaddr)) { if (create) { if (unlikely(f2fs_cp_error(sbi))) { err = -EIO; @@ -734,29 +1075,34 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, last_ofs_in_node = dn.ofs_in_node; } } else { - err = __allocate_data_block(&dn); - if (!err) { + err = __allocate_data_block(&dn, + map->m_seg_type); + if (!err) set_inode_flag(inode, FI_APPEND_WRITE); - allocated = true; - } } if (err) goto sync_out; - map->m_flags = F2FS_MAP_NEW; + map->m_flags |= F2FS_MAP_NEW; blkaddr = dn.data_blkaddr; } else { if (flag == F2FS_GET_BLOCK_BMAP) { map->m_pblk = 0; goto sync_out; } + if (flag == F2FS_GET_BLOCK_PRECACHE) + goto sync_out; if (flag == F2FS_GET_BLOCK_FIEMAP && blkaddr == NULL_ADDR) { if (map->m_next_pgofs) *map->m_next_pgofs = pgofs + 1; - } - if (flag != F2FS_GET_BLOCK_FIEMAP || - blkaddr != NEW_ADDR) goto sync_out; + } + if (flag != F2FS_GET_BLOCK_FIEMAP) { + /* for defragment case */ + if (map->m_next_pgofs) + *map->m_next_pgofs = pgofs + 1; + goto sync_out; + } } } @@ -790,10 +1136,9 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, (pgofs == end || dn.ofs_in_node == end_offset)) { dn.ofs_in_node = ofs_in_node; - err = reserve_new_blocks(&dn, prealloc); + err = f2fs_reserve_new_blocks(&dn, prealloc); if (err) goto sync_out; - allocated = dn.node_changed; map->m_len += dn.ofs_in_node - ofs_in_node; if (prealloc && dn.ofs_in_node != last_ofs_in_node + 1) { @@ -808,45 +1153,92 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, else if (dn.ofs_in_node < end_offset) goto next_block; + if (flag == F2FS_GET_BLOCK_PRECACHE) { + if (map->m_flags & F2FS_MAP_MAPPED) { + unsigned int ofs = start_pgofs - map->m_lblk; + + f2fs_update_extent_cache_range(&dn, + start_pgofs, map->m_pblk + ofs, + map->m_len - ofs); + } + } + f2fs_put_dnode(&dn); if (create) { - f2fs_unlock_op(sbi); - f2fs_balance_fs(sbi, allocated); + __do_map_lock(sbi, flag, false); + f2fs_balance_fs(sbi, dn.node_changed); } - allocated = false; goto next_dnode; sync_out: + if (flag == F2FS_GET_BLOCK_PRECACHE) { + if (map->m_flags & F2FS_MAP_MAPPED) { + unsigned int ofs = start_pgofs - map->m_lblk; + + f2fs_update_extent_cache_range(&dn, + start_pgofs, map->m_pblk + ofs, + map->m_len - ofs); + } + if (map->m_next_extent) + *map->m_next_extent = pgofs + 1; + } f2fs_put_dnode(&dn); unlock_out: if (create) { - f2fs_unlock_op(sbi); - f2fs_balance_fs(sbi, allocated); + __do_map_lock(sbi, flag, false); + f2fs_balance_fs(sbi, dn.node_changed); } out: trace_f2fs_map_blocks(inode, map, err); return err; } -static int __get_data_block(struct inode *inode, sector_t iblock, - struct buffer_head *bh, int create, int flag, - pgoff_t *next_pgofs) +bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len) { struct f2fs_map_blocks map; - int ret; + block_t last_lblk; + int err; + + if (pos + len > i_size_read(inode)) + return false; + + map.m_lblk = F2FS_BYTES_TO_BLK(pos); + map.m_next_pgofs = NULL; + map.m_next_extent = NULL; + map.m_seg_type = NO_CHECK_TYPE; + last_lblk = F2FS_BLK_ALIGN(pos + len); + + while (map.m_lblk < last_lblk) { + map.m_len = last_lblk - map.m_lblk; + err = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_DEFAULT); + if (err || map.m_len == 0) + return false; + map.m_lblk += map.m_len; + } + return true; +} + +static int __get_data_block(struct inode *inode, sector_t iblock, + struct buffer_head *bh, int create, int flag, + pgoff_t *next_pgofs, int seg_type) +{ + struct f2fs_map_blocks map; + int err; map.m_lblk = iblock; map.m_len = bh->b_size >> inode->i_blkbits; map.m_next_pgofs = next_pgofs; + map.m_next_extent = NULL; + map.m_seg_type = seg_type; - ret = f2fs_map_blocks(inode, &map, create, flag); - if (!ret) { + err = f2fs_map_blocks(inode, &map, create, flag); + if (!err) { map_bh(bh, inode->i_sb, map.m_pblk); bh->b_state = (bh->b_state & ~F2FS_MAP_FLAGS) | map.m_flags; bh->b_size = (u64)map.m_len << inode->i_blkbits; } - return ret; + return err; } static int get_data_block(struct inode *inode, sector_t iblock, @@ -854,14 +1246,17 @@ static int get_data_block(struct inode *inode, sector_t iblock, pgoff_t *next_pgofs) { return __get_data_block(inode, iblock, bh_result, create, - flag, next_pgofs); + flag, next_pgofs, + NO_CHECK_TYPE); } static int get_data_block_dio(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create) { return __get_data_block(inode, iblock, bh_result, create, - F2FS_GET_BLOCK_DIO, NULL); + F2FS_GET_BLOCK_DEFAULT, NULL, + f2fs_rw_hint_to_seg_type( + inode->i_write_hint)); } static int get_data_block_bmap(struct inode *inode, sector_t iblock, @@ -872,7 +1267,8 @@ static int get_data_block_bmap(struct inode *inode, sector_t iblock, return -EFBIG; return __get_data_block(inode, iblock, bh_result, create, - F2FS_GET_BLOCK_BMAP, NULL); + F2FS_GET_BLOCK_BMAP, NULL, + NO_CHECK_TYPE); } static inline sector_t logical_to_blk(struct inode *inode, loff_t offset) @@ -885,36 +1281,109 @@ static inline loff_t blk_to_logical(struct inode *inode, sector_t blk) return (blk << inode->i_blkbits); } +static int f2fs_xattr_fiemap(struct inode *inode, + struct fiemap_extent_info *fieinfo) +{ + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + struct page *page; + struct node_info ni; + __u64 phys = 0, len; + __u32 flags; + nid_t xnid = F2FS_I(inode)->i_xattr_nid; + int err = 0; + + if (f2fs_has_inline_xattr(inode)) { + int offset; + + page = f2fs_grab_cache_page(NODE_MAPPING(sbi), + inode->i_ino, false); + if (!page) + return -ENOMEM; + + err = f2fs_get_node_info(sbi, inode->i_ino, &ni); + if (err) { + f2fs_put_page(page, 1); + return err; + } + + phys = (__u64)blk_to_logical(inode, ni.blk_addr); + offset = offsetof(struct f2fs_inode, i_addr) + + sizeof(__le32) * (DEF_ADDRS_PER_INODE - + get_inline_xattr_addrs(inode)); + + phys += offset; + len = inline_xattr_size(inode); + + f2fs_put_page(page, 1); + + flags = FIEMAP_EXTENT_DATA_INLINE | FIEMAP_EXTENT_NOT_ALIGNED; + + if (!xnid) + flags |= FIEMAP_EXTENT_LAST; + + err = fiemap_fill_next_extent(fieinfo, 0, phys, len, flags); + if (err || err == 1) + return err; + } + + if (xnid) { + page = f2fs_grab_cache_page(NODE_MAPPING(sbi), xnid, false); + if (!page) + return -ENOMEM; + + err = f2fs_get_node_info(sbi, xnid, &ni); + if (err) { + f2fs_put_page(page, 1); + return err; + } + + phys = (__u64)blk_to_logical(inode, ni.blk_addr); + len = inode->i_sb->s_blocksize; + + f2fs_put_page(page, 1); + + flags = FIEMAP_EXTENT_LAST; + } + + if (phys) + err = fiemap_fill_next_extent(fieinfo, 0, phys, len, flags); + + return (err < 0 ? err : 0); +} + int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, u64 start, u64 len) { struct buffer_head map_bh; sector_t start_blk, last_blk; pgoff_t next_pgofs; - loff_t isize; u64 logical = 0, phys = 0, size = 0; u32 flags = 0; int ret = 0; - ret = fiemap_check_flags(fieinfo, FIEMAP_FLAG_SYNC); + if (fieinfo->fi_flags & FIEMAP_FLAG_CACHE) { + ret = f2fs_precache_extents(inode); + if (ret) + return ret; + } + + ret = fiemap_check_flags(fieinfo, FIEMAP_FLAG_SYNC | FIEMAP_FLAG_XATTR); if (ret) return ret; + inode_lock(inode); + + if (fieinfo->fi_flags & FIEMAP_FLAG_XATTR) { + ret = f2fs_xattr_fiemap(inode, fieinfo); + goto out; + } + if (f2fs_has_inline_data(inode)) { ret = f2fs_inline_data_fiemap(inode, fieinfo, start, len); if (ret != -EAGAIN) - return ret; + goto out; } - inode_lock(inode); - - isize = i_size_read(inode); - if (start >= isize) - goto out; - - if (start + len > isize) - len = isize - start; - if (logical_to_blk(inode, len) == 0) len = blk_to_logical(inode, 1); @@ -933,13 +1402,11 @@ int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, /* HOLE */ if (!buffer_mapped(&map_bh)) { start_blk = next_pgofs; - /* Go through holes util pass the EOF */ - if (blk_to_logical(inode, start_blk) < isize) + + if (blk_to_logical(inode, start_blk) < blk_to_logical(inode, + F2FS_I_SB(inode)->max_file_blocks)) goto prep_next; - /* Found a hole beyond isize means no more extents. - * Note that the premise is that filesystems don't - * punch holes beyond isize and keep size unchanged. - */ + flags |= FIEMAP_EXTENT_LAST; } @@ -977,47 +1444,20 @@ int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, return ret; } -static struct bio *f2fs_grab_bio(struct inode *inode, block_t blkaddr, - unsigned nr_pages) -{ - struct f2fs_sb_info *sbi = F2FS_I_SB(inode); - struct fscrypt_ctx *ctx = NULL; - struct block_device *bdev = sbi->sb->s_bdev; - struct bio *bio; - - if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode)) { - ctx = fscrypt_get_ctx(inode, GFP_NOFS); - if (IS_ERR(ctx)) - return ERR_CAST(ctx); - - /* wait the page to be moved by cleaning */ - f2fs_wait_on_encrypted_page_writeback(sbi, blkaddr); - } - - bio = bio_alloc(GFP_KERNEL, min_t(int, nr_pages, BIO_MAX_PAGES)); - if (!bio) { - if (ctx) - fscrypt_release_ctx(ctx); - return ERR_PTR(-ENOMEM); - } - bio->bi_bdev = bdev; - bio->bi_iter.bi_sector = SECTOR_FROM_BLOCK(blkaddr); - bio->bi_end_io = f2fs_read_end_io; - bio->bi_private = ctx; - - return bio; -} - /* * This function was originally taken from fs/mpage.c, and customized for f2fs. * Major change was from block_size == page_size in f2fs by default. + * + * Note that the aops->readpages() function is ONLY used for read-ahead. If + * this function ever deviates from doing just read-ahead, it should either + * use ->readpage() or do the necessary surgery to decouple ->readpages() + * from read-ahead. */ static int f2fs_mpage_readpages(struct address_space *mapping, struct list_head *pages, struct page *page, - unsigned nr_pages) + unsigned nr_pages, bool is_readahead) { struct bio *bio = NULL; - unsigned page_idx; sector_t last_block_in_bio = 0; struct inode *inode = mapping->host; const unsigned blkbits = inode->i_blkbits; @@ -1033,12 +1473,14 @@ static int f2fs_mpage_readpages(struct address_space *mapping, map.m_len = 0; map.m_flags = 0; map.m_next_pgofs = NULL; + map.m_next_extent = NULL; + map.m_seg_type = NO_CHECK_TYPE; - for (page_idx = 0; nr_pages; page_idx++, nr_pages--) { - - prefetchw(&page->flags); + for (; nr_pages; nr_pages--) { if (pages) { - page = list_entry(pages->prev, struct page, lru); + page = list_last_entry(pages, struct page, lru); + + prefetchw(&page->flags); list_del(&page->lru); if (add_to_page_cache_lru(page, mapping, page->index, @@ -1072,7 +1514,7 @@ static int f2fs_mpage_readpages(struct address_space *mapping, map.m_len = last_block - block_in_file; if (f2fs_map_blocks(inode, &map, 0, - F2FS_GET_BLOCK_READ)) + F2FS_GET_BLOCK_DEFAULT)) goto set_error_page; } got_it: @@ -1084,6 +1526,10 @@ static int f2fs_mpage_readpages(struct address_space *mapping, SetPageUptodate(page); goto confused; } + + if (!f2fs_is_valid_blkaddr(F2FS_I_SB(inode), block_nr, + DATA_GENERIC)) + goto set_error_page; } else { zero_user_segment(page, 0, PAGE_SIZE); if (!PageUptodate(page)) @@ -1096,18 +1542,19 @@ static int f2fs_mpage_readpages(struct address_space *mapping, * This page will go to BIO. Do we need to send this * BIO off first? */ - if (bio && (last_block_in_bio != block_nr - 1)) { + if (bio && (last_block_in_bio != block_nr - 1 || + !__same_bdev(F2FS_I_SB(inode), block_nr, bio))) { submit_and_realloc: __submit_bio(F2FS_I_SB(inode), bio, DATA); bio = NULL; } if (bio == NULL) { - bio = f2fs_grab_bio(inode, block_nr, nr_pages); + bio = f2fs_grab_read_bio(inode, block_nr, nr_pages, + is_readahead ? REQ_RAHEAD : 0); if (IS_ERR(bio)) { bio = NULL; goto set_error_page; } - bio_set_op_attrs(bio, REQ_OP_READ, 0); } if (bio_add_page(bio, page, blocksize, 0) < blocksize) @@ -1147,7 +1594,7 @@ static int f2fs_read_data_page(struct file *file, struct page *page) if (f2fs_has_inline_data(inode)) ret = f2fs_read_inline_data(inode, page); if (ret == -EAGAIN) - ret = f2fs_mpage_readpages(page->mapping, NULL, page, 1); + ret = f2fs_mpage_readpages(page->mapping, NULL, page, 1, false); return ret; } @@ -1155,8 +1602,8 @@ static int f2fs_read_data_pages(struct file *file, struct address_space *mapping, struct list_head *pages, unsigned nr_pages) { - struct inode *inode = file->f_mapping->host; - struct page *page = list_entry(pages->prev, struct page, lru); + struct inode *inode = mapping->host; + struct page *page = list_last_entry(pages, struct page, lru); trace_f2fs_readpages(inode, page, nr_pages); @@ -1164,20 +1611,151 @@ static int f2fs_read_data_pages(struct file *file, if (f2fs_has_inline_data(inode)) return 0; - return f2fs_mpage_readpages(mapping, pages, NULL, nr_pages); + return f2fs_mpage_readpages(mapping, pages, NULL, nr_pages, true); } -int do_write_data_page(struct f2fs_io_info *fio) +static int encrypt_one_page(struct f2fs_io_info *fio) +{ + struct inode *inode = fio->page->mapping->host; + struct page *mpage; + gfp_t gfp_flags = GFP_NOFS; + + if (!f2fs_encrypted_file(inode)) + return 0; + + /* wait for GCed page writeback via META_MAPPING */ + f2fs_wait_on_block_writeback(fio->sbi, fio->old_blkaddr); + +retry_encrypt: + fio->encrypted_page = fscrypt_encrypt_page(inode, fio->page, + PAGE_SIZE, 0, fio->page->index, gfp_flags); + if (IS_ERR(fio->encrypted_page)) { + /* flush pending IOs and wait for a while in the ENOMEM case */ + if (PTR_ERR(fio->encrypted_page) == -ENOMEM) { + f2fs_flush_merged_writes(fio->sbi); + congestion_wait(BLK_RW_ASYNC, HZ/50); + gfp_flags |= __GFP_NOFAIL; + goto retry_encrypt; + } + return PTR_ERR(fio->encrypted_page); + } + + mpage = find_lock_page(META_MAPPING(fio->sbi), fio->old_blkaddr); + if (mpage) { + if (PageUptodate(mpage)) + memcpy(page_address(mpage), + page_address(fio->encrypted_page), PAGE_SIZE); + f2fs_put_page(mpage, 1); + } + return 0; +} + +static inline bool check_inplace_update_policy(struct inode *inode, + struct f2fs_io_info *fio) +{ + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + unsigned int policy = SM_I(sbi)->ipu_policy; + + if (policy & (0x1 << F2FS_IPU_FORCE)) + return true; + if (policy & (0x1 << F2FS_IPU_SSR) && f2fs_need_SSR(sbi)) + return true; + if (policy & (0x1 << F2FS_IPU_UTIL) && + utilization(sbi) > SM_I(sbi)->min_ipu_util) + return true; + if (policy & (0x1 << F2FS_IPU_SSR_UTIL) && f2fs_need_SSR(sbi) && + utilization(sbi) > SM_I(sbi)->min_ipu_util) + return true; + + /* + * IPU for rewrite async pages + */ + if (policy & (0x1 << F2FS_IPU_ASYNC) && + fio && fio->op == REQ_OP_WRITE && + !(fio->op_flags & REQ_SYNC) && + !f2fs_encrypted_inode(inode)) + return true; + + /* this is only set during fdatasync */ + if (policy & (0x1 << F2FS_IPU_FSYNC) && + is_inode_flag_set(inode, FI_NEED_IPU)) + return true; + + return false; +} + +bool f2fs_should_update_inplace(struct inode *inode, struct f2fs_io_info *fio) +{ + if (f2fs_is_pinned_file(inode)) + return true; + + /* if this is cold file, we should overwrite to avoid fragmentation */ + if (file_is_cold(inode)) + return true; + + return check_inplace_update_policy(inode, fio); +} + +bool f2fs_should_update_outplace(struct inode *inode, struct f2fs_io_info *fio) +{ + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + + if (test_opt(sbi, LFS)) + return true; + if (S_ISDIR(inode->i_mode)) + return true; + if (f2fs_is_atomic_file(inode)) + return true; + if (fio) { + if (is_cold_data(fio->page)) + return true; + if (IS_ATOMIC_WRITTEN_PAGE(fio->page)) + return true; + } + return false; +} + +static inline bool need_inplace_update(struct f2fs_io_info *fio) +{ + struct inode *inode = fio->page->mapping->host; + + if (f2fs_should_update_outplace(inode, fio)) + return false; + + return f2fs_should_update_inplace(inode, fio); +} + +int f2fs_do_write_data_page(struct f2fs_io_info *fio) { struct page *page = fio->page; struct inode *inode = page->mapping->host; struct dnode_of_data dn; + struct extent_info ei = {0,0,0}; + struct node_info ni; + bool ipu_force = false; int err = 0; set_new_dnode(&dn, inode, NULL, NULL, 0); - err = get_dnode_of_data(&dn, page->index, LOOKUP_NODE); + if (need_inplace_update(fio) && + f2fs_lookup_extent_cache(inode, page->index, &ei)) { + fio->old_blkaddr = ei.blk + page->index - ei.fofs; + + if (!f2fs_is_valid_blkaddr(fio->sbi, fio->old_blkaddr, + DATA_GENERIC)) + return -EFAULT; + + ipu_force = true; + fio->need_lock = LOCK_DONE; + goto got_it; + } + + /* Deadlock due to between page->lock and f2fs_lock_op */ + if (fio->need_lock == LOCK_REQ && !f2fs_trylock_op(fio->sbi)) + return -EAGAIN; + + err = f2fs_get_dnode_of_data(&dn, page->index, LOOKUP_NODE); if (err) - return err; + goto out; fio->old_blkaddr = dn.data_blkaddr; @@ -1186,57 +1764,72 @@ int do_write_data_page(struct f2fs_io_info *fio) ClearPageUptodate(page); goto out_writepage; } - - if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode)) { - gfp_t gfp_flags = GFP_NOFS; - - /* wait for GCed encrypted page writeback */ - f2fs_wait_on_encrypted_page_writeback(F2FS_I_SB(inode), - fio->old_blkaddr); -retry_encrypt: - fio->encrypted_page = fscrypt_encrypt_page(inode, fio->page, - gfp_flags); - if (IS_ERR(fio->encrypted_page)) { - err = PTR_ERR(fio->encrypted_page); - if (err == -ENOMEM) { - /* flush pending ios and wait for a while */ - f2fs_flush_merged_bios(F2FS_I_SB(inode)); - congestion_wait(BLK_RW_ASYNC, HZ/50); - gfp_flags |= __GFP_NOFAIL; - err = 0; - goto retry_encrypt; - } - goto out_writepage; - } +got_it: + if (__is_valid_data_blkaddr(fio->old_blkaddr) && + !f2fs_is_valid_blkaddr(fio->sbi, fio->old_blkaddr, + DATA_GENERIC)) { + err = -EFAULT; + goto out_writepage; } - - set_page_writeback(page); - /* * If current allocation needs SSR, * it had better in-place writes for updated data. */ - if (unlikely(fio->old_blkaddr != NEW_ADDR && - !is_cold_data(page) && - !IS_ATOMIC_WRITTEN_PAGE(page) && - need_inplace_update(inode))) { - rewrite_data_page(fio); + if (ipu_force || (is_valid_data_blkaddr(fio->sbi, fio->old_blkaddr) && + need_inplace_update(fio))) { + err = encrypt_one_page(fio); + if (err) + goto out_writepage; + + set_page_writeback(page); + ClearPageError(page); + f2fs_put_dnode(&dn); + if (fio->need_lock == LOCK_REQ) + f2fs_unlock_op(fio->sbi); + err = f2fs_inplace_write_data(fio); + trace_f2fs_do_write_data_page(fio->page, IPU); set_inode_flag(inode, FI_UPDATE_WRITE); - trace_f2fs_do_write_data_page(page, IPU); - } else { - write_data_page(&dn, fio); - trace_f2fs_do_write_data_page(page, OPU); - set_inode_flag(inode, FI_APPEND_WRITE); - if (page->index == 0) - set_inode_flag(inode, FI_FIRST_BLOCK_WRITTEN); + return err; } + + if (fio->need_lock == LOCK_RETRY) { + if (!f2fs_trylock_op(fio->sbi)) { + err = -EAGAIN; + goto out_writepage; + } + fio->need_lock = LOCK_REQ; + } + + err = f2fs_get_node_info(fio->sbi, dn.nid, &ni); + if (err) + goto out_writepage; + + fio->version = ni.version; + + err = encrypt_one_page(fio); + if (err) + goto out_writepage; + + set_page_writeback(page); + ClearPageError(page); + + /* LFS mode write path */ + f2fs_outplace_write_data(&dn, fio); + trace_f2fs_do_write_data_page(page, OPU); + set_inode_flag(inode, FI_APPEND_WRITE); + if (page->index == 0) + set_inode_flag(inode, FI_FIRST_BLOCK_WRITTEN); out_writepage: f2fs_put_dnode(&dn); +out: + if (fio->need_lock == LOCK_REQ) + f2fs_unlock_op(fio->sbi); return err; } -static int f2fs_write_data_page(struct page *page, - struct writeback_control *wbc) +static int __write_data_page(struct page *page, bool *submitted, + struct writeback_control *wbc, + enum iostat_type io_type) { struct inode *inode = page->mapping->host; struct f2fs_sb_info *sbi = F2FS_I_SB(inode); @@ -1249,15 +1842,36 @@ static int f2fs_write_data_page(struct page *page, int err = 0; struct f2fs_io_info fio = { .sbi = sbi, + .ino = inode->i_ino, .type = DATA, .op = REQ_OP_WRITE, - .op_flags = (wbc->sync_mode == WB_SYNC_ALL) ? WRITE_SYNC : 0, + .op_flags = wbc_to_write_flags(wbc), + .old_blkaddr = NULL_ADDR, .page = page, .encrypted_page = NULL, + .submitted = false, + .need_lock = LOCK_RETRY, + .io_type = io_type, + .io_wbc = wbc, }; trace_f2fs_writepage(page, DATA); + /* we should bypass data pages to proceed the kworkder jobs */ + if (unlikely(f2fs_cp_error(sbi))) { + mapping_set_error(page->mapping, -EIO); + /* + * don't drop any dirty dentry pages for keeping lastest + * directory structure. + */ + if (S_ISDIR(inode->i_mode)) + goto redirty_out; + goto out; + } + + if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING))) + goto redirty_out; + if (page->index < end_index) goto write; @@ -1271,25 +1885,18 @@ static int f2fs_write_data_page(struct page *page, zero_user_segment(page, offset, PAGE_SIZE); write: - if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING))) - goto redirty_out; if (f2fs_is_drop_cache(inode)) goto out; /* we should not write 0'th page having journal header */ if (f2fs_is_volatile_file(inode) && (!page->index || (!wbc->for_reclaim && - available_free_memory(sbi, BASE_CHECK)))) + f2fs_available_free_memory(sbi, BASE_CHECK)))) goto redirty_out; - /* we should bypass data pages to proceed the kworkder jobs */ - if (unlikely(f2fs_cp_error(sbi))) { - mapping_set_error(page->mapping, -EIO); - goto out; - } - /* Dentry blocks are controlled by checkpoint */ if (S_ISDIR(inode->i_mode)) { - err = do_write_data_page(&fio); + fio.need_lock = LOCK_DONE; + err = f2fs_do_write_data_page(&fio); goto done; } @@ -1297,68 +1904,114 @@ static int f2fs_write_data_page(struct page *page, need_balance_fs = true; else if (has_not_enough_free_secs(sbi, 0, 0)) goto redirty_out; + else + set_inode_flag(inode, FI_HOT_DATA); err = -EAGAIN; - f2fs_lock_op(sbi); - if (f2fs_has_inline_data(inode)) + if (f2fs_has_inline_data(inode)) { err = f2fs_write_inline_data(inode, page); - if (err == -EAGAIN) - err = do_write_data_page(&fio); - if (F2FS_I(inode)->last_disk_size < psize) - F2FS_I(inode)->last_disk_size = psize; - f2fs_unlock_op(sbi); + if (!err) + goto out; + } + + if (err == -EAGAIN) { + err = f2fs_do_write_data_page(&fio); + if (err == -EAGAIN) { + fio.need_lock = LOCK_REQ; + err = f2fs_do_write_data_page(&fio); + } + } + + if (err) { + file_set_keep_isize(inode); + } else { + down_write(&F2FS_I(inode)->i_sem); + if (F2FS_I(inode)->last_disk_size < psize) + F2FS_I(inode)->last_disk_size = psize; + up_write(&F2FS_I(inode)->i_sem); + } + done: if (err && err != -ENOENT) goto redirty_out; - clear_cold_data(page); out: inode_dec_dirty_pages(inode); if (err) ClearPageUptodate(page); if (wbc->for_reclaim) { - f2fs_submit_merged_bio_cond(sbi, NULL, page, 0, DATA, WRITE); - remove_dirty_inode(inode); + f2fs_submit_merged_write_cond(sbi, inode, 0, page->index, DATA); + clear_inode_flag(inode, FI_HOT_DATA); + f2fs_remove_dirty_inode(inode); + submitted = NULL; } unlock_page(page); - f2fs_balance_fs(sbi, need_balance_fs); + if (!S_ISDIR(inode->i_mode)) + f2fs_balance_fs(sbi, need_balance_fs); - if (unlikely(f2fs_cp_error(sbi))) - f2fs_submit_merged_bio(sbi, DATA, WRITE); + if (unlikely(f2fs_cp_error(sbi))) { + f2fs_submit_merged_write(sbi, DATA); + submitted = NULL; + } + + if (submitted) + *submitted = fio.submitted; return 0; redirty_out: redirty_page_for_writepage(wbc, page); + /* + * pageout() in MM traslates EAGAIN, so calls handle_write_error() + * -> mapping_set_error() -> set_bit(AS_EIO, ...). + * file_write_and_wait_range() will see EIO error, which is critical + * to return value of fsync() followed by atomic_write failure to user. + */ + if (!err || wbc->for_reclaim) + return AOP_WRITEPAGE_ACTIVATE; unlock_page(page); return err; } +static int f2fs_write_data_page(struct page *page, + struct writeback_control *wbc) +{ + return __write_data_page(page, NULL, wbc, FS_DATA_IO); +} + /* * This function was copied from write_cche_pages from mm/page-writeback.c. * The major change is making write step of cold data page separately from * warm/hot data page. */ static int f2fs_write_cache_pages(struct address_space *mapping, - struct writeback_control *wbc) + struct writeback_control *wbc, + enum iostat_type io_type) { int ret = 0; int done = 0; struct pagevec pvec; + struct f2fs_sb_info *sbi = F2FS_M_SB(mapping); int nr_pages; pgoff_t uninitialized_var(writeback_index); pgoff_t index; pgoff_t end; /* Inclusive */ pgoff_t done_index; + pgoff_t last_idx = ULONG_MAX; int cycled; int range_whole = 0; int tag; - int nwritten = 0; pagevec_init(&pvec, 0); + if (get_dirty_pages(mapping->host) <= + SM_I(F2FS_M_SB(mapping))->min_hot_blocks) + set_inode_flag(mapping->host, FI_HOT_DATA); + else + clear_inode_flag(mapping->host, FI_HOT_DATA); + if (wbc->range_cyclic) { writeback_index = mapping->writeback_index; /* prev offset */ index = writeback_index; @@ -1385,21 +2038,24 @@ static int f2fs_write_cache_pages(struct address_space *mapping, while (!done && (index <= end)) { int i; - nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag, - min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1); + nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, + tag); if (nr_pages == 0) break; for (i = 0; i < nr_pages; i++) { struct page *page = pvec.pages[i]; + bool submitted = false; - if (page->index > end) { + /* give a priority to WB_SYNC threads */ + if (atomic_read(&sbi->wb_sync_req[DATA]) && + wbc->sync_mode == WB_SYNC_NONE) { done = 1; break; } done_index = page->index; - +retry_write: lock_page(page); if (unlikely(page->mapping != mapping)) { @@ -1425,17 +2081,35 @@ static int f2fs_write_cache_pages(struct address_space *mapping, if (!clear_page_dirty_for_io(page)) goto continue_unlock; - ret = mapping->a_ops->writepage(page, wbc); + ret = __write_data_page(page, &submitted, wbc, io_type); if (unlikely(ret)) { + /* + * keep nr_to_write, since vfs uses this to + * get # of written pages. + */ + if (ret == AOP_WRITEPAGE_ACTIVATE) { + unlock_page(page); + ret = 0; + continue; + } else if (ret == -EAGAIN) { + ret = 0; + if (wbc->sync_mode == WB_SYNC_ALL) { + cond_resched(); + congestion_wait(BLK_RW_ASYNC, + HZ/50); + goto retry_write; + } + continue; + } done_index = page->index + 1; done = 1; break; - } else { - nwritten++; + } else if (submitted) { + last_idx = page->index; } if (--wbc->nr_to_write <= 0 && - wbc->sync_mode == WB_SYNC_NONE) { + wbc->sync_mode == WB_SYNC_NONE) { done = 1; break; } @@ -1453,20 +2127,34 @@ static int f2fs_write_cache_pages(struct address_space *mapping, if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0)) mapping->writeback_index = done_index; - if (nwritten) - f2fs_submit_merged_bio_cond(F2FS_M_SB(mapping), mapping->host, - NULL, 0, DATA, WRITE); + if (last_idx != ULONG_MAX) + f2fs_submit_merged_write_cond(F2FS_M_SB(mapping), mapping->host, + 0, last_idx, DATA); return ret; } -static int f2fs_write_data_pages(struct address_space *mapping, - struct writeback_control *wbc) +static inline bool __should_serialize_io(struct inode *inode, + struct writeback_control *wbc) +{ + if (!S_ISREG(inode->i_mode)) + return false; + if (wbc->sync_mode != WB_SYNC_ALL) + return true; + if (get_dirty_pages(inode) >= SM_I(F2FS_I_SB(inode))->min_seq_blocks) + return true; + return false; +} + +static int __f2fs_write_data_pages(struct address_space *mapping, + struct writeback_control *wbc, + enum iostat_type io_type) { struct inode *inode = mapping->host; struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct blk_plug plug; int ret; + bool locked = false; /* deal with chardevs and other special file */ if (!mapping->a_ops->writepage) @@ -1476,30 +2164,47 @@ static int f2fs_write_data_pages(struct address_space *mapping, if (!get_dirty_pages(inode) && wbc->sync_mode == WB_SYNC_NONE) return 0; + /* during POR, we don't need to trigger writepage at all. */ + if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING))) + goto skip_write; + if (S_ISDIR(inode->i_mode) && wbc->sync_mode == WB_SYNC_NONE && get_dirty_pages(inode) < nr_pages_to_skip(sbi, DATA) && - available_free_memory(sbi, DIRTY_DENTS)) + f2fs_available_free_memory(sbi, DIRTY_DENTS)) goto skip_write; /* skip writing during file defragment */ if (is_inode_flag_set(inode, FI_DO_DEFRAG)) goto skip_write; - /* during POR, we don't need to trigger writepage at all. */ - if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING))) - goto skip_write; - trace_f2fs_writepages(mapping->host, wbc, DATA); + /* to avoid spliting IOs due to mixed WB_SYNC_ALL and WB_SYNC_NONE */ + if (wbc->sync_mode == WB_SYNC_ALL) + atomic_inc(&sbi->wb_sync_req[DATA]); + else if (atomic_read(&sbi->wb_sync_req[DATA])) + goto skip_write; + + if (__should_serialize_io(inode, wbc)) { + mutex_lock(&sbi->writepages); + locked = true; + } + blk_start_plug(&plug); - ret = f2fs_write_cache_pages(mapping, wbc); + ret = f2fs_write_cache_pages(mapping, wbc, io_type); blk_finish_plug(&plug); + + if (locked) + mutex_unlock(&sbi->writepages); + + if (wbc->sync_mode == WB_SYNC_ALL) + atomic_dec(&sbi->wb_sync_req[DATA]); /* * if some pages were truncated, we cannot guarantee its mapping->host * to detect pending bios. */ - remove_dirty_inode(inode); + f2fs_remove_dirty_inode(inode); return ret; skip_write: @@ -1508,14 +2213,30 @@ static int f2fs_write_data_pages(struct address_space *mapping, return 0; } +static int f2fs_write_data_pages(struct address_space *mapping, + struct writeback_control *wbc) +{ + struct inode *inode = mapping->host; + + return __f2fs_write_data_pages(mapping, wbc, + F2FS_I(inode)->cp_task == current ? + FS_CP_DATA_IO : FS_DATA_IO); +} + static void f2fs_write_failed(struct address_space *mapping, loff_t to) { struct inode *inode = mapping->host; loff_t i_size = i_size_read(inode); if (to > i_size) { + down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + down_write(&F2FS_I(inode)->i_mmap_sem); + truncate_pagecache(inode, i_size); - truncate_blocks(inode, i_size, true); + f2fs_truncate_blocks(inode, i_size, true); + + up_write(&F2FS_I(inode)->i_mmap_sem); + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); } } @@ -1528,24 +2249,25 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi, struct dnode_of_data dn; struct page *ipage; bool locked = false; - struct extent_info ei; + struct extent_info ei = {0,0,0}; int err = 0; /* * we already allocated all the blocks, so we don't need to get * the block addresses when there is no need to fill the page. */ - if (!f2fs_has_inline_data(inode) && len == PAGE_SIZE) + if (!f2fs_has_inline_data(inode) && len == PAGE_SIZE && + !is_inode_flag_set(inode, FI_NO_PREALLOC)) return 0; if (f2fs_has_inline_data(inode) || (pos & PAGE_MASK) >= i_size_read(inode)) { - f2fs_lock_op(sbi); + __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, true); locked = true; } restart: /* check inline_data */ - ipage = get_node_page(sbi, inode->i_ino); + ipage = f2fs_get_node_page(sbi, inode->i_ino); if (IS_ERR(ipage)) { err = PTR_ERR(ipage); goto unlock_out; @@ -1554,8 +2276,8 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi, set_new_dnode(&dn, inode, ipage, ipage, 0); if (f2fs_has_inline_data(inode)) { - if (pos + len <= MAX_INLINE_DATA) { - read_inline_data(page, ipage); + if (pos + len <= MAX_INLINE_DATA(inode)) { + f2fs_do_read_inline_data(page, ipage); set_inode_flag(inode, FI_DATA_EXIST); if (inode->i_nlink) set_inline_node(ipage); @@ -1573,10 +2295,11 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi, dn.data_blkaddr = ei.blk + index - ei.fofs; } else { /* hole case */ - err = get_dnode_of_data(&dn, index, LOOKUP_NODE); + err = f2fs_get_dnode_of_data(&dn, index, LOOKUP_NODE); if (err || dn.data_blkaddr == NULL_ADDR) { f2fs_put_dnode(&dn); - f2fs_lock_op(sbi); + __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, + true); locked = true; goto restart; } @@ -1590,7 +2313,7 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi, f2fs_put_dnode(&dn); unlock_out: if (locked) - f2fs_unlock_op(sbi); + __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, false); return err; } @@ -1602,12 +2325,30 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping, struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct page *page = NULL; pgoff_t index = ((unsigned long long) pos) >> PAGE_SHIFT; - bool need_balance = false; + bool need_balance = false, drop_atomic = false; block_t blkaddr = NULL_ADDR; int err = 0; + if (trace_android_fs_datawrite_start_enabled()) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_datawrite_start(inode, pos, len, + current->pid, path, + current->comm); + } trace_f2fs_write_begin(inode, pos, len, flags); + if ((f2fs_is_atomic_file(inode) && + !f2fs_available_free_memory(sbi, INMEM_PAGES)) || + is_inode_flag_set(inode, FI_ATOMIC_REVOKE_REQUEST)) { + err = -ENOMEM; + drop_atomic = true; + goto fail; + } + /* * We should check this at this moment to avoid deadlock on inode page * and #0 page. The locking rule for inline_data conversion should be: @@ -1623,7 +2364,7 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping, * Do not use grab_cache_page_write_begin() to avoid deadlock due to * wait_for_stable_page. Will wait that below with our IO control. */ - page = pagecache_get_page(mapping, index, + page = f2fs_pagecache_get_page(mapping, index, FGP_LOCK | FGP_WRITE | FGP_CREAT, GFP_NOFS); if (!page) { err = -ENOMEM; @@ -1650,32 +2391,25 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping, f2fs_wait_on_page_writeback(page, DATA, false); - /* wait for GCed encrypted page writeback */ - if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode)) - f2fs_wait_on_encrypted_page_writeback(sbi, blkaddr); + /* wait for GCed page writeback via META_MAPPING */ + if (f2fs_post_read_required(inode)) + f2fs_wait_on_block_writeback(sbi, blkaddr); if (len == PAGE_SIZE || PageUptodate(page)) return 0; + if (!(pos & (PAGE_SIZE - 1)) && (pos + len) >= i_size_read(inode)) { + zero_user_segment(page, len, PAGE_SIZE); + return 0; + } + if (blkaddr == NEW_ADDR) { zero_user_segment(page, 0, PAGE_SIZE); SetPageUptodate(page); } else { - struct bio *bio; - - bio = f2fs_grab_bio(inode, blkaddr, 1); - if (IS_ERR(bio)) { - err = PTR_ERR(bio); + err = f2fs_submit_page_read(inode, page, blkaddr); + if (err) goto fail; - } - bio_set_op_attrs(bio, REQ_OP_READ, READ_SYNC); - if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) { - bio_put(bio); - err = -EFAULT; - goto fail; - } - - __submit_bio(sbi, bio, DATA); lock_page(page); if (unlikely(page->mapping != mapping)) { @@ -1692,6 +2426,8 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping, fail: f2fs_put_page(page, 1); f2fs_write_failed(mapping, pos + len); + if (drop_atomic) + f2fs_drop_inmem_pages_all(sbi, false); return err; } @@ -1702,6 +2438,7 @@ static int f2fs_write_end(struct file *file, { struct inode *inode = page->mapping->host; + trace_android_fs_datawrite_end(inode, pos, len); trace_f2fs_write_end(inode, pos, len, copied); /* @@ -1710,7 +2447,7 @@ static int f2fs_write_end(struct file *file, * let generic_perform_write() try to copy data again through copied=0. */ if (!PageUptodate(page)) { - if (unlikely(copied != PAGE_SIZE)) + if (unlikely(copied != len)) copied = 0; else SetPageUptodate(page); @@ -1719,7 +2456,6 @@ static int f2fs_write_end(struct file *file, goto unlock_out; set_page_dirty(page); - clear_cold_data(page); if (pos + copied > i_size_read(inode)) f2fs_i_size_write(inode, pos + copied); @@ -1732,14 +2468,20 @@ static int f2fs_write_end(struct file *file, static int check_direct_IO(struct inode *inode, struct iov_iter *iter, loff_t offset) { - unsigned blocksize_mask = inode->i_sb->s_blocksize - 1; + unsigned i_blkbits = READ_ONCE(inode->i_blkbits); + unsigned blkbits = i_blkbits; + unsigned blocksize_mask = (1 << blkbits) - 1; + unsigned long align = offset | iov_iter_alignment(iter); + struct block_device *bdev = inode->i_sb->s_bdev; - if (offset & blocksize_mask) - return -EINVAL; - - if (iov_iter_alignment(iter) & blocksize_mask) - return -EINVAL; - + if (align & blocksize_mask) { + if (bdev) + blkbits = blksize_bits(bdev_logical_block_size(bdev)); + blocksize_mask = (1 << blkbits) - 1; + if (align & blocksize_mask) + return -EINVAL; + return 1; + } return 0; } @@ -1747,32 +2489,78 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) { struct address_space *mapping = iocb->ki_filp->f_mapping; struct inode *inode = mapping->host; + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); size_t count = iov_iter_count(iter); loff_t offset = iocb->ki_pos; int rw = iov_iter_rw(iter); int err; + enum rw_hint hint = iocb->ki_hint; + int whint_mode = F2FS_OPTION(sbi).whint_mode; err = check_direct_IO(inode, iter, offset); if (err) - return err; + return err < 0 ? err : 0; - if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode)) - return 0; - if (test_opt(F2FS_I_SB(inode), LFS)) + if (f2fs_force_buffered_io(inode, rw)) return 0; trace_f2fs_direct_IO_enter(inode, offset, count, rw); - down_read(&F2FS_I(inode)->dio_rwsem[rw]); + if (trace_android_fs_dataread_start_enabled() && + (rw == READ)) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_dataread_start(inode, offset, + count, current->pid, path, + current->comm); + } + if (trace_android_fs_datawrite_start_enabled() && + (rw == WRITE)) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_datawrite_start(inode, offset, count, + current->pid, path, + current->comm); + } + if (rw == WRITE && whint_mode == WHINT_MODE_OFF) + iocb->ki_hint = WRITE_LIFE_NOT_SET; + + if (!down_read_trylock(&F2FS_I(inode)->i_gc_rwsem[rw])) { + if (iocb->ki_flags & IOCB_NOWAIT) { + iocb->ki_hint = hint; + err = -EAGAIN; + goto out; + } + down_read(&F2FS_I(inode)->i_gc_rwsem[rw]); + } + err = blockdev_direct_IO(iocb, inode, iter, get_data_block_dio); - up_read(&F2FS_I(inode)->dio_rwsem[rw]); + up_read(&F2FS_I(inode)->i_gc_rwsem[rw]); if (rw == WRITE) { - if (err > 0) + if (whint_mode == WHINT_MODE_OFF) + iocb->ki_hint = hint; + if (err > 0) { + f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_IO, + err); set_inode_flag(inode, FI_UPDATE_WRITE); - else if (err < 0) + } else if (err < 0) { f2fs_write_failed(mapping, offset + count); + } } +out: + if (trace_android_fs_dataread_start_enabled() && + (rw == READ)) + trace_android_fs_dataread_end(inode, offset, count); + if (trace_android_fs_datawrite_start_enabled() && + (rw == WRITE)) + trace_android_fs_datawrite_end(inode, offset, count); trace_f2fs_direct_IO_exit(inode, offset, count, rw, err); @@ -1790,17 +2578,19 @@ void f2fs_invalidate_page(struct page *page, unsigned int offset, return; if (PageDirty(page)) { - if (inode->i_ino == F2FS_META_INO(sbi)) + if (inode->i_ino == F2FS_META_INO(sbi)) { dec_page_count(sbi, F2FS_DIRTY_META); - else if (inode->i_ino == F2FS_NODE_INO(sbi)) + } else if (inode->i_ino == F2FS_NODE_INO(sbi)) { dec_page_count(sbi, F2FS_DIRTY_NODES); - else + } else { inode_dec_dirty_pages(inode); + f2fs_remove_dirty_inode(inode); + } } /* This is atomic written page, keep Private */ if (IS_ATOMIC_WRITTEN_PAGE(page)) - return; + return f2fs_drop_inmem_page(inode, page); set_page_private(page, 0); ClearPagePrivate(page); @@ -1821,35 +2611,6 @@ int f2fs_release_page(struct page *page, gfp_t wait) return 1; } -/* - * This was copied from __set_page_dirty_buffers which gives higher performance - * in very high speed storages. (e.g., pmem) - */ -void f2fs_set_page_dirty_nobuffers(struct page *page) -{ - struct address_space *mapping = page->mapping; - unsigned long flags; - - if (unlikely(!mapping)) - return; - - spin_lock(&mapping->private_lock); - lock_page_memcg(page); - SetPageDirty(page); - spin_unlock(&mapping->private_lock); - - spin_lock_irqsave(&mapping->tree_lock, flags); - WARN_ON_ONCE(!PageUptodate(page)); - account_page_dirtied(page, mapping); - radix_tree_tag_set(&mapping->page_tree, - page_index(page), PAGECACHE_TAG_DIRTY); - spin_unlock_irqrestore(&mapping->tree_lock, flags); - unlock_page_memcg(page); - - __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); - return; -} - static int f2fs_set_data_page_dirty(struct page *page) { struct address_space *mapping = page->mapping; @@ -1860,9 +2621,13 @@ static int f2fs_set_data_page_dirty(struct page *page) if (!PageUptodate(page)) SetPageUptodate(page); - if (f2fs_is_atomic_file(inode)) { + /* don't remain PG_checked flag which was set during GC */ + if (is_cold_data(page)) + clear_cold_data(page); + + if (f2fs_is_atomic_file(inode) && !f2fs_is_commit_atomic_write(inode)) { if (!IS_ATOMIC_WRITTEN_PAGE(page)) { - register_inmem_page(inode, page); + f2fs_register_inmem_page(inode, page); return 1; } /* @@ -1873,8 +2638,8 @@ static int f2fs_set_data_page_dirty(struct page *page) } if (!PageDirty(page)) { - f2fs_set_page_dirty_nobuffers(page); - update_dirty_page(inode, page); + __set_page_dirty_nobuffers(page); + f2fs_update_dirty_page(inode, page); return 1; } return 0; @@ -1907,8 +2672,12 @@ int f2fs_migrate_page(struct address_space *mapping, BUG_ON(PageWriteback(page)); /* migrating an atomic written page is safe with the inmem_lock hold */ - if (atomic_written && !mutex_trylock(&fi->inmem_lock)) - return -EAGAIN; + if (atomic_written) { + if (mode != MIGRATE_SYNC) + return -EBUSY; + if (!mutex_trylock(&fi->inmem_lock)) + return -EAGAIN; + } /* * A reference is expected if PagePrivate set when move mapping, @@ -1962,3 +2731,38 @@ const struct address_space_operations f2fs_dblock_aops = { .migratepage = f2fs_migrate_page, #endif }; + +void f2fs_clear_radix_tree_dirty_tag(struct page *page) +{ + struct address_space *mapping = page_mapping(page); + unsigned long flags; + + spin_lock_irqsave(&mapping->tree_lock, flags); + radix_tree_tag_clear(&mapping->page_tree, page_index(page), + PAGECACHE_TAG_DIRTY); + spin_unlock_irqrestore(&mapping->tree_lock, flags); +} + +int __init f2fs_init_post_read_processing(void) +{ + bio_post_read_ctx_cache = KMEM_CACHE(bio_post_read_ctx, 0); + if (!bio_post_read_ctx_cache) + goto fail; + bio_post_read_ctx_pool = + mempool_create_slab_pool(NUM_PREALLOC_POST_READ_CTXS, + bio_post_read_ctx_cache); + if (!bio_post_read_ctx_pool) + goto fail_free_cache; + return 0; + +fail_free_cache: + kmem_cache_destroy(bio_post_read_ctx_cache); +fail: + return -ENOMEM; +} + +void __exit f2fs_destroy_post_read_processing(void) +{ + mempool_destroy(bio_post_read_ctx_pool); + kmem_cache_destroy(bio_post_read_ctx_cache); +}
diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c index 687998e9..214a968 100644 --- a/fs/f2fs/debug.c +++ b/fs/f2fs/debug.c
@@ -45,12 +45,36 @@ static void update_general_status(struct f2fs_sb_info *sbi) si->ndirty_dent = get_pages(sbi, F2FS_DIRTY_DENTS); si->ndirty_meta = get_pages(sbi, F2FS_DIRTY_META); si->ndirty_data = get_pages(sbi, F2FS_DIRTY_DATA); + si->ndirty_qdata = get_pages(sbi, F2FS_DIRTY_QDATA); si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA); si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE]; si->ndirty_files = sbi->ndirty_inode[FILE_INODE]; + si->nquota_files = sbi->nquota_files; si->ndirty_all = sbi->ndirty_inode[DIRTY_META]; si->inmem_pages = get_pages(sbi, F2FS_INMEM_PAGES); - si->wb_bios = atomic_read(&sbi->nr_wb_bios); + si->aw_cnt = atomic_read(&sbi->aw_cnt); + si->vw_cnt = atomic_read(&sbi->vw_cnt); + si->max_aw_cnt = atomic_read(&sbi->max_aw_cnt); + si->max_vw_cnt = atomic_read(&sbi->max_vw_cnt); + si->nr_wb_cp_data = get_pages(sbi, F2FS_WB_CP_DATA); + si->nr_wb_data = get_pages(sbi, F2FS_WB_DATA); + if (SM_I(sbi) && SM_I(sbi)->fcc_info) { + si->nr_flushed = + atomic_read(&SM_I(sbi)->fcc_info->issued_flush); + si->nr_flushing = + atomic_read(&SM_I(sbi)->fcc_info->issing_flush); + si->flush_list_empty = + llist_empty(&SM_I(sbi)->fcc_info->issue_list); + } + if (SM_I(sbi) && SM_I(sbi)->dcc_info) { + si->nr_discarded = + atomic_read(&SM_I(sbi)->dcc_info->issued_discard); + si->nr_discarding = + atomic_read(&SM_I(sbi)->dcc_info->issing_discard); + si->nr_discard_cmd = + atomic_read(&SM_I(sbi)->dcc_info->discard_cmd_cnt); + si->undiscard_blks = SM_I(sbi)->dcc_info->undiscard_blks; + } si->total_count = (int)sbi->user_block_count / sbi->blocks_per_seg; si->rsvd_segs = reserved_segments(sbi); si->overp_segs = overprovision_segments(sbi); @@ -61,6 +85,8 @@ static void update_general_status(struct f2fs_sb_info *sbi) si->inline_xattr = atomic_read(&sbi->inline_xattr); si->inline_inode = atomic_read(&sbi->inline_inode); si->inline_dir = atomic_read(&sbi->inline_dir); + si->append = sbi->im[APPEND_INO].ino_num; + si->update = sbi->im[UPDATE_INO].ino_num; si->orphans = sbi->im[ORPHAN_INO].ino_num; si->utilization = utilization(sbi); @@ -74,8 +100,12 @@ static void update_general_status(struct f2fs_sb_info *sbi) si->dirty_nats = NM_I(sbi)->dirty_nat_cnt; si->sits = MAIN_SEGS(sbi); si->dirty_sits = SIT_I(sbi)->dirty_sentries; - si->fnids = NM_I(sbi)->fcnt; + si->free_nids = NM_I(sbi)->nid_cnt[FREE_NID]; + si->avail_nids = NM_I(sbi)->available_nids; + si->alloc_nids = NM_I(sbi)->nid_cnt[PREALLOC_NID]; si->bg_gc = sbi->bg_gc; + si->skipped_atomic_files[BG_GC] = sbi->skipped_atomic_files[BG_GC]; + si->skipped_atomic_files[FG_GC] = sbi->skipped_atomic_files[FG_GC]; si->util_free = (int)(free_user_blocks(sbi) >> sbi->log_blocks_per_seg) * 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg) / 2; @@ -87,8 +117,8 @@ static void update_general_status(struct f2fs_sb_info *sbi) for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_NODE; i++) { struct curseg_info *curseg = CURSEG_I(sbi, i); si->curseg[i] = curseg->segno; - si->cursec[i] = curseg->segno / sbi->segs_per_sec; - si->curzone[i] = si->cursec[i] / sbi->secs_per_zone; + si->cursec[i] = GET_SEC_FROM_SEG(sbi, curseg->segno); + si->curzone[i] = GET_ZONE_FROM_SEC(sbi, si->cursec[i]); } for (i = 0; i < 2; i++) { @@ -112,10 +142,10 @@ static void update_sit_info(struct f2fs_sb_info *sbi) bimodal = 0; total_vblocks = 0; - blks_per_sec = sbi->segs_per_sec * sbi->blocks_per_seg; + blks_per_sec = BLKS_PER_SEC(sbi); hblks_per_sec = blks_per_sec / 2; for (segno = 0; segno < MAIN_SEGS(sbi); segno += sbi->segs_per_sec) { - vblocks = get_valid_blocks(sbi, segno, sbi->segs_per_sec); + vblocks = get_valid_blocks(sbi, segno, true); dist = abs(vblocks - hblks_per_sec); bimodal += dist * dist; @@ -144,10 +174,13 @@ static void update_mem_info(struct f2fs_sb_info *sbi) if (si->base_mem) goto get_cache; - si->base_mem = sizeof(struct f2fs_sb_info) + sbi->sb->s_blocksize; + /* build stat */ + si->base_mem = sizeof(struct f2fs_stat_info); + + /* build superblock */ + si->base_mem += sizeof(struct f2fs_sb_info) + sbi->sb->s_blocksize; si->base_mem += 2 * sizeof(struct f2fs_inode_info); si->base_mem += sizeof(*sbi->ckpt); - si->base_mem += sizeof(struct percpu_counter) * NR_COUNT_TYPE; /* build sm */ si->base_mem += sizeof(struct f2fs_sm_info); @@ -181,6 +214,11 @@ static void update_mem_info(struct f2fs_sb_info *sbi) /* build nm */ si->base_mem += sizeof(struct f2fs_nm_info); si->base_mem += __bitmap_size(sbi, NAT_BITMAP); + si->base_mem += (NM_I(sbi)->nat_bits_blocks << F2FS_BLKSIZE_BITS); + si->base_mem += NM_I(sbi)->nat_blocks * + f2fs_bitmap_size(NAT_ENTRY_PER_BLOCK); + si->base_mem += NM_I(sbi)->nat_blocks / 8; + si->base_mem += NM_I(sbi)->nat_blocks * sizeof(unsigned short); get_cache: si->cache_mem = 0; @@ -190,16 +228,23 @@ static void update_mem_info(struct f2fs_sb_info *sbi) si->cache_mem += sizeof(struct f2fs_gc_kthread); /* build merge flush thread */ - if (SM_I(sbi)->cmd_control_info) + if (SM_I(sbi)->fcc_info) si->cache_mem += sizeof(struct flush_cmd_control); + if (SM_I(sbi)->dcc_info) { + si->cache_mem += sizeof(struct discard_cmd_control); + si->cache_mem += sizeof(struct discard_cmd) * + atomic_read(&SM_I(sbi)->dcc_info->discard_cmd_cnt); + } /* free nids */ - si->cache_mem += NM_I(sbi)->fcnt * sizeof(struct free_nid); + si->cache_mem += (NM_I(sbi)->nid_cnt[FREE_NID] + + NM_I(sbi)->nid_cnt[PREALLOC_NID]) * + sizeof(struct free_nid); si->cache_mem += NM_I(sbi)->nat_cnt * sizeof(struct nat_entry); si->cache_mem += NM_I(sbi)->dirty_nat_cnt * sizeof(struct nat_entry_set); si->cache_mem += si->inmem_pages * sizeof(struct inmem_pages); - for (i = 0; i <= ORPHAN_INO; i++) + for (i = 0; i < MAX_INO_ENTRY; i++) si->cache_mem += sbi->im[i].ino_num * sizeof(struct ino_entry); si->cache_mem += atomic_read(&sbi->total_ext_tree) * sizeof(struct extent_tree); @@ -223,9 +268,10 @@ static int stat_show(struct seq_file *s, void *v) list_for_each_entry(si, &f2fs_stat_list, stat_list) { update_general_status(si->sbi); - seq_printf(s, "\n=====[ partition info(%pg). #%d, %s]=====\n", + seq_printf(s, "\n=====[ partition info(%pg). #%d, %s, CP: %s]=====\n", si->sbi->sb->s_bdev, i++, - f2fs_readonly(si->sbi->sb) ? "RO": "RW"); + f2fs_readonly(si->sbi->sb) ? "RO": "RW", + f2fs_cp_error(si->sbi) ? "Error": "Good"); seq_printf(s, "[SB: 1] [CP: 2] [SIT: %d] [NAT: %d] ", si->sit_area_segs, si->nat_area_segs); seq_printf(s, "[SSA: %d] [MAIN: %d", @@ -250,8 +296,8 @@ static int stat_show(struct seq_file *s, void *v) si->inline_inode); seq_printf(s, " - Inline_dentry Inode: %u\n", si->inline_dir); - seq_printf(s, " - Orphan Inode: %u\n", - si->orphans); + seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n", + si->orphans, si->append, si->update); seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n", si->main_area_segs, si->main_area_sections, si->main_area_zones); @@ -299,6 +345,10 @@ static int stat_show(struct seq_file *s, void *v) si->bg_data_blks); seq_printf(s, " - node blocks : %d (%d)\n", si->node_blks, si->bg_node_blks); + seq_printf(s, "Skipped : atomic write %llu (%llu)\n", + si->skipped_atomic_files[BG_GC] + + si->skipped_atomic_files[FG_GC], + si->skipped_atomic_files[BG_GC]); seq_puts(s, "\nExtent Cache:\n"); seq_printf(s, " - Hit Count: L1-1:%llu L1-2:%llu L2:%llu\n", si->hit_largest, si->hit_cached, @@ -310,22 +360,33 @@ static int stat_show(struct seq_file *s, void *v) seq_printf(s, " - Inner Struct Count: tree: %d(%d), node: %d\n", si->ext_tree, si->zombie_tree, si->ext_node); seq_puts(s, "\nBalancing F2FS Async:\n"); - seq_printf(s, " - inmem: %4d, wb_bios: %4d\n", - si->inmem_pages, si->wb_bios); + seq_printf(s, " - IO (CP: %4d, Data: %4d, Flush: (%4d %4d %4d), " + "Discard: (%4d %4d)) cmd: %4d undiscard:%4u\n", + si->nr_wb_cp_data, si->nr_wb_data, + si->nr_flushing, si->nr_flushed, + si->flush_list_empty, + si->nr_discarding, si->nr_discarded, + si->nr_discard_cmd, si->undiscard_blks); + seq_printf(s, " - inmem: %4d, atomic IO: %4d (Max. %4d), " + "volatile IO: %4d (Max. %4d)\n", + si->inmem_pages, si->aw_cnt, si->max_aw_cnt, + si->vw_cnt, si->max_vw_cnt); seq_printf(s, " - nodes: %4d in %4d\n", si->ndirty_node, si->node_pages); seq_printf(s, " - dents: %4d in dirs:%4d (%4d)\n", si->ndirty_dent, si->ndirty_dirs, si->ndirty_all); seq_printf(s, " - datas: %4d in files:%4d\n", si->ndirty_data, si->ndirty_files); + seq_printf(s, " - quota datas: %4d in quota files:%4d\n", + si->ndirty_qdata, si->nquota_files); seq_printf(s, " - meta: %4d in %4d\n", si->ndirty_meta, si->meta_pages); seq_printf(s, " - imeta: %4d\n", si->ndirty_imeta); seq_printf(s, " - NATs: %9d/%9d\n - SITs: %9d/%9d\n", si->dirty_nats, si->nats, si->dirty_sits, si->sits); - seq_printf(s, " - free_nids: %9d\n", - si->fnids); + seq_printf(s, " - free_nids: %9d/%9d\n - alloc_nids: %9d\n", + si->free_nids, si->avail_nids, si->alloc_nids); seq_puts(s, "\nDistribution of User Blocks:"); seq_puts(s, " [ valid | invalid | free ]\n"); seq_puts(s, " ["); @@ -385,7 +446,7 @@ int f2fs_build_stats(struct f2fs_sb_info *sbi) struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi); struct f2fs_stat_info *si; - si = kzalloc(sizeof(struct f2fs_stat_info), GFP_KERNEL); + si = f2fs_kzalloc(sbi, sizeof(struct f2fs_stat_info), GFP_KERNEL); if (!si) return -ENOMEM; @@ -410,6 +471,11 @@ int f2fs_build_stats(struct f2fs_sb_info *sbi) atomic_set(&sbi->inline_dir, 0); atomic_set(&sbi->inplace_count, 0); + atomic_set(&sbi->aw_cnt, 0); + atomic_set(&sbi->vw_cnt, 0); + atomic_set(&sbi->max_aw_cnt, 0); + atomic_set(&sbi->max_vw_cnt, 0); + mutex_lock(&f2fs_stat_mutex); list_add_tail(&si->stat_list, &f2fs_stat_list); mutex_unlock(&f2fs_stat_mutex);
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c index 8add4e8..56cc274 100644 --- a/fs/f2fs/dir.c +++ b/fs/f2fs/dir.c
@@ -10,10 +10,12 @@ */ #include <linux/fs.h> #include <linux/f2fs_fs.h> +#include <linux/sched.h> #include "f2fs.h" #include "node.h" #include "acl.h" #include "xattr.h" +#include <trace/events/f2fs.h> static unsigned long dir_blocks(struct inode *inode) { @@ -58,12 +60,12 @@ static unsigned char f2fs_type_by_mode[S_IFMT >> S_SHIFT] = { [S_IFLNK >> S_SHIFT] = F2FS_FT_SYMLINK, }; -void set_de_type(struct f2fs_dir_entry *de, umode_t mode) +static void set_de_type(struct f2fs_dir_entry *de, umode_t mode) { de->file_type = f2fs_type_by_mode[(mode & S_IFMT) >> S_SHIFT]; } -unsigned char get_de_type(struct f2fs_dir_entry *de) +unsigned char f2fs_get_de_type(struct f2fs_dir_entry *de) { if (de->file_type < F2FS_FT_MAX) return f2fs_filetype_table[de->file_type]; @@ -92,27 +94,23 @@ static struct f2fs_dir_entry *find_in_block(struct page *dentry_page, struct f2fs_dir_entry *de; struct f2fs_dentry_ptr d; - dentry_blk = (struct f2fs_dentry_block *)kmap(dentry_page); + dentry_blk = (struct f2fs_dentry_block *)page_address(dentry_page); - make_dentry_ptr(NULL, &d, (void *)dentry_blk, 1); - de = find_target_dentry(fname, namehash, max_slots, &d); + make_dentry_ptr_block(NULL, &d, dentry_blk); + de = f2fs_find_target_dentry(fname, namehash, max_slots, &d); if (de) *res_page = dentry_page; - else - kunmap(dentry_page); return de; } -struct f2fs_dir_entry *find_target_dentry(struct fscrypt_name *fname, +struct f2fs_dir_entry *f2fs_find_target_dentry(struct fscrypt_name *fname, f2fs_hash_t namehash, int *max_slots, struct f2fs_dentry_ptr *d) { struct f2fs_dir_entry *de; unsigned long bit_pos = 0; int max_len = 0; - struct fscrypt_str de_name = FSTR_INIT(NULL, 0); - struct fscrypt_str *name = &fname->disk_name; if (max_slots) *max_slots = 0; @@ -130,29 +128,11 @@ struct f2fs_dir_entry *find_target_dentry(struct fscrypt_name *fname, continue; } - if (de->hash_code != namehash) - goto not_match; - - de_name.name = d->filename[bit_pos]; - de_name.len = le16_to_cpu(de->name_len); - -#ifdef CONFIG_F2FS_FS_ENCRYPTION - if (unlikely(!name->name)) { - if (fname->usr_fname->name[0] == '_') { - if (de_name.len > 32 && - !memcmp(de_name.name + ((de_name.len - 17) & ~15), - fname->crypto_buf.name + 8, 16)) - goto found; - goto not_match; - } - name->name = fname->crypto_buf.name; - name->len = fname->crypto_buf.len; - } -#endif - if (de_name.len == name->len && - !memcmp(de_name.name, name->name, name->len)) + if (de->hash_code == namehash && + fscrypt_match_name(fname, d->filename[bit_pos], + le16_to_cpu(de->name_len))) goto found; -not_match: + if (max_slots && max_len > *max_slots) *max_slots = max_len; max_len = 0; @@ -191,7 +171,7 @@ static struct f2fs_dir_entry *find_in_level(struct inode *dir, for (; bidx < end_block; bidx++) { /* no need to allocate new dentry pages to all the indices */ - dentry_page = find_data_page(dir, bidx); + dentry_page = f2fs_find_data_page(dir, bidx); if (IS_ERR(dentry_page)) { if (PTR_ERR(dentry_page) == -ENOENT) { room = true; @@ -212,13 +192,9 @@ static struct f2fs_dir_entry *find_in_level(struct inode *dir, f2fs_put_page(dentry_page, 0); } - /* This is to increase the speed of f2fs_create */ - if (!de && room) { - F2FS_I(dir)->task = current; - if (F2FS_I(dir)->chash != namehash) { - F2FS_I(dir)->chash = namehash; - F2FS_I(dir)->clevel = level; - } + if (!de && room && F2FS_I(dir)->chash != namehash) { + F2FS_I(dir)->chash = namehash; + F2FS_I(dir)->clevel = level; } return de; @@ -234,7 +210,7 @@ struct f2fs_dir_entry *__f2fs_find_entry(struct inode *dir, if (f2fs_has_inline_dentry(dir)) { *res_page = NULL; - de = find_in_inline_dir(dir, fname, res_page); + de = f2fs_find_in_inline_dir(dir, fname, res_page); goto out; } @@ -259,6 +235,9 @@ struct f2fs_dir_entry *__f2fs_find_entry(struct inode *dir, break; } out: + /* This is to increase the speed of f2fs_create */ + if (!de) + F2FS_I(dir)->task = current; return de; } @@ -306,7 +285,6 @@ ino_t f2fs_inode_by_name(struct inode *dir, const struct qstr *qstr, de = f2fs_find_entry(dir, qstr, page); if (de) { res = le32_to_cpu(de->ino); - f2fs_dentry_kunmap(dir, *page); f2fs_put_page(*page, 0); } @@ -321,11 +299,10 @@ void f2fs_set_link(struct inode *dir, struct f2fs_dir_entry *de, f2fs_wait_on_page_writeback(page, type, true); de->ino = cpu_to_le32(inode->i_ino); set_de_type(de, inode->i_mode); - f2fs_dentry_kunmap(dir, page); set_page_dirty(page); dir->i_mtime = dir->i_ctime = current_time(dir); - f2fs_mark_inode_dirty_sync(dir); + f2fs_mark_inode_dirty_sync(dir, false); f2fs_put_page(page, 1); } @@ -342,25 +319,7 @@ static void init_dent_inode(const struct qstr *name, struct page *ipage) set_page_dirty(ipage); } -int update_dent_inode(struct inode *inode, struct inode *to, - const struct qstr *name) -{ - struct page *page; - - if (file_enc_name(to)) - return 0; - - page = get_node_page(F2FS_I_SB(inode), inode->i_ino); - if (IS_ERR(page)) - return PTR_ERR(page); - - init_dent_inode(name, page); - f2fs_put_page(page, 1); - - return 0; -} - -void do_make_empty_dir(struct inode *inode, struct inode *parent, +void f2fs_do_make_empty_dir(struct inode *inode, struct inode *parent, struct f2fs_dentry_ptr *d) { struct qstr dot = QSTR_INIT(".", 1); @@ -381,33 +340,32 @@ static int make_empty_dir(struct inode *inode, struct f2fs_dentry_ptr d; if (f2fs_has_inline_dentry(inode)) - return make_empty_inline_dir(inode, parent, page); + return f2fs_make_empty_inline_dir(inode, parent, page); - dentry_page = get_new_data_page(inode, page, 0, true); + dentry_page = f2fs_get_new_data_page(inode, page, 0, true); if (IS_ERR(dentry_page)) return PTR_ERR(dentry_page); - dentry_blk = kmap_atomic(dentry_page); + dentry_blk = page_address(dentry_page); - make_dentry_ptr(NULL, &d, (void *)dentry_blk, 1); - do_make_empty_dir(inode, parent, &d); - - kunmap_atomic(dentry_blk); + make_dentry_ptr_block(NULL, &d, dentry_blk); + f2fs_do_make_empty_dir(inode, parent, &d); set_page_dirty(dentry_page); f2fs_put_page(dentry_page, 1); return 0; } -struct page *init_inode_metadata(struct inode *inode, struct inode *dir, +struct page *f2fs_init_inode_metadata(struct inode *inode, struct inode *dir, const struct qstr *new_name, const struct qstr *orig_name, struct page *dpage) { struct page *page; + int dummy_encrypt = DUMMY_ENCRYPTION_ENABLED(F2FS_I_SB(dir)); int err; if (is_inode_flag_set(inode, FI_NEW_INODE)) { - page = new_inode_page(inode); + page = f2fs_new_inode_page(inode); if (IS_ERR(page)) return page; @@ -430,46 +388,49 @@ struct page *init_inode_metadata(struct inode *inode, struct inode *dir, if (err) goto put_error; - if (f2fs_encrypted_inode(dir) && f2fs_may_encrypt(inode)) { + if ((f2fs_encrypted_inode(dir) || dummy_encrypt) && + f2fs_may_encrypt(inode)) { err = fscrypt_inherit_context(dir, inode, page, false); if (err) goto put_error; } } else { - page = get_node_page(F2FS_I_SB(dir), inode->i_ino); + page = f2fs_get_node_page(F2FS_I_SB(dir), inode->i_ino); if (IS_ERR(page)) return page; - - set_cold_node(inode, page); } - if (new_name) + if (new_name) { init_dent_inode(new_name, page); + if (f2fs_encrypted_inode(dir)) + file_set_enc_name(inode); + } /* * This file should be checkpointed during fsync. * We lost i_pino from now on. */ if (is_inode_flag_set(inode, FI_INC_LINK)) { - file_lost_pino(inode); + if (!S_ISDIR(inode->i_mode)) + file_lost_pino(inode); /* * If link the tmpfile to alias through linkat path, * we should remove this inode from orphan list. */ if (inode->i_nlink == 0) - remove_orphan_inode(F2FS_I_SB(dir), inode->i_ino); + f2fs_remove_orphan_inode(F2FS_I_SB(dir), inode->i_ino); f2fs_i_links_write(inode, true); } return page; put_error: clear_nlink(inode); - update_inode(inode, page); + f2fs_update_inode(inode, page); f2fs_put_page(page, 1); return ERR_PTR(err); } -void update_parent_metadata(struct inode *dir, struct inode *inode, +void f2fs_update_parent_metadata(struct inode *dir, struct inode *inode, unsigned int current_depth) { if (inode && is_inode_flag_set(inode, FI_NEW_INODE)) { @@ -478,7 +439,7 @@ void update_parent_metadata(struct inode *dir, struct inode *inode, clear_inode_flag(inode, FI_NEW_INODE); } dir->i_mtime = dir->i_ctime = current_time(dir); - f2fs_mark_inode_dirty_sync(dir); + f2fs_mark_inode_dirty_sync(dir, false); if (F2FS_I(dir)->i_current_depth != current_depth) f2fs_i_depth_write(dir, current_depth); @@ -487,7 +448,7 @@ void update_parent_metadata(struct inode *dir, struct inode *inode, clear_inode_flag(inode, FI_INC_LINK); } -int room_for_filename(const void *bitmap, int slots, int max_slots) +int f2fs_room_for_filename(const void *bitmap, int slots, int max_slots) { int bit_start = 0; int zero_start, zero_end; @@ -556,10 +517,11 @@ int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name, } start: -#ifdef CONFIG_F2FS_FAULT_INJECTION - if (time_to_inject(F2FS_I_SB(dir), FAULT_DIR_DEPTH)) + if (time_to_inject(F2FS_I_SB(dir), FAULT_DIR_DEPTH)) { + f2fs_show_injection_info(FAULT_DIR_DEPTH); return -ENOSPC; -#endif + } + if (unlikely(current_depth == MAX_DIR_HASH_DEPTH)) return -ENOSPC; @@ -574,17 +536,16 @@ int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name, (le32_to_cpu(dentry_hash) % nbucket)); for (block = bidx; block <= (bidx + nblock - 1); block++) { - dentry_page = get_new_data_page(dir, NULL, block, true); + dentry_page = f2fs_get_new_data_page(dir, NULL, block, true); if (IS_ERR(dentry_page)) return PTR_ERR(dentry_page); - dentry_blk = kmap(dentry_page); - bit_pos = room_for_filename(&dentry_blk->dentry_bitmap, + dentry_blk = page_address(dentry_page); + bit_pos = f2fs_room_for_filename(&dentry_blk->dentry_bitmap, slots, NR_DENTRY_IN_BLOCK); if (bit_pos < NR_DENTRY_IN_BLOCK) goto add_dentry; - kunmap(dentry_page); f2fs_put_page(dentry_page, 1); } @@ -596,17 +557,15 @@ int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name, if (inode) { down_write(&F2FS_I(inode)->i_sem); - page = init_inode_metadata(inode, dir, new_name, + page = f2fs_init_inode_metadata(inode, dir, new_name, orig_name, NULL); if (IS_ERR(page)) { err = PTR_ERR(page); goto fail; } - if (f2fs_encrypted_inode(dir)) - file_set_enc_name(inode); } - make_dentry_ptr(NULL, &d, (void *)dentry_blk, 1); + make_dentry_ptr_block(NULL, &d, dentry_blk); f2fs_update_dentry(ino, mode, &d, new_name, dentry_hash, bit_pos); set_page_dirty(dentry_page); @@ -616,18 +575,17 @@ int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name, f2fs_put_page(page, 1); } - update_parent_metadata(dir, inode, current_depth); + f2fs_update_parent_metadata(dir, inode, current_depth); fail: if (inode) up_write(&F2FS_I(inode)->i_sem); - kunmap(dentry_page); f2fs_put_page(dentry_page, 1); return err; } -int __f2fs_do_add_link(struct inode *dir, struct fscrypt_name *fname, +int f2fs_add_dentry(struct inode *dir, struct fscrypt_name *fname, struct inode *inode, nid_t ino, umode_t mode) { struct qstr new_name; @@ -651,7 +609,7 @@ int __f2fs_do_add_link(struct inode *dir, struct fscrypt_name *fname, * Caller should grab and release a rwsem by calling f2fs_lock_op() and * f2fs_unlock_op(). */ -int __f2fs_add_link(struct inode *dir, const struct qstr *name, +int f2fs_do_add_link(struct inode *dir, const struct qstr *name, struct inode *inode, nid_t ino, umode_t mode) { struct fscrypt_name fname; @@ -675,13 +633,12 @@ int __f2fs_add_link(struct inode *dir, const struct qstr *name, F2FS_I(dir)->task = NULL; } if (de) { - f2fs_dentry_kunmap(dir, page); f2fs_put_page(page, 0); err = -EEXIST; } else if (IS_ERR(page)) { err = PTR_ERR(page); } else { - err = __f2fs_do_add_link(dir, &fname, inode, ino, mode); + err = f2fs_add_dentry(dir, &fname, inode, ino, mode); } fscrypt_free_filename(&fname); return err; @@ -693,7 +650,7 @@ int f2fs_do_tmpfile(struct inode *inode, struct inode *dir) int err = 0; down_write(&F2FS_I(inode)->i_sem); - page = init_inode_metadata(inode, dir, NULL, NULL, NULL); + page = f2fs_init_inode_metadata(inode, dir, NULL, NULL, NULL); if (IS_ERR(page)) { err = PTR_ERR(page); goto fail; @@ -725,9 +682,9 @@ void f2fs_drop_nlink(struct inode *dir, struct inode *inode) up_write(&F2FS_I(inode)->i_sem); if (inode->i_nlink == 0) - add_orphan_inode(inode); + f2fs_add_orphan_inode(inode); else - release_orphan_inode(sbi); + f2fs_release_orphan_inode(sbi); } /* @@ -744,6 +701,9 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page, f2fs_update_time(F2FS_I_SB(dir), REQ_TIME); + if (F2FS_OPTION(F2FS_I_SB(dir)).fsync_mode == FSYNC_MODE_STRICT) + f2fs_add_ino_entry(F2FS_I_SB(dir), dir->i_ino, TRANS_DIR_INO); + if (f2fs_has_inline_dentry(dir)) return f2fs_delete_inline_entry(dentry, page, dir, inode); @@ -753,27 +713,28 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page, dentry_blk = page_address(page); bit_pos = dentry - dentry_blk->dentry; for (i = 0; i < slots; i++) - clear_bit_le(bit_pos + i, &dentry_blk->dentry_bitmap); + __clear_bit_le(bit_pos + i, &dentry_blk->dentry_bitmap); /* Let's check and deallocate this dentry page */ bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap, NR_DENTRY_IN_BLOCK, 0); - kunmap(page); /* kunmap - pair of f2fs_find_entry */ set_page_dirty(page); dir->i_ctime = dir->i_mtime = current_time(dir); - f2fs_mark_inode_dirty_sync(dir); + f2fs_mark_inode_dirty_sync(dir, false); if (inode) f2fs_drop_nlink(dir, inode); if (bit_pos == NR_DENTRY_IN_BLOCK && - !truncate_hole(dir, page->index, page->index + 1)) { + !f2fs_truncate_hole(dir, page->index, page->index + 1)) { + f2fs_clear_radix_tree_dirty_tag(page); clear_page_dirty_for_io(page); ClearPagePrivate(page); ClearPageUptodate(page); inode_dec_dirty_pages(dir); + f2fs_remove_dirty_inode(dir); } f2fs_put_page(page, 1); } @@ -790,7 +751,7 @@ bool f2fs_empty_dir(struct inode *dir) return f2fs_empty_inline_dir(dir); for (bidx = 0; bidx < nblock; bidx++) { - dentry_page = get_lock_data_page(dir, bidx, false); + dentry_page = f2fs_get_lock_data_page(dir, bidx, false); if (IS_ERR(dentry_page)) { if (PTR_ERR(dentry_page) == -ENOENT) continue; @@ -798,7 +759,7 @@ bool f2fs_empty_dir(struct inode *dir) return false; } - dentry_blk = kmap_atomic(dentry_page); + dentry_blk = page_address(dentry_page); if (bidx == 0) bit_pos = 2; else @@ -806,7 +767,6 @@ bool f2fs_empty_dir(struct inode *dir) bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap, NR_DENTRY_IN_BLOCK, bit_pos); - kunmap_atomic(dentry_blk); f2fs_put_page(dentry_page, 1); @@ -816,13 +776,14 @@ bool f2fs_empty_dir(struct inode *dir) return true; } -bool f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d, +int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d, unsigned int start_pos, struct fscrypt_str *fstr) { unsigned char d_type = DT_UNKNOWN; unsigned int bit_pos; struct f2fs_dir_entry *de = NULL; struct fscrypt_str de_name = FSTR_INIT(NULL, 0); + struct f2fs_sb_info *sbi = F2FS_I_SB(d->inode); bit_pos = ((unsigned long)ctx->pos % d->max); @@ -838,7 +799,7 @@ bool f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d, continue; } - d_type = get_de_type(de); + d_type = f2fs_get_de_type(de); de_name.name = d->filename[bit_pos]; de_name.len = le16_to_cpu(de->name_len); @@ -851,7 +812,7 @@ bool f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d, (u32)de->hash_code, 0, &de_name, fstr); if (err) - return true; + return err; de_name = *fstr; fstr->len = save_len; @@ -859,12 +820,15 @@ bool f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d, if (!dir_emit(ctx, de_name.name, de_name.len, le32_to_cpu(de->ino), d_type)) - return true; + return 1; + + if (sbi->readdir_ra == 1) + f2fs_ra_node_page(sbi, le32_to_cpu(de->ino)); bit_pos += GET_DENTRY_SLOTS(le16_to_cpu(de->name_len)); ctx->pos = start_pos + bit_pos; } - return false; + return 0; } static int f2fs_readdir(struct file *file, struct dir_context *ctx) @@ -874,6 +838,7 @@ static int f2fs_readdir(struct file *file, struct dir_context *ctx) struct f2fs_dentry_block *dentry_blk = NULL; struct page *dentry_page = NULL; struct file_ra_state *ra = &file->f_ra; + loff_t start_pos = ctx->pos; unsigned int n = ((unsigned long)ctx->pos / NR_DENTRY_IN_BLOCK); struct f2fs_dentry_ptr d; struct fscrypt_str fstr = FSTR_INIT(NULL, 0); @@ -882,51 +847,61 @@ static int f2fs_readdir(struct file *file, struct dir_context *ctx) if (f2fs_encrypted_inode(inode)) { err = fscrypt_get_encryption_info(inode); if (err && err != -ENOKEY) - return err; + goto out; err = fscrypt_fname_alloc_buffer(inode, F2FS_NAME_LEN, &fstr); if (err < 0) - return err; + goto out; } if (f2fs_has_inline_dentry(inode)) { err = f2fs_read_inline_dir(file, ctx, &fstr); - goto out; + goto out_free; } - /* readahead for multi pages of dir */ - if (npages - n > 1 && !ra_has_index(ra, n)) - page_cache_sync_readahead(inode->i_mapping, ra, file, n, + for (; n < npages; n++, ctx->pos = n * NR_DENTRY_IN_BLOCK) { + + /* allow readdir() to be interrupted */ + if (fatal_signal_pending(current)) { + err = -ERESTARTSYS; + goto out_free; + } + cond_resched(); + + /* readahead for multi pages of dir */ + if (npages - n > 1 && !ra_has_index(ra, n)) + page_cache_sync_readahead(inode->i_mapping, ra, file, n, min(npages - n, (pgoff_t)MAX_DIR_RA_PAGES)); - for (; n < npages; n++) { - dentry_page = get_lock_data_page(inode, n, false); + dentry_page = f2fs_get_lock_data_page(inode, n, false); if (IS_ERR(dentry_page)) { err = PTR_ERR(dentry_page); - if (err == -ENOENT) + if (err == -ENOENT) { + err = 0; continue; - else - goto out; + } else { + goto out_free; + } } - dentry_blk = kmap(dentry_page); + dentry_blk = page_address(dentry_page); - make_dentry_ptr(inode, &d, (void *)dentry_blk, 1); + make_dentry_ptr_block(inode, &d, dentry_blk); - if (f2fs_fill_dentries(ctx, &d, n * NR_DENTRY_IN_BLOCK, &fstr)) { - kunmap(dentry_page); + err = f2fs_fill_dentries(ctx, &d, + n * NR_DENTRY_IN_BLOCK, &fstr); + if (err) { f2fs_put_page(dentry_page, 1); break; } - ctx->pos = (n + 1) * NR_DENTRY_IN_BLOCK; - kunmap(dentry_page); f2fs_put_page(dentry_page, 1); } - err = 0; -out: +out_free: fscrypt_fname_free_buffer(&fstr); - return err; +out: + trace_f2fs_readdir(inode, start_pos, ctx->pos, err); + return err < 0 ? err : 0; } static int f2fs_dir_open(struct inode *inode, struct file *filp)
diff --git a/fs/f2fs/extent_cache.c b/fs/f2fs/extent_cache.c index d7b8c8b..231b77e 100644 --- a/fs/f2fs/extent_cache.c +++ b/fs/f2fs/extent_cache.c
@@ -18,6 +18,179 @@ #include "node.h" #include <trace/events/f2fs.h> +static struct rb_entry *__lookup_rb_tree_fast(struct rb_entry *cached_re, + unsigned int ofs) +{ + if (cached_re) { + if (cached_re->ofs <= ofs && + cached_re->ofs + cached_re->len > ofs) { + return cached_re; + } + } + return NULL; +} + +static struct rb_entry *__lookup_rb_tree_slow(struct rb_root *root, + unsigned int ofs) +{ + struct rb_node *node = root->rb_node; + struct rb_entry *re; + + while (node) { + re = rb_entry(node, struct rb_entry, rb_node); + + if (ofs < re->ofs) + node = node->rb_left; + else if (ofs >= re->ofs + re->len) + node = node->rb_right; + else + return re; + } + return NULL; +} + +struct rb_entry *f2fs_lookup_rb_tree(struct rb_root *root, + struct rb_entry *cached_re, unsigned int ofs) +{ + struct rb_entry *re; + + re = __lookup_rb_tree_fast(cached_re, ofs); + if (!re) + return __lookup_rb_tree_slow(root, ofs); + + return re; +} + +struct rb_node **f2fs_lookup_rb_tree_for_insert(struct f2fs_sb_info *sbi, + struct rb_root *root, struct rb_node **parent, + unsigned int ofs) +{ + struct rb_node **p = &root->rb_node; + struct rb_entry *re; + + while (*p) { + *parent = *p; + re = rb_entry(*parent, struct rb_entry, rb_node); + + if (ofs < re->ofs) + p = &(*p)->rb_left; + else if (ofs >= re->ofs + re->len) + p = &(*p)->rb_right; + else + f2fs_bug_on(sbi, 1); + } + + return p; +} + +/* + * lookup rb entry in position of @ofs in rb-tree, + * if hit, return the entry, otherwise, return NULL + * @prev_ex: extent before ofs + * @next_ex: extent after ofs + * @insert_p: insert point for new extent at ofs + * in order to simpfy the insertion after. + * tree must stay unchanged between lookup and insertion. + */ +struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root *root, + struct rb_entry *cached_re, + unsigned int ofs, + struct rb_entry **prev_entry, + struct rb_entry **next_entry, + struct rb_node ***insert_p, + struct rb_node **insert_parent, + bool force) +{ + struct rb_node **pnode = &root->rb_node; + struct rb_node *parent = NULL, *tmp_node; + struct rb_entry *re = cached_re; + + *insert_p = NULL; + *insert_parent = NULL; + *prev_entry = NULL; + *next_entry = NULL; + + if (RB_EMPTY_ROOT(root)) + return NULL; + + if (re) { + if (re->ofs <= ofs && re->ofs + re->len > ofs) + goto lookup_neighbors; + } + + while (*pnode) { + parent = *pnode; + re = rb_entry(*pnode, struct rb_entry, rb_node); + + if (ofs < re->ofs) + pnode = &(*pnode)->rb_left; + else if (ofs >= re->ofs + re->len) + pnode = &(*pnode)->rb_right; + else + goto lookup_neighbors; + } + + *insert_p = pnode; + *insert_parent = parent; + + re = rb_entry(parent, struct rb_entry, rb_node); + tmp_node = parent; + if (parent && ofs > re->ofs) + tmp_node = rb_next(parent); + *next_entry = rb_entry_safe(tmp_node, struct rb_entry, rb_node); + + tmp_node = parent; + if (parent && ofs < re->ofs) + tmp_node = rb_prev(parent); + *prev_entry = rb_entry_safe(tmp_node, struct rb_entry, rb_node); + return NULL; + +lookup_neighbors: + if (ofs == re->ofs || force) { + /* lookup prev node for merging backward later */ + tmp_node = rb_prev(&re->rb_node); + *prev_entry = rb_entry_safe(tmp_node, struct rb_entry, rb_node); + } + if (ofs == re->ofs + re->len - 1 || force) { + /* lookup next node for merging frontward later */ + tmp_node = rb_next(&re->rb_node); + *next_entry = rb_entry_safe(tmp_node, struct rb_entry, rb_node); + } + return re; +} + +bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi, + struct rb_root *root) +{ +#ifdef CONFIG_F2FS_CHECK_FS + struct rb_node *cur = rb_first(root), *next; + struct rb_entry *cur_re, *next_re; + + if (!cur) + return true; + + while (cur) { + next = rb_next(cur); + if (!next) + return true; + + cur_re = rb_entry(cur, struct rb_entry, rb_node); + next_re = rb_entry(next, struct rb_entry, rb_node); + + if (cur_re->ofs + cur_re->len > next_re->ofs) { + f2fs_msg(sbi->sb, KERN_INFO, "inconsistent rbtree, " + "cur(%u, %u) next(%u, %u)", + cur_re->ofs, cur_re->len, + next_re->ofs, next_re->len); + return false; + } + + cur = next; + } +#endif + return true; +} + static struct kmem_cache *extent_tree_slab; static struct kmem_cache *extent_node_slab; @@ -77,7 +250,7 @@ static struct extent_tree *__grab_extent_tree(struct inode *inode) struct extent_tree *et; nid_t ino = inode->i_ino; - down_write(&sbi->extent_tree_lock); + mutex_lock(&sbi->extent_tree_lock); et = radix_tree_lookup(&sbi->extent_tree_root, ino); if (!et) { et = f2fs_kmem_cache_alloc(extent_tree_slab, GFP_NOFS); @@ -94,7 +267,7 @@ static struct extent_tree *__grab_extent_tree(struct inode *inode) atomic_dec(&sbi->total_zombie_tree); list_del_init(&et->list); } - up_write(&sbi->extent_tree_lock); + mutex_unlock(&sbi->extent_tree_lock); /* never died until evict_inode */ F2FS_I(inode)->extent_tree = et; @@ -102,36 +275,6 @@ static struct extent_tree *__grab_extent_tree(struct inode *inode) return et; } -static struct extent_node *__lookup_extent_tree(struct f2fs_sb_info *sbi, - struct extent_tree *et, unsigned int fofs) -{ - struct rb_node *node = et->root.rb_node; - struct extent_node *en = et->cached_en; - - if (en) { - struct extent_info *cei = &en->ei; - - if (cei->fofs <= fofs && cei->fofs + cei->len > fofs) { - stat_inc_cached_node_hit(sbi); - return en; - } - } - - while (node) { - en = rb_entry(node, struct extent_node, rb_node); - - if (fofs < en->ei.fofs) { - node = node->rb_left; - } else if (fofs >= en->ei.fofs + en->ei.len) { - node = node->rb_right; - } else { - stat_inc_rbtree_node_hit(sbi); - return en; - } - } - return NULL; -} - static struct extent_node *__init_extent_tree(struct f2fs_sb_info *sbi, struct extent_tree *et, struct extent_info *ei) { @@ -172,7 +315,7 @@ static void __drop_largest_extent(struct inode *inode, if (fofs < largest->fofs + largest->len && fofs + len > largest->fofs) { largest->len = 0; - f2fs_mark_inode_dirty_sync(inode); + f2fs_mark_inode_dirty_sync(inode, true); } } @@ -247,17 +390,24 @@ static bool f2fs_lookup_extent_tree(struct inode *inode, pgoff_t pgofs, goto out; } - en = __lookup_extent_tree(sbi, et, pgofs); - if (en) { - *ei = en->ei; - spin_lock(&sbi->extent_lock); - if (!list_empty(&en->list)) { - list_move_tail(&en->list, &sbi->extent_list); - et->cached_en = en; - } - spin_unlock(&sbi->extent_lock); - ret = true; + en = (struct extent_node *)f2fs_lookup_rb_tree(&et->root, + (struct rb_entry *)et->cached_en, pgofs); + if (!en) + goto out; + + if (en == et->cached_en) + stat_inc_cached_node_hit(sbi); + else + stat_inc_rbtree_node_hit(sbi); + + *ei = en->ei; + spin_lock(&sbi->extent_lock); + if (!list_empty(&en->list)) { + list_move_tail(&en->list, &sbi->extent_list); + et->cached_en = en; } + spin_unlock(&sbi->extent_lock); + ret = true; out: stat_inc_total_hit(sbi); read_unlock(&et->lock); @@ -266,87 +416,6 @@ static bool f2fs_lookup_extent_tree(struct inode *inode, pgoff_t pgofs, return ret; } - -/* - * lookup extent at @fofs, if hit, return the extent - * if not, return NULL and - * @prev_ex: extent before fofs - * @next_ex: extent after fofs - * @insert_p: insert point for new extent at fofs - * in order to simpfy the insertion after. - * tree must stay unchanged between lookup and insertion. - */ -static struct extent_node *__lookup_extent_tree_ret(struct extent_tree *et, - unsigned int fofs, - struct extent_node **prev_ex, - struct extent_node **next_ex, - struct rb_node ***insert_p, - struct rb_node **insert_parent) -{ - struct rb_node **pnode = &et->root.rb_node; - struct rb_node *parent = NULL, *tmp_node; - struct extent_node *en = et->cached_en; - - *insert_p = NULL; - *insert_parent = NULL; - *prev_ex = NULL; - *next_ex = NULL; - - if (RB_EMPTY_ROOT(&et->root)) - return NULL; - - if (en) { - struct extent_info *cei = &en->ei; - - if (cei->fofs <= fofs && cei->fofs + cei->len > fofs) - goto lookup_neighbors; - } - - while (*pnode) { - parent = *pnode; - en = rb_entry(*pnode, struct extent_node, rb_node); - - if (fofs < en->ei.fofs) - pnode = &(*pnode)->rb_left; - else if (fofs >= en->ei.fofs + en->ei.len) - pnode = &(*pnode)->rb_right; - else - goto lookup_neighbors; - } - - *insert_p = pnode; - *insert_parent = parent; - - en = rb_entry(parent, struct extent_node, rb_node); - tmp_node = parent; - if (parent && fofs > en->ei.fofs) - tmp_node = rb_next(parent); - *next_ex = tmp_node ? - rb_entry(tmp_node, struct extent_node, rb_node) : NULL; - - tmp_node = parent; - if (parent && fofs < en->ei.fofs) - tmp_node = rb_prev(parent); - *prev_ex = tmp_node ? - rb_entry(tmp_node, struct extent_node, rb_node) : NULL; - return NULL; - -lookup_neighbors: - if (fofs == en->ei.fofs) { - /* lookup prev node for merging backward later */ - tmp_node = rb_prev(&en->rb_node); - *prev_ex = tmp_node ? - rb_entry(tmp_node, struct extent_node, rb_node) : NULL; - } - if (fofs == en->ei.fofs + en->ei.len - 1) { - /* lookup next node for merging frontward later */ - tmp_node = rb_next(&en->rb_node); - *next_ex = tmp_node ? - rb_entry(tmp_node, struct extent_node, rb_node) : NULL; - } - return en; -} - static struct extent_node *__try_merge_extent_node(struct inode *inode, struct extent_tree *et, struct extent_info *ei, struct extent_node *prev_ex, @@ -391,7 +460,7 @@ static struct extent_node *__insert_extent_tree(struct inode *inode, struct rb_node *insert_parent) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); - struct rb_node **p = &et->root.rb_node; + struct rb_node **p; struct rb_node *parent = NULL; struct extent_node *en = NULL; @@ -401,17 +470,7 @@ static struct extent_node *__insert_extent_tree(struct inode *inode, goto do_insert; } - while (*p) { - parent = *p; - en = rb_entry(parent, struct extent_node, rb_node); - - if (ei->fofs < en->ei.fofs) - p = &(*p)->rb_left; - else if (ei->fofs >= en->ei.fofs + en->ei.len) - p = &(*p)->rb_right; - else - f2fs_bug_on(sbi, 1); - } + p = f2fs_lookup_rb_tree_for_insert(sbi, &et->root, &parent, ei->fofs); do_insert: en = __attach_extent_node(sbi, et, ei, parent, p); if (!en) @@ -427,7 +486,7 @@ static struct extent_node *__insert_extent_tree(struct inode *inode, return en; } -static unsigned int f2fs_update_extent_tree_range(struct inode *inode, +static void f2fs_update_extent_tree_range(struct inode *inode, pgoff_t fofs, block_t blkaddr, unsigned int len) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); @@ -440,7 +499,7 @@ static unsigned int f2fs_update_extent_tree_range(struct inode *inode, unsigned int pos = (unsigned int)fofs; if (!et) - return false; + return; trace_f2fs_update_extent_tree_range(inode, fofs, blkaddr, len); @@ -448,7 +507,7 @@ static unsigned int f2fs_update_extent_tree_range(struct inode *inode, if (is_inode_flag_set(inode, FI_NO_EXTENT)) { write_unlock(&et->lock); - return false; + return; } prev = et->largest; @@ -461,8 +520,11 @@ static unsigned int f2fs_update_extent_tree_range(struct inode *inode, __drop_largest_extent(inode, fofs, len); /* 1. lookup first extent node in range [fofs, fofs + len - 1] */ - en = __lookup_extent_tree_ret(et, fofs, &prev_en, &next_en, - &insert_p, &insert_parent); + en = (struct extent_node *)f2fs_lookup_rb_tree_ret(&et->root, + (struct rb_entry *)et->cached_en, fofs, + (struct rb_entry **)&prev_en, + (struct rb_entry **)&next_en, + &insert_p, &insert_parent, false); if (!en) en = next_en; @@ -503,9 +565,8 @@ static unsigned int f2fs_update_extent_tree_range(struct inode *inode, if (!next_en) { struct rb_node *node = rb_next(&en->rb_node); - next_en = node ? - rb_entry(node, struct extent_node, rb_node) - : NULL; + next_en = rb_entry_safe(node, struct extent_node, + rb_node); } if (parts) @@ -546,8 +607,6 @@ static unsigned int f2fs_update_extent_tree_range(struct inode *inode, __free_extent_tree(sbi, et); write_unlock(&et->lock); - - return !__is_extent_same(&prev, &et->largest); } unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink) @@ -563,7 +622,7 @@ unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink) if (!atomic_read(&sbi->total_zombie_tree)) goto free_node; - if (!down_write_trylock(&sbi->extent_tree_lock)) + if (!mutex_trylock(&sbi->extent_tree_lock)) goto out; /* 1. remove unreferenced extent tree */ @@ -585,11 +644,11 @@ unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink) goto unlock_out; cond_resched(); } - up_write(&sbi->extent_tree_lock); + mutex_unlock(&sbi->extent_tree_lock); free_node: /* 2. remove LRU extent entries */ - if (!down_write_trylock(&sbi->extent_tree_lock)) + if (!mutex_trylock(&sbi->extent_tree_lock)) goto out; remained = nr_shrink - (node_cnt + tree_cnt); @@ -619,7 +678,7 @@ unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink) spin_unlock(&sbi->extent_lock); unlock_out: - up_write(&sbi->extent_tree_lock); + mutex_unlock(&sbi->extent_tree_lock); out: trace_f2fs_shrink_extent_tree(sbi, node_cnt, tree_cnt); @@ -669,10 +728,10 @@ void f2fs_destroy_extent_tree(struct inode *inode) if (inode->i_nlink && !is_bad_inode(inode) && atomic_read(&et->node_cnt)) { - down_write(&sbi->extent_tree_lock); + mutex_lock(&sbi->extent_tree_lock); list_add_tail(&et->list, &sbi->zombie_list); atomic_inc(&sbi->total_zombie_tree); - up_write(&sbi->extent_tree_lock); + mutex_unlock(&sbi->extent_tree_lock); return; } @@ -680,12 +739,12 @@ void f2fs_destroy_extent_tree(struct inode *inode) node_cnt = f2fs_destroy_extent_node(inode); /* delete extent tree entry in radix tree */ - down_write(&sbi->extent_tree_lock); + mutex_lock(&sbi->extent_tree_lock); f2fs_bug_on(sbi, atomic_read(&et->node_cnt)); radix_tree_delete(&sbi->extent_tree_root, inode->i_ino); kmem_cache_free(extent_tree_slab, et); atomic_dec(&sbi->total_ext_tree); - up_write(&sbi->extent_tree_lock); + mutex_unlock(&sbi->extent_tree_lock); F2FS_I(inode)->extent_tree = NULL; @@ -714,7 +773,7 @@ void f2fs_update_extent_cache(struct dnode_of_data *dn) else blkaddr = dn->data_blkaddr; - fofs = start_bidx_of_node(ofs_of_node(dn->node_page), dn->inode) + + fofs = f2fs_start_bidx_of_node(ofs_of_node(dn->node_page), dn->inode) + dn->ofs_in_node; f2fs_update_extent_tree_range(dn->inode, fofs, blkaddr, 1); } @@ -729,10 +788,10 @@ void f2fs_update_extent_cache_range(struct dnode_of_data *dn, f2fs_update_extent_tree_range(dn->inode, fofs, blkaddr, len); } -void init_extent_cache_info(struct f2fs_sb_info *sbi) +void f2fs_init_extent_cache_info(struct f2fs_sb_info *sbi) { INIT_RADIX_TREE(&sbi->extent_tree_root, GFP_NOIO); - init_rwsem(&sbi->extent_tree_lock); + mutex_init(&sbi->extent_tree_lock); INIT_LIST_HEAD(&sbi->extent_list); spin_lock_init(&sbi->extent_lock); atomic_set(&sbi->total_ext_tree, 0); @@ -741,7 +800,7 @@ void init_extent_cache_info(struct f2fs_sb_info *sbi) atomic_set(&sbi->total_ext_node, 0); } -int __init create_extent_cache(void) +int __init f2fs_create_extent_cache(void) { extent_tree_slab = f2fs_kmem_cache_create("f2fs_extent_tree", sizeof(struct extent_tree)); @@ -756,7 +815,7 @@ int __init create_extent_cache(void) return 0; } -void destroy_extent_cache(void) +void f2fs_destroy_extent_cache(void) { kmem_cache_destroy(extent_node_slab); kmem_cache_destroy(extent_tree_slab);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 88e111a..606b1cf 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h
@@ -19,11 +19,17 @@ #include <linux/magic.h> #include <linux/kobject.h> #include <linux/sched.h> +#include <linux/writeback.h> +#include <linux/cred.h> #include <linux/vmalloc.h> #include <linux/bio.h> #include <linux/blkdev.h> -#include <linux/fscrypto.h> +#include <linux/quotaops.h> #include <crypto/hash.h> +#include <linux/overflow.h> + +#define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_F2FS_FS_ENCRYPTION) +#include <linux/fscrypt.h> #ifdef CONFIG_F2FS_CHECK_FS #define f2fs_bug_on(sbi, condition) BUG_ON(condition) @@ -37,28 +43,35 @@ } while (0) #endif -#ifdef CONFIG_F2FS_FAULT_INJECTION enum { FAULT_KMALLOC, + FAULT_KVMALLOC, FAULT_PAGE_ALLOC, + FAULT_PAGE_GET, + FAULT_ALLOC_BIO, FAULT_ALLOC_NID, FAULT_ORPHAN, FAULT_BLOCK, FAULT_DIR_DEPTH, FAULT_EVICT_INODE, + FAULT_TRUNCATE, FAULT_IO, FAULT_CHECKPOINT, + FAULT_DISCARD, FAULT_MAX, }; +#ifdef CONFIG_F2FS_FAULT_INJECTION +#define F2FS_ALL_FAULT_TYPE ((1 << FAULT_MAX) - 1) + struct f2fs_fault_info { atomic_t inject_ops; unsigned int inject_rate; unsigned int inject_type; }; -extern char *fault_name[FAULT_MAX]; -#define IS_FAULT_SET(fi, type) (fi->inject_type & (1 << (type))) +extern char *f2fs_fault_name[FAULT_MAX]; +#define IS_FAULT_SET(fi, type) ((fi)->inject_type & (1 << (type))) #endif /* @@ -83,10 +96,17 @@ extern char *fault_name[FAULT_MAX]; #define F2FS_MOUNT_FAULT_INJECTION 0x00010000 #define F2FS_MOUNT_ADAPTIVE 0x00020000 #define F2FS_MOUNT_LFS 0x00040000 +#define F2FS_MOUNT_USRQUOTA 0x00080000 +#define F2FS_MOUNT_GRPQUOTA 0x00100000 +#define F2FS_MOUNT_PRJQUOTA 0x00200000 +#define F2FS_MOUNT_QUOTA 0x00400000 +#define F2FS_MOUNT_INLINE_XATTR_SIZE 0x00800000 +#define F2FS_MOUNT_RESERVE_ROOT 0x01000000 -#define clear_opt(sbi, option) (sbi->mount_opt.opt &= ~F2FS_MOUNT_##option) -#define set_opt(sbi, option) (sbi->mount_opt.opt |= F2FS_MOUNT_##option) -#define test_opt(sbi, option) (sbi->mount_opt.opt & F2FS_MOUNT_##option) +#define F2FS_OPTION(sbi) ((sbi)->mount_opt) +#define clear_opt(sbi, option) (F2FS_OPTION(sbi).opt &= ~F2FS_MOUNT_##option) +#define set_opt(sbi, option) (F2FS_OPTION(sbi).opt |= F2FS_MOUNT_##option) +#define test_opt(sbi, option) (F2FS_OPTION(sbi).opt & F2FS_MOUNT_##option) #define ver_after(a, b) (typecheck(unsigned long long, a) && \ typecheck(unsigned long long, b) && \ @@ -99,18 +119,52 @@ typedef u32 block_t; /* typedef u32 nid_t; struct f2fs_mount_info { - unsigned int opt; + unsigned int opt; + int write_io_size_bits; /* Write IO size bits */ + block_t root_reserved_blocks; /* root reserved blocks */ + kuid_t s_resuid; /* reserved blocks for uid */ + kgid_t s_resgid; /* reserved blocks for gid */ + int active_logs; /* # of active logs */ + int inline_xattr_size; /* inline xattr size */ +#ifdef CONFIG_F2FS_FAULT_INJECTION + struct f2fs_fault_info fault_info; /* For fault injection */ +#endif +#ifdef CONFIG_QUOTA + /* Names of quota files with journalled quota */ + char *s_qf_names[MAXQUOTAS]; + int s_jquota_fmt; /* Format of quota to use */ +#endif + /* For which write hints are passed down to block layer */ + int whint_mode; + int alloc_mode; /* segment allocation policy */ + int fsync_mode; /* fsync policy */ + bool test_dummy_encryption; /* test dummy encryption */ }; -#define F2FS_FEATURE_ENCRYPT 0x0001 -#define F2FS_FEATURE_HMSMR 0x0002 +#define F2FS_FEATURE_ENCRYPT 0x0001 +#define F2FS_FEATURE_BLKZONED 0x0002 +#define F2FS_FEATURE_ATOMIC_WRITE 0x0004 +#define F2FS_FEATURE_EXTRA_ATTR 0x0008 +#define F2FS_FEATURE_PRJQUOTA 0x0010 +#define F2FS_FEATURE_INODE_CHKSUM 0x0020 +#define F2FS_FEATURE_FLEXIBLE_INLINE_XATTR 0x0040 +#define F2FS_FEATURE_QUOTA_INO 0x0080 +#define F2FS_FEATURE_INODE_CRTIME 0x0100 +#define F2FS_FEATURE_LOST_FOUND 0x0200 +#define F2FS_FEATURE_VERITY 0x0400 /* reserved */ #define F2FS_HAS_FEATURE(sb, mask) \ ((F2FS_SB(sb)->raw_super->feature & cpu_to_le32(mask)) != 0) #define F2FS_SET_FEATURE(sb, mask) \ - F2FS_SB(sb)->raw_super->feature |= cpu_to_le32(mask) + (F2FS_SB(sb)->raw_super->feature |= cpu_to_le32(mask)) #define F2FS_CLEAR_FEATURE(sb, mask) \ - F2FS_SB(sb)->raw_super->feature &= ~cpu_to_le32(mask) + (F2FS_SB(sb)->raw_super->feature &= ~cpu_to_le32(mask)) + +/* + * Default values for user and/or group using reserved blocks + */ +#define F2FS_DEF_RESUID 0 +#define F2FS_DEF_RESGID 0 /* * For checkpoint manager @@ -120,19 +174,19 @@ enum { SIT_BITMAP }; -enum { - CP_UMOUNT, - CP_FASTBOOT, - CP_SYNC, - CP_RECOVERY, - CP_DISCARD, -}; +#define CP_UMOUNT 0x00000001 +#define CP_FASTBOOT 0x00000002 +#define CP_SYNC 0x00000004 +#define CP_RECOVERY 0x00000008 +#define CP_DISCARD 0x00000010 +#define CP_TRIMMED 0x00000020 -#define DEF_BATCHED_TRIM_SECTIONS 2 -#define BATCHED_TRIM_SEGMENTS(sbi) \ - (SM_I(sbi)->trim_sections * (sbi)->segs_per_sec) -#define BATCHED_TRIM_BLOCKS(sbi) \ - (BATCHED_TRIM_SEGMENTS(sbi) << (sbi)->log_blocks_per_seg) +#define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi) +#define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */ +#define DEF_MIN_DISCARD_ISSUE_TIME 50 /* 50 ms, if exists */ +#define DEF_MID_DISCARD_ISSUE_TIME 500 /* 500 ms, if device busy */ +#define DEF_MAX_DISCARD_ISSUE_TIME 60000 /* 60 s, if no candidates */ +#define DEF_DISCARD_URGENT_UTIL 80 /* do more discard over 80% */ #define DEF_CP_INTERVAL 60 /* 60 secs */ #define DEF_IDLE_INTERVAL 5 /* 5 secs */ @@ -141,11 +195,10 @@ struct cp_control { __u64 trim_start; __u64 trim_end; __u64 trim_minlen; - __u64 trimmed; }; /* - * For CP/NAT/SIT/SSA readahead + * indicate meta/data type */ enum { META_CP, @@ -153,6 +206,8 @@ enum { META_SIT, META_SSA, META_POR, + DATA_GENERIC, + META_GENERIC, }; /* for the list of ino */ @@ -160,12 +215,15 @@ enum { ORPHAN_INO, /* for orphan ino list */ APPEND_INO, /* for append ino list */ UPDATE_INO, /* for update ino list */ + TRANS_DIR_INO, /* for trasactions dir ino list */ + FLUSH_INO, /* for multiple device flushing */ MAX_INO_ENTRY, /* max. list */ }; struct ino_entry { - struct list_head list; /* list head */ - nid_t ino; /* inode number */ + struct list_head list; /* list head */ + nid_t ino; /* inode number */ + unsigned int dirty_device; /* dirty device bitmap */ }; /* for the list of inodes to be GCed */ @@ -174,18 +232,102 @@ struct inode_entry { struct inode *inode; /* vfs inode pointer */ }; -/* for the list of blockaddresses to be discarded */ -struct discard_entry { +struct fsync_node_entry { struct list_head list; /* list head */ - block_t blkaddr; /* block address to be discarded */ - int len; /* # of consecutive blocks of the discard */ + struct page *page; /* warm node page pointer */ + unsigned int seq_id; /* sequence id */ }; -struct bio_entry { - struct list_head list; - struct bio *bio; - struct completion event; - int error; +/* for the bitmap indicate blocks to be discarded */ +struct discard_entry { + struct list_head list; /* list head */ + block_t start_blkaddr; /* start blockaddr of current segment */ + unsigned char discard_map[SIT_VBLOCK_MAP_SIZE]; /* segment discard bitmap */ +}; + +/* default discard granularity of inner discard thread, unit: block count */ +#define DEFAULT_DISCARD_GRANULARITY 16 + +/* max discard pend list number */ +#define MAX_PLIST_NUM 512 +#define plist_idx(blk_num) ((blk_num) >= MAX_PLIST_NUM ? \ + (MAX_PLIST_NUM - 1) : (blk_num - 1)) + +enum { + D_PREP, /* initial */ + D_PARTIAL, /* partially submitted */ + D_SUBMIT, /* all submitted */ + D_DONE, /* finished */ +}; + +struct discard_info { + block_t lstart; /* logical start address */ + block_t len; /* length */ + block_t start; /* actual start address in dev */ +}; + +struct discard_cmd { + struct rb_node rb_node; /* rb node located in rb-tree */ + union { + struct { + block_t lstart; /* logical start address */ + block_t len; /* length */ + block_t start; /* actual start address in dev */ + }; + struct discard_info di; /* discard info */ + + }; + struct list_head list; /* command list */ + struct completion wait; /* compleation */ + struct block_device *bdev; /* bdev */ + unsigned short ref; /* reference count */ + unsigned char state; /* state */ + unsigned char issuing; /* issuing discard */ + int error; /* bio error */ + spinlock_t lock; /* for state/bio_ref updating */ + unsigned short bio_ref; /* bio reference count */ +}; + +enum { + DPOLICY_BG, + DPOLICY_FORCE, + DPOLICY_FSTRIM, + DPOLICY_UMOUNT, + MAX_DPOLICY, +}; + +struct discard_policy { + int type; /* type of discard */ + unsigned int min_interval; /* used for candidates exist */ + unsigned int mid_interval; /* used for device busy */ + unsigned int max_interval; /* used for candidates not exist */ + unsigned int max_requests; /* # of discards issued per round */ + unsigned int io_aware_gran; /* minimum granularity discard not be aware of I/O */ + bool io_aware; /* issue discard in idle time */ + bool sync; /* submit discard with REQ_SYNC flag */ + bool ordered; /* issue discard by lba order */ + unsigned int granularity; /* discard granularity */ +}; + +struct discard_cmd_control { + struct task_struct *f2fs_issue_discard; /* discard thread */ + struct list_head entry_list; /* 4KB discard entry list */ + struct list_head pend_list[MAX_PLIST_NUM];/* store pending entries */ + struct list_head wait_list; /* store on-flushing entries */ + struct list_head fstrim_list; /* in-flight discard from fstrim */ + wait_queue_head_t discard_wait_queue; /* waiting queue for wake-up */ + unsigned int discard_wake; /* to wake up discard thread */ + struct mutex cmd_lock; + unsigned int nr_discards; /* # of discards in the list */ + unsigned int max_discards; /* max. discards to be issued */ + unsigned int discard_granularity; /* discard granularity */ + unsigned int undiscard_blks; /* # of undiscard blocks */ + unsigned int next_pos; /* next discard position */ + atomic_t issued_discard; /* # of issued discard */ + atomic_t issing_discard; /* # of issing discard */ + atomic_t discard_cmd_cnt; /* # of cached cmd count */ + struct rb_root root; /* root of discard rb-tree */ + bool rbtree_check; /* config for consistence check */ }; /* for the list of fsync inodes, used only during recovery */ @@ -196,13 +338,13 @@ struct fsync_inode_entry { block_t last_dentry; /* block address locating the last dentry */ }; -#define nats_in_cursum(jnl) (le16_to_cpu(jnl->n_nats)) -#define sits_in_cursum(jnl) (le16_to_cpu(jnl->n_sits)) +#define nats_in_cursum(jnl) (le16_to_cpu((jnl)->n_nats)) +#define sits_in_cursum(jnl) (le16_to_cpu((jnl)->n_sits)) -#define nat_in_journal(jnl, i) (jnl->nat_j.entries[i].ne) -#define nid_in_journal(jnl, i) (jnl->nat_j.entries[i].nid) -#define sit_in_journal(jnl, i) (jnl->sit_j.entries[i].se) -#define segno_in_journal(jnl, i) (jnl->sit_j.entries[i].segno) +#define nat_in_journal(jnl, i) ((jnl)->nat_j.entries[i].ne) +#define nid_in_journal(jnl, i) ((jnl)->nat_j.entries[i].nid) +#define sit_in_journal(jnl, i) ((jnl)->sit_j.entries[i].se) +#define segno_in_journal(jnl, i) ((jnl)->sit_j.entries[i].segno) #define MAX_NAT_JENTRIES(jnl) (NAT_JOURNAL_ENTRIES - nats_in_cursum(jnl)) #define MAX_SIT_JENTRIES(jnl) (SIT_JOURNAL_ENTRIES - sits_in_cursum(jnl)) @@ -210,6 +352,7 @@ struct fsync_inode_entry { static inline int update_nats_in_cursum(struct f2fs_journal *journal, int i) { int before = nats_in_cursum(journal); + journal->n_nats = cpu_to_le16(before + i); return before; } @@ -217,6 +360,7 @@ static inline int update_nats_in_cursum(struct f2fs_journal *journal, int i) static inline int update_sits_in_cursum(struct f2fs_journal *journal, int i) { int before = sits_in_cursum(journal); + journal->n_sits = cpu_to_le16(before + i); return before; } @@ -242,11 +386,20 @@ static inline bool __has_cursum_space(struct f2fs_journal *journal, #define F2FS_IOC_START_VOLATILE_WRITE _IO(F2FS_IOCTL_MAGIC, 3) #define F2FS_IOC_RELEASE_VOLATILE_WRITE _IO(F2FS_IOCTL_MAGIC, 4) #define F2FS_IOC_ABORT_VOLATILE_WRITE _IO(F2FS_IOCTL_MAGIC, 5) -#define F2FS_IOC_GARBAGE_COLLECT _IO(F2FS_IOCTL_MAGIC, 6) +#define F2FS_IOC_GARBAGE_COLLECT _IOW(F2FS_IOCTL_MAGIC, 6, __u32) #define F2FS_IOC_WRITE_CHECKPOINT _IO(F2FS_IOCTL_MAGIC, 7) -#define F2FS_IOC_DEFRAGMENT _IO(F2FS_IOCTL_MAGIC, 8) +#define F2FS_IOC_DEFRAGMENT _IOWR(F2FS_IOCTL_MAGIC, 8, \ + struct f2fs_defragment) #define F2FS_IOC_MOVE_RANGE _IOWR(F2FS_IOCTL_MAGIC, 9, \ struct f2fs_move_range) +#define F2FS_IOC_FLUSH_DEVICE _IOW(F2FS_IOCTL_MAGIC, 10, \ + struct f2fs_flush_device) +#define F2FS_IOC_GARBAGE_COLLECT_RANGE _IOW(F2FS_IOCTL_MAGIC, 11, \ + struct f2fs_gc_range) +#define F2FS_IOC_GET_FEATURES _IOR(F2FS_IOCTL_MAGIC, 12, __u32) +#define F2FS_IOC_SET_PIN_FILE _IOW(F2FS_IOCTL_MAGIC, 13, __u32) +#define F2FS_IOC_GET_PIN_FILE _IOR(F2FS_IOCTL_MAGIC, 14, __u32) +#define F2FS_IOC_PRECACHE_EXTENTS _IO(F2FS_IOCTL_MAGIC, 15) #define F2FS_IOC_SET_ENCRYPTION_POLICY FS_IOC_SET_ENCRYPTION_POLICY #define F2FS_IOC_GET_ENCRYPTION_POLICY FS_IOC_GET_ENCRYPTION_POLICY @@ -271,6 +424,12 @@ static inline bool __has_cursum_space(struct f2fs_journal *journal, #define F2FS_IOC32_GETVERSION FS_IOC32_GETVERSION #endif +struct f2fs_gc_range { + u32 sync; + u64 start; + u64 len; +}; + struct f2fs_defragment { u64 start; u64 len; @@ -283,36 +442,70 @@ struct f2fs_move_range { u64 len; /* size to move */ }; +struct f2fs_flush_device { + u32 dev_num; /* device number to flush */ + u32 segments; /* # of segments to flush */ +}; + +/* for inline stuff */ +#define DEF_INLINE_RESERVED_SIZE 1 +#define DEF_MIN_INLINE_SIZE 1 +static inline int get_extra_isize(struct inode *inode); +static inline int get_inline_xattr_addrs(struct inode *inode); +#define MAX_INLINE_DATA(inode) (sizeof(__le32) * \ + (CUR_ADDRS_PER_INODE(inode) - \ + get_inline_xattr_addrs(inode) - \ + DEF_INLINE_RESERVED_SIZE)) + +/* for inline dir */ +#define NR_INLINE_DENTRY(inode) (MAX_INLINE_DATA(inode) * BITS_PER_BYTE / \ + ((SIZE_OF_DIR_ENTRY + F2FS_SLOT_LEN) * \ + BITS_PER_BYTE + 1)) +#define INLINE_DENTRY_BITMAP_SIZE(inode) ((NR_INLINE_DENTRY(inode) + \ + BITS_PER_BYTE - 1) / BITS_PER_BYTE) +#define INLINE_RESERVED_SIZE(inode) (MAX_INLINE_DATA(inode) - \ + ((SIZE_OF_DIR_ENTRY + F2FS_SLOT_LEN) * \ + NR_INLINE_DENTRY(inode) + \ + INLINE_DENTRY_BITMAP_SIZE(inode))) + /* * For INODE and NODE manager */ /* for directory operations */ struct f2fs_dentry_ptr { struct inode *inode; - const void *bitmap; + void *bitmap; struct f2fs_dir_entry *dentry; __u8 (*filename)[F2FS_SLOT_LEN]; int max; + int nr_bitmap; }; -static inline void make_dentry_ptr(struct inode *inode, - struct f2fs_dentry_ptr *d, void *src, int type) +static inline void make_dentry_ptr_block(struct inode *inode, + struct f2fs_dentry_ptr *d, struct f2fs_dentry_block *t) { d->inode = inode; + d->max = NR_DENTRY_IN_BLOCK; + d->nr_bitmap = SIZE_OF_DENTRY_BITMAP; + d->bitmap = t->dentry_bitmap; + d->dentry = t->dentry; + d->filename = t->filename; +} - if (type == 1) { - struct f2fs_dentry_block *t = (struct f2fs_dentry_block *)src; - d->max = NR_DENTRY_IN_BLOCK; - d->bitmap = &t->dentry_bitmap; - d->dentry = t->dentry; - d->filename = t->filename; - } else { - struct f2fs_inline_dentry *t = (struct f2fs_inline_dentry *)src; - d->max = NR_INLINE_DENTRY; - d->bitmap = &t->dentry_bitmap; - d->dentry = t->dentry; - d->filename = t->filename; - } +static inline void make_dentry_ptr_inline(struct inode *inode, + struct f2fs_dentry_ptr *d, void *t) +{ + int entry_cnt = NR_INLINE_DENTRY(inode); + int bitmap_size = INLINE_DENTRY_BITMAP_SIZE(inode); + int reserved_size = INLINE_RESERVED_SIZE(inode); + + d->inode = inode; + d->max = entry_cnt; + d->nr_bitmap = bitmap_size; + d->bitmap = t; + d->dentry = t + bitmap_size + reserved_size; + d->filename = t + bitmap_size + reserved_size + + SIZE_OF_DIR_ENTRY * entry_cnt; } /* @@ -331,29 +524,42 @@ enum { */ }; +#define DEFAULT_RETRY_IO_COUNT 8 /* maximum retry read IO count */ + #define F2FS_LINK_MAX 0xffffffff /* maximum link count per file */ #define MAX_DIR_RA_PAGES 4 /* maximum ra pages of dir */ -/* vector size for gang look-up from extent cache that consists of radix tree */ -#define EXT_TREE_VEC_SIZE 64 - /* for in-memory extent cache entry */ #define F2FS_MIN_EXTENT_LEN 64 /* minimum extent length */ /* number of extent info in extent cache we try to shrink */ #define EXTENT_CACHE_SHRINK_NUMBER 128 +struct rb_entry { + struct rb_node rb_node; /* rb node located in rb-tree */ + unsigned int ofs; /* start offset of the entry */ + unsigned int len; /* length of the entry */ +}; + struct extent_info { unsigned int fofs; /* start offset in a file */ - u32 blk; /* start block address of the extent */ unsigned int len; /* length of the extent */ + u32 blk; /* start block address of the extent */ }; struct extent_node { - struct rb_node rb_node; /* rb node located in rb-tree */ + struct rb_node rb_node; + union { + struct { + unsigned int fofs; + unsigned int len; + u32 blk; + }; + struct extent_info ei; /* extent info */ + + }; struct list_head list; /* node in global extent list of sbi */ - struct extent_info ei; /* extent info */ struct extent_tree *et; /* extent tree pointer */ }; @@ -384,15 +590,19 @@ struct f2fs_map_blocks { unsigned int m_len; unsigned int m_flags; pgoff_t *m_next_pgofs; /* point next possible non-hole pgofs */ + pgoff_t *m_next_extent; /* point to next possible extent */ + int m_seg_type; }; /* for flag in get_data_block */ -#define F2FS_GET_BLOCK_READ 0 -#define F2FS_GET_BLOCK_DIO 1 -#define F2FS_GET_BLOCK_FIEMAP 2 -#define F2FS_GET_BLOCK_BMAP 3 -#define F2FS_GET_BLOCK_PRE_DIO 4 -#define F2FS_GET_BLOCK_PRE_AIO 5 +enum { + F2FS_GET_BLOCK_DEFAULT, + F2FS_GET_BLOCK_FIEMAP, + F2FS_GET_BLOCK_BMAP, + F2FS_GET_BLOCK_PRE_DIO, + F2FS_GET_BLOCK_PRE_AIO, + F2FS_GET_BLOCK_PRECACHE, +}; /* * i_advise uses FADVISE_XXX_BIT. We can add additional hints later. @@ -401,6 +611,11 @@ struct f2fs_map_blocks { #define FADVISE_LOST_PINO_BIT 0x02 #define FADVISE_ENCRYPT_BIT 0x04 #define FADVISE_ENC_NAME_BIT 0x08 +#define FADVISE_KEEP_SIZE_BIT 0x10 +#define FADVISE_HOT_BIT 0x20 +#define FADVISE_VERITY_BIT 0x40 /* reserved */ + +#define FADVISE_MODIFIABLE_BITS (FADVISE_COLD_BIT | FADVISE_HOT_BIT) #define file_is_cold(inode) is_file(inode, FADVISE_COLD_BIT) #define file_wrong_pino(inode) is_file(inode, FADVISE_LOST_PINO_BIT) @@ -413,15 +628,28 @@ struct f2fs_map_blocks { #define file_clear_encrypt(inode) clear_file(inode, FADVISE_ENCRYPT_BIT) #define file_enc_name(inode) is_file(inode, FADVISE_ENC_NAME_BIT) #define file_set_enc_name(inode) set_file(inode, FADVISE_ENC_NAME_BIT) +#define file_keep_isize(inode) is_file(inode, FADVISE_KEEP_SIZE_BIT) +#define file_set_keep_isize(inode) set_file(inode, FADVISE_KEEP_SIZE_BIT) +#define file_is_hot(inode) is_file(inode, FADVISE_HOT_BIT) +#define file_set_hot(inode) set_file(inode, FADVISE_HOT_BIT) +#define file_clear_hot(inode) clear_file(inode, FADVISE_HOT_BIT) #define DEF_DIR_LEVEL 0 +enum { + GC_FAILURE_PIN, + GC_FAILURE_ATOMIC, + MAX_GC_FAILURE +}; + struct f2fs_inode_info { struct inode vfs_inode; /* serve a vfs inode */ unsigned long i_flags; /* keep an inode flags for ioctl */ unsigned char i_advise; /* use to give file attribute hints */ unsigned char i_dir_level; /* use for dentry level for large dir */ - unsigned int i_current_depth; /* use only in directory structure */ + unsigned int i_current_depth; /* only for directory depth */ + /* for gc failure statistic */ + unsigned int i_gc_failures[MAX_GC_FAILURE]; unsigned int i_pino; /* parent inode number */ umode_t i_acl_mode; /* keep file acl mode temporarily */ @@ -432,16 +660,34 @@ struct f2fs_inode_info { f2fs_hash_t chash; /* hash value of given file name */ unsigned int clevel; /* maximum level of given file name */ struct task_struct *task; /* lookup and create consistency */ + struct task_struct *cp_task; /* separate cp/wb IO stats*/ nid_t i_xattr_nid; /* node id that contains xattrs */ - unsigned long long xattr_ver; /* cp version of xattr modification */ loff_t last_disk_size; /* lastly written file size */ +#ifdef CONFIG_QUOTA + struct dquot *i_dquot[MAXQUOTAS]; + + /* quota space reservation, managed internally by quota code */ + qsize_t i_reserved_quota; +#endif struct list_head dirty_list; /* dirty list for dirs and files */ struct list_head gdirty_list; /* linked in global dirty list */ + struct list_head inmem_ilist; /* list for inmem inodes */ struct list_head inmem_pages; /* inmemory pages managed by f2fs */ + struct task_struct *inmem_task; /* store inmemory task */ struct mutex inmem_lock; /* lock for inmemory pages */ struct extent_tree *extent_tree; /* cached extent_tree entry */ - struct rw_semaphore dio_rwsem[2];/* avoid racing between dio and gc */ + + /* avoid racing between foreground op and gc */ + struct rw_semaphore i_gc_rwsem[2]; + struct rw_semaphore i_mmap_sem; + struct rw_semaphore i_xattr_sem; /* avoid racing between reading and changing EAs */ + + int i_extra_isize; /* size of extra space located in i_addr */ + kprojid_t i_projid; /* id for project quota */ + int i_inline_xattr_size; /* inline xattr size */ + struct timespec i_crtime; /* inode creation time */ + struct timespec i_disk_time[4]; /* inode disk times */ }; static inline void get_extent_info(struct extent_info *ext, @@ -468,11 +714,23 @@ static inline void set_extent_info(struct extent_info *ei, unsigned int fofs, ei->len = len; } -static inline bool __is_extent_same(struct extent_info *ei1, - struct extent_info *ei2) +static inline bool __is_discard_mergeable(struct discard_info *back, + struct discard_info *front, unsigned int max_len) { - return (ei1->fofs == ei2->fofs && ei1->blk == ei2->blk && - ei1->len == ei2->len); + return (back->lstart + back->len == front->lstart) && + (back->len + front->len <= max_len); +} + +static inline bool __is_discard_back_mergeable(struct discard_info *cur, + struct discard_info *back, unsigned int max_len) +{ + return __is_discard_mergeable(back, cur, max_len); +} + +static inline bool __is_discard_front_mergeable(struct discard_info *cur, + struct discard_info *front, unsigned int max_len) +{ + return __is_discard_mergeable(cur, front, max_len); } static inline bool __is_extent_mergeable(struct extent_info *back, @@ -494,20 +752,29 @@ static inline bool __is_front_mergeable(struct extent_info *cur, return __is_extent_mergeable(cur, front); } -extern void f2fs_mark_inode_dirty_sync(struct inode *); +extern void f2fs_mark_inode_dirty_sync(struct inode *inode, bool sync); static inline void __try_update_largest_extent(struct inode *inode, struct extent_tree *et, struct extent_node *en) { if (en->ei.len > et->largest.len) { et->largest = en->ei; - f2fs_mark_inode_dirty_sync(inode); + f2fs_mark_inode_dirty_sync(inode, true); } } +/* + * For free nid management + */ +enum nid_state { + FREE_NID, /* newly added to free nid list */ + PREALLOC_NID, /* it is preallocated */ + MAX_NID_STATE, +}; + struct f2fs_nm_info { block_t nat_blkaddr; /* base disk address of NAT */ nid_t max_nid; /* maximum possible node ids */ - nid_t available_nids; /* maximum available node ids */ + nid_t available_nids; /* # of available node ids */ nid_t next_scan_nid; /* the next nid to be scanned */ unsigned int ram_thresh; /* control the memory footprint */ unsigned int ra_nid_pages; /* # of nid pages to be readaheaded */ @@ -518,18 +785,31 @@ struct f2fs_nm_info { struct radix_tree_root nat_set_root;/* root of the nat set cache */ struct rw_semaphore nat_tree_lock; /* protect nat_tree_lock */ struct list_head nat_entries; /* cached nat entry list (clean) */ + spinlock_t nat_list_lock; /* protect clean nat entry list */ unsigned int nat_cnt; /* the # of cached nat entries */ unsigned int dirty_nat_cnt; /* total num of nat entries in set */ + unsigned int nat_blocks; /* # of nat blocks */ /* free node ids management */ struct radix_tree_root free_nid_root;/* root of the free_nid cache */ - struct list_head free_nid_list; /* a list for free nids */ - spinlock_t free_nid_list_lock; /* protect free nid list */ - unsigned int fcnt; /* the number of free node id */ + struct list_head free_nid_list; /* list for free nids excluding preallocated nids */ + unsigned int nid_cnt[MAX_NID_STATE]; /* the number of free node id */ + spinlock_t nid_list_lock; /* protect nid lists ops */ struct mutex build_lock; /* lock for build free nids */ + unsigned char **free_nid_bitmap; + unsigned char *nat_block_bitmap; + unsigned short *free_nid_count; /* free nid count of NAT block */ /* for checkpoint */ char *nat_bitmap; /* NAT bitmap pointer */ + + unsigned int nat_bits_blocks; /* # of nat bits blocks */ + unsigned char *nat_bits; /* NAT bits blocks */ + unsigned char *full_nat_bits; /* full NAT pages */ + unsigned char *empty_nat_bits; /* empty NAT pages */ +#ifdef CONFIG_F2FS_CHECK_FS + char *nat_bitmap_mir; /* NAT bitmap mirror */ +#endif int bitmap_size; /* bitmap size */ }; @@ -586,19 +866,20 @@ enum { CURSEG_WARM_NODE, /* direct node blocks of normal files */ CURSEG_COLD_NODE, /* indirect node blocks */ NO_CHECK_TYPE, - CURSEG_DIRECT_IO, /* to use for the direct IO path */ }; struct flush_cmd { struct completion wait; struct llist_node llnode; + nid_t ino; int ret; }; struct flush_cmd_control { struct task_struct *f2fs_issue_flush; /* flush thread */ wait_queue_head_t flush_wait_queue; /* waiting queue for wake-up */ - atomic_t submit_flush; /* # of issued flushes */ + atomic_t issued_flush; /* # of issued flushes */ + atomic_t issing_flush; /* # of issing flushes */ struct llist_head issue_list; /* list for command issue */ struct llist_node *dispatch_list; /* list for command dispatch */ }; @@ -609,6 +890,8 @@ struct f2fs_sm_info { struct dirty_seglist_info *dirty_info; /* dirty segment information */ struct curseg_info *curseg_array; /* active segment information */ + struct rw_semaphore curseg_lock; /* for preventing curseg change */ + block_t seg0_blkaddr; /* block address of 0'th segment */ block_t main_blkaddr; /* start block address of main area */ block_t ssa_blkaddr; /* start block address of SSA area */ @@ -621,12 +904,6 @@ struct f2fs_sm_info { /* a threshold to reclaim prefree segments */ unsigned int rec_prefree_segments; - /* for small discard management */ - struct list_head discard_list; /* 4KB discard list */ - struct list_head wait_list; /* linked with issued discard bio */ - int nr_discards; /* # of discards in the list */ - int max_discards; /* max. discards to be issued */ - /* for batched trimming */ unsigned int trim_sections; /* # of sections to trim */ @@ -635,10 +912,15 @@ struct f2fs_sm_info { unsigned int ipu_policy; /* in-place-update policy */ unsigned int min_ipu_util; /* in-place-update threshold */ unsigned int min_fsync_blocks; /* threshold for fsync */ + unsigned int min_seq_blocks; /* threshold for sequential blocks */ + unsigned int min_hot_blocks; /* threshold for hot block allocation */ + unsigned int min_ssr_sections; /* threshold to trigger SSR allocation */ /* for flush command control */ - struct flush_cmd_control *cmd_control_info; + struct flush_cmd_control *fcc_info; + /* for discard command control */ + struct discard_cmd_control *dcc_info; }; /* @@ -650,13 +932,17 @@ struct f2fs_sm_info { * f2fs monitors the number of several block types such as on-writeback, * dirty dentry blocks, dirty node blocks, and dirty meta blocks. */ +#define WB_DATA_TYPE(p) (__is_cp_guaranteed(p) ? F2FS_WB_CP_DATA : F2FS_WB_DATA) enum count_type { F2FS_DIRTY_DENTS, F2FS_DIRTY_DATA, + F2FS_DIRTY_QDATA, F2FS_DIRTY_NODES, F2FS_DIRTY_META, F2FS_INMEM_PAGES, F2FS_DIRTY_IMETA, + F2FS_WB_CP_DATA, + F2FS_WB_DATA, NR_COUNT_TYPE, }; @@ -680,35 +966,107 @@ enum page_type { META_FLUSH, INMEM, /* the below types are used by tracepoints only. */ INMEM_DROP, + INMEM_INVALIDATE, INMEM_REVOKE, IPU, OPU, }; +enum temp_type { + HOT = 0, /* must be zero for meta bio */ + WARM, + COLD, + NR_TEMP_TYPE, +}; + +enum need_lock_type { + LOCK_REQ = 0, + LOCK_DONE, + LOCK_RETRY, +}; + +enum cp_reason_type { + CP_NO_NEEDED, + CP_NON_REGULAR, + CP_HARDLINK, + CP_SB_NEED_CP, + CP_WRONG_PINO, + CP_NO_SPC_ROLL, + CP_NODE_NEED_CP, + CP_FASTBOOT_MODE, + CP_SPEC_LOG_NUM, + CP_RECOVER_DIR, +}; + +enum iostat_type { + APP_DIRECT_IO, /* app direct IOs */ + APP_BUFFERED_IO, /* app buffered IOs */ + APP_WRITE_IO, /* app write IOs */ + APP_MAPPED_IO, /* app mapped IOs */ + FS_DATA_IO, /* data IOs from kworker/fsync/reclaimer */ + FS_NODE_IO, /* node IOs from kworker/fsync/reclaimer */ + FS_META_IO, /* meta IOs from kworker/reclaimer */ + FS_GC_DATA_IO, /* data IOs from forground gc */ + FS_GC_NODE_IO, /* node IOs from forground gc */ + FS_CP_DATA_IO, /* data IOs from checkpoint */ + FS_CP_NODE_IO, /* node IOs from checkpoint */ + FS_CP_META_IO, /* meta IOs from checkpoint */ + FS_DISCARD, /* discard */ + NR_IO_TYPE, +}; + struct f2fs_io_info { struct f2fs_sb_info *sbi; /* f2fs_sb_info pointer */ + nid_t ino; /* inode number */ enum page_type type; /* contains DATA/NODE/META/META_FLUSH */ + enum temp_type temp; /* contains HOT/WARM/COLD */ int op; /* contains REQ_OP_ */ - int op_flags; /* rq_flag_bits */ + int op_flags; /* req_flag_bits */ block_t new_blkaddr; /* new block address to be written */ block_t old_blkaddr; /* old block address before Cow */ struct page *page; /* page to be written */ struct page *encrypted_page; /* encrypted page */ + struct list_head list; /* serialize IOs */ + bool submitted; /* indicate IO submission */ + int need_lock; /* indicate we need to lock cp_rwsem */ + bool in_list; /* indicate fio is in io_list */ + bool is_meta; /* indicate borrow meta inode mapping or not */ + bool retry; /* need to reallocate block address */ + enum iostat_type io_type; /* io type */ + struct writeback_control *io_wbc; /* writeback control */ + unsigned char version; /* version of the node */ }; -#define is_read_io(rw) (rw == READ) +#define is_read_io(rw) ((rw) == READ) struct f2fs_bio_info { struct f2fs_sb_info *sbi; /* f2fs superblock */ struct bio *bio; /* bios to merge */ sector_t last_block_in_bio; /* last block number */ struct f2fs_io_info fio; /* store buffered io info. */ struct rw_semaphore io_rwsem; /* blocking op for bio */ + spinlock_t io_lock; /* serialize DATA/NODE IOs */ + struct list_head io_list; /* track fios */ +}; + +#define FDEV(i) (sbi->devs[i]) +#define RDEV(i) (raw_super->devs[i]) +struct f2fs_dev_info { + struct block_device *bdev; + char path[MAX_PATH_LEN]; + unsigned int total_segments; + block_t start_blk; + block_t end_blk; +#ifdef CONFIG_BLK_DEV_ZONED + unsigned int nr_blkz; /* Total number of zones */ + u8 *blkz_type; /* Array of zones type */ +#endif }; enum inode_type { DIR_INODE, /* for dirty dir inode */ FILE_INODE, /* for dirty regular/symlink inode */ DIRTY_META, /* for all dirtied inode metadata */ + ATOMIC_FILE, /* for all atomic files */ NR_INODE_TYPE, }; @@ -728,6 +1086,7 @@ enum { SBI_POR_DOING, /* recovery is doing or not */ SBI_NEED_SB_WRITE, /* need to recover superblock */ SBI_NEED_CP, /* need to checkpoint */ + SBI_IS_SHUTDOWN, /* shutdown by ioctl */ }; enum { @@ -736,21 +1095,51 @@ enum { MAX_TIME, }; +enum { + GC_NORMAL, + GC_IDLE_CB, + GC_IDLE_GREEDY, + GC_URGENT, +}; + +enum { + WHINT_MODE_OFF, /* not pass down write hints */ + WHINT_MODE_USER, /* try to pass down hints given by users */ + WHINT_MODE_FS, /* pass down hints with F2FS policy */ +}; + +enum { + ALLOC_MODE_DEFAULT, /* stay default */ + ALLOC_MODE_REUSE, /* reuse segments as much as possible */ +}; + +enum fsync_mode { + FSYNC_MODE_POSIX, /* fsync follows posix semantics */ + FSYNC_MODE_STRICT, /* fsync behaves in line with ext4 */ + FSYNC_MODE_NOBARRIER, /* fsync behaves nobarrier based on posix */ +}; + #ifdef CONFIG_F2FS_FS_ENCRYPTION -#define F2FS_KEY_DESC_PREFIX "f2fs:" -#define F2FS_KEY_DESC_PREFIX_SIZE 5 +#define DUMMY_ENCRYPTION_ENABLED(sbi) \ + (unlikely(F2FS_OPTION(sbi).test_dummy_encryption)) +#else +#define DUMMY_ENCRYPTION_ENABLED(sbi) (0) #endif + struct f2fs_sb_info { struct super_block *sb; /* pointer to VFS super block */ struct proc_dir_entry *s_proc; /* proc entry */ struct f2fs_super_block *raw_super; /* raw super block pointer */ + struct rw_semaphore sb_lock; /* lock for raw super block */ int valid_super_block; /* valid super block no */ unsigned long s_flag; /* flags for sbi */ + struct mutex writepages; /* mutex for writepages() */ -#ifdef CONFIG_F2FS_FS_ENCRYPTION - u8 key_prefix[F2FS_KEY_DESC_PREFIX_SIZE]; - u8 key_prefix_size; +#ifdef CONFIG_BLK_DEV_ZONED + unsigned int blocks_per_blkz; /* F2FS blocks per zone */ + unsigned int log_blocks_per_blkz; /* log2 F2FS blocks per zone */ #endif + /* for node-related operations */ struct f2fs_nm_info *nm_info; /* node manager */ struct inode *node_inode; /* cache node blocks */ @@ -759,9 +1148,12 @@ struct f2fs_sb_info { struct f2fs_sm_info *sm_info; /* segment manager */ /* for bio operations */ - struct f2fs_bio_info read_io; /* for read bios */ - struct f2fs_bio_info write_io[NR_PAGE_TYPE]; /* for write bios */ - struct mutex wio_mutex[NODE + 1]; /* bio ordering for NODE/DATA */ + struct f2fs_bio_info *write_io[NR_PAGE_TYPE]; /* for write bios */ + struct mutex wio_mutex[NR_PAGE_TYPE - 1][NR_TEMP_TYPE]; + /* bio ordering for NODE/DATA */ + /* keep migration IO order for LFS mode */ + struct rw_semaphore io_order_lock; + mempool_t *write_io_dummy; /* Dummy pages */ /* for checkpoint */ struct f2fs_checkpoint *ckpt; /* raw checkpoint pointer */ @@ -771,12 +1163,18 @@ struct f2fs_sb_info { struct mutex cp_mutex; /* checkpoint procedure lock */ struct rw_semaphore cp_rwsem; /* blocking FS operations */ struct rw_semaphore node_write; /* locking node writes */ + struct rw_semaphore node_change; /* locking node change */ wait_queue_head_t cp_wait; unsigned long last_time[MAX_TIME]; /* to store time in jiffies */ long interval_time[MAX_TIME]; /* to store thresholds */ struct inode_management im[MAX_INO_ENTRY]; /* manage inode cache */ + spinlock_t fsync_node_lock; /* for node entry lock */ + struct list_head fsync_node_list; /* node list head */ + unsigned int fsync_seg_id; /* sequence id */ + unsigned int fsync_node_num; /* number of node entries */ + /* for orphan inode, use 0'th array */ unsigned int max_orphans; /* max orphan inodes */ @@ -786,7 +1184,7 @@ struct f2fs_sb_info { /* for extent tree cache */ struct radix_tree_root extent_tree_root;/* cache extent cache entries */ - struct rw_semaphore extent_tree_lock; /* locking extent radix tree */ + struct mutex extent_tree_lock; /* locking extent radix tree */ struct list_head extent_list; /* lru list for shrinker */ spinlock_t extent_lock; /* locking extent lru list */ atomic_t total_ext_tree; /* extent tree count */ @@ -809,21 +1207,29 @@ struct f2fs_sb_info { unsigned int total_node_count; /* total node block count */ unsigned int total_valid_node_count; /* valid node block count */ loff_t max_file_blocks; /* max block index of file */ - int active_logs; /* # of active logs */ int dir_level; /* directory level */ + unsigned int trigger_ssr_threshold; /* threshold to trigger ssr */ + int readdir_ra; /* readahead inode in readdir */ block_t user_block_count; /* # of user blocks */ block_t total_valid_block_count; /* # of valid blocks */ block_t discard_blks; /* discard command candidats */ block_t last_valid_block_count; /* for recovery */ + block_t reserved_blocks; /* configurable reserved blocks */ + block_t current_reserved_blocks; /* current reserved blocks */ + + unsigned int nquota_files; /* # of quota sysfile */ + u32 s_next_generation; /* for NFS support */ - atomic_t nr_wb_bios; /* # of writeback bios */ /* # of pages, see count_type */ atomic_t nr_pages[NR_COUNT_TYPE]; /* # of allocated blocks */ struct percpu_counter alloc_valid_block_count; + /* writeback control */ + atomic_t wb_sync_req[META]; /* count # of WB_SYNC threads */ + /* valid inode count */ struct percpu_counter total_valid_inode_count; @@ -833,9 +1239,13 @@ struct f2fs_sb_info { struct mutex gc_mutex; /* mutex for GC */ struct f2fs_gc_kthread *gc_thread; /* GC thread */ unsigned int cur_victim_sec; /* current victim section num */ + unsigned int gc_mode; /* current GC state */ + /* for skip statistic */ + unsigned long long skipped_atomic_files[2]; /* FG_GC and BG_GC */ + unsigned long long skipped_gc_rwsem; /* FG_GC only */ - /* threshold for converting bg victims for fg */ - u64 fggc_threshold; + /* threshold for gc trials on pinned files */ + u64 gc_pin_file_threshold; /* maximum # of trials to find a victim segment for SSR and GC */ unsigned int max_victim_search; @@ -856,18 +1266,30 @@ struct f2fs_sb_info { atomic_t inline_xattr; /* # of inline_xattr inodes */ atomic_t inline_inode; /* # of inline_data inodes */ atomic_t inline_dir; /* # of inline_dentry inodes */ + atomic_t aw_cnt; /* # of atomic writes */ + atomic_t vw_cnt; /* # of volatile writes */ + atomic_t max_aw_cnt; /* max # of atomic writes */ + atomic_t max_vw_cnt; /* max # of volatile writes */ int bg_gc; /* background gc calls */ unsigned int ndirty_inode[NR_INODE_TYPE]; /* # of dirty inodes */ #endif - unsigned int last_victim[2]; /* last victim segment # */ spinlock_t stat_lock; /* lock for stat operations */ + /* For app/fs IO statistics */ + spinlock_t iostat_lock; + unsigned long long write_iostat[NR_IO_TYPE]; + bool iostat_enable; + /* For sysfs suppport */ struct kobject s_kobj; struct completion s_kobj_unregister; /* For shrinker support */ struct list_head s_list; + int s_ndevs; /* number of devices */ + struct f2fs_dev_info *devs; /* for device list */ + unsigned int dirty_device; /* for checkpoint data flush */ + spinlock_t dev_lock; /* protect dirty_device */ struct mutex umount_mutex; unsigned int shrinker_run_no; @@ -878,16 +1300,18 @@ struct f2fs_sb_info { /* Reference to checksum algorithm driver via cryptoapi */ struct crypto_shash *s_chksum_driver; - /* For fault injection */ -#ifdef CONFIG_F2FS_FAULT_INJECTION - struct f2fs_fault_info fault_info; -#endif + /* Precomputed FS UUID checksum for seeding other checksums */ + __u32 s_chksum_seed; }; #ifdef CONFIG_F2FS_FAULT_INJECTION +#define f2fs_show_injection_info(type) \ + printk("%sF2FS-fs : inject %s in %s of %pF\n", \ + KERN_INFO, f2fs_fault_name[type], \ + __func__, __builtin_return_address(0)) static inline bool time_to_inject(struct f2fs_sb_info *sbi, int type) { - struct f2fs_fault_info *ffi = &sbi->fault_info; + struct f2fs_fault_info *ffi = &F2FS_OPTION(sbi).fault_info; if (!ffi->inject_rate) return false; @@ -898,22 +1322,24 @@ static inline bool time_to_inject(struct f2fs_sb_info *sbi, int type) atomic_inc(&ffi->inject_ops); if (atomic_read(&ffi->inject_ops) >= ffi->inject_rate) { atomic_set(&ffi->inject_ops, 0); - printk("%sF2FS-fs : inject %s in %pF\n", - KERN_INFO, - fault_name[type], - __builtin_return_address(0)); return true; } return false; } +#else +#define f2fs_show_injection_info(type) do { } while (0) +static inline bool time_to_inject(struct f2fs_sb_info *sbi, int type) +{ + return false; +} #endif /* For write statistics. Suppose sector size is 512 bytes, * and the return value is in kbytes. s is of struct f2fs_sb_info. */ #define BD_PART_WRITTEN(s) \ -(((u64)part_stat_read(s->sb->s_bdev->bd_part, sectors[1]) - \ - s->sectors_written_start) >> 1) +(((u64)part_stat_read((s)->sb->s_bdev->bd_part, sectors[1]) - \ + (s)->sectors_written_start) >> 1) static inline void f2fs_update_time(struct f2fs_sb_info *sbi, int type) { @@ -922,8 +1348,7 @@ static inline void f2fs_update_time(struct f2fs_sb_info *sbi, int type) static inline bool f2fs_time_over(struct f2fs_sb_info *sbi, int type) { - struct timespec ts = {sbi->interval_time[type], 0}; - unsigned long interval = timespec_to_jiffies(&ts); + unsigned long interval = sbi->interval_time[type] * HZ; return time_after(jiffies, sbi->last_time[type] + interval); } @@ -935,7 +1360,7 @@ static inline bool is_idle(struct f2fs_sb_info *sbi) struct request_list *rl = &q->root_rl; if (rl->count[BLK_RW_SYNC] || rl->count[BLK_RW_ASYNC]) - return 0; + return false; return f2fs_time_over(sbi, REQ_TIME); } @@ -943,24 +1368,31 @@ static inline bool is_idle(struct f2fs_sb_info *sbi) /* * Inline functions */ +static inline u32 __f2fs_crc32(struct f2fs_sb_info *sbi, u32 crc, + const void *address, unsigned int length) +{ + struct { + struct shash_desc shash; + char ctx[4]; + } desc; + int err; + + BUG_ON(crypto_shash_descsize(sbi->s_chksum_driver) != sizeof(desc.ctx)); + + desc.shash.tfm = sbi->s_chksum_driver; + desc.shash.flags = 0; + *(u32 *)desc.ctx = crc; + + err = crypto_shash_update(&desc.shash, address, length); + BUG_ON(err); + + return *(u32 *)desc.ctx; +} + static inline u32 f2fs_crc32(struct f2fs_sb_info *sbi, const void *address, unsigned int length) { - SHASH_DESC_ON_STACK(shash, sbi->s_chksum_driver); - u32 *ctx = (u32 *)shash_desc_ctx(shash); - u32 retval; - int err; - - shash->tfm = sbi->s_chksum_driver; - shash->flags = 0; - *ctx = F2FS_SUPER_MAGIC; - - err = crypto_shash_update(shash, address, length); - BUG_ON(err); - - retval = *ctx; - barrier_data(ctx); - return retval; + return __f2fs_crc32(sbi, F2FS_SUPER_MAGIC, address, length); } static inline bool f2fs_crc_valid(struct f2fs_sb_info *sbi, __u32 blk_crc, @@ -969,6 +1401,12 @@ static inline bool f2fs_crc_valid(struct f2fs_sb_info *sbi, __u32 blk_crc, return f2fs_crc32(sbi, buf, buf_size) == blk_crc; } +static inline u32 f2fs_chksum(struct f2fs_sb_info *sbi, u32 crc, + const void *address, unsigned int length) +{ + return __f2fs_crc32(sbi, crc, address, length); +} + static inline struct f2fs_inode_info *F2FS_I(struct inode *inode) { return container_of(inode, struct f2fs_inode_info, vfs_inode); @@ -1069,6 +1507,19 @@ static inline unsigned long long cur_cp_version(struct f2fs_checkpoint *cp) return le64_to_cpu(cp->checkpoint_ver); } +static inline unsigned long f2fs_qf_ino(struct super_block *sb, int type) +{ + if (type < F2FS_MAX_QUOTAS) + return le32_to_cpu(F2FS_SB(sb)->raw_super->qf_ino[type]); + return 0; +} + +static inline __u64 cur_cp_crc(struct f2fs_checkpoint *cp) +{ + size_t crc_offset = le32_to_cpu(cp->checksum_offset); + return le32_to_cpu(*((__le32 *)((unsigned char *)cp + crc_offset))); +} + static inline bool __is_set_ckpt_flags(struct f2fs_checkpoint *cp, unsigned int f) { unsigned int ckpt_flags = le32_to_cpu(cp->ckpt_flags); @@ -1092,9 +1543,11 @@ static inline void __set_ckpt_flags(struct f2fs_checkpoint *cp, unsigned int f) static inline void set_ckpt_flags(struct f2fs_sb_info *sbi, unsigned int f) { - spin_lock(&sbi->cp_lock); + unsigned long flags; + + spin_lock_irqsave(&sbi->cp_lock, flags); __set_ckpt_flags(F2FS_CKPT(sbi), f); - spin_unlock(&sbi->cp_lock); + spin_unlock_irqrestore(&sbi->cp_lock, flags); } static inline void __clear_ckpt_flags(struct f2fs_checkpoint *cp, unsigned int f) @@ -1108,16 +1561,34 @@ static inline void __clear_ckpt_flags(struct f2fs_checkpoint *cp, unsigned int f static inline void clear_ckpt_flags(struct f2fs_sb_info *sbi, unsigned int f) { - spin_lock(&sbi->cp_lock); + unsigned long flags; + + spin_lock_irqsave(&sbi->cp_lock, flags); __clear_ckpt_flags(F2FS_CKPT(sbi), f); - spin_unlock(&sbi->cp_lock); + spin_unlock_irqrestore(&sbi->cp_lock, flags); } -static inline bool f2fs_discard_en(struct f2fs_sb_info *sbi) +static inline void disable_nat_bits(struct f2fs_sb_info *sbi, bool lock) { - struct request_queue *q = bdev_get_queue(sbi->sb->s_bdev); + unsigned long flags; - return blk_queue_discard(q); + set_sbi_flag(sbi, SBI_NEED_FSCK); + + if (lock) + spin_lock_irqsave(&sbi->cp_lock, flags); + __clear_ckpt_flags(F2FS_CKPT(sbi), CP_NAT_BITS_FLAG); + kfree(NM_I(sbi)->nat_bits); + NM_I(sbi)->nat_bits = NULL; + if (lock) + spin_unlock_irqrestore(&sbi->cp_lock, flags); +} + +static inline bool enabled_nat_bits(struct f2fs_sb_info *sbi, + struct cp_control *cpc) +{ + bool set = is_set_ckpt_flags(sbi, CP_NAT_BITS_FLAG); + + return (cpc) ? (cpc->reason & CP_UMOUNT) && set : set; } static inline void f2fs_lock_op(struct f2fs_sb_info *sbi) @@ -1125,6 +1596,11 @@ static inline void f2fs_lock_op(struct f2fs_sb_info *sbi) down_read(&sbi->cp_rwsem); } +static inline int f2fs_trylock_op(struct f2fs_sb_info *sbi) +{ + return down_read_trylock(&sbi->cp_rwsem); +} + static inline void f2fs_unlock_op(struct f2fs_sb_info *sbi) { up_read(&sbi->cp_rwsem); @@ -1153,7 +1629,7 @@ static inline int __get_cp_reason(struct f2fs_sb_info *sbi) static inline bool __remain_node_summaries(int reason) { - return (reason == CP_UMOUNT || reason == CP_FASTBOOT); + return (reason & (CP_UMOUNT | CP_FASTBOOT)); } static inline bool __exist_node_summaries(struct f2fs_sb_info *sbi) @@ -1163,28 +1639,13 @@ static inline bool __exist_node_summaries(struct f2fs_sb_info *sbi) } /* - * Check whether the given nid is within node id range. - */ -static inline int check_nid_range(struct f2fs_sb_info *sbi, nid_t nid) -{ - if (unlikely(nid < F2FS_ROOT_INO(sbi))) - return -EINVAL; - if (unlikely(nid >= NM_I(sbi)->max_nid)) - return -EINVAL; - return 0; -} - -#define F2FS_DEFAULT_ALLOCATED_BLOCKS 1 - -/* * Check whether the inode has blocks or not */ static inline int F2FS_HAS_BLOCKS(struct inode *inode) { - if (F2FS_I(inode)->i_xattr_nid) - return inode->i_blocks > F2FS_DEFAULT_ALLOCATED_BLOCKS + 1; - else - return inode->i_blocks > F2FS_DEFAULT_ALLOCATED_BLOCKS; + block_t xattr_block = F2FS_I(inode)->i_xattr_nid ? 1 : 0; + + return (inode->i_blocks >> F2FS_LOG_SECTORS_PER_BLOCK) > xattr_block; } static inline bool f2fs_has_xattr_block(unsigned int ofs) @@ -1192,16 +1653,43 @@ static inline bool f2fs_has_xattr_block(unsigned int ofs) return ofs == XATTR_NODE_OFFSET; } -static inline void f2fs_i_blocks_write(struct inode *, blkcnt_t, bool); -static inline bool inc_valid_block_count(struct f2fs_sb_info *sbi, +static inline bool __allow_reserved_blocks(struct f2fs_sb_info *sbi, + struct inode *inode, bool cap) +{ + if (!inode) + return true; + if (!test_opt(sbi, RESERVE_ROOT)) + return false; + if (IS_NOQUOTA(inode)) + return true; + if (uid_eq(F2FS_OPTION(sbi).s_resuid, current_fsuid())) + return true; + if (!gid_eq(F2FS_OPTION(sbi).s_resgid, GLOBAL_ROOT_GID) && + in_group_p(F2FS_OPTION(sbi).s_resgid)) + return true; + if (cap && capable(CAP_SYS_RESOURCE)) + return true; + return false; +} + +static inline void f2fs_i_blocks_write(struct inode *, block_t, bool, bool); +static inline int inc_valid_block_count(struct f2fs_sb_info *sbi, struct inode *inode, blkcnt_t *count) { - blkcnt_t diff; + blkcnt_t diff = 0, release = 0; + block_t avail_user_block_count; + int ret; -#ifdef CONFIG_F2FS_FAULT_INJECTION - if (time_to_inject(sbi, FAULT_BLOCK)) - return false; -#endif + ret = dquot_reserve_block(inode, *count); + if (ret) + return ret; + + if (time_to_inject(sbi, FAULT_BLOCK)) { + f2fs_show_injection_info(FAULT_BLOCK); + release = *count; + goto enospc; + } + /* * let's increase this in prior to actual block count change in order * for f2fs_sync_file to avoid data races when deciding checkpoint. @@ -1210,39 +1698,63 @@ static inline bool inc_valid_block_count(struct f2fs_sb_info *sbi, spin_lock(&sbi->stat_lock); sbi->total_valid_block_count += (block_t)(*count); - if (unlikely(sbi->total_valid_block_count > sbi->user_block_count)) { - diff = sbi->total_valid_block_count - sbi->user_block_count; + avail_user_block_count = sbi->user_block_count - + sbi->current_reserved_blocks; + + if (!__allow_reserved_blocks(sbi, inode, true)) + avail_user_block_count -= F2FS_OPTION(sbi).root_reserved_blocks; + + if (unlikely(sbi->total_valid_block_count > avail_user_block_count)) { + diff = sbi->total_valid_block_count - avail_user_block_count; + if (diff > *count) + diff = *count; *count -= diff; - sbi->total_valid_block_count = sbi->user_block_count; + release = diff; + sbi->total_valid_block_count -= diff; if (!*count) { spin_unlock(&sbi->stat_lock); - percpu_counter_sub(&sbi->alloc_valid_block_count, diff); - return false; + goto enospc; } } spin_unlock(&sbi->stat_lock); - f2fs_i_blocks_write(inode, *count, true); - return true; + if (unlikely(release)) { + percpu_counter_sub(&sbi->alloc_valid_block_count, release); + dquot_release_reservation_block(inode, release); + } + f2fs_i_blocks_write(inode, *count, true, true); + return 0; + +enospc: + percpu_counter_sub(&sbi->alloc_valid_block_count, release); + dquot_release_reservation_block(inode, release); + return -ENOSPC; } static inline void dec_valid_block_count(struct f2fs_sb_info *sbi, struct inode *inode, - blkcnt_t count) + block_t count) { + blkcnt_t sectors = count << F2FS_LOG_SECTORS_PER_BLOCK; + spin_lock(&sbi->stat_lock); f2fs_bug_on(sbi, sbi->total_valid_block_count < (block_t) count); - f2fs_bug_on(sbi, inode->i_blocks < count); + f2fs_bug_on(sbi, inode->i_blocks < sectors); sbi->total_valid_block_count -= (block_t)count; + if (sbi->reserved_blocks && + sbi->current_reserved_blocks < sbi->reserved_blocks) + sbi->current_reserved_blocks = min(sbi->reserved_blocks, + sbi->current_reserved_blocks + count); spin_unlock(&sbi->stat_lock); - f2fs_i_blocks_write(inode, count, false); + f2fs_i_blocks_write(inode, count, false, true); } static inline void inc_page_count(struct f2fs_sb_info *sbi, int count_type) { atomic_inc(&sbi->nr_pages[count_type]); - if (count_type == F2FS_DIRTY_DATA || count_type == F2FS_INMEM_PAGES) + if (count_type == F2FS_DIRTY_DATA || count_type == F2FS_INMEM_PAGES || + count_type == F2FS_WB_CP_DATA || count_type == F2FS_WB_DATA) return; set_sbi_flag(sbi, SBI_IS_DIRTY); @@ -1253,6 +1765,8 @@ static inline void inode_inc_dirty_pages(struct inode *inode) atomic_inc(&F2FS_I(inode)->dirty_pages); inc_page_count(F2FS_I_SB(inode), S_ISDIR(inode->i_mode) ? F2FS_DIRTY_DENTS : F2FS_DIRTY_DATA); + if (IS_NOQUOTA(inode)) + inc_page_count(F2FS_I_SB(inode), F2FS_DIRTY_QDATA); } static inline void dec_page_count(struct f2fs_sb_info *sbi, int count_type) @@ -1269,6 +1783,8 @@ static inline void inode_dec_dirty_pages(struct inode *inode) atomic_dec(&F2FS_I(inode)->dirty_pages); dec_page_count(F2FS_I_SB(inode), S_ISDIR(inode->i_mode) ? F2FS_DIRTY_DENTS : F2FS_DIRTY_DATA); + if (IS_NOQUOTA(inode)) + dec_page_count(F2FS_I_SB(inode), F2FS_DIRTY_QDATA); } static inline s64 get_pages(struct f2fs_sb_info *sbi, int count_type) @@ -1323,6 +1839,12 @@ static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag) struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi); int offset; + if (is_set_ckpt_flags(sbi, CP_LARGE_NAT_BITMAP_FLAG)) { + offset = (flag == SIT_BITMAP) ? + le32_to_cpu(ckpt->nat_ver_bitmap_bytesize) : 0; + return &ckpt->sit_nat_version_bitmap + offset; + } + if (__cp_payload(sbi) > 0) { if (flag == NAT_BITMAP) return &ckpt->sit_nat_version_bitmap; @@ -1363,51 +1885,82 @@ static inline block_t __start_sum_addr(struct f2fs_sb_info *sbi) return le32_to_cpu(F2FS_CKPT(sbi)->cp_pack_start_sum); } -static inline bool inc_valid_node_count(struct f2fs_sb_info *sbi, - struct inode *inode) +static inline int inc_valid_node_count(struct f2fs_sb_info *sbi, + struct inode *inode, bool is_inode) { block_t valid_block_count; unsigned int valid_node_count; + bool quota = inode && !is_inode; + + if (quota) { + int ret = dquot_reserve_block(inode, 1); + if (ret) + return ret; + } + + if (time_to_inject(sbi, FAULT_BLOCK)) { + f2fs_show_injection_info(FAULT_BLOCK); + goto enospc; + } spin_lock(&sbi->stat_lock); - valid_block_count = sbi->total_valid_block_count + 1; + valid_block_count = sbi->total_valid_block_count + + sbi->current_reserved_blocks + 1; + + if (!__allow_reserved_blocks(sbi, inode, false)) + valid_block_count += F2FS_OPTION(sbi).root_reserved_blocks; + if (unlikely(valid_block_count > sbi->user_block_count)) { spin_unlock(&sbi->stat_lock); - return false; + goto enospc; } valid_node_count = sbi->total_valid_node_count + 1; if (unlikely(valid_node_count > sbi->total_node_count)) { spin_unlock(&sbi->stat_lock); - return false; + goto enospc; } - if (inode) - f2fs_i_blocks_write(inode, 1, true); - sbi->total_valid_node_count++; sbi->total_valid_block_count++; spin_unlock(&sbi->stat_lock); + if (inode) { + if (is_inode) + f2fs_mark_inode_dirty_sync(inode, true); + else + f2fs_i_blocks_write(inode, 1, true, true); + } + percpu_counter_inc(&sbi->alloc_valid_block_count); - return true; + return 0; + +enospc: + if (quota) + dquot_release_reservation_block(inode, 1); + return -ENOSPC; } static inline void dec_valid_node_count(struct f2fs_sb_info *sbi, - struct inode *inode) + struct inode *inode, bool is_inode) { spin_lock(&sbi->stat_lock); f2fs_bug_on(sbi, !sbi->total_valid_block_count); f2fs_bug_on(sbi, !sbi->total_valid_node_count); - f2fs_bug_on(sbi, !inode->i_blocks); + f2fs_bug_on(sbi, !is_inode && !inode->i_blocks); - f2fs_i_blocks_write(inode, 1, false); sbi->total_valid_node_count--; sbi->total_valid_block_count--; + if (sbi->reserved_blocks && + sbi->current_reserved_blocks < sbi->reserved_blocks) + sbi->current_reserved_blocks++; spin_unlock(&sbi->stat_lock); + + if (!is_inode) + f2fs_i_blocks_write(inode, 1, false, true); } static inline unsigned int valid_node_count(struct f2fs_sb_info *sbi) @@ -1433,19 +1986,40 @@ static inline s64 valid_inode_count(struct f2fs_sb_info *sbi) static inline struct page *f2fs_grab_cache_page(struct address_space *mapping, pgoff_t index, bool for_write) { -#ifdef CONFIG_F2FS_FAULT_INJECTION - struct page *page = find_lock_page(mapping, index); - if (page) - return page; + struct page *page; - if (time_to_inject(F2FS_M_SB(mapping), FAULT_PAGE_ALLOC)) - return NULL; -#endif + if (IS_ENABLED(CONFIG_F2FS_FAULT_INJECTION)) { + if (!for_write) + page = find_get_page_flags(mapping, index, + FGP_LOCK | FGP_ACCESSED); + else + page = find_lock_page(mapping, index); + if (page) + return page; + + if (time_to_inject(F2FS_M_SB(mapping), FAULT_PAGE_ALLOC)) { + f2fs_show_injection_info(FAULT_PAGE_ALLOC); + return NULL; + } + } + if (!for_write) return grab_cache_page(mapping, index); return grab_cache_page_write_begin(mapping, index, AOP_FLAG_NOFS); } +static inline struct page *f2fs_pagecache_get_page( + struct address_space *mapping, pgoff_t index, + int fgp_flags, gfp_t gfp_mask) +{ + if (time_to_inject(F2FS_M_SB(mapping), FAULT_PAGE_GET)) { + f2fs_show_injection_info(FAULT_PAGE_GET); + return NULL; + } + + return pagecache_get_page(mapping, index, fgp_flags, gfp_mask); +} + static inline void f2fs_copy_page(struct page *src, struct page *dst) { char *src_kaddr = kmap(src); @@ -1495,15 +2069,24 @@ static inline void *f2fs_kmem_cache_alloc(struct kmem_cache *cachep, return entry; } -static inline struct bio *f2fs_bio_alloc(int npages) +static inline struct bio *f2fs_bio_alloc(struct f2fs_sb_info *sbi, + int npages, bool no_fail) { struct bio *bio; - /* No failure on bio allocation */ - bio = bio_alloc(GFP_NOIO, npages); - if (!bio) - bio = bio_alloc(GFP_NOIO | __GFP_NOFAIL, npages); - return bio; + if (no_fail) { + /* No failure on bio allocation */ + bio = bio_alloc(GFP_NOIO, npages); + if (!bio) + bio = bio_alloc(GFP_NOIO | __GFP_NOFAIL, npages); + return bio; + } + if (time_to_inject(sbi, FAULT_ALLOC_BIO)) { + f2fs_show_injection_info(FAULT_ALLOC_BIO); + return NULL; + } + + return bio_alloc(GFP_KERNEL, npages); } static inline void f2fs_radix_tree_insert(struct radix_tree_root *root, @@ -1518,22 +2101,42 @@ static inline void f2fs_radix_tree_insert(struct radix_tree_root *root, static inline bool IS_INODE(struct page *page) { struct f2fs_node *p = F2FS_NODE(page); + return RAW_IS_INODE(p); } +static inline int offset_in_addr(struct f2fs_inode *i) +{ + return (i->i_inline & F2FS_EXTRA_ATTR) ? + (le16_to_cpu(i->i_extra_isize) / sizeof(__le32)) : 0; +} + static inline __le32 *blkaddr_in_node(struct f2fs_node *node) { return RAW_IS_INODE(node) ? node->i.i_addr : node->dn.addr; } -static inline block_t datablock_addr(struct page *node_page, - unsigned int offset) +static inline int f2fs_has_extra_attr(struct inode *inode); +static inline block_t datablock_addr(struct inode *inode, + struct page *node_page, unsigned int offset) { struct f2fs_node *raw_node; __le32 *addr_array; + int base = 0; + bool is_inode = IS_INODE(node_page); + raw_node = F2FS_NODE(node_page); + + /* from GC path only */ + if (is_inode) { + if (!inode) + base = offset_in_addr(&raw_node->i); + else if (f2fs_has_extra_attr(inode)) + base = get_extra_isize(inode); + } + addr_array = blkaddr_in_node(raw_node); - return le32_to_cpu(addr_array[offset]); + return le32_to_cpu(addr_array[base + offset]); } static inline int f2fs_test_bit(unsigned int nr, char *addr) @@ -1596,6 +2199,71 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr) *addr ^= mask; } +/* + * Inode flags + */ +#define F2FS_SECRM_FL 0x00000001 /* Secure deletion */ +#define F2FS_UNRM_FL 0x00000002 /* Undelete */ +#define F2FS_COMPR_FL 0x00000004 /* Compress file */ +#define F2FS_SYNC_FL 0x00000008 /* Synchronous updates */ +#define F2FS_IMMUTABLE_FL 0x00000010 /* Immutable file */ +#define F2FS_APPEND_FL 0x00000020 /* writes to file may only append */ +#define F2FS_NODUMP_FL 0x00000040 /* do not dump file */ +#define F2FS_NOATIME_FL 0x00000080 /* do not update atime */ +/* Reserved for compression usage... */ +#define F2FS_DIRTY_FL 0x00000100 +#define F2FS_COMPRBLK_FL 0x00000200 /* One or more compressed clusters */ +#define F2FS_NOCOMPR_FL 0x00000400 /* Don't compress */ +#define F2FS_ENCRYPT_FL 0x00000800 /* encrypted file */ +/* End compression flags --- maybe not all used */ +#define F2FS_INDEX_FL 0x00001000 /* hash-indexed directory */ +#define F2FS_IMAGIC_FL 0x00002000 /* AFS directory */ +#define F2FS_JOURNAL_DATA_FL 0x00004000 /* file data should be journaled */ +#define F2FS_NOTAIL_FL 0x00008000 /* file tail should not be merged */ +#define F2FS_DIRSYNC_FL 0x00010000 /* dirsync behaviour (directories only) */ +#define F2FS_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/ +#define F2FS_HUGE_FILE_FL 0x00040000 /* Set to each huge file */ +#define F2FS_EXTENTS_FL 0x00080000 /* Inode uses extents */ +#define F2FS_EA_INODE_FL 0x00200000 /* Inode used for large EA */ +#define F2FS_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */ +#define F2FS_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */ +#define F2FS_PROJINHERIT_FL 0x20000000 /* Create with parents projid */ +#define F2FS_RESERVED_FL 0x80000000 /* reserved for ext4 lib */ + +#define F2FS_FL_USER_VISIBLE 0x304BDFFF /* User visible flags */ +#define F2FS_FL_USER_MODIFIABLE 0x204BC0FF /* User modifiable flags */ + +/* Flags we can manipulate with through F2FS_IOC_FSSETXATTR */ +#define F2FS_FL_XFLAG_VISIBLE (F2FS_SYNC_FL | \ + F2FS_IMMUTABLE_FL | \ + F2FS_APPEND_FL | \ + F2FS_NODUMP_FL | \ + F2FS_NOATIME_FL | \ + F2FS_PROJINHERIT_FL) + +/* Flags that should be inherited by new inodes from their parent. */ +#define F2FS_FL_INHERITED (F2FS_SECRM_FL | F2FS_UNRM_FL | F2FS_COMPR_FL |\ + F2FS_SYNC_FL | F2FS_NODUMP_FL | F2FS_NOATIME_FL |\ + F2FS_NOCOMPR_FL | F2FS_JOURNAL_DATA_FL |\ + F2FS_NOTAIL_FL | F2FS_DIRSYNC_FL |\ + F2FS_PROJINHERIT_FL) + +/* Flags that are appropriate for regular files (all but dir-specific ones). */ +#define F2FS_REG_FLMASK (~(F2FS_DIRSYNC_FL | F2FS_TOPDIR_FL)) + +/* Flags that are appropriate for non-directories/regular files. */ +#define F2FS_OTHER_FLMASK (F2FS_NODUMP_FL | F2FS_NOATIME_FL) + +static inline __u32 f2fs_mask_flags(umode_t mode, __u32 flags) +{ + if (S_ISDIR(mode)) + return flags; + else if (S_ISREG(mode)) + return flags & F2FS_REG_FLMASK; + else + return flags & F2FS_OTHER_FLMASK; +} + /* used for f2fs_inode_info->flags */ enum { FI_NEW_INODE, /* indicate newly allocated inode */ @@ -1614,6 +2282,7 @@ enum { FI_UPDATE_WRITE, /* inode has in-place-update data */ FI_NEED_IPU, /* used for ipu per file */ FI_ATOMIC_FILE, /* indicate atomic file */ + FI_ATOMIC_COMMIT, /* indicate the state of atomical committing */ FI_VOLATILE_FILE, /* indicate volatile file */ FI_FIRST_BLOCK_WRITTEN, /* indicate #0 data block was written */ FI_DROP_CACHE, /* drop dirty page cache */ @@ -1621,6 +2290,12 @@ enum { FI_INLINE_DOTS, /* indicate inline dot dentries */ FI_DO_DEFRAG, /* indicate defragment is running */ FI_DIRTY_FILE, /* indicate regular/symlink has dirty pages */ + FI_NO_PREALLOC, /* indicate skipped preallocated blocks */ + FI_HOT_DATA, /* indicate file is hot */ + FI_EXTRA_ATTR, /* indicate file has extra attribute */ + FI_PROJ_INHERIT, /* indicate file inherits projectid */ + FI_PIN_FILE, /* indicate file should not be gced */ + FI_ATOMIC_REVOKE_REQUEST, /* request to drop atomic data */ }; static inline void __mark_inode_dirty_flag(struct inode *inode, @@ -1630,11 +2305,13 @@ static inline void __mark_inode_dirty_flag(struct inode *inode, case FI_INLINE_XATTR: case FI_INLINE_DATA: case FI_INLINE_DENTRY: + case FI_NEW_INODE: if (set) return; case FI_DATA_EXIST: case FI_INLINE_DOTS: - f2fs_mark_inode_dirty_sync(inode); + case FI_PIN_FILE: + f2fs_mark_inode_dirty_sync(inode, true); } } @@ -1661,7 +2338,7 @@ static inline void set_acl_inode(struct inode *inode, umode_t mode) { F2FS_I(inode)->i_acl_mode = mode; set_inode_flag(inode, FI_ACL_MODE); - f2fs_mark_inode_dirty_sync(inode); + f2fs_mark_inode_dirty_sync(inode, false); } static inline void f2fs_i_links_write(struct inode *inode, bool inc) @@ -1670,18 +2347,26 @@ static inline void f2fs_i_links_write(struct inode *inode, bool inc) inc_nlink(inode); else drop_nlink(inode); - f2fs_mark_inode_dirty_sync(inode); + f2fs_mark_inode_dirty_sync(inode, true); } static inline void f2fs_i_blocks_write(struct inode *inode, - blkcnt_t diff, bool add) + block_t diff, bool add, bool claim) { bool clean = !is_inode_flag_set(inode, FI_DIRTY_INODE); bool recover = is_inode_flag_set(inode, FI_AUTO_RECOVER); - inode->i_blocks = add ? inode->i_blocks + diff : - inode->i_blocks - diff; - f2fs_mark_inode_dirty_sync(inode); + /* add = 1, claim = 1 should be dquot_reserve_block in pair */ + if (add) { + if (claim) + dquot_claim_block(inode, diff); + else + dquot_alloc_block_nofail(inode, diff); + } else { + dquot_free_block(inode, diff); + } + + f2fs_mark_inode_dirty_sync(inode, true); if (clean || recover) set_inode_flag(inode, FI_AUTO_RECOVER); } @@ -1695,34 +2380,34 @@ static inline void f2fs_i_size_write(struct inode *inode, loff_t i_size) return; i_size_write(inode, i_size); - f2fs_mark_inode_dirty_sync(inode); + f2fs_mark_inode_dirty_sync(inode, true); if (clean || recover) set_inode_flag(inode, FI_AUTO_RECOVER); } -static inline bool f2fs_skip_inode_update(struct inode *inode) -{ - if (!is_inode_flag_set(inode, FI_AUTO_RECOVER)) - return false; - return F2FS_I(inode)->last_disk_size == i_size_read(inode); -} - static inline void f2fs_i_depth_write(struct inode *inode, unsigned int depth) { F2FS_I(inode)->i_current_depth = depth; - f2fs_mark_inode_dirty_sync(inode); + f2fs_mark_inode_dirty_sync(inode, true); +} + +static inline void f2fs_i_gc_failures_write(struct inode *inode, + unsigned int count) +{ + F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN] = count; + f2fs_mark_inode_dirty_sync(inode, true); } static inline void f2fs_i_xnid_write(struct inode *inode, nid_t xnid) { F2FS_I(inode)->i_xattr_nid = xnid; - f2fs_mark_inode_dirty_sync(inode); + f2fs_mark_inode_dirty_sync(inode, true); } static inline void f2fs_i_pino_write(struct inode *inode, nid_t pino) { F2FS_I(inode)->i_pino = pino; - f2fs_mark_inode_dirty_sync(inode); + f2fs_mark_inode_dirty_sync(inode, true); } static inline void get_inline_info(struct inode *inode, struct f2fs_inode *ri) @@ -1739,6 +2424,10 @@ static inline void get_inline_info(struct inode *inode, struct f2fs_inode *ri) set_bit(FI_DATA_EXIST, &fi->flags); if (ri->i_inline & F2FS_INLINE_DOTS) set_bit(FI_INLINE_DOTS, &fi->flags); + if (ri->i_inline & F2FS_EXTRA_ATTR) + set_bit(FI_EXTRA_ATTR, &fi->flags); + if (ri->i_inline & F2FS_PIN_FILE) + set_bit(FI_PIN_FILE, &fi->flags); } static inline void set_raw_inline(struct inode *inode, struct f2fs_inode *ri) @@ -1755,6 +2444,15 @@ static inline void set_raw_inline(struct inode *inode, struct f2fs_inode *ri) ri->i_inline |= F2FS_DATA_EXIST; if (is_inode_flag_set(inode, FI_INLINE_DOTS)) ri->i_inline |= F2FS_INLINE_DOTS; + if (is_inode_flag_set(inode, FI_EXTRA_ATTR)) + ri->i_inline |= F2FS_EXTRA_ATTR; + if (is_inode_flag_set(inode, FI_PIN_FILE)) + ri->i_inline |= F2FS_PIN_FILE; +} + +static inline int f2fs_has_extra_attr(struct inode *inode) +{ + return is_inode_flag_set(inode, FI_EXTRA_ATTR); } static inline int f2fs_has_inline_xattr(struct inode *inode) @@ -1764,24 +2462,20 @@ static inline int f2fs_has_inline_xattr(struct inode *inode) static inline unsigned int addrs_per_inode(struct inode *inode) { - if (f2fs_has_inline_xattr(inode)) - return DEF_ADDRS_PER_INODE - F2FS_INLINE_XATTR_ADDRS; - return DEF_ADDRS_PER_INODE; + return CUR_ADDRS_PER_INODE(inode) - get_inline_xattr_addrs(inode); } -static inline void *inline_xattr_addr(struct page *page) +static inline void *inline_xattr_addr(struct inode *inode, struct page *page) { struct f2fs_inode *ri = F2FS_INODE(page); + return (void *)&(ri->i_addr[DEF_ADDRS_PER_INODE - - F2FS_INLINE_XATTR_ADDRS]); + get_inline_xattr_addrs(inode)]); } static inline int inline_xattr_size(struct inode *inode) { - if (f2fs_has_inline_xattr(inode)) - return F2FS_INLINE_XATTR_ADDRS << 2; - else - return 0; + return get_inline_xattr_addrs(inode) * sizeof(__le32); } static inline int f2fs_has_inline_data(struct inode *inode) @@ -1789,12 +2483,6 @@ static inline int f2fs_has_inline_data(struct inode *inode) return is_inode_flag_set(inode, FI_INLINE_DATA); } -static inline void f2fs_clear_inline_inode(struct inode *inode) -{ - clear_inode_flag(inode, FI_INLINE_DATA); - clear_inode_flag(inode, FI_DATA_EXIST); -} - static inline int f2fs_exist_data(struct inode *inode) { return is_inode_flag_set(inode, FI_DATA_EXIST); @@ -1805,11 +2493,21 @@ static inline int f2fs_has_inline_dots(struct inode *inode) return is_inode_flag_set(inode, FI_INLINE_DOTS); } +static inline bool f2fs_is_pinned_file(struct inode *inode) +{ + return is_inode_flag_set(inode, FI_PIN_FILE); +} + static inline bool f2fs_is_atomic_file(struct inode *inode) { return is_inode_flag_set(inode, FI_ATOMIC_FILE); } +static inline bool f2fs_is_commit_atomic_write(struct inode *inode) +{ + return is_inode_flag_set(inode, FI_ATOMIC_COMMIT); +} + static inline bool f2fs_is_volatile_file(struct inode *inode) { return is_inode_flag_set(inode, FI_VOLATILE_FILE); @@ -1825,10 +2523,12 @@ static inline bool f2fs_is_drop_cache(struct inode *inode) return is_inode_flag_set(inode, FI_DROP_CACHE); } -static inline void *inline_data_addr(struct page *page) +static inline void *inline_data_addr(struct inode *inode, struct page *page) { struct f2fs_inode *ri = F2FS_INODE(page); - return (void *)&(ri->i_addr[1]); + int extra_size = get_extra_isize(inode); + + return (void *)&(ri->i_addr[extra_size + DEF_INLINE_RESERVED_SIZE]); } static inline int f2fs_has_inline_dentry(struct inode *inode) @@ -1836,12 +2536,6 @@ static inline int f2fs_has_inline_dentry(struct inode *inode) return is_inode_flag_set(inode, FI_INLINE_DENTRY); } -static inline void f2fs_dentry_kunmap(struct inode *dir, struct page *page) -{ - if (!f2fs_has_inline_dentry(dir)) - kunmap(page); -} - static inline int is_file(struct inode *inode, int type) { return F2FS_I(inode)->i_advise & type; @@ -1850,16 +2544,50 @@ static inline int is_file(struct inode *inode, int type) static inline void set_file(struct inode *inode, int type) { F2FS_I(inode)->i_advise |= type; - f2fs_mark_inode_dirty_sync(inode); + f2fs_mark_inode_dirty_sync(inode, true); } static inline void clear_file(struct inode *inode, int type) { F2FS_I(inode)->i_advise &= ~type; - f2fs_mark_inode_dirty_sync(inode); + f2fs_mark_inode_dirty_sync(inode, true); } -static inline int f2fs_readonly(struct super_block *sb) +static inline bool f2fs_skip_inode_update(struct inode *inode, int dsync) +{ + bool ret; + + if (dsync) { + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + + spin_lock(&sbi->inode_lock[DIRTY_META]); + ret = list_empty(&F2FS_I(inode)->gdirty_list); + spin_unlock(&sbi->inode_lock[DIRTY_META]); + return ret; + } + if (!is_inode_flag_set(inode, FI_AUTO_RECOVER) || + file_keep_isize(inode) || + i_size_read(inode) & ~PAGE_MASK) + return false; + + if (!timespec_equal(F2FS_I(inode)->i_disk_time, &inode->i_atime)) + return false; + if (!timespec_equal(F2FS_I(inode)->i_disk_time + 1, &inode->i_ctime)) + return false; + if (!timespec_equal(F2FS_I(inode)->i_disk_time + 2, &inode->i_mtime)) + return false; + if (!timespec_equal(F2FS_I(inode)->i_disk_time + 3, + &F2FS_I(inode)->i_crtime)) + return false; + + down_read(&F2FS_I(inode)->i_sem); + ret = F2FS_I(inode)->last_disk_size == i_size_read(inode); + up_read(&F2FS_I(inode)->i_sem); + + return ret; +} + +static inline bool f2fs_readonly(struct super_block *sb) { return sb->s_flags & MS_RDONLY; } @@ -1892,14 +2620,15 @@ static inline bool f2fs_may_extent_tree(struct inode *inode) static inline void *f2fs_kmalloc(struct f2fs_sb_info *sbi, size_t size, gfp_t flags) { -#ifdef CONFIG_F2FS_FAULT_INJECTION - if (time_to_inject(sbi, FAULT_KMALLOC)) + if (time_to_inject(sbi, FAULT_KMALLOC)) { + f2fs_show_injection_info(FAULT_KMALLOC); return NULL; -#endif + } + return kmalloc(size, flags); } -static inline void *f2fs_kvmalloc(size_t size, gfp_t flags) +static inline void *kvmalloc(size_t size, gfp_t flags) { void *ret; @@ -1909,7 +2638,7 @@ static inline void *f2fs_kvmalloc(size_t size, gfp_t flags) return ret; } -static inline void *f2fs_kvzalloc(size_t size, gfp_t flags) +static inline void *kvzalloc(size_t size, gfp_t flags) { void *ret; @@ -1919,102 +2648,220 @@ static inline void *f2fs_kvzalloc(size_t size, gfp_t flags) return ret; } -#define get_inode_mode(i) \ +static inline int wbc_to_write_flags(struct writeback_control *wbc) +{ + if (wbc->sync_mode == WB_SYNC_ALL) + return REQ_SYNC; + else if (wbc->for_kupdate || wbc->for_background) + return 0; + + return 0; +} + +static inline void *f2fs_kzalloc(struct f2fs_sb_info *sbi, + size_t size, gfp_t flags) +{ + return f2fs_kmalloc(sbi, size, flags | __GFP_ZERO); +} + +static inline void *f2fs_kvmalloc(struct f2fs_sb_info *sbi, + size_t size, gfp_t flags) +{ + if (time_to_inject(sbi, FAULT_KVMALLOC)) { + f2fs_show_injection_info(FAULT_KVMALLOC); + return NULL; + } + + return kvmalloc(size, flags); +} + +static inline void *f2fs_kvzalloc(struct f2fs_sb_info *sbi, + size_t size, gfp_t flags) +{ + return f2fs_kvmalloc(sbi, size, flags | __GFP_ZERO); +} + +static inline int get_extra_isize(struct inode *inode) +{ + return F2FS_I(inode)->i_extra_isize / sizeof(__le32); +} + +static inline int get_inline_xattr_addrs(struct inode *inode) +{ + return F2FS_I(inode)->i_inline_xattr_size; +} + +#define f2fs_get_inode_mode(i) \ ((is_inode_flag_set(i, FI_ACL_MODE)) ? \ (F2FS_I(i)->i_acl_mode) : ((i)->i_mode)) -/* get offset of first page in next direct node */ -#define PGOFS_OF_NEXT_DNODE(pgofs, inode) \ - ((pgofs < ADDRS_PER_INODE(inode)) ? ADDRS_PER_INODE(inode) : \ - (pgofs - ADDRS_PER_INODE(inode) + ADDRS_PER_BLOCK) / \ - ADDRS_PER_BLOCK * ADDRS_PER_BLOCK + ADDRS_PER_INODE(inode)) +#define F2FS_TOTAL_EXTRA_ATTR_SIZE \ + (offsetof(struct f2fs_inode, i_extra_end) - \ + offsetof(struct f2fs_inode, i_extra_isize)) \ + +#define F2FS_OLD_ATTRIBUTE_SIZE (offsetof(struct f2fs_inode, i_addr)) +#define F2FS_FITS_IN_INODE(f2fs_inode, extra_isize, field) \ + ((offsetof(typeof(*f2fs_inode), field) + \ + sizeof((f2fs_inode)->field)) \ + <= (F2FS_OLD_ATTRIBUTE_SIZE + extra_isize)) \ + +static inline void f2fs_reset_iostat(struct f2fs_sb_info *sbi) +{ + int i; + + spin_lock(&sbi->iostat_lock); + for (i = 0; i < NR_IO_TYPE; i++) + sbi->write_iostat[i] = 0; + spin_unlock(&sbi->iostat_lock); +} + +static inline void f2fs_update_iostat(struct f2fs_sb_info *sbi, + enum iostat_type type, unsigned long long io_bytes) +{ + if (!sbi->iostat_enable) + return; + spin_lock(&sbi->iostat_lock); + sbi->write_iostat[type] += io_bytes; + + if (type == APP_WRITE_IO || type == APP_DIRECT_IO) + sbi->write_iostat[APP_BUFFERED_IO] = + sbi->write_iostat[APP_WRITE_IO] - + sbi->write_iostat[APP_DIRECT_IO]; + spin_unlock(&sbi->iostat_lock); +} + +#define __is_meta_io(fio) (PAGE_TYPE_OF_BIO(fio->type) == META && \ + (!is_read_io(fio->op) || fio->is_meta)) + +bool f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi, + block_t blkaddr, int type); +void f2fs_msg(struct super_block *sb, const char *level, const char *fmt, ...); +static inline void verify_blkaddr(struct f2fs_sb_info *sbi, + block_t blkaddr, int type) +{ + if (!f2fs_is_valid_blkaddr(sbi, blkaddr, type)) { + f2fs_msg(sbi->sb, KERN_ERR, + "invalid blkaddr: %u, type: %d, run fsck to fix.", + blkaddr, type); + f2fs_bug_on(sbi, 1); + } +} + +static inline bool __is_valid_data_blkaddr(block_t blkaddr) +{ + if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR) + return false; + return true; +} + +static inline bool is_valid_data_blkaddr(struct f2fs_sb_info *sbi, + block_t blkaddr) +{ + if (!__is_valid_data_blkaddr(blkaddr)) + return false; + verify_blkaddr(sbi, blkaddr, DATA_GENERIC); + return true; +} /* * file.c */ -int f2fs_sync_file(struct file *, loff_t, loff_t, int); -void truncate_data_blocks(struct dnode_of_data *); -int truncate_blocks(struct inode *, u64, bool); -int f2fs_truncate(struct inode *); -int f2fs_getattr(struct vfsmount *, struct dentry *, struct kstat *); -int f2fs_setattr(struct dentry *, struct iattr *); -int truncate_hole(struct inode *, pgoff_t, pgoff_t); -int truncate_data_blocks_range(struct dnode_of_data *, int); -long f2fs_ioctl(struct file *, unsigned int, unsigned long); -long f2fs_compat_ioctl(struct file *, unsigned int, unsigned long); +int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync); +void f2fs_truncate_data_blocks(struct dnode_of_data *dn); +int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock); +int f2fs_truncate(struct inode *inode); +int f2fs_getattr(struct vfsmount *mnt, struct dentry *dentry, + struct kstat *stat); +int f2fs_setattr(struct dentry *dentry, struct iattr *attr); +int f2fs_truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end); +void f2fs_truncate_data_blocks_range(struct dnode_of_data *dn, int count); +int f2fs_precache_extents(struct inode *inode); +long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg); +long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg); +int f2fs_pin_file_control(struct inode *inode, bool inc); /* * inode.c */ -void f2fs_set_inode_flags(struct inode *); -struct inode *f2fs_iget(struct super_block *, unsigned long); -struct inode *f2fs_iget_retry(struct super_block *, unsigned long); -int try_to_free_nats(struct f2fs_sb_info *, int); -int update_inode(struct inode *, struct page *); -int update_inode_page(struct inode *); -int f2fs_write_inode(struct inode *, struct writeback_control *); -void f2fs_evict_inode(struct inode *); -void handle_failed_inode(struct inode *); +void f2fs_set_inode_flags(struct inode *inode); +bool f2fs_inode_chksum_verify(struct f2fs_sb_info *sbi, struct page *page); +void f2fs_inode_chksum_set(struct f2fs_sb_info *sbi, struct page *page); +struct inode *f2fs_iget(struct super_block *sb, unsigned long ino); +struct inode *f2fs_iget_retry(struct super_block *sb, unsigned long ino); +int f2fs_try_to_free_nats(struct f2fs_sb_info *sbi, int nr_shrink); +void f2fs_update_inode(struct inode *inode, struct page *node_page); +void f2fs_update_inode_page(struct inode *inode); +int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc); +void f2fs_evict_inode(struct inode *inode); +void f2fs_handle_failed_inode(struct inode *inode); /* * namei.c */ +int f2fs_update_extension_list(struct f2fs_sb_info *sbi, const char *name, + bool hot, bool set); struct dentry *f2fs_get_parent(struct dentry *child); /* * dir.c */ -void set_de_type(struct f2fs_dir_entry *, umode_t); -unsigned char get_de_type(struct f2fs_dir_entry *); -struct f2fs_dir_entry *find_target_dentry(struct fscrypt_name *, - f2fs_hash_t, int *, struct f2fs_dentry_ptr *); -bool f2fs_fill_dentries(struct dir_context *, struct f2fs_dentry_ptr *, - unsigned int, struct fscrypt_str *); -void do_make_empty_dir(struct inode *, struct inode *, - struct f2fs_dentry_ptr *); -struct page *init_inode_metadata(struct inode *, struct inode *, - const struct qstr *, const struct qstr *, struct page *); -void update_parent_metadata(struct inode *, struct inode *, unsigned int); -int room_for_filename(const void *, int, int); -void f2fs_drop_nlink(struct inode *, struct inode *); -struct f2fs_dir_entry *__f2fs_find_entry(struct inode *, struct fscrypt_name *, - struct page **); -struct f2fs_dir_entry *f2fs_find_entry(struct inode *, const struct qstr *, - struct page **); -struct f2fs_dir_entry *f2fs_parent_dir(struct inode *, struct page **); -ino_t f2fs_inode_by_name(struct inode *, const struct qstr *, struct page **); -void f2fs_set_link(struct inode *, struct f2fs_dir_entry *, - struct page *, struct inode *); -int update_dent_inode(struct inode *, struct inode *, const struct qstr *); -void f2fs_update_dentry(nid_t ino, umode_t mode, struct f2fs_dentry_ptr *, - const struct qstr *, f2fs_hash_t , unsigned int); -int f2fs_add_regular_entry(struct inode *, const struct qstr *, - const struct qstr *, struct inode *, nid_t, umode_t); -int __f2fs_do_add_link(struct inode *, struct fscrypt_name*, struct inode *, - nid_t, umode_t); -int __f2fs_add_link(struct inode *, const struct qstr *, struct inode *, nid_t, - umode_t); -void f2fs_delete_entry(struct f2fs_dir_entry *, struct page *, struct inode *, - struct inode *); -int f2fs_do_tmpfile(struct inode *, struct inode *); -bool f2fs_empty_dir(struct inode *); +unsigned char f2fs_get_de_type(struct f2fs_dir_entry *de); +struct f2fs_dir_entry *f2fs_find_target_dentry(struct fscrypt_name *fname, + f2fs_hash_t namehash, int *max_slots, + struct f2fs_dentry_ptr *d); +int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d, + unsigned int start_pos, struct fscrypt_str *fstr); +void f2fs_do_make_empty_dir(struct inode *inode, struct inode *parent, + struct f2fs_dentry_ptr *d); +struct page *f2fs_init_inode_metadata(struct inode *inode, struct inode *dir, + const struct qstr *new_name, + const struct qstr *orig_name, struct page *dpage); +void f2fs_update_parent_metadata(struct inode *dir, struct inode *inode, + unsigned int current_depth); +int f2fs_room_for_filename(const void *bitmap, int slots, int max_slots); +void f2fs_drop_nlink(struct inode *dir, struct inode *inode); +struct f2fs_dir_entry *__f2fs_find_entry(struct inode *dir, + struct fscrypt_name *fname, struct page **res_page); +struct f2fs_dir_entry *f2fs_find_entry(struct inode *dir, + const struct qstr *child, struct page **res_page); +struct f2fs_dir_entry *f2fs_parent_dir(struct inode *dir, struct page **p); +ino_t f2fs_inode_by_name(struct inode *dir, const struct qstr *qstr, + struct page **page); +void f2fs_set_link(struct inode *dir, struct f2fs_dir_entry *de, + struct page *page, struct inode *inode); +void f2fs_update_dentry(nid_t ino, umode_t mode, struct f2fs_dentry_ptr *d, + const struct qstr *name, f2fs_hash_t name_hash, + unsigned int bit_pos); +int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name, + const struct qstr *orig_name, + struct inode *inode, nid_t ino, umode_t mode); +int f2fs_add_dentry(struct inode *dir, struct fscrypt_name *fname, + struct inode *inode, nid_t ino, umode_t mode); +int f2fs_do_add_link(struct inode *dir, const struct qstr *name, + struct inode *inode, nid_t ino, umode_t mode); +void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page, + struct inode *dir, struct inode *inode); +int f2fs_do_tmpfile(struct inode *inode, struct inode *dir); +bool f2fs_empty_dir(struct inode *dir); static inline int f2fs_add_link(struct dentry *dentry, struct inode *inode) { - return __f2fs_add_link(d_inode(dentry->d_parent), &dentry->d_name, + return f2fs_do_add_link(d_inode(dentry->d_parent), &dentry->d_name, inode, inode->i_ino, inode->i_mode); } /* * super.c */ -int f2fs_inode_dirtied(struct inode *); -void f2fs_inode_synced(struct inode *); -int f2fs_commit_super(struct f2fs_sb_info *, bool); -int f2fs_sync_fs(struct super_block *, int); +int f2fs_inode_dirtied(struct inode *inode, bool sync); +void f2fs_inode_synced(struct inode *inode); +int f2fs_enable_quota_files(struct f2fs_sb_info *sbi, bool rdonly); +void f2fs_quota_off_umount(struct super_block *sb); +int f2fs_commit_super(struct f2fs_sb_info *sbi, bool recover); +int f2fs_sync_fs(struct super_block *sb, int sync); extern __printf(3, 4) -void f2fs_msg(struct super_block *, const char *, const char *, ...); -int sanity_check_ckpt(struct f2fs_sb_info *sbi); +void f2fs_msg(struct super_block *sb, const char *level, const char *fmt, ...); +int f2fs_sanity_check_ckpt(struct f2fs_sb_info *sbi); /* * hash.c @@ -2028,160 +2875,216 @@ f2fs_hash_t f2fs_dentry_hash(const struct qstr *name_info, struct dnode_of_data; struct node_info; -bool available_free_memory(struct f2fs_sb_info *, int); -int need_dentry_mark(struct f2fs_sb_info *, nid_t); -bool is_checkpointed_node(struct f2fs_sb_info *, nid_t); -bool need_inode_block_update(struct f2fs_sb_info *, nid_t); -void get_node_info(struct f2fs_sb_info *, nid_t, struct node_info *); -pgoff_t get_next_page_offset(struct dnode_of_data *, pgoff_t); -int get_dnode_of_data(struct dnode_of_data *, pgoff_t, int); -int truncate_inode_blocks(struct inode *, pgoff_t); -int truncate_xattr_node(struct inode *, struct page *); -int wait_on_node_pages_writeback(struct f2fs_sb_info *, nid_t); -int remove_inode_page(struct inode *); -struct page *new_inode_page(struct inode *); -struct page *new_node_page(struct dnode_of_data *, unsigned int, struct page *); -void ra_node_page(struct f2fs_sb_info *, nid_t); -struct page *get_node_page(struct f2fs_sb_info *, pgoff_t); -struct page *get_node_page_ra(struct page *, int); -void move_node_page(struct page *, int); -int fsync_node_pages(struct f2fs_sb_info *, struct inode *, - struct writeback_control *, bool); -int sync_node_pages(struct f2fs_sb_info *, struct writeback_control *); -void build_free_nids(struct f2fs_sb_info *); -bool alloc_nid(struct f2fs_sb_info *, nid_t *); -void alloc_nid_done(struct f2fs_sb_info *, nid_t); -void alloc_nid_failed(struct f2fs_sb_info *, nid_t); -int try_to_free_nids(struct f2fs_sb_info *, int); -void recover_inline_xattr(struct inode *, struct page *); -void recover_xattr_data(struct inode *, struct page *, block_t); -int recover_inode_page(struct f2fs_sb_info *, struct page *); -int restore_node_summary(struct f2fs_sb_info *, unsigned int, - struct f2fs_summary_block *); -void flush_nat_entries(struct f2fs_sb_info *); -int build_node_manager(struct f2fs_sb_info *); -void destroy_node_manager(struct f2fs_sb_info *); -int __init create_node_manager_caches(void); -void destroy_node_manager_caches(void); +int f2fs_check_nid_range(struct f2fs_sb_info *sbi, nid_t nid); +bool f2fs_available_free_memory(struct f2fs_sb_info *sbi, int type); +bool f2fs_in_warm_node_list(struct f2fs_sb_info *sbi, struct page *page); +void f2fs_init_fsync_node_info(struct f2fs_sb_info *sbi); +void f2fs_del_fsync_node_entry(struct f2fs_sb_info *sbi, struct page *page); +void f2fs_reset_fsync_node_info(struct f2fs_sb_info *sbi); +int f2fs_need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid); +bool f2fs_is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid); +bool f2fs_need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino); +int f2fs_get_node_info(struct f2fs_sb_info *sbi, nid_t nid, + struct node_info *ni); +pgoff_t f2fs_get_next_page_offset(struct dnode_of_data *dn, pgoff_t pgofs); +int f2fs_get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode); +int f2fs_truncate_inode_blocks(struct inode *inode, pgoff_t from); +int f2fs_truncate_xattr_node(struct inode *inode); +int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, + unsigned int seq_id); +int f2fs_remove_inode_page(struct inode *inode); +struct page *f2fs_new_inode_page(struct inode *inode); +struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs); +void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid); +struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid); +struct page *f2fs_get_node_page_ra(struct page *parent, int start); +void f2fs_move_node_page(struct page *node_page, int gc_type); +int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, + struct writeback_control *wbc, bool atomic, + unsigned int *seq_id); +int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, + struct writeback_control *wbc, + bool do_balance, enum iostat_type io_type); +int f2fs_build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount); +bool f2fs_alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid); +void f2fs_alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid); +void f2fs_alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid); +int f2fs_try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink); +void f2fs_recover_inline_xattr(struct inode *inode, struct page *page); +int f2fs_recover_xattr_data(struct inode *inode, struct page *page); +int f2fs_recover_inode_page(struct f2fs_sb_info *sbi, struct page *page); +int f2fs_restore_node_summary(struct f2fs_sb_info *sbi, + unsigned int segno, struct f2fs_summary_block *sum); +void f2fs_flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); +int f2fs_build_node_manager(struct f2fs_sb_info *sbi); +void f2fs_destroy_node_manager(struct f2fs_sb_info *sbi); +int __init f2fs_create_node_manager_caches(void); +void f2fs_destroy_node_manager_caches(void); /* * segment.c */ -void register_inmem_page(struct inode *, struct page *); -void drop_inmem_pages(struct inode *); -int commit_inmem_pages(struct inode *); -void f2fs_balance_fs(struct f2fs_sb_info *, bool); -void f2fs_balance_fs_bg(struct f2fs_sb_info *); -int f2fs_issue_flush(struct f2fs_sb_info *); -int create_flush_cmd_control(struct f2fs_sb_info *); -void destroy_flush_cmd_control(struct f2fs_sb_info *); -void invalidate_blocks(struct f2fs_sb_info *, block_t); -bool is_checkpointed_data(struct f2fs_sb_info *, block_t); -void refresh_sit_entry(struct f2fs_sb_info *, block_t, block_t); -void f2fs_wait_all_discard_bio(struct f2fs_sb_info *); -void clear_prefree_segments(struct f2fs_sb_info *, struct cp_control *); -void release_discard_addrs(struct f2fs_sb_info *); -int npages_for_summary_flush(struct f2fs_sb_info *, bool); -void allocate_new_segments(struct f2fs_sb_info *); -int f2fs_trim_fs(struct f2fs_sb_info *, struct fstrim_range *); -struct page *get_sum_page(struct f2fs_sb_info *, unsigned int); -void update_meta_page(struct f2fs_sb_info *, void *, block_t); -void write_meta_page(struct f2fs_sb_info *, struct page *); -void write_node_page(unsigned int, struct f2fs_io_info *); -void write_data_page(struct dnode_of_data *, struct f2fs_io_info *); -void rewrite_data_page(struct f2fs_io_info *); -void __f2fs_replace_block(struct f2fs_sb_info *, struct f2fs_summary *, - block_t, block_t, bool, bool); -void f2fs_replace_block(struct f2fs_sb_info *, struct dnode_of_data *, - block_t, block_t, unsigned char, bool, bool); -void allocate_data_block(struct f2fs_sb_info *, struct page *, - block_t, block_t *, struct f2fs_summary *, int); -void f2fs_wait_on_page_writeback(struct page *, enum page_type, bool); -void f2fs_wait_on_encrypted_page_writeback(struct f2fs_sb_info *, block_t); -void write_data_summaries(struct f2fs_sb_info *, block_t); -void write_node_summaries(struct f2fs_sb_info *, block_t); -int lookup_journal_in_cursum(struct f2fs_journal *, int, unsigned int, int); -void flush_sit_entries(struct f2fs_sb_info *, struct cp_control *); -int build_segment_manager(struct f2fs_sb_info *); -void destroy_segment_manager(struct f2fs_sb_info *); -int __init create_segment_manager_caches(void); -void destroy_segment_manager_caches(void); +bool f2fs_need_SSR(struct f2fs_sb_info *sbi); +void f2fs_register_inmem_page(struct inode *inode, struct page *page); +void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure); +void f2fs_drop_inmem_pages(struct inode *inode); +void f2fs_drop_inmem_page(struct inode *inode, struct page *page); +int f2fs_commit_inmem_pages(struct inode *inode); +void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need); +void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi); +int f2fs_issue_flush(struct f2fs_sb_info *sbi, nid_t ino); +int f2fs_create_flush_cmd_control(struct f2fs_sb_info *sbi); +int f2fs_flush_device_cache(struct f2fs_sb_info *sbi); +void f2fs_destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free); +void f2fs_invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr); +bool f2fs_is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr); +void f2fs_drop_discard_cmd(struct f2fs_sb_info *sbi); +void f2fs_stop_discard_thread(struct f2fs_sb_info *sbi); +bool f2fs_wait_discard_bios(struct f2fs_sb_info *sbi); +void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi, + struct cp_control *cpc); +void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi); +int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra); +void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi); +int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range); +bool f2fs_exist_trim_candidates(struct f2fs_sb_info *sbi, + struct cp_control *cpc); +struct page *f2fs_get_sum_page(struct f2fs_sb_info *sbi, unsigned int segno); +void f2fs_update_meta_page(struct f2fs_sb_info *sbi, void *src, + block_t blk_addr); +void f2fs_do_write_meta_page(struct f2fs_sb_info *sbi, struct page *page, + enum iostat_type io_type); +void f2fs_do_write_node_page(unsigned int nid, struct f2fs_io_info *fio); +void f2fs_outplace_write_data(struct dnode_of_data *dn, + struct f2fs_io_info *fio); +int f2fs_inplace_write_data(struct f2fs_io_info *fio); +void f2fs_do_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, + block_t old_blkaddr, block_t new_blkaddr, + bool recover_curseg, bool recover_newaddr); +void f2fs_replace_block(struct f2fs_sb_info *sbi, struct dnode_of_data *dn, + block_t old_addr, block_t new_addr, + unsigned char version, bool recover_curseg, + bool recover_newaddr); +void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page, + block_t old_blkaddr, block_t *new_blkaddr, + struct f2fs_summary *sum, int type, + struct f2fs_io_info *fio, bool add_list); +void f2fs_wait_on_page_writeback(struct page *page, + enum page_type type, bool ordered); +void f2fs_wait_on_block_writeback(struct f2fs_sb_info *sbi, block_t blkaddr); +void f2fs_write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk); +void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk); +int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type, + unsigned int val, int alloc); +void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); +int f2fs_build_segment_manager(struct f2fs_sb_info *sbi); +void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi); +int __init f2fs_create_segment_manager_caches(void); +void f2fs_destroy_segment_manager_caches(void); +int f2fs_rw_hint_to_seg_type(enum rw_hint hint); +enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi, + enum page_type type, enum temp_type temp); /* * checkpoint.c */ -void f2fs_stop_checkpoint(struct f2fs_sb_info *, bool); -struct page *grab_meta_page(struct f2fs_sb_info *, pgoff_t); -struct page *get_meta_page(struct f2fs_sb_info *, pgoff_t); -struct page *get_tmp_page(struct f2fs_sb_info *, pgoff_t); -bool is_valid_blkaddr(struct f2fs_sb_info *, block_t, int); -int ra_meta_pages(struct f2fs_sb_info *, block_t, int, int, bool); -void ra_meta_pages_cond(struct f2fs_sb_info *, pgoff_t); -long sync_meta_pages(struct f2fs_sb_info *, enum page_type, long); -void add_ino_entry(struct f2fs_sb_info *, nid_t, int type); -void remove_ino_entry(struct f2fs_sb_info *, nid_t, int type); -void release_ino_entry(struct f2fs_sb_info *, bool); -bool exist_written_data(struct f2fs_sb_info *, nid_t, int); -int f2fs_sync_inode_meta(struct f2fs_sb_info *); -int acquire_orphan_inode(struct f2fs_sb_info *); -void release_orphan_inode(struct f2fs_sb_info *); -void add_orphan_inode(struct inode *); -void remove_orphan_inode(struct f2fs_sb_info *, nid_t); -int recover_orphan_inodes(struct f2fs_sb_info *); -int get_valid_checkpoint(struct f2fs_sb_info *); -void update_dirty_page(struct inode *, struct page *); -void remove_dirty_inode(struct inode *); -int sync_dirty_inodes(struct f2fs_sb_info *, enum inode_type); -int write_checkpoint(struct f2fs_sb_info *, struct cp_control *); -void init_ino_entry_info(struct f2fs_sb_info *); -int __init create_checkpoint_caches(void); -void destroy_checkpoint_caches(void); +void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool end_io); +struct page *f2fs_grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index); +struct page *f2fs_get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index); +struct page *f2fs_get_meta_page_nofail(struct f2fs_sb_info *sbi, pgoff_t index); +struct page *f2fs_get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index); +bool f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi, + block_t blkaddr, int type); +int f2fs_ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, + int type, bool sync); +void f2fs_ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index); +long f2fs_sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type, + long nr_to_write, enum iostat_type io_type); +void f2fs_add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type); +void f2fs_remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type); +void f2fs_release_ino_entry(struct f2fs_sb_info *sbi, bool all); +bool f2fs_exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode); +void f2fs_set_dirty_device(struct f2fs_sb_info *sbi, nid_t ino, + unsigned int devidx, int type); +bool f2fs_is_dirty_device(struct f2fs_sb_info *sbi, nid_t ino, + unsigned int devidx, int type); +int f2fs_sync_inode_meta(struct f2fs_sb_info *sbi); +int f2fs_acquire_orphan_inode(struct f2fs_sb_info *sbi); +void f2fs_release_orphan_inode(struct f2fs_sb_info *sbi); +void f2fs_add_orphan_inode(struct inode *inode); +void f2fs_remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino); +int f2fs_recover_orphan_inodes(struct f2fs_sb_info *sbi); +int f2fs_get_valid_checkpoint(struct f2fs_sb_info *sbi); +void f2fs_update_dirty_page(struct inode *inode, struct page *page); +void f2fs_remove_dirty_inode(struct inode *inode); +int f2fs_sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type); +void f2fs_wait_on_all_pages_writeback(struct f2fs_sb_info *sbi); +int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc); +void f2fs_init_ino_entry_info(struct f2fs_sb_info *sbi); +int __init f2fs_create_checkpoint_caches(void); +void f2fs_destroy_checkpoint_caches(void); /* * data.c */ -void f2fs_submit_merged_bio(struct f2fs_sb_info *, enum page_type, int); -void f2fs_submit_merged_bio_cond(struct f2fs_sb_info *, struct inode *, - struct page *, nid_t, enum page_type, int); -void f2fs_flush_merged_bios(struct f2fs_sb_info *); -int f2fs_submit_page_bio(struct f2fs_io_info *); -void f2fs_submit_page_mbio(struct f2fs_io_info *); -void set_data_blkaddr(struct dnode_of_data *); -void f2fs_update_data_blkaddr(struct dnode_of_data *, block_t); -int reserve_new_blocks(struct dnode_of_data *, blkcnt_t); -int reserve_new_block(struct dnode_of_data *); -int f2fs_get_block(struct dnode_of_data *, pgoff_t); -ssize_t f2fs_preallocate_blocks(struct kiocb *, struct iov_iter *); -int f2fs_reserve_block(struct dnode_of_data *, pgoff_t); -struct page *get_read_data_page(struct inode *, pgoff_t, int, bool); -struct page *find_data_page(struct inode *, pgoff_t); -struct page *get_lock_data_page(struct inode *, pgoff_t, bool); -struct page *get_new_data_page(struct inode *, struct page *, pgoff_t, bool); -int do_write_data_page(struct f2fs_io_info *); -int f2fs_map_blocks(struct inode *, struct f2fs_map_blocks *, int, int); -int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *, u64, u64); -void f2fs_set_page_dirty_nobuffers(struct page *); -void f2fs_invalidate_page(struct page *, unsigned int, unsigned int); -int f2fs_release_page(struct page *, gfp_t); +int f2fs_init_post_read_processing(void); +void f2fs_destroy_post_read_processing(void); +void f2fs_submit_merged_write(struct f2fs_sb_info *sbi, enum page_type type); +void f2fs_submit_merged_write_cond(struct f2fs_sb_info *sbi, + struct inode *inode, nid_t ino, pgoff_t idx, + enum page_type type); +void f2fs_flush_merged_writes(struct f2fs_sb_info *sbi); +int f2fs_submit_page_bio(struct f2fs_io_info *fio); +void f2fs_submit_page_write(struct f2fs_io_info *fio); +struct block_device *f2fs_target_device(struct f2fs_sb_info *sbi, + block_t blk_addr, struct bio *bio); +int f2fs_target_device_index(struct f2fs_sb_info *sbi, block_t blkaddr); +void f2fs_set_data_blkaddr(struct dnode_of_data *dn); +void f2fs_update_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr); +int f2fs_reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count); +int f2fs_reserve_new_block(struct dnode_of_data *dn); +int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index); +int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from); +int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index); +struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index, + int op_flags, bool for_write); +struct page *f2fs_find_data_page(struct inode *inode, pgoff_t index); +struct page *f2fs_get_lock_data_page(struct inode *inode, pgoff_t index, + bool for_write); +struct page *f2fs_get_new_data_page(struct inode *inode, + struct page *ipage, pgoff_t index, bool new_i_size); +int f2fs_do_write_data_page(struct f2fs_io_info *fio); +int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, + int create, int flag); +int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, + u64 start, u64 len); +bool f2fs_should_update_inplace(struct inode *inode, struct f2fs_io_info *fio); +bool f2fs_should_update_outplace(struct inode *inode, struct f2fs_io_info *fio); +void f2fs_invalidate_page(struct page *page, unsigned int offset, + unsigned int length); +int f2fs_release_page(struct page *page, gfp_t wait); #ifdef CONFIG_MIGRATION -int f2fs_migrate_page(struct address_space *, struct page *, struct page *, - enum migrate_mode); +int f2fs_migrate_page(struct address_space *mapping, struct page *newpage, + struct page *page, enum migrate_mode mode); #endif +bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len); +void f2fs_clear_radix_tree_dirty_tag(struct page *page); /* * gc.c */ -int start_gc_thread(struct f2fs_sb_info *); -void stop_gc_thread(struct f2fs_sb_info *); -block_t start_bidx_of_node(unsigned int, struct inode *); -int f2fs_gc(struct f2fs_sb_info *, bool); -void build_gc_manager(struct f2fs_sb_info *); +int f2fs_start_gc_thread(struct f2fs_sb_info *sbi); +void f2fs_stop_gc_thread(struct f2fs_sb_info *sbi); +block_t f2fs_start_bidx_of_node(unsigned int node_ofs, struct inode *inode); +int f2fs_gc(struct f2fs_sb_info *sbi, bool sync, bool background, + unsigned int segno); +void f2fs_build_gc_manager(struct f2fs_sb_info *sbi); /* * recovery.c */ -int recover_fsync_data(struct f2fs_sb_info *, bool); -bool space_for_roll_forward(struct f2fs_sb_info *); +int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only); +bool f2fs_space_for_roll_forward(struct f2fs_sb_info *sbi); /* * debug.c @@ -2195,13 +3098,20 @@ struct f2fs_stat_info { unsigned long long hit_largest, hit_cached, hit_rbtree; unsigned long long hit_total, total_ext; int ext_tree, zombie_tree, ext_node; - int ndirty_node, ndirty_dent, ndirty_meta, ndirty_data, ndirty_imeta; + int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta; + int ndirty_data, ndirty_qdata; int inmem_pages; - unsigned int ndirty_dirs, ndirty_files, ndirty_all; - int nats, dirty_nats, sits, dirty_sits, fnids; + unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all; + int nats, dirty_nats, sits, dirty_sits; + int free_nids, avail_nids, alloc_nids; int total_count, utilization; - int bg_gc, wb_bios; - int inline_xattr, inline_inode, inline_dir, orphans; + int bg_gc, nr_wb_cp_data, nr_wb_data; + int nr_flushing, nr_flushed, flush_list_empty; + int nr_discarding, nr_discarded; + int nr_discard_cmd; + unsigned int undiscard_blks; + int inline_xattr, inline_inode, inline_dir, append, update, orphans; + int aw_cnt, max_aw_cnt, vw_cnt, max_vw_cnt; unsigned int valid_count, valid_node_count, valid_inode_count, discard_blks; unsigned int bimodal, avg_vblocks; int util_free, util_valid, util_invalid; @@ -2212,6 +3122,7 @@ struct f2fs_stat_info { int bg_node_segs, bg_data_segs; int tot_blks, data_blks, node_blks; int bg_data_blks, bg_node_blks; + unsigned long long skipped_atomic_files[2]; int curseg[NR_CURSEG_TYPE]; int cursec[NR_CURSEG_TYPE]; int curzone[NR_CURSEG_TYPE]; @@ -2273,11 +3184,33 @@ static inline struct f2fs_stat_info *F2FS_STAT(struct f2fs_sb_info *sbi) ((sbi)->block_count[(curseg)->alloc_type]++) #define stat_inc_inplace_blocks(sbi) \ (atomic_inc(&(sbi)->inplace_count)) +#define stat_inc_atomic_write(inode) \ + (atomic_inc(&F2FS_I_SB(inode)->aw_cnt)) +#define stat_dec_atomic_write(inode) \ + (atomic_dec(&F2FS_I_SB(inode)->aw_cnt)) +#define stat_update_max_atomic_write(inode) \ + do { \ + int cur = atomic_read(&F2FS_I_SB(inode)->aw_cnt); \ + int max = atomic_read(&F2FS_I_SB(inode)->max_aw_cnt); \ + if (cur > max) \ + atomic_set(&F2FS_I_SB(inode)->max_aw_cnt, cur); \ + } while (0) +#define stat_inc_volatile_write(inode) \ + (atomic_inc(&F2FS_I_SB(inode)->vw_cnt)) +#define stat_dec_volatile_write(inode) \ + (atomic_dec(&F2FS_I_SB(inode)->vw_cnt)) +#define stat_update_max_volatile_write(inode) \ + do { \ + int cur = atomic_read(&F2FS_I_SB(inode)->vw_cnt); \ + int max = atomic_read(&F2FS_I_SB(inode)->max_vw_cnt); \ + if (cur > max) \ + atomic_set(&F2FS_I_SB(inode)->max_vw_cnt, cur); \ + } while (0) #define stat_inc_seg_count(sbi, type, gc_type) \ do { \ struct f2fs_stat_info *si = F2FS_STAT(sbi); \ - (si)->tot_segs++; \ - if (type == SUM_TYPE_DATA) { \ + si->tot_segs++; \ + if ((type) == SUM_TYPE_DATA) { \ si->data_segs++; \ si->bg_data_segs += (gc_type == BG_GC) ? 1 : 0; \ } else { \ @@ -2287,14 +3220,14 @@ static inline struct f2fs_stat_info *F2FS_STAT(struct f2fs_sb_info *sbi) } while (0) #define stat_inc_tot_blk_count(si, blks) \ - (si->tot_blks += (blks)) + ((si)->tot_blks += (blks)) #define stat_inc_data_blk_count(sbi, blks, gc_type) \ do { \ struct f2fs_stat_info *si = F2FS_STAT(sbi); \ stat_inc_tot_blk_count(si, blks); \ si->data_blks += (blks); \ - si->bg_data_blks += (gc_type == BG_GC) ? (blks) : 0; \ + si->bg_data_blks += ((gc_type) == BG_GC) ? (blks) : 0; \ } while (0) #define stat_inc_node_blk_count(sbi, blks, gc_type) \ @@ -2302,37 +3235,43 @@ static inline struct f2fs_stat_info *F2FS_STAT(struct f2fs_sb_info *sbi) struct f2fs_stat_info *si = F2FS_STAT(sbi); \ stat_inc_tot_blk_count(si, blks); \ si->node_blks += (blks); \ - si->bg_node_blks += (gc_type == BG_GC) ? (blks) : 0; \ + si->bg_node_blks += ((gc_type) == BG_GC) ? (blks) : 0; \ } while (0) -int f2fs_build_stats(struct f2fs_sb_info *); -void f2fs_destroy_stats(struct f2fs_sb_info *); +int f2fs_build_stats(struct f2fs_sb_info *sbi); +void f2fs_destroy_stats(struct f2fs_sb_info *sbi); int __init f2fs_create_root_stats(void); void f2fs_destroy_root_stats(void); #else -#define stat_inc_cp_count(si) -#define stat_inc_bg_cp_count(si) -#define stat_inc_call_count(si) -#define stat_inc_bggc_count(si) -#define stat_inc_dirty_inode(sbi, type) -#define stat_dec_dirty_inode(sbi, type) -#define stat_inc_total_hit(sb) -#define stat_inc_rbtree_node_hit(sb) -#define stat_inc_largest_node_hit(sbi) -#define stat_inc_cached_node_hit(sbi) -#define stat_inc_inline_xattr(inode) -#define stat_dec_inline_xattr(inode) -#define stat_inc_inline_inode(inode) -#define stat_dec_inline_inode(inode) -#define stat_inc_inline_dir(inode) -#define stat_dec_inline_dir(inode) -#define stat_inc_seg_type(sbi, curseg) -#define stat_inc_block_count(sbi, curseg) -#define stat_inc_inplace_blocks(sbi) -#define stat_inc_seg_count(sbi, type, gc_type) -#define stat_inc_tot_blk_count(si, blks) -#define stat_inc_data_blk_count(sbi, blks, gc_type) -#define stat_inc_node_blk_count(sbi, blks, gc_type) +#define stat_inc_cp_count(si) do { } while (0) +#define stat_inc_bg_cp_count(si) do { } while (0) +#define stat_inc_call_count(si) do { } while (0) +#define stat_inc_bggc_count(si) do { } while (0) +#define stat_inc_dirty_inode(sbi, type) do { } while (0) +#define stat_dec_dirty_inode(sbi, type) do { } while (0) +#define stat_inc_total_hit(sb) do { } while (0) +#define stat_inc_rbtree_node_hit(sb) do { } while (0) +#define stat_inc_largest_node_hit(sbi) do { } while (0) +#define stat_inc_cached_node_hit(sbi) do { } while (0) +#define stat_inc_inline_xattr(inode) do { } while (0) +#define stat_dec_inline_xattr(inode) do { } while (0) +#define stat_inc_inline_inode(inode) do { } while (0) +#define stat_dec_inline_inode(inode) do { } while (0) +#define stat_inc_inline_dir(inode) do { } while (0) +#define stat_dec_inline_dir(inode) do { } while (0) +#define stat_inc_atomic_write(inode) do { } while (0) +#define stat_dec_atomic_write(inode) do { } while (0) +#define stat_update_max_atomic_write(inode) do { } while (0) +#define stat_inc_volatile_write(inode) do { } while (0) +#define stat_dec_volatile_write(inode) do { } while (0) +#define stat_update_max_volatile_write(inode) do { } while (0) +#define stat_inc_seg_type(sbi, curseg) do { } while (0) +#define stat_inc_block_count(sbi, curseg) do { } while (0) +#define stat_inc_inplace_blocks(sbi) do { } while (0) +#define stat_inc_seg_count(sbi, type, gc_type) do { } while (0) +#define stat_inc_tot_blk_count(si, blks) do { } while (0) +#define stat_inc_data_blk_count(sbi, blks, gc_type) do { } while (0) +#define stat_inc_node_blk_count(sbi, blks, gc_type) do { } while (0) static inline int f2fs_build_stats(struct f2fs_sb_info *sbi) { return 0; } static inline void f2fs_destroy_stats(struct f2fs_sb_info *sbi) { } @@ -2350,56 +3289,84 @@ extern const struct inode_operations f2fs_dir_inode_operations; extern const struct inode_operations f2fs_symlink_inode_operations; extern const struct inode_operations f2fs_encrypted_symlink_inode_operations; extern const struct inode_operations f2fs_special_inode_operations; -extern struct kmem_cache *inode_entry_slab; +extern struct kmem_cache *f2fs_inode_entry_slab; /* * inline.c */ -bool f2fs_may_inline_data(struct inode *); -bool f2fs_may_inline_dentry(struct inode *); -void read_inline_data(struct page *, struct page *); -bool truncate_inline_inode(struct page *, u64); -int f2fs_read_inline_data(struct inode *, struct page *); -int f2fs_convert_inline_page(struct dnode_of_data *, struct page *); -int f2fs_convert_inline_inode(struct inode *); -int f2fs_write_inline_data(struct inode *, struct page *); -bool recover_inline_data(struct inode *, struct page *); -struct f2fs_dir_entry *find_in_inline_dir(struct inode *, - struct fscrypt_name *, struct page **); -int make_empty_inline_dir(struct inode *inode, struct inode *, struct page *); -int f2fs_add_inline_entry(struct inode *, const struct qstr *, - const struct qstr *, struct inode *, nid_t, umode_t); -void f2fs_delete_inline_entry(struct f2fs_dir_entry *, struct page *, - struct inode *, struct inode *); -bool f2fs_empty_inline_dir(struct inode *); -int f2fs_read_inline_dir(struct file *, struct dir_context *, - struct fscrypt_str *); -int f2fs_inline_data_fiemap(struct inode *, - struct fiemap_extent_info *, __u64, __u64); +bool f2fs_may_inline_data(struct inode *inode); +bool f2fs_may_inline_dentry(struct inode *inode); +void f2fs_do_read_inline_data(struct page *page, struct page *ipage); +void f2fs_truncate_inline_inode(struct inode *inode, + struct page *ipage, u64 from); +int f2fs_read_inline_data(struct inode *inode, struct page *page); +int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page); +int f2fs_convert_inline_inode(struct inode *inode); +int f2fs_write_inline_data(struct inode *inode, struct page *page); +bool f2fs_recover_inline_data(struct inode *inode, struct page *npage); +struct f2fs_dir_entry *f2fs_find_in_inline_dir(struct inode *dir, + struct fscrypt_name *fname, struct page **res_page); +int f2fs_make_empty_inline_dir(struct inode *inode, struct inode *parent, + struct page *ipage); +int f2fs_add_inline_entry(struct inode *dir, const struct qstr *new_name, + const struct qstr *orig_name, + struct inode *inode, nid_t ino, umode_t mode); +void f2fs_delete_inline_entry(struct f2fs_dir_entry *dentry, + struct page *page, struct inode *dir, + struct inode *inode); +bool f2fs_empty_inline_dir(struct inode *dir); +int f2fs_read_inline_dir(struct file *file, struct dir_context *ctx, + struct fscrypt_str *fstr); +int f2fs_inline_data_fiemap(struct inode *inode, + struct fiemap_extent_info *fieinfo, + __u64 start, __u64 len); /* * shrinker.c */ -unsigned long f2fs_shrink_count(struct shrinker *, struct shrink_control *); -unsigned long f2fs_shrink_scan(struct shrinker *, struct shrink_control *); -void f2fs_join_shrinker(struct f2fs_sb_info *); -void f2fs_leave_shrinker(struct f2fs_sb_info *); +unsigned long f2fs_shrink_count(struct shrinker *shrink, + struct shrink_control *sc); +unsigned long f2fs_shrink_scan(struct shrinker *shrink, + struct shrink_control *sc); +void f2fs_join_shrinker(struct f2fs_sb_info *sbi); +void f2fs_leave_shrinker(struct f2fs_sb_info *sbi); /* * extent_cache.c */ -unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *, int); -bool f2fs_init_extent_tree(struct inode *, struct f2fs_extent *); -void f2fs_drop_extent_tree(struct inode *); -unsigned int f2fs_destroy_extent_node(struct inode *); -void f2fs_destroy_extent_tree(struct inode *); -bool f2fs_lookup_extent_cache(struct inode *, pgoff_t, struct extent_info *); -void f2fs_update_extent_cache(struct dnode_of_data *); +struct rb_entry *f2fs_lookup_rb_tree(struct rb_root *root, + struct rb_entry *cached_re, unsigned int ofs); +struct rb_node **f2fs_lookup_rb_tree_for_insert(struct f2fs_sb_info *sbi, + struct rb_root *root, struct rb_node **parent, + unsigned int ofs); +struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root *root, + struct rb_entry *cached_re, unsigned int ofs, + struct rb_entry **prev_entry, struct rb_entry **next_entry, + struct rb_node ***insert_p, struct rb_node **insert_parent, + bool force); +bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi, + struct rb_root *root); +unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink); +bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext); +void f2fs_drop_extent_tree(struct inode *inode); +unsigned int f2fs_destroy_extent_node(struct inode *inode); +void f2fs_destroy_extent_tree(struct inode *inode); +bool f2fs_lookup_extent_cache(struct inode *inode, pgoff_t pgofs, + struct extent_info *ei); +void f2fs_update_extent_cache(struct dnode_of_data *dn); void f2fs_update_extent_cache_range(struct dnode_of_data *dn, - pgoff_t, block_t, unsigned int); -void init_extent_cache_info(struct f2fs_sb_info *); -int __init create_extent_cache(void); -void destroy_extent_cache(void); + pgoff_t fofs, block_t blkaddr, unsigned int len); +void f2fs_init_extent_cache_info(struct f2fs_sb_info *sbi); +int __init f2fs_create_extent_cache(void); +void f2fs_destroy_extent_cache(void); + +/* + * sysfs.c + */ +int __init f2fs_init_sysfs(void); +void f2fs_exit_sysfs(void); +int f2fs_register_sysfs(struct f2fs_sb_info *sbi); +void f2fs_unregister_sysfs(struct f2fs_sb_info *sbi); /* * crypto support @@ -2409,26 +3376,63 @@ static inline bool f2fs_encrypted_inode(struct inode *inode) return file_is_encrypt(inode); } +static inline bool f2fs_encrypted_file(struct inode *inode) +{ + return f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode); +} + static inline void f2fs_set_encrypted_inode(struct inode *inode) { #ifdef CONFIG_F2FS_FS_ENCRYPTION file_set_encrypt(inode); + inode->i_flags |= S_ENCRYPTED; #endif } -static inline bool f2fs_bio_encrypted(struct bio *bio) +/* + * Returns true if the reads of the inode's data need to undergo some + * postprocessing step, like decryption or authenticity verification. + */ +static inline bool f2fs_post_read_required(struct inode *inode) { - return bio->bi_private != NULL; + return f2fs_encrypted_file(inode); } -static inline int f2fs_sb_has_crypto(struct super_block *sb) -{ - return F2FS_HAS_FEATURE(sb, F2FS_FEATURE_ENCRYPT); +#define F2FS_FEATURE_FUNCS(name, flagname) \ +static inline int f2fs_sb_has_##name(struct super_block *sb) \ +{ \ + return F2FS_HAS_FEATURE(sb, F2FS_FEATURE_##flagname); \ } -static inline int f2fs_sb_mounted_hmsmr(struct super_block *sb) +F2FS_FEATURE_FUNCS(encrypt, ENCRYPT); +F2FS_FEATURE_FUNCS(blkzoned, BLKZONED); +F2FS_FEATURE_FUNCS(extra_attr, EXTRA_ATTR); +F2FS_FEATURE_FUNCS(project_quota, PRJQUOTA); +F2FS_FEATURE_FUNCS(inode_chksum, INODE_CHKSUM); +F2FS_FEATURE_FUNCS(flexible_inline_xattr, FLEXIBLE_INLINE_XATTR); +F2FS_FEATURE_FUNCS(quota_ino, QUOTA_INO); +F2FS_FEATURE_FUNCS(inode_crtime, INODE_CRTIME); +F2FS_FEATURE_FUNCS(lost_found, LOST_FOUND); + +#ifdef CONFIG_BLK_DEV_ZONED +static inline int get_blkz_type(struct f2fs_sb_info *sbi, + struct block_device *bdev, block_t blkaddr) { - return F2FS_HAS_FEATURE(sb, F2FS_FEATURE_HMSMR); + unsigned int zno = blkaddr >> sbi->log_blocks_per_blkz; + int i; + + for (i = 0; i < sbi->s_ndevs; i++) + if (FDEV(i).bdev == bdev) + return FDEV(i).blkz_type[zno]; + return -EINVAL; +} +#endif + +static inline bool f2fs_discard_en(struct f2fs_sb_info *sbi) +{ + struct request_queue *q = bdev_get_queue(sbi->sb->s_bdev); + + return blk_queue_discard(q) || f2fs_sb_has_blkzoned(sbi->sb); } static inline void set_opt_mode(struct f2fs_sb_info *sbi, unsigned int mt) @@ -2453,32 +3457,22 @@ static inline bool f2fs_may_encrypt(struct inode *inode) return (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)); #else - return 0; + return false; #endif } -#ifndef CONFIG_F2FS_FS_ENCRYPTION -#define fscrypt_set_d_op(i) -#define fscrypt_get_ctx fscrypt_notsupp_get_ctx -#define fscrypt_release_ctx fscrypt_notsupp_release_ctx -#define fscrypt_encrypt_page fscrypt_notsupp_encrypt_page -#define fscrypt_decrypt_page fscrypt_notsupp_decrypt_page -#define fscrypt_decrypt_bio_pages fscrypt_notsupp_decrypt_bio_pages -#define fscrypt_pullback_bio_page fscrypt_notsupp_pullback_bio_page -#define fscrypt_restore_control_page fscrypt_notsupp_restore_control_page -#define fscrypt_zeroout_range fscrypt_notsupp_zeroout_range -#define fscrypt_process_policy fscrypt_notsupp_process_policy -#define fscrypt_get_policy fscrypt_notsupp_get_policy -#define fscrypt_has_permitted_context fscrypt_notsupp_has_permitted_context -#define fscrypt_inherit_context fscrypt_notsupp_inherit_context -#define fscrypt_get_encryption_info fscrypt_notsupp_get_encryption_info -#define fscrypt_put_encryption_info fscrypt_notsupp_put_encryption_info -#define fscrypt_setup_filename fscrypt_notsupp_setup_filename -#define fscrypt_free_filename fscrypt_notsupp_free_filename -#define fscrypt_fname_encrypted_size fscrypt_notsupp_fname_encrypted_size -#define fscrypt_fname_alloc_buffer fscrypt_notsupp_fname_alloc_buffer -#define fscrypt_fname_free_buffer fscrypt_notsupp_fname_free_buffer -#define fscrypt_fname_disk_to_usr fscrypt_notsupp_fname_disk_to_usr -#define fscrypt_fname_usr_to_disk fscrypt_notsupp_fname_usr_to_disk +static inline bool f2fs_force_buffered_io(struct inode *inode, int rw) +{ + return (f2fs_post_read_required(inode) || + (rw == WRITE && test_opt(F2FS_I_SB(inode), LFS)) || + F2FS_I_SB(inode)->s_ndevs); +} + +#ifdef CONFIG_F2FS_FAULT_INJECTION +extern void f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned int rate, + unsigned int type); +#else +#define f2fs_build_fault_attr(sbi, rate, type) do { } while (0) #endif + #endif
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 594e6e2..4636b019 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c
@@ -20,6 +20,7 @@ #include <linux/uaccess.h> #include <linux/mount.h> #include <linux/pagevec.h> +#include <linux/uio.h> #include <linux/uuid.h> #include <linux/file.h> @@ -32,6 +33,19 @@ #include "trace.h" #include <trace/events/f2fs.h> +static int f2fs_filemap_fault(struct vm_area_struct *vma, + struct vm_fault *vmf) +{ + struct inode *inode = file_inode(vma->vm_file); + int err; + + down_read(&F2FS_I(inode)->i_mmap_sem); + err = filemap_fault(vma, vmf); + up_read(&F2FS_I(inode)->i_mmap_sem); + + return err; +} + static int f2fs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf) { @@ -41,6 +55,11 @@ static int f2fs_vm_page_mkwrite(struct vm_area_struct *vma, struct dnode_of_data dn; int err; + if (unlikely(f2fs_cp_error(sbi))) { + err = -EIO; + goto err; + } + sb_start_pagefault(inode->i_sb); f2fs_bug_on(sbi, f2fs_has_inline_data(inode)); @@ -59,13 +78,14 @@ static int f2fs_vm_page_mkwrite(struct vm_area_struct *vma, f2fs_balance_fs(sbi, dn.node_changed); file_update_time(vma->vm_file); + down_read(&F2FS_I(inode)->i_mmap_sem); lock_page(page); if (unlikely(page->mapping != inode->i_mapping || page_offset(page) > i_size_read(inode) || !PageUptodate(page))) { unlock_page(page); err = -EFAULT; - goto out; + goto out_sem; } /* @@ -77,7 +97,8 @@ static int f2fs_vm_page_mkwrite(struct vm_area_struct *vma, /* page is wholly or partially inside EOF */ if (((loff_t)(page->index + 1) << PAGE_SHIFT) > i_size_read(inode)) { - unsigned offset; + loff_t offset; + offset = i_size_read(inode) & ~PAGE_MASK; zero_user_segment(page, offset, PAGE_SIZE); } @@ -85,25 +106,28 @@ static int f2fs_vm_page_mkwrite(struct vm_area_struct *vma, if (!PageUptodate(page)) SetPageUptodate(page); + f2fs_update_iostat(sbi, APP_MAPPED_IO, F2FS_BLKSIZE); + trace_f2fs_vm_page_mkwrite(page, DATA); mapped: /* fill the page */ f2fs_wait_on_page_writeback(page, DATA, false); - /* wait for GCed encrypted page writeback */ - if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode)) - f2fs_wait_on_encrypted_page_writeback(sbi, dn.data_blkaddr); + /* wait for GCed page writeback via META_MAPPING */ + if (f2fs_post_read_required(inode)) + f2fs_wait_on_block_writeback(sbi, dn.data_blkaddr); - /* if gced page is attached, don't write to cold segment */ - clear_cold_data(page); +out_sem: + up_read(&F2FS_I(inode)->i_mmap_sem); out: sb_end_pagefault(inode->i_sb); f2fs_update_time(sbi, REQ_TIME); +err: return block_page_mkwrite_return(err); } static const struct vm_operations_struct f2fs_file_vm_ops = { - .fault = filemap_fault, + .fault = f2fs_filemap_fault, .map_pages = filemap_map_pages, .page_mkwrite = f2fs_vm_page_mkwrite, }; @@ -118,39 +142,39 @@ static int get_parent_ino(struct inode *inode, nid_t *pino) if (!dentry) return 0; - if (update_dent_inode(inode, inode, &dentry->d_name)) { - dput(dentry); - return 0; - } - *pino = parent_ino(dentry); dput(dentry); return 1; } -static inline bool need_do_checkpoint(struct inode *inode) +static inline enum cp_reason_type need_do_checkpoint(struct inode *inode) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); - bool need_cp = false; + enum cp_reason_type cp_reason = CP_NO_NEEDED; - if (!S_ISREG(inode->i_mode) || inode->i_nlink != 1) - need_cp = true; + if (!S_ISREG(inode->i_mode)) + cp_reason = CP_NON_REGULAR; + else if (inode->i_nlink != 1) + cp_reason = CP_HARDLINK; else if (is_sbi_flag_set(sbi, SBI_NEED_CP)) - need_cp = true; + cp_reason = CP_SB_NEED_CP; else if (file_wrong_pino(inode)) - need_cp = true; - else if (!space_for_roll_forward(sbi)) - need_cp = true; - else if (!is_checkpointed_node(sbi, F2FS_I(inode)->i_pino)) - need_cp = true; - else if (F2FS_I(inode)->xattr_ver == cur_cp_version(F2FS_CKPT(sbi))) - need_cp = true; + cp_reason = CP_WRONG_PINO; + else if (!f2fs_space_for_roll_forward(sbi)) + cp_reason = CP_NO_SPC_ROLL; + else if (!f2fs_is_checkpointed_node(sbi, F2FS_I(inode)->i_pino)) + cp_reason = CP_NODE_NEED_CP; else if (test_opt(sbi, FASTBOOT)) - need_cp = true; - else if (sbi->active_logs == 2) - need_cp = true; + cp_reason = CP_FASTBOOT_MODE; + else if (F2FS_OPTION(sbi).active_logs == 2) + cp_reason = CP_SPEC_LOG_NUM; + else if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT && + f2fs_need_dentry_mark(sbi, inode->i_ino) && + f2fs_exist_written_data(sbi, F2FS_I(inode)->i_pino, + TRANS_DIR_INO)) + cp_reason = CP_RECOVER_DIR; - return need_cp; + return cp_reason; } static bool need_inode_page_update(struct f2fs_sb_info *sbi, nid_t ino) @@ -158,7 +182,7 @@ static bool need_inode_page_update(struct f2fs_sb_info *sbi, nid_t ino) struct page *i = find_get_page(NODE_MAPPING(sbi), ino); bool ret = false; /* But we need to avoid that there are some inode updates */ - if ((i && PageDirty(i)) || need_inode_block_update(sbi, ino)) + if ((i && PageDirty(i)) || f2fs_need_inode_block_update(sbi, ino)) ret = true; f2fs_put_page(i, 0); return ret; @@ -170,7 +194,6 @@ static void try_to_fix_pino(struct inode *inode) nid_t pino; down_write(&fi->i_sem); - fi->xattr_ver = 0; if (file_wrong_pino(inode) && inode->i_nlink == 1 && get_parent_ino(inode, &pino)) { f2fs_i_pino_write(inode, pino); @@ -186,12 +209,13 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end, struct f2fs_sb_info *sbi = F2FS_I_SB(inode); nid_t ino = inode->i_ino; int ret = 0; - bool need_cp = false; + enum cp_reason_type cp_reason = 0; struct writeback_control wbc = { .sync_mode = WB_SYNC_ALL, .nr_to_write = LONG_MAX, .for_reclaim = 0, }; + unsigned int seq_id = 0; if (unlikely(f2fs_readonly(inode->i_sb))) return 0; @@ -205,12 +229,12 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end, clear_inode_flag(inode, FI_NEED_IPU); if (ret) { - trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret); + trace_f2fs_sync_file_exit(inode, cp_reason, datasync, ret); return ret; } /* if the inode is dirty, let's recover all the time */ - if (!datasync && !f2fs_skip_inode_update(inode)) { + if (!f2fs_skip_inode_update(inode, datasync)) { f2fs_write_inode(inode, NULL); goto go_write; } @@ -219,14 +243,14 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end, * if there is no written data, don't waste time to write recovery info. */ if (!is_inode_flag_set(inode, FI_APPEND_WRITE) && - !exist_written_data(sbi, ino, APPEND_INO)) { + !f2fs_exist_written_data(sbi, ino, APPEND_INO)) { /* it may call write_inode just prior to fsync */ if (need_inode_page_update(sbi, ino)) goto go_write; if (is_inode_flag_set(inode, FI_UPDATE_WRITE) || - exist_written_data(sbi, ino, UPDATE_INO)) + f2fs_exist_written_data(sbi, ino, UPDATE_INO)) goto flush_out; goto out; } @@ -236,10 +260,10 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end, * sudden-power-off. */ down_read(&F2FS_I(inode)->i_sem); - need_cp = need_do_checkpoint(inode); + cp_reason = need_do_checkpoint(inode); up_read(&F2FS_I(inode)->i_sem); - if (need_cp) { + if (cp_reason) { /* all the dirty node pages should be flushed for POR */ ret = f2fs_sync_fs(inode->i_sb, 1); @@ -253,7 +277,9 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end, goto out; } sync_nodes: - ret = fsync_node_pages(sbi, inode, &wbc, atomic); + atomic_inc(&sbi->wb_sync_req[NODE]); + ret = f2fs_fsync_node_pages(sbi, inode, &wbc, atomic, &seq_id); + atomic_dec(&sbi->wb_sync_req[NODE]); if (ret) goto out; @@ -263,60 +289,77 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end, goto out; } - if (need_inode_block_update(sbi, ino)) { - f2fs_mark_inode_dirty_sync(inode); + if (f2fs_need_inode_block_update(sbi, ino)) { + f2fs_mark_inode_dirty_sync(inode, true); f2fs_write_inode(inode, NULL); goto sync_nodes; } - ret = wait_on_node_pages_writeback(sbi, ino); - if (ret) - goto out; + /* + * If it's atomic_write, it's just fine to keep write ordering. So + * here we don't need to wait for node write completion, since we use + * node chain which serializes node blocks. If one of node writes are + * reordered, we can see simply broken chain, resulting in stopping + * roll-forward recovery. It means we'll recover all or none node blocks + * given fsync mark. + */ + if (!atomic) { + ret = f2fs_wait_on_node_pages_writeback(sbi, seq_id); + if (ret) + goto out; + } /* once recovery info is written, don't need to tack this */ - remove_ino_entry(sbi, ino, APPEND_INO); + f2fs_remove_ino_entry(sbi, ino, APPEND_INO); clear_inode_flag(inode, FI_APPEND_WRITE); flush_out: - remove_ino_entry(sbi, ino, UPDATE_INO); - clear_inode_flag(inode, FI_UPDATE_WRITE); - ret = f2fs_issue_flush(sbi); + if (!atomic && F2FS_OPTION(sbi).fsync_mode != FSYNC_MODE_NOBARRIER) + ret = f2fs_issue_flush(sbi, inode->i_ino); + if (!ret) { + f2fs_remove_ino_entry(sbi, ino, UPDATE_INO); + clear_inode_flag(inode, FI_UPDATE_WRITE); + f2fs_remove_ino_entry(sbi, ino, FLUSH_INO); + } f2fs_update_time(sbi, REQ_TIME); out: - trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret); + trace_f2fs_sync_file_exit(inode, cp_reason, datasync, ret); f2fs_trace_ios(NULL, 1); return ret; } int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) { + if (unlikely(f2fs_cp_error(F2FS_I_SB(file_inode(file))))) + return -EIO; return f2fs_do_sync_file(file, start, end, datasync, false); } static pgoff_t __get_first_dirty_index(struct address_space *mapping, pgoff_t pgofs, int whence) { - struct pagevec pvec; + struct page *page; int nr_pages; if (whence != SEEK_DATA) return 0; /* find first dirty page index */ - pagevec_init(&pvec, 0); - nr_pages = pagevec_lookup_tag(&pvec, mapping, &pgofs, - PAGECACHE_TAG_DIRTY, 1); - pgofs = nr_pages ? pvec.pages[0]->index : ULONG_MAX; - pagevec_release(&pvec); + nr_pages = find_get_pages_tag(mapping, &pgofs, PAGECACHE_TAG_DIRTY, + 1, &page); + if (!nr_pages) + return ULONG_MAX; + pgofs = page->index; + put_page(page); return pgofs; } -static bool __found_offset(block_t blkaddr, pgoff_t dirty, pgoff_t pgofs, - int whence) +static bool __found_offset(struct f2fs_sb_info *sbi, block_t blkaddr, + pgoff_t dirty, pgoff_t pgofs, int whence) { switch (whence) { case SEEK_DATA: if ((blkaddr == NEW_ADDR && dirty == pgofs) || - (blkaddr != NEW_ADDR && blkaddr != NULL_ADDR)) + is_valid_data_blkaddr(sbi, blkaddr)) return true; break; case SEEK_HOLE: @@ -356,13 +399,13 @@ static loff_t f2fs_seek_block(struct file *file, loff_t offset, int whence) for (; data_ofs < isize; data_ofs = (loff_t)pgofs << PAGE_SHIFT) { set_new_dnode(&dn, inode, NULL, NULL, 0); - err = get_dnode_of_data(&dn, pgofs, LOOKUP_NODE); + err = f2fs_get_dnode_of_data(&dn, pgofs, LOOKUP_NODE); if (err && err != -ENOENT) { goto fail; } else if (err == -ENOENT) { /* direct node does not exists */ if (whence == SEEK_DATA) { - pgofs = get_next_page_offset(&dn, pgofs); + pgofs = f2fs_get_next_page_offset(&dn, pgofs); continue; } else { goto found; @@ -376,9 +419,19 @@ static loff_t f2fs_seek_block(struct file *file, loff_t offset, int whence) dn.ofs_in_node++, pgofs++, data_ofs = (loff_t)pgofs << PAGE_SHIFT) { block_t blkaddr; - blkaddr = datablock_addr(dn.node_page, dn.ofs_in_node); - if (__found_offset(blkaddr, dirty, pgofs, whence)) { + blkaddr = datablock_addr(dn.inode, + dn.node_page, dn.ofs_in_node); + + if (__is_valid_data_blkaddr(blkaddr) && + !f2fs_is_valid_blkaddr(F2FS_I_SB(inode), + blkaddr, DATA_GENERIC)) { + f2fs_put_dnode(&dn); + goto fail; + } + + if (__found_offset(F2FS_I_SB(inode), blkaddr, dirty, + pgofs, whence)) { f2fs_put_dnode(&dn); goto found; } @@ -424,13 +477,8 @@ static int f2fs_file_mmap(struct file *file, struct vm_area_struct *vma) struct inode *inode = file_inode(file); int err; - if (f2fs_encrypted_inode(inode)) { - err = fscrypt_get_encryption_info(inode); - if (err) - return 0; - if (!f2fs_encrypted_inode(inode)) - return -ENOKEY; - } + if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) + return -EIO; /* we don't need to use inline_data strictly */ err = f2fs_convert_inline_inode(inode); @@ -444,44 +492,44 @@ static int f2fs_file_mmap(struct file *file, struct vm_area_struct *vma) static int f2fs_file_open(struct inode *inode, struct file *filp) { - int ret = generic_file_open(inode, filp); - struct dentry *dir; + int err = fscrypt_file_open(inode, filp); - if (!ret && f2fs_encrypted_inode(inode)) { - ret = fscrypt_get_encryption_info(inode); - if (ret) - return -EACCES; - if (!fscrypt_has_encryption_key(inode)) - return -ENOKEY; - } - dir = dget_parent(file_dentry(filp)); - if (f2fs_encrypted_inode(d_inode(dir)) && - !fscrypt_has_permitted_context(d_inode(dir), inode)) { - dput(dir); - return -EPERM; - } - dput(dir); - return ret; + if (err) + return err; + + filp->f_mode |= FMODE_NOWAIT; + + return dquot_file_open(inode, filp); } -int truncate_data_blocks_range(struct dnode_of_data *dn, int count) +void f2fs_truncate_data_blocks_range(struct dnode_of_data *dn, int count) { struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode); struct f2fs_node *raw_node; int nr_free = 0, ofs = dn->ofs_in_node, len = count; __le32 *addr; + int base = 0; + + if (IS_INODE(dn->node_page) && f2fs_has_extra_attr(dn->inode)) + base = get_extra_isize(dn->inode); raw_node = F2FS_NODE(dn->node_page); - addr = blkaddr_in_node(raw_node) + ofs; + addr = blkaddr_in_node(raw_node) + base + ofs; for (; count > 0; count--, addr++, dn->ofs_in_node++) { block_t blkaddr = le32_to_cpu(*addr); + if (blkaddr == NULL_ADDR) continue; dn->data_blkaddr = NULL_ADDR; - set_data_blkaddr(dn); - invalidate_blocks(sbi, blkaddr); + f2fs_set_data_blkaddr(dn); + + if (__is_valid_data_blkaddr(blkaddr) && + !f2fs_is_valid_blkaddr(sbi, blkaddr, DATA_GENERIC)) + continue; + + f2fs_invalidate_blocks(sbi, blkaddr); if (dn->ofs_in_node == 0 && IS_INODE(dn->node_page)) clear_inode_flag(dn->inode, FI_FIRST_BLOCK_WRITTEN); nr_free++; @@ -493,7 +541,7 @@ int truncate_data_blocks_range(struct dnode_of_data *dn, int count) * once we invalidate valid blkaddr in range [ofs, ofs + count], * we will invalidate all blkaddr in the whole range. */ - fofs = start_bidx_of_node(ofs_of_node(dn->node_page), + fofs = f2fs_start_bidx_of_node(ofs_of_node(dn->node_page), dn->inode) + ofs; f2fs_update_extent_cache_range(dn, fofs, 0, len); dec_valid_block_count(sbi, dn->inode, nr_free); @@ -503,18 +551,17 @@ int truncate_data_blocks_range(struct dnode_of_data *dn, int count) f2fs_update_time(sbi, REQ_TIME); trace_f2fs_truncate_data_blocks_range(dn->inode, dn->nid, dn->ofs_in_node, nr_free); - return nr_free; } -void truncate_data_blocks(struct dnode_of_data *dn) +void f2fs_truncate_data_blocks(struct dnode_of_data *dn) { - truncate_data_blocks_range(dn, ADDRS_PER_BLOCK); + f2fs_truncate_data_blocks_range(dn, ADDRS_PER_BLOCK); } static int truncate_partial_data_page(struct inode *inode, u64 from, bool cache_only) { - unsigned offset = from & (PAGE_SIZE - 1); + loff_t offset = from & (PAGE_SIZE - 1); pgoff_t index = from >> PAGE_SHIFT; struct address_space *mapping = inode->i_mapping; struct page *page; @@ -530,23 +577,24 @@ static int truncate_partial_data_page(struct inode *inode, u64 from, return 0; } - page = get_lock_data_page(inode, index, true); + page = f2fs_get_lock_data_page(inode, index, true); if (IS_ERR(page)) - return 0; + return PTR_ERR(page) == -ENOENT ? 0 : PTR_ERR(page); truncate_out: f2fs_wait_on_page_writeback(page, DATA, true); zero_user(page, offset, PAGE_SIZE - offset); - if (!cache_only || !f2fs_encrypted_inode(inode) || - !S_ISREG(inode->i_mode)) + + /* An encrypted inode should have a key and truncate the last page. */ + f2fs_bug_on(F2FS_I_SB(inode), cache_only && f2fs_encrypted_inode(inode)); + if (!cache_only) set_page_dirty(page); f2fs_put_page(page, 1); return 0; } -int truncate_blocks(struct inode *inode, u64 from, bool lock) +int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); - unsigned int blocksize = inode->i_sb->s_blocksize; struct dnode_of_data dn; pgoff_t free_from; int count = 0, err = 0; @@ -555,7 +603,7 @@ int truncate_blocks(struct inode *inode, u64 from, bool lock) trace_f2fs_truncate_blocks_enter(inode, from); - free_from = (pgoff_t)F2FS_BYTES_TO_BLK(from + blocksize - 1); + free_from = (pgoff_t)F2FS_BLK_ALIGN(from); if (free_from >= sbi->max_file_blocks) goto free_partial; @@ -563,22 +611,21 @@ int truncate_blocks(struct inode *inode, u64 from, bool lock) if (lock) f2fs_lock_op(sbi); - ipage = get_node_page(sbi, inode->i_ino); + ipage = f2fs_get_node_page(sbi, inode->i_ino); if (IS_ERR(ipage)) { err = PTR_ERR(ipage); goto out; } if (f2fs_has_inline_data(inode)) { - if (truncate_inline_inode(ipage, from)) - set_page_dirty(ipage); + f2fs_truncate_inline_inode(inode, ipage, from); f2fs_put_page(ipage, 1); truncate_page = true; goto out; } set_new_dnode(&dn, inode, ipage, NULL, 0); - err = get_dnode_of_data(&dn, free_from, LOOKUP_NODE_RA); + err = f2fs_get_dnode_of_data(&dn, free_from, LOOKUP_NODE_RA); if (err) { if (err == -ENOENT) goto free_next; @@ -591,13 +638,13 @@ int truncate_blocks(struct inode *inode, u64 from, bool lock) f2fs_bug_on(sbi, count < 0); if (dn.ofs_in_node || IS_INODE(dn.node_page)) { - truncate_data_blocks_range(&dn, count); + f2fs_truncate_data_blocks_range(&dn, count); free_from += count; } f2fs_put_dnode(&dn); free_next: - err = truncate_inode_blocks(inode, free_from); + err = f2fs_truncate_inode_blocks(inode, free_from); out: if (lock) f2fs_unlock_op(sbi); @@ -614,12 +661,20 @@ int f2fs_truncate(struct inode *inode) { int err; + if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) + return -EIO; + if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))) return 0; trace_f2fs_truncate(inode); + if (time_to_inject(F2FS_I_SB(inode), FAULT_TRUNCATE)) { + f2fs_show_injection_info(FAULT_TRUNCATE); + return -EIO; + } + /* we should check inline_data size */ if (!f2fs_may_inline_data(inode)) { err = f2fs_convert_inline_inode(inode); @@ -627,21 +682,57 @@ int f2fs_truncate(struct inode *inode) return err; } - err = truncate_blocks(inode, i_size_read(inode), true); + err = f2fs_truncate_blocks(inode, i_size_read(inode), true); if (err) return err; inode->i_mtime = inode->i_ctime = current_time(inode); - f2fs_mark_inode_dirty_sync(inode); + f2fs_mark_inode_dirty_sync(inode, false); return 0; } int f2fs_getattr(struct vfsmount *mnt, - struct dentry *dentry, struct kstat *stat) + struct dentry *dentry, struct kstat *stat) { struct inode *inode = d_inode(dentry); +#if 0 + struct f2fs_inode_info *fi = F2FS_I(inode); + struct f2fs_inode *ri; + unsigned int flags; + + if (f2fs_has_extra_attr(inode) && + f2fs_sb_has_inode_crtime(inode->i_sb) && + F2FS_FITS_IN_INODE(ri, fi->i_extra_isize, i_crtime)) { + stat->result_mask |= STATX_BTIME; + stat->btime.tv_sec = fi->i_crtime.tv_sec; + stat->btime.tv_nsec = fi->i_crtime.tv_nsec; + } + + flags = fi->i_flags & F2FS_FL_USER_VISIBLE; + if (flags & F2FS_APPEND_FL) + stat->attributes |= STATX_ATTR_APPEND; + if (flags & F2FS_COMPR_FL) + stat->attributes |= STATX_ATTR_COMPRESSED; + if (f2fs_encrypted_inode(inode)) + stat->attributes |= STATX_ATTR_ENCRYPTED; + if (flags & F2FS_IMMUTABLE_FL) + stat->attributes |= STATX_ATTR_IMMUTABLE; + if (flags & F2FS_NODUMP_FL) + stat->attributes |= STATX_ATTR_NODUMP; + + stat->attributes_mask |= (STATX_ATTR_APPEND | + STATX_ATTR_COMPRESSED | + STATX_ATTR_ENCRYPTED | + STATX_ATTR_IMMUTABLE | + STATX_ATTR_NODUMP); +#endif generic_fillattr(inode, stat); - stat->blocks <<= 3; + + /* we need to show initial sectors used for inline_data/dentries */ + if ((S_ISREG(inode->i_mode) && f2fs_has_inline_data(inode)) || + f2fs_has_inline_dentry(inode)) + stat->blocks += (stat->size + 511) >> 9; + return 0; } @@ -679,29 +770,54 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr) { struct inode *inode = d_inode(dentry); int err; + bool size_changed = false; + + if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) + return -EIO; err = setattr_prepare(dentry, attr); if (err) return err; + err = fscrypt_prepare_setattr(dentry, attr); + if (err) + return err; + + if (is_quota_modification(inode, attr)) { + err = dquot_initialize(inode); + if (err) + return err; + } + if ((attr->ia_valid & ATTR_UID && + !uid_eq(attr->ia_uid, inode->i_uid)) || + (attr->ia_valid & ATTR_GID && + !gid_eq(attr->ia_gid, inode->i_gid))) { + err = dquot_transfer(inode, attr); + if (err) + return err; + } + if (attr->ia_valid & ATTR_SIZE) { - if (f2fs_encrypted_inode(inode) && - fscrypt_get_encryption_info(inode)) - return -EACCES; + bool to_smaller = (attr->ia_size <= i_size_read(inode)); - if (attr->ia_size <= i_size_read(inode)) { - truncate_setsize(inode, attr->ia_size); + down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + down_write(&F2FS_I(inode)->i_mmap_sem); + + truncate_setsize(inode, attr->ia_size); + + if (to_smaller) err = f2fs_truncate(inode); - if (err) - return err; - f2fs_balance_fs(F2FS_I_SB(inode), true); - } else { - /* - * do not trim all blocks after i_size if target size is - * larger than i_size. - */ - truncate_setsize(inode, attr->ia_size); + /* + * do not trim all blocks after i_size if target size is + * larger than i_size. + */ + up_write(&F2FS_I(inode)->i_mmap_sem); + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + if (err) + return err; + + if (!to_smaller) { /* should convert inline inode here */ if (!f2fs_may_inline_data(inode)) { err = f2fs_convert_inline_inode(inode); @@ -710,19 +826,30 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr) } inode->i_mtime = inode->i_ctime = current_time(inode); } + + down_write(&F2FS_I(inode)->i_sem); + F2FS_I(inode)->last_disk_size = i_size_read(inode); + up_write(&F2FS_I(inode)->i_sem); + + size_changed = true; } __setattr_copy(inode, attr); if (attr->ia_valid & ATTR_MODE) { - err = posix_acl_chmod(inode, get_inode_mode(inode)); + err = posix_acl_chmod(inode, f2fs_get_inode_mode(inode)); if (err || is_inode_flag_set(inode, FI_ACL_MODE)) { inode->i_mode = F2FS_I(inode)->i_acl_mode; clear_inode_flag(inode, FI_ACL_MODE); } } - f2fs_mark_inode_dirty_sync(inode); + /* file size may changed here */ + f2fs_mark_inode_dirty_sync(inode, size_changed); + + /* inode change will produce dirty node pages flushed by checkpoint */ + f2fs_balance_fs(F2FS_I_SB(inode), true); + return err; } @@ -749,7 +876,7 @@ static int fill_zero(struct inode *inode, pgoff_t index, f2fs_balance_fs(sbi, true); f2fs_lock_op(sbi); - page = get_new_data_page(inode, NULL, index, false); + page = f2fs_get_new_data_page(inode, NULL, index, false); f2fs_unlock_op(sbi); if (IS_ERR(page)) @@ -762,7 +889,7 @@ static int fill_zero(struct inode *inode, pgoff_t index, return 0; } -int truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end) +int f2fs_truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end) { int err; @@ -771,10 +898,11 @@ int truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end) pgoff_t end_offset, count; set_new_dnode(&dn, inode, NULL, NULL, 0); - err = get_dnode_of_data(&dn, pg_start, LOOKUP_NODE); + err = f2fs_get_dnode_of_data(&dn, pg_start, LOOKUP_NODE); if (err) { if (err == -ENOENT) { - pg_start++; + pg_start = f2fs_get_next_page_offset(&dn, + pg_start); continue; } return err; @@ -785,7 +913,7 @@ int truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end) f2fs_bug_on(F2FS_I_SB(inode), count == 0 || count > end_offset); - truncate_data_blocks_range(&dn, count); + f2fs_truncate_data_blocks_range(&dn, count); f2fs_put_dnode(&dn); pg_start += count; @@ -836,12 +964,19 @@ static int punch_hole(struct inode *inode, loff_t offset, loff_t len) blk_start = (loff_t)pg_start << PAGE_SHIFT; blk_end = (loff_t)pg_end << PAGE_SHIFT; + + down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + down_write(&F2FS_I(inode)->i_mmap_sem); + truncate_inode_pages_range(mapping, blk_start, blk_end - 1); f2fs_lock_op(sbi); - ret = truncate_hole(inode, pg_start, pg_end); + ret = f2fs_truncate_hole(inode, pg_start, pg_end); f2fs_unlock_op(sbi); + + up_write(&F2FS_I(inode)->i_mmap_sem); + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); } } @@ -857,7 +992,7 @@ static int __read_out_blkaddrs(struct inode *inode, block_t *blkaddr, next_dnode: set_new_dnode(&dn, inode, NULL, NULL, 0); - ret = get_dnode_of_data(&dn, off, LOOKUP_NODE_RA); + ret = f2fs_get_dnode_of_data(&dn, off, LOOKUP_NODE_RA); if (ret && ret != -ENOENT) { return ret; } else if (ret == -ENOENT) { @@ -872,8 +1007,9 @@ static int __read_out_blkaddrs(struct inode *inode, block_t *blkaddr, done = min((pgoff_t)ADDRS_PER_PAGE(dn.node_page, inode) - dn.ofs_in_node, len); for (i = 0; i < done; i++, blkaddr++, do_replace++, dn.ofs_in_node++) { - *blkaddr = datablock_addr(dn.node_page, dn.ofs_in_node); - if (!is_checkpointed_data(sbi, *blkaddr)) { + *blkaddr = datablock_addr(dn.inode, + dn.node_page, dn.ofs_in_node); + if (!f2fs_is_checkpointed_data(sbi, *blkaddr)) { if (test_opt(sbi, LFS)) { f2fs_put_dnode(&dn); @@ -906,10 +1042,10 @@ static int __roll_back_blkaddrs(struct inode *inode, block_t *blkaddr, continue; set_new_dnode(&dn, inode, NULL, NULL, 0); - ret = get_dnode_of_data(&dn, off + i, LOOKUP_NODE_RA); + ret = f2fs_get_dnode_of_data(&dn, off + i, LOOKUP_NODE_RA); if (ret) { dec_valid_block_count(sbi, inode, 1); - invalidate_blocks(sbi, *blkaddr); + f2fs_invalidate_blocks(sbi, *blkaddr); } else { f2fs_update_data_blkaddr(&dn, *blkaddr); } @@ -939,24 +1075,29 @@ static int __clone_blkaddrs(struct inode *src_inode, struct inode *dst_inode, pgoff_t ilen; set_new_dnode(&dn, dst_inode, NULL, NULL, 0); - ret = get_dnode_of_data(&dn, dst + i, ALLOC_NODE); + ret = f2fs_get_dnode_of_data(&dn, dst + i, ALLOC_NODE); if (ret) return ret; - get_node_info(sbi, dn.nid, &ni); + ret = f2fs_get_node_info(sbi, dn.nid, &ni); + if (ret) { + f2fs_put_dnode(&dn); + return ret; + } + ilen = min((pgoff_t) ADDRS_PER_PAGE(dn.node_page, dst_inode) - dn.ofs_in_node, len - i); do { - dn.data_blkaddr = datablock_addr(dn.node_page, - dn.ofs_in_node); - truncate_data_blocks_range(&dn, 1); + dn.data_blkaddr = datablock_addr(dn.inode, + dn.node_page, dn.ofs_in_node); + f2fs_truncate_data_blocks_range(&dn, 1); if (do_replace[i]) { f2fs_i_blocks_write(src_inode, - 1, false); + 1, false, false); f2fs_i_blocks_write(dst_inode, - 1, true); + 1, true, false); f2fs_replace_block(sbi, &dn, dn.data_blkaddr, blkaddr[i], ni.version, true, false); @@ -973,10 +1114,11 @@ static int __clone_blkaddrs(struct inode *src_inode, struct inode *dst_inode, } else { struct page *psrc, *pdst; - psrc = get_lock_data_page(src_inode, src + i, true); + psrc = f2fs_get_lock_data_page(src_inode, + src + i, true); if (IS_ERR(psrc)) return PTR_ERR(psrc); - pdst = get_new_data_page(dst_inode, NULL, dst + i, + pdst = f2fs_get_new_data_page(dst_inode, NULL, dst + i, true); if (IS_ERR(pdst)) { f2fs_put_page(psrc, 1); @@ -987,7 +1129,8 @@ static int __clone_blkaddrs(struct inode *src_inode, struct inode *dst_inode, f2fs_put_page(pdst, 1); f2fs_put_page(psrc, 1); - ret = truncate_hole(src_inode, src + i, src + i + 1); + ret = f2fs_truncate_hole(src_inode, + src + i, src + i + 1); if (ret) return ret; i++; @@ -1008,11 +1151,15 @@ static int __exchange_data_block(struct inode *src_inode, while (len) { olen = min((pgoff_t)4 * ADDRS_PER_BLOCK, len); - src_blkaddr = f2fs_kvzalloc(sizeof(block_t) * olen, GFP_KERNEL); + src_blkaddr = f2fs_kvzalloc(F2FS_I_SB(src_inode), + array_size(olen, sizeof(block_t)), + GFP_KERNEL); if (!src_blkaddr) return -ENOMEM; - do_replace = f2fs_kvzalloc(sizeof(int) * olen, GFP_KERNEL); + do_replace = f2fs_kvzalloc(F2FS_I_SB(src_inode), + array_size(olen, sizeof(int)), + GFP_KERNEL); if (!do_replace) { kvfree(src_blkaddr); return -ENOMEM; @@ -1038,31 +1185,39 @@ static int __exchange_data_block(struct inode *src_inode, return 0; roll_back: - __roll_back_blkaddrs(src_inode, src_blkaddr, do_replace, src, len); + __roll_back_blkaddrs(src_inode, src_blkaddr, do_replace, src, olen); kvfree(src_blkaddr); kvfree(do_replace); return ret; } -static int f2fs_do_collapse(struct inode *inode, pgoff_t start, pgoff_t end) +static int f2fs_do_collapse(struct inode *inode, loff_t offset, loff_t len) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); pgoff_t nrpages = (i_size_read(inode) + PAGE_SIZE - 1) / PAGE_SIZE; + pgoff_t start = offset >> PAGE_SHIFT; + pgoff_t end = (offset + len) >> PAGE_SHIFT; int ret; f2fs_balance_fs(sbi, true); + + /* avoid gc operation during block exchange */ + down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + down_write(&F2FS_I(inode)->i_mmap_sem); + f2fs_lock_op(sbi); - f2fs_drop_extent_tree(inode); - + truncate_pagecache(inode, offset); ret = __exchange_data_block(inode, inode, end, start, nrpages - end, true); f2fs_unlock_op(sbi); + + up_write(&F2FS_I(inode)->i_mmap_sem); + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); return ret; } static int f2fs_collapse_range(struct inode *inode, loff_t offset, loff_t len) { - pgoff_t pg_start, pg_end; loff_t new_size; int ret; @@ -1077,31 +1232,27 @@ static int f2fs_collapse_range(struct inode *inode, loff_t offset, loff_t len) if (ret) return ret; - pg_start = offset >> PAGE_SHIFT; - pg_end = (offset + len) >> PAGE_SHIFT; - /* write out all dirty pages from offset */ ret = filemap_write_and_wait_range(inode->i_mapping, offset, LLONG_MAX); if (ret) return ret; - truncate_pagecache(inode, offset); - - ret = f2fs_do_collapse(inode, pg_start, pg_end); + ret = f2fs_do_collapse(inode, offset, len); if (ret) return ret; /* write out all moved pages, if possible */ + down_write(&F2FS_I(inode)->i_mmap_sem); filemap_write_and_wait_range(inode->i_mapping, offset, LLONG_MAX); truncate_pagecache(inode, offset); new_size = i_size_read(inode) - len; truncate_pagecache(inode, new_size); - ret = truncate_blocks(inode, new_size, true); + ret = f2fs_truncate_blocks(inode, new_size, true); + up_write(&F2FS_I(inode)->i_mmap_sem); if (!ret) f2fs_i_size_write(inode, new_size); - return ret; } @@ -1115,21 +1266,22 @@ static int f2fs_do_zero_range(struct dnode_of_data *dn, pgoff_t start, int ret; for (; index < end; index++, dn->ofs_in_node++) { - if (datablock_addr(dn->node_page, dn->ofs_in_node) == NULL_ADDR) + if (datablock_addr(dn->inode, dn->node_page, + dn->ofs_in_node) == NULL_ADDR) count++; } dn->ofs_in_node = ofs_in_node; - ret = reserve_new_blocks(dn, count); + ret = f2fs_reserve_new_blocks(dn, count); if (ret) return ret; dn->ofs_in_node = ofs_in_node; for (index = start; index < end; index++, dn->ofs_in_node++) { - dn->data_blkaddr = - datablock_addr(dn->node_page, dn->ofs_in_node); + dn->data_blkaddr = datablock_addr(dn->inode, + dn->node_page, dn->ofs_in_node); /* - * reserve_new_blocks will not guarantee entire block + * f2fs_reserve_new_blocks will not guarantee entire block * allocation. */ if (dn->data_blkaddr == NULL_ADDR) { @@ -1137,9 +1289,9 @@ static int f2fs_do_zero_range(struct dnode_of_data *dn, pgoff_t start, break; } if (dn->data_blkaddr != NEW_ADDR) { - invalidate_blocks(sbi, dn->data_blkaddr); + f2fs_invalidate_blocks(sbi, dn->data_blkaddr); dn->data_blkaddr = NEW_ADDR; - set_data_blkaddr(dn); + f2fs_set_data_blkaddr(dn); } } @@ -1170,8 +1322,6 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len, if (ret) return ret; - truncate_pagecache_range(inode, offset, offset + len - 1); - pg_start = ((unsigned long long) offset) >> PAGE_SHIFT; pg_end = ((unsigned long long) offset + len) >> PAGE_SHIFT; @@ -1184,8 +1334,6 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len, if (ret) return ret; - if (offset + len > new_size) - new_size = offset + len; new_size = max_t(loff_t, new_size, offset + len); } else { if (off_start) { @@ -1203,12 +1351,21 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len, unsigned int end_offset; pgoff_t end; + down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + down_write(&F2FS_I(inode)->i_mmap_sem); + + truncate_pagecache_range(inode, + (loff_t)index << PAGE_SHIFT, + ((loff_t)pg_end << PAGE_SHIFT) - 1); + f2fs_lock_op(sbi); set_new_dnode(&dn, inode, NULL, NULL, 0); - ret = get_dnode_of_data(&dn, index, ALLOC_NODE); + ret = f2fs_get_dnode_of_data(&dn, index, ALLOC_NODE); if (ret) { f2fs_unlock_op(sbi); + up_write(&F2FS_I(inode)->i_mmap_sem); + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); goto out; } @@ -1217,7 +1374,13 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len, ret = f2fs_do_zero_range(&dn, index, end); f2fs_put_dnode(&dn); + f2fs_unlock_op(sbi); + up_write(&F2FS_I(inode)->i_mmap_sem); + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + + f2fs_balance_fs(sbi, dn.node_changed); + if (ret) goto out; @@ -1236,9 +1399,12 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len, } out: - if (!(mode & FALLOC_FL_KEEP_SIZE) && i_size_read(inode) < new_size) - f2fs_i_size_write(inode, new_size); - + if (new_size > i_size_read(inode)) { + if (mode & FALLOC_FL_KEEP_SIZE) + file_set_keep_isize(inode); + else + f2fs_i_size_write(inode, new_size); + } return ret; } @@ -1250,8 +1416,9 @@ static int f2fs_insert_range(struct inode *inode, loff_t offset, loff_t len) int ret = 0; new_size = i_size_read(inode) + len; - if (new_size > inode->i_sb->s_maxbytes) - return -EFBIG; + ret = inode_newsize_ok(inode, new_size); + if (ret) + return ret; if (offset >= i_size_read(inode)) return -EINVAL; @@ -1266,7 +1433,9 @@ static int f2fs_insert_range(struct inode *inode, loff_t offset, loff_t len) f2fs_balance_fs(sbi, true); - ret = truncate_blocks(inode, i_size_read(inode), true); + down_write(&F2FS_I(inode)->i_mmap_sem); + ret = f2fs_truncate_blocks(inode, i_size_read(inode), true); + up_write(&F2FS_I(inode)->i_mmap_sem); if (ret) return ret; @@ -1275,13 +1444,16 @@ static int f2fs_insert_range(struct inode *inode, loff_t offset, loff_t len) if (ret) return ret; - truncate_pagecache(inode, offset); - pg_start = offset >> PAGE_SHIFT; pg_end = (offset + len) >> PAGE_SHIFT; delta = pg_end - pg_start; idx = (i_size_read(inode) + PAGE_SIZE - 1) / PAGE_SIZE; + /* avoid gc operation during block exchange */ + down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + down_write(&F2FS_I(inode)->i_mmap_sem); + truncate_pagecache(inode, offset); + while (!ret && idx > pg_start) { nr = idx - pg_start; if (nr > delta) @@ -1295,10 +1467,14 @@ static int f2fs_insert_range(struct inode *inode, loff_t offset, loff_t len) idx + delta, nr, false); f2fs_unlock_op(sbi); } + up_write(&F2FS_I(inode)->i_mmap_sem); + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); /* write out all moved pages, if possible */ + down_write(&F2FS_I(inode)->i_mmap_sem); filemap_write_and_wait_range(inode->i_mapping, offset, LLONG_MAX); truncate_pagecache(inode, offset); + up_write(&F2FS_I(inode)->i_mmap_sem); if (!ret) f2fs_i_size_write(inode, new_size); @@ -1309,19 +1485,20 @@ static int expand_inode_data(struct inode *inode, loff_t offset, loff_t len, int mode) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); - struct f2fs_map_blocks map = { .m_next_pgofs = NULL }; + struct f2fs_map_blocks map = { .m_next_pgofs = NULL, + .m_next_extent = NULL, .m_seg_type = NO_CHECK_TYPE }; pgoff_t pg_end; loff_t new_size = i_size_read(inode); loff_t off_end; - int ret; + int err; - ret = inode_newsize_ok(inode, (len + offset)); - if (ret) - return ret; + err = inode_newsize_ok(inode, (len + offset)); + if (err) + return err; - ret = f2fs_convert_inline_inode(inode); - if (ret) - return ret; + err = f2fs_convert_inline_inode(inode); + if (err) + return err; f2fs_balance_fs(sbi, true); @@ -1333,26 +1510,30 @@ static int expand_inode_data(struct inode *inode, loff_t offset, if (off_end) map.m_len++; - ret = f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_AIO); - if (ret) { + err = f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_AIO); + if (err) { pgoff_t last_off; if (!map.m_len) - return ret; + return err; last_off = map.m_lblk + map.m_len - 1; /* update new size to the failed position */ - new_size = (last_off == pg_end) ? offset + len: + new_size = (last_off == pg_end) ? offset + len : (loff_t)(last_off + 1) << PAGE_SHIFT; } else { new_size = ((loff_t)pg_end << PAGE_SHIFT) + off_end; } - if (!(mode & FALLOC_FL_KEEP_SIZE) && i_size_read(inode) < new_size) - f2fs_i_size_write(inode, new_size); + if (new_size > i_size_read(inode)) { + if (mode & FALLOC_FL_KEEP_SIZE) + file_set_keep_isize(inode); + else + f2fs_i_size_write(inode, new_size); + } - return ret; + return err; } static long f2fs_fallocate(struct file *file, int mode, @@ -1361,6 +1542,9 @@ static long f2fs_fallocate(struct file *file, int mode, struct inode *inode = file_inode(file); long ret = 0; + if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) + return -EIO; + /* f2fs only support ->fallocate for regular file */ if (!S_ISREG(inode->i_mode)) return -EINVAL; @@ -1393,7 +1577,7 @@ static long f2fs_fallocate(struct file *file, int mode, if (!ret) { inode->i_mtime = inode->i_ctime = current_time(inode); - f2fs_mark_inode_dirty_sync(inode); + f2fs_mark_inode_dirty_sync(inode, false); f2fs_update_time(F2FS_I_SB(inode), REQ_TIME); } @@ -1416,34 +1600,46 @@ static int f2fs_release_file(struct inode *inode, struct file *filp) /* some remained atomic pages should discarded */ if (f2fs_is_atomic_file(inode)) - drop_inmem_pages(inode); + f2fs_drop_inmem_pages(inode); if (f2fs_is_volatile_file(inode)) { - clear_inode_flag(inode, FI_VOLATILE_FILE); set_inode_flag(inode, FI_DROP_CACHE); filemap_fdatawrite(inode->i_mapping); clear_inode_flag(inode, FI_DROP_CACHE); + clear_inode_flag(inode, FI_VOLATILE_FILE); + stat_dec_volatile_write(inode); } return 0; } -#define F2FS_REG_FLMASK (~(FS_DIRSYNC_FL | FS_TOPDIR_FL)) -#define F2FS_OTHER_FLMASK (FS_NODUMP_FL | FS_NOATIME_FL) - -static inline __u32 f2fs_mask_flags(umode_t mode, __u32 flags) +static int f2fs_file_flush(struct file *file, fl_owner_t id) { - if (S_ISDIR(mode)) - return flags; - else if (S_ISREG(mode)) - return flags & F2FS_REG_FLMASK; - else - return flags & F2FS_OTHER_FLMASK; + struct inode *inode = file_inode(file); + + /* + * If the process doing a transaction is crashed, we should do + * roll-back. Otherwise, other reader/write can see corrupted database + * until all the writers close its file. Since this should be done + * before dropping file lock, it needs to do in ->flush. + */ + if (f2fs_is_atomic_file(inode) && + F2FS_I(inode)->inmem_task == current) + f2fs_drop_inmem_pages(inode); + return 0; } static int f2fs_ioc_getflags(struct file *filp, unsigned long arg) { struct inode *inode = file_inode(filp); struct f2fs_inode_info *fi = F2FS_I(inode); - unsigned int flags = fi->i_flags & FS_FL_USER_VISIBLE; + unsigned int flags = fi->i_flags; + + if (f2fs_encrypted_inode(inode)) + flags |= F2FS_ENCRYPT_FL; + if (f2fs_has_inline_data(inode) || f2fs_has_inline_dentry(inode)) + flags |= F2FS_INLINE_DATA_FL; + + flags &= F2FS_FL_USER_VISIBLE; + return put_user(flags, (int __user *)arg); } @@ -1465,28 +1661,34 @@ static int f2fs_ioc_setflags(struct file *filp, unsigned long arg) if (ret) return ret; - flags = f2fs_mask_flags(inode->i_mode, flags); - inode_lock(inode); + /* Is it quota file? Do not allow user to mess with it */ + if (IS_NOQUOTA(inode)) { + ret = -EPERM; + goto unlock_out; + } + + flags = f2fs_mask_flags(inode->i_mode, flags); + oldflags = fi->i_flags; - if ((flags ^ oldflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) { + if ((flags ^ oldflags) & (F2FS_APPEND_FL | F2FS_IMMUTABLE_FL)) { if (!capable(CAP_LINUX_IMMUTABLE)) { - inode_unlock(inode); ret = -EPERM; - goto out; + goto unlock_out; } } - flags = flags & FS_FL_USER_MODIFIABLE; - flags |= oldflags & ~FS_FL_USER_MODIFIABLE; + flags = flags & (F2FS_FL_USER_MODIFIABLE); + flags |= oldflags & ~(F2FS_FL_USER_MODIFIABLE); fi->i_flags = flags; - inode_unlock(inode); inode->i_ctime = current_time(inode); f2fs_set_inode_flags(inode); -out: + f2fs_mark_inode_dirty_sync(inode, false); +unlock_out: + inode_unlock(inode); mnt_drop_write_file(filp); return ret; } @@ -1506,35 +1708,48 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) if (!inode_owner_or_capable(inode)) return -EACCES; + if (!S_ISREG(inode->i_mode)) + return -EINVAL; + ret = mnt_want_write_file(filp); if (ret) return ret; inode_lock(inode); - down_write(&F2FS_I(inode)->dio_rwsem[WRITE]); - - if (f2fs_is_atomic_file(inode)) + if (f2fs_is_atomic_file(inode)) { + if (is_inode_flag_set(inode, FI_ATOMIC_REVOKE_REQUEST)) + ret = -EINVAL; goto out; + } ret = f2fs_convert_inline_inode(inode); if (ret) goto out; - set_inode_flag(inode, FI_ATOMIC_FILE); - f2fs_update_time(F2FS_I_SB(inode), REQ_TIME); + down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); if (!get_dirty_pages(inode)) - goto out; + goto skip_flush; f2fs_msg(F2FS_I_SB(inode)->sb, KERN_WARNING, "Unexpected flush for atomic writes: ino=%lu, npages=%u", inode->i_ino, get_dirty_pages(inode)); ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX); - if (ret) - clear_inode_flag(inode, FI_ATOMIC_FILE); + if (ret) { + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + goto out; + } +skip_flush: + set_inode_flag(inode, FI_ATOMIC_FILE); + clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST); + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + + f2fs_update_time(F2FS_I_SB(inode), REQ_TIME); + F2FS_I(inode)->inmem_task = current; + stat_inc_atomic_write(inode); + stat_update_max_atomic_write(inode); out: - up_write(&F2FS_I(inode)->dio_rwsem[WRITE]); inode_unlock(inode); mnt_drop_write_file(filp); return ret; @@ -1552,22 +1767,34 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp) if (ret) return ret; + f2fs_balance_fs(F2FS_I_SB(inode), true); + inode_lock(inode); - if (f2fs_is_volatile_file(inode)) + if (f2fs_is_volatile_file(inode)) { + ret = -EINVAL; goto err_out; - - if (f2fs_is_atomic_file(inode)) { - clear_inode_flag(inode, FI_ATOMIC_FILE); - ret = commit_inmem_pages(inode); - if (ret) { - set_inode_flag(inode, FI_ATOMIC_FILE); - goto err_out; - } } - ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true); + if (f2fs_is_atomic_file(inode)) { + ret = f2fs_commit_inmem_pages(inode); + if (ret) + goto err_out; + + ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true); + if (!ret) { + clear_inode_flag(inode, FI_ATOMIC_FILE); + F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0; + stat_dec_atomic_write(inode); + } + } else { + ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 1, false); + } err_out: + if (is_inode_flag_set(inode, FI_ATOMIC_REVOKE_REQUEST)) { + clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST); + ret = -EINVAL; + } inode_unlock(inode); mnt_drop_write_file(filp); return ret; @@ -1581,6 +1808,9 @@ static int f2fs_ioc_start_volatile_write(struct file *filp) if (!inode_owner_or_capable(inode)) return -EACCES; + if (!S_ISREG(inode->i_mode)) + return -EINVAL; + ret = mnt_want_write_file(filp); if (ret) return ret; @@ -1594,6 +1824,9 @@ static int f2fs_ioc_start_volatile_write(struct file *filp) if (ret) goto out; + stat_inc_volatile_write(inode); + stat_update_max_volatile_write(inode); + set_inode_flag(inode, FI_VOLATILE_FILE); f2fs_update_time(F2FS_I_SB(inode), REQ_TIME); out: @@ -1646,12 +1879,15 @@ static int f2fs_ioc_abort_volatile_write(struct file *filp) inode_lock(inode); if (f2fs_is_atomic_file(inode)) - drop_inmem_pages(inode); + f2fs_drop_inmem_pages(inode); if (f2fs_is_volatile_file(inode)) { clear_inode_flag(inode, FI_VOLATILE_FILE); + stat_dec_volatile_write(inode); ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true); } + clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST); + inode_unlock(inode); mnt_drop_write_file(filp); @@ -1682,27 +1918,44 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg) switch (in) { case F2FS_GOING_DOWN_FULLSYNC: sb = freeze_bdev(sb->s_bdev); - if (sb && !IS_ERR(sb)) { + if (IS_ERR(sb)) { + ret = PTR_ERR(sb); + goto out; + } + if (sb) { f2fs_stop_checkpoint(sbi, false); + set_sbi_flag(sbi, SBI_IS_SHUTDOWN); thaw_bdev(sb->s_bdev, sb); } break; case F2FS_GOING_DOWN_METASYNC: /* do checkpoint only */ - f2fs_sync_fs(sb, 1); + ret = f2fs_sync_fs(sb, 1); + if (ret) + goto out; f2fs_stop_checkpoint(sbi, false); + set_sbi_flag(sbi, SBI_IS_SHUTDOWN); break; case F2FS_GOING_DOWN_NOSYNC: f2fs_stop_checkpoint(sbi, false); + set_sbi_flag(sbi, SBI_IS_SHUTDOWN); break; case F2FS_GOING_DOWN_METAFLUSH: - sync_meta_pages(sbi, META, LONG_MAX); + f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_META_IO); f2fs_stop_checkpoint(sbi, false); + set_sbi_flag(sbi, SBI_IS_SHUTDOWN); break; default: ret = -EINVAL; goto out; } + + f2fs_stop_gc_thread(sbi); + f2fs_stop_discard_thread(sbi); + + f2fs_drop_discard_cmd(sbi); + clear_opt(sbi, DISCARD); + f2fs_update_time(sbi, REQ_TIME); out: if (in != F2FS_GOING_DOWN_FULLSYNC) @@ -1758,31 +2011,21 @@ static bool uuid_is_nonzero(__u8 u[16]) static int f2fs_ioc_set_encryption_policy(struct file *filp, unsigned long arg) { - struct fscrypt_policy policy; struct inode *inode = file_inode(filp); - if (copy_from_user(&policy, (struct fscrypt_policy __user *)arg, - sizeof(policy))) - return -EFAULT; + if (!f2fs_sb_has_encrypt(inode->i_sb)) + return -EOPNOTSUPP; f2fs_update_time(F2FS_I_SB(inode), REQ_TIME); - return fscrypt_process_policy(filp, &policy); + return fscrypt_ioctl_set_policy(filp, (const void __user *)arg); } static int f2fs_ioc_get_encryption_policy(struct file *filp, unsigned long arg) { - struct fscrypt_policy policy; - struct inode *inode = file_inode(filp); - int err; - - err = fscrypt_get_policy(inode, &policy); - if (err) - return err; - - if (copy_to_user((struct fscrypt_policy __user *)arg, &policy, sizeof(policy))) - return -EFAULT; - return 0; + if (!f2fs_sb_has_encrypt(file_inode(filp)->i_sb)) + return -EOPNOTSUPP; + return fscrypt_ioctl_get_policy(filp, (void __user *)arg); } static int f2fs_ioc_get_encryption_pwsalt(struct file *filp, unsigned long arg) @@ -1791,16 +2034,18 @@ static int f2fs_ioc_get_encryption_pwsalt(struct file *filp, unsigned long arg) struct f2fs_sb_info *sbi = F2FS_I_SB(inode); int err; - if (!f2fs_sb_has_crypto(inode->i_sb)) + if (!f2fs_sb_has_encrypt(inode->i_sb)) return -EOPNOTSUPP; - if (uuid_is_nonzero(sbi->raw_super->encrypt_pw_salt)) - goto got_it; - err = mnt_want_write_file(filp); if (err) return err; + down_write(&sbi->sb_lock); + + if (uuid_is_nonzero(sbi->raw_super->encrypt_pw_salt)) + goto got_it; + /* update superblock with uuid */ generate_random_uuid(sbi->raw_super->encrypt_pw_salt); @@ -1808,15 +2053,16 @@ static int f2fs_ioc_get_encryption_pwsalt(struct file *filp, unsigned long arg) if (err) { /* undo new data */ memset(sbi->raw_super->encrypt_pw_salt, 0, 16); - mnt_drop_write_file(filp); - return err; + goto out_err; } - mnt_drop_write_file(filp); got_it: if (copy_to_user((__u8 __user *)arg, sbi->raw_super->encrypt_pw_salt, 16)) - return -EFAULT; - return 0; + err = -EFAULT; +out_err: + up_write(&sbi->sb_lock); + mnt_drop_write_file(filp); + return err; } static int f2fs_ioc_gc(struct file *filp, unsigned long arg) @@ -1848,7 +2094,53 @@ static int f2fs_ioc_gc(struct file *filp, unsigned long arg) mutex_lock(&sbi->gc_mutex); } - ret = f2fs_gc(sbi, sync); + ret = f2fs_gc(sbi, sync, true, NULL_SEGNO); +out: + mnt_drop_write_file(filp); + return ret; +} + +static int f2fs_ioc_gc_range(struct file *filp, unsigned long arg) +{ + struct inode *inode = file_inode(filp); + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + struct f2fs_gc_range range; + u64 end; + int ret; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (copy_from_user(&range, (struct f2fs_gc_range __user *)arg, + sizeof(range))) + return -EFAULT; + + if (f2fs_readonly(sbi->sb)) + return -EROFS; + + end = range.start + range.len; + if (range.start < MAIN_BLKADDR(sbi) || end >= MAX_BLKADDR(sbi)) { + return -EINVAL; + } + + ret = mnt_want_write_file(filp); + if (ret) + return ret; + +do_more: + if (!range.sync) { + if (!mutex_trylock(&sbi->gc_mutex)) { + ret = -EBUSY; + goto out; + } + } else { + mutex_lock(&sbi->gc_mutex); + } + + ret = f2fs_gc(sbi, range.sync, true, GET_SEGNO(sbi, range.start)); + range.start += sbi->blocks_per_seg; + if (range.start <= end) + goto do_more; out: mnt_drop_write_file(filp); return ret; @@ -1881,18 +2173,18 @@ static int f2fs_defragment_range(struct f2fs_sb_info *sbi, struct f2fs_defragment *range) { struct inode *inode = file_inode(filp); - struct f2fs_map_blocks map = { .m_next_pgofs = NULL }; - struct extent_info ei; - pgoff_t pg_start, pg_end; + struct f2fs_map_blocks map = { .m_next_extent = NULL, + .m_seg_type = NO_CHECK_TYPE }; + struct extent_info ei = {0, 0, 0}; + pgoff_t pg_start, pg_end, next_pgofs; unsigned int blk_per_seg = sbi->blocks_per_seg; unsigned int total = 0, sec_num; - unsigned int pages_per_sec = sbi->segs_per_sec * blk_per_seg; block_t blk_end = 0; bool fragmented = false; int err; /* if in-place-update policy is enabled, don't waste time here */ - if (need_inplace_update(inode)) + if (f2fs_should_update_inplace(inode, NULL)) return -EINVAL; pg_start = range->start >> PAGE_SHIFT; @@ -1918,6 +2210,7 @@ static int f2fs_defragment_range(struct f2fs_sb_info *sbi, } map.m_lblk = pg_start; + map.m_next_pgofs = &next_pgofs; /* * lookup mapping info in dnode page cache, skip defragmenting if all @@ -1926,19 +2219,21 @@ static int f2fs_defragment_range(struct f2fs_sb_info *sbi, */ while (map.m_lblk < pg_end) { map.m_len = pg_end - map.m_lblk; - err = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_READ); + err = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_DEFAULT); if (err) goto out; if (!(map.m_flags & F2FS_MAP_FLAGS)) { - map.m_lblk++; + map.m_lblk = next_pgofs; continue; } - if (blk_end && blk_end != map.m_pblk) { + if (blk_end && blk_end != map.m_pblk) fragmented = true; - break; - } + + /* record total count of block that we're going to move */ + total += map.m_len; + blk_end = map.m_pblk + map.m_len; map.m_lblk += map.m_len; @@ -1947,10 +2242,7 @@ static int f2fs_defragment_range(struct f2fs_sb_info *sbi, if (!fragmented) goto out; - map.m_lblk = pg_start; - map.m_len = pg_end - pg_start; - - sec_num = (map.m_len + pages_per_sec - 1) / pages_per_sec; + sec_num = (total + BLKS_PER_SEC(sbi) - 1) / BLKS_PER_SEC(sbi); /* * make sure there are enough free section for LFS allocation, this can @@ -1962,18 +2254,22 @@ static int f2fs_defragment_range(struct f2fs_sb_info *sbi, goto out; } + map.m_lblk = pg_start; + map.m_len = pg_end - pg_start; + total = 0; + while (map.m_lblk < pg_end) { pgoff_t idx; int cnt = 0; do_map: map.m_len = pg_end - map.m_lblk; - err = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_READ); + err = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_DEFAULT); if (err) goto clear_out; if (!(map.m_flags & F2FS_MAP_FLAGS)) { - map.m_lblk++; + map.m_lblk = next_pgofs; continue; } @@ -1983,7 +2279,7 @@ static int f2fs_defragment_range(struct f2fs_sb_info *sbi, while (idx < map.m_lblk + map.m_len && cnt < blk_per_seg) { struct page *page; - page = get_lock_data_page(inode, idx, true); + page = f2fs_get_lock_data_page(inode, idx, true); if (IS_ERR(page)) { err = PTR_ERR(page); goto clear_out; @@ -2027,42 +2323,40 @@ static int f2fs_ioc_defragment(struct file *filp, unsigned long arg) if (!capable(CAP_SYS_ADMIN)) return -EPERM; - if (!S_ISREG(inode->i_mode)) + if (!S_ISREG(inode->i_mode) || f2fs_is_atomic_file(inode)) + return -EINVAL; + + if (f2fs_readonly(sbi->sb)) + return -EROFS; + + if (copy_from_user(&range, (struct f2fs_defragment __user *)arg, + sizeof(range))) + return -EFAULT; + + /* verify alignment of offset & size */ + if (range.start & (F2FS_BLKSIZE - 1) || range.len & (F2FS_BLKSIZE - 1)) + return -EINVAL; + + if (unlikely((range.start + range.len) >> PAGE_SHIFT > + sbi->max_file_blocks)) return -EINVAL; err = mnt_want_write_file(filp); if (err) return err; - if (f2fs_readonly(sbi->sb)) { - err = -EROFS; - goto out; - } - - if (copy_from_user(&range, (struct f2fs_defragment __user *)arg, - sizeof(range))) { - err = -EFAULT; - goto out; - } - - /* verify alignment of offset & size */ - if (range.start & (F2FS_BLKSIZE - 1) || - range.len & (F2FS_BLKSIZE - 1)) { - err = -EINVAL; - goto out; - } - err = f2fs_defragment_range(sbi, filp, &range); + mnt_drop_write_file(filp); + f2fs_update_time(sbi, REQ_TIME); if (err < 0) - goto out; + return err; if (copy_to_user((struct f2fs_defragment __user *)arg, &range, sizeof(range))) - err = -EFAULT; -out: - mnt_drop_write_file(filp); - return err; + return -EFAULT; + + return 0; } static int f2fs_move_file_range(struct file *file_in, loff_t pos_in, @@ -2097,10 +2391,9 @@ static int f2fs_move_file_range(struct file *file_in, loff_t pos_in, inode_lock(src); if (src != dst) { - if (!inode_trylock(dst)) { - ret = -EBUSY; + ret = -EBUSY; + if (!inode_trylock(dst)) goto out; - } } ret = -EINVAL; @@ -2145,6 +2438,14 @@ static int f2fs_move_file_range(struct file *file_in, loff_t pos_in, goto out_unlock; f2fs_balance_fs(sbi, true); + + down_write(&F2FS_I(src)->i_gc_rwsem[WRITE]); + if (src != dst) { + ret = -EBUSY; + if (!down_write_trylock(&F2FS_I(dst)->i_gc_rwsem[WRITE])) + goto out_src; + } + f2fs_lock_op(sbi); ret = __exchange_data_block(src, dst, pos_in >> F2FS_BLKSIZE_BITS, pos_out >> F2FS_BLKSIZE_BITS, @@ -2157,6 +2458,11 @@ static int f2fs_move_file_range(struct file *file_in, loff_t pos_in, f2fs_i_size_write(dst, dst_osize); } f2fs_unlock_op(sbi); + + if (src != dst) + up_write(&F2FS_I(dst)->i_gc_rwsem[WRITE]); +out_src: + up_write(&F2FS_I(src)->i_gc_rwsem[WRITE]); out_unlock: if (src != dst) inode_unlock(dst); @@ -2196,6 +2502,8 @@ static int f2fs_ioc_move_range(struct file *filp, unsigned long arg) range.pos_out, range.len); mnt_drop_write_file(filp); + if (err) + goto err_out; if (copy_to_user((struct f2fs_move_range __user *)arg, &range, sizeof(range))) @@ -2205,8 +2513,205 @@ static int f2fs_ioc_move_range(struct file *filp, unsigned long arg) return err; } +static int f2fs_ioc_flush_device(struct file *filp, unsigned long arg) +{ + struct inode *inode = file_inode(filp); + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + struct sit_info *sm = SIT_I(sbi); + unsigned int start_segno = 0, end_segno = 0; + unsigned int dev_start_segno = 0, dev_end_segno = 0; + struct f2fs_flush_device range; + int ret; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (f2fs_readonly(sbi->sb)) + return -EROFS; + + if (copy_from_user(&range, (struct f2fs_flush_device __user *)arg, + sizeof(range))) + return -EFAULT; + + if (sbi->s_ndevs <= 1 || sbi->s_ndevs - 1 <= range.dev_num || + sbi->segs_per_sec != 1) { + f2fs_msg(sbi->sb, KERN_WARNING, + "Can't flush %u in %d for segs_per_sec %u != 1\n", + range.dev_num, sbi->s_ndevs, + sbi->segs_per_sec); + return -EINVAL; + } + + ret = mnt_want_write_file(filp); + if (ret) + return ret; + + if (range.dev_num != 0) + dev_start_segno = GET_SEGNO(sbi, FDEV(range.dev_num).start_blk); + dev_end_segno = GET_SEGNO(sbi, FDEV(range.dev_num).end_blk); + + start_segno = sm->last_victim[FLUSH_DEVICE]; + if (start_segno < dev_start_segno || start_segno >= dev_end_segno) + start_segno = dev_start_segno; + end_segno = min(start_segno + range.segments, dev_end_segno); + + while (start_segno < end_segno) { + if (!mutex_trylock(&sbi->gc_mutex)) { + ret = -EBUSY; + goto out; + } + sm->last_victim[GC_CB] = end_segno + 1; + sm->last_victim[GC_GREEDY] = end_segno + 1; + sm->last_victim[ALLOC_NEXT] = end_segno + 1; + ret = f2fs_gc(sbi, true, true, start_segno); + if (ret == -EAGAIN) + ret = 0; + else if (ret < 0) + break; + start_segno++; + } +out: + mnt_drop_write_file(filp); + return ret; +} + +static int f2fs_ioc_get_features(struct file *filp, unsigned long arg) +{ + struct inode *inode = file_inode(filp); + u32 sb_feature = le32_to_cpu(F2FS_I_SB(inode)->raw_super->feature); + + /* Must validate to set it with SQLite behavior in Android. */ + sb_feature |= F2FS_FEATURE_ATOMIC_WRITE; + + return put_user(sb_feature, (u32 __user *)arg); +} + +int f2fs_pin_file_control(struct inode *inode, bool inc) +{ + struct f2fs_inode_info *fi = F2FS_I(inode); + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + + /* Use i_gc_failures for normal file as a risk signal. */ + if (inc) + f2fs_i_gc_failures_write(inode, + fi->i_gc_failures[GC_FAILURE_PIN] + 1); + + if (fi->i_gc_failures[GC_FAILURE_PIN] > sbi->gc_pin_file_threshold) { + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: Enable GC = ino %lx after %x GC trials\n", + __func__, inode->i_ino, + fi->i_gc_failures[GC_FAILURE_PIN]); + clear_inode_flag(inode, FI_PIN_FILE); + return -EAGAIN; + } + return 0; +} + +static int f2fs_ioc_set_pin_file(struct file *filp, unsigned long arg) +{ + struct inode *inode = file_inode(filp); + __u32 pin; + int ret = 0; + + if (!inode_owner_or_capable(inode)) + return -EACCES; + + if (get_user(pin, (__u32 __user *)arg)) + return -EFAULT; + + if (!S_ISREG(inode->i_mode)) + return -EINVAL; + + if (f2fs_readonly(F2FS_I_SB(inode)->sb)) + return -EROFS; + + ret = mnt_want_write_file(filp); + if (ret) + return ret; + + inode_lock(inode); + + if (f2fs_should_update_outplace(inode, NULL)) { + ret = -EINVAL; + goto out; + } + + if (!pin) { + clear_inode_flag(inode, FI_PIN_FILE); + f2fs_i_gc_failures_write(inode, 0); + goto done; + } + + if (f2fs_pin_file_control(inode, false)) { + ret = -EAGAIN; + goto out; + } + ret = f2fs_convert_inline_inode(inode); + if (ret) + goto out; + + set_inode_flag(inode, FI_PIN_FILE); + ret = F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN]; +done: + f2fs_update_time(F2FS_I_SB(inode), REQ_TIME); +out: + inode_unlock(inode); + mnt_drop_write_file(filp); + return ret; +} + +static int f2fs_ioc_get_pin_file(struct file *filp, unsigned long arg) +{ + struct inode *inode = file_inode(filp); + __u32 pin = 0; + + if (is_inode_flag_set(inode, FI_PIN_FILE)) + pin = F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN]; + return put_user(pin, (u32 __user *)arg); +} + +int f2fs_precache_extents(struct inode *inode) +{ + struct f2fs_inode_info *fi = F2FS_I(inode); + struct f2fs_map_blocks map; + pgoff_t m_next_extent; + loff_t end; + int err; + + if (is_inode_flag_set(inode, FI_NO_EXTENT)) + return -EOPNOTSUPP; + + map.m_lblk = 0; + map.m_next_pgofs = NULL; + map.m_next_extent = &m_next_extent; + map.m_seg_type = NO_CHECK_TYPE; + end = F2FS_I_SB(inode)->max_file_blocks; + + while (map.m_lblk < end) { + map.m_len = end - map.m_lblk; + + down_write(&fi->i_gc_rwsem[WRITE]); + err = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_PRECACHE); + up_write(&fi->i_gc_rwsem[WRITE]); + if (err) + return err; + + map.m_lblk = m_next_extent; + } + + return err; +} + +static int f2fs_ioc_precache_extents(struct file *filp, unsigned long arg) +{ + return f2fs_precache_extents(file_inode(filp)); +} + long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) { + if (unlikely(f2fs_cp_error(F2FS_I_SB(file_inode(filp))))) + return -EIO; + switch (cmd) { case F2FS_IOC_GETFLAGS: return f2fs_ioc_getflags(filp, arg); @@ -2236,12 +2741,24 @@ long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) return f2fs_ioc_get_encryption_pwsalt(filp, arg); case F2FS_IOC_GARBAGE_COLLECT: return f2fs_ioc_gc(filp, arg); + case F2FS_IOC_GARBAGE_COLLECT_RANGE: + return f2fs_ioc_gc_range(filp, arg); case F2FS_IOC_WRITE_CHECKPOINT: return f2fs_ioc_write_checkpoint(filp, arg); case F2FS_IOC_DEFRAGMENT: return f2fs_ioc_defragment(filp, arg); case F2FS_IOC_MOVE_RANGE: return f2fs_ioc_move_range(filp, arg); + case F2FS_IOC_FLUSH_DEVICE: + return f2fs_ioc_flush_device(filp, arg); + case F2FS_IOC_GET_FEATURES: + return f2fs_ioc_get_features(filp, arg); + case F2FS_IOC_GET_PIN_FILE: + return f2fs_ioc_get_pin_file(filp, arg); + case F2FS_IOC_SET_PIN_FILE: + return f2fs_ioc_set_pin_file(filp, arg); + case F2FS_IOC_PRECACHE_EXTENTS: + return f2fs_ioc_precache_extents(filp, arg); default: return -ENOTTY; } @@ -2251,23 +2768,61 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) { struct file *file = iocb->ki_filp; struct inode *inode = file_inode(file); - struct blk_plug plug; ssize_t ret; - if (f2fs_encrypted_inode(inode) && - !fscrypt_has_encryption_key(inode) && - fscrypt_get_encryption_info(inode)) - return -EACCES; + if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) + return -EIO; - inode_lock(inode); + if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT)) + return -EINVAL; + + if (!inode_trylock(inode)) { + if (iocb->ki_flags & IOCB_NOWAIT) + return -EAGAIN; + inode_lock(inode); + } + ret = generic_write_checks(iocb, from); if (ret > 0) { - ret = f2fs_preallocate_blocks(iocb, from); - if (!ret) { - blk_start_plug(&plug); - ret = __generic_file_write_iter(iocb, from); - blk_finish_plug(&plug); + bool preallocated = false; + size_t target_size = 0; + int err; + + if (iov_iter_fault_in_readable(from, iov_iter_count(from))) + set_inode_flag(inode, FI_NO_PREALLOC); + + if ((iocb->ki_flags & IOCB_NOWAIT) && + (iocb->ki_flags & IOCB_DIRECT)) { + if (!f2fs_overwrite_io(inode, iocb->ki_pos, + iov_iter_count(from)) || + f2fs_has_inline_data(inode) || + f2fs_force_buffered_io(inode, WRITE)) { + clear_inode_flag(inode, + FI_NO_PREALLOC); + inode_unlock(inode); + return -EAGAIN; + } + + } else { + preallocated = true; + target_size = iocb->ki_pos + iov_iter_count(from); + + err = f2fs_preallocate_blocks(iocb, from); + if (err) { + clear_inode_flag(inode, FI_NO_PREALLOC); + inode_unlock(inode); + return err; + } } + ret = __generic_file_write_iter(iocb, from); + clear_inode_flag(inode, FI_NO_PREALLOC); + + /* if we couldn't write data, we should deallocate blocks. */ + if (preallocated && i_size_read(inode) < target_size) + f2fs_truncate(inode); + + if (ret > 0) + f2fs_update_iostat(F2FS_I_SB(inode), APP_WRITE_IO, ret); } inode_unlock(inode); @@ -2299,10 +2854,15 @@ long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg) case F2FS_IOC_GET_ENCRYPTION_PWSALT: case F2FS_IOC_GET_ENCRYPTION_POLICY: case F2FS_IOC_GARBAGE_COLLECT: + case F2FS_IOC_GARBAGE_COLLECT_RANGE: case F2FS_IOC_WRITE_CHECKPOINT: case F2FS_IOC_DEFRAGMENT: - break; case F2FS_IOC_MOVE_RANGE: + case F2FS_IOC_FLUSH_DEVICE: + case F2FS_IOC_GET_FEATURES: + case F2FS_IOC_GET_PIN_FILE: + case F2FS_IOC_SET_PIN_FILE: + case F2FS_IOC_PRECACHE_EXTENTS: break; default: return -ENOIOCTLCMD; @@ -2318,6 +2878,7 @@ const struct file_operations f2fs_file_operations = { .open = f2fs_file_open, .release = f2fs_release_file, .mmap = f2fs_file_mmap, + .flush = f2fs_file_flush, .fsync = f2fs_sync_file, .fallocate = f2fs_fallocate, .unlocked_ioctl = f2fs_ioctl,
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index 759056e..c6322ef 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c
@@ -28,17 +28,23 @@ static int gc_thread_func(void *data) struct f2fs_sb_info *sbi = data; struct f2fs_gc_kthread *gc_th = sbi->gc_thread; wait_queue_head_t *wq = &sbi->gc_thread->gc_wait_queue_head; - long wait_ms; + unsigned int wait_ms; wait_ms = gc_th->min_sleep_time; + set_freezable(); do { + wait_event_interruptible_timeout(*wq, + kthread_should_stop() || freezing(current) || + gc_th->gc_wake, + msecs_to_jiffies(wait_ms)); + + /* give it a try one time */ + if (gc_th->gc_wake) + gc_th->gc_wake = 0; + if (try_to_freeze()) continue; - else - wait_event_interruptible_timeout(*wq, - kthread_should_stop(), - msecs_to_jiffies(wait_ms)); if (kthread_should_stop()) break; @@ -47,10 +53,13 @@ static int gc_thread_func(void *data) continue; } -#ifdef CONFIG_F2FS_FAULT_INJECTION - if (time_to_inject(sbi, FAULT_CHECKPOINT)) + if (time_to_inject(sbi, FAULT_CHECKPOINT)) { + f2fs_show_injection_info(FAULT_CHECKPOINT); f2fs_stop_checkpoint(sbi, false); -#endif + } + + if (!sb_start_write_trylock(sbi->sb)) + continue; /* * [GC triggering condition] @@ -65,24 +74,30 @@ static int gc_thread_func(void *data) * invalidated soon after by user update or deletion. * So, I'd like to wait some time to collect dirty segments. */ + if (sbi->gc_mode == GC_URGENT) { + wait_ms = gc_th->urgent_sleep_time; + mutex_lock(&sbi->gc_mutex); + goto do_gc; + } + if (!mutex_trylock(&sbi->gc_mutex)) - continue; + goto next; if (!is_idle(sbi)) { increase_sleep_time(gc_th, &wait_ms); mutex_unlock(&sbi->gc_mutex); - continue; + goto next; } if (has_enough_invalid_blocks(sbi)) decrease_sleep_time(gc_th, &wait_ms); else increase_sleep_time(gc_th, &wait_ms); - +do_gc: stat_inc_bggc_count(sbi); /* if return value is not zero, no victim was selected */ - if (f2fs_gc(sbi, test_opt(sbi, FORCE_FG_GC))) + if (f2fs_gc(sbi, test_opt(sbi, FORCE_FG_GC), true, NULL_SEGNO)) wait_ms = gc_th->no_gc_sleep_time; trace_f2fs_background_gc(sbi->sb, wait_ms, @@ -90,12 +105,14 @@ static int gc_thread_func(void *data) /* balancing f2fs's metadata periodically */ f2fs_balance_fs_bg(sbi); +next: + sb_end_write(sbi->sb); } while (!kthread_should_stop()); return 0; } -int start_gc_thread(struct f2fs_sb_info *sbi) +int f2fs_start_gc_thread(struct f2fs_sb_info *sbi) { struct f2fs_gc_kthread *gc_th; dev_t dev = sbi->sb->s_bdev->bd_dev; @@ -107,11 +124,12 @@ int start_gc_thread(struct f2fs_sb_info *sbi) goto out; } + gc_th->urgent_sleep_time = DEF_GC_THREAD_URGENT_SLEEP_TIME; gc_th->min_sleep_time = DEF_GC_THREAD_MIN_SLEEP_TIME; gc_th->max_sleep_time = DEF_GC_THREAD_MAX_SLEEP_TIME; gc_th->no_gc_sleep_time = DEF_GC_THREAD_NOGC_SLEEP_TIME; - gc_th->gc_idle = 0; + gc_th->gc_wake= 0; sbi->gc_thread = gc_th; init_waitqueue_head(&sbi->gc_thread->gc_wait_queue_head); @@ -126,7 +144,7 @@ int start_gc_thread(struct f2fs_sb_info *sbi) return err; } -void stop_gc_thread(struct f2fs_sb_info *sbi) +void f2fs_stop_gc_thread(struct f2fs_sb_info *sbi) { struct f2fs_gc_kthread *gc_th = sbi->gc_thread; if (!gc_th) @@ -136,15 +154,18 @@ void stop_gc_thread(struct f2fs_sb_info *sbi) sbi->gc_thread = NULL; } -static int select_gc_type(struct f2fs_gc_kthread *gc_th, int gc_type) +static int select_gc_type(struct f2fs_sb_info *sbi, int gc_type) { int gc_mode = (gc_type == BG_GC) ? GC_CB : GC_GREEDY; - if (gc_th && gc_th->gc_idle) { - if (gc_th->gc_idle == 1) - gc_mode = GC_CB; - else if (gc_th->gc_idle == 2) - gc_mode = GC_GREEDY; + switch (sbi->gc_mode) { + case GC_IDLE_CB: + gc_mode = GC_CB; + break; + case GC_IDLE_GREEDY: + case GC_URGENT: + gc_mode = GC_GREEDY; + break; } return gc_mode; } @@ -160,17 +181,24 @@ static void select_policy(struct f2fs_sb_info *sbi, int gc_type, p->max_search = dirty_i->nr_dirty[type]; p->ofs_unit = 1; } else { - p->gc_mode = select_gc_type(sbi->gc_thread, gc_type); + p->gc_mode = select_gc_type(sbi, gc_type); p->dirty_segmap = dirty_i->dirty_segmap[DIRTY]; p->max_search = dirty_i->nr_dirty[DIRTY]; p->ofs_unit = sbi->segs_per_sec; } /* we need to check every dirty segments in the FG_GC case */ - if (gc_type != FG_GC && p->max_search > sbi->max_victim_search) + if (gc_type != FG_GC && + (sbi->gc_mode != GC_URGENT) && + p->max_search > sbi->max_victim_search) p->max_search = sbi->max_victim_search; - p->offset = sbi->last_victim[p->gc_mode]; + /* let's select beginning hot/small space first in no_heap mode*/ + if (test_opt(sbi, NOHEAP) && + (type == CURSEG_HOT_DATA || IS_NODESEG(type))) + p->offset = 0; + else + p->offset = SIT_I(sbi)->last_victim[p->gc_mode]; } static unsigned int get_max_cost(struct f2fs_sb_info *sbi, @@ -180,7 +208,7 @@ static unsigned int get_max_cost(struct f2fs_sb_info *sbi, if (p->alloc_mode == SSR) return sbi->blocks_per_seg; if (p->gc_mode == GC_GREEDY) - return sbi->blocks_per_seg * p->ofs_unit; + return 2 * sbi->blocks_per_seg * p->ofs_unit; else if (p->gc_mode == GC_CB) return UINT_MAX; else /* No other gc_mode */ @@ -200,12 +228,8 @@ static unsigned int check_bg_victims(struct f2fs_sb_info *sbi) for_each_set_bit(secno, dirty_i->victim_secmap, MAIN_SECS(sbi)) { if (sec_usage_check(sbi, secno)) continue; - - if (no_fggc_candidate(sbi, secno)) - continue; - clear_bit(secno, dirty_i->victim_secmap); - return secno * sbi->segs_per_sec; + return GET_SEG_FROM_SEC(sbi, secno); } return NULL_SEGNO; } @@ -213,8 +237,8 @@ static unsigned int check_bg_victims(struct f2fs_sb_info *sbi) static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, unsigned int segno) { struct sit_info *sit_i = SIT_I(sbi); - unsigned int secno = GET_SECNO(sbi, segno); - unsigned int start = secno * sbi->segs_per_sec; + unsigned int secno = GET_SEC_FROM_SEG(sbi, segno); + unsigned int start = GET_SEG_FROM_SEC(sbi, secno); unsigned long long mtime = 0; unsigned int vblocks; unsigned char age = 0; @@ -223,7 +247,7 @@ static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, unsigned int segno) for (i = 0; i < sbi->segs_per_sec; i++) mtime += get_seg_entry(sbi, start + i)->mtime; - vblocks = get_valid_blocks(sbi, segno, sbi->segs_per_sec); + vblocks = get_valid_blocks(sbi, segno, true); mtime = div_u64(mtime, sbi->segs_per_sec); vblocks = div_u64(vblocks, sbi->segs_per_sec); @@ -250,7 +274,7 @@ static inline unsigned int get_gc_cost(struct f2fs_sb_info *sbi, /* alloc_mode == LFS */ if (p->gc_mode == GC_GREEDY) - return get_valid_blocks(sbi, segno, sbi->segs_per_sec); + return get_valid_blocks(sbi, segno, true); else return get_cb_cost(sbi, segno); } @@ -279,6 +303,7 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, unsigned int *result, int gc_type, int type, char alloc_mode) { struct dirty_seglist_info *dirty_i = DIRTY_I(sbi); + struct sit_info *sm = SIT_I(sbi); struct victim_sel_policy p; unsigned int secno, last_victim; unsigned int last_segment = MAIN_SEGS(sbi); @@ -292,10 +317,18 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, p.min_segno = NULL_SEGNO; p.min_cost = get_max_cost(sbi, &p); + if (*result != NULL_SEGNO) { + if (IS_DATASEG(get_seg_entry(sbi, *result)->type) && + get_valid_blocks(sbi, *result, false) && + !sec_usage_check(sbi, GET_SEC_FROM_SEG(sbi, *result))) + p.min_segno = *result; + goto out; + } + if (p.max_search == 0) goto out; - last_victim = sbi->last_victim[p.gc_mode]; + last_victim = sm->last_victim[p.gc_mode]; if (p.alloc_mode == LFS && gc_type == FG_GC) { p.min_segno = check_bg_victims(sbi); if (p.min_segno != NULL_SEGNO) @@ -308,9 +341,10 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, segno = find_next_bit(p.dirty_segmap, last_segment, p.offset); if (segno >= last_segment) { - if (sbi->last_victim[p.gc_mode]) { - last_segment = sbi->last_victim[p.gc_mode]; - sbi->last_victim[p.gc_mode] = 0; + if (sm->last_victim[p.gc_mode]) { + last_segment = + sm->last_victim[p.gc_mode]; + sm->last_victim[p.gc_mode] = 0; p.offset = 0; continue; } @@ -327,15 +361,12 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, nsearched++; } - secno = GET_SECNO(sbi, segno); + secno = GET_SEC_FROM_SEG(sbi, segno); if (sec_usage_check(sbi, secno)) goto next; if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap)) goto next; - if (gc_type == FG_GC && p.alloc_mode == LFS && - no_fggc_candidate(sbi, secno)) - goto next; cost = get_gc_cost(sbi, segno, &p); @@ -345,17 +376,18 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, } next: if (nsearched >= p.max_search) { - if (!sbi->last_victim[p.gc_mode] && segno <= last_victim) - sbi->last_victim[p.gc_mode] = last_victim + 1; + if (!sm->last_victim[p.gc_mode] && segno <= last_victim) + sm->last_victim[p.gc_mode] = last_victim + 1; else - sbi->last_victim[p.gc_mode] = segno + 1; + sm->last_victim[p.gc_mode] = segno + 1; + sm->last_victim[p.gc_mode] %= MAIN_SEGS(sbi); break; } } if (p.min_segno != NULL_SEGNO) { got_it: if (p.alloc_mode == LFS) { - secno = GET_SECNO(sbi, p.min_segno); + secno = GET_SEC_FROM_SEG(sbi, p.min_segno); if (gc_type == FG_GC) sbi->cur_victim_sec = secno; else @@ -395,7 +427,7 @@ static void add_gc_inode(struct gc_inode_list *gc_list, struct inode *inode) iput(inode); return; } - new_ie = f2fs_kmem_cache_alloc(inode_entry_slab, GFP_NOFS); + new_ie = f2fs_kmem_cache_alloc(f2fs_inode_entry_slab, GFP_NOFS); new_ie->inode = inode; f2fs_radix_tree_insert(&gc_list->iroot, inode->i_ino, new_ie); @@ -409,7 +441,7 @@ static void put_gc_inode(struct gc_inode_list *gc_list) radix_tree_delete(&gc_list->iroot, ie->inode->i_ino); iput(ie->inode); list_del(&ie->list); - kmem_cache_free(inode_entry_slab, ie); + kmem_cache_free(f2fs_inode_entry_slab, ie); } } @@ -420,10 +452,10 @@ static int check_valid_map(struct f2fs_sb_info *sbi, struct seg_entry *sentry; int ret; - mutex_lock(&sit_i->sentry_lock); + down_read(&sit_i->sentry_lock); sentry = get_seg_entry(sbi, segno); ret = f2fs_test_bit(offset, sentry->cur_valid_map); - mutex_unlock(&sit_i->sentry_lock); + up_read(&sit_i->sentry_lock); return ret; } @@ -439,12 +471,16 @@ static void gc_node_segment(struct f2fs_sb_info *sbi, block_t start_addr; int off; int phase = 0; + bool fggc = (gc_type == FG_GC); start_addr = START_BLOCK(sbi, segno); next_step: entry = sum; + if (fggc && phase == 2) + atomic_inc(&sbi->wb_sync_req[NODE]); + for (off = 0; off < sbi->blocks_per_seg; off++, entry++) { nid_t nid = le32_to_cpu(entry->nid); struct page *node_page; @@ -458,39 +494,46 @@ static void gc_node_segment(struct f2fs_sb_info *sbi, continue; if (phase == 0) { - ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), 1, + f2fs_ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), 1, META_NAT, true); continue; } if (phase == 1) { - ra_node_page(sbi, nid); + f2fs_ra_node_page(sbi, nid); continue; } /* phase == 2 */ - node_page = get_node_page(sbi, nid); + node_page = f2fs_get_node_page(sbi, nid); if (IS_ERR(node_page)) continue; - /* block may become invalid during get_node_page */ + /* block may become invalid during f2fs_get_node_page */ if (check_valid_map(sbi, segno, off) == 0) { f2fs_put_page(node_page, 1); continue; } - get_node_info(sbi, nid, &ni); + if (f2fs_get_node_info(sbi, nid, &ni)) { + f2fs_put_page(node_page, 1); + continue; + } + if (ni.blk_addr != start_addr + off) { f2fs_put_page(node_page, 1); continue; } - move_node_page(node_page, gc_type); + f2fs_move_node_page(node_page, gc_type); stat_inc_node_blk_count(sbi, 1, gc_type); } if (++phase < 3) goto next_step; + + if (fggc) + atomic_dec(&sbi->wb_sync_req[NODE]); } /* @@ -500,7 +543,7 @@ static void gc_node_segment(struct f2fs_sb_info *sbi, * as indirect or double indirect node blocks, are given, it must be a caller's * bug. */ -block_t start_bidx_of_node(unsigned int node_ofs, struct inode *inode) +block_t f2fs_start_bidx_of_node(unsigned int node_ofs, struct inode *inode) { unsigned int indirect_blks = 2 * NIDS_PER_BLOCK + 4; unsigned int bidx; @@ -531,11 +574,14 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, nid = le32_to_cpu(sum->nid); ofs_in_node = le16_to_cpu(sum->ofs_in_node); - node_page = get_node_page(sbi, nid); + node_page = f2fs_get_node_page(sbi, nid); if (IS_ERR(node_page)) return false; - get_node_info(sbi, nid, dni); + if (f2fs_get_node_info(sbi, nid, dni)) { + f2fs_put_page(node_page, 1); + return false; + } if (sum->version != dni->version) { f2fs_msg(sbi->sb, KERN_WARNING, @@ -545,7 +591,7 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, } *nofs = ofs_of_node(node_page); - source_blkaddr = datablock_addr(node_page, ofs_in_node); + source_blkaddr = datablock_addr(NULL, node_page, ofs_in_node); f2fs_put_page(node_page, 1); if (source_blkaddr != blkaddr) @@ -553,29 +599,119 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, return true; } -static void move_encrypted_block(struct inode *inode, block_t bidx) +static int ra_data_block(struct inode *inode, pgoff_t index) +{ + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + struct address_space *mapping = inode->i_mapping; + struct dnode_of_data dn; + struct page *page; + struct extent_info ei = {0, 0, 0}; + struct f2fs_io_info fio = { + .sbi = sbi, + .ino = inode->i_ino, + .type = DATA, + .temp = COLD, + .op = REQ_OP_READ, + .op_flags = 0, + .encrypted_page = NULL, + .in_list = false, + .retry = false, + }; + int err; + + page = f2fs_grab_cache_page(mapping, index, true); + if (!page) + return -ENOMEM; + + if (f2fs_lookup_extent_cache(inode, index, &ei)) { + dn.data_blkaddr = ei.blk + index - ei.fofs; + goto got_it; + } + + set_new_dnode(&dn, inode, NULL, NULL, 0); + err = f2fs_get_dnode_of_data(&dn, index, LOOKUP_NODE); + if (err) + goto put_page; + f2fs_put_dnode(&dn); + + if (unlikely(!f2fs_is_valid_blkaddr(sbi, dn.data_blkaddr, + DATA_GENERIC))) { + err = -EFAULT; + goto put_page; + } +got_it: + /* read page */ + fio.page = page; + fio.new_blkaddr = fio.old_blkaddr = dn.data_blkaddr; + + fio.encrypted_page = f2fs_pagecache_get_page(META_MAPPING(sbi), + dn.data_blkaddr, + FGP_LOCK | FGP_CREAT, GFP_NOFS); + if (!fio.encrypted_page) { + err = -ENOMEM; + goto put_page; + } + + err = f2fs_submit_page_bio(&fio); + if (err) + goto put_encrypted_page; + f2fs_put_page(fio.encrypted_page, 0); + f2fs_put_page(page, 1); + return 0; +put_encrypted_page: + f2fs_put_page(fio.encrypted_page, 1); +put_page: + f2fs_put_page(page, 1); + return err; +} + +/* + * Move data block via META_MAPPING while keeping locked data page. + * This can be used to move blocks, aka LBAs, directly on disk. + */ +static void move_data_block(struct inode *inode, block_t bidx, + int gc_type, unsigned int segno, int off) { struct f2fs_io_info fio = { .sbi = F2FS_I_SB(inode), + .ino = inode->i_ino, .type = DATA, + .temp = COLD, .op = REQ_OP_READ, - .op_flags = READ_SYNC, + .op_flags = 0, .encrypted_page = NULL, + .in_list = false, + .retry = false, }; struct dnode_of_data dn; struct f2fs_summary sum; struct node_info ni; - struct page *page; + struct page *page, *mpage; block_t newaddr; int err; + bool lfs_mode = test_opt(fio.sbi, LFS); /* do not read out */ page = f2fs_grab_cache_page(inode->i_mapping, bidx, false); if (!page) return; + if (!check_valid_map(F2FS_I_SB(inode), segno, off)) + goto out; + + if (f2fs_is_atomic_file(inode)) { + F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC]++; + F2FS_I_SB(inode)->skipped_atomic_files[gc_type]++; + goto out; + } + + if (f2fs_is_pinned_file(inode)) { + f2fs_pin_file_control(inode, true); + goto out; + } + set_new_dnode(&dn, inode, NULL, NULL, 0); - err = get_dnode_of_data(&dn, bidx, LOOKUP_NODE); + err = f2fs_get_dnode_of_data(&dn, bidx, LOOKUP_NODE); if (err) goto out; @@ -590,23 +726,46 @@ static void move_encrypted_block(struct inode *inode, block_t bidx) */ f2fs_wait_on_page_writeback(page, DATA, true); - get_node_info(fio.sbi, dn.nid, &ni); + err = f2fs_get_node_info(fio.sbi, dn.nid, &ni); + if (err) + goto put_out; + set_summary(&sum, dn.nid, dn.ofs_in_node, ni.version); /* read page */ fio.page = page; fio.new_blkaddr = fio.old_blkaddr = dn.data_blkaddr; - allocate_data_block(fio.sbi, NULL, fio.old_blkaddr, &newaddr, - &sum, CURSEG_COLD_DATA); + if (lfs_mode) + down_write(&fio.sbi->io_order_lock); - fio.encrypted_page = pagecache_get_page(META_MAPPING(fio.sbi), newaddr, - FGP_LOCK | FGP_CREAT, GFP_NOFS); + f2fs_allocate_data_block(fio.sbi, NULL, fio.old_blkaddr, &newaddr, + &sum, CURSEG_COLD_DATA, NULL, false); + + fio.encrypted_page = f2fs_pagecache_get_page(META_MAPPING(fio.sbi), + newaddr, FGP_LOCK | FGP_CREAT, GFP_NOFS); if (!fio.encrypted_page) { err = -ENOMEM; goto recover_block; } + mpage = f2fs_pagecache_get_page(META_MAPPING(fio.sbi), + fio.old_blkaddr, FGP_LOCK, GFP_NOFS); + if (mpage) { + bool updated = false; + + if (PageUptodate(mpage)) { + memcpy(page_address(fio.encrypted_page), + page_address(mpage), PAGE_SIZE); + updated = true; + } + f2fs_put_page(mpage, 1); + invalidate_mapping_pages(META_MAPPING(fio.sbi), + fio.old_blkaddr, fio.old_blkaddr); + if (updated) + goto write_page; + } + err = f2fs_submit_page_bio(&fio); if (err) goto put_page_out; @@ -623,20 +782,29 @@ static void move_encrypted_block(struct inode *inode, block_t bidx) goto put_page_out; } +write_page: set_page_dirty(fio.encrypted_page); f2fs_wait_on_page_writeback(fio.encrypted_page, DATA, true); if (clear_page_dirty_for_io(fio.encrypted_page)) dec_page_count(fio.sbi, F2FS_DIRTY_META); set_page_writeback(fio.encrypted_page); + ClearPageError(page); /* allocate block address */ f2fs_wait_on_page_writeback(dn.node_page, NODE, true); fio.op = REQ_OP_WRITE; - fio.op_flags = WRITE_SYNC; + fio.op_flags = REQ_SYNC; fio.new_blkaddr = newaddr; - f2fs_submit_page_mbio(&fio); + f2fs_submit_page_write(&fio); + if (fio.retry) { + if (PageWriteback(fio.encrypted_page)) + end_page_writeback(fio.encrypted_page); + goto put_page_out; + } + + f2fs_update_iostat(fio.sbi, FS_GC_DATA_IO, F2FS_BLKSIZE); f2fs_update_data_blkaddr(&dn, newaddr); set_inode_flag(inode, FI_APPEND_WRITE); @@ -645,8 +813,10 @@ static void move_encrypted_block(struct inode *inode, block_t bidx) put_page_out: f2fs_put_page(fio.encrypted_page, 1); recover_block: + if (lfs_mode) + up_write(&fio.sbi->io_order_lock); if (err) - __f2fs_replace_block(fio.sbi, &sum, newaddr, fio.old_blkaddr, + f2fs_do_replace_block(fio.sbi, &sum, newaddr, fio.old_blkaddr, true, true); put_out: f2fs_put_dnode(&dn); @@ -654,14 +824,29 @@ static void move_encrypted_block(struct inode *inode, block_t bidx) f2fs_put_page(page, 1); } -static void move_data_page(struct inode *inode, block_t bidx, int gc_type) +static void move_data_page(struct inode *inode, block_t bidx, int gc_type, + unsigned int segno, int off) { struct page *page; - page = get_lock_data_page(inode, bidx, true); + page = f2fs_get_lock_data_page(inode, bidx, true); if (IS_ERR(page)) return; + if (!check_valid_map(F2FS_I_SB(inode), segno, off)) + goto out; + + if (f2fs_is_atomic_file(inode)) { + F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC]++; + F2FS_I_SB(inode)->skipped_atomic_files[gc_type]++; + goto out; + } + if (f2fs_is_pinned_file(inode)) { + if (gc_type == FG_GC) + f2fs_pin_file_control(inode, true); + goto out; + } + if (gc_type == BG_GC) { if (PageWriteback(page)) goto out; @@ -670,11 +855,16 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type) } else { struct f2fs_io_info fio = { .sbi = F2FS_I_SB(inode), + .ino = inode->i_ino, .type = DATA, + .temp = COLD, .op = REQ_OP_WRITE, - .op_flags = WRITE_SYNC, + .op_flags = REQ_SYNC, + .old_blkaddr = NULL_ADDR, .page = page, .encrypted_page = NULL, + .need_lock = LOCK_REQ, + .io_type = FS_GC_DATA_IO, }; bool is_dirty = PageDirty(page); int err; @@ -682,12 +872,14 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type) retry: set_page_dirty(page); f2fs_wait_on_page_writeback(page, DATA, true); - if (clear_page_dirty_for_io(page)) + if (clear_page_dirty_for_io(page)) { inode_dec_dirty_pages(inode); + f2fs_remove_dirty_inode(inode); + } set_cold_data(page); - err = do_write_data_page(&fio); + err = f2fs_do_write_data_page(&fio); if (err) { clear_cold_data(page); if (err == -ENOMEM) { @@ -697,8 +889,6 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type) if (is_dirty) set_page_dirty(page); } - - clear_cold_data(page); } out: f2fs_put_page(page, 1); @@ -741,13 +931,13 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, continue; if (phase == 0) { - ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), 1, + f2fs_ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), 1, META_NAT, true); continue; } if (phase == 1) { - ra_node_page(sbi, nid); + f2fs_ra_node_page(sbi, nid); continue; } @@ -756,7 +946,7 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, continue; if (phase == 2) { - ra_node_page(sbi, dni.ino); + f2fs_ra_node_page(sbi, dni.ino); continue; } @@ -767,17 +957,31 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, if (IS_ERR(inode) || is_bad_inode(inode)) continue; - /* if encrypted inode, let's go phase 3 */ - if (f2fs_encrypted_inode(inode) && - S_ISREG(inode->i_mode)) { + if (!down_write_trylock( + &F2FS_I(inode)->i_gc_rwsem[WRITE])) { + iput(inode); + sbi->skipped_gc_rwsem++; + continue; + } + + start_bidx = f2fs_start_bidx_of_node(nofs, inode) + + ofs_in_node; + + if (f2fs_post_read_required(inode)) { + int err = ra_data_block(inode, start_bidx); + + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + if (err) { + iput(inode); + continue; + } add_gc_inode(gc_list, inode); continue; } - start_bidx = start_bidx_of_node(nofs, inode); - data_page = get_read_data_page(inode, - start_bidx + ofs_in_node, REQ_RAHEAD, - true); + data_page = f2fs_get_read_data_page(inode, + start_bidx, REQ_RAHEAD, true); + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); if (IS_ERR(data_page)) { iput(inode); continue; @@ -795,26 +999,32 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, bool locked = false; if (S_ISREG(inode->i_mode)) { - if (!down_write_trylock(&fi->dio_rwsem[READ])) + if (!down_write_trylock(&fi->i_gc_rwsem[READ])) continue; if (!down_write_trylock( - &fi->dio_rwsem[WRITE])) { - up_write(&fi->dio_rwsem[READ]); + &fi->i_gc_rwsem[WRITE])) { + sbi->skipped_gc_rwsem++; + up_write(&fi->i_gc_rwsem[READ]); continue; } locked = true; + + /* wait for all inflight aio data */ + inode_dio_wait(inode); } - start_bidx = start_bidx_of_node(nofs, inode) + start_bidx = f2fs_start_bidx_of_node(nofs, inode) + ofs_in_node; - if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode)) - move_encrypted_block(inode, start_bidx); + if (f2fs_post_read_required(inode)) + move_data_block(inode, start_bidx, gc_type, + segno, off); else - move_data_page(inode, start_bidx, gc_type); + move_data_page(inode, start_bidx, gc_type, + segno, off); if (locked) { - up_write(&fi->dio_rwsem[WRITE]); - up_write(&fi->dio_rwsem[READ]); + up_write(&fi->i_gc_rwsem[WRITE]); + up_write(&fi->i_gc_rwsem[READ]); } stat_inc_data_blk_count(sbi, 1, gc_type); @@ -831,10 +1041,10 @@ static int __get_victim(struct f2fs_sb_info *sbi, unsigned int *victim, struct sit_info *sit_i = SIT_I(sbi); int ret; - mutex_lock(&sit_i->sentry_lock); + down_write(&sit_i->sentry_lock); ret = DIRTY_I(sbi)->v_ops->get_victim(sbi, victim, gc_type, NO_CHECK_TYPE, LFS); - mutex_unlock(&sit_i->sentry_lock); + up_write(&sit_i->sentry_lock); return ret; } @@ -847,18 +1057,18 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi, struct blk_plug plug; unsigned int segno = start_segno; unsigned int end_segno = start_segno + sbi->segs_per_sec; - int sec_freed = 0; + int seg_freed = 0; unsigned char type = IS_DATASEG(get_seg_entry(sbi, segno)->type) ? SUM_TYPE_DATA : SUM_TYPE_NODE; /* readahead multi ssa blocks those have contiguous address */ if (sbi->segs_per_sec > 1) - ra_meta_pages(sbi, GET_SUM_BLOCK(sbi, segno), + f2fs_ra_meta_pages(sbi, GET_SUM_BLOCK(sbi, segno), sbi->segs_per_sec, META_SSA, true); /* reference all summary page */ while (segno < end_segno) { - sum_page = get_sum_page(sbi, segno++); + sum_page = f2fs_get_sum_page(sbi, segno++); unlock_page(sum_page); } @@ -871,7 +1081,7 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi, GET_SUM_BLOCK(sbi, segno)); f2fs_put_page(sum_page, 0); - if (get_valid_blocks(sbi, segno, 1) == 0 || + if (get_valid_blocks(sbi, segno, false) == 0 || !PageUptodate(sum_page) || unlikely(f2fs_cp_error(sbi))) goto next; @@ -888,11 +1098,10 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi, /* * this is to avoid deadlock: * - lock_page(sum_page) - f2fs_replace_block - * - check_valid_map() - mutex_lock(sentry_lock) - * - mutex_lock(sentry_lock) - change_curseg() + * - check_valid_map() - down_write(sentry_lock) + * - down_read(sentry_lock) - change_curseg() * - lock_page(sum_page) */ - if (type == SUM_TYPE_NODE) gc_node_segment(sbi, sum->entries, segno, gc_type); else @@ -900,87 +1109,137 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi, gc_type); stat_inc_seg_count(sbi, type, gc_type); + + if (gc_type == FG_GC && + get_valid_blocks(sbi, segno, false) == 0) + seg_freed++; next: f2fs_put_page(sum_page, 0); } if (gc_type == FG_GC) - f2fs_submit_merged_bio(sbi, - (type == SUM_TYPE_NODE) ? NODE : DATA, WRITE); + f2fs_submit_merged_write(sbi, + (type == SUM_TYPE_NODE) ? NODE : DATA); blk_finish_plug(&plug); - if (gc_type == FG_GC && - get_valid_blocks(sbi, start_segno, sbi->segs_per_sec) == 0) - sec_freed = 1; - stat_inc_call_count(sbi->stat_info); - return sec_freed; + return seg_freed; } -int f2fs_gc(struct f2fs_sb_info *sbi, bool sync) +int f2fs_gc(struct f2fs_sb_info *sbi, bool sync, + bool background, unsigned int segno) { - unsigned int segno; int gc_type = sync ? FG_GC : BG_GC; - int sec_freed = 0; - int ret = -EINVAL; + int sec_freed = 0, seg_freed = 0, total_freed = 0; + int ret = 0; struct cp_control cpc; + unsigned int init_segno = segno; struct gc_inode_list gc_list = { .ilist = LIST_HEAD_INIT(gc_list.ilist), .iroot = RADIX_TREE_INIT(GFP_NOFS), }; + unsigned long long last_skipped = sbi->skipped_atomic_files[FG_GC]; + unsigned long long first_skipped; + unsigned int skipped_round = 0, round = 0; + + trace_f2fs_gc_begin(sbi->sb, sync, background, + get_pages(sbi, F2FS_DIRTY_NODES), + get_pages(sbi, F2FS_DIRTY_DENTS), + get_pages(sbi, F2FS_DIRTY_IMETA), + free_sections(sbi), + free_segments(sbi), + reserved_segments(sbi), + prefree_segments(sbi)); cpc.reason = __get_cp_reason(sbi); + sbi->skipped_gc_rwsem = 0; + first_skipped = last_skipped; gc_more: - segno = NULL_SEGNO; - - if (unlikely(!(sbi->sb->s_flags & MS_ACTIVE))) + if (unlikely(!(sbi->sb->s_flags & MS_ACTIVE))) { + ret = -EINVAL; goto stop; + } if (unlikely(f2fs_cp_error(sbi))) { ret = -EIO; goto stop; } - if (gc_type == BG_GC && has_not_enough_free_secs(sbi, sec_freed, 0)) { - gc_type = FG_GC; + if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0, 0)) { /* - * If there is no victim and no prefree segment but still not - * enough free sections, we should flush dent/node blocks and do - * garbage collections. + * For example, if there are many prefree_segments below given + * threshold, we can make them free by checkpoint. Then, we + * secure free segments which doesn't need fggc any more. */ - if (__get_victim(sbi, &segno, gc_type) || - prefree_segments(sbi)) { - ret = write_checkpoint(sbi, &cpc); - if (ret) - goto stop; - segno = NULL_SEGNO; - } else if (has_not_enough_free_secs(sbi, 0, 0)) { - ret = write_checkpoint(sbi, &cpc); + if (prefree_segments(sbi)) { + ret = f2fs_write_checkpoint(sbi, &cpc); if (ret) goto stop; } + if (has_not_enough_free_secs(sbi, 0, 0)) + gc_type = FG_GC; } - if (segno == NULL_SEGNO && !__get_victim(sbi, &segno, gc_type)) + /* f2fs_balance_fs doesn't need to do BG_GC in critical path. */ + if (gc_type == BG_GC && !background) { + ret = -EINVAL; goto stop; - ret = 0; + } + if (!__get_victim(sbi, &segno, gc_type)) { + ret = -ENODATA; + goto stop; + } - if (do_garbage_collect(sbi, segno, &gc_list, gc_type) && - gc_type == FG_GC) + seg_freed = do_garbage_collect(sbi, segno, &gc_list, gc_type); + if (gc_type == FG_GC && seg_freed == sbi->segs_per_sec) sec_freed++; + total_freed += seg_freed; + + if (gc_type == FG_GC) { + if (sbi->skipped_atomic_files[FG_GC] > last_skipped || + sbi->skipped_gc_rwsem) + skipped_round++; + last_skipped = sbi->skipped_atomic_files[FG_GC]; + round++; + } if (gc_type == FG_GC) sbi->cur_victim_sec = NULL_SEGNO; - if (!sync) { - if (has_not_enough_free_secs(sbi, sec_freed, 0)) - goto gc_more; + if (sync) + goto stop; + if (has_not_enough_free_secs(sbi, sec_freed, 0)) { + if (skipped_round <= MAX_SKIP_GC_COUNT || + skipped_round * 2 < round) { + segno = NULL_SEGNO; + goto gc_more; + } + + if (first_skipped < last_skipped && + (last_skipped - first_skipped) > + sbi->skipped_gc_rwsem) { + f2fs_drop_inmem_pages_all(sbi, true); + segno = NULL_SEGNO; + goto gc_more; + } if (gc_type == FG_GC) - ret = write_checkpoint(sbi, &cpc); + ret = f2fs_write_checkpoint(sbi, &cpc); } stop: + SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0; + SIT_I(sbi)->last_victim[FLUSH_DEVICE] = init_segno; + + trace_f2fs_gc_end(sbi->sb, ret, total_freed, sec_freed, + get_pages(sbi, F2FS_DIRTY_NODES), + get_pages(sbi, F2FS_DIRTY_DENTS), + get_pages(sbi, F2FS_DIRTY_IMETA), + free_sections(sbi), + free_segments(sbi), + reserved_segments(sbi), + prefree_segments(sbi)); + mutex_unlock(&sbi->gc_mutex); put_gc_inode(&gc_list); @@ -990,18 +1249,14 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync) return ret; } -void build_gc_manager(struct f2fs_sb_info *sbi) +void f2fs_build_gc_manager(struct f2fs_sb_info *sbi) { - u64 main_count, resv_count, ovp_count, blocks_per_sec; - DIRTY_I(sbi)->v_ops = &default_v_ops; - /* threshold of # of valid blocks in a section for victims of FG_GC */ - main_count = SM_I(sbi)->main_segments << sbi->log_blocks_per_seg; - resv_count = SM_I(sbi)->reserved_segments << sbi->log_blocks_per_seg; - ovp_count = SM_I(sbi)->ovp_segments << sbi->log_blocks_per_seg; - blocks_per_sec = sbi->blocks_per_seg * sbi->segs_per_sec; + sbi->gc_pin_file_threshold = DEF_GC_FAILED_PINNED_FILES; - sbi->fggc_threshold = div_u64((main_count - ovp_count) * blocks_per_sec, - (main_count - resv_count)); + /* give warm/cold data area from slower device */ + if (sbi->s_ndevs && sbi->segs_per_sec == 1) + SIT_I(sbi)->last_victim[ALLOC_NEXT] = + GET_SEGNO(sbi, FDEV(0).end_blk) + 1; }
diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h index a993967..c8619e4 100644 --- a/fs/f2fs/gc.h +++ b/fs/f2fs/gc.h
@@ -13,12 +13,15 @@ * whether IO subsystem is idle * or not */ +#define DEF_GC_THREAD_URGENT_SLEEP_TIME 500 /* 500 ms */ #define DEF_GC_THREAD_MIN_SLEEP_TIME 30000 /* milliseconds */ #define DEF_GC_THREAD_MAX_SLEEP_TIME 60000 #define DEF_GC_THREAD_NOGC_SLEEP_TIME 300000 /* wait 5 min */ #define LIMIT_INVALID_BLOCK 40 /* percentage over total user space */ #define LIMIT_FREE_BLOCK 40 /* percentage over invalid + free space */ +#define DEF_GC_FAILED_PINNED_FILES 2048 + /* Search max. number of dirty segments to select a victim segment */ #define DEF_MAX_VICTIM_SEARCH 4096 /* covers 8GB */ @@ -27,12 +30,13 @@ struct f2fs_gc_kthread { wait_queue_head_t gc_wait_queue_head; /* for gc sleep time */ + unsigned int urgent_sleep_time; unsigned int min_sleep_time; unsigned int max_sleep_time; unsigned int no_gc_sleep_time; /* for changing gc mode */ - unsigned int gc_idle; + unsigned int gc_wake; }; struct gc_inode_list { @@ -65,25 +69,32 @@ static inline block_t limit_free_user_blocks(struct f2fs_sb_info *sbi) } static inline void increase_sleep_time(struct f2fs_gc_kthread *gc_th, - long *wait) + unsigned int *wait) { + unsigned int min_time = gc_th->min_sleep_time; + unsigned int max_time = gc_th->max_sleep_time; + if (*wait == gc_th->no_gc_sleep_time) return; - *wait += gc_th->min_sleep_time; - if (*wait > gc_th->max_sleep_time) - *wait = gc_th->max_sleep_time; + if ((long long)*wait + (long long)min_time > (long long)max_time) + *wait = max_time; + else + *wait += min_time; } static inline void decrease_sleep_time(struct f2fs_gc_kthread *gc_th, - long *wait) + unsigned int *wait) { + unsigned int min_time = gc_th->min_sleep_time; + if (*wait == gc_th->no_gc_sleep_time) *wait = gc_th->max_sleep_time; - *wait -= gc_th->min_sleep_time; - if (*wait <= gc_th->min_sleep_time) - *wait = gc_th->min_sleep_time; + if ((long long)*wait - (long long)min_time < (long long)min_time) + *wait = min_time; + else + *wait -= min_time; } static inline bool has_enough_invalid_blocks(struct f2fs_sb_info *sbi)
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c index 482888e..df71d26 100644 --- a/fs/f2fs/inline.c +++ b/fs/f2fs/inline.c
@@ -13,6 +13,7 @@ #include "f2fs.h" #include "node.h" +#include <trace/events/android_fs.h> bool f2fs_may_inline_data(struct inode *inode) { @@ -22,10 +23,10 @@ bool f2fs_may_inline_data(struct inode *inode) if (!S_ISREG(inode->i_mode) && !S_ISLNK(inode->i_mode)) return false; - if (i_size_read(inode) > MAX_INLINE_DATA) + if (i_size_read(inode) > MAX_INLINE_DATA(inode)) return false; - if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode)) + if (f2fs_post_read_required(inode)) return false; return true; @@ -42,8 +43,9 @@ bool f2fs_may_inline_dentry(struct inode *inode) return true; } -void read_inline_data(struct page *page, struct page *ipage) +void f2fs_do_read_inline_data(struct page *page, struct page *ipage) { + struct inode *inode = page->mapping->host; void *src_addr, *dst_addr; if (PageUptodate(page)) @@ -51,56 +53,76 @@ void read_inline_data(struct page *page, struct page *ipage) f2fs_bug_on(F2FS_P_SB(page), page->index); - zero_user_segment(page, MAX_INLINE_DATA, PAGE_SIZE); + zero_user_segment(page, MAX_INLINE_DATA(inode), PAGE_SIZE); /* Copy the whole inline data block */ - src_addr = inline_data_addr(ipage); + src_addr = inline_data_addr(inode, ipage); dst_addr = kmap_atomic(page); - memcpy(dst_addr, src_addr, MAX_INLINE_DATA); + memcpy(dst_addr, src_addr, MAX_INLINE_DATA(inode)); flush_dcache_page(page); kunmap_atomic(dst_addr); if (!PageUptodate(page)) SetPageUptodate(page); } -bool truncate_inline_inode(struct page *ipage, u64 from) +void f2fs_truncate_inline_inode(struct inode *inode, + struct page *ipage, u64 from) { void *addr; - if (from >= MAX_INLINE_DATA) - return false; + if (from >= MAX_INLINE_DATA(inode)) + return; - addr = inline_data_addr(ipage); + addr = inline_data_addr(inode, ipage); f2fs_wait_on_page_writeback(ipage, NODE, true); - memset(addr + from, 0, MAX_INLINE_DATA - from); + memset(addr + from, 0, MAX_INLINE_DATA(inode) - from); set_page_dirty(ipage); - return true; + + if (from == 0) + clear_inode_flag(inode, FI_DATA_EXIST); } int f2fs_read_inline_data(struct inode *inode, struct page *page) { struct page *ipage; - ipage = get_node_page(F2FS_I_SB(inode), inode->i_ino); + if (trace_android_fs_dataread_start_enabled()) { + char *path, pathbuf[MAX_TRACE_PATHBUF_LEN]; + + path = android_fstrace_get_pathname(pathbuf, + MAX_TRACE_PATHBUF_LEN, + inode); + trace_android_fs_dataread_start(inode, page_offset(page), + PAGE_SIZE, current->pid, + path, current->comm); + } + + ipage = f2fs_get_node_page(F2FS_I_SB(inode), inode->i_ino); if (IS_ERR(ipage)) { + trace_android_fs_dataread_end(inode, page_offset(page), + PAGE_SIZE); unlock_page(page); return PTR_ERR(ipage); } if (!f2fs_has_inline_data(inode)) { f2fs_put_page(ipage, 1); + trace_android_fs_dataread_end(inode, page_offset(page), + PAGE_SIZE); return -EAGAIN; } if (page->index) zero_user_segment(page, 0, PAGE_SIZE); else - read_inline_data(page, ipage); + f2fs_do_read_inline_data(page, ipage); if (!PageUptodate(page)) SetPageUptodate(page); f2fs_put_page(ipage, 1); + trace_android_fs_dataread_end(inode, page_offset(page), + PAGE_SIZE); unlock_page(page); return 0; } @@ -109,12 +131,15 @@ int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page) { struct f2fs_io_info fio = { .sbi = F2FS_I_SB(dn->inode), + .ino = dn->inode->i_ino, .type = DATA, .op = REQ_OP_WRITE, - .op_flags = WRITE_SYNC | REQ_PRIO, + .op_flags = REQ_SYNC | REQ_PRIO, .page = page, .encrypted_page = NULL, + .io_type = FS_DATA_IO, }; + struct node_info ni; int dirty, err; if (!f2fs_exist_data(dn->inode)) @@ -124,6 +149,14 @@ int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page) if (err) return err; + err = f2fs_get_node_info(fio.sbi, dn->nid, &ni); + if (err) { + f2fs_put_dnode(dn); + return err; + } + + fio.version = ni.version; + if (unlikely(dn->data_blkaddr != NEW_ADDR)) { f2fs_put_dnode(dn); set_sbi_flag(fio.sbi, SBI_NEED_FSCK); @@ -136,7 +169,7 @@ int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page) f2fs_bug_on(F2FS_P_SB(page), PageWriteback(page)); - read_inline_data(page, dn->inode_page); + f2fs_do_read_inline_data(page, dn->inode_page); set_page_dirty(page); /* clear dirty state */ @@ -144,21 +177,25 @@ int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page) /* write data page to try to make data consistent */ set_page_writeback(page); + ClearPageError(page); fio.old_blkaddr = dn->data_blkaddr; - write_data_page(dn, &fio); + set_inode_flag(dn->inode, FI_HOT_DATA); + f2fs_outplace_write_data(dn, &fio); f2fs_wait_on_page_writeback(page, DATA, true); - if (dirty) + if (dirty) { inode_dec_dirty_pages(dn->inode); + f2fs_remove_dirty_inode(dn->inode); + } /* this converted inline_data should be recovered. */ set_inode_flag(dn->inode, FI_APPEND_WRITE); /* clear inline data and flag after data writeback */ - truncate_inline_inode(dn->inode_page, 0); + f2fs_truncate_inline_inode(dn->inode, dn->inode_page, 0); clear_inline_node(dn->inode_page); clear_out: stat_dec_inline_inode(dn->inode); - f2fs_clear_inline_inode(dn->inode); + clear_inode_flag(dn->inode, FI_INLINE_DATA); f2fs_put_dnode(dn); return 0; } @@ -179,7 +216,7 @@ int f2fs_convert_inline_inode(struct inode *inode) f2fs_lock_op(sbi); - ipage = get_node_page(sbi, inode->i_ino); + ipage = f2fs_get_node_page(sbi, inode->i_ino); if (IS_ERR(ipage)) { err = PTR_ERR(ipage); goto out; @@ -208,7 +245,7 @@ int f2fs_write_inline_data(struct inode *inode, struct page *page) int err; set_new_dnode(&dn, inode, NULL, NULL, 0); - err = get_dnode_of_data(&dn, 0, LOOKUP_NODE); + err = f2fs_get_dnode_of_data(&dn, 0, LOOKUP_NODE); if (err) return err; @@ -221,11 +258,13 @@ int f2fs_write_inline_data(struct inode *inode, struct page *page) f2fs_wait_on_page_writeback(dn.inode_page, NODE, true); src_addr = kmap_atomic(page); - dst_addr = inline_data_addr(dn.inode_page); - memcpy(dst_addr, src_addr, MAX_INLINE_DATA); + dst_addr = inline_data_addr(inode, dn.inode_page); + memcpy(dst_addr, src_addr, MAX_INLINE_DATA(inode)); kunmap_atomic(src_addr); set_page_dirty(dn.inode_page); + f2fs_clear_radix_tree_dirty_tag(page); + set_inode_flag(inode, FI_APPEND_WRITE); set_inode_flag(inode, FI_DATA_EXIST); @@ -234,7 +273,7 @@ int f2fs_write_inline_data(struct inode *inode, struct page *page) return 0; } -bool recover_inline_data(struct inode *inode, struct page *npage) +bool f2fs_recover_inline_data(struct inode *inode, struct page *npage) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct f2fs_inode *ri = NULL; @@ -255,14 +294,14 @@ bool recover_inline_data(struct inode *inode, struct page *npage) if (f2fs_has_inline_data(inode) && ri && (ri->i_inline & F2FS_INLINE_DATA)) { process_inline: - ipage = get_node_page(sbi, inode->i_ino); + ipage = f2fs_get_node_page(sbi, inode->i_ino); f2fs_bug_on(sbi, IS_ERR(ipage)); f2fs_wait_on_page_writeback(ipage, NODE, true); - src_addr = inline_data_addr(npage); - dst_addr = inline_data_addr(ipage); - memcpy(dst_addr, src_addr, MAX_INLINE_DATA); + src_addr = inline_data_addr(inode, npage); + dst_addr = inline_data_addr(inode, ipage); + memcpy(dst_addr, src_addr, MAX_INLINE_DATA(inode)); set_inode_flag(inode, FI_INLINE_DATA); set_inode_flag(inode, FI_DATA_EXIST); @@ -273,32 +312,31 @@ bool recover_inline_data(struct inode *inode, struct page *npage) } if (f2fs_has_inline_data(inode)) { - ipage = get_node_page(sbi, inode->i_ino); + ipage = f2fs_get_node_page(sbi, inode->i_ino); f2fs_bug_on(sbi, IS_ERR(ipage)); - if (!truncate_inline_inode(ipage, 0)) - return false; - f2fs_clear_inline_inode(inode); + f2fs_truncate_inline_inode(inode, ipage, 0); + clear_inode_flag(inode, FI_INLINE_DATA); f2fs_put_page(ipage, 1); } else if (ri && (ri->i_inline & F2FS_INLINE_DATA)) { - if (truncate_blocks(inode, 0, false)) + if (f2fs_truncate_blocks(inode, 0, false)) return false; goto process_inline; } return false; } -struct f2fs_dir_entry *find_in_inline_dir(struct inode *dir, +struct f2fs_dir_entry *f2fs_find_in_inline_dir(struct inode *dir, struct fscrypt_name *fname, struct page **res_page) { struct f2fs_sb_info *sbi = F2FS_SB(dir->i_sb); - struct f2fs_inline_dentry *inline_dentry; struct qstr name = FSTR_TO_QSTR(&fname->disk_name); struct f2fs_dir_entry *de; struct f2fs_dentry_ptr d; struct page *ipage; + void *inline_dentry; f2fs_hash_t namehash; - ipage = get_node_page(sbi, dir->i_ino); + ipage = f2fs_get_node_page(sbi, dir->i_ino); if (IS_ERR(ipage)) { *res_page = ipage; return NULL; @@ -306,10 +344,10 @@ struct f2fs_dir_entry *find_in_inline_dir(struct inode *dir, namehash = f2fs_dentry_hash(&name, fname); - inline_dentry = inline_data_addr(ipage); + inline_dentry = inline_data_addr(dir, ipage); - make_dentry_ptr(NULL, &d, (void *)inline_dentry, 2); - de = find_target_dentry(fname, namehash, NULL, &d); + make_dentry_ptr_inline(dir, &d, inline_dentry); + de = f2fs_find_target_dentry(fname, namehash, NULL, &d); unlock_page(ipage); if (de) *res_page = ipage; @@ -319,22 +357,22 @@ struct f2fs_dir_entry *find_in_inline_dir(struct inode *dir, return de; } -int make_empty_inline_dir(struct inode *inode, struct inode *parent, +int f2fs_make_empty_inline_dir(struct inode *inode, struct inode *parent, struct page *ipage) { - struct f2fs_inline_dentry *dentry_blk; struct f2fs_dentry_ptr d; + void *inline_dentry; - dentry_blk = inline_data_addr(ipage); + inline_dentry = inline_data_addr(inode, ipage); - make_dentry_ptr(NULL, &d, (void *)dentry_blk, 2); - do_make_empty_dir(inode, parent, &d); + make_dentry_ptr_inline(inode, &d, inline_dentry); + f2fs_do_make_empty_dir(inode, parent, &d); set_page_dirty(ipage); /* update i_size to MAX_INLINE_DATA */ - if (i_size_read(inode) < MAX_INLINE_DATA) - f2fs_i_size_write(inode, MAX_INLINE_DATA); + if (i_size_read(inode) < MAX_INLINE_DATA(inode)) + f2fs_i_size_write(inode, MAX_INLINE_DATA(inode)); return 0; } @@ -343,11 +381,12 @@ int make_empty_inline_dir(struct inode *inode, struct inode *parent, * release ipage in this function. */ static int f2fs_move_inline_dirents(struct inode *dir, struct page *ipage, - struct f2fs_inline_dentry *inline_dentry) + void *inline_dentry) { struct page *page; struct dnode_of_data dn; struct f2fs_dentry_block *dentry_blk; + struct f2fs_dentry_ptr src, dst; int err; page = f2fs_grab_cache_page(dir->i_mapping, 0, false); @@ -373,33 +412,30 @@ static int f2fs_move_inline_dirents(struct inode *dir, struct page *ipage, } f2fs_wait_on_page_writeback(page, DATA, true); - zero_user_segment(page, MAX_INLINE_DATA, PAGE_SIZE); - dentry_blk = kmap_atomic(page); + dentry_blk = page_address(page); + + make_dentry_ptr_inline(dir, &src, inline_dentry); + make_dentry_ptr_block(dir, &dst, dentry_blk); /* copy data from inline dentry block to new dentry block */ - memcpy(dentry_blk->dentry_bitmap, inline_dentry->dentry_bitmap, - INLINE_DENTRY_BITMAP_SIZE); - memset(dentry_blk->dentry_bitmap + INLINE_DENTRY_BITMAP_SIZE, 0, - SIZE_OF_DENTRY_BITMAP - INLINE_DENTRY_BITMAP_SIZE); + memcpy(dst.bitmap, src.bitmap, src.nr_bitmap); + memset(dst.bitmap + src.nr_bitmap, 0, dst.nr_bitmap - src.nr_bitmap); /* * we do not need to zero out remainder part of dentry and filename * field, since we have used bitmap for marking the usage status of * them, besides, we can also ignore copying/zeroing reserved space * of dentry block, because them haven't been used so far. */ - memcpy(dentry_blk->dentry, inline_dentry->dentry, - sizeof(struct f2fs_dir_entry) * NR_INLINE_DENTRY); - memcpy(dentry_blk->filename, inline_dentry->filename, - NR_INLINE_DENTRY * F2FS_SLOT_LEN); + memcpy(dst.dentry, src.dentry, SIZE_OF_DIR_ENTRY * src.max); + memcpy(dst.filename, src.filename, src.max * F2FS_SLOT_LEN); - kunmap_atomic(dentry_blk); if (!PageUptodate(page)) SetPageUptodate(page); set_page_dirty(page); /* clear inline dir and flag after data writeback */ - truncate_inline_inode(ipage, 0); + f2fs_truncate_inline_inode(dir, ipage, 0); stat_dec_inline_dir(dir); clear_inode_flag(dir, FI_INLINE_DENTRY); @@ -412,14 +448,13 @@ static int f2fs_move_inline_dirents(struct inode *dir, struct page *ipage, return err; } -static int f2fs_add_inline_entries(struct inode *dir, - struct f2fs_inline_dentry *inline_dentry) +static int f2fs_add_inline_entries(struct inode *dir, void *inline_dentry) { struct f2fs_dentry_ptr d; unsigned long bit_pos = 0; int err = 0; - make_dentry_ptr(NULL, &d, (void *)inline_dentry, 2); + make_dentry_ptr_inline(dir, &d, inline_dentry); while (bit_pos < d.max) { struct f2fs_dir_entry *de; @@ -440,10 +475,10 @@ static int f2fs_add_inline_entries(struct inode *dir, } new_name.name = d.filename[bit_pos]; - new_name.len = de->name_len; + new_name.len = le16_to_cpu(de->name_len); ino = le32_to_cpu(de->ino); - fake_mode = get_de_type(de) << S_SHIFT; + fake_mode = f2fs_get_de_type(de) << S_SHIFT; err = f2fs_add_regular_entry(dir, &new_name, NULL, NULL, ino, fake_mode); @@ -455,26 +490,26 @@ static int f2fs_add_inline_entries(struct inode *dir, return 0; punch_dentry_pages: truncate_inode_pages(&dir->i_data, 0); - truncate_blocks(dir, 0, false); - remove_dirty_inode(dir); + f2fs_truncate_blocks(dir, 0, false); + f2fs_remove_dirty_inode(dir); return err; } static int f2fs_move_rehashed_dirents(struct inode *dir, struct page *ipage, - struct f2fs_inline_dentry *inline_dentry) + void *inline_dentry) { - struct f2fs_inline_dentry *backup_dentry; + void *backup_dentry; int err; backup_dentry = f2fs_kmalloc(F2FS_I_SB(dir), - sizeof(struct f2fs_inline_dentry), GFP_F2FS_ZERO); + MAX_INLINE_DATA(dir), GFP_F2FS_ZERO); if (!backup_dentry) { f2fs_put_page(ipage, 1); return -ENOMEM; } - memcpy(backup_dentry, inline_dentry, MAX_INLINE_DATA); - truncate_inline_inode(ipage, 0); + memcpy(backup_dentry, inline_dentry, MAX_INLINE_DATA(dir)); + f2fs_truncate_inline_inode(dir, ipage, 0); unlock_page(ipage); @@ -490,9 +525,10 @@ static int f2fs_move_rehashed_dirents(struct inode *dir, struct page *ipage, return 0; recover: lock_page(ipage); - memcpy(inline_dentry, backup_dentry, MAX_INLINE_DATA); + f2fs_wait_on_page_writeback(ipage, NODE, true); + memcpy(inline_dentry, backup_dentry, MAX_INLINE_DATA(dir)); f2fs_i_depth_write(dir, 0); - f2fs_i_size_write(dir, MAX_INLINE_DATA); + f2fs_i_size_write(dir, MAX_INLINE_DATA(dir)); set_page_dirty(ipage); f2fs_put_page(ipage, 1); @@ -501,7 +537,7 @@ static int f2fs_move_rehashed_dirents(struct inode *dir, struct page *ipage, } static int f2fs_convert_inline_dir(struct inode *dir, struct page *ipage, - struct f2fs_inline_dentry *inline_dentry) + void *inline_dentry) { if (!F2FS_I(dir)->i_dir_level) return f2fs_move_inline_dirents(dir, ipage, inline_dentry); @@ -517,21 +553,22 @@ int f2fs_add_inline_entry(struct inode *dir, const struct qstr *new_name, struct page *ipage; unsigned int bit_pos; f2fs_hash_t name_hash; - struct f2fs_inline_dentry *dentry_blk = NULL; + void *inline_dentry = NULL; struct f2fs_dentry_ptr d; int slots = GET_DENTRY_SLOTS(new_name->len); struct page *page = NULL; int err = 0; - ipage = get_node_page(sbi, dir->i_ino); + ipage = f2fs_get_node_page(sbi, dir->i_ino); if (IS_ERR(ipage)) return PTR_ERR(ipage); - dentry_blk = inline_data_addr(ipage); - bit_pos = room_for_filename(&dentry_blk->dentry_bitmap, - slots, NR_INLINE_DENTRY); - if (bit_pos >= NR_INLINE_DENTRY) { - err = f2fs_convert_inline_dir(dir, ipage, dentry_blk); + inline_dentry = inline_data_addr(dir, ipage); + make_dentry_ptr_inline(dir, &d, inline_dentry); + + bit_pos = f2fs_room_for_filename(d.bitmap, slots, d.max); + if (bit_pos >= d.max) { + err = f2fs_convert_inline_dir(dir, ipage, inline_dentry); if (err) return err; err = -EAGAIN; @@ -540,20 +577,17 @@ int f2fs_add_inline_entry(struct inode *dir, const struct qstr *new_name, if (inode) { down_write(&F2FS_I(inode)->i_sem); - page = init_inode_metadata(inode, dir, new_name, + page = f2fs_init_inode_metadata(inode, dir, new_name, orig_name, ipage); if (IS_ERR(page)) { err = PTR_ERR(page); goto fail; } - if (f2fs_encrypted_inode(dir)) - file_set_enc_name(inode); } f2fs_wait_on_page_writeback(ipage, NODE, true); name_hash = f2fs_dentry_hash(new_name, NULL); - make_dentry_ptr(NULL, &d, (void *)dentry_blk, 2); f2fs_update_dentry(ino, mode, &d, new_name, name_hash, bit_pos); set_page_dirty(ipage); @@ -564,7 +598,7 @@ int f2fs_add_inline_entry(struct inode *dir, const struct qstr *new_name, f2fs_put_page(page, 1); } - update_parent_metadata(dir, inode, 0); + f2fs_update_parent_metadata(dir, inode, 0); fail: if (inode) up_write(&F2FS_I(inode)->i_sem); @@ -576,7 +610,8 @@ int f2fs_add_inline_entry(struct inode *dir, const struct qstr *new_name, void f2fs_delete_inline_entry(struct f2fs_dir_entry *dentry, struct page *page, struct inode *dir, struct inode *inode) { - struct f2fs_inline_dentry *inline_dentry; + struct f2fs_dentry_ptr d; + void *inline_dentry; int slots = GET_DENTRY_SLOTS(le16_to_cpu(dentry->name_len)); unsigned int bit_pos; int i; @@ -584,17 +619,18 @@ void f2fs_delete_inline_entry(struct f2fs_dir_entry *dentry, struct page *page, lock_page(page); f2fs_wait_on_page_writeback(page, NODE, true); - inline_dentry = inline_data_addr(page); - bit_pos = dentry - inline_dentry->dentry; + inline_dentry = inline_data_addr(dir, page); + make_dentry_ptr_inline(dir, &d, inline_dentry); + + bit_pos = dentry - d.dentry; for (i = 0; i < slots; i++) - __clear_bit_le(bit_pos + i, - &inline_dentry->dentry_bitmap); + __clear_bit_le(bit_pos + i, d.bitmap); set_page_dirty(page); f2fs_put_page(page, 1); dir->i_ctime = dir->i_mtime = current_time(dir); - f2fs_mark_inode_dirty_sync(dir); + f2fs_mark_inode_dirty_sync(dir, false); if (inode) f2fs_drop_nlink(dir, inode); @@ -605,20 +641,21 @@ bool f2fs_empty_inline_dir(struct inode *dir) struct f2fs_sb_info *sbi = F2FS_I_SB(dir); struct page *ipage; unsigned int bit_pos = 2; - struct f2fs_inline_dentry *dentry_blk; + void *inline_dentry; + struct f2fs_dentry_ptr d; - ipage = get_node_page(sbi, dir->i_ino); + ipage = f2fs_get_node_page(sbi, dir->i_ino); if (IS_ERR(ipage)) return false; - dentry_blk = inline_data_addr(ipage); - bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap, - NR_INLINE_DENTRY, - bit_pos); + inline_dentry = inline_data_addr(dir, ipage); + make_dentry_ptr_inline(dir, &d, inline_dentry); + + bit_pos = find_next_bit_le(d.bitmap, d.max, bit_pos); f2fs_put_page(ipage, 1); - if (bit_pos < NR_INLINE_DENTRY) + if (bit_pos < d.max) return false; return true; @@ -628,26 +665,30 @@ int f2fs_read_inline_dir(struct file *file, struct dir_context *ctx, struct fscrypt_str *fstr) { struct inode *inode = file_inode(file); - struct f2fs_inline_dentry *inline_dentry = NULL; struct page *ipage = NULL; struct f2fs_dentry_ptr d; + void *inline_dentry = NULL; + int err; - if (ctx->pos == NR_INLINE_DENTRY) + make_dentry_ptr_inline(inode, &d, inline_dentry); + + if (ctx->pos == d.max) return 0; - ipage = get_node_page(F2FS_I_SB(inode), inode->i_ino); + ipage = f2fs_get_node_page(F2FS_I_SB(inode), inode->i_ino); if (IS_ERR(ipage)) return PTR_ERR(ipage); - inline_dentry = inline_data_addr(ipage); + inline_dentry = inline_data_addr(inode, ipage); - make_dentry_ptr(inode, &d, (void *)inline_dentry, 2); + make_dentry_ptr_inline(inode, &d, inline_dentry); - if (!f2fs_fill_dentries(ctx, &d, 0, fstr)) - ctx->pos = NR_INLINE_DENTRY; + err = f2fs_fill_dentries(ctx, &d, 0, fstr); + if (!err) + ctx->pos = d.max; f2fs_put_page(ipage, 1); - return 0; + return err < 0 ? err : 0; } int f2fs_inline_data_fiemap(struct inode *inode, @@ -660,7 +701,7 @@ int f2fs_inline_data_fiemap(struct inode *inode, struct page *ipage; int err = 0; - ipage = get_node_page(F2FS_I_SB(inode), inode->i_ino); + ipage = f2fs_get_node_page(F2FS_I_SB(inode), inode->i_ino); if (IS_ERR(ipage)) return PTR_ERR(ipage); @@ -669,16 +710,20 @@ int f2fs_inline_data_fiemap(struct inode *inode, goto out; } - ilen = min_t(size_t, MAX_INLINE_DATA, i_size_read(inode)); + ilen = min_t(size_t, MAX_INLINE_DATA(inode), i_size_read(inode)); if (start >= ilen) goto out; if (start + len < ilen) ilen = start + len; ilen -= start; - get_node_info(F2FS_I_SB(inode), inode->i_ino, &ni); + err = f2fs_get_node_info(F2FS_I_SB(inode), inode->i_ino, &ni); + if (err) + goto out; + byteaddr = (__u64)ni.blk_addr << inode->i_sb->s_blocksize_bits; - byteaddr += (char *)inline_data_addr(ipage) - (char *)F2FS_INODE(ipage); + byteaddr += (char *)inline_data_addr(inode, ipage) - + (char *)F2FS_INODE(ipage); err = fiemap_fill_next_extent(fieinfo, start, byteaddr, ilen, flags); out: f2fs_put_page(ipage, 1);
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index d736989..959df22 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c
@@ -16,13 +16,18 @@ #include "f2fs.h" #include "node.h" +#include "segment.h" #include <trace/events/f2fs.h> -void f2fs_mark_inode_dirty_sync(struct inode *inode) +void f2fs_mark_inode_dirty_sync(struct inode *inode, bool sync) { - if (f2fs_inode_dirtied(inode)) + if (is_inode_flag_set(inode, FI_NEW_INODE)) return; + + if (f2fs_inode_dirtied(inode, sync)) + return; + mark_inode_dirty_sync(inode); } @@ -31,64 +36,73 @@ void f2fs_set_inode_flags(struct inode *inode) unsigned int flags = F2FS_I(inode)->i_flags; unsigned int new_fl = 0; - if (flags & FS_SYNC_FL) + if (flags & F2FS_SYNC_FL) new_fl |= S_SYNC; - if (flags & FS_APPEND_FL) + if (flags & F2FS_APPEND_FL) new_fl |= S_APPEND; - if (flags & FS_IMMUTABLE_FL) + if (flags & F2FS_IMMUTABLE_FL) new_fl |= S_IMMUTABLE; - if (flags & FS_NOATIME_FL) + if (flags & F2FS_NOATIME_FL) new_fl |= S_NOATIME; - if (flags & FS_DIRSYNC_FL) + if (flags & F2FS_DIRSYNC_FL) new_fl |= S_DIRSYNC; + if (f2fs_encrypted_inode(inode)) + new_fl |= S_ENCRYPTED; inode_set_flags(inode, new_fl, - S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC); - f2fs_mark_inode_dirty_sync(inode); + S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC| + S_ENCRYPTED); } static void __get_inode_rdev(struct inode *inode, struct f2fs_inode *ri) { + int extra_size = get_extra_isize(inode); + if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode) || S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) { - if (ri->i_addr[0]) - inode->i_rdev = - old_decode_dev(le32_to_cpu(ri->i_addr[0])); + if (ri->i_addr[extra_size]) + inode->i_rdev = old_decode_dev( + le32_to_cpu(ri->i_addr[extra_size])); else - inode->i_rdev = - new_decode_dev(le32_to_cpu(ri->i_addr[1])); + inode->i_rdev = new_decode_dev( + le32_to_cpu(ri->i_addr[extra_size + 1])); } } -static bool __written_first_block(struct f2fs_inode *ri) +static int __written_first_block(struct f2fs_sb_info *sbi, + struct f2fs_inode *ri) { - block_t addr = le32_to_cpu(ri->i_addr[0]); + block_t addr = le32_to_cpu(ri->i_addr[offset_in_addr(ri)]); - if (addr != NEW_ADDR && addr != NULL_ADDR) - return true; - return false; + if (!__is_valid_data_blkaddr(addr)) + return 1; + if (!f2fs_is_valid_blkaddr(sbi, addr, DATA_GENERIC)) + return -EFAULT; + return 0; } static void __set_inode_rdev(struct inode *inode, struct f2fs_inode *ri) { + int extra_size = get_extra_isize(inode); + if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) { if (old_valid_dev(inode->i_rdev)) { - ri->i_addr[0] = + ri->i_addr[extra_size] = cpu_to_le32(old_encode_dev(inode->i_rdev)); - ri->i_addr[1] = 0; + ri->i_addr[extra_size + 1] = 0; } else { - ri->i_addr[0] = 0; - ri->i_addr[1] = + ri->i_addr[extra_size] = 0; + ri->i_addr[extra_size + 1] = cpu_to_le32(new_encode_dev(inode->i_rdev)); - ri->i_addr[2] = 0; + ri->i_addr[extra_size + 2] = 0; } } } static void __recover_inline_status(struct inode *inode, struct page *ipage) { - void *inline_data = inline_data_addr(ipage); + void *inline_data = inline_data_addr(inode, ipage); __le32 *start = inline_data; - __le32 *end = start + MAX_INLINE_DATA / sizeof(__le32); + __le32 *end = start + MAX_INLINE_DATA(inode) / sizeof(__le32); while (start < end) { if (*start++) { @@ -103,22 +117,193 @@ static void __recover_inline_status(struct inode *inode, struct page *ipage) return; } +static bool f2fs_enable_inode_chksum(struct f2fs_sb_info *sbi, struct page *page) +{ + struct f2fs_inode *ri = &F2FS_NODE(page)->i; + + if (!f2fs_sb_has_inode_chksum(sbi->sb)) + return false; + + if (!IS_INODE(page) || !(ri->i_inline & F2FS_EXTRA_ATTR)) + return false; + + if (!F2FS_FITS_IN_INODE(ri, le16_to_cpu(ri->i_extra_isize), + i_inode_checksum)) + return false; + + return true; +} + +static __u32 f2fs_inode_chksum(struct f2fs_sb_info *sbi, struct page *page) +{ + struct f2fs_node *node = F2FS_NODE(page); + struct f2fs_inode *ri = &node->i; + __le32 ino = node->footer.ino; + __le32 gen = ri->i_generation; + __u32 chksum, chksum_seed; + __u32 dummy_cs = 0; + unsigned int offset = offsetof(struct f2fs_inode, i_inode_checksum); + unsigned int cs_size = sizeof(dummy_cs); + + chksum = f2fs_chksum(sbi, sbi->s_chksum_seed, (__u8 *)&ino, + sizeof(ino)); + chksum_seed = f2fs_chksum(sbi, chksum, (__u8 *)&gen, sizeof(gen)); + + chksum = f2fs_chksum(sbi, chksum_seed, (__u8 *)ri, offset); + chksum = f2fs_chksum(sbi, chksum, (__u8 *)&dummy_cs, cs_size); + offset += cs_size; + chksum = f2fs_chksum(sbi, chksum, (__u8 *)ri + offset, + F2FS_BLKSIZE - offset); + return chksum; +} + +bool f2fs_inode_chksum_verify(struct f2fs_sb_info *sbi, struct page *page) +{ + struct f2fs_inode *ri; + __u32 provided, calculated; + + if (unlikely(is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN))) + return true; + +#ifdef CONFIG_F2FS_CHECK_FS + if (!f2fs_enable_inode_chksum(sbi, page)) +#else + if (!f2fs_enable_inode_chksum(sbi, page) || + PageDirty(page) || PageWriteback(page)) +#endif + return true; + + ri = &F2FS_NODE(page)->i; + provided = le32_to_cpu(ri->i_inode_checksum); + calculated = f2fs_inode_chksum(sbi, page); + + if (provided != calculated) + f2fs_msg(sbi->sb, KERN_WARNING, + "checksum invalid, ino = %x, %x vs. %x", + ino_of_node(page), provided, calculated); + + return provided == calculated; +} + +void f2fs_inode_chksum_set(struct f2fs_sb_info *sbi, struct page *page) +{ + struct f2fs_inode *ri = &F2FS_NODE(page)->i; + + if (!f2fs_enable_inode_chksum(sbi, page)) + return; + + ri->i_inode_checksum = cpu_to_le32(f2fs_inode_chksum(sbi, page)); +} + +static bool sanity_check_inode(struct inode *inode, struct page *node_page) +{ + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + struct f2fs_inode_info *fi = F2FS_I(inode); + unsigned long long iblocks; + + iblocks = le64_to_cpu(F2FS_INODE(node_page)->i_blocks); + if (!iblocks) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: corrupted inode i_blocks i_ino=%lx iblocks=%llu, " + "run fsck to fix.", + __func__, inode->i_ino, iblocks); + return false; + } + + if (ino_of_node(node_page) != nid_of_node(node_page)) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: corrupted inode footer i_ino=%lx, ino,nid: " + "[%u, %u] run fsck to fix.", + __func__, inode->i_ino, + ino_of_node(node_page), nid_of_node(node_page)); + return false; + } + + if (f2fs_sb_has_flexible_inline_xattr(sbi->sb) + && !f2fs_has_extra_attr(inode)) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: corrupted inode ino=%lx, run fsck to fix.", + __func__, inode->i_ino); + return false; + } + + if (f2fs_has_extra_attr(inode) && + !f2fs_sb_has_extra_attr(sbi->sb)) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: inode (ino=%lx) is with extra_attr, " + "but extra_attr feature is off", + __func__, inode->i_ino); + return false; + } + + if (fi->i_extra_isize > F2FS_TOTAL_EXTRA_ATTR_SIZE || + fi->i_extra_isize % sizeof(__le32)) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: inode (ino=%lx) has corrupted i_extra_isize: %d, " + "max: %zu", + __func__, inode->i_ino, fi->i_extra_isize, + F2FS_TOTAL_EXTRA_ATTR_SIZE); + return false; + } + + if (F2FS_I(inode)->extent_tree) { + struct extent_info *ei = &F2FS_I(inode)->extent_tree->largest; + + if (ei->len && + (!f2fs_is_valid_blkaddr(sbi, ei->blk, DATA_GENERIC) || + !f2fs_is_valid_blkaddr(sbi, ei->blk + ei->len - 1, + DATA_GENERIC))) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: inode (ino=%lx) extent info [%u, %u, %u] " + "is incorrect, run fsck to fix", + __func__, inode->i_ino, + ei->blk, ei->fofs, ei->len); + return false; + } + } + + if (f2fs_has_inline_data(inode) && + (!S_ISREG(inode->i_mode) && !S_ISLNK(inode->i_mode))) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: inode (ino=%lx, mode=%u) should not have " + "inline_data, run fsck to fix", + __func__, inode->i_ino, inode->i_mode); + return false; + } + + if (f2fs_has_inline_dentry(inode) && !S_ISDIR(inode->i_mode)) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: inode (ino=%lx, mode=%u) should not have " + "inline_dentry, run fsck to fix", + __func__, inode->i_ino, inode->i_mode); + return false; + } + + return true; +} + static int do_read_inode(struct inode *inode) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct f2fs_inode_info *fi = F2FS_I(inode); struct page *node_page; struct f2fs_inode *ri; + projid_t i_projid; + int err; /* Check if ino is within scope */ - if (check_nid_range(sbi, inode->i_ino)) { - f2fs_msg(inode->i_sb, KERN_ERR, "bad inode number: %lu", - (unsigned long) inode->i_ino); - WARN_ON(1); + if (f2fs_check_nid_range(sbi, inode->i_ino)) return -EINVAL; - } - node_page = get_node_page(sbi, inode->i_ino); + node_page = f2fs_get_node_page(sbi, inode->i_ino); if (IS_ERR(node_page)) return PTR_ERR(node_page); @@ -129,7 +314,7 @@ static int do_read_inode(struct inode *inode) i_gid_write(inode, le32_to_cpu(ri->i_gid)); set_nlink(inode, le32_to_cpu(ri->i_links)); inode->i_size = le64_to_cpu(ri->i_size); - inode->i_blocks = le64_to_cpu(ri->i_blocks); + inode->i_blocks = SECTOR_FROM_BLOCK(le64_to_cpu(ri->i_blocks) - 1); inode->i_atime.tv_sec = le64_to_cpu(ri->i_atime); inode->i_ctime.tv_sec = le64_to_cpu(ri->i_ctime); @@ -138,8 +323,11 @@ static int do_read_inode(struct inode *inode) inode->i_ctime.tv_nsec = le32_to_cpu(ri->i_ctime_nsec); inode->i_mtime.tv_nsec = le32_to_cpu(ri->i_mtime_nsec); inode->i_generation = le32_to_cpu(ri->i_generation); - - fi->i_current_depth = le32_to_cpu(ri->i_current_depth); + if (S_ISDIR(inode->i_mode)) + fi->i_current_depth = le32_to_cpu(ri->i_current_depth); + else if (S_ISREG(inode->i_mode)) + fi->i_gc_failures[GC_FAILURE_PIN] = + le16_to_cpu(ri->i_gc_failures); fi->i_xattr_nid = le32_to_cpu(ri->i_xattr_nid); fi->i_flags = le32_to_cpu(ri->i_flags); fi->flags = 0; @@ -152,6 +340,30 @@ static int do_read_inode(struct inode *inode) get_inline_info(inode, ri); + fi->i_extra_isize = f2fs_has_extra_attr(inode) ? + le16_to_cpu(ri->i_extra_isize) : 0; + + if (f2fs_sb_has_flexible_inline_xattr(sbi->sb)) { + fi->i_inline_xattr_size = le16_to_cpu(ri->i_inline_xattr_size); + } else if (f2fs_has_inline_xattr(inode) || + f2fs_has_inline_dentry(inode)) { + fi->i_inline_xattr_size = DEFAULT_INLINE_XATTR_ADDRS; + } else { + + /* + * Previous inline data or directory always reserved 200 bytes + * in inode layout, even if inline_xattr is disabled. In order + * to keep inline_dentry's structure for backward compatibility, + * we get the space back only from inline_data. + */ + fi->i_inline_xattr_size = 0; + } + + if (!sanity_check_inode(inode, node_page)) { + f2fs_put_page(node_page, 1); + return -EINVAL; + } + /* check data exist */ if (f2fs_has_inline_data(inode) && !f2fs_exist_data(inode)) __recover_inline_status(inode, node_page); @@ -159,12 +371,39 @@ static int do_read_inode(struct inode *inode) /* get rdev by using inline_info */ __get_inode_rdev(inode, ri); - if (__written_first_block(ri)) - set_inode_flag(inode, FI_FIRST_BLOCK_WRITTEN); + if (S_ISREG(inode->i_mode)) { + err = __written_first_block(sbi, ri); + if (err < 0) { + f2fs_put_page(node_page, 1); + return err; + } + if (!err) + set_inode_flag(inode, FI_FIRST_BLOCK_WRITTEN); + } - if (!need_inode_block_update(sbi, inode->i_ino)) + if (!f2fs_need_inode_block_update(sbi, inode->i_ino)) fi->last_disk_size = inode->i_size; + if (fi->i_flags & F2FS_PROJINHERIT_FL) + set_inode_flag(inode, FI_PROJ_INHERIT); + + if (f2fs_has_extra_attr(inode) && f2fs_sb_has_project_quota(sbi->sb) && + F2FS_FITS_IN_INODE(ri, fi->i_extra_isize, i_projid)) + i_projid = (projid_t)le32_to_cpu(ri->i_projid); + else + i_projid = F2FS_DEF_PROJID; + fi->i_projid = make_kprojid(&init_user_ns, i_projid); + + if (f2fs_has_extra_attr(inode) && f2fs_sb_has_inode_crtime(sbi->sb) && + F2FS_FITS_IN_INODE(ri, fi->i_extra_isize, i_crtime)) { + fi->i_crtime.tv_sec = le64_to_cpu(ri->i_crtime); + fi->i_crtime.tv_nsec = le32_to_cpu(ri->i_crtime_nsec); + } + + F2FS_I(inode)->i_disk_time[0] = inode->i_atime; + F2FS_I(inode)->i_disk_time[1] = inode->i_ctime; + F2FS_I(inode)->i_disk_time[2] = inode->i_mtime; + F2FS_I(inode)->i_disk_time[3] = F2FS_I(inode)->i_crtime; f2fs_put_page(node_page, 1); stat_inc_inline_xattr(inode); @@ -197,10 +436,10 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino) make_now: if (ino == F2FS_NODE_INO(sbi)) { inode->i_mapping->a_ops = &f2fs_node_aops; - mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO); + mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS); } else if (ino == F2FS_META_INO(sbi)) { inode->i_mapping->a_ops = &f2fs_meta_aops; - mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO); + mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS); } else if (S_ISREG(inode->i_mode)) { inode->i_op = &f2fs_file_inode_operations; inode->i_fop = &f2fs_file_operations; @@ -209,7 +448,7 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino) inode->i_op = &f2fs_dir_inode_operations; inode->i_fop = &f2fs_dir_operations; inode->i_mapping->a_ops = &f2fs_dblock_aops; - mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_HIGH_ZERO); + inode_nohighmem(inode); } else if (S_ISLNK(inode->i_mode)) { if (f2fs_encrypted_inode(inode)) inode->i_op = &f2fs_encrypted_symlink_inode_operations; @@ -225,6 +464,7 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino) ret = -EIO; goto bad_inode; } + f2fs_set_inode_flags(inode); unlock_new_inode(inode); trace_f2fs_iget(inode); return inode; @@ -249,13 +489,15 @@ struct inode *f2fs_iget_retry(struct super_block *sb, unsigned long ino) return inode; } -int update_inode(struct inode *inode, struct page *node_page) +void f2fs_update_inode(struct inode *inode, struct page *node_page) { struct f2fs_inode *ri; - - f2fs_inode_synced(inode); + struct extent_tree *et = F2FS_I(inode)->extent_tree; f2fs_wait_on_page_writeback(node_page, NODE, true); + set_page_dirty(node_page); + + f2fs_inode_synced(inode); ri = F2FS_INODE(node_page); @@ -265,13 +507,15 @@ int update_inode(struct inode *inode, struct page *node_page) ri->i_gid = cpu_to_le32(i_gid_read(inode)); ri->i_links = cpu_to_le32(inode->i_nlink); ri->i_size = cpu_to_le64(i_size_read(inode)); - ri->i_blocks = cpu_to_le64(inode->i_blocks); + ri->i_blocks = cpu_to_le64(SECTOR_TO_BLOCK(inode->i_blocks) + 1); - if (F2FS_I(inode)->extent_tree) - set_raw_extent(&F2FS_I(inode)->extent_tree->largest, - &ri->i_ext); - else + if (et) { + read_lock(&et->lock); + set_raw_extent(&et->largest, &ri->i_ext); + read_unlock(&et->lock); + } else { memset(&ri->i_ext, 0, sizeof(ri->i_ext)); + } set_raw_inline(inode, ri); ri->i_atime = cpu_to_le64(inode->i_atime.tv_sec); @@ -280,30 +524,67 @@ int update_inode(struct inode *inode, struct page *node_page) ri->i_atime_nsec = cpu_to_le32(inode->i_atime.tv_nsec); ri->i_ctime_nsec = cpu_to_le32(inode->i_ctime.tv_nsec); ri->i_mtime_nsec = cpu_to_le32(inode->i_mtime.tv_nsec); - ri->i_current_depth = cpu_to_le32(F2FS_I(inode)->i_current_depth); + if (S_ISDIR(inode->i_mode)) + ri->i_current_depth = + cpu_to_le32(F2FS_I(inode)->i_current_depth); + else if (S_ISREG(inode->i_mode)) + ri->i_gc_failures = + cpu_to_le16(F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN]); ri->i_xattr_nid = cpu_to_le32(F2FS_I(inode)->i_xattr_nid); ri->i_flags = cpu_to_le32(F2FS_I(inode)->i_flags); ri->i_pino = cpu_to_le32(F2FS_I(inode)->i_pino); ri->i_generation = cpu_to_le32(inode->i_generation); ri->i_dir_level = F2FS_I(inode)->i_dir_level; + if (f2fs_has_extra_attr(inode)) { + ri->i_extra_isize = cpu_to_le16(F2FS_I(inode)->i_extra_isize); + + if (f2fs_sb_has_flexible_inline_xattr(F2FS_I_SB(inode)->sb)) + ri->i_inline_xattr_size = + cpu_to_le16(F2FS_I(inode)->i_inline_xattr_size); + + if (f2fs_sb_has_project_quota(F2FS_I_SB(inode)->sb) && + F2FS_FITS_IN_INODE(ri, F2FS_I(inode)->i_extra_isize, + i_projid)) { + projid_t i_projid; + + i_projid = from_kprojid(&init_user_ns, + F2FS_I(inode)->i_projid); + ri->i_projid = cpu_to_le32(i_projid); + } + + if (f2fs_sb_has_inode_crtime(F2FS_I_SB(inode)->sb) && + F2FS_FITS_IN_INODE(ri, F2FS_I(inode)->i_extra_isize, + i_crtime)) { + ri->i_crtime = + cpu_to_le64(F2FS_I(inode)->i_crtime.tv_sec); + ri->i_crtime_nsec = + cpu_to_le32(F2FS_I(inode)->i_crtime.tv_nsec); + } + } + __set_inode_rdev(inode, ri); - set_cold_node(inode, node_page); /* deleted inode */ if (inode->i_nlink == 0) clear_inline_node(node_page); - return set_page_dirty(node_page); + F2FS_I(inode)->i_disk_time[0] = inode->i_atime; + F2FS_I(inode)->i_disk_time[1] = inode->i_ctime; + F2FS_I(inode)->i_disk_time[2] = inode->i_mtime; + F2FS_I(inode)->i_disk_time[3] = F2FS_I(inode)->i_crtime; + +#ifdef CONFIG_F2FS_CHECK_FS + f2fs_inode_chksum_set(F2FS_I_SB(inode), node_page); +#endif } -int update_inode_page(struct inode *inode) +void f2fs_update_inode_page(struct inode *inode) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct page *node_page; - int ret = 0; retry: - node_page = get_node_page(sbi, inode->i_ino); + node_page = f2fs_get_node_page(sbi, inode->i_ino); if (IS_ERR(node_page)) { int err = PTR_ERR(node_page); if (err == -ENOMEM) { @@ -312,12 +593,10 @@ int update_inode_page(struct inode *inode) } else if (err != -ENOENT) { f2fs_stop_checkpoint(sbi, false); } - f2fs_inode_synced(inode); - return 0; + return; } - ret = update_inode(inode, node_page); + f2fs_update_inode(inode, node_page); f2fs_put_page(node_page, 1); - return ret; } int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc) @@ -335,7 +614,8 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc) * We need to balance fs here to prevent from producing dirty node pages * during the urgent cleaning time when runing out of free sections. */ - if (update_inode_page(inode)) + f2fs_update_inode_page(inode); + if (wbc && wbc->nr_to_write) f2fs_balance_fs(sbi, true); return 0; } @@ -351,7 +631,7 @@ void f2fs_evict_inode(struct inode *inode) /* some remained atomic pages should discarded */ if (f2fs_is_atomic_file(inode)) - drop_inmem_pages(inode); + f2fs_drop_inmem_pages(inode); trace_f2fs_evict_inode(inode); truncate_inode_pages_final(&inode->i_data); @@ -361,17 +641,18 @@ void f2fs_evict_inode(struct inode *inode) goto out_clear; f2fs_bug_on(sbi, get_dirty_pages(inode)); - remove_dirty_inode(inode); + f2fs_remove_dirty_inode(inode); f2fs_destroy_extent_tree(inode); if (inode->i_nlink || is_bad_inode(inode)) goto no_delete; -#ifdef CONFIG_F2FS_FAULT_INJECTION - if (time_to_inject(sbi, FAULT_EVICT_INODE)) - goto no_delete; -#endif + dquot_initialize(inode); + + f2fs_remove_ino_entry(sbi, inode->i_ino, APPEND_INO); + f2fs_remove_ino_entry(sbi, inode->i_ino, UPDATE_INO); + f2fs_remove_ino_entry(sbi, inode->i_ino, FLUSH_INO); sb_start_intwrite(inode->i_sb); set_inode_flag(inode, FI_NO_ALLOC); @@ -380,10 +661,17 @@ void f2fs_evict_inode(struct inode *inode) if (F2FS_HAS_BLOCKS(inode)) err = f2fs_truncate(inode); + if (time_to_inject(sbi, FAULT_EVICT_INODE)) { + f2fs_show_injection_info(FAULT_EVICT_INODE); + err = -EIO; + } + if (!err) { f2fs_lock_op(sbi); - err = remove_inode_page(inode); + err = f2fs_remove_inode_page(inode); f2fs_unlock_op(sbi); + if (err == -ENOENT) + err = 0; } /* give more chances, if ENOMEM case */ @@ -393,36 +681,67 @@ void f2fs_evict_inode(struct inode *inode) } if (err) - update_inode_page(inode); + f2fs_update_inode_page(inode); + dquot_free_inode(inode); sb_end_intwrite(inode->i_sb); no_delete: + dquot_drop(inode); + stat_dec_inline_xattr(inode); stat_dec_inline_dir(inode); stat_dec_inline_inode(inode); - invalidate_mapping_pages(NODE_MAPPING(sbi), inode->i_ino, inode->i_ino); + if (likely(!is_set_ckpt_flags(sbi, CP_ERROR_FLAG))) + f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE)); + else + f2fs_inode_synced(inode); + + /* ino == 0, if f2fs_new_inode() was failed t*/ + if (inode->i_ino) + invalidate_mapping_pages(NODE_MAPPING(sbi), inode->i_ino, + inode->i_ino); if (xnid) invalidate_mapping_pages(NODE_MAPPING(sbi), xnid, xnid); - if (is_inode_flag_set(inode, FI_APPEND_WRITE)) - add_ino_entry(sbi, inode->i_ino, APPEND_INO); - if (is_inode_flag_set(inode, FI_UPDATE_WRITE)) - add_ino_entry(sbi, inode->i_ino, UPDATE_INO); - if (is_inode_flag_set(inode, FI_FREE_NID)) { - alloc_nid_failed(sbi, inode->i_ino); - clear_inode_flag(inode, FI_FREE_NID); + if (inode->i_nlink) { + if (is_inode_flag_set(inode, FI_APPEND_WRITE)) + f2fs_add_ino_entry(sbi, inode->i_ino, APPEND_INO); + if (is_inode_flag_set(inode, FI_UPDATE_WRITE)) + f2fs_add_ino_entry(sbi, inode->i_ino, UPDATE_INO); } - f2fs_bug_on(sbi, err && - !exist_written_data(sbi, inode->i_ino, ORPHAN_INO)); + if (is_inode_flag_set(inode, FI_FREE_NID)) { + f2fs_alloc_nid_failed(sbi, inode->i_ino); + clear_inode_flag(inode, FI_FREE_NID); + } else { + /* + * If xattr nid is corrupted, we can reach out error condition, + * err & !f2fs_exist_written_data(sbi, inode->i_ino, ORPHAN_INO)). + * In that case, f2fs_check_nid_range() is enough to give a clue. + */ + } out_clear: - fscrypt_put_encryption_info(inode, NULL); + fscrypt_put_encryption_info(inode); clear_inode(inode); } /* caller should call f2fs_lock_op() */ -void handle_failed_inode(struct inode *inode) +void f2fs_handle_failed_inode(struct inode *inode) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct node_info ni; + int err; + + /* + * clear nlink of inode in order to release resource of inode + * immediately. + */ + clear_nlink(inode); + + /* + * we must call this to avoid inode being remained as dirty, resulting + * in a panic when flushing dirty inodes in gdirty_list. + */ + f2fs_update_inode_page(inode); + f2fs_inode_synced(inode); /* don't make bad inode, since it becomes a regular file. */ unlock_new_inode(inode); @@ -432,22 +751,29 @@ void handle_failed_inode(struct inode *inode) * so we can prevent losing this orphan when encoutering checkpoint * and following suddenly power-off. */ - get_node_info(sbi, inode->i_ino, &ni); + err = f2fs_get_node_info(sbi, inode->i_ino, &ni); + if (err) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "May loss orphan inode, run fsck to fix."); + goto out; + } if (ni.blk_addr != NULL_ADDR) { - int err = acquire_orphan_inode(sbi); + err = f2fs_acquire_orphan_inode(sbi); if (err) { set_sbi_flag(sbi, SBI_NEED_FSCK); f2fs_msg(sbi->sb, KERN_WARNING, "Too many orphan inodes, run fsck to fix."); } else { - add_orphan_inode(inode); + f2fs_add_orphan_inode(inode); } - alloc_nid_done(sbi, inode->i_ino); + f2fs_alloc_nid_done(sbi, inode->i_ino); } else { set_inode_flag(inode, FI_FREE_NID); } +out: f2fs_unlock_op(sbi); /* iput will drop the inode object */
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c index ccb99d5..56593b3 100644 --- a/fs/f2fs/namei.c +++ b/fs/f2fs/namei.c
@@ -15,6 +15,7 @@ #include <linux/ctype.h> #include <linux/dcache.h> #include <linux/namei.h> +#include <linux/quotaops.h> #include "f2fs.h" #include "node.h" @@ -28,6 +29,7 @@ static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode) nid_t ino; struct inode *inode; bool nid_free = false; + int xattr_size = 0; int err; inode = new_inode(dir->i_sb); @@ -35,46 +37,93 @@ static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode) return ERR_PTR(-ENOMEM); f2fs_lock_op(sbi); - if (!alloc_nid(sbi, &ino)) { + if (!f2fs_alloc_nid(sbi, &ino)) { f2fs_unlock_op(sbi); err = -ENOSPC; goto fail; } f2fs_unlock_op(sbi); + nid_free = true; + inode_init_owner(inode, dir, mode); inode->i_ino = ino; inode->i_blocks = 0; - inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode); + inode->i_mtime = inode->i_atime = inode->i_ctime = + F2FS_I(inode)->i_crtime = current_time(inode); inode->i_generation = sbi->s_next_generation++; + if (S_ISDIR(inode->i_mode)) + F2FS_I(inode)->i_current_depth = 1; + err = insert_inode_locked(inode); if (err) { err = -EINVAL; - nid_free = true; goto fail; } - /* If the directory encrypted, then we should encrypt the inode. */ - if (f2fs_encrypted_inode(dir) && f2fs_may_encrypt(inode)) - f2fs_set_encrypted_inode(inode); + if (f2fs_sb_has_project_quota(sbi->sb) && + (F2FS_I(dir)->i_flags & F2FS_PROJINHERIT_FL)) + F2FS_I(inode)->i_projid = F2FS_I(dir)->i_projid; + else + F2FS_I(inode)->i_projid = make_kprojid(&init_user_ns, + F2FS_DEF_PROJID); + + err = dquot_initialize(inode); + if (err) + goto fail_drop; + + err = dquot_alloc_inode(inode); + if (err) + goto fail_drop; set_inode_flag(inode, FI_NEW_INODE); + /* If the directory encrypted, then we should encrypt the inode. */ + if ((f2fs_encrypted_inode(dir) || DUMMY_ENCRYPTION_ENABLED(sbi)) && + f2fs_may_encrypt(inode)) + f2fs_set_encrypted_inode(inode); + + if (f2fs_sb_has_extra_attr(sbi->sb)) { + set_inode_flag(inode, FI_EXTRA_ATTR); + F2FS_I(inode)->i_extra_isize = F2FS_TOTAL_EXTRA_ATTR_SIZE; + } + if (test_opt(sbi, INLINE_XATTR)) set_inode_flag(inode, FI_INLINE_XATTR); + if (test_opt(sbi, INLINE_DATA) && f2fs_may_inline_data(inode)) set_inode_flag(inode, FI_INLINE_DATA); if (f2fs_may_inline_dentry(inode)) set_inode_flag(inode, FI_INLINE_DENTRY); + if (f2fs_sb_has_flexible_inline_xattr(sbi->sb)) { + f2fs_bug_on(sbi, !f2fs_has_extra_attr(inode)); + if (f2fs_has_inline_xattr(inode)) + xattr_size = F2FS_OPTION(sbi).inline_xattr_size; + /* Otherwise, will be 0 */ + } else if (f2fs_has_inline_xattr(inode) || + f2fs_has_inline_dentry(inode)) { + xattr_size = DEFAULT_INLINE_XATTR_ADDRS; + } + F2FS_I(inode)->i_inline_xattr_size = xattr_size; + f2fs_init_extent_tree(inode, NULL); stat_inc_inline_xattr(inode); stat_inc_inline_inode(inode); stat_inc_inline_dir(inode); + F2FS_I(inode)->i_flags = + f2fs_mask_flags(mode, F2FS_I(dir)->i_flags & F2FS_FL_INHERITED); + + if (S_ISDIR(inode->i_mode)) + F2FS_I(inode)->i_flags |= F2FS_INDEX_FL; + + if (F2FS_I(inode)->i_flags & F2FS_PROJINHERIT_FL) + set_inode_flag(inode, FI_PROJ_INHERIT); + trace_f2fs_new_inode(inode, 0); return inode; @@ -85,9 +134,19 @@ static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode) set_inode_flag(inode, FI_FREE_NID); iput(inode); return ERR_PTR(err); +fail_drop: + trace_f2fs_new_inode(inode, err); + dquot_drop(inode); + inode->i_flags |= S_NOQUOTA; + if (nid_free) + set_inode_flag(inode, FI_FREE_NID); + clear_nlink(inode); + unlock_new_inode(inode); + iput(inode); + return ERR_PTR(err); } -static int is_multimedia_file(const unsigned char *s, const char *sub) +static int is_extension_exist(const unsigned char *s, const char *sub) { size_t slen = strlen(s); size_t sublen = strlen(sub); @@ -113,19 +172,94 @@ static int is_multimedia_file(const unsigned char *s, const char *sub) /* * Set multimedia files as cold files for hot/cold data separation */ -static inline void set_cold_files(struct f2fs_sb_info *sbi, struct inode *inode, +static inline void set_file_temperature(struct f2fs_sb_info *sbi, struct inode *inode, const unsigned char *name) { - int i; - __u8 (*extlist)[8] = sbi->raw_super->extension_list; + __u8 (*extlist)[F2FS_EXTENSION_LEN] = sbi->raw_super->extension_list; + int i, cold_count, hot_count; - int count = le32_to_cpu(sbi->raw_super->extension_count); - for (i = 0; i < count; i++) { - if (is_multimedia_file(name, extlist[i])) { + down_read(&sbi->sb_lock); + + cold_count = le32_to_cpu(sbi->raw_super->extension_count); + hot_count = sbi->raw_super->hot_ext_count; + + for (i = 0; i < cold_count + hot_count; i++) { + if (!is_extension_exist(name, extlist[i])) + continue; + if (i < cold_count) file_set_cold(inode); - break; - } + else + file_set_hot(inode); + break; } + + up_read(&sbi->sb_lock); +} + +int f2fs_update_extension_list(struct f2fs_sb_info *sbi, const char *name, + bool hot, bool set) +{ + __u8 (*extlist)[F2FS_EXTENSION_LEN] = sbi->raw_super->extension_list; + int cold_count = le32_to_cpu(sbi->raw_super->extension_count); + int hot_count = sbi->raw_super->hot_ext_count; + int total_count = cold_count + hot_count; + int start, count; + int i; + + if (set) { + if (total_count == F2FS_MAX_EXTENSION) + return -EINVAL; + } else { + if (!hot && !cold_count) + return -EINVAL; + if (hot && !hot_count) + return -EINVAL; + } + + if (hot) { + start = cold_count; + count = total_count; + } else { + start = 0; + count = cold_count; + } + + for (i = start; i < count; i++) { + if (strcmp(name, extlist[i])) + continue; + + if (set) + return -EINVAL; + + memcpy(extlist[i], extlist[i + 1], + F2FS_EXTENSION_LEN * (total_count - i - 1)); + memset(extlist[total_count - 1], 0, F2FS_EXTENSION_LEN); + if (hot) + sbi->raw_super->hot_ext_count = hot_count - 1; + else + sbi->raw_super->extension_count = + cpu_to_le32(cold_count - 1); + return 0; + } + + if (!set) + return -EINVAL; + + if (hot) { + memcpy(extlist[count], name, strlen(name)); + sbi->raw_super->hot_ext_count = hot_count + 1; + } else { + char buf[F2FS_MAX_EXTENSION][F2FS_EXTENSION_LEN]; + + memcpy(buf, &extlist[cold_count], + F2FS_EXTENSION_LEN * hot_count); + memset(extlist[cold_count], 0, F2FS_EXTENSION_LEN); + memcpy(extlist[cold_count], name, strlen(name)); + memcpy(&extlist[cold_count + 1], buf, + F2FS_EXTENSION_LEN * hot_count); + sbi->raw_super->extension_count = cpu_to_le32(cold_count + 1); + } + return 0; } static int f2fs_create(struct inode *dir, struct dentry *dentry, umode_t mode, @@ -136,35 +270,42 @@ static int f2fs_create(struct inode *dir, struct dentry *dentry, umode_t mode, nid_t ino = 0; int err; + if (unlikely(f2fs_cp_error(sbi))) + return -EIO; + + err = dquot_initialize(dir); + if (err) + return err; + inode = f2fs_new_inode(dir, mode); if (IS_ERR(inode)) return PTR_ERR(inode); if (!test_opt(sbi, DISABLE_EXT_IDENTIFY)) - set_cold_files(sbi, inode, dentry->d_name.name); + set_file_temperature(sbi, inode, dentry->d_name.name); inode->i_op = &f2fs_file_inode_operations; inode->i_fop = &f2fs_file_operations; inode->i_mapping->a_ops = &f2fs_dblock_aops; ino = inode->i_ino; - f2fs_balance_fs(sbi, true); - f2fs_lock_op(sbi); err = f2fs_add_link(dentry, inode); if (err) goto out; f2fs_unlock_op(sbi); - alloc_nid_done(sbi, ino); + f2fs_alloc_nid_done(sbi, ino); d_instantiate_new(dentry, inode); if (IS_DIRSYNC(dir)) f2fs_sync_fs(sbi->sb, 1); + + f2fs_balance_fs(sbi, true); return 0; out: - handle_failed_inode(inode); + f2fs_handle_failed_inode(inode); return err; } @@ -175,9 +316,21 @@ static int f2fs_link(struct dentry *old_dentry, struct inode *dir, struct f2fs_sb_info *sbi = F2FS_I_SB(dir); int err; - if (f2fs_encrypted_inode(dir) && - !fscrypt_has_permitted_context(dir, inode)) - return -EPERM; + if (unlikely(f2fs_cp_error(sbi))) + return -EIO; + + err = fscrypt_prepare_link(old_dentry, dir, dentry); + if (err) + return err; + + if (is_inode_flag_set(dir, FI_PROJ_INHERIT) && + (!projid_eq(F2FS_I(dir)->i_projid, + F2FS_I(old_dentry->d_inode)->i_projid))) + return -EXDEV; + + err = dquot_initialize(dir); + if (err) + return err; f2fs_balance_fs(sbi, true); @@ -232,32 +385,33 @@ static int __recover_dot_dentries(struct inode *dir, nid_t pino) return 0; } + err = dquot_initialize(dir); + if (err) + return err; + f2fs_balance_fs(sbi, true); f2fs_lock_op(sbi); de = f2fs_find_entry(dir, &dot, &page); if (de) { - f2fs_dentry_kunmap(dir, page); f2fs_put_page(page, 0); } else if (IS_ERR(page)) { err = PTR_ERR(page); goto out; } else { - err = __f2fs_add_link(dir, &dot, NULL, dir->i_ino, S_IFDIR); + err = f2fs_do_add_link(dir, &dot, NULL, dir->i_ino, S_IFDIR); if (err) goto out; } de = f2fs_find_entry(dir, &dotdot, &page); - if (de) { - f2fs_dentry_kunmap(dir, page); + if (de) f2fs_put_page(page, 0); - } else if (IS_ERR(page)) { + else if (IS_ERR(page)) err = PTR_ERR(page); - } else { - err = __f2fs_add_link(dir, &dotdot, NULL, pino, S_IFDIR); - } + else + err = f2fs_do_add_link(dir, &dotdot, NULL, pino, S_IFDIR); out: if (!err) clear_inode_flag(dir, FI_INLINE_DOTS); @@ -272,66 +426,70 @@ static struct dentry *f2fs_lookup(struct inode *dir, struct dentry *dentry, struct inode *inode = NULL; struct f2fs_dir_entry *de; struct page *page; - nid_t ino; + struct dentry *new; + nid_t ino = -1; int err = 0; unsigned int root_ino = F2FS_ROOT_INO(F2FS_I_SB(dir)); - if (f2fs_encrypted_inode(dir)) { - int res = fscrypt_get_encryption_info(dir); + trace_f2fs_lookup_start(dir, dentry, flags); - /* - * DCACHE_ENCRYPTED_WITH_KEY is set if the dentry is - * created while the directory was encrypted and we - * don't have access to the key. - */ - if (fscrypt_has_encryption_key(dir)) - fscrypt_set_encrypted_dentry(dentry); - fscrypt_set_d_op(dentry); - if (res && res != -ENOKEY) - return ERR_PTR(res); + err = fscrypt_prepare_lookup(dir, dentry, flags); + if (err) + goto out; + + if (dentry->d_name.len > F2FS_NAME_LEN) { + err = -ENAMETOOLONG; + goto out; } - if (dentry->d_name.len > F2FS_NAME_LEN) - return ERR_PTR(-ENAMETOOLONG); - de = f2fs_find_entry(dir, &dentry->d_name, &page); if (!de) { - if (IS_ERR(page)) - return (struct dentry *)page; - return d_splice_alias(inode, dentry); + if (IS_ERR(page)) { + err = PTR_ERR(page); + goto out; + } + goto out_splice; } ino = le32_to_cpu(de->ino); - f2fs_dentry_kunmap(dir, page); f2fs_put_page(page, 0); inode = f2fs_iget(dir->i_sb, ino); - if (IS_ERR(inode)) - return ERR_CAST(inode); + if (IS_ERR(inode)) { + err = PTR_ERR(inode); + goto out; + } if ((dir->i_ino == root_ino) && f2fs_has_inline_dots(dir)) { err = __recover_dot_dentries(dir, root_ino); if (err) - goto err_out; + goto out_iput; } if (f2fs_has_inline_dots(inode)) { err = __recover_dot_dentries(inode, dir->i_ino); if (err) - goto err_out; + goto out_iput; } - if (!IS_ERR(inode) && f2fs_encrypted_inode(dir) && - (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode)) && - !fscrypt_has_permitted_context(dir, inode)) { - bool nokey = f2fs_encrypted_inode(inode) && - !fscrypt_has_encryption_key(inode); - err = nokey ? -ENOKEY : -EPERM; - goto err_out; + if (f2fs_encrypted_inode(dir) && + (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode)) && + !fscrypt_has_permitted_context(dir, inode)) { + f2fs_msg(inode->i_sb, KERN_WARNING, + "Inconsistent encryption contexts: %lu/%lu", + dir->i_ino, inode->i_ino); + err = -EPERM; + goto out_iput; } - return d_splice_alias(inode, dentry); - -err_out: +out_splice: + new = d_splice_alias(inode, dentry); + if (IS_ERR(new)) + err = PTR_ERR(new); + trace_f2fs_lookup_end(dir, dentry, ino, err); + return new; +out_iput: iput(inode); +out: + trace_f2fs_lookup_end(dir, dentry, ino, err); return ERR_PTR(err); } @@ -345,6 +503,16 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry) trace_f2fs_unlink_enter(dir, dentry); + if (unlikely(f2fs_cp_error(sbi))) + return -EIO; + + err = dquot_initialize(dir); + if (err) + return err; + err = dquot_initialize(inode); + if (err) + return err; + de = f2fs_find_entry(dir, &dentry->d_name, &page); if (!de) { if (IS_ERR(page)) @@ -355,10 +523,9 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry) f2fs_balance_fs(sbi, true); f2fs_lock_op(sbi); - err = acquire_orphan_inode(sbi); + err = f2fs_acquire_orphan_inode(sbi); if (err) { f2fs_unlock_op(sbi); - f2fs_dentry_kunmap(dir, page); f2fs_put_page(page, 0); goto fail; } @@ -392,73 +559,42 @@ static int f2fs_symlink(struct inode *dir, struct dentry *dentry, struct f2fs_sb_info *sbi = F2FS_I_SB(dir); struct inode *inode; size_t len = strlen(symname); - struct fscrypt_str disk_link = FSTR_INIT((char *)symname, len + 1); - struct fscrypt_symlink_data *sd = NULL; + struct fscrypt_str disk_link; int err; - if (f2fs_encrypted_inode(dir)) { - err = fscrypt_get_encryption_info(dir); - if (err) - return err; + if (unlikely(f2fs_cp_error(sbi))) + return -EIO; - if (!fscrypt_has_encryption_key(dir)) - return -ENOKEY; + err = fscrypt_prepare_symlink(dir, symname, len, dir->i_sb->s_blocksize, + &disk_link); + if (err) + return err; - disk_link.len = (fscrypt_fname_encrypted_size(dir, len) + - sizeof(struct fscrypt_symlink_data)); - } - - if (disk_link.len > dir->i_sb->s_blocksize) - return -ENAMETOOLONG; + err = dquot_initialize(dir); + if (err) + return err; inode = f2fs_new_inode(dir, S_IFLNK | S_IRWXUGO); if (IS_ERR(inode)) return PTR_ERR(inode); - if (f2fs_encrypted_inode(inode)) + if (IS_ENCRYPTED(inode)) inode->i_op = &f2fs_encrypted_symlink_inode_operations; else inode->i_op = &f2fs_symlink_inode_operations; inode_nohighmem(inode); inode->i_mapping->a_ops = &f2fs_dblock_aops; - f2fs_balance_fs(sbi, true); - f2fs_lock_op(sbi); err = f2fs_add_link(dentry, inode); if (err) - goto out; + goto out_f2fs_handle_failed_inode; f2fs_unlock_op(sbi); - alloc_nid_done(sbi, inode->i_ino); + f2fs_alloc_nid_done(sbi, inode->i_ino); - if (f2fs_encrypted_inode(inode)) { - struct qstr istr = QSTR_INIT(symname, len); - struct fscrypt_str ostr; - - sd = kzalloc(disk_link.len, GFP_NOFS); - if (!sd) { - err = -ENOMEM; - goto err_out; - } - - err = fscrypt_get_encryption_info(inode); - if (err) - goto err_out; - - if (!fscrypt_has_encryption_key(inode)) { - err = -ENOKEY; - goto err_out; - } - - ostr.name = sd->encrypted_path; - ostr.len = disk_link.len; - err = fscrypt_fname_usr_to_disk(inode, &istr, &ostr); - if (err) - goto err_out; - - sd->len = cpu_to_le16(ostr.len); - disk_link.name = (char *)sd; - } + err = fscrypt_encrypt_symlink(inode, symname, len, &disk_link); + if (err) + goto err_out; err = page_symlink(inode, disk_link.name, disk_link.len); @@ -484,10 +620,14 @@ static int f2fs_symlink(struct inode *dir, struct dentry *dentry, f2fs_unlink(dir, dentry); } - kfree(sd); - return err; -out: - handle_failed_inode(inode); + f2fs_balance_fs(sbi, true); + goto out_free_encrypted_link; + +out_f2fs_handle_failed_inode: + f2fs_handle_failed_inode(inode); +out_free_encrypted_link: + if (disk_link.name != (unsigned char *)symname) + kfree(disk_link.name); return err; } @@ -497,6 +637,13 @@ static int f2fs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) struct inode *inode; int err; + if (unlikely(f2fs_cp_error(sbi))) + return -EIO; + + err = dquot_initialize(dir); + if (err) + return err; + inode = f2fs_new_inode(dir, S_IFDIR | mode); if (IS_ERR(inode)) return PTR_ERR(inode); @@ -504,9 +651,7 @@ static int f2fs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) inode->i_op = &f2fs_dir_inode_operations; inode->i_fop = &f2fs_dir_operations; inode->i_mapping->a_ops = &f2fs_dblock_aops; - mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_HIGH_ZERO); - - f2fs_balance_fs(sbi, true); + inode_nohighmem(inode); set_inode_flag(inode, FI_INC_LINK); f2fs_lock_op(sbi); @@ -515,17 +660,19 @@ static int f2fs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) goto out_fail; f2fs_unlock_op(sbi); - alloc_nid_done(sbi, inode->i_ino); + f2fs_alloc_nid_done(sbi, inode->i_ino); d_instantiate_new(dentry, inode); if (IS_DIRSYNC(dir)) f2fs_sync_fs(sbi->sb, 1); + + f2fs_balance_fs(sbi, true); return 0; out_fail: clear_inode_flag(inode, FI_INC_LINK); - handle_failed_inode(inode); + f2fs_handle_failed_inode(inode); return err; } @@ -544,6 +691,13 @@ static int f2fs_mknod(struct inode *dir, struct dentry *dentry, struct inode *inode; int err = 0; + if (unlikely(f2fs_cp_error(sbi))) + return -EIO; + + err = dquot_initialize(dir); + if (err) + return err; + inode = f2fs_new_inode(dir, mode); if (IS_ERR(inode)) return PTR_ERR(inode); @@ -551,23 +705,23 @@ static int f2fs_mknod(struct inode *dir, struct dentry *dentry, init_special_inode(inode, inode->i_mode, rdev); inode->i_op = &f2fs_special_inode_operations; - f2fs_balance_fs(sbi, true); - f2fs_lock_op(sbi); err = f2fs_add_link(dentry, inode); if (err) goto out; f2fs_unlock_op(sbi); - alloc_nid_done(sbi, inode->i_ino); + f2fs_alloc_nid_done(sbi, inode->i_ino); d_instantiate_new(dentry, inode); if (IS_DIRSYNC(dir)) f2fs_sync_fs(sbi->sb, 1); + + f2fs_balance_fs(sbi, true); return 0; out: - handle_failed_inode(inode); + f2fs_handle_failed_inode(inode); return err; } @@ -578,6 +732,10 @@ static int __f2fs_tmpfile(struct inode *dir, struct dentry *dentry, struct inode *inode; int err; + err = dquot_initialize(dir); + if (err) + return err; + inode = f2fs_new_inode(dir, mode); if (IS_ERR(inode)) return PTR_ERR(inode); @@ -591,10 +749,8 @@ static int __f2fs_tmpfile(struct inode *dir, struct dentry *dentry, inode->i_mapping->a_ops = &f2fs_dblock_aops; } - f2fs_balance_fs(sbi, true); - f2fs_lock_op(sbi); - err = acquire_orphan_inode(sbi); + err = f2fs_acquire_orphan_inode(sbi); if (err) goto out; @@ -606,8 +762,8 @@ static int __f2fs_tmpfile(struct inode *dir, struct dentry *dentry, * add this non-linked tmpfile to orphan list, in this way we could * remove all unused data of tmpfile after abnormal power-off. */ - add_orphan_inode(inode); - alloc_nid_done(sbi, inode->i_ino); + f2fs_add_orphan_inode(inode); + f2fs_alloc_nid_done(sbi, inode->i_ino); if (whiteout) { f2fs_i_links_write(inode, false); @@ -618,18 +774,25 @@ static int __f2fs_tmpfile(struct inode *dir, struct dentry *dentry, /* link_count was changed by d_tmpfile as well. */ f2fs_unlock_op(sbi); unlock_new_inode(inode); + + f2fs_balance_fs(sbi, true); return 0; release_out: - release_orphan_inode(sbi); + f2fs_release_orphan_inode(sbi); out: - handle_failed_inode(inode); + f2fs_handle_failed_inode(inode); return err; } static int f2fs_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode) { - if (f2fs_encrypted_inode(dir)) { + struct f2fs_sb_info *sbi = F2FS_I_SB(dir); + + if (unlikely(f2fs_cp_error(sbi))) + return -EIO; + + if (f2fs_encrypted_inode(dir) || DUMMY_ENCRYPTION_ENABLED(sbi)) { int err = fscrypt_get_encryption_info(dir); if (err) return err; @@ -640,6 +803,9 @@ static int f2fs_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode) static int f2fs_create_whiteout(struct inode *dir, struct inode **whiteout) { + if (unlikely(f2fs_cp_error(F2FS_I_SB(dir)))) + return -EIO; + return __f2fs_tmpfile(dir, NULL, S_IFCHR | WHITEOUT_MODE, whiteout); } @@ -659,16 +825,26 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry, bool is_old_inline = f2fs_has_inline_dentry(old_dir); int err = -ENOENT; - if ((f2fs_encrypted_inode(old_dir) && - !fscrypt_has_encryption_key(old_dir)) || - (f2fs_encrypted_inode(new_dir) && - !fscrypt_has_encryption_key(new_dir))) - return -ENOKEY; + if (unlikely(f2fs_cp_error(sbi))) + return -EIO; - if ((old_dir != new_dir) && f2fs_encrypted_inode(new_dir) && - !fscrypt_has_permitted_context(new_dir, old_inode)) { - err = -EPERM; + if (is_inode_flag_set(new_dir, FI_PROJ_INHERIT) && + (!projid_eq(F2FS_I(new_dir)->i_projid, + F2FS_I(old_dentry->d_inode)->i_projid))) + return -EXDEV; + + err = dquot_initialize(old_dir); + if (err) goto out; + + err = dquot_initialize(new_dir); + if (err) + goto out; + + if (new_inode) { + err = dquot_initialize(new_inode); + if (err) + goto out; } old_entry = f2fs_find_entry(old_dir, &old_dentry->d_name, &old_page); @@ -712,17 +888,10 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry, f2fs_lock_op(sbi); - err = acquire_orphan_inode(sbi); + err = f2fs_acquire_orphan_inode(sbi); if (err) goto put_out_dir; - err = update_dent_inode(old_inode, new_inode, - &new_dentry->d_name); - if (err) { - release_orphan_inode(sbi); - goto put_out_dir; - } - f2fs_set_link(new_dir, new_entry, new_page, old_inode); new_inode->i_ctime = current_time(new_inode); @@ -733,9 +902,9 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry, up_write(&F2FS_I(new_inode)->i_sem); if (!new_inode->i_nlink) - add_orphan_inode(new_inode); + f2fs_add_orphan_inode(new_inode); else - release_orphan_inode(sbi); + f2fs_release_orphan_inode(sbi); } else { f2fs_balance_fs(sbi, true); @@ -774,13 +943,14 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry, } down_write(&F2FS_I(old_inode)->i_sem); - file_lost_pino(old_inode); - if (new_inode && file_enc_name(new_inode)) - file_set_enc_name(old_inode); + if (!old_dir_entry || whiteout) + file_lost_pino(old_inode); + else + F2FS_I(old_inode)->i_pino = new_dir->i_ino; up_write(&F2FS_I(old_inode)->i_sem); old_inode->i_ctime = current_time(old_inode); - f2fs_mark_inode_dirty_sync(old_inode); + f2fs_mark_inode_dirty_sync(old_inode, false); f2fs_delete_entry(old_entry, old_page, old_dir, NULL); @@ -795,15 +965,19 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry, } if (old_dir_entry) { - if (old_dir != new_dir && !whiteout) { + if (old_dir != new_dir && !whiteout) f2fs_set_link(old_inode, old_dir_entry, old_dir_page, new_dir); - } else { - f2fs_dentry_kunmap(old_inode, old_dir_page); + else f2fs_put_page(old_dir_page, 0); - } f2fs_i_links_write(old_dir, false); } + if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT) { + f2fs_add_ino_entry(sbi, new_dir->i_ino, TRANS_DIR_INO); + if (S_ISDIR(old_inode->i_mode)) + f2fs_add_ino_entry(sbi, old_inode->i_ino, + TRANS_DIR_INO); + } f2fs_unlock_op(sbi); @@ -813,20 +987,15 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry, put_out_dir: f2fs_unlock_op(sbi); - if (new_page) { - f2fs_dentry_kunmap(new_dir, new_page); + if (new_page) f2fs_put_page(new_page, 0); - } out_whiteout: if (whiteout) iput(whiteout); out_dir: - if (old_dir_entry) { - f2fs_dentry_kunmap(old_inode, old_dir_page); + if (old_dir_entry) f2fs_put_page(old_dir_page, 0); - } out_old: - f2fs_dentry_kunmap(old_dir, old_page); f2fs_put_page(old_page, 0); out: return err; @@ -845,17 +1014,24 @@ static int f2fs_cross_rename(struct inode *old_dir, struct dentry *old_dentry, int old_nlink = 0, new_nlink = 0; int err = -ENOENT; - if ((f2fs_encrypted_inode(old_dir) && - !fscrypt_has_encryption_key(old_dir)) || - (f2fs_encrypted_inode(new_dir) && - !fscrypt_has_encryption_key(new_dir))) - return -ENOKEY; + if (unlikely(f2fs_cp_error(sbi))) + return -EIO; - if ((f2fs_encrypted_inode(old_dir) || f2fs_encrypted_inode(new_dir)) && - (old_dir != new_dir) && - (!fscrypt_has_permitted_context(new_dir, old_inode) || - !fscrypt_has_permitted_context(old_dir, new_inode))) - return -EPERM; + if ((is_inode_flag_set(new_dir, FI_PROJ_INHERIT) && + !projid_eq(F2FS_I(new_dir)->i_projid, + F2FS_I(old_dentry->d_inode)->i_projid)) || + (is_inode_flag_set(new_dir, FI_PROJ_INHERIT) && + !projid_eq(F2FS_I(old_dir)->i_projid, + F2FS_I(new_dentry->d_inode)->i_projid))) + return -EXDEV; + + err = dquot_initialize(old_dir); + if (err) + goto out; + + err = dquot_initialize(new_dir); + if (err) + goto out; old_entry = f2fs_find_entry(old_dir, &old_dentry->d_name, &old_page); if (!old_entry) { @@ -904,8 +1080,8 @@ static int f2fs_cross_rename(struct inode *old_dir, struct dentry *old_dentry, old_nlink = old_dir_entry ? -1 : 1; new_nlink = -old_nlink; err = -EMLINK; - if ((old_nlink > 0 && old_inode->i_nlink >= F2FS_LINK_MAX) || - (new_nlink > 0 && new_inode->i_nlink >= F2FS_LINK_MAX)) + if ((old_nlink > 0 && old_dir->i_nlink >= F2FS_LINK_MAX) || + (new_nlink > 0 && new_dir->i_nlink >= F2FS_LINK_MAX)) goto out_new_dir; } @@ -913,18 +1089,6 @@ static int f2fs_cross_rename(struct inode *old_dir, struct dentry *old_dentry, f2fs_lock_op(sbi); - err = update_dent_inode(old_inode, new_inode, &new_dentry->d_name); - if (err) - goto out_unlock; - if (file_enc_name(new_inode)) - file_set_enc_name(old_inode); - - err = update_dent_inode(new_inode, old_inode, &old_dentry->d_name); - if (err) - goto out_undo; - if (file_enc_name(old_inode)) - file_set_enc_name(new_inode); - /* update ".." directory entry info of old dentry */ if (old_dir_entry) f2fs_set_link(old_inode, old_dir_entry, old_dir_page, new_dir); @@ -946,7 +1110,7 @@ static int f2fs_cross_rename(struct inode *old_dir, struct dentry *old_dentry, f2fs_i_links_write(old_dir, old_nlink > 0); up_write(&F2FS_I(old_dir)->i_sem); } - f2fs_mark_inode_dirty_sync(old_dir); + f2fs_mark_inode_dirty_sync(old_dir, false); /* update directory entry info of new dir inode */ f2fs_set_link(new_dir, new_entry, new_page, old_inode); @@ -961,36 +1125,29 @@ static int f2fs_cross_rename(struct inode *old_dir, struct dentry *old_dentry, f2fs_i_links_write(new_dir, new_nlink > 0); up_write(&F2FS_I(new_dir)->i_sem); } - f2fs_mark_inode_dirty_sync(new_dir); + f2fs_mark_inode_dirty_sync(new_dir, false); + + if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT) { + f2fs_add_ino_entry(sbi, old_dir->i_ino, TRANS_DIR_INO); + f2fs_add_ino_entry(sbi, new_dir->i_ino, TRANS_DIR_INO); + } f2fs_unlock_op(sbi); if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir)) f2fs_sync_fs(sbi->sb, 1); return 0; -out_undo: - /* - * Still we may fail to recover name info of f2fs_inode here - * Drop it, once its name is set as encrypted - */ - update_dent_inode(old_inode, old_inode, &old_dentry->d_name); -out_unlock: - f2fs_unlock_op(sbi); out_new_dir: if (new_dir_entry) { - f2fs_dentry_kunmap(new_inode, new_dir_page); f2fs_put_page(new_dir_page, 0); } out_old_dir: if (old_dir_entry) { - f2fs_dentry_kunmap(old_inode, old_dir_page); f2fs_put_page(old_dir_page, 0); } out_new: - f2fs_dentry_kunmap(new_dir, new_page); f2fs_put_page(new_page, 0); out_old: - f2fs_dentry_kunmap(old_dir, old_page); f2fs_put_page(old_page, 0); out: return err; @@ -1000,9 +1157,16 @@ static int f2fs_rename2(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry, unsigned int flags) { + int err; + if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT)) return -EINVAL; + err = fscrypt_prepare_rename(old_dir, old_dentry, new_dir, new_dentry, + flags); + if (err) + return err; + if (flags & RENAME_EXCHANGE) { return f2fs_cross_rename(old_dir, old_dentry, new_dir, new_dentry); @@ -1018,68 +1182,20 @@ static const char *f2fs_encrypted_get_link(struct dentry *dentry, struct inode *inode, struct delayed_call *done) { - struct page *cpage = NULL; - char *caddr, *paddr = NULL; - struct fscrypt_str cstr = FSTR_INIT(NULL, 0); - struct fscrypt_str pstr = FSTR_INIT(NULL, 0); - struct fscrypt_symlink_data *sd; - u32 max_size = inode->i_sb->s_blocksize; - int res; + struct page *page; + const char *target; if (!dentry) return ERR_PTR(-ECHILD); - res = fscrypt_get_encryption_info(inode); - if (res) - return ERR_PTR(res); + page = read_mapping_page(inode->i_mapping, 0, NULL); + if (IS_ERR(page)) + return ERR_CAST(page); - cpage = read_mapping_page(inode->i_mapping, 0, NULL); - if (IS_ERR(cpage)) - return ERR_CAST(cpage); - caddr = page_address(cpage); - - /* Symlink is encrypted */ - sd = (struct fscrypt_symlink_data *)caddr; - cstr.name = sd->encrypted_path; - cstr.len = le16_to_cpu(sd->len); - - /* this is broken symlink case */ - if (unlikely(cstr.len == 0)) { - res = -ENOENT; - goto errout; - } - - if ((cstr.len + sizeof(struct fscrypt_symlink_data) - 1) > max_size) { - /* Symlink data on the disk is corrupted */ - res = -EIO; - goto errout; - } - res = fscrypt_fname_alloc_buffer(inode, cstr.len, &pstr); - if (res) - goto errout; - - res = fscrypt_fname_disk_to_usr(inode, 0, 0, &cstr, &pstr); - if (res) - goto errout; - - /* this is broken symlink case */ - if (unlikely(pstr.name[0] == 0)) { - res = -ENOENT; - goto errout; - } - - paddr = pstr.name; - - /* Null-terminate the name */ - paddr[pstr.len] = '\0'; - - put_page(cpage); - set_delayed_call(done, kfree_link, paddr); - return paddr; -errout: - fscrypt_fname_free_buffer(&pstr); - put_page(cpage); - return ERR_PTR(res); + target = fscrypt_get_symlink(inode, page_address(page), + inode->i_sb->s_blocksize, done); + put_page(page); + return target; } const struct inode_operations f2fs_encrypted_symlink_inode_operations = {
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index addff6a3..f213a53 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c
@@ -19,16 +19,33 @@ #include "f2fs.h" #include "node.h" #include "segment.h" +#include "xattr.h" #include "trace.h" #include <trace/events/f2fs.h> -#define on_build_free_nids(nmi) mutex_is_locked(&nm_i->build_lock) +#define on_f2fs_build_free_nids(nmi) mutex_is_locked(&(nm_i)->build_lock) static struct kmem_cache *nat_entry_slab; static struct kmem_cache *free_nid_slab; static struct kmem_cache *nat_entry_set_slab; +static struct kmem_cache *fsync_node_entry_slab; -bool available_free_memory(struct f2fs_sb_info *sbi, int type) +/* + * Check whether the given nid is within node id range. + */ +int f2fs_check_nid_range(struct f2fs_sb_info *sbi, nid_t nid) +{ + if (unlikely(nid < F2FS_ROOT_INO(sbi) || nid >= NM_I(sbi)->max_nid)) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: out-of-range nid=%x, run fsck to fix.", + __func__, nid); + return -EINVAL; + } + return 0; +} + +bool f2fs_available_free_memory(struct f2fs_sb_info *sbi, int type) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct sysinfo val; @@ -45,8 +62,8 @@ bool available_free_memory(struct f2fs_sb_info *sbi, int type) * give 25%, 25%, 50%, 50%, 50% memory for each components respectively */ if (type == FREE_NIDS) { - mem_size = (nm_i->fcnt * sizeof(struct free_nid)) >> - PAGE_SHIFT; + mem_size = (nm_i->nid_cnt[FREE_NID] * + sizeof(struct free_nid)) >> PAGE_SHIFT; res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 2); } else if (type == NAT_ENTRIES) { mem_size = (nm_i->nat_cnt * sizeof(struct nat_entry)) >> @@ -62,9 +79,10 @@ bool available_free_memory(struct f2fs_sb_info *sbi, int type) } else if (type == INO_ENTRIES) { int i; - for (i = 0; i <= UPDATE_INO; i++) - mem_size += (sbi->im[i].ino_num * - sizeof(struct ino_entry)) >> PAGE_SHIFT; + for (i = 0; i < MAX_INO_ENTRY; i++) + mem_size += sbi->im[i].ino_num * + sizeof(struct ino_entry); + mem_size >>= PAGE_SHIFT; res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 1); } else if (type == EXTENT_CACHE) { mem_size = (atomic_read(&sbi->total_ext_tree) * @@ -72,6 +90,10 @@ bool available_free_memory(struct f2fs_sb_info *sbi, int type) atomic_read(&sbi->total_ext_node) * sizeof(struct extent_node)) >> PAGE_SHIFT; res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 1); + } else if (type == INMEM_PAGES) { + /* it allows 20% / total_ram for inmemory pages */ + mem_size = get_pages(sbi, F2FS_INMEM_PAGES); + res = mem_size < (val.totalram / 5); } else { if (!sbi->sb->s_bdi->wb.dirty_exceeded) return true; @@ -81,44 +103,33 @@ bool available_free_memory(struct f2fs_sb_info *sbi, int type) static void clear_node_page_dirty(struct page *page) { - struct address_space *mapping = page->mapping; - unsigned int long flags; - if (PageDirty(page)) { - spin_lock_irqsave(&mapping->tree_lock, flags); - radix_tree_tag_clear(&mapping->page_tree, - page_index(page), - PAGECACHE_TAG_DIRTY); - spin_unlock_irqrestore(&mapping->tree_lock, flags); - + f2fs_clear_radix_tree_dirty_tag(page); clear_page_dirty_for_io(page); - dec_page_count(F2FS_M_SB(mapping), F2FS_DIRTY_NODES); + dec_page_count(F2FS_P_SB(page), F2FS_DIRTY_NODES); } ClearPageUptodate(page); } static struct page *get_current_nat_page(struct f2fs_sb_info *sbi, nid_t nid) { - pgoff_t index = current_nat_addr(sbi, nid); - return get_meta_page(sbi, index); + return f2fs_get_meta_page_nofail(sbi, current_nat_addr(sbi, nid)); } static struct page *get_next_nat_page(struct f2fs_sb_info *sbi, nid_t nid) { struct page *src_page; struct page *dst_page; - pgoff_t src_off; pgoff_t dst_off; void *src_addr; void *dst_addr; struct f2fs_nm_info *nm_i = NM_I(sbi); - src_off = current_nat_addr(sbi, nid); - dst_off = next_nat_addr(sbi, src_off); + dst_off = next_nat_addr(sbi, current_nat_addr(sbi, nid)); /* get current nat block page with lock */ - src_page = get_meta_page(sbi, src_off); - dst_page = grab_meta_page(sbi, dst_off); + src_page = get_current_nat_page(sbi, nid); + dst_page = f2fs_grab_meta_page(sbi, dst_off); f2fs_bug_on(sbi, PageDirty(src_page)); src_addr = page_address(src_page); @@ -132,9 +143,61 @@ static struct page *get_next_nat_page(struct f2fs_sb_info *sbi, nid_t nid) return dst_page; } +static struct nat_entry *__alloc_nat_entry(nid_t nid, bool no_fail) +{ + struct nat_entry *new; + + if (no_fail) + new = f2fs_kmem_cache_alloc(nat_entry_slab, GFP_F2FS_ZERO); + else + new = kmem_cache_alloc(nat_entry_slab, GFP_F2FS_ZERO); + if (new) { + nat_set_nid(new, nid); + nat_reset_flag(new); + } + return new; +} + +static void __free_nat_entry(struct nat_entry *e) +{ + kmem_cache_free(nat_entry_slab, e); +} + +/* must be locked by nat_tree_lock */ +static struct nat_entry *__init_nat_entry(struct f2fs_nm_info *nm_i, + struct nat_entry *ne, struct f2fs_nat_entry *raw_ne, bool no_fail) +{ + if (no_fail) + f2fs_radix_tree_insert(&nm_i->nat_root, nat_get_nid(ne), ne); + else if (radix_tree_insert(&nm_i->nat_root, nat_get_nid(ne), ne)) + return NULL; + + if (raw_ne) + node_info_from_raw_nat(&ne->ni, raw_ne); + + spin_lock(&nm_i->nat_list_lock); + list_add_tail(&ne->list, &nm_i->nat_entries); + spin_unlock(&nm_i->nat_list_lock); + + nm_i->nat_cnt++; + return ne; +} + static struct nat_entry *__lookup_nat_cache(struct f2fs_nm_info *nm_i, nid_t n) { - return radix_tree_lookup(&nm_i->nat_root, n); + struct nat_entry *ne; + + ne = radix_tree_lookup(&nm_i->nat_root, n); + + /* for recent accessed nat entry, move it to tail of lru list */ + if (ne && !get_nat_flag(ne, IS_DIRTY)) { + spin_lock(&nm_i->nat_list_lock); + if (!list_empty(&ne->list)) + list_move_tail(&ne->list, &nm_i->nat_entries); + spin_unlock(&nm_i->nat_list_lock); + } + + return ne; } static unsigned int __gang_lookup_nat_cache(struct f2fs_nm_info *nm_i, @@ -145,21 +208,17 @@ static unsigned int __gang_lookup_nat_cache(struct f2fs_nm_info *nm_i, static void __del_from_nat_cache(struct f2fs_nm_info *nm_i, struct nat_entry *e) { - list_del(&e->list); radix_tree_delete(&nm_i->nat_root, nat_get_nid(e)); nm_i->nat_cnt--; - kmem_cache_free(nat_entry_slab, e); + __free_nat_entry(e); } -static void __set_nat_cache_dirty(struct f2fs_nm_info *nm_i, - struct nat_entry *ne) +static struct nat_entry_set *__grab_nat_entry_set(struct f2fs_nm_info *nm_i, + struct nat_entry *ne) { nid_t set = NAT_BLOCK_OFFSET(ne->ni.nid); struct nat_entry_set *head; - if (get_nat_flag(ne, IS_DIRTY)) - return; - head = radix_tree_lookup(&nm_i->nat_set_root, set); if (!head) { head = f2fs_kmem_cache_alloc(nat_entry_set_slab, GFP_NOFS); @@ -170,25 +229,53 @@ static void __set_nat_cache_dirty(struct f2fs_nm_info *nm_i, head->entry_cnt = 0; f2fs_radix_tree_insert(&nm_i->nat_set_root, set, head); } - list_move_tail(&ne->list, &head->entry_list); + return head; +} + +static void __set_nat_cache_dirty(struct f2fs_nm_info *nm_i, + struct nat_entry *ne) +{ + struct nat_entry_set *head; + bool new_ne = nat_get_blkaddr(ne) == NEW_ADDR; + + if (!new_ne) + head = __grab_nat_entry_set(nm_i, ne); + + /* + * update entry_cnt in below condition: + * 1. update NEW_ADDR to valid block address; + * 2. update old block address to new one; + */ + if (!new_ne && (get_nat_flag(ne, IS_PREALLOC) || + !get_nat_flag(ne, IS_DIRTY))) + head->entry_cnt++; + + set_nat_flag(ne, IS_PREALLOC, new_ne); + + if (get_nat_flag(ne, IS_DIRTY)) + goto refresh_list; + nm_i->dirty_nat_cnt++; - head->entry_cnt++; set_nat_flag(ne, IS_DIRTY, true); +refresh_list: + spin_lock(&nm_i->nat_list_lock); + if (new_ne) + list_del_init(&ne->list); + else + list_move_tail(&ne->list, &head->entry_list); + spin_unlock(&nm_i->nat_list_lock); } static void __clear_nat_cache_dirty(struct f2fs_nm_info *nm_i, - struct nat_entry *ne) + struct nat_entry_set *set, struct nat_entry *ne) { - nid_t set = NAT_BLOCK_OFFSET(ne->ni.nid); - struct nat_entry_set *head; + spin_lock(&nm_i->nat_list_lock); + list_move_tail(&ne->list, &nm_i->nat_entries); + spin_unlock(&nm_i->nat_list_lock); - head = radix_tree_lookup(&nm_i->nat_set_root, set); - if (head) { - list_move_tail(&ne->list, &nm_i->nat_entries); - set_nat_flag(ne, IS_DIRTY, false); - head->entry_cnt--; - nm_i->dirty_nat_cnt--; - } + set_nat_flag(ne, IS_DIRTY, false); + set->entry_cnt--; + nm_i->dirty_nat_cnt--; } static unsigned int __gang_lookup_nat_set(struct f2fs_nm_info *nm_i, @@ -198,7 +285,73 @@ static unsigned int __gang_lookup_nat_set(struct f2fs_nm_info *nm_i, start, nr); } -int need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid) +bool f2fs_in_warm_node_list(struct f2fs_sb_info *sbi, struct page *page) +{ + return NODE_MAPPING(sbi) == page->mapping && + IS_DNODE(page) && is_cold_node(page); +} + +void f2fs_init_fsync_node_info(struct f2fs_sb_info *sbi) +{ + spin_lock_init(&sbi->fsync_node_lock); + INIT_LIST_HEAD(&sbi->fsync_node_list); + sbi->fsync_seg_id = 0; + sbi->fsync_node_num = 0; +} + +static unsigned int f2fs_add_fsync_node_entry(struct f2fs_sb_info *sbi, + struct page *page) +{ + struct fsync_node_entry *fn; + unsigned long flags; + unsigned int seq_id; + + fn = f2fs_kmem_cache_alloc(fsync_node_entry_slab, GFP_NOFS); + + get_page(page); + fn->page = page; + INIT_LIST_HEAD(&fn->list); + + spin_lock_irqsave(&sbi->fsync_node_lock, flags); + list_add_tail(&fn->list, &sbi->fsync_node_list); + fn->seq_id = sbi->fsync_seg_id++; + seq_id = fn->seq_id; + sbi->fsync_node_num++; + spin_unlock_irqrestore(&sbi->fsync_node_lock, flags); + + return seq_id; +} + +void f2fs_del_fsync_node_entry(struct f2fs_sb_info *sbi, struct page *page) +{ + struct fsync_node_entry *fn; + unsigned long flags; + + spin_lock_irqsave(&sbi->fsync_node_lock, flags); + list_for_each_entry(fn, &sbi->fsync_node_list, list) { + if (fn->page == page) { + list_del(&fn->list); + sbi->fsync_node_num--; + spin_unlock_irqrestore(&sbi->fsync_node_lock, flags); + kmem_cache_free(fsync_node_entry_slab, fn); + put_page(page); + return; + } + } + spin_unlock_irqrestore(&sbi->fsync_node_lock, flags); + f2fs_bug_on(sbi, 1); +} + +void f2fs_reset_fsync_node_info(struct f2fs_sb_info *sbi) +{ + unsigned long flags; + + spin_lock_irqsave(&sbi->fsync_node_lock, flags); + sbi->fsync_seg_id = 0; + spin_unlock_irqrestore(&sbi->fsync_node_lock, flags); +} + +int f2fs_need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct nat_entry *e; @@ -215,7 +368,7 @@ int need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid) return need; } -bool is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid) +bool f2fs_is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct nat_entry *e; @@ -229,7 +382,7 @@ bool is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid) return is_cp; } -bool need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino) +bool f2fs_need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct nat_entry *e; @@ -245,35 +398,29 @@ bool need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino) return need_update; } -static struct nat_entry *grab_nat_entry(struct f2fs_nm_info *nm_i, nid_t nid) -{ - struct nat_entry *new; - - new = f2fs_kmem_cache_alloc(nat_entry_slab, GFP_NOFS); - f2fs_radix_tree_insert(&nm_i->nat_root, nid, new); - memset(new, 0, sizeof(struct nat_entry)); - nat_set_nid(new, nid); - nat_reset_flag(new); - list_add_tail(&new->list, &nm_i->nat_entries); - nm_i->nat_cnt++; - return new; -} - +/* must be locked by nat_tree_lock */ static void cache_nat_entry(struct f2fs_sb_info *sbi, nid_t nid, struct f2fs_nat_entry *ne) { struct f2fs_nm_info *nm_i = NM_I(sbi); - struct nat_entry *e; + struct nat_entry *new, *e; + new = __alloc_nat_entry(nid, false); + if (!new) + return; + + down_write(&nm_i->nat_tree_lock); e = __lookup_nat_cache(nm_i, nid); - if (!e) { - e = grab_nat_entry(nm_i, nid); - node_info_from_raw_nat(&e->ni, ne); - } else { - f2fs_bug_on(sbi, nat_get_ino(e) != ne->ino || - nat_get_blkaddr(e) != ne->block_addr || + if (!e) + e = __init_nat_entry(nm_i, new, ne, false); + else + f2fs_bug_on(sbi, nat_get_ino(e) != le32_to_cpu(ne->ino) || + nat_get_blkaddr(e) != + le32_to_cpu(ne->block_addr) || nat_get_version(e) != ne->version); - } + up_write(&nm_i->nat_tree_lock); + if (e != new) + __free_nat_entry(new); } static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni, @@ -281,11 +428,12 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni, { struct f2fs_nm_info *nm_i = NM_I(sbi); struct nat_entry *e; + struct nat_entry *new = __alloc_nat_entry(ni->nid, true); down_write(&nm_i->nat_tree_lock); e = __lookup_nat_cache(nm_i, ni->nid); if (!e) { - e = grab_nat_entry(nm_i, ni->nid); + e = __init_nat_entry(nm_i, new, NULL, true); copy_node_info(&e->ni, ni); f2fs_bug_on(sbi, ni->blk_addr == NEW_ADDR); } else if (new_blkaddr == NEW_ADDR) { @@ -297,6 +445,9 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni, copy_node_info(&e->ni, ni); f2fs_bug_on(sbi, ni->blk_addr != NULL_ADDR); } + /* let's free early to reduce memory consumption */ + if (e != new) + __free_nat_entry(new); /* sanity check */ f2fs_bug_on(sbi, nat_get_blkaddr(e) != ni->blk_addr); @@ -304,23 +455,18 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni, new_blkaddr == NULL_ADDR); f2fs_bug_on(sbi, nat_get_blkaddr(e) == NEW_ADDR && new_blkaddr == NEW_ADDR); - f2fs_bug_on(sbi, nat_get_blkaddr(e) != NEW_ADDR && - nat_get_blkaddr(e) != NULL_ADDR && + f2fs_bug_on(sbi, is_valid_data_blkaddr(sbi, nat_get_blkaddr(e)) && new_blkaddr == NEW_ADDR); /* increment version no as node is removed */ if (nat_get_blkaddr(e) != NEW_ADDR && new_blkaddr == NULL_ADDR) { unsigned char version = nat_get_version(e); nat_set_version(e, inc_node_version(version)); - - /* in order to reuse the nid */ - if (nm_i->next_scan_nid > ni->nid) - nm_i->next_scan_nid = ni->nid; } /* change address */ nat_set_blkaddr(e, new_blkaddr); - if (new_blkaddr == NEW_ADDR || new_blkaddr == NULL_ADDR) + if (!is_valid_data_blkaddr(sbi, new_blkaddr)) set_nat_flag(e, IS_CHECKPOINTED, false); __set_nat_cache_dirty(nm_i, e); @@ -335,7 +481,7 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni, up_write(&nm_i->nat_tree_lock); } -int try_to_free_nats(struct f2fs_sb_info *sbi, int nr_shrink) +int f2fs_try_to_free_nats(struct f2fs_sb_info *sbi, int nr_shrink) { struct f2fs_nm_info *nm_i = NM_I(sbi); int nr = nr_shrink; @@ -343,13 +489,25 @@ int try_to_free_nats(struct f2fs_sb_info *sbi, int nr_shrink) if (!down_write_trylock(&nm_i->nat_tree_lock)) return 0; - while (nr_shrink && !list_empty(&nm_i->nat_entries)) { + spin_lock(&nm_i->nat_list_lock); + while (nr_shrink) { struct nat_entry *ne; + + if (list_empty(&nm_i->nat_entries)) + break; + ne = list_first_entry(&nm_i->nat_entries, struct nat_entry, list); + list_del(&ne->list); + spin_unlock(&nm_i->nat_list_lock); + __del_from_nat_cache(nm_i, ne); nr_shrink--; + + spin_lock(&nm_i->nat_list_lock); } + spin_unlock(&nm_i->nat_list_lock); + up_write(&nm_i->nat_tree_lock); return nr - nr_shrink; } @@ -357,7 +515,8 @@ int try_to_free_nats(struct f2fs_sb_info *sbi, int nr_shrink) /* * This function always returns success */ -void get_node_info(struct f2fs_sb_info *sbi, nid_t nid, struct node_info *ni) +int f2fs_get_node_info(struct f2fs_sb_info *sbi, nid_t nid, + struct node_info *ni) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA); @@ -367,6 +526,7 @@ void get_node_info(struct f2fs_sb_info *sbi, nid_t nid, struct node_info *ni) struct page *page = NULL; struct f2fs_nat_entry ne; struct nat_entry *e; + pgoff_t index; int i; ni->nid = nid; @@ -379,40 +539,46 @@ void get_node_info(struct f2fs_sb_info *sbi, nid_t nid, struct node_info *ni) ni->blk_addr = nat_get_blkaddr(e); ni->version = nat_get_version(e); up_read(&nm_i->nat_tree_lock); - return; + return 0; } memset(&ne, 0, sizeof(struct f2fs_nat_entry)); /* Check current segment summary */ down_read(&curseg->journal_rwsem); - i = lookup_journal_in_cursum(journal, NAT_JOURNAL, nid, 0); + i = f2fs_lookup_journal_in_cursum(journal, NAT_JOURNAL, nid, 0); if (i >= 0) { ne = nat_in_journal(journal, i); node_info_from_raw_nat(ni, &ne); } up_read(&curseg->journal_rwsem); - if (i >= 0) + if (i >= 0) { + up_read(&nm_i->nat_tree_lock); goto cache; + } /* Fill node_info from nat page */ - page = get_current_nat_page(sbi, start_nid); + index = current_nat_addr(sbi, nid); + up_read(&nm_i->nat_tree_lock); + + page = f2fs_get_meta_page(sbi, index); + if (IS_ERR(page)) + return PTR_ERR(page); + nat_blk = (struct f2fs_nat_block *)page_address(page); ne = nat_blk->entries[nid - start_nid]; node_info_from_raw_nat(ni, &ne); f2fs_put_page(page, 1); cache: - up_read(&nm_i->nat_tree_lock); /* cache nat entry */ - down_write(&nm_i->nat_tree_lock); cache_nat_entry(sbi, nid, &ne); - up_write(&nm_i->nat_tree_lock); + return 0; } /* * readahead MAX_RA_NODE number of node pages. */ -static void ra_node_pages(struct page *parent, int start, int n) +static void f2fs_ra_node_pages(struct page *parent, int start, int n) { struct f2fs_sb_info *sbi = F2FS_P_SB(parent); struct blk_plug plug; @@ -426,13 +592,13 @@ static void ra_node_pages(struct page *parent, int start, int n) end = min(end, NIDS_PER_BLOCK); for (i = start; i < end; i++) { nid = get_nid(parent, i, false); - ra_node_page(sbi, nid); + f2fs_ra_node_page(sbi, nid); } blk_finish_plug(&plug); } -pgoff_t get_next_page_offset(struct dnode_of_data *dn, pgoff_t pgofs) +pgoff_t f2fs_get_next_page_offset(struct dnode_of_data *dn, pgoff_t pgofs) { const long direct_index = ADDRS_PER_INODE(dn->inode); const long direct_blks = ADDRS_PER_BLOCK; @@ -535,7 +701,7 @@ static int get_node_path(struct inode *inode, long block, level = 3; goto got; } else { - BUG(); + return -E2BIG; } got: return level; @@ -547,7 +713,7 @@ static int get_node_path(struct inode *inode, long block, * f2fs_unlock_op() only if ro is not set RDONLY_NODE. * In the case of RDONLY_NODE, we don't need to care about mutex. */ -int get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode) +int f2fs_get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode) { struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode); struct page *npage[4]; @@ -559,12 +725,14 @@ int get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode) int err = 0; level = get_node_path(dn->inode, index, offset, noffset); + if (level < 0) + return level; nids[0] = dn->inode->i_ino; npage[0] = dn->inode_page; if (!npage[0]) { - npage[0] = get_node_page(sbi, nids[0]); + npage[0] = f2fs_get_node_page(sbi, nids[0]); if (IS_ERR(npage[0])) return PTR_ERR(npage[0]); } @@ -588,24 +756,24 @@ int get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode) if (!nids[i] && mode == ALLOC_NODE) { /* alloc new node */ - if (!alloc_nid(sbi, &(nids[i]))) { + if (!f2fs_alloc_nid(sbi, &(nids[i]))) { err = -ENOSPC; goto release_pages; } dn->nid = nids[i]; - npage[i] = new_node_page(dn, noffset[i], NULL); + npage[i] = f2fs_new_node_page(dn, noffset[i]); if (IS_ERR(npage[i])) { - alloc_nid_failed(sbi, nids[i]); + f2fs_alloc_nid_failed(sbi, nids[i]); err = PTR_ERR(npage[i]); goto release_pages; } set_nid(parent, offset[i - 1], nids[i], i == 1); - alloc_nid_done(sbi, nids[i]); + f2fs_alloc_nid_done(sbi, nids[i]); done = true; } else if (mode == LOOKUP_NODE_RA && i == level && level > 1) { - npage[i] = get_node_page_ra(parent, offset[i - 1]); + npage[i] = f2fs_get_node_page_ra(parent, offset[i - 1]); if (IS_ERR(npage[i])) { err = PTR_ERR(npage[i]); goto release_pages; @@ -620,7 +788,7 @@ int get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode) } if (!done) { - npage[i] = get_node_page(sbi, nids[i]); + npage[i] = f2fs_get_node_page(sbi, nids[i]); if (IS_ERR(npage[i])) { err = PTR_ERR(npage[i]); f2fs_put_page(npage[0], 0); @@ -635,7 +803,8 @@ int get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode) dn->nid = nids[level]; dn->ofs_in_node = offset[level]; dn->node_page = npage[level]; - dn->data_blkaddr = datablock_addr(dn->node_page, dn->ofs_in_node); + dn->data_blkaddr = datablock_addr(dn->inode, + dn->node_page, dn->ofs_in_node); return 0; release_pages: @@ -653,29 +822,27 @@ int get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode) return err; } -static void truncate_node(struct dnode_of_data *dn) +static int truncate_node(struct dnode_of_data *dn) { struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode); struct node_info ni; + int err; - get_node_info(sbi, dn->nid, &ni); - if (dn->inode->i_blocks == 0) { - f2fs_bug_on(sbi, ni.blk_addr != NULL_ADDR); - goto invalidate; - } - f2fs_bug_on(sbi, ni.blk_addr == NULL_ADDR); + err = f2fs_get_node_info(sbi, dn->nid, &ni); + if (err) + return err; /* Deallocate node address */ - invalidate_blocks(sbi, ni.blk_addr); - dec_valid_node_count(sbi, dn->inode); + f2fs_invalidate_blocks(sbi, ni.blk_addr); + dec_valid_node_count(sbi, dn->inode, dn->nid == dn->inode->i_ino); set_node_addr(sbi, &ni, NULL_ADDR, false); if (dn->nid == dn->inode->i_ino) { - remove_orphan_inode(sbi, dn->nid); + f2fs_remove_orphan_inode(sbi, dn->nid); dec_valid_inode_count(sbi); f2fs_inode_synced(dn->inode); } -invalidate: + clear_node_page_dirty(dn->node_page); set_sbi_flag(sbi, SBI_IS_DIRTY); @@ -686,17 +853,20 @@ static void truncate_node(struct dnode_of_data *dn) dn->node_page = NULL; trace_f2fs_truncate_node(dn->inode, dn->nid, ni.blk_addr); + + return 0; } static int truncate_dnode(struct dnode_of_data *dn) { struct page *page; + int err; if (dn->nid == 0) return 1; /* get direct node */ - page = get_node_page(F2FS_I_SB(dn->inode), dn->nid); + page = f2fs_get_node_page(F2FS_I_SB(dn->inode), dn->nid); if (IS_ERR(page) && PTR_ERR(page) == -ENOENT) return 1; else if (IS_ERR(page)) @@ -705,8 +875,11 @@ static int truncate_dnode(struct dnode_of_data *dn) /* Make dnode_of_data for parameter */ dn->node_page = page; dn->ofs_in_node = 0; - truncate_data_blocks(dn); - truncate_node(dn); + f2fs_truncate_data_blocks(dn); + err = truncate_node(dn); + if (err) + return err; + return 1; } @@ -726,13 +899,13 @@ static int truncate_nodes(struct dnode_of_data *dn, unsigned int nofs, trace_f2fs_truncate_nodes_enter(dn->inode, dn->nid, dn->data_blkaddr); - page = get_node_page(F2FS_I_SB(dn->inode), dn->nid); + page = f2fs_get_node_page(F2FS_I_SB(dn->inode), dn->nid); if (IS_ERR(page)) { trace_f2fs_truncate_nodes_exit(dn->inode, PTR_ERR(page)); return PTR_ERR(page); } - ra_node_pages(page, ofs, NIDS_PER_BLOCK); + f2fs_ra_node_pages(page, ofs, NIDS_PER_BLOCK); rn = F2FS_NODE(page); if (depth < 3) { @@ -771,7 +944,9 @@ static int truncate_nodes(struct dnode_of_data *dn, unsigned int nofs, if (!ofs) { /* remove current indirect node */ dn->node_page = page; - truncate_node(dn); + ret = truncate_node(dn); + if (ret) + goto out_err; freed++; } else { f2fs_put_page(page, 1); @@ -802,7 +977,7 @@ static int truncate_partial_nodes(struct dnode_of_data *dn, /* get indirect nodes in the path */ for (i = 0; i < idx + 1; i++) { /* reference count'll be increased */ - pages[i] = get_node_page(F2FS_I_SB(dn->inode), nid[i]); + pages[i] = f2fs_get_node_page(F2FS_I_SB(dn->inode), nid[i]); if (IS_ERR(pages[i])) { err = PTR_ERR(pages[i]); idx = i - 1; @@ -811,7 +986,7 @@ static int truncate_partial_nodes(struct dnode_of_data *dn, nid[i + 1] = get_nid(pages[i], offset[i + 1], false); } - ra_node_pages(pages[idx], offset[idx + 1], NIDS_PER_BLOCK); + f2fs_ra_node_pages(pages[idx], offset[idx + 1], NIDS_PER_BLOCK); /* free direct nodes linked to a partial indirect node */ for (i = offset[idx + 1]; i < NIDS_PER_BLOCK; i++) { @@ -829,7 +1004,9 @@ static int truncate_partial_nodes(struct dnode_of_data *dn, if (offset[idx + 1] == 0) { dn->node_page = pages[idx]; dn->nid = nid[idx]; - truncate_node(dn); + err = truncate_node(dn); + if (err) + goto fail; } else { f2fs_put_page(pages[idx], 1); } @@ -848,7 +1025,7 @@ static int truncate_partial_nodes(struct dnode_of_data *dn, /* * All the block addresses of data and nodes should be nullified. */ -int truncate_inode_blocks(struct inode *inode, pgoff_t from) +int f2fs_truncate_inode_blocks(struct inode *inode, pgoff_t from) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); int err = 0, cont = 1; @@ -861,8 +1038,10 @@ int truncate_inode_blocks(struct inode *inode, pgoff_t from) trace_f2fs_truncate_inode_blocks_enter(inode, from); level = get_node_path(inode, from, offset, noffset); + if (level < 0) + return level; - page = get_node_page(sbi, inode->i_ino); + page = f2fs_get_node_page(sbi, inode->i_ino); if (IS_ERR(page)) { trace_f2fs_truncate_inode_blocks_exit(inode, PTR_ERR(page)); return PTR_ERR(page); @@ -941,30 +1120,31 @@ int truncate_inode_blocks(struct inode *inode, pgoff_t from) return err > 0 ? 0 : err; } -int truncate_xattr_node(struct inode *inode, struct page *page) +/* caller must lock inode page */ +int f2fs_truncate_xattr_node(struct inode *inode) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); nid_t nid = F2FS_I(inode)->i_xattr_nid; struct dnode_of_data dn; struct page *npage; + int err; if (!nid) return 0; - npage = get_node_page(sbi, nid); + npage = f2fs_get_node_page(sbi, nid); if (IS_ERR(npage)) return PTR_ERR(npage); + set_new_dnode(&dn, inode, NULL, npage, nid); + err = truncate_node(&dn); + if (err) { + f2fs_put_page(npage, 1); + return err; + } + f2fs_i_xnid_write(inode, 0); - /* need to do checkpoint during fsync */ - F2FS_I(inode)->xattr_ver = cur_cp_version(F2FS_CKPT(sbi)); - - set_new_dnode(&dn, inode, page, npage, nid); - - if (page) - dn.inode_page_locked = true; - truncate_node(&dn); return 0; } @@ -972,17 +1152,17 @@ int truncate_xattr_node(struct inode *inode, struct page *page) * Caller should grab and release a rwsem by calling f2fs_lock_op() and * f2fs_unlock_op(). */ -int remove_inode_page(struct inode *inode) +int f2fs_remove_inode_page(struct inode *inode) { struct dnode_of_data dn; int err; set_new_dnode(&dn, inode, NULL, NULL, inode->i_ino); - err = get_dnode_of_data(&dn, 0, LOOKUP_NODE); + err = f2fs_get_dnode_of_data(&dn, 0, LOOKUP_NODE); if (err) return err; - err = truncate_xattr_node(inode, dn.inode_page); + err = f2fs_truncate_xattr_node(inode); if (err) { f2fs_put_dnode(&dn); return err; @@ -991,18 +1171,26 @@ int remove_inode_page(struct inode *inode) /* remove potential inline_data blocks */ if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode)) - truncate_data_blocks_range(&dn, 1); + f2fs_truncate_data_blocks_range(&dn, 1); /* 0 is possible, after f2fs_new_inode() has failed */ + if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) { + f2fs_put_dnode(&dn); + return -EIO; + } f2fs_bug_on(F2FS_I_SB(inode), - inode->i_blocks != 0 && inode->i_blocks != 1); + inode->i_blocks != 0 && inode->i_blocks != 8); /* will put inode & node pages */ - truncate_node(&dn); + err = truncate_node(&dn); + if (err) { + f2fs_put_dnode(&dn); + return err; + } return 0; } -struct page *new_inode_page(struct inode *inode) +struct page *f2fs_new_inode_page(struct inode *inode) { struct dnode_of_data dn; @@ -1010,14 +1198,13 @@ struct page *new_inode_page(struct inode *inode) set_new_dnode(&dn, inode, NULL, NULL, inode->i_ino); /* caller should f2fs_put_page(page, 1); */ - return new_node_page(&dn, 0, NULL); + return f2fs_new_node_page(&dn, 0); } -struct page *new_node_page(struct dnode_of_data *dn, - unsigned int ofs, struct page *ipage) +struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs) { struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode); - struct node_info old_ni, new_ni; + struct node_info new_ni; struct page *page; int err; @@ -1028,22 +1215,27 @@ struct page *new_node_page(struct dnode_of_data *dn, if (!page) return ERR_PTR(-ENOMEM); - if (unlikely(!inc_valid_node_count(sbi, dn->inode))) { - err = -ENOSPC; + if (unlikely((err = inc_valid_node_count(sbi, dn->inode, !ofs)))) + goto fail; + +#ifdef CONFIG_F2FS_CHECK_FS + err = f2fs_get_node_info(sbi, dn->nid, &new_ni); + if (err) { + dec_valid_node_count(sbi, dn->inode, !ofs); goto fail; } - - get_node_info(sbi, dn->nid, &old_ni); - - /* Reinitialize old_ni with new node page */ - f2fs_bug_on(sbi, old_ni.blk_addr != NULL_ADDR); - new_ni = old_ni; + f2fs_bug_on(sbi, new_ni.blk_addr != NULL_ADDR); +#endif + new_ni.nid = dn->nid; new_ni.ino = dn->inode->i_ino; + new_ni.blk_addr = NULL_ADDR; + new_ni.flag = 0; + new_ni.version = 0; set_node_addr(sbi, &new_ni, NEW_ADDR, false); f2fs_wait_on_page_writeback(page, NODE, true); fill_node_footer(page, dn->nid, dn->inode->i_ino, ofs, true); - set_cold_node(dn->inode, page); + set_cold_node(page, S_ISDIR(dn->inode->i_mode)); if (!PageUptodate(page)) SetPageUptodate(page); if (set_page_dirty(page)) @@ -1079,13 +1271,21 @@ static int read_node_page(struct page *page, int op_flags) .page = page, .encrypted_page = NULL, }; + int err; - if (PageUptodate(page)) + if (PageUptodate(page)) { +#ifdef CONFIG_F2FS_CHECK_FS + f2fs_bug_on(sbi, !f2fs_inode_chksum_verify(sbi, page)); +#endif return LOCKED_PAGE; + } - get_node_info(sbi, page->index, &ni); + err = f2fs_get_node_info(sbi, page->index, &ni); + if (err) + return err; - if (unlikely(ni.blk_addr == NULL_ADDR)) { + if (unlikely(ni.blk_addr == NULL_ADDR) || + is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN)) { ClearPageUptodate(page); return -ENOENT; } @@ -1097,14 +1297,15 @@ static int read_node_page(struct page *page, int op_flags) /* * Readahead a node page */ -void ra_node_page(struct f2fs_sb_info *sbi, nid_t nid) +void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid) { struct page *apage; int err; if (!nid) return; - f2fs_bug_on(sbi, check_nid_range(sbi, nid)); + if (f2fs_check_nid_range(sbi, nid)) + return; rcu_read_lock(); apage = radix_tree_lookup(&NODE_MAPPING(sbi)->page_tree, nid); @@ -1128,22 +1329,24 @@ static struct page *__get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid, if (!nid) return ERR_PTR(-ENOENT); - f2fs_bug_on(sbi, check_nid_range(sbi, nid)); + if (f2fs_check_nid_range(sbi, nid)) + return ERR_PTR(-EINVAL); repeat: page = f2fs_grab_cache_page(NODE_MAPPING(sbi), nid, false); if (!page) return ERR_PTR(-ENOMEM); - err = read_node_page(page, READ_SYNC); + err = read_node_page(page, 0); if (err < 0) { f2fs_put_page(page, 1); return ERR_PTR(err); } else if (err == LOCKED_PAGE) { + err = 0; goto page_hit; } if (parent) - ra_node_pages(parent, start + 1, MAX_RA_NODE); + f2fs_ra_node_pages(parent, start + 1, MAX_RA_NODE); lock_page(page); @@ -1152,25 +1355,37 @@ static struct page *__get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid, goto repeat; } - if (unlikely(!PageUptodate(page))) + if (unlikely(!PageUptodate(page))) { + err = -EIO; goto out_err; + } + + if (!f2fs_inode_chksum_verify(sbi, page)) { + err = -EBADMSG; + goto out_err; + } page_hit: if(unlikely(nid != nid_of_node(page))) { - f2fs_bug_on(sbi, 1); - ClearPageUptodate(page); + f2fs_msg(sbi->sb, KERN_WARNING, "inconsistent node block, " + "nid:%lu, node_footer[nid:%u,ino:%u,ofs:%u,cpver:%llu,blkaddr:%u]", + nid, nid_of_node(page), ino_of_node(page), + ofs_of_node(page), cpver_of_node(page), + next_blkaddr_of_node(page)); + err = -EINVAL; out_err: + ClearPageUptodate(page); f2fs_put_page(page, 1); - return ERR_PTR(-EIO); + return ERR_PTR(err); } return page; } -struct page *get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid) +struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid) { return __get_node_page(sbi, nid, NULL, 0); } -struct page *get_node_page_ra(struct page *parent, int start) +struct page *f2fs_get_node_page_ra(struct page *parent, int start) { struct f2fs_sb_info *sbi = F2FS_P_SB(parent); nid_t nid = get_nid(parent, start, false); @@ -1189,7 +1404,8 @@ static void flush_inline_data(struct f2fs_sb_info *sbi, nid_t ino) if (!inode) return; - page = pagecache_get_page(inode->i_mapping, 0, FGP_LOCK|FGP_NOWAIT, 0); + page = f2fs_pagecache_get_page(inode->i_mapping, 0, + FGP_LOCK|FGP_NOWAIT, 0); if (!page) goto iput_out; @@ -1204,6 +1420,7 @@ static void flush_inline_data(struct f2fs_sb_info *sbi, nid_t ino) ret = f2fs_write_inline_data(inode, page); inode_dec_dirty_pages(inode); + f2fs_remove_dirty_inode(inode); if (ret) set_page_dirty(page); page_out: @@ -1212,54 +1429,19 @@ static void flush_inline_data(struct f2fs_sb_info *sbi, nid_t ino) iput(inode); } -void move_node_page(struct page *node_page, int gc_type) -{ - if (gc_type == FG_GC) { - struct f2fs_sb_info *sbi = F2FS_P_SB(node_page); - struct writeback_control wbc = { - .sync_mode = WB_SYNC_ALL, - .nr_to_write = 1, - .for_reclaim = 0, - }; - - set_page_dirty(node_page); - f2fs_wait_on_page_writeback(node_page, NODE, true); - - f2fs_bug_on(sbi, PageWriteback(node_page)); - if (!clear_page_dirty_for_io(node_page)) - goto out_page; - - if (NODE_MAPPING(sbi)->a_ops->writepage(node_page, &wbc)) - unlock_page(node_page); - goto release_page; - } else { - /* set page dirty and write it */ - if (!PageWriteback(node_page)) - set_page_dirty(node_page); - } -out_page: - unlock_page(node_page); -release_page: - f2fs_put_page(node_page, 0); -} - static struct page *last_fsync_dnode(struct f2fs_sb_info *sbi, nid_t ino) { - pgoff_t index, end; + pgoff_t index; struct pagevec pvec; struct page *last_page = NULL; + int nr_pages; pagevec_init(&pvec, 0); index = 0; - end = ULONG_MAX; - while (index <= end) { - int i, nr_pages; - nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index, - PAGECACHE_TAG_DIRTY, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1); - if (nr_pages == 0) - break; + while ((nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index, + PAGECACHE_TAG_DIRTY))) { + int i; for (i = 0; i < nr_pages; i++) { struct page *page = pvec.pages[i]; @@ -1303,16 +1485,158 @@ static struct page *last_fsync_dnode(struct f2fs_sb_info *sbi, nid_t ino) return last_page; } -int fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, - struct writeback_control *wbc, bool atomic) +static int __write_node_page(struct page *page, bool atomic, bool *submitted, + struct writeback_control *wbc, bool do_balance, + enum iostat_type io_type, unsigned int *seq_id) { - pgoff_t index, end; + struct f2fs_sb_info *sbi = F2FS_P_SB(page); + nid_t nid; + struct node_info ni; + struct f2fs_io_info fio = { + .sbi = sbi, + .ino = ino_of_node(page), + .type = NODE, + .op = REQ_OP_WRITE, + .op_flags = wbc_to_write_flags(wbc), + .page = page, + .encrypted_page = NULL, + .submitted = false, + .io_type = io_type, + .io_wbc = wbc, + }; + unsigned int seq; + + trace_f2fs_writepage(page, NODE); + + if (unlikely(f2fs_cp_error(sbi))) + goto redirty_out; + + if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING))) + goto redirty_out; + + if (wbc->sync_mode == WB_SYNC_NONE && + IS_DNODE(page) && is_cold_node(page)) + goto redirty_out; + + /* get old block addr of this node page */ + nid = nid_of_node(page); + f2fs_bug_on(sbi, page->index != nid); + + if (f2fs_get_node_info(sbi, nid, &ni)) + goto redirty_out; + + if (wbc->for_reclaim) { + if (!down_read_trylock(&sbi->node_write)) + goto redirty_out; + } else { + down_read(&sbi->node_write); + } + + /* This page is already truncated */ + if (unlikely(ni.blk_addr == NULL_ADDR)) { + ClearPageUptodate(page); + dec_page_count(sbi, F2FS_DIRTY_NODES); + up_read(&sbi->node_write); + unlock_page(page); + return 0; + } + + if (__is_valid_data_blkaddr(ni.blk_addr) && + !f2fs_is_valid_blkaddr(sbi, ni.blk_addr, DATA_GENERIC)) + goto redirty_out; + + if (atomic && !test_opt(sbi, NOBARRIER)) + fio.op_flags |= REQ_PREFLUSH | REQ_FUA; + + set_page_writeback(page); + ClearPageError(page); + + if (f2fs_in_warm_node_list(sbi, page)) { + seq = f2fs_add_fsync_node_entry(sbi, page); + if (seq_id) + *seq_id = seq; + } + + fio.old_blkaddr = ni.blk_addr; + f2fs_do_write_node_page(nid, &fio); + set_node_addr(sbi, &ni, fio.new_blkaddr, is_fsync_dnode(page)); + dec_page_count(sbi, F2FS_DIRTY_NODES); + up_read(&sbi->node_write); + + if (wbc->for_reclaim) { + f2fs_submit_merged_write_cond(sbi, page->mapping->host, 0, + page->index, NODE); + submitted = NULL; + } + + unlock_page(page); + + if (unlikely(f2fs_cp_error(sbi))) { + f2fs_submit_merged_write(sbi, NODE); + submitted = NULL; + } + if (submitted) + *submitted = fio.submitted; + + if (do_balance) + f2fs_balance_fs(sbi, false); + return 0; + +redirty_out: + redirty_page_for_writepage(wbc, page); + return AOP_WRITEPAGE_ACTIVATE; +} + +void f2fs_move_node_page(struct page *node_page, int gc_type) +{ + if (gc_type == FG_GC) { + struct writeback_control wbc = { + .sync_mode = WB_SYNC_ALL, + .nr_to_write = 1, + .for_reclaim = 0, + }; + + set_page_dirty(node_page); + f2fs_wait_on_page_writeback(node_page, NODE, true); + + f2fs_bug_on(F2FS_P_SB(node_page), PageWriteback(node_page)); + if (!clear_page_dirty_for_io(node_page)) + goto out_page; + + if (__write_node_page(node_page, false, NULL, + &wbc, false, FS_GC_NODE_IO, NULL)) + unlock_page(node_page); + goto release_page; + } else { + /* set page dirty and write it */ + if (!PageWriteback(node_page)) + set_page_dirty(node_page); + } +out_page: + unlock_page(node_page); +release_page: + f2fs_put_page(node_page, 0); +} + +static int f2fs_write_node_page(struct page *page, + struct writeback_control *wbc) +{ + return __write_node_page(page, false, NULL, wbc, false, + FS_NODE_IO, NULL); +} + +int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, + struct writeback_control *wbc, bool atomic, + unsigned int *seq_id) +{ + pgoff_t index; + pgoff_t last_idx = ULONG_MAX; struct pagevec pvec; int ret = 0; struct page *last_page = NULL; bool marked = false; nid_t ino = inode->i_ino; - int nwritten = 0; + int nr_pages; if (atomic) { last_page = last_fsync_dnode(sbi, ino); @@ -1322,23 +1646,20 @@ int fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, retry: pagevec_init(&pvec, 0); index = 0; - end = ULONG_MAX; - while (index <= end) { - int i, nr_pages; - nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index, - PAGECACHE_TAG_DIRTY, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1); - if (nr_pages == 0) - break; + while ((nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index, + PAGECACHE_TAG_DIRTY))) { + int i; for (i = 0; i < nr_pages; i++) { struct page *page = pvec.pages[i]; + bool submitted = false; if (unlikely(f2fs_cp_error(sbi))) { f2fs_put_page(last_page, 0); pagevec_release(&pvec); - return -EIO; + ret = -EIO; + goto out; } if (!IS_DNODE(page) || !is_cold_node(page)) @@ -1364,14 +1685,17 @@ int fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, f2fs_wait_on_page_writeback(page, NODE, true); BUG_ON(PageWriteback(page)); + set_fsync_mark(page, 0); + set_dentry_mark(page, 0); + if (!atomic || page == last_page) { set_fsync_mark(page, 1); if (IS_INODE(page)) { if (is_inode_flag_set(inode, FI_DIRTY_INODE)) - update_inode(inode, page); + f2fs_update_inode(inode, page); set_dentry_mark(page, - need_dentry_mark(sbi, ino)); + f2fs_need_dentry_mark(sbi, ino)); } /* may be written by other thread */ if (!PageDirty(page)) @@ -1381,13 +1705,16 @@ int fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, if (!clear_page_dirty_for_io(page)) goto continue_unlock; - ret = NODE_MAPPING(sbi)->a_ops->writepage(page, wbc); + ret = __write_node_page(page, atomic && + page == last_page, + &submitted, wbc, true, + FS_NODE_IO, seq_id); if (ret) { unlock_page(page); f2fs_put_page(last_page, 0); break; - } else { - nwritten++; + } else if (submitted) { + last_idx = page->index; } if (page == last_page) { @@ -1407,45 +1734,46 @@ int fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, "Retry to write fsync mark: ino=%u, idx=%lx", ino, last_page->index); lock_page(last_page); + f2fs_wait_on_page_writeback(last_page, NODE, true); set_page_dirty(last_page); unlock_page(last_page); goto retry; } - - if (nwritten) - f2fs_submit_merged_bio_cond(sbi, NULL, NULL, ino, NODE, WRITE); +out: + if (last_idx != ULONG_MAX) + f2fs_submit_merged_write_cond(sbi, NULL, ino, last_idx, NODE); return ret ? -EIO: 0; } -int sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc) +int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, + struct writeback_control *wbc, + bool do_balance, enum iostat_type io_type) { - pgoff_t index, end; + pgoff_t index; struct pagevec pvec; int step = 0; int nwritten = 0; int ret = 0; + int nr_pages, done = 0; pagevec_init(&pvec, 0); next_step: index = 0; - end = ULONG_MAX; - while (index <= end) { - int i, nr_pages; - nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index, - PAGECACHE_TAG_DIRTY, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1); - if (nr_pages == 0) - break; + while (!done && (nr_pages = pagevec_lookup_tag(&pvec, + NODE_MAPPING(sbi), &index, PAGECACHE_TAG_DIRTY))) { + int i; for (i = 0; i < nr_pages; i++) { struct page *page = pvec.pages[i]; + bool submitted = false; - if (unlikely(f2fs_cp_error(sbi))) { - pagevec_release(&pvec); - ret = -EIO; - goto out; + /* give a priority to WB_SYNC threads */ + if (atomic_read(&sbi->wb_sync_req[NODE]) && + wbc->sync_mode == WB_SYNC_NONE) { + done = 1; + break; } /* @@ -1496,9 +1824,11 @@ int sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc) set_fsync_mark(page, 0); set_dentry_mark(page, 0); - if (NODE_MAPPING(sbi)->a_ops->writepage(page, wbc)) + ret = __write_node_page(page, false, &submitted, + wbc, do_balance, io_type, NULL); + if (ret) unlock_page(page); - else + else if (submitted) nwritten++; if (--wbc->nr_to_write == 0) @@ -1514,120 +1844,63 @@ int sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc) } if (step < 2) { + if (wbc->sync_mode == WB_SYNC_NONE && step == 1) + goto out; step++; goto next_step; } out: if (nwritten) - f2fs_submit_merged_bio(sbi, NODE, WRITE); + f2fs_submit_merged_write(sbi, NODE); + + if (unlikely(f2fs_cp_error(sbi))) + return -EIO; return ret; } -int wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino) +int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, + unsigned int seq_id) { - pgoff_t index = 0, end = ULONG_MAX; - struct pagevec pvec; + struct fsync_node_entry *fn; + struct page *page; + struct list_head *head = &sbi->fsync_node_list; + unsigned long flags; + unsigned int cur_seq_id = 0; int ret2, ret = 0; - pagevec_init(&pvec, 0); - - while (index <= end) { - int i, nr_pages; - nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index, - PAGECACHE_TAG_WRITEBACK, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1); - if (nr_pages == 0) + while (seq_id && cur_seq_id < seq_id) { + spin_lock_irqsave(&sbi->fsync_node_lock, flags); + if (list_empty(head)) { + spin_unlock_irqrestore(&sbi->fsync_node_lock, flags); break; - - for (i = 0; i < nr_pages; i++) { - struct page *page = pvec.pages[i]; - - /* until radix tree lookup accepts end_index */ - if (unlikely(page->index > end)) - continue; - - if (ino && ino_of_node(page) == ino) { - f2fs_wait_on_page_writeback(page, NODE, true); - if (TestClearPageError(page)) - ret = -EIO; - } } - pagevec_release(&pvec); - cond_resched(); + fn = list_first_entry(head, struct fsync_node_entry, list); + if (fn->seq_id > seq_id) { + spin_unlock_irqrestore(&sbi->fsync_node_lock, flags); + break; + } + cur_seq_id = fn->seq_id; + page = fn->page; + get_page(page); + spin_unlock_irqrestore(&sbi->fsync_node_lock, flags); + + f2fs_wait_on_page_writeback(page, NODE, true); + if (TestClearPageError(page)) + ret = -EIO; + + put_page(page); + + if (ret) + break; } ret2 = filemap_check_errors(NODE_MAPPING(sbi)); if (!ret) ret = ret2; + return ret; } -static int f2fs_write_node_page(struct page *page, - struct writeback_control *wbc) -{ - struct f2fs_sb_info *sbi = F2FS_P_SB(page); - nid_t nid; - struct node_info ni; - struct f2fs_io_info fio = { - .sbi = sbi, - .type = NODE, - .op = REQ_OP_WRITE, - .op_flags = (wbc->sync_mode == WB_SYNC_ALL) ? WRITE_SYNC : 0, - .page = page, - .encrypted_page = NULL, - }; - - trace_f2fs_writepage(page, NODE); - - if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING))) - goto redirty_out; - if (unlikely(f2fs_cp_error(sbi))) - goto redirty_out; - - /* get old block addr of this node page */ - nid = nid_of_node(page); - f2fs_bug_on(sbi, page->index != nid); - - if (wbc->for_reclaim) { - if (!down_read_trylock(&sbi->node_write)) - goto redirty_out; - } else { - down_read(&sbi->node_write); - } - - get_node_info(sbi, nid, &ni); - - /* This page is already truncated */ - if (unlikely(ni.blk_addr == NULL_ADDR)) { - ClearPageUptodate(page); - dec_page_count(sbi, F2FS_DIRTY_NODES); - up_read(&sbi->node_write); - unlock_page(page); - return 0; - } - - set_page_writeback(page); - fio.old_blkaddr = ni.blk_addr; - write_node_page(nid, &fio); - set_node_addr(sbi, &ni, fio.new_blkaddr, is_fsync_dnode(page)); - dec_page_count(sbi, F2FS_DIRTY_NODES); - up_read(&sbi->node_write); - - if (wbc->for_reclaim) - f2fs_submit_merged_bio_cond(sbi, NULL, page, 0, NODE, WRITE); - - unlock_page(page); - - if (unlikely(f2fs_cp_error(sbi))) - f2fs_submit_merged_bio(sbi, NODE, WRITE); - - return 0; - -redirty_out: - redirty_page_for_writepage(wbc, page); - return AOP_WRITEPAGE_ACTIVATE; -} - static int f2fs_write_node_pages(struct address_space *mapping, struct writeback_control *wbc) { @@ -1635,6 +1908,9 @@ static int f2fs_write_node_pages(struct address_space *mapping, struct blk_plug plug; long diff; + if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING))) + goto skip_write; + /* balancing f2fs's metadata in background */ f2fs_balance_fs_bg(sbi); @@ -1642,14 +1918,21 @@ static int f2fs_write_node_pages(struct address_space *mapping, if (get_pages(sbi, F2FS_DIRTY_NODES) < nr_pages_to_skip(sbi, NODE)) goto skip_write; + if (wbc->sync_mode == WB_SYNC_ALL) + atomic_inc(&sbi->wb_sync_req[NODE]); + else if (atomic_read(&sbi->wb_sync_req[NODE])) + goto skip_write; + trace_f2fs_writepages(mapping->host, wbc, NODE); diff = nr_pages_to_write(sbi, NODE, wbc); - wbc->sync_mode = WB_SYNC_NONE; blk_start_plug(&plug); - sync_node_pages(sbi, wbc); + f2fs_sync_node_pages(sbi, wbc, true, FS_NODE_IO); blk_finish_plug(&plug); wbc->nr_to_write = max((long)0, wbc->nr_to_write - diff); + + if (wbc->sync_mode == WB_SYNC_ALL) + atomic_dec(&sbi->wb_sync_req[NODE]); return 0; skip_write: @@ -1664,8 +1947,12 @@ static int f2fs_set_node_page_dirty(struct page *page) if (!PageUptodate(page)) SetPageUptodate(page); +#ifdef CONFIG_F2FS_CHECK_FS + if (IS_INODE(page)) + f2fs_inode_chksum_set(F2FS_P_SB(page), page); +#endif if (!PageDirty(page)) { - f2fs_set_page_dirty_nobuffers(page); + __set_page_dirty_nobuffers(page); inc_page_count(F2FS_P_SB(page), F2FS_DIRTY_NODES); SetPagePrivate(page); f2fs_trace_pid(page); @@ -1694,122 +1981,304 @@ static struct free_nid *__lookup_free_nid_list(struct f2fs_nm_info *nm_i, return radix_tree_lookup(&nm_i->free_nid_root, n); } -static void __del_from_free_nid_list(struct f2fs_nm_info *nm_i, - struct free_nid *i) +static int __insert_free_nid(struct f2fs_sb_info *sbi, + struct free_nid *i, enum nid_state state) { - list_del(&i->list); + struct f2fs_nm_info *nm_i = NM_I(sbi); + + int err = radix_tree_insert(&nm_i->free_nid_root, i->nid, i); + if (err) + return err; + + f2fs_bug_on(sbi, state != i->state); + nm_i->nid_cnt[state]++; + if (state == FREE_NID) + list_add_tail(&i->list, &nm_i->free_nid_list); + return 0; +} + +static void __remove_free_nid(struct f2fs_sb_info *sbi, + struct free_nid *i, enum nid_state state) +{ + struct f2fs_nm_info *nm_i = NM_I(sbi); + + f2fs_bug_on(sbi, state != i->state); + nm_i->nid_cnt[state]--; + if (state == FREE_NID) + list_del(&i->list); radix_tree_delete(&nm_i->free_nid_root, i->nid); } -static int add_free_nid(struct f2fs_sb_info *sbi, nid_t nid, bool build) +static void __move_free_nid(struct f2fs_sb_info *sbi, struct free_nid *i, + enum nid_state org_state, enum nid_state dst_state) { struct f2fs_nm_info *nm_i = NM_I(sbi); - struct free_nid *i; - struct nat_entry *ne; - if (!available_free_memory(sbi, FREE_NIDS)) - return -1; + f2fs_bug_on(sbi, org_state != i->state); + i->state = dst_state; + nm_i->nid_cnt[org_state]--; + nm_i->nid_cnt[dst_state]++; + + switch (dst_state) { + case PREALLOC_NID: + list_del(&i->list); + break; + case FREE_NID: + list_add_tail(&i->list, &nm_i->free_nid_list); + break; + default: + BUG_ON(1); + } +} + +static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid, + bool set, bool build) +{ + struct f2fs_nm_info *nm_i = NM_I(sbi); + unsigned int nat_ofs = NAT_BLOCK_OFFSET(nid); + unsigned int nid_ofs = nid - START_NID(nid); + + if (!test_bit_le(nat_ofs, nm_i->nat_block_bitmap)) + return; + + if (set) { + if (test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs])) + return; + __set_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]); + nm_i->free_nid_count[nat_ofs]++; + } else { + if (!test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs])) + return; + __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]); + if (!build) + nm_i->free_nid_count[nat_ofs]--; + } +} + +/* return if the nid is recognized as free */ +static bool add_free_nid(struct f2fs_sb_info *sbi, + nid_t nid, bool build, bool update) +{ + struct f2fs_nm_info *nm_i = NM_I(sbi); + struct free_nid *i, *e; + struct nat_entry *ne; + int err = -EINVAL; + bool ret = false; /* 0 nid should not be used */ if (unlikely(nid == 0)) - return 0; - - if (build) { - /* do not add allocated nids */ - ne = __lookup_nat_cache(nm_i, nid); - if (ne && (!get_nat_flag(ne, IS_CHECKPOINTED) || - nat_get_blkaddr(ne) != NULL_ADDR)) - return 0; - } + return false; i = f2fs_kmem_cache_alloc(free_nid_slab, GFP_NOFS); i->nid = nid; - i->state = NID_NEW; + i->state = FREE_NID; - if (radix_tree_preload(GFP_NOFS)) { - kmem_cache_free(free_nid_slab, i); - return 0; - } + radix_tree_preload(GFP_NOFS | __GFP_NOFAIL); - spin_lock(&nm_i->free_nid_list_lock); - if (radix_tree_insert(&nm_i->free_nid_root, i->nid, i)) { - spin_unlock(&nm_i->free_nid_list_lock); - radix_tree_preload_end(); - kmem_cache_free(free_nid_slab, i); - return 0; + spin_lock(&nm_i->nid_list_lock); + + if (build) { + /* + * Thread A Thread B + * - f2fs_create + * - f2fs_new_inode + * - f2fs_alloc_nid + * - __insert_nid_to_list(PREALLOC_NID) + * - f2fs_balance_fs_bg + * - f2fs_build_free_nids + * - __f2fs_build_free_nids + * - scan_nat_page + * - add_free_nid + * - __lookup_nat_cache + * - f2fs_add_link + * - f2fs_init_inode_metadata + * - f2fs_new_inode_page + * - f2fs_new_node_page + * - set_node_addr + * - f2fs_alloc_nid_done + * - __remove_nid_from_list(PREALLOC_NID) + * - __insert_nid_to_list(FREE_NID) + */ + ne = __lookup_nat_cache(nm_i, nid); + if (ne && (!get_nat_flag(ne, IS_CHECKPOINTED) || + nat_get_blkaddr(ne) != NULL_ADDR)) + goto err_out; + + e = __lookup_free_nid_list(nm_i, nid); + if (e) { + if (e->state == FREE_NID) + ret = true; + goto err_out; + } } - list_add_tail(&i->list, &nm_i->free_nid_list); - nm_i->fcnt++; - spin_unlock(&nm_i->free_nid_list_lock); + ret = true; + err = __insert_free_nid(sbi, i, FREE_NID); +err_out: + if (update) { + update_free_nid_bitmap(sbi, nid, ret, build); + if (!build) + nm_i->available_nids++; + } + spin_unlock(&nm_i->nid_list_lock); radix_tree_preload_end(); - return 1; + + if (err) + kmem_cache_free(free_nid_slab, i); + return ret; } -static void remove_free_nid(struct f2fs_nm_info *nm_i, nid_t nid) +static void remove_free_nid(struct f2fs_sb_info *sbi, nid_t nid) { + struct f2fs_nm_info *nm_i = NM_I(sbi); struct free_nid *i; bool need_free = false; - spin_lock(&nm_i->free_nid_list_lock); + spin_lock(&nm_i->nid_list_lock); i = __lookup_free_nid_list(nm_i, nid); - if (i && i->state == NID_NEW) { - __del_from_free_nid_list(nm_i, i); - nm_i->fcnt--; + if (i && i->state == FREE_NID) { + __remove_free_nid(sbi, i, FREE_NID); need_free = true; } - spin_unlock(&nm_i->free_nid_list_lock); + spin_unlock(&nm_i->nid_list_lock); if (need_free) kmem_cache_free(free_nid_slab, i); } -static void scan_nat_page(struct f2fs_sb_info *sbi, +static int scan_nat_page(struct f2fs_sb_info *sbi, struct page *nat_page, nid_t start_nid) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct f2fs_nat_block *nat_blk = page_address(nat_page); block_t blk_addr; + unsigned int nat_ofs = NAT_BLOCK_OFFSET(start_nid); int i; + __set_bit_le(nat_ofs, nm_i->nat_block_bitmap); + i = start_nid % NAT_ENTRY_PER_BLOCK; for (; i < NAT_ENTRY_PER_BLOCK; i++, start_nid++) { - if (unlikely(start_nid >= nm_i->max_nid)) break; blk_addr = le32_to_cpu(nat_blk->entries[i].block_addr); - f2fs_bug_on(sbi, blk_addr == NEW_ADDR); + + if (blk_addr == NEW_ADDR) + return -EINVAL; + if (blk_addr == NULL_ADDR) { - if (add_free_nid(sbi, start_nid, true) < 0) - break; + add_free_nid(sbi, start_nid, true, true); + } else { + spin_lock(&NM_I(sbi)->nid_list_lock); + update_free_nid_bitmap(sbi, start_nid, false, true); + spin_unlock(&NM_I(sbi)->nid_list_lock); } } + + return 0; } -void build_free_nids(struct f2fs_sb_info *sbi) +static void scan_curseg_cache(struct f2fs_sb_info *sbi) { - struct f2fs_nm_info *nm_i = NM_I(sbi); struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA); struct f2fs_journal *journal = curseg->journal; - int i = 0; + int i; + + down_read(&curseg->journal_rwsem); + for (i = 0; i < nats_in_cursum(journal); i++) { + block_t addr; + nid_t nid; + + addr = le32_to_cpu(nat_in_journal(journal, i).block_addr); + nid = le32_to_cpu(nid_in_journal(journal, i)); + if (addr == NULL_ADDR) + add_free_nid(sbi, nid, true, false); + else + remove_free_nid(sbi, nid); + } + up_read(&curseg->journal_rwsem); +} + +static void scan_free_nid_bits(struct f2fs_sb_info *sbi) +{ + struct f2fs_nm_info *nm_i = NM_I(sbi); + unsigned int i, idx; + nid_t nid; + + down_read(&nm_i->nat_tree_lock); + + for (i = 0; i < nm_i->nat_blocks; i++) { + if (!test_bit_le(i, nm_i->nat_block_bitmap)) + continue; + if (!nm_i->free_nid_count[i]) + continue; + for (idx = 0; idx < NAT_ENTRY_PER_BLOCK; idx++) { + idx = find_next_bit_le(nm_i->free_nid_bitmap[i], + NAT_ENTRY_PER_BLOCK, idx); + if (idx >= NAT_ENTRY_PER_BLOCK) + break; + + nid = i * NAT_ENTRY_PER_BLOCK + idx; + add_free_nid(sbi, nid, true, false); + + if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS) + goto out; + } + } +out: + scan_curseg_cache(sbi); + + up_read(&nm_i->nat_tree_lock); +} + +static int __f2fs_build_free_nids(struct f2fs_sb_info *sbi, + bool sync, bool mount) +{ + struct f2fs_nm_info *nm_i = NM_I(sbi); + int i = 0, ret; nid_t nid = nm_i->next_scan_nid; + if (unlikely(nid >= nm_i->max_nid)) + nid = 0; + /* Enough entries */ - if (nm_i->fcnt >= NAT_ENTRY_PER_BLOCK) - return; + if (nm_i->nid_cnt[FREE_NID] >= NAT_ENTRY_PER_BLOCK) + return 0; + + if (!sync && !f2fs_available_free_memory(sbi, FREE_NIDS)) + return 0; + + if (!mount) { + /* try to find free nids in free_nid_bitmap */ + scan_free_nid_bits(sbi); + + if (nm_i->nid_cnt[FREE_NID] >= NAT_ENTRY_PER_BLOCK) + return 0; + } /* readahead nat pages to be scanned */ - ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), FREE_NID_PAGES, + f2fs_ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), FREE_NID_PAGES, META_NAT, true); down_read(&nm_i->nat_tree_lock); while (1) { - struct page *page = get_current_nat_page(sbi, nid); + if (!test_bit_le(NAT_BLOCK_OFFSET(nid), + nm_i->nat_block_bitmap)) { + struct page *page = get_current_nat_page(sbi, nid); - scan_nat_page(sbi, page, nid); - f2fs_put_page(page, 1); + ret = scan_nat_page(sbi, page, nid); + f2fs_put_page(page, 1); + + if (ret) { + up_read(&nm_i->nat_tree_lock); + f2fs_bug_on(sbi, !mount); + f2fs_msg(sbi->sb, KERN_ERR, + "NAT is corrupt, run fsck to fix it"); + return -EINVAL; + } + } nid += (NAT_ENTRY_PER_BLOCK - (nid % NAT_ENTRY_PER_BLOCK)); if (unlikely(nid >= nm_i->max_nid)) @@ -1823,22 +2292,25 @@ void build_free_nids(struct f2fs_sb_info *sbi) nm_i->next_scan_nid = nid; /* find free nids from current sum_pages */ - down_read(&curseg->journal_rwsem); - for (i = 0; i < nats_in_cursum(journal); i++) { - block_t addr; + scan_curseg_cache(sbi); - addr = le32_to_cpu(nat_in_journal(journal, i).block_addr); - nid = le32_to_cpu(nid_in_journal(journal, i)); - if (addr == NULL_ADDR) - add_free_nid(sbi, nid, true); - else - remove_free_nid(nm_i, nid); - } - up_read(&curseg->journal_rwsem); up_read(&nm_i->nat_tree_lock); - ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nm_i->next_scan_nid), + f2fs_ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nm_i->next_scan_nid), nm_i->ra_nid_pages, META_NAT, false); + + return 0; +} + +int f2fs_build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount) +{ + int ret; + + mutex_lock(&NM_I(sbi)->build_lock); + ret = __f2fs_build_free_nids(sbi, sync, mount); + mutex_unlock(&NM_I(sbi)->build_lock); + + return ret; } /* @@ -1846,64 +2318,66 @@ void build_free_nids(struct f2fs_sb_info *sbi) * from second parameter of this function. * The returned nid could be used ino as well as nid when inode is created. */ -bool alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid) +bool f2fs_alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct free_nid *i = NULL; retry: -#ifdef CONFIG_F2FS_FAULT_INJECTION - if (time_to_inject(sbi, FAULT_ALLOC_NID)) + if (time_to_inject(sbi, FAULT_ALLOC_NID)) { + f2fs_show_injection_info(FAULT_ALLOC_NID); return false; -#endif - if (unlikely(sbi->total_valid_node_count + 1 > nm_i->available_nids)) + } + + spin_lock(&nm_i->nid_list_lock); + + if (unlikely(nm_i->available_nids == 0)) { + spin_unlock(&nm_i->nid_list_lock); return false; + } - spin_lock(&nm_i->free_nid_list_lock); - - /* We should not use stale free nids created by build_free_nids */ - if (nm_i->fcnt && !on_build_free_nids(nm_i)) { + /* We should not use stale free nids created by f2fs_build_free_nids */ + if (nm_i->nid_cnt[FREE_NID] && !on_f2fs_build_free_nids(nm_i)) { f2fs_bug_on(sbi, list_empty(&nm_i->free_nid_list)); - list_for_each_entry(i, &nm_i->free_nid_list, list) - if (i->state == NID_NEW) - break; - - f2fs_bug_on(sbi, i->state != NID_NEW); + i = list_first_entry(&nm_i->free_nid_list, + struct free_nid, list); *nid = i->nid; - i->state = NID_ALLOC; - nm_i->fcnt--; - spin_unlock(&nm_i->free_nid_list_lock); + + __move_free_nid(sbi, i, FREE_NID, PREALLOC_NID); + nm_i->available_nids--; + + update_free_nid_bitmap(sbi, *nid, false, false); + + spin_unlock(&nm_i->nid_list_lock); return true; } - spin_unlock(&nm_i->free_nid_list_lock); + spin_unlock(&nm_i->nid_list_lock); /* Let's scan nat pages and its caches to get free nids */ - mutex_lock(&nm_i->build_lock); - build_free_nids(sbi); - mutex_unlock(&nm_i->build_lock); + f2fs_build_free_nids(sbi, true, false); goto retry; } /* - * alloc_nid() should be called prior to this function. + * f2fs_alloc_nid() should be called prior to this function. */ -void alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid) +void f2fs_alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct free_nid *i; - spin_lock(&nm_i->free_nid_list_lock); + spin_lock(&nm_i->nid_list_lock); i = __lookup_free_nid_list(nm_i, nid); - f2fs_bug_on(sbi, !i || i->state != NID_ALLOC); - __del_from_free_nid_list(nm_i, i); - spin_unlock(&nm_i->free_nid_list_lock); + f2fs_bug_on(sbi, !i); + __remove_free_nid(sbi, i, PREALLOC_NID); + spin_unlock(&nm_i->nid_list_lock); kmem_cache_free(free_nid_slab, i); } /* - * alloc_nid() should be called prior to this function. + * f2fs_alloc_nid() should be called prior to this function. */ -void alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid) +void f2fs_alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct free_nid *i; @@ -1912,120 +2386,141 @@ void alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid) if (!nid) return; - spin_lock(&nm_i->free_nid_list_lock); + spin_lock(&nm_i->nid_list_lock); i = __lookup_free_nid_list(nm_i, nid); - f2fs_bug_on(sbi, !i || i->state != NID_ALLOC); - if (!available_free_memory(sbi, FREE_NIDS)) { - __del_from_free_nid_list(nm_i, i); + f2fs_bug_on(sbi, !i); + + if (!f2fs_available_free_memory(sbi, FREE_NIDS)) { + __remove_free_nid(sbi, i, PREALLOC_NID); need_free = true; } else { - i->state = NID_NEW; - nm_i->fcnt++; + __move_free_nid(sbi, i, PREALLOC_NID, FREE_NID); } - spin_unlock(&nm_i->free_nid_list_lock); + + nm_i->available_nids++; + + update_free_nid_bitmap(sbi, nid, true, false); + + spin_unlock(&nm_i->nid_list_lock); if (need_free) kmem_cache_free(free_nid_slab, i); } -int try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink) +int f2fs_try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct free_nid *i, *next; int nr = nr_shrink; - if (nm_i->fcnt <= MAX_FREE_NIDS) + if (nm_i->nid_cnt[FREE_NID] <= MAX_FREE_NIDS) return 0; if (!mutex_trylock(&nm_i->build_lock)) return 0; - spin_lock(&nm_i->free_nid_list_lock); + spin_lock(&nm_i->nid_list_lock); list_for_each_entry_safe(i, next, &nm_i->free_nid_list, list) { - if (nr_shrink <= 0 || nm_i->fcnt <= MAX_FREE_NIDS) + if (nr_shrink <= 0 || + nm_i->nid_cnt[FREE_NID] <= MAX_FREE_NIDS) break; - if (i->state == NID_ALLOC) - continue; - __del_from_free_nid_list(nm_i, i); + + __remove_free_nid(sbi, i, FREE_NID); kmem_cache_free(free_nid_slab, i); - nm_i->fcnt--; nr_shrink--; } - spin_unlock(&nm_i->free_nid_list_lock); + spin_unlock(&nm_i->nid_list_lock); mutex_unlock(&nm_i->build_lock); return nr - nr_shrink; } -void recover_inline_xattr(struct inode *inode, struct page *page) +void f2fs_recover_inline_xattr(struct inode *inode, struct page *page) { void *src_addr, *dst_addr; size_t inline_size; struct page *ipage; struct f2fs_inode *ri; - ipage = get_node_page(F2FS_I_SB(inode), inode->i_ino); + ipage = f2fs_get_node_page(F2FS_I_SB(inode), inode->i_ino); f2fs_bug_on(F2FS_I_SB(inode), IS_ERR(ipage)); ri = F2FS_INODE(page); - if (!(ri->i_inline & F2FS_INLINE_XATTR)) { + if (ri->i_inline & F2FS_INLINE_XATTR) { + set_inode_flag(inode, FI_INLINE_XATTR); + } else { clear_inode_flag(inode, FI_INLINE_XATTR); goto update_inode; } - dst_addr = inline_xattr_addr(ipage); - src_addr = inline_xattr_addr(page); + dst_addr = inline_xattr_addr(inode, ipage); + src_addr = inline_xattr_addr(inode, page); inline_size = inline_xattr_size(inode); f2fs_wait_on_page_writeback(ipage, NODE, true); memcpy(dst_addr, src_addr, inline_size); update_inode: - update_inode(inode, ipage); + f2fs_update_inode(inode, ipage); f2fs_put_page(ipage, 1); } -void recover_xattr_data(struct inode *inode, struct page *page, block_t blkaddr) +int f2fs_recover_xattr_data(struct inode *inode, struct page *page) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); nid_t prev_xnid = F2FS_I(inode)->i_xattr_nid; - nid_t new_xnid = nid_of_node(page); + nid_t new_xnid; + struct dnode_of_data dn; struct node_info ni; + struct page *xpage; + int err; - /* 1: invalidate the previous xattr nid */ if (!prev_xnid) goto recover_xnid; - /* Deallocate node address */ - get_node_info(sbi, prev_xnid, &ni); - f2fs_bug_on(sbi, ni.blk_addr == NULL_ADDR); - invalidate_blocks(sbi, ni.blk_addr); - dec_valid_node_count(sbi, inode); + /* 1: invalidate the previous xattr nid */ + err = f2fs_get_node_info(sbi, prev_xnid, &ni); + if (err) + return err; + + f2fs_invalidate_blocks(sbi, ni.blk_addr); + dec_valid_node_count(sbi, inode, false); set_node_addr(sbi, &ni, NULL_ADDR, false); recover_xnid: - /* 2: allocate new xattr nid */ - if (unlikely(!inc_valid_node_count(sbi, inode))) - f2fs_bug_on(sbi, 1); + /* 2: update xattr nid in inode */ + if (!f2fs_alloc_nid(sbi, &new_xnid)) + return -ENOSPC; - remove_free_nid(NM_I(sbi), new_xnid); - get_node_info(sbi, new_xnid, &ni); - ni.ino = inode->i_ino; - set_node_addr(sbi, &ni, NEW_ADDR, false); - f2fs_i_xnid_write(inode, new_xnid); + set_new_dnode(&dn, inode, NULL, NULL, new_xnid); + xpage = f2fs_new_node_page(&dn, XATTR_NODE_OFFSET); + if (IS_ERR(xpage)) { + f2fs_alloc_nid_failed(sbi, new_xnid); + return PTR_ERR(xpage); + } - /* 3: update xattr blkaddr */ - refresh_sit_entry(sbi, NEW_ADDR, blkaddr); - set_node_addr(sbi, &ni, blkaddr, false); + f2fs_alloc_nid_done(sbi, new_xnid); + f2fs_update_inode_page(inode); + + /* 3: update and set xattr node page dirty */ + memcpy(F2FS_NODE(xpage), F2FS_NODE(page), VALID_XATTR_BLOCK_SIZE); + + set_page_dirty(xpage); + f2fs_put_page(xpage, 1); + + return 0; } -int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page) +int f2fs_recover_inode_page(struct f2fs_sb_info *sbi, struct page *page) { struct f2fs_inode *src, *dst; nid_t ino = ino_of_node(page); struct node_info old_ni, new_ni; struct page *ipage; + int err; - get_node_info(sbi, ino, &old_ni); + err = f2fs_get_node_info(sbi, ino, &old_ni); + if (err) + return err; if (unlikely(old_ni.blk_addr != NULL_ADDR)) return -EINVAL; @@ -2037,11 +2532,12 @@ int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page) } /* Should not use this inode from free nid list */ - remove_free_nid(NM_I(sbi), ino); + remove_free_nid(sbi, ino); if (!PageUptodate(ipage)) SetPageUptodate(ipage); fill_node_footer(ipage, ino, ino, 0, true); + set_cold_node(page, false); src = F2FS_INODE(page); dst = F2FS_INODE(ipage); @@ -2051,12 +2547,25 @@ int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page) dst->i_blocks = cpu_to_le64(1); dst->i_links = cpu_to_le32(1); dst->i_xattr_nid = 0; - dst->i_inline = src->i_inline & F2FS_INLINE_XATTR; + dst->i_inline = src->i_inline & (F2FS_INLINE_XATTR | F2FS_EXTRA_ATTR); + if (dst->i_inline & F2FS_EXTRA_ATTR) { + dst->i_extra_isize = src->i_extra_isize; + + if (f2fs_sb_has_flexible_inline_xattr(sbi->sb) && + F2FS_FITS_IN_INODE(src, le16_to_cpu(src->i_extra_isize), + i_inline_xattr_size)) + dst->i_inline_xattr_size = src->i_inline_xattr_size; + + if (f2fs_sb_has_project_quota(sbi->sb) && + F2FS_FITS_IN_INODE(src, le16_to_cpu(src->i_extra_isize), + i_projid)) + dst->i_projid = src->i_projid; + } new_ni = old_ni; new_ni.ino = ino; - if (unlikely(!inc_valid_node_count(sbi, NULL))) + if (unlikely(inc_valid_node_count(sbi, NULL, true))) WARN_ON(1); set_node_addr(sbi, &new_ni, NEW_ADDR, false); inc_valid_inode_count(sbi); @@ -2065,13 +2574,12 @@ int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page) return 0; } -int restore_node_summary(struct f2fs_sb_info *sbi, +int f2fs_restore_node_summary(struct f2fs_sb_info *sbi, unsigned int segno, struct f2fs_summary_block *sum) { struct f2fs_node *rn; struct f2fs_summary *sum_entry; block_t addr; - int bio_blocks = MAX_BIO_BLOCKS(sbi); int i, idx, last_offset, nrpages; /* scan the node segment */ @@ -2080,13 +2588,16 @@ int restore_node_summary(struct f2fs_sb_info *sbi, sum_entry = &sum->entries[0]; for (i = 0; i < last_offset; i += nrpages, addr += nrpages) { - nrpages = min(last_offset - i, bio_blocks); + nrpages = min(last_offset - i, BIO_MAX_PAGES); /* readahead node pages */ - ra_meta_pages(sbi, addr, nrpages, META_POR, true); + f2fs_ra_meta_pages(sbi, addr, nrpages, META_POR, true); for (idx = addr; idx < addr + nrpages; idx++) { - struct page *page = get_tmp_page(sbi, idx); + struct page *page = f2fs_get_tmp_page(sbi, idx); + + if (IS_ERR(page)) + return PTR_ERR(page); rn = F2FS_NODE(page); sum_entry->nid = rn->footer.nid; @@ -2119,9 +2630,22 @@ static void remove_nats_in_journal(struct f2fs_sb_info *sbi) ne = __lookup_nat_cache(nm_i, nid); if (!ne) { - ne = grab_nat_entry(nm_i, nid); - node_info_from_raw_nat(&ne->ni, &raw_ne); + ne = __alloc_nat_entry(nid, true); + __init_nat_entry(nm_i, ne, &raw_ne, true); } + + /* + * if a free nat in journal has not been used after last + * checkpoint, we should remove it from available nids, + * since later we will add it again. + */ + if (!get_nat_flag(ne, IS_DIRTY) && + le32_to_cpu(raw_ne.block_addr) == NULL_ADDR) { + spin_lock(&nm_i->nid_list_lock); + nm_i->available_nids--; + spin_unlock(&nm_i->nid_list_lock); + } + __set_nat_cache_dirty(nm_i, ne); } update_nats_in_cursum(journal, -i); @@ -2146,8 +2670,41 @@ static void __adjust_nat_entry_set(struct nat_entry_set *nes, list_add_tail(&nes->set_list, head); } +static void __update_nat_bits(struct f2fs_sb_info *sbi, nid_t start_nid, + struct page *page) +{ + struct f2fs_nm_info *nm_i = NM_I(sbi); + unsigned int nat_index = start_nid / NAT_ENTRY_PER_BLOCK; + struct f2fs_nat_block *nat_blk = page_address(page); + int valid = 0; + int i = 0; + + if (!enabled_nat_bits(sbi, NULL)) + return; + + if (nat_index == 0) { + valid = 1; + i = 1; + } + for (; i < NAT_ENTRY_PER_BLOCK; i++) { + if (nat_blk->entries[i].block_addr != NULL_ADDR) + valid++; + } + if (valid == 0) { + __set_bit_le(nat_index, nm_i->empty_nat_bits); + __clear_bit_le(nat_index, nm_i->full_nat_bits); + return; + } + + __clear_bit_le(nat_index, nm_i->empty_nat_bits); + if (valid == NAT_ENTRY_PER_BLOCK) + __set_bit_le(nat_index, nm_i->full_nat_bits); + else + __clear_bit_le(nat_index, nm_i->full_nat_bits); +} + static void __flush_nat_entry_set(struct f2fs_sb_info *sbi, - struct nat_entry_set *set) + struct nat_entry_set *set, struct cp_control *cpc) { struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA); struct f2fs_journal *journal = curseg->journal; @@ -2162,7 +2719,8 @@ static void __flush_nat_entry_set(struct f2fs_sb_info *sbi, * #1, flush nat entries to journal in current hot data summary block. * #2, flush nat entries to nat page. */ - if (!__has_cursum_space(journal, set->entry_cnt, NAT_JOURNAL)) + if (enabled_nat_bits(sbi, cpc) || + !__has_cursum_space(journal, set->entry_cnt, NAT_JOURNAL)) to_journal = false; if (to_journal) { @@ -2179,11 +2737,10 @@ static void __flush_nat_entry_set(struct f2fs_sb_info *sbi, nid_t nid = nat_get_nid(ne); int offset; - if (nat_get_blkaddr(ne) == NEW_ADDR) - continue; + f2fs_bug_on(sbi, nat_get_blkaddr(ne) == NEW_ADDR); if (to_journal) { - offset = lookup_journal_in_cursum(journal, + offset = f2fs_lookup_journal_in_cursum(journal, NAT_JOURNAL, nid, 1); f2fs_bug_on(sbi, offset < 0); raw_ne = &nat_in_journal(journal, offset); @@ -2193,26 +2750,34 @@ static void __flush_nat_entry_set(struct f2fs_sb_info *sbi, } raw_nat_from_node_info(raw_ne, &ne->ni); nat_reset_flag(ne); - __clear_nat_cache_dirty(NM_I(sbi), ne); - if (nat_get_blkaddr(ne) == NULL_ADDR) - add_free_nid(sbi, nid, false); + __clear_nat_cache_dirty(NM_I(sbi), set, ne); + if (nat_get_blkaddr(ne) == NULL_ADDR) { + add_free_nid(sbi, nid, false, true); + } else { + spin_lock(&NM_I(sbi)->nid_list_lock); + update_free_nid_bitmap(sbi, nid, false, false); + spin_unlock(&NM_I(sbi)->nid_list_lock); + } } - if (to_journal) + if (to_journal) { up_write(&curseg->journal_rwsem); - else + } else { + __update_nat_bits(sbi, start_nid, page); f2fs_put_page(page, 1); + } - f2fs_bug_on(sbi, set->entry_cnt); - - radix_tree_delete(&NM_I(sbi)->nat_set_root, set->set); - kmem_cache_free(nat_entry_set_slab, set); + /* Allow dirty nats by node block allocation in write_begin */ + if (!set->entry_cnt) { + radix_tree_delete(&NM_I(sbi)->nat_set_root, set->set); + kmem_cache_free(nat_entry_set_slab, set); + } } /* * This function is called during the checkpointing process. */ -void flush_nat_entries(struct f2fs_sb_info *sbi) +void f2fs_flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA); @@ -2223,6 +2788,13 @@ void flush_nat_entries(struct f2fs_sb_info *sbi) nid_t set_idx = 0; LIST_HEAD(sets); + /* during unmount, let's flush nat_bits before checking dirty_nat_cnt */ + if (enabled_nat_bits(sbi, cpc)) { + down_write(&nm_i->nat_tree_lock); + remove_nats_in_journal(sbi); + up_write(&nm_i->nat_tree_lock); + } + if (!nm_i->dirty_nat_cnt) return; @@ -2233,7 +2805,8 @@ void flush_nat_entries(struct f2fs_sb_info *sbi) * entries, remove all entries from journal and merge them * into nat entry set. */ - if (!__has_cursum_space(journal, nm_i->dirty_nat_cnt, NAT_JOURNAL)) + if (enabled_nat_bits(sbi, cpc) || + !__has_cursum_space(journal, nm_i->dirty_nat_cnt, NAT_JOURNAL)) remove_nats_in_journal(sbi); while ((found = __gang_lookup_nat_set(nm_i, @@ -2247,11 +2820,91 @@ void flush_nat_entries(struct f2fs_sb_info *sbi) /* flush dirty nats in nat entry set */ list_for_each_entry_safe(set, tmp, &sets, set_list) - __flush_nat_entry_set(sbi, set); + __flush_nat_entry_set(sbi, set, cpc); up_write(&nm_i->nat_tree_lock); + /* Allow dirty nats by node block allocation in write_begin */ +} - f2fs_bug_on(sbi, nm_i->dirty_nat_cnt); +static int __get_nat_bitmaps(struct f2fs_sb_info *sbi) +{ + struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi); + struct f2fs_nm_info *nm_i = NM_I(sbi); + unsigned int nat_bits_bytes = nm_i->nat_blocks / BITS_PER_BYTE; + unsigned int i; + __u64 cp_ver = cur_cp_version(ckpt); + block_t nat_bits_addr; + + if (!enabled_nat_bits(sbi, NULL)) + return 0; + + nm_i->nat_bits_blocks = F2FS_BLK_ALIGN((nat_bits_bytes << 1) + 8); + nm_i->nat_bits = f2fs_kzalloc(sbi, + nm_i->nat_bits_blocks << F2FS_BLKSIZE_BITS, GFP_KERNEL); + if (!nm_i->nat_bits) + return -ENOMEM; + + nat_bits_addr = __start_cp_addr(sbi) + sbi->blocks_per_seg - + nm_i->nat_bits_blocks; + for (i = 0; i < nm_i->nat_bits_blocks; i++) { + struct page *page; + + page = f2fs_get_meta_page(sbi, nat_bits_addr++); + if (IS_ERR(page)) { + disable_nat_bits(sbi, true); + return PTR_ERR(page); + } + + memcpy(nm_i->nat_bits + (i << F2FS_BLKSIZE_BITS), + page_address(page), F2FS_BLKSIZE); + f2fs_put_page(page, 1); + } + + cp_ver |= (cur_cp_crc(ckpt) << 32); + if (cpu_to_le64(cp_ver) != *(__le64 *)nm_i->nat_bits) { + disable_nat_bits(sbi, true