Commit Graph

1516 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
633920f372 Revert "ANDROID: vfs: Add permission2 for filesystems with per mount permissions"
This reverts commit e81cea2a6f as it is
longer needed because sdcardfs is gone.

Bug: 157700134
Cc: Daniel Rosenberg <drosen@google.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Yongqin Liu <yongqin.liu@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic7f303d4353c9f03e8e7a5aad07d0a5aa5289412
2020-06-27 15:17:42 +02:00
Jaegeuk Kim
c95d6ed4e7 Merge remote-tracking branch 'aosp/upstream-f2fs-stable-linux-4.19.y' into android-4.19-stable
This has one fix in advance merged in f2fs-stable.
("xfs: drop I_DIRTY_TIME_EXPIRED")

* aosp/upstream-f2fs-stable-linux-4.19.y:
  writeback: Drop I_DIRTY_TIME_EXPIRE
  writeback: Fix sync livelock due to b_dirty_time processing
  writeback: Avoid skipping inode writeback
  writeback: Protect inode->i_io_list with inode->i_lock
  Revert "writeback: Avoid skipping inode writeback"

Bug: 154542664
Change-Id: I98a6258cb60227e6ca02e57bf7adf28ab7816cbf
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2020-06-24 13:31:46 -07:00
Jan Kara
00f6b03b41 writeback: Drop I_DIRTY_TIME_EXPIRE
The only use of I_DIRTY_TIME_EXPIRE is to detect in
__writeback_single_inode() that inode got there because flush worker
decided it's time to writeback the dirty inode time stamps (either
because we are syncing or because of age). However we can detect this
directly in __writeback_single_inode() and there's no need for the
strange propagation with I_DIRTY_TIME_EXPIRE flag.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-06-19 11:51:04 -07:00
Jan Kara
359caa950d writeback: Avoid skipping inode writeback
Inode's i_io_list list head is used to attach inode to several different
lists - wb->{b_dirty, b_dirty_time, b_io, b_more_io}. When flush worker
prepares a list of inodes to writeback e.g. for sync(2), it moves inodes
to b_io list. Thus it is critical for sync(2) data integrity guarantees
that inode is not requeued to any other writeback list when inode is
queued for processing by flush worker. That's the reason why
writeback_single_inode() does not touch i_io_list (unless the inode is
completely clean) and why __mark_inode_dirty() does not touch i_io_list
if I_SYNC flag is set.

However there are two flaws in the current logic:

1) When inode has only I_DIRTY_TIME set but it is already queued in b_io
list due to sync(2), concurrent __mark_inode_dirty(inode, I_DIRTY_SYNC)
can still move inode back to b_dirty list resulting in skipping
writeback of inode time stamps during sync(2).

2) When inode is on b_dirty_time list and writeback_single_inode() races
with __mark_inode_dirty() like:

writeback_single_inode()		__mark_inode_dirty(inode, I_DIRTY_PAGES)
  inode->i_state |= I_SYNC
  __writeback_single_inode()
					  inode->i_state |= I_DIRTY_PAGES;
					  if (inode->i_state & I_SYNC)
					    bail
  if (!(inode->i_state & I_DIRTY_ALL))
  - not true so nothing done

We end up with I_DIRTY_PAGES inode on b_dirty_time list and thus
standard background writeback will not writeback this inode leading to
possible dirty throttling stalls etc. (thanks to Martijn Coenen for this
analysis).

Fix these problems by tracking whether inode is queued in b_io or
b_more_io lists in a new I_SYNC_QUEUED flag. When this flag is set, we
know flush worker has queued inode and we should not touch i_io_list.
On the other hand we also know that once flush worker is done with the
inode it will requeue the inode to appropriate dirty list. When
I_SYNC_QUEUED is not set, __mark_inode_dirty() can (and must) move inode
to appropriate dirty list.

Reported-by: Martijn Coenen <maco@android.com>
Reviewed-by: Martijn Coenen <maco@android.com>
Tested-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: 0ae45f63d4 ("vfs: add support for a lazytime mount option")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2020-06-19 11:50:12 -07:00
Jaegeuk Kim
fd0b57f1cc Revert "writeback: Avoid skipping inode writeback"
This reverts commit 83af290d0975eda561a0599f54991c62842a8a99.
2020-06-19 11:50:11 -07:00
Jaegeuk Kim
a176f88e45 Merge remote-tracking branch 'aosp/upstream-f2fs-stable-linux-4.19.y' into android-4.19-stable
* aosp/upstream-f2fs-stable-linux-4.19.y:
  f2fs: attach IO flags to the missing cases
  f2fs: add node_io_flag for bio flags likewise data_io_flag
  f2fs: remove unused parameter of f2fs_put_rpages_mapping()
  f2fs: handle readonly filesystem in f2fs_ioc_shutdown()
  f2fs: avoid utf8_strncasecmp() with unstable name
  f2fs: don't return vmalloc() memory from f2fs_kmalloc()
  f2fs: fix retry logic in f2fs_write_cache_pages()
  f2fs: fix wrong discard space
  f2fs: compress: don't compress any datas after cp stop
  f2fs: remove unneeded return value of __insert_discard_tree()
  f2fs: fix wrong value of tracepoint parameter
  f2fs: protect new segment allocation in expand_inode_data
  f2fs: code cleanup by removing ifdef macro surrounding
  writeback: Avoid skipping inode writeback
  f2fs: avoid inifinite loop to wait for flushing node pages at cp_error
  f2fs: compress: fix zstd data corruption
  f2fs: add compressed/gc data read IO stat
  f2fs: fix potential use-after-free issue
  f2fs: compress: don't handle non-compressed data in workqueue
  f2fs: remove redundant assignment to variable err
  f2fs: refactor resize_fs to avoid meta updates in progress
  f2fs: use round_up to enhance calculation
  f2fs: introduce F2FS_IOC_RESERVE_COMPRESS_BLOCKS
  f2fs: Avoid double lock for cp_rwsem during checkpoint
  f2fs: report delalloc reserve as non-free in statfs for project quota
  f2fs: Fix wrong stub helper update_sit_info
  f2fs: compress: let lz4 compressor handle output buffer budget properly
  f2fs: remove blk_plugging in block_operations
  f2fs: introduce F2FS_IOC_RELEASE_COMPRESS_BLOCKS
  f2fs: shrink spinlock coverage
  f2fs: correctly fix the parent inode number during fsync()
  f2fs: introduce mempool for {,de}compress intermediate page allocation
  f2fs: introduce f2fs_bmap_compress()
  f2fs: support fiemap on compressed inode
  f2fs: support partial truncation on compressed inode
  f2fs: remove redundant compress inode check
  f2fs: use strcmp() in parse_options()
  f2fs: Use the correct style for SPDX License Identifier

 Conflicts:
	fs/f2fs/data.c
	fs/f2fs/dir.c

Bug: 154167995
Change-Id: I04ec97a9cafef2d7b8736f36a2a8d244965cae9a
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2020-06-15 13:24:58 -07:00
Jan Kara
762f39459a writeback: Avoid skipping inode writeback
Inode's i_io_list list head is used to attach inode to several different
lists - wb->{b_dirty, b_dirty_time, b_io, b_more_io}. When flush worker
prepares a list of inodes to writeback e.g. for sync(2), it moves inodes
to b_io list. Thus it is critical for sync(2) data integrity guarantees
that inode is not requeued to any other writeback list when inode is
queued for processing by flush worker. That's the reason why
writeback_single_inode() does not touch i_io_list (unless the inode is
completely clean) and why __mark_inode_dirty() does not touch i_io_list
if I_SYNC flag is set.

However there are two flaws in the current logic:

1) When inode has only I_DIRTY_TIME set but it is already queued in b_io
list due to sync(2), concurrent __mark_inode_dirty(inode, I_DIRTY_SYNC)
can still move inode back to b_dirty list resulting in skipping
writeback of inode time stamps during sync(2).

2) When inode is on b_dirty_time list and writeback_single_inode() races
with __mark_inode_dirty() like:

writeback_single_inode()		__mark_inode_dirty(inode, I_DIRTY_PAGES)
  inode->i_state |= I_SYNC
  __writeback_single_inode()
					  inode->i_state |= I_DIRTY_PAGES;
					  if (inode->i_state & I_SYNC)
					    bail
  if (!(inode->i_state & I_DIRTY_ALL))
  - not true so nothing done

We end up with I_DIRTY_PAGES inode on b_dirty_time list and thus
standard background writeback will not writeback this inode leading to
possible dirty throttling stalls etc. (thanks to Martijn Coenen for this
analysis).

Fix these problems by tracking whether inode is queued in b_io or
b_more_io lists in a new I_SYNC_QUEUED flag. When this flag is set, we
know flush worker has queued inode and we should not touch i_io_list.
On the other hand we also know that once flush worker is done with the
inode it will requeue the inode to appropriate dirty list. When
I_SYNC_QUEUED is not set, __mark_inode_dirty() can (and must) move inode
to appropriate dirty list.

Reported-by: Martijn Coenen <maco@android.com>
Fixes: 0ae45f63d4 ("vfs: add support for a lazytime mount option")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2020-05-30 08:49:33 -07:00
Greg Kroah-Hartman
91d4544b24 Merge 4.19.124 into android-4.19-stable
Changes in 4.19.124
	net: dsa: Do not make user port errors fatal
	shmem: fix possible deadlocks on shmlock_user_lock
	net/sonic: Fix a resource leak in an error handling path in 'jazz_sonic_probe()'
	net: moxa: Fix a potential double 'free_irq()'
	drop_monitor: work around gcc-10 stringop-overflow warning
	virtio-blk: handle block_device_operations callbacks after hot unplug
	scsi: sg: add sg_remove_request in sg_write
	mmc: sdhci-acpi: Add SDHCI_QUIRK2_BROKEN_64_BIT_DMA for AMDI0040
	net: fix a potential recursive NETDEV_FEAT_CHANGE
	netlabel: cope with NULL catmap
	net: phy: fix aneg restart in phy_ethtool_set_eee
	pppoe: only process PADT targeted at local interfaces
	Revert "ipv6: add mtu lock check in __ip6_rt_update_pmtu"
	tcp: fix error recovery in tcp_zerocopy_receive()
	virtio_net: fix lockdep warning on 32 bit
	hinic: fix a bug of ndo_stop
	net: dsa: loop: Add module soft dependency
	net: ipv4: really enforce backoff for redirects
	netprio_cgroup: Fix unlimited memory leak of v2 cgroups
	net: tcp: fix rx timestamp behavior for tcp_recvmsg
	tcp: fix SO_RCVLOWAT hangs with fat skbs
	riscv: fix vdso build with lld
	dmaengine: pch_dma.c: Avoid data race between probe and irq handler
	dmaengine: mmp_tdma: Reset channel error on release
	cpufreq: intel_pstate: Only mention the BIOS disabling turbo mode once
	ALSA: hda/hdmi: fix race in monitor detection during probe
	drm/qxl: lost qxl_bo_kunmap_atomic_page in qxl_image_init_helper()
	ipc/util.c: sysvipc_find_ipc() incorrectly updates position index
	ALSA: hda/realtek - Fix S3 pop noise on Dell Wyse
	gfs2: Another gfs2_walk_metadata fix
	pinctrl: baytrail: Enable pin configuration setting for GPIO chip
	pinctrl: cherryview: Add missing spinlock usage in chv_gpio_irq_handler
	i40iw: Fix error handling in i40iw_manage_arp_cache()
	mmc: core: Check request type before completing the request
	mmc: block: Fix request completion in the CQE timeout path
	NFS: Fix fscache super_cookie index_key from changing after umount
	nfs: fscache: use timespec64 in inode auxdata
	NFSv4: Fix fscache cookie aux_data to ensure change_attr is included
	netfilter: conntrack: avoid gcc-10 zero-length-bounds warning
	arm64: fix the flush_icache_range arguments in machine_kexec
	netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start()
	IB/mlx4: Test return value of calls to ib_get_cached_pkey
	hwmon: (da9052) Synchronize access with mfd
	pnp: Use list_for_each_entry() instead of open coding
	gcc-10 warnings: fix low-hanging fruit
	kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig
	Stop the ad-hoc games with -Wno-maybe-initialized
	gcc-10: disable 'zero-length-bounds' warning for now
	gcc-10: disable 'array-bounds' warning for now
	gcc-10: disable 'stringop-overflow' warning for now
	gcc-10: disable 'restrict' warning for now
	gcc-10: avoid shadowing standard library 'free()' in crypto
	ALSA: hda/realtek - Limit int mic boost for Thinkpad T530
	ALSA: rawmidi: Fix racy buffer resize under concurrent accesses
	ALSA: usb-audio: Add control message quirk delay for Kingston HyperX headset
	usb: core: hub: limit HUB_QUIRK_DISABLE_AUTOSUSPEND to USB5534B
	usb: host: xhci-plat: keep runtime active when removing host
	USB: gadget: fix illegal array access in binding with UDC
	usb: xhci: Fix NULL pointer dereference when enqueuing trbs from urb sg list
	ARM: dts: dra7: Fix bus_dma_limit for PCIe
	ARM: dts: imx27-phytec-phycard-s-rdk: Fix the I2C1 pinctrl entries
	cifs: fix leaked reference on requeued write
	x86: Fix early boot crash on gcc-10, third try
	x86/unwind/orc: Fix error handling in __unwind_start()
	exec: Move would_dump into flush_old_exec
	clk: rockchip: fix incorrect configuration of rk3228 aclk_gpu* clocks
	dwc3: Remove check for HWO flag in dwc3_gadget_ep_reclaim_trb_sg()
	usb: gadget: net2272: Fix a memory leak in an error handling path in 'net2272_plat_probe()'
	usb: gadget: audio: Fix a missing error return value in audio_bind()
	usb: gadget: legacy: fix error return code in gncm_bind()
	usb: gadget: legacy: fix error return code in cdc_bind()
	Revert "ALSA: hda/realtek: Fix pop noise on ALC225"
	clk: Unlink clock if failed to prepare or enable
	arm64: dts: rockchip: Replace RK805 PMIC node name with "pmic" on rk3328 boards
	arm64: dts: rockchip: Rename dwc3 device nodes on rk3399 to make dtc happy
	ARM: dts: r8a73a4: Add missing CMT1 interrupts
	arm64: dts: renesas: r8a77980: Fix IPMMU VIP[01] nodes
	ARM: dts: r8a7740: Add missing extal2 to CPG node
	KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mce
	Makefile: disallow data races on gcc-10 as well
	Linux 4.19.124

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3d253f677cc08337e64d316005a0ec0c33717940
2020-05-20 11:37:46 +02:00
Linus Torvalds
51cc5495ff gcc-10 warnings: fix low-hanging fruit
commit 9d82973e032e246ff5663c9805fbb5407ae932e3 upstream.

Due to a bug-report that was compiler-dependent, I updated one of my
machines to gcc-10.  That shows a lot of new warnings.  Happily they
seem to be mostly the valid kind, but it's going to cause a round of
churn for getting rid of them..

This is the really low-hanging fruit of removing a couple of zero-sized
arrays in some core code.  We have had a round of these patches before,
and we'll have many more coming, and there is nothing special about
these except that they were particularly trivial, and triggered more
warnings than most.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20 08:18:45 +02:00
Greg Kroah-Hartman
89cded3aeb ANDROID: GKI: fs.h: add Android ABI padding to some structures
Try to mitigate potential future driver core api changes by adding a
padding to a bunch of filesystem structures.

Based on a change made to the RHEL/CENTOS 8 kernel.

Bug: 151154716
Change-Id: Ida6d98d30f292c980ab07e0250fec5268c4c87ed
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2020-05-01 15:18:12 +02:00
Will McVicker
46a302bbb0 ANDROID: GKI: fs: add umount_end() function to struct super_operations
This reduces the ABI diff for the module_layout symbol.

Bug: 148872640
Test: compile test
(cherry picked from commit ee7bdc62fd)
[willmcvicker: Only cherry-picked the ABI difference]
Signed-off-by: Will McVicker <willmcvicker@google.com>
Change-Id: I1e7e110316d90d190cb4fa1dcd18b70d0d6045f5
2020-04-01 15:20:11 -07:00
Greg Kroah-Hartman
248555d63c Merge 4.19.113 into android-4.19
Changes in 4.19.113
	drm/mediatek: Find the cursor plane instead of hard coding it
	spi: qup: call spi_qup_pm_resume_runtime before suspending
	powerpc: Include .BTF section
	ARM: dts: dra7: Add "dma-ranges" property to PCIe RC DT nodes
	spi: pxa2xx: Add CS control clock quirk
	spi/zynqmp: remove entry that causes a cs glitch
	drm/exynos: dsi: propagate error value and silence meaningless warning
	drm/exynos: dsi: fix workaround for the legacy clock name
	drivers/perf: arm_pmu_acpi: Fix incorrect checking of gicc pointer
	altera-stapl: altera_get_note: prevent write beyond end of 'key'
	dm bio record: save/restore bi_end_io and bi_integrity
	dm integrity: use dm_bio_record and dm_bio_restore
	riscv: avoid the PIC offset of static percpu data in module beyond 2G limits
	drm/amd/display: Clear link settings on MST disable connector
	drm/amd/display: fix dcc swath size calculations on dcn1
	xenbus: req->body should be updated before req->state
	xenbus: req->err should be updated before req->state
	block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group()
	parse-maintainers: Mark as executable
	USB: Disable LPM on WD19's Realtek Hub
	usb: quirks: add NO_LPM quirk for RTL8153 based ethernet adapters
	USB: serial: option: add ME910G1 ECM composition 0x110b
	usb: host: xhci-plat: add a shutdown
	USB: serial: pl2303: add device-id for HP LD381
	usb: xhci: apply XHCI_SUSPEND_DELAY to AMD XHCI controller 1022:145c
	ALSA: line6: Fix endless MIDI read loop
	ALSA: seq: virmidi: Fix running status after receiving sysex
	ALSA: seq: oss: Fix running status after receiving sysex
	ALSA: pcm: oss: Avoid plugin buffer overflow
	ALSA: pcm: oss: Remove WARNING from snd_pcm_plug_alloc() checks
	iio: st_sensors: remap SMO8840 to LIS2DH12
	iio: trigger: stm32-timer: disable master mode when stopping
	iio: magnetometer: ak8974: Fix negative raw values in sysfs
	iio: adc: at91-sama5d2_adc: fix differential channels in triggered mode
	mmc: rtsx_pci: Fix support for speed-modes that relies on tuning
	mmc: sdhci-of-at91: fix cd-gpios for SAMA5D2
	staging: rtl8188eu: Add device id for MERCUSYS MW150US v2
	staging: greybus: loopback_test: fix poll-mask build breakage
	staging/speakup: fix get_word non-space look-ahead
	intel_th: Fix user-visible error codes
	intel_th: pci: Add Elkhart Lake CPU support
	rtc: max8907: add missing select REGMAP_IRQ
	xhci: Do not open code __print_symbolic() in xhci trace events
	btrfs: fix log context list corruption after rename whiteout error
	drm/amd/amdgpu: Fix GPR read from debugfs (v2)
	drm/lease: fix WARNING in idr_destroy
	memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
	mm: slub: be more careful about the double cmpxchg of freelist
	mm, slub: prevent kmalloc_node crashes and memory leaks
	page-flags: fix a crash at SetPageError(THP_SWAP)
	x86/mm: split vmalloc_sync_all()
	USB: cdc-acm: fix close_delay and closing_wait units in TIOCSSERIAL
	USB: cdc-acm: fix rounding error in TIOCSSERIAL
	iio: light: vcnl4000: update sampling periods for vcnl4200
	kbuild: Disable -Wpointer-to-enum-cast
	futex: Fix inode life-time issue
	futex: Unbreak futex hashing
	Revert "vrf: mark skb for multicast or link-local as enslaved to VRF"
	Revert "ipv6: Fix handling of LLA with VRF and sockets bound to VRF"
	ALSA: hda/realtek: Fix pop noise on ALC225
	arm64: smp: fix smp_send_stop() behaviour
	arm64: smp: fix crash_smp_send_stop() behaviour
	drm/bridge: dw-hdmi: fix AVI frame colorimetry
	staging: greybus: loopback_test: fix potential path truncation
	staging: greybus: loopback_test: fix potential path truncations
	Linux 4.19.113

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I90c48cd7189a964e59d199ecc0f32c0a68688ec5
2020-03-25 09:50:38 +01:00
Peter Zijlstra
e6d506cd22 futex: Fix inode life-time issue
commit 8019ad13ef7f64be44d4f892af9c840179009254 upstream.

As reported by Jann, ihold() does not in fact guarantee inode
persistence. And instead of making it so, replace the usage of inode
pointers with a per boot, machine wide, unique inode identifier.

This sequence number is global, but shared (file backed) futexes are
rare enough that this should not become a performance issue.

Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-25 08:06:14 +01:00
Daniel Rosenberg
af2b6eaa10 FROMLIST: fscrypt: Have filesystems handle their d_ops
This shifts the responsibility of setting up dentry operations from
fscrypt to the individual filesystems, allowing them to have their own
operations while still setting fscrypt's d_revalidate as appropriate.

Also added helper function to libfs to unify ext4 and f2fs
implementations.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Test: Boots, /data/media is case insensitive
Bug: 138322712
Link: https://lore.kernel.org/linux-f2fs-devel/20200208013552.241832-1-drosen@google.com/T/#t
Change-Id: Iaf77f8c5961ecf22e22478701ab0b7fe2025225d
2020-02-28 03:36:12 +00:00
Daniel Rosenberg
6e29e14f38 FROMLIST: Add standard casefolding support
This adds general supporting functions for filesystems that use
utf8 casefolding. It provides standard dentry_operations and adds the
necessary structures in struct super_block to allow this standardization.

Ext4 and F2fs are switch to these implementations.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Note: Fixed issue with non-strictly enforced fallback hash
Test: Boots, /data/media is case insensitive
Bug: 138322712
Link: https://lore.kernel.org/linux-f2fs-devel/20200208013552.241832-1-drosen@google.com/T/#t
Change-Id: I81b5fb5d3ce0259a60712ae2505c1e4b03dbafde
2020-02-28 03:35:49 +00:00
Jaegeuk Kim
435c9a613f Merge remote-tracking branch 'aosp/upstream-f2fs-stable-linux-4.19.y/v5.5-rc1' into android-4.19
* aosp/upstream-f2fs-stable-linux-4.19.y:
  f2fs: stop GC when the victim becomes fully valid
  f2fs: expose main_blkaddr in sysfs
  f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()
  f2fs: Fix deadlock in f2fs_gc() context during atomic files handling
  f2fs: show f2fs instance in printk_ratelimited
  f2fs: fix potential overflow
  f2fs: fix to update dir's i_pino during cross_rename
  f2fs: support aligned pinned file
  f2fs: avoid kernel panic on corruption test
  f2fs: fix wrong description in document
  f2fs: cache global IPU bio
  f2fs: fix to avoid memory leakage in f2fs_listxattr
  f2fs: check total_segments from devices in raw_super
  f2fs: update multi-dev metadata in resize_fs
  f2fs: mark recovery flag correctly in read_raw_super_block()
  f2fs: fix to update time in lazytime mode
  vfs: don't allow writes to swap files
  mm: set S_SWAPFILE on blockdev swap devices

Bug: 146023540
Change-Id: Ia24ce5f48f245dd7ba4fd94aa00a7d84615a8b22
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2019-12-18 10:26:14 -08:00
Darrick J. Wong
8cfd90e159 vfs: don't allow writes to swap files
Don't let userspace write to an active swap file because the kernel
effectively has a long term lease on the storage and things could get
seriously corrupted if we let this happen.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-12-02 15:03:46 -08:00
Jaegeuk Kim
c2ad33f029 Merge remote-tracking branch 'aosp/upstream-f2fs-stable-linux-4.19.y' into android-4.19
* aosp/upstream-f2fs-stable-linux-4.19.y:
  f2fs: add a condition to detect overflow in f2fs_ioc_gc_range()
  f2fs: fix to add missing F2FS_IO_ALIGNED() condition
  f2fs: fix to fallback to buffered IO in IO aligned mode
  f2fs: fix to handle error path correctly in f2fs_map_blocks
  f2fs: fix extent corrupotion during directIO in LFS mode
  f2fs: check all the data segments against all node ones
  f2fs: Add a small clarification to CONFIG_FS_F2FS_FS_SECURITY
  f2fs: fix inode rwsem regression
  f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()
  f2fs: avoid infinite GC loop due to stale atomic files
  f2fs: Fix indefinite loop in f2fs_gc()
  f2fs: convert inline_data in prior to i_size_write
  f2fs: fix error path of f2fs_convert_inline_page()
  f2fs: add missing documents of reserve_root/resuid/resgid
  f2fs: fix flushing node pages when checkpoint is disabled
  f2fs: enhance f2fs_is_checkpoint_ready()'s readability
  f2fs: clean up __bio_alloc()'s parameter
  f2fs: fix wrong error injection path in inc_valid_block_count()
  f2fs: fix to writeout dirty inode during node flush
  f2fs: optimize case-insensitive lookups
  f2fs: introduce f2fs_match_name() for cleanup
  f2fs: Fix indefinite loop in f2fs_gc()
  f2fs: allocate memory in batch in build_sit_info()
  f2fs: support FS_IOC_{GET,SET}FSLABEL
  f2fs: fix to avoid data corruption by forbidding SSR overwrite
  f2fs: Fix build error while CONFIG_NLS=m
  Revert "f2fs: avoid out-of-range memory access"
  f2fs: cleanup the code in build_sit_entries.
  f2fs: fix wrong available node count calculation
  f2fs: remove duplicate code in f2fs_file_write_iter
  f2fs: fix to migrate blocks correctly during defragment
  f2fs: use wrapped f2fs_cp_error()
  f2fs: fix to use more generic EOPNOTSUPP
  f2fs: use wrapped IS_SWAPFILE()
  f2fs: Support case-insensitive file name lookups
  f2fs: include charset encoding information in the superblock
  fs: Reserve flag for casefolding
  f2fs: fix to avoid call kvfree under spinlock
  fs: f2fs: Remove unnecessary checks of SM_I(sbi) in update_general_status()
  f2fs: disallow direct IO in atomic write
  f2fs: fix to handle quota_{on,off} correctly
  f2fs: fix to detect cp error in f2fs_setxattr()
  f2fs: fix to spread f2fs_is_checkpoint_ready()
  f2fs: support fiemap() for directory inode
  f2fs: fix to avoid discard command leak
  f2fs: fix to avoid tagging SBI_QUOTA_NEED_REPAIR incorrectly
  f2fs: fix to drop meta/node pages during umount
  f2fs: disallow switching io_bits option during remount
  f2fs: fix panic of IO alignment feature
  f2fs: introduce {page,io}_is_mergeable() for readability
  f2fs: fix livelock in swapfile writes
  f2fs: add fs-verity support
  ext4: update on-disk format documentation for fs-verity
  ext4: add fs-verity read support
  ext4: add basic fs-verity support
  fs-verity: support builtin file signatures
  fs-verity: add SHA-512 support
  fs-verity: implement FS_IOC_MEASURE_VERITY ioctl
  fs-verity: implement FS_IOC_ENABLE_VERITY ioctl
  fs-verity: add data verification hooks for ->readpages()
  fs-verity: add the hook for file ->setattr()
  fs-verity: add the hook for file ->open()
  fs-verity: add inode and superblock fields
  fs-verity: add Kconfig and the helper functions for hashing
  fs: uapi: define verity bit for FS_IOC_GETFLAGS
  fs-verity: add UAPI header
  fs-verity: add MAINTAINERS file entry
  fs-verity: add a documentation file
  ext4: fix kernel oops caused by spurious casefold flag
  ext4: fix coverity warning on error path of filename setup
  ext4: optimize case-insensitive lookups
  ext4: fix dcache lookup of !casefolded directories
  unicode: update to Unicode 12.1.0 final
  unicode: add missing check for an error return from utf8lookup()
  ext4: export /sys/fs/ext4/feature/casefold if Unicode support is present
  unicode: refactor the rule for regenerating utf8data.h
  ext4: Support case-insensitive file name lookups
  ext4: include charset encoding information in the superblock
  unicode: update unicode database unicode version 12.1.0
  unicode: introduce test module for normalized utf8 implementation
  unicode: implement higher level API for string handling
  unicode: reduce the size of utf8data[]
  unicode: introduce code for UTF-8 normalization
  unicode: introduce UTF-8 character database
  ext4 crypto: fix to check feature status before get policy
  fscrypt: document the new ioctls and policy version
  ubifs: wire up new fscrypt ioctls
  f2fs: wire up new fscrypt ioctls
  ext4: wire up new fscrypt ioctls
  fscrypt: require that key be added when setting a v2 encryption policy
  fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS ioctl
  fscrypt: allow unprivileged users to add/remove keys for v2 policies
  fscrypt: v2 encryption policy support
  fscrypt: add an HKDF-SHA512 implementation
  fscrypt: add FS_IOC_GET_ENCRYPTION_KEY_STATUS ioctl
  fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl
  fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl
  fscrypt: rename keyinfo.c to keysetup.c
  fscrypt: move v1 policy key setup to keysetup_v1.c
  fscrypt: refactor key setup code in preparation for v2 policies
  fscrypt: rename fscrypt_master_key to fscrypt_direct_key
  fscrypt: add ->ci_inode to fscrypt_info
  fscrypt: use FSCRYPT_* definitions, not FS_*
  fscrypt: use FSCRYPT_ prefix for uapi constants
  fs, fscrypt: move uapi definitions to new header <linux/fscrypt.h>
  fscrypt: use ENOPKG when crypto API support missing
  fscrypt: improve warnings for missing crypto API support
  fscrypt: improve warning messages for unsupported encryption contexts
  fscrypt: make fscrypt_msg() take inode instead of super_block
  fscrypt: clean up base64 encoding/decoding
  fscrypt: remove loadable module related code

 Conflicts:
	fs/ext4/ioctl.c
	fs/ext4/readpage.c

Bug: 141329812
Change-Id: I2e10c22a7c52982d073ac6897cc8aa4d5a811a38
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2019-10-07 13:29:05 -07:00
Eric Biggers
806d34a384 fs-verity: add inode and superblock fields
Analogous to fs/crypto/, add fields to the VFS inode and superblock for
use by the fs/verity/ support layer:

- ->s_vop: points to the fsverity_operations if the filesystem supports
  fs-verity, otherwise is NULL.

- ->i_verity_info: points to cached fs-verity information for the inode
  after someone opens it, otherwise is NULL.

- S_VERITY: bit in ->i_flags that identifies verity inodes, even when
  they haven't been opened yet and thus still have NULL ->i_verity_info.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-09-23 14:11:55 -07:00
Gabriel Krisman Bertazi
ad2d18a3b9 ext4: Support case-insensitive file name lookups
This patch implements the actual support for case-insensitive file name
lookups in ext4, based on the feature bit and the encoding stored in the
superblock.

A filesystem that has the casefold feature set is able to configure
directories with the +F (EXT4_CASEFOLD_FL) attribute, enabling lookups
to succeed in that directory in a case-insensitive fashion, i.e: match
a directory entry even if the name used by userspace is not a byte per
byte match with the disk name, but is an equivalent case-insensitive
version of the Unicode string.  This operation is called a
case-insensitive file name lookup.

The feature is configured as an inode attribute applied to directories
and inherited by its children.  This attribute can only be enabled on
empty directories for filesystems that support the encoding feature,
thus preventing collision of file names that only differ by case.

* dcache handling:

For a +F directory, Ext4 only stores the first equivalent name dentry
used in the dcache. This is done to prevent unintentional duplication of
dentries in the dcache, while also allowing the VFS code to quickly find
the right entry in the cache despite which equivalent string was used in
a previous lookup, without having to resort to ->lookup().

d_hash() of casefolded directories is implemented as the hash of the
casefolded string, such that we always have a well-known bucket for all
the equivalencies of the same string. d_compare() uses the
utf8_strncasecmp() infrastructure, which handles the comparison of
equivalent, same case, names as well.

For now, negative lookups are not inserted in the dcache, since they
would need to be invalidated anyway, because we can't trust missing file
dentries.  This is bad for performance but requires some leveraging of
the vfs layer to fix.  We can live without that for now, and so does
everyone else.

* on-disk data:

Despite using a specific version of the name as the internal
representation within the dcache, the name stored and fetched from the
disk is a byte-per-byte match with what the user requested, making this
implementation 'name-preserving'. i.e. no actual information is lost
when writing to storage.

DX is supported by modifying the hashes used in +F directories to make
them case/encoding-aware.  The new disk hashes are calculated as the
hash of the full casefolded string, instead of the string directly.
This allows us to efficiently search for file names in the htree without
requiring the user to provide an exact name.

* Dealing with invalid sequences:

By default, when a invalid UTF-8 sequence is identified, ext4 will treat
it as an opaque byte sequence, ignoring the encoding and reverting to
the old behavior for that unique file.  This means that case-insensitive
file name lookup will not work only for that file.  An optional bit can
be set in the superblock telling the filesystem code and userspace tools
to enforce the encoding.  When that optional bit is set, any attempt to
create a file name using an invalid UTF-8 sequence will fail and return
an error to userspace.

* Normalization algorithm:

The UTF-8 algorithms used to compare strings in ext4 is implemented
lives in fs/unicode, and is based on a previous version developed by
SGI.  It implements the Canonical decomposition (NFD) algorithm
described by the Unicode specification 12.1, or higher, combined with
the elimination of ignorable code points (NFDi) and full
case-folding (CF) as documented in fs/unicode/utf8_norm.c.

NFD seems to be the best normalization method for EXT4 because:

  - It has a lower cost than NFC/NFKC (which requires
    decomposing to NFD as an intermediary step)
  - It doesn't eliminate important semantic meaning like
    compatibility decompositions.

Although:

  - This implementation is not completely linguistic accurate, because
  different languages have conflicting rules, which would require the
  specialization of the filesystem to a given locale, which brings all
  sorts of problems for removable media and for users who use more than
  one language.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-09-23 13:23:29 -07:00
Eric Biggers
9846255919 fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl
Add a new fscrypt ioctl, FS_IOC_ADD_ENCRYPTION_KEY.  This ioctl adds an
encryption key to the filesystem's fscrypt keyring ->s_master_keys,
making any files encrypted with that key appear "unlocked".

Why we need this
~~~~~~~~~~~~~~~~

The main problem is that the "locked/unlocked" (ciphertext/plaintext)
status of encrypted files is global, but the fscrypt keys are not.
fscrypt only looks for keys in the keyring(s) the process accessing the
filesystem is subscribed to: the thread keyring, process keyring, and
session keyring, where the session keyring may contain the user keyring.

Therefore, userspace has to put fscrypt keys in the keyrings for
individual users or sessions.  But this means that when a process with a
different keyring tries to access encrypted files, whether they appear
"unlocked" or not is nondeterministic.  This is because it depends on
whether the files are currently present in the inode cache.

Fixing this by consistently providing each process its own view of the
filesystem depending on whether it has the key or not isn't feasible due
to how the VFS caches work.  Furthermore, while sometimes users expect
this behavior, it is misguided for two reasons.  First, it would be an
OS-level access control mechanism largely redundant with existing access
control mechanisms such as UNIX file permissions, ACLs, LSMs, etc.
Encryption is actually for protecting the data at rest.

Second, almost all users of fscrypt actually do need the keys to be
global.  The largest users of fscrypt, Android and Chromium OS, achieve
this by having PID 1 create a "session keyring" that is inherited by
every process.  This works, but it isn't scalable because it prevents
session keyrings from being used for any other purpose.

On general-purpose Linux distros, the 'fscrypt' userspace tool [1] can't
similarly abuse the session keyring, so to make 'sudo' work on all
systems it has to link all the user keyrings into root's user keyring
[2].  This is ugly and raises security concerns.  Moreover it can't make
the keys available to system services, such as sshd trying to access the
user's '~/.ssh' directory (see [3], [4]) or NetworkManager trying to
read certificates from the user's home directory (see [5]); or to Docker
containers (see [6], [7]).

By having an API to add a key to the *filesystem* we'll be able to fix
the above bugs, remove userspace workarounds, and clearly express the
intended semantics: the locked/unlocked status of an encrypted directory
is global, and encryption is orthogonal to OS-level access control.

Why not use the add_key() syscall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We use an ioctl for this API rather than the existing add_key() system
call because the ioctl gives us the flexibility needed to implement
fscrypt-specific semantics that will be introduced in later patches:

- Supporting key removal with the semantics such that the secret is
  removed immediately and any unused inodes using the key are evicted;
  also, the eviction of any in-use inodes can be retried.

- Calculating a key-dependent cryptographic identifier and returning it
  to userspace.

- Allowing keys to be added and removed by non-root users, but only keys
  for v2 encryption policies; and to prevent denial-of-service attacks,
  users can only remove keys they themselves have added, and a key is
  only really removed after all users who added it have removed it.

Trying to shoehorn these semantics into the keyrings syscalls would be
very difficult, whereas the ioctls make things much easier.

However, to reuse code the implementation still uses the keyrings
service internally.  Thus we get lockless RCU-mode key lookups without
having to re-implement it, and the keys automatically show up in
/proc/keys for debugging purposes.

References:

    [1] https://github.com/google/fscrypt
    [2] https://goo.gl/55cCrI#heading=h.vf09isp98isb
    [3] https://github.com/google/fscrypt/issues/111#issuecomment-444347939
    [4] https://github.com/google/fscrypt/issues/116
    [5] https://bugs.launchpad.net/ubuntu/+source/fscrypt/+bug/1770715
    [6] https://github.com/google/fscrypt/issues/128
    [7] https://askubuntu.com/questions/1130306/cannot-run-docker-on-an-encrypted-filesystem

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-09-23 13:21:54 -07:00
Jaegeuk Kim
e6f3ddbf65 Merge remote-tracking branch 'origin/upstream-f2fs-stable-linux-4.19.y' into android-4.19
* origin/upstream-f2fs-stable-linux-4.19.y:
  f2fs: use EINVAL for superblock with invalid magic
  f2fs: fix to read source block before invalidating it
  f2fs: remove redundant check from f2fs_setflags_common()
  f2fs: use generic checking function for FS_IOC_FSSETXATTR
  f2fs: use generic checking and prep function for FS_IOC_SETFLAGS
  ubifs, fscrypt: cache decrypted symlink target in ->i_link
  vfs: use READ_ONCE() to access ->i_link
  fs, fscrypt: clear DCACHE_ENCRYPTED_NAME when unaliasing directory
  fscrypt: cache decrypted symlink target in ->i_link
  fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext
  fscrypt: only set dentry_operations on ciphertext dentries
  fscrypt: fix race allowing rename() and link() of ciphertext dentries
  fscrypt: clean up and improve dentry revalidation
  fscrypt: use READ_ONCE() to access ->i_crypt_info
  fscrypt: remove WARN_ON_ONCE() when decryption fails
  fscrypt: drop inode argument from fscrypt_get_ctx()
  f2fs: improve print log in f2fs_sanity_check_ckpt()
  f2fs: avoid out-of-range memory access
  f2fs: fix to avoid long latency during umount
  f2fs: allow all the users to pin a file
  f2fs: support swap file w/ DIO
  f2fs: allocate blocks for pinned file
  f2fs: fix is_idle() check for discard type
  f2fs: add a rw_sem to cover quota flag changes
  f2fs: set SBI_NEED_FSCK for xattr corruption case
  f2fs: use generic EFSBADCRC/EFSCORRUPTED
  f2fs: Use DIV_ROUND_UP() instead of open-coding
  f2fs: print kernel message if filesystem is inconsistent
  f2fs: introduce f2fs_<level> macros to wrap f2fs_printk()
  f2fs: avoid get_valid_blocks() for cleanup
  f2fs: ioctl for removing a range from F2FS
  f2fs: only set project inherit bit for directory
  f2fs: separate f2fs i_flags from fs_flags and ext4 i_flags
  f2fs: Add option to limit required GC for checkpoint=disable
  f2fs: Fix accounting for unusable blocks
  f2fs: Fix root reserved on remount
  f2fs: Lower threshold for disable_cp_again
  f2fs: fix sparse warning
  f2fs: fix f2fs_show_options to show nodiscard mount option
  f2fs: add error prints for debugging mount failure
  f2fs: fix to do sanity check on segment bitmap of LFS curseg
  f2fs: add missing sysfs entries in documentation
  f2fs: fix to avoid deadloop if data_flush is on
  f2fs: always assume that the device is idle under gc_urgent
  f2fs: add bio cache for IPU
  f2fs: allow ssr block allocation during checkpoint=disable period
  f2fs: fix to check layout on last valid checkpoint park

Change-Id: Ie910f127f574c2115e5b9a6725461ce002c267be
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2019-07-31 21:34:30 -07:00
Eric Biggers
e831418555 f2fs: use generic checking function for FS_IOC_FSSETXATTR
Make the f2fs implementation of FS_IOC_FSSETXATTR use the new VFS helper
function vfs_ioc_fssetxattr_check(), and remove the project quota check
since it's now done by the helper function.

This is based on a patch from Darrick Wong, but reworked to apply after
commit 360985573b55 ("f2fs: separate f2fs i_flags from fs_flags and ext4
i_flags").

Originally-from: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-30 13:43:58 -07:00
Eric Biggers
f62199a95c f2fs: use generic checking and prep function for FS_IOC_SETFLAGS
Make the f2fs implementation of FS_IOC_SETFLAGS use the new VFS helper
function vfs_ioc_setflags_prepare().

This is based on a patch from Darrick Wong, but reworked to apply after
commit 360985573b55 ("f2fs: separate f2fs i_flags from fs_flags and ext4
i_flags").

Originally-from: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-30 13:40:59 -07:00
Greg Kroah-Hartman
f232ce65ba Merge 4.19.62 into android-4.19
Changes in 4.19.62
	bnx2x: Prevent load reordering in tx completion processing
	caif-hsi: fix possible deadlock in cfhsi_exit_module()
	hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
	igmp: fix memory leak in igmpv3_del_delrec()
	ipv4: don't set IPv6 only flags to IPv4 addresses
	ipv6: rt6_check should return NULL if 'from' is NULL
	ipv6: Unlink sibling route in case of failure
	net: bcmgenet: use promisc for unsupported filters
	net: dsa: mv88e6xxx: wait after reset deactivation
	net: make skb_dst_force return true when dst is refcounted
	net: neigh: fix multiple neigh timer scheduling
	net: openvswitch: fix csum updates for MPLS actions
	net: phy: sfp: hwmon: Fix scaling of RX power
	net: stmmac: Re-work the queue selection for TSO packets
	nfc: fix potential illegal memory access
	r8169: fix issue with confused RX unit after PHY power-down on RTL8411b
	rxrpc: Fix send on a connected, but unbound socket
	sctp: fix error handling on stream scheduler initialization
	sky2: Disable MSI on ASUS P6T
	tcp: be more careful in tcp_fragment()
	tcp: fix tcp_set_congestion_control() use from bpf hook
	tcp: Reset bytes_acked and bytes_received when disconnecting
	vrf: make sure skb->data contains ip header to make routing
	net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn
	macsec: fix use-after-free of skb during RX
	macsec: fix checksumming after decryption
	netrom: fix a memory leak in nr_rx_frame()
	netrom: hold sock when setting skb->destructor
	net_sched: unset TCQ_F_CAN_BYPASS when adding filters
	net/tls: make sure offload also gets the keys wiped
	sctp: not bind the socket in sctp_connect
	net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling
	net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query
	net: bridge: don't cache ether dest pointer on input
	net: bridge: stp: don't cache eth dest pointer before skb pull
	dma-buf: balance refcount inbalance
	dma-buf: Discard old fence_excl on retrying get_fences_rcu for realloc
	gpio: davinci: silence error prints in case of EPROBE_DEFER
	MIPS: lb60: Fix pin mappings
	perf/core: Fix exclusive events' grouping
	perf/core: Fix race between close() and fork()
	ext4: don't allow any modifications to an immutable file
	ext4: enforce the immutable flag on open files
	mm: add filemap_fdatawait_range_keep_errors()
	jbd2: introduce jbd2_inode dirty range scoping
	ext4: use jbd2_inode dirty range scoping
	ext4: allow directory holes
	KVM: nVMX: do not use dangling shadow VMCS after guest reset
	KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
	mm: vmscan: scan anonymous pages on file refaults
	net: sched: verify that q!=NULL before setting q->flags
	Linux 4.19.62

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2eb23bda9d5294a5c874fe4f403934fd99e84661
2019-07-28 08:43:04 +02:00
Ross Zwisler
4becd6c11e mm: add filemap_fdatawait_range_keep_errors()
commit aa0bfcd939c30617385ffa28682c062d78050eba upstream.

In the spirit of filemap_fdatawait_range() and
filemap_fdatawait_keep_errors(), introduce
filemap_fdatawait_range_keep_errors() which both takes a range upon
which to wait and does not clear errors from the address space.

Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28 08:29:29 +02:00
Jaegeuk Kim
701ca1f5e2 Merge remote-tracking branch 'origin/upstream-f2fs-stable-linux-4.19.y' into android-4.19
* origin/upstream-f2fs-stable-linux-4.19.y:
  fscrypt: remove filesystem specific build config option
  f2fs: use IS_ENCRYPTED() to check encryption status
  ext4: use IS_ENCRYPTED() to check encryption status
  fscrypt: return -EXDEV for incompatible rename or link into encrypted dir
  fscrypt: remove CRYPTO_CTR dependency
  fscrypt: add Adiantum support
  crypto: speck - remove Speck

 Conflicts:
	arch/arm/crypto/Kconfig
	arch/arm/crypto/Makefile
	crypto/testmgr.h

Change-Id: I1a6d1e35c857c4117190388b4797d0c11a109cf0
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2019-05-08 15:13:46 -07:00
Greg Kroah-Hartman
44c5f03127 Merge 4.19.41 into android-4.19
Changes in 4.19.41
	iwlwifi: fix driver operation for 5350
	mwifiex: Make resume actually do something useful again on SDIO cards
	mac80211: don't attempt to rename ERR_PTR() debugfs dirs
	i2c: synquacer: fix enumeration of slave devices
	i2c: imx: correct the method of getting private data in notifier_call
	i2c: Remove unnecessary call to irq_find_mapping
	i2c: Clear client->irq in i2c_device_remove
	i2c: Allow recovery of the initial IRQ by an I2C client device.
	i2c: Prevent runtime suspend of adapter when Host Notify is required
	ALSA: hda/realtek - Add new Dell platform for headset mode
	ALSA: hda/realtek - Fixed Dell AIO speaker noise
	ALSA: hda/realtek - Apply the fixup for ASUS Q325UAR
	USB: yurex: Fix protection fault after device removal
	USB: w1 ds2490: Fix bug caused by improper use of altsetting array
	USB: dummy-hcd: Fix failure to give back unlinked URBs
	usb: usbip: fix isoc packet num validation in get_pipe
	USB: core: Fix unterminated string returned by usb_string()
	USB: core: Fix bug caused by duplicate interface PM usage counter
	nvme-loop: init nvmet_ctrl fatal_err_work when allocate
	efi: Fix debugobjects warning on 'efi_rts_work'
	arm64: dts: rockchip: fix rk3328-roc-cc gmac2io tx/rx_delay
	HID: logitech: check the return value of create_singlethread_workqueue
	HID: debug: fix race condition with between rdesc_show() and device removal
	rtc: cros-ec: Fail suspend/resume if wake IRQ can't be configured
	rtc: sh: Fix invalid alarm warning for non-enabled alarm
	batman-adv: Reduce claim hash refcnt only for removed entry
	batman-adv: Reduce tt_local hash refcnt only for removed entry
	batman-adv: Reduce tt_global hash refcnt only for removed entry
	batman-adv: fix warning in function batadv_v_elp_get_throughput
	ARM: dts: rockchip: Fix gpu opp node names for rk3288
	reset: meson-audio-arb: Fix missing .owner setting of reset_controller_dev
	igb: Fix WARN_ONCE on runtime suspend
	riscv: fix accessing 8-byte variable from RV32
	HID: quirks: Fix keyboard + touchpad on Lenovo Miix 630
	net: hns3: fix compile error
	net/mlx5: E-Switch, Fix esw manager vport indication for more vport commands
	bonding: show full hw address in sysfs for slave entries
	net: stmmac: use correct DMA buffer size in the RX descriptor
	net: stmmac: ratelimit RX error logs
	net: stmmac: don't stop NAPI processing when dropping a packet
	net: stmmac: don't overwrite discard_frame status
	net: stmmac: fix dropping of multi-descriptor RX frames
	net: stmmac: don't log oversized frames
	jffs2: fix use-after-free on symlink traversal
	debugfs: fix use-after-free on symlink traversal
	mfd: twl-core: Disable IRQ while suspended
	block: use blk_free_flush_queue() to free hctx->fq in blk_mq_init_hctx
	rtc: da9063: set uie_unsupported when relevant
	HID: input: add mapping for Assistant key
	vfio/pci: use correct format characters
	scsi: core: add new RDAC LENOVO/DE_Series device
	scsi: storvsc: Fix calculation of sub-channel count
	arm/mach-at91/pm : fix possible object reference leak
	arm64: fix wrong check of on_sdei_stack in nmi context
	net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()
	net: hns: Use NAPI_POLL_WEIGHT for hns driver
	net: hns: Fix probabilistic memory overwrite when HNS driver initialized
	net: hns: fix ICMP6 neighbor solicitation messages discard problem
	net: hns: Fix WARNING when remove HNS driver with SMMU enabled
	libcxgb: fix incorrect ppmax calculation
	KVM: SVM: prevent DBG_DECRYPT and DBG_ENCRYPT overflow
	kmemleak: powerpc: skip scanning holes in the .bss section
	hugetlbfs: fix memory leak for resv_map
	sh: fix multiple function definition build errors
	xsysace: Fix error handling in ace_setup
	fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
	ARM: orion: don't use using 64-bit DMA masks
	ARM: iop: don't use using 64-bit DMA masks
	block: pass no-op callback to INIT_WORK().
	perf/x86/amd: Update generic hardware cache events for Family 17h
	Bluetooth: btusb: request wake pin with NOAUTOEN
	Bluetooth: mediatek: fix up an error path to restore bdev->tx_state
	clk: qcom: Add missing freq for usb30_master_clk on 8998
	staging: iio: adt7316: allow adt751x to use internal vref for all dacs
	staging: iio: adt7316: fix the dac read calculation
	staging: iio: adt7316: fix the dac write calculation
	scsi: RDMA/srpt: Fix a credit leak for aborted commands
	ASoC: Intel: bytcr_rt5651: Revert "Fix DMIC map headsetmic mapping"
	ASoC: wm_adsp: Correct handling of compressed streams that restart
	ASoC: stm32: fix sai driver name initialisation
	platform/x86: intel_pmc_core: Fix PCH IP name
	platform/x86: intel_pmc_core: Handle CFL regmap properly
	IB/core: Unregister notifier before freeing MAD security
	IB/core: Fix potential memory leak while creating MAD agents
	IB/core: Destroy QP if XRC QP fails
	Input: snvs_pwrkey - initialize necessary driver data before enabling IRQ
	Input: stmfts - acknowledge that setting brightness is a blocking call
	gpio: mxc: add check to return defer probe if clock tree NOT ready
	selinux: avoid silent denials in permissive mode under RCU walk
	selinux: never allow relabeling on context mounts
	mac80211: Honor SW_CRYPTO_CONTROL for unicast keys in AP VLAN mode
	powerpc/mm/hash: Handle mmap_min_addr correctly in get_unmapped_area topdown search
	x86/mce: Improve error message when kernel cannot recover, p2
	clk: x86: Add system specific quirk to mark clocks as critical
	x86/mm/KASLR: Fix the size of the direct mapping section
	x86/mm: Fix a crash with kmemleak_scan()
	x86/mm/tlb: Revert "x86/mm: Align TLB invalidation info"
	i2c: i2c-stm32f7: Fix SDADEL minimum formula
	media: v4l2: i2c: ov7670: Fix PLL bypass register values
	ASoC: wm_adsp: Check for buffer in trigger stop
	mm/kmemleak.c: fix unused-function warning
	Linux 4.19.41

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-05-08 07:39:48 +02:00
Kirill Smelkov
04b4d5f75a fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
[ Upstream commit 10dce8af34226d90fa56746a934f8da5dcdba3df ]

Commit 9c225f2655 ("vfs: atomic f_pos accesses as per POSIX") added
locking for file.f_pos access and in particular made concurrent read and
write not possible - now both those functions take f_pos lock for the
whole run, and so if e.g. a read is blocked waiting for data, write will
deadlock waiting for that read to complete.

This caused regression for stream-like files where previously read and
write could run simultaneously, but after that patch could not do so
anymore. See e.g. commit 581d21a2d0 ("xenbus: fix deadlock on writes
to /proc/xen/xenbus") which fixes such regression for particular case of
/proc/xen/xenbus.

The patch that added f_pos lock in 2014 did so to guarantee POSIX thread
safety for read/write/lseek and added the locking to file descriptors of
all regular files. In 2014 that thread-safety problem was not new as it
was already discussed earlier in 2006.

However even though 2006'th version of Linus's patch was adding f_pos
locking "only for files that are marked seekable with FMODE_LSEEK (thus
avoiding the stream-like objects like pipes and sockets)", the 2014
version - the one that actually made it into the tree as 9c225f2655 -
is doing so irregardless of whether a file is seekable or not.

See

    https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/
    https://lwn.net/Articles/180387
    https://lwn.net/Articles/180396

for historic context.

The reason that it did so is, probably, that there are many files that
are marked non-seekable, but e.g. their read implementation actually
depends on knowing current position to correctly handle the read. Some
examples:

	kernel/power/user.c		snapshot_read
	fs/debugfs/file.c		u32_array_read
	fs/fuse/control.c		fuse_conn_waiting_read + ...
	drivers/hwmon/asus_atk0110.c	atk_debugfs_ggrp_read
	arch/s390/hypfs/inode.c		hypfs_read_iter
	...

Despite that, many nonseekable_open users implement read and write with
pure stream semantics - they don't depend on passed ppos at all. And for
those cases where read could wait for something inside, it creates a
situation similar to xenbus - the write could be never made to go until
read is done, and read is waiting for some, potentially external, event,
for potentially unbounded time -> deadlock.

Besides xenbus, there are 14 such places in the kernel that I've found
with semantic patch (see below):

	drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write()
	drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write()
	drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write()
	drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write()
	net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write()
	drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write()
	drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write()
	drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write()
	net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write()
	drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write()
	drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write()
	drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write()
	drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write()
	drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()

In addition to the cases above another regression caused by f_pos
locking is that now FUSE filesystems that implement open with
FOPEN_NONSEEKABLE flag, can no longer implement bidirectional
stream-like files - for the same reason as above e.g. read can deadlock
write locking on file.f_pos in the kernel.

FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f7 ("fuse:
implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp
in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and
write routines not depending on current position at all, and with both
read and write being potentially blocking operations:

See

    https://github.com/libfuse/osspd
    https://lwn.net/Articles/308445

    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406
    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477
    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510

Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as
"somewhat pipe-like files ..." with read handler not using offset.
However that test implements only read without write and cannot exercise
the deadlock scenario:

    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131
    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163
    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216

I've actually hit the read vs write deadlock for real while implementing
my FUSE filesystem where there is /head/watch file, for which open
creates separate bidirectional socket-like stream in between filesystem
and its user with both read and write being later performed
simultaneously. And there it is semantically not easy to split the
stream into two separate read-only and write-only channels:

    https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169

Let's fix this regression. The plan is:

1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS -
   doing so would break many in-kernel nonseekable_open users which
   actually use ppos in read/write handlers.

2. Add stream_open() to kernel to open stream-like non-seekable file
   descriptors. Read and write on such file descriptors would never use
   nor change ppos. And with that property on stream-like files read and
   write will be running without taking f_pos lock - i.e. read and write
   could be running simultaneously.

3. With semantic patch search and convert to stream_open all in-kernel
   nonseekable_open users for which read and write actually do not
   depend on ppos and where there is no other methods in file_operations
   which assume @offset access.

4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via
   steam_open if that bit is present in filesystem open reply.

   It was tempting to change fs/fuse/ open handler to use stream_open
   instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but
   grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
   and in particular GVFS which actually uses offset in its read and
   write handlers

	https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481

   so if we would do such a change it will break a real user.

5. Add stream_open and FOPEN_STREAM handling to stable kernels starting
   from v3.14+ (the kernel where 9c225f2655 first appeared).

   This will allow to patch OSSPD and other FUSE filesystems that
   provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE
   in their open handler and this way avoid the deadlock on all kernel
   versions. This should work because fs/fuse/ ignores unknown open
   flags returned from a filesystem and so passing FOPEN_STREAM to a
   kernel that is not aware of this flag cannot hurt. In turn the kernel
   that is not aware of FOPEN_STREAM will be < v3.14 where just
   FOPEN_NONSEEKABLE is sufficient to implement streams without read vs
   write deadlock.

This patch adds stream_open, converts /proc/xen/xenbus to it and adds
semantic patch to automatically locate in-kernel places that are either
required to be converted due to read vs write deadlock, or that are just
safe to be converted because read and write do not use ppos and there
are no other funky methods in file_operations.

Regarding semantic patch I've verified each generated change manually -
that it is correct to convert - and each other nonseekable_open instance
left - that it is either not correct to convert there, or that it is not
converted due to current stream_open.cocci limitations.

The script also does not convert files that should be valid to convert,
but that currently have .llseek = noop_llseek or generic_file_llseek for
unknown reason despite file being opened with nonseekable_open (e.g.
drivers/input/mousedev.c)

Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yongzhi Pan <panyongzhi@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Nikolaus Rath <Nikolaus@rath.org>
Cc: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-08 07:21:51 +02:00
Greg Kroah-Hartman
5e7b4fbe36 Merge 4.19.38 into android-4.19
Changes in 4.19.38
	netfilter: nft_compat: use refcnt_t type for nft_xt reference count
	netfilter: nft_compat: make lists per netns
	netfilter: nf_tables: split set destruction in deactivate and destroy phase
	netfilter: nft_compat: destroy function must not have side effects
	netfilter: nf_tables: warn when expr implements only one of activate/deactivate
	netfilter: nf_tables: unbind set in rule from commit path
	netfilter: nft_compat: don't use refcount_inc on newly allocated entry
	netfilter: nft_compat: use .release_ops and remove list of extension
	netfilter: nf_tables: fix set double-free in abort path
	netfilter: nf_tables: bogus EBUSY when deleting set after flush
	netfilter: nf_tables: bogus EBUSY in helper removal from transaction
	net/ibmvnic: Fix RTNL deadlock during device reset
	net: mvpp2: fix validate for PPv2.1
	ext4: fix some error pointer dereferences
	tipc: handle the err returned from cmd header function
	loop: do not print warn message if partition scan is successful
	drm/rockchip: fix for mailbox read validation.
	vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock
	ipvs: fix warning on unused variable
	powerpc/vdso32: fix CLOCK_MONOTONIC on PPC64
	ALSA: hda/ca0132 - Fix build error without CONFIG_PCI
	net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework
	cifs: fix memory leak in SMB2_read
	cifs: do not attempt cifs operation on smb2+ rename error
	tracing: Fix a memory leak by early error exit in trace_pid_write()
	tracing: Fix buffer_ref pipe ops
	gpio: eic: sprd: Fix incorrect irq type setting for the sync EIC
	zram: pass down the bvec we need to read into in the work struct
	lib/Kconfig.debug: fix build error without CONFIG_BLOCK
	MIPS: scall64-o32: Fix indirect syscall number load
	trace: Fix preempt_enable_no_resched() abuse
	IB/rdmavt: Fix frwr memory registration
	RDMA/mlx5: Do not allow the user to write to the clock page
	sched/numa: Fix a possible divide-by-zero
	ceph: only use d_name directly when parent is locked
	ceph: ensure d_name stability in ceph_dentry_hash()
	ceph: fix ci->i_head_snapc leak
	nfsd: Don't release the callback slot unless it was actually held
	sunrpc: don't mark uninitialised items as VALID.
	perf/x86/intel: Update KBL Package C-state events to also include PC8/PC9/PC10 counters
	Input: synaptics-rmi4 - write config register values to the right offset
	vfio/type1: Limit DMA mappings per container
	dmaengine: sh: rcar-dmac: With cyclic DMA residue 0 is valid
	dmaengine: sh: rcar-dmac: Fix glitch in dmaengine_tx_status
	ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache
	powerpc/mm/radix: Make Radix require HUGETLB_PAGE
	drm/vc4: Fix memory leak during gpu reset.
	Revert "drm/i915/fbdev: Actually configure untiled displays"
	drm/vc4: Fix compilation error reported by kbuild test bot
	USB: Add new USB LPM helpers
	USB: Consolidate LPM checks to avoid enabling LPM twice
	slip: make slhc_free() silently accept an error pointer
	intel_th: gth: Fix an off-by-one in output unassigning
	fs/proc/proc_sysctl.c: Fix a NULL pointer dereference
	workqueue: Try to catch flush_work() without INIT_WORK().
	binder: fix handling of misaligned binder object
	sched/deadline: Correctly handle active 0-lag timers
	NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
	netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON
	fm10k: Fix a potential NULL pointer dereference
	tipc: check bearer name with right length in tipc_nl_compat_bearer_enable
	tipc: check link name with right length in tipc_nl_compat_link_set
	net: netrom: Fix error cleanup path of nr_proto_init
	net/rds: Check address length before reading address family
	rxrpc: fix race condition in rxrpc_input_packet()
	aio: clear IOCB_HIPRI
	aio: use assigned completion handler
	aio: separate out ring reservation from req allocation
	aio: don't zero entire aio_kiocb aio_get_req()
	aio: use iocb_put() instead of open coding it
	aio: split out iocb copy from io_submit_one()
	aio: abstract out io_event filler helper
	aio: initialize kiocb private in case any filesystems expect it.
	aio: simplify - and fix - fget/fput for io_submit()
	pin iocb through aio.
	aio: fold lookup_kiocb() into its sole caller
	aio: keep io_event in aio_kiocb
	aio: store event at final iocb_put()
	Fix aio_poll() races
	x86, retpolines: Raise limit for generating indirect calls from switch-case
	x86/retpolines: Disable switch jump tables when retpolines are enabled
	mm: Fix warning in insert_pfn()
	x86/fpu: Don't export __kernel_fpu_{begin,end}()
	ipv4: add sanity checks in ipv4_link_failure()
	ipv4: set the tcp_min_rtt_wlen range from 0 to one day
	mlxsw: spectrum: Fix autoneg status in ethtool
	net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query
	net: rds: exchange of 8K and 1M pool
	net/rose: fix unbound loop in rose_loopback_timer()
	net: stmmac: move stmmac_check_ether_addr() to driver probe
	net/tls: fix refcount adjustment in fallback
	stmmac: pci: Adjust IOT2000 matching
	team: fix possible recursive locking when add slaves
	net: hns: Fix WARNING when hns modules installed
	mlxsw: pci: Reincrease PCI reset timeout
	mlxsw: spectrum: Put MC TCs into DWRR mode
	net/mlx5e: Fix the max MTU check in case of XDP
	net/mlx5e: Fix use-after-free after xdp_return_frame
	net/tls: avoid potential deadlock in tls_set_device_offload_rx()
	net/tls: don't leak IV and record seq when offload fails
	powerpc/fsl: Add FSL_PPC_BOOK3E as supported arch for nospectre_v2 boot arg
	Linux 4.19.38

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-05-02 10:13:34 +02:00
Linus Torvalds
d6b2615f7d aio: simplify - and fix - fget/fput for io_submit()
commit 84c4e1f89fefe70554da0ab33be72c9be7994379 upstream.

Al Viro root-caused a race where the IOCB_CMD_POLL handling of
fget/fput() could cause us to access the file pointer after it had
already been freed:

 "In more details - normally IOCB_CMD_POLL handling looks so:

   1) io_submit(2) allocates aio_kiocb instance and passes it to
      aio_poll()

   2) aio_poll() resolves the descriptor to struct file by req->file =
      fget(iocb->aio_fildes)

   3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that
      aio_kiocb to 2 (bumps by 1, that is).

   4) aio_poll() calls vfs_poll(). After sanity checks (basically,
      "poll_wait() had been called and only once") it locks the queue.
      That's what the extra reference to iocb had been for - we know we
      can safely access it.

   5) With queue locked, we check if ->woken has already been set to
      true (by aio_poll_wake()) and, if it had been, we unlock the
      queue, drop a reference to aio_kiocb and bugger off - at that
      point it's a responsibility to aio_poll_wake() and the stuff
      called/scheduled by it. That code will drop the reference to file
      in req->file, along with the other reference to our aio_kiocb.

   6) otherwise, we see whether we need to wait. If we do, we unlock the
      queue, drop one reference to aio_kiocb and go away - eventual
      wakeup (or cancel) will deal with the reference to file and with
      the other reference to aio_kiocb

   7) otherwise we remove ourselves from waitqueue (still under the
      queue lock), so that wakeup won't get us. No async activity will
      be happening, so we can safely drop req->file and iocb ourselves.

  If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb
  won't get freed under us, so we can do all the checks and locking
  safely. And we don't touch ->file if we detect that case.

  However, vfs_poll() most certainly *does* touch the file it had been
  given. So wakeup coming while we are still in ->poll() might end up
  doing fput() on that file. That case is not too rare, and usually we
  are saved by the still present reference from descriptor table - that
  fput() is not the final one.

  But if another thread closes that descriptor right after our fget()
  and wakeup does happen before ->poll() returns, we are in trouble -
  final fput() done while we are in the middle of a method:

Al also wrote a patch to take an extra reference to the file descriptor
to fix this, but I instead suggested we just streamline the whole file
pointer handling by submit_io() so that the generic aio submission code
simply keeps the file pointer around until the aio has completed.

Fixes: bfe4037e72 ("aio: implement IOCB_CMD_POLL")
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 09:58:58 +02:00
Chandan Rajendra
ad8ceb03fd fscrypt: remove filesystem specific build config option
In order to have a common code base for fscrypt "post read" processing
for all filesystems which support encryption, this commit removes
filesystem specific build config option (e.g. CONFIG_EXT4_FS_ENCRYPTION)
and replaces it with a build option (i.e. CONFIG_FS_ENCRYPTION) whose
value affects all the filesystems making use of fscrypt.

Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-04-04 09:52:02 -07:00
Daniel Rosenberg
e81cea2a6f ANDROID: vfs: Add permission2 for filesystems with per mount permissions
This allows filesystems to use their mount private data to
influence the permssions they return in permission2. It has
been separated into a new call to avoid disrupting current
permission users.

Bug: 35848445
Bug: 120446149
Change-Id: I9d416e3b8b6eca84ef3e336bd2af89ddd51df6ca
Signed-off-by: Daniel Rosenberg <drosen@google.com>
[AmitP: Minor refactoring of original patch to align with
        changes from the following upstream commit
        4bfd054ae1 ("fs: fold __inode_permission() into inode_permission()").
        Also introduce vfs_mkobj2(), because do_create()
        moved from using vfs_create() to vfs_mkobj()
        eecec19d9e ("mqueue: switch to vfs_mkobj(), quit abusing ->d_fsdata")
        do_create() is dropped/cleaned-up upstream so a
        minor refactoring there as well.
        066cc813e9 ("do_mq_open(): move all work prior to dentry_open() into a helper")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[astrachan: Folded the following changes into this patch:
            f46c9d62dd81 ("ANDROID: fs: Export vfs_rmdir2")
            9992eb8b9a1e ("ANDROID: xattr: Pass EOPNOTSUPP to permission2")]
Signed-off-by: Alistair Strachan <astrachan@google.com>
2018-12-05 09:48:14 -08:00
Daniel Rosenberg
74cca90e7d ANDROID: vfs: Add setattr2 for filesystems with per mount permissions
This allows filesystems to use their mount private data to
influence the permssions they use in setattr2. It has
been separated into a new call to avoid disrupting current
setattr users.

Bug: 120446149
Change-Id: I19959038309284448f1b7f232d579674ef546385
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2018-12-05 09:48:13 -08:00
Daniel Rosenberg
8fd4c18e0b ANDROID: vfs: Allow filesystems to access their private mount data
Now we pass the vfsmount when mounting and remounting.
This allows the filesystem to actually set up the mount
specific data, although we can't quite do anything with
it yet. show_options is expanded to include data that
lives with the mount.

To avoid changing existing filesystems, these have
been added as new vfs functions.

Bug: 120446149
Change-Id: If80670bfad9f287abb8ac22457e1b034c9697097
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2018-12-05 09:48:13 -08:00
Daniel Rosenberg
a618d31ac8 ANDROID: mnt: Add filesystem private data to mount points
This starts to add private data associated directly
to mount points. The intent is to give filesystems
a sense of where they have come from, as a means of
letting a filesystem take different actions based on
this information.

Bug: 62094374
Bug: 120446149
Change-Id: Ie769d7b3bb2f5972afe05c1bf16cf88c91647ab2
Signed-off-by: Daniel Rosenberg <drosen@google.com>
[astrachan: Folded 89a54ed3bf68 ("ANDROID: mnt: Fix next_descendent")
            into this patch]
Signed-off-by: Alistair Strachan <astrachan@google.com>
2018-12-05 09:48:13 -08:00
Jan Kara
778af261c5 fsnotify: Fix busy inodes during unmount
commit 721fb6fbfd2132164c2e8777cc837f9b2c1794dc upstream.

Detaching of mark connector from fsnotify_put_mark() can race with
unmounting of the filesystem like:

  CPU1				CPU2
fsnotify_put_mark()
  spin_lock(&conn->lock);
  ...
  inode = fsnotify_detach_connector_from_object(conn)
  spin_unlock(&conn->lock);
				generic_shutdown_super()
				  fsnotify_unmount_inodes()
				    sees connector detached for inode
				      -> nothing to do
				  evict_inode()
				    barfs on pending inode reference
  iput(inode);

Resulting in "Busy inodes after unmount" message and possible kernel
oops. Make fsnotify_unmount_inodes() properly wait for outstanding inode
references from detached connectors.

Note that the accounting of outstanding inode references in the
superblock can cause some cacheline contention on the counter. OTOH it
happens only during deletion of the last notification mark from an inode
(or during unlinking of watched inode) and that is not too bad. I have
measured time to create & delete inotify watch 100000 times from 64
processes in parallel (each process having its own inotify group and its
own file on a shared superblock) on a 64 CPU machine. Average and
standard deviation of 15 runs look like:

	Avg		Stddev
Vanilla	9.817400	0.276165
Fixed	9.710467	0.228294

So there's no statistically significant difference.

Fixes: 6b3f05d24d ("fsnotify: Detach mark from object list when last reference is dropped")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13 11:08:50 -08:00
Amir Goldstein
a725356b66 vfs: swap names of {do,vfs}_clone_file_range()
Commit 031a072a0b ("vfs: call vfs_clone_file_range() under freeze
protection") created a wrapper do_clone_file_range() around
vfs_clone_file_range() moving the freeze protection to former, so
overlayfs could call the latter.

The more common vfs practice is to call do_xxx helpers from vfs_xxx
helpers, where freeze protecction is taken in the vfs_xxx helper, so
this anomality could be a source of confusion.

It seems that commit 8ede205541 ("ovl: add reflink/copyfile/dedup
support") may have fallen a victim to this confusion -
ovl_clone_file_range() calls the vfs_clone_file_range() helper in the
hope of getting freeze protection on upper fs, but in fact results in
overlayfs allowing to bypass upper fs freeze protection.

Swap the names of the two helpers to conform to common vfs practice
and call the correct helpers from overlayfs and nfsd.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-24 10:54:01 +02:00
Amir Goldstein
45cd0faae3 vfs: add the fadvise() file operation
This is going to be used by overlayfs and possibly useful
for other filesystems.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-08-30 17:08:35 +02:00
Salvatore Mesoraca
30aba6656f namei: allow restricted O_CREAT of FIFOs and regular files
Disallows open of FIFOs or regular files not owned by the user in world
writable sticky directories, unless the owner is the same as that of the
directory or the file is opened without the O_CREAT flag.  The purpose
is to make data spoofing attacks harder.  This protection can be turned
on and off separately for FIFOs and regular files via sysctl, just like
the symlinks/hardlinks protection.  This patch is based on Openwall's
"HARDEN_FIFO" feature by Solar Designer.

This is a brief list of old vulnerabilities that could have been prevented
by this feature, some of them even allow for privilege escalation:

CVE-2000-1134
CVE-2007-3852
CVE-2008-0525
CVE-2009-0416
CVE-2011-4834
CVE-2015-1838
CVE-2015-7442
CVE-2016-7489

This list is not meant to be complete.  It's difficult to track down all
vulnerabilities of this kind because they were often reported without any
mention of this particular attack vector.  In fact, before
hardlinks/symlinks restrictions, fifos/regular files weren't the favorite
vehicle to exploit them.

[s.mesoraca16@gmail.com: fix bug reported by Dan Carpenter]
  Link: https://lkml.kernel.org/r/20180426081456.GA7060@mwanda
  Link: http://lkml.kernel.org/r/1524829819-11275-1-git-send-email-s.mesoraca16@gmail.com
[keescook@chromium.org: drop pr_warn_ratelimited() in favor of audit changes in the future]
[keescook@chromium.org: adjust commit subjet]
Link: http://lkml.kernel.org/r/20180416175918.GA13494@beast
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Suggested-by: Solar Designer <solar@openwall.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23 18:48:43 -07:00
Linus Torvalds
d9a185f8b4 Merge tag 'ovl-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
 "This contains two new features:

   - Stack file operations: this allows removal of several hacks from
     the VFS, proper interaction of read-only open files with copy-up,
     possibility to implement fs modifying ioctls properly, and others.

   - Metadata only copy-up: when file is on lower layer and only
     metadata is modified (except size) then only copy up the metadata
     and continue to use the data from the lower file"

* tag 'ovl-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (66 commits)
  ovl: Enable metadata only feature
  ovl: Do not do metacopy only for ioctl modifying file attr
  ovl: Do not do metadata only copy-up for truncate operation
  ovl: add helper to force data copy-up
  ovl: Check redirect on index as well
  ovl: Set redirect on upper inode when it is linked
  ovl: Set redirect on metacopy files upon rename
  ovl: Do not set dentry type ORIGIN for broken hardlinks
  ovl: Add an inode flag OVL_CONST_INO
  ovl: Treat metacopy dentries as type OVL_PATH_MERGE
  ovl: Check redirects for metacopy files
  ovl: Move some dir related ovl_lookup_single() code in else block
  ovl: Do not expose metacopy only dentry from d_real()
  ovl: Open file with data except for the case of fsync
  ovl: Add helper ovl_inode_realdata()
  ovl: Store lower data inode in ovl_inode
  ovl: Fix ovl_getattr() to get number of blocks from lower
  ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry
  ovl: Copy up meta inode data from lowest data inode
  ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
  ...
2018-08-21 18:19:09 -07:00
Jens Axboe
74c8164e1c mpage: mpage_readpages() should submit IO as read-ahead
a_ops->readpages() is only ever used for read-ahead, yet we don't flag
the IO being submitted as such.  Fix that up.  Any file system that uses
mpage_readpages() as its ->readpages() implementation will now get this
right.

Since we're passing in whether the IO is read-ahead or not, we don't
need to pass in the 'gfp' separately, as it is dependent on the IO being
read-ahead.  Kill off that member.

Add some documentation notes on ->readpages() being purely for
read-ahead.

Link: http://lkml.kernel.org/r/20180621010725.17813-3-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <clm@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:29 -07:00
NeilBrown
4cdfffc872 vfs: discard ATTR_ATTR_FLAG
This flag was introduce in 2.1.37pre1 and the only place it was tested
was removed in 2.1.43pre1.  The flag was never set.

Let's discard it properly.

Link: http://lkml.kernel.org/r/877en0hewz.fsf@notabene.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:28 -07:00
Linus Torvalds
4591343e35 Merge branches 'work.misc' and 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Misc cleanups from various folks all over the place

  I expected more fs/dcache.c cleanups this cycle, so that went into a
  separate branch. Said cleanups have missed the window, so in the
  hindsight it could've gone into work.misc instead. Decided not to
  cherry-pick, thus the 'work.dcache' branch"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: dcache: Use true and false for boolean values
  fold generic_readlink() into its only caller
  fs: shave 8 bytes off of struct inode
  fs: Add more kernel-doc to the produced documentation
  fs: Fix attr.c kernel-doc
  removed extra extern file_fdatawait_range

* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  kill dentry_update_name_case()
2018-08-13 21:28:25 -07:00
Linus Torvalds
0ea97a2d61 Merge branch 'work.mkdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs icache updates from Al Viro:

 - NFS mkdir/open_by_handle race fix

 - analogous solution for FUSE, replacing the one currently in mainline

 - new primitive to be used when discarding halfway set up inodes on
   failed object creation; gives sane warranties re icache lookups not
   returning such doomed by still not freed inodes. A bunch of
   filesystems switched to that animal.

 - Miklos' fix for last cycle regression in iget5_locked(); -stable will
   need a slightly different variant, unfortunately.

 - misc bits and pieces around things icache-related (in adfs and jfs).

* 'work.mkdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  jfs: don't bother with make_bad_inode() in ialloc()
  adfs: don't put inodes into icache
  new helper: inode_fake_hash()
  vfs: don't evict uninitialized inode
  jfs: switch to discard_new_inode()
  ext2: make sure that partially set up inodes won't be returned by ext2_iget()
  udf: switch to discard_new_inode()
  ufs: switch to discard_new_inode()
  btrfs: switch to discard_new_inode()
  new primitive: discard_new_inode()
  kill d_instantiate_no_diralias()
  nfs_instantiate(): prevent multiple aliases for directory inode
2018-08-13 20:25:58 -07:00
Linus Torvalds
a66b4cd1e7 Merge branch 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs open-related updates from Al Viro:

 - "do we need fput() or put_filp()" rules are gone - it's always fput()
   now. We keep track of that state where it belongs - in ->f_mode.

 - int *opened mess killed - in finish_open(), in ->atomic_open()
   instances and in fs/namei.c code around do_last()/lookup_open()/atomic_open().

 - alloc_file() wrappers with saner calling conventions are introduced
   (alloc_file_clone() and alloc_file_pseudo()); callers converted, with
   much simplification.

 - while we are at it, saner calling conventions for path_init() and
   link_path_walk(), simplifying things inside fs/namei.c (both on
   open-related paths and elsewhere).

* 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
  few more cleanups of link_path_walk() callers
  allow link_path_walk() to take ERR_PTR()
  make path_init() unconditionally paired with terminate_walk()
  document alloc_file() changes
  make alloc_file() static
  do_shmat(): grab shp->shm_file earlier, switch to alloc_file_clone()
  new helper: alloc_file_clone()
  create_pipe_files(): switch the first allocation to alloc_file_pseudo()
  anon_inode_getfile(): switch to alloc_file_pseudo()
  hugetlb_file_setup(): switch to alloc_file_pseudo()
  ocxlflash_getfile(): switch to alloc_file_pseudo()
  cxl_getfile(): switch to alloc_file_pseudo()
  ... and switch shmem_file_setup() to alloc_file_pseudo()
  __shmem_file_setup(): reorder allocations
  new wrapper: alloc_file_pseudo()
  kill FILE_{CREATED,OPENED}
  switch atomic_open() and lookup_open() to returning 0 in all success cases
  document ->atomic_open() changes
  ->atomic_open(): return 0 in all success cases
  get rid of 'opened' in path_openat() and the helpers downstream
  ...
2018-08-13 19:58:36 -07:00
Al Viro
5bef915104 new helper: inode_fake_hash()
open-coded in a quite a few places...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-03 16:03:32 -04:00
Al Viro
c2b6d621c4 new primitive: discard_new_inode()
We don't want open-by-handle picking half-set-up in-core
struct inode from e.g. mkdir() having failed halfway through.
In other words, we don't want such inodes returned by iget_locked()
on their way to extinction.  However, we can't just have them
unhashed - otherwise open-by-handle immediately *after* that would've
ended up creating a new in-core inode over the on-disk one that
is in process of being freed right under us.

	Solution: new flag (I_CREATING) set by insert_inode_locked() and
removed by unlock_new_inode() and a new primitive (discard_new_inode())
to be used by such halfway-through-setup failure exits instead of
unlock_new_inode() / iput() combinations.  That primitive unlocks new
inode, but leaves I_CREATING in place.

	iget_locked() treats finding an I_CREATING inode as failure
(-ESTALE, once we sort out the error propagation).
	insert_inode_locked() treats the same as instant -EBUSY.
	ilookup() treats those as icache miss.

[Fix by Dan Carpenter <dan.carpenter@oracle.com> folded in]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-03 15:55:30 -04:00
Linus Torvalds
165ea0d1c2 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Fix several places that screw up cleanups after failures halfway
  through opening a file (one open-coding filp_clone_open() and getting
  it wrong, two misusing alloc_file()). That part is -stable fodder from
  the 'work.open' branch.

  And Christoph's regression fix for uapi breakage in aio series;
  include/uapi/linux/aio_abi.h shouldn't be pulling in the kernel
  definition of sigset_t, the reason for doing so in the first place had
  been bogus - there's no need to expose struct __aio_sigset in
  aio_abi.h at all"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  aio: don't expose __aio_sigset in uapi
  ocxlflash_getfile(): fix double-iput() on alloc_file() failures
  cxl_getfile(): fix double-iput() on alloc_file() failures
  drm_mode_create_lease_ioctl(): fix open-coded filp_clone_open()
2018-07-22 12:04:51 -07:00
Miklos Szeredi
fb16043b46 vfs: remove open_flags from d_real()
Opening regular files on overlayfs is now handled via ovl_open().  Remove
the now unused "open_flags" argument from d_op->d_real() and the d_real()
helper.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-18 15:44:44 +02:00