Commit Graph

207 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
27e286f3db Merge 4.19.262 into android-4.19-stable
Changes in 4.19.262
	Makefile.extrawarn: Move -Wcast-function-type-strict to W=1
	docs: update mediator information in CoC docs
	ARM: fix function graph tracer and unwinder dependencies
	fs: fix UAF/GPF bug in nilfs_mdt_destroy
	firmware: arm_scmi: Add SCMI PM driver remove routine
	dmaengine: xilinx_dma: cleanup for fetching xlnx,num-fstores property
	dmaengine: xilinx_dma: Report error in case of dma_set_mask_and_coherent API failure
	ARM: dts: fix Moxa SDIO 'compatible', remove 'sdhci' misnomer
	scsi: qedf: Fix a UAF bug in __qedf_probe()
	net/ieee802154: fix uninit value bug in dgram_sendmsg
	um: Cleanup syscall_handler_t cast in syscalls_32.h
	um: Cleanup compiler warning in arch/x86/um/tls_32.c
	usb: mon: make mmapped memory read only
	USB: serial: ftdi_sio: fix 300 bps rate for SIO
	mmc: core: Replace with already defined values for readability
	mmc: core: Terminate infinite loop in SD-UHS voltage switch
	rpmsg: qcom: glink: replace strncpy() with strscpy_pad()
	nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()
	nilfs2: fix leak of nilfs_root in case of writer thread creation failure
	nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
	ceph: don't truncate file in atomic_open
	random: clamp credited irq bits to maximum mixed
	ALSA: hda: Fix position reporting on Poulsbo
	scsi: stex: Properly zero out the passthrough command structure
	USB: serial: qcserial: add new usb-id for Dell branded EM7455
	random: restore O_NONBLOCK support
	random: avoid reading two cache lines on irq randomness
	random: use expired timer rather than wq for mixing fast pool
	wifi: mac80211_hwsim: avoid mac80211 warning on bad rate
	Input: xpad - add supported devices as contributed on github
	Input: xpad - fix wireless 360 controller breaking after suspend
	ALSA: oss: Fix potential deadlock at unregistration
	ALSA: rawmidi: Drop register_mutex in snd_rawmidi_free()
	ALSA: usb-audio: Fix potential memory leaks
	ALSA: usb-audio: Fix NULL dererence at error path
	ALSA: hda/realtek: remove ALC289_FIXUP_DUAL_SPK for Dell 5530
	mtd: rawnand: atmel: Unmap streaming DMA mappings
	iio: dac: ad5593r: Fix i2c read protocol requirements
	usb: add quirks for Lenovo OneLink+ Dock
	can: kvaser_usb: Fix use of uninitialized completion
	can: kvaser_usb_leaf: Fix overread with an invalid command
	can: kvaser_usb_leaf: Fix TX queue out of sync after restart
	can: kvaser_usb_leaf: Fix CAN state after restart
	fs: dlm: fix race between test_bit() and queue_work()
	fs: dlm: handle -EBUSY first in lock arg validation
	HID: multitouch: Add memory barriers
	quota: Check next/prev free block number after reading from quota file
	regulator: qcom_rpm: Fix circular deferral regression
	Revert "fs: check FMODE_LSEEK to control internal pipe splicing"
	parisc: fbdev/stifb: Align graphics memory size to 4MB
	riscv: Allow PROT_WRITE-only mmap()
	UM: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
	PCI: Sanitise firmware BAR assignments behind a PCI-PCI bridge
	fbdev: smscufx: Fix use-after-free in ufx_ops_open()
	btrfs: fix race between quota enable and quota rescan ioctl
	riscv: fix build with binutils 2.38
	nilfs2: fix use-after-free bug of struct nilfs_root
	ext4: avoid crash when inline data creation follows DIO write
	ext4: fix null-ptr-deref in ext4_write_info
	ext4: make ext4_lazyinit_thread freezable
	ext4: place buffer head allocation before handle start
	livepatch: fix race between fork and KLP transition
	ftrace: Properly unset FTRACE_HASH_FL_MOD
	ring-buffer: Allow splice to read previous partially read pages
	ring-buffer: Check pending waiters when doing wake ups as well
	ring-buffer: Fix race between reset page and reading page
	KVM: x86/emulator: Fix handing of POP SS to correctly set interruptibility
	KVM: nVMX: Unconditionally purge queued/injected events on nested "exit"
	selinux: use "grep -E" instead of "egrep"
	sh: machvec: Use char[] for section boundaries
	wifi: ath10k: add peer map clean up for peer delete in ath10k_sta_state()
	wifi: mac80211: allow bw change during channel switch in mesh
	bpftool: Fix a wrong type cast in btf_dumper_int
	spi: mt7621: Fix an error message in mt7621_spi_probe()
	wifi: rtl8xxxu: tighten bounds checking in rtl8xxxu_read_efuse()
	spi: qup: add missing clk_disable_unprepare on error in spi_qup_resume()
	spi: qup: add missing clk_disable_unprepare on error in spi_qup_pm_resume_runtime()
	wifi: rtl8xxxu: Fix skb misuse in TX queue selection
	bpf: btf: fix truncated last_member_type_id in btf_struct_resolve
	wifi: rtl8xxxu: gen2: Fix mistake in path B IQ calibration
	net: fs_enet: Fix wrong check in do_pd_setup
	bpf: Ensure correct locking around vulnerable function find_vpid()
	spi/omap100k:Fix PM disable depth imbalance in omap1_spi100k_probe
	netfilter: nft_fib: Fix for rpath check with VRF devices
	spi: s3c64xx: Fix large transfers with DMA
	vhost/vsock: Use kvmalloc/kvfree for larger packets.
	mISDN: fix use-after-free bugs in l1oip timer handlers
	sctp: handle the error returned from sctp_auth_asoc_init_active_key
	tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited
	net: rds: don't hold sock lock when cancelling work from rds_tcp_reset_callbacks()
	bnx2x: fix potential memory leak in bnx2x_tpa_stop()
	once: add DO_ONCE_SLOW() for sleepable contexts
	net: mvpp2: fix mvpp2 debugfs leak
	drm: bridge: adv7511: fix CEC power down control register offset
	drm/mipi-dsi: Detach devices when removing the host
	platform/chrome: fix double-free in chromeos_laptop_prepare()
	platform/x86: msi-laptop: Fix old-ec check for backlight registering
	platform/x86: msi-laptop: Fix resource cleanup
	drm/bridge: megachips: Fix a null pointer dereference bug
	mmc: au1xmmc: Fix an error handling path in au1xmmc_probe()
	ASoC: eureka-tlv320: Hold reference returned from of_find_xxx API
	drm/msm/dpu: index dpu_kms->hw_vbif using vbif_idx
	ALSA: dmaengine: increment buffer pointer atomically
	mmc: wmt-sdmmc: Fix an error handling path in wmt_mci_probe()
	ASoC: wm8997: Fix PM disable depth imbalance in wm8997_probe
	ASoC: wm5110: Fix PM disable depth imbalance in wm5110_probe
	ASoC: wm5102: Fix PM disable depth imbalance in wm5102_probe
	memory: of: Fix refcount leak bug in of_get_ddr_timings()
	soc: qcom: smsm: Fix refcount leak bugs in qcom_smsm_probe()
	soc: qcom: smem_state: Add refcounting for the 'state->of_node'
	ARM: dts: turris-omnia: Fix mpp26 pin name and comment
	ARM: dts: kirkwood: lsxl: fix serial line
	ARM: dts: kirkwood: lsxl: remove first ethernet port
	ARM: dts: exynos: correct s5k6a3 reset polarity on Midas family
	ARM: Drop CMDLINE_* dependency on ATAGS
	ARM: dts: exynos: fix polarity of VBUS GPIO of Origen
	iio: adc: at91-sama5d2_adc: fix AT91_SAMA5D2_MR_TRACKTIM_MAX
	iio: adc: at91-sama5d2_adc: check return status for pressure and touch
	iio: inkern: only release the device node when done with it
	iio: ABI: Fix wrong format of differential capacitance channel ABI.
	clk: oxnas: Hold reference returned by of_get_parent()
	clk: berlin: Add of_node_put() for of_get_parent()
	clk: tegra: Fix refcount leak in tegra210_clock_init
	clk: tegra: Fix refcount leak in tegra114_clock_init
	clk: tegra20: Fix refcount leak in tegra20_clock_init
	HSI: omap_ssi: Fix refcount leak in ssi_probe
	HSI: omap_ssi_port: Fix dma_map_sg error check
	media: exynos4-is: fimc-is: Add of_node_put() when breaking out of loop
	tty: xilinx_uartps: Fix the ignore_status
	media: xilinx: vipp: Fix refcount leak in xvip_graph_dma_init
	RDMA/rxe: Fix "kernel NULL pointer dereference" error
	RDMA/rxe: Fix the error caused by qp->sk
	dyndbg: fix module.dyndbg handling
	dyndbg: let query-modname override actual module name
	mtd: devices: docg3: check the return value of devm_ioremap() in the probe
	ata: fix ata_id_sense_reporting_enabled() and ata_id_has_sense_reporting()
	ata: fix ata_id_has_devslp()
	ata: fix ata_id_has_ncq_autosense()
	ata: fix ata_id_has_dipm()
	md/raid5: Ensure stripe_fill happens on non-read IO with journal
	xhci: Don't show warning for reinit on known broken suspend
	usb: gadget: function: fix dangling pnp_string in f_printer.c
	drivers: serial: jsm: fix some leaks in probe
	phy: qualcomm: call clk_disable_unprepare in the error handling
	staging: vt6655: fix some erroneous memory clean-up loops
	firmware: google: Test spinlock on panic path to avoid lockups
	serial: 8250: Fix restoring termios speed after suspend
	fsi: core: Check error number after calling ida_simple_get
	mfd: intel_soc_pmic: Fix an error handling path in intel_soc_pmic_i2c_probe()
	mfd: fsl-imx25: Fix an error handling path in mx25_tsadc_setup_irq()
	mfd: lp8788: Fix an error handling path in lp8788_probe()
	mfd: lp8788: Fix an error handling path in lp8788_irq_init() and lp8788_irq_init()
	mfd: sm501: Add check for platform_driver_register()
	dmaengine: ioat: stop mod_timer from resurrecting deleted timer in __cleanup()
	spmi: pmic-arb: correct duplicate APID to PPID mapping logic
	clk: bcm2835: fix bcm2835_clock_rate_from_divisor declaration
	clk: ti: dra7-atl: Fix reference leak in of_dra7_atl_clk_probe
	mailbox: bcm-ferxrm-mailbox: Fix error check for dma_map_sg
	powerpc/math_emu/efp: Include module.h
	powerpc/sysdev/fsl_msi: Add missing of_node_put()
	powerpc/pci_dn: Add missing of_node_put()
	powerpc/powernv: add missing of_node_put() in opal_export_attrs()
	x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition
	powerpc/64s: Fix GENERIC_CPU build flags for PPC970 / G5
	powerpc: Fix SPE Power ISA properties for e500v1 platforms
	iommu/omap: Fix buffer overflow in debugfs
	iommu/iova: Fix module config properly
	crypto: cavium - prevent integer overflow loading firmware
	f2fs: fix race condition on setting FI_NO_EXTENT flag
	ACPI: video: Add Toshiba Satellite/Portege Z830 quirk
	MIPS: BCM47XX: Cast memcmp() of function to (void *)
	powercap: intel_rapl: fix UBSAN shift-out-of-bounds issue
	thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash
	NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
	wifi: brcmfmac: fix invalid address access when enabling SCAN log level
	openvswitch: Fix double reporting of drops in dropwatch
	openvswitch: Fix overreporting of drops in dropwatch
	tcp: annotate data-race around tcp_md5sig_pool_populated
	wifi: ath9k: avoid uninit memory read in ath9k_htc_rx_msg()
	xfrm: Update ipcomp_scratches with NULL when freed
	wifi: brcmfmac: fix use-after-free bug in brcmf_netdev_start_xmit()
	Bluetooth: L2CAP: initialize delayed works at l2cap_chan_create()
	Bluetooth: hci_sysfs: Fix attempting to call device_add multiple times
	can: bcm: check the result of can_send() in bcm_can_tx()
	wifi: rt2x00: don't run Rt5592 IQ calibration on MT7620
	wifi: rt2x00: set correct TX_SW_CFG1 MAC register for MT7620
	wifi: rt2x00: set SoC wmac clock register
	wifi: rt2x00: correctly set BBP register 86 for MT7620
	net: If sock is dead don't access sock's sk_wq in sk_stream_wait_memory
	Bluetooth: L2CAP: Fix user-after-free
	r8152: Rate limit overflow messages
	drm: Use size_t type for len variable in drm_copy_field()
	drm: Prevent drm_copy_field() to attempt copying a NULL pointer
	drm/amd/display: fix overflow on MIN_I64 definition
	drm/vc4: vec: Fix timings for VEC modes
	drm: panel-orientation-quirks: Add quirk for Anbernic Win600
	platform/x86: msi-laptop: Change DMI match / alias strings to fix module autoloading
	drm/amdgpu: fix initial connector audio value
	ARM: dts: imx7d-sdb: config the max pressure for tsc2046
	ARM: dts: imx6q: add missing properties for sram
	ARM: dts: imx6dl: add missing properties for sram
	ARM: dts: imx6qp: add missing properties for sram
	ARM: dts: imx6sl: add missing properties for sram
	ARM: dts: imx6sll: add missing properties for sram
	ARM: dts: imx6sx: add missing properties for sram
	media: cx88: Fix a null-ptr-deref bug in buffer_prepare()
	scsi: 3w-9xxx: Avoid disabling device if failing to enable it
	nbd: Fix hung when signal interrupts nbd_start_device_ioctl()
	power: supply: adp5061: fix out-of-bounds read in adp5061_get_chg_type()
	staging: vt6655: fix potential memory leak
	ata: libahci_platform: Sanity check the DT child nodes number
	HID: roccat: Fix use-after-free in roccat_read()
	md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d
	usb: host: xhci: Fix potential memory leak in xhci_alloc_stream_info()
	usb: musb: Fix musb_gadget.c rxstate overflow bug
	Revert "usb: storage: Add quirk for Samsung Fit flash"
	nvme: copy firmware_rev on each init
	usb: idmouse: fix an uninit-value in idmouse_open
	clk: bcm2835: Make peripheral PLLC critical
	perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc
	net: ieee802154: return -EINVAL for unknown addr type
	net/ieee802154: don't warn zero-sized raw_sendmsg()
	ext4: continue to expand file system when the target size doesn't reach
	md: Replace snprintf with scnprintf
	efi: libstub: drop pointless get_memory_map() call
	inet: fully convert sk->sk_rx_dst to RCU rules
	thermal: intel_powerclamp: Use first online CPU as control_cpu
	gcov: support GCC 12.1 and newer compilers
	Linux 4.19.262

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: If70223b939e3710c4fbc4f7cc522f07d4b4ffd45
2022-10-30 16:23:17 +01:00
Jan Kara
59b108630a ext4: avoid crash when inline data creation follows DIO write
commit 4bb26f2885ac6930984ee451b952c5a6042f2c0e upstream.

When inode is created and written to using direct IO, there is nothing
to clear the EXT4_STATE_MAY_INLINE_DATA flag. Thus when inode gets
truncated later to say 1 byte and written using normal write, we will
try to store the data as inline data. This confuses the code later
because the inode now has both normal block and inline data allocated
and the confusion manifests for example as:

kernel BUG at fs/ext4/inode.c:2721!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 359 Comm: repro Not tainted 5.19.0-rc8-00001-g31ba1e3b8305-dirty #15
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
RIP: 0010:ext4_writepages+0x363d/0x3660
RSP: 0018:ffffc90000ccf260 EFLAGS: 00010293
RAX: ffffffff81e1abcd RBX: 0000008000000000 RCX: ffff88810842a180
RDX: 0000000000000000 RSI: 0000008000000000 RDI: 0000000000000000
RBP: ffffc90000ccf650 R08: ffffffff81e17d58 R09: ffffed10222c680b
R10: dfffe910222c680c R11: 1ffff110222c680a R12: ffff888111634128
R13: ffffc90000ccf880 R14: 0000008410000000 R15: 0000000000000001
FS:  00007f72635d2640(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000565243379180 CR3: 000000010aa74000 CR4: 0000000000150eb0
Call Trace:
 <TASK>
 do_writepages+0x397/0x640
 filemap_fdatawrite_wbc+0x151/0x1b0
 file_write_and_wait_range+0x1c9/0x2b0
 ext4_sync_file+0x19e/0xa00
 vfs_fsync_range+0x17b/0x190
 ext4_buffered_write_iter+0x488/0x530
 ext4_file_write_iter+0x449/0x1b90
 vfs_write+0xbcd/0xf40
 ksys_write+0x198/0x2c0
 __x64_sys_write+0x7b/0x90
 do_syscall_64+0x3d/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
 </TASK>

Fix the problem by clearing EXT4_STATE_MAY_INLINE_DATA when we are doing
direct IO write to a file.

Cc: stable@kernel.org
Reported-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Reported-by: syzbot+bd13648a53ed6933ca49@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=a1e89d09bbbcbd5c4cb45db230ee28c822953984
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Tested-by: Tadeusz Struk<tadeusz.struk@linaro.org>
Link: https://lore.kernel.org/r/20220727155753.13969-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-26 13:19:22 +02:00
Greg Kroah-Hartman
4dc4199770 Merge 4.19.106 into android-4.19
Changes in 4.19.106
	core: Don't skip generic XDP program execution for cloned SKBs
	enic: prevent waking up stopped tx queues over watchdog reset
	net/smc: fix leak of kernel memory to user space
	net: dsa: tag_qca: Make sure there is headroom for tag
	net/sched: matchall: add missing validation of TCA_MATCHALL_FLAGS
	net/sched: flower: add missing validation of TCA_FLOWER_FLAGS
	Revert "KVM: nVMX: Use correct root level for nested EPT shadow page tables"
	Revert "KVM: VMX: Add non-canonical check on writes to RTIT address MSRs"
	KVM: nVMX: Use correct root level for nested EPT shadow page tables
	drm/gma500: Fixup fbdev stolen size usage evaluation
	cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order
	brcmfmac: Fix use after free in brcmf_sdio_readframes()
	leds: pca963x: Fix open-drain initialization
	ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAIT
	ALSA: ctl: allow TLV read operation for callback type of element in locked case
	gianfar: Fix TX timestamping with a stacked DSA driver
	pinctrl: sh-pfc: sh7264: Fix CAN function GPIOs
	pxa168fb: Fix the function used to release some memory in an error handling path
	media: i2c: mt9v032: fix enum mbus codes and frame sizes
	powerpc/powernv/iov: Ensure the pdn for VFs always contains a valid PE number
	gpio: gpio-grgpio: fix possible sleep-in-atomic-context bugs in grgpio_irq_map/unmap()
	iommu/vt-d: Fix off-by-one in PASID allocation
	char/random: silence a lockdep splat with printk()
	media: sti: bdisp: fix a possible sleep-in-atomic-context bug in bdisp_device_run()
	pinctrl: baytrail: Do not clear IRQ flags on direct-irq enabled pins
	efi/x86: Map the entire EFI vendor string before copying it
	MIPS: Loongson: Fix potential NULL dereference in loongson3_platform_init()
	sparc: Add .exit.data section.
	uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()
	usb: gadget: udc: fix possible sleep-in-atomic-context bugs in gr_probe()
	usb: dwc2: Fix IN FIFO allocation
	clocksource/drivers/bcm2835_timer: Fix memory leak of timer
	kselftest: Minimise dependency of get_size on C library interfaces
	jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal
	x86/sysfb: Fix check for bad VRAM size
	pwm: omap-dmtimer: Simplify error handling
	s390/pci: Fix possible deadlock in recover_store()
	powerpc/iov: Move VF pdev fixup into pcibios_fixup_iov()
	tracing: Fix tracing_stat return values in error handling paths
	tracing: Fix very unlikely race of registering two stat tracers
	ARM: 8952/1: Disable kmemleak on XIP kernels
	ext4, jbd2: ensure panic when aborting with zero errno
	ath10k: Correct the DMA direction for management tx buffers
	drm/amd/display: Retrain dongles when SINK_COUNT becomes non-zero
	nbd: add a flush_workqueue in nbd_start_device
	KVM: s390: ENOTSUPP -> EOPNOTSUPP fixups
	kconfig: fix broken dependency in randconfig-generated .config
	clk: qcom: rcg2: Don't crash if our parent can't be found; return an error
	drm/amdgpu: remove 4 set but not used variable in amdgpu_atombios_get_connector_info_from_object_table
	drm/amdgpu: Ensure ret is always initialized when using SOC15_WAIT_ON_RREG
	regulator: rk808: Lower log level on optional GPIOs being not available
	net/wan/fsl_ucc_hdlc: reject muram offsets above 64K
	NFC: port100: Convert cpu_to_le16(le16_to_cpu(E1) + E2) to use le16_add_cpu().
	selinux: fall back to ref-walk if audit is required
	arm64: dts: allwinner: H6: Add PMU mode
	arm: dts: allwinner: H3: Add PMU node
	selinux: ensure we cleanup the internal AVC counters on error in avc_insert()
	arm64: dts: qcom: msm8996: Disable USB2 PHY suspend by core
	ARM: dts: imx6: rdu2: Disable WP for USDHC2 and USDHC3
	ARM: dts: imx6: rdu2: Limit USBH1 to Full Speed
	PCI: iproc: Apply quirk_paxc_bridge() for module as well as built-in
	media: cx23885: Add support for AVerMedia CE310B
	PCI: Add generic quirk for increasing D3hot delay
	PCI: Increase D3 delay for AMD Ryzen5/7 XHCI controllers
	media: v4l2-device.h: Explicitly compare grp{id,mask} to zero in v4l2_device macros
	reiserfs: Fix spurious unlock in reiserfs_fill_super() error handling
	r8169: check that Realtek PHY driver module is loaded
	fore200e: Fix incorrect checks of NULL pointer dereference
	netfilter: nft_tunnel: add the missing ERSPAN_VERSION nla_policy
	ALSA: usx2y: Adjust indentation in snd_usX2Y_hwdep_dsp_status
	b43legacy: Fix -Wcast-function-type
	ipw2x00: Fix -Wcast-function-type
	iwlegacy: Fix -Wcast-function-type
	rtlwifi: rtl_pci: Fix -Wcast-function-type
	orinoco: avoid assertion in case of NULL pointer
	ACPICA: Disassembler: create buffer fields in ACPI_PARSE_LOAD_PASS1
	scsi: ufs: Complete pending requests in host reset and restore path
	scsi: aic7xxx: Adjust indentation in ahc_find_syncrate
	drm/mediatek: handle events when enabling/disabling crtc
	ARM: dts: r8a7779: Add device node for ARM global timer
	selinux: ensure we cleanup the internal AVC counters on error in avc_update()
	dmaengine: Store module owner in dma_device struct
	dmaengine: imx-sdma: Fix memory leak
	crypto: chtls - Fixed memory leak
	x86/vdso: Provide missing include file
	PM / devfreq: rk3399_dmc: Add COMPILE_TEST and HAVE_ARM_SMCCC dependency
	pinctrl: sh-pfc: sh7269: Fix CAN function GPIOs
	reset: uniphier: Add SCSSI reset control for each channel
	RDMA/rxe: Fix error type of mmap_offset
	clk: sunxi-ng: add mux and pll notifiers for A64 CPU clock
	ALSA: sh: Fix unused variable warnings
	clk: uniphier: Add SCSSI clock gate for each channel
	ALSA: sh: Fix compile warning wrt const
	tools lib api fs: Fix gcc9 stringop-truncation compilation error
	ACPI: button: Add DMI quirk for Razer Blade Stealth 13 late 2019 lid switch
	mlx5: work around high stack usage with gcc
	drm: remove the newline for CRC source name.
	ARM: dts: stm32: Add power-supply for DSI panel on stm32f469-disco
	usbip: Fix unsafe unaligned pointer usage
	udf: Fix free space reporting for metadata and virtual partitions
	staging: rtl8188: avoid excessive stack usage
	IB/hfi1: Add software counter for ctxt0 seq drop
	soc/tegra: fuse: Correct straps' address for older Tegra124 device trees
	efi/x86: Don't panic or BUG() on non-critical error conditions
	rcu: Use WRITE_ONCE() for assignments to ->pprev for hlist_nulls
	Input: edt-ft5x06 - work around first register access error
	x86/nmi: Remove irq_work from the long duration NMI handler
	wan: ixp4xx_hss: fix compile-testing on 64-bit
	ASoC: atmel: fix build error with CONFIG_SND_ATMEL_SOC_DMA=m
	tty: synclinkmp: Adjust indentation in several functions
	tty: synclink_gt: Adjust indentation in several functions
	visorbus: fix uninitialized variable access
	driver core: platform: Prevent resouce overflow from causing infinite loops
	driver core: Print device when resources present in really_probe()
	bpf: Return -EBADRQC for invalid map type in __bpf_tx_xdp_map
	vme: bridges: reduce stack usage
	drm/nouveau/secboot/gm20b: initialize pointer in gm20b_secboot_new()
	drm/nouveau/gr/gk20a,gm200-: add terminators to method lists read from fw
	drm/nouveau: Fix copy-paste error in nouveau_fence_wait_uevent_handler
	drm/nouveau/drm/ttm: Remove set but not used variable 'mem'
	drm/nouveau/fault/gv100-: fix memory leak on module unload
	drm/vmwgfx: prevent memory leak in vmw_cmdbuf_res_add
	usb: musb: omap2430: Get rid of musb .set_vbus for omap2430 glue
	iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE
	f2fs: set I_LINKABLE early to avoid wrong access by vfs
	f2fs: free sysfs kobject
	scsi: iscsi: Don't destroy session if there are outstanding connections
	arm64: fix alternatives with LLVM's integrated assembler
	drm/amd/display: fixup DML dependencies
	watchdog/softlockup: Enforce that timestamp is valid on boot
	f2fs: fix memleak of kobject
	x86/mm: Fix NX bit clearing issue in kernel_map_pages_in_pgd
	pwm: omap-dmtimer: Remove PWM chip in .remove before making it unfunctional
	cmd64x: potential buffer overflow in cmd64x_program_timings()
	ide: serverworks: potential overflow in svwks_set_pio_mode()
	pwm: Remove set but not set variable 'pwm'
	btrfs: fix possible NULL-pointer dereference in integrity checks
	btrfs: safely advance counter when looking up bio csums
	btrfs: device stats, log when stats are zeroed
	module: avoid setting info->name early in case we can fall back to info->mod->name
	remoteproc: Initialize rproc_class before use
	irqchip/mbigen: Set driver .suppress_bind_attrs to avoid remove problems
	ALSA: hda/hdmi - add retry logic to parse_intel_hdmi()
	kbuild: use -S instead of -E for precise cc-option test in Kconfig
	x86/decoder: Add TEST opcode to Group3-2
	s390: adjust -mpacked-stack support check for clang 10
	s390/ftrace: generate traced function stack frame
	driver core: platform: fix u32 greater or equal to zero comparison
	ALSA: hda - Add docking station support for Lenovo Thinkpad T420s
	drm/nouveau/mmu: fix comptag memory leak
	powerpc/sriov: Remove VF eeh_dev state when disabling SR-IOV
	bcache: cached_dev_free needs to put the sb page
	iommu/vt-d: Remove unnecessary WARN_ON_ONCE()
	selftests: bpf: Reset global state between reuseport test runs
	jbd2: switch to use jbd2_journal_abort() when failed to submit the commit record
	jbd2: make sure ESHUTDOWN to be recorded in the journal superblock
	ARM: 8951/1: Fix Kexec compilation issue.
	hostap: Adjust indentation in prism2_hostapd_add_sta
	iwlegacy: ensure loop counter addr does not wrap and cause an infinite loop
	cifs: fix NULL dereference in match_prepath
	bpf: map_seq_next should always increase position index
	ceph: check availability of mds cluster on mount after wait timeout
	rbd: work around -Wuninitialized warning
	irqchip/gic-v3: Only provision redistributors that are enabled in ACPI
	drm/nouveau/disp/nv50-: prevent oops when no channel method map provided
	ftrace: fpid_next() should increase position index
	trigger_next should increase position index
	radeon: insert 10ms sleep in dce5_crtc_load_lut
	ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans()
	lib/scatterlist.c: adjust indentation in __sg_alloc_table
	reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()
	bcache: explicity type cast in bset_bkey_last()
	irqchip/gic-v3-its: Reference to its_invall_cmd descriptor when building INVALL
	iwlwifi: mvm: Fix thermal zone registration
	microblaze: Prevent the overflow of the start
	brd: check and limit max_part par
	drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_latency
	drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_voltage
	NFS: Fix memory leaks
	help_next should increase position index
	cifs: log warning message (once) if out of disk space
	virtio_balloon: prevent pfn array overflow
	mlxsw: spectrum_dpipe: Add missing error path
	drm/amdgpu/display: handle multiple numbers of fclks in dcn_calcs.c (v2)
	Linux 4.19.106

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia1032b50dd82b42e13973120dcbf94ae7b864648
2020-02-24 09:13:25 +01:00
Ritesh Harjani
428bb08aed ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAIT
[ Upstream commit f629afe3369e9885fd6e9cc7a4f514b6a65cf9e9 ]

Apparently our current rwsem code doesn't like doing the trylock, then
lock for real scheme.  So change our dax read/write methods to just do the
trylock for the RWF_NOWAIT case.
This seems to fix AIM7 regression in some scalable filesystems upto ~25%
in some cases. Claimed in commit 942491c9e6 ("xfs: fix AIM7 regression")

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20191212055557.11151-2-riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:34:35 +01:00
Jaegeuk Kim
c2ad33f029 Merge remote-tracking branch 'aosp/upstream-f2fs-stable-linux-4.19.y' into android-4.19
* aosp/upstream-f2fs-stable-linux-4.19.y:
  f2fs: add a condition to detect overflow in f2fs_ioc_gc_range()
  f2fs: fix to add missing F2FS_IO_ALIGNED() condition
  f2fs: fix to fallback to buffered IO in IO aligned mode
  f2fs: fix to handle error path correctly in f2fs_map_blocks
  f2fs: fix extent corrupotion during directIO in LFS mode
  f2fs: check all the data segments against all node ones
  f2fs: Add a small clarification to CONFIG_FS_F2FS_FS_SECURITY
  f2fs: fix inode rwsem regression
  f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()
  f2fs: avoid infinite GC loop due to stale atomic files
  f2fs: Fix indefinite loop in f2fs_gc()
  f2fs: convert inline_data in prior to i_size_write
  f2fs: fix error path of f2fs_convert_inline_page()
  f2fs: add missing documents of reserve_root/resuid/resgid
  f2fs: fix flushing node pages when checkpoint is disabled
  f2fs: enhance f2fs_is_checkpoint_ready()'s readability
  f2fs: clean up __bio_alloc()'s parameter
  f2fs: fix wrong error injection path in inc_valid_block_count()
  f2fs: fix to writeout dirty inode during node flush
  f2fs: optimize case-insensitive lookups
  f2fs: introduce f2fs_match_name() for cleanup
  f2fs: Fix indefinite loop in f2fs_gc()
  f2fs: allocate memory in batch in build_sit_info()
  f2fs: support FS_IOC_{GET,SET}FSLABEL
  f2fs: fix to avoid data corruption by forbidding SSR overwrite
  f2fs: Fix build error while CONFIG_NLS=m
  Revert "f2fs: avoid out-of-range memory access"
  f2fs: cleanup the code in build_sit_entries.
  f2fs: fix wrong available node count calculation
  f2fs: remove duplicate code in f2fs_file_write_iter
  f2fs: fix to migrate blocks correctly during defragment
  f2fs: use wrapped f2fs_cp_error()
  f2fs: fix to use more generic EOPNOTSUPP
  f2fs: use wrapped IS_SWAPFILE()
  f2fs: Support case-insensitive file name lookups
  f2fs: include charset encoding information in the superblock
  fs: Reserve flag for casefolding
  f2fs: fix to avoid call kvfree under spinlock
  fs: f2fs: Remove unnecessary checks of SM_I(sbi) in update_general_status()
  f2fs: disallow direct IO in atomic write
  f2fs: fix to handle quota_{on,off} correctly
  f2fs: fix to detect cp error in f2fs_setxattr()
  f2fs: fix to spread f2fs_is_checkpoint_ready()
  f2fs: support fiemap() for directory inode
  f2fs: fix to avoid discard command leak
  f2fs: fix to avoid tagging SBI_QUOTA_NEED_REPAIR incorrectly
  f2fs: fix to drop meta/node pages during umount
  f2fs: disallow switching io_bits option during remount
  f2fs: fix panic of IO alignment feature
  f2fs: introduce {page,io}_is_mergeable() for readability
  f2fs: fix livelock in swapfile writes
  f2fs: add fs-verity support
  ext4: update on-disk format documentation for fs-verity
  ext4: add fs-verity read support
  ext4: add basic fs-verity support
  fs-verity: support builtin file signatures
  fs-verity: add SHA-512 support
  fs-verity: implement FS_IOC_MEASURE_VERITY ioctl
  fs-verity: implement FS_IOC_ENABLE_VERITY ioctl
  fs-verity: add data verification hooks for ->readpages()
  fs-verity: add the hook for file ->setattr()
  fs-verity: add the hook for file ->open()
  fs-verity: add inode and superblock fields
  fs-verity: add Kconfig and the helper functions for hashing
  fs: uapi: define verity bit for FS_IOC_GETFLAGS
  fs-verity: add UAPI header
  fs-verity: add MAINTAINERS file entry
  fs-verity: add a documentation file
  ext4: fix kernel oops caused by spurious casefold flag
  ext4: fix coverity warning on error path of filename setup
  ext4: optimize case-insensitive lookups
  ext4: fix dcache lookup of !casefolded directories
  unicode: update to Unicode 12.1.0 final
  unicode: add missing check for an error return from utf8lookup()
  ext4: export /sys/fs/ext4/feature/casefold if Unicode support is present
  unicode: refactor the rule for regenerating utf8data.h
  ext4: Support case-insensitive file name lookups
  ext4: include charset encoding information in the superblock
  unicode: update unicode database unicode version 12.1.0
  unicode: introduce test module for normalized utf8 implementation
  unicode: implement higher level API for string handling
  unicode: reduce the size of utf8data[]
  unicode: introduce code for UTF-8 normalization
  unicode: introduce UTF-8 character database
  ext4 crypto: fix to check feature status before get policy
  fscrypt: document the new ioctls and policy version
  ubifs: wire up new fscrypt ioctls
  f2fs: wire up new fscrypt ioctls
  ext4: wire up new fscrypt ioctls
  fscrypt: require that key be added when setting a v2 encryption policy
  fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS ioctl
  fscrypt: allow unprivileged users to add/remove keys for v2 policies
  fscrypt: v2 encryption policy support
  fscrypt: add an HKDF-SHA512 implementation
  fscrypt: add FS_IOC_GET_ENCRYPTION_KEY_STATUS ioctl
  fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl
  fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl
  fscrypt: rename keyinfo.c to keysetup.c
  fscrypt: move v1 policy key setup to keysetup_v1.c
  fscrypt: refactor key setup code in preparation for v2 policies
  fscrypt: rename fscrypt_master_key to fscrypt_direct_key
  fscrypt: add ->ci_inode to fscrypt_info
  fscrypt: use FSCRYPT_* definitions, not FS_*
  fscrypt: use FSCRYPT_ prefix for uapi constants
  fs, fscrypt: move uapi definitions to new header <linux/fscrypt.h>
  fscrypt: use ENOPKG when crypto API support missing
  fscrypt: improve warnings for missing crypto API support
  fscrypt: improve warning messages for unsupported encryption contexts
  fscrypt: make fscrypt_msg() take inode instead of super_block
  fscrypt: clean up base64 encoding/decoding
  fscrypt: remove loadable module related code

 Conflicts:
	fs/ext4/ioctl.c
	fs/ext4/readpage.c

Bug: 141329812
Change-Id: I2e10c22a7c52982d073ac6897cc8aa4d5a811a38
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2019-10-07 13:29:05 -07:00
Eric Biggers
5ed7b76f84 ext4: add basic fs-verity support
Add most of fs-verity support to ext4.  fs-verity is a filesystem
feature that enables transparent integrity protection and authentication
of read-only files.  It uses a dm-verity like mechanism at the file
level: a Merkle tree is used to verify any block in the file in
log(filesize) time.  It is implemented mainly by helper functions in
fs/verity/.  See Documentation/filesystems/fsverity.rst for the full
documentation.

This commit adds all of ext4 fs-verity support except for the actual
data verification, including:

- Adding a filesystem feature flag and an inode flag for fs-verity.

- Implementing the fsverity_operations to support enabling verity on an
  inode and reading/writing the verity metadata.

- Updating ->write_begin(), ->write_end(), and ->writepages() to support
  writing verity metadata pages.

- Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().

ext4 stores the verity metadata (Merkle tree and fsverity_descriptor)
past the end of the file, starting at the first 64K boundary beyond
i_size.  This approach works because (a) verity files are readonly, and
(b) pages fully beyond i_size aren't visible to userspace but can be
read/written internally by ext4 with only some relatively small changes
to ext4.  This approach avoids having to depend on the EA_INODE feature
and on rearchitecturing ext4's xattr support to support paging
multi-gigabyte xattrs into memory, and to support encrypting xattrs.
Note that the verity metadata *must* be encrypted when the file is,
since it contains hashes of the plaintext data.

This patch incorporates work by Theodore Ts'o and Chandan Rajendra.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-09-23 14:11:56 -07:00
Theodore Ts'o
c9ea4620a3 ext4: enforce the immutable flag on open files
commit 02b016ca7f99229ae6227e7b2fc950c4e140d74a upstream.

According to the chattr man page, "a file with the 'i' attribute
cannot be modified..."  Historically, this was only enforced when the
file was opened, per the rest of the description, "... and the file
can not be opened in write mode".

There is general agreement that we should standardize all file systems
to prevent modifications even for files that were opened at the time
the immutable flag is set.  Eventually, a change to enforce this at
the VFS layer should be landing in mainline.  Until then, enforce this
at the ext4 level to prevent xfstests generic/553 from failing.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28 08:29:29 +02:00
Lukas Czerner
0db24122bd ext4: fix data corruption caused by overlapping unaligned and aligned IO
commit 57a0da28ced8707cb9f79f071a016b9d005caf5a upstream.

Unaligned AIO must be serialized because the zeroing of partial blocks
of unaligned AIO can result in data corruption in case it's overlapping
another in flight IO.

Currently we wait for all unwritten extents before we submit unaligned
AIO which protects data in case of unaligned AIO is following overlapping
IO. However if a unaligned AIO is followed by overlapping aligned AIO we
can still end up corrupting data.

To fix this, we must make sure that the unaligned AIO is the only IO in
flight by waiting for unwritten extents conversion not just before the
IO submission, but right after it as well.

This problem can be reproduced by xfstest generic/538

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22 07:37:44 +02:00
Lukas Czerner
76c9ee6bd5 ext4: fix data corruption caused by unaligned direct AIO
commit 372a03e01853f860560eade508794dd274e9b390 upstream.

Ext4 needs to serialize unaligned direct AIO because the zeroing of
partial blocks of two competing unaligned AIOs can result in data
corruption.

However it decides not to serialize if the potentially unaligned aio is
past i_size with the rationale that no pending writes are possible past
i_size. Unfortunately if the i_size is not block aligned and the second
unaligned write lands past i_size, but still into the same block, it has
the potential of corrupting the previous unaligned write to the same
block.

This is (very simplified) reproducer from Frank

    // 41472 = (10 * 4096) + 512
    // 37376 = 41472 - 4096

    ftruncate(fd, 41472);
    io_prep_pwrite(iocbs[0], fd, buf[0], 4096, 37376);
    io_prep_pwrite(iocbs[1], fd, buf[1], 4096, 41472);

    io_submit(io_ctx, 1, &iocbs[1]);
    io_submit(io_ctx, 1, &iocbs[2]);

    io_getevents(io_ctx, 2, 2, events, NULL);

Without this patch the 512B range from 40960 up to the start of the
second unaligned write (41472) is going to be zeroed overwriting the data
written by the first write. This is a data corruption.

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
*
0000a000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31

With this patch the data corruption is avoided because we will recognize
the unaligned_aio and wait for the unwritten extent conversion.

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
*
0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31
*
0000b200

Reported-by: Frank Sorenson <fsorenso@redhat.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fixes: e9e3bcecf4 ("ext4: serialize unaligned asynchronous DIO")
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-27 14:14:41 +09:00
Dave Jiang
e1fb4a0864 dax: remove VM_MIXEDMAP for fsdax and device dax
This patch is reworked from an earlier patch that Dan has posted:
https://patchwork.kernel.org/patch/10131727/

VM_MIXEDMAP is used by dax to direct mm paths like vm_normal_page() that
the memory page it is dealing with is not typical memory from the linear
map.  The get_user_pages_fast() path, since it does not resolve the vma,
is already using {pte,pmd}_devmap() as a stand-in for VM_MIXEDMAP, so we
use that as a VM_MIXEDMAP replacement in some locations.  In the cases
where there is no pte to consult we fallback to using vma_is_dax() to
detect the VM_MIXEDMAP special case.

Now that we have explicit driver pfn_t-flag opt-in/opt-out for
get_user_pages() support for DAX we can stop setting VM_MIXEDMAP.  This
also means we no longer need to worry about safely manipulating vm_flags
in a future where we support dynamically changing the dax mode of a
file.

DAX should also now be supported with madvise_behavior(), vma_merge(),
and copy_page_range().

This patch has been tested against ndctl unit test.  It has also been
tested against xfstests commit: 625515d using fake pmem created by
memmap and no additional issues have been observed.

Link: http://lkml.kernel.org/r/152847720311.55924.16999195879201817653.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:27 -07:00
Amir Goldstein
db6516a5e7 ext4: do not update s_last_mounted of a frozen fs
If fs is frozen after mount and before the first file open, the
update of s_last_mounted bypasses freeze protection and prints out
a WARNING splat:

$ mount /vdf
$ fsfreeze -f /vdf
$ cat /vdf/foo

[   31.578555] WARNING: CPU: 1 PID: 1415 at
fs/ext4/ext4_jbd2.c:53 ext4_journal_check_start+0x48/0x82

[   31.614016] Call Trace:
[   31.614997]  __ext4_journal_start_sb+0xe4/0x1a4
[   31.616771]  ? ext4_file_open+0xb6/0x189
[   31.618094]  ext4_file_open+0xb6/0x189

If fs is frozen, skip s_last_mounted update.

[backport hint: to apply to stable tree, need to apply also patches
 vfs: add the sb_start_intwrite_trylock() helper
 ext4: factor out helper ext4_sample_last_mounted()]

Cc: stable@vger.kernel.org
Fixes: bc0b0d6d69 ("ext4: update the s_last_mounted field in the superblock")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2018-05-13 22:54:44 -04:00
Amir Goldstein
833a950882 ext4: factor out helper ext4_sample_last_mounted()
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2018-05-13 22:44:23 -04:00
Souptick Joarder
71fe989961 fs: ext4: add new return type vm_fault_t
Use new return type vm_fault_t for fault handler. For now,
this is just documenting that the function returns a
VM_FAULT value rather than an errno. Once all instances are
converted, vm_fault_t will become a distinct type.

commit 1c8f422059 ("mm: change return type to vm_fault_t")

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
2018-05-13 16:01:49 -04:00
Jan Kara
2244642310 ext4: fix ENOSPC handling in DAX page fault handler
When allocation of underlying block for a page fault fails, we fail the
fault with SIGBUS. However we may well hit ENOSPC just due to lots of
free blocks being held by the running / committing transaction. So
propagate the error from ext4_iomap_begin() and implement do standard
allocation retry loop in ext4_dax_huge_fault().

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-01-07 16:41:01 -05:00
Jan Kara
c0b2462597 dax: pass detailed error code from dax_iomap_fault()
Ext4 needs to pass through error from its iomap handler to the page
fault handler so that it can properly detect ENOSPC and force
transaction commit and retry the fault (and block allocation). Add
argument to dax_iomap_fault() for passing such error.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-01-07 16:38:43 -05:00
Linus Torvalds
a3841f94c7 Merge tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm and dax updates from Dan Williams:
 "Save for a few late fixes, all of these commits have shipped in -next
  releases since before the merge window opened, and 0day has given a
  build success notification.

  The ext4 touches came from Jan, and the xfs touches have Darrick's
  reviewed-by. An xfstest for the MAP_SYNC feature has been through
  a few round of reviews and is on track to be merged.

   - Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable
     'userspace flush' of persistent memory updates via filesystem-dax
     mappings. It arranges for any filesystem metadata updates that may
     be required to satisfy a write fault to also be flushed ("on disk")
     before the kernel returns to userspace from the fault handler.
     Effectively every write-fault that dirties metadata completes an
     fsync() before returning from the fault handler. The new
     MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag
     is validated as supported by the filesystem's ->mmap() file
     operation.

   - Add support for the standard ACPI 6.2 label access methods that
     replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods.
     This enables interoperability with environments that only implement
     the standardized methods.

   - Add support for the ACPI 6.2 NVDIMM media error injection methods.

   - Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for
     latch last shutdown status, firmware update, SMART error injection,
     and SMART alarm threshold control.

   - Cleanup physical address information disclosures to be root-only.

   - Fix revalidation of the DIMM "locked label area" status to support
     dynamic unlock of the label area.

   - Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA
     (system-physical-address) command and error injection commands.

  Acknowledgements that came after the commits were pushed to -next:

   - 957ac8c421 ("dax: fix PMD faults on zero-length files"):
       Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

   - a39e596baa ("xfs: support for synchronous DAX faults") and
     7b565c9f96 ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()")
        Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>"

* tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits)
  acpi, nfit: add 'Enable Latch System Shutdown Status' command support
  dax: fix general protection fault in dax_alloc_inode
  dax: fix PMD faults on zero-length files
  dax: stop requiring a live device for dax_flush()
  brd: remove dax support
  dax: quiet bdev_dax_supported()
  fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core
  tools/testing/nvdimm: unit test clear-error commands
  acpi, nfit: validate commands against the device type
  tools/testing/nvdimm: stricter bounds checking for error injection commands
  xfs: support for synchronous DAX faults
  xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()
  ext4: Support for synchronous DAX faults
  ext4: Simplify error handling in ext4_dax_huge_fault()
  dax: Implement dax_finish_sync_fault()
  dax, iomap: Add support for synchronous faults
  mm: Define MAP_SYNC and VM_SYNC flags
  dax: Allow tuning whether dax_insert_mapping_entry() dirties entry
  dax: Allow dax_iomap_fault() to return pfn
  dax: Fix comment describing dax_iomap_fault()
  ...
2017-11-17 09:51:57 -08:00
Linus Torvalds
ae9a8c4bdc Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:

 - Add support for online resizing of file systems with bigalloc

 - Fix a two data corruption bugs involving DAX, as well as a corruption
   bug after a crash during a racing fallocate and delayed allocation.

 - Finally, a number of cleanups and optimizations.

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: improve smp scalability for inode generation
  ext4: add support for online resizing with bigalloc
  ext4: mention noload when recovering on read-only device
  Documentation: fix little inconsistencies
  ext4: convert timers to use timer_setup()
  jbd2: convert timers to use timer_setup()
  ext4: remove duplicate extended attributes defs
  ext4: add ext4_should_use_dax()
  ext4: add sanity check for encryption + DAX
  ext4: prevent data corruption with journaling + DAX
  ext4: prevent data corruption with inline data + DAX
  ext4: fix interaction between i_size, fallocate, and delalloc after a crash
  ext4: retry allocations conservatively
  ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA
  ext4: Add iomap support for inline data
  iomap: Add IOMAP_F_DATA_INLINE flag
  iomap: Switch from blkno to disk offset
2017-11-14 12:59:42 -08:00
Linus Torvalds
32190f0afb Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt
Pull fscrypt updates from Ted Ts'o:
 "Lots of cleanups, mostly courtesy by Eric Biggers"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
  fscrypt: lock mutex before checking for bounce page pool
  fscrypt: add a documentation file for filesystem-level encryption
  ext4: switch to fscrypt_prepare_setattr()
  ext4: switch to fscrypt_prepare_lookup()
  ext4: switch to fscrypt_prepare_rename()
  ext4: switch to fscrypt_prepare_link()
  ext4: switch to fscrypt_file_open()
  fscrypt: new helper function - fscrypt_prepare_setattr()
  fscrypt: new helper function - fscrypt_prepare_lookup()
  fscrypt: new helper function - fscrypt_prepare_rename()
  fscrypt: new helper function - fscrypt_prepare_link()
  fscrypt: new helper function - fscrypt_file_open()
  fscrypt: new helper function - fscrypt_require_key()
  fscrypt: remove unneeded empty fscrypt_operations structs
  fscrypt: remove ->is_encrypted()
  fscrypt: switch from ->is_encrypted() to IS_ENCRYPTED()
  fs, fscrypt: add an S_ENCRYPTED inode flag
  fscrypt: clean up include file mess
2017-11-14 11:35:15 -08:00
Jan Kara
b8a6176c21 ext4: Support for synchronous DAX faults
We return IOMAP_F_DIRTY flag from ext4_iomap_begin() when asked to
prepare blocks for writing and the inode has some uncommitted metadata
changes. In the fault handler ext4_dax_fault() we then detect this case
(through VM_FAULT_NEEDDSYNC return value) and call helper
dax_finish_sync_fault() to flush metadata changes and insert page table
entry. Note that this will also dirty corresponding radix tree entry
which is what we want - fsync(2) will still provide data integrity
guarantees for applications not using userspace flushing. And
applications using userspace flushing can avoid calling fsync(2) and
thus avoid the performance overhead.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-03 06:26:26 -07:00
Jan Kara
497f6926d8 ext4: Simplify error handling in ext4_dax_huge_fault()
If transaction starting fails, just bail out of the function immediately
instead of checking for that condition throughout the function.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-03 06:26:26 -07:00
Jan Kara
9a0dd42251 dax: Allow dax_iomap_fault() to return pfn
For synchronous page fault dax_iomap_fault() will need to return PFN
which will then need to be inserted into page tables after fsync()
completes. Add necessary parameter to dax_iomap_fault().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-03 06:26:24 -07:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Eric Biggers
09a5c31c91 ext4: switch to fscrypt_file_open()
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-10-18 20:21:57 -04:00
Christoph Hellwig
545052e9e3 ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA
Switch to the iomap_seek_hole and iomap_seek_data helpers for
implementing lseek SEEK_HOLE / SEEK_DATA, and remove all the code that
isn't needed any more.

Note that with this patch ext4 will now always depend on the iomap code
instead of only when CONFIG_DAX is enabled, and it requires adding a
call into the extent status tree for iomap_begin as well to properly
deal with delalloc extents.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
[More fixes and cleanups by Andreas]
2017-10-01 17:58:54 -04:00
Linus Torvalds
e253d98f5b Merge branch 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull nowait read support from Al Viro:
 "Support IOCB_NOWAIT for buffered reads and block devices"

* 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  block_dev: support RFW_NOWAIT on block device nodes
  fs: support RWF_NOWAIT for buffered reads
  fs: support IOCB_NOWAIT in generic_file_buffered_read
  fs: pass iocb to do_generic_file_read
2017-09-14 19:29:55 -07:00
Linus Torvalds
0f0d12728e Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull mount flag updates from Al Viro:
 "Another chunk of fmount preparations from dhowells; only trivial
  conflicts for that part. It separates MS_... bits (very grotty
  mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
  only a small subset of MS_... stuff).

  This does *not* convert the filesystems to new constants; only the
  infrastructure is done here. The next step in that series is where the
  conflicts would be; that's the conversion of filesystems. It's purely
  mechanical and it's better done after the merge, so if you could run
  something like

	list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

	sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
	        -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
	        -e 's/\<MS_NODEV\>/SB_NODEV/g' \
	        -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
	        -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
	        -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
	        -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
	        -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
	        -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
	        -e 's/\<MS_SILENT\>/SB_SILENT/g' \
	        -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
	        -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
	        -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
	        -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \
	        $list

  and commit it with something along the lines of 'convert filesystems
  away from use of MS_... constants' as commit message, it would save a
  quite a bit of headache next cycle"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: Differentiate mount flags (MS_*) from internal superblock flags
  VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
  vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
2017-09-14 18:54:01 -07:00
Linus Torvalds
d34fc1adf0 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - various misc bits

 - DAX updates

 - OCFS2

 - most of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (119 commits)
  mm,fork: introduce MADV_WIPEONFORK
  x86,mpx: make mpx depend on x86-64 to free up VMA flag
  mm: add /proc/pid/smaps_rollup
  mm: hugetlb: clear target sub-page last when clearing huge page
  mm: oom: let oom_reap_task and exit_mmap run concurrently
  swap: choose swap device according to numa node
  mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
  mm, oom: do not rely on TIF_MEMDIE for memory reserves access
  z3fold: use per-cpu unbuddied lists
  mm, swap: don't use VMA based swap readahead if HDD is used as swap
  mm, swap: add sysfs interface for VMA based swap readahead
  mm, swap: VMA based swap readahead
  mm, swap: fix swap readahead marking
  mm, swap: add swap readahead hit statistics
  mm/vmalloc.c: don't reinvent the wheel but use existing llist API
  mm/vmstat.c: fix wrong comment
  selftests/memfd: add memfd_create hugetlbfs selftest
  mm/shmem: add hugetlbfs support to memfd_create()
  mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups
  mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas()
  ...
2017-09-06 20:49:49 -07:00
Jan Kara
397162ffa2 mm: remove nr_pages argument from pagevec_lookup{,_range}()
All users of pagevec_lookup() and pagevec_lookup_range() now pass
PAGEVEC_SIZE as a desired number of pages.

Just drop the argument.

Link: http://lkml.kernel.org/r/20170726114704.7626-11-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:27 -07:00
Jan Kara
dec0da7b60 ext4: use pagevec_lookup_range() in ext4_find_unwritten_pgoff()
Use pagevec_lookup_range() in ext4_find_unwritten_pgoff() since we are
interested only in pages in the given range.  Simplify the logic as a
result of not getting pages out of range and index getting automatically
advanced.

Link: http://lkml.kernel.org/r/20170726114704.7626-6-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:26 -07:00
Jan Kara
d72dc8a25a mm: make pagevec_lookup() update index
Make pagevec_lookup() (and underlying find_get_pages()) update index to
the next page where iteration should continue.  Most callers want this
and also pagevec_lookup_tag() already does this.

Link: http://lkml.kernel.org/r/20170726114704.7626-3-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:26 -07:00
Ross Zwisler
91d25ba8a6 dax: use common 4k zero page for dax mmap reads
When servicing mmap() reads from file holes the current DAX code
allocates a page cache page of all zeroes and places the struct page
pointer in the mapping->page_tree radix tree.

This has three major drawbacks:

1) It consumes memory unnecessarily. For every 4k page that is read via
   a DAX mmap() over a hole, we allocate a new page cache page. This
   means that if you read 1GiB worth of pages, you end up using 1GiB of
   zeroed memory. This is easily visible by looking at the overall
   memory consumption of the system or by looking at /proc/[pid]/smaps:

	7f62e72b3000-7f63272b3000 rw-s 00000000 103:00 12   /root/dax/data
	Size:            1048576 kB
	Rss:             1048576 kB
	Pss:             1048576 kB
	Shared_Clean:          0 kB
	Shared_Dirty:          0 kB
	Private_Clean:   1048576 kB
	Private_Dirty:         0 kB
	Referenced:      1048576 kB
	Anonymous:             0 kB
	LazyFree:              0 kB
	AnonHugePages:         0 kB
	ShmemPmdMapped:        0 kB
	Shared_Hugetlb:        0 kB
	Private_Hugetlb:       0 kB
	Swap:                  0 kB
	SwapPss:               0 kB
	KernelPageSize:        4 kB
	MMUPageSize:           4 kB
	Locked:                0 kB

2) It is slower than using a common zero page because each page fault
   has more work to do. Instead of just inserting a common zero page we
   have to allocate a page cache page, zero it, and then insert it. Here
   are the average latencies of dax_load_hole() as measured by ftrace on
   a random test box:

    Old method, using zeroed page cache pages:	3.4 us
    New method, using the common 4k zero page:	0.8 us

   This was the average latency over 1 GiB of sequential reads done by
   this simple fio script:

     [global]
     size=1G
     filename=/root/dax/data
     fallocate=none
     [io]
     rw=read
     ioengine=mmap

3) The fact that we had to check for both DAX exceptional entries and
   for page cache pages in the radix tree made the DAX code more
   complex.

Solve these issues by following the lead of the DAX PMD code and using a
common 4k zero page instead.  As with the PMD code we will now insert a
DAX exceptional entry into the radix tree instead of a struct page
pointer which allows us to remove all the special casing in the DAX
code.

Note that we do still pretty aggressively check for regular pages in the
DAX radix tree, especially where we take action based on the bits set in
the page.  If we ever find a regular page in our radix tree now that
most likely means that someone besides DAX is inserting pages (which has
happened lots of times in the past), and we want to find that out early
and fail loudly.

This solution also removes the extra memory consumption.  Here is that
same /proc/[pid]/smaps after 1GiB of reading from a hole with the new
code:

	7f2054a74000-7f2094a74000 rw-s 00000000 103:00 12   /root/dax/data
	Size:            1048576 kB
	Rss:                   0 kB
	Pss:                   0 kB
	Shared_Clean:          0 kB
	Shared_Dirty:          0 kB
	Private_Clean:         0 kB
	Private_Dirty:         0 kB
	Referenced:            0 kB
	Anonymous:             0 kB
	LazyFree:              0 kB
	AnonHugePages:         0 kB
	ShmemPmdMapped:        0 kB
	Shared_Hugetlb:        0 kB
	Private_Hugetlb:       0 kB
	Swap:                  0 kB
	SwapPss:               0 kB
	KernelPageSize:        4 kB
	MMUPageSize:           4 kB
	Locked:                0 kB

Overall system memory consumption is similarly improved.

Another major change is that we remove dax_pfn_mkwrite() from our fault
flow, and instead rely on the page fault itself to make the PTE dirty
and writeable.  The following description from the patch adding the
vm_insert_mixed_mkwrite() call explains this a little more:

   "To be able to use the common 4k zero page in DAX we need to have our
    PTE fault path look more like our PMD fault path where a PTE entry
    can be marked as dirty and writeable as it is first inserted rather
    than waiting for a follow-up dax_pfn_mkwrite() =>
    finish_mkwrite_fault() call.

    Right now we can rely on having a dax_pfn_mkwrite() call because we
    can distinguish between these two cases in do_wp_page():

            case 1: 4k zero page => writable DAX storage
            case 2: read-only DAX storage => writeable DAX storage

    This distinction is made by via vm_normal_page(). vm_normal_page()
    returns false for the common 4k zero page, though, just as it does
    for DAX ptes. Instead of special casing the DAX + 4k zero page case
    we will simplify our DAX PTE page fault sequence so that it matches
    our DAX PMD sequence, and get rid of the dax_pfn_mkwrite() helper.
    We will instead use dax_iomap_fault() to handle write-protection
    faults.

    This means that insert_pfn() needs to follow the lead of
    insert_pfn_pmd() and allow us to pass in a 'mkwrite' flag. If
    'mkwrite' is set insert_pfn() will do the work that was previously
    done by wp_page_reuse() as part of the dax_pfn_mkwrite() call path"

Link: http://lkml.kernel.org/r/20170724170616.25810-4-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:24 -07:00
Christoph Hellwig
91f9943e1c fs: support RWF_NOWAIT for buffered reads
This is based on the old idea and code from Milosz Tanski.  With the aio
nowait code it becomes mostly trivial now.  Buffered writes continue to
return -EOPNOTSUPP if RWF_NOWAIT is passed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-04 19:04:23 -04:00
Randy Dodgen
fd96b8da68 ext4: fix fault handling when mounted with -o dax,ro
If an ext4 filesystem is mounted with both the DAX and read-only
options, executables on that filesystem will fail to start (claiming
'Segmentation fault') due to the fault handler returning
VM_FAULT_SIGBUS.

This is due to the DAX fault handler (see ext4_dax_huge_fault)
attempting to write to the journal when FAULT_FLAG_WRITE is set. This is
the wrong behavior for write faults which will lead to a COW page; in
particular, this fails for readonly mounts.

This change avoids journal writes for faults that are expected to COW.

It might be the case that this could be better handled in
ext4_iomap_begin / ext4_iomap_end (called via iomap_ops inside
dax_iomap_fault). These is some overlap already (e.g. grabbing journal
handles).

Signed-off-by: Randy Dodgen <dodgen@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
2017-08-24 15:26:01 -04:00
Darrick J. Wong
1bd8d6cd3e ext4: in ext4_seek_{hole,data}, return -ENXIO for negative offsets
In the ext4 implementations of SEEK_HOLE and SEEK_DATA, make sure we
return -ENXIO for negative offsets instead of banging around inside
the extent code and returning -EFSCORRUPTED.

Reported-by: Mateusz S <muttdini@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org # 4.6
2017-08-24 13:22:06 -04:00
Jan Kara
fcf5ea1099 ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize
ext4_find_unwritten_pgoff() does not properly handle a situation when
starting index is in the middle of a page and blocksize < pagesize. The
following command shows the bug on filesystem with 1k blocksize:

  xfs_io -f -c "falloc 0 4k" \
            -c "pwrite 1k 1k" \
            -c "pwrite 3k 1k" \
            -c "seek -a -r 0" foo

In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
SEEK_DATA) will return the correct result.

Fix the problem by neglecting buffers in a page before starting offset.

Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
CC: stable@vger.kernel.org # 3.8+
2017-08-05 17:43:24 -04:00
David Howells
bc98a42c1f VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
Firstly by applying the following with coccinelle's spatch:

	@@ expression SB; @@
	-SB->s_flags & MS_RDONLY
	+sb_rdonly(SB)

to effect the conversion to sb_rdonly(sb), then by applying:

	@@ expression A, SB; @@
	(
	-(!sb_rdonly(SB)) && A
	+!sb_rdonly(SB) && A
	|
	-A != (sb_rdonly(SB))
	+A != sb_rdonly(SB)
	|
	-A == (sb_rdonly(SB))
	+A == sb_rdonly(SB)
	|
	-!(sb_rdonly(SB))
	+!sb_rdonly(SB)
	|
	-A && (sb_rdonly(SB))
	+A && sb_rdonly(SB)
	|
	-A || (sb_rdonly(SB))
	+A || sb_rdonly(SB)
	|
	-(sb_rdonly(SB)) != A
	+sb_rdonly(SB) != A
	|
	-(sb_rdonly(SB)) == A
	+sb_rdonly(SB) == A
	|
	-(sb_rdonly(SB)) && A
	+sb_rdonly(SB) && A
	|
	-(sb_rdonly(SB)) || A
	+sb_rdonly(SB) || A
	)

	@@ expression A, B, SB; @@
	(
	-(sb_rdonly(SB)) ? 1 : 0
	+sb_rdonly(SB)
	|
	-(sb_rdonly(SB)) ? A : B
	+sb_rdonly(SB) ? A : B
	)

to remove left over excess bracketage and finally by applying:

	@@ expression A, SB; @@
	(
	-(A & MS_RDONLY) != sb_rdonly(SB)
	+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
	|
	-(A & MS_RDONLY) == sb_rdonly(SB)
	+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
	)

to make comparisons against the result of sb_rdonly() (which is a bool)
work correctly.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-07-17 08:45:34 +01:00
Linus Torvalds
bc2c6421cb Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "The first major feature for ext4 this merge window is the largedir
  feature, which allows ext4 directories to support over 2 billion
  directory entries (assuming ~64 byte file names; in practice, users
  will run into practical performance limits first.) This feature was
  originally written by the Lustre team, and credit goes to Artem
  Blagodarenko from Seagate for getting this feature upstream.

  The second major major feature allows ext4 to support extended
  attribute values up to 64k. This feature was also originally from
  Lustre, and has been enhanced by Tahsin Erdogan from Google with a
  deduplication feature so that if multiple files have the same xattr
  value (for example, Windows ACL's stored by Samba), only one copy will
  be stored on disk for encoding and caching efficiency.

  We also have the usual set of bug fixes, cleanups, and optimizations"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (47 commits)
  ext4: fix spelling mistake: "prellocated" -> "preallocated"
  ext4: fix __ext4_new_inode() journal credits calculation
  ext4: skip ext4_init_security() and encryption on ea_inodes
  fs: generic_block_bmap(): initialize all of the fields in the temp bh
  ext4: change fast symlink test to not rely on i_blocks
  ext4: require key for truncate(2) of encrypted file
  ext4: don't bother checking for encryption key in ->mmap()
  ext4: check return value of kstrtoull correctly in reserved_clusters_store
  ext4: fix off-by-one fsmap error on 1k block filesystems
  ext4: return EFSBADCRC if a bad checksum error is found in ext4_find_entry()
  ext4: return EIO on read error in ext4_find_entry
  ext4: forbid encrypting root directory
  ext4: send parallel discards on commit completions
  ext4: avoid unnecessary stalls in ext4_evict_inode()
  ext4: add nombcache mount option
  ext4: strong binding of xattr inode references
  ext4: eliminate xattr entry e_hash recalculation for removes
  ext4: reserve space for xattr entries/names
  quota: add get_inode_usage callback to transfer multi-inode charges
  ext4: xattr inode deduplication
  ...
2017-07-09 09:31:22 -07:00
Eric Biggers
66e0aaadce ext4: don't bother checking for encryption key in ->mmap()
Since only an open file can be mmap'ed, and we only allow open()ing an
encrypted file when its key is available, there is no need to check for
the key again before permitting each mmap().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-23 19:41:38 -04:00
Goldwyn Rodrigues
728fbc0e10 ext4: nowait aio support
Return EAGAIN if any of the following checks fail for direct I/O:
  + i_rwsem is lockable
  + Writing beyond end of file (will trigger allocation)
  + Blocks are not allocated at the write location

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-20 07:12:03 -06:00
Eryu Guan
624327f879 ext4: fix off-by-one on max nr_pages in ext4_find_unwritten_pgoff()
ext4_find_unwritten_pgoff() is used to search for offset of hole or
data in page range [index, end] (both inclusive), and the max number
of pages to search should be at least one, if end == index.
Otherwise the only page is missed and no hole or data is found,
which is not correct.

When block size is smaller than page size, this can be demonstrated
by preallocating a file with size smaller than page size and writing
data to the last block. E.g. run this xfs_io command on a 1k block
size ext4 on x86_64 host.

  # xfs_io -fc "falloc 0 3k" -c "pwrite 2k 1k" \
  	    -c "seek -d 0" /mnt/ext4/testfile
  wrote 1024/1024 bytes at offset 2048
  1 KiB, 1 ops; 0.0000 sec (42.459 MiB/sec and 43478.2609 ops/sec)
  Whence  Result
  DATA    EOF

Data at offset 2k was missed, and lseek(2) returned ENXIO.

This is unconvered by generic/285 subtest 07 and 08 on ppc64 host,
where pagesize is 64k. Because a recent change to generic/285
reduced the preallocated file size to smaller than 64k.

Signed-off-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2017-05-24 18:02:20 -04:00
Jan Kara
3f1d5bad3f ext4: fix off-by-in in loop termination in ext4_find_unwritten_pgoff()
There is an off-by-one error in loop termination conditions in
ext4_find_unwritten_pgoff() since 'end' may index a page beyond end of
desired range if 'endoff' is page aligned. It doesn't have any visible
effects but still it is good to fix it.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-05-21 22:34:23 -04:00
Jan Kara
7d95eddf31 ext4: fix SEEK_HOLE
Currently, SEEK_HOLE implementation in ext4 may both return that there's
a hole at some offset although that offset already has data and skip
some holes during a search for the next hole. The first problem is
demostrated by:

xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "seek -h 0" file
wrote 57344/57344 bytes at offset 0
56 KiB, 14 ops; 0.0000 sec (2.054 GiB/sec and 538461.5385 ops/sec)
Whence	Result
HOLE	0

Where we can see that SEEK_HOLE wrongly returned offset 0 as containing
a hole although we have written data there. The second problem can be
demonstrated by:

xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k"
       -c "seek -h 0" file

wrote 57344/57344 bytes at offset 0
56 KiB, 14 ops; 0.0000 sec (1.978 GiB/sec and 518518.5185 ops/sec)
wrote 8192/8192 bytes at offset 131072
8 KiB, 2 ops; 0.0000 sec (2 GiB/sec and 500000.0000 ops/sec)
Whence	Result
HOLE	139264

Where we can see that hole at offsets 56k..128k has been ignored by the
SEEK_HOLE call.

The underlying problem is in the ext4_find_unwritten_pgoff() which is
just buggy. In some cases it fails to update returned offset when it
finds a hole (when no pages are found or when the first found page has
higher index than expected), in some cases conditions for detecting hole
are just missing (we fail to detect a situation where indices of
returned pages are not contiguous).

Fix ext4_find_unwritten_pgoff() to properly detect non-contiguous page
indices and also handle all cases where we got less pages then expected
in one place and handle it properly there.

CC: stable@vger.kernel.org
Fixes: c8c0df241c
CC: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-05-21 22:33:23 -04:00
Jan Kara
fb26a1cbed ext4: return to starting transaction in ext4_dax_huge_fault()
DAX will return to locking exceptional entry before mapping blocks for a
page fault to fix possible races with concurrent writes.  To avoid lock
inversion between exceptional entry lock and transaction start, start
the transaction already in ext4_dax_huge_fault().

Fixes: 9f141d6ef6
Link: http://lkml.kernel.org/r/20170510085419.27601-4-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12 15:57:16 -07:00
David Howells
99652ea56a ext4: Add statx support
Return enhanced file attributes from the Ext4 filesystem.  This includes
the following:

 (1) The inode creation time (i_crtime) as stx_btime, setting STATX_BTIME.

 (2) Certain FS_xxx_FL flags are mapped to stx_attribute flags.

This requires that all ext4 inodes have a getattr call, not just some of
them, so to this end, split the ext4_getattr() function and only call part
of it where appropriate.

Example output:

	[root@andromeda ~]# touch foo
	[root@andromeda ~]# chattr +ai foo
	[root@andromeda ~]# /tmp/test-statx foo
	statx(foo) = 0
	results=fff
	  Size: 0               Blocks: 0          IO Block: 4096    regular file
	Device: 08:12           Inode: 2101950     Links: 1
	Access: (0644/-rw-r--r--)  Uid:     0   Gid:     0
	Access: 2016-02-11 17:08:29.031795451+0000
	Modify: 2016-02-11 17:08:29.031795451+0000
	Change: 2016-02-11 17:11:11.987790114+0000
	 Birth: 2016-02-11 17:08:29.031795451+0000
	Attributes: 0000000000000030 (-------- -------- -------- -------- -------- -------- -------- --ai----)

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-03 01:05:58 -04:00
Dave Jiang
c791ace1e7 mm: replace FAULT_FLAG_SIZE with parameter to huge_fault
Since the introduction of FAULT_FLAG_SIZE to the vm_fault flag, it has
been somewhat painful with getting the flags set and removed at the
correct locations.  More than one kernel oops was introduced due to
difficulties of getting the placement correctly.

Remove the flag values and introduce an input parameter to huge_fault
that indicates the size of the page entry.  This makes the code easier
to trace and should avoid the issues we see with the fault flags where
removal of the flag was necessary in the fallback paths.

Link: http://lkml.kernel.org/r/148615748258.43180.1690152053774975329.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Dave Jiang
a2d581675d mm,fs,dax: change ->pmd_fault to ->huge_fault
Patch series "1G transparent hugepage support for device dax", v2.

The following series implements support for 1G trasparent hugepage on
x86 for device dax.  The bulk of the code was written by Mathew Wilcox a
while back supporting transparent 1G hugepage for fs DAX.  I have
forward ported the relevant bits to 4.10-rc.  The current submission has
only the necessary code to support device DAX.

Comments from Dan Williams: So the motivation and intended user of this
functionality mirrors the motivation and users of 1GB page support in
hugetlbfs.  Given expected capacities of persistent memory devices an
in-memory database may want to reduce tlb pressure beyond what they can
already achieve with 2MB mappings of a device-dax file.  We have
customer feedback to that effect as Willy mentioned in his previous
version of these patches [1].

[1]: https://lkml.org/lkml/2016/1/31/52

Comments from Nilesh @ Oracle:

There are applications which have a process model; and if you assume
10,000 processes attempting to mmap all the 6TB memory available on a
server; we are looking at the following:

processes         : 10,000
memory            :    6TB
pte @ 4k page size: 8 bytes / 4K of memory * #processes = 6TB / 4k * 8 * 10000 = 1.5GB * 80000 = 120,000GB
pmd @ 2M page size: 120,000 / 512 = ~240GB
pud @ 1G page size: 240GB / 512 = ~480MB

As you can see with 2M pages, this system will use up an exorbitant
amount of DRAM to hold the page tables; but the 1G pages finally brings
it down to a reasonable level.  Memory sizes will keep increasing; so
this number will keep increasing.

An argument can be made to convert the applications from process model
to thread model, but in the real world that may not be always practical.
Hopefully this helps explain the use case where this is valuable.

This patch (of 3):

In preparation for adding the ability to handle PUD pages, convert
vm_operations_struct.pmd_fault to vm_operations_struct.huge_fault.  The
vm_fault structure is extended to include a union of the different page
table pointers that may be needed, and three flag bits are reserved to
indicate which type of pointer is in the union.

[ross.zwisler@linux.intel.com: remove unused function ext4_dax_huge_fault()]
  Link: http://lkml.kernel.org/r/1485813172-7284-1-git-send-email-ross.zwisler@linux.intel.com
[dave.jiang@intel.com: clear PMD or PUD size flags when in fall through path]
  Link: http://lkml.kernel.org/r/148589842696.5820.16078080610311444794.stgit@djiang5-desk3.ch.intel.com
Link: http://lkml.kernel.org/r/148545058784.17912.6353162518188733642.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Dave Jiang
11bac80004 mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.

[arnd@arndb.de: fix ARM build]
  Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Dave Jiang
f42003917b mm, dax: change pmd_fault() to take only vmf parameter
pmd_fault() and related functions really only need the vmf parameter since
the additional parameters are all included in the vmf struct.  Remove the
additional parameter and simplify pmd_fault() and friends.

Link: http://lkml.kernel.org/r/1484085142-2297-8-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:26 -08:00
Dave Jiang
d8a849e1bc mm, dax: make pmd_fault() and friends be the same as fault()
Instead of passing in multiple parameters in the pmd_fault() handler,
a vmf can be passed in just like a fault() handler. This will simplify
code and remove the need for the actual pmd fault handlers to allocate a
vmf. Related functions are also modified to do the same.

[dave.jiang@intel.com: fix issue with xfs_tests stall when DAX option is off]
  Link: http://lkml.kernel.org/r/148469861071.195597.3619476895250028518.stgit@djiang5-desk3.ch.intel.com
Link: http://lkml.kernel.org/r/1484085142-2297-7-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:26 -08:00
Christoph Hellwig
ff5462e39c ext4: fix DAX write locking
Unlike O_DIRECT DAX is not an optional opt-in feature selected by the
application, so we'll have to provide the traditional synchronіzation
of overlapping writes as we do for buffered writes.

This was broken historically for DAX, but got fixed for ext2 and XFS
as part of the iomap conversion.  Fix up ext4 as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2017-02-08 14:39:27 -05:00