Changes in 4.19.300
locking/ww_mutex/test: Fix potential workqueue corruption
perf/core: Bail out early if the request AUX area is out of bound
clocksource/drivers/timer-imx-gpt: Fix potential memory leak
clocksource/drivers/timer-atmel-tcb: Fix initialization on SAM9 hardware
x86/mm: Drop the 4 MB restriction on minimal NUMA node memory size
wifi: mac80211: don't return unset power in ieee80211_get_tx_power()
wifi: ath9k: fix clang-specific fortify warnings
wifi: ath10k: fix clang-specific fortify warning
net: annotate data-races around sk->sk_tx_queue_mapping
net: annotate data-races around sk->sk_dst_pending_confirm
Bluetooth: Fix double free in hci_conn_cleanup
platform/x86: thinkpad_acpi: Add battery quirk for Thinkpad X120e
drm/amd: Fix UBSAN array-index-out-of-bounds for SMU7
drm/amd: Fix UBSAN array-index-out-of-bounds for Polaris and Tonga
drm/amdgpu: Fix a null pointer access when the smc_rreg pointer is NULL
selftests/efivarfs: create-read: fix a resource leak
crypto: pcrypt - Fix hungtask for PADATA_RESET
RDMA/hfi1: Use FIELD_GET() to extract Link Width
fs/jfs: Add check for negative db_l2nbperpage
fs/jfs: Add validity check for db_maxag and db_agpref
jfs: fix array-index-out-of-bounds in dbFindLeaf
jfs: fix array-index-out-of-bounds in diAlloc
ARM: 9320/1: fix stack depot IRQ stack filter
ALSA: hda: Fix possible null-ptr-deref when assigning a stream
atm: iphase: Do PCI error checks on own line
scsi: libfc: Fix potential NULL pointer dereference in fc_lport_ptp_setup()
HID: Add quirk for Dell Pro Wireless Keyboard and Mouse KM5221W
tty: vcc: Add check for kstrdup() in vcc_probe()
usb: gadget: f_ncm: Always set current gadget in ncm_bind()
i2c: sun6i-p2wi: Prevent potential division by zero
media: gspca: cpia1: shift-out-of-bounds in set_flicker
media: vivid: avoid integer overflow
gfs2: ignore negated quota changes
drm/amd/display: Avoid NULL dereference of timing generator
pwm: Fix double shift bug
NFSv4.1: fix SP4_MACH_CRED protection for pnfs IO
ipvlan: add ipvlan_route_v6_outbound() helper
tty: Fix uninit-value access in ppp_sync_receive()
tipc: Fix kernel-infoleak due to uninitialized TLV value
ppp: limit MRU to 64K
xen/events: fix delayed eoi list handling
ptp: annotate data-race around q->head and q->tail
net: ethernet: cortina: Fix max RX frame define
net: ethernet: cortina: Handle large frames
net: ethernet: cortina: Fix MTU max setting
macvlan: Don't propagate promisc change to lower dev in passthru
cifs: spnego: add ';' in HOST_KEY_LEN
media: venus: hfi: add checks to perform sanity on queue pointers
randstruct: Fix gcc-plugin performance mode to stay in group
KVM: x86: Ignore MSR_AMD64_TW_CFG access
audit: don't take task_lock() in audit_exe_compare() code path
audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare()
hvc/xen: fix error path in xen_hvc_init() to always register frontend driver
PCI/sysfs: Protect driver's D3cold preference from user space
mmc: meson-gx: Remove setting of CMD_CFG_ERROR
genirq/generic_chip: Make irq_remove_generic_chip() irqdomain aware
PCI: keystone: Don't discard .remove() callback
PCI: keystone: Don't discard .probe() callback
parisc/pdc: Add width field to struct pdc_model
clk: qcom: ipq8074: drop the CLK_SET_RATE_PARENT flag from PLL clocks
mmc: vub300: fix an error code
PM: hibernate: Use __get_safe_page() rather than touching the list
PM: hibernate: Clean up sync_read handling in snapshot_write_next()
jbd2: fix potential data lost in recovering journal raced with synchronizing fs bdev
quota: explicitly forbid quota files from being encrypted
mcb: fix error handling for different scenarios when parsing
dmaengine: stm32-mdma: correct desc prep when channel running
parisc: Prevent booting 64-bit kernels on PA1.x machines
parisc/pgtable: Do not drop upper 5 address bits of physical address
ALSA: info: Fix potential deadlock at disconnection
ALSA: hda/realtek - Enable internal speaker of ASUS K6500ZC
tty: serial: meson: if no alias specified use an available id
serial: meson: remove redundant initialization of variable id
tty: serial: meson: retrieve port FIFO size from DT
serial: meson: Use platform_get_irq() to get the interrupt
tty: serial: meson: fix hard LOCKUP on crtscts mode
net: dsa: lan9303: consequently nested-lock physical MDIO
i2c: i801: fix potential race in i801_block_transaction_byte_by_byte
media: lirc: drop trailing space from scancode transmit
media: sharp: fix sharp encoding
media: venus: hfi_parser: Add check to keep the number of codecs within range
media: venus: hfi: fix the check to handle session buffer requirement
media: venus: hfi: add checks to handle capabilities from firmware
Revert "net: r8169: Disable multicast filter for RTL8168H and RTL8107E"
ext4: apply umask if ACL support is disabled
ext4: correct offset of gdb backup in non meta_bg group to update_backups
ext4: correct return value of ext4_convert_meta_bg
ext4: remove gdb backup copy for meta bg in setup_new_flex_group_blocks
drm/amdgpu: fix error handling in amdgpu_bo_list_get()
scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids
iomap: Set all uptodate bits for an Uptodate page
net: sched: fix race condition in qdisc_graft()
Linux 4.19.300
Change-Id: I21f68d5f5dc85afe62bbc6e9a7aac12faee56621
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 4595a298d5563cf76c1d852970f162051fd1a7a6 upstream.
For filesystems with block size < page size, we need to set all the
per-block uptodate bits if the page was already uptodate at the time
we create the per-block metadata. This can happen if the page is
invalidated (e.g. by a write to drop_caches) but ultimately not removed
from the page cache.
This is a data corruption issue as page writeback skips blocks which
are marked !uptodate.
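The rule described above can be illustrated with a small userspace model. This is only a sketch: struct page_state, page_state_create() and blocks_eligible_for_writeback() are hypothetical names, not the kernel's iomap structures or functions, and at most 64 blocks per page are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-page block state; one bit per block. */
struct page_state {
    unsigned int nr_blocks;
    uint64_t uptodate;
};

/* Mask with the low nr_blocks bits set (1 <= nr_blocks <= 64). */
static uint64_t all_blocks_mask(unsigned int nr_blocks)
{
    assert(nr_blocks >= 1 && nr_blocks <= 64);
    return nr_blocks == 64 ? UINT64_MAX : (UINT64_C(1) << nr_blocks) - 1;
}

/*
 * Create the per-block metadata. If the page is already uptodate,
 * every block must start out marked uptodate as well.
 */
static void page_state_create(struct page_state *ps, unsigned int nr_blocks,
                              bool page_uptodate)
{
    ps->nr_blocks = nr_blocks;
    ps->uptodate = page_uptodate ? all_blocks_mask(nr_blocks) : 0;
}

/* Model of writeback: blocks that are not uptodate are skipped. */
static unsigned int blocks_eligible_for_writeback(const struct page_state *ps)
{
    unsigned int n = 0;

    for (unsigned int i = 0; i < ps->nr_blocks; i++)
        if (ps->uptodate & (UINT64_C(1) << i))
            n++;
    return n;
}
```

In this model, starting with an all-zero bitmap on an uptodate page would make writeback skip every block, which is the corruption the fix avoids.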
Fixes: 9dc55f1389 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Qian Cai <cai@redhat.com>
Cc: Brian Foster <bfoster@redhat.com>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 4.19.191
s390/disassembler: increase ebpf disasm buffer size
ACPI: custom_method: fix potential use-after-free issue
ACPI: custom_method: fix a possible memory leak
ftrace: Handle commands when closing set_ftrace_filter file
ARM: 9056/1: decompressor: fix BSS size calculation for LLVM ld.lld
arm64: dts: marvell: armada-37xx: add syscon compatible to NB clk node
arm64: dts: mt8173: fix property typo of 'phys' in dsi node
ecryptfs: fix kernel panic with null dev_name
mtd: spinand: core: add missing MODULE_DEVICE_TABLE()
mtd: rawnand: atmel: Update ecc_stats.corrected counter
spi: spi-ti-qspi: Free DMA resources
scsi: qla2xxx: Fix crash in qla2xxx_mqueuecommand()
mmc: sdhci-pci: Fix initialization of some SD cards for Intel BYT-based controllers
mmc: block: Update ext_csd.cache_ctrl if it was written
mmc: block: Issue a cache flush only when it's enabled
mmc: core: Do a power cycle when the CMD11 fails
mmc: core: Set read only for SD cards with permanent write protect bit
erofs: add unsupported inode i_format check
cifs: Return correct error code from smb2_get_enc_key
btrfs: fix metadata extent leak after failure to create subvolume
intel_th: pci: Add Rocket Lake CPU support
fbdev: zero-fill colormap in fbcmap.c
staging: wimax/i2400m: fix byte-order issue
crypto: api - check for ERR pointers in crypto_destroy_tfm()
usb: gadget: uvc: add bInterval checking for HS mode
genirq/matrix: Prevent allocation counter corruption
usb: gadget: f_uac1: validate input parameters
usb: dwc3: gadget: Ignore EP queue requests during bus reset
usb: xhci: Fix port minor revision
PCI: PM: Do not read power state in pci_enable_device_flags()
x86/build: Propagate $(CLANG_FLAGS) to $(REALMODE_FLAGS)
tee: optee: do not check memref size on return from Secure World
perf/arm_pmu_platform: Fix error handling
usb: xhci-mtk: support quirk to disable usb2 lpm
xhci: check control context is valid before dereferencing it.
xhci: fix potential array out of bounds with several interrupters
spi: dln2: Fix reference leak to master
spi: omap-100k: Fix reference leak to master
intel_th: Consistency and off-by-one fix
phy: phy-twl4030-usb: Fix possible use-after-free in twl4030_usb_remove()
btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s
scsi: lpfc: Fix incorrect dbde assignment when building target abts wqe
scsi: lpfc: Fix pt2pt connection does not recover after LOGO
scsi: target: pscsi: Fix warning in pscsi_complete_cmd()
media: ite-cir: check for receive overflow
media: drivers: media: pci: sta2x11: fix Kconfig dependency on GPIOLIB
power: supply: bq27xxx: fix power_avg for newer ICs
extcon: arizona: Fix some issues when HPDET IRQ fires after the jack has been unplugged
media: media/saa7164: fix saa7164_encoder_register() memory leak bugs
media: gspca/sq905.c: fix uninitialized variable
power: supply: Use IRQF_ONESHOT
drm/amdgpu : Fix asic reset regression issue introduce by 8f211fe8ac7c4f
scsi: qla2xxx: Always check the return value of qla24xx_get_isp_stats()
scsi: qla2xxx: Fix use after free in bsg
scsi: scsi_dh_alua: Remove check for ASC 24h in alua_rtpg()
media: em28xx: fix memory leak
media: vivid: update EDID
clk: socfpga: arria10: Fix memory leak of socfpga_clk on error return
power: supply: generic-adc-battery: fix possible use-after-free in gab_remove()
power: supply: s3c_adc_battery: fix possible use-after-free in s3c_adc_bat_remove()
media: tc358743: fix possible use-after-free in tc358743_remove()
media: adv7604: fix possible use-after-free in adv76xx_remove()
media: i2c: adv7511-v4l2: fix possible use-after-free in adv7511_remove()
media: i2c: adv7842: fix possible use-after-free in adv7842_remove()
media: dvb-usb: fix memory leak in dvb_usb_adapter_init
media: gscpa/stv06xx: fix memory leak
drm/msm/mdp5: Configure PP_SYNC_HEIGHT to double the vtotal
amdgpu: avoid incorrect %hu format string
drm/amdgpu: fix NULL pointer dereference
scsi: lpfc: Fix crash when a REG_RPI mailbox fails triggering a LOGO response
scsi: lpfc: Remove unsupported mbox PORT_CAPABILITIES logic
scsi: libfc: Fix a format specifier
s390/archrandom: add parameter check for s390_arch_random_generate
ALSA: emu8000: Fix a use after free in snd_emu8000_create_mixer
ALSA: hda/conexant: Re-order CX5066 quirk table entries
ALSA: sb: Fix two use after free in snd_sb_qsound_build
ALSA: usb-audio: Explicitly set up the clock selector
ALSA: usb-audio: More constifications
ALSA: usb-audio: Add dB range mapping for Sennheiser Communications Headset PC 8
ALSA: hda/realtek: Add quirk for Intel Clevo PCx0Dx
btrfs: fix race when picking most recent mod log operation for an old root
arm64/vdso: Discard .note.gnu.property sections in vDSO
ubifs: Only check replay with inode type to judge if inode linked
f2fs: fix to avoid out-of-bounds memory access
mlxsw: spectrum_mr: Update egress RIF list before route's action
openvswitch: fix stack OOB read while fragmenting IPv4 packets
ACPI: GTDT: Don't corrupt interrupt mappings on watchdog probe failure
NFS: Don't discard pNFS layout segments that are marked for return
NFSv4: Don't discard segments marked for return in _pnfs_return_layout()
jffs2: Fix kasan slab-out-of-bounds problem
powerpc/eeh: Fix EEH handling for hugepages in ioremap space.
powerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.h
intel_th: pci: Add Alder Lake-M support
tpm: vtpm_proxy: Avoid reading host log when using a virtual device
md/raid1: properly indicate failure when ending a failed write request
dm raid: fix inconclusive reshape layout on fast raid4/5/6 table reload sequences
security: commoncap: fix -Wstringop-overread warning
Fix misc new gcc warnings
jffs2: check the validity of dstlen in jffs2_zlib_compress()
Revert 337f13046f ("futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op")
posix-timers: Preserve return value in clock_adjtime32()
arm64: vdso: remove commas between macro name and arguments
ext4: fix check to prevent false positive report of incorrect used inodes
ext4: do not set SB_ACTIVE in ext4_orphan_cleanup()
ext4: fix error code in ext4_commit_super
media: dvbdev: Fix memory leak in dvb_media_device_free()
usb: gadget: dummy_hcd: fix gpf in gadget_setup
usb: gadget: Fix double free of device descriptor pointers
usb: gadget/function/f_fs string table fix for multiple languages
usb: dwc3: gadget: Fix START_TRANSFER link state check
usb: dwc2: Fix session request interrupt handler
tty: fix memory leak in vc_deallocate
rsi: Use resume_noirq for SDIO
tracing: Map all PIDs to command lines
tracing: Restructure trace_clock_global() to never block
dm persistent data: packed struct should have an aligned() attribute too
dm space map common: fix division bug in sm_ll_find_free_block()
dm rq: fix double free of blk_mq_tag_set in dev remove after table load fails
modules: mark ref_module static
modules: mark find_symbol static
modules: mark each_symbol_section static
modules: unexport __module_text_address
modules: unexport __module_address
modules: rename the licence field in struct symsearch to license
modules: return licensing information from find_symbol
modules: inherit TAINT_PROPRIETARY_MODULE
Bluetooth: verify AMP hci_chan before amp_destroy
hsr: use netdev_err() instead of WARN_ONCE()
bluetooth: eliminate the potential race condition when removing the HCI controller
net/nfc: fix use-after-free llcp_sock_bind/connect
ASoC: samsung: tm2_wm5110: check of of_parse return value
MIPS: pci-mt7620: fix PLL lock check
MIPS: pci-rt2880: fix slot 0 configuration
FDDI: defxx: Bail out gracefully with unassigned PCI resource for CSR
iio:accel:adis16201: Fix wrong axis assignment that prevents loading
misc: lis3lv02d: Fix false-positive WARN on various HP models
misc: vmw_vmci: explicitly initialize vmci_notify_bm_set_msg struct
misc: vmw_vmci: explicitly initialize vmci_datagram payload
md/bitmap: wait for external bitmap writes to complete during tear down
md-cluster: fix use-after-free issue when removing rdev
md: split mddev_find
md: factor out a mddev_find_locked helper from mddev_find
md: md_open returns -EBUSY when entering racing area
md: Fix missing unused status line of /proc/mdstat
ipw2x00: potential buffer overflow in libipw_wx_set_encodeext()
cfg80211: scan: drop entry from hidden_list on overflow
drm/radeon: fix copy of uninitialized variable back to userspace
ALSA: hda/realtek: Re-order ALC882 Acer quirk table entries
ALSA: hda/realtek: Re-order ALC882 Sony quirk table entries
ALSA: hda/realtek: Re-order ALC882 Clevo quirk table entries
ALSA: hda/realtek: Re-order ALC269 HP quirk table entries
ALSA: hda/realtek: Re-order ALC269 Dell quirk table entries
ALSA: hda/realtek: Re-order ALC269 Sony quirk table entries
ALSA: hda/realtek: Re-order ALC269 Lenovo quirk table entries
ALSA: hda/realtek: Remove redundant entry for ALC861 Haier/Uniwill devices
x86/cpu: Initialize MSR_TSC_AUX if RDTSCP *or* RDPID is supported
KVM: s390: split kvm_s390_logical_to_effective
KVM: s390: fix guarded storage control register handling
KVM: s390: split kvm_s390_real_to_abs
ovl: fix missing revert_creds() on error path
usb: gadget: pch_udc: Revert d3cb25a121 completely
memory: gpmc: fix out of bounds read and dereference on gpmc_cs[]
ARM: dts: exynos: correct fuel gauge interrupt trigger level on Midas family
ARM: dts: exynos: correct MUIC interrupt trigger level on Midas family
ARM: dts: exynos: correct PMIC interrupt trigger level on Midas family
ARM: dts: exynos: correct PMIC interrupt trigger level on Odroid X/U3 family
ARM: dts: exynos: correct PMIC interrupt trigger level on SMDK5250
ARM: dts: exynos: correct PMIC interrupt trigger level on Snow
serial: stm32: fix incorrect characters on console
serial: stm32: fix tx_empty condition
usb: typec: tcpci: Check ROLE_CONTROL while interpreting CC_STATUS
regmap: set debugfs_name to NULL after it is freed
mtd: rawnand: fsmc: Fix error code in fsmc_nand_probe()
mtd: rawnand: brcmnand: fix OOB R/W with Hamming ECC
mtd: Handle possible -EPROBE_DEFER from parse_mtd_partitions()
mtd: rawnand: qcom: Return actual error code instead of -ENODEV
x86/microcode: Check for offline CPUs before requesting new microcode
usb: gadget: pch_udc: Replace cpu_to_le32() by lower_32_bits()
usb: gadget: pch_udc: Check if driver is present before calling ->setup()
usb: gadget: pch_udc: Check for DMA mapping error
crypto: qat - don't release uninitialized resources
crypto: qat - ADF_STATUS_PF_RUNNING should be set after adf_dev_init
fotg210-udc: Fix DMA on EP0 for length > max packet size
fotg210-udc: Fix EP0 IN requests bigger than two packets
fotg210-udc: Remove a dubious condition leading to fotg210_done
fotg210-udc: Mask GRP2 interrupts we don't handle
fotg210-udc: Don't DMA more than the buffer can take
fotg210-udc: Complete OUT requests on short packets
mtd: require write permissions for locking and badblock ioctls
bus: qcom: Put child node before return
soundwire: bus: Fix device found flag correctly
phy: marvell: ARMADA375_USBCLUSTER_PHY should not default to y, unconditionally
crypto: qat - fix error path in adf_isr_resource_alloc()
usb: gadget: aspeed: fix dma map failure
USB: gadget: udc: fix wrong pointer passed to IS_ERR() and PTR_ERR()
soundwire: stream: fix memory leak in stream config error path
mtd: rawnand: gpmi: Fix a double free in gpmi_nand_init
irqchip/gic-v3: Fix OF_BAD_ADDR error handling
staging: rtl8192u: Fix potential infinite loop
staging: greybus: uart: fix unprivileged TIOCCSERIAL
spi: Fix use-after-free with devm_spi_alloc_*
soc: qcom: mdt_loader: Validate that p_filesz < p_memsz
soc: qcom: mdt_loader: Detect truncated read of segments
ACPI: CPPC: Replace cppc_attr with kobj_attribute
crypto: qat - Fix a double free in adf_create_ring
cpufreq: armada-37xx: Fix setting TBG parent for load levels
clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock
cpufreq: armada-37xx: Fix the AVS value for load L1
clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz
clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0
cpufreq: armada-37xx: Fix driver cleanup when registration failed
cpufreq: armada-37xx: Fix determining base CPU frequency
usb: gadget: r8a66597: Add missing null check on return from platform_get_resource
USB: cdc-acm: fix unprivileged TIOCCSERIAL
tty: actually undefine superseded ASYNC flags
tty: fix return value for unsupported ioctls
firmware: qcom-scm: Fix QCOM_SCM configuration
usbip: vudc: fix missing unlock on error in usbip_sockfd_store()
platform/x86: pmc_atom: Match all Beckhoff Automation baytrail boards with critclk_systems DMI table
x86/platform/uv: Fix !KEXEC build failure
Drivers: hv: vmbus: Increase wait time for VMbus unload
usb: dwc2: Fix host mode hibernation exit with remote wakeup flow.
usb: dwc2: Fix hibernation between host and device modes.
ttyprintk: Add TTY hangup callback.
soc: aspeed: fix a ternary sign expansion bug
media: vivid: fix assignment of dev->fbuf_out_flags
media: omap4iss: return error code when omap4iss_get() failed
media: m88rs6000t: avoid potential out-of-bounds reads on arrays
drm/amdkfd: fix build error with AMD_IOMMU_V2=m
x86/kprobes: Fix to check non boostable prefixes correctly
pata_arasan_cf: fix IRQ check
pata_ipx4xx_cf: fix IRQ check
sata_mv: add IRQ checks
ata: libahci_platform: fix IRQ check
nvme: retrigger ANA log update if group descriptor isn't found
vfio/mdev: Do not allow a mdev_type to have a NULL parent pointer
clk: qcom: a53-pll: Add missing MODULE_DEVICE_TABLE
clk: uniphier: Fix potential infinite loop
scsi: jazz_esp: Add IRQ check
scsi: sun3x_esp: Add IRQ check
scsi: sni_53c710: Add IRQ check
scsi: ibmvfc: Fix invalid state machine BUG_ON()
mfd: stm32-timers: Avoid clearing auto reload register
HSI: core: fix resource leaks in hsi_add_client_from_dt()
x86/events/amd/iommu: Fix sysfs type mismatch
sched/debug: Fix cgroup_path[] serialization
drivers/block/null_blk/main: Fix a double free in null_init.
HID: plantronics: Workaround for double volume key presses
perf symbols: Fix dso__fprintf_symbols_by_name() to return the number of printed chars
net: lapbether: Prevent racing when checking whether the netif is running
powerpc/prom: Mark identical_pvr_fixup as __init
powerpc: Fix HAVE_HARDLOCKUP_DETECTOR_ARCH build configuration
ALSA: core: remove redundant spin_lock pair in snd_card_disconnect
bug: Remove redundant condition check in report_bug
nfc: pn533: prevent potential memory corruption
net: hns3: Limiting the scope of vector_ring_chain variable
ALSA: usb-audio: Add error checks for usb_driver_claim_interface() calls
liquidio: Fix unintended sign extension of a left shift of a u16
powerpc/64s: Fix pte update for kernel memory on radix
powerpc/perf: Fix PMU constraint check for EBB events
powerpc: iommu: fix build when neither PCI or IBMVIO is set
mac80211: bail out if cipher schemes are invalid
mt7601u: fix always true expression
IB/hfi1: Fix error return code in parse_platform_config()
net: thunderx: Fix unintentional sign extension issue
RDMA/srpt: Fix error return code in srpt_cm_req_recv()
i2c: cadence: add IRQ check
i2c: emev2: add IRQ check
i2c: jz4780: add IRQ check
i2c: sh7760: add IRQ check
ASoC: ak5558: correct reset polarity
drm/i915/gvt: Fix error code in intel_gvt_init_device()
MIPS: pci-legacy: stop using of_pci_range_to_resource
powerpc/pseries: extract host bridge from pci_bus prior to bus removal
rtlwifi: 8821ae: upgrade PHY and RF parameters
i2c: sh7760: fix IRQ error path
mwl8k: Fix a double Free in mwl8k_probe_hw
vsock/vmci: log once the failed queue pair allocation
RDMA/i40iw: Fix error unwinding when i40iw_hmc_sd_one fails
ALSA: usb: midi: don't return -ENOMEM when usb_urb_ep_type_check fails
net: davinci_emac: Fix incorrect masking of tx and rx error channel
ath9k: Fix error check in ath9k_hw_read_revisions() for PCI devices
ath10k: Fix ath10k_wmi_tlv_op_pull_peer_stats_info() unlock without lock
powerpc/52xx: Fix an invalid ASM expression ('addi' used instead of 'add')
bnxt_en: fix ternary sign extension bug in bnxt_show_temp()
ARM: dts: uniphier: Change phy-mode to RGMII-ID to enable delay pins for RTL8211E
arm64: dts: uniphier: Change phy-mode to RGMII-ID to enable delay pins for RTL8211E
net: geneve: modify IP header check in geneve6_xmit_skb and geneve_xmit_skb
net:emac/emac-mac: Fix a use after free in emac_mac_tx_buf_send
RDMA/bnxt_re: Fix a double free in bnxt_qplib_alloc_res
net:nfc:digital: Fix a double free in digital_tg_recv_dep_req
kfifo: fix ternary sign extension bugs
mm/sparse: add the missing sparse_buffer_fini() in error branch
mm/memory-failure: unnecessary amount of unmapping
net: Only allow init netns to set default tcp cong to a restricted algo
smp: Fix smp_call_function_single_async prototype
Revert "net/sctp: fix race condition in sctp_destroy_sock"
sctp: delay auto_asconf init until binding the first addr
Revert "of/fdt: Make sure no-map does not remove already reserved regions"
Revert "fdt: Properly handle "no-map" field in the memory region"
tpm: fix error return code in tpm2_get_cc_attrs_tbl()
fs: dlm: fix debugfs dump
tipc: convert dest node's address to network order
ASoC: Intel: bytcr_rt5640: Enable jack-detect support on Asus T100TAF
net: stmmac: Set FIFO sizes for ipq806x
i2c: bail out early when RDWR parameters are wrong
ALSA: hdsp: don't disable if not enabled
ALSA: hdspm: don't disable if not enabled
ALSA: rme9652: don't disable if not enabled
Bluetooth: Set CONF_NOT_COMPLETE as l2cap_chan default
Bluetooth: initialize skb_queue_head at l2cap_chan_create()
net: bridge: when suppression is enabled exclude RARP packets
Bluetooth: check for zapped sk before connecting
ip6_vti: proper dev_{hold|put} in ndo_[un]init methods
ASoC: Intel: bytcr_rt5640: Add quirk for the Chuwi Hi8 tablet
i2c: Add I2C_AQ_NO_REP_START adapter quirk
mac80211: clear the beacon's CRC after channel switch
pinctrl: samsung: use 'int' for register masks in Exynos
cuse: prevent clone
selftests: Set CC to clang in lib.mk if LLVM is set
kconfig: nconf: stop endless search loops
sctp: Fix out-of-bounds warning in sctp_process_asconf_param()
powerpc/smp: Set numa node before updating mask
ASoC: rt286: Generalize support for ALC3263 codec
ethtool: ioctl: Fix out-of-bounds warning in store_link_ksettings_for_user()
samples/bpf: Fix broken tracex1 due to kprobe argument change
powerpc/pseries: Stop calling printk in rtas_stop_self()
wl3501_cs: Fix out-of-bounds warnings in wl3501_send_pkt
wl3501_cs: Fix out-of-bounds warnings in wl3501_mgmt_join
powerpc/iommu: Annotate nested lock for lockdep
net: ethernet: mtk_eth_soc: fix RX VLAN offload
ia64: module: fix symbolizer crash on fdescr
ASoC: rt286: Make RT286_SET_GPIO_* readable and writable
f2fs: fix a redundant call to f2fs_balance_fs if an error occurs
PCI: iproc: Fix return value of iproc_msi_irq_domain_alloc()
PCI: Release OF node in pci_scan_device()'s error path
ARM: 9064/1: hw_breakpoint: Do not directly check the event's overflow_handler hook
rpmsg: qcom_glink_native: fix error return code of qcom_glink_rx_data()
NFSv4.2: Always flush out writes in nfs42_proc_fallocate()
NFS: Deal correctly with attribute generation counter overflow
PCI: endpoint: Fix missing destroy_workqueue()
pNFS/flexfiles: fix incorrect size check in decode_nfs_fh()
NFSv4.2 fix handling of sr_eof in SEEK's reply
rtc: ds1307: Fix wday settings for rx8130
net: hns3: disable phy loopback setting in hclge_mac_start_phy
sctp: do asoc update earlier in sctp_sf_do_dupcook_a
ethernet:enic: Fix a use after free bug in enic_hard_start_xmit
sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b
netfilter: xt_SECMARK: add new revision to fix structure layout
drm/radeon: Fix off-by-one power_state index heap overwrite
drm/radeon: Avoid power table parsing memory leaks
khugepaged: fix wrong result value for trace_mm_collapse_huge_page_isolate()
mm/hugetlb: handle the error case in hugetlb_fix_reserve_counts()
ksm: fix potential missing rmap_item for stable_node
net: fix nla_strcmp to handle more than one trailing null character
smc: disallow TCP_ULP in smc_setsockopt()
netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check
sched/fair: Fix unfairness caused by missing load decay
kernel: kexec_file: fix error return code of kexec_calculate_store_digests()
netfilter: nftables: avoid overflows in nft_hash_buckets()
i40e: Fix use-after-free in i40e_client_subtask()
ARC: entry: fix off-by-one error in syscall number validation
powerpc/64s: Fix crashes when toggling stf barrier
powerpc/64s: Fix crashes when toggling entry flush barrier
hfsplus: prevent corruption in shrinking truncate
squashfs: fix divide error in calculate_skip()
userfaultfd: release page in error path to avoid BUG_ON
drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors are connected
iio: proximity: pulsedlight: Fix runtime PM imbalance on error
usb: fotg210-hcd: Fix an error message
ACPI: scan: Fix a memory leak in an error handling path
blk-mq: Swap two calls in blk_mq_exit_queue()
usb: dwc3: omap: improve extcon initialization
usb: dwc3: pci: Enable usb2-gadget-lpm-disable for Intel Merrifield
usb: xhci: Increase timeout for HC halt
usb: dwc2: Fix gadget DMA unmap direction
usb: core: hub: fix race condition about TRSMRCY of resume
usb: dwc3: gadget: Return success always for kick transfer in ep queue
xhci: Do not use GFP_KERNEL in (potentially) atomic context
xhci: Add reset resume quirk for AMD xhci controller.
iio: gyro: mpu3050: Fix reported temperature value
iio: tsl2583: Fix division by a zero lux_val
cdc-wdm: untangle a circular dependency between callback and softint
KVM: x86: Cancel pvclock_gtod_work on module removal
FDDI: defxx: Make MMIO the configuration default except for EISA
MIPS: Reinstate platform `__div64_32' handler
MIPS: Avoid DIVU in `__div64_32' if result would be zero
MIPS: Avoid handcoded DIVU in `__div64_32' altogether
thermal/core/fair share: Lock the thermal zone while looping over instances
kobject_uevent: remove warning in init_uevent_argv()
netfilter: conntrack: Make global sysctls readonly in non-init netns
clk: exynos7: Mark aclk_fsys1_200 as critical
nvme: do not try to reconfigure APST when the controller is not live
x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes
kgdb: fix gcc-11 warning on indentation
usb: sl811-hcd: improve misleading indentation
cxgb4: Fix the -Wmisleading-indentation warning
isdn: capi: fix mismatched prototypes
pinctrl: ingenic: Improve unreachable code generation
xsk: Simplify detection of empty and full rings
PCI: thunder: Fix compile testing
ARM: 9066/1: ftrace: pause/unpause function graph tracer in cpu_suspend()
ACPI / hotplug / PCI: Fix reference count leak in enable_slot()
Input: elants_i2c - do not bind to i2c-hid compatible ACPI instantiated devices
Input: silead - add workaround for x86 BIOS-es which bring the chip up in a stuck state
um: Mark all kernel symbols as local
ARM: 9075/1: kernel: Fix interrupted SMC calls
scripts/recordmcount.pl: Fix RISC-V regex for clang
riscv: Workaround mcount name prior to clang-13
ceph: fix fscache invalidation
scsi: target: tcmu: Return from tcmu_handle_completions() if cmd_id not found
gpiolib: acpi: Add quirk to ignore EC wakeups on Dell Venue 10 Pro 5055
ALSA: hda: generic: change the DAC ctl name for LO+SPK or LO+HP
block: reexpand iov_iter after read/write
lib: stackdepot: turn depot_lock spinlock to raw_spinlock
net: stmmac: Do not enable RX FIFO overflow interrupts
ip6_gre: proper dev_{hold|put} in ndo_[un]init methods
sit: proper dev_{hold|put} in ndo_[un]init methods
ip6_tunnel: sit: proper dev_{hold|put} in ndo_[un]init methods
ipv6: remove extra dev_hold() for fallback tunnels
iomap: fix sub-page uptodate handling
KVM: arm64: Initialize VCPU mdcr_el2 before loading it
treewide: Fix most Shebang lines
scripts: switch explicitly to Python 3
Linux 4.19.191
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2ea4fc6350bb5c5b5ae38ec7ad52ec20cf3b7aae
commit 1cea335d1db1ce6ab71b3d2f94a807112b738a0f upstream.
bio completions can race when a page spans more than one file system
block. Add a spinlock to synchronize marking the page uptodate.
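A minimal userspace sketch of the locking pattern follows. It assumes a C11 atomic_flag as a stand-in for the kernel spinlock and fewer than 64 blocks per page. struct page_model and block_read_done() are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a page that spans several filesystem blocks. */
struct page_model {
    atomic_flag lock;        /* stand-in for the per-page spinlock */
    unsigned int nr_blocks;  /* assumed 1..63 in this sketch */
    uint64_t uptodate;       /* one bit per block */
    bool page_uptodate;
};

static void page_model_init(struct page_model *p, unsigned int nr_blocks)
{
    assert(nr_blocks >= 1 && nr_blocks < 64);
    atomic_flag_clear(&p->lock);
    p->nr_blocks = nr_blocks;
    p->uptodate = 0;
    p->page_uptodate = false;
}

/*
 * Called once per completed block read. Setting the block bit and
 * testing whether all bits are set happen under the same lock, so two
 * racing completions cannot both miss the "last block" transition.
 */
static void block_read_done(struct page_model *p, unsigned int block)
{
    assert(block < p->nr_blocks);

    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ; /* spin until the lock is free */

    p->uptodate |= UINT64_C(1) << block;
    if (p->uptodate == (UINT64_C(1) << p->nr_blocks) - 1)
        p->page_uptodate = true;

    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}
```

Without the lock, two completions could each set their bit and then each observe a stale bitmap, leaving the page never marked uptodate.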
Fixes: 9dc55f1389 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ext4 and f2fs have traditionally not supported direct I/O on encrypted
files, since it's difficult to implement with the traditional
filesystem-layer encryption. But when inline encryption is used
instead, it's straightforward to support direct I/O, as long as the I/O
is fully filesystem-block-aligned. Add support for it by:
- Making the two generic direct I/O implementations in the kernel,
__blockdev_direct_IO() and iomap_dio_rw(), set the encryption context
on bios for inline-encrypted files. __blockdev_direct_IO() is used by
f2fs, and was used by ext4 in kernel v5.4 and earlier. iomap_dio_rw()
is used by ext4 in kernel v5.5 and later.
- Making ext4 and f2fs allow direct I/O to encrypted files (rather than the
current behavior of falling back to buffered I/O) when the file is
using inline encryption and the I/O is fully filesystem-block-aligned.
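The eligibility rule in the second item can be sketched as a standalone predicate. This is an illustration only: dio_allowed() and its parameters are hypothetical, the block size is assumed to be a power of two, and "fully filesystem-block-aligned" is modelled as file position and length both being multiples of the block size. Any other alignment rules that direct I/O imposes are outside this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: may this request use direct I/O, or must it
 * fall back to buffered I/O?
 */
static bool dio_allowed(bool encrypted, bool inline_crypt,
                        uint64_t pos, uint64_t len, uint32_t blocksize)
{
    /* Assumption: blocksize is a nonzero power of two. */
    assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);

    if (!encrypted)
        return true;   /* encryption places no extra restriction */
    if (!inline_crypt)
        return false;  /* filesystem-layer encryption: buffered I/O */

    /* Position and length must both be block-aligned. */
    return ((pos | len) & (uint64_t)(blocksize - 1)) == 0;
}
```

For example, with a 4096-byte block size, a 4096-byte request at offset 512 on an inline-encrypted file would fall back to buffered I/O in this model.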
Bug: 137270441
Change-Id: I4c8f7497eb8f829d03611d24281113d68c21d4d1
Signed-off-by: Eric Biggers <ebiggers@google.com>
[ Upstream commit 8f67b5adc030553fbc877124306f3f3bdab89aa8 ]
In commit 4721a601099, we tried to fix a problem wherein directio reads
into a splice pipe will bounce EFAULT/EAGAIN all the way out to
userspace by simulating a zero-byte short read. This happens because
some directio read implementations (xfs) will call
bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
reads, but as soon as we run out of pipe buffers that _get_pages call
returns EFAULT, which the splice code translates to EAGAIN and bounces
out to userspace.
In that commit, the iomap code catches the EFAULT and simulates a
zero-byte read, but that causes assertion errors on regular splice reads
because xfs doesn't allow short directio reads. This causes infinite
splice() loops and assertion failures on generic/095 on overlayfs
because xfs only permits total success or total failure of a directio
operation. The underlying issue in the pipe splice code has now been
fixed by changing the pipe splice loop to avoid reading more data
than there is space in the pipe.
Therefore, it's no longer necessary to simulate the short directio, so
remove the hack from iomap.
Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill")
Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Ranted-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 419e9c38aa075ed0cd3c13d47e15954b686bcdb6 upstream.
When splicing using iomap_dio_rw() to a pipe, we may leak pipe pages
because bio_iov_iter_get_pages() records that the pipe will have a full
extent worth of data. However, if the file size is not block size aligned,
iomap_dio_rw() returns less than what bio_iov_iter_get_pages() set up,
and the splice code gets confused, leaking a pipe page with the file tail.
Handle the situation similarly to the old direct IO implementation and
revert iter to actually returned read amount which makes iter consistent
with value returned from iomap_dio_rw() and thus the splice code is
happy.
Fixes: ff6a9292e6 ("iomap: implement direct I/O")
CC: stable@vger.kernel.org
Reported-by: syzbot+991400e8eba7e00a26e1@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8c110d43c6bca4b24dd13272a9d4e0ba6f2ec957 ]
When we read the EOF page of the file via readpages, we need
to zero the region beyond EOF that we either do not read or
should not contain data so that mmap does not expose stale data to
user applications.
However, iomap_adjust_read_range() fails to detect EOF correctly,
and so fsx on 1k block size filesystems fails very quickly with
mapreads exposing data beyond EOF. There are two problems here.
Firstly, when calculating the end block of the EOF byte, we have
to round the size by one to avoid a block aligned EOF from reporting
a block too large. i.e. a size of 1024 bytes is 1 block, which in
index terms is block 0. Therefore we have to calculate the end block
from (isize - 1), not isize.
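The off-by-one described above can be illustrated with a minimal userspace sketch. The function names are hypothetical and are not the kernel's; the sketch assumes isize > 0 and a block size of (1 << blkbits) bytes.

```c
#include <stdint.h>

/*
 * Index of the last block that holds file data, for a file of isize
 * bytes (isize > 0). Hypothetical helper, for illustration only.
 */
uint64_t sketch_eof_block(uint64_t isize, unsigned int blkbits)
{
	return (isize - 1) >> blkbits;
}

/*
 * The naive form, shown for comparison: it reports one block too many
 * when isize is block aligned.
 */
uint64_t sketch_eof_block_naive(uint64_t isize, unsigned int blkbits)
{
	return isize >> blkbits;
}
```

With 1k blocks (blkbits = 10), a 1024 byte file ends in block 0 according to the first helper, while the naive form reports block 1.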
The second bug is determining if the current page spans EOF, and so
whether we need to split it into two halves, one for the IO, and the
other for zeroing. Unfortunately, the code that checks whether
we should split the block doesn't actually check if we span EOF, it
just checks if the read spans the /offset in the page/ that EOF
sits on. So it splits every read into two if EOF is not page
aligned, regardless of whether we are reading the EOF block or not.
Hence we need to restrict the "does the read span EOF" check to
just the page that spans EOF, not every page we read.
This patch results in correct EOF detection through readpages:
xfs_vm_readpages: dev 259:0 ino 0x43 nr_pages 24
xfs_iomap_found: dev 259:0 ino 0x43 size 0x66c00 offset 0x4f000 count 98304 type hole startoff 0x13c startblock 1368 blockcount 0x4
iomap_readpage_actor: orig pos 323584 pos 323584, length 4096, poff 0 plen 4096, isize 420864
xfs_iomap_found: dev 259:0 ino 0x43 size 0x66c00 offset 0x50000 count 94208 type hole startoff 0x140 startblock 1497 blockcount 0x5c
iomap_readpage_actor: orig pos 327680 pos 327680, length 94208, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 331776 pos 331776, length 90112, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 335872 pos 335872, length 86016, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 339968 pos 339968, length 81920, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 344064 pos 344064, length 77824, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 348160 pos 348160, length 73728, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 352256 pos 352256, length 69632, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 356352 pos 356352, length 65536, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 360448 pos 360448, length 61440, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 364544 pos 364544, length 57344, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 368640 pos 368640, length 53248, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 372736 pos 372736, length 49152, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 376832 pos 376832, length 45056, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 380928 pos 380928, length 40960, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 385024 pos 385024, length 36864, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 389120 pos 389120, length 32768, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 393216 pos 393216, length 28672, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 397312 pos 397312, length 24576, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 401408 pos 401408, length 20480, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 405504 pos 405504, length 16384, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 409600 pos 409600, length 12288, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 413696 pos 413696, length 8192, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 417792 pos 417792, length 4096, poff 0 plen 3072, isize 420864
iomap_readpage_actor: orig pos 420864 pos 420864, length 1024, poff 3072 plen 1024, isize 420864
As you can see, it now does full page reads until the last one which
is split correctly at the block aligned EOF, reading 3072 bytes and
zeroing the last 1024 bytes. The original version of the patch got
this right, but it got another case wrong.
The EOF detection crossing really needs to use the original length
as plen, while it starts at the end of the block, will be shortened
as up-to-date blocks are found on the page. This means "orig_pos +
plen" no longer points to the end of the page, and so will not
correctly detect EOF crossing. Hence we have to use the length
passed in to detect this partial page case:
xfs_filemap_fault: dev 259:1 ino 0x43 write_fault 0
xfs_vm_readpage: dev 259:1 ino 0x43 nr_pages 1
xfs_iomap_found: dev 259:1 ino 0x43 size 0x2cc00 offset 0x2c000 count 4096 type hole startoff 0xb0 startblock 282 blockcount 0x4
iomap_readpage_actor: orig pos 180224 pos 181248, length 4096, poff 1024 plen 2048, isize 183296
xfs_iomap_found: dev 259:1 ino 0x43 size 0x2cc00 offset 0x2cc00 count 1024 type hole startoff 0xb3 startblock 285 blockcount 0x1
iomap_readpage_actor: orig pos 183296 pos 183296, length 1024, poff 3072 plen 1024, isize 183296
Here we see a trace where the first block on the EOF page is up to
date, hence poff = 1024 bytes. The offset into the page of EOF is
3072, so the range we want to read is 1024 - 3071, and the range we
want to zero is 3072 - 4095. You can see this is split correctly
now.
This fixes the stale data beyond EOF problem that fsx quickly
uncovers on 1k block size filesystems.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4721a6010990971440b4ffefbdf014976b8eda2f ]
When doing direct IO to a pipe for do_splice_direct(), the pipe is
trivial to fill up and overflow as it can only hold 16 pages. At
this point bio_iov_iter_get_pages() then returns -EFAULT, and we
abort the IO submission process. Unfortunately, iomap_dio_rw()
propagates the error back up the stack.
The error is converted from the EFAULT to EAGAIN in
generic_file_splice_read() to tell the splice layers that the pipe
is full. do_splice_direct() completely fails to handle EAGAIN errors
(it aborts on error) and returns EAGAIN to the caller.
copy_file_write() then completely fails to handle EAGAIN as well,
and so returns EAGAIN to userspace, having failed to copy the data
it was asked to.
Avoid this whole steaming pile of fail by having iomap_dio_rw()
silently swallow EFAULT errors and so do short reads.
To make matters worse, iomap_dio_actor() has a stale data exposure
bug when bio_iov_iter_get_pages() fails - it does not zero the tail
block that may have been left uncovered by partial IO. Fix the error
handling case to drop to the sub-block zeroing rather than
immediately returning the -EFAULT error.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b450672fb66b4a991a5b55ee24209ac7ae7690ce ]
If we are doing sub-block dio that extends EOF, we need to zero
the unused tail of the block to initialise the data in it. If we
do not zero the tail of the block, then an immediate mmap read of
the EOF block will expose stale data beyond EOF to userspace. Found
with fsx running sub-block DIO sizes vs MAPREAD/MAPWRITE operations.
Fix this by detecting if the end of the DIO write is beyond EOF
and zeroing the tail if necessary.
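As a rough illustration of the tail calculation, here is a minimal userspace sketch. The helper name and parameters are hypothetical, it treats a write that ends at or past the current EOF as "extending EOF" (an assumption), and it covers only the EOF case described here, not other reasons the kernel may zero a block.

```c
#include <stdint.h>

/*
 * Number of bytes to zero after a sub-block direct write so that the
 * rest of the block holds no stale data. Hypothetical helper.
 *
 * write_end: byte offset just past the last byte written
 * isize:     file size before the write
 * blkbits:   log2 of the filesystem block size
 */
uint64_t sketch_tail_bytes_to_zero(uint64_t write_end, uint64_t isize,
				   unsigned int blkbits)
{
	uint64_t block_size = (uint64_t)1 << blkbits;
	uint64_t pad = write_end & (block_size - 1);

	/* Writes that end inside the file, or on a block boundary, need nothing. */
	if (write_end < isize || pad == 0)
		return 0;

	return block_size - pad;
}
```

For example, a write ending at byte 1536 of a 1024 byte file on 1k blocks leaves 512 bytes of the block to zero.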
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0929d8580071c6a1cec1a7916a8f674c243ceee1 ]
When we write into an unwritten extent via direct IO, we dirty
metadata on IO completion to convert the unwritten extent to
written. However, when we do the FUA optimisation checks, the inode
may be clean and so we issue a FUA write into the unwritten extent.
This means we then bypass the generic_write_sync() call after
unwritten extent conversion has been done and we don't force the
modified metadata to stable storage.
This violates O_DSYNC semantics. The window of exposure is a single
IO, as the next DIO write will see the inode has dirty metadata and
hence will not use the FUA optimisation. Calling
generic_write_sync() after completion of the second IO will also
sync the first write and its metadata.
Fix this by avoiding the FUA optimisation when writing to unwritten
extents.
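The decision can be sketched as a small predicate. This is an illustration only: the function and its boolean parameters are hypothetical stand-ins for the state the kernel actually inspects, not the iomap code itself.

```c
#include <stdbool.h>

/*
 * Decide whether a direct write may be issued as a FUA write instead of
 * relying on a later generic_write_sync(). Hypothetical predicate.
 */
bool sketch_can_use_fua(bool dsync_write, bool device_has_fua,
			bool metadata_dirty, bool extent_unwritten)
{
	/* FUA only matters when data-sync semantics were requested. */
	if (!dsync_write || !device_has_fua)
		return false;

	/* Dirty metadata must be forced out by the sync path anyway. */
	if (metadata_dirty)
		return false;

	/*
	 * Writing into an unwritten extent dirties metadata at IO
	 * completion, so the sync must not be skipped.
	 */
	if (extent_unwritten)
		return false;

	return true;
}
```

The last check is the one this fix adds in spirit: a clean inode is not enough if the completion handler is about to convert the extent.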
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4ea899ead2786a30aaa8181fefa81a3df4ad28f6 ]
Introduce a local wait_for_completion variable to avoid an access to the
potentially freed dio structure after dropping the last reference count.
Also use the chance to document the completion behavior to make the
refcounting clear to the reader of the code.
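The pattern can be shown with a minimal userspace sketch using C11 atomics. The struct and function names are hypothetical and much simpler than the kernel's dio structure; the point is only that the flag is copied into a local before the reference is dropped.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a refcounted dio structure. */
struct sketch_dio {
	atomic_int ref;
	bool wait_for_completion;
};

/* Drop one reference and free the structure when it was the last one. */
static void sketch_dio_put(struct sketch_dio *dio)
{
	if (atomic_fetch_sub(&dio->ref, 1) == 1)
		free(dio);
}

/*
 * Finish submission. The flag is read into a local variable first,
 * because dio may already be freed once the reference is dropped.
 */
bool sketch_submit_done(struct sketch_dio *dio)
{
	bool wait_for_completion = dio->wait_for_completion;

	sketch_dio_put(dio);
	/* dio must not be dereferenced from here on. */
	return wait_for_completion;
}
```

Reading dio->wait_for_completion after sketch_dio_put() would be a use-after-free whenever the put dropped the last reference.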
Fixes: ff6a9292e6 ("iomap: implement direct I/O")
Reported-by: Chandan Rajendra <chandan@linux.ibm.com>
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Chandan Rajendra <chandan@linux.ibm.com>
Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8e47a457321ca1a74ad194ab5dcbca764bc70731 ]
migrate_page_move_mapping() expects pages with private data set to have
a page_count elevated by 1. This is what used to happen for xfs through
the buffer_heads code before the switch to iomap in commit 82cb14175e
("xfs: add support for sub-pagesize writeback without buffer_heads").
Not having the count elevated causes move_pages() to fail on memory
mapped files coming from xfs.
Make iomap compatible with the migrate_page_move_mapping() assumption by
elevating the page count as part of iomap_page_create() and lowering it
in iomap_page_release().
It causes the move_pages() syscall to misbehave on memory mapped files
from xfs. It does not move any pages, which I suppose is "just" a
perf issue, but it also ends up returning a positive number which is out
of spec for the syscall. Talking to Michal Hocko, it sounds like
returning positive numbers might be a necessary update to move_pages()
anyway though.
Fixes: 82cb14175e ("xfs: add support for sub-pagesize writeback without buffer_heads")
Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com>
[hch: actually get/put the page in iomap_migrate_page() to make it work
properly]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3cc31fa65d85610574c0f6a474e89f4c419923d5 ]
iomap_is_partially_uptodate() is intended to check whether blocks within
the selected range of a not-uptodate page are uptodate; if the range we
care about is up to date, it's an optimization.
However, the iomap implementation continues to check all blocks up to
from+count, which is beyond the page, and can even be well beyond the
iop->uptodate bitmap.
I think the worst that will happen is that we may eventually find a zero
bit and return "not partially uptodate" when it would have otherwise
returned true, and skip the optimization. Still, it's clearly an invalid
memory access that must be fixed.
So: fix this by limiting the search to within the page as is done in the
non-iomap variant, block_is_partially_uptodate().
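The clamping can be sketched in userspace as follows. This is a simplified illustration: the function name is hypothetical, and a plain array with one flag per block stands in for the iop->uptodate bitmap.

```c
#include <stdbool.h>

/*
 * Check whether every block touched by [from, from + count) is up to
 * date, never looking past the end of the page. Hypothetical helper;
 * block_uptodate[] holds one flag per block in the page.
 */
bool sketch_range_uptodate(const unsigned char *block_uptodate,
			   unsigned long page_size, unsigned int blkbits,
			   unsigned long from, unsigned long count)
{
	unsigned long len, first, last, i;

	if (count == 0 || from >= page_size)
		return false;

	/* Limit the range to this page before deriving block indices. */
	len = count < page_size - from ? count : page_size - from;
	first = from >> blkbits;
	last = (from + len - 1) >> blkbits;

	for (i = first; i <= last; i++)
		if (!block_uptodate[i])
			return false;
	return true;
}
```

Without the clamp, a large count would push the last index past the final block of the page, which is the out-of-bounds read that KASAN reported.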
Zorro noticed this when KASAN went off for 512 byte blocks on a 64k
page system:
BUG: KASAN: slab-out-of-bounds in iomap_is_partially_uptodate+0x1a0/0x1e0
Read of size 8 at addr ffff800120c3a318 by task fsstress/22337
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a837eca2412051628c0529768c9bc4f3580b040e ]
This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.
The reverted commit added page reference counting to iomap page
structures that are used to track block size < page size state. This
was supposed to align the code with page migration page accounting
assumptions, but what it has done instead is break XFS filesystems.
Every fstests run I've done on sub-page block size XFS filesystems
since picking up this commit 2 days ago has failed with bad page
state errors such as:
# ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038"
....
SECTION -- xfs
FSTYP -- xfs (debug)
PLATFORM -- Linux/x86_64 test1 4.20.0-rc6-dgc+
MKFS_OPTIONS -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /mnt/scratch
generic/038 454s ...
run fstests generic/038 at 2018-12-20 18:43:05
XFS (sdc): Unmounting Filesystem
XFS (sdc): Mounting V5 Filesystem
XFS (sdc): Ending clean mount
BUG: Bad page state in process kswapd0 pfn:3a7fa
page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1
flags: 0xfffffc0000000()
raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360
raw: 0000000000000001 0000000000000000 00000000ffffffff
page dumped because: non-NULL mapping
CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
Call Trace:
dump_stack+0x67/0x90
bad_page.cold.116+0x8a/0xbd
free_pcppages_bulk+0x4bf/0x6a0
free_unref_page_list+0x10f/0x1f0
shrink_page_list+0x49d/0xf50
shrink_inactive_list+0x19d/0x3b0
shrink_node_memcg.constprop.77+0x398/0x690
? shrink_slab.constprop.81+0x278/0x3f0
shrink_node+0x7a/0x2f0
kswapd+0x34b/0x6d0
? node_reclaim+0x240/0x240
kthread+0x11f/0x140
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x24/0x30
Disabling lock debugging due to kernel taint
....
The failures are from anywhere that frees pages and empties the
per-cpu page magazines, so it's not a predictable failure or an easy
to debug failure.
generic/038 is a reliable reproducer of this problem - it has a 9 in
10 failure rate on one of my test machines. Failures on other
machines have been at random points in fstests runs but every run
has ended up tripping this problem. Hence generic/038 was used to
bisect the failure because it was the most reliable failure.
It is too close to the 4.20 release (not to mention holidays) to
try to diagnose, fix and test the underlying cause of the problem,
so reverting the commit is the only option we have right now. The
revert has been tested against a current tot 4.20-rc7+ kernel across
multiple machines running sub-page block size XFS filesystems and
none of the bad page state failures have been seen.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Brian Foster <bfoster@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 61c6de667263184125d5ca75e894fcad632b0dd3 upstream.
migrate_page_move_mapping() expects pages with private data set to have
a page_count elevated by 1. This is what used to happen for xfs through
the buffer_heads code before the switch to iomap in commit 82cb14175e
("xfs: add support for sub-pagesize writeback without buffer_heads").
Not having the count elevated causes move_pages() to fail on memory
mapped files coming from xfs.
Make iomap compatible with the migrate_page_move_mapping() assumption by
elevating the page count as part of iomap_page_create() and lowering it
in iomap_page_release().
It causes the move_pages() syscall to misbehave on memory mapped files
from xfs. It does not move any pages, which I suppose is "just" a
perf issue, but it also ends up returning a positive number which is out
of spec for the syscall. Talking to Michal Hocko, it sounds like
returning positive numbers might be a necessary update to move_pages()
anyway though
(https://lkml.kernel.org/r/20181116114955.GJ14706@dhcp22.suse.cz).
I only hit this in tests that verify that move_pages() actually moved
the pages. The test also got confused by the positive return from
move_pages() (it got treated as a success as positive numbers were not
expected and not handled) making it a bit harder to track down what's
going on.
Link: http://lkml.kernel.org/r/20181115184140.1388751-1-pjaroszynski@nvidia.com
Fixes: 82cb14175e ("xfs: add support for sub-pagesize writeback without buffer_heads")
Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Brian Foster <bfoster@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The iomap page fault mechanism currently dirties the associated page
after the full block range of the page has been allocated. This
leaves the page susceptible to delayed allocations without ever
being set dirty on sub-page block sized filesystems.
For example, consider a page fault on a page with one preexisting
real (non-delalloc) block allocated in the middle of the page. The
first iomap_apply() iteration performs delayed allocation on the
range up to the preexisting block, the next iteration finds the
preexisting block, and the last iteration attempts to perform
delayed allocation on the range after the preexisting block to the
end of the page. If the first allocation succeeds and the final
allocation fails with -ENOSPC, iomap_apply() returns the error and
iomap_page_mkwrite() fails to dirty the page having already
performed partial delayed allocation. This eventually results in the
page being invalidated without ever converting the delayed
allocation to real blocks.
This problem is reliably reproduced by generic/083 on XFS on ppc64
systems (64k page size, 4k block size). It results in leaked
delalloc blocks on inode reclaim, which triggers an assert failure
in xfs_fs_destroy_inode() and filesystem accounting inconsistency.
Move the set_page_dirty() call from iomap_page_mkwrite() to the
actor callback, similar to how the buffer head implementation works.
The actor callback is called iff ->iomap_begin() returns success, so
ensures the page is dirtied as soon as possible after an allocation.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Pull xfs fixes from Darrick Wong:
- Fix an uninitialized variable
- Don't use obviously garbage AG header counters to calculate
transaction reservations
- Trigger icount recalculation on bad icount when mounting
* tag 'xfs-4.19-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: fix WARN_ON_ONCE on uninitialized variable
xfs: sanity check ag header values in xrep_calc_ag_resblks
xfs: recalculate summary counters at mount time if icount is bad
Pull xfs updates from Darrick Wong:
"This is the second part of the XFS changes for 4.19.
The biggest changes are the removal of buffer heads from XFS, a massive
reworking of the deferred transaction operations handling code, the
removal of the long defunct barrier/nobarrier mount options, and the
addition of a few more online repair functions.
Summary:
- Use extent maps to track pagecache page status instead of
bufferhead state.
- Refactor pagecache read and write paths to use the new iomap
library functions, which enable us to drop the old bufferhead code
for pagesize == blocksize filesystems.
- Set up parallel per-block-per-page metadata to track subpage
information that was tracked by buffer heads, which enables us to
drop the old bufferhead code for pagesize > blocksize filesystems.
- Tie a deferred ops control structure to a transaction so that we
can take advantage of an upper-level dfops without having to plumb
pointer passing through the code.
- Refactor the deferred ops code to track deferred ops as part of the
transaction structure (instead of as a separate data structure) so
that we can simplify the scoping rules around defer_ops.
- Refactor twisty delwri buffer submission code to avoid deadlocks.
- Shorten and fix indenting problems in the scrub code.
- Detect obviously bad summary counts at mount and fix them.
- Directly associate deferred ops control structure with a
transaction so that callers no longer have to manage it themselves.
- Remove a couple of IRIX-era inode macros.
- Remove the long-deprecated 'barrier' and 'nobarrier' mount options.
- Clean up the inode fork structure a bit.
- Check for bad fs summary counter values in the superblock.
- Reduce COW fork lookups during writeback.
- Refactor the deferred ops control structures into the transaction
structure, thereby eliminating the need for transaction users to
handle the deferred ops as a separate data structure.
- Add the ability to repair AG headers online.
- Fix a crash due to insufficient return value checking.
- Various fixes and cleanups"
* tag 'xfs-4.19-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (155 commits)
xfs: fix a null pointer dereference in xfs_bmap_extents_to_btree
xfs: remove b_last_holder & associated macros
iomap: Switch to offset_in_page for clarity
xfs: Close race between direct IO and xfs_break_layouts()
xfs: repair the AGI
xfs: repair the AGFL
xfs: repair the AGF
xfs: remove dead error handling code in xfs_dquot_disk_alloc()
xfs: use WRITE_ONCE to update if_seq
xfs: fix a comment in xfs_log_reserve
xfs: only validate summary counts on primary superblock
xfs: substitute spaces with tabs
xfs: fold dfops into the transaction
xfs: always defer agfl block frees
xfs: pass transaction to xfs_defer_add()
xfs: replace xfs_defer_ops ->dop_pending with on-stack list
xfs: cancel dfops on xfs_defer_finish() error
xfs: clean out superfluous dfops dop params/vars
xfs: drop dop param from xfs_defer_op_type ->finish_item() callback
xfs: automatic dfops inode relogging
...
In commit 9dc55f1389 ("iomap: add support for sub-pagesize buffered
I/O without buffer heads") we moved the initialization of poff (it's
computed from pos) into a separate helper function. Inline data only
ever deals with pos == 0, hence the WARN_ON_ONCE, but now we're testing
an uninitialized variable.
Therefore, change the test to check the parameter directly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Pull fs iomap refactoring from Darrick Wong:
"This is the first part of the XFS changes for 4.19.
Christoph and Andreas coordinated some refactoring work on the iomap
code in preparation for removing buffer heads from XFS and porting
gfs2 to iomap. I'm sending this small pull request ahead of the main
XFS merge to avoid holding up gfs2 unnecessarily"
* 'iomap-4.19-merge' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: add inline data support to iomap_readpage_actor
iomap: support direct I/O to inline data
iomap: refactor iomap_dio_actor
iomap: add initial support for writes without buffer heads
iomap: add an iomap-based readpage and readpages implementation
iomap: add private pointer to struct iomap
iomap: add a page_done callback
iomap: generic inline data handling
iomap: complete partial direct I/O writes synchronously
iomap: mark newly allocated buffer heads as new
fs: factor out a __generic_write_end helper
Instead of open-coding pos & (PAGE_SIZE - 1) and pos & ~PAGE_MASK, use
the offset_in_page macro.
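For illustration, here is a minimal userspace sketch showing that the two open-coded forms compute the same value. The SKETCH_ macros and function names are hypothetical stand-ins that assume a 4k page; the kernel's own PAGE_SIZE, PAGE_MASK and offset_in_page definitions are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's page macros, assuming 4k pages. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK	(~(SKETCH_PAGE_SIZE - 1))

/* First open-coded form: mask with PAGE_SIZE - 1. */
unsigned long sketch_offset_a(uint64_t pos)
{
	return (unsigned long)(pos & (SKETCH_PAGE_SIZE - 1));
}

/* Second open-coded form: mask with the inverted page mask. */
unsigned long sketch_offset_b(uint64_t pos)
{
	return (unsigned long)(pos & ~SKETCH_PAGE_MASK);
}
```

Both return the byte offset of pos within its page, which is what a single named macro expresses more clearly.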
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The position calculation in iomap_bmap() shifts bno the wrong way,
so we don't progress properly and end up re-mapping block zero
over and over, yielding an unchanging physical block range as the
logical block advances:
# filefrag -Be file
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 0: 21.. 21: 1: merged
1: 1.. 1: 21.. 21: 1: 22: merged
Discontinuity: Block 1 is at 21 (was 22)
2: 2.. 2: 21.. 21: 1: 22: merged
Discontinuity: Block 2 is at 21 (was 22)
3: 3.. 3: 21.. 21: 1: 22: merged
This breaks the FIBMAP interface for anyone using it (XFS), which
in turn breaks LILO, zipl, etc.
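The effect of the shift direction can be seen in a minimal userspace sketch. The function names are hypothetical; the sketch only converts a logical block number to a byte position and is not the iomap_bmap() code.

```c
#include <stdint.h>

/* Byte position of logical block bno: multiply by the block size. */
uint64_t sketch_block_to_pos(uint64_t bno, unsigned int blkbits)
{
	return bno << blkbits;
}

/*
 * The wrong direction, shown for comparison: every small block number
 * collapses to position 0, so block zero is mapped over and over.
 */
uint64_t sketch_block_to_pos_wrong(uint64_t bno, unsigned int blkbits)
{
	return bno >> blkbits;
}
```

With 4k blocks (blkbits = 12), block 3 lives at byte 12288, but the wrong shift yields 0 for blocks 1, 2 and 3 alike, which matches the repeating physical block in the filefrag output.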
Bug-actually-spotted-by: Darrick J. Wong <darrick.wong@oracle.com>
Fixes: 89eb1906a9 ("iomap: add an iomap-based bmap implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
After already supporting a simple implementation of buffered writes for
the blocksize == PAGE_SIZE case in the last commit this adds full support
even for smaller block sizes. There are three bits of per-block
information in the buffer_head structure that really matter for the iomap
read and write path:
- uptodate status (BH_uptodate)
- marked as currently under read I/O (BH_Async_Read)
- marked as currently under write I/O (BH_Async_Write)
Instead of having new per-block structures this now adds a per-page
structure called struct iomap_page to track this information in a slightly
different form:
- a bitmap for the per-block uptodate status. For worst case of a 64k
page size system this bitmap needs to contain 128 bits. For the
typical 4k page size case it only needs 8 bits, although we still
need a full unsigned long due to the way the atomic bitmap API works.
- two atomic_t counters are used to track the outstanding read and write
counts
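A per-page structure along these lines might look like the following userspace sketch. All names are hypothetical, the page size is assumed to be 4k with a 512 byte minimum block size, and the bit operations here are plain rather than atomic, unlike the kernel's bitmap API.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sizes: 4k pages, 512 byte minimum block size. */
#define SKETCH_PAGE_SIZE	4096u
#define SKETCH_MIN_BLOCK	512u
#define SKETCH_BLOCKS		(SKETCH_PAGE_SIZE / SKETCH_MIN_BLOCK)
#define SKETCH_ULONG_BITS	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_LONGS		\
	((SKETCH_BLOCKS + SKETCH_ULONG_BITS - 1) / SKETCH_ULONG_BITS)

/* Per-page tracking state, loosely modelled on the description above. */
struct sketch_iomap_page {
	atomic_int read_count;		/* outstanding reads */
	atomic_int write_count;		/* outstanding writes */
	unsigned long uptodate[SKETCH_LONGS];	/* one bit per block */
};

/* Mark one block of the page as up to date (not atomic in this sketch). */
void sketch_set_uptodate(struct sketch_iomap_page *iop, unsigned int block)
{
	iop->uptodate[block / SKETCH_ULONG_BITS] |=
		1UL << (block % SKETCH_ULONG_BITS);
}

/* The page is up to date once every one of its nblocks blocks is. */
bool sketch_page_uptodate(const struct sketch_iomap_page *iop,
			  unsigned int nblocks)
{
	for (unsigned int i = 0; i < nblocks; i++)
		if (!(iop->uptodate[i / SKETCH_ULONG_BITS] &
		      (1UL << (i % SKETCH_ULONG_BITS))))
			return false;
	return true;
}
```

With these sizes only 8 bits are needed, yet the array still occupies one full unsigned long, as the text notes.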
There is quite a bit of boilerplate code as the buffered I/O path uses
various helper methods, but the actual code is very straightforward.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Add support for reading from and writing to inline data to iomap_dio_rw.
This saves filesystems from having to implement fallback code for this
case.
The inline data is actually cached in the inode, so the I/O is only
direct in the sense that it doesn't go through the page cache.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Split the function up into two helpers for the bio based I/O and hole
case, and a small helper to call the two. This separates the code a
little better in preparation for supporting I/O to inline data.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
For now just limited to blocksize == PAGE_SIZE, where we can simply read
in the full page in write begin, and just set the whole page dirty after
copying data into it. This code is enabled by default and XFS will now
be fed pages without buffer heads in ->writepage and ->writepages.
If a file system sets the IOMAP_F_BUFFER_HEAD flag on the iomap the old
path will still be used, this both helps the transition in XFS and
prepares for the gfs2 migration to the iomap infrastructure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Simply use iomap_apply to iterate over the file and submit a bio for
each non-uptodate but mapped region and zero everything else. Note that
as-is this can not be used for file systems with a blocksize smaller than
the page size, but that support will be added later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
This will be used by gfs2 to attach data to transactions for the journaled
data mode. But the concept is generic enough that we might be able to
use it for other purposes like encryption/integrity post-processing in the
future.
Based on a patch from Andreas Gruenbacher.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Add generic inline data handling by adding a pointer to the inline data
region to struct iomap. When handling a buffered IOMAP_INLINE write,
iomap_write_begin will copy the current inline data from the inline data
region into the page cache, and iomap_write_end will copy the changes in
the page cache back to the inline data region.
This doesn't cover inline data reads and direct I/O yet because so far,
we have no users.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
[hch: small cleanups to better fit in with other iomap work]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
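The inline data round trip described above (copy the inline region into the page in write begin, copy the modified range back in write end) can be sketched in userspace as follows. This is only an illustration under stated assumptions: struct inline_map, sketch_write_begin() and sketch_write_end() are hypothetical names, not the kernel's struct iomap or its helpers, and a plain buffer stands in for the page cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical stand-in for the inline-data part of a mapping. */
struct inline_map {
	void   *inline_data;  /* pointer to the inline data region */
	size_t  length;       /* bytes of inline data currently stored */
	size_t  capacity;     /* total size of the inline region */
};

/* Write-begin step: seed the page with the current inline data. */
static void sketch_write_begin(const struct inline_map *map,
			       unsigned char *page)
{
	assert(map->length <= SKETCH_PAGE_SIZE);
	memcpy(page, map->inline_data, map->length);
	/* Zero the rest so the page holds no stale bytes. */
	memset(page + map->length, 0, SKETCH_PAGE_SIZE - map->length);
}

/*
 * Write-end step: copy the modified range [pos, pos + len) from the
 * page back to the inline region. Returns the number of bytes copied,
 * or 0 if the range does not fit in the inline region.
 */
static size_t sketch_write_end(struct inline_map *map,
			       const unsigned char *page,
			       size_t pos, size_t len)
{
	if (pos > map->capacity || len > map->capacity - pos)
		return 0;
	memcpy((unsigned char *)map->inline_data + pos, page + pos, len);
	if (pos + len > map->length)
		map->length = pos + len;
	return len;
}
```

A caller would run sketch_write_begin(), modify the page buffer, then pass the modified range to sketch_write_end(). What a real filesystem does when the data no longer fits inline is outside the scope of this sketch.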
According to xfstest generic/240, applications seem to expect direct I/O
writes to either complete as a whole or to fail; short direct I/O writes
are apparently not appreciated. This means that when only part of an
asynchronous direct I/O write succeeds, we can either fail the entire
write, or we can wait for the partial write to complete and retry the
remaining write as buffered I/O. The old __blockdev_direct_IO helper
has code for waiting for partial writes to complete; the new
iomap_dio_rw iomap helper does not.
The above mentioned fallback mode is needed for gfs2, which doesn't
allow block allocations under direct I/O to avoid taking cluster-wide
exclusive locks. As a consequence, an asynchronous direct I/O write to
a file range that contains a hole will result in a short write. In that
case, wait for the short write to complete to allow gfs2 to recover.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
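The fallback described above can be modelled roughly as below: a direct write that may not allocate blocks stops at the first hole, and the remainder is left for buffered I/O once the short write has completed. This is a simplified userspace model, not gfs2 or iomap code; struct sketch_file, sketch_direct_write() and sketch_write() are hypothetical names, and the file is reduced to one "allocated" flag per block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one flag per block, true if already allocated. */
struct sketch_file {
	const bool *allocated;
	size_t      nr_blocks;
};

/* How a write of 'count' blocks ended up being split. */
struct sketch_result {
	size_t direct;    /* blocks written by direct I/O */
	size_t buffered;  /* blocks left for the buffered fallback */
};

/*
 * Direct write that must not allocate: it stops at the first hole and
 * returns the number of blocks written, which may be short.
 */
static size_t sketch_direct_write(const struct sketch_file *f,
				  size_t start, size_t count)
{
	size_t done = 0;

	assert(start <= f->nr_blocks && count <= f->nr_blocks - start);
	while (done < count && f->allocated[start + done])
		done++;
	return done;
}

/*
 * Full write: try direct I/O first. In this model the direct part has
 * completed by the time sketch_direct_write() returns, which stands in
 * for waiting on the short write; the rest goes to buffered I/O.
 */
static struct sketch_result sketch_write(const struct sketch_file *f,
					 size_t start, size_t count)
{
	struct sketch_result r;

	r.direct = sketch_direct_write(f, start, count);
	r.buffered = count - r.direct;
	return r;
}
```

With blocks 0 and 1 allocated and block 2 a hole, a four-block write starting at block 0 is split into two blocks of direct I/O and two blocks of buffered I/O in this model.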
Pull more xfs updates from Darrick Wong:
"Here's the second round of patches for XFS for 4.18. Most of the
commits are small cleanups, bug fixes, and continued strengthening of
metadata verifiers; the bulk of the diff is the conversion of the
fs/xfs/ tree to use SPDX tags.
This series has been run through a full xfstests run over the weekend
and through a quick xfstests run against this morning's master, with
no major failures reported.
Summary:
- Strengthen metadata checking to avoid ASSERTing on bad disk
contents
- Validate btree records that are being retrieved for clients
- Strengthen root inode verification
- Convert license blurbs to SPDX tags
- Enable changing DAX flag on directories
- Fix some writeback deadlocks in reflink
- Refactor out some old xfs helpers
- Move type verifiers to a separate file
- Fix some fuzzer crashes
- Various other bug fixes"
* tag 'xfs-4.18-merge-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (31 commits)
xfs: update incore per-AG inode count
xfs: replace do_mod with native operations
xfs: don't call xfs_da_shrink_inode with NULL bp
xfs: clean up MIN/MAX
xfs: move various type verifiers to common file
xfs: xfs_reflink_convert_cow() memory allocation deadlock
xfs: setup VFS i_rwsem lockdep state correctly
xfs: fix string handling in label get/set functions
xfs: convert to SPDX license tags
xfs: validate btree records on retrieval
xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode()
xfs: verify root inode more thoroughly
xfs: verify COW extent size hint is valid in inode verifier
xfs: verify extent size hint is valid in inode verifier
xfs: catch bad stripe alignment configurations
iomap: fsync swap files before iterating mappings
xfs: use xfs_trans_getsb in xfs_sync_sb_buf
xfs: don't assert on corrupted unlinked inode list
xfs: explicitly pass buffer size to xfs_corruption_error
xfs: don't assert when on-disk btree pointers are garbage
...
Pull aio iopriority support from Al Viro:
"The rest of aio stuff for this cycle - Adam's aio ioprio series"
* 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: aio ioprio use ioprio_check_cap ret val
fs: aio ioprio add explicit block layer dependence
fs: iomap dio set bio prio from kiocb prio
fs: blkdev set bio prio from kiocb prio
fs: Add aio iopriority support
fs: Convert kiocb rw_hint from enum to u16
block: add ioprio_check_cap function
Swap files require that all the file mapping metadata be stable on disk.
It is insufficient to flush dirty pages in the page cache because that
won't necessarily result in filesystems pushing all their metadata out
to disk. Therefore, call fsync from iomap_swapfile_activate.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
We only call into this function through the iomap iterators, so we already
know the buffer is unwritten. In addition to that we always require the
uptodate flag that is ORed with the result anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
This function is only used by the iomap code, depends on being called
from it, and will soon stop poking into buffer head internals.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
This adds a simple iomap-based implementation of the legacy ->bmap
interface. Note that we can't easily add checks for rt or reflink
files, so these will have to remain in the callers. This interface
just needs to die..
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Factor the repeated calculation of the on-disk sector for a given logical
block into a little helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
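The calculation such a helper factors out can be illustrated as follows: take the disk byte address where the mapping starts, add the distance of the file position from the start of the mapping, and shift down to 512-byte sectors. This is a userspace sketch with assumed names (struct sector_map, sketch_iomap_sector(), SKETCH_SECTOR_SHIFT); it is not a copy of the kernel helper.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_SHIFT 9  /* assumes 512-byte sectors */

/* Hypothetical stand-in for the fields of a mapping used here. */
struct sector_map {
	uint64_t addr;    /* disk byte address of the start of the mapping */
	uint64_t offset;  /* file byte offset of the start of the mapping */
	uint64_t length;  /* length of the mapping in bytes */
};

/* Return the on-disk sector that backs file position 'pos'. */
static uint64_t sketch_iomap_sector(const struct sector_map *map,
				    uint64_t pos)
{
	/* 'pos' must fall inside the mapping. */
	assert(pos >= map->offset && pos - map->offset < map->length);
	return (map->addr + pos - map->offset) >> SKETCH_SECTOR_SHIFT;
}
```

For a mapping at disk address 1 MiB that covers file offsets 8192 through 24575, file position 12288 lies 4096 bytes into the mapping, that is, 8 sectors past sector 2048.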
We don't need any merging logic, and this also replaces a BUG_ON with a
WARN_ON_ONCE inside __bio_add_page for the impossible overflow condition.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Inline data is fundamentally different from our normal mapped case in that
it doesn't even have a block address. So instead of having a flag for it
it should be an entirely separate iomap range type.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Now that kiocb has an ioprio field copy this over to the bio when it is
created from the kiocb during direct IO.
Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
generic_swapfile_activate() doesn't allow holes, so we should be
consistent here. This is also a bit safer: if the user creates a
swapfile with, say, truncate -s $SIZE followed by mkswap, they should
really get an error and not much less swap space than they expected.
swapon(8) will error out before calling swapon(2) if the file has holes,
anyways.
Fixes: 9d93388b0afe ("iomap: add a swapfile activation function")
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Currently, for an invalid swap file, we print the same error message
regardless of the reason. This isn't very useful for an admin, who will
likely want to know why exactly they can't use their swap file. So,
let's add specific error messages for each reason, and also move the
bdev check after the flags checks, since the latter are more
fundamental.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
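The check ordering described above (type and flags checks first, block device check last, each with its own reason) might look roughly like this in a userspace model. Everything here is an assumption made for illustration: the enum values, flag names, reason strings and sk_check_extent() are hypothetical and are not the kernel's identifiers or its exact messages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extent types and flags for this sketch. */
enum sk_type { SK_HOLE, SK_MAPPED, SK_UNWRITTEN, SK_INLINE };
#define SK_F_DIRTY  0x1u  /* metadata not yet committed */
#define SK_F_SHARED 0x2u  /* extent shared with another file */

enum sk_reason {
	SK_OK, SK_ERR_INLINE, SK_ERR_HOLES,
	SK_ERR_DIRTY, SK_ERR_SHARED, SK_ERR_MULTIDEV
};

struct sk_extent {
	enum sk_type type;
	unsigned int flags;
	int          dev;  /* id of the device backing the extent */
};

/* Classify one extent; holes are rejected rather than skipped. */
static enum sk_reason sk_check_extent(const struct sk_extent *e,
				      int swap_dev)
{
	assert(e != NULL);
	switch (e->type) {
	case SK_MAPPED:
	case SK_UNWRITTEN:
		break;
	case SK_INLINE:
		return SK_ERR_INLINE;
	default:
		return SK_ERR_HOLES;
	}
	/* Flag checks come before the device check. */
	if (e->flags & SK_F_DIRTY)
		return SK_ERR_DIRTY;
	if (e->flags & SK_F_SHARED)
		return SK_ERR_SHARED;
	if (e->dev != swap_dev)
		return SK_ERR_MULTIDEV;
	return SK_OK;
}

/* Map a reason to a message an admin could act on. */
static const char *sk_reason_str(enum sk_reason r)
{
	switch (r) {
	case SK_ERR_INLINE:   return "file is inline";
	case SK_ERR_HOLES:    return "file has holes";
	case SK_ERR_DIRTY:    return "file is not committed";
	case SK_ERR_SHARED:   return "file has shared extents";
	case SK_ERR_MULTIDEV: return "file is on multiple devices";
	default:              return NULL;
	}
}
```

In this model, an extent that is both shared and on the wrong device is reported as shared, because the flag checks run first.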
Add a new iomap_swapfile_activate function so that filesystems can
activate swap files without having to use the obsolete and slow bmap
function. This enables XFS to support fallocate'd swap files and
swap files on realtime devices.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
If we are doing direct IO writes with datasync semantics, we often
have to flush metadata changes along with the data write. However,
if we are overwriting existing data, there are no metadata changes
that we need to flush. In this case, optimising the IO by using
FUA write makes sense.
The IOMAP_F_DIRTY flag tells us whether a specific inode
requires a metadata flush - this is currently used by DAX to ensure
extent modification is stable in page fault operations. For direct
IO writes, we can use it to determine if we need to flush metadata
or not once the data is on disk.
Hence if we have been returned a mapped extent that is not new and
the IO mapping is not dirty, then we can use a FUA write to provide
datasync semantics. This allows us to short-cut the
generic_write_sync() call in IO completion and hence avoid
unnecessary operations. This makes pure direct IO data write
behaviour identical to the way block devices use REQ_FUA to provide
datasync semantics.
On a FUA enabled device, a synchronous direct IO write workload
(sequential 4k overwrites in 32MB file) had the following results:
# xfs_io -fd -c "pwrite -V 1 -D 0 32m" /mnt/scratch/boo
kernel       time  write()s  write iops  Write b/w
------       ----  --------  ----------  ---------
(no dsync)     4s    2173/s        2173    8.5MB/s
vanilla       22s     370/s         750    1.4MB/s
patched       19s     420/s         420    1.6MB/s
The patched code clearly doesn't send cache flushes anymore, but
instead uses FUA (confirmed via blktrace), and performance improves
a bit as a result. However, the benefits will be higher on workloads
that mix O_DSYNC overwrites with other write IO as we won't be
flushing the entire device cache on every DSYNC overwrite IO
anymore.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
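The decision described above reduces to a small predicate: a datasync direct write can be issued as a FUA write only if the extent is mapped, is not newly allocated, and the mapping is not dirty. The sketch below models that predicate in userspace. The type and flag names and sketch_can_use_fua() are hypothetical, and the explicit "device supports FUA" input is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical extent types and flags for this sketch. */
enum fua_type { FUA_HOLE, FUA_MAPPED, FUA_UNWRITTEN };
#define FUA_F_NEW   0x1u  /* extent was allocated by this write */
#define FUA_F_DIRTY 0x2u  /* inode has metadata that needs flushing */

struct fua_map {
	enum fua_type type;
	unsigned int  flags;
};

/*
 * Return true if a datasync write to this mapping can rely on FUA
 * alone, so the completion path can skip the full sync.
 */
static bool sketch_can_use_fua(const struct fua_map *m,
			       bool dsync, bool dev_has_fua)
{
	assert(m != NULL);
	if (!dsync || !dev_has_fua)
		return false;
	/* Only a pure overwrite of existing data qualifies. */
	if (m->type != FUA_MAPPED)
		return false;
	return !(m->flags & (FUA_F_NEW | FUA_F_DIRTY));
}
```

If any part of a write fails this test, the whole write still needs the normal sync on completion; the sketch covers only the per-mapping decision.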
Currently iomap_dio_rw() only handles (data)sync write completions
for AIO. This means we can't optimise non-AIO IO to minimise device
flushes as we can't tell the caller whether a flush is required or
not.
To solve this problem and enable further optimisations, make
iomap_dio_rw responsible for data sync behaviour for all IO, not
just AIO.
In doing so, the sync operation is now accounted as part of the DIO
IO by inode_dio_end(), hence post-IO data stability updates will no
longer race against operations that serialise via inode_dio_wait()
such as truncate or hole punch.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Don't let the iomap callback get away with feeding us a garbage zero
length mapping -- there was a bug in xfs that resulted in those leaking
out to hilarious effect.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
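The guard described above can be sketched as a check between the "begin" callback and the code that consumes the mapping: a zero-length mapping is rejected with an error instead of being processed. The names below (struct zl_map, zl_begin_fn, sketch_apply() and the two sample callbacks) are hypothetical and only model the idea in userspace. The check that the mapping starts at or before the requested position is an extra sanity check added for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping returned by a filesystem callback. */
struct zl_map {
	uint64_t offset;  /* file offset where the mapping starts */
	uint64_t length;  /* length of the mapping in bytes */
};

typedef int (*zl_begin_fn)(uint64_t pos, uint64_t len, struct zl_map *out);

/* Well-behaved callback: maps exactly the requested range. */
static int zl_good_begin(uint64_t pos, uint64_t len, struct zl_map *out)
{
	out->offset = pos;
	out->length = len;
	return 0;
}

/* Buggy callback: reports success but returns an empty mapping. */
static int zl_buggy_begin(uint64_t pos, uint64_t len, struct zl_map *out)
{
	(void)len;
	out->offset = pos;
	out->length = 0;
	return 0;
}

/* Returns the number of bytes covered, or a negative errno. */
static int64_t sketch_apply(uint64_t pos, uint64_t len, zl_begin_fn begin)
{
	struct zl_map map = { 0, 0 };
	uint64_t covered;
	int ret;

	assert(begin != NULL);
	ret = begin(pos, len, &map);
	if (ret)
		return ret;
	/* Refuse garbage: an empty mapping would make no progress. */
	if (map.length == 0)
		return -EIO;
	/* Sketch-only sanity check: the mapping must cover 'pos'. */
	if (map.offset > pos || pos - map.offset >= map.length)
		return -EIO;
	covered = map.offset + map.length - pos;
	return (int64_t)(covered < len ? covered : len);
}
```

Without the length check, a caller that loops until the whole range is covered would spin forever on the buggy callback, since each iteration would cover zero bytes.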
If two programs simultaneously try to write to the same part of a file
via direct IO and buffered IO, there's a chance that the post-diowrite
pagecache invalidation will fail on the dirty page. When this happens,
the dio write succeeded, which means that the page cache is no longer
coherent with the disk!
Programs are not supposed to mix IO types and this is a clear case of
data corruption, so store an EIO which will be reflected to userspace
during the next fsync. Replace the WARN_ON with a ratelimited pr_crit
so that the developers have /some/ kind of breadcrumb to track down the
offending program(s) and file(s) involved.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
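The error reporting described above, where a failed post-write invalidation is recorded and surfaces at the next fsync, can be modelled in a few lines. This is a deliberately simplified userspace model: struct wb_mapping, sketch_dio_write_done() and sketch_fsync() are hypothetical names, and a single integer stands in for the kernel's writeback error tracking, which has more elaborate semantics.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file state: one pending writeback error. */
struct wb_mapping {
	int wb_err;  /* 0 or a negative errno waiting to be reported */
};

/*
 * Called after a direct write has reached disk. If the page cache
 * could not be invalidated, the cache and disk now disagree, so
 * remember an I/O error for the next fsync.
 */
static void sketch_dio_write_done(struct wb_mapping *m,
				  bool invalidate_failed)
{
	assert(m != NULL);
	if (invalidate_failed && m->wb_err == 0)
		m->wb_err = -EIO;
}

/* Report the pending error once, then clear it. */
static int sketch_fsync(struct wb_mapping *m)
{
	int err;

	assert(m != NULL);
	err = m->wb_err;
	m->wb_err = 0;
	return err;
}
```

In this model the direct write itself still returns success; only the later sketch_fsync() call sees the error, which matches the "reflected to userspace during the next fsync" behaviour described in the commit message.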