dc261c32e7327e4f4b8192a73823f49b35805101
379 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
dc261c32e7 |
Merge 4.19.267 into android-4.19-stable
Changes in 4.19.267 phy: stm32: fix an error code in probe wifi: cfg80211: fix memory leak in query_regdb_file() HID: hyperv: fix possible memory leak in mousevsc_probe() net: gso: fix panic on frag_list with mixed head alloc types net: tun: Fix memory leaks of napi_get_frags bnxt_en: fix potentially incorrect return value for ndo_rx_flow_steer net: fman: Unregister ethernet device on removal capabilities: fix undefined behavior in bit shift for CAP_TO_MASK net: lapbether: fix issue of dev reference count leakage in lapbeth_device_event() hamradio: fix issue of dev reference count leakage in bpq_device_event() drm/vc4: Fix missing platform_unregister_drivers() call in vc4_drm_register() ipv6: addrlabel: fix infoleak when sending struct ifaddrlblmsg to network tipc: fix the msg->req tlv len check in tipc_nl_compat_name_table_dump_header dmaengine: mv_xor_v2: Fix a resource leak in mv_xor_v2_remove() drivers: net: xgene: disable napi when register irq failed in xgene_enet_open() net: nixge: disable napi when enable interrupts failed in nixge_open() net: cxgb3_main: disable napi when bind qsets failed in cxgb_up() ethernet: s2io: disable napi when start nic failed in s2io_card_up() net: mv643xx_eth: disable napi when init rxq or txq failed in mv643xx_eth_open() net: macvlan: fix memory leaks of macvlan_common_newlink riscv: process: fix kernel info leakage arm64: efi: Fix handling of misaligned runtime regions and drop warning ALSA: hda/ca0132: add quirk for EVGA Z390 DARK ALSA: hda: fix potential memleak in 'add_widget_node' ALSA: usb-audio: Add quirk entry for M-Audio Micro ALSA: usb-audio: Add DSD support for Accuphase DAC-60 vmlinux.lds.h: Fix placement of '.data..decrypted' section nilfs2: fix deadlock in nilfs_count_free_blocks() nilfs2: fix use-after-free bug of ns_writer on remount drm/i915/dmabuf: fix sg_table handling in map_dma_buf platform/x86: hp_wmi: Fix rfkill causing soft blocked wifi btrfs: selftests: fix wrong error check in btrfs_free_dummy_root() udf: Fix a slab-out-of-bounds write bug in udf_find_entry() cert host tools: Stop complaining about deprecated OpenSSL functions dmaengine: at_hdmac: Fix at_lli struct definition dmaengine: at_hdmac: Don't start transactions at tx_submit level dmaengine: at_hdmac: Fix completion of unissued descriptor in case of errors dmaengine: at_hdmac: Don't allow CPU to reorder channel enable dmaengine: at_hdmac: Fix impossible condition dmaengine: at_hdmac: Check return code of dma_async_device_register net: tun: call napi_schedule_prep() to ensure we own a napi x86/cpu: Restore AMD's DE_CFG MSR after resume ASoC: wm5102: Revert "ASoC: wm5102: Fix PM disable depth imbalance in wm5102_probe" ASoC: wm5110: Revert "ASoC: wm5110: Fix PM disable depth imbalance in wm5110_probe" ASoC: wm8997: Revert "ASoC: wm8997: Fix PM disable depth imbalance in wm8997_probe" spi: intel: Fix the offset to get the 64K erase opcode selftests/futex: fix build for clang selftests/intel_pstate: fix build for ARCH=x86_64 NFSv4: Retry LOCK on OLD_STATEID during delegation return drm/imx: imx-tve: Fix return type of imx_tve_connector_mode_valid btrfs: remove pointless and double ulist frees in error paths of qgroup tests Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm ASoC: core: Fix use-after-free in snd_soc_exit() serial: 8250_omap: remove wait loop from Errata i202 workaround serial: 8250: omap: Flush PM QOS work on remove serial: imx: Add missing .thaw_noirq hook tty: n_gsm: fix sleep-in-atomic-context bug in gsm_control_send ASoC: soc-utils: Remove __exit for snd_soc_util_exit() block: sed-opal: kmalloc the cmd/resp buffers siox: fix possible memory leak in siox_device_add() parport_pc: Avoid FIFO port location truncation pinctrl: devicetree: fix null pointer dereferencing in pinctrl_dt_to_map ata: libata-transport: fix double ata_host_put() in ata_tport_add() net: bgmac: Drop free_netdev() from bgmac_enet_remove() mISDN: fix possible memory leak in mISDN_dsp_element_register() mISDN: fix misuse of put_device() in mISDN_register_device() net: caif: fix double disconnect client in chnl_net_open() bnxt_en: Remove debugfs when pci_register_driver failed xen/pcpu: fix possible memory leak in register_pcpu() drbd: use after free in drbd_create_device() net/x25: Fix skb leak in x25_lapb_receive_frame() cifs: Fix wrong return value checking when GETFLAGS net: thunderbolt: Fix error handling in tbnet_init() ftrace: Fix the possible incorrect kernel message ftrace: Optimize the allocation for mcount entries ftrace: Fix null pointer dereference in ftrace_add_mod() ring_buffer: Do not deactivate non-existant pages ALSA: usb-audio: Drop snd_BUG_ON() from snd_usbmidi_output_open() slimbus: stream: correct presence rate frequencies speakup: fix a segfault caused by switching consoles USB: serial: option: add Sierra Wireless EM9191 USB: serial: option: remove old LARA-R6 PID USB: serial: option: add u-blox LARA-R6 00B modem USB: serial: option: add u-blox LARA-L6 modem USB: serial: option: add Fibocom FM160 0x0111 composition usb: add NO_LPM quirk for Realforce 87U Keyboard usb: chipidea: fix deadlock in ci_otg_del_timer iio: adc: at91_adc: fix possible memory leak in at91_adc_allocate_trigger() iio: trigger: sysfs: fix possible memory leak in iio_sysfs_trig_init() iio: pressure: ms5611: changed hardcoded SPI speed to value limited dm ioctl: fix misbehavior if list_versions races with module loading serial: 8250: Fall back to non-DMA Rx if IIR_RDI occurs serial: 8250_lpss: Configure DMA also w/o DMA filter mmc: core: properly select voltage range without power cycle mmc: sdhci-pci: Fix possible memory leak caused by missing pci_dev_put() docs: update mediator contact information in CoC doc misc/vmw_vmci: fix an infoleak in vmci_host_do_receive_datagram() scsi: target: tcm_loop: Fix possible name leak in tcm_loop_setup_hba_bus() Input: i8042 - fix leaking of platform device on module removal serial: 8250: Flush DMA Rx on RLSI macvlan: enforce a consistent minimal mtu tcp: cdg: allow tcp_cdg_release() to be called multiple times kcm: avoid potential race in kcm_tx_work bpf, test_run: Fix alignment problem in bpf_prog_test_run_skb() kcm: close race conditions on sk_receive_queue 9p: trans_fd/p9_conn_cancel: drop client lock earlier gfs2: Check sb_bsize_shift after reading superblock gfs2: Switch from strlcpy to strscpy 9p/trans_fd: always use O_NONBLOCK read/write mm: fs: initialize fsdata passed to write_begin/write_end interface ntfs: fix use-after-free in ntfs_attr_find() ntfs: fix out-of-bounds read in ntfs_attr_find() ntfs: check overflow when iterating ATTR_RECORDs Linux 4.19.267 Change-Id: Id7e07ae5c1681de4cd1b0499cf1bfd257ca2261b Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
8a5be2948f |
mm: fs: initialize fsdata passed to write_begin/write_end interface
commit 1468c6f4558b1bcd92aa0400f2920f9dc7588402 upstream. Functions implementing the a_ops->write_end() interface accept the `void *fsdata` parameter that is supposed to be initialized by the corresponding a_ops->write_begin() (which accepts `void **fsdata`). However not all a_ops->write_begin() implementations initialize `fsdata` unconditionally, so it may get passed uninitialized to a_ops->write_end(), resulting in undefined behavior. Fix this by initializing fsdata with NULL before the call to write_begin(), rather than doing so in all possible a_ops implementations. This patch covers only the following cases found by running x86 KMSAN under syzkaller: - generic_perform_write() - cont_expand_zero() and generic_cont_expand_simple() - page_symlink() Other cases of passing uninitialized fsdata may persist in the codebase. Link: https://lkml.kernel.org/r/20220915150417.722975-43-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
d1253c75a8 |
Merge 4.19.155 into android-4.19-stable
Changes in 4.19.155
objtool: Support Clang non-section symbols in ORC generation
scripts/setlocalversion: make git describe output more reliable
arm64: Run ARCH_WORKAROUND_1 enabling code on all CPUs
arm64: link with -z norelro regardless of CONFIG_RELOCATABLE
x86/PCI: Fix intel_mid_pci.c build error when ACPI is not enabled
efivarfs: Replace invalid slashes with exclamation marks in dentries.
chelsio/chtls: fix deadlock issue
chelsio/chtls: fix memory leaks in CPL handlers
chelsio/chtls: fix tls record info to user
gtp: fix an use-before-init in gtp_newlink()
mlxsw: core: Fix memory leak on module removal
netem: fix zero division in tabledist
ravb: Fix bit fields checking in ravb_hwtstamp_get()
tcp: Prevent low rmem stalls with SO_RCVLOWAT.
tipc: fix memory leak caused by tipc_buf_append()
r8169: fix issue with forced threading in combination with shared interrupts
cxgb4: set up filter action after rewrites
arch/x86/amd/ibs: Fix re-arming IBS Fetch
x86/xen: disable Firmware First mode for correctable memory errors
fuse: fix page dereference after free
bpf: Fix comment for helper bpf_current_task_under_cgroup()
evm: Check size of security.evm before using it
p54: avoid accessing the data mapped to streaming DMA
cxl: Rework error message for incompatible slots
RDMA/addr: Fix race with netevent_callback()/rdma_addr_cancel()
mtd: lpddr: Fix bad logic in print_drs_error
serial: pl011: Fix lockdep splat when handling magic-sysrq interrupt
ata: sata_rcar: Fix DMA boundary mask
fscrypt: return -EXDEV for incompatible rename or link into encrypted dir
fscrypt: clean up and improve dentry revalidation
fscrypt: fix race allowing rename() and link() of ciphertext dentries
fs, fscrypt: clear DCACHE_ENCRYPTED_NAME when unaliasing directory
fscrypt: only set dentry_operations on ciphertext dentries
fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext
Revert "block: ratelimit handle_bad_sector() message"
xen/events: don't use chip_data for legacy IRQs
xen/events: avoid removing an event channel while handling it
xen/events: add a proper barrier to 2-level uevent unmasking
xen/events: fix race in evtchn_fifo_unmask()
xen/events: add a new "late EOI" evtchn framework
xen/blkback: use lateeoi irq binding
xen/netback: use lateeoi irq binding
xen/scsiback: use lateeoi irq binding
xen/pvcallsback: use lateeoi irq binding
xen/pciback: use lateeoi irq binding
xen/events: switch user event channels to lateeoi model
xen/events: use a common cpu hotplug hook for event channels
xen/events: defer eoi in case of excessive number of events
xen/events: block rogue events for some time
x86/unwind/orc: Fix inactive tasks with stack pointer in %sp on GCC 10 compiled kernels
mlxsw: core: Fix use-after-free in mlxsw_emad_trans_finish()
RDMA/qedr: Fix memory leak in iWARP CM
ata: sata_nv: Fix retrieving of active qcs
futex: Fix incorrect should_fail_futex() handling
powerpc/powernv/smp: Fix spurious DBG() warning
mm: fix exec activate_mm vs TLB shootdown and lazy tlb switching race
powerpc: select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
sparc64: remove mm_cpumask clearing to fix kthread_use_mm race
f2fs: add trace exit in exception path
f2fs: fix uninit-value in f2fs_lookup
f2fs: fix to check segment boundary during SIT page readahead
um: change sigio_spinlock to a mutex
ARM: 8997/2: hw_breakpoint: Handle inexact watchpoint addresses
power: supply: bq27xxx: report "not charging" on all types
xfs: fix realtime bitmap/summary file truncation when growing rt volume
video: fbdev: pvr2fb: initialize variables
ath10k: start recovery process when payload length exceeds max htc length for sdio
ath10k: fix VHT NSS calculation when STBC is enabled
drm/brige/megachips: Add checking if ge_b850v3_lvds_init() is working correctly
media: videodev2.h: RGB BT2020 and HSV are always full range
media: platform: Improve queue set up flow for bug fixing
usb: typec: tcpm: During PR_SWAP, source caps should be sent only after tSwapSourceStart
media: tw5864: check status of tw5864_frameinterval_get
media: imx274: fix frame interval handling
mmc: via-sdmmc: Fix data race bug
drm/bridge/synopsys: dsi: add support for non-continuous HS clock
arm64: topology: Stop using MPIDR for topology information
printk: reduce LOG_BUF_SHIFT range for H8300
ia64: kprobes: Use generic kretprobe trampoline handler
kgdb: Make "kgdbcon" work properly with "kgdb_earlycon"
media: uvcvideo: Fix dereference of out-of-bound list iterator
riscv: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
cpufreq: sti-cpufreq: add stih418 support
USB: adutux: fix debugging
uio: free uio id after uio file node is freed
usb: xhci: omit duplicate actions when suspending a runtime suspended host.
arm64/mm: return cpu_all_mask when node is NUMA_NO_NODE
xfs: don't free rt blocks when we're doing a REMAP bunmapi call
ACPI: Add out of bounds and numa_off protections to pxm_to_node()
drivers/net/wan/hdlc_fr: Correctly handle special skb->protocol values
bus/fsl_mc: Do not rely on caller to provide non NULL mc_io
power: supply: test_power: add missing newlines when printing parameters by sysfs
drm/amd/display: HDMI remote sink need mode validation for Linux
btrfs: fix replace of seed device
md/bitmap: md_bitmap_get_counter returns wrong blocks
bnxt_en: Log unknown link speed appropriately.
rpmsg: glink: Use complete_all for open states
clk: ti: clockdomain: fix static checker warning
net: 9p: initialize sun_server.sun_path to have addr's value only when addr is valid
drivers: watchdog: rdc321x_wdt: Fix race condition bugs
ext4: Detect already used quota file early
gfs2: add validation checks for size of superblock
cifs: handle -EINTR in cifs_setattr
arm64: dts: renesas: ulcb: add full-pwr-cycle-in-suspend into eMMC nodes
ARM: dts: omap4: Fix sgx clock rate for 4430
memory: emif: Remove bogus debugfs error handling
ARM: dts: s5pv210: remove DMA controller bus node name to fix dtschema warnings
ARM: dts: s5pv210: move PMU node out of clock controller
ARM: dts: s5pv210: remove dedicated 'audio-subsystem' node
nbd: make the config put is called before the notifying the waiter
sgl_alloc_order: fix memory leak
nvme-rdma: fix crash when connect rejected
md/raid5: fix oops during stripe resizing
mmc: sdhci-acpi: AMDI0040: Set SDHCI_QUIRK2_PRESET_VALUE_BROKEN
perf/x86/amd/ibs: Don't include randomized bits in get_ibs_op_count()
perf/x86/amd/ibs: Fix raw sample data accumulation
leds: bcm6328, bcm6358: use devres LED registering function
media: uvcvideo: Fix uvc_ctrl_fixup_xu_info() not having any effect
fs: Don't invalidate page buffers in block_write_full_page()
NFS: fix nfs_path in case of a rename retry
ACPI: button: fix handling lid state changes when input device closed
ACPI / extlog: Check for RDMSR failure
ACPI: video: use ACPI backlight for HP 635 Notebook
ACPI: debug: don't allow debugging when ACPI is disabled
acpi-cpufreq: Honor _PSD table setting on new AMD CPUs
w1: mxc_w1: Fix timeout resolution problem leading to bus error
scsi: mptfusion: Fix null pointer dereferences in mptscsih_remove()
scsi: qla2xxx: Fix crash on session cleanup with unload
btrfs: qgroup: fix wrong qgroup metadata reserve for delayed inode
btrfs: improve device scanning messages
btrfs: reschedule if necessary when logging directory items
btrfs: send, recompute reference path after orphanization of a directory
btrfs: use kvzalloc() to allocate clone_roots in btrfs_ioctl_send()
btrfs: cleanup cow block on error
btrfs: fix use-after-free on readahead extent after failure to create it
usb: xhci: Workaround for S3 issue on AMD SNPS 3.0 xHC
usb: dwc3: ep0: Fix ZLP for OUT ep0 requests
usb: dwc3: gadget: Check MPS of the request length
usb: dwc3: core: add phy cleanup for probe error handling
usb: dwc3: core: don't trigger runtime pm when remove driver
usb: cdc-acm: fix cooldown mechanism
usb: typec: tcpm: reset hard_reset_count for any disconnect
usb: host: fsl-mph-dr-of: check return of dma_set_mask()
drm/i915: Force VT'd workarounds when running as a guest OS
vt: keyboard, simplify vt_kdgkbsent
vt: keyboard, extend func_buf_lock to readers
HID: wacom: Avoid entering wacom_wac_pen_report for pad / battery
udf: Fix memory leak when mounting
dmaengine: dma-jz4780: Fix race in jz4780_dma_tx_status
iio:light:si1145: Fix timestamp alignment and prevent data leak.
iio:adc:ti-adc0832 Fix alignment issue with timestamp
iio:adc:ti-adc12138 Fix alignment issue with timestamp
iio:gyro:itg3200: Fix timestamp alignment and prevent data leak.
powerpc/drmem: Make lmb_size 64 bit
s390/stp: add locking to sysfs functions
powerpc/rtas: Restrict RTAS requests from userspace
powerpc: Warn about use of smt_snooze_delay
powerpc/powernv/elog: Fix race while processing OPAL error log event.
powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation
NFSv4.2: support EXCHGID4_FLAG_SUPP_FENCE_OPS 4.2 EXCHANGE_ID flag
NFSD: Add missing NFSv2 .pc_func methods
ubifs: dent: Fix some potential memory leaks while iterating entries
perf python scripting: Fix printable strings in python3 scripts
ubi: check kthread_should_stop() after the setting of task state
ia64: fix build error with !COREDUMP
i2c: imx: Fix external abort on interrupt in exit paths
drm/amdgpu: don't map BO in reserved region
drm/amd/display: Don't invoke kgdb_breakpoint() unconditionally
ceph: promote to unsigned long long before shifting
libceph: clear con->out_msg on Policy::stateful_server faults
9P: Cast to loff_t before multiplying
ring-buffer: Return 0 on success from ring_buffer_resize()
vringh: fix __vringh_iov() when riov and wiov are different
ext4: fix leaking sysfs kobject after failed mount
ext4: fix error handling code in add_new_gdb
ext4: fix invalid inode checksum
drm/ttm: fix eviction valuable range check.
rtc: rx8010: don't modify the global rtc ops
tty: make FONTX ioctl use the tty pointer they were actually passed
arm64: berlin: Select DW_APB_TIMER_OF
cachefiles: Handle readpage error correctly
hil/parisc: Disable HIL driver when it gets stuck
arm: dts: mt7623: add missing pause for switchport
ARM: samsung: fix PM debug build with DEBUG_LL but !MMU
ARM: s3c24xx: fix missing system reset
device property: Keep secondary firmware node secondary by type
device property: Don't clear secondary pointer for shared primary firmware node
KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCR
staging: comedi: cb_pcidas: Allow 2-channel commands for AO subdevice
staging: octeon: repair "fixed-link" support
staging: octeon: Drop on uncorrectable alignment or FCS error
Linux 4.19.155
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I18fefb5bfaa4d05772c61c2975340d0f089b8e3e
|
||
|
|
7ce2b16bad |
fs: Don't invalidate page buffers in block_write_full_page()
commit 6dbf7bb555981fb5faf7b691e8f6169fc2b2e63b upstream.
If block_write_full_page() is called for a page that is beyond current
inode size, it will truncate page buffers for the page and return 0.
This logic has been added in 2.5.62 in commit 81eb69062588 ("fix ext3
BUG due to race with truncate") in history.git tree to fix a problem
with ext3 in data=ordered mode. This particular problem doesn't exist
anymore because ext3 is long gone and ext4 handles ordered data
differently. Also normally buffers are invalidated by truncate code and
there's no need to specially handle this in ->writepage() code.
This invalidation of page buffers in block_write_full_page() is causing
issues to filesystems (e.g. ext4 or ocfs2) when block device is shrunk
under filesystem's hands and metadata buffers get discarded while being
tracked by the journalling layer. Although it is obviously "not
supported" it can cause kernel crashes like:
[ 7986.689400] BUG: unable to handle kernel NULL pointer dereference at
+0000000000000008
[ 7986.697197] PGD 0 P4D 0
[ 7986.699724] Oops: 0002 [#1] SMP PTI
[ 7986.703200] CPU: 4 PID: 203778 Comm: jbd2/dm-3-8 Kdump: loaded Tainted: G
+O --------- - - 4.18.0-147.5.0.5.h126.eulerosv2r9.x86_64 #1
[ 7986.716438] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 1.57 08/11/2015
[ 7986.723462] RIP: 0010:jbd2_journal_grab_journal_head+0x1b/0x40 [jbd2]
...
[ 7986.810150] Call Trace:
[ 7986.812595] __jbd2_journal_insert_checkpoint+0x23/0x70 [jbd2]
[ 7986.818408] jbd2_journal_commit_transaction+0x155f/0x1b60 [jbd2]
[ 7986.836467] kjournald2+0xbd/0x270 [jbd2]
which is not great. The crash happens because bh->b_private is suddently
NULL although BH_JBD flag is still set (this is because
block_invalidatepage() cleared BH_Mapped flag and subsequent bh lookup
found buffer without BH_Mapped set, called init_page_buffers() which has
rewritten bh->b_private). So just remove the invalidation in
block_write_full_page().
Note that the buffer cache invalidation when block device changes size
is already careful to avoid similar problems by using
invalidate_mapping_pages() which skips busy buffers so it was only this
odd block_write_full_page() behavior that could tear down bdev buffers
under filesystem's hands.
Reported-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
CC: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
a13ec5ea86 |
Merge 4.19.143 into android-4.19-stable
Changes in 4.19.143 powerpc/64s: Don't init FSCR_DSCR in __init_FSCR() gre6: Fix reception with IP6_TNL_F_RCV_DSCP_COPY net: Fix potential wrong skb->protocol in skb_vlan_untag() net: qrtr: fix usage of idr in port assignment to socket net/smc: Prevent kernel-infoleak in __smc_diag_dump() tipc: fix uninit skb->data in tipc_nl_compat_dumpit() net: ena: Make missed_tx stat incremental ipvlan: fix device features ALSA: pci: delete repeated words in comments ASoC: img: Fix a reference count leak in img_i2s_in_set_fmt ASoC: img-parallel-out: Fix a reference count leak ASoC: tegra: Fix reference count leaks. mfd: intel-lpss: Add Intel Emmitsburg PCH PCI IDs arm64: dts: qcom: msm8916: Pull down PDM GPIOs during sleep powerpc/xive: Ignore kmemleak false positives media: pci: ttpci: av7110: fix possible buffer overflow caused by bad DMA value in debiirq() blktrace: ensure our debugfs dir exists scsi: target: tcmu: Fix crash on ARM during cmd completion iommu/iova: Don't BUG on invalid PFNs drm/amdkfd: Fix reference count leaks. drm/radeon: fix multiple reference count leak drm/amdgpu: fix ref count leak in amdgpu_driver_open_kms drm/amd/display: fix ref count leak in amdgpu_drm_ioctl drm/amdgpu: fix ref count leak in amdgpu_display_crtc_set_config drm/amdgpu/display: fix ref count leak when pm_runtime_get_sync fails scsi: lpfc: Fix shost refcount mismatch when deleting vport xfs: Don't allow logging of XFS_ISTALE inodes selftests/powerpc: Purge extra count_pmc() calls of ebb selftests f2fs: fix error path in do_recover_data() omapfb: fix multiple reference count leaks due to pm_runtime_get_sync PCI: Fix pci_create_slot() reference count leak ARM: dts: ls1021a: output PPS signal on FIPER2 rtlwifi: rtl8192cu: Prevent leaking urb mips/vdso: Fix resource leaks in genvdso.c cec-api: prevent leaking memory through hole in structure HID: quirks: add NOGET quirk for Logitech GROUP f2fs: fix use-after-free issue drm/nouveau/drm/noveau: fix reference count leak in nouveau_fbcon_open drm/nouveau: fix reference count leak in nv50_disp_atomic_commit drm/nouveau: Fix reference count leak in nouveau_connector_detect locking/lockdep: Fix overflow in presentation of average lock-time btrfs: file: reserve qgroup space after the hole punch range is locked scsi: iscsi: Do not put host in iscsi_set_flashnode_param() ceph: fix potential mdsc use-after-free crash scsi: fcoe: Memory leak fix in fcoe_sysfs_fcf_del() EDAC/ie31200: Fallback if host bridge device is already initialized KVM: arm64: Fix symbol dependency in __hyp_call_panic_nvhe powerpc/spufs: add CONFIG_COREDUMP dependency USB: sisusbvga: Fix a potential UB casued by left shifting a negative value efi: provide empty efi_enter_virtual_mode implementation Revert "ath10k: fix DMA related firmware crashes on multiple devices" media: gpio-ir-tx: improve precision of transmitted signal due to scheduling drm/msm/adreno: fix updating ring fence nvme-fc: Fix wrong return value in __nvme_fc_init_request() null_blk: fix passing of REQ_FUA flag in null_handle_rq i2c: rcar: in slave mode, clear NACK earlier usb: gadget: f_tcm: Fix some resource leaks in some error paths jbd2: make sure jh have b_transaction set in refile/unfile_buffer ext4: don't BUG on inconsistent journal feature ext4: handle read only external journal device jbd2: abort journal if free a async write error metadata buffer ext4: handle option set by mount flags correctly ext4: handle error of ext4_setup_system_zone() on remount ext4: correctly restore system zone info when remount fails fs: prevent BUG_ON in submit_bh_wbc() spi: stm32: fix stm32_spi_prepare_mbr in case of odd clk_rate s390/cio: add cond_resched() in the slow_eval_known_fn() loop ASoC: wm8994: Avoid attempts to read unreadable registers scsi: fcoe: Fix I/O path allocation scsi: ufs: Fix possible infinite loop in ufshcd_hold scsi: ufs: Improve interrupt handling for shared interrupts scsi: ufs: Clean up completed request without interrupt notification scsi: qla2xxx: Check if FW supports MQ before enabling scsi: qla2xxx: Fix null pointer access during disconnect from subsystem Revert "scsi: qla2xxx: Fix crash on qla2x00_mailbox_command" macvlan: validate setting of multiple remote source MAC addresses net: gianfar: Add of_node_put() before goto statement powerpc/perf: Fix soft lockups due to missed interrupt accounting block: loop: set discard granularity and alignment for block device backed loop HID: i2c-hid: Always sleep 60ms after I2C_HID_PWR_ON commands blk-mq: order adding requests to hctx->dispatch and checking SCHED_RESTART btrfs: reset compression level for lzo on remount btrfs: fix space cache memory leak after transaction abort fbcon: prevent user font height or width change from causing potential out-of-bounds access USB: lvtest: return proper error code in probe vt: defer kfree() of vc_screenbuf in vc_do_resize() vt_ioctl: change VT_RESIZEX ioctl to check for error return from vc_resize() serial: samsung: Removes the IRQ not found warning serial: pl011: Fix oops on -EPROBE_DEFER serial: pl011: Don't leak amba_ports entry on driver register error serial: 8250_exar: Fix number of ports for Commtech PCIe cards serial: 8250: change lock order in serial8250_do_startup() writeback: Protect inode->i_io_list with inode->i_lock writeback: Avoid skipping inode writeback writeback: Fix sync livelock due to b_dirty_time processing XEN uses irqdesc::irq_data_common::handler_data to store a per interrupt XEN data pointer which contains XEN specific information. usb: host: xhci: fix ep context print mismatch in debugfs xhci: Do warm-reset when both CAS and XDEV_RESUME are set xhci: Always restore EP_SOFT_CLEAR_TOGGLE even if ep reset failed PM: sleep: core: Fix the handling of pending runtime resume requests device property: Fix the secondary firmware node handling in set_primary_fwnode() genirq/matrix: Deal with the sillyness of for_each_cpu() on UP irqchip/stm32-exti: Avoid losing interrupts due to clearing pending bits by mistake drm/amdgpu: Fix buffer overflow in INFO ioctl drm/amd/pm: correct Vega10 swctf limit setting drm/amd/pm: correct Vega12 swctf limit setting USB: yurex: Fix bad gfp argument usb: uas: Add quirk for PNY Pro Elite USB: quirks: Add no-lpm quirk for another Raydium touchscreen USB: quirks: Ignore duplicate endpoint on Sound Devices MixPre-D USB: Ignore UAS for JMicron JMS567 ATA/ATAPI Bridge usb: host: ohci-exynos: Fix error handling in exynos_ohci_probe() USB: gadget: u_f: add overflow checks to VLA macros USB: gadget: f_ncm: add bounds checks to ncm_unwrap_ntb() USB: gadget: u_f: Unbreak offset calculation in VLAs USB: cdc-acm: rework notification_buffer resizing usb: storage: Add unusual_uas entry for Sony PSZ drives btrfs: check the right error variable in btrfs_del_dir_entries_in_log usb: dwc3: gadget: Don't setup more than requested usb: dwc3: gadget: Fix handling ZLP usb: dwc3: gadget: Handle ZLP for sg requests tpm: Unify the mismatching TPM space buffer sizes HID: hiddev: Fix slab-out-of-bounds write in hiddev_ioctl_usage() ALSA: usb-audio: Update documentation comment for MS2109 quirk Linux 4.19.143 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I8b6e29eda77bd69df30132842cf28019c8e7c1a3 |
||
|
|
4aaac9c537 |
fs: prevent BUG_ON in submit_bh_wbc()
[ Upstream commit 377254b2cd2252c7c3151b113cbdf93a7736c2e9 ] If a device is hot-removed --- for example, when a physical device is unplugged from pcie slot or a nbd device's network is shutdown --- this can result in a BUG_ON() crash in submit_bh_wbc(). This is because the when the block device dies, the buffer heads will have their Buffer_Mapped flag get cleared, leading to the crash in submit_bh_wbc. We had attempted to work around this problem in commit |
||
|
|
a13256124f |
Merge 4.19.118 into android-4.19
Changes in 4.19.118
arm, bpf: Fix offset overflow for BPF_MEM BPF_DW
objtool: Fix switch table detection in .text.unlikely
scsi: sg: add sg_remove_request in sg_common_write
ext4: use non-movable memory for superblock readahead
watchdog: sp805: fix restart handler
arm, bpf: Fix bugs with ALU64 {RSH, ARSH} BPF_K shift by 0
ARM: dts: imx6: Use gpc for FEC interrupt controller to fix wake on LAN.
netfilter: nf_tables: report EOPNOTSUPP on unsupported flags/object type
irqchip/mbigen: Free msi_desc on device teardown
ALSA: hda: Don't release card at firmware loading error
of: unittest: kmemleak on changeset destroy
of: unittest: kmemleak in of_unittest_platform_populate()
of: unittest: kmemleak in of_unittest_overlay_high_level()
of: overlay: kmemleak in dup_and_fixup_symbol_prop()
x86/Hyper-V: Report crash register data or kmsg before running crash kernel
lib/raid6: use vdupq_n_u8 to avoid endianness warnings
video: fbdev: sis: Remove unnecessary parentheses and commented code
rbd: avoid a deadlock on header_rwsem when flushing notifies
rbd: call rbd_dev_unprobe() after unwatching and flushing notifies
xsk: Add missing check on user supplied headroom size
x86/Hyper-V: Unload vmbus channel in hv panic callback
x86/Hyper-V: Free hv_panic_page when fail to register kmsg dump
x86/Hyper-V: Trigger crash enlightenment only once during system crash.
x86/Hyper-V: Report crash register data when sysctl_record_panic_msg is not set
x86/Hyper-V: Report crash data in die() when panic_on_oops is set
clk: at91: usb: continue if clk_hw_round_rate() return zero
power: supply: bq27xxx_battery: Silence deferred-probe error
clk: tegra: Fix Tegra PMC clock out parents
soc: imx: gpc: fix power up sequencing
rtc: 88pm860x: fix possible race condition
NFSv4/pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()
NFS: direct.c: Fix memory leak of dreq when nfs_get_lock_context fails
s390/cpuinfo: fix wrong output when CPU0 is offline
powerpc/maple: Fix declaration made after definition
s390/cpum_sf: Fix wrong page count in error message
ext4: do not commit super on read-only bdev
um: ubd: Prevent buffer overrun on command completion
cifs: Allocate encryption header through kmalloc
include/linux/swapops.h: correct guards for non_swap_entry()
percpu_counter: fix a data race at vm_committed_as
compiler.h: fix error in BUILD_BUG_ON() reporting
KVM: s390: vsie: Fix possible race when shadowing region 3 tables
x86: ACPI: fix CPU hotplug deadlock
drm/amdkfd: kfree the wrong pointer
NFS: Fix memory leaks in nfs_pageio_stop_mirroring()
f2fs: fix NULL pointer dereference in f2fs_write_begin()
drm/vc4: Fix HDMI mode validation
iommu/vt-d: Fix mm reference leak
ext2: fix empty body warnings when -Wextra is used
ext2: fix debug reference to ext2_xattr_cache
power: supply: axp288_fuel_gauge: Broaden vendor check for Intel Compute Sticks.
libnvdimm: Out of bounds read in __nd_ioctl()
iommu/amd: Fix the configuration of GCR3 table root pointer
f2fs: fix to wait all node page writeback
net: dsa: bcm_sf2: Fix overflow checks
fbdev: potential information leak in do_fb_ioctl()
iio: si1133: read 24-bit signed integer for measurement
tty: evh_bytechan: Fix out of bounds accesses
locktorture: Print ratio of acquisitions, not failures
mtd: spinand: Explicitly use MTD_OPS_RAW to write the bad block marker to OOB
mtd: lpddr: Fix a double free in probe()
mtd: phram: fix a double free issue in error path
KEYS: Don't write out to userspace while holding key semaphore
bpf: fix buggy r0 retval refinement for tracing helpers
Linux 4.19.118
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ife34f739f719c332c7b1d22b1832179be6a16800
|
||
|
|
a6375c9877 |
ext4: use non-movable memory for superblock readahead
commit d87f639258a6a5980183f11876c884931ad93da2 upstream. Since commit |
||
|
|
b01c73ea71 |
BACKPORT: FROMLIST: Update Inline Encryption from v5 to v6 of patch series
Changes v5 => v6: - Blk-crypto's kernel crypto API fallback is no longer restricted to 8-byte DUNs. It's also now separately configurable from blk-crypto, and can be disabled entirely, while still allowing the kernel to use inline encryption hardware. Further, struct bio_crypt_ctx takes up less space, and no longer contains the information needed by the crypto API fallback - the fallback allocates the required memory when necessary. - Blk-crypto now supports all file content encryption modes supported by fscrypt. - Fixed bio merging logic in blk-merge.c - Fscrypt now supports inline encryption with the direct key policy, since blk-crypto now has support for larger DUNs. - Keyslot manager now uses a hashtable to lookup which keyslot contains any particular key (thanks Eric!) - Fscrypt support for inline encryption now handles filesystems with multiple underlying block devices (thanks Eric!) - Numerous cleanups Bug: 137270441 Test: refer to I26376479ee38259b8c35732cb3a1d7e15f9b05a3 Change-Id: I13e2e327e0b4784b394cb1e7cf32a04856d95f01 Link: https://lore.kernel.org/linux-block/20191218145136.172774-1-satyat@google.com/ Signed-off-by: Satya Tangirala <satyat@google.com> |
||
|
|
f2ca2620dd |
BACKPORT: FROMLIST: ext4: add inline encryption support
Wire up ext4 to support inline encryption via the helper functions which fs/crypto/ now provides. This includes: - Adding a mount option 'inlinecrypt' which enables inline encryption on encrypted files where it can be used. - Setting the bio_crypt_ctx on bios that will be submitted to an inline-encrypted file. Note: submit_bh_wbc() in fs/buffer.c also needed to be patched for this part, since ext4 sometimes uses ll_rw_block() on file data. - Not adding logically discontiguous data to bios that will be submitted to an inline-encrypted file. - Not doing filesystem-layer crypto on inline-encrypted files. Bug: 137270441 Test: tested as series; see I26aac0ac7845a9064f28bb1421eb2522828a6dec Change-Id: I54a8efe388289918f4144d8138fb87aa507ae760 Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Satya Tangirala <satyat@google.com> Link: https://patchwork.kernel.org/patch/11214781/ |
||
|
|
83c395332f |
fs: fix guard_bio_eod to check for real EOD errors
[ Upstream commit dce30ca9e3b676fb288c33c1f4725a0621361185 ]
guard_bio_eod() can truncate a segment in bio to allow it to do IO on
odd last sectors of a device.
It already checks if the IO starts past EOD, but it does not consider
the possibility of an IO request starting within device boundaries can
contain more than one segment past EOD.
In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
underflow bvec->bv_len.
Fix this by checking if truncated_bytes is lower than PAGE_SIZE.
This situation has been found on filesystems such as isofs and vfat,
which doesn't check the device size before mount, if the device is
smaller than the filesystem itself, a readahead on such filesystem,
which spans EOD, can trigger this situation, leading a call to
zero_user() with a wrong size possibly corrupting memory.
I didn't see any crash, or didn't let the system run long enough to
check if memory corruption will be hit somewhere, but adding
instrumentation to guard_bio_end() to check truncated_bytes size, was
enough to see the error.
The following script can trigger the error.
MNT=/mnt
IMG=./DISK.img
DEV=/dev/loop0
mkfs.vfat $IMG
mount $IMG $MNT
cp -R /etc $MNT &> /dev/null
umount $MNT
losetup -D
losetup --find --show --sizelimit 16247280 $IMG
mount $DEV $MNT
find $MNT -type f -exec cat {} + >/dev/null
Kudos to Eric Sandeen for coming up with the reproducer above
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
72426ed2a1 |
fs: ratelimit __find_get_block_slow() failure message.
[ Upstream commit 43636c804df0126da669c261fc820fb22f62bfc2 ] When something let __find_get_block_slow() hit all_mapped path, it calls printk() for 100+ times per a second. But there is no need to print same message with such high frequency; it is just asking for stall warning, or at least bloating log files. [ 399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8 [ 399.873324][T15342] b_state=0x00000029, b_size=512 [ 399.878403][T15342] device loop0 blocksize: 4096 [ 399.883296][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8 [ 399.890400][T15342] b_state=0x00000029, b_size=512 [ 399.895595][T15342] device loop0 blocksize: 4096 [ 399.900556][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8 [ 399.907471][T15342] b_state=0x00000029, b_size=512 [ 399.912506][T15342] device loop0 blocksize: 4096 This patch reduces frequency to up to once per a second, in addition to concatenating three lines into one. [ 399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8, b_state=0x00000029, b_size=512, device loop0 blocksize: 4096 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
13ba17bee1 |
notifier: Remove notifier header file wherever not used
The conversion of the hotplug notifiers to a state machine left the notifier.h includes around in some places. Remove them. Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1535114033-4605-1-git-send-email-mojha@codeaurora.org |
||
|
|
f745c6f5fe |
fs, mm: account buffer_head to kmemcg
The buffer_head can consume a significant amount of system memory and is directly related to the amount of page cache. In our production environment we have observed that a lot of machines are spending a significant amount of memory as buffer_head and can not be left as system memory overhead. Charging buffer_head is not as simple as adding __GFP_ACCOUNT to the allocation. The buffer_heads can be allocated in a memcg different from the memcg of the page for which buffer_heads are being allocated. One concrete example is memory reclaim. The reclaim can trigger I/O of pages of any memcg on the system. So, the right way to charge buffer_head is to extract the memcg from the page for which buffer_heads are being allocated and then use targeted memcg charging API. [shakeelb@google.com: use __GFP_ACCOUNT for directed memcg charging] Link: http://lkml.kernel.org/r/20180702220208.213380-1-shakeelb@google.com Link: http://lkml.kernel.org/r/20180627191250.209150-3-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Greg Thelen <gthelen@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
3d7b6b21f6 |
iomap: mark newly allocated buffer heads as new
In iomap_to_bh, not only mark buffer heads in IOMAP_UNWRITTEN maps as new, but also buffer heads in IOMAP_MAPPED maps with the IOMAP_F_NEW flag set. This will be used by filesystems like gfs2, which allocate blocks in iomap->begin. Minor corrections to the comment for IOMAP_UNWRITTEN maps. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
||
|
|
a6d639da63 |
fs: factor out a __generic_write_end helper
Bits of the buffer.c based write_end implementations that don't know about buffer_heads and can be reused by other implementations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
||
|
|
8a78cb1f1b |
fs: move page_cache_seek_hole_data to iomap.c
This function is only used by the iomap code, depends on being called from it, and will soon stop poking into buffer head internals. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
||
|
|
7214dd4ea9 |
Merge branch 'work.thaw' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs thaw updates from Al Viro: "An ancient series that has fallen through the cracks in the previous cycle" * 'work.thaw' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: buffer.c: call thaw_super during emergency thaw vfs: factor sb iteration out of do_emergency_remount |
||
|
|
b93b016313 |
page cache: use xa_lock
Remove the address_space ->tree_lock and use the xa_lock newly added to the radix_tree_root. Rename the address_space ->page_tree to ->i_pages, since we don't really care that it's a tree. [willy@infradead.org: fix nds32, fs/dax.c] Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.orgLink: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Jeff Layton <jlayton@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
f82b376413 |
export __set_page_dirty
XFS currently contains a copy-and-paste of __set_page_dirty(). Export it from buffer.c instead. Link: http://lkml.kernel.org/r/20180313132639.17387-6-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Dave Chinner <david@fromorbit.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
3172485f4f |
block_invalidatepage(): only release page if the full page was invalidated
Prior to commit |
||
|
|
08fdc8a013 |
buffer.c: call thaw_super during emergency thaw
There are 2 distinct freezing mechanisms - one operates on block devices and another one directly on super blocks. Both end up with the same result, but thaw of only one of these does not thaw the other. In particular fsfreeze --freeze uses the ioctl variant going to the super block. Since prior to this patch emergency thaw was not doing a relevant thaw, filesystems frozen with this method remained unaffected. The patch is a hack which adds blind unfreezing. In order to keep the super block write-locked the whole time the code is shuffled around and the newly introduced __iterate_supers is employed. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
|
|
19e7b5f994 |
Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro: "All kinds of misc stuff, without any unifying topic, from various people. Neil's d_anon patch, several bugfixes, introduction of kvmalloc analogue of kmemdup_user(), extending bitfield.h to deal with fixed-endians, assorted cleanups all over the place..." * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits) alpha: osf_sys.c: use timespec64 where appropriate alpha: osf_sys.c: fix put_tv32 regression jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path dcache: delete unused d_hash_mask dcache: subtract d_hash_shift from 32 in advance fs/buffer.c: fold init_buffer() into init_page_buffers() fs: fold __inode_permission() into inode_permission() fs: add RWF_APPEND sctp: use vmemdup_user() rather than badly open-coding memdup_user() snd_ctl_elem_init_enum_names(): switch to vmemdup_user() replace_user_tlv(): switch to vmemdup_user() new primitive: vmemdup_user() memdup_user(): switch to GFP_USER eventfd: fold eventfd_ctx_get() into eventfd_ctx_fileget() eventfd: fold eventfd_ctx_read() into eventfd_read() eventfd: convert to use anon_inode_getfd() nfs4file: get rid of pointless include of btrfs.h uvc_v4l2: clean copyin/copyout up vme_user: don't use __copy_..._user() usx2y: don't bother with memdup_user() for 16-byte structure ... |
||
|
|
01950a349e |
fs/buffer.c: fold init_buffer() into init_page_buffers()
Since commit
|
||
|
|
c45a8f2def |
fs: convert to bio_last_bvec_all()
This patch converts 3 users to bio_last_bvec_all(), so that we can go ahead and convert to multipage bvec. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
8667982014 |
mm, pagevec: remove cold parameter for pagevecs
Every pagevec_init user claims the pages being released are hot even in cases where it is unlikely the pages are hot. As no one cares about the hotness of pages being released to the allocator, just ditch the parameter. No performance impact is expected as the overhead is marginal. The parameter is removed simply because it is a bit stupid to have a useless parameter copied everywhere. Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
e2c5923c34 |
Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block
Pull core block layer updates from Jens Axboe:
"This is the main pull request for block storage for 4.15-rc1.
Nothing out of the ordinary in here, and no API changes or anything
like that. Just various new features for drivers, core changes, etc.
In particular, this pull request contains:
- A patch series from Bart, closing the whole on blk/scsi-mq queue
quescing.
- A series from Christoph, building towards hidden gendisks (for
multipath) and ability to move bio chains around.
- NVMe
- Support for native multipath for NVMe (Christoph).
- Userspace notifications for AENs (Keith).
- Command side-effects support (Keith).
- SGL support (Chaitanya Kulkarni)
- FC fixes and improvements (James Smart)
- Lots of fixes and tweaks (Various)
- bcache
- New maintainer (Michael Lyle)
- Writeback control improvements (Michael)
- Various fixes (Coly, Elena, Eric, Liang, et al)
- lightnvm updates, mostly centered around the pblk interface
(Javier, Hans, and Rakesh).
- Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)
- Writeback series that fix the much discussed hundreds of millions
of sync-all units. This goes all the way, as discussed previously
(me).
- Fix for missing wakeup on writeback timer adjustments (Yafang
Shao).
- Fix laptop mode on blk-mq (me).
- {mq,name} tupple lookup for IO schedulers, allowing us to have
alias names. This means you can use 'deadline' on both !mq and on
mq (where it's called mq-deadline). (me).
- blktrace race fix, oopsing on sg load (me).
- blk-mq optimizations (me).
- Obscure waitqueue race fix for kyber (Omar).
- NBD fixes (Josef).
- Disable writeback throttling by default on bfq, like we do on cfq
(Luca Miccio).
- Series from Ming that enable us to treat flush requests on blk-mq
like any other request. This is a really nice cleanup.
- Series from Ming that improves merging on blk-mq with schedulers,
getting us closer to flipping the switch on scsi-mq again.
- BFQ updates (Paolo).
- blk-mq atomic flags memory ordering fixes (Peter Z).
- Loop cgroup support (Shaohua).
- Lots of minor fixes from lots of different folks, both for core and
driver code"
* 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
nvme: fix visibility of "uuid" ns attribute
blk-mq: fixup some comment typos and lengths
ide: ide-atapi: fix compile error with defining macro DEBUG
blk-mq: improve tag waiting setup for non-shared tags
brd: remove unused brd_mutex
blk-mq: only run the hardware queue if IO is pending
block: avoid null pointer dereference on null disk
fs: guard_bio_eod() needs to consider partitions
xtensa/simdisk: fix compile error
nvme: expose subsys attribute to sysfs
nvme: create 'slaves' and 'holders' entries for hidden controllers
block: create 'slaves' and 'holders' entries for hidden gendisks
nvme: also expose the namespace identification sysfs files for mpath nodes
nvme: implement multipath access to nvme subsystems
nvme: track shared namespaces
nvme: introduce a nvme_ns_ids structure
nvme: track subsystems
block, nvme: Introduce blk_mq_req_flags_t
block, scsi: Make SCSI quiesce and resume work reliably
block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
...
|
||
|
|
ae9a8c4bdc |
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o: - Add support for online resizing of file systems with bigalloc - Fix a two data corruption bugs involving DAX, as well as a corruption bug after a crash during a racing fallocate and delayed allocation. - Finally, a number of cleanups and optimizations. * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: improve smp scalability for inode generation ext4: add support for online resizing with bigalloc ext4: mention noload when recovering on read-only device Documentation: fix little inconsistencies ext4: convert timers to use timer_setup() jbd2: convert timers to use timer_setup() ext4: remove duplicate extended attributes defs ext4: add ext4_should_use_dax() ext4: add sanity check for encryption + DAX ext4: prevent data corruption with journaling + DAX ext4: prevent data corruption with inline data + DAX ext4: fix interaction between i_size, fallocate, and delalloc after a crash ext4: retry allocations conservatively ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA ext4: Add iomap support for inline data iomap: Add IOMAP_F_DATA_INLINE flag iomap: Switch from blkno to disk offset |
||
|
|
67f2519fe2 |
fs: guard_bio_eod() needs to consider partitions
guard_bio_eod() needs to look at the partition capacity, not just the
capacity of the whole device, when determining if truncation is
necessary.
[ 60.268688] attempt to access beyond end of device
[ 60.268690] unknown-block(9,1): rw=0, want=67103509, limit=67103506
[ 60.268693] buffer_io_error: 2 callbacks suppressed
[ 60.268696] Buffer I/O error on dev md1p7, logical block 4524305, async page read
Fixes:
|
||
|
|
6aa7de0591 |
locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.
However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:
----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()
// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
virtual patch
@ depends on patch @
expression E1, E2;
@@
- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)
@ depends on patch @
expression E;
@@
- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
||
|
|
bc48f001de |
buffer: eliminate the need to call free_more_memory() in __getblk_slow()
Since the previous commit removed any case where grow_buffers() would return failure due to memory allocations, we can safely remove the case where we have to call free_more_memory() in this function. Since this is also the last user of free_more_memory(), kill it off completely. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
94dc24c0c5 |
buffer: grow_dev_page() should use __GFP_NOFAIL for all cases
We currently use it for find_or_create_page(), which means that it cannot fail. Ensure we also pass in 'retry == true' to alloc_page_buffers(), which also ensure that it cannot fail. After this, there are no failure cases in grow_dev_page() that occur because of a failed memory allocation. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
640ab98fb3 |
buffer: have alloc_page_buffers() use __GFP_NOFAIL
Instead of adding weird retry logic in that function, utilize __GFP_NOFAIL to ensure that the vm takes care of handling any potential retries appropriately. This means we don't have to call free_more_memory() from here. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
19fe5f643f |
iomap: Switch from blkno to disk offset
Replace iomap->blkno, the sector number, with iomap->addr, the disk offset in bytes. For invalid disk offsets, use the special value IOMAP_NULL_ADDR instead of IOMAP_NULL_BLOCK. This allows to use iomap for mappings which are not block aligned, such as inline data on ext4. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> # iomap, xfs Reviewed-by: Jan Kara <jack@suse.cz> |
||
|
|
a0725ab0c7 |
Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
"This is the first pull request for 4.14, containing most of the code
changes. It's a quiet series this round, which I think we needed after
the churn of the last few series. This contains:
- Fix for a registration race in loop, from Anton Volkov.
- Overflow complaint fix from Arnd for DAC960.
- Series of drbd changes from the usual suspects.
- Conversion of the stec/skd driver to blk-mq. From Bart.
- A few BFQ improvements/fixes from Paolo.
- CFQ improvement from Ritesh, allowing idling for group idle.
- A few fixes found by Dan's smatch, courtesy of Dan.
- A warning fixup for a race between changing the IO scheduler and
device remova. From David Jeffery.
- A few nbd fixes from Josef.
- Support for cgroup info in blktrace, from Shaohua.
- Also from Shaohua, new features in the null_blk driver to allow it
to actually hold data, among other things.
- Various corner cases and error handling fixes from Weiping Zhang.
- Improvements to the IO stats tracking for blk-mq from me. Can
drastically improve performance for fast devices and/or big
machines.
- Series from Christoph removing bi_bdev as being needed for IO
submission, in preparation for nvme multipathing code.
- Series from Bart, including various cleanups and fixes for switch
fall through case complaints"
* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
kernfs: checking for IS_ERR() instead of NULL
drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
drbd: Fix allyesconfig build, fix recent commit
drbd: switch from kmalloc() to kmalloc_array()
drbd: abort drbd_start_resync if there is no connection
drbd: move global variables to drbd namespace and make some static
drbd: rename "usermode_helper" to "drbd_usermode_helper"
drbd: fix race between handshake and admin disconnect/down
drbd: fix potential deadlock when trying to detach during handshake
drbd: A single dot should be put into a sequence.
drbd: fix rmmod cleanup, remove _all_ debugfs entries
drbd: Use setup_timer() instead of init_timer() to simplify the code.
drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
drbd: new disk-option disable-write-same
drbd: Fix resource role for newly created resources in events2
drbd: mark symbols static where possible
drbd: Send P_NEG_ACK upon write error in protocol != C
drbd: add explicit plugging when submitting batches
drbd: change list_for_each_safe to while(list_first_entry_or_null)
drbd: introduce drbd_recv_header_maybe_unplug
...
|
||
|
|
397162ffa2 |
mm: remove nr_pages argument from pagevec_lookup{,_range}()
All users of pagevec_lookup() and pagevec_lookup_range() now pass PAGEVEC_SIZE as a desired number of pages. Just drop the argument. Link: http://lkml.kernel.org/r/20170726114704.7626-11-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
8338141f0f |
fs: use pagevec_lookup_range() in page_cache_seek_hole_data()
We want only pages from given range in page_cache_seek_hole_data(). Use pagevec_lookup_range() instead of pagevec_lookup() and remove unnecessary code. Note that the check for getting less pages than desired can be removed because index gets updated by pagevec_lookup_range(). Link: http://lkml.kernel.org/r/20170726114704.7626-9-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c10f778ddf |
fs: fix performance regression in clean_bdev_aliases()
Commit |
||
|
|
d72dc8a25a |
mm: make pagevec_lookup() update index
Make pagevec_lookup() (and underlying find_get_pages()) update index to the next page where iteration should continue. Most callers want this and also pagevec_lookup_tag() already does this. Link: http://lkml.kernel.org/r/20170726114704.7626-3-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
74d46992e0 |
block: replace bi_bdev with a gendisk pointer and partitions index
This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
241f01fbed |
fs/buffer.c: make bh_lru_install() more efficient
To install a buffer_head into the cpu's LRU queue, bh_lru_install() would construct a new copy of the queue and then memcpy it over the real queue. But it's easily possible to do the update in-place, which is faster and simpler. Some work can also be skipped if the buffer_head was already in the queue. As a microbenchmark I timed how long it takes to run sb_getblk() 10,000,000 times alternating between BH_LRU_SIZE + 1 blocks. Effectively, this benchmarks looking up buffer_heads that are in the page cache but not in the LRU: Before this patch: 1.758s After this patch: 1.653s This patch also removes about 350 bytes of compiled code (on x86_64), partly due to removal of the memcpy() which was being inlined+unrolled. Link: http://lkml.kernel.org/r/20161229193445.1913-1-ebiggers3@gmail.com Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
642338ba33 |
Merge tag 'xfs-4.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull XFS updates from Darrick Wong:
"Here are some changes for you for 4.13. For the most part it's fixes
for bugs and deadlock problems, and preparation for online fsck in
some future merge window.
- Avoid quotacheck deadlocks
- Fix transaction overflows when bunmapping fragmented files
- Refactor directory readahead
- Allow admin to configure if ASSERT is fatal
- Improve transaction usage detail logging during overflows
- Minor cleanups
- Don't leak log items when the log shuts down
- Remove double-underscore typedefs
- Various preparation for online scrubbing
- Introduce new error injection configuration sysfs knobs
- Refactor dq_get_next to use extent map directly
- Fix problems with iterating the page cache for unwritten data
- Implement SEEK_{HOLE,DATA} via iomap
- Refactor XFS to use iomap SEEK_HOLE and SEEK_DATA
- Don't use MAXPATHLEN to check on-disk symlink target lengths"
* tag 'xfs-4.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (48 commits)
xfs: don't crash on unexpected holes in dir/attr btrees
xfs: rename MAXPATHLEN to XFS_SYMLINK_MAXLEN
xfs: fix contiguous dquot chunk iteration livelock
xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA
vfs: Add iomap_seek_hole and iomap_seek_data helpers
vfs: Add page_cache_seek_hole_data helper
xfs: remove a whitespace-only line from xfs_fs_get_nextdqblk
xfs: rewrite xfs_dq_get_next_id using xfs_iext_lookup_extent
xfs: Check for m_errortag initialization in xfs_errortag_test
xfs: grab dquots without taking the ilock
xfs: fix semicolon.cocci warnings
xfs: Don't clear SGID when inheriting ACLs
xfs: free cowblocks and retry on buffered write ENOSPC
xfs: replace log_badcrc_factor knob with error injection tag
xfs: convert drop_writes to use the errortag mechanism
xfs: remove unneeded parameter from XFS_TEST_ERROR
xfs: expose errortag knobs via sysfs
xfs: make errortag a per-mountpoint structure
xfs: free uncommitted transactions during log recovery
xfs: don't allow bmap on rt files
...
|
||
|
|
bc2c6421cb |
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o: "The first major feature for ext4 this merge window is the largedir feature, which allows ext4 directories to support over 2 billion directory entries (assuming ~64 byte file names; in practice, users will run into practical performance limits first.) This feature was originally written by the Lustre team, and credit goes to Artem Blagodarenko from Seagate for getting this feature upstream. The second major major feature allows ext4 to support extended attribute values up to 64k. This feature was also originally from Lustre, and has been enhanced by Tahsin Erdogan from Google with a deduplication feature so that if multiple files have the same xattr value (for example, Windows ACL's stored by Samba), only one copy will be stored on disk for encoding and caching efficiency. We also have the usual set of bug fixes, cleanups, and optimizations" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (47 commits) ext4: fix spelling mistake: "prellocated" -> "preallocated" ext4: fix __ext4_new_inode() journal credits calculation ext4: skip ext4_init_security() and encryption on ea_inodes fs: generic_block_bmap(): initialize all of the fields in the temp bh ext4: change fast symlink test to not rely on i_blocks ext4: require key for truncate(2) of encrypted file ext4: don't bother checking for encryption key in ->mmap() ext4: check return value of kstrtoull correctly in reserved_clusters_store ext4: fix off-by-one fsmap error on 1k block filesystems ext4: return EFSBADCRC if a bad checksum error is found in ext4_find_entry() ext4: return EIO on read error in ext4_find_entry ext4: forbid encrypting root directory ext4: send parallel discards on commit completions ext4: avoid unnecessary stalls in ext4_evict_inode() ext4: add nombcache mount option ext4: strong binding of xattr inode references ext4: eliminate xattr entry e_hash recalculation for removes ext4: reserve space for xattr entries/names quota: add get_inode_usage callback to transfer multi-inode charges ext4: xattr inode deduplication ... |
||
|
|
088737f44b |
Merge tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull Writeback error handling updates from Jeff Layton:
"This pile represents the bulk of the writeback error handling fixes
that I have for this cycle. Some of the earlier patches in this pile
may look trivial but they are prerequisites for later patches in the
series.
The aim of this set is to improve how we track and report writeback
errors to userland. Most applications that care about data integrity
will periodically call fsync/fdatasync/msync to ensure that their
writes have made it to the backing store.
For a very long time, we have tracked writeback errors using two flags
in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a
writeback error occurs (via mapping_set_error) and are cleared as a
side-effect of filemap_check_errors (as you noted yesterday). This
model really sucks for userland.
Only the first task to call fsync (or msync or fdatasync) will see the
error. Any subsequent task calling fsync on a file will get back 0
(unless another writeback error occurs in the interim). If I have
several tasks writing to a file and calling fsync to ensure that their
writes got stored, then I need to have them coordinate with one
another. That's difficult enough, but in a world of containerized
setups that coordination may even not be possible.
But wait...it gets worse!
The calls to filemap_check_errors can be buried pretty far down in the
call stack, and there are internal callers of filemap_write_and_wait
and the like that also end up clearing those errors. Many of those
callers ignore the error return from that function or return it to
userland at nonsensical times (e.g. truncate() or stat()). If I get
back -EIO on a truncate, there is no reason to think that it was
because some previous writeback failed, and a subsequent fsync() will
(incorrectly) return 0.
This pile aims to do three things:
1) ensure that when a writeback error occurs that that error will be
reported to userland on a subsequent fsync/fdatasync/msync call,
regardless of what internal callers are doing
2) report writeback errors on all file descriptions that were open at
the time that the error occurred. This is a user-visible change,
but I think most applications are written to assume this behavior
anyway. Those that aren't are unlikely to be hurt by it.
3) document what filesystems should do when there is a writeback
error. Today, there is very little consistency between them, and a
lot of cargo-cult copying. We need to make it very clear what
filesystems should do in this situation.
To achieve this, the set adds a new data type (errseq_t) and then
builds new writeback error tracking infrastructure around that. Once
all of that is in place, we change the filesystems to use the new
infrastructure for reporting wb errors to userland.
Note that this is just the initial foray into cleaning up this mess.
There is a lot of work remaining here:
1) convert the rest of the filesystems in a similar fashion. Once the
initial set is in, then I think most other fs' will be fairly
simple to convert. Hopefully most of those can in via individual
filesystem trees.
2) convert internal waiters on writeback to use errseq_t for
detecting errors instead of relying on the AS_* flags. I have some
draft patches for this for ext4, but they are not quite ready for
prime time yet.
This was a discussion topic this year at LSF/MM too. If you're
interested in the gory details, LWN has some good articles about this:
https://lwn.net/Articles/718734/
https://lwn.net/Articles/724307/"
* tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
btrfs: minimal conversion to errseq_t writeback error reporting on fsync
xfs: minimal conversion to errseq_t writeback error reporting
ext4: use errseq_t based error handling for reporting data writeback errors
fs: convert __generic_file_fsync to use errseq_t based reporting
block: convert to errseq_t based writeback error tracking
dax: set errors in mapping when writeback fails
Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
fs: new infrastructure for writeback error handling and reporting
lib: add errseq_t type and infrastructure for handling it
mm: don't TestClearPageError in __filemap_fdatawait_range
mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
jbd2: don't clear and reset errors after waiting on writeback
buffer: set errors in mapping at the time that the error occurs
fs: check for writeback errors after syncing out buffers in generic_file_fsync
buffer: use mapping_set_error instead of setting the flag
mm: fix mapping_set_error call in me_pagecache_dirty
|
||
|
|
87354e5de0 |
buffer: set errors in mapping at the time that the error occurs
I noticed on xfs that I could still sometimes get back an error on fsync on a fd that was opened after the error condition had been cleared. The problem is that the buffer code sets the write_io_error flag and then later checks that flag to set the error in the mapping. That flag perisists for quite a while however. If the file is later opened with O_TRUNC, the buffers will then be invalidated and the mapping's error set such that a subsequent fsync will return error. I think this is incorrect, as there was no writeback between the open and fsync. Add a new mark_buffer_write_io_error operation that sets the flag and the error in the mapping at the same time. Replace all calls to set_buffer_write_io_error with mark_buffer_write_io_error, and remove the places that check this flag in order to set the error in the mapping. This sets the error in the mapping earlier, at the time that it's first detected. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> |
||
|
|
d945b59db8 |
buffer: use mapping_set_error instead of setting the flag
Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Christoph Hellwig <hch@lst.de> |
||
|
|
2a527d6858 |
fs: generic_block_bmap(): initialize all of the fields in the temp bh
KMSAN (KernelMemorySanitizer, a new error detection tool) reports the use of uninitialized memory in ext4_update_bh_state(): ================================================================== BUG: KMSAN: use of unitialized memory CPU: 3 PID: 1 Comm: swapper/0 Tainted: G B 4.8.0-rc6+ #597 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 0000000000000282 ffff88003cc96f68 ffffffff81f30856 0000003000000008 ffff88003cc96f78 0000000000000096 ffffffff8169742a ffff88003cc96ff8 ffffffff812fc1fc 0000000000000008 ffff88003a1980e8 0000000100000000 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffff81f30856>] dump_stack+0xa6/0xc0 lib/dump_stack.c:51 [<ffffffff812fc1fc>] kmsan_report+0x1ec/0x300 mm/kmsan/kmsan.c:? [<ffffffff812fc33b>] __msan_warning+0x2b/0x40 ??:? [< inline >] ext4_update_bh_state fs/ext4/inode.c:727 [<ffffffff8169742a>] _ext4_get_block+0x6ca/0x8a0 fs/ext4/inode.c:759 [<ffffffff81696d4c>] ext4_get_block+0x8c/0xa0 fs/ext4/inode.c:769 [<ffffffff814a2d36>] generic_block_bmap+0x246/0x2b0 fs/buffer.c:2991 [<ffffffff816ca30e>] ext4_bmap+0x5ee/0x660 fs/ext4/inode.c:3177 ... origin description: ----tmp@generic_block_bmap ================================================================== (the line numbers are relative to 4.8-rc6, but the bug persists upstream) The local |tmp| is created in generic_block_bmap() and then passed into ext4_bmap() => ext4_get_block() => _ext4_get_block() => ext4_update_bh_state(). Along the way tmp.b_page is never initialized before ext4_update_bh_state() checks its value. [ Use the approach suggested by Kees Cook of initializing the whole bh structure.] Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
||
|
|
334fd34d76 |
vfs: Add page_cache_seek_hole_data helper
Both ext4 and xfs implement seeking for the next hole or piece of data
in unwritten extents by scanning the page cache, and both versions share
the same bug when iterating the buffers of a page: the start offset into
the page isn't taken into account, so when a page fits more than two
filesystem blocks, things will go wrong. For example, on a filesystem
with a block size of 1k, the following command will fail:
xfs_io -f -c "falloc 0 4k" \
-c "pwrite 1k 1k" \
-c "pwrite 3k 1k" \
-c "seek -a -r 0" foo
In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
SEEK_DATA) will return the correct result.
Introduce a generic vfs helper for seeking in the page cache that gets
this right. The next commits will replace the filesystem specific
implementations.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
[hch: dropped the export]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
||
|
|
8e8f929881 |
fs: add support for buffered writeback to pass down write hints
Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
4e4cbee93d |
block: switch bios to blk_status_t
Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com> |