Commit Graph

379 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
dc261c32e7 Merge 4.19.267 into android-4.19-stable
Changes in 4.19.267
	phy: stm32: fix an error code in probe
	wifi: cfg80211: fix memory leak in query_regdb_file()
	HID: hyperv: fix possible memory leak in mousevsc_probe()
	net: gso: fix panic on frag_list with mixed head alloc types
	net: tun: Fix memory leaks of napi_get_frags
	bnxt_en: fix potentially incorrect return value for ndo_rx_flow_steer
	net: fman: Unregister ethernet device on removal
	capabilities: fix undefined behavior in bit shift for CAP_TO_MASK
	net: lapbether: fix issue of dev reference count leakage in lapbeth_device_event()
	hamradio: fix issue of dev reference count leakage in bpq_device_event()
	drm/vc4: Fix missing platform_unregister_drivers() call in vc4_drm_register()
	ipv6: addrlabel: fix infoleak when sending struct ifaddrlblmsg to network
	tipc: fix the msg->req tlv len check in tipc_nl_compat_name_table_dump_header
	dmaengine: mv_xor_v2: Fix a resource leak in mv_xor_v2_remove()
	drivers: net: xgene: disable napi when register irq failed in xgene_enet_open()
	net: nixge: disable napi when enable interrupts failed in nixge_open()
	net: cxgb3_main: disable napi when bind qsets failed in cxgb_up()
	ethernet: s2io: disable napi when start nic failed in s2io_card_up()
	net: mv643xx_eth: disable napi when init rxq or txq failed in mv643xx_eth_open()
	net: macvlan: fix memory leaks of macvlan_common_newlink
	riscv: process: fix kernel info leakage
	arm64: efi: Fix handling of misaligned runtime regions and drop warning
	ALSA: hda/ca0132: add quirk for EVGA Z390 DARK
	ALSA: hda: fix potential memleak in 'add_widget_node'
	ALSA: usb-audio: Add quirk entry for M-Audio Micro
	ALSA: usb-audio: Add DSD support for Accuphase DAC-60
	vmlinux.lds.h: Fix placement of '.data..decrypted' section
	nilfs2: fix deadlock in nilfs_count_free_blocks()
	nilfs2: fix use-after-free bug of ns_writer on remount
	drm/i915/dmabuf: fix sg_table handling in map_dma_buf
	platform/x86: hp_wmi: Fix rfkill causing soft blocked wifi
	btrfs: selftests: fix wrong error check in btrfs_free_dummy_root()
	udf: Fix a slab-out-of-bounds write bug in udf_find_entry()
	cert host tools: Stop complaining about deprecated OpenSSL functions
	dmaengine: at_hdmac: Fix at_lli struct definition
	dmaengine: at_hdmac: Don't start transactions at tx_submit level
	dmaengine: at_hdmac: Fix completion of unissued descriptor in case of errors
	dmaengine: at_hdmac: Don't allow CPU to reorder channel enable
	dmaengine: at_hdmac: Fix impossible condition
	dmaengine: at_hdmac: Check return code of dma_async_device_register
	net: tun: call napi_schedule_prep() to ensure we own a napi
	x86/cpu: Restore AMD's DE_CFG MSR after resume
	ASoC: wm5102: Revert "ASoC: wm5102: Fix PM disable depth imbalance in wm5102_probe"
	ASoC: wm5110: Revert "ASoC: wm5110: Fix PM disable depth imbalance in wm5110_probe"
	ASoC: wm8997: Revert "ASoC: wm8997: Fix PM disable depth imbalance in wm8997_probe"
	spi: intel: Fix the offset to get the 64K erase opcode
	selftests/futex: fix build for clang
	selftests/intel_pstate: fix build for ARCH=x86_64
	NFSv4: Retry LOCK on OLD_STATEID during delegation return
	drm/imx: imx-tve: Fix return type of imx_tve_connector_mode_valid
	btrfs: remove pointless and double ulist frees in error paths of qgroup tests
	Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm
	ASoC: core: Fix use-after-free in snd_soc_exit()
	serial: 8250_omap: remove wait loop from Errata i202 workaround
	serial: 8250: omap: Flush PM QOS work on remove
	serial: imx: Add missing .thaw_noirq hook
	tty: n_gsm: fix sleep-in-atomic-context bug in gsm_control_send
	ASoC: soc-utils: Remove __exit for snd_soc_util_exit()
	block: sed-opal: kmalloc the cmd/resp buffers
	siox: fix possible memory leak in siox_device_add()
	parport_pc: Avoid FIFO port location truncation
	pinctrl: devicetree: fix null pointer dereferencing in pinctrl_dt_to_map
	ata: libata-transport: fix double ata_host_put() in ata_tport_add()
	net: bgmac: Drop free_netdev() from bgmac_enet_remove()
	mISDN: fix possible memory leak in mISDN_dsp_element_register()
	mISDN: fix misuse of put_device() in mISDN_register_device()
	net: caif: fix double disconnect client in chnl_net_open()
	bnxt_en: Remove debugfs when pci_register_driver failed
	xen/pcpu: fix possible memory leak in register_pcpu()
	drbd: use after free in drbd_create_device()
	net/x25: Fix skb leak in x25_lapb_receive_frame()
	cifs: Fix wrong return value checking when GETFLAGS
	net: thunderbolt: Fix error handling in tbnet_init()
	ftrace: Fix the possible incorrect kernel message
	ftrace: Optimize the allocation for mcount entries
	ftrace: Fix null pointer dereference in ftrace_add_mod()
	ring_buffer: Do not deactivate non-existant pages
	ALSA: usb-audio: Drop snd_BUG_ON() from snd_usbmidi_output_open()
	slimbus: stream: correct presence rate frequencies
	speakup: fix a segfault caused by switching consoles
	USB: serial: option: add Sierra Wireless EM9191
	USB: serial: option: remove old LARA-R6 PID
	USB: serial: option: add u-blox LARA-R6 00B modem
	USB: serial: option: add u-blox LARA-L6 modem
	USB: serial: option: add Fibocom FM160 0x0111 composition
	usb: add NO_LPM quirk for Realforce 87U Keyboard
	usb: chipidea: fix deadlock in ci_otg_del_timer
	iio: adc: at91_adc: fix possible memory leak in at91_adc_allocate_trigger()
	iio: trigger: sysfs: fix possible memory leak in iio_sysfs_trig_init()
	iio: pressure: ms5611: changed hardcoded SPI speed to value limited
	dm ioctl: fix misbehavior if list_versions races with module loading
	serial: 8250: Fall back to non-DMA Rx if IIR_RDI occurs
	serial: 8250_lpss: Configure DMA also w/o DMA filter
	mmc: core: properly select voltage range without power cycle
	mmc: sdhci-pci: Fix possible memory leak caused by missing pci_dev_put()
	docs: update mediator contact information in CoC doc
	misc/vmw_vmci: fix an infoleak in vmci_host_do_receive_datagram()
	scsi: target: tcm_loop: Fix possible name leak in tcm_loop_setup_hba_bus()
	Input: i8042 - fix leaking of platform device on module removal
	serial: 8250: Flush DMA Rx on RLSI
	macvlan: enforce a consistent minimal mtu
	tcp: cdg: allow tcp_cdg_release() to be called multiple times
	kcm: avoid potential race in kcm_tx_work
	bpf, test_run: Fix alignment problem in bpf_prog_test_run_skb()
	kcm: close race conditions on sk_receive_queue
	9p: trans_fd/p9_conn_cancel: drop client lock earlier
	gfs2: Check sb_bsize_shift after reading superblock
	gfs2: Switch from strlcpy to strscpy
	9p/trans_fd: always use O_NONBLOCK read/write
	mm: fs: initialize fsdata passed to write_begin/write_end interface
	ntfs: fix use-after-free in ntfs_attr_find()
	ntfs: fix out-of-bounds read in ntfs_attr_find()
	ntfs: check overflow when iterating ATTR_RECORDs
	Linux 4.19.267

Change-Id: Id7e07ae5c1681de4cd1b0499cf1bfd257ca2261b
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2022-11-28 16:04:43 +00:00
Alexander Potapenko
8a5be2948f mm: fs: initialize fsdata passed to write_begin/write_end interface
commit 1468c6f4558b1bcd92aa0400f2920f9dc7588402 upstream.

Functions implementing the a_ops->write_end() interface accept the `void
*fsdata` parameter that is supposed to be initialized by the corresponding
a_ops->write_begin() (which accepts `void **fsdata`).

However not all a_ops->write_begin() implementations initialize `fsdata`
unconditionally, so it may get passed uninitialized to a_ops->write_end(),
resulting in undefined behavior.

Fix this by initializing fsdata with NULL before the call to
write_begin(), rather than doing so in all possible a_ops implementations.

This patch covers only the following cases found by running x86 KMSAN
under syzkaller:

 - generic_perform_write()
 - cont_expand_zero() and generic_cont_expand_simple()
 - page_symlink()

Other cases of passing uninitialized fsdata may persist in the codebase.

Link: https://lkml.kernel.org/r/20220915150417.722975-43-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-25 17:40:29 +01:00
Greg Kroah-Hartman
d1253c75a8 Merge 4.19.155 into android-4.19-stable
Changes in 4.19.155
	objtool: Support Clang non-section symbols in ORC generation
	scripts/setlocalversion: make git describe output more reliable
	arm64: Run ARCH_WORKAROUND_1 enabling code on all CPUs
	arm64: link with -z norelro regardless of CONFIG_RELOCATABLE
	x86/PCI: Fix intel_mid_pci.c build error when ACPI is not enabled
	efivarfs: Replace invalid slashes with exclamation marks in dentries.
	chelsio/chtls: fix deadlock issue
	chelsio/chtls: fix memory leaks in CPL handlers
	chelsio/chtls: fix tls record info to user
	gtp: fix an use-before-init in gtp_newlink()
	mlxsw: core: Fix memory leak on module removal
	netem: fix zero division in tabledist
	ravb: Fix bit fields checking in ravb_hwtstamp_get()
	tcp: Prevent low rmem stalls with SO_RCVLOWAT.
	tipc: fix memory leak caused by tipc_buf_append()
	r8169: fix issue with forced threading in combination with shared interrupts
	cxgb4: set up filter action after rewrites
	arch/x86/amd/ibs: Fix re-arming IBS Fetch
	x86/xen: disable Firmware First mode for correctable memory errors
	fuse: fix page dereference after free
	bpf: Fix comment for helper bpf_current_task_under_cgroup()
	evm: Check size of security.evm before using it
	p54: avoid accessing the data mapped to streaming DMA
	cxl: Rework error message for incompatible slots
	RDMA/addr: Fix race with netevent_callback()/rdma_addr_cancel()
	mtd: lpddr: Fix bad logic in print_drs_error
	serial: pl011: Fix lockdep splat when handling magic-sysrq interrupt
	ata: sata_rcar: Fix DMA boundary mask
	fscrypt: return -EXDEV for incompatible rename or link into encrypted dir
	fscrypt: clean up and improve dentry revalidation
	fscrypt: fix race allowing rename() and link() of ciphertext dentries
	fs, fscrypt: clear DCACHE_ENCRYPTED_NAME when unaliasing directory
	fscrypt: only set dentry_operations on ciphertext dentries
	fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext
	Revert "block: ratelimit handle_bad_sector() message"
	xen/events: don't use chip_data for legacy IRQs
	xen/events: avoid removing an event channel while handling it
	xen/events: add a proper barrier to 2-level uevent unmasking
	xen/events: fix race in evtchn_fifo_unmask()
	xen/events: add a new "late EOI" evtchn framework
	xen/blkback: use lateeoi irq binding
	xen/netback: use lateeoi irq binding
	xen/scsiback: use lateeoi irq binding
	xen/pvcallsback: use lateeoi irq binding
	xen/pciback: use lateeoi irq binding
	xen/events: switch user event channels to lateeoi model
	xen/events: use a common cpu hotplug hook for event channels
	xen/events: defer eoi in case of excessive number of events
	xen/events: block rogue events for some time
	x86/unwind/orc: Fix inactive tasks with stack pointer in %sp on GCC 10 compiled kernels
	mlxsw: core: Fix use-after-free in mlxsw_emad_trans_finish()
	RDMA/qedr: Fix memory leak in iWARP CM
	ata: sata_nv: Fix retrieving of active qcs
	futex: Fix incorrect should_fail_futex() handling
	powerpc/powernv/smp: Fix spurious DBG() warning
	mm: fix exec activate_mm vs TLB shootdown and lazy tlb switching race
	powerpc: select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
	sparc64: remove mm_cpumask clearing to fix kthread_use_mm race
	f2fs: add trace exit in exception path
	f2fs: fix uninit-value in f2fs_lookup
	f2fs: fix to check segment boundary during SIT page readahead
	um: change sigio_spinlock to a mutex
	ARM: 8997/2: hw_breakpoint: Handle inexact watchpoint addresses
	power: supply: bq27xxx: report "not charging" on all types
	xfs: fix realtime bitmap/summary file truncation when growing rt volume
	video: fbdev: pvr2fb: initialize variables
	ath10k: start recovery process when payload length exceeds max htc length for sdio
	ath10k: fix VHT NSS calculation when STBC is enabled
	drm/brige/megachips: Add checking if ge_b850v3_lvds_init() is working correctly
	media: videodev2.h: RGB BT2020 and HSV are always full range
	media: platform: Improve queue set up flow for bug fixing
	usb: typec: tcpm: During PR_SWAP, source caps should be sent only after tSwapSourceStart
	media: tw5864: check status of tw5864_frameinterval_get
	media: imx274: fix frame interval handling
	mmc: via-sdmmc: Fix data race bug
	drm/bridge/synopsys: dsi: add support for non-continuous HS clock
	arm64: topology: Stop using MPIDR for topology information
	printk: reduce LOG_BUF_SHIFT range for H8300
	ia64: kprobes: Use generic kretprobe trampoline handler
	kgdb: Make "kgdbcon" work properly with "kgdb_earlycon"
	media: uvcvideo: Fix dereference of out-of-bound list iterator
	riscv: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
	cpufreq: sti-cpufreq: add stih418 support
	USB: adutux: fix debugging
	uio: free uio id after uio file node is freed
	usb: xhci: omit duplicate actions when suspending a runtime suspended host.
	arm64/mm: return cpu_all_mask when node is NUMA_NO_NODE
	xfs: don't free rt blocks when we're doing a REMAP bunmapi call
	ACPI: Add out of bounds and numa_off protections to pxm_to_node()
	drivers/net/wan/hdlc_fr: Correctly handle special skb->protocol values
	bus/fsl_mc: Do not rely on caller to provide non NULL mc_io
	power: supply: test_power: add missing newlines when printing parameters by sysfs
	drm/amd/display: HDMI remote sink need mode validation for Linux
	btrfs: fix replace of seed device
	md/bitmap: md_bitmap_get_counter returns wrong blocks
	bnxt_en: Log unknown link speed appropriately.
	rpmsg: glink: Use complete_all for open states
	clk: ti: clockdomain: fix static checker warning
	net: 9p: initialize sun_server.sun_path to have addr's value only when addr is valid
	drivers: watchdog: rdc321x_wdt: Fix race condition bugs
	ext4: Detect already used quota file early
	gfs2: add validation checks for size of superblock
	cifs: handle -EINTR in cifs_setattr
	arm64: dts: renesas: ulcb: add full-pwr-cycle-in-suspend into eMMC nodes
	ARM: dts: omap4: Fix sgx clock rate for 4430
	memory: emif: Remove bogus debugfs error handling
	ARM: dts: s5pv210: remove DMA controller bus node name to fix dtschema warnings
	ARM: dts: s5pv210: move PMU node out of clock controller
	ARM: dts: s5pv210: remove dedicated 'audio-subsystem' node
	nbd: make the config put is called before the notifying the waiter
	sgl_alloc_order: fix memory leak
	nvme-rdma: fix crash when connect rejected
	md/raid5: fix oops during stripe resizing
	mmc: sdhci-acpi: AMDI0040: Set SDHCI_QUIRK2_PRESET_VALUE_BROKEN
	perf/x86/amd/ibs: Don't include randomized bits in get_ibs_op_count()
	perf/x86/amd/ibs: Fix raw sample data accumulation
	leds: bcm6328, bcm6358: use devres LED registering function
	media: uvcvideo: Fix uvc_ctrl_fixup_xu_info() not having any effect
	fs: Don't invalidate page buffers in block_write_full_page()
	NFS: fix nfs_path in case of a rename retry
	ACPI: button: fix handling lid state changes when input device closed
	ACPI / extlog: Check for RDMSR failure
	ACPI: video: use ACPI backlight for HP 635 Notebook
	ACPI: debug: don't allow debugging when ACPI is disabled
	acpi-cpufreq: Honor _PSD table setting on new AMD CPUs
	w1: mxc_w1: Fix timeout resolution problem leading to bus error
	scsi: mptfusion: Fix null pointer dereferences in mptscsih_remove()
	scsi: qla2xxx: Fix crash on session cleanup with unload
	btrfs: qgroup: fix wrong qgroup metadata reserve for delayed inode
	btrfs: improve device scanning messages
	btrfs: reschedule if necessary when logging directory items
	btrfs: send, recompute reference path after orphanization of a directory
	btrfs: use kvzalloc() to allocate clone_roots in btrfs_ioctl_send()
	btrfs: cleanup cow block on error
	btrfs: fix use-after-free on readahead extent after failure to create it
	usb: xhci: Workaround for S3 issue on AMD SNPS 3.0 xHC
	usb: dwc3: ep0: Fix ZLP for OUT ep0 requests
	usb: dwc3: gadget: Check MPS of the request length
	usb: dwc3: core: add phy cleanup for probe error handling
	usb: dwc3: core: don't trigger runtime pm when remove driver
	usb: cdc-acm: fix cooldown mechanism
	usb: typec: tcpm: reset hard_reset_count for any disconnect
	usb: host: fsl-mph-dr-of: check return of dma_set_mask()
	drm/i915: Force VT'd workarounds when running as a guest OS
	vt: keyboard, simplify vt_kdgkbsent
	vt: keyboard, extend func_buf_lock to readers
	HID: wacom: Avoid entering wacom_wac_pen_report for pad / battery
	udf: Fix memory leak when mounting
	dmaengine: dma-jz4780: Fix race in jz4780_dma_tx_status
	iio:light:si1145: Fix timestamp alignment and prevent data leak.
	iio:adc:ti-adc0832 Fix alignment issue with timestamp
	iio:adc:ti-adc12138 Fix alignment issue with timestamp
	iio:gyro:itg3200: Fix timestamp alignment and prevent data leak.
	powerpc/drmem: Make lmb_size 64 bit
	s390/stp: add locking to sysfs functions
	powerpc/rtas: Restrict RTAS requests from userspace
	powerpc: Warn about use of smt_snooze_delay
	powerpc/powernv/elog: Fix race while processing OPAL error log event.
	powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation
	NFSv4.2: support EXCHGID4_FLAG_SUPP_FENCE_OPS 4.2 EXCHANGE_ID flag
	NFSD: Add missing NFSv2 .pc_func methods
	ubifs: dent: Fix some potential memory leaks while iterating entries
	perf python scripting: Fix printable strings in python3 scripts
	ubi: check kthread_should_stop() after the setting of task state
	ia64: fix build error with !COREDUMP
	i2c: imx: Fix external abort on interrupt in exit paths
	drm/amdgpu: don't map BO in reserved region
	drm/amd/display: Don't invoke kgdb_breakpoint() unconditionally
	ceph: promote to unsigned long long before shifting
	libceph: clear con->out_msg on Policy::stateful_server faults
	9P: Cast to loff_t before multiplying
	ring-buffer: Return 0 on success from ring_buffer_resize()
	vringh: fix __vringh_iov() when riov and wiov are different
	ext4: fix leaking sysfs kobject after failed mount
	ext4: fix error handling code in add_new_gdb
	ext4: fix invalid inode checksum
	drm/ttm: fix eviction valuable range check.
	rtc: rx8010: don't modify the global rtc ops
	tty: make FONTX ioctl use the tty pointer they were actually passed
	arm64: berlin: Select DW_APB_TIMER_OF
	cachefiles: Handle readpage error correctly
	hil/parisc: Disable HIL driver when it gets stuck
	arm: dts: mt7623: add missing pause for switchport
	ARM: samsung: fix PM debug build with DEBUG_LL but !MMU
	ARM: s3c24xx: fix missing system reset
	device property: Keep secondary firmware node secondary by type
	device property: Don't clear secondary pointer for shared primary firmware node
	KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCR
	staging: comedi: cb_pcidas: Allow 2-channel commands for AO subdevice
	staging: octeon: repair "fixed-link" support
	staging: octeon: Drop on uncorrectable alignment or FCS error
	Linux 4.19.155

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I18fefb5bfaa4d05772c61c2975340d0f089b8e3e
2020-11-05 14:02:27 +01:00
Jan Kara
7ce2b16bad fs: Don't invalidate page buffers in block_write_full_page()
commit 6dbf7bb555981fb5faf7b691e8f6169fc2b2e63b upstream.

If block_write_full_page() is called for a page that is beyond current
inode size, it will truncate page buffers for the page and return 0.
This logic has been added in 2.5.62 in commit 81eb69062588 ("fix ext3
BUG due to race with truncate") in history.git tree to fix a problem
with ext3 in data=ordered mode. This particular problem doesn't exist
anymore because ext3 is long gone and ext4 handles ordered data
differently. Also normally buffers are invalidated by truncate code and
there's no need to specially handle this in ->writepage() code.

This invalidation of page buffers in block_write_full_page() is causing
issues to filesystems (e.g. ext4 or ocfs2) when block device is shrunk
under filesystem's hands and metadata buffers get discarded while being
tracked by the journalling layer. Although it is obviously "not
supported" it can cause kernel crashes like:

[ 7986.689400] BUG: unable to handle kernel NULL pointer dereference at
+0000000000000008
[ 7986.697197] PGD 0 P4D 0
[ 7986.699724] Oops: 0002 [#1] SMP PTI
[ 7986.703200] CPU: 4 PID: 203778 Comm: jbd2/dm-3-8 Kdump: loaded Tainted: G
+O     --------- -  - 4.18.0-147.5.0.5.h126.eulerosv2r9.x86_64 #1
[ 7986.716438] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 1.57 08/11/2015
[ 7986.723462] RIP: 0010:jbd2_journal_grab_journal_head+0x1b/0x40 [jbd2]
...
[ 7986.810150] Call Trace:
[ 7986.812595]  __jbd2_journal_insert_checkpoint+0x23/0x70 [jbd2]
[ 7986.818408]  jbd2_journal_commit_transaction+0x155f/0x1b60 [jbd2]
[ 7986.836467]  kjournald2+0xbd/0x270 [jbd2]

which is not great. The crash happens because bh->b_private is suddently
NULL although BH_JBD flag is still set (this is because
block_invalidatepage() cleared BH_Mapped flag and subsequent bh lookup
found buffer without BH_Mapped set, called init_page_buffers() which has
rewritten bh->b_private). So just remove the invalidation in
block_write_full_page().

Note that the buffer cache invalidation when block device changes size
is already careful to avoid similar problems by using
invalidate_mapping_pages() which skips busy buffers so it was only this
odd block_write_full_page() behavior that could tear down bdev buffers
under filesystem's hands.

Reported-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
CC: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-05 11:08:46 +01:00
Greg Kroah-Hartman
a13ec5ea86 Merge 4.19.143 into android-4.19-stable
Changes in 4.19.143
	powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()
	gre6: Fix reception with IP6_TNL_F_RCV_DSCP_COPY
	net: Fix potential wrong skb->protocol in skb_vlan_untag()
	net: qrtr: fix usage of idr in port assignment to socket
	net/smc: Prevent kernel-infoleak in __smc_diag_dump()
	tipc: fix uninit skb->data in tipc_nl_compat_dumpit()
	net: ena: Make missed_tx stat incremental
	ipvlan: fix device features
	ALSA: pci: delete repeated words in comments
	ASoC: img: Fix a reference count leak in img_i2s_in_set_fmt
	ASoC: img-parallel-out: Fix a reference count leak
	ASoC: tegra: Fix reference count leaks.
	mfd: intel-lpss: Add Intel Emmitsburg PCH PCI IDs
	arm64: dts: qcom: msm8916: Pull down PDM GPIOs during sleep
	powerpc/xive: Ignore kmemleak false positives
	media: pci: ttpci: av7110: fix possible buffer overflow caused by bad DMA value in debiirq()
	blktrace: ensure our debugfs dir exists
	scsi: target: tcmu: Fix crash on ARM during cmd completion
	iommu/iova: Don't BUG on invalid PFNs
	drm/amdkfd: Fix reference count leaks.
	drm/radeon: fix multiple reference count leak
	drm/amdgpu: fix ref count leak in amdgpu_driver_open_kms
	drm/amd/display: fix ref count leak in amdgpu_drm_ioctl
	drm/amdgpu: fix ref count leak in amdgpu_display_crtc_set_config
	drm/amdgpu/display: fix ref count leak when pm_runtime_get_sync fails
	scsi: lpfc: Fix shost refcount mismatch when deleting vport
	xfs: Don't allow logging of XFS_ISTALE inodes
	selftests/powerpc: Purge extra count_pmc() calls of ebb selftests
	f2fs: fix error path in do_recover_data()
	omapfb: fix multiple reference count leaks due to pm_runtime_get_sync
	PCI: Fix pci_create_slot() reference count leak
	ARM: dts: ls1021a: output PPS signal on FIPER2
	rtlwifi: rtl8192cu: Prevent leaking urb
	mips/vdso: Fix resource leaks in genvdso.c
	cec-api: prevent leaking memory through hole in structure
	HID: quirks: add NOGET quirk for Logitech GROUP
	f2fs: fix use-after-free issue
	drm/nouveau/drm/noveau: fix reference count leak in nouveau_fbcon_open
	drm/nouveau: fix reference count leak in nv50_disp_atomic_commit
	drm/nouveau: Fix reference count leak in nouveau_connector_detect
	locking/lockdep: Fix overflow in presentation of average lock-time
	btrfs: file: reserve qgroup space after the hole punch range is locked
	scsi: iscsi: Do not put host in iscsi_set_flashnode_param()
	ceph: fix potential mdsc use-after-free crash
	scsi: fcoe: Memory leak fix in fcoe_sysfs_fcf_del()
	EDAC/ie31200: Fallback if host bridge device is already initialized
	KVM: arm64: Fix symbol dependency in __hyp_call_panic_nvhe
	powerpc/spufs: add CONFIG_COREDUMP dependency
	USB: sisusbvga: Fix a potential UB casued by left shifting a negative value
	efi: provide empty efi_enter_virtual_mode implementation
	Revert "ath10k: fix DMA related firmware crashes on multiple devices"
	media: gpio-ir-tx: improve precision of transmitted signal due to scheduling
	drm/msm/adreno: fix updating ring fence
	nvme-fc: Fix wrong return value in __nvme_fc_init_request()
	null_blk: fix passing of REQ_FUA flag in null_handle_rq
	i2c: rcar: in slave mode, clear NACK earlier
	usb: gadget: f_tcm: Fix some resource leaks in some error paths
	jbd2: make sure jh have b_transaction set in refile/unfile_buffer
	ext4: don't BUG on inconsistent journal feature
	ext4: handle read only external journal device
	jbd2: abort journal if free a async write error metadata buffer
	ext4: handle option set by mount flags correctly
	ext4: handle error of ext4_setup_system_zone() on remount
	ext4: correctly restore system zone info when remount fails
	fs: prevent BUG_ON in submit_bh_wbc()
	spi: stm32: fix stm32_spi_prepare_mbr in case of odd clk_rate
	s390/cio: add cond_resched() in the slow_eval_known_fn() loop
	ASoC: wm8994: Avoid attempts to read unreadable registers
	scsi: fcoe: Fix I/O path allocation
	scsi: ufs: Fix possible infinite loop in ufshcd_hold
	scsi: ufs: Improve interrupt handling for shared interrupts
	scsi: ufs: Clean up completed request without interrupt notification
	scsi: qla2xxx: Check if FW supports MQ before enabling
	scsi: qla2xxx: Fix null pointer access during disconnect from subsystem
	Revert "scsi: qla2xxx: Fix crash on qla2x00_mailbox_command"
	macvlan: validate setting of multiple remote source MAC addresses
	net: gianfar: Add of_node_put() before goto statement
	powerpc/perf: Fix soft lockups due to missed interrupt accounting
	block: loop: set discard granularity and alignment for block device backed loop
	HID: i2c-hid: Always sleep 60ms after I2C_HID_PWR_ON commands
	blk-mq: order adding requests to hctx->dispatch and checking SCHED_RESTART
	btrfs: reset compression level for lzo on remount
	btrfs: fix space cache memory leak after transaction abort
	fbcon: prevent user font height or width change from causing potential out-of-bounds access
	USB: lvtest: return proper error code in probe
	vt: defer kfree() of vc_screenbuf in vc_do_resize()
	vt_ioctl: change VT_RESIZEX ioctl to check for error return from vc_resize()
	serial: samsung: Removes the IRQ not found warning
	serial: pl011: Fix oops on -EPROBE_DEFER
	serial: pl011: Don't leak amba_ports entry on driver register error
	serial: 8250_exar: Fix number of ports for Commtech PCIe cards
	serial: 8250: change lock order in serial8250_do_startup()
	writeback: Protect inode->i_io_list with inode->i_lock
	writeback: Avoid skipping inode writeback
	writeback: Fix sync livelock due to b_dirty_time processing
	XEN uses irqdesc::irq_data_common::handler_data to store a per interrupt XEN data pointer which contains XEN specific information.
	usb: host: xhci: fix ep context print mismatch in debugfs
	xhci: Do warm-reset when both CAS and XDEV_RESUME are set
	xhci: Always restore EP_SOFT_CLEAR_TOGGLE even if ep reset failed
	PM: sleep: core: Fix the handling of pending runtime resume requests
	device property: Fix the secondary firmware node handling in set_primary_fwnode()
	genirq/matrix: Deal with the sillyness of for_each_cpu() on UP
	irqchip/stm32-exti: Avoid losing interrupts due to clearing pending bits by mistake
	drm/amdgpu: Fix buffer overflow in INFO ioctl
	drm/amd/pm: correct Vega10 swctf limit setting
	drm/amd/pm: correct Vega12 swctf limit setting
	USB: yurex: Fix bad gfp argument
	usb: uas: Add quirk for PNY Pro Elite
	USB: quirks: Add no-lpm quirk for another Raydium touchscreen
	USB: quirks: Ignore duplicate endpoint on Sound Devices MixPre-D
	USB: Ignore UAS for JMicron JMS567 ATA/ATAPI Bridge
	usb: host: ohci-exynos: Fix error handling in exynos_ohci_probe()
	USB: gadget: u_f: add overflow checks to VLA macros
	USB: gadget: f_ncm: add bounds checks to ncm_unwrap_ntb()
	USB: gadget: u_f: Unbreak offset calculation in VLAs
	USB: cdc-acm: rework notification_buffer resizing
	usb: storage: Add unusual_uas entry for Sony PSZ drives
	btrfs: check the right error variable in btrfs_del_dir_entries_in_log
	usb: dwc3: gadget: Don't setup more than requested
	usb: dwc3: gadget: Fix handling ZLP
	usb: dwc3: gadget: Handle ZLP for sg requests
	tpm: Unify the mismatching TPM space buffer sizes
	HID: hiddev: Fix slab-out-of-bounds write in hiddev_ioctl_usage()
	ALSA: usb-audio: Update documentation comment for MS2109 quirk
	Linux 4.19.143

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8b6e29eda77bd69df30132842cf28019c8e7c1a3
2020-09-03 13:19:20 +02:00
Xianting Tian
4aaac9c537 fs: prevent BUG_ON in submit_bh_wbc()
[ Upstream commit 377254b2cd2252c7c3151b113cbdf93a7736c2e9 ]

If a device is hot-removed --- for example, when a physical device is
unplugged from pcie slot or a nbd device's network is shutdown ---
this can result in a BUG_ON() crash in submit_bh_wbc().  This is
because the when the block device dies, the buffer heads will have
their Buffer_Mapped flag get cleared, leading to the crash in
submit_bh_wbc.

We had attempted to work around this problem in commit a17712c8
("ext4: check superblock mapped prior to committing").  Unfortunately,
it's still possible to hit the BUG_ON(!buffer_mapped(bh)) if the
device dies between when the work-around check in ext4_commit_super()
and when submit_bh_wbh() is finally called:

Code path:
ext4_commit_super
    judge if 'buffer_mapped(sbh)' is false, return <== commit a17712c8
          lock_buffer(sbh)
          ...
          unlock_buffer(sbh)
               __sync_dirty_buffer(sbh,...
                    lock_buffer(sbh)
                        judge if 'buffer_mapped(sbh))' is false, return <== added by this patch
                            submit_bh(...,sbh)
                                submit_bh_wbc(...,sbh,...)

[100722.966497] kernel BUG at fs/buffer.c:3095! <== BUG_ON(!buffer_mapped(bh))' in submit_bh_wbc()
[100722.966503] invalid opcode: 0000 [#1] SMP
[100722.966566] task: ffff8817e15a9e40 task.stack: ffffc90024744000
[100722.966574] RIP: 0010:submit_bh_wbc+0x180/0x190
[100722.966575] RSP: 0018:ffffc90024747a90 EFLAGS: 00010246
[100722.966576] RAX: 0000000000620005 RBX: ffff8818a80603a8 RCX: 0000000000000000
[100722.966576] RDX: ffff8818a80603a8 RSI: 0000000000020800 RDI: 0000000000000001
[100722.966577] RBP: ffffc90024747ac0 R08: 0000000000000000 R09: ffff88207f94170d
[100722.966578] R10: 00000000000437c8 R11: 0000000000000001 R12: 0000000000020800
[100722.966578] R13: 0000000000000001 R14: 000000000bf9a438 R15: ffff88195f333000
[100722.966580] FS:  00007fa2eee27700(0000) GS:ffff88203d840000(0000) knlGS:0000000000000000
[100722.966580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[100722.966581] CR2: 0000000000f0b008 CR3: 000000201a622003 CR4: 00000000007606e0
[100722.966582] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[100722.966583] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[100722.966583] PKRU: 55555554
[100722.966583] Call Trace:
[100722.966588]  __sync_dirty_buffer+0x6e/0xd0
[100722.966614]  ext4_commit_super+0x1d8/0x290 [ext4]
[100722.966626]  __ext4_std_error+0x78/0x100 [ext4]
[100722.966635]  ? __ext4_journal_get_write_access+0xca/0x120 [ext4]
[100722.966646]  ext4_reserve_inode_write+0x58/0xb0 [ext4]
[100722.966655]  ? ext4_dirty_inode+0x48/0x70 [ext4]
[100722.966663]  ext4_mark_inode_dirty+0x53/0x1e0 [ext4]
[100722.966671]  ? __ext4_journal_start_sb+0x6d/0xf0 [ext4]
[100722.966679]  ext4_dirty_inode+0x48/0x70 [ext4]
[100722.966682]  __mark_inode_dirty+0x17f/0x350
[100722.966686]  generic_update_time+0x87/0xd0
[100722.966687]  touch_atime+0xa9/0xd0
[100722.966690]  generic_file_read_iter+0xa09/0xcd0
[100722.966694]  ? page_cache_tree_insert+0xb0/0xb0
[100722.966704]  ext4_file_read_iter+0x4a/0x100 [ext4]
[100722.966707]  ? __inode_security_revalidate+0x4f/0x60
[100722.966709]  __vfs_read+0xec/0x160
[100722.966711]  vfs_read+0x8c/0x130
[100722.966712]  SyS_pread64+0x87/0xb0
[100722.966716]  do_syscall_64+0x67/0x1b0
[100722.966719]  entry_SYSCALL64_slow_path+0x25/0x25

To address this, add the check of 'buffer_mapped(bh)' to
__sync_dirty_buffer().  This also has the benefit of fixing this for
other file systems.

With this addition, we can drop the workaround in ext4_commit_supper().

[ Commit description rewritten by tytso. ]

Signed-off-by: Xianting Tian <xianting_tian@126.com>
Link: https://lore.kernel.org/r/1596211825-8750-1-git-send-email-xianting_tian@126.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03 11:24:24 +02:00
Greg Kroah-Hartman
a13256124f Merge 4.19.118 into android-4.19
Changes in 4.19.118
	arm, bpf: Fix offset overflow for BPF_MEM BPF_DW
	objtool: Fix switch table detection in .text.unlikely
	scsi: sg: add sg_remove_request in sg_common_write
	ext4: use non-movable memory for superblock readahead
	watchdog: sp805: fix restart handler
	arm, bpf: Fix bugs with ALU64 {RSH, ARSH} BPF_K shift by 0
	ARM: dts: imx6: Use gpc for FEC interrupt controller to fix wake on LAN.
	netfilter: nf_tables: report EOPNOTSUPP on unsupported flags/object type
	irqchip/mbigen: Free msi_desc on device teardown
	ALSA: hda: Don't release card at firmware loading error
	of: unittest: kmemleak on changeset destroy
	of: unittest: kmemleak in of_unittest_platform_populate()
	of: unittest: kmemleak in of_unittest_overlay_high_level()
	of: overlay: kmemleak in dup_and_fixup_symbol_prop()
	x86/Hyper-V: Report crash register data or kmsg before running crash kernel
	lib/raid6: use vdupq_n_u8 to avoid endianness warnings
	video: fbdev: sis: Remove unnecessary parentheses and commented code
	rbd: avoid a deadlock on header_rwsem when flushing notifies
	rbd: call rbd_dev_unprobe() after unwatching and flushing notifies
	xsk: Add missing check on user supplied headroom size
	x86/Hyper-V: Unload vmbus channel in hv panic callback
	x86/Hyper-V: Free hv_panic_page when fail to register kmsg dump
	x86/Hyper-V: Trigger crash enlightenment only once during system crash.
	x86/Hyper-V: Report crash register data when sysctl_record_panic_msg is not set
	x86/Hyper-V: Report crash data in die() when panic_on_oops is set
	clk: at91: usb: continue if clk_hw_round_rate() return zero
	power: supply: bq27xxx_battery: Silence deferred-probe error
	clk: tegra: Fix Tegra PMC clock out parents
	soc: imx: gpc: fix power up sequencing
	rtc: 88pm860x: fix possible race condition
	NFSv4/pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()
	NFS: direct.c: Fix memory leak of dreq when nfs_get_lock_context fails
	s390/cpuinfo: fix wrong output when CPU0 is offline
	powerpc/maple: Fix declaration made after definition
	s390/cpum_sf: Fix wrong page count in error message
	ext4: do not commit super on read-only bdev
	um: ubd: Prevent buffer overrun on command completion
	cifs: Allocate encryption header through kmalloc
	include/linux/swapops.h: correct guards for non_swap_entry()
	percpu_counter: fix a data race at vm_committed_as
	compiler.h: fix error in BUILD_BUG_ON() reporting
	KVM: s390: vsie: Fix possible race when shadowing region 3 tables
	x86: ACPI: fix CPU hotplug deadlock
	drm/amdkfd: kfree the wrong pointer
	NFS: Fix memory leaks in nfs_pageio_stop_mirroring()
	f2fs: fix NULL pointer dereference in f2fs_write_begin()
	drm/vc4: Fix HDMI mode validation
	iommu/vt-d: Fix mm reference leak
	ext2: fix empty body warnings when -Wextra is used
	ext2: fix debug reference to ext2_xattr_cache
	power: supply: axp288_fuel_gauge: Broaden vendor check for Intel Compute Sticks.
	libnvdimm: Out of bounds read in __nd_ioctl()
	iommu/amd: Fix the configuration of GCR3 table root pointer
	f2fs: fix to wait all node page writeback
	net: dsa: bcm_sf2: Fix overflow checks
	fbdev: potential information leak in do_fb_ioctl()
	iio: si1133: read 24-bit signed integer for measurement
	tty: evh_bytechan: Fix out of bounds accesses
	locktorture: Print ratio of acquisitions, not failures
	mtd: spinand: Explicitly use MTD_OPS_RAW to write the bad block marker to OOB
	mtd: lpddr: Fix a double free in probe()
	mtd: phram: fix a double free issue in error path
	KEYS: Don't write out to userspace while holding key semaphore
	bpf: fix buggy r0 retval refinement for tracing helpers
	Linux 4.19.118

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ife34f739f719c332c7b1d22b1832179be6a16800
2020-04-23 11:07:54 +02:00
Roman Gushchin
a6375c9877 ext4: use non-movable memory for superblock readahead
commit d87f639258a6a5980183f11876c884931ad93da2 upstream.

Since commit a8ac900b81 ("ext4: use non-movable memory for the
superblock") buffers for ext4 superblock were allocated using
the sb_bread_unmovable() helper which allocated buffer heads
out of non-movable memory blocks. It was necessarily to not block
page migrations and do not cause cma allocation failures.

However commit 85c8f176a6 ("ext4: preload block group descriptors")
broke this by introducing pre-reading of the ext4 superblock.
The problem is that __breadahead() is using __getblk() underneath,
which allocates buffer heads out of movable memory.

It resulted in page migration failures I've seen on a machine
with an ext4 partition and a preallocated cma area.

Fix this by introducing sb_breadahead_unmovable() and
__breadahead_gfp() helpers which use non-movable memory for buffer
head allocations and use them for the ext4 superblock readahead.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Fixes: 85c8f176a6 ("ext4: preload block group descriptors")
Signed-off-by: Roman Gushchin <guro@fb.com>
Link: https://lore.kernel.org/r/20200229001411.128010-1-guro@fb.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-23 10:30:12 +02:00
Satya Tangirala
b01c73ea71 BACKPORT: FROMLIST: Update Inline Encryption from v5 to v6 of patch series
Changes v5 => v6:
 - Blk-crypto's kernel crypto API fallback is no longer restricted to
   8-byte DUNs. It's also now separately configurable from blk-crypto, and
   can be disabled entirely, while still allowing the kernel to use inline
   encryption hardware. Further, struct bio_crypt_ctx takes up less space,
   and no longer contains the information needed by the crypto API
   fallback - the fallback allocates the required memory when necessary.
 - Blk-crypto now supports all file content encryption modes supported by
   fscrypt.
 - Fixed bio merging logic in blk-merge.c
 - Fscrypt now supports inline encryption with the direct key policy, since
   blk-crypto now has support for larger DUNs.
 - Keyslot manager now uses a hashtable to lookup which keyslot contains
   any particular key (thanks Eric!)
 - Fscrypt support for inline encryption now handles filesystems with
   multiple underlying block devices (thanks Eric!)
 - Numerous cleanups

Bug: 137270441
Test: refer to I26376479ee38259b8c35732cb3a1d7e15f9b05a3
Change-Id: I13e2e327e0b4784b394cb1e7cf32a04856d95f01
Link: https://lore.kernel.org/linux-block/20191218145136.172774-1-satyat@google.com/
Signed-off-by: Satya Tangirala <satyat@google.com>
2020-01-13 07:11:38 -08:00
Eric Biggers
f2ca2620dd BACKPORT: FROMLIST: ext4: add inline encryption support
Wire up ext4 to support inline encryption via the helper functions which
fs/crypto/ now provides.  This includes:

- Adding a mount option 'inlinecrypt' which enables inline encryption
  on encrypted files where it can be used.

- Setting the bio_crypt_ctx on bios that will be submitted to an
  inline-encrypted file.

  Note: submit_bh_wbc() in fs/buffer.c also needed to be patched for
  this part, since ext4 sometimes uses ll_rw_block() on file data.

- Not adding logically discontiguous data to bios that will be submitted
  to an inline-encrypted file.

- Not doing filesystem-layer crypto on inline-encrypted files.

Bug: 137270441
Test: tested as series; see I26aac0ac7845a9064f28bb1421eb2522828a6dec
Change-Id: I54a8efe388289918f4144d8138fb87aa507ae760
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://patchwork.kernel.org/patch/11214781/
2019-11-14 14:47:50 -08:00
Carlos Maiolino
83c395332f fs: fix guard_bio_eod to check for real EOD errors
[ Upstream commit dce30ca9e3b676fb288c33c1f4725a0621361185 ]

guard_bio_eod() can truncate a segment in bio to allow it to do IO on
odd last sectors of a device.

It already checks if the IO starts past EOD, but it does not consider
the possibility of an IO request starting within device boundaries can
contain more than one segment past EOD.

In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
underflow bvec->bv_len.

Fix this by checking if truncated_bytes is lower than PAGE_SIZE.

This situation has been found on filesystems such as isofs and vfat,
which doesn't check the device size before mount, if the device is
smaller than the filesystem itself, a readahead on such filesystem,
which spans EOD, can trigger this situation, leading a call to
zero_user() with a wrong size possibly corrupting memory.

I didn't see any crash, or didn't let the system run long enough to
check if memory corruption will be hit somewhere, but adding
instrumentation to guard_bio_end() to check truncated_bytes size, was
enough to see the error.

The following script can trigger the error.

MNT=/mnt
IMG=./DISK.img
DEV=/dev/loop0

mkfs.vfat $IMG
mount $IMG $MNT
cp -R /etc $MNT &> /dev/null
umount $MNT

losetup -D

losetup --find --show --sizelimit 16247280 $IMG
mount $DEV $MNT

find $MNT -type f -exec cat {} + >/dev/null

Kudos to Eric Sandeen for coming up with the reproducer above

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05 22:33:00 +02:00
Tetsuo Handa
72426ed2a1 fs: ratelimit __find_get_block_slow() failure message.
[ Upstream commit 43636c804df0126da669c261fc820fb22f62bfc2 ]

When something let __find_get_block_slow() hit all_mapped path, it calls
printk() for 100+ times per a second. But there is no need to print same
message with such high frequency; it is just asking for stall warning, or
at least bloating log files.

  [  399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
  [  399.873324][T15342] b_state=0x00000029, b_size=512
  [  399.878403][T15342] device loop0 blocksize: 4096
  [  399.883296][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
  [  399.890400][T15342] b_state=0x00000029, b_size=512
  [  399.895595][T15342] device loop0 blocksize: 4096
  [  399.900556][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
  [  399.907471][T15342] b_state=0x00000029, b_size=512
  [  399.912506][T15342] device loop0 blocksize: 4096

This patch reduces frequency to up to once per a second, in addition to
concatenating three lines into one.

  [  399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8, b_state=0x00000029, b_size=512, device loop0 blocksize: 4096

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-13 14:02:38 -07:00
Mukesh Ojha
13ba17bee1 notifier: Remove notifier header file wherever not used
The conversion of the hotplug notifiers to a state machine left the
notifier.h includes around in some places. Remove them.

Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1535114033-4605-1-git-send-email-mojha@codeaurora.org
2018-08-30 12:56:40 +02:00
Shakeel Butt
f745c6f5fe fs, mm: account buffer_head to kmemcg
The buffer_head can consume a significant amount of system memory and is
directly related to the amount of page cache.  In our production
environment we have observed that a lot of machines are spending a
significant amount of memory as buffer_head and can not be left as
system memory overhead.

Charging buffer_head is not as simple as adding __GFP_ACCOUNT to the
allocation.  The buffer_heads can be allocated in a memcg different from
the memcg of the page for which buffer_heads are being allocated.  One
concrete example is memory reclaim.  The reclaim can trigger I/O of
pages of any memcg on the system.  So, the right way to charge
buffer_head is to extract the memcg from the page for which buffer_heads
are being allocated and then use targeted memcg charging API.

[shakeelb@google.com: use __GFP_ACCOUNT for directed memcg charging]
  Link: http://lkml.kernel.org/r/20180702220208.213380-1-shakeelb@google.com
Link: http://lkml.kernel.org/r/20180627191250.209150-3-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:30 -07:00
Andreas Gruenbacher
3d7b6b21f6 iomap: mark newly allocated buffer heads as new
In iomap_to_bh, not only mark buffer heads in IOMAP_UNWRITTEN maps as
new, but also buffer heads in IOMAP_MAPPED maps with the IOMAP_F_NEW
flag set.  This will be used by filesystems like gfs2, which allocate
blocks in iomap->begin.

Minor corrections to the comment for IOMAP_UNWRITTEN maps.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-19 15:10:55 -07:00
Christoph Hellwig
a6d639da63 fs: factor out a __generic_write_end helper
Bits of the buffer.c based write_end implementations that don't know
about buffer_heads and can be reused by other implementations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-19 15:10:55 -07:00
Christoph Hellwig
8a78cb1f1b fs: move page_cache_seek_hole_data to iomap.c
This function is only used by the iomap code, depends on being called
from it, and will soon stop poking into buffer head internals.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:33 -07:00
Linus Torvalds
7214dd4ea9 Merge branch 'work.thaw' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs thaw updates from Al Viro:
 "An ancient series that has fallen through the cracks in the previous
  cycle"

* 'work.thaw' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  buffer.c: call thaw_super during emergency thaw
  vfs: factor sb iteration out of do_emergency_remount
2018-04-12 12:28:32 -07:00
Matthew Wilcox
b93b016313 page cache: use xa_lock
Remove the address_space ->tree_lock and use the xa_lock newly added to
the radix_tree_root.  Rename the address_space ->page_tree to ->i_pages,
since we don't really care that it's a tree.

[willy@infradead.org: fix nds32, fs/dax.c]
  Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.orgLink: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:39 -07:00
Matthew Wilcox
f82b376413 export __set_page_dirty
XFS currently contains a copy-and-paste of __set_page_dirty().  Export
it from buffer.c instead.

Link: http://lkml.kernel.org/r/20180313132639.17387-6-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:39 -07:00
Jeff Moyer
3172485f4f block_invalidatepage(): only release page if the full page was invalidated
Prior to commit d47992f86b ("mm: change invalidatepage prototype to
accept length"), an offset of 0 meant that the full page was being
invalidated.  After that commit, we need to instead check the length.

Jan said:
:
: The only possible issue is that try_to_release_page() was called more
: often than necessary.  Otherwise the issue is harmless but still it's good
: to have this fixed.

Link: http://lkml.kernel.org/r/x49fu5rtnzs.fsf@segfault.boston.devel.redhat.com
Fixes: d47992f86b ("mm: change invalidatepage prototype to accept length")
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:27 -07:00
Mateusz Guzik
08fdc8a013 buffer.c: call thaw_super during emergency thaw
There are 2 distinct freezing mechanisms - one operates on block
devices and another one directly on super blocks. Both end up with the
same result, but thaw of only one of these does not thaw the other.

In particular fsfreeze --freeze uses the ioctl variant going to the
super block. Since prior to this patch emergency thaw was not doing
a relevant thaw, filesystems frozen with this method remained
unaffected.

The patch is a hack which adds blind unfreezing.

In order to keep the super block write-locked the whole time the code
is shuffled around and the newly introduced __iterate_supers is
employed.

Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-19 02:21:40 -04:00
Linus Torvalds
19e7b5f994 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "All kinds of misc stuff, without any unifying topic, from various
  people.

  Neil's d_anon patch, several bugfixes, introduction of kvmalloc
  analogue of kmemdup_user(), extending bitfield.h to deal with
  fixed-endians, assorted cleanups all over the place..."

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
  alpha: osf_sys.c: use timespec64 where appropriate
  alpha: osf_sys.c: fix put_tv32 regression
  jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path
  dcache: delete unused d_hash_mask
  dcache: subtract d_hash_shift from 32 in advance
  fs/buffer.c: fold init_buffer() into init_page_buffers()
  fs: fold __inode_permission() into inode_permission()
  fs: add RWF_APPEND
  sctp: use vmemdup_user() rather than badly open-coding memdup_user()
  snd_ctl_elem_init_enum_names(): switch to vmemdup_user()
  replace_user_tlv(): switch to vmemdup_user()
  new primitive: vmemdup_user()
  memdup_user(): switch to GFP_USER
  eventfd: fold eventfd_ctx_get() into eventfd_ctx_fileget()
  eventfd: fold eventfd_ctx_read() into eventfd_read()
  eventfd: convert to use anon_inode_getfd()
  nfs4file: get rid of pointless include of btrfs.h
  uvc_v4l2: clean copyin/copyout up
  vme_user: don't use __copy_..._user()
  usx2y: don't bother with memdup_user() for 16-byte structure
  ...
2018-01-31 09:25:20 -08:00
Eric Biggers
01950a349e fs/buffer.c: fold init_buffer() into init_page_buffers()
Since commit e76004093d ("fs/buffer.c: remove unnecessary init
operation after allocating buffer_head"), there are no callers of
init_buffer() outside of init_page_buffers().  So just fold it into
init_page_buffers().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-25 19:34:28 -05:00
Ming Lei
c45a8f2def fs: convert to bio_last_bvec_all()
This patch converts 3 users to bio_last_bvec_all(), so that we can go
ahead and convert to multipage bvec.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-06 09:18:00 -07:00
Mel Gorman
8667982014 mm, pagevec: remove cold parameter for pagevecs
Every pagevec_init user claims the pages being released are hot even in
cases where it is unlikely the pages are hot.  As no one cares about the
hotness of pages being released to the allocator, just ditch the
parameter.

No performance impact is expected as the overhead is marginal.  The
parameter is removed simply because it is a bit stupid to have a useless
parameter copied everywhere.

Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:06 -08:00
Linus Torvalds
e2c5923c34 Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block
Pull core block layer updates from Jens Axboe:
 "This is the main pull request for block storage for 4.15-rc1.

  Nothing out of the ordinary in here, and no API changes or anything
  like that. Just various new features for drivers, core changes, etc.
  In particular, this pull request contains:

   - A patch series from Bart, closing the whole on blk/scsi-mq queue
     quescing.

   - A series from Christoph, building towards hidden gendisks (for
     multipath) and ability to move bio chains around.

   - NVMe
        - Support for native multipath for NVMe (Christoph).
        - Userspace notifications for AENs (Keith).
        - Command side-effects support (Keith).
        - SGL support (Chaitanya Kulkarni)
        - FC fixes and improvements (James Smart)
        - Lots of fixes and tweaks (Various)

   - bcache
        - New maintainer (Michael Lyle)
        - Writeback control improvements (Michael)
        - Various fixes (Coly, Elena, Eric, Liang, et al)

   - lightnvm updates, mostly centered around the pblk interface
     (Javier, Hans, and Rakesh).

   - Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)

   - Writeback series that fix the much discussed hundreds of millions
     of sync-all units. This goes all the way, as discussed previously
     (me).

   - Fix for missing wakeup on writeback timer adjustments (Yafang
     Shao).

   - Fix laptop mode on blk-mq (me).

   - {mq,name} tupple lookup for IO schedulers, allowing us to have
     alias names. This means you can use 'deadline' on both !mq and on
     mq (where it's called mq-deadline). (me).

   - blktrace race fix, oopsing on sg load (me).

   - blk-mq optimizations (me).

   - Obscure waitqueue race fix for kyber (Omar).

   - NBD fixes (Josef).

   - Disable writeback throttling by default on bfq, like we do on cfq
     (Luca Miccio).

   - Series from Ming that enable us to treat flush requests on blk-mq
     like any other request. This is a really nice cleanup.

   - Series from Ming that improves merging on blk-mq with schedulers,
     getting us closer to flipping the switch on scsi-mq again.

   - BFQ updates (Paolo).

   - blk-mq atomic flags memory ordering fixes (Peter Z).

   - Loop cgroup support (Shaohua).

   - Lots of minor fixes from lots of different folks, both for core and
     driver code"

* 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
  nvme: fix visibility of "uuid" ns attribute
  blk-mq: fixup some comment typos and lengths
  ide: ide-atapi: fix compile error with defining macro DEBUG
  blk-mq: improve tag waiting setup for non-shared tags
  brd: remove unused brd_mutex
  blk-mq: only run the hardware queue if IO is pending
  block: avoid null pointer dereference on null disk
  fs: guard_bio_eod() needs to consider partitions
  xtensa/simdisk: fix compile error
  nvme: expose subsys attribute to sysfs
  nvme: create 'slaves' and 'holders' entries for hidden controllers
  block: create 'slaves' and 'holders' entries for hidden gendisks
  nvme: also expose the namespace identification sysfs files for mpath nodes
  nvme: implement multipath access to nvme subsystems
  nvme: track shared namespaces
  nvme: introduce a nvme_ns_ids structure
  nvme: track subsystems
  block, nvme: Introduce blk_mq_req_flags_t
  block, scsi: Make SCSI quiesce and resume work reliably
  block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
  ...
2017-11-14 15:32:19 -08:00
Linus Torvalds
ae9a8c4bdc Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:

 - Add support for online resizing of file systems with bigalloc

 - Fix a two data corruption bugs involving DAX, as well as a corruption
   bug after a crash during a racing fallocate and delayed allocation.

 - Finally, a number of cleanups and optimizations.

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: improve smp scalability for inode generation
  ext4: add support for online resizing with bigalloc
  ext4: mention noload when recovering on read-only device
  Documentation: fix little inconsistencies
  ext4: convert timers to use timer_setup()
  jbd2: convert timers to use timer_setup()
  ext4: remove duplicate extended attributes defs
  ext4: add ext4_should_use_dax()
  ext4: add sanity check for encryption + DAX
  ext4: prevent data corruption with journaling + DAX
  ext4: prevent data corruption with inline data + DAX
  ext4: fix interaction between i_size, fallocate, and delalloc after a crash
  ext4: retry allocations conservatively
  ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA
  ext4: Add iomap support for inline data
  iomap: Add IOMAP_F_DATA_INLINE flag
  iomap: Switch from blkno to disk offset
2017-11-14 12:59:42 -08:00
Greg Edwards
67f2519fe2 fs: guard_bio_eod() needs to consider partitions
guard_bio_eod() needs to look at the partition capacity, not just the
capacity of the whole device, when determining if truncation is
necessary.

[   60.268688] attempt to access beyond end of device
[   60.268690] unknown-block(9,1): rw=0, want=67103509, limit=67103506
[   60.268693] buffer_io_error: 2 callbacks suppressed
[   60.268696] Buffer I/O error on dev md1p7, logical block 4524305, async page read

Fixes: 74d46992e0 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Cc: stable@vger.kernel.org # v4.13
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Edwards <gedwards@ddn.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-10 19:55:57 -07:00
Mark Rutland
6aa7de0591 locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-25 11:01:08 +02:00
Jens Axboe
bc48f001de buffer: eliminate the need to call free_more_memory() in __getblk_slow()
Since the previous commit removed any case where grow_buffers()
would return failure due to memory allocations, we can safely
remove the case where we have to call free_more_memory() in
this function.

Since this is also the last user of free_more_memory(), kill
it off completely.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-03 08:38:17 -06:00
Jens Axboe
94dc24c0c5 buffer: grow_dev_page() should use __GFP_NOFAIL for all cases
We currently use it for find_or_create_page(), which means that it
cannot fail. Ensure we also pass in 'retry == true' to
alloc_page_buffers(), which also ensure that it cannot fail.

After this, there are no failure cases in grow_dev_page() that
occur because of a failed memory allocation.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-03 08:38:17 -06:00
Jens Axboe
640ab98fb3 buffer: have alloc_page_buffers() use __GFP_NOFAIL
Instead of adding weird retry logic in that function, utilize
__GFP_NOFAIL to ensure that the vm takes care of handling any
potential retries appropriately. This means we don't have to
call free_more_memory() from here.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-03 08:38:17 -06:00
Andreas Gruenbacher
19fe5f643f iomap: Switch from blkno to disk offset
Replace iomap->blkno, the sector number, with iomap->addr, the disk
offset in bytes.  For invalid disk offsets, use the special value
IOMAP_NULL_ADDR instead of IOMAP_NULL_BLOCK.

This allows to use iomap for mappings which are not block aligned, such
as inline data on ext4.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>  # iomap, xfs
Reviewed-by: Jan Kara <jack@suse.cz>
2017-10-01 17:55:54 -04:00
Linus Torvalds
a0725ab0c7 Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
 "This is the first pull request for 4.14, containing most of the code
  changes. It's a quiet series this round, which I think we needed after
  the churn of the last few series. This contains:

   - Fix for a registration race in loop, from Anton Volkov.

   - Overflow complaint fix from Arnd for DAC960.

   - Series of drbd changes from the usual suspects.

   - Conversion of the stec/skd driver to blk-mq. From Bart.

   - A few BFQ improvements/fixes from Paolo.

   - CFQ improvement from Ritesh, allowing idling for group idle.

   - A few fixes found by Dan's smatch, courtesy of Dan.

   - A warning fixup for a race between changing the IO scheduler and
     device remova. From David Jeffery.

   - A few nbd fixes from Josef.

   - Support for cgroup info in blktrace, from Shaohua.

   - Also from Shaohua, new features in the null_blk driver to allow it
     to actually hold data, among other things.

   - Various corner cases and error handling fixes from Weiping Zhang.

   - Improvements to the IO stats tracking for blk-mq from me. Can
     drastically improve performance for fast devices and/or big
     machines.

   - Series from Christoph removing bi_bdev as being needed for IO
     submission, in preparation for nvme multipathing code.

   - Series from Bart, including various cleanups and fixes for switch
     fall through case complaints"

* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
  kernfs: checking for IS_ERR() instead of NULL
  drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
  drbd: Fix allyesconfig build, fix recent commit
  drbd: switch from kmalloc() to kmalloc_array()
  drbd: abort drbd_start_resync if there is no connection
  drbd: move global variables to drbd namespace and make some static
  drbd: rename "usermode_helper" to "drbd_usermode_helper"
  drbd: fix race between handshake and admin disconnect/down
  drbd: fix potential deadlock when trying to detach during handshake
  drbd: A single dot should be put into a sequence.
  drbd: fix rmmod cleanup, remove _all_ debugfs entries
  drbd: Use setup_timer() instead of init_timer() to simplify the code.
  drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
  drbd: new disk-option disable-write-same
  drbd: Fix resource role for newly created resources in events2
  drbd: mark symbols static where possible
  drbd: Send P_NEG_ACK upon write error in protocol != C
  drbd: add explicit plugging when submitting batches
  drbd: change list_for_each_safe to while(list_first_entry_or_null)
  drbd: introduce drbd_recv_header_maybe_unplug
  ...
2017-09-07 11:59:42 -07:00
Jan Kara
397162ffa2 mm: remove nr_pages argument from pagevec_lookup{,_range}()
All users of pagevec_lookup() and pagevec_lookup_range() now pass
PAGEVEC_SIZE as a desired number of pages.

Just drop the argument.

Link: http://lkml.kernel.org/r/20170726114704.7626-11-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:27 -07:00
Jan Kara
8338141f0f fs: use pagevec_lookup_range() in page_cache_seek_hole_data()
We want only pages from given range in page_cache_seek_hole_data().  Use
pagevec_lookup_range() instead of pagevec_lookup() and remove
unnecessary code.

Note that the check for getting less pages than desired can be removed
because index gets updated by pagevec_lookup_range().

Link: http://lkml.kernel.org/r/20170726114704.7626-9-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:27 -07:00
Jan Kara
c10f778ddf fs: fix performance regression in clean_bdev_aliases()
Commit e64855c6cf ("fs: Add helper to clean bdev aliases under a bh
and use it") added a wrapper for clean_bdev_aliases() that invalidates
bdev aliases underlying a single buffer head.

However this has caused a performance regression for bonnie++ benchmark
on ext4 filesystem when delayed allocation is turned off (ext3 mode) -
average of 3 runs:

  Hmean SeqOut Char  164787.55 (  0.00%) 107189.06 (-34.95%)
  Hmean SeqOut Block 219883.89 (  0.00%) 168870.32 (-23.20%)

The reason for this regression is that clean_bdev_aliases() is slower
when called for a single block because pagevec_lookup() it uses will end
up iterating through the radix tree until it finds a page (which may
take a while) but we are only interested whether there's a page at a
particular index.

Fix the problem by using pagevec_lookup_range() instead which avoids the
needless iteration.

Fixes: e64855c6cf ("fs: Add helper to clean bdev aliases under a bh and use it")
Link: http://lkml.kernel.org/r/20170726114704.7626-5-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:26 -07:00
Jan Kara
d72dc8a25a mm: make pagevec_lookup() update index
Make pagevec_lookup() (and underlying find_get_pages()) update index to
the next page where iteration should continue.  Most callers want this
and also pagevec_lookup_tag() already does this.

Link: http://lkml.kernel.org/r/20170726114704.7626-3-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:26 -07:00
Christoph Hellwig
74d46992e0 block: replace bi_bdev with a gendisk pointer and partitions index
This way we don't need a block_device structure to submit I/O.  The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open.  Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device.  But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-23 12:49:55 -06:00
Eric Biggers
241f01fbed fs/buffer.c: make bh_lru_install() more efficient
To install a buffer_head into the cpu's LRU queue, bh_lru_install()
would construct a new copy of the queue and then memcpy it over the real
queue.  But it's easily possible to do the update in-place, which is
faster and simpler.  Some work can also be skipped if the buffer_head
was already in the queue.

As a microbenchmark I timed how long it takes to run sb_getblk()
10,000,000 times alternating between BH_LRU_SIZE + 1 blocks.
Effectively, this benchmarks looking up buffer_heads that are in the
page cache but not in the LRU:

	Before this patch: 1.758s
	After this patch: 1.653s

This patch also removes about 350 bytes of compiled code (on x86_64),
partly due to removal of the memcpy() which was being inlined+unrolled.

Link: http://lkml.kernel.org/r/20161229193445.1913-1-ebiggers3@gmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:30 -07:00
Linus Torvalds
642338ba33 Merge tag 'xfs-4.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull XFS updates from Darrick Wong:
 "Here are some changes for you for 4.13. For the most part it's fixes
  for bugs and deadlock problems, and preparation for online fsck in
  some future merge window.

   - Avoid quotacheck deadlocks

   - Fix transaction overflows when bunmapping fragmented files

   - Refactor directory readahead

   - Allow admin to configure if ASSERT is fatal

   - Improve transaction usage detail logging during overflows

   - Minor cleanups

   - Don't leak log items when the log shuts down

   - Remove double-underscore typedefs

   - Various preparation for online scrubbing

   - Introduce new error injection configuration sysfs knobs

   - Refactor dq_get_next to use extent map directly

   - Fix problems with iterating the page cache for unwritten data

   - Implement SEEK_{HOLE,DATA} via iomap

   - Refactor XFS to use iomap SEEK_HOLE and SEEK_DATA

   - Don't use MAXPATHLEN to check on-disk symlink target lengths"

* tag 'xfs-4.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (48 commits)
  xfs: don't crash on unexpected holes in dir/attr btrees
  xfs: rename MAXPATHLEN to XFS_SYMLINK_MAXLEN
  xfs: fix contiguous dquot chunk iteration livelock
  xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA
  vfs: Add iomap_seek_hole and iomap_seek_data helpers
  vfs: Add page_cache_seek_hole_data helper
  xfs: remove a whitespace-only line from xfs_fs_get_nextdqblk
  xfs: rewrite xfs_dq_get_next_id using xfs_iext_lookup_extent
  xfs: Check for m_errortag initialization in xfs_errortag_test
  xfs: grab dquots without taking the ilock
  xfs: fix semicolon.cocci warnings
  xfs: Don't clear SGID when inheriting ACLs
  xfs: free cowblocks and retry on buffered write ENOSPC
  xfs: replace log_badcrc_factor knob with error injection tag
  xfs: convert drop_writes to use the errortag mechanism
  xfs: remove unneeded parameter from XFS_TEST_ERROR
  xfs: expose errortag knobs via sysfs
  xfs: make errortag a per-mountpoint structure
  xfs: free uncommitted transactions during log recovery
  xfs: don't allow bmap on rt files
  ...
2017-07-10 10:51:53 -07:00
Linus Torvalds
bc2c6421cb Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "The first major feature for ext4 this merge window is the largedir
  feature, which allows ext4 directories to support over 2 billion
  directory entries (assuming ~64 byte file names; in practice, users
  will run into practical performance limits first.) This feature was
  originally written by the Lustre team, and credit goes to Artem
  Blagodarenko from Seagate for getting this feature upstream.

  The second major major feature allows ext4 to support extended
  attribute values up to 64k. This feature was also originally from
  Lustre, and has been enhanced by Tahsin Erdogan from Google with a
  deduplication feature so that if multiple files have the same xattr
  value (for example, Windows ACL's stored by Samba), only one copy will
  be stored on disk for encoding and caching efficiency.

  We also have the usual set of bug fixes, cleanups, and optimizations"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (47 commits)
  ext4: fix spelling mistake: "prellocated" -> "preallocated"
  ext4: fix __ext4_new_inode() journal credits calculation
  ext4: skip ext4_init_security() and encryption on ea_inodes
  fs: generic_block_bmap(): initialize all of the fields in the temp bh
  ext4: change fast symlink test to not rely on i_blocks
  ext4: require key for truncate(2) of encrypted file
  ext4: don't bother checking for encryption key in ->mmap()
  ext4: check return value of kstrtoull correctly in reserved_clusters_store
  ext4: fix off-by-one fsmap error on 1k block filesystems
  ext4: return EFSBADCRC if a bad checksum error is found in ext4_find_entry()
  ext4: return EIO on read error in ext4_find_entry
  ext4: forbid encrypting root directory
  ext4: send parallel discards on commit completions
  ext4: avoid unnecessary stalls in ext4_evict_inode()
  ext4: add nombcache mount option
  ext4: strong binding of xattr inode references
  ext4: eliminate xattr entry e_hash recalculation for removes
  ext4: reserve space for xattr entries/names
  quota: add get_inode_usage callback to transfer multi-inode charges
  ext4: xattr inode deduplication
  ...
2017-07-09 09:31:22 -07:00
Linus Torvalds
088737f44b Merge tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull Writeback error handling updates from Jeff Layton:
 "This pile represents the bulk of the writeback error handling fixes
  that I have for this cycle. Some of the earlier patches in this pile
  may look trivial but they are prerequisites for later patches in the
  series.

  The aim of this set is to improve how we track and report writeback
  errors to userland. Most applications that care about data integrity
  will periodically call fsync/fdatasync/msync to ensure that their
  writes have made it to the backing store.

  For a very long time, we have tracked writeback errors using two flags
  in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a
  writeback error occurs (via mapping_set_error) and are cleared as a
  side-effect of filemap_check_errors (as you noted yesterday). This
  model really sucks for userland.

  Only the first task to call fsync (or msync or fdatasync) will see the
  error. Any subsequent task calling fsync on a file will get back 0
  (unless another writeback error occurs in the interim). If I have
  several tasks writing to a file and calling fsync to ensure that their
  writes got stored, then I need to have them coordinate with one
  another. That's difficult enough, but in a world of containerized
  setups that coordination may even not be possible.

  But wait...it gets worse!

  The calls to filemap_check_errors can be buried pretty far down in the
  call stack, and there are internal callers of filemap_write_and_wait
  and the like that also end up clearing those errors. Many of those
  callers ignore the error return from that function or return it to
  userland at nonsensical times (e.g. truncate() or stat()). If I get
  back -EIO on a truncate, there is no reason to think that it was
  because some previous writeback failed, and a subsequent fsync() will
  (incorrectly) return 0.

  This pile aims to do three things:

   1) ensure that when a writeback error occurs that that error will be
      reported to userland on a subsequent fsync/fdatasync/msync call,
      regardless of what internal callers are doing

   2) report writeback errors on all file descriptions that were open at
      the time that the error occurred. This is a user-visible change,
      but I think most applications are written to assume this behavior
      anyway. Those that aren't are unlikely to be hurt by it.

   3) document what filesystems should do when there is a writeback
      error. Today, there is very little consistency between them, and a
      lot of cargo-cult copying. We need to make it very clear what
      filesystems should do in this situation.

  To achieve this, the set adds a new data type (errseq_t) and then
  builds new writeback error tracking infrastructure around that. Once
  all of that is in place, we change the filesystems to use the new
  infrastructure for reporting wb errors to userland.

  Note that this is just the initial foray into cleaning up this mess.
  There is a lot of work remaining here:

   1) convert the rest of the filesystems in a similar fashion. Once the
      initial set is in, then I think most other fs' will be fairly
      simple to convert. Hopefully most of those can in via individual
      filesystem trees.

   2) convert internal waiters on writeback to use errseq_t for
      detecting errors instead of relying on the AS_* flags. I have some
      draft patches for this for ext4, but they are not quite ready for
      prime time yet.

  This was a discussion topic this year at LSF/MM too. If you're
  interested in the gory details, LWN has some good articles about this:

      https://lwn.net/Articles/718734/
      https://lwn.net/Articles/724307/"

* tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  btrfs: minimal conversion to errseq_t writeback error reporting on fsync
  xfs: minimal conversion to errseq_t writeback error reporting
  ext4: use errseq_t based error handling for reporting data writeback errors
  fs: convert __generic_file_fsync to use errseq_t based reporting
  block: convert to errseq_t based writeback error tracking
  dax: set errors in mapping when writeback fails
  Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
  mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
  fs: new infrastructure for writeback error handling and reporting
  lib: add errseq_t type and infrastructure for handling it
  mm: don't TestClearPageError in __filemap_fdatawait_range
  mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
  jbd2: don't clear and reset errors after waiting on writeback
  buffer: set errors in mapping at the time that the error occurs
  fs: check for writeback errors after syncing out buffers in generic_file_fsync
  buffer: use mapping_set_error instead of setting the flag
  mm: fix mapping_set_error call in me_pagecache_dirty
2017-07-07 19:38:17 -07:00
Jeff Layton
87354e5de0 buffer: set errors in mapping at the time that the error occurs
I noticed on xfs that I could still sometimes get back an error on fsync
on a fd that was opened after the error condition had been cleared.

The problem is that the buffer code sets the write_io_error flag and
then later checks that flag to set the error in the mapping. That flag
perisists for quite a while however. If the file is later opened with
O_TRUNC, the buffers will then be invalidated and the mapping's error
set such that a subsequent fsync will return error. I think this is
incorrect, as there was no writeback between the open and fsync.

Add a new mark_buffer_write_io_error operation that sets the flag and
the error in the mapping at the same time. Replace all calls to
set_buffer_write_io_error with mark_buffer_write_io_error, and remove
the places that check this flag in order to set the error in the
mapping.

This sets the error in the mapping earlier, at the time that it's first
detected.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2017-07-06 07:02:21 -04:00
Jeff Layton
d945b59db8 buffer: use mapping_set_error instead of setting the flag
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-07-06 07:02:20 -04:00
Alexander Potapenko
2a527d6858 fs: generic_block_bmap(): initialize all of the fields in the temp bh
KMSAN (KernelMemorySanitizer, a new error detection tool) reports the
use of uninitialized memory in ext4_update_bh_state():

==================================================================
BUG: KMSAN: use of unitialized memory
CPU: 3 PID: 1 Comm: swapper/0 Tainted: G    B           4.8.0-rc6+ #597
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
 0000000000000282 ffff88003cc96f68 ffffffff81f30856 0000003000000008
 ffff88003cc96f78 0000000000000096 ffffffff8169742a ffff88003cc96ff8
 ffffffff812fc1fc 0000000000000008 ffff88003a1980e8 0000000100000000
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff81f30856>] dump_stack+0xa6/0xc0 lib/dump_stack.c:51
 [<ffffffff812fc1fc>] kmsan_report+0x1ec/0x300 mm/kmsan/kmsan.c:?
 [<ffffffff812fc33b>] __msan_warning+0x2b/0x40 ??:?
 [<     inline     >] ext4_update_bh_state fs/ext4/inode.c:727
 [<ffffffff8169742a>] _ext4_get_block+0x6ca/0x8a0 fs/ext4/inode.c:759
 [<ffffffff81696d4c>] ext4_get_block+0x8c/0xa0 fs/ext4/inode.c:769
 [<ffffffff814a2d36>] generic_block_bmap+0x246/0x2b0 fs/buffer.c:2991
 [<ffffffff816ca30e>] ext4_bmap+0x5ee/0x660 fs/ext4/inode.c:3177
...
origin description: ----tmp@generic_block_bmap
==================================================================

(the line numbers are relative to 4.8-rc6, but the bug persists
upstream)

The local |tmp| is created in generic_block_bmap() and then passed into
ext4_bmap() => ext4_get_block() => _ext4_get_block() =>
ext4_update_bh_state(). Along the way tmp.b_page is never initialized
before ext4_update_bh_state() checks its value.

[ Use the approach suggested by Kees Cook of initializing the whole bh
  structure.]

Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-07-05 00:56:21 -04:00
Andreas Gruenbacher
334fd34d76 vfs: Add page_cache_seek_hole_data helper
Both ext4 and xfs implement seeking for the next hole or piece of data
in unwritten extents by scanning the page cache, and both versions share
the same bug when iterating the buffers of a page: the start offset into
the page isn't taken into account, so when a page fits more than two
filesystem blocks, things will go wrong.  For example, on a filesystem
with a block size of 1k, the following command will fail:

  xfs_io -f -c "falloc 0 4k" \
            -c "pwrite 1k 1k" \
            -c "pwrite 3k 1k" \
            -c "seek -a -r 0" foo

In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
SEEK_DATA) will return the correct result.

Introduce a generic vfs helper for seeking in the page cache that gets
this right.  The next commits will replace the filesystem specific
implementations.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
[hch: dropped the export]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-07-02 22:46:13 -07:00
Jens Axboe
8e8f929881 fs: add support for buffered writeback to pass down write hints
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-27 12:05:39 -06:00
Christoph Hellwig
4e4cbee93d block: switch bios to blk_status_t
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09 09:27:32 -06:00