lineage-22.2
421 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
38d95b42c4 |
Merge 4.19.304 into android-4.19-stable
Changes in 4.19.304 arm64: dts: mediatek: mt8173-evb: Fix regulator-fixed node names ALSA: hda/realtek: Add quirk for Lenovo TianYi510Pro-14IOB ALSA: hda/realtek: Enable headset onLenovo M70/M90 ALSA: hda/realtek: Enable headset on Lenovo M90 Gen5 ksmbd: fix wrong name of SMB2_CREATE_ALLOCATION_SIZE ARM: OMAP2+: Fix null pointer dereference and memory leak in omap_soc_device_init reset: Fix crash when freeing non-existent optional resets s390/vx: fix save/restore of fpu kernel context wifi: mac80211: mesh_plink: fix matches_local logic net/mlx5: improve some comments net/mlx5: Fix fw tracer first block check net: sched: ife: fix potential use-after-free ethernet: atheros: fix a memleak in atl1e_setup_ring_resources net/rose: fix races in rose_kill_by_device() net: check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev() afs: Fix the dynamic root's d_delete to always delete unused dentries net: warn if gso_type isn't set for a GSO SKB net: check dev->gso_max_size in gso_features_check() pinctrl: at91-pio4: use dedicated lock class for IRQ smb: client: fix NULL deref in asn1_ber_decoder() btrfs: do not allow non subvolume root targets for snapshot iio: imu: inv_mpu6050: fix an error code problem in inv_mpu6050_read_raw Input: ipaq-micro-keys - add error handling for devm_kmemdup scsi: bnx2fc: Remove set but not used variable 'oxid' scsi: bnx2fc: Fix skb double free in bnx2fc_rcv() iio: common: ms_sensors: ms_sensors_i2c: fix humidity conversion time table wifi: cfg80211: Add my certificate wifi: cfg80211: fix certs build to not depend on file order USB: serial: ftdi_sio: update Actisense PIDs constant names USB: serial: option: add Quectel EG912Y module support USB: serial: option: add Foxconn T99W265 with new baseline USB: serial: option: add Quectel RM500Q R13 firmware support Bluetooth: hci_event: Fix not checking if HCI_OP_INQUIRY has been sent net: 9p: avoid freeing uninit memory in p9pdu_vreadf net: rfkill: gpio: set GPIO direction x86/alternatives: Sync core before enabling interrupts usb: musb: fix MUSB_QUIRK_B_DISCONNECT_99 handling usb: fotg210-hcd: delete an incorrect bounds test smb: client: fix OOB in smbCalcSize() dm-integrity: don't modify bio's immutable bio_vec in integrity_metadata() block: Don't invalidate pagecache for invalid falloc modes Linux 4.19.304 Change-Id: I924e0479cdd444b14c25d83a165ca082fa2c9f80 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
4d26c2228d |
block: Don't invalidate pagecache for invalid falloc modes
commit 1364a3c391aedfeb32aa025303ead3d7c91cdf9d upstream.
Only call truncate_bdev_range() if the fallocate mode is supported. This
fixes a bug where data in the pagecache could be invalidated if the
fallocate() was called on the block device with an invalid mode.
Fixes:
|
||
|
|
011b73c995 |
Merge 4.19.191 into android-4.19-stable
Changes in 4.19.191 s390/disassembler: increase ebpf disasm buffer size ACPI: custom_method: fix potential use-after-free issue ACPI: custom_method: fix a possible memory leak ftrace: Handle commands when closing set_ftrace_filter file ARM: 9056/1: decompressor: fix BSS size calculation for LLVM ld.lld arm64: dts: marvell: armada-37xx: add syscon compatible to NB clk node arm64: dts: mt8173: fix property typo of 'phys' in dsi node ecryptfs: fix kernel panic with null dev_name mtd: spinand: core: add missing MODULE_DEVICE_TABLE() mtd: rawnand: atmel: Update ecc_stats.corrected counter spi: spi-ti-qspi: Free DMA resources scsi: qla2xxx: Fix crash in qla2xxx_mqueuecommand() mmc: sdhci-pci: Fix initialization of some SD cards for Intel BYT-based controllers mmc: block: Update ext_csd.cache_ctrl if it was written mmc: block: Issue a cache flush only when it's enabled mmc: core: Do a power cycle when the CMD11 fails mmc: core: Set read only for SD cards with permanent write protect bit erofs: add unsupported inode i_format check cifs: Return correct error code from smb2_get_enc_key btrfs: fix metadata extent leak after failure to create subvolume intel_th: pci: Add Rocket Lake CPU support fbdev: zero-fill colormap in fbcmap.c staging: wimax/i2400m: fix byte-order issue crypto: api - check for ERR pointers in crypto_destroy_tfm() usb: gadget: uvc: add bInterval checking for HS mode genirq/matrix: Prevent allocation counter corruption usb: gadget: f_uac1: validate input parameters usb: dwc3: gadget: Ignore EP queue requests during bus reset usb: xhci: Fix port minor revision PCI: PM: Do not read power state in pci_enable_device_flags() x86/build: Propagate $(CLANG_FLAGS) to $(REALMODE_FLAGS) tee: optee: do not check memref size on return from Secure World perf/arm_pmu_platform: Fix error handling usb: xhci-mtk: support quirk to disable usb2 lpm xhci: check control context is valid before dereferencing it. xhci: fix potential array out of bounds with several interrupters spi: dln2: Fix reference leak to master spi: omap-100k: Fix reference leak to master intel_th: Consistency and off-by-one fix phy: phy-twl4030-usb: Fix possible use-after-free in twl4030_usb_remove() btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s scsi: lpfc: Fix incorrect dbde assignment when building target abts wqe scsi: lpfc: Fix pt2pt connection does not recover after LOGO scsi: target: pscsi: Fix warning in pscsi_complete_cmd() media: ite-cir: check for receive overflow media: drivers: media: pci: sta2x11: fix Kconfig dependency on GPIOLIB power: supply: bq27xxx: fix power_avg for newer ICs extcon: arizona: Fix some issues when HPDET IRQ fires after the jack has been unplugged media: media/saa7164: fix saa7164_encoder_register() memory leak bugs media: gspca/sq905.c: fix uninitialized variable power: supply: Use IRQF_ONESHOT drm/amdgpu : Fix asic reset regression issue introduce by 8f211fe8ac7c4f scsi: qla2xxx: Always check the return value of qla24xx_get_isp_stats() scsi: qla2xxx: Fix use after free in bsg scsi: scsi_dh_alua: Remove check for ASC 24h in alua_rtpg() media: em28xx: fix memory leak media: vivid: update EDID clk: socfpga: arria10: Fix memory leak of socfpga_clk on error return power: supply: generic-adc-battery: fix possible use-after-free in gab_remove() power: supply: s3c_adc_battery: fix possible use-after-free in s3c_adc_bat_remove() media: tc358743: fix possible use-after-free in tc358743_remove() media: adv7604: fix possible use-after-free in adv76xx_remove() media: i2c: adv7511-v4l2: fix possible use-after-free in adv7511_remove() media: i2c: adv7842: fix possible use-after-free in adv7842_remove() media: dvb-usb: fix memory leak in dvb_usb_adapter_init media: gscpa/stv06xx: fix memory leak drm/msm/mdp5: Configure PP_SYNC_HEIGHT to double the vtotal amdgpu: avoid incorrect %hu format string drm/amdgpu: fix NULL pointer dereference scsi: lpfc: Fix crash when a REG_RPI mailbox fails triggering a LOGO response scsi: lpfc: Remove unsupported mbox PORT_CAPABILITIES logic scsi: libfc: Fix a format specifier s390/archrandom: add parameter check for s390_arch_random_generate ALSA: emu8000: Fix a use after free in snd_emu8000_create_mixer ALSA: hda/conexant: Re-order CX5066 quirk table entries ALSA: sb: Fix two use after free in snd_sb_qsound_build ALSA: usb-audio: Explicitly set up the clock selector ALSA: usb-audio: More constifications ALSA: usb-audio: Add dB range mapping for Sennheiser Communications Headset PC 8 ALSA: hda/realtek: Add quirk for Intel Clevo PCx0Dx btrfs: fix race when picking most recent mod log operation for an old root arm64/vdso: Discard .note.gnu.property sections in vDSO ubifs: Only check replay with inode type to judge if inode linked f2fs: fix to avoid out-of-bounds memory access mlxsw: spectrum_mr: Update egress RIF list before route's action openvswitch: fix stack OOB read while fragmenting IPv4 packets ACPI: GTDT: Don't corrupt interrupt mappings on watchdow probe failure NFS: Don't discard pNFS layout segments that are marked for return NFSv4: Don't discard segments marked for return in _pnfs_return_layout() jffs2: Fix kasan slab-out-of-bounds problem powerpc/eeh: Fix EEH handling for hugepages in ioremap space. powerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.h intel_th: pci: Add Alder Lake-M support tpm: vtpm_proxy: Avoid reading host log when using a virtual device md/raid1: properly indicate failure when ending a failed write request dm raid: fix inconclusive reshape layout on fast raid4/5/6 table reload sequences security: commoncap: fix -Wstringop-overread warning Fix misc new gcc warnings jffs2: check the validity of dstlen in jffs2_zlib_compress() Revert |
||
|
|
b9bcbc3c74 |
block: reexpand iov_iter after read/write
[ Upstream commit cf7b39a0cbf6bf57aa07a008d46cf695add05b4c ] We get a bug: BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 Read of size 8 at addr ffff0000d3fb11f8 by task CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted 5.10.0-00843-g352c8610ccd2 #2 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132 show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x110/0x164 lib/dump_stack.c:118 print_address_description+0x78/0x5c8 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report+0x148/0x1e4 mm/kasan/report.c:562 check_memory_region_inline mm/kasan/generic.c:183 [inline] __asan_load8+0xb4/0xbc mm/kasan/generic.c:252 iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 io_read fs/io_uring.c:3421 [inline] io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 io_submit_sqe fs/io_uring.c:6395 [inline] io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Allocated by task 12570: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475 __kmalloc+0x23c/0x334 mm/slub.c:3970 kmalloc include/linux/slab.h:557 [inline] __io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210 io_setup_async_rw fs/io_uring.c:3229 [inline] io_read fs/io_uring.c:3436 [inline] io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 io_submit_sqe fs/io_uring.c:6395 [inline] io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Freed by task 12570: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track+0x38/0x6c mm/kasan/common.c:56 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355 __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422 kasan_slab_free+0x10/0x1c mm/kasan/common.c:431 slab_free_hook mm/slub.c:1544 [inline] slab_free_freelist_hook mm/slub.c:1577 [inline] slab_free mm/slub.c:3142 [inline] kfree+0x104/0x38c mm/slub.c:4124 io_dismantle_req fs/io_uring.c:1855 [inline] __io_free_req+0x70/0x254 fs/io_uring.c:1867 io_put_req_find_next fs/io_uring.c:2173 [inline] __io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279 __io_req_task_submit+0x154/0x21c fs/io_uring.c:2051 io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063 task_work_run+0xdc/0x128 kernel/task_work.c:151 get_signal+0x6f8/0x980 kernel/signal.c:2562 do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658 do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722 work_pending+0xc/0x180 blkdev_read_iter can truncate iov_iter's count since the count + pos may exceed the size of the blkdev. This will confuse io_read that we have consume the iovec. And once we do the iov_iter_revert in io_read, we will trigger the slab-out-of-bounds. Fix it by reexpand the count with size has been truncated. blkdev_write_iter can trigger the problem too. Signed-off-by: yangerkun <yangerkun@huawei.com> Acked-by: Pavel Begunkov <asml.silencec@gmail.com> Link: https://lore.kernel.org/r/20210401071807.3328235-1-yangerkun@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
9ce79d9bed |
Merge 4.19.149 into android-4.19-stable
Changes in 4.19.149 selinux: allow labeling before policy is loaded media: mc-device.c: fix memleak in media_device_register_entity dma-fence: Serialise signal enabling (dma_fence_enable_sw_signaling) ath10k: fix array out-of-bounds access ath10k: fix memory leak for tpc_stats_final mm: fix double page fault on arm64 if PTE_AF is cleared scsi: aacraid: fix illegal IO beyond last LBA m68k: q40: Fix info-leak in rtc_ioctl gma/gma500: fix a memory disclosure bug due to uninitialized bytes ASoC: kirkwood: fix IRQ error handling media: smiapp: Fix error handling at NVM reading arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback x86/ioapic: Unbreak check_timer() ALSA: usb-audio: Add delay quirk for H570e USB headsets ALSA: hda/realtek - Couldn't detect Mic if booting with headset plugged ALSA: hda/realtek: Enable front panel headset LED on Lenovo ThinkStation P520 lib/string.c: implement stpcpy leds: mlxreg: Fix possible buffer overflow PM / devfreq: tegra30: Fix integer overflow on CPU's freq max out scsi: fnic: fix use after free scsi: lpfc: Fix kernel crash at lpfc_nvme_info_show during remote port bounce net: silence data-races on sk_backlog.tail clk/ti/adpll: allocate room for terminating null drm/amdgpu/powerplay: fix AVFS handling with custom powerplay table mtd: cfi_cmdset_0002: don't free cfi->cfiq in error path of cfi_amdstd_setup() mfd: mfd-core: Protect against NULL call-back function pointer drm/amdgpu/powerplay/smu7: fix AVFS handling with custom powerplay table tpm_crb: fix fTPM on AMD Zen+ CPUs tracing: Adding NULL checks for trace_array descriptor pointer bcache: fix a lost wake-up problem caused by mca_cannibalize_lock dmaengine: mediatek: hsdma_probe: fixed a memory leak when devm_request_irq fails RDMA/qedr: Fix potential use after free RDMA/i40iw: Fix potential use after free fix dget_parent() fastpath race xfs: fix attr leaf header freemap.size underflow RDMA/iw_cgxb4: Fix an error handling path in 'c4iw_connect()' ubi: Fix producing anchor PEBs mmc: core: Fix size overflow for mmc partitions gfs2: clean up iopen glock mess in gfs2_create_inode scsi: pm80xx: Cleanup command when a reset times out debugfs: Fix !DEBUG_FS debugfs_create_automount CIFS: Properly process SMB3 lease breaks ASoC: max98090: remove msleep in PLL unlocked workaround kernel/sys.c: avoid copying possible padding bytes in copy_to_user KVM: arm/arm64: vgic: Fix potential double free dist->spis in __kvm_vgic_destroy() xfs: fix log reservation overflows when allocating large rt extents neigh_stat_seq_next() should increase position index rt_cpu_seq_next should increase position index ipv6_route_seq_next should increase position index seqlock: Require WRITE_ONCE surrounding raw_seqcount_barrier media: ti-vpe: cal: Restrict DMA to avoid memory corruption sctp: move trace_sctp_probe_path into sctp_outq_sack ACPI: EC: Reference count query handlers under lock scsi: ufs: Make ufshcd_add_command_trace() easier to read scsi: ufs: Fix a race condition in the tracing code dmaengine: zynqmp_dma: fix burst length configuration s390/cpum_sf: Use kzalloc and minor changes powerpc/eeh: Only dump stack once if an MMIO loop is detected Bluetooth: btrtl: Use kvmalloc for FW allocations tracing: Set kernel_stack's caller size properly ARM: 8948/1: Prevent OOB access in stacktrace ar5523: Add USB ID of SMCWUSBT-G2 wireless adapter ceph: ensure we have a new cap before continuing in fill_inode selftests/ftrace: fix glob selftest tools/power/x86/intel_pstate_tracer: changes for python 3 compatibility Bluetooth: Fix refcount use-after-free issue mm/swapfile.c: swap_next should increase position index mm: pagewalk: fix termination condition in walk_pte_range() Bluetooth: prefetch channel before killing sock KVM: fix overflow of zero page refcount with ksm running ALSA: hda: Clear RIRB status before reading WP skbuff: fix a data race in skb_queue_len() audit: CONFIG_CHANGE don't log internal bookkeeping as an event selinux: sel_avc_get_stat_idx should increase position index scsi: lpfc: Fix RQ buffer leakage when no IOCBs available scsi: lpfc: Fix coverity errors in fmdi attribute handling drm/omap: fix possible object reference leak clk: stratix10: use do_div() for 64-bit calculation crypto: chelsio - This fixes the kernel panic which occurs during a libkcapi test mt76: clear skb pointers from rx aggregation reorder buffer during cleanup ALSA: usb-audio: Don't create a mixer element with bogus volume range perf test: Fix test trace+probe_vfs_getname.sh on s390 RDMA/rxe: Fix configuration of atomic queue pair attributes KVM: x86: fix incorrect comparison in trace event dmaengine: stm32-mdma: use vchan_terminate_vdesc() in .terminate_all media: staging/imx: Missing assignment in imx_media_capture_device_register() x86/pkeys: Add check for pkey "overflow" bpf: Remove recursion prevention from rcu free callback dmaengine: stm32-dma: use vchan_terminate_vdesc() in .terminate_all dmaengine: tegra-apb: Prevent race conditions on channel's freeing drm/amd/display: dal_ddc_i2c_payloads_create can fail causing panic firmware: arm_sdei: Use cpus_read_lock() to avoid races with cpuhp random: fix data races at timer_rand_state bus: hisi_lpc: Fixup IO ports addresses to avoid use-after-free in host removal media: go7007: Fix URB type for interrupt handling Bluetooth: guard against controllers sending zero'd events timekeeping: Prevent 32bit truncation in scale64_check_overflow() ext4: fix a data race at inode->i_disksize perf jevents: Fix leak of mapfile memory mm: avoid data corruption on CoW fault into PFN-mapped VMA drm/amdgpu: increase atombios cmd timeout drm/amd/display: Stop if retimer is not available ath10k: use kzalloc to read for ath10k_sdio_hif_diag_read scsi: aacraid: Disabling TM path and only processing IOP reset Bluetooth: L2CAP: handle l2cap config request during open state media: tda10071: fix unsigned sign extension overflow xfs: don't ever return a stale pointer from __xfs_dir3_free_read xfs: mark dir corrupt when lookup-by-hash fails ext4: mark block bitmap corrupted when found instead of BUGON tpm: ibmvtpm: Wait for buffer to be set before proceeding rtc: sa1100: fix possible race condition rtc: ds1374: fix possible race condition nfsd: Don't add locks to closed or closing open stateids RDMA/cm: Remove a race freeing timewait_info KVM: PPC: Book3S HV: Treat TM-related invalid form instructions on P9 like the valid ones drm/msm: fix leaks if initialization fails drm/msm/a5xx: Always set an OPP supported hardware value tracing: Use address-of operator on section symbols thermal: rcar_thermal: Handle probe error gracefully perf parse-events: Fix 3 use after frees found with clang ASAN serial: 8250_port: Don't service RX FIFO if throttled serial: 8250_omap: Fix sleeping function called from invalid context during probe serial: 8250: 8250_omap: Terminate DMA before pushing data on RX timeout perf cpumap: Fix snprintf overflow check cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_work_fn tools: gpio-hammer: Avoid potential overflow in main nvme-multipath: do not reset on unknown status nvme: Fix controller creation races with teardown flow RDMA/rxe: Set sys_image_guid to be aligned with HW IB devices scsi: hpsa: correct race condition in offload enabled SUNRPC: Fix a potential buffer overflow in 'svc_print_xprts()' svcrdma: Fix leak of transport addresses PCI: Use ioremap(), not phys_to_virt() for platform ROM ubifs: Fix out-of-bounds memory access caused by abnormal value of node_len ALSA: usb-audio: Fix case when USB MIDI interface has more than one extra endpoint descriptor PCI: pciehp: Fix MSI interrupt race NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests() mm/kmemleak.c: use address-of operator on section symbols mm/filemap.c: clear page error before actual read mm/vmscan.c: fix data races using kswapd_classzone_idx nvmet-rdma: fix double free of rdma queue mm/mmap.c: initialize align_offset explicitly for vm_unmapped_area scsi: qedi: Fix termination timeouts in session logout serial: uartps: Wait for tx_empty in console setup KVM: Remove CREATE_IRQCHIP/SET_PIT2 race bdev: Reduce time holding bd_mutex in sync in blkdev_close() drivers: char: tlclk.c: Avoid data race between init and interrupt handler KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi() net: openvswitch: use u64 for meter bucket scsi: aacraid: Fix error handling paths in aac_probe_one() staging:r8188eu: avoid skb_clone for amsdu to msdu conversion sparc64: vcc: Fix error return code in vcc_probe() arm64: cpufeature: Relax checks for AArch32 support at EL[0-2] dt-bindings: sound: wm8994: Correct required supplies based on actual implementaion atm: fix a memory leak of vcc->user_back perf mem2node: Avoid double free related to realloc power: supply: max17040: Correct voltage reading phy: samsung: s5pv210-usb2: Add delay after reset Bluetooth: Handle Inquiry Cancel error after Inquiry Complete USB: EHCI: ehci-mv: fix error handling in mv_ehci_probe() tipc: fix memory leak in service subscripting tty: serial: samsung: Correct clock selection logic ALSA: hda: Fix potential race in unsol event handler powerpc/traps: Make unrecoverable NMIs die instead of panic fuse: don't check refcount after stealing page USB: EHCI: ehci-mv: fix less than zero comparison of an unsigned int scsi: cxlflash: Fix error return code in cxlflash_probe() arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register e1000: Do not perform reset in reset_task if we are already down drm/nouveau/debugfs: fix runtime pm imbalance on error drm/nouveau: fix runtime pm imbalance on error drm/nouveau/dispnv50: fix runtime pm imbalance on error printk: handle blank console arguments passed in. usb: dwc3: Increase timeout for CmdAct cleared by device controller btrfs: don't force read-only after error in drop snapshot vfio/pci: fix memory leaks of eventfd ctx perf evsel: Fix 2 memory leaks perf trace: Fix the selection for architectures to generate the errno name tables perf stat: Fix duration_time value for higher intervals perf util: Fix memory leak of prefix_if_not_in perf metricgroup: Free metric_events on error perf kcore_copy: Fix module map when there are no modules loaded ASoC: img-i2s-out: Fix runtime PM imbalance on error wlcore: fix runtime pm imbalance in wl1271_tx_work wlcore: fix runtime pm imbalance in wlcore_regdomain_config mtd: rawnand: omap_elm: Fix runtime PM imbalance on error PCI: tegra: Fix runtime PM imbalance on error ceph: fix potential race in ceph_check_caps mm/swap_state: fix a data race in swapin_nr_pages rapidio: avoid data race between file operation callbacks and mport_cdev_add(). mtd: parser: cmdline: Support MTD names containing one or more colons x86/speculation/mds: Mark mds_user_clear_cpu_buffers() __always_inline vfio/pci: Clear error and request eventfd ctx after releasing cifs: Fix double add page to memcg when cifs_readpages nvme: fix possible deadlock when I/O is blocked scsi: libfc: Handling of extra kref scsi: libfc: Skip additional kref updating work event selftests/x86/syscall_nt: Clear weird flags after each test vfio/pci: fix racy on error and request eventfd ctx btrfs: qgroup: fix data leak caused by race between writeback and truncate ubi: fastmap: Free unused fastmap anchor peb during detach perf parse-events: Use strcmp() to compare the PMU name net: openvswitch: use div_u64() for 64-by-32 divisions nvme: explicitly update mpath disk capacity on revalidation ASoC: wm8994: Skip setting of the WM8994_MICBIAS register for WM1811 ASoC: wm8994: Ensure the device is resumed in wm89xx_mic_detect functions ASoC: Intel: bytcr_rt5640: Add quirk for MPMAN Converter9 2-in-1 RISC-V: Take text_mutex in ftrace_init_nop() s390/init: add missing __init annotations lockdep: fix order in trace_hardirqs_off_caller() drm/amdkfd: fix a memory leak issue i2c: core: Call i2c_acpi_install_space_handler() before i2c_acpi_register_devices() objtool: Fix noreturn detection for ignored functions ieee802154: fix one possible memleak in ca8210_dev_com_init ieee802154/adf7242: check status of adf7242_read_reg clocksource/drivers/h8300_timer8: Fix wrong return value in h8300_8timer_init() mwifiex: Increase AES key storage size to 256 bits batman-adv: bla: fix type misuse for backbone_gw hash indexing atm: eni: fix the missed pci_disable_device() for eni_init_one() batman-adv: mcast/TT: fix wrongly dropped or rerouted packets mac802154: tx: fix use-after-free bpf: Fix clobbering of r2 in bpf_gen_ld_abs drm/vc4/vc4_hdmi: fill ASoC card owner net: qed: RDMA personality shouldn't fail VF load drm/sun4i: sun8i-csc: Secondary CSC register correction batman-adv: Add missing include for in_interrupt() batman-adv: mcast: fix duplicate mcast packets in BLA backbone from mesh batman-adv: mcast: fix duplicate mcast packets from BLA backbone to mesh bpf: Fix a rcu warning for bpffs map pretty-print ALSA: asihpi: fix iounmap in error handler regmap: fix page selection for noinc reads MIPS: Add the missing 'CPU_1074K' into __get_cpu_type() KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE KVM: SVM: Add a dedicated INVD intercept routine tracing: fix double free s390/dasd: Fix zero write for FBA devices kprobes: Fix to check probe enabled before disarm_kprobe_ftrace() mm, THP, swap: fix allocating cluster for swapfile by mistake s390/zcrypt: Fix ZCRYPT_PERDEV_REQCNT ioctl kprobes: Fix compiler warning for !CONFIG_KPROBES_ON_FTRACE ata: define AC_ERR_OK ata: make qc_prep return ata_completion_errors ata: sata_mv, avoid trigerrable BUG_ON KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch Linux 4.19.149 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Idfc1b35ec63b4b464aeb6e32709102bee0efc872 |
||
|
|
b6256c2966 |
bdev: Reduce time holding bd_mutex in sync in blkdev_close()
[ Upstream commit b849dd84b6ccfe32622988b79b7b073861fcf9f7 ] While trying to "dd" to the block device for a USB stick, I encountered a hung task warning (blocked for > 120 seconds). I managed to come up with an easy way to reproduce this on my system (where /dev/sdb is the block device for my USB stick) with: while true; do dd if=/dev/zero of=/dev/sdb bs=4M; done With my reproduction here are the relevant bits from the hung task detector: INFO: task udevd:294 blocked for more than 122 seconds. ... udevd D 0 294 1 0x00400008 Call trace: ... mutex_lock_nested+0x40/0x50 __blkdev_get+0x7c/0x3d4 blkdev_get+0x118/0x138 blkdev_open+0x94/0xa8 do_dentry_open+0x268/0x3a0 vfs_open+0x34/0x40 path_openat+0x39c/0xdf4 do_filp_open+0x90/0x10c do_sys_open+0x150/0x3c8 ... ... Showing all locks held in the system: ... 1 lock held by dd/2798: #0: ffffff814ac1a3b8 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0x50/0x204 ... dd D 0 2798 2764 0x00400208 Call trace: ... schedule+0x8c/0xbc io_schedule+0x1c/0x40 wait_on_page_bit_common+0x238/0x338 __lock_page+0x5c/0x68 write_cache_pages+0x194/0x500 generic_writepages+0x64/0xa4 blkdev_writepages+0x24/0x30 do_writepages+0x48/0xa8 __filemap_fdatawrite_range+0xac/0xd8 filemap_write_and_wait+0x30/0x84 __blkdev_put+0x88/0x204 blkdev_put+0xc4/0xe4 blkdev_close+0x28/0x38 __fput+0xe0/0x238 ____fput+0x1c/0x28 task_work_run+0xb0/0xe4 do_notify_resume+0xfc0/0x14bc work_pending+0x8/0x14 The problem appears related to the fact that my USB disk is terribly slow and that I have a lot of RAM in my system to cache things. Specifically my writes seem to be happening at ~15 MB/s and I've got ~4 GB of RAM in my system that can be used for buffering. To write 4 GB of buffer to disk thus takes ~4000 MB / ~15 MB/s = ~267 seconds. The 267 second number is a problem because in __blkdev_put() we call sync_blockdev() while holding the bd_mutex. Any other callers who want the bd_mutex will be blocked for the whole time. The problem is made worse because I believe blkdev_put() specifically tells other tasks (namely udev) to go try to access the device at right around the same time we're going to hold the mutex for a long time. Putting some traces around this (after disabling the hung task detector), I could confirm: dd: 437.608600: __blkdev_put() right before sync_blockdev() for sdb udevd: 437.623901: blkdev_open() right before blkdev_get() for sdb dd: 661.468451: __blkdev_put() right after sync_blockdev() for sdb udevd: 663.820426: blkdev_open() right after blkdev_get() for sdb A simple fix for this is to realize that sync_blockdev() works fine if you're not holding the mutex. Also, it's not the end of the world if you sync a little early (though it can have performance impacts). Thus we can make a guess that we're going to need to do the sync and then do it without holding the mutex. We still do one last sync with the mutex but it should be much, much faster. With this, my hung task warnings for my test case are gone. Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
b96549a28b |
Merge 4.19.130 into android-4.19-stable
Changes in 4.19.130 power: supply: bq24257_charger: Replace depends on REGMAP_I2C with select clk: sunxi: Fix incorrect usage of round_down() ASoC: tegra: tegra_wm8903: Support nvidia, headset property i2c: piix4: Detect secondary SMBus controller on AMD AM4 chipsets iio: pressure: bmp280: Tolerate IRQ before registering remoteproc: Fix IDR initialisation in rproc_alloc() clk: qcom: msm8916: Fix the address location of pll->config_reg backlight: lp855x: Ensure regulators are disabled on probe failure ASoC: davinci-mcasp: Fix dma_chan refcnt leak when getting dma type ARM: integrator: Add some Kconfig selections scsi: qedi: Check for buffer overflow in qedi_set_path() ALSA: hda/realtek - Introduce polarity for micmute LED GPIO ALSA: isa/wavefront: prevent out of bounds write in ioctl PCI: Allow pci_resize_resource() for devices on root bus scsi: qla2xxx: Fix issue with adapter's stopping state iio: bmp280: fix compensation of humidity f2fs: report delalloc reserve as non-free in statfs for project quota i2c: pxa: clear all master action bits in i2c_pxa_stop_message() clk: samsung: Mark top ISP and CAM clocks on Exynos542x as critical usblp: poison URBs upon disconnect serial: 8250: Fix max baud limit in generic 8250 port dm mpath: switch paths in dm_blk_ioctl() code path PCI: aardvark: Don't blindly enable ASPM L0s and don't write to read-only register ps3disk: use the default segment boundary vfio/pci: fix memory leaks in alloc_perm_bits() RDMA/mlx5: Add init2init as a modify command m68k/PCI: Fix a memory leak in an error handling path gpio: dwapb: Call acpi_gpiochip_free_interrupts() on GPIO chip de-registration mfd: wm8994: Fix driver operation if loaded as modules scsi: lpfc: Fix lpfc_nodelist leak when processing unsolicited event clk: clk-flexgen: fix clock-critical handling powerpc/perf/hv-24x7: Fix inconsistent output values incase multiple hv-24x7 events run nfsd: Fix svc_xprt refcnt leak when setup callback client failed PCI: vmd: Filter resource type bits from shadow register powerpc/crashkernel: Take "mem=" option into account pwm: img: Call pm_runtime_put() in pm_runtime_get_sync() failed case yam: fix possible memory leak in yam_init_driver NTB: ntb_pingpong: Choose doorbells based on port number NTB: Fix the default port and peer numbers for legacy drivers mksysmap: Fix the mismatch of '.L' symbols in System.map apparmor: fix introspection of of task mode for unconfined tasks apparmor: check/put label on apparmor_sk_clone_security() ASoC: meson: add missing free_irq() in error path scsi: sr: Fix sr_probe() missing deallocate of device minor scsi: ibmvscsi: Don't send host info in adapter info MAD after LPM apparmor: fix nnp subset test for unconfined x86/purgatory: Disable various profiling and sanitizing options staging: greybus: fix a missing-check bug in gb_lights_light_config() arm64: dts: mt8173: fix unit name warnings scsi: qedi: Do not flush offload work if ARP not resolved ARM: dts: sun8i-h2-plus-bananapi-m2-zero: Fix led polarity gpio: dwapb: Append MODULE_ALIAS for platform driver scsi: qedf: Fix crash when MFW calls for protocol stats while function is still probing pinctrl: rza1: Fix wrong array assignment of rza1l_swio_entries firmware: qcom_scm: fix bogous abuse of dma-direct internals staging: gasket: Fix mapping refcnt leak when put attribute fails staging: gasket: Fix mapping refcnt leak when register/store fails ALSA: usb-audio: Improve frames size computation ALSA: usb-audio: Fix racy list management in output queue s390/qdio: put thinint indicator after early error tty: hvc: Fix data abort due to race in hvc_open slimbus: ngd: get drvdata from correct device thermal/drivers/ti-soc-thermal: Avoid dereferencing ERR_PTR usb: dwc3: gadget: Properly handle failed kick_transfer staging: sm750fb: add missing case while setting FB_VISUAL PCI: v3-semi: Fix a memory leak in v3_pci_probe() error handling paths i2c: pxa: fix i2c_pxa_scream_blue_murder() debug output serial: amba-pl011: Make sure we initialize the port.lock spinlock drivers: base: Fix NULL pointer exception in __platform_driver_probe() if a driver developer is foolish PCI: rcar: Fix incorrect programming of OB windows PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges scsi: qla2xxx: Fix warning after FC target reset power: supply: lp8788: Fix an error handling path in 'lp8788_charger_probe()' power: supply: smb347-charger: IRQSTAT_D is volatile scsi: mpt3sas: Fix double free warnings pinctrl: rockchip: fix memleak in rockchip_dt_node_to_map dlm: remove BUG() before panic() clk: ti: composite: fix memory leak PCI: Fix pci_register_host_bridge() device_register() error handling powerpc/64: Don't initialise init_task->thread.regs tty: n_gsm: Fix SOF skipping tty: n_gsm: Fix waking up upper tty layer when room available HID: Add quirks for Trust Panora Graphic Tablet ipmi: use vzalloc instead of kmalloc for user creation powerpc/pseries/ras: Fix FWNMI_VALID off by one powerpc/ps3: Fix kexec shutdown hang vfio-pci: Mask cap zero usb/ohci-platform: Fix a warning when hibernating drm/msm/mdp5: Fix mdp5_init error path for failed mdp5_kms allocation ASoC: Intel: bytcr_rt5640: Add quirk for Toshiba Encore WT8-A tablet USB: host: ehci-mxc: Add error handling in ehci_mxc_drv_probe() tty: n_gsm: Fix bogus i++ in gsm_data_kick fpga: dfl: afu: Corrected error handling levels clk: samsung: exynos5433: Add IGNORE_UNUSED flag to sclk_i2s1 scsi: target: tcmu: Userspace must not complete queued commands arm64: tegra: Fix ethernet phy-mode for Jetson Xavier powerpc/64s/pgtable: fix an undefined behaviour dm zoned: return NULL if dmz_get_zone_for_reclaim() fails to find a zone PCI/PTM: Inherit Switch Downstream Port PTM settings from Upstream Port PCI: dwc: Fix inner MSI IRQ domain registration IB/cma: Fix ports memory leak in cma_configfs watchdog: da9062: No need to ping manually before setting timeout usb: dwc2: gadget: move gadget resume after the core is in L0 state USB: gadget: udc: s3c2410_udc: Remove pointless NULL check in s3c2410_udc_nuke usb: gadget: lpc32xx_udc: don't dereference ep pointer before null check usb: gadget: fix potential double-free in m66592_probe. usb: gadget: Fix issue with config_ep_by_speed function RDMA/iw_cxgb4: cleanup device debugfs entries on ULD remove x86/apic: Make TSC deadline timer detection message visible ASoC: fix incomplete error-handling in img_i2s_in_probe. scsi: target: tcmu: Fix a use after free in tcmu_check_expired_queue_cmd() clk: bcm2835: Fix return type of bcm2835_register_gate scsi: ufs-qcom: Fix scheduling while atomic issue KVM: PPC: Book3S HV: Ignore kmemleak false positives clk: sprd: return correct type of value for _sprd_pll_recalc_rate net: sunrpc: Fix off-by-one issues in 'rpc_ntop6' NFSv4.1 fix rpc_call_done assignment for BIND_CONN_TO_SESSION of: Fix a refcounting bug in __of_attach_node_sysfs() powerpc/4xx: Don't unmap NULL mbase extcon: adc-jack: Fix an error handling path in 'adc_jack_probe()' ASoC: fsl_asrc_dma: Fix dma_chan leak when config DMA channel failed vfio/mdev: Fix reference count leak in add_mdev_supported_type rxrpc: Adjust /proc/net/rxrpc/calls to display call->debug_id not user_ID openrisc: Fix issue with argument clobbering for clone/fork gfs2: Allow lock_nolock mount to specify jid=X scsi: iscsi: Fix reference count leak in iscsi_boot_create_kobj scsi: ufs: Don't update urgent bkops level when toggling auto bkops pinctrl: imxl: Fix an error handling path in 'imx1_pinctrl_core_probe()' pinctrl: freescale: imx: Fix an error handling path in 'imx_pinctrl_probe()' crypto: omap-sham - add proper load balancing support for multicore geneve: change from tx_error to tx_dropped on missing metadata lib/zlib: remove outdated and incorrect pre-increment optimization include/linux/bitops.h: avoid clang shift-count-overflow warnings elfnote: mark all .note sections SHF_ALLOC selftests/vm/pkeys: fix alloc_random_pkey() to make it really random blktrace: use errno instead of bi_status blktrace: fix endianness in get_pdu_int() blktrace: fix endianness for blk_log_remap() gfs2: fix use-after-free on transaction ail lists ntb_perf: pass correct struct device to dma_alloc_coherent ntb_tool: pass correct struct device to dma_alloc_coherent NTB: ntb_tool: reading the link file should not end in a NULL byte NTB: Revert the change to use the NTB device dev for DMA allocations NTB: perf: Don't require one more memory window than number of peers NTB: perf: Fix support for hardware that doesn't have port numbers NTB: perf: Fix race condition when run with ntb_test NTB: ntb_test: Fix bug when counting remote files drivers/perf: hisi: Fix wrong value for all counters enable selftests/net: in timestamping, strncpy needs to preserve null byte afs: Fix memory leak in afs_put_sysnames() ASoC: core: only convert non DPCM link to DPCM link ASoC: Intel: bytcr_rt5640: Add quirk for Toshiba Encore WT10-A tablet ASoC: rt5645: Add platform-data for Asus T101HA drm/sun4i: hdmi ddc clk: Fix size of m divider scsi: acornscsi: Fix an error handling path in acornscsi_probe() x86/idt: Keep spurious entries unset in system_vectors net/filter: Permit reading NET in load_bytes_relative when MAC not set xdp: Fix xsk_generic_xmit errno usb/xhci-plat: Set PM runtime as active on resume usb: host: ehci-platform: add a quirk to avoid stuck usb/ehci-platform: Set PM runtime as active on resume perf report: Fix NULL pointer dereference in hists__fprintf_nr_sample_events() ext4: stop overwrite the errcode in ext4_setup_super bcache: fix potential deadlock problem in btree_gc_coalesce afs: Fix non-setting of mtime when writing into mmap afs: afs_write_end() should change i_size under the right lock block: Fix use-after-free in blkdev_get() arm64: hw_breakpoint: Don't invoke overflow handler on uaccess watchpoints libata: Use per port sync for detach drm: encoder_slave: fix refcouting error for modules drm/dp_mst: Reformat drm_dp_check_act_status() a bit drm/qxl: Use correct notify port address when creating cursor ring drm/amdgpu: Replace invalid device ID with a valid device ID selinux: fix double free ext4: fix partial cluster initialization when splitting extent ext4: avoid race conditions when remounting with options that change dax drm/dp_mst: Increase ACT retry timeout to 3s x86/boot/compressed: Relax sed symbol type regex for LLVM ld.lld block: nr_sects_write(): Disable preemption on seqcount write mtd: rawnand: Pass a nand_chip object to nand_scan() mtd: rawnand: Pass a nand_chip object to nand_release() mtd: rawnand: diskonchip: Fix the probe error path mtd: rawnand: sharpsl: Fix the probe error path mtd: rawnand: xway: Fix the probe error path mtd: rawnand: orion: Fix the probe error path mtd: rawnand: oxnas: Add of_node_put() mtd: rawnand: oxnas: Fix the probe error path mtd: rawnand: socrates: Fix the probe error path mtd: rawnand: plat_nand: Fix the probe error path mtd: rawnand: mtk: Fix the probe error path mtd: rawnand: tmio: Fix the probe error path s390: fix syscall_get_error for compat processes drm/i915: Whitelist context-local timestamp in the gen9 cmdparser drm/i915/icl+: Fix hotplug interrupt disabling after storm detection crypto: algif_skcipher - Cap recv SG list at ctx->used crypto: algboss - don't wait during notifier callback kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex e1000e: Do not wake up the system via WOL if device wakeup is disabled net: octeon: mgmt: Repair filling of RX ring kretprobe: Prevent triggering kretprobe from within kprobe_flush_task sched/rt, net: Use CONFIG_PREEMPTION.patch net: core: device_rename: Use rwsem instead of a seqcount Revert "dpaa_eth: fix usage as DSA master, try 3" md: add feature flag MD_FEATURE_RAID0_LAYOUT kvm: x86: Move kvm_set_mmio_spte_mask() from x86.c to mmu.c kvm: x86: Fix reserved bits related calculation errors caused by MKTME KVM: x86/mmu: Set mmio_value to '0' if reserved #PF can't be generated Linux 4.19.130 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I8fff23470852b747c3d75461b45f9d77460062d3 |
||
|
|
49289b1fa5 |
block: Fix use-after-free in blkdev_get()
[ Upstream commit 2d3a8e2deddea6c89961c422ec0c5b851e648c14 ]
In blkdev_get() we call __blkdev_get() to do some internal jobs and if
there is some errors in __blkdev_get(), the bdput() is called which
means we have released the refcount of the bdev (actually the refcount of
the bdev inode). This means we cannot access bdev after that point. But
acctually bdev is still accessed in blkdev_get() after calling
__blkdev_get(). This results in use-after-free if the refcount is the
last one we released in __blkdev_get(). Let's take a look at the
following scenerio:
CPU0 CPU1 CPU2
blkdev_open blkdev_open Remove disk
bd_acquire
blkdev_get
__blkdev_get del_gendisk
bdev_unhash_inode
bd_acquire bdev_get_gendisk
bd_forget failed because of unhashed
bdput
bdput (the last one)
bdev_evict_inode
access bdev => use after free
[ 459.350216] BUG: KASAN: use-after-free in __lock_acquire+0x24c1/0x31b0
[ 459.351190] Read of size 8 at addr ffff88806c815a80 by task syz-executor.0/20132
[ 459.352347]
[ 459.352594] CPU: 0 PID: 20132 Comm: syz-executor.0 Not tainted 4.19.90 #2
[ 459.353628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 459.354947] Call Trace:
[ 459.355337] dump_stack+0x111/0x19e
[ 459.355879] ? __lock_acquire+0x24c1/0x31b0
[ 459.356523] print_address_description+0x60/0x223
[ 459.357248] ? __lock_acquire+0x24c1/0x31b0
[ 459.357887] kasan_report.cold+0xae/0x2d8
[ 459.358503] __lock_acquire+0x24c1/0x31b0
[ 459.359120] ? _raw_spin_unlock_irq+0x24/0x40
[ 459.359784] ? lockdep_hardirqs_on+0x37b/0x580
[ 459.360465] ? _raw_spin_unlock_irq+0x24/0x40
[ 459.361123] ? finish_task_switch+0x125/0x600
[ 459.361812] ? finish_task_switch+0xee/0x600
[ 459.362471] ? mark_held_locks+0xf0/0xf0
[ 459.363108] ? __schedule+0x96f/0x21d0
[ 459.363716] lock_acquire+0x111/0x320
[ 459.364285] ? blkdev_get+0xce/0xbe0
[ 459.364846] ? blkdev_get+0xce/0xbe0
[ 459.365390] __mutex_lock+0xf9/0x12a0
[ 459.365948] ? blkdev_get+0xce/0xbe0
[ 459.366493] ? bdev_evict_inode+0x1f0/0x1f0
[ 459.367130] ? blkdev_get+0xce/0xbe0
[ 459.367678] ? destroy_inode+0xbc/0x110
[ 459.368261] ? mutex_trylock+0x1a0/0x1a0
[ 459.368867] ? __blkdev_get+0x3e6/0x1280
[ 459.369463] ? bdev_disk_changed+0x1d0/0x1d0
[ 459.370114] ? blkdev_get+0xce/0xbe0
[ 459.370656] blkdev_get+0xce/0xbe0
[ 459.371178] ? find_held_lock+0x2c/0x110
[ 459.371774] ? __blkdev_get+0x1280/0x1280
[ 459.372383] ? lock_downgrade+0x680/0x680
[ 459.373002] ? lock_acquire+0x111/0x320
[ 459.373587] ? bd_acquire+0x21/0x2c0
[ 459.374134] ? do_raw_spin_unlock+0x4f/0x250
[ 459.374780] blkdev_open+0x202/0x290
[ 459.375325] do_dentry_open+0x49e/0x1050
[ 459.375924] ? blkdev_get_by_dev+0x70/0x70
[ 459.376543] ? __x64_sys_fchdir+0x1f0/0x1f0
[ 459.377192] ? inode_permission+0xbe/0x3a0
[ 459.377818] path_openat+0x148c/0x3f50
[ 459.378392] ? kmem_cache_alloc+0xd5/0x280
[ 459.379016] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 459.379802] ? path_lookupat.isra.0+0x900/0x900
[ 459.380489] ? __lock_is_held+0xad/0x140
[ 459.381093] do_filp_open+0x1a1/0x280
[ 459.381654] ? may_open_dev+0xf0/0xf0
[ 459.382214] ? find_held_lock+0x2c/0x110
[ 459.382816] ? lock_downgrade+0x680/0x680
[ 459.383425] ? __lock_is_held+0xad/0x140
[ 459.384024] ? do_raw_spin_unlock+0x4f/0x250
[ 459.384668] ? _raw_spin_unlock+0x1f/0x30
[ 459.385280] ? __alloc_fd+0x448/0x560
[ 459.385841] do_sys_open+0x3c3/0x500
[ 459.386386] ? filp_open+0x70/0x70
[ 459.386911] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 459.387610] ? trace_hardirqs_off_caller+0x55/0x1c0
[ 459.388342] ? do_syscall_64+0x1a/0x520
[ 459.388930] do_syscall_64+0xc3/0x520
[ 459.389490] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 459.390248] RIP: 0033:0x416211
[ 459.390720] Code: 75 14 b8 02 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83
04 19 00 00 c3 48 83 ec 08 e8 0a fa ff ff 48 89 04 24 b8 02 00 00 00 0f
05 <48> 8b 3c 24 48 89 c2 e8 53 fa ff ff 48 89 d0 48 83 c4 08 48 3d
01
[ 459.393483] RSP: 002b:00007fe45dfe9a60 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
[ 459.394610] RAX: ffffffffffffffda RBX: 00007fe45dfea6d4 RCX: 0000000000416211
[ 459.395678] RDX: 00007fe45dfe9b0a RSI: 0000000000000002 RDI: 00007fe45dfe9b00
[ 459.396758] RBP: 000000000076bf20 R08: 0000000000000000 R09: 000000000000000a
[ 459.397930] R10: 0000000000000075 R11: 0000000000000293 R12: 00000000ffffffff
[ 459.399022] R13: 0000000000000bd9 R14: 00000000004cdb80 R15: 000000000076bf2c
[ 459.400168]
[ 459.400430] Allocated by task 20132:
[ 459.401038] kasan_kmalloc+0xbf/0xe0
[ 459.401652] kmem_cache_alloc+0xd5/0x280
[ 459.402330] bdev_alloc_inode+0x18/0x40
[ 459.402970] alloc_inode+0x5f/0x180
[ 459.403510] iget5_locked+0x57/0xd0
[ 459.404095] bdget+0x94/0x4e0
[ 459.404607] bd_acquire+0xfa/0x2c0
[ 459.405113] blkdev_open+0x110/0x290
[ 459.405702] do_dentry_open+0x49e/0x1050
[ 459.406340] path_openat+0x148c/0x3f50
[ 459.406926] do_filp_open+0x1a1/0x280
[ 459.407471] do_sys_open+0x3c3/0x500
[ 459.408010] do_syscall_64+0xc3/0x520
[ 459.408572] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 459.409415]
[ 459.409679] Freed by task 1262:
[ 459.410212] __kasan_slab_free+0x129/0x170
[ 459.410919] kmem_cache_free+0xb2/0x2a0
[ 459.411564] rcu_process_callbacks+0xbb2/0x2320
[ 459.412318] __do_softirq+0x225/0x8ac
Fix this by delaying bdput() to the end of blkdev_get() which means we
have finished accessing bdev.
Fixes:
|
||
|
|
ff0e96e80f |
Merge 4.19.94 into android-4.19
Changes in 4.19.94 nvme_fc: add module to ops template to allow module references nvme-fc: fix double-free scenarios on hw queues drm/amdgpu: add check before enabling/disabling broadcast mode drm/amdgpu: add cache flush workaround to gfx8 emit_fence drm/amd/display: Fixed kernel panic when booting with DP-to-HDMI dongle iio: adc: max9611: Fix too short conversion time delay PM / devfreq: Fix devfreq_notifier_call returning errno PM / devfreq: Set scaling_max_freq to max on OPP notifier error PM / devfreq: Don't fail devfreq_dev_release if not in list afs: Fix afs_find_server lookups for ipv4 peers afs: Fix SELinux setting security label on /afs RDMA/cma: add missed unregister_pernet_subsys in init failure rxe: correctly calculate iCRC for unaligned payloads scsi: lpfc: Fix memory leak on lpfc_bsg_write_ebuf_set func scsi: qla2xxx: Drop superfluous INIT_WORK of del_work scsi: qla2xxx: Don't call qlt_async_event twice scsi: qla2xxx: Fix PLOGI payload and ELS IOCB dump length scsi: qla2xxx: Configure local loop for N2N target scsi: qla2xxx: Send Notify ACK after N2N PLOGI scsi: qla2xxx: Ignore PORT UPDATE after N2N PLOGI scsi: iscsi: qla4xxx: fix double free in probe scsi: libsas: stop discovering if oob mode is disconnected drm/nouveau: Move the declaration of struct nouveau_conn_atom up a bit usb: gadget: fix wrong endpoint desc net: make socket read/write_iter() honor IOCB_NOWAIT afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPP md: raid1: check rdev before reference in raid1_sync_request func s390/cpum_sf: Adjust sampling interval to avoid hitting sample limits s390/cpum_sf: Avoid SBD overflow condition in irq handler IB/mlx4: Follow mirror sequence of device add during device removal IB/mlx5: Fix steering rule of drop and count xen-blkback: prevent premature module unload xen/balloon: fix ballooned page accounting without hotplug enabled PM / hibernate: memory_bm_find_bit(): Tighten node optimisation ALSA: hda/realtek - Add Bass Speaker and fixed dac for bass speaker ALSA: hda/realtek - Enable the bass speaker of ASUS UX431FLC ALSA: hda - fixup for the bass speaker on Lenovo Carbon X1 7th gen xfs: fix mount failure crash on invalid iclog memory access taskstats: fix data-race drm: limit to INT_MAX in create_blob ioctl netfilter: nft_tproxy: Fix port selector on Big Endian ALSA: ice1724: Fix sleep-in-atomic in Infrasonic Quartet support code ALSA: usb-audio: fix set_format altsetting sanity check ALSA: usb-audio: set the interface format after resume on Dell WD19 ALSA: hda/realtek - Add headset Mic no shutup for ALC283 drm/sun4i: hdmi: Remove duplicate cleanup calls MIPS: Avoid VDSO ABI breakage due to global register variable media: pulse8-cec: fix lost cec_transmit_attempt_done() call media: cec: CEC 2.0-only bcast messages were ignored media: cec: avoid decrementing transmit_queue_sz if it is 0 media: cec: check 'transmit_in_progress', not 'transmitting' mm/zsmalloc.c: fix the migrated zspage statistics. memcg: account security cred as well to kmemcg mm: move_pages: return valid node id in status if the page is already on the target node pstore/ram: Write new dumps to start of recycled zones locks: print unsigned ino in /proc/locks dmaengine: Fix access to uninitialized dma_slave_caps compat_ioctl: block: handle Persistent Reservations compat_ioctl: block: handle BLKREPORTZONE/BLKRESETZONE ata: libahci_platform: Export again ahci_platform_<en/dis>able_phys() ata: ahci_brcm: Fix AHCI resources management ata: ahci_brcm: Allow optional reset controller to be used ata: ahci_brcm: Add missing clock management during recovery ata: ahci_brcm: BCM7425 AHCI requires AHCI_HFLAG_DELAY_ENGINE libata: Fix retrieving of active qcs gpiolib: fix up emulated open drain outputs riscv: ftrace: correct the condition logic in function graph tracer rseq/selftests: Fix: Namespace gettid() for compatibility with glibc 2.30 tracing: Fix lock inversion in trace_event_enable_tgid_record() tracing: Avoid memory leak in process_system_preds() tracing: Have the histogram compare functions convert to u64 first tracing: Fix endianness bug in histogram trigger apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock ALSA: cs4236: fix error return comparison of an unsigned integer ALSA: firewire-motu: Correct a typo in the clock proc string exit: panic before exit_mm() on global init exit arm64: Revert support for execute-only user mappings ftrace: Avoid potential division by zero in function profiler drm/msm: include linux/sched/task.h PM / devfreq: Check NULL governor in available_governors_show nfsd4: fix up replay_matches_cache() HID: i2c-hid: Reset ALPS touchpads on resume ACPI: sysfs: Change ACPI_MASKABLE_GPE_MAX to 0x100 xfs: don't check for AG deadlock for realtime files in bunmapi platform/x86: pmc_atom: Add Siemens CONNECT X300 to critclk_systems DMI table Bluetooth: btusb: fix PM leak in error case of setup Bluetooth: delete a stray unlock Bluetooth: Fix memory leak in hci_connect_le_scan media: flexcop-usb: ensure -EIO is returned on error condition regulator: ab8500: Remove AB8505 USB regulator media: usb: fix memory leak in af9005_identify_state dt-bindings: clock: renesas: rcar-usb2-clock-sel: Fix typo in example arm64: dts: meson: odroid-c2: Disable usb_otg bus to avoid power failed warning tty: serial: msm_serial: Fix lockup for sysrq and oops fix compat handling of FICLONERANGE, FIDEDUPERANGE and FS_IOC_FIEMAP bdev: Factor out bdev revalidation into a common helper bdev: Refresh bdev size for disks without partitioning scsi: qedf: Do not retry ELS request if qedf_alloc_cmd fails drm/mst: Fix MST sideband up-reply failure handling powerpc/pseries/hvconsole: Fix stack overread via udbg selftests: rtnetlink: add addresses with fixed life time KVM: PPC: Book3S HV: use smp_mb() when setting/clearing host_ipi flag rxrpc: Fix possible NULL pointer access in ICMP handling tcp: annotate tp->rcv_nxt lockless reads net: core: limit nested device depth ath9k_htc: Modify byte order for an error message ath9k_htc: Discard undersized packets xfs: periodically yield scrub threads to the scheduler net: add annotations on hh->hh_len lockless accesses ubifs: ubifs_tnc_start_commit: Fix OOB in layout_in_gaps s390/smp: fix physical to logical CPU map for SMT xen/blkback: Avoid unmapping unmapped grant pages perf/x86/intel/bts: Fix the use of page_private() Linux 4.19.94 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ic3d1a4e10565c38d0e82448f0fb7b6fd1822aab2 |
||
|
|
a06ee0d2ff |
bdev: Refresh bdev size for disks without partitioning
commit cba22d86e0a10b7070d2e6a7379dbea51aa0883c upstream. Currently, block device size in not updated on second and further open for block devices where partition scan is disabled. This is particularly annoying for example for DVD drives as that means block device size does not get updated once the media is inserted into a drive if the device is already open when inserting the media. This is actually always the case for example when pktcdvd is in use. Fix the problem by revalidating block device size on every open even for devices with partition scan disabled. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
2a5b513148 |
bdev: Factor out bdev revalidation into a common helper
commit 731dc4868311ee097757b8746eaa1b4f8b2b4f1c upstream. Factor out code handling revalidation of bdev on disk change into a common helper. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
435c9a613f |
Merge remote-tracking branch 'aosp/upstream-f2fs-stable-linux-4.19.y/v5.5-rc1' into android-4.19
* aosp/upstream-f2fs-stable-linux-4.19.y: f2fs: stop GC when the victim becomes fully valid f2fs: expose main_blkaddr in sysfs f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project() f2fs: Fix deadlock in f2fs_gc() context during atomic files handling f2fs: show f2fs instance in printk_ratelimited f2fs: fix potential overflow f2fs: fix to update dir's i_pino during cross_rename f2fs: support aligned pinned file f2fs: avoid kernel panic on corruption test f2fs: fix wrong description in document f2fs: cache global IPU bio f2fs: fix to avoid memory leakage in f2fs_listxattr f2fs: check total_segments from devices in raw_super f2fs: update multi-dev metadata in resize_fs f2fs: mark recovery flag correctly in read_raw_super_block() f2fs: fix to update time in lazytime mode vfs: don't allow writes to swap files mm: set S_SWAPFILE on blockdev swap devices Bug: 146023540 Change-Id: Ia24ce5f48f245dd7ba4fd94aa00a7d84615a8b22 Signed-off-by: Jaegeuk Kim <jaegeuk@google.com> |
||
|
|
8cfd90e159 |
vfs: don't allow writes to swap files
Don't let userspace write to an active swap file because the kernel effectively has a long term lease on the storage and things could get seriously corrupted if we let this happen. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> |
||
|
|
543bb48dc4 |
block: fix the return errno for direct IO
commit a89afe58f1a74aac768a5eb77af95ef4ee15beaa upstream.
If the last bio returned is not dio->bio, the status of the bio will
not assigned to dio->bio if it is error. This will cause the whole IO
status wrong.
ksoftirqd/21-117 [021] ..s. 4017.966090: 8,0 C N 4883648 [0]
<idle>-0 [018] ..s. 4017.970888: 8,0 C WS 4924800 + 1024 [0]
<idle>-0 [018] ..s. 4017.970909: 8,0 D WS 4935424 + 1024 [<idle>]
<idle>-0 [018] ..s. 4017.970924: 8,0 D WS 4936448 + 321 [<idle>]
ksoftirqd/21-117 [021] ..s. 4017.995033: 8,0 C R 4883648 + 336 [65475]
ksoftirqd/21-117 [021] d.s. 4018.001988: myprobe1: (blkdev_bio_end_io+0x0/0x168) bi_status=7
ksoftirqd/21-117 [021] d.s. 4018.001992: myprobe: (aio_complete_rw+0x0/0x148) x0=0xffff802f2595ad80 res=0x12a000 res2=0x0
We always have to assign bio->bi_status to dio->bio.bi_status because we
will only check dio->bio.bi_status when we return the whole IO to
the upper layer.
Fixes:
|
||
|
|
1e11b1d630 |
blockdev: Fix livelocks on loop device
commit 04906b2f542c23626b0ef6219b808406f8dddbe9 upstream. bd_set_size() updates also block device's block size. This is somewhat unexpected from its name and at this point, only blkdev_open() uses this functionality. Furthermore, this can result in changing block size under a filesystem mounted on a loop device which leads to livelocks inside __getblk_gfp() like: Sending NMI from CPU 0 to CPUs 1: NMI backtrace for cpu 1 CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106 ... Call Trace: init_page_buffers+0x3e2/0x530 fs/buffer.c:904 grow_dev_page fs/buffer.c:947 [inline] grow_buffers fs/buffer.c:1009 [inline] __getblk_slow fs/buffer.c:1036 [inline] __getblk_gfp+0x906/0xb10 fs/buffer.c:1313 __bread_gfp+0x2d/0x310 fs/buffer.c:1347 sb_bread include/linux/buffer_head.h:307 [inline] fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75 fat_ent_read_block fs/fat/fatent.c:441 [inline] fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489 fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101 __fat_get_block fs/fat/inode.c:148 [inline] ... Trivial reproducer for the problem looks like: truncate -s 1G /tmp/image losetup /dev/loop0 /tmp/image mkfs.ext4 -b 1024 /dev/loop0 mount -t ext4 /dev/loop0 /mnt losetup -c /dev/loop0 l /mnt Fix the problem by moving initialization of a block device block size into a separate function and call it when needed. Thanks to Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> for help with debugging the problem. Reported-by: syzbot+9933e4476f365f5d5a1b@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
73ba2fb33c |
Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
"First pull request for this merge window, there will also be a
followup request with some stragglers.
This pull request contains:
- Fix for a thundering heard issue in the wbt block code (Anchal
Agarwal)
- A few NVMe pull requests:
* Improved tracepoints (Keith)
* Larger inline data support for RDMA (Steve Wise)
* RDMA setup/teardown fixes (Sagi)
* Effects log suppor for NVMe target (Chaitanya Kulkarni)
* Buffered IO suppor for NVMe target (Chaitanya Kulkarni)
* TP4004 (ANA) support (Christoph)
* Various NVMe fixes
- Block io-latency controller support. Much needed support for
properly containing block devices. (Josef)
- Series improving how we handle sense information on the stack
(Kees)
- Lightnvm fixes and updates/improvements (Mathias/Javier et al)
- Zoned device support for null_blk (Matias)
- AIX partition fixes (Mauricio Faria de Oliveira)
- DIF checksum code made generic (Max Gurtovoy)
- Add support for discard in iostats (Michael Callahan / Tejun)
- Set of updates for BFQ (Paolo)
- Removal of async write support for bsg (Christoph)
- Bio page dirtying and clone fixups (Christoph)
- Set of bcache fix/changes (via Coly)
- Series improving blk-mq queue setup/teardown speed (Ming)
- Series improving merging performance on blk-mq (Ming)
- Lots of other fixes and cleanups from a slew of folks"
* tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits)
blkcg: Make blkg_root_lookup() work for queues in bypass mode
bcache: fix error setting writeback_rate through sysfs interface
null_blk: add lock drop/acquire annotation
Blk-throttle: reduce tail io latency when iops limit is enforced
block: paride: pd: mark expected switch fall-throughs
block: Ensure that a request queue is dissociated from the cgroup controller
block: Introduce blk_exit_queue()
blkcg: Introduce blkg_root_lookup()
block: Remove two superfluous #include directives
blk-mq: count the hctx as active before allocating tag
block: bvec_nr_vecs() returns value for wrong slab
bcache: trivial - remove tailing backslash in macro BTREE_FLAG
bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section
bcache: set max writeback rate when I/O request is idle
bcache: add code comments for bset.c
bcache: fix mistaken comments in request.c
bcache: fix mistaken code comments in bcache.h
bcache: add a comment in super.c
bcache: avoid unncessary cache prefetch bch_btree_node_get()
bcache: display rate debug parameters to 0 when writeback is not running
...
|
||
|
|
eb181a814c |
Merge tag 'for-linus-20180727' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Bigger than usual at this time, mostly due to the O_DIRECT corruption
issue and the fact that I was on vacation last week. This contains:
- NVMe pull request with two fixes for the FC code, and two target
fixes (Christoph)
- a DIF bio reset iteration fix (Greg Edwards)
- two nbd reply and requeue fixes (Josef)
- SCSI timeout fixup (Keith)
- a small series that fixes an issue with bio_iov_iter_get_pages(),
which ended up causing corruption for larger sized O_DIRECT writes
that ended up racing with buffered writes (Martin Wilck)"
* tag 'for-linus-20180727' of git://git.kernel.dk/linux-block:
block: reset bi_iter.bi_done after splitting bio
block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs
blkdev: __blkdev_direct_IO_simple: fix leak in error case
block: bio_iov_iter_get_pages: fix size of last iovec
nvmet: only check for filebacking on -ENOTBLK
nvmet: fixup crash on NULL device path
scsi: set timed out out mq requests to complete
blk-mq: export setting request completion state
nvme: if_ready checks to fail io to deleting controller
nvmet-fc: fix target sgl list on large transfers
nbd: handle unexpected replies better
nbd: don't requeue the same request twice.
|
||
|
|
9362dd1109 |
blkdev: __blkdev_direct_IO_simple: fix leak in error case
Fixes:
|
||
|
|
3f289dcb4b |
block: make bdev_ops->rw_page() take a REQ_OP instead of bool
|
||
|
|
6da2ec5605 |
treewide: kmalloc() -> kmalloc_array()
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
patch replaces cases of:
kmalloc(a * b, gfp)
with:
kmalloc_array(a * b, gfp)
as well as handling cases of:
kmalloc(a * b * c, gfp)
with:
kmalloc(array3_size(a, b, c), gfp)
as it's slightly less ugly than:
kmalloc_array(array_size(a, b), c, gfp)
This does, however, attempt to ignore constant size factors like:
kmalloc(4 * 1024, gfp)
though any constants defined via macros get caught up in the conversion.
Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
The tools/ directory was manually excluded, since it has its own
implementation of kmalloc().
The Coccinelle script used for this was:
// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@
(
kmalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kmalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)
// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@
(
kmalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)
// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@
(
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)
// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@
- kmalloc
+ kmalloc_array
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)
// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@
(
kmalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)
// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@
(
kmalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)
// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@
(
kmalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)
// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@
(
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)
// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@
(
kmalloc(sizeof(THING) * C2, ...)
|
kmalloc(sizeof(TYPE) * C2, ...)
|
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * E2
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- E1 * E2
+ E1, E2
, ...)
)
Signed-off-by: Kees Cook <keescook@chromium.org>
|
||
|
|
4a189982e2 |
Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull aio iopriority support from Al Viro: "The rest of aio stuff for this cycle - Adam's aio ioprio series" * 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: aio ioprio use ioprio_check_cap ret val fs: aio ioprio add explicit block layer dependence fs: iomap dio set bio prio from kiocb prio fs: blkdev set bio prio from kiocb prio fs: Add aio iopriority support fs: Convert kiocb rw_hint from enum to u16 block: add ioprio_check_cap function |
||
|
|
074111ca5f |
fs: blkdev set bio prio from kiocb prio
Now that kiocb has an ioprio field copy this over to the bio when it is created from the kiocb. Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
|
|
52190f8abe |
fs: convert block_dev.c to bioset_init()
Convert block DIO code to embedded bio sets. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
5afb78356c |
block: don't print a message when the device went away
The information about a size change in this case just creates confusion. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
4163a03984 |
block: unexport check_disk_size_change
Only used in block_dev.c and the partitions code, and it should remain that way.. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
9f3a0941fb |
Merge tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"This cycle was was not something I ever want to repeat as there were
several late changes that have only now just settled.
Half of the branch up to commit
|
||
|
|
849cf55963 |
fs: don't flush pagecache when expanding block device
When changing the size of a block device, its all caches are freed. It's necessary on shrinking to prevent spurious I/Os to the disappeared region. However, on expanding, such kind of I/Os doesn't happen. Similar things can be considered for btrfs filesystem resize and resize2fs, but they are designed not to drop caches when expanding. Therefore this patch removes unnecessary cache drop. Link: http://lkml.kernel.org/r/1521457240-153390-1-git-send-email-shunki-fujita@cybozu.co.jp Signed-off-by: Shunki Fujita <shunki-fujita@cybozu.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
15aa8a0118 |
block, dax: remove dead code in blkdev_writepages()
Block device inodes never have S_DAX set, so kill the check for DAX and diversion to dax_writeback_mapping_range(). Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com> |
||
|
|
560e7cb2f3 |
blockdev: Avoid two active bdev inodes for one device
When blkdev_open() races with device removal and creation it can happen
that unhashed bdev inode gets associated with newly created gendisk
like:
CPU0 CPU1
blkdev_open()
bdev = bd_acquire()
del_gendisk()
bdev_unhash_inode(bdev);
remove device
create new device with the same number
__blkdev_get()
disk = get_gendisk()
- gets reference to gendisk of the new device
Now another blkdev_open() will not find original 'bdev' as it got
unhashed, create a new one and associate it with the same 'disk' at
which point problems start as we have two independent page caches for
one device.
Fix the problem by verifying that the bdev inode didn't get unhashed
before we acquired gendisk reference. That way we make sure gendisk can
get associated only with visible bdev inodes.
Tested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
||
|
|
897366537f |
genhd: Fix use after free in __blkdev_get()
When two blkdev_open() calls race with device removal and recreation,
__blkdev_get() can use looked up gendisk after it is freed:
CPU0 CPU1 CPU2
del_gendisk(disk);
bdev_unhash_inode(inode);
blkdev_open() blkdev_open()
bdev = bd_acquire(inode);
- creates and returns new inode
bdev = bd_acquire(inode);
- returns the same inode
__blkdev_get(devt) __blkdev_get(devt)
disk = get_gendisk(devt);
- got structure of device going away
<finish device removal>
<new device gets
created under the same
device number>
disk = get_gendisk(devt);
- got new device structure
if (!bdev->bd_openers) {
does the first open
}
if (!bdev->bd_openers)
- false
} else {
put_disk_and_module(disk)
- remember this was old device - this was last ref and disk is
now freed
}
disk_unblock_events(disk); -> oops
Fix the problem by making sure we drop reference to disk in
__blkdev_get() only after we are really done with it.
Reported-by: Hou Tao <houtao1@huawei.com>
Tested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
||
|
|
9df6c29912 |
genhd: Add helper put_disk_and_module()
Add a proper counterpart to get_disk_and_module() - put_disk_and_module(). Currently it is opencoded in several places. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
e2c5923c34 |
Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block
Pull core block layer updates from Jens Axboe:
"This is the main pull request for block storage for 4.15-rc1.
Nothing out of the ordinary in here, and no API changes or anything
like that. Just various new features for drivers, core changes, etc.
In particular, this pull request contains:
- A patch series from Bart, closing the whole on blk/scsi-mq queue
quescing.
- A series from Christoph, building towards hidden gendisks (for
multipath) and ability to move bio chains around.
- NVMe
- Support for native multipath for NVMe (Christoph).
- Userspace notifications for AENs (Keith).
- Command side-effects support (Keith).
- SGL support (Chaitanya Kulkarni)
- FC fixes and improvements (James Smart)
- Lots of fixes and tweaks (Various)
- bcache
- New maintainer (Michael Lyle)
- Writeback control improvements (Michael)
- Various fixes (Coly, Elena, Eric, Liang, et al)
- lightnvm updates, mostly centered around the pblk interface
(Javier, Hans, and Rakesh).
- Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)
- Writeback series that fix the much discussed hundreds of millions
of sync-all units. This goes all the way, as discussed previously
(me).
- Fix for missing wakeup on writeback timer adjustments (Yafang
Shao).
- Fix laptop mode on blk-mq (me).
- {mq,name} tupple lookup for IO schedulers, allowing us to have
alias names. This means you can use 'deadline' on both !mq and on
mq (where it's called mq-deadline). (me).
- blktrace race fix, oopsing on sg load (me).
- blk-mq optimizations (me).
- Obscure waitqueue race fix for kyber (Omar).
- NBD fixes (Josef).
- Disable writeback throttling by default on bfq, like we do on cfq
(Luca Miccio).
- Series from Ming that enable us to treat flush requests on blk-mq
like any other request. This is a really nice cleanup.
- Series from Ming that improves merging on blk-mq with schedulers,
getting us closer to flipping the switch on scsi-mq again.
- BFQ updates (Paolo).
- blk-mq atomic flags memory ordering fixes (Peter Z).
- Loop cgroup support (Shaohua).
- Lots of minor fixes from lots of different folks, both for core and
driver code"
* 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
nvme: fix visibility of "uuid" ns attribute
blk-mq: fixup some comment typos and lengths
ide: ide-atapi: fix compile error with defining macro DEBUG
blk-mq: improve tag waiting setup for non-shared tags
brd: remove unused brd_mutex
blk-mq: only run the hardware queue if IO is pending
block: avoid null pointer dereference on null disk
fs: guard_bio_eod() needs to consider partitions
xtensa/simdisk: fix compile error
nvme: expose subsys attribute to sysfs
nvme: create 'slaves' and 'holders' entries for hidden controllers
block: create 'slaves' and 'holders' entries for hidden gendisks
nvme: also expose the namespace identification sysfs files for mpath nodes
nvme: implement multipath access to nvme subsystems
nvme: track shared namespaces
nvme: introduce a nvme_ns_ids structure
nvme: track subsystems
block, nvme: Introduce blk_mq_req_flags_t
block, scsi: Make SCSI quiesce and resume work reliably
block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
...
|
||
|
|
3a0a529971 |
block, scsi: Make SCSI quiesce and resume work reliably
The contexts from which a SCSI device can be quiesced or resumed are:
* Writing into /sys/class/scsi_device/*/device/state.
* SCSI parallel (SPI) domain validation.
* The SCSI device power management methods. See also scsi_bus_pm_ops.
It is essential during suspend and resume that neither the filesystem
state nor the filesystem metadata in RAM changes. This is why while
the hibernation image is being written or restored that SCSI devices
are quiesced. The SCSI core quiesces devices through scsi_device_quiesce()
and scsi_device_resume(). In the SDEV_QUIESCE state execution of
non-preempt requests is deferred. This is realized by returning
BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI
devices. Avoid that a full queue prevents power management requests
to be submitted by deferring allocation of non-preempt requests for
devices in the quiesced state. This patch has been tested by running
the following commands and by verifying that after each resume the
fio job was still running:
for ((i=0; i<10; i++)); do
(
cd /sys/block/md0/md &&
while true; do
[ "$(<sync_action)" = "idle" ] && echo check > sync_action
sleep 1
done
) &
pids=($!)
for d in /sys/class/block/sd*[a-z]; do
bdev=${d#/sys/class/block/}
hcil=$(readlink "$d/device")
hcil=${hcil#../../../}
echo 4 > "$d/queue/nr_requests"
echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 \
--rw=randread --ioengine=libaio --numjobs=4 --iodepth=16 \
--iodepth_batch=1 --thread --loops=$((2**31)) &
pids+=($!)
done
sleep 1
echo "$(date) Hibernating ..." >>hibernate-test-log.txt
systemctl hibernate
sleep 10
kill "${pids[@]}"
echo idle > /sys/block/md0/md/sync_action
wait
echo "$(date) Done." >>hibernate-test-log.txt
done
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348).
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
||
|
|
ea435e1b93 |
block: add a poll_fn callback to struct request_queue
That we we can also poll non blk-mq queues. Mostly needed for the NVMe multipath code, but could also be useful elsewhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
f892760aa6 |
fs/mpage.c: fix mpage_writepage() for pages with buffers
When using FAT on a block device which supports rw_page, we can hit BUG_ON(!PageLocked(page)) in try_to_free_buffers(). This is because we call clean_buffers() after unlocking the page we've written. Introduce a new clean_page_buffers() which cleans all buffers associated with a page and call it from within bdev_write_page(). [akpm@linux-foundation.org: s/PAGE_SIZE/~0U/ per Linus and Matthew] Link: http://lkml.kernel.org/r/20171006211541.GA7409@bombadil.infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reported-by: Toshi Kani <toshi.kani@hpe.com> Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Tested-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
7f66721a7d |
fs/block_dev: remove vfs_msg() interface
Replaced by pr_err usage in commit
|
||
|
|
e253d98f5b |
Merge branch 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull nowait read support from Al Viro: "Support IOCB_NOWAIT for buffered reads and block devices" * 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: block_dev: support RFW_NOWAIT on block device nodes fs: support RWF_NOWAIT for buffered reads fs: support IOCB_NOWAIT in generic_file_buffered_read fs: pass iocb to do_generic_file_read |
||
|
|
c35fc7a5ab |
block_dev: support RFW_NOWAIT on block device nodes
All support is already there in the generic code, we just need to wire it up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
|
|
74d46992e0 |
block: replace bi_bdev with a gendisk pointer and partitions index
This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
c2ee070fb0 |
block: cache the partition index in struct block_device
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
088737f44b |
Merge tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull Writeback error handling updates from Jeff Layton:
"This pile represents the bulk of the writeback error handling fixes
that I have for this cycle. Some of the earlier patches in this pile
may look trivial but they are prerequisites for later patches in the
series.
The aim of this set is to improve how we track and report writeback
errors to userland. Most applications that care about data integrity
will periodically call fsync/fdatasync/msync to ensure that their
writes have made it to the backing store.
For a very long time, we have tracked writeback errors using two flags
in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a
writeback error occurs (via mapping_set_error) and are cleared as a
side-effect of filemap_check_errors (as you noted yesterday). This
model really sucks for userland.
Only the first task to call fsync (or msync or fdatasync) will see the
error. Any subsequent task calling fsync on a file will get back 0
(unless another writeback error occurs in the interim). If I have
several tasks writing to a file and calling fsync to ensure that their
writes got stored, then I need to have them coordinate with one
another. That's difficult enough, but in a world of containerized
setups that coordination may even not be possible.
But wait...it gets worse!
The calls to filemap_check_errors can be buried pretty far down in the
call stack, and there are internal callers of filemap_write_and_wait
and the like that also end up clearing those errors. Many of those
callers ignore the error return from that function or return it to
userland at nonsensical times (e.g. truncate() or stat()). If I get
back -EIO on a truncate, there is no reason to think that it was
because some previous writeback failed, and a subsequent fsync() will
(incorrectly) return 0.
This pile aims to do three things:
1) ensure that when a writeback error occurs that that error will be
reported to userland on a subsequent fsync/fdatasync/msync call,
regardless of what internal callers are doing
2) report writeback errors on all file descriptions that were open at
the time that the error occurred. This is a user-visible change,
but I think most applications are written to assume this behavior
anyway. Those that aren't are unlikely to be hurt by it.
3) document what filesystems should do when there is a writeback
error. Today, there is very little consistency between them, and a
lot of cargo-cult copying. We need to make it very clear what
filesystems should do in this situation.
To achieve this, the set adds a new data type (errseq_t) and then
builds new writeback error tracking infrastructure around that. Once
all of that is in place, we change the filesystems to use the new
infrastructure for reporting wb errors to userland.
Note that this is just the initial foray into cleaning up this mess.
There is a lot of work remaining here:
1) convert the rest of the filesystems in a similar fashion. Once the
initial set is in, then I think most other fs' will be fairly
simple to convert. Hopefully most of those can in via individual
filesystem trees.
2) convert internal waiters on writeback to use errseq_t for
detecting errors instead of relying on the AS_* flags. I have some
draft patches for this for ext4, but they are not quite ready for
prime time yet.
This was a discussion topic this year at LSF/MM too. If you're
interested in the gory details, LWN has some good articles about this:
https://lwn.net/Articles/718734/
https://lwn.net/Articles/724307/"
* tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
btrfs: minimal conversion to errseq_t writeback error reporting on fsync
xfs: minimal conversion to errseq_t writeback error reporting
ext4: use errseq_t based error handling for reporting data writeback errors
fs: convert __generic_file_fsync to use errseq_t based reporting
block: convert to errseq_t based writeback error tracking
dax: set errors in mapping when writeback fails
Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
fs: new infrastructure for writeback error handling and reporting
lib: add errseq_t type and infrastructure for handling it
mm: don't TestClearPageError in __filemap_fdatawait_range
mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
jbd2: don't clear and reset errors after waiting on writeback
buffer: set errors in mapping at the time that the error occurs
fs: check for writeback errors after syncing out buffers in generic_file_fsync
buffer: use mapping_set_error instead of setting the flag
mm: fix mapping_set_error call in me_pagecache_dirty
|
||
|
|
372cf243ea |
block: convert to errseq_t based writeback error tracking
This is a very minimal conversion to errseq_t based error tracking for raw block device access. Just have it use the standard file_write_and_wait_range call. Note that there are internal callers that call sync_blockdev and the like that are not affected by this. They'll continue to use the AS_EIO/AS_ENOSPC flags for error reporting like they always have for now. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> |
||
|
|
5660e13d2f |
fs: new infrastructure for writeback error handling and reporting
Most filesystems currently use mapping_set_error and filemap_check_errors for setting and reporting/clearing writeback errors at the mapping level. filemap_check_errors is indirectly called from most of the filemap_fdatawait_* functions and from filemap_write_and_wait*. These functions are called from all sorts of contexts to wait on writeback to finish -- e.g. mostly in fsync, but also in truncate calls, getattr, etc. The non-fsync callers are problematic. We should be reporting writeback errors during fsync, but many places spread over the tree clear out errors before they can be properly reported, or report errors at nonsensical times. If I get -EIO on a stat() call, there is no reason for me to assume that it is because some previous writeback failed. The fact that it also clears out the error such that a subsequent fsync returns 0 is a bug, and a nasty one since that's potentially silent data corruption. This patch adds a small bit of new infrastructure for setting and reporting errors during address_space writeback. While the above was my original impetus for adding this, I think it's also the case that current fsync semantics are just problematic for userland. Most applications that call fsync do so to ensure that the data they wrote has hit the backing store. In the case where there are multiple writers to the file at the same time, this is really hard to determine. The first one to call fsync will see any stored error, and the rest get back 0. The processes with open fds may not be associated with one another in any way. They could even be in different containers, so ensuring coordination between all fsync callers is not really an option. One way to remedy this would be to track what file descriptor was used to dirty the file, but that's rather cumbersome and would likely be slow. However, there is a simpler way to improve the semantics here without incurring too much overhead. This set adds an errseq_t to struct address_space, and a corresponding one is added to struct file. Writeback errors are recorded in the mapping's errseq_t, and the one in struct file is used as the "since" value. This changes the semantics of the Linux fsync implementation such that applications can now use it to determine whether there were any writeback errors since fsync(fd) was last called (or since the file was opened in the case of fsync having never been called). Note that those writeback errors may have occurred when writing data that was dirtied via an entirely different fd, but that's the case now with the current mapping_set_error/filemap_check_error infrastructure. This will at least prevent you from getting a false report of success. The new behavior is still consistent with the POSIX spec, and is more reliable for application developers. This patch just adds some basic infrastructure for doing this, and ensures that the f_wb_err "cursor" is properly set when a file is opened. Later patches will change the existing code to use this new infrastructure for reporting errors at fsync time. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> |
||
|
|
c6b1e36c8f |
Merge branch 'for-4.13/block' of git://git.kernel.dk/linux-block
Pull core block/IO updates from Jens Axboe:
"This is the main pull request for the block layer for 4.13. Not a huge
round in terms of features, but there's a lot of churn related to some
core cleanups.
Note this depends on the UUID tree pull request, that Christoph
already sent out.
This pull request contains:
- A series from Christoph, unifying the error/stats codes in the
block layer. We now use blk_status_t everywhere, instead of using
different schemes for different places.
- Also from Christoph, some cleanups around request allocation and IO
scheduler interactions in blk-mq.
- And yet another series from Christoph, cleaning up how we handle
and do bounce buffering in the block layer.
- A blk-mq debugfs series from Bart, further improving on the support
we have for exporting internal information to aid debugging IO
hangs or stalls.
- Also from Bart, a series that cleans up the request initialization
differences across types of devices.
- A series from Goldwyn Rodrigues, allowing the block layer to return
failure if we will block and the user asked for non-blocking.
- Patch from Hannes for supporting setting loop devices block size to
that of the underlying device.
- Two series of patches from Javier, fixing various issues with
lightnvm, particular around pblk.
- A series from me, adding support for write hints. This comes with
NVMe support as well, so applications can help guide data placement
on flash to improve performance, latencies, and write
amplification.
- A series from Ming, improving and hardening blk-mq support for
stopping/starting and quiescing hardware queues.
- Two pull requests for NVMe updates. Nothing major on the feature
side, but lots of cleanups and bug fixes. From the usual crew.
- A series from Neil Brown, greatly improving the bio rescue set
support. Most notably, this kills the bio rescue work queues, if we
don't really need them.
- Lots of other little bug fixes that are all over the place"
* 'for-4.13/block' of git://git.kernel.dk/linux-block: (217 commits)
lightnvm: pblk: set line bitmap check under debug
lightnvm: pblk: verify that cache read is still valid
lightnvm: pblk: add initialization check
lightnvm: pblk: remove target using async. I/Os
lightnvm: pblk: use vmalloc for GC data buffer
lightnvm: pblk: use right metadata buffer for recovery
lightnvm: pblk: schedule if data is not ready
lightnvm: pblk: remove unused return variable
lightnvm: pblk: fix double-free on pblk init
lightnvm: pblk: fix bad le64 assignations
nvme: Makefile: remove dead build rule
blk-mq: map all HWQ also in hyperthreaded system
nvmet-rdma: register ib_client to not deadlock in device removal
nvme_fc: fix error recovery on link down.
nvmet_fc: fix crashes on bad opcodes
nvme_fc: Fix crash when nvme controller connection fails.
nvme_fc: replace ioabort msleep loop with completion
nvme_fc: fix double calls to nvme_cleanup_cmd()
nvme-fabrics: verify that a controller returns the correct NQN
nvme: simplify nvme_dev_attrs_are_visible
...
|
||
|
|
9ae3b3f52c |
block: provide bio_uninit() free freeing integrity/task associations
Wen reports significant memory leaks with DIF and O_DIRECT:
"With nvme devive + T10 enabled, On a system it has 256GB and started
logging /proc/meminfo & /proc/slabinfo for every minute and in an hour
it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute
leaking.
/proc/meminfo | grep SUnreclaim...
SUnreclaim: 6752128 kB
SUnreclaim: 6874880 kB
SUnreclaim: 7238080 kB
....
SUnreclaim: 22307264 kB
SUnreclaim: 22485888 kB
SUnreclaim: 22720256 kB
When testcases with T10 enabled call into __blkdev_direct_IO_simple,
code doesn't free memory allocated by bio_integrity_alloc. The patch
fixes the issue. HTX has been run with +60 hours without failure."
Since __blkdev_direct_IO_simple() allocates the bio on the stack, it
doesn't go through the regular bio free. This means that any ancillary
data allocated with the bio through the stack is not freed. Hence, we
can leak the integrity data associated with the bio, if the device is
using DIF/DIX.
Fix this by providing a bio_uninit() and export it, so that we can use
it to free this data. Note that this is a minimal fix for this issue.
Any current user of bio's that are allocated outside of
bio_alloc_bioset() suffers from this issue, most notably some drivers.
We will fix those in a more comprehensive patch for 4.13. This also
means that the commit marked as being fixed by this isn't the real
culprit, it's just the most obvious one out there.
Fixes:
|
||
|
|
45d06cf701 |
fs: add O_DIRECT and aio support for sending down write life time hints
Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
011067b056 |
blk: replace bioset_create_nobvec() with a flags arg to bioset_create()
"flags" arguments are often seen as good API design as they allow easy extensibility. bioset_create_nobvec() is implemented internally as a variation in flags passed to __bioset_create(). To support future extension, make the internal structure part of the API. i.e. add a 'flags' argument to bioset_create() and discard bioset_create_nobvec(). Note that the bio_split allocations in drivers/md/raid* do not need the bvec mempool - they should have used bioset_create_nobvec(). Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
4e4cbee93d |
block: switch bios to blk_status_t
Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
|
36ffc6c1c0 |
block_dev: propagate bio_iov_iter_get_pages error in __blkdev_direct_IO
Once we move the block layer to its own status code we'll still want to propagate the bio_iov_iter_get_pages, so restructure __blkdev_direct_IO to take ret into account when returning the errno. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
|
0fcc3ab23d |
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"Incremental fixes and a small feature addition on top of the main
libnvdimm 4.12 pull request:
- Geert noticed that tinyconfig was bloated by BLOCK selecting DAX.
The size regression is fixed by moving all dax helpers into the
dax-core and only specifying "select DAX" for FS_DAX and
dax-capable drivers. He also asked for clarification of the
NR_DEV_DAX config option which, on closer look, does not need to be
a config option at all. Mike also throws in a DEV_DAX_PMEM fixup
for good measure.
- Ben's attention to detail on -stable patch submissions caught a
case where the recent fixes to arch_copy_from_iter_pmem() missed a
condition where we strand dirty data in the cache. This is tagged
for -stable and will also be included in the rework of the pmem api
to a proposed {memcpy,copy_user}_flushcache() interface for 4.13.
- Vishal adds a feature that missed the initial pull due to pending
review feedback. It allows the kernel to clear media errors when
initializing a BTT (atomic sector update driver) instance on a pmem
namespace.
- Ross noticed that the dax_device + dax_operations conversion broke
__dax_zero_page_range(). The nvdimm unit tests fail to check this
path, but xfstests immediately trips over it. No excuse for missing
this before submitting the 4.12 pull request.
These all pass the nvdimm unit tests and an xfstests spot check. The
set has received a build success notification from the kbuild robot"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
filesystem-dax: fix broken __dax_zero_page_range() conversion
libnvdimm, btt: ensure that initializing metadata clears poison
libnvdimm: add an atomic vs process context flag to rw_bytes
x86, pmem: Fix cache flushing for iovec write < 8 bytes
device-dax: kill NR_DEV_DAX
block, dax: move "select DAX" from BLOCK to FS_DAX
device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX
|