be047631defd955980ea239e1f690e1f733692fd
13248 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
bc09bee25e |
Merge 4.19.156 into android-4.19-stable
Changes in 4.19.156 drm/i915: Break up error capture compression loops with cond_resched() tipc: fix use-after-free in tipc_bcast_get_mode ptrace: fix task_join_group_stop() for the case when current is traced cadence: force nonlinear buffers to be cloned chelsio/chtls: fix memory leaks caused by a race chelsio/chtls: fix always leaking ctrl_skb gianfar: Replace skb_realloc_headroom with skb_cow_head for PTP gianfar: Account for Tx PTP timestamp in the skb headroom net: usb: qmi_wwan: add Telit LE910Cx 0x1230 composition sctp: Fix COMM_LOST/CANT_STR_ASSOC err reporting on big-endian platforms sfp: Fix error handing in sfp_probe() blktrace: fix debugfs use after free btrfs: extent_io: Kill the forward declaration of flush_write_bio btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up Revert "btrfs: flush write bio if we loop in extent_write_cache_pages" btrfs: flush write bio if we loop in extent_write_cache_pages btrfs: extent_io: Handle errors better in extent_write_full_page() btrfs: extent_io: Handle errors better in btree_write_cache_pages() btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io() Btrfs: fix unwritten extent buffers and hangs on future writeback attempts btrfs: Don't submit any btree write bio if the fs has errors btrfs: Move btrfs_check_chunk_valid() to tree-check.[ch] and export it btrfs: tree-checker: Make chunk item checker messages more readable btrfs: tree-checker: Make btrfs_check_chunk_valid() return EUCLEAN instead of EIO btrfs: tree-checker: Check chunk item at tree block read time btrfs: tree-checker: Verify dev item btrfs: tree-checker: Fix wrong check on max devid btrfs: tree-checker: Enhance chunk checker to validate chunk profile btrfs: tree-checker: Verify inode item btrfs: tree-checker: fix the error message for transid error Fonts: Replace discarded const qualifier ALSA: usb-audio: Add implicit feedback quirk for Zoom UAC-2 ALSA: usb-audio: add usb vendor id as DSD-capable for Khadas devices ALSA: usb-audio: Add implicit feedback quirk for Qu-16 ALSA: usb-audio: Add implicit feedback quirk for MODX mm: mempolicy: fix potential pte_unmap_unlock pte error lib/crc32test: remove extra local_irq_disable/enable kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled mm: always have io_remap_pfn_range() set pgprot_decrypted() gfs2: Wake up when sd_glock_disposal becomes zero ring-buffer: Fix recursion protection transitions between interrupt context ftrace: Fix recursion check for NMI test ftrace: Handle tracing when switching between context tracing: Fix out of bounds write in get_trace_buf futex: Handle transient "ownerless" rtmutex state correctly ARM: dts: sun4i-a10: fix cpu_alert temperature x86/kexec: Use up-to-dated screen_info copy to fill boot params of: Fix reserved-memory overlap detection blk-cgroup: Fix memleak on error path blk-cgroup: Pre-allocate tree node on blkg_conf_prep scsi: core: Don't start concurrent async scan on same host vsock: use ns_capable_noaudit() on socket create drm/vc4: drv: Add error handding for bind ACPI: NFIT: Fix comparison to '-ENXIO' vt: Disable KD_FONT_OP_COPY fork: fix copy_process(CLONE_PARENT) race with the exiting ->real_parent serial: 8250_mtk: Fix uart_get_baud_rate warning serial: txx9: add missing platform_driver_unregister() on error in serial_txx9_init USB: serial: cyberjack: fix write-URB completion race USB: serial: option: add Quectel EC200T module support USB: serial: option: add LE910Cx compositions 0x1203, 0x1230, 0x1231 USB: serial: option: add Telit FN980 composition 0x1055 USB: Add NO_LPM quirk for Kingston flash drive usb: mtu3: fix panic in mtu3_gadget_stop() ARC: stack unwinding: avoid indefinite looping Revert "ARC: entry: fix potential EFA clobber when TIF_SYSCALL_TRACE" PM: runtime: Resume the device earlier in __device_release_driver() perf/core: Fix a memory leak in perf_event_parse_addr_filter() tools: perf: Fix build error in v4.19.y net: dsa: read mac address from DT for slave device arm64: dts: marvell: espressobin: Add ethernet switch aliases Linux 4.19.156 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I87af8871465f54de0332fa74bc1f342b7fe99061 |
||
|
|
d4fea27ffc |
mm: mempolicy: fix potential pte_unmap_unlock pte error
commit 3f08842098e842c51e3b97d0dcdebf810b32558e upstream.
When flags in queue_pages_pte_range don't have MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL bits, code breaks and passing origin pte - 1 to
pte_unmap_unlock seems like not a good idea.
queue_pages_pte_range can run in MPOL_MF_MOVE_ALL mode which doesn't
migrate misplaced pages but returns with EIO when encountering such a
page. Since commit a7f40cfe3b7a ("mm: mempolicy: make mbind() return
-EIO when MPOL_MF_STRICT is specified") and early break on the first pte
in the range results in pte_unmap_unlock on an underflow pte. This can
lead to lockups later on when somebody tries to lock the pte resp.
page_table_lock again..
Fixes: a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Feilong Lin <linfeilong@huawei.com>
Cc: Shijie Luo <luoshijie1@huawei.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201019074853.50856-1-luoshijie1@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
d624a2b1ab |
UPSTREAM: mm/sl[uo]b: export __kmalloc_track(_node)_caller
slab does this already, and I want to use this in a memory allocation tracker in drm for stuff that's tied to the lifetime of a drm_device, not the underlying struct device. Kinda like devres, but for drm. Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Link: https://patchwork.freedesktop.org/patch/msgid/20200323144950.3018436-2-daniel.vetter@ffwll.ch (cherry picked from commit fd7cb5753ef49964ea9db5121c3fc9a4ec21ed8e) Bug: 163141236 Signed-off-by: Alistair Delva <adelva@google.com> Change-Id: I10790befc779311a5f2ee441e4d073e51a5a7a62 |
||
|
|
b9a942466b |
Merge 4.19.153 into android-4.19-stable
Changes in 4.19.153
ibmveth: Switch order of ibmveth_helper calls.
ibmveth: Identify ingress large send packets.
ipv4: Restore flowi4_oif update before call to xfrm_lookup_route
mlx4: handle non-napi callers to napi_poll
net: fec: Fix phy_device lookup for phy_reset_after_clk_enable()
net: fec: Fix PHY init after phy_reset_after_clk_enable()
net: fix pos incrementment in ipv6_route_seq_next
net/smc: fix valid DMBE buffer sizes
net: usb: qmi_wwan: add Cellient MPL200 card
tipc: fix the skb_unshare() in tipc_buf_append()
net/ipv4: always honour route mtu during forwarding
r8169: fix data corruption issue on RTL8402
net/tls: sendfile fails with ktls offload
binder: fix UAF when releasing todo list
ALSA: bebob: potential info leak in hwdep_read()
chelsio/chtls: fix socket lock
chelsio/chtls: correct netdevice for vlan interface
chelsio/chtls: correct function return and return type
net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup
net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
nfc: Ensure presence of NFC_ATTR_FIRMWARE_NAME attribute in nfc_genl_fw_download()
tcp: fix to update snd_wl1 in bulk receiver fast path
r8169: fix operation under forced interrupt threading
icmp: randomize the global rate limiter
ALSA: hda/realtek: Enable audio jacks of ASUS D700SA with ALC887
cifs: remove bogus debug code
cifs: Return the error from crypt_message when enc/dec key not found.
KVM: x86/mmu: Commit zap of remaining invalid pages when recovering lpages
KVM: SVM: Initialize prev_ga_tag before use
ima: Don't ignore errors from crypto_shash_update()
crypto: algif_aead - Do not set MAY_BACKLOG on the async path
EDAC/i5100: Fix error handling order in i5100_init_one()
EDAC/ti: Fix handling of platform_get_irq() error
x86/fpu: Allow multiple bits in clearcpuid= parameter
drivers/perf: xgene_pmu: Fix uninitialized resource struct
x86/nmi: Fix nmi_handle() duration miscalculation
x86/events/amd/iommu: Fix sizeof mismatch
crypto: algif_skcipher - EBUSY on aio should be an error
crypto: mediatek - Fix wrong return value in mtk_desc_ring_alloc()
crypto: ixp4xx - Fix the size used in a 'dma_free_coherent()' call
crypto: picoxcell - Fix potential race condition bug
media: tuner-simple: fix regression in simple_set_radio_freq
media: Revert "media: exynos4-is: Add missed check for pinctrl_lookup_state()"
media: m5mols: Check function pointer in m5mols_sensor_power
media: uvcvideo: Set media controller entity functions
media: uvcvideo: Silence shift-out-of-bounds warning
media: omap3isp: Fix memleak in isp_probe
crypto: omap-sham - fix digcnt register handling with export/import
hwmon: (pmbus/max34440) Fix status register reads for MAX344{51,60,61}
cypto: mediatek - fix leaks in mtk_desc_ring_alloc
media: mx2_emmaprp: Fix memleak in emmaprp_probe
media: tc358743: initialize variable
media: tc358743: cleanup tc358743_cec_isr
media: rcar-vin: Fix a reference count leak.
media: rockchip/rga: Fix a reference count leak.
media: platform: fcp: Fix a reference count leak.
media: camss: Fix a reference count leak.
media: s5p-mfc: Fix a reference count leak
media: stm32-dcmi: Fix a reference count leak
media: ti-vpe: Fix a missing check and reference count leak
regulator: resolve supply after creating regulator
pinctrl: bcm: fix kconfig dependency warning when !GPIOLIB
spi: spi-s3c64xx: swap s3c64xx_spi_set_cs() and s3c64xx_enable_datapath()
spi: spi-s3c64xx: Check return values
ath10k: provide survey info as accumulated data
Bluetooth: hci_uart: Cancel init work before unregistering
ath6kl: prevent potential array overflow in ath6kl_add_new_sta()
ath9k: Fix potential out of bounds in ath9k_htc_txcompletion_cb()
ath10k: Fix the size used in a 'dma_free_coherent()' call in an error handling path
wcn36xx: Fix reported 802.11n rx_highest rate wcn3660/wcn3680
ASoC: qcom: lpass-platform: fix memory leak
ASoC: qcom: lpass-cpu: fix concurrency issue
brcmfmac: check ndev pointer
mwifiex: Do not use GFP_KERNEL in atomic context
staging: rtl8192u: Do not use GFP_KERNEL in atomic context
drm/gma500: fix error check
scsi: qla4xxx: Fix an error handling path in 'qla4xxx_get_host_stats()'
scsi: qla2xxx: Fix wrong return value in qla_nvme_register_hba()
scsi: csiostor: Fix wrong return value in csio_hw_prep_fw()
backlight: sky81452-backlight: Fix refcount imbalance on error
VMCI: check return value of get_user_pages_fast() for errors
tty: serial: earlycon dependency
tty: hvcs: Don't NULL tty->driver_data until hvcs_cleanup()
pty: do tty_flip_buffer_push without port->lock in pty_write
pwm: lpss: Fix off by one error in base_unit math in pwm_lpss_prepare()
pwm: lpss: Add range limit check for the base_unit register value
drivers/virt/fsl_hypervisor: Fix error handling path
video: fbdev: vga16fb: fix setting of pixclock because a pass-by-value error
video: fbdev: sis: fix null ptr dereference
video: fbdev: radeon: Fix memleak in radeonfb_pci_register
HID: roccat: add bounds checking in kone_sysfs_write_settings()
pinctrl: mcp23s08: Fix mcp23x17_regmap initialiser
pinctrl: mcp23s08: Fix mcp23x17 precious range
net/mlx5: Don't call timecounter cyc2time directly from 1PPS flow
net: stmmac: use netif_tx_start|stop_all_queues() function
cpufreq: armada-37xx: Add missing MODULE_DEVICE_TABLE
net: dsa: rtl8366: Check validity of passed VLANs
net: dsa: rtl8366: Refactor VLAN/PVID init
net: dsa: rtl8366: Skip PVID setting if not requested
net: dsa: rtl8366rb: Support all 4096 VLANs
ath6kl: wmi: prevent a shift wrapping bug in ath6kl_wmi_delete_pstream_cmd()
misc: mic: scif: Fix error handling path
ALSA: seq: oss: Avoid mutex lock for a long-time ioctl
usb: dwc2: Fix parameter type in function pointer prototype
quota: clear padding in v2r1_mem2diskdqb()
slimbus: core: check get_addr before removing laddr ida
slimbus: core: do not enter to clock pause mode in core
slimbus: qcom-ngd-ctrl: disable ngd in qmi server down callback
HID: hid-input: fix stylus battery reporting
qtnfmac: fix resource leaks on unsupported iftype error return path
net: enic: Cure the enic api locking trainwreck
mfd: sm501: Fix leaks in probe()
iwlwifi: mvm: split a print to avoid a WARNING in ROC
usb: gadget: f_ncm: fix ncm_bitrate for SuperSpeed and above.
usb: gadget: u_ether: enable qmult on SuperSpeed Plus as well
nl80211: fix non-split wiphy information
usb: dwc2: Fix INTR OUT transfers in DDMA mode.
scsi: target: tcmu: Fix warning: 'page' may be used uninitialized
scsi: be2iscsi: Fix a theoretical leak in beiscsi_create_eqs()
platform/x86: mlx-platform: Remove PSU EEPROM configuration
mwifiex: fix double free
ipvs: clear skb->tstamp in forwarding path
net: korina: fix kfree of rx/tx descriptor array
netfilter: nf_log: missing vlan offload tag and proto
mm/memcg: fix device private memcg accounting
mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary
IB/mlx4: Fix starvation in paravirt mux/demux
IB/mlx4: Adjust delayed work when a dup is observed
powerpc/pseries: Fix missing of_node_put() in rng_init()
powerpc/icp-hv: Fix missing of_node_put() in success path
RDMA/ucma: Fix locking for ctx->events_reported
RDMA/ucma: Add missing locking around rdma_leave_multicast()
mtd: lpddr: fix excessive stack usage with clang
powerpc/pseries: explicitly reschedule during drmem_lmb list traversal
mtd: mtdoops: Don't write panic data twice
ARM: 9007/1: l2c: fix prefetch bits init in L2X0_AUX_CTRL using DT values
arc: plat-hsdk: fix kconfig dependency warning when !RESET_CONTROLLER
xfs: limit entries returned when counting fsmap records
xfs: fix high key handling in the rt allocator's query_range function
RDMA/qedr: Fix use of uninitialized field
RDMA/qedr: Fix inline size returned for iWARP
powerpc/tau: Use appropriate temperature sample interval
powerpc/tau: Convert from timer to workqueue
powerpc/tau: Remove duplicated set_thresholds() call
Linux 4.19.153
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I9e85e8ca67ab8e28d04a77339f80fdbf3c568956
|
||
|
|
a3d0ceee71 |
mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary
[ Upstream commit 67197a4f28d28d0b073ab0427b03cb2ee5382578 ]
Currently __set_oom_adj loops through all processes in the system to keep
oom_score_adj and oom_score_adj_min in sync between processes sharing
their mm. This is done for any task with more that one mm_users, which
includes processes with multiple threads (sharing mm and signals).
However for such processes the loop is unnecessary because their signal
structure is shared as well.
Android updates oom_score_adj whenever a tasks changes its role
(background/foreground/...) or binds to/unbinds from a service, making it
more/less important. Such operation can happen frequently. We noticed
that updates to oom_score_adj became more expensive and after further
investigation found out that the patch mentioned in "Fixes" introduced a
regression. Using Pixel 4 with a typical Android workload, write time to
oom_score_adj increased from ~3.57us to ~362us. Moreover this regression
linearly depends on the number of multi-threaded processes running on the
system.
Mark the mm with a new MMF_MULTIPROCESS flag bit when task is created with
(CLONE_VM && !CLONE_THREAD && !CLONE_VFORK). Change __set_oom_adj to use
MMF_MULTIPROCESS instead of mm_users to decide whether oom_score_adj
update should be synchronized between multiple processes. To prevent
races between clone() and __set_oom_adj(), when oom_score_adj of the
process being cloned might be modified from userspace, we use
oom_adj_mutex. Its scope is changed to global.
The combination of (CLONE_VM && !CLONE_THREAD) is rarely used except for
the case of vfork(). To prevent performance regressions of vfork(), we
skip taking oom_adj_mutex and setting MMF_MULTIPROCESS when CLONE_VFORK is
specified. Clearing the MMF_MULTIPROCESS flag (when the last process
sharing the mm exits) is left out of this patch to keep it simple and
because it is believed that this threading model is rare. Should there
ever be a need for optimizing that case as well, it can be done by hooking
into the exit path, likely following the mm_update_next_owner pattern.
With the combination of (CLONE_VM && !CLONE_THREAD && !CLONE_VFORK) being
quite rare, the regression is gone after the change is applied.
[surenb@google.com: v3]
Link: https://lkml.kernel.org/r/20200902012558.2335613-1-surenb@google.com
Fixes:
|
||
|
|
2761fff65f |
mm/memcg: fix device private memcg accounting
[ Upstream commit 9a137153fc8798a89d8fce895cd0a06ea5b8e37c ]
The code in mc_handle_swap_pte() checks for non_swap_entry() and returns
NULL before checking is_device_private_entry() so device private pages are
never handled. Fix this by checking for non_swap_entry() after handling
device private swap PTEs.
I assume the memory cgroup accounting would be off somehow when moving
a process to another memory cgroup. Currently, the device private page
is charged like a normal anonymous page when allocated and is uncharged
when the page is freed so I think that path is OK.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Link: https://lkml.kernel.org/r/20201009215952.2726-1-rcampbell@nvidia.com
xFixes:
|
||
|
|
9f80205d66 |
Merge 4.19.151 into android-4.19-stable
Changes in 4.19.151 fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts fbcon: Fix global-out-of-bounds read in fbcon_get_font() Revert "ravb: Fixed to be able to unload modules" net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key() drm/nouveau/mem: guard against NULL pointer access in mem_del usermodehelper: reset umask to default before executing user process platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360 platform/x86: thinkpad_acpi: initialize tp_nvram_state variable platform/x86: intel-vbtn: Switch to an allow-list for SW_TABLET_MODE reporting platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse driver core: Fix probe_count imbalance in really_probe() perf top: Fix stdio interface input handling with glibc 2.28+ i2c: i801: Exclude device from suspend direct complete optimization mtd: rawnand: sunxi: Fix the probe error path arm64: dts: stratix10: add status to qspi dts node nvme-core: put ctrl ref when module ref get fail macsec: avoid use-after-free in macsec_handle_frame() mm/khugepaged: fix filemap page_to_pgoff(page) != offset xfrmi: drop ignore_df check before updating pmtu cifs: Fix incomplete memory allocation on setxattr path i2c: meson: fix clock setting overwrite i2c: meson: fixup rate calculation with filter delay i2c: owl: Clear NACK and BUS error bits sctp: fix sctp_auth_init_hmacs() error path team: set dev->needed_headroom in team_setup_by_port() net: team: fix memory leak in __team_options_register openvswitch: handle DNAT tuple collision drm/amdgpu: prevent double kfree ttm->sg xfrm: clone XFRMA_SET_MARK in xfrm_do_migrate xfrm: clone XFRMA_REPLAY_ESN_VAL in xfrm_do_migrate xfrm: clone XFRMA_SEC_CTX in xfrm_do_migrate xfrm: clone whole liftime_cur structure in xfrm_do_migrate net: stmmac: removed enabling eee in EEE set callback platform/x86: fix kconfig dependency warning for FUJITSU_LAPTOP xfrm: Use correct address family in xfrm_state_find bonding: set dev->needed_headroom in bond_setup_by_slave() mdio: fix mdio-thunder.c dependency & build error net: usb: ax88179_178a: fix missing stop entry in driver_info net/mlx5e: Fix VLAN cleanup flow net/mlx5e: Fix VLAN create flow rxrpc: Fix rxkad token xdr encoding rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read() rxrpc: Fix some missing _bh annotations on locking conn->state_lock rxrpc: Fix server keyring leak perf: Fix task_function_call() error handling mmc: core: don't set limits.discard_granularity as 0 mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails Linux 4.19.151 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I9ee2b0fc4fc39f27be6ae680529e1046f249a3e6 |
||
|
|
94c5167581 |
mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
commit 4aab2be0983031a05cb4a19696c9da5749523426 upstream.
When memory is hotplug added or removed the min_free_kbytes should be
recalculated based on what is expected by khugepaged. Currently after
hotplug, min_free_kbytes will be set to a lower default and higher
default set when THP enabled is lost.
This change restores min_free_kbytes as expected for THP consumers.
[vijayb@linux.microsoft.com: v5]
Link: https://lkml.kernel.org/r/1601398153-5517-1-git-send-email-vijayb@linux.microsoft.com
Fixes:
|
||
|
|
fbe96d5aab |
mm/khugepaged: fix filemap page_to_pgoff(page) != offset
commit 033b5d77551167f8c24ca862ce83d3e0745f9245 upstream.
There have been elusive reports of filemap_fault() hitting its
VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
with CONFIG_READ_ONLY_THP_FOR_FS=y.
Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged
without NUMA reuses the same huge page after collapse_file() failed
(whereas NUMA targets its allocation to the respective node each time).
And most of us were usually testing with CONFIG_NUMA=y kernels.
collapse_file(old start)
new_page = khugepaged_alloc_page(hpage)
__SetPageLocked(new_page)
new_page->index = start // hpage->index=old offset
new_page->mapping = mapping
xas_store(&xas, new_page)
filemap_fault
page = find_get_page(mapping, offset)
// if offset falls inside hpage then
// compound_head(page) == hpage
lock_page_maybe_drop_mmap()
__lock_page(page)
// collapse fails
xas_store(&xas, old page)
new_page->mapping = NULL
unlock_page(new_page)
collapse_file(new start)
new_page = khugepaged_alloc_page(hpage)
__SetPageLocked(new_page)
new_page->index = start // hpage->index=new offset
new_page->mapping = mapping // mapping becomes valid again
// since compound_head(page) == hpage
// page_to_pgoff(page) got changed
VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)
An initial patch replaced __SetPageLocked() by lock_page(), which did
fix the race which Suren illustrates above. But testing showed that it's
not good enough: if the racing task's __lock_page() gets delayed long
after its find_get_page(), then it may follow collapse_file(new start)'s
successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.
It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a
check and retry (as is done for mapping), with similar relaxations in
find_lock_entry() and pagecache_get_page(): but it's not obvious what
else might get caught out; and khugepaged non-NUMA appears to be unique
in exposing a page to page cache, then revoking, without going through
a full cycle of freeing before reuse.
Instead, non-NUMA khugepaged_prealloc_page() release the old page
if anyone else has a reference to it (1% of cases when I tested).
Although never reported on huge tmpfs, I believe its find_lock_entry()
has been at similar risk; but huge tmpfs does not rely on khugepaged
for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.
Reported-by: Denis Lisov <dennis.lissov@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569
Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.org
Reported-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw
Reported-and-analyzed-by: Suren Baghdasaryan <surenb@google.com>
Fixes: 87c460a0bded ("mm/khugepaged: collapse_shmem() without freezing new_page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v4.9+
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
2dce03a5c2 |
Merge 4.19.150 into android-4.19-stable
Changes in 4.19.150 mmc: sdhci: Workaround broken command queuing on Intel GLK based IRBIS models USB: gadget: f_ncm: Fix NDP16 datagram validation gpio: mockup: fix resource leak in error path gpio: tc35894: fix up tc35894 interrupt configuration clk: socfpga: stratix10: fix the divider for the emac_ptp_free_clk vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock vsock/virtio: stop workers during the .remove() vsock/virtio: add transport parameter to the virtio_transport_reset_no_sock() net: virtio_vsock: Enhance connection semantics Input: i8042 - add nopnp quirk for Acer Aspire 5 A515 ftrace: Move RCU is watching check after recursion check drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config drivers/net/wan/hdlc_fr: Add needed_headroom for PVC devices drm/sun4i: mixer: Extend regmap max_register net: dec: de2104x: Increase receive ring size for Tulip rndis_host: increase sleep time in the query-response loop nvme-core: get/put ctrl and transport module in nvme_dev_open/release() drivers/net/wan/lapbether: Make skb->protocol consistent with the header drivers/net/wan/hdlc: Set skb->protocol before transmitting mac80211: do not allow bigger VHT MPDUs than the hardware supports spi: fsl-espi: Only process interrupts for expected events nvme-fc: fail new connections to a deleted host or remote port gpio: sprd: Clear interrupt when setting the type as edge pinctrl: mvebu: Fix i2c sda definition for 98DX3236 nfs: Fix security label length not being reset clk: samsung: exynos4: mark 'chipid' clock as CLK_IGNORE_UNUSED iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate() i2c: cpm: Fix i2c_ram structure Input: trackpoint - enable Synaptics trackpoints random32: Restore __latent_entropy attribute on net_rand_state mm: replace memmap_context by meminit_context mm: don't rely on system state to detect hot-plug operations net/packet: fix overflow in tpacket_rcv epoll: do not insert into poll queues until all sanity checks are done epoll: replace ->visited/visited_list with generation count epoll: EPOLL_CTL_ADD: close the race in decision to take fast path ep_create_wakeup_source(): dentry name can change under you... netfilter: ctnetlink: add a range check for l3/l4 protonum Linux 4.19.150 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ib6f1b6fce01bec80efd4a905d03903ff20ca89be |
||
|
|
b6f69f72c1 |
mm: don't rely on system state to detect hot-plug operations
commit f85086f95fa36194eb0db5cd5c12e56801b98523 upstream.
In register_mem_sect_under_node() the system_state's value is checked to
detect whether the call is made during boot time or during an hot-plug
operation. Unfortunately, that check against SYSTEM_BOOTING is wrong
because regular memory is registered at SYSTEM_SCHEDULING state. In
addition, memory hot-plug operation can be triggered at this system
state by the ACPI [1]. So checking against the system state is not
enough.
The consequence is that on system with interleaved node's ranges like this:
Early memory node ranges
node 1: [mem 0x0000000000000000-0x000000011fffffff]
node 2: [mem 0x0000000120000000-0x000000014fffffff]
node 1: [mem 0x0000000150000000-0x00000001ffffffff]
node 0: [mem 0x0000000200000000-0x000000048fffffff]
node 2: [mem 0x0000000490000000-0x00000007ffffffff]
This can be seen on PowerPC LPAR after multiple memory hot-plug and
hot-unplug operations are done. At the next reboot the node's memory
ranges can be interleaved and since the call to link_mem_sections() is
made in topology_init() while the system is in the SYSTEM_SCHEDULING
state, the node's id is not checked, and the sections registered to
multiple nodes:
$ ls -l /sys/devices/system/memory/memory21/node*
total 0
lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1
lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2
In that case, the system is able to boot but if later one of theses
memory blocks is hot-unplugged and then hot-plugged, the sysfs
inconsistency is detected and this is triggering a BUG_ON():
kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4
CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25
Call Trace:
add_memory_resource+0x23c/0x340 (unreliable)
__add_memory+0x5c/0xf0
dlpar_add_lmb+0x1b4/0x500
dlpar_memory+0x1f8/0xb80
handle_dlpar_errorlog+0xc0/0x190
dlpar_store+0x198/0x4a0
kobj_attr_store+0x30/0x50
sysfs_kf_write+0x64/0x90
kernfs_fop_write+0x1b0/0x290
vfs_write+0xe8/0x290
ksys_write+0xdc/0x130
system_call_exception+0x160/0x270
system_call_common+0xf0/0x27c
This patch addresses the root cause by not relying on the system_state
value to detect whether the call is due to a hot-plug operation. An
extra parameter is added to link_mem_sections() detailing whether the
operation is due to a hot-plug operation.
[1] According to Oscar Salvador, using this qemu command line, ACPI
memory hotplug operations are raised at SYSTEM_SCHEDULING state:
$QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \
-m size=$MEM,slots=255,maxmem=4294967296k \
-numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \
-object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \
-object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \
-object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \
-object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \
-object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \
-object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \
-object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \
Fixes:
|
||
|
|
25eaea1b33 |
mm: replace memmap_context by meminit_context
commit c1d0da83358a2316d9be7f229f26126dbaa07468 upstream. Patch series "mm: fix memory to node bad links in sysfs", v3. Sometimes, firmware may expose interleaved memory layout like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] In that case, we can see memory blocks assigned to multiple nodes in sysfs: $ ls -l /sys/devices/system/memory/memory21 total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 -rw-r--r-- 1 root root 65536 Aug 24 05:27 online -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_device -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_index drwxr-xr-x 2 root root 0 Aug 24 05:27 power -r--r--r-- 1 root root 65536 Aug 24 05:27 removable -rw-r--r-- 1 root root 65536 Aug 24 05:27 state lrwxrwxrwx 1 root root 0 Aug 24 05:25 subsystem -> ../../../../bus/memory -rw-r--r-- 1 root root 65536 Aug 24 05:25 uevent -r--r--r-- 1 root root 65536 Aug 24 05:27 valid_zones The same applies in the node's directory with a memory21 link in both the node1 and node2's directory. This is wrong but doesn't prevent the system to run. However when later, one of these memory blocks is hot-unplugged and then hot-plugged, the system is detecting an inconsistency in the sysfs layout and a BUG_ON() is raised: kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25 Call Trace: add_memory_resource+0x23c/0x340 (unreliable) __add_memory+0x5c/0xf0 dlpar_add_lmb+0x1b4/0x500 dlpar_memory+0x1f8/0xb80 handle_dlpar_errorlog+0xc0/0x190 dlpar_store+0x198/0x4a0 kobj_attr_store+0x30/0x50 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 vfs_write+0xe8/0x290 ksys_write+0xdc/0x130 system_call_exception+0x160/0x270 system_call_common+0xf0/0x27c This has been seen on PowerPC LPAR. The root cause of this issue is that when node's memory is registered, the range used can overlap another node's range, thus the memory block is registered to multiple nodes in sysfs. There are two issues here: (a) The sysfs memory and node's layouts are broken due to these multiple links (b) The link errors in link_mem_sections() should not lead to a system panic. To address (a) register_mem_sect_under_node should not rely on the system state to detect whether the link operation is triggered by a hot plug operation or not. This is addressed by the patches 1 and 2 of this series. Issue (b) will be addressed separately. This patch (of 2): The memmap_context enum is used to detect whether a memory operation is due to a hot-add operation or happening at boot time. Make it general to the hotplug operation and rename it as meminit_context. There is no functional change introduced by this patch Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J . Wysocki" <rafael@kernel.org> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200915094143.79181-1-ldufour@linux.ibm.com Link: https://lkml.kernel.org/r/20200915132624.9723-1-ldufour@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
9ce79d9bed |
Merge 4.19.149 into android-4.19-stable
Changes in 4.19.149 selinux: allow labeling before policy is loaded media: mc-device.c: fix memleak in media_device_register_entity dma-fence: Serialise signal enabling (dma_fence_enable_sw_signaling) ath10k: fix array out-of-bounds access ath10k: fix memory leak for tpc_stats_final mm: fix double page fault on arm64 if PTE_AF is cleared scsi: aacraid: fix illegal IO beyond last LBA m68k: q40: Fix info-leak in rtc_ioctl gma/gma500: fix a memory disclosure bug due to uninitialized bytes ASoC: kirkwood: fix IRQ error handling media: smiapp: Fix error handling at NVM reading arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback x86/ioapic: Unbreak check_timer() ALSA: usb-audio: Add delay quirk for H570e USB headsets ALSA: hda/realtek - Couldn't detect Mic if booting with headset plugged ALSA: hda/realtek: Enable front panel headset LED on Lenovo ThinkStation P520 lib/string.c: implement stpcpy leds: mlxreg: Fix possible buffer overflow PM / devfreq: tegra30: Fix integer overflow on CPU's freq max out scsi: fnic: fix use after free scsi: lpfc: Fix kernel crash at lpfc_nvme_info_show during remote port bounce net: silence data-races on sk_backlog.tail clk/ti/adpll: allocate room for terminating null drm/amdgpu/powerplay: fix AVFS handling with custom powerplay table mtd: cfi_cmdset_0002: don't free cfi->cfiq in error path of cfi_amdstd_setup() mfd: mfd-core: Protect against NULL call-back function pointer drm/amdgpu/powerplay/smu7: fix AVFS handling with custom powerplay table tpm_crb: fix fTPM on AMD Zen+ CPUs tracing: Adding NULL checks for trace_array descriptor pointer bcache: fix a lost wake-up problem caused by mca_cannibalize_lock dmaengine: mediatek: hsdma_probe: fixed a memory leak when devm_request_irq fails RDMA/qedr: Fix potential use after free RDMA/i40iw: Fix potential use after free fix dget_parent() fastpath race xfs: fix attr leaf header freemap.size underflow RDMA/iw_cgxb4: Fix an error handling path in 'c4iw_connect()' ubi: Fix producing anchor PEBs mmc: core: Fix size overflow for mmc partitions gfs2: clean up iopen glock mess in gfs2_create_inode scsi: pm80xx: Cleanup command when a reset times out debugfs: Fix !DEBUG_FS debugfs_create_automount CIFS: Properly process SMB3 lease breaks ASoC: max98090: remove msleep in PLL unlocked workaround kernel/sys.c: avoid copying possible padding bytes in copy_to_user KVM: arm/arm64: vgic: Fix potential double free dist->spis in __kvm_vgic_destroy() xfs: fix log reservation overflows when allocating large rt extents neigh_stat_seq_next() should increase position index rt_cpu_seq_next should increase position index ipv6_route_seq_next should increase position index seqlock: Require WRITE_ONCE surrounding raw_seqcount_barrier media: ti-vpe: cal: Restrict DMA to avoid memory corruption sctp: move trace_sctp_probe_path into sctp_outq_sack ACPI: EC: Reference count query handlers under lock scsi: ufs: Make ufshcd_add_command_trace() easier to read scsi: ufs: Fix a race condition in the tracing code dmaengine: zynqmp_dma: fix burst length configuration s390/cpum_sf: Use kzalloc and minor changes powerpc/eeh: Only dump stack once if an MMIO loop is detected Bluetooth: btrtl: Use kvmalloc for FW allocations tracing: Set kernel_stack's caller size properly ARM: 8948/1: Prevent OOB access in stacktrace ar5523: Add USB ID of SMCWUSBT-G2 wireless adapter ceph: ensure we have a new cap before continuing in fill_inode selftests/ftrace: fix glob selftest tools/power/x86/intel_pstate_tracer: changes for python 3 compatibility Bluetooth: Fix refcount use-after-free issue mm/swapfile.c: swap_next should increase position index mm: pagewalk: fix termination condition in walk_pte_range() Bluetooth: prefetch channel before killing sock KVM: fix overflow of zero page refcount with ksm running ALSA: hda: Clear RIRB status before reading WP skbuff: fix a data race in skb_queue_len() audit: CONFIG_CHANGE don't log internal bookkeeping as an event selinux: sel_avc_get_stat_idx should increase position index scsi: lpfc: Fix RQ buffer leakage when no IOCBs available scsi: lpfc: Fix coverity errors in fmdi attribute handling drm/omap: fix possible object reference leak clk: stratix10: use do_div() for 64-bit calculation crypto: chelsio - This fixes the kernel panic which occurs during a libkcapi test mt76: clear skb pointers from rx aggregation reorder buffer during cleanup ALSA: usb-audio: Don't create a mixer element with bogus volume range perf test: Fix test trace+probe_vfs_getname.sh on s390 RDMA/rxe: Fix configuration of atomic queue pair attributes KVM: x86: fix incorrect comparison in trace event dmaengine: stm32-mdma: use vchan_terminate_vdesc() in .terminate_all media: staging/imx: Missing assignment in imx_media_capture_device_register() x86/pkeys: Add check for pkey "overflow" bpf: Remove recursion prevention from rcu free callback dmaengine: stm32-dma: use vchan_terminate_vdesc() in .terminate_all dmaengine: tegra-apb: Prevent race conditions on channel's freeing drm/amd/display: dal_ddc_i2c_payloads_create can fail causing panic firmware: arm_sdei: Use cpus_read_lock() to avoid races with cpuhp random: fix data races at timer_rand_state bus: hisi_lpc: Fixup IO ports addresses to avoid use-after-free in host removal media: go7007: Fix URB type for interrupt handling Bluetooth: guard against controllers sending zero'd events timekeeping: Prevent 32bit truncation in scale64_check_overflow() ext4: fix a data race at inode->i_disksize perf jevents: Fix leak of mapfile memory mm: avoid data corruption on CoW fault into PFN-mapped VMA drm/amdgpu: increase atombios cmd timeout drm/amd/display: Stop if retimer is not available ath10k: use kzalloc to read for ath10k_sdio_hif_diag_read scsi: aacraid: Disabling TM path and only processing IOP reset Bluetooth: L2CAP: handle l2cap config request during open state media: tda10071: fix unsigned sign extension overflow xfs: don't ever return a stale pointer from __xfs_dir3_free_read xfs: mark dir corrupt when lookup-by-hash fails ext4: mark block bitmap corrupted when found instead of BUGON tpm: ibmvtpm: Wait for buffer to be set before proceeding rtc: sa1100: fix possible race condition rtc: ds1374: fix possible race condition nfsd: Don't add locks to closed or closing open stateids RDMA/cm: Remove a race freeing timewait_info KVM: PPC: Book3S HV: Treat TM-related invalid form instructions on P9 like the valid ones drm/msm: fix leaks if initialization fails drm/msm/a5xx: Always set an OPP supported hardware value tracing: Use address-of operator on section symbols thermal: rcar_thermal: Handle probe error gracefully perf parse-events: Fix 3 use after frees found with clang ASAN serial: 8250_port: Don't service RX FIFO if throttled serial: 8250_omap: Fix sleeping function called from invalid context during probe serial: 8250: 8250_omap: Terminate DMA before pushing data on RX timeout perf cpumap: Fix snprintf overflow check cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_work_fn tools: gpio-hammer: Avoid potential overflow in main nvme-multipath: do not reset on unknown status nvme: Fix controller creation races with teardown flow RDMA/rxe: Set sys_image_guid to be aligned with HW IB devices scsi: hpsa: correct race condition in offload enabled SUNRPC: Fix a potential buffer overflow in 'svc_print_xprts()' svcrdma: Fix leak of transport addresses PCI: Use ioremap(), not phys_to_virt() for platform ROM ubifs: Fix out-of-bounds memory access caused by abnormal value of node_len ALSA: usb-audio: Fix case when USB MIDI interface has more than one extra endpoint descriptor PCI: pciehp: Fix MSI interrupt race NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests() mm/kmemleak.c: use address-of operator on section symbols mm/filemap.c: clear page error before actual read mm/vmscan.c: fix data races using kswapd_classzone_idx nvmet-rdma: fix double free of rdma queue mm/mmap.c: initialize align_offset explicitly for vm_unmapped_area scsi: qedi: Fix termination timeouts in session logout serial: uartps: Wait for tx_empty in console setup KVM: Remove CREATE_IRQCHIP/SET_PIT2 race bdev: Reduce time holding bd_mutex in sync in blkdev_close() drivers: char: tlclk.c: Avoid data race between init and interrupt handler KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi() net: openvswitch: use u64 for meter bucket scsi: aacraid: Fix error handling paths in aac_probe_one() staging:r8188eu: avoid skb_clone for amsdu to msdu conversion sparc64: vcc: Fix error return code in vcc_probe() arm64: cpufeature: Relax checks for AArch32 support at EL[0-2] dt-bindings: sound: wm8994: Correct required supplies based on actual implementaion atm: fix a memory leak of vcc->user_back perf mem2node: Avoid double free related to realloc power: supply: max17040: Correct voltage reading phy: samsung: s5pv210-usb2: Add delay after reset Bluetooth: Handle Inquiry Cancel error after Inquiry Complete USB: EHCI: ehci-mv: fix error handling in mv_ehci_probe() tipc: fix memory leak in service subscripting tty: serial: samsung: Correct clock selection logic ALSA: hda: Fix potential race in unsol event handler powerpc/traps: Make unrecoverable NMIs die instead of panic fuse: don't check refcount after stealing page USB: EHCI: ehci-mv: fix less than zero comparison of an unsigned int scsi: cxlflash: Fix error return code in cxlflash_probe() arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register e1000: Do not perform reset in reset_task if we are already down drm/nouveau/debugfs: fix runtime pm imbalance on error drm/nouveau: fix runtime pm imbalance on error drm/nouveau/dispnv50: fix runtime pm imbalance on error printk: handle blank console arguments passed in. usb: dwc3: Increase timeout for CmdAct cleared by device controller btrfs: don't force read-only after error in drop snapshot vfio/pci: fix memory leaks of eventfd ctx perf evsel: Fix 2 memory leaks perf trace: Fix the selection for architectures to generate the errno name tables perf stat: Fix duration_time value for higher intervals perf util: Fix memory leak of prefix_if_not_in perf metricgroup: Free metric_events on error perf kcore_copy: Fix module map when there are no modules loaded ASoC: img-i2s-out: Fix runtime PM imbalance on error wlcore: fix runtime pm imbalance in wl1271_tx_work wlcore: fix runtime pm imbalance in wlcore_regdomain_config mtd: rawnand: omap_elm: Fix runtime PM imbalance on error PCI: tegra: Fix runtime PM imbalance on error ceph: fix potential race in ceph_check_caps mm/swap_state: fix a data race in swapin_nr_pages rapidio: avoid data race between file operation callbacks and mport_cdev_add(). mtd: parser: cmdline: Support MTD names containing one or more colons x86/speculation/mds: Mark mds_user_clear_cpu_buffers() __always_inline vfio/pci: Clear error and request eventfd ctx after releasing cifs: Fix double add page to memcg when cifs_readpages nvme: fix possible deadlock when I/O is blocked scsi: libfc: Handling of extra kref scsi: libfc: Skip additional kref updating work event selftests/x86/syscall_nt: Clear weird flags after each test vfio/pci: fix racy on error and request eventfd ctx btrfs: qgroup: fix data leak caused by race between writeback and truncate ubi: fastmap: Free unused fastmap anchor peb during detach perf parse-events: Use strcmp() to compare the PMU name net: openvswitch: use div_u64() for 64-by-32 divisions nvme: explicitly update mpath disk capacity on revalidation ASoC: wm8994: Skip setting of the WM8994_MICBIAS register for WM1811 ASoC: wm8994: Ensure the device is resumed in wm89xx_mic_detect functions ASoC: Intel: bytcr_rt5640: Add quirk for MPMAN Converter9 2-in-1 RISC-V: Take text_mutex in ftrace_init_nop() s390/init: add missing __init annotations lockdep: fix order in trace_hardirqs_off_caller() drm/amdkfd: fix a memory leak issue i2c: core: Call i2c_acpi_install_space_handler() before i2c_acpi_register_devices() objtool: Fix noreturn detection for ignored functions ieee802154: fix one possible memleak in ca8210_dev_com_init ieee802154/adf7242: check status of adf7242_read_reg clocksource/drivers/h8300_timer8: Fix wrong return value in h8300_8timer_init() mwifiex: Increase AES key storage size to 256 bits batman-adv: bla: fix type misuse for backbone_gw hash indexing atm: eni: fix the missed pci_disable_device() for eni_init_one() batman-adv: mcast/TT: fix wrongly dropped or rerouted packets mac802154: tx: fix use-after-free bpf: Fix clobbering of r2 in bpf_gen_ld_abs drm/vc4/vc4_hdmi: fill ASoC card owner net: qed: RDMA personality shouldn't fail VF load drm/sun4i: sun8i-csc: Secondary CSC register correction batman-adv: Add missing include for in_interrupt() batman-adv: mcast: fix duplicate mcast packets in BLA backbone from mesh batman-adv: mcast: fix duplicate mcast packets from BLA backbone to mesh bpf: Fix a rcu warning for bpffs map pretty-print ALSA: asihpi: fix iounmap in error handler regmap: fix page selection for noinc reads MIPS: Add the missing 'CPU_1074K' into __get_cpu_type() KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE KVM: SVM: Add a dedicated INVD intercept routine tracing: fix double free s390/dasd: Fix zero write for FBA devices kprobes: Fix to check probe enabled before disarm_kprobe_ftrace() mm, THP, swap: fix allocating cluster for swapfile by mistake s390/zcrypt: Fix ZCRYPT_PERDEV_REQCNT ioctl kprobes: Fix compiler warning for !CONFIG_KPROBES_ON_FTRACE ata: define AC_ERR_OK ata: make qc_prep return ata_completion_errors ata: sata_mv, avoid trigerrable BUG_ON KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch Linux 4.19.149 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Idfc1b35ec63b4b464aeb6e32709102bee0efc872 |
||
|
|
f3e8ed3d33 |
mm, THP, swap: fix allocating cluster for swapfile by mistake
commit 41663430588c737dd735bad5a0d1ba325dcabd59 upstream.
SWP_FS is used to make swap_{read,write}page() go through the
filesystem, and it's only used for swap files over NFS. So, !SWP_FS
means non NFS for now, it could be either file backed or device backed.
Something similar goes with legacy SWP_FILE.
So in order to achieve the goal of the original patch, SWP_BLKDEV should
be used instead.
FS corruption can be observed with SSD device + XFS + fragmented
swapfile due to CONFIG_THP_SWAP=y.
I reproduced the issue with the following details:
Environment:
QEMU + upstream kernel + buildroot + NVMe (2 GB)
Kernel config:
CONFIG_BLK_DEV_NVME=y
CONFIG_THP_SWAP=y
Some reproducible steps:
mkfs.xfs -f /dev/nvme0n1
mkdir /tmp/mnt
mount /dev/nvme0n1 /tmp/mnt
bs="32k"
sz="1024m" # doesn't matter too much, I also tried 16m
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -F -S 0 -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fsync" /tmp/mnt/sw
mkswap /tmp/mnt/sw
swapon /tmp/mnt/sw
stress --vm 2 --vm-bytes 600M # doesn't matter too much as well
Symptoms:
- FS corruption (e.g. checksum failure)
- memory corruption at: 0xd2808010
- segfault
Fixes:
|
||
|
|
8cc3afd53d |
mm/swap_state: fix a data race in swapin_nr_pages
[ Upstream commit d6c1f098f2a7ba62627c9bc17cda28f534ef9e4a ] "prev_offset" is a static variable in swapin_nr_pages() that can be accessed concurrently with only mmap_sem held in read mode as noticed by KCSAN, BUG: KCSAN: data-race in swap_cluster_readahead / swap_cluster_readahead write to 0xffffffff92763830 of 8 bytes by task 14795 on cpu 17: swap_cluster_readahead+0x2a6/0x5e0 swapin_readahead+0x92/0x8dc do_swap_page+0x49b/0xf20 __handle_mm_fault+0xcfb/0xd70 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x715 page_fault+0x34/0x40 1 lock held by (dnf)/14795: #0: ffff897bd2e98858 (&mm->mmap_sem#2){++++}-{3:3}, at: do_page_fault+0x143/0x715 do_user_addr_fault at arch/x86/mm/fault.c:1405 (inlined by) do_page_fault at arch/x86/mm/fault.c:1535 irq event stamp: 83493 count_memcg_event_mm+0x1a6/0x270 count_memcg_event_mm+0x119/0x270 __do_softirq+0x365/0x589 irq_exit+0xa2/0xc0 read to 0xffffffff92763830 of 8 bytes by task 1 on cpu 22: swap_cluster_readahead+0xfd/0x5e0 swapin_readahead+0x92/0x8dc do_swap_page+0x49b/0xf20 __handle_mm_fault+0xcfb/0xd70 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x715 page_fault+0x34/0x40 1 lock held by systemd/1: #0: ffff897c38f14858 (&mm->mmap_sem#2){++++}-{3:3}, at: do_page_fault+0x143/0x715 irq event stamp: 43530289 count_memcg_event_mm+0x1a6/0x270 count_memcg_event_mm+0x119/0x270 __do_softirq+0x365/0x589 irq_exit+0xa2/0xc0 Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Marco Elver <elver@google.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200402213748.2237-1-cai@lca.pw Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
6bee7991f6 |
mm/mmap.c: initialize align_offset explicitly for vm_unmapped_area
[ Upstream commit 09ef5283fd96ac424ef0e569626f359bf9ab86c9 ]
On passing requirement to vm_unmapped_area, arch_get_unmapped_area and
arch_get_unmapped_area_topdown did not set align_offset. Internally on
both unmapped_area and unmapped_area_topdown, if info->align_mask is 0,
then info->align_offset was meaningless.
But commit df529cabb7a2 ("mm: mmap: add trace point of
vm_unmapped_area") always prints info->align_offset even though it is
uninitialized.
Fix this uninitialized value issue by setting it to 0 explicitly.
Before:
vm_unmapped_area: addr=0x755b155000 err=0 total_vm=0x15aaf0 flags=0x1 len=0x109000 lo=0x8000 hi=0x75eed48000 mask=0x0 ofs=0x4022
After:
vm_unmapped_area: addr=0x74a4ca1000 err=0 total_vm=0x168ab1 flags=0x1 len=0x9000 lo=0x8000 hi=0x753d94b000 mask=0x0 ofs=0x0
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20200409094035.19457-1-jaewon31.kim@samsung.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
b73c744019 |
mm/vmscan.c: fix data races using kswapd_classzone_idx
[ Upstream commit 5644e1fbbfe15ad06785502bbfe5751223e5841d ] pgdat->kswapd_classzone_idx could be accessed concurrently in wakeup_kswapd(). Plain writes and reads without any lock protection result in data races. Fix them by adding a pair of READ|WRITE_ONCE() as well as saving a branch (compilers might well optimize the original code in an unintentional way anyway). While at it, also take care of pgdat->kswapd_order and non-kswapd threads in allow_direct_reclaim(). The data races were reported by KCSAN, BUG: KCSAN: data-race in wakeup_kswapd / wakeup_kswapd write to 0xffff9f427ffff2dc of 4 bytes by task 7454 on cpu 13: wakeup_kswapd+0xf1/0x400 wakeup_kswapd at mm/vmscan.c:3967 wake_all_kswapds+0x59/0xc0 wake_all_kswapds at mm/page_alloc.c:4241 __alloc_pages_slowpath+0xdcc/0x1290 __alloc_pages_slowpath at mm/page_alloc.c:4512 __alloc_pages_nodemask+0x3bb/0x450 alloc_pages_vma+0x8a/0x2c0 do_anonymous_page+0x16e/0x6f0 __handle_mm_fault+0xcd5/0xd40 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 1 lock held by mtest01/7454: #0: ffff9f425afe8808 (&mm->mmap_sem#2){++++}, at: do_page_fault+0x143/0x6f9 do_user_addr_fault at arch/x86/mm/fault.c:1405 (inlined by) do_page_fault at arch/x86/mm/fault.c:1539 irq event stamp: 6944085 count_memcg_event_mm+0x1a6/0x270 count_memcg_event_mm+0x119/0x270 __do_softirq+0x34c/0x57c irq_exit+0xa2/0xc0 read to 0xffff9f427ffff2dc of 4 bytes by task 7472 on cpu 38: wakeup_kswapd+0xc8/0x400 wake_all_kswapds+0x59/0xc0 __alloc_pages_slowpath+0xdcc/0x1290 __alloc_pages_nodemask+0x3bb/0x450 alloc_pages_vma+0x8a/0x2c0 do_anonymous_page+0x16e/0x6f0 __handle_mm_fault+0xcd5/0xd40 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 1 lock held by mtest01/7472: #0: ffff9f425a9ac148 (&mm->mmap_sem#2){++++}, at: do_page_fault+0x143/0x6f9 irq event stamp: 6793561 count_memcg_event_mm+0x1a6/0x270 count_memcg_event_mm+0x119/0x270 __do_softirq+0x34c/0x57c irq_exit+0xa2/0xc0 BUG: KCSAN: data-race in kswapd / wakeup_kswapd write to 0xffff90973ffff2dc of 4 bytes by task 820 on cpu 6: kswapd+0x27c/0x8d0 kthread+0x1e0/0x200 ret_from_fork+0x27/0x50 read to 0xffff90973ffff2dc of 4 bytes by task 6299 on cpu 0: wakeup_kswapd+0xf3/0x450 wake_all_kswapds+0x59/0xc0 __alloc_pages_slowpath+0xdcc/0x1290 __alloc_pages_nodemask+0x3bb/0x450 alloc_pages_vma+0x8a/0x2c0 do_anonymous_page+0x170/0x700 __handle_mm_fault+0xc9f/0xd00 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Marco Elver <elver@google.com> Cc: Matthew Wilcox <willy@infradead.org> Link: http://lkml.kernel.org/r/1582749472-5171-1-git-send-email-cai@lca.pw Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
cebefe4f6f |
mm/filemap.c: clear page error before actual read
[ Upstream commit faffdfa04fa11ccf048cebdde73db41ede0679e0 ]
Mount failure issue happens under the scenario: Application forked dozens
of threads to mount the same number of cramfs images separately in docker,
but several mounts failed with high probability. Mount failed due to the
checking result of the page(read from the superblock of loop dev) is not
uptodate after wait_on_page_locked(page) returned in function cramfs_read:
wait_on_page_locked(page);
if (!PageUptodate(page)) {
...
}
The reason of the checking result of the page not uptodate: systemd-udevd
read the loopX dev before mount, because the status of loopX is Lo_unbound
at this time, so loop_make_request directly trigger the calling of io_end
handler end_buffer_async_read, which called SetPageError(page). So It
caused the page can't be set to uptodate in function
end_buffer_async_read:
if(page_uptodate && !PageError(page)) {
SetPageUptodate(page);
}
Then mount operation is performed, it used the same page which is just
accessed by systemd-udevd above, Because this page is not uptodate, it
will launch a actual read via submit_bh, then wait on this page by calling
wait_on_page_locked(page). When the I/O of the page done, io_end handler
end_buffer_async_read is called, because no one cleared the page
error(during the whole read path of mount), which is caused by
systemd-udevd reading, so this page is still in "PageError" status, which
can't be set to uptodate in function end_buffer_async_read, then caused
mount failure.
But sometimes mount succeed even through systemd-udeved read loopX dev
just before, The reason is systemd-udevd launched other loopX read just
between step 3.1 and 3.2, the steps as below:
1, loopX dev default status is Lo_unbound;
2, systemd-udved read loopX dev (page is set to PageError);
3, mount operation
1) set loopX status to Lo_bound;
==>systemd-udevd read loopX dev<==
2) read loopX dev(page has no error)
3) mount succeed
As the loopX dev status is set to Lo_bound after step 3.1, so the other
loopX dev read by systemd-udevd will go through the whole I/O stack, part
of the call trace as below:
SYS_read
vfs_read
do_sync_read
blkdev_aio_read
generic_file_aio_read
do_generic_file_read:
ClearPageError(page);
mapping->a_ops->readpage(filp, page);
here, mapping->a_ops->readpage() is blkdev_readpage. In latest kernel,
some function name changed, the call trace as below:
blkdev_read_iter
generic_file_read_iter
generic_file_buffered_read:
/*
* A previous I/O error may have been due to temporary
* failures, eg. mutipath errors.
* Pg_error will be set again if readpage fails.
*/
ClearPageError(page);
/* Start the actual read. The read will unlock the page*/
error=mapping->a_ops->readpage(flip, page);
We can see ClearPageError(page) is called before the actual read,
then the read in step 3.2 succeed.
This patch is to add the calling of ClearPageError just before the actual
read of read path of cramfs mount. Without the patch, the call trace as
below when performing cramfs mount:
do_mount
cramfs_read
cramfs_blkdev_read
read_cache_page
do_read_cache_page:
filler(data, page);
or
mapping->a_ops->readpage(data, page);
With the patch, the call trace as below when performing mount:
do_mount
cramfs_read
cramfs_blkdev_read
read_cache_page:
do_read_cache_page:
ClearPageError(page); <== new add
filler(data, page);
or
mapping->a_ops->readpage(data, page);
With the patch, mount operation trigger the calling of
ClearPageError(page) before the actual read, the page has no error if no
additional page error happen when I/O done.
Signed-off-by: Xianting Tian <xianting_tian@126.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: <yubin@h3c.com>
Link: http://lkml.kernel.org/r/1583318844-22971-1-git-send-email-xianting_tian@126.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
afe001488e |
mm/kmemleak.c: use address-of operator on section symbols
[ Upstream commit b0d14fc43d39203ae025f20ef4d5d25d9ccf4be1 ]
Clang warns:
mm/kmemleak.c:1955:28: warning: array comparison always evaluates to a constant [-Wtautological-compare]
if (__start_ro_after_init < _sdata || __end_ro_after_init > _edata)
^
mm/kmemleak.c:1955:60: warning: array comparison always evaluates to a constant [-Wtautological-compare]
if (__start_ro_after_init < _sdata || __end_ro_after_init > _edata)
These are not true arrays, they are linker defined symbols, which are just
addresses. Using the address of operator silences the warning and does
not change the resulting assembly with either clang/ld.lld or gcc/ld
(tested with diff + objdump -Dr).
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/895
Link: http://lkml.kernel.org/r/20200220051551.44000-1-natechancellor@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
2b294ac325 |
mm: avoid data corruption on CoW fault into PFN-mapped VMA
[ Upstream commit c3e5ea6ee574ae5e845a40ac8198de1fb63bb3ab ]
Jeff Moyer has reported that one of xfstests triggers a warning when run
on DAX-enabled filesystem:
WARNING: CPU: 76 PID: 51024 at mm/memory.c:2317 wp_page_copy+0xc40/0xd50
...
wp_page_copy+0x98c/0xd50 (unreliable)
do_wp_page+0xd8/0xad0
__handle_mm_fault+0x748/0x1b90
handle_mm_fault+0x120/0x1f0
__do_page_fault+0x240/0xd70
do_page_fault+0x38/0xd0
handle_page_fault+0x10/0x30
The warning happens on failed __copy_from_user_inatomic() which tries to
copy data into a CoW page.
This happens because of race between MADV_DONTNEED and CoW page fault:
CPU0 CPU1
handle_mm_fault()
do_wp_page()
wp_page_copy()
do_wp_page()
madvise(MADV_DONTNEED)
zap_page_range()
zap_pte_range()
ptep_get_and_clear_full()
<TLB flush>
__copy_from_user_inatomic()
sees empty PTE and fails
WARN_ON_ONCE(1)
clear_page()
The solution is to re-try __copy_from_user_inatomic() under PTL after
checking that PTE is matches the orig_pte.
The second copy attempt can still fail, like due to non-readable PTE, but
there's nothing reasonable we can do about, except clearing the CoW page.
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Justin He <Justin.He@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Link: http://lkml.kernel.org/r/20200218154151.13349-1-kirill.shutemov@linux.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
f9cb6b6124 |
mm: pagewalk: fix termination condition in walk_pte_range()
[ Upstream commit c02a98753e0a36ba65a05818626fa6adeb4e7c97 ] If walk_pte_range() is called with a 'end' argument that is beyond the last page of memory (e.g. ~0UL) then the comparison between 'addr' and 'end' will always fail and the loop will be infinite. Instead change the comparison to >= while accounting for overflow. Link: http://lkml.kernel.org/r/20191218162402.45610-15-steven.price@arm.com Signed-off-by: Steven Price <steven.price@arm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: "Liang, Kan" <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Zong Li <zong.li@sifive.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
52f5a09ab7 |
mm/swapfile.c: swap_next should increase position index
[ Upstream commit 10c8d69f314d557d94d74ec492575ae6a4f1eb1c ] If seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. In Aug 2018 NeilBrown noticed commit |
||
|
|
8579a04403 |
mm: fix double page fault on arm64 if PTE_AF is cleared
[ Upstream commit 83d116c53058d505ddef051e90ab27f57015b025 ]
When we tested pmdk unit test [1] vmmalloc_fork TEST3 on arm64 guest, there
will be a double page fault in __copy_from_user_inatomic of cow_user_page.
To reproduce the bug, the cmd is as follows after you deployed everything:
make -C src/test/vmmalloc_fork/ TEST_TIME=60m check
Below call trace is from arm64 do_page_fault for debugging purpose:
[ 110.016195] Call trace:
[ 110.016826] do_page_fault+0x5a4/0x690
[ 110.017812] do_mem_abort+0x50/0xb0
[ 110.018726] el1_da+0x20/0xc4
[ 110.019492] __arch_copy_from_user+0x180/0x280
[ 110.020646] do_wp_page+0xb0/0x860
[ 110.021517] __handle_mm_fault+0x994/0x1338
[ 110.022606] handle_mm_fault+0xe8/0x180
[ 110.023584] do_page_fault+0x240/0x690
[ 110.024535] do_mem_abort+0x50/0xb0
[ 110.025423] el0_da+0x20/0x24
The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
[ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003,
pmd=000000023d4b3003, pte=360000298607bd3
As told by Catalin: "On arm64 without hardware Access Flag, copying from
user will fail because the pte is old and cannot be marked young. So we
always end up with zeroed page after fork() + CoW for pfn mappings. we
don't always have a hardware-managed access flag on arm64."
This patch fixes it by calling pte_mkyoung. Also, the parameter is
changed because vmf should be passed to cow_user_page()
Add a WARN_ON_ONCE when __copy_from_user_inatomic() returns error
in case there can be some obscure use-case (by Kirill).
[1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
Signed-off-by: Jia He <justin.he@arm.com>
Reported-by: Yibo Cai <Yibo.Cai@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
5d5e582aab |
Merge 4.19.148 into android-4.19-stable
Changes in 4.19.148 af_key: pfkey_dump needs parameter validation KVM: fix memory leak in kvm_io_bus_unregister_dev() kprobes: fix kill kprobe which has been marked as gone mm/thp: fix __split_huge_pmd_locked() for migration PMD cxgb4: Fix offset when clearing filter byte counters geneve: add transport ports in route lookup for geneve hdlc_ppp: add range checks in ppp_cp_parse_cr() ip: fix tos reflection in ack and reset packets ipv6: avoid lockdep issue in fib6_del() net: DCB: Validate DCB_ATTR_DCB_BUFFER argument net: dsa: rtl8366: Properly clear member config net: ipv6: fix kconfig dependency warning for IPV6_SEG6_HMAC net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc nfp: use correct define to return NONE fec tipc: Fix memory leak in tipc_group_create_member() tipc: fix shutdown() of connection oriented socket tipc: use skb_unshare() instead in tipc_buf_append() bnxt_en: return proper error codes in bnxt_show_temp bnxt_en: Protect bnxt_set_eee() and bnxt_set_pauseparam() with mutex. net: phy: Avoid NPD upon phy_detach() when driver is unbound net: qrtr: check skb_put_padto() return value net: add __must_check to skb_put_padto() ipv4: Update exception handling for multipath routes via same device MAINTAINERS: add CLANG/LLVM BUILD SUPPORT info kbuild: add OBJSIZE variable for the size tool Documentation/llvm: add documentation on building w/ Clang/LLVM Documentation/llvm: fix the name of llvm-size net: wan: wanxl: use allow to pass CROSS_COMPILE_M68k for rebuilding firmware net: wan: wanxl: use $(M68KCC) instead of $(M68KAS) for rebuilding firmware x86/boot: kbuild: allow readelf executable to be specified kbuild: remove AS variable kbuild: replace AS=clang with LLVM_IAS=1 kbuild: support LLVM=1 to switch the default tools to Clang/LLVM mm: memcg: fix memcg reclaim soft lockup tcp_bbr: refactor bbr_target_cwnd() for general inflight provisioning tcp_bbr: adapt cwnd based on ack aggregation estimation serial: 8250: Avoid error message on reprobe Linux 4.19.148 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I5077093f58ae882547e3b23c5bbd4dfe78116071 |
||
|
|
1aa7a9e5ee |
mm: memcg: fix memcg reclaim soft lockup
commit e3336cab2579012b1e72b5265adf98e2d6e244ad upstream. We've met softlockup with "CONFIG_PREEMPT_NONE=y", when the target memcg doesn't have any reclaimable memory. It can be easily reproduced as below: watchdog: BUG: soft lockup - CPU#0 stuck for 111s![memcg_test:2204] CPU: 0 PID: 2204 Comm: memcg_test Not tainted 5.9.0-rc2+ #12 Call Trace: shrink_lruvec+0x49f/0x640 shrink_node+0x2a6/0x6f0 do_try_to_free_pages+0xe9/0x3e0 try_to_free_mem_cgroup_pages+0xef/0x1f0 try_charge+0x2c1/0x750 mem_cgroup_charge+0xd7/0x240 __add_to_page_cache_locked+0x2fd/0x370 add_to_page_cache_lru+0x4a/0xc0 pagecache_get_page+0x10b/0x2f0 filemap_fault+0x661/0xad0 ext4_filemap_fault+0x2c/0x40 __do_fault+0x4d/0xf9 handle_mm_fault+0x1080/0x1790 It only happens on our 1-vcpu instances, because there's no chance for oom reaper to run to reclaim the to-be-killed process. Add a cond_resched() at the upper shrink_node_memcgs() to solve this issue, this will mean that we will get a scheduling point for each memcg in the reclaimed hierarchy without any dependency on the reclaimable memory in that memcg thus making it more predictable. Suggested-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: http://lkml.kernel.org/r/1598495549-67324-1-git-send-email-xlpang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Julius Hemanth Pitti <jpitti@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
ec56646e3b |
mm/thp: fix __split_huge_pmd_locked() for migration PMD
[ Upstream commit ec0abae6dcdf7ef88607c869bf35a4b63ce1b370 ]
A migrating transparent huge page has to already be unmapped. Otherwise,
the page could be modified while it is being copied to a new page and data
could be lost. The function __split_huge_pmd() checks for a PMD migration
entry before calling __split_huge_pmd_locked() leading one to think that
__split_huge_pmd_locked() can handle splitting a migrating PMD.
However, the code always increments the page->_mapcount and adjusts the
memory control group accounting assuming the page is mapped.
Also, if the PMD entry is a migration PMD entry, the call to
is_huge_zero_pmd(*pmd) is incorrect because it calls pmd_pfn(pmd) instead
of migration_entry_to_pfn(pmd_to_swp_entry(pmd)). Fix these problems by
checking for a PMD migration entry.
Fixes:
|
||
|
|
0b8c61c48e |
Merge 4.19.147 into android-4.19-stable
Changes in 4.19.147 dsa: Allow forwarding of redirected IGMP traffic scsi: qla2xxx: Update rscn_rcvd field to more meaningful scan_needed scsi: qla2xxx: Move rport registration out of internal work_list scsi: qla2xxx: Reduce holding sess_lock to prevent CPU lock-up gfs2: initialize transaction tr_ailX_lists earlier RDMA/bnxt_re: Restrict the max_gids to 256 net: handle the return value of pskb_carve_frag_list() correctly hv_netvsc: Remove "unlikely" from netvsc_select_queue NFSv4.1 handle ERR_DELAY error reclaiming locking state on delegation recall scsi: pm8001: Fix memleak in pm8001_exec_internal_task_abort scsi: libfc: Fix for double free() scsi: lpfc: Fix FLOGI/PLOGI receive race condition in pt2pt discovery regulator: pwm: Fix machine constraints application spi: spi-loopback-test: Fix out-of-bounds read NFS: Zero-stateid SETATTR should first return delegation SUNRPC: stop printk reading past end of string rapidio: Replace 'select' DMAENGINES 'with depends on' openrisc: Fix cache API compile issue when not inlining nvme-fc: cancel async events before freeing event struct nvme-rdma: cancel async events before freeing event struct f2fs: fix indefinite loop scanning for free nid f2fs: Return EOF on unaligned end of file DIO read i2c: algo: pca: Reapply i2c bus settings after reset spi: Fix memory leak on splited transfers KVM: MIPS: Change the definition of kvm type clk: davinci: Use the correct size when allocating memory clk: rockchip: Fix initialization of mux_pll_src_4plls_p ASoC: qcom: Set card->owner to avoid warnings Drivers: hv: vmbus: Add timeout to vmbus_wait_for_unload perf test: Fix the "signal" test inline assembly MIPS: SNI: Fix MIPS_L1_CACHE_SHIFT perf test: Free formats for perf pmu parse test fbcon: Fix user font detection test at fbcon_resize(). MIPS: SNI: Fix spurious interrupts drm/mediatek: Add exception handing in mtk_drm_probe() if component init fail drm/mediatek: Add missing put_device() call in mtk_hdmi_dt_parse_pdata() USB: quirks: Add USB_QUIRK_IGNORE_REMOTE_WAKEUP quirk for BYD zhaoxin notebook USB: UAS: fix disconnect by unplugging a hub usblp: fix race between disconnect() and read() i2c: i801: Fix resume bug Revert "ALSA: hda - Fix silent audio output and corrupted input on MSI X570-A PRO" percpu: fix first chunk size calculation for populated bitmap Input: trackpoint - add new trackpoint variant IDs Input: i8042 - add Entroware Proteus EL07R4 to nomux and reset lists serial: 8250_pci: Add Realtek 816a and 816b x86/boot/compressed: Disable relocation relaxation ehci-hcd: Move include to keep CRC stable powerpc/dma: Fix dma_map_ops::get_required_mask x86/defconfig: Enable CONFIG_USB_XHCI_HCD=y Linux 4.19.147 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I1c512021698db2701c51491a813bec79bda6bbf5 |
||
|
|
5afd52f302 |
percpu: fix first chunk size calculation for populated bitmap
commit b3b33d3c43bbe0177d70653f4e889c78cc37f097 upstream.
Variable populated, which is a member of struct pcpu_chunk, is used as a
unit of size of unsigned long.
However, size of populated is miscounted. So, I fix this minor part.
Fixes:
|
||
|
|
009b982d9c |
Merge 4.19.144 into android-4.19-stable
Changes in 4.19.144 HID: core: Correctly handle ReportSize being zero HID: core: Sanitize event code and type when mapping input perf record/stat: Explicitly call out event modifiers in the documentation scsi: target: tcmu: Fix size in calls to tcmu_flush_dcache_range scsi: target: tcmu: Optimize use of flush_dcache_page tty: serial: qcom_geni_serial: Drop __init from qcom_geni_console_setup drm/msm: add shutdown support for display platform_driver hwmon: (applesmc) check status earlier. nvmet: Disable keep-alive timer when kato is cleared to 0h drm/msm/a6xx: fix gmu start on newer firmware ceph: don't allow setlease on cephfs cpuidle: Fixup IRQ state s390: don't trace preemption in percpu macros xen/xenbus: Fix granting of vmalloc'd memory dmaengine: of-dma: Fix of_dma_router_xlate's of_dma_xlate handling batman-adv: Avoid uninitialized chaddr when handling DHCP batman-adv: Fix own OGM check in aggregated OGMs batman-adv: bla: use netif_rx_ni when not in interrupt context dmaengine: at_hdmac: check return value of of_find_device_by_node() in at_dma_xlate() MIPS: mm: BMIPS5000 has inclusive physical caches MIPS: BMIPS: Also call bmips_cpu_setup() for secondary cores netfilter: nf_tables: add NFTA_SET_USERDATA if not null netfilter: nf_tables: incorrect enum nft_list_attributes definition netfilter: nf_tables: fix destination register zeroing net: hns: Fix memleak in hns_nic_dev_probe net: systemport: Fix memleak in bcm_sysport_probe ravb: Fixed to be able to unload modules net: arc_emac: Fix memleak in arc_mdio_probe dmaengine: pl330: Fix burst length if burst size is smaller than bus width gtp: add GTPA_LINK info to msg sent to userspace bnxt_en: Don't query FW when netif_running() is false. bnxt_en: Check for zero dir entries in NVRAM. bnxt_en: Fix PCI AER error recovery flow bnxt_en: fix HWRM error when querying VF temperature xfs: fix boundary test in xfs_attr_shortform_verify bnxt: don't enable NAPI until rings are ready selftests/bpf: Fix massive output from test_maps netfilter: nfnetlink: nfnetlink_unicast() reports EAGAIN instead of ENOBUFS nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()' perf tools: Correct SNOOPX field offset net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init() fix regression in "epoll: Keep a reference on files added to the check list" net: gemini: Fix another missing clk_disable_unprepare() in probe xfs: fix xfs_bmap_validate_extent_raw when checking attr fork of rt files perf jevents: Fix suspicious code in fixregex() tg3: Fix soft lockup when tg3_reset_task() fails. x86, fakenuma: Fix invalid starting node ID iommu/vt-d: Serialize IOMMU GCMD register modifications thermal: ti-soc-thermal: Fix bogus thermal shutdowns for omap4430 include/linux/log2.h: add missing () around n in roundup_pow_of_two() ext2: don't update mtime on COW faults xfs: don't update mtime on COW faults btrfs: drop path before adding new uuid tree entry vfio/type1: Support faulting PFNMAP vmas vfio-pci: Fault mmaps to enable vma tracking vfio-pci: Invalidate mmaps and block MMIO access on disabled memory btrfs: Remove redundant extent_buffer_get in get_old_root btrfs: Remove extraneous extent_buffer_get from tree_mod_log_rewind btrfs: set the lockdep class for log tree extent buffers uaccess: Add non-pagefault user-space read functions uaccess: Add non-pagefault user-space write function btrfs: fix potential deadlock in the search ioctl net: usb: qmi_wwan: add Telit 0x1050 composition usb: qmi_wwan: add D-Link DWM-222 A2 device ID ALSA: ca0106: fix error code handling ALSA: pcm: oss: Remove superfluous WARN_ON() for mulaw sanity check ALSA: hda/hdmi: always check pin power status in i915 pin fixup ALSA: firewire-digi00x: exclude Avid Adrenaline from detection ALSA: hda - Fix silent audio output and corrupted input on MSI X570-A PRO media: rc: do not access device via sysfs after rc_unregister_device() media: rc: uevent sysfs file races with rc_unregister_device() affs: fix basic permission bits to actually work block: allow for_each_bvec to support zero len bvec libata: implement ATA_HORKAGE_MAX_TRIM_128M and apply to Sandisks dm writecache: handle DAX to partitions on persistent memory correctly dm cache metadata: Avoid returning cmd->bm wild pointer on error dm thin metadata: Avoid returning cmd->bm wild pointer on error mm: slub: fix conversion of freelist_corrupted() KVM: arm64: Add kvm_extable for vaxorcism code KVM: arm64: Defer guest entry when an asynchronous exception is pending KVM: arm64: Survive synchronous exceptions caused by AT instructions KVM: arm64: Set HCR_EL2.PTW to prevent AT taking synchronous exception vfio/pci: Fix SR-IOV VF handling with MMIO blocking checkpatch: fix the usage of capture group ( ... ) mm/hugetlb: fix a race between hugetlb sysctl handlers cfg80211: regulatory: reject invalid hints net: usb: Fix uninit-was-stored issue in asix_read_phy_addr() Linux 4.19.144 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I81d6b3f044fe0dd919d1ece16d131c2185c00bb3 |
||
|
|
221ea9a3da |
mm/hugetlb: fix a race between hugetlb sysctl handlers
commit 17743798d81238ab13050e8e2833699b54e15467 upstream.
There is a race between the assignment of `table->data` and write value
to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
the other thread.
CPU0: CPU1:
proc_sys_write
hugetlb_sysctl_handler proc_sys_call_handler
hugetlb_sysctl_handler_common hugetlb_sysctl_handler
table->data = &tmp; hugetlb_sysctl_handler_common
table->data = &tmp;
proc_doulongvec_minmax
do_proc_doulongvec_minmax sysctl_head_finish
__do_proc_doulongvec_minmax unuse_table
i = table->data;
*i = val; // corrupt CPU1's stack
Fix this by duplicating the `table`, and only update the duplicate of
it. And introduce a helper of proc_hugetlb_doulongvec_minmax() to
simplify the code.
The following oops was seen:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
Code: Bad RIP value.
...
Call Trace:
? set_max_huge_pages+0x3da/0x4f0
? alloc_pool_huge_page+0x150/0x150
? proc_doulongvec_minmax+0x46/0x60
? hugetlb_sysctl_handler_common+0x1c7/0x200
? nr_hugepages_store+0x20/0x20
? copy_fd_bitmaps+0x170/0x170
? hugetlb_sysctl_handler+0x1e/0x20
? proc_sys_call_handler+0x2f1/0x300
? unregister_sysctl_table+0xb0/0xb0
? __fd_install+0x78/0x100
? proc_sys_write+0x14/0x20
? __vfs_write+0x4d/0x90
? vfs_write+0xef/0x240
? ksys_write+0xc0/0x160
? __ia32_sys_read+0x50/0x50
? __close_fd+0x129/0x150
? __x64_sys_write+0x43/0x50
? do_syscall_64+0x6c/0x200
? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes:
|
||
|
|
af2cf2c5a2 |
mm: slub: fix conversion of freelist_corrupted()
commit dc07a728d49cf025f5da2c31add438d839d076c0 upstream.
Commit 52f23478081ae0 ("mm/slub.c: fix corrupted freechain in
deactivate_slab()") suffered an update when picked up from LKML [1].
Specifically, relocating 'freelist = NULL' into 'freelist_corrupted()'
created a no-op statement. Fix it by sticking to the behavior intended
in the original patch [1]. In addition, make freelist_corrupted()
immune to passing NULL instead of &freelist.
The issue has been spotted via static analysis and code review.
[1] https://lore.kernel.org/linux-mm/20200331031450.12182-1-dongli.zhang@oracle.com/
Fixes: 52f23478081ae0 ("mm/slub.c: fix corrupted freechain in deactivate_slab()")
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200824130643.10291-1-erosca@de.adit-jv.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
cfb4721fce |
uaccess: Add non-pagefault user-space write function
[ Upstream commit 1d1585ca0f48fe7ed95c3571f3e4a82b2b5045dc ]
Commit 3d7081822f7f ("uaccess: Add non-pagefault user-space read functions")
missed to add probe write function, therefore factor out a probe_write_common()
helper with most logic of probe_kernel_write() except setting KERNEL_DS, and
add a new probe_user_write() helper so it can be used from BPF side.
Again, on some archs, the user address space and kernel address space can
co-exist and be overlapping, so in such case, setting KERNEL_DS would mean
that the given address is treated as being in kernel address space.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/bpf/9df2542e68141bfa3addde631441ee45503856a8.1572649915.git.daniel@iogearbox.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
61135a9c74 |
uaccess: Add non-pagefault user-space read functions
[ Upstream commit 3d7081822f7f9eab867d9bcc8fd635208ec438e0 ] Add probe_user_read(), strncpy_from_unsafe_user() and strnlen_unsafe_user() which allows caller to access user-space in IRQ context. Current probe_kernel_read() and strncpy_from_unsafe() are not available for user-space memory, because it sets KERNEL_DS while accessing data. On some arch, user address space and kernel address space can be co-exist, but others can not. In that case, setting KERNEL_DS means given address is treated as a kernel address space. Also strnlen_user() is only available from user context since it can sleep if pagefault is enabled. To access user-space memory without pagefault, we need these new functions which sets USER_DS while accessing the data. Link: http://lkml.kernel.org/r/155789869802.26965.4940338412595759063.stgit@devnote2 Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
599bf02de4 |
Merge 4.19.142 into android-4.19-stable
Changes in 4.19.142
drm/vgem: Replace opencoded version of drm_gem_dumb_map_offset()
perf probe: Fix memory leakage when the probe point is not found
khugepaged: khugepaged_test_exit() check mmget_still_valid()
khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
btrfs: export helpers for subvolume name/id resolution
btrfs: don't show full path of bind mounts in subvol=
btrfs: Move free_pages_out label in inline extent handling branch in compress_file_range
btrfs: inode: fix NULL pointer dereference if inode doesn't need compression
btrfs: sysfs: use NOFS for device creation
romfs: fix uninitialized memory leak in romfs_dev_read()
kernel/relay.c: fix memleak on destroy relay channel
mm: include CMA pages in lowmem_reserve at boot
mm, page_alloc: fix core hung in free_pcppages_bulk()
ext4: fix checking of directory entry validity for inline directories
jbd2: add the missing unlock_buffer() in the error path of jbd2_write_superblock()
scsi: zfcp: Fix use-after-free in request timeout handlers
drm/amd/display: fix pow() crashing when given base 0
kthread: Do not preempt current task if it is going to call schedule()
spi: Prevent adding devices below an unregistering controller
scsi: ufs: Add DELAY_BEFORE_LPM quirk for Micron devices
scsi: target: tcmu: Fix crash in tcmu_flush_dcache_range on ARM
media: budget-core: Improve exception handling in budget_register()
rtc: goldfish: Enable interrupt in set_alarm() when necessary
media: vpss: clean up resources in init
Input: psmouse - add a newline when printing 'proto' by sysfs
m68knommu: fix overwriting of bits in ColdFire V3 cache control
svcrdma: Fix another Receive buffer leak
xfs: fix inode quota reservation checks
jffs2: fix UAF problem
ceph: fix use-after-free for fsc->mdsc
cpufreq: intel_pstate: Fix cpuinfo_max_freq when MSR_TURBO_RATIO_LIMIT is 0
scsi: libfc: Free skb in fc_disc_gpn_id_resp() for valid cases
virtio_ring: Avoid loop when vq is broken in virtqueue_poll
tools/testing/selftests/cgroup/cgroup_util.c: cg_read_strcmp: fix null pointer dereference
xfs: Fix UBSAN null-ptr-deref in xfs_sysfs_init
alpha: fix annotation of io{read,write}{16,32}be()
fs/signalfd.c: fix inconsistent return codes for signalfd4
ext4: fix potential negative array index in do_split()
ext4: don't allow overlapping system zones
ASoC: q6routing: add dummy register read/write function
i40e: Set RX_ONLY mode for unicast promiscuous on VLAN
i40e: Fix crash during removing i40e driver
net: fec: correct the error path for regulator disable in probe
bonding: show saner speed for broadcast mode
bonding: fix a potential double-unregister
s390/runtime_instrumentation: fix storage key handling
s390/ptrace: fix storage key handling
ASoC: msm8916-wcd-analog: fix register Interrupt offset
ASoC: intel: Fix memleak in sst_media_open
vfio/type1: Add proper error unwind for vfio_iommu_replay()
kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode
kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode
kconfig: qconf: do not limit the pop-up menu to the first row
kconfig: qconf: fix signal connection to invalid slots
efi: avoid error message when booting under Xen
Fix build error when CONFIG_ACPI is not set/enabled:
RDMA/bnxt_re: Do not add user qps to flushlist
afs: Fix NULL deref in afs_dynroot_depopulate()
bonding: fix active-backup failover for current ARP slave
net: ena: Prevent reset after device destruction
net: gemini: Fix missing free_netdev() in error path of gemini_ethernet_port_probe()
hv_netvsc: Fix the queue_mapping in netvsc_vf_xmit()
net: dsa: b53: check for timeout
powerpc/pseries: Do not initiate shutdown when system is running on UPS
efi: add missed destroy_workqueue when efisubsys_init fails
epoll: Keep a reference on files added to the check list
do_epoll_ctl(): clean the failure exits up a bit
mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
xen: don't reschedule in preemption off sections
clk: Evict unregistered clks from parent caches
KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()
KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set
Linux 4.19.142
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibfe4a0a4249f76ab35076f4b003e32cd6f9788a5
|
||
|
|
734654ae79 |
mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
commit 75802ca66354a39ab8e35822747cd08b3384a99a upstream.
This is found by code observation only.
Firstly, the worst case scenario should assume the whole range was covered
by pmd sharing. The old algorithm might not work as expected for ranges
like (1g-2m, 1g+2m), where the adjusted range should be (0, 1g+2m) but the
expected range should be (0, 2g).
Since at it, remove the loop since it should not be required. With that,
the new code should be faster too when the invalidating range is huge.
Mike said:
: With range (1g-2m, 1g+2m) within a vma (0, 2g) the existing code will only
: adjust to (0, 1g+2m) which is incorrect.
:
: We should cc stable. The original reason for adjusting the range was to
: prevent data corruption (getting wrong page). Since the range is not
: always adjusted correctly, the potential for corruption still exists.
:
: However, I am fairly confident that adjust_range_if_pmd_sharing_possible
: is only gong to be called in two cases:
:
: 1) for a single page
: 2) for range == entire vma
:
: In those cases, the current code should produce the correct results.
:
: To be safe, let's just cc stable.
Fixes:
|
||
|
|
c666936d8d |
mm, page_alloc: fix core hung in free_pcppages_bulk()
commit 88e8ac11d2ea3acc003cf01bb5a38c8aa76c3cfd upstream.
The following race is observed with the repeated online, offline and a
delay between two successive online of memory blocks of movable zone.
P1 P2
Online the first memory block in
the movable zone. The pcp struct
values are initialized to default
values,i.e., pcp->high = 0 &
pcp->batch = 1.
Allocate the pages from the
movable zone.
Try to Online the second memory
block in the movable zone thus it
entered the online_pages() but yet
to call zone_pcp_update().
This process is entered into
the exit path thus it tries
to release the order-0 pages
to pcp lists through
free_unref_page_commit().
As pcp->high = 0, pcp->count = 1
proceed to call the function
free_pcppages_bulk().
Update the pcp values thus the
new pcp values are like, say,
pcp->high = 378, pcp->batch = 63.
Read the pcp's batch value using
READ_ONCE() and pass the same to
free_pcppages_bulk(), pcp values
passed here are, batch = 63,
count = 1.
Since num of pages in the pcp
lists are less than ->batch,
then it will stuck in
while(list_empty(list)) loop
with interrupts disabled thus
a core hung.
Avoid this by ensuring free_pcppages_bulk() is called with proper count of
pcp list pages.
The mentioned race is some what easily reproducible without [1] because
pcp's are not updated for the first memory block online and thus there is
a enough race window for P2 between alloc+free and pcp struct values
update through onlining of second memory block.
With [1], the race still exists but it is very narrow as we update the pcp
struct values for the first memory block online itself.
This is not limited to the movable zone, it could also happen in cases
with the normal zone (e.g., hotplug to a node that only has DMA memory, or
no other memory yet).
[1]: https://patchwork.kernel.org/patch/11696389/
Fixes:
|
||
|
|
84b8dc232a |
mm: include CMA pages in lowmem_reserve at boot
commit e08d3fdfe2dafa0331843f70ce1ff6c1c4900bf4 upstream.
The lowmem_reserve arrays provide a means of applying pressure against
allocations from lower zones that were targeted at higher zones. Its
values are a function of the number of pages managed by higher zones and
are assigned by a call to the setup_per_zone_lowmem_reserve() function.
The function is initially called at boot time by the function
init_per_zone_wmark_min() and may be called later by accesses of the
/proc/sys/vm/lowmem_reserve_ratio sysctl file.
The function init_per_zone_wmark_min() was moved up from a module_init to
a core_initcall to resolve a sequencing issue with khugepaged.
Unfortunately this created a sequencing issue with CMA page accounting.
The CMA pages are added to the managed page count of a zone when
cma_init_reserved_areas() is called at boot also as a core_initcall. This
makes it uncertain whether the CMA pages will be added to the managed page
counts of their zones before or after the call to
init_per_zone_wmark_min() as it becomes dependent on link order. With the
current link order the pages are added to the managed count after the
lowmem_reserve arrays are initialized at boot.
This means the lowmem_reserve values at boot may be lower than the values
used later if /proc/sys/vm/lowmem_reserve_ratio is accessed even if the
ratio values are unchanged.
In many cases the difference is not significant, but for example
an ARM platform with 1GB of memory and the following memory layout
cma: Reserved 256 MiB at 0x0000000030000000
Zone ranges:
DMA [mem 0x0000000000000000-0x000000002fffffff]
Normal empty
HighMem [mem 0x0000000030000000-0x000000003fffffff]
would result in 0 lowmem_reserve for the DMA zone. This would allow
userspace to deplete the DMA zone easily.
Funnily enough
$ cat /proc/sys/vm/lowmem_reserve_ratio
would fix up the situation because as a side effect it forces
setup_per_zone_lowmem_reserve.
This commit breaks the link order dependency by invoking
init_per_zone_wmark_min() as a postcore_initcall so that the CMA pages
have the chance to be properly accounted in their zone(s) and allowing
the lowmem_reserve arrays to receive consistent values.
Fixes:
|
||
|
|
2ef7ebb143 |
khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
[ Upstream commit f3f99d63a8156c7a4a6b20aac22b53c5579c7dc1 ]
syzbot crashes on the VM_BUG_ON_MM(khugepaged_test_exit(mm), mm) in
__khugepaged_enter(): yes, when one thread is about to dump core, has set
core_state, and is waiting for others, another might do something calling
__khugepaged_enter(), which now crashes because I lumped the core_state
test (known as "mmget_still_valid") into khugepaged_test_exit(). I still
think it's best to lump them together, so just in this exceptional case,
check mm->mm_users directly instead of khugepaged_test_exit().
Fixes: bbe98f9cadff ("khugepaged: khugepaged_test_exit() check mmget_still_valid()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Yang Shi <shy828301@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: <stable@vger.kernel.org> [4.8+]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008141503370.18085@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
17c08ee00b |
khugepaged: khugepaged_test_exit() check mmget_still_valid()
[ Upstream commit bbe98f9cadff58cdd6a4acaeba0efa8565dabe65 ]
Move collapse_huge_page()'s mmget_still_valid() check into
khugepaged_test_exit() itself. collapse_huge_page() is used for anon THP
only, and earned its mmget_still_valid() check because it inserts a huge
pmd entry in place of the page table's pmd entry; whereas
collapse_file()'s retract_page_tables() or collapse_pte_mapped_thp()
merely clears the page table's pmd entry. But core dumping without mmap
lock must have been as open to mistaking a racily cleared pmd entry for a
page table at physical page 0, as exit_mmap() was. And we certainly have
no interest in mapping as a THP once dumping core.
Fixes: 59ea6d06cfa9 ("coredump: fix race condition between collapse_huge_page() and core dumping")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org> [4.8+]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021217020.27773@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
369c9d2963 |
Merge 4.19.141 into android-4.19-stable
Changes in 4.19.141 smb3: warn on confusing error scenario with sec=krb5 genirq/affinity: Make affinity setting if activated opt-in PCI: hotplug: ACPI: Fix context refcounting in acpiphp_grab_context() PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken PCI: Add device even if driver attach failed PCI: qcom: Define some PARF params needed for ipq8064 SoC PCI: qcom: Add support for tx term offset for rev 2.1.0 PCI: Probe bridge window attributes once at enumeration-time btrfs: free anon block device right after subvolume deletion btrfs: don't allocate anonymous block device for user invisible roots btrfs: ref-verify: fix memory leak in add_block_entry btrfs: don't traverse into the seed devices in show_devname btrfs: open device without device_list_mutex btrfs: fix messages after changing compression level by remount btrfs: only search for left_info if there is no right_info in try_merge_free_space btrfs: fix memory leaks after failure to lookup checksums during inode logging btrfs: fix return value mixup in btrfs_get_extent dt-bindings: iio: io-channel-mux: Fix compatible string in example code iio: dac: ad5592r: fix unbalanced mutex unlocks in ad5592r_read_raw() xtensa: fix xtensa_pmu_setup prototype cifs: Fix leak when handling lease break for cached root fid powerpc: Allow 4224 bytes of stack expansion for the signal frame powerpc: Fix circular dependency between percpu.h and mmu.h media: vsp1: dl: Fix NULL pointer dereference on unbind net: ethernet: stmmac: Disable hardware multicast filter net: stmmac: dwmac1000: provide multicast filter fallback net/compat: Add missing sock updates for SCM_RIGHTS md/raid5: Fix Force reconstruct-write io stuck in degraded raid5 bcache: allocate meta data pages as compound pages bcache: fix overflow in offset_to_stripe() mac80211: fix misplaced while instead of if driver core: Avoid binding drivers to dead devices MIPS: CPU#0 is not hotpluggable ext2: fix missing percpu_counter_inc ocfs2: change slot number type s16 to u16 mm/page_counter.c: fix protection usage propagation ftrace: Setup correct FTRACE_FL_REGS flags for module kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler tracing/hwlat: Honor the tracing_cpumask tracing: Use trace_sched_process_free() instead of exit() for pid tracing watchdog: f71808e_wdt: indicate WDIOF_CARDRESET support in watchdog_info.options watchdog: f71808e_wdt: remove use of wrong watchdog_info option watchdog: f71808e_wdt: clear watchdog timeout occurred flag pseries: Fix 64 bit logical memory block panic module: Correctly truncate sysfs sections output perf intel-pt: Fix FUP packet state remoteproc: qcom: q6v5: Update running state before requesting stop drm/imx: imx-ldb: Disable both channels for split mode in enc->disable() mfd: arizona: Ensure 32k clock is put on driver unbind and error RDMA/ipoib: Return void from ipoib_ib_dev_stop() RDMA/ipoib: Fix ABBA deadlock with ipoib_reap_ah() media: rockchip: rga: Introduce color fmt macros and refactor CSC mode logic media: rockchip: rga: Only set output CSC mode for RGB input USB: serial: ftdi_sio: make process-packet buffer unsigned USB: serial: ftdi_sio: clean up receive processing mmc: renesas_sdhi_internal_dmac: clean up the code for dma complete gpu: ipu-v3: image-convert: Combine rotate/no-rotate irq handlers dm rq: don't call blk_mq_queue_stopped() in dm_stop_queue() selftests/powerpc: ptrace-pkey: Rename variables to make it easier to follow code selftests/powerpc: ptrace-pkey: Update the test to mark an invalid pkey correctly selftests/powerpc: ptrace-pkey: Don't update expected UAMOR value iommu/omap: Check for failure of a call to omap_iommu_dump_ctx iommu/vt-d: Enforce PASID devTLB field mask i2c: rcar: slave: only send STOP event when we have been addressed clk: clk-atlas6: fix return value check in atlas6_clk_init() pwm: bcm-iproc: handle clk_get_rate() return tools build feature: Use CC and CXX from parent i2c: rcar: avoid race when unregistering slave openrisc: Fix oops caused when dumping stack scsi: lpfc: nvmet: Avoid hang / use-after-free again when destroying targetport watchdog: initialize device before misc_register Input: sentelic - fix error return when fsp_reg_write fails drm/vmwgfx: Use correct vmw_legacy_display_unit pointer drm/vmwgfx: Fix two list_for_each loop exit tests net: qcom/emac: add missed clk_disable_unprepare in error path of emac_clks_phase1_init nfs: Fix getxattr kernel panic and memory overflow fs/minix: set s_maxbytes correctly fs/minix: fix block limit check for V1 filesystems fs/minix: remove expected error message in block_to_path() fs/ufs: avoid potential u32 multiplication overflow test_kmod: avoid potential double free in trigger_config_run_type() mfd: dln2: Run event handler loop under spinlock ALSA: echoaudio: Fix potential Oops in snd_echo_resume() perf bench mem: Always memset source before memcpy tools build feature: Quote CC and CXX for their arguments sh: landisk: Add missing initialization of sh_io_port_base khugepaged: retract_page_tables() remember to test exit arm64: dts: marvell: espressobin: add ethernet alias drm: Added orientation quirk for ASUS tablet model T103HAF drm/amdgpu: Fix bug where DPM is not enabled after hibernate and resume Linux 4.19.141 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I0800f8e03919fd8f054c1bcda87efd70a6e5db6b |
||
|
|
2406c45db3 |
khugepaged: retract_page_tables() remember to test exit
commit 18e77600f7a1ed69f8ce46c9e11cad0985712dfa upstream.
Only once have I seen this scenario (and forgot even to notice what forced
the eventual crash): a sequence of "BUG: Bad page map" alerts from
vm_normal_page(), from zap_pte_range() servicing exit_mmap();
pmd:00000000, pte values corresponding to data in physical page 0.
The pte mappings being zapped in this case were supposed to be from a huge
page of ext4 text (but could as well have been shmem): my belief is that
it was racing with collapse_file()'s retract_page_tables(), found *pmd
pointing to a page table, locked it, but *pmd had become 0 by the time
start_pte was decided.
In most cases, that possibility is excluded by holding mmap lock; but
exit_mmap() proceeds without mmap lock. Most of what's run by khugepaged
checks khugepaged_test_exit() after acquiring mmap lock:
khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate() do so,
for example. But retract_page_tables() did not: fix that.
The fix is for retract_page_tables() to check khugepaged_test_exit(),
after acquiring mmap lock, before doing anything to the page table.
Getting the mmap lock serializes with __mmput(), which briefly takes and
drops it in __khugepaged_exit(); then the khugepaged_test_exit() check on
mm_users makes sure we don't touch the page table once exit_mmap() might
reach it, since exit_mmap() will be proceeding without mmap lock, not
expecting anyone to be racing with it.
Fixes:
|
||
|
|
e88a72e86b |
mm/page_counter.c: fix protection usage propagation
commit a6f23d14ec7d7d02220ad8bb2774be3322b9aeec upstream.
When workload runs in cgroups that aren't directly below root cgroup and
their parent specifies reclaim protection, it may end up ineffective.
The reason is that propagate_protected_usage() is not called in all
hierarchy up. All the protected usage is incorrectly accumulated in the
workload's parent. This means that siblings_low_usage is overestimated
and effective protection underestimated. Even though it is transitional
phenomenon (uncharge path does correct propagation and fixes the wrong
children_low_usage), it can undermine the intended protection
unexpectedly.
We have noticed this problem while seeing a swap out in a descendant of a
protected memcg (intermediate node) while the parent was conveniently
under its protection limit and the memory pressure was external to that
hierarchy. Michal has pinpointed this down to the wrong
siblings_low_usage which led to the unwanted reclaim.
The fix is simply updating children_low_usage in respective ancestors also
in the charging path.
Fixes:
|
||
|
|
7086849b9c |
Merge 4.19.140 into android-4.19-stable
Changes in 4.19.140 tracepoint: Mark __tracepoint_string's __used HID: input: Fix devices that return multiple bytes in battery report cgroup: add missing skcd->no_refcnt check in cgroup_sk_clone() x86/mce/inject: Fix a wrong assignment of i_mce.status sched/fair: Fix NOHZ next idle balance sched: correct SD_flags returned by tl->sd_flags() arm64: dts: rockchip: fix rk3368-lion gmac reset gpio arm64: dts: rockchip: fix rk3399-puma vcc5v0-host gpio arm64: dts: rockchip: fix rk3399-puma gmac reset gpio EDAC: Fix reference count leaks arm64: dts: qcom: msm8916: Replace invalid bias-pull-none property crypto: ccree - fix resource leak on error path firmware: arm_scmi: Fix SCMI genpd domain probing arm64: dts: exynos: Fix silent hang after boot on Espresso clk: scmi: Fix min and max rate when registering clocks with discrete rates m68k: mac: Don't send IOP message until channel is idle m68k: mac: Fix IOP status/control register writes platform/x86: intel-hid: Fix return value check in check_acpi_dev() platform/x86: intel-vbtn: Fix return value check in check_acpi_dev() ARM: dts: gose: Fix ports node name for adv7180 ARM: dts: gose: Fix ports node name for adv7612 ARM: at91: pm: add missing put_device() call in at91_pm_sram_init() spi: lantiq: fix: Rx overflow error in full duplex mode ARM: socfpga: PM: add missing put_device() call in socfpga_setup_ocram_self_refresh() drm/tilcdc: fix leak & null ref in panel_connector_get_modes soc: qcom: rpmh-rsc: Set suppress_bind_attrs flag Bluetooth: add a mutex lock to avoid UAF in do_enale_set loop: be paranoid on exit and prevent new additions / removals fs/btrfs: Add cond_resched() for try_release_extent_mapping() stalls drm/amdgpu: avoid dereferencing a NULL pointer drm/radeon: Fix reference count leaks caused by pm_runtime_get_sync crypto: aesni - Fix build with LLVM_IAS=1 video: fbdev: neofb: fix memory leak in neo_scan_monitor() md-cluster: fix wild pointer of unlock_all_bitmaps() arm64: dts: hisilicon: hikey: fixes to comply with adi, adv7533 DT binding drm/etnaviv: fix ref count leak via pm_runtime_get_sync drm/nouveau: fix multiple instances of reference count leaks usb: mtu3: clear dual mode of u3port when disable device drm/debugfs: fix plain echo to connector "force" attribute drm/radeon: disable AGP by default irqchip/irq-mtk-sysirq: Replace spinlock with raw_spinlock mm/mmap.c: Add cond_resched() for exit_mmap() CPU stalls brcmfmac: keep SDIO watchdog running when console_interval is non-zero brcmfmac: To fix Bss Info flag definition Bug brcmfmac: set state of hanger slot to FREE when flushing PSQ iwlegacy: Check the return value of pcie_capability_read_*() gpu: host1x: debug: Fix multiple channels emitting messages simultaneously usb: gadget: net2280: fix memory leak on probe error handling paths bdc: Fix bug causing crash after multiple disconnects usb: bdc: Halt controller on suspend dyndbg: fix a BUG_ON in ddebug_describe_flags bcache: fix super block seq numbers comparision in register_cache_set() ACPICA: Do not increment operation_region reference counts for field units drm/msm: ratelimit crtc event overflow error agp/intel: Fix a memory leak on module initialisation failure video: fbdev: sm712fb: fix an issue about iounmap for a wrong address console: newport_con: fix an issue about leak related system resources video: pxafb: Fix the function used to balance a 'dma_alloc_coherent()' call ath10k: Acquire tx_lock in tx error paths iio: improve IIO_CONCENTRATION channel type description drm/etnaviv: Fix error path on failure to enable bus clk drm/arm: fix unintentional integer overflow on left shift leds: lm355x: avoid enum conversion warning media: omap3isp: Add missed v4l2_ctrl_handler_free() for preview_init_entities() ASoC: Intel: bxt_rt298: add missing .owner field scsi: cumana_2: Fix different dev_id between request_irq() and free_irq() drm/mipi: use dcs write for mipi_dsi_dcs_set_tear_scanline cxl: Fix kobject memleak drm/radeon: fix array out-of-bounds read and write issues scsi: powertec: Fix different dev_id between request_irq() and free_irq() scsi: eesox: Fix different dev_id between request_irq() and free_irq() ipvs: allow connection reuse for unconfirmed conntrack media: firewire: Using uninitialized values in node_probe() media: exynos4-is: Add missed check for pinctrl_lookup_state() xfs: don't eat an EIO/ENOSPC writeback error when scrubbing data fork xfs: fix reflink quota reservation accounting error RDMA/rxe: Skip dgid check in loopback mode PCI: Fix pci_cfg_wait queue locking problem leds: core: Flush scheduled work for system suspend drm: panel: simple: Fix bpc for LG LB070WV8 panel phy: exynos5-usbdrd: Calibrating makes sense only for USB2.0 PHY drm/bridge: sil_sii8620: initialize return of sii8620_readb scsi: scsi_debug: Add check for sdebug_max_queue during module init mwifiex: Prevent memory corruption handling keys powerpc/vdso: Fix vdso cpu truncation RDMA/qedr: SRQ's bug fixes RDMA/rxe: Prevent access to wr->next ptr afrer wr is posted to send queue staging: rtl8192u: fix a dubious looking mask before a shift PCI/ASPM: Add missing newline in sysfs 'policy' powerpc/book3s64/pkeys: Use PVR check instead of cpu feature drm/imx: tve: fix regulator_disable error path USB: serial: iuu_phoenix: fix led-activity helpers usb: core: fix quirks_param_set() writing to a const pointer thermal: ti-soc-thermal: Fix reversed condition in ti_thermal_expose_sensor() coresight: tmc: Fix TMC mode read in tmc_read_unprepare_etb() MIPS: OCTEON: add missing put_device() call in dwc3_octeon_device_init() usb: dwc2: Fix error path in gadget registration scsi: mesh: Fix panic after host or bus reset net: dsa: mv88e6xxx: MV88E6097 does not support jumbo configuration PCI: cadence: Fix updating Vendor ID and Subsystem Vendor ID register RDMA/core: Fix return error value in _ib_modify_qp() to negative Smack: fix another vsscanf out of bounds Smack: prevent underflow in smk_set_cipso() power: supply: check if calc_soc succeeded in pm860x_init_battery Bluetooth: hci_h5: Set HCI_UART_RESET_ON_INIT to correct flags Bluetooth: hci_serdev: Only unregister device if it was registered net: dsa: rtl8366: Fix VLAN semantics net: dsa: rtl8366: Fix VLAN set-up powerpc/boot: Fix CONFIG_PPC_MPC52XX references selftests/powerpc: Fix CPU affinity for child process PCI: Release IVRS table in AMD ACS quirk selftests/powerpc: Fix online CPU selection ASoC: meson: axg-tdm-interface: fix link fmt setup s390/qeth: don't process empty bridge port events wl1251: fix always return 0 error tools, build: Propagate build failures from tools/build/Makefile.build net: ethernet: aquantia: Fix wrong return value liquidio: Fix wrong return value in cn23xx_get_pf_num() net: spider_net: Fix the size used in a 'dma_free_coherent()' call fsl/fman: use 32-bit unsigned integer fsl/fman: fix dereference null return value fsl/fman: fix unreachable code fsl/fman: check dereferencing null pointer fsl/fman: fix eth hash table allocation dlm: Fix kobject memleak ocfs2: fix unbalanced locking pinctrl-single: fix pcs_parse_pinconf() return value svcrdma: Fix page leak in svc_rdma_recv_read_chunk() x86/fsgsbase/64: Fix NULL deref in 86_fsgsbase_read_task crypto: aesni - add compatibility with IAS af_packet: TPACKET_V3: fix fill status rwlock imbalance drivers/net/wan/lapbether: Added needed_headroom and a skb->len check net/nfc/rawsock.c: add CAP_NET_RAW check. net: Set fput_needed iff FDPUT_FPUT is set net/tls: Fix kmap usage net: refactor bind_bucket fastreuse into helper net: initialize fastreuse on inet_inherit_port USB: serial: cp210x: re-enable auto-RTS on open USB: serial: cp210x: enable usb generic throttle/unthrottle ALSA: hda - fix the micmute led status for Lenovo ThinkCentre AIO ALSA: usb-audio: Creative USB X-Fi Pro SB1095 volume knob support ALSA: usb-audio: fix overeager device match for MacroSilicon MS2109 ALSA: usb-audio: work around streaming quirk for MacroSilicon MS2109 pstore: Fix linking when crypto API disabled crypto: hisilicon - don't sleep of CRYPTO_TFM_REQ_MAY_SLEEP was not specified crypto: qat - fix double free in qat_uclo_create_batch_init_list crypto: ccp - Fix use of merged scatterlists crypto: cpt - don't sleep of CRYPTO_TFM_REQ_MAY_SLEEP was not specified bitfield.h: don't compile-time validate _val in FIELD_FIT fs/minix: check return value of sb_getblk() fs/minix: don't allow getting deleted inodes fs/minix: reject too-large maximum file size ALSA: usb-audio: add quirk for Pioneer DDJ-RB 9p: Fix memory leak in v9fs_mount drm/ttm/nouveau: don't call tt destroy callback on alloc failure. NFS: Don't move layouts to plh_return_segs list while in use NFS: Don't return layout segments that are in use cpufreq: dt: fix oops on armada37xx include/asm-generic/vmlinux.lds.h: align ro_after_init spi: spidev: Align buffers for DMA mtd: rawnand: qcom: avoid write to unavailable register parisc: Implement __smp_store_release and __smp_load_acquire barriers parisc: mask out enable and reserved bits from sba imask ARM: 8992/1: Fix unwind_frame for clang-built kernels irqdomain/treewide: Free firmware node after domain removal xen/balloon: fix accounting in alloc_xenballooned_pages error path xen/balloon: make the balloon wait interruptible xen/gntdev: Fix dmabuf import with non-zero sgt offset Linux 4.19.140 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I6b0d8dcf9ded022f62d9c62605388f1c1e9112d1 |
||
|
|
abfa9c47ec |
mm/mmap.c: Add cond_resched() for exit_mmap() CPU stalls
[ Upstream commit 0a3b3c253a1eb2c7fe7f34086d46660c909abeb3 ] A large process running on a heavily loaded system can encounter the following RCU CPU stall warning: rcu: INFO: rcu_sched self-detected stall on CPU rcu: 3-....: (20998 ticks this GP) idle=4ea/1/0x4000000000000002 softirq=556558/556558 fqs=5190 (t=21013 jiffies g=1005461 q=132576) NMI backtrace for cpu 3 CPU: 3 PID: 501900 Comm: aio-free-ring-w Kdump: loaded Not tainted 5.2.9-108_fbk12_rc3_3858_gb83b75af7909 #1 Hardware name: Wiwynn HoneyBadger/PantherPlus, BIOS HBM6.71 02/03/2016 Call Trace: <IRQ> dump_stack+0x46/0x60 nmi_cpu_backtrace.cold.3+0x13/0x50 ? lapic_can_unplug_cpu.cold.27+0x34/0x34 nmi_trigger_cpumask_backtrace+0xba/0xca rcu_dump_cpu_stacks+0x99/0xc7 rcu_sched_clock_irq.cold.87+0x1aa/0x397 ? tick_sched_do_timer+0x60/0x60 update_process_times+0x28/0x60 tick_sched_timer+0x37/0x70 __hrtimer_run_queues+0xfe/0x270 hrtimer_interrupt+0xf4/0x210 smp_apic_timer_interrupt+0x5e/0x120 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:kmem_cache_free+0x223/0x300 Code: 88 00 00 00 0f 85 ca 00 00 00 41 8b 55 18 31 f6 f7 da 41 f6 45 0a 02 40 0f 94 c6 83 c6 05 9c 41 5e fa e8 a0 a7 01 00 41 56 9d <49> 8b 47 08 a8 03 0f 85 87 00 00 00 65 48 ff 08 e9 3d fe ff ff 65 RSP: 0018:ffffc9000e8e3da8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 RAX: 0000000000020000 RBX: ffff88861b9de960 RCX: 0000000000000030 RDX: fffffffffffe41e8 RSI: 000060777fe3a100 RDI: 000000000001be18 RBP: ffffea00186e7780 R08: ffffffffffffffff R09: ffffffffffffffff R10: ffff88861b9dea28 R11: ffff88887ffde000 R12: ffffffff81230a1f R13: ffff888854684dc0 R14: 0000000000000206 R15: ffff8888547dbc00 ? remove_vma+0x4f/0x60 remove_vma+0x4f/0x60 exit_mmap+0xd6/0x160 mmput+0x4a/0x110 do_exit+0x278/0xae0 ? syscall_trace_enter+0x1d3/0x2b0 ? handle_mm_fault+0xaa/0x1c0 do_group_exit+0x3a/0xa0 __x64_sys_exit_group+0x14/0x20 do_syscall_64+0x42/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 And on a PREEMPT=n kernel, the "while (vma)" loop in exit_mmap() can run for a very long time given a large process. This commit therefore adds a cond_resched() to this loop, providing RCU any needed quiescent states. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <linux-mm@kvack.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
bcf9517454 |
Merge 4.19.135 into android-4.19-stable
Changes in 4.19.135
soc: qcom: rpmh: Dirt can only make you dirtier, not cleaner
gpio: arizona: handle pm_runtime_get_sync failure case
gpio: arizona: put pm_runtime in case of failure
pinctrl: amd: fix npins for uart0 in kerncz_groups
mac80211: allow rx of mesh eapol frames with default rx key
scsi: scsi_transport_spi: Fix function pointer check
xtensa: fix __sync_fetch_and_{and,or}_4 declarations
xtensa: update *pos in cpuinfo_op.next
drivers/net/wan/lapbether: Fixed the value of hard_header_len
net: sky2: initialize return of gm_phy_read
drm/nouveau/i2c/g94-: increase NV_PMGR_DP_AUXCTL_TRANSACTREQ timeout
drivers/firmware/psci: Fix memory leakage in alloc_init_cpu_groups()
fuse: fix weird page warning
irqdomain/treewide: Keep firmware node unconditionally allocated
SUNRPC reverting d03727b248d0 ("NFSv4 fix CLOSE not waiting for direct IO compeletion")
spi: spi-fsl-dspi: Exit the ISR with IRQ_NONE when it's not ours
tipc: clean up skb list lock handling on send path
IB/umem: fix reference count leak in ib_umem_odp_get()
uprobes: Change handle_swbp() to send SIGTRAP with si_code=SI_KERNEL, to fix GDB regression
ALSA: info: Drop WARN_ON() from buffer NULL sanity check
ASoC: rt5670: Correct RT5670_LDO_SEL_MASK
btrfs: fix double free on ulist after backref resolution failure
btrfs: fix mount failure caused by race with umount
btrfs: fix page leaks after failure to lock page for delalloc
bnxt_en: Fix race when modifying pause settings.
fpga: dfl: fix bug in port reset handshake
hippi: Fix a size used in a 'pci_free_consistent()' in an error handling path
ax88172a: fix ax88172a_unbind() failures
net: dp83640: fix SIOCSHWTSTAMP to update the struct with actual configuration
ieee802154: fix one possible memleak in adf7242_probe
drm: sun4i: hdmi: Fix inverted HPD result
net: smc91x: Fix possible memory leak in smc_drv_probe()
bonding: check error value of register_netdevice() immediately
mlxsw: destroy workqueue when trap_register in mlxsw_emad_init
qed: suppress "don't support RoCE & iWARP" flooding on HW init
ipvs: fix the connection sync failed in some cases
net: ethernet: ave: Fix error returns in ave_init
i2c: rcar: always clear ICSAR to avoid side effects
bonding: check return value of register_netdevice() in bond_newlink()
serial: exar: Fix GPIO configuration for Sealevel cards based on XR17V35X
scripts/decode_stacktrace: strip basepath from all paths
scripts/gdb: fix lx-symbols 'gdb.error' while loading modules
HID: i2c-hid: add Mediacom FlexBook edge13 to descriptor override
HID: alps: support devices with report id 2
HID: steam: fixes race in handling device list.
HID: apple: Disable Fn-key key-re-mapping on clone keyboards
dmaengine: tegra210-adma: Fix runtime PM imbalance on error
Input: add `SW_MACHINE_COVER`
spi: mediatek: use correct SPI_CFG2_REG MACRO
regmap: dev_get_regmap_match(): fix string comparison
hwmon: (aspeed-pwm-tacho) Avoid possible buffer overflow
dmaengine: ioat setting ioat timeout as module parameter
Input: synaptics - enable InterTouch for ThinkPad X1E 1st gen
usb: gadget: udc: gr_udc: fix memleak on error handling path in gr_ep_init()
hwmon: (adm1275) Make sure we are reading enough data for different chips
hwmon: (scmi) Fix potential buffer overflow in scmi_hwmon_probe()
arm64: Use test_tsk_thread_flag() for checking TIF_SINGLESTEP
x86: math-emu: Fix up 'cmp' insn for clang ias
RISC-V: Upgrade smp_mb__after_spinlock() to iorw,iorw
binder: Don't use mmput() from shrinker function.
usb: xhci-mtk: fix the failure of bandwidth allocation
usb: xhci: Fix ASM2142/ASM3142 DMA addressing
Revert "cifs: Fix the target file was deleted when rename failed."
staging: wlan-ng: properly check endpoint types
staging: comedi: addi_apci_1032: check INSN_CONFIG_DIGITAL_TRIG shift
staging: comedi: ni_6527: fix INSN_CONFIG_DIGITAL_TRIG support
staging: comedi: addi_apci_1500: check INSN_CONFIG_DIGITAL_TRIG shift
staging: comedi: addi_apci_1564: check INSN_CONFIG_DIGITAL_TRIG shift
serial: 8250: fix null-ptr-deref in serial8250_start_tx()
serial: 8250_mtk: Fix high-speed baud rates clamping
fbdev: Detect integer underflow at "struct fbcon_ops"->clear_margins.
vt: Reject zero-sized screen buffer size.
Makefile: Fix GCC_TOOLCHAIN_DIR prefix for Clang cross compilation
mm/memcg: fix refcount error while moving and swapping
mm: memcg/slab: synchronize access to kmem_cache dying flag using a spinlock
mm: memcg/slab: fix memory leak at non-root kmem_cache destroy
io-mapping: indicate mapping failure
drm/amdgpu: Fix NULL dereference in dpm sysfs handlers
drm/amd/powerplay: fix a crash when overclocking Vega M
parisc: Add atomic64_set_release() define to avoid CPU soft lockups
x86, vmlinux.lds: Page-align end of ..page_aligned sections
ASoC: rt5670: Add new gpio1_is_ext_spk_en quirk and enable it on the Lenovo Miix 2 10
ASoC: qcom: Drop HAS_DMA dependency to fix link failure
dm integrity: fix integrity recalculation that is improperly skipped
ath9k: Fix general protection fault in ath9k_hif_usb_rx_cb
ath9k: Fix regression with Atheros 9271
Linux 4.19.135
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0bbcde83e7c810352d998f28d3484efa2b9ede8e
|
||
|
|
d87ddcdb2d |
mm: memcg/slab: fix memory leak at non-root kmem_cache destroy
commit d38a2b7a9c939e6d7329ab92b96559ccebf7b135 upstream.
If the kmem_cache refcount is greater than one, we should not mark the
root kmem_cache as dying. If we mark the root kmem_cache dying
incorrectly, the non-root kmem_cache can never be destroyed. It
resulted in memory leak when memcg was destroyed. We can use the
following steps to reproduce.
1) Use kmem_cache_create() to create a new kmem_cache named A.
2) Coincidentally, the kmem_cache A is an alias for kmem_cache B,
so the refcount of B is just increased.
3) Use kmem_cache_destroy() to destroy the kmem_cache A, just
decrease the B's refcount but mark the B as dying.
4) Create a new memory cgroup and alloc memory from the kmem_cache
B. It leads to create a non-root kmem_cache for allocating memory.
5) When destroy the memory cgroup created in the step 4), the
non-root kmem_cache can never be destroyed.
If we repeat steps 4) and 5), this will cause a lot of memory leak. So
only when refcount reach zero, we mark the root kmem_cache as dying.
Fixes:
|
||
|
|
763b04c6b2 |
mm: memcg/slab: synchronize access to kmem_cache dying flag using a spinlock
[ Upstream commit 63b02ef7dc4ec239df45c018ac0adbd02ba30a0c ] Currently the memcg_params.dying flag and the corresponding workqueue used for the asynchronous deactivation of kmem_caches is synchronized using the slab_mutex. It makes impossible to check this flag from the irq context, which will be required in order to implement asynchronous release of kmem_caches. So let's switch over to the irq-save flavor of the spinlock-based synchronization. Link: http://lkml.kernel.org/r/20190611231813.3148843-8-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Waiman Long <longman@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Andrei Vagin <avagin@gmail.com> Cc: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
91404e91eb |
mm/memcg: fix refcount error while moving and swapping
commit 8d22a9351035ef2ff12ef163a1091b8b8cf1e49c upstream.
It was hard to keep a test running, moving tasks between memcgs with
move_charge_at_immigrate, while swapping: mem_cgroup_id_get_many()'s
refcount is discovered to be 0 (supposedly impossible), so it is then
forced to REFCOUNT_SATURATED, and after thousands of warnings in quick
succession, the test is at last put out of misery by being OOM killed.
This is because of the way moved_swap accounting was saved up until the
task move gets completed in __mem_cgroup_clear_mc(), deferred from when
mem_cgroup_move_swap_account() actually exchanged old and new ids.
Concurrent activity can free up swap quicker than the task is scanned,
bringing id refcount down 0 (which should only be possible when
offlining).
Just skip that optimization: do that part of the accounting immediately.
Fixes:
|
||
|
|
a6c20c0442 |
Merge 4.19.132 into android-4.19-stable
Changes in 4.19.132 btrfs: fix a block group ref counter leak after failure to remove block group mm: fix swap cache node allocation mask EDAC/amd64: Read back the scrub rate PCI register on F15h usbnet: smsc95xx: Fix use-after-free after removal mm/slub.c: fix corrupted freechain in deactivate_slab() mm/slub: fix stack overruns with SLUB_STATS usb: usbtest: fix missing kfree(dev->buf) in usbtest_disconnect s390/debug: avoid kernel warning on too large number of pages nvme-multipath: set bdi capabilities once nvme-multipath: fix deadlock between ana_work and scan_work kgdb: Avoid suspicious RCU usage warning crypto: af_alg - fix use-after-free in af_alg_accept() due to bh_lock_sock() drm/msm/dpu: fix error return code in dpu_encoder_init cxgb4: use unaligned conversion for fetching timestamp cxgb4: parse TC-U32 key values and masks natively cxgb4: use correct type for all-mask IP address comparison cxgb4: fix SGE queue dump destination buffer context hwmon: (max6697) Make sure the OVERT mask is set correctly hwmon: (acpi_power_meter) Fix potential memory leak in acpi_power_meter_add() drm: sun4i: hdmi: Remove extra HPD polling virtio-blk: free vblk-vqs in error path of virtblk_probe() SMB3: Honor 'posix' flag for multiuser mounts nvme: fix a crash in nvme_mpath_add_disk i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665 i2c: mlxcpld: check correct size of maximum RECV_LEN packet nfsd: apply umask on fs without ACL support Revert "ALSA: usb-audio: Improve frames size computation" SMB3: Honor 'seal' flag for multiuser mounts SMB3: Honor persistent/resilient handle flags for multiuser mounts SMB3: Honor lease disabling for multiuser mounts cifs: Fix the target file was deleted when rename failed. MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen irqchip/gic: Atomically update affinity dm zoned: assign max_io_len correctly efi: Make it possible to disable efivar_ssdt entirely Linux 4.19.132 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I394bc3ee947bdf92bad83297e7bd579e92fed8a9 |
||
|
|
3e632652e3 |
mm/slub: fix stack overruns with SLUB_STATS
[ Upstream commit a68ee0573991e90af2f1785db309206408bad3e5 ]
There is no need to copy SLUB_STATS items from root memcg cache to new
memcg cache copies. Doing so could result in stack overruns because the
store function only accepts 0 to clear the stat and returns an error for
everything else while the show method would print out the whole stat.
Then, the mismatch of the lengths returns from show and store methods
happens in memcg_propagate_slab_attrs():
else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf))
buf = mbuf;
max_attr_size is only 2 from slab_attr_store(), then, it uses mbuf[64]
in show_stat() later where a bounch of sprintf() would overrun the stack
variable. Fix it by always allocating a page of buffer to be used in
show_stat() if SLUB_STATS=y which should only be used for debug purpose.
# echo 1 > /sys/kernel/slab/fs_cache/shrink
BUG: KASAN: stack-out-of-bounds in number+0x421/0x6e0
Write of size 1 at addr ffffc900256cfde0 by task kworker/76:0/53251
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
Call Trace:
number+0x421/0x6e0
vsnprintf+0x451/0x8e0
sprintf+0x9e/0xd0
show_stat+0x124/0x1d0
alloc_slowpath_show+0x13/0x20
__kmem_cache_create+0x47a/0x6b0
addr ffffc900256cfde0 is located in stack of task kworker/76:0/53251 at offset 0 in frame:
process_one_work+0x0/0xb90
this frame has 1 object:
[32, 72) 'lockdep_map'
Memory state around the buggy address:
ffffc900256cfc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffc900256cfd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffc900256cfd80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
^
ffffc900256cfe00: 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 00
ffffc900256cfe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __kmem_cache_create+0x6ac/0x6b0
Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
Call Trace:
__kmem_cache_create+0x6ac/0x6b0
Fixes:
|