Commit Graph

13248 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
bc09bee25e Merge 4.19.156 into android-4.19-stable
Changes in 4.19.156
	drm/i915: Break up error capture compression loops with cond_resched()
	tipc: fix use-after-free in tipc_bcast_get_mode
	ptrace: fix task_join_group_stop() for the case when current is traced
	cadence: force nonlinear buffers to be cloned
	chelsio/chtls: fix memory leaks caused by a race
	chelsio/chtls: fix always leaking ctrl_skb
	gianfar: Replace skb_realloc_headroom with skb_cow_head for PTP
	gianfar: Account for Tx PTP timestamp in the skb headroom
	net: usb: qmi_wwan: add Telit LE910Cx 0x1230 composition
	sctp: Fix COMM_LOST/CANT_STR_ASSOC err reporting on big-endian platforms
	sfp: Fix error handing in sfp_probe()
	blktrace: fix debugfs use after free
	btrfs: extent_io: Kill the forward declaration of flush_write_bio
	btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
	Revert "btrfs: flush write bio if we loop in extent_write_cache_pages"
	btrfs: flush write bio if we loop in extent_write_cache_pages
	btrfs: extent_io: Handle errors better in extent_write_full_page()
	btrfs: extent_io: Handle errors better in btree_write_cache_pages()
	btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()
	Btrfs: fix unwritten extent buffers and hangs on future writeback attempts
	btrfs: Don't submit any btree write bio if the fs has errors
	btrfs: Move btrfs_check_chunk_valid() to tree-check.[ch] and export it
	btrfs: tree-checker: Make chunk item checker messages more readable
	btrfs: tree-checker: Make btrfs_check_chunk_valid() return EUCLEAN instead of EIO
	btrfs: tree-checker: Check chunk item at tree block read time
	btrfs: tree-checker: Verify dev item
	btrfs: tree-checker: Fix wrong check on max devid
	btrfs: tree-checker: Enhance chunk checker to validate chunk profile
	btrfs: tree-checker: Verify inode item
	btrfs: tree-checker: fix the error message for transid error
	Fonts: Replace discarded const qualifier
	ALSA: usb-audio: Add implicit feedback quirk for Zoom UAC-2
	ALSA: usb-audio: add usb vendor id as DSD-capable for Khadas devices
	ALSA: usb-audio: Add implicit feedback quirk for Qu-16
	ALSA: usb-audio: Add implicit feedback quirk for MODX
	mm: mempolicy: fix potential pte_unmap_unlock pte error
	lib/crc32test: remove extra local_irq_disable/enable
	kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled
	mm: always have io_remap_pfn_range() set pgprot_decrypted()
	gfs2: Wake up when sd_glock_disposal becomes zero
	ring-buffer: Fix recursion protection transitions between interrupt context
	ftrace: Fix recursion check for NMI test
	ftrace: Handle tracing when switching between context
	tracing: Fix out of bounds write in get_trace_buf
	futex: Handle transient "ownerless" rtmutex state correctly
	ARM: dts: sun4i-a10: fix cpu_alert temperature
	x86/kexec: Use up-to-dated screen_info copy to fill boot params
	of: Fix reserved-memory overlap detection
	blk-cgroup: Fix memleak on error path
	blk-cgroup: Pre-allocate tree node on blkg_conf_prep
	scsi: core: Don't start concurrent async scan on same host
	vsock: use ns_capable_noaudit() on socket create
	drm/vc4: drv: Add error handding for bind
	ACPI: NFIT: Fix comparison to '-ENXIO'
	vt: Disable KD_FONT_OP_COPY
	fork: fix copy_process(CLONE_PARENT) race with the exiting ->real_parent
	serial: 8250_mtk: Fix uart_get_baud_rate warning
	serial: txx9: add missing platform_driver_unregister() on error in serial_txx9_init
	USB: serial: cyberjack: fix write-URB completion race
	USB: serial: option: add Quectel EC200T module support
	USB: serial: option: add LE910Cx compositions 0x1203, 0x1230, 0x1231
	USB: serial: option: add Telit FN980 composition 0x1055
	USB: Add NO_LPM quirk for Kingston flash drive
	usb: mtu3: fix panic in mtu3_gadget_stop()
	ARC: stack unwinding: avoid indefinite looping
	Revert "ARC: entry: fix potential EFA clobber when TIF_SYSCALL_TRACE"
	PM: runtime: Resume the device earlier in __device_release_driver()
	perf/core: Fix a memory leak in perf_event_parse_addr_filter()
	tools: perf: Fix build error in v4.19.y
	net: dsa: read mac address from DT for slave device
	arm64: dts: marvell: espressobin: Add ethernet switch aliases
	Linux 4.19.156

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I87af8871465f54de0332fa74bc1f342b7fe99061
2020-11-10 13:23:09 +01:00
Shijie Luo
d4fea27ffc mm: mempolicy: fix potential pte_unmap_unlock pte error
commit 3f08842098e842c51e3b97d0dcdebf810b32558e upstream.

When flags in queue_pages_pte_range don't have MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL bits, code breaks and passing origin pte - 1 to
pte_unmap_unlock seems like not a good idea.

queue_pages_pte_range can run in MPOL_MF_MOVE_ALL mode which doesn't
migrate misplaced pages but returns with EIO when encountering such a
page.  Since commit a7f40cfe3b7a ("mm: mempolicy: make mbind() return
-EIO when MPOL_MF_STRICT is specified") and early break on the first pte
in the range results in pte_unmap_unlock on an underflow pte.  This can
lead to lockups later on when somebody tries to lock the pte resp.
page_table_lock again..

Fixes: a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Feilong Lin <linfeilong@huawei.com>
Cc: Shijie Luo <luoshijie1@huawei.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201019074853.50856-1-luoshijie1@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-10 12:35:57 +01:00
Daniel Vetter
d624a2b1ab UPSTREAM: mm/sl[uo]b: export __kmalloc_track(_node)_caller
slab does this already, and I want to use this in a memory allocation
tracker in drm for stuff that's tied to the lifetime of a drm_device,
not the underlying struct device. Kinda like devres, but for drm.

Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Link: https://patchwork.freedesktop.org/patch/msgid/20200323144950.3018436-2-daniel.vetter@ffwll.ch
(cherry picked from commit fd7cb5753ef49964ea9db5121c3fc9a4ec21ed8e)
Bug: 163141236
Signed-off-by: Alistair Delva <adelva@google.com>
Change-Id: I10790befc779311a5f2ee441e4d073e51a5a7a62
2020-11-02 16:12:14 +00:00
Greg Kroah-Hartman
b9a942466b Merge 4.19.153 into android-4.19-stable
Changes in 4.19.153
	ibmveth: Switch order of ibmveth_helper calls.
	ibmveth: Identify ingress large send packets.
	ipv4: Restore flowi4_oif update before call to xfrm_lookup_route
	mlx4: handle non-napi callers to napi_poll
	net: fec: Fix phy_device lookup for phy_reset_after_clk_enable()
	net: fec: Fix PHY init after phy_reset_after_clk_enable()
	net: fix pos incrementment in ipv6_route_seq_next
	net/smc: fix valid DMBE buffer sizes
	net: usb: qmi_wwan: add Cellient MPL200 card
	tipc: fix the skb_unshare() in tipc_buf_append()
	net/ipv4: always honour route mtu during forwarding
	r8169: fix data corruption issue on RTL8402
	net/tls: sendfile fails with ktls offload
	binder: fix UAF when releasing todo list
	ALSA: bebob: potential info leak in hwdep_read()
	chelsio/chtls: fix socket lock
	chelsio/chtls: correct netdevice for vlan interface
	chelsio/chtls: correct function return and return type
	net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
	net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup
	net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
	nfc: Ensure presence of NFC_ATTR_FIRMWARE_NAME attribute in nfc_genl_fw_download()
	tcp: fix to update snd_wl1 in bulk receiver fast path
	r8169: fix operation under forced interrupt threading
	icmp: randomize the global rate limiter
	ALSA: hda/realtek: Enable audio jacks of ASUS D700SA with ALC887
	cifs: remove bogus debug code
	cifs: Return the error from crypt_message when enc/dec key not found.
	KVM: x86/mmu: Commit zap of remaining invalid pages when recovering lpages
	KVM: SVM: Initialize prev_ga_tag before use
	ima: Don't ignore errors from crypto_shash_update()
	crypto: algif_aead - Do not set MAY_BACKLOG on the async path
	EDAC/i5100: Fix error handling order in i5100_init_one()
	EDAC/ti: Fix handling of platform_get_irq() error
	x86/fpu: Allow multiple bits in clearcpuid= parameter
	drivers/perf: xgene_pmu: Fix uninitialized resource struct
	x86/nmi: Fix nmi_handle() duration miscalculation
	x86/events/amd/iommu: Fix sizeof mismatch
	crypto: algif_skcipher - EBUSY on aio should be an error
	crypto: mediatek - Fix wrong return value in mtk_desc_ring_alloc()
	crypto: ixp4xx - Fix the size used in a 'dma_free_coherent()' call
	crypto: picoxcell - Fix potential race condition bug
	media: tuner-simple: fix regression in simple_set_radio_freq
	media: Revert "media: exynos4-is: Add missed check for pinctrl_lookup_state()"
	media: m5mols: Check function pointer in m5mols_sensor_power
	media: uvcvideo: Set media controller entity functions
	media: uvcvideo: Silence shift-out-of-bounds warning
	media: omap3isp: Fix memleak in isp_probe
	crypto: omap-sham - fix digcnt register handling with export/import
	hwmon: (pmbus/max34440) Fix status register reads for MAX344{51,60,61}
	cypto: mediatek - fix leaks in mtk_desc_ring_alloc
	media: mx2_emmaprp: Fix memleak in emmaprp_probe
	media: tc358743: initialize variable
	media: tc358743: cleanup tc358743_cec_isr
	media: rcar-vin: Fix a reference count leak.
	media: rockchip/rga: Fix a reference count leak.
	media: platform: fcp: Fix a reference count leak.
	media: camss: Fix a reference count leak.
	media: s5p-mfc: Fix a reference count leak
	media: stm32-dcmi: Fix a reference count leak
	media: ti-vpe: Fix a missing check and reference count leak
	regulator: resolve supply after creating regulator
	pinctrl: bcm: fix kconfig dependency warning when !GPIOLIB
	spi: spi-s3c64xx: swap s3c64xx_spi_set_cs() and s3c64xx_enable_datapath()
	spi: spi-s3c64xx: Check return values
	ath10k: provide survey info as accumulated data
	Bluetooth: hci_uart: Cancel init work before unregistering
	ath6kl: prevent potential array overflow in ath6kl_add_new_sta()
	ath9k: Fix potential out of bounds in ath9k_htc_txcompletion_cb()
	ath10k: Fix the size used in a 'dma_free_coherent()' call in an error handling path
	wcn36xx: Fix reported 802.11n rx_highest rate wcn3660/wcn3680
	ASoC: qcom: lpass-platform: fix memory leak
	ASoC: qcom: lpass-cpu: fix concurrency issue
	brcmfmac: check ndev pointer
	mwifiex: Do not use GFP_KERNEL in atomic context
	staging: rtl8192u: Do not use GFP_KERNEL in atomic context
	drm/gma500: fix error check
	scsi: qla4xxx: Fix an error handling path in 'qla4xxx_get_host_stats()'
	scsi: qla2xxx: Fix wrong return value in qla_nvme_register_hba()
	scsi: csiostor: Fix wrong return value in csio_hw_prep_fw()
	backlight: sky81452-backlight: Fix refcount imbalance on error
	VMCI: check return value of get_user_pages_fast() for errors
	tty: serial: earlycon dependency
	tty: hvcs: Don't NULL tty->driver_data until hvcs_cleanup()
	pty: do tty_flip_buffer_push without port->lock in pty_write
	pwm: lpss: Fix off by one error in base_unit math in pwm_lpss_prepare()
	pwm: lpss: Add range limit check for the base_unit register value
	drivers/virt/fsl_hypervisor: Fix error handling path
	video: fbdev: vga16fb: fix setting of pixclock because a pass-by-value error
	video: fbdev: sis: fix null ptr dereference
	video: fbdev: radeon: Fix memleak in radeonfb_pci_register
	HID: roccat: add bounds checking in kone_sysfs_write_settings()
	pinctrl: mcp23s08: Fix mcp23x17_regmap initialiser
	pinctrl: mcp23s08: Fix mcp23x17 precious range
	net/mlx5: Don't call timecounter cyc2time directly from 1PPS flow
	net: stmmac: use netif_tx_start|stop_all_queues() function
	cpufreq: armada-37xx: Add missing MODULE_DEVICE_TABLE
	net: dsa: rtl8366: Check validity of passed VLANs
	net: dsa: rtl8366: Refactor VLAN/PVID init
	net: dsa: rtl8366: Skip PVID setting if not requested
	net: dsa: rtl8366rb: Support all 4096 VLANs
	ath6kl: wmi: prevent a shift wrapping bug in ath6kl_wmi_delete_pstream_cmd()
	misc: mic: scif: Fix error handling path
	ALSA: seq: oss: Avoid mutex lock for a long-time ioctl
	usb: dwc2: Fix parameter type in function pointer prototype
	quota: clear padding in v2r1_mem2diskdqb()
	slimbus: core: check get_addr before removing laddr ida
	slimbus: core: do not enter to clock pause mode in core
	slimbus: qcom-ngd-ctrl: disable ngd in qmi server down callback
	HID: hid-input: fix stylus battery reporting
	qtnfmac: fix resource leaks on unsupported iftype error return path
	net: enic: Cure the enic api locking trainwreck
	mfd: sm501: Fix leaks in probe()
	iwlwifi: mvm: split a print to avoid a WARNING in ROC
	usb: gadget: f_ncm: fix ncm_bitrate for SuperSpeed and above.
	usb: gadget: u_ether: enable qmult on SuperSpeed Plus as well
	nl80211: fix non-split wiphy information
	usb: dwc2: Fix INTR OUT transfers in DDMA mode.
	scsi: target: tcmu: Fix warning: 'page' may be used uninitialized
	scsi: be2iscsi: Fix a theoretical leak in beiscsi_create_eqs()
	platform/x86: mlx-platform: Remove PSU EEPROM configuration
	mwifiex: fix double free
	ipvs: clear skb->tstamp in forwarding path
	net: korina: fix kfree of rx/tx descriptor array
	netfilter: nf_log: missing vlan offload tag and proto
	mm/memcg: fix device private memcg accounting
	mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary
	IB/mlx4: Fix starvation in paravirt mux/demux
	IB/mlx4: Adjust delayed work when a dup is observed
	powerpc/pseries: Fix missing of_node_put() in rng_init()
	powerpc/icp-hv: Fix missing of_node_put() in success path
	RDMA/ucma: Fix locking for ctx->events_reported
	RDMA/ucma: Add missing locking around rdma_leave_multicast()
	mtd: lpddr: fix excessive stack usage with clang
	powerpc/pseries: explicitly reschedule during drmem_lmb list traversal
	mtd: mtdoops: Don't write panic data twice
	ARM: 9007/1: l2c: fix prefetch bits init in L2X0_AUX_CTRL using DT values
	arc: plat-hsdk: fix kconfig dependency warning when !RESET_CONTROLLER
	xfs: limit entries returned when counting fsmap records
	xfs: fix high key handling in the rt allocator's query_range function
	RDMA/qedr: Fix use of uninitialized field
	RDMA/qedr: Fix inline size returned for iWARP
	powerpc/tau: Use appropriate temperature sample interval
	powerpc/tau: Convert from timer to workqueue
	powerpc/tau: Remove duplicated set_thresholds() call
	Linux 4.19.153

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I9e85e8ca67ab8e28d04a77339f80fdbf3c568956
2020-10-29 11:36:20 +01:00
Suren Baghdasaryan
a3d0ceee71 mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary
[ Upstream commit 67197a4f28d28d0b073ab0427b03cb2ee5382578 ]

Currently __set_oom_adj loops through all processes in the system to keep
oom_score_adj and oom_score_adj_min in sync between processes sharing
their mm.  This is done for any task with more that one mm_users, which
includes processes with multiple threads (sharing mm and signals).
However for such processes the loop is unnecessary because their signal
structure is shared as well.

Android updates oom_score_adj whenever a tasks changes its role
(background/foreground/...) or binds to/unbinds from a service, making it
more/less important.  Such operation can happen frequently.  We noticed
that updates to oom_score_adj became more expensive and after further
investigation found out that the patch mentioned in "Fixes" introduced a
regression.  Using Pixel 4 with a typical Android workload, write time to
oom_score_adj increased from ~3.57us to ~362us.  Moreover this regression
linearly depends on the number of multi-threaded processes running on the
system.

Mark the mm with a new MMF_MULTIPROCESS flag bit when task is created with
(CLONE_VM && !CLONE_THREAD && !CLONE_VFORK).  Change __set_oom_adj to use
MMF_MULTIPROCESS instead of mm_users to decide whether oom_score_adj
update should be synchronized between multiple processes.  To prevent
races between clone() and __set_oom_adj(), when oom_score_adj of the
process being cloned might be modified from userspace, we use
oom_adj_mutex.  Its scope is changed to global.

The combination of (CLONE_VM && !CLONE_THREAD) is rarely used except for
the case of vfork().  To prevent performance regressions of vfork(), we
skip taking oom_adj_mutex and setting MMF_MULTIPROCESS when CLONE_VFORK is
specified.  Clearing the MMF_MULTIPROCESS flag (when the last process
sharing the mm exits) is left out of this patch to keep it simple and
because it is believed that this threading model is rare.  Should there
ever be a need for optimizing that case as well, it can be done by hooking
into the exit path, likely following the mm_update_next_owner pattern.

With the combination of (CLONE_VM && !CLONE_THREAD && !CLONE_VFORK) being
quite rare, the regression is gone after the change is applied.

[surenb@google.com: v3]
  Link: https://lkml.kernel.org/r/20200902012558.2335613-1-surenb@google.com

Fixes: 44a70adec9 ("mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj")
Reported-by: Tim Murray <timmurray@google.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eugene Syromiatnikov <esyr@redhat.com>
Cc: Christian Kellner <christian@kellner.me>
Cc: Adrian Reber <areber@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Link: https://lkml.kernel.org/r/20200824153036.3201505-1-surenb@google.com
Debugged-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29 09:55:15 +01:00
Ralph Campbell
2761fff65f mm/memcg: fix device private memcg accounting
[ Upstream commit 9a137153fc8798a89d8fce895cd0a06ea5b8e37c ]

The code in mc_handle_swap_pte() checks for non_swap_entry() and returns
NULL before checking is_device_private_entry() so device private pages are
never handled.  Fix this by checking for non_swap_entry() after handling
device private swap PTEs.

I assume the memory cgroup accounting would be off somehow when moving
a process to another memory cgroup.  Currently, the device private page
is charged like a normal anonymous page when allocated and is uncharged
when the page is freed so I think that path is OK.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Link: https://lkml.kernel.org/r/20201009215952.2726-1-rcampbell@nvidia.com
xFixes: c733a82874 ("mm/memcontrol: support MEMORY_DEVICE_PRIVATE")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29 09:55:15 +01:00
Greg Kroah-Hartman
9f80205d66 Merge 4.19.151 into android-4.19-stable
Changes in 4.19.151
	fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h
	Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts
	fbcon: Fix global-out-of-bounds read in fbcon_get_font()
	Revert "ravb: Fixed to be able to unload modules"
	net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key()
	drm/nouveau/mem: guard against NULL pointer access in mem_del
	usermodehelper: reset umask to default before executing user process
	platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360
	platform/x86: thinkpad_acpi: initialize tp_nvram_state variable
	platform/x86: intel-vbtn: Switch to an allow-list for SW_TABLET_MODE reporting
	platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse
	driver core: Fix probe_count imbalance in really_probe()
	perf top: Fix stdio interface input handling with glibc 2.28+
	i2c: i801: Exclude device from suspend direct complete optimization
	mtd: rawnand: sunxi: Fix the probe error path
	arm64: dts: stratix10: add status to qspi dts node
	nvme-core: put ctrl ref when module ref get fail
	macsec: avoid use-after-free in macsec_handle_frame()
	mm/khugepaged: fix filemap page_to_pgoff(page) != offset
	xfrmi: drop ignore_df check before updating pmtu
	cifs: Fix incomplete memory allocation on setxattr path
	i2c: meson: fix clock setting overwrite
	i2c: meson: fixup rate calculation with filter delay
	i2c: owl: Clear NACK and BUS error bits
	sctp: fix sctp_auth_init_hmacs() error path
	team: set dev->needed_headroom in team_setup_by_port()
	net: team: fix memory leak in __team_options_register
	openvswitch: handle DNAT tuple collision
	drm/amdgpu: prevent double kfree ttm->sg
	xfrm: clone XFRMA_SET_MARK in xfrm_do_migrate
	xfrm: clone XFRMA_REPLAY_ESN_VAL in xfrm_do_migrate
	xfrm: clone XFRMA_SEC_CTX in xfrm_do_migrate
	xfrm: clone whole liftime_cur structure in xfrm_do_migrate
	net: stmmac: removed enabling eee in EEE set callback
	platform/x86: fix kconfig dependency warning for FUJITSU_LAPTOP
	xfrm: Use correct address family in xfrm_state_find
	bonding: set dev->needed_headroom in bond_setup_by_slave()
	mdio: fix mdio-thunder.c dependency & build error
	net: usb: ax88179_178a: fix missing stop entry in driver_info
	net/mlx5e: Fix VLAN cleanup flow
	net/mlx5e: Fix VLAN create flow
	rxrpc: Fix rxkad token xdr encoding
	rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read()
	rxrpc: Fix some missing _bh annotations on locking conn->state_lock
	rxrpc: Fix server keyring leak
	perf: Fix task_function_call() error handling
	mmc: core: don't set limits.discard_granularity as 0
	mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
	net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails
	Linux 4.19.151

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I9ee2b0fc4fc39f27be6ae680529e1046f249a3e6
2020-10-14 12:11:08 +02:00
Vijay Balakrishna
94c5167581 mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
commit 4aab2be0983031a05cb4a19696c9da5749523426 upstream.

When memory is hotplug added or removed the min_free_kbytes should be
recalculated based on what is expected by khugepaged.  Currently after
hotplug, min_free_kbytes will be set to a lower default and higher
default set when THP enabled is lost.

This change restores min_free_kbytes as expected for THP consumers.

[vijayb@linux.microsoft.com: v5]
  Link: https://lkml.kernel.org/r/1601398153-5517-1-git-send-email-vijayb@linux.microsoft.com

Fixes: f000565adb ("thp: set recommended min free kbytes")
Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Allen Pais <apais@microsoft.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/1600305709-2319-2-git-send-email-vijayb@linux.microsoft.com
Link: https://lkml.kernel.org/r/1600204258-13683-1-git-send-email-vijayb@linux.microsoft.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-14 10:31:26 +02:00
Hugh Dickins
fbe96d5aab mm/khugepaged: fix filemap page_to_pgoff(page) != offset
commit 033b5d77551167f8c24ca862ce83d3e0745f9245 upstream.

There have been elusive reports of filemap_fault() hitting its
VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
with CONFIG_READ_ONLY_THP_FOR_FS=y.

Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged
without NUMA reuses the same huge page after collapse_file() failed
(whereas NUMA targets its allocation to the respective node each time).
And most of us were usually testing with CONFIG_NUMA=y kernels.

collapse_file(old start)
  new_page = khugepaged_alloc_page(hpage)
  __SetPageLocked(new_page)
  new_page->index = start // hpage->index=old offset
  new_page->mapping = mapping
  xas_store(&xas, new_page)

                          filemap_fault
                            page = find_get_page(mapping, offset)
                            // if offset falls inside hpage then
                            // compound_head(page) == hpage
                            lock_page_maybe_drop_mmap()
                              __lock_page(page)

  // collapse fails
  xas_store(&xas, old page)
  new_page->mapping = NULL
  unlock_page(new_page)

collapse_file(new start)
  new_page = khugepaged_alloc_page(hpage)
  __SetPageLocked(new_page)
  new_page->index = start // hpage->index=new offset
  new_page->mapping = mapping // mapping becomes valid again

                            // since compound_head(page) == hpage
                            // page_to_pgoff(page) got changed
                            VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)

An initial patch replaced __SetPageLocked() by lock_page(), which did
fix the race which Suren illustrates above.  But testing showed that it's
not good enough: if the racing task's __lock_page() gets delayed long
after its find_get_page(), then it may follow collapse_file(new start)'s
successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.

It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a
check and retry (as is done for mapping), with similar relaxations in
find_lock_entry() and pagecache_get_page(): but it's not obvious what
else might get caught out; and khugepaged non-NUMA appears to be unique
in exposing a page to page cache, then revoking, without going through
a full cycle of freeing before reuse.

Instead, non-NUMA khugepaged_prealloc_page() release the old page
if anyone else has a reference to it (1% of cases when I tested).

Although never reported on huge tmpfs, I believe its find_lock_entry()
has been at similar risk; but huge tmpfs does not rely on khugepaged
for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.

Reported-by: Denis Lisov <dennis.lissov@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569
Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.org
Reported-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw
Reported-and-analyzed-by: Suren Baghdasaryan <surenb@google.com>
Fixes: 87c460a0bded ("mm/khugepaged: collapse_shmem() without freezing new_page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v4.9+
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-14 10:31:23 +02:00
Greg Kroah-Hartman
2dce03a5c2 Merge 4.19.150 into android-4.19-stable
Changes in 4.19.150
	mmc: sdhci: Workaround broken command queuing on Intel GLK based IRBIS models
	USB: gadget: f_ncm: Fix NDP16 datagram validation
	gpio: mockup: fix resource leak in error path
	gpio: tc35894: fix up tc35894 interrupt configuration
	clk: socfpga: stratix10: fix the divider for the emac_ptp_free_clk
	vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock
	vsock/virtio: stop workers during the .remove()
	vsock/virtio: add transport parameter to the virtio_transport_reset_no_sock()
	net: virtio_vsock: Enhance connection semantics
	Input: i8042 - add nopnp quirk for Acer Aspire 5 A515
	ftrace: Move RCU is watching check after recursion check
	drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config
	drivers/net/wan/hdlc_fr: Add needed_headroom for PVC devices
	drm/sun4i: mixer: Extend regmap max_register
	net: dec: de2104x: Increase receive ring size for Tulip
	rndis_host: increase sleep time in the query-response loop
	nvme-core: get/put ctrl and transport module in nvme_dev_open/release()
	drivers/net/wan/lapbether: Make skb->protocol consistent with the header
	drivers/net/wan/hdlc: Set skb->protocol before transmitting
	mac80211: do not allow bigger VHT MPDUs than the hardware supports
	spi: fsl-espi: Only process interrupts for expected events
	nvme-fc: fail new connections to a deleted host or remote port
	gpio: sprd: Clear interrupt when setting the type as edge
	pinctrl: mvebu: Fix i2c sda definition for 98DX3236
	nfs: Fix security label length not being reset
	clk: samsung: exynos4: mark 'chipid' clock as CLK_IGNORE_UNUSED
	iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate()
	i2c: cpm: Fix i2c_ram structure
	Input: trackpoint - enable Synaptics trackpoints
	random32: Restore __latent_entropy attribute on net_rand_state
	mm: replace memmap_context by meminit_context
	mm: don't rely on system state to detect hot-plug operations
	net/packet: fix overflow in tpacket_rcv
	epoll: do not insert into poll queues until all sanity checks are done
	epoll: replace ->visited/visited_list with generation count
	epoll: EPOLL_CTL_ADD: close the race in decision to take fast path
	ep_create_wakeup_source(): dentry name can change under you...
	netfilter: ctnetlink: add a range check for l3/l4 protonum
	Linux 4.19.150

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib6f1b6fce01bec80efd4a905d03903ff20ca89be
2020-10-07 08:45:35 +02:00
Laurent Dufour
b6f69f72c1 mm: don't rely on system state to detect hot-plug operations
commit f85086f95fa36194eb0db5cd5c12e56801b98523 upstream.

In register_mem_sect_under_node() the system_state's value is checked to
detect whether the call is made during boot time or during an hot-plug
operation.  Unfortunately, that check against SYSTEM_BOOTING is wrong
because regular memory is registered at SYSTEM_SCHEDULING state.  In
addition, memory hot-plug operation can be triggered at this system
state by the ACPI [1].  So checking against the system state is not
enough.

The consequence is that on system with interleaved node's ranges like this:

 Early memory node ranges
   node   1: [mem 0x0000000000000000-0x000000011fffffff]
   node   2: [mem 0x0000000120000000-0x000000014fffffff]
   node   1: [mem 0x0000000150000000-0x00000001ffffffff]
   node   0: [mem 0x0000000200000000-0x000000048fffffff]
   node   2: [mem 0x0000000490000000-0x00000007ffffffff]

This can be seen on PowerPC LPAR after multiple memory hot-plug and
hot-unplug operations are done.  At the next reboot the node's memory
ranges can be interleaved and since the call to link_mem_sections() is
made in topology_init() while the system is in the SYSTEM_SCHEDULING
state, the node's id is not checked, and the sections registered to
multiple nodes:

  $ ls -l /sys/devices/system/memory/memory21/node*
  total 0
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node1 -> ../../node/node1
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node2 -> ../../node/node2

In that case, the system is able to boot but if later one of theses
memory blocks is hot-unplugged and then hot-plugged, the sysfs
inconsistency is detected and this is triggering a BUG_ON():

  kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4
  CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25
  Call Trace:
    add_memory_resource+0x23c/0x340 (unreliable)
    __add_memory+0x5c/0xf0
    dlpar_add_lmb+0x1b4/0x500
    dlpar_memory+0x1f8/0xb80
    handle_dlpar_errorlog+0xc0/0x190
    dlpar_store+0x198/0x4a0
    kobj_attr_store+0x30/0x50
    sysfs_kf_write+0x64/0x90
    kernfs_fop_write+0x1b0/0x290
    vfs_write+0xe8/0x290
    ksys_write+0xdc/0x130
    system_call_exception+0x160/0x270
    system_call_common+0xf0/0x27c

This patch addresses the root cause by not relying on the system_state
value to detect whether the call is due to a hot-plug operation.  An
extra parameter is added to link_mem_sections() detailing whether the
operation is due to a hot-plug operation.

[1] According to Oscar Salvador, using this qemu command line, ACPI
memory hotplug operations are raised at SYSTEM_SCHEDULING state:

  $QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \
        -m size=$MEM,slots=255,maxmem=4294967296k  \
        -numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \
        -object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \
        -object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \
        -object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \
        -object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \
        -object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \
        -object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \
        -object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \

Fixes: 4fbce63391 ("mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()")
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200915094143.79181-3-ldufour@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-07 08:00:08 +02:00
Laurent Dufour
25eaea1b33 mm: replace memmap_context by meminit_context
commit c1d0da83358a2316d9be7f229f26126dbaa07468 upstream.

Patch series "mm: fix memory to node bad links in sysfs", v3.

Sometimes, firmware may expose interleaved memory layout like this:

 Early memory node ranges
   node   1: [mem 0x0000000000000000-0x000000011fffffff]
   node   2: [mem 0x0000000120000000-0x000000014fffffff]
   node   1: [mem 0x0000000150000000-0x00000001ffffffff]
   node   0: [mem 0x0000000200000000-0x000000048fffffff]
   node   2: [mem 0x0000000490000000-0x00000007ffffffff]

In that case, we can see memory blocks assigned to multiple nodes in
sysfs:

  $ ls -l /sys/devices/system/memory/memory21
  total 0
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node1 -> ../../node/node1
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node2 -> ../../node/node2
  -rw-r--r-- 1 root root 65536 Aug 24 05:27 online
  -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_device
  -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_index
  drwxr-xr-x 2 root root     0 Aug 24 05:27 power
  -r--r--r-- 1 root root 65536 Aug 24 05:27 removable
  -rw-r--r-- 1 root root 65536 Aug 24 05:27 state
  lrwxrwxrwx 1 root root     0 Aug 24 05:25 subsystem -> ../../../../bus/memory
  -rw-r--r-- 1 root root 65536 Aug 24 05:25 uevent
  -r--r--r-- 1 root root 65536 Aug 24 05:27 valid_zones

The same applies in the node's directory with a memory21 link in both
the node1 and node2's directory.

This is wrong but doesn't prevent the system to run.  However when
later, one of these memory blocks is hot-unplugged and then hot-plugged,
the system is detecting an inconsistency in the sysfs layout and a
BUG_ON() is raised:

  kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084!
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4
  CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25
  Call Trace:
    add_memory_resource+0x23c/0x340 (unreliable)
    __add_memory+0x5c/0xf0
    dlpar_add_lmb+0x1b4/0x500
    dlpar_memory+0x1f8/0xb80
    handle_dlpar_errorlog+0xc0/0x190
    dlpar_store+0x198/0x4a0
    kobj_attr_store+0x30/0x50
    sysfs_kf_write+0x64/0x90
    kernfs_fop_write+0x1b0/0x290
    vfs_write+0xe8/0x290
    ksys_write+0xdc/0x130
    system_call_exception+0x160/0x270
    system_call_common+0xf0/0x27c

This has been seen on PowerPC LPAR.

The root cause of this issue is that when node's memory is registered,
the range used can overlap another node's range, thus the memory block
is registered to multiple nodes in sysfs.

There are two issues here:

 (a) The sysfs memory and node's layouts are broken due to these
     multiple links

 (b) The link errors in link_mem_sections() should not lead to a system
     panic.

To address (a) register_mem_sect_under_node should not rely on the
system state to detect whether the link operation is triggered by a hot
plug operation or not.  This is addressed by the patches 1 and 2 of this
series.

Issue (b) will be addressed separately.

This patch (of 2):

The memmap_context enum is used to detect whether a memory operation is
due to a hot-add operation or happening at boot time.

Make it general to the hotplug operation and rename it as
meminit_context.

There is no functional change introduced by this patch

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J . Wysocki" <rafael@kernel.org>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200915094143.79181-1-ldufour@linux.ibm.com
Link: https://lkml.kernel.org/r/20200915132624.9723-1-ldufour@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-07 08:00:08 +02:00
Greg Kroah-Hartman
9ce79d9bed Merge 4.19.149 into android-4.19-stable
Changes in 4.19.149
	selinux: allow labeling before policy is loaded
	media: mc-device.c: fix memleak in media_device_register_entity
	dma-fence: Serialise signal enabling (dma_fence_enable_sw_signaling)
	ath10k: fix array out-of-bounds access
	ath10k: fix memory leak for tpc_stats_final
	mm: fix double page fault on arm64 if PTE_AF is cleared
	scsi: aacraid: fix illegal IO beyond last LBA
	m68k: q40: Fix info-leak in rtc_ioctl
	gma/gma500: fix a memory disclosure bug due to uninitialized bytes
	ASoC: kirkwood: fix IRQ error handling
	media: smiapp: Fix error handling at NVM reading
	arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback
	x86/ioapic: Unbreak check_timer()
	ALSA: usb-audio: Add delay quirk for H570e USB headsets
	ALSA: hda/realtek - Couldn't detect Mic if booting with headset plugged
	ALSA: hda/realtek: Enable front panel headset LED on Lenovo ThinkStation P520
	lib/string.c: implement stpcpy
	leds: mlxreg: Fix possible buffer overflow
	PM / devfreq: tegra30: Fix integer overflow on CPU's freq max out
	scsi: fnic: fix use after free
	scsi: lpfc: Fix kernel crash at lpfc_nvme_info_show during remote port bounce
	net: silence data-races on sk_backlog.tail
	clk/ti/adpll: allocate room for terminating null
	drm/amdgpu/powerplay: fix AVFS handling with custom powerplay table
	mtd: cfi_cmdset_0002: don't free cfi->cfiq in error path of cfi_amdstd_setup()
	mfd: mfd-core: Protect against NULL call-back function pointer
	drm/amdgpu/powerplay/smu7: fix AVFS handling with custom powerplay table
	tpm_crb: fix fTPM on AMD Zen+ CPUs
	tracing: Adding NULL checks for trace_array descriptor pointer
	bcache: fix a lost wake-up problem caused by mca_cannibalize_lock
	dmaengine: mediatek: hsdma_probe: fixed a memory leak when devm_request_irq fails
	RDMA/qedr: Fix potential use after free
	RDMA/i40iw: Fix potential use after free
	fix dget_parent() fastpath race
	xfs: fix attr leaf header freemap.size underflow
	RDMA/iw_cgxb4: Fix an error handling path in 'c4iw_connect()'
	ubi: Fix producing anchor PEBs
	mmc: core: Fix size overflow for mmc partitions
	gfs2: clean up iopen glock mess in gfs2_create_inode
	scsi: pm80xx: Cleanup command when a reset times out
	debugfs: Fix !DEBUG_FS debugfs_create_automount
	CIFS: Properly process SMB3 lease breaks
	ASoC: max98090: remove msleep in PLL unlocked workaround
	kernel/sys.c: avoid copying possible padding bytes in copy_to_user
	KVM: arm/arm64: vgic: Fix potential double free dist->spis in __kvm_vgic_destroy()
	xfs: fix log reservation overflows when allocating large rt extents
	neigh_stat_seq_next() should increase position index
	rt_cpu_seq_next should increase position index
	ipv6_route_seq_next should increase position index
	seqlock: Require WRITE_ONCE surrounding raw_seqcount_barrier
	media: ti-vpe: cal: Restrict DMA to avoid memory corruption
	sctp: move trace_sctp_probe_path into sctp_outq_sack
	ACPI: EC: Reference count query handlers under lock
	scsi: ufs: Make ufshcd_add_command_trace() easier to read
	scsi: ufs: Fix a race condition in the tracing code
	dmaengine: zynqmp_dma: fix burst length configuration
	s390/cpum_sf: Use kzalloc and minor changes
	powerpc/eeh: Only dump stack once if an MMIO loop is detected
	Bluetooth: btrtl: Use kvmalloc for FW allocations
	tracing: Set kernel_stack's caller size properly
	ARM: 8948/1: Prevent OOB access in stacktrace
	ar5523: Add USB ID of SMCWUSBT-G2 wireless adapter
	ceph: ensure we have a new cap before continuing in fill_inode
	selftests/ftrace: fix glob selftest
	tools/power/x86/intel_pstate_tracer: changes for python 3 compatibility
	Bluetooth: Fix refcount use-after-free issue
	mm/swapfile.c: swap_next should increase position index
	mm: pagewalk: fix termination condition in walk_pte_range()
	Bluetooth: prefetch channel before killing sock
	KVM: fix overflow of zero page refcount with ksm running
	ALSA: hda: Clear RIRB status before reading WP
	skbuff: fix a data race in skb_queue_len()
	audit: CONFIG_CHANGE don't log internal bookkeeping as an event
	selinux: sel_avc_get_stat_idx should increase position index
	scsi: lpfc: Fix RQ buffer leakage when no IOCBs available
	scsi: lpfc: Fix coverity errors in fmdi attribute handling
	drm/omap: fix possible object reference leak
	clk: stratix10: use do_div() for 64-bit calculation
	crypto: chelsio - This fixes the kernel panic which occurs during a libkcapi test
	mt76: clear skb pointers from rx aggregation reorder buffer during cleanup
	ALSA: usb-audio: Don't create a mixer element with bogus volume range
	perf test: Fix test trace+probe_vfs_getname.sh on s390
	RDMA/rxe: Fix configuration of atomic queue pair attributes
	KVM: x86: fix incorrect comparison in trace event
	dmaengine: stm32-mdma: use vchan_terminate_vdesc() in .terminate_all
	media: staging/imx: Missing assignment in imx_media_capture_device_register()
	x86/pkeys: Add check for pkey "overflow"
	bpf: Remove recursion prevention from rcu free callback
	dmaengine: stm32-dma: use vchan_terminate_vdesc() in .terminate_all
	dmaengine: tegra-apb: Prevent race conditions on channel's freeing
	drm/amd/display: dal_ddc_i2c_payloads_create can fail causing panic
	firmware: arm_sdei: Use cpus_read_lock() to avoid races with cpuhp
	random: fix data races at timer_rand_state
	bus: hisi_lpc: Fixup IO ports addresses to avoid use-after-free in host removal
	media: go7007: Fix URB type for interrupt handling
	Bluetooth: guard against controllers sending zero'd events
	timekeeping: Prevent 32bit truncation in scale64_check_overflow()
	ext4: fix a data race at inode->i_disksize
	perf jevents: Fix leak of mapfile memory
	mm: avoid data corruption on CoW fault into PFN-mapped VMA
	drm/amdgpu: increase atombios cmd timeout
	drm/amd/display: Stop if retimer is not available
	ath10k: use kzalloc to read for ath10k_sdio_hif_diag_read
	scsi: aacraid: Disabling TM path and only processing IOP reset
	Bluetooth: L2CAP: handle l2cap config request during open state
	media: tda10071: fix unsigned sign extension overflow
	xfs: don't ever return a stale pointer from __xfs_dir3_free_read
	xfs: mark dir corrupt when lookup-by-hash fails
	ext4: mark block bitmap corrupted when found instead of BUGON
	tpm: ibmvtpm: Wait for buffer to be set before proceeding
	rtc: sa1100: fix possible race condition
	rtc: ds1374: fix possible race condition
	nfsd: Don't add locks to closed or closing open stateids
	RDMA/cm: Remove a race freeing timewait_info
	KVM: PPC: Book3S HV: Treat TM-related invalid form instructions on P9 like the valid ones
	drm/msm: fix leaks if initialization fails
	drm/msm/a5xx: Always set an OPP supported hardware value
	tracing: Use address-of operator on section symbols
	thermal: rcar_thermal: Handle probe error gracefully
	perf parse-events: Fix 3 use after frees found with clang ASAN
	serial: 8250_port: Don't service RX FIFO if throttled
	serial: 8250_omap: Fix sleeping function called from invalid context during probe
	serial: 8250: 8250_omap: Terminate DMA before pushing data on RX timeout
	perf cpumap: Fix snprintf overflow check
	cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_work_fn
	tools: gpio-hammer: Avoid potential overflow in main
	nvme-multipath: do not reset on unknown status
	nvme: Fix controller creation races with teardown flow
	RDMA/rxe: Set sys_image_guid to be aligned with HW IB devices
	scsi: hpsa: correct race condition in offload enabled
	SUNRPC: Fix a potential buffer overflow in 'svc_print_xprts()'
	svcrdma: Fix leak of transport addresses
	PCI: Use ioremap(), not phys_to_virt() for platform ROM
	ubifs: Fix out-of-bounds memory access caused by abnormal value of node_len
	ALSA: usb-audio: Fix case when USB MIDI interface has more than one extra endpoint descriptor
	PCI: pciehp: Fix MSI interrupt race
	NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests()
	mm/kmemleak.c: use address-of operator on section symbols
	mm/filemap.c: clear page error before actual read
	mm/vmscan.c: fix data races using kswapd_classzone_idx
	nvmet-rdma: fix double free of rdma queue
	mm/mmap.c: initialize align_offset explicitly for vm_unmapped_area
	scsi: qedi: Fix termination timeouts in session logout
	serial: uartps: Wait for tx_empty in console setup
	KVM: Remove CREATE_IRQCHIP/SET_PIT2 race
	bdev: Reduce time holding bd_mutex in sync in blkdev_close()
	drivers: char: tlclk.c: Avoid data race between init and interrupt handler
	KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
	net: openvswitch: use u64 for meter bucket
	scsi: aacraid: Fix error handling paths in aac_probe_one()
	staging:r8188eu: avoid skb_clone for amsdu to msdu conversion
	sparc64: vcc: Fix error return code in vcc_probe()
	arm64: cpufeature: Relax checks for AArch32 support at EL[0-2]
	dt-bindings: sound: wm8994: Correct required supplies based on actual implementaion
	atm: fix a memory leak of vcc->user_back
	perf mem2node: Avoid double free related to realloc
	power: supply: max17040: Correct voltage reading
	phy: samsung: s5pv210-usb2: Add delay after reset
	Bluetooth: Handle Inquiry Cancel error after Inquiry Complete
	USB: EHCI: ehci-mv: fix error handling in mv_ehci_probe()
	tipc: fix memory leak in service subscripting
	tty: serial: samsung: Correct clock selection logic
	ALSA: hda: Fix potential race in unsol event handler
	powerpc/traps: Make unrecoverable NMIs die instead of panic
	fuse: don't check refcount after stealing page
	USB: EHCI: ehci-mv: fix less than zero comparison of an unsigned int
	scsi: cxlflash: Fix error return code in cxlflash_probe()
	arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register
	e1000: Do not perform reset in reset_task if we are already down
	drm/nouveau/debugfs: fix runtime pm imbalance on error
	drm/nouveau: fix runtime pm imbalance on error
	drm/nouveau/dispnv50: fix runtime pm imbalance on error
	printk: handle blank console arguments passed in.
	usb: dwc3: Increase timeout for CmdAct cleared by device controller
	btrfs: don't force read-only after error in drop snapshot
	vfio/pci: fix memory leaks of eventfd ctx
	perf evsel: Fix 2 memory leaks
	perf trace: Fix the selection for architectures to generate the errno name tables
	perf stat: Fix duration_time value for higher intervals
	perf util: Fix memory leak of prefix_if_not_in
	perf metricgroup: Free metric_events on error
	perf kcore_copy: Fix module map when there are no modules loaded
	ASoC: img-i2s-out: Fix runtime PM imbalance on error
	wlcore: fix runtime pm imbalance in wl1271_tx_work
	wlcore: fix runtime pm imbalance in wlcore_regdomain_config
	mtd: rawnand: omap_elm: Fix runtime PM imbalance on error
	PCI: tegra: Fix runtime PM imbalance on error
	ceph: fix potential race in ceph_check_caps
	mm/swap_state: fix a data race in swapin_nr_pages
	rapidio: avoid data race between file operation callbacks and mport_cdev_add().
	mtd: parser: cmdline: Support MTD names containing one or more colons
	x86/speculation/mds: Mark mds_user_clear_cpu_buffers() __always_inline
	vfio/pci: Clear error and request eventfd ctx after releasing
	cifs: Fix double add page to memcg when cifs_readpages
	nvme: fix possible deadlock when I/O is blocked
	scsi: libfc: Handling of extra kref
	scsi: libfc: Skip additional kref updating work event
	selftests/x86/syscall_nt: Clear weird flags after each test
	vfio/pci: fix racy on error and request eventfd ctx
	btrfs: qgroup: fix data leak caused by race between writeback and truncate
	ubi: fastmap: Free unused fastmap anchor peb during detach
	perf parse-events: Use strcmp() to compare the PMU name
	net: openvswitch: use div_u64() for 64-by-32 divisions
	nvme: explicitly update mpath disk capacity on revalidation
	ASoC: wm8994: Skip setting of the WM8994_MICBIAS register for WM1811
	ASoC: wm8994: Ensure the device is resumed in wm89xx_mic_detect functions
	ASoC: Intel: bytcr_rt5640: Add quirk for MPMAN Converter9 2-in-1
	RISC-V: Take text_mutex in ftrace_init_nop()
	s390/init: add missing __init annotations
	lockdep: fix order in trace_hardirqs_off_caller()
	drm/amdkfd: fix a memory leak issue
	i2c: core: Call i2c_acpi_install_space_handler() before i2c_acpi_register_devices()
	objtool: Fix noreturn detection for ignored functions
	ieee802154: fix one possible memleak in ca8210_dev_com_init
	ieee802154/adf7242: check status of adf7242_read_reg
	clocksource/drivers/h8300_timer8: Fix wrong return value in h8300_8timer_init()
	mwifiex: Increase AES key storage size to 256 bits
	batman-adv: bla: fix type misuse for backbone_gw hash indexing
	atm: eni: fix the missed pci_disable_device() for eni_init_one()
	batman-adv: mcast/TT: fix wrongly dropped or rerouted packets
	mac802154: tx: fix use-after-free
	bpf: Fix clobbering of r2 in bpf_gen_ld_abs
	drm/vc4/vc4_hdmi: fill ASoC card owner
	net: qed: RDMA personality shouldn't fail VF load
	drm/sun4i: sun8i-csc: Secondary CSC register correction
	batman-adv: Add missing include for in_interrupt()
	batman-adv: mcast: fix duplicate mcast packets in BLA backbone from mesh
	batman-adv: mcast: fix duplicate mcast packets from BLA backbone to mesh
	bpf: Fix a rcu warning for bpffs map pretty-print
	ALSA: asihpi: fix iounmap in error handler
	regmap: fix page selection for noinc reads
	MIPS: Add the missing 'CPU_1074K' into __get_cpu_type()
	KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE
	KVM: SVM: Add a dedicated INVD intercept routine
	tracing: fix double free
	s390/dasd: Fix zero write for FBA devices
	kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()
	mm, THP, swap: fix allocating cluster for swapfile by mistake
	s390/zcrypt: Fix ZCRYPT_PERDEV_REQCNT ioctl
	kprobes: Fix compiler warning for !CONFIG_KPROBES_ON_FTRACE
	ata: define AC_ERR_OK
	ata: make qc_prep return ata_completion_errors
	ata: sata_mv, avoid trigerrable BUG_ON
	KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch
	Linux 4.19.149

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Idfc1b35ec63b4b464aeb6e32709102bee0efc872
2020-10-01 16:49:05 +02:00
Gao Xiang
f3e8ed3d33 mm, THP, swap: fix allocating cluster for swapfile by mistake
commit 41663430588c737dd735bad5a0d1ba325dcabd59 upstream.

SWP_FS is used to make swap_{read,write}page() go through the
filesystem, and it's only used for swap files over NFS.  So, !SWP_FS
means non NFS for now, it could be either file backed or device backed.
Something similar goes with legacy SWP_FILE.

So in order to achieve the goal of the original patch, SWP_BLKDEV should
be used instead.

FS corruption can be observed with SSD device + XFS + fragmented
swapfile due to CONFIG_THP_SWAP=y.

I reproduced the issue with the following details:

Environment:

  QEMU + upstream kernel + buildroot + NVMe (2 GB)

Kernel config:

  CONFIG_BLK_DEV_NVME=y
  CONFIG_THP_SWAP=y

Some reproducible steps:

  mkfs.xfs -f /dev/nvme0n1
  mkdir /tmp/mnt
  mount /dev/nvme0n1 /tmp/mnt
  bs="32k"
  sz="1024m"    # doesn't matter too much, I also tried 16m
  xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
  xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
  xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
  xfs_io -f -c "pwrite -F -S 0 -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
  xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fsync" /tmp/mnt/sw

  mkswap /tmp/mnt/sw
  swapon /tmp/mnt/sw

  stress --vm 2 --vm-bytes 600M   # doesn't matter too much as well

Symptoms:
 - FS corruption (e.g. checksum failure)
 - memory corruption at: 0xd2808010
 - segfault

Fixes: f0eea189e8 ("mm, THP, swap: Don't allocate huge cluster for file backed swap device")
Fixes: 38d8b4e6bd ("mm, THP, swap: delay splitting THP during swap out")
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Eric Sandeen <esandeen@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200820045323.7809-1-hsiangkao@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-01 13:14:53 +02:00
Qian Cai
8cc3afd53d mm/swap_state: fix a data race in swapin_nr_pages
[ Upstream commit d6c1f098f2a7ba62627c9bc17cda28f534ef9e4a ]

"prev_offset" is a static variable in swapin_nr_pages() that can be
accessed concurrently with only mmap_sem held in read mode as noticed by
KCSAN,

 BUG: KCSAN: data-race in swap_cluster_readahead / swap_cluster_readahead

 write to 0xffffffff92763830 of 8 bytes by task 14795 on cpu 17:
  swap_cluster_readahead+0x2a6/0x5e0
  swapin_readahead+0x92/0x8dc
  do_swap_page+0x49b/0xf20
  __handle_mm_fault+0xcfb/0xd70
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x715
  page_fault+0x34/0x40

 1 lock held by (dnf)/14795:
  #0: ffff897bd2e98858 (&mm->mmap_sem#2){++++}-{3:3}, at: do_page_fault+0x143/0x715
  do_user_addr_fault at arch/x86/mm/fault.c:1405
  (inlined by) do_page_fault at arch/x86/mm/fault.c:1535
 irq event stamp: 83493
 count_memcg_event_mm+0x1a6/0x270
 count_memcg_event_mm+0x119/0x270
 __do_softirq+0x365/0x589
 irq_exit+0xa2/0xc0

 read to 0xffffffff92763830 of 8 bytes by task 1 on cpu 22:
  swap_cluster_readahead+0xfd/0x5e0
  swapin_readahead+0x92/0x8dc
  do_swap_page+0x49b/0xf20
  __handle_mm_fault+0xcfb/0xd70
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x715
  page_fault+0x34/0x40

 1 lock held by systemd/1:
  #0: ffff897c38f14858 (&mm->mmap_sem#2){++++}-{3:3}, at: do_page_fault+0x143/0x715
 irq event stamp: 43530289
 count_memcg_event_mm+0x1a6/0x270
 count_memcg_event_mm+0x119/0x270
 __do_softirq+0x365/0x589
 irq_exit+0xa2/0xc0

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Cc: Hugh Dickins <hughd@google.com>
Link: http://lkml.kernel.org/r/20200402213748.2237-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:14:47 +02:00
Jaewon Kim
6bee7991f6 mm/mmap.c: initialize align_offset explicitly for vm_unmapped_area
[ Upstream commit 09ef5283fd96ac424ef0e569626f359bf9ab86c9 ]

On passing requirement to vm_unmapped_area, arch_get_unmapped_area and
arch_get_unmapped_area_topdown did not set align_offset.  Internally on
both unmapped_area and unmapped_area_topdown, if info->align_mask is 0,
then info->align_offset was meaningless.

But commit df529cabb7a2 ("mm: mmap: add trace point of
vm_unmapped_area") always prints info->align_offset even though it is
uninitialized.

Fix this uninitialized value issue by setting it to 0 explicitly.

Before:
  vm_unmapped_area: addr=0x755b155000 err=0 total_vm=0x15aaf0 flags=0x1 len=0x109000 lo=0x8000 hi=0x75eed48000 mask=0x0 ofs=0x4022

After:
  vm_unmapped_area: addr=0x74a4ca1000 err=0 total_vm=0x168ab1 flags=0x1 len=0x9000 lo=0x8000 hi=0x753d94b000 mask=0x0 ofs=0x0

Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20200409094035.19457-1-jaewon31.kim@samsung.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:14:42 +02:00
Qian Cai
b73c744019 mm/vmscan.c: fix data races using kswapd_classzone_idx
[ Upstream commit 5644e1fbbfe15ad06785502bbfe5751223e5841d ]

pgdat->kswapd_classzone_idx could be accessed concurrently in
wakeup_kswapd().  Plain writes and reads without any lock protection
result in data races.  Fix them by adding a pair of READ|WRITE_ONCE() as
well as saving a branch (compilers might well optimize the original code
in an unintentional way anyway).  While at it, also take care of
pgdat->kswapd_order and non-kswapd threads in allow_direct_reclaim().  The
data races were reported by KCSAN,

 BUG: KCSAN: data-race in wakeup_kswapd / wakeup_kswapd

 write to 0xffff9f427ffff2dc of 4 bytes by task 7454 on cpu 13:
  wakeup_kswapd+0xf1/0x400
  wakeup_kswapd at mm/vmscan.c:3967
  wake_all_kswapds+0x59/0xc0
  wake_all_kswapds at mm/page_alloc.c:4241
  __alloc_pages_slowpath+0xdcc/0x1290
  __alloc_pages_slowpath at mm/page_alloc.c:4512
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x16e/0x6f0
  __handle_mm_fault+0xcd5/0xd40
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 1 lock held by mtest01/7454:
  #0: ffff9f425afe8808 (&mm->mmap_sem#2){++++}, at:
 do_page_fault+0x143/0x6f9
 do_user_addr_fault at arch/x86/mm/fault.c:1405
 (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
 irq event stamp: 6944085
 count_memcg_event_mm+0x1a6/0x270
 count_memcg_event_mm+0x119/0x270
 __do_softirq+0x34c/0x57c
 irq_exit+0xa2/0xc0

 read to 0xffff9f427ffff2dc of 4 bytes by task 7472 on cpu 38:
  wakeup_kswapd+0xc8/0x400
  wake_all_kswapds+0x59/0xc0
  __alloc_pages_slowpath+0xdcc/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x16e/0x6f0
  __handle_mm_fault+0xcd5/0xd40
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 1 lock held by mtest01/7472:
  #0: ffff9f425a9ac148 (&mm->mmap_sem#2){++++}, at:
 do_page_fault+0x143/0x6f9
 irq event stamp: 6793561
 count_memcg_event_mm+0x1a6/0x270
 count_memcg_event_mm+0x119/0x270
 __do_softirq+0x34c/0x57c
 irq_exit+0xa2/0xc0

 BUG: KCSAN: data-race in kswapd / wakeup_kswapd

 write to 0xffff90973ffff2dc of 4 bytes by task 820 on cpu 6:
  kswapd+0x27c/0x8d0
  kthread+0x1e0/0x200
  ret_from_fork+0x27/0x50

 read to 0xffff90973ffff2dc of 4 bytes by task 6299 on cpu 0:
  wakeup_kswapd+0xf3/0x450
  wake_all_kswapds+0x59/0xc0
  __alloc_pages_slowpath+0xdcc/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/1582749472-5171-1-git-send-email-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:14:41 +02:00
Xianting Tian
cebefe4f6f mm/filemap.c: clear page error before actual read
[ Upstream commit faffdfa04fa11ccf048cebdde73db41ede0679e0 ]

Mount failure issue happens under the scenario: Application forked dozens
of threads to mount the same number of cramfs images separately in docker,
but several mounts failed with high probability.  Mount failed due to the
checking result of the page(read from the superblock of loop dev) is not
uptodate after wait_on_page_locked(page) returned in function cramfs_read:

   wait_on_page_locked(page);
   if (!PageUptodate(page)) {
      ...
   }

The reason of the checking result of the page not uptodate: systemd-udevd
read the loopX dev before mount, because the status of loopX is Lo_unbound
at this time, so loop_make_request directly trigger the calling of io_end
handler end_buffer_async_read, which called SetPageError(page).  So It
caused the page can't be set to uptodate in function
end_buffer_async_read:

   if(page_uptodate && !PageError(page)) {
      SetPageUptodate(page);
   }

Then mount operation is performed, it used the same page which is just
accessed by systemd-udevd above, Because this page is not uptodate, it
will launch a actual read via submit_bh, then wait on this page by calling
wait_on_page_locked(page).  When the I/O of the page done, io_end handler
end_buffer_async_read is called, because no one cleared the page
error(during the whole read path of mount), which is caused by
systemd-udevd reading, so this page is still in "PageError" status, which
can't be set to uptodate in function end_buffer_async_read, then caused
mount failure.

But sometimes mount succeed even through systemd-udeved read loopX dev
just before, The reason is systemd-udevd launched other loopX read just
between step 3.1 and 3.2, the steps as below:

1, loopX dev default status is Lo_unbound;
2, systemd-udved read loopX dev (page is set to PageError);
3, mount operation
   1) set loopX status to Lo_bound;
   ==>systemd-udevd read loopX dev<==
   2) read loopX dev(page has no error)
   3) mount succeed

As the loopX dev status is set to Lo_bound after step 3.1, so the other
loopX dev read by systemd-udevd will go through the whole I/O stack, part
of the call trace as below:

   SYS_read
      vfs_read
          do_sync_read
              blkdev_aio_read
                 generic_file_aio_read
                     do_generic_file_read:
                        ClearPageError(page);
                        mapping->a_ops->readpage(filp, page);

here, mapping->a_ops->readpage() is blkdev_readpage.  In latest kernel,
some function name changed, the call trace as below:

   blkdev_read_iter
      generic_file_read_iter
         generic_file_buffered_read:
            /*
             * A previous I/O error may have been due to temporary
             * failures, eg. mutipath errors.
             * Pg_error will be set again if readpage fails.
             */
            ClearPageError(page);
            /* Start the actual read. The read will unlock the page*/
            error=mapping->a_ops->readpage(flip, page);

We can see ClearPageError(page) is called before the actual read,
then the read in step 3.2 succeed.

This patch is to add the calling of ClearPageError just before the actual
read of read path of cramfs mount.  Without the patch, the call trace as
below when performing cramfs mount:

   do_mount
      cramfs_read
         cramfs_blkdev_read
            read_cache_page
               do_read_cache_page:
                  filler(data, page);
                  or
                  mapping->a_ops->readpage(data, page);

With the patch, the call trace as below when performing mount:

   do_mount
      cramfs_read
         cramfs_blkdev_read
            read_cache_page:
               do_read_cache_page:
                  ClearPageError(page); <== new add
                  filler(data, page);
                  or
                  mapping->a_ops->readpage(data, page);

With the patch, mount operation trigger the calling of
ClearPageError(page) before the actual read, the page has no error if no
additional page error happen when I/O done.

Signed-off-by: Xianting Tian <xianting_tian@126.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: <yubin@h3c.com>
Link: http://lkml.kernel.org/r/1583318844-22971-1-git-send-email-xianting_tian@126.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:14:41 +02:00
Nathan Chancellor
afe001488e mm/kmemleak.c: use address-of operator on section symbols
[ Upstream commit b0d14fc43d39203ae025f20ef4d5d25d9ccf4be1 ]

Clang warns:

  mm/kmemleak.c:1955:28: warning: array comparison always evaluates to a constant [-Wtautological-compare]
        if (__start_ro_after_init < _sdata || __end_ro_after_init > _edata)
                                  ^
  mm/kmemleak.c:1955:60: warning: array comparison always evaluates to a constant [-Wtautological-compare]
        if (__start_ro_after_init < _sdata || __end_ro_after_init > _edata)

These are not true arrays, they are linker defined symbols, which are just
addresses.  Using the address of operator silences the warning and does
not change the resulting assembly with either clang/ld.lld or gcc/ld
(tested with diff + objdump -Dr).

Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/895
Link: http://lkml.kernel.org/r/20200220051551.44000-1-natechancellor@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:14:41 +02:00
Kirill A. Shutemov
2b294ac325 mm: avoid data corruption on CoW fault into PFN-mapped VMA
[ Upstream commit c3e5ea6ee574ae5e845a40ac8198de1fb63bb3ab ]

Jeff Moyer has reported that one of xfstests triggers a warning when run
on DAX-enabled filesystem:

	WARNING: CPU: 76 PID: 51024 at mm/memory.c:2317 wp_page_copy+0xc40/0xd50
	...
	wp_page_copy+0x98c/0xd50 (unreliable)
	do_wp_page+0xd8/0xad0
	__handle_mm_fault+0x748/0x1b90
	handle_mm_fault+0x120/0x1f0
	__do_page_fault+0x240/0xd70
	do_page_fault+0x38/0xd0
	handle_page_fault+0x10/0x30

The warning happens on failed __copy_from_user_inatomic() which tries to
copy data into a CoW page.

This happens because of race between MADV_DONTNEED and CoW page fault:

	CPU0					CPU1
 handle_mm_fault()
   do_wp_page()
     wp_page_copy()
       do_wp_page()
					madvise(MADV_DONTNEED)
					  zap_page_range()
					    zap_pte_range()
					      ptep_get_and_clear_full()
					      <TLB flush>
	 __copy_from_user_inatomic()
	 sees empty PTE and fails
	 WARN_ON_ONCE(1)
	 clear_page()

The solution is to re-try __copy_from_user_inatomic() under PTL after
checking that PTE is matches the orig_pte.

The second copy attempt can still fail, like due to non-readable PTE, but
there's nothing reasonable we can do about, except clearing the CoW page.

Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Justin He <Justin.He@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Link: http://lkml.kernel.org/r/20200218154151.13349-1-kirill.shutemov@linux.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:14:36 +02:00
Steven Price
f9cb6b6124 mm: pagewalk: fix termination condition in walk_pte_range()
[ Upstream commit c02a98753e0a36ba65a05818626fa6adeb4e7c97 ]

If walk_pte_range() is called with a 'end' argument that is beyond the
last page of memory (e.g.  ~0UL) then the comparison between 'addr' and
'end' will always fail and the loop will be infinite.  Instead change the
comparison to >= while accounting for overflow.

Link: http://lkml.kernel.org/r/20191218162402.45610-15-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:14:32 +02:00
Vasily Averin
52f5a09ab7 mm/swapfile.c: swap_next should increase position index
[ Upstream commit 10c8d69f314d557d94d74ec492575ae6a4f1eb1c ]

If seq_file .next fuction does not change position index, read after
some lseek can generate unexpected output.

In Aug 2018 NeilBrown noticed commit 1f4aace60b ("fs/seq_file.c:
simplify seq_file iteration code and interface") "Some ->next functions
do not increment *pos when they return NULL...  Note that such ->next
functions are buggy and should be fixed.  A simple demonstration is

  dd if=/proc/swaps bs=1000 skip=1

Choose any block size larger than the size of /proc/swaps.  This will
always show the whole last line of /proc/swaps"

Described problem is still actual.  If you make lseek into middle of
last output line following read will output end of last line and whole
last line once again.

  $ dd if=/proc/swaps bs=1  # usual output
  Filename				Type		Size	Used	Priority
  /dev/dm-0                               partition	4194812	97536	-2
  104+0 records in
  104+0 records out
  104 bytes copied

  $ dd if=/proc/swaps bs=40 skip=1    # last line was generated twice
  dd: /proc/swaps: cannot skip to specified offset
  v/dm-0                               partition	4194812	97536	-2
  /dev/dm-0                               partition	4194812	97536	-2
  3+1 records in
  3+1 records out
  131 bytes copied

https://bugzilla.kernel.org/show_bug.cgi?id=206283

Link: http://lkml.kernel.org/r/bd8cfd7b-ac95-9b91-f9e7-e8438bd5047d@virtuozzo.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jann Horn <jannh@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:14:32 +02:00
Jia He
8579a04403 mm: fix double page fault on arm64 if PTE_AF is cleared
[ Upstream commit 83d116c53058d505ddef051e90ab27f57015b025 ]

When we tested pmdk unit test [1] vmmalloc_fork TEST3 on arm64 guest, there
will be a double page fault in __copy_from_user_inatomic of cow_user_page.

To reproduce the bug, the cmd is as follows after you deployed everything:
make -C src/test/vmmalloc_fork/ TEST_TIME=60m check

Below call trace is from arm64 do_page_fault for debugging purpose:
[  110.016195] Call trace:
[  110.016826]  do_page_fault+0x5a4/0x690
[  110.017812]  do_mem_abort+0x50/0xb0
[  110.018726]  el1_da+0x20/0xc4
[  110.019492]  __arch_copy_from_user+0x180/0x280
[  110.020646]  do_wp_page+0xb0/0x860
[  110.021517]  __handle_mm_fault+0x994/0x1338
[  110.022606]  handle_mm_fault+0xe8/0x180
[  110.023584]  do_page_fault+0x240/0x690
[  110.024535]  do_mem_abort+0x50/0xb0
[  110.025423]  el0_da+0x20/0x24

The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
[ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003,
               pmd=000000023d4b3003, pte=360000298607bd3

As told by Catalin: "On arm64 without hardware Access Flag, copying from
user will fail because the pte is old and cannot be marked young. So we
always end up with zeroed page after fork() + CoW for pfn mappings. we
don't always have a hardware-managed access flag on arm64."

This patch fixes it by calling pte_mkyoung. Also, the parameter is
changed because vmf should be passed to cow_user_page()

Add a WARN_ON_ONCE when __copy_from_user_inatomic() returns error
in case there can be some obscure use-case (by Kirill).

[1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork

Signed-off-by: Jia He <justin.he@arm.com>
Reported-by: Yibo Cai <Yibo.Cai@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:14:24 +02:00
Greg Kroah-Hartman
5d5e582aab Merge 4.19.148 into android-4.19-stable
Changes in 4.19.148
	af_key: pfkey_dump needs parameter validation
	KVM: fix memory leak in kvm_io_bus_unregister_dev()
	kprobes: fix kill kprobe which has been marked as gone
	mm/thp: fix __split_huge_pmd_locked() for migration PMD
	cxgb4: Fix offset when clearing filter byte counters
	geneve: add transport ports in route lookup for geneve
	hdlc_ppp: add range checks in ppp_cp_parse_cr()
	ip: fix tos reflection in ack and reset packets
	ipv6: avoid lockdep issue in fib6_del()
	net: DCB: Validate DCB_ATTR_DCB_BUFFER argument
	net: dsa: rtl8366: Properly clear member config
	net: ipv6: fix kconfig dependency warning for IPV6_SEG6_HMAC
	net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc
	nfp: use correct define to return NONE fec
	tipc: Fix memory leak in tipc_group_create_member()
	tipc: fix shutdown() of connection oriented socket
	tipc: use skb_unshare() instead in tipc_buf_append()
	bnxt_en: return proper error codes in bnxt_show_temp
	bnxt_en: Protect bnxt_set_eee() and bnxt_set_pauseparam() with mutex.
	net: phy: Avoid NPD upon phy_detach() when driver is unbound
	net: qrtr: check skb_put_padto() return value
	net: add __must_check to skb_put_padto()
	ipv4: Update exception handling for multipath routes via same device
	MAINTAINERS: add CLANG/LLVM BUILD SUPPORT info
	kbuild: add OBJSIZE variable for the size tool
	Documentation/llvm: add documentation on building w/ Clang/LLVM
	Documentation/llvm: fix the name of llvm-size
	net: wan: wanxl: use allow to pass CROSS_COMPILE_M68k for rebuilding firmware
	net: wan: wanxl: use $(M68KCC) instead of $(M68KAS) for rebuilding firmware
	x86/boot: kbuild: allow readelf executable to be specified
	kbuild: remove AS variable
	kbuild: replace AS=clang with LLVM_IAS=1
	kbuild: support LLVM=1 to switch the default tools to Clang/LLVM
	mm: memcg: fix memcg reclaim soft lockup
	tcp_bbr: refactor bbr_target_cwnd() for general inflight provisioning
	tcp_bbr: adapt cwnd based on ack aggregation estimation
	serial: 8250: Avoid error message on reprobe
	Linux 4.19.148

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5077093f58ae882547e3b23c5bbd4dfe78116071
2020-09-27 07:58:48 +02:00
Xunlei Pang
1aa7a9e5ee mm: memcg: fix memcg reclaim soft lockup
commit e3336cab2579012b1e72b5265adf98e2d6e244ad upstream.

We've met softlockup with "CONFIG_PREEMPT_NONE=y", when the target memcg
doesn't have any reclaimable memory.

It can be easily reproduced as below:

  watchdog: BUG: soft lockup - CPU#0 stuck for 111s![memcg_test:2204]
  CPU: 0 PID: 2204 Comm: memcg_test Not tainted 5.9.0-rc2+ #12
  Call Trace:
    shrink_lruvec+0x49f/0x640
    shrink_node+0x2a6/0x6f0
    do_try_to_free_pages+0xe9/0x3e0
    try_to_free_mem_cgroup_pages+0xef/0x1f0
    try_charge+0x2c1/0x750
    mem_cgroup_charge+0xd7/0x240
    __add_to_page_cache_locked+0x2fd/0x370
    add_to_page_cache_lru+0x4a/0xc0
    pagecache_get_page+0x10b/0x2f0
    filemap_fault+0x661/0xad0
    ext4_filemap_fault+0x2c/0x40
    __do_fault+0x4d/0xf9
    handle_mm_fault+0x1080/0x1790

It only happens on our 1-vcpu instances, because there's no chance for
oom reaper to run to reclaim the to-be-killed process.

Add a cond_resched() at the upper shrink_node_memcgs() to solve this
issue, this will mean that we will get a scheduling point for each memcg
in the reclaimed hierarchy without any dependency on the reclaimable
memory in that memcg thus making it more predictable.

Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: http://lkml.kernel.org/r/1598495549-67324-1-git-send-email-xlpang@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Julius Hemanth Pitti <jpitti@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-26 18:01:32 +02:00
Ralph Campbell
ec56646e3b mm/thp: fix __split_huge_pmd_locked() for migration PMD
[ Upstream commit ec0abae6dcdf7ef88607c869bf35a4b63ce1b370 ]

A migrating transparent huge page has to already be unmapped.  Otherwise,
the page could be modified while it is being copied to a new page and data
could be lost.  The function __split_huge_pmd() checks for a PMD migration
entry before calling __split_huge_pmd_locked() leading one to think that
__split_huge_pmd_locked() can handle splitting a migrating PMD.

However, the code always increments the page->_mapcount and adjusts the
memory control group accounting assuming the page is mapped.

Also, if the PMD entry is a migration PMD entry, the call to
is_huge_zero_pmd(*pmd) is incorrect because it calls pmd_pfn(pmd) instead
of migration_entry_to_pfn(pmd_to_swp_entry(pmd)).  Fix these problems by
checking for a PMD migration entry.

Fixes: 84c3fc4e9c ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>	[4.14+]
Link: https://lkml.kernel.org/r/20200903183140.19055-1-rcampbell@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-26 18:01:28 +02:00
Greg Kroah-Hartman
0b8c61c48e Merge 4.19.147 into android-4.19-stable
Changes in 4.19.147
	dsa: Allow forwarding of redirected IGMP traffic
	scsi: qla2xxx: Update rscn_rcvd field to more meaningful scan_needed
	scsi: qla2xxx: Move rport registration out of internal work_list
	scsi: qla2xxx: Reduce holding sess_lock to prevent CPU lock-up
	gfs2: initialize transaction tr_ailX_lists earlier
	RDMA/bnxt_re: Restrict the max_gids to 256
	net: handle the return value of pskb_carve_frag_list() correctly
	hv_netvsc: Remove "unlikely" from netvsc_select_queue
	NFSv4.1 handle ERR_DELAY error reclaiming locking state on delegation recall
	scsi: pm8001: Fix memleak in pm8001_exec_internal_task_abort
	scsi: libfc: Fix for double free()
	scsi: lpfc: Fix FLOGI/PLOGI receive race condition in pt2pt discovery
	regulator: pwm: Fix machine constraints application
	spi: spi-loopback-test: Fix out-of-bounds read
	NFS: Zero-stateid SETATTR should first return delegation
	SUNRPC: stop printk reading past end of string
	rapidio: Replace 'select' DMAENGINES 'with depends on'
	openrisc: Fix cache API compile issue when not inlining
	nvme-fc: cancel async events before freeing event struct
	nvme-rdma: cancel async events before freeing event struct
	f2fs: fix indefinite loop scanning for free nid
	f2fs: Return EOF on unaligned end of file DIO read
	i2c: algo: pca: Reapply i2c bus settings after reset
	spi: Fix memory leak on splited transfers
	KVM: MIPS: Change the definition of kvm type
	clk: davinci: Use the correct size when allocating memory
	clk: rockchip: Fix initialization of mux_pll_src_4plls_p
	ASoC: qcom: Set card->owner to avoid warnings
	Drivers: hv: vmbus: Add timeout to vmbus_wait_for_unload
	perf test: Fix the "signal" test inline assembly
	MIPS: SNI: Fix MIPS_L1_CACHE_SHIFT
	perf test: Free formats for perf pmu parse test
	fbcon: Fix user font detection test at fbcon_resize().
	MIPS: SNI: Fix spurious interrupts
	drm/mediatek: Add exception handing in mtk_drm_probe() if component init fail
	drm/mediatek: Add missing put_device() call in mtk_hdmi_dt_parse_pdata()
	USB: quirks: Add USB_QUIRK_IGNORE_REMOTE_WAKEUP quirk for BYD zhaoxin notebook
	USB: UAS: fix disconnect by unplugging a hub
	usblp: fix race between disconnect() and read()
	i2c: i801: Fix resume bug
	Revert "ALSA: hda - Fix silent audio output and corrupted input on MSI X570-A PRO"
	percpu: fix first chunk size calculation for populated bitmap
	Input: trackpoint - add new trackpoint variant IDs
	Input: i8042 - add Entroware Proteus EL07R4 to nomux and reset lists
	serial: 8250_pci: Add Realtek 816a and 816b
	x86/boot/compressed: Disable relocation relaxation
	ehci-hcd: Move include to keep CRC stable
	powerpc/dma: Fix dma_map_ops::get_required_mask
	x86/defconfig: Enable CONFIG_USB_XHCI_HCD=y
	Linux 4.19.147

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1c512021698db2701c51491a813bec79bda6bbf5
2020-09-24 12:48:04 +02:00
Sunghyun Jin
5afd52f302 percpu: fix first chunk size calculation for populated bitmap
commit b3b33d3c43bbe0177d70653f4e889c78cc37f097 upstream.

Variable populated, which is a member of struct pcpu_chunk, is used as a
unit of size of unsigned long.
However, size of populated is miscounted. So, I fix this minor part.

Fixes: 8ab16c43ea ("percpu: change the number of pages marked in the first_chunk pop bitmap")
Cc: <stable@vger.kernel.org> # 4.14+
Signed-off-by: Sunghyun Jin <mcsmonk@gmail.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-23 12:11:01 +02:00
Greg Kroah-Hartman
009b982d9c Merge 4.19.144 into android-4.19-stable
Changes in 4.19.144
	HID: core: Correctly handle ReportSize being zero
	HID: core: Sanitize event code and type when mapping input
	perf record/stat: Explicitly call out event modifiers in the documentation
	scsi: target: tcmu: Fix size in calls to tcmu_flush_dcache_range
	scsi: target: tcmu: Optimize use of flush_dcache_page
	tty: serial: qcom_geni_serial: Drop __init from qcom_geni_console_setup
	drm/msm: add shutdown support for display platform_driver
	hwmon: (applesmc) check status earlier.
	nvmet: Disable keep-alive timer when kato is cleared to 0h
	drm/msm/a6xx: fix gmu start on newer firmware
	ceph: don't allow setlease on cephfs
	cpuidle: Fixup IRQ state
	s390: don't trace preemption in percpu macros
	xen/xenbus: Fix granting of vmalloc'd memory
	dmaengine: of-dma: Fix of_dma_router_xlate's of_dma_xlate handling
	batman-adv: Avoid uninitialized chaddr when handling DHCP
	batman-adv: Fix own OGM check in aggregated OGMs
	batman-adv: bla: use netif_rx_ni when not in interrupt context
	dmaengine: at_hdmac: check return value of of_find_device_by_node() in at_dma_xlate()
	MIPS: mm: BMIPS5000 has inclusive physical caches
	MIPS: BMIPS: Also call bmips_cpu_setup() for secondary cores
	netfilter: nf_tables: add NFTA_SET_USERDATA if not null
	netfilter: nf_tables: incorrect enum nft_list_attributes definition
	netfilter: nf_tables: fix destination register zeroing
	net: hns: Fix memleak in hns_nic_dev_probe
	net: systemport: Fix memleak in bcm_sysport_probe
	ravb: Fixed to be able to unload modules
	net: arc_emac: Fix memleak in arc_mdio_probe
	dmaengine: pl330: Fix burst length if burst size is smaller than bus width
	gtp: add GTPA_LINK info to msg sent to userspace
	bnxt_en: Don't query FW when netif_running() is false.
	bnxt_en: Check for zero dir entries in NVRAM.
	bnxt_en: Fix PCI AER error recovery flow
	bnxt_en: fix HWRM error when querying VF temperature
	xfs: fix boundary test in xfs_attr_shortform_verify
	bnxt: don't enable NAPI until rings are ready
	selftests/bpf: Fix massive output from test_maps
	netfilter: nfnetlink: nfnetlink_unicast() reports EAGAIN instead of ENOBUFS
	nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()'
	perf tools: Correct SNOOPX field offset
	net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
	fix regression in "epoll: Keep a reference on files added to the check list"
	net: gemini: Fix another missing clk_disable_unprepare() in probe
	xfs: fix xfs_bmap_validate_extent_raw when checking attr fork of rt files
	perf jevents: Fix suspicious code in fixregex()
	tg3: Fix soft lockup when tg3_reset_task() fails.
	x86, fakenuma: Fix invalid starting node ID
	iommu/vt-d: Serialize IOMMU GCMD register modifications
	thermal: ti-soc-thermal: Fix bogus thermal shutdowns for omap4430
	include/linux/log2.h: add missing () around n in roundup_pow_of_two()
	ext2: don't update mtime on COW faults
	xfs: don't update mtime on COW faults
	btrfs: drop path before adding new uuid tree entry
	vfio/type1: Support faulting PFNMAP vmas
	vfio-pci: Fault mmaps to enable vma tracking
	vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
	btrfs: Remove redundant extent_buffer_get in get_old_root
	btrfs: Remove extraneous extent_buffer_get from tree_mod_log_rewind
	btrfs: set the lockdep class for log tree extent buffers
	uaccess: Add non-pagefault user-space read functions
	uaccess: Add non-pagefault user-space write function
	btrfs: fix potential deadlock in the search ioctl
	net: usb: qmi_wwan: add Telit 0x1050 composition
	usb: qmi_wwan: add D-Link DWM-222 A2 device ID
	ALSA: ca0106: fix error code handling
	ALSA: pcm: oss: Remove superfluous WARN_ON() for mulaw sanity check
	ALSA: hda/hdmi: always check pin power status in i915 pin fixup
	ALSA: firewire-digi00x: exclude Avid Adrenaline from detection
	ALSA: hda - Fix silent audio output and corrupted input on MSI X570-A PRO
	media: rc: do not access device via sysfs after rc_unregister_device()
	media: rc: uevent sysfs file races with rc_unregister_device()
	affs: fix basic permission bits to actually work
	block: allow for_each_bvec to support zero len bvec
	libata: implement ATA_HORKAGE_MAX_TRIM_128M and apply to Sandisks
	dm writecache: handle DAX to partitions on persistent memory correctly
	dm cache metadata: Avoid returning cmd->bm wild pointer on error
	dm thin metadata: Avoid returning cmd->bm wild pointer on error
	mm: slub: fix conversion of freelist_corrupted()
	KVM: arm64: Add kvm_extable for vaxorcism code
	KVM: arm64: Defer guest entry when an asynchronous exception is pending
	KVM: arm64: Survive synchronous exceptions caused by AT instructions
	KVM: arm64: Set HCR_EL2.PTW to prevent AT taking synchronous exception
	vfio/pci: Fix SR-IOV VF handling with MMIO blocking
	checkpatch: fix the usage of capture group ( ... )
	mm/hugetlb: fix a race between hugetlb sysctl handlers
	cfg80211: regulatory: reject invalid hints
	net: usb: Fix uninit-was-stored issue in asix_read_phy_addr()
	Linux 4.19.144

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I81d6b3f044fe0dd919d1ece16d131c2185c00bb3
2020-09-09 19:48:58 +02:00
Muchun Song
221ea9a3da mm/hugetlb: fix a race between hugetlb sysctl handlers
commit 17743798d81238ab13050e8e2833699b54e15467 upstream.

There is a race between the assignment of `table->data` and write value
to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
the other thread.

  CPU0:                                 CPU1:
                                        proc_sys_write
  hugetlb_sysctl_handler                  proc_sys_call_handler
  hugetlb_sysctl_handler_common             hugetlb_sysctl_handler
    table->data = &tmp;                       hugetlb_sysctl_handler_common
                                                table->data = &tmp;
      proc_doulongvec_minmax
        do_proc_doulongvec_minmax           sysctl_head_finish
          __do_proc_doulongvec_minmax         unuse_table
            i = table->data;
            *i = val;  // corrupt CPU1's stack

Fix this by duplicating the `table`, and only update the duplicate of
it.  And introduce a helper of proc_hugetlb_doulongvec_minmax() to
simplify the code.

The following oops was seen:

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    #PF: supervisor instruction fetch in kernel mode
    #PF: error_code(0x0010) - not-present page
    Code: Bad RIP value.
    ...
    Call Trace:
     ? set_max_huge_pages+0x3da/0x4f0
     ? alloc_pool_huge_page+0x150/0x150
     ? proc_doulongvec_minmax+0x46/0x60
     ? hugetlb_sysctl_handler_common+0x1c7/0x200
     ? nr_hugepages_store+0x20/0x20
     ? copy_fd_bitmaps+0x170/0x170
     ? hugetlb_sysctl_handler+0x1e/0x20
     ? proc_sys_call_handler+0x2f1/0x300
     ? unregister_sysctl_table+0xb0/0xb0
     ? __fd_install+0x78/0x100
     ? proc_sys_write+0x14/0x20
     ? __vfs_write+0x4d/0x90
     ? vfs_write+0xef/0x240
     ? ksys_write+0xc0/0x160
     ? __ia32_sys_read+0x50/0x50
     ? __close_fd+0x129/0x150
     ? __x64_sys_write+0x43/0x50
     ? do_syscall_64+0x6c/0x200
     ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: e5ff215941 ("hugetlb: multiple hstates for multiple page sizes")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20200828031146.43035-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-09 19:04:32 +02:00
Eugeniu Rosca
af2cf2c5a2 mm: slub: fix conversion of freelist_corrupted()
commit dc07a728d49cf025f5da2c31add438d839d076c0 upstream.

Commit 52f23478081ae0 ("mm/slub.c: fix corrupted freechain in
deactivate_slab()") suffered an update when picked up from LKML [1].

Specifically, relocating 'freelist = NULL' into 'freelist_corrupted()'
created a no-op statement.  Fix it by sticking to the behavior intended
in the original patch [1].  In addition, make freelist_corrupted()
immune to passing NULL instead of &freelist.

The issue has been spotted via static analysis and code review.

[1] https://lore.kernel.org/linux-mm/20200331031450.12182-1-dongli.zhang@oracle.com/

Fixes: 52f23478081ae0 ("mm/slub.c: fix corrupted freechain in deactivate_slab()")
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200824130643.10291-1-erosca@de.adit-jv.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-09 19:04:31 +02:00
Daniel Borkmann
cfb4721fce uaccess: Add non-pagefault user-space write function
[ Upstream commit 1d1585ca0f48fe7ed95c3571f3e4a82b2b5045dc ]

Commit 3d7081822f7f ("uaccess: Add non-pagefault user-space read functions")
missed to add probe write function, therefore factor out a probe_write_common()
helper with most logic of probe_kernel_write() except setting KERNEL_DS, and
add a new probe_user_write() helper so it can be used from BPF side.

Again, on some archs, the user address space and kernel address space can
co-exist and be overlapping, so in such case, setting KERNEL_DS would mean
that the given address is treated as being in kernel address space.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/bpf/9df2542e68141bfa3addde631441ee45503856a8.1572649915.git.daniel@iogearbox.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-09 19:04:29 +02:00
Masami Hiramatsu
61135a9c74 uaccess: Add non-pagefault user-space read functions
[ Upstream commit 3d7081822f7f9eab867d9bcc8fd635208ec438e0 ]

Add probe_user_read(), strncpy_from_unsafe_user() and
strnlen_unsafe_user() which allows caller to access user-space
in IRQ context.

Current probe_kernel_read() and strncpy_from_unsafe() are
not available for user-space memory, because it sets
KERNEL_DS while accessing data. On some arch, user address
space and kernel address space can be co-exist, but others
can not. In that case, setting KERNEL_DS means given
address is treated as a kernel address space.
Also strnlen_user() is only available from user context since
it can sleep if pagefault is enabled.

To access user-space memory without pagefault, we need
these new functions which sets USER_DS while accessing
the data.

Link: http://lkml.kernel.org/r/155789869802.26965.4940338412595759063.stgit@devnote2

Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-09 19:04:29 +02:00
Greg Kroah-Hartman
599bf02de4 Merge 4.19.142 into android-4.19-stable
Changes in 4.19.142
	drm/vgem: Replace opencoded version of drm_gem_dumb_map_offset()
	perf probe: Fix memory leakage when the probe point is not found
	khugepaged: khugepaged_test_exit() check mmget_still_valid()
	khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
	btrfs: export helpers for subvolume name/id resolution
	btrfs: don't show full path of bind mounts in subvol=
	btrfs: Move free_pages_out label in inline extent handling branch in compress_file_range
	btrfs: inode: fix NULL pointer dereference if inode doesn't need compression
	btrfs: sysfs: use NOFS for device creation
	romfs: fix uninitialized memory leak in romfs_dev_read()
	kernel/relay.c: fix memleak on destroy relay channel
	mm: include CMA pages in lowmem_reserve at boot
	mm, page_alloc: fix core hung in free_pcppages_bulk()
	ext4: fix checking of directory entry validity for inline directories
	jbd2: add the missing unlock_buffer() in the error path of jbd2_write_superblock()
	scsi: zfcp: Fix use-after-free in request timeout handlers
	drm/amd/display: fix pow() crashing when given base 0
	kthread: Do not preempt current task if it is going to call schedule()
	spi: Prevent adding devices below an unregistering controller
	scsi: ufs: Add DELAY_BEFORE_LPM quirk for Micron devices
	scsi: target: tcmu: Fix crash in tcmu_flush_dcache_range on ARM
	media: budget-core: Improve exception handling in budget_register()
	rtc: goldfish: Enable interrupt in set_alarm() when necessary
	media: vpss: clean up resources in init
	Input: psmouse - add a newline when printing 'proto' by sysfs
	m68knommu: fix overwriting of bits in ColdFire V3 cache control
	svcrdma: Fix another Receive buffer leak
	xfs: fix inode quota reservation checks
	jffs2: fix UAF problem
	ceph: fix use-after-free for fsc->mdsc
	cpufreq: intel_pstate: Fix cpuinfo_max_freq when MSR_TURBO_RATIO_LIMIT is 0
	scsi: libfc: Free skb in fc_disc_gpn_id_resp() for valid cases
	virtio_ring: Avoid loop when vq is broken in virtqueue_poll
	tools/testing/selftests/cgroup/cgroup_util.c: cg_read_strcmp: fix null pointer dereference
	xfs: Fix UBSAN null-ptr-deref in xfs_sysfs_init
	alpha: fix annotation of io{read,write}{16,32}be()
	fs/signalfd.c: fix inconsistent return codes for signalfd4
	ext4: fix potential negative array index in do_split()
	ext4: don't allow overlapping system zones
	ASoC: q6routing: add dummy register read/write function
	i40e: Set RX_ONLY mode for unicast promiscuous on VLAN
	i40e: Fix crash during removing i40e driver
	net: fec: correct the error path for regulator disable in probe
	bonding: show saner speed for broadcast mode
	bonding: fix a potential double-unregister
	s390/runtime_instrumentation: fix storage key handling
	s390/ptrace: fix storage key handling
	ASoC: msm8916-wcd-analog: fix register Interrupt offset
	ASoC: intel: Fix memleak in sst_media_open
	vfio/type1: Add proper error unwind for vfio_iommu_replay()
	kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode
	kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode
	kconfig: qconf: do not limit the pop-up menu to the first row
	kconfig: qconf: fix signal connection to invalid slots
	efi: avoid error message when booting under Xen
	Fix build error when CONFIG_ACPI is not set/enabled:
	RDMA/bnxt_re: Do not add user qps to flushlist
	afs: Fix NULL deref in afs_dynroot_depopulate()
	bonding: fix active-backup failover for current ARP slave
	net: ena: Prevent reset after device destruction
	net: gemini: Fix missing free_netdev() in error path of gemini_ethernet_port_probe()
	hv_netvsc: Fix the queue_mapping in netvsc_vf_xmit()
	net: dsa: b53: check for timeout
	powerpc/pseries: Do not initiate shutdown when system is running on UPS
	efi: add missed destroy_workqueue when efisubsys_init fails
	epoll: Keep a reference on files added to the check list
	do_epoll_ctl(): clean the failure exits up a bit
	mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
	xen: don't reschedule in preemption off sections
	clk: Evict unregistered clks from parent caches
	KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()
	KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set
	Linux 4.19.142

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibfe4a0a4249f76ab35076f4b003e32cd6f9788a5
2020-08-26 11:07:03 +02:00
Peter Xu
734654ae79 mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
commit 75802ca66354a39ab8e35822747cd08b3384a99a upstream.

This is found by code observation only.

Firstly, the worst case scenario should assume the whole range was covered
by pmd sharing.  The old algorithm might not work as expected for ranges
like (1g-2m, 1g+2m), where the adjusted range should be (0, 1g+2m) but the
expected range should be (0, 2g).

Since at it, remove the loop since it should not be required.  With that,
the new code should be faster too when the invalidating range is huge.

Mike said:

: With range (1g-2m, 1g+2m) within a vma (0, 2g) the existing code will only
: adjust to (0, 1g+2m) which is incorrect.
:
: We should cc stable.  The original reason for adjusting the range was to
: prevent data corruption (getting wrong page).  Since the range is not
: always adjusted correctly, the potential for corruption still exists.
:
: However, I am fairly confident that adjust_range_if_pmd_sharing_possible
: is only gong to be called in two cases:
:
: 1) for a single page
: 2) for range == entire vma
:
: In those cases, the current code should produce the correct results.
:
: To be safe, let's just cc stable.

Fixes: 017b1660df ("mm: migration: fix migration of huge PMD shared pages")
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200730201636.74778-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26 10:31:06 +02:00
Charan Teja Reddy
c666936d8d mm, page_alloc: fix core hung in free_pcppages_bulk()
commit 88e8ac11d2ea3acc003cf01bb5a38c8aa76c3cfd upstream.

The following race is observed with the repeated online, offline and a
delay between two successive online of memory blocks of movable zone.

P1						P2

Online the first memory block in
the movable zone. The pcp struct
values are initialized to default
values,i.e., pcp->high = 0 &
pcp->batch = 1.

					Allocate the pages from the
					movable zone.

Try to Online the second memory
block in the movable zone thus it
entered the online_pages() but yet
to call zone_pcp_update().
					This process is entered into
					the exit path thus it tries
					to release the order-0 pages
					to pcp lists through
					free_unref_page_commit().
					As pcp->high = 0, pcp->count = 1
					proceed to call the function
					free_pcppages_bulk().
Update the pcp values thus the
new pcp values are like, say,
pcp->high = 378, pcp->batch = 63.
					Read the pcp's batch value using
					READ_ONCE() and pass the same to
					free_pcppages_bulk(), pcp values
					passed here are, batch = 63,
					count = 1.

					Since num of pages in the pcp
					lists are less than ->batch,
					then it will stuck in
					while(list_empty(list)) loop
					with interrupts disabled thus
					a core hung.

Avoid this by ensuring free_pcppages_bulk() is called with proper count of
pcp list pages.

The mentioned race is some what easily reproducible without [1] because
pcp's are not updated for the first memory block online and thus there is
a enough race window for P2 between alloc+free and pcp struct values
update through onlining of second memory block.

With [1], the race still exists but it is very narrow as we update the pcp
struct values for the first memory block online itself.

This is not limited to the movable zone, it could also happen in cases
with the normal zone (e.g., hotplug to a node that only has DMA memory, or
no other memory yet).

[1]: https://patchwork.kernel.org/patch/11696389/

Fixes: 5f8dcc2121 ("page-allocator: split per-cpu list into one-list-per-migrate-type")
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: <stable@vger.kernel.org> [2.6+]
Link: http://lkml.kernel.org/r/1597150703-19003-1-git-send-email-charante@codeaurora.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26 10:30:59 +02:00
Doug Berger
84b8dc232a mm: include CMA pages in lowmem_reserve at boot
commit e08d3fdfe2dafa0331843f70ce1ff6c1c4900bf4 upstream.

The lowmem_reserve arrays provide a means of applying pressure against
allocations from lower zones that were targeted at higher zones.  Its
values are a function of the number of pages managed by higher zones and
are assigned by a call to the setup_per_zone_lowmem_reserve() function.

The function is initially called at boot time by the function
init_per_zone_wmark_min() and may be called later by accesses of the
/proc/sys/vm/lowmem_reserve_ratio sysctl file.

The function init_per_zone_wmark_min() was moved up from a module_init to
a core_initcall to resolve a sequencing issue with khugepaged.
Unfortunately this created a sequencing issue with CMA page accounting.

The CMA pages are added to the managed page count of a zone when
cma_init_reserved_areas() is called at boot also as a core_initcall.  This
makes it uncertain whether the CMA pages will be added to the managed page
counts of their zones before or after the call to
init_per_zone_wmark_min() as it becomes dependent on link order.  With the
current link order the pages are added to the managed count after the
lowmem_reserve arrays are initialized at boot.

This means the lowmem_reserve values at boot may be lower than the values
used later if /proc/sys/vm/lowmem_reserve_ratio is accessed even if the
ratio values are unchanged.

In many cases the difference is not significant, but for example
an ARM platform with 1GB of memory and the following memory layout

  cma: Reserved 256 MiB at 0x0000000030000000
  Zone ranges:
    DMA      [mem 0x0000000000000000-0x000000002fffffff]
    Normal   empty
    HighMem  [mem 0x0000000030000000-0x000000003fffffff]

would result in 0 lowmem_reserve for the DMA zone.  This would allow
userspace to deplete the DMA zone easily.

Funnily enough

  $ cat /proc/sys/vm/lowmem_reserve_ratio

would fix up the situation because as a side effect it forces
setup_per_zone_lowmem_reserve.

This commit breaks the link order dependency by invoking
init_per_zone_wmark_min() as a postcore_initcall so that the CMA pages
have the chance to be properly accounted in their zone(s) and allowing
the lowmem_reserve arrays to receive consistent values.

Fixes: bc22af74f2 ("mm: update min_free_kbytes from khugepaged after core initialization")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1597423766-27849-1-git-send-email-opendmb@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26 10:30:59 +02:00
Hugh Dickins
2ef7ebb143 khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
[ Upstream commit f3f99d63a8156c7a4a6b20aac22b53c5579c7dc1 ]

syzbot crashes on the VM_BUG_ON_MM(khugepaged_test_exit(mm), mm) in
__khugepaged_enter(): yes, when one thread is about to dump core, has set
core_state, and is waiting for others, another might do something calling
__khugepaged_enter(), which now crashes because I lumped the core_state
test (known as "mmget_still_valid") into khugepaged_test_exit().  I still
think it's best to lump them together, so just in this exceptional case,
check mm->mm_users directly instead of khugepaged_test_exit().

Fixes: bbe98f9cadff ("khugepaged: khugepaged_test_exit() check mmget_still_valid()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Yang Shi <shy828301@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: <stable@vger.kernel.org>	[4.8+]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008141503370.18085@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26 10:30:58 +02:00
Hugh Dickins
17c08ee00b khugepaged: khugepaged_test_exit() check mmget_still_valid()
[ Upstream commit bbe98f9cadff58cdd6a4acaeba0efa8565dabe65 ]

Move collapse_huge_page()'s mmget_still_valid() check into
khugepaged_test_exit() itself.  collapse_huge_page() is used for anon THP
only, and earned its mmget_still_valid() check because it inserts a huge
pmd entry in place of the page table's pmd entry; whereas
collapse_file()'s retract_page_tables() or collapse_pte_mapped_thp()
merely clears the page table's pmd entry.  But core dumping without mmap
lock must have been as open to mistaking a racily cleared pmd entry for a
page table at physical page 0, as exit_mmap() was.  And we certainly have
no interest in mapping as a THP once dumping core.

Fixes: 59ea6d06cfa9 ("coredump: fix race condition between collapse_huge_page() and core dumping")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>	[4.8+]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021217020.27773@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26 10:30:58 +02:00
Greg Kroah-Hartman
369c9d2963 Merge 4.19.141 into android-4.19-stable
Changes in 4.19.141
	smb3: warn on confusing error scenario with sec=krb5
	genirq/affinity: Make affinity setting if activated opt-in
	PCI: hotplug: ACPI: Fix context refcounting in acpiphp_grab_context()
	PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken
	PCI: Add device even if driver attach failed
	PCI: qcom: Define some PARF params needed for ipq8064 SoC
	PCI: qcom: Add support for tx term offset for rev 2.1.0
	PCI: Probe bridge window attributes once at enumeration-time
	btrfs: free anon block device right after subvolume deletion
	btrfs: don't allocate anonymous block device for user invisible roots
	btrfs: ref-verify: fix memory leak in add_block_entry
	btrfs: don't traverse into the seed devices in show_devname
	btrfs: open device without device_list_mutex
	btrfs: fix messages after changing compression level by remount
	btrfs: only search for left_info if there is no right_info in try_merge_free_space
	btrfs: fix memory leaks after failure to lookup checksums during inode logging
	btrfs: fix return value mixup in btrfs_get_extent
	dt-bindings: iio: io-channel-mux: Fix compatible string in example code
	iio: dac: ad5592r: fix unbalanced mutex unlocks in ad5592r_read_raw()
	xtensa: fix xtensa_pmu_setup prototype
	cifs: Fix leak when handling lease break for cached root fid
	powerpc: Allow 4224 bytes of stack expansion for the signal frame
	powerpc: Fix circular dependency between percpu.h and mmu.h
	media: vsp1: dl: Fix NULL pointer dereference on unbind
	net: ethernet: stmmac: Disable hardware multicast filter
	net: stmmac: dwmac1000: provide multicast filter fallback
	net/compat: Add missing sock updates for SCM_RIGHTS
	md/raid5: Fix Force reconstruct-write io stuck in degraded raid5
	bcache: allocate meta data pages as compound pages
	bcache: fix overflow in offset_to_stripe()
	mac80211: fix misplaced while instead of if
	driver core: Avoid binding drivers to dead devices
	MIPS: CPU#0 is not hotpluggable
	ext2: fix missing percpu_counter_inc
	ocfs2: change slot number type s16 to u16
	mm/page_counter.c: fix protection usage propagation
	ftrace: Setup correct FTRACE_FL_REGS flags for module
	kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler
	tracing/hwlat: Honor the tracing_cpumask
	tracing: Use trace_sched_process_free() instead of exit() for pid tracing
	watchdog: f71808e_wdt: indicate WDIOF_CARDRESET support in watchdog_info.options
	watchdog: f71808e_wdt: remove use of wrong watchdog_info option
	watchdog: f71808e_wdt: clear watchdog timeout occurred flag
	pseries: Fix 64 bit logical memory block panic
	module: Correctly truncate sysfs sections output
	perf intel-pt: Fix FUP packet state
	remoteproc: qcom: q6v5: Update running state before requesting stop
	drm/imx: imx-ldb: Disable both channels for split mode in enc->disable()
	mfd: arizona: Ensure 32k clock is put on driver unbind and error
	RDMA/ipoib: Return void from ipoib_ib_dev_stop()
	RDMA/ipoib: Fix ABBA deadlock with ipoib_reap_ah()
	media: rockchip: rga: Introduce color fmt macros and refactor CSC mode logic
	media: rockchip: rga: Only set output CSC mode for RGB input
	USB: serial: ftdi_sio: make process-packet buffer unsigned
	USB: serial: ftdi_sio: clean up receive processing
	mmc: renesas_sdhi_internal_dmac: clean up the code for dma complete
	gpu: ipu-v3: image-convert: Combine rotate/no-rotate irq handlers
	dm rq: don't call blk_mq_queue_stopped() in dm_stop_queue()
	selftests/powerpc: ptrace-pkey: Rename variables to make it easier to follow code
	selftests/powerpc: ptrace-pkey: Update the test to mark an invalid pkey correctly
	selftests/powerpc: ptrace-pkey: Don't update expected UAMOR value
	iommu/omap: Check for failure of a call to omap_iommu_dump_ctx
	iommu/vt-d: Enforce PASID devTLB field mask
	i2c: rcar: slave: only send STOP event when we have been addressed
	clk: clk-atlas6: fix return value check in atlas6_clk_init()
	pwm: bcm-iproc: handle clk_get_rate() return
	tools build feature: Use CC and CXX from parent
	i2c: rcar: avoid race when unregistering slave
	openrisc: Fix oops caused when dumping stack
	scsi: lpfc: nvmet: Avoid hang / use-after-free again when destroying targetport
	watchdog: initialize device before misc_register
	Input: sentelic - fix error return when fsp_reg_write fails
	drm/vmwgfx: Use correct vmw_legacy_display_unit pointer
	drm/vmwgfx: Fix two list_for_each loop exit tests
	net: qcom/emac: add missed clk_disable_unprepare in error path of emac_clks_phase1_init
	nfs: Fix getxattr kernel panic and memory overflow
	fs/minix: set s_maxbytes correctly
	fs/minix: fix block limit check for V1 filesystems
	fs/minix: remove expected error message in block_to_path()
	fs/ufs: avoid potential u32 multiplication overflow
	test_kmod: avoid potential double free in trigger_config_run_type()
	mfd: dln2: Run event handler loop under spinlock
	ALSA: echoaudio: Fix potential Oops in snd_echo_resume()
	perf bench mem: Always memset source before memcpy
	tools build feature: Quote CC and CXX for their arguments
	sh: landisk: Add missing initialization of sh_io_port_base
	khugepaged: retract_page_tables() remember to test exit
	arm64: dts: marvell: espressobin: add ethernet alias
	drm: Added orientation quirk for ASUS tablet model T103HAF
	drm/amdgpu: Fix bug where DPM is not enabled after hibernate and resume
	Linux 4.19.141

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0800f8e03919fd8f054c1bcda87efd70a6e5db6b
2020-08-21 13:01:46 +02:00
Hugh Dickins
2406c45db3 khugepaged: retract_page_tables() remember to test exit
commit 18e77600f7a1ed69f8ce46c9e11cad0985712dfa upstream.

Only once have I seen this scenario (and forgot even to notice what forced
the eventual crash): a sequence of "BUG: Bad page map" alerts from
vm_normal_page(), from zap_pte_range() servicing exit_mmap();
pmd:00000000, pte values corresponding to data in physical page 0.

The pte mappings being zapped in this case were supposed to be from a huge
page of ext4 text (but could as well have been shmem): my belief is that
it was racing with collapse_file()'s retract_page_tables(), found *pmd
pointing to a page table, locked it, but *pmd had become 0 by the time
start_pte was decided.

In most cases, that possibility is excluded by holding mmap lock; but
exit_mmap() proceeds without mmap lock.  Most of what's run by khugepaged
checks khugepaged_test_exit() after acquiring mmap lock:
khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate() do so,
for example.  But retract_page_tables() did not: fix that.

The fix is for retract_page_tables() to check khugepaged_test_exit(),
after acquiring mmap lock, before doing anything to the page table.
Getting the mmap lock serializes with __mmput(), which briefly takes and
drops it in __khugepaged_exit(); then the khugepaged_test_exit() check on
mm_users makes sure we don't touch the page table once exit_mmap() might
reach it, since exit_mmap() will be proceeding without mmap lock, not
expecting anyone to be racing with it.

Fixes: f3f0e1d215 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org>	[4.8+]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021215400.27773@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21 11:05:39 +02:00
Michal Koutný
e88a72e86b mm/page_counter.c: fix protection usage propagation
commit a6f23d14ec7d7d02220ad8bb2774be3322b9aeec upstream.

When workload runs in cgroups that aren't directly below root cgroup and
their parent specifies reclaim protection, it may end up ineffective.

The reason is that propagate_protected_usage() is not called in all
hierarchy up.  All the protected usage is incorrectly accumulated in the
workload's parent.  This means that siblings_low_usage is overestimated
and effective protection underestimated.  Even though it is transitional
phenomenon (uncharge path does correct propagation and fixes the wrong
children_low_usage), it can undermine the intended protection
unexpectedly.

We have noticed this problem while seeing a swap out in a descendant of a
protected memcg (intermediate node) while the parent was conveniently
under its protection limit and the memory pressure was external to that
hierarchy.  Michal has pinpointed this down to the wrong
siblings_low_usage which led to the unwanted reclaim.

The fix is simply updating children_low_usage in respective ancestors also
in the charging path.

Fixes: 230671533d ("mm: memory.low hierarchical behavior")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>	[4.18+]
Link: http://lkml.kernel.org/r/20200803153231.15477-1-mhocko@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21 11:05:33 +02:00
Greg Kroah-Hartman
7086849b9c Merge 4.19.140 into android-4.19-stable
Changes in 4.19.140
	tracepoint: Mark __tracepoint_string's __used
	HID: input: Fix devices that return multiple bytes in battery report
	cgroup: add missing skcd->no_refcnt check in cgroup_sk_clone()
	x86/mce/inject: Fix a wrong assignment of i_mce.status
	sched/fair: Fix NOHZ next idle balance
	sched: correct SD_flags returned by tl->sd_flags()
	arm64: dts: rockchip: fix rk3368-lion gmac reset gpio
	arm64: dts: rockchip: fix rk3399-puma vcc5v0-host gpio
	arm64: dts: rockchip: fix rk3399-puma gmac reset gpio
	EDAC: Fix reference count leaks
	arm64: dts: qcom: msm8916: Replace invalid bias-pull-none property
	crypto: ccree - fix resource leak on error path
	firmware: arm_scmi: Fix SCMI genpd domain probing
	arm64: dts: exynos: Fix silent hang after boot on Espresso
	clk: scmi: Fix min and max rate when registering clocks with discrete rates
	m68k: mac: Don't send IOP message until channel is idle
	m68k: mac: Fix IOP status/control register writes
	platform/x86: intel-hid: Fix return value check in check_acpi_dev()
	platform/x86: intel-vbtn: Fix return value check in check_acpi_dev()
	ARM: dts: gose: Fix ports node name for adv7180
	ARM: dts: gose: Fix ports node name for adv7612
	ARM: at91: pm: add missing put_device() call in at91_pm_sram_init()
	spi: lantiq: fix: Rx overflow error in full duplex mode
	ARM: socfpga: PM: add missing put_device() call in socfpga_setup_ocram_self_refresh()
	drm/tilcdc: fix leak & null ref in panel_connector_get_modes
	soc: qcom: rpmh-rsc: Set suppress_bind_attrs flag
	Bluetooth: add a mutex lock to avoid UAF in do_enale_set
	loop: be paranoid on exit and prevent new additions / removals
	fs/btrfs: Add cond_resched() for try_release_extent_mapping() stalls
	drm/amdgpu: avoid dereferencing a NULL pointer
	drm/radeon: Fix reference count leaks caused by pm_runtime_get_sync
	crypto: aesni - Fix build with LLVM_IAS=1
	video: fbdev: neofb: fix memory leak in neo_scan_monitor()
	md-cluster: fix wild pointer of unlock_all_bitmaps()
	arm64: dts: hisilicon: hikey: fixes to comply with adi, adv7533 DT binding
	drm/etnaviv: fix ref count leak via pm_runtime_get_sync
	drm/nouveau: fix multiple instances of reference count leaks
	usb: mtu3: clear dual mode of u3port when disable device
	drm/debugfs: fix plain echo to connector "force" attribute
	drm/radeon: disable AGP by default
	irqchip/irq-mtk-sysirq: Replace spinlock with raw_spinlock
	mm/mmap.c: Add cond_resched() for exit_mmap() CPU stalls
	brcmfmac: keep SDIO watchdog running when console_interval is non-zero
	brcmfmac: To fix Bss Info flag definition Bug
	brcmfmac: set state of hanger slot to FREE when flushing PSQ
	iwlegacy: Check the return value of pcie_capability_read_*()
	gpu: host1x: debug: Fix multiple channels emitting messages simultaneously
	usb: gadget: net2280: fix memory leak on probe error handling paths
	bdc: Fix bug causing crash after multiple disconnects
	usb: bdc: Halt controller on suspend
	dyndbg: fix a BUG_ON in ddebug_describe_flags
	bcache: fix super block seq numbers comparision in register_cache_set()
	ACPICA: Do not increment operation_region reference counts for field units
	drm/msm: ratelimit crtc event overflow error
	agp/intel: Fix a memory leak on module initialisation failure
	video: fbdev: sm712fb: fix an issue about iounmap for a wrong address
	console: newport_con: fix an issue about leak related system resources
	video: pxafb: Fix the function used to balance a 'dma_alloc_coherent()' call
	ath10k: Acquire tx_lock in tx error paths
	iio: improve IIO_CONCENTRATION channel type description
	drm/etnaviv: Fix error path on failure to enable bus clk
	drm/arm: fix unintentional integer overflow on left shift
	leds: lm355x: avoid enum conversion warning
	media: omap3isp: Add missed v4l2_ctrl_handler_free() for preview_init_entities()
	ASoC: Intel: bxt_rt298: add missing .owner field
	scsi: cumana_2: Fix different dev_id between request_irq() and free_irq()
	drm/mipi: use dcs write for mipi_dsi_dcs_set_tear_scanline
	cxl: Fix kobject memleak
	drm/radeon: fix array out-of-bounds read and write issues
	scsi: powertec: Fix different dev_id between request_irq() and free_irq()
	scsi: eesox: Fix different dev_id between request_irq() and free_irq()
	ipvs: allow connection reuse for unconfirmed conntrack
	media: firewire: Using uninitialized values in node_probe()
	media: exynos4-is: Add missed check for pinctrl_lookup_state()
	xfs: don't eat an EIO/ENOSPC writeback error when scrubbing data fork
	xfs: fix reflink quota reservation accounting error
	RDMA/rxe: Skip dgid check in loopback mode
	PCI: Fix pci_cfg_wait queue locking problem
	leds: core: Flush scheduled work for system suspend
	drm: panel: simple: Fix bpc for LG LB070WV8 panel
	phy: exynos5-usbdrd: Calibrating makes sense only for USB2.0 PHY
	drm/bridge: sil_sii8620: initialize return of sii8620_readb
	scsi: scsi_debug: Add check for sdebug_max_queue during module init
	mwifiex: Prevent memory corruption handling keys
	powerpc/vdso: Fix vdso cpu truncation
	RDMA/qedr: SRQ's bug fixes
	RDMA/rxe: Prevent access to wr->next ptr afrer wr is posted to send queue
	staging: rtl8192u: fix a dubious looking mask before a shift
	PCI/ASPM: Add missing newline in sysfs 'policy'
	powerpc/book3s64/pkeys: Use PVR check instead of cpu feature
	drm/imx: tve: fix regulator_disable error path
	USB: serial: iuu_phoenix: fix led-activity helpers
	usb: core: fix quirks_param_set() writing to a const pointer
	thermal: ti-soc-thermal: Fix reversed condition in ti_thermal_expose_sensor()
	coresight: tmc: Fix TMC mode read in tmc_read_unprepare_etb()
	MIPS: OCTEON: add missing put_device() call in dwc3_octeon_device_init()
	usb: dwc2: Fix error path in gadget registration
	scsi: mesh: Fix panic after host or bus reset
	net: dsa: mv88e6xxx: MV88E6097 does not support jumbo configuration
	PCI: cadence: Fix updating Vendor ID and Subsystem Vendor ID register
	RDMA/core: Fix return error value in _ib_modify_qp() to negative
	Smack: fix another vsscanf out of bounds
	Smack: prevent underflow in smk_set_cipso()
	power: supply: check if calc_soc succeeded in pm860x_init_battery
	Bluetooth: hci_h5: Set HCI_UART_RESET_ON_INIT to correct flags
	Bluetooth: hci_serdev: Only unregister device if it was registered
	net: dsa: rtl8366: Fix VLAN semantics
	net: dsa: rtl8366: Fix VLAN set-up
	powerpc/boot: Fix CONFIG_PPC_MPC52XX references
	selftests/powerpc: Fix CPU affinity for child process
	PCI: Release IVRS table in AMD ACS quirk
	selftests/powerpc: Fix online CPU selection
	ASoC: meson: axg-tdm-interface: fix link fmt setup
	s390/qeth: don't process empty bridge port events
	wl1251: fix always return 0 error
	tools, build: Propagate build failures from tools/build/Makefile.build
	net: ethernet: aquantia: Fix wrong return value
	liquidio: Fix wrong return value in cn23xx_get_pf_num()
	net: spider_net: Fix the size used in a 'dma_free_coherent()' call
	fsl/fman: use 32-bit unsigned integer
	fsl/fman: fix dereference null return value
	fsl/fman: fix unreachable code
	fsl/fman: check dereferencing null pointer
	fsl/fman: fix eth hash table allocation
	dlm: Fix kobject memleak
	ocfs2: fix unbalanced locking
	pinctrl-single: fix pcs_parse_pinconf() return value
	svcrdma: Fix page leak in svc_rdma_recv_read_chunk()
	x86/fsgsbase/64: Fix NULL deref in 86_fsgsbase_read_task
	crypto: aesni - add compatibility with IAS
	af_packet: TPACKET_V3: fix fill status rwlock imbalance
	drivers/net/wan/lapbether: Added needed_headroom and a skb->len check
	net/nfc/rawsock.c: add CAP_NET_RAW check.
	net: Set fput_needed iff FDPUT_FPUT is set
	net/tls: Fix kmap usage
	net: refactor bind_bucket fastreuse into helper
	net: initialize fastreuse on inet_inherit_port
	USB: serial: cp210x: re-enable auto-RTS on open
	USB: serial: cp210x: enable usb generic throttle/unthrottle
	ALSA: hda - fix the micmute led status for Lenovo ThinkCentre AIO
	ALSA: usb-audio: Creative USB X-Fi Pro SB1095 volume knob support
	ALSA: usb-audio: fix overeager device match for MacroSilicon MS2109
	ALSA: usb-audio: work around streaming quirk for MacroSilicon MS2109
	pstore: Fix linking when crypto API disabled
	crypto: hisilicon - don't sleep of CRYPTO_TFM_REQ_MAY_SLEEP was not specified
	crypto: qat - fix double free in qat_uclo_create_batch_init_list
	crypto: ccp - Fix use of merged scatterlists
	crypto: cpt - don't sleep of CRYPTO_TFM_REQ_MAY_SLEEP was not specified
	bitfield.h: don't compile-time validate _val in FIELD_FIT
	fs/minix: check return value of sb_getblk()
	fs/minix: don't allow getting deleted inodes
	fs/minix: reject too-large maximum file size
	ALSA: usb-audio: add quirk for Pioneer DDJ-RB
	9p: Fix memory leak in v9fs_mount
	drm/ttm/nouveau: don't call tt destroy callback on alloc failure.
	NFS: Don't move layouts to plh_return_segs list while in use
	NFS: Don't return layout segments that are in use
	cpufreq: dt: fix oops on armada37xx
	include/asm-generic/vmlinux.lds.h: align ro_after_init
	spi: spidev: Align buffers for DMA
	mtd: rawnand: qcom: avoid write to unavailable register
	parisc: Implement __smp_store_release and __smp_load_acquire barriers
	parisc: mask out enable and reserved bits from sba imask
	ARM: 8992/1: Fix unwind_frame for clang-built kernels
	irqdomain/treewide: Free firmware node after domain removal
	xen/balloon: fix accounting in alloc_xenballooned_pages error path
	xen/balloon: make the balloon wait interruptible
	xen/gntdev: Fix dmabuf import with non-zero sgt offset
	Linux 4.19.140

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6b0d8dcf9ded022f62d9c62605388f1c1e9112d1
2020-08-19 08:43:22 +02:00
Paul E. McKenney
abfa9c47ec mm/mmap.c: Add cond_resched() for exit_mmap() CPU stalls
[ Upstream commit 0a3b3c253a1eb2c7fe7f34086d46660c909abeb3 ]

A large process running on a heavily loaded system can encounter the
following RCU CPU stall warning:

  rcu: INFO: rcu_sched self-detected stall on CPU
  rcu: 	3-....: (20998 ticks this GP) idle=4ea/1/0x4000000000000002 softirq=556558/556558 fqs=5190
  	(t=21013 jiffies g=1005461 q=132576)
  NMI backtrace for cpu 3
  CPU: 3 PID: 501900 Comm: aio-free-ring-w Kdump: loaded Not tainted 5.2.9-108_fbk12_rc3_3858_gb83b75af7909 #1
  Hardware name: Wiwynn   HoneyBadger/PantherPlus, BIOS HBM6.71 02/03/2016
  Call Trace:
   <IRQ>
   dump_stack+0x46/0x60
   nmi_cpu_backtrace.cold.3+0x13/0x50
   ? lapic_can_unplug_cpu.cold.27+0x34/0x34
   nmi_trigger_cpumask_backtrace+0xba/0xca
   rcu_dump_cpu_stacks+0x99/0xc7
   rcu_sched_clock_irq.cold.87+0x1aa/0x397
   ? tick_sched_do_timer+0x60/0x60
   update_process_times+0x28/0x60
   tick_sched_timer+0x37/0x70
   __hrtimer_run_queues+0xfe/0x270
   hrtimer_interrupt+0xf4/0x210
   smp_apic_timer_interrupt+0x5e/0x120
   apic_timer_interrupt+0xf/0x20
   </IRQ>
  RIP: 0010:kmem_cache_free+0x223/0x300
  Code: 88 00 00 00 0f 85 ca 00 00 00 41 8b 55 18 31 f6 f7 da 41 f6 45 0a 02 40 0f 94 c6 83 c6 05 9c 41 5e fa e8 a0 a7 01 00 41 56 9d <49> 8b 47 08 a8 03 0f 85 87 00 00 00 65 48 ff 08 e9 3d fe ff ff 65
  RSP: 0018:ffffc9000e8e3da8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
  RAX: 0000000000020000 RBX: ffff88861b9de960 RCX: 0000000000000030
  RDX: fffffffffffe41e8 RSI: 000060777fe3a100 RDI: 000000000001be18
  RBP: ffffea00186e7780 R08: ffffffffffffffff R09: ffffffffffffffff
  R10: ffff88861b9dea28 R11: ffff88887ffde000 R12: ffffffff81230a1f
  R13: ffff888854684dc0 R14: 0000000000000206 R15: ffff8888547dbc00
   ? remove_vma+0x4f/0x60
   remove_vma+0x4f/0x60
   exit_mmap+0xd6/0x160
   mmput+0x4a/0x110
   do_exit+0x278/0xae0
   ? syscall_trace_enter+0x1d3/0x2b0
   ? handle_mm_fault+0xaa/0x1c0
   do_group_exit+0x3a/0xa0
   __x64_sys_exit_group+0x14/0x20
   do_syscall_64+0x42/0x100
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

And on a PREEMPT=n kernel, the "while (vma)" loop in exit_mmap() can run
for a very long time given a large process.  This commit therefore adds
a cond_resched() to this loop, providing RCU any needed quiescent states.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19 08:14:52 +02:00
Greg Kroah-Hartman
bcf9517454 Merge 4.19.135 into android-4.19-stable
Changes in 4.19.135
	soc: qcom: rpmh: Dirt can only make you dirtier, not cleaner
	gpio: arizona: handle pm_runtime_get_sync failure case
	gpio: arizona: put pm_runtime in case of failure
	pinctrl: amd: fix npins for uart0 in kerncz_groups
	mac80211: allow rx of mesh eapol frames with default rx key
	scsi: scsi_transport_spi: Fix function pointer check
	xtensa: fix __sync_fetch_and_{and,or}_4 declarations
	xtensa: update *pos in cpuinfo_op.next
	drivers/net/wan/lapbether: Fixed the value of hard_header_len
	net: sky2: initialize return of gm_phy_read
	drm/nouveau/i2c/g94-: increase NV_PMGR_DP_AUXCTL_TRANSACTREQ timeout
	drivers/firmware/psci: Fix memory leakage in alloc_init_cpu_groups()
	fuse: fix weird page warning
	irqdomain/treewide: Keep firmware node unconditionally allocated
	SUNRPC reverting d03727b248d0 ("NFSv4 fix CLOSE not waiting for direct IO compeletion")
	spi: spi-fsl-dspi: Exit the ISR with IRQ_NONE when it's not ours
	tipc: clean up skb list lock handling on send path
	IB/umem: fix reference count leak in ib_umem_odp_get()
	uprobes: Change handle_swbp() to send SIGTRAP with si_code=SI_KERNEL, to fix GDB regression
	ALSA: info: Drop WARN_ON() from buffer NULL sanity check
	ASoC: rt5670: Correct RT5670_LDO_SEL_MASK
	btrfs: fix double free on ulist after backref resolution failure
	btrfs: fix mount failure caused by race with umount
	btrfs: fix page leaks after failure to lock page for delalloc
	bnxt_en: Fix race when modifying pause settings.
	fpga: dfl: fix bug in port reset handshake
	hippi: Fix a size used in a 'pci_free_consistent()' in an error handling path
	ax88172a: fix ax88172a_unbind() failures
	net: dp83640: fix SIOCSHWTSTAMP to update the struct with actual configuration
	ieee802154: fix one possible memleak in adf7242_probe
	drm: sun4i: hdmi: Fix inverted HPD result
	net: smc91x: Fix possible memory leak in smc_drv_probe()
	bonding: check error value of register_netdevice() immediately
	mlxsw: destroy workqueue when trap_register in mlxsw_emad_init
	qed: suppress "don't support RoCE & iWARP" flooding on HW init
	ipvs: fix the connection sync failed in some cases
	net: ethernet: ave: Fix error returns in ave_init
	i2c: rcar: always clear ICSAR to avoid side effects
	bonding: check return value of register_netdevice() in bond_newlink()
	serial: exar: Fix GPIO configuration for Sealevel cards based on XR17V35X
	scripts/decode_stacktrace: strip basepath from all paths
	scripts/gdb: fix lx-symbols 'gdb.error' while loading modules
	HID: i2c-hid: add Mediacom FlexBook edge13 to descriptor override
	HID: alps: support devices with report id 2
	HID: steam: fixes race in handling device list.
	HID: apple: Disable Fn-key key-re-mapping on clone keyboards
	dmaengine: tegra210-adma: Fix runtime PM imbalance on error
	Input: add `SW_MACHINE_COVER`
	spi: mediatek: use correct SPI_CFG2_REG MACRO
	regmap: dev_get_regmap_match(): fix string comparison
	hwmon: (aspeed-pwm-tacho) Avoid possible buffer overflow
	dmaengine: ioat setting ioat timeout as module parameter
	Input: synaptics - enable InterTouch for ThinkPad X1E 1st gen
	usb: gadget: udc: gr_udc: fix memleak on error handling path in gr_ep_init()
	hwmon: (adm1275) Make sure we are reading enough data for different chips
	hwmon: (scmi) Fix potential buffer overflow in scmi_hwmon_probe()
	arm64: Use test_tsk_thread_flag() for checking TIF_SINGLESTEP
	x86: math-emu: Fix up 'cmp' insn for clang ias
	RISC-V: Upgrade smp_mb__after_spinlock() to iorw,iorw
	binder: Don't use mmput() from shrinker function.
	usb: xhci-mtk: fix the failure of bandwidth allocation
	usb: xhci: Fix ASM2142/ASM3142 DMA addressing
	Revert "cifs: Fix the target file was deleted when rename failed."
	staging: wlan-ng: properly check endpoint types
	staging: comedi: addi_apci_1032: check INSN_CONFIG_DIGITAL_TRIG shift
	staging: comedi: ni_6527: fix INSN_CONFIG_DIGITAL_TRIG support
	staging: comedi: addi_apci_1500: check INSN_CONFIG_DIGITAL_TRIG shift
	staging: comedi: addi_apci_1564: check INSN_CONFIG_DIGITAL_TRIG shift
	serial: 8250: fix null-ptr-deref in serial8250_start_tx()
	serial: 8250_mtk: Fix high-speed baud rates clamping
	fbdev: Detect integer underflow at "struct fbcon_ops"->clear_margins.
	vt: Reject zero-sized screen buffer size.
	Makefile: Fix GCC_TOOLCHAIN_DIR prefix for Clang cross compilation
	mm/memcg: fix refcount error while moving and swapping
	mm: memcg/slab: synchronize access to kmem_cache dying flag using a spinlock
	mm: memcg/slab: fix memory leak at non-root kmem_cache destroy
	io-mapping: indicate mapping failure
	drm/amdgpu: Fix NULL dereference in dpm sysfs handlers
	drm/amd/powerplay: fix a crash when overclocking Vega M
	parisc: Add atomic64_set_release() define to avoid CPU soft lockups
	x86, vmlinux.lds: Page-align end of ..page_aligned sections
	ASoC: rt5670: Add new gpio1_is_ext_spk_en quirk and enable it on the Lenovo Miix 2 10
	ASoC: qcom: Drop HAS_DMA dependency to fix link failure
	dm integrity: fix integrity recalculation that is improperly skipped
	ath9k: Fix general protection fault in ath9k_hif_usb_rx_cb
	ath9k: Fix regression with Atheros 9271
	Linux 4.19.135

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0bbcde83e7c810352d998f28d3484efa2b9ede8e
2020-07-29 13:22:30 +02:00
Muchun Song
d87ddcdb2d mm: memcg/slab: fix memory leak at non-root kmem_cache destroy
commit d38a2b7a9c939e6d7329ab92b96559ccebf7b135 upstream.

If the kmem_cache refcount is greater than one, we should not mark the
root kmem_cache as dying.  If we mark the root kmem_cache dying
incorrectly, the non-root kmem_cache can never be destroyed.  It
resulted in memory leak when memcg was destroyed.  We can use the
following steps to reproduce.

  1) Use kmem_cache_create() to create a new kmem_cache named A.
  2) Coincidentally, the kmem_cache A is an alias for kmem_cache B,
     so the refcount of B is just increased.
  3) Use kmem_cache_destroy() to destroy the kmem_cache A, just
     decrease the B's refcount but mark the B as dying.
  4) Create a new memory cgroup and alloc memory from the kmem_cache
     B. It leads to create a non-root kmem_cache for allocating memory.
  5) When destroy the memory cgroup created in the step 4), the
     non-root kmem_cache can never be destroyed.

If we repeat steps 4) and 5), this will cause a lot of memory leak.  So
only when refcount reach zero, we mark the root kmem_cache as dying.

Fixes: 92ee383f6d ("mm: fix race between kmem_cache destroy, create and deactivate")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200716165103.83462-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-29 10:16:57 +02:00
Roman Gushchin
763b04c6b2 mm: memcg/slab: synchronize access to kmem_cache dying flag using a spinlock
[ Upstream commit 63b02ef7dc4ec239df45c018ac0adbd02ba30a0c ]

Currently the memcg_params.dying flag and the corresponding workqueue used
for the asynchronous deactivation of kmem_caches is synchronized using the
slab_mutex.

It makes impossible to check this flag from the irq context, which will be
required in order to implement asynchronous release of kmem_caches.

So let's switch over to the irq-save flavor of the spinlock-based
synchronization.

Link: http://lkml.kernel.org/r/20190611231813.3148843-8-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Waiman Long <longman@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-29 10:16:57 +02:00
Hugh Dickins
91404e91eb mm/memcg: fix refcount error while moving and swapping
commit 8d22a9351035ef2ff12ef163a1091b8b8cf1e49c upstream.

It was hard to keep a test running, moving tasks between memcgs with
move_charge_at_immigrate, while swapping: mem_cgroup_id_get_many()'s
refcount is discovered to be 0 (supposedly impossible), so it is then
forced to REFCOUNT_SATURATED, and after thousands of warnings in quick
succession, the test is at last put out of misery by being OOM killed.

This is because of the way moved_swap accounting was saved up until the
task move gets completed in __mem_cgroup_clear_mc(), deferred from when
mem_cgroup_move_swap_account() actually exchanged old and new ids.
Concurrent activity can free up swap quicker than the task is scanned,
bringing id refcount down 0 (which should only be possible when
offlining).

Just skip that optimization: do that part of the accounting immediately.

Fixes: 615d66c37c ("mm: memcontrol: fix memcg id ref counter on swap charge move")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2007071431050.4726@eggly.anvils
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29 10:16:57 +02:00
Greg Kroah-Hartman
a6c20c0442 Merge 4.19.132 into android-4.19-stable
Changes in 4.19.132
	btrfs: fix a block group ref counter leak after failure to remove block group
	mm: fix swap cache node allocation mask
	EDAC/amd64: Read back the scrub rate PCI register on F15h
	usbnet: smsc95xx: Fix use-after-free after removal
	mm/slub.c: fix corrupted freechain in deactivate_slab()
	mm/slub: fix stack overruns with SLUB_STATS
	usb: usbtest: fix missing kfree(dev->buf) in usbtest_disconnect
	s390/debug: avoid kernel warning on too large number of pages
	nvme-multipath: set bdi capabilities once
	nvme-multipath: fix deadlock between ana_work and scan_work
	kgdb: Avoid suspicious RCU usage warning
	crypto: af_alg - fix use-after-free in af_alg_accept() due to bh_lock_sock()
	drm/msm/dpu: fix error return code in dpu_encoder_init
	cxgb4: use unaligned conversion for fetching timestamp
	cxgb4: parse TC-U32 key values and masks natively
	cxgb4: use correct type for all-mask IP address comparison
	cxgb4: fix SGE queue dump destination buffer context
	hwmon: (max6697) Make sure the OVERT mask is set correctly
	hwmon: (acpi_power_meter) Fix potential memory leak in acpi_power_meter_add()
	drm: sun4i: hdmi: Remove extra HPD polling
	virtio-blk: free vblk-vqs in error path of virtblk_probe()
	SMB3: Honor 'posix' flag for multiuser mounts
	nvme: fix a crash in nvme_mpath_add_disk
	i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665
	i2c: mlxcpld: check correct size of maximum RECV_LEN packet
	nfsd: apply umask on fs without ACL support
	Revert "ALSA: usb-audio: Improve frames size computation"
	SMB3: Honor 'seal' flag for multiuser mounts
	SMB3: Honor persistent/resilient handle flags for multiuser mounts
	SMB3: Honor lease disabling for multiuser mounts
	cifs: Fix the target file was deleted when rename failed.
	MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen
	irqchip/gic: Atomically update affinity
	dm zoned: assign max_io_len correctly
	efi: Make it possible to disable efivar_ssdt entirely
	Linux 4.19.132

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I394bc3ee947bdf92bad83297e7bd579e92fed8a9
2020-07-09 11:20:59 +02:00
Qian Cai
3e632652e3 mm/slub: fix stack overruns with SLUB_STATS
[ Upstream commit a68ee0573991e90af2f1785db309206408bad3e5 ]

There is no need to copy SLUB_STATS items from root memcg cache to new
memcg cache copies.  Doing so could result in stack overruns because the
store function only accepts 0 to clear the stat and returns an error for
everything else while the show method would print out the whole stat.

Then, the mismatch of the lengths returns from show and store methods
happens in memcg_propagate_slab_attrs():

	else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf))
		buf = mbuf;

max_attr_size is only 2 from slab_attr_store(), then, it uses mbuf[64]
in show_stat() later where a bounch of sprintf() would overrun the stack
variable.  Fix it by always allocating a page of buffer to be used in
show_stat() if SLUB_STATS=y which should only be used for debug purpose.

  # echo 1 > /sys/kernel/slab/fs_cache/shrink
  BUG: KASAN: stack-out-of-bounds in number+0x421/0x6e0
  Write of size 1 at addr ffffc900256cfde0 by task kworker/76:0/53251

  Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
  Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
  Call Trace:
    number+0x421/0x6e0
    vsnprintf+0x451/0x8e0
    sprintf+0x9e/0xd0
    show_stat+0x124/0x1d0
    alloc_slowpath_show+0x13/0x20
    __kmem_cache_create+0x47a/0x6b0

  addr ffffc900256cfde0 is located in stack of task kworker/76:0/53251 at offset 0 in frame:
   process_one_work+0x0/0xb90

  this frame has 1 object:
   [32, 72) 'lockdep_map'

  Memory state around the buggy address:
   ffffc900256cfc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   ffffc900256cfd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  >ffffc900256cfd80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
                                                         ^
   ffffc900256cfe00: 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 00
   ffffc900256cfe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ==================================================================
  Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __kmem_cache_create+0x6ac/0x6b0
  Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
  Call Trace:
    __kmem_cache_create+0x6ac/0x6b0

Fixes: 107dab5c92 ("slub: slub-specific propagation changes")
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Glauber Costa <glauber@scylladb.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Link: http://lkml.kernel.org/r/20200429222356.4322-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-09 09:37:09 +02:00