Changes in 4.19.126
ax25: fix setsockopt(SO_BINDTODEVICE)
dpaa_eth: fix usage as DSA master, try 3
net: dsa: mt7530: fix roaming from DSA user ports
__netif_receive_skb_core: pass skb by reference
net: inet_csk: Fix so_reuseport bind-address cache in tb->fast*
net: ipip: fix wrong address family in init error path
net/mlx5: Add command entry handling completion
net: qrtr: Fix passing invalid reference to qrtr_local_enqueue()
net: revert "net: get rid of an signed integer overflow in ip_idents_reserve()"
net sched: fix reporting the first-time use timestamp
r8152: support additional Microsoft Surface Ethernet Adapter variant
sctp: Don't add the shutdown timer if its already been added
sctp: Start shutdown on association restart if in SHUTDOWN-SENT state and socket is closed
net/mlx5e: Update netdev txq on completions during closure
net/mlx5: Annotate mutex destroy for root ns
net: sun: fix missing release regions in cas_init_one().
net/mlx4_core: fix a memory leak bug.
mlxsw: spectrum: Fix use-after-free of split/unsplit/type_set in case reload fails
ARM: dts: rockchip: fix phy nodename for rk3228-evb
arm64: dts: rockchip: fix status for &gmac2phy in rk3328-evb.dts
arm64: dts: rockchip: swap interrupts interrupt-names rk3399 gpu node
ARM: dts: rockchip: swap clock-names of gpu nodes
ARM: dts: rockchip: fix pinctrl sub nodename for spi in rk322x.dtsi
gpio: tegra: mask GPIO IRQs during IRQ shutdown
ALSA: usb-audio: add mapping for ASRock TRX40 Creator
net: microchip: encx24j600: add missed kthread_stop
gfs2: move privileged user check to gfs2_quota_lock_check
cachefiles: Fix race between read_waiter and read_copier involving op->to_do
usb: dwc3: pci: Enable extcon driver for Intel Merrifield
usb: gadget: legacy: fix redundant initialization warnings
net: freescale: select CONFIG_FIXED_PHY where needed
IB/i40iw: Remove bogus call to netdev_master_upper_dev_get()
riscv: stacktrace: Fix undefined reference to `walk_stackframe'
cifs: Fix null pointer check in cifs_read
samples: bpf: Fix build error
Input: usbtouchscreen - add support for BonXeon TP
Input: evdev - call input_flush_device() on release(), not flush()
Input: xpad - add custom init packet for Xbox One S controllers
Input: dlink-dir685-touchkeys - fix a typo in driver name
Input: i8042 - add ThinkPad S230u to i8042 reset list
Input: synaptics-rmi4 - really fix attn_data use-after-free
Input: synaptics-rmi4 - fix error return code in rmi_driver_probe()
ARM: 8970/1: decompressor: increase tag size
ARM: 8843/1: use unified assembler in headers
ARM: uaccess: consolidate uaccess asm to asm/uaccess-asm.h
ARM: uaccess: integrate uaccess_save and uaccess_restore
ARM: uaccess: fix DACR mismatch with nested exceptions
gpio: exar: Fix bad handling for ida_simple_get error path
IB/qib: Call kobject_put() when kobject_init_and_add() fails
ARM: dts/imx6q-bx50v3: Set display interface clock parents
ARM: dts: bcm2835-rpi-zero-w: Fix led polarity
ARM: dts: bcm: HR2: Fix PPI interrupt types
mmc: block: Fix use-after-free issue for rpmb
RDMA/pvrdma: Fix missing pci disable in pvrdma_pci_probe()
ALSA: hwdep: fix a left shifting 1 by 31 UB bug
ALSA: hda/realtek - Add a model for Thinkpad T570 without DAC workaround
ALSA: usb-audio: mixer: volume quirk for ESS Technology Asus USB DAC
exec: Always set cap_ambient in cap_bprm_set_creds
ALSA: usb-audio: Quirks for Gigabyte TRX40 Aorus Master onboard audio
ALSA: hda/realtek - Add new codec supported for ALC287
libceph: ignore pool overlay and cache logic on redirects
IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode
mm: remove VM_BUG_ON(PageSlab()) from page_mapcount()
fs/binfmt_elf.c: allocate initialized memory in fill_thread_core_info()
include/asm-generic/topology.h: guard cpumask_of_node() macro argument
iommu: Fix reference count leak in iommu_group_alloc.
parisc: Fix kernel panic in mem_init()
mmc: core: Fix recursive locking issue in CQE recovery path
RDMA/core: Fix double destruction of uobject
mac80211: mesh: fix discovery timer re-arming issue / crash
x86/dma: Fix max PFN arithmetic overflow on 32 bit systems
copy_xstate_to_kernel(): don't leave parts of destination uninitialized
xfrm: allow to accept packets with ipv6 NEXTHDR_HOP in xfrm_input
xfrm: call xfrm_output_gso when inner_protocol is set in xfrm_output
xfrm interface: fix oops when deleting a x-netns interface
xfrm: fix a warning in xfrm_policy_insert_list
xfrm: fix a NULL-ptr deref in xfrm_local_error
xfrm: fix error in comment
vti4: eliminated some duplicate code.
ip_vti: receive ipip packet by calling ip_tunnel_rcv
netfilter: nft_reject_bridge: enable reject with bridge vlan
netfilter: ipset: Fix subcounter update skip
netfilter: nfnetlink_cthelper: unbreak userspace helper support
netfilter: nf_conntrack_pptp: prevent buffer overflows in debug code
esp6: get the right proto for transport mode in esp6_gso_encap
bnxt_en: Fix accumulation of bp->net_stats_prev.
xsk: Add overflow check for u64 division, stored into u32
qlcnic: fix missing release in qlcnic_83xx_interrupt_test.
crypto: chelsio/chtls: properly set tp->lsndtime
bonding: Fix reference count leak in bond_sysfs_slave_add.
netfilter: nf_conntrack_pptp: fix compilation warning with W=1 build
mm/vmalloc.c: don't dereference possible NULL pointer in __vunmap()
Linux 4.19.126
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic7ffeb4cbc4d3f1b49c60d97a5d113fcad1d098a
Changes in 4.19.119
ext4: fix extent_status fragmentation for plain files
drm/msm: Use the correct dma_sync calls harder
bpftool: Fix printing incorrect pointer in btf_dump_ptr
crypto: mxs-dcp - make symbols 'sha1_null_hash' and 'sha256_null_hash' static
vti4: removed duplicate log message.
arm64: Add part number for Neoverse N1
arm64: errata: Hide CTR_EL0.DIC on systems affected by Neoverse-N1 #1542419
arm64: Fake the IminLine size on systems affected by Neoverse-N1 #1542419
arm64: compat: Workaround Neoverse-N1 #1542419 for compat user-space
arm64: Silence clang warning on mismatched value/register sizes
watchdog: reset last_hw_keepalive time at start
scsi: lpfc: Fix kasan slab-out-of-bounds error in lpfc_unreg_login
scsi: lpfc: Fix crash in target side cable pulls hitting WAIT_FOR_UNREG
ceph: return ceph_mdsc_do_request() errors from __get_parent()
ceph: don't skip updating wanted caps when cap is stale
pwm: rcar: Fix late Runtime PM enablement
scsi: iscsi: Report unbind session event when the target has been removed
ASoC: Intel: atom: Take the drv->lock mutex before calling sst_send_slot_map()
nvme: fix deadlock caused by ANA update wrong locking
kernel/gcov/fs.c: gcov_seq_next() should increase position index
selftests: kmod: fix handling test numbers above 9
ipc/util.c: sysvipc_find_ipc() should increase position index
kconfig: qconf: Fix a few alignment issues
s390/cio: avoid duplicated 'ADD' uevents
loop: Better discard support for block devices
Revert "powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled"
pwm: renesas-tpu: Fix late Runtime PM enablement
pwm: bcm2835: Dynamically allocate base
perf/core: Disable page faults when getting phys address
ASoC: Intel: bytcr_rt5640: Add quirk for MPMAN MPWIN895CL tablet
xhci: Ensure link state is U3 after setting USB_SS_PORT_LS_U3
drm/amd/display: Not doing optimize bandwidth if flip pending.
tracing/selftests: Turn off timeout setting
virtio-blk: improve virtqueue error to BLK_STS
scsi: smartpqi: fix call trace in device discovery
PCI/ASPM: Allow re-enabling Clock PM
net: ipv6: add net argument to ip6_dst_lookup_flow
net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup
blktrace: Protect q->blk_trace with RCU
blktrace: fix dereference after null check
f2fs: fix to avoid memory leakage in f2fs_listxattr
KVM: VMX: Zero out *all* general purpose registers after VM-Exit
KVM: nVMX: Always sync GUEST_BNDCFGS when it comes from vmcs01
KVM: Introduce a new guest mapping API
kvm: fix compilation on aarch64
kvm: fix compilation on s390
kvm: fix compile on s390 part 2
KVM: Properly check if "page" is valid in kvm_vcpu_unmap
x86/kvm: Introduce kvm_(un)map_gfn()
x86/kvm: Cache gfn to pfn translation
x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed
x86/KVM: Clean up host's steal time structure
cxgb4: fix adapter crash due to wrong MC size
cxgb4: fix large delays in PTP synchronization
ipv6: fix restrict IPV6_ADDRFORM operation
macsec: avoid to set wrong mtu
macvlan: fix null dereference in macvlan_device_event()
net: bcmgenet: correct per TX/RX ring statistics
net: netrom: Fix potential nr_neigh refcnt leak in nr_add_node
net: stmmac: dwmac-meson8b: Add missing boundary to RGMII TX clock array
net/x25: Fix x25_neigh refcnt leak when receiving frame
sched: etf: do not assume all sockets are full blown
tcp: cache line align MAX_TCP_HEADER
team: fix hang in team_mode_get()
vrf: Fix IPv6 with qdisc and xfrm
net: dsa: b53: Lookup VID in ARL searches when VLAN is enabled
net: dsa: b53: Fix ARL register definitions
net: dsa: b53: Rework ARL bin logic
net: dsa: b53: b53_arl_rw_op() needs to select IVL or SVL
xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finish
vrf: Check skb for XFRM_TRANSFORMED flag
mlxsw: Fix some IS_ERR() vs NULL bugs
KEYS: Avoid false positive ENOMEM error on key read
ALSA: hda: Remove ASUS ROG Zenith from the blacklist
ALSA: usb-audio: Add static mapping table for ALC1220-VB-based mobos
ALSA: usb-audio: Add connector notifier delegation
iio: core: remove extra semi-colon from devm_iio_device_register() macro
iio: st_sensors: rely on odr mask to know if odr can be set
iio: adc: stm32-adc: fix sleep in atomic context
iio: xilinx-xadc: Fix ADC-B powerdown
iio: xilinx-xadc: Fix clearing interrupt when enabling trigger
iio: xilinx-xadc: Fix sequencer configuration for aux channels in simultaneous mode
iio: xilinx-xadc: Make sure not exceed maximum samplerate
fs/namespace.c: fix mountpoint reference counter race
USB: sisusbvga: Change port variable from signed to unsigned
USB: Add USB_QUIRK_DELAY_CTRL_MSG and USB_QUIRK_DELAY_INIT for Corsair K70 RGB RAPIDFIRE
USB: early: Handle AMD's spec-compliant identifiers, too
USB: core: Fix free-while-in-use bug in the USB S-Glibrary
USB: hub: Fix handling of connect changes during sleep
vmalloc: fix remap_vmalloc_range() bounds checks
mm/hugetlb: fix a addressing exception caused by huge_pte_offset
mm/ksm: fix NULL pointer dereference when KSM zero page is enabled
tools/vm: fix cross-compile build
ALSA: usx2y: Fix potential NULL dereference
ALSA: hda/realtek - Fix unexpected init_amp override
ALSA: hda/realtek - Add new codec supported for ALC245
ALSA: usb-audio: Fix usb audio refcnt leak when getting spdif
ALSA: usb-audio: Filter out unsupported sample rates on Focusrite devices
tpm/tpm_tis: Free IRQ if probing fails
tpm: ibmvtpm: retry on H_CLOSED in tpm_ibmvtpm_send()
KVM: s390: Return last valid slot if approx index is out-of-bounds
KVM: Check validity of resolved slot when searching memslots
KVM: VMX: Enable machine check support for 32bit targets
tty: hvc: fix buffer overflow during hvc_alloc().
tty: rocket, avoid OOB access
usb-storage: Add unusual_devs entry for JMicron JMS566
audit: check the length of userspace generated audit records
ASoC: dapm: fixup dapm kcontrol widget
iwlwifi: pcie: actually release queue memory in TVQM
iwlwifi: mvm: beacon statistics shouldn't go backwards
ARM: imx: provide v7_cpu_resume() only on ARM_CPU_SUSPEND=y
powerpc/setup_64: Set cache-line-size based on cache-block-size
staging: comedi: dt2815: fix writing hi byte of analog output
staging: comedi: Fix comedi_device refcnt leak in comedi_open
vt: don't hardcode the mem allocation upper bound
vt: don't use kmalloc() for the unicode screen buffer
staging: vt6656: Don't set RCR_MULTICAST or RCR_BROADCAST by default.
staging: vt6656: Fix calling conditions of vnt_set_bss_mode
staging: vt6656: Fix drivers TBTT timing counter.
staging: vt6656: Fix pairwise key entry save.
staging: vt6656: Power save stop wake_up_count wrap around.
cdc-acm: close race betrween suspend() and acm_softint
cdc-acm: introduce a cool down
UAS: no use logging any details in case of ENODEV
UAS: fix deadlock in error handling and PM flushing work
usb: dwc3: gadget: Fix request completion check
usb: f_fs: Clear OS Extended descriptor counts to zero in ffs_data_reset()
xhci: prevent bus suspend if a roothub port detected a over-current condition
serial: sh-sci: Make sure status register SCxSR is read in correct sequence
xfs: Fix deadlock between AGI and AGF with RENAME_WHITEOUT
s390/mm: fix page table upgrade vs 2ndary address mode accesses
Linux 4.19.119
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4b16db8472367d135a4ff68d2863c634bf093ef5
commit bdebd6a2831b6fab69eb85cee74a8ba77f1a1cc2 upstream.
remap_vmalloc_range() has had various issues with the bounds checks it
promises to perform ("This function checks that addr is a valid
vmalloc'ed area, and that it is big enough to cover the vma") over time,
e.g.:
- not detecting pgoff<<PAGE_SHIFT overflow
- not detecting (pgoff<<PAGE_SHIFT)+usize overflow
- not checking whether addr and addr+(pgoff<<PAGE_SHIFT) are the same
vmalloc allocation
- comparing a potentially wildly out-of-bounds pointer with the end of
the vmalloc region
In particular, since commit fc9702273e2e ("bpf: Add mmap() support for
BPF_MAP_TYPE_ARRAY"), unprivileged users can cause kernel null pointer
dereferences by calling mmap() on a BPF map with a size that is bigger
than the distance from the start of the BPF map to the end of the
address space.
This could theoretically be used as a kernel ASLR bypass, by using
whether mmap() with a given offset oopses or returns an error code to
perform a binary search over the possible address range.
To allow remap_vmalloc_range_partial() to verify that addr and
addr+(pgoff<<PAGE_SHIFT) are in the same vmalloc region, pass the offset
to remap_vmalloc_range_partial() instead of adding it to the pointer in
remap_vmalloc_range().
In remap_vmalloc_range_partial(), fix the check against
get_vm_area_size() by using size comparisons instead of pointer
comparisons, and add checks for pgoff.
Fixes: 833423143c ("[PATCH] mm: introduce remap_vmalloc_range()")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@chromium.org>
Link: http://lkml.kernel.org/r/20200415222312.236431-1-jannh@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 4.19.117
amd-xgbe: Use __napi_schedule() in BH context
hsr: check protocol version in hsr_newlink()
net: ipv4: devinet: Fix crash when add/del multicast IP with autojoin
net: ipv6: do not consider routes via gateways for anycast address check
net: qrtr: send msgs from local of same id as broadcast
net: revert default NAPI poll timeout to 2 jiffies
net: stmmac: dwmac-sunxi: Provide TX and RX fifo sizes
net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode
ovl: fix value of i_ino for lower hardlink corner case
scsi: ufs: Fix ufshcd_hold() caused scheduling while atomic
jbd2: improve comments about freeing data buffers whose page mapping is NULL
pwm: pca9685: Fix PWM/GPIO inter-operation
ext4: fix incorrect group count in ext4_fill_super error message
ext4: fix incorrect inodes per group in error message
ASoC: Intel: mrfld: fix incorrect check on p->sink
ASoC: Intel: mrfld: return error codes when an error occurs
ALSA: usb-audio: Filter error from connector kctl ops, too
ALSA: usb-audio: Don't override ignore_ctl_error value from the map
ALSA: usb-audio: Don't create jack controls for PCM terminals
ALSA: usb-audio: Check mapping at creating connector controls, too
keys: Fix proc_keys_next to increase position index
tracing: Fix the race between registering 'snapshot' event trigger and triggering 'snapshot' operation
btrfs: check commit root generation in should_ignore_root
mac80211_hwsim: Use kstrndup() in place of kasprintf()
usb: dwc3: gadget: don't enable interrupt when disabling endpoint
usb: dwc3: gadget: Don't clear flags before transfer ended
drm/amd/powerplay: force the trim of the mclk dpm_levels if OD is enabled
ext4: do not zeroout extents beyond i_disksize
kvm: x86: Host feature SSBD doesn't imply guest feature SPEC_CTRL_SSBD
scsi: target: remove boilerplate code
scsi: target: fix hang when multiple threads try to destroy the same iscsi session
x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE
x86/resctrl: Preserve CDP enable over CPU hotplug
x86/resctrl: Fix invalid attempt at removing the default resource group
wil6210: check rx_buff_mgmt before accessing it
wil6210: ignore HALP ICR if already handled
wil6210: add general initialization/size checks
wil6210: make sure Rx ring sizes are correlated
wil6210: remove reset file from debugfs
mm/vmalloc.c: move 'area->pages' after if statement
Linux 4.19.117
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib4ab9aa34c22c034887be15902a625ecc5622b35
Changes in 4.19.113
drm/mediatek: Find the cursor plane instead of hard coding it
spi: qup: call spi_qup_pm_resume_runtime before suspending
powerpc: Include .BTF section
ARM: dts: dra7: Add "dma-ranges" property to PCIe RC DT nodes
spi: pxa2xx: Add CS control clock quirk
spi/zynqmp: remove entry that causes a cs glitch
drm/exynos: dsi: propagate error value and silence meaningless warning
drm/exynos: dsi: fix workaround for the legacy clock name
drivers/perf: arm_pmu_acpi: Fix incorrect checking of gicc pointer
altera-stapl: altera_get_note: prevent write beyond end of 'key'
dm bio record: save/restore bi_end_io and bi_integrity
dm integrity: use dm_bio_record and dm_bio_restore
riscv: avoid the PIC offset of static percpu data in module beyond 2G limits
drm/amd/display: Clear link settings on MST disable connector
drm/amd/display: fix dcc swath size calculations on dcn1
xenbus: req->body should be updated before req->state
xenbus: req->err should be updated before req->state
block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group()
parse-maintainers: Mark as executable
USB: Disable LPM on WD19's Realtek Hub
usb: quirks: add NO_LPM quirk for RTL8153 based ethernet adapters
USB: serial: option: add ME910G1 ECM composition 0x110b
usb: host: xhci-plat: add a shutdown
USB: serial: pl2303: add device-id for HP LD381
usb: xhci: apply XHCI_SUSPEND_DELAY to AMD XHCI controller 1022:145c
ALSA: line6: Fix endless MIDI read loop
ALSA: seq: virmidi: Fix running status after receiving sysex
ALSA: seq: oss: Fix running status after receiving sysex
ALSA: pcm: oss: Avoid plugin buffer overflow
ALSA: pcm: oss: Remove WARNING from snd_pcm_plug_alloc() checks
iio: st_sensors: remap SMO8840 to LIS2DH12
iio: trigger: stm32-timer: disable master mode when stopping
iio: magnetometer: ak8974: Fix negative raw values in sysfs
iio: adc: at91-sama5d2_adc: fix differential channels in triggered mode
mmc: rtsx_pci: Fix support for speed-modes that relies on tuning
mmc: sdhci-of-at91: fix cd-gpios for SAMA5D2
staging: rtl8188eu: Add device id for MERCUSYS MW150US v2
staging: greybus: loopback_test: fix poll-mask build breakage
staging/speakup: fix get_word non-space look-ahead
intel_th: Fix user-visible error codes
intel_th: pci: Add Elkhart Lake CPU support
rtc: max8907: add missing select REGMAP_IRQ
xhci: Do not open code __print_symbolic() in xhci trace events
btrfs: fix log context list corruption after rename whiteout error
drm/amd/amdgpu: Fix GPR read from debugfs (v2)
drm/lease: fix WARNING in idr_destroy
memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
mm: slub: be more careful about the double cmpxchg of freelist
mm, slub: prevent kmalloc_node crashes and memory leaks
page-flags: fix a crash at SetPageError(THP_SWAP)
x86/mm: split vmalloc_sync_all()
USB: cdc-acm: fix close_delay and closing_wait units in TIOCSSERIAL
USB: cdc-acm: fix rounding error in TIOCSSERIAL
iio: light: vcnl4000: update sampling periods for vcnl4200
kbuild: Disable -Wpointer-to-enum-cast
futex: Fix inode life-time issue
futex: Unbreak futex hashing
Revert "vrf: mark skb for multicast or link-local as enslaved to VRF"
Revert "ipv6: Fix handling of LLA with VRF and sockets bound to VRF"
ALSA: hda/realtek: Fix pop noise on ALC225
arm64: smp: fix smp_send_stop() behaviour
arm64: smp: fix crash_smp_send_stop() behaviour
drm/bridge: dw-hdmi: fix AVI frame colorimetry
staging: greybus: loopback_test: fix potential path truncation
staging: greybus: loopback_test: fix potential path truncations
Linux 4.19.113
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I90c48cd7189a964e59d199ecc0f32c0a68688ec5
Vmalloc() is getting more and more used these days (kernel stacks, bpf and
percpu allocator are new top users), and the total % of memory consumed by
vmalloc() can be pretty significant and changes dynamically.
/proc/meminfo is the best place to display this information: its top goal
is to show top consumers of the memory.
Since the VmallocUsed field in /proc/meminfo is not in use for quite a
long time (it has been defined to 0 by a5ad88ce8c ("mm: get rid of
'vmalloc_info' from /proc/meminfo")), let's reuse it for showing the
actual physical memory consumption of vmalloc().
Link: http://lkml.kernel.org/r/20190417194002.12369-3-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 97105f0ab7b877a8ece2005e214894e93793950c)
Signed-off-by: Minchan Kim <minchan@google.com>
Bug: 141257998
Test: "cat /proc/meminfo | grep VmallocUsed" doesn't show 0 any longer
Change-Id: Ibedcd5c1290ba7be3293e220a917434008a30996
commit 3f8fd02b1bf1d7ba964485a56f2f4b53ae88c167 upstream.
On x86-32 with PTI enabled, parts of the kernel page-tables are not shared
between processes. This can cause mappings in the vmalloc/ioremap area to
persist in some page-tables after the region is unmapped and released.
When the region is re-used the processes with the old mappings do not fault
in the new mappings but still access the old ones.
This causes undefined behavior, in reality often data corruption, kernel
oopses and panics and even spontaneous reboots.
Fix this problem by activly syncing unmaps in the vmalloc/ioremap area to
all page-tables in the system before the regions can be re-used.
References: https://bugzilla.suse.com/show_bug.cgi?id=1118689
Fixes: 5d72b4fba4 ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20190719184652.11391-4-joro@8bytes.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit afd07389d3f4933c7f7817a92fb5e053d59a3182 ]
One of the vmalloc stress test case triggers the kernel BUG():
<snip>
[60.562151] ------------[ cut here ]------------
[60.562154] kernel BUG at mm/vmalloc.c:512!
[60.562206] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[60.562247] CPU: 0 PID: 430 Comm: vmalloc_test/0 Not tainted 4.20.0+ #161
[60.562293] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[60.562351] RIP: 0010:alloc_vmap_area+0x36f/0x390
<snip>
it can happen due to big align request resulting in overflowing of
calculated address, i.e. it becomes 0 after ALIGN()'s fixup.
Fix it by checking if calculated address is within vstart/vend range.
Link: http://lkml.kernel.org/r/20190124115648.9433-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 401592d2e095947344e10ec0623adbcd58934dd4 upstream.
When VM_NO_GUARD is not set area->size includes adjacent guard page,
thus for correct size checking get_vm_area_size() should be used, but
not area->size.
This fixes possible kernel oops when userspace tries to mmap an area on
1 page bigger than was allocated by vmalloc_user() call: the size check
inside remap_vmalloc_range_partial() accounts non-existing guard page
also, so check successfully passes but vmalloc_to_page() returns NULL
(guard page does not physically exist).
The following code pattern example should trigger an oops:
static int oops_mmap(struct file *file, struct vm_area_struct *vma)
{
void *mem;
mem = vmalloc_user(4096);
BUG_ON(!mem);
/* Do not care about mem leak */
return remap_vmalloc_range(vma, mem, 0);
}
And userspace simply mmaps size + PAGE_SIZE:
mmap(NULL, 8192, PROT_WRITE|PROT_READ, MAP_PRIVATE, fd, 0);
Possible candidates for oops which do not have any explicit size
checks:
*** drivers/media/usb/stkwebcam/stk-webcam.c:
v4l_stk_mmap[789] ret = remap_vmalloc_range(vma, sbuf->buffer, 0);
Or the following one:
*** drivers/video/fbdev/core/fbmem.c
static int
fb_mmap(struct file *file, struct vm_area_struct * vma)
...
res = fb->fb_mmap(info, vma);
Where fb_mmap callback calls remap_vmalloc_range() directly without any
explicit checks:
*** drivers/video/fbdev/vfb.c
static int vfb_mmap(struct fb_info *info,
struct vm_area_struct *vma)
{
return remap_vmalloc_range(vma, (void *)info->fix.smem_start, vma->vm_pgoff);
}
Link: http://lkml.kernel.org/r/20190103145954.16942-2-rpenyaev@suse.de
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Variant of proc_create_data that directly take a struct seq_operations
argument + a private state size and drastically reduces the boilerplate
code in the callers.
All trivial callers converted over.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Variants of proc_create{,_data} that directly take a struct seq_operations
argument and drastically reduces the boilerplate code in the callers.
All trivial callers converted over.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Kai Heng Feng has noticed that BUG_ON(PageHighMem(pg)) triggers in
drivers/media/common/saa7146/saa7146_core.c since 19809c2da2 ("mm,
vmalloc: use __GFP_HIGHMEM implicitly").
saa7146_vmalloc_build_pgtable uses vmalloc_32 and it is reasonable to
expect that the resulting page is not in highmem. The above commit
aimed to add __GFP_HIGHMEM only for those requests which do not specify
any zone modifier gfp flag. vmalloc_32 relies on GFP_VMALLOC32 which
should do the right thing. Except it has been missed that GFP_VMALLOC32
is an alias for GFP_KERNEL on 32b architectures. Thanks to Matthew to
notice this.
Fix the problem by unconditionally setting GFP_DMA32 in GFP_VMALLOC32
for !64b arches (as a bailout). This should do the right thing and use
ZONE_NORMAL which should be always below 4G on 32b systems.
Debugged by Matthew Wilcox.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20180212095019.GX21609@dhcp22.suse.cz
Fixes: 19809c2da2 ("mm, vmalloc: use __GFP_HIGHMEM implicitly”)
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Kai Heng Feng <kai.heng.feng@canonical.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commits 5d17a73a2e ("vmalloc: back off when the current
task is killed") and 171012f561 ("mm: don't warn when vmalloc() fails
due to a fatal signal").
Commit 5d17a73a2e ("vmalloc: back off when the current task is
killed") made all vmalloc allocations from a signal-killed task fail.
We have seen crashes in the tty driver from this, where a killed task
exiting tries to switch back to N_TTY, fails n_tty_open because of the
vmalloc failing, and later crashes when dereferencing tty->disc_data.
Arguably, relying on a vmalloc() call to succeed in order to properly
exit a task is not the most robust way of doing things. There will be a
follow-up patch to the tty code to fall back to the N_NULL ldisc.
But the justification to make that vmalloc() call fail like this isn't
convincing, either. The patch mentions an OOM victim exhausting the
memory reserves and thus deadlocking the machine. But the OOM killer is
only one, improbable source of fatal signals. It doesn't make sense to
fail allocations preemptively with plenty of memory in most cases.
The patch doesn't mention real-life instances where vmalloc sites would
exhaust memory, which makes it sound more like a theoretical issue to
begin with. But just in case, the OOM access to memory reserves has
been restricted on the allocator side in cd04ae1e2d ("mm, oom: do not
rely on TIF_MEMDIE for memory reserves access"), which should take care
of any theoretical concerns on that front.
Revert this patch, and the follow-up that suppresses the allocation
warnings when we fail the allocations due to a signal.
Link: http://lkml.kernel.org/r/20171004185906.GB2136@cmpxchg.org
Fixes: 171012f561 ("mm: don't warn when vmalloc() fails due to a fatal signal")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alan Cox <alan@llwyncelyn.cymru>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to
the page allocator. This has been true but only for allocations
requests larger than PAGE_ALLOC_COSTLY_ORDER. It has been always
ignored for smaller sizes. This is a bit unfortunate because there is
no way to express the same semantic for those requests and they are
considered too important to fail so they might end up looping in the
page allocator for ever, similarly to GFP_NOFAIL requests.
Now that the whole tree has been cleaned up and accidental or misled
usage of __GFP_REPEAT flag has been removed for !costly requests we can
give the original flag a better name and more importantly a more useful
semantic. Let's rename it to __GFP_RETRY_MAYFAIL which tells the user
that the allocator would try really hard but there is no promise of a
success. This will work independent of the order and overrides the
default allocator behavior. Page allocator users have several levels of
guarantee vs. cost options (take GFP_KERNEL as an example)
- GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
attempt to free memory at all. The most light weight mode which even
doesn't kick the background reclaim. Should be used carefully because
it might deplete the memory and the next user might hit the more
aggressive reclaim
- GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic
allocation without any attempt to free memory from the current
context but can wake kswapd to reclaim memory if the zone is below
the low watermark. Can be used from either atomic contexts or when
the request is a performance optimization and there is another
fallback for a slow path.
- (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
non sleeping allocation with an expensive fallback so it can access
some portion of memory reserves. Usually used from interrupt/bh
context with an expensive slow path fallback.
- GFP_KERNEL - both background and direct reclaim are allowed and the
_default_ page allocator behavior is used. That means that !costly
allocation requests are basically nofail but there is no guarantee of
that behavior so failures have to be checked properly by callers
(e.g. OOM killer victim is allowed to fail currently).
- GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
and all allocation requests fail early rather than cause disruptive
reclaim (one round of reclaim in this implementation). The OOM killer
is not invoked.
- GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
behavior and all allocation requests try really hard. The request
will fail if the reclaim cannot make any progress. The OOM killer
won't be triggered.
- GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
and all allocation requests will loop endlessly until they succeed.
This might be really dangerous especially for larger orders.
Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
because they already had their semantic. No new users are added.
__alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
there is no progress and we have already passed the OOM point.
This means that all the reclaim opportunities have been exhausted except
the most disruptive one (the OOM killer) and a user defined fallback
behavior is more sensible than keep retrying in the page allocator.
[akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
[mhocko@suse.com: semantic fix]
Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
[mhocko@kernel.org: address other thing spotted by Vlastimil]
Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alex Belits <alex.belits@cavium.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: NeilBrown <neilb@suse.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When ioremap a 67112960 bytes vm_area with the vmallocinfo:
[..]
0xec79b000-0xec7fa000 389120 ftl_add_mtd+0x4d0/0x754 pages=94 vmalloc
0xec800000-0xecbe1000 4067328 kbox_proc_mem_write+0x104/0x1c4 phys=8b520000 ioremap
we get the result:
0xf1000000-0xf5001000 67112960 devm_ioremap+0x38/0x7c phys=40000000 ioremap
For the align for ioremap must be less than '1 << IOREMAP_MAX_ORDER':
if (flags & VM_IOREMAP)
align = 1ul << clamp_t(int, get_count_order_long(size),
PAGE_SHIFT, IOREMAP_MAX_ORDER);
So it makes idiot like me a litte puzzled why this was a jump the
vm_area from 0xec800000-0xecbe1000 to 0xf1000000-0xf5001000, and leaving
0xed000000-0xf1000000 as a big hole.
This patch is to show all of vm_area, including vmas which are freeing
but still in the vmap_area_list, to make it more clear about why we will
get 0xf1000000-0xf5001000 in the above case. And we will get a
vmallocinfo like:
[..]
0xec79b000-0xec7fa000 389120 ftl_add_mtd+0x4d0/0x754 pages=94 vmalloc
0xec800000-0xecbe1000 4067328 kbox_proc_mem_write+0x104/0x1c4 phys=8b520000 ioremap
[..]
0xece7c000-0xece7e000 8192 unpurged vm_area
0xece7e000-0xece83000 20480 vm_map_ram
0xf0099000-0xf00aa000 69632 vm_map_ram
after this patch.
Link: http://lkml.kernel.org/r/1496649682-20710-1-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: zijun_hu <zijun_hu@htc.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kmemleak requires that vmalloc'ed objects have a minimum reference count
of 2: one in the corresponding vm_struct object and the other owned by
the vmalloc() caller. There are cases, however, where the original
vmalloc() returned pointer is lost and, instead, a pointer to vm_struct
is stored (see free_thread_stack()). Kmemleak currently reports such
objects as leaks.
This patch adds support for treating any surplus references to an object
as additional references to a specified object. It introduces the
kmemleak_vmalloc() API function which takes a vm_struct pointer and sets
its surplus reference passing to the actual vmalloc() returned pointer.
The __vmalloc_node_range() calling site has been modified accordingly.
Link: http://lkml.kernel.org/r/1495726937-23557-4-git-send-email-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Existing code that uses vmalloc_to_page() may assume that any address
for which is_vmalloc_addr() returns true may be passed into
vmalloc_to_page() to retrieve the associated struct page.
This is not un unreasonable assumption to make, but on architectures
that have CONFIG_HAVE_ARCH_HUGE_VMAP=y, it no longer holds, and we need
to ensure that vmalloc_to_page() does not go off into the weeds trying
to dereference huge PUDs or PMDs as table entries.
Given that vmalloc() and vmap() themselves never create huge mappings or
deal with compound pages at all, there is no correct answer in this
case, so return NULL instead, and issue a warning.
When reading /proc/kcore on arm64, you will hit an oops as soon as you
hit the huge mappings used for the various segments that make up the
mapping of vmlinux. With this patch applied, you will no longer hit the
oops, but the kcore contents willl be incorrect (these regions will be
zeroed out)
We are fixing this for kcore specifically, so it avoids vread() for
those regions. At least one other problematic user exists, i.e.,
/dev/kmem, but that is currently broken on arm64 for other reasons.
Link: http://lkml.kernel.org/r/20170609082226.26152-1-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Laura Abbott <labbott@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: zhong jiang <zhongjiang@huawei.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 1f5307b1e0 ("mm, vmalloc: properly track vmalloc users") has
pulled asm/pgtable.h include dependency to linux/vmalloc.h and that
turned out to be a bad idea for some architectures. E.g. m68k fails
with
In file included from arch/m68k/include/asm/pgtable_mm.h:145:0,
from arch/m68k/include/asm/pgtable.h:4,
from include/linux/vmalloc.h:9,
from arch/m68k/kernel/module.c:9:
arch/m68k/include/asm/mcf_pgtable.h: In function 'nocache_page':
>> arch/m68k/include/asm/mcf_pgtable.h:339:43: error: 'init_mm' undeclared (first use in this function)
#define pgd_offset_k(address) pgd_offset(&init_mm, address)
as spotted by kernel build bot. nios2 fails for other reason
In file included from include/asm-generic/io.h:767:0,
from arch/nios2/include/asm/io.h:61,
from include/linux/io.h:25,
from arch/nios2/include/asm/pgtable.h:18,
from include/linux/mm.h:70,
from include/linux/pid_namespace.h:6,
from include/linux/ptrace.h:9,
from arch/nios2/include/uapi/asm/elf.h:23,
from arch/nios2/include/asm/elf.h:22,
from include/linux/elf.h:4,
from include/linux/module.h:15,
from init/main.c:16:
include/linux/vmalloc.h: In function '__vmalloc_node_flags':
include/linux/vmalloc.h:99:40: error: 'PAGE_KERNEL' undeclared (first use in this function); did you mean 'GFP_KERNEL'?
which is due to the newly added #include <asm/pgtable.h>, which on nios2
includes <linux/io.h> and thus <asm/io.h> and <asm-generic/io.h> which
again includes <linux/vmalloc.h>.
Tweaking that around just turns out a bigger headache than necessary.
This patch reverts 1f5307b1e0 and reimplements the original fix in a
different way. __vmalloc_node_flags can stay static inline which will
cover vmalloc* functions. We only have one external user
(kvmalloc_node) and we can export __vmalloc_node_flags_caller and
provide the caller directly. This is much simpler and it doesn't really
need any games with header files.
[akpm@linux-foundation.org: coding-style fixes]
[mhocko@kernel.org: revert old comment]
Link: http://lkml.kernel.org/r/20170509211054.GB16325@dhcp22.suse.cz
Fixes: 1f5307b1e0 ("mm, vmalloc: properly track vmalloc users")
Link: http://lkml.kernel.org/r/20170509153702.GR6481@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Tobias Klauser <tklauser@distanz.ch>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull more arm64 updates from Catalin Marinas:
- Silence module allocation failures when CONFIG_ARM*_MODULE_PLTS is
enabled. This requires a check for __GFP_NOWARN in alloc_vmap_area()
- Improve/sanitise user tagged pointers handling in the kernel
- Inline asm fixes/cleanups
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Silence first allocation with CONFIG_ARM64_MODULE_PLTS=y
ARM: Silence first allocation with CONFIG_ARM_MODULE_PLTS=y
mm: Silence vmap() allocation failures based on caller gfp_flags
arm64: uaccess: suppress spurious clang warning
arm64: atomic_lse: match asm register sizes
arm64: armv8_deprecated: ensure extension of addr
arm64: uaccess: ensure extension of access_ok() addr
arm64: ensure extension of smp_store_release value
arm64: xchg: hazard against entire exchange variable
arm64: documentation: document tagged pointer stack constraints
arm64: entry: improve data abort handling of tagged pointers
arm64: hw_breakpoint: fix watchpoint matching for tagged pointers
arm64: traps: fix userspace cache maintenance emulation on a tagged pointer
If the caller has set __GFP_NOWARN don't print the following message:
vmap allocation for size 15736832 failed: use vmalloc=<size> to increase
size.
This can happen with the ARM/Linux or ARM64/Linux module loader built
with CONFIG_ARM{,64}_MODULE_PLTS=y which does a first attempt at loading
a large module from module space, then falls back to vmalloc space.
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
__vmalloc* allows users to provide gfp flags for the underlying
allocation. This API is quite popular
$ git grep "=[[:space:]]__vmalloc\|return[[:space:]]*__vmalloc" | wc -l
77
The only problem is that many people are not aware that they really want
to give __GFP_HIGHMEM along with other flags because there is really no
reason to consume precious lowmemory on CONFIG_HIGHMEM systems for pages
which are mapped to the kernel vmalloc space. About half of users don't
use this flag, though. This signals that we make the API unnecessarily
too complex.
This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to
be mapped to the vmalloc space. Current users which add __GFP_HIGHMEM
are simplified and drop the flag.
Link: http://lkml.kernel.org/r/20170307141020.29107-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Cristopher Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__vmalloc_node_flags used to be static inline but this has changed by
"mm: introduce kv[mz]alloc helpers" because kvmalloc_node needs to use
it as well and the code is outside of the vmalloc proper. I haven't
realized that changing this will lead to a subtle bug though. The
function is responsible to track the caller as well. This caller is
then printed by /proc/vmallocinfo. If __vmalloc_node_flags is not
inline then we would get only direct users of __vmalloc_node_flags as
callers (e.g. v[mz]alloc) which reduces usefulness of this debugging
feature considerably. It simply doesn't help to see that the given
range belongs to vmalloc as a caller:
0xffffc90002c79000-0xffffc90002c7d000 16384 vmalloc+0x16/0x18 pages=3 vmalloc N0=3
0xffffc90002c81000-0xffffc90002c85000 16384 vmalloc+0x16/0x18 pages=3 vmalloc N1=3
0xffffc90002c8d000-0xffffc90002c91000 16384 vmalloc+0x16/0x18 pages=3 vmalloc N1=3
0xffffc90002c95000-0xffffc90002c99000 16384 vmalloc+0x16/0x18 pages=3 vmalloc N1=3
We really want to catch the _caller_ of the vmalloc function. Fix this
issue by making __vmalloc_node_flags static inline again.
Link: http://lkml.kernel.org/r/20170502134657.12381-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "kvmalloc", v5.
There are many open coded kmalloc with vmalloc fallback instances in the
tree. Most of them are not careful enough or simply do not care about
the underlying semantic of the kmalloc/page allocator which means that
a) some vmalloc fallbacks are basically unreachable because the kmalloc
part will keep retrying until it succeeds b) the page allocator can
invoke a really disruptive steps like the OOM killer to move forward
which doesn't sound appropriate when we consider that the vmalloc
fallback is available.
As it can be seen implementing kvmalloc requires quite an intimate
knowledge if the page allocator and the memory reclaim internals which
strongly suggests that a helper should be implemented in the memory
subsystem proper.
Most callers, I could find, have been converted to use the helper
instead. This is patch 6. There are some more relying on __GFP_REPEAT
in the networking stack which I have converted as well and Eric Dumazet
was not opposed [2] to convert them as well.
[1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com
This patch (of 9):
Using kmalloc with the vmalloc fallback for larger allocations is a
common pattern in the kernel code. Yet we do not have any common helper
for that and so users have invented their own helpers. Some of them are
really creative when doing so. Let's just add kv[mz]alloc and make sure
it is implemented properly. This implementation makes sure to not make
a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
to not warn about allocation failures. This also rules out the OOM
killer as the vmalloc is a more approapriate fallback than a disruptive
user visible action.
This patch also changes some existing users and removes helpers which
are specific for them. In some cases this is not possible (e.g.
ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and
require GFP_NO{FS,IO} context which is not vmalloc compatible in general
(note that the page table allocation is GFP_KERNEL). Those need to be
fixed separately.
While we are at it, document that __vmalloc{_node} about unsupported gfp
mask because there seems to be a lot of confusion out there.
kvmalloc_node will warn about GFP_KERNEL incompatible (which are not
superset) flags to catch new abusers. Existing ones would have to die
slowly.
[sfr@canb.auug.org.au: f2fs fixup]
Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Andreas Dilger <adilger@dilger.ca> [ext4 part]
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull documentation update from Jonathan Corbet:
"A reasonably busy cycle for documentation this time around. There is a
new guide for user-space API documents, rather sparsely populated at
the moment, but it's a start. Markus improved the infrastructure for
converting diagrams. Mauro has converted much of the USB documentation
over to RST. Plus the usual set of fixes, improvements, and tweaks.
There's a bit more than the usual amount of reaching out of
Documentation/ to fix comments elsewhere in the tree; I have acks for
those where I could get them"
* tag 'docs-4.12' of git://git.lwn.net/linux: (74 commits)
docs: Fix a couple typos
docs: Fix a spelling error in vfio-mediated-device.txt
docs: Fix a spelling error in ioctl-number.txt
MAINTAINERS: update file entry for HSI subsystem
Documentation: allow installing man pages to a user defined directory
Doc/PM: Sync with intel_powerclamp code behavior
zr364xx.rst: usb/devices is now at /sys/kernel/debug/
usb.rst: move documentation from proc_usb_info.txt to USB ReST book
convert philips.txt to ReST and add to media docs
docs-rst: usb: update old usbfs-related documentation
arm: Documentation: update a path name
docs: process/4.Coding.rst: Fix a couple of document refs
docs-rst: fix usb cross-references
usb: gadget.h: be consistent at kernel doc macros
usb: composite.h: fix two warnings when building docs
usb: get rid of some ReST doc build errors
usb.rst: get rid of some Sphinx errors
usb/URB.txt: convert to ReST and update it
usb/persist.txt: convert to ReST and add to driver-api book
usb/hotplug.txt: convert to ReST and add to driver-api book
...
When vmalloc() fails it prints a very lengthy message with all the
details about memory consumption assuming that it happened due to OOM.
However, vmalloc() can also fail due to fatal signal pending. In such
case the message is quite confusing because it suggests that it is OOM
but the numbers suggest otherwise. The messages can also pollute
console considerably.
Don't warn when vmalloc() fails due to fatal signal pending.
Link: http://lkml.kernel.org/r/20170313114425.72724-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert all non-architecture-specific code to 5-level paging.
It's mostly mechanical adding handling one more page table level in
places where we deal with pud_t.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
task_struct::signal and task_struct::sighand are pointers, which would normally make it
straightforward to not define those types in sched.h.
That is not so, because the types are accompanied by a myriad of APIs (macros and inline
functions) that dereference them.
Split the types and the APIs out of sched.h and move them into a new header, <linux/sched/signal.h>.
With this change sched.h does not know about 'struct signal' and 'struct sighand' anymore,
trying to put accessors into sched.h as a test fails the following way:
./include/linux/sched.h: In function ‘test_signal_types’:
./include/linux/sched.h:2461:18: error: dereferencing pointer to incomplete type ‘struct signal_struct’
^
This reduces the size and complexity of sched.h significantly.
Update all headers and .c code that relied on getting the signal handling
functionality from <linux/sched.h> to include <linux/sched/signal.h>.
The list of affected files in the preparatory patch was partly generated by
grepping for the APIs, and partly by doing coverage build testing, both
all[yes|mod|def|no]config builds on 64-bit and 32-bit x86, and an array of
cross-architecture builds.
Nevertheless some (trivial) build breakage is still expected related to rare
Kconfig combinations and in-flight patches to various kernel code, but most
of it should be handled by this patch.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>